<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Flood Detection, Rails and Memcached.</title>
	<atom:link href="http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/</link>
	<description>The official blog for the web game, Forumwarz</description>
	<lastBuildDate>Sun, 03 Feb 2013 21:44:22 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: markchd</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-172</link>
		<dc:creator><![CDATA[markchd]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-172</guid>
		<description><![CDATA[It&#039;s so simple when someone explains it.]]></description>
		<content:encoded><![CDATA[<p>It&#8217;s so simple when someone explains it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Green Rails</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-173</link>
		<dc:creator><![CDATA[Green Rails]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-173</guid>
		<description><![CDATA[This rocks.

I have been looking for simple-stupid ways of controlling floods of unwanted traffic for ages; without memcached, a cluster of servers gets worse at detecting the problem the bigger the cluster gets.  Indexing the created_at column is a big waste.  Your solution is brilliant.

Howevah...

What about the bots you want?  Sure, they&#039;re not POSTing comment spam with links to V1@gr@ sites, or anything, but not every rogue bot is a POSTer.  Whitelists and blacklists, well, they all suck, and heuristic User-agent bot detection sucks only a bot, er, a bit less.

Trying to separate the sheep from the wolves is a challenge indeed.

Wouldn&#039;t it be cool to make a little non-human detector based on patterns of behavior?  Sure, the wget script would be simple: same IP, same User-agent, same rate.  Detecting the smarter ones is harder: bots (like ones I have written in past, dark and evil days) that snarf up content irregularly (using rand()) and with innocent user-agents, but which still show up with uncanny regularity.

Some patterns are intentional (Googlebot has a well-known list of IPs it comes from) and therefore helpful.  Anything violating robots.txt rules is dead meat.  Any IP that doesn&#039;t also get the images on your page is either a) the last remaining Lynx user, or b) a bot.  But it&#039;s tying this all together, especially in a large, clustered environment, that makes the problem hard.

I think memcached can be used to aggregate this information in the same way you wrote about, and that should make the problem a) much simpler and immediate, and b) much lighter-weight.

And when it&#039;s all done, all we need to do is figure out how to end spam once and for all by turning the bots upon each other in some n-squared kind of way that makes the rest of us blokes just trying to focus on doing good things laugh with glee as the spambots self-destruct.  Moo ha ha!

And if there&#039;s not a good algorithm here, there&#039;s gotta be a good B-movie plot.

Tom]]></description>
		<content:encoded><![CDATA[<p>This rocks.</p>
<p>I have been looking for simple-stupid ways of controlling floods of unwanted traffic for ages; without memcached, a cluster of servers gets worse at detecting the problem the bigger the cluster gets.  Indexing the created_at column is a big waste.  Your solution is brilliant.</p>
<p>Howevah&#8230;</p>
<p>What about the bots you want?  Sure, they&#8217;re not POSTing comment spam with links to V1@gr@ sites, or anything, but not every rogue bot is a POSTer.  Whitelists and blacklists, well, they all suck, and heuristic User-agent bot detection sucks only a bot, er, a bit less.</p>
<p>Trying to separate the sheep from the wolves is a challenge indeed.</p>
<p>Wouldn&#8217;t it be cool to make a little non-human detector based on patterns of behavior?  Sure, the wget script would be simple: same IP, same User-agent, same rate.  Detecting the smarter ones is harder: bots (like ones I have written in past, dark and evil days) that snarf up content irregularly (using rand()) and with innocent user-agents, but which still show up with uncanny regularity.</p>
<p>Some patterns are intentional (Googlebot has a well-known list of IPs it comes from) and therefore helpful.  Anything violating robots.txt rules is dead meat.  Any IP that doesn&#8217;t also get the images on your page is either a) the last remaining Lynx user, or b) a bot.  But it&#8217;s tying this all together, especially in a large, clustered environment, that makes the problem hard.</p>
<p>I think memcached can be used to aggregate this information in the same way you wrote about, and that should make the problem a) much simpler and immediate, and b) much lighter-weight.</p>
<p>And when it&#8217;s all done, all we need to do is figure out how to end spam once and for all by turning the bots upon each other in some n-squared kind of way that makes the rest of us blokes just trying to focus on doing good things laugh with glee as the spambots self-destruct.  Moo ha ha!</p>
<p>And if there&#8217;s not a good algorithm here, there&#8217;s gotta be a good B-movie plot.</p>
<p>Tom</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: leethal</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-174</link>
		<dc:creator><![CDATA[leethal]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-174</guid>
		<description><![CDATA[* blog added to RSS feed *

Nice article!]]></description>
		<content:encoded><![CDATA[<p>* blog added to RSS feed *</p>
<p>Nice article!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bigguyinblack</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-175</link>
		<dc:creator><![CDATA[Bigguyinblack]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-175</guid>
		<description><![CDATA[It does lead to double posts when people try to edit their message and get the flooding message. They wait a few moments, submit the edit, and it goes through as a new message.]]></description>
		<content:encoded><![CDATA[<p>It does lead to double posts when people try to edit their message and get the flooding message. They wait a few moments, submit the edit, and it goes through as a new message.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Evil Trout</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-176</link>
		<dc:creator><![CDATA[Evil Trout]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-176</guid>
		<description><![CDATA[@Bigguyinblack this bug was short-lived and was fixed yesterday.]]></description>
		<content:encoded><![CDATA[<p>@Bigguyinblack this bug was short-lived and was fixed yesterday.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zre</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-177</link>
		<dc:creator><![CDATA[Zre]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-177</guid>
		<description><![CDATA[Cute! It&#039;s even nicer than &#039;nude&#039; Arktor!
POST MOAR KODE!]]></description>
		<content:encoded><![CDATA[<p>Cute! It&#8217;s even nicer than &#8216;nude&#8217; Arktor!<br />
POST MOAR KODE!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Blindey McBlinderson</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-178</link>
		<dc:creator><![CDATA[Blindey McBlinderson]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-178</guid>
		<description><![CDATA[&quot;Any IP that doesn’t also get the images on your page is either a) the last remaining Lynx user, or b) a bot.&quot;

Hey, not everybody needs images.]]></description>
		<content:encoded><![CDATA[<p>&quot;Any IP that doesn’t also get the images on your page is either a) the last remaining Lynx user, or b) a bot.&quot;</p>
<p>Hey, not everybody needs images.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jetienne</title>
		<link>http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached/#comment-179</link>
		<dc:creator><![CDATA[jetienne]]></dc:creator>
		<pubDate>Wed, 16 Apr 2008 17:56:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.forumwarz.com/2008/04/16/flood-detection-rails-and-memcached#comment-179</guid>
		<description><![CDATA[@Bigguyinblack

Check the code: validates_each(field, :on =&gt; :create), it&#039;s on creation only...]]></description>
		<content:encoded><![CDATA[<p>@Bigguyinblack</p>
<p>Check the code: validates_each(field, :on =&gt; :create), it&#8217;s on creation only&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>
