Fighting Spam

Web Spam is defined as:

"Irrelevant or unsolicited messages sent over the Internet, typically to large numbers of users, for the purposes of advertising, phishing, spreading malware, etc."

Spam on the internet was apparently named after a Monty Python sketch set in a cafe in which nearly every item on the menu included Spam, the tinned meat product made by the Hormel Foods Corporation.

How to Recognise Web Spam:

Web spam often takes the form of messages or comments left on your website by someone who has no interest in your service or product and simply wants to promote their own. Sometimes you can't distinguish spam from legitimate interaction, because it reads like a simple flattering comment, such as:

"Oh my goodness! Incredible article dude!"

‘I saw your post awhile back and saved it to my computer. Only now have I got a chance to reading it and have to tell you good work.’

...and so on.

Spam is, however, usually noticeable from its poor grammar or spelling. At times the messages make little or no sense, to the point where the characters used do not even form words, such as:

"Now he is using cheap uggs what he learened at the high level on the hardwood as year a head coach"

"Metal roofs become quickly heated compared to other synthetic drugs which are not a part of that. It promises results, backing up a camper trailer is nothing..."

"Your acquire required heat into a home process if be might manufacture aid individuals lowering consumption show online magazine shares, to presently connect..."

Well, I think you get the point. You may pick these up a little more easily by looking for a link embedded in the comment. You'll notice I highlighted words in the above examples; those would usually link out to another website, often one that may be running a scam of some sort.
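Looking for embedded links is easy to automate. The snippet below is only a rough sketch in Python, not a complete spam detector: the names find_links and looks_like_spam, the pattern, and the max_links threshold are all made up for illustration, and it assumes the comment arrives as raw text or HTML.

```python
import re

# Assumed pattern for illustration: catches http(s) URLs and bare "www." addresses,
# including those inside an HTML href="..." attribute.
LINK_PATTERN = re.compile(r"""https?://[^\s"'<>]+|www\.[^\s"'<>]+""", re.IGNORECASE)


def find_links(comment: str) -> list[str]:
    """Return every URL-like fragment found in the comment text."""
    return LINK_PATTERN.findall(comment)


def looks_like_spam(comment: str, max_links: int = 0) -> bool:
    """Flag a comment that contains more links than we are willing to allow."""
    return len(find_links(comment)) > max_links


if __name__ == "__main__":
    sample = 'Now he is using <a href="http://example.com/uggs">cheap uggs</a> what he learened'
    print(find_links(sample))
    print(looks_like_spam(sample))
```

Bear in mind that a check like this only catches comments that carry a link. A flattering comment with no link in it would pass straight through, and a genuine visitor who shares a useful link would be flagged, so treat the result as a hint for review rather than a verdict.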

Why do People Spam?

Web spammers usually do what they do because they believe that it can help market their own website. They post a comment on your blog in the hope that you will approve it. In their comment is a link to their website, often one that is totally unrelated to yours or that has questionable content. Once approved, that link appears on your blog. It is generally accepted that Google and other search engines pay attention to how many other websites link back to a website, as this is regarded as one way to measure its popularity.

The theory behind spamming websites is that by creating hundreds of thousands of links back to their website, spammers increase their chances of appearing near the top of a search engine's results. The rationale is that once they are on the first page of the results for a particular phrase, they can expect related traffic, which helps them with whatever scam they might be running.

How do People Spam?

While some people still manually comment on blogs with the above-mentioned goals in mind, most spam the web using automated processes. Spammers generally create scripts, or bots (robots), that automatically fill out and submit the forms on your website. In this manner a spammer can “post” to a multitude of websites in a relatively short time with minimal effort.

Preventing Spam:

While there is no foolproof method to ensure that you never receive spam, there are different methods you can employ to reduce it:

CAPTCHAs, when added to a web form, ask the poster to complete a task. These tasks may range from replicating a phrase shown in an image to completing a simple puzzle, something that is usually difficult or nearly impossible for a bot or script to do.

Include an automated spam filter, which attempts to filter comments and other form submissions for links and/or the use of unnatural language.  Unfortunately these systems can occasionally filter out legitimate comments and enquiries while occasionally still letting spam through.

Manually review all comments and manually filter all other form submissions. While this may be the most accurate method of filtering, it is also the most resource intensive, often requiring hours of review.

Force all users to authenticate an account before they are able to post. This method requires that those who wish to comment, use your service or conclude a transaction sign up to your database and log in each time they wish to interact with your website. This may be cumbersome for many users and may result in little interaction.

Automated form validation, much like the automated spam filter, checks certain form fields to ensure that what has been entered matches the format that is expected. So when an email address is expected, the entry has to include an "@"; failing this, the form can't be submitted. While this is generally effective, there are always exceptions to the rule, which may preclude some valid submissions.
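To give an idea of what form validation looks like behind the scenes, here is a minimal sketch in Python. The function names and the "name" and "email" field names are assumptions made for this example, and the email rule is deliberately simple: it only checks for an "@" with something before it and a dot somewhere after it.

```python
def is_plausible_email(value: str) -> bool:
    """Loose check: text before a single "@", and a dot in the part after it."""
    value = value.strip()
    local, separator, domain = value.partition("@")
    if not separator or not local:
        return False  # no "@" at all, or nothing in front of it
    if "@" in domain or " " in value:
        return False  # more than one "@", or a space inside the address
    return "." in domain.strip(".")


def validate_form(fields: dict[str, str]) -> dict[str, str]:
    """Return a message for each field that fails; an empty dict means the form may be submitted."""
    errors = {}
    if not fields.get("name", "").strip():
        errors["name"] = "Name is required."
    if not is_plausible_email(fields.get("email", "")):
        errors["email"] = "Please enter a valid email address."
    return errors


if __name__ == "__main__":
    print(validate_form({"name": "Jane", "email": "jane@example.com"}))
    print(validate_form({"name": "", "email": "cheap uggs"}))
```

This sketch also shows the exceptions mentioned above: an address such as user@localhost is valid on some internal networks, yet it would be rejected here because the part after the "@" has no dot.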

For more information and a step-by-step spam guide, check out: