The War on Spam

In recent weeks the volume of new spammer accounts has grown considerably. These accounts are sometimes used to post spam messages to the discussion group, but more often are simply used to game Google by including spam URLs in the user profile.

Due to the high volume of new spammer accounts I have implemented a new policy regarding new accounts. Depending on how things play out, it may become permanent:

1. New accounts are blocked by default, until released by an administrator. The user receives an email explaining this. While the account is blocked, the user profile is invisible to anyone but the site administrators, and the user is, of course, unable to sign in.

2. Accounts that seem legitimate are released, while accounts that are clearly spam (e.g., from known spammers, or including spam URLs) are either deleted or put in the spammer category.

3. Accounts that we can't be sure about may be put in the "on probation" category. Members of this class can post, but their posts will appear only after being reviewed by an administrator. If the user turns out to be legitimate, the account will be moved to the regular category, allowing the member to post directly.

4. Note that the "on probation" category is also used for members who are not spammers, but who post, or are considered likely to post, messages that are off topic. The messages posted by users on probation are in general reviewed by me before being allowed to appear.
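
For readers who prefer to see the rules as code, the four points above amount to a small state machine. What follows is only a sketch: the category names come from the post, but the function names, the exact set of allowed moves, and the assumption that accounts in the spammer category cannot sign in are illustrative guesses, not a description of how the site is actually implemented.

```python
from enum import Enum


class Category(Enum):
    BLOCKED = "blocked"            # default for every new account
    REGULAR = "regular"            # may post directly
    ON_PROBATION = "on probation"  # posts are held for review
    SPAMMER = "spammer"
    DELETED = "deleted"


# Moves an administrator may make. The moves out of BLOCKED and the two
# moves between REGULAR and ON_PROBATION follow the post; the moves from
# ON_PROBATION to SPAMMER or DELETED are an assumption.
ALLOWED = {
    Category.BLOCKED: {Category.REGULAR, Category.ON_PROBATION,
                       Category.SPAMMER, Category.DELETED},
    Category.ON_PROBATION: {Category.REGULAR, Category.SPAMMER,
                            Category.DELETED},
    Category.REGULAR: {Category.ON_PROBATION},
}


def review(current, target):
    """Return the new category, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move {current.value} -> {target.value}")
    return target


def can_sign_in(category):
    # Blocked accounts cannot sign in (per the post); treating spammer
    # and deleted accounts the same way is an assumption.
    return category in (Category.REGULAR, Category.ON_PROBATION)


def posts_need_review(category):
    # Only probationary members have their posts held for an administrator.
    return category is Category.ON_PROBATION
```

For example, `review(Category.BLOCKED, Category.ON_PROBATION)` models an administrator releasing an uncertain account into probation, after which `posts_need_review` is true for that account.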


New users are advised that by putting a short sentence or two about their specific interests relating to PL in their user profile, they will help us allow them to post sooner.

No Index

One way WikiWiki combats the spam problem is to use the noindex tag to prevent indexing of new page content for about 24 hours. Search engines only get the go-ahead once the last update was made more than 24 hours ago. Adherence to the tag is voluntary, but it is respected by many of the largest search engines.

Similarly, you could 'noindex' user profiles, or at least those of probationary users.

Implement those properties. And advertise them to new users. This reduces incentives to 'game the system', or at least makes LtU a harder target than its neighbors.
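
The 24-hour rule is simple enough to sketch. This is an illustration only, not WikiWiki's actual code: the function name `robots_meta` and the `QUIET_PERIOD` constant are made up for the example, and the tag is emitted in the same form used elsewhere in this thread.

```python
from datetime import datetime, timedelta

# Hypothetical constant: how long a page must go without updates
# before search engines are allowed to index it.
QUIET_PERIOD = timedelta(hours=24)

NOINDEX_TAG = '<meta name="robots" content="noindex"/>'


def robots_meta(last_update, now):
    """Return the noindex tag while the page is still 'fresh'.

    Once the last update is more than QUIET_PERIOD old, return an
    empty string so that crawlers may index the page.
    """
    if now - last_update > QUIET_PERIOD:
        return ""
    return NOINDEX_TAG


# Usage: a page edited two hours ago is still hidden from indexing.
now = datetime(2010, 1, 2, 12, 0)
tag = robots_meta(now - timedelta(hours=2), now)
```

The same function could be applied to a user profile by passing the time of the last profile edit, which is the variant suggested here for probationary users.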

I seriously dislike the 'blocked user' policy. We occasionally see users join because an article gets referenced, or because they follow a link to an LtU discussion from outside the community. You would be causing such people to walk away in disgust.

If you really want a clean probationary period for users, allow non-probationary users (or just admins) to 'vote' them into or out of the community based on the content of their posts. There are more than a few people who would be willing to help with killing spam.

Until users are in the community, their user profile carries <meta name="robots" content="noindex"/>. Thus, the only price paid for 'spamming' users is a (relatively minor) cost to your database size. You can (if you really need space) automate cleanup by auto-deleting user accounts that don't post within 168 hours of signing up. (You can also advertise this property.)
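
A rough sketch of that cleanup rule, under stated assumptions: the `Account` record, its fields, and the `stale_accounts` helper are hypothetical and do not correspond to Drupal's schema. The function only selects candidates; actually deleting them is left to whoever runs it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# 168 hours = 7 days, the grace period proposed above.
GRACE_PERIOD = timedelta(hours=168)


@dataclass
class Account:
    # Hypothetical record; field names are for illustration only.
    name: str
    created: datetime
    post_count: int


def stale_accounts(accounts, now):
    """Return accounts older than GRACE_PERIOD that have never posted."""
    return [
        a for a in accounts
        if a.post_count == 0 and now - a.created > GRACE_PERIOD
    ]
```

Note that this rule cannot tell a spammer from a reader who signed up without intending to post, so it would remove both.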

The policies you describe above will succeed in creating a lot of pain for new users and a lot of work for yourself.

I would also strongly advise

I would also strongly advise against any sort of auto-deletion if the user does not post. Plenty of people create accounts simply to be able to better monitor what is being posted.

Really?

I never see them...

A surprisingly high

A surprisingly high percentage of our legitimate userbase log in regularly and never (or rarely) make a post, and you do see people who make their first post a few years after they sign up for an account.

I would also strongly advise

I agree with this. I rarely post (because I don't have the expertise), but I have an account so I can see "new comments" and because I may want to post in the future.

Yes, I agree with this

Yes, I agree with this wholeheartedly as well.

noindex

Although noindex does make posting spam links pointless, I don't think the spammers are smart enough to realise that. Advogato has put noindex on new, uncertified accounts for years, but it doesn't stop the spammers from trying.
It's worthwhile anyway, to stop them successfully gaming search engines, but it might not reduce the admin work needed to moderate the spam.

Who discusses the discusser?

We occasionally see users join because an article gets referenced, or because they follow a link to an LtU discussion from outside the community.

In fact, consider the recent "Critical code studies" discussion--the author of some material being discussed joined LtU to participate. Opinions about that particular topic aside, it seems to me a uniformly good thing that someone whose work is mentioned can easily join in.

Why block login of new

Why block login of new users? Why not just send all new users to probation? It seems that would eliminate almost all of David's concern. New users could join and post and would be told "your post is awaiting anti-spam moderation", rather than having to remember to come back to post if they're still in the mood to do so.

I think this suggestion is

I think this suggestion is worth serious consideration.

This seems pointless.

This seems pointless. Users should be able to sign in after making an account and should immediately begin a probationary period. Asking new users to wait for approval simply to get limited access strikes me as absurd; I can't recall ever seeing such a painful system on any forum or mailing list.

As for profiles, there should be an option to allow search engine indexing. It should be off by default, and the user should only be allowed to turn it on if they've gotten through the initial probationary period. (Alternatively, it could automatically turn on when the initial probationary period ends.) If they get put back on probation for some reason, this option should remain enabled.
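
To make the proposed rule concrete, here is a minimal sketch. The `Profile` class and its field and method names are invented for the example; it models the opt-in variant, where the user turns indexing on themselves after the initial probation.

```python
from dataclasses import dataclass

NOINDEX_TAG = '<meta name="robots" content="noindex"/>'


@dataclass
class Profile:
    # Hypothetical model; none of these names come from Drupal.
    on_probation: bool = True
    passed_initial_probation: bool = False
    allow_indexing: bool = False  # off by default

    def finish_initial_probation(self):
        self.on_probation = False
        self.passed_initial_probation = True

    def return_to_probation(self):
        # Going back on probation leaves allow_indexing untouched.
        self.on_probation = True

    def set_allow_indexing(self, value):
        # The option may only be switched on after the initial probation.
        if value and not self.passed_initial_probation:
            raise PermissionError("initial probation not yet completed")
        self.allow_indexing = value

    def robots_meta(self):
        """Tag to place in the profile page's head; empty if indexable."""
        return "" if self.allow_indexing else NOINDEX_TAG
```

The automatic variant mentioned in parentheses would simply set `allow_indexing = True` inside `finish_initial_probation`.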

Context

To put this in context, I described the current software situation recently here. The measures Ehud is describing are temporary and intended to simplify the manual admin work until an upgrade is completed, which will give us additional options.

Suggestions about how to customize the site are appreciated, but unless they are accompanied by an offer to write the necessary PHP code for Drupal 4.6.x (with some reasonable forward migration path), they probably won't get implemented.

Are you using a spam filter tool?

Given that LtU runs on Drupal, I'd say that using Mollom is probably the best way to fight spam and spammers. Have you considered doing that? See http://mollom.com.

Mollom

In the comment he linked to, Anton reported that his experiences with Mollom were less than excellent.

I asked a question on Serverfault about comment-spam filtering services, but there hasn't been any useful feedback in over a week.

Some comments

Thanks for all the suggestions.

It is gratifying to see many people speak up to uphold the policy of openness that characterizes LtU. Practically, however, the suggestions are not implementable at this point, as Anton indicated. Indeed, if more people would volunteer to help with spam fighting, we might not need the draconian measures now imposed.

Let me respond concretely to a couple of ideas floated in this thread. The profiles of users that are on probation are visible. This encourages profile spam, and requires the administrators to essentially edit 5-10 new accounts daily to remove spam URLs. Blocking the users is very easy to achieve using Drupal and solves this issue. Putting users on probation automatically is, in addition, not as easy to achieve as blocking them, and requires too much tinkering at the moment; tinkering that hopefully will be unnecessary once we upgrade. Hence it is a waste of time to do this now. Finally, there is no reason authors will not be able to join the conversation; surely, their accounts will be noticed and released even more quickly than other accounts.

While this may be pure coincidence, I can report that since the new system was established, the number of new spam accounts (which had reached double digits daily) has slowed to a trickle. Hopefully the spammers will go away and we can return to our normal policies.