LtU turns 7: The year of spam

Seven years ago today LtU was born. I find it incredible that we have been doing this for so long, that some of the earliest members are still here, and that some of the same topics are still going strong! While the range of topics and general style of LtU remained fairly constant over the years, each year brought with it its own flavour. The main reason for this was that LtU was always open to new members, and each contributing editor influenced the discussion according to his interests.

So how can one summarize year seven?

I think that for Anton and me year seven will be remembered as the year of spam. We have been fighting spam daily, and I fear that we will have to put in place more draconian measures on new users shortly. Some of you probably saw a couple of spam messages that managed to get past us. But let me assure you: this is a tiny fraction. There are hundreds of new users that signed up only to post spam, with at least two or three new spammers signing up daily. Since we try to accommodate new members, I am not deleting users that fail to comply with our request for real names or identifying personal information - and so detecting potential spammers before they begin posting spam is difficult and time consuming. One reason why I posted fewer programming language related posts was that I was simply too busy fighting spam...

This is a good opportunity to thank Anton again for all he does to keep LtU up and running (his insightful and amusing posts I take for granted, you see). Without his help in putting in place the technical infrastructure required for all the spam monitoring and control we would have drowned in spam long ago. This is one reason (aside from the fact that I was very busy with other things) that year seven is (still) not the Year of the Wiki. We put up a wiki, but decided that the integration of the wiki into LtU would require too much time, time both of us couldn't spend this year.

Spam came to LtU for the simple reason that LtU became too well known a site... In fact the second thing that happened to LtU this year is that the number of active members grew considerably. This is, of course, very gratifying. I still remember the early days, when LtU had three members, and we didn't know if between the three of us we could keep finding enough interesting material to keep the site alive.

As one might expect, this meant that some topics that were discussed here many times came up for discussion again. It is good to revisit these issues from time to time, but I fear that the rising volume of messages, and the number of new users, some with less decorum than others, kept many old timers from engaging in these discussions, leading to some long threads that were not up to the usual quality of LtU discussions. Since no one was there to object, some may have gotten the impression that these threads (replete with ad hominem attacks, insults and language advocacy) are acceptable on LtU. I am partly to blame for not stepping in, but I just didn't have the time to follow all these discussions. So let me take this opportunity to remind everyone that discussions of this type are not welcomed by the LtU community, and suggest that more recent members consult the LtU policy as well as the LtU spirit pages. We discussed various forms of moderation and control in the past, and I still think the conclusion we reached - that is that the community should "police" itself - is the right one. If you find the content or style objectionable, post about it (in a separate thread, if needed).

I noticed that several of the LtU contributing editors began to post less and less. While I think the items on the home page are interesting and exciting, there are fewer new home page items each week than I'd like. One reason for this is that many prefer to post things on their own blogs, and a fair amount of LtU candidate material gets posted to places like programming.reddit.com. While there are LtU members who prefer to keep the site restricted as much as possible to the discussion of published academic papers, my opinion is that if a regular member considers some project, site or presentation to be of interest to the LtU community, he should post about it here. This is even truer when it comes to contributing editors, of course. Contributing editors - don't hesitate, contribute! I remind everyone that we have some departments that are begging for stories, top among them the new departments devoted to Scala and Ruby.

It seems to me that LtU is in a state of transition. We can fight to remain the LtU we all know and love - but this requires effort. Or we can hope for the best, and see LtU turn into comp.lang.misc. To make sure we don't jump the shark, the community has to step up, both by steering the conversation and keeping threads from getting long and disorganized, and by posting new and interesting stuff!

This is a good opportunity to ask long time members to mentor new members, not just direct them to the getting started page :-) . I implore old timers that are sitting back to engage in the conversation, and let us know what they are up to. We miss you guys!
And most of all, I pray for spammers to just crawl back to where they came from.

The last wish, I know, is unlikely to happen. The others I think are within our reach!

Happy birthday everyone!

Happy birthday!

Happy birthday LtU! And thanks to Ehud, Anton, and the LtU regulars for making this such a compelling place to visit!

What about using a trust metric like Advogato's?

Regarding spamming, I don't know what philosophy of openness vs workload for the admins you want to adopt, but advogato's trust metric might be an interesting approach for this site.

(For those who don't know how advogato works, users are assigned a level, from observer to master, based on peer evaluation. Only people above observer can post or reply. The definition of the levels is of course context dependent.)

One thing that might be interesting is to somehow share the confidence built at one site and reuse it in another... but this is a different topic.

Seconded

I think Raph Levien's great work on attack resistant trust metrics (PDF)—and associated code—is woefully underutilized. It might also be nice to implement some kind of slope-1 collaborative filtering system as LtU grows.
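For readers who have not met it, here is a minimal sketch of the weighted Slope One scheme mentioned above. It is only an illustration: the `ratings` layout (a dict of member name to a dict of item to score) and the function name `slope_one_predict` are assumptions made for this example, not anything LtU or Drupal provides.

```python
def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating of `target` with weighted Slope One.

    `ratings` maps each member to a dict of {item: score}.
    Returns None when no other item overlaps with `target`.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Differences (target - item) over every member who rated both.
        diffs = [r[target] - r[item]
                 for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        average_diff = sum(diffs) / len(diffs)
        # Each item's prediction is weighted by how many members support it.
        numerator += (user_rating + average_diff) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None
```

The appeal of the scheme is that it needs nothing beyond average pairwise differences, so it is cheap to maintain as ratings arrive.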

ranking systems

Paul Snively: slope-1 collaborative filtering system

I might need a collaborative filtering system some day, so I'd add to any discussion if we start here or elsewhere. But I'm biased toward offering first principle analyses over evaluating old published schemes. (I find folks can too quickly simplify during problem definition so solutions ignore whole classes of pertinent issues; not that I mean to criticize.)

In the last year in my day job I refined replacement algorithms grading relevance of parts of large data sets in a manner attempting to be fair to new data even when old data has advantage of tenure. New material must not be held up to as high a standard as old, or it can't break through an "old-boy" glass ceiling. That's one important thing to mind in algorithms. New data needs continuous aging behavior from loose to more stringent metrics.

Also, one dimension rankings oversimplify how people actually rank things. (Well, there's always a high-grid population subset seeking aristocratic total ordering of people & things in 1 dimension to simplify pecking order management and exposure control. But one needn't encourage elitism except by choice.)

Anyway, I think multi-dimensional ranking would be more useful and generate less nonsense (to non-mainstream thinkers) due to homogenization. Inter-dimension ranking effects could be calculated in some manner similar to slope-1. I guess I'm thinking of some kind of Bayesian "consider-the-source" ranking of voters' attitudes towards topics to scale the effect of their votes. For example (for some topic foo) a positive vote ranking a foo-centric post from someone who dislikes foo might count more than positive votes from foo fans.

But that approach might incite a semantic web fan flash mob swamping discussion with attempts to categorize in typologies, and we'd be reduced to librarians mediating vocabulary fights of newbies. (I almost said peasants. :-)

"grid" terminology

(Sorry for replying to myself.) I thought some folks might not have heard the term "grid" before. It comes from Mary Douglas in anthropology. I found a decent intro on http://en.wikipedia.org/wiki/Cultural_Theory_of_risk

Americans in general tend to be high grid, low group: 1) (high grid) believe everything is sensibly according to plan and everyone lives in social positions befitting what they deserve; 2) (low group) the influence of others, even in one's own family, is considered minor and often completely inconsequential.

Technologists tend to experience amplification of high grid perspective by self selection when society rewards such a career path (in many ways) and when creation of tools to order the world encourages a world view that's regimented and ordered. Resulting ethnocentrism makes one dimensional ranking look more reasonable.

[Edit: Sorry I'll try to keep my generalizations in check. In fact, I should try to keep my yap shut for a while, for practice. Thanks for all the good work, which is seldom praised but much appreciated.]

Ouch

Americans in general

Are you trying to prove that we need a moderation system? Seriously, straying a bit off-topic is one thing, but let's keep the generalizations under control.

I'm not convinced this is

I'm not convinced this is way off-topic when it's still relevant to spam-removal and post-rating tools. Nor do I think it's inappropriate to talk about cultural issues affecting tool design, which will require talking about tendencies. None of this says anything about individual Americans in the specific.

Thoughts from the middle of the grid

I don't really see the connection of "Americans" to this issue. The LtU audience is both wider and much narrower than that, and I see many more possible downsides than upsides to bringing that kind of generalization into the discussion.

I think the point about "technologists" was closer to the mark. I'd agree that it's something like the "amplification of high grid perspective" that leads to people wanting to develop automated systems to moderate discussion. I have some concerns about such systems, though. (Perhaps because I'm not American ;-)

One concern is that an automated system may just become a proxy for discussions we'd have to have anyway. Two of the entries on the advogato.org home page at the moment are about the trust metric system. Is it better to be discussing a trust metric system than just the posts that are considered out of line? The latter discussions have a benefit in that they're more directly about what needs to be discussed in order to communicate and perpetuate the community's values. Adding indirection here could be actively harmful.

There's also the point that automated systems are easier, and thus more attractive, to game. In "Gaming the system: How moderation tools can backfire", which is an interesting piece, Derek M. Powazek wrote "if your goal is thoughtful, positive conversations, beware of adopting the qualities of a game."

With a trust metric system, one specific game is simply the gaining of a credential, such as "Master" on advogato. This could lead to people signing up and posting purely for the purpose of achieving the credential, increasing the volume of discussion but probably not helping quality. (Perhaps LtU, being learning-oriented, could address this by adopting "Ignoramus" as the highest rank, reflecting the fact that the more you learn, the more you realize you don't know.)

Moderation and/or trust metric systems do seem useful in some of the large communities that rely on them, so perhaps Paul is right that something like that will become necessary "as LtU grows". But despite the relatively large size of LtU's silent readership, I doubt that the active membership will soon grow to the levels where such a system becomes essential, because the site is so subject-specific.

On the subject of spam, I see that as a fairly orthogonal concern, which happens to overlap the above in some places. E.g. spammers create admin issues even when their comments are being blocked, just due to sheer volume of traffic -- we had a multi-hour outage a couple of Sundays ago related to this. We may do things such as adding a captcha-like filter to the registration page, but I don't think we're in danger of needing a moderation system just to deal with spammers.

Or perhaps if we just got a

Or perhaps if we just got a good spam filter, the moderators might get enough time to do more high-level moderating, obviating the need for an automated moderating system.

Everyone's a moderator

Technical efforts against spam are ongoing. As I mentioned, it's not just about filtering the content -- just the traffic volume alone, which has more than doubled in the past year, creates issues.

But I think it's worth repeating what Ehud wrote in the topic: "the conclusion we reached - that is that the community should "police" itself - is the right one. If you find the content or style objectionable, post about it (in a separate thread, if needed)."

In other words, everyone's a moderator, so in theory, we should have no shortage of moderation resources.

Perhaps if we just solve the

Perhaps if we just solve the halting problem, other software engineering problems will follow!

It's an arms race and it always requires effort to deal with.

One view, in rant form!

I am an American (and actually, from the brief descriptions of "high grid, low group", it also seems to not be that bad a fit; I am a capitalist), but I'm also one of the more vocal advocates of implicit community moderation (again, I am a capitalist). The "Gaming the system" link is quite good. I guess, however, that I'm not a particularly good representative, as I would not use a filtering system anyway. Actually, to be honest, if reading LtU required or even highly "benefited" from using a filtering system, I would cease reading it.

Along the lines of your "Everyone's a moderator" post, you do seem correct in that established community members (including myself) are not doing much to emphasize community standards. One case, not so much a problem but a trend, is the growing amount of Forum Topics that are not based on anything, just random thoughts, wonderings, questions, ideas. Part of this is, I believe, a sense of not wanting to be "unwelcoming" to new members. I guess I should state here that I seem to have a bit more extreme attitude toward new users/being welcoming than most people here, so take what I say as you will in that light. Essentially, I care about quality more than quantity. Even in programming language communities, where just having more users has benefits, I am still more interested in quality. In the case of LtU, it is far from clear that more is better. As such, I have no compunction about having the bar to entry be higher, e.g. things like waiting periods and also just a less accommodating attitude. I don't come to LtU to read unstructured grab-bag conversations whose goal is to communicate well understood ideas to a single person. While LtU is a good resource for learning via old conversations (and the wiki when it comes), I don't think new conversations should be a vehicle for explaining well established ideas.

Anyway, as I suggested above, it's probably a good thing that the majority of the LtU community (and the Haskell community) are much friendlier than I am.

Holes in the glass ceiling

For an alleged rant, that was remarkably useful. Thanks. I agree with many of your points, and I know there are quite a few others who feel similarly.

However, there are some good arguments in favor of giving some leeway to newer members - the main one, I think, is to help keep the site and its community healthy over time, i.e. too high a barrier to entry will just keep new members away, presumably leading to stagnation. This is closely related to the point McCusker raised about the old-boy glass ceiling. (It's the same sorts of reasons that lead universities to put up with students...)

However, this requires striking a balance, which is made all the more difficult by the fact that it's not done in a very systematic way. I do think we need to work on this, although it's more of a medium- to long-term goal from my perspective right now.

Turning Myself In

...not so much a problem but a trend, is the growing amount of Forum Topics that are not based on anything, just random thoughts, wonderings, questions, ideas.

I've been guilty of this in the past, but the nature of my offense wasn't clear to me until I read this post. Good guidance on a thread gone bad is a rare thing -- too often, the thread is tolerated until it's really rotten.

However, it's definitely unattractive to see a thread wander from technical to sociological to completely meta, disciplinary stuff. A mechanism for petitioning a user to change their behaviour could help a lot -- it would keep discipline out of the forums while giving it adequate space.

How about this

If a top level Forum Topic post does not have a link in it, reject it. I think this heuristic would almost perfectly discriminate between "good" and "bad" topics.
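To make the heuristic concrete, here is a rough sketch of such a check. The function name `accept_topic` and the regular expression are assumptions for illustration only; a real check would live in whatever hook the forum software offers for validating new topics.

```python
import re

# Hypothetical pattern: a bare http(s) URL or an HTML anchor with an href.
LINK_PATTERN = re.compile(r"https?://\S+|<a\s[^>]*href=", re.IGNORECASE)

def accept_topic(body: str) -> bool:
    """Accept a top-level forum topic only if its body contains a link."""
    return LINK_PATTERN.search(body) is not None
```

A check this crude would of course be easy to satisfy with an irrelevant link, so it can only be a first filter, not a judgment of quality.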

Did the early posts about

Did the early posts about Cat have links?

Most of them seem to.

Most of them seem to. Certainly policy 4.a seems relevant.

High Grid, Low Group

Rys McCusker: Americans in general tend to be high grid, low group: 1) (high grid) believe everything is sensibly according to plan and everyone lives in social positions befitting what they deserve; 2) (low group) the influence of others, even in one's own family, is considered minor and often completely inconsequential.

You say that like it's a bad thing.

:-)

Clarification

McCusker wrote:

[Edit: Sorry I'll try to keep my generalizations in check. In fact, I should try to keep my yap shut for a while, for practice.

The latter definitely wasn't what I was aiming for. Your comments on LtU are always interesting and thought-provoking. I was perhaps being overly preemptive in attempting to head off a discussion about national identity.

Thanks for all the good work, which is seldom praised but much appreciated.]

Thanks very much. LtU is one project where I don't think I've ever questioned whether the effort is worthwhile.

Subject enlargement

First of all, I'd like to thank spam and spammers for their contribution to the Information Theory and data mining field.
Having said that, my perspective is that programming languages are of added value when they address core concerns of some kind.

At some point in the past, the primary concerns were about functional capacities, weird typing systems, etc. But the era of interconnected computers and the massive amounts of data now available turn the tables toward problems other than, say, typing systems. Not that typing systems are unimportant, but it seems overwhelmingly more important to have system integration, data integration, or some higher-level construct for which exact formalizations are interesting, but not that important.

Give me some JavaScript: if I can have a distributed variational approximation or distributed annealed importance sampling for the right problem, I'll be able to retrieve more value than from a particular new specification for some amazing language. Shops like banks, biotech firms and marketing companies now regularly own computing grids. Information retrieval and data mining are becoming the hotspot for many applications.

The challenges have moved up. Lambda calculus is now integrated in various places, and so are functional approaches. From language theory (and a layman's perspective), only distribution seems to be left.

For Lambda the Ultimate, keeping its lead means providing quality insight in other areas where the heat is, at the intersection of academic progress and major industrial interests.

As someone who doesn't use a

As someone who doesn't use a "real name," I'm getting tired of your harping on it. What matters is that people use *consistent* names, and that they post in good faith. An enforced lurking period of one week, or an email confirmation on registration, would probably solve this problem. If you require people to use their real names, you should also implement privacy options that allow them to only reveal those names to LtU members of their choice.

Real names

Don't take the harping personally. The culture of non-anonymity dates back to the early days of LtU. It's understood that not everyone will conform to that, for various reasons, but nevertheless new members are encouraged to use their real names. Exhortations about that aren't really directed at established members.

I'll second the real name

I'll second the real name issue. I think it's important to use a name that is identifying/unique at least. For instance, I have used 'naasking' since I first started posting online, and a google search on that string yields my real name on the first page of results, so I'm not trying to hide my identity; more of my online history is under my pseudonym than my real name, so isn't it more useful? :-)

Perhaps naasking is your real name?

It could be argued that in a significant sense 'naasking' is your real name online, and it's been said before that such cases are fine.

More editors can help keep the flow

I am one of those who has been posting a lot less to LtU recently. [Not that I have not been posting on the web, but I guess MaplePrimes has managed to swallow a huge amount of my time, where their ranking system now has me as the top contributor.] I will return! I have seen a number of PL items that have not yet made it to LtU; when I next spot a slow week, I'll make a post (assuming it's not during my vacation). Even though I have not been posting much of late, I still read LtU daily; it has been a really valuable resource.

human-scale spam swatting

Please consider trying something like what Craigslist does:

Add to each post a link (next to "reply") called "flag this post". Only logged in members may flag a post. Evolve a metric by which certain patterns of flagging trigger an automated solicitation of editorial attention.

Given the nature of things here, and although it smacks of social engineering, perhaps two links: "flag as spam" and "flag as inappropriate", where "spam" indicates a mindless attack on the site and "inappropriate" indicates an issue like topic drift, or professional impoliteness, or what have you.

-t
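As a rough sketch of how the flagging idea could be wired up: the class name `FlagTracker` and both thresholds below are made-up values for illustration, and the "metric" is deliberately the simplest one imaginable (a count of distinct flagging members per post). A real metric would presumably evolve from observing actual flagging patterns.

```python
from collections import defaultdict

# Hypothetical thresholds; real values would have to be tuned by the editors.
SPAM_THRESHOLD = 3
INAPPROPRIATE_THRESHOLD = 5

class FlagTracker:
    """Count distinct logged-in members flagging each post."""

    def __init__(self):
        self._flags = defaultdict(
            lambda: {"spam": set(), "inappropriate": set()})

    def flag(self, post_id, member, kind):
        """Record a flag; return True if editors should now be notified."""
        if kind not in ("spam", "inappropriate"):
            raise ValueError(f"unknown flag kind: {kind!r}")
        # A set ensures that repeated flags from one member count once.
        self._flags[post_id][kind].add(member)
        return self.needs_attention(post_id)

    def needs_attention(self, post_id):
        flags = self._flags.get(post_id)
        if flags is None:
            return False
        return (len(flags["spam"]) >= SPAM_THRESHOLD
                or len(flags["inappropriate"]) >= INAPPROPRIATE_THRESHOLD)
```

Keeping the two kinds of flag separate lets a spam flood trigger attention quickly while a mere disagreement needs more voices before anyone is bothered.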

Future technology

We may very well try some version of this after our next Drupal upgrade, in the form of the Flag Content module.

How about a Captcha?

Is the spam you are receiving from humans or from bots?

I just checked the registration page, and it seems that name and email are all that are required. Consider using a captcha.

We had problems with spam on our forums, but eliminated 99% of it with some simple captchas. Something as simple as "What is this forum about? (Answer: programming): _________________" will work wonders.

Chris
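A question-and-answer check of this kind takes only a few lines. The sketch below is an assumption about how one might do it, not what any particular forum package does; the names `CAPTCHA_QUESTION`, `CAPTCHA_ANSWER` and `check_captcha` are invented for the example.

```python
# Hypothetical question and expected answer for the registration form.
CAPTCHA_QUESTION = "What is this forum about? (Answer: programming)"
CAPTCHA_ANSWER = "programming"

def check_captcha(submitted: str) -> bool:
    """Compare the submitted answer, ignoring case and surrounding spaces."""
    return submitted.strip().lower() == CAPTCHA_ANSWER
```

Since the answer is printed in the question, this stops only bots that fill in forms blindly, which matches the experience described above.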

Good idea

We had a similar experience on schemecookbook.org. We'll probably be doing something like that here. LtU does get many more human spammers, however, presumably because it has much better googlejuice.

LtU is more challenging to think up a relevant but usable question for, though. It's not about "programming"! Perhaps something like this: "Please reduce the following lambda calculus expression: ((λx. x) ltu)"
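For anyone who wants to check their answer mechanically, here is a small normal-order beta reducer, offered as a sketch only. The term classes `Var`, `Lam` and `App` are invented for this example, and the substitution is deliberately naive: it does no renaming, so it assumes the argument's free variables never collide with bound names (true for the expression above).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    func: object
    arg: object

def substitute(term, name, value):
    """Replace free occurrences of `name` in `term` by `value`.

    Naive: performs no renaming, so variable capture is assumed not to occur.
    """
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:
            return term  # `name` is shadowed inside this abstraction
        return Lam(term.param, substitute(term.body, name, value))
    return App(substitute(term.func, name, value),
               substitute(term.arg, name, value))

def reduce_once(term):
    """Perform one leftmost-outermost beta step; None if in normal form."""
    if isinstance(term, App):
        if isinstance(term.func, Lam):
            return substitute(term.func.body, term.func.param, term.arg)
        reduced = reduce_once(term.func)
        if reduced is not None:
            return App(reduced, term.arg)
        reduced = reduce_once(term.arg)
        if reduced is not None:
            return App(term.func, reduced)
        return None
    if isinstance(term, Lam):
        reduced = reduce_once(term.body)
        return Lam(term.param, reduced) if reduced is not None else None
    return None

def normalize(term, max_steps=100):
    """Reduce until no redex remains, giving up after `max_steps` steps."""
    for _ in range(max_steps):
        reduced = reduce_once(term)
        if reduced is None:
            return term
        term = reduced
    raise RuntimeError("no normal form found within the step limit")
```

With these definitions, `normalize(App(Lam("x", Var("x")), Var("ltu")))` returns `Var("ltu")`, which is the expected answer to the question.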

"Please reduce the following

"Please reduce the following lambda calculus expression: ((λx. x) ltu)"

Lovely. But soon everyone will visit this site for nothing but solving Captchas ;)

I really like that idea

That seems to accomplish three things at the same time:

1) It would eliminate any auto-generated spam and most other spam. At worst you'd have people pushing their products who actually are in the community.

2) It would establish a low but reasonable barrier for entry.

3) It would work towards establishing what topics are considered absolutely vital for entering the conversation.

Since the other major topic is types, if I could suggest another one:
(toRational (3::Int)) + (toRational (4::Float)) =
(answer: 7 % 1)

Social software and group dynamics

Given the discussion about using social software to help manage this site's communication, I'd like to point out Clay Shirky's "A Group is Its Own Worst Enemy". His talk was aimed at the more general sort of social software systems, but there are some good lessons there for any group as it grows.

Programming language topics have gone "mainstream"

I never thought I'd see articles about Haskell (including blog entries about monads) and Erlang and OCaml and so on at reddit, but they've become the norm. Even articles about Forth :) But, wow, do I get ulcers reading most of them, and seeing every headline with Haskell in it getting modded up super high.

What I like about LtU is the overall feeling of neutrality. Sure, the static vs. dynamic typing debates break out too often, much to my dismay, but overall I like seeing interesting articles that don't have fanboy spin on them.

And yeah, as a contributing editor I should contribute more!

Note to self: Stop reading reddit.

2010, the year of even more spam

It seems that the amount of spam posts on LtU has increased drastically as of late. I find that more and more annoying, because every other time I look at the "Recent posts" page (my main entry point to LtU) it is topped by a row of automated spamming posts. And even after these posts are removed by the admins, who are usually quick (big thanks for that!), the page seems to remain in the order that was produced by the spamming. Unfortunately, that renders it more and more useless.

In that light, Ehud, do you think it may be time to reconsider the policies, and perhaps switch to moderation for new members after all?

You are right. We have been

You are right. We have been talking about it, and some changes will hopefully be implemented soon that will help with the situation. If the technical measures are not enough, I will certainly raise the option of changing our policies.

Let me use this opportunity to note that several members have offered to help with spam control and their help greatly reduces the amount of spam you see (imagine how much worse things could have been!) They should be thanked by all. I have also been more proactive recently, and many spam accounts are deleted or put on probation. As you say, all this has not been enough.

Removing Spam

Well, when I remove spam, by marking an account as a spammer and going through the known spam page, stories go back to their original order in the recent posts page.

But at least one other admin seems to be using a different code path, which seems to have slightly different effects along the way.

One thing that happens is

One thing that happens is that if a user replies to a spam message, that non-spam reply will affect thread order even after the spam has been deleted. Deleting the non-spam message manually doesn't fix this directly, but the next time the spam deletion script runs, it fixes the thread order.

Improving some of this is high priority right now, and there'll be changes in the next few days.