Google Magic

I have recently characterized myself as a recovering typoholic and a convert to Visual Basic, and in my various talks I use the paper on type-indexed rows that I wrote together with Mark Shields as the prime example of how deep an addict to static typing can fall.

As many of us undoubtedly do every once in a while, I was egosurfing for "typoholic", vaguely hoping it would be a Googlewhack. However, much to my astonishment, the first hit is actually our paper on type-indexed rows (alternatively, type "typoholic" on the Google homepage and hit "I'm Feeling Lucky"). That page does not contain the word "typoholic", and until now there were no links pointing to it!

If you ask me, this is pure voodoo. Perhaps I should start wrapping myself in aluminum foil to protect myself against the Google mind control waves.

The Google AI

Perhaps a tiny demonstration of the notion that intelligence (artificial or otherwise) is little more than a collection of tricks, which are currently being identified and named (and patented!) with names like "PageRank" and "Bayesian".

As for typoholism, it seems you're going to need a disambiguation page on Wikipedia to distinguish your usage from that of being addicted to fonts.

Pagerank

As for Google's pagerank patent... there is prior art to the basic concept. Plus, I don't think most people would consider Pagerank to really be an "artificial intelligence" thing, just a really clever way of ranking web pages, the only downside being that it's trivially gamable.

Could you please provide a

Could you please provide a better link? Your example of prior art came a few years later and references the original PageRank paper.
And how is PageRank trivially gameable? Making a hundred links from a page is no better than making one.
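To make this disagreement concrete, here is a minimal Python sketch of one common textbook formulation of PageRank: power iteration with a damping factor of 0.85, where pages with no out-links spread their rank evenly over all pages. It is an illustration under those assumptions, not Google's production ranking; the function name `pagerank` and the toy pages `a`, `b`, `c` and `paper` are made up for the example. It shows both sides of the argument. A hundred links from one page collapse into a single edge, but more distinct pages linking to the target do raise its rank.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by power iteration (illustrative sketch, not Google's).

    links maps each page to a list of pages it links to. Duplicate links
    from one page are collapsed into a single edge, so linking to a target
    100 times from the same page passes on exactly as much rank as once.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    out = {p: set(links.get(p, ())) for p in pages}  # deduplicated out-links
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page first receives its share of the "random jump" probability
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            if out[p]:
                # a page's rank is split evenly among the pages it links to
                share = damping * rank[p] / len(out[p])
                for q in out[p]:
                    new[q] += share
            else:
                # dangling page (no out-links): spread its rank over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


once = pagerank({"a": ["paper"], "b": [], "c": [], "paper": []})
hundred = pagerank({"a": ["paper"] * 100, "b": [], "c": [], "paper": []})
farm = pagerank({"a": ["paper"], "b": ["paper"], "c": ["paper"], "paper": []})
print(once["paper"], hundred["paper"], farm["paper"])
```

In this toy, `once` and `hundred` give the paper the same rank, while `farm`, where three distinct pages link to it, gives it a noticeably higher one.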

Pagerank and AI

Plus, I don't think most people would consider Pagerank to really be an "artificial intelligence" thing, just a really clever way of ranking web pages, the only downside being that it's trivially gamable.

The point is that this "really clever way of ranking web pages" succeeds in exhibiting behaviors that can appear intelligent, as in the example which Erik gave. Explaining the trick, as Paul did, doesn't make it any less successful. Are you so sure that your own intelligence is anything more than a collection of such "clever ways" of performing particular comprehensions on its input data?

But really, in writing about a "collection of tricks" I was referring to the common criticism that's been aimed at chatbots like Eliza and Alice, and Turing test competitions such as the one for the Loebner Prize, which tend to encourage a focus on such tricks (see e.g. Chatterbots, Tinymuds, And The Turing Test). Usually, "collection of tricks" is used as a criticism to indicate that something is not "real" AI. I was suggesting, not entirely seriously, that perhaps this thinking is backwards — perhaps we just haven't identified the right collection of tricks yet, but if you keep an eye on the patent office, sooner or later they'll all show up — let's call this the million patents approach to AI, by analogy to the million Shakespeare-typing monkeys.

In any case, using relationships between objects to analyze relevance and popularity is not simply a trick. The aspect which is a trick is that the content of the pages containing links isn't really "understood", which is part of what leads to gameability (trivial or otherwise). But in combination with other techniques, it's a perfectly valid means of analysis. And gameability is not necessarily an indication of lack of intelligence — after all, there's a wealth of evidence to show that humans are "trivially gameable" in a multitude of ways. In many cases, our gameability is in fact a direct result of identifiable tricks, processing shortcuts which we use in order to avoid having to think through every decision from scratch.

A nice simple example is herd behavior — following the actions of other people. You wrote that you "don't think most people would consider Pagerank to really be an artificial intelligence thing", revealing a desire to delegate the question of whether Pagerank is an AI feature to a majority vote, rather than developing your own arguments on the subject. Aside from the question of the logical validity of this approach, it is also gameable: if I wanted to convince you that Pagerank was an AI feature, all I would have to do is convince most people that it is. Luckily, most people don't know squat about AI, so that shouldn't be too difficult!

The "grandmother cell" theory is correct

Usually, "collection of tricks" is used as a criticism to indicate that something is not "real" AI.

People used to talk about a "grandmother cell", a neuron that would fire when you saw your grandmother, and then laugh, "haha - obviously the brain doesn't work like that". Recent research shows that the brain *does* work like that: common concepts have small numbers of neurons devoted to them.

Renamed for a new generation

Of course, the grandmother cell theory is now called The Halle Berry neuron theory.

Just one cell?

Recent research shows that the brain *does* work like that: common concepts have small numbers of neurons devoted to them.

My understanding is that it is slightly more subtle than this (but I'm certainly no expert). As I understand it, in the part of the visual system that is primarily concerned with object recognition, cells get more and more specialised as you move along the pathway away from the eyes, until you reach a point (V4? inferotemporal cortex?) where there are small groups or even single cells that react strongly only to very specific stimuli (particular faces etc). But that's not the same thing as saying that these individual cells are solely responsible for recognising your grandmother. Rather (again, from my naive understanding), some cells are more specific to grandmother-recognition than others; many cells are involved in the processing of all/most stimuli.

Any Sufficiently Advanced Technology

What's going on is actually quite simple: Google is indexing LtU as well as your home page. LtU contains pages that contain your name, the name of your paper, and the word "typoholic." That is, the LtU page(s) are only one edge in the graph removed from the paper, are weighted heavily towards relevance, and include a word that, statistically speaking, has the common form of an adjective in the English language. Google is almost certainly assuming that the word "typoholic" is an adjective modifying your paper and thus concludes that your paper (the "noun") is relevant.

Sound like a bit of a stretch? It shouldn't: Google's spell checking in the 20-odd languages that it supports doesn't use any dictionaries; it's all done with Google's statistical model of how the word in question is spelled across the web. And of course Google also does stemming, e.g. plurals and "-ing" forms are treated as equivalent to their root forms.
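Google's actual spelling and stemming machinery is not public, so the following is only a toy in the spirit of the description above. It is a corrector that uses nothing but word counts from a corpus (standing in for the web) to pick the most frequent known word one edit away, plus a deliberately crude suffix stripper so that plurals and "-ing" forms share a root. All function names and the sample corpus are hypothetical.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """Count word frequencies; the corpus stands in for 'the web'."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Keep a known word; otherwise pick the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=lambda w: counts[w]) if candidates else word


def stem(word):
    """Crude suffix stripping so plurals and -ing forms map to a shared root."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


counts = train("a recovering typoholic writes papers about type-indexed rows and links")
print(correct("typoholik", counts))    # typoholic
print(stem("links"), stem("linking"))  # link link
```

No dictionary appears anywhere: "typoholic" counts as a real word only because it occurs in the corpus.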

BTW, IIRC, Google tells you when it didn't find part of the search phrase in the result but the missing phrase was in a page that links to the result.

not quite magic

While your explanation is largely correct, I think the specific reason lies in Erik's previous post (use the cached version): he links to his paper with the anchor text 'typoholic'.
Associating anchor text with the page being linked to is a well-known technique in search engine circles. I think you're giving Google too much credit regarding nouns and adjectives, though only Google really knows for sure.

[On edit: I changed the 'previous post' link to a Google search because the LtU archive link I posted was doing odd things...]
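A minimal Python sketch of the anchor-text technique: when a link is indexed, the words of its anchor text are credited to the page being linked to, so a page can be retrieved for a word it never contains. The URLs, page texts and the `build_index` helper are hypothetical, and real search engines weight and combine such signals in ways this toy ignores.

```python
from collections import defaultdict


def build_index(pages, links):
    """Map each term to the set of pages a search for it should retrieve.

    pages: dict of url -> page text
    links: list of (source_url, target_url, anchor_text) triples
    Terms in a link's anchor text are credited to the link's target page.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index


# Hypothetical pages: the paper itself never mentions "typoholic".
pages = {
    "blog/previous-post": "I am a recovering typoholic",
    "papers/type-indexed-rows": "Type-Indexed Rows",
}
links = [("blog/previous-post", "papers/type-indexed-rows", "typoholic")]

index = build_index(pages, links)
print(sorted(index["typoholic"]))
# ['blog/previous-post', 'papers/type-indexed-rows']
```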

MSN and Yahoo also rank your paper high

Searches on MSN and Yahoo yield very similar results in the first few entries. They are all doing similar things. Google is not necessarily the only one "mind-controlling" you.