What's in a name?

Blog post by Olin Shivers.

Excerpts with my editorial:

Newell and Simon's Turing award cited them for their work on “symbol systems.” Newell once told me that this was just names, and then he explained his understanding of names: “They provide distal access.” That is, a name is a local piece of data that stands for some other piece of data, which is presumably large and remote. You can now use that small, convenient, local datum instead of the large, remote thing for which it stands.

The treatment bothers me because it doesn't distinguish very well between names for computers and names for people, which contrasts with the idealism I picked up from Edwards ("names are too valuable to be wasted on compilers"):

As far as Newell was concerned, this is the purpose served by names, or symbols, in all computational systems. Including the one in your head.

And afterwards, he dives into the computational:

The BitTorrent example above is particularly interesting, since it comes with an unusual, distributed dereferencing mechanism. A BitTorrent name also has the additional exotic property of being a “true name,” in that it names one and only one piece of data.

He then visits the humanistic viewpoint:

Good name choices make it easy and natural to do the right thing—like expressive, well-chosen types, they lead you effortlessly to the terms you wanted to write. This is because names used by humans come with baggage. They fit into a larger framework. When you can get that framework right, and stick to it, then the names come easy—not just when they are minted, but when someone is trying to recall the name for some existing thing.

Cue a bunch of suspiciously OO names that take the form verb-noun or noun-verb. And then finally back to the computational viewpoint, where FP reigns supreme:

One final remark about names. We use names all our lives, every day, all day. So they seem obvious and not so mysterious. But names are subtle. There is a sense in which the λ calculus is nothing more, really, than a theory of: names. Go look at the fundamental, definitional rules of the λ calculus.

I'm of the opinion that the humanistic and computational natures of names are completely different, and I find it weird that they are presented in such close quarters as to imply a strong connection between them.

Also, the way FP and OOP deal with names illuminates a lot about the paradigms. In OOP, objects have identities (intrinsic, globally unique names that allow for easy aliasing) with interfaces that are named according to natural metaphors; FP is rather more oriented to symbolic reasoning, where values (in pure FP at least) are anonymous and defined strictly by their structure, and functions are named according to the transformations they perform on that data (map, reduce, select, etc...).
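
For concreteness, here is a minimal Python sketch of that contrast. The Account class and the data are made up for illustration; this is a sketch, not a claim about any particular language:

```python
from functools import reduce

# OO flavor: an object has an identity independent of its contents.
class Account:
    def __init__(self, balance):
        self.balance = balance

a = Account(100)
b = Account(100)
alias = a
assert a is not b   # same structure, but distinct identities
assert alias is a   # an alias is a name giving access to the same object

# FP flavor: values are anonymous and compared by structure alone;
# the names that matter belong to transformations on the data.
xs = (1, 2, 3)
assert xs == (1, 2, 3)                        # structural equality
doubled = tuple(map(lambda x: x * 2, xs))     # named transformation
total = reduce(lambda acc, x: acc + x, doubled, 0)
assert doubled == (2, 4, 6) and total == 12
```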

human vs. machine

All names under consideration when we talk about computing are for both machines and humans.

There is not one kind of name for machines, another for humans.

Every name is both, and which aspect matters depends upon context.

The treatment bothers me because it doesn't distinguish very well between names for computers and names for people, which contrasts with the idealism I picked up from Edwards ("names are too valuable to be wasted on compilers")

not necessarily

You can separate them completely by having human names resolved to computer names in the IDE...as originally proposed by Jonathan Edwards in his Subtext paper and explored in my maze of twisty classes paper.

perspectives

You can separate them completely by having human names resolved to computer names in the IDE

What made the IDE?

And what interprets the "human" names typed into the IDE?

Turtles

It's turtles all the way down.

That is like saying everything in the universe is made of atoms, so all knowledge can be derived from studying atomic interactions...including music and literature.

re: Turtles

That is like saying everything in the universe is made of atoms, so all knowledge can be derived from studying atomic interactions...including music and literature.

Newell's concept of symbol systems is reductive like "the doctrine of Atomism". Of that doctrine his Turing lecture says:

But because the underlying species of atoms are so simple and limited in their variety, quantitative theories were soon formulated which assimilated all the general structure in the original qualitative hypothesis. With cells, tectonic plates, and germs, the variety of structure is so great that the underlying qualitative principle remains distinct, and its contribution to the total theory clearly discernable.

Later, when he and Simon discuss the hypothesis relating symbol systems and general intelligence, it is clear they believe that the variety of structure inherent in the symbol system hypothesis is great enough that the principle will remain distinct and make a discernible contribution to the study of general intelligence vis-à-vis computing.

Later, when he and Simon

Later, when he and Simon discuss the hypothesis relating symbol systems and general intelligence, it is clear they believe that the variety of structure inherent in the symbol system hypothesis is great enough that the principle will remain distinct and make a discernible contribution to the study of general intelligence vis-à-vis computing.

But symbolic reasoning as a basis for intelligence has since been heavily discredited.

pun confusion

But symbolic reasoning as a basis for intelligence has since been heavily discredited.

For Newell and Simon, "symbolic reasoning" (as I think you mean it here) and "physical symbol processing" are not the same. The former can certainly be done with the latter.

It may be that I am mistaken as to how you mean "symbolic reasoning as a basis for intelligence". I think you would say that, for example, an artificial neural network used for some AI task is not an example of "symbolic reasoning". On the other hand, an alpha-beta search of a game tree is "symbolic reasoning". Is that right?

In Newell and Simon's terminology, running an artificial neural network program would nevertheless be the work of a physical symbol system.

Running the neural network

Running the neural network might be symbolic reasoning done by the computer, but it is not under our control and mostly encapsulated from our view. The only way the trainers can reason about what they trained is through images of continuous gradients.

I'm not so interested in arguing about this, specifically because I don't really care about how the computer handles names and am only interested in how humans deal with abstractions using names. Computers are not the problem here, it is the humans interacting with them.

That's more or less Shivers' bias

I don't really care about how the computer handles names and am only interested in how humans deal with abstractions using names.

Which is more or less how you could describe where Shivers goes in the last two sections (about naming choices and about lambda).

I don't know that Newell and Simon's "physical symbol processing system" is, as far as they developed it, a very productive concept. Its utility for generating new ideas is pretty slight. Yes, OK, here is an internally consistent set of concepts adding up to an intuition of what a universal computing machine is.... but then what? And, yeah, in retrospect, their reasoning about "general intelligence" is a bit dated.

But I have some grasp of what their concept is and I think it's interesting to read it as what it is rather than projecting more contemporary concepts on it.

Leaving Newell's concepts aside and turning more towards Shivers' last two sections I have an opinion that is a bit contra yours.

In ways I can't quite summarize yet, my gut feeling is that academics today have fetishized specifically human-use names beyond commensurate utility. I'm left cold by elaborate scoping mechanisms, hygienic macro systems, and all that. My opinion isn't important here. I'm not trying to make an argument for it. Just mentioning it.

I think there is a family of generalized concepts of computing that don't place much emphasis on names but instead take a process-oriented view of spatial structures.

In this alternative set of concepts an idealized process occupies some definite position in physical space. Data is dispersed in that space, each discretized location containing some finite amount of data. There is some spatial metric that defines positions relative to a process. A process is a kind of finite state machine that can respond, at each step, to the datums found in spatially close locations. Processes can alter the contents of nearby locations and processes can change their position in the space of accessible locations.

More formalized instances of this alternative set of concepts include the classical Turing machine, concatenative languages, and combinator calculus. Trivial use of names may occur in these, like word names in FORTH or combinator names, but mostly spatial coordinates are used instead of names: the top two values on the stack; the third parameter to a combinator.
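
For illustration, a toy sketch of this nameless, stack-positional style in Python (not a real FORTH; the word set and evaluator are made up):

```python
# A toy concatenative evaluator: no variable names, only stack positions.
# Words like 'dup' and '+' address "the top of the stack" spatially.
def run(program, stack=None):
    stack = list(stack or [])
    words = {
        "dup": lambda s: s.append(s[-1]),
        "+":   lambda s: s.append(s.pop() + s.pop()),
        "*":   lambda s: s.append(s.pop() * s.pop()),
    }
    for token in program.split():
        if token in words:
            words[token](stack)
        else:
            stack.append(int(token))  # literals push themselves
    return stack

# Square a number, then add it to itself: no value is ever named.
assert run("3 dup * dup +") == [18]
```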

People for the most part happened not to choose spatially-oriented programming systems to build big, complex software projects. Popular programming languages devote enormous amounts of attention to the human assignment of names to things. It's interesting to me to contemplate what large-system architectures would look like if, instead, more spatially-oriented programming systems were used.

But again, just mentioning it. Not trying here to argue that the spatial view is really productive either.

Interesting

Reminds me of early papers about Cyberspace. The drawback is that I am not very good at this - yes I can handle a matrix, where location is linear, but it all falls apart for me with a discrete random space.

I can write a matrix multiplication from scratch quite easily, but I am forever getting bugs from accessing the wrong member of a tuple. For example, returning a boolean success flag and an index from an insert, I get my firsts and seconds mixed up, especially when refactoring code. I find it much easier when I have defined a struct with named members.

We actually agree somewhat:

We actually agree somewhat:

I'm left cold by elaborate scoping mechanisms, hygienic macro systems, and all that.

Compare to Edwards ("names are too rich in meaning to waste on compilers"): human names should never hit the computer. Really, a global name scope sifted through via search, with contextual or explicit disambiguation, is how we deal with names in real life.

No Global

I don't really think global names exist for people; everything is contextual. You might know three people called 'Bob' but when I say that you will think of only one, determined by context, which can be external (at work/home) or internal (state of mind). If you could plot state of mind in an N-dimensional space, each name would be located at the centroid of all the states it has been used in. The name recalled would be the closest (Euclidean) to the current state.

It becomes increasingly hard to recall names at greater distances, to the point that it is almost impossible to enumerate all the 'Bobs' you have ever encountered. Often further Bobs emerge when you shift the current state, which results in new matches coming to mind hours, even days later.
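
A toy sketch of that recall model (the states, dimensions, and numbers are invented for illustration):

```python
import math

# Each remembered name sits at the centroid of the mental states in
# which it was used; recall returns the name nearest the current state.
memory = {
    "Bob (work)": [(0.9, 0.1), (0.8, 0.2)],  # states where this Bob came up
    "Bob (home)": [(0.1, 0.9), (0.2, 0.8)],
}

def centroid(states):
    return tuple(sum(c) / len(states) for c in zip(*states))

def recall(current_state):
    return min(memory,
               key=lambda name: math.dist(current_state,
                                          centroid(memory[name])))

assert recall((0.85, 0.15)) == "Bob (work)"
assert recall((0.15, 0.85)) == "Bob (home)"
```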

Global names do exist for

Global names do exist for people; they are just quite ambiguous and not unique. Google has made a nice business out of figuring out which "Bob" you want.

Google is not a person

I would like to see the experiment where you recall names without a context (emotional state, location, weather, what you can see, what you can hear, what you can smell, the last thought you had before).

Google augments human

Google augments human intelligence and becomes a person to many. In any case, context is quite important in disambiguating names, but the names exist in a global namespace.

Unless you are claiming context is a namespace, but then I'm not sure how you are defining namespaces.

Namespace is a synthetic concept

I would give you that if we were discussing machines, but humans pre-exist synthetic concepts like namespaces. The fact is human recall does not work like a namespace, as recall time is proportional to the distance between the current state and the state in which things were learned, and recall from a distant state may fail altogether.

A namespace is normally a one-dimensional partially ordered space. I guess if you were to expand it to an N-dimensional Euclidean space where the probability and order of recall are relative to distance, you could call it a namespace.

Human + Google = Cyborg, which may have a completely different behaviour to a human alone :-)

Search does not remove the

Search does not remove the need for scope. It still doesn't make sense to access foo(x)'s parameter in a function bar(y). In fact lexical scoping is what makes the notion that names are for humans possible, since it allows you to rename any variable without changing the meaning of the program (unlike dynamic scoping).

My current language has

My current language has lexical scopes for local variables; I don't see any way to get rid of them yet. I was toying with an idea to create names commonly used for local variables and allow for lexically scoped binds...but I couldn't see much improvement there.

What I'm advocating for, and have argued for in the past, is a global namespace for everything that is otherwise public. So no more packages or namespaces or hierarchy for such things, just one Google-friendly namespace. What scopes remain should be quite boring.

How about this: libraries

How about this: libraries are still organized into packages with qualified names (Foo.Bar.Baz) but the search mechanism allows you to just write Baz, and if there exists only one then it will select Foo.Bar.Baz. That will be displayed as just Baz as long as there exists no other Baz. When another package is added with Foo.Quux.Baz, then the name displayed becomes Bar.Baz.
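
A sketch of that display rule (the package names are hypothetical): show the shortest suffix of each qualified name that is unambiguous among all known names.

```python
def display_names(qualified):
    split = {q: q.split(".") for q in qualified}
    out = {}
    for q, parts in split.items():
        for n in range(1, len(parts) + 1):
            suffix = ".".join(parts[-n:])
            others = [p for r, p in split.items() if r != q]
            if not any(".".join(p[-n:]) == suffix for p in others):
                out[q] = suffix
                break
        else:
            out[q] = q  # fall back to the fully qualified name
    return out

assert display_names(["Foo.Bar.Baz"]) == {"Foo.Bar.Baz": "Baz"}
assert display_names(["Foo.Bar.Baz", "Foo.Quux.Baz"]) == {
    "Foo.Bar.Baz": "Bar.Baz",
    "Foo.Quux.Baz": "Quux.Baz",
}
```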

We already do that with URLs

We already do that with URLs when we want to go old fashioned and type them directly.

When you type baz, you commit to a baz. There is no ambiguity in the baz you want. If another baz is added later, it doesn't affect your selected binding. Display-wise, you see baz unless you are interested in knowing which baz. The name reference is just a bookmark.

I think my language would be

Search does not remove the need for scope. It still doesn't make sense to access foo(x)'s parameter in a function bar(y). In fact lexical scoping is what makes the notion that names are for humans possible, since it allows you to rename any variable without changing the meaning of the program (unlike dynamic scoping).

I think my language would be a counterexample to the above argument. All symbols are global and scope exists to help search. And there are reasons you might want to reference foo(x)'s parameter from another context.

Lexical scope doesn't really differ much from dynamic scoping in terms of allowing renaming. It does obviously make it more local, but you still have to worry about shadowing in most languages, so it's not completely trivial. If you view names as a graph encoding, where each name gets resolved to a reference to a name-introducing construct, then the rules are the same for both: renaming is a no-op if it doesn't change the graph (duh!). The main problem with dynamic scoping historically was that languages didn't statically resolve names to a unique reference. Trying to resolve the ambiguity of names at run-time is the bad idea.
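
A sketch of the graph-encoding view, using toy lambda terms (illustrative only): resolve each variable use to the identity of its binder, and a renaming is a no-op exactly when that resolution graph is unchanged.

```python
# Terms: ("lam", name, body) | ("var", name) | ("app", f, x)
def resolve(term, env=(), uses=None, counter=None):
    uses = [] if uses is None else uses
    counter = counter or [0]
    tag = term[0]
    if tag == "var":
        for name, binder_id in reversed(env):   # nearest enclosing binder
            if name == term[1]:
                uses.append(binder_id)          # record binder, not name
                break
    elif tag == "lam":
        bid = counter[0]; counter[0] += 1
        resolve(term[2], env + ((term[1], bid),), uses, counter)
    elif tag == "app":
        resolve(term[1], env, uses, counter)
        resolve(term[2], env, uses, counter)
    return uses

t1 = ("lam", "x", ("lam", "y", ("var", "x")))  # \x. \y. x
t2 = ("lam", "z", ("lam", "y", ("var", "z")))  # rename x to z: same graph
t3 = ("lam", "y", ("lam", "y", ("var", "y")))  # rename x to y: capture!
assert resolve(t1) == resolve(t2)   # meaning preserved
assert resolve(t1) != resolve(t3)   # graph changed, so meaning changed
```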

But symbolic reasoning as a

But symbolic reasoning as a basis for intelligence has since been heavily discredited.

If you're referring to arguments like the Chinese Room, then no it hasn't. The majority of philosophers consider the Chinese Room an interesting thought experiment, but one that doesn't actually prove anything. If you mean something else, then can you please provide a link or keyword I can look up.

Machine learning.

Machine learning.

true names: control not computation

I think that section does not dive into the computational so much as it elaborates the concept of "distal access" by saying "in particular, distal access which grants some form of remote control". The examples chosen for this elaboration emphasize the generality of the concept of names by including both programming examples, and social examples, and an example that is both at the same time.

And afterwards, dives into the computational:

The BitTorrent example above is particularly interesting, since it comes with an unusual, distributed dereferencing mechanism. A BitTorrent name also has the additional exotic property of being a “true name,”

As a counterpoint... The web

As a counterpoint...

The web has something called globally unique URLs. I rarely deal with them, and prefer to interact with the web through bookmarks or Bing/Google. The URL is for the computer and occasionally humans must deal with them; the web wasn't very usable until they were mostly hidden away from us.

as a counter-counterpoint

">http://www.waterken.com/dev/YURL/Name/

I mean I wouldn't want *my* secret access URLs to be indexed by Google.

but of course then the counter-counter-counterpoint is that it *really* forces you to want to have Pet Names.

artless art

I'd agree if you mean that the "art of names" section seems like a useless, tacked-on non sequitur. It skipped from fundamentals about abstract names to just syntactic fetishism of some kind.

He then visits the humanistic viewpoint:

And yeah, the concluding section seems pretty random, too.

Names are hashes

A name is just a hash.

and a hash...

and a hash provides distal access, particularly in the form of control. And a hash gives a way to propagate, replicate, and share that access. People fetishize the details of human-readable hashes because they are obsessed with finding human-readable hashes that reflect an orderly and useful arrangement of concepts. Lambda calculus is arguably an insightful theory of the abstract logic of hashes.

;-)

Names aren't hashes.

A hash is a lossy and mangled representation of the content. There is no strong relationship between names and content. A function's code can change without changing the function's name. But every time you change the function's code, you do change the function's hash.

Hashes can be used as names. Though, it's only viable for strongly unique hashes, e.g. secure hashes, and for immutable content (cf. convergent encryption).
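
Concretely, a minimal sketch with Python's hashlib:

```python
import hashlib

v1 = b"def f(x): return x + 1"
v2 = b"def f(x): return x + 2"  # the function's code changed...

name = "f"                      # ...its name did not...
h1 = hashlib.sha256(v1).hexdigest()
h2 = hashlib.sha256(v2).hexdigest()
assert h1 != h2                 # ...but its hash did.
```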

when is a hash not a name?

Isn't a hash always, at the very least, the name of some equivalence class? Commonly also the name of some hashing data-structure bucket?

a hash is never a name

A hash is never a name of the content hashed. Calling it 'a name of some equivalence class' is perhaps true in some sense, but is also a pointless equivocation in context of discussing named things or kms's position that names are hashes in general.

Nonetheless, a hash can be used as a name for content, simply by maintaining an external association between the hash and the content. This has been used in some p2p and content distribution networks and at least one filesystem as an alternative to human names for content. I'm also using this for Awelon Bytecode's separate compilation and linking model.

AFAICT, all naming systems use external association, albeit with varying scope.

hash empiricism

A hash is never a name of the content hashed.

That depends on the context. As an example, consider a context where some perfect hash is being used.

In fact, if you are studying a system where hashes are used, you could do worse than to ask yourself "What, in this context, do hash values specifically name?"

A hash can be used as a

A hash can be used as a name. I've said as much. But that does not mean hashes are names. Hash values don't need to name anything. They can be used for fail-fast equality checks, message authentication, password validation, PRNGs, and in many other ways that don't involve a reverse association. Hashes also aren't very effective as names for many things.

Saying "it depends" in this context seems analogous to arguing that "pianos are weapons" depends on the usage context. I can't say that's completely wrong (pianos can be used as weapons), but it's certainly misleading and fruitless in a discussion regarding the nature of pianos.

back to the paper

But that does not mean hashes are names.

Hashes are always names in the sense that Newell was using the word and which the blog post is about.

You can place yourself inside of some other discourse in which it makes sense to say "hashes are not names" but then you are not talking about Newell's remark or the blog post.

It might be worth asking "In what way is it helpful to regard names as Newell did? Why did Newell believe this view helped him develop career-defining insight? What is it that Shivers recognizes as true in Newell's description?"

It seems silly to try to argue against Newell's way of looking at things without first being able to enter inside it.

Newell's definition of names

Hashes are always names in the sense that Newell was using the word

This is simply untrue. Shivers clarifies Newell's description thusly: "a name is a local piece of data that stands for some other piece of data". I've already provided several examples where hashes do not stand for other pieces of data. Providing 'distal access' is not inherent or essential to the nature of hashes.

all names are hashes is tenable

First:

I've already provided several examples where hashes do not stand for other pieces of data.

As an example, you mention:

[hashes] can be used for fail-fast equality checks

In such a case the hash values being compared are the names of equivalence classes of other data. Each hash value is a name for the datums in the corresponding class.

You seem to think you have a "gotcha" here because the mapping from hashed values to hash values is not a bijection. My point to you is that you are only expressing confusion about what specifically is named by the hash values.

By analogy DNS hostnames do not represent bijections to specific hosts. Some of them are names for equivalence classes of hosts. Or one could say that all DNS hostnames are names for a particular collection of routing rules scattered around various nodes of the net. That the hostname is not a true name for a specific host doesn't mean that the hostname is not a name.

Remember that Newell here is talking about the realization that computers are in general about processing symbolic models of other stuff in contrast to simply "number crunching".

I don't see any serious problem with kms asserting that "a name is just a hash". Names do act like hash values and vice versa. My point to kms was that this view doesn't add much. It amounts to a connotative rather than denotative alpha-substitution, so to speak. Still, I don't see any big problem with it other than that.

All hashes are names.

"All hashes are names" may be true, but by your own argument "all names are hashes" is not. Hashes are names of the equivalence class of object X, and not of object X itself. The uniqueness property of names is critical, and even if that is just local uniqueness, hashes do not satisfy it. If I have two objects there is no guarantee their hashes will be different, and hence hashes would be useless for naming these two objects.

So clearly hashes are names of equivalence classes, so some names are hashes, and maybe all hashes are names, but "all names are hashes" is trivially false - unless you wish to argue that the purpose of names is not to distinguish between two objects?

I note the original post is ambiguous: "names are hashes" is existentially true, but not universally.

Names and distinguishing objects

unless you wish to argue that the purpose of names is not to distinguish between two objects?

For Newell, a name is a symbol and symbols designate an object.

For Newell, the word designate does not necessarily require that the name uniquely distinguish an object.

It is OK with Newell to say that a name designates some aspect or property of an object even if that aspect or property is not unique to that object. Through this designation the name nevertheless provides a kind of remote access to the object.

The essential thing here is that a process is manipulating physical symbolic expressions about separate objects.

See the Turing lecture quote I put in a separate comment.

locally unique

It still needs to be locally unique in the context. In this case the name designates a property on the object, so it must be unique among the object's properties.

As any two hashes can collide it does not even have the property of local uniqueness.

local uniqueness

The concept of "local uniqueness" may be interesting but I can't find it in any relevant part of Newell's output.

Do you have a pointer to where he somehow says that "local uniqueness" is essential to naming, aka symbolic designation?

Newell's conceptual framework here, by the way, is deliberately meant to be very broad, very abstract, and very reductionist. See the previous section of the 1975 Turing Award lecture (previous to the one I quoted): "Laws of Qualitative Structure".

The concept of a Physical Symbol System is meant to be in parallel with other broad, reductionist, but productive concepts like "The Cell Doctrine in Biology", "Plate Tectonics in Geology", "The Germ Theory of Disease", and "The Doctrine of Atomism".

The physical chunks of information we talk about in a computing system are symbols and, in this conceptual framework, symbols are designators. No exceptions.

Functions and objects.

Imagine we name the arguments to a function. If the names are not unique we cannot use the arguments:

f x y =

We name the first argument x and the second y. If x and y are hashes, there is a finite probability of a clash, and as a result we cannot access all arguments correctly. The same applies to object properties.
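
A rough birthday-bound estimate of that clash probability (a sketch; the parameters are arbitrary):

```python
import math

# With k names drawn from a space of 2**bits hash values:
#   P(clash) ~= 1 - exp(-k*(k-1) / (2 * 2**bits))
def p_clash(k, bits):
    return 1 - math.exp(-k * (k - 1) / (2.0 * 2**bits))

assert p_clash(2, 8) > 0      # two 8-bit names: small but nonzero risk
assert p_clash(20, 8) > 0.5   # twenty 8-bit names: a clash is likely
```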

re Functions and objects.

Imagine we name the arguments to a function. If the names are not unique we cannot use the argument

I can imagine a programming language in which a symbolic name for a parameter is non-deterministic as to which parameter it refers to. This could be quite useful in some situations. That's neither here nor there, though. You are right that there are contexts in which names must be unambiguous.

Newell, in any event, is talking very broadly -- as abstractly as he can -- about names in general (in the sense of "symbols which designate" in physical computing systems).

A hash value is close in

A hash value is close in nature to an observable property like color 'red'. This property does describe some subset of objects, but it doesn't (generally) provide any sort of distal access to them. And also like hash values, if you want to test whether two objects are equal, you can certainly start by checking whether they're both red and fail fast if not.

The property 'red' is not a name in the sense that Newell and Shivers use the word. You could argue that red names a color, but at that point you're just equivocating, shifting goalposts away from distal access to objects or data. And this is exactly what you're doing with your whole 'names an equivalence class' argument.

I will not engage you any further on this.

Regarding bijections, I've not argued for a bijection. Names can be ambiguous. Objects can have multiple aliases. We can speak of these conditions with respect to names.

Hashes can be used as names in the sense that Newell and Shivers describe. But it is rare. Even conventional use of a hashtable doesn't really qualify. The `hash(key)` value is only temporarily computed, not stored or communicated or used as a name. The key might be considered a name.

I don't see any serious problem with kms asserting that "a name is just a hash".

Indeed. People believe and assert all sorts of nonsense without any serious problem.

Names do act like hash values and vice versa.

Untrue. But it won't cause you any serious trouble, either.

A hash value is close in

A hash value is close in nature to an observable property like color 'red'.

Quine called an inherent property like "red" a "natural kind" in his attempt to address Goodman's new riddle of induction.

Names are not natural kinds, and it's a mistake to conflate the two. There is nothing about my physical structure that necessitates my first name to be "Sandro", nor my online pseudonym to be "naasking".

Hashes are generated from the observable properties of an object, and so are natural kinds. Different hash functions will yield different hash values, but this is no different than looking at a physical specimen at different magnifications.

See Newell/Simon's Turing lecture

Hashes are generated from the observable properties of an object, and so are natural kinds.

For Newell, this means that hashes designate an aspect of the object and provide remote access to that aspect.

Well, a hash does represent

Well, a hash does represent a supertype of the content being hashed (it matches the content hashed and any other content that generates the same hash). Might not be a useful type though.

example of hash as name

(For folks who forgot the relevant discrete math: a relation on a set that is reflexive, symmetric, and transitive partitions the set into disjoint equivalence classes. Keys with the same hash are in the same equivalence class, and the hash itself can be used to name the class unambiguously.)
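
A minimal demonstration, with a deliberately tiny hash so that collisions actually occur:

```python
from collections import defaultdict

def tiny_hash(key):
    return sum(key.encode()) % 3  # a 3-bucket hash, to force collisions

classes = defaultdict(list)
for key in ["apple", "pear", "plum", "fig", "cherry"]:
    classes[tiny_hash(key)].append(key)

# The classes partition the keys: each key lands in exactly one class,
# and the hash value unambiguously names its class.
assert sum(len(c) for c in classes.values()) == 5
```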

A hash is never a name of the content hashed. Calling it 'a name of some equivalence class' is perhaps true in some sense,

(The following ignores static populations for which you can find a perfect hash. Dynamic content populations are the problem since you can't prevent collisions, though you can make them quite unlikely.)

You can name content with hashes, but complications ensue when collisions must be resolved when different content has the same hash. A few years ago I designed and implemented a distributed cache simulating a hash-based file system, using hashes for names. A full design is a long document, though, and even so a design review involves a lot of argument. I'm not allowed to say how the fussy parts work. The application space of storage using hashes as names is patent encumbered, so the obvious thing isn't safe to pursue without adding extra stuff to algorithms for sake of uniqueness. No one who has done it is going to chime in here.

Step one: cut streams into pieces using either block structure or edge-detection fingerprinting. (Don't use blocks: too many patents.) Cutting alone isn't a unique enough algorithm; another fancy thing must also occur to be unique. Cutting must be very fast, and preferably self-heal after content perturbations (it takes a while to write a self-heal test that measures this in a meaningful way). The way I cut with self-healing has a short and simple couple-sentence unpublished description, but it took a long time to find, so my saying that doesn't help.

Step two: hash content (treating all of it as key). The right hash to pick is an involved analysis, depending on what you want to get done, how fast it must go (to avoid bottleneck), your plan to resolve effect of collision, and the algorithm to manage probability of false positives and/or false negatives where they can occur. To use a smaller, faster hash requires you do something complex along probability management lines. Using a 64-bit hash is a big headache, but can be made to work. A slower 128-bit hash would be the sane way to get done faster.

Step three: design an index suiting what you want done, probably involving both hashmaps and digests (for distributed algorithms). It can be used for compression. (I tell people all compression is dictionary-based compression, and all that varies in algorithms is where the dictionary is located and how it's managed.) You can guess what will work, if there's a way to resolve mistakes, as long as the cost of being wrong still leaves you with a win. But probabilistic algorithms have complex stories and take a lot of coding to deal with edge cases to be resilient against errors. Interfaces for distributed async components end up being a lot more complex than everything else, and you're basically required to roll green threads for scaling connection loads. I had a lot of time to think about how C's default synchronous control flow is irritating.
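
To make step one concrete, here is a generic content-defined chunking sketch in Python. This is emphatically not the author's unpublished cutting method, just the textbook rolling-hash idea: cut wherever a rolling hash of recent bytes hits a boundary pattern, so cut points depend on content rather than absolute position and tend to realign after local edits.

```python
def chunk(data, window=16, mask=0x3F):  # mask 0x3F targets ~64-byte chunks
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) ^ byte) & 0xFFFFFFFF  # toy rolling hash
        if i - start >= window and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = bytes(range(256)) * 4
edited = data[:100] + b"X" + data[100:]  # a one-byte insertion
a, b = chunk(data), chunk(edited)
assert b"".join(a) == data and b"".join(b) == edited  # chunking is lossless
# Because cuts depend only on nearby content, boundaries downstream of the
# edit tend to resynchronize, so most chunks are shared between versions.
```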

Seems flawed

I take the hash of an object, which at that point has no collisions, and store it in an array for later use, as I expect to be able to refer to the object by name. A bit later someone inserts a new object with a hash collision, and maybe some supplemental data is generated (an index number in the vector of the equivalence class). Later still I want to fetch the named object whose hash/name is stored in some work-in-progress array, but which one was it? Generating a second hash to disambiguate is no help (as there is still a finite probability of a double collision). In general hash collisions are resolved by comparing the original objects byte-by-byte, but for my name this is no good, as I would have to keep a copy of the complete object along with the hash to disambiguate.

It works for a filesystem because a file has a unique (path) name anyway. So the hash is just acting as a fast index for random name lookup, and not the name itself (which is necessary for disambiguation).

not really a universal technique

I would not look for places you can use a hash as a name. I'd wait for a situation where only using a hash as a name could work. For example, suppose you want to guess what a remote peer might recognize, and have nothing more than a digest as a clue (perhaps by design goal). Hashes are the only feasible way I know to do that.

Later still I want to fetch the named object whose hash/name is stored in some work-in-progress array, but which one was it?

The answer is, "Don't do that." More specifically, keep doing things however you do it now in a simple and clear way. If you have a work-in-progress collection, you can probably name things so they get serviced without using a hash, unless the only way it can work is by using a hash. If you look things up by an ID of some sort, you might use a hashmap if the collection is more than a handful in size and you want to touch as few cache lines as possible.

Most of the time I use a hash, the goal is to not touch cache lines for any equivalence class other than one containing an item I want to find. (I apply and/or write hashmaps a lot; where I work, I do all of them in product areas I touch. Several different varieties of hashmap are good for different purposes: open vs closed buckets, members by reference vs by value, duplicate keys allowed vs unique keys, etc.) The idea is to get cost per operation that is linear in equivalence class membership size: what folks usually call amortized constant time for hashes with enough bits. When I can't arrange map members have unique keys, all I can do is ensure definitely non-equal items fall in different equivalence classes via different hash values.

(For example, some IPsec objects in the racoon2 library go into global linked lists, and some lookup routines check a large number of attributes, depending on other attributes checked first. Such lists can be turned into hashmaps anyway, despite having no unique key, as long as at least one attribute must always match for a search involved, because hashing that creates an equivalence class where you only see members needing the full check. I sometimes call this striping, to distinguish it from a key/val pair map.)

When using a hash as a name, usually you want only one member in an equivalence class (i.e. no collisions), so you invalidate a hash value that has collisions. That's problematic when you're not at liberty to lose content, but it works well for a cache, especially when you arrange that collisions must be rare. (The pragmatic rule of thumb to estimate expected collision rate in a good collision-resistant hash is one in 2^(N/2) for a hash of N bits, accounting for the birthday paradox associated with the pigeonhole principle. When evidence reveals a collision, telling you more than one member is present in the same hash equivalence class, you can invalidate (blacklist) a hash for further use. Such loss is very sparse when 2^(N/2) times as many hashes remain good to use, if cache size approximates population size tracked.)
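
A minimal sketch of that invalidate-on-collision scheme (illustrative only; the hash is truncated just to make collisions conceivable in a demo):

```python
import hashlib

class HashNamedCache:
    def __init__(self):
        self.store = {}         # hash -> content
        self.blacklist = set()  # hash values observed to collide

    def name_of(self, content):
        # Rule of thumb: expect collisions near 2**(N/2) entries for N bits.
        return hashlib.sha256(content).digest()[:8]  # truncated to 64 bits

    def put(self, content):
        h = self.name_of(content)
        if h in self.blacklist:
            return None                # this name has been invalidated
        if h in self.store and self.store[h] != content:
            del self.store[h]          # collision: evict and blacklist
            self.blacklist.add(h)
            return None
        self.store[h] = content
        return h                       # h now names this content

    def get(self, h):
        return self.store.get(h)

cache = HashNamedCache()
h = cache.put(b"some immutable content")
assert cache.get(h) == b"some immutable content"
```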

In high-speed, high-volume network traffic deduplication, you can store (hash, content) pairs in a key/val store with a hashmap for local lookup and digests for remote cache summary. When content locality is preserved by writes, partial matches can be found by reading sibling storage entries. I don't currently see a way this applies to a programming language, unless a lot of complex caching support is native builtin; rarely used features should go in libraries, not a base language.

using hashes as names

I use 384-bit secure hashes as names for my module system, and I can further resolve ambiguous collisions to a small degree (so long as they aren't for type-compatible modules/components). But it's unlikely I'll encounter any collisions.

Aside: The difference between "hash values can be used for names" vs. "hash values are names" (referring to named objects or data) is, to me, analogous to the difference between "wood can be used for boats" and "wood is boats". Confusing material with its application is a category error. Wood isn't the boat, not even when used for boats.

natural language is slippery

I agree with your aside. It's easy to greatly alter meaning by small changes in wording, despite superficial similarity in basic idea. I left out subtle parts of your point by quoting.

better examples

I think I see what this is about. The address of an object would be better than a hash as an example, as the address can be about the same size (a single integer) but is guaranteed locally unique. IP address plus memory address might even be globally unique. Really, names for the machine just need to be unique identifiers (within the scope of the object), so a sequential source of integers will do.

Allen Newell's own words

Allen Newell's and Herbert Simon's 1975 Turing Award lecture may be of help here. I'll excerpt a section of it. I've added emphasis to the part that should clear up the confusion of several commenters as to Newell's meaning.

Let us return to the topic of symbols, and define a physical symbol system. The adjective "physical" denotes two important features: (1) Such systems clearly obey the laws of physics -- they are realizable by engineered systems made of engineered components; (2) although our use of the term "symbol" prefigures our intended interpretation, it is not restricted to human symbol systems.

A physical symbol system consists of a set of entities, called symbols, which are physical patterns that occur as components of another type of entity called an expression (or symbol structure). Thus, a symbol structure is composed of a number of instances (or tokens) of symbols related in some physical way (such as one token being next to another). At any instant of time the system will contain a collection of these symbol structures. Besides these structures, the system also contains a collection of processes that operate on expressions to produce other expressions: processes of creation, modification, reproduction and destruction. A physical symbol system is a machine that produces through time an evolving collection of symbol structures. Such a system exists in a world of objects wider than just these symbolic expressions themselves.

Two notions are central to this structure of expressions, symbols, and objects: designation and interpretation.

Designation. An expression designates an object if, given the expression, the system can either affect the object itself or *behave in ways dependent on the object*.

In either case, access to the object via the expression has been obtained, which is the essence of designation.

Interpretation. The system can interpret an expression if the expression designates a process and if, given the expression, the system can carry out the process.

Thus, for example, that a hash value does not always uniquely identify a hashed value does not change the fact that the hash value can be used in an expression such that processes manipulating the expression may behave in ways dependent on the hashed value. The symbol, the hash value, designates an aspect of the hashed value. It is a name for that aspect of the hashed value. Given the hash value, a process can access that aspect of the hashed value. In Shivers' vernacular, the hash value is a name for the indicated aspect of the hashed value.

Homeopathic naming

Newell speaks of expressions, and whether expressions designate an object. You seem to conflate the expression with the symbolic result of evaluating it, which may have been obtained by any number of means. With respect to designating the object foo, what makes you believe that Newell would consider `hash(foo)` in the same class as `3` (which might or might not be a result of hashing foo)? It seems to me that the expression `3` does not allow me to depend upon or affect the object foo, and thus does not designate foo under either of Newell's conditions.

In any case, it seems you'd stretch the concept of 'name' to ridiculous ends. If we have an expression `f(g(foo,bar),h(baz,qux)) :: bool`, which objects do you believe are designated by the boolean result of evaluating this? If your answer includes foo, perchance do you also believe in homeopathy?

words in Newell

(aside: I am kind of guessing you are younger than me. In the 1980s I had a lot of time on my hands that I spent reading CS literature that was then well reputed. There have been linguistic shifts since then. I think I find a lot of this stuff easier to parse on its own terms than is typical for younger readers.)

You seem to conflate the expression with the symbolic result of evaluating it,

For Newell, a symbol is a physical chunk of information in a computing system. An expression is a deliberate physical arrangement of symbols. Nothing more, nothing less.

In particular, Newell has no concept in this context of "evaluating an expression" or the "result" of doing so. (Of course he would have no problem in a narrower context with saying something like "evaluating a numeric expression and obtaining an integer result" but we're not in that narrower context here.)

He does have a closely related concept: interpretation.

A process is a set of physical actions that manipulate expressions in ways interestingly sensitive to their information content. Rules can describe the behavior of a process. An expression can designate a set of rules for a process. If a process follows those designated rules on the basis of their presence in some expression then the process is interpreting the expression.

With respect to designating the object foo, what makes you believe that Newell would consider `hash(foo)` in the same class as `3` (which might or might not be a result of hashing foo)?

An expression that spells out "hash (foo)" presumably designates a process. If the expression is interpreted, a new expression, like "3", might be produced.

Here, "3" would seem to designate an abstract property of whatever object "foo" also designates.

A process working on that "3" can "behave in ways dependent on" whatever object "foo" also designates. In that sense "3" gives the process some access to that object. The "3" designates that object.

In any case, it seems you'd stretch the concept of 'name' to ridiculous ends.

Newell is deliberately expressing a very broad, reductionist concept. If he is "stretching" symbolic designation, at least it is on purpose.

Evaluation is a

Evaluation is a paradigm-specific example of what Newell describes (in your quote) as "an evolving collection of symbol structures"; effectively, it refers to a future symbol structure in a functional paradigm (whereas Newell's 'physical symbol system' can describe many non-functional paradigms, such as term rewriting).

Newell does not clarify within your above quotation whether 'designation' would be transitive over this evolution/evaluation. That is, it is not clear that if `hash(foo)` evolves to `3`, that the designations of `foo` and `hash` are preserved. He refers only to expressions - symbol structures at a given instant in time.

Here, "3" would seem to designate an abstract property of whatever object "foo" also designates.

I clearly said `3` may or may not have been the result of evaluating the expression `hash(foo)`. It might have been the result of evaluating `hash(bar)` for example. So tell me again why you believe the expression 3 - i.e. a hash value - would seem to designate object foo.

And I notice that you weaseled out of answering my question regarding the result of evaluating f(g(foo,bar),h(baz,qux)):bool. Did that question make you uncomfortable?

Your notion of 'reductionist' seems in this case to be the opposite of reductionism.

'foo' and '3'

That which a symbol designates is not simply a property of the symbol itself but is dependent on the behavior of the system.

Designation. An expression designates an object if, given the expression, the system can either affect the object itself or behave in ways dependent on the object.

In either case, access to the object via the expression has been obtained, which is the essence of designation.

So, yes, in different systems 'foo' and '3' might designate the same object or different objects.

The question of "transitivity" of designations is overly specific.

Newell is being more abstract than that. For example, there might be more than one way to analyze a given system. These different modes of analysis might entail distinct equivalence relations among designated objects. In other words, the question "Do foo and 3 designate the same object?" may have more than one answer.

In the lecture, Newell and Simon develop the concept of a physical symbol system, capable of both designation and interpretation, as a particular description of a universal computing machine so that they can state a hypothesis about artificial intelligence:

The Physical Symbol System Hypothesis. A physical symbol system has the necessary and sufficient means for general intelligent action.

An elaboration of the meaning of the hypothesis is given in the lecture.

In Shivers we have:

As far as Newell was concerned, this is the purpose served by names, or symbols, in all computational systems. Including the one in your head. When he said “distal access,” he assumed that you, too, have structures on one side of your head representing what you know about, say, Neil Armstrong, and a smaller structure on the other side of your brain encoding the name “Neil Armstrong,” and cognitive mechanisms allowing you to fetch the former given the latter.

Computing systems are composed of physical arrangements of symbols into expressions which evolve under the action of processes.

The symbols, in this analytic approach, are always about some thing distinct from themselves.

Universally, computers are processing meaningful expressions about other things.

And I notice that you weaseled out of answering my question [....] Did that question make you uncomfortable?

In my opinion those are inappropriate ways to express yourself here. I saw you asked a technical question in the elided part but I was distracted from reading it by your crudeness.

eye of the beholder

(This is basically all positive feedback, in the form of elaboration.)

That which a symbol designates is not simply a property of the symbol itself but is dependent on the behavior of the system.

Yes, meaning is caused by interpretation. Bit-strings have no meaning without context. Operations applied to values are what give them meaning. Property is imbued by usage. Symbol is a role projected on bit-strings by behavior in an environment. (Location can project meaning on bit-strings when location matters to environment context.)

For example, hashes are a projection from values treated as keys. Keys (bit-strings treated in the role of keys) don't have a hash as a property independent of behavior choosing to generate a hash from keys using a particular algorithm for a target purpose. Unfortunately, intention gets into the picture, introduced by goals pursued by programmers; what happens to bits after usage (or what can happen to them, if wired up suitably) is what gives them meaning, in so far as they eventually affect more system behavior or outputs consumed by a human (or consumed by a cybernetic artifact constructed by humans). Actual or potential effect is the metric.

Folks discussing names and symbols almost invariably talk about what happens in human minds; for example, this figures into Newell's stuff. But that's a different meaning of symbol, and has no relation to a bit-string used in the role of symbol in a computing system, except to the extent a person wants to see it that way by analogy. It's only intention to relate a symbol humans care about to one in a computing system that causes any relation, because there is none objectively beyond effects caused by behavior introduced by programmers.

Again, you quote "fetch the

Again, you quote "fetch the former given the latter" yet contradict yourself by trying to include as 'names' those symbols in contexts where they cannot fetch anything and where information regarding which objects they might have designated is lost.

It seems to me you stretch Newell's words quite liberally. It is not clear to me that Newell (much less Shivers!) conflates the broad idea of 'designation' with names, vs. considering naming a subset of designation. It is certainly clear from the examples that Shivers doesn't consider relevant the 'every symbol is a name for something' concept that you're attributing to Newell.

In any case, I'm done discussing this subject.

more of Newell's words

In Newell's language, in a physical symbol system there is no data other than symbols arranged as expressions. These are always designators for more of the same.

Tracing the history of the concept of a physical symbol system, a concept which he says wasn't fully formed until the 1950s, he comes to the example of list processing:

List Processing. The next step, taken in 1956, was list processing. The contents of the data structures were now symbols, in the sense of our physical symbol system: patterns that designated, that had referents. Lists held addresses which permitted access to other lists -- thus the notion of list structures. That this was a new view was demonstrated to us many times in the early days of list processing when colleagues would ask where the data were -- that is, which list finally held the collections of bits that were the content of the system. They found it strange that there were no such bits, there were only symbols that designated yet other symbol structures.

The LISP atom '3', in this manner of speaking, can be said to designate the square root aspect of the atom '9' as well as the predecessor aspect of the atom '4'. We could also say that the atom '3' designates a whole wealth of arithmetic relations to other numeric atoms.

Shivers on how "symbols" and "names" relate, per Newell:

Newell and Simon's Turing award cited them for their work on “symbol systems.” Newell once told me that this was just names, [...]

It's funny because for Newell it seems this formalizable description of a universal computing machine is uncontroversial: trivially recognizable as equal in power to other universal machine formulations; merely slightly unfamiliar since, like turtles holding up the world, it's symbols and designations all the way down.

It's the Physical Symbol System Hypothesis, relating physical symbol systems to general intelligence, that is unsettled and (to him and Simon, at least) interesting.

In any case, I'm done discussing this subject.

Again? OK!