User loginNavigation |
archiveshashing facts (the secret of nnimh)My intention here is to start a topic on "entertaining hashing facts", in case a programming language decides to build some particular hash algorithm into a module with persistent impact (in a manner obligating backwards compatibility). In the spirit of proverb measure twice and cut once, such decisions out to be informed by reliable facts instead of guesses. I wonder if others know things useful in planning PL effects. It's possible there's not much interesting to say, in which case the discussion goes nowhere. That's okay. But here's an example of what I mean. A few months ago I was thinking about putting a checksum in a persistent file format, where it was going to be important that a PL portably knew how to do the right checksum. The 32-bit checksum used by gzip has a polynomial specified by RFC, which appears in the penultimate four bytes of a gzip file format, in little endian order (followed by uncompressed length in the final four bytes, also in little endian order). While you can just use I wrote a search over four and five letter lowercase sequences, followed by a newline, to find crc checksums with a memorable looking hex pattern, by printing those with patterns like anagrams in hex nibbles. I was really lucky, because I found an input whose hash has the same eight digits in a row, so big and little endian order look the same. % echo nnimh > nnimh.txt % hexdump nnimh.txt 0000000 6e 6e 69 6d 68 0a 0000006 % gzip nnimh.txt % hexdump nnimh.txt.gz 0000000 1f 8b 08 08 80 e2 cb 55 00 03 6e 6e 69 6d 68 2e 0000010 74 78 74 00 cb cb cb cc cd e0 02 00 ee ee ee ee 0000020 06 00 00 00 0000024
As you can see, the value of Maybe discussing hashes and checksums is a dumb idea, in which case I apologize for bringing it up. I guess I'm hoping folks will avoid an argument about the fastest or most effective hash, which are things easy to measure empirically if you write a careful test. (For example, in a hashmap resolving collision by probing, a hash is best when unused slots are distributed uniformly, which you can measure as standard deviation of distance between non-adjacent empty buckets. I don't think intuitive opinion has a place in arguments based on empirical tests.) |
Browse archivesActive forum topics |
Recent comments
22 weeks 2 days ago
22 weeks 2 days ago
22 weeks 2 days ago
44 weeks 3 days ago
48 weeks 5 days ago
50 weeks 2 days ago
50 weeks 2 days ago
1 year 6 days ago
1 year 5 weeks ago
1 year 5 weeks ago