LtU Forum

hashing facts (the secret of nnimh)

My intention here is to start a topic on "entertaining hashing facts", for the case where a programming language decides to build some particular hash algorithm into a module with persistent impact (in a manner obligating backwards compatibility). In the spirit of the proverb "measure twice, cut once", such decisions ought to be informed by reliable facts instead of guesses. I wonder whether others know things useful in planning PL effects.

It's possible there's not much interesting to say, in which case the discussion goes nowhere. That's okay. But here's an example of what I mean. A few months ago I was thinking about putting a checksum in a persistent file format, where it was going to be important that a PL portably knew how to compute the right checksum. The 32-bit checksum used by gzip has a polynomial specified by RFC 1952; it appears in the penultimate four bytes of a gzip file, in little-endian order (followed by the uncompressed length in the final four bytes, also little-endian). While you can just use crc32() from the zlib library for this, what if you didn't, perhaps because a 'better' version is dictated by local code rules? How can you informally check that the right checksum is generated for a standard input?

I wrote a search over four- and five-letter lowercase sequences, each followed by a newline, to find CRC checksums with memorable-looking hex patterns, printing those with patterns like anagrams in the hex nibbles. I got really lucky: I found an input whose hash is the same hex digit eight times in a row, so the big- and little-endian renderings look the same.
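A sketch of that search in Python (an assumption on my part: the original was probably not written in Python). It scans four-letter lowercase words plus a newline and reports any whose CRC-32 is a single hex digit repeated eight times; extending `repeat=4` to 5 covers the five-letter case that finds "nnimh":

```python
import itertools
import string
import zlib

def all_same_nibbles(crc: int) -> bool:
    # True when all eight hex digits of the 32-bit value are identical,
    # so the big- and little-endian renderings look the same
    return len(set(f"{crc & 0xffffffff:08x}")) == 1

# scan four-letter lowercase words followed by a newline; the original
# search also covered five-letter words and other nibble patterns
for letters in itertools.product(string.ascii_lowercase, repeat=4):
    data = ("".join(letters) + "\n").encode()
    crc = zlib.crc32(data)
    if all_same_nibbles(crc):
        print(data, hex(crc))
```

The anagram-style patterns mentioned above would just be additional predicates alongside `all_same_nibbles`.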

% echo nnimh > nnimh.txt
% hexdump nnimh.txt
0000000 6e 6e 69 6d 68 0a                              
0000006
% gzip nnimh.txt 
% hexdump nnimh.txt.gz 
0000000 1f 8b 08 08 80 e2 cb 55 00 03 6e 6e 69 6d 68 2e
0000010 74 78 74 00 cb cb cb cc cd e0 02 00 ee ee ee ee
0000020 06 00 00 00                                    
0000024

As you can see, the value of crc32(0, "nnimh\n", 6) is 0xeeeeeeee, and this is easy to check on a command line in any environment with ready access to gzip and hexdump. As a secondary bonus, the pattern has a mnemonic for me: it reminds me of the animated film The Secret of NIMH.
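In a language whose standard library wraps zlib, the same check is a one-liner; Python ships zlib, and the expected value comes straight from the gzip trailer bytes in the hexdump above:

```python
import zlib

# CRC-32 of "nnimh\n", matching the ee ee ee ee bytes in the gzip trailer
assert zlib.crc32(b"nnimh\n") & 0xffffffff == 0xEEEEEEEE
```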

Maybe discussing hashes and checksums is a dumb idea, in which case I apologize for bringing it up. I'm hoping folks will avoid an argument about the fastest or most effective hash; those are things you can easily measure empirically if you write a careful test. (For example, in a hashmap resolving collisions by probing, a hash is best when unused slots are distributed uniformly, which you can measure as the standard deviation of the distances between non-adjacent empty buckets. I don't think intuitive opinion has a place in arguments that empirical tests can settle.)
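The uniformity measurement mentioned in the parenthetical can be sketched like this (everything here is hypothetical: the table size, load factor, linear probing, and the use of the hash function passed in as `h`). Build the table, find the empty buckets, and take the standard deviation of the circular gaps between consecutive empty buckets; a lower value means the empties are spread more evenly:

```python
import statistics

def probe_gap_stddev(keys, size, h):
    # build a linear-probing open-addressing table, then measure how
    # evenly the empty buckets are spread out
    table = [None] * size
    for k in keys:
        i = h(k) % size
        while table[i] is not None:
            i = (i + 1) % size        # linear probing on collision
        table[i] = k
    empty = [i for i, slot in enumerate(table) if slot is None]
    # circular gaps between consecutive empty buckets
    gaps = [(b - a) % size for a, b in zip(empty, empty[1:] + empty[:1])]
    return statistics.pstdev(gaps)

# hypothetical example: ~60% load factor with Python's builtin hash
print(probe_gap_stddev(range(600), 1024, hash))
```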

Parsing@SLE 2015: Call for talk proposals

*********************************************************************
CALL FOR TALK PROPOSALS

Third Workshop on Parsing@SLE 2015

October 25, 2015
Pittsburgh, Pennsylvania, United States
Co-located with SPLASH 2015: OOPSLA, Onward!, SLE and GPCE

http://2015.splashcon.org/track/ParsingAtSLE2015
*********************************************************************
Deadline for talk proposals: August 31, 2015

The goal of this workshop is to bring together today's experts in the fields of parser construction and parser application from across the diverse application areas. Participants will present ongoing work as well as explore the challenges that lie ahead. By bringing the whole community together (a rare occurrence, given the diversity of domain-specific conferences/workshops), we hope to have a wide-ranging collection of talks on parsing-related topics, and forge new collaborations.

*** Topics ***

While parsing and parser generation, both in theory and in practice, are mature topics, there are still many challenging problems with respect to the construction, maintenance, optimization, and application of parsers in real world scenarios.

Especially in the context of real programming languages there are ample theoretical as well as practical obstacles to be overcome. Contemporary parsing challenges are caused by programming-language evolution and diversity in the face of new application areas such as IDE construction, reverse engineering, software metrics, domain specific (embedded) languages, etc. What are modular formalisms for parser generation? How to obtain (fast and correct) parsers for both legacy and new languages that require more computational power than context-free grammars and regular expressions can provide? How to use increasing parallelism offered by multi-cores and GPUs in parsers? How to enable the verified construction or prototyping of parsers for languages such as COBOL, C++ and Scala without years of effort?

In addition to the traditional programming-language applications of parsing technology, several other areas of computing also depend heavily on parsers. Examples include computational linguistics, network traffic classification, network security, and bioinformatics. Those areas often have their own unusual requirements, such as: speed (e.g. in network algorithmics), memory efficiency (e.g. embedded devices for networks, but also computational linguistics), or rapid/dynamic parser construction (e.g. in network traffic classification and in bioinformatics) as grammars are adapted. We encourage talk proposals on parsing challenges and solutions in such non-traditional areas as well.

*** Call for Submissions ***

We solicit talk proposals in the form of short abstracts (max. 2 pages in ACM 2-column format). A good talk proposal describes an interesting position, demonstration, or early achievement. The submissions will be reviewed on relevance and clarity, and used to plan the mostly interactive sessions of the workshop day. Parsing@SLE is not a publication venue. Publication of accepted abstracts and slides on the website is voluntary.

* Deadline for talk proposals: Monday August 31, 2015
* Notification: Monday September 7, 2015
* Workshop: Sunday October 25, 2015
* Submission website: https://www.easychair.org/conferences/?conf=parsingsle2015

*** Previous Editions ***

Parsing@SLE 2013: http://www.sleconf.org/blog/11-20-2013-parsing-at-sle-2013
Parsing@SLE 2014: http://www.sleconf.org/2014/Parsing-at-SLE.html

*** Organizers ***

* Loek Cleophas, Umeå University, Sweden; and FASTAR, Stellenbosch University, South Africa
* Ali Afroozeh, Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands

Don't use "Yield" for co-routines; instead use "Postpone"

A pattern followed by “↑” and a type can be used to match the pattern against something whose type is an extension of that type. For example,

 
PostponedFringe.[aTree:Tree]:FutureList◅String▻  ≡ 
   aTree � 
       aLeaf↑Leaf ⦂ [aLeaf.getString[ ]] ⍌
       aFork↑Fork ⦂ [⩛Fringe.[aFork.getLeft[ ]],  ⩛Postpone Fringe.[aFork.getRight[ ]]]

For example, ["The", "boy"]▮ is equivalent to the following:
PostponedFringe.[Fork.[Leaf.["The"], Leaf.["boy"]]]▮

The procedure PostponedFringe can be used to define SameFringe?, which determines whether two trees have the same fringe [Hewitt 1972]:

   SameFringe?.[aTree:Tree, anotherTree:Tree]:Boolean ≡  //  test if two trees have the same fringe
               PostponedFringe.[aTree] = PostponedFringe.[anotherTree]▮

For example, [False, True]▮ is equivalent to the following:

Let aList ← PostponedFringe.[Fork.[Leaf.["The"], Leaf.["boy"]]]。    // aList = ["The", "boy"]
  [SameFringe?.[aList, ["The", "girl"]],            // False
   SameFringe?.[aList, ["The", "boy"]]]▮           // True

Note that Leaf can be defined as an extension of Tree:

Actor Leaf.[aString:String] extension Tree using
        getString[ ]:String → aString▮

and Fork can be defined as an extension of Tree:

Actor Fork.[aLeft:Tree, aRight:Tree] extension Tree using
        getLeft[ ]:Tree → aLeft,
        getRight[ ]:Tree → aRight▮

Edited for clarity.

RAII and Async Stackless Fibers

I like RAII for exception-safe programming, and as a general model for resource management. In another thread it was mentioned that this does not work well with an asynchronous stackless fibre model (where stackless might mean cactus stacks). Since I also like the fibre model for managing concurrency, because you don't need threads waiting on high-latency services, I wanted to continue that discussion, focusing on: is RAII really a problem with this kind of fibre? What is the core problem? How could a new language combine these features in a way that avoids the problem (if the problem is a significant one)?

I don't really see what the problem is. In simple terms, RAII attaches a resource (be it memory, a file handle, or any other) to a handle that is placed on a stack. The resource is allocated when the handle is created and freed when the handle is destroyed. The handle is uncopyable, so it can only be transferred using move semantics. Naively, this would seem to work fine with cactus stacks.
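A minimal sketch of why this can plausibly work, using Python generators as stand-ins for stackless fibres (an analogy, not the model from the other thread). The key requirement is that a scheduler cancelling a suspended fibre still unwinds it, so a destructor-like `finally` releases the resource:

```python
log = []

class Resource:
    # RAII-style handle: acquire on construction, release explicitly on unwind
    def __init__(self, name):
        self.name = name
        log.append(f"acquire {self.name}")
    def release(self):
        log.append(f"release {self.name}")

def fiber():
    r = Resource("file")
    try:
        yield "step 1"   # suspension point: fibre parks, e.g. awaiting I/O
        yield "step 2"
    finally:
        r.release()      # runs on normal exit AND when the scheduler cancels us

f = fiber()
next(f)     # run the fibre to its first suspension point
f.close()   # scheduler cancels the fibre; GeneratorExit unwinds, finally runs
```

After this runs, `log` is `["acquire file", "release file"]`: the resource was freed even though the fibre never ran to completion, which is the RAII guarantee the discussion is after.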

"Inconsistency Robustness" now available

Inconsistency robustness is information-system performance in the face of continually pervasive inconsistencies, a shift from the previously dominant paradigms of inconsistency denial and inconsistency elimination, which attempt to sweep inconsistencies under the rug. Inconsistency robustness is both an observed phenomenon and a desired feature:

• Inconsistency Robustness is an observed phenomenon because large information systems are required to operate in an environment of pervasive inconsistency.
• Inconsistency Robustness is a desired feature because we need to improve the performance of large information systems.

This volume has revised versions of refereed articles and panel summaries from the first two International Symposia on Inconsistency Robustness conducted under the auspices of the International Society for Inconsistency Robustness (iRobust http://irobust.org). The articles are broadly based on theory and practice, addressing fundamental issues in inconsistency robustness.

The field of Inconsistency Robustness aims to provide practical rigorous foundations for computer information systems dealing with pervasively inconsistent information.

Please write a review here.

Branch Prediction and the Performance of Interpreters - Don’t Trust Folklore

The following paper has been posted on Hacker News and seems very relevant to the LtU audience, including its more practically-minded part.

Branch Prediction and the Performance of Interpreters - Don’t Trust Folklore
Erven Rohou, Bharath Narasimha Swamy, André Seznec

Interpreters have been used in many contexts. They provide portability and ease of development at the expense of performance. The literature of the past decade covers analysis of why interpreters are slow, and many software techniques to improve them. A large proportion of these works focuses on the dispatch loop, and in particular on the implementation of the switch statement: typically an indirect branch instruction. Folklore attributes a significant penalty to this branch, due to its high misprediction rate. We revisit this assumption, considering state-of-the-art branch predictors and the three most recent Intel processor generations on current interpreters. Using both hardware counters on Haswell, the latest Intel processor generation, and simulation of the ITTAGE, we show that the accuracy of indirect branch prediction is no longer critical for interpreters. We further compare the characteristics of these interpreters and analyze why the indirect branch is less important than before.

The paper measures the accuracy of branch predictors on the indirect branch found in "naive" instruction-decoding loops of bytecode interpreters. Experiments are performed with CPython, JavaScript (SpiderMonkey) and the CLI (.NET) on the interpreter side, and recent Intel processors as well as the authors' own state-of-the-art predictor (simulated) on the predictor side. The conclusion is that, assuming reasonably sized predictor state, current prediction schemes fare very well on the dreaded indirect branch of while/switch interpreters. Reading between the lines, this makes well-known dispatch optimization techniques (e.g., jump threading) somewhat obsolete, at least on top-of-the-line Intel processors.
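For readers who haven't written one, the while/switch dispatch loop under discussion looks like this sketch (hypothetical three-opcode machine; the paper's interpreters are C/C++, where the if/elif chain below would be a `switch` compiling to the indirect branch being measured):

```python
def run(bytecode):
    # naive "while/switch" bytecode dispatch loop: decode one opcode per
    # iteration and branch to its handler
    stack, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == "PUSH":                 # in C: the switch on the opcode,
            stack.append(bytecode[pc])   # i.e. the single indirect branch
            pc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

# computes (2 + 3) * 4
print(run(["PUSH", 2, "PUSH", 3, "ADD", "PUSH", 4, "MUL"]))
```

Jump threading duplicates the dispatch at the end of each handler so each opcode gets its own branch for the predictor to learn; the paper's claim is that modern predictors do well even without that.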

I, for one, would like to see more work in this style, with hardware people bringing their expertise and experimental rigor to low-level PL implementation issues.

Also, I suspect that the PL implementation community would be interested in additional results, such as other interpreters and less aggressive microarchitectures (ARM processors, anyone?).

Albatross has grown two mighty wings: Induction and Recursion

The version 0.2 of the Albatross programming language / proof assistant has been released.

It has some new features, the most important of which are induction and recursion, which come together with inductive data types. With these two features Albatross approaches the expressiveness of Coq: it is possible to declare lists and trees, define recursive functions on them, and prove properties of these functions.

The documentation contains examples of lists, binary trees and a verified insertion sort algorithm. All the examples are fully verifiable with version 0.2 of the Albatross compiler (download here).

Some of the key features of Albatross:

  • No special proof language: all proofs are expressed via boolean expressions. Like a mathematician, you make assumptions and draw step-by-step conclusions.
  • Defined and understandable proof automation: there are no tactics. Anyone who understands the language and its proof engine can understand proofs without executing them. Proofs communicate to the human reader and to the compiler.
  • Albatross' logic is classical predicate logic, which is easier to understand for those not familiar with dependent type theory.
  • Albatross has been designed from the beginning as a programming language; i.e., once the backend is available, Albatross programs can be compiled to executable code without any intermediate code extraction.

SPLASH 2015 - Call for Student Volunteers

/************************************************************************************/
ACM Conference on Systems, Programming, Languages, and Applications:
Software for Humanity (SPLASH'15)

Pittsburgh, Pennsylvania, USA
25th-30th October, 2015

http://www.splashcon.org

Sponsored by ACM SIGPLAN

/************************************************************************************/
Call for Student Volunteers
/************************************************************************************/

The ACM SIGPLAN conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH) embraces all aspects of software construction and delivery to make it the premier conference at the intersection of programming, languages, and software engineering.

** Student Volunteers **
The SPLASH Student Volunteer program provides an opportunity for students from around the world to associate with some of the leading personalities in industry and research in the following areas: programming languages, object-oriented technology and software development. Student volunteers contribute to the smooth running of the conference by performing tasks such as: assisting with registration, providing information about the conference to attendees, assisting session organizers and monitoring sessions.

Applications Due: 7 August, 2015
http://2015.splashcon.org/track/splash2015-sv

Student Volunteer Co-Chairs: Jonathan Bell (Columbia University) and Daco Harkes (TU Delft)

Information:
SPLASH Early Registration Deadline: 25 September, 2015
Contact: info@splashcon.org
Website: http://2015.splashcon.org

Location:
Sheraton Station Square Hotel
Pittsburgh, Pennsylvania, United States

Organization:
SPLASH General Chair: Jonathan Aldrich (Carnegie Mellon University)
OOPSLA Papers Chair: Patrick Eugster (Purdue University)
Onward! Papers Chair: Gail Murphy (University of British Columbia)
Onward! Essays Chair: Guy Steele (Oracle Labs)
DLS Papers Chair: Manuel Serrano (INRIA)

Artifact Evaluation Co-Chairs: Robby Findler (Northwestern University) and Michael Hind (IBM Research)
Demos Co-Chairs: Igor Peshansky (Google) and Pietro Ferrara (IBM Research)
Inspirations Co-Chairs: Darya Kurilova (Carnegie Mellon University), Zach Tatlock (University of Washington), and Crista Lopes (UC Irvine)
Local Arrangements Chair: Claire Le Goues (Carnegie Mellon University)
Posters Chair: Nick Sumner (Simon Fraser University)
Publications Chair: Alex Potanin (Victoria University of Wellington)
Publicity and Web Co-Chairs: Craig Anslow (University of Calgary) and Tijs van der Storm (CWI)
SPLASH-E Chair: Eli Tilevich (Virginia Tech)
SPLASH-I Co-Chairs: Tijs van der Storm (CWI) and Jan Vitek (Northeastern University)
Sponsorship Chair: Tony Hosking (Purdue University)
Student Research Competition Co-Chairs: Sam Guyer (Tufts University) and Patrick Lam (University of Waterloo)
Student Volunteer Co-Chairs: Jonathan Bell (Columbia University) and Daco Harkes (TU Delft)
Wavefront Co-Chairs: Dennis Mancl (Alcatel-Lucent)
Web Technology Chair: Eelco Visser (TU Delft)
Workshops Co-Chairs: Du Li (Carnegie Mellon University) and Jan Rellermeyer (IBM Research)
SLE General Chair: Richard Paige, University of York
GPCE General Chair: Christian Kästner, Carnegie Mellon University
PLoP General Chair: Filipe Correia, University of Porto
/************************************************************************************/

NOOL 2015

I know there are some secret and not-so-secret object lovers here, so this workshop might be of interest. In particular, I think OO research needs a big kick in the butt: objects are still quite useful, but our entrenched OO languages and systems leave much to be desired, and it is up to the research community (as well as enlightened practitioners) to show the world that there is hope in objects yet! Info:

The 0th Workshop on New Object-Oriented Languages (NOOL) 2015

NOOL-15 is a new unsponsored workshop to bring together users and implementors of new(ish) object-oriented systems. Through presentations, panel discussions, demonstrations, and video and audio tapes, NOOL-15 will provide a forum for sharing experience and knowledge among experts and novices alike.

Aggregations (e.g., sets) in Logic Programs

Prolog and its derivatives have lacked adequate ways to compute with aggregations, e.g., sets.

For example, suppose there is a ground-complete predicate Link[aNode, anotherNode, aCost]
that is true exactly when there is a link from aNode to anotherNode with cost aCost.

When ⊩ Path[aNode, aNode, aCost]→ // when a goal is set for a cost between aNode and itself
   ⊢ aCost=0▮           // assert that the cost from a node to itself is 0

The following goal-driven Logic Program works forward from start to find the cost to finish:

  
When ⊩ Path[start, finish, aCost]→ 
   ⊢ aCost=Minimum {nextCost + remainingCost
                     | ⊨ Link[start, next≠start, nextCost], Path[next, finish, remainingCost]}▮
          // a cost from start to finish is the minimum of the set of the sum of the
              // cost for the next node after start and
                   // the cost from that node to finish

The following goal-driven Logic Program works backward from finish to find the cost from start:

When ⊩ Path[start, finish, aCost]→ 
   ⊢ aCost=Minimum {remainingCost + previousCost 
                      |⊨ Link[previous≠finish, finish, previousCost], Path[start, previous, remainingCost]}▮
          // the cost from start to finish is the minimum of the set of the sum of the
              // cost for the previous node before finish and 
                  // the cost from start to that Node 

Note that the above Logic Programs work together concurrently providing information to each other.
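A sequential Python analogue of the forward rule may make the recursion concrete (the edge set is hypothetical, and this sketch does not model the concurrent cooperation of the two programs; as written it terminates only on acyclic link graphs):

```python
# hypothetical edge set: Link[a, b, c] holds for exactly these triples
LINKS = {("s", "a", 1), ("s", "b", 4), ("a", "b", 1),
         ("b", "f", 1), ("a", "f", 5)}

def path_cost(start, finish):
    # the cost from a node to itself is 0; otherwise it is the minimum,
    # over links leaving start, of the link cost plus the remaining cost
    if start == finish:
        return 0
    candidates = [nextCost + path_cost(nxt, finish)
                  for (a, nxt, nextCost) in LINKS
                  if a == start and nxt != start]
    return min(candidates) if candidates else float("inf")

print(path_cost("s", "f"))   # cheapest route s -> a -> b -> f
```

The backward rule is the mirror image, recursing over links entering `finish` instead.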

For more information see Inconsistency Robustness for Logic Programs
