LtU Forum

Recursive Descent Parser Generators

I'm writing a parser generator, to avoid being TLDR (which is most of my messages) I wanted to ask if anyone knows of other LL(*) parser generators, what I should aim for performance wise (what's acceptable) and if you've had experience writing something similar what's one of the toughest nuts to crack on this style of parser?

For me (more than one thing) :

  1. Generating Parsers with unbound look-ahead uses a lot of memory to do statically so none (or very little) of the processing is done during run-time.
  2. Follow predictions are a PITA: you have to analyze all possible parse paths coming into the given rule, walk the context stack to disambiguate the offending lock(s), and keep stepping ahead and only accept if the parent contexts can parse and that state can also accept (this was a nightmare, but I think I have it...?) I'm writing the prediction discriminator logic now (my most recent task.) Dangling else isn't an issue because iirc it is aware that the parent context is on an edge state once the current rule exits, and I apply to the maximal munch principle.
  3. Lexical Ambiguities: I'm a glutton for punishment, I used includes to allow complex grammars to span multiple files. To avoid issues with parse order and precedence on the user's side (you can still specify which trumps the other, but this trumping order is 'always on' if used) I allow for lexical ambiguities to create new parse states and following those ambiguity identities you can figure out which path was actually intended.
  4. To solve the memory issue, I decided that I would use: a symbol stack, an active rule context, recursive descent, and reductions.
    • Reductions some of solve the issue of memory by identifying where multiple paths of a prediction all enter the same rule on the same state, parsing that rule, and evaluate what comes after that rule. This abates some of the issues with the look-ahead requirements, but adds more complexity when the follow-state of that rule is ambiguous with the prediction! I haven't even gotten to this issue :(
  5. AST Generation: making this look easy is the hard part. It should just work (I will focus on refining it once everything else is done.)
  6. It's so easy to create grammars that are simple but yield an exponential state explosion. I need to do more research to detect this ahead of time so I don't waste time on processing something useless, so the user has a better experience than 'Out of Memory Exception'.

If this ends up working as intended, I'll be astonished, because when I started this eight years ago, I didn't have an effing clue. I still probably don't, but I at least know what I want to do now. The grammars define no semantic predicates, nor actions on lexical matches, so the generator has to do all the work. So far (without error context) it's 50-80 times faster than my own hand-written parser for its own language, until I found that I wasn't handling lexical ambiguities on follow states and now have to fully implement them to do a proper comparison.

Edit:
The goal with this LL(*) approach is there is very little back-tracking employed. The only time it should need to is when it's handling left-recursive rules and it has no choice to do so because it has to fall to a weaker prediction mechanism to yield a preliminary identity onto the stack that it can use to move forward, then rinse and repeat.

Edit 2: Redacting the following, turns out the issue is more complex :) The sample I gave it had zero issues with :(
'No Alt Indirect Left Recursion' is when you have something like:
A ::= B | C;
B ::= A '[' ']';
C ::= "Other";

Without code to properly identify these 'No Alt Indirect Left Recursion' states, it will recurse infinitely because there's no fall-back to choose first, it would always predict 'B' if it has: Other[].

New paper: Theory of Programs

A simple mathematical framework, based on elementary set theory, to cover programming concepts.

See http://bit.ly/1CkZBPj.

-- Bertrand Meyer

Lamport: Interprocess Communication

Something that may be pertinent to the recent (lengthy) discussion of the deep implementation details of interprocess communication is Leslie Lamport's interesting discussion of how to implement various kinds of interprocess communication. The bulk of the paper deals with various kinds of shared registers, which are taken as the primitives through which messages are passed. Starting from non-atomic operations, Lamport shows how to build atomic operations that allow message-passing to occur.

"At a low level, message passing is often considered to be a form of transient communication between asynchronous processes. However, a closer examination of asynchronous message passing reveals that it involves a persistent communication. Messages are placed in a buffer that is periodically tested by the receiver. Viewed at a low level, message passing is typically accomplished by putting a message in a buffer and setting an interrupt bit that is tested on every machine instruction.
...
Hardware implementations of asynchronous communication often make assumptions about the relative speeds of the communicating processes. Such assumptions can lead to simplifications. For example, the problem of constructing an atomic register, discussed below, is shown to be easily solved by assuming that two successive reads of a register cannot be concurrent with a single write. If one knows how long a write can take, a delay can be added between successive reads to ensure that this assumption holds. No such assumptions are made here about process speeds. The results therefore apply even to communication between processes of vastly differing speeds."

Some New Directions for ACP Research

This paper at ArXiv.org lists some new directions for research related to the Algebra of Communicating Processes (ACP). Most of these directions have been inspired by work on SubScript, an ACP based extension to the programming language Scala. SubScript applies several new ideas that build on ACP, but currently these lack formal treatment. Some of these new ideas are rather fundamental. E.g. it appears that the theory of ACP may well apply to structures of any kind of items, rather than to just processes. The aim of this list is to raise awareness of the research community about these new ideas; this could help both the research area and the programming language SubScript.

Practical Principled FRP: Forget the past, change the future, FRPNow!

A new ICFP 15 paper by Ploeg and Claessen; abstract:

We present a new interface for practical Functional Reactive Programming (FRP) that (1) is close in spirit to the original FRP ideas,
(2) does not have the original space-leak problems, without using arrows or advanced types, and (3) provides a simple and expressive way for performing I/O actions from FRP code. We also provide a denotational semantics for this new interface, and a technique (using Kripke logical relations) for reasoning about which FRP functions may “forget their past”, i.e. which functions do not have an inherent space-leak. Finally, we show how we have implemented this interface as a Haskell library called FRPNow.

Fixed link.

Who can make LtU2?

If we had voting on things (a la reddit, slashdot, hacker news, etc.) I think we'd all be better off. There are technological things which won't outright fix everything, but should greatly help I should think.

But we'd need somebody who can actually set that stuff up.

Bi-simulation in security auditing

Bi-simulation has been an important topic in Computer Science (e.g. intensely studied for process calculi) which may be applicable to an important emerging issue for computer engineering, namely security auditing.

See bottom of page 3 of Distributed Public Recording: Providing Security Without the Risks of Mandatory Backdoors

The Page-Fault Weird Machine: Lessons in Instruction-less Computation

In case one instruction is one too many for you. A compiler which executes instructionless code through triggering pagefaults.

This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction. Optionally, the assembler can also generate X86 instructions that will display variables in the VGA frame buffer and will cause control to be transferred between the native (display) instructions and 'weird machine' trap instructions.

Dedekind, Cantor, Conway, & Hewitt (w/ some Chomsky)

Here, I will attempt to cool down controversy on LtU over Hewitt's construction of Real by explaining it in more familiar terms. I think this will help to shed some light on Direct Logic and begin to hint why it is interesting in a PLT context.

I will borrow conceptually from John H. Conway who wrote, in "On numbers and games":

Let us see how those were good at constructing numbers have approached this problem in the past.

Dedekind (and before him the author -- thought to be Eudoxus -- of the fifth book of Euclid) constructed the real numbers from the rationals. His method was to divide the rationals into two sets L and R in such a way that no number of L was greater than any number of R, and use this "section" to define a new number {L|R} in the case that neither L nor R had an extremal point.

His method produces a logically sound collection of real numbers (if we ignore some objections on the grounds of ineffectivity, etc.), but has been criticised on several counts. Perhaps the most important is that the rationals are supposed to have been already constructed in some other way, and yet are "reconstructed" as certain real numbers. The distinction between the "old" and "new" rationals seems artificial but essential.

Cantor constructed the infinite ordinal numbers. Supposing the integers 1,2,3... given he observed that their order-type ω was a nwe (and infinite) number greater than all of them. Then the order-type of {1,2,3...,ω} is a still greater number ω + 1, and so on, and on, and on. The similar objections to Cantor's procedure have already been met by von Neumann, who observes that it unnecessary to suppose 1,2,3,... given and that it is natural to start at 0 rather than 1. He also takes each ordinal as the set (rather than the order-type) of all previous ones. Thus for von Neumann, 0 is the empty set, 1 the set {0}, 2, the set {0, 1}, ..., ω the set {0, 1, 2, ...} and so on.

In this chapter [Conway teases] we intend to show that these two methods are in reality part of a simpler and more general method by which we can construct a very large Class No of numbers, including at the same time the real numbers and the ordinal numbers, and others such as those mentioned above [e.g. infinitessimals] [....]

Hewitt can be read as constructing Real by specifying a set of 3-way partitions of the dyadic fractions between 0 and 1.

Every Real in [0..1] corresponds to three sets: {L | M | R}

The sets are ordered so that no member of L is greater than or equal to any member of M or R, no member of M is greater than or equal to any member of R.

  Zero:

  0 ≡ {{} | {} | Dyad} 
  where Dyad is the set of all dyadic fractions between 0 and 1

  One:
  1 ≡ {Dyad | {} | {}} 

Hopefully the dyadic fractions are themselves members of Real and we can memorialize that fact in a manner similar to Dedekind:

  ∀d ∈ Dyad, {dl | { d } | dr } ∈ ℝ

  where dl ≡ { x ∈ Dyad | x < d }
    and dr ≡ { x ∈ Dyad | x > d }

  Note: the Real {dl | { d } | dr } is called d.

The rationals in general correspond to regular languages as follows:

  Let Rat be the set of of sets of finite strings of 0 and 1 such that:

  ∀r in Rat, r is a regular language
    r is well ordered by a prefix relation
    ∀s in r, s is finite length and ends with 1

    rl ≡ { d ∈ Dyad | ∃x ∈ r, x > d }
    rm ≡ { d ∈ r | ∀x ∈ r, x <= d }
    rr ≡ { d ∈ Dyad | ∃x ∈ r, x < d }

   Note: the Real { rl, rm, rr } is called r }

As you can see, the subset of Real given by Rat is the set of all rational fractions between 0 and 1.

The rationals are given by certain regular languages over the alphabet {0,1}. Similarly, the constructible reals are given by certain recursivvely enumerable languages.

  Let Tructable be the set of all recursively enumerable sets of finite 
  strings of 0s and 1s, each string ending with a 1, such that each member
  of Tructable is well ordered by a prefix relation.

  By analogous construction to the rationals:

  ∀ t ∈ Tructable, { tl, tm, tr } is in Real

All rationals (dyadic and otherwise) are Tructable.

Finally, we consider a set of languages of finite strings for which the generator functions are non-deterministic.

  Let Fracs be the set of all sets of finite strings of 0s and 1s, 
  each string ending with a 1, such that each member of Fracs is
  well ordered by a prefix relation.

  By analogous construction, 

  ∀ f ∈ Fracs, { fl, fm, fr } is in Real

Note that if a member f of Fracs has a maximal element, then f is also a dyadic rational.

Note that if a member f is a regular language over {0,1}, then it is a rational.

Note that if a member of f is a recursively enumerable set, then it is a tructable real.

There are other possibilities. By a diagonalization argument, we know that Fracs contains members f which are not recursively enumerable. Those f are the unconstructible reals.

Let's suppose that the real world contains fair randomness. For example, there is a coin flip or some other kind of measurement we can repeat that will produce random outcomes. Neither heads or tails will be permanently starved. A sequence of coinflips on a given world-line will eventually produce both a 0 and a 1. Furthermore, on some world-lines, the sequence of coin flips is not described by any recursively enumerable function (at least assuming that there is no upper bound on the number of times we get to flip the coin).

The number of possible world-lines on which coin flips might be recorded is uncountable and, indeed, corresponds nicely to the members of Fracs.

In other words, a machine can enumerate the members of any f in Fracs, from shortest to longest. Such a machine, on a particular world-line, is pretty much what Hewitt means by an "Actor address".

SPLASH 2015 - 2nd Combined Call for Contributions

/************************************************************************************/
ACM Conference on Systems, Programming, Languages, and Applications:
Software for Humanity (SPLASH'15)

Pittsburgh, Pennsylvania, USA
25th-30th October, 2015

http://www.splashcon.org

Sponsored by ACM SIGPLAN

/************************************************************************************/
COMBINED CALL FOR CONTRIBUTIONS
Demos
Doctoral Symposium
Dynamic Languages Symposium (DLS)
OOPSLA Artifacts
Posters
SPLASH-E
Student Research Competition
Student Volunteers
Tutorials
Wavefront
Workshops

Co-Located Conferences: SLE, GPCE, DBPL, PLoP
/************************************************************************************/

The ACM SIGPLAN conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH) embraces all aspects of software construction and delivery to make it the premier conference at the intersection of programming, languages, and software engineering. SPLASH is now accepting submissions. We invite high quality submissions describing original and unpublished work.

Most of the following tracks have submissions due: 30 JUNE

** Demos **
The SPLASH Demonstrations track is an excellent vehicle for sharing your latest work with an experienced and technically savvy audience. Live demonstrations show the impact of software innovation. Demonstrations are not product sales pitches, but rather an opportunity to highlight, explain, and present interesting technical aspects of running applications in a dynamic and highly interactive setting. Presenters are encouraged to actively solicit feedback from the audience, which should lead to very interesting and entertaining demonstration sessions.

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-demos

** Doctoral Symposium **
The SPLASH Doctoral Symposium provides students with useful guidance for completing their dissertation research and beginning their research careers. The Symposium will provide an interactive forum for doctoral students who have progressed far enough in their research to have a structured proposal, but will not be defending their dissertation in the next 12 months.

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-ds

** Dynamic Languages Symposium (DLS) **
The 11th Dynamic Languages Symposium (DLS) at SPLASH 2015 is the premier forum for researchers and practitioners to share knowledge and research on dynamic languages, their implementation, and applications. The influence of dynamic languages – from Lisp to Smalltalk to Python to Javascript – on real-world practice and research continues to grow.

Submissions Due: 15 June, 2015
http://2015.splashcon.org/track/dls2015-papers

** OOPSLA Artifacts **
The Artifact Evaluation process is a service provided by the community to help authors of accepted papers provide more substantial supplements to their papers so future researchers can more effectively build on and compare with previous work. The Artifact Evaluation Committee has been formed to assess how well paper authors prepare artifacts in support of such future researchers. Roughly, authors of papers who wish to participate are invited to submit an artifact that supports the conclusions of the paper.

Submissions Due: 9 June, 2015
http://2015.splashcon.org/track/splash2015-artifacts

** Posters **
The SPLASH Poster track provides an excellent forum for authors to present their recent or ongoing projects in an interactive setting, and receive feedback from the community. We invite submissions covering any aspect of programming, systems, languages and applications. The goal of the poster session is to encourage and facilitate small groups of individuals interested in a technical area to gather and interact. It is held early in the conference, to promote continued discussion among interested parties. Posters can be independent presentations or associated with one of the other parts of SPLASH.

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-posters

** SPLASH-E **
The SPLASH-E track brings together researchers and educators to share educational results, ideas, and challenges centered in Software and Programming Languages. Submission formats vary, including papers, tool demos, lightning talks, challenge-topics for discussion, and suggested themes for "unconference" sessions. Help us create an engaging forum for educational issues related to SPLASH!

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-splash-e

** Student Research Competition **
The ACM SIGPLAN Student Research Competition (ACM SRC) is an internationally-recognized venue that enables undergraduate and graduate students to experience the research world, share their research results with other students and SPLASH attendees. The competition has separate categories for undergraduate and graduate students and awards prizes to the top three students in each category. The ACM SIGPLAN Student Research Competition shares the Poster session’s goal to facilitate interaction with researchers and industry practitioners; providing both sides with the opportunity to learn of ongoing, current research.

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-src

** Student Volunteers **
The SPLASH Student Volunteer program provides an opportunity for students from around the world to associate with some of the leading personalities in industry and research in the following areas: programming languages, object-oriented technology and software development. Student volunteers contribute to the smooth running of the conference by performing tasks such as: assisting with registration, providing information about the conference to attendees, assisting session organizers and monitoring sessions.

Submissions Due: 7 August, 2015
http://2015.splashcon.org/track/splash2015-sv

** Tutorials **
The SPLASH 2015 Tutorials programme will consist of prestigious tutorials on current topics in software, systems, and languages research. The scope of Tutorials is the same as the conference itself: all aspects of software construction and delivery at the intersection of programming, languages, and software engineering. Tutorials in particular focus on the nexus between research and practice, including work that takes inspiration from or builds connections to areas not commonly considered at SPLASH. Tutorials should introduce researchers to current research in an area, or show important new tools that can be used in research.

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-tutorials

** Wavefront **
The SPLASH Wavefront track is looking for presentations and technology talks of interest to the software community, particularly to software professionals working in companies large and small. Wavefront is a forum for presenting experience reports and tutorials about innovative tools, technologies, and software practices.

Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-wavefront

** Workshops **
The SPLASH Workshops track will host a variety of high-quality workshops, allowing their participants to meet and discuss research questions with peers, to mature new and exciting ideas, and to build up communities and start new collaborations. SPLASH workshops complement the main tracks of the conference and provide meetings in a smaller and more specialized setting. Workshops cultivate new ideas and concepts for the future, optionally recorded in formal proceedings.

Late Phase Submissions Due: 30 June, 2015
http://2015.splashcon.org/track/splash2015-workshops

** Co-Located Events **

SLE - 8th International Conference on Software Language Engineering (SLE)
Submissions Due: 15 June, 2015
http://conf.researchr.org/home/sle2015

GPCE - 14th International Conference on Generative Programming: Concepts & Experiences (GPCE)
Submissions Due: 21 June, 2015
http://conf.researchr.org/home/gpce2015

DBPL - 15th Symposium on Database Programming Languages (DBPL)
Submissions Due: 15 June, 2015
http://conf.researchr.org/home/dbpl2015

PLoP - 22nd International Conference on Pattern Languages of Programming (PLoP)
Submissions Due: 12 May, 2015
http://www.hillside.net/plop/2015/

Information:
SPLASH Early Registration Deadline: 25 September, 2015
Contact: info@splashcon.org
Website: http://2015.splashcon.org

Location:
Sheraton Station Square Hotel
Pittsburgh, Pennsylvania, United States

Organization:
SPLASH General Chair: Jonathan Aldrich (Carnegie Mellon University)
OOPSLA Papers Chair: Patrick Eugster (Purdue University)
Onward! Papers Chair: Gail Murphy (University of British Columbia)
Onward! Essays Chair: Guy Steele (Oracle Labs)
DLS Papers Chair: Manuel Serrano (INRIA)
Artifacts Co-Chairs: Robby Findler (Northwestern University) and Michael Hind (IBM Research)
Demos Co-Chair: Igor Peshansky (Google) and Pietro Ferrara (IBM Research)
Doctoral Symposium Chair: Yu David Liu, State University of New York (SUNY) Binghamton
Local Arrangements Chair: Claire Le Goues (Carnegie Mellon University)
PLMW Workshop Co-Chairs: Darya Kurilova (Carnegie Mellon University) and Zachary Tatlock (University of Washington)
Posters Co-Chairs: Nick Sumner (Simon Fraser University)
Publications Chair: Alex Potanin (Victoria University of Wellington)
Publicity and Web Co-Chairs: Craig Anslow (University of Calgary) and Tijs van der Storm (CWI)
SPLASH-E Chair: Eli Tilevich (Virginia Tech)
SPLASH-I Co-Chairs: Tijs van der Storm (CWI) and Jan Vitek (Northeastern University)
Student Research Competition Co-Chairs: Sam Guyer (Tufts University) and Patrick Lam (University of Waterloo)
Student Volunteer Co-Chairs: Jonathan Bell (Columbia University) and Daco Harkes (TU Delft)
Sponsorship Chair: Tony Hosking (Purdue University)
Tutorials Co-Chair: Romain Robbes (University of Chile) and Ronald Garcia (University of British Columbia)
Video Chair: Michael Hilton (Oregon State University)
Videos Previews Czar: Thomas LaToza (University of California, Irvine)
Wavefront Co-Chairs: Dennis Mancl (Alcatel-Lucent) and Joe Kiniry (Galois)
Web Technology Chair: Eelco Visser (TU Delft)
Workshop Co-Chairs: Du Li (Carnegie Mellon University) and Jan Rellermeyer (IBM Research)
/************************************************************************************/

XML feed