LtU Forum

Error reporting strategies during parsing

Although I've seen lots of great papers on error recovery during parsing, I haven't been able to find much describing how parse errors are reported.

A couple of the problems I'm having:

  1. if the grammar requires lots of lookahead and backtracking to parse, there's potentially *tons* of possible errors to report if the input is invalid. But reporting all of them might be overwhelming/confusing and probably unhelpful
  2. the position reported for an error often doesn't correspond with the actual position at which the user made the mistake
  3. reporting errors inside nested structure

Are there any papers describing error reporting strategies, or parser generators or combinator libraries implementing simple but effective strategies?

javascript shift-reduce parser

Here is an universal parser in javascript:
http://synth.wink.ws/moonyparser/

I've been to hell and back to make it faster, but that's it, shift-reduce can't be much faster. Unfortunately, finding errors in text makes it three times slower. At least it has linear parsing time.

parser utilizes a new grammar language borrowed from Synth project. It is a kind of structured BNF language.

mobile web apps are slow -- and GC is to blame

The blog post Mobile Web Apps are slow caught my eye. It makes several bold claims: the performance of JavaScript is plateauing, the real problem is memory consumption of web apps, GC performance decreases unless 10 times more memory is available than objects are live, JS memory allocation is too hard to control, GCed languages are unsuited for mobile apps. Unfortunately these claims are only discussed in the context of JS.

Does or would FP suffer from the same problem on systems with limited memory? I would be curious to hear from programmers who use Lua or OCaml for iOS applications about the problems mentioned in the blog post and about GC overhead in FP.

Addendum: The author of the blog post elaborated his post in a much more detailed article
Why mobile web apps are slow
. Meanwhile Andreas Rossberg, who works on Google's V8 implementation of JavaScript, explained to me that the SunSpider benchmark is about the worst possible to judge JS performance. For example, SunSpider gives a lot of weight to date operations that are hardly typical for web apps. But sure enough, JS implementations are now really good at it in order to look good when benched with SunSpider.

Tools that provide "closed" view of open/extensible abstractions?

What is the state of the art in tool support for looking at a codebase composed of open abstractions extended by independent modules, through a "lens" that allows programmers to see the final inter-dependent code as if all the abstractions were closed?

I'm thinking of this question in the context of ideas like the "Independently Extensible Solutions to the Expression Problem" paper. Open abstractions are great in that you can add new functionality without disturbing an existing, working codebase. The down-side is that open abstractions can end up obscuring the structure and control flow of the program that actually runs. When it comes time to debug a program, or diagnose performance issues, it really helps to view/treat a system as closed.

Coming from a performance-oriented world (real-time rendering, games, and GPU compute), this is a common criticism I hear of virtual functions in C++: you need to know about every possible implementer of a virtual function before you have any idea what actually happens at a call site. You can't even intercept "all calls to this particular virtual function" in your debugger.

It almost seems like there is a correspondence between the per-feature modules introduced in something like the "independently extensible" work, and changelists/patches that are shared in a distributed version control system. Each of these approaches allows a discrete feature to be packaged up, and then applied to a codebase. Sometimes you want to see the codebase in terms of a set of composed features/changes, which can be added/removed/reordered easily, and other times you want to see it as a single crystallized artifact that results from "squashing" a set of changes.

Another workable UI metaphor would be the way that layers work in, e.g., Photoshop. Your primary view of a document is the composition of all the active layers, but you can still target edits toward particular layers, or toggle their visibility.

Quote Safe unquote JVM language?

The problem is: running user scripts on your server. Supposing the Java runtime is in use, you'd like to guarantee:

  1. the users code won't blow up the heap and crash every other users scripts.
  2. view or alter other users code and data using reflection
  3. blow out it's thread with recursion abuse

Java can be made pretty 'safe' using the it's built in sandboxing features (SecurityManager, AccessController, and Classloader) but most all JVM languages out there now (JRuby, Jython, Groovy, ...) are dynamic in nature and nearly impossible to 'secure' in the sense of the three items above:

  1. Heap: users can blow up the heap with simple string concatenation
  2. Code/variable visibility: these languages don't respect private access modifiers in general, from there you can own the system by accessing whatever classloading architecture the given JVM language implements. Also depend on this to acheive their dynamic natures.
  3. Recursion; methods call methods not much to be done about this

So you might suppose a language that disallows reflection and heap allocation might be a good thing in such an environment. Suppose 'new' was not a keyword, and a convention were adopted such as 'declaration is instantiation' then you could generate bytecode that would simulate stack allocation thus protecting the heap. Disallow recursion by embedding some code to examine the call stack for the current method.

Has anyone else considered this use case? Am I talking about Ada here?

Crowdsourced Enumeration Queries

From "Crowdsourced Enumeration Queries" by Beth Trushkowsky, Tim Kraska, Michael J. Franklin, Purnamrita Sarkar. ICDE 2013 best paper award, and one of my recent favorites.

Hybrid human/computer database systems promise
to greatly expand the usefulness of query processing by incorpo-
rating the crowd for data gathering and other tasks. Such systems
raise many implementation questions. Perhaps the most funda-
mental question is that the closed world assumption underlying
relational query semantics does not hold in such systems. As a
consequence the meaning of even simple queries can be called
into question. Furthermore, query progress monitoring becomes
difficult due to non-uniformities in the arrival of crowdsourced
data and peculiarities of how people work in crowdsourcing
systems. To address these issues, we develop statistical tools
that enable users and systems developers to reason about query
completeness. These tools can also help drive query execution
and crowdsourcing strategies. We evaluate our techniques using
experiments on a popular crowdsourcing platform

I've been playing with crowdsourcing function evaluation, and the above line of work shines: different types of human queries suggest different types of semantics. For example, Select all states in the US makes sense, while select all ice cream flavors has, arguably, a quantification error. The differences lead to fun stuff, such as distinct query plan optimizations for different human computations. I've found this style of thinking to guide my own recent implementation work.

The overall research field intersects many good topics: linguistics / NLP, query planning, language design, etc.

Pdf is here.

Cryptography DSL.

Lifted from a press release issued back in 2008.

Cryptol is a domain specific language for the design,
implementation and verification of cryptographic algorithms,
developed over the past decade by Galois for the United States
National Security Agency. It has been used successfully in a
number of projects, and is also in use at Rockwell Collins, Inc.

Domain-specific languages (DSLs) allow subject-matter experts to
design solutions in using familiar concepts and
constructs. Cryptol, as a DSL, allows domain experts in
cryptography to design and implement cryptographic algorithms
with a high degree of assurance in the correctness of their
design, and at the same time, producing a high performance
implementation of their algorithms.

http://corp.galois.com/cryptol/

Note: I am not in any way affiliated with Galois inc, and not, myself, a user of the Cryptol language. I'm posting it because I thought LtU might find this an interesting thing to discuss.

Harlan (a high level language for general purpose GPU computing)

Eric Holk has released the source to Harlan.

"We propose a declarative approach to coordinating computation and data movement between CPU and GPU, through a domain-specific language that we called Harlan"

Paper here : http://www.cs.indiana.edu/~eholk/papers/parco2011.pdf

Git here : https://github.com/eholk/harlan

Constraint-Based Type Inference and Parametric Polymorphism

An old paper ('94) by Ole Agesen, abstract:

Constraint-based analysis is a technique for inferring implementation types. Traditionally it has been described using mathematical formalisms. We explain it in a different and more intuitive way as a flow problem. The intuition is facilitated by a direct correspondence between run-time and analysis-time concepts.

Precise analysis of polymorphism is hard; several algorithms have been developed to cope with it. Focusing on parametric polymorphism and using the flow perspective, we analyze and compare these algorithms, for the first time directly characterizing when they succeed and fail.

Our study of the algorithms lead us to two conclusions. First, designing an algorithm that is either efficient or precise is easy, but designing an algorithm that is efficient and precise is hard. Second, to achieve efficiency and precision simultaneously, the analysis effort must be actively guided towards the areas of the program with the highest pay-off. We define a general class of algorithms that do this: the adaptive algorithms. The two most powerful of the five algorithms we study fall in this class.

Dynamic inheritance?

Consider class C that inherits from class B in a (statically) typed language. I would like to extend, at runtime, an instance of type B to type C:

    B b = new B();
    Assert(b is B);
    b.Method(); // invokes B.Method() 

    extend b to C();  // partial constructor for C
    Assert(b is B);
    Assert(b is C);
    b.Method(); // invokes C.Method() override 

Is there any reason why this should not be allowed in terms of type safety? If not, what is it called and is there any language out there that supports this?

XML feed