LtU Forum

Advancement in TDFA and POSIX submatch extraction

It came up in an old LTU thread about regular expressions. There was an argument whether tagged FA invented by Ville Laurikari can support POSIX disambiguation semantics. It turns out, they can: it is possible to construct efficient Laurikari TDFA with POSIX semantics, as well as with leftmost greedy semantics (details in this paper).

Back in 2007 Chris Kuklewicz suggested an algorithm which is implemented in his Regex-TDFA Haskell library. He also wrote an informal description of his algorithm, but never fully formalized it. Some ten years later I stumbled upon this thread when I was trying implement fast submatch extraction in the open source lexer generator re2c. I revised both the original algorithm by Laurikari and the modification by Kuklewicz and found a number of improvements and bugs in Regex-TDFA.

Language features for tracing JIT?

Are there any special programming language features ("superpowers") that are made possible by specifically exploting a tracing JIT compiler?

It seems to me like tracing JITs are often developed for existing languages (JavaScript, Python, Lua, etc) and tend to focus on efficiently compiling the existing code bases that are written with their traditional programming idioms. I am interested in the opposite approach: what programming idioms and language extensions can help programmers to make the most of their tracing JIT compiler?

I can offer one candidate. Tracing JIT compilers can choose to specialize machine code based on the exact values of parameters that are only available at runtime, and the language can facilitate "hinting" whether a certain variable should be specialized on its value (e.g. generate machine code for the specific case that i = 42) or on its type (e.g. generate machine code for the general case that i is an integer.) Often a certain operations can be compiled much more efficiently when a parameter is constant, for example the size argument to a memcpy or memcmp, and so a potential "superpower" is to apply these optimizations automatically without having to statically generate code that anticipates each possible value.

Is that a legitimate superpower? what are some others? what tricks can tracing JIT users exploit to leave C/C++ hackers eating their dust? :-)

Something I forgot

I think it was discussed here before but it came up in a conversation I had, and I forgot whether I ever got an answer to the following question:

Why doesn't Haskell need a value restriction a-la ML to deal with unsound interactions between mutable state and polymorphism?

Did no-one ever check this or does it have to do with invariance? I forgot.

New simple proof system

I took basic logic and made it as familiar and easy to learn as possible. Proofs look just like proofs in elementary school algebra--a series of steps where each step has a formula on the left and a justification on the right. Formulas are manipulated using rewrite rules (commutativity, etc.) rather than the typical deduction rules. This isn't any sort of limitation. It can implement both intuitionistic and classical logic using only rewrite rules (currently it uses intuitionistic).

I wrote up the result as a sort of game. The first real proof is problem 8, which is just a few steps. Some of the later problems are ferociously difficult (but possible)--no one I've shown it to has gotten close to solving them, so if anyone manages to solve problem 25, 35, 36, or 37, please let me know.

It's completely free.

NOOL 2017

Co-located with SPLASH 2017, there will be another NOOL workshop - see here and here for previous editions.

Please consider submitting an abstract! It's a fun, friendly event and this year it's followed by the LIVE workshop.

Type system based on epistemic modal logic?

This is something I’ve been mulling over lately, and I’m curious to hear opinions on it and whether there’s any existing research (I couldn’t find any).

Since type systems based on linear logic are receiving more attention lately in Rust and the forthcoming linear types extension for Haskell, I was looking around at different logics to see if they would make a useful basis for type system features in a programming language. I was aware of modal logics, but hadn’t really looked into them since I read From Löb’s Theorem to Spreadsheets a few years ago.

I came across epistemic modal logic, a logic for reasoning about knowledge. That struck me as potentially very useful for distributed data stores, in which different agents (servers) know different things (have different data), and also know things about the things that they and other agents know.

For instance, if a client asks a server for some data, the server may be aware that it doesn’t have the data in question, by the “negative introspection axiom”, ¬Kiφ ⇒ Ki¬Kiφ, if an agent (i) does not know (¬Ki) some fact (φ), then that agent knows (Ki) that it does not know the fact (¬Kiφ). However, it may know that some other agent does know the fact, and can go and fetch it accordingly, or arrange for the other agent to send it directly to the client.

This could incorporate other modalities such as possibility (◊), necessity (□), “everyone knows” (E), “common knowledge” (C), “distributed knowledge” (D). Just brainstorming, I feel like a type system based on these ideas might be able to enforce things like:

  • You can only make a synchronous request for some data from a server if that server necessarily has the data and can therefore respond immediately. Pseudocode:

    syncRequest (fact : FactID, server : ServerID) : Fact
      requires necessarily (server knows fact)
  • After you send some data to a server, you know that it possibly knows the data and that you can ask for it asynchronously.

    send (fact : FactID, server : ServerID) : ()
      ensures possibly (server knows fact)
  • Any part of a message (metadata or data) can be lost, and the server can recover or negotiate repair to achieve eventual consistency. (I’m envisioning a sort of Erlang-style “fire and forget” message passing system, for example over UDP.)

  • Constraints on server & client knowledge to express tradeoffs between consistency and availability.

I’m just riffing informally here. Let me know your thoughts, or if you can point me to any resources. :)

The question of the possibility of a simple formal foundation to the natural languages.

I think that it can safely be said that the natural languages can transport any formal structure; that we can communicate any mathematical structure using the natural language. That's the essence of metamathematics.

But then we are led to believe that the natural language has no proper formal structure. It is informal. In the sense that it is not possible to feed the Don Quijote to an algorithm that will be able to pinpoint any formal inconsistency implicit in the text (without recurring to any other text or knowledge). An example imaginary inconsistency would be if at one point Cervantes says that Don Quijote always likes lo love Dulcinea, and at another he says that he does not like to love her. The kind of inconsistencies that would destroy any formal structure if inadvertently injected in transit.

These 2 previous paragraphs seem a bit paradoxical to me, since, if there is no procedure to decide whether a natural text is inconsistent (or if there are only heuristic fallible procedures) metamathematics should be impossible, and mathematics could not have been born.

If we take a formal structure expressed in some mathematical formalism, and express it with the natural language, metamathematically, we need to be sure that there are no inconsistencies in any of the cases. In the former case, we can check, and there are algrithms that can check. In the later case, we can check. Can no algorithm generally check?

I know that there is a history to this, from Llull and Leibniz, reaching a summit with Frege, to be toppled by Russell, and followed by the neopositivists etc., and later experiments like the semantic web.

So my question here is: Is there some kind of proof or argument showing that the natural languages cannot be provided with a simple mathematical foundation? Some recognizable fundamental property of the natural language that is inherently inconsistent? (Are both questions the same?)

What I mean with a "simple" mathematical foundation is that it must consist on a core formal theory (that can then be suplemented with a number of ad-hoc rules that provide for shortcuts and phrases in the natural languages). A priori, it should be totally disprovided of semantics. Then we should be able to interpret extensions built on that foundation in natural language texts, taken as abstract structures of words; and then, by proxy, we might provide those extensions with the semantics of the natural language texts in which they are interpreted. (Perhaps we need some a priori semantics, for things like time [edited to add: or I? That would seem to lead us towards Gödel's pit, damm. However, I think that we can settle in principle for a descriptive language without person], that are deeply involved in the basic structure of the natural language?)

So, simple in the sense that Frege's proposal was simple, or that the semantic web is simple (my previous paragraph is meant to be interpreted in their intended use), but a huge neural network with crazy amounts of delicately balanced branches trained by all books ever published is not simple. Notwithstanding the problem that with the neural network solution we are dealing with an ungodly mixture of syntax and semantics.

I really don't know whether I am using the right terminology to phrase my question - or even whether there exists a terminological framework where it can be meaningful and exact. So apologies if I have taken a couple of poetic licences in trying to lay it out.

Question: do you have to climb the tower of interpreters?

You know what is interesting about meta-circular interpreters? The infinite recursion of eval-apply is always broken in the end by the application of some primitives. So in the end, no matter what your computational model is, you are still executing a stream of primitives in your "ground" language. And, usually, "lambda" is all you'll ever need as a primitive. But looking at so many toy interpreters you'd see that "lambda" is not quite so "primitive" and requires a lot of machinery to work.
So instead of providing some means of abstraction directly as a ground mechanism to build everything else around it, I wanted to do a simple thought exercise and to go from another direction: if we have a fixed set of primitives, without means of abstraction, what are the ways to compose/combine them?

Imagine we have a really old CPU without a support for a call instruction, but it has a conditional jump, ALU instructions, loads/stores. It's a simple exercise to define how would a call instruction would look like (in pseudo-assembler):

define CALL(target_addr):
load stack_pointer, tmp
store tmp, the-address-of-the-next-instruction-after-call //don't care now how we can get this
dec tmp
store stack_pointer, tmp
//skipped the frame pointer manipulation
branch target_addr

So, in general, our CPU IS capable of doing necessary steps to create an abstraction mechanism, but there is no way to "plug it in" by the CPU itself unless our preprocessor rewrites all "CALL" tokens into these primitives. Not good.
The only other way I could think of is to interpret another language, which has this CALL as primitive. I.e. the CPU itself becomes such a "preprocessor". But does that mean that the only way to escape deficiencies of your current computational model is to climb the tower of interpreters? (aka interpret another language that has that capability. My favorite examples: a Scheme interpreter with first-class macros in scheme by Matt Might, Kernel interpreter in Scheme by J.Shutt, 3-LISP by Brian Smith)

Are there any other ways, besides interpreting another language? Maybe there is a way to build such a reflexive virtual machine that can modify and extend itself? I haven't seen such yet.

I've recently stumbled upon Ian Piumarta's "Open,extensible composition models" VPRI Technical Report TR-2011-002 which describes a meta-circular evaluator, where instead of your usual big cond-statement in eval you select your evaluator based on the type of expression you are evaluating (same for applicators) and this mapping is available for the programmer. I've found this a very interesting read, maybe someone could comment on this paper, as I've surely missed a lot.

I have surely asked a lot of questions in a very unstructed form, but if anyone would maybe give some useful pointers (something to read on the topic, maybe) it would be greatly appreciated.

isomorƒ: an experimental structured editor for witing/deploying functional code

isomorƒ is attempting to bridge the divide between functional programming, serverless architecture, and cloud-based structured-editing with the grand hope of creating a simplified, beginner-friendly, but powerful IDE experience with easy microservice deployment.

The platform runs on a compact, pure, statically-typed functional AST with all the power in the IDE including a syntactic sugar layer, passively identified reuse ideas, exposte optimization, automatic versioning, and immediate cross-user sharing (all enabled by the guarantees of functional purity / referential transparency).

We currently have a prototype sandbox of the IDE available and a high-level vision at our blog.

We would love any feedback on the sandbox, the idea/implementation, academic/educational/commercial applications or anything else!


hobbes, Morgan Stanley OSS

Over the last few years, I have been developing hobbes -- a programming language, JIT compiler, and database system -- as part of my work for Morgan Stanley. It has become a critical piece of infrastructure in our low-latency, high-volume trading applications, and we have decided to release the source code to the public on github (currently can be built for recent Linux and macOS platforms):

The database system is a lightweight (self-contained, header-only) library used by applications to efficiently log structured (binary) data over shared memory to minimize application latency and reflect application type structure as accurately as possible in log files. We use this to record a very detailed image of application state over time, which we analyze/query out of band.

The JIT compiler can be used embedded in another application (e.g. to "hot patch" an application with very efficient, precisely typed intercepts) or as a standalone interactive interpreter (e.g.: to monitor and query application log data).

The programming language is a variant of Haskell (HM type inference, algebraic data types, qualified types / type classes) with some adjustments to help reduce boilerplate and derive very efficient machine code. For example, we use "structural" record, variant, and (iso-)recursive types to represent data as it's naturally represented in applications and can be deconstructed generically at compile time (e.g. a record can be printed if its first field can be printed and its rest can be printed, a variant can be printed if its first case can be printed and its rest cases can be printed, a recursive type can be printed if its one-step unrolling can be printed, etc).

We are actively using this on major projects, we will continue to develop this github project as we need new features, and we are interested in engaging others outside of the firm for their thoughts, feedback, and hopefully pull requests. :)

XML feed