LtU Forum

Differences between gradual, soft, and dynamic dependent typing?

I've skimmed a few papers on gradual/soft/dynamic dependent typing, but I'm not sure where to focus my energy and what to read in detail. Can someone tell me what the prominent differences are between these three approaches?

On the (Im)possibility of Obfuscating Programs

I hope this sort of thing is welcome on LtU.

Informally, an obfuscator O is an (efficient, probabilistic) ``compiler'' that takes as input a program (or circuit) P and produces a new program O(P) that has the same functionality as P yet is ``unreadable'' in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic and complexity-theoretic applications, ranging from software protection to homomorphic encryption to complexity-theoretic analogues of Rice's theorem. Most of these applications are based on an interpretation of the ``unreadability'' condition in obfuscation as meaning that O(P) is a ``virtual black box,'' in the sense that anything one can efficiently compute given O(P), one could also efficiently compute given oracle access to P.

In this work, we initiate a theoretical investigation of obfuscation. Our main result is that, even under very weak formalizations of the above intuition, obfuscation is impossible. We prove this by constructing a family of functions F that are inherently unobfuscatable in the following sense: there is a predicate pi such that (a) given any program that computes a function f in F, the value pi(f) can be efficiently computed, yet (b) given oracle access to a (randomly selected) function f in F, no efficient algorithm can compute pi(f) much better than random guessing.

We extend our impossibility result in a number of ways, including even obfuscators that (a) are not necessarily computable in polynomial time, (b) only approximately preserve the functionality, and (c) only need to work for very restricted models of computation TC0). We also rule out several potential applications of obfuscators, by constructing ``unobfuscatable'' signature schemes, encryption schemes, and pseudorandom function families.

On the (Im)possibility of Obfuscating Programs

Programming languages for low level autonomic systems

A few quick definitions

Autonomic Systems - Systems that self-manage/repair and improve.

Agorics - A system based on the market forces generally with currency + bidding

I have been thinking about an agoric architecture for a while that can manage low-level resources, so that the system can adapt as much as possible to the problem at hand. I'm trying to emulate some of the flexibility of neural plasticity.

The sorts of things the programs within the system should be able to do

  • Given two garbage collectors in the system, they should be able to pick the one that least interferes with their important operations.
  • Similarly pick the best scheduler
  • Pick the best virtual memory manager
  • Cope with new instances of these low level systems (schedulers/GCs etc), and the removal of others it might have been relying on
  • Adjust the stack space assigned to it, if needed

This is as well as deciding which normal programs to interact with, etc.

So I need a language that allows the programmer to adjust all these strategies. Preferably a language that can also allows the programmer to ignore all these considerations and be as close to a normal language as possible (using default handlers for all these considerations).

Has there been any previous research about this sort of programming language? I had been thinking of an imperative C like language (perhaps taking a leaf from C--), with special functions a la main() that handle these sorts of things when the program is loaded into memory. I'm also interested in abstracting away as much as possible from the underlying architecture.

I have blog-in-making here for other discussions.

What is the dual of { } ?

I have a question for the logicians and category/set theorists among LtU readers.

Background:

I'm currently dealing with a simple matching algorithm for strings. One can create a search pattern either by building a set of admissible characters {a, b, c, ... } from the empty set {} or a search pattern from matching basically any character ( like using a . operator in regexps ) and withdraws a set of characters that shall not be matched:

ANY - {a, b, c, ... }

For disambiguation purposes I'm interested in making all those sets disjoint. It shall not be really significant in the discussion that characters are matched. This is just an implementation detail.

The question is about the status of ANY? As it seems the full ZF set theory would be far too much specification. It entirely suffices that each of those sets S can be finally constructed or finally de-constructed as for ANY - S. In one case one starts with a "set of no entities" and in the other case with a "set of all entities". The latter clearly violates the principle that the set is constructed after the elements. I know ANY isn't entirely insane because I have a working implementation with the usual relations like union, difference, subset and intersection.

What I'd like to know is about a more in depth treatment of ANY in the literature. I expect more interest from computing scientists given the above motivation than from classical mathematics.

Open Recursion

I've been contemplating an embedded DSL. It strikes me how incredibly useful open recursion would be for what I want to achieve, to the point that I might want to give the technique some kind of explicit support and encouragement. Curious to see what had already been done, a quick google search turned up Closed and Open Recursion by Ralf Hinze.

Open recursion Another handy feature offered by most languages with objects and classes is the ability for one method body to invoke another method of the same object via a special variable called self or, in some langauges, this. The special behavior of self is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first.

I don't particularly like the examples being offered, but it's an interesting set of slides nonetheless. Of all the many features of objects, somehow I had failed to fully appreciate this aspect. I don't necessarily want to keep the recursion open until the last possible minute either; I would prefer to close up the recursion as I see fit, quite possibly at compile time. Is there any sensible and attractive way to re-open a recursion once it's been closed?

I was quite impressed with CTL Model Checking in Haskell: A Classic Algorithm Explained as Memoization on Kenn Knowles' blog. Seeing another good example of open recursion would require solving Project Euler Problem 220.

How best to add a record type to my typed Scheme variant?

I'm working on a compiler for a Scheme-like language, 'Irken'. The purpose of Irken is to generate a VM for a (yet to be designed) python-like language. I intend for Irken to be the implementation language for the VM, and also to be the language by which the language is extended - similar to the role of pre-scheme in scheme48.

http://www.nightmare.com/rushing/irken/

One goal here is to eliminate the need for end users to program in C.

Another reason for the roundabout approach: I want massively scalable cooperative threads (think 100,000+ threads, think Erlang)... so using the C stack is out (as is BitC, which I've recently learned about here).

Irken is a whole-program compiler that generates one large C function - 'vm()'. Irken uses the gcc address-of-label extension and 'goto' to implement closures. The current continuation (i.e., 'stack') is heap-allocated.

A couple of months ago my desire for a speedy VM got the best of me and I decided to dabble with type inference - mostly so I could get rid of runtime type checks. It's also nice to get the type safety. 8^)

I now have ML-style let-polymorphism - the algebraic data types. I can implement nice polymorphic data structures like red-black trees, parsers, etc. I've been pleasantly surprised at how little trouble the type inference has been. I've been able to write well-typed generators using call/cc, for example.

I put off thinking about records for a while, assuming that it would be a sugary detail. Now I'm seeing that it's more of a tar pit.

So I'm asking for advice - given the purpose of Irken, what approach should I take in adding a record type? Depending on the amount of work and complexity involved, I could accept anything in the range of 'all records require type annotation' to 'everything is figured out magically by the compiler and has no run-time cost'. Leaning toward the latter direction, though. Also, it'd be nice if I could sugar it up to support some simple OO features.

I've read Wand's "Type Inference for Record Concatenation..." and a bunch of stuff by Remy related to OCaml's row polymorphism. I've also been reading Ohori's "A Polymorphic Record Calculus...".

The impression I have right now is:

Wand: simple monomorphic records + recursive types.
Remy: less simple polymorphic records + recursive types.
Ohori: really cool polymorphic records but no recursive types?

The Ohori approach also looks like I'd have to throw my existing code away - it seems to replace products and sums with labeled records?

Thanks for any and all advice!

Tagged Arithmetic Optimization

Has there been a paper describing optimizations to arithmetic for tagged pointer representations? For instance, stealing 1-bit to distinguish pointers and integers, and using 0 as the tag value for integers, we don't need to perform any shifting for integer addition in order to normalize the integer back to its tagged representation:

-- integer i is represented as a word shifted by 1 bit:
-- word = tag(int)
tag(int) = int << 2
         = word * 2

-- int = untag(word)
untag(word) = word >> 2
            = word / 2

-- no shift needed; subtraction the same
addition = int0 + int1
         = tag( untag(word0) + untag(word0) )
         = tag( (word0 / 2) + (word1 / 2) )
         = tag( (word0 + word1) / 2)
         = 2 * (word0 + word1) / 2
         = word0 + word1

-- one right shift needed to normalize
multiplication = int0 * int1
               = tag( untag(word0) * untag(word1) )
               = tag( (word0 / 2) * (word1 / 2) )
               = tag( word0 * word1 / 4 )
               = 2 * word0 * word1 / 4
               = word0 * word1 / 2

-- one left shift needed to normalize
division = int0 / int1
         = tag( untag(word0) / untag(word1) )
         = tag( (word0 / 2) / (word1 / 2) )
         = tag(word0 / word1)
         = 2 * (word0 / word1)
etc.

Obviously these can be derived from basic arithmetic, but it's a bit tedious, so I was wondering if there was a reference describing a full set of such identities, or any more sophisticated optimizations. Perhaps a tag calculus of sorts.

Of course, care must be taken to handle overflow on the word when adding and multiplying, but I think "branch on overflow" is a fairly common instruction, and would provide for an efficient implementation to tweak the result.

Jane Street Summer Project '09

I'm pleased to announce the Jane Street Summer Project for 2009. The goal of the JSSP is to make functional programming languages into better practical tools for programming in the real world. To do that, we will fund students over the summer to work on open-source projects which aim at improving the practical utility of their favorite functional language.

The JSSP is a follow-on to last year's OCaml Summer Project. A key difference this year is that we are opening up the project to proposals in languages other than OCaml (although we expect to maintain a focus on OCaml projects.) There are also some changes to the funding structure that are particularly relevant for projects located in the US.

If you want to find out more, you can look at the JSSP blog and at the project FAQ

We're looking forward to a great and productive summer, and we hope some of you come along for the ride!

Ruling out nonsensical data type definitions

In a language that enforces a separation between data and codata, it seems like (with a Haskell-esque syntax):

data Silly = Very Silly

should be statically disallowed, since the resulting type Silly is uninhabited.

First, is this intuition correct? Or does defining types of this kind have some utility?

Second, what's the dual concept for codata? I think that if we are ruling out Silly as uninhabited, we should probably also rule out both of:

codata CoSilly = Quite CoSilly
codata CoSilly2 = Just Int | Quite2 CoSilly2

for a related reason.

I'm thinking that the rule is something like this: A data definition must contain at least one alternative which is not recursive. A codata definition must not contain any alternative which is "only recursive" (meaning that it is a product of only recursive references to the codata type being defined).

Am I even close? I am unaware of the correct terminology here, because searching is getting me nowhere. I've read a few overviews of total functional programming, including Turner's "Total Functional Programming", and learned of other syntactically reasonable data type definitions that have to be rejected because the reintroduce bottom, but didn't find a discussion of types of this kind (which seem merely useless in the data case, but also seem like an enemy of productivity proofs in the codata case).

Cybernetics and AI

This post is inspired by the Elephant 2000 discussion on the main page. Paul Pangaro's page has a wealth of material on everything related to cybernetics and computing. There is also an interesting discussion of the relation between Cybernetics and AI , and don't miss Cybernetics as an approach to software experience.

XML feed