LtU Forum

System-level Static Analysis

Here's a shocker: static analysis of C or ASM is tricky. I'd rather be analyzing Pi-calculus or Haskell. Now, of course, when

  • you have to deal with existing system-level software (read Linux kernel or kernel modules)
  • you need to deal with infinite behaviours
  • you want guarantees of safety, in terms of what system calls are performed and when they are performed
  • you want these guarantees to be expressed in a manner that can be checked against a local policy
  • you can't afford to develop a general kernel-level sandbox
  • you can afford to assume that array accesses and pointer operations are safe
  • static policies for dynamic control, as provided by SELinux are not powerful enough

it starts to look like you don't have much choice.

In this case, after some thinking, I believe that my best angle would be type-and-effects, either on C (preferably) or ASM. Of course, this requires a degree of formal semantics and/or type system framework for the chosen language. Also of course, I'd be willing to go the PCC way if there's anything available. Again, preferably for C.

So far, my googling on the subject has located:

  • A message on LtU Classics, unfortunately pointing to a 404.
  • Nikolaos S. Papaspyrou's PhD thesis on the formal semantics of C, which I'm trying to read at the moment.
  • Tal and DTal, which are probably quite usable but a bit too close to the metal for my tastes.
  • Jeff Cook's formalization of C in Nqthm (I haven't been able to locate the actual document yet and I'm not familiary with Nqthm).
  • Michael Norrish's formalization of C in HOL.
  • CQual, whose documentation I'm currently reading.

Before proceeding any further, I would like to hear your opinion. Is there any important angle or previous work I've forgotten ? Is anyone around here working on something similar ? Do you have any suggestion ?

Ralph Griswold died

Ralph Griswold died two weeks ago. He created several programming languages, most notably Snobol (in the 60s) and Icon (in the 70s) — both outstandingly innovative, integral, and efficacious in their areas. Despite the abundance of scripting and other languages today, Snobol and Icon are still unsurpassed in many respects, both as elegance of design and as practicality.

FP vs. JSP

Somebody who professes to not particularly like FP finds themselves implementing one: via Artima, the StringTemplate system shows what a trained software developer can do to clean up the previously somewhat filthy world of templating. (Although I think the Perl HTML templating library I used before was "push" too, by the way; maybe it is just the JSP users who were sent down the wrong path for so long).

On Publishing

Does any on have any suggestions on where a good place to submit a paper on type systems would be? Say for example, a paper which describes the type system and type inference algorithm for a stack based compositional language.

Much appreciated,
Christopher Diggins

Delevoping a new language...

Hello, everybody.
I'm developing a new programming language, which can be described as hybrid imperative/functional/object-oriented one. It has borrowed a lot from:

-- LISP and Scheme (lists as the most important data structure),
-- Perl, Ruby and some other scripting languages (multi-dimensional arrays, hashes with any key type, regexps still to be implemented),
-- OO-features like in C++/Java (but with persistant class->object bindings),
-- finally, even with some WWW formatting and data declaration languages (like (X)HTML/XML/JS...) (The original idea was (and remains) to try replacement of static "declarative-type" languages like HTML with dynamic functional-type language, allowing construction of text/graphic page content on the run.)

(I'm not sure, is this forum suitable for discussion of such project - but if it is, i'll try to supply some more info.)

Block-scope

I've recently been puzzled by what I considered to be a design error in Javascript. Then I realized that Python behaved the same. Finally, I realized that people were aware of the issue: these languages don't have block-scope, the only scoping construct is the function. If you declare a variable inside a loop, the same instance will be used for every iteration, and it will be visible after the end of the loop.

As I said I considered this as a design error; to me, it's not natural at all. I realize that I get this feeling because of other languages that I use, mostly OCaml but also C. The other day I was discussing with somebody, trying to convince him of the awkwardness of this, but he found that normal, said that experienced Javascript programmers know this behaviour, and that he didn't see any reason why one choice is better than the other. So I started to look for a good answer showing that one definition of scoping has much better properties than another, etc. It's not that easy.

Here are the questions that I want to share:
* Anybody knows of (a document explaining) the rationale behind JS or Python's choice?
* What solution do you prefer, and which _solid_ argument would you propose for it?

I couldn't find anything concerning the first question. I found a claim that JS2 would fix that, but it didn't seem right to me when reading the specification draft. Concerning the second question, here are possible answers...

0. Conservatism

Since C/C++/... has block-scope, it would have been wise to keep it the same, and avoid the user's puzzling. Or at least forbid the var declaration inside a block, asking the user to declare it at toplevel in a function, which is equivalent anyway.

1. Expressivity

The more properties one has to reason about a language, the better it is. One such property can be: inserting some expression (under X conditions) in a sequence doesn't change the final result of a program. It can't be true if the expression declares a variable at toplevel -- some people might advocate for the let-in construct at this point. Without block scoping it also isn't true as soon as the expression declares a variable inside a block. So the property is more constrained.

2. Static scoping

Another possibility is to ask for easier static analysis. Without block scoping, the scope gets enriched after a loop which declares a variable, *if there was at least one iteration*. This condition makes the lexical environment runtime-dependent. But well, with or without block-scope, static analysis is just impossible for dynamic languages anyway..

PS: Today I got mail on the r6rs list criticizing the absence of scope for the begin construct. It seems that even Scheme made this choice..

Languages With Some Form of Implicit Subtyping

Hello all. Caveat: I would have just searched on google et al, but I am not sure that I have the terminology correct. I think that I am looking for implicit structural subtyping. Essentially, I am looking for a language that would allow something like this:

fun f o = o.ToString()
fun g o = o.ToInt()

fun test o1 o2 =
let s1 = o1.ToString() in
let s2 = o2.ToString() in
let s = s1 ^ ", " ^ s2 in

let l = [s1, s2] in
let l' = map g l in
l'

Note that this should be statically typechecked. Essentially, I am looking for something with a type system similar to O'Caml's object system with row variables, but allowing for the line [s1, s2] without use of the operator :> (or whatever it is) to explicitly state some subtype relationship. That is, l' would have the type of a list of objects implementing ToInt(), where ToInt() has type () -> '_a, that is, ToInt() takes unit and returns some type not yet determined, but that will be. A key aspect that I am looking for is broad use of inference so that one doesn't have to explicitly annotate subtyping relationships, at least not often. If you can think of languages with similar type systems, I would appreciate it. If you could note the differences or constraints as well, that would be great!

Just a note, I am not looking for a language to use, nor one with precisely the above semantics and syntax. I am mostly just interested in the theory behind such kinds of type checking. I hope that I got the name right, but pointers to other keywords to search on would help, too.

Thanks, sorry if this is not the right forum for such a post.

Programming in R

Hi,

I don't know if this is the proper place to do it but I will give it a shot anyways. I am a newbie to programming in R and am having difficulties performing trivial task which would take me seconds to code in matlab, but unfortunetley I am stuck with R. I have a matrix with 3 columns and multiple rows. I wish the have the rows of the matrix sorted by the second column in the matrix (that is the arrange the rows of the matrix, so that the second column is ordered). Any help would be appreciated

Modified javac That Processes Annotations on Local Variables

Hi everyone!

This is my first post here, so if I'm doing something wrong, please let me know and have mercy with me. But I do think I have something to share that you may find interesting: Using material available from Sun under the Java Research License (JRL), I have created a modified version of the javac compiler (javac 1.5.0_06), which I call LAPT-javac, that encodes annotations on local variables into the class files it compiles.

Pre-compiled binaries, the source code, and other useful information is available from the project website at

http://www.cs.rice.edu/~mgricken/research/laptjavac

Please note that you have to accept the terms of the JRL before you can use the files I provide.

Some background information may be useful: Java 5.0 introduced annotations, meta-data that can be attached to other parts of the program. Annotations may target other annotation types, packages, types, constructors, methods, fields (including enum constants), parameters, local variables, or any combination of the above. Depending on their retention policy, annotations may be accessible in the source code only using the apt Annotation Processing Tool (SOURCE retention); they may be encoded in the class file but remain invisible at runtime (CLASS retention); or they may be accessible at runtime using reflection (RUNTIME retention).

Unfortunately, both javac and apt completely ignore local annotations on local variables. In accordance with the grammar, they do not signal a syntax error, but annotations on local variables are neither accessible in the source or class files nor at runtime! In effect, an annotation on a local variable

void someMethod() {
  // ...
  @SomeAnnotation int myLocal = 123;
  // ...
}

behaves exactly the same as if it were written like this:

void someMethod() {
  // ...
  /* @SomeAnnotation */ int myLocal = 123;
  // ...
}

Annotations on local variables are comments, and comments only. A while ago, I realized that another project I am working on could tremendously benefit if I could at least access annotations on local variables in the class files. So I decided to take a look at the javac source code released under the JRL.

To my delight, I found that annotations on local variables were being parsed correctly, and all the information was available all through the very last stage, writing the class files. My change involved only four files (ClassWriter.java, ClassReader.java, Code.java and Name.java) and about 100 lines.

On the project website linked above, I am making available a modified version of the javac compiler (javac 1.5.0_06), LAPT-javac, that encodes annotations on local variables in the class file. I also specify the format for two new, non-standard attributes, RuntimeVisibleLocalVariableAnnotations and RuntimeInvisibleLocalVariableAnnotations, on methods and constructors that closely mirror the format of the RuntimeVisibleParameterAnnotations and RuntimeInvisibleParameterAnnotations attributes that Sun introduced with Java 5.0.

To use LAPT-javac, please download the LAPT-javac binaries file and execute it using Java 5.0:

java -jar laptjavac.jar <parameters>

where <parameters> matches the parameters on the javac command line you would normally use.

I have to admit that what I currently provide may not be very useful to most of you. Even though I encode annotations on local variables with RUNTIME retention in the class files, the original Java 5.0 reflection API is not be able to process these annotations, and I do not currently provide a modified reflection API. So unless you are directly processing class files on a binary/bytecode level like I am for my other project, you are probably not going to get anything out of LAPT-javac. In effect, right now only CLASS retention works as it should.

However, I have some experience with writing libraries that can be put on the Java bootclasspath, and I think I will be able to provide an enhanced reflection library that allows programmers to query annotations on local variables at runtime, just like it can already be done for method parameters. From what I know, apt uses javac's parser, and I know that parser reads annotations on local variables just fine, so an enhanced version of apt is definitely possible too (though less interesting to me personally). My hope is that eventually Sun will incorporate my changes or something similar into standard Java. I truly believe annotations on local variables can be very useful, as it really opens the door for usage-site annotations as opposed to definition-site ones.

I am very much looking forward to hearing your comments. Thank you for your time and interest.

Mathias Ricken
Graduate Student
Rice University
Computer Science

The US Torture Bill as C code

Slightly politcal, but the boingboing commentary is simply hilarious from a PL point of view.

From my perspective the joke is on a deeper level. You see, it's simply that C is torture.

XML feed