Intellisense for dynamic languages

I've recently been toying with various approaches for supporting some level of intellisense on a dynamically typed language, specifically Lua (a functional scripting language with lexical scoping). I'd like to bounce some ideas around the community and hear some opinions. I should pretext this by stating that I'm not really a "languages guy." I have a CS degree with the standard compilers course but no advanced experience...which is probably why I was naive enough to undertake this in the first place :P But I have a significant amount working.

Since intellisense information is based on type information, there are inherent difficulties in implementing this for dynamic languages. The problem I'm most interested in is the following.

Let's say we have a name n in the scope s which is assigned different types within a conditional statement. For example, assume N and M are valid variables with different types below.

function s()
local n

if x then n = N else n = M end

...
end

We want to be able to resolve the type of n to provide intellisense information for it after the assignment. In order to do so, it seems we must first resolve x. However, this may not be possible outside of runtime (it could be a runtime lib call for instance). A similar problem occurs when a function has multiple return statements.

The solution I'm currently considering is to present all possibilities to the user. For instance, in the example above we know n is either type of N or type of M. In an environment like Visual Studio or Eclipse the intellisense drop down could display both possibilities with a special icon or text that indicates the ambiguity. In some situations, it may be obvious to the author what the correct option is for a given case. In a nutshell, if the parser can't figure it out statically then we ask the author to disambiguate and treat the name as the disambiguated type from that point forward. At least until it's made ambiguous again by a similar sort of assignment.

I can see a somewhat controversial aspect of this being that it can make you think about types while coding in a dynamically typed language. Also, since it's implied that the parser must keep track of every assignment, scalability is a concern.

Comments? Alternatives? Will this be useful?

Cheers,
Trystan

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

explicit or implicit disambiguation?

How do you plan on handling disambiguation? By having the user select from a pop-up menu whether n is say a string or a number, or based on which operation the user selects to perform on n?

I would recommend the latter: in the intellisense popup, show the operations that can be performed on all of n's possible types (maybe in bold) as well as those that can be performed on at least one of n's possible types (maybe in gray). If the user selects an operation or method that works with all of n's possible types, great. If the user selects an operation that only works with some possible types, then you've narrowed down n's possible type. You might look into union types and type inference for more ideas.

Now, the case you mentioned could actually occur either because the IDE doesn't know enough about the program, or because the program actually has a lurking type error that may be manifested at run time. Are you also planning on providing indicators for possible bugs?

I imagine I'll need both

I imagine I'll need both depending on the case but for most cases I think it will need to be explicit. For instance, a '+' operator is valid on multiple Lua types. Although in other cases, as with the '.' operator, the type can be implicitly known.

I like your idea regarding the "all or at least one" operations. One issue I need to face is if N and M in the example are both Lua tables. I don't think Lua really treats different tables as different types but I think it's desirable for my purposes. So, if n is an ambiguous table (could be M or N) and the programmer types 'n.' then the pop-up can show the members of N AND M as you suggest.

I don't plan to provide indicators for possible bugs at the moment. Maybe in the future though.

Thanks.

How about in a debugger

If you have a visual debugger that lets you edit code while the program is at a breakpoint, then you'll have runtime type information available. Of course the danger would be that the user will enter code that won't work next time around because the type will be different.

I like Sameer's idea of color-coding operations that are common to all possible types.

I guess another option could be to let the user stick in some kind of type assertion at the top of some block of code (e.g. assertType(obj, MyWidgetType)) to clue intellisense in to what type you expect.

Intersection types or abstract interpretation

Off the top of my head, I would suggest looking for either intersection types or abstract interpretation (both were discussed on LtU several times).

With intersection types you would not bother to know which branch of the if statement is executed, you just assume both are possible, so n has type MT ∩ NT (MT intersected with NT - inhabited by all values that are both MT and NT - where MT and NT are types of M and N).

With abstract interpretation you try to "execute abstractly", informally meaning you only resolve enough information to give you the answer you want (e.g., you are not interested in values of M or N, but you want yes/no differentiation for x).

Hope that helps.

You will probably want to

You will probably want to read upp on success typings for ideas on how to implement this.

Also think about interfaces....

Whether explicit or implied. In you code sample shown above, you can only call the intersection of method names or you'll throw a runtime error. Even if you're returning n to the calling function, the same issue is applicable.

In practice, it seems like most auto-complete for dynamic languages uses reflection to interrogate objects and then ends up guessing.

Some possibly related work

While looking for something else, I stumbled upon "Towards Type Inference for JavaScript" by Christopher Anderson, Paola Giannini, and Sophia Drossopoulou: Citeseer link, which attempts to analyze (a subset of a simplified form of) Javascript with the goal of type-checking flexible objects. There is also Peter Thiemann's "Towards a Type System for Analyzing JavaScript Programs", which I haven't read although it sounds interesting (Springerlink abstract).

The former paper also has some interesting references in its bibliography which might be worth investigating.

Applying the methodology to Lua should be straightforward, but I suspect that the absence of Lua syntax for declaring prototype relationships would make the task rather more difficult. (In Lua, a prototype relationship is established by associating a table of metamethods with a simple table; one of the metamethods extends member lookup either by deferring lookup to a prototype table or by providing a function to implement member lookup.) There are standard Lua patterns for creating such "types", though, so it might be possible to use some simple source annotation to ease the job of source analysis; this would certainly be good enough for "intellisense".

Both Lua and Javascript allow access to undefined table keys. (Lua simply returns nil while Javascript returns the undefined value.) In Lua, as mentioned above, access of an undefined key may trigger an arbitrary Lua program which could produce a run-time error; however, Lua style is to return nil for undefined keys.

In effect, this means that every Lua table is (semantically) a complete mapping of all Lua objects into the set of Lua objects -- in other words, not much different from a function. This is what gives Lua its unique flavour; it is both extremely flexible, and extremely resistent to analysis. (Undoubtedly LtU readers will differ in their opinions of whether this is a Good Thing; I'm not going to comment here on that question. I'm simply trying to present the semantics.)

There have been some attempts to provide tab-completion (as we used to call "intellisense") in the context of a Lua interpreter, through run-time enumeration of table members. While this can be implemented adequately, it is not perfect; unlike Javascript, there is no defined mechanism for enumerating all the keys which a table might map to a non-nil value (and, indeed, there is no a priori reason to believe that a mapping to nil is not meaningful in the context of a particular program). Furthermore, it's possible that triggering a table's lookup metamethod might alter the state of the table object.

Doing this analysis statically in the context of an IDE does not necessarily suffer from the above problems. It does not need to be 100% accurate to be useful, particularly if there were some mechanism (source code annotations or retained programmer-provided "hints") to correct deficiencies.

Self

The Self papers on Type Analysis (see also the thesis) may also be helpful/interesting along this line.

What about eval statements ?

I just tried Lua once some years ago ... so I'm not sure about that. But I assume it has some kind of eval statement.
As python has.
x = "5 * 3"
y = eval(x) #=> 15

And it seems pretty impossible to tell the type of an eval statement that evaluate a string that is built at runtime.

Well anyway, even if in some cases you cannot tell the type of a variable for sure, you're doing something very useful.
This is one the thing that python miss IMHO.

Indeed that would be more work

Lua does have an equivalent, loadstring( string ). The return value of which can be invoked like a function to carry out the evaluation. I'm probably not going to worry about this case for now. Also, there's a good chance that when someone uses loadstring, they're getting the string from a source that's not available during static analysis, such as a remote location.

btw, Thanks for all the input folks! This gives me a lot to think about.

Type of an Eval Statement

The type of an eval statement could theoretically be generated by running a compiler during run-time. Not many compilers are reentrant at runtime though. This gives me some good ideas though!

Lua's compiler is indeed re-entrant at runtime

The Lua standard function loadstring actually invokes the compiler, returning either a function or an error indication. In theory, the returned function could be annotated with a guess about the type which it will return, but I don't see how that would be any better than simply examining the return value(s). That would be different if it were possible to annotate with a precise type judgement.

Plugging Haskell In

You may be interested in Plugging Haskell In, then.

Also see Norman Ramsey's papers on Lua-ML

which you can find here.

The type of eval

...simply is

    String -> exists T.T

However, this is not very useful unless you have some form of dynamic type analysis (a case on types) to discover the actual witness type. Which brings you to type Dynamic.

If you don't mind the plug, in Alice ML we support eval via such a construction.

On the other hand, I have to say that I'm somewhat sceptical that eval actually is useful in serious code...

Hm.

Databases (stored procedures), word processors (e.g., Emacs/Word), IDEs (Eclipse), web browsers (plugins/scripts), web servers (Apache modules), and operating systems (loaded modules) all extend the running system with new code. I think that you can make a case that the dynamically-extensible systems is the only successful architecture for large programs.

I would be really, really happy if I had an emacs I could program with ML.

Well, I agree

But I don't think this is equivalent to eval. Most of your examples rather employ a form of embedded interpreter, which is a weaker but safer approach. And of course, dynamically loading code does not necessarily imply runtime interpretation/compilation at all (e.g. with OS modules). But you have a point in that it is sometimes hard to define the boundaries between these approaches.

EDIT: I also would like to point out that even dynamic creation of program components does not necessarily require online compilation/interpretation. In Alice ML for example, you can dynamically compute, create and export components as first-class values, without the need to go through the source language.