Julia has no dependent types

https://github.com/JuliaLang/julia/issues/6113

sorry if this has already been posted; i don't read here much and ltu was mentioned in the link above (but i can't find anything).

i want to ask some questions about my limited, practical, non-theoretical understanding of dependent types:

it seems to me that in julia i can define a datatype (something like a struct in c), that has an associated parametric type that contains information about the data. so, for example, i could put the length of an array in there. and then, as long as any code that constructs or modifies an instance of that datatype follows the contract implicit in the associated parametric type, the code will work correctly and i can dispatch on array length.

now, it seems to me that this differs from how i expected dependent types to work, in that there is no inspection of the actual data value on dispatch. when i dispatch on the array length i am using information in the parametric type associated with my datatype that i trust is correct. but there is, at the moment of dispatch on that type, no inspection. length(array) is not called.

so the correctness of my code depends on the correctness of functions that construct and modify the datatype, rather than on functions that inspect it.

so my questions are:

1 - do other dependent type systems inspect the data?

2 - is this just an implementation detail, or is it connected with what was argued about in the link?

3 - please can someone explain the link like i am five? if the above is not connected with the argument, then i don't get it... in particular, julia often (usually?) compiles code specific to particular types. it doesn't do type unification (HM), afaik, but it does propagate types from annotations, and all concrete types are final (leaf), so the compiled code is "static" in some sense.

thanks,
andrew

[edit: i don't know in detail how julia works, but my impression is that although it's called a "jit" it's basically a static compiler that is triggered only when dispatch on a new combination of function and types is encountered. wherever possible it extrapolates type information to compile static code with fixed types. this is why it is fast.]

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Ideally, dependent types are

Ideally, dependent types are proven statically - i.e. so they don't need to be checked at runtime. Doing so provides better performance and nicer guarantees. The best dependent type systems do not inspect data at runtime. In some cases, a mixed approach can be used, where some checks are performed at runtime when they could not be validated statically. Even in this case, it is ideal that a minimal set of checks is identified and implicitly pushed to the very edge of the subprogram, i.e. gatekeeper pattern. If 'dependent types' is used in a fully dynamically typed model, then the types don't really achieve anything useful - other than documentation and a sort of fail-fast to help eliminate bugs.

I don't really follow Julia, so I don't know where it falls on the spectrum from static to dynamic.

Julia is dynamically typed

Julia is entirely dynamically typed, but there are two caveats. As a matter of implementation rather than model, the Julia compiler uses type inference to guide code generation (it uses the LLVM JIT). Unlike many dynamically typed languages, Julia's types can be parameterized by either types or integers: thus Array{T, n} is the type of an array with element type T and n dimensions. This is the dynamic analogue of (static) dependent typing.

Any "compound" type can be

Any "compound" type can be made dependent. Here 'compound' means a type made out of other types. Sticking with tradition, we'll talk about function types "A -> B" (which are 'compound' because they have some input type "A" and some output type "B") and pairs "A x B" (which are 'compound' because they contain the type of their first element "A" and the type of their second element "B"). Being "dependent" means that the second *type* is calculated using the first *value*. I'll explain what this means below.

Dependent types are analogous to "let". For example, the following pseudocode should hopefully be clear enough:

let (x : Nat) = 10
 in square(x)

It declares a variable "x" of type "Nat"(ural number), gives it the value 10, then calls a "square" function on this variable, presumably squaring the number, to produce the overall result. If we think about what's happening here, we have some context(/environment/scope/namespace/whatever you want to call it) which maps variable names to values. The "let FOO = BAR in BAZ" construct is saying "My value is whatever BAZ evaluates to in a context where FOO maps to BAR".

Now, dependent types work in a similar way, except they're "static", meaning they don't have any access to (runtime) data. If we get rid of the data (the "10") in the above code, we get the following:

let (x : Nat)
 in square(x)

Now, we're meant to be generating a type rather than a number, so let's change the function we call (nothing else has changed):

let (x : Nat)
 in SomethingToDoWith(x)

Where "SomethingToDoWith(x)" generates a type involving the value of "x" (in the same way that "List a" generates a type in Haskell involving the value of "a", and "List<t>" generates a type in Java involving the value of "t"). The crucial difference is that "x" is a regular program variable rather than some meta-thing.

So, how do we get a value for "x"? There are two ways: either we can make a function which takes "x" as an argument and spits out a "SomethingToDoWith(x)", or we can make a pair where the first element is "x" and the second is a "SomethingToDoWith(x)". We can write them like this, respectively:

 for-all (x : Nat)
there-is SomethingToDoWith(x)

here-is (x : Nat)
    and SomethingToDoWith(x)

Values of these types look something like these:

function(x : Nat) {
  return some_way_of_building_a_SomethingToDoWith(x);
}

(x, some_way_of_building_a_SomethingToDoWith(x));

These are dependent functions and dependent pairs. Now, if we replace "SomethingToDoWith(x)" with some useful type, like "IsEven(x)" or "Base64EncodingOf(x)", or we get more elaborate like "ListOfLengthAndType(n, t)" or "ElementInList(e, l)", then we can start to see why this is useful.

The reason we need both dependent functions and dependent pairs is that the former are useful when we want to relate the output of a function to its input, the latter are useful to pass values around without breaking their relations (and because clearly there are some relations which don't apply "for-all" values).

so given you can do all that

so given you can do all that in julia, that thread is really just someone throwing a hissy fit about whether not the words "dynamic analogue of" (as in john cowan's post below) appears before "dependent type"?

and it seems like no-one cares about what worries me, which is the quality of the assurance i have that the types are actually correct.

[edit:] it just dawned on me that my worries are related to referential transparency. i guess most (all?) languages with dependent types are RT? julia isn't.

Can you really do that in

Can you *really* do that in Julia? Please provide examples, since I'm not too familiar with it.

While it's easy to calculate types dynamically, it's also pointless since they have no computational content (they cannot affect the behaviour of the program). The (main) thing that types are good for is type-checking, and the (main) thing that type-checking is good for is finding type errors, but I'm struggling to understand how that would work dynamically. A type error means that we're trying to use some value in a context that forbids it. This is fine if our program is well-typed, but if type-errors are only spotted just-in-time then clearly nothing is forbidden: if I specify that my function is Int -> String, I can still pass in a Bool because the types haven't been checked.

There are only three ways to implement this that I know of:

Unityped languages, where every value has the same type (Int+Bool+Function+String+Error+...), in which case there's no point type-checking (since every program is well-typed) and there's certainly no point in dependent types (since they can't narrow anything down).

Making every type nullable and just producing NULL (or "Error" or "Exception" or whatever) whenever something goes wrong. Dependent types would be of little use in such a language, since every specification can be trivially implemented by NULL.

Causing a hard crash. Dependent types would be useful in this context, but such a language would probably cause frustration to developers.

ok, maybe i haven't

ok, maybe i haven't understood. but when you said

function(x : Nat) {
  return some_way_of_building_a_SomethingToDoWith(x);
}

i thought of constructors in julia. for example, i have been writing a library to do modular arithmetic. there's simple example code at https://github.com/JuliaLang/julia/blob/master/examples/modint.jl

there, you can see a datatype ModInt, whose type (tag) is parameterised by the modulus. the constructor

ModInt(k) = new(mod(k,n))

tells you how to make a new value of type ModInt{n}(k) (which is "something to do with n"). so, for example,

gf2_1 = ModInt{2}(5)

creates a new member of GF(2) with value 5%2 = 1.

now, if i create another instance

gf5_0 = ModInt{5}(5)

which is in GF(5) with value 0, and then try to add these

gf2_1 + gf5_0

i will get (when the program runs) an error saying that no suitable method exists, because they are different types (ie different fields; different moduli) and addition is defined only for instances of the same type:

+{n}(a::ModInt{n}, b::ModInt{n}) = ModInt{n}(a.k+b.k)

both instances must have the same modulus n (which is introduced as a type variable kind-of-thing by the {n} before the parameter list).

i hope that's clear and fits your earlier description.

andrew

ps it may seem like the modulus is trapped in a different level (or scope or whatever) because it appears above inside braces rather than parens. but you could then define something like

function make_me_a_number_modulo(n, k)
    ModInt{n}(k)
end

which makes everything clearly at the level of values in the code.

Aha!there, you can see a

Aha!

there, you can see a datatype ModInt, whose type (tag) is parameterised by the modulus.

A tag is not a type. A tag specifies which component of a sum type is being used, but the type is still a sum type. For example (in Haskell):

data UnitType = UnitVal
data Sum a b = Left a | Right b

type Bool = Sum UnitType UnitType

true = Left UnitVal
false = Right UnitVal

Here "true" and "false" both have type "Bool", but they have different tags ("Left" and "Right"). The most important difference between types and tags is that I cannot use types to influence the behaviour of the program: there is no way for me to write a function which behaves differently for different argument types. The proof of this is trivial: my function will only ever receive arguments of one type, its input type!

If I want my function to accept, say, Strings and Ints, then my input type is "Sum String Int", which again is still just one type; although it just-so-happens to be a compound type. So again, there is no way to alter my behaviour based on type, since there is only one type. What I *can* do is look at the tag and branch based on that, but tags are not types as I pointed out in the true/false example above.

Now, let's say I have a 'Sum String Int' and I branch on it:

function(x : Sum String Int) {
  return case x of
    Left s => string_func(s)
    Right i => int_func(i);
}

Now I can run a string function on strings and an int function on ints, so I have gained type information by branching on the tag, but this still doesn't mean that I've branched on the type!

i wil get (at run time) an error saying that no suitable method exists, because they are different types and addition is defined only for instances of the same type

Hopefully the explanation above has clarified that these values do not have different types, although they do have different tags. I'll assume that the language provides us with an "Any" type, since that's how all scripting and class-based languages work (although they don't allow us to access it, because it provides no information). As a slight alteration, I'll give our tags some parameters:

data Any = S String | I Int | E Error | F Function | Mod n Mod{n} | ... ;

Now we can write the following function, which seems to distill the essence of your example, whilst making the dynamic dispatch explicit:

function add(x : Any, y : Any) {
  return case x of
    Mod n x_mod => case y of
      Mod m y_mod => case n == m of
        true => Mod n ((x_mod + y_mod) % n)
        false => E DifferentModError
      _ => E SecondArgumentNotModError
    _ => E FirstArgumentNotModError;
}

This function accepts two arguments of type "Any" and returns a value of type "Any". These types have no impact on this function's behaviour, but since "Any" is a sum type, the arguments have tags which *are* used to determine the function's behaviour.

As we can see, everything has type "Any", so there is no point doing any type-checking. Hence there is no point in providing dependent types in such a language, since they won't be checked.

What Julia seems to be doing is providing an implicit dynamic dispatch feature, ie. implementing the above branching automatically, so we only need to provide the expressions on the right-hand-side of the "=>" (usually known as "methods"). In which case, the parameterised tags allow us to narrow down which branch is chosen by Julia. This has nothing to do with types, and is just a particular kind of pattern-matching. For example, I can easily pattern-match on any data I like:

function only_mod_ten(x : Any) {
  return case x of
    Mod 10 x_mod => I x_mod
    _ => E NotModTenError;
}

function only_ten_mod(x : Any) {
  return case x of
    Mod n 10 => I n
    _ => E NotTenModError;
}

When the language pattern-matches for us, based on tags (eg. the way methods are dispatched on classes) then we lose this specificity. It seems that Julia has added a bit of that ability back, albeit in an explicit, verbose way; ie. the two arguments to 'Mod' seem to be treated differently. Presumably I can't write a method implementation for Mod{n}(10), ie. an implementation which will be called whenever the second argument is 10, regardless of the first?

Another potential improvement would be allowing unification, ie. by repeating the same variables in a pattern-match:

function add(x : Any, y : Any) {
  return case x of
    Mod n x_mod => case y of
      Mod n y_mod => Mod n ((x_mod + y_mod) % n)
      _ => E SecondArgumentError
    _ => E FirstArgumentError;
}

This lets us avoid the explicit equality check. From Julia's point of view, it would mean a method implementation like "add(Mod{n}(x), Mod{n}(y)) { ... }", ie. where both arguments share the same "n".

just having read the first

just having read the first part of your reply, which tries to explain the difference between tags and types, it seems to me that you could say the same about any parametric type because

function proof{Unconstrained}(x::ParameterizedType{Unconstrained})
  if Unconstrained == ...
  ...
end

so from your argument it would be wrong to call anything that goes inside the {...} a dependent type? they are, instead, tags? and so i need to create the "name part" of the type ("ParameterizedType" above) from a value?

if so, the only way to do this in julia would be to use macros.

---------

and reading the rest of your comment, yes, you could argue that this is a kind of matching.

but now i am wondering about john's comment. isn't this still "analogous" to dependent types in a dynamic language? given that it's a dynamic language anyway, how can you get closer to dependent types in a static language?

is the problem then more about claiming something that is simply impossible in a dynamic language, or is there still a useful analogy here?

i suspect you have already implicitly answered the last points above, but i am in a rush for work so will need to think later.

thanks again,
andrew

I think that calling the

I think that calling the contents of "{...}" a 'dependent type' is misleading/confusing, since even 'proper' dependent types wouldn't call it that. It makes sense to call it a 'dependency', since "x" depends on it. In a 'proper' dependent type this would be called an 'index' or a 'parameter' (these mean pretty much the same thing, I didn't know the difference until asking [1] ).

Since we're comparing static and dynamic type systems, I think it's important to point out a conceptual difference between my use of an "Any" type and your naming a variable "Unconstrained".

First let's imagine a compiled, PHP-like language, with slightly stronger type-checking. For simplicity we'll ignore classes except for "Object". We might say:

$my_function = function(int $x, int $y) {
  return $x + $y;
};

Note that in PHP a function is an object, so we can't say "$my_function(5, $my_function)"; our code will not compile, due to the type error. Now, let's say we get rid of non-object types and have all integers, bools, etc. be objects as well. In this case we get:

$my_function = function(Object $x, Object $y) {
  return $x + $y;
};

Now we are free to say "$my_function(5, $my_function)". In fact, we're free to pass anything around without ever having a type error. This is identical to the "Any" type I used above, ie.:

$my_function = function(Any $x, Any $y) {
  return $x + $y;
};

Now, since *everything* has the same type (the language is "unityped"), there's no point putting type annotations anywhere! Every type can be inferred trivially, since it's always "Any". Hence, most dynamically-typed languages never mention this type anywhere, do not allow it to be referenced from within the language and hence do not support type annotations, basically because there's no point in doing so. If I were to name my variables in a way that mentions their type, then *every* program would look like this:

any1 = function(any2, any3) {
  return any4(any2, any3);
};

Hence your "Unconstrained" variable has type "Any", but so does x!

In the same way that 'fish have no word for "water"', this unitype property tends to go completely unnoticed by dynamic language users. It is only when a unityped language is compared to a language with a few different types, like Java (int, float, bool, object and a few others), or user-definable types, like Haskell and ML, that confusion sets in. This is because, looking for something in the unityped language to compare these types to and failing to see Any (since it's never mentioned!), usually ends up with false comparisons to tags. That's what's happened here.

I highly recommend sections 17.4 and 18.3 of [2] which discuss this with greater rigour.

Now, with this in mind, we can see that the variable "Unconstrained" has type Any, and the variable "x" *also* has type "Any", since everything does. We also know that x will have the tag "ParameterizedType{Unconstrained}", since that is the tag which told Julia to jump to this code. We're then free to do anything we like with these values, eg. checking if "Unconstrained" is equal to something.

This is clearly useful for dispatching, just like multimethods and other dynamic dispatch mechanisms, but I don't think there's much point in adding 'proper' dependent types; the only thing that dependent types offer is the propagation of type information, in order to constrain how our functions can be called and to justify the ways in which we call them. Dynamically typed languages don't allow this. In a dynamically-typed language, errors are a kind of runtime value, just as legitimate as any other kind of value. In a dependently typed language, errors are caught by the compiler and we never get as far as running the program.

As an analogy, imagine doing the same thing for "Maybe" types. There's not much point in a dynamic language having "Maybe" to indicate the possibility of an error, since everything already has the possibility of being an error. I've never seen a tutorial about Maybe in a dynamically-typed language which covered Maybe's purpose: indicating the possibility of an error. Instead, the tutorials focus on how to pattern-match, or how to avoid explicit error-handlers using map (Functor) or bind (Monad) functions. In fact I remember reading a tutorial about "Monads" which was actually about pattern-matching [3]!

The same applies to dependent types, I think.

[1] http://cs.stackexchange.com/questions/20100/what-are-the-difference-between-and-consequences-of-using-type-parameters-and-ty
[2] www.cs.cmu.edu/~rwh/plbook/book.pdf
[3] http://objitsu.blogspot.co.uk/2013/01/php-and-maybe-miserable-monads.html

Types as stage-escaping values

Recently, when designing a multi-stage programming model, I've found a new respect for using the word "type" to refer to a dynamic tag. Type information, unlike most other information, seems to flow from later stages to earlier ones, letting us obtain knowledge now if we demonstrate that we could definitely obtain it later.

I'm imagining this kind of flow could be incorporated into a language design alongside other looping constructs, conditionals, etc. If this can happen, I think certain bundles of typelike and inhabitantlike values will behave just like a dynamic language's type-tagged structures.

For a language like Julia, which apparently supercharges its type tags for useful purposes, I think "dependent type" could be a great term to describe its features, even if it's a term that usually implies languages much closer to Coq, Agda, and Idris.