## Question concerning parameterization over literals

We're currently looking at something in BitC, and I'm wondering if there may be a simpler way. I'm concerned that we may have gotten stuck in a "we know how to do types now, so everything is a typing problem" sort of rut.

### Background

In BitC, we have both vectors and arrays. The size of an array is part of its type, but we presently have no way to abstract over that size. This means that we cannot currently build a type class for something like (length expr) and instantiate a version of length that operates on arbitrary arrays. Perhaps more importantly, it means that we cannot express a low-level read function that puts data into an existing array in-place, because we cannot abstract over different possible array sizes.

After some head scratching, we concluded that there is no fundamental problem with extending typeclass-style constraints and instantiation to incorporate instantiation over literals. The underlying observation is that every literal can be considered to be a member of some singleton type, and you can then abstract over that type in the usual way. So far so good.
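For intuition, the same observation can be sketched in C++ terms, where `std::integral_constant` plays the role of a literal-associated singleton type (this is an analogy for illustration, not BitC; the names `Lit` and `doubled` are mine):

```cpp
#include <type_traits>

// Each integer literal N gets its own singleton type whose sole
// inhabitant is N itself; abstracting over the type abstracts
// over the literal.
template <int N>
using Lit = std::integral_constant<int, N>;

// A function parameterized over the singleton type rather than
// over a runtime value.
template <typename L>
constexpr int doubled(L) { return 2 * L::value; }

// The check happens entirely at the type level.
static_assert(doubled(Lit<21>{}) == 42, "singleton literal types");
```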

But as we dug into it a bit, we somehow moved from "abstraction over literals" to "abstraction over literal-associated singleton types". For example, our old specification of the array type constructor was:

```
(array T e)
```

where e must be a literal of type word. The new specification seems to have become:

```
(array T T.len)
```

where "T.len" is required to be a singleton literal type. In this case, that would be a constrained type of the form 'a\(Literal 'a word). So an abstracted "read" function would now be something like:

```
read: (forall ((Literal 'len Word))
        (fn Stream (by-ref (mutable (array byte 'len))) 'len -> Word))
```


meaning that read accepts a stream, an array by reference, and a length guaranteed to match the array length, and returns the number of characters read.

All well and good, but from a usability perspective we still want to be able to write something simple like:

```
(array char 2)
```


For the moment, there are only a very few syntactic contexts where this would make sense, and Swaroop has proposed that we accept (interchangeably) either a singleton literal type or a value of singleton literal type and "lift" the value to its type in these syntactic contexts. And I think all of that will work.

### The Question

While all of this ought to work, it feels like we are going around Robin Hood's barn here. Are there other languages that have dealt with this, and if so, what should we look at to understand better what has previously been done?

## Comments

### Length-sensitive array types considered harmful

I believe Pascal was the last language to say that an array's size was part of its type; that design strategy didn't work too well, and later languages fell back on run-time array size checking (or the lack of it, as in C/C++). Have a look at Brian Kernighan's famous paper "Why Pascal Is Not My Favorite Programming Language" at http://www.lysator.liu.se/c/bwk-on-pascal.html , where he gives details of the problem and says:

> This botch is the biggest single problem with Pascal. I believe that if it could be fixed, the language would be an order of magnitude more usable.

### Many counterexamples

BitC has vectors if you prefer that route. The reason arrays are in there is that there are far too many use cases for them in the context of being compatible with pre-existing data structures that come from the C world.

### Number-Parameterized Types

Not sure, but it sounds like this problem is an instance of Number-Parameterized Types. The Lightweight Static Capabilities paper applies this technique to the problem of array bounds checking.

### Monomorphism will likely kill you

You can kindasorta get there in C++ by wrapping an array in a class (thus eliminating the array/pointer lossiness that bedevils raw arrays) and using templates to handle the size.

This example is a toy and isn't very robust C++ code, but it serves the point.

```cpp
#include <cstddef>

struct ArrayBoundsException {};  // placeholder exception type

template <typename T, std::size_t Length>
class FixedArray
{
private:
    T rawData[Length];
public:
    // appropriate constructors

    const T& operator[](std::size_t index) const
    {
        if (index >= Length) throw ArrayBoundsException();
        return rawData[index];
    }
    T& operator[](std::size_t index)
    {
        if (index >= Length) throw ArrayBoundsException();
        return rawData[index];
    }
    bool empty() const { return Length == 0; }
    std::size_t size() const { return Length; }
};
```


So far, all well and good. Bounds checks still occur at runtime, but FixedArray<T,5> is a different class from FixedArray<T,6>. If someone wants to use further template hackery to encode either a compile-time assertion or a range-restricted int class, one can in many cases eliminate the runtime bounds check, or at least move it somewhere more sensible.
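As a sketch of that compile-time-assertion route (using C++11 `static_assert` and a hypothetical `get<I>` accessor of my own devising, not part of the toy class above):

```cpp
#include <cstddef>

template <typename T, std::size_t Length>
class FixedArray {
    T rawData[Length];
public:
    // When the index is known at compile time, the bounds check
    // moves into the type system: an out-of-range I fails to
    // compile, and no runtime test is emitted at all.
    template <std::size_t I>
    const T& get() const {
        static_assert(I < Length, "index out of range");
        return rawData[I];
    }
    template <std::size_t I>
    T& get() {
        static_assert(I < Length, "index out of range");
        return rawData[I];
    }
};
```

With this, `FixedArray<int, 3> a; a.get<2>();` compiles, while `a.get<3>()` is rejected before the program ever runs.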

The problem, though, is that since arrays of different sizes are different types, you run into the problems that made Pascal programmers wince. One way around it is to start writing substring functions that return an array of the same base type but different length.

But here, monomorphism kills you. Depending on the goodness of your compiler, you may find that a simple recursive operation over a million-element array will instantiate into existence a 999999-element array, a 999998-element array, reductio ad absurdum. In the toy class above, these are all inline; but many nontrivial int-parameterized classes aren't.

If for some reason you need an array bigger than the one you currently have--say you add a (functional) push method which takes a T and a FixedArray<T,n> and returns a FixedArray<T,n+1>--you might wind up with an infinite regress if you are not careful.
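A sketch of such a push, restated with `std::array` so it stands alone (the FixedArray of the toy above would behave the same way):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Functional push: returns a copy one element longer. Every
// distinct N used at a call site instantiates a brand-new
// array type of size N+1 -- the seed of the regress described
// in the text.
template <typename T, std::size_t N>
std::array<T, N + 1> push(std::array<T, N> const& xs, T const& x) {
    std::array<T, N + 1> result{};
    std::copy(xs.begin(), xs.end(), result.begin());
    result[N] = x;
    return result;
}
```

Note that a loop pushing a *runtime* number of elements cannot even be typed here, since the result type would depend on a runtime value; a compile-time recursion over push instantiates one template per step.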

The solution most languages take is size-lossy arrays--which moves the nasty intractable and non-terminating cases from the typechecker to the running program.

### We can probably summarise

We can probably summarise this as "you can't just abstract over literals without still suffering most of Pascal's problems" and "if you go any further you just got yourself dependently-typed programming"?

### "you can't just abstract

"you can't just abstract over literals without still suffering most of Pascal's problems"

Is that really the case? Aren't these problems really caused by the fact that an indexer in Pascal is not a first-class value?

For instance, in the Lightweight Static Capabilities paper linked above, any integer values have to be validated before being used as indexers. If they pass validation, they can then be passed around like ordinary indexes. Certainly there are limitations to this, but this would seem to strictly generalize Pascal's semantics.
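A rough sketch of that validate-then-trust pattern in C++ terms (the names `CheckedIndex`, `validate`, and `at` are mine, not from the paper, and C++17's `std::optional` stands in for the paper's typing machinery):

```cpp
#include <array>
#include <cstddef>
#include <optional>

// A CheckedIndex<Length> can only be obtained through validate(),
// so any code holding one may index without re-checking.
template <std::size_t Length>
class CheckedIndex {
    std::size_t i;
    explicit CheckedIndex(std::size_t i) : i(i) {}
public:
    static std::optional<CheckedIndex> validate(std::size_t i) {
        if (i < Length) return CheckedIndex(i);
        return std::nullopt;
    }
    std::size_t value() const { return i; }
};

template <typename T, std::size_t N>
T& at(std::array<T, N>& xs, CheckedIndex<N> i) {
    return xs[i.value()];  // no bounds check needed here
}
```

Once validated, the index is an ordinary first-class value that can be stored and passed around, which is exactly the property the comment above claims Pascal lacked.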

### Apparently I just wasn't clear enough :-)

"you can't just abstract over literals without still suffering most of Pascal's problems"

Is that really the case? Aren't these problems really caused by the fact that an indexer in Pascal is not a first-class value?

Emphasis on the just - doesn't have to be first class, but you do have to be able to pass it about.

### Yup. :)

Which is not necessarily a bad thing.

### What exactly is the problem?

An example (using boost::array which is essentially the same as the shown array, but compatible with the standard algorithms):

```cpp
#include <algorithm>
#include <boost/array.hpp>

template <typename T, int n, int m>
boost::array<T, n + m> concat(boost::array<T, n> const& x, boost::array<T, m> const& y)
{
    boost::array<T, n + m> result;
    std::copy(x.begin(), x.end(), result.begin());
    std::copy(y.begin(), y.end(), result.begin() + n);
    return result;
}
```


Isn't the problem of Pascal's arrays only that you can't write something like template<int n> in C++?
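For concreteness, a call site for that concat, restated here with `std::array` (the standardized form of `boost::array`) so the sketch is self-contained:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Same concat as above, with std::array in place of boost::array.
template <typename T, std::size_t n, std::size_t m>
std::array<T, n + m> concat(std::array<T, n> const& x,
                            std::array<T, m> const& y) {
    std::array<T, n + m> result{};
    std::copy(x.begin(), x.end(), result.begin());
    std::copy(y.begin(), y.end(), result.begin() + n);
    return result;
}
```

At a call such as `concat(a, b)` with a 2-element and a 3-element array, the sizes are inferred from the argument types and the result type `std::array<T, 5>` is computed by the compiler; no instantiation has to be written by hand.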

### I'm curious

How would it work when you don't know the length of the returned array in advance?

For example, a filter function f(array[n] of T, predicate(T)->bool) -> array[m] which returns an array composed of the elements for which the predicate function is true.
All you can know at compile time is that m will be less than or equal to n.

### You can't write such a

You can't write such a filter function returning an array with the correct size in C++ as it's unknown at compile-time. You'd have to return a vector, linked list, deque or any other collection with a dynamic size as in other languages.
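A minimal sketch of that fallback, assuming `std::vector` as the dynamically sized result (the name `filter` is mine):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// The input length N is known statically, but the output length
// depends on the predicate's runtime behavior, so the result
// falls back to a size-lossy container.
template <typename T, std::size_t N, typename Pred>
std::vector<T> filter(std::array<T, N> const& xs, Pred p) {
    std::vector<T> out;
    for (T const& x : xs)
        if (p(x)) out.push_back(x);
    return out;
}
```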

Off-topic, but if you are curious what you'd normally do in C++ (it's not interesting from a type-system point of view): you normally don't program algorithms for a particular collection; instead you work with pairs of iterators representing a subrange of a collection (actually not only of a collection, but perhaps a file or whatever).
In this case you'd use std::remove_if to filter the whole array (well, it's the opposite of filter, as it removes the elements for which the predicate is true).

```cpp
newEnd = std::remove_if(array.begin(), array.end(), predicate);
```


std::remove_if reorders the collection (or whatever is behind the iterators) so that the elements for which the predicate returns false are in the range [begin, newEnd).

If you don't want to mutate the collection you can use std::remove_copy_if which additionally takes an iterator to the begin of another range in which the result should be placed. You have to make sure on your own that the target range is big enough to hold the entire result!
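A sketch of that non-mutating variant; note that using `std::back_inserter` as the destination sidesteps the "big enough" requirement by growing a vector instead of filling a fixed target range (the wrapper function `keep_odd` is mine, for illustration):

```cpp
#include <algorithm>
#include <array>
#include <iterator>
#include <vector>

// Copies the elements for which the predicate is false into the
// destination; the source array is left untouched.
std::vector<int> keep_odd(std::array<int, 5> const& xs) {
    std::vector<int> out;
    std::remove_copy_if(xs.begin(), xs.end(),
                        std::back_inserter(out),
                        [](int x) { return x % 2 == 0; });
    return out;
}
```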

### I think it's a mistake to

I think it's a mistake to assume that you end up with a more convenient general-purpose language by avoiding dependent typing. You probably know that all of these individual design decisions you're making can go away if you just go dependent.

Hopefully we'll have a new Ynot release soon that shows just how straightforward low-level programming with dependent types can be.

### That will be interesting to see...

In particular, I'll be interested to see how you get sound and complete inference out in usable form.

Who said anything about "complete inference"? :)

### Dependent Soft Typing?

... at least that's what 'incomplete' inference for dependent types brings to my mind.

### So are there pointers?

My hope for this topic was that someone could point out prior solutions and/or publications that we may have missed. Any pointers?

### Dependently typed languages

DML (Dependent ML), for example, was built around exactly this idea: avoiding array bounds checks using static information on sizes available in types. The technology of singleton types was also further studied by Frank Pfenning and Bob Harper in different settings; have a look at their pages.

### Yes, we fell down this hole

Yes, we fell down this hole in Cyclone as well and ended up with a design similar to DML at first, and then found this too limiting due to the lack of flow-sensitivity in the constraints. So we moved to something more like dependent types to handle the constraints. (That's what spawned the work on Hoare Type Theory and now Ynot.)

You might also check out Iavor Diatchki's thesis for doing sized types in Haskell.

### Both of those pointers are

Both of those pointers are helpful.

We're aware of Iavor's excellent work. His work and ours occurred at the same time independently, and I believe that the more general version presented in his dissertation was influenced in part by our discussions. The end form in BitC is more flexible, but not significantly different in its underlying structure. What Iavor's work does not resolve is prescriptive unboxing, which (in practice) interacts closely with the bit-level representation issue.

### Ada has to be mentioned, I

Ada has to be mentioned, I suppose. Ada generic units (parameterized units) can have "objects" as parameters, and these can be used to abstract over array length and similar things. Note, however, that you need to instantiate the unit for each length, which is painful and inflexible. Ada offers another mechanism that can sometimes be used to alleviate this problem, i.e. discriminants.

### Shape theory

Maybe the research on Shape Theory by Barry Jay is of interest here.

The ltu archives probably have discussions on this.

### Constrained types?

Maybe constrained types are worth investigating?