Another "big" question

To continue the interesting prognostication thread, here is another question. Several scientific fields have become increasingly reliant on programming, ranging from sophisticated data analysis to various kinds of standard simulation methodologies. Thus far most of this work is done in conventional languages, with R the notable exception, a language mostly dedicated to statistical data analysis. However, as far as statistical analysis goes, R is general purpose -- it is not tied to any specific scientific field [clarification 1, clarification 2]. So the question is whether you think that in the foreseeable future (say 5-15 years) at least one scientific field will make significant use (over 5% market share) of a domain specific language whose functionality (expressiveness) or correctness guarantees will be specific to the scientific enterprise of the field.

It might be interesting to connect the discussion of this question to issues of "open science", the rise in post-publication peer review, reproducibility and so on.

Will scientific fields give rise to hegemonic domain specific languages (within 5-15 years)?



surveillance and propaganda

So the question is whether you think that in the foreseeable future (say 5-15 years) at least one scientific field will make significant use (over 5% market share) of a domain specific language whose functionality (expressiveness) or correctness guarantees will be specific to the scientific enterprise of the field.

Science of a sort: mass surveillance and mass behavioral engineering. Problem for winning a bet, though, is won't necessarily appear in the literature. (Hopefully it also won't work very well.)

What will the language

What will the language include?

re what will the language

What will the language include?

AI-based query compilers.

The mass surveillance domain operates over an ad hoc assemblage of data sources: credit card transaction histories, phone call records, travel records, search histories, web-surfing histories, license plate spottings, feeds of all major newspapers, and so on.

A term in a query might say: "people who went to Chicago in September and who pay attention to the San Francisco Giants baseball team."

A DSL query compiler can figure out how to interpret that against the ad hoc assemblage of data sets.

A machine-learning program (in pseudocode) might read:

1. Everyone in boldface on the Gotham Times social page is a swell.

2. Test the hypothesis that swells can be picked out of a group by looking at their credit card balances, travel itineraries, and number of pets.

3. List, with confidence levels, the 20 most obvious swells in Pennsylvania who are not listed in boldface on the Gotham Times social page.

I may be showing my biases,

I may be showing my biases, but that doesn't really look like a programming language to me, more like a natural language query language. I am pretty sure these exist and have existed for many years.

query language

Ehud, you're possibly missing the point here. The query language being "natural language" is just referring to the parser. And we know compilers consist of a lot more than parsers!

I actually worked on a related tool briefly: a data aggregator. The academic involved used to work on a similar tool with another group; their product was purchased by Amazon for something on the order of a hundred million dollars.

The data aggregator provides a uniform interface to heterogeneous data sources: google searches, medical databases, government registries: whatever you write a driver for.

Searching such a mesh of data isn't the same as searching a relational database, because it involves weights and costs. So a programmer will be using some powerful language to tune search algorithms for a specific domain, such as law enforcement, national security, or financial market analysis (for example). That language is itself domain specific (heterogeneous data searching) and will be used to implement even more domain specific languages (that is, be self-specialising).

I have no doubt some kind of "AI" will be involved (whatever that is:) The complexity here is so extreme that it is hard to believe it could possibly be done without a domain specific language to manage the search process.

I spent a few years working

I spent a few years working on query languages (including a very cool optimizer), so I am not dissing this. It just seemed that the factors you mentioned ("weights and costs") weren't the focus of the message I was replying to. I am fully open to discussing them, especially as they relate to the language vis a vis the runtime system.

re: more like a natural language query language.

more like a natural language query language.

Partly.

The example was also meant to illustrate the concept of a very high level "shell scripting" language for scheduling ML training runs over various data sets and for exploring various human-specified social hypotheses.

The pseudo-code script would allocate nodes on some cloud, according to some budget, try to learn to recognize "swells" based on various metrics, testing itself on the definite list of swells identified via the style page of the newspaper.

Excel and MATLAB will remain

Excel and MATLAB will remain quite popular, if that's what you mean. Whatever would make a DSL successful, it has nothing to do with correctness guarantees or even expressiveness really, and everything to do with usability and accessibility.

As for open science, Phillip Guo has done a lot of work in this area...using Python I think.

Excel and MATLAB will remain

Excel and MATLAB will remain quite popular, if that's what you mean.

Of course, that's not what I mean... The question is whether a new language, specific to some scientific field, will emerge and become significant.

Do you mean, since these

Do you mean, since these languages are used across many fields, they are not very domain specific? Domain specific languages and systems (like Mumps) are already quite numerous, but I have the feeling you have a more specific definition of domain specific?

Sorry if I wasn't clear. In

Sorry if I wasn't clear. In the current context, what I meant was a language specific to a scientific field. That's why R isn't an example of what I have in mind. I am asking about DSLs that embody scientific assumptions (whether substantive or methodological).

So you mean like a

So you mean like a scientific workflow system?

That may be one direction.

That may be one direction. I am actually more interested in something like "a domain specific language for the
modeling of genetic regulatory mechanisms" (e.g., GReg) or "A Domain Specific Language for Specifying and Constraining Synthetic Biological Parts, Devices, and Systems" (e.g., Eugene). That sort of thing.

As for open science, Phillip

As for open science, Phillip Guo has done a lot of work in this area...using Python I think.

For sure, the current best practices of open science use currently available languages. The question was about what the future holds. Remember, the context here is "big questions for the field to address".

True, those advances have been going on in other fields, not PL. I'm not sure what we could contribute ourselves, that say the systems community can't provide in the form of virtualization and by recording non-deterministic choices.

Not sure myself. Hence the

Not sure myself. Hence the invitation to prognosticate.

Frameworks and DSLs

I think your bias is showing ;-) Despite most of our backend being in an unusual mix of JavaScript and OpenCL, our 'data science' is Python + analytics frameworks. In particular, Pandas (dataframes), some libraries (graphx, networkx -- which we've started wrapping into a framework), and Spark (~LINQ and SQL). It feels like the bulk of experimental stuff has gone into such systems, and as any individual tool is useless on its own, Python is winning for glue.

So when *would* a silo'd language specific to a problem emerge? My guess is twofold:

1) The brief beginning of a field, when it's the only feasible bet -- imagine bio workbench automation, like microfluidic controllers.

2) End-user programming, where the emphasis is on "hiding" the programming. For example, imagine bio workbench v10: it'd be incredible to show a picture of cells, punch in when/what to feed them, and get an email when they deviate from the expected picture.

Interim results

With 19 votes in, we have 10 YES and 9 NO votes. So we are evenly divided on the question whether new languages, specific to certain scientific domains, will emerge in 5-15 years.

So here's one idea

Will probabilistic programming languages evolve to provide greater support for the inferential challenges of specific scientific domains (in particular in the context of building and analyzing simulations)?

OK, let's try another. Units

OK, let's try another. Units of measurement.

Any reason why that still

Any reason why that still hasn't caught on? It doesn't seem to be technological.

context and working memory

When you're working on a particular problem, you have a lot of relevant context in your working memory. In the case of units, while you're writing a particular formula its units are either 1) pretty obvious or 2) trivially derived. You never type units into your calculator for an individual operation, so it feels burdensome to explicitly mention them in your code, especially when they are derived units without their own abbreviations, like "foos per bar".

When writing new code, it's preferable to write less explicit context, to minimize typing (on a keyboard) and to maximize how much of the problem at hand fits in your head or field of vision. When revisiting older code or code from others, it's beneficial to have more context. You need to recreate the mental state that led to the writing of that code. As code becomes more familiar, the context that was once beneficial becomes clutter. Compare this to type annotations: they are onerous without inference, but valuable as reference until familiarity sets in, after which you care more about things like argument order or detailed semantics.

Relating to your research, I'd like to see experiments with a types (or units) equivalent of a text editor's ¶ button, showing type (or unit) annotations rather than hidden whitespace.

I've thought about doing this in APX, where backward (quantum) type inference makes it super easy, and type parameters support generic programming as always. I don't really have a good story for unit products (e.g. m/s * s = m·s/s = m); but I could at least hack that in via an explicit unit construct. It also isn't a very high priority yet (it would be if I decide to go after the iPython notebook crowd).

As for showing type annotations, it is a visual design problem. When to show them and where to show them? Is the line gutter good enough? Space on the line is limited if I want to keep true to what the user types.

AFAIK

Does it work well in

Does it work well in practice?

It works well for what it

It works well for what it does, but it does not scale up to realistic code, at least not easily. For example if you have a vector with components of different units you have to use a tuple rather than an array of floats. So code that used to work for any vector is now specific to a vector of particular size and units. Think about what type a generic matrix multiply function would have, and how the type checker would have to check that an implementation has that type. With type level lists or indeed full dependent types it would be a lot more useful because then you could write code that is generic in the shape and units of the tuples.
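The awkwardness of mixed-unit vectors can be illustrated with a minimal runtime sketch (dynamic tags, not the static checking discussed here; the names Q and dot are hypothetical). A generic dot product works fine when all components share a unit, but a vector mixing units forces callers back to per-component, tuple-like code:

```python
class Q:
    """A float tagged with a dict of unit exponents, e.g. {'m': 1}."""
    def __init__(self, value, units):
        self.value = value
        self.units = dict(units)

    def __add__(self, other):
        if self.units != other.units:
            raise TypeError(f"cannot add {self.units} and {other.units}")
        return Q(self.value + other.value, self.units)

    def __mul__(self, other):
        units = dict(self.units)
        for u, n in other.units.items():
            units[u] = units.get(u, 0) + n
        return Q(self.value * other.value,
                 {u: n for u, n in units.items() if n != 0})

def dot(v, w):
    # generic in vector length, but only unit-correct when every
    # pairwise product lands in the same unit
    terms = [a * b for a, b in zip(v, w)]
    total = terms[0]
    for t in terms[1:]:
        total = total + t   # raises TypeError on any unit mismatch
    return total

m = {'m': 1}
ok = dot((Q(1.0, m), Q(2.0, m)), (Q(3.0, m), Q(4.0, m)))  # 11.0 m^2
```

A vector pairing a meter component with a second component would make the summation in dot raise, which is the dynamic analogue of the static typing problem described above.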

Vector should be a subtype

Vector should be a subtype of tuple whose element type parameters allow for addition and multiplication.

In fact, a position vector in APX is made up of horizontal and vertical coordinates; not exactly units but useful to have in cases where extra feedback is desired.

Rot90

Doesn't that distinction go out the window once you support rotation? (Maybe not - honest question)

These are "brands" that are

These are "brands" that are optional. They can be ignored, it is simply extra feedback on those that is not used for correctness checks (rather, for scrubbing). Units would probably not be brands, since checking is important, but I haven't explored them enough to come up with a design.

It seems easy to just represent most value subtypes as brands: you can associate them with behavior, you can customize the UI, and you can promptly ignore them when not convenient (any value can take on a brand, and brands can be dropped). They don't work so well for objects however.

I'd say units should not be

I'd say units should not be part of a type hierarchy, but a separate tag attached to each data item. Or, as part of some grand generalization, a language should provide multiple type systems and allow to combine them. Then units could well be one of them (but it takes dependent types to implement a useful unit checker inside a type system).

Units are all about which symmetries have to be respected. For instance a float of meter is like a float except any function applied on it needs to satisfy a certain scaling law. See this paper: Abstraction and Invariance for Algebraically Indexed Types.

I'd look at this the other way around

There is a parametricity result along the lines of: If function foo takes unit u to unit u^2 for any unit u, then, operating on scalars, it maps (k*x) to k^2 (foo x) for any k. I don't think the unit system needs to arise out of the scaling invariant, though. I think the usual approach of defining a type Quantity is a reasonable foundation for units.

That's the traditional view but I don't agree. The parametricity law is fundamental and the type checking rules are a conservative approximation to it. A function like x -> exp(2*log(x)) is a perfectly fine function of type u -> u^2, though the type system may not know that because the unit type checking has to be a conservative approximation. I don't think it's right to take the conservative approximation as fundamental and the beautiful general property that characterizes the extensional meaning as secondary.
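The point about x -> exp(2*log(x)) can be checked numerically. This small sketch confirms that the function satisfies the scaling law for u -> u^2 even though a per-operation unit checker (with rules only for exp, log, and *) would reject it:

```python
import math

def f(x):
    # extensionally x**2 for x > 0, yet built from exp and log, so a
    # conservative unit checker would not assign it the type u -> u^2
    return math.exp(2 * math.log(x))

# the scaling law f(a*x) == a^2 * f(x) holds anyway
for a in (0.5, 3.0, 7.25):
    for x in (1.0, 2.0, 10.0):
        assert abs(f(a * x) - a**2 * f(x)) < 1e-6
```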

Which is more elegant?

I disagree that representing units as numbers that transform in a certain way is the elegant way to do things. Look at the way that differential geometry is done. Many older treatments defined scalars, vectors and one-forms as numbers that transformed in a certain way under change of coordinates. Now most formal treatments tend to treat them abstractly, instead, with what they "are" hidden. I think this is the cleaner way to do things.

How are you going to handle constants? Is radiusEarth just a number that changes linearly with some arbitrary distance parameter, like meter?

How are you going to handle typing your exponential example? You can handle that in the standard approach by inserting an explicit "lift", taking a function that preserves the appropriate scaling law to one on the corresponding unit types. So using lift introduces a proof obligation that wouldn't be automatically checkable by types. The properties that would be checked by types are those that can be reasoned out locally with unit annotations. That's how things with types usually go.

RadiusEarth is 6.4e6 times one meter. That's exactly how physicists think about the RadiusEarth too. There is a reference quantity which we call one meter, and RadiusEarth is about 6.4e6 times that.

Given a proof that the function satisfies the scaling law you can give it the type u -> u^2. The reason this is safe is precisely because the scaling law fully characterizes units. If you define the meaning of units by its type checking rules which are a conservative approximation, how would you know that it's safe to give that function that type? The scaling law is just a consequence of the type checking rules.

Defining radiusEarth in terms of meter just begs the question. That's how I would define it, too, but then what is meter? I'd say it's an abstract constructor that builds a distance quantity. In your approach, it seems that it has to be a formal parameter to everything that uses distance.

Again, I wouldn't give a type of u -> u^2 to an arbitrary function that satisfies the scaling law. Consider the function f(x) = (x+1)*x - x for a silly example of why not (what would be the value of the sub-expression x+1 if x is of unit u)? What I would do is provide a lifting mechanism to lift such a function to the type u -> u^2:

lift f = \x -> f(x/d)*d^2

If f is scale invariant, then this lifting doesn't depend on the distance d chosen.
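The independence from d can be checked concretely. Here is a sketch in which f is the silly (x+1)*x - x example (extensionally x^2, but rejected by per-operation unit checking because x + 1 mixes units); lifting it with two very different reference distances gives the same function:

```python
def f(x):
    # extensionally x**2, but the subexpression x + 1 mixes units,
    # so a unit checker would reject f at type u -> u^2
    return (x + 1) * x - x

def lift(f, d):
    """Reinterpret f at unit type u -> u^2 relative to reference distance d."""
    return lambda x: f(x / d) * d**2

# because f happens to satisfy f(a*x) == a**2 * f(x),
# the lifted function does not depend on the chosen d
g1, g2 = lift(f, 1.0), lift(f, 39.37)
for x in (0.5, 2.0, 100.0):
    assert abs(g1(x) - g2(x)) < 1e-6
```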

meter is just an arbitrary

meter is just an arbitrary reference floating point number that represents one physical meter. It tells you the relationship between a number in your program and a physical meter. For instance we could use float meter=1.0 as the reference quantity, or meter=2.3, or anything. The point of units is that the value of the reference quantity does not matter: all operations on the unit are scale invariant, so if you end up with a number with no units then it cannot depend on the value of meter, even though meter might have been used in its calculation. It isn't a formal parameter to everything that uses distance. Only functions that convert between meters and ordinary floats need to know the value of meter (e.g. for printing output in meters).

The function (x+1)*x - x is indeed another example why taking the scaling law as primary is a better way to define the meaning of the type u -> u^2, rather than the type checking rules for +, *, etc. I view it as similar to this: what is the meaning of a refinement type {x | p(x)}? Is it "values x that the type checker accepts as having that type", or is it "values x for which p(x) is true"?

I agree with your point that

I agree with your point that units are ultimately about scaling behavior. But how would you turn that idea into an algorithm for verifying dimensions and units? That algorithm would have to know that certain operations (arithmetic, in practice) on certain data types (numbers but not only) obey certain mathematical rules. That's not the kind of information that typical type systems are based on.

Apples per Hour

What you want is to be able to divide distance by time, getting a number (which, because of scaling, is correct) but also to know that the correct unit is m·s^-1.

In order to do this you need to know the dimension type of each number. The dimension can be anything. I have three apples, and I eat them in one hour. I can eat three apples per hour.

Really Integer should not be a type, but a class of types, where you have to give it a name like 'Apples' before you can use it as a type.

Also.

Dimensionless Numbers

Dimensionless numbers cause more of a problem. For example strain, which is change in length over original length. The fact that a number represents strain is relevant information, yet it has no dimension. But this is probably just a special case of where two different units have the same dimension. For example Torque and Energy are both Nm, but Torque is not measured in Joules. In this case it is because Torque is a vector and Energy a scalar.
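One way to keep such pairs apart is to tag quantities with a named kind on top of the dimension; this is a hypothetical sketch (the Qty class and its fields are invented for illustration), where Torque and Energy share the dimension N·m but refuse to mix:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Qty:
    value: float
    dimension: str   # e.g. "N*m", shared by torque and energy
    kind: str        # e.g. "torque" vs "energy"

def add(a, b):
    # same dimension is not enough; the named kind must also agree
    if (a.dimension, a.kind) != (b.dimension, b.kind):
        raise TypeError("same dimension but different kinds")
    return Qty(a.value + b.value, a.dimension, a.kind)

t = Qty(2.0, "N*m", "torque")
e = Qty(2.0, "N*m", "energy")
# add(t, e) raises TypeError even though the dimensions agree
```

The same trick would cover dimensionless-but-meaningful numbers like strain: a kind tag on a quantity of empty dimension.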

Yes, that's why you either

Yes, that's why you either need a dependent type system, or a cast operation so that you can say: trust me, this function satisfies this scaling law.

Parameterized types + upcast/downcast

To be effective, parameterized types + upcast/downcast can be used where

Upcast:
* From of extension to of extended interface
* From of discriminant to of discrimination type
* From of implementation to of implementation extension interface

Downcast:
* From of interface to of extension
* From of discrimination type to of discriminant

The above provide means to state axioms about scaling, transformation, invariance, etc.

My point was that in order

My point was that in order to observe that a value satisfies the scaling law that makes it a distance, you need to be able to observe its dependence on the meter parameter.

Is it "values x that the type checker accepts as having that type", or is it "values x for which p(x) is true"?

I'm not sure how you're looking at the usual approach as leaving the semantics of units up to the type system. That doesn't seem to be a fair description.

A meter is not an arbitrary number. It's not a number at all. It's a symbolic value. Modeling it as a number feels like a hack to me.

Abstract quantities vs. symbolic quantities

Actually, I'm wrong to say that quantities should be "symbolic values". That would be a big mess (which symbolic expressions are allowed?) Rather, I think the definition we want is: quantities are functions of a hidden physical parameter space. I think that captures the meat of what you (Jules) are after while eliminating my objection about them being abstract and not numbers. You can lift any operation on numbers to quantities, pointwise, but they are not numbers themselves. And you can represent them as regular numbers by making the fact that they're functions something you reason about but can't use.

That's closer, but I don't

That's closer, but I don't think that's necessary or desirable. A quantity of unit meter depends on the value of a meter, but it's not a function of it in the sense of being lambda meter => ...expr.... But if you change the value of one meter, then the value of all quantities changes too. In particular, if you double the value of one meter then the value of all meter quantities doubles too.

Suppose a physicist writes a program that takes as input all relevant physical quantities and outputs whether a rocket with those parameters will work. The type of such a program would be forall lengthUnit. forall weightUnit. float<lengthUnit> -> float<lengthUnit> -> float<weightUnit> -> ... -> bool. This is just a program that gets compiled to a function willRocketWork : float -> float -> ... -> float -> bool. If the program is unit correct then you should be able to use it with both metric and imperial units. For instance if the first parameter is the length of the rocket and the second parameter is the length of the trajectory it must fly, then it should produce the same answer for:

// metric (meters)
willRocketWork 54.3 5643.5 ...

// imperial (inches)
willRocketWork 2137.8 222185.0 ...


The program uses floats under the hood, the unit correctness is an invariance law which is external behaviour of the function.

A program with defined units like this:

define unit meter
define unit kg

... program here ...


is equivalent to a program that's quantified over these units, and gets passed in the value of one meter and one kg, and thus must satisfy some scaling law. This is similar to the correspondence between existential and universal quantifiers.

Note that at no point do we need to represent a value of type float<u> as a function, neither conceptually nor in actual implementation. Everything is reduced to quantification over units, and that's just values (usually functions) satisfying a scaling law.
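As a toy stand-in for the willRocketWork example (the criterion here is made up purely for illustration), one can check the invariance law directly: a unit-correct function on plain floats, because it only consumes the dimensionless ratio of two lengths, gives the same answer in metric and imperial units:

```python
def will_rocket_work(length, trajectory):
    # hypothetical criterion; only the dimensionless ratio matters,
    # which is what makes the function unit correct
    return length / trajectory > 0.005

METER_PER_INCH = 0.0254

# the same physical quantities, passed in meters and then in inches
metric = will_rocket_work(54.3, 5643.5)
imperial = will_rocket_work(54.3 / METER_PER_INCH, 5643.5 / METER_PER_INCH)
assert metric == imperial
```

Nothing in the function body mentions units; the unit correctness is, as stated above, an invariance law on its external behaviour.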

Semantics

A program with defined units like [...] is equivalent to a program that's quantified over these units, and gets passed in the value of one meter and one kg, and thus must satisfy some scaling law.

Could you explain how you intend to reason that radiusEarth^2 is an area if radiusEarth is of type float?

If you can do it, that would seem to imply that your unit types are substructural. That looks like a mess to me.

radiusEarth is not of type float, but of type float<meter>. The values of this type are floats, but functions on it must respect scaling laws.

Perhaps an analogy helps explain what I mean. In homotopy type theory you have types with user defined equality. For instance we can define integers modulo 5 as integers, except that n and n+5 are considered equal for all n. This places an obligation on the context in which such integers modulo 5 can be used: they can only be used in contexts that are invariant under adding 5. The representation of such an integer modulo 5 is just an integer, but the program must respect the equality so it cannot distinguish 0 from 5 or 2 from 107. The values of this type are integers, but functions on integers modulo 5 must respect some invariance laws.
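The analogy can be made concrete with a small runtime sketch (the helper names are invented): an invariance checker accepts functions that respect the n ~ n+5 equality and rejects ones that can distinguish 0 from 5.

```python
def respects_mod5(f, samples=range(-10, 11)):
    # a function on "integers modulo 5" must be invariant under adding 5
    return all(f(n) == f(n + 5) for n in samples)

last_digit_parity = lambda n: (n % 5) % 2   # uses n only via n % 5: invariant
identity = lambda n: n                      # distinguishes 0 from 5: not invariant

assert respects_mod5(last_digit_parity)
assert not respects_mod5(identity)
```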

This kind of type system indeed goes beyond what type systems usually do, although parametricity has the same flavor: the meaning of a type is not fully determined by the set of its values.

I don't know if that makes sense...

Note that I'm not proposing anything radical here. The only thing I'm saying is that scaling laws are not just any old consequence of the type checking rules for units. They could (and should) be what defines the meaning of unit types, e.g. a function f : float -> float is valid for the type float<u> -> float<u^2> precisely when f(a*x) = a^2*f(x), and potential new type checking rules are valid precisely when they are justified by the scaling laws.

To be honest I don't understand what exactly you're disagreeing with...

If you don't see my point,

If you don't see my point, try modeling the situation in Coq. It sounds like you do have some kind of substructural type system in mind. I prefer making everything fully structural. Parametricity has a different flavor, to my taste.

What type would you give to 'log radiusEarth', I wonder? float<??>

I'm saying that would be a value of type Quantity, possibly with a refinement type capturing exactly the log-dependence on distance.

I don't have a different

I don't have a different type system in mind than what is already in the literature on unit systems. Abstraction and Invariance for Algebraically Indexed Types. That paper also contains a formalization in Coq.

Since radiusEarth needs to be used scale invariantly, log radiusEarth needs to be used translation invariantly (because log turns multiplication into addition). This is a different invariance, but it's also treated in the paper.

How does your Quantity type work, and what does the refinement type look like?

A difference

It occurs to me that there is still a noticeable difference between what you're (they're) proposing vs. what I had in mind. Namely, what happens if you try to compare two quantities?

In my approach, I would probably type the comparison operator for Quantities to require that the comparison doesn't depend on the physical parameters. In your approach, this is presumably a valid program:

f x y = if x < y then 1 else 2
g x y = if x < y then 4 else 3

foo x:Distance y:Distance^2 = f x y + g x y

The function foo is indeed

The function foo is indeed invariant under scaling, so that's a valid function of that type. Whether the type checking rules allow it is another question, since they are only a conservative approximation and may reject some valid programs (as type checkers often do). In a dependently typed language you could have a construct to turn foo : float -> float -> float into Distance -> Distance^2 -> float given a proof of forall a,x,y. foo (a*x) (a^2*y) = foo x y.
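A numeric check of the claim, using the f, g, foo definitions from the example above: each branch compares a Distance with a Distance^2 (which per-branch unit checking would reject), yet the sum is constant and therefore invariant under (x, y) -> (a*x, a^2*y):

```python
def f(x, y): return 1 if x < y else 2
def g(x, y): return 4 if x < y else 3

def foo(x, y): return f(x, y) + g(x, y)   # the branches always sum to 5

# invariance under the scaling (x, y) -> (a*x, a^2*y)
for a in (0.1, 2.0, 30.0):
    for x, y in [(1.0, 4.0), (3.0, 2.0), (5.0, 5.0)]:
        assert foo(a * x, a**2 * y) == foo(x, y) == 5
```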

That looks like the main difference

I think this is the interesting difference, though. I would package up all dependence on physical parameters into a Quantity type, and not let that dependence escape. If I really wanted to do the example I just gave, I'd create a boolean-valued quantity type that results when you compare real-valued quantities. The branching would then have to be lifted into that type. An implementation could still just execute whichever branch the arbitrary set of parameters took it, though.

I like this approach more (at the moment), but that paper seems interesting. I haven't yet investigated the other applications listed in the paper -- that approach may be more general. Anyway, it was a useful conversation to me. Thanks Jules.

Thank you too :) Maybe the

Thank you too :)

Maybe the two approaches are even equivalent if you have a Quantity version of all types.

Yes, I think you can look at Quantities as being like values in a reader monad (a function like 'meter' asks for the length of a particular world distance in an arbitrary distance-respecting coordinate system) and your approach is akin to putting the entire program inside that monad.

Not usually defined: Meter < Meter^2; but can do Meter↦Meter^2

Meter < Meter^2 is not usually defined, although the following defines < for meters:

⦅Meter[x:Float] <  Meter[y:Float]⦆:Boolean ≡ x<y


Also, the following is a procedure (although it might offend some people):

AreaofSquareWithSideLength[x:Meter]:Meter^2 ≡ x*x


Looks uncontroversial

I was not offended.

Phrasing this idea as

Phrasing this idea as "multiple type systems" doesn't seem to be what you really want, I think. You presumably want to be able to express lists of distances, for example, which shows that there aren't two distinct type systems. But I'm also skeptical of the idea of certain values being both e.g. Float and Distance, as "3 feet" should be a distinct value from "3", not the same value with a different type annotation.

There are two reasons that

There are two reasons that lead me to the "multiple type systems" idea for unit checking:

1. The specification of a dimension or unit is often orthogonal to the definition of the data structure that it is attached to. The label "distance" can go with a number of any type, but also with a vector, usually implemented as a list/array of three numbers. There are also cases where you'd call some values of an algebraic data type a "distance", e.g. for symbolic computations. Of course you'd want to define a list of distances, so my two type systems are not completely separate. That definition would be "a list of (number, distance) items".
2. I have yet to see a usable implementation of units inside a standard type system (that's not what was done in F#, for example). The stumbling block is that distance^N * distance^M is distance^(N+M), for any integers N and M, which would require dependent types, which I consider a research topic rather than technology ready for prime time. Another problem is the orthogonality outlined above - today's type systems would force me to define every combination of dimension and data structure separately.
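At run time, the exponent arithmetic that defeats simple static type systems is trivial; this sketch (helper names are invented) represents a dimension as a mapping from base dimension to integer exponent and adds exponents on multiplication:

```python
from collections import Counter

def dim_mul(d1, d2):
    # distance^N * distance^M = distance^(N+M): add exponents
    d = Counter(d1)
    d.update(d2)
    return {k: v for k, v in d.items() if v != 0}

def dim_pow(d, n):
    return {k: v * n for k, v in d.items()}

area = dim_mul({'distance': 1}, {'distance': 1})
assert area == {'distance': 2}
# exponents cancel to give a dimensionless result
assert dim_mul(dim_pow({'distance': 1}, 3), {'distance': -3}) == {}
```

The hard part, as noted above, is getting a static checker to track this N+M arithmetic without dependent types.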

1. Labeling a vector of

1. Labeling a vector of numbers as "distance" is surely just shorthand for labeling the numbers themselves. i.e. it's sugar.

2a. I agree about needing dependent types in general. In an important class of use cases M and N in your example are constants, though.

2b. Unit polymorphism can maybe help there.

This was a question that I

This was a question that I had. If that's the case, then we only need to apply units to numbers, which we can do with a type parameter. Vectors then just need to be generic.

1. Yes, it's sugar, but for

1. Yes, it's sugar, but for the scientist using the language it's important. I haven't seen languages yet that provide such syntactic sugar for their type system.

2a. N and M are indeed often constants, but for a language with an exponentiation operator/function, you can't avoid dealing with the general case.

Python is already replacing MATLAB, and Julia is making lots of progress; its developers focus not only on getting the fundamentals right but also on the ergonomics, so I don't doubt that it will succeed in replacing Python/MATLAB. Perhaps it will even replace R.

I don't think there will be a language that is tied to a specific scientific field because of the language. In fact I don't believe in domain specific languages at all. Language success in a specific domain has everything to do with the libraries and very little with the language design itself. Language design decisions that are good in one domain are almost always good in all domains. We only need one language that is extensible enough to express domain specific libraries reasonably well.

I don't think there will be

I don't think there will be a language that is tied to a specific scientific field because of the language. In fact I don't believe in domain specific languages at all.

Wow.

Note that what I meant here

Note that what I meant here (if it wasn't clear from the context) is that I do not believe in external domain specific languages. I fully believe in domain specific languages embedded in a host language (also known as libraries...).

libraries are languages

I agree.

But libraries define abstractions, they focus on a domain, and they are often much harder to learn than the languages hosting them. Languages that focus on library usage are going to be more successful than languages that focus on gazing at their own navel, of course.

There is still room for restrictive DSLs that are based on non-expressive programming paradigms - declarative markup languages (CSS, TeX), business rule systems, for example - they will continue to thrive. I've seen some interesting output from Markus Völter in creating rich specialized programming environments, but this has less to do with language.

DSLs as Libraries.

Okay, vast over-simplification, but languages generally provide basic abstractions and control flow. You only need a DSL where there is some fundamental mismatch between the language and the problem domain. Even then, languages like Haskell allow defining a DSL within the language using monads. I would much rather see a DSL library for Haskell than a whole new language for different scientific disciplines. I would like to understand why R persists as a separate language, and make sure that a general-purpose language provides sufficient abstractive power to have R as a library with no loss of elegance in the written code.

.

Embedded DSLs are still DSLs.

Embedded DSLs are still DSLs.

There must be some line in

There must be some line in the sand where a library becomes an eDSL. I guess if it plays too heavily with evaluation it becomes not very compatible with other libraries in the same language? For example, I would guess that a parser combinator library in Scala or Haskell is more of an eDSL because its points of interaction with other libraries are quite limited?

Sure. You know it when you

Sure. You know it when you see it.

Sounds like a formal

Sounds like a formal definition to me.

I find eDSLs to be really hard to work with, using one is fine, using two is basically impossible. Maybe it is easier in the Haskell world, are eDSLs composable?

Composable eDSLs

They should be, I don't think they are in Haskell, unless you start using MonadTransformers (which really seem the wrong abstraction for composition anyway).

Haskell eDSLs needn't be monadic. If you go in that direction, you'd better use Applicative, which composes better. But even that is too restrictive.

I wouldn't agree to eDSLs not composing in general. But this question is too general to answer.

If you'll allow me a plug, "Language Composition Untangled" distinguishes different kinds of compositions well enough to say what composes and what doesn't, and why. (Disclaimer: I coauthored this).
http://www.informatik.uni-marburg.de/~seba/publications/languagecomposition.pdf

cool, nice paper

is helping me understand a lot! Personally I find this to be a very cool contribution. Defining terms is always a good thing to try to do. (one small nit: html isn't a language, is my gut feeling, but i grok the example. :-)

An EDSL looks close enough to an external DSL

That's a definition I'd go by. More precisely, I'd define "close enough" ignoring syntactic differences, and refine it using some fuzzier concept of similarity. And I mention "external DSL" because I care about domain-specific syntax and domain concepts.

This suffices to exclude most Java libraries and include embeddings of existing external languages. But unlike your definition, I'd say that Scala's BigInt defines an EDSL for infinite-precision numeric computations, simply by operator overloading. Parser combinators are included because they embed a syntax for CFGs (even though that's not usually the semantics; I'd argue that indeed CFGs are a better semantics).

Pure embedding, as defined by Hudak, is more restrictive: eDSL operators implement a compositional/denotational semantics. I'd say the above requirement is implicit.

Finally, about parser combinators not interacting, I have some vague intuition, but I can imagine places where they can interact with other libraries — like semantic actions. Could you elaborate?

Would a parser combinator

Would a parser combinator library compose with a FRP library to create an interactive code editor? Or a GPU eDSL with FRP?

We typically don't expect languages to compose. If I'm using an eDSL that "lifts" execution and allows only manipulation via pure functions, I can't imagine easily using that with another eDSL that does the same thing! This seems to form a nice natural dividing line between eDSL and library. Or maybe it's just a limitation in my thinking.

Why wouldn't it compose? You

Why wouldn't it compose? You can write a parser using parser combinators, and at that point it doesn't matter how your parser works internally, so you can use that parser in an editor written using FRP just like you would use any other parser. Of course you might want incremental parsing, and then your parser combinator library/eDSL would need to support that. There isn't a clear difference between a library and an eDSL.
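The point that a finished combinator parser is just an ordinary value can be sketched in Python (a toy combinator library; all names are hypothetical):

```python
# Each parser is just a function: input -> (result, rest) or None.
def char(c):
    def p(s):
        return (c, s[1:]) if s.startswith(c) else None
    return p

def seq(p1, p2):
    """Run p1, then p2 on the remaining input."""
    def p(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return p

ab = seq(char("a"), char("b"))
# The finished parser is an ordinary function, so any other library
# (an FRP event handler, a GUI callback) can call it like one:
result = ab("abc")  # (('a', 'b'), 'c')
```

Whether this still composes once the combinators need to be incremental and signal-driven, as the FRP objection suggests, is of course the harder question.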

Hmm, if I I want the syntax

Hmm, if I want the syntax parsed via the parser combinator to be controlled by a signal, is that easy to do? When you have values that are lifted into the parser world or FRP world, then you are limited to manipulating them via pure functions, and then you can't really get them to talk to each other because their worlds are different.

Anything that lifts a value into another world, making it untouchable in the ground host world, is basically what I think should be called an eDSL. So ya, that includes futures/promises, FRP, libraries for expressing GPU code, etc...

Why would you be limited to

Why would you be limited to pure functions? To what are we comparing anyway? External parser generators?

Presumably the FRP/GUI library would have a text field which emits a stream of edit events. The incremental parser combinator library would take a delta in the input and turn it into a delta in the output AST. That would then be further processed and trigger the right changes in the GUI.

Just a plain old parser

Just a plain old parser combinator library that was designed without FRP in mind. We can imagine how we could design them to work together a priori, but not a posteriori.

Or just take two independently developed full blown eDSLs whose usage together is feasible, and describe how they can actually be used together. It is not a very easy problem.

I just described how...or tried to.

Just google it, it hasn't been done.

Edit: I wasn't trying to avoid your answer, just that my own search found that this is something that people thought about doing but couldn't. It could just be a lack of foresight on the part of the DSL creators, but I think non-composability is to be expected when lots of lifting is involved.

Not sure about your specific example, but "eDSLs don't compose" is too ill-defined. From presentations I listened to, I can think of Composition and Reuse with Compiled Domain-Specific Languages - and composing eDSLs isn't even the contribution there; cross-eDSL optimization is.

You might still have a point... but I'm not fully convinced yet. (But I don't have time to try this out myself just yet).

I just don't think this is

I just don't think this is a design consideration: take one eDSL that changes how code is evaluated (by lifting), and another one that does the same... both are trying to re-interpret the world, so how would ad hoc composition even make sense?

Languages with eDSL support

I think an eDSL should be a first class abstraction in a language.

Everything is going the

Everything is going the opposite way though. Instead of SQL people are now using relational algebra combinators. Instead of template languages to build HTML people are now using HTML expressions in Javascript. Instead of CSS people are beginning to just use records of styles in Javascript. Instead of parser generators people are using parser combinators.

It turns out that embedding a DSL inside a host language as a library with or without syntactic sugar is incredibly useful. You have all the abstraction facilities of the host language at your disposal. You see this very clearly in e.g. HTML templating languages: it just makes a lot more sense to be able to define a host language function with some HTML generating code in it, than build an elaborate template language with sufficient expressive and abstractive power.

Structural editors will shine in this area, because they remove the trade off between DSLs (pretty syntax) and libraries/eDSLs (a powerful host language). You get to have an arbitrary domain specific syntax and a powerful host language.

non-programmers

Designers still need CSS; they aren't going to become uber, or even adequate, Javascript programmers overnight. If they are spending their 10,000 hours of practice on design, they don't have time to become programmers. But CSS is very hackable by Javascript, so a programmer can come in, take a bunch of input from a designer in the form of CSS, and create an expressive program. In a world where there is just one programmer, perhaps skipping CSS (or XAML) is an option, but if you have to work with a design team, the work flow is necessarily more complex.

I think catering to non-programmers is the point where DSLs could really shine, though much of the PL community seems to think otherwise (much of the DSL work is aimed at programmers, which is very unfortunate!).

Many designers already know a bit of JS. To build up CSS styles in JS you don't need to know any real programming, just a different syntax, and in a better language/environment that would go away too.

checkoutButtonStyle = {'color': "#FFF", ...}

<button js-style="checkoutButtonStyle">


Now you get a lot of advantages simply because you are in a full programming language. You get variables, math expressions, style mixins, abstraction, etc.

buttonSize = 200
checkoutButtonStyle = merge(someBaseStyle, {'height': buttonSize, 'width': 3*buttonSize})

<button js-style="checkoutButtonStyle">


Now imagine this in an IDE with a statically typed language with record types so that you get autocomplete of the possible styles (color/height/etc) and their values (a color picker for color, etc), and with operator overloading so you could write styleA + styleB instead of merge(styleA,styleB). That would be a close to ideal interface, yet it's embedded in a full powered programming language.

I'm not sure about catering to non-programmers. A lot of HCI work seems to be focused on that, but I haven't seen anything successful yet. I don't think there's much space where a non-Turing-complete DSL would shine, in between GUI interfaces and a full programming language. If you learn to work with a DSL rather than with a conventional GUI, you might as well spend a little bit more effort and learn an embedded DSL, and later learn enough programming as you go.
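The styleA + styleB idea sketched above translates directly to any language with operator overloading; here is a hypothetical Python version (the `Style` class and its merge-by-override semantics are illustrative, not any real library's API):

```python
class Style:
    """A style is just a record of CSS-like properties."""
    def __init__(self, **props):
        self.props = props

    def __add__(self, other):
        # Right-hand style wins on conflicts, like a later CSS rule.
        merged = dict(self.props)
        merged.update(other.props)
        return Style(**merged)

base = Style(color="#FFF", height=200)
checkout = base + Style(width=600)
```

With record types and an IDE, `base + Style(width=600)` could autocomplete the property names and flag typos statically, which is the "close to ideal interface" described above.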

My wife is going through

My wife is going through this right now (she was asked to learn CSS for her job). She is an art-school trained designer with a lot of uber design skills (sketching, visual layout, color, interaction, user research), which have taken a lot of time to acquire and maintain; in contrast, web design-developers are more rounded, but with fewer design skills, as they have used that time to acquire a lot of dev skills. Learning CSS is a bit of a challenge for her, the main hurdle being abstraction: CSS provides a bit of that through inherited properties, which is a bit of a problem for someone used to focusing on the concrete.

I'm not sure about catering to non programmers.

Well, CSS does that already, so it won't be going away. The problem is that we don't take much time to reflect on why X is already "successful," we just propose Y that is more "powerful," yet completely misses the point of X's success. PHP is another good example of that phenomenon. We just see a bad language and think industry is crazy for using it, that somehow the success of a language is driven by marketing (???) rather than benefits that we basically refuse to understand.

The system I sketched (of

The system I sketched (of which several variations are already in use in industry, it's not my idea) does not have or need any inherited properties.

I started programming in PHP and programmed solely in PHP for 5 years. PHP was successful because it was the only option. Then web hosts started to support PHP and only PHP, so it had that advantage for a while. Using PHP for a new project in 2015 is crazy.

I only meant to use

I only meant to use inherited properties of an example of what my wife was having problems with. I think even more complex abstractions are going to be much harder, but thankfully CSS doesn't have any of that. What you sketched would be much harder for her.

PHP was successful in its time not just because it was the only option; if it was so bad, another option would have appeared (BTW, Facebook still uses PHP, they haven't migrated completely over to Haskell yet). It is hard for us to think like non-programmers or new programmers, so we can't really see why these tools worked so well for them. That is the curse of PL: we are great programmers (well, I hope so) wanting to make tools basically for ourselves.

Alternatives to PHP have

Alternatives to PHP have appeared, and they have replaced PHP except for legacy code.

Although I think that styling system is simpler than CSS, it doesn't really matter. CSS is for professional designers, and they already have preprocessors to extend it (e.g. less/sass) which bring it closer and closer to a full programming language. First came the variables, then mixins, then namespaces, then generated styles. The march to turing completeness is inevitable, and then you end up with 100 badly designed languages each with its own peculiar syntax and quirks. Because they are all separate languages it's nigh impossible to make them work together. Far better to embed all those as a library in one general purpose language.

There is some truth to that,

There is some truth to that, but I think the march occurs independent of these lower end users. The high end users push for more power, come up with new tools, maybe they get used, but some people are left behind to use the old more understandable way. It is really a problem of human modularity: how do I let my designers be productive in what they are good at with the ability to use their work by a dev in more powerful ways than they could manage on their own?

I mention XAML because it solved this problem more explicitly, though I think it is quite crazy to edit it directly. But tooling is important, so a designer can work in a UI tool, hand off XAML assets to a dev, and see it in the application. As a programmer, I hate working with XAML...but because I work by myself and have no need for that kind of human modularity.

Perhaps a friendlier general purpose live programming language would solve all these problems, but even though that's my topic, I don't think so. Maybe the Eve folks will have a better shot at it.

Totally

We do address rather advanced problems. From this perspective, it's interesting that PL stuff is actually applied.

DSl

Languages are libraries (functionality) + constraints (semantics) + syntax to pull it together.

Python manages to hit a nice spot for bundling libraries that are almost as sophisticated as small languages. After many DSLs have experimented with providing custom syntax for their domain, Python takes the other approach. Here is one small, simple syntax, and it is enough for each of the DSLs that you want to learn - the gap in expressiveness is a trade-off for familiarity.

The semantics of the library are expressible as shared state (automatic memory management between the host application and the library) and simple idioms, e.g. generators. This allows enough constraint on the library that a random sequence of API calls isn't valid, but that there are a large number of sequences of calls that mean different things. This mechanism of restricting the valid call sequences replaces alternative approaches like opaque handles.
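One reading of that generator idiom, as a stdlib-only sketch (the `records` function is hypothetical): the generator body itself enforces an open/read/close call sequence, because the code simply cannot be resumed out of order.

```python
def records(lines):
    # "open": this part runs once, on the first next()
    header = lines[0]
    for line in lines[1:]:
        yield (header, line)  # "read": one record per resume
    # "close": anything here runs exactly once, when iteration ends

# A caller can only ever observe the open -> read* -> close order:
rows = list(records(["id,name", "1,ada", "2,grace"]))
```

No opaque handle is needed: the protocol state lives in the suspended frame, which is exactly the "simple idiom" constraining the API.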

The functionality available in Python is hard to rival: a relatively simple ABI to low-level C means there is a Python wrapper for almost anything, or a simple way to build one.

Python was originally designed as a simple scripting language: less focus on how functionality could be expressed within Python and more on how external functionality could be chained together. It has evolved a long way from those roots, but for most purposes Python is already occupying that niche of one language that is extensible enough to express domain specific libraries reasonably well.

The original 800 languages paper was talking about a problem scale that was smaller than "all the programs", but the scale still justified the effort of building a language, and for the user there is a cognitive load in learning a new language. Many domains today are problems that are interesting for a day or two - can I hack this onto that to script this one action? For that scale a DSL is too large: a simple installer for a pre-written library, with a bunch of examples in a known syntax and similar semantics, is small enough to justify the effort.

Seriously, folks

I know this question is of narrower interest than the previous one, but really it's time to get off the fence and participate in the poll!

Interim results

With 29 votes, still an even split: 15 YES to 14 NO votes.

I am a bit concerned that

I am a bit concerned that the third option isn't getting much traction. It suggests to me a lack of confidence in the expressive power of programming languages.

I don't think the expressive

I don't think the expressive power is the problem, so much as the usability vs. cost. Using an eDSL requires more understanding of non-domain-related facts, and using a DSL requires also creating good tooling with feedback, which requires a lot of time and investment for the DSL author. I chose "yes" to the poll, but I expect it'll be 15 years or more before most of these issues become easier. We're getting there though, and proof of concepts that have gotten traction now exist: the copilot eDSL for low-level realtime systems, and Elm for UIs.

Good points. I think a main

Good points. I think a main concern is getting enough buy-in from the scientific community to be a viable alternative to mainstream tools.

Elm is a good example of a

Elm is a good example of a language that does not need to be domain specific. The ability to run on the web and the support for FRP are largely orthogonal to the rest of the language design. There's really no reason why in the (far) future we couldn't have a single general purpose language that is also used for writing FRP web apps.

Interim results

With ~40 votes, still an even split. If you take the YES votes and the "No, but it's a shame" vote together, the pro-PL view wins 60%. Still rather low, all things considered...

The scientists's point of view

I suggest to look at this question from the domain scientists' point of view, rather than from a computing point of view. Scientists don't care much about the classifications from the PLT world (if only out of ignorance). The important entities in a computational scientist's universe are (1) data, (2) scientific models, (3) computational methods, (4) computational tools.

With the introduction of computers, data took the form of arrays, tables, graphs, and databases. After a period of anarchy, we see a transition to well-defined data formats, often based on XML or a generic binary storage format such as HDF5. These are formal languages, but are they DSLs? Before you answer "no", consider OpenMath, a data format for mathematical formulas that is probably Turing-complete, though no-one uses it to write programs. Another direction that data representation takes is APIs for databases. We might see DSLs for API specifications.

Scientific models are for now the main victims of the computer age. Before, they took the form of mathematical equations in articles and textbooks. Today, they are often much too complex to be written down on paper, containing non-trivial algorithms and thousands of numerical parameters. Many scientific models exist only as implementations in computational tools, inaccessible to most domain scientists, who aren't up to studying millions of lines of optimized Fortran code to deduce the model they are applying. For a more detailed discussion see my recent article on this topic.

Ideally, we should have formal languages for scientific models. They would combine features of programming languages (to deal with the algorithms) and data formats. Call them DSLs if you wish. There are some first steps into that direction, e.g. OpenMath, but a lot remains to be done.

Computational methods are basically just high-level algorithms. Like scientific models, they tend to exist digitally only as implementations in computational tools (which are, of course, software), or in slight variants such as workflows. Like scientific models, human understanding of computational methods would gain if they could be separated from the optimization aspects inherent in almost all scientific software. That's again a potential application for DSLs.

Anyone attempting to work towards these goals should also take a serious look at how scientific notation developed over time. After all, scientific notation is the non-formal precursor of everything mentioned above. An important aspect of scientific notation is that it can be fine-tuned at any time. There is a fixed conventional basis (the maths we learn in school), but many scientific articles start by introducing additional notation and conventions, or even slight redefinitions of established notation. I'd like to see the same flexibility in formal languages, meaning adaptable syntax and semantics. Not *one* language for all of science, nor even for a discipline such as physics, but a common language plus extensions, implemented in such a way that everything can work together. The closest I have seen in programming languages is Racket.

Ideally, we should have

Ideally, we should have formal languages for scientific models.

That's exactly the thing I had in mind when running the poll.

Not *one* language for all of science, nor even a discipline such as physics

Quite. Hence why I asked about domain specific languages for specific fields. Ideally you'd have a language that guarantees properties that are relevant to the demands of specific kinds of simulations in a specific field, say.

Do we count things such as

Do we count things such as circuit layout languages/simulators?

It's not just guaranteeing

It's not just guaranteeing domain-specific properties, though that's clearly nice to have. It also matters that a language for scientific models can be analyzed and processed in other ways than just executing a program. It should be straightforward to derive, say, an approximation for some important special case, or to prove properties such as conservation laws.

For this reason, I suspect most domains would be better off with a language that is not Turing-complete, but in exchange easier to reason about. I haven't seen any work in that direction - if anyone has references, please share them!
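For a flavor of what such a restricted language might look like, here is a hypothetical sketch in Python: an object language with no loops and no recursion, so every program terminates by construction and any analysis can proceed by structural induction over the syntax.

```python
# A deliberately non-Turing-complete expression language. Evaluation
# always terminates because each recursive call consumes a strictly
# smaller subexpression.
def eval_expr(e, env):
    op = e[0]
    if op == "num":
        return e[1]
    if op == "var":
        return env[e[1]]
    if op == "add":
        return eval_expr(e[1], env) + eval_expr(e[2], env)
    if op == "mul":
        return eval_expr(e[1], env) * eval_expr(e[2], env)
    raise ValueError("unknown form: %r" % (op,))

# (x + 2) * 3 with x = 4
program = ("mul", ("add", ("var", "x"), ("num", 2)), ("num", 3))
value = eval_expr(program, {"x": 4})
```

The same structural induction that guarantees termination is what would make proving model-level properties tractable, which is hard to get for a Turing-complete host.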

Simulation

As we speak, there are engineers over on reddit literally crying out for DSLs they can use in simulations.

One could design a language

One could design a language for simulations...I wonder if the name simula is taken?

A simple solution then

Oddly enough they refer to a language called simula in the thread...

I'm a bit confused: we have a ton of DSLs already in numerous domains. Not just a few, they are all over the place. They just aren't done by PL people in a PL way, so we are pretending they don't exist?

Can you give examples of

Can you give examples of specific scientific fields where scientific results are produced using simulations implemented in what you would consider DSLs (in the broad sense), and the credibility of the results relies at least in part on those DSLs? Further, do these DSLs embody any scientific field-specific domain knowledge?

Modelica

The closest I know is Modelica.

Not specific scientific

Not specific scientific fields, but there's the CVX Matlab eDSL for convex optimization, Stan for probabilistic programming using MCMC, Infer.NET for probabilistic programming using expectation propagation, various tools for differential equations (Matlab libraries, Elmer, etc).

If you think about it almost all scientific simulations can be reduced to three fundamental problems:

• Integration (probabilistic programming, quantum simulations)
• Optimization (linear, quadratic, cone, convex optimization)
• Equation solving (Newton's method, finite element method)

There are tools for all of these but they are not specific to a scientific field; they are specific to the class of problems they solve.
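The three problem classes above can be seen in miniature with stdlib-only toy implementations (these are sketches for illustration, not the real tools mentioned):

```python
import math

# 1. Integration: trapezoid rule for the integral of x^2 over [0, 1]
#    (exact value 1/3)
def integrate(f, a, b, n=1000):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

# 2. Optimization: trivial grid-search minimization in one dimension
def argmin(f, xs):
    return min(xs, key=f)

# 3. Equation solving: Newton's method for f(x) = x^2 - 2
def newton(f, df, x, steps=20):
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

area = integrate(lambda x: x * x, 0.0, 1.0)                          # ~ 1/3
best = argmin(lambda x: (x - 3) ** 2, [i * 0.5 for i in range(10)])  # 3.0
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)             # ~ sqrt(2)
```

The real tools (CVX, Stan, FEM packages) are of course vastly more sophisticated, but they are organized around exactly these problem classes rather than around any one scientific field, which is the point being made.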

Quite. My ruminations are

Quite. My ruminations are about closer integration with substantial theories or assumptions of specific scientific endeavors.

How specific? Most science

How specific? Most science fields are not unlike computer science, with many sub-fields that are barely related to each other.

I also think we are interpreting domain differently. When you say domain, I get the feeling that you are referring to a specific science, but domain could be, and maybe more often is, used to describe more specialized but still pretty general activities, like simulation, statistics, or modeling.

I am talking about specific scientific domains, and the domain knowledge unique to each field: say population genetics, quantum field theory, that sort of thing. The narrower the better.

Programming languages,

Programming languages, graphics, machine learning, robotics, networking, then as well? Even then we have subtopics, like deep learning. Are they well supported with DSLs already, should they be? (I actually know of many for DNNs, but none are really used)

Boxing in creativity?

How useful is a language targeted at one specific area? Cross-discipline research is getting more important, with goal-focused cross-discipline groups occurring more and more (research themes, where a common interest is shared by research groups from many departments). Is a language targeted only at one area going to limit discoveries by 'boxing' in the research?

DSLs built on less-specific DSLs

This kind of thing happens already, but it tends to happen as "toolboxes" built on top of existing, less-specific domain-specific languages, i.e., languages that are specific to computation and data analysis, but general-purpose relative to a specific scientific domain. Matlab and Mathematica being the two most common contenders, although SciPy (which is in turn built on Python) is gaining a lot of traction. For example, the specific domains you called out are covered by PGEToolbox (Matlab toolbox) and FeynCalc (Mathematica package).

Robot Scientist

Are you familiar with the Robot Scientist Project? There is some interesting work within the project on synthesising knowledge and theories within specific domains. They were framing this work within Inductive Logic Programming once, although that was a long time ago and I've not read their more recent output.

Never heard of it.

Never heard of it.

Simula

Simula, designed for simulation in the 1960s...