archives

word2vec

So I made some claims in another topic that the future of programming might be intertwined with ML. I think word2vec provides some interesting evidence to this claim. From the project page:

simple way to investigate the learned representations is to find the closest words for a user-specified word. The distance tool serves that purpose. For example, if you enter 'france', distance will display the most similar words and their distances to 'france', which should look like:

                 Word       Cosine distance
-------------------------------------------
                spain              0.678515
              belgium              0.665923
          netherlands              0.652428
                italy              0.633130
          switzerland              0.622323
           luxembourg              0.610033
             portugal              0.577154
               russia              0.571507
              germany              0.563291
            catalonia              0.534176

Of course, we totally see this inferred relationship as a "type" for country (well, if you are OO inclined). Type then is related to distance in a vector space. These vectors have very interesting type-like properties that manifest as inferred analogies; consider:

It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') [3, 1]. You can try out a simple demo by running demo-analogy.sh.

I believe this could lead to some interesting augmentation in PL, in that types can then be used to find useful abstractions in a large corpus of code. But it probably requires an adjustment in how we think about types. The approach is also biased to OO types, but I would love to hear alternative interpretations.

Another "big" question

To continue the interesting prognostication thread, here is another question. Several scientific fields have become increasingly reliant on programming - ranging from sophisticated data analysis to various kinds of standard simulation methodologies. Thus far most of this work is done in conventional languages, with R being the notable exception, being a language mostly dedicated to statistical data analysis. However, as far as statistical analysis goes, R is general purpose -- it is not tied to any specific scientific field [clarification 1, clarification 2]. So the question is whether you think that in the foreseeable future (say 5-15 years) at least one scientific field will make significant use (over 5% market share) of a domain specific language whose functionality (expressiveness) or correctness guarantees will be specific to the scientific enterprise of the field.

It might be interesting to connect the discussion of this question to issues of "open science", the rise in post-publication peer-review, reproducability and so on.

Will scientific fields give rise to hegemonic domain specific languages (within 5-15 years)?
 
pollcode.com free polls