An idea and syntax for "programming language"

I wanted to know what you think about this kind of programming language:

In case you think it's good, I would like some cooperation in developing it, as I don't have all that time alone - but I think it would make the web a better place ;)

Tambet Väli


it's a smart idea but...

The idea looks like it is in its very earliest stages, so I do not suppose you will find a lot of discussion about it here on LtU. There is not much there, yet.

From skimming, it looks like there are, in one manner of speaking, several ideas there, all bundled up:

  1. Users should express subjective relations among pages they read such as "A and B are about the same topic but A is better." Or, "A and B are about the same topic but A is highly technical while B assumes less background."
  2. Users can express their preferences by editing a textual declarative program directly, or by using interactive widgets (e.g. browser plugins) to automagically edit such scripts.
  3. Users can share their scripts, not least by putting them on a web page!
  4. Programs can collect such scripts from many users (e.g., by crawling a bunch of web pages) and build a database. It is important for the scripts to be automagically "composable" in some meaningful sense.
  5. An inference engine, of some sort, can perform useful computations over scripts and databases of scripts. For example, let's suppose that I am reading something out of my field - something in synthetic biology, say. I keep seeing references to "Sanger sequencing" and I want to learn more about it than the Wikipedia entry offers, but I'm getting lost in the most technical documents I can find. Perhaps a big database of people's subjective impressions of these documents can help me find a more advanced article on the topic, but one that I can still follow. At the very least, maybe the database can help me avoid articles on the topic which many users regarded as having decent "taste" have flagged as "misleading".

That's a smart idea in at least this sense: it takes the basic idea of Google's "Pagerank" algorithm, but tries to one-up it with a little bit of help from users. The raw number of links to an article is interesting - as Google shows. If we could separate out links where the linker is saying "here is a crappy article" from links where the linker is saying "here is a great article" or "here is a great article for laymen", etc.... then that's even more interesting. And you propose tools to help users give hints like "great article" or "crappy article" when they link... and search (and other applications) take it from there.

I can imagine that working in several domains.

Here is your problem:

Almost every part of your idea has been thought of, save for some details. You need to look at prior art and ongoing R&D by others, in my opinion.

If you are looking for syntaxes for expressing relations among pages, start looking into the body of work dubbed "the semantic web". You'll find, there, a syntactic framework ready to use (actually, more than one, unfortunately). You'll find work on building distributed, decentralized databases. You'll find work on inference engines. There are a lot of advantages, imo, to joining that approach for your (apparent) goals. "semantic web" - start there.

What looks like it could be novel in your project is your ideas about what particular relations among pages to support, what particular kinds of inferences to draw from them, and what kind of authoring and consumption tools to build around that. I would suggest taking a fairly long look into the semantic web world and thinking about how to re-express your ideas in that context. [Disclaimer: free advice, and worth every penny!]

To Thomas

About those systems you suggested describing (UIs) - yes, I have designed a lot of them and tested the syntax against many of them. I think the most important thing is to create this framework. Sharing scripts between users does, in fact, always mean putting them on the web. You need some localhost:1234 to develop something on your own machine. If you register with a server, your script's main page will be listed in that server's script as a unit (or import). You can also import a server's listings. The general server will crawl those pages, starting from one such server. Users can import everything from some server - for example, if it's trusted by biologists, etc.

Descriptions of plugins and web interfaces are for the future. They don't have to be implemented as a first step - I think of it as a general language supporting many kinds of interfaces; the user has a kind of driver or service which is able to update the script on her web page. This driver provides an interface to the different browser plugins, UIs, and other tools that will update the user's scripts. At first it will be good enough if the user can update them with a text editor and an FTP client.

Anyway, a good plugin interface would look like this:
* The user can select a package name for the toolbar (the toolbar changes that package).
* The user can add one-page buttons, which add one-page relations to her script (like IsInCategory(url, SomeUser:CategoryPackage:CategoryName)). Pressing such a button adds the current page.
* The user can add sequence buttons, which add sequence items about the order of pages (like someseq:urltab1 << someseq:urltab2. someseq:urltab2 << someseq:urltab3. etc.). Pressing such a button shows a dialog containing all open tabs in their browser order; the user can select a subset with checkboxes and reorder them if needed, and clicking OK adds them.
* The user can add buttons for multipage relations. This is a more complex case, which I won't describe here - and it is not needed very often.
* There is also a manual script button, which allows selecting one of the defined relations or sequences to add, then selecting URLs from open pages or writing their names (basically like you can do in Eclipse, with autocomplete). You can also simply write with it.
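As a concrete illustration, here is a minimal Python sketch of the text such toolbar buttons might append to the user's script. The helper names are hypothetical; the rendered syntax simply follows the examples above:

```python
def one_page_fact(relation, url, target):
    """Render the script line a one-page button would append,
    e.g. an IsInCategory fact for the current page."""
    return f"{relation}({url}, {target})."

def sequence_facts(seq_name, urls):
    """Render the ordered '<<' facts a sequence button would append
    for the tabs the user selected and reordered in the dialog."""
    return [f"{seq_name}:{a} << {seq_name}:{b}."
            for a, b in zip(urls, urls[1:])]

print(one_page_fact("IsInCategory", "http://example.org/page",
                    "SomeUser:CategoryPackage:CategoryName"))
for line in sequence_facts("someseq", ["urltab1", "urltab2", "urltab3"]):
    print(line)
```

The driver mentioned above would then merge such lines into the script on the user's page.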

There is a sidebar, which allows the following:
* The user can create relations which select results from some relation (Prolog, again, defines the query syntax) and order them by some sequence, showing the first x results. The server and the user's driver will do the magic.

This is the use case. I won't be too specific, as I think scalability is an important feature of this tool. Special-case handlers are already there.


It seems I should tell you some history of this idea to make it more sensible. It is, actually, one idea. It started from thoughts in several fields - for practical use - but ended up with this one interface.

My language is designed around this concept:
IsBetterThan(A, B) :- IsBetterThan(A, C), IsBetterThan(C, B).

This sentence, in Prolog, states that IsBetterThan is the kind of idea that yields some kind of sequence. I thought about software which could allow free use of such ideas. That was Prolog. Prolog shows that such sentences - logical rules - can be used to build very complex logic. You can express any function with them; they are Turing-complete.
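The effect of that transitivity rule can also be sketched outside Prolog. Here is a small Python sketch (assuming facts are stored as ordered pairs) that computes the implied IsBetterThan facts from the directly stated ones by iterating to a fixpoint:

```python
def transitive_closure(facts):
    """Given direct IsBetterThan(a, b) facts as a set of (a, b) pairs,
    repeatedly apply the rule
        IsBetterThan(A, B) :- IsBetterThan(A, C), IsBetterThan(C, B).
    until no new facts are implied."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, c) in list(closure):
            for (c2, b) in list(closure):
                if c == c2 and (a, b) not in closure:
                    closure.add((a, b))
                    changed = True
    return closure

facts = {("A", "C"), ("C", "B")}
print(transitive_closure(facts))  # includes the implied ("A", "B")
```

A Prolog engine derives the same implied facts lazily at query time instead of materializing them like this.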

So, my whole syntax is completely writable in this form.

My problems still were:
* Leaving all those possibilities open would lead to a very complex and slow syntax. This was one reason to create the sequence type.
* Prolog is a very complex language for a programmer with a C, Python, PHP, etc. background. For me, it's not as easily readable as many other languages. Thus, I created (once again) the sequence type, and also some operators like "if" and "switch", which do essentially different things from their C counterparts but look so similar that they are intuitively understandable for a C programmer. I made sure they could be converted to simple Prolog statements.

As I know about possible optimizations to the sequence type, I decided it will be the most natively supported way to state facts. That's also because sequences of this kind allow easy mixing of data from different users. Other rules are there, but I don't plan to implement them in a very intelligent way - that is left to future implementations. I will still implement them to a reasonable extent.


Now, yes. Many parts of this have already been done in different forms. I would mention:
* There are systems like Amazon's book suggestions. If there were such a tool under a compatible license, I would happily use it to share Prolog sentences between users.
* There is fuzzy logic, and there are different kinds of expert systems, etc. This is a good framework for expressing varying certainty between ideas.
* There is Prolog, which is a really scalable tool if used like that. There are also fuzzy logic modules for Prolog.
* I want to support this particular sequence type because it allows expressing the strengths of different facts, comparison of sites on different scales, etc.


The wiki there is currently, for some reason, not updateable. It became so very quickly. I haven't received a reply from the administrators of that site, so I will paste some possible implementation details here:

== Introduction to implementation guidelines ==

=== About ordinal relations ===

Relations of the form:
R(a{, b}*)

could be matched one by one, performing no implications. That's because implications make the whole process much more complex - not only do they make the fact database very large, but in a system like this, implications actually have a smaller degree of certainty than directly expressed facts. Also, when each such relation is an atom in the database (one entry), it is clear that some system like Amazon's could be used with small modifications. It's only a matter of finding such a system with a compatible license and API.
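A minimal sketch of this in Python, assuming facts are stored as ground atoms (a relation name plus an argument tuple, one database entry each). Matching is a direct lookup; no implications are computed, per the guideline above. The class and method names are hypothetical:

```python
class AtomStore:
    """Stores each relation instance as one atom (database entry),
    mapped to the set of users who declared it."""

    def __init__(self):
        self.atoms = {}  # (relation, args) -> set of users

    def assert_fact(self, user, relation, *args):
        self.atoms.setdefault((relation, args), set()).add(user)

    def holds(self, relation, *args):
        # Pure one-by-one matching: a fact holds only if stated directly.
        return (relation, args) in self.atoms

    def users_of(self, relation, *args):
        return self.atoms.get((relation, args), set())

db = AtomStore()
db.assert_fact("alice", "IsBetterThan", "A", "B")
db.assert_fact("bob", "IsBetterThan", "A", "B")
print(db.holds("IsBetterThan", "A", "B"))   # True: direct match
print(db.holds("IsBetterThan", "A", "C"))   # False: no implication drawn
```

Because every entry is an opaque item attached to users, an off-the-shelf item-recommendation system could indeed operate on it without understanding the relation semantics.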

Rules can also be matched in the same way - by making them atoms. Each structure consisting of ifs and switches can be converted into a set of Prolog rules in normal (simple) form; then these can be matched as atoms. If several rules are in contradiction, they should be ordered by priority. It should be made clear that the Prolog being used does not detect such contradictions as errors. Also, low-priority rules contradictory to facts could be removed as well.

=== Explanation of sequence types ===

Sequence types are the most powerful tool of wwl, especially in the first versions (later, relations might become equally well-supported and thus more powerful). A sequence type is also good syntactic sugar for some kinds of relations, as well as a data structure which allows very good matching possibilities.

From these kinds of queries (supposing that a, b, c etc. are mapped to some sequence elements):
a b d e

It is possible to say that a

From these structures:
a a

It is possible to imply more relations.

Those two sequences are ''compatible'', but not ''matching''. This is because they do not break the topology.

Adding one relation to the other:

It will be ''matching by implication'', because the same fact can be implied from the first one. They are still not ''directly matching'', because no facts are identical. It is possible that in practice, a direct match is a stronger match.

Given a fact base containing the following:

It will be ''contradicting'' - by implication with the first fact set, and directly with the second.

Having the following:
a b d e

It will be a sequence with two ''unrelated subsequences'' [which are not subsequences of any other unrelated subsequences, to be mathematically complete]. There could also be strongly connected subsequences (supported by several facts) and weakly connected subsequences (connected to very few facts relative to their size and internal interconnectedness). Weakly connected subsequences should not be treated as something separate in first implementations, as they are not anything special.
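Splitting a set of ordering facts into unrelated subsequences is a connected-components problem. Here is a Python sketch using union-find; reading "a b d e" as the two facts a << b and d << e is an assumption on my part:

```python
def unrelated_subsequences(pairs):
    """Split ordering facts (given as (x, y) pairs meaning x << y) into
    connected components; each component is an unrelated subsequence."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for x, y in pairs:
        union(x, y)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# 'a << b' and 'd << e' share no elements, so they form two components.
print(unrelated_subsequences([("a", "b"), ("d", "e")]))
```

Strongly versus weakly connected subsequences would additionally need edge counts per component, which this sketch does not track.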

=== Matching sequences ===

There are, thus, several kinds of matches, and the possibility of joining sequences.

Based on the range of compatibility, the number of direct and indirect matches, and the number of direct and indirect contradictions, it is possible to find the general similarity between several sequences.

To create a satisfactory database, it is possible to derive two kinds of structures from two or more sequences - a list of direct and indirect matches (in the form of a union), plus a list of direct and indirect contradictions (as a union of undirected relations, or simply a number in first versions). The list of matches can contain two arrays - each containing the strength of an element in one list (based on whether it is a direct match, an indirect match, or implied from a rule, where the last is weakest). Both arrays are connected to the owners of those sequences. As relations count as atoms, they could be stored in separate objects, which are pointed to together with the direction of the match.

Creating a match between two users should be relatively simple, but computing all matches among ten users becomes harder - that could already be on the order of 100 match lists. Luckily, with the few billion operations per second a computer can do, this matching becomes simpler - memory use, however, is a problem. It should be something like O(n) to be usable.

Now, having all those atoms in one database, it is actually not hard to create a linked list from each atom to all members who have this atom. Atoms could be stored in a binary tree for easy matching. Each atom contains a direct sequence relation, in the form of a sequence pointer (user->package->unit->name), and the two elements related by this sequence. It starts linked lists of the users who have declared relations between them, which are kept sorted. By adding to each such relation a pointer to the currently selected user, it is possible to iterate over similarities between users (going through the user database for each atom, then through all relations of some sequence which have this user; for each match, the pointer of this relation is moved to the next). From such a sequence, it should be possible to find strong relation groups for each user.
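The robust match count this structure enables can be sketched in Python without the linked-list machinery. Assuming the database maps each relation atom to the set of users who declared it, one pass over the atoms yields pairwise match counts:

```python
from collections import defaultdict
from itertools import combinations

def match_counts(atom_users):
    """atom_users: dict mapping each relation atom to the set of users
    who declared it.  Returns (user1, user2) -> number of shared atoms,
    with each pair ordered alphabetically."""
    counts = defaultdict(int)
    for users in atom_users.values():
        for u, v in combinations(sorted(users), 2):
            counts[(u, v)] += 1
    return dict(counts)

atoms = {
    ("seq", "a", "b"): {"alice", "bob"},
    ("seq", "b", "c"): {"alice", "bob", "carol"},
    ("seq", "c", "d"): {"carol"},
}
print(match_counts(atoms))
```

The sorted linked lists in the design above serve the same purpose as `combinations` here, but allow the pass to be made incremental and memory-bounded.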

This already allows a robust count of matches between users, but it still says nothing about the quality of the match.

There are several algorithms which can be used (and also applied recursively, which is why I list them in no particular order):
* a: It's possible to decide the relevance of a relation to one user by how big a percentage it makes up of all that user's shared matches, considering the relevance smaller if fewer users have this match.
* b: It's possible to decide the quality of a relation in the following manner: the more users there are with this relation, the stronger it is (more usable in global matching); and the more similar the number of matches in different directions, the stronger it is, logarithmically (if there are 500 matches in one direction and 500 in the other, it's very strong; if there are 1000 users with this relation and they all agree, it's less usable, because it does not discriminate between users).
* These, a and b, can be used recursively.
* It's possible to remove from the global search all atoms which do not make up a considerable proportion of a user's shared matches.
* It's possible to intercompare sequence relations by taking two long sequence matches and comparing them to each other, noting how they discriminate between users. Their discriminativity is strongest if they put users into four equal categories covering all users (many users might not have declared both). Their weight is biggest if they categorize users in exactly the same way. With a computer processing 43 billion instructions per second, and one match over 10,000 users taking perhaps 50,000 instructions or a few times more, about 860,000 matches between two users could be done per second. Thus, it should be possible to interprocess all users in a few minutes without a lot of optimization - matches can be found, and it mostly comes down to creating a map of matches which states both similarities and differences strongly.
* Starting from the longest atom chains which make sense, one can find similarities between different relation atoms and match them together. Chains which have few contradictions, many matches, and unmatched sequences at some length (users who have only one of the two relations defined) will lend some weight to the possibility that those other users could have declared the same relations. Such relations will form pairs, which could even be merged in some cases (though some more data about how each user belongs might be needed for other algorithms). These new endings should be stronger the more discriminative the match pair is. This could then be applied to all users - storing matches and mismatches separately - for some good number of sequence pairs.
* As all such sequences belong to sequence groups of many users, those groups can be carried over to all new and existing users of such a list - the more strongly, the more discriminative the list is. They can be stored in the shared parts of users' sequences at first; then new sequences for each user can be combined, like a DNA molecule, using similar pattern matching as in medicine.
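Algorithm (a) above can be sketched concretely. In this Python sketch the relevance of an atom to a user is its share of that user's total shared matches, damped when few users declared the atom; the exact damping factor is my assumption, not fixed by the text:

```python
def relation_relevance(user, atom, atom_users):
    """Sketch of algorithm (a): relevance of a relation atom to a user.
    atom_users maps each atom to the set of users who declared it.
    The (1 - 1/support) damping for low-support atoms is an assumed
    choice; any factor shrinking toward 0 for rare atoms would do."""
    # Shared matches of this user: for each atom the user declared,
    # the number of *other* users who also declared it.
    shared = {a: len(us) - 1 for a, us in atom_users.items()
              if user in us and len(us) > 1}
    total = sum(shared.values())
    if atom not in shared or total == 0:
        return 0.0
    support = len(atom_users[atom])  # users who declared the atom
    return (shared[atom] / total) * (1 - 1 / support)

atoms = {
    ("seq", "a", "b"): {"alice", "bob"},
    ("seq", "b", "c"): {"alice", "bob", "carol"},
}
print(relation_relevance("alice", ("seq", "b", "c"), atoms))
```

Applying this recursively, as suggested, would mean re-weighting the shared-match totals by the relevances from the previous round.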

It's possible to find more optimizations and make the algorithms simpler, which should be discussed here:
["Discussion about sequence matching optimizations"]