Candle - a new script language that unifies XML technologies

Dear members of LtU,

I'm glad to announce the 0.9.1 beta release of Candle (Common ApplicatioN Development LanguagE). Candle is an open-source (MPL) scripting language that unifies the core features of many XML-related technologies (including XSLT, XQuery, XQuery Update, RELAX NG, BNF, XHTML, SVG and more). It can be used to develop command-line, desktop and Internet applications quickly.

Some of the advantages of Candle compared to XSLT and XQuery are:

- Candle is a unified language instead of two highly overlapping languages, and it uses a scripting syntax instead of the highly verbose markup syntax of XSLT.
- Candle's markup language is strongly typed even without a schema, whereas XML is only weakly typed without a schema.
- Candle includes a pattern language that cleanly unifies several pattern-related DSLs (including RegEx, BNF, RELAX NG and XQuery sequence types). It can easily match on sequences of items, nodes and characters.
- Candle unifies functional and procedural programming. Through a mechanism I call separation-of-side-effects, Candle unifies the two worlds in a more orderly manner than any existing multi-paradigm programming language. In Candle, routines are divided into functions and methods: functions are routines without side effects, and methods are routines with side effects. The rule of separation-of-side-effects is that methods can call functions, but not vice versa. Expressions in Candle are always functional. In this way, pure functional islands are well preserved in the vast sea of procedural code.
- Candle is a general-purpose scripting language like Python, whereas XSLT and XQuery are just DSLs. Candle alone is sufficient to develop complex command-line, desktop and Internet applications, whereas XSLT and XQuery still need to be integrated with other languages to build a serious application.
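Candle enforces separation-of-side-effects in the language itself. As a rough illustration only, the rule can be emulated at run time in Python: the decorators `function` and `method` below are hypothetical names, not Candle's API, and a single global depth counter (not thread-safe) tracks whether a pure function is currently running.

```python
import functools

# Hypothetical emulation: counts how many pure "functions" are currently active.
_pure_depth = 0


class SideEffectError(Exception):
    """Raised when a pure function tries to call a side-effecting method."""


def function(f):
    """Mark f as pure: while it runs, no method may be called."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        global _pure_depth
        _pure_depth += 1
        try:
            return f(*args, **kwargs)
        finally:
            _pure_depth -= 1
    return wrapper


def method(f):
    """Mark f as side-effecting: allowed only when no function is active."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if _pure_depth > 0:
            raise SideEffectError(f"a function may not call method {f.__name__}()")
        return f(*args, **kwargs)
    return wrapper


log = []


@method
def write(line):
    log.append(line)  # side effect


@function
def square(x):
    return x * x  # pure


@function
def bad_square(x):
    write(f"squaring {x}")  # violates separation-of-side-effects
    return x * x


@method
def main():
    write(str(square(3)))  # a method calling a function is allowed
```

Calling `main()` succeeds because a method may call a function; calling `bad_square(2)` raises `SideEffectError` because a function may not call a method. A language-level rule can reject such code before the program runs, whereas this emulation only catches the violation when the offending call actually happens.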

For more information, you can visit the website (http://www.candlescript.org/) or the SourceForge project.

You can also read the blog articles Why I Invented Candle (I), Why I Invented Candle (II) and Towards the Grand Unification to understand the design rationale behind Candle.

Your feedback on Candle is highly appreciated.

Henry

--------------------------------------------------------------------
Candle App Platform - a unified platform for desktop and Internet apps.

Curl

First, I applaud you on providing answers to our traditional "Why yet another language?" question.

You say:

the single and most important reason for inventing Candle is to create a new general-purpose programming language that treats markup data as built-in data type

I'm a bit curious how much research you've done on other languages with the same goal, such as the Curl language.

By dmbarbour at Mon, 2011-08-08 16:30

Scala?

By Ehud Lamm at Tue, 2011-08-09 00:02


Xtatic? (It was pretty interesting research in the same range of ideas, imho.)

By Cyril at Tue, 2011-08-09 00:24

XQuery?

If we're pointing out languages that meet the same goals, I think the obvious one is XQuery itself. Contrary to the poster's characterization, XQuery is not "just a DSL" and does not have to be embedded in a host language; many people are already using it to reduce complexity in web apps by cutting out the middle tier.

By jpcs at Tue, 2011-08-09 09:04

XQuery is not general enough

Yes, I agree that XQuery is much more than most DSLs. But to me, it is closer to PL/SQL than to a general-purpose language like Python. The major missing features are multi-threading/async support, general I/O support, OOP support, and event-driven GUI support.

By Henry Luo at Wed, 2011-08-10 06:52

relation to existing work, etc.

First, an aside: I think jpcs asks a good question ("XQuery?").

Clarifying some features on Candle

Thanks, Thomas, for your comments and questions. They are all very solid questions.

The "put"/"get" feature in Candle is referentially transparent if you consider the global variables in XSLT referentially transparent. Functions in Candle can only shadow the values in this named stack, not modify them. XSLT also supports tunneled variables. Candle is slightly more dynamic by allowing the name to be dynamic. Another way to understand "put"/"get" is to imagine that ALL functions in Candle or XQuery are required to pass around one argument called $named-value-stack. The "put" statement then just wraps this $named-value-stack with one more name-value pair before passing it to the functions in the statement block, and "get" is just a function that unwinds the $named-value-stack and returns the first value with a matching name.
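Henry's desugaring of put/get can be sketched in Python, assuming the named-value stack is an immutable linked list of (name, value) frames passed explicitly to every function. The names `put`, `get`, `greet` and `outer` are illustrative, not Candle's implementation.

```python
from typing import Any, Optional, Tuple

# Persistent (immutable) stack: None is empty; each frame is (name, value, rest).
Stack = Optional[Tuple[str, Any, "Stack"]]


def put(stack: Stack, name: str, value: Any) -> Stack:
    """Return a new stack with one more binding; the old stack is untouched."""
    return (name, value, stack)


def get(stack: Stack, name: str) -> Any:
    """Unwind the stack and return the innermost value bound to name."""
    while stack is not None:
        frame_name, value, stack = stack
        if frame_name == name:
            return value
    raise KeyError(name)


# Every "function" receives the named-value stack as an explicit argument.
def greet(stack: Stack, who: str) -> str:
    return f"{get(stack, 'greeting')}, {who}"


def outer(stack: Stack) -> list:
    s1 = put(stack, "greeting", "Hello")
    first = greet(s1, "Ann")
    s2 = put(s1, "greeting", "Hi")  # shadows the binding, does not overwrite it
    second = greet(s2, "Bob")
    third = greet(s1, "Cid")  # s1 still sees "Hello"
    return [first, second, third]
```

`outer(None)` returns `["Hello, Ann", "Hi, Bob", "Hello, Cid"]`: the inner put shadows the outer binding without destroying it. This only shows that the explicit-argument form is referentially transparent; whether the implicit form qualifies is what the following comments dispute.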

The document store or database mechanism is missing in the current implementation of Candle. It's not easy to implement, so there won't be transactional support in Candle in the next few beta releases, but it should be in the final release.

Candle is going to implement a "directory" concept in the next beta release, so that you can query and update a file system just as you query and update XML. I think that's more useful than the abstract "collection" concept in XQuery.

The update mechanism in Candle is quite different from XQuery Update. It's more like PL/SQL. I'll write a separate reference doc on that.

Routines will be first-class in the next beta release of Candle.

Candle has not implemented tail-recursion removal, but that's just an optimization. "put" does not affect tail-recursion removal: you can think of it as one extra parameter passed along implicitly by the system.

Candle already supports making HTTP request. I'll expand the current ref doc and write a tutorial on that.

There's no asynchronous processing support in Candle at the moment. There might be some preliminary threading support in the next beta release.

The next beta release should be out in about two to three months' time.

By Henry Luo at Wed, 2011-08-10 07:53

defining "referential transparency"

I think that the allegedly pure functions in Candle are not actually pure. I think the following is a mistake (and I might be wrong):

The "put"/"get" feature in Candle is referentially transparent if you consider the global variables in XSLT referentially transparent. The functions in Candle can only shadow the values in this named stack, not modify them. XSLT also supports tunneled variables. Candle is slightly more dynamic by allowing the name to be dynamic. [emphasis added]

Here is a proposed test of referential transparency:

Assume that there are constructs which can determine the binding of a name (assignments) and, on the other hand, there are constructs which produce the value of a name (references).

And let's consider a class of programs that contain no "dead code": every apparent control flow path in these programs can be taken, depending on input. We don't need to be able to enumerate the set of no-dead-code programs, which is good, because of course we can't. But we know that every program is either in this class or it isn't.

We can ask: given the text of a program, considering the no-dead-code programs, can we compute for each reference to a name, the exact set of assignments to that name which can determine the value at the time of the reference?

More specifically, I need some algorithm A which, given any program, will compute for each reference a list of assignments that might determine the value when that reference is evaluated. If a program happens to be of the no-dead-code variety, A must provably produce exactly the list of assignments that affect each reference.

If that computation can always be made in an amount of time bounded by some function of the textual length of the program, then the language is referentially transparent.

On the other hand, if that computation can't (in general) be made in any less time than it takes to actually run the program itself -- then the language is not referentially transparent.

By that test, XSLT with its version of tunneled variables is referentially transparent. It is a functional language (as designed).

By that same test, the allegedly functional subset of Candle is not referentially transparent (and so I would say misses its design goal of being a truly functional subset).

By Thomas Lord at Thu, 2011-08-11 23:27

"put"/"get" are just dynamically scoped variables

The last paragraph of this article states:

A more subtle example is that of a function that uses a global variable (or a dynamically scoped variable, or a lexical closure) to help it compute its results. Since this variable is not passed as a parameter but can be altered, the results of subsequent calls to the function can differ even if the parameters are identical. (In pure functional programming, destructive assignment is not allowed; thus a function that uses global (or dynamically scoped[citation needed]) variables is still referentially transparent, since these variables cannot change.)

Put/get in Candle is just a dynamically scoped variable. Its value is never destructively assigned, only shadowed during evaluation; thus it is referentially transparent.

Here's another way to understand or test it. We can transform Candle functions into a set of equivalent functions without put/get statements, using the approach I described in my previous post. If all the resulting functions are referentially transparent, then we can say the original functions are referentially transparent. (That's my own invented way of proving referential transparency. It itself needs to be proved, but my gut feeling tells me it should work. :-)

By Henry Luo at Fri, 2011-08-12 08:56

Referential Transparency

The article you quote is wrong with respect to 'dynamically scoped' variables. That happens.

Ability to 'transform' into a referentially transparent language is irrelevant. Consider: one way of interpreting C code is that it does not destructively alter anything... rather, it just creates a new environment and passes it to the next statement. This 'environment passing style' interpretation works until we add concurrency. But we would still not call C 'referentially transparent'.

In Haskell, we get similar with a State or ST monad. The overall 'runState' or 'runST' computation can be said to be pure, deterministic, referentially transparent - but within the monad, the abstraction developers have is of changing state. And statements cannot be said to be referentially transparent since they must be concerned about the implicit referential context of each statement.

RT must be understood syntactically - at the level of statements and expressions. It really does make a difference whether the environment - the referential context - is 'explicit' vs. 'implicit'.

By dmbarbour at Fri, 2011-08-12 18:36

Candle is referentially transparent overall

Maybe I can agree that functions in Candle which call get() are not referentially transparent, judging by the inputs they take.

But overall, Candle is always referentially transparent, meaning the top-most function main() is: main() depends only on its inputs, and when the program starts, the named stack is always empty.

And if someone wants to analyze functions in Candle in a pure referentially transparent way, they can always transform the functions in Candle using the approach I mentioned above.

By Henry Luo at Sat, 2011-08-13 08:17

XSLT functions are also not referentially transparent

And XSLT functions using tunnel parameters are also not referentially transparent judging by the same criteria.

Say template A pushes one tunnel parameter, then calls templates B, C, D, E ... X, Y, and finally template Z, which pops the tunnel parameter. In this call chain, templates B, C, D, E ... X, Y are not referentially transparent, because they depend on the pushed tunnel parameter, not just on their own template parameters.

Below is how the XSLT 2.0 spec explains tunnel parameters:

Note:
Tunnel parameters are conceptually similar to dynamically-scoped variables in some functional programming languages.

Candle is just one of those functional programming languages.

By Henry Luo at Sat, 2011-08-13 12:56

XSLT and referential transparency

You said something earlier that is the crux of why XSLT is referentially transparent and Candle's allegedly functional subset is not:

Candle is slightly more dynamic by allowing the name [of a tunneled variable being assigned or referenced] to be dynamic.

I understand you to mean that "put" and "get" can be used to bind or reference a dynamically computed name. The name used can also occur statically in the program.

Here's a consequence of letting names be dynamic like that:

Given a reference to a tunneled variable in XSLT, by examining the program, I can always list a set of binding locations in the source code such that: my set includes all binding locations that can impact that reference; and my set contains no other binding locations except, maybe, some "false positives" which are related only by dead code paths.

In Candle, because the name of a tunneled variable can be dynamic, the same computation is impossible.

In that sense, the meaning of names in XSLT is statically apparent in a way that the meaning of names in Candle is not.

That impossibility, of computing the relation between bindings and references, is what shows that Candle's allegedly functional subset is not referentially transparent. Probably, if you did not allow the names passed to put/get to be dynamic, the allegedly functional subset of Candle would then be functional. (But see further below.)

In other words, the similar operational model of a binding stack doesn't make or break referential transparency -- that's not relevant. What matters is whether you can statically analyze the relation between binding sites and reference sites with precision (modulo dead code).
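Thomas's criterion can be illustrated with a toy analyzer. This is a hypothetical model, not XSLT or Candle: it ignores function calls and treats nested put blocks as the only scoping construct, and a `Dyn` name stands for a name computed at run time. For each get, it lists the puts that might supply its value.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Dyn:
    """A name computed at run time; the analyzer cannot know its value."""
    expr: str


Name = Union[str, Dyn]


@dataclass
class Get:
    label: str
    name: Name


@dataclass
class Put:
    label: str
    name: Name
    body: List[Union["Put", Get]] = field(default_factory=list)


def candidate_bindings(program):
    """Map each Get label to the labels of the Puts that may supply its value."""
    result = {}

    def walk(nodes, enclosing):
        for node in nodes:
            if isinstance(node, Put):
                walk(node.body, enclosing + [node])
                continue
            cands = []
            for put in reversed(enclosing):  # innermost binding first
                if isinstance(node.name, Dyn) or isinstance(put.name, Dyn):
                    cands.append(put.label)  # might match: cannot tell statically
                elif put.name == node.name:
                    cands.append(put.label)  # definitely matches
                    break  # every outer put of this name is shadowed
            result[node.label] = cands

    walk(program, [])
    return result


static_prog = [Put("p1", "x", [Put("p2", "y", [Get("g1", "x")]),
                               Put("p3", "x", [Get("g2", "x")])])]
dynamic_prog = [Put("p1", "x", [Put("p2", Dyn("name_from_input()"),
                                    [Get("g1", "x"), Get("g2", Dyn("other()"))])])]
```

With only static names, every get resolves to at most one put (`g1` to `p1`, `g2` to `p3`). Once a put or get uses a `Dyn` name, the analyzer can only return a superset of candidates, and narrowing it would require running the program.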

Suppose that Candle were changed to only allow static names to put/get. Now there is a chance that the allegedly functional subset is truly functional. I still have another concern:

How do you treat the function "now()", to name one example?

In XSLT, a program can call a putative time function like "now()" many times. Each time it returns the current time. The catch is that in XSLT now() is a constant function that always returns exactly the same value each time it is called in a given run of the program.

If that were not true, then XSLT would no longer be referentially transparent.

So, how does "now()" behave in Candle? I ask because I think for the non-functional parts of Candle, you'd want the value of now() to change over time ... but for the allegedly functional parts, you want it to remain constant. How does Candle reconcile that tension?

By Thomas Lord at Sun, 2011-08-14 17:11

Candle will make the names in put static in the next release

Thomas, thanks for your detailed analysis.

I'm thinking along the same lines: in the next beta release of Candle, I'll make the names in put statements static. The syntax of the put statement will be made similar to the let statement, put $var as type = var { ... }, so that put can be woven into FLWOR statements.

The treatment of now() and random() in the current beta release of Candle is a bit incomplete. In the next release, each will be split into two versions:
function now()              !! same across the entire query evaluation
method now()                !! changes on each evaluation
function random(seed, cnt)  !! based on the seed, returns a series of random numbers
method random()
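A minimal Python sketch of the split described above, assuming an explicit `Evaluation` object that captures the clock once per run. Python cannot overload one name as both a function and a method, so the hypothetical names `function_now`, `method_now`, `function_random` and `method_random` stand in for Candle's pairs.

```python
import datetime
import random
from typing import List


class Evaluation:
    """One run of a program; the pure clock is fixed when the run starts."""

    def __init__(self) -> None:
        self.start = datetime.datetime.now()

    def function_now(self) -> datetime.datetime:
        # Pure: the same value for the entire evaluation.
        return self.start


def method_now() -> datetime.datetime:
    # Side-effecting: reads the real clock on every call.
    return datetime.datetime.now()


def function_random(seed: int, cnt: int) -> List[float]:
    # Pure: the same (seed, cnt) always yields the same series.
    rng = random.Random(seed)
    return [rng.random() for _ in range(cnt)]


_shared_rng = random.Random()


def method_random() -> float:
    # Side-effecting: advances hidden generator state on every call.
    return _shared_rng.random()
```

`function_now` behaves like XPath's `current-dateTime()`, which is stable within one execution, and `function_random` is pure because the whole series is determined by its arguments.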

By Henry Luo at Mon, 2011-08-15 03:21

Candle vs. other XML-enabled languages

Firstly, thanks everyone for your comments. I think the questions have gone beyond what I can answer in a single post.

In this post I'll just share my observations on the XML support in some languages I'm aware of. I summarize them in the feature comparison table below. It's illustrative rather than exhaustive or authoritative (if any fact below is wrong, you are welcome to point it out):

Features \ Languages | Curl | Scala | JavaScript | .NET | JavaFX | XQuery | Candle
Dynamic XML construction/composition [1] | Yes | Yes | No | No | Yes | Yes | Yes
Path expression | No? | Basic | Basic [2] | No | No | Good (XPath) | Good (XPath-based)
Query support [3] | No? | Basic | No | Basic (LINQ) | No | Good | Good (XQuery-based)
Template transformation | No? | No | No | No | No | No | Good (XSLT-based)
Node pattern and schema matching [4] | No? | Basic (through case classes) | No | No | No | Basic (sequence types) | Good (RELAX NG-based)
Node update [5] | No | No | No | No | No | Good (XQuery Update) | Good (XQuery Update-based)
Grammar support [6] | No | No | No | No | No | No | Good (BNF-based)

[1] Directly in the language, not through DOM APIs.
[2] E4X is basic compared to XPath; jQuery has an interesting feature that uses CSS selectors to query HTML and XML, and this approach is formalized by the W3C Selectors API.
[3] Set-oriented, FLWOR, etc.
[4] Directly in the language, not relying on external schema languages.
[5] Using CRUD statements instead of DOM APIs.
[6] The ability to parse any abstract text based on a general BNF grammar and return the resulting syntax tree (AST); all query/transform features can then be applied to the AST. I know this can be done in all of these languages using third-party libraries, but I'm talking about built-in support here.

To me, just allowing XML to be directly constructed in a language does not count as real built-in XML support; that's just "syntax sugar" for the DOM API. Real built-in support means supporting path, query, transform, pattern matching and update on XML data. From the comparison, you can see that Candle offers the most complete built-in support for XML processing.
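The grammar-support idea can be illustrated outside Candle: parse text with a small grammar into a tree, then reuse ordinary tree queries on the result. This Python sketch uses a hypothetical two-rule arithmetic grammar and a hand-written recursive-descent parser that builds `xml.etree.ElementTree` elements, so the limited XPath subset of `findall` works on the parse tree. It is not Candle's feature, which, as described in the table, takes a general BNF grammar as input.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical grammar, written BNF-style:
#   expr ::= term ("+" term)*
#   term ::= num ("*" num)*
#   num  ::= [0-9]+


def parse(text):
    """Recursive-descent parser that returns the syntax tree as XML elements."""
    # Characters other than digits, "+" and "*" (e.g. spaces) are skipped.
    tokens = re.findall(r"\d+|[+*]", text)
    pos = 0

    def num():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at token {pos}")
        el = ET.Element("num")
        el.text = tokens[pos]
        pos += 1
        return el

    def term():
        nonlocal pos
        el = ET.Element("term")
        el.append(num())
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            el.append(num())
        return el

    def expr():
        nonlocal pos
        el = ET.Element("expr")
        el.append(term())
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            el.append(term())
        return el

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(tree):
    """Evaluate the parse tree using only ElementTree path queries."""
    total = 0
    for term_el in tree.findall("term"):
        product = 1
        for num_el in term_el.findall("num"):
            product *= int(num_el.text)
        total += product
    return total
```

`parse("1+2*3")` yields `<expr><term><num>1</num></term><term><num>2</num><num>3</num></term></expr>`, `findall(".//num")` returns the three numbers in document order, and `evaluate` returns 7.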

By Henry Luo at Wed, 2011-08-10 06:36


I'm pretty sure Scala supports queries (through comprehensions) and node pattern matching (via...pattern matching).

But isn't XML dead/dying anyway as a web interchange format? These days, JSON seems increasingly preferred given its simplicity (untyped...) and integration with JavaScript. JSON benefits from language support also (like through C#'s dynamic typing).

By Sean McDirmid at Wed, 2011-08-10 08:25

JSON and XML are just hierarchical data

I've updated my table based on your input on Scala.

XML, JSON and even BSON are just different syntaxes for hierarchical data. The discussion above is really about hierarchical data processing, and it applies to both XML and JSON.
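The claim can be made concrete with a small Python sketch, assuming documents that use only elements and text (no attributes, no mixed content). Both formats are normalized into the same nested dict-of-lists shape, after which one path query serves both. The helpers `from_xml`, `from_json` and `select` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def from_xml(el):
    """Element-only XML -> dict mapping each child tag to a list of values."""
    children = list(el)
    if not children:
        return el.text
    out = {}
    for child in children:
        out.setdefault(child.tag, []).append(from_xml(child))
    return out


def from_json(value):
    """JSON -> the same shape: every object field becomes a list of values."""
    if isinstance(value, dict):
        out = {}
        for key, v in value.items():
            items = v if isinstance(v, list) else [v]
            out[key] = [from_json(x) for x in items]
        return out
    return value


def select(tree, path):
    """Follow a '/'-separated path, collecting every match like a path expression."""
    nodes = [tree]
    for step in path.split("/"):
        nodes = [v for n in nodes if isinstance(n, dict) for v in n.get(step, [])]
    return nodes


xml_doc = "<book><title>SICP</title><author>Abelson</author><author>Sussman</author></book>"
json_doc = '{"title": "SICP", "author": ["Abelson", "Sussman"]}'
```

Both documents normalize to `{"title": ["SICP"], "author": ["Abelson", "Sussman"]}`. The root element name, attributes and mixed content are dropped here, which is exactly where the two formats stop being interchangeable.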

By Henry Luo at Wed, 2011-08-10 12:52


I disagree, at least on JSON. JSON is all about communicating with JavaScript structures, i.e., plain old associative arrays. With JSON, all the craziness of XML goes away and I don't really need any special language. But you also have new problems, like data-dependent union types, things not easily expressed in a static type system. You must really embrace dynamic when using JSON.

Now, JSON is not useful for real HTML-like document markup, and in that case you do want something like XML. But then I still don't see much point to something like Candle: outputting markup is easy enough with literals, while the read/parsing problems aren't very solvable regardless.

By Sean McDirmid at Wed, 2011-08-10 13:14


But you also have new problems, like data-dependent union types, things not easily expressed in a static type system. You must really embrace dynamic when using JSON.

Could you elaborate a bit? While I assume that describing an arbitrary predicate over a data format may indeed require dependent types (and I remember seeing dependent types when glancing at the paper "The Next 700 Data Description Languages"), I had the impression that most actual dependencies were of the kind "the union case is determined by an accompanying tag value", which are very well modeled by standard algebraic datatypes. Of course, things like indefinite nesting or selector lookup may require a more elaborate type system, such as the regular types of {C,X}Duce, but I'm not aware of any practically relevant case that would give existing type technology a hard time.
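The well-behaved case, where a tag value selects the union case, maps directly onto an algebraic datatype. In Python the closest analogue is a `Union` of dataclasses with a decoder that dispatches on the tag; the `"type"` field and the `Checkin`/`Photo` cases are hypothetical, not taken from any real API.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Checkin:
    venue: str


@dataclass
class Photo:
    url: str
    width: int


Event = Union[Checkin, Photo]  # the sum type: each value is exactly one case


def decode_event(obj: dict) -> Event:
    """Pick the union case from the accompanying 'type' tag."""
    tag = obj.get("type")
    if tag == "checkin":
        return Checkin(venue=obj["venue"])
    if tag == "photo":
        return Photo(url=obj["url"], width=int(obj["width"]))
    raise ValueError(f"unknown event type: {tag!r}")


raw = '[{"type": "checkin", "venue": "Cafe"}, {"type": "photo", "url": "a.jpg", "width": 640}]'
events = [decode_event(o) for o in json.loads(raw)]
```

The harder case Sean describes in his reply, where a field is present or absent depending on a value the client cannot see, offers no tag to dispatch on.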

By gasche at Wed, 2011-08-10 20:23


It's hard to say how people actually use JSON in practice, since I've observed that it's very diverse (I've dealt with Facebook, Foursquare, Yelp and lots of others). Sometimes they use tags; sometimes the element is just there or not, depending on some hidden value that you can't see at all. I had this discussion with Don Syme about type providers (typed F# wrappers around web data); it just seemed that the approach faced lots of challenges in handling JSON.

It wouldn't be a big deal if JSON wasn't becoming the standard for receiving/sending web data. But XML is becoming increasingly rare in public APIs. Some sites offer both, but many are just bypassing XML altogether.

By Sean McDirmid at Wed, 2011-08-10 23:38

JSON vs. XML

XML is getting hammered on reddit today, rightly so for anyone who has tried to work with it. The niche where XML works well is much smaller than was thought two years ago.

By Sean McDirmid at Sat, 2011-08-20 23:54

re JSON vs. XML

The niche where XML works well is much smaller than was thought two years ago.

citation?

By Thomas Lord at Sun, 2011-08-21 03:23


Sorry, I was referring to data in the original source article.

By Sean McDirmid at Sun, 2011-08-21 04:43

XML's "works well" niche

So, what that data suggests is that fewer (percentage-wise) people are choosing XML, not that what they are doing instead works well in comparison.

By Thomas Lord at Mon, 2011-08-22 16:28

Little Bits of XML Repeating

I guess we'll see how many XML technologies are reinvented with the JSON brand. ;)

By dmbarbour at Mon, 2011-08-22 17:04

hypothesis about XML's ill health

To a first approximation, if you are using free software components rather than paying to license proprietary ones, the odds are very good that you are stuck using James Clark's parsers. If you want XSLT, you are probably stuck using xsltproc. For XQuery, you are stuck with Berkeley DB/XML, nowadays provided by Oracle.

None of that software is especially terrible and it's all stronger than all the other free software alternatives I've seen. (If anyone wants to prove me wrong and point out better free software tools, bless them!)

None of that software is especially terrible, but neither is any of it especially great. There are missing pieces, awkward interfaces, performance problems, and so on. I've really enjoyed using all of those components and getting things done with them, at times, but my frustrations have constantly reminded me that they are not nearly as robust and efficient as the problem domain demands. None of the free software XML foundation has been lavished with as much serious and sustained development effort as you'd wish.

So, now, what happens when I want to build an application? I have a choice. I can use huge and slightly tricky to understand XML libraries but, because of the limits of the free software implementations, my application will have some flakiness, inefficiencies, and/or functionality gaps. Or, I can use the lower tech stuff like JSON and because of the *design gaps* in that lower tech stuff, my application will have some flakiness, inefficiencies, and/or functionality gaps. (Or, I can resolve to go off and, "step 0", build better XML tools but then you'll likely never hear from me again after my boss fires me for turning a 6 month project into a 6 year project.)

For XML to really take its rightful place, it's going to be necessary to substantially raise the robustness, efficiency, and completeness of the foundation of free software XML components. By observation of the high quality proprietary software components I infer that this is going to take a lot of well organized labor and a sophisticated leadership -- exactly the kind of thing the free software world is historically lousy at doing.

It's kind of a bummer because if you could put a dollar figure on the development costs needed I think it would be nearly peanuts compared to typical large corporate budgets. Enough that individuals can't do it and typical middle managers can't risk it ... but peanuts in the grand scheme of things that would help juice the software economy.

By Thomas Lord at Mon, 2011-08-22 18:29

Design Gaps

My impressions of XML haven't been so positive. OTOH, most of my complaints would also apply to JSON.

I would like to see models that focus more on sets, streams, updates (patches), laziness, map-reduce, and data fusion. Honestly, I think we'd be better off pushing harder in the direction of relational programming with live queries (i.e. temporal inserts/deletes come across the line). (XPath, XSLT and similar query languages seem sort of hacked onto XML; they're good, but maybe we could do better if the underlying data model were designed for them.)
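One possible reading of "live queries", assumed here rather than taken from the comment: the query result is kept current by applying each insert or delete as it arrives, instead of re-running the query. A minimal Python sketch for a filter query over tuples, using a multiset so duplicate rows are counted correctly; `LiveQuery` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, List, Tuple

Row = Tuple[str, int]


class LiveQuery:
    """A filter query whose result is maintained from insert/delete deltas."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.rows: Counter = Counter()  # multiset: row -> multiplicity

    def apply(self, op: str, row: Row) -> None:
        if not self.predicate(row):
            return  # the change cannot affect this query's result
        if op == "insert":
            self.rows[row] += 1
        elif op == "delete":
            self.rows[row] -= 1
            if self.rows[row] <= 0:
                del self.rows[row]
        else:
            raise ValueError(f"unknown operation: {op!r}")

    def result(self) -> List[Row]:
        return sorted(self.rows.elements())


q = LiveQuery(lambda row: row[1] > 10)  # rows are (name, price)
for op, row in [("insert", ("tea", 5)), ("insert", ("cake", 12)),
                ("insert", ("pie", 15)), ("delete", ("cake", 12))]:
    q.apply(op, row)
```

After the four deltas, `q.result()` is `[("pie", 15)]`. Each delta costs work proportional to one row, not to the size of the data set, which is the appeal of live queries over repeated batch queries.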

And I also like how David Ryan's Argot simply negotiates the shared types at the beginning of every file or stream, in a compact way. Structured editing wouldn't require XML's obtuse format if we have such types.

Making XML technologies more efficient and robust would maybe help for a lot of projects. Maybe. But I still feel the technology itself has a lot of design gaps. What is its 'rightful place'?

By dmbarbour at Mon, 2011-08-22 19:00


In my own experience, it's that the most important APIs are either dual XML/JSON or JSON-only. So you still have a bunch of mature APIs that were designed around XML and are no longer evolving (maybe because they aren't very popular, and if it's not broken, don't fix it), and you have the popular APIs that developers are using en masse, which are more volatile and more likely to be JSON.

A new web API today is more likely to be based on JSON. The static checking capabilities of XML are just too difficult to deal with, both on the provider and consumer sides. Perhaps if there was better technology to make XML more usable, say in the form of F# type providers or something like Candle, the momentum might swing back to XML. And, of course, XML is meant for more than data interchange, that it is becoming less popular in this one niche doesn't mean that it is becoming less popular in its other niches.

By Sean McDirmid at Wed, 2011-08-24 00:46

CDuce

Have you seen CDuce?

By mlhaufe at Fri, 2011-08-19 09:11

Comparing CDuce and Candle

Thanks for suggesting CDuce.

I had definitely visited the CDuce website (possibly long ago), since I had bookmarked it. My impression was that it was something like RELAX NG.

I had a closer look at their documents again today, and I found it to be much more than that. I swear I'll spend more time on CDuce; there are many things Candle can learn from it.

Comparing CDuce and Candle (based on my limited study of CDuce), I think they have quite a few overlapping features, e.g. they both have the letters 'c', 'd' and 'e' in their names :-)

- Both have a RELAX NG-like type and pattern system. CDuce's is more powerful and supports features like capture variables.
- Both support dynamic XML construction.
- Both support XPath expressions. Candle's support is more complete.
- Both support query expressions. FLWOR in Candle is probably richer than select_from in CDuce.
- Both support functions as first-class data.

Some differences:

- CDuce's syntax is based on functional languages. Candle's syntax is more like that of a scripting language such as JavaScript. I think the latter will be friendlier to programmers from the OOP world.
- CDuce supports imperative assignment. I don't know whether that would disqualify CDuce as a functional language. Candle also supports imperative features, but enforces separation-of-side-effects.
- Candle supports updating XML data using CRUD statements. CDuce does not.
- Candle has a bigger scope, aiming to become a general-purpose app development language, whereas CDuce is more of a focused XML processing language.

By Henry Luo at Tue, 2011-08-23 06:23

Lambda the Ultimate
