Candle - a new script language that unifies XML technologies

Dear members of LtU,

I'm glad to announce the 0.9.1 beta release of Candle (Common ApplicatioN Development LanguagE). Candle is an open-source (MPL) scripting language that unifies the core features of many XML-related technologies (including XSLT, XQuery, XQuery Update, RELAX NG, BNF, XHTML, SVG and more). It can be used to develop command-line, desktop and Internet applications quickly.

Some of the advantages of Candle compared to XSLT and XQuery are:
  • Candle is a unified language instead of two highly-overlapping languages. Candle uses scripting syntax instead of the highly verbose markup syntax of XSLT.
  • Candle's markup language is strongly-typed even without schema, whereas XML is only weakly-typed without schema.
  • Candle includes a pattern language that cleanly unifies several pattern-related DSLs (including RegEx, BNF, RELAX NG, XQuery Sequence Type). It can easily match on sequences of items, nodes and characters.
  • Candle unifies functional and procedural programming. Through a mechanism I call separation-of-side-effects, Candle unifies the two worlds in a more orderly manner than any existing multi-paradigm programming language. In Candle, routines are divided into functions and methods. Functions are routines without side-effects and methods are routines with side-effects. The rule of separation-of-side-effects is that methods can call functions, but not vice versa. And expressions in Candle are always functional. In this way, pure functional islands are well-preserved in the vast sea of procedural code.
  • Candle is a general-purpose scripting language like Python, whereas XSLT and XQuery are just DSLs. Candle alone is sufficient to develop complex command-line, desktop and Internet applications, whereas XSLT and XQuery still need to integrate with other languages to develop a serious application.
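A minimal sketch of the separation-of-side-effects rule from the list above, written in Python rather than Candle. The decorator names function and method, the PurityError class and the runtime depth counter are all assumptions made for illustration; the post does not say how Candle enforces the rule, and it could equally be checked at compile time.

```python
import functools

_pure_depth = 0  # greater than 0 while a pure "function" is executing


class PurityError(RuntimeError):
    """Raised when a side-effecting method is called from a pure function."""


def function(f):
    """Mark f as a routine without side effects (hypothetical decorator)."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        global _pure_depth
        _pure_depth += 1
        try:
            return f(*args, **kwargs)
        finally:
            _pure_depth -= 1
    return wrapper


def method(f):
    """Mark f as a routine with side effects (hypothetical decorator)."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if _pure_depth > 0:
            raise PurityError(f"method {f.__name__!r} called from a function")
        return f(*args, **kwargs)
    return wrapper


log = []


@function
def double(x):
    return 2 * x


@method
def record(x):
    log.append(double(x))  # allowed: a method may call a function
    return len(log)


@function
def bad(x):
    return record(x)  # rejected: a function may not call a method
```

Calling record(3) appends to log and succeeds, while calling bad(1) raises PurityError before any side effect happens.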

For more information, you can visit the website (http://www.candlescript.org/) or the SourceForge project.

You can also read the blog articles Why I Invented Candle (I), Why I Invented Candle (II) and Towards the Grand Unification to understand the design rationale behind Candle.

Your feedback on Candle is highly appreciated.

Henry

--------------------------------------------------------------------
Candle App Platform - A unified platform for desktop and Internet apps.

Curl

First, I applaud you on providing answers to our traditional "Why yet another language?" question.

You say:

the single and most important reason for inventing Candle is to create a new general-purpose programming language that treats markup data as built-in data type

I'm a bit curious about how much research you've done on other languages with the same goal, such as the Curl language.

Scala?

Scala?

Xtatic? (was pretty

Xtatic? (was pretty interesting research in same range of ideas, imho)

XQuery?

If we're pointing out languages that meet the same goals, I think the obvious one is XQuery itself. Unlike the poster's characterization, XQuery is not "just a DSL", and does not have to be embedded in a host language - many people are already using it to reduce complexity in web-apps by cutting out the middle tier.

XQuery is not general enough

Yes. I agree that XQuery is much more than most DSLs. But to me, it is closer to PL/SQL than to a general-purpose language like Python. The major missing things are multi-threading/async support, general I/O support, OOP support, and event-driven GUI support.

relation to existing work, etc.

First an aside that I think jpcs asks a good question ("XQuery?").

Other questions / suggestions:

The "put" / "get" construct seems to eliminate the referential transparency of pure functions. Is that the case? (This is a general language question about what people expect from referential transparency. "put/get" is essentially a dynamic binding construct (which the author calls a "tunneled variable" concept). "put" occurs in pure functions and dynamically binds some name, then executes an enclosed body. Any evaluation associated with that body, even if it is not lexically enclosed in the scope of the "put", can retrieve that binding with "get". "put" can be nested to shadow these dynamic variables.)

I ask because I'm not clear how deeply runs the motivation for the procedural / functional divide in Candle. The materials I found on the web site seemed to say that the main motivation for including pure functions as a separate thing was to improve code clarity and maintainability. I would contrast that with the motivation in something like XSLT. I *think* I'm safe saying that the functional nature of XSLT is additionally motivated by the goals of easing automatic analysis of many programs and, more importantly, of enabling highly optimized implementations. I'm wondering how much of the design of Candle takes that into account and where "put" and "get" fit in.

Is there, somewhere, some concise description of the store semantics? Does Candle have the notion of document collections (ala XQuery)? A transaction semantics that allows concurrent processes to communicate through that store? An exception mechanism?

How are the impure features of Candle different from XQuery Update?

In the proprietary software world to a greater extent but also to some extent in the free software world, there is investment in building XPath / XSLT / XQuery engines backed by transactional persistent stores that support concurrency, replication, etc. There seems to be a lot of pretty intense work in these areas (which would be expected given the analogous history of SQL). Is it a goal, anti-goal, or otherwise of Candle to be a language which can likely be "compiled down" to the W3C languages and usefully run on those high performance engines?

Are there known barriers to adding first class functions to Candle?

Are tail calls in Candle safe-for-space (including tail calls in "put" statements)?

Do Candle's flow of control mechanisms provide a direct, complete model for the flow of control structure of HTTP (and should they)? For example, is there a call mechanism with structured parameter passing based on a method name, header data, and manifestly typed (perhaps multi-part and lazily consumed) body? A way to reify asynchronous requests? etc.?

Clarifying some features on Candle

Thanks Thomas, for your comments and questions. They are all very solid questions.

The "put"/"get" feature in Candle is referentially transparent if you consider the global variables in XSLT referentially transparent. The functions in Candle can only shadow the values in this named stack, not modify them. XSLT also supports tunneled variables. Candle is slightly more dynamic by allowing the name to be dynamic. Another way to understand "put"/"get" is to imagine that ALL functions in Candle or XQuery are required to pass one argument called $named-value-stack around; the "put" statement then just wraps this $named-value-stack with one more name-value pair before passing it to the functions in the statement block. And "get" is just a function that unwinds the $named-value-stack and returns the first value with a matching name.
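The $named-value-stack reading of "put"/"get" can be sketched in Python as below. This is only an illustration of the explanation in this post, not Candle code: the tuple representation of the stack and the functions greeting and page are hypothetical.

```python
# The stack is an immutable linked list: None, or (name, value, rest).


def put(stack, name, value):
    """Return a new stack with one more name-value pair; the old one is untouched."""
    return (name, value, stack)


def get(stack, name):
    """Unwind the stack and return the first value bound to name."""
    while stack is not None:
        bound_name, value, stack = stack
        if bound_name == name:
            return value
    raise KeyError(name)


def greeting(stack, who):
    # Reads "lang" from the stack it was handed, like a "get".
    prefix = "Bonjour " if get(stack, "lang") == "fr" else "Hello "
    return prefix + who


def page(stack, who):
    # Rough equivalent of a "put" block that binds lang to "fr" around one call.
    inner = put(stack, "lang", "fr")
    return greeting(inner, who), greeting(stack, who)


outer = put(None, "lang", "en")
```

Here page(outer, "Ann") sees "fr" only inside the wrapped call; the outer binding is shadowed, never modified.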

The document store or database mechanism is missing in the current implementation of Candle. It's not easy to implement one, so there won't be transactional support in Candle in the next few beta releases, but it should be in the final formal release.

Candle's going to implement a "directory" concept in the next beta release, so that you can query and update a file system just like you query and update XML. I think that's more useful than the abstract "collection" concept in XQuery.

The update mechanism in Candle is quite different from XQuery Update. It's more like PL/SQL. I'll write a separate reference doc on that.

Routines shall be first-class in the next beta release of Candle.

Candle has not implemented tail-recursion removal. But that's just an optimization. "put" does not affect tail-recursion removal. You can think of it as one extra parameter passed along implicitly by the system.

Candle already supports making HTTP requests. I'll expand the current ref doc and write a tutorial on that.

There's no asynchronous processing support in Candle at the moment. There might be some preliminary threading support in the next beta release.

And the next beta release should be in about 2 to 3 months' time.

defining "referential transparency"

I think that the allegedly pure functions in Candle are not actually pure. I think the following is a mistake (and I might be wrong):

The "put"/"get" feature in Candle is referentially transparent if you consider the global variables in XSLT referentially transparent. The functions in Candle can only shadow the values in this named stack, not modify them. XSLT also supports tunneled variables. Candle is slightly more dynamic by allowing the name to be dynamic. [emphasis added]

Here is a proposed test of referential transparency:

Assume that there are constructs which can determine the binding of a name (assignments) and, on the other hand, there are constructs which produce the value of a name (references).

And, let's consider a class of programs that contain no "dead code" -- every apparent control flow path in these programs can be taken, depending on input. We don't need to be able to enumerate the set of no-dead-code programs, which is good, because of course we can't. But we know that every program is either in this class or it isn't.

We can ask: given the text of a program, considering the no-dead-code programs, can we compute for each reference to a name, the exact set of assignments to that name which can determine the value at the time of the reference?

More specifically, I need some algorithm A, which given any program, will compute for each reference a list of assignments that might determine the value when that reference is evaluated. If a program happens to be of the no-dead-code variety, A must provably produce exactly the list of assignments that affect each reference.

If that computation can always be made in some amount of time that is bounded by some function of the textual length of the program, then the language is referentially transparent.

On the other hand, if that computation can't (in general) be made in any less time than it takes to actually run the program itself -- then the language is not referentially transparent.

By that test, XSLT with its version of tunneled variables is referentially transparent. It is a functional language (as designed).

By that same test, the allegedly functional subset of Candle is not referentially transparent (and so I would say misses its design goal of being a truly functional subset).

"put"/"get" are just dynamically scoped variables

The last paragraph of this article states:

A more subtle example is that of a function that uses a global variable (or a dynamically scoped variable, or a lexical closure) to help it compute its results. Since this variable is not passed as a parameter but can be altered, the results of subsequent calls to the function can differ even if the parameters are identical. (In pure functional programming, destructive assignment is not allowed; thus a function that uses global (or dynamically scoped[citation needed]) variables is still referentially transparent, since these variables cannot change.)

The put/get in Candle is just the dynamically scoped variable. Its value is not destructively assigned but only shadowed during the evaluation, thus it is referentially transparent.

Here's another way to understand it or test it. We can transform Candle functions into a set of equivalent functions without the put/get statements using the approach I mentioned in the previous article. If all the resulting functions are referentially transparent, then we can say the original functions are referentially transparent. (That's my invented way of proving referential transparency. It itself needs to be proved. But my gut feeling tells me it should work. :-)

Referential Transparency

The article you quote is wrong with respect to 'dynamically scoped' variables. That happens.

Ability to 'transform' into a referentially transparent language is irrelevant. Consider: one way of interpreting C code is that it does not destructively alter anything... rather, it just creates a new environment and passes it to the next statement. This 'environment passing style' interpretation works until we add concurrency. But we would still not call C 'referentially transparent'.

In Haskell, we get something similar with a State or ST monad. The overall 'runState' or 'runST' computation can be said to be pure, deterministic, referentially transparent - but within the monad, the abstraction developers have is of changing state. And statements cannot be said to be referentially transparent since they must be concerned about the implicit referential context of each statement.

RT must be understood syntactically - at the level of statements and expressions. It really does make a difference whether the environment - the referential context - is 'explicit' vs. 'implicit'.
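To make the explicit versus implicit distinction concrete, here is a small Python sketch. It uses the standard contextvars module as a stand-in for a dynamically scoped variable; the names indent, render_implicit, render_explicit and with_indent are made up for the example and have nothing to do with Candle's actual syntax.

```python
import contextvars

# Implicit referential context: a dynamically scoped variable.
indent = contextvars.ContextVar("indent", default=0)


def render_implicit(text):
    # The result depends on something that is not among the arguments.
    return " " * indent.get() + text


def render_explicit(text, indent_width):
    # The same computation with the context passed explicitly.
    return " " * indent_width + text


def with_indent(width, thunk):
    """Run thunk with indent shadowed to width, then restore the old value."""
    token = indent.set(width)
    try:
        return thunk()
    finally:
        indent.reset(token)


a = render_implicit("x")                          # "x"
b = with_indent(4, lambda: render_implicit("x"))  # "    x"
```

The expression render_implicit("x") appears twice with identical arguments yet yields two different values, so it cannot be replaced by its value wherever it occurs. render_explicit("x", 4) has no such problem because its whole context is visible in the call.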

Candle is referentially transparent on the overall

Maybe I can agree that functions in Candle which call get() are not referentially transparent judging by the inputs they take.

But overall Candle is always referentially transparent, I mean at the level of the top-most function main(). The main() function depends only on its inputs, and when the program starts, the named stack is always empty.

And if someone wants to analyze functions in Candle in a pure referentially transparent way, they can always transform the functions in Candle using the approach I mentioned above.

XSLT functions are also not referentially transparent

And XSLT functions using tunnel parameters are also not referentially transparent judging by the same criteria.

Say template A pushes one tunnel parameter, then calls templates B, C, D, E ... X, Y and finally template Z, which pops the tunnel parameter. Then in this call chain, templates B, C, D, E ... X, Y are not referentially transparent, because they depend on the pushed tunnel parameter, not just on their own parameters.

Below is how the XSLT 2.0 spec explains its tunnel parameters:

Note:
Tunnel parameters are conceptually similar to dynamically-scoped variables in some functional programming languages.

Candle is just one of those functional programming languages.

XSLT and referential transparency

You said something earlier that is the crux of why XSLT is referentially transparent and Candle's allegedly functional subset is not:

Candle is slightly more dynamic by allowing the name [of a tunneled variable being assigned or referenced] to be dynamic.

I understand you to mean that "put" and "get" can be used to bind or reference a dynamically computed name. The name used can also occur statically in the program.

Here's a consequence of letting names be dynamic like that:

Given a reference to a tunneled variable in XSLT, by examining the program, I can always list a set of binding locations in the source code such that: my set includes all binding locations that can impact that reference; and my set contains no other binding locations except, maybe, some "false positives" which are related only by dead code paths.

In Candle, because the name of a tunneled variable can be dynamic, the same computation is impossible.

In that sense, the meaning of names in XSLT is statically apparent in a way that the meaning of names in Candle is not.

The impossibility of that computation about bindings and references is what proves that Candle's allegedly functional subset is not referentially transparent. Probably, if you did not allow the names to put/get to be dynamic, the allegedly functional subset of Candle would, then, be functional. (But see further below.)

In other words, the similar operational model of a binding stack doesn't make or break referential transparency -- that's not relevant. What matters is whether you can statically analyze the relation between binding sites and reference sites with precision (modulo dead code).
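A rough Python sketch of why dynamic names hurt this kind of analysis. It scans a toy program for calls to hypothetical put(name, value, body) and get(name) routines using the standard ast module, and matches binding sites to references by name only. It ignores scoping and control flow entirely, so it is far cruder than the analysis described above; the point is just what happens to the answer once a name is computed rather than literal.

```python
import ast

# Toy program text; put and get are hypothetical routines, never executed here.
SOURCE = '''
put("lang", "fr", lambda: get("lang"))
put("user", "ann", lambda: get("lang"))
put(prefix + "id", 7, lambda: get(key))
'''


def name_arg(call):
    """Return the literal name in the first argument, or None if it is computed."""
    arg = call.args[0]
    if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
        return arg.value
    return None


def analyse(source):
    """Map the line of each get to the lines of the puts that might feed it."""
    puts, gets = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id == "put":
                puts.append((node.lineno, name_arg(node)))
            elif node.func.id == "get":
                gets.append((node.lineno, name_arg(node)))
    result = {}
    for line, name in gets:
        if name is None:
            # Dynamic reference: any put at all might supply the value.
            result[line] = sorted(l for l, _ in puts)
        else:
            # Static reference: puts of the same name, plus every dynamic put.
            result[line] = sorted(l for l, n in puts if n in (name, None))
    return result
```

With only literal names, each get("lang") would map to exactly the put("lang", ...) sites. As soon as one put or get computes its name, that site has to be included as a possible match everywhere, and the precision is gone.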

Suppose that Candle were changed to only allow static names to put/get. Now there is a chance that the allegedly functional subset is truly functional. I still have another concern:

How do you treat the function "now()", to name one example?

In XSLT, a program can call a putative time function like "now()" many times. Each time it returns the current time. The catch is that in XSLT now() is a constant function that always returns exactly the same value each time it is called in a given run of the program.

If that were not true, then XSLT would no longer be referentially transparent.

So, how does "now()" behave in Candle? I ask because I think for the non-functional parts of Candle, you'd want the value of now() to change over time ... but for the allegedly functional parts, you want it to remain constant. How does Candle reconcile that tension?

Candle shall make the name of put static in next release

Thomas, thanks for your detailed analysis.

I'm thinking along the same lines: in the next beta release of Candle, I'll make the names in put statements static. The syntax of the put statement shall be made similar to the let statement, put $var as type = var { ... }, so that put can weave with FLWOR statements.

The treatment of now() and random() in the current beta release of Candle is a bit incomplete. In the next release, they'll be split into two versions:

function now() !! same across the entire query evaluation
method now() !! changes on each evaluation
function random(seed, cnt) !! based on the seed, returns a series of random numbers
method random()
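The planned split can be pictured with the following Python sketch. The Evaluation class and the four routine names are assumptions for illustration only; they mirror the four declarations above but say nothing about how Candle will implement them.

```python
import random as _random
import time


class Evaluation:
    """One run of a query; the 'function' version of now is fixed for its lifetime."""

    def __init__(self):
        self._started = time.time()  # captured once, when evaluation begins

    def now_function(self):
        # Same value on every call within this evaluation.
        return self._started

    @staticmethod
    def now_method():
        # Reads the clock each time, so results may differ between calls.
        return time.time()


def random_function(seed, cnt):
    """Return cnt pseudo-random numbers determined entirely by seed."""
    rng = _random.Random(seed)
    return [rng.random() for _ in range(cnt)]


def random_method():
    # Draws from shared, mutable generator state: a side effect.
    return _random.random()
```

The function versions return the same result for the same arguments within one evaluation, which is what the functional subset needs; the method versions consult the clock or mutate generator state on each call.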

Candle vs. other XML-enabled languages

Firstly, thanks everyone for your comments. I think the questions have gone beyond what I can answer in a single post.

In this post I'll just share my observations on the XML support in some languages that I'm aware of. I summarize them in the feature comparison table below. It's illustrative rather than exhaustive or authoritative (if any fact below is wrong, you are welcome to point it out):
Features \ Languages                      Curl  Scala      JavaScript  .Net       JavaFX  XQuery     Candle
Dynamic XML Construction/Composition [a]  Yes   Yes        No          No         Yes     Yes        Yes
Path Expression                           No?   Basic      Basic [1]   No         No      Good [2]   Good [3]
Query Support [b]                         No?   Basic      No          Basic [4]  No      Good       Good [5]
Template Transformation                   No?   No         No          No         No      No         Good [6]
Node Pattern and Schema Matching [c]      No?   Basic [7]  No          No         No      Basic [8]  Good [9]
Node Update [d]                           No    No         No          No         No      Good [10]  Good [11]
Grammar Support [e]                       No    No         No          No         No      No         Good [12]

Feature notes:
  • [a] directly in the language, not through DOM APIs
  • [b] set-oriented, FLWOR, etc
  • [c] directly in the language, not relying on external schema languages
  • [d] using CRUD statements instead of DOM APIs
  • [e] the ability to parse any abstract text based on a general BNF grammar and return the resulting syntax tree (AST); then all query/transform features can be applied to the AST. I know this can be done in all the languages using 3rd party libraries, but I'm talking about built-in support here

Cell notes:
  • [1] E4X is basic compared to XPath; jQuery has an interesting feature that uses CSS selectors to query HTML and XML, and this approach is formalized by the W3C Selectors API
  • [2] XPath
  • [3] XPath-based
  • [4] LINQ
  • [5] XQuery-based
  • [6] XSLT-based
  • [7] through case classes
  • [8] Sequence type
  • [9] RELAX NG based
  • [10] XQuery Update
  • [11] XQuery Update based
  • [12] BNF based

To me, just allowing XML to be directly constructed in a language does not count as real built-in XML support. That's just "syntax sugar" to the DOM API. Real built-in support means supporting path, query, transform, pattern matching and update on XML data. And from the comparison, you can see that Candle offers the most complete built-in processing support of XML.

I'm pretty sure Scala

I'm pretty sure Scala supports queries (through comprehensions) and node pattern matching (via...pattern matching).

But isn't XML dead/dying anyways as a web interchange format? These days, JSON seems increasingly preferred given its simplicity (untyped...) and integration with JavaScript. JSON benefits from language support also (like through C#'s dynamic typing).

JSON and XML are just hierarchical data

I've updated my table based on your input on Scala.

XML, JSON or even BSON are just different syntaxes for hierarchical data. The above discussions are really about hierarchical data processing. They apply to both XML and JSON.

I disagree, at least on

I disagree, at least on JSON. JSON is all about communicating with JavaScript structures, i.e., plain old simple associative arrays. With JSON, all the craziness of XML goes away and I don't really need any special language. But you also have new problems, like data-dependent union types, things not easily expressed in a static type system. You must really embrace dynamic when using JSON.

Now, JSON is not useful for real HTML-like doc markup, and in that case, you do want something like XML. But then I still don't see much point to something like Candle: outputting markup is easy enough with literals, while the read/parsing problems aren't very solvable regardless.

But you also have new

But you also have new problems, like data-dependent union types, things not easily expressed in a static type system. You must really embrace dynamic when using JSON.

Could you elaborate a bit? While I assume the description of any general predicate over a data format may indeed require dependent types (and I remember seeing dependent types when glancing at the paper "The Next 700 Data Description Languages"), I had the impression that most actual dependencies were of the kind "the union case is determined by an accompanying tag value", which are very well modeled by standard algebraic datatypes. Of course, things like indefinite nesting or selector lookup may require a more elaborate type system like {C,X}Duce regular types, but I'm not aware of any practically relevant case that would give a hard time to the existing type technology.
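As an illustration of the "union case determined by an accompanying tag value" pattern, here is a short Python sketch. The payload, the "type" tag key and the Photo and Checkin classes are invented for the example and are not taken from any real API.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Photo:
    url: str


@dataclass
class Checkin:
    venue: str
    lat: float
    lon: float


# The union type: an item is one case or the other.
Item = Union[Photo, Checkin]


def decode(obj: dict) -> Item:
    """Pick the union case from the accompanying "type" tag."""
    tag = obj["type"]
    if tag == "photo":
        return Photo(url=obj["url"])
    if tag == "checkin":
        return Checkin(venue=obj["venue"], lat=obj["lat"], lon=obj["lon"])
    raise ValueError(f"unknown tag: {tag!r}")


# Hypothetical payload, made up for this sketch.
payload = (
    '[{"type": "photo", "url": "http://example.org/a.jpg"},'
    ' {"type": "checkin", "venue": "Cafe", "lat": 1.5, "lon": 2.5}]'
)
items = [decode(o) for o in json.loads(payload)]
```

This works as long as the tag travels with the data. It does not cover the harder situation where the deciding value is not present in the payload at all.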

It's hard to say how people

It's hard to say how people actually use JSON in practice, since I've observed it's very diverse (I've dealt with Facebook, Foursquare, Yelp, + lots of others). Sometimes they use tags, sometimes the element is just there or not depending on some hidden value that you can't see at all. I had this discussion with Don Syme about type providers (typed F# wrappers around web data); it just seemed that the approach faced lots of challenges in handling JSON.

It wouldn't be a big deal if JSON wasn't becoming the standard for receiving/sending web data. But XML is becoming increasingly rare in public APIs. Some sites offer both, but many are just bypassing XML altogether.

JSON vs. XML

XML is getting hammered on reddit today, rightly so for anyone who has tried to work with it. The niche where XML works well is much smaller than was thought two years ago.

re JSON vs. XML

The niche where XML works well is much smaller than was thought two years ago.

citation?

Sorry, I was referring to

Sorry, I was referring to data in the original source article.

XML's "works well" niche

So, what that data suggests is that fewer (percentage-wise) people are choosing XML, not that what they are doing instead works well in comparison.

Little Bits of XML Repeating

I guess we'll see how many XML technologies are reinvented with the JSON brand. ;)

hypothesis about XML's ill health

To a first approximation, if you are using free software components rather than paying to license proprietary ones, the odds are very good that you are stuck using James Clark's parsers. If you want XSLT you are probably stuck using xsltproc. For XQuery, you are stuck with Berkeley DB/XML, nowadays provided by Oracle.

None of that software is especially terrible and it's all stronger than all the other free software alternatives I've seen. (If anyone wants to prove me wrong and point out better free software tools, bless them!)

None of that software is especially terrible but neither is any of it especially great. There are missing pieces, awkward interfaces, performance problems, and so on. I've really enjoyed using all of those components and getting things done with them - at times - but I've always been constantly reminded by frustrations how they are really not nearly as robust and efficient as the problem domain really demands. None of the free software XML foundation has really been lavished with as much serious and sustained development effort as you'd wish.

So, now, what happens when I want to build an application? I have a choice. I can use huge and slightly tricky to understand XML libraries but, because of the limits of the free software implementations, my application will have some flakiness, inefficiencies, and/or functionality gaps. Or, I can use the lower tech stuff like JSON and because of the *design gaps* in that lower tech stuff, my application will have some flakiness, inefficiencies, and/or functionality gaps. (Or, I can resolve to go off and, "step 0", build better XML tools but then you'll likely never hear from me again after my boss fires me for turning a 6 month project into a 6 year project.)

For XML to really take its rightful place, it's going to be necessary to substantially raise the robustness, efficiency, and completeness of the foundation of free software XML components. By observation of the high quality proprietary software components I infer that this is going to take a lot of well organized labor and a sophisticated leadership -- exactly the kind of thing the free software world is historically lousy at doing.

It's kind of a bummer because if you could put a dollar figure on the development costs needed I think it would be nearly peanuts compared to typical large corporate budgets. Enough that individuals can't do it and typical middle managers can't risk it ... but peanuts in the grand scheme of things that would help juice the software economy.

Design Gaps

My impressions of XML haven't been so positive. OTOH, most of my complaints would also apply to JSON.

I would like to see models that focus more on sets, streams, updates (patches), laziness, map-reduce, and data fusion. Honestly, I'd think we'd be better off pushing harder in the direction of relational programming with live queries (i.e. temporal inserts/deletes come across the line). (XPath and XSLT queries and their like seem sort of hacked onto XML - they're good, but maybe we could do better if the underlying data model were designed for it.)

And I also like how David Ryan's Argot simply negotiates the shared types at the beginning of every file or stream, in a compact way. Structured editing wouldn't require XML's obtuse format if we have such types.

Making XML technologies more efficient and robust would maybe help for a lot of projects. Maybe. But I still feel the technology itself has a lot of design gaps. What is its 'rightful place'?

In my own experience, its

In my own experience, it's that the most important APIs are either dual XML/JSON, or JSON-only. So you still have a bunch of mature APIs that were designed with XML and are no longer evolving (maybe since they aren't very popular, and if it's not broken don't fix it), and you have the popular APIs that developers are using en masse that are more volatile and more likely to be JSON.

A new web API today is more likely to be based on JSON. The static checking capabilities of XML are just too difficult to deal with, both on the provider and consumer sides. Perhaps if there were better technology to make XML more usable, say in the form of F# type providers or something like Candle, the momentum might swing back to XML. And, of course, XML is meant for more than data interchange; that it is becoming less popular in this one niche doesn't mean that it is becoming less popular in its other niches.

CDuce

Have you seen CDuce?

Comparing CDuce and Candle

Thanks for suggesting CDuce.

I had definitely visited the CDuce website (maybe long ago), as I had it bookmarked. My impression was that it was something like RELAX NG.

I had a closer look at their documents again today. And I found it to be much more than that. And I swear I'll spend more time on CDuce. There'll be many things Candle can learn from it.

Comparing CDuce and Candle (based on my limited study of CDuce), I think they have quite a few overlapping features, e.g. they both have the letters 'c', 'd' and 'e' in the name :-)

  • They both have a RELAX NG kind of type and pattern system. CDuce's is more powerful and supports features like capture variables.
  • They both support dynamic XML construction.
  • They both support XPath expressions. Candle's support is more complete.
  • They both support query expressions. FLWOR in Candle is probably richer than select_from in CDuce.
  • They both support functions as first-class data.

Some differences:

  • CDuce's syntax is based on functional languages. Candle's syntax is more like that of a scripting language such as JavaScript. I think the latter will be friendlier to programmers from the OOP world.
  • CDuce supports imperative assignment. I don't know if that would void CDuce as a functional language. Candle also supports imperative features, but enforces separation-of-side-effects.
  • Candle supports updating of XML data using CRUD statements. CDuce does not.
  • Candle has a bigger scope, aiming to become a general-purpose app development language. CDuce is more of a focused XML processing language.