Schemas for JSON?

I'm seeing a big gap in the world right now regarding schemas for JSON.

In particular, it appears to me that a large fraction of the world (well, okay, of the people who are encoding data in language-neutral ways) has the wrong impression about JSON, believing that its fundamental difference from XML is its simplicity, when in fact (I claim) JSON's raison d'être is that it didn't start life as a markup language.

If you believe me, then schemas for JSON values are in fact extremely valuable. I see one proposal on the web, but my brief reading of it suggests that we can perhaps do better. Is there a bunch of work on this that I'm just not aware of?

Apologies in advance for offending anyone, or for failing to be aware of obvious answers to my question.

Thanks!

Protocol buffers offer

Protocol buffers offer schemas that seem like they could apply reasonably well to JSON.

Thrift, Avro, Kwalify

Apache Thrift is a data serialization tool that uses a schema language (originally from Facebook; similar to Google's Protocol Buffers). Though the "primary" serialization formats are binary, some of the language support libraries can also read/write JSON.

Apache Avro is a similar data serialization tool and supports JSON values. The schema definition language is itself JSON, which is kind of verbose, but they also now have a less verbose custom syntax for defining a schema.

Kwalify is a YAML schema definition tool. JSON is a subset of YAML, so you could probably use Kwalify on JSON.

I'll take this opportunity to plug my own tool. The main difference from other tools is that the data model is based on algebraic data types. The text format is JSON-like but not JSON. One could write a parser that accepts JSON (like Thrift has).

cks

Your tool looks interesting - I have built something similar. However, there are no contact details on the web page.

Contact

I don't run into many people who are interested in this kind of thing, so I'd love to get your take on it. E-mail: [my-first-name]@cakoose.com.

the slope

If you add schemas, don't you want to be able to automate simple ways of composing schemas and instances of those schemas (nesting, products, sums, and so forth)?

That being the case, don't you want to also have namespaces?

To handle composition and decomposition of nested values automatically, don't you need an attribute/child distinction?

Aside from syntax and a (fixable) restriction on the type of attribute values, if you go in this direction, don't you eventually simply re-invent XML?

Conversely, if you program in a style that uses only a default namespace and shuns attributes and uses a short-hand syntax for XML, haven't you got JSON?

So, why not simply pick an XML schema system, find a syntax for its restriction to JSON values, double-check that you get a useful semantics for that restricted schema dialect, and be done?
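To make the composition question at the top of this comment concrete, here is a minimal sketch of nesting, products, and sums over JSON values. It is not taken from any tool mentioned in this thread: the names `prim`, `product`, `sum_of`, and `list_of` are hypothetical, and a schema is simply assumed to be a function from a decoded JSON value to a boolean.

```python
from typing import Any, Callable, Dict

# Assumption for this sketch: a schema is a predicate on a decoded JSON value.
Schema = Callable[[Any], bool]

def prim(t: type) -> Schema:
    """Accept values whose exact Python type is t (so True is not an int)."""
    return lambda v: type(v) is t

def product(fields: Dict[str, Schema]) -> Schema:
    """Accept objects with exactly the named fields, each matching its schema."""
    def check(v: Any) -> bool:
        return (isinstance(v, dict)
                and set(v) == set(fields)
                and all(s(v[k]) for k, s in fields.items()))
    return check

def sum_of(*alternatives: Schema) -> Schema:
    """Accept values that match at least one of the alternatives."""
    return lambda v: any(s(v) for s in alternatives)

def list_of(item: Schema) -> Schema:
    """Nesting: accept arrays whose elements all match item."""
    return lambda v: isinstance(v, list) and all(item(x) for x in v)

# Schemas compose like ordinary values: point is reused inside shape.
point = product({"x": prim(float), "y": prim(float)})
shape = sum_of(
    product({"circle": product({"center": point, "radius": prim(float)})}),
    product({"polygon": list_of(point)}),
)
```

Here the sum is encoded as a single-key object per alternative, which is only one of several possible conventions. Nothing in this sketch addresses namespaces or the attribute/child distinction.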

More on the attribute/child distinction?

Could you elaborate on why you need the attribute/child distinction?

Simple

Elements contain their children, whereas they are described by their attributes.

re: Simple

Yes. A concrete example of how this applies to composition and decomposition: without knowing the type of either of two nodes, A and B, I can replace all of the children of A with the single child B, knowing that I should blindly preserve any attributes A might have (and leave B wholly unchanged apart from its parent).
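As a rough sketch of that operation, assuming a generic node shape with a name, attributes, and children (the `Node` class and `replace_children` function are hypothetical, made up for this illustration):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    """A generic XML-like node: attributes describe it, children are contained in it."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

def replace_children(a: Node, b: Node) -> Node:
    """Return a copy of a whose only child is b.

    No schema for a or b is consulted: a's attributes are copied blindly,
    and b is attached without being modified.
    """
    return Node(a.name, dict(a.attributes), [b])
```

The function never inspects `a.name` or `b.name`, which is the point being made: the attribute/child split alone tells it what to keep and what to replace.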

Even more concrete?

I don't understand why such an operation is important enough to require a child/attribute distinction at the very core of your data model.

Perhaps an even more concrete example might help me? For example, when would you want to do something like that? Why can't it be taken care of by requiring that A have a field called "content" that accepts child nodes?
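For comparison, a minimal sketch of the "content" field alternative on plain JSON objects. It assumes a convention that every node keeps its children in a list under the key "content"; the function name `replace_content` is hypothetical.

```python
from typing import Any, Dict

def replace_content(a: Dict[str, Any], b: Any) -> Dict[str, Any]:
    """Copy a, set its "content" field to [b], and keep every other field as-is."""
    result = dict(a)          # shallow copy; the original object is not modified
    result["content"] = [b]
    return result
```

This works without knowing a's type only if every producer and consumer agrees on the "content" convention, since the data model itself does not mark that key as special.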

Yep.

The short version: yes, that's precisely what I want. I don't believe anyone's done it (though it sounds like I should take a look at Kwalify). Moreover, there *are* concrete proposals on the table that appear ill-specified, and I hope that these don't gain too much traction.

(aside)

re: "Moreover, there *are* concrete proposals on the table that appear ill-specified, and I hope that these don't gain too much traction."

Don't ya just hate when that kind of _____ happens?

:-)

(Rhetorical question. Of course you do. It's just a nice way you put it there, just then.)