Please delete this if it's not appropriate. I'm not sure I've seen anything similar here before, and it's very vague and waffly. It's posted on the chance that someone might point out some glaring error, or some related work (I'm now on holiday so will slowly be catching up on recent topics).
one of the projects at work is an archive of astronomy data. i've just been unofficially revising the preliminary design, prior to review.
the archive stores/provides access to data and, more importantly, allows new data to be generated. the architecture is a standard three tier system - presentation and resources sandwiching the "business logic".
the business logic assumes something that could be called a model, or an ontology, or a type system (as far as i can see). for example, two spectra may have different units (energy and wavelength, say). the business layer must "know" this, so that it can do the appropriate conversion.
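to make the units example concrete, here is a minimal java sketch of what "knowing" the units might look like. the names (SpectralUnit, SpectralAxis, convertTo) are invented for illustration and are not part of the archive design; the only physics assumed is E = hc/lambda, with hc taken as roughly 1239.841984 eV nm.

```java
/** Hypothetical units for the spectral axis; illustrative names only. */
enum SpectralUnit { WAVELENGTH_NM, ENERGY_EV }

/** A spectral axis that carries its unit with it, so conversion can be checked. */
final class SpectralAxis {
    // h*c expressed in eV*nm, so that E[eV] = HC_EV_NM / lambda[nm]
    static final double HC_EV_NM = 1239.841984;

    final SpectralUnit unit;
    final double[] values;

    SpectralAxis(SpectralUnit unit, double[] values) {
        this.unit = unit;
        this.values = values.clone();
    }

    /** Returns this axis expressed in the target unit. */
    SpectralAxis convertTo(SpectralUnit target) {
        if (target == unit) {
            return this;
        }
        // With only two units, both directions use the same reciprocal relation.
        double[] converted = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            converted[i] = HC_EV_NM / values[i];
        }
        return new SpectralAxis(target, converted);
    }

    public static void main(String[] args) {
        SpectralAxis nm = new SpectralAxis(SpectralUnit.WAVELENGTH_NM, new double[] {500.0});
        SpectralAxis ev = nm.convertTo(SpectralUnit.ENERGY_EV);
        System.out.println(ev.unit + " " + ev.values[0]);
    }
}
```

note that this hard-wires the conversion into the class. the sketch ignores the fact that the conversion reverses the ordering of the axis, and adding a third unit would mean editing the code, which is exactly the kind of model change discussed further down.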
a more complex example - important because it introduces higher order types - is a pipeline (a series of transformations on data). users must be able to configure pipeline processing and modify existing pipelines.
one of the ways in which the archive might expose data is via an rdf ontology. this would provide a standard way for third party programs to find calibration data associated with a particular image, for example.
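as a rough illustration of what exposing that association might look like, the sketch below writes a single statement in the n-triples syntax. everything specific in it is made up: the example.org uris, the hasCalibration property and the Triples helper are placeholders, not an existing astronomy vocabulary.

```java
/** Formats RDF statements as N-Triples lines (subject, predicate and object all URIs). */
final class Triples {
    static String triple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    public static void main(String[] args) {
        // Hypothetical identifiers; a real archive would define its own.
        System.out.println(triple(
            "http://example.org/archive/image/42",
            "http://example.org/ontology#hasCalibration",
            "http://example.org/archive/flatfield/7"));
    }
}
```

a third party program could then ask for every statement with a given image as subject and hasCalibration as predicate, without knowing anything about the archive's internal classes.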
as far as i know there is no standard rdf ontology for astronomy data (though i haven't looked). there is, however, work on defining a "data model", so any model could change significantly as the archive is developed.
the obvious question, then, is how to express this information in the system design. the current approach, as far as i can tell, is that it will be implicit in the structure of the code. some parts will be reflected directly in object interfaces (eg there's likely to be an "image" interface), but other parts will not (eg no "entity" or "can-be-converted-to" interfaces).
i'm talking about objects and interfaces above because the implementation is likely to be java. however, i'm more interested here in the "best" solution for this isolated problem.
an alternative approach would be to make the model completely explicit. this would make the generation of rdf simpler (but is this a presentation layer concern?). the question then becomes, i think, how to map the model to the type system.
(one reason why i don't think this is the current approach is that the business layer may use ejbs, which are fairly heavyweight (i believe) - the ejb infrastructure helps implement distributed processing and transactions).
at first, objects seem like a nice approach. however, there seems to be a mismatch between "things" and "relations" (objects and methods?). if the possible transformations on an entity are reflected by its methods, how do i describe a pipeline (a series of method calls)? i can see two solutions: either make processes objects too (which means that methods end up with vague generic names) or use a language in which methods are first-class (maybe multimethods in clos?).
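the first of those two solutions (processes as objects) might be sketched as follows. this assumes java 8 or later, where lambdas and method references soften the problem a little; Step, Pipeline and the two example transformations are hypothetical names. the generic "apply" call in run is the vague method name i mean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** A process reified as an object: a name plus a transformation on the data. */
final class Step {
    final String name;
    final UnaryOperator<double[]> transform;

    Step(String name, UnaryOperator<double[]> transform) {
        this.name = name;
        this.transform = transform;
    }
}

/** A pipeline is an ordered, editable list of steps. */
final class Pipeline {
    private final List<Step> steps = new ArrayList<>();

    Pipeline add(String name, UnaryOperator<double[]> transform) {
        steps.add(new Step(name, transform));
        return this;
    }

    /** Replaces the first step with the given name; returns false if there is none. */
    boolean replace(String name, UnaryOperator<double[]> transform) {
        for (int i = 0; i < steps.size(); i++) {
            if (steps.get(i).name.equals(name)) {
                steps.set(i, new Step(name, transform));
                return true;
            }
        }
        return false;
    }

    /** Applies every step in order to the input. */
    double[] run(double[] input) {
        double[] data = input;
        for (Step step : steps) {
            data = step.transform.apply(data); // every process is just "apply"
        }
        return data;
    }
}

final class PipelineDemo {
    // Stand-ins for real processing steps.
    static double[] subtractBias(double[] d) {
        return Arrays.stream(d).map(x -> x - 1.0).toArray();
    }

    static double[] scale(double[] d) {
        return Arrays.stream(d).map(x -> 2.0 * x).toArray();
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline()
            .add("bias", PipelineDemo::subtractBias)
            .add("scale", PipelineDemo::scale);
        System.out.println(Arrays.toString(p.run(new double[] {3.0, 5.0}))); // [4.0, 8.0]
    }
}
```

users can configure or modify a pipeline here because it is ordinary data (a list), but the type system no longer says anything about which steps make sense after which: every step is double[] to double[].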
a related question is how the language's type system should be used. should it directly reflect the model, or should it reflect the meta-information in the model (entity, relation, etc)? i think subtyping allows the two to be combined, but restricting the use of a type system to a meta-role might allow more dynamic code in some languages (it would be easier to create new types at runtime).
maybe there is a middle ground, with higher level types in the ontology (kinds of things?) being associated with objects, and lower level types associated with instances. this might allow the low-level details to be defined separately (eg in an xml document). "simple" changes to the ontology of the system would be reflected by changes only in the document.
finally, if the model/ontology were somehow explicit, how would that information be used in the business layer? the standard oo approach is to place processing in the appropriate class, but i'm worried that this will be more vulnerable to changes in the model. a data driven approach (where the model is the data) might be more robust? and how does the system do reasoning about the information? is this done (in an oo implementation) by the superclasses that represent meta-information?
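one possible reading of the data driven approach, as a sketch only: the model's types are plain strings, "can-be-converted-to" is stored as a relation, and the "reasoning" is a breadth-first search for a chain of conversions. ConversionModel and the type names used in main are invented for the example, and in practice the relation might be loaded from a document rather than built in code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** The model held as data: types are names, conversions are directed edges. */
final class ConversionModel {
    // type name -> types it can be converted to in one step
    private final Map<String, List<String>> edges = new HashMap<>();

    void addConversion(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /**
     * Breadth-first search over the relation. Returns a shortest chain of
     * types from 'from' to 'to' (inclusive), or an empty list if none exists.
     */
    List<String> findPath(String from, String to) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null); // the start type has no predecessor
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                List<String> path = new ArrayList<>();
                for (String t = to; t != null; t = cameFrom.get(t)) {
                    path.add(t);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : edges.getOrDefault(current, Collections.emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ConversionModel model = new ConversionModel();
        // Hypothetical type names.
        model.addConversion("raw-image", "calibrated-image");
        model.addConversion("calibrated-image", "spectrum-wavelength");
        model.addConversion("spectrum-wavelength", "spectrum-energy");
        System.out.println(model.findPath("raw-image", "spectrum-energy"));
    }
}
```

a change to the model is then a change to the data passed to addConversion, not to a class hierarchy. the cost is that the java compiler can no longer check any of it.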