How to learn about formal top-down approach to software architecture?

Hello everyone, I'm a software developer interested in information retrieval. Currently I'm working on my 3rd search engine project and am VERY frustrated about the amount of boilerplate code that is written again and again, with the same bugs, etc.

Basic search engine is a very simple beast that could be described in a formal language consisting of two "layers":

1. "Layer of primitives" (or axioms, kernel language - don't know how to name them). They consist of several sets (as a set of resources - files, websites), relations on sets (as 'site A links to site B') and simple operations as 'open stream to resource A', 'read record from stream', 'merge N streams', 'index set of records by field F', etc. Also, there is a lot of data conversion, as 'save stream in YAML format', 'load stream from XML format', etc.

2. 'Application layer' - several very high-level operations that form a search engine lifecycle, as 'harvest new resources', 'crawl harvested resources', 'merge crawled resources to the database', 'index crawled resources', 'merge indexes', etc. Every one of this high-level operations could be expressed in the terms of "primitives" from 1.

Such a high-level representation could be easily tested, maybe even proved formally, and implemented (or code-generated) in the programming language of choice.

So, the question: does anybody design systems in this way - formally, rigorously ( maybe even at the level of algebra/group theory), in the strict top-down approach? What can I read to learn about ?

This question was also posted here:

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Somebody on Stack Overflow

Somebody on Stack Overflow suggested Z.

I would counter by suggesting a language based on Z, but targeted towards model checking: MIT's Alloy Analyzer project.

You should also search for things like "model checking search engines", or maybe "4th generation language search engine", "model checking information retrieval", etc. Just whip up a bunch of buzzwords. "Architectural analysis", "architecture analysis", "formal analysis" + domain-specific vocabulary might help. This could get you papers on what researchers have done in this area, apart from software engineers.

If you want MORE control in developing a model checker, then you could use a metalanguage like Maude that can be used to define a DSL for describing the information retrieval domain. Or you can use Maude plus a Maude-based model checking package.

A fun take on model-driven

A fun take on model-driven development with Alloy is to introduce refinement into more traditional code: Danny Yoo's Alchemy.

(... I'm still looking out for approaches that make it easy to transition up and down between abstraction layers, as is somewhat possible in Armando Solar-Lezama's sketching work).


Very cool, thanks for sharing both links! Never seen either of these before.

Formal != top-down

Just to be clear, taking a formal approach does not necessarily mean that you must use strict top-down development. That said, there are certainly a number of organizations that use these techniques, and have reported on what they've done. Jim Woodcock and others recently published a great survey on formal methods in practice, which could be a good place to start (plenty of pointers to other material).

As for specific formalisms, there are a wide variety. Here are a few places that might be useful starting points:

  • Z-Bo has already mentioned Alloy. Daniel Jackson's book on Alloy, Software Abstractions, is a good intro to using Alloy (it also includes a case study comapring Alloy with several other formalisms).
  • Jonathan Jacky's The Way of Z is a practical introduction to Z that includes examples of going all the way from Z specs to code.
  • Bill Roscoe's The Theory and Practice of Concurrency is a nice introduction to process algebra and model-checking, if you're looking for a formalism that focuses more on interactions than on relations. Tools for working with the CSP process algebra can be found here.
  • There are a number of books on the B-method, as well as several open source tools.
  • On the algebraic specification front, Algebraic System Specification and Development: Survey and Annotated Bibliography gives a good survey of various techniques. The CoFI is probably also a good place to look for more information and pointers to tools.
  • Other possibilities abound: various other process algebras, such as CCS, the pi-calculus, mCRL; logical formalisms such as Lamport's TLA+; a wide variety of model-checkers and theorem-provers (SPIN, SMV, Cospan, Coq, Twelf, Isabelle, PVS).

Are you sure it's software architecture?

It doesn't sound as if you want to tackle problems in general software architecture. You seem to be describing problems in a specific domain that sound as if a Domain Specific Language (DSL) would help. Perhaps searching for resources on Domain Specific Modelling would give you some pointers.

In particular, your primitive layer sounds likes a simple language. A code generator for this language would certainly help in your frustrations with re-writing boilerplate code, and possibly with testing for correctness.

It sounds as if you then want to make assumptions about the functionality in the first layer in order to prove the correctness of the operations in the second layer. It is hard to give specific advice on how to approach this without knowing more about the nature of the operations that you want to certify, and what types of properties you wish to certify. If this is the direction that you want to go in then I would second the advice above that Coq, or Hol, might be appropriate tools for the job.