Dependencies as first class entities

When thinking about which pains are worst when writing code, it seems that it all boils down to dependencies. Even as a user, the problem seems to be the same. Ehrm, correction: as a user I frequently get insanely mad at dependency problems. As a programmer, it is more of the moaning kind.

My first thought was to make a better make language, since this was where most of my pains were. Then I realized it is bigger than this. Just building the thing isn't all there is: there is also installing it, remembering what was installed and why, being able to remove some of it that seems to be in the way, and so on. Not to mention configuration problems. Now, why was that configuration set in that way, and who depends on it? What would happen if I changed it? Sometimes I don't even know how to find the answers.

My current thought is to deal with this once and for all. Let's make dependencies a first class entity in the programming language, make it externally visible and then watch what would happen.

The questions I haven't been able to figure out any answers to yet are:

  1. How should this be implemented?
  2. What could this achieve?

Has anyone been there before me, as usual, and written some wise words on the subject?

Anyone having similar ideas?

More than a language issue

The bigger version of the problem you describe is often called "configuration management". In addition to the items you list, the better approaches also consider versioning issues and the need to run multiple versions of libraries in the same system.

One approach that seems to have some traction at the moment is OSGi.

OSGi seems to solve parts of it,

but there is more. OSGi, as far as I can tell from Wikipedia, seems to handle the "other" end of the spectrum from what I had in mind: bundling code into components that can be started, stopped, installed and uninstalled. It seems to solve dependency problems using some kind of name server functionality, which is a nice way to solve it for dynamic behaviors.

But the kinds of dependencies I was thinking of are more of the kind "this object is now faulty, what other objects are affected by it?", as an example.

I guess you could divide it into two categories:

  1. Dependency interrogation to provide information to a user, e.g. describing where a configuration came from or providing debugging information when something fails.
  2. Externally visible dependency information about externally visible entities. A build system would be one user of this, but I guess systems like OSGi could use this too.

The first item is what would need language support. The second would probably be a compiler/interpreter concern only.

Pick your poison

Making dependencies first-class seems to be at odds with the abstraction principle: to be able to observe them, let alone pass them around, you'll need to break the boundary of the thing that encapsulates them (package, executable, closure, process etc.). Of course, if the binding is immutable (à la lambda calculus, static linkage) then it's a non-problem. Plan 9 namespaces are another example, where a parent can set up a namespace for the child, after which it becomes private to the child.

If bindings are mutable by 3rd parties then I would stick with the abstraction principle (assuming no harm by default) rather than deal with the fall-out from a dependency mess that would ensue if abstraction is broken...

It's not the language, it's the environment it runs in.

External dependencies define many dimensions, like
- component version number
- package / namespace
- interfaces exported by the component
- platform description / required operating system / JVM version, etc.
- other packages required by the component
- documentation of interfaces.

So you are more tied to the execution environment (JVM, .NET, native OS) than to the language.
With Java, the JVM is a more general and fundamental setting than the language; this applies to most cases.

You might take a look at how Maven does a good job of managing dependencies for Java and co.

to be more precise.

Most aspects of a dependency are external to the language; so they can't be expressed in a programming language.

Maybe so, but...

... but quite a lot exists within the language itself too. Imagine types for example. Using an object oriented model, you can easily see "is-a" relationships, which also translates into dependencies. Any object aggregated into another object is another type of dependency. An object using another object is yet another type of dependency.
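
As a rough sketch of how much of this is already visible inside a language, the Python snippet below reads "is-a" dependencies from the class hierarchy and aggregation dependencies from attribute annotations. The classes `Vehicle`, `Engine` and `Car` and the helper `type_dependencies` are made up for illustration; "uses" dependencies (one object merely calling another) are not covered here.

```python
from typing import get_type_hints


class Engine:
    pass


class Vehicle:
    pass


class Car(Vehicle):
    engine: Engine  # aggregation: a Car holds an Engine


def type_dependencies(cls):
    """Return the 'is-a' and 'has-a' dependencies of a class (hypothetical helper)."""
    # Base classes, skipping the class itself and the universal base `object`.
    is_a = [base.__name__ for base in cls.__mro__[1:] if base is not object]
    # Annotated attributes declared on the class or inherited from its bases.
    has_a = {name: hint.__name__ for name, hint in get_type_hints(cls).items()}
    return {"is_a": is_a, "has_a": has_a}


deps = type_dependencies(Car)
print(deps)
```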

When objects are shared in a big network, the dependency information might be interesting to have. I think this is mostly useful in error situations, but it might be useful for other situations too. Combining it with the possibility to interrogate versions would make it even more useful. Then you could place restrictions on which versions of certain objects are allowed to be combined with which versions of other objects.

I was just thinking someone might be triggered by the idea and come up with some useful scenarios where dependencies as language feature would be good to have. But it seems like this didn't happen. At least not yet...

In the Java world they have

In the Java world they have annotations.

An annotation adds metadata to the defined class / type; the metadata can be looked up/checked at run time. It is possible to define new kinds of annotations.

Do you think that this kind of per-class metadata could express version information adequately for your purposes?

Java annotations seem generic

so I guess the answer is yes. But to make this useful, the compiler needs to be aware of the annotation and what it means, so that inheritance etc. propagates the information correctly.

Runtime dependencies

A possible implementation would be to have each object hold a reference to the object(s) and functions that led to that object. Of course, such a scheme only works when objects are immutable (i.e. values).

Capturing runtime dependencies would allow us to do online (deep) impact analysis or offline postmortem analysis.
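
A minimal sketch of that scheme in Python, assuming immutable values: every derived value remembers the name of the function that produced it and the values it was computed from. The names `Traced`, `derive` and `depends_on` are invented for this example and do not come from any existing library.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Traced:
    """An immutable value that remembers how it was produced."""
    value: Any
    producer: str = "literal"
    inputs: Tuple["Traced", ...] = ()

    def depends_on(self, other):
        # Walk the recorded inputs transitively, comparing by identity.
        return any(i is other or i.depends_on(other) for i in self.inputs)


def derive(fn, *args):
    """Apply fn to the raw values and record fn and its inputs on the result."""
    return Traced(fn(*(a.value for a in args)), fn.__name__, args)


def add(x, y):
    return x + y


def double(x):
    return 2 * x


rate = Traced(3)
base = Traced(4)
unrelated = Traced(10)

total = derive(add, rate, base)
result = derive(double, total)

print(result.value, result.producer, result.depends_on(rate))
```

Impact analysis is then a matter of asking, for a faulty input, which derived values report `depends_on(faulty)`.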

Another plus is full traceability. We would ultimately know the sources of all important (derived) data, including software, and how they interplay. This is something that banks, and their regulators, are very keen on.

Storing dependencies

Having dependencies as first class entities in a language makes it possible to store dependency information in some kind of external storage (e.g. a database), which might be nice for banking applications, for example.

Using a compiler switch you could also store all externally visible dependencies in a file suitable for a build system. This way, you could have full dependency tracking not only for file dependencies but also for configurations (in C this would be macros), compiler switches etc. This by itself does not require dependencies to be first class entities though.
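
To give a feel for the compiler-switch idea, here is a small Python sketch that extracts the modules a piece of source code imports and writes them as a make-style rule. It only looks at import statements, so it is far from the full tracking described above (no configurations or switches); `imported_modules`, `make_rule` and the target name `app.pyc` are hypothetical.

```python
import ast


def imported_modules(source):
    """Collect the names of modules imported anywhere in the given source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return sorted(found)


def make_rule(target, source):
    """Format the dependencies as a single 'target: dep dep ...' line."""
    return f"{target}: " + " ".join(imported_modules(source))


source = "import os\nfrom json import loads\nimport os.path\n"
rule = make_rule("app.pyc", source)
print(rule)
```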

Not entirely a language problem, but still...

... I'd say there are some things that could become easier with language support. Let's say you have something you call "configuration", which is basically an object with a name, description and value. Wouldn't it be nice to be able to tell the user where this configuration comes from? A text file? Command line? Etc.

Sure, simpler cases are easy to program if you design for it beforehand. But for more complex cases, it would be nice to have some language support. So you can just write

print myConf.origin

And be done with it...
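
Without language support, the simple case can be approximated by carrying the origin along with the value. The Python sketch below assumes three configuration layers (command line, config file, built-in defaults); `Setting`, `resolve` and the layer names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    name: str
    value: str
    origin: str  # which layer the value came from


def resolve(name, defaults, file_values, cli_values):
    """Look the name up layer by layer, highest precedence first."""
    layers = [
        ("command line", cli_values),
        ("config file", file_values),
        ("built-in default", defaults),
    ]
    for origin, layer in layers:
        if name in layer:
            return Setting(name, layer[name], origin)
    raise KeyError(name)


defaults = {"port": "8080", "host": "localhost"}
file_values = {"port": "9000"}
cli_values = {}

my_conf = resolve("port", defaults, file_values, cli_values)
print(my_conf.origin)
```

The point of language support would be to get `origin` for free, instead of threading it through every layer by hand as done here.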

My intention wasn't primarily to be able to change the dependencies in weird ways. Rather, I envisioned a more introspective, maybe read-only, way of gathering dependency information. It could be nice to write:

if a isDependentOn b then

I.e., if b is needed to calculate a, then maybe I want to take another route. I don't see why merely asking how things relate would break the boundaries of things.

And yes, versioning would be nice to have too as a language feature. But I'll leave that for another thread sometime in the future; it's a bit too big a topic to throw into the mix.

Codependencies

I think you cannot have a dependency system without also having codependencies.

Dependencies form a graph from a set of pairs (A,B), meaning A depends on B, plus a root R. A simple example is control flow: if some function f calls g, then f depends on g. Given some library of functions, we know how to start with R and compute the transitive closure (the algorithm is called garbage collection :)
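
The reachability computation described above can be sketched in a few lines of Python. This is just the mark phase of a tracing collector applied to (A, B) pairs; the function name `reachable` and the sample call graph are made up.

```python
def reachable(pairs, root):
    """Return everything the root depends on, directly or transitively."""
    graph = {}
    for a, b in pairs:  # (a, b) means: a depends on b
        graph.setdefault(a, set()).add(b)

    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# R calls main, main calls f, f calls g; h also calls g but nobody calls h.
pairs = [("R", "main"), ("main", "f"), ("f", "g"), ("h", "g")]
live = reachable(pairs, "R")
print(sorted(live))
```

Anything not in `live` (here `h`) can be dropped from the final program.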

Codependencies are the dual. The elements are a set of triples (C, A, B) meaning if C then A else B. A simple example is conditional compilation. Codependencies are evaluated by starting with a set of base conditions (macros in C) and flowing towards the root, which is the resultant program.
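
One possible reading of the triples, sketched in Python: a choice is either a plain result or a nested (C, A, B) tuple, and it is resolved against a set of base conditions. The representation and the `select` helper are assumptions made for this example, not an established formalism.

```python
def select(choice, conditions):
    """Resolve a nested (C, A, B) choice: take A if condition C holds, else B."""
    while isinstance(choice, tuple):
        cond, if_true, if_false = choice
        choice = if_true if conditions.get(cond, False) else if_false
    return choice


# "if WINDOWS then LoadLibrary else (if UNIX then dlopen else ERROR)"
loader = ("WINDOWS", "LoadLibrary", ("UNIX", "dlopen", "ERROR"))

print(select(loader, {"WINDOWS": True}))
print(select(loader, {"UNIX": True}))
```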

In the language C, codependencies are done first (by the pre-processor) and then the dependencies (by the linker searching libraries).

You might look at constraint languages for configuration

There's been some work on constraint languages for configuration management. In general, solving package constraints is an NP-complete problem. You might look at OPIUM, which is a constraint-based package manager. The original paper is available here. They appear to have incorporated it into Eclipse.
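
To make the combinatorial nature of the problem concrete, here is a deliberately naive Python sketch that tries every combination of one version per package. It is not how OPIUM or any real package manager works, and it simplifies by always installing every package; the package names and version constraints are invented.

```python
from itertools import product

# Hypothetical universe: package -> {version: {dependency: allowed versions}}
UNIVERSE = {
    "app": {1: {"lib": {1, 2}, "tls": {2}}},
    "lib": {1: {"tls": {1}}, 2: {"tls": {2}}},
    "tls": {1: {}, 2: {}},
}


def solve(universe, wanted):
    """Brute-force search, newest versions first; exponential in the number of packages."""
    names = sorted(universe)
    candidates = (sorted(universe[n], reverse=True) for n in names)
    for versions in product(*candidates):
        chosen = dict(zip(names, versions))
        deps_ok = all(
            chosen[dep] in allowed
            for name, version in chosen.items()
            for dep, allowed in universe[name][version].items()
        )
        wanted_ok = all(chosen[n] in allowed for n, allowed in wanted.items())
        if deps_ok and wanted_ok:
            return chosen
    return None


print(solve(UNIVERSE, {}))
print(solve(UNIVERSE, {"lib": {1}}))
```

Pinning `lib` to version 1 makes the problem unsatisfiable here, because `lib` 1 needs `tls` 1 while `app` needs `tls` 2.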

Beyond just instantiating/installing the correct dependencies, there is the issue of configuration value dependencies. For example, if a component is parameterized by a value which it externalizes, then that value needs to be propagated to any client components. I've looked at this issue in the context of configuration languages. I borrowed some ideas from programming language module systems when designing the configuration language for Engage.

Engage was implemented as an external DSL. I've thought about how it might be more tightly integrated into a programming language. A big challenge, as mentioned in other comments, is that many dependencies involve the external environment. I suggest providing some constructs in the language to make the high level decisions along with extensible hooks that can be used by the programmer to interface to external tools. This is similar to the approach used by Java class-loaders. As for specific language constructs, perhaps first-class modules along with a constraint notation to specify module dependencies/imports.

Dependencies in Felix

FYI Felix has two kinds of dependencies specifiable in the language. First you need to know it's a cross-cross-compiler: it generates C++ as an intermediate language then compiles and links that. The first kind of dependency you can state is for the first stage:

  header gmpxx = '#include "gmpxx.h"';
  type mpz = "mpz_t" requires gmpxx;

The requires clause here specifies that if you use the type mpz the compiler has to emit an include for the header file defining it. If you do not use the type mpz the header may not be included: whether you use the type is determined by the dependencies implicit in the program call graph.

The second type of dependency you can specify is like this, which is a more correct version of the above:

  header gmpxx = '#include "gmpxx.h"' requires package "gmpxx";

This says that if you use the header, you need the package too. The package is an abstract name which is the basename of a configuration file containing instructions on how to link the gmpxx library. It can also include the header information and if so the include spec can be removed from the program.

There's more, but these are the two basic language features involved. Actually the requirement clause allows expressions with alternatives but I could not figure out how to make that work (lacking a SAT solver).

The utility of this setup is that provided you construct your configuration database correctly, the program requires only that you label types (and occasionally functions) with an abstract dependency, and the compiler driver does all the rest, allowing Felix programs to be run without needing any compiler switches, linker instructions, or other crud, whilst the program source remains relatively portable because the dependencies are only specified in the abstract.

The big problem I have with this isn't the lack of a constraint solver to handle alternatives, but the fact that I can still only specify *codependencies* like this:

  macro val WINDOWS = true;
  macro val UNIX = false;
  ..
  if WINDOWS do
    fun dlopen : string -> address = "LoadLibrary($1)";
  elif UNIX do
    fun dlopen : string -> address = "dlopen($1)" requires package "dlfcn";
  else ERROR;
  done

In other words, traditional conditional compilation. The point here is that the OS macro tag is driving the code choice, so the code generated depends on the OS tag, which is the inverse of saying that the LoadLibrary version depends on WINDOWS, so I cannot use the requires clause. In other words I have no decent language construct here for codependencies (conditional compilation does not rate as "decent" :) Suggestions would be welcome!

Choice calculus

Have a look at CC. I think you'll like it.

I haven't read it completely

I haven't read it completely yet, only skimmed it a bit. But it seems like something that I would like to be able to support. The question for me is in what way. I guess this is always the question when it comes to how to implement a calculus...

After some more thinking

I wonder whether dependency tracking would be able to solve those kinds of problems implicitly. Imagine a function returning current OS variant (Windows, Linux, FreeBSD, ...), and then some conditional module/header inclusions depending on what variant the function returned.

With full dependency tracking the compiler should be able to figure out that, for this particular compile, this function will never return anything other than, say, Linux. Because of this, all else will follow naturally, since the modules/headers specify which OS variant(s) they support.

The challenge will probably be that sometimes you want a generic result supporting as many different environments as possible for the same binary, sometimes you only want to support a specific machine but often something in between. Should be possible to solve with dependency description, but in this case it needs to be specified by the programmer rather than implicitly determined.

Some thoughts on dependencies

Mark Burgess, the author of cfengine, has written quite a bit about configuration management. Please see:

Testable System Administration, in Communications of the ACM. This is an opinion piece, with a rant expressing Burgess's hatred for package-based approaches to configuration management, notably cfengine's biggest competitor, Puppet.

I have a hard time understanding anything Burgess writes, so god bless you if you can understand him easily. For example, the documentation for cfengine literally says things like, "We speak of a promiser (the abstract object making the promise), the promisee is the abstract object to whom the promise is made, and then there is a list of associations that we call the `body' of the promise, which together with the promiser-type tells us what it is all about." I have always had negative surface impressions of cfengine as:

  • just enough of a framework to make something structured
  • it doesn't intrinsically make anything you do easy or good
  • and if you want a wheel you'd better friggin' know how to build one... because it doesn't give a lick about packages or services or anything

That said, the major motivation for considering promise theory is that it is designed for decentralized authority. In other words, autonomous agents and that even touches concepts like "independent compilation" as opposed to "separate compilation".

It is also really queer that Mark claims his approach to be simpler, but he can't describe his ideas in anything graphically simpler than a rat's nest of a cyclical graph structure. There are more scalable approaches to rendering dependencies, from design theory, known as dependency structure matrices, which have already been applied to software. I would like languages to have a library that allows analyzing programs in the language for meaningful dependency properties, like type dependencies. Even better if the language had a library for micro-refactoring to fix dependency issues.
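
For readers who have not seen one, a dependency structure matrix is simply the adjacency matrix of the dependency graph with rows and columns in the same order. The Python sketch below builds one from (A, B) pairs and reports mutually dependent pairs, which show up as marks mirrored across the diagonal. The module names and helper functions are hypothetical.

```python
def dependency_matrix(pairs):
    """Build a square 0/1 matrix where matrix[i][j] == 1 means names[i] depends on names[j]."""
    names = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in pairs:
        matrix[index[a]][index[b]] = 1
    return names, matrix


def mutual_dependencies(names, matrix):
    """Pairs that depend on each other directly (marks on both sides of the diagonal)."""
    n = len(names)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] and matrix[j][i]
    ]


pairs = [("ui", "core"), ("core", "db"), ("db", "core")]
names, matrix = dependency_matrix(pairs)

for name, row in zip(names, matrix):
    print(f"{name:5}", row)
print(mutual_dependencies(names, matrix))
```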

You also probably want to consider how to manage all this state you want to track. I know you may not look at this as an End-to-end arguments problem, but at large scales, it looks to me as such. One approach worth looking at is Trickles.

You may also want to use the search term "provenance" as well.

That said, for questions of provenance, good system design should mitigate its necessity. For example, many so-called Dependency Injection frameworks allow multiple processing phases, from local to external. For example, an XML configuration file, followed by configuration within the module e.g. via attributes/annotations. Scattering such configuration generally implies poor system design, as the system's configuration is dependent on dynamic dispatch. It is far better to have a static configuration for a system.

Another thought about dependencies

After having read debates about whether lazy or eager evaluation makes the best use of resources, I'm wondering whether dependency tracking that also includes resource usage could help in finding the best compromise.

My idea is that the application would be able, at each moment, to describe the resources it needs to advance a bit further, and could also give a number of combinations in order of decreasing usefulness to help a scheduler. The scheduler could then decide which alternative to provide, and when, based on fairness and priority. This could span several CPUs, could be used in NUMA environments, or could even be distributed over a network.

Would this be too complicated to implement, or would this be possible?
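
A toy version of the scheduling side is at least easy to sketch. In the Python snippet below each task offers alternatives as (CPUs needed, usefulness) pairs, and a greedy scheduler serves tasks in priority order, granting the most useful alternative that still fits. Everything here (the task format, the single CPU resource, the greedy policy) is an assumption made for illustration; fairness, timing and distribution are ignored.

```python
def schedule(tasks, capacity):
    """Grant each task, highest priority first, its most useful alternative that fits."""
    grants = {}
    for name, _priority, alternatives in sorted(tasks, key=lambda t: -t[1]):
        # Try alternatives in order of decreasing usefulness.
        for cpus, _usefulness in sorted(alternatives, key=lambda a: -a[1]):
            if cpus <= capacity:
                grants[name] = cpus
                capacity -= cpus
                break
    return grants


tasks = [
    # (name, priority, [(cpus needed, usefulness), ...])
    ("render", 1, [(4, 1.0), (2, 0.6), (1, 0.3)]),
    ("index", 2, [(3, 1.0), (1, 0.5)]),
]

grants = schedule(tasks, capacity=4)
print(grants)
```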