Type safety in libraries

I wonder if this subject was already studied, as I think I need to look into it for my language. How do you ensure type safety across compilation boundaries?

For example, I am thinking of the situation where a type is shared between an application and a dynamic library. There is no control over the memory layout of the type if both are compiled separately. You could decide that the type belongs to the other codebase, and only perform safe operations on it (call methods, etc.) this way differences in version and even in implementation do not break your code.

But what if both have the type compiled in and this cannot change? The library could interface with some other code which requires the type to be implemented in the library, and the application may need to perform operations which require the memory layout of the type to be used.

In Java, there is bytecode verification - I do not want to depend on using bytecode. Ideally dodo should compile to machine instructions.

In C++, the memory layout is explicitly defined in header files which are shared between different pieces of code - this is brittle and poor encapsulation.

In Lisp, all structures are reducible to cons operations. This is more seductive but I don't think I could live with the performance penalty.

In all these cases, the internal state of an object is accessible by foreign code -although in the case of Java the JVM can put restrictions on code other than its own- which I want to avoid.

In some cases I could rebuild the object at the other end based on my (known) implementation of the type and the public properties of the object, but this does not always contain enough info to do anything useful.

Could I determine in a trusted way if separately compiled pieces of code use the same implementation of a type?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

At some level there must be

At some level there must be an agreement between the application and the library. Even if you use introspection then the introspection data needs to be agreed. But at least that defines a limit to the amount of agreement needed, and keeps it internal to the introspection system instead of needing agreement on the interface to all user code.

Also note that without agreement there are no safe operations, for example you suggest method calls as safe, but on some platforms Gnu and MS compilers have different ABIs so you can't mix code compiled by different compilers. And as far as I know there is no way of checking.

What you're asking is this:

"After I discard all type information by conventional compilation, can I verify type consistency?" Obviously the answer is no. You have to stash it somewhere if you want it.

One possibility is to keep it in a separate object file in parallel with the conventional object format of the platform. This is what GNU Ada does to keep track of dependency information: each parallel-object file contains the timestamps of all the source files on which the object file depends, and a pre-linker pass will refuse to work if an object file is out of date with respect to the source.

I expected that would be the

I expected that would be the case. I will have to keep some metadata which the loader will use to check consistency. But this scheme does not enforce a relation of trust between the application and the library, so it could be intentionally or not defeated.

It seems you are gesturing

It seems you are gesturing in the direction of typed assembly language and PCC.

Which kind of assembly?

Is it "assembly" in the sense of assembling object files or "assembly" in the sense of down-to-the-metal kind of language?

PCC is proof-carrying code, thanks for the pointer. And the paper I found indicates this does not incur run-time penalties :-))))

"Typed Assembly Language" as

"Typed Assembly Language" as in machine code that's typed. TAL is the acronym.

Well

well, "Type-safe" code is code that accesses types only in well-defined, allowable ways. so if the code accesses memory at arbitrary offsets outside the range of memory that belongs to that object's publicly exposed fields, it is not type-safe.

INDEED!!! your libs should be type-safe to prevents users overhead and let their programs stable without any extra work like casting or null detection

Is it generally practical to

Is it generally practical to be type-safe in the way you defined? For performance reasons, I expect this form of safety to be broken occasionally (but if the type is defined in only one place, we are safe).