LtU Forum

C runtime type info gimmick which supports scripting

I am experimenting with runtime type information in C, so that values in memory have types, much as values do in Lisp and Smalltalk, rather than only variables having types. This post is intended to be fun. (The idea is for you to enjoy this, and I don't want to argue. But I might respond to remarks with short dialogs in lieu of argument, since that would be fun.) The rest of this is explanation and some detail.

In an earlier post I described a kind of heap I call a vat. It has the effect of associating metainfo with every block of memory allocated, at a different position in memory. (A constant-time arithmetic calculation turns a block pointer into a metainfo pointer, and vice versa; the main access penalty is an extra cache-line miss, which occurs more often because the metainfo is not adjacent to its block.) A contiguous memory block is a rod, the fixed-size metainfo describing it is a knob, and a counted reference to a rod (or any point within one) is a hand. On a 64-bit platform, a knob is 128 bits, a hand is also 128 bits, and rods vary in size.
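To make the arithmetic concrete, here is a minimal sketch of one way such a constant-time mapping could work; the segment layout, names, and field widths are hypothetical, not necessarily the vat's actual scheme. Rods are allocated in 64-byte granules inside an aligned 1 MiB segment whose front holds the knob table, so any pointer into a rod reaches its knob with shifts and masks:

#include <stdint.h>

#define SEG_SHIFT     20    /* segments are 1 MiB, and 1 MiB aligned */
#define SEG_MASK      (~(((uintptr_t)1 << SEG_SHIFT) - 1))
#define GRANULE_SHIFT 6     /* rods are allocated in 64-byte units */

typedef struct knob {       /* 128 bits on a 64-bit platform */
    uint16_t generation;    /* bumped whenever the rod is freed or reused */
    uint16_t flags;
    uint32_t plan_id : 20;  /* id of the rod's plan, i.e. its type */
    uint32_t spare   : 12;
    uint32_t size;          /* rod size in bytes */
    uint32_t refcount;      /* outstanding hands */
} knob_t;

/* Constant time in both directions: mask to find the segment base,
   shift to get the granule index, then index the knob table sitting
   at the base. (Rod space begins past the table, so the lowest
   indices are simply unused.) */
static inline knob_t *knob_of_rod(void *p) {
    uintptr_t seg = (uintptr_t)p & SEG_MASK;
    uintptr_t idx = ((uintptr_t)p - seg) >> GRANULE_SHIFT;
    return (knob_t *)seg + idx;
}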

Some bits of a hand are a copy of bits in the knob, so a reference is valid only when it agrees with the current self-description of an object, in particular the current 16-bit generation number; dangling refs are thus detected. Another 20 of the copied bits are the id of the object's plan, which describes the type, layout, and any dynamic behavior, if you also supply methods for that type in hashmap entries. You can think of a plan as morally identical to a class in Smalltalk, but initially I was aiming for perfect format description on a field-by-field basis, with recursion grounded in primitive native types. I wanted to write centralized code to debug-print anything based on its plan. But then I noticed I would be able to write Smalltalk-style code that works on the C objects, if you were willing to call a dynamic API to send messages (method selectors with parameters).
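Continuing the sketch above, again with invented field names, a hand would then carry its copies of the generation and plan id, and checking a reference is two compares against the current knob:

typedef struct hand {       /* also 128 bits */
    void    *rod;           /* points at, or into, a rod */
    uint16_t generation;    /* copied from the knob when the hand was bound */
    uint16_t spare;
    uint32_t plan_id : 20;  /* copied plan id */
    uint32_t offset  : 12;  /* hypothetical: position within the rod */
} hand_t;

static inline int hand_valid(const hand_t *h) {
    const knob_t *k = knob_of_rod(h->rod);
    return h->generation == k->generation   /* stale generation: dangling ref */
        && h->plan_id    == k->plan_id;     /* different plan: not that object */
}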

If you point at a subfield somewhere in the middle of a rod, you can figure out exactly what it is by fetching the plan for the id in the rod's metainfo knob and drilling down in that plan. But since this lookup is offset-based, it aliases anything with the same offset: a struct named foobar that contains a foo and then a bar puts foobar and foo at the same offset. If you send a message to that address, did you mean it for foobar or for foo? I would start the search in the method tables of the largest containing object, then work through the nesting until I found a match.
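The aliasing is easy to restate in plain C terms: a struct and its first member share an address, so an offset alone cannot name one of them, and dispatch has to walk the nesting at that offset from the outside in:

struct foo    { int x; };
struct bar    { int y; };
struct foobar { struct foo f; struct bar b; };

struct foobar fb;
/* (void *)&fb == (void *)&fb.f : one address, two candidate types.
   A send to this address would search foobar's method table first,
   then foo's, stopping at the first table holding the selector. */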

At the moment I am writing metainfo by hand, but the idea is to generate it from C declarations later. On 64-bit platforms a plan is 256 bits, and each field requires 128 bits, plus however long the string names get, and method tables if there are any. When a field is composed of bit fields, I use a plan that says all its fields are bitfields, and each field then describes bit sizes and shifts instead of byte sizes and offsets.
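Purely as illustration, the sizes above suggest descriptors shaped something like the following; the names and field divisions are guesses, not the actual format:

typedef struct field_info { /* 128 bits per field */
    uint32_t name;          /* offset into a string table */
    uint32_t plan_id : 20;  /* the field's own plan, i.e. its type */
    uint32_t kind    : 12;
    uint32_t offset;        /* byte offset, or bit shift when the plan
                               marks all of its fields as bitfields */
    uint32_t size;          /* byte size, or bit width */
} field_info_t;

typedef struct plan {       /* 256 bits, plus trailing descriptors */
    uint32_t id   : 20;     /* the 20-bit plan id that knobs refer to */
    uint32_t kind : 12;     /* e.g. plain struct vs. all-bitfields */
    uint32_t name;          /* offset into a string table */
    uint32_t size;          /* byte size of an instance */
    uint16_t field_count;
    uint16_t spare;
    uint64_t methods;       /* handle to the method hashmap, if any */
    uint64_t reserved;
    field_info_t fields[];  /* field_count descriptors follow */
} plan_t;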

Even if most of an app consists of static code, I will want to write dynamic and interactive scripts to explore runtime state in tests, and to inspect things at runtime in ways I did not anticipate earlier. I would normally think of glomming a scripting language on the side to do this. But I can use C as the scripting language instead, if memory state is universally annotated this way with type metainfo.

What would be involved in moving logic beyond FOL?

Carl Hewitt has opined that first-order logic (FOL) should not be regarded as the logic of choice for various people such as computer scientists (e.g., he says "First-order theories are entirely inadequate for Computer Science" in a reply to me in Mathematics self-proves its own Consistency (contra Gödel et al.)). He recommends instead his Inconsistency-Robust Logic for mathematicians, computer scientists and others.

What would it take to really change our perspective, so that it is grounded in a logic other than FOL? I think that John Corcoran's First days of a logic course provides the beginnings of an answer (the article treats Aristotelian logic rather than FOL): an alternative logic should provide a natural basis grounding all of the concepts that we expect a new student of logic to grasp, and a rival to the kind of traditional logic course based on first-order logic should be comparably good or better to build on.

What would an introductory logic course look like for, e.g., Inconsistency-Robust Logic? Would it start with Aristotelian logic or would it start with something else? What would having taken such a course be good for, as a prerequisite?

Restructor: Full Program Automatic Refactoring

Wrote up a description of some personal research I did 2005-2010: http://strlen.com/restructor/

This algorithm takes an arbitrary program (including side effects) and refactors it (creating and inlining functions) until the code contains no more under-abstraction (copy-paste code) or over-abstraction. It is an implementation of code compression: it finds a set of functions representing your code such that the total number of AST nodes is minimized.
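A tiny constructed example of the flavor of the transform (mine, not taken from the page): extraction pays for itself only when the new definition plus its call sites cost fewer AST nodes than the copies did, which is also what keeps the algorithm from abstracting every trivial overlap.

/* before: the same computation pasted into two functions */
int f(int a) { int t = a * a + 1; return t * 2; }
int g(int b) { int t = b * b + 1; return t * 2; }

/* after (replacing the versions above): the shared subtree
   becomes one function with two calls */
static int shared(int v) { int t = v * v + 1; return t * 2; }
int f(int a) { return shared(a); }
int g(int b) { return shared(b); }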

I had originally intended this as a new way of programming, where you could modify your code using copy-paste to your heart's content and leave the annoying work of abstraction to the computer. I now realize that is a bit naive, and I note some reasons why this may not be ideal on the page above. Still, it's an interesting idea that I've not seen before; would love to hear y'all's expert opinions :)

For reference, I posted about this idea on LtU in 2004 under the topic "abstractionless programming".

Free links to all (or practically all) recent SIGPLAN papers

From http://www.sigplan.org/OpenTOC/


ACM OpenTOC is a unique service that enables Special Interest Groups to generate and post Tables of Contents for proceedings of their conferences enabling visitors to download the definitive version of the contents from the ACM Digital Library at no charge.

Downloads of these articles are captured in official ACM statistics, improving the accuracy of usage and impact measurements. Consistently linking to definitive versions of ACM articles should reduce user confusion over article versioning.

SW verification continues

There is a new release of the Albatross compiler available.

The Albatross programming language aims at making verified software construction available to everybody.

The language description has been completely updated. The document describes how to get the compiler.

The previous releases already contained induction and recursion and even inductive sets.

The most important new feature is abstract data types. Abstract data types are realized by abstract classes, abstract functions, and abstract properties. Using the abstraction, it is possible to verify many properties once and have them inherited by any type which satisfies the concept of the abstract data type.

The design of the language is still an ongoing activity. Any comments, hints, issue reports, etc. are welcome.

Regards
Helmut

Finding Solutions vs. Verifying Solutions

[Edit] Due to a conversation below, it turned out that the sets P and NP are commonly defined differently than I had considered them in my initial report. An updated report with the same content, but without this inconsistency, can be found here: Finding Solutions vs. Verifying Solutions. I also changed the blog title from "P is not equal to NP" to the above.

Below is the initial, now obsolete post, kept for conversational reasons:

In this short report, I argue that P is not equal to NP.

Summary: the approach analyzes functions and their inverses as mappings between domain and codomain elements. The conclusion drawn is that verifying a solution is the inverse of the function that finds solutions to the same problem. A further observation then implies that P is not equal to NP.

Any thoughts?

the type of eval in Shen

Thought some people might find this interesting. Years ago in the LFCS I was talking to Mike Fourman about Lisp and ML and the eval function and how I liked eval; he said 'What is the type of eval?'. I had no answer. Years later I can answer :).

In Shen, eval exists as a function that takes lists and evaluates them. So (* 3 4) gives 12, while [* 3 4] is a heterogeneous list without a type in virgin Shen. If you apply eval to a list structure you get the normal form of the expression you would get by replacing the [...]s in the argument by (...)s.

e.g. (eval [* 3 4]) = (* 3 4) = 12

Let's call these arguments to eval terms. Now some terms raise dynamic type errors under eval and some do not. So what we'd like is a class of terms that are well typed. We'll define term as a parametric type, so that [* 3 4] : (term number).

Using sequent calculus notation in Shen we enter the type theory. See the introductory video here (http://shenlanguage.org/) if you don't understand this notation.

(datatype term

  T1 : (term (A --> B));
  T2 : (term A);
  __________________
  [T1 T2] : (term B);   
  
  \\ some clerical stuff skipped here 
  X : (term A) >> Y : (term B);
  ______________________________
  [lambda X Y] : (term (A --> B));  
  
  if (not (cons? T))
  T : A;
  ______________________
  T : (mode (term A) -);)

So what is the type of eval?

eval : (term A) --> A. Surely?

Let's add this to the end of our data type definition

  _______________________
  eval : ((term A) --> A);

and run it.

Shen, copyright (C) 2010-2017 Mark Tarver
www.shenlanguage.org, Shen Professional Edition 17
running under Common Lisp, implementation: SBCL
port 2.1 ported by Mark Tarver
home licensed to Mark Tarver

(0-) (datatype term

  T1 : (term (A --> B));
  T2 : (term A);
  __________________
  [T1 T2] : (term B);   
  
  \\ some clerical stuff skipped here 
  X : (term A) >> Y : (term B);
  ______________________________
  [lambda X Y] : (term (A --> B));  
  
  if (not (cons? T))
  T : A;
  ______________________
  T : (mode (term A) -);

  _________________________
  eval : ((term A) --> A);)
type#term

(1-) (tc +)  \\ enable type checking
true

(2+) (* 3 4)
12 : number

(3+) ((* 3) 4)
12 : number

(4+) (eval [[* 3] 4])
12 : number

(5+) (eval [[* 3] "a"])
type error

(6+) [* 3]
[* 3] : (term (number --> number))

(7+) (eval [* 3])
# CLOSURE (LAMBDA (V1852)) {1006925DFB} : (number --> number)

(8+) [lambda X [lambda Y X]]
[lambda X [lambda Y X]] : (term (B --> (A --> B)))

(9+) (eval [lambda X [lambda Y X]])
# FUNCTION (LAMBDA (X)) {1006B5799B} : (B --> (A --> B))

You only need a few more rules to complete the term datatype and add currying on the fly, but I'll leave it there. This was a byproduct of a much more extensive project I'm working on with respect to a typed second-order logic, but I thought it was fun to share.

bw

Mark

Any thoughts on WanaDecrypt0r?

This has been all over the news in my country and I doubt people on LtU will have missed it. There is a large-scale SMBv1/SMBv2 worm active in the world called WanaDecrypt0r.

Any thoughts on what this means for language security features, as hackers become more and more creative in exploiting holes?

Prove: 'Cont r a = (a -> r) -> r' forms a monad

I don't follow Haskell too much, and more often than not I disagree with Erik.

However, this came up during an interview, so it's supposedly something I should know.

Prove that 'Cont r a = (a -> r) -> r' forms a monad.

Any takers?
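For what it's worth, here is the usual shape of an answer, with the newtype wrapper dropped so everything is a bare lambda term; these are the standard Cont definitions, and each law then falls to a short equational calculation:

return a = \k -> k a
m >>= f  = \k -> m (\a -> f a k)

Left identity, by unfolding and beta/eta reduction:

return a >>= f
  = \k -> (return a) (\a' -> f a' k)      -- unfold >>=
  = \k -> (\k' -> k' a) (\a' -> f a' k)   -- unfold return
  = \k -> (\a' -> f a' k) a               -- beta
  = \k -> f a k                           -- beta
  = f a                                   -- eta

Right identity (m >>= return = m) and associativity go through by the same kind of unfold-and-reduce calculation.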

Implementing typing rules -- how do I implement non-syntactic rules?

Hi all,

Typing rules in papers are usually not directly implementable as a checker. For example, they usually include a rule that requires coming up with a type for a binder, e.g. a lambda argument (when the lambda doesn't carry the type of its argument in the syntax). I'm wondering if there's a standard way of implementing this kind of rule, because as far as I can see none of the papers I'm looking at explicitly says how to do this.

To be more concrete, I'm looking at the Frank paper (http://lambda-the-ultimate.org/node/5401). The paper says it has a bidirectional typing discipline, but other than that it doesn't hint at how to implement the typing rules. So when I see the rule for lambda (Fun), I have no idea how to implement it.

I checked the literature on bidirectional typing (Local Type Inference etc.) but couldn't see anything relevant. So I'm guessing that when I see this kind of rule in the literature it means "use Damas-Hindley-Milner style metavariable generation + unification", am I right? Are there any other ways of doing this? If yes, how do I know which method I should be using?

Thanks
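For the concrete question, here is a minimal sketch in C of the metavariable-plus-unification approach (the representation and names are mine, not from the Frank paper): give the binder a fresh metavariable, check the body, and let unification solve the metavariable wherever the binder's uses constrain it.

#include <stdlib.h>
#include <string.h>

typedef struct Type Type;
struct Type {
    enum { TVAR, TCON, TFUN } tag;
    const char *name;               /* TCON: constructor name, e.g. "Int" */
    Type *a, *b;                    /* TFUN: argument and result types */
    Type *inst;                     /* TVAR: solution once unified, else NULL */
};

static Type *prune(Type *t) {       /* chase already-solved metavariables */
    while (t->tag == TVAR && t->inst) t = t->inst;
    return t;
}

static int occurs(Type *v, Type *t) {  /* occurs check: forbid infinite types */
    t = prune(t);
    if (t == v) return 1;
    return t->tag == TFUN && (occurs(v, t->a) || occurs(v, t->b));
}

static int unify(Type *x, Type *y) {   /* 1 on success, 0 on type error */
    x = prune(x); y = prune(y);
    if (x->tag == TVAR) {
        if (x == y) return 1;
        if (occurs(x, y)) return 0;
        x->inst = y;                   /* solve the metavariable in place */
        return 1;
    }
    if (y->tag == TVAR) return unify(y, x);
    if (x->tag == TCON && y->tag == TCON) return strcmp(x->name, y->name) == 0;
    if (x->tag == TFUN && y->tag == TFUN) return unify(x->a, y->a) && unify(x->b, y->b);
    return 0;
}

static Type *fresh_metavar(void) {     /* an unsolved type variable */
    return calloc(1, sizeof(Type));    /* zeroed: tag == TVAR, inst == NULL */
}

/* The lambda rule then becomes executable:
     Type *arg = fresh_metavar();      -- invent the binder's type
     ... extend the environment with x : arg, infer the body's type
         res; uses of x inside the body call unify() and may solve arg ...
     the lambda's type is TFUN(arg, res), with arg exactly as solved
     as the body's demands forced it to be. */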
