Language mystery: identify the source language of a worm based on its object code

Here's a fun challenge for LtU. The team at Securelist is analyzing a worm called Duqu and found a few interesting things. One of them is that they can't figure out the source language for the core framework.

After having performed countless hours of analysis, we are 100% confident that the Duqu Framework was not programmed with Visual C++. It is possible that its authors used an in-house framework to generate intermediary C code, or they used another completely different programming language.

We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or the programming language that can generate similar code constructions, to contact us or drop us a comment in this blogpost. We are confident that with your help we can solve this deep mystery in the Duqu story.

I'm not clear on how much knowing the source language helps with the security analysis, but what else were you doing with your time? All the details and clues in the object file can be found on their blog.

scale seems one person task

The scope and difficulty described require only one person for less than a year, especially if they started from a description of roll-your-own objects in C of the sort Brad Cox wrote about in the '80s. If you didn't have a copy of cfront to play with but wanted objects, this was one way to go.

The first object-oriented language I used much, in 1989, was such a thing, written by one guy on the team as a C preprocessor explicitly mimicking the design style Brad Cox used in Objective-C. So it was a sort of bare-bones ObjC clone, and it was quite easy to use. (I got them to consider using C++; they showed signs of interest in that "new-fangled" language.)

The Securelist team might be looking at such a thing developed recently, or many years ago. Without evidence, I'd guess it was someone's pet project from the '80s or '90s that had a mature library doing exactly the thing needed. If so, there's a chance more of the same appears somewhere else, if it's someone's favorite tool. But I wouldn't expect open source projects. It's quite plausible someone would recognize the style, so asking like this is a good plan.

I have no further guesswork on this, though. (A couple years later I put a Smalltalk style class system into an XLisp clone, and this is why it doesn't seem like a big task to me, except a C preprocessor is significantly harder than either Lisp or Smalltalk parsers.)
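For readers who have not seen roll-your-own objects in C, here is a minimal sketch of the general idea: every object starts with a pointer to its class, and a message send is a lookup by selector name that walks up the superclass chain. All names (`msg_send`, `Counter`, `DoubleCounter`, and so on) are hypothetical and chosen for illustration. This is not Brad Cox's code, not the preprocessor described above, and not anything recovered from Duqu.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of Objective-C-flavoured objects in plain C. */

typedef struct Object Object;
typedef long (*Method)(Object *self, long arg);

/* One row of a class's dispatch table: selector name -> function. */
typedef struct MethodEntry {
    const char *selector;
    Method      imp;
} MethodEntry;

typedef struct Class {
    const char         *name;
    const struct Class *super;    /* NULL for a root class */
    const MethodEntry  *methods;  /* terminated by a NULL selector */
} Class;

/* Every object begins with a pointer to its class. */
struct Object {
    const Class *isa;
};

/* Look the selector up in the receiver's class, then in each superclass. */
long msg_send(Object *self, const char *selector, long arg)
{
    const Class *c;
    assert(self != NULL && selector != NULL);
    for (c = self->isa; c != NULL; c = c->super) {
        const MethodEntry *m;
        for (m = c->methods; m->selector != NULL; m++) {
            if (strcmp(m->selector, selector) == 0)
                return m->imp(self, arg);
        }
    }
    return -1; /* stand-in for "does not understand" */
}

/* A class with one instance variable and two methods. */
typedef struct Counter {
    Object base;  /* must be first so Object* and Counter* can be converted */
    long   value;
} Counter;

static long counter_add(Object *self, long n)
{
    Counter *c = (Counter *)self;
    c->value += n;
    return c->value;
}

static long counter_get(Object *self, long unused)
{
    (void)unused;
    return ((Counter *)self)->value;
}

static const MethodEntry counter_methods[] = {
    { "add", counter_add },
    { "get", counter_get },
    { NULL, NULL }
};
const Class CounterClass = { "Counter", NULL, counter_methods };

/* A subclass that overrides "add" and inherits "get". */
static long double_add(Object *self, long n)
{
    Counter *c = (Counter *)self;
    c->value += 2 * n;
    return c->value;
}

static const MethodEntry double_methods[] = {
    { "add", double_add },
    { NULL, NULL }
};
const Class DoubleCounterClass = { "DoubleCounter", &CounterClass, double_methods };
```

A caller would set `isa` by hand, for example `Counter c = { { &CounterClass }, 0 };`, and then write `msg_send(&c.base, "add", 5)`. A preprocessor of the kind described above would mostly be generating this boilerplate from a nicer surface syntax.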

I invented v-tables myself

I invented v-tables myself in '95, along with arena-style automatic memory management, while working on Kimera at UW. Of course, I wasn't the first one to invent these things; I was just reinventing wheels (well-known to everyone else) to deal with C :).

This code could very well be C + custom v-tables. That would explain why the v-tables aren't in the same location in the object layout.
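As an illustration of why hand-written v-tables need not sit at a fixed place in the object, here is a small sketch. With a C++ compiler the v-table pointer's position is decided by the compiler; when the programmer declares it as an ordinary struct member, it ends up wherever they happened to put it. The type and function names (`ShapeVtbl`, `Square`, `Rect`, `SEND`) are made up for this example and are not taken from the Duqu binary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: custom v-tables in plain C. */

/* A hand-written "v-table" is just a struct of function pointers. */
typedef struct ShapeVtbl {
    int         (*area)(const void *self);
    const char *(*name)(const void *self);
} ShapeVtbl;

/* In this "class" the v-table pointer is the first member. */
typedef struct Square {
    const ShapeVtbl *vtbl;
    int              side;
} Square;

/* In this one the data comes first, so the v-table pointer sits at an offset. */
typedef struct Rect {
    int              w, h;
    const ShapeVtbl *vtbl;
} Rect;

static int square_area(const void *self)
{
    const Square *s = self;
    return s->side * s->side;
}

static const char *square_name(const void *self)
{
    (void)self;
    return "square";
}

static int rect_area(const void *self)
{
    const Rect *r = self;
    return r->w * r->h;
}

static const char *rect_name(const void *self)
{
    (void)self;
    return "rect";
}

static const ShapeVtbl square_vtbl = { square_area, square_name };
static const ShapeVtbl rect_vtbl   = { rect_area, rect_name };

/* "Constructors" install the v-table pointer by hand. */
void square_init(Square *s, int side)
{
    assert(s != NULL);
    s->vtbl = &square_vtbl;
    s->side = side;
}

void rect_init(Rect *r, int w, int h)
{
    assert(r != NULL);
    r->w = w;
    r->h = h;
    r->vtbl = &rect_vtbl;
}

/* Virtual call through whichever member the struct uses for its v-table. */
#define SEND(obj, method) ((obj)->vtbl->method(obj))
```

After `square_init(&s, 3)` and `rect_init(&r, 2, 5)`, both `SEND(&s, area)` and `SEND(&r, area)` dispatch through a v-table, yet `offsetof(Square, vtbl)` and `offsetof(Rect, vtbl)` differ. Whether Duqu's layout arose this way is a guess; the sketch only shows that plain C makes it easy.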

Reminds me of a Scheme adware author

This reminded me of an interview with an adware author.

S: You wrote adware. You bastard.

M: [sheepishly] Yes, I did. I got to write half of it in Scheme, which probably means that I deployed more Scheme runtime than anybody else on the planet.

broken link

Could you fix the link to the interview?

done

done

Mystery solved

According to this post, the mystery has been solved. It appears to be just boring old Microsoft C with some custom OO style framework on top, and probably not a full-blown language compiler.

So much for LtU's dream of a state sponsored cyber warfare language.

the dream (nightmare?)

So much for LtU's dream of a state sponsored cyber warfare language.

That doesn't follow. This:

boring old Microsoft C with some custom OO style framework on top [presumably SOO]

could easily describe an intermediate language.

It's more like a dialect of C

It's more like a dialect of C without the benefit of extending the compiler. This is how we coded in the old days...and I'm not even that old.

re dialect of C and age

It's more like a dialect of C without the benefit of extending the compiler. This is how we coded in the old days...and I'm not even that old.

Believe me, I'm old enough and have been programming long enough to have similar memories.

My point was partly that absence of evidence is not evidence of absence: just because you can track down a conventional language compiler and library that led to the binary you are looking at doesn't mean that you aren't looking at code generated by a Sooper Sekrit doomsday cyberwar language (that happens to compile to C as intermediate language). It's also partly that if you wanted some handy OO features in your intermediate language, but really didn't want the full monty of C++ ... you might just take this approach (C plus a lightweight library). Of course, that only means that the binary is not inconsistent with the existence of a Sooper Sekrit Evil Language, not that it supports the hypothesis.

If another tool was pumping

If another tool was pumping out C as an intermediate language, there would still be an MO to detect, which I think the researchers are smart enough to take into account. E.g., what do cfront-compiled C++ binaries look like? You would first decompile into C and then work back from the C to C++. I believe they've decided that this is straight C + a particular library (macro or otherwise) that enables OO programming.

re: If another tool was pumping

If another tool was pumping out C as an intermediate language, there would still be an MO to detect

Where you say "would" I would say "might or might not". Especially if the adversary is deliberately trying to hide the (hypothesized) DSL.

You are definitely correct,

You are definitely correct: if they want to really hide their tracks, they should be able to do so. However, I doubt they would waste that many resources on this, especially to make the code "look" like it was coming from a fairly simple combination of tools.