Wide Scope Dead Code Analysis

I've been doing some research lately on the various compilation phases. I figured it's not a bad idea, given that I'm writing a compiler framework. One topic I've found interesting of late is dead code analysis. Within it, I've noticed there are two areas of interest:

  1. Control flow analysis to detect which branches are live and which are dead: that is, whether a given branch can be reached or not.
  2. Live variable analysis to detect which variables may potentially be read during one of the live branches within the code.

The question I have is: has there been any research related to wider-scope dead code analysis, which involves analyzing the entire program? That is, a liveness analysis done on the members of the program, based upon what delineates a member as live. In some cases, if the program is intended to be stand-alone, unused portions, even public ones, can be considered dead; in class libraries, accessibility modifiers delineate live status. Once the scope is defined, you can then analyze the internals of the program to determine which parts of the program are needed and which are not. The question then becomes: is the analysis worth the effort?
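The member-level liveness described above can be sketched as graph reachability. Below is a minimal illustration, assuming the program has already been reduced to a reference graph (each member mapping to the members it references); the member names and the `find_dead_members` helper are hypothetical. The roots follow the delineation above: just the entry point for a stand-alone program, every externally visible member for a class library.

```python
def find_dead_members(references, roots):
    """Worklist reachability: any member not reachable from a root is dead."""
    live = set()
    worklist = list(roots)
    while worklist:
        member = worklist.pop()
        if member in live:
            continue
        live.add(member)
        worklist.extend(references.get(member, ()))
    return set(references) - live

# Hypothetical reference graph for a small stand-alone program.
graph = {
    "Main.run":     {"Parser.parse", "Log.write"},
    "Parser.parse": {"Log.write"},
    "Log.write":    set(),
    "Debug.dump":   {"Log.write"},   # never referenced from a root
}

print(find_dead_members(graph, roots={"Main.run"}))  # → {'Debug.dump'}
```

Note that `Debug.dump` is dead even though it references a live member; liveness flows only from the roots outward, which is what makes the choice of roots (the "delineation") the interesting design decision.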


Insight welcome


RMA

You may want to have a look at Virgil, particularly its reachable members analysis.

Escape analysis and dead code analysis

Lots of dead code is only hypothetically dead rather than certainly dead: we can't say for sure unless we know a cell won't be updated reflectively. So the results of some static analyses, such as uninitialized-member detection, must be flagged at compile time and manually discarded by the programmer. The one exception would be escape analysis hoisting used members into registers for short-lived objects. In this way, escape analysis can eliminate certainly dead code simply by not hoisting it.

Compiler Analyses

Just a random thought: it seems to me that a sufficiently advanced version of any specific code analysis (like dead code analysis, control flow analysis, dataflow analysis, constant folding) always has to incorporate all the other analyses.

For example, dead code analysis:

if(b){ ... }

To determine whether ... is dead, you need to know the value of b, which requires dataflow analysis. If you had:

b = true | false

Then you'd need constant folding to determine the value of b. If you had:

if(false){ b = true } else { b = false }

Then dead code analysis is required to determine the value of b.

So all these analyses are interdependent.
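That interdependence can be shown in a toy sketch that drives all three analyses from one loop: evaluating `if(false)` is dead-branch elimination, evaluating `true | false` would go through the same constant folder, and knowing `b` at the later branch is dataflow analysis. The tuple-based statement representation and the function names here are all hypothetical, purely for illustration.

```python
def eval_const(expr, env):
    """Constant-fold an expression given known constant bindings (None = unknown)."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env.get(expr[1])
    if kind == "or":                                     # handles `true | false`
        left, right = eval_const(expr[1], env), eval_const(expr[2], env)
        return None if left is None or right is None else (left or right)
    return None

def analyze(stmts, env):
    """Return the live statements, tracking constants as a simple dataflow fact."""
    live = []
    for stmt in stmts:
        if stmt[0] == "assign":                          # ("assign", name, expr)
            env[stmt[1]] = eval_const(stmt[2], env)
            live.append(stmt)
        else:                                            # ("if", cond, then, else)
            cond = eval_const(stmt[1], env)
            if cond is True:
                live += analyze(stmt[2], env)            # else-branch is dead
            elif cond is False:
                live += analyze(stmt[3], env)            # then-branch is dead
            else:
                live.append(stmt)                        # unknown: keep both,
                for s in stmt[2] + stmt[3]:              # and forget any constants
                    if s[0] == "assign":                 # the branches may clobber
                        env[s[1]] = None
    return live

prog = [
    ("if", ("const", False),                     # dead code analysis...
        [("assign", "b", ("const", True))],
        [("assign", "b", ("const", False))]),    # ...determines b = False
    ("if", ("var", "b"),                         # dataflow: b is known False,
        [("assign", "x", ("const", 1))],         # so this branch is dead
        [("assign", "x", ("const", 2))]),
]
print(analyze(prog, {}))
# → [('assign', 'b', ('const', False)), ('assign', 'x', ('const', 2))]
```

Pruning the first `if` is what makes `b` a known constant, which is what lets the second `if` be pruned: neither analysis gets this result alone.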

Re: Compiler Analyses

So essentially I'd need to create a staged analysis. The project I'm writing aims to enable others to extend it, including areas such as code analysis. The primary goal in this, given that it's a language-independent compiler framework, is that it can't assume to know everything about a given language, but it can make basic assumptions based on known, common language elements. If a new construct is introduced, while it might ultimately be translated into simpler constructs that can be easily analyzed, there might be pre-translation analysis that needs to be carried out on it.

From what I can tell, dead code, control flow, data flow, constant folding, and so on would be individual stages, each as a part of the compiler context during the analysis phase (or phases, depending.)
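One way the staged arrangement could look as a sketch: each analysis is a pass registered on a shared compiler context with declared dependencies, so constant folding runs before the passes that need its results, and a language extension can register its own pre-translation passes without the framework knowing its constructs. All class and method names here are hypothetical, not part of any existing framework.

```python
class AnalysisContext:
    """Hypothetical pass manager: analyses share results through the context."""

    def __init__(self):
        self.results = {}                 # pass name -> result
        self.passes = []                  # (name, run, required pass names)

    def register(self, name, run, requires=()):
        self.passes.append((name, run, tuple(requires)))

    def run_all(self):
        """Run every pass once, in an order satisfying declared dependencies."""
        done, pending = set(), list(self.passes)
        while pending:
            ready = [p for p in pending if set(p[2]) <= done]
            if not ready:
                raise RuntimeError("circular pass dependencies")
            for name, run, _ in ready:
                self.results[name] = run(self)
                done.add(name)
            pending = [p for p in pending if p[0] not in done]

ctx = AnalysisContext()
# Registration order deliberately differs from dependency order.
ctx.register("dead-code", lambda c: "dead set", requires=("dataflow",))
ctx.register("dataflow", lambda c: "flow facts", requires=("constant-folding",))
ctx.register("constant-folding", lambda c: "folded tree")
ctx.run_all()
print(list(ctx.results))
# → ['constant-folding', 'dataflow', 'dead-code']
```

A real framework would likely re-run passes to a fixpoint (given the interdependence discussed above) and key results by the construct being analyzed, but the dependency-ordered context is the part a language designer would plug into.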

I have a lot of decisions to make prior to getting to that point, mostly related to how to structure the framework so that the analysis phase is exposed to a prospective language designer, letting them extend or alter the analysis. If anyone here has any suggestions on that, I'm listening. (I'd go into my own ideas, but since I'm learning this on my own with a trigonometry-level math background, no graph theory and so on, my descriptions would be very verbose and devoid of any symbolic shorthand.)

Not sure you'll end up with anything publishable

Sorry to say, but I think that dead code analysis is a pretty straightforward optimization, implemented (very, very) often. As far as global dead code analysis goes, the Wikipedia page you referenced mentions that it is even common, as the hint "even whole modules can become dead code" shows. I'm not sure it's particularly worth wasting an academic paper on it.

But then, guess anything is publishable, as all the monad papers show. (My "Carthago should be laid down to waste" pun ;-) )

Publish?

I'm not really looking to publish. I was more interested in other people's insight into the subject matter.

I'm aware that an entire assembly can be dead code provided nothing ever uses it, as is true of most academic projects that are finished and long forgotten because they're never executed; however, that's not really the focus of what I'm referring to.

It was more a question of whether such a feature should be implemented in language 'x', and assuming you do, what level of control should the user of language 'x' have over the 'dead' code and its removal from the application.

A typical example is the compiler-generated junk that is part of Visual Basic .NET binaries (I believe it's the 'My' namespace); having a proper ablative model in place for generated code would be useful in areas where that code is never used.

Ah, you mentioned research

Ah, you mentioned research, so I supposed you wanted to write on it.

Research

Research, with the aim of writing, is one thing; however, software that puts the concepts to use is more practical for me. I'm a hobbyist with an Associate's degree, so there are a lot of aspects that would be difficult for me to write on, since what I know has been built up over time. Writing a proper paper on it would be out of the question, since I don't know the proper source for a large portion of the things that I think that I know.