How To Read a Paper

Some of the papers mentioned recently are quite advanced, and require careful reading.

It occurred to me that not everyone here is adept at reading papers of this sort, and that some tips might be helpful.

So if you are one of those who constantly read papers, how about describing your reading habits?

Do you skim first? Do you follow the detailed proofs on first reading? Do you try to execute code?

Specifically, what would you suggest to someone who asks for help reading A Monadic Framework for Subcontinuations?

And no, "go back to school" isn't an acceptable answer...

Is it OK to skip the formal stuff?

Or, to rephrase the question, when is it OK to skip the formal stuff, and when - if you've skipped it - should you come back to it?

I tend to circle around difficult papers, putting the ones I can't make any headway with at all into the "too hard basket" and looking out for more basic treatments of the topic area. It's OK to retreat and regroup. There are a few papers from the "too hard basket" that I've managed to get through on about the third or fourth attempt. You keep cracking away at it.

Reading "the formal stuff" is a challenge in its own right - it takes time and concentration (and, I suspect, practice). It helps if the notation is familiar; which is to say, it helps to familiarise oneself with some of the more common forms of notation (set theory, logic etc.)


People should realize that notation is often 'abused' (that's a technical term...) A good tip is not to be terrified by strange and/or unfamiliar notation...

couple tips

I wrote a couple of tips on reading a formal semantics.

For a long time I tended to skip the formal stuff too readily. It can even be clearer than the prose, if it's using a formalism you're familiar with, anyway. I think the paradox of formal systems is that they have a higher barrier to entry, but once you know the language they're easier to understand because there are fewer opportunities for ambiguity.

But it's probably better to skip the formal stuff and read the prose than give up entirely. Often there are clues in the prose that can help decode the formal parts, and sometimes they occur after the formal parts. That's one reason it's important to read a paper iteratively ('scuse me, recursively).

"efficient reading"

Some university courses hand out this nice little PDF to their students - you might find it useful.

Re: "efficient reading"

Nice link.

Although surely there must be a guide to reading that handout, and a guide to reading that guide and so on. Why keep bashing one's head against a hard paper (looping) when one can recurse to a simple base case? :-)

Repetition is the parent of mastery

I once read that Richard Feynman, the noted physicist, recommended a practice that is somewhat like what I do myself.

He suggested that as soon as you get to a point in a paper or book where you feel out of your depth, go back and start reading from the beginning. The idea is that on the second time through, you notice things you had missed that help set up the part that was problematic. It also gives the key concepts a bit of time to sink in.

I find that even with papers that I think I understand the first time through, if the paper is good enough, it is worth re-reading at a later date for new insights. (I find this especially when I read classics like Wadler's monad papers.)

Another approach that I have found useful is the "paper detour": I find out where the notation or idea that is causing a problem comes from (typically some kind of math) and I go read some intro materials for that field, before coming back to the first paper. Almost all papers assume some prior knowledge that you have to acquire to really understand them in context.

The final piece of advice that I could give is somewhat obvious, but worth stating: don't read papers that don't really interest you. If you find that you are simply suffering, or that in your heart of hearts you don't REALLY have a burning curiosity for the subject matter, just set the paper aside and find something else that does fizz you. Insatiable curiosity is what fuels the intellectual leaps that a good paper calls upon you to make, and if you don't have it, why bother?

And who knows? The other thing you spend your time learning might stimulate the right questions to come back to this paper at a later date.

Hang-ups that count

Feynman also said that he was good at distinguishing between hang-ups that counted and those that didn't. Sometimes a hang-up can be glossed over without missing the real value of the paper. I can't say that I'm particularly good at distinguishing the two, but I try.

My calculus teacher also commented that textbooks are rarely read sequentially, and almost never read from cover to cover. I like to read a paper over once to try to get the big picture. Then, I go back and try to work out the details.

Trying to explain the ideas to others works miracles for your own understanding. Don't get hung up in the abstract or the introduction. Oftentimes I find the introduction makes more sense after I've read the rest of the paper... especially when discussing programming languages and techniques. If I'm somewhat familiar with the language, I almost always head straight to the examples, and then read about the consequences or implementation or whatever.

The most important thing is to be persistent and realize there are no rules, only suggestions.

Check that abstract and intro

If I'm sure I should be reading the paper, I will read it more than once. I like what Marc said above: repetition is the parent of mastery. When you know you are interested in the material and should be reading it for whatever reason, it's considerably easier to tackle a paper because you have determination on your side.

When you're not sure you're interested, that's when you need to do some filtering. I will often read the abstract and introduction. This tells me whether it is worth my time to plough my way through the whole thing. If the writing is poor in the intro, it is unlikely to get better in the main body. Even if your curiosity is aroused, warning flags should be going up. From my experience, the majority of citations occur in the introduction, so if the writing is looking ugly but your curiosity is piqued, poke through the references.

(Just an aside: scientific papers can be some of the worst writing you will come across. Even if the topic is dry, one should endeavour to at least bring the topic out. I guess that's awfully hard when you want people from other cultures to read it...)

Of course, if you read the abstract and you don't care what is being said, you're better off stopping. It's a complete waste of time to read an academic paper when what you're reading has little impact on you. Just come back to the discussion and read the comments. :)


It seems to me that PL papers use humour (e.g., language games) more often than papers in other fields. This can be quite confusing to newcomers to the field and non-native speakers, but it's worth it, of course...

More specific

I was thinking about advice that is more specifically related to programming language theory research papers.

Fractal reading

I do repetitions, but my first and sometimes even the second pass are not "sequential scans". On the first one, I read the abstract, study the table of contents (if available), and read the conclusion. I try to understand the place of the paper in the scientific landscape using the bibliography AND keywords (Google and CiteSeer help tremendously). If I notice what I call "hub" papers (having an unusually large number of citations) on the topic, I examine their abstracts and conclusions. When I suspect I see one of the most interesting papers in the niche, the motivation is so great that I actually read it through. If I cannot make it, because the material is too unfamiliar, I put it aside, and make a note in a separate document (a mix of "to read" and "to think" lists, configured as a start page for my web browser, newest items first). I considered using a blog for this, but I prefer free-form editing of local HTML.

Well, if I make it through a sequential reading to the end, and I enjoyed it, I make another note to re-read it, together with quick and wild ideas on how it might relate to other items on the list. An exercise for the reader: come up with associations between MetaKlaim, Hughes' arrows, xbattle, SQL, and futures (the obvious one - LtU - brings you 0 points :-) ).

The Answer Is...

Categories! The answer is always categories. Right? ;-)

This is how I do it

This is how I do it. After each step I decide if I really should continue with the next step, or just ditch the paper and read something else.

1. Quickly skim the abstract - to get an overview of what it is all about
2. Read the conclusion - to find out what the paper really wants to tell you
3. Really read the abstract - to find out how the paper comes to its conclusion
4. Read the TOC - to find out which chapters are worth reading
5. Read the chapters that are worth reading.

it's a language

I remember a couple years ago I was reading a paper and tearing my hair out for hours trying to figure out why a variable came out of nowhere on the bottom of an inference rule. Finally I gave up and asked a more senior student if I was going crazy or what. He just shrugged and said, "oh yeah, if a free variable shows up it's just implicitly existentially quantified." That's the kind of thing you can't just know. You have to be told.
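A generic example of the convention (this is the standard application rule, not the rule from the paper in question): in

```latex
\frac{\Gamma \vdash e_1 : \tau' \to \tau \qquad \Gamma \vdash e_2 : \tau'}
     {\Gamma \vdash e_1 \, e_2 : \tau}
```

the metavariable τ' appears with no binder anywhere in sight. Read bottom-up, the unstated convention is exactly the one above: the conclusion holds if there exists some τ' making both premises derivable.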

I've found that I've improved dramatically at reading papers over the course of my time in grad school, and it's probably because I've had to learn the language. There are so many things left implicit. We all know the best way to learn a language is by immersion--but as Ehud says, not everyone goes to grad school. So the second best thing is probably to find (friendly) online communities where you can ask questions.

I've also found that the idea of "it takes money to make money" applies: once you start to be able to understand a small corner of the research field, you can then read all the papers that relate to that, and so on [1]. It's just the getting started that's the hard part. For that you probably need a lot of practice and patience.

Also, remind yourself that these people don't have a monopoly on smart. Tell yourself this over and over if you have to. I spent years thinking "you're too dumb, you can't get this, you won't get this, it's just beyond your grasp." That's a load of crap. It's just a language, like any other.


[1] Mitch Wand refers to the ever-growing list of references worth pursuing as "The Book of Sand".

these people don't have a monopoly on smart

In fact, if you are really struggling with a paper, pause and consider.

Are they really that smart, or are they merely really bad authors? Think... have they defined their notation? Are there any examples? Have they been deliberately obtuse to inflate the importance of the result?

Consider laying out $300 for a textbook written like this. Would you feel like you made a good purchase, or that you had been ripped off?

This is part of why I gave up on physics: I found most of the papers had their difficulty artificially inflated. Once I had puzzled out the deepest mystery of it, I felt totally cheated. I _knew_ I could have rewritten most physics papers in vastly simpler and clearer language.

Working through logic

When I confront logic/math that causes me trouble, one trick I use is to assume one portion nullops. That is, take something and assume it evaluates to zero/factors out of the solution. Frequently it _can't_, but that's OK. Work through the rest of the problem and see what happens. What you choose to ignore is, of course, important, but let intuition guide you here - pick the "hard part" to ignore first, and work through the rest. Then reverse it: take the rest as given, and work through the hard part.

Just a trick for getting your hands dirty when approaching a forbidding looking problem.

It's the notation

I've often felt I might get further if I had some kind of notational glossary I could refer to, through which I could google or follow links into MathWorld or Wikipedia.

Ultimately though, a community like a mailing list, IRC, or a website I can ask naive questions of regarding more than just notation is really necessary for me. And I mean *really* naive, questions like "what's a vector?" (I do know this, but someone else might not). I don't need to be spoonfed answers, but I do need to know where to look.

The web was supposed to make so much material available to all of us, yet when I am faced with such dizzying notation and terminology, I am without exception told to simply go to school to learn it there. The phrase "barrier to entry" comes to mind for such an approach, even for those like me who can probably afford it monetarily ... I just have to rearrange my whole life, that's all.


Maybe the LtU readership should create such a glossary and add it to the FAQ.

Personalizing LtU

I sometimes find myself searching for papers which I want to reread.

Maybe it would be a nice addition for LtU to let people be able to build up a personal DB of papers/topics they want to remember? Would be great if personal notes could also be made...


Ever tried CiteULike? It's quite good for that purpose.

Which communities?

Ultimately though, a community like a mailing list, IRC, or a website I can ask naive questions of regarding more than just notation is really necessary for me.

Care to share which mailing lists and websites you ask your general programming language theory questions on?
LtU is a great place to discuss current research efforts, but it seems like too many questions would be inappropriate. A lot of times I can't think of the proper forum where I can ask on-topic questions. Here's an example ;-)

What's the difference between static type inference (like in Haskell) and a latent (tagged) type system (think Scheme) with a really good partial evaluator?
In the type-inference situation, we're using unification to solve a set of constraints. If there's no solution, we give up and say the program was ill-typed. On the other hand, it seems like tag-checking code in a latently typed language would be a good candidate to get partially evaluated away at compile time. For example it seems like...

;; typed values are really pairs of values and types
;; e.g. (123 . 'number)
;;      ("abc" . 'string)
(define (isnumber n) (eq? (cdr n) 'number))
(define (add arg1 arg2)
    (if (and (isnumber arg1) (isnumber arg2))
        (machine-add (car arg1) (car arg2))
        (error "not a number")))

(add '(4 . number) '(5 . number))

...should be easily changed into...

(machine-add 4 5)

... And after we're done partially evaluating our source, if we still have leftover type-check predicates like 'isnumber' in the code, then we again declare that the program is ill-typed. Are the two methods comparable in any way? Are there any papers available discussing this?

Like any place you ask a question.

Two things have to be in alignment: Knowledgeable people with overlapping interests.

I'm sure there are many Schemers that are interested in type tagging. And many of the more advanced Schemers are knowledgeable about HM type inference. The foundation for type inference in ML is fairly rigorous, with a few known tradeoffs. I don't think the understanding of how to do type inference in Scheme is nearly as tight.

Will partial reduction be enough to get you to Haskell-level type inference? I would doubt it, but then I'm not an expert. It will probably get you closer, but you will probably end up with some things in Scheme that cannot be known until run-time, depending on the complexity of the code-path decisions.

But at the very least, I think it would be quite interesting to see how far such a path at static type checking could lead.

I'm not sure I understand

Wouldn't the answer be unequivocally no? In a Turing-complete language, a static type system that guarantees the absence of type errors must necessarily reject some programs that don't have any errors present.

One can create useful systems that guarantee the presence of the errors they find, but not the absence. One can insert run-time checks when static typing fails to make guarantees. But these systems aren't really comparable to Haskell's type system.

This answer is too easy. I really feel like I'm missing something key here.
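To make that concrete, here is a minimal, hypothetical sketch (in Python, purely for illustration) of the kind of program at issue: it can never fail at run time, yet a sound static checker in the Haskell mold would have to reject it, because it must type every branch without running the program.

```python
def g():
    """Dynamically safe: the ill-typed branch is dead code."""
    if True:
        return 1 + 1          # the only branch that can ever run
    else:
        return 1 + "one"      # unreachable, but statically ill-typed

print(g())  # 2
```

A checker that guarantees the absence of type errors has to reject `g` (or fall back on a run-time check), even though no error can actually occur.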

Maybe, but...

Wouldn't the answer be unequivocally no?

I'm assuming the question is 'Is partial evaluation of type checking predicates the same as static typing?' The answer might be no (after all, I'm asking a question, not making a statement), but I don't think it's that easy a question to decide. Here's another stab at trying to explain what I'm getting at.

Take your program written in a safe but dynamically typed language. (Again, think of Scheme. We won't allow the language to violate its own abstractions [i.e. 'safe'] and we won't catch these errors until run-time ['dynamic'].) Then do an inline expansion of all the primitives in the code. This step will expose the hidden type checking predicates that normally get evaluated at run-time. Now take the expanded-out program and give it to your partial evaluator. A 'sufficiently good' partial evaluator should be able to "evaluate away" some of the type-check predicates (our program has now been specialized to work with only those particular types). Finally, look at the resulting source from the partial evaluator and count the number of type checking predicates still left in the code.

Here come the conjectures. If you have zero type-check predicates left, would the program have passed a static type checker? If you have one or more type-check predicates left, would the program have failed a static type checker? (In which case maybe we shouldn't allow the program to be run.) And the third option: does the partial evaluator halt? (Or does it run forever, instead of reporting something like an 'occurs check' or an infinite type, like a static type checker might?)
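As a hypothetical sketch of the procedure just described (in Python rather than Scheme, with invented names, just to make the steps concrete): tagged values are (value, tag) pairs, inlining the `add` primitive exposes its tag checks, and specializing on statically known tags leaves a residual program with no checks.

```python
# Tagged values are (value, tag) pairs, e.g. (4, 'number').
def is_number(v):
    return v[1] == 'number'

# The generic primitive, with its hidden run-time tag checks exposed.
def add(a, b):
    if is_number(a) and is_number(b):
        return (a[0] + b[0], 'number')   # machine add, then re-tag
    raise TypeError("not a number")

# What a partial evaluator could produce for the call site
# (add '(4 . number) '(5 . number)): both tags are statically known,
# so the checks evaluate away and only the machine add remains.
def add_specialized(x, y):
    return (x + y, 'number')

print(add((4, 'number'), (5, 'number')))   # (9, 'number')
print(add_specialized(4, 5))               # (9, 'number')
```

The conjecture then amounts to asking whether "no tag checks remain in the residual program" coincides with "the original program would have passed a static type checker".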

Good question

IMO, that would have been a reasonable question to ask as a new forum topic here, since it's about languages in general, type systems, and partial evaluation, as opposed to being a question about e.g. how to do something in a particular language.

To partially answer the question, there's definitely a close connection between types and partial evaluation. See e.g. the subthread here, which starts with a reference to the paper Types as Abstract Interpretations[*]. Replies to the linked post include a few other links, such as an LL post of mine in which I gave a high-level overview of the connection between types and abstract interpretation.

[*] The linked paper is a good example for the original subject of this thread. Those formulae are not light reading. This paper is in my "to be read when I retire" bin. I recall having seen simpler papers on this topic, but I don't recall specifics right now.


IMO, that would have been a reasonable question to ask as a new forum topic here, since it's about languages in general, type systems, and partial evaluation

I agree.


For more standard notation, one can find brief descriptions of it with Google fairly easily. I did this at one point for some logic notation. Also, most "introductory" papers/books/etc. in the area will describe the notation. That said, CS/PL papers seem to be some of the worst papers I've read for inventing notation.

Write a program

When I really want to understand a paper I try to write a program which I can use to test the ideas in the paper.

Sometimes this is of course not feasible, for example when it describes a new hack in an already advanced garbage collector. But there are times when you can turn parts of the paper into a program. This is of course even easier if the paper is about a programming technique or a particular library. Once you have a program, you have so many more possibilities to explore the ideas.

There is also the fact that you cannot cheat when implementing something. If there is a concept you haven't quite grasped, it will immediately hit you on the head when you try to implement it. And you will immediately spot corners where the authors of the paper weren't all that detailed, or were even wrong.

Rewrite the examples

With some papers I've found it helpful to type in the functions and play around with them. With the Monadic Subcontinuations paper, I ended up reimplementing the library as described in the paper using GADTs instead of coercion functions, switched the monads to monad transformers, and abstracted the interface into a typeclass.

I found the typeclass particularly helpful, because it lets you focus on the essential bits.

class Monad m => MonadCC p sk m | m -> p sk where
    newPrompt   :: m (p a)
    pushPrompt  :: p a -> m a -> m a
    withSubCont :: p b -> (sk a b -> m b) -> m a
    pushSubCont :: sk a b -> m a -> m b

Depends on the purpose

I read differently depending on why I want to read the paper. And if I don't want to read it at all, I don't...

If I'm just looking for a conceptual understanding, I'll read the prose and skip all the code and formal proofs. I miss out on a lot, but I wouldn't remember them anyway, so why bother?

If I'm trying to implement something, I skip to the algorithm section and translate whatever code/pseudocode/description is given there to My Favorite Programming Language. Presumably I've already read it for conceptual understanding. Hopefully that's enough information to get something that works; if not, I'll start on the proofs to understand what's going on.

If I find the algorithm isn't suitable, then I'll read the proofs, because they usually provide insight into how the algorithm was constructed. I can change different assumptions and see if the conclusions still hold. That usually gives some hints as to what aspects need to be modified to fit the algorithm to my particular circumstances.

When I'm really having trouble

When I'm really having trouble getting some paper, or some part of a paper, I invariably dive into the references to broaden my understanding of the subject area. When, after that, I still don't get it, I start skimming CiteSeer to find yet more related papers.

Practice makes perfect!! Not only in reading papers but in any knowledge area! The best advice I have ever gotten is: "Do the exercises". I don't do all of them, and most of the time I don't even fully do the math (sketches), but working my way through TAPL actually trying to solve the exercises was the best thing that ever happened to my formal thinking. Setting up proofs for myself really was an eye opener, and made it easier to understand other proofs and formal reasoning and, by extension, comp. sci. papers.

First, people should learn how to write them!

Rule 1.

No matter what you think, what your supervisor thinks,
no matter what the journal editor and reviewer think...

Write to be understood by someone who knows a lot less about the field than you do. If you do, you will be read, understood and cited. If you don't, your beloved paper will moulder unread and unloved.

Rule 2.

There is _no_ standard notation. You may think there is but there isn't. More confusion arises from inconsistent use of notation between papers than almost anything else. Make _explicit_ what _your_ notation means.

Rule 3.

Read it out loud. I'm serious. Read it _aloud_ to your dog or something.

There is a strong evolutionary link between how long a sentence we have breath for, and what our minds can comprehend.

If you run out of breath by the end of the sentence, your readers' minds will be going ga-ga.

Rule 4.

Add a simple example or two. Nothing makes theory comprehensible faster than an example. Show, don't tell. Show the example first, and then give the theory behind it.

Rule 5.

Make time to make it simple. If you haven't invested significant effort in making it simple, don't be surprised if it is too complex.

Rule 6.

A picture is worth a thousand words.

Rule 7.

Tell them what you're going to tell them, tell them, then tell them what you told them.

Or better: tell them the bottom line and why it matters, then derive the result. Give a summary that highlights the main points.

More rules on how to write

I agree fully with John Carter's rules. Here are some more rules that I think are very important (some of them overlap with John's rules):

(1) The pyramid rule. The most important sentence should be the first; the most important paragraph should be the first; the most important section should be the introduction. This should be applied recursively for each paragraph and section.

(2) The give it away rule. Explain your results up front, instead of slowly working up to them. This will let readers use their time and effort efficiently. If the results are interesting, it motivates people to read the whole paper.

(3) The truth rule. Every sentence you say should be an independent truth. It should not be false, supposedly to make it "easier to understand", since it will have the opposite effect: it will only confuse the heck out of a reader trying to really understand what you are saying. True sentences are pillars of stability. [Logic programmers will see that this rule is inspired by good Prolog programming style.]

(4) The simplicity rule. This is the hardest one of all to follow, but the most worthwhile. If you find that your explanation is too complicated, it means that you do not understand the material well enough. If you really understand something well, then you should be able to explain it simply. This holds for *everything*, even things that seem very complicated. The Feynman lectures are a good example of this rule.

Rules (3) and (4) may seem contradictory, but they are not. It just takes a lot of thought to satisfy both at the same time! Needless to say, I think that this effort is really worth it.

Exception to every rule

(3) The truth rule. Every sentence you say should be an independent truth.

You just have to relax this a little bit if you are going to argue reductio ad absurdum.

Just a little bit. :)

No exception! Use constructive logic

You just have to relax this a little bit if you are going to argue reductio ad absurdum.

There is no need to relax the truth rule. Use constructive logic instead!


No, use trivial logic

As every statement must be an independent truth, then only axioms can be used as statements. Any proof longer than one statement is forbidden by this "independence" clause, no?

My 2 cents...ahem...points

I think it is worthwhile to keep these two points about research papers in mind:

  1. Target audience: experts in the area who want a focused presentation of the novel aspects of the presented work.
  2. Examples, examples, examples: Nobody understands or invents formal systems just by looking at them.

Re the target audience: research papers (non-journal) are optimised to communicate isolated results to fellow researchers. Authors also have to obey strict page limits. So they'll use established language and formal systems with little explanation, and they have limited space for examples. If you have trouble reading a paper, make sure you speak the right language. Most formal systems build on simpler previous ones; understand those first.

Re examples: even experts need to work through examples to understand a formal system. A good paper will provide you with good ones; otherwise, you need to invent some yourself. Don't just "read" the examples; run them through any formal systems and/or execute them.

Not surprisingly, these two points are also good to keep in mind when writing papers:

  • Know your audience. Favour brevity and use established language and notation.
  • Pick good examples. Don't jump straight into the greek. Motivate and explain core ideas with examples first. That helps experts and non-experts alike.

    Now apply them..

    Great tips everyone. I am sure many will appreciate the advice you all gave. But applying these suggestions isn't as straightforward as it might seem.

    How about explaining how to apply your advice to one of the papers currently on the home page (like I said, I think the monadic subcontinuation paper is a good candidate).

    How would you identify the fundamental prior results needed in order to understand the paper's contribution? To which sections would you give the most attention? Etc.

    Paper Tiger or Tiger Paper?

    like I said, I think the monadic subcontinuation paper is a good candidate

    The first thing to observe is that this is a fairly advanced paper. To have any chance of understanding the paper you need to have understood concepts that pre-suppose that you already know how to read papers. ;-)

    But if you think you really want to be able to understand this paper, someday at least, if you can read the title and pick out the references in the intro you are already on your way.

    What is a subcontinuation? To understand this you need to find out what a continuation is. The paper kindly offers the Strachey, et al. paper, but you could also use Google or Citeseer or your local university library to find something else.

    Once you master continuations (or if you're bold and skip right to subcontinuations) track down some of the papers referenced in the intro regarding subcontinuations, such as the Hieb et al. paper.

    Continuations and subcontinuations should keep you busy for a while. ;-)

    If you are still keen, the next question is "what is a monad?". Experience with threads here on LtU suggests that finding this answer will also take some time. ;-) The paper is kind enough to point you to relevant Wadler papers, which is a good start.

    Once you have all this background, you are probably in a position to understand what good a monadic treatment of subcontinuations will do for you, and you probably know enough to read this paper.

    If you run into trouble, reread one of the sources you consulted earlier to see if you can shore up your understanding. If you got this far, you are probably almost there.

    If you run into trouble with any of the immediate sub-papers referred to from here, be a good PLT student and use this process recursively. ;-)