organizing papers...

Related to the recent "how to read papers" thread...

How do you guys organize your papers? I have a file naming scheme that's a big pain to manage and not terribly useful. I guess what I want is something like iPapers, but less tied to PubMed and cross-platform. (At least I need it to run on my Linux workstation, and hopefully on my Mac laptop...) And I guess in my dirtiest fantasies it would maybe integrate with Citeseer and CiteULike, but that's not so important to me, actually.

I'm sure semantic web types will tell me that I really want a general purpose RDF browser or something to manage general meta-data, but I'd be happy with something less general and more tailored to this domain.

Anyway, does anybody out there have a tool (or just an organizational discipline) that they're happy with? Satisfy my curiosity...

organizing papers

I started a directory long ago for holding stuff I downloaded. I called it 'pie' so that no one would know that I was wasting time downloading research papers (as if).

I have one huge directory containing programming language papers, and a few others for other things (math, graphics, psychology, biology, etc.). Inside the huge directory are subdirectories for each topic. Since I haven't bothered to organize them at all, there are way too many and there are probably duplications.

Each one can contain a directory called 'printed', which holds the ones I've printed, in the hopes that I don't print out something I have already printed, and waste paper.

The funny thing is that I almost never go back to this repository, even though I've only read a portion of the papers in it. I think I just collect them obsessively -- you know, in case I'm stranded on a desert island with nothing but a computer and a power outlet.

It's kind of funny to look at the list now and see how haphazard the categorization is.

I wish I had written a short synopsis of each paper I read; it would be a useful web site by now. Hmm -- maybe this is something that LtU should consider having: a collection of summaries.

abstractinterpretation
adaptivity
algebraic
algol
analysis
ancient
arrows
atomicity
binding
biological
boxing
bsp
burstall
categorytheory
classic
combinators
compilation
concurrency
continuations
coroutines
co-stuff
curiosities
cute
datastructures
deforestation
defunctionalization
denotational
dependenttypes
derivative
design
discriminated_union
distributed
domaintheory
dsl
dynamicscoping
dynamictypes
exceptions
extensible
filters
flowanalysis
fold
functional
functionaldependencies
functionalhttp
gadt
garbagecollection
generative
genericity
gofer
graph.reduction
gui
haskell
henrybaker
hindley-milner
imperative
incremental
iterators
johnhughes
lambdacalculus
lazy
lecturenotes
lenses
linearity
listlessness
lockstep
logicprogramming
lowlevel
macros
media
metacircular
modules
monads
multistage
namespaces
objects
optimization
overloading
parallelism
parsing
partial.evaluation
partial.types
patternmatching
persistence
persistent.threads
picalculus
polymorphism
polytypic
primitives
python
records
recursion
reflection
relational
ruby
scheme
shape
shells
simulation
softtyping
strictness
subtyping
supercompilation
systemf
tags
testing
theory
threads
toocloseforcomfort
trampolines
transactions
typeclasses
typeinference
types
undynamicking
unification
userdirectedoptimization
v
variant
virtual.machines
wadler
www
xml
ycombinator

Organization

I started a directory long ago for holding stuff I downloaded. I called it 'pie' so that no one would know that I was wasting time downloading research papers (as if).

So instead they think you are wasting your time downloading recipes. :P


Originally my papers were in a giant pile. I could usually still find my way around, as many came from CiteSeer and had CiteSeer names (e.g. the wadler98monads example below). Further, I could use the date that the file was created to see which I'd gotten most recently or to get in the general area where the paper would be.

However, at one point, while reading papers on database implementation (particularly full-text databases), I decided to do something I had long wanted to do: build a full-text database and put my papers in it. After I made the first indices it was interesting to search for terms and see old papers that I hadn't read in a while, but mostly I haven't used it extensively.
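For anyone curious what the core of such a full-text index looks like, here is a minimal sketch of an inverted index in Python. It is not the database described above, just an illustration. It assumes the text has already been extracted from the PDF/PS files by some other tool, and the document names and contents below are hypothetical placeholders.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it.

    `documents` is a dict of {name: plain text}; extracting that text from
    PDF/PS files is assumed to happen elsewhere.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical placeholder documents standing in for extracted paper text.
papers = {
    "example-a": "lazy evaluation and monads",
    "example-b": "monads and arrows",
}
index = build_index(papers)
print(sorted(search(index, "lazy monads")))
```

A real setup would also want ranking and persistence, which this sketch leaves out.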

I print them, annotate them,

I print them, annotate them, and file them.

I'd like to say that I use CiteULike (http://citeulike.org), but my discovery of that site coincided with my ceasing to read academic papers.

Eating my own dogfood

Anyway, does anybody out there have a tool (or just an organizational discipline) that they're happy with?

I wrote a simple document storage and cataloguing application for a completely different purpose a few years ago, and I use that to store my papers.

Unfortunately, I download them much faster than I read them or catalogue them, so this approach doesn't work as well as I would like. ;-)

Still, it beats sifting through hundreds of cryptically named pdf/ps files looking for that Wadler paper I wanted to reread.

citeulike

One of the most liberating experiences of my life was deleting all the papers from my hard drive. That said, I haven't used CiteULike long enough to see how it fares in the long haul. But analogous experiences (e.g., Google vs. bookmarks) bode well for the "labels" approach.

Same here

I used a naming scheme like Frank's, say "vazirani86unique.ps", and a script to generate one big HTML file from the articles. If a simple text file with the same name existed ("vazirani86unique.txt"), the script would look in that file for the title, abstract and annotations. It got beyond the point where it was easily manageable, though.

I didn't move the stuff to my new laptop though, and I'm pretty happy about that. Nowadays, if I want to look something up, I just Google for it.
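A script along the lines of the HTML generator described above could be sketched roughly as follows. This is not the original script. The layout of the sidecar .txt file (first line is the title, the rest is abstract and annotations) and the set of recognized extensions are assumptions made for illustration.

```python
import html
from pathlib import Path

PAPER_SUFFIXES = {".ps", ".pdf", ".dvi"}  # assumed set of paper formats


def describe(paper):
    """Return (title, notes) for a paper, using a sidecar .txt if present.

    Assumed sidecar layout: the first line is the title and the rest is
    free-form abstract/annotations. Without a sidecar, the file stem is
    used as the title.
    """
    sidecar = paper.with_suffix(".txt")
    if sidecar.is_file():
        lines = sidecar.read_text(encoding="utf-8").splitlines()
        title = lines[0].strip() if lines else ""
        notes = "\n".join(lines[1:]).strip()
        return title or paper.stem, notes
    return paper.stem, ""


def build_html(directory):
    """Render one HTML page listing every paper in the directory."""
    parts = ["<html><body><h1>Papers</h1>"]
    for paper in sorted(Path(directory).iterdir()):
        if paper.suffix.lower() not in PAPER_SUFFIXES:
            continue  # skips the sidecar .txt files themselves
        title, notes = describe(paper)
        link = html.escape(paper.name)
        parts.append(f'<h2><a href="{link}">{html.escape(title)}</a></h2>')
        if notes:
            parts.append(f"<pre>{html.escape(notes)}</pre>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Example use (the directory name "Papers" is hypothetical):
# Path("index.html").write_text(build_html("Papers"), encoding="utf-8")
```

Regenerating the whole page on every run keeps the script stateless, which is probably fine for a few hundred files.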

My method

After I download a paper, I rename the file to something like "wadler98prettier", where the first part is the author's last name, the second the year of publication, and the last an abbreviation of the title. I store all my papers in one big directory Papers; or rather, I move them there after I've read them.

Concerning the filename: if there are multiple authors, I pick the one I read the most, or the most well-known: basically, I choose the one I know I will tend to associate most closely with the paper if I have to search for it. If it's a preprint or unpublished or has an unknown (to me) pubdate, I use a dash instead of the year. For the title, I choose some memory-jogging key words from the title. Again, the idea is that it reminds me of the paper; it needn't be accurate or mnemonic for someone else.

Also, I try to keep it short enough to double as a BibTeX key, and long enough to avoid collisions.
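To make the scheme concrete, here is a small Python sketch that splits such a name back into its parts. The exact pattern is an assumption: a lowercase last name, then a two- or four-digit year or a dash, then the key words. The function name `parse_key` is hypothetical.

```python
import re

# Assumed shape: lastname + (4-digit year | 2-digit year | "-") + key words.
KEY_PATTERN = re.compile(
    r"^(?P<author>[a-z]+)(?P<year>\d{4}|\d{2}|-)(?P<words>[a-z][a-z0-9]*)$"
)


def parse_key(filename):
    """Split a name like 'wadler98prettier.ps.gz' into author, year, words.

    Returns None if the name does not follow the scheme. A dash in the
    year position (unknown publication date) yields year=None.
    """
    stem = filename.split(".", 1)[0].lower()  # drop every extension
    match = KEY_PATTERN.match(stem)
    if match is None:
        return None
    year = match.group("year")
    return {
        "author": match.group("author"),
        "year": None if year == "-" else year,
        "words": match.group("words"),
    }


print(parse_key("wadler98prettier.ps.gz"))
print(parse_key("wadler-prettier.pdf"))
```

Because the stem doubles as a BibTeX key, the same function could be used to check that every file in the directory follows the scheme.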

I also have a BibTeX file in my Papers directory, managed using BibDesk, which at some point I hope to update so it indexes all the papers there, but frankly I don't think that will ever happen.

I don't use a by-subject classification scheme. One reason is that I never go browsing through the papers by subject; instead, I remember a paper I need to check, and then maybe a couple other papers that it references, so the author-year-title scheme works better. Another reason is that any directory-based (i.e., hierarchical) classification scheme is going to be inadequate anyway, and of little use to me.

Oh, and I gzip all the PS files, since GhostView can read them that way, but not the DVI files, since Xdvi can't, and anyway they're pretty small. PDFs I keep as is, since nowadays they're usually compressed internally.
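The compression policy above (gzip the PS files, leave DVI and PDF alone) could be automated with something like the following Python sketch. The function name `gzip_postscript` is hypothetical, and the choice to delete the uncompressed original afterwards is an assumption.

```python
import gzip
import shutil
from pathlib import Path


def gzip_postscript(directory):
    """Compress every .ps file in the directory to .ps.gz.

    DVI and PDF files are left untouched. The uncompressed original is
    removed after a successful write (an assumption of this sketch).
    Returns the names of the archives that were created.
    """
    created = []
    for source in sorted(Path(directory).glob("*.ps")):
        target = source.with_name(source.name + ".gz")
        if target.exists():
            continue  # never overwrite an existing archive
        with source.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        source.unlink()
        created.append(target.name)
    return created
```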

Yeah, that's pretty much it...

That's almost exactly the method I use, right down to gzipping PS files... My naming scheme is a bit more cumbersome, but it's pretty much the same idea. I agree that no simple hierarchical scheme is much better than just a big directory of well-named files, but I'd like to have access to the full titles and full author info, for instance, and I'd really like to be able to do tagging or something similar, sort by different attributes, etc.

I know it's basically your standard metadata management problem, but unfortunately I don't know of a good standard solution. Maybe I should actually look at RDF editor/browsers...

spotlight/desktop search

Now that I have Spotlight on my Mac (desktop search), I've uncategorized all my documents. Everything's in one directory and the filenames are typically the titles, abbreviated. The only reason for subdirectories is to keep related files together (chapters of a book, pages from a website, lecture series, etc.).

Also, for every file, I try to have a corresponding .txt file that contains notes I've made about that article. I treat it like a log: never overwritten, always to be appended to. It is instructive to see how one's understanding has changed when you revisit a paper.

This approach frees me from having to give it a structure, in terms of filenames, directory hierarchies or RDF. Free-form notes and good text indexing work really well.
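The append-only notes file described above is easy to script. Here is a minimal Python sketch. The date-stamped entry format and the function name `append_note` are assumptions, and the paper path in the usage comment is hypothetical.

```python
from datetime import date
from pathlib import Path


def append_note(paper_path, text, today=None):
    """Append a dated entry to the paper's sidecar .txt file.

    The file is opened in append mode, so earlier entries are never
    overwritten. `today` can be passed explicitly, mainly for testing.
    """
    notes = Path(paper_path).with_suffix(".txt")
    stamp = (today or date.today()).isoformat()
    with notes.open("a", encoding="utf-8") as handle:
        handle.write(f"[{stamp}] {text.strip()}\n")
    return notes


# Example use (hypothetical path):
# append_note("Papers/some-paper.pdf", "Second read: section 3 makes sense now.")
```

Since the notes are plain text next to the paper, a desktop search tool should pick them up without any further setup.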

Spotlight saved me

Now that I have Spotlight on my Mac (desktop search), I've uncategorized all my documents. Everything's in one directory and the filenames are typically the titles, abbreviated.

Same here. When I pitched all my old equipment and bought my PowerBook and Tiger, Spotlight saved me a lot of trouble. The titles are good enough for a filename and I just search for terms, assuming the words in the PDF can be searched.

While this is basically a "me too" post, I just can't praise Spotlight enough.

Also Completely OT

Geoff Wozniak: While this is basically a "me too" post, I just can't praise Spotlight enough.

The only thing I have to add to this is that the Mac users might also appreciate VoodooPad vs. a text file for notes. I'm using it to record my efforts in working through TAPL, for example, and also to keep notes on the various papers I'm reading.

Of course, if you don't mind giving up the nice UI, integration with Address Book, etc. there are plenty of nice serverless desktop wikis these days. My personal favorite is TiddlyWiki.

I agree

I agree. Desktop search is nice. I'm using Copernic on Windows. Is there a Linux equivalent?

JabRef

Today I stumbled across another BibTeX/organization tool: JabRef. It can launch PDF and PS viewers, and search CiteSeer. It seems to work a lot like BibDesk, but BibDesk is a Mac (Cocoa) app whereas JabRef is a Java (cross-platform) app.