From Writing and Analysis to the Repository: Taking the Scholars' Perspective on Scholarly Archiving

Marshall, C.C. From Writing and Analysis to the Repository: Taking the Scholars' Perspective on Scholarly Archiving. Proceedings of JCDL'08

This paper reports the results of a qualitative field study of the scholarly writing, collaboration, information management, and long-term archiving practices of researchers in five related subdisciplines. The study focuses on the kinds of artifacts the researchers create in the process of writing a paper, how they exchange and store materials over the short term, how they handle references and bibliographic resources, and the strategies they use to guarantee the long term safety of their scholarly materials.

Not directly programming language related, but two things make this paper relevant. First, many of the tools involved, especially those that really enhance productivity, are language-based or include DSLs (e.g., LaTeX, BibTeX, R (+Sweave), etc.). Second, many of us write papers, and as language geeks we surely crave great tools...

So, what is your ideal tool chest when it comes to doing and publishing research? And what do you actually use every day?

my everyday tools

LaTeX, TeXShop to edit papers, EndNote to pull references from online databases, JabRef to convert references to BibTeX, GnuPlot for simple figures, Perl for data processing, Eclipse for coding.

I'm not in Computer Science, but close...

Field: theoretical physics (hydrodynamics)

For bibliography generation and article-capture management: Zotero; ease of use is fantastic and it exports nicely to BibTeX.

For writing: LyX. It provides a power tool for writing and organization and 90% of the formatting I need. When I need the other 10%, it lets me put in raw LaTeX to get the job done. Best of both worlds.

For calculation: NumPy+C for the non-time-sensitive stuff, C++ and BLAS/LAPACK for the more painful stuff. I've tried to find ways to use FP languages, but the speed of C/C++ and their easy interaction with MPI are just too compelling.

For graphing/visualization: Matplotlib/Python for 2D and most graphs; for nasty stuff, VTK/Python and maybe Mayavi. Unfortunately, I'm having a fit getting Mayavi and VTK working properly on my work Mac.

For diagrams: Inkscape.

For revision control: Mercurial. I love Darcs, but found it a bit more of a hassle to put into all my different environments.

For build management: SCons. WOW, what a difference it makes having your build environment in a proper language like Python, so you can add targets algorithmically, extend the tool as you wish, etc. There's something really satisfying about being able to say "scons -u all_plots" and walk away, knowing that overnight tens of simulations will be run, saved, plotted, converted into movies, and ready for your perusal in the morning.
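
The appeal of describing a build in a general-purpose language can be sketched in plain Python. This is not SCons's API: `make_targets`, `build_order`, the `solver` and `plot.py` commands, and the file names are all hypothetical. The point is only that targets come out of a loop over parameters instead of being written out by hand.

```python
from itertools import product


def make_targets(viscosities, resolutions):
    """Return {target: (sources, command)} for a hypothetical parameter sweep."""
    targets = {}
    for nu, n in product(viscosities, resolutions):
        stem = f"run_nu{nu}_n{n}"
        data, plot = f"{stem}.dat", f"{stem}.png"
        # Each simulation output depends on the (hypothetical) solver binary.
        targets[data] = (["solver"], f"./solver --nu {nu} --n {n} > {data}")
        # Each plot depends on the data file produced above.
        targets[plot] = ([data], f"python plot.py {data} {plot}")
    # An aggregate target, analogous in spirit to "all_plots".
    plots = [t for t in targets if t.endswith(".png")]
    targets["all_plots"] = (plots, None)
    return targets


def build_order(targets, goal):
    """List the targets needed for `goal`, dependencies first."""
    order, seen = [], set()

    def visit(name):
        if name in seen or name not in targets:
            return  # already handled, or a plain source file
        seen.add(name)
        for dep in targets[name][0]:
            visit(dep)
        order.append(name)

    visit(goal)
    return order


if __name__ == "__main__":
    targets = make_targets([0.01, 0.1], [64, 128])
    for name in build_order(targets, "all_plots"):
        print(name, "<-", targets[name][0])
```

Adding a new viscosity or resolution to the sweep adds the matching data and plot targets without touching anything else, which is roughly the convenience being praised here.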

But for the basic writing stack: LyX, Inkscape, Zotero, and LaTeX under it all. What a winning combination!

No Matlab?!

NO. No Matlab. And I say this having used Matlab in the past and taught a semester-long course on computational physics using Matlab, so I'm familiar with the tool.

In all meaningful ways that I can think of, NumPy/Matplotlib is a better solution. Python is a better language in general than Matlab, the matrix extensions provided by NumPy are fast and concise, being able to leverage the vast amount of Python software out there is a massive advantage, programs are very easily extensible in C and Fortran (even to the extent of embedding the C code right in the Python files), and it's all free, open software. Matplotlib is a fantastic plotting library with similar syntax to Matlab if you wish, and you can drive everything with a proper build tool like SCons.
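
As a small illustration of the conciseness claim, here is a minimal NumPy sketch. The toy tridiagonal system and the variable names are assumptions made up for this example, not anything from the poster's own code.

```python
import numpy as np

# Assumed toy problem: a small symmetric tridiagonal system A x = b.
n = 5
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

# Solve the linear system in a single call.
x = np.linalg.solve(A, b)

# Whole-array expressions replace explicit loops over indices.
residual = A @ x - b
print("x =", x)
print("max |residual| =", np.abs(residual).max())
```

The matrix is built, solved, and checked in a handful of lines with no index loops, much as one would write it in Matlab.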

I would never do another numerical project in Matlab unless absolutely forced.

Language Research

I'd split writing papers into the doing-part and the writing-part. Although they often happen concurrently, the tools tend to be different.

Doing: Compiler / Interpreter for the source language. Changes depending on the experiment being done, but normally it's Prolog / Haskell / C or Python. Generally there is an inner language being written for most experiments, so an interpreter is preferable to a compiler a lot of the time. Once data is generated it gets fed into gnuplot / graphviz / maple, or increasingly Sage. Make and bash are essential as glue, and everything either lives in subversion, or can be generated from a file in subversion.

Writing: Vim / pdflatex / gv.

One note that I'd make about languages that are useful for research is not about the semantics of the language; it's more about the presentation. I've always found that interpreters are easier to work in than compilers for active research. Recently I've noticed that worksheet layouts are much more productive than the normal interpreter command line. Sage is a really good example of how this can work. Fusing the editor into the interpreter is one (small) aspect, but the ability to break problems up into spatial modules and preserve lots of interpreted results, rather than just the latest, is a huge boost.


If you're on a Mac, Papers looks to be super nice, at least for organizing papers and searching various repositories.

Tools with latex

  1. Pdflatex produces better pdfs to work with than the old-fashioned latex->dvipdf toolchain, for a number of reasons. Understanding PDF is well worth the effort, as readers of MIT's PDF faq can guess;
  2. I'm very happy with metapost and pdftricks for describing pictures and manipulating text. See Imported graphics in PDFLaTeX;
  3. One area where the latex world lags behind the MS Word world is in support for collaborative editing, particularly Word's Track Changes feature (not that it is perfect, by any stretch). However, there are useful tools: latexdiff produces a latex document that displays the changes between two other documents, and the LyX team have been putting together Track Changes-like functionality.
  4. Syntax- and reference-aware editing makes a big difference to working with latex. We fortunate emacs users have the wonderful AUCTeX package; the vim-latex suite and the auctex.vim plugin provide approximations for vim users. The graphical LyX has its fans, although I did not find it to my liking when I tried it out.
  5. My doctoral thesis outgrew the limitations of bibtex; suffice it to say that bibtex's representation of bibliographic items does not permit all of the rules of the Chicago Manual of Style to be followed. Instead I hand-edit "thebibliography" environments. Bibtex is excellent in principle, but it is crying out for a successor.
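
On the change-tracking point in item 3: latexdiff marks changes up inside the latex source itself. As a much cruder stand-in that needs only Python's standard library, a line-level diff of two drafts can be produced with `difflib`. The draft contents and file names below are made up for illustration, and this is not how latexdiff works internally.

```python
import difflib

# Two hypothetical versions of the same short document.
old_draft = """\\section{Introduction}
We study scholarly archiving practices.
Results are preliminary.
""".splitlines(keepends=True)

new_draft = """\\section{Introduction}
We study scholarly writing and archiving practices.
Results are preliminary.
""".splitlines(keepends=True)

# unified_diff yields header lines, then lines prefixed with ' ', '-' or '+'.
diff = list(difflib.unified_diff(old_draft, new_draft,
                                 fromfile="draft_v1.tex",
                                 tofile="draft_v2.tex"))
print("".join(diff), end="")
```

A plain diff like this treats latex as ordinary text, so a reflowed paragraph shows up as a wholesale change; a latex-aware tool earns its keep in exactly that case.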

So, any general conclusions? (Aside from a general dislike for MS Word...)