Who owns your research? Results of SIGPLAN Open Access survey

SIGPLAN is the ACM Special Interest Group (SIG) that focuses on Programming LANguages. It runs many of the field's academic conferences (ICFP, PLDI, POPL, SPLASH...) and its elected members are recognized researchers in the field (the chair and vice-chair are Jan Vitek and Jeremy Gibbons). It recently ran a survey on Open Access, asking for respondents' opinions on Open Access at large, Green vs. Gold Open Access, archival strategies, and indirectly related questions such as "conferences vs. journals".

The results of the survey are summarized and presented here: Who Owns Your Research: A Survey.

A harsh summary:

  • Everyone knows Open Access is the right way to publish research.
  • Seniors and Americans care about the ACM more than Juniors and Europeans.
  • Many people would "just use arXiv" (but then, why are so few of our field's articles made available on arXiv?)
  • Most do not care about Gold OA (where authors pay the publisher to make the article publicly available)
  • There is no consensus on "conferences versus journals"

Most do not care about Gold

Most do not care about Gold OA (where authors pay the publisher to make the article publicly available)

That's too hasty a conclusion, because the survey's question on this point was badly framed.

The arXiv is incredibly cheap to run, and the cost of archiving papers on it (Cornell has reported $6/paper) would essentially be lost in the noise of conference fees -- i.e., it would be <1% of conference registration costs. However, the survey asked about charges ($100/article at a *minimum*) which are wildly disproportionate to the true cost, so I don't think you can take resistance to those price levels as indicating opposition to Gold OA.

Jan said something about the

Jan said something about the conferences easily covering their publishing costs with registrations, at $1000 a pop! I doubt Gold OA makes sense for a conference that already costs that much, and is pretty much ALREADY pay to play.

So the question, then, is journals. A journal could cost $1 million to read and $1 million per paper to publish, and it would not affect the number of journal papers I want to read or publish at all (since I don't see the point even when they're free).

The big problem is for researchers in small countries

Basically, the funding agencies in smaller countries typically have grants reviewed by people from multiple scientific disciplines, but since outsiders usually don't understand the unusual role of journals in CS, grant applications in CS end up facing a completely artificial extra hurdle. If we created the Journal of POPL or the Transactions of PLDI, which were conference proceedings treated as journal issues for bibliometric purposes, then the biggest problem with the conference/journal split would go away.

If you wanted to tinker with the model a bit, and do something like VLDB, I would be in favor of that, but the high order bit is *literally* just the name. This problem is so utterly inane that I think people have trouble comprehending it. :)

Oh, I know why we have

Oh, I know why we have journals for academic games, but damn, not everything in life should revolve around system gaming.

I seriously think we should up our game on communication. Dead trees are so dead; what's next? The goal is to influence and communicate, not to put out disposable fodder for tenure cases.

Do what ECOOP does...

If you want my personal opinion: DO WHAT ECOOP JUST DID.

Registration for students is 300 Euros for a whole week including food.

Papers are open access at no extra charge.

Authors retain copyright.

All the rest is just noise.

That makes sense, and it is

That makes sense, and it is definitely progress. I have no problem with $1K registrations as long as someone else is paying, but there is a danger that we limit who can attend a conference (if you're not a researcher at a well-funded company or a professor, but, say, a student). I've already cut back to one conference a year (SPLASH, usually; sorry ECOOP) given the expense of it all (it doesn't help that from China I have to take a long-haul flight either way).

It would be nice to develop cheaper effective communication pathways in the community (in addition to conferences). Journals don't really fill that role, but maybe they could, at least something like CACM (i.e., SIGPLAN Notices?). Or maybe not; my talks seem to get a lot more consumers than my papers do :p

The registration is not

The registration is not necessarily the main cost anyway. I bet that you pay a lot more in hotel+travel+food.

$1k for an airplane ticket,

$1k for an airplane ticket, $1k for food/lodging -- that is the high end, for people coming from out of the country and staying in the conference hotel. If it's more domestic (from Europe to Europe, for example) and you stay at cheaper places, you can totally get transportation and lodging down a lot.

But it's not like $1k is unreasonable; conferences are expensive to organize! It's just a barrier.

Green OA, Gold OA

The arXiv is Green OA (author-side archiving), not Gold OA. Computer science papers have been available on FTP sites since the 1970s, making them the first examples of Green OA.

It's widely believed that Gold OA (Open Access) journals all charge author fees, but that's not the case. This 2006 study shows that 52% of Gold OA journals charge no author-side fees at all (the rest are supported by institutions, advertising, or both). What is more, only 12% of author-side fees are actually paid out of pocket by authors: the rest are paid by employers, funding sources, or both, or are waived by the journal publishers. Of course, things have probably changed in the last ten years, but the important point is that the "Gold OA = author fees" meme is just false, and should be fought by anyone who supports OA.

Green, Gold, Purple, Black...

All of those distinctions are part of a game played by publishers to keep profiteering from our work.

What we need is to ensure that our research is accessible in perpetuity and that it can be easily found. The way forward is most likely to have something like LIPIcs (the Dagstuhl proceeding series) as an overlay over arXiv. We could pull in DBLP to be our handy index. And we are done.

Then we need to pay for it. My guess is that the total cost (funding DBLP, a bit of arXiv, and something like LIPIcs) would be less than a million a year. Where do you find the money? Some combination of government support, university libraries (which could each pay a little), conference registration fees, industry donations...

arxiv in actual practice

I'm a supporter of arXiv (I subscribe to arXiv notifications and hope more people use it so that it becomes even more effective at tracking what's up in the field), and I was happy to see many free-text comments in the survey mention it as a viable route. However, I'm a bit surprised to see such strong support for arXiv in words when actual usage by our community today is relatively low. Why don't more people upload their papers to arXiv today already?

(When I asked around, people said that having to upload the TeX sources increased the work involved, and that they instead went to the lower-overhead national open-archive repositories that only require the PDF.)

It's not just uploading the

It's not just uploading the TeX sources, but getting your TeX project into something that arXiv can handle WHEN THERE IS REALLY NO DOCUMENTATION ON HOW TO DO THAT (it can't handle modern image formats like PNG or PDF; really painful). Really, try to figure it out: either you know how to do it or you have no idea where to start, and Google doesn't seem to help.

I tried once, and I gave up. I could rewrite my paper in Word so I wouldn't have to deal with someone else's transparent LaTeX installation, but that is way too much work.

It works now

Documentation might be lacking, but all those things work now.

PDF and PNG are documented to work (http://arxiv.org/help/submit, http://arxiv.org/help/submit_tex). I'd recommend going the pdflatex route: see \pdfoutput=1 in the submission help.
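
For concreteness, here is a minimal sketch (mine, not taken from arXiv's documentation) of a main file that should take the pdflatex route, assuming a hypothetical figure.png sits next to the source:

  % Minimal sketch of an arXiv submission built with pdflatex.
  % arXiv's help asks for \pdfoutput=1 near the top of the main file so that
  % its build system selects pdflatex; PNG and PDF figures then work.
  \pdfoutput=1
  \documentclass{article}
  \usepackage{graphicx}
  \begin{document}
  A paper with a bitmap figure.
  \begin{figure}[t]
    \centering
    \includegraphics[width=0.5\linewidth]{figure.png} % hypothetical file
    \caption{Example figure included as a PNG.}
  \end{figure}
  \end{document}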

Is the documentation still troublesome? Do we need some blog post on the topic? I can't promise anything *that* soon, but I can try.

Evidence:
http://arxiv.org/abs/1312.0658

What about preprints?

Currently, the only problem I have with arXiv is that I can upload preprints there and link to final versions, but it seems the ACM forbids uploading the final version, unless you add enough extra material (IIRC, 25-30%) to create a major revision that you own (I think this clause is there to allow non-ACM journal submissions).

* That's not a problem with arXiv, it's a problem with the ACM.


Censorship has been an issue on arXiv; HAL has done better

Censorship has been an issue on arXiv. HAL has done better.

PLDI, POPL, ECOOP surveys

Jan Vitek has additional comments that incorporate not only the SIGPLAN survey, but also recent surveys of PLDI and POPL attendees and ECOOP authors. Inside you'll find his comments on "The Myth of the Irrelevance of Journal Publishing", "Against the Kaleidoscope of Open Access Colors" and "The ACM Disconnect".

The matter has been settled...

https://plus.google.com/113412990429233162110/posts/WSR5MoSMpN5

Ah

Too obvious... :)

Self-publishing unsatisfactory, walking from ACM

Let me continue here from another thread.

gasche wrote:

Blaisorblade wrote:

As others have remarked, green open access doesn't fully work, because most of us academics aren't disciplined enough (we weren't hired as librarians, after all).

I think this is an observation that goes in the wrong direction, and that we would be better off by agreeing as a community that it's our responsibility to make our articles accessible online (whether by actually being reactive in hosting them online, or delegating to arxiv).

I mean, at least where I work, researchers are tasked with plenty of things they were not hired for (many of an administrative nature), and putting PDFs online in a place where search engines can find them is among the less annoying and time-consuming of them. On the other hand, the current state of affairs, which supposedly lets professionals handle this aspect, is, to many, deeply unsatisfactory.

I would be glad to email Tiark Rompf once in a while to get his PDFs back online if it means the undergraduate students of the future don't have to jump through hoops to access research articles (I had to; this is not an imaginary concern). This may "not fully work", but I think that remark misses the point.

I was semi-quoting a comment by Jan Vitek pushing for gold open access (LIPIcs-style), not encouraging the present state of things. But while I've put my articles online (in multiple ways), even that is deeply unsatisfactory (as I'll explain below), so this can only be done in the spirit of "let's try a kludge to make things slightly more tolerable". In particular, I disagree with this part:

we would be better off by agreeing as a community that it's our responsibility to make our articles accessible online (whether by actually being reactive in hosting them online, or delegating to arxiv).

To be sure, maybe that's better than what we have, but it's worse than a more robust solution. If we take the effort to agree as a community, we might as well do something better; two alternatives seem to be "force the ACM into Open Access" and "secede from the ACM" (something I mention only because it has been proposed in public, though in a Facebook comment).

Right now, I see no fully satisfactory solution, except for those who have "institutional repositories" available which care to preserve articles in perpetuity; and even there, you are only allowed to deposit preprints.

I've remarked above on the current problems with putting non-preprints on arXiv; in fact, I've re-read the terms, and I'm not convinced any more that they actually allow posting on arXiv at all, except before submission (or is arXiv an "institutional repository"?). Moreover, the ACM author terms are rather confusing, and if I wanted to be sure I was following the rules, the next step would be asking a lawyer. Could those rules be confusing on purpose?

Publishing overrated

Unless you are a grad student or a professor aiming at tenure, publishing doesn't give you much: your paper will probably not be read anyway unless you do your own publicity.

If the ACM or the SIGs want to add value to justify their existences, it is right there: help us to get people to read our papers!

Date of publication establishes priority date for research.

The date of publication establishes the official priority date for research.

No one seems to care about

No one seems to care about that. Being first isn't good enough even for bragging rights; you have to be first and communicate it.

Now if you mean patent submissions... that is a whole other kettle of fish.

Priority is important in research

Priority is important in research.

Once something is published, everything following is officially considered to be derivative.

For example, Church was the first to publish a proof of the computational undecidability of the halting problem. Even though Turing may have done some independent work, it is officially considered to be derivative because no one can be sure that there was no path from Church's publication to Turing's work. In fact, Turing hurriedly finished his work for publication already knowing about Church's work.

To history, Turing is more

To history, Turing is better known than Church even though Church was first (do we study Church Machines in a CS theory class?).

Wasn't Church Turing's PhD adviser?

And this is all a bit vain. Most people think Bret Victor invented live programming and no one knows who Chris Hancock is. And that is pretty normal.

Church machines are much more fundamental than Turing machines

Church machines (the lambda calculus) are much more fundamental than Turing Machines.

Lambda Calculus is more fundamental than Turing Machines

The Lambda Calculus is more fundamental than Turing Machines.

Also, the Lambda Calculus preceded Turing Machines. Unfortunately, many researchers, e.g. Gödel, did not understand the fundamental importance of the Lambda Calculus.

The Lambda Calculus is

The Lambda Calculus is fundamental, and of course Church will forever be famous. It's just that "Turing Machine" is such a sexy term for most people, while "Lambda Calculus" includes two scary concepts (Greek letters and math).

tl;dr: marketing is important.

When general recursive

When general recursive functions, lambda-calculus, and Turing machines were proven equi-powerful to each other, it enhanced the reputation of general recursive functions and lambda-calculus, because it's intuitively clear that Turing machines correspond to what you can do by mechanical means.

Is it?

Because it's intuitively clear that Turing machines correspond to what you can do by mechanical means.

I don't find it intuitively obvious that, e.g., adding a second dimension to the tape doesn't increase expressive power. It's intuitive that you can implement Turing machines by physical means, but not that everything you can implement by physical means could be computed by a Turing machine. At least to me.

The claim isn't that

The claim isn't that anything you can implement by physical means could be computed by a Turing machine. It's that anything you can do by "mechanical calculation" can be computed by a Turing machine. A little thought clears up the question about whether anything more is computable using a 2D tape; but Gödel himself, while using general recursive functions, wasn't confident of them as a general model of computation until they were proved equi-powerful with Turing machines. Likewise it might seem obvious to us that lambda-calculus can do general computation, but we live in a world where it's been common knowledge for many decades that combining conditionals, iteration or recursion, and unbounded time and space give general computation.

A little thought clears up

A little thought clears up the question of 2D tape? And this is less thought than is required to check that lambda-calculus can emulate a Turing machine? Probably so, given that understanding how to use lambda-calculus as a general computer involves understanding Church encoding, Y combinator, etc. But this still seems like a pretty arbitrary distinction. The way we come to believe Church's thesis is mostly by trying a bunch of possible extensions and demonstrating that they don't add power. It's not obvious a priori that they don't, but after a while we can see the pattern.

You can't implement Turing machines by physical means

The only things you can implement are finite-state machines.

Discrete and continuous

Don't you mean, the only Turing machines you can implement are finite-state machines?

* Yes. I meant "arbitrary Turing machines"


That was Turing's contribution

To design Turing machines, Turing did some "mathematical modeling" of the "computers" of the time, that is, people doing computations on paper. You can write or erase signs, read them, or use them to alter the state of your mind, which doesn't have infinite capacity. In his paper, Turing actually describes this argument in some detail, as far as I've read, because it's not *that* obvious (though I've never gotten myself to read the real thing). Of course, this argument can only be heuristic, but it is what mattered in convincing mathematicians.

Gödel knew about lambda calculus, but wasn't convinced. I'm not aware of a general heuristic argument that lambda calculus describes computation — you can only do that by showing that some constructs (mu-recursive functions) are expressible, and then showing that those constructs are enough for Turing-completeness ;-).

I still think that studying lambda calculus is better than studying Turing machines, but apparently there's even some research to do before lambda calculus does everything that is needed. Luckily, Bob Harper's on it — see e.g. https://existentialtype.wordpress.com/2014/09/28/structure-and-efficiency-of-computer-programs/

Message-passing is a good way to understand the lambda calculus

Message-passing is a good way to understand the lambda calculus.

Each lambda expression is an Actor that can be sent messages.

Beta Reduction

How does that model beta reduction?

Beta reduction is sending the argument to the lambda expression

Beta reduction is sending the argument to the lambda expression.

Reducing Actors?

Beta reduction is a rewriting of the expression; for example, (\x y . add x y) applied to 3 reduces to the expression (\y . add 3 y). How is this modelled by sending a message to an actor? My understanding is that actors are opaque: you cannot observe the code they contain, nor rewrite that code into a reduced form.
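
(For reference, here is a minimal sketch of beta reduction as syntactic rewriting on a toy term datatype; it is my own illustration, not anyone's actor semantics, and the substitution is naive rather than capture-avoiding.)

  data Term = Var String | Lam String Term | App Term Term
    deriving Show

  -- subst x s t replaces free occurrences of x in t by s.
  -- (A real implementation would rename bound variables to avoid capture.)
  subst :: String -> Term -> Term -> Term
  subst x s (Var y)   | x == y    = s
                      | otherwise = Var y
  subst x s (Lam y b) | x == y    = Lam y b              -- x is shadowed
                      | otherwise = Lam y (subst x s b)
  subst x s (App g a) = App (subst x s g) (subst x s a)

  -- One beta step at the root: (\x. b) a  ~>  b[x := a]
  betaStep :: Term -> Maybe Term
  betaStep (App (Lam x b) a) = Just (subst x a b)
  betaStep _                 = Nothing

  -- (\x y . add x y) applied to 3 reduces to (\y . add 3 y):
  example :: Maybe Term
  example = betaStep (App (Lam "x" (Lam "y" (App (App (Var "add") (Var "x")) (Var "y"))))
                          (Var "3"))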

re reducing actors?

With respect to an actor definition, receipt of a message binds free variables in the behavioral specification.

My understanding is actors are opaque,

That depends on where you are (conceptually) standing, of course.

The actors that implement Google services are opaque to you. Those closer to you might not be.

Opaque and Imperative?

I meant that actors are opaque, in the sense that it is not like Lisp where you can write macros. You cannot manipulate the source of other actors because you only see the runtime object.

There's something else as well: given f x = x + 1, in a functional language f 2 is 3, but with actors it is not. You have to send the message "2" to actor "f", which returns "3". This is imperative, i.e., f 2 "returns" 3, whereas in functional programming f 2 is 3.

re opaque and imperative

I meant that actors are opaque, in the sense that it is not like Lisp where you can write macros.

Several of those words don't mean what you seem to think they mean. Your sentence doesn't make much sense.

Care to be more specific

That doesn't help much. What specifically are you having trouble with?

A further point: it seems actors are impure, yet do not provide a pure subset. Foundationally this seems backwards, as you cannot derive purity from impurity, but you can do it the other way around (with monads).

Maybe I can express that better. Beta reduction is defined by syntactic rewriting, hence f 2 is 3: there is no difference between writing "f 2" and "3"; they are just different ways of writing the same thing. In an actor system there is a moment before the message is sent where you have f.[2], then there is a time where you have something undefined, and finally you get the response back (if nothing has crashed and no cables have come loose) of 3.
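
(As an aside, here is a minimal sketch of the "derive impurity from purity" direction, with made-up names: in a pure language an effectful computation is itself a pure value, an IO action, that does nothing until it is run.)

  f :: Int -> Int
  f x = x + 1                    -- pure: "f 2" and "3" are interchangeable

  noisySucc :: Int -> IO Int
  noisySucc x = do
    putStrLn "running"           -- this effect happens only when the action runs
    pure (x + 1)

  main :: IO ()
  main = do
    let action = noisySucc 2     -- building the action prints nothing
    r <- action                  -- running it prints "running" and yields 3
    print (f 2, r)               -- prints (3,3)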

Actor Model is founded on physics

The Actor Model is founded on physics. In particular, it is based on the principle that "interaction creates reality", drawing in part on the work of the physicist Carlo Rovelli.

See page 10 of Actor Model of Computation.

For example, Factorial.[3] designates the action of sending Factorial the message [3], which can in due course produce a response with the Actor 6 that can then be used in further computation.

Maths is pure

Right, but there are two things: physics and maths. To me it makes sense to start with a pure system and add a model of impurity. I don't really see any reason why I would want to do it the other way around.

Actor Model is a *mathematical* theory of computation

The Actor Model is a mathematical theory of computation, e.g., axioms, denotations, etc.

Everything an Actor

But isn't everything in the Actor Model an Actor, and hence impure, since it relies on message-passing semantics?

An Actor can be determinate

An Actor can be determinate, i.e., it always provides the same response for the same message.

Determinate != Pure

Do you mean referentially transparent (i.e., the result could be memoised)? This is not the same thing as pure. A pure function's application is identical to its output, so given "f x = x + 1", "f 2" is identical to "3". We could substitute "3" for "f 2" everywhere in the program without changing its meaning. With Actors this is not true: f exists before the message "2" is sent to it; for some period the value is indeterminate (the message has been sent, but no result has been returned yet); and eventually f.[2] "returns" 3, but it could fail (due to out-of-memory, or a cable being disconnected). Semantically these are different things.

How can you model pure functions using Actors?

purity

There is no observable difference between your "f.[2]" and "3". The two expressions can be freely substituted for one another without changing the meaning of an actor program.

it could fail (due to out-of-memory, or a cable being disconnected). Semantically these are different things.

In actor semantics, every message arrives after an unbounded but finite time.

In physical reality that can't be literally true, of course.

The semantics of actors reflect the fact that, without some additional source of information, a program cannot observe any difference between a message which will never be delivered and a message which is just taking a very long time.

"f.[2]" and "3" are the same type and react to the exact same messages with the same results. They are the same.

Messages and Execution.

"f.[2]" and "3" are the same type and react to the exact same messages with the same results. They are the same.

And yet the message and its response still exist. There *is* a difference. Let's use a modified syntax to expose the difference:

f 2 () 

So the '()' represents the action of execution. Before execution we have "f 2" and after execution we have 3. The difference is important when side effects are considered. Look at:

if x then (f 2) else (f 3)

Here 'f 2' is passed unexecuted to the 'if' statement, and it only gets executed (and its side-effects only happen) if something like this happens inside the 'if':

if x y z = case x of
   True -> y ()
   False -> z ()

Note how this means something completely different:

if x then (f 2 ()) else (f 3 ())

Here the side effects of both 'f 2' and 'f 3' branches happen before the if statement itself is even executed.

To me it seems sending a message is the same as executing (it controls when the side effects happen). Using the syntax above, there seems to be no way, using actors, to pass the parameters to a function without executing it.

re messages and execution

Here the side effects of both 'f 2' and 'f 3' branches happen before the if statement itself is even executed.

Earlier you defined f to have no side effects.

Here, let's try to make this more precise.

Here is the constant three:

  3       this is an actor of type Integer, called "3" 

Here is a type definition for procedures that take and return an integer:

  Interface IntegerToInteger { [Integer]↦Integer }▮

Here is the (verbose form) of a procedure definition:

  Successor ≡ Actor implements IntegerToInteger using 
                [n : Integer] → n + 1 ▮

"Successor" is an actor. An actor of this style is called a procedure actor.

A procedure is called like this (where we define "three" to mean the value returned):

  three ≡ Successor.[2]

That definition has exactly the same meaning as:

  three ≡ 3

I think you are a little hung up on ordering of evaluation. Here is a second example to clear that up.

Let's stipulate, to bring observable side effects into the picture, that:

  OutputA.[]  prints "a" on the terminal
  OutputB.[]  prints "b" on the terminal

Here is a pair of successor functions that produce some observable output.

  SuccA ≡ Actor implements IntegerToInteger using 
           [n : Integer] → 
             Do { OutputA.[] } ⬢
             n + 1 ▮
  SuccB ≡ Actor implements IntegerToInteger using 
           [n : Integer] → 
             Do { OutputB.[] } ⬢
             n + 1 ▮

Lastly, we can define DoublePlusTwoLoudly (this time using the short-hand notation):

  DoublePlusTwoLoudly.[x : Integer] ≡
    Let a ← SuccA (x),
        b ← SuccB (x)
      a + b ▮

If we evaluate DoublePlusTwoLoudly.[3] we get 8, and there are also side effects. The output on the terminal could be "ab" or "ba". If it were not for the side effects, DoublePlusTwoLoudly.[3] could be compiled the same way as the constant 8.

Arguments, Values, and Nullary Functions.

But this does not allow for the arguments. It is important that we can pass the arguments to 'f' in the 'if' statement without evaluating it.

We do not want to have to define multiple versions of 'if' for forwarding arguments of all types and arities to the functions in the 'then' and 'else' branches.

You have provided a version which does this:

if x then (f) else (g)

But we need to cope with:

if x then (f 'abc') else (g 23)
if x then (f 1 2 3 'abc') else (g 'a' 'b')

And it is critical that only the side effects of the selected branch 'f' or 'g' happen, not both. Also, I am using 'if' as a familiar example; this should really apply to all functions:

f (g 1 2 ()) -- execute 'g 1 2' before passing its result to 'f'.
f (g 1 2) -- pass a nullary function to 'f' that is not executed.

This makes it clear there is a difference between a nullary function and a value. Any theory hoping to encompass all of computing needs to make this distinction. In mathematics all functions are implicitly applied, as they tend to be in functional languages, which is why ML does not allow nullary functions (you use a unary function with a unit argument instead).
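
(A minimal sketch of that distinction, with made-up names and explicit () -> thunks rather than any actor construct: the arguments are supplied up front, the call is deferred, and only the chosen branch's side effects happen.)

  -- 'myIf' takes two suspended computations and runs only the selected one.
  myIf :: Bool -> (() -> IO a) -> (() -> IO a) -> IO a
  myIf True  y _ = y ()
  myIf False _ z = z ()

  f :: Int -> IO Int
  f n = do
    putStrLn ("f " ++ show n)    -- the side effect
    pure (n + 1)

  main :: IO ()
  main = do
    -- Pass the argument now, defer the call by wrapping it in a thunk.
    r <- myIf True (\() -> f 2) (\() -> f 3)   -- only "f 2" is printed
    print r                                    -- prints 3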

lazy and strict (re arguments)

You have plenty of syntactic choices about how to pass unevaluated arguments to a conditional. You could abstract them as procedures, a la the Rabbit thesis. You could use a postpone operator as sugar. You could define the "if" syntax to implicitly do these things (which I believe is actually the case). There is no problem here.

It's a little exotic but not *that* exotic.

Nullary Actors and Postpone

I am not interested in defining the 'if' syntax to implicitly do this, as it needs to be a general mechanism. In C++ you would do this with a function object where you pass all the arguments to the constructor, and you then use 'operator()' to execute the function. The problem is the boilerplate makes this difficult. It would seem like Actors have the same problem. What you could do is:

Have an actor that accepts all the arguments as a message and returns a nullary actor that can be executed later (by passing it a nullary message). But this involves creating a new actor for every nullary function. It works, but it is not nice to use.

Postpone 'might' do what I want, but I can't find a clear definition.

I guess my feeling is that Actors do it backwards: messages always get acted on, and you have to wrap in another actor to return a nullary actor, rather than simply allowing partial application. This results in lots of boilerplate which is not necessary if you do it the other way around.

expectations of readers (re Postpone 'might' do what I want...)

Postpone 'might' do what I want, but I can't find a clear definition.

There might be one around somewhere but anyway, it is routine. For example, anyone who is familiar with DELAY, FORCE, and implicit forcing in Scheme should be able to work out their own definition without much difficulty.
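
(For readers who don't know Scheme, a minimal sketch of delay/force as explicit thunks, with made-up names; Hewitt's Postpone may differ, and Scheme's real delay additionally memoises its result.)

  -- A suspended computation is just a function waiting for ().
  newtype Delayed a = Delayed (() -> a)

  delay :: (() -> a) -> Delayed a
  delay = Delayed

  -- 'force' runs the suspended computation on demand.
  force :: Delayed a -> a
  force (Delayed thunk) = thunk ()

  main :: IO ()
  main = print (force (delay (\() -> 2 + 1)))   -- prints 3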

I guess my feeling is that Actors do it backwards, messages always get acted on, and you have to wrap in another actor to return a nullary actor, rather than simply allowing partial application.

You can rather trivially express a lambda calculus or a combinator calculus in terms of actors. It is true that those formalisms are not primitive concepts in actor world, but they do not require an elaborate definition.

Conversely, lambda and combinators can't express generalized actors at all, at least not without adding a bunch of new axioms to their definitions.

Boilerplate

My current understanding is you would need a new actor for each message type. Kind of like this:

defer1 = f 1 2
defer2 = g 'a'
doit x y z = if x then y else z

doit random_bool defer1 defer2

You would need a new 'defer' for every unexecuted function you wish to pass around, as I can't see a way of generically forwarding arguments using Actors.

Ah Well

I read up on the reference. Personally, I am more in the "Reality is continuous because Gödel says so" camp. But that's a camp of one, so if this does it for you, yeah, well.

In Actor Model, parallel lambda calculus more intuitive than TM

In the Actor Model (which is based on the physics of computation), the parallel lambda calculus is more intuitive than Turing Machines.

See the section on implementing the lambda calculus using Actors in the following article:
Actor Model

Lambda calculus is more intuitive than Turing Machines

Each lambda expression can be modeled as an Actor.