Green Open Access and Preprint Linking

ICSE 2013

One of the most useful changes to the ICSE International Conference on Software Engineering this year, was that the program website contained links to preprints of many of the papers presented.

As ICSE is a large event (over 1600 people attended in 2013), it is worth taking a look at what happened. What is preprint linking? How many authors actually provided a preprint link? What about other conferences? What are the wider implications for open access publishing in software engineering?

Self-Archiving

Preprint linking is based on the idea that authors, who do all the work in both writing and formating the paper, have the right to self-archive the paper they created themselves (also called green open access). Authors can do this on their personal home page, in institutional repositories of, e.g., the universities where they work or in public preprint repositories such as arxiv.

Sharing preprints has been around in science since decades (if not ages): As an example, my ‘alma mater’ CWI was founded in 1947, and has a technical report series dating back to that year. These technical reports were exchanged (without costs) with other mathematical research institutes. First by plain old mail, then by email, later via ftp, and now through http.

While commercial publishers may dislike the idea that a free preprint is available for papers they publish in their journals or conference proceedings, 69% of the publishers do in fact allow (some form of) self-archiving. For example, ACM, IEEE, Springer, and Elsevier (the publishers I work most with) explicitly permit it, albeit always under specific conditions. These conditions can usually be met, and include such requirements as providing a note that the paper has been accepted for publication, a pointer to the URL where the published article can be found, and a copyright notice indicating the publisher now owns the copyright.

Preprint links shown on ICSE site.

Preprint links as shown on ICSE site.

Preprint Linking

All preprint linking does, is ask authors of accepted conference papers, whether they happen to have a link to a preprint available. If so, the conference web site will include a link to this preprint in its progam as listed on its web site.

For ICSE, doing full preprint linking at the conference site was proposed and conducted by Dirk Beyer, after an earlier set of preprint links was collected on a separate github gist by Adrian Kuhn.

Dirk Beyer runs Conference Publishing Consulting, the organization hired by ICSE to collect all material to be published, and get it ready for inclusion in the ACM/IEEE Digital Libraries. As part of this collection process, ICSE asked the authors to provide a link to a preprint, which, if provided, was included in the ICSE on line program.

The ICSE 2013 proceedings were published by IEEE. In their recently updated policy, they indicate that “IEEE will make available to each author a preprint version of that person’s article that includes the Digital Object Identifier, IEEE’s copyright notice, and a notice showing the article has been accepted for publication.” Thus, for ICSE, authors were provided with a possibility to download this version, which they then could self-archive.

Preprints @ ICSE 2013

With a preprint mechanism setup at ICSE, the next question is how many researchers actually made use of it. Below are some statistics I collected from the ICSE conference site:

Track / Conference #Papers presented #Preprints Percentage
Research Track 85 49 57%
ICSE NIER 31 19 61%
ICSE SEIP 19 6 31%
ICSE Education 13 3 23%
ICSE Tools 16 7 43%
MSR 64 36 56%
Total 228 120 53%

 

In other words, a little over half of the authors (53%) provided a preprint link. And, almost half of the authors decided not to.

I hope and expect that for upcoming ICSE conferences, more authors will submit their preprint links. As a comparison, at the recent FORTE conference, 75% of the authors submitted a preprint link.

For ICSE, this year was the first time preprint linking was available. Authors may have not been familiar with the phenomenon, may not have realized in advance how wonderful a program with links to freely available papers is, may have missed the deadline for submitting the link, or may have missed the email asking for a link altogether as it ended up in their spam folder. And, in all honesty, even I managed to miss the opportunity to send in my link in time for some of my MSR 2013 papers. But that won’t happen again.

Preprint Link Sustainability

An issue of some concern is the “sustainability” of the preprint links — what happens, for example, to homepages with preprints once the author changes jobs (once the PhD student finishes)?

The natural solution is to publish preprints not just on individual home pages, but to submit them to repositories that are likely to have a longer lifetime, such as arxiv, or your own technical report series.

An interesting route is taken by ICPC, which instead of preprint links simply provides a dedicated preprint search on Google Scholar, with authors and title already filled in. If a preprint has been published somewhere, and the author/title combination is sufficiently unique, this works remarkably well. MSR uses a mixture of both appraoches, by providing a search link for presentations for which no preprint link was provided.

Implications

Open access, and hence preprint publishing, is of utmost importance for software engineering.

Software engineering research is unique in that it has a potentially large target audience of developers and software engineering practitioners that is on line continually. Software engineering research cannot afford to dismiss this audience by hiding research results behind paywalls.

For this reason, it is inevitable that on the long run, software engineering researchers will transform their professional organizations (ACM and IEEE) so that their digital libraries will make all software engineering results available via open access.

Irrespective of this long term development, the software engineering research community must hold on to the new preprint linking approach to leverage green open access.

Thus:

  1. As an author, self-archive your paper as a preprint or technical report. Consider your paper unpublished if the preprint is not available.
  2. If you are a professor leading a research group, inspire your students and group members to make all of their publications available as preprint.
  3. If you are a reviewer for a conference, insist that your chairs ensure that preprint links are collected and made available on the conference web site.
  4. If you are a conference organizer or program chair, convince all authors to publish preprints, and make these links permanently available on the conference web site.
  5. If you are on a hiring committee for new university staff members, demand that candidates have their publications available as preprints.

Much of this has been possible for years. Maybe one of the reasons these practices have not been adopted in full so far, is that they cost some time and effort — from authors, professors, and conference organizers alike — time that cannot be used for creative work, and effort that does not immediately contribute to tenure or promotion. But it is time well spent, as it helps to disseminate our research to a wider audience.

Thanks to the ICSE move, there now may be a momentum to make a full swing transition to green open access in the software eningeering community. I look forward to 2014, when all software engineering conferences will have adopted preprint linking, and 100% of the authors will have submitted their preprint links. Let us not miss this unique opportunity.

Acknowledgments

I am grateful to Dirk Beyer, for setting up preprint linking at ICSE, and for providing feedback on this post.

Update (Summer 2013)

David Notkin on Why We Publish

This week David Notkin (1955-2013) passed away, after a long battle against cancer. He was one of my heroes. He did great research on discovering invariants, reflexion models, software architecture, clone analysis, and more. His 1986 Gandalf paper was one of the first I studied when starting as a PhD student in 1990.

December 2011 David sent me an email in which he expressed interest to do a sabbatical in our TU Delft Software Engineering Research Group in 2013-2014. I was as proud as one can be. Unfortunately, half a year later he had to cancel his plans due to his health.

David was loved by many, as he had a genuine interest in people: developers, software users, researchers, you. And he was a great (friendly and persistent) organizer — 3 weeks ago he still answered my email on ICSE 2013 organizational matters.

In February 2013, he wrote a beautiful editorial for the ACM Transactions on Software Engineering and Methodology, entitled Looking Back. His opening reads: “It is bittersweet to pen my final editorial”. Then David continues to address the question why it is that we publish:

“… I’d like very much for each and every reader, contributor, reviewer, and editor to remember that the publications aren’t primarily for promotions, or for citation counts, or such.

Rather, the intent is to make the engineering of software more effective so that society can benefit even more from the amazing potential of software.

It is sometimes hard to see this on a day-to-day basis given the many external pressures that we all face. But if we never see this, what we do has little value to society. If we care about influence, as I hope we do, then adding value to society is the real measure we should pursue.

Of course, this isn’t easy to quantify (as are many important things in life, such as romance), and it’s rarely something a single individual can achieve even in a lifetime. But BHAGs (Big Hairy Audacious Goals) are themselves of value, and we should never let them fade far from our minds and actions.”

Dear David, we will miss you very much.


See also:


Desk Rejected

One of the first things we did after all NIER 2013 papers were in, was identifying papers that should be desk rejected. What is a desk reject? Why are papers desk rejected? How often does it happen? What can you do if your paper is desk rejected?

A desk reject means that the program chairs (or editors) reject a paper without consulting the reviewers. This is done for papers that fail to meet the submission requirements, and which hence cannot be accepted. Filtering out desk rejects in advance is common practice for both conferences and journals.

To identify such desk rejects for NIER 2013, program co-chair Sebastian Elbaum and I made a first pass through all 160+ submissions. In the end, we desk rejected around 10% of the submissions (a little more than I had anticipated).

Causes for reject included problems in:

  • Formatting: The paper does not meet the 4 page limit;
  • Scope: The paper is not about software engineering;
  • Presentation: The paper contains, e.g., too many grammatical problems;
  • Innovation: The paper does not explain how it builds upon and
    extends the existing body of knowledge.

Of these, for NIER the formatting was responsible for half of the desk rejects.

Plagiarism
A potential cause that we did not encounter is plagiarism (fraud), or its special form self-plagiarism (submitting the same, or very similar, papers to multiple venues).

In my experience, plain plagiarism is not very common (I encountered one case in another conference, where we had to apply the IEEE Guidelines on Plagiarism).

Self-plagiarism is a bigger problem as it can range from copy-pasting a few paragraphs from an earlier paper to a straight double submission. While the former may be acceptable, the latter is considered a cardinal sin (your paper will be rejected at both venues, and reviewers don’t like reviewing a paper that cannot be accepted). And there are many shades of grey in between.

Notifications
We sent out notifications to authors of desk rejected papers within a few days after the submission deadline (it took a bit of searching to figure out that the best way to do this is to use the delete paper option from EasyChair). Thus, desk rejects not only serve to reduce the reviewing load of the program committee, but also to provide early feedback to authors whose papers just cannot make it.

Is there anything you can do to avoid being desk rejected?
The simple advice is to carefully read the submission guidelines. Besides that, it may be wise to submit a version adhering to all criteria early on when there is no immediate deadline stress yet. This may then serve as a fallback in case you mess up the final submission (uploading, e.g., the wrong pdf). Usually chairs have access to these earlier versions, and they can then decide to use the earlier version in case (only) the final version is a clear desk reject (for NIER this situation did not occur).

Is there anything you can do after being desk rejected?
Usually not. Most desk rejects are clear violations of submission requirements. If you think your desk reject is based on subjective grounds (presentation, innovation), and you strongly disagree, you could try to contact the chairs to get your paper into the reviewing phase anyway. The most likely outcome, however, will still be a reject, so it may not be in your self-interest to postpone this known outcome.

Submission times
And … are desk rejects are related to the paper submission time? Yes, there is a (mild) negative correlation: For NIER, there were more desk rejects in the earlier than in the later submissions. My guess is that this is quite common. There seem to be authors who simply try their same pdf at multiple conferences, hoping for an easy conference with little reviewing only.

Acceptance rates
This brings me to the final point. Conferences are commonly ranked based on their acceptance ratio. The lower the percentage of accepted papers, the more prestigious the conference is considered. The most interesting figure is obtained if acceptance rates are based on the serious competition only — i.e., the subset of papers that made it to the reviewing phase. Desk rejected papers do not qualify as such, and hence should  not be taken into account when computing conference acceptance rates.

Paper Arrival Rates

As the deadline passed, I just closed the submission site for ICSE NIER 2013. How many hours in advance do authors typically submit their paper?

To answer that question, I collected some data on the time at which each paper was submitted. (I just looked at initial submissions, not at re-submissions). Here is a graph showing new paper arrivals, sorted by hours before the deadline, grouped in bins of 2 hours.

As you can see, we received a little less than 30 submissions more than 2 days in advance. But the vast majority likes to submit in the final 24 hours. The last paper was submitted just 5 minutes before the deadline.

Accumulating this graph and displaying the data as percentage yields the following chart:

This gives some insight in the percentage of papers submitted at different time slots before the deadline.

Let’s draw the following easy to remember conclusions from this graph:

  • 1/6th of the papers are submitted more than 48 hours ahead of time.
  • 1/3d of the papers are submitted 24 hours before the deadline
  • Half of the papers are submitted 14 hours before the dealine
  • 2/3d of the papers are submitted 10 hours before the deadline
  • 1/6th of the papers are submitted in the final 4 hours before the deadline

Is this relevant? If this is valid, as a conference organizer you can guestimate the number of submissions, say, 24 hours ahead of time, which is when you’d have 1/3d of the papers in.

But also if you’re an author this can be interesting. Conference systems like EasyChair give your paper an ID that equals the number of submissions so far. So if you submit at, say, 10 hours before the deadline, and get paper ID 200, the chart suggests that you may end up competing with 300 submissions in total.

The chart may very well be different for other conferences. NIER is part of ICSE which is held in California, with a deadline at 12pm Pago Pago time, on a Friday, soliciting 4-page papers, without requiring pre-submission of abstracts. These are all circumstances that will affect submission behavior. If you have pointers to similar data for other conferences let me know, and I’ll update the post.

Enjoy!