Golden Open Access for the ACM: Who Should Pay?

In a move that I greatly support, the ACM Special Interest Group on Programming Languages (SIGPLAN) is exploring various ways to adopt a truly Golden Open Access model, and is rolling out a survey, set up by Michael Hicks, asking for your opinion. Even though I myself am most active in ACM’s Special Interest Group on Software Engineering (SIGSOFT), I do publish at and attend SIGPLAN conferences such as OOPSLA. And I sincerely hope that SIGSOFT will follow SIGPLAN’s leadership on this important issue.

ACM presently supports green open access (self-archiving) and a concept called “Open TOC” in which papers are accessible via a dedicated “Table of Contents” page for a particular conference. While this is better than nothing, I agree with OOPSLA 2017 program chair Jonathan Aldrich, who explains in his blog post that Golden Open Access is much preferred.

This does, however, raise the question of who should pay for making publications open access, which is part of the SIGPLAN survey:

  • Attendees Pay: Increase the conference fees: SIGPLAN estimates that this would amount to an increase of around $50 per attendee.

  • Authors Pay: Introduce Article Processing Charges: SIGPLAN indicates that if a full conference goes open access this would presently amount to $400 per paper.

Note that the math here suggests that the number of registrants is around 8 times the number of papers in the main research track. Also note that it assumes that only papers in the main research track are made open access. A conference like ICSE, however, has many workshops with many papers: It is equally important that these become open access too, which would change the math considerably.
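The ratio can be checked with a little arithmetic. The sketch below uses the two figures quoted from the survey ($400 per paper, $50 per attendee). The paper, workshop, and attendee counts are made-up numbers purely for illustration, and the code assumes that the total cost scales linearly with the number of papers.

```python
# Figures quoted from the SIGPLAN survey.
APC_PER_PAPER = 400      # dollars per paper under "authors pay"
FEE_PER_ATTENDEE = 50    # dollars per attendee under "attendees pay"


def implied_attendees_per_paper(apc=APC_PER_PAPER, fee=FEE_PER_ATTENDEE):
    """Attendees needed per paper for both models to raise the same amount."""
    return apc / fee


def fee_increase(num_papers, num_attendees, apc=APC_PER_PAPER):
    """Per-attendee increase if all open access costs are spread over attendees."""
    return num_papers * apc / num_attendees


print(implied_attendees_per_paper())   # 8.0

# Hypothetical conference: 100 main-track papers, 800 attendees.
print(fee_increase(100, 800))          # 50.0

# Same conference if 150 (hypothetical) workshop papers are included as well.
print(fee_increase(100 + 150, 800))    # 125.0
```

With these invented counts, including workshop papers more than doubles the per-attendee increase, which is the sense in which the math changes considerably.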

The article processing charges of $400 are presented as a given: They may seem in line with what commercial publishers charge, but they are certainly very high compared to what, e.g., LIPIcs charges for ECOOP (which is less than $100). These costs of $400 come from ACM’s desire (need) to continue to make a substantial profit from their publishing activities, and should go down.

In his blog post, Jonathan Aldrich argues for the “author pays” model. His reasoning is that this can be viewed as a “funder pays” model: Most authors are funded by research grants, and usually in those grants funds can be found to cater for the costs involved in publishing open access.

On this point (and this point alone) I disagree with Jonathan. To me it feels fundamentally wrong to punish authors by making them pay $400 more for their registration. If anything, they should get a reduction for delivering the content of the conference.

I see Jonathan’s point that some funding agencies are willing to cover open access costs (e.g. NSF, NWO, H2020), and that it is worthwhile to explore how to tap into that money. But this requires data on what percentage of papers could be labeled as “funded”. For my department, I foresee several cases where it would be the department who’d have to pay for this instead of an external agency.

I do sympathize with Jonathan’s appeal to reduce conference registration costs, which can be very high. But the cost of making publications open access should be borne by the full community (all attendees), not just by those who happen to publish a paper.

Shining examples of open access computer science conferences are the Usenix, AAAI, and NIPS events. Full golden open access of all content, and no extra charges for authors — these conferences are years ahead of the ACM.

Do you have an opinion on “author pays” versus “participant pays”? Fill in the survey!

Thank you SIGPLAN for initiating this discussion!

Self-Archiving Publications in Elsevier Pure (at TU Delft)

TU Delft has recently adopted Elsevier Pure as its database to keep track of all publications from its employees.

At the same time, TU Delft has adopted a mandated green open access policy. This means that for papers published after May 2016, an author-prepared version (pdf) must be uploaded into Pure.

I am very happy with TU Delft’s commitment to green open access (and TU Delft is not alone). This decision also means, however, that TU Delft researchers need to do some extra work, to make their author-prepared versions available.

To make it easier for TU Delft researchers to upload their papers and comply with the green open access policy, here are some suggestions based on my experience so far working with Pure.

I can’t say I’m a big fan of Elsevier Pure. In the interest of open access, however, I’m doing my best to tolerate the quirks of Pure, in order to help the TU Delft to share all its research papers freely and persistently with everyone in the world.

Since Pure is used at hundreds of different universities, this post may also be relevant for researchers not working at TU Delft.

Contents

  1. The Outcome
  2. Accessing Pure
  3. Entering Meta-Data
  4. Entering your Author-Prepared version
  5. A Paper’s Life Cycle
  6. Updating Entries (Before/After “Approval”)
  7. Older (before 2015) Entries
  8. End-of-the-Year Publication Reporting
  9. Google Indexing
  10. Complicated Author Names
  11. Exporting To Bibtex

The Outcome

Pure Paper Data

Anyone can browse publications in Pure, available at https://pure.tudelft.nl.

All pages have persistent URLs, making it easy to refer to a list of all your publications (such as my list), or individual papers (such as my recent one on crash reproduction). For all recent papers I have added a pdf of the version that we as authors prepared ourselves (aka the postprint), as well as a DOI link to the publisher version (often behind a paywall).

Thus, you can use Pure to offer, for each publication, your self-archived (green open access) version as well as the final publisher version.

Moreover, these publications can be aggregated to the section, department, and faculty level, for management reporting purposes.

In this way, Pure data shows taxpayers how their money is spent on academic research, and gives them free access to the outcomes. Taxpayers deserve that we invest some time in populating Pure with accurate data.

Accessing Pure

To enter publications into Pure, you’ll need to log in. On https://pure.tudelft.nl, in the footer at the right, you’ll find “Log into Pure”. Use your TU Delft netid.

If you’re interested in web applications, you will quickly recognize that Pure is a fairly old system, with user interface choices that would not be made these days.

Entering Meta-Data

You can start entering a publication by hitting the big green button “Add new” at the top right of the page. It will open a brand new browser window for you.

In the new window, click “Research Output”, which will turn blue and expand into three items.

Then there are several ways to enter a publication, including:

  1. Import via Elsevier Scopus, found via “Import from Online Source”. This is by far the easiest, if (1) your publication venue is indexed by Scopus, (2) it is already visible at Scopus (which typically takes a few months), and if (3) you can find it on Scopus. To help Scopus, I have set up an ORCID author identifier and connected it to my Scopus author profile.

  2. Import via Bibtex, found via “Import from file”. If you click it, importing from bibtex is one of the options. You can obtain bibtex entries from DBLP, Google Scholar, ACM, your departmental publications server, or write them by hand in your favorite editor, and then copy paste them into Pure.

  3. Entering details via a series of buttons and forms (“Create from template”). I recommend not using this option. If you go against this advice and want to enter a conference paper, make sure that you do not pick the template “Paper/contribution to conference”; pick “Conference Contribution/Chapter in Conference Proceedings” instead. Don’t ask me why.

In all cases, yet another browser window is opened, in which you can inspect, correct, and save the bibliographic data.

Entering your Author-Prepared version

With each publication, you can add various “electronic versions”.

Each can be a file (pdf), a link to a version, or a DOI. For pdfs you want to upload, make sure you check that they meet the conditions under which your publisher allows self-archiving.

Pure distinguishes various version types, which you can enter via the “Document version” pull down menu. Here you need to include at least the following two versions:

  • The “accepted author manuscript”. This is also called a postprint, and is the version that (1) is fully prepared by you as authors; and that (2) includes all improvements you made after receiving the reviews. Here you can typically upload the pdf as you prepared it yourself.

  • The “final published version”. This is the Publisher’s version. It is likely that the final version is copyrighted by the publisher. Therefore, you typically include a link (DOI) to the final version, and do not upload a pdf to Pure. If you import from Scopus, this field is automatically set.

Furthermore, Pure permits setting the “access to electronic version”, and defining the “public access”. Relevant items include:

  • Open, meaning (green) open access. This is what I typically select for the “accepted author manuscript”.

  • Restricted, meaning behind a paywall. This is what I typically select for the final published version.

  • Embargoed, meaning that the pdf cannot be made public until a set date. This can be used for commercial publishers who insist on restricting access to post-prints from institutional repositories in the first 1-2 years.

The vast majority (80%) of academic publishers permit authors to archive their accepted manuscripts in institutional repositories such as Pure. However, publishers typically permit this under specific conditions, which may differ per publisher. You can check out my Green Open Access FAQ if you want to learn more about these conditions, and how to find them for your (computer science) publisher.

A Paper’s Life Cycle

Making papers available early is one of the benefits of self-archiving. This can be done in Pure by setting the paper’s “Publication Status”. This field can have the following values:

  1. “In preparation”: Literally a pre-print. Your paper can be considered a draft and may still change.
  2. “Submitted”: You submitted your paper to a journal or conference where it is now under review.
  3. “Accepted/In press”: Yes, paper accepted! This also means that you as an author can share your “accepted author manuscript”.
  4. “E-Pub ahead of print”: I don’t see how this differs from the Accepted state.
  5. “Published”: The paper is final and has been officially published.

In my Green Open Access FAQ I provide an answer to the question Which Version Should I Self-Archive.

I typically enter publications once accepted, and share the Pure link with the accepted author manuscript as a pre-print link on Twitter or on conference sites (e.g. ICSE 2017).

In particular, I do the following once my paper is accepted:

  1. I update my pdf with a (foot)note indicating where it will be published, and who will eventually hold the copyright.
  2. I create a bibtex entry for an @inproceedings (conference, workshop) or @article (journal) publication.
  3. I upload the bibtex entry into Pure.
  4. I add my own pdf with the author-prepared version to the resulting Pure entry.
  5. I set the state to “Accepted”.
  6. I share the Pure link on Twitter with the rest of the world.
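Step 2 above can be done by hand in an editor, but it is also easy to script. Here is a minimal sketch that renders an @inproceedings entry as text, ready to paste into the bibtex import. The function name, citation key, and all field values are placeholders I made up for illustration; they are not tied to Pure in any way.

```python
def make_bibtex_entry(entry_type, key, fields):
    """Render a BibTeX entry such as @inproceedings or @article as text."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Wrap every value in braces so BibTeX keeps it as written.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder data for a freshly accepted conference paper.
entry = make_bibtex_entry(
    "inproceedings",
    "doe2017example",
    {
        "author": "Jane Doe and John Roe",
        "title": "An Example Paper",
        "booktitle": "Proceedings of an Example Conference",
        "year": "2017",
        "note": "Accepted for publication",
    },
)
print(entry)
```

Page numbers, volume, and DOI are deliberately left out here, since they are typically not known yet at acceptance time.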

Once the publisher actually manages to publish this paper as well (this may be several months later!), I update my Pure entry:

  1. I add the DOI link to the final published version.
  2. I provide the missing bibliographic meta-data (page numbers, volume, number, …).
  3. I set the state to “Published”.

The preprint links I shared still point to the self-archived pdf, but now also to the official version at the publisher for those who have access through the paywall.

Updating Entries (Before/After “Approval”)

A publication you entered can be in three states:

  • “For Approval” means that the publication has not been approved yet by a TU Delft Library employee. It also means that you can still make changes yourself. This is the default state a publication is in once you have entered it yourself.

  • “Approved” means that a TU Delft library employee has approved the publication. This means that you yourself cannot change this publication anymore. If your publication does need a correction nevertheless, you will have to email the TU Delft library contact person for your department (Jasper van Dijck for my Department of Software Technology).

  • “Entry in progress”: This is a state you can use to indicate that you still plan to update the publication — it instructs the library not to try to approve the (intentionally incomplete) entry. In the life cycle discussed above, you could use this state to mark an entry as in progress between the acceptance of the paper (no DOI yet) and the actual publication (DOI available).

(These states can be configured differently for different Pure installations.)

Older (before 2015) Entries

Entries from 2015 and before were automatically imported from TU Delft’s old Metis system (which in our Department of Software Technology in turn was populated from our ST Publication Server). Since Metis did not support pdf uploads, these older publications do not come with open access post-prints in Pure.

To update such older entries, see the updating procedure described above.

End-of-the-Year Publication Reporting

Like any (Dutch) university, the TU Delft has to report to its stakeholders what its “output” is. This information is collected in Pure, and used by the government to analyze the research performance of various universities.

This means that in, say, February of year N, all publications in year N-1 must have been entered into Pure.

If you follow the Scopus approach (like I try to do), this means that due to the delay in Scopus you may have to switch to the bibtex approach to enter publications from November or December.

Note that in the United Kingdom under the REF open access policy, authors must upload their papers within 3 months of being accepted. The TU Delft has no such rule yet as far as I know, but this would simplify the process of end-of-the-year publication collection.

Google Indexing

The TU Delft Pure data itself is not indexed by Google (as far as I know). The papers that I have uploaded into Pure are discovered by Google Scholar. This is in line with the harvesting objective of Pure:

Research output entered in Pure is harvested by Google Scholar and visible on that platform.

Note also that pdfs uploaded in Pure should be automatically (after validation by the library) copied to https://repository.tudelft.nl, which is indexed, meaning that your papers (and your post-prints) will end up in Google Scholar.

Complicated Author Names

Pure contains official employee names as registered by TU Delft.

Some authors publish under different (variants of their) names. For example, Dutch universities have trouble handling the complex naming habits of Portuguese and Brazilian employees.

If Pure is not able to map an author name to the corresponding employee, find the author name in the publication, click edit, and then click “Replace”. This allows searching the TU Delft employee database for the correct person.

If Pure has found the correct employee, but the name displayed is very different from what is listed on the publication itself, you can edit the author for that publication, and enter a different first and last name for this publication.

Exporting To Bibtex

If you’re logged in, you can download your publication list in various formats, including BibTeX (you’ll find the button for this at the bottom of the page).

I needed slightly different BibTeX output in order to import it into our local publication server, so I wrote a little Python script that scrapes a Pure web page (mine, yours, or anyone’s) and adds relevant information (such as a url field linking back to the paper’s Pure page).
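To give an idea of the kind of post-processing involved, here is a minimal sketch of just the last step: appending a url field to a single BibTeX entry. This is not the script mentioned above and it does no scraping. The function name and the example URL are placeholders, and the code assumes the entry is well formed and ends with its closing brace.

```python
def add_url_field(entry, url):
    """Return a copy of a single BibTeX entry with a url field appended."""
    body = entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("expected a BibTeX entry ending in '}'")
    body = body[:-1].rstrip()      # drop the closing brace of the entry
    if not body.endswith(","):
        body += ","                # make sure the last field is terminated
    return body + "\n  url = {" + url + "},\n}"


# Placeholder entry and placeholder page URL.
exported = "@article{doe2017example,\n  title = {An Example Paper},\n  year = {2017}\n}"
updated = add_url_field(exported, "https://example.org/pure/an-example-paper")
print(updated)
```

A real exporter would need a proper BibTeX parser to cope with comments, nested braces, and multiple entries per file; simple string handling like this only works for one clean entry at a time.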


Version history

  • 20 November 2016: Version 0.1, for internal purposes.
  • 07 December 2016: Version 0.2, first public version.
  • 14 December 2016: Version 0.3, minor improvements.
  • 13 January 2017: Version 0.4, updated Google Scholar information.
  • 16 March 2017: Version 0.5, updated approval states based on correction from Hans Meijerrathken.
  • 17 March 2017: Version 0.6, toc, life cycle and exporting added.

Acknowledgments: Thanks to Moritz Beller for providing feedback and trying out Pure.

© Arie van Deursen, December 2016.

Green Open Access FAQ

Green field

Image credit: Flickr, user static_view

(Opinionated) answers to frequently asked questions on (green) open access, from a computer science (software engineering) research perspective.

Disclaimer: IANAL, so if you want to know things for sure you’ll have to study the references provided. Use at your own risk.

Green open access is trickier than I thought, so I might have made mistakes. Corrections are welcome, as are additional questions for this FAQ. Thanks!

Green Open Access Questions

  1. What is Green Open Access?
  2. What is a pre-print?
  3. What is a post-print?
  4. What is a publisher’s version?
  5. Do publishers allow Green Open Access?
  6. Under what conditions is Green Open Access permitted?
  7. What is Yellow Open Access?
  8. What is Gold Open Access?
  9. What is Hybrid Open Access?
  10. What are the Self-Archiving policies of common computer science venues?
  11. Is Green Open Access compulsory?
  12. Should I share my pre-print under a Creative Commons license?
  13. What is a good place for self-archiving?
  14. Can I use PeerJ Preprints for Self-Archiving?
  15. Can I use ResearchGate or Academia.edu for Self-Archiving?
  16. Which version(s) should I self-archive?
  17. What does Gold Open Access add to Green Open Access?
  18. Will Green Open Access hurt commercial publishers?
  19. What is the greenest publisher in computer science?
  20. Should I use ACM Authorizer for Self-Archiving?
  21. As a conference organizer, can I mandate Green Open Access?
  22. What does Green Open Access cost?
  23. Should I adopt Green Open Access?
  24. Where can I learn more about Green Open Access?

What is Green Open Access?

In Green Open Access you as an author archive a version of your paper yourself, and make it publicly available. This can be at your personal home page, at the institutional repository of your employer (such as the one from TU Delft), or at an e-print server such as arXiv.

The word “archive” indicates that the paper will remain available forever.

What is a pre-print?

A pre-print is a version of a paper that is entirely prepared by the authors.

Since no publisher has been involved in any way in the preparation of such a pre-print, it feels right that the authors can deposit such pre-prints wherever they want to. Before submission, the authors, or their employers such as universities, hold the copyright to the paper, and hence can publish the paper in online repositories.

Following the definition of SHERPA’s RoMEO project, pre-prints refer to the version before peer-review organized by a publisher.

What is a post-print?

Following the RoMEO definitions, a post-print is a final draft as prepared by the authors themselves after reviewing. Thus, feedback from the reviewers has typically been included.

Here a publisher may have had some light involvement, for example by selecting the reviewers, making a reviewing system available, or by offering a formatting template / style sheet. The post-print, however, is author-prepared, so copy-editing and final markup by the publisher has not been done.

What is a publisher’s version?

While pre- and post-prints are author-prepared, the final publisher’s version is created by the publisher.

The publisher’s involvement may vary from very little (camera ready version entirely created by authors) up to substantial (proof reading, new markup, copy editing, etc.).

Publishers typically make their versions available after a transfer of copyright, from the authors to the publisher. And with the copyright owned by the publisher, it is the publisher who determines not only where the publisher’s version can be made available, but also where the original author-prepared pre- or post-prints can be made available.

Do publishers allow Green Open Access?

Self-archiving of non-published material that you own the copyright to is always allowed.

Whether self-archiving of a paper that has been accepted by a publisher for publication is allowed depends on that publisher. You have transferred your copyright, so it is up to the publisher to decide who else can publish it as well.

Different publishers have different policies, and these policies may in turn differ per journal. Furthermore, the policies may vary over time.

The SHERPA project does a great job in keeping track of the open access status of many journals. You’ll need to check the status of your journal, and if it is green you can self-archive your paper (usually under certain publisher-specific conditions).

In the RoMEO definition, green open access means that authors can self-archive both pre-prints and post-prints.

Under what conditions is Green Open Access permitted?

Since the publisher holds copyright on your published paper, it can (and usually does) impose constraints on the self-archived versions. You should always check the specific constraints for your journal or publisher, for example via the RoMEO journal list.

The following conditions are fairly common:

  1. You generally can self-archive pre- and post-prints only, but not the publisher version.

  2. In the meta-data of the self-archived version you need to add a reference to the final version (for example through its DOI).

  3. In the meta-data of the self-archived version you need to include a statement of the current ownership of the copyright, sometimes through specific sentences that must be copy-pasted.

  4. The repository in which you self-archive should be non-commercial. Thus, arXiv and institutional repositories are usually permitted, but commercial ones like PeerJ Preprints, Academia.edu or ResearchGate are not.

  5. Some commercial publishers impose an embargo on post-prints. For example Elsevier permits sharing the post-print version on an institutional repository only after 12-24 months (depending on the journal).

Usually meeting the demands of a single publisher is relatively easy to do. Given points 2 and 3, it typically involves creating a dedicated pdf with a footnote on the first page with the required extra information.
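Generating the text of such a footnote is easy to automate. The sketch below is only an illustration: the function name and the wording of the note are my own placeholders, not any publisher’s required statement, and the DOI is a dummy value. When a publisher prescribes specific sentences, those should be copied verbatim instead.

```python
def self_archiving_note(venue, year, doi, copyright_holder):
    """Build a first-page footnote for a self-archived, author-prepared pdf."""
    return (
        f"Author-prepared version. To appear in {venue}, {year}. "
        f"Copyright {year} {copyright_holder}. "
        f"Final version: https://doi.org/{doi}"
    )


# Placeholder values; the DOI below is a dummy, not a real paper.
note = self_archiving_note(
    venue="Proceedings of an Example Conference",
    year=2017,
    doi="10.1000/example",
    copyright_holder="Example Publisher",
)
print(note)
```

The resulting string covers conditions 2 and 3 above (a pointer to the final version and a statement of copyright ownership) and can be pasted into the footnote of the pdf.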

However, every publisher has its own rules. If you publish your papers in a range of different venues (which is what good researchers do), you’ll have to know many different rules if you want to do green open access in the correct way.

What is Yellow Open Access?

Some publishers (such as Wiley) allow self-archiving of pre-prints only, and not of post-prints. This is referred to as yellow open access in RoMEO. Yellow is more restrictive than green.

As an author, I find yellow open access frustrating, as it forbids me to make the version of my paper that was improved thanks to the reviewers available via open access.

As a reviewer, I feel yellow open access wastes my effort: I tried to help authors by giving useful feedback, and the publisher forbids my improvements to be reflected in the open access version.

What is Gold Open Access?

Gold Open Access refers to journals (or conference proceedings) that are completely accessible to the public without requiring paid subscriptions.

Often, gold implies green, for example when a publisher such as PeerJ, PLOS ONE or LIPIcs adopts a Creative Commons license — which allows anyone, including the authors, to share a copy under the condition of proper attribution.

The funding model for open access is usually not based on subscriptions, but on Article Processing Charges, i.e., a payment by the authors for each article they publish (varying from $70 (LIPIcs) to $1500 (PLOS ONE) per paper).

What is Hybrid Open Access?

Hybrid open access refers to a restricted (subscription-funded) journal that permits authors to pay extra to make their own paper available as open access.

This practice is also referred to as double dipping: The publisher collects revenue from both subscriptions and article processing charges.

University libraries and funding agencies do not like hybrid access, since they feel they have to pay twice, both for the authors and the readers.

Green open access is better than hybrid open access, simply because it achieves the same (an article is available) yet at lower costs.

What are the Self-Archiving policies of common computer science venues?

For your and my convenience, here is the green status of some publishers that are common in software engineering (check links for most up to date information):

  • ACM: Green, e.g., TOSEM, see also the ACM author rights. For ACM conferences, often the author-prepared camera-ready version includes a DOI already, making it easy to adhere to ACM’s meta-data requirements.
  • IEEE: Green, e.g., TSE. The IEEE has a policy that the IEEE makes a version available that meets all IEEE meta-data requirements, and that authors can use for self-archiving. See also their self-archiving FAQ.
  • Springer: Green, e.g., EMSE, SoSyM, LNCS. Pre-print on arXiv; post-print on personal page immediately, and in a repository either immediately or after a 12 month embargo period, depending on the case.
  • Elsevier: Mostly green, e.g., JSS, IST. Pre-print allowed, post-print with CC-BY-NC-ND license on personal page immediately and in institutional repository after 12-48 month embargo period.
  • Wiley: Mostly yellow, i.e., only pre-prints can be immediately shared, and post-prints (even on personal pages) only after 12 month embargo. E.g. JSEP.

Luckily, there are also some golden open access publishers (which typically permit self-archiving as well should you still want that):

Is Green Open Access compulsory?

Funding agencies (NWO, EU, Bill and Melinda Gates Foundation, …) as well as universities (TU Delft, University of California, UCL, ETH Zurich, Imperial College, …) are increasingly demanding that all publications resulting from their projects or employees are available in open access.

My own university TU Delft insists, like many others, on green open access:

As of 1 May 2016 the so-called Green Road to Open Access publishing is mandatory for all (co)authors at TU Delft. The (co)author must publish the final accepted author’s version of a peer-reviewed article with the required metadata in the TU Delft Institutional Repository.

This makes sense: The TU Delft wants to have copies of all the papers that its employees produce, and make sure that the TU Delft stakeholders, i.e. the Dutch citizens, can access all results. Note that TU Delft insists on post-prints that include reviewer-induced modifications.

The Dutch national science foundation NWO has a preference for gold open access, but accepts green open access if that’s impossible (“Encourage Gold, require immediate Green”).

Should I share my pre-print under a Creative Commons license?

You should only do this if you are certain that the publisher’s conditions on self-archiving pre-prints are compatible with a Creative Commons license. If that is the case, you probably are dealing with a golden open access publisher anyway.

Creative Commons licenses are very liberal, allowing anyone to re-distribute (copy) the licensed work (under certain conditions, including proper attribution).

This effectively nullifies (some of) the rights that come with copyright. For that reason, publishers that insist on owning the full copyright to the papers they publish typically disallow self-archiving earlier versions with such a license.

For example, ACM Computing Surveys insists on a set statement indicating

… © ACM, YYYY. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution…

This “not for redistribution” is incompatible with Creative Commons, which is all about sharing.

Furthermore, a Creative Commons license is irrevocable. So once you picked it for your pre-print, you effectively made a choice for golden open access publishers only (some people might consider this desirable, but it seriously limits your options).

Therefore, my suggestion would be to keep the copyright yourself for as long as you can, giving you the freedom to switch to Creative Commons once you know who your publisher is.

What is a good place for self-archiving?

It depends on your needs.

Your employer may require that you use your institutional repository (such as the TU Delft Repository). This still allows you to post a version elsewhere as well.

Subject repositories such as arXiv offer good visibility to your peers. In fields like physics using arXiv is very common, whereas in Computer Science this is less so. A good thing about arXiv is that it permits versioning, making it possible to submit a pre-print first, which can then later be extended with the post-print. You can use several licenses. If you intend publishing your paper, however, you should adopt arXiv’s Non-Exclusive Distribution license (which just allows arXiv to distribute the paper) instead of the more generous Creative Commons license — which would likely conflict with the copyright claims of the publisher of the refereed paper.

Your personal home page is a good place if you want to offer an overview of your own research. Home page URLs may not be very permanent though, so as an approach to self archiving it is less ideal.

Can I use PeerJ Preprints for Self-Archiving?

Probably not — and it’s also not what PeerJ Preprints are intended for.

PeerJ Preprints is a commercial eprint server requiring a Creative Commons license. It is intended to share drafts that have not yet been peer reviewed for formal publication.

It offers good visibility (a preprint on goto statements attracted 15,000 views), and a smooth user interface for posting comments and receiving feedback. Articles cannot be removed once uploaded.

The PeerJ Preprint service is compatible with other golden open access publishers (such as PeerJ itself or Usenix).

The PeerJ Preprint service, however, is incompatible with most other publishers (such as ACM, IEEE, or Springer) because (1) the service is commercial; (2) the service requires Creative Commons as license; (3) preprints once posted cannot be removed.

So, if you want to abide by the rules, uploading a pre-print to PeerJ Preprints severely limits your subsequent publication options.

Can I use ResearchGate or Academia.edu for Self-Archiving?

No — unless you only work with liberal publishers with permissive licenses such as Creative Commons.

ResearchGate and Academia.edu are researcher social networks that also offer self-archiving features. As they are commercial repositories, most publishers will not allow sharing your paper on these networks.

The ResearchGate copyright pages provide useful information on this.

The Academia.edu copyright pages state the following:

Many journals will also allow an author to retain rights to all pre-publication drafts of his or her published work, which permits the author to post a pre-publication version of the work on Academia.edu. According to Sherpa, which tracks journal publishers’ approach to copyright, 90% of journals allow uploading of either the pre-print or the post-print of your paper.

This seems misleading to me: Most publishers explicitly disallow posting preprints to commercial repositories such as Academia.edu.

In both cases, the safer route is to use permitted places such as your home page or institutional repository for self-archiving, and only share links to your papers with ResearchGate or Academia.edu.

Which version(s) should I self-archive?

It depends.

Publishing a pre-print as soon as it is ready has several advantages:

  • You can receive rapid feedback on a version that is available early.

  • You can extend your pre-print with an appendix, containing material (e.g., experimental data) that does not fit in a paper that you’d submit to a journal.

  • It allows you to claim ownership of certain ideas before your competition.

  • You offer most value to society since you allow anyone to benefit as early as possible from your hard work.

Nevertheless, publishing a post-print only can also make sense:

  • You may want to keep some results or data secret from your competition until your paper is actually accepted for publication.

  • You may want to avoid confusion between different versions (pre-print versus post-print).

  • You may be scared to leave a trail of rejected versions submitted to different venues.

  • You may want to submit your pre-print to a venue adopting double blind reviewing, requiring you to remain anonymous as author. Publishing your pre-print during the reviewing phase would make it easy for reviewers to find your paper and connect your name to it.

For these reasons, and primarily to avoid confusion, I typically share just the post-print: The camera-ready version that I create and submit to the publisher is also the version that I self-archive as post-print.

What does Gold Open Access add to Green Open Access?

For open access, gold is better than green since:

  • it shifts the burden of making articles publicly available from the researcher to the publisher.
  • it places a paper in a venue that is entirely open access. Thus, other papers improving upon, or referring to, your paper (published in the same journal) will be open access too.
  • gold typically implies green, i.e., the license of the journal is similar to Creative Commons, allowing anyone, including the authors, to share a copy under the condition of proper attribution.

Will Green Open Access hurt commercial publishers?

Maybe. But most academic publishers already allow green open access, and they are doing just fine. So I would not worry about it.

What is the greenest publisher in computer science?

The greenest publisher should be the one imposing the least restrictions on self-archiving.

From that perspective, publishers who want to be the greenest should in fact want to be gold, making their papers available under a permissive Creative Commons license. An example is Usenix.

Among the non-golden publishers, the greenest are probably the non-commercial ones, such as IEEE and ACM: They require simple conditions that are usually easy to meet.

The ACM, “the world’s largest educational and scientific computing society”, claims to be among the “greenest” publishers. Based on their tolerant attitude towards self-archiving of post-prints this may be somewhat justified. Furthermore, their Authorizer mechanism permits setting up free access to the publisher’s version.

But greenest is gold. So I look forward to the day the ACM follows its little sister Usenix in a full embrace of golden open access.

Should I use ACM Authorizer for Self-Archiving?

The ACM offers the Authorizer mechanism to provide free access to the Publisher’s Version of a paper, which only works from one user-specified URL. For example, I can use it to create a dedicated link from my institutional paper page to the publisher’s version.

However, Authorizer links cannot be accessed from other pages, and there is no point in emailing or tweeting them. Since only one Authorizer link can exist per paper, I cannot use an Authorizer link for both my institutional repository and the repository of my funding agency.

These restrictions on Authorizer links make them unsuitable as a replacement for self-archiving (let alone as a replacement for golden open access).

As a conference organizer, can I mandate Green Open Access?

Green open access is self-archiving, giving the authors the permission to archive their own papers.

As a conference organizer working with a non-open-access publisher (ACM, IEEE, Springer-Verlag), you are not allowed to archive and distribute all the papers of the conference yourself.

OOPSLA program with DOI and preprint link

What several conferences do instead, though, is collect links to pre- or post-prints. For example, the online program of the recent OOPSLA 2016 conference has links to both the publisher’s version (through a DOI) and to an author-provided post-print.

For OOPSLA, the authors of 20 out of the 52 papers (38%) provided such a link to their paper, a number that is similar in other conferences adopting such preprint linking.

As a conference organizer, you can do your best to encourage authors to submit their pre-print links. Or you can use your influence in the steering committee to push the conference to switch to an open access publisher, such as LIPIcs or Usenix.

As an author, you can help by actually offering a link to your pre-print.

What does Green Open Access cost?

For authors, green open access typically costs no money. University repositories, arXiv, and PeerJ Preprints are all free to use.

It does cost (a bit of) effort though:

  • You need to find out the specific conditions under which the publisher of your current paper permits self-archiving.
  • You need to actually upload your paper to some repository, provide the correct meta-data, and meet the publisher’s constraints.

The fact that open access is free for authors does not mean that there are no costs involved. For example, the money to keep arXiv up and running comes from a series of sponsors, including TU Delft.

Should I adopt Green Open Access?

Yes.

Better availability of your papers will help you in several ways:

  • Impact in Research: Other researchers can access your papers more easily, increasing the chances that they will build upon your results in their work;
  • Impact in Practice: Practitioners may be interested in using your results: A pay-wall is an extra and undesirable impediment for such adoption;
  • Improved Results: Increased usage of your results in either industry or academia will put your results to the real test, and will help you improve your results.

Besides that, (green) open access is a way of delivering to the tax payers what they paid for: Your research results.

Where can I learn more about Green Open Access?

Useful resources include:


Version history:

  • 6 November 2016: Version 0.1, Initial version, call for feedback.
  • 14 November 2016: Version 0.2, update on commercial repositories.
  • 18 November 2016: Version 0.3, update on ACM Authorizer.
  • 20 November 2016: Version 0.4, added TOC, update on commercial repositories.
  • 06 December 2016: Version 0.5, updated information on ACM and IEEE.
  • 20 December 2016: Version 0.6, added info on Creative Commons and AI venues.

Acknowledgments: I thank Moritz Beller (TU Delft) and Dirk Beyer (LMU Munich) for valuable feedback and corrections.

© Arie van Deursen, November 2016. By the end of December 2016 this FAQ will be licensed under CC BY-SA 4.0.

Some Research Paper Writing Recommendations

Last week, I received an email from Alex Orso and Sebastian Uchitel, who had been asked to give a talk on “How to get my papers accepted at top SE conferences” at the Latin American School on Software Engineering. Here’s their question:

We hope you can spare a few minutes to share with us the key recommendations you would give to PhD students that have not yet had successful submissions to top software engineering conferences, such as ICSE.

An interesting request, and you can see the advice of my fellow researchers in a presentation by Alex Orso.

When working with my students on papers, I must admit I sometimes repeat myself. Below are some of the things I hear myself say most often.

Explain the Innovation

The first thing to keep in mind is that a research paper should explain the innovation. This makes it quite different from a text book chapter, or from a hands-on tutorial. The purpose is not to explain a technique so that others can use it. Instead, the purpose of a research paper is to explain what is new about the proposed technique.

Identify the Contributions

Explaining novelty is driven by contributions. A contribution is anything the world did not know before this paper, but which we now do know thanks to this paper.

I tend to insist on an explicit list of contributions, which I usually put at the end of the introduction.

“The contributions of this paper are …”

Each contribution is an outcome, not the process of doing something. Contributions are things, not effort. Thus, “we spent 6 months manually analyzing 500,000 commit messages” is not a contribution. This effort, though, hopefully has resulted in a useful contribution, which may be that “for projects claiming to do test-driven development, to our surprise we found that 75% of the code commits are not accompanied by a commit in the test code.”

Usually, when thinking about the structure of a paper, quite a bit of actual research has been done already. It is then important to reassess everything that has been done, in order to see what the real contributions of the research are. Contributions can include a new experimental design, a novel technique, a shared data set or open source tool, as well as new empirical evidence contradicting, confirming, or enriching existing theories.

Structure the Paper

With the contributions laid out, the structure of the paper appears naturally: Each contribution corresponds to a section.

This does not hold for the introductory and concluding sections, but it does hold for each of the core sections.

Furthermore, it is essential to separate background material from your own contributions. Clearly, most papers will rely on existing theories or techniques. These must be explained. Since the goal of the paper is to explain the innovation, all material that is not new should be clearly isolated. In this way, it is easiest for the reader (and the reviewer) to see what is new, and what is not new, about this paper.

As an example, take a typical structure of a research paper:

  1. Introduction
  2. Background: Cool existing work that you build upon.
  3. Problem statement: The deficiency you spotted
  4. Conceptual solution: A new way to deal with that problem!
  5. Open source implementation: Available for everyone!
  6. Experimental design for evaluation: Trickier than we thought!
  7. Evaluation results: It wasn’t easy to demonstrate, but yes, we’ve good evidence that this may work!
  8. Discussion: What can we do with these results? Promising ideas for future research or applications? And: a critical analysis of the threats to the validity of our results.
  9. Related work
  10. Concluding remarks.

In such a setup, sections 4-7 can each correspond to a contribution (and sometimes to more than one). The discussion section (8) is much more speculative, and usually does not contribute solid new knowledge.

Communicate the Contributions

Contributions do not just help in structuring a paper.

They are also the key aspect program committees look at when deciding about acceptance of a paper.

When reviewers evaluate a paper, they try to identify and interpret the contributions. Are these contributions really new? Are they important? Are they trivial? Did the authors provide sufficient evaluations for their claims? The paper should help the reviewer, by being very explicit about the contributions and the claims to fame of these contributions.

When program committee members discuss a paper, they do so in terms of contributions. Thus, contributions should not just be strong, they should also be communicable.

For smaller conferences, it is safe to assume that all reviewers are experts. For large conferences, such as ICSE, the program committee is broad. Some of the reviewers will be genuine experts on the topic of the paper, and these reviewers should be truly excited about the results. Other reviewers, however, will be experts in completely different fields, and may have little understanding of the paper’s topic. When submitting to prestigious yet broad conferences, it is essential to make sure that any reviewer can understand and appreciate the contributions.

The ultimate non-expert is the program chair. The chair has to make a decision on every paper. If the program chair cannot understand a paper’s contributions, it is highly unlikely that the paper will get accepted.

Share Contributions Early

Getting a research paper, including its contributions, right, is hard. Especially since contributions have to be understandable by non-experts.

Therefore, it is crucial to offer help to others, volunteering to read preliminary drafts of papers and assessing the strength of their contributions. In return, you’ll have other people, possibly non-experts, assess the drafts you are producing, in this way helping each other to publish a paper at this prestigious conference.

But wait. Isn’t helping others a bad idea for highly competitive conferences? Doesn’t it reduce one’s own chances?

No. Software engineering conferences, including ICSE and FSE, accept any paper that is good. Such conferences do not work with acceptance rates that are fixed in advance. Thus, helping each other may increase the acceptance rate, but will not negatively affect any author.

Does This Help?

I hope some of these guidelines will be useful to “PhD students that have not yet had successful submissions to top software engineering conferences, such as ICSE.”

A lot more advice is available on the Internet on how to write a research paper. I do not have a list of useful resources available at the time of writing, but perhaps in the near future I will extend this post with useful additional pointers.

Luckily, this post is not a research paper. None of the ideas presented here is new. But they have worked for me, and I hope they’ll work for you too.


Image credits: Pencils in the Air, by Peter Logan, Photo by Mira66. flickr

Green Open Access and Preprint Linking

ICSE 2013

One of the most useful changes to the ICSE International Conference on Software Engineering this year was that the program website contained links to preprints of many of the papers presented.

As ICSE is a large event (over 1600 people attended in 2013), it is worth taking a look at what happened. What is preprint linking? How many authors actually provided a preprint link? What about other conferences? What are the wider implications for open access publishing in software engineering?

Self-Archiving

Preprint linking is based on the idea that authors, who do all the work in both writing and formatting the paper, have the right to self-archive the paper they created themselves (also called green open access). Authors can do this on their personal home page, in institutional repositories of, e.g., the universities where they work, or in public preprint repositories such as arXiv.

Sharing preprints has been around in science for decades (if not ages): As an example, my ‘alma mater’ CWI was founded in 1947, and has a technical report series dating back to that year. These technical reports were exchanged (without costs) with other mathematical research institutes. First by plain old mail, then by email, later via ftp, and now through http.

While commercial publishers may dislike the idea that a free preprint is available for papers they publish in their journals or conference proceedings, 69% of the publishers do in fact allow (some form of) self-archiving. For example, ACM, IEEE, Springer, and Elsevier (the publishers I work most with) explicitly permit it, albeit always under specific conditions. These conditions can usually be met, and include such requirements as providing a note that the paper has been accepted for publication, a pointer to the URL where the published article can be found, and a copyright notice indicating the publisher now owns the copyright.

Preprint links as shown on ICSE site.

Preprint Linking

All preprint linking does is ask authors of accepted conference papers whether they happen to have a link to a preprint available. If so, the conference web site will include a link to this preprint in its program as listed on its web site.

For ICSE, doing full preprint linking at the conference site was proposed and conducted by Dirk Beyer, after an earlier set of preprint links was collected on a separate github gist by Adrian Kuhn.

Dirk Beyer runs Conference Publishing Consulting, the organization hired by ICSE to collect all material to be published, and get it ready for inclusion in the ACM/IEEE Digital Libraries. As part of this collection process, ICSE asked the authors to provide a link to a preprint, which, if provided, was included in the ICSE online program.

The ICSE 2013 proceedings were published by IEEE. In their recently updated policy, they indicate that “IEEE will make available to each author a preprint version of that person’s article that includes the Digital Object Identifier, IEEE’s copyright notice, and a notice showing the article has been accepted for publication.” Thus, for ICSE, authors were provided with a possibility to download this version, which they then could self-archive.

Preprints @ ICSE 2013

With a preprint mechanism set up at ICSE, the next question is how many researchers actually made use of it. Below are some statistics I collected from the ICSE conference site:

Track / Conference    #Papers presented    #Preprints    Percentage
Research Track                       85            49           57%
ICSE NIER                            31            19           61%
ICSE SEIP                            19             6           31%
ICSE Education                       13             3           23%
ICSE Tools                           16             7           43%
MSR                                  64            36           56%
Total                               228           120           53%

In other words, a little over half of the authors (53%) provided a preprint link. And, almost half of the authors decided not to.
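For readers who want to recheck the arithmetic, here is a minimal Python sketch that recomputes the shares from the per-track counts in the table above. The function name `preprint_share` is my own choice for illustration. Several per-track percentages in the table appear to be rounded down, so the recomputed values can differ from the table by up to one point.

```python
# Counts copied from the table above: (papers presented, preprint links).
tracks = {
    "Research Track": (85, 49),
    "ICSE NIER": (31, 19),
    "ICSE SEIP": (19, 6),
    "ICSE Education": (13, 3),
    "ICSE Tools": (16, 7),
    "MSR": (64, 36),
}


def preprint_share(papers, preprints):
    """Return the share of papers with a preprint link, as a percentage."""
    return 100.0 * preprints / papers


# Sum the two columns to obtain the "Total" row.
total_papers = sum(papers for papers, _ in tracks.values())
total_preprints = sum(preprints for _, preprints in tracks.values())

for name, (papers, preprints) in tracks.items():
    print(f"{name}: {preprint_share(papers, preprints):.1f}%")
print(f"Total: {total_preprints}/{total_papers} = "
      f"{preprint_share(total_papers, total_preprints):.1f}%")
```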

I hope and expect that for upcoming ICSE conferences, more authors will submit their preprint links. As a comparison, at the recent FORTE conference, 75% of the authors submitted a preprint link.

For ICSE, this year was the first time preprint linking was available. Authors may not have been familiar with the phenomenon, may not have realized in advance how wonderful a program with links to freely available papers is, may have missed the deadline for submitting the link, or may have missed the email asking for a link altogether as it ended up in their spam folder. And, in all honesty, even I managed to miss the opportunity to send in my link in time for some of my MSR 2013 papers. But that won’t happen again.

Preprint Link Sustainability

An issue of some concern is the “sustainability” of the preprint links — what happens, for example, to homepages with preprints once the author changes jobs (once the PhD student finishes)?

The natural solution is to publish preprints not just on individual home pages, but to submit them to repositories that are likely to have a longer lifetime, such as arXiv, or your own technical report series.

An interesting route is taken by ICPC, which instead of preprint links simply provides a dedicated preprint search on Google Scholar, with authors and title already filled in. If a preprint has been published somewhere, and the author/title combination is sufficiently unique, this works remarkably well. MSR uses a mixture of both approaches, by providing a search link for presentations for which no preprint link was provided.
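Such a pre-filled search link is easy to generate. The Python sketch below is only an illustration of the idea: the helper name `scholar_search_url` and the query layout (quoted title followed by the author names) are my assumptions, and the links ICPC and MSR actually generate may be built differently.

```python
from urllib.parse import urlencode


def scholar_search_url(title, authors):
    """Build a Google Scholar search URL for a paper's title and authors.

    Assumption: a plain query consisting of the quoted title followed by
    the author names; the query format used by a conference may differ.
    """
    query = f'"{title}" ' + " ".join(authors)
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


# Hypothetical paper, for illustration only.
print(scholar_search_url("An Example Paper Title", ["A. Author", "B. Writer"]))
```

Quoting the title keeps the search specific, which matters because the approach only works when the author/title combination is sufficiently unique.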

Implications

Open access, and hence preprint publishing, is of utmost importance for software engineering.

Software engineering research is unique in that it has a potentially large target audience of developers and software engineering practitioners that is online continually. Software engineering research cannot afford to dismiss this audience by hiding research results behind paywalls.

For this reason, it is inevitable that in the long run, software engineering researchers will transform their professional organizations (ACM and IEEE) so that their digital libraries will make all software engineering results available via open access.

Irrespective of this long term development, the software engineering research community must hold on to the new preprint linking approach to leverage green open access.

Thus:

  1. As an author, self-archive your paper as a preprint or technical report. Consider your paper unpublished if the preprint is not available.
  2. If you are a professor leading a research group, inspire your students and group members to make all of their publications available as preprint.
  3. If you are a reviewer for a conference, insist that your chairs ensure that preprint links are collected and made available on the conference web site.
  4. If you are a conference organizer or program chair, convince all authors to publish preprints, and make these links permanently available on the conference web site.
  5. If you are on a hiring committee for new university staff members, demand that candidates have their publications available as preprints.

Much of this has been possible for years. Maybe one of the reasons these practices have not been adopted in full so far, is that they cost some time and effort — from authors, professors, and conference organizers alike — time that cannot be used for creative work, and effort that does not immediately contribute to tenure or promotion. But it is time well spent, as it helps to disseminate our research to a wider audience.

Thanks to the ICSE move, there now may be momentum to make a full swing transition to green open access in the software engineering community. I look forward to 2014, when all software engineering conferences will have adopted preprint linking, and 100% of the authors will have submitted their preprint links. Let us not miss this unique opportunity.

Acknowledgments

I am grateful to Dirk Beyer, for setting up preprint linking at ICSE, and for providing feedback on this post.

Update (Summer 2013)

David Notkin on Why We Publish

This week David Notkin (1955-2013) passed away, after a long battle against cancer. He was one of my heroes. He did great research on discovering invariants, reflexion models, software architecture, clone analysis, and more. His 1986 Gandalf paper was one of the first I studied when starting as a PhD student in 1990.

In December 2011, David sent me an email in which he expressed interest in doing a sabbatical in our TU Delft Software Engineering Research Group in 2013-2014. I was as proud as one can be. Unfortunately, half a year later he had to cancel his plans due to his health.

David was loved by many, as he had a genuine interest in people: developers, software users, researchers, you. And he was a great (friendly and persistent) organizer — 3 weeks ago he still answered my email on ICSE 2013 organizational matters.

In February 2013, he wrote a beautiful editorial for the ACM Transactions on Software Engineering and Methodology, entitled Looking Back. His opening reads: “It is bittersweet to pen my final editorial”. Then David continues to address the question why it is that we publish:

“… I’d like very much for each and every reader, contributor, reviewer, and editor to remember that the publications aren’t primarily for promotions, or for citation counts, or such.

Rather, the intent is to make the engineering of software more effective so that society can benefit even more from the amazing potential of software.

It is sometimes hard to see this on a day-to-day basis given the many external pressures that we all face. But if we never see this, what we do has little value to society. If we care about influence, as I hope we do, then adding value to society is the real measure we should pursue.

Of course, this isn’t easy to quantify (as are many important things in life, such as romance), and it’s rarely something a single individual can achieve even in a lifetime. But BHAGs (Big Hairy Audacious Goals) are themselves of value, and we should never let them fade far from our minds and actions.”

Dear David, we will miss you very much.


See also:


Desk Rejected

One of the first things we did after all NIER 2013 papers were in, was identifying papers that should be desk rejected. What is a desk reject? Why are papers desk rejected? How often does it happen? What can you do if your paper is desk rejected?

A desk reject means that the program chairs (or editors) reject a paper without consulting the reviewers. This is done for papers that fail to meet the submission requirements, and which hence cannot be accepted. Filtering out desk rejects in advance is common practice for both conferences and journals.

To identify such desk rejects for NIER 2013, program co-chair Sebastian Elbaum and I made a first pass through all 160+ submissions. In the end, we desk rejected around 10% of the submissions (a little more than I had anticipated).

Causes for reject included problems in:

  • Formatting: The paper does not meet the 4 page limit;
  • Scope: The paper is not about software engineering;
  • Presentation: The paper contains, e.g., too many grammatical problems;
  • Innovation: The paper does not explain how it builds upon and extends the existing body of knowledge.

Of these, for NIER the formatting was responsible for half of the desk rejects.

Plagiarism
A potential cause that we did not encounter is plagiarism (fraud), or its special form self-plagiarism (submitting the same, or very similar, papers to multiple venues).

In my experience, plain plagiarism is not very common (I encountered one case in another conference, where we had to apply the IEEE Guidelines on Plagiarism).

Self-plagiarism is a bigger problem as it can range from copy-pasting a few paragraphs from an earlier paper to a straight double submission. While the former may be acceptable, the latter is considered a cardinal sin (your paper will be rejected at both venues, and reviewers don’t like reviewing a paper that cannot be accepted). And there are many shades of grey in between.

Notifications
We sent out notifications to authors of desk rejected papers within a few days after the submission deadline (it took a bit of searching to figure out that the best way to do this is to use the delete paper option from EasyChair). Thus, desk rejects not only serve to reduce the reviewing load of the program committee, but also to provide early feedback to authors whose papers just cannot make it.

Is there anything you can do to avoid being desk rejected?
The simple advice is to carefully read the submission guidelines. Besides that, it may be wise to submit a version adhering to all criteria early on when there is no immediate deadline stress yet. This may then serve as a fallback in case you mess up the final submission (uploading, e.g., the wrong pdf). Usually chairs have access to these earlier versions, and they can then decide to use the earlier version in case (only) the final version is a clear desk reject (for NIER this situation did not occur).

Is there anything you can do after being desk rejected?
Usually not. Most desk rejects are clear violations of submission requirements. If you think your desk reject is based on subjective grounds (presentation, innovation), and you strongly disagree, you could try to contact the chairs to get your paper into the reviewing phase anyway. The most likely outcome, however, will still be a reject, so it may not be in your self-interest to postpone this known outcome.

Submission times
And … are desk rejects related to the paper submission time? Yes, there is a (mild) negative correlation: For NIER, there were more desk rejects in the earlier than in the later submissions. My guess is that this is quite common. There seem to be authors who simply try their same pdf at multiple conferences, hoping for an easy conference with little reviewing only.

Acceptance rates
This brings me to the final point. Conferences are commonly ranked based on their acceptance ratio. The lower the percentage of accepted papers, the more prestigious the conference is considered. The most interesting figure is obtained if acceptance rates are based on the serious competition only, i.e., the subset of papers that made it to the reviewing phase. Desk rejected papers do not qualify as such, and hence should not be taken into account when computing conference acceptance rates.
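To see how much the choice of denominator matters, consider the following small Python sketch. The numbers are hypothetical: 160 submissions and 16 desk rejects only loosely echo the approximate NIER figures mentioned above, and the count of 30 accepted papers is invented purely for illustration. The function name `acceptance_rate` is likewise my own.

```python
def acceptance_rate(accepted, submitted, desk_rejected=0):
    """Acceptance rate over the papers that reached the reviewing phase."""
    reviewed = submitted - desk_rejected
    if reviewed <= 0:
        raise ValueError("no papers reached the reviewing phase")
    return accepted / reviewed


# Hypothetical numbers: 160 submissions, 16 desk rejects, 30 accepted.
naive = acceptance_rate(30, 160)
adjusted = acceptance_rate(30, 160, desk_rejected=16)
print(f"all submissions: {naive:.1%}, reviewed only: {adjusted:.1%}")
```

Excluding desk rejects shrinks the denominator, so the reported acceptance rate goes up: the conference looks slightly less selective, but the figure better reflects the serious competition.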