Europe’s Open Access “Plan S” and Paper Publishing in Software Engineering Research

Posted in Society by Arie van Deursen

A year ago, more than a dozen influential research funders in Europe launched Plan S. This plan poses, from 2021 onwards, strict requirements on open access publishing of any research funded through the Plan S coalition. To understand what this means for my field of research, software engineering, I did some data collection. My data suggests that 14% (one out of seven) of the published papers are affected, meaning that conferences may lose 14% of their papers, unless publishers take action.

Plan S in a Nutshell

Plan S is an initiative launched by:

The European Union, which runs the Horizon Europe program of €100 billion (over 113 billion US dollars). It is the successor to H2020, and includes funding for the prestigious personal grants of the European Research Council (ERC).
Twelve national research funding organizations, from various European countries, such as The Netherlands (where I live), the United Kingdom, and Austria.

The aim of these Plan S “funders” (collectively called “Coalition S”) is that:

With effect from 2021, all scholarly publications on the results from research funded by public or private grants provided by national, regional and international research councils and funding bodies, must be published in Open Access Journals, on Open Access Platforms, or made immediately available through Open Access Repositories without embargo.

The coalition has taken an axiomatic approach to expressing its plans, starting with 10 principles, followed by a Guidance to the Implementation. The result is a somewhat hard to understand document, in which there are multiple ways to become Plan S compliant.

In all forms of Plan S compliance the Creative Commons license plays a key role. As Plan S (under the header Rights and Licensing) puts it:

The public must be granted a worldwide, royalty-free, non-exclusive, irrevocable license to share (i.e., copy and redistribute the material in any medium or format) and adapt (i.e., remix, transform, and build upon the material) the article for any purpose, including commercial, provided proper attribution is given to the author.

This, thus, corresponds to the Creative Commons Attribution license, also known as CC BY. Note that this is a very generous license, essentially allowing anyone to do anything with the paper. Traditionally, publishers do not like this, as they wish to keep exclusive control over who distributes the paper.

Strictly speaking, Plan S does not require CC BY per se, but authors need to ask permission for any other license. For the CC BY-SA “Share-Alike” variant of the license permission will be granted automatically, but for CC BY-ND “No Derivatives” permission needs to be asked. Coalition S explicitly indicates that CC BY-NC “Non-Commercial” is not allowed:

We will not accept a Non-Commercial restriction on the re-use of research results.

Given this CC BY starting point, Plan S distinguishes three routes to compliance:

Open access venues: The conference or journal is gold open access, meaning all papers in it are freely available. This is “the ideal” case, from Plan S perspective, and compliant. Open access fees (“Article Processing Charges”) are common in this route, and will be refunded by Coalition S.
Subscription-based venues: These by themselves are non-compliant, but can be made compliant if the author immediately (no embargo) deposits the Author’s Accepted Manuscript (AAM) in a compliant repository with a CC BY license. This license is a complicating factor, since many publishers restrict redistribution of self-archived papers (only the authors themselves may archive them, which is at odds with the sharing principle of CC BY). If such restrictions exist, a way out is available if the venue permits hybrid open access, in which authors can pay an extra fee to make their own article available as open access under a CC BY license. This model is offered by many publishers, but not by all. Note, however, that while this route is “compliant”, Plan S does not refund the APC fees.
Subscription venues in transition: If the conference or journal is not open access yet, but in transition towards a full open access model by 2024, the publisher and Plan S can agree on “transformative arrangements”. In this case the paper will be compliant, and if there are fees involved they will (likely) be covered.
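The three routes above can be read as a small decision procedure. The sketch below is one possible reading, not an official Coalition S algorithm: the `Venue` fields and the `plan_s_route` function are names invented here for illustration, and the "fees covered" flag for venues in transition simplifies the "likely" in the text to a plain yes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class VenueKind(Enum):
    GOLD_OPEN_ACCESS = "gold open access"
    SUBSCRIPTION = "subscription"
    IN_TRANSITION = "subscription, in transition"


@dataclass(frozen=True)
class Venue:
    """Hypothetical description of a venue; field names are assumptions."""
    kind: VenueKind
    # Publisher lets authors deposit the AAM with CC BY and without embargo.
    allows_cc_by_self_archiving: bool = False
    # Authors can pay a per-article fee for CC BY open access (hybrid).
    offers_hybrid_cc_by: bool = False


def plan_s_route(venue: Venue) -> Tuple[bool, str, Optional[bool]]:
    """Return (compliant, route, fees_covered).

    fees_covered is None when the route involves no fee at all.
    """
    if venue.kind is VenueKind.GOLD_OPEN_ACCESS:
        return True, "open access venue", True
    if venue.kind is VenueKind.IN_TRANSITION:
        # The text says fees will "likely" be covered; treated as True here.
        return True, "transformative arrangement", True
    if venue.allows_cc_by_self_archiving:
        return True, "self-archiving with CC BY", None
    if venue.offers_hybrid_cc_by:
        # Compliant, but the APC is not refunded by Coalition S.
        return True, "hybrid open access plus repository deposit", False
    return False, "no compliant route", None
```

For example, a subscription venue that offers neither CC BY self-archiving nor a hybrid option, written as `Venue(VenueKind.SUBSCRIPTION)`, comes out as non-compliant.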

The 10 principles also address other issues relevant to open access: they require that “the structure of fees must be transparent” (principle 05, suggesting that some of the current article processing charges are inexplicably high), and warn that the funders will monitor compliance and sanction non-compliant beneficiaries/grantees (principle 09, a direct threat to me).

Plan S should start in 2021, although publishers can earn some extra time by participating in the above-mentioned “transformative arrangements”.

Plan S Compliance in Software Engineering Research

To understand whether Plan S compliant publishing in my area of research, software engineering, is possible at the moment, I looked at the top 20 venues in the area of Software Systems, according to Google Metrics.

In these top 20 venues, just three are gold open access: POPL and OOPSLA, both published by ACM SIGPLAN, and ETAPS TACAS, published by Springer. It is in these venues that authors funded through Coalition S can safely publish, following the gold open access route to compliance. Their open access fees will be covered by the Coalition S funders.

The remaining 17 are closed access subscription venues, published by ACM, IEEE, Elsevier and Springer. Authors who wish to publish there, and who need to be compliant with Plan S, would then have to resort to the self-archiving route.

Since the self-archiving constraints of these four publishers do not permit the use of CC BY without a fee, the hybrid route applies, in which (1) authors pay a fee; (2) the publisher distributes with CC BY; and (3) the author shares on a Plan S compliant repository. Note that this route is compliant, but that the fee is not refunded by Coalition S.

This self-archiving route works for IEEE journals, but not for IEEE conferences. This is because for IEEE conferences presently authors do not have the option to pay a fee to publish just their own paper open access (unlike ACM). As stated by IEEE in their FAQ on the “IEEE Policy Regarding Authors Rights to Post Accepted Versions of Their Articles”:

Currently IEEE does not have an Open Access program for conference articles.

In other words: Conferences published by IEEE are not Plan S compliant, not even with the green open access route (as IEEE does not permit CC BY).

Of the 20 venues, IEEE is the sole publisher of two conferences (ICSME and SANER), one magazine (IEEE Software), and the co-publisher of another three (ICSE, ASE, MSR) which are published alternatingly by IEEE or ACM.

In summary, of the 20 top venues:

Three are compliant through gold open access.
Eleven are compliant through a fee-based hybrid model with CC BY.
Three are half of the time compliant through a fee-based hybrid model with CC BY, the other half non-compliant.
Three can presently not be made compliant.
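As a quick sanity check on the summary above, the four categories can be tallied and turned into shares of the 20 venues. This is only a sketch: the category labels are shorthand chosen here, and the counts are copied from the list.

```python
# Venue counts per category, copied from the summary above.
summary = {
    "gold open access": 3,
    "fee-based hybrid with CC BY": 11,
    "hybrid half of the time": 3,
    "cannot be made compliant": 3,
}


def shares(counts):
    """Map each category to its fraction of all venues."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


total_venues = sum(summary.values())
for category, share in shares(summary).items():
    print(f"{category}: {summary[category]} of {total_venues} ({share:.0%})")
```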

Note that other fields may fare better: top conferences in security (Usenix), AI (AAAI, NIPS), and programming languages (OOPSLA/POPL/ICFP, sponsored by SIGPLAN) are all full gold open access. This, however, seems the exception rather than the rule.

Plan S Rationale

With Plan S requiring many publishers to change their policies, one may wonder what the rationale behind this plan is. The way I see it, the key reason for the European funders to propose this plan is leverage, in the following ways:

The European Union as a whole will benefit more from their €100 billion investment, if any (European) citizen can freely access the resulting knowledge;
Research is never conducted in isolation. Progress in research is not just visible in papers directly funded through a project, but also in subsequent papers building on top of those results (refuting, strengthening, criticizing, or expanding them). The more venues are open access, the higher the chance that these follow up results are also published as open access.
The universities in the European Union together will benefit financially if the publishing market shifts towards open access: The current profit margins of up to 40% of publishing giants like Elsevier are a waste of tax payer money that instead should be directly invested in research and education, the exact same causes that the EU and its Horizon Europe program seeks to advance as well. Pumping €100 billion into a system that wastes money at scale is ineffective.

Furthermore, note that this coalition works in all areas of research, including climate change, health care, and artificial intelligence. From the European perspective, the world needs informed societal debate about these topics. To that end, the EU is committed to maximizing the free availability of any research it is funding.

Last but not least, Coalition S is working hard to expand the list of funders, talking to both China and India, for example. Also, Jordan and Zambia have already joined, as well as the Bill and Melinda Gates Foundation (though their presence in computer science research is limited, compared to, e.g., China).

Impact on Software Engineering Conferences

With software engineering venues so clearly affected by Plan S, the next question is how many papers will be affected. Thus, I decided to collect some data, to measure the impact of Plan S in my field.

Since conferences (with full length rigorously reviewed papers) are dominant in software engineering, I focused on these. I picked two editions of ICSE and ESEC/FSE (for which I am a member of the steering committee) and of the smaller and more specialized ISSTA conference (which I happened to attend this summer).

For each published paper, I manually checked the acknowledgments to see whether the authors were beneficiaries from any of the Plan S funders. I did this for the main (technical research) track papers only, and not for, e.g., demonstration sub-tracks.

The results (also available as spreadsheet) are as follows:

A few results stand out:

Overall, 14% (1 in 7) of the papers currently receive grants from Coalition S.
The two big conferences, ICSE (over 1000 participants) and ESEC/FSE (over 300 participants), exhibit an impact on around 11-12% of the papers.
For the smaller ISSTA conference, more than 25% of the papers are (co-)funded through Coalition S. This number reflects the composition of the community, and the impact is enlarged by the small total number of papers. Should the affected researchers decide not to submit to ISSTA anymore, this may constitute an existential threat to the conference.
The EU is by far the biggest funder, with researchers and industry from many countries benefiting from participation in large EU projects. Furthermore, the EU ERC (Advanced) Grants are extremely prestigious (€2.5 million) and have been won by leaders in the software engineering field such as (in the collected data) Carlo Ghezzi, Mark Harman, Bashar Nuseibeh, and Andreas Zeller.
The UK is the second biggest funder, mostly through its EPSRC program. This is the UK’s national program, unrelated to the European Union. Thus, EPSRC’s participation in Coalition S will not be affected by Brexit (apart from increased financial pressure on EPSRC’s overall budget as the UK’s economy is shrinking).
While a small country with limited funds, Luxembourg is very active in the area of software engineering, causing high impact for, e.g., the ISSTA conference.

The 14% I found is substantially higher than the estimate of 6% impact found by Clarivate Analytics (cited by the ACM), and the 5% found by the ACM itself. If anything, this difference of a factor 3, or even a factor 5 for ISSTA, calls for a detailed assessment for each venue affected.
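The factors mentioned here follow directly from dividing the percentages. A minimal sketch, using only the figures quoted in this post (the ISSTA value is the lower bound of "more than 25%", and the constant and function names are invented for illustration):

```python
# Shares of affected papers, in percent.
OVERALL = 14    # overall figure from the data in this post
ISSTA = 25      # lower bound ("more than 25%")
CLARIVATE = 6   # estimate by Clarivate Analytics
ACM = 5         # estimate by the ACM itself


def factor(mine, theirs):
    """How many times larger one estimate is than another."""
    return mine / theirs


print(f"overall vs Clarivate: {factor(OVERALL, CLARIVATE):.1f}x")
print(f"overall vs ACM:       {factor(OVERALL, ACM):.1f}x")
print(f"ISSTA vs ACM:         {factor(ISSTA, ACM):.1f}x")
```

The "factor 3" is thus a rounded 2.8 (14% against the ACM's 5%), and the "factor 5" compares the ISSTA lower bound with that same 5%.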

My data is based on what I saw in the acknowledgments: In reality it is likely that more papers are affected. You can check your own papers in my on line spreadsheet — corrections are welcome.

Collecting the data took me around a minute per paper. You are cordially invited to repeat this exercise for your own favorite conference or journal (TSE, EMSE, JSS, MSR, ICSME, RE, MODELS, …), and I will do my best to reflect your data in this post. If you’re a conference organizer, the safest thing to do is survey authors about their funding, enquiring about Coalition S based funding explicitly.

There is another point to be made that required little data gathering.

The 14% figure relates to impact on the conference. Individual researchers can be affected much more. Our group at TU Delft, for example, has been very successful in attracting substantial funding both from the EU and from the Dutch NWO. As a consequence, for me personally, half of my publications will be affected. For some new PhD students starting in my group funded on such projects all publications will be affected.

A Call for Action

Clearly, the impact of Plan S can be substantial, on individual researchers as well as on conferences and journals.

This calls for action.

ACM, as one of the leading publishers in computer science, shared an update on their Plan S progress in their July 2019 newsletter. It states:

It is worth noting that ACM has been working with various consortia in the US, Europe, and elsewhere on a framework for transitioning the traditional ACM Digital Library licensing (subscription) model to a Gold Open Access model utilizing an innovative “transformative agreement” model. More details will be announced later in 2019 as the first of these Agreements are executed; once these are in place, all ACM Publications will comply with the majority of Plan S requirements.

This is good news, and certainly not a simple undertaking. I sincerely hope that ACM will be able to meet not just the majority, but all requirements, and for all conferences and journals. This essentially implies a change of business model for the ACM Digital Library, from a subscription based to an author-(institution)-pays model. This in itself will not be easy, and is further complicated by several constraints and strong criteria imposed by Plan S, for example concerning cost transparency. The key challenge will be to convince Coalition S that these criteria are indeed met.

The ACM Special Interest Group on Programming Languages, SIGPLAN, meanwhile, sets an example on how to progress within the current setting. The research papers of three of its key conferences are published as part of the Proceedings of the ACM in Programming Languages. This is a Gold Open Access journal in which different volumes are devoted to different conferences. The POPL, OOPSLA, and ICFP conferences have adopted this model, and hence are fully open access. To quote the Inaugural Editorial Message by Philip Wadler:

PACMPL is a Gold Open Access journal. It will be archived in ACM’s Digital Library, but no membership or fee is required for access. Gold Open Access has been made possible by generous funding through ACM SIGPLAN, which will cover all open access costs in the event authors cannot. Authors who can cover the costs may do so by paying an Article Processing Charge (APC). PACMPL, SIGPLAN, and ACM Headquarters are committed to exploring routes to making Gold Open Access publication both affordable and sustainable.

The ACM SIG for Software Engineering, SIGSOFT, so far has not taken action along these lines. Nevertheless, this is simple to do, especially since SIGPLAN has laid out all the ground work.

Furthermore, last year, we as ACM SIGSOFT members elected Tom Zimmermann as our chair. In his statement for the elections he wrote:

We should make gold open access a priority for SIGSOFT

He also provided details on how to achieve this, mostly along the lines of SIGPLAN. By electing him, we as ACM SIGSOFT members gave him the mandate to carry this out. This will not be easy to do, and calls for support from the full software engineering research community to help the ACM SIGSOFT leadership with this important mission.

The other main non-profit society publisher in software engineering is the IEEE. IEEE publishes various conferences and journals in software engineering on its own, such as ICSME, MODELS, RE and ICST. Furthermore, several major conferences are co-sponsored by IEEE and ACM together, such as ICSE and ASE.

Unfortunately, I have not been able to find on line information about IEEE’s vision on Plan S, and its impact on the conference proceedings published by the IEEE. This makes it very unclear what, from 2021 onwards, the publication options are for many software engineering conferences.

Nevertheless, it is my hope that IEEE will embrace Plan S, and move to open access conference proceedings, as many other society publishers have done.

This, then, will open the floor to joint open access publications, for example through the new fully open access “Proceedings of the ACM in Software Engineering”.

Version History

Version 0.4, 20-08-2019. First public version.
Version 0.5, 25-08-2019. Major update to reflect that self-archiving route can also be used to meet Plan S requirements.
Version 0.6, 26-08-2019. Small updates about CC BY options.
Version 0.7, 28-08-2019. Major update about repository route in combination with CC BY and hybrid open access, and transformative arrangements.
Version 0.8, 30-08-2019. Add links to IEEE open access FAQ.
Version 0.9, 04-09-2019. Small typos fixed.

Note: IANAL — use this information at your own risk.

Acknowledgements: Thanks to Diomidis Spinellis, Simon Bains, Jeroen Bosman, Bianca Kramer, and Jeroen Sondervan for feedback on earlier drafts of this post.

License: Copyright (c) Arie van Deursen, 2019. Licensed under CC BY.

Slide Deck

23 Jul 2018

The Battle for Affordable Open Access

Posted in Research by Arie van Deursen

Last week, Elsevier cut off thousands of scientists in Germany and Sweden from reading its recent journal articles, when negotiations over the cost of a nationwide open-access agreement broke down.

In these negotiations, universities are trying to change academic publishing, while publishers are defending the status quo. If you are an academic, you need to decide how to respond to this conflict:

If you don’t change your own behavior, you are choosing Elsevier’s side, helping them maintain the status quo.
If you are willing to change, you can help the universities. The simplest thing to do is to rigorously self-archive all your publications.

The key reason academic publishing needs to change is that academic publishers, including Elsevier, realize profit margins of 30-40%.

Euro bills

To put this number in perspective, consider my university, TU Delft. Our library spends €4-5 million each year on (journal) subscriptions. 30-40% of this amount, €1-2 million each year, ends up directly in the pockets of the shareholders of commercial publishers.
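The €1-2 million range follows from combining the low and high ends of both estimates. A small sketch of that arithmetic, with integer euros and percentages to avoid rounding noise (the function name is made up here):

```python
def margin_range(spend_low, spend_high, margin_low_pct, margin_high_pct):
    """Return the (low, high) yearly amount in euros that ends up as profit."""
    low = spend_low * margin_low_pct // 100
    high = spend_high * margin_high_pct // 100
    return low, high


# Yearly subscription spend of EUR 4-5 million, profit margins of 30-40%.
low, high = margin_range(4_000_000, 5_000_000, 30, 40)
print(f"EUR {low:,} to EUR {high:,} per year")
```

That gives €1.2 million at the low end and €2.0 million at the high end.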

This is unacceptable. My university needs this money: To handle the immense work load coming with ever increasing student numbers, and to meet the research demands of society. A university cannot afford to waste money by just handing it over to publishers.

Universities across Europe have started to realize this. The Dutch, German, French, and Swedish universities have negotiated at the national level with publishers such as Springer Nature, Wiley, Taylor & Francis, Oxford University Press, and Elsevier (the largest publisher). In many cases deals have been made, with more and more options for open access publishing, at prices that were acceptable to the universities.

However, in several cases no deals have been made. The Dutch universities could not agree with the Royal Society of Chemistry Publishing, the French failed with Springer Nature, and now Germany and Sweden could not come to agreement with Elsevier. A common point of contention is that universities are only willing to pay for journal subscriptions if their employees can publish open access without additional article processing charges — a demand that directly challenges the current business model in academic publishing.

The negotiations are not over yet. Both in terms of open access availability and in terms of price publishers are far from where the universities want them to be. And if the universities would not negotiate themselves, tax payers and governments could simply force them, by putting a cap on the amount of money universities are allowed to spend on journal subscriptions.

Universities are likely to join forces, also across nations. They will determine maximum prices, and will not be willing to make exceptions. The negotiations will be brutal, as the publishers have much to lose and much to fight for.

In all these negotiations it is crucial that universities take back ownership of what they produce. Every single researcher can contribute, simply by making all of their own papers available on their institutional (pure.tudelft.nl for my university) or subject repositories (e.g., arxiv.org). This helps in two ways:

It helps researchers cut off (Germans and Swedes as we speak) from publishers in case negotiations fail.
It reduces the publishers’ power in future negotiations as the negative effects of cancellations have been reduced.

This seems like a simple thing to do, and it is: It should not take an average researcher more than 10 minutes to post a paper on a public repository.

Nevertheless, during my two years as department head I have seen many researchers who fail to see the need or take the time to upload their papers. I have begged, prayed, and pushed, wrote a green open access FAQ to address any legal concerns researchers might have, and wrote a step-by-step guide on how to upload a paper.

On top of that, my university, like many others, has made it compulsory for its employees to upload their papers to the institutional repository (this is not surprising since TU Delft plays a leading role in the Dutch negotiations between universities and publishers). Furthermore both national (NWO) and European (H2020, Horizon Europe) funding agencies insist on open access publications.

Despite all this, my department barely meets the university ambition of having 60% of its 2018 publications available as (green or gold) open access. To the credit of my departmental employees, however, they do better than many other departments. Also pre-print links uploaded to conference sites have typically been less than 60%, suggesting that the culture of self-archiving in computer science leaves much to be desired.

If anything, the recent cut off by Elsevier in Sweden and Germany emphasizes the need for self-archiving.

If you’re too busy to self-archive, you are helping Elsevier get rich from public money.

If you do self-archive, you help your university explain to publishers that their services are only needed when they bring true value to the publishing process at an affordable price.

Euro image credit: pixabay, CC0 Creative Commons.

25 Jun 2018

My Last Program Committee Meeting?

Posted in Research by Arie van Deursen

This month, I participated in what may very well have been my last physical program committee (PC) meeting, for ESEC/FSE 2018. In 2017, top software engineering conferences like ICSE, ESEC/FSE, ASE and ISSTA (still) had physical PC meetings. In 2019, these four will all switch to on line PC meetings instead.

I participated in almost 20 of such meetings, and chaired one in 2017. Here is what I learned and observed, starting with the positives:

As an author, I learned the importance of helping reviewers to quickly see and concisely formulate the key contributions in a way that is understandable to the full PC.
As a reviewer I learned to study papers so well that I could confidently discuss them in front of 40 (randomly critical) PC members.
During the meetings, I witnessed how reviewers can passionately defend a paper as long as they clearly see its value and contributions, and how they will kill a paper if it has an irreparable flaw.
I started to understand reviewing as a social process in which reviewers need to be encouraged to change their minds as more information unfolds, in order to arrive at consensus.
I learned phrases reviewers use to permit them to change their minds, such as “on the fence”, “lukewarm”, “not embarrassing”, “my +1 can also be read as a -1”, “I am not an expert but”, etc. Essential idioms to reach consensus.
I witnessed how paper discussions can go beyond the individual paper, and trigger broad and important debate about the nature of the arguments used to accept or reject a paper (e.g. on evaluation methods used, impact, data availability, etc.).
I saw how overhearing discussions of papers reviewed by others can be useful, both to add insight (e.g. additional related work) and to challenge the (nature of the) arguments used.
I felt, when I was PC co-chair, the pressure from 40 PC members challenging the consistency of any decision we made on paper acceptance. In terms of impact on the reviewing process, this may well be the most important benefit of a physical PC meeting.
I experienced how PC meetings are a great way to build a trusted community and make friends for life. I deeply respected the rigor and well articulated concerns of many PC members. And nothing bonds like spending two full days in a small meeting room with many people and insufficient oxygen.

I also witnessed some of the problems:

My biggest struggle was the incredible inefficiency of PC meetings. They take 1-2 days from 8am-6pm, you’re present at discussions of up to 100 papers discussed in 5-10 minutes each, yet participate in often less than 10 papers, in some cases just one or two.
I had to travel long distances just for meetings. Co-located meetings (e.g. the FSE meeting is typically immediately after ICSE) reduce the footprint, but I have crossed the Atlantic multiple times just for a two day PC meeting.
My family paid a price for my absence caused by almost 20 PC meetings. I have missed multiple family birthdays.
The financial burden on the conference (meeting room + 40 x dinner and 80 lunches, €5000) and each PC member (travel and 2-3 hotel nights, adding up easily to €750 per person paid by the PC members) is substantial.
I saw how vocal PC members can dominate discussions, yielding less opportunity for the more timid PC members who need more time to think before they dare to speak.
I hardly attended a PC meeting in which not at least a few PC members eventually had to cancel their trip, and at best participated via Skype. This gives papers reviewed by these PC members a different treatment. As PC chair for ESEC/FSE we had five PC members who could not make it, all for valid (personal, painful) reasons. I myself had to cancel one PC meeting a week before the meeting, when one of my children had serious health problems.
Insisting on a physical PC meeting limits the choice of PC members: When inviting 40 PC members for ESEC/FSE 2017, we had 20 candidates who declined our invitation as they could not commit a year in advance to attending a PC meeting (in Buenos Aires).
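To get a feeling for the total financial burden mentioned in the list above, the conference's costs and the PC members' own costs can be added up. This is a rough sketch that assumes all 40 PC members travel and that each spends the €750 quoted above; the constant and function names are invented here.

```python
CONFERENCE_COST = 5_000   # EUR: meeting room, dinner, and lunches
COST_PER_MEMBER = 750     # EUR: travel and hotel, paid by each PC member
PC_MEMBERS = 40


def total_meeting_cost(members=PC_MEMBERS):
    """Total cost in EUR of one physical PC meeting under the assumptions above."""
    return CONFERENCE_COST + members * COST_PER_MEMBER


print(f"EUR {total_meeting_cost():,}")
```

Under these assumptions a single physical PC meeting costs around €35,000, most of it carried by the PC members themselves.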

Taking the pros and cons together, I have come to believe that the benefits do not outweigh the high costs. It must be possible to organize an on line PC meeting with special actions to keep the good parts (quality control, consistent decisions, overhearing/inspecting each other’s reviews, …).

I look forward to learning from ICSE, ESEC/FSE, ISSTA and ASE experiences in 2019 and beyond about best practices to apply for organizing a successful on line PC meeting.

In principle, ICSE will have on line PC meetings in 2019, 2020, and 2021, after which the steering committee will evaluate the pros and cons.

As ICSE 2021 program co-chairs, Tao Xie and I are very happy about this, and we will do our best to turn the ICSE 2021 on line PC meeting into a great success, for the authors, the PC members, and the ICSE community. Any suggestions on how to achieve this are greatly appreciated.

T-Shirt saying "Last PC Meeting Ever?"

Christian Bird realized the ESEC/FSE 2018 PC meeting may be our last, and realized this nostalgic moment deserved a T-shirt of its own. Thanks!!

5 Jan 2017

Golden Open Access for the ACM: Who Should Pay?

Posted in Research by Arie van Deursen

In a move that I greatly support, the ACM Special Interest Group on Programming Languages (SIGPLAN) is exploring various ways to adopt a truly Golden Open Access model, by rolling out a survey asking your opinion, set up by Michael Hicks. Even though I myself am most active in ACM’s Special Interest Group on Software Engineering (SIGSOFT), I do publish at and attend SIGPLAN conferences such as OOPSLA. And I sincerely hope that SIGSOFT will follow SIGPLAN’s leadership in this important issue.

ACM presently supports green open access (self-archiving) and a concept called “Open TOC” in which papers are accessible via a dedicated “Table of Contents” page for a particular conference. While better than nothing, I agree with OOPSLA 2017 program chair Jonathan Aldrich who explains in his blog post that Golden Open Access is much preferred.

This does, however, raise the question who should pay for making publications open access, which is part of the SIGPLAN survey:

Attendants Pay: Increase the conference fees: SIGPLAN estimates that this would amount to an increase by around $50,- per attendee.
Authors Pay: Introduce Article Processing Charges: SIGPLAN indicates that if a full conference goes open access this would presently amount to $400 per paper.

Note that the math here suggests that the number of registrants is around 8 times the number of papers in the main research track. Also note that it assumes that only papers in the main research track are made open access. A conference like ICSE, however, has many workshops with many papers: It is equally important that these become open access too, which would change the math considerably.
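The factor 8 is simply the ratio between the two SIGPLAN estimates, and the same relation shows how extra workshop papers would push up the per-attendee fee. In the sketch below the two dollar amounts come from the survey as quoted above; the paper and attendee counts in the example are purely hypothetical.

```python
APC_PER_PAPER = 400     # USD, SIGPLAN's estimate per paper
FEE_PER_ATTENDEE = 50   # USD, SIGPLAN's estimate per attendee


def attendees_per_paper():
    """Registrants per paper implied by the two SIGPLAN estimates."""
    return APC_PER_PAPER / FEE_PER_ATTENDEE


def fee_increase(papers, attendees, apc=APC_PER_PAPER):
    """Extra registration fee per attendee if attendees cover all APCs."""
    return papers * apc / attendees


print(attendees_per_paper())      # implied ratio of registrants to papers
# Hypothetical conference: 100 main-track papers and 800 attendees.
print(fee_increase(100, 800))
# Same conference if 100 workshop papers also become open access.
print(fee_increase(200, 800))
```

In this made-up example, doubling the number of open access papers doubles the increase per attendee from $50 to $100.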

The article processing charges of $400,- are presented as a given: They may seem in line with what commercial publishers charge, but they are certainly very high compared to what, e.g. LIPIcs charges for ECOOP (which is less than $100). These costs of $400,- come from ACM’s desire (need) to continue to make a substantial profit from their publishing activities, and should go down.

In his blog post, Jonathan Aldrich argues for the “author pays” model. His reasoning is that this can be viewed as a “funder pays” model: Most authors are funded by research grants, and usually in those grants funds can be found to cater for the costs involved in publishing open access.

On this point (and this point alone) I disagree with Jonathan. To me it feels fundamentally wrong to punish authors by making them pay $400 more for their registration. If anything, they should get a reduction for delivering the content of the conference.

I see Jonathan’s point that some funding agencies are willing to cover open access costs (e.g. NSF, NWO, H2020), and that it is worthwhile to explore how to tap into that money. But this requires data on what percentage of papers could be labeled as “funded”. For my department, I foresee several cases where it would be the department who’d have to pay for this instead of an external agency.

I do sympathize with Jonathan’s appeal to reduce conference registration costs, which can be very high. But the cost of making publications open access should be borne by the full community (all attendants), not just by those who happen to publish a paper.

Shining examples of open access computer science conferences are the Usenix, AAAI, and NIPS events. Full golden open access of all content, and no extra charges for authors — these conferences are years ahead of the ACM.

Do you have an opinion on “author pays” versus “participant pays”? Fill in the survey!

Thank you SIGPLAN for initiating this discussion!

7Dec2016

Self-Archiving Publications in Elsevier Pure

Posted in Research by Arie van Deursen

In 2016, TU Delft adopted Elsevier Pure as its database to keep track of all publications from its employees.

At the same time, TU Delft has adopted a mandated green open access policy. This means that for papers published after May 2016, an author-prepared version (pdf) must be uploaded into Pure.

I am very happy with this commitment to green open access (and TU Delft is not alone). This decision also means, however, that we as researchers need to do some extra work, to make our author-prepared versions available.

To make it easier for you to upload your papers and comply with the green open access policy, here are some suggestions based on my experience so far working with Pure.

I can’t say I’m a big fan of Elsevier Pure. In the interest of open access, however, I’m doing my best to tolerate the quirks of Pure, in order to help the TU Delft to share all its research papers freely and persistently with everyone in the world.

Elsevier Pure is used at hundreds of different universities. If you work at one of them, this post may help you in using Pure to make your research available as open access.

The Outcome

Anyone can browse publications in Pure, available at https://pure.tudelft.nl.

All pages have persistent URLs, making it easy to refer to a list of all your publications (such as my list), or individual papers (such as my recent one on crash reproduction). For all recent papers I have added a pdf of the version that we as authors prepared ourselves (aka the postprint), as well as a DOI link to the publisher version (often behind a paywall).

Thus, you can use Pure to offer, for each publication, your self-archived (green open access) version as well as the final publisher version.

Moreover, these publications can be aggregated to the section, department, and faculty level, for management reporting purposes.

In this way, Pure data shows the tax payers how their money is spent on academic research, and gives the tax payer free access to the outcomes. The tax payer deserves that we invest some time in populating Pure with accurate data.

Accessing Pure

To enter publications into Pure, you’ll need to log in. On https://pure.tudelft.nl, in the footer at the right, you’ll find “Log into Pure”. Use your TU Delft netid.

If you’re interested in web applications, you will quickly recognize that Pure is a fairly old system, with user interface choices that would not be made these days.

Entering Meta-Data

You can start entering a publication by hitting the big green button “Add new” at the top right of the page. It will open a brand new browser window for you.

In the new window, click “Research Output”, which will turn blue and expand into three items.

Then there are several ways to enter a publication, including:

Import via Elsevier Scopus, found via “Import from Online Source”. This is by far the easiest, if (1) your publication venue is indexed by Scopus, (2) it is already visible at Scopus (which typically takes a few months), and if (3) you can find it on Scopus. To help Scopus, I have set up an ORCID author identifier and connected it to my Scopus author profile.
Import via Bibtex, found via “Import from file”. If you click it, importing from bibtex is one of the options. You can obtain bibtex entries from DBLP, Google Scholar, ACM, your departmental publications server, or write them by hand in your favorite editor, and then copy paste them into Pure.
Entering details via a series of buttons and forms (“Create from template”). I recommend not using this option. If you go against this advice and want to enter a conference paper, do not pick the template “Paper/contribution to conference”; pick “Conference Contribution/Chapter in Conference Proceedings” instead. Don’t ask me why.

In all cases, yet another browser window is opened, in which you can inspect, correct, and save the bibliographic data. After saving, you’ll have a new entry with a unique URL that you can use for sharing your publication. The URL will stay the same after you make additional updates.

Entering your Author-Prepared version

With each publication, you can add various “electronic versions”.

Each can be a file (pdf), a link to a version, or a DOI. For pdfs you want to upload, make sure you check that they meet the conditions under which your publisher allows self-archiving.

Pure distinguishes various version types, which you can enter via the “Document version” pull down menu. Here you need to include at least the following two versions:

The “accepted author manuscript”. This is also called a postprint, and is the version that (1) is fully prepared by you as authors; and that (2) includes all improvements you made after receiving the reviews. Here you can typically upload the pdf as you prepared it yourself.
The “final published version”. This is the Publisher’s version. It is likely that the final version is copyrighted by the publisher. Therefore, you typically include a link (DOI) to the final version, and do not upload a pdf to Pure. If you import from Scopus, this field is automatically set.

Furthermore, Pure permits setting the “access to electronic version”, and defining the “public access”. Relevant items include:

Open, meaning (green) open access. This is what I typically select for the “accepted author manuscript”.
Restricted, meaning behind a paywall. This is what I typically select for the final published version.
Embargoed, meaning that the pdf cannot be made public until a set date. Can be used for commercial publishers who insist on restricting access to post-prints from institutional repositories in the first 1-2 years.

The vast majority (80%) of academic publishers permit authors to archive their accepted manuscripts in institutional repositories such as Pure. However, publishers typically permit this under specific conditions, which may differ per publisher. You can check out my Green Open Access FAQ if you want to learn more about these conditions, and how to find them for your (computer science) publisher.

Once uploaded, your pdf is available for download for everyone. Pure adds a cover page with meta-data such as the citation (how it is published) and the DOI to the final version. This cover page is useful, as it helps to meet the intent of the conditions most publishers require on green open access publishing.

Google Scholar indexes Pure, so after a while your paper should also appear on your Scholar page.

A Paper’s Life Cycle

Making papers available early is one of the benefits of self-archiving. This can be done in Pure by setting the paper’s “Publication Status”. This field can have the following values:

“In preparation”: Literally a pre-print. Your paper can be considered a draft and may still change.
“Submitted”: You submitted your paper to a journal or conference where it is now under review.
“Accepted/In press”: Yes, paper accepted! This also means that you as an author can share your “accepted author manuscript”.
“E-Pub ahead of print”: I don’t see how this differs from the Accepted state.
“Published”: The paper is final and has been officially published.
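The status values above form an ordered life cycle, which can be modeled in a few lines. This is only an illustrative sketch: the class and function names are made up here and have nothing to do with how Pure stores these values internally.

```python
from enum import IntEnum


class PublicationStatus(IntEnum):
    """The five status values listed above, in life-cycle order."""
    IN_PREPARATION = 1
    SUBMITTED = 2
    ACCEPTED_IN_PRESS = 3
    EPUB_AHEAD_OF_PRINT = 4
    PUBLISHED = 5


def can_share_accepted_manuscript(status: PublicationStatus) -> bool:
    """The accepted author manuscript only exists once the paper is accepted."""
    return status >= PublicationStatus.ACCEPTED_IN_PRESS


for status in PublicationStatus:
    print(status.name, can_share_accepted_manuscript(status))
```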

In my Green Open Access FAQ I provide an answer to the question Which Version Should I Self-Archive.

I typically enter publications once accepted, and share the Pure link with the accepted author manuscript as pre-print link on Twitter or on conference sites (e.g. ICSE 2018).

In particular, I do the following once my paper is accepted:

I create a bibtex entry for an @inproceedings (conference, workshop) or @article (journal) publication.
I upload the bibtex entry into Pure.
I add my own pdf with the author-prepared version to the resulting Pure entry.
I set the Publication Status to “Accepted/In press”.
I set the Entry Status (bottom of the page) to “in progress”.
I save the entry (bottom of the page).
I share the resulting Pure link on Twitter with the rest of the world so that they can read my paper.
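The first step in this list, writing the bibtex entry, can be sketched with a small helper. The helper name, the citation key, and all field values below are placeholders made up for illustration; any real entry would carry the actual bibliographic data of the paper.

```python
def bibtex_inproceedings(key: str, fields: dict) -> str:
    """Render an @inproceedings entry from a citation key and a field mapping."""
    lines = [f"@inproceedings{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder data; replace with the details of the accepted paper.
entry = bibtex_inproceedings(
    "doe2018example",
    {
        "author": "Jane Doe and John Roe",
        "title": "A Placeholder Title",
        "booktitle": "Proceedings of a Hypothetical Conference",
        "year": "2018",
    },
)
print(entry)
```

The printed text can then be pasted into the “Import from file” dialog described earlier.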

Once the publisher actually manages to publish this paper as well (this may be several months later!), I update my Pure entry:

I add the DOI link to the final published version.
I provide the missing bibliographic meta-data (page numbers, volume, number, …).
I set the Publication Status to “Published”.
I set the Entry Status to “for approval” (by the library who can then change it into an immutable “approved” if they think this is a valid entry).

The preprint links I shared still point to the self-archived pdf, but now also to the official version at the publisher for those who have access through the paywall.

Permalinks

The Pure page for your paper including all meta-information and all versions of that paper (example) in principle is stable, and its URL provides a permanent link (unless you delete it).

You can also directly link to the individual pdfs you upload (example). However, these are more volatile: If you upload a newer version the old link will be dead. Moreover, in some cases the (TU Delft) library has moved pdfs around thereby destroying old pdf links.

Therefore, I recommend using links to the full record rather than individual pdfs when sharing Pure links.

Self-Archiving Elsevier Papers

Elsevier does not like it if you self-archive papers published in Elsevier journals into Elsevier Pure. The official rules are that Elsevier journal papers are subject to an embargo, yet at the same time can be published with a CC-BY-NC-ND license on arXiv.

Combining these two leads to the following steps, assuming you have a pre-print (never reviewed), and a post-print (the author-prepared accepted version after review).

Upload your pre-print onto arXiv.
Add a footnote to your post-print stating: This manuscript version is made available under the CC-BY-NC-ND 4.0 license.
Update your arXiv pre-print with your CC-BY-NC-ND licensed post-print, and add publication details (journal name, volume, issue) to your arXiv entry.
Create a Pure entry for your journal paper.
Upload the post-print as author-accepted version to your Pure entry, make it available immediately, and set the license to CC-BY-NC-ND.

Note that the Elsevier rules explicitly allow steps 1-3, and in fact insist on the CC-BY-NC-ND license. Elsevier does not suggest you take step 5, but as a consequence of the CC-BY-NC-ND license you are permitted to do so.

What Elsevier would want you to do instead of step 5 is add the postprint to Pure under a (2 year) embargo, thus delaying (green) open access availability by 2 years. Elsevier Pure even supports this embargo option as one of the “access” options, in which you could enter the end-date of such an embargo.
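For those who do go the embargo route, the end date to enter in Pure is simply the publication date plus the embargo period. A minimal sketch, assuming the embargo is counted in whole calendar months; the function name and the example date are arbitrary choices made here.

```python
import calendar
from datetime import date


def embargo_end(published: date, months: int) -> date:
    """Return the date that lies `months` calendar months after `published`."""
    total = published.month - 1 + months
    year = published.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter months (e.g. 29 February plus 12 months).
    day = min(published.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Arbitrary example: a paper published on 8 March 2019 with a 24-month embargo.
print(embargo_end(date(2019, 3, 8), 24))  # 2021-03-08
```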

Note: Yes, these steps are annoying. But: at the time of writing (2019), universities in Germany, Sweden, and California have no access to recent papers published by Elsevier. If you want your paper to be read in any of these places, make sure to upload it into your university repository. If you don’t want to go through these steps and you want your paper to be read, I recommend you pick a different publisher.

Complicated Author Names

Pure contains official employee names as registered by TU Delft.

Some authors publish under different (variants of their) names. For example, Dutch universities have trouble handling the complex naming habits of Portuguese and Brazilian employees.

If Pure is not able to map an author name to the corresponding employee, find the author name in the publication, click edit, and then click “Replace”. This allows searching the TU Delft employee database for the correct person.

If Pure has found the correct employee, but the name displayed is very different from what is listed on the publication itself, you can edit the author for that publication, and enter a different first and last name for this publication.

Exporting Linked Bibtex (to Orcid)

If you’re logged in, you can download your publication list in various formats, including BibTex (you’ll find the button for this at the bottom of the page).

I prefer bibtex entries that have a url back to the place where all info is. Therefore, I wrote a little Python script to scrape a Pure web page (mine, yours, or anyone’s), that adds such information.

I use the bibtex entries produced by this script to populate my Orcid profile as well as our Departmental Publication Server with publications from Pure that link back to their corresponding Pure page.
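The script itself is not reproduced here. As an illustration of its last step only, the following sketch adds a url field to a single bibtex entry given as a string. It does no scraping, and the function name, the entry, and the URL are all made-up placeholders.

```python
def add_url_field(bibtex_entry: str, url: str) -> str:
    """Insert a url field just before the closing brace of one bibtex entry."""
    body = bibtex_entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("expected a single bibtex entry ending in '}'")
    head = body[:-1].rstrip()
    # Make sure the last existing field is terminated by a comma.
    if not head.endswith(","):
        head += ","
    return f"{head}\n  url = {{{url}}},\n}}"


# Placeholder entry and placeholder URL, for illustration only.
original = "@article{key2019,\n  title = {A Placeholder Title},\n  year = {2019}\n}"
linked = add_url_field(original, "https://example.org/publications/placeholder")
print(linked)
```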

Version history

20 November 2016: Version 0.1, for internal purposes.
07 December 2016: Version 0.2, first public version.
14 December 2016: Version 0.3, minor improvements.
13 January 2017: Version 0.4, updated Google Scholar information.
16 March 2017: Version 0.5, updated approval states based on correction from Hans Meijerrathken.
17 March 2017: Version 0.6, life cycle and exporting added.
24 November 2017: Version 0.7, simplified life cycle and approval states.
03 March 2018: Version 0.8, added info on populating Orcid from Pure.
27 July 2018: Version 0.9, added info on permalinks, licensed as CC BY-SA 4.0
08 March 2019: Version 1.0, added info on publishing Elsevier papers.

Acknowledgments: Thanks to Moritz Beller for providing feedback and trying out Pure.

6Nov2016

Green Open Access FAQ

Posted in Research by Arie van Deursen

Image credit: Flickr, user static_view

(Opinionated) answers to frequently asked questions on (green) open access, from a computer science (software engineering) research perspective.

Disclaimer: IANAL, so if you want to know things for sure you’ll have to study the references provided. Use at your own risk.

Green open access is trickier than I thought, so I might have made mistakes. Corrections are welcome, just as additional questions for this FAQ. Thanks!

Green Open Access Questions

What is Green Open Access?
What is a pre-print?
What is a post-print?
What is a publisher’s version?
Do publishers allow Green Open Access?
Under what conditions is Green Open Access permitted?
What is Yellow Open Access?
What is Gold Open Access?
What is Hybrid Open Access?
What are the Self-Archiving policies of common computer science venues?
Is Green Open Access compulsory?
Should I share my pre-print under a Creative Commons license?
Can I use Green Open Access to comply with Plan S?
What is a good place for self-archiving?
Can I use PeerJ Preprints for Self-Archiving?
Can I use ResearchGate or Academia.edu for Self-Archiving?
Which version(s) should I self-archive?
What does Gold Open Access add to Green Open Access?
Will Green Open Access hurt commercial publishers?
What is the greenest publisher in computer science?
Should I use ACM Authorizer for Self-Archiving?
As a conference organizer, can I mandate Green Open Access?
What does Green Open Access cost?
Should I adopt Green Open Access?
Where can I learn more about Green Open Access?

What is Green Open Access?

In Green Open Access you as an author archive a version of your paper yourself, and make it publicly available. This can be at your personal home page, at the institutional repository of your employer (such as the one from TU Delft), or at an e-print server such as arXiv.

The word “archive” indicates that the paper will remain available forever.

What is a pre-print?

A pre-print is a version of a paper that is entirely prepared by the authors.

Since no publisher has been involved in any way in the preparation of such a pre-print, it feels right that the authors can deposit such pre-prints wherever they want to. Before submission, the authors, or their employers such as universities, hold the copyright to the paper, and hence can publish the paper in online repositories.

Following the definition of SHERPA’s RoMEO project, pre-prints refer to the version before peer-review organized by a publisher.

What is a post-print?

Following the RoMEO definitions, a post-print is a final draft as prepared by the authors themselves after reviewing. Thus, feedback from the reviewers has typically been included.

Here a publisher may have had some light involvement, for example by selecting the reviewers, making a reviewing system available, or by offering a formatting template / style sheet. The post-print, however, is author-prepared, so copy-editing and final markup by the publisher has not been done.

A (Plan S) synonym for postprint is “Author-Accepted Manuscript”, sometimes abbreviated as AAM.

What is a publisher’s version?

While pre- and post-prints are author-prepared, the final publisher’s version is created by the publisher.

The publisher’s involvement may vary from very little (camera ready version entirely created by authors) up to substantial (proof reading, new markup, copy editing, etc.).

Publishers typically make their versions available after a transfer of copyright, from the authors to the publisher. And with the copyright owned by the publisher, it is the publisher who determines not only where the publisher’s version can be made available, but also where the original author-prepared pre- or post-prints can be made available.

A (Plan S) synonym is “Version of Record”, sometimes abbreviated as VoR.

Do publishers allow Green Open Access?

Self-archiving of non-published material that you own the copyright to is always allowed.

Whether self-archiving of a paper that has been accepted by a publisher for publication is allowed depends on that publisher. You have transferred your copyright, so it is up to the publisher to decide who else can publish it as well.

Different publishers have different policies, and these policies may in turn differ per journal. Furthermore, the policies may vary over time.

The SHERPA project does a great job in keeping track of the open access status of many journals. You’ll need to check the status of your journal, and if it is green you can self-archive your paper (usually under certain publisher-specific conditions).

In the RoMEO definition, green open access means that authors can self-archive both pre-prints and post-prints.

Under what conditions is Green Open Access permitted?

Since the publisher holds copyright on your published paper, it can (and usually does) impose constraints on the self-archived versions. You should always check the specific constraints for your journal or publisher, for example via the RoMEO journal list.

The following conditions are fairly common:

You generally can self-archive pre- and post-prints only, but not the publisher version.
In the meta-data of the self-archived version you need to add a reference to the final version (for example through its DOI).
In the meta-data of the self-archived version you need to include a statement of the current ownership of the copyright, sometimes through specific sentences that must be copy-pasted.
The repository in which you self-archive should be non-commercial. Thus, arXiv and institutional repositories are usually permitted, but commercial ones like PeerJ Preprints, Academia.edu or ResearchGate are not.
Some commercial publishers impose an embargo on post-prints. For example Elsevier permits sharing the post-print version on an institutional repository only after 12-24 months (depending on the journal).

Usually meeting the demands of a single publisher is relatively easy to do. Given points 2 and 3, it typically involves creating a dedicated pdf with a footnote on the first page with the required extra information.
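Composing such a footnote can be sketched as below. The exact wording is publisher-specific, so the copyright statement is passed in rather than invented; the function name, the statement, and the DOI in the example are placeholders, not a real publisher’s text.

```python
def self_archive_footnote(copyright_statement: str, doi: str) -> str:
    """Combine a publisher-mandated statement with a link to the final version."""
    return f"{copyright_statement} The final version is available at https://doi.org/{doi}."


# Placeholder statement and placeholder DOI, for illustration only.
print(self_archive_footnote(
    "This is the author-prepared version of the paper.",
    "10.1234/placeholder",
))
```

Always copy the required sentences from your publisher’s own self-archiving policy rather than relying on a generic template like this one.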

However, every publisher has its own rules. If you publish your papers in a range of different venues (which is what good researchers do), you’ll have to know many different rules if you want to do green open access in the correct way.

What is Yellow Open Access?

Some publishers (such as Wiley) allow self-archiving of pre-prints only, and not of post-prints. This is referred to as yellow open access in RoMEO. Yellow is more restrictive than green.

As an author, I find yellow open access frustrating, as it forbids me to make the version of my paper that was improved thanks to the reviewers available via open access.

As a reviewer, I feel yellow open access wastes my effort: I tried to help authors by giving useful feedback, and the publisher forbids my improvements to be reflected in the open access version.

What is Gold Open Access?

Gold Open Access refers to journals (or conference proceedings) that are completely accessible to the public without requiring paid subscriptions.

Often, gold implies green, for example when a publisher such as PeerJ, PLOS ONE or LIPIcs adopts a Creative Commons license — which allows anyone, including the authors, to share a copy under the condition of proper attribution.

The funding model for open access is usually not based on subscriptions, but on Article Processing Charges, i.e., a payment by the authors for each article they publish (ranging from $70 (LIPIcs) to $1500 (PLOS ONE) per paper).

What is Hybrid Open Access?

Hybrid open access refers to a restricted (subscription-funded) journal that permits authors to pay extra to make their own paper available as open access.

This practice is also referred to as double dipping: The publisher collects revenues from both subscriptions and article processing charges.

University libraries and funding agencies do not like hybrid access, since they feel they have to pay twice, both for the authors and the readers.

Green open access is better than hybrid open access, simply because it achieves the same (an article is available) yet at lower costs.

What are the Self-Archiving policies of common computer science venues?

For your and my convenience, here is the green status of some publishers that are common in software engineering (check links for most up to date information):

ACM: Green, e.g., TOSEM, see also the ACM author rights. For ACM conferences, often the author-prepared camera-ready version includes a DOI already, making it easy to adhere to ACM’s meta-data requirements. Note that some ACM conferences are gold open access, for example the ones published in the Proceedings of the ACM on Programming Languages.
IEEE: Green, e.g., TSE. The IEEE has a policy that the IEEE makes a version available that meets all IEEE meta-data requirements, and that authors can use for self-archiving. See also their self-archiving FAQ.
Springer: Green, e.g., EMSE, SoSyM, LNCS. Pre-print on arXiv, post-print on personal page immediately and in repository in some cases immediately and in others after a 12 month embargo period.
Elsevier: Mostly green, e.g., JSS, IST. Pre-print allowed; post-print with CC BY-NC-ND license on personal page immediately and in institutional repository after 12-48 month embargo period. To circumvent the embargo you can publish the pre-print on arXiv, update it with the post-print (which is permitted), and update the license to CC BY-NC-ND as required by Elsevier, after which anyone (including you) can share the postprint on any non-commercial platform.
Wiley: Mostly yellow, i.e., only pre-prints can be immediately shared, and post-prints (even on personal pages) only after 12 month embargo. E.g. JSEP.

Luckily, there are also some golden open access publishers (which typically permit self-archiving as well should you still want that):

PeerJ Computer Science: Gold (creative commons) and green.
Usenix: Gold since 2008. Published with PeerJ. Authors retain their copyright.
LIPIcs-based proceedings: Conferences publishing their papers via Dagstuhl’s Leibniz International Proceedings in Informatics LIPIcs, such as ECOOP, FSCD, …
IEEE Access: The ‘mega-journal’ from IEEE covering all IEEE’s fields of interest.
PLOS ONE: The successful (nonprofit) mega-journal that also publishes computer science papers.
Many venues in Artificial Intelligence, including AAAI, the Journal of Machine Learning Research, Computational Linguistics, the Semantic Web Journal, or the Annual Conference on Neural Information Processing Systems (NIPS).
Specialized conferences or journals such as the Journal of Object Technology.

Is Green Open Access compulsory?

Funding agencies (NWO, EU, Bill and Melinda Gates Foundation, …) as well as universities (TU Delft, University of California, UCL, ETH Zurich, Imperial College, …) are increasingly demanding that all publications resulting from their projects or employees are available in open access.

My own university TU Delft insists, like many others, on green open access:

As of 1 May 2016 the so-called Green Road to Open Access publishing is mandatory for all (co)authors at TU Delft. The (co)author must publish the final accepted author’s version of a peer-reviewed article with the required metadata in the TU Delft Institutional Repository.

This makes sense: The TU Delft wants to have copies of all the papers that its employees produce, and make sure that the TU Delft stakeholders, i.e. the Dutch citizens, can access all results. Note that TU Delft insists on post-prints that include reviewer-induced modifications.

The Dutch national science foundation NWO has a preference for gold open access, but accepts green open access if that’s impossible (“Encourage Gold, require immediate Green”).

Should I share my pre-print under a Creative Commons license?

You should only do this if you are certain that the publisher’s conditions on self-archiving pre-prints are compatible with a Creative Commons license. If that is the case, you probably are dealing with a golden open access publisher anyway.

Creative Commons licenses are very liberal, allowing anyone to re-distribute (copy) the licensed work (under certain conditions, including proper attribution).

This effectively nullifies (some of) the rights that come with copyright. For that reason, publishers that insist on owning the full copyright to the papers they publish typically disallow self-archiving earlier versions with such a license.

For example, ACM Computing Surveys insists on a set statement indicating

… © ACM, YYYY. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution…

This “not for redistribution” is incompatible with Creative Commons, which is all about sharing.

Furthermore, a Creative Commons license is irrevocable. So once you picked it for your pre-print, you effectively made a choice for golden open access publishers only (some people might consider this desirable, but it seriously limits your options).

Therefore, my suggestion would be to keep the copyright yourself for as long as you can, giving you the freedom to switch to Creative Commons once you know who your publisher is.

Can I use Green Open Access to comply with Plan S?

Yes, you can, but you are only compliant with Plan S if you share your postprint, with a Creative Commons License, immediately (no embargo).

But, unfortunately, the Creative Commons license is likely incompatible with the constraints imposed by the publisher of the eventual paper. As a workaround, in some (most) cases (e.g., ACM, IEEE journals, Springer) you are allowed to distribute your postprint with a CC BY license if you actually pay the hybrid open access fee. These fees are not refundable under Plan S, but this hybrid-and-then-self-archive route is compliant with Plan S.
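The three conditions in this answer (postprint, Creative Commons license, no embargo) can be written down as a simple check. This is a deliberately simplified sketch with made-up names: it encodes only what is stated above, and the Plan S guidance itself remains the authority on which Creative Commons variants and which versions qualify.

```python
from dataclasses import dataclass


@dataclass
class SelfArchivedCopy:
    version: str         # "preprint", "postprint", or "publisher"
    license_name: str    # e.g. "CC BY" or "all rights reserved"
    embargo_months: int  # 0 means immediately available


def meets_stated_conditions(copy: SelfArchivedCopy) -> bool:
    """Check the three conditions named above; a simplification, not legal advice."""
    return (
        copy.version == "postprint"
        and copy.license_name.upper().startswith("CC")
        and copy.embargo_months == 0
    )


print(meets_stated_conditions(SelfArchivedCopy("postprint", "CC BY", 0)))   # True
print(meets_stated_conditions(SelfArchivedCopy("postprint", "CC BY", 12)))  # False
```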

What is a good place for self-archiving?

It depends on your needs.

Your employer may require that you use your institutional repository (such as the TU Delft Repository). This helps your employer to keep track of how many of its publications are available as open access. The higher this number, the stronger the position of your employer when negotiating open access deals with publishers. Institutional archiving still allows you to post a version elsewhere as well.

Subject repositories such as arXiv offer good visibility to your peers. In fields like physics using arXiv is very common, whereas in Computer Science this is less so. A good thing about arXiv is that it permits versioning, making it possible to submit a pre-print first, which can then later be extended with the post-print. You can use several licenses. If you intend to publish your paper, however, you should adopt arXiv’s Non-Exclusive Distribution license (which just allows arXiv to distribute the paper) instead of the more generous Creative Commons license, which would likely conflict with the copyright claims of the publisher of the refereed paper.

Your personal home page is a good place if you want to offer an overview of your own research. Home page URLs may not be very permanent though, so on its own a home page is not a suitable approach to self-archiving. You can use it in addition to archiving in repositories, but not as a replacement.

Can I use PeerJ Preprints for Self-Archiving?

Probably not — and it’s also not what PeerJ Preprints are intended for.

PeerJ Preprints is a commercial eprint server requiring a Creative Commons license. It is intended to share drafts that have not yet been peer reviewed for formal publication.

It offers good visibility (a preprint on goto statements attracted 15,000 views), and a smooth user interface for posting comments and receiving feedback. Articles cannot be removed once uploaded.

The PeerJ Preprint service is compatible with other golden open access publishers (such as PeerJ itself or Usenix).

The PeerJ Preprint service, however, is incompatible with most other publishers (such as ACM, IEEE, or Springer) because (1) the service is commercial; (2) the service requires Creative Commons as license; (3) preprints once posted cannot be removed.

So, if you want to abide by the rules, uploading a pre-print to PeerJ Preprints severely limits your subsequent publication options.

Can I use ResearchGate or Academia.edu for Self-Archiving?

No — unless you only work with liberal publishers with permissive licenses such as Creative Commons.

ResearchGate and Academia.edu are researcher social networks that also offer self-archiving features. As they are commercial repositories, most publishers will not allow sharing your paper on these networks.

The ResearchGate copyright pages provide useful information on this.

The Academia.edu copyright pages state the following:

Many journals will also allow an author to retain rights to all pre-publication drafts of his or her published work, which permits the author to post a pre-publication version of the work on Academia.edu. According to Sherpa, which tracks journal publishers’ approach to copyright, 90% of journals allow uploading of either the pre-print or the post-print of your paper.

This seems misleading to me: Most publishers explicitly disallow posting preprints to commercial repositories such as Academia.edu.

In both cases, the safer route is to use permitted places such as your home page or institutional repository for self-archiving, and only share links to your papers with ResearchGate or Academia.edu.

Which version(s) should I self-archive?

It depends.

Publishing a pre-print as soon as it is ready has several advantages:

You can receive rapid feedback on a version that is available early.
You can extend your pre-print with an appendix, containing material (e.g., experimental data) that does not fit in a paper that you’d submit to a journal
It allows you to claim ownership of certain ideas before your competition.
You offer most value to society since you allow anyone to benefit as early as possible from your hard work

Nevertheless, publishing a post-print only can also make sense:

You may want to keep some results or data secret from your competition until your paper is actually accepted for publication.
You may want to avoid confusion between different versions (pre-print versus post-print).
You may be scared to leave a trail of rejected versions submitted to different venues.
You may want to submit your pre-print to a venue adopting double blind reviewing, requiring you to remain anonymous as author. Publishing your pre-print during the reviewing phase would make it easy for reviewers to find your paper and connect your name to it.

For these reasons, and primarily to avoid confusion, I typically share just the post-print: The camera-ready version that I create and submit to the publisher is also the version that I self-archive as post-print.

What does Gold Open Access add to Green Open Access?

For open access, gold is better than green since:

it shifts the burden of making articles publicly available from the researcher to the publisher.
it places a paper in a venue that is entirely open access. Thus, other papers in the same journal that improve upon or refer to your paper will be open access too.
gold typically implies green, i.e., the license of the journal is similar to Creative Commons, allowing anyone, including the authors, to share a copy under the condition of proper attribution.

Will Green Open Access hurt commercial publishers?

Maybe. But most academic publishers already allow green open access, and they are doing just fine. So I would not worry about it.

What is the greenest publisher in computer science?

The greenest publisher should be the one imposing the least restrictions on self-archiving.

From that perspective, publishers who want to be the greenest should in fact want to be gold, making their papers available under a permissive Creative Commons license. An example is Usenix.

Among the non-golden publishers, the greenest are probably the non-commercial ones, such as IEEE and ACM: They require simple conditions that are usually easy to meet.

The ACM, “the world’s largest educational and scientific computing society”, claims to be among the “greenest” publishers. Based on their tolerant attitude towards self-archiving of post-prints this may be somewhat justified. Furthermore, their Authorizer mechanism permits setting up free access to the publisher’s version.

But greenest is gold. So I look forward to the day the ACM follows its little sister Usenix in a full embrace of golden open access.

Should I use ACM Authorizer for Self-Archiving?

The ACM offers the Authorizer mechanism to provide free access to the Publisher’s Version of a paper, which only works from one user-specified URL. For example, I can use it to create a dedicated link from my institutional paper page to the publisher’s version.

However, Authorizer links cannot be accessed from other pages, and there is no point in emailing or tweeting them. Since only one authorizer link can exist per paper, I cannot use an authorizer link for both my institutional repository, and for the repository of my funding agency.

These restrictions on Authorizer links make them unsuitable as a replacement for self-archiving (let alone as a replacement for golden open access).

As a conference organizer, can I mandate Green Open Access?

Green open access is self-archiving, giving the authors the permission to archive their own papers.

As a conference organizer working with a non open access (ACM, IEEE, Springer-Verlag) publisher, you are not allowed to archive and distribute all the papers of the conference yourself.

What several conferences do instead, though, is collect links to pre- or post-prints. For example, the on line program of the recent OOPSLA 2016 conference has links to both the publisher’s version (through a DOI) and to an author-provided post-print.

For OOPSLA, 20 out of 52 (38%) of the authors provided such a link to their paper, a number that is similar in other conferences adopting such preprint linking.

As a conference organizer, you can do your best to encourage authors to submit their pre-print links. Or you can use your influence in the steering committee to push the conference to switch to an open access publisher, such as LIPIcs or Usenix.

As an author, you can help by actually offering a link to your pre-print.

What does Green Open Access cost?

For authors, green open access typically costs no money. University repositories, arXiv, and PeerJ Preprints are all free to use.

It does cost (a bit of) effort though:

You need to find out the specific conditions under which the publisher of your current paper permits self-archiving.
You need to actually upload your paper to some repository, provide the correct meta-data, and meet the publisher’s constraints.

The fact that open access is free for authors does not mean that there are no costs involved. For example, the money to keep arXiv up and running comes from a series of sponsors, including TU Delft.

Should I adopt Green Open Access?

Yes.

Better availability of your papers will help you in several ways:

Impact in Research: Other researchers can access your papers more easily, increasing the chances that they will build upon your results in their work;
Impact in Practice: Practitioners may be interested in using your results: A pay-wall is an extra and undesirable impediment for such adoption;
Improved Results: Increased usage of your results in either industry or academia will put your results to the real test, and will help you improve your results.

Besides that, (green) open access is a way of delivering to the tax payers what they paid for: Your research results.

Where can I learn more about Green Open Access?

Useful resources include:

SHERPA / RoMEO: Green Open Access conditions and restrictions for all journals and publishers.
The UCL Open Access FAQs.
The IEEE Self-Archiving FAQ.

Version history:

6 November 2016: Version 0.1, Initial version, call for feedback.
14 November 2016: Version 0.2, update on commercial repositories.
18 November 2016: Version 0.3, update on ACM Authorizer.
20 November 2016: Version 0.4, added TOC, update on commercial repositories.
06 December 2016: Version 0.5, updated information on ACM and IEEE.
20 December 2016: Version 0.6, added info on Creative Commons and AI venues.
27 July 2018: Version 0.7, update on where to archive. Released as CC BY-SA 4.0.
18 November 2018: Version 0.8, updated info on Elsevier.
10 September 2019: Version 0.9, added question on Plan S compliance.

Acknowledgments: I thank Moritz Beller (TU Delft) and Dirk Beyer (LMU Munich) for valuable feedback and corrections.

10Jul2013

Some Research Paper Writing Recommendations

Posted in Research by Arie van Deursen

Last week, I received an email from Alex Orso and Sebastian Uchitel, who had been asked to give a talk on “How to get my papers accepted at top SE conferences” at the Latin American School on Software Engineering. Here’s their question:

We hope you can spare a few minutes to share with us the key recommendations you would give to PhD students that have not yet had successful submissions to top software engineering conferences, such as ICSE.

An interesting request, and ~~I certainly look forward to receive some of the advice my fellow researchers will be providing~~ you can see the advice of my fellow researchers in a presentation by Alex Orso.

When working with my students on papers, I must admit I sometimes repeat myself. Below are some of the things I hear myself say most often.

Explain the Innovation

The first thing to keep in mind is that a research paper should explain the innovation. This makes it quite different from a text book chapter, or from a hands-on tutorial. The purpose is not to explain a technique so that others can use it. Instead, the purpose of a research paper is to explain what is new about the proposed technique.

Identify the Contributions

Explaining novelty is driven by contributions. A contribution is anything the world did not know before this paper, but which we now do know thanks to this paper.

I tend to insist on an explicit list of contributions, which I usually put at the end of the introduction.

“The contributions of this paper are …”

Each contribution is an outcome, not the process of doing something. Contributions are things, not effort. Thus, “we spent 6 months manually analyzing 500,000 commit messages” is not a contribution. This effort, though, hopefully has resulted in a useful contribution, which may be that “for projects claiming to do test-driven development, to our surprise we found that 75% of the code commits are not accompanied by a commit in the test code.”

Usually, when thinking about the structure of a paper, quite a bit of actual research has been done already. It is then important to reassess everything that has been done, in order to see what the real contributions of the research are. Contributions can include a new experimental design, a novel technique, a shared data set or open source tool, as well as new empirical evidence contradicting, confirming, or enriching existing theories.

Structure the Paper

With the contributions laid out, the structure of the paper appears naturally: Each contribution corresponds to a section.

This does not hold for the introductory and concluding sections, but it does hold for each of the core sections.

Furthermore, it is essential to separate background material from your own contributions. Clearly, most papers will rely on existing theories or techniques. These must be explained. Since the goal of the paper is to explain the innovation, all material that is not new should be clearly isolated. In this way, it is easiest for the reader (and the reviewer) to see what is new, and what is not new, about this paper.

As an example, take a typical structure of a research paper:

1. Introduction
2. Background: Cool existing work that you build upon.
3. Problem statement: The deficiency you spotted
4. Conceptual solution: A new way to deal with that problem!
5. Open source implementation: Available for everyone!
6. Experimental design for evaluation: Trickier than we thought!
7. Evaluation results: It wasn’t easy to demonstrate, but yes, we’ve good evidence that this may work!
8. Discussion: What can we do with these results? Promising ideas for future research or applications? And: a critical analysis of the threats to the validity of our results.
9. Related work
10. Concluding remarks.

In such a setup, sections 4-7 can each correspond to a contribution (and sometimes to more than one). The discussion section (8) is much more speculative, and usually does not contribute solid new knowledge.

Communicate the Contributions

Contributions do not just help in structuring a paper.

They are also the key aspect program committees look at when deciding about acceptance of a paper.

When reviewers evaluate a paper, they try to identify and interpret the contributions. Are these contributions really new? Are they important? Are they trivial? Did the authors provide sufficient evaluations for their claims? The paper should help the reviewer, by being very explicit about the contributions and the claims to fame of these contributions.

When program committee members discuss a paper, they do so in terms of contributions. Thus, contributions should not just be strong, they should also be communicable.

For smaller conferences, it is safe to assume that all reviewers are experts. For large conferences, such as ICSE, the program committee is broad. Some of the reviewers will be genuine experts on the topic of the paper, and these reviewers should be truly excited about the results. Other reviewers, however, will be experts in completely different fields, and may have little understanding of the paper’s topic. When submitting to prestigious yet broad conferences, it is essential to make sure that any reviewer can understand and appreciate the contributions.

The ultimate non-expert is the program chair. The chair has to make a decision on every paper. If the program chair cannot understand a paper’s contributions, it is highly unlikely that the paper will get accepted.

Share Contributions Early

Getting a research paper, including its contributions, right, is hard. Especially since contributions have to be understandable by non-experts.

Therefore, it is crucial to offer help to others, volunteering to read preliminary drafts of papers, assessing the strength of their contributions. In return, you’ll have other people, possibly non-experts, assess the drafts you are producing, in this way helping each other to publish a paper at this prestigious conference.

But wait. Isn’t helping others a bad idea for highly competitive conferences? Doesn’t it reduce one’s own chances?

No. Software engineering conferences, including ICSE and FSE, accept any paper that is good. Such conferences do not work with acceptance rates that are fixed in advance. Thus, helping each other may increase the acceptance rate, but will not negatively affect any author.

Does This Help?

I hope some of these guidelines will be useful to “PhD students that have not yet had successful submissions to top software engineering conferences, such as ICSE.”

A lot more advice is available on the Internet on how to write a research paper. I do not have a list of useful resources available at the time of writing, but perhaps in the near future I will extend this post with useful additional pointers.

Luckily, this post is not a research paper. None of the ideas presented here is new. But they have worked for me, and I hope they’ll work for you too.

Image credits: Pencils in the Air, by Peter Logan, Photo by Mira66. flickr

10Jun2013

Green Open Access and Preprint Linking

Posted in Research by Arie van Deursen

One of the most useful changes to the ICSE International Conference on Software Engineering this year was that the program website contained links to preprints of many of the papers presented.

As ICSE is a large event (over 1600 people attended in 2013), it is worth taking a look at what happened. What is preprint linking? How many authors actually provided a preprint link? What about other conferences? What are the wider implications for open access publishing in software engineering?

Self-Archiving

Preprint linking is based on the idea that authors, who do all the work in both writing and formatting the paper, have the right to self-archive the paper they created themselves (also called green open access). Authors can do this on their personal home page, in institutional repositories of, e.g., the universities where they work, or in public preprint repositories such as arXiv.

Sharing preprints has been around in science for decades (if not ages): As an example, my ‘alma mater’ CWI was founded in 1947, and has a technical report series dating back to that year. These technical reports were exchanged (without costs) with other mathematical research institutes. First by plain old mail, then by email, later via ftp, and now through http.

While commercial publishers may dislike the idea that a free preprint is available for papers they publish in their journals or conference proceedings, 69% of the publishers do in fact allow (some form of) self-archiving. For example, ACM, IEEE, Springer, and Elsevier (the publishers I work most with) explicitly permit it, albeit always under specific conditions. These conditions can usually be met, and include such requirements as providing a note that the paper has been accepted for publication, a pointer to the URL where the published article can be found, and a copyright notice indicating the publisher now owns the copyright.

Preprint links as shown on ICSE site.

Preprint Linking

All preprint linking does is ask authors of accepted conference papers whether they happen to have a link to a preprint available. If so, the conference web site will include a link to this preprint in its program.

For ICSE, doing full preprint linking at the conference site was proposed and conducted by Dirk Beyer, after an earlier set of preprint links was collected on a separate github gist by Adrian Kuhn.

Dirk Beyer runs Conference Publishing Consulting, the organization hired by ICSE to collect all material to be published, and get it ready for inclusion in the ACM/IEEE Digital Libraries. As part of this collection process, ICSE asked the authors to provide a link to a preprint, which, if provided, was included in the ICSE on line program.

The ICSE 2013 proceedings were published by IEEE. In their recently updated policy, they indicate that “IEEE will make available to each author a preprint version of that person’s article that includes the Digital Object Identifier, IEEE’s copyright notice, and a notice showing the article has been accepted for publication.” Thus, for ICSE, authors were provided with a possibility to download this version, which they then could self-archive.

Preprints @ ICSE 2013

With a preprint mechanism set up at ICSE, the next question is how many researchers actually made use of it. Below are some statistics I collected from the ICSE conference site:

Track / Conference	#Papers presented	#Preprints	Percentage
Research Track	85	49	57%
ICSE NIER	31	19	61%
ICSE SEIP	19	6	31%
ICSE Education	13	3	23%
ICSE Tools	16	7	43%
MSR	64	36	56%
Total	228	120	53%

In other words, a little over half of the authors (53%) provided a preprint link. And, almost half of the authors decided not to.
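The percentages follow directly from the two count columns. As a minimal sketch (the variable and function names are my own, purely for illustration), the totals and shares can be recomputed from the counts in the table. The script prints one decimal place, so its output may differ by a point from the whole-number percentages shown above, depending on whether those were rounded or truncated.

```python
# Counts copied from the table above: (papers presented, preprints linked).
tracks = {
    "Research Track": (85, 49),
    "ICSE NIER": (31, 19),
    "ICSE SEIP": (19, 6),
    "ICSE Education": (13, 3),
    "ICSE Tools": (16, 7),
    "MSR": (64, 36),
}


def preprint_share(papers, preprints):
    """Return the share of papers with a preprint link, as a percentage."""
    return 100.0 * preprints / papers


# Column totals, recomputed rather than copied from the "Total" row.
total_papers = sum(papers for papers, _ in tracks.values())
total_preprints = sum(preprints for _, preprints in tracks.values())

for name, (papers, preprints) in tracks.items():
    share = preprint_share(papers, preprints)
    print(f"{name:15s} {preprints:3d} / {papers:3d} = {share:.1f}%")

overall = preprint_share(total_papers, total_preprints)
print(f"{'Total':15s} {total_preprints:3d} / {total_papers:3d} = {overall:.1f}%")
```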

I hope and expect that for upcoming ICSE conferences, more authors will submit their preprint links. As a comparison, at the recent FORTE conference, 75% of the authors submitted a preprint link.

For ICSE, this year was the first time preprint linking was available. Authors may not have been familiar with the phenomenon, may not have realized in advance how wonderful a program with links to freely available papers is, may have missed the deadline for submitting the link, or may have missed the email asking for a link altogether as it ended up in their spam folder. And, in all honesty, even I managed to miss the opportunity to send in my link in time for some of my MSR 2013 papers. But that won’t happen again.

Preprint Link Sustainability

An issue of some concern is the “sustainability” of the preprint links — what happens, for example, to homepages with preprints once the author changes jobs (once the PhD student finishes)?

The natural solution is to publish preprints not just on individual home pages, but to submit them to repositories that are likely to have a longer lifetime, such as arXiv, or your own technical report series.

An interesting route is taken by ICPC, which instead of preprint links simply provides a dedicated preprint search on Google Scholar, with authors and title already filled in. If a preprint has been published somewhere, and the author/title combination is sufficiently unique, this works remarkably well. MSR uses a mixture of both approaches, by providing a search link for presentations for which no preprint link was provided.
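To illustrate the idea of a pre-filled search link, here is a minimal sketch. I do not know how ICPC actually generates its links; the helper name, the choice to quote the title as an exact phrase, and the example title and author names are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the Google Scholar search page.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def preprint_search_url(title, authors):
    """Build a search URL with title and author names already filled in.

    Hypothetical helper: the title is quoted so it is searched as an
    exact phrase, and the author surnames are added as plain terms.
    """
    terms = [f'"{title}"'] + list(authors)
    return SCHOLAR_SEARCH + "?" + urlencode({"q": " ".join(terms)})


# Made-up paper, used only to show the shape of the resulting link.
url = preprint_search_url("An Example Paper Title", ["Doe", "Roe"])
print(url)
```

The more distinctive the title, the more likely the first hit is the preprint itself, which matches the observation that this works well when the author/title combination is sufficiently unique.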

Implications

Open access, and hence preprint publishing, is of utmost importance for software engineering.

Software engineering research is unique in that it has a potentially large target audience of developers and software engineering practitioners that is on line continually. Software engineering research cannot afford to dismiss this audience by hiding research results behind paywalls.

For this reason, it is inevitable that in the long run, software engineering researchers will transform their professional organizations (ACM and IEEE) so that their digital libraries will make all software engineering results available via open access.

Irrespective of this long term development, the software engineering research community must hold on to the new preprint linking approach to leverage green open access.

Thus:

As an author, self-archive your paper as a preprint or technical report. Consider your paper unpublished if the preprint is not available.
If you are a professor leading a research group, inspire your students and group members to make all of their publications available as preprint.
If you are a reviewer for a conference, insist that your chairs ensure that preprint links are collected and made available on the conference web site.
If you are a conference organizer or program chair, convince all authors to publish preprints, and make these links permanently available on the conference web site.
If you are on a hiring committee for new university staff members, demand that candidates have their publications available as preprints.

Much of this has been possible for years. Maybe one of the reasons these practices have not been adopted in full so far, is that they cost some time and effort — from authors, professors, and conference organizers alike — time that cannot be used for creative work, and effort that does not immediately contribute to tenure or promotion. But it is time well spent, as it helps to disseminate our research to a wider audience.

Thanks to the ICSE move, there may now be momentum to make a full-swing transition to green open access in the software engineering community. I look forward to 2014, when all software engineering conferences will have adopted preprint linking, and 100% of the authors will have submitted their preprint links. Let us not miss this unique opportunity.

Acknowledgments

I am grateful to Dirk Beyer, for setting up preprint linking at ICSE, and for providing feedback on this post.

Update (Summer 2013)

ESEC FSE 2013 adopted preprint linking, and shows another good use of it: Prior to the conference every day one or two preprint links were tweeted through the conference’s @esecfse Twitter account.
For 2013, SPLASH / OOPSLA has included preprint links in its program as well.

24Apr2013

David Notkin on Why We Publish

Posted in Research, Teaching by Arie van Deursen

This week David Notkin (1955-2013) passed away, after a long battle against cancer. He was one of my heroes. He did great research on discovering invariants, reflexion models, software architecture, clone analysis, and more. His 1986 Gandalf paper was one of the first I studied when starting as a PhD student in 1990.

In December 2011, David sent me an email in which he expressed interest in doing a sabbatical in our TU Delft Software Engineering Research Group in 2013-2014. I was as proud as one can be. Unfortunately, half a year later he had to cancel his plans due to his health.

David was loved by many, as he had a genuine interest in people: developers, software users, researchers, you. And he was a great (friendly and persistent) organizer — 3 weeks ago he still answered my email on ICSE 2013 organizational matters.

In February 2013, he wrote a beautiful editorial for the ACM Transactions on Software Engineering and Methodology, entitled Looking Back. His opening reads: “It is bittersweet to pen my final editorial”. Then David continues to address the question why it is that we publish:

“… I’d like very much for each and every reader, contributor, reviewer, and editor to remember that the publications aren’t primarily for promotions, or for citation counts, or such.
Rather, the intent is to make the engineering of software more effective so that society can benefit even more from the amazing potential of software.
It is sometimes hard to see this on a day-to-day basis given the many external pressures that we all face. But if we never see this, what we do has little value to society. If we care about influence, as I hope we do, then adding value to society is the real measure we should pursue.
Of course, this isn’t easy to quantify (as are many important things in life, such as romance), and it’s rarely something a single individual can achieve even in a lifetime. But BHAGs (Big Hairy Audacious Goals) are themselves of value, and we should never let them fade far from our minds and actions.”

Dear David, we will miss you very much.

Desk Rejected

Posted in Research by Arie van Deursen

One of the first things we did after all NIER 2013 papers were in, was identifying papers that should be desk rejected. What is a desk reject? Why are papers desk rejected? How often does it happen? What can you do if your paper is desk rejected?

A desk reject means that the program chairs (or editors) reject a paper without consulting the reviewers. This is done for papers that fail to meet the submission requirements, and which hence cannot be accepted. Filtering out desk rejects in advance is common practice for both conferences and journals.

To identify such desk rejects for NIER 2013, program co-chair Sebastian Elbaum and I made a first pass through all 160+ submissions. In the end, we desk rejected around 10% of the submissions (a little more than I had anticipated).

Causes for rejection included problems in:

Formatting: The paper does not meet the 4 page limit;
Scope: The paper is not about software engineering;
Presentation: The paper contains, e.g., too many grammatical problems;
Innovation: The paper does not explain how it builds upon and extends the existing body of knowledge.

Of these, for NIER the formatting was responsible for half of the desk rejects.

Plagiarism
A potential cause that we did not encounter is plagiarism (fraud), or its special form self-plagiarism (submitting the same, or very similar, papers to multiple venues).

In my experience, plain plagiarism is not very common (I encountered one case in another conference, where we had to apply the IEEE Guidelines on Plagiarism).

Self-plagiarism is a bigger problem as it can range from copy-pasting a few paragraphs from an earlier paper to a straight double submission. While the former may be acceptable, the latter is considered a cardinal sin (your paper will be rejected at both venues, and reviewers don’t like reviewing a paper that cannot be accepted). And there are many shades of grey in between.

Notifications
We sent out notifications to authors of desk rejected papers within a few days after the submission deadline (it took a bit of searching to figure out that the best way to do this is to use the delete paper option from EasyChair). Thus, desk rejects not only serve to reduce the reviewing load of the program committee, but also to provide early feedback to authors whose papers just cannot make it.

Is there anything you can do to avoid being desk rejected?
The simple advice is to carefully read the submission guidelines. Besides that, it may be wise to submit a version adhering to all criteria early on when there is no immediate deadline stress yet. This may then serve as a fallback in case you mess up the final submission (uploading, e.g., the wrong pdf). Usually chairs have access to these earlier versions, and they can then decide to use the earlier version in case (only) the final version is a clear desk reject (for NIER this situation did not occur).

Is there anything you can do after being desk rejected?
Usually not. Most desk rejects are clear violations of submission requirements. If you think your desk reject is based on subjective grounds (presentation, innovation), and you strongly disagree, you could try to contact the chairs to get your paper into the reviewing phase anyway. The most likely outcome, however, will still be a reject, so it may not be in your self-interest to postpone this known outcome.

Submission times
And … are desk rejects related to the paper submission time? Yes, there is a (mild) negative correlation: For NIER, there were more desk rejects in the earlier than in the later submissions. My guess is that this is quite common. There seem to be authors who simply try their same pdf at multiple conferences, hoping for an easy conference with little reviewing only.

Acceptance rates
This brings me to the final point. Conferences are commonly ranked based on their acceptance ratio. The lower the percentage of accepted papers, the more prestigious the conference is considered. The most interesting figure is obtained if acceptance rates are based on the serious competition only — i.e., the subset of papers that made it to the reviewing phase. Desk rejected papers do not qualify as such, and hence should not be taken into account when computing conference acceptance rates.
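The effect of excluding desk rejects can be made concrete with a small calculation. The sketch below is only an illustration: the function name is my own, and the numbers (200 submissions, 20 desk rejects, 36 accepted papers) are invented for the example and are not the NIER figures.

```python
def acceptance_rate(accepted, submitted, desk_rejected=0):
    """Return accepted papers as a fraction of the papers that were reviewed.

    Desk rejected papers never reach the reviewing phase, so they are
    subtracted from the number of submissions before dividing.
    """
    reviewed = submitted - desk_rejected
    if reviewed <= 0:
        raise ValueError("no papers reached the reviewing phase")
    return accepted / reviewed


# Invented numbers, for illustration only.
naive = acceptance_rate(accepted=36, submitted=200)
adjusted = acceptance_rate(accepted=36, submitted=200, desk_rejected=20)

print(f"Counting all submissions:  {naive:.1%}")
print(f"Counting reviewed papers:  {adjusted:.1%}")
```

With these made-up numbers, counting desk rejects makes the conference look more selective than it is among the serious competition, which is the reason to leave them out.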