Writers and Collaborators Workshops

In September this year we will organize a Lorentz workshop in the area of software analytics and big software. Lorentz worskhops take place in the Lorentz Center in Leiden, and are somewhat similar to Dagstuhl seminars common in computer science: A small group, a week long retreat, and a focus on interaction and collaboration.

Workshop poster

To make this interaction happen, we will experiment with “writer’s and collaborator’s workshops”, inspired by Writer’s Workshops for design patterns.

The workshops we have in mind are short (1-2 hour) sessions, in which a small group of participants (the “discussants”) study one or more papers (proposed by the “author”) in depth.

The primary objective of the session is to provide feedback on the paper to the author. This feedback can relate to any aspect of the paper, such as the experimental setup, the related work, the precise objective, future work that can be carried out to build upon these results, etc.

Besides that, the discussion of each paper serves to explore possible (future) collaborations between any of the participants. Thus, discussants can bring in their own related work, and explore how joining forces can help to further advance the paper’s key results.

The set up of the workshops draws inspiration from Writer’s Workshops commonly used in the design patterns community, which in turn were inspired by workshops int he creative literature community. Pattern workshops have been used to review, evaluate, and improve pattern descriptions. At the same time, the process is akin to a peer review process, except that the objective is not paper selection, but in depth discussion between authors and participants about the key message of a paper.

The specific format we propose is as follows.

The preparation phase aims to match authors and discussants. Using a conference paper management system like EasyChair, the steps include:

  1. Authors submit the paper they would like to have discussed. This can be a paper currently under review (e.g., their most recent ICSE submissions), a draft version of a paper they would like to submit, or an already published paper they would like to expand (for example for a journal submission).

  2. All workshop participants can see all papers, and place “bids” on papers they would be particularly interested in studying in advance.

  3. Papers and participants are grouped into coherent units of 3-4 papers and around 10 participants each.

  4. Each paper to be discussed gets assigned at least three discussants, based on the groups and bids.

  5. Discussants study the papers assigned in advance, and compile a short summary of the paper and its main strenghts and points for improvement.

The actual workshops will take 1-2 hours, has up to 10 participants, and includes the discussion of 2-3 papers using 30-45 minutes per paper. We propose the following format:

  1. For each workshop, we assign one moderator to steer the process.

  2. One of the discussants is assigned to summarize the paper in around 5 minutes, and explain it to the participants.

  3. Each discussant explains what he or she particularly liked about the paper

  4. Each discussant identifies opportunities for possible improvements to the paper.

  5. Workshop participants who did not review the paper themselves offer their perspectives on the discussion, including areas of further work.

  6. After this, the author him or herself can step in, and respond to the various points raised.

  7. As the discussion concludes, the moderator provides a summary of the main findings of the discussion of this paper.

  8. The process is repeated for the next paper, rotating the author, moderator, and discussant roles.

If you have ever attended a physical PC meeting, you will recognize our attempt to keep some of the good parts of a PC meeting, without the need to make any form of “acceptance” decision.

Since several of the lessons learned during such a session will transcend the individual papers discussed, we will also use plenary sessions in which each of the moderators can summarize the main findings of their workshops, and share them with everyone.

As also emphasized by the patterns community, this format requires a safe setting with participants who trust each other. In particular:

  • Papers discussed are confidential: Authors need not be scared that participants “steal” their ideas;
  • Feedback is directed at the work rather than the author, preserving the dignity of the author.

Clearly, such a “writers and collaborators workshop” does require work from the participants, both in terms of following the protocol and in preparing the discussions. So we will have to see if it really works or whether some adjustments are necessary.

Yet this format does provide an excellent way to truly engage with each other’s research, and we look forward to the improved research results and future collaborations that will emerge from this.

If you have any exeprience with formats like this, please let me know!


P.S. We still have some places available in the workshop, so contact me if you are interested in participating.

The Battle for Affordable Open Access

Last week, Elsevier cut off thousands of scientists in Germany and Sweden from reading its recent journal articles, when negotiations over the cost of a nationwide open-access agreement broke down.

In these negotiations, universities are trying to change academic publishing, while publishers are defending the status quo. If you are an academic, you need to decide how to respond to this conflict:

  1. If you don’t change your own behavior, you are chosing Elsevier’s side, helping them maintain the status quo.
  2. If you are willing to change, you can help the universities. The simplest thing to do is to rigorously self-archive all your publications.

The key reason academic publishing needs to change is that academic publishers, including Elsevier, realize profit margins of 30-40%.

Euro bills

To put this number in perspective, consider my university, TU Delft. Our library spends €4-5 million each year on (journal) subscriptions. 30-40% of this amount, €1-2 million each year, ends up directly in the pockets of the shareholders of commercial publishers.

This is unacceptable. My university needs this money: To handle the immense work load coming with ever increasing student numbers, and to meet the research demands of society. A university cannot afford to waste money by just handing it over to publishers.

Universities across Europe have started to realize this. The Dutch, German, French, and Swedish universities have negotiated at the national level with publishers such as Springer Nature, Wiley, Taylor & Francis, Oxford University Press, and Elsevier (the largest publisher). In many cases deals have been made, with more and more options for open access publishing, at prices that were acceptable to the universities.

However, in several cases no deals have been made. The Dutch universities could not agree with the Royal Society of Chemistry Publishing, the French failed with Springer Nature, and now Germany and Sweden could not come to agreement with Elsevier. A common point of contention is that universities are only willing to pay for journal subscriptions if their employees can publish open access without additional article processing charges — a demand that directly challenges the current business model in academic publishing.

The negotiations are not over yet. Both in terms of open access availability and in terms of price publishers are far from where the universities want them to be. And if the universities would not negotiate themselves, tax payers and governments could simply force them, by putting a cap on the amount of money universities are allowed to spend on journal subscriptions.

Universities are likely to join forces, also across nations. They will determine maximum prices, and will not be willing to make exceptions. The negotiations will be brutal, as the publishers have much to loose and much to fight for.

In all these negotiations it is crucial that universities take back ownership of what they produce. Every single researcher can contribute, simply by making all of their own papers available on their institutional (pure.tudelft.nl for my university) or subject repositories (e.g., arxiv.org). This helps in two ways:

  • It helps researchers cut off (Germans and Swedes as we speak) from publishers in case negotiations fail.
  • It reduces the publishers’ power in future negotiations as the negative effects of cancellations have been reduced.

This seems like a simple thing to do, and it is: It should not take an average researcher more than 10 minutes to post a paper on a public repository.

Nevertheless, during my two years as department head I have seen many researchers who fail to see the need or take the time to upload their papers. I have begged, prayed, and pushed, wrote a green open access FAQ to address any legal concerns researchers might have, and wrote a step-by-step guide on how to upload a paper.

Open Access Adoption at TU Delft

On top of that, my university, like many others, have made it compulsory for its employees to upload their papers to the institutional repository (this is not surprising since TU Delft plays a leading role in the Dutch negotiations between universities and publishers). Furthermore both national (NWO) and European (H2020, Horizon Europe) funding agencies insist on open access publications.

Despite all this, my department barely meets the university ambition of having 60% of its 2018 publications available as (green or gold) open access. To the credit of my departmental employees, however, they do better than many other departments. Also pre-print links uploaded to conference sites have typically been less than 60%, suggesting that the culture of self-archiving in computer science leaves much to be desired.

If anything, the recent cut off by Elsevier in Sweden and Germany emphasizes the need for self-archiving.

If you’re too busy to self-archive, you are helping Elsevier getting rich from public money.

If you do self-archive, you help your university explain to publishers that their services are only needed when they bring true value to the publishing process at an affordable price.


© Arie van Deursen, 2018. Licensed under CC BY-SA 4.0.

Euro image credit: pixabay, CC0 Creative Commons.

My Last Program Committee Meeting?

This month, I participated in what may very well have been my last physical program committee (PC) meeting, for ESEC/FSE 2018. In 2017, top software engineering conferences like ICSE, ESEC/FSE, ASE and ISSTA (still) had physical PC meetings. In 2019, these four will all switch to on line PC meetings instead.

I participated in almost 20 of such meetings, and chaired one in 2017. Here is what I learned and observed, starting with the positives:

  1. As an author, I learned the importance of helping reviewers to quickly see and concisely formulate the key contributions in a way that is understandable to the full pc.

  2. As a reviewer I learned to study papers so well that I could confidently discuss them in front of 40 (randomly critical) PC members.

  3. During the meetings, I witnessed how reviewers can passionately defend a paper as long as they clearly see its value and contributions, and how they will kill a paper if it has an irreparable flaw.

  4. I started to understand reviewing as a social process in which reviewers need to be encouraged to change their minds as more information unfolds, in order to arrive at consensus.

  5. I learned phrases reviewers use to permit them to change their minds, such as “on the fence”, “lukewarm”, “not embarrassing”, “my +1 can also be read as a -1”, “I am not an expert but”, etc. Essential idioms to reach consensus.

  6. I witnessed how paper discussions can go beyond the individual paper, and trigger broad and important debate about the nature of the arguments used to accept or reject a paper (e.g. on evaluation methods used, impact, data availability, etc)

  7. I saw how overhearing discussions of papers reviewed by others can be useful, both to add insight (e.g. additional related work) and to challenge the (nature of the) arguments used.

  8. I felt, when I was PC co-chair, the pressure from 40 PC members challenging the consistency of any decision we made on paper acceptance. In terms of impact on the reviewing process, this may well be the most important benefit of a physical PC meeting.

  9. I experienced how PC meetings are a great way to build a trusted community and make friends for life. I deeply respected the rigor and well articulated concerns of many PC members. And nothing bonds like spending two full days in a small meeting room with many people and insufficient oxygen.

I also witnessed some of the problems:

  1. My biggest struggle was the incredible inefficiency of PC meetings. They take 1-2 days from 8am-6pm, you’re present at discussions of up to 100 papers discussed in 5-10 minutes each, yet participate in often less than 10 papers, in some cases just one or two.

  2. I had to travel long distances just for meetings. Co-located meetings (e.g. the FSE meeting is typically immediately after ICSE) reduce the footprint, but I have crossed the Atlantic multiple times just for a two day PC meeting.

  3. My family paid a price for my absence caused by almost 20 PC meetings. I have missed multiple family birthdays.

  4. The financial burden on the conference (meeting room + 40 x dinner and 80 lunches, €5000) and each PC member (travel and 2-3 hotel nights, adding up easily to €750 per person paid by the PC members) is substantial.

  5. I saw how vocal pc members can dominate discussions, yielding less opportunity for the more timid pc members who need more time to think before they dare to speak.

  6. I hardly attended a PC meeting in which not at least a few PC members eventually had to cancel their trip, and at best participated via Skype. This gives papers reviewed by these PC members a different treatment. As PC chair for ESEC/FSE we had five PC members who could not make it, all for valid (personal, painful) reasons. I myself had to cancel one PC meeting a week before the meeting, when one of my children had serious health problems.

  7. Insisting on a physical PC meeeting limits the choice of PC members: When inviting 40 PC members for ESEC/FSE 2017, we had 20 candidates who declined our invitation as they could not commit a year in advance to attending a PC meeting (in Buenos Aires).

Taking the pros and cons together, I have come to believe that the benefits do not outweigh the high costs. It must be possible to organize an on line PC meeting with special actions to keep the good parts (quality control, consistent decisions, overhearing/inspecting each others reviews, …).

I look forward to learning from ICSE, ESEC FSE, ISSTA and ASE experiences in 2019 and beyond about best practices to apply for organizing a successful on line PC meeting.

In principle, ICSE will have on line PC meetings in 2019, 2020, and 2021, after which the steering committee will evaluate the pros and cons.

As ICSE 2021 program co-chairs, Tao Xie and I are very happy about this, and we will do our best to turn the ICSE 2021 on line PC meeting into a great success, for the authors, the PC members, and the ICSE community. Any suggestions on how to achieve this are greatly appreciated.

T-Shirt saying "Last PC Meeting Ever?"

Christian Bird realized the ESEC/FSE 2018 PC meeting may be our last, and realized this nostalgic moment deserved a T-shirt of its own. Thanks!!


(c) Arie van Deursen, June 2018.

Managing Complex Spreadsheets — The Story of PerfectXL

This week we finished grading of the software architecture course I’m teaching.

Like many teachers, I use a spreadsheet for grading together with my co-teachers and teaching assistants. In this case, we concurrently worked with five people on a Google Spreadsheet. The resulting spreadsheet is quite interesting:

  • The spreadsheet currently has 22 sheets (tabs)

  • There are input sheets for basic information on the over one hundred students in the class, the groups they form, and the rubrics we use.

  • There are input sheets from various forms the students used to enter course-related information

  • There are input sheets for different sub-assignments, which the teachers and assistants use to enter subgrades for each rubric: Some grades are individual, others are per team. Such sheets also contain basic formulas to compute grades from rubrics.

  • There are overview sheets collecting the sub-grades from various sheets, combining them to overall grades. The corresponding formulas can become quite tricky, involving rounding, lookups, sumproducts, thresholds, conditional logic based on absence or presence of certain grades, etc.

  • There are various output sheets, to report grades to students, to export grades to the university’s administrative systems, and to offer diagrams showing grade distributions for educational assessments of the course.

The spreadsheet used has a history of five years: Each year we take the existing one, copy it, and remove the student data. We then adjust it to the changes we made to the course (additional assignments, new grading policies, better rubrics, etc).

Visualization of sheet dependencies

All in all, this spreadsheet has grown quite complex, and it is easy to make a mistake. For example, I once released incorrect grades — a rather stressful event both for my students and myself. And all I did wrong was forgetting the infamous false argument needed in a vlookup — despite the fact that I was well aware of this “feature”. For the this year’s spreadsheet we had duplicate student ids, in a column where each row had to be unique, leading to a missing grade, and again extra effort and stress to resolve this as soon as possible.

I suspect that if you use spreadsheets seriously, for example for grading, you recognize the characteristics of my spreadsheet — and maybe your sheets are even more complicated.

Now I have an interest in spreadsheets that goes beyond that of the casual user: As a software engineering researcher, I have looked extensively at spreadsheets. I did this together with Felienne Hermans, first when she was doing her PhD under my supervision in the context of the Perplex project (co-funded by Microsoft) and then in the context of the Prose project (funded by the Dutch STW agency). From a research perspective, these projects were certainly successful, leading to a series of publications in such venues as ECOOP 2010, ICSE 2011-2013, ICSM, EMSE, and SANER.

But we did our research not just to publish papers: We also had (and have) the ambition to actually help the working spreadsheet user, as well as large organizations that depend on spreadsheets for business-critical decision making.

To that end, we founded a company, Infotron, which offers tooling, services, consultancy, and training to help organizations and individuals become more effective with their spreadsheets.

After several years of operating somewhat under the radar, the Infotron team (headed by CEO Matéo Mol) has now launched an on line service, PerfectXL, in which users can upload a spreadsheet and get it analyzed. The service then helps in the following ways:

  • PerfectXL can visualize the architectural dependencies between sheets, as shown above for my course sheet;
  • PerfectXL can identify risks (such as the vlookup I mentioned, interrupted ranges, or overly complex conditional logic);
  • PerfectXL can assess the structure and quality of the formulas in your sheet.

If this sounds interesting, you can try out the service for free at perfectxl.com. There are various pricing options that help Infotron run and further grow this service — pick the subscription that suits you and your organization best!

Even if you decide not to use the PerfectXL service, the site contains a lot of generally useful information, such as various hints and tips on how to create and maintain well-structured spreadsheets.

Enjoy!

Golden Open Access for the ACM: Who Should Pay?

In a move that I greatly support, the ACM Special Interest Group on Programming Languages (SIGPLAN), is exploring various ways to adopt a truly Golden Open Access model, by rolling out a survey asking your opinion, set up by Michael Hicks. Even though I myself am most active in ACM’s Special Interest Group on Software Engineering SIGSOFT, I do publish at and attend SIGPLAN conferences such as OOPSLA. And I sincerely hope that SIGSOFT will follow SIGPLAN’s leadership in this important issue.

ACM presently supports green open access (self-archiving) and a concept called “Open TOC” in which papers are accessible via a dedicated “Table of Contents” page for a particular conference. While better than nothing, I agree with OOPSLA 2017 program chair Jonathan Aldrich who explains in his blog post that Golden Open Access is much preferred.

This does, however, raise the question who should pay for making publications open access, which is part of the SIGPLAN survey:

  • Attendants Pay: Increase the conference fees: SIGPLAN estimates that this would amount to an increase by around $50,- per attendee.

  • Authors Pay: Introduce Article Processing Charges: SIGPLAN indicates that if a full conference goes open access this would presently amount to $400 per paper.

screen-shot-2017-01-05-at-4-23-12-pm

Note that the math here suggest that the number of registrants is around 8 times the number of papers in the main research track. Also note that it assumes that only papers in the main research track are made open access. A conference like ICSE, however, has many workshops with many papers: It is equally important that these become open access too, which would change the math considerably.

The article processing charges of $400,- are presented as a given: They may seem in line with what commercial publishers charge, but they are certainly very high compared to what, e.g. LIPIcs charges for ECOOP (which is less than $100). These costs of $400,- come from ACM’s desire (need) to continue to make a substantial profit from their publishing activities, and should go down.

In his blog post, Jonathan Aldrich argues for the “author pays” model. His reasoning is that this can be viewed as a “funder pays” model: Most authors are funded by research grants, and usually in those grants funds can be found to cater for the costs involved in publishing open access.

On this point (and this point alone) I disagree with Jonathan. To me it feels fundamentally wrong to punish authors by making them pay $400 more for their registration. If anything, they should get a reduction for delivering the content of the conference.

I see Jonathan’s point that some funding agencies are willing to cover open access costs (e.g. NSF, NWO, H2020), and that it is worthwhile to explore how to tap into that money. But this requires data on what percentage of papers could be labeled as “funded”. For my department, I foresee several cases where it would be the department who’d have to pay for this instead of an external agency.

I do sympathize with Jonathan’s appeal to reduce conference registration costs, which can be very high. But the cost of making publications open access should be borne by the full community (all attendants), not just by those who happen to publish a paper.

Shining examples of open access computer science conferences are the Usenix, AAAI, and NIPS events. Full golden open access of all content, and no extra charges for authors — these conferences are years ahead of the ACM.

Do you have an opinion on “author pays” versus “participant pays”? Fill in the survey!

Thank you SIGPLAN for initiating this discussion!

Self-Archiving Publications in Elsevier Pure

In 2016, TU Delft adopted Elsevier Pure as its database to keep track of all publications from its employees.

At the same time, TU Delft has adopted a mandated green open access policy. This means that for papers published after May 2016, an author-prepared version (pdf) must be uploaded into Pure.

I am very happy with this commitment to green open access (and TU Delft is not alone). This decision also means, however, that we as researchers need to do some extra work, to make our author-prepared versions available.

To make it easier for you to upload your papers and comply with the green open access policy, here are some suggestions based on my experience so far working with Pure.

I can’t say I’m a big fan of Elsevier Pure. In the interest of open access, however, I’m doing my best to tolerate the quirks of Pure, in order to help the TU Delft to share all its research papers freely and persistently with everyone in the world.

Elsevier Pure is used at hundreds of different universities. If you work at one of them, this post may help you in using Pure to make your research available as open access.

The Outcome

Pure Paper Data

Anyone can browse publications in Pure, available at https://pure.tudelft.nl.

All pages have persistent URL’s, making it easy to refer to a list of all your publications (such as my list), or individual papers (such as my recent one on crash reproduction). For all recent papers I have added a pdf of the version that we as authors prepared ourselves (aka the postprint), as well as a DOI link to the publisher version (often behind a paywall).

Thus, you can use Pure to offer, for each publication, your self-archived (green open access) version as well as the final publisher version.

Moreover, these publications can be aggregated to the section, department, and faculty level, for management reporting purposes.

In this way, Pure data shows the tax payers how their money is spent on academic research, and gives the tax payer free access to the outcomes. The tax payer deserves it that we invest some time in populating Pure with accurate data.

Accessing Pure

To enter publications into pure, you’ll need to login. On https://pure.tudelft.nl, in the footer at the right, you’ll find “Log into Pure”. Use your TU Delft netid.

If you’re interested in web applications, you will quickly recognize that Pure is a fairly old system, with user interface choices that would not be made these days.

Entering Meta-Data

You can start entering a publication by hitting the big green button “Add new” at the top right of the page. It will open a brand new browser window for you.

In the new window, click “Research Output”, which will turn blue and expand into three items.

Then there are several ways to enter a publication, including:

  1. Import via Elsevier Scopus, found via “Import from Online Source”. This is by far the easiest, if (1) your publication venue is indexed by Scopus, (2) it is already visible at Scopus (which typically takes a few months), and if (3) you can find it on Scopus. To help Scopus, I have set up an ORCID author identifier and connected it to my Scopus author profile.

  2. Import via Bibtex, found via “Import from file”. If you click it, importing from bibtex is one of the options. You can obtain bibtex entries from DBLP, Google Scholar, ACM, your departmental publications server, or write them by hand in your favorite editor, and then copy paste them into Pure.

  3. Entering details via a series of buttons and forms (“Create from template”). I recommend not to use this option. If you go against this advice, make sure that if you want to enter a conference paper, you do not pick the template “Paper/contribution to conference”, as you should pick “Conference Contribution/Chapter in Conference Proceedings” instead. Don’t ask me why.

In all cases, yet another browser window is opened, in which you can inspect, correct, and save the bibliographic data. After saving, you’ll have a new entry with a unique URL that you can use for sharing your publication. The URL will stay the same after you make additional updates.

Entering your Author-Prepared version

With each publication, you can add various “electronic versions”.

Each can be a file (pdf), a link to a version, or a DOI. For pdfs you want to upload, make sure you check it meets the conditions under your publisher allows self-archiving.

Pure distinguishes various version types, which you can enter via the “Document version” pull down menu. Here you need to include at least the following two versions:

  • The “accepted author manuscript”. This is also called a postprint, and is the version that (1) is fully prepared by you as authors; and that (2) includes all improvements you made after receiving the reviews. Here you can typically upload the pdf as you prepared it yourself.

  • The “final published version”. This is the Publisher’s version. It is likely that the final version is copyrighted by the publisher. Therefore, you typically include a link (DOI) to the final version, and do not upload a pdf to Pure. If you import from Scopus, this field is automatically set.

Furthermore, Pure permits setting the “access to electronic version”, and defining the “public access”. Relevant items include:

  • Open, meaning (green) open access. This is what I typically select for the “accepted author manuscript”.

  • Restricted, meaning behind a paywall. This is what I typically select for the final published version.

  • Embargoed, meaning that the pdf cannot be made public until a set date. Can be used for commercial publishers who insist on restricting access to post-prints from institutional repositories in the first 1-2 years.

Example Pure cover page.

The vast majority (80%) of the academic publishers permits authors to archive their accepted manuscripts in institutional repositories such as Pure. However, publishers typically permit this under specific conditions, which may differ per publisher. You can check out my Green Open Access FAQ if you want to learn more about these conditions, and how to find them for your (computer science) publisher.

Once uploaded, your pdf is available for download for everyone. Pure adds a cover page with meta-data such as the citation (how it is published) and the DOI to the final version. This cover page is useful, as it helps to meet the intent of the conditions most publishers require on green open access publishing.

Google Scholar indexes Pure, so after a while your paper should also appear on your Scholar page.

A Paper’s Life Cycle

Making papers early available is one of the benefits of self-archiving. This can be done in Pure by setting the paper’s “Publication Status”. This field can have the following values:

  1. “In preparation”: Literally a pre-print. Your paper can be considered a draft and may still change.
  2. “Submitted”: You submitted your paper to a journal or conference where it is now under review.
  3. “Accepted/In press”: Yes, paper accepted! This also means that you as an author can share your “accepted author manuscript”.
  4. “E-Pub ahead of print”: I don’t see how this differs from the Accepted state.
  5. “Published”: The paper is final and has been officially published.

In my Green Open Access FAQ I provide an answer to the question Which Version Should I Self-Archive.

I typically enter publications once accepted, and share the Pure link with the accepted author manuscript as pre-print link on Twitter or on conference sites (e.g. ICSE 2018)

In particular, I do the following once my paper is accepted:

  1. I create a bibtex entry for an @inproceedings (conference, workshop) or @article (journal) publication.
  2. I upload the bibtex entry into pure.
  3. I add my own pdf with the author-prepared version to the resulting pure entry
  4. I set the Publication Status to “Accepted”.
  5. I set the Entry Status (bottom of the page) to “in progress”
  6. I save the entry (bottom of the page)
  7. I share the resulting Pure link on Twitter with the rest of the world so that they can read my paper.

Once the publisher actually manages to publish this paper as well (this may be several months later!), I update my pure entry:

  1. I add the DOI link to the final published version.
  2. I provide the missing bibliographic meta-data (page numbers, volume, number, …).
  3. I set the Publication Status to “Published”.
  4. I set the Entry Status to “for approval” (by the library who can then change it into an immutable “approved” if they think this is a valid entry).

My preprint links I shared still contain a pointer to the self-archived pdf, but now also to the official version at the publisher for those who have access through the pay wall.

Permalinks

The Pure page for your paper including all meta-information and all versions of that paper (example) in principle is stable, and its URL provide a permanent link (unless you delete it).

You can also directly link to the individual pdfs you upload (example). However, these are more volatile: If you upload a newer version the old link will be dead. Moreover, in some cases the (TU Delft) library has moved pdfs around thereby destroying old pdf links.

Therefore, I recommend to use links to the full record rather than individual pdfs when sharing pure links.

Complicated Author Names

Pure contains official employee names as registered by TU Delft.

Some authors publish under different (variants of their) names. For example, Dutch universities have trouble handling the complex naming habits of Portuguese and Brazilian employees.

If Pure is not able to map an author name to the corresponding employee, find the author name in the publication, click edit, and then click “Replace”. This allows searching the TU Delft employee database for the correct person.

If Pure has found the correct employee, but the name displayed is very differently from what is listed on the publication itself, you can edit the author for that publication, and enter a different first and last name for this publication.

Exporting Linked Bibtex (to Orcid)

If you’re logged in, you can download your publication list in various formats, including BibTex (you’ll find the button for this at the bottom of the page).

I prefer bibtex entries that have a url back to the place where all info is. Therefore, I wrote a little Python script to scrape a Pure web page (mine, yours, or anyone’s), that adds such information.

I use the bibtex entries produced by this script to populate my Orcid profile as well as our Departmental Publication Server with publications from Pure that link back to their corresponding pure page.


Version history

  • 20 November 2016: Version 0.1, for internal purposes.
  • 07 December 2016: Version 0.2, first public version.
  • 14 December 2016: Version 0.3, minor improvements.
  • 13 January 2017: Version 0.4, updated Google Scholar information.
  • 16 March 2017: Version 0.5, updated approval states based on correction from Hans Meijerrathken.
  • 17 March 2017: Version 0.6, life cycle and exporting added.
  • 24 November 2017: Version 0.7, simplified life cycle and approval states.
  • 03 March 2018: Version 0.8, added info on populating Orcid from Pure.
  • 27 July 2018: Version 0.9, added info on permalinks, licensed as CC BY-SA 4.0

Acknowledgments: Thanks to Moritz Beller for providing feedback and trying out Pure.

© Arie van Deursen, 2018. Licensed under CC BY-SA 4.0.

Green Open Access FAQ

Green field

Image credit: Flickr, user static_view

(Opinionated) answers to frequently asked questions on (green) open access, from a computer science (software engineering) research perspective.

Disclaimer: IANAL, so if you want to know things for sure you’ll have to study the references provided. Use at your own risk.

Green open access is trickier than I thought, so I might have made mistakes. Corrections are welcome, just as additional questions for this FAQ. Thanks!

Green Open Access Questions

  1. What is Green Open Access?
  2. What is a pre-print?
  3. What is a post-print?
  4. What is a publisher’s version?
  5. Do publishers allow Green Open Access?
  6. Under what conditions is Green Open Access permitted?
  7. What is Yellow Open Access?
  8. What is Gold Open Access?
  9. What is Hybrid Open Access?
  10. What are the Self-Archiving policies of common computer science venues?
  11. Is Green Open Access compulsory?
  12. Should I share my pre-print under a Creative Commons license?
  13. What is a good place for self-archiving?
  14. Can I use PeerJ Preprints for Self-Archiving?
  15. Can I use ResearchGate or Academia.edu for Self-Archiving?
  16. Which version(s) should I self-archive?
  17. What does Gold Open Access add to Green Open Access?
  18. Will Green Open Access hurt commercial publishers?
  19. What is the greenest publisher in computer science?
  20. Should I use ACM Authorizer for Self-Archiving?
  21. As a conference organizer, can I mandate Green Open Access?
  22. What does Green Open Access cost?
  23. Should I adopt Green Open Access?
  24. Where can I learn more about Green Open Access?

What is Green Open Access?

In Green Open Access you as an author archive a version of your paper yourself, and make it publicly available. This can be at your personal home page, at the institutional repository of your employer (such as the one from TU Delft), or at an e-print server such as arXiv.

The word “archive” indicates that the paper will remain available forever.

What is a pre-print?

A pre-print is a version of a paper that is entirely prepared by the authors.

Since no publisher has been involved in any way in the preparation of such a pre-print, it feels right that the authors can deposit such pre-prints where ever they want to. Before submission, the authors, or their employers such as universities, hold the copyright to the paper, and hence can publish the paper in on line repositories.

Following the definition of SHERPA‘s RoMEO project, pre-prints refer to the version before peer-review organized by a publisher.

What is a post-print?

Following the RoMEO definitions, a post-print is a final draft as prepared by the authors themselves after reviewing. Thus, feedback from the reviewers has typically been included.

Here a publisher may have had some light involvement, for example by selecting the reviewers, making a reviewing system available, or by offering a formatting template / style sheet. The post-print, however, is author-prepared, so copy-editing and final markup by the publisher has not been done.

What is a publisher’s version?

While pre- and post-prints are author-prepared, the final publisher’s version is created by the publisher.

The publishers involvement may vary from very little (camera ready version entirely created by authors) up to substantial (proof reading, new markup, copy editing, etc.).

Publishers typically make their versions available after a transfer of copyright, from the authors to the publisher. And with the copyright owned by the publisher, it is the publisher who determines not only where the publisher’s version can be made available, but also where the original author-prepared pre- or post-prints can be made available.

Do publishers allow Green Open Access?

Self-archiving of non-published material that you own the copyright to is always allowed.

Whether self-archiving of a paper that has been accepted by a publisher for publication is allowed depends on that publisher. You have transferred your copyright, so it is up to the publisher to decide who else can publish it as well.

Different publishers have different policies, and these policies may in turn differ per journal. Furthermore, the policies may vary over time.

The SHERPA project does a great job in keeping track of the open access status of many journals. You’ll need to check the status of your journal, and if it is green you can self-archive your paper (usually under certain publisher-specific conditions).

In the RoMEO definition, green open access means that authors can self-archive both pre-prints and post-prints.

Under what conditions is Green Open Access permitted?

Since the publisher holds copyright on your published paper, it can (and usually does) impose constraints on the self-archived versions. You should always check the specific constraints for your journal or publisher, for example via the RoMEO journal list.

The following conditions are fairly common:

  1. You generally can self-archive pre- and post-prints only, but not the publisher version.

  2. In the meta-data of the self-archived version you need to add a reference to the final version (for example through its DOI).

  3. In the meta-data of the self-archived version you need to include a statement of the current ownership of the copyright, sometimes through specific sentences that must be copy-pasted.

  4. The repository in which you self-archive should be non-commercial. Thus, arXiv and institutional repositories are usually permitted, but commercial ones like PeerJ Preprints, Academia.edu or ResearchGate are not.

  5. Some commercial publishers impose an embargo on post-prints. For example Elsevier permits sharing the post-print version on an institutional repository only after 12-24 months (depending on the journal).

Usually meeting the demands of a single publisher is relatively easy to do. Given points 2 and 3, it typically involves creating a dedicated pdf with a footnote on the first page with the required extra information.

However, every publisher has its own rules. If you publish your papers in a range of different venues (which is what good researchers do), you’ll have to know many different rules if you want to do green open access in the correct way.

What is Yellow Open Access?

Some publishers (such as Wiley) allow self-archiving of pre-prints only, and not of post-prints. This is referred to as yellow open access in RoMEO. Yellow is more restrictive than green.

As an author, I find yellow open access frustrating, as it forbids me to make the version of my paper that was improved thanks to the reviewers available via open access.

As a reviewer, I feel yellow open access wastes my effort: I tried to help authors by giving useful feedback, and the publisher forbids my improvements to be reflected in the open access version.

What is Gold Open Access?

Gold Open Access refers to journals (or conference proceedings) that are completely accessible to the public without requiring paid subscriptions.

Often, gold implies green, for example when a publisher such as PeerJ, PLOS ONE or LIPIcs adopts a Creative Commons license — which allows anyone, including the authors, to share a copy under the condition of proper attribution.

The funding model for open access is usually not based on subscriptions, but on Article Processing Charges, i.e., a payment by the authors for each article they publish (varying between $70 (LIPIcs) up to $1500 (PLOS ONE) per paper).

What is Hybrid Open Access?

Hybrid open access refers to a restricted (subscription-funded) journal that permits authors to pay extra to make their own paper available as open access.

This practice is also referred to as double dipping: The publisher catches revenues from both subscriptions and author processing charges.

University libraries and funding agencies do not like hybrid access, since they feel they have to pay twice, both for the authors and the readers.

Green open access is better than hybrid open access, simply because it achieves the same (an article is available) yet at lower costs.

What are the Self-Archiving policies of common computer science venues?

For your and my convenience, here is the green status of some publishers that are common in software engineering (check links for most up to date information):

  • ACM: Green, e.g., TOSEM, see also the ACM author rights. For ACM conferences, often the author-prepared camera-ready version includes a DOI already, making it easy to adhere to ACM’s meta-data requirements.
  • IEEE: Green, e.g., TSE. The IEEE has a policy that the IEEE makes a version available that meets all IEEE meta-data requirements, and that authors can use for self-archiving. See also their self-archiving FAQ.
  • Springer: Green, e.g., EMSE, SoSyM, LNCS. Pre-print on arXiv, post-print on personal page immediately and in repository in some cases immediately and in others after a 12 month embargo period.
  • Elsevier: Mostly green, e.g., JSS, IST. Pre-print allowed, post-print with CC-BY-NC-ND license on personal page immediately and in institutional repository after 12-48 month embargo period.
  • Wiley: Mostly yellow, i.e., only pre-prints can be immediately shared, and post-prints (even on personal pages) only after 12 month embargo. E.g. JSEP.

Luckily, there are also some golden open access publishers (which typically permit self-archiving as well should you still want that):

Is Green Open Access compulsory?

Funding agencies (NWO, EU, Bill and Melinda Gates Foundation, …) as well as universities (TU Delft, University of California, UCL, ETH Zurich, Imperial College, …) are increasingly demanding that all publications resulting from their projects or employees are available in open access.

My own university TU Delft insists, like many others, on green open access:

As of 1 May 2016 the so-called Green Road to Open Access publishing is mandatory for all (co)authors at TU Delft. The (co)author must publish the final accepted author’s version of a peer-reviewed article with the required metadata in the TU Delft Institutional Repository.

This makes sense: The TU Delft wants to have copies of all the papers that its employees produce, and make sure that the TU Delft stakeholders, i.e. the Dutch citizens, can access all results. Note that TU Delft insists on post-prints that include reviewer-induced modifications.

The Dutch national science foundation NWO has a preference for gold open access, but accepts green open access if that’s impossible (“Encourage Gold, require immediate Green“).

Should I share my pre-print under a Creative Commons license?

You should only do this if you are certain that the publisher’s conditions on self-archiving pre-prints are compatible with a Creative Commons license. If that is the case, you probably are dealing with a golden open access publisher anyway.

Creative Commons licenses are very liberal, allowing anyone to re-distribute (copy) the licensed work (under certain conditions, including proper attribution).

This effectively nullifies (some of) the rights that come with copyright. For that reason, publishers that insist on owning the full copyright to the papers they publish typically disallow self-archiving earlier versions with such a license.

For example, ACM Computing Surveys insists on a set statement indicating

… © ACM, YYYY. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution…

This “not for redistribution” is incompatible with Creative Commons, which is all about sharing.

Furthermore, a Creative Commons license is irrevocable. So once you picked it for your pre-print, you effectively made a choice for golden open access publishers only (some people might consider this desirable, but it seriously limits your options).

Therefore, my suggestion would be to keep the copyright yourself for as long as you can, giving you the freedom to switch to Creative Commons once you know who your publisher is.

What is a good place for self-archiving?

It depends on your needs.

Your employer may require that you use your institutional repository (such as the TU Delft Repository). This helps your employer to keep track of how many of its publications are available as open access. The higher this number, the stronger the position of your employer when negotiating open access deals with publishers. Institutional archiving still allows you to post a version elsewhere as well.

Subject repositories such as arXiv offer good visibility to your peers. In fields like physics using arXiv is very common, whereas in Computer Science this is less so. A good thing about arXiv is that it permits versioning, making it possible to submit a pre-print first, which can then later be extended with the post-print. You can use several licenses. If you intend publishing your paper, however, you should adopt arXiv’s Non-Exclusive Distribution license (which just allows arXiv to distribute the paper) instead of the more generous Creative Commons license — which would likely conflict with the copyright claims of the publisher of the refereed paper.

Your personal home page is a good place if you want to offer an overview of your own research. Home page URLs may not be very permanent though, so as an approach to self archiving it is not suitable. You can use it in addition to archiving in repositories, but not as a replacement.

Can I use PeerJ Preprints for Self-Archiving?

Probably not — and it’s also not what PeerJ Preprints are intended for.

PeerJ Preprints is a commercial eprint server requiring a Creative Commons license. It is intended to share drafts that have not yet been peer reviewed for formal publication.

It offers good visibility (a preprint on goto statements attracted 15,000 views), and a smooth user interface for posting comments and receiving feedback. Articles can not be removed once uploaded.

The PeerJ Preprint service is compatible with other golden open access publishers (such as PeerJ itself or Usenix).

The PeerJ Preprint service, however, is incompatible with most other publishers (such as ACM, IEEE, or Springer) because (1) the service is commercial; (2) the service requires Creative Commons as license; (3) preprints once posted cannot be removed.

So, if you want to abide with the rules, uploading a pre-print to PeerJ Preprints severely limits your subsequent publication options.

Can I use ResearchGate or Academia.edu for Self-Archiving?

No — unless you only work with liberal publishers with permissive licenses such as Creative Commons.

ResearchGate and Academia.edu are researcher social networks that also offer self-archiving features. As they are commercial repositories, most publishers will not allow sharing your paper on these networks.

The ResearchGate copyright pages provide useful information on this.

The Academia.edu copyright pages state the following:

Many journals will also allow an author to retain rights to all pre-publication drafts of his or her published work, which permits the author to post a pre-publication version of the work on Academia.edu. According to Sherpa, which tracks journal publishers’ approach to copyright, 90% of journals allow uploading of either the pre-print or the post-print of your paper.

This seems misleading to me: Most publishers explicitly dis-allow posting preprints to commercial repositories such as Academia.edu.

In both cases, the safer route is to use permitted places such as your home page or institutional repository for self-archiving, and only share links to your papers with ResearchGate or Academia.edu.

Which version(s) should I self-archive?

It depends.

Publishing a pre-print as soon as it is ready has several advantages:

  • You can receive rapid feedback on a version that is available early.

  • You can extend your pre-print with an appendix, containing material (e.g., experimental data) that does not fit in a paper that you’d submit to a journal

  • It allows you to claim ownership of certain ideas before your competition.

  • You offer most value to society since you allow anyone to benefit as early as possible from your hard work

Nevertheless, publishing a post-print only can also make sense:

  • You may want to keep some results or data secret from your competition until your paper is actually accepted for publication.

  • You may want to avoid confusion between different versions (pre-print versus post-print).

  • You may be scared to leave a trail of rejected versions submitted to different venues.

  • You may want to submit your pre-print to a venue adopting double blind reviewing, requiring you to remain anonymous as author. Publishing your pre-print during the reviewing phase would make it easy for reviewers to find your paper and connect your name to it.

For these reasons, and primarily to avoid confusion, I typically share just the post-print: The camera-ready version that I create and submit to the publisher is also the version that I self-archive as post-print.

What does Gold Open Access add to Green Open Access?

For open access, gold is better than green since:

  • it removes the burden of making articles publicly available from the researcher to the publisher.
  • it places a paper in a venue that is entirely open access. Thus, also other papers improving upon, or referring to your paper (published in the same journal) will be open access too.
  • gold typically implies green, i.e., the license of the journal is similar to Creative Commons, allowing anyone, including the authors, to share a copy under the condition of proper attribution.

Will Green Open Access hurt commercial publishers?

Maybe. But most academic publishers already allow green open access, and they are doing just fine. So I would not worry about it.

What is the greenest publisher in computer science?

The greenest publisher should be the one imposing the least restrictions on self-archiving.

From that perspective, publishers who want to be the greenest should in fact want to be gold, making their papers available under a permissive Creative Commons license. An example is Usenix.

Among the non-golden publishers, the greenest are probably the non-commercial ones, such as IEEE and ACM: They require simple conditions that are usually easy to meet.

The ACM, “the world’s largest educational and scientific computing society”, claims to be among the “greenest” publishers. Based on their tolerant attitude towards self-archiving of post-prints this may be somewhat justified. Furthermore, their Authorizer mechanism permits setting up free access to the publisher’s version.

But greenest is gold. So I look forward to the day the ACM follows its little sister Usenix in a full embrace of golden open access.

Should I use ACM Authorizer for Self-Archiving?

The ACM offers the Authorizer mechanism to provide free access to the Publisher’s Version of a paper, which only works from one user-specified URL. For example, I can use it to create a dedicated link from my institutional paper page to the publisher’s version.

However, Authorizer links cannot be accessed from other pages, and there is no point in emailing or tweeting them. Since only one authorizer link can exist per paper, I cannot use an authorizer link for both my institutional repository, and for the repository of my funding agency.

These restrictions on Authorizer links make them unsuitable as a replacement for self-archiving (let alone as a replacement for golden open access).

As a conference organizer, can I mandate Green Open Access?

Green open access is self-archiving, giving the authors the permission to archive their own papers.

As a conference organizer working with a non open access (ACM, IEEE, Springer-Verlag) publisher, you are not allowed to archive and distribute all the papers of the conference yourself.

OOSPLA program with DOI and preprint link

What several conferences do instead, though, is collecting links to pre- or post-prints. For example, the on line program of the recent OOPSLA 2016 conference has links to both the publisher’s version (through a DOI) and to an author-provided post-print.

For OOPSLA, 20 out of the 52 (38%) of the authors provided such a link to their paper, a number that is similar in other conferences adopting such preprint linking.

As a conference organizer, you can do your best to encourage authors to submit their pre-print links. Or you can use your influence in the steering committee to push the conference to switch to an open access publisher, such as LIPIcs or Usenix.

As an author, you can help by actually offering a link to your pre-print.

What does Green Open Access cost?

For authors, green open access typically costs no money. University repositories, arXiv, and PeerJ Preprints are all free to use.

It does cost (a bit of) effort though:

  • You need to find out the specific conditions under which the publisher of your current paper permits self-archiving.
  • You need to actually upload your paper to some repository, provide the correct meta-data, and meet the publisher’s constraints.

The fact that open access is free for authors does not mean that there are no costs involved. For example, the money to keep arXiv up and running comes from a series of sponsors, including TU Delft.

Should I adopt Green Open Access?

Yes.

Better availability of your papers will help you in several ways:

  • Impact in Research: Other researchers can access your papers more easily, increasing the chances that they will build upon your results in their work;
  • Impact in Practice: Practitioners may be interested in using your results: A pay-wall is an extra and undesirable impediment for such adoption;
  • Improved Results: Increased usage of your results in either industry or academia will put your results to the real test, and will help you improve your results.

Besides that, (green) open access is a way of delivering to the tax payers what they paid for: Your research results.

Where can I learn more about Green Open Access?

Useful resources include:


Version history:

  • 6 November 2016: Version 0.1, Initial version, call for feedback.
  • 14 November 2016: Version 0.2, update on commercial repositories.
  • 18 November 2016: Version 0.3, update on ACM Authorizer.
  • 20 November 2016: Version 0.4, added TOC, update on commercial repositories.
  • 06 December 2016: Version 0.5, updated information on ACM and IEEE.
  • 20 December 2016: Version 0.6, added info on Creative Commons and AI venues.
  • 27 July 2018: Version 0.7, update on where to archive. Released as CC BY-SA 4.0

Acknowledgments: I thank Moritz Beller (TU Delft) and Dirk Beyer (LMU Munich) for valuable feedback and corrections.

© Arie van Deursen, November 2016. Licensed under CC BY-SA 4.0.