Golden Open Access for the ACM: Who Should Pay?

Posted in Research by Arie van Deursen

In a move that I greatly support, the ACM Special Interest Group on Programming Languages (SIGPLAN) is exploring various ways to adopt a truly Golden Open Access model, and is rolling out a survey, set up by Michael Hicks, asking for your opinion. Even though I myself am most active in ACM’s Special Interest Group on Software Engineering (SIGSOFT), I do publish at and attend SIGPLAN conferences such as OOPSLA. And I sincerely hope that SIGSOFT will follow SIGPLAN’s leadership on this important issue.

ACM presently supports green open access (self-archiving) and a concept called “Open TOC” in which papers are accessible via a dedicated “Table of Contents” page for a particular conference. While better than nothing, I agree with OOPSLA 2017 program chair Jonathan Aldrich who explains in his blog post that Golden Open Access is much preferred.

This does, however, raise the question of who should pay for making publications open access, which is part of the SIGPLAN survey:

Attendees Pay: Increase the conference fees: SIGPLAN estimates that this would amount to an increase of around $50 per attendee.
Authors Pay: Introduce Article Processing Charges: SIGPLAN indicates that if a full conference goes open access this would presently amount to $400 per paper.

Note that the math here suggests that the number of registrants is around 8 times the number of papers in the main research track. Also note that it assumes that only papers in the main research track are made open access. A conference like ICSE, however, has many workshops with many papers: It is equally important that these become open access too, which would change the math considerably.
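The factor of 8 follows from quick arithmetic. Here is a minimal sketch, assuming that both options are meant to raise the same total amount and using only the two SIGPLAN estimates quoted above (the function name is mine, for illustration only):

```python
def registrants_per_paper(charge_per_paper: float,
                          fee_increase_per_attendee: float) -> float:
    """Number of paying registrants needed per paper so that a fee
    increase raises the same amount as a per-paper processing charge."""
    return charge_per_paper / fee_increase_per_attendee


# SIGPLAN's estimates as quoted above: $400 per paper, $50 per attendee.
ratio = registrants_per_paper(400, 50)
print(ratio)  # 8.0
```

If workshop papers were included as well, the number of papers would grow while the number of registrants would not, so the required fee increase per attendee would go up accordingly.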

The article processing charges of $400 are presented as a given: They may seem in line with what commercial publishers charge, but they are certainly very high compared to what, e.g., LIPIcs charges for ECOOP (which is less than $100). These costs of $400 come from ACM’s desire (need) to continue to make a substantial profit from their publishing activities, and should go down.

In his blog post, Jonathan Aldrich argues for the “author pays” model. His reasoning is that this can be viewed as a “funder pays” model: Most authors are funded by research grants, and usually in those grants funds can be found to cater for the costs involved in publishing open access.

On this point (and this point alone) I disagree with Jonathan. To me it feels fundamentally wrong to punish authors by making them pay $400 more for their registration. If anything, they should get a reduction for delivering the content of the conference.

I see Jonathan’s point that some funding agencies are willing to cover open access costs (e.g. NSF, NWO, H2020), and that it is worthwhile to explore how to tap into that money. But this requires data on what percentage of papers could be labeled as “funded”. For my department, I foresee several cases where it would be the department who’d have to pay for this instead of an external agency.

I do sympathize with Jonathan’s appeal to reduce conference registration costs, which can be very high. But the cost of making publications open access should be borne by the full community (all attendees), not just by those who happen to publish a paper.

Shining examples of open access computer science conferences are the Usenix, AAAI, and NIPS events. Full golden open access of all content, and no extra charges for authors — these conferences are years ahead of the ACM.

Do you have an opinion on “author pays” versus “participant pays”? Fill in the survey!

Thank you SIGPLAN for initiating this discussion!

7Dec2016

Self-Archiving Publications in Elsevier Pure

Posted in Research by Arie van Deursen

In 2016, TU Delft adopted Elsevier Pure as its database to keep track of all publications from its employees.

At the same time, TU Delft has adopted a mandated green open access policy. This means that for papers published after May 2016, an author-prepared version (pdf) must be uploaded into Pure.

I am very happy with this commitment to green open access (and TU Delft is not alone). This decision also means, however, that we as researchers need to do some extra work, to make our author-prepared versions available.

To make it easier for you to upload your papers and comply with the green open access policy, here are some suggestions based on my experience so far working with Pure.

I can’t say I’m a big fan of Elsevier Pure. In the interest of open access, however, I’m doing my best to tolerate the quirks of Pure, in order to help the TU Delft to share all its research papers freely and persistently with everyone in the world.

Elsevier Pure is used at hundreds of different universities. If you work at one of them, this post may help you in using Pure to make your research available as open access.

The Outcome

Anyone can browse publications in Pure, available at https://pure.tudelft.nl.

All pages have persistent URLs, making it easy to refer to a list of all your publications (such as my list), or individual papers (such as my recent one on crash reproduction). For all recent papers I have added a pdf of the version that we as authors prepared ourselves (aka the postprint), as well as a DOI link to the publisher version (often behind a paywall).

Thus, you can use Pure to offer, for each publication, your self-archived (green open access) version as well as the final publisher version.

Moreover, these publications can be aggregated to the section, department, and faculty level, for management reporting purposes.

In this way, Pure data shows taxpayers how their money is spent on academic research, and gives them free access to the outcomes. Taxpayers deserve that we invest some time in populating Pure with accurate data.

Accessing Pure

To enter publications into Pure, you’ll need to log in. On https://pure.tudelft.nl, in the footer at the right, you’ll find “Log into Pure”. Use your TU Delft netid.

If you’re interested in web applications, you will quickly recognize that Pure is a fairly old system, with user interface choices that would not be made these days.

Entering Meta-Data

You can start entering a publication by hitting the big green button “Add new” at the top right of the page. It will open a brand new browser window for you.

In the new window, click “Research Output”, which will turn blue and expand into three items.

Then there are several ways to enter a publication, including:

Import via Elsevier Scopus, found via “Import from Online Source”. This is by far the easiest, if (1) your publication venue is indexed by Scopus, (2) it is already visible at Scopus (which typically takes a few months), and if (3) you can find it on Scopus. To help Scopus, I have set up an ORCID author identifier and connected it to my Scopus author profile.
Import via Bibtex, found via “Import from file”. If you click it, importing from bibtex is one of the options. You can obtain bibtex entries from DBLP, Google Scholar, ACM, your departmental publications server, or write them by hand in your favorite editor, and then copy paste them into Pure.
Entering details via a series of buttons and forms (“Create from template”). I recommend not using this option. If you go against this advice and want to enter a conference paper, make sure you do not pick the template “Paper/contribution to conference”; pick “Conference Contribution/Chapter in Conference Proceedings” instead. Don’t ask me why.

In all cases, yet another browser window is opened, in which you can inspect, correct, and save the bibliographic data. After saving, you’ll have a new entry with a unique URL that you can use for sharing your publication. The URL will stay the same after you make additional updates.

Entering your Author-Prepared version

With each publication, you can add various “electronic versions”.

Each can be a file (pdf), a link to a version, or a DOI. For pdfs you want to upload, make sure you check that they meet the conditions under which your publisher allows self-archiving.

Pure distinguishes various version types, which you can enter via the “Document version” pull down menu. Here you need to include at least the following two versions:

The “accepted author manuscript”. This is also called a postprint, and is the version that (1) is fully prepared by you as authors; and that (2) includes all improvements you made after receiving the reviews. Here you can typically upload the pdf as you prepared it yourself.
The “final published version”. This is the Publisher’s version. It is likely that the final version is copyrighted by the publisher. Therefore, you typically include a link (DOI) to the final version, and do not upload a pdf to Pure. If you import from Scopus, this field is automatically set.

Furthermore, Pure permits setting the “access to electronic version”, and defining the “public access”. Relevant items include:

Open, meaning (green) open access. This is what I typically select for the “accepted author manuscript”.
Restricted, meaning behind a paywall. This is what I typically select for the final published version.
Embargoed, meaning that the pdf cannot be made public until a set date. Can be used for commercial publishers who insist on restricting access to post-prints from institutional repositories in the first 1-2 years.

The vast majority (80%) of academic publishers permit authors to archive their accepted manuscripts in institutional repositories such as Pure. However, publishers typically permit this under specific conditions, which may differ per publisher. You can check out my Green Open Access FAQ if you want to learn more about these conditions, and how to find them for your (computer science) publisher.

Once uploaded, your pdf is available for download for everyone. Pure adds a cover page with meta-data such as the citation (how it is published) and the DOI to the final version. This cover page is useful, as it helps to meet the intent of the conditions most publishers require on green open access publishing.

Google Scholar indexes Pure, so after a while your paper should also appear on your Scholar page.

A Paper’s Life Cycle

Making papers available early is one of the benefits of self-archiving. This can be done in Pure by setting the paper’s “Publication Status”. This field can have the following values:

“In preparation”: Literally a pre-print. Your paper can be considered a draft and may still change.
“Submitted”: You submitted your paper to a journal or conference where it is now under review.
“Accepted/In press”: Yes, paper accepted! This also means that you as an author can share your “accepted author manuscript”.
“E-Pub ahead of print”: I don’t see how this differs from the Accepted state.
“Published”: The paper is final and has been officially published.

In my Green Open Access FAQ I provide an answer to the question Which Version Should I Self-Archive.

I typically enter publications once accepted, and share the Pure link with the accepted author manuscript as pre-print link on Twitter or on conference sites (e.g. ICSE 2018).

In particular, I do the following once my paper is accepted:

I create a bibtex entry for an @inproceedings (conference, workshop) or @article (journal) publication.
I upload the bibtex entry into Pure.
I add my own pdf with the author-prepared version to the resulting Pure entry.
I set the Publication Status to “Accepted”.
I set the Entry Status (bottom of the page) to “in progress”.
I save the entry (bottom of the page).
I share the resulting Pure link on Twitter with the rest of the world so that they can read my paper.
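The first step above, creating the bibtex entry, can also be scripted. The following is a minimal sketch rather than part of my actual workflow: the function name, the citation key, and all field values are hypothetical placeholders, and I have not checked which fields Pure’s importer requires.

```python
def make_bibtex_entry(entry_type: str, key: str, fields: dict) -> str:
    """Format a BibTeX entry such as @inproceedings or @article."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Each field is written as: name = {value},
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder data for illustration only; this is not a real publication.
entry = make_bibtex_entry(
    "inproceedings",
    "example2018",
    {
        "author": "A. Author and B. Coauthor",
        "title": "An Example Paper Title",
        "booktitle": "Proceedings of an Example Conference",
        "year": "2018",
    },
)
print(entry)
```

The resulting text can be pasted into the “Import from file” dialog described earlier. Page numbers, volume, and DOI are left out on purpose, since they are usually only known once the publisher version is out.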

Once the publisher actually manages to publish this paper as well (this may be several months later!), I update my Pure entry:

I add the DOI link to the final published version.
I provide the missing bibliographic meta-data (page numbers, volume, number, …).
I set the Publication Status to “Published”.
I set the Entry Status to “for approval” (by the library who can then change it into an immutable “approved” if they think this is a valid entry).

The preprint links I shared still contain a pointer to the self-archived pdf, but now also to the official version at the publisher for those who have access through the paywall.

Permalinks

The Pure page for your paper, including all meta-information and all versions of that paper (example), is in principle stable, and its URL provides a permanent link (unless you delete it).

You can also directly link to the individual pdfs you upload (example). However, these are more volatile: If you upload a newer version the old link will be dead. Moreover, in some cases the (TU Delft) library has moved pdfs around thereby destroying old pdf links.

Therefore, I recommend using links to the full record rather than to individual pdfs when sharing Pure links.

Self-Archiving Elsevier Papers

Elsevier does not like it if you self-archive papers published in Elsevier journals into Elsevier Pure. The official rules are that Elsevier journal papers are subject to an embargo, yet at the same time can be published with a CC-BY-NC-ND license on arXiv.

Combining these two leads to the following steps, assuming you have a pre-print (never reviewed), and a post-print (the author-prepared accepted version after review).

1. Upload your pre-print onto arXiv.
2. Add a footnote to your post-print stating: This manuscript version is made available under the CC-BY-NC-ND 4.0 license.
3. Update your arXiv pre-print with your CC-BY-NC-ND licensed post-print, and add publication details (journal name, volume, issue) to your arXiv entry.
4. Create a Pure entry for your journal paper.
5. Upload the post-print as author-accepted version to your Pure entry, make it available immediately, and set the license to CC-BY-NC-ND.

Note that the Elsevier rules explicitly allow steps 1-3, and in fact insist on the CC-BY-NC-ND license. Elsevier does not suggest you take step 5, but as a consequence of the CC-BY-NC-ND license you are permitted to do so.

What Elsevier would want you to do instead of step 5 is add the postprint to Pure under a (2 year) embargo, thus delaying (green) open access availability by 2 years. Elsevier Pure even supports this embargo option as one of the “access” options, in which you could enter the end-date of such an embargo.

Note: Yes, these steps are annoying. But: at the time of writing (2019), universities in Germany, Sweden, and California have no access to recent papers published by Elsevier. If you want your paper to be read in any of these places, make sure to upload it into your university repository. If you don’t want to go through these steps and you want your paper to be read, I recommend you pick a different publisher.

Complicated Author Names

Pure contains official employee names as registered by TU Delft.

Some authors publish under different (variants of their) names. For example, Dutch universities have trouble handling the complex naming habits of Portuguese and Brazilian employees.

If Pure is not able to map an author name to the corresponding employee, find the author name in the publication, click edit, and then click “Replace”. This allows searching the TU Delft employee database for the correct person.

If Pure has found the correct employee, but the name displayed is very different from what is listed on the publication itself, you can edit the author for that publication, and enter a different first and last name for this publication.

Exporting Linked Bibtex (to Orcid)

If you’re logged in, you can download your publication list in various formats, including BibTex (you’ll find the button for this at the bottom of the page).

I prefer bibtex entries that have a url back to the place where all info is. Therefore, I wrote a little Python script to scrape a Pure web page (mine, yours, or anyone’s), that adds such information.

I use the bibtex entries produced by this script to populate my Orcid profile as well as our Departmental Publication Server with publications from Pure that link back to their corresponding Pure page.
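To give an idea of the last step, here is a minimal sketch of attaching a url field to an existing bibtex entry. This is not the script mentioned above: it leaves out the scraping entirely, since that depends on the structure of the Pure pages, and the function name and the example URL are hypothetical.

```python
def add_url_field(bibtex_entry: str, url: str) -> str:
    """Return the entry with a url field inserted before the closing brace."""
    body = bibtex_entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("expected a BibTeX entry ending in '}'")
    # Drop the final closing brace, then any trailing whitespace and comma,
    # so that exactly one comma separates the last field from the url field.
    fields = body[:-1].rstrip().rstrip(",")
    return f"{fields},\n  url = {{{url}}}\n}}"


# Placeholder entry and URL for illustration only.
original = "@article{example2019,\n  title = {An Example Title},\n}"
linked = add_url_field(original, "https://example.org/pure/record")
print(linked)
```

The sketch assumes a single, well-formed entry per call; a file with many entries would need to be split first.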

Version history

20 November 2016: Version 0.1, for internal purposes.
07 December 2016: Version 0.2, first public version.
14 December 2016: Version 0.3, minor improvements.
13 January 2017: Version 0.4, updated Google Scholar information.
16 March 2017: Version 0.5, updated approval states based on correction from Hans Meijerrathken.
17 March 2017: Version 0.6, life cycle and exporting added.
24 November 2017: Version 0.7, simplified life cycle and approval states.
03 March 2018: Version 0.8, added info on populating Orcid from Pure.
27 July 2018: Version 0.9, added info on permalinks, licensed as CC BY-SA 4.0
08 March 2019: Version 1.0, added info on publishing Elsevier papers.

Acknowledgments: Thanks to Moritz Beller for providing feedback and trying out Pure.

6Nov2016

Green Open Access FAQ

Posted in Research by Arie van Deursen

Image credit: Flickr, user static_view

(Opinionated) answers to frequently asked questions on (green) open access, from a computer science (software engineering) research perspective.

Disclaimer: IANAL, so if you want to know things for sure you’ll have to study the references provided. Use at your own risk.

Green open access is trickier than I thought, so I might have made mistakes. Corrections are welcome, just as additional questions for this FAQ. Thanks!

Green Open Access Questions

What is Green Open Access?
What is a pre-print?
What is a post-print?
What is a publisher’s version?
Do publishers allow Green Open Access?
Under what conditions is Green Open Access permitted?
What is Yellow Open Access?
What is Gold Open Access?
What is Hybrid Open Access?
What are the Self-Archiving policies of common computer science venues?
Is Green Open Access compulsory?
Should I share my pre-print under a Creative Commons license?
Can I use Green Open Access to comply with Plan S?
What is a good place for self-archiving?
Can I use PeerJ Preprints for Self-Archiving?
Can I use ResearchGate or Academia.edu for Self-Archiving?
Which version(s) should I self-archive?
What does Gold Open Access add to Green Open Access?
Will Green Open Access hurt commercial publishers?
What is the greenest publisher in computer science?
Should I use ACM Authorizer for Self-Archiving?
As a conference organizer, can I mandate Green Open Access?
What does Green Open Access cost?
Should I adopt Green Open Access?
Where can I learn more about Green Open Access?

What is Green Open Access?

In Green Open Access you as an author archive a version of your paper yourself, and make it publicly available. This can be at your personal home page, at the institutional repository of your employer (such as the one from TU Delft), or at an e-print server such as arXiv.

The word “archive” indicates that the paper will remain available forever.

What is a pre-print?

A pre-print is a version of a paper that is entirely prepared by the authors.

Since no publisher has been involved in any way in the preparation of such a pre-print, it feels right that the authors can deposit such pre-prints wherever they want to. Before submission, the authors, or their employers such as universities, hold the copyright to the paper, and hence can publish the paper in online repositories.

Following the definition of SHERPA’s RoMEO project, pre-prints refer to the version before peer-review organized by a publisher.

What is a post-print?

Following the RoMEO definitions, a post-print is a final draft as prepared by the authors themselves after reviewing. Thus, feedback from the reviewers has typically been included.

Here a publisher may have had some light involvement, for example by selecting the reviewers, making a reviewing system available, or by offering a formatting template / style sheet. The post-print, however, is author-prepared, so copy-editing and final markup by the publisher has not been done.

A (Plan S) synonym for postprint is “Author-Accepted Manuscript”, sometimes abbreviated as AAM.

What is a publisher’s version?

While pre- and post-prints are author-prepared, the final publisher’s version is created by the publisher.

The publisher’s involvement may vary from very little (camera ready version entirely created by authors) up to substantial (proof reading, new markup, copy editing, etc.).

Publishers typically make their versions available after a transfer of copyright, from the authors to the publisher. And with the copyright owned by the publisher, it is the publisher who determines not only where the publisher’s version can be made available, but also where the original author-prepared pre- or post-prints can be made available.

A (Plan S) synonym is “Version of Record”, sometimes abbreviated as VoR.

Do publishers allow Green Open Access?

Self-archiving of non-published material that you own the copyright to is always allowed.

Whether self-archiving of a paper that has been accepted by a publisher for publication is allowed depends on that publisher. You have transferred your copyright, so it is up to the publisher to decide who else can publish it as well.

Different publishers have different policies, and these policies may in turn differ per journal. Furthermore, the policies may vary over time.

The SHERPA project does a great job in keeping track of the open access status of many journals. You’ll need to check the status of your journal, and if it is green you can self-archive your paper (usually under certain publisher-specific conditions).

In the RoMEO definition, green open access means that authors can self-archive both pre-prints and post-prints.

Under what conditions is Green Open Access permitted?

Since the publisher holds copyright on your published paper, it can (and usually does) impose constraints on the self-archived versions. You should always check the specific constraints for your journal or publisher, for example via the RoMEO journal list.

The following conditions are fairly common:

1. You generally can self-archive pre- and post-prints only, but not the publisher version.
2. In the meta-data of the self-archived version you need to add a reference to the final version (for example through its DOI).
3. In the meta-data of the self-archived version you need to include a statement of the current ownership of the copyright, sometimes through specific sentences that must be copy-pasted.
4. The repository in which you self-archive should be non-commercial. Thus, arXiv and institutional repositories are usually permitted, but commercial ones like PeerJ Preprints, Academia.edu or ResearchGate are not.
5. Some commercial publishers impose an embargo on post-prints. For example Elsevier permits sharing the post-print version on an institutional repository only after 12-24 months (depending on the journal).

Usually meeting the demands of a single publisher is relatively easy to do. Given points 2 and 3, it typically involves creating a dedicated pdf with a footnote on the first page with the required extra information.

However, every publisher has its own rules. If you publish your papers in a range of different venues (which is what good researchers do), you’ll have to know many different rules if you want to do green open access in the correct way.

What is Yellow Open Access?

Some publishers (such as Wiley) allow self-archiving of pre-prints only, and not of post-prints. This is referred to as yellow open access in RoMEO. Yellow is more restrictive than green.

As an author, I find yellow open access frustrating, as it forbids me to make the version of my paper that was improved thanks to the reviewers available via open access.

As a reviewer, I feel yellow open access wastes my effort: I tried to help authors by giving useful feedback, and the publisher forbids my improvements to be reflected in the open access version.

What is Gold Open Access?

Gold Open Access refers to journals (or conference proceedings) that are completely accessible to the public without requiring paid subscriptions.

Often, gold implies green, for example when a publisher such as PeerJ, PLOS ONE or LIPIcs adopts a Creative Commons license — which allows anyone, including the authors, to share a copy under the condition of proper attribution.

The funding model for open access is usually not based on subscriptions, but on Article Processing Charges, i.e., a payment by the authors for each article they publish (varying from $70 (LIPIcs) to $1500 (PLOS ONE) per paper).

What is Hybrid Open Access?

Hybrid open access refers to a restricted (subscription-funded) journal that permits authors to pay extra to make their own paper available as open access.

This practice is also referred to as double dipping: The publisher collects revenue from both subscriptions and article processing charges.

University libraries and funding agencies do not like hybrid access, since they feel they have to pay twice, both for the authors and the readers.

Green open access is better than hybrid open access, simply because it achieves the same (an article is available) yet at lower costs.

What are the Self-Archiving policies of common computer science venues?

For your and my convenience, here is the green status of some publishers that are common in software engineering (check links for most up to date information):

ACM: Green, e.g., TOSEM, see also the ACM author rights. For ACM conferences, often the author-prepared camera-ready version includes a DOI already, making it easy to adhere to ACM’s meta-data requirements. Note that some ACM conferences are gold open access, for example the ones published in the Proceedings of the ACM on Programming Languages.
IEEE: Green, e.g., TSE. The IEEE has a policy that the IEEE makes a version available that meets all IEEE meta-data requirements, and that authors can use for self-archiving. See also their self-archiving FAQ.
Springer: Green, e.g., EMSE, SoSyM, LNCS. Pre-print on arXiv; post-print on personal page immediately, and in a repository in some cases immediately and in others after a 12-month embargo period.
Elsevier: Mostly green, e.g., JSS, IST. Pre-print allowed; post-print with CC BY-NC-ND license on personal page immediately and in institutional repository after 12-48 month embargo period. To circumvent the embargo you can publish the pre-print on arxiv, update it with the post-print (which is permitted), and update the license to CC BY-NC-ND as required by Elsevier, after which anyone (including you) can share the postprint on any non-commercial platform.
Wiley: Mostly yellow, i.e., only pre-prints can be immediately shared, and post-prints (even on personal pages) only after 12 month embargo. E.g. JSEP.

Luckily, there are also some golden open access publishers (which typically permit self-archiving as well should you still want that):

PeerJ Computer Science: Gold (creative commons) and green.
Usenix: Gold since 2008. Published with PeerJ. Authors retain their copyright.
LIPIcs-based proceedings: Conferences publishing their papers via Dagstuhl’s Leibniz International Proceedings in Informatics LIPIcs, such as ECOOP, FSCD, …
IEEE Access: The ‘mega-journal’ from IEEE covering all IEEE’s fields of interest.
PLOS ONE: The successful (nonprofit) mega-journal that also publishes computer science papers.
Many venues in Artificial Intelligence, including AAAI, the Journal of Machine Learning Research, Computational Linguistics, the Semantic Web Journal, or the Annual Conference on Neural Information Processing Systems (NIPS).
Specialized conferences or journals such as the Journal of Object Technology or Computational Linguistics.

Is Green Open Access compulsory?

Funding agencies (NWO, EU, Bill and Melinda Gates Foundation, …) as well as universities (TU Delft, University of California, UCL, ETH Zurich, Imperial College, …) are increasingly demanding that all publications resulting from their projects or employees are available in open access.

My own university TU Delft insists, like many others, on green open access:

As of 1 May 2016 the so-called Green Road to Open Access publishing is mandatory for all (co)authors at TU Delft. The (co)author must publish the final accepted author’s version of a peer-reviewed article with the required metadata in the TU Delft Institutional Repository.

This makes sense: The TU Delft wants to have copies of all the papers that its employees produce, and make sure that the TU Delft stakeholders, i.e. the Dutch citizens, can access all results. Note that TU Delft insists on post-prints that include reviewer-induced modifications.

The Dutch national science foundation NWO has a preference for gold open access, but accepts green open access if that’s impossible (“Encourage Gold, require immediate Green”).

Should I share my pre-print under a Creative Commons license?

You should only do this if you are certain that the publisher’s conditions on self-archiving pre-prints are compatible with a Creative Commons license. If that is the case, you probably are dealing with a golden open access publisher anyway.

Creative Commons licenses are very liberal, allowing anyone to re-distribute (copy) the licensed work (under certain conditions, including proper attribution).

This effectively nullifies (some of) the rights that come with copyright. For that reason, publishers that insist on owning the full copyright to the papers they publish typically disallow self-archiving earlier versions with such a license.

For example, ACM Computing Surveys insists on a set statement indicating

… © ACM, YYYY. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution…

This “not for redistribution” is incompatible with Creative Commons, which is all about sharing.

Furthermore, a Creative Commons license is irrevocable. So once you picked it for your pre-print, you effectively made a choice for golden open access publishers only (some people might consider this desirable, but it seriously limits your options).

Therefore, my suggestion would be to keep the copyright yourself for as long as you can, giving you the freedom to switch to Creative Commons once you know who your publisher is.

Can I use Green Open Access to comply with Plan S?

Yes, you can, but you are only compliant with Plan S if you share your postprint, with a Creative Commons License, immediately (no embargo).

But, unfortunately, the creative commons license is likely incompatible with the constraints of your publisher of the eventual paper. As a way around, in some (most) cases (e.g., ACM, IEEE journals, Springer) you are allowed to distribute your postprint with a CC BY license if you actually pay the hybrid open access fee. These fees are not refundable under Plan S, but this hybrid-and-then-self-archive route is compliant with Plan S.

What is a good place for self-archiving?

It depends on your needs.

Your employer may require that you use your institutional repository (such as the TU Delft Repository). This helps your employer to keep track of how many of its publications are available as open access. The higher this number, the stronger the position of your employer when negotiating open access deals with publishers. Institutional archiving still allows you to post a version elsewhere as well.

Subject repositories such as arXiv offer good visibility to your peers. In fields like physics using arXiv is very common, whereas in Computer Science this is less so. A good thing about arXiv is that it permits versioning, making it possible to submit a pre-print first, which can then later be extended with the post-print. You can use several licenses. If you intend publishing your paper, however, you should adopt arXiv’s Non-Exclusive Distribution license (which just allows arXiv to distribute the paper) instead of the more generous Creative Commons license — which would likely conflict with the copyright claims of the publisher of the refereed paper.

Your personal home page is a good place if you want to offer an overview of your own research. Home page URLs may not be very permanent though, so as an approach to self archiving it is not suitable. You can use it in addition to archiving in repositories, but not as a replacement.

Can I use PeerJ Preprints for Self-Archiving?

Probably not — and it’s also not what PeerJ Preprints are intended for.

PeerJ Preprints is a commercial eprint server requiring a Creative Commons license. It is intended to share drafts that have not yet been peer reviewed for formal publication.

It offers good visibility (a preprint on goto statements attracted 15,000 views), and a smooth user interface for posting comments and receiving feedback. Articles cannot be removed once uploaded.

The PeerJ Preprint service is compatible with other golden open access publishers (such as PeerJ itself or Usenix).

The PeerJ Preprint service, however, is incompatible with most other publishers (such as ACM, IEEE, or Springer) because (1) the service is commercial; (2) the service requires Creative Commons as license; (3) preprints once posted cannot be removed.

So, if you want to abide by the rules, uploading a pre-print to PeerJ Preprints severely limits your subsequent publication options.

Can I use ResearchGate or Academia.edu for Self-Archiving?

No — unless you only work with liberal publishers with permissive licenses such as Creative Commons.

ResearchGate and Academia.edu are researcher social networks that also offer self-archiving features. As they are commercial repositories, most publishers will not allow sharing your paper on these networks.

The ResearchGate copyright pages provide useful information on this.

The Academia.edu copyright pages state the following:

Many journals will also allow an author to retain rights to all pre-publication drafts of his or her published work, which permits the author to post a pre-publication version of the work on Academia.edu. According to Sherpa, which tracks journal publishers’ approach to copyright, 90% of journals allow uploading of either the pre-print or the post-print of your paper.

This seems misleading to me: Most publishers explicitly disallow posting preprints to commercial repositories such as Academia.edu.

In both cases, the safer route is to use permitted places such as your home page or institutional repository for self-archiving, and only share links to your papers with ResearchGate or Academia.edu.

Which version(s) should I self-archive?

It depends.

Publishing a pre-print as soon as it is ready has several advantages:

You can receive rapid feedback on a version that is available early.
You can extend your pre-print with an appendix, containing material (e.g., experimental data) that does not fit in a paper that you’d submit to a journal.
It allows you to claim ownership of certain ideas before your competition.
You offer most value to society since you allow anyone to benefit as early as possible from your hard work.

Nevertheless, publishing a post-print only can also make sense:

You may want to keep some results or data secret from your competition until your paper is actually accepted for publication.
You may want to avoid confusion between different versions (pre-print versus post-print).
You may be scared to leave a trail of rejected versions submitted to different venues.
You may want to submit your pre-print to a venue adopting double blind reviewing, requiring you to remain anonymous as author. Publishing your pre-print during the reviewing phase would make it easy for reviewers to find your paper and connect your name to it.

For these reasons, and primarily to avoid confusion, I typically share just the post-print: The camera-ready version that I create and submit to the publisher is also the version that I self-archive as post-print.

What does Gold Open Access add to Green Open Access?

For open access, gold is better than green since:

it shifts the burden of making articles publicly available from the researcher to the publisher.
it places a paper in a venue that is entirely open access. Thus, other papers in the same journal that improve upon or refer to your paper will be open access too.
gold typically implies green, i.e., the license of the journal is similar to Creative Commons, allowing anyone, including the authors, to share a copy under the condition of proper attribution.

Will Green Open Access hurt commercial publishers?

Maybe. But most academic publishers already allow green open access, and they are doing just fine. So I would not worry about it.

What is the greenest publisher in computer science?

The greenest publisher should be the one imposing the fewest restrictions on self-archiving.

From that perspective, publishers who want to be the greenest should in fact want to be gold, making their papers available under a permissive Creative Commons license. An example is Usenix.

Among the non-golden publishers, the greenest are probably the non-commercial ones, such as IEEE and ACM: They require simple conditions that are usually easy to meet.

The ACM, “the world’s largest educational and scientific computing society”, claims to be among the “greenest” publishers. Based on their tolerant attitude towards self-archiving of post-prints this may be somewhat justified. Furthermore, their Authorizer mechanism permits setting up free access to the publisher’s version.

But greenest is gold. So I look forward to the day the ACM follows its little sister Usenix in a full embrace of golden open access.

Should I use ACM Authorizer for Self-Archiving?

The ACM offers the Authorizer mechanism to provide free access to the Publisher’s Version of a paper, which only works from one user-specified URL. For example, I can use it to create a dedicated link from my institutional paper page to the publisher’s version.

However, Authorizer links cannot be accessed from other pages, and there is no point in emailing or tweeting them. Since only one Authorizer link can exist per paper, I cannot use an Authorizer link for both my institutional repository and the repository of my funding agency.

These restrictions on Authorizer links make them unsuitable as a replacement for self-archiving (let alone as a replacement for golden open access).

As a conference organizer, can I mandate Green Open Access?

Green open access is self-archiving, giving the authors the permission to archive their own papers.

As a conference organizer working with a non open access (ACM, IEEE, Springer-Verlag) publisher, you are not allowed to archive and distribute all the papers of the conference yourself.

What several conferences do instead, though, is collect links to pre- or post-prints. For example, the online program of the recent OOPSLA 2016 conference has links to both the publisher’s version (through a DOI) and to an author-provided post-print.

For OOPSLA, the authors of 20 out of the 52 papers (38%) provided such a link to their paper, a number that is similar in other conferences adopting such preprint linking.

As a conference organizer, you can do your best to encourage authors to submit their pre-print links. Or you can use your influence in the steering committee to push the conference to switch to an open access publisher, such as LIPIcs or Usenix.

As an author, you can help by actually offering a link to your pre-print.

What does Green Open Access cost?

For authors, green open access typically costs no money. University repositories, arXiv, and PeerJ Preprints are all free to use.

It does cost (a bit of) effort though:

You need to find out the specific conditions under which the publisher of your current paper permits self-archiving.
You need to actually upload your paper to some repository, provide the correct meta-data, and meet the publisher’s constraints.

The fact that open access is free for authors does not mean that there are no costs involved. For example, the money to keep arXiv up and running comes from a series of sponsors, including TU Delft.

Should I adopt Green Open Access?

Yes.

Better availability of your papers will help you in several ways:

Impact in Research: Other researchers can access your papers more easily, increasing the chances that they will build upon your results in their work;
Impact in Practice: Practitioners may be interested in using your results: A pay-wall is an extra and undesirable impediment for such adoption;
Improved Results: Increased usage of your results in either industry or academia will put your results to the real test, and will help you improve your results.

Besides that, (green) open access is a way of delivering to the taxpayers what they paid for: Your research results.

Where can I learn more about Green Open Access?

Useful resources include:

SHERPA / RoMEO: Green Open Access conditions and restrictions for all journals and publishers.
The UCL Open Access FAQs.
The IEEE Self-Archiving FAQ.

Version history:

6 November 2016: Version 0.1, Initial version, call for feedback.
14 November 2016: Version 0.2, update on commercial repositories.
18 November 2016: Version 0.3, update on ACM Authorizer.
20 November 2016: Version 0.4, added TOC, update on commercial repositories.
06 December 2016: Version 0.5, updated information on ACM and IEEE.
20 December 2016: Version 0.6, added info on Creative Commons and AI venues.
27 July 2018: Version 0.7, update on where to archive. Released as CC BY-SA 4.0.
18 November 2018: Version 0.8, updated info on Elsevier.
10 September 2019: Version 0.9, added question on Plan S compliance.

Acknowledgments: I thank Moritz Beller (TU Delft) and Dirk Beyer (LMU Munich) for valuable feedback and corrections.

17Oct2016

PhD Student Vacancy in Test Amplification

Posted in Research by Arie van Deursen

Within the Software Engineering Research Group of Delft University of Technology, we are looking for an enthusiastic and strong PhD student in the area of “test amplification”.

The PhD project will be in the context of the new STAMP project funded by the H2020 programme of the European Union.

STAMP is a 3-year R&D project, which leverages advanced research in automatic test generation to push automation in DevOps one step further through innovative methods of test amplification. It will reuse existing assets (test cases, API descriptions, dependency models), in order to generate more test cases and test configurations each time the application is updated. This project has an ambitious agenda towards industry transfer. In this regard, the STAMP project gathers 3 research groups which have strong expertise in software testing and continuous development as well as 6 industry partners that develop innovative open source software products.

The STAMP project is led by Benoit Baudry from INRIA, France. The STAMP consortium consists of the following partners:

INRIA, France: DiverSE (Rennes) and SPIRALS (Lille)
ActiveEon, France
ATOS, Spain
Engineering, Italy
OW2, France
SINTEF ICT, Norway
TellU, Norway
TU Delft, The Netherlands
XWiki, France

The PhD student employed by Delft University of Technology will conduct research as part of the STAMP project together with the STAMP partners. Employment will be for a period of four years. The PhD student will enroll in the TU Delft Graduate School.

The primary line of research for the TU Delft PhD student will revolve around runtime test amplification. Online test amplification automatically extracts information from logs collected in production in order to generate new tests that can replicate failures, crashes, anomalies and outlier events. The research will be devoted to (i) defining monitoring techniques and log data analytics to collect run-time information; (ii) detecting interesting behaviors with respect to existing tests; (iii) creating new tests for testing the behaviors of interest, for example through state machine learning or genetic algorithms; (iv) adding new probes and new log messages into the production code to improve its testability.

Besides this primary line of research, the PhD student will be involved in lines of research led by the other STAMP partners, addressing unit test amplification and configurability test amplification. Furthermore, the PhD student will be involved in case studies and evaluations conducted in collaboration with the industrial partners in the consortium.

From the TU Delft Software Engineering group, several people will be involved, including Arie van Deursen (principal investigator), Andy Zaidman, and Mauricio Aniche. Furthermore, where possible, collaborations with existing projects will be set up, such as the 3TU Big Software on the Run and TestRoot projects.

Requirements for the PhD candidate include:

Being a team player;
Strong writing and presentation skills;
Being hungry for new knowledge in software engineering;
Ability to develop prototype research tools;
Interest in bringing program analysis, testing, and genetic algorithms together;
Eagerness to work with the STAMP partners on test amplification in their contexts;
Completed MSc degree in computer science.

For more information on this vacancy and the STAMP project, please contact Arie van Deursen.

To apply, please follow the instructions of the official opening at the TU Delft Vacancies pages. Your letter should include a clear motivation why you want to work on the STAMP project, and an explanation of what you can bring to the STAMP project. Also provide your CV, (pointers to) written material (e.g. a term paper, an MSc thesis, or published conference or journal papers), and if possible pointers to (open source) software projects you have contributed to.

The vacancy will be open until 2 February 2017, but applying early never hurts. We look forward to receiving your application!

24Jul2016

Asking Students to Create Exam Questions

Posted in Teaching by Arie van Deursen

Do you also find it hard to come up with good multiple choice questions? Then maybe you will like the idea of letting students propose (rather than just answer) questions. A colleague suggested this idea, arguing that it would benefit the students (creating a question requires mastering the material) and would save me work as well.

I liked this idea, and during the last three years I have applied it in my undergrad software testing course. This is a course for around 200 students who are evaluated based on an individual multiple choice exam (besides programming work conducted in pairs).

In class, I discuss example questions, and I invite students to come up with their own. The logistics are as follows:

An exam consists of 40 multiple choice questions.
Students can submit their questions until one week before the exam.
As a teacher I decide which (if any) of the questions I include, and whether I think changes to the questions are necessary.
If I include a student question, the student benefits from knowing the answer and from receiving a small bonus for submitting an included question.

To help the students in creating questions, I point them to Cem Kaner’s post on writing multiple choice test questions. I explain that for each question I need:

A clear stem of one or two sentences that is meaningful in itself;
One clear correct choice;
Three distractors that are approximately equally plausible yet also objectively incorrect.

So far, I have used this procedure for eight exams during the last three years.

The students who have proposed questions that I include consistently turn out to be among the best in the class. This probably means that only very good students go through the effort of creating a question; it also suggests that trying to come up with a question is a good way of preparing for an exam.

For each exam I receive 10-20 questions from around five students: This very much depends on the individual students and may vary per year. Some students recognize the opportunity and submit 20 questions, but most consider it too difficult and do not come up with any.

I typically include 3-5 student questions in the exam (so one in ten questions comes from a student). This essentially depends on the number of good student questions proposed — I don’t impose an upper limit on the number of questions students can submit nor on the total number of student questions that I’m willing to include.

It is only at the exam that the students find out which questions I ask, and whether any of their questions are included. So while there is the possibility that all students share and know in advance some of the questions that might be asked, the students still need to prepare to answer other questions.

My class wondered whether I would be willing to let all 40 questions be provided by a student. My answer was ‘yes’: if a student masters the material so well that he or she is able to come up with 40 usable questions covering all the material, that student deserves the highest grade.

Not all submitted questions are usable. I haven’t done the precise math, but I think I include around 20% of the proposed questions. Reasons not to include a question typically are that the question is too simple, that it is ambiguous (some distractors can be considered correct too), or that it overlaps with another question that I consider better. Some students also propose (small variations on) questions that I had used in exams of earlier years. If the similarity is too big, I reject the question.

In some cases I adopt the underlying “idea” of a proposed question, yet rewrite it substantially. In those cases the proposing student still receives the bonus point; furthermore, the student will probably still know the correct answer.

The best part about involving students in exam creation is that some of the proposed questions are better than I could have made myself. Such questions relate to the students’ own experience (e.g.: “In an earlier course we had to aim at 80% line coverage. In light of what we learned in this course, which of the following …”).

Overall, I am very happy with this way of involving students in the exam creation process. It not only saves me some (though not much) work — it also results in inspirational questions that I could not have invented myself. And, perhaps most importantly, it makes exam creation a lot more fun to me.

Acknowledgments:

The idea to let students propose their own questions was suggested to me by Julia Caussin, programme coordinator of the bachelor computer science at Delft University of Technology.
Image credit: bilal-kamoon, flickr, CC BY 2.0.

See also:

Isabel Gauthier, Teaching and Evaluating All at Once: Asking Students to Write Their Own Questions. Center for Teaching, Vanderbilt University.
Carlos Gonzalez-Cabezas, Olivia S. Anderson, Mary C. Wright, and Margherita Fontana. Association Between Dental Student-Developed Exam Questions and Learning at Higher Cognitive Levels. Journal of Dental Education, 2015 (abstract).

28Oct2015

Embedded Software Development with C Language Extensions

Posted in Research by Arie van Deursen

Arie van Deursen, with Markus Voelter, Bernd Kolb, and Stephan Eberle.

In embedded systems development, C remains the dominant programming language, because it permits writing low level algorithms and producing efficient binaries. Unfortunately, the price to pay for this is limited support for explicit and safe abstractions.

To overcome this, engineers at itemis and fortiss created mbeddr: an extensible version of C that comes with extensions relevant to embedded software development. Examples include explicit support for state machines, variability management, physical units, interfaces and components, or unit testing. The extensions are supported by an IDE created through JetBrains MPS. Furthermore, mbeddr users can introduce their own extensions.

To me, the ideas under mbeddr are extremely appealing. But I also had concerns: Would this work in practice? Does this scale to real world embedded systems? What are the benefits of such an approach? What are the problems?

Therefore, when Markus Voelter, lead architect of mbeddr invited me to join in a critical evaluation of a system created with mbeddr that they just shipped, I happily accepted. Eventually, this resulted in our paper Using C Language Extensions for Developing Embedded Software: A Case Study, which was accepted for publication and presentation at OOPSLA 2015.

The subject system built with mbeddr is an electricity smart meter, which continuously senses the instantaneous voltage and current on a mains line using analog front ends and analog-to-digital converters. Its mbeddr implementation consists of 80 interfaces and 167 components, corresponding to roughly 44,000 lines of C code.

Main layers, sub-systems, and components of the smart metering system.

Our goal in analyzing this system was to find out the degree to which C language extensions (as implemented in mbeddr) are useful for developing embedded software. We adopted the case study research method to investigate the use of mbeddr in an actual commercial project, since the true risks and benefits of language extensions can be observed only in such projects. Focussing on a single case allows us to provide significant details about that case.

To achieve this goal, we investigated the following aspects of the smart metering system:

Complexity: Are the abstractions provided by mbeddr beneficial for mastering the complexity encountered in a real-world embedded system? Which additional abstractions would be needed or useful?
Testing: Can the mbeddr extensions help with testing the system? In particular, is hardware-independent testing possible to support automated, continuous integration and build? Is incremental integration and commissioning supported?
Overhead: Is the low-level C code generated from the mbeddr extensions efficient enough for it to be deployable onto a real-world embedded device?
Effort: How much effort is required for developing embedded software with mbeddr?

The detailed analysis and answers are in the paper. Our main findings are the following:

The extensions help master complexity and lead to software that is more testable, easier to integrate and commission, and more evolvable.
Despite the abstractions introduced by mbeddr, the additional overhead is very low and acceptable in practice.
The development effort is reduced, particularly regarding evolution and commissioning.

In our paper, we also devote four pages to potential threats to the validity of our findings. Most importantly, in our experience with this case study and other projects, introducing mbeddr into an organization may be difficult, despite these benefits, due to a lack of developer skills and the need to adapt the development process.

The key insight for me is that mbeddr can help bring down one of the biggest cost and risk factors in embedded systems development, which is the integration and commissioning on the target hardware. Typically, this phase accounts for 40-50% of the project cost; for the smart meter system this was 13%. This was achieved by extensive unit and integration testing, using interfaces that could be instantiated both in a test as well as a target hardware environment.

Continuous integration is not just about the use of a continuous integration server. It is primarily about carefully modularizing the system into components that can be tested independently in different environments. Unfortunately, modularization is hard, especially in languages without explicit modularization primitives. Our study shows how extending C with language constructs can help to devise a modular, testable architecture, substantially reducing integration and commissioning costs.
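
As a rough illustration of what "interfaces that can be instantiated both in a test and in a target hardware environment" means, here is a minimal sketch. It is written in Python rather than in mbeddr or C, and all names (`VoltageSensor`, `StubSensor`, `average_millivolts`) are hypothetical; it does not reproduce the smart meter's actual design. The point is only that the component under test depends on an interface, so a canned stub can stand in for the analog front end during automated builds.

```python
from typing import Protocol


class VoltageSensor(Protocol):
    """Interface the metering logic depends on (hypothetical)."""

    def read_millivolts(self) -> int: ...


class StubSensor:
    """Test-side implementation: replays canned readings, no hardware needed."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def read_millivolts(self) -> int:
        return next(self._readings)


def average_millivolts(sensor: VoltageSensor, samples: int) -> float:
    """Component under test: knows only the interface, not the hardware."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    total = sum(sensor.read_millivolts() for _ in range(samples))
    return total / samples
```

On the target device, a second class with the same `read_millivolts` method would wrap the real analog-to-digital converter; `average_millivolts` would not need to change, which is what allows most testing to happen before integration on the hardware.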

For more information, see:

Markus Völter, Arie van Deursen, Bernd Kolb, Stephan Eberle. Using C Language Extensions for Developing Embedded Software: A Case Study. OOPSLA/SPLASH 2015 (pdf).
Presentation at OOPSLA 2015 by Markus Voelter (youtube, slides).
Information on this paper at the OOPSLA program pages.

13Oct2015

Delft Technology Fellowship for Top Female (Computer) Scientists

Posted in Research, Teaching by Arie van Deursen

Delft University of Technology is aiming to substantially increase the number of top female faculty members. To help accelerate this, the Delft Technology Fellowship offers high-profile, tenure-track positions to top female scientists in research fields in which Delft University of Technology (TU Delft) is active.

One of those fields is of course Computer Science — so if you’re a female computer scientist (or software engineering researcher!) interested in working as an assistant, associate or even full professor (depending on your experience) at the departments of Computer Science and Engineering of the TU Delft Faculty of Electrical Engineering, Mathematics, and Computer Science (EEMCS), please consider applying.

Previous rounds of the TU Delft Fellowship program were held in 2012 and 2014. In both years, 9 top scientists were hired, in such diverse fields as interactive media design, protein machines, solid state physics, climate change, and more.

Since applicants can come from any field of research, the competition for the TU Delft fellowship program is fierce. The program is highly international, with just four out of the current 18 fellows from The Netherlands. As a fellow, you should be the best in your field, and you should be able to explain to non computer scientists what makes you so good.

As a Delft Technology Fellow, you can propose your own research program. As in previous years, it can be in any research field in which TU Delft is active, such as computer science.

The computer science and engineering research at TU Delft is organized into 12 so-called sections, covering such topics as algorithmics, embedded software, cyber security, pattern recognition, and my own topic software engineering. Each section consists of around four faculty members and 10-15 PhD students, and is typically headed by one full professor. PhD students are usually externally funded, through government subsidies obtained in competition, or via collaborations with industry.

As a fellow at the EEMCS faculty, you are expected to bring your own topic. You would, however, typically be working within one of the existing sections. Thus, if you apply, it makes sense to identify the section that is most related to your area of work, and explore if you see collaboration opportunities. To that end, you can contact any of the section leaders, or me if you want to discuss where your topic would fit best. Naturally, if you are in software engineering, also feel free to contact me, or any current SERG group member.

For formal instructions on how to apply, please consult the Fellowship web site. The application procedure is open from 12 October 2015 until 8 January 2016.

21Sep2015

PhD/PostDoc Vacancies in Persistent Code Reviews

Posted in Research by Arie van Deursen

In the fall of 2015 we are starting a brand new project that we titled Persistent Code Reviewing, funded by NWO. If you’re into code reviews, software quality, or software testing, please consider applying for a position as PhD student or Postdoc within this project.

To quote the abstract of the project proposal:

Code review is the manual assessment of source code by human reviewers. It is mainly intended to identify defects and quality problems in code changes before deployment in production. Code review is widely recommended: Several studies have shown that it supports software quality and reliability crucially. Properly doing code reviews requires expensive developer time and zeal, for each and every reviewed change.

The goal of “Persistent Code Reviews” project is to make the efforts and knowledge that reviewers put in a code review available outside the code change context to which they are directed.

Naturally, given my long term interest in software testing, we will include any test activities (test design and execution, test adequacy considerations) that affect the reviewing process in our analysis.

The project is funded by the Top Programme of NWO, the Netherlands Organization for Scientific Research.

Within the project, we have openings for two PhD students and one postdoctoral researcher. The research will be conducted at the Software Engineering Research Group (SERG) of Delft University of Technology in The Netherlands. At SERG, you will be working in a team of around 25 researchers, including 6 full time faculty members.

In this project you will be supervised by Alberto Bacchelli and myself. To learn more about any of these positions, please contact one of us.

Requirements for all positions include:

Being a team player;
Strong writing and presentation skills;
Being hungry for new knowledge in software engineering;
Ability to develop prototype research tools;
Interest in bringing program analysis, testing, and human aspects of software engineering together.

To apply, please send us an application letter, a CV, and (pointers to) written material (e.g. a term paper or an MSc thesis for applicants for the PhD positions, and published papers or the PhD thesis for the postdoc).

We are in the process of further distributing this announcement: Final decisions on the appointments will be made at the end of October.

We look forward to receiving your application as soon as possible.

20Sep2015

In Vivo Software Analytics: PhD/Postdoc positions

Posted in Research by Arie van Deursen

Last week, we had the kickoff of a new project we are participating in addressing “In Vivo Software Analytics”. In this project, called “Big Software on the Run” (BSR) we monitor the quality of software in its “natural habitat”, i.e., as it is running in the wild. The project is a collaboration between the three technical universities (3TU) of The Netherlands (Eindhoven, Twente, Delft).

To quote the 3TU.BSR plan:

Millions of lines of code – written in different languages by different people at different times, and operating on a variety of platforms – drive the systems performing key processes in our society. The resulting software needs to evolve and can no longer be controlled a priori as is illustrated by a range of software problems. The 3TU.BSR research program will develop novel techniques and tools to analyze software systems in vivo – making it possible to visualize behavior, create models, check conformance, predict problems, and recommend corrective actions.

Essentially, we propose to address big software by applying big data techniques to system health information obtained at run time. It provides feedback from operations to developers, in order to make systems more resilient against the risks that come with rapid change.

The project brings together some of the best software engineering and data science groups and researchers of the three technical universities in The Netherlands:

TU Eindhoven: The Visualization (Jack van Wijk) and Architecture of Information Systems research groups (Wil van der Aalst)
University of Twente: The Formal Methods and Tools group (Jaco van de Pol, Marieke Huisman)
TU Delft: The CyberSecurity group (Inald Lagendijk) and the Software Engineering Research Group (myself)

The project is sponsored by NIRICT, the 3TU center for Netherlands Research in Information and Communication Technology.

The project duration is four years. At each of the three technical universities two PhD students and one postdoc will be employed. To maximize collaboration, each PhD student has two supervisors, from two different universities. Furthermore, the full research team, including all supervisors, PhD students, and postdocs, will regularly visit each other.

Within the Delft Software Engineering Research Group, we are searching for one PhD student and one postdoc to strengthen the 3TU.BSR project team.

The PhD student we are looking for will work on the intersection between visualization and dynamic program analysis. In particular, we are searching for a PhD student to work on log event analysis, and visualization of anomalies and exceptions as occurring in traces of running systems. The PhD student will be jointly supervised by Jack van Wijk and myself.

The postdoctoral researcher we are looking for should be able to establish connections between the various research themes and groups working on the project (such as visualization, process mining, repository mining, privacy-preserving log file analysis, and model checking). Thus, we are looking for a researcher who has successfully completed his or her PhD thesis, and who is open to working with several of the six PhD students within the project. The postdoc will be based in the Software Engineering Research Group.

Requirements for both positions include:

Being a team player;
Strong writing and presentation skills;
Being hungry for new knowledge in software engineering;
Ability to develop prototype research tools;
Interest in bringing visualization, run time analysis, and human aspects of software engineering together.

To apply, please send me an application letter, a CV, and (pointers to) written material (e.g. a term paper or an MSc thesis for applicants for the PhD position, and published papers or the PhD thesis for the postdoc).

We are in the process of further distributing this announcement. Final decisions on the appointments will be made at the end of October.

I look forward to receiving your application!

31Aug2015

A South African Perspective on Privacy and Intelligence

Posted in Society by Arie van Deursen

The Dutch government has proposed a new law on intelligence and security services (“Wet op de inlichtingen- en veiligheidsdiensten” — Wiv20XX).

As several privacy-related organizations have made clear, this law proposes non-specific (bulk) interception powers for any form of telecom or data transfer without independent ex-ante review or court involvement (see the summary by Matthijs Koot, and reactions to the bill from Bits of Freedom, Privacy International, the Institute for Information Law of the University of Amsterdam IVIR, and the Internet Society ISOC).

This bill gives the Dutch government unprecedented power to violate the privacy of its citizens. Either the Dutch government does not recognize the crucial role of privacy in a well-functioning democracy, or it does not realize what enormous privacy infringements are made possible through Internet surveillance.

When discussing the importance of privacy, I am always reminded of South Africa’s anti-apartheid activist Albie Sachs and his autobiography “The Soft Vengeance of a Freedom Fighter” (first published in 1990, and turned into a film in 2014).

As a law student at the University of Cape Town, Albie Sachs started fighting apartheid at the age of 17, in 1952. He was imprisoned from 1963 to 1964 (in solitary confinement) and again in 1966, after which he was exiled from his home country, South Africa.

In 1988, living in Maputo, Mozambique, he lost his right arm and an eye when his car was bombed by the South African secret police.

From 1991 until 1993, after Nelson Mandela’s release in 1990, Albie Sachs played a pivotal role in the negotiations leading to the new South African constitution.

In 1994 Nelson Mandela appointed him as a judge of the highest court of South Africa, the Constitutional Court. He worked for the Truth and Reconciliation Commission between 1995 and 1998.

Albie Sachs wrote his Soft Vengeance in 1989. Nelson Mandela was still in prison, and the struggle against apartheid had not yet been won. Albie Sachs had just lost his arm and eye, and his book was his attempt to cope with his injuries.

For his recovery he was flown to a London hospital. He noticed that he was remarkably optimistic, and he wondered why. Here is his reason (p.58):

“Perhaps part of my pleasure at being in this hospital room is that I am fairly sure it is not bugged. Sometimes I used to imagine my phone in Maputo being listened in to by at least three different secret services […]”

“Possibly my continuing sense of post-bomb euphoria comes from the fact that at least for the time being I am out of the net of hidden sensors, my spirit free from spying for the first time in three decades.”

He explains what it means to be surveilled:

“Ever since I was seventeen I have been politically active, I have lived with the notion that there are others accompanying every move I make, listening to every word I say.”

“Did the secret police really follow every up and down of my marriage, pick up the terms of our divorce, record automatically the names of our children even before they were entered in the birth register?”

And this gives rise to his dream for the future:

“I too have a dream, that there will one day be a world without police files, and bugged rooms, and tapped telephones, and intercepted mail, and that I will actually live in it.”

Albie Sachs is not alone in his dream. According to article 12 of the United Nations Universal Declaration of Human Rights, we all have a right to privacy:

“No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.”

To date, the Internet has given us amazing possibilities to communicate with our family and friends, to search, read, and share information on almost any topic we find interesting, and to shop for almost any item we think we need. As a software engineering educator and researcher, I am proud to have played a tiny part in making this happen.

Unfortunately, the Internet can also be used as a place for massive surveillance activities, at levels that, for example, the South African apartheid regime could only have dreamed of. As a software engineer, I am terrified by the technical opportunities the Internet provides to governments wishing to know everything about their citizens.

A government aiming to draft a modern intelligence bill should recognize this immense power, and take responsibility for safeguarding the necessary privacy protection.

The Dutch government has failed to do so. It has proposed a bill with insufficient independent oversight, a bill that oppressive regimes, such as the former South African regime, would be happy to embrace.

Luckily, the present bill is still a draft. I sincerely hope that the final version will offer adequate privacy protection, and bring the world closer to the dream of Albie Sachs.