Green Open Access FAQ


Image credit: Flickr, user static_view

(Opinionated) answers to frequently asked questions on (green) open access, from a computer science (software engineering) research perspective.

Disclaimer: IANAL, so if you want to know things for sure you’ll have to study the references provided. Use at your own risk.

Green open access is trickier than I thought, so I might have made mistakes. Corrections are welcome, as are additional questions for this FAQ. Thanks!

Green Open Access Questions

  1. What is Green Open Access?
  2. What is a pre-print?
  3. What is a post-print?
  4. What is a publisher’s version?
  5. Do publishers allow Green Open Access?
  6. Under what conditions is Green Open Access permitted?
  7. What is Yellow Open Access?
  8. What is Gold Open Access?
  9. What is Hybrid Open Access?
  10. What are the Self-Archiving policies of common computer science venues?
  11. Is Green Open Access compulsory?
  12. Should I share my pre-print under a Creative Commons license?
  13. Can I use Green Open Access to comply with Plan S?
  14. What is a good place for self-archiving?
  15. Can I use PeerJ Preprints for Self-Archiving?
  16. Can I use ResearchGate or Academia.edu for Self-Archiving?
  17. Which version(s) should I self-archive?
  18. What does Gold Open Access add to Green Open Access?
  19. Will Green Open Access hurt commercial publishers?
  20. What is the greenest publisher in computer science?
  21. Should I use ACM Authorizer for Self-Archiving?
  22. As a conference organizer, can I mandate Green Open Access?
  23. What does Green Open Access cost?
  24. Should I adopt Green Open Access?
  25. Where can I learn more about Green Open Access?

What is Green Open Access?

In Green Open Access you as an author archive a version of your paper yourself, and make it publicly available. This can be at your personal home page, at the institutional repository of your employer (such as the one from TU Delft), or at an e-print server such as arXiv.

The word “archive” indicates that the paper will remain available forever.

What is a pre-print?

A pre-print is a version of a paper that is entirely prepared by the authors.

Since no publisher has been involved in any way in the preparation of such a pre-print, it feels right that the authors can deposit such pre-prints wherever they want to. Before submission, the authors, or their employers such as universities, hold the copyright to the paper, and hence can publish the paper in online repositories.

Following the definition of SHERPA’s RoMEO project, pre-prints refer to the version before peer-review organized by a publisher.

What is a post-print?

Following the RoMEO definitions, a post-print is a final draft as prepared by the authors themselves after reviewing. Thus, feedback from the reviewers has typically been included.

Here a publisher may have had some light involvement, for example by selecting the reviewers, making a reviewing system available, or by offering a formatting template / style sheet. The post-print, however, is author-prepared, so copy-editing and final markup by the publisher has not been done.

A (Plan S) synonym for postprint is “Author-Accepted Manuscript”, sometimes abbreviated as AAM.

What is a publisher’s version?

While pre- and post-prints are author-prepared, the final publisher’s version is created by the publisher.

The publisher’s involvement may vary from very little (camera-ready version entirely created by authors) to substantial (proofreading, new markup, copy editing, etc.).

Publishers typically make their versions available after a transfer of copyright, from the authors to the publisher. And with the copyright owned by the publisher, it is the publisher who determines not only where the publisher’s version can be made available, but also where the original author-prepared pre- or post-prints can be made available.

A (Plan S) synonym is “Version of Record”, sometimes abbreviated as VoR.

Do publishers allow Green Open Access?

Self-archiving of non-published material that you own the copyright to is always allowed.

Whether self-archiving of a paper that has been accepted by a publisher for publication is allowed depends on that publisher. You have transferred your copyright, so it is up to the publisher to decide who else can publish it as well.

Different publishers have different policies, and these policies may in turn differ per journal. Furthermore, the policies may vary over time.

The SHERPA project does a great job in keeping track of the open access status of many journals. You’ll need to check the status of your journal, and if it is green you can self-archive your paper (usually under certain publisher-specific conditions).

In the RoMEO definition, green open access means that authors can self-archive both pre-prints and post-prints.

Under what conditions is Green Open Access permitted?

Since the publisher holds copyright on your published paper, it can (and usually does) impose constraints on the self-archived versions. You should always check the specific constraints for your journal or publisher, for example via the RoMEO journal list.

The following conditions are fairly common:

  1. You generally can self-archive pre- and post-prints only, but not the publisher version.

  2. In the meta-data of the self-archived version you need to add a reference to the final version (for example through its DOI).

  3. In the meta-data of the self-archived version you need to include a statement of the current ownership of the copyright, sometimes through specific sentences that must be copy-pasted.

  4. The repository in which you self-archive should be non-commercial. Thus, arXiv and institutional repositories are usually permitted, but commercial ones like PeerJ Preprints, Academia.edu or ResearchGate are not.

  5. Some commercial publishers impose an embargo on post-prints. For example Elsevier permits sharing the post-print version on an institutional repository only after 12-24 months (depending on the journal).

Usually meeting the demands of a single publisher is relatively easy to do. Given points 2 and 3, it typically involves creating a dedicated pdf with a footnote on the first page with the required extra information.
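
As an illustration of points 2 and 3, here is a minimal Python sketch that assembles such a first-page footnote. The `PaperInfo` record, the `self_archiving_notice` function, and the wording of the notice are all hypothetical; many publishers prescribe an exact sentence that must be copied verbatim, so treat this as a template to adapt rather than as any publisher's required text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperInfo:
    """Hypothetical record of what a self-archiving notice usually needs."""
    title: str
    venue: str
    year: int
    doi: str               # DOI of the publisher's version (condition 2)
    copyright_holder: str  # current copyright owner (condition 3)


def self_archiving_notice(paper: PaperInfo, version: str = "post-print") -> str:
    """Build an illustrative first-page footnote for a self-archived paper.

    Only author-prepared versions are accepted, mirroring condition 1.
    The wording is an assumption, not a publisher-mandated statement.
    """
    if version not in ("pre-print", "post-print"):
        raise ValueError("only pre-prints and post-prints can usually be self-archived")
    return (
        f"This is the author's {version} of \"{paper.title}\", "
        f"published in {paper.venue}, {paper.year}. "
        f"© {paper.copyright_holder}, {paper.year}. "
        f"The final version is available at https://doi.org/{paper.doi}."
    )


if __name__ == "__main__":
    # All values below are made up for the example.
    example = PaperInfo(
        title="A Hypothetical Paper",
        venue="Some Journal",
        year=2016,
        doi="10.1000/example",
        copyright_holder="Example Publisher",
    )
    print(self_archiving_notice(example))
```

The resulting string can be pasted into the footnote of the dedicated pdf. If the publisher prescribes its own sentence, replace the template text with that sentence.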

However, every publisher has its own rules. If you publish your papers in a range of different venues (which is what good researchers do), you’ll have to know many different rules if you want to do green open access in the correct way.

What is Yellow Open Access?

Some publishers (such as Wiley) allow self-archiving of pre-prints only, and not of post-prints. This is referred to as yellow open access in RoMEO. Yellow is more restrictive than green.

As an author, I find yellow open access frustrating, as it forbids me to make the version of my paper that was improved thanks to the reviewers available via open access.

As a reviewer, I feel yellow open access wastes my effort: I tried to help authors by giving useful feedback, and the publisher forbids my improvements to be reflected in the open access version.

What is Gold Open Access?

Gold Open Access refers to journals (or conference proceedings) that are completely accessible to the public without requiring paid subscriptions.

Often, gold implies green, for example when a publisher such as PeerJ, PLOS ONE or LIPIcs adopts a Creative Commons license — which allows anyone, including the authors, to share a copy under the condition of proper attribution.

The funding model for open access is usually not based on subscriptions, but on Article Processing Charges, i.e., a payment by the authors for each article they publish (varying from $70 (LIPIcs) up to $1500 (PLOS ONE) per paper).

What is Hybrid Open Access?

Hybrid open access refers to a restricted (subscription-funded) journal that permits authors to pay extra to make their own paper available as open access.

This practice is also referred to as double dipping: The publisher collects revenue from both subscriptions and article processing charges.

University libraries and funding agencies do not like hybrid access, since they feel they have to pay twice, both for the authors and the readers.

Green open access is better than hybrid open access, simply because it achieves the same (an article is available) yet at lower costs.

What are the Self-Archiving policies of common computer science venues?

For your and my convenience, here is the green status of some publishers that are common in software engineering (check links for most up to date information):

  • ACM: Green, e.g., TOSEM, see also the ACM author rights. For ACM conferences, the author-prepared camera-ready version often includes a DOI already, making it easy to adhere to ACM’s meta-data requirements. Note that some ACM conferences are gold open access, for example the ones published in the Proceedings of the ACM on Programming Languages.
  • IEEE: Green, e.g., TSE. Under its policy, the IEEE makes available a version that meets all IEEE meta-data requirements and that authors can use for self-archiving. See also their self-archiving FAQ.
  • Springer: Green, e.g., EMSE, SoSyM, LNCS. Pre-print on arXiv; post-print on a personal page immediately, and in a repository in some cases immediately and in others after a 12 month embargo period.
  • Elsevier: Mostly green, e.g., JSS, IST. Pre-print allowed; post-print with CC BY-NC-ND license on a personal page immediately and in an institutional repository after a 12-48 month embargo period. To circumvent the embargo you can publish the pre-print on arXiv, update it with the post-print (which is permitted), and update the license to CC BY-NC-ND as required by Elsevier, after which anyone (including you) can share the post-print on any non-commercial platform.
  • Wiley: Mostly yellow, i.e., only pre-prints can be shared immediately, and post-prints (even on personal pages) only after a 12 month embargo. E.g., JSEP.

Luckily, there are also some golden open access publishers (which typically permit self-archiving as well should you still want that):

Is Green Open Access compulsory?

Funding agencies (NWO, EU, Bill and Melinda Gates Foundation, …) as well as universities (TU Delft, University of California, UCL, ETH Zurich, Imperial College, …) are increasingly demanding that all publications resulting from their projects or employees are available in open access.

My own university TU Delft insists, like many others, on green open access:

As of 1 May 2016 the so-called Green Road to Open Access publishing is mandatory for all (co)authors at TU Delft. The (co)author must publish the final accepted author’s version of a peer-reviewed article with the required metadata in the TU Delft Institutional Repository.

This makes sense: The TU Delft wants to have copies of all the papers that its employees produce, and make sure that the TU Delft stakeholders, i.e. the Dutch citizens, can access all results. Note that TU Delft insists on post-prints that include reviewer-induced modifications.

The Dutch national science foundation NWO has a preference for gold open access, but accepts green open access if that’s impossible (“Encourage Gold, require immediate Green”).

Should I share my pre-print under a Creative Commons license?

You should only do this if you are certain that the publisher’s conditions on self-archiving pre-prints are compatible with a Creative Commons license. If that is the case, you probably are dealing with a golden open access publisher anyway.

Creative Commons licenses are very liberal, allowing anyone to re-distribute (copy) the licensed work (under certain conditions, including proper attribution).

This effectively nullifies (some of) the rights that come with copyright. For that reason, publishers that insist on owning the full copyright to the papers they publish typically disallow self-archiving earlier versions with such a license.

For example, ACM Computing Surveys insists on a set statement indicating

… © ACM, YYYY. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution…

This “not for redistribution” is incompatible with Creative Commons, which is all about sharing.

Furthermore, a Creative Commons license is irrevocable. So once you picked it for your pre-print, you effectively made a choice for golden open access publishers only (some people might consider this desirable, but it seriously limits your options).

Therefore, my suggestion would be to keep the copyright yourself for as long as you can, giving you the freedom to switch to Creative Commons once you know who your publisher is.

Can I use Green Open Access to comply with Plan S?

Yes, you can, but you are only compliant with Plan S if you share your postprint, with a Creative Commons License, immediately (no embargo).

Unfortunately, the Creative Commons license is likely incompatible with the constraints of the publisher of the eventual paper. As a way around, in most cases (e.g., ACM, IEEE journals, Springer) you are allowed to distribute your postprint with a CC BY license if you actually pay the hybrid open access fee. These fees are not refundable under Plan S, but this hybrid-and-then-self-archive route is compliant with Plan S.

What is a good place for self-archiving?

It depends on your needs.

Your employer may require that you use your institutional repository (such as the TU Delft Repository). This helps your employer to keep track of how many of its publications are available as open access. The higher this number, the stronger the position of your employer when negotiating open access deals with publishers. Institutional archiving still allows you to post a version elsewhere as well.

Subject repositories such as arXiv offer good visibility to your peers. In fields like physics using arXiv is very common, whereas in Computer Science this is less so. A good thing about arXiv is that it permits versioning, making it possible to submit a pre-print first, which can then later be extended with the post-print. You can use several licenses. If you intend publishing your paper, however, you should adopt arXiv’s Non-Exclusive Distribution license (which just allows arXiv to distribute the paper) instead of the more generous Creative Commons license — which would likely conflict with the copyright claims of the publisher of the refereed paper.

Your personal home page is a good place if you want to offer an overview of your own research. Home page URLs may not be very permanent though, so on its own it is not a suitable approach to self-archiving. You can use it in addition to archiving in repositories, but not as a replacement.

Can I use PeerJ Preprints for Self-Archiving?

Probably not — and it’s also not what PeerJ Preprints are intended for.

PeerJ Preprints is a commercial eprint server requiring a Creative Commons license. It is intended to share drafts that have not yet been peer reviewed for formal publication.

It offers good visibility (a preprint on goto statements attracted 15,000 views), and a smooth user interface for posting comments and receiving feedback. Articles cannot be removed once uploaded.

The PeerJ Preprint service is compatible with other golden open access publishers (such as PeerJ itself or Usenix).

The PeerJ Preprint service, however, is incompatible with most other publishers (such as ACM, IEEE, or Springer) because (1) the service is commercial; (2) the service requires Creative Commons as license; (3) preprints once posted cannot be removed.

So, if you want to abide by the rules, uploading a pre-print to PeerJ Preprints severely limits your subsequent publication options.

Can I use ResearchGate or Academia.edu for Self-Archiving?

No — unless you only work with liberal publishers with permissive licenses such as Creative Commons.

ResearchGate and Academia.edu are researcher social networks that also offer self-archiving features. As they are commercial repositories, most publishers will not allow sharing your paper on these networks.

The ResearchGate copyright pages provide useful information on this.

The Academia.edu copyright pages state the following:

Many journals will also allow an author to retain rights to all pre-publication drafts of his or her published work, which permits the author to post a pre-publication version of the work on Academia.edu. According to Sherpa, which tracks journal publishers’ approach to copyright, 90% of journals allow uploading of either the pre-print or the post-print of your paper.

This seems misleading to me: Most publishers explicitly disallow posting preprints to commercial repositories such as Academia.edu.

In both cases, the safer route is to use permitted places such as your home page or institutional repository for self-archiving, and only share links to your papers with ResearchGate or Academia.edu.

Which version(s) should I self-archive?

It depends.

Publishing a pre-print as soon as it is ready has several advantages:

  • You can receive rapid feedback on a version that is available early.

  • You can extend your pre-print with an appendix, containing material (e.g., experimental data) that does not fit in a paper that you’d submit to a journal.

  • It allows you to claim ownership of certain ideas before your competition.

  • You offer most value to society since you allow anyone to benefit as early as possible from your hard work.

Nevertheless, publishing a post-print only can also make sense:

  • You may want to keep some results or data secret from your competition until your paper is actually accepted for publication.

  • You may want to avoid confusion between different versions (pre-print versus post-print).

  • You may be scared to leave a trail of rejected versions submitted to different venues.

  • You may want to submit your pre-print to a venue adopting double blind reviewing, requiring you to remain anonymous as author. Publishing your pre-print during the reviewing phase would make it easy for reviewers to find your paper and connect your name to it.

For these reasons, and primarily to avoid confusion, I typically share just the post-print: The camera-ready version that I create and submit to the publisher is also the version that I self-archive as post-print.

What does Gold Open Access add to Green Open Access?

For open access, gold is better than green since:

  • it shifts the burden of making articles publicly available from the researcher to the publisher.
  • it places a paper in a venue that is entirely open access. Thus, also other papers improving upon, or referring to your paper (published in the same journal) will be open access too.
  • gold typically implies green, i.e., the license of the journal is similar to Creative Commons, allowing anyone, including the authors, to share a copy under the condition of proper attribution.

Will Green Open Access hurt commercial publishers?

Maybe. But most academic publishers already allow green open access, and they are doing just fine. So I would not worry about it.

What is the greenest publisher in computer science?

The greenest publisher should be the one imposing the least restrictions on self-archiving.

From that perspective, publishers who want to be the greenest should in fact want to be gold, making their papers available under a permissive Creative Commons license. An example is Usenix.

Among the non-golden publishers, the greenest are probably the non-commercial ones, such as IEEE and ACM: They require simple conditions that are usually easy to meet.

The ACM, “the world’s largest educational and scientific computing society”, claims to be among the “greenest” publishers. Based on their tolerant attitude towards self-archiving of post-prints this may be somewhat justified. Furthermore, their Authorizer mechanism permits setting up free access to the publisher’s version.

But greenest is gold. So I look forward to the day the ACM follows its little sister Usenix in a full embrace of golden open access.

Should I use ACM Authorizer for Self-Archiving?

The ACM offers the Authorizer mechanism to provide free access to the Publisher’s Version of a paper, which only works from one user-specified URL. For example, I can use it to create a dedicated link from my institutional paper page to the publisher’s version.

However, Authorizer links cannot be accessed from other pages, and there is no point in emailing or tweeting them. Since only one Authorizer link can exist per paper, I cannot use an Authorizer link for both my institutional repository and the repository of my funding agency.

These restrictions on Authorizer links make them unsuitable as a replacement for self-archiving (let alone as a replacement for golden open access).

As a conference organizer, can I mandate Green Open Access?

Green open access is self-archiving, giving the authors the permission to archive their own papers.

As a conference organizer working with a non-open-access publisher (ACM, IEEE, Springer-Verlag), you are not allowed to archive and distribute all the papers of the conference yourself.

OOPSLA program with DOI and preprint link

What several conferences do instead, though, is collect links to pre- or post-prints. For example, the online program of the recent OOPSLA 2016 conference has links to both the publisher’s version (through a DOI) and to an author-provided post-print.

For OOPSLA, the authors of 20 out of the 52 papers (38%) provided such a link, a number that is similar at other conferences adopting such preprint linking.

As a conference organizer, you can do your best to encourage authors to submit their pre-print links. Or you can use your influence in the steering committee to push the conference to switch to an open access publisher, such as LIPIcs or Usenix.

As an author, you can help by actually offering a link to your pre-print.

What does Green Open Access cost?

For authors, green open access typically costs no money. University repositories, arXiv, and PeerJ Preprints are all free to use.

It does cost (a bit of) effort though:

  • You need to find out the specific conditions under which the publisher of your current paper permits self-archiving.
  • You need to actually upload your paper to some repository, provide the correct meta-data, and meet the publisher’s constraints.

The fact that open access is free for authors does not mean that there are no costs involved. For example, the money to keep arXiv up and running comes from a series of sponsors, including TU Delft.

Should I adopt Green Open Access?

Yes.

Better availability of your papers will help you in several ways:

  • Impact in Research: Other researchers can access your papers more easily, increasing the chances that they will build upon your results in their work;
  • Impact in Practice: Practitioners may be interested in using your results: A pay-wall is an extra and undesirable impediment for such adoption;
  • Improved Results: Increased usage of your results in either industry or academia will put your results to the real test, and will help you improve your results.

Besides that, (green) open access is a way of delivering to the tax payers what they paid for: Your research results.

Where can I learn more about Green Open Access?

Useful resources include:


Version history:

  • 6 November 2016: Version 0.1, Initial version, call for feedback.
  • 14 November 2016: Version 0.2, update on commercial repositories.
  • 18 November 2016: Version 0.3, update on ACM Authorizer.
  • 20 November 2016: Version 0.4, added TOC, update on commercial repositories.
  • 06 December 2016: Version 0.5, updated information on ACM and IEEE.
  • 20 December 2016: Version 0.6, added info on Creative Commons and AI venues.
  • 27 July 2018: Version 0.7, update on where to archive. Released as CC BY-SA 4.0.
  • 18 November 2018: Version 0.8, updated info on Elsevier.
  • 10 September 2019: Version 0.9, added question on Plan S compliance.

Acknowledgments: I thank Moritz Beller (TU Delft) and Dirk Beyer (LMU Munich) for valuable feedback and corrections.

© Arie van Deursen, November 2016. Licensed under CC BY-SA 4.0.

PhD Student Vacancy in Test Amplification

Within the Software Engineering Research Group of Delft University of Technology, we are looking for an enthusiastic and strong PhD student in the area of “test amplification”.

The PhD project will be in the context of the new STAMP project funded by the H2020 programme of the European Union.

STAMP is a 3-year R&D project, which leverages advanced research in automatic test generation to push automation in DevOps one step further through innovative methods of test amplification. It will reuse existing assets (test cases, API descriptions, dependency models), in order to generate more test cases and test configurations each time the application is updated. This project has an ambitious agenda towards industry transfer. In this regard, the STAMP project gathers 3 research groups which have strong expertise in software testing and continuous development as well as 6 industry partners that develop innovative open source software products.

The STAMP project is led by Benoit Baudry from INRIA, France. The STAMP consortium consists of the following partners

The PhD student employed by Delft University of Technology will conduct research as part of the STAMP project together with the STAMP partners. Employment will be for a period of four years. The PhD student will enroll in the TU Delft Graduate School.

The primary line of research for the TU Delft PhD student will revolve around runtime test amplification. Online test amplification automatically extracts information from logs collected in production in order to generate new tests that can replicate failures, crashes, anomalies and outlier events. The research will be devoted to (i) defining monitoring techniques and log data analytics to collect run-time information; (ii) detecting interesting behaviors with respect to existing tests; (iii) creating new tests for testing the behaviors of interest, for example through state machine learning or genetic algorithms; (iv) adding new probes and new log messages into the production code to improve its testability.


Besides this primary line of research, the PhD student will be involved in lines of research led by the other STAMP partners, addressing unit test amplification and configurability test amplification. Furthermore, the PhD student will be involved in case studies and evaluations conducted in collaboration with the industrial partners in the consortium.

From the TU Delft Software Engineering group, several people will be involved, including Arie van Deursen (principal investigator), Andy Zaidman, and Mauricio Aniche. Furthermore, where possible, collaborations with existing projects will be set up, such as the 3TU Big Software on the Run and TestRoot projects.

Requirements for the PhD candidate include:

  • Being a team player;
  • Strong writing and presentation skills;
  • Being hungry for new knowledge in software engineering;
  • Ability to develop prototype research tools;
  • Interest in bringing program analysis, testing, and genetic algorithms together;
  • Eagerness to work with the STAMP partners on test amplification in their contexts;
  • Completed MSc degree in computer science

For more information on this vacancy and the STAMP project, please contact Arie van Deursen.

To apply, please follow the instructions of the official opening at the TU Delft Vacancies pages. Your letter should include a clear motivation why you want to work on the STAMP project, and an explanation of what you can bring to the STAMP project. Also provide your CV, (pointers to) written material (e.g. a term paper, an MSc thesis, or published conference or journal papers), and if possible pointers to (open source) software projects you have contributed to.

The vacancy will be open until 2 February 2017, but applying early never hurts. We look forward to receiving your application!

Asking Students to Create Exam Questions

Do you also find it hard to come up with good multiple choice questions? Then maybe you will like the idea of letting students propose (rather than just answer) questions. A colleague suggested this idea, arguing that it would benefit the students (creating a question requires mastering the material) and would save me work as well.

I liked this idea, and during the last three years I have applied it in my undergrad software testing course. This is a course for around 200 students who are evaluated based on an individual multiple choice exam (besides programming work conducted in pairs).

In class, I discuss example questions, and I invite students to come up with their own. The logistics are as follows:

  • An exam consists of 40 multiple choice questions.
  • Students can submit their questions until one week before the exam.
  • As a teacher I decide which (if any) of the questions I include, and whether I think changes to the questions are necessary.
  • If I include a student question, the student benefits from knowing the answer and from receiving a small bonus for submitting an included question.

To help the students in creating questions, I point them to Cem Kaner’s post on writing multiple choice test questions. I explain that for each question I need:

  • A clear stem of one or two sentences that is meaningful in itself;
  • One clear correct choice;
  • Three distractors that are approximately equally plausible yet also objectively incorrect.
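
The structural part of these requirements can be captured in a small data structure. The Python sketch below is purely illustrative: the `MultipleChoiceQuestion` class and its `problems` method are hypothetical names, and the check only covers what can be verified mechanically (a non-empty stem, exactly three distractors, distinct options). Whether the stem is meaningful and the distractors are plausible yet incorrect still requires the judgment of the teacher.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MultipleChoiceQuestion:
    """Hypothetical container for a student-submitted question."""
    stem: str               # one or two sentences, meaningful in itself
    correct: str            # the one clearly correct choice
    distractors: List[str]  # three plausible yet incorrect choices

    def problems(self) -> List[str]:
        """Return structural problems; an empty list means none were found."""
        issues = []
        if not self.stem.strip():
            issues.append("stem is empty")
        if len(self.distractors) != 3:
            issues.append("expected exactly three distractors")
        options = [self.correct] + self.distractors
        normalized = [option.strip().lower() for option in options]
        if len(set(normalized)) != len(normalized):
            issues.append("options must be distinct")
        return issues


if __name__ == "__main__":
    # A made-up submission, used only to show the check in action.
    question = MultipleChoiceQuestion(
        stem="Which coverage criterion requires every statement to be executed?",
        correct="Statement coverage",
        distractors=["Branch coverage", "Path coverage", "Mutation score"],
    )
    print(question.problems())
```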

So far, I have used this procedure for eight exams during the last three years.

The students who have proposed questions that I include consistently turn out to belong to the best. This probably means that only very good students go through the effort of creating a question; it also suggests that trying to come up with a question is a good way of preparing for an exam.

For each exam I receive 10-20 questions from around five students: This very much depends on the individual students and may vary per year. Some students recognize the opportunity and submit 20 questions, but most consider it too difficult and do not come up with any.

I typically include 3-5 student questions in the exam (so one in ten questions comes from a student). This essentially depends on the number of good student questions proposed — I don’t impose an upper limit on the number of questions students can submit nor on the total number of student questions that I’m willing to include.

It is only at the exam that the students find out which questions I ask, and whether any of their questions are included. So while there is the possibility that all students share and know in advance some of the questions that might be asked, the students still need to prepare to answer other questions.

My class wondered whether I would be willing to let all 40 questions be provided by a student. My answer was ‘yes’: if a student masters the material so well that he or she is able to come up with 40 usable questions covering all the material, that student deserves the highest grade.

Not all submitted questions are usable. I haven’t done the precise math, but I think I include around 20% of the proposed questions. Reasons not to include a question typically are that the question is too simple, that it is ambiguous (some distractors can be considered correct too), or that it overlaps with another question that I consider better. Some students also propose (small variations on) questions that I had used in exams of earlier years. If the similarity is too big, I reject the question.

In some cases I adopt the underlying “idea” of a proposed question, yet rewrite it substantially. In those cases the proposing student still receives the bonus point; furthermore, the student will probably still know the correct answer.

The best part about involving students in exam creation is that some of the proposed questions are better than I could have made myself. Such questions relate to the students’ own experience (e.g.: “In an earlier course we had to aim at 80% line coverage. In light of what we learned in this course, which of the following …”).

Overall, I am very happy with this way of involving students in the exam creation process. It not only saves me some (though not much) work — it also results in inspirational questions that I could not have invented myself. And, perhaps most importantly, it makes exam creation a lot more fun to me.


Acknowledgments:

  • The idea to let students propose their own questions was suggested to me by Julia Caussin, programme coordinator of the bachelor computer science at Delft University of Technology.

  • Image credit: bilal-kamoon, flickr, CC BY 2.0.


See also:

Embedded Software Development with C Language Extensions

Arie van Deursen, with Markus Voelter, Bernd Kolb, and Stephan Eberle.

In embedded systems development, C remains the dominant programming language, because it permits writing low level algorithms and producing efficient binaries. Unfortunately, the price to pay for this is limited support for explicit and safe abstractions.

To overcome this, engineers at itemis and fortiss created mbeddr: an extensible version of C that comes with extensions relevant to embedded software development. Examples include explicit support for state machines, variability management, physical units, interfaces and components, or unit testing. The extensions are supported by an IDE created through JetBrains MPS. Furthermore, mbeddr users can introduce their own extensions.

To me, the ideas under mbeddr are extremely appealing. But I also had concerns: Would this work in practice? Does this scale to real world embedded systems? What are the benefits of such an approach? What are the problems?

Therefore, when Markus Voelter, lead architect of mbeddr, invited me to join in a critical evaluation of a system created with mbeddr that they had just shipped, I happily accepted. Eventually, this resulted in our paper Using C Language Extensions for Developing Embedded Software: A Case Study, which was accepted for publication and presentation at OOPSLA 2015.

The subject system built with mbeddr is an electricity smart meter, which continuously senses the instantaneous voltage and current on a mains line using analog front ends and analog-to-digital converters. Its mbeddr implementation consists of 80 interfaces and 167 components, corresponding to roughly 44,000 lines of C code.

Main layers, sub-systems, and components of the smart metering system.

Our goal in analyzing this system was to find out the degree to which C language extensions (as implemented in mbeddr) are useful for developing embedded software. We adopted the case study research method to investigate the use of mbeddr in an actual commercial project, since the true risks and benefits of language extensions can be observed only in such projects. Focussing on a single case allows us to provide significant details about that case.

To achieve this goal, we investigated the following aspects of the smart metering system:

  1. Complexity: Are the abstractions provided by mbeddr beneficial for mastering the complexity encountered in a real-world embedded system? Which additional abstractions would be needed or useful?
  2. Testing: Can the mbeddr extensions help with testing the system? In particular, is hardware-independent testing possible to support automated, continuous integration and build? Is incremental integration and commissioning supported?
  3. Overhead: Is the low-level C code generated from the mbeddr extensions efficient enough for it to be deployable onto a real-world embedded device?
  4. Effort: How much effort is required for developing embedded software with mbeddr?

The detailed analysis and answers are in the paper. Our main findings are the following:

  • The extensions help mastering complexity and lead to software that is more testable, easier to integrate and commission, and that is more evolvable.
  • Despite the abstractions introduced by mbeddr, the additional overhead is very low and acceptable in practice.
  • The development effort is reduced, particularly regarding evolution and commissioning.

In our paper, we also devote four pages to potential threats to the validity of our findings. Most importantly, in our experience with this case study and other projects, introducing mbeddr into an organization may be difficult, despite these benefits, due to a lack of developer skills and the need to adapt the development process.

The key insight for me is that mbeddr can help bring down one of the biggest cost and risk factors in embedded systems development, which is the integration and commissioning on the target hardware. Typically, this phase accounts for 40-50% of the project cost; for the smart meter system this was 13%. This was achieved by extensive unit and integration testing, using interfaces that could be instantiated both in a test as well as a target hardware environment.

Continuous integration is not just about the use of a continuous integration server. It is primarily about carefully modularizing the system into components that can be tested independently in different environments. Unfortunately, modularization is hard, especially in languages without explicit modularization primitives. Our study shows how extending C with language constructs can help to devise a modular, testable architecture, substantially reducing integration and commissioning costs.
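To make the idea of one interface instantiated in both a test and a target environment a bit more concrete, here is a minimal sketch. It is written in Python rather than in mbeddr's extended C, and the names `VoltageSensor`, `FakeSensor`, and `rms_voltage` are hypothetical; none of them are taken from the smart meter code.

```python
import math
from typing import Protocol, Sequence


class VoltageSensor(Protocol):
    """Hypothetical interface: anything that can deliver voltage samples."""

    def read_samples(self, count: int) -> Sequence[float]:
        ...


class FakeSensor:
    """Test-side implementation that replays canned samples; no hardware needed."""

    def __init__(self, samples: Sequence[float]) -> None:
        self._samples = list(samples)

    def read_samples(self, count: int) -> Sequence[float]:
        return self._samples[:count]


def rms_voltage(sensor: VoltageSensor, count: int) -> float:
    """Component under test: it depends only on the interface, not on a driver."""
    samples = sensor.read_samples(count)
    if not samples:
        raise ValueError("no samples available")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# In a test build the component is wired to the fake. On the target it would be
# wired to a driver class (not shown) that reads the analog-to-digital converter.
fake = FakeSensor([3.0, -3.0, 3.0, -3.0])
print(rms_voltage(fake, 4))
```

Because `rms_voltage` only sees the interface, the same component can run on a build server long before the target hardware is available.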

For more information, see:

  • Markus Völter, Arie van Deursen, Bernd Kolb, Stephan Eberle. Using C Language Extensions for Developing Embedded Software: A Case Study. OOPSLA/SPLASH 2015 (pdf).
  • Presentation at OOPSLA 2015 by Markus Voelter (youtube, slides)
  • Information on this paper at the OOPSLA program pages.

Delft Technology Fellowship for Top Female (Computer) Scientists

TU Delft Logo

Delft University of Technology is aiming to substantially increase the number of top female faculty members. To help accelerate this, the Delft Technology Fellowship offers high-profile, tenure-track positions to top female scientists in research fields in which Delft University of Technology (TU Delft) is active.

One of those fields is of course Computer Science — so if you’re a female computer scientist (or software engineering researcher!) interested in working as an assistant, associate or even full professor (depending on your experience) at the departments of Computer Science and Engineering of the TU Delft Faculty of Electrical Engineering, Mathematics, and Computer Science (EEMCS), please consider applying.

Previous rounds of the TU Delft Fellowship program were held in 2012 and 2014. In both years, 9 top scientists were hired, in such diverse fields as interactive media design, protein machines, solid state physics, climate change, and more.

Since applicants can come from any field of research, the competition for the TU Delft fellowship program is fierce. The program is highly international, with just four out of the current 18 fellows from The Netherlands. As a fellow, you should be the best in your field, and you should be able to explain to non-computer scientists what makes you so good.

As a Delft Technology Fellow, you can propose your own research program. As in previous years, it can be in any research field in which TU Delft is active, such as computer science.

The computer science and engineering research at TU Delft is organized into 12 so-called sections, covering such topics as algorithmics, embedded software, cyber security, pattern recognition, and my own topic, software engineering. Each section consists of around four faculty members and 10-15 PhD students, and is typically headed by one full professor. PhD students are usually externally funded, through government subsidies obtained in competition, or via collaborations with industry.

As a fellow at the EEMCS faculty, you are expected to bring your own topic. You would, however, typically be working within one of the existing sections. Thus, if you apply, it makes sense to identify the section that is most related to your area of work, and explore whether you see collaboration opportunities. To that end, you can contact any of the section leaders, or me if you want to discuss where your topic would fit best. Naturally, if you are in software engineering, also feel free to contact me, or any current SERG group member.

For formal instructions on how to apply, please consult the Fellowship web site. The application procedure is open from 12 October 2015 until 8 January 2016.

PhD/PostDoc Vacancies in Persistent Code Reviews

logo-nwo

In the fall of 2015 we are starting a brand new project that we titled Persistent Code Reviewing, funded by NWO. If you’re into code reviews, software quality, or software testing, please consider applying for a position as PhD student or Postdoc within this project.

To quote the abstract of the project proposal:

Code review is the manual assessment of source code by human reviewers. It is mainly intended to identify defects and quality problems in code changes before deployment in production. Code review is widely recommended: Several studies have shown that it supports software quality and reliability crucially. Properly doing code reviews requires expensive developer time and zeal, for each and every reviewed change.

The goal of the “Persistent Code Reviews” project is to make the efforts and knowledge that reviewers put in a code review available outside the code change context to which they are directed.

Naturally, given my long term interest in software testing, we will include any test activities (test design and execution, test adequacy considerations) that affect the reviewing process in our analysis.

The project is funded by the Top Programme of NWO, the Netherlands Organization for Scientific Research.

Within the project, we have openings for two PhD students and one postdoctoral researcher. The research will be conducted at the Software Engineering Research Group (SERG) of Delft University of Technology in The Netherlands. At SERG, you will be working in a team of around 25 researchers, including 6 full time faculty members.

In this project you will be supervised by Alberto Bacchelli and myself. To learn more about any of these positions, please contact one of us.

Requirements for all positions include:

  • Being a team player;
  • Strong writing and presentation skills;
  • Being hungry for new knowledge in software engineering;
  • Ability to develop prototype research tools;
  • Interest in bringing program analysis, testing, and human aspects of software engineering together.

To apply, please send us an application letter, a CV, and (pointers to) written material (e.g. a term paper or an MSc thesis for applicants for the PhD positions, and published papers or the PhD thesis for the postdoc).

We are in the process of further distributing this announcement. Final decisions on the appointments will be made at the end of October.

We look forward to receiving your application as soon as possible.

In Vivo Software Analytics: PhD/Postdoc positions

Last week, we had the kickoff of a new project we are participating in, addressing “In Vivo Software Analytics”. In this project, called “Big Software on the Run” (BSR), we monitor the quality of software in its “natural habitat”, i.e., as it is running in the wild. The project is a collaboration between the three technical universities (3TU) of The Netherlands (Eindhoven, Twente, Delft).

In Vivo Software Analytics

To quote the 3TU.BSR plan:

Millions of lines of code – written in different languages by different people at different times, and operating on a variety of platforms – drive the systems performing key processes in our society. The resulting software needs to evolve and can no longer be controlled a priori as is illustrated by a range of software problems. The 3TU.BSR research program will develop novel techniques and tools to analyze software systems in vivo – making it possible to visualize behavior, create models, check conformance, predict problems, and recommend corrective actions.

Essentially, we propose to address big software by applying big data techniques to system health information obtained at run time. It provides feedback from operations to developers, in order to make systems more resilient against the risks that come with rapid change.

The project brings together some of the best software engineering and data science groups and researchers of the three technical universities in The Netherlands:

The project is sponsored by NIRICT, the 3TU center for Netherlands Research in Information and Communication Technology.

The project duration is four years. At each of the three technical universities two PhD students and one postdoc will be employed. To maximize collaboration, each PhD student has two supervisors, from two different universities. Furthermore, the full research team, including all supervisors, PhD students, and postdocs, will regularly visit each other.

Within the Delft Software Engineering Research Group, we are searching for one PhD student and one postdoc to strengthen the 3TU.BSR project team.

The PhD student we are looking for will work on the intersection between visualization and dynamic program analysis. In particular, we are searching for a PhD student to work on log event analysis, and visualization of anomalies and exceptions as occurring in traces of running systems. The PhD student will be jointly supervised by Jack van Wijk and myself.

The postdoctoral researcher we are looking for should be able to establish connections between the various research themes and groups working on the project (such as visualization, process mining, repository mining, privacy-preserving log file analysis, model checking). Thus, we are looking for a researcher who has successfully completed his or her PhD thesis, and is open to working with several of the six PhD students within the project. The postdoc will be based in the Software Engineering Research Group.

Requirements for both positions include:

  • Being a team player;
  • Strong writing and presentation skills;
  • Being hungry for new knowledge in software engineering;
  • Ability to develop prototype research tools;
  • Interest in bringing visualization, run time analysis, and human aspects of software engineering together.

To apply, please send me an application letter, a CV, and (pointers to) written material (e.g. a term paper or an MSc thesis for applicants for the PhD position, and published papers or the PhD thesis for the postdoc).

We are in the process of further distributing this announcement. Final decisions on the appointments will be made at the end of October.

I look forward to receiving your application!

3TU.BSR Tracks

A South African Perspective on Privacy and Intelligence

The Dutch government has proposed a new law on intelligence and security services (“Wet op de inlichtingen- en veiligheidsdiensten” — Wiv20XX).

As several privacy-related organizations have made clear, this law proposes non-specific (bulk) interception powers for any form of telecom or data transfer without independent ex-ante review or court involvement (see the summary by Matthijs Koot, and reactions on the bill by Bits of Freedom, Privacy International, the Institute for Information Law of the University of Amsterdam IVIR, and the Internet Society ISOC).

This bill gives the Dutch government unprecedented power to violate the privacy of its citizens. Either the Dutch government does not recognize the crucial role of privacy in a well-functioning democracy, or it does not realize what enormous privacy infringements are made possible through Internet surveillance.

Book cover Sachs' Soft Vengeance

When discussing the importance of privacy, I am always reminded of South Africa’s anti-apartheid activist Albie Sachs and his autobiography “The Soft Vengeance of a Freedom Fighter” (first published in 1990, and turned into a film in 2014).

As a law student at the University of Cape Town, Albie Sachs started fighting apartheid at the age of 17, in 1952. He was imprisoned from 1963 to 1964 (solitary confinement) and again in 1966, after which he was exiled from his home country South Africa.

In 1988, living in Maputo, Mozambique, he lost his right arm and an eye when his car was bombed by the South African secret police.

From 1991 until 1993, after Nelson Mandela’s release in 1990, Albie Sachs played a pivotal role in the negotiations leading to the new South African constitution.

In 1994 Nelson Mandela appointed him as judge of the highest court of South Africa, the Constitutional Court. He worked for the Truth and Reconciliation Commission between 1995 and 1998.

Albie Sachs wrote his Soft Vengeance in 1989. Nelson Mandela was still in prison, and the struggle against Apartheid was not won yet. Albie Sachs had just lost his arm and eye, and his book was his attempt to cope with his injuries.

For his recovery he was flown into a London hospital. He noticed that he was remarkably optimistic, and he was wondering why. Here is his reason (p.58):

“Perhaps part of my pleasure at being in this hospital room is that I am fairly sure it is not bugged. Sometimes I used to imagine my phone in Maputo being listened in to by at least three different secret services […]”

“Possibly my continuing sense of post-bomb euphoria comes from the fact that at least for the time being I am out of the net of hidden sensors, my spirit free from spying for the first time in three decades.”

He explains what it means to be surveilled:

“Ever since I was seventeen I have been politically active, I have lived with the notion that there are others accompanying every move I make, listening to every word I say.”

“Did the secret police really follow every up and down of my marriage, pick up the terms of our divorce, record automatically the names of our children even before they were entered in the birth register?”

And this gives rise to his dream for the future:

“I too have a dream, that there will one day be a world without police files, and bugged rooms, and tapped telephones, and intercepted mail, and that I will actually live in it.”

Albie Sachs is not alone in his dream. According to article 12 of the United Nations Universal Declaration of Human Rights, we all have a right to privacy:

“No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.”

To date, the Internet has given us amazing possibilities to communicate with our family and friends, to search, read, and share information on almost any topic we find interesting, and to shop for almost any item we think we need. As a software engineering educator and researcher, I am proud to have played a tiny part in making this happen.

Unfortunately, the Internet can also be used as a place for massive surveillance activities, at levels that, for example, the South African apartheid regime could only have dreamed of. As a software engineer, I am terrified by the technical opportunities the Internet provides to governments wishing to know everything about their citizens.

A government aimed at drafting a modern intelligence bill should recognize this immense power, and take responsibility to safeguard the necessary privacy protection.

The Dutch government has failed to do so. It has proposed a bill with insufficient independent oversight, a bill that oppressive regimes, such as the former South African regime, would be happy to embrace.

Luckily, the present bill is still a draft. I sincerely hope that the final version will offer adequate privacy protection, and bring the world closer to the dream of Albie Sachs.

Delft Students on Software Architecture: DESOSA 2015

With Rogier Slag.

This year, we taught another edition of the TU Delft Teaching Software Architecture — With GitHub course.

We are proud to announce the resulting online book: Delft Students on Software Architecture is a collection of architectural descriptions of open source software systems written by students from Delft University of Technology during a master-level course taking place in the spring of 2015.

desosa 2015 book cover

At the start of the course, teams of 3-4 students could adopt a project of choice on GitHub. The projects selected had to be sufficiently complex and actively maintained (one or more pull requests merged per day).

During a 10-week period, the students spent one third of their time on this course, and engaged with these systems in order to understand and describe their software architecture.

Inspired by Brown and Wilson’s Architecture of Open Source Applications, we decided to organize each description as a chapter, resulting in the present online book.

Recurring Themes

The chapters share several common themes, which are based on smaller assignments the students conducted as part of the course. These themes cover different architectural ‘theories’ as available on the web or in textbooks. The course used Rozanski and Woods’ Software Systems Architecture, and therefore several of their architectural viewpoints and perspectives recur.

The first theme is outward looking, focusing on the use of the system. Thus, many of the chapters contain an explicit stakeholder analysis, as well as a description of the context in which the systems operate. These were based on available online documentation, as well as on an analysis of open and recently closed issues for these systems.

A second theme involves the development viewpoint, covering modules, layers, components, and their inter-dependencies. Furthermore, it addresses integration and testing processes used for the system under analysis.

A third recurring theme is variability management. Many of today’s software systems are highly configurable. In such systems, different features can be enabled or disabled, at compile time or at run time. Using techniques from the field of product line engineering, several of the chapters provide feature-based variability models of the systems under study.
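As a rough illustration of what a feature-based variability model can capture, the sketch below checks a feature selection against mandatory features and simple "requires" and "excludes" constraints. The class `FeatureModel` and the example feature names are hypothetical; they are not taken from any of the chapters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureModel:
    """Hypothetical, minimal feature model with cross-feature constraints."""

    mandatory: frozenset
    optional: frozenset
    requires: tuple = ()  # pairs (feature, feature it needs)
    excludes: tuple = ()  # pairs of mutually exclusive features

    def violations(self, selected):
        """Return a list of human-readable problems; empty means valid."""
        selected = set(selected)
        problems = []
        for feature in sorted(self.mandatory - selected):
            problems.append(f"missing mandatory feature: {feature}")
        for feature in sorted(selected - self.mandatory - self.optional):
            problems.append(f"unknown feature: {feature}")
        for a, b in self.requires:
            if a in selected and b not in selected:
                problems.append(f"{a} requires {b}")
        for a, b in self.excludes:
            if a in selected and b in selected:
                problems.append(f"{a} excludes {b}")
        return problems


model = FeatureModel(
    mandatory=frozenset({"core"}),
    optional=frozenset({"tls", "plain_http", "metrics"}),
    requires=(("metrics", "tls"),),
    excludes=(("tls", "plain_http"),),
)
print(model.violations({"core", "metrics"}))
```

Whether a feature is bound at compile time or at run time is not modeled here; that distinction could be added as an extra attribute per feature.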

A fourth theme is metrics-based evaluation of software architectures. Using such metrics, architects can discuss (desired) quality attributes (performance, scalability, maintainability, …) of a system quantitatively. Therefore various chapters discuss metrics and in some cases actual measurements tailored towards the systems under analysis.

First-Hand Experience

Last but not least, the chapters are also based on the students’ experience in actually contributing to the systems described. As part of the course over 75 pull requests to the projects under study were made, including refactorings (Jekyll 3545, Docker 11350, Docker 11323, Syncany 391), bug fixes (Diaspora 5714, OpenRA 7486, OpenRA 7544, Kodi 6570), and helpful documentation such as a Play Framework screencast.

Through these contributions the students often interacted with lead developers and architects of the systems under study, gaining first-hand experience with the architectural trade-offs made in these systems.

Enjoy!

Working with the open source systems and describing their architectures has been a great experience, both for the teachers and the students.

We hope you will enjoy reading the DESOSA chapters as much as we enjoyed writing them.

Beyond Page Objects

During the last couple of months I had a good time using Protractor to create an end-to-end test suite for an AngularJS web application.

While applying the Page Object pattern, I realized that I needed more guidance on what page objects to create, and how to navigate through my web application.

To that end, I started drawing little state charts for my web application. Naturally, ‘page objects’ corresponded to states, and their methods to either state inspection methods (is my browser in the correct state?) or state transition methods (clicking this button will bring me to the next state).

Gradually, the following process emerged:

  1. If you want to test certain behavior of your web application, draw a little state diagram to capture the navigation for that behavior.

  2. Create ‘state objects’ for each of the states.

  3. Give each state object its ‘inspection methods’ (what is visible in the web application if I’m in this state) as well as ‘transition methods’ (clicks leading to a new state).

  4. I also find it helpful to give each state object a ‘selfcheck’ method, which just verifies whether the web application is indeed in the state corresponding to the state object.

  5. With the state objects in place, think of the paths you want to take through the application.

  6. The simplest starting point is to write one test for each basic transition: Bring the application into state A, click somewhere, and verify you ended up in the required state B.

  7. Next, you may want to consider longer paths, in which earlier transitions affect later behavior. An example is testing proper use (and resetting of) client-side caching.
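The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the Protractor code from my test suite: `FakeApp` is a hypothetical in-memory stand-in for a browser session, and the state names and phone name are made up for the example.

```python
class FakeApp:
    """Hypothetical stand-in for a browser session; tracks the current view only."""

    def __init__(self) -> None:
        self.view = "list"
        self.selected = None

    def click_phone(self, name: str) -> None:
        self.view = "detail"
        self.selected = name

    def click_back_to_list(self) -> None:
        self.view = "list"
        self.selected = None


class PhoneListState:
    """State object for the (hypothetical) list view."""

    def __init__(self, app: FakeApp) -> None:
        self.app = app

    def self_check(self) -> None:
        # Verify the application really is in this state.
        assert self.app.view == "list"

    def select_phone(self, name: str) -> "PhoneDetailState":
        # Transition method: perform the click, return the next state object.
        self.app.click_phone(name)
        return PhoneDetailState(self.app)


class PhoneDetailState:
    """State object for the (hypothetical) detail view."""

    def __init__(self, app: FakeApp) -> None:
        self.app = app

    def self_check(self) -> None:
        assert self.app.view == "detail"

    def shown_phone(self) -> str:
        # Inspection method: what is visible in this state?
        return self.app.selected

    def back(self) -> PhoneListState:
        self.app.click_back_to_list()
        return PhoneListState(self.app)


def test_list_to_detail() -> None:
    """One test per basic transition: start in A, click, verify B."""
    list_state = PhoneListState(FakeApp())
    list_state.self_check()
    detail_state = list_state.select_phone("example-phone")
    detail_state.self_check()
    assert detail_state.shown_phone() == "example-phone"


test_list_to_detail()
```

Letting each transition method return the next state object keeps the navigation knowledge in the state objects, so the tests themselves read as paths through the state diagram.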

I wrote a longer article about this approach, available as “Beyond Page Objects: Testing Web Applications with State Objects”, published in ACM Queue in June 2015, as well as in the Communications of the ACM in August 2015.

The paper also explains how to deal with more complex state machines (using superstates and AND-states, for example), how to use a transition tree to oversee the coverage of longer paths, and how to deal with the infamous back-button. Furthermore, I extended the example “PhoneCat” AngularJS application with a state-object based test suite, available from my GitHub page.
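To give an impression of how a transition tree can be derived from a state machine, here is a small sketch of one common construction, which may differ in detail from the one in the paper: expand each state the first time it is reached, breadth-first, and turn every later visit into a leaf. The function name `transition_tree_paths` and the example machine are hypothetical.

```python
from collections import deque


def transition_tree_paths(machine, initial):
    """Return the root-to-leaf paths of a transition tree.

    `machine` maps a state to a list of (event, target) pairs. Each path is a
    list of (event, target) steps starting from `initial`. A state is expanded
    only the first time it is reached; later visits become leaves.
    """
    expanded = {initial}
    paths = []
    queue = deque([(initial, [])])
    while queue:
        state, path = queue.popleft()
        transitions = machine.get(state, [])
        if not transitions and path:
            paths.append(path)  # dead-end state: the path ends here
        for event, target in transitions:
            step = path + [(event, target)]
            if target in expanded:
                paths.append(step)  # state seen before: leaf
            else:
                expanded.add(target)
                queue.append((target, step))
    return paths


machine = {
    "list": [("select phone", "detail")],
    "detail": [("click thumbnail", "detail"), ("back", "list")],
}
for path in transition_tree_paths(machine, "list"):
    print(" -> ".join(f"{event} [{target}]" for event, target in path))
```

Every transition reachable from the initial state appears on at least one path, so writing one test per path gives a quick overview of which longer navigation sequences are covered.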

Admittedly, the idea to use state machines for testing purposes is not new at all — yet elaborating how it can be used for testing web applications with WebDriver was helpful to me. I hope you find the use of state objects helpful too.