Green Open Access and Preprint Linking

Posted in Research by Arie van Deursen

One of the most useful changes to the ICSE International Conference on Software Engineering this year, was that the program website contained links to preprints of many of the papers presented.

As ICSE is a large event (over 1600 people attended in 2013), it is worth taking a look at what happened. What is preprint linking? How many authors actually provided a preprint link? What about other conferences? What are the wider implications for open access publishing in software engineering?

Self-Archiving

Preprint linking is based on the idea that authors, who do all the work in both writing and formating the paper, have the right to self-archive the paper they created themselves (also called green open access). Authors can do this on their personal home page, in institutional repositories of, e.g., the universities where they work or in public preprint repositories such as arxiv.

Sharing preprints has been around in science since decades (if not ages): As an example, my ‘alma mater’ CWI was founded in 1947, and has a technical report series dating back to that year. These technical reports were exchanged (without costs) with other mathematical research institutes. First by plain old mail, then by email, later via ftp, and now through http.

While commercial publishers may dislike the idea that a free preprint is available for papers they publish in their journals or conference proceedings, 69% of the publishers do in fact allow (some form of) self-archiving. For example, ACM, IEEE, Springer, and Elsevier (the publishers I work most with) explicitly permit it, albeit always under specific conditions. These conditions can usually be met, and include such requirements as providing a note that the paper has been accepted for publication, a pointer to the URL where the published article can be found, and a copyright notice indicating the publisher now owns the copyright.

Preprint links as shown on ICSE site.

Preprint Linking

All preprint linking does, is ask authors of accepted conference papers, whether they happen to have a link to a preprint available. If so, the conference web site will include a link to this preprint in its progam as listed on its web site.

For ICSE, doing full preprint linking at the conference site was proposed and conducted by Dirk Beyer, after an earlier set of preprint links was collected on a separate github gist by Adrian Kuhn.

Dirk Beyer runs Conference Publishing Consulting, the organization hired by ICSE to collect all material to be published, and get it ready for inclusion in the ACM/IEEE Digital Libraries. As part of this collection process, ICSE asked the authors to provide a link to a preprint, which, if provided, was included in the ICSE on line program.

The ICSE 2013 proceedings were published by IEEE. In their recently updated policy, they indicate that “IEEE will make available to each author a preprint version of that person’s article that includes the Digital Object Identifier, IEEE’s copyright notice, and a notice showing the article has been accepted for publication.” Thus, for ICSE, authors were provided with a possibility to download this version, which they then could self-archive.

Preprints @ ICSE 2013

With a preprint mechanism setup at ICSE, the next question is how many researchers actually made use of it. Below are some statistics I collected from the ICSE conference site:

Track / Conference	#Papers presented	#Preprints	Percentage
Research Track	85	49	57%
ICSE NIER	31	19	61%
ICSE SEIP	19	6	31%
ICSE Education	13	3	23%
ICSE Tools	16	7	43%
MSR	64	36	56%
Total	228	120	53%

In other words, a little over half of the authors (53%) provided a preprint link. And, almost half of the authors decided not to.

I hope and expect that for upcoming ICSE conferences, more authors will submit their preprint links. As a comparison, at the recent FORTE conference, 75% of the authors submitted a preprint link.

For ICSE, this year was the first time preprint linking was available. Authors may have not been familiar with the phenomenon, may not have realized in advance how wonderful a program with links to freely available papers is, may have missed the deadline for submitting the link, or may have missed the email asking for a link altogether as it ended up in their spam folder. And, in all honesty, even I managed to miss the opportunity to send in my link in time for some of my MSR 2013 papers. But that won’t happen again.

Preprint Link Sustainability

An issue of some concern is the “sustainability” of the preprint links — what happens, for example, to homepages with preprints once the author changes jobs (once the PhD student finishes)?

The natural solution is to publish preprints not just on individual home pages, but to submit them to repositories that are likely to have a longer lifetime, such as arxiv, or your own technical report series.

An interesting route is taken by ICPC, which instead of preprint links simply provides a dedicated preprint search on Google Scholar, with authors and title already filled in. If a preprint has been published somewhere, and the author/title combination is sufficiently unique, this works remarkably well. MSR uses a mixture of both appraoches, by providing a search link for presentations for which no preprint link was provided.

Implications

Open access, and hence preprint publishing, is of utmost importance for software engineering.

Software engineering research is unique in that it has a potentially large target audience of developers and software engineering practitioners that is on line continually. Software engineering research cannot afford to dismiss this audience by hiding research results behind paywalls.

For this reason, it is inevitable that on the long run, software engineering researchers will transform their professional organizations (ACM and IEEE) so that their digital libraries will make all software engineering results available via open access.

Irrespective of this long term development, the software engineering research community must hold on to the new preprint linking approach to leverage green open access.

Thus:

As an author, self-archive your paper as a preprint or technical report. Consider your paper unpublished if the preprint is not available.
If you are a professor leading a research group, inspire your students and group members to make all of their publications available as preprint.
If you are a reviewer for a conference, insist that your chairs ensure that preprint links are collected and made available on the conference web site.
If you are a conference organizer or program chair, convince all authors to publish preprints, and make these links permanently available on the conference web site.
If you are on a hiring committee for new university staff members, demand that candidates have their publications available as preprints.

Much of this has been possible for years. Maybe one of the reasons these practices have not been adopted in full so far, is that they cost some time and effort — from authors, professors, and conference organizers alike — time that cannot be used for creative work, and effort that does not immediately contribute to tenure or promotion. But it is time well spent, as it helps to disseminate our research to a wider audience.

Thanks to the ICSE move, there now may be a momentum to make a full swing transition to green open access in the software eningeering community. I look forward to 2014, when all software engineering conferences will have adopted preprint linking, and 100% of the authors will have submitted their preprint links. Let us not miss this unique opportunity.

Acknowledgments

I am grateful to Dirk Beyer, for setting up preprint linking at ICSE, and for providing feedback on this post.

Update (Summer 2013)

ESEC FSE 2013 adopted preprint linking, and shows another good use of it: Prior to the conference every day one or two preprint links were tweeted through the conference’s @esecfse Twitter account.
For 2013, SPLASH / OOPSLA has included preprint links in its program as well.

24Apr2013

David Notkin on Why We Publish

Posted in Research, Teaching by Arie van Deursen

This week David Notkin (1955-2013) passed away, after a long battle against cancer. He was one of my heroes. He did great research on discovering invariants, reflexion models, software architecture, clone analysis, and more. His 1986 Gandalf paper was one of the first I studied when starting as a PhD student in 1990.

December 2011 David sent me an email in which he expressed interest to do a sabbatical in our TU Delft Software Engineering Research Group in 2013-2014. I was as proud as one can be. Unfortunately, half a year later he had to cancel his plans due to his health.

David was loved by many, as he had a genuine interest in people: developers, software users, researchers, you. And he was a great (friendly and persistent) organizer — 3 weeks ago he still answered my email on ICSE 2013 organizational matters.

In February 2013, he wrote a beautiful editorial for the ACM Transactions on Software Engineering and Methodology, entitled Looking Back. His opening reads: “It is bittersweet to pen my final editorial”. Then David continues to address the question why it is that we publish:

“… I’d like very much for each and every reader, contributor, reviewer, and editor to remember that the publications aren’t primarily for promotions, or for citation counts, or such.
Rather, the intent is to make the engineering of software more effective so that society can benefit even more from the amazing potential of software.
It is sometimes hard to see this on a day-to-day basis given the many external pressures that we all face. But if we never see this, what we do has little value to society. If we care about inﬂuence, as I hope we do, then adding value to society is the real measure we should pursue.
Of course, this isn’t easy to quantify (as are many important things in life, such as romance), and it’s rarely something a single individual can achieve even in a lifetime. But BHAGs (Big Hairy Audacious Goals) are themselves of value, and we should never let them fade far from our minds and actions.”

Dear David, we will miss you very much.

Speaking in Irvine on Metrics and Architecture

Posted in Research by Arie van Deursen

End of last year I was honored to receive an invitation to present in the Distinguished Speaker Series at the Insitute for Software Research at University of California at Irvine.

I quickly decided that the topic to discuss would be our research on software architecture, and in particular our work on metrics for maintainability.

Irvine is one of the world’s leading centers on research in software architecture. The Institute of Software Research is headed by Richard Taylor, who supervised Roy Fielding when he wrote his PhD thesis covering the REST architectural style, and Nenad Medvidovic during his work on architectural description laguages. Current topics investigated at Irvine include design and collaboration (André van der Hoek, and David Redmiles of ArgoUML fame), software analyis and testing (James Jones), and programming laguages (Cristina Lopes), to name a few. An overview of the group’s vision on software architecture can be found in their recently published textbook. In short, I figured that if there is one place to present our software architecture research it must be Irvine.

The talk (90 minutes) itself will be loosely based on my keynote at the Brazilian Software Engineering Symposium (SBES 2012), which in turn is based on joint research with Eric Bouwers and Joost Visser (both from SIG). ~~I’ll post the slides when I’m done.~~ The full slides are available on speakerdeck, but here’s the storyline along with some references.

A Software Risk Assessment (source: ICSM 2009)

The context of this research is a software risk assessment, in which a client using a particular system seeks independent advice (from a consultant) on the technical quality of the system as created by an external supplier.

How can the client be sure that the system made for him is of good quality? In particular, will it be sufficiently maintainable, if the business context of the system in question changes? Will it be easy to adapt the system to the ever changing world?

In situations like these, it is quintessential to be able to make objective, evidence-based statements about the maintainability of the system in question.

Is this possible? What role can metrics play? What are their inherent limitations? How can we know that a metric indeed captures certain aspects of maintainability? How should metric values be interpreted? How should proposals for new metrics be evaluated?

Simple answers to these questions do not exist. In this talk, I will summarize our current progress in answering these questions.

I will start out by summarizing four common pitfalls when using metrics in a software development project. Then, I will describe a metrics framework in which metrics are put into context by means of benchmarking and a quality model. Subsequently, I’ll zoom in on architectural metrics, focusing on metrics for encapsulation. I will discuss a proposal for a new metric, as well as its evaluation. The evaluation comprises both a quantitative assessment (using repository-mining) of its construct validity (doest it measure encapsulation?), as well as qualitative assessments of the usefulness in practice (by interviewing consultants who applied the metrics in their day to day work).

Based on this, I will reflect on the road ahead for empirical research in software metrics and architecture, emphasizing the need for shared datasets, as well as the use of qualitative research methods to evaluate practical impact.

The talk is scheduled for Friday March 15, in Irvine — I sincerely hope to see you there!

If you can’t make it, Eric Bouwers and I will present a 3.5-hour tutorial based on this same material at ICSE 2013, in May in San Francisco. The tutorial will be more interactive, taking your experience into account as well where possible, and it will have a stronger emphasis on metrics (based on SIG’s 10 year experience with using metrics in industry). Register now for our tutorial, and looking forward to seeing you there!

Design for Upgradability and the Rails DigiD Outage

Posted in Development by Arie van Deursen

On January 9th, the Dutch DigiD system was taken offline for 9 hours. The reason was a vulnerability (CVE-2013-0155 and CVE-2013-0156) in the underlying Ruby on Rails system used. According to the exploit, it enables attackers to bypass authentication, inject SQL, perform a denial of service, or execute arbitrary code.

DigiD is a Dutch authentication system used by over 600 organizations, including the national taxes. Over 9 million Dutch citizens have a DigiD account, which they must use for various interactions with the government, such as filing taxes electronically. The organization responsible for DigiD maintenance, Logius, decided to take DigiD off line when it heard about the vulnerability. It then updated the Rails system to a patched version. The total downtime of DigiD was about 9 hours (from 12:20 until 21:30). Luckily, it seems DigiD was never comprimised.

The threat was real enough, though, as illustrated by the Bitcoin digital currency system: the Bitcoin currency exchange called Vircurex actually was compromised. According to Vircurex, it was able to “deploy fixes within five minutes after receiving the notification from the Rails security mailing list.”

To better understand the DigiD outage, I contacted spokesman Michiel Groeneveld from Logius. He stated that (1) applying the fix was relatively easy, and that (2) most of the down time was caused by “extensively testing” the new release.

Thus, the real lesson to be learned here is that speed of upgrading is crucial to reduce downtime (ensure high availability) in case a third party component turns into a security vulnerability. The software architect caring about both security and availability, must apply design for upgradability (categorized under replaceability in ISO 25010).

Any upgrade can introduce incompatibilities. Even the patch for this Rails vulnerability introduced a regression. Design for upgradability is about dealing with such regressions. It involves:

Isolation of depedencies on the external components, for example through the use of wrappers or aspects in order to reduce the impact of incompatibilities.
Dependency hygiene, ensuring the newest versions of external components are used as soon as they are available (which is good security policy anyway). This helps avoid the accumulation of incompatibilities, which may cause updates to take weeks rather than minutes (or even hours). Hot security fixes may even be unavailable for older versions: For Ruby on Rails, which is now in version 3.x, the most popular comment at the fix site was a telling “lots of love from people stuck on 2.3“
Test automation, in order to reduce the execution time of regression tests for the system working with the upgraded component. This will include end-to-end system tests, but can also include dedicated tests ensuring that the wrappers built meet the behavior expected from the component.
Continuous deployment, ensuring that once the source code can deal with the upgraded library, the actual system can be deployed with a push on the button.

None of these comes for free. In other words, the product owner should be willing to invest in these. It is the responsibility of the architect to make clear what the costs and benefits are, and what the risks are of not investing in isolation, dependency hygiene, test automation, and continuous deployment. In this explanation, the architect can point to additional benefits, such as better maintainability, but these may be harder to sell than security and availability.

This brings me to two research connections of this case.

The first relates to regression testing. A hot fix for a system that is down is a case where it actually matters how long the execution of an (automated) regression test suite takes: test execution time in this case equals down time. Intuitively, test cases covering functionality for which Rails is not even used, need not be executed. This is where the research area of selective regression testing comes in. The typical technique uses control flow analysis in order to reduce a large regression test suite given a particular change. This is classic software engineering research dating back to the 90s: For a representative article have a look at Rothermel and Harrold’s Safe, Efficient Regression Test Selection Technique.

Design for upgradability also relates to some of the research I’m involved in.
What an architect caring about upgradability can do is estimate the anticipated upgrading costs of an external component. This could be based on a library’s “compatibility reputation”. But how can we create such a compatibility rating?

At the time of writing, we are working on various metrics that use a library’s release history in order to predict API stability. We are using the (huge) maven repository to learn about breaking changes in libraries in the wild, and we are investigating to what extent encapsulation practices are effective. With that in place, we hope to be able to provide decision support concerning the maintainability costs of using third party libraries.

For our first results, have a look at our ICSM 2012 paper on Measuring Library Stability Through Historical Version Analysis — for the rest, stay tuned, as there is more to come.

EDIT (February 4, 2013)

For a more detailed account of the impact of the Rails vulnerabilites, have a look at What The Rails Security Issue Means For Your Startup by Patrick McKenzie. The many (sometimes critical) comments on that post are also an indication for how hard upgrading in practice is (“How does this help me … when I have a multitude of apps running some Rails 1.x or 2.x version?“).

An interesting connection with API design is provided by Ned Batchelder, who suggests to rename .load and .safe_load to .dangerous_load and .load, respectively (in a Python setting in which similar security issues exist).

EDIT (April 4, 2013)

As another (separate) example of an urgent security fix, today (April 4, 2013), the PostgreSQL Global Development Group has released a security update to all current versions of the PostgreSQL database system. The most important security issue fixed in this release, CVE-2013-1899, makes it possible for a connection request containing a database name that begins with “-” to be crafted that can damage or destroy files within a server’s data directory.

Here again, all users of the affected versions are strongly urged to apply the update immediately, illustrating once more the need to be able to upgrade rapidly.

21Dec2012

Line Coverage: Lessons from JUnit

Posted in Development by Arie van Deursen

In unit testing, achieving 100% statement coverage is not realistic. But what percentage would good testers get? Which cases are typically not included? Is it important to actually measure coverage?

To answer questions like these, I took a look at the test suite of JUnit itself. This is an interesting case, since it is created by some of the best developers around the world, who care a about testing. If they decide not to test a given case, what can we learn from that?

Coverage of JUnit as measured by Cobertura

Overall Coverage at First Sight

Applying Cobertura to the 600+ test cases of JUnit leads to the results shown above (I used the maven-enabled version of JUnit 4.11). Overall, instruction coverage is almost 85%. In total, JUnit comprises around 13,000 lines of code and 15,000 lines of test code (both counted with wc). Thus, the test suite that is larger than the framework, leaves around 15% of the code untouched.

Covering Deprecated Code?

A first finding is that in JUnit coverage of deprecated code tends be to be lower. Junit 4.11 contains 13 deprecated classes (more than 10% of the code base), which achieve only 65% line coverage.

JUnit includes another dozen or so deprecated methods spread over different classes. These tend to be small methods (just forwarding a call), which often are not tested.

Furthermore, JUnit 4.11 includes both the modern org.junit.* packages as well as the older junit.* packages from 3.8.x. These older packages constitute ~20% of the code base. Their coverage is 70%, whereas the newer packages have a coverage of almost 90%.

This lower coverage for deprecated code is somewhat surprising, since in a test-driven development process you would expect good coverage of code before it gets deprecated. The underlying mechanism may be that after deprecation there is no incentive to maintain the test cases: If I would issue a ticket to improve the test cases for a deprecated method on JUnit I suspect this issue would not get a high priority. (This calls for some repository mining research on deprecation and testing, in the spirit of our work on co-evolution of tests and code).

Another implication is that when configuring coverage tools, it may be worth excluding deprecated code from analysis. A coverage tool that can recognize @Deprecated tags would be ideal, but I am not aware of such a tool. If excluding deprecated code is impossible, an option is to adjust coverage warning thresholds in your continuous integration tools: For projects rich in deprecated code it will be harder to maintain high coverage percentages.

Ignoring deprecated code, the JUnit coverage is 93%.

An Untested Class!

In the non-deprecated code, there was one class not covered by any test:
runners.model.NoGenericTypeParametersValidator. This class validates that @Theories are not applied to generic types (which are problematic due to type erasure).

I easily found the pull request introducing the validator about a year ago. Interestingly, the pull request included a test class clearly aimed at testing the new validator. What happened?

Tests in JUnit are executed via @Suites. The new test class, however, was not added to any suite, and hence not executed.
Once added to the proper suite, it turned out the new tests failed: the new validation code was never actually invoked.

I posted a comment on the (already closed) pull request. The original developer responded quickly, and provided a fix for the code and the tests within a day.

Note that finding this issue through coverage thresholds in a continuous integration server may not be so easy. The pull request in question causes a 1% increase in code size, and a 1% decrease in coverage. Alerts based on thresholds need to be sensitive to small changes like these. (And, the current ant-based Cloudbees JUnit continuous integration server does not monitor coverage at all).

What I’d really want is continuous integration alerts based on coverage deltas for the files modified in the pull request only. I am, however, not aware of tools supporting this at the moment.

The Usual Suspects: 6%.

To understand the final 5-6% of uncovered code, I had a look at the remaining classes. For those, there was not a single method with more than 2 or 3 uncovered lines. For this uncovered code, various typical categories can be distinguished.

First, there is the category too simple to test. Here is an example from org.junit.Assume, in which an assumeTrue is turned into an assumeFalse by just adding a negation operator:

public static void assumeFalse(boolean b) {
  assumeTrue(!b);
}

Other instances of too simple to test include basic getters, or overrides for methods such as toString.

A special case of too simple to test is the empty method. These are typically used to provide (or override) default behavior in inheritance hierarchies:

/**
 * Override to set up your specific external resource.
 *
 * @throws if setup fails (which will disable {@code after}
 */
protected void before() throws Throwable {
    // do nothing
}

Another category is code that is dead by design. An example is a static only class, which need not be instantiated. It is good Java practice (adopted selectively in JUnit too) to make this explicit by declaring the constructor private:

/**
 * Protect constructor since it is a static only class
 */
protected Assert() {
}

In other cases dead by design involves an assertion that certain situations will never occur. An example is Request.java:

catch (InitializationError e) {
  throw new RuntimeException(
    "Bug in saff's brain: " +
    "Suite constructor, called as above, should always complete");
}

This is similar to a default case in a switch statement that can never be reached.

A final category consists of bad weather behavior that is unlikely to happen. This typically manifests itself in not explicitly testing that certain exceptions are caught:

try {
  ...
} catch (InitializationError e) {
  return new ErrorReportingRunner(null, e);
}

Here the catch clause is not covered by any test. Similar cases occur for example when raising an illegal argument exception if inputs do not meet simple validation criteria.

EclEmma and JaCoCo

While all of the above is based on Cobertura, I started out using EclEmma/Jacoco 0.6.0 in Eclipse for doing the coverage analysis. There were two (small) surprises.

First, merely enabling EclEmma code coverage caused the JUnit test suite to fail. The issue at hand is that in JUnit test methods can be sorted according to different criteria. This involves reflection, and the test outcomes were influenced by additional (synthetic) methods generated by Jacoco. The solution is to configure Jacoco so that instrumentation of certain classes is disabled — or to make the JUnit test suite more robust against instrumentation.

Second, JaCoCo does not report coverage of code raising exceptions. In contrast to Cobertura, JaCoCo does on-the-fly instrumentation using an agent attached to the Java class loader. Instructions in blocks that are not completed due to an exception are not reported as being covered.

As a consequence, JaCoCo is not suitable for exception-intensive code. JUnit, however, is rich in exceptions, for example in the various Assert methods. Consequently, the code coverage for JUnit reported by JaCoCo is around 3% lower than by Cobertura.

Lessons Learned

Applying line coverage to one of the best tested projects in the world, here is what we learned:

Carefully analyzing coverage of code affected by your pull request is more useful than monitoring overall coverage trends against thresholds.
It may be OK to lower your testing standards for deprecated code, but do not let this affect the rest of the code. If you use coverage thresholds on a continuous integration server, consider setting them differently for deprecated code.
There is no reason to have methods with more than 2-3 untested lines of code.
The usual suspects (simple code, dead code, bad weather behavior, …) correspond to around 5% of uncovered code.

In summary, should you monitor line coverage? Not all development teams do, and even in the JUnit project it does not seem to be a standard practice. However, if you want to be as good as the JUnit developers, there is no reason why your line coverage would be below 95%. And monitoring coverage is a simple first step to verify just that.

23Nov2012

Desk Rejected

Posted in Research by Arie van Deursen

One of the first things we did after all NIER 2013 papers were in, was identifying papers that should be desk rejected. What is a desk reject? Why are papers desk rejected? How often does it happen? What can you do if your paper is desk rejected?

A desk reject means that the program chairs (or editors) reject a paper without consulting the reviewers. This is done for papers that fail to meet the submission requirements, and which hence cannot be accepted. Filtering out desk rejects in advance is common practice for both conferences and journals.

To identify such desk rejects for NIER 2013, program co-chair Sebastian Elbaum and I made a first pass through all 160+ submissions. In the end, we desk rejected around 10% of the submissions (a little more than I had anticipated).

Causes for reject included problems in:

Formatting: The paper does not meet the 4 page limit;
Scope: The paper is not about software engineering;
Presentation: The paper contains, e.g., too many grammatical problems;
Innovation: The paper does not explain how it builds upon and
extends the existing body of knowledge.

Of these, for NIER the formatting was responsible for half of the desk rejects.

Plagiarism
A potential cause that we did not encounter is plagiarism (fraud), or its special form self-plagiarism (submitting the same, or very similar, papers to multiple venues).

In my experience, plain plagiarism is not very common (I encountered one case in another conference, where we had to apply the IEEE Guidelines on Plagiarism).

Self-plagiarism is a bigger problem as it can range from copy-pasting a few paragraphs from an earlier paper to a straight double submission. While the former may be acceptable, the latter is considered a cardinal sin (your paper will be rejected at both venues, and reviewers don’t like reviewing a paper that cannot be accepted). And there are many shades of grey in between.

Notifications
We sent out notifications to authors of desk rejected papers within a few days after the submission deadline (it took a bit of searching to figure out that the best way to do this is to use the delete paper option from EasyChair). Thus, desk rejects not only serve to reduce the reviewing load of the program committee, but also to provide early feedback to authors whose papers just cannot make it.

Is there anything you can do to avoid being desk rejected?
The simple advice is to carefully read the submission guidelines. Besides that, it may be wise to submit a version adhering to all criteria early on when there is no immediate deadline stress yet. This may then serve as a fallback in case you mess up the final submission (uploading, e.g., the wrong pdf). Usually chairs have access to these earlier versions, and they can then decide to use the earlier version in case (only) the final version is a clear desk reject (for NIER this situation did not occur).

Is there anything you can do after being desk rejected?
Usually not. Most desk rejects are clear violations of submission requirements. If you think your desk reject is based on subjective grounds (presentation, innovation), and you strongly disagree, you could try to contact the chairs to get your paper into the reviewing phase anyway. The most likely outcome, however, will still be a reject, so it may not be in your self-interest to postpone this known outcome.

Submission times
And … are desk rejects are related to the paper submission time? Yes, there is a (mild) negative correlation: For NIER, there were more desk rejects in the earlier than in the later submissions. My guess is that this is quite common. There seem to be authors who simply try their same pdf at multiple conferences, hoping for an easy conference with little reviewing only.

Acceptance rates
This brings me to the final point. Conferences are commonly ranked based on their acceptance ratio. The lower the percentage of accepted papers, the more prestigious the conference is considered. The most interesting figure is obtained if acceptance rates are based on the serious competition only — i.e., the subset of papers that made it to the reviewing phase. Desk rejected papers do not qualify as such, and hence should not be taken into account when computing conference acceptance rates.

11Nov2012

Library Updating. Risk it Now, or Risk it Later?

Posted in Research by Arie van Deursen

Chances are your software depends on external libraries. What should you do, if a new version of such a library is released? Update immediately? But what if the library isn’t backward compatible? Should you swallow the pill immediately, and make the necessary changes to your system so that it can work with the new version? Or is it safe to wait for now, and avoid immediate cost and risk?

Together with Steven Raemaekers and Joost Visser (both from SIG), we embarked upon a reseach project in which we seek to answer questions like these. We are looking at library and API stability, as well as at the costs and consequences of library incompatibilities.

A first result, in which we try to measure library stability, has been presented at this year’s International Conference on Software Maintenance. The corresponding paper starts with a real life example illustrating the issues at hand.

The system in this example comprises around 200,000 lines of Java code, divided over around 4000 classes. The application depends on the Spring Framework, Apache Struts, and Hibernate. Its update history is shown below.

The system was built in 2004. Third-party library dependencies were managed using Maven. Version numbers of the latest versions which were available in 2004 were hard-coded in the conﬁguration ﬁles of the project. These libraries were not updated to more recent versions in the next seven years.

The system used version 1.0 (from 2003) of the Acegi authentication and security framework. In 2008, this library was included in Spring, and renamed into Spring Security, then at 2.0.0. As time passed, several critical safety-related bug fixes and improvements were added to Spring Security, as well as a number of changes breaking the existing API.

One might argue that keeping a security library up to date is always a good idea. But since the development team expected compatibility issues when upgrading the Acegi library, the update to Spring Security was deferred as long as possible.

In 2011, a new feature, single sign-on, was required for the system. To implement this, the team decided to adopt Atlassian Crowd.

Unfortunately, the old Acegi framework could not communicate with Atlassian Crowd. The natural replacement for Acegi was Spring Security, which was then in version 3.0.6.

However, the system already made use of an older version of the general Spring Framework. Therefore, in order to upgrade to Spring Security 3.0.6, an upgrade to the entire Spring Framework had to be conducted as well.

To make things worse, the system also made use of an older version (2.0.9) of Apache Struts. Since the new version of Spring could not work with this old version of Struts, Struts had to be updated as well.

Upgrading Struts not just affected the the system’s Java code, but also its Java Server Pages. Between Struts 2.0.9 and 2.2.3.1 the syntax of the Expression Language used in JSP changed. Consequently, all web pages in which dynamic content was presented uinsg JSP had to be updated.

Eventually, a week was spent to implement the changes and upgrades.

The good news was that there was an automated test suite available consisting of both JUnit and Selenium test cases. Without this test suite, the impact of this update would have been much harder to assess and control.

This case illustrates several issues with third-party library dependencies.

Third party libraries introduce backward incompatibilities.
Backward incompatibilities introduce costs and risks when upgrading libraries.
Backward incompatibilities are not just caused by direct dependencies
you control yourself but also by transitive ones you do not control.
There likely will come a moment in which upgrading must be done: To fix bugs, to improve security, or when the system’s functionality needs to be extended.
The longer you postpone updating, the bigger the eventual pain. As your system grows and evolves, the costs and risks of upgrading an old library increase. Such an accumulation of maintenance debt may lead to a much larger effort than in the case of smaller incremental updates.

In short, not upgrading your libraries immediately is taking the bet that it never needs to be done. Upgrading now is taking the bet it must be done anyway, in which case doing it as soon as possible is the cheapest route.

Paper "Measuring Software Library Stability through Historical Version Analysis

The full ICSM 2012 research paper.

In our research project, we seek to deepen our insight in these issues. We are looking at empirical data on how often incompatabilities occur, the impact of library popularity on library stability, the effort involved in resolving incompatibilities, and at ways in which to avoid them in the first place. Stay tuned!

Should you have similar stories from the updating trenches to share with us, please drop us a line!

3Nov2012

Paper Arrival Rates

Posted in Research by Arie van Deursen

As the deadline passed, I just closed the submission site for ICSE NIER 2013. How many hours in advance do authors typically submit their paper?

To answer that question, I collected some data on the time at which each paper was submitted. (I just looked at initial submissions, not at re-submissions). Here is a graph showing new paper arrivals, sorted by hours before the deadline, grouped in bins of 2 hours.

As you can see, we received a little less than 30 submissions more than 2 days in advance. But the vast majority likes to submit in the final 24 hours. The last paper was submitted just 5 minutes before the deadline.

Accumulating this graph and displaying the data as percentage yields the following chart:

This gives some insight in the percentage of papers submitted at different time slots before the deadline.

Let’s draw the following easy to remember conclusions from this graph:

1/6th of the papers are submitted more than 48 hours ahead of time.
1/3d of the papers are submitted 24 hours before the deadline
Half of the papers are submitted 14 hours before the dealine
2/3d of the papers are submitted 10 hours before the deadline
1/6th of the papers are submitted in the final 4 hours before the deadline

Is this relevant? If this is valid, as a conference organizer you can guestimate the number of submissions, say, 24 hours ahead of time, which is when you’d have 1/3d of the papers in.

But also if you’re an author this can be interesting. Conference systems like EasyChair give your paper an ID that equals the number of submissions so far. So if you submit at, say, 10 hours before the deadline, and get paper ID 200, the chart suggests that you may end up competing with 300 submissions in total.

The chart may very well be different for other conferences. NIER is part of ICSE which is held in California, with a deadline at 12pm Pago Pago time, on a Friday, soliciting 4-page papers, without requiring pre-submission of abstracts. These are all circumstances that will affect submission behavior. If you have pointers to similar data for other conferences let me know, and I’ll update the post.

Enjoy!

26Aug2012

Teaching Reactive Programming

Posted in Teaching by Arie van Deursen

One of the new courses at the TU Delft MSc Computer Science in 2012 was on reactive programming. The students loved this course, and I had a great time too. What was so good about it?

Format
The course was taught by Erik Meijer, creator of the .NET reactive extensions framework Rx. Erik works at Microsoft, Redmond, and has a part time appointment at TU Delft. The lectures thus were packed in two weeks, followed by several student presentations over Skype after Erik had returned to Redmond.

Book: Programming Reactive Extensions and LINQ

Content
The course content included big data, asynchronous operations on observable collections, push versus pull, Pip Coburn’s change function, the role of abstraction, monads, LINQ, coSQL, event processing, schedulers, and the reactive extensions architecture. Course material included Programming Reactive Extensions and LINQ by Jesse Liberty and Paul Betts.

Labwork
Students subsequently used this understanding of reactive programming to build a cloud-based (Windows) phone app, to be put in the market place. Results include one app to keep an eye on your stack overflow account, and two apps focused on train delays. Some helper libraries developed by the students are now on github, such as a proxy for the Dutch Railways API, and ExchangeLINQ, a LINQ query provider for the Stack Exchange API.

The Engineer as Educator
One thing that made this course special was Erik sharing his extensive experience in API design. He explained the actual tradeoffs he and his team made in the design of Rx — for example when deciding that subscribing to an IObservable should return an IDisposable in order to allow the developer to stop the subscription.

In order to explain his design decisions, Erik naturally made use of his background in functional programming. To answer the student’s questions, he used monads, co- and contravariance, category theory, and trampolines, to name a few. Thus, the course demonstrated how a thorough understanding of programming language theory is a prerequisite for good API design. More than anything, this course motivated students to dive into the theory of (functional) programming.

Pizza
In the final session, the students presented their apps and their reactive programming skills. The IDEs were opened, and the students experienced what it feels like when a senior engineer like Erik reviews your code. The session took place at 6pm over Skype, with students having pizza and beer, while Erik was having his morning coffee in Redmond.

Final Reactive Programming Presentations

27Jun2012

Teaching Testing in Year One

Posted in Teaching by Arie van Deursen

Starting 2013/2014, TU Delft will run a revised curriculum for the bachelor computer science. The software testing course that I have been teaching to 2nd and 3rd year CS students will move from to the (end of) the first year.

This is an exciting prospect. It confirms that testing is not an afterthought, but something that should be built into software development right from the start.

But can it be done? What should freshmen coming straight from high school be taught before they can start with testing? And what should a first year testing course contain?

To build up the required knowledge, the TU Delft curriculum anticipates three pre-testing courses.

In the first, students learn about object-oriented programming, covering topics ranging from simple loops to inheritance, polymorphism, and interfaces. They will even learn a bit about the mechanics of testing their code automatically.
Subsequently, they use the acquired programming skills in a simple project. They learn to work in teams, to write software according to requirements provided by others, and to share their (UML) design diagrams with other team members.
As the third step, they learn about data structures such as linked lists or binary search trees, and learn to use recursion. These courses (object-oriented programming, a project, and data structures), are scheduled for the first three quarters of the first year.

Then in the fourth quarter, a dedicated course on software testing comes in. The course should hook students to innovative forms of testing for the rest of their lives. Here’s what I have in mind for that.

The practical basis will include exploratory testing, behavior-driven development, and the use of testable scenarios to specify requirements. With respect to unit testing, the students will learn JUnit, the use of build tools (maven), coverage analysis, and the use of continuous integration tools (Jenkins). I even hope to get them to understand a mocking framework like Mockito. Students will apply these techniques to a small existing applications (JPacman) which they will have to adapt and test.

The more theoretical basis will be provided by the systematic derivation of test cases from models, such as state machines or decision tables. Furthermore, I’ll elaborate on different adequacy models (beyond statement coverage!) as well as combinatorial testing techniques (e.g., pairwise testing).

This being an academic course, it will also include a critical reflection on the tools and techniques covered. We’ll identify strengths and weaknesses, and see how today’s hottest research aims at addressing these weaknesses.

Well, perhaps this is all too ambitious. I will try, and we will see. Luckily, TU Delft is not the first university to move testing to the first year: Eindhoven is a notable other example, and I am sure there are more (although perhaps not many). “Test early, test often” — learn it early, apply it often.

Arie van Deursen

Software engineering in theory and practice

Author Archives: Arie van Deursen

Green Open Access and Preprint Linking

Self-Archiving

Preprint Linking

Preprints @ ICSE 2013

Preprint Link Sustainability

Implications

Acknowledgments

Update (Summer 2013)

David Notkin on Why We Publish

Speaking in Irvine on Metrics and Architecture

See also:

Design for Upgradability and the Rails DigiD Outage

EDIT (February 4, 2013)

EDIT (April 4, 2013)

Line Coverage: Lessons from JUnit

Overall Coverage at First Sight

Covering Deprecated Code?

An Untested Class!

The Usual Suspects: 6%.

EclEmma and JaCoCo

Lessons Learned

Desk Rejected

Library Updating. Risk it Now, or Risk it Later?

Paper Arrival Rates

Teaching Reactive Programming

Teaching Testing in Year One