Semantic Versioning? In Maven Central? Breaking Changes!

I love the use of semantic versioning. It provides clear guidelines when an API version number should be bumped. In particular, if a change to an API breaks backward compatibility, the major version identifier must be updated. Thus, when as a client of such an API I see a minor or patch update, I can safely upgrade, since these will never break my build.

But to what extent are people actually using the guidelines of semantic versioning? Has the adoption of semantic versioning increased over time? Does the use of semantic versioning speedup library upgrades? And if there are breaking changes, are these properly announced by means of deprecation tags?

To better understand semantic versioning, we studied seven years of versioning history in the Maven Central Repository. Here is what we found.

Versioning Policies

Semantic Versioning is a systematic approach to give version numbers to API releases. The approach has been developed by Tom Preston-Werner, and is advocated by GitHub. It is “based on but not necessarily limited to pre-existing widespread common practices in use in both closed and open-source software.”

In a nutshell, semantic versioning is based on MAJOR.MINOR.PATCH identifiers with the following meaning:

  • MAJOR: increment when you make incompatible API changes

  • MINOR: increment when you add functionality in backward compatible manner

  • PATCH: incremented when you make backward-compatible bug fixes

Versioning in Maven Central

To understand how developers actually use versioning policies, we studied 7 years of history of maven. Our study comprises around 130,000 released jar files, of which 100,000 come with source code. On average, we found around 7 versions per library in our data set.

The data set runs from 2006 to 2011, so predates the semantic versioning manifesto. Since the manifesto claims to be based on existing practices, it is interesting to investigate to what extent API developers releasing via maven have adhered to these practices.

As shown below, in Maven Central, the major.minor.patch scheme is adopted by the majority of projects.

Frequency of semver identifiers in maven

Breaking Changes

To understand to what extent version increments correspond to breaking changes, we identified all “update-pairs”: A version as well as its immediate successor of a given maven package.

To be able to assess semantic versioning compliance, we need to determine backward compatibility for such update pairs. Since in general this is undecidable, we use binary compatibility as a proxy. Two libraries are binary compatible, if interchanging them does not force one to recompile. Examples of incompatible changes are removing a public method or class, or adding parameters to a public method.

We used the Clirr tool to detect such binary incompatibilities automatically for Java. For those interested, there is also a SonarQube Clirr plugin for compatibliity checking in a continuous integration setting; An alternative tool is japi.

Major versus Minor/Patch Releases

For major releases, breaking changes are permitted in semantic versioning. This is also what we see in our dataset: A little over 1/3d of the major releases introduces binary incompatibilities:

Breaking changes in major releases

Interestingly, we see a very similar figure for minor releases: A little over 1/3d of the minor releases introduces binary incompatibilities too. This, however, is in violation with the semantic versioning guidelines.

Breaking changes in minor releases

The situation is somewhat better for patch releases, which introduce breaking changes in around 1/4th of the cases. This nevertheless is in conflict with semantic versioning, which insists that patches are compatible.

Breaking changes in patch releases

Overall, we see that minor and major releases are equally likely to introduce breaking changes.

The graph below shows that these trends are fairly stable over time. Minor (~40%) and patch releases (~45%) are most common, and major releases (~15%) are least common.

Adherence over time.

The orange line indicates releases with breaking changes: Around 30% of the releases introduce breaking changes — a number that is slightly decreasing over time, but still at 29%.

Of these releases with breaking changes, the vast majority is non-major (the crossed green line at ~25%). Thus, 1/4th of the releases does not comply with semantic versioning.

Deprecation to the Rescue?

The above analysis demonstrates a deep need to introduce breaking changes: appearantly, developers wish to introduce breaking changes in 30% of their API releases.

Apart from requiring that such changes take place in major releases,
semantic versioning insists on using deprecation to announce such changes. In particular:

Before you completely remove the functionality in a new major release there should be at least one minor release that contains the deprecation so that users can smoothly transition to the new API.

Given the many breaking changes we saw, how many deprecation tags would there be?

At the library level, 1000 out of 20,000 libraries (5%) we studied include at least one depracation tag.

When we look at the individual method level, we find the following:

  • Over 86,000 public methods are removed in a minor release, without any deprecation tag.
  • Almost 800 public methods receive a deprecation tag in their life span: These methods, however, are never removed.
  • 16 public methods receive a deprecation tag that is subsequently undone in a later release (the method is “undeprecated”)

That is all we found. In other words, we did not find a single method that was deprecated according to the guidelines of semantic versioning (deprecate in minor, then delete in major).

Furthermore, the worst behavior (deletion in minor without deprecation, over 86,000 methods) occurs hundred times more often than the best behavior witnessed (deprecation without deletion, almost 800 methods)

Discussion

Our findings are based on a somewhat older dataset obtained from maven. It might therefore be that adherence to semantic versioning at the moment is higher than what we report.

Our findings also take breaking changes to any public method or class into account. In practice, certain packages will be considered as API only, whereas others are internal only — yet declared public due to the limitations of the Java modularization system. We are working on a further analysis in which we explicitly measure to what extent the changed public methods are used. Initial findings within the maven data set suggest that there are hundreds of thousands of compilation problems in clients caused by these breaking changes — suggesting that it is not just internal modules containing these changes.

Some non-Java package managers explicitly have adopted semantic versioning. A notable example is npm, and also nuget has initiated a discussion. While npm has a dedicated library to determine version orderings according to the Semantic Versioning standard, I am not aware of Javascript compatibility tools that are used to check for breaking changes. For .NET (nuget) tools like Clirr exists, such as LibCheck, ApiChange, and the NDepend build comparison tools. I do not know, however, how widespread their use is.

Last but not least, binary incompatibility, as measured by Clirr, is just one form of backward incompatibility. An API may change the behavior of methods as well, in breaking ways. A way to detect those might be by applying old test suites to new code — which we have not done yet.

Conclusion

Breaking API changes are a fact of life. Therefore, meaningful version identifiers are a cause worth fighting for. Thus:

  1. If you adopt semantic versioning for your API, double check that your minor and patch releases are indeed backward compatible. Consider using tools such as clirr, or a dedicated test suite.

  2. If you rely on an API that claims to be following semantic versioning, don’t put too much faith in backward compatibility promises.

  3. If you want to follow semantic versioning, be prepared to bump the major version identifier often.

  4. Should you need to introduce breaking changes, do not forget to announce these with deprecation tags in minor releases first.

  5. If you are in research, note that the problems involved in preventing, encapsulating, detecting, and handling breaking changes are tough and real. Backward compatiblity deserves your research attention.

The full details of our study are published in our paper presented (slides) at the IEEE International Conference on Source Analysis and Manipulation (SCAM), held in Victoria, BC, September 2014.

This post is based on joint work by Steven Raemaekers and Joost Visser, both from the Software Improvement Group.

© Copyright Arie van Deursen, October 2014.

EDIT, November 2014

If you are a Java developer and serious about semantic versioning, then checkout this open source semantic versioning checking tool. It is available on Maven Central and on GitHub, and it can check differences, validate version numbers, and suggest proper version numbers for your new jar release.

Design for Upgradability and the Rails DigiD Outage

On January 9th, the Dutch DigiD system was taken offline for 9 hours. The reason was a vulnerability (CVE-2013-0155 and CVE-2013-0156) in the underlying Ruby on Rails system used. According to the exploit, it enables attackers to bypass authentication, inject SQL, perform a denial of service, or execute arbitrary code.

DigiD is a Dutch authentication system used by over 600 organizations, including the national taxes. Over 9 million Dutch citizens have a DigiD account, which they must use for various interactions with the government, such as filing taxes electronically. The organization responsible for DigiD maintenance, Logius, decided to take DigiD off line when it heard about the vulnerability. It then updated the Rails system to a patched version. The total downtime of DigiD was about 9 hours (from 12:20 until 21:30). Luckily, it seems DigiD was never comprimised.

The threat was real enough, though, as illustrated by the Bitcoin digital currency system: the Bitcoin currency exchange called Vircurex actually was compromised. According to Vircurex, it was able to “deploy fixes within five minutes after receiving the notification from the Rails security mailing list.”

To better understand the DigiD outage, I contacted spokesman Michiel Groeneveld from Logius. He stated that (1) applying the fix was relatively easy, and that (2) most of the down time was caused by “extensively testing” the new release.

Thus, the real lesson to be learned here is that speed of upgrading is crucial to reduce downtime (ensure high availability) in case a third party component turns into a security vulnerability. The software architect caring about both security and availability, must apply design for upgradability (categorized under replaceability in ISO 25010).

Any upgrade can introduce incompatibilities. Even the patch for this Rails vulnerability introduced a regression. Design for upgradability is about dealing with such regressions. It involves:

  1. Isolation of depedencies on the external components, for example through the use of wrappers or aspects in order to reduce the impact of incompatibilities.

  2. Dependency hygiene, ensuring the newest versions of external components are used as soon as they are available (which is good security policy anyway). This helps avoid the accumulation of incompatibilities, which may cause updates to take weeks rather than minutes (or even hours). Hot security fixes may even be unavailable for older versions: For Ruby on Rails, which is now in version 3.x, the most popular comment at the fix site was a telling “lots of love from people stuck on 2.3

  3. Test automation, in order to reduce the execution time of regression tests for the system working with the upgraded component. This will include end-to-end system tests, but can also include dedicated tests ensuring that the wrappers built meet the behavior expected from the component.

  4. Continuous deployment, ensuring that once the source code can deal with the upgraded library, the actual system can be deployed with a push on the button.

None of these comes for free. In other words, the product owner should be willing to invest in these. It is the responsibility of the architect to make clear what the costs and benefits are, and what the risks are of not investing in isolation, dependency hygiene, test automation, and continuous deployment. In this explanation, the architect can point to additional benefits, such as better maintainability, but these may be harder to sell than security and availability.

This brings me to two research connections of this case.

The first relates to regression testing. A hot fix for a system that is down is a case where it actually matters how long the execution of an (automated) regression test suite takes: test execution time in this case equals down time. Intuitively, test cases covering functionality for which Rails is not even used, need not be executed. This is where the research area of selective regression testing comes in. The typical technique uses control flow analysis in order to reduce a large regression test suite given a particular change. This is classic software engineering research dating back to the 90s: For a representative article have a look at Rothermel and Harrold’s Safe, Efficient Regression Test Selection Technique.

Design for upgradability also relates to some of the research I’m involved in.
What an architect caring about upgradability can do is estimate the anticipated upgrading costs of an external component. This could be based on a library’s “compatibility reputation”. But how can we create such a compatibility rating?

At the time of writing, we are working on various metrics that use a library’s release history in order to predict API stability. We are using the (huge) maven repository to learn about breaking changes in libraries in the wild, and we are investigating to what extent encapsulation practices are effective. With that in place, we hope to be able to provide decision support concerning the maintainability costs of using third party libraries.

For our first results, have a look at our ICSM 2012 paper on Measuring Library Stability Through Historical Version Analysis — for the rest, stay tuned, as there is more to come.

EDIT (February 4, 2013)

For a more detailed account of the impact of the Rails vulnerabilites, have a look at What The Rails Security Issue Means For Your Startup by Patrick McKenzie. The many (sometimes critical) comments on that post are also an indication for how hard upgrading in practice is (“How does this help me … when I have a multitude of apps running some Rails 1.x or 2.x version?“).

An interesting connection with API design is provided by Ned Batchelder, who suggests to rename .load and .safe_load to .dangerous_load and .load, respectively (in a Python setting in which similar security issues exist).

EDIT (April 4, 2013)

As another (separate) example of an urgent security fix, today (April 4, 2013), the PostgreSQL Global Development Group has released a security update to all current versions of the PostgreSQL database system. The most important security issue fixed in this release, CVE-2013-1899, makes it possible for a connection request containing a database name that begins with “-” to be crafted that can damage or destroy files within a server’s data directory.

Here again, all users of the affected versions are strongly urged to apply the update immediately, illustrating once more the need to be able to upgrade rapidly.