A South African Perspective on Privacy and Intelligence

The Dutch government has proposed a new law on intelligence and security services (“Wet op de inlichtingen- en veiligheidsdiensten” — Wiv20XX).

As several privacy-related organizations have made clear, this law proposes non-specific (bulk) interception powers for any form of telecom or data transfer without independent ex-ante review or court involvement (see the summary by Matthijs Koot, and reactions to the bill by Bits of Freedom, Privacy International, the Institute for Information Law of the University of Amsterdam IVIR, and the Internet Society ISOC).

This bill gives the Dutch government unprecedented power to violate the privacy of its citizens. Either the Dutch government does not recognize the crucial role of privacy in a well-functioning democracy, or it does not realize what enormous privacy infringements are made possible through Internet surveillance.

Book cover Sachs' Soft Vengeance

When discussing the importance of privacy, I am always reminded of South Africa’s anti-apartheid activist Albie Sachs and his autobiography “The Soft Vengeance of a Freedom Fighter” (first published in 1990, and turned into a film in 2014).

As a law student at the University of Cape Town, Albie Sachs started fighting apartheid at the age of 17, in 1952. He was imprisoned in solitary confinement from 1963 to 1964 and again in 1966, after which he was exiled from his home country South Africa.

In 1988, living in Maputo, Mozambique, he lost his right arm and an eye when his car was bombed by the South African secret police.

From 1991 until 1993, after Nelson Mandela’s release in 1990, Albie Sachs played a pivotal role in the negotiations leading to the new South African constitution.

In 1994 Nelson Mandela appointed him as judge of the highest court of South Africa, the Constitutional Court. He worked for the Truth and Reconciliation Commission between 1995 and 1998.

Albie Sachs wrote his Soft Vengeance in 1989. Nelson Mandela was still in prison, and the struggle against apartheid had not yet been won. Albie Sachs had just lost his arm and eye, and his book was his attempt to cope with his injuries.

For his recovery he was flown to a London hospital. He noticed that he was remarkably optimistic, and wondered why. Here is his reason (p. 58):

“Perhaps part of my pleasure at being in this hospital room is that I am fairly sure it is not bugged. Sometimes I used to imagine my phone in Maputo being listened in to by at least three different secret services […]”

“Possibly my continuing sense of post-bomb euphoria comes from the fact that at least for the time being I am out of the net of hidden sensors, my spirit free from spying for the first time in three decades.”

He explains what it means to be surveilled:

“Ever since I was seventeen I have been politically active, I have lived with the notion that there are others accompanying every move I make, listening to every word I say.”

“Did the secret police really follow every up and down of my marriage, pick up the terms of our divorce, record automatically the names of our children even before they were entered in the birth register?”

And this gives rise to his dream for the future:

“I too have a dream, that there will one day be a world without police files, and bugged rooms, and tapped telephones, and intercepted mail, and that I will actually live in it.”

Albie Sachs is not alone in his dream. According to article 12 of the United Nations Universal Declaration of Human Rights, we all have a right to privacy:

“No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.”

To date, the Internet has given us amazing possibilities to communicate with our family and friends, to search, read, and share information on almost any topic we find interesting, and to shop for almost any item we think we need. As a software engineering educator and researcher, I am proud to have played a tiny part in making this happen.

Unfortunately, the Internet can also be used as a place for massive surveillance activities, at levels that, for example, the South African apartheid regime could only have dreamed of. As a software engineer, I am terrified by the technical opportunities the Internet provides to governments wishing to know everything about their citizens.

A government aiming to draft a modern intelligence bill should recognize this immense power, and take responsibility for safeguarding the necessary privacy protection.

The Dutch government has failed to do so. It has proposed a bill with insufficient independent oversight, a bill that oppressive regimes, such as the former South African regime, would be happy to embrace.

Luckily, the present bill is still a draft. I sincerely hope that the final version will offer adequate privacy protection, and bring the world closer to the dream of Albie Sachs.

Delft Students on Software Architecture: DESOSA 2015

With Rogier Slag.

This year, we taught another edition of the TU Delft Teaching Software Architecture — With GitHub course.

We are proud to announce the resulting online book: Delft Students on Software Architecture is a collection of architectural descriptions of open source software systems, written by students from Delft University of Technology during a master-level course that took place in the spring of 2015.

desosa 2015 book cover

At the start of the course, teams of 3-4 students could adopt a project of choice on GitHub. The projects selected had to be sufficiently complex and actively maintained (one or more pull requests merged per day).

During a 10-week period, the students spent one third of their time on this course, and engaged with these systems in order to understand and describe their software architecture.

Inspired by Brown and Wilson’s Architecture of Open Source Applications, we decided to organize each description as a chapter, resulting in the present online book.

Recurring Themes

The chapters share several common themes, which are based on smaller assignments the students conducted as part of the course. These themes cover different architectural ‘theories’ as available on the web or in textbooks. The course used Rozanski and Woods’ Software Systems Architecture, and therefore several of their architectural viewpoints and perspectives recur.

The first theme is outward looking, focusing on the use of the system. Thus, many of the chapters contain an explicit stakeholder analysis, as well as a description of the context in which the systems operate. These were based on available online documentation, as well as on an analysis of open and recently closed issues for these systems.

A second theme involves the development viewpoint, covering modules, layers, components, and their inter-dependencies. Furthermore, it addresses integration and testing processes used for the system under analysis.

A third recurring theme is variability management. Many of today’s software systems are highly configurable. In such systems, different features can be enabled or disabled, at compile time or at run time. Using techniques from the field of product line engineering, several of the chapters provide feature-based variability models of the systems under study.

A fourth theme is metrics-based evaluation of software architectures. Using such metrics, architects can discuss (desired) quality attributes (performance, scalability, maintainability, …) of a system quantitatively. Therefore, various chapters discuss metrics and, in some cases, actual measurements tailored towards the systems under analysis.

First-Hand Experience

Last but not least, the chapters are also based on the students’ experience in actually contributing to the systems described. As part of the course, over 75 pull requests were made to the projects under study, including refactorings (Jekyll 3545, Docker 11350, Docker 11323, Syncany 391), bug fixes
(Diaspora 5714, OpenRA 7486, OpenRA 7544, Kodi 6570), and helpful documentation such as a Play Framework screencast.

Through these contributions the students often interacted with lead developers and architects of the systems under study, gaining first-hand experience with the architectural trade-offs made in these systems.

Enjoy!

Working with the open source systems and describing their architectures has been a great experience, both for the teachers and the students.

We hope you will enjoy reading the DESOSA chapters as much as we enjoyed writing them.

Beyond Page Objects

During the last couple of months I had a good time using Protractor to create an end-to-end test suite for an AngularJS web application.

While applying the Page Object pattern, I realized that I needed more guidance on what page objects to create, and how to navigate through my web application.

To that end, I started drawing little state charts for my web application. Naturally, ‘page objects’ corresponded to states, and their methods to either state inspection methods (is my browser in the correct state?) or state transition methods (clicking this button will bring me to the next state).

Gradually, the following process emerged:

  1. If you want to test certain behavior of your web application, draw a little state diagram to capture the navigation for that behavior.

  2. Create ‘state objects’ for each of the states.

  3. Give each state object its ‘inspection methods’ (what is visible in the web application if I’m in this state) as well as ‘transition methods’ (clicks leading to a new state).

  4. I also find it helpful to give each state object a ‘selfcheck’ method, which just verifies whether the web application is indeed in the state corresponding to the state object (see the sketch after this list).

  5. With the state objects in place, think of the paths you want to take through the application.

  6. The simplest starting point is to write one test for each basic transition: Bring the application into state A, click somewhere, and verify you ended up in the required state B.

  7. Next, you may want to consider longer paths, in which earlier transitions affect later behavior. An example is testing proper use (and resetting of) client-side caching.
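To make this concrete, below is a minimal sketch of what such a state object could look like. It uses Selenium WebDriver’s Java bindings rather than Protractor, and the page structure, element ids, and the LoggedInState target are all invented for the example — treat it as an illustration of the pattern, not as code from an actual test suite.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

// Hypothetical 'login page' state of some web application; all element
// locators and the LoggedInState target are made up for this sketch.
class LoginState {
    private final WebDriver driver;

    LoginState(WebDriver driver) {
        this.driver = driver;
    }

    // 'Selfcheck' method: verify the browser really is in this state.
    void selfCheck() {
        if (driver.findElements(By.id("login-form")).isEmpty()) {
            throw new IllegalStateException("Browser is not on the login page");
        }
    }

    // Inspection method: what is visible while in this state?
    boolean errorMessageShown() {
        return !driver.findElements(By.cssSelector(".login-error")).isEmpty();
    }

    // Transition method: submitting the form moves the application to a new state.
    LoggedInState submitCredentials(String user, String password) {
        driver.findElement(By.id("username")).sendKeys(user);
        driver.findElement(By.id("password")).sendKeys(password);
        driver.findElement(By.id("login-submit")).click();
        LoggedInState next = new LoggedInState(driver);
        next.selfCheck(); // fail fast if the transition did not happen
        return next;
    }
}

// Minimal target state so the sketch is self-contained.
class LoggedInState {
    private final WebDriver driver;

    LoggedInState(WebDriver driver) {
        this.driver = driver;
    }

    void selfCheck() {
        if (driver.findElements(By.id("logout-link")).isEmpty()) {
            throw new IllegalStateException("Browser is not in the logged-in state");
        }
    }
}

A test for a single transition then brings the application into a LoginState, calls submitCredentials, and relies on the selfCheck of the resulting state to verify that it ended up where it should.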

I wrote a longer article about this approach, available as “Beyond Page Objects: Testing Web Applications with State Objects”, published in ACM Queue in June 2015, as well as in the Communications of the ACM in August 2015.

The paper also explains how to deal with more complex state machines (using superstates and AND-states, for example), how to use a transition tree to oversee the coverage of longer paths, and how to deal with the infamous back-button. Furthermore, I extended the example “PhoneCat” AngularJS application with a state-object based test suite, available from my GitHub page.

Admittedly, the idea to use state machines for testing purposes is not new at all — yet elaborating how it can be used for testing web applications with WebDriver was helpful to me. I hope you find the use of state objects helpful too.

Semantic Versioning? In Maven Central? Breaking Changes!

I love the use of semantic versioning. It provides clear guidelines on when an API version number should be bumped. In particular, if a change to an API breaks backward compatibility, the major version identifier must be updated. Thus, when, as a client of such an API, I see a minor or patch update, I can safely upgrade, since these should never break my build.

But to what extent are people actually using the guidelines of semantic versioning? Has the adoption of semantic versioning increased over time? Does the use of semantic versioning speed up library upgrades? And if there are breaking changes, are these properly announced by means of deprecation tags?

To better understand semantic versioning, we studied seven years of versioning history in the Maven Central Repository. Here is what we found.

Versioning Policies

Semantic Versioning is a systematic approach to give version numbers to API releases. The approach has been developed by Tom Preston-Werner, and is advocated by GitHub. It is “based on but not necessarily limited to pre-existing widespread common practices in use in both closed and open-source software.”

In a nutshell, semantic versioning is based on MAJOR.MINOR.PATCH identifiers with the following meaning:

  • MAJOR: increment when you make incompatible API changes

  • MINOR: increment when you add functionality in a backward-compatible manner

  • PATCH: increment when you make backward-compatible bug fixes (a small sketch illustrating these rules follows after this list)
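To illustrate these rules from the client’s point of view — this is just a sketch I made up, not part of any published tooling — the major component is the only one that is allowed to signal breakage, so a client can decide whether an upgrade should be safe by comparing major version components:

// Minimal sketch of the client-side upgrade rule under semantic versioning.
// (It ignores the special 0.y.z case, in which the API is not yet considered stable.)
public final class SemVer {
    public final int major, minor, patch;

    public SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // An upgrade is supposed to be safe as long as the MAJOR component is unchanged.
    public static boolean safeUpgrade(SemVer from, SemVer to) {
        return from.major == to.major;
    }

    public static void main(String[] args) {
        SemVer current = new SemVer(2, 4, 1);
        System.out.println(safeUpgrade(current, new SemVer(2, 4, 2))); // true: patch release
        System.out.println(safeUpgrade(current, new SemVer(2, 5, 0))); // true: minor release
        System.out.println(safeUpgrade(current, new SemVer(3, 0, 0))); // false: major release
    }
}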

Versioning in Maven Central

To understand how developers actually use versioning policies, we studied seven years of release history in Maven Central. Our study comprises around 130,000 released jar files, of which 100,000 come with source code. On average, we found around 7 versions per library in our data set.

The data set runs from 2006 to 2011, and thus predates the semantic versioning manifesto. Since the manifesto claims to be based on existing practices, it is interesting to investigate to what extent API developers releasing via Maven Central have adhered to these practices.

As shown below, in Maven Central, the major.minor.patch scheme is adopted by the majority of projects.

Frequency of semver identifiers in maven

Breaking Changes

To understand to what extent version increments correspond to breaking changes, we identified all “update pairs”: a version of a given Maven package together with its immediate successor.

To be able to assess semantic versioning compliance, we need to determine backward compatibility for such update pairs. Since in general this is undecidable, we use binary compatibility as a proxy. Two library versions are binary compatible if interchanging them does not force clients to recompile. Examples of incompatible changes are removing a public method or class, or adding parameters to a public method.
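As a hypothetical Java illustration (the library, class, and methods are invented for the example), the comments below spell out which kinds of change to a public type are binary incompatible, and what that implies for the version number:

// Hypothetical public API class of some library, as shipped in release 1.4.2.
public class ReportWriter {

    // Binary INCOMPATIBLE changes in a later release would include removing this
    // method or adding a parameter to it: clients compiled against 1.4.2 would then
    // fail at run time (NoSuchMethodError) until they are recompiled and fixed.
    // Under semantic versioning, such a change requires a MAJOR bump, e.g. 1.4.2 -> 2.0.0.
    public void write(String report) {
        System.out.println(report);
    }

    // Adding a brand-new method (or overload) is binary COMPATIBLE: existing clients
    // keep resolving write(String) exactly as before. Such an addition fits a MINOR
    // release, e.g. 1.4.2 -> 1.5.0.
    public void writeTo(String report, java.io.PrintStream out) {
        out.println(report);
    }
}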

We used the Clirr tool to detect such binary incompatibilities automatically for Java. For those interested, there is also a SonarQube Clirr plugin for compatibility checking in a continuous integration setting. An alternative tool is japi.

Major versus Minor/Patch Releases

For major releases, breaking changes are permitted in semantic versioning. This is also what we see in our dataset: a little over one third of the major releases introduce binary incompatibilities:

Breaking changes in major releases

Interestingly, we see a very similar figure for minor releases: a little over one third of the minor releases introduce binary incompatibilities too. This, however, violates the semantic versioning guidelines.

Breaking changes in minor releases

The situation is somewhat better for patch releases, which introduce breaking changes in around one quarter of the cases. This nevertheless conflicts with semantic versioning, which insists that patches are backward compatible.

Breaking changes in patch releases

Overall, we see that minor and major releases are equally likely to introduce breaking changes.

The graph below shows that these trends are fairly stable over time. Minor (~40%) and patch releases (~45%) are most common, and major releases (~15%) are least common.

Adherence over time.

The orange line indicates releases with breaking changes: Around 30% of the releases introduce breaking changes — a number that is slightly decreasing over time, but still at 29%.

Of these releases with breaking changes, the vast majority is non-major (the crossed green line at ~25%). Thus, one quarter of the releases do not comply with semantic versioning.

Deprecation to the Rescue?

The above analysis demonstrates a deep need to introduce breaking changes: apparently, developers wish to introduce breaking changes in 30% of their API releases.

Apart from requiring that such changes take place in major releases,
semantic versioning insists on using deprecation to announce such changes. In particular:

Before you completely remove the functionality in a new major release there should be at least one minor release that contains the deprecation so that users can smoothly transition to the new API.
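In Java terms, the guideline boils down to the following pattern (the class and methods are invented for the sketch): keep the old method in a minor release, mark it deprecated with a pointer to its replacement, and only remove it in the next major release.

// Hypothetical API class as it could look in release 1.3.0 (a MINOR release).
public class MailClient {

    /**
     * @deprecated as of 1.3.0, use {@link #send(Message)} instead;
     *             scheduled for removal in 2.0.0.
     */
    @Deprecated
    public void sendMessage(String to, String body) {
        send(new Message(to, body)); // keeps working by delegating to the replacement
    }

    // The replacement API, introduced in the same minor release.
    public void send(Message message) {
        // ... actual sending logic ...
    }

    // Minimal message holder so the sketch is self-contained.
    public static class Message {
        final String to;
        final String body;

        public Message(String to, String body) {
            this.to = to;
            this.body = body;
        }
    }
}
// Only in release 2.0.0 (a MAJOR release) would sendMessage(String, String) actually be removed.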

Given the many breaking changes we saw, how many deprecation tags would there be?

At the library level, 1,000 out of the 20,000 libraries we studied (5%) include at least one deprecation tag.

When we look at the individual method level, we find the following:

  • Over 86,000 public methods are removed in a minor release, without any deprecation tag.
  • Almost 800 public methods receive a deprecation tag in their life span: These methods, however, are never removed.
  • 16 public methods receive a deprecation tag that is subsequently undone in a later release (the method is “undeprecated”).

That is all we found. In other words, we did not find a single method that was deprecated according to the guidelines of semantic versioning (deprecate in minor, then delete in major).

Furthermore, the worst behavior (deletion in a minor release without deprecation, over 86,000 methods) occurs a hundred times more often than the best behavior witnessed (deprecation without deletion, almost 800 methods).

Discussion

Our findings are based on a somewhat older dataset obtained from Maven Central. It may therefore be that current adherence to semantic versioning is higher than what we report.

Our findings also take breaking changes to any public method or class into account. In practice, certain packages will be considered as API only, whereas others are internal only — yet declared public due to the limitations of the Java modularization system. We are working on a further analysis in which we explicitly measure to what extent the changed public methods are used. Initial findings within the Maven Central data set suggest that there are hundreds of thousands of compilation problems in clients caused by these breaking changes — suggesting that it is not just internal modules that contain these changes.

Some non-Java package managers have explicitly adopted semantic versioning. A notable example is npm, and NuGet has initiated a discussion as well. While npm has a dedicated library to determine version orderings according to the Semantic Versioning standard, I am not aware of JavaScript compatibility tools that are used to check for breaking changes. For .NET (NuGet), tools similar to Clirr exist, such as LibCheck, ApiChange, and the NDepend build comparison tools. I do not know, however, how widespread their use is.

Last but not least, binary incompatibility, as measured by Clirr, is just one form of backward incompatibility. An API may change the behavior of methods as well, in breaking ways. A way to detect those might be by applying old test suites to new code — which we have not done yet.

Conclusion

Breaking API changes are a fact of life. Therefore, meaningful version identifiers are a cause worth fighting for. Thus:

  1. If you adopt semantic versioning for your API, double check that your minor and patch releases are indeed backward compatible. Consider using tools such as Clirr, or a dedicated test suite.

  2. If you rely on an API that claims to be following semantic versioning, don’t put too much faith in backward compatibility promises.

  3. If you want to follow semantic versioning, be prepared to bump the major version identifier often.

  4. Should you need to introduce breaking changes, do not forget to announce these with deprecation tags in minor releases first.

  5. If you are in research, note that the problems involved in preventing, encapsulating, detecting, and handling breaking changes are tough and real. Backward compatibility deserves your research attention.

The full details of our study are published in our paper presented (slides) at the IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), held in Victoria, BC, September 2014.

This post is based on joint work by Steven Raemaekers and Joost Visser, both from the Software Improvement Group.

© Copyright Arie van Deursen, October 2014.

EDIT, November 2014

If you are a Java developer and serious about semantic versioning, then check out this open source semantic versioning checking tool. It is available on Maven Central and on GitHub, and it can check differences, validate version numbers, and suggest proper version numbers for your new jar release.

Think Twice Before Using the “Maintainability Index”

Code metrics results in VS2010

This is a quick note about the “Maintainability Index”, a metric aimed at assessing software maintainability, as I recently ran into developers and researchers who are (still) using it.

The Maintainability Index was introduced at the International Conference on Software Maintenance in 1992. To date, it is included in Visual Studio (since 2007), in the recent (2012) JSComplexity and Radon metrics reporters for JavaScript and Python, and in older metric tool suites such as verifysoft.

At first sight, this sounds like a great success of knowledge transfer from academic research to industry practice. Upon closer inspection, the Maintainability Index turns out to be problematic.

The Original Index

The Maintainability Index was introduced in 1992 by Paul Oman and Jack Hagemeister, originally presented at the International Conference on Software Maintenance (ICSM 1992) and later refined in a paper that appeared in IEEE Computer. It is a blend of several metrics, including Halstead’s Volume (HV), McCabe’s cyclomatic complexity (CC), lines of code (LOC), and percentage of comments (COM). For these metrics, the average per module is taken, and the averages are combined into a single formula:

[Formula image: the original Maintainability Index, combining per-module averages of Halstead Volume, cyclomatic complexity, and lines of code with the percentage of comments]

To arrive at this formula, Oman and Hagemeister started with a number of systems from Hewlett-Packard (written in C and Pascal in the late 80s, “ranging in size from 1000 to 10,000 lines of code”). For each system, engineers provided a rating (between 1 and 100) of its maintainability. Subsequently, 40 different metrics were calculated for these systems. Finally, statistical regression analysis was applied to find the best way to combine (a selection of) these metrics to fit the experts’ opinion. This eventually resulted in the given formula. The higher its value, the more maintainable a system is deemed to be.

The Maintainability Index attracted considerable attention, in part because the Software Engineering Institute (SEI) promoted it, for example in their 1997 C4 Software Technology Reference Guide. This report describes the Maintainability Index as “good and sufficient predictors of maintainability”, and “potentially very useful for operational Department of Defense systems”. Furthermore, they suggest that “it is advisable to test the coefficients for proper fit with each major system to which the MI is applied.”

Use in Visual Studio

Visual Studio Code Metrics were announced in February 2007. A November 2007 blogpost clarifies the specifics of the maintainability index included in it. The formula Visual Studio uses is slightly different, based on the 1994 version:

Maintainability Index =
  MAX(0, (171 - 5.2 * ln(Halstead Volume)
             - 0.23 * Cyclomatic Complexity
             - 16.2 * ln(Lines of Code)
         ) * 100 / 171)

As you can see, the constants are literally the same as in the original formula. The new definition merely rescales the index to a number between 0 and 100. Also, the comments metric has been removed.
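Transcribed into code, the Visual Studio variant is nothing more than the following (the inputs in main are made up, purely to show the call; in Visual Studio the three underlying metrics are of course computed from the source code itself):

// Direct transcription of the Visual Studio maintainability index formula shown above.
public final class MaintainabilityIndex {

    public static double compute(double halsteadVolume, double cyclomaticComplexity, double linesOfCode) {
        double raw = 171
                - 5.2 * Math.log(halsteadVolume)   // ln, the natural logarithm
                - 0.23 * cyclomaticComplexity
                - 16.2 * Math.log(linesOfCode);
        return Math.max(0.0, raw * 100 / 171);     // rescaled to the 0..100 range
    }

    public static void main(String[] args) {
        System.out.printf("MI = %.1f%n", compute(1000.0, 10.0, 200.0));
    }
}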

Furthermore, Visual Studio provides an interpretation:

  • MI >= 20: High Maintainability
  • 10 <= MI < 20: Moderate Maintainability
  • MI < 10: Low Maintainability

I have not been able to find a justification for these thresholds. The 1994 IEEE Computer paper used 85 and 65 (instead of 20 and 10) as thresholds, describing them as a good “rule of thumb”.

The metrics are available within Visual Studio, and are part of the code metrics power tools, which can also be used in a continuous integration server.

Concerns

I encountered the Maintainability Index myself in 2003, when working on Software Risk Assessments in collaboration with SIG. Later, researchers from SIG published a thorough analysis of the Maintainability Index (first when introducing their practical model for assessing maintainability and later as section 6.1 of their paper on technical quality and issue resolution).

Based on this, my key concerns about the Maintainability Index are:

  1. There is no clear explanation for the specific derived formula.
  2. The only explanation that can be given is that all underlying metrics (Halstead, Cyclomatic Complexity, Lines of Code) are directly correlated with size (lines of code). But then just measuring lines of code and taking the average per module is a much simpler metric.
  3. The Maintainability Index is based on the average per file of, e.g., cyclomatic complexity. However, as emphasized by Heitlager et al, these metrics follow a power law, and taking the average tends to mask the presence of high-risk parts.
  4. The set of programs used to derive the metric and evaluate it was small, and contained small programs only.
  5. Furthermore, the programs were written in C and Pascal, which may have rather different maintainability characteristics than current object-oriented languages such as C#, Java, or Javascript.
  6. For the experiments conducted, only a few programs were analyzed, and no statistical significance was reported. Thus, the results might as well be due to chance.
  7. Tool smiths and vendors used the exact same formula and coefficients as the 1994 experiments, without any recalibration.

One could argue that any of these concerns is reason enough not to use the Maintainability Index.

These concerns are consistent with a recent (2012) empirical study, in which one application was independently built by four different companies. The researchers used these systems to compare maintainability and several metrics, including the Maintainability Index. Their findings include that size as a measure of maintainability has been underrated, and that the “sophisticated” maintenance metrics are overrated.

Think Twice

In summary, if you are a researcher, think twice before using the maintainability index in your experiments. Make sure you study and fully understand the original papers published about it.

If you are a tool smith or tool vendor, there is not much point in having several metrics that are all confounded by size. Check correlations between the metrics you offer, and if any of them are strongly correlated, pick the one with the clearest and simplest explanation.

Last but not least, if you are a developer and are wondering whether to use the Maintainability Index: most likely, you’ll be better off looking at lines of code, as it gives easier-to-understand information on maintainability than a formula computed over averaged metrics confounded by size.

Further Reading

  1. Paul Oman and Jack Hagemeister. “Metrics for assessing a software system’s maintainability”. Proceedings International Conference on Software Maintenance (ICSM), 1992, pp 337-344. (doi)
  2. Paul W. Oman, Jack R. Hagemeister: Construction and testing of polynomials predicting software maintainability. Journal of Systems and Software 24(3), 1994, pp. 251-266. (doi).
  3. Don M. Coleman, Dan Ash, Bruce Lowther, Paul W. Oman. Using Metrics to Evaluate Software System Maintainability. IEEE Computer 27(8), 1994, pp. 44-49. (doi, postprint)
  4. Kurt Welker. The Software Maintainability Index Revisited. CrossTalk, August 2001, pp 18-21. (pdf)
  5. Maintainability Index Range and Meaning. Code Analysis Team Blog, blogs.msdn, 20 November 2007.
  6. Ilja Heitlager, Tobias Kuipers, Joost Visser. A practical model for measuring maintainability. Proceedings 6th International Conference on the Quality of Information and Communications Technology, 2007. QUATIC 2007. (scholar)
  7. Dennis Bijlsma, Miguel Alexandre Ferreira, Bart Luijten, and Joost Visser. Faster Issue Resolution with Higher Technical Quality of Software. Software Quality Journal 20(2): 265-285 (2012). (doi, preprint). Page 14 addresses the Maintainability Index.
  8. Khaled El Emam, Saida Benlarbi, Nishith Goel, and Shesh N. Rai. The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics. IEEE Transactions on Software Engineering, 27(7):630-650, 2001. (doi, preprint)
  9. Dag Sjøberg, Bente Anda, and Audris Mockus. Questioning software maintenance metrics: a comparative case study. Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement (ESEM), 2012, pp. 107-110. (doi, postprint).
Edit September 2014

Included discussion on Sjøberg’s paper, the thresholds in Visual Studio, and the problems following from averaging in a power law.


© Arie van Deursen, August 2014.

Learning from Apple’s #gotofail Security Bug

Yesterday, Apple announced iOS 7.0.6, a critical security update for iOS 7 — and an update for OS X / Safari is likely to follow soon (if you haven’t updated iOS yet, do it now).

The problem turns out to be caused by a seemingly simple programming error, now widely discussed as #gotofail on Twitter.

What can we, as software engineers and educators of software engineers, learn from this high impact bug?

The Code

A careful analysis of the underlying problem is provided by Adam Langley. The root cause is in the following code:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

To any software engineer, the two consecutive goto fail lines will be suspicious. They are, and more than that. To quote Adam Langley:

The code will always jump to the end from that second goto, err will contain a successful value because the SHA1 update operation was successful and so the signature verification will never fail.

Not verifying a signature is exploitable. In the upcoming months, plenty of devices will still be running older versions of iOS and MacOS. These will remain vulnerable, and exploitable.

Brittle Software Engineering

When first seeing this code, I was once again struck by how incredibly brittle programming is. Just adding a single line of code can bring a system to its knees.

For seasoned software engineers this will not be a real surprise. But students and aspiring software engineers will have a hard time believing it. Therefore, sharing problems like this is essential, in order to create sufficient awareness among students that code quality matters.

Code Formatting is a Security Feature

When reviewing code, I try to be picky on details, including white spaces, tabs, and new lines. Not everyone likes me for that. I often wondered whether I was just annoyingly pedantic, or whether it was the right thing to do.

The case at hand shows that white space is a security concern. The correct indentation immediately shows something fishy is going on, as the final check now has become unreachable:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
    goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

Insisting on curly braces would highlight the fault even more:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
        goto fail;
    }
    goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

Indentation Must be Automatic

Because code formatting is a security feature, we must not do it by hand, but use tools to get the indentation right automatically.

A quick inspection of the sslKeyExchange.c source code reveals that it is not routinely formatted automatically: There are plenty of inconsistent spaces, tabs, and code in comments. With modern tools, such as Eclipse-format-on-save, one would not be able to save code like this.

Yet just forcing developers to use formatting tools may not be enough. We must also invest in improving the quality of such tools. In some cases, hand-made layout can make a substantial difference in the understandability of code. Perhaps current tools do not sufficiently acknowledge such needs, leading to under-use of today’s formatting tools.

Code Review to the Rescue?

Besides automated code formatting, critical reviews might also help. In the words of Adam Langley:

Code review can be effective against these sorts of bug. Not just auditing, but review of each change as it goes in. I’ve no idea what the code review culture is like at Apple but I strongly believe that my colleagues, Wan-Teh or Ryan Sleevi, would have caught it had I slipped up like this. Although not everyone can be blessed with folks like them.

While I fully subscribe to the importance of reviews, a word of caution is in place. My colleague Alberto Bacchelli has investigated how code review is applied today at Microsoft. His findings (published as a paper at ICSE 2013, and nicely summarized by Alex Nederlof as The Truth About Code Reviews) include:

There is a mismatch between the expectations and the actual outcomes of code reviews. From our study, review does not result in identifying defects as often as project members would like and even more rarely detects deep, subtle, or “macro” level issues. Relying on code review in this way for quality assurance may be fraught.

Automated Checkers to the Rescue?

If manual reviews won’t find the problem, perhaps tools can find it? Indeed, the present mistake is a simple example of a problem caused by unreachable code. Any computer science student will be able to write a (basic) “unreachable code detector” that would warn about the unguarded goto fail followed by more (unreachable) code (assuming parsing C is a ‘solved problem’).

Therefore, it is no surprise that plenty of commercial and open source tools exist to check for such problems automatically: Even using the compiler with the right options (presumably -Weverything for Clang) would warn about this problem.

Here, again, the key question is why such tools are not applied. The big problem with tools like these is their lack of precision, leading to too many false alarms. Forcing developers to wade through long lists of irrelevant warnings will do little to prevent bugs like these.

Unfortunately, this lack of precision is a direct consequence of the fact that unreachable code detection (like many other program properties of interest) is a fundamentally undecidable problem. As such, an analysis always needs to make a tradeoff between completeness (covering all suspicious cases) and precision (covering only cases that are certainly incorrect).

To understand the sweet spot in this trade off, more research is needed, both concerning the types of errors that are likely to occur, and concerning techniques to discover them automatically.

Testing is a Security Concern

As an advocate of unit testing, I wonder how the code could have passed a unit test suite.

Unfortunately, the testability of the problematic source code is very poor. In the current code, functions are long, and they cover many cases in different conditional branches. This makes it hard to invoke specific behavior, and to bring the functions into a state in which the given behavior can be tested. Furthermore, observability is low, especially since parts of the code deliberately obfuscate results to protect against certain types of attacks.

Thus, given the current code structure, unit testing will be difficult. Nevertheless, the full outcome can be tested, albeit at the system (not unit) level. Quoting Adam Langley again:

I coded up a very quick test site at https://www.imperialviolet.org:1266. […] If you can load an HTTPS site on port 1266 then you have this bug.

In other words, while the code may be hard to unit test, the system luckily has well defined behavior that can be tested.

Coverage Analysis

Once there is a set of (system- or unit-level) tests, the coverage of these tests can be used as an indicator for (the lack of) completeness of the test suite.

For the bug at hand, even the simplest form of coverage analysis, namely line coverage, would have helped to spot the problem: Since the problematic code results from unreachable code, there is no way to achieve 100% line coverage.

Therefore, any serious attempt to achieve full statement coverage should have revealed this bug.

Note that trying to achieve full statement coverage, especially for security or safety critical code, is not a strange thing to require. For aviation software, statement coverage is required for criticality “level C”:

Level C:

Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a major failure condition for the aircraft.

This is one of the weaker categories: For the catastrophic ‘level A’, even stronger test coverage criteria are required. Thus, achieving substantial line coverage is neither impossible nor uncommon.

The Return-Error-Code Idiom

Finally, looking at the full sources of the affected file, one of the key things to notice is the use of the return code idiom to mimic exception handling.

Since C has no built-in exception handling support, a common idiom is to insist that every function returns an error code. Subsequently, every caller must check this returned code, and include an explicit jump to the error handling functionality if the result is not OK.

This is exactly what is done in the present code: The global err variable is set, checked, and returned for every function call, and if not OK followed by (hopefully exactly one) goto fail.

Almost 10 years ago, together with Magiel Bruntink and Tom Tourwe, we conducted a detailed empirical analysis of this idiom in a large code base of an embedded system.

One of the key findings of our paper is that in the code we analyzed we found a defect density of 2.1 deviations from the return code idiom per 1000 lines of code. Thus, in spite of very strict guidelines at the organization involved, we found many examples of not just unchecked calls, but also incorrectly propagated return codes, or incorrectly handled error conditions.

Based on that, we concluded:

The idiom is particularly error prone, due to the fact that it is omnipresent as well as highly tangled, and requires focused and well-thought programming.

It is sad to see our results from 2005 confirmed today.


© Arie van Deursen, February 22, 2014.


EDIT February 24, 2014: Fixed incorrect use of ‘dead code detection’ terminology into correct ‘unreachable code detection’, and weakened the claim that any computer science student can write such a detector based on the reddit discussion.


Teaching Software Architecture: with GitHub!

Arie van Deursen, Alex Nederlof, and Eric Bouwers.

When teaching software architecture it is hard to strike the right balance between practice (learning how to work with real systems and painful trade offs) and theory (general solutions that any architect needs to thoroughly understand).

To address this, we decided to try something radically different at Delft University of Technology:

GitHub Octocat

  • To make the course as realistic as possible, we based the entire course on GitHub. Thus, teams of 3-4 students had to adopt an active open source GitHub project and contribute. And, in the style of “How GitHub uses GitHub to Build GitHub“, all communication within the course, among team members, and with external stakeholders took place through GitHub where possible.

Rozanski & Woods

  • Furthermore, to ensure that students actually digested architectural theory, we made them apply the theory to the actual project they picked. As sources we used literature available on the web, as well as the software architecture textbook written by Nick Rozanski and Eoin Woods.

To see if this worked out, here’s what we did (the course contents), how we did it (the course structure, including our use of GitHub), and what most of us liked, or did not like so much (our evaluation).

Course Contents

GitHub Project Selection

At the start of the course, we let teams of 3-4 students adopt a project of choice on GitHub. We placed a few constraints, such as that the project should be sufficiently complex, and that it should provide software that is actually used by others — there should be something at stake.

Based on our own analysis of GitHub, we also provided a preselection of around 50 projects, which contained at least 200 pull requests (active involvement from a community) and an automated test suite (state of the art development practices).

The projects selected by the students included Netty, Jenkins, Neo4j, Diaspora, HornetQ, and jQuery Mobile, to name a few.

Making Contributions

To get the students involved in their projects, we encouraged them to make actual contributions. Several teams were successful in this, and offered pull requests that were actually merged. Examples include an improvement of the Aurelius Titan build process, or a fix for an open issue in HornetQ. Other contributions were in the form of documentation, for example, one team created a (popular) screencast on how to build a chat client/server system with Netty in under 15 minutes, posted on YouTube.

The open source projects generally welcomed these contributions, as illustrated by the HornetQ reaction:

Thanks for the contribution… I can find you more if you need to.. thanks a lot! :+1:

And concerning the Netty tutorial:

Great screencast, fantastic project.

Students do not usually get this type of feedback, and were proud to have made an impact. Or, as the CakePHP team put it:

All in all we are very happy with the reactions we got, and we really feel our contribution mattered and was appreciated by the CakePHP community.

DelftSWA twitter account

To make it easier to establish contact with such projects, we set up a dedicated @delftswa Twitter account. Many of the projects our students analyzed have their own Twitter accounts, and just mentioning a handle like @jenkinsci or @cakephp helped attract the project’s attention and establish the first contact.

Identifying Stakeholders

One of the first assignments was to conduct a stakeholder analysis:
understanding who had an interest in the project, what their interest was, and which possibly conflicting needs existed.

To do this, the students followed the approach to identify and engage stakeholders from Rozanski and Woods. They distinguish various stakeholder classes, and recommend looking for stakeholders who are most affected by architectural decisions, such as those who have to use the system, operate or manage it, develop or test it, or pay for it.

Stakeholder analysis for Aurelius Titan

Aurelius Stakeholders (click to enlarge)

To find the stakeholders and their architectural concerns, the student teams analyzed any information they could find on the web about their project. Besides documentation and mailing lists, this also included an analysis of recent issues and pull requests as posted on GitHub, in order to see what the most pressing concerns of the project were at that moment and which stakeholders played a role in these discussions.

Architectural Views

Since Kruchten’s classical “4+1 views” paper, it has been commonly accepted that there is no such thing as the architecture. Instead, a single system will have multiple architectural views.

Following the terminology from Rozanski and Woods, stakeholders have different concerns, and these concerns are addressed through views.
Therefore, with the stakeholder analysis in place, we instructed students to create relevant views, using the viewpoint collection from Rozanski and Woods as starting point.

Viewpoints from Rozanski & Woods

Rozanski & Woods Viewpoints

Some views the students worked on include:

  • A context view, describing the relationships, dependencies, and interactions between the system and its environment. For example, for Diaspora, the students highlighted the relations with users, ‘podmins’, other social networks, and the user database.

Diaspora Context Diagram

  • A development view, describing the system for those involved in building, testing, maintaining, and enhancing the system. For example, for neo4j, students explained the packaging structure, the build process, test suite design, and so on.

Neo4J Module View

  • A deployment view, describing the environment into which the system will be deployed, including the dependencies the system has on its runtime environment. For GitLab, for example, the students described the installations on the clients as well as on the server.

GitLab Deployment View

Besides these views, students could cover alternative ones taken from other software architecture literature, or they could provide an analysis of the perspectives from Rozanski and Woods, to cover such aspects as security, performance, or maintainability.

Again, students were encouraged not just to produce architectural documentation, but to challenge the usefulness of their ideas by sharing them with the projects they analyzed. This also forced them to focus on what was needed by these projects.

For example, the jQuery Mobile team was working on a new version aimed at realizing substantial performance improvements. Therefore, our students conducted a series of performance measurements comparing two versions as part of their assignment. Likewise, for CakePHP, our students found the event system of most interest, and contributed a well-received description of the new CakePHP Events System.

Software Metrics

For an architect, metrics are an important tool to discuss (desired) quality attributes (performance, scalability, maintainability, …) of a system quantitatively. To equip students with the necessary skills to use metrics adequately, we included two lectures on software metrics (based on our ICSE 2013 tutorial).

Students were asked to think of what measurements could be relevant for their projects. To that end, they first of all had to think of an architecturally relevant goal, in line with the Goal/Question/Metric paradigm. Goals picked by the students included analyzing maintainability, velocity in responding to bug reports, and improving application responsiveness.

Subsequently, students had to turn their project goal into 3 questions, and identify 2 metrics per question.

Brackets Analyzability


Many teams not only came up with metrics, but also used some scripting to actually conduct measurements. For example, the Brackets team decided to build RequireJS-Analyzer, a tool that crawls JavaScript code connected using RequireJS and analyzes the code, to report on metrics and quality.

Architectural Sketches

Being an architect is not just about views and metrics — it is also about communication. Through some luck, this year we were able to include an important yet often overlooked take on communication: The use of design sketches in software development teams.

We were able to attract Andre van der Hoek (UC Irvine) as guest speaker. To see his work on design sketches in action, have a look at the Calico tool from his group, in which software engineers can easily share design ideas they write at a sketch board.

In one of the first lectures, Andre presented Calico to the students. One of the early assignments was to draw any free format sketches that helped to understand the system under study.

The results were very creative. The students freed themselves from the obligation to draw UML-like diagrams, and used colored pencils and paper to draw whatever they felt like.

HornetQ Design Sketch

For example, the above picture illustrates the HornetQ API; below you see the Jenkins workflow, including dollars flowing from customers to the contributing developers.

Jenkins Design Sketch

Lectures From Industry

Last but not least, we wanted students to meet and learn from real software architects. Thus, we invited presentations from the trenches — architects working in industry, sharing their experience.

For example, in 2013 we had TU Delft alumnus Maikel Lobbezoo, who presented the architectural challenges involved in building the payment platform offered by Adyen. The Adyen platform is used to process gazillions of payments per day, across the world. Adyen has been one of the fastest growing tech companies in Europe, and its success can be directly attributed to its software architecture.

Adyen Flow

Maikel explained the importance of having a stateless architecture that is linearly scalable, redundant, and based on push notifications and idempotency. His key message was that the successful architect possesses a (rare) combination of development skills, communicative skills, and strategic business insight.

Our other guest lecture covered cyber-physical systems: addressing the key role software plays in safety-critical infrastructure systems such as road tunnels — a lecture provided by Eric Burgers from Soltegro. Here the core message revolved around traceability to regulations, the use of model-based techniques such as SysML, and the need to communicate with stakeholders with little software engineering experience.

Course Structure

Constraints

Some practical constraints of the course included:

  • The course was worth 5 credit points (ECTS), corresponding to 5 * 27 = 135 hours of work per student.
  • The course ran for half a semester, which is 10 consecutive weeks
  • Each week, there was time for two lectures of 90 minutes each
  • We had a total of over 50 students, eventually resulting in 14 groups of 3-4 students.
  • The course was intended for 4th-year students (median age 22), who were typically following the TU Delft master in Computer Science, Information Architecture, or Embedded Systems.

GitHub-in-the-Course

All communication within the course took place through GitHub, in the spirit of the inspirational video How GitHub uses GitHub to Build GitHub. As a side effect, this eliminated the need to use the (not so popular) Blackboard system for intra-course communication.

From GitHub, we obtained a (free) delftswa organization. We used it to host various repositories, for example for the actual assignments, for the work delivered by each of the teams, as well as for the reading material.

Each student team obtained one repository for themselves, which they could use to collaboratively work on their assignments. Since each group worked on a different system, we decided to make these repositories visible to the full group: In this way, students could see the work of all groups, and learn from their peers how to tackle a stakeholder analysis or how to assess security issues in a concrete system.

Having the assignment itself on a GitHub repository not only helped us to iteratively release and improve the assignment — it also gave students the possibility to propose changes to the assignment, simply by posting a pull request. Students actually did this, for example to fix typos or broken links, to pose questions, to provide answers, or to start a discussion to change the proposed submission date.

We considered whether we could make all student results public. However, we first of all wanted to provide students with a safe environment, in which failure could be part of the learning process. Hence it was up to the students to decide which parts of their work (if any) they wanted to share with the rest of the world (through blogs, tweets, or open GitHub repositories).

Time Keeping

One of our guiding principles in this course was that an architect is always eager to learn more. Thus, we expected all students to spend the full 135 hours that the course was supposed to take, whatever their starting point in terms of knowledge and experience.

This implies that it is not just the end result that primarily counts, but the progress the students have made (although there is of course, a “minimum” result that should be met at least).

To obtain insight in the learning process, we asked students to keep track of the hours they spent, and to maintain a journal of what they did each week.

For reasons of accountability such time keeping is a good habit for an architect to adopt. This time keeping was not popular with students, however.

Student Presentations

We spent only part of the lecturing time doing class room teaching.
Since communication skills are essential for the successful architect,
we explicitly allocated time for presentations by the students:

  • Halfway through the course, all teams presented a 3-minute pitch about their plans and challenges.
  • At the end of the course we organized a series of 15 minute presentations in which students presented their end results.

Each presentation was followed by a Q&A round, with questions coming from the students as well as an expert panel. The feedback concerned the architectural decisions themselves, as well as the presentation style.

Grading

Grading was done per team, based on the following items:

  • Final report: the main deliverable of each team, providing the relevant architectural documentation created by the team.
  • Series of intermediate (‘weekly’) reports corresponding to dedicated assignments on, e.g., metrics, particular views, or design sketches. These assignments took place in the first half of the course, and formed input for the final report;
  • Team presentations

Furthermore, there was one individual assignment, in which each student had to write a review report for the work of one of the other teams. Students did a remarkably good job at this, being critical as well as constructive. Besides allowing for individual grading, this also ensured each team received valuable feedback from 3-4 fellow students.

Evaluation

All in all, teaching this course was a wonderful experience. We gave the students considerable freedom, which they used to come up with remarkable results.

A key success factor was peer learning. Groups could learn from each other, and gave feedback to each other. Thus, students not only made an in-depth study of the single system they were working on, but also learned about 13 different architectures as presented by the other groups. Likewise, they not only learned about the architectural views and perspectives they needed themselves, but learned from their co-students how they used different views in different ways.

The use of GitHub clearly contributed to this peer learning. Any group could see, study, and learn from the work of any other group. Furthermore, anyone, teachers and students alike, could post issues, give feedback, and propose changes through GitHub’s facilities.

On the negative side, while most students were familiar with git, not all were. The course could be followed using just the bare minimum of git knowledge (add, commit, checkout, push, pull), yet the underlying git magic sometimes resulted in frustration with the students if pull requests could not be merged due to conflicts.

An aspect the students liked least was the time keeping. We are searching for alternative ways of ensuring that time is spent early on in the project, and ways in which we can assess the knowledge gain instead of the mere result.

One of the parameters of a course like this is the actual theory that is discussed. This year, we included views, metrics, sketches, as well as software product lines and service oriented architectures. Topics that were less explicit this year were architectural patterns, or specific perspectives such as security or concurrency. Some of the students indicated that they liked the mix, but that they would have preferred a little more theory.

Conclusion

In this course, we aimed at getting our students ready to take up a leading role in software development projects. To that end, the course put a strong focus on:

  • Close involvement with open source projects where something was at stake, i.e., which were actually used and under active development;
  • The use of peer learning, leveraging GitHub’s social coding facilities for delivering and discussing results;
  • A sound theoretical basis, acknowledging that architectural concepts can only be grasped and deeply understood when trying to put them into practice.

Next year, we will certainly follow a similar approach, naturally with some modifications. If you have suggestions for a course like this, or are aware of similar courses, please let us know!

Acknowledgments

A big thank you to all students who participated in IN4315 in 2013,
to jury members Nicolas Dintzner, Felienne Hermans, and Georgios Gousios, and guest lecturers Andre van der Hoek, Eric Burgers, Daniele Romano, and Maikel Lobbezoo for shaping this course!

Further Reading

  1. Zach Holman. How GitHub uses GitHub to Build GitHub. RubyConf, September 2011.
  2. Nick Rozanski and Eoin Woods. Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives. Addison-Wesley, 2012, 2nd edition.
  3. Diomides Spinellis and Georgios Gousios (editors). Beautiful Architecture: Leading Software Engineers Explain How They Think. O’Reilly Media, 2009.
  4. Amy Brown and Greg Wilson (editors). The Architecture of Open Source Applications. Volumes 1-2, 2012.
  5. Software Architecture Wikipedia page (thoughtfully and substantially edited by the IFIP Working Group on Software Architecture in 2012)
  6. Eric Bouwers. Metric-based Evaluation of Implemented Software Architectures. PhD Thesis, Delft University of Technology, 2013.
  7. Remco de Boer, Rik Farenhorst, Hans van Vliet: A Community of Learners Approach to Software Architecture Education. CSEE&T, 2009.
  8. Earlier use of GitHub in the class room by Neil Ernst while at UBC: Using GitHub for 3rd Year Software Engineering, and Teaching Advanced Software Engineering.

© Text: Arie van Deursen, Alex Nederlof, and Eric Bouwers, 2013.

© Architectural views: students of the TU Delft IN4315 course, 2013.