Think Twice Before Using the “Maintainability Index”

Code metrics results in VS2010

This is a quick note about the “Maintainability Index”, a metric aimed at assessing software maintainability, as I recently run into developers and researchers who are (still) using it.

The Maintainability Index was introduced at the International Conference on Software Maintenance in 1992. To date, it is included in Visual Studio (since 2007), in the recent (2012) JSComplexity and Radon metrics reporters for Javascript and Python, and in older metric tool suites such as verifysoft.

At first sight, this sounds like a great success of knowledge transfer from academic research to industry practice. Upon closer inspection, the Maintainability Index turns out to be problematic.

The Original Index

The Maintainabilty Index was introduced in 1992 by Paul Oman and Jack Hagemeister, originally presented at the International Conference on Software Maintenance ICSM 1992 and later refined in a paper that appeared in IEEE Computer. It is a blend of several metrics, including Halstead’s Volume (HV), McCabe’s cylcomatic complexity (CC), lines of code (LOC), and percentage of comments (COM). For these metrics, the average per module is taken, and combined into a single formula:

formula

To arrive at this formula, Oman and Hagemeister started with a number of systems from Hewlett-Packard (written in C and Pacscal in the late 80s, “ranging in size from 1000 to 10,000 lines of code”). For each system, engineers provided a rating (between 1-100) of its maintainability. Subsequently, 40 different metrics were calculated for these systems. Finally, statistical regression analysis was applied to find the best way to combine (a selection of) these metrics to fit the experts’ opinion. This eventually resulted in the given formula. The higher its value, the more maintainable a system is deemed to be.

The maintainability index attracted quite some attention, also because the Software Engineering Institute (SEI) promoted it, for example in their 1997 C4 Software Technology Reference Guide. This report describes the Maintainability Index as “good and sufficient predictors of maintainability”, and “potentially very useful for operational Department of Defense systems”. Furthermore, they suggest that “it is advisable to test the coefficients for proper fit with each major system to which the MI is applied.”

Use in Visual Studio

Visual Studio Code Metrics were announced in February 2007. A November 2007 blogpost clarifies the specifics of the maintainability index included in it. The formula Visual Studio uses is slightly different, based on the 1994 version:

Maintainability Index =
  MAX(0, (171 - 5.2 * ln(Halstead Volume)
             - 0.23 * Cyclomatic Complexity
             - 16.2 * ln(Lines of Code)
         ) * 100 / 171)

As you can see, the constants are literally the same as in the original formula. The new definition merely transforms the index to a number between 0 and 100. Also, the comment metrics has been removed.

Furthermore, Visual Studio provides an interpretation:

MI >= 20 High Maintainabillity
10 <= MI < 20 Moderate Maintainability
MI < 10 Low Maintainability

 

I have not been able to find a justification for these thresholds. The 1994 IEEE Computer paper used 85 and 65 (instead of 20 and 10) as thresholds, describing them as a good “rule of thumb”.

The metrics are available within Visual Studio, and are part of the code metrics power tools, which can also be used in a continuous integration server.

Concerns

I encountered the Maintainability Index myself in 2003, when working on Software Risk Assessments in collaboration with SIG. Later, researchers from SIG published a thorough analysis of the Maintainability Index (first when introducing their practical model for assessing maintainability and later as section 6.1 of their paper on technical quality and issue resolution).

Based on this, my key concerns about the Maintainability Index are:

  1. There is no clear explanation for the specific derived formula.
  2. The only explanation that can be given is that all underlying metrics (Halstead, Cyclomatic Complexity, Lines of Code) are directly correlated with size (lines of code). But then just measuring lines of code and taking the average per module is a much simpler metric.
  3. The Maintainability Index is based on the average per file of, e.g., cyclomatic complexity. However, as emphasized by Heitlager et al, these metrics follow a power law, and taking the average tends to mask the presence of high-risk parts.
  4. The set of programs used to derive the metric and evaluate it was small, and contained small programs only.
  5. Furthermore, the programs were written in C and Pascal, which may have rather different maintainability characteristics than current object-oriented languages such as C#, Java, or Javascript.
  6. For the experiments conducted, only few programs were analyzed, and no statistical significance was reported. Thus, the results might as well be due to chance.
  7. Tool smiths and vendors used the exact same formula and coefficients as the 1994 experiments, without any recalibration.

One could argue that any of these concerns is reason enough not to use the Maintainability Index.

These concerns are consistent with a recent (2012) empirical study, in which one application was independently built by four different companies. The researchers used these systems two compare maintainability and several metrics, including the Maintainability Index. Their findings include that size as a measure of maintainability has been underrated, and that the “sophisticated” maintenance metrics are overrated.

Think Twice

In summary, if you are a researcher, think twice before using the maintainability index in your experiments. Make sure you study and fully understand the original papers published about it.

If you are a tool smith or tool vendor, there is not much point in having several metrics that are all confounded by size. Check correlations between the metrics you offer, and if any of them are strongly correlated pick the one with the clearest and simplest explanation.

Last but not least, if you are a developer, and are wondering whether to use the Maintainability Index: Most likely, you’ll be better off looking at lines of code, as it gives easier to understand information on maintainability than a formula computed over averaged metrics confounded by size.

Further Reading

  1. Paul Omand and Jack Hagemeister. “Metrics for assessing a software system’s maintainability”. Proceedings International Conference on Software Mainatenance (ICSM), 1992, pp 337-344. (doi)
  2. Paul W. Oman, Jack R. Hagemeister: Construction and testing of polynomials predicting software maintainability. Journal of Systems and Software 24(3), 1994, pp. 251-266. (doi).
  3. Don M. Coleman, Dan Ash, Bruce Lowther, Paul W. Oman. Using Metrics to Evaluate Software System Maintainability. IEEE Computer 27(8), 1994, pp. 44-49. (doi, postprint)
  4. Kurt Welker. The Software Maintainability Index Revisited. CrossTalk, August 2001, pp 18-21. (pdf)
  5. Maintainability Index Range and Meaning. Code Analysis Team Blog, blogs.msdn, 20 November 2007.
  6. Ilja Heitlager, Tobias Kuipers, Joost Visser. A practical model for measuring maintainability. Proceedings 6th International Conference on the Quality of Information and Communications Technology, 2007. QUATIC 2007. (scholar)
  7. Dennis Bijlsma, Miguel Alexandre Ferreira, Bart Luijten, and Joost Visser. Faster Issue Resolution with Higher Technical Quality of Software. Software Quality Journal 20(2): 265-285 (2012). (doi, preprint). Page 14 addresses the Maintainability Index.
  8. Khaled El Emam, Saida Benlarbi, Nishith Goel, and Shesh N. Rai. The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics. IEEE Transactions on Software Engineering, 27(7):630:650, 2001. (doi, preprint)
  9. Dag Sjøberg, Bente Anda, and Audris Mockus. Questioning software maintenance metrics: a comparative case study. Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement (ESEM), 2012, pp. 107-110. (doi, postprint).
Edit September 2014

Included discussion on Sjøberg’s paper, the thresholds in Visual Studio, and the problems following from averaging in a power law.


© Arie van Deursen, August 2014.

7 thoughts on “Think Twice Before Using the “Maintainability Index”

  1. I have made jokes about this formula in my SE lectures in the past ten years, naming it a “most elaborate example of sophisticated curve fitting”. My favorite part is the sine of the square root of comments – why would there be a periodic raise and fall of maintain ability? But the sine as used never gets periodic and simply is low (= low maintainability) if there’s very few or very many comments. Genius!

  2. A fantastic post and great read.
    I am doing my MEng in Software Engineering and im looking at introducing a different way of writing endpoints on server side code. I have found that there’s millions and millions loc that over years gets unmanageable. Every new feature request ends in a new endpoint.

    I was hoping to stumble onto a formula i could use to measure maintainability in these systems, vs the new approach, but alas no luck so far. At least i found a way not measure it!

    Thank you for the post it was fantastic.

  3. I’ve been using the Visual Studio implementation of this maintainability metric for some years in multi-tier systems ranging from 100kloc to 750kloc and I’ve found it of significant practical use.

    Intuitively, a combination of simple linear length (lines of code) with cyclomatic complexity and the vocabularly required for reasoning (Halstead Volume) seems reasonable.

    Pragmatically, having the metric constantly visible while working (via Code Lens) is a great reminder to keep things managable as I go.

    Of course the thresholds are arbitary – they would be arbitary at any level. The maintainability of a function doesn’t suddenly become bad just because the score drops another two points.

    What’s useful about the maintainability index is the relative measure – given two functions with substantially different scores and limited available development effort, which change makes the most useful contribution to the overall health of the codebase?

  4. Pingback: Monitoring PL/SQL Code Evolution With PL/SQL Cop for SonarQube | Philipp Salvisberg's Blog

  5. Pingback: Canned Software – Lorenzo Solano Martinez

  6. “I have not been able to find a justification for these thresholds.” here it is:
    “For the thresholds we decided to break down this 0-100 range 80-20 so that we kept the noise level low and only flagged code that was really suspicious. We have:

    0-9 = Red
    10-19 = Yellow
    20-100 = Green”
    https://docs.microsoft.com/de-de/archive/blogs/codeanalysis/maintainability-index-range-and-meaning

    Keep in mind the 85,65 is the HP threshold for the unnormded formular with “+ comments”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s