This is a quick note about the “Maintainability Index”, a metric aimed at assessing software maintainability, as I recently ran into developers and researchers who are (still) using it.
At first sight, this sounds like a great success of knowledge transfer from academic research to industry practice. Upon closer inspection, the Maintainability Index turns out to be problematic.
The Original Index
The Maintainability Index was introduced in 1992 by Paul Oman and Jack Hagemeister, originally presented at the International Conference on Software Maintenance (ICSM 1992) and later refined in a paper that appeared in IEEE Computer. It is a blend of several metrics, including Halstead’s Volume (HV), McCabe’s cyclomatic complexity (CC), lines of code (LOC), and percentage of comments (COM). For these metrics, the average per module is taken, and combined into a single formula:

Maintainability Index = 171 - 5.2 * ln(HV) - 0.23 * CC - 16.2 * ln(LOC) + 50 * sin(sqrt(2.4 * COM))
To arrive at this formula, Oman and Hagemeister started with a number of systems from Hewlett-Packard (written in C and Pascal in the late 80s, “ranging in size from 1000 to 10,000 lines of code”). For each system, engineers provided a rating (between 1 and 100) of its maintainability. Subsequently, 40 different metrics were calculated for these systems. Finally, statistical regression analysis was applied to find the best way to combine (a selection of) these metrics to fit the experts’ opinion. This eventually resulted in the given formula. The higher its value, the more maintainable a system is deemed to be.
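For reference, the 1994 variant of the formula (as published by Coleman et al. in IEEE Computer) can be sketched in code; the inputs are the per-module averages described above, and the function name is mine:

```python
import math

def maintainability_index(ave_hv, ave_cc, ave_loc, per_com):
    """1994 Maintainability Index (Coleman et al., IEEE Computer).

    ave_hv:  average Halstead Volume per module
    ave_cc:  average cyclomatic complexity per module
    ave_loc: average lines of code per module
    per_com: average percentage of comment lines per module
    """
    return (171
            - 5.2 * math.log(ave_hv)    # natural logarithm
            - 0.23 * ave_cc
            - 16.2 * math.log(ave_loc)
            + 50 * math.sin(math.sqrt(2.4 * per_com)))
```

Note how the only term rewarding anything is the comment percentage; all other terms penalize size-related measures.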
The maintainability index attracted quite some attention, also because the Software Engineering Institute (SEI) promoted it, for example in their 1997 C4 Software Technology Reference Guide. This report describes the Maintainability Index as “good and sufficient predictors of maintainability”, and “potentially very useful for operational Department of Defense systems”. Furthermore, they suggest that “it is advisable to test the coefficients for proper fit with each major system to which the MI is applied.”
Use in Visual Studio
Visual Studio Code Metrics were announced in February 2007. A November 2007 blogpost clarifies the specifics of the maintainability index included in it. The formula Visual Studio uses is slightly different, based on the 1994 version:
Maintainability Index = MAX(0, (171 - 5.2 * ln(Halstead Volume) - 0.23 * Cyclomatic Complexity - 16.2 * ln(Lines of Code) ) * 100 / 171)
As you can see, the constants are literally the same as in the original formula. The new definition merely rescales the index to a number between 0 and 100. Also, the comment metric has been removed.
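In code, the Visual Studio variant amounts to the following (a sketch of the formula above; the function name is mine):

```python
import math

def vs_maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """Visual Studio's Maintainability Index, rescaled to the range 0-100."""
    raw = (171
           - 5.2 * math.log(halstead_volume)
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(lines_of_code))
    # Negative raw values are clamped to 0; the division maps 171 to 100.
    return max(0.0, raw * 100 / 171)
```

The clamping means that very large, very complex code all collapses to a score of 0, losing any distinction between bad and worse.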
Furthermore, Visual Studio provides an interpretation:
| Index | Interpretation |
|---|---|
| MI >= 20 | High Maintainability |
| 10 <= MI < 20 | Moderate Maintainability |
| MI < 10 | Low Maintainability |
I have not been able to find a justification for these thresholds. The 1994 IEEE Computer paper used 85 and 65 (instead of 20 and 10) as thresholds, describing them as a good “rule of thumb”.
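Visual Studio’s bands translate into a simple classifier (a sketch using the thresholds above; the function name is mine):

```python
def vs_rating(mi):
    """Map a 0-100 Maintainability Index to Visual Studio's bands."""
    if mi >= 20:
        return "High Maintainability"
    if mi >= 10:
        return "Moderate Maintainability"
    return "Low Maintainability"
```

Swapping in 85 and 65 gives the 1994 rule of thumb instead; nothing in the published material explains why the two sets of thresholds differ so much.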
The metrics are available within Visual Studio, and are part of the code metrics power tools, which can also be used in a continuous integration server.
I encountered the Maintainability Index myself in 2003, when working on Software Risk Assessments in collaboration with SIG. Later, researchers from SIG published a thorough analysis of the Maintainability Index (first when introducing their practical model for assessing maintainability and later as section 6.1 of their paper on technical quality and issue resolution).
Based on this, my key concerns about the Maintainability Index are:
- There is no clear explanation for the specific derived formula.
- The only explanation that can be given is that all underlying metrics (Halstead Volume, cyclomatic complexity, lines of code) are strongly correlated with size (lines of code). But then simply measuring lines of code and taking the average per module is a much simpler metric.
- The Maintainability Index is based on the average per file of, e.g., cyclomatic complexity. However, as emphasized by Heitlager et al., these metrics follow a power-law distribution, and taking the average tends to mask the presence of high-risk parts.
- The set of programs used to derive the metric and evaluate it was small, and contained small programs only.
- For the experiments conducted, only a few programs were analyzed, and no statistical significance was reported. Thus, the results might just as well be due to chance.
- Tool smiths and vendors used the exact same formula and coefficients as the 1994 experiments, without any recalibration.
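The averaging concern is easy to demonstrate with made-up numbers: two modules can have identical average complexity while only one of them contains a high-risk function.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-function cyclomatic complexities for two modules.
evenly_spread = [5, 5, 5, 5, 5, 5, 5, 5]   # no risky functions
one_hotspot   = [1, 1, 1, 1, 1, 1, 1, 33]  # one very risky function

# The averages are identical, so any average-based index
# cannot distinguish the two modules...
assert mean(evenly_spread) == mean(one_hotspot) == 5.0
# ...even though one of them clearly contains a hotspot.
assert max(one_hotspot) > 30
```

A risk-profile approach (as proposed by Heitlager et al.), which reports the share of code in high-complexity functions, would flag the second module immediately.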
One could argue that any of these concerns is reason enough not to use the Maintainability Index.
These concerns are consistent with a recent (2012) empirical study, in which one application was independently built by four different companies. The researchers used these systems to compare maintainability against several metrics, including the Maintainability Index. Their findings include that size as a measure of maintainability has been underrated, and that the “sophisticated” maintenance metrics are overrated.
In summary, if you are a researcher, think twice before using the maintainability index in your experiments. Make sure you study and fully understand the original papers published about it.
If you are a tool smith or tool vendor, there is not much point in having several metrics that are all confounded by size. Check correlations between the metrics you offer, and if any of them are strongly correlated pick the one with the clearest and simplest explanation.
Last but not least, if you are a developer, and are wondering whether to use the Maintainability Index: Most likely, you’ll be better off looking at lines of code, as it gives easier-to-understand information on maintainability than a formula computed over averaged metrics confounded by size.
- Paul Oman and Jack Hagemeister. “Metrics for assessing a software system’s maintainability”. Proceedings International Conference on Software Maintenance (ICSM), 1992, pp. 337-344. (doi)
- Paul W. Oman, Jack R. Hagemeister: Construction and testing of polynomials predicting software maintainability. Journal of Systems and Software 24(3), 1994, pp. 251-266. (doi).
- Don M. Coleman, Dan Ash, Bruce Lowther, Paul W. Oman. Using Metrics to Evaluate Software System Maintainability. IEEE Computer 27(8), 1994, pp. 44-49. (doi, postprint)
- Kurt Welker. The Software Maintainability Index Revisited. CrossTalk, August 2001, pp 18-21. (pdf)
- Maintainability Index Range and Meaning. Code Analysis Team Blog, blogs.msdn, 20 November 2007.
- Ilja Heitlager, Tobias Kuipers, Joost Visser. A practical model for measuring maintainability. Proceedings 6th International Conference on the Quality of Information and Communications Technology, 2007. QUATIC 2007. (scholar)
- Dennis Bijlsma, Miguel Alexandre Ferreira, Bart Luijten, and Joost Visser. Faster Issue Resolution with Higher Technical Quality of Software. Software Quality Journal 20(2): 265-285 (2012). (doi, preprint). Page 14 addresses the Maintainability Index.
- Khaled El Emam, Saida Benlarbi, Nishith Goel, and Shesh N. Rai. The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics. IEEE Transactions on Software Engineering, 27(7):630-650, 2001. (doi, preprint)
- Dag Sjøberg, Bente Anda, and Audris Mockus. Questioning software maintenance metrics: a comparative case study. Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement (ESEM), 2012, pp. 107-110. (doi, postprint).
Edit September 2014
Included discussion on Sjøberg’s paper, the thresholds in Visual Studio, and the problems following from averaging in a power law.
© Arie van Deursen, August 2014.