Managing Complex Spreadsheets — The Story of PerfectXL

This week we finished grading for the software architecture course I’m teaching.

Like many teachers, I use a spreadsheet for grading together with my co-teachers and teaching assistants. In this case, we concurrently worked with five people on a Google Spreadsheet. The resulting spreadsheet is quite interesting:

  • The spreadsheet currently has 22 sheets (tabs).

  • There are input sheets for basic information on the over one hundred students in the class, the groups they form, and the rubrics we use.

  • There are input sheets from various forms the students used to enter course-related information.

  • There are input sheets for different sub-assignments, which the teachers and assistants use to enter subgrades for each rubric: Some grades are individual, others are per team. Such sheets also contain basic formulas to compute grades from rubrics.

  • There are overview sheets collecting the sub-grades from the various sheets and combining them into overall grades. The corresponding formulas can become quite tricky, involving rounding, lookups, sumproducts, thresholds, conditional logic based on the absence or presence of certain grades, etc.

  • There are various output sheets, to report grades to students, to export grades to the university’s administrative systems, and to offer diagrams showing grade distributions for educational assessments of the course.
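To give a feel for the kind of logic such overview sheets encode, here is a small sketch in Python. The weights, threshold, and cap rule are purely illustrative (they are not the actual course formulas), but the ingredients mirror the spreadsheet: a sumproduct-style weighted average, rounding, and conditional handling of missing grades.

```python
# Illustrative sketch of overview-sheet logic: a weighted combination of
# sub-grades with rounding, a pass threshold, and missing-grade handling.
# All weights and thresholds below are made up for illustration.

def combine_grades(subgrades, weights, pass_threshold=5.5):
    """Combine per-rubric sub-grades (None if missing) into a final grade."""
    if any(g is None for g in subgrades):
        return None  # a missing sub-grade means no final grade yet
    # SUMPRODUCT-style weighted average of the sub-grades
    total = sum(g * w for g, w in zip(subgrades, weights))
    grade = round(total / sum(weights), 1)  # round to one decimal
    # example threshold rule: any sub-grade below 4.0 caps the final grade
    if min(subgrades) < 4.0:
        grade = min(grade, pass_threshold - 0.5)
    return grade

print(combine_grades([7.0, 8.0, 6.5], [2, 1, 1]))   # 7.1
print(combine_grades([7.0, None, 6.5], [2, 1, 1]))  # None
```

In a spreadsheet, each of these rules becomes a nested formula fragment, and the combination of rounding, thresholds, and missing-value checks is exactly where the formulas "become quite tricky".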

The spreadsheet has a five-year history: each year we take the existing one, copy it, and remove the student data. We then adjust it to reflect the changes we made to the course (additional assignments, new grading policies, better rubrics, etc.).

Visualization of sheet dependencies

All in all, this spreadsheet has grown quite complex, and it is easy to make a mistake. For example, I once released incorrect grades, a rather stressful event for both my students and myself. All I had done wrong was forget the infamous FALSE argument needed in a VLOOKUP, despite being well aware of this “feature”. In this year’s spreadsheet we had duplicate student ids in a column where each row had to be unique, leading to a missing grade, and again extra effort and stress to resolve the problem as quickly as possible.
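Why is the missing FALSE argument so dangerous? Without it, VLOOKUP performs an approximate match: it assumes the key column is sorted and returns the last row whose key is less than or equal to the query, silently returning a wrong value instead of an error. The sketch below (with made-up data and hypothetical helper names) contrasts the two behaviors, and adds the kind of duplicate-id check that would have caught our other mistake.

```python
# Sketch of why VLOOKUP's fourth argument matters (illustrative data).
# With range_lookup omitted or TRUE, VLOOKUP assumes the key column is
# sorted and returns the last key <= the query: silently wrong for
# absent or unsorted keys.
import bisect

table = [(1001, 7.5), (1003, 6.0), (1008, 8.5)]  # (student id, grade)

def vlookup_approx(key, table):
    """Approximate match, like VLOOKUP's default: last id <= key."""
    ids = [row[0] for row in table]
    i = bisect.bisect_right(ids, key) - 1
    return table[i][1] if i >= 0 else None

def vlookup_exact(key, table):
    """Exact match, like VLOOKUP(..., FALSE): None if the id is absent."""
    for row_id, grade in table:
        if row_id == key:
            return grade
    return None

print(vlookup_approx(1005, table))  # 6.0 -- wrong: student 1005 does not exist
print(vlookup_exact(1005, table))   # None -- here the error is visible

# A check for duplicate ids in a column that must be unique:
ids = [row[0] for row in table]
duplicates = {i for i in ids if ids.count(i) > 1}
print(duplicates)  # set() -- empty, so no duplicates in this table
```

The approximate lookup returns a plausible-looking grade for a nonexistent student, which is precisely why such mistakes can go unnoticed until grades are released.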

I suspect that if you use spreadsheets seriously, for example for grading, you recognize the characteristics of my spreadsheet — and maybe your sheets are even more complicated.

Now I have an interest in spreadsheets that goes beyond that of the casual user: As a software engineering researcher, I have looked extensively at spreadsheets. I did this together with Felienne Hermans, first when she was doing her PhD under my supervision in the context of the Perplex project (co-funded by Microsoft) and then in the context of the Prose project (funded by the Dutch STW agency). From a research perspective, these projects were certainly successful, leading to a series of publications in such venues as ECOOP 2010, ICSE 2011-2013, ICSM, EMSE, and SANER.

But we did our research not just to publish papers: We also had (and have) the ambition to actually help the working spreadsheet user, as well as large organizations that depend on spreadsheets for business-critical decision making.

To that end, we founded a company, Infotron, which offers tooling, services, consultancy, and training to help organizations and individuals become more effective with their spreadsheets.

After several years of operating somewhat under the radar, the Infotron team (headed by CEO Matéo Mol) has now launched an online service, PerfectXL, in which users can upload a spreadsheet and have it analyzed. The service then helps in the following ways:

  • PerfectXL can visualize the architectural dependencies between sheets, as shown above for my course sheet;
  • PerfectXL can identify risks (such as the VLOOKUP problem I mentioned, interrupted ranges, or overly complex conditional logic);
  • PerfectXL can assess the structure and quality of the formulas in your sheet.

If this sounds interesting, you can try out the service for free at perfectxl.com. There are various pricing options that help Infotron run and further grow this service — pick the subscription that suits you and your organization best!

Even if you decide not to use the PerfectXL service, the site contains a lot of generally useful information, such as various hints and tips on how to create and maintain well-structured spreadsheets.

Enjoy!

Speaking in Irvine on Metrics and Architecture

At the end of last year I was honored to receive an invitation to present in the Distinguished Speaker Series at the Institute for Software Research at the University of California, Irvine.

I quickly decided that the topic to discuss would be our research on software architecture, and in particular our work on metrics for maintainability.

Irvine is one of the world’s leading centers for research in software architecture. The Institute for Software Research is headed by Richard Taylor, who supervised Roy Fielding when he wrote his PhD thesis covering the REST architectural style, and Nenad Medvidovic during his work on architecture description languages. Current topics investigated at Irvine include design and collaboration (André van der Hoek and David Redmiles of ArgoUML fame), software analysis and testing (James Jones), and programming languages (Cristina Lopes), to name a few. An overview of the group’s vision on software architecture can be found in their recently published textbook. In short, I figured that if there is one place to present our software architecture research, it must be Irvine.

The 90-minute talk will be loosely based on my keynote at the Brazilian Software Engineering Symposium (SBES 2012), which in turn is based on joint research with Eric Bouwers and Joost Visser (both from SIG). The full slides are available on SpeakerDeck, but here is the storyline along with some references.

A Software Risk Assessment (source: ICSM 2009)

The context of this research is a software risk assessment, in which a client using a particular system seeks independent advice (from a consultant) on the technical quality of the system as created by an external supplier.

How can the client be sure that the system built for them is of good quality? In particular, will it be sufficiently maintainable if the business context of the system changes? Will it be easy to adapt the system to an ever-changing world?

In situations like these, it is essential to be able to make objective, evidence-based statements about the maintainability of the system in question.

Is this possible? What role can metrics play? What are their inherent limitations? How can we know that a metric indeed captures certain aspects of maintainability? How should metric values be interpreted? How should proposals for new metrics be evaluated?

Simple answers to these questions do not exist. In this talk, I will summarize our current progress in answering these questions.

I will start out by summarizing four common pitfalls when using metrics in a software development project. Then, I will describe a metrics framework in which metrics are put into context by means of benchmarking and a quality model. Subsequently, I’ll zoom in on architectural metrics, focusing on metrics for encapsulation. I will discuss a proposal for a new metric, as well as its evaluation. The evaluation comprises both a quantitative assessment (using repository mining) of its construct validity (does it measure encapsulation?) and qualitative assessments of its usefulness in practice (by interviewing consultants who applied the metrics in their day-to-day work).

Based on this, I will reflect on the road ahead for empirical research in software metrics and architecture, emphasizing the need for shared datasets, as well as the use of qualitative research methods to evaluate practical impact.

The talk is scheduled for Friday, March 15, in Irvine. I sincerely hope to see you there!

If you can’t make it, Eric Bouwers and I will present a 3.5-hour tutorial based on the same material at ICSE 2013, in May in San Francisco. The tutorial will be more interactive, taking your experience into account where possible, and it will have a stronger emphasis on metrics (based on SIG’s ten years of experience using metrics in industry). Register now for our tutorial; we look forward to seeing you there!
