“Inspect and adapt” is an important principle of the Agile Manifesto. Principle 12 states: “At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.”
One of the important areas of focus in the “inspect and adapt” loop is software quality. Measuring software quality has always been a challenge, and it has become even more complex now that enterprises have adopted agility in project delivery and deployment.
Benchmarking is an established technique we employ to measure quality standards and improve continuously, by setting goals and learning from the experiences of other products and services.
Quality Bands is a framework based on the concept of benchmarking; it helps define the standard for product quality across segments such as mobile, web, desktop and components.
This method is conceptually similar to stock market sentiment. Market sentiment is the feeling or tone of a market, or its crowd psychology, as revealed through the activity and price movement of the securities traded in that market. Likewise, Quality Bands convey the sentiment of the metric across segments.
Deciding on the set of metrics to monitor quality standards is the first step for any organization or team. At Adobe, we have been using these metrics in Quality Bands:
• Percentage of fixed bugs
• Percentage of withdrawn bugs
• Percentage of deferred bugs
• Percentage of localization functional bugs
• Translation accuracy
• Functional efforts spent towards testing (in hours)
• Globalization readiness score
• Customer satisfaction score (CSAT)
• Net promoter score (NPS)
• Bug agility score
• Code coverage score
Using percentages instead of absolute numbers for bug data helps normalize the data and convey the sentiment of the metric. Each percentage is calculated against the total number of localization bugs reported in the defined time period. For example: percentage of localization fixed bugs = (number of localization bugs fixed / total number of localization bugs reported) * 100.
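As a minimal sketch of this calculation (the bug counts below are hypothetical, not Adobe data):

```python
def localization_fixed_percentage(fixed: int, total_reported: int) -> float:
    """Percentage of localization bugs fixed in the defined time period."""
    if total_reported == 0:
        return 0.0  # no localization bugs reported in the period
    return (fixed / total_reported) * 100

# Hypothetical example: 47 of 60 localization bugs were fixed in the period.
print(round(localization_fixed_percentage(47, 60), 1))  # 78.3
```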
Quality Bands are created separately across desktop, web and mobile, since we cannot meaningfully compare a desktop product with a mobile app or a web service. An important step here is to select the products that contribute to the band. There may be 10-12 product lines in the desktop category, but we chose a subset of them based on the following parameters:
Business Usage: Products that make a significant impact on the company’s business.
Product Domain: Products from diverse domains are included. For example, we have one product each from the photography, video, web publishing, document and marketing cloud domains. These domains might differ for other organizations, but the intention is to have representation from each domain.
Existence in the Market: Products that have existed in the market for at least a year and have an established customer base. Such products are expected to have evolved processes and standards.
The next step in the process is to collect data corresponding to the set of metrics listed above. For example, we have six products contributing to the Desktop Band — A, B, C, D, E and F.
Here is the data set for localization fixed bugs for each of these products: x, y, z, t, u, v (expressed in percentages). The data corresponds to a fixed timeline that could be release-based or follow a quarterly cadence.
Based on these values, we calculate the mean (x̄) and standard deviation (σ) for these data points. In statistics, the standard deviation (SD, also represented by the Greek letter sigma, σ, or the Latin letter s) is a measure used to quantify the amount of variation or dispersion in a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
The band range is defined by the lower and upper band values as per the following calculation: lower band = x̄ - σ; upper band = x̄ + σ.
Empirical Rule: if data is normally distributed, around 68% of the values fall within one standard deviation of the mean, and about 95% fall within two standard deviations of the mean.
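Here is a minimal sketch of the band calculation using Python’s standard statistics module, with hypothetical accuracy values for products A-F; whether to use population or sample standard deviation is a team choice, and the population form is shown:

```python
import statistics

# Hypothetical translation-accuracy percentages for products A-F.
accuracy = [94.0, 96.5, 91.8, 89.9, 98.0, 93.2]

mean = statistics.mean(accuracy)      # x̄
sigma = statistics.pstdev(accuracy)   # population standard deviation, σ

lower_band = mean - sigma             # x̄ - σ
upper_band = mean + sigma             # x̄ + σ

print(f"band range: {lower_band:.1f}% to {upper_band:.1f}%")
```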
For Quality Bands, we use the first standard deviation to calculate the upper and lower band values. Once these values are calculated, we create a plot that looks like the chart in Figure 1 (a plotting sketch follows the list below). The band in Figure 1 lies within the range 88.2-99.5%. The red line indicates the area of focus for the metric. Lower translation accuracy could be due to any of the following factors:
• Issues with the string translation — grammatical, legal and so on.
• Translated string does not match the user interface context.
• Preferential changes where the linguists have a better suggestion for the existing string.
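For illustration, a band plot of this kind could be sketched with matplotlib; the values are the same hypothetical numbers as above, not the data behind Figure 1:

```python
import statistics
import matplotlib.pyplot as plt

products = ["A", "B", "C", "D", "E", "F"]
accuracy = [94.0, 96.5, 91.8, 89.9, 98.0, 93.2]  # hypothetical values

mean = statistics.mean(accuracy)
sigma = statistics.pstdev(accuracy)

# Bars for each product, with the band boundaries drawn as horizontal lines.
plt.bar(products, accuracy)
plt.axhline(mean - sigma, color="red", linestyle="--", label="lower band")
plt.axhline(mean + sigma, color="green", linestyle="--", label="upper band")
plt.ylabel("Translation accuracy (%)")
plt.title("Translation accuracy, desktop band (hypothetical data)")
plt.legend()
plt.show()
```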
The band plots are created for each of the metrics, and the band values signify the sentiment of the metric in the chosen category. If we have another product, G, in the desktop category with 75% translation accuracy, we plot it against the standard plot and analyze it with regard to the standard band. Product G is clearly an outlier, since the band’s lower bound is 88.2% (a simple membership check is sketched after the list below). This indicates scope for improvement, as the product does not lie within the desktop sentiment. The next step is a deeper, double-click analysis of product G to identify why its accuracy is lower than that of the other products. The reasons could be:
• No process in the product to share context with linguists for translation.
• Linguists not using the shared context for translation.
• Strings were checked in late in the cycle, which forced translations to be done on a quick turnaround.
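A minimal sketch of the outlier check, using the Figure 1 band range (88.2-99.5%) and the hypothetical 75% value for product G:

```python
def within_band(value: float, lower: float, upper: float) -> bool:
    """True when a metric value lies inside the quality band."""
    return lower <= value <= upper

lower_band, upper_band = 88.2, 99.5   # translation-accuracy band from Figure 1
product_g = 75.0                      # hypothetical value for product G

# False -> product G is an outlier and warrants a deeper analysis.
print(within_band(product_g, lower_band, upper_band))
```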
The applicable reason identified is then used to set a goal for process improvement in the next cycle. This method leads the teams to set concrete goals based on actual data and hence there is no gap between the actual picture and the goals being set.
This is an iterative process that gives teams a holistic view of software quality at each point in time. The products that contribute to the band also have inherent goals for process improvement. For instance, in the translation accuracy band, products D and F could also identify the possible reasons for lower accuracy and set actionable goals, which could make the band itself better in the next iteration: the band improves as its lower bound rises above its previous value.
Overall, this method provides a scientific approach to software quality measurement and fits the lean mindset for product development: teams improve how they work by acting on a hypothesis and observing what happens through the data in the next iteration.