| WHITE PAPER |
Software Code Internationalization
Self-Healing with AI
In preparing products for global markets, developers need to internationalize software code. Because developers are not always versed in every potential internationalization issue, such issues are often overlooked. Static code analysis tools are widely used in the industry to get the job done, but they have their own limitations.
Currently available static code analysis tools have both technical and business limitations. Technical limitations of current tools include:
- Being largely rule-based
- Significant number of false positives in the results
- No way to fix issues without human intervention
Business limitations of these tools include:
- Cumbersome to use
- High cost of ownership
- Inconsistent quality
NetApp’s software code internationalization self-healing with AI is a method that learns from the history of the source code to identify and categorize issues based on past mistakes and to generate suitable fixes. It applies artificial intelligence (AI) through a combination of classification learning models and pattern recognition models.
This approach breaks the source code into snippets and then tokenizes each snippet to identify all the elements present in it. A trained model determines whether a given piece of code contains an internationalization violation. If a snippet does contain an issue, the system further identifies the type of issue, such as “hardcoding,” “concatenation,” or “date/time formats.”
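The tokenization step described above can be sketched as follows. This is a minimal illustration, not NetApp’s implementation: the token pattern, the `tokenize` helper, and the `has_string_literal` stand-in (which a trained classifier would replace) are all hypothetical.

```python
import re

# Hypothetical token pattern: string literals, identifiers, single symbols.
TOKEN_RE = re.compile(r'"[^"]*"|\'[^\']*\'|\w+|[^\s\w]')

def tokenize(snippet):
    """Split a code snippet into tokens (strings, identifiers, symbols)."""
    return TOKEN_RE.findall(snippet)

def has_string_literal(tokens):
    """Heuristic stand-in for the trained model: flags string literals,
    a common signal for the 'hardcoding' violation class."""
    return any(t.startswith(('"', "'")) for t in tokens)

line = 'label.setText("Submit order");'
tokens = tokenize(line)   # the hard-coded "Submit order" appears as one token
flagged = has_string_literal(tokens)
```

In the real system, the tokens would be fed to the trained classifier rather than to a hand-written heuristic; the point of tokenizing first is that the model sees discrete elements (identifiers, literals, operators) instead of raw text.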
The user receives a highly accurate report containing very few false positives or false negatives.
This approach identifies issues automatically and generates fixes for them; hence, it is called a self-healing system.
The proposed method uses modules that are divided into backend and frontend.
Backend modules save the data extracted from the source code and host the ML models, which are published as application programming interfaces (APIs) to be consumed by the frontend systems.
1. Classification of code lines: training and testing [BEMOD1]: This module takes the source code being scanned and divides it according to its logical completeness (e.g., line-wise, function-wise). It then converts the code into vectors, labels each snippet with an error type and recommended fix, and passes it to the BEMOD2 prediction module.
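The vectorization and labeling that BEMOD1 performs could look like the sketch below. This is an illustrative assumption, not the paper’s actual model: the feature set (string-literal count, concatenation operators, a locale-sensitive date API) and the label vocabulary are invented for the example.

```python
def features(tokens):
    """Turn a token list into a small numeric feature vector.
    The three hypothetical features track common i18n violation signals."""
    return [
        sum(t.startswith(('"', "'")) for t in tokens),  # string literals
        tokens.count('+'),                              # concatenation operators
        int('SimpleDateFormat' in tokens),              # locale-sensitive date API
    ]

# A labeled training row as BEMOD1 might emit it:
# (feature vector, error type, recommended fix).
snippet_tokens = ['label', '.', 'setText', '(', '"Submit"', ')', ';']
row = (features(snippet_tokens), "hardcoding",
       "externalize the string to a resource bundle")
```

A production system would use a richer embedding (e.g., bag-of-words over the code base’s vocabulary or learned token embeddings), but the shape of the output is the same: one vector plus one label per snippet.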
2. Code line prediction [BEMOD2]: This module takes a snippet of code from BEMOD1 and identifies the type of violation it contains, based on the model trained for classification. It then generates the fix based on pattern recognition.
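Pattern-based fix generation can be sketched as a table of violation classes mapped to rewrite rules. Everything here is hypothetical: the `FIX_PATTERNS` table, the key-naming scheme, and the `bundle.getString` resource-bundle call are assumptions standing in for whatever fix templates the trained system learns.

```python
import re

# Hypothetical fix patterns keyed by the violation class BEMOD2 predicts.
FIX_PATTERNS = {
    # Replace a hard-coded literal with a lookup through a resource-bundle
    # helper; the key is derived mechanically from the literal's text.
    "hardcoding": (
        re.compile(r'"([^"]*)"'),
        lambda m: f'bundle.getString("key_{m.group(1).lower().replace(" ", "_")}")',
    ),
}

def generate_fix(snippet, violation):
    """Apply the rewrite rule for the predicted violation class."""
    pattern, repl = FIX_PATTERNS[violation]
    return pattern.sub(repl, snippet)

fixed = generate_fix('label.setText("Submit order");', "hardcoding")
```

In the described system, the patterns are learned from past fixes in the code history rather than written by hand; the mechanics of applying them to a flagged snippet are the same.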
3. Saving predictions for further model learning [BEMOD3]: This module saves all predictions made by BEMOD2 to the database for continued model learning.
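A minimal sketch of the persistence step, using an in-memory SQLite database as a stand-in for the backend store. The schema (snippet, violation, suggested fix, user verdict) is an assumption; the `user_verdict` column illustrates how the interactive report’s feedback could flow back into training.

```python
import sqlite3

# In-memory store standing in for the backend database BEMOD3 writes to.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE predictions (
    snippet TEXT, violation TEXT, suggested_fix TEXT, user_verdict TEXT)""")

def save_prediction(snippet, violation, fix, verdict=None):
    """Persist one BEMOD2 prediction; user_verdict is filled in later
    from the interactive report and feeds the next training cycle."""
    conn.execute("INSERT INTO predictions VALUES (?, ?, ?, ?)",
                 (snippet, violation, fix, verdict))
    conn.commit()

save_prediction('label.setText("Submit");', "hardcoding",
                'label.setText(bundle.getString("key_submit"));')
rows = conn.execute("SELECT violation FROM predictions").fetchall()
```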
Frontend modules interface between the users and the trained ML model(s) and their scanning capability.
1. Scanner [FEMOD1]: The Scanner scans the code, breaks it into logical snippets, tokenizes it, and sends it to BEMOD2 for prediction.
2. Interactive Reports [FEMOD2]: Display and confirm violations: These reports display the results of code scanning and the recommended fixes based on the training. They allow the user to mark any reported issue as a false positive or any missed issue as a false negative. The final report is used to apply the fixes automatically.
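The confirm-then-fix workflow above can be modeled with a small data structure. The `Finding` record and `final_report` filter are illustrative assumptions; the point is that only findings the user has not dismissed survive into the automatic-fix step.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    snippet: str
    violation: str
    fix: str
    false_positive: bool = False  # toggled by the user in FEMOD2

def final_report(findings):
    """Only confirmed findings survive into the auto-fix step."""
    return [f for f in findings if not f.false_positive]

findings = [
    Finding('label.setText("Save");', "hardcoding",
            'label.setText(bundle.getString("key_save"));'),
    Finding('log.debug("enter");', "hardcoding", "",
            false_positive=True),  # user dismissed: debug logs need no i18n
]
confirmed = final_report(findings)
```

Because the user’s verdicts are saved alongside the predictions (BEMOD3), each dismissed false positive also becomes a labeled counterexample for the next training cycle.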
3. Plugin [FEMOD3]: These modules show the same results inside an Integrated Development Environment (IDE), displaying the data on live code by highlighting it in appropriate colors.
Advantages over Previous Solutions
This technology minimizes internationalization effort, as most of the work is done by the self-healing system, which is ready to operate after initial training without the need for rule-based scanning.
This solution is highly accurate because the system derives its patterns from real data, and bug fixing is automated as well.
For more information, write to firstname.lastname@example.org