Automated Bug Fixing Tools and Techniques

Artificial Intelligence (AI) has been applied to many areas of Software Engineering. Software debugging is one such area: it is a cognitively demanding task that consumes a large share of developer time and effort. Recognizing this, many large technology companies are investing heavily in automating the bug fixing of their software. Although research in Automated Bug Fixing is still maturing, many innovative tools and techniques have already been proposed that require little to no human intervention.

Automated Bug Fixing, also known as Automated Program Repair, is the automatic fixing of software bugs without the intervention of a human developer.

INTRODUCTION

Automated Bug Fixing tools require some data as input, including the buggy source code and a specification describing the correct behavior of the software. Given these inputs, the tools suggest a suitable code patch as output, which replaces the buggy code to fix the specific bug without breaking other functionality. Some techniques require additional data sources as input, such as the history of code changes from the version control system or the history of bug fixes from the bug tracker.

Figure: Overall Automated Bug Fixing Process

The majority of Automated Bug Fixing techniques use test suites as the specification that defines the correctness criteria: at least one failing test reveals the bug, and the passing tests indicate the behavior that should be preserved. Framed this way, the goal of an Automated Bug Fixing technique is to suggest one or more code patches (sets of changes to the source code) that make all the tests pass. Since a test suite is usually an incomplete oracle that does not cover all possible inputs, the generated patches may not be correct, and human involvement may be required to validate both the correctness and the quality of a patch. To improve the quality of generated patches, test suites may be supplemented with information from other artifacts, including documentation, formal specifications, language specifications, type systems, the code change history from the version control system and the bug fix history from the bug tracker.

EXAMPLE

A software developer pushes code containing the statements shown below to a shared repository. The central Continuous Integration (CI) system triggers a set of regression tests after the build succeeds and then reports the failed tests to the developer. The developer now needs to analyze the logs and reports to understand and fix the problem. After a lot of contemplation and debugging, the developer fixes the bug by adding a null check, as indicated in the code snippet below.

Instead of the CI system merely informing the developer of the failing tests, it could suggest suitable fixes to the developer. It may even be possible to fix the bug automatically and test the software again.

List<Address> addresses = geocoder.getFromLocation(lat, lng, 1);
if (addresses.size() != 0)
{
    Address obj = addresses.get(0);
    String currentAddress = "";
    if (obj != null) // Bug Fix Code Statement
    {
        if (obj.getLocality() != null)
        {
            currentAddress = obj.getLocality() + ", ";
        }
        currentAddress = currentAddress + obj.getCountryName();
    }
    else
    {
        currentAddress = "";
    }
    return currentAddress;
}
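To make the role of the failing test concrete, here is a minimal, self-contained sketch of the same logic together with a regression test that would reveal the bug. The AddressFormatter class, the nested Address stand-in and the nullEntryTestPasses method are hypothetical names introduced only for illustration; the real snippet depends on platform classes that are not reproduced here, and returning an empty string for an empty list is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.List;

final class AddressFormatter {

    /** Hypothetical stand-in for the platform Address type used in the snippet. */
    static final class Address {
        private final String locality;
        private final String countryName;

        Address(String locality, String countryName) {
            this.locality = locality;
            this.countryName = countryName;
        }

        String getLocality() { return locality; }
        String getCountryName() { return countryName; }
    }

    /** Patched logic: the first list element is checked for null before use. */
    static String format(List<Address> addresses) {
        if (addresses.isEmpty()) {
            return ""; // assumption: an empty result list yields an empty string
        }
        Address obj = addresses.get(0);
        if (obj == null) { // the guard added by the bug fix
            return "";
        }
        String locality = obj.getLocality() != null ? obj.getLocality() + ", " : "";
        return locality + obj.getCountryName();
    }

    /** Regression test: a list whose only entry is null must not crash. */
    static boolean nullEntryTestPasses() {
        try {
            return format(Collections.<Address>singletonList(null)).isEmpty();
        } catch (NullPointerException e) {
            return false; // the outcome the unpatched code would produce
        }
    }

    public static void main(String[] args) {
        System.out.println(nullEntryTestPasses()); // prints true
        System.out.println(format(Collections.singletonList(new Address("Paris", "France"))));
    }
}
```

Without the null guard, the call inside nullEntryTestPasses would throw a NullPointerException, which is exactly the kind of failing test a repair tool would take as its starting point.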

TECHNIQUES

The first step for many Automated Bug Fixing techniques is identifying the areas of the source code that are likely to contain the error, commonly referred to as fault localization. Fault localization can be as simple as reading stack traces and bug reports, or it can employ more sophisticated techniques such as statistical debugging, which produces a ranked list of potentially faulty code statements. Automated Bug Fixing techniques can be grouped into four broad categories: Heuristic-Based, Synthesis-Based, Learning-Based and Template-Based.
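As a rough sketch of how a ranked list of suspicious statements can be produced, the example below scores each statement from test coverage using the Ochiai formula, one commonly used spectrum-based metric. The choice of Ochiai is an assumption made here for illustration and is not prescribed by any particular tool mentioned in this article; the class name and the coverage matrix layout are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpectrumFaultLocalization {

    /** Ochiai score: failedCovering / sqrt(totalFailed * (failedCovering + passedCovering)). */
    static double ochiai(int failedCovering, int passedCovering, int totalFailed) {
        double denominator = Math.sqrt((double) totalFailed * (failedCovering + passedCovering));
        return denominator == 0 ? 0.0 : failedCovering / denominator;
    }

    /**
     * coverage[t][s] is true if test t executed statement s;
     * failed[t] is true if test t failed.
     * Returns statement indices ordered from most to least suspicious.
     */
    static List<Integer> rank(boolean[][] coverage, boolean[] failed) {
        int statements = coverage.length == 0 ? 0 : coverage[0].length;
        int totalFailed = 0;
        for (boolean f : failed) {
            if (f) totalFailed++;
        }
        double[] score = new double[statements];
        for (int s = 0; s < statements; s++) {
            int failedCovering = 0;
            int passedCovering = 0;
            for (int t = 0; t < coverage.length; t++) {
                if (coverage[t][s]) {
                    if (failed[t]) failedCovering++; else passedCovering++;
                }
            }
            score[s] = ochiai(failedCovering, passedCovering, totalFailed);
        }
        List<Integer> order = new ArrayList<>();
        for (int s = 0; s < statements; s++) {
            order.add(s);
        }
        // Highest score first; ties keep their original statement order.
        order.sort(Comparator.comparingDouble((Integer s) -> -score[s]));
        return order;
    }

    public static void main(String[] args) {
        boolean[][] coverage = {
            {true, true, false},  // passing test
            {true, false, false}, // passing test
            {true, false, true}   // failing test
        };
        boolean[] failed = {false, false, true};
        System.out.println(rank(coverage, failed)); // prints [2, 0, 1]
    }
}
```

Statement 2 is executed only by the failing test, so it is ranked first; statement 0 is executed by every test and is less suspicious; statement 1 is never executed by a failing test and comes last.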

1. Heuristic-Based. Heuristic-Based techniques use different algorithms and rules to generate multiple candidate patches, each of which is applied to the source code and validated by testing the modified source code. A candidate is selected as the bug fix if it passes all the tests in the test suite. Heuristic-Based techniques typically do not provide correctness guarantees, because test suites are incomplete and the search space of candidate patches grows very quickly. However, the quality of the patches they generate can be improved by complementing the technique with machine learning based approaches or other additional data.

2. Synthesis-Based. Synthesis-Based techniques use symbolic execution to extract repair constraints that the correctly repaired code must satisfy, and then solve those constraints with constraint solving or search techniques. In this category, the repair problem is stated as a synthesis problem.

3. Learning-Based. Learning-Based techniques apply machine learning and deep learning approaches to Automated Bug Fixing. These techniques may gain momentum in the coming years, owing to the growing use of machine learning for Software Engineering problems. One of the key challenges in this category is obtaining large amounts of high-quality human-written patches as training data.

4. Template-Based. Template-Based techniques generate bug fixes for specific types of errors, such as null pointer exceptions, buffer overflows and memory leaks. These techniques use specific repair templates to fix bugs within a limited scope, for example inserting a conditional statement that guards a dereference operation with a null-pointer check, or inserting a missing memory deallocation statement. Compared to other techniques, Template-Based techniques trade off repair scope for readability and correctness of the resulting patches.
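The generate-and-validate loop described under the Heuristic-Based category can be sketched in a few lines. The example assumes a hypothetical bug in which a condition written as "age > 18" should accept the value 18; the candidate patches simply replace the relational operator, and the small test suite is invented for illustration. Real tools search far larger patch spaces and run real test suites.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class GenerateAndValidate {

    /** A relational operator that can be plugged into the suspicious condition. */
    interface IntRelation {
        boolean test(int left, int right);
    }

    /** Candidate patches: each one replaces the operator in the buggy "age > 18". */
    static Map<String, IntRelation> candidates() {
        Map<String, IntRelation> operators = new LinkedHashMap<>();
        operators.put("<", (a, b) -> a < b);
        operators.put("<=", (a, b) -> a <= b);
        operators.put("==", (a, b) -> a == b);
        operators.put("!=", (a, b) -> a != b);
        operators.put(">=", (a, b) -> a >= b);
        return operators;
    }

    /** Hypothetical test suite for the condition "age OP 18". */
    static boolean passesAllTests(IntRelation op) {
        boolean failingTestNowPasses = op.test(18, 18);                      // reveals the bug
        boolean passingTestsStillPass = op.test(30, 18) && !op.test(5, 18);  // behavior to preserve
        return failingTestNowPasses && passingTestsStillPass;
    }

    /** Tries candidates in order and returns the first one that passes every test. */
    static Optional<String> repair() {
        for (Map.Entry<String, IntRelation> candidate : candidates().entrySet()) {
            if (passesAllTests(candidate.getValue())) {
                return Optional.of(candidate.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(repair().orElse("no patch found")); // prints >=
    }
}
```

The loop stops at the first candidate that passes the suite, which is why the result is only as trustworthy as the tests themselves.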

INTEGRATION

Depending on the complexity of the bug and the kind of technique used, the suggested patch may be applied and tested automatically in the Continuous Integration (CI) pipeline, or it may be suggested to a human programmer to apply and validate manually. Automated Bug Fixing techniques can be incorporated into the development process in different ways, such as:

Integrated Development Environment (IDE). Automated Bug Fixing tools can be incorporated into the bug detection tools that are already integrated in development environments. The repairs may be applied automatically, with the developers notified afterwards, or they may be suggested to the developers to apply manually.

Continuous Integration (CI) Server. CI servers, such as Jenkins, are probably a good choice for integrating Automated Bug Fixing techniques into the development process. The CI server also provides the necessary inputs for repair tools that use test suites as correctness specifications. As indicated in the figure below, Automated Bug Fixing can become an activity in the CI pipeline that is triggered by regression test failures.

Figure: Integrating Automated Bug Fixing in CI Pipeline

Production. When a fault occurs at runtime after the program has been deployed in production, Automated Bug Fixing tools can suggest patches based on bug reports or stack traces.
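The choice between applying a patch automatically and handing it to a developer can be pictured as a small decision step inside the CI pipeline. The sketch below is purely illustrative: the CiRepairStep class, the Action values and the autoApplyAllowed flag are hypothetical names and do not correspond to the API of any CI product.

```java
final class CiRepairStep {

    enum Action { NONE, APPLY_PATCH, SUGGEST_PATCH, NOTIFY_DEVELOPER }

    /**
     * Decides what the pipeline does after the regression tests have run.
     * All parameters are assumptions of this sketch, not settings of a real CI server.
     */
    static Action decide(boolean regressionTestsFailed,
                         boolean patchFound,
                         boolean patchPassesAllTests,
                         boolean autoApplyAllowed) {
        if (!regressionTestsFailed) {
            return Action.NONE; // nothing to repair
        }
        if (!patchFound || !patchPassesAllTests) {
            return Action.NOTIFY_DEVELOPER; // fall back to the manual workflow
        }
        return autoApplyAllowed ? Action.APPLY_PATCH : Action.SUGGEST_PATCH;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, true, false)); // prints SUGGEST_PATCH
    }
}
```

A team that wants a human to validate every patch would keep autoApplyAllowed set to false, so that validated patches are only ever suggested.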

TOOLS

There are many implementations of Automated Bug Fixing techniques. Most of them are research prototypes, and only a few have been put into industrial practice. Some of the popular tools include:

1. Facebook Getafix. Facebook has developed a tool named Getafix, which automatically finds repairs for bugs and suggests them to developers. The tool provides fixes for bugs found by Infer, the static analysis tool used at Facebook. In addition, it powers SapFix, which suggests fixes for bugs discovered by the automated testing tool Sapienz. Getafix employs a clustering algorithm to learn fix patterns from previous code changes and analyzes the context around the faulty code to find appropriate fixes.

2. GenProg. GenProg is one of the earliest generate-and-test tools. It uses genetic programming to search for bug fixes. The tool evaluates generated repairs against test suites, and the process terminates when it finds a repair that fixes the bug while preserving all required functionality.

3. Angelix. Angelix is a synthesis-based, test-driven tool for C programs. It can generate multiline repairs for large programs, including repairs that span multiple dependent buggy locations.

4. Prophet. Prophet is one of the first Automated Bug Fixing tools for C programs to use machine learning and static analysis to train a probabilistic model of patch correctness from the history of successful human patches. It then uses the learned model to generate and prioritize potentially correct patches in the search space.

5. ClearView. ClearView is a generate-and-validate tool that employs learned invariants to generate repair patches for fixing fatal defects and security vulnerabilities.

6. DeepFix. DeepFix fixes common programming errors using a neural network trained to predict the location of the erroneous code along with the corrected statement.

7. Repairnator. Repairnator is an open-source tool for automatically fixing test failures, compilation errors, static analysis errors and program crashes. Repairnator integrates with continuous integration tools, including Travis CI and Jenkins, and opens pull requests with its fixes.

8. PAR. PAR is a generate-and-validate tool which uses manually defined fix templates.

9. LeakFix. LeakFix is a memory-leak fixing tool for C programs which generates fixes by inserting free() statements at certain program points.

10. NPEfix. NPEfix is an automatic bug fixing tool for NullPointerException errors in Java.

CHALLENGES

There are various open challenges to be addressed by future research in this domain.

1. Correctness. All Automated Bug Fixing techniques rely on some measure of patch correctness, that is, evidence that the generated patch actually fixes the bug. Most techniques use test suites as the correctness specification, i.e. a patch is considered correct if it passes all the tests. However, test suites are often incomplete and do not cover all possible scenarios, so a generated patch may pass validation and still be incorrect. Such a patch is said to overfit: it produces the correct output for every input in the test suite but not for inputs the test suite does not cover. Most techniques today do not provide correctness guarantees, which are essential for applying Automated Bug Fixing in safety-critical domains. Because specifications such as test suites are weak and incomplete, many tools still consider a human in the loop necessary, and complete automation has not yet been achieved.

2. Scope. Research on Automated Bug Fixing has largely focused on specific kinds of errors, simple programs and small modifications. Moreover, the tools target only specific stages of the development process and may not work for bugs in production. A general-purpose Automated Bug Fixing tool that handles all kinds of bugs, makes complex multi-location modifications in large programs and fixes bugs in programs deployed in production is still a distant prospect.

3. Quality. Beyond correctness, an automatically generated fix should be readable and maintainable in the long term. Tackling this challenge is one of the most important steps toward real-life adoption of Automated Bug Fixing techniques.
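The overfitting problem described under the Correctness challenge is easy to reproduce on a toy scale. In the sketch below, two hypothetical patches for a buggy absolute-value function both pass an invented three-test suite, yet only one of them is correct for an input the suite does not cover. All names and test values are assumptions made for illustration.

```java
import java.util.function.IntUnaryOperator;

final class OverfittingDemo {

    /** Overfitted patch: special-cases the one input used by the failing test. */
    static final IntUnaryOperator OVERFITTED = x -> x == -3 ? 3 : x;

    /** Correct patch: negates every negative input. */
    static final IntUnaryOperator CORRECT = x -> x < 0 ? -x : x;

    /** The (incomplete) test suite that serves as the correctness specification. */
    static boolean passesTestSuite(IntUnaryOperator abs) {
        return abs.applyAsInt(-3) == 3   // the test that revealed the bug
            && abs.applyAsInt(5) == 5    // previously passing tests
            && abs.applyAsInt(0) == 0;
    }

    /** An input that the test suite does not cover. */
    static boolean passesHeldOutCheck(IntUnaryOperator abs) {
        return abs.applyAsInt(-7) == 7;
    }

    public static void main(String[] args) {
        System.out.println(passesTestSuite(OVERFITTED));    // prints true
        System.out.println(passesTestSuite(CORRECT));       // prints true
        System.out.println(passesHeldOutCheck(OVERFITTED)); // prints false
        System.out.println(passesHeldOutCheck(CORRECT));    // prints true
    }
}
```

From the test suite alone the two patches are indistinguishable, which is why a human reviewer or a stronger specification is still needed to tell them apart.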

CONCLUSION

Automated Bug Fixing is an active research domain with much work still to be done. It can substantially reduce manual debugging effort and help companies manage their resources efficiently. It takes us one step closer to the ultimate goal of creating learning-based AI systems that can write computer programs.

MALIK UMER BLOG
