6 years ago · 6dbd01393d
--- a/code/README.md
+++ b/code/README.md
@@ -6,7 +6,7 @@ The implementation uses "line number mappings" as proposed by Williams and Spacc
 
				 This repository responds to the call for public SZZ implementations by Rodríguez-Pérez, Robles, and González-Barahona. ["Reproducibility and Credibility in Empirical Software Engineering: A Case Study Based on a Systematic Literature Review of the use of the SZZ Algorithm"](https://www.researchgate.net/publication/323843822_Reproducibility_and_Credibility_in_Empirical_Software_Engineering_A_Case_Study_based_on_a_Systematic_Literature_Review_of_the_use_of_the_SZZ_algorithm), *Information and Software Technology*, Volume 99, 2018.
			
 
				 
			
 
				 If you find SZZ Unleashed useful for your research, please cite our paper:
			
 
				-- Borg, M., Svensson, O., Berg, K., & Hansson, D. (2019). SZZ Unleashed: An Open Implementation of the SZZ Algorithm - Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project. arXiv preprint [arXiv:1903.01742](https://arxiv.org/abs/1903.01742).
			
 
				+- Borg, M., Svensson, O., Berg, K., & Hansson, D. (2019). SZZ Unleashed: An Open Implementation of the SZZ Algorithm - Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project. In *Proc. of the Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)*, pp. 7-12. arXiv preprint [arXiv:1903.01742](https://arxiv.org/abs/1903.01742).
			
 
				 
			
 
				 # Table of Contents
			
 
				 1. [Background](#background)
			
@@ -24,36 +24,36 @@ The bug-introducing commits can be extracted either from a bug tracking system s
 
				 ## Running SZZ Unleashed <a name="szz_usage"></a>
			
 
				 Building and running SZZ Unleashed requires Java 8 and Gradle. Python is required to run the supporting scripts, and Docker must be installed to use the provided Docker images. All scripts and compilations have been tested on Linux and Mac, and partly on Windows 10.
			
 
				 
			
 
				-The figure shows a suggested workflow consisting of four steps. Step 1 and Step 2 are pre-steps needed to collect and format required data. Step 3 is SZZ Phase 1, i.e., identifying bug-fixing commits. Step 4 is SZZ Phase 2, i.e., identyfying bug-introducing commits. Steps 1-3 are implemented in Python scripts, whereas Step 4 is implemented in Java.
			
 
				+The figure shows a suggested workflow consisting of four steps. Step 1 and Step 2 are pre-steps needed to collect and format required data. Step 3 is SZZ Phase 1, i.e., identifying bug-fixing commits. Step 4 is SZZ Phase 2, i.e., identifying bug-introducing commits. Steps 1-3 are implemented in Python scripts, whereas Step 4 is implemented in Java.
			
 
				 
			
 
				 ![SZZ Unleashed workflow](/workflow.png) <a name="workflow"></a>
			
 
				 
			
 
				 ### Step 1. Fetch issues (SZZ pre-step) ###
			
 
				-To get issues one needs a bug tracking system. As an example the project Jenkins uses [JIRA](https://issues.jenkins-ci.org).
			
 
				+To get issues, one needs a bug tracking system. For example, the Jenkins project uses [Jira](https://issues.jenkins-ci.org).
			
 
				 From here it is possible to fetch issues that we then can link to bug fixing commits.
			
 
				 
			
 
				-We have provided an example script that can be used to fetch issues from Jenkins issues (see 1 in the [figure](#workflow)). In the directory fetch_jira_bugs, one can find the **fetch.py** script. The script has a jql string which is used as a filter to get certain issues. JIRA provides a neat way to test these jql strings directly in the [web page](https://issues.jenkins-ci.org/browse/JENKINS-41020?jql=). Change to the advanced view and then enter the search criteria. Notice that the jql string is generated in the browser's url bar once enter is hit.
			
 
				+We have provided an example script that can be used to fetch issues from Jenkins issues (see 1 in the [figure](#workflow)). In the directory fetch_jira_bugs, one can find the **fetch.py** script. The script has a jql string which is used as a filter to get certain issues. Jira provides a neat way to test these jql strings directly in the [web page](https://issues.jenkins-ci.org/browse/JENKINS-41020?jql=). Change to the advanced view and then enter the search criteria. Notice that the jql string is generated in the browser's url bar once enter is hit.
			
 
				 
			
 
				-To fetch issues from Jenkins JIRA, just run:
			
 
				+To fetch issues from Jenkins Jira, just run:
			
 
				 ```python
			
 
				 python fetch.py --issue-code <issue_code> --jira-project <jira_project_base_url>
			
 
				 ```
			
 
				-passing as parameters the code used for the project issues on JIRA and the name of the JIRA repository of the project (e.g., _issues.jenkins-ci.org_). The script creates a directory with issues (see issues folder in the [figure](#workflow)). These issues will later on be used by the `find_bug_fixes.py` script. 
			
 
				+passing as parameters the code used for the project issues on Jira and the name of the Jira repository of the project (e.g., _issues.jenkins-ci.org_). The script creates a directory with issues (see issues folder in the [figure](#workflow)). These issues will later on be used by the `find_bug_fixes.py` script. 
			
 
				 
			
 
				 A more thorough example of this script can be found [here](./examples/Fetch.md).
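
As a rough illustration of how a jql filter ends up in a request, the sketch below builds a search URL for one page of results. It assumes the Jira server exposes the standard `/rest/api/2/search` endpoint; the helper name `build_search_url` and the example filter are hypothetical and are not taken from `fetch.py`.

```python
from urllib.parse import urlencode


def build_search_url(jira_host, jql, start_at=0, max_results=50):
    """Build a Jira REST search URL for one page of results.

    Assumption: the server exposes the standard /rest/api/2/search endpoint.
    """
    query = urlencode({"jql": jql, "startAt": start_at, "maxResults": max_results})
    return f"https://{jira_host}/rest/api/2/search?{query}"


# Hypothetical filter: fixed bugs in the JENKINS project.
jql = "project = JENKINS AND issuetype = Bug AND resolution = Fixed"
print(build_search_url("issues.jenkins-ci.org", jql))
```

Paging through all matching issues would then mean increasing `start_at` by `max_results` until the server returns no more results.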
			
 
				 
			
 
				 ### Step 2. Preprocess the git log output (SZZ pre-step) ###
			
 
				-Second we need to convert the `git log` output to something that can be processed. That requires a local copy of the repository that we aim to analyze, [Jenkins Core Repository](https://github.com/jenkinsci/jenkins). Onced cloned, one can now run the **git_log_to_array.py** script (see 2 in the [figure](#workflow)). The script requires an absolute path to the cloned repository and a SHA-1 for an initial commit.
			
 
				+Second we need to convert the `git log` output to something that can be processed. That requires a local copy of the repository that we aim to analyze, [Jenkins Core Repository](https://github.com/jenkinsci/jenkins). Once cloned, one can now run the **git_log_to_array.py** script (see 2 in the [figure](#workflow)). The script requires an absolute path to the cloned repository and a SHA-1 for an initial commit.
			
 
				 ```python
			
 
				 python git_log_to_array.py --from-commit <SHA-1_of_initial_commit> --repo-path <path_to_local_repo>
			
 
				 ```
			
 
				-Once executed, this creates a file `gitlog.json` that can be used together with issues that we created with `fetch.py` script. 
			
 
				+Once executed, this creates a file `gitlog.json` that can be used together with issues that we created with the `fetch.py` script. 
			
 
				 
			
 
				 An example of this script and what it produces can be found [in the examples](./examples/GitlogToArray.md).
			
 
				 
			
 
				 ### Step 3. Identify bug-fixing commits (SZZ Phase 1) ###
			
 
				-Now using the `find_bug_fixes.py` (see 3 in the [figure](#workflow)) and this file, we can get a json file
			
 
				-that contains the Issue and its corresponding commit SHA-1, the commit date, the creation date and the resolution date. Just run:
			
 
				+Now, using the `find_bug_fixes.py` (see 3 in the [figure](#workflow)) and this file, we can get a json file
			
 
				+that contains the issue and its corresponding commit SHA-1, the commit date, the creation date and the resolution date. Just run:
			
 
				 ```python
			
 
				 python find_bug_fixes.py --gitlog <path_to_gitlog_file> --issue-list <path_to_issues_directory> --gitlog-pattern "<a_pattern_for_matching_fixes>"
			
 
				 ```
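
The linking idea behind this step can be sketched as follows: scan each commit message for an issue key and keep the commits that mention a known issue. This is a minimal sketch under stated assumptions, not the logic of `find_bug_fixes.py`; the commit dictionaries, the `link_fixes` helper and the `JENKINS-[0-9]{1,6}` pattern are made up for illustration.

```python
import re


def link_fixes(commits, issue_keys, pattern=r"JENKINS-[0-9]{1,6}"):
    """Map each known issue key to the commits whose message mentions it."""
    matcher = re.compile(pattern)
    links = {}
    for commit in commits:
        for key in matcher.findall(commit["message"]):
            # Ignore keys that were not fetched from the issue tracker.
            if key in issue_keys:
                links.setdefault(key, []).append(commit["sha"])
    return links


# Hypothetical in-memory data; the real gitlog.json layout may differ.
commits = [
    {"sha": "a1b2c3", "message": "[JENKINS-41020] Fix NPE in queue"},
    {"sha": "d4e5f6", "message": "Update changelog"},
    {"sha": "0718aa", "message": "JENKINS-41020 follow-up, see JENKINS-99999"},
]
print(link_fixes(commits, {"JENKINS-41020"}))
```

Note that one issue can map to several commits, which is one reason duplicates can show up later in the pipeline.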
			
@@ -88,20 +88,20 @@ java -jar szz_find_bug_introducers-<version_number>.jar -i <path_to_issue_list.j
 
				 ```
			
 
				 
			
 
				 ## Output from SZZ Unleashed
			
 
				-As shown in the [figure](#workflow), the output consists of three different files: commits.json,
			
 
				-annotations.json and fix\_and\_bug\_introducing\_pairs.json.
			
 
				+As shown in the [figure](#workflow), the output consists of three different files: `commits.json`,
			
 
				+`annotations.json` and `fix_and_bug_introducing_pairs.json`.
			
 
				 
			
 
				-The commits.json file includes all commits that have been blamed to be bug
			
 
				-introducing but which haven't been analyzed by any anything.
			
 
				+The `commits.json` file includes all commits that have been blamed to be bug
			
 
				+introducing but which haven't been analyzed by anything.
			
 
				 
			
 
				-The annotations.json is a representation of the graph that is generated by the
			
 
				+The `annotations.json` is a representation of the graph that is generated by the
			
 
				 algorithm in the blaming phase. Each bug fixing commit is linked to all possible
			
 
				 commits which could be responsible for the bug. Using the improvement from
			
 
				 Williams et al., the graph also contains subgraphs which give a deeper search
			
 
				 for responsible commits. It enables the algorithm to blame other commits than
			
 
				 just the one closest in history for a bug.
			
 
				 
			
 
				-Lastly, the fix\_and\_bug\_introducing\_pairs.json includes all possible pairs
			
 
				+Lastly, the `fix_and_bug_introducing_pairs.json` includes all possible pairs
			
 
				 which could lead to a bug introduction and fix. This file is not sorted in any
			
 
				 way and it includes duplicates when it comes to both introducers and fixes. A
			
 
				 fix can be made several times and an introducer could be responsible for many
			
@@ -115,7 +115,7 @@ A description of how to configure SZZUnleashed further can be found in [the exam
 
				 
			
 
				 More thorough instructions for using Docker to produce the results can be found in [doc/Docker.md](doc/Docker.md). Below are very brief instructions.
			
 
				 
			
 
				-There exists a *Dockerfile* in the repository. It contains all the steps in chronological order that is needed to generate the **fix\_and\_bug\_introducing\_pairs.json**. Simply run this command in the directory where the Dockerfile is located:
			
 
				+There is a *Dockerfile* in the repository. It contains all the steps, in chronological order, that are needed to generate the `fix_and_bug_introducing_pairs.json`. Simply run this command in the directory where the Dockerfile is located:
			
 
				 
			
 
				 ```bash
			
 
				 docker build -t szz .
			
@@ -125,7 +125,7 @@ Then start a temporary docker container:
 
				 ```bash
			
 
				 docker run -it --name szz_con szz ash
			
 
				 ```
			
 
				-In this container it is possible to study the results from the algorithm. The results are located in */root/szz/results*.
			
 
				+In this container it is possible to study the results from the algorithm. The results are located in *./szz/results*.
			
 
				 
			
 
				 Lastly, to copy the results from the container to your own computer run:
			
 
				 ```bash
			
@@ -144,14 +144,12 @@ To illustrate what the output from SZZ Unleashed can be used for, we show how to
 
				 The simplest features are the code churns. These are easily extracted by
			
 
				 just parsing each diff for each commit. The ones that are extracted are:
			
 
				 
			
 
				-1. **Total lines of code** - Which simply is how many lines of code in total
			
 
				-for all changed files.
			
 
				+1. **Total lines of code** - The total number of lines of code across all changed files.
			
 
				 2. **Churned lines of code** - The number of inserted lines.
			
 
				 3. **Deleted lines of code** - The number of deleted lines.
			
 
				 4. **Number of Files** - The total number of changed files.
			
 
				 
			
 
				-To get these features, run: `python assemble_code_churns.py <path_to_repo>
			
 
				-<branch>`
			
 
				+To get these features, run: `python assemble_code_churns.py <path_to_repo> <branch>`
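
To make the churn features concrete, here is a minimal sketch that derives three of them (inserted lines, deleted lines and number of changed files) from the text produced by `git show --numstat` or `git diff --numstat`, where each line has the form `added<TAB>deleted<TAB>path`. It is an illustration only, not the implementation in `assemble_code_churns.py`; the function name and sample input are hypothetical, and the total lines of code feature is left out because numstat does not report file sizes.

```python
def churn_from_numstat(numstat_text):
    """Sum inserted lines, deleted lines and changed files from numstat text."""
    inserted = deleted = files = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and anything that is not a numstat row
        added, removed, _path = parts
        files += 1
        # Binary files are reported as "-" and contribute no line counts.
        if added.isdigit():
            inserted += int(added)
        if removed.isdigit():
            deleted += int(removed)
    return {"churned": inserted, "deleted": deleted, "files": files}


# Hypothetical numstat output for a single commit.
sample = "10\t2\tcore/src/main/java/Foo.java\n0\t7\tREADME.md\n-\t-\tlogo.png\n"
print(churn_from_numstat(sample))
```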
			
 
				 
			
 
				 ### Diffusion Features ###
			
 
				 The diffusion features are:
			
@@ -239,7 +237,7 @@ python model.py train
 
				 
			
 
				 ## Examples and executables <a name="examples_n_exec"></a>
			
 
				 
			
 
				-In [the examples](./examples) directory, one can find documents containing descriptions about each script. There is also [a data directory](./examples/data) where one can find data produced by the scripts. It can be used to either study how the output should look like or if anyone just want a dataset to train on.
			
 
				+In [the examples](./examples) directory, one can find documents describing each script. There is also [a data directory](./examples/data) containing data produced by the scripts. It can be used either to study what the output should look like or as a ready-made dataset to train on.
			
 
				 
			
 
				 ## Authors <a name="authors"></a>