About
This page publishes the dataset used in our paper “Onboarding to Open Source Projects with Good First Issues: A Preliminary Analysis” (Hyuga Horiguchi, Itsuki Omori, and Masao Ohira), which has been accepted for inclusion in the Early Research Achievements (ERA) track of the 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER ’21).
File list
- prs_nums_before_resolving_issue.csv
- resolved_issues_percenrage.csv
- prs_nums_after_resolving_issues.csv
Description
The first file was used for the analysis of RQ1.
The violin plot in Fig. 1 shows the distribution of prs_num (4th column) for each issue_type (3rd column).
The 1st column, dev_id, is an ID that identifies the developer. It is used to anonymize GitHub account information.
The 2nd column, issue_url, is the URL of the issue resolved by the developer.
The 3rd column, issue_type, shows whether the issue is a Regular Issue or a Good First Issue.
The 4th column, prs_num, is the number of PRs that the developer with the dev_id posted on GitHub before resolving the issue with the issue_url.
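As an illustration of how the columns above fit together, the following minimal sketch groups prs_num by issue_type, which is the grouping a plot such as Fig. 1 is built on. It is not the authors' analysis script. The sample rows are invented, and the presence of a header row and the exact label strings ("Good First Issue", "Regular Issue") are assumptions that should be checked against the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical sample rows. The header row and the label spellings are
# assumptions; the published CSV may differ.
SAMPLE = """dev_id,issue_url,issue_type,prs_num
1,https://github.com/example/repo/issues/1,Good First Issue,0
2,https://github.com/example/repo/issues/2,Good First Issue,3
3,https://github.com/example/repo/issues/3,Regular Issue,10
4,https://github.com/example/repo/issues/4,Regular Issue,25
"""


def prs_by_issue_type(csv_file):
    """Collect the prs_num values (4th column) for each issue_type (3rd column)."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["issue_type"]].append(int(row["prs_num"]))
    return dict(groups)


groups = prs_by_issue_type(io.StringIO(SAMPLE))

# One summary value per issue type; a violin plot would show the full lists.
medians = {issue_type: median(values) for issue_type, values in groups.items()}
```

To run this against the published data, replace io.StringIO(SAMPLE) with an opened file handle, for example open("prs_nums_before_resolving_issue.csv", newline="").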
The second file was used for the analysis of RQ2.
Table II reports the 1st, 4th, and 7th columns of this file. The columns are as follows.
The 1st column, repo_url, is the URL of the repository.
The 2nd column, issues_num, is the number of Regular Issues that the repository has.
The 3rd column, resolved_issues_num, is the number of resolved Regular Issues.
The 4th column, resolved_issues_percentage, is resolved_issues_num (3rd column) divided by issues_num (2nd column).
The 5th column, good_first_issues_num, is the number of Good First Issues that the repository has.
The 6th column, resolved_good_first_issues_num, is the number of resolved Good First Issues.
The 7th column, resolved_good_first_issues_percentage, is resolved_good_first_issues_num (6th column) divided by good_first_issues_num (5th column).
The 8th column, resolved_ratio, is resolved_good_first_issues_percentage (7th column) divided by resolved_issues_percentage (4th column).
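The derived columns (4th, 7th, and 8th) follow directly from the four count columns. The sketch below recomputes them for one repository under stated assumptions: the counts are invented, the percentages are kept as plain fractions (whether the file stores them as fractions or multiplied by 100 is not stated here, although resolved_ratio is the same either way), and returning None for a zero denominator is a choice made for this example only.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_columns(issues_num, resolved_issues_num,
                    good_first_issues_num, resolved_good_first_issues_num):
    """Recompute the 4th, 7th, and 8th columns from the count columns."""
    regular = safe_div(resolved_issues_num, issues_num)
    gfi = safe_div(resolved_good_first_issues_num, good_first_issues_num)
    # The ratio is only defined when both percentages exist and the
    # Regular Issue percentage is non-zero.
    if gfi is None or regular is None:
        ratio = None
    else:
        ratio = safe_div(gfi, regular)
    return {
        "resolved_issues_percentage": regular,
        "resolved_good_first_issues_percentage": gfi,
        "resolved_ratio": ratio,
    }


# Invented counts: 100 of 200 Regular Issues and 15 of 20 Good First Issues resolved.
example = derived_columns(200, 100, 20, 15)
# resolved_ratio is 0.75 / 0.5 = 1.5 for these counts.
```

A resolved_ratio above 1 means that, in that repository, Good First Issues were resolved at a higher rate than Regular Issues.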
The third file was used for the analysis of RQ3.
Table III shows, for each repository, the percentage of developers whose prs_num (4th column) is 1 or higher, counting only rows whose issue_type (3rd column) is Good First Issue.
The 1st column, dev_id, is an ID that identifies the developer.
The 2nd column, issue_url, is the URL of the issue resolved by the developer.
The 3rd column, issue_type, shows whether the issue is a Regular Issue or a Good First Issue.
The 4th column, prs_num, is the number of PRs that the developer with the dev_id posted to the same repository as the issue_url after resolving the issue with the issue_url.
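The per-repository percentage described for Table III can be approximated from this file as sketched below. This is one possible reading, not the authors' script. It assumes that the repository is the part of issue_url before "/issues/" (the usual GitHub issue URL layout), that a developer counts as returning if any of their Good First Issue rows in that repository has prs_num of 1 or higher, and that the label is spelled "Good First Issue". The sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the header row and label spellings are assumptions.
SAMPLE = """dev_id,issue_url,issue_type,prs_num
1,https://github.com/example/alpha/issues/1,Good First Issue,0
2,https://github.com/example/alpha/issues/2,Good First Issue,4
3,https://github.com/example/alpha/issues/3,Regular Issue,7
4,https://github.com/example/beta/issues/9,Good First Issue,2
"""


def returning_percentage_by_repo(csv_file, label="Good First Issue"):
    """Percentage of developers per repository with prs_num >= 1 among `label` rows."""
    all_devs = defaultdict(set)
    returning_devs = defaultdict(set)
    for row in csv.DictReader(csv_file):
        if row["issue_type"] != label:
            continue
        # Assumed URL shape: https://github.com/<owner>/<repo>/issues/<number>
        repo = row["issue_url"].split("/issues/")[0]
        all_devs[repo].add(row["dev_id"])
        if int(row["prs_num"]) >= 1:
            returning_devs[repo].add(row["dev_id"])
    return {
        repo: 100 * len(returning_devs[repo]) / len(devs)
        for repo, devs in all_devs.items()
    }


percentages = returning_percentage_by_repo(io.StringIO(SAMPLE))
```

With the sample rows, the Regular Issue row is skipped, one of the two developers in example/alpha has a later PR, and the single developer in example/beta does.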
Contact
Hyuga Horiguchi (hhyuga201515xx@xxgmail.com)
Masao Ohira (masaoxx@xxwakayama-u.ac.jp)