High Impact Bug Dataset

Background and Motivation

As modern software products grow in complexity and scale, development projects face tight schedules and resource constraints, so a project manager must carefully triage bugs to determine which ones must be fixed before shipping. The field of Mining Software Repositories (MSR) offers many promising approaches to predicting, localizing, and triaging bugs, but most of them weight all bugs equally and do not consider the impact of each bug on users and developers. The exceptions are a few studies on high impact bugs such as security, performance, and blocking bugs. To make MSR techniques more actionable and effective in practice, we need a deeper understanding of high impact bugs. To capture the characteristics and consequences of high impact bugs more precisely, we collected 4,000 issue reports from four open source projects and manually reviewed them to tag six types of high impact bugs.


You can download our dataset of high impact bugs collected from the Apache Ambari, Camel, Derby, and Wicket projects. The dataset includes the issue report information together with labels for six types of high impact bugs: surprise, dormant, blocking, performance, security, and breakage bugs.

  • dataset (2.3MB, all data included, in MS Excel format)

  • ambari (0.7MB, Ambari project’s data in CSV format)

  • camel (1.8MB, Camel project’s data in CSV format)

  • derby (1.8MB, Derby project’s data in CSV format)

  • wicket (1.4MB, Wicket project’s data in CSV format)
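
As a starting point for working with the per-project CSV files, the following is a minimal Python sketch that counts how many issue reports carry each of the six labels. It rests on assumptions that are not confirmed by this page: the column names (one column per label, named after the label), the value "1" marking a positive label, and the `issue_id` column are all hypothetical, so check the header row of the downloaded file and adjust accordingly. The sample rows are made up and only stand in for a real file.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the real CSV headers may differ.
LABELS = ["surprise", "dormant", "blocking", "performance", "security", "breakage"]


def count_high_impact(csv_file, labels=LABELS, positive="1"):
    """Count how many issue reports carry each high impact label.

    csv_file is any open text file (or file-like object) whose first
    row is a header. A report counts toward a label when that label's
    column equals `positive` (assumed encoding, not confirmed).
    """
    counts = Counter({label: 0 for label in labels})
    for row in csv.DictReader(csv_file):
        for label in labels:
            if (row.get(label) or "").strip() == positive:
                counts[label] += 1
    return counts


# Made-up rows standing in for one of the downloaded project files.
sample = io.StringIO(
    "issue_id,surprise,dormant,blocking,performance,security,breakage\n"
    "DEMO-1,1,0,0,0,0,0\n"
    "DEMO-2,0,0,1,1,0,0\n"
    "DEMO-3,0,0,0,0,0,0\n"
)
counts = count_high_impact(sample)
for label in LABELS:
    print(label, counts[label])
```

To run this on a downloaded file instead, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `count_high_impact`; the encoding is also an assumption. Note that a single report may carry several labels, so the per-label counts need not sum to the number of reports.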


Wakayama University, Japan

  • Yutaro Kashiwa (graduate student)
  • Yosuke Yamatani (graduate student)
  • Hayato Yoshiyuki (graduate student)
  • Yoshiya Maeda (graduate student)
  • Masao Ohira (associate professor)

Nara Institute of Science and Technology, Japan

  • Nachai Limsettho (graduate student)
  • Keisuke Fujino (graduate student)
  • Hideaki Hata (assistant professor)
  • Akinori Ihara (assistant professor)
  • Kenichi Matsumoto (professor)

Related publications:

  • Masao Ohira, Yutaro Kashiwa, Yosuke Yamatani, Hayato Yoshiyuki, Yoshiya Maeda, Nachai Limsettho, Keisuke Fujino, Hideaki Hata, Akinori Ihara, and Kenichi Matsumoto, “A Dataset of High Impact Bugs: Manually-Classified Issue Reports,” In Proceedings of the 12th Working Conference on Mining Software Repositories, 2015. (To appear) [Poster]
  • Yutaro Kashiwa, Hayato Yoshiyuki, Yusuke Kukita, and Masao Ohira, “A Pilot Study of Diversity in High Impact Bugs,” In Proceedings of the 30th International Conference on Software Maintenance and Evolution (ICSME2014), pages 536–540, 2014.


Feel free to contact us if you find any problems with our dataset.