NLP-based approaches and tools have been proposed to improve the efficiency of software engineers, processes, and products, by automatically processing natural language artifacts (issues, emails, commits, etc.).
We believe that accurate tools are becoming increasingly necessary to improve Software Engineering (SE) processes. Two important processes are (i) issue management and prioritization and (ii) code comment classification, in which developers have to understand, classify, prioritize, and assign (among other tasks) incoming issues and code comments reported by end-users and other developers.
We are pleased to announce the second edition of the NLBSE tool competition (NLBSE'23), on issue report classification and, for the first time, on code comment classification: two important tasks in issue and code comment management and prioritization.
You are invited to participate in one or both tool competitions.
The issue report classification competition consists of building and assessing a multi-class classification model to classify issue reports as belonging to one category representing the type of information they convey.
We provide a dataset of more than 1.4 million labeled issue reports (as bug, enhancement, question, or documentation) extracted from real open-source projects. You are invited to leverage this dataset to evaluate your proposed approach(es) and compare your results against our baselines (based on FastText and RoBERTa).
You must train, tune and evaluate your multi-class classification model(s) using the provided training and test sets. To access these datasets as well as the competition's rules and baselines, please check out our repository.
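For illustration, the sketch below shows one way to train a fastText-style classifier on such data. The file names, column names ("title", "body", "label"), and hyperparameters are assumptions made for this example, not the official data schema or baseline configuration; the competition repository is the authoritative reference.

import csv
import fasttext  # pip install fasttext

def to_fasttext_format(csv_path, out_path):
    # fastText's supervised mode expects one example per line: "__label__<label> <text>"
    with open(csv_path, newline="", encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = f'{row["title"]} {row["body"]}'.replace("\n", " ")
            dst.write(f'__label__{row["label"]} {text}\n')

to_fasttext_format("issues_train.csv", "train.txt")  # hypothetical file names
to_fasttext_format("issues_test.csv", "test.txt")

# Illustrative hyperparameters, not the official baseline settings.
model = fasttext.train_supervised(input="train.txt", epoch=25, lr=0.5, wordNgrams=2)
print(model.test("test.txt"))  # (number of examples, precision@1, recall@1)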
Compared to the 2022 edition of the issue report competition, we have made several changes (detailed in the competition repository).
The issue report classification competition is organized by: Rafael Kallis (rk@rafaelkallis.com) and Maliheh Izadi (m.izadi@tudelft.nl).
The code comment classification competition consists of building and testing a set of binary classifiers to classify class comment sentences as belonging to one or more categories representing the types of information the sentence conveys.
For this competition, we provide a dataset of 6,738 class comment sentences and 19 baseline classifiers based on the Random Forest model. Participants are invited to propose classifiers that outperform these baselines.
You must train, tune and evaluate your binary classifiers using the provided training and test sets. We created a notebook and repository with information about the code comment classification competition, including the dataset, rules, baselines, and results.
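As an orientation aid, here is a minimal sketch of a baseline-style binary classifier for a single category, using TF-IDF features and a Random Forest (the same model family as the provided baselines). The CSV file names and column names ("comment_sentence", "instance_type") are hypothetical; the notebook and repository define the actual data format.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical files holding the sentences and a 0/1 label for one category.
train = pd.read_csv("comments_train.csv")
test = pd.read_csv("comments_test.csv")

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    RandomForestClassifier(n_estimators=200, random_state=42))
clf.fit(train["comment_sentence"], train["instance_type"])

pred = clf.predict(test["comment_sentence"])
print("F1:", f1_score(test["instance_type"], pred))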
The code comment classification competition is organized by: Pooja Rani (rani@ifi.uzh.ch), Oscar Chaparro (oscarch@wm.edu) and Luca Pascarella (lpascarella@ethz.ch).
To participate in any of the competitions, you must train, tune and evaluate your models using the provided training and test sets of the respective competition.
Additionally, you must write a paper (2-4 pages) describing your approach and results.
Submit the paper by the deadline using our submission form.
All submissions must conform to the ICSE'23 formatting and submission instructions; they do not need to be double-blind.
Participation in both competitions is allowed, but requires a distinct paper for each submission.
Submissions will be evaluated and accepted based on correctness and reproducibility criteria.
The accepted submissions will be published in the workshop proceedings.
Participants will submit one or more multi-class classifiers, and submissions will be ranked by the micro-averaged F1 score achieved by the proposed classifiers on the issue report test set, as reported in the papers.
The submission with the highest F1 score will be the winner of the issue report classification competition.
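For reference, the ranking metric is the standard micro-averaged F1 score; a minimal scikit-learn sketch with toy labels (not real competition data):

from sklearn.metrics import f1_score

y_true = ["bug", "enhancement", "question", "documentation", "bug"]
y_pred = ["bug", "enhancement", "bug", "documentation", "bug"]

# For multi-class problems, micro-averaged F1 aggregates counts over all instances.
print(f1_score(y_true, y_pred, average="micro"))  # 0.8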
Since participants will submit a set of binary classifiers (based on a single ML/DL model -- see more details in our notebook), we will use a formula to rank the competition submissions and determine a winner.
The formula, specified in our notebook, accounts for: (1) the overall average F1 score achieved by the classifiers, and (2) the proportion of classifiers that improve over the baseline classifiers. Essentially, we encourage participants to improve as many baseline classifiers as possible.
The F1 scores and the number of outperforming classifiers reported in the paper will be used to rank the participants.
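To make the ranking concrete, the sketch below combines the two quantities into a single score. The weights are placeholders chosen for illustration; the exact formula in the competition notebook is the authoritative one.

def submission_score(f1_scores, baseline_f1_scores, w_f1=0.5, w_prop=0.5):
    # (1) overall average F1 of the submitted classifiers
    avg_f1 = sum(f1_scores) / len(f1_scores)
    # (2) proportion of classifiers that outperform their baseline counterpart
    outperforming = sum(1 for f1, base in zip(f1_scores, baseline_f1_scores) if f1 > base)
    proportion = outperforming / len(baseline_f1_scores)
    # Placeholder weights; see the notebook for the official formula.
    return w_f1 * avg_f1 + w_prop * proportion

# Example with 3 of the 19 classifiers (hypothetical scores):
print(submission_score([0.72, 0.65, 0.80], [0.70, 0.68, 0.75]))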
Since you will be using the dataset and possibly the original work behind the dataset, please cite the following references in your paper:
@inproceedings{nlbse2023,
author={Kallis, Rafael and Izadi, Maliheh and Pascarella, Luca and Chaparro, Oscar and Rani, Pooja},
title={The NLBSE'23 Tool Competition},
booktitle={Proceedings of The 2nd International Workshop on Natural Language-based Software Engineering (NLBSE'23)},
year={2023}
}
Please cite if participating in the issue report classification competition:
@article{kallis2020tickettagger,
author={Kallis, Rafael and Di Sorbo, Andrea and Canfora, Gerardo and Panichella, Sebastiano},
title={Predicting issue types on GitHub},
journal={Science of Computer Programming},
volume={205},
pages={102598},
year={2021},
issn={0167-6423},
doi={10.1016/j.scico.2020.102598},
url={https://www.sciencedirect.com/science/article/pii/S0167642320302069}
}
@inproceedings{kallis2019tickettagger,
author = {Kallis, Rafael and Di Sorbo, Andrea and Canfora, Gerardo and Panichella, Sebastiano},
title = {Ticket Tagger: Machine Learning Driven Issue Classification},
booktitle = {2019 {IEEE} International Conference on Software Maintenance and Evolution,
{ICSME} 2019, Cleveland, OH, USA, September 29 - October 4, 2019},
pages = {406--409},
publisher = {IEEE},
year = {2019},
doi = {10.1109/ICSME.2019.00070},
}
@inproceedings{izadi2022catiss,
author = {Izadi, Maliheh},
booktitle = {2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)},
title = {CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers},
year = {2022},
pages = {44-47},
doi = {10.1145/3528588.3528662},
url = {https://doi.ieeecomputersociety.org/10.1145/3528588.3528662},
publisher = {IEEE Computer Society}
}
Please cite if participating in the code comment classification competition:
@article{rani2021,
title={How to identify class comment types? A multi-language approach for class comment classification},
author={Rani, Pooja and Panichella, Sebastiano and Leuenberger, Manuel and Di Sorbo, Andrea and Nierstrasz, Oscar},
journal={Journal of Systems and Software},
volume={181},
pages={111047},
year={2021},
publisher={Elsevier}
}
@inproceedings{DiSorboVPCP21,
author = {Di Sorbo, Andrea and Visaggio, Corrado Aaron and Di Penta, Massimiliano and Canfora, Gerardo and Panichella, Sebastiano},
title = {An NLP-based Tool for Software Artifacts Analysis},
booktitle = {{IEEE} International Conference on Software Maintenance and Evolution,
{ICSME} 2021, Luxembourg, September 27 - October 1, 2021},
pages = {569--573},
publisher = {IEEE},
year = {2021},
doi = {10.1109/ICSME52107.2021.00058}
}
Important dates (all dates are Anywhere on Earth, AoE):
February 13, 2023 (extended to February 16, 2023)
February 27, 2023
March 17, 2023

Resources:
Issue report classification repository
Code comment classification notebook
Code comment classification repository
The authors of the best accepted (research and tool) papers will be invited to develop and submit a software tool to the NLBSE'23 special issue in the Software Track of the journal Science of Computer Programming.