Jointly Learning to Repair Code and Generate Commit Message

Type: Article

Publication Date: 2021-01-01

Citations: 4

DOI: https://doi.org/10.18653/v1/2021.emnlp-main.771


Abstract

We propose a novel task of jointly repairing program code and generating commit messages. Code repair and commit message generation are two essential and closely related tasks in software development, yet existing work usually performs them independently. For this task we construct a multilingual triple dataset consisting of buggy code, fixed code, and commit messages. We first introduce a cascaded method with two models: one generates the fixed code, and the other generates the commit message from the original and fixed code. We enhance the cascaded method with several training approaches, including the teacher-student method, the multi-task method, and the back-translation method. To address the error propagation problem of the cascaded method, we also propose a joint model that both repairs the program code and generates the commit message in a unified framework. Extensive experiments on our constructed buggy-fixed-commit dataset show that the task is challenging, and that the enhanced cascaded model and the proposed joint model significantly outperform baselines in the quality of both the repaired code and the commit messages.
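The cascaded method described above can be pictured as a two-stage pipeline in which the second model conditions on the first model's output. The sketch below is a minimal illustration under stated assumptions: `Triple`, `cascaded_predict`, `toy_repair`, and `toy_describe` are hypothetical names, and the two plain functions merely stand in for trained repair and commit-message models. It shows only the data flow and why errors propagate, not the paper's actual architecture or training procedure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Triple:
    """One buggy-fixed-commit example (hypothetical container)."""
    buggy: str
    fixed: str
    message: str


# Hypothetical model signatures: plain callables standing in for trained models.
RepairModel = Callable[[str], str]          # buggy code -> fixed code
MessageModel = Callable[[str, str], str]    # (buggy code, fixed code) -> message


def cascaded_predict(buggy: str, repair: RepairModel, describe: MessageModel) -> Triple:
    """Two-stage cascade: repair first, then describe the change.

    The message model only sees the *predicted* fix, so a repair error
    propagates directly into the generated commit message.
    """
    fixed = repair(buggy)
    message = describe(buggy, fixed)
    return Triple(buggy=buggy, fixed=fixed, message=message)


def toy_repair(code: str) -> str:
    # Hypothetical rule-based stand-in: fix an off-by-one loop bound.
    return code.replace("i <= n", "i < n")


def toy_describe(before: str, after: str) -> str:
    # Hypothetical stand-in: the message depends on what stage one produced.
    return "Fix off-by-one in loop bound" if before != after else "No change"


example = cascaded_predict("for (i = 0; i <= n; i++)", toy_repair, toy_describe)
print(example.fixed)    # for (i = 0; i < n; i++)
print(example.message)  # Fix off-by-one in loop bound
```

If the first stage fails (for instance, a repair model that returns its input unchanged), the second stage reports "No change" even though the code is still buggy. This is the error propagation that motivates generating both outputs with a single joint model.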

Locations

  • arXiv (Cornell University)
  • Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Authors

Jiaqi Bai, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming Zhou, Zhoujun Li
