Finding Code-Clone Snippets in Large Source-Code Collection by ccgrep
Finding the same or similar code snippets in the source code for a query code snippet is one of the fundamental activities in software maintenance. Code clone detectors detect the same or similar code snippets, but they report all of the code clone pairs in the target, which are generally excessive to the users. In this paper, we propose ccgrep, a token-based pattern matching tool with the notion of code clone pairs. The user simply inputs a code snippet as a query and specifies the target source code, and gets the matched code snippets as the result. The query and the result snippets form clone pairs. The use of special tokens (named meta-tokens) in the query allows the user to have precise control over the matching. It works for the source code in C, C++, Java, and Python on Windows or Unix with practical scalability and performance. The evaluation results show that ccgrep is effective in finding intended code snippets in large Open Source Software.
|Finding Code-Clone Snippets in Large Source-Code Collection by ccgrep (PaperID5-inoue-CCgrep_OSS_2021_CameraReady.pdf)||295KiB|