A Deep Learning Approach to Software Clone Detection
A code clone is a code segment that is similar to another according to a given similarity measure. Clones appear as a result of the common programming practice of “copy and paste”. However, cloned code can cause serious problems in software maintenance and upgrading efforts, and it is sometimes a sign of poor design. This paper presents an approach for predicting clones in open-source software systems based on two commonly adopted deep learning techniques, namely transfer learning and long short-term memory (LSTM) networks. An LSTM network pre-trained for text classification is used to extract the relevant features of similar code segments in open-source systems. A support vector machine classifier then classifies the clones using the extracted features. Experimental results show that the two techniques together can predict code clones in the studied systems with good accuracy, which suggests that the pre-trained network can detect the structural similarity that exists among code clones.
Index Terms - Code clone detection, deep learning, long short-term memory networks, transfer learning.
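
The two-stage pipeline summarized in the abstract (a frozen feature extractor followed by a support vector machine) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: `extract_features` is a hypothetical stand-in that uses hashed token bigrams in place of the pre-trained LSTM, the pair representation (element-wise absolute difference) is an assumption, and the classifier is a plain linear SVM fitted by subgradient descent on the hinge loss, which may differ from the SVM configuration actually used.

```python
import re
import zlib


def tokenize(code):
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def extract_features(code, dim=32):
    """Hypothetical stand-in for the frozen pre-trained LSTM encoder.

    Returns normalized counts of hashed token bigrams. In the described
    approach this step would instead read activations from the LSTM.
    """
    tokens = tokenize(code)
    vec = [0.0] * dim
    for a, b in zip(tokens, tokens[1:]):
        vec[zlib.crc32(f"{a} {b}".encode()) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def pair_features(code_a, code_b, dim=32):
    """Assumed pair representation: element-wise absolute difference."""
    fa = extract_features(code_a, dim)
    fb = extract_features(code_b, dim)
    return [abs(x - y) for x, y in zip(fa, fb)]


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    Labels in y must be +1 (clone) or -1 (not a clone).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Margin violated: step on both the regularizer and the hinge term.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularizer contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (clone) or -1 (not a clone) for a pair feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In use, each labeled pair of code segments would be passed through `pair_features`, the resulting vectors and labels given to `train_linear_svm`, and unseen pairs classified with `predict`. Keeping the extractor fixed and training only the classifier is what makes this a transfer-learning setup.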