Keywords:
authorship
fingerprint
Java
software
Abstract:
Computer programs belong to the authors who design, write, and test them. Authorship identification is concerned with determining the likelihood that a particular author wrote a given piece (or pieces) of code, usually on the basis of other code samples from the same programmer. Java is a popular object-oriented programming language. Programming fingerprints attempt to characterize the features that are unique to each programmer. In this study, we investigated the extraction, by a program written in Visual C++, of a set of software metrics from Java source code that could be used as a fingerprint to identify the author of that code. The contribution of the selected metrics to authorship identification was measured by a statistical procedure, canonical discriminant analysis, using the SAS statistical software package. Of the 56 extracted metrics, 48 were identified as contributing to authorship identification. With the extracted metrics, the authorship of 62.6% to 67.2% of the Java programs considered could be correctly identified; with derived canonical variates, the identification rate could be as high as 85.8%. Moreover, layout metrics played a more important role in authorship identification than the other metrics. (C) 2003 Elsevier Inc. All rights reserved.
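To make the idea of a layout-based fingerprint concrete, here is a minimal Python sketch. It is not the study's Visual C++ extractor: the five metrics (`mean_line_length`, `blank_line_ratio`, `comment_line_ratio`, `tab_indent_ratio`, `lone_brace_ratio`) are hypothetical illustrations chosen for this example, not the 56 metrics used in the paper. Attribution here uses a simple nearest-centroid rule (the helpers `build_profile` and `attribute` are likewise hypothetical), swapped in for the canonical discriminant analysis the authors performed in SAS.

```python
import math


def layout_metrics(source: str) -> dict[str, float]:
    """Compute a few illustrative layout metrics for a Java source string.

    Hypothetical example metrics only; not the metric set used in the study.
    """
    lines = source.splitlines()
    total = len(lines) or 1  # avoid division by zero for empty input
    blank = [ln for ln in lines if not ln.strip()]
    nonblank = [ln for ln in lines if ln.strip()]
    # Heuristic: a line "looks like" a comment if it starts with //, /* or *.
    comment = [ln for ln in nonblank if ln.lstrip().startswith(("//", "/*", "*"))]
    # Indented lines, and how many of them are indented with a tab.
    indented = [ln for ln in nonblank if ln[0] in " \t"]
    tab_indented = [ln for ln in indented if ln[0] == "\t"]
    # Lines containing "{", and how many hold the brace alone (Allman style).
    brace_lines = [ln for ln in nonblank if "{" in ln]
    lone_brace = [ln for ln in brace_lines if ln.strip() == "{"]
    return {
        "mean_line_length": (
            sum(len(ln) for ln in nonblank) / len(nonblank) if nonblank else 0.0
        ),
        "blank_line_ratio": len(blank) / total,
        "comment_line_ratio": len(comment) / total,
        "tab_indent_ratio": len(tab_indented) / len(indented) if indented else 0.0,
        "lone_brace_ratio": len(lone_brace) / len(brace_lines) if brace_lines else 0.0,
    }


def build_profile(samples: list[str]) -> dict[str, float]:
    """Average the metric vectors of one author's known samples (a centroid)."""
    vectors = [layout_metrics(s) for s in samples]
    return {k: sum(v[k] for v in vectors) / len(vectors) for k in vectors[0]}


def attribute(sample: dict[str, float], profiles: dict[str, dict[str, float]]) -> str:
    """Return the author whose profile is nearest in Euclidean distance."""

    def dist(a: dict[str, float], b: dict[str, float]) -> float:
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

    return min(profiles, key=lambda author: dist(sample, profiles[author]))
```

For example, `build_profile` can be called on a set of tab-indented, brace-on-its-own-line files for one author and on space-indented, brace-at-end-of-line files for another; `attribute(layout_metrics(unknown_source), profiles)` then names the closer profile. Because the metrics are on different scales, `mean_line_length` dominates the distance in this sketch; a more careful version would standardize each metric, and a discriminant method like the one used in the study instead derives weighted linear combinations of the metrics (canonical variates).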