THE USE OF PHRASEWORD AND LOCAL-WEIGHTED TERMS AS FEATURES FOR TEXT REUSE AND PLAGIARISM DETECTION

  • Lucia Dwi Krisnawati
Keywords: text reuse, plagiarism detection, phrasewords, local weighting technique

Abstract

This study presents a framework for detecting text reuse that relies on two novel features, one for each of its two stages. For the source retrieval subtask, it introduces phrasewords; for the text alignment subtask, it uses locally weighted significant words as seeds. The experimental results show that the proposed methods can recognize not only (near-)duplicate cases but also partially reused and paraphrased texts.

Published
2018-02-06
How to Cite
Krisnawati, L. (2018). THE USE OF PHRASEWORD AND LOCAL-WEIGTED TERMS AS FEATURES FOR TEXT REUSE AND PLAGIARISM DETECTION. Seminar Hasil Penelitian Bagi Civitas Akademika 2017, 1(1), 27-44. Retrieved from https://genesis.ukdw.ac.id/lppm/seminar/index.php/seminar2017/article/view/2