Learning to Identify Internet Sexual Predation

India McGhee, Jennifer Bayzick, April Kontostathis, Lynne Edwards, Alexandra McBride, and Emma Jakubowski
International Journal of Electronic Commerce,
Volume 15, Number 3, Spring 2011, pp. 103.

Abstract: This work integrates communication theories and computer science algorithms to create a program that can detect the occurrence of sexual predation in an online social setting. Although much work has discussed social media in general, this particular aspect of online social interaction remains largely unexplored. In previous work we developed phrase-matching and rule-based approaches to classify and label lines of chat logs. In the current work we expand these techniques and use machine learning algorithms to classify posts. Our machine learning system used the phrase-matching and rule-based systems to identify appropriate attributes for our supervised learning algorithms. Our machine learning experiments confirmed that the rules we developed are adequate to identify the coding rules. Neither decision trees nor instance-based learning algorithms significantly improved upon the 68 percent accuracy achieved by the rule-based methods of ChatCoder 2, the software program described here.

Key Words and Phrases: classifiers, sexual predation, social media.
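
Illustrative sketch: the abstract describes deriving attributes from phrase matching and passing them to a supervised learner such as an instance-based classifier. The Python sketch below shows that general shape only. The phrase lists, category names, labels, and example lines are all hypothetical and invented for illustration; they are not the ChatCoder 2 dictionaries or coding rules, and the nearest-neighbor routine is a generic stand-in rather than the authors' implementation.

```python
from collections import Counter

# Hypothetical phrase lists; the actual ChatCoder 2 dictionaries are not given here.
PHRASES = {
    "personal_info": ["how old", "where do you live", "your address"],
    "meeting": ["meet", "come over"],
}


def extract_attributes(line):
    """Phrase matching: return one 0/1 attribute per phrase category."""
    text = line.lower()
    return tuple(
        int(any(phrase in text for phrase in phrases))
        for phrases in PHRASES.values()
    )


def hamming(a, b):
    """Count the positions at which two attribute vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, attrs, k=1):
    """Instance-based learning: majority label among the k nearest instances."""
    nearest = sorted(train, key=lambda item: hamming(item[0], attrs))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy labeled lines, invented for illustration only.
labeled = [
    ("how old are you", "flagged"),
    ("can we meet after school", "flagged"),
    ("did you see the game last night", "not_flagged"),
    ("lol that movie was great", "not_flagged"),
]
train = [(extract_attributes(text), label) for text, label in labeled]

prediction = knn_predict(train, extract_attributes("where do you live"))
print(prediction)
```

With attributes this coarse, a learner can only reproduce distinctions the phrase lists already encode, which is consistent with the abstract's report that the learning algorithms did not significantly improve on the rule-based accuracy.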