Automatic Detection of Nastiness and Early Signs of Cyberbullying Incidents on Social Media
Safi Samghabadi, Niloofar
Although social media has made it easy for people to connect in an unlimited virtual space, it has also opened the door to people who misuse it to bully others. Abusive behavior and cyberbullying are now considered major issues in cyberspace that can seriously affect the mental and physical health of victims. However, given the growing number of social media users, manual moderation of online content is impractical. Available automatic systems for hate speech and cyberbullying detection fail to make timely predictions, which makes them ineffective for warning the possible victims of these attacks. In this thesis, we aim to advance new technology that will help protect vulnerable online users against these attacks. As a first approximation to this goal, we develop computational methods to identify extremely aggressive texts automatically. We start by exploiting a wide range of linguistic features to create a machine learning model that detects abusive online content. Then, we build a deep neural architecture that identifies offensive content in short, noisy online texts more precisely by incorporating emotion information into textual representations. We further expand these methods and propose a Natural Language Processing system that constantly monitors online conversations and triggers an alert when a possible case of cyberbullying is happening. We design a new evaluation framework and show that our system is able to provide timely and accurate cyberbullying predictions based on limited evidence. In this research, we are mainly concerned with children and young adults, the group of users most vulnerable to online attacks. To this end, we propose new language resources for both abusive language detection and cyberbullying detection, drawn from social media platforms that are especially popular among youth.
Furthermore, in our experiments, we discuss the differences between these corpora and other available resources that include data on adult topics.