Named Entity Recognition on Social Media

Date

2022-12-02

Abstract

With the growing popularity of social media platforms (e.g., Twitter, Facebook, and Snapchat), more and more people create, share, and exchange information and ideas in such virtual spaces every day. This creates a growing demand for tools and resources that automate the processing of social media text. However, user-generated text on social media tends to be ambiguous and hard to interpret: it is often very short and contains many misspellings and language variations, which makes it difficult for machine learning systems to perform well. Moreover, social media is considered a low-resource domain, with relatively little data available for building machine learning systems. Annotating data is time-consuming and labor-intensive, and it requires domain knowledge and expert annotators.

This dissertation presents novel methods to mitigate performance degradation and improve the robustness of named entity recognition (NER) systems on social media. To mitigate performance degradation, we study the extraction and fusion of text and image information to adapt NER systems to multimodal social media environments. In addition, we propose exploiting trending topics to mitigate the impact of temporal drift.

To improve model robustness, we investigate data augmentation techniques that increase the size and diversity of the data used for training NER systems. We propose new methods for transferring data across domains based on textual patterns (e.g., style and noise). Additionally, given that social media is a low-resource domain, we propose adversarial attack methods that audit model robustness by creating adversarial examples to identify potential vulnerabilities of NER systems.

The methods presented in this dissertation are meant to make NER systems resilient to performance degradation, robust under various conditions, and reliable in noisy social media environments, in the hope of benefiting downstream natural language processing tasks such as information extraction, question answering, and machine reading comprehension.

Keywords

Named entity recognition, Social media

Citation

Portions of this document appear in: Chen, Shuguang, Gustavo Aguilar, Leonardo Neves, and Thamar Solorio. "Can images help recognize entities? A study of the role of images for Multimodal NER." arXiv preprint arXiv:2010.12712 (2020); and in: Chen, Shuguang, Leonardo Neves, and Thamar Solorio. "Mitigating temporal-drift: A simple approach to keep NER models crisp." arXiv preprint arXiv:2104.09742 (2021); and in: Chen, Shuguang, Gustavo Aguilar, Leonardo Neves, and Thamar Solorio. "Data augmentation for cross-domain named entity recognition." arXiv preprint arXiv:2109.01758 (2021); and in: Chen, Shuguang, Leonardo Neves, and Thamar Solorio. "Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition." arXiv preprint arXiv:2210.07916 (2022).