False Textual Information Detection - Towards Building a Truth Machine
Abstract
With social media growing dominant, false information, such as questionable claims and fake news, can diffuse faster and more broadly than the truth. Detecting false information is one of the most elusive and long-standing challenges, which calls for building a "truth machine" that automatically debunks false information. Although existing works have developed methods to detect false information, challenges remain. For example, previous works require a large amount of annotated data and related evidence, underestimating the difficulty of evidence linking and the cost of manual annotation. Moreover, since many works rely on evidence to determine the credibility of claims, we need to carefully address situations in which no evidence, or only noisy evidence, is provided. This thesis aims to improve the detection of false textual information in four respects:
1. We first target sentiment classification, because previous works show that leveraging sentiment can boost content-based rumor detection. We propose a representation learning framework that incorporates both labeled and unlabeled data, and we show that our model learns features that are robust across domains while removing domain-specific features.
2. We develop a hierarchical model with an attention mechanism so that the model reveals important insights at both the paragraph level and the sentence level. We evaluate it on news satire detection and find that it effectively discovers satirical cues at different levels.
3. We extend evidence-aware claim verification from supervised learning to positive-unlabeled learning. This setting requires only a comparatively small number of labeled true claims, while the remaining claims can be unlabeled. We adopt a generative adversarial network to generate pseudo-negative examples and conduct a thorough analysis of the selected models.
4. We pay special attention to whether estimating entailment between evidence and claim helps not only to verify the claim but also in the preliminary step of retrieving the necessary evidence. We find that entailment indeed improves evidence ranking, provided that the entailment model produces reliable outputs.
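The hierarchical attention idea in point 2 can be illustrated with a minimal sketch of two-level attention pooling. This is not the thesis's model: the function names (`attend`, `encode_document`), the plain dot-product scoring against a fixed context vector (in practice the context vector is learned and usually preceded by a learned projection), and the toy 2-dimensional sentence vectors are all hypothetical assumptions. Sentence vectors are pooled into paragraph vectors, and paragraph vectors into a document vector; the attention weights produced at each level are what make sentence-level and paragraph-level cues inspectable.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(vectors: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    # Score each vector against the context vector. Simplification (assumption):
    # a plain dot product, without the learned projection a real model would use.
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    # Attention-weighted sum of the input vectors, one coordinate at a time.
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def encode_document(
    paragraphs: List[List[Vector]], sent_ctx: Vector, para_ctx: Vector
) -> Tuple[Vector, List[float], List[List[float]]]:
    # Level 1: pool each paragraph's sentence vectors into one paragraph vector.
    para_vecs: List[Vector] = []
    sent_weights: List[List[float]] = []
    for sentences in paragraphs:
        vec, w = attend(sentences, sent_ctx)
        para_vecs.append(vec)
        sent_weights.append(w)
    # Level 2: pool the paragraph vectors into one document vector.
    doc_vec, para_weights = attend(para_vecs, para_ctx)
    return doc_vec, para_weights, sent_weights


# Hypothetical toy input: paragraph 1 has two sentences, paragraph 2 has one.
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
ctx = [2.0, 0.0]  # hypothetical fixed context vector (learned in a real model)
doc_vec, para_w, sent_w = encode_document(doc, ctx, ctx)
print("sentence weights per paragraph:", sent_w)
print("paragraph weights:", para_w)
```

Because the weights at each level sum to one, the largest sentence and paragraph weights can be read as indicating where the model concentrated, which is the kind of multi-level inspection point 2 refers to.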