False Textual Information Detection - Towards Building a Truth Machine





Detecting false information, such as questionable claims and fake news, is one of the most elusive and long-standing challenges. With social media growing dominant, falsehood can diffuse faster and more broadly than truth. This calls for building a "truth machine" that automatically debunks false information. Although existing works have developed methods to counter false information, challenges remain. For example, previous works demand a large amount of annotated data and related evidence, underestimating the difficulty of evidence linking and the cost of manual annotation. Besides, since a large number of works rely on evidence to determine the credibility of claims, we need to carefully address situations in which no evidence or only noisy evidence is provided. This thesis aims to improve the detection of false textual information in four respects. (1) We first target sentiment classification, because previous works show that leveraging sentiment can boost content-based rumor detection. We propose a representation learning framework that incorporates both labeled and unlabeled data, and we show that our model learns features that are robust across domains and removes domain-specific features. (2) We develop a hierarchical model with an attention mechanism so that the model reveals important insights at the paragraph level or at the sentence level. We evaluate our model on news satire detection and find that it can effectively discover satirical cues at different levels. (3) We extend evidence-aware claim verification from supervised learning to positive-unlabeled learning. This setting requires only a comparatively small number of labeled true claims, while the remaining claims can be unlabeled. We adopt a generative adversarial network to generate pseudo-negative examples and conduct a thorough analysis of selected models. (4) We analyze whether estimating entailment between evidence and a claim helps not only to verify the claim but also in the preliminary step of retrieving the necessary evidence. We find that entailment indeed improves evidence ranking, provided that the entailment model produces reliable outputs.



False information detection


Portions of this document appear in: Yang, Fan, Arjun Mukherjee, and Yifan Zhang. "Leveraging Multiple Domains for Sentiment Classification." In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2978-2988. 2016. And in: Yang, Fan, Arjun Mukherjee, and Eduard Dragut. "Satirical news detection and analysis using attention mechanism and linguistic features." arXiv preprint arXiv:1709.01189 (2017). And in: De Sarkar, Sohan, Fan Yang, and Arjun Mukherjee. "Attending sentences to detect satirical fake news." In Proceedings of the 27th International Conference on Computational Linguistics, pp. 3371-3380. 2018.