A Study on the Impact of Transfer Learning for Deception Detection
In the modern age, an enormous amount of communication occurs online, and it is difficult to know whether something written is genuine or deceitful. There are many reasons for someone to be less than truthful online (e.g., monetary or political gain), and identifying this behavior without any physical interaction is a difficult task. To address this, we use eight datasets from various domains and evaluate how combining them through transfer learning affects classifier performance. We perform these experiments with multiple classifiers using TF-IDF features and find that traditional classifiers suffer a decrease in performance in almost all cases. Additionally, we generated text to evaluate transfer from a dataset similar to the target dataset and found that this improved BERT performance. We also explored how combining embeddings generated by separate BERT models, each fine-tuned on a different deception dataset, affects performance and saw several cases of improvement over baseline accuracy. Furthermore, we evaluated multiple methods that add information to text via named entities, using both a BERT model and a transfer learning method. We found that baseline BERT accuracy increased by up to 7.3%, with the most useful method replacing a named entity with its part-of-speech tag. Finally, we found that adding information via named entities consistently improved transfer learning accuracy for at least one method of adding information.
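As a rough illustration of the named-entity method that proved most useful, the sketch below replaces each token inside a named entity with its part-of-speech tag. It is a minimal example under stated assumptions, not the pipeline used in this study: the `Token` type, the function name `replace_entities_with_pos`, the tag names, and the hand-annotated sentence are all hypothetical. In practice the tags would come from an NLP tagger, and the abstract does not specify the tagger or how multi-token entities are handled.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """A pre-annotated token (hypothetical structure for illustration)."""
    text: str
    pos: str                 # part-of-speech tag, e.g. "PROPN"
    ent_type: Optional[str]  # named-entity label, or None if outside any entity


def replace_entities_with_pos(tokens: List[Token]) -> str:
    """Return the text with every named-entity token replaced by its POS tag.

    Assumption: each entity token is replaced individually; a multi-token
    entity therefore yields one tag per token.
    """
    pieces = []
    for tok in tokens:
        # Keep ordinary tokens as-is; swap entity tokens for their POS tag.
        pieces.append(tok.pos if tok.ent_type else tok.text)
    return " ".join(pieces)


# Hypothetical, hand-annotated sentence (not taken from the study's datasets).
example = [
    Token("Alice", "PROPN", "PERSON"),
    Token("stayed", "VERB", None),
    Token("in", "ADP", None),
    Token("Chicago", "PROPN", "GPE"),
    Token("twice", "ADV", None),
]

print(replace_entities_with_pos(example))  # PROPN stayed in PROPN twice
```

The transformed text would then be passed to the classifier in place of the original, so that the model sees the grammatical role of an entity rather than its surface form.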