Authorship Attribution for Realistic Scenarios

Shrestha, Prasha 1987-

dc.contributor.advisor	Solorio, Thamar
dc.contributor.committeeMember	Gonzalez, Fabio A.
dc.contributor.committeeMember	Rosso, Paolo
dc.contributor.committeeMember	Eick, Christoph F.
dc.contributor.committeeMember	Verma, Rakesh M.
dc.creator	Shrestha, Prasha 1987-
dc.date.accessioned	2018-11-30T17:14:17Z
dc.date.available	2018-11-30T17:14:17Z
dc.date.created	May 2018
dc.date.issued	2018-05
dc.date.submitted	May 2018
dc.date.updated	2018-11-30T17:14:17Z
dc.description.abstract	A majority of the previous works on authorship attribution make several assumptions while designing their problem. They assume that the candidate author set size is small and that documents of substantial length are available for each author. Also, they only consider a single genre scenario where texts with known authorship are of the same topic and genre as the text for which we are trying to perform attribution. In today's world, where most communication happens online, the text is likely to be short and the anonymity that social media offers makes it hard to narrow down the candidate authors. Moreover, for domains such as emails, we might not be able to garner in-domain data, and thus we need to be able to use data from more readily available sources such as tweets and reviews. We devise a more practical, albeit challenging, problem that is closely aligned with possible real-world authorship attribution problems. We consider short documents, a long list of possible authors, and the ability to leverage datasets from any available domain. In this work, we build neural network based models that create a well-rounded representation of the input text. A good representation of the text must be able to catch the smallest of signals present in it that can point towards the author. Only a model that can accomplish this can work for short texts while also being fairly robust to an increasing number of authors. Our results show that we were indeed successful in building such models. Our cross-domain representations are capable of distilling out the topic-specific attributes of the text such that what remains is purely owing to an author's style. This ensures that the attribution performance does not degrade when we move from in-domain data to cross-domain data. It is essential for authorship attribution methods to work for realistic scenarios, even though this adds more complexity to the task. We find that it is indeed possible to create methods that can perform well even in these challenging situations.
dc.description.department	Computer Science, Department of
dc.format.digitalOrigin	born digital
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10657/3472
dc.language.iso	eng
dc.rights	The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject	Authorship attribution
dc.subject	Domain adaptation
dc.subject	Deep learning
dc.subject	Representation learning
dc.subject	Embeddings
dc.subject	CNN for NLP
dc.title	Authorship Attribution for Realistic Scenarios
dc.type.dcmi	Text
dc.type.genre	Thesis
local.embargo.lift	2020-05-01
local.embargo.terms	2020-05-01
thesis.degree.college	College of Natural Sciences and Mathematics
thesis.degree.department	Computer Science, Department of
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Houston
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy

Files

Original bundle

Name: SHRESTHA-DISSERTATION-2018.pdf
Size: 741.87 KB
Format: Adobe Portable Document Format

Name: prasha-thesis.zip
Size: 675.46 KB
Format: Unknown data format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.82 KB
Format: Plain Text

Collections

Published ETD Collection