The expansion of internet connectivity, particularly in rural communities, allows more people to build an online presence, for example through a social media account. Many natural language processing (NLP) systems rely on language models, which predict the next word in a sequence and often require large amounts of text to perform well. This text is typically authored by many people and exhibits a demographic bias that can degrade model performance for individuals who are not well represented by the majority demographic. To overcome this bias, we would ideally custom-fit each NLP system to a single individual without relying on large amounts of text from that individual. Unfortunately, many state-of-the-art language-model-based systems use neural models, which need large amounts of training text to perform well. Recently, I developed a method to personalize neural language models that were originally trained on generic text, increasing their performance for an individual while requiring only a relatively small amount of that individual’s text.
A person’s social media presence becomes their identity, and therefore any text published from their account affects their public image. Authorship verification involves determining whether an individual is the author of a snippet of text, and it can be used to prevent malicious use of a social media account. My research applies personalized neural language models to determine whether a snippet of text was written by a specific individual, without requiring large amounts of text from that individual, in an unsupervised setting – a setup that does not require any manually labeled instances.
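To make the idea concrete, the following is a minimal sketch of perplexity-based authorship verification. It is not the neural method described above: it swaps in add-one smoothed unigram models as a stand-in for neural language models, and it assumes that personalization can be approximated by linearly interpolating a generic model with a model built from a small amount of the individual’s text. The function names, the interpolation weight lam, the decision rule, and the toy texts are all hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder for words outside the vocabulary


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(generic, personal, lam=0.5):
    """Mix generic and personal distributions; lam is an assumed weight."""
    return {w: lam * personal[w] + (1 - lam) * generic[w] for w in generic}


def perplexity(model, tokens):
    """Per-token perplexity of a snippet; unknown words map to UNK."""
    log_prob = sum(math.log(model.get(t, model[UNK])) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def is_likely_author(snippet, generic, personalized):
    """Unsupervised decision: accept the snippet if the personalized model
    finds it less surprising than the generic model does."""
    return perplexity(personalized, snippet) < perplexity(generic, snippet)


# Toy data, invented for illustration only.
generic_text = "the cat sat on the mat and the dog sat on the rug".split()
personal_text = "the tractor and the plough sat in the barn".split()
vocab = set(generic_text) | set(personal_text) | {UNK}

generic = unigram_model(generic_text, vocab)
personalized = interpolate(generic, unigram_model(personal_text, vocab))

own_snippet = "the tractor sat in the barn".split()
other_snippet = "the cat sat on the mat".split()
print(is_likely_author(own_snippet, generic, personalized))
print(is_likely_author(other_snippet, generic, personalized))
```

No labeled examples are needed for the decision, since it only compares how well two models predict the same snippet. A practical system would likely calibrate a threshold on the perplexity gap rather than use a strict comparison.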
Determining the meaning of a word in context is a common research focus and has been studied through tasks such as usage similarity – evaluating the similarity of two usages of the same word – and word sense disambiguation – mapping a word in context to a dictionary-like definition called a sense. Progress on these tasks can assist downstream applications such as machine translation, question answering, and sentiment analysis. During my Master’s program, I developed a state-of-the-art unsupervised usage similarity method, which embeds a word’s context using a neural network. My current research focuses on word sense disambiguation at the individual level, that is, determining the meaning of a word in context for a specific individual. This research involves the creation of a novel dataset that consists of sense-annotated instances of words in text from individuals, along with unannotated text from the same individuals. This work aligns with my personalization-based research, since different individuals often use the same word with different meanings. For example, a farmer might use the word spade to mean a sturdy hand shovel, whereas an avid card player might use it to refer to a playing card in the major suit that has one or more black figures on it.
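As an illustration of usage similarity, the sketch below compares two usages of a target word by embedding each context and taking the cosine similarity of the results. It is only a stand-in for the method mentioned above: it assumes a context can be embedded by averaging word vectors, whereas the original method uses a neural network, and the three-dimensional vectors and example sentences are invented for this example.

```python
import math

# Hypothetical word vectors, invented for illustration only.
VECTORS = {
    "dig": [0.9, 0.1, 0.0],
    "garden": [0.8, 0.2, 0.1],
    "soil": [0.9, 0.0, 0.1],
    "ace": [0.0, 0.9, 0.0],
    "deck": [0.1, 0.8, 0.2],
}


def embed_context(tokens, target):
    """Average the vectors of the known context words around the target."""
    context = [VECTORS[t] for t in tokens if t != target and t in VECTORS]
    if not context:
        raise ValueError("no known context words to embed")
    dim = len(context[0])
    return [sum(vec[i] for vec in context) / len(context) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def usage_similarity(usage_a, usage_b, target):
    """Similarity of two usages of the same target word."""
    return cosine(embed_context(usage_a, target), embed_context(usage_b, target))


farmer_1 = "dig the garden with a spade".split()
farmer_2 = "the spade cut into the soil".split()
player = "the ace in the deck is a spade".split()

same_sense = usage_similarity(farmer_1, farmer_2, "spade")
different_sense = usage_similarity(farmer_1, player, "spade")
print(same_sense > different_sense)
```

With these toy vectors, the two gardening usages score as more similar to each other than a gardening usage does to the card-playing usage, which is the behavior a usage similarity method aims for.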