It's a Matter of Style: Experiments in Style Detection and Transformation with Natural Language Processing
Full natural language understanding requires comprehending not only the content or meaning of a piece of text or speech, but also the stylistic way in which it is conveyed. To enable real advancements in dialog systems, information extraction, and human-computer interaction, computers need to understand the entirety of what humans say, both the literal and the non-literal. This talk presents an in-depth investigation of one particular stylistic aspect, formality. First, we analyze humans' subjective perceptions of formality in four different genres of online communication, highlight areas of high and low agreement, and extract patterns that consistently differentiate formal from informal text. Next, we develop a statistical model for predicting formality at the sentence level, using rich NLP and deep learning features, and evaluate the model's performance against human judgments across genres. Finally, we show how one can leverage machine translation methods to automatically transform an informal text into a formal one, paving the way for a generalized method of style transfer. This work was done with Ellie Pavlick (UPenn) during her summer internship at Yahoo Labs and Sudha Rao (UMD) during her summer internship at Grammarly.
Joel Tetreault is Director of Research at Grammarly. His research focuses on Natural Language Processing and Deep Learning, and on applying these techniques to language understanding and the analysis of user-generated content. Currently he works on the research and productization of NLP tools and components for the next generation of intelligent writing assistance systems. Prior to joining Grammarly, he was a Senior Research Scientist at Yahoo Labs, where he worked on comments and news feed ranking, as well as hate speech detection; Senior Principal Manager of the Core Natural Language group at Nuance Communications, Inc.; and a managing research scientist at Educational Testing Service for six years, where he researched automated methods for essay scoring, detecting grammatical errors by non-native speakers, plagiarism detection, and content scoring. Tetreault received his B.A. in Computer Science from Harvard University and his M.S. and Ph.D. in Computer Science from the University of Rochester. He was also a postdoctoral research scientist at the University of Pittsburgh's Learning Research and Development Center, where he worked on developing spoken dialogue tutoring systems. In addition, he has co-organized the Building Educational Applications workshop series for 11 years as well as several shared tasks, and is NAACL Treasurer.