October 14, 2019

Montclair State Researchers Work to Ensure the Free Flow of Information

Professors and graduate students in the new Computational Linguistics program are taking on computational challenges, from detecting censorship to improving machine translation


Computer science and linguistics have come together in a new way at Montclair State University. Shown: graduate students in the MS in Computational Linguistics program gather for their weekly discussion group with Anna Feldman, professor of Linguistics and Computer Science, and Jing Peng, professor of Computer Science.

In research that mirrors front-page news, faculty and students in a new Computational Linguistics master’s program at Montclair State University are studying internet censorship, propaganda and fake news.

The MS in Computational Linguistics is the only program of its kind in New Jersey, and the work taking place here includes building computational models to identify figurative language, process resource-poor languages and detect internet censorship.

Learn more about all our cutting-edge programs at the Graduate School Open House on Sunday, October 27.


Anna Feldman, professor of Linguistics and Computer Science, and Jing Peng, professor of Computer Science, are collaborating on automatic identification of idioms and detecting propaganda and biased language in social media and news. Feldman and Christopher Leberknight, an associate professor of Computer Science, are working on detecting and evading internet censorship.

This research, supported by the U.S. National Science Foundation, together with studies on automatic deception detection by Eileen Fitzpatrick, chairperson of the Linguistics department, highlights the collaboration between the faculty and the first cohort of graduate students in the new master’s program. The students meet regularly in a new Natural Language Processing (NLP) lab, directed by Feldman, to discuss their work and their diverse research interests.

Martina Ducret ’18, who earned a degree in English from Montclair State, is identifying linguistic cues that signal propaganda and fake news in Bulgarian, Russian and Chinese articles. “We’re trying to figure out what kind of characteristics would demonstrate that an article is propaganda versus an unbiased source,” she says.

“The proposal sparked my interest primarily due to the political climate of today,” Ducret says. “As information becomes more and more accessible, so does the spread of misinformation. As an English major, I do find this area particularly interesting – much of my undergraduate work consisted of reading through a ‘lens,’ trying to figure out an author’s motive for writing a piece, or discussing the impact of a work. This research seems very much in line with that, of course with the added layer of looking at it through a computational linguistic perspective.”

Graduate Students Take a World View

The professors are hiring additional students to work on detecting censored content, while also preparing the first cohort of graduate students for their own research projects. Courses including Machine Learning, Computational Linguistics, Natural Language Processing and Quantitative Linguistics connect students directly to the topics they are investigating.

Among Montclair State graduates, Michael Shehata ’16 is exploring how to improve the transliteration of prayers recited in the sacred Coptic liturgy for his church community. Since graduating with a degree in Linguistics from Montclair State, he has been a project manager working on Arabic at Google NY. At the same time, he’s been studying for a certificate in Computational Linguistics and is now officially enrolled in the master’s program. “I believe it will be really interesting to apply the knowledge of natural language processing to the real world to serve my community and help the church,” he says.

In a project that seeks to eliminate workplace discrimination, Melaney Moffit ’19 is examining how to improve algorithmic fairness in voice recognition technology for speakers of African American English. “If we were here all day, I could give you a whole list of the different times people have been discriminated based on how they speak,” she says, demonstrating a passion for the work that is common among this first cohort.

The program is also attracting graduate students who earned their bachelor’s degrees at other New Jersey institutions and beyond. Zach Dau studied linguistics and cognitive science at Rutgers University and is continuing work on emojis he began there. “I’ve been given a great opportunity with the NLP lab here at Montclair as a graduate assistant to further my research,” he says.

Carlos Martinez, who earned a bachelor’s degree in Linguistics from The Ohio State University, says his goal is to create voice applications for Amazon Alexa and Google Assistant for Garifuna, a minority language spoken in Honduras, Guatemala and Belize, and the native language of his parents. “I imagine a future where I can say, ‘Hey Alexa, please speak to me in Garifuna. Tell my child a story in the language of my people.’”

Censorship Research

Leberknight and Feldman are working on a project whose goal is to ensure the free flow of information online by detecting and evading internet censorship. The project received a three-year grant from the U.S. National Science Foundation totaling $1,040,850.

The work is motivated by a series of thought-provoking questions: What foreign threats could exploit the vulnerabilities of a “free” internet to one day restrict access to information online? What are the signs of potential tests of information warfare? Will online censorship practiced by nation-states have implications for U.S. internet policies and how we access information?

“Information has often been considered a threat to society that can disrupt the status quo and evoke civil unrest. Consequently, for many centuries authorities have placed tight controls or restrictions on the type of information that can be accessed by their citizens. Today, this threat is greater than at any time in history due to the rapid spread and availability of information on the internet,” Leberknight says.

“Our research includes how linguistic components of social media posts might affect a blogpost’s likelihood of being censored,” Feldman says. “We discovered that language itself often determines the destiny of a blogpost. We already can predict if a blogpost is going to be censored with 85% accuracy, at least for Sina Weibo, a Chinese microblogging platform similar to Twitter.”

Feldman and Leberknight are currently co-organizing a workshop on censorship, disinformation and propaganda, supported by the U.S. National Science Foundation. It will be held in early November in Hong Kong, co-located with one of the top international NLP conferences, Empirical Methods in Natural Language Processing (EMNLP).

Story by Marilyn Joyce Lehren