Brown Bag Talks


College of Humanities and Social Sciences (CHSS)

[Supported by the U.S. National Science Foundation: Grant No. 1048406]

The purpose of the group is to discuss issues in large datasets, language, and speech processing. We are hoping to bring our cross-disciplinary strengths to these areas, to share ideas, to discuss the current state of the art, and to collaborate on research topics. 


Schedule (2015-2016)


Conrad Schmitt 104
Wednesday, March 23, 11:30AM
Speaker: Eliecer Crespo-Fernández (University of Castilla-La Mancha, Spain)
Political language is consciously constructed with particular goals in mind. Finding the right words to address potential voters is key for political actors, not only to project a positive image of themselves but also of the parties they represent. Since it is conventional in politics to appear sensitive to people’s worries and concerns, politicians try hard to avoid or attenuate words that may sound unpleasant, cause discomfort, and hence put their public image at risk. To this end, they resort to a wide range of euphemistic strategies when dealing with delicate or embarrassing issues.
Following a critical discourse approach to political language, the goal of this talk is to gain insight into the strategic functions that euphemism performs in the discourse of local and state politicians from New Jersey. The analysis is based on a sample of their public comments excerpted from New Jersey’s largest newspaper, The Star-Ledger.
The results reveal that New Jersey local and state legislators employ euphemism, as part of a major strategy of positive self-presentation, for a variety of purposes: first, to refer to delicate topics without sounding inconsiderate toward socially disadvantaged groups; second, to provide safe ground for verbal attack and criticism; and third, to deliberately conceal controversial topics from the public.
Bio: Eliecer Crespo-Fernández, Department of Modern Languages, University of Castilla-La Mancha, Spain. His research interests focus on the lexical, semantic, and pragmatic dimensions of euphemism and dysphemism, which he has approached from different frameworks such as (critical) discourse analysis, applied cognitive semantics, and critical metaphor theory. He has recently focused on euphemism in political language, with special attention to the role of metaphor, and on the axiological potential of anglicism in discourse.
He has authored four books: El eufemismo y el disfemismo (2007); El lenguaje de los epitafios (2014); Sex in Language. Euphemistic and Dysphemistic Metaphors in Internet Forums (2015); and Describing English. A Practical Grammar Course (in press). He has also edited the special issue entitled Current Trends in Persuasive Discourse (2009) and co-edited the collective volume Euphemism as a Word Formation Process (2012). He has published a number of book chapters and research articles in major journals such as Text & Talk, Spanish in Context, Bulletin of Hispanic Studies, and Review of Cognitive Linguistics.



Wednesday, February 24, 2016, 1pm-2pm

Schmitt Hall 104

 Speaker: Dr. Iryna Dilay, Ivan Franko Lviv National University, Ukraine

 Cognitive verbs in English: an attempt at a comprehensive lexicosemantic study 

The presentation is concerned with an attempt at a comprehensive study of the paradigmatic, syntagmatic, and motivational properties of English cognitive verbs as a lexicosemantic group within the semantic field of cognition. I will begin with the inventory procedure, followed by an analysis of the semantic relations of the verbs, approached from both semasiological and onomasiological vantage points. Closely related to the paradigmatic semantic properties are the syntagmatic properties of the cognitive verbs, manifested in the peculiarities of their valence, collocations, and frame structure. I will discuss the deep-structure relations of the cognitive verbs underlying their prevalent surface structure S + V + O ± D. Then, based on corpus-driven evidence, I will analyze the subliminal meaning of the cognitive verbs' collocations. Finally, the motivational relations of the cognitive verbs, as the third dimension of the lexical system, will be addressed in terms of semantic change and cognitive mappings. The suggested approach reveals the principal tendencies underlying the development of the group of English cognitive verbs as a non-discrete dynamic system reflecting the generative potential of the mental lexicon.

Bio: Dr. Dilay is an Associate Professor in the English Department at Ivan Franko Lviv National University in Ukraine and a Fulbright visiting researcher in the Linguistics Department at Montclair State University (September 2015 – June 2016). Her research focuses on the corpus-based study of English verbs. Her scholarly interests encompass corpus linguistics, semantics and syntax of verbs, pragmatics, cognitive linguistics, computational linguistics, and natural language processing.



Dr. Byron Ahn (Swarthmore College)
Friday, November 20, 3:30 PM, Schmitt Hall 104
Focusing on Reflexives
In this talk, I analyze novel data, in which focus accents in English occasionally occur in unexpected places. 
Compare the placement of focus in the two question/answer pairs below.
(1) Q: Who embarrassed Jenna?  (Agent Question)
A: DANNY embarrassed Jenna.  (Sole Narrow Focus on Agent)
(2) Q: Who embarrassed Jenna?  (Agent Question)
A: Jenna embarrassed HERSELF.  (Sole Narrow Focus on Reflexive)
This pattern in (2), in which a reflexive anaphor bears the focus accent, is striking: WH questions about the *agent* typically require a focus accent on the *agent* in the answer (e.g. Halliday 1967, Krifka 2004, among many others). Critically, this pattern arises only in certain syntactic contexts. Exploring where it is (un)available indicates that syntactic derivations must directly influence prosodic representations.
This research leads to three important conclusions. First, there are at least two types of reflexives in English, though they appear morphologically identical. Second, English reflexivity involves hidden structures that resemble more obvious structures in other languages. Third, and most broadly, the distribution of focal stress (and prosody in general) can be used by theoreticians, learners, and hearers as cues for abstract syntactic structures.
Brief bio:
Dr. Ahn recently completed his PhD at UCLA (Jan 2015) and has been a visiting assistant professor at Boston University (2014-2015) and Swarthmore College (2015-present). In general, his research program aims to reduce the amount of theoretical machinery involved in our model of Language while expanding its empirical base. Specifically, he is most interested in the syntactic nature of predicates and their arguments, which he explores in English with syntactic and prosodic tools. He has worked on a range of topics touching on syntax and prosody: the syntax of phrasal stress, reflexive anaphora, emphatic reflexives, grammatical voice, the nature of the syntax-phonology interface, nominal structure in Tongan, intonational contours in yes/no questions of English, tough constructions in English, and Japanese case marking.



Schedule (2014-2015) 



Wednesday, April 29, 2015, 2:30PM
Conrad Schmitt 204
Adam Meyers (New York University)
Escaping Verb-Centrism through NomLex and NomBank: Missing Links in Predicate Argument Structure


Wednesday, February 25, 2015, 1pm
Conrad Schmitt 204
Mats Rooth (Cornell University)
Headed Span Theory in the Finite State Calculus
Headed span theory in phonology is an account of phonological substance that represents an autosegment, such as a nasality or ATR feature, as a labeled interval on a line rather than as a vertex in a graph. The intervals (or spans) have distinguished head positions. Span theory is attractive for computational approaches to phonology that work with finite state sets of strings and finite state relations between strings, because phonological representations can be encoded straightforwardly as strings. This talk takes up the problem of working out a detailed, computationally executable construction of span theory in a finite state calculus. This includes a construction of the constraint families of headed span theory as operators. For instance, the headed faithfulness constraint FthHdSp(F,X) penalizes underlying segments with value X for the feature F that do not head an [F,X] span on the surface. The family is represented as an operator that, given a feature and a value, constructs a finite state relation that inserts violation marks. The constraint is used directly in the finite state calculus to optimize a candidate set. At the end, I sketch a proposal for representing transparent segments in harmony using embedded exception spans.
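As a rough illustration of what FthHdSp(F,X) counts, the sketch below checks violations directly instead of building the finite state operator the talk describes. It assumes a hypothetical representation: a `Span` records an inclusive interval, its head position, and its feature and value, and underlying segments are aligned one-to-one with surface positions, which real candidate sets need not satisfy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical surface span: inclusive interval [start, end] with a head."""
    start: int
    end: int
    head: int
    feature: str  # e.g. "nasal" or "ATR"
    value: str    # e.g. "+" or "-"

    def __post_init__(self):
        if not self.start <= self.head <= self.end:
            raise ValueError("a span's head must lie inside the span")


def fth_hd_sp(underlying, spans, feature, value):
    """Count FthHdSp(F, X) violations by direct inspection (no finite-state machinery).

    underlying: one dict of feature values per segment, assumed to be aligned
        one-to-one with surface positions (a simplifying assumption).
    spans: the surface spans of one candidate.
    Each underlying segment with value X for feature F that does not head an
    [F, X] span on the surface incurs one violation.
    """
    heads = {s.head for s in spans if s.feature == feature and s.value == value}
    return sum(
        1
        for position, segment in enumerate(underlying)
        if segment.get(feature) == value and position not in heads
    )


# A toy four-segment form whose last two segments are underlyingly [+nasal].
underlying = [{"nasal": "-"}, {"nasal": "-"}, {"nasal": "+"}, {"nasal": "+"}]
one_span = [Span(start=1, end=3, head=2, feature="nasal", value="+")]
two_spans = [Span(2, 2, 2, "nasal", "+"), Span(3, 3, 3, "nasal", "+")]
print(fth_hd_sp(underlying, one_span, "nasal", "+"))   # 1: segment 3 heads no span
print(fth_hd_sp(underlying, two_spans, "nasal", "+"))  # 0
```

A candidate with one wide span headed on segment 2 is penalized once, because segment 3 is underlyingly [+nasal] but heads nothing; splitting it into two headed spans satisfies this constraint, though other constraints in the grammar would evaluate the two candidates differently.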
Mats Rooth is a professor at Cornell University, in the departments of Linguistics and Computing & Information Science.  He does research in two areas: computational linguistics and natural language semantics. He has worked extensively on mixed symbolic/probabilistic models of syntax and the lexicon, on contrastive intonation (what is called focus), and on related phenomena such as ellipsis and presupposition. In addition to these, he is currently working on finite state optimality theory and web harvesting of intonational data.



Wednesday, January 21, 2015, 1pm & 2pm
Conrad Schmitt 104

Talk 1: The deterministic prosody of wh-indeterminates
Jiwon Yun (Stony Brook University)
In a number of languages, wh-words are ambiguous between interrogatives and indefinites (e.g. nuku 'who/someone' in Korean). This talk concerns how the ambiguity of such 'wh-indeterminates' is resolved by prosody, focusing on Korean. Contrary to previous impressionistic observations that indefinites are distinguished from their interrogative counterparts by a lack of phonological prominence, my experimental results indicate that phonological phrasing plays a crucial role in disambiguating wh-indeterminates. The results further show that the role of phonological prominence is, rather, to force a semantically wide-scope interpretation.

Talk 2: Debunking the vowel shift hypotheses in Mongolic and Korean
Seongyeon Ko (CUNY)
The goal of this talk is to reject the two well-known vowel shift hypotheses in so-called "Altaic" linguistics, which hold at their core that the vowel harmony systems of the oldest attested Mongolic and Korean languages (Old Mongolian and Middle Korean) are based on a "palatal" contrast (front vs. back vowels) and have developed into the modern systems through sequential chain shifts of vowel qualities. Instead, based on the comparative method, the typology of vowel shifts, and the phonetics of vowel features, it is argued that both Old Mongolian and Middle Korean vowel harmony are best characterized as based on the feature [Retracted Tongue Root]. It follows that there were no vowel shifts in the vocalic history of Mongolic and Korean as previously claimed.


Natasha Abner
Morphology in Child Homesign: Evidence from Number Marking
March 18, 2015
Homesigners are deaf individuals who are not exposed to conventional sign languages and create sign systems 'from scratch' to communicate with the people around them. This situation provides an opportunity to study the role of language input in language development by investigating the language-like properties that do and do not emerge in homesign. In this research, we investigate the innovation of number language by child homesigners. We show that child homesigners have distinct gestural devices for expressing information about number and that they combine these devices with both deictic (pointing) and iconic gestures. Analysis of the homesigners' form-based gesture classes also reveals that they exhibit systematic form-meaning mappings characteristic of a morphological system. We also compare number language in child homesign to number expressions in mature (sign) languages.



 Wednesday, October 8, 2014, 1pm

Conrad Schmitt 104

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

Jing Peng (Computer Science, Montclair State University)
Anna Feldman (Linguistics & Computer Science, Montclair State University)
Ekaterina Vylomova (Computer Science, Bauman Moscow State Technical University & Linguistics, Montclair State University)


We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are high-ranking representatives of a common topic of discussion are less likely to be part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms occur are typically more affective; therefore, we incorporate a simple analysis of the intensity of emotions expressed by contexts. We investigate the bag-of-words topic representation of one to three paragraphs containing an expression that should be classified as idiomatic or literal (a target phrase). We extract topics from paragraphs containing idioms and from paragraphs containing literals using an unsupervised clustering method, Latent Dirichlet Allocation (LDA) (Blei et al., 2003). Since idiomatic expressions exhibit the property of non-compositionality, we assume that they usually carry semantics different from those of the words used in the local topic. We treat idioms as semantic outliers, and the identification of a semantic shift as outlier detection. Thus, this topic representation allows us to differentiate idioms from literals using local semantic contexts. Our results are encouraging.
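The two hypotheses above can be combined into a toy scoring rule. This is not the authors' classifier: the sketch assumes a topic's top-ranked words are already available (for example, from an LDA model trained elsewhere), uses a tiny hypothetical emotion lexicon, and adds the two signals with an arbitrary weight `affect_weight`. A target phrase whose words are poorly represented among the topic's top words, in a more emotional context, receives a higher score.

```python
def topic_coverage(phrase_words, top_topic_words):
    """Fraction of the phrase's words found among the topic's top-ranked words."""
    if not phrase_words:
        return 0.0
    top = set(top_topic_words)
    return sum(1 for w in phrase_words if w in top) / len(phrase_words)


def affect_intensity(context_words, emotion_lexicon):
    """Mean emotion intensity of the context; words not in the lexicon count as 0."""
    if not context_words:
        return 0.0
    return sum(emotion_lexicon.get(w, 0.0) for w in context_words) / len(context_words)


def idiomaticity_score(phrase_words, context_words, top_topic_words,
                       emotion_lexicon, affect_weight=0.5):
    """Toy score: higher means more idiom-like (poor topical fit plus affective context)."""
    outlierness = 1.0 - topic_coverage(phrase_words, top_topic_words)
    return outlierness + affect_weight * affect_intensity(context_words, emotion_lexicon)


# Hypothetical top words of a paragraph-level topic and a tiny emotion lexicon.
topic = ["river", "boat", "water", "fish", "bank"]
lexicon = {"furious": 1.0, "shocked": 0.8}
literal = idiomaticity_score(["boat", "water"], ["river", "fish"], topic, lexicon)
idiom = idiomaticity_score(["spill", "the", "beans"], ["furious", "she"], topic, lexicon)
print(literal, idiom)  # 0.0 1.25
```

In practice a threshold on such a score, or a classifier using both signals as features, would be tuned on labeled data; the weight and lexicon here are placeholders.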




 Schedule (2013-2014) 

Keelan Evanini (ETS)

Friday, April 11, 2014, 1:30PM
Conrad Schmitt Hall 104

Using Pitch Contours to Improve Automated Spoken Language Proficiency Assessment

An utterance’s prosodic cues (such as intonation and stress) are important for successful communication, but many non-native speakers have difficulty mastering these aspects of a second language. In particular, pitch contours can be difficult for non-native speakers of English, since a variety of contours, each associated with different meanings, can be appropriate for a given utterance. This study focuses on the assessment of pitch contours in the context of an automated spoken language proficiency assessment that elicits read speech. Various methods of representing a speaker’s pitch contour will be presented, along with several methods of comparing a given contour to a model of native speaker pitch contours. Additionally, linguistic information is used to select the subsets of words in the reading passage that are important for the utterance’s prosody, in order to improve the performance of the features. Results show that including the proposed features based on a speaker’s pitch contour improves the performance of an automated spoken language assessment system.
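The abstract does not name the contour representations or comparison methods used. One simple comparison, assumed here for illustration only, resamples both contours to a fixed number of points, z-normalizes them so that differences in overall pitch level and range drop out, and takes the Pearson correlation. The function names and the default of 20 points are hypothetical.

```python
import math


def resample(contour, n):
    """Linearly interpolate a contour (F0 values, e.g. in Hz) to exactly n points."""
    if not contour or n < 2:
        raise ValueError("need a non-empty contour and n >= 2")
    if len(contour) == 1:
        return [float(contour[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def z_normalize(values):
    """Remove pitch level and range differences (a constant contour maps to zeros)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [0.0] * len(values) if sd == 0 else [(v - mean) / sd for v in values]


def contour_similarity(speaker, native, n=20):
    """Pearson correlation between two contours after resampling and z-normalization."""
    a = z_normalize(resample(speaker, n))
    b = z_normalize(resample(native, n))
    # With population standard deviations, the mean product of z-scores is Pearson's r.
    return sum(x * y for x, y in zip(a, b)) / n


native = [180, 210, 240, 220, 170]
print(round(contour_similarity([v / 2 for v in native], native), 6))  # 1.0
print(round(contour_similarity([400 - v for v in native], native), 6))  # -1.0
```

Normalizing first means a speaker whose voice is lower but whose rise-fall shape matches the model still scores 1.0, while a mirror-image contour scores -1.0.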


Keelan Evanini is a Managing Research Scientist in the NLP & Speech group in the Research & Development division at Educational Testing Service.  His research focuses on developing methods to automatically assess various linguistic aspects of non-native speaking proficiency, including pronunciation, intonation, fluency, and content appropriateness.  He has also conducted research into improving other components of a system for automated non-native spoken language assessment, such as the detection of plagiarized spoken responses and optimizing ASR performance for non-native speech.  He received a B.A. in Linguistics from the University of California, Berkeley and a Ph.D. in Linguistics from the University of Pennsylvania.


Mark Hubey (Computer Science, Montclair State University)

Friday, November 22, 1PM, Conrad Schmitt 104
Dissimilarity/Distance, Correlation, and All that


Emily Hill (Computer Science, Montclair State University)
Friday, November 15, 1PM, Conrad Schmitt 104
Evaluating Feature Location Techniques for Software Maintenance
Today's software is large and complex, with systems consisting of millions of lines of code. Developers new to a software project face significant challenges in locating the code related to their maintenance tasks, such as fixing bugs or adding new features; this activity is called feature location. Developers may simply be assigned a bug and told to fix it, even when they have no idea where to begin. In fact, research has shown that during maintenance a developer typically spends more time locating and understanding code than modifying it. We can significantly reduce the cost of software maintenance by reducing the time and effort needed to find and understand the code relevant to a maintenance task.
In this talk, we will explore state-of-the-art approaches to feature location for software maintenance, how they are currently evaluated, and whether elements of a feature can be generically categorized.
Emily Hill is an Assistant Professor at Montclair State University. Her primary research interests are in software engineering, specifically in reducing software maintenance costs by building intuitive software engineering and program comprehension tools. Her research is interdisciplinary and combines aspects of software engineering, program analysis, natural language processing, computational linguistics, information retrieval, text mining, and machine learning. You can find out more on her web site:
Friday, October 25, 1PM, Conrad Schmitt 104
Marina Kunshchikova
Department of German Linguistics
The Ural State University, Yekaterinburg, Russia
Classroom bilingualism and second language wordplay
Bilingualism, or the correlation of two linguistic worlds, is a well-known second language acquisition (SLA) phenomenon that has attracted great academic interest across various research fields. Classroom bilingualism (CB) is a relatively new term that has its own place in the theory of SLA and bilingualism. CB is the subject of different linguistic disciplines, including general linguistics, psycholinguistics, cognitive linguistics, and sociolinguistics. We analyze the second language speech of students with a high level of English (bilinguals). Our research deals with the non-standard (creative) linguistic features of classroom bilingual speech. The research material consists of the students' written English in a situation of classroom English-Russian bilingualism (80 essays, or 600,000 printed characters) and of transcripts of oral English classes (60 academic hours, or 500,000 printed characters). As a result, examples of wordplay have been found at all levels of the language system (graphical, morphological, lexical, and syntactic). These newly found second language coinages at several levels of the language system help to develop the theory of language creativity and wordplay and contribute to the theory of interlanguage.
Bio: Marina Kunshchikova is a postgraduate student at the Ural Federal University and an instructor of English in the German philology department. She’s currently a Fulbright visiting researcher at Montclair State University. Her research interests are second language acquisition, psycholinguistics, bilingualism and language creativity as well as corpus linguistics.
Xiaofei Lu
Associate Professor
Department of Applied Linguistics
The Pennsylvania State University

Friday, September 27, 2:30PM
Conrad Schmitt 104

A historical analysis of text complexity of the American reading curriculum 

The widely adopted Common Core State Standards (CCSS) call for raising the level of text complexity in textbooks used by American students across all grade levels; the authors of the English language arts component of the CCSS build their case for higher complexity in part upon a research base they say shows a steady decline in the difficulty of student reading textbooks over the past half century. In this interdisciplinary study, we offer our own independent analysis of third- and sixth-grade reading textbooks used throughout the past 115 years. Our dataset consists of 8,041 reading texts selected from 117 textbook series issued by 30 publishers, resulting in a corpus of roughly 10 million words. Each reading text in the corpus was assessed using a large set of readability, lexical complexity, and syntactic complexity measures. Contrary to previous reports, we find that text complexity has either risen or stabilized over the past half century; these findings have significant implications for the justification of the CCSS as well as for our understanding of a “decline” within American schooling more generally.
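The abstract does not list the specific measures. As an illustration only, the sketch below computes three common, simple proxies for readability and lexical complexity: mean sentence length in words, type-token ratio, and the Flesch-Kincaid grade level (0.39 × words per sentence + 11.8 × syllables per word − 15.59). The sentence splitter, tokenizer, and vowel-group syllable counter are rough assumptions and will be noisier than the tools typically used in such studies.

```python
import re


def split_sentences(text):
    """Split on runs of ., !, or ? followed by whitespace or end of text (heuristic)."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return [p for p in parts if p.strip()]


def tokenize(text):
    """Lowercased alphabetic tokens (apostrophes kept inside words)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def count_syllables(word):
    """Approximate syllables as groups of consecutive vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def complexity_profile(text):
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(tokens) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in tokens) / len(tokens)
    return {
        "mean_sentence_length": words_per_sentence,
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word - 15.59,
    }


print(complexity_profile("The cat sat. The dog ran."))
```

Note that raw type-token ratio falls as texts get longer, so comparisons across texts of very different lengths usually rely on length-corrected variants.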

Bio: Xiaofei Lu is Gil Watz Early Career Professor in Language and Linguistics and Associate Professor of Applied Linguistics at The Pennsylvania State University. His research interests are primarily in corpus linguistics, computational linguistics, and intelligent computer-assisted language learning. 


Matt Mulholland and Joanne Quinn (Montclair State University)

Wednesday, September 25, 2013 4PM, Conrad Schmitt 104

Title: Suicidal Tendencies: The Automatic Classification of Suicidal and Non-Suicidal Lyricists Using NLP

Abstract: TBA 


Schedule (2012-2013) 


Friday, May 3, 2013

Location: CS104

Time: 10:30AM

Lisa Radding (Ethnic Technologies, LLC, S. Hackensack, NJ)

Applied Linguistics: Onomastics and Direct Marketing


I am an Onomastician. But to what end? Direct Marketing. More specifically, I facilitate target marketing by ethnicity, religion, language preference, etc. Companies can increase their ROI by creating specialized marketing campaigns, meant to engage specific clientele. To do this, they require predictive consumer intelligence that doesn’t encroach on an individual’s privacy. But basic information about an individual exists within the person’s name. I design a software product that uses onomastic research at its core to enable target marketing.

Linguists develop logical thinking and analytical skills, and gain experience organizing data sets and testing the validity of hypothesized general rules. This valuable skill set can be applied to multiple passions, in a variety of industries.  In my case study, the passion is Onomastics and the industry is Direct Marketing.


Lisa Radding is the Director of Research at Ethnic Technologies, LLC. In this position, she envisions, researches, and writes methodology enhancements for E-Tech, the industry-leading product in multicultural marketing. As a linguist, and specifically an onomastician (an expert in the study of proper names), she maintains and improves the ethnic name research at the core of E-Tech. Additionally, she is instrumental in the development of sister products under the E-Tech brand that enable added database segmentation, particularly focused on the Hispanic, Asian, and African American markets. Ms. Radding has published work in onomastics in the Geographical Review and has presented her work both in this academic field and in the context of direct marketing. She currently serves on the Executive Council of the American Name Society.


Friday, April 19, 2PM

Schmitt Hall 204

Michael Flor, Educational Testing Service (Princeton, NJ)


ConSpel: Automatic spelling correction and the power of context.




Single-token non-word misspellings are the most common type of misspelling in student essays. This work presents an investigation of using four different types of contextual information to improve the accuracy of automatic correction of such errors. The presentation has three parts. In Part 1, I describe the methodology and tools we used to annotate misspellings in a corpus of 3,000 student essays written by native and non-native speakers of English in response to TOEFL® and GRE® writing prompts. Part 2 presents the principles and architecture of a new spell-checking system (ConSpel) that uses contextual information for the automatic correction of non-word misspellings. The task is framed as contextually informed re-ranking of correction candidates. I will also briefly touch on the technical innovations that make this system possible. Part 3 presents an investigation of the four types of contextual information. The effectiveness of the proposed methods is evaluated on the annotated corpus of essays. Using context-informed re-ranking of candidate suggestions, the ConSpel system exhibits very strong error-correction results. It also corrects errors made by non-native English writers with almost the same success rate as for writers who are native English speakers.
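To make the re-ranking framing concrete, here is a minimal sketch of the general idea, not of ConSpel itself, whose scoring the abstract does not describe. Candidates are vocabulary words within Levenshtein distance 2 of the misspelling, and each is scored by combining an edit-distance penalty with bigram counts for the words to its left and right. The vocabulary, the counts, the scoring formula, and the parameter `context_weight` are all invented for illustration.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rerank(misspelling, left, right, vocabulary, bigram_counts,
           max_distance=2, context_weight=1.0):
    """Return candidate corrections, best first (hypothetical scoring).

    score = context_weight * log(1 + count(left, c) + count(c, right)) - distance
    bigram_counts maps (word1, word2) tuples to corpus counts.
    """
    scored = []
    for cand in vocabulary:
        distance = edit_distance(misspelling, cand)
        if distance > max_distance:
            continue
        context = (bigram_counts.get((left, cand), 0)
                   + bigram_counts.get((cand, right), 0))
        scored.append((context_weight * math.log1p(context) - distance, cand))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [cand for _, cand in scored]


vocab = ["their", "thief", "tier", "house"]
counts = {("in", "their"): 40, ("their", "house"): 50}
print(rerank("thier", "in", "house", vocab, counts))  # ['their', 'thief', 'tier']
print(rerank("thier", "in", "house", vocab, counts, context_weight=0.0))
# ['thief', 'tier', 'their']
```

Edit distance alone prefers "thief" and "tier" (one edit each) over "their" (two substitutions), which is exactly the kind of case where context is needed to recover the intended word.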


Michael Flor is an associate research scientist in the NLP & Speech group, at the R&D Division of Educational Testing Service (ETS). He earned his PhD in cognitive psychology with specialization in psycholinguistics, from Tel Aviv University, Israel. Michael has also worked as a computational linguist for start-up companies, developing natural-language processing algorithms for content-personalization and search-engine applications. At ETS, Michael specializes in NLP research and systems development, focused on automatic processing of text data, combining statistical, linguistic and cognitive approaches.


March 22, 2013, 2PM

Schmitt Hall 104

Origins and development of spatial language: Some complexities

Barbara Landau

Johns Hopkins University

The acquisition of spatial language has historically provided an important test-bed for theories of the relationship between language and non-linguistic representation. According to one hypothesis, spatial language emerges from a foundation of non-linguistic spatial concepts that are present pre-linguistically; in this view, the child selects from a prior set of spatial concepts on the basis of linguistic input, electing only a subset of the spatial distinctions that are universally present in his non-linguistic repertoire.  These basic distinctions are shown in the child’s earliest spoken language, and correspond to concepts such as containment and support.  By contrast, a second hypothesis states that spatial language emerges strictly as a function of linguistic input; in this view, the child creates new spatial concepts (not previously available) on the basis of this linguistic input. In this talk, I consider several challenges to both of these hypotheses.  Chief among these challenges is the role played by the combinatorics inherent in spatial language, and hence the meanings that can be expressed across languages.  Acknowledging the complexities of the mapping between spatial language and spatial concepts forces us to abandon simplistic hypotheses and begin to think about learning in new and more subtle ways. 

About Barbara Landau:


March 1, 2013, 2PM

Schmitt Hall 104

Smaranda Muresan (Rutgers University)
Title: Computational Models for Context-dependent Deep Language Understanding
Within the last decade, machine learning techniques have significantly advanced the field of natural language processing. However, most state-of-the-art machine learning models suffer from a fundamental limitation: they are based on finite-state or context-free formalisms, which are inadequate for capturing the whole range of human language. Without richer, linguistically inspired formalisms, these models may soon reach a plateau for studies and applications in which deep language processing is needed. In this talk I will introduce a grammar formalism for deep linguistic processing, Lexicalized Well-Founded Grammar (LWFG), that allows context-dependent interpretation of utterances and at the same time is learnable from data. Once a grammar is learned, an LWFG parser and semantic/pragmatic interpreter map text to its underlying meaning representation. I will discuss three key features of my framework for deep language understanding: 1) a rich but learnable grammar formalism; 2) a model that can learn complex representations from a small amount of data; and 3) context modeling. I will discuss how this framework can be used for learning consumer-health terminologies from text. If time allows, I will talk about our new NSF-funded project on developing technologies for teaching computer agents to follow instructions given in natural language, so that they can learn how to carry out complex tasks on behalf of a user (a project in collaboration with Michael Littman and Marie desJardins).
Smaranda Muresan is an assistant professor in the Library and Information Science Department, School of Communication and Information, at Rutgers University. She is the co-director of the Laboratory for the Study of Applied Language Technologies and Society and a member of the graduate faculty in the Department of Computer Science. She received her PhD in Computer Science from Columbia University in 2006. Before coming to Rutgers, she was a Postdoctoral Research Associate at the Institute for Advanced Computer Studies at the University of Maryland. Her research focuses on computational models for language understanding and learning, with applications to health informatics, human-computer instruction, and computational social science. Her research is funded primarily by NSF and DARPA.



Speaker:  Nitin Madnani (Educational Testing Service, Princeton, NJ)

Time: 1PM

Date: Friday, February 22, 2013 

Location: CS-104

Light refreshments will be served


"What Test Takers Say: Analyzing Argument Organization and Topical Trends in Essays"

 In this talk, I will present two strands of natural language processing research at ETS that were designed to help us understand the nature of test-taker writing in essays.

In the first, I will talk about our research on test-taker responses to argument-driven prompts, which contain not only language expressing claims and evidence but also language used to organize these claims and pieces of evidence. Differentiating between the two may be useful for many applications, and I will discuss our automated approach to detecting such high-level organizational elements in argumentative discourse.

In the second part of the talk, I will discuss our research on test takers' responses to more generic prompts about social issues. Without an understanding of the trends reflected in these responses, automated scoring systems may not be reliable and may also worsen over time. Our preliminary approach analyzes topical trends in test takers' responses and correlates these trends with those found in the news. We find evidence that many trends are similar across essays and the news, but we also observe some interesting differences.
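The abstract says topical trends in essays were correlated with trends in the news but does not say how. One plausible reading, sketched here purely as an assumption, is a Pearson correlation between the share of documents mentioning a topic in essays and in news articles over the same time periods. The keyword-matching `topic_share` helper and the tiny monthly collections are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def topic_share(documents_by_period, topic_terms):
    """Fraction of documents in each period that mention any term in the topic_terms set."""
    shares = []
    for docs in documents_by_period:
        hits = sum(1 for doc in docs if topic_terms & set(doc.lower().split()))
        shares.append(hits / len(docs) if docs else 0.0)
    return shares


economy = {"economy", "jobs", "unemployment"}
essays = [["the economy is weak", "sports matter"],
          ["jobs and the economy", "unemployment is high"],
          ["music", "art"]]
news = [["economy slows", "weather"],
        ["economy and jobs", "jobs report"],
        ["film", "weather", "economy stable"]]
essay_trend = topic_share(essays, economy)  # [0.5, 1.0, 0.0]
news_trend = topic_share(news, economy)
print(essay_trend, round(pearson(essay_trend, news_trend), 3))
```

A real analysis would use many more periods and a proper topic model rather than keyword lists; with only a handful of points, a high correlation says little on its own.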


Bio: Nitin Madnani is currently a Research Scientist with the Text, Language and Computation group at the Educational Testing Service in Princeton, NJ. He received his PhD in Computer Science in 2010 from the University of Maryland, College Park, where he worked with Bonnie Dorr and Philip Resnik on a number of NLP topics, focusing on Statistical Machine Translation and Automatic Paraphrase Generation. At ETS, he spends his time building paraphrase models for use in automated test scoring, improving grammatical error correction, analyzing sentiment in essays, and being the resident Python and information visualization geek. More details about his work can be found at




Speaker: Serguei Pakhomov, College of Pharmacy, University of Minnesota

Time: 2PM

Date: Friday, November 16, 2012

Location: Conrad Schmitt Hall 104

Title: Computerized Assessment of Spoken Language for Pharmacodynamic Analysis



In this talk, I will introduce a new area of research that applies computational linguistic methods and tools to the assessment of adverse effects of neuroactive medications. Some anti-epileptic medications have been reported to cause word-finding difficulties in a subset of the people who take them. These word-finding problems are currently not well defined and are difficult to quantify and measure precisely. Furthermore, the underlying brain mechanism(s) responsible for these deficits are currently not known. One of the main objectives of our interdisciplinary group at the University of Minnesota Center for Clinical and Cognitive Neuropharmacology (C3N) is to develop methods for more precise and reproducible measurement of these deficits, toward better characterization of their behavioral manifestations and the underlying mechanism(s). I will describe a range of computerized instruments that we have developed to extract speech and language characteristics from spontaneous speech. In particular, I will present the results of a study of 20 volunteers who were randomized to receive an anti-epileptic medication (topiramate), an anxiolytic medication (lorazepam), or placebo. In one of the cognitive assessment tasks, the subjects were asked to describe a picture. Their responses were audio recorded and subsequently examined in a semi-automated fashion to measure speech fluency. Our findings so far indicate that speech fluency characteristics, including the duration of silent pauses and the rate of disfluent speech events (um's and ah's, word fragments, and repetitions), are sensitive to the effects of topiramate and constitute a promising direction for further research.
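The fluency characteristics named above can be computed from a time-aligned transcript. The sketch below is not the group's instrument: it assumes a hypothetical input of (token, start_seconds, end_seconds) tuples, a small hypothetical filler inventory, an assumed convention that word fragments are transcribed with a trailing "-", and an arbitrary 0.15-second minimum for a gap to count as a silent pause.

```python
FILLERS = {"um", "uh", "ah", "er"}  # hypothetical filler inventory


def fluency_measures(tokens, min_pause=0.15):
    """Simple fluency measures from time-aligned (token, start_s, end_s) tuples in time order.

    Silent pauses are gaps of at least min_pause seconds between consecutive tokens.
    Tokens ending in "-" are treated as word fragments (an assumed transcription
    convention), and an immediately repeated non-filler token counts as a repetition.
    Rates are per minute of the span from the first token's start to the last token's end.
    """
    if not tokens:
        raise ValueError("no tokens")
    minutes = (tokens[-1][2] - tokens[0][1]) / 60.0
    if minutes <= 0:
        raise ValueError("tokens must span a positive duration")
    pauses = [
        start - prev_end
        for (_, _, prev_end), (_, start, _) in zip(tokens, tokens[1:])
        if start - prev_end >= min_pause
    ]
    words = [w.lower() for w, _, _ in tokens]
    fillers = sum(w in FILLERS for w in words)
    fragments = sum(w.endswith("-") for w in words)
    repetitions = sum(a == b and a not in FILLERS for a, b in zip(words, words[1:]))
    return {
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "pauses_per_min": len(pauses) / minutes,
        "disfluencies_per_min": (fillers + fragments + repetitions) / minutes,
    }


picture_description = [("the", 0.0, 0.2), ("um", 0.5, 0.8), ("boy", 0.8, 1.1),
                       ("boy", 1.1, 1.4), ("is", 2.0, 2.2), ("reach-", 2.2, 2.5),
                       ("reaching", 2.5, 3.0)]
print(fluency_measures(picture_description))
```

In this three-second toy sample there are two silent pauses (0.3 s and 0.6 s) and three disfluent events (one filler, one repetition, one fragment); in a drug study such measures would be compared across conditions such as medication and placebo.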


 Short Bio:

 Dr. Pakhomov is currently an Associate Professor at the University of Minnesota College of Pharmacy. He is a co-founder of the University of Minnesota Center for Clinical and Cognitive Neuropharmacology, a member of the Center for Cognitive Sciences and an affiliate member of the Institute for Health Informatics at the University of Minnesota. Dr. Pakhomov earned a doctorate in Linguistics, with a minor in Cognitive Science, from the University of Minnesota in 2001. Prior to his academic appointment at the University of Minnesota, he worked as a research scientist in several commercial and academic organizations, including the Mayo Clinic, Lernout and Hauspie, Inc. and Linguistic Technologies, Inc. Dr. Pakhomov's primary research interest is in applying computational methods to analyze and quantify speech and language characteristics affected by neurodegenerative disorders and neuroactive medications.


Speakers: Jing Peng (Computer Science) & Anna Feldman (Linguistics & Computer Science)

Time: 2-4PM, Friday, October 12, 2012

Location: Schmitt Hall 204

Title: Identifying Figurative Language in Text: First Results


In our talk we will discuss several experiments whose goal is to automatically identify figurative language in text. While our approach need not be limited to a specific type of figurative language, we concentrate on idioms (and, to some extent, metaphors). We explore two hypotheses: 1) the problem of automatic idiom detection can be reduced to the problem of identifying an outlier in a dataset; 2) instead of extracting multiword expressions and then determining which of them belong to the idiomatic class, we view the process of idiom detection as a binary classification of sentences. We apply principal component analysis (PCA) (Jolliffe 1986; Shyu et al. 2003) for outlier detection. Detecting idioms as lexical outliers, however, does not exploit class label information, so in subsequent experiments we use linear discriminant analysis (LDA) (Fukunaga 1990) to obtain a discriminant subspace and then apply a three-nearest-neighbor (3NN) classifier in that subspace to measure classification accuracy. We discuss the pros and cons of each approach. All of these approaches are more general than previous algorithms for idiom detection: they do not rely on target idiom types, lexicons, or large manually annotated corpora, nor do they limit the search space by a particular type of linguistic



Schedule (2011-2012)

Speaker: Alice Freed, Professor Emerita of Linguistics
Time: Friday, April 20, 2012 1PM
Location: Conrad Schmitt 104
Light refreshments will be served


Language and gender in the public eye: Still “different” after all these years. 


This talk is a preliminary review and update of the evidence and arguments I compiled about 10 years ago for a chapter in The Handbook of Language and Gender (Freed 2003). In that chapter, I argued that after 30 years of research on language, sex and gender (now nearly 40), a significant discrepancy existed between public perceptions of how women and men speak (and how they are expected to speak) and the actual character of the language that people use. The persistence of this contradiction seemed to underscore the vitality of well-entrenched stereotypes about sex and gender and the weight and influence of societal efforts to maintain the impression of difference between women and men. What has changed is the type of evidence being presented today in support of these so-called differences: brain research and the new “ null” are increasingly used to support claims of male-female language difference.

In 2003, I explained that despite the vast quantities of naturally occurring speech samples from a wide range of contexts that null, linguistic anthropologists, and other scholars had analyzed (from the amount of talk to the structure of narratives, the use of questions, and the availability of cooperative and competitive speech styles), no consistent pattern had been found between either sex or gender and the characteristics of the way we use language. Yet despite the sheer volume of our research results, the public representation of the way women and men speak was, and I will argue still is, almost identical to the characterization provided in the middle of the last century.

Evidence about public views of language difference will come mainly from (1) a review of several articles that have recently appeared in the popular press and mass media and (2) a preliminary analysis that uses several online language corpora and library databases to search for occurrences of “gender difference” in media sources from the years 2000-2010.

Freed, Alice F. 2003. “Epilogue: Reflections on Language and Gender Research.” In Janet Holmes and Miriam Meyerhoff (Eds.), The Handbook of Language and Gender, pp. 699-721. Oxford: Blackwell Publishers.

Joel Tetreault 


Wednesday, April 11, 2012, 2:30PM

Schmitt Hall, Room 110
(Light refreshments will be served)

 A New Twist on Methodologies for ESL Grammatical Error Detection 



The long-term goal of our work is to develop a system that detects errors in grammar and usage so that appropriate feedback can be given to non-native English writers, a large and growing segment of the world's population. Estimates are that in China alone as many as 300 million people are currently studying English as a second language (ESL). In particular, usage errors involving prepositions are among the most common types seen in the writing of non-native English speakers. For example, Izumi et al. (2003) reported error rates for English prepositions as high as 10% in a Japanese learner corpus.

Since prepositions are such a nettlesome problem for ESL writers, an NLP application that can reliably detect these types of errors would be an invaluable learning resource for ESL students. In this talk we first review one popular machine learning methodology for detecting preposition and article errors in texts written by ESL writers. Next, we describe a novel approach to ESL grammatical error detection: using round-trip machine translation to automatically correct errors.

This is joint work with Nitin Madnani (ETS) and Martin Chodorow (CUNY).

Speaker Bio:

Joel Tetreault is a Managing Research Scientist specializing in Computational Linguistics in the Research & Development Division at Educational Testing Service in Princeton, NJ. His research focus is Natural Language Processing with specific interests in anaphora, dialogue and discourse processing, machine learning, and applying these techniques to the analysis of English language learning and automated essay scoring. Currently he is working on automated methods for detecting grammatical errors made by non-native speakers, plagiarism detection, and content scoring methods. Previously, he was a postdoctoral research scientist at the University of Pittsburgh's Learning Research and Development Center (2004-2007), where he worked on developing spoken dialogue tutoring systems. Tetreault received his B.A. in Computer Science from Harvard University (1998) and his M.S. and Ph.D. in Computer Science from the University of Rochester (2004).



Tuesday, January 31, 2012

Cohen Lounge (Dickson Hall), 2PM
(Light refreshments will be served)

Learning to Generate Understandable Animations of American Sign Language

Matt Huenerfauth, Associate Professor, City University of New York (CUNY)

A majority of deaf high school graduates in the U.S. have a fourth-grade English reading level or below, so computer-generated animations of American Sign Language (ASL) could make more information and services accessible to these individuals. Instead of presenting English text on websites or in computer software, information could be conveyed in the form of animations of virtual human characters performing ASL (produced by a computer through automatic translation software or by an ASL-knowledgeable human scripting the animation). Unfortunately, making such animations linguistically accurate enough to be clear and understandable is difficult, and methods are needed for automating the creation of high-quality ASL animations.

This talk will discuss my lab's research, which is at the intersection of the fields of assistive technology for people with disabilities, computational linguistics, and the linguistics of ASL.  Our methodology includes: experimental evaluation studies with native ASL signers, motion-capture data collection of an ASL corpus, linguistic analysis of this corpus, statistical modeling techniques, and animation synthesis technologies.  In this way, we investigate new models that underlie the accurate and natural movements of virtual human characters performing ASL; our current work focuses on modeling how signers use 3D points in space and how this affects the hand-movements required for ASL verb signs.

About the Speaker:

Matt Huenerfauth is an associate professor of computer science and linguistics at the City University of New York (CUNY); his research focuses on the design of computer technology to benefit people who are deaf or have low levels of written-language literacy.  He serves as an associate editor of the ACM Transactions on Accessible Computing, the major computer science journal in the field of accessibility for people with disabilities.  In 2008, he received a five-year Faculty Early Career Development (CAREER) Award from the National Science Foundation to support his research.  In 2005 and 2007, he received the Best Paper Award at the ACM SIGACCESS Conference on Computers and Accessibility, the major computer science conference on assistive technology for people with disabilities; he is serving as general chair for this conference in 2012.  He received his PhD from the University of Pennsylvania in 2006.



Monday, December 12, 1PM
Special Collections Room @ Sprague Library

On the roles of CODAs in sign language research: Separating L1/L2 acquisition from hearing status

Dr. Marie Nadolske

This study examines narratives of three groups of American Sign Language (ASL) signers: Deaf native signers (DOD), hearing native signers (CODAs), and highly proficient non-native hearing signers (L2). Through the examination of several language domains, acquisition patterns can be identified based on whether ASL was learned as a first or second language. Alternatively, differing language patterns were identified based on whether a signer had “normal” hearing or was Deaf. These findings were possible only because the CODA group was included in this study. Without their valuable data, differences between the hearing L2 signers and the Deaf L1 signers would be attributed solely to language acquisition status, with no acknowledgement of the potential complications of being a bimodal-bilingual individual.

Friday, November 11, 2PM
Special Collections Room @ Sprague Library

Perception-Production Relations in Phonological Development
Dr. Tara McAllister Byun (Communication Sciences and Disorders)

Many children who neutralize phonemic contrasts in production exhibit diminished perceptual discrimination of the same contrasts. It has proven difficult to determine whether these parallel errors reflect the influence of a primary perceptual deficit on production, or vice versa. I will offer evidence on the direction of causation by comparing positional influences on speech production and perception in one four-year-old boy with a phonological disorder. The case study subject neutralized some phonemic contrasts only in initial position, a context known to have enhanced perceptual salience for adult listeners. This phenomenon, which is unique to child phonology, has been proposed to arise from a child-specific pattern of perceptual sensitivity favoring final position. However, in a nonword discrimination task, the subject was significantly more accurate in detecting contrasts in initial position, where his production errors occurred. In light of this mismatch, I conclude that the subject's errors must be the consequence of a production-oriented factor. Independent of position, however, the subject's perception of a phonemic contrast he neutralized in production was poorer than his perception of other contrasts. I thus argue that this case represents an unambiguous example of a perceptual deficit arising from a primary deficit in the production domain.




Friday, October 14, 2011 2PM
Special Collections Room @ Sprague Library

Dr. Laura Lakusta (Psychology, Montclair State University)

Language and memory for motion events: The asymmetry between source and goal paths over development.

Human beings talk about events. The capacity to do so requires an interface between spatial cognition and language. However, given that the format of linguistic and non-linguistic representations is likely to differ, the question arises of how these two systems map onto each other and how these mappings are learned. I will present research suggesting one possible solution to this problem: a homology exists between the non-linguistic and linguistic representations of Source and Goal paths. First, when linguistically describing a broad range of events, children and adults are more likely to encode the Goal path than the Source path. A Goal bias is also found when individuals represent events non-linguistically, and it even extends to the event representations of pre-linguistic infants. Thus, an asymmetry between Goal and Source paths is common to both linguistic and non-linguistic structure and is found early in development. In the second part of my talk, I will present research exploring the strength of this homology. Is a Goal bias rooted generally in cognition, or is it specific to intentional events? Research with infants, children, and adults suggests the latter: a Goal bias in non-linguistic cognition shows up most strongly for intentional events. These findings raise the important question of how children learn to collapse over conceptual domains for purposes of expressing Paths in language.


Thursday, October 6, 2011 1PM
Special Collections Room @ Sprague Library

Analysis of Temporal Processing During Sentence Comprehension
Dr. Mary Call (Linguistics) and Dr. David Townsend (Psychology)

For several years, we have been studying the comprehension of temporal relationships in English by native speakers of English. We are now extending this research to include the processing of temporal relationships by English language learners. In a pilot study, we collected self-paced reading data from native speakers of Spanish; these data differ in interesting ways from similar data collected from native speakers of English. One of our current projects is to carry out a larger-scale study of this phenomenon.


In addition, we are testing the influence of the first language (Spanish, in this case) on the judgments that learners make about English sentences containing stative verbs in a context-establishing “when” clause. If these learners rely on their Spanish (L1) strategies, we predict that they will choose verb forms that are either ungrammatical or less preferred in English.


Schedule (2010-2011)

Where: Special Collections Room, Sprague Library
When: Wednesday, April 27, 2011 2PM

“Coordinating two minds:  Do familiar partners such as friends and couples have a communicative advantage compared to strangers?”

Meredyth Krych Appelbaum, Ph.D.

Department of Psychology


While one might think that language understanding is a relatively straightforward, passive process, in reality people must actively work together to establish the mutual belief that they have been understood (Clark, 1996; Clark & Krych, 2004; Clark & Wilkes-Gibbs, 1986). Much of the research in referential communication involves strangers, because it is easier to study how they establish common ground than to study friends or couples, who might already share a great deal of information to which the experimenter is not privy. And yet, much of everyday communication occurs between people who are familiar with their conversational partners. I will provide evidence that language coordination is much more complicated than it would at first seem (Clark & Krych, 2004). Further, I will discuss a recent study that examines the impact of partner familiarity (strangers vs. friends vs. couples) on the efficiency of communication in referential communication tasks in which partners have no privately shared common ground. There is mixed evidence in the literature as to whether familiar partners can communicate more effectively than strangers. Based on previous research (Krych-Appelbaum et al., 2007), we expected, and subsequently found, that familiar partners did no better than strangers. One possible reason is that familiar partners may wrongly assume that their partners understand them better than they actually do.


Where: Special Collections Room, Sprague Library
When: Friday, March 25, 2011 1PM
Speaker: Paul C. Amrhein (Psychology)
How Speech Act Theory Informs Psychotherapy Outcomes or Linguistic Pragmatics Meets Psychiatric Medicine


The narrative genre constructed during a psychotherapeutic session is rife with speech acts, most notably requests and commitments, mutually exchanged by clinician and client. However, a theory capturing this phenomenon had not been put to empirical test until Amrhein, Miller, Yahne, Palmer and Fulcher (2003). Drawing on Austin, Searle, and McCawley, this theory (Amrhein, 2004) posits that much of the work done during psychotherapy concerns the clinician's evocation of client utterances denoting desires, abilities, needs and reasons, which lead to expressions of commitment either to maintain current behavior patterns that have deleterious health consequences or, ideally, to change them. More specifically, it is what Searle calls the “illocutionary force strength” of client verbal commitments that is proposed to be especially prognostic of future behavior. My talk will present this theory and evidence to date indicating that commitment strength is a malleable psychological construct influenced by treatment modality, therapist skill, and client intellectual characteristics.


Where: Special Collections Room, Sprague Library
When: Friday, February 25, 2011 1PM
Speaker: Mary Boyle (Communication Sciences and Disorders)

Semantic Feature Analysis Treatment for Aphasic Word Retrieval Problems:
The Challenge of Moving from Naming to Discourse Production

Evidence from single-subject designs suggests that Semantic Feature Analysis Treatment improves confrontation naming of treated and untreated items for people with mild or moderate aphasia. However, evidence that this improvement generalizes to word retrieval during discourse production has been mixed. Providing treatment at the discourse level, rather than at the confrontation naming level, has yielded some promising results, but has raised a new set of questions, including the best way to measure word retrieval in discourse, the stability of such measurements from day to day, and the relationship of these measures to listeners' perception of a person's word retrieval ability. This talk will review research results and discuss current projects at the single-word and discourse levels of treatment.


Time: Friday, December 3, 2010 1PM
Location: Cohen Lounge (Dickson Hall)
Speaker: Jing Peng (Computer Science)
 Transfer Learning with Applications to Text Classification
When labeled examples are difficult to obtain in a target domain, transfer learning, which exploits knowledge obtained from a source domain to improve performance in the target domain, can be very useful. Existing techniques require that the sampling distributions of the two domains be the same. However, this requirement is often violated in practice. In this talk, we describe a technique that maps both target and source domain data into a space where we can bound the difference between the two induced distributions, thereby dramatically improving performance. We provide experiments that demonstrate the superiority of the proposed technique.


Where: Special Collections Room, Sprague Library
When: Wednesday, October 27, 2010 1PM
Speaker: Jen Pardo (Psychology)

Dominance and Accommodation During Conversational Interaction

Phonetic variation is a problem for psycholinguistic theories of speech production and perception. If the goal of communication is parity between sender and receiver, then the demands of efficient communication should lead to matching in the phonetic forms employed by interacting talkers. However, phonetic variation is neither random nor due solely to physiology, and it is the rule rather than the exception in communication. Therefore, there must be other communicative goals that influence the phonetic forms talkers use when speaking. The current project aims to delineate some of the individual, social, and situational factors that influence phonetic form variation in ordinary conversational interactions. In three studies, unacquainted talkers were recorded before, during, and after performing a conversational task together. The recordings were analyzed, and excerpts were presented to naive listeners, who made perceptual similarity judgments that assessed the degree to which the talkers converged in phonetic form. Overall, there was a reliable tendency for the talkers to become more similar phonetically, but this tendency was subtle and was influenced by the talker's sex and the talker's role in the interaction. Moreover, the patterns derived from the listeners' global assessments of phonetic similarity were not related in a straightforward manner to analyses of individual acoustic attributes. These patterns of phonetic variation have important implications for understanding the processes of speech production and perception, and their connection.


Where: Special Collections Room, Sprague Library
When: Friday, September 24, 2010 1-2:30PM
Speaker: Eileen Fitzpatrick (Linguistics)
Linguistic Cues to Deception