Brown Bag Talks

 

A new series of Brown Bag seminars
College of Humanities and Social Sciences (CHSS)

[Supported by the U.S. National Science Foundation: Grant No. 1048406]

 

The purpose of the group is to discuss issues in large datasets, language, and speech processing. We are hoping to bring our cross-disciplinary strengths to these areas, to share ideas, to discuss the current state of the art, and to collaborate on research topics. 

 

 

 

Schedule (2012-2013) (always a work in progress)

 

 Speaker: Barbara Landau (Dick and Lydia Todd Professor, Johns Hopkins University, Department of Cognitive Science)

Time:

Date: Friday, March 22, 2013

Location: TBA

Title: TBA

 

Speaker:  Smaranda Muresan, Library and Information Science Department, Rutgers University

Date: Friday, March 1, 2013

Location: TBA

Title: TBA

 

Speaker:  Nitin Madnani (Educational Testing Service, Princeton, NJ)

Time: 1PM

Date: Friday, February 22, 2013 

Location: CS-104

Light refreshments will be served

 

"What Test Takers Say: Analyzing Argument Organization and Topical Trends in Essays"

 In this talk, I will present two strands of natural language processing research at ETS that were designed to help us understand the nature of test-taker writing in essays.

In the first, I will talk about our research on test-taker responses to argument-driven prompts, which contain not only language expressing claims and evidence, but also language used to organize these claims and pieces of evidence. Differentiating between the two may be useful for many applications, and I will discuss our automated approach to detecting such high-level organizational elements in argumentative discourse.

In the second part of the talk, I will discuss our research on test takers' responses to more generic prompts about social issues. Without an understanding of the trends reflected in these responses, automated scoring systems may not be reliable and may also worsen over time. Our preliminary approach analyzes topical trends in test takers' responses and correlates these trends with those found in the news. We find evidence that many trends are similar across essays and the news but also observe some interesting differences.

---

Bio: Nitin Madnani is currently a Research Scientist with the Text, Language and Computation group at the Educational Testing Service in Princeton, NJ. He received his PhD in Computer Science in 2010 from the University of Maryland, College Park, where he worked with Bonnie Dorr and Philip Resnik on a number of NLP topics but focused on Statistical Machine Translation and Automatic Paraphrase Generation. At ETS, he spends his time building paraphrase models for use in automated test scoring, improving grammatical error correction, analyzing sentiment in essays, and being the resident Python and information visualization geek. More details about his work can be found at http://www.desilinguist.org

 

 

 

Speaker: Serguei Pakhomov, College of Pharmacy, University of Minnesota

Time: 2PM

Date: Friday, November 16, 2012

Location: Conrad Schmitt Hall 104

Title: Computerized Assessment of Spoken Language for Pharmacodynamic Analysis

 

 Abstract:

In this talk, I will introduce a new area of research that applies computational linguistic methods and tools to the assessment of adverse effects of neuroactive medications. Some anti-epileptic medications have been reported to cause word-finding difficulties in a subset of people who take these medications. These word-finding problems are currently not very well defined and are difficult to quantify and measure precisely. Furthermore, the underlying brain mechanism(s) responsible for these deficits are currently not known. One of the main objectives of our interdisciplinary group at the University of Minnesota Center for Clinical and Cognitive Neuropharmacology (C3N) is to develop methods for more precise and reproducible measurement of these deficits, towards better characterization of their behavioral manifestations and the underlying mechanism(s). I will describe a range of computerized instruments that we have developed to extract speech and language characteristics from spontaneous speech. In particular, I will present the results of a study of 20 volunteers who were randomized to receive an anti-epileptic medication (topiramate), an anxiolytic medication (lorazepam), or placebo. In one of the cognitive assessment tasks, the subjects were asked to describe a picture. Their responses were audio recorded and subsequently examined in a semi-automated fashion to measure speech fluency. Our findings so far indicate that speech fluency characteristics, including duration of silent pauses and the rate of disfluent speech events (um's and ah's, word fragments and repetitions), are sensitive to the effects of topiramate and constitute a promising direction for further research.

 

 Short Bio:

Dr. Pakhomov is currently an Associate Professor at the University of Minnesota College of Pharmacy. He is a co-founder of the University of Minnesota Center for Clinical and Cognitive Neuropharmacology, a member of the Center for Cognitive Sciences, and an affiliate member of the Institute for Health Informatics at the University of Minnesota. Dr. Pakhomov earned a doctorate in Linguistics with a Cognitive Science minor from the University of Minnesota in 2001. Prior to his academic appointment at the University of Minnesota, he worked as a research scientist in several commercial and academic organizations, including the Mayo Clinic, Lernout and Hauspie, Inc., and Linguistic Technologies, Inc. Dr. Pakhomov’s primary research interest is in applying computational methods to analyze and quantify speech and language characteristics affected by neurodegenerative disorders and neuroactive medications.

 

Speakers:  Jing Peng (Computer Science) & Anna Feldman (Linguistics & Computer Science)

Time: 2PM, Friday, October 12, 2012

Location: Schmitt Hall 204, 2-4PM

Title:    Identifying Figurative Language in Text: First Results

Abstract:  

In our talk we will discuss several experiments whose goal is to
automatically identify figurative language in text. While our
approach does not have to be limited to a specific type of figurative
language, we concentrate on idioms (and metaphors, to some extent).
We explore several hypotheses: 1) the problem of automatic idiom
detection can be reduced to the problem of identifying an outlier in a
dataset; 2) instead of extracting multiword expressions and then
determining which belongs to the idiomatic class, we view the process
of idiom detection as a binary classification of sentences. We apply
principal component analysis (PCA) (Jolliffe 1986; Shyu et al. 2003) for outlier
detection. Detecting idioms as lexical outliers does not exploit class
label information. So, in the following experiments, we use linear
discriminant analysis (LDA) (Fukunaga 1990) to obtain a
discriminant subspace and later use a three-nearest-neighbor (3NN)
classifier to measure accuracy. We discuss pros and cons of each
approach. All the approaches are more general than the previous
algorithms for idiom detection -- neither do they rely on target idiom
types, lexicons, or large manually annotated corpora, nor do they
limit the search space by a particular type of linguistic
construction.
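
To make the two strategies in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the speakers' implementation. It assumes each sentence has already been turned into a numeric feature vector (the toy arrays below are hypothetical), scores outliers by PCA reconstruction error against a reference set of ordinary sentences, and uses a plain two-class Fisher discriminant followed by a 3NN vote. All function names are made up for illustration.

```python
import numpy as np


def fit_pca(reference, n_components):
    """Return the mean and top principal directions of the reference rows."""
    mean = reference.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components].T


def outlier_scores(points, mean, components):
    """Reconstruction error: distance from each row to the PCA subspace."""
    centered = points - mean
    reconstructed = centered @ components @ components.T
    return np.linalg.norm(centered - reconstructed, axis=1)


def fisher_direction(features, labels, ridge=1e-6):
    """Two-class LDA direction w = Sw^-1 (mu1 - mu0)."""
    class0, class1 = features[labels == 0], features[labels == 1]
    mu0, mu1 = class0.mean(axis=0), class1.mean(axis=0)
    # Within-class scatter, with a small ridge so the solve stays stable.
    scatter = (class0 - mu0).T @ (class0 - mu0) + (class1 - mu1).T @ (class1 - mu1)
    scatter += ridge * np.eye(features.shape[1])
    return np.linalg.solve(scatter, mu1 - mu0)


def knn_predict(train_z, train_labels, test_z, k=3):
    """Majority vote among the k nearest training points on the 1-D projection."""
    predictions = []
    for z in test_z:
        nearest = np.argsort(np.abs(train_z - z))[:k]
        predictions.append(int(2 * train_labels[nearest].sum() > k))
    return np.array(predictions)


# Toy vectors standing in for sentence features (hypothetical data).
literal = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0],
                    [0, -1, 0], [2, 1, 0], [-2, -1, 0]], dtype=float)
mean, components = fit_pca(literal, n_components=2)
scores = outlier_scores(np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 3.0]]), mean, components)

# Labeled toy data: 0 = literal sentence, 1 = idiomatic sentence (hypothetical).
train_x = np.array([[0, 0], [0, 1], [0, -1], [0.2, 0.5],
                    [2, 0], [2, 1], [2, -1], [1.8, -0.5]], dtype=float)
train_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = fisher_direction(train_x, train_y)
test_x = np.array([[0.1, 0.0], [1.9, 0.0]])
predicted = knn_predict(train_x @ w, train_y, test_x @ w)
```

On this toy data the second scored vector lies off the plane spanned by the reference vectors, so it receives the larger outlier score. The outlier step never looks at labels, whereas the discriminant step needs them, which is the trade-off the abstract points to.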

 


 

Schedule (2011-2012)

 

Speaker: Alice Freed, Professor Emerita of Linguistics
Time: Friday, April 20, 2012 1PM
Location: Conrad Schmitt 104
Light refreshments will be served

 

Language and gender in the public eye: Still “different” after all these years.

 

This talk is a preliminary review and update of the evidence and arguments I compiled about 10 years ago for a chapter in The Handbook of Language and Gender (Freed 2003). In the chapter, I argued that after 30 years of research on language, sex and gender - now nearly 40 - a significant discrepancy existed between public perceptions of how women and men speak (and how they are expected to speak) and the actual character of the language that people use. The persistence of this contradiction seemed to underscore the vitality of well-entrenched stereotypes about sex and gender and the weight and influence of societal efforts to maintain the impression of difference between women and men. What has changed is the type of evidence being presented today in support of these so-called differences. Brain research and the new “ null” is increasingly used to support claims of male-female language difference.

In 2003, I explained that despite the vast quantities of naturally occurring speech samples from a wide range of contexts that null, linguistic anthropologists, and other scholars had analyzed – from the amount of talk, to the structure of narratives, the use of questions, to the availability of cooperative and competitive speech styles - no consistent pattern had been found between either sex or gender and the characteristics of the way we use language. Yet, despite the enormity of our research results, the public representation of the way women and men speak was, and I will argue, still is almost identical to the characterization provided in the middle of the last century. 

Sources of evidence about public views of language difference will come mainly from (1) a review of several articles that have recently appeared in the popular press/ mass media and (2) a preliminary analysis that uses several on-line language corpora and library databases that search for the occurrence of “gender difference” in media sources for the years 2000-2010. 

Freed, Alice F. 2003. “Epilogue: Reflections on Language and Gender Research.” In The Handbook of Language and Gender. Janet Holmes and Miriam Meyerhoff (Eds.). Oxford: Blackwell Publishers. Pp. 699-721.


Joel Tetreault 

 

Wednesday, April 11, 2012, 2:30PM

Schmitt Hall, Room 110
(Light refreshments will be served)

 A New Twist on Methodologies for ESL Grammatical Error Detection 

 

Abstract:

The long-term goal of our work is to develop a system that detects errors in grammar and usage so that appropriate feedback can be given to non-native English writers, a large and growing segment of the world's population. Estimates are that in China alone as many as 300 million people are currently studying English as a second language (ESL). In particular, usage errors involving prepositions are among the most common types seen in the writing of non-native English speakers. For example, Izumi et al. (2003) reported error rates for English prepositions that were as high as 10% in a Japanese learner corpus.

Since prepositions are such a nettlesome problem for ESL writers, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource to ESL students. In this talk we first review one popular machine learning methodology for detecting preposition and article errors in texts written by ESL writers. Next, we describe a novel approach to ESL grammatical error detection: using round-trip machine translation to automatically correct errors.

This is joint work with Nitin Madnani (ETS) and Martin Chodorow (CUNY).

Speaker Bio:

Joel Tetreault is a Managing Research Scientist specializing in Computational Linguistics in the Research & Development Division at Educational Testing Service in Princeton, NJ. His research focus is Natural Language Processing with specific interests in anaphora, dialogue and discourse processing, machine learning, and applying these techniques to the analysis of English language learning and automated essay scoring. Currently he is working on automated methods for detecting grammatical errors by non-native speakers, plagiarism detection, and content scoring methods. Previously, he was a postdoctoral research scientist at the University of Pittsburgh's Learning Research and Development Center (2004-2007). There he worked on developing spoken dialogue tutoring systems. Tetreault received his B.A. in Computer Science from Harvard University (1998) and his M.S. and Ph.D. in Computer Science from the University of Rochester (2004).

 

Slides


Tuesday, January 31, 2012

Cohen Lounge (Dickson Hall), 2PM
(Light refreshments will be served)

Learning to Generate Understandable Animations of American Sign Language

Matt Huenerfauth, Associate Professor, City University of New York (CUNY)

A majority of deaf high school graduates in the U.S. have a fourth-grade English reading level or below, and so computer-generated animations of American Sign Language (ASL) could make more information and services accessible to these individuals.  Instead of presenting English text on websites or computer software, information could be conveyed in the form of animations of virtual human characters performing ASL (produced by a computer through automatic translation software or by an ASL-knowledgeable human scripting the animation).  Unfortunately, getting the details of such animations accurate enough linguistically so that they are clear and understandable is difficult, and methods are needed for automating the creation of high-quality ASL animations.

This talk will discuss my lab's research, which is at the intersection of the fields of assistive technology for people with disabilities, computational linguistics, and the linguistics of ASL.  Our methodology includes: experimental evaluation studies with native ASL signers, motion-capture data collection of an ASL corpus, linguistic analysis of this corpus, statistical modeling techniques, and animation synthesis technologies.  In this way, we investigate new models that underlie the accurate and natural movements of virtual human characters performing ASL; our current work focuses on modeling how signers use 3D points in space and how this affects the hand-movements required for ASL verb signs.

About the Speaker:

Matt Huenerfauth is an associate professor of computer science and linguistics at the City University of New York (CUNY); his research focuses on the design of computer technology to benefit people who are deaf or have low levels of written-language literacy.  He serves as an associate editor of the ACM Transactions on Accessible Computing, the major computer science journal in the field of accessibility for people with disabilities.  In 2008, he received a five-year Faculty Early Career Development (CAREER) Award from the National Science Foundation to support his research.  In 2005 and 2007, he received the Best Paper Award at the ACM SIGACCESS Conference on Computers and Accessibility, the major computer science conference on assistive technology for people with disabilities; he is serving as general chair for this conference in 2012.  He received his PhD from the University of Pennsylvania in 2006.

slides

 


Monday, December 12, 1PM
Special Collections Room @ Sprague Library

On the roles of CODAs in sign language research: Separating L1/L2 acquisition from hearing status

Dr. Marie Nadolske
(Linguistics)

This study examines narratives of three groups of American Sign Language (ASL) signers: Deaf native signers (DOD), hearing native signers (CODAs), and highly proficient non-native hearing signers (L2). Through the examination of several language domains, acquisition patterns can be identified based on whether ASL was learned as a first or second language. Alternatively, differing language patterns were identified based on whether a signer had “normal” hearing or was Deaf. These findings resulted from the inclusion of the CODA group in this study. Without their valuable data, differences between the hearing L2 signers and Deaf L1 signers would be solely attributable to language acquisition status, with no acknowledgement of the potential complications of being a bimodal-bilingual individual.


Friday, November 11, 2PM
Special Collections Room @ Sprague Library


Perception-Production Relations in Phonological Development
Dr. Tara McAllister Byun
(Communication Sciences and Disorders)


Many children who neutralize phonemic contrasts in production exhibit diminished perceptual discrimination of the same contrasts. It has proven difficult to determine whether these parallel errors reflect the influence of a primary perceptual deficit on production, or vice versa. I will offer evidence on the direction of causation by comparing positional influences on speech production and perception in one four-year-old boy with phonological disorder. The case study subject neutralized some phonemic contrasts only in initial position, a context known to have enhanced perceptual salience to adult listeners. This unique phenomenon in child phonology has been proposed to arise from a child-specific pattern of perceptual sensitivity favoring final position. However, in a nonword discrimination task, the subject was significantly more accurate in detecting contrasts in initial position, where his production errors occurred. In light of this mismatch, I conclude that the subject's errors must be the consequence of a production-oriented factor. Independent of position, however, the subject's perception of a phonemic contrast he neutralized in production was decreased relative to other contrasts. I thus argue that this case represents an unambiguous example of a perceptual deficit arising from a primary deficit in the production domain.

 

slides

 


Friday, October 14, 2011 2PM
Special Collections Room @ Sprague Library

Dr. Laura Lakusta (Psychology, Montclair State University)

Language and memory for motion events: The asymmetry between source and goal paths over development.


Human beings talk about events. The capacity to do so requires an interface between spatial cognition and language. However, given that the format of linguistic and non-linguistic representations is likely to differ, the question arises of how these two systems map onto each other and how these mappings are learned. I will present research suggesting one possible solution to this problem: a homology exists between the non-linguistic and linguistic representations of Source and Goal paths.  First, when linguistically describing a broad range of events, children and adults are more likely to encode the Goal path rather than the Source path. A Goal bias is also found when individuals represent events non-linguistically, and even extends to the event representations of pre-linguistic infants. Thus, an asymmetry between Goal and Source paths is common to both linguistic and non-linguistic structure and is found early in development. In the second part of my talk, I will present research exploring the strength of this homology. Is a Goal bias rooted generally in cognition or is it specific to intentional events? Research with infants, children, and adults suggests the latter – a Goal bias in non-linguistic cognition shows up most strongly for intentional events. These findings raise the important question of how children learn to collapse over conceptual domains for purposes of expressing Paths in language.
Slides

 



Thursday, October 6, 2011 1PM
Special Collections Room @ Sprague Library

Analysis of Temporal Processing During Sentence Comprehension
Dr. Mary Call (Linguistics) and Dr. David Townsend (Psychology)

For several years, we have been studying the comprehension of temporal relationships in English by native speakers of English.  We are now extending this research to include the processing of temporal relationships by English language learners.   In a pilot study, we collected self-paced reading data from native speakers of Spanish that differs in interesting ways from similar data collected from native speakers of English.  One of our current projects is to carry out a larger scale study of this phenomenon.

 


In addition, we are testing the influence of the first language (Spanish, in this case) on the judgments that learners make about English sentences containing stative verbs that occur in a context-establishing when clause.  If these learners are relying on their Spanish (L1) strategies, we predict that they will choose verb forms that are either ungrammatical or less-preferred in English.

 



Where: Special Collections Room, Sprague Library
When: Wednesday, April 27, 2011 2PM

“Coordinating two minds:  Do familiar partners such as friends and couples have a communicative advantage compared to strangers?”

Meredyth Krych Appelbaum, Ph.D.

Department of Psychology

 

While one might think that language understanding is a relatively straightforward, passive process, in reality people must actively work together to establish the mutual belief that they have been understood (Clark, 1996; Clark & Krych, 2004; Clark & Wilkes-Gibbs, 1986). Much of the research in referential communication involves strangers, because it is easier to study the establishment of their common ground, as opposed to friends or couples who might already share a great deal of information to which the experimenter is not privy. And yet, much of everyday communication occurs between people who are familiar with their conversational partners.  I will provide evidence that language coordination is much more complicated than it would at first seem (Clark & Krych, 2004).  Further, I will discuss a recent study that examines the impact of partner familiarity (strangers, friends, vs. couples) on the efficiency of communication for referential communication tasks in which partners have no privately shared common ground. There is mixed evidence in the literature as to whether familiar partners can communicate more effectively than strangers. Based on previous research (Krych-Appelbaum, et al., 2007), we expected and subsequently found that familiar partners did no better than strangers.  One possible reason is that familiar partners may wrongly assume that their partners should understand them better than they actually do.

 


Where: Special Collections Room, Sprague Library
When: Friday, March 25, 1PM
Speaker: Paul C. Amrhein (Psychology)
How Speech Act Theory Informs Psychotherapy Outcomes or Linguistic Pragmatics Meets Psychiatric Medicine

 


The narrative genre constructed during a psychotherapeutic session is rife with speech acts, most notably, requests and commitments, mutually exchanged by clinician and client. However, a theory capturing this phenomenon had not been put to empirical test until Amrhein, Miller, Yahne, Palmer and Fulcher (2003). Drawing from Austin, Searle, and McCawley, this theory (Amrhein, 2004) posits that much of the work incurred during psychotherapy concerns the clinician evocation of client utterances denoting desires, abilities, needs and reasons leading to expressions of commitment to maintain current behavior patterns with deleterious health consequences or, ideally, to change them.  More specifically, it is what Searle calls the “illocutionary force strength” of client verbal commitments that is proposed to be especially prognostic of future behavior.  My talk will present this theory and evidence to date indicating that commitment strength is a malleable, psychological construct influenced by treatment modality, therapist skill, and client intellectual characteristics.

 



Where: Special Collections Room, Sprague Library
When: Friday, February 25, 1 PM
Speaker: Mary Boyle (Communication Sciences and Disorders)

Semantic Feature Analysis Treatment for Aphasic Word Retrieval Problems: 
The Challenge of Moving from Naming to Discourse Production

Evidence from single-subject designs suggests that Semantic Feature Analysis Treatment improves confrontation naming of treated items and untreated items for people with mild or moderate aphasia.  However, generalization of this improvement to word retrieval during discourse production has been mixed.  Providing treatment at the discourse level, rather than at the confrontation naming level, has yielded some promising results, but has generated a new set of questions, including the best way to measure word retrieval in discourse, the stability of such measurements from day to day, and the relationships of these measures to listeners' perception of a person's word retrieval ability.  This talk will review research results and discuss current projects at the single-word and discourse levels of treatment.
Slides

 



Time: Friday, December 3, 1PM
Location: Cohen Lounge (Dickson Hall)
Speaker: Jing Peng (Computer Science)
 
 Transfer Learning with Applications to Text Classification
 
When labeled examples are difficult to obtain in a target domain, transfer learning, which exploits knowledge obtained from a source domain to improve performance in the target domain, can be very useful. Existing techniques require that the sampling distributions of the two domains be the same. However, this requirement is often violated in practice.  In this talk, we describe a technique that maps both target and source domain data into a space where we can bound the difference between the two induced distributions, thereby dramatically improving performance. We provide experiments that demonstrate the superiority of the proposed technique.
  Slides

 


Where: Special Collections Room, Sprague Library
When: Wednesday, October 27, 1 PM
Speaker:  Jen Pardo (Psychology)

Dominance and Accommodation During Conversational Interaction

Phonetic variation is a problem for psycholinguistic theories of speech production and perception. If the goal of communication is parity between sender and receiver, then the demands of efficient communication should lead to matching in the phonetic forms employed by interacting talkers. However, phonetic variation is neither random nor due solely to physiology, and is the rule rather than the exception in communication. Therefore, there must be other communicative goals that influence the phonetic forms talkers use when speaking. The current project aims to delineate some of the individual, social, and situational factors that influence phonetic form variation in ordinary conversational interactions. In three studies, unacquainted talkers were recorded before, during, and after performing a conversational task together. The recordings were analyzed and excerpts were presented to naive listeners who made perceptual similarity judgments that assessed the degree to which the talkers converged in phonetic form. Overall, there was a reliable tendency for the talkers to become more similar phonetically, but this tendency was subtle and was influenced by sex of the talker and the role of the talker in the interaction. Moreover, the patterns derived from the global assessments of phonetic similarity provided by the listeners were not related to analyses of individual acoustic attributes in a straightforward manner. These patterns of phonetic variation have important implications for an understanding of the processes of speech production, perception, and their connection.

 



Where: Special Collections Room, Sprague Library
When:  Friday, September 24, 1-2:30 PM
Speaker: Eileen Fitzpatrick (Linguistics)
Linguistic Cues to Deception