
February 19, 2001

Q&A:

"(Speech segmentation) is especially useful for the seeing impaired and for reading e-mail over the phone."

Eileen Fitzpatrick
Professor, Linguistics


Eileen Fitzpatrick can sum up three decades as a computational linguistics specialist with one sentence: "After he ate my cat Freddie took a nap." Whether this cat eats lunch or becomes lunch depends on the reader.

"Pause in speech is important, but it is not a well-understood phenomena," said Fitzpatrick, who works on the prosodyÑthe metrical structure of verseÑin text-to-speech information retrieval. "Text-to-speech systems avoid very clear pauses because the computers don't know where to put them. Part of that is what gives text-to-speech an artificial sound."

In the classroom, Fitzpatrick draws on the experience she gained during a 25-year career at AT&T Bell Labs, where she set up three projects that use language data as input to the decision-making process of the business.

Her ties to the corporate world also helped the Linguistics Department garner a $16,000 grant from Lucent Technologies Bell Laboratories. Fitzpatrick recently discussed text-to-speech information retrieval and how this funding allows MSU graduate students to apply their linguistics training at Lucent.

INSIGHT: What is text-to-speech information retrieval?
Fitzpatrick: The processing of language by computer in its broadest sense. It covers speech synthesis, speech recognition and text understanding, as well as parts of information retrieval, text summarization and grammar checking.

INSIGHT: Tell us about the Lucent grant.
Fitzpatrick: The project, "Lucent Technologies Speech Segmentation," employs our graduate students trained in phonetics to segment speech data for Lucent's text-to-speech implementations, and to correct pre-assigned phonetics labels. The students will visit Bell Labs to receive demonstrations of how their work is being used, and will attend colloquia and presentations.

INSIGHT: How did Lucent select Montclair State for this grant?
Fitzpatrick: Three years ago, at President [Susan A.] Cole's suggestion, our department initiated an industry advisory board. The board includes people trained in linguistics from IBM, AT&T, Lucent, Random House, Educational Testing Service, the New Jersey Department of Education, Centers for Applied Linguistics, the Army Research Lab in Washington and various speech companies. A Lucent board member told me the company needed a lot of segmented speech to build a more accurate text-to-speech system. Because the signal processing system we use for speech analysis is also at Lucent, we could easily segment speech for them.

INSIGHT: What is the value of speech segmentation?
Fitzpatrick: It's especially useful for the seeing impaired and for reading e-mail over the phone. It can also be used for sending messages to pilots or divers, or in any situation where your eyes are not available to read. Quality has improved tremendously since I began working in the field because of the enormous amounts of data being collated. However, we're still not to the point where we want to be in presenting this technology. When reading free text a sentence can be ambiguous and convey the wrong information. However, the needs of the visually impaired are so high they will take what current technology can offer, even with its flaws.

INSIGHT: Tell us about your research.
Fitzpatrick: I'm working on two projects. One is at Lucent. In text-to-speech we use a single speaker as a model. But that tells us nothing about the average characteristics of speech. I'm looking at the average duration of sounds over several speakers. For instance, a Spanish speaker learning English will pronounce "organize" as "organise," because the Spanish speaker has not adjusted the duration of the vowel before the "z." We won't know how to help Spanish-speaking people with that accent difference until we can model English for them. I'm also working with Steve Seegmiller [of Linguistics] collecting written English data from non-native speakers. Our objective is to build a corpus of non-native written English for research in second language acquisition and for computer applications such as grammar checking.
