ebmt
I have been working with Ralf Brown to improve CMU's Example Based Machine Translation (EBMT) system. As its name implies, EBMT is a form of automated machine translation that uses a large corpus of previously translated example sentences. In the same way that a human might consult an experienced translator's work when deciding how best to capture the essence of a phrase in another language, an EBMT system consults a large corpus of human translations.
Like most other forms of machine translation, EBMT's knowledge of the world is limited to the data in its corpus. Unfortunately, many words, phrases, and linguistic phenomena occur only rarely, which poses a significant problem even for a reasonably large training corpus. For this reason, most of the recent work on our EBMT system has focused on forming generalizations that increase the coverage of our examples and capture patterns that are not seen directly in the text.
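To make "generalization" concrete, here is a minimal Python sketch of one common approach, class-based templates: words belonging to an equivalence class (here a hypothetical `<NUM>` class of number words) are replaced by a class token on both sides of an example, so a single stored translation can cover sentences whose class members were never seen in that context. The class, the toy English-French corpus, and the function names are illustrative assumptions, not the EBMT system's actual implementation; a real system would also need word alignment rather than the left-to-right slot ordering assumed here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical equivalence class <NUM>: source word -> target (French) translation.
NUM_CLASS: Dict[str, str] = {"two": "deux", "three": "trois", "five": "cinq"}

# Toy bilingual example corpus (hypothetical data for illustration only).
CORPUS: List[Tuple[str, str]] = [
    ("i bought two books", "j'ai acheté deux livres"),
]

Template = Tuple[str, ...]


def generalize_source(tokens: List[str]) -> Tuple[Template, List[str]]:
    """Replace <NUM> members with the class token; also return the replaced words in order."""
    template: List[str] = []
    fillers: List[str] = []
    for tok in tokens:
        if tok in NUM_CLASS:
            template.append("<NUM>")
            fillers.append(tok)
        else:
            template.append(tok)
    return tuple(template), fillers


def build_templates(corpus: List[Tuple[str, str]]) -> Dict[Template, Template]:
    """Generalize both sides of each example into a source/target template pair."""
    target_members = set(NUM_CLASS.values())
    templates: Dict[Template, Template] = {}
    for src, tgt in corpus:
        src_t, _ = generalize_source(src.split())
        tgt_t = tuple("<NUM>" if w in target_members else w for w in tgt.split())
        # Keep only pairs whose slot counts agree, so every slot can be filled.
        if src_t.count("<NUM>") == tgt_t.count("<NUM>"):
            templates[src_t] = tgt_t
    return templates


def translate(sentence: str, templates: Dict[Template, Template]) -> Optional[str]:
    """Translate by matching the generalized input against stored templates."""
    src_t, fillers = generalize_source(sentence.split())
    tgt_t = templates.get(src_t)
    if tgt_t is None:
        return None  # no example covers this sentence, even after generalization
    # Assumes slots appear in the same left-to-right order on both sides.
    translations = iter([NUM_CLASS[w] for w in fillers])
    return " ".join(next(translations) if w == "<NUM>" else w for w in tgt_t)


if __name__ == "__main__":
    templates = build_templates(CORPUS)
    print(translate("i bought five books", templates))
```

The input "i bought five books" never occurs in the toy corpus, but after generalization it matches the template learned from "i bought two books", so the stored translation can be reused with the new number filled in. An input with an unseen content word, such as "i sold five books", still finds no match.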
During my Master's work I focused on using Arabic morphology to form translations through generalization. Arabic is a highly inflectional language: a root (usually a series of three or four consonants) combines with a vowel pattern to form a stem, and affixes representing morphological information such as person, number, and case are added to these stems to form words. Thus, while our training data is unlikely to contain every form of an Arabic word, knowing the rules of Arabic morphology lets us predict how an unseen form would behave in a context that demands specific inflections.
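The root-and-pattern process can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions: Arabic is written in a plain Latin transliteration, a pattern marks consonant slots with `C`, and the suffix table (the hypothetical `PERFECT_SUFFIXES`) covers only a handful of Form I perfect-tense person/number endings, ignoring the phonological adjustments real Arabic morphology requires. It is not the analyzer used in the research described here.

```python
from typing import Dict

# Illustrative Form I perfect-tense suffixes (simplified transliteration).
PERFECT_SUFFIXES: Dict[str, str] = {
    "3sg.m": "a",    # kataba   "he wrote"
    "3sg.f": "at",   # katabat  "she wrote"
    "1sg": "tu",     # katabtu  "I wrote"
    "1pl": "naa",    # katabnaa "we wrote"
    "3pl.m": "uu",   # katabuu  "they (m.) wrote"
}


def interdigitate(root: str, pattern: str) -> str:
    """Fill each 'C' slot in the pattern with the next root consonant, in order."""
    slots = pattern.count("C")
    if slots != len(root):
        raise ValueError(f"pattern {pattern!r} needs {slots} consonants, root has {len(root)}")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


def inflect(root: str, pattern: str, features: str) -> str:
    """Build a stem from root + pattern, then attach the suffix for the given features."""
    return interdigitate(root, pattern) + PERFECT_SUFFIXES[features]


if __name__ == "__main__":
    # Generate every listed perfect form of the root d-r-s ("study").
    for feats in PERFECT_SUFFIXES:
        print(feats, inflect("drs", "CaCaC", feats))
```

The same root can also yield different stems: `interdigitate("ktb", "CaaCiC")` gives `kaatib` ("writer"), while `CaCaC` gives the verb stem `katab`. Because the forms are generated from rules, none of them needs to appear in the training corpus for the system to anticipate it.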
Please browse my publications for more specific details of my research and contact me if you have any questions, want to collaborate, or just to say 'hi'.