Compiling bilingual lexicon entries from a non-parallel English-Chinese corpus

P Fung - Third Workshop on Very Large Corpora, 1995 - aclanthology.org
Abstract
We propose a novel context heterogeneity similarity measure between words and their translations to help compile bilingual lexicon entries from a non-parallel English-Chinese corpus. Current algorithms for bilingual lexicon compilation rely on occurrence frequencies, length, or positional statistics derived from parallel texts. In non-parallel corpora, however, there is little correlation between such statistics for a word and for its translation. Instead, we suggest that words with productive context in one language translate to words with productive context in another language, and words with rigid context translate into words with rigid context. Context heterogeneity measures how productive the context of a word is in a given domain, independent of its absolute occurrence frequency in the text. Based on this information, we derive statistics of bilingual word pairs from a non-parallel corpus. These statistics can be used to bootstrap a bilingual dictionary compilation algorithm.
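The abstract does not spell out how context heterogeneity is computed. The Python sketch below assumes one simple formulation: a word's left (right) heterogeneity is the number of distinct token types seen immediately before (after) it, divided by its number of occurrences, so the value does not grow with raw frequency. English and Chinese words are then compared by the Euclidean distance between their (left, right) vectors. The function names (`context_heterogeneity`, `heterogeneity_distance`, `rank_candidates`) are hypothetical, the handling of text boundaries is a guess, and both token lists are assumed to be pre-tokenized (Chinese text would first need word segmentation).

```python
import math
from collections import defaultdict


def context_heterogeneity(tokens):
    """Return {word: (left, right)} heterogeneity for every word type.

    Assumed (hypothetical) definition:
      left  = distinct types immediately preceding w / occurrences of w
      right = distinct types immediately following w / occurrences of w
    Text boundaries simply contribute no neighbour.
    """
    counts = defaultdict(int)
    left_types = defaultdict(set)
    right_types = defaultdict(set)
    for i, w in enumerate(tokens):
        counts[w] += 1
        if i > 0:
            left_types[w].add(tokens[i - 1])
        if i + 1 < len(tokens):
            right_types[w].add(tokens[i + 1])
    # Normalizing by the occurrence count keeps the measure
    # independent of absolute frequency.
    return {
        w: (len(left_types[w]) / c, len(right_types[w]) / c)
        for w, c in counts.items()
    }


def heterogeneity_distance(h1, h2):
    """Euclidean distance between two (left, right) heterogeneity vectors."""
    return math.dist(h1, h2)


def rank_candidates(src_word, src_het, tgt_het):
    """Hypothetical helper: rank target-language words by how close their
    heterogeneity vectors are to that of src_word (closest first)."""
    ref = src_het[src_word]
    return sorted(tgt_het, key=lambda w: heterogeneity_distance(ref, tgt_het[w]))


if __name__ == "__main__":
    english = "the bank raised the rate and the bank cut the fee".split()
    het = context_heterogeneity(english)
    print(het["bank"])  # → (0.5, 1.0)
```

Under this assumed definition, a word with many distinct neighbours relative to its frequency (productive context) scores near 1, while a word locked into fixed phrases (rigid context) scores near 0. Pairs with the smallest distance could serve as seed entries for the dictionary bootstrapping the abstract mentions; the ranking step itself is an illustrative assumption, not a procedure stated in the abstract.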