In this project we present CogNet, a large-scale, automatically generated database of word-origin information. Beyond the database itself, we present the algorithm used to identify word origins, together with its evaluation and analyses. CogNet currently contains 5.9 million cognate links across 338 languages (written in 35 different scripts), with a precision of 94 percent.
We present CogNet, a large-scale, automatically built database of sense-tagged cognates: words of common origin and meaning across languages. Beyond the database itself, the paper presents the algorithm and input resources used for its computation, an evaluation of the result, and a quantitative analysis of cognate data leading to novel insights on language diversity. Following early work that resulted in a first release of CogNet, our method has been made extensible, allowing the continuous evolution of the resulting database in both precision and coverage. The current version of CogNet contains 5.9 million cognate pairs (an increase of 90%) over 338 languages and 35 writing systems, with new releases already in preparation. Furthermore, as large-scale cross-lingual knowledge such as CogNet is becoming crucial for improving the quality of multilingual applications in computational linguistics, we present a case study on the use of CogNet for bilingual lexicon induction.
Keywords: