Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data-scarce contexts. However, data reuse requires dealing with data heterogeneity, which is always present in data coming from different sources. Such heterogeneity appears at different levels, such as the language used by the data, the structure of the information it represents, and the data types and formats adopted by the datasets. Despite the valuable insights gained by reusing data across contexts, dealing with data heterogeneity is still a high price to pay. Additionally, data reuse is hampered by the lack of data distribution infrastructures supporting the production and distribution of high-quality, interoperable data. These issues are amplified in cross-country data reuse, where geographical and cultural differences are more pronounced. In this paper, we propose LiveData, a cross-country data distribution network handling high-quality, diversity-aware data. LiveData is composed of nodes whose architecture provides components for generating and distributing a new type of data, in which heterogeneity is transformed into information diversity and treated as a feature, explicitly defined and used to satisfy data users' purposes. This paper presents the specification of the LiveData network by defining the architecture and the type of data handled by its nodes. This specification is currently being used to implement a concrete use case for data reuse and integration between the University of Trento (Italy) and the National University of Mongolia.
This study analyzes the possible relationship between personality traits, in terms of the Big Five (extraversion, agreeableness, responsibility, emotional stability, and openness to experience), and social interactions mediated by digital platforms in different socioeconomic and cultural contexts. We considered data from a questionnaire and from the experience of using a chatbot as a means of requesting and offering help, with students from four universities: the University of Trento (Italy), the National University of Mongolia, the London School of Economics (United Kingdom), and the Universidad Católica Nuestra Señora de la Asunción (Paraguay). The main findings confirm that personality traits may influence social interactions and active participation in groups. They should therefore be taken into account in algorithms that match people who ask for help with people who could respond, so that recommendations rest not only on knowledge and skills.
A full digital transition poses many challenges, from strategy and management to adopting new technologies and creating new services. It is not only a matter of technical and technological evolution but also one of social change, and a major factor influencing it will be the digital university. A digital university is a university of the future: it builds a virtual reflection, a digital world, that depicts its activities and services as a whole, and it develops with a digital strategy for using that world in real life with the help of modern technology. To build that world, however, establishing a sustainable data infrastructure and high-quality big data is a fundamental issue that must be resolved. University data and information are distributed across many kinds of information systems, each providing services for a specific purpose. For example, information on books and articles is stored in one system, while information about their authors is stored in a separate human resource management system. Furthermore, the same data are stored in many different ways. Under such heterogeneous data conditions, advanced search and analysis are nearly impossible. If we can integrate the data and build the university's digital world, it becomes possible to find the results of projects implemented in a given place together with their funders; to find the experts in a given field who can best represent the university in some activity; and to assess future trends in the research output of a given department, a given professor, or a given topic. Beyond this, it will be easier to introduce new services based on modern technologies such as artificial intelligence, blockchain, and AR/VR into teaching, research, and society. Moreover, by collecting multifaceted data about its students and partners and getting to know them better, the university can also deliver tailored, personalized services. For students, for example, understanding not only their course work but also their daily lifestyle and learning competencies makes it possible to deliver, through the digital university, services tailored to each person and perhaps different for each. This talk presents, through examples, initial experience in building a data infrastructure and a knowledge graph for a digital university by integrating data semantically, together with new services built on them. It also introduces some research on data about students' behavior and daily activities, and it gives an overview of future work and research opportunities in building a digital university.
Smartphones enable understanding human behavior with activity recognition to support people’s daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a country. However, people are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study settings, making detecting simple activities less meaningful for context-aware applications. Hence, the understanding of (i) how multimodal smartphone sensors and machine learning models could be used to detect complex daily activities that can better inform about people’s daily lives, and (ii) how models generalize to unseen countries, is limited. We analyzed in-the-wild smartphone data and ~216K self-reports from 637 college students in five countries (Italy, Mongolia, UK, Denmark, Paraguay). Then, we defined a 12-class complex daily activity recognition task and evaluated the performance with different approaches. We found that even though the generic multi-country approach provided an AUROC of 0.70, the country-specific approach performed better, with AUROC scores in the range 0.79-0.89. We believe that research along the lines of diversity awareness is fundamental for advancing human behavior understanding through smartphones and machine learning, for more real-world utility across countries.
Universities aim to adopt digital technologies such as artificial intelligence, big data analysis, and semantic technology in their teaching and research activities as part of their digital transformation. The National University of Mongolia (NUM) has implemented the Digital NUM pilot project, which delivers a knowledge graph as NUM's main data infrastructure and a virtual assistant chatbot as a new service. In this talk, we present the results of the Digital NUM pilot project and the experience gained in implementing it.
Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps, as well as valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the focus has always been on improving model accuracy using different sensing modalities and machine learning techniques, with datasets collected in homogeneous populations. In contrast, less attention has been given to assessing whether mood inference models generalize to new countries. In this study, we collected a mobile sensing dataset with 329K self-reports from 678 participants in eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, UK) to assess the effect of geographical diversity on mood inference models. We define and evaluate country-specific (trained and tested within a country), continent-specific (trained and tested within a continent), country-agnostic (tested on a country not seen in the training data), and multi-country (trained and tested with multiple countries) approaches trained on sensor data for two mood inference tasks with population-level (non-personalized) and hybrid (partially personalized) models. We show that partially personalized country-specific models perform best, yielding area under the receiver operating characteristic curve (AUROC) scores in the range 0.78-0.98 for two-class (negative vs. positive valence) and 0.76-0.94 for three-class (negative vs. neutral vs. positive valence) inference. Further, we show that models in the country-agnostic approach do not perform well compared to country-specific settings, even when they are partially personalized. We also show that continent-specific models outperform multi-country models in the case of Europe. Overall, we uncover generalization issues of mood inference models to new countries and show how the geographical similarity of countries might impact mood inference.
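The country-agnostic setting described above amounts to a leave-one-country-out split: each country is held out in turn and the model is trained on the rest. A minimal sketch of such a split follows; the `country` field name and the toy records are assumptions made for illustration and are not taken from the study.

```python
from collections import defaultdict


def leave_one_country_out(records, country_key="country"):
    """Yield (held_out_country, train, test) splits.

    Each split tests on one country that is absent from training,
    mirroring a country-agnostic evaluation. `country_key` is a
    hypothetical field name chosen for this sketch.
    """
    by_country = defaultdict(list)
    for rec in records:
        by_country[rec[country_key]].append(rec)
    for held_out in sorted(by_country):
        test = by_country[held_out]
        train = [
            rec
            for country, recs in by_country.items()
            if country != held_out
            for rec in recs
        ]
        yield held_out, train, test


# Toy records, invented for illustration only.
toy = [
    {"country": "Italy", "x": 1},
    {"country": "Mongolia", "x": 2},
    {"country": "Mongolia", "x": 3},
    {"country": "Paraguay", "x": 4},
]
splits = list(leave_one_country_out(toy))
```

A country-specific setting would instead split the records of a single country into train and test parts, so the two settings differ only in how the split is drawn.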
We propose a general methodology and an infrastructure that enable interoperability both within a single university and across universities. The former is achieved by incrementally defining and building a knowledge graph (KG) from data coming from multiple heterogeneous databases. Interoperability across universities is achieved by having a reference KG schema that each university can adapt to its local needs while keeping track of the changes, and by natively supporting multilinguality. We meet this latter requirement by exploiting a multilingual lexical resource covering more than one thousand languages and by seamlessly translating across the schemas and also (to some extent) across the data written in the local languages. The effectiveness of the proposed approach is demonstrated by the services developed in two projects conducted at two universities, in Italy and Mongolia.
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams. The best system averaged a 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian) and received 10 system submissions from 3 teams. The best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale. As a first result obtained for the domain of kinship terminology, known to be very diverse throughout the world, we publish a lexico-semantic resource consisting of 198 domain concepts, 1,911 words, and 37,370 gaps covering 699 languages. We see potential in the use of resources such as ours for the improvement of a variety of cross-lingual NLP tasks, which we demonstrate through a downstream application for the evaluation of machine translation systems.
Building a data infrastructure by integrating data into a knowledge graph with semantic technology is becoming an important issue. In a university, data about professors, researchers, courses, articles, and projects are commonly distributed across many different information systems. The goal of this research is to develop an ontology that represents university knowledge and to use it for semantic data integration, yielding a more effective and efficient data infrastructure solution. Following the solution we developed, we created a knowledge graph of 322 thousand triples that can be used by modern technologies and digital services.
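To illustrate what integrating scattered records into triples can look like, here is a minimal sketch. It assumes two hypothetical source systems (a publication system and a staff system) that share a person identifier; the predicate names, identifiers, and sample records are invented for the example and do not reflect the actual ontology or data.

```python
def to_triples(publications, staff):
    """Merge two record sets into a set of (subject, predicate, object) triples."""
    triples = set()
    for person_id, name in staff.items():
        triples.add((f"person:{person_id}", "hasName", name))
    for pub in publications:
        subject = f"pub:{pub['id']}"
        triples.add((subject, "hasTitle", pub["title"]))
        for author_id in pub["authors"]:
            # Link the two systems through the shared person identifier.
            triples.add((subject, "hasAuthor", f"person:{author_id}"))
    return triples


def titles_by_author(triples, name):
    """Answer a cross-system query: titles of publications by a named person."""
    people = {s for s, p, o in triples if p == "hasName" and o == name}
    pubs = {s for s, p, o in triples if p == "hasAuthor" and o in people}
    return sorted(o for s, p, o in triples if p == "hasTitle" and s in pubs)


# Hypothetical sample data, for illustration only.
staff = {"p1": "A. Researcher", "p2": "B. Lecturer"}
publications = [
    {"id": "a1", "title": "Paper One", "authors": ["p1"]},
    {"id": "a2", "title": "Paper Two", "authors": ["p1", "p2"]},
]
graph = to_triples(publications, staff)
```

Once both sources live in one graph, a question that spans them, such as which titles a named person authored, becomes a simple pattern match over triples.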
Ontology quality is an important concern when putting semantic technology and ontologies into practical use. This work aims to evaluate the quality of a university ontology in terms of its notation and structure. Using the OOPS evaluation method, we compared the VIVO ontology and the National University of Mongolia (NUM) ontology and identified serious pitfalls, both those that must be fixed and those that need not be. We analyzed some of these pitfalls and developed proposals for correcting and improving them.
Identifying cognates, that is, words in different languages that are similar in spelling and pronunciation, share a meaning, and have a common origin, makes it possible to create new language resources for computational linguistics tasks. This work aims to develop a method for generating cognates automatically from the character sequence of a word. We built a seq2seq deep learning model that generates cognates for five language pairs, covering both similar and different scripts. When the trained model was evaluated by the character difference of the generated words, it generated cognates correctly with an average accuracy of 0.73.
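The abstract does not name the exact character-difference measure. One common choice is the Levenshtein edit distance, normalized by the length of the longer word, and the sketch below uses it purely as an assumption. The function names are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_similarity(generated, reference):
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / longest
```

Averaging `char_similarity` over all generated and reference word pairs would give a single score per language pair, which is one plausible reading of the reported average.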
The goal of this research is to determine the most important features for predicting the grade of the General Entrance Exam (GEE). The features are high school students' grades, personal information, and state exam results. We collected data on 96,827 high school students and compared the F1 measures of different classification techniques: decision tree, logistic regression, artificial neural network, and support vector machine (SVM). Among these techniques, the SVM provided the best F1 measure, 0.70.
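For reference, the F1 measure is the harmonic mean of precision and recall. The sketch below computes it for a single positive class; the label names and toy predictions are invented for illustration, and the abstract does not say how F1 was averaged over grade classes.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
y_true = ["pass", "pass", "fail", "fail", "pass"]
y_pred = ["pass", "fail", "fail", "pass", "pass"]
score = f1_score(y_true, y_pred, positive="pass")
```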
This paper presents the Mongolian Wordnet (MOW) and a general methodology for constructing it from various sources, such as lexical resources and expert translations. As of today, the MOW contains 23,665 synsets, 26,875 words, 2,979 glosses, and 213 examples. The manual evaluation of the resource estimated its quality at 96.4%.
Building a wordnet from scratch is a huge task, especially for languages less equipped with pre-existing lexical resources such as thesauri or bilingual dictionaries. We address the cost of human supervision through crowdsourcing, which offers a good trade-off between quality of output and speed of progress. In this paper, we demonstrate a two-phase crowdsourcing workflow that consists of a synset localization step followed by a validation step. Validation is performed using the inter-rater agreement metrics Fleiss’ kappa and Krippendorff’s alpha, which allow us to estimate the precision of the result, as well as to set a balance between precision and recall. In our experiment, 947 synsets were localized from English to Mongolian and evaluated through crowdsourcing with a precision of 0.74.
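As a reminder of how one of the agreement metrics mentioned above works, here is a minimal sketch of Fleiss' kappa. It assumes the votes are arranged as one row per item, holding the count of raters who chose each category, with the same number of raters for every item. The data layout and the toy counts are assumptions for illustration, not the paper's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of per-item category counts.

    ratings[i][j] is the number of raters who assigned item i to
    category j; every row must sum to the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
             for j in range(n_cats)]
    # Observed agreement per item, then averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in ratings]
    p_bar = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_exp) / (1.0 - p_exp)


# Toy table: 3 items, 3 raters, 2 categories (accept, reject); invented data.
toy_votes = [[3, 0], [0, 3], [2, 1]]
kappa = fleiss_kappa(toy_votes)
```

Values near 1 indicate strong agreement among raters, while values near 0 indicate agreement no better than chance; a threshold on such a score is one way a validation step could trade precision against recall.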
This work presents an open-source system that implements a methodology for building a lexico-semantic electronic resource, the Local Knowledge Core (Нутгийн Мэдлэгийн Цөм, НМЦ), through collaborative crowdsourcing. The methodology aims to organize contributors in localizing a set of semantically interrelated concepts, expressed in multiple languages, into a given language, and to merge their contributions effectively into a new lexico-semantic resource. We designed and implemented a three-stage human intelligence task for localizing concepts, as well as an algorithm that generates tasks optimally for contributors.
Efficient analysis of traffic flow data plays an important role in achieving better transportation services. The aim of this work is to identify passengers' travel patterns from incomplete transport access data. Our proposed big data analytical model, which predicts the endpoints of regular trips, gives a significantly improved representation of live traffic behavior. We investigated nearly 38.3k patterns in three months of data recording 35M boarding actions.
Techniques for building and enriching knowledge bases via crowdsourcing have been broadly investigated. However, crowdsourced localization of the ontology of a diversity-aware knowledge base and of geospatial entities has been studied less. In this paper, we present preliminary experiments on an ontology localization task and on the transliteration of geographical names through crowdsourcing. By asking bilingual web users to perform human intelligence tasks that translate ontology terms from English into Mongolian, we obtained a localized ontology with 77% accuracy. In addition, 60% of location names were correctly localized in a single crowdsourcing activity.
Developing ontologies from scratch is very expensive in terms of cost and time, and such efforts often remain unfinished for decades. Ontology localization through translation seems a promising approach to this issue, as it enables greater reuse of the ontological (backbone) structure. However, during ontology localization, managing language diversity across cultures remains a challenge that must be handled with the right level of attention and expertise. Furthermore, the reliability of the knowledge provided in the localized ontology is emerging as a non-trivial issue. In this paper, we report the results of our experiment, performed on approximately 1000 concepts taken from the space ontology originally developed in English, which consisted of translating them into Mongolian.