Publications - National University of Mongolia

Research areas:

This information is shown as registered in the National University of Mongolia (NUM) database by professors, lecturers, and staff. We accept no responsibility for incomplete or incorrect information.

Design of a methodology for estimating disease risk with machine learning 2025

Author(s): Г.Амарсанаа, З.Цолмон, Х.Дорж, С.Цогтсайхан
"Өвчлөх эрсдлийг машин сургалтаар тооцох аргачлалын зохиомж" [Design of a methodology for estimating disease risk with machine learning], Mongolian Information Technology scientific conference (Монголын Мэдээллийн Технологи), 2025-5-23, vol. 12, pp. 27-31

Abstract

Big data analytics in health care is widely used to make the care and services provided to patients better and more effective, for example to predict diseases and pathological conditions and to assess risk. We conducted this study with the aim of developing a risk assessment model, based on data engineering and machine learning, that can be used to analyze population health big data. The study used data on 961 people who underwent preventive health check-ups at the Health Promotion Center of the Mongolia-Japan Hospital of the Mongolian National University of Medical Sciences (MNUMS) between October 2024 and March 2025, together with data on the digestive disease diagnoses made from the check-up results. The methodology is carried out in stages: collecting, processing, and cleaning the data, preparing it for machine learning, and training. Although the dataset is too small to evaluate the results, we have so far designed a methodology that selects the significant values for each disease, forms risk groups, produces a risk assessment for each individual, and can be trained with logistic regression.
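The last step described above, scoring each individual with logistic regression and placing them in a risk group, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two binary features, the toy labels, the learning rate, and the group thresholds (`low`, `high`) are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * v for w, v in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def risk_score(x, weights, bias):
    """Predicted probability of the diagnosis for one individual."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def risk_group(p, low=0.33, high=0.66):
    # Hypothetical thresholds; a real study would choose them clinically.
    if p < low:
        return "low"
    if p < high:
        return "medium"
    return "high"


# Toy data (hypothetical): two binary flags per person, e.g. "value out of range".
rows = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1], [0, 1], [1, 0]]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

weights, bias = train_logistic_regression(rows, labels)
for person in ([0, 0], [1, 1]):
    p = risk_score(person, weights, bias)
    print(person, round(p, 3), risk_group(p))
```

In this toy data the first flag fully determines the label, so the model separates the two groups easily; real check-up data would need held-out evaluation, which the abstract notes is not yet possible with 961 records.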

Cognate Production Using Character-Based Neural Machine Translation Without Segmentation 2025

Author(s): З.Цолмон, Г.Амарсанаа, Б.Хуягбаатар, M.Tsendsuren
"Cognate Production Using Character-Based Neural Machine Translation Without Segmentation" IEEE Access, vol. 13, pp. 34824 - 34830, 2025-2-19

https://ieeexplore.ieee.org/document/10892102?source=authoralert

Abstract

Cognates are words that share a common origin or have been borrowed across languages, often exhibiting similarities in both sound and meaning. In this work, we introduce a fully character-level neural sequence-to-sequence model for cognate production that does not require any segmentation. Our model operates at the character level to transform a source word into its corresponding cognate in the target language, thereby obviating out-of-vocabulary issues and alleviating the need for subword segmentation. We evaluated our approach on a novel dataset and found that it outperforms both statistical machine translation baselines and prior neural methods on the same training dataset, as measured by standard coverage and mean reciprocal rank metrics. These results underscore the effectiveness of character-level sequence-to-sequence architectures for cognate generation in diverse language settings, including cross-alphabetic transformations.
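The two metrics named in the abstract can be sketched as follows. This is an illustration under assumptions, not the paper's evaluation code: coverage is taken here to mean the share of source words whose correct cognate appears anywhere in the model's ranked candidate list, and the candidate lists and gold answers are invented toy data.

```python
def reciprocal_rank(candidates, gold):
    """1 / rank of the gold word in the ranked candidates, or 0 if absent."""
    for rank, candidate in enumerate(candidates, start=1):
        if candidate == gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions, golds):
    """Average reciprocal rank over all source words."""
    total = sum(reciprocal_rank(c, g) for c, g in zip(predictions, golds))
    return total / len(golds)


def coverage(predictions, golds):
    """Share of source words whose gold cognate is among the candidates (assumed definition)."""
    hits = sum(1 for c, g in zip(predictions, golds) if g in c)
    return hits / len(golds)


# Toy ranked outputs (hypothetical) for three source words.
predictions = [
    ["nacht", "nicht", "nackt"],  # gold at rank 1
    ["wasser", "vater"],          # gold at rank 2
    ["haus", "hus"],              # gold missing
]
golds = ["nacht", "vater", "maus"]

mrr = mean_reciprocal_rank(predictions, golds)
cov = coverage(predictions, golds)
print(mrr, cov)
```

Mean reciprocal rank rewards placing the correct cognate near the top of the list, while coverage only asks whether it was produced at all, so the two are usually reported together.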

LiveData A Worldwide Data Mesh for Stratified Data 2024

Author(s): З.Цолмон, Б.Симоне, Г.Амарсанаа
"LiveData A Worldwide Data Mesh for Stratified Data", Mongolian Information Technology 2024 (Монголын Мэдээллийн Технологи 2024), 2024-5-23, vol. 11, pp. 146-152

Abstract

Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data scarcity contexts. However, data reuse requires dealing with data heterogeneity, which is always present in data coming from different sources. Such heterogeneity appears at different levels, like the language used by the data, the structure of the information it represents, and the data types and formats adopted by the datasets. Despite the valuable insights gained by reusing data across contexts, dealing with data heterogeneity is still a high price to pay. Additionally, data reuse is hampered by the lack of data distribution infrastructures supporting the production and distribution of quality and interoperable data. These issues affecting data reuse are amplified in cross-country data reuse, where geographical and cultural differences are more pronounced. In this paper, we propose LiveData, a cross-country data distribution network handling high-quality and diversity-aware data. LiveData is composed of different nodes having an architecture providing components for the generation and distribution of a new type of data, where heterogeneity is transformed into information diversity and considered as a feature, explicitly defined and used to satisfy the data users' purposes. This paper presents the specification of the LiveData network, by defining the architecture and the type of data handled by its nodes. This specification is currently being used to implement a concrete use case for data reuse and integration between the University of Trento (Italy) and the National University of Mongolia.

Complex Daily Activities, Country-Level Diversity, and Smartphone Sensing: A Study in Denmark, Italy, Mongolia, Paraguay, and UK 2023

Author(s): A.Karim, M.Lakmal, З.Цолмон, C.Carlo, M.Deniele, L.José, H.Alethia, C.Luka, B.Ivano, D.Marcelo, B.Matteo, C.Ronald, D.William, G.Fausto, G.Daniel, K.Peter, D.Amalia, B.Miriam, S.Sally, G.George, Ч.Алтангэрэл, Г.Амарсанаа
"Complex Daily Activities, Country-Level Diversity, and Smartphone Sensing: A Study in Denmark, Italy, Mongolia, Paraguay, and UK", Conference on Human Factors in Computing Systems, Germany, 2023-4-24, vol. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1-23

Abstract

Smartphones enable understanding human behavior with activity recognition to support people’s daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a country. However, people are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study settings, making detecting simple activities less meaningful for context-aware applications. Hence, the understanding of (i) how multimodal smartphone sensors and machine learning models could be used to detect complex daily activities that can better inform about people’s daily lives, and (ii) how models generalize to unseen countries, is limited. We analyzed in-the-wild smartphone data and ~216K self-reports from 637 college students in five countries (Italy, Mongolia, UK, Denmark, Paraguay). Then, we defined a 12-class complex daily activity recognition task and evaluated the performance with different approaches. We found that even though the generic multi-country approach provided an AUROC of 0.70, the country-specific approach performed better with AUROC scores in [0.79-0.89]. We believe that research along the lines of diversity awareness is fundamental for advancing human behavior understanding through smartphones and machine learning, for more real-world utility across countries.
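For readers unfamiliar with the AUROC scores quoted above, the following sketch computes a binary AUROC from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and labels are invented toy data, and how the paper aggregates AUROC over its 12 classes is not stated in the abstract, so only the binary case is shown.

```python
def auroc(scores, labels):
    """Binary AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy model outputs (hypothetical): higher score means "more likely positive".
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]

value = auroc(scores, labels)
print(round(value, 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the gap between 0.70 (multi-country) and 0.79-0.89 (country-specific) is meaningful.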

Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries 2023

Author(s): M.Lakmal, D.William, G.George, Ч.Алтангэрэл, Г.Амарсанаа, З.Цолмон, C.Carlo, M.Daniele, H.Alethia, Z.Jose Luiz, C.Luca, B.Ivano, K.Peter, B.Marcelo Rodas, B.Matteo, C.Ronald, G.Can, G.Fausto, S.Laura, G.Daniel, G.Amalia, N.Chaitanya, D.Shyam, C.Salvador Ruiz, S.Donglei, X.Hao, B.Miriam
"Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries" Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 6, no. 4, pp. 1-32, 2023-1-11

https://dl.acm.org/doi/10.1145/3569483

Abstract

Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps and valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the focus has always been on improving the accuracies of models using different sensing modalities and machine learning techniques, with datasets collected in homogeneous populations. In contrast, less attention has been given to studying the performance of mood inference models to assess whether models generalize to new countries. In this study, we collected a mobile sensing dataset with 329K self-reports from 678 participants in eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, UK) to assess the effect of geographical diversity on mood inference models. We define and evaluate country-specific (trained and tested within a country), continent-specific (trained and tested within a continent), country-agnostic (tested on a country not seen in the training data), and multi-country (trained and tested with multiple countries) approaches trained on sensor data for two mood inference tasks with population-level (non-personalized) and hybrid (partially personalized) models. We show that partially personalized country-specific models perform the best, yielding area under the receiver operating characteristic curve (AUROC) scores in the range 0.78-0.98 for two-class (negative vs. positive valence) and 0.76-0.94 for three-class (negative vs. neutral vs. positive valence) inference. Further, with the country-agnostic approach, we show that models do not perform well compared to country-specific settings, even when models are partially personalized. We also show that continent-specific models outperform multi-country models in the case of Europe. Overall, we uncover generalization issues of mood inference models to new countries and how the geographical similarity of countries might impact mood inference.
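The country-agnostic setting described above (testing on a country never seen during training) amounts to a leave-one-country-out split. A minimal sketch of such a split is given below; the record layout with `country`, `features`, and `mood` keys is a hypothetical one chosen for illustration and is not the dataset's actual schema.

```python
def leave_one_country_out(records):
    """Yield (held_out_country, train_records, test_records) for each country."""
    countries = sorted({r["country"] for r in records})
    for held_out in countries:
        train = [r for r in records if r["country"] != held_out]
        test = [r for r in records if r["country"] == held_out]
        yield held_out, train, test


# Toy records (hypothetical schema and values).
records = [
    {"country": "Italy", "features": [0.2, 0.7], "mood": 1},
    {"country": "Italy", "features": [0.1, 0.4], "mood": 0},
    {"country": "Mongolia", "features": [0.9, 0.3], "mood": 1},
    {"country": "Denmark", "features": [0.5, 0.5], "mood": 0},
]

splits = list(leave_one_country_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

The country-specific setting would instead split the records within one country, and the multi-country setting would pool all countries before splitting, so the three approaches differ only in how the train and test sets are drawn.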

Word-level Morpheme segmentation using Transformer neural network 2022

Author(s): З.Цолмон, A.Chinbat
"Word-level Morpheme segmentation using Transformer neural network", Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, United States, 2022-7-14, vol. 19, pp. 139-143

Abstract

This paper presents the submission of team NUM DI to the SIGMORPHON 2022 Task on Morpheme Segmentation Part 1, word-level morpheme segmentation. We explore the transformer neural network approach to the shared task. We develop monolingual models for word-level morpheme segmentation and focus on improving the model by using various training strategies to improve accuracy and generalization across languages.

Mongolian sign language translator using a deep learning method 2022

Author(s): З.Цолмон
"Гүн сургалтын арга ашигласан Монгол дохионы хэлний хөрвүүлэгч" [Mongolian sign language translator using a deep learning method] Mongolian Journal of Engineering and Applied Science, vol. 4, pp. 1-5, 2022-5-4

https://journal.num.edu.mn/EAS

Abstract

In this work we built the first translator for Mongolian Sign Language, using a deep learning method based on human body keypoints. In computer vision, training deep learning methods on this kind of problem requires large amounts of data. There are currently more than 300 sign languages in the world; in other words, each country has its own sign language. For this work we created and used the first Mongolian Sign Language dataset. As a first step, we prepared 869 high-quality sign language videos expressing 10 sentences and trained a deep learning model on a total of 1662 body keypoints taken from the face, chest, and right and left hands. The trained model recognizes and translates the sign language correctly with 96% accuracy.

Generating cognates with a seq2seq model based on character sequences 2021

Author(s): З.Цолмон, Г.Амарсанаа, Б.Хуягбаатар
"Гарал нэг үгийг үсгийн дараалалд тулгуурласан seq2seq загвараар үүсгэх нь" [Generating cognates with a seq2seq model based on character sequences] MONGOLIAN JOURNAL OF ENGINEERING AND APPLIED SCIENCES, vol. 2, pp. 1-6, 2021-5-14

http://seas.num.edu.mn/article/232

Abstract

Identifying cognates, that is, words across languages that are similar in spelling and pronunciation and share the same meaning, makes it possible to create new language resources for computational linguistics tasks. This work aimed to develop a method for automatically generating cognates based on the character sequence of a word. We built seq2seq deep learning models that generate cognates for five language pairs, covering both similar and different alphabets. When the trained models were evaluated by the character difference of the generated words, they generated cognates correctly with an average score of 0.73.
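The abstract does not spell out how the character difference of a generated word is measured. One common choice, shown here purely as an assumption, is a similarity derived from the Levenshtein edit distance and normalized by the longer word's length, so that 1.0 means an exact match. The word pairs are toy examples, not data from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def char_similarity(generated, reference):
    """1 - normalized edit distance (assumed metric, not confirmed by the paper)."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(generated, reference) / longest


# Toy pairs (hypothetical): generated word vs. reference cognate.
pairs = [("kitten", "sitting"), ("nacht", "nacht")]
for generated, reference in pairs:
    print(generated, reference, round(char_similarity(generated, reference), 3))
```

Averaging such a score over a test set gives a single number between 0 and 1, which is one plausible reading of the 0.73 reported above.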

PREDICTION MODEL OF PHOTOVOLTAIC POWER GENERATION FROM WEATHER DATA USING MACHINE LEARNING 2018

Author(s): З.Цолмон, Д.Баясгалан
"PREDICTION MODEL OF PHOTOVOLTAIC POWER GENERATION FROM WEATHER DATA USING MACHINE LEARNING", Asia-Pacific Forum on Renewable Energy (AFORE), 2018-8-22, vol. 8, pp. 72-73

Abstract

Renewable energy sources, including solar and wind, depend on weather and site-specific conditions. In this research work, we develop a site-specific model for predicting energy production from a photovoltaic (PV) system using machine learning based on weather data. The weather and production data used in this work correspond to daily averaged weather and power measurements collected from 2014. We compare two regression techniques, Ridge and Random Forest. We evaluate the accuracy of each model using a test dataset. Our results show that the Random Forest regression model achieves the highest accuracy, 0.99, while using fewer features.
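As a small illustration of one of the two techniques compared above, the sketch below fits a ridge regression with a single feature using its closed-form solution. It is a simplification under assumptions: the paper uses multiple weather features and its own data, whereas the numbers here are invented, the feature is imagined as a daily irradiance value, and the penalty strength `alpha` is arbitrary.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for y = w * x + b with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = s_xy / (s_xx + alpha)  # alpha shrinks the slope toward zero
    b = y_mean - w * x_mean
    return w, b


# Toy data (hypothetical): daily irradiance vs. daily PV energy, exactly y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = fit_ridge_1d(xs, ys, alpha=0.0)   # ordinary least squares
w_ridge, b_ridge = fit_ridge_1d(xs, ys, alpha=10.0)  # penalized fit
print(w_plain, b_plain, w_ridge, b_ridge)
```

With `alpha=0` the fit reduces to ordinary least squares; increasing `alpha` trades some fit on the training data for a smaller slope, which is the regularization that distinguishes Ridge from plain linear regression.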

A system for building the Local Knowledge Core through collaborative crowdsourcing 2018

Author(s): З.Цолмон, B.Erdenebileg, Г.Амарсанаа
"Нутгийн Мэдлэгийн Цөмийг хамтын ажиллагаат олны хүчээр үүсгэх систем" [A system for building the Local Knowledge Core through collaborative crowdsourcing], Mongolian Information Technology scientific conference (Монголын Мэдээллийн Технологи), 2018-5-2, vol. 2018, pp. 23-28

Abstract

This work presents an open-source system that implements a methodology for building a lexical-semantic electronic resource, the Local Knowledge Core (LKC; Нутгийн Мэдлэгийн Цөм, НМЦ), through collaborative crowdsourcing. The methodology aims to organize contributors in localizing a set of semantically interrelated concepts, expressed in many languages, into a given language, and to combine their contributions effectively into a new lexical-semantic resource. We designed and implemented a three-stage human intelligence task for localizing a concept, as well as an algorithm that generates tasks for contributors optimally.

Traffic Flow Analysis on Public Transport Access Data 2016

Author(s): З.Цолмон, Г.Амарсанаа, Ж.Пүрэв
"Traffic Flow Analysis on Public Transport Access Data", FITAT, 2016-4-3, vol. 9, pp. 49-52

Abstract

Efficient analysis of traffic flow data plays an important role in achieving better transportation services. The aim of this work is to find passengers' travel patterns from incomplete transport access data. Our proposed big data analytical model, which predicts the endpoints of regular trips, gives a significantly improved representation of live traffic behavior. We investigated nearly 38.3k patterns in three months of data covering 35M recorded boarding actions.
