Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data-scarce contexts. However, data reuse requires dealing with data heterogeneity, which is always present in data coming from different sources. Such heterogeneity appears at different levels, such as the language used by the data, the structure of the information it represents, and the data types and formats adopted by the datasets. Despite the valuable insights gained by reusing data across contexts, dealing with data heterogeneity is still a high price to pay. Additionally, data reuse is hampered by the lack of data distribution infrastructures supporting the production and distribution of high-quality, interoperable data. These issues are amplified in cross-country data reuse, where geographical and cultural differences are more pronounced. In this paper, we propose LiveData, a cross-country data distribution network handling high-quality and diversity-aware data. LiveData is composed of different nodes whose architecture provides components for the generation and distribution of a new type of data, where heterogeneity is transformed into information diversity and considered as a feature, explicitly defined and used to satisfy the data users' purposes. This paper presents the specification of the LiveData network, by defining the architecture and the type of data handled by its nodes. This specification is currently being used to implement a concrete use case for data reuse and integration between the University of Trento (Italy) and the National University of Mongolia.
Smartphones enable understanding human behavior with activity recognition to support people’s daily lives. Prior studies focused on using inertial sensors to detect simple activities (sitting, walking, running, etc.) and were mostly conducted in homogeneous populations within a country. However, people are more sedentary in the post-pandemic world with the prevalence of remote/hybrid work/study settings, making detecting simple activities less meaningful for context-aware applications. Hence, the understanding of (i) how multimodal smartphone sensors and machine learning models could be used to detect complex daily activities that can better inform about people’s daily lives, and (ii) how models generalize to unseen countries, is limited. We analyzed in-the-wild smartphone data and ~216K self-reports from 637 college students in five countries (Italy, Mongolia, UK, Denmark, Paraguay). Then, we defined a 12-class complex daily activity recognition task and evaluated the performance with different approaches. We found that even though the generic multi-country approach provided an AUROC of 0.70, the country-specific approach performed better with AUROC scores in [0.79-0.89]. We believe that research along the lines of diversity awareness is fundamental for advancing human behavior understanding through smartphones and machine learning, for more real-world utility across countries.
Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps and valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the focus has always been on improving the accuracies of models using different sensing modalities and machine learning techniques, with datasets collected in homogeneous populations. In contrast, less attention has been given to studying the performance of mood inference models to assess whether models generalize to new countries. In this study, we collected a mobile sensing dataset with 329K self-reports from 678 participants in eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, UK) to assess the effect of geographical diversity on mood inference models. We define and evaluate country-specific (trained and tested within a country), continent-specific (trained and tested within a continent), country-agnostic (tested on a country not seen in training data), and multi-country (trained and tested with multiple countries) approaches trained on sensor data for two mood inference tasks with population-level (non-personalized) and hybrid (partially personalized) models. We show that partially personalized country-specific models perform the best, yielding area under the receiver operating characteristic curve (AUROC) scores in the range 0.78 to 0.98 for two-class (negative vs. positive valence) and 0.76 to 0.94 for three-class (negative vs. neutral vs. positive valence) inference. Further, with the country-agnostic approach, we show that models do not perform well compared to country-specific settings, even when models are partially personalized. We also show that continent-specific models outperform multi-country models in the case of Europe. Overall, we uncover generalization issues of mood inference models to new countries and how the geographical similarity of countries might impact mood inference.
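The four evaluation regimes named in this abstract differ only in which rows go into training and which into testing. The following is a minimal sketch of that partitioning, not the authors' pipeline: the row schema (a dict with a `country` key), the `continent_of` mapping, the `holdout` helper, and the every-fifth-row split are all assumptions made for illustration. A real study would more likely split by participant than by row.

```python
from collections import defaultdict


def holdout(rows, every=5):
    """Deterministic split: every `every`-th row is held out for testing."""
    train = [r for i, r in enumerate(rows) if i % every != 0]
    test = [r for i, r in enumerate(rows) if i % every == 0]
    return train, test


def evaluation_splits(rows, continent_of, every=5):
    """Build (train, test) pairs for the four regimes named in the abstract.

    rows: list of dicts, each with a "country" key (hypothetical schema).
    continent_of: dict mapping country -> continent (hypothetical helper).
    """
    by_country = defaultdict(list)
    by_continent = defaultdict(list)
    for r in rows:
        by_country[r["country"]].append(r)
        by_continent[continent_of[r["country"]]].append(r)

    splits = {
        "country-specific": {},
        "continent-specific": {},
        "country-agnostic": {},
        # Multi-country: train and test on rows pooled from all countries.
        "multi-country": {"all": holdout(rows, every)},
    }
    for country, c_rows in by_country.items():
        # Country-specific: train and test within one country.
        splits["country-specific"][country] = holdout(c_rows, every)
        # Country-agnostic: the test country never appears in training.
        others = [r for r in rows if r["country"] != country]
        splits["country-agnostic"][country] = (others, c_rows)
    for continent, k_rows in by_continent.items():
        # Continent-specific: train and test within one continent.
        splits["continent-specific"][continent] = holdout(k_rows, every)
    return splits


# Toy rows only; these are not data from the study.
continent_of = {"Italy": "Europe", "Denmark": "Europe", "Mongolia": "Asia"}
rows = [{"country": c, "id": i} for c in continent_of for i in range(10)]
splits = evaluation_splits(rows, continent_of)
train, test = splits["country-agnostic"]["Mongolia"]
print(len(train), len(test))
```

Any classifier and an AUROC metric can then be run on each `(train, test)` pair; the country-agnostic pairs are the ones that probe generalization to an unseen country.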
This paper presents the submission of team NUM DI to the SIGMORPHON 2022 Task on Morpheme Segmentation Part 1, word-level morpheme segmentation. We explore the transformer neural network approach to the shared task. We develop monolingual models for word-level morpheme segmentation and apply various training strategies to improve accuracy and generalization across languages.
In this work, we built the first Mongolian sign language translator using a deep learning approach based on human body keypoints. In computer vision, training deep learning methods for this type of problem requires a large amount of data. There are currently more than 300 sign languages in the world, with each country having its own. Within this work, we created and used the first Mongolian sign language dataset. As a first step, we prepared 869 high-quality sign language videos expressing 10 sentences, and from these we trained a deep learning model using a total of 1662 body keypoints taken from the face, torso, and right and left hands. Our trained model recognizes and translates sign language correctly with 96% accuracy.
Identifying cognates, that is, words across languages that are similar in spelling and pronunciation and share the same meaning, makes it possible to create new language resources for computational linguistics tasks. This work aims to develop a method for automatically generating cognates based on the character sequence of a word. We built a seq2seq deep learning model that generates cognates for five language pairs, covering both similar and different scripts. When evaluated by the character-level difference of the generated words, the trained model generated cognates correctly with an average score of 0.73.
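The abstract does not state which character-difference measure was used. One common choice, assumed here purely for illustration, is a similarity derived from Levenshtein edit distance normalized by the longer string's length. The function names `levenshtein` and `char_similarity` are hypothetical, and the word pair is a toy example rather than data from the paper.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_similarity(predicted, reference):
    """Assumed metric: 1 - edit distance / length of the longer string."""
    if not predicted and not reference:
        return 1.0
    longest = max(len(predicted), len(reference))
    return 1.0 - levenshtein(predicted, reference) / longest


# Toy pair: a generated form scored against a reference cognate.
score = char_similarity("night", "nacht")
print(round(score, 2))
```

Averaging such a score over all generated words gives a single number between 0 and 1, which is the kind of aggregate the reported 0.73 suggests, though the paper's exact definition may differ.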
Renewable energy sources, including solar and wind energy, depend on weather and site-specific conditions. In this work, we develop a site-specific model for predicting energy production from a photovoltaic (PV) system using machine learning on weather data. The weather and production data used in this work correspond to daily averaged weather and power measurements collected from 2014. We compare two regression techniques, Ridge and Random Forest, and evaluate the accuracy of each model on a test dataset. Our results show that the Random Forest regression model achieves the highest accuracy, 0.99, while using fewer features.
This work presents an open-source system that implements a methodology for building a lexical-semantic resource, the Local Knowledge Core (Нутгийн Мэдлэгийн Цөм, НМЦ), through collaborative crowdsourcing. The methodology aims to create a new lexical-semantic resource by organizing contributors to localize a set of semantically interrelated concepts, expressed in multiple languages, into a given language, and by effectively aggregating their contributions. We designed and implemented a three-stage human intelligence task for localizing concepts, as well as an algorithm that optimally generates tasks for contributors.
Efficient analysis of traffic flow data plays an important role in achieving better transportation services. The aim of this work is to discover passengers' travel patterns from incomplete transport access data. Our proposed big data analytical model, which predicts the endpoints of regular travel, gives a significantly improved representation of live traffic behavior. We investigated nearly 38.3k patterns in three months of data comprising 35M recorded boarding actions.