A Study on Interpretable Machine Learning and Applications

Keywords: Interpretable Machine Learning, Explainable Artificial Intelligence, K-means-based Clustering, Classification, Dempster–Shafer Theory.
Copyright © 2021 by Nguyen Thanh Phu


Nguyen Thanh Phu 1820009
HUYNH LAB
Graduate School of Advanced Science and Technology Japan Advanced Institute of Science and Technology
Knowledge Science
March, 2021
Research Content
Machine learning and data mining techniques have developed rapidly in recent years. In tasks such as classification, machine learning techniques have been shown to equal and even surpass human performance. However, high-performance models are usually complex and opaque and have low interpretability, which makes it difficult to explain the underlying behaviors that lead to their final outcomes. In many domains, such as medicine and healthcare, interpretability is one of the most important factors when considering the adoption of such models. In our research, we aim to develop transparent machine learning models that not only provide users with knowledge about the underlying data but also achieve performance competitive with other commonly used techniques.
Specifically, in the field of unsupervised learning, clustering is a fundamental task that has been used in many scientific fields. Clustering partitions data into groups such that objects in the same cluster are similar to one another and dissimilar to objects in other clusters. K-means is a popular interpretable method for the clustering task. However, it tends to underfit the data when a simple dissimilarity measure is used, and the dissimilarity measure is a key part of how k-means forms clusters. In our work, we proposed a new k-means-based clustering method with a novel dissimilarity measure that better fits the underlying data. The effectiveness of the proposed clustering algorithm is demonstrated by a comparative study against popular clustering methods for categorical data.
In the field of supervised learning, we proposed a two-stage binary classification system named GSIC that is applicable to healthcare (or general) data. GSIC offers a high level of interpretability while achieving results comparable to commonly used classification techniques. The motivation behind the proposed system is the lack of effective classification methods that can handle data generated by various distributions (such as healthcare or banking data) while balancing performance and interpretability. An experimental evaluation with a use case on sepsis patients staying in the ICU has shown the merits of our proposed classification system.
On the other hand, we recognized the limitations of our proposed classification system when dealing with uncertainty. Specifically, real data with high uncertainty and ambiguity is challenging for the classification task. E-KNN is a popular evidence theory-based classification method developed for handling uncertain data. However, as a distance-based technique, it suffers from the problem of high dimensionality as well as from mixed-distribution data in which close data points originate from different classes. Based on that motivation, we enhanced our proposed classification system GSIC with the capability of handling the uncertainty existing in the underlying data. A classification experiment conducted on various real data sets against popular classifiers has shown that the proposed technique achieves results competitive with state-of-the-art methods.
Research Purpose
Clustering is a common method that is widely used in a variety of fields. Clustering partitions data into groups such that objects in the same cluster are similar to one another and dissimilar to objects in other clusters [Berkhin, 2006]. K-means [MacQueen, 1967] is the most well-known and widely used clustering method. However, one inherent limitation of this approach is its data type constraint: k-means technically works only with numerical data. Over the last decade or so, several attempts have been made to remove this limitation and make k-means applicable to categorical data. In particular, several k-means-like methods for categorical data have been proposed, such as k-modes [Huang, 1998], k-representatives [San et al., 2004], k-centers [Chen et al., 2013] and a k-means-like clustering algorithm [Nguyen et al., 2016]. Although these algorithms cluster in a fashion similar to k-means, they differ in how they define the cluster mean or the dissimilarity measure for categorical data.
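As a rough illustration of how k-means-like methods adapt to categorical data, the following minimal sketch follows the k-modes idea: the cluster "mean" becomes the per-attribute mode, and dissimilarity is a count of mismatched attributes. The function names, the toy records, and the fixed initial centers are assumptions made for illustration; this is not the clustering method proposed in this thesis.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Per-attribute most frequent category (the k-modes analogue of a mean)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, centers, max_iter=10):
    """Alternate assignment and mode update until the centers stop changing."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record goes to the least dissimilar center.
        labels = [
            min(range(len(centers)),
                key=lambda j: matching_dissimilarity(r, centers[j]))
            for r in records
        ]
        # Update step: recompute each center as the mode of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_centers.append(cluster_mode(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy records with three categorical attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
labels, centers = k_modes(records, centers=[records[0], records[2]])
print(labels, centers)
```

Because every mismatch counts the same, this simple measure cannot express that some categories are closer to each other than others, which is the kind of limitation the dissimilarity measures discussed here try to address.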
Furthermore, measures that quantify the dissimilarity (or similarity) between categorical values are still not well understood, because no coherent metric between categorical values is available thus far. Several methods have been proposed for encoding categorical data as numerical values, such as dummy coding (or indicator coding) [Cohen, 1983]. These methods use binary values to indicate whether a categorical value is absent or present in a data record. However, by treating each category as an independent variable in that way, many important features and characteristics of categorical data, such as the distribution of categories or their relationships, may not be taken into account. In particular, most previous works have neglected the semantic information that can be inferred from relationships among categories. In this research, we propose a new clustering algorithm that integrates those kinds of information into the clustering process for categorical data. Specifically, the new categorical clustering algorithm incorporates the semantic relationships between categories into the dissimilarity measure. Finally, an extensive experimental evaluation on benchmark data sets from the UCI Machine Learning Repository demonstrates the effectiveness of our proposed algorithm relative to existing methods in terms of clustering quality.
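The point about dummy coding can be made concrete with a small sketch. Under indicator coding, every pair of distinct categories ends up at the same distance, so any ordering or semantic closeness between categories is lost. The attribute values below are hypothetical and chosen only for illustration.

```python
def dummy_code(value, categories):
    """Indicator (one-hot) vector for a single categorical value."""
    return [1 if value == c else 0 for c in categories]

def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical attribute whose categories have an obvious semantic ordering.
categories = ["mild", "moderate", "severe"]
coded = {c: dummy_code(c, categories) for c in categories}

d_mild_moderate = squared_euclidean(coded["mild"], coded["moderate"])
d_mild_severe = squared_euclidean(coded["mild"], coded["severe"])

# Both distances are identical: the encoding cannot tell that "mild" is
# closer in meaning to "moderate" than to "severe".
print(d_mild_moderate, d_mild_severe)
```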
For the task of classification, applying deep learning techniques can bring higher accuracy when dealing with big and heterogeneous data. However, such high accuracy comes with high complexity and opaqueness in the models [Johansson et al., 2011]. This makes the models difficult to interpret, even though interpretability is one of the important and required properties when implementing them within a decision support system, especially in medicine, healthcare and other domains that require transparency for high-stakes decisions [Rudin, 2018]. Recently, explainable artificial intelligence (XAI), and the related notion of interpretable ML, has grown in popularity. XAI provides transparency for all or parts of a system and explainability for the decisions it makes. According to [Gilpin et al., 2018], such explanations are important to ensure algorithmic fairness, identify potential bias or problems in the training data, and ensure that the algorithms perform as expected.
Due to the need for high-performance interpretable ML models, especially for medicine and healthcare applications, we propose a binary classification system named GSIC (GSOM-based Interpretable Classifying System) that is based on a systematic combination of unsupervised and supervised ML techniques. In the proposed system, GSOM (the Growing Self-Organizing Map) [Alahakoon et al., 2000] plays a key role in overcoming the curse of dimensionality and in improving efficiency and interpretability through analysis of its generated mapping results. GSOM is selected because it is a popular dimensionality reduction and visualization method that can dynamically learn new data representations and reveal salient relations between objects in the underlying data. In order to evaluate the performance of our proposed system, an experiment on the classification task is conducted. Furthermore, a use case on data of sepsis patients in the Intensive Care Unit (ICU) is presented in order to demonstrate the merit of our proposed system.
In addition, we recognized that the proposed classification system GSIC has limitations when dealing with uncertain data, which can degrade its performance. Specifically, uncertainty can exist in data due to a lack of information (which in some cases causes sparsity in the feature dimensions) or due to overlapping (mixed-distribution) data. As the problem of uncertainty cannot be solved by the traditional probabilistic framework [Jousselme et al., 2003], evidence theory [Shafer, 1976] is one popular solution for handling it. In the field of supervised learning, several approaches adopt evidence theory to address the uncertainty problem. One of the notable methods is EKNN [Denoeux, 1995], which classifies a new data point based on evidence from the classes of its neighbors, discounted according to the distances between them. Despite its robustness in dealing with uncertain information, EKNN suffers from several problems, such as the computational complexity that, as with any KNN-based method, comes from computing distances in the sample space. Several methods have been proposed to remedy this limitation by applying feature selection or dimensionality reduction techniques to shrink the feature space, such as ConvNet-BF [Tong et al., 2019] or REK-NN [Su et al., 2020].
Another inherent limitation of EKNN lies in its mechanism of selecting a fixed number of nearby neighbors to induce evidence for classification, which can lead to misclassification due to a lack of information or to close data points originating from different classes. Also, the assignment of hard-to-classify data to a single ignorance group is debatable. To reduce those limitations, instead of referring to nearby neighbors, several methods compare new data points with prototypes produced during the training process, such as ProDS [Denoeux et al., 2000] and CCR [Liu et al., 2014]. Other methods, such as BK-NN [Liu et al., 2013], assign hard-to-classify data to a group that combines several classes, named a meta-class, in addition to the original ignorance class. However, evidence based on the classes of prototypes has been criticized as unreasonable because prototypes lack semantic representations. Moreover, the classes of new data points should be specified with a degree of certainty rather than merely assigned to some common groups of classes.
In our work, we attempt to remedy the above-mentioned problems by proposing a new classification method that induces evidence about the classes of new data objects from groups of data that belong to various distributions. Specifically, assuming that a data set contains instances generated by several different distributions, the data generated by each distribution can be represented in the form of heterogeneous clusters. Each distribution has different rules for characterizing the classes of its generated data. For each cluster, we capture the behavior of the data distribution by using decision trees on the whole set of data belonging to that cluster. Results from those decision trees can be considered as evidence for determining the classes of new data points. To make a final decision about the predicted class, Dempster’s combination rule [Shafer, 1976] is used to fuse the evidence collected in the previous steps. Finally, a classification experiment conducted on various real data sets against popular classifiers has shown that the proposed technique achieves results comparable to state-of-the-art methods.
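The fusion step can be sketched with a small implementation of Dempster's combination rule for a binary frame of discernment. The mass values below are invented for illustration, and how the thesis actually derives masses from its decision trees is not specified here; the names `m_tree_a` and `m_tree_b` are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions given as dicts: frozenset of classes -> mass."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two focal sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

POS, NEG = frozenset({"pos"}), frozenset({"neg"})
BOTH = POS | NEG  # mass on the whole frame expresses ignorance

# Hypothetical evidence from two sources (for example, two decision trees).
m_tree_a = {POS: 0.6, BOTH: 0.4}
m_tree_b = {POS: 0.5, NEG: 0.3, BOTH: 0.2}

fused = dempster_combine(m_tree_a, m_tree_b)
for focal_set, mass in fused.items():
    print(sorted(focal_set), round(mass, 4))
```

Keeping some mass on the whole frame lets a source express partial ignorance rather than forcing it to commit fully to one class, which is the main difference from fusing plain probabilities.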
Research Accomplishment
• T.-P. Nguyen and V.-N. Huynh, “A New Classification Technique Based on The Combination of Inner Evidence” in IUKM 2020: Integrated Uncertainty in Knowledge Modelling, 2020, pp. 174-186.
• T.-P. Nguyen, S. Nguyen, D. Alahakoon and V.-N. Huynh, “GSIC: A New Interpretable System for Knowledge Exploration and Classification” in IEEE Access, vol. 8, pp. 108544-108554, 2020, doi: 10.1109/ACCESS.2020.3001428.
• T.-P. Nguyen and V.-N. Huynh, “A New Interpretable System for Knowledge Exploration and Classification: ICU Sepsis Data Case Study” in AHFE 2020: The Human Side of Service Engineering, July 2020.
• T.-P. Nguyen, D.-T. Dinh, and V.-N. Huynh, “A New Context-Based Clustering Framework for Categorical Data” in PRICAI 2018: Trends in Artificial Intelligence, 2018, pp. 697–709.
A Study on Facility Location-allocation Models for Humanitarian Relief Logistics
PRANEETPHOLKRANG PANCHALEE 1820028
Supervisor: Professor Huynh Van Nam
Graduate School of Advanced Science and Technology Japan Advanced Institute of Science and Technology
[Knowledge Science]
March 2021
Part 1: Research Content
Nowadays, disasters occur more frequently and have severe impacts on humankind and economic systems across the world. When disaster strikes, relief agencies typically dispatch relief supplies to help the victims and move the victims from the affected areas to safe shelters. Decision-making on shelter location-allocation is arguably the most critical part of humanitarian relief logistics because it affects victims’ security and influences the success of the disaster management strategy. Without an appropriate approach for determining shelter location-allocation, decision-makers would make ad hoc decisions that result in high cost, slow response, and failure to rescue the victims. When proposing location-allocation models in the context of humanitarian logistics, the monetary criterion cannot be ignored because it helps decision-makers prepare a sufficient budget for disaster response. Likewise, considering monetary and non-monetary criteria simultaneously helps to ensure that the victims are well taken care of under the optimal budget. Beyond model formulation, the proposed models should be solved by appropriate approaches to generate optimal solutions. Victims and decision-makers would benefit if the proposed models could facilitate prompt decision-making on location-allocation in response to disasters. The aim of this research is to propose models and a novel solution approach to assist decision-makers in determining shelter location-allocation. Both monetary and non-monetary criteria are taken into account in the proposed models. The applicability of the proposed models is validated through a real-world case study of shelter location-allocation in response to flooding in Surat Thani province, Thailand. The results generated by the proposed models are compared with the current shelter allocation plan determined by the government sector.
Part 2: Research Purpose
This study proposes models to determine shelter location-allocation in response to disaster. In addition to the models, a novel approach for dealing with location-allocation is proposed. Four models are formulated to identify proper locations to use as shelters. The first model seeks to determine shelter location-allocation with total cost minimization. The proposed mathematical model is solved by a Genetic Algorithm. The second model considers both monetary and non-monetary criteria for determining shelter location-allocation. The objectives of the model are to simultaneously minimize total cost, total evacuation time, and the number of open shelters. The proposed mathematical model is solved by the Epsilon Constraint method and Goal Programming, which are a posteriori and a priori methods, respectively. The third model seeks to concurrently minimize total cost and total evacuation time. The proposed model is solved by a novel approach that integrates the Epsilon Constraint method and an Artificial Neural Network to facilitate fast decision-making on shelter location-allocation. To the best of our knowledge, no existing works have combined these methods to address location problems, especially in the field of humanitarian relief logistics. The fourth model involves multi-echelon relief facility location-allocation. The first echelon determines appropriate shelter location-allocation to minimize total cost and total evacuation time, while the second echelon determines distribution center location-allocation to minimize distribution cost. The proposed model is solved by the Epsilon Constraint method. The applicability of the proposed models and the proposed solution approach is validated through the case study of shelter location-allocation in response to flooding in Surat Thani, Thailand. The results generated by each model are compared with the current shelter location-allocation plan determined by the government sector. The comparison indicates that determining shelter location-allocation based on the proposed models mostly produces lower total cost than the current plan, with appropriate time frames for evacuating the victims. It is plausible to use the proposed models and the proposed solution approach to improve shelter location-allocation in response to disasters for the benefit of victims and decision-makers.
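To show the general idea behind the Epsilon Constraint method on a bi-objective location problem, the sketch below keeps total cost as the objective and turns total evacuation time into a constraint whose bound is swept over several values. It is a toy under stated assumptions: the data are invented, there are no capacity constraints, each community is simply allocated to its nearest open shelter, and the search is brute-force enumeration rather than the mathematical programming formulation used in the thesis.

```python
from itertools import combinations

# Hypothetical toy data: opening cost per candidate shelter, and travel time
# from each community (row) to each candidate shelter (column).
open_cost = [40, 25, 30]
travel_time = [
    [2, 9, 7],
    [8, 3, 6],
    [9, 8, 2],
]

def evaluate(open_set):
    """Total opening cost and total evacuation time for a set of open shelters."""
    cost = sum(open_cost[j] for j in open_set)
    # Each community is allocated to its nearest open shelter.
    time = sum(min(row[j] for j in open_set) for row in travel_time)
    return cost, time

def epsilon_constraint(eps_values):
    """For each bound eps, minimize cost subject to total time <= eps."""
    plans = [
        s
        for r in range(1, len(open_cost) + 1)
        for s in combinations(range(len(open_cost)), r)
    ]
    front = []
    for eps in eps_values:
        feasible = [(evaluate(s), s) for s in plans if evaluate(s)[1] <= eps]
        if feasible:
            (cost, time), chosen = min(feasible)
            front.append((eps, chosen, cost, time))
    return front

front = epsilon_constraint(eps_values=[20, 14, 10, 7])
for eps, chosen, cost, time in front:
    print(f"eps={eps}: open shelters {chosen}, cost={cost}, time={time}")
```

Tightening the time bound forces more shelters to open and raises cost, which is the trade-off a decision-maker inspects after the fact. The grid of bounds matters: a trade-off point that falls between two consecutive bounds can be missed.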
Part 3: Research Accomplishment
The theoretical and practical contributions of this study to the location-allocation problem in humanitarian logistics are demonstrated by the following publications in international journals and international conferences:
International Journal:
• Praneetpholkrang, P., Youji, K., Kanjanawattana, S., Huynh, V.N. Two-Echelon Relief Facility Location-Allocation Model for Humanitarian Supply Chain. Status: Plan to submit to International Journal of Logistics Systems and Management.
International Conference:
• Praneetpholkrang, P., Huynh, V. N. (2020). Shelter Site Selection and Allocation Model for Efficient Response to Humanitarian Relief Logistics. The 7th International Conference on Dynamics in Logistics, 12-14 February 2020, Bremen, Germany, in Dynamics in Logistics, Lecture Notes in Logistics (pp. 309–318). Springer. (peer review).
• Praneetpholkrang, P., Huynh, V. N., Kanjanawattana, S. Bi-Objective Optimization Model for Determining Shelter Location-Allocation in Humanitarian Relief Logistics. The 10th International Conference on Operations Research and Enterprise Systems, Online Streaming, 2-4 February 2021. (peer review).
Keywords: Facility Location-allocation, Relief Supply Chain, Disaster Management, Multi-objective Optimization, Epsilon Constraint Method, Genetic Algorithm, Goal Programming, Artificial Neural Network
Classroom: Association-based Activities, Biometric Data Analysis and
Supportive Lighting Environment Exploration
Student Number: 1820031 Name: LIU TING
I. Research Content
Cultivating students’ creativity has become an important part of teaching foreign languages at the university level. Foreign language teachers need to think about curriculum design and teaching approaches that can spark creativity in their students. This study proposed a creative pedagogy for the foreign language classroom. Activities that involve association and mind mapping in a student-centered mode can encourage students to think creatively. This study implemented association-based activities with mind mapping to encourage students to exercise creative, divergent thinking in their learning process. The setting for the study was a school of Japanese studies at a university in Dalian, China. At this university, the students generally follow a traditional curriculum, which is not concerned with improving creativity. Our fundamental aim was to explore whether a creative pedagogy could effectively promote the development of creativity in terms of students’ creative thinking skills, language proficiency, and learning motivation. The experimental group received an 8-week intervention that combined the regular curriculum with association-based activities with mind mapping. The control group received the regular curriculum. It was assumed that association-based activities with mind mapping positively impact the cultivation of creativity.
At present, few studies have investigated to what extent association-based activities influence foreign language learning among university students in terms of creativity outcomes. To clarify the effect of the association-based activities on creativity, we employed an experimental methodology involving a pre-test/post-test repeated measures design. All students were tested on creativity performance using three assessment instruments: a creative thinking test, which evaluated creative thinking skills and rated performance on the three factors of fluency, flexibility, and originality; a Japanese-language proficiency test, which assessed Japanese language proficiency in terms of vocabulary, reading comprehension, and writing; and a motivation questionnaire, covering choice, executive, and increased motivation, which assessed students’ learning motivation.
In addition to traditional tests for measuring students’ creativity outcomes, an EEG investigation was conducted to test students’ divergent thinking skills, and an eye-tracking analysis was conducted to assess students’ Japanese language proficiency; both provided biometric data to further verify the effectiveness of the creative pedagogy. In recent years, with the rise and development of cognitive neuroscience, electroencephalography (EEG) and functional brain imaging have become powerful research tools for directly observing the activity of the brain while it processes complex information, providing a more direct method for exploring the brain mechanisms of creative thinking, especially divergent thinking. In this study, the brain wave images and data of the two groups of students were compared and analyzed during the divergent thinking tasks. It is expected that the findings will deepen understanding and promote the study of the effectiveness of creative thinking skills. In addition, this study used eye tracking sensors to explore the effects of the creative pedagogy on reading ability, which is considered to be a comprehensive reflection of foreign language proficiency. Eye tracking sensors were used to record eye movement indicators in real time, and these indicators were then mapped to the reading process in order to analyze reading ability, which provides a quantitative assessment and data evidence of the creative pedagogy’s effectiveness on students’ language proficiency.
Moreover, besides teaching methods, providing suitable classroom learning environments may further promote the cultivation of creativity. This study explored supportive classroom lighting environments that can improve students’ participation in association-based activities and thereby improve their creativity. No prior literature was found that explores the relationship between association-based activities and the classroom lighting environment from the perspective of university students. The findings in this study can be used as guidelines for designing psychology-oriented classroom environments that can support the cultivation of students’ creativity.
In summary, the findings in this study suggest that association-based activities could be taken into consideration when cultivating creativity in foreign language teaching at the university level, and could be carried out in a supportive classroom lighting environment. Data and insights drawn from the findings in this study establish a knowledge framework of creative foreign language teaching methods and evaluation, which will contribute to knowledge science by setting future directions for creative pedagogy in the field of foreign language teaching and learning in undergraduate education.
II. Research Purpose
The overall purpose of this study is to construct a new type of foreign language classroom teaching method and learning environment that promotes the development of foreign language learners’ creativity, and to investigate to what extent creativity can be cultivated. By applying a teaching method designed around association-based activities with mind mapping and by providing a supportive classroom lighting environment, the study explores feasibility based on the analysis and evaluation of biometric data and suggests practical implications for creative pedagogy design in the foreign language classroom.
Specific research objectives are as follows.
(1) Construction of a creative pedagogy of association-based activities with mind mapping that is centered on the development of creativity.
(2) Clarification of the evaluation criteria for the association-based activities. Evaluation covers three aspects: creative thinking skills, foreign language proficiency, and learning motivation.
· Presenting traditional measurement methods for investigating the feasibility of the association-based activities, including a creative thinking test, a foreign language proficiency test, and a learning motivation questionnaire.
· Applying biometric data analysis, namely EEG investigation for creative thinking skills and eye-tracking detection for foreign language proficiency, to present more accurate numerical results.
(3) Exploration of supportive lighting environments for students participating in association-based activities to further promote their creativity.
This study takes “creativity is the inherent endowment of each student” as its starting point, and therefore does not regard creativity training as an additional teaching task in the process of foreign language teaching, but rather holds that it can promote learning motivation and improve foreign language expertise in the daily classroom. It is hoped that the teaching methods, the classroom learning environment, and the evaluation pattern presented in this study can be extended to other foreign language education fields in colleges and universities and promote the reform of foreign language teaching.
III. Research Accomplishment
Papers published in journals
(1) Ting Liu, Takaya Yuizono; Mind Mapping Training’s Effects on Reading Ability: Detection Based on Eye Tracking Sensors; Sensors; 20, 4422, 15 pages, 2020. (Doi:10.3390/s20164422; Indexed by Scopus, SCI; Impact factor: 3.275; SJR Q1)
(2) Ting Liu, Takaya Yuizono, Zhisheng Wang, and Haiwen Gao; The Influence of Classroom Illumination Environment on the Efficiency of Foreign Language Learning; Applied Sciences; 10, 1901, 11 pages, 2020. (Doi:10.3390/app10061901; Indexed by Scopus, SCI; Impact factor: 2.474; SJR Q2)
Conference proceedings
a. Ting Liu, Takaya Yuizono; Developing Innovation Skills in Second Language Education-Cultivation of Creativity and Intercultural Communicative Competence-; The 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS-2018); 6 pages; November, 15-17, 2018, Pattaya, Thailand.
b. Ting Liu, Takaya Yuizono; Eye Movement Characteristics in Reading Foreign Language Text Based on Mind Mapping Training; The 5th International Conference on Education (IICE Hawaii-2020) ; 1 page; January, 10-12, 2020, Honolulu, Hawaii, USA.
(2) Domestic conference proceeding
Ting Liu, Takaya Yuizono ; Proposal of Curriculum for Foreign Language Education to Cultivate Creativity; The 40th Research Conferences of Japan Creativity Society; accepted; 4 pages; September, 11-13, 2018, Osaka, Japan.
Paper under review
Ting Liu, Takaya Yuizono; Association-based Activities Effects on University Students’ Creativity in Foreign Language Classrooms; 2020, 09; Journal of Japan Creativity Society; 18 pages.
Oriented Development of Enterprise Message Management:
Study on Visual Attention of Email Topic Inference (AttLDA for
Email) and Integration of ECS and ERP (SuccERP)

Intended Degree: Knowledge Science, Doctoral Degree
Laboratory: Nagai Lab
Student Number: 1820034
Research Content & Research Purpose
Our dissertation focuses on several topics for improving collaboration
and communication in an enterprise. Considering two kinds of collaboration,
unstructured collaboration (information collaboration) and structured collaboration
(process collaboration), we primarily focus on two representative tools: email and
the Enterprise Resource Planning (ERP) system.
In the enterprise context, most current research in the ERP domain struggles to
achieve specific and practical goals with its proposed theoretical findings. To
enable managers to get a fuller picture of all the messages generated by an ERP
system through the Enterprise Collaboration System (ECS), and to improve collaboration
and communication, we propose a complete method for developing an artifact, SuccERP,
based on the Design Science approach, to carry out the integration. By exploring multiple
ERP systems, we summarize our tasks into three aspects before implementing the
integrations: authentication, data initialization, and implementation of specific procedures;
we also explain how data processing and integration between the ERP and the ECS are carried out.
From our perspective, the contribution of the proposed SuccERP has two parts:
1) We present a complete demonstration of how to obtain the architecture
and database schema of an existing ERP system and consider the internal and external
hosting issues. 2) Based on a series of literature reviews, we implement the
integration according to the critical success factors and existing issues presented in
previous studies. In other words, we try to fill the gap in communication and
collaboration capabilities by enhancing the integration of the ERP and ECS systems. In short,
we accomplish data processing and data sharing from an ERP system to external
resources. In addition, based on our results, follow-up research can explore
implementations with other external resources to address different issues. Given the
increasing demand for custom ERP, it is reasonable to provide detailed
research as a guideline for enterprises that plan to upgrade and enhance their ERP
systems.
Next, information collaboration is defined as employees using IT tools to
communicate and to request assistance (answers); email is the most standard
documentation tool for such communication. Although existing studies use topic
models to help users classify emails, they disregard the fact that a human, unlike
a machine, cannot attend to every word in an email when determining its topic
distribution. The Latent Dirichlet Allocation (LDA) model forms the basis for
inferring topics; our work aims to discover how the visual attention given to each
word influences topic inference, and estimates the attention to a word from its
location features.
By reviewing visual-spatial research and state-of-the-art visual attention
models, we select Bayesian models to estimate attention and propose a novel model,
the Attention orientation Latent Dirichlet Allocation model (AttLDA). In AttLDA,
each email is regarded as encoded in a two-dimensional space. We take the line
length (the characters per line in an email) and the window size (the number of
lines in an email) into account to derive the optimal display size as a visual
space and assign a location to each word in the email. Besides location, attention
estimation also considers Term Frequency-Inverse Document Frequency (TF-IDF) and
the topics inferred at each iteration. Our motivation is that readers cannot fully
capture all the hidden topics behind each word in an email, especially the context
in a forwarded message. Unlike previous research, our result shows each email's
topic distribution and also includes the distribution of attention over the related
words in each topic. More precisely, visual attention can be regarded as the
significance of an email's topic distribution. In our experiment, we use the public
Enron email corpus as the dataset and apply the perplexity metric to measure the
performance of AttLDA. AttLDA outperforms previous research on the perplexity
evaluation.
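As a rough illustration of two ingredients named above, the sketch below lays an
email out on a fixed-width display to give each word a (row, column) location, and
computes the standard corpus perplexity from per-document log-likelihoods. It is
not the AttLDA attention estimator itself. The function names word_locations and
perplexity, the default line_length of 72, and the use of Python's textwrap for
wrapping are all assumptions made for this example.

```python
import math
import textwrap


def word_locations(email_text, line_length=72):
    """Wrap an email to `line_length` characters per line (assumed value).

    Returns (locations, n_rows), where locations is a list of
    (word, row, col) triples with 0-based row and column, and n_rows is
    the number of display lines (the "window size" in the text above).
    """
    locations = []
    row = 0
    for paragraph in email_text.splitlines():
        # textwrap.wrap returns [] for a blank line; keep it as one empty row.
        wrapped = textwrap.wrap(paragraph, width=line_length) or [""]
        for line in wrapped:
            col = 0
            for word in line.split():
                # Search from the end of the previous word so repeated
                # words on one line get distinct columns.
                col = line.index(word, col)
                locations.append((word, row, col))
                col += len(word)
            row += 1
    return locations, row


def perplexity(doc_log_likelihoods, doc_lengths):
    """Standard perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_lengths: number of tokens in each of those documents.
    """
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("perplexity is undefined for an empty corpus")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


if __name__ == "__main__":
    locs, n_rows = word_locations("Please review the attached ERP report", 20)
    for word, r, c in locs:
        print(f"{word!r} at row {r}, col {c}")
    print("window size:", n_rows)
```

The locations could then feed whatever location-based attention weighting a model
uses; lower perplexity on held-out emails indicates a better fit.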
Advanced technology has made the communication distance between people shorter
than ever before and makes messages accumulate faster and faster. People can
quickly lose control of their messages through negligence. Our research proposes
SuccERP, which builds a platform for managing ERP and ECS messages under definite
guidelines to maintain communication efficiency. We also propose AttLDA to extract
email topics effectively and thereby improve email management performance; its
output can serve as a feature for further tasks.
Research Accomplishment
1. Lin, Y., Nagai, Y., Chiang, T., & Chiang, H. SuccERP: The Design Science based
integration of ECS and ERP in post-implementation stage. International Journal of
Engineering Business Management, peer-reviewed, ijebm-20-0119. (Second review;
submitted on 25 Sep 2020.)
2. Lin, Y., Nagai, Y., Chiang, T., & Chiang, H. AttLDA: Email Topic Identification
using Latent Dirichlet Allocation integrated with Visual Attention. Information
Processing and Management, peer-reviewed, IPM-D-20-00545. (First review;
submitted on 10 July 2020.)
3. Chiang, H. K., Nagai, Y., & Lin, Y. Y. (2020). Link up Industry 4.0 with the
Enterprise Collaboration System to Help Small and Medium Enterprises.
Mathematical Problems in Engineering, peer-reviewed, vol. 2020, 1-13. (Accepted;
not first author.)
4. Lin, Y., Nagai, Y., Chiang, T., & Chiang, H. (2020, March). Design and Develop
Artifact for Integrating with ERP and ECS Based on Design Science. In
Proceedings of the 2020 3rd International Conference on Information Science
and System (pp. 218-223).







A.
A-1.
Shirasaka, Hajime, Takashi Mikami, Makoto Onizuka, Youji Kohda and Amna Javed, “Structural
Condition of Combinatorial Innovation through Patent-ability AI analysis”, International Journal of
Intellectual Property Management, Inderscience (2021).
A-2.
,2021 2 Vol.74 No.2 .
B.
Shirasaka, Hajime, Youji Kohda, 2017, “Study on Impact of Evaluation for Intellectual Property Value
using Artificial Intelligence on Intellectual Property Management”, 5th International Conference on
Serviceology, Vienna: ICServ2017, 207-210.
B-2.
Shirasaka, Hajime, Takashi Mikami, Youji Kohda, Amna Javed and Yosuke Nara, 2019, “Artificial
Intelligence Examiner in Patent Evaluation”, 1st International Conference on Information and
Knowledge Management, Dhaka: i-IKM2019, 67-68.
B-3.
Shirasaka, Hajime, Youji Kohda, 2020, “New Determination of Inventive Step by Collaboration
between AI and Patent Attorneys”, EPIP 2020 Conference, Madrid (EPIP in Madrid postponed to 8-10
September 2021 due to COVID-19).
C.
C-1.
11 (3): 533-547.
C-2.
No.3 : 18-21
D.
D-1. ,2017, 452017 8
:108-113.
E.
6185209
E-2. "
E-3. "
E-4. "
” 2017-234732 6457058
E-5. "
” 2018-237699 6531302
E-6. "”
2018-069694 6506439
E-7. "
4
E-8. "
” 2018-112348
” 2018-187804
” 2019-002764
E-11. "
” 2019- 002783
E-12. "
“ 2019-002801
" 2019-519032 6550583
E-14. "
” 2019-526021 6555704
E-15. "
” 2020-23711
E-16. "
” 2020-3571
E-17. "
” 2020-3572
E-18. "
” 2020-21848
” 6653833
2020-26761
E-21. 4 1 Allowable 3 1
1
F.
F-1. 2019 3 4 JEITA
F-2. 2019 10 GOOD DESIGN AWARD 2019
F-3. 2020 11 Matching HUB Business Idea & Plan Competition