Multidimensional analysis model for a document warehouse that includes textual measures KIM JEONG RAE UOS.DML. 2015.11.27. 1


Page 1

Multidimensional analysis model for a document warehouse that includes textual measures

KIM JEONG RAE

UOS.DML. 2015.11.27.

Page 2

Introduction

Author: Martha Mendoza, Erwin Alegria, Manuel Maca, Carlos Cobos, Elizabeth Leon

Location: Information Technology Research Group (GTI), etc., Colombia

Title: Multidimensional analysis model for a document warehouse that includes textual measures

Document Type: Decision Support Systems 72 (2015) 44-59

Date: February 2015

Page 3

Contents

Abstract

Analysis Model

  Proposed document warehouse model

  Multi-dimensional model

  Textual measures and aggregation function

  OLAP document visualization

Conclusion

  Evaluation results

Page 4

Abstract (1/2)

Motivation: Business systems are increasingly required to handle substantial quantities of unstructured textual information.

Problem: Managing unstructured text data stored in data warehouses.

Approach: A new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy.

The textual measures that associate the topics with the text documents are generated by Probabilistic Latent Semantic Analysis, while the hierarchy is created automatically using a clustering algorithm.

Page 5

Abstract (2/2)

Result: The model gained increasing acceptance with use, and the visualization of the model was also well received by users.

Contribution: This paper proposes a multidimensional model that incorporates textual measures.

The model allows documents to be queried using OLAP operations.

Page 6

Proposed document warehouse model

Four main processes: ① topic hierarchy building, ② probabilistic measures calculation, ③ ETL (Extract-Transform-Load), ④ multidimensional cube building

Page 7

Proposed document warehouse model

Topic Hierarchy Building ①: a process using two algorithms

Cosme (step 1)

Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm)

Page 8

Proposed document warehouse model

Topic Hierarchy Building ①

Modified IGBHSK (Iterative Global-Best Harmony Search K-means algorithm): three levels

Page 9

Proposed document warehouse model

Topic Hierarchy Building ①

IGBHSK algorithm [Ref. #2] for the topic hierarchy

Page 10

Proposed document warehouse model

Probabilistic measures calculation ②

Page 11

Proposed document warehouse model

Probabilistic measures calculation ②

PLSA (Probabilistic Latent Semantic Analysis) algorithm [Ref. #24]

A probabilistic model over a set of documents and the words they contain

P(d|z): the probability of a document within a topic

P(w|z): the probability of a word within a topic

EM (Expectation-Maximization) algorithm [Ref. #6, 17]
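
The EM fitting of PLSA named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the symmetric PLSA formulation, P(d, w) = Σ_z P(z) P(d|z) P(w|z), and the function name `plsa`, its parameters, and the `counts` matrix layout are all hypothetical.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit symmetric PLSA by EM (illustrative sketch).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z, p_d_z, p_w_z) where p_d_z[z][d] = P(d|z) and
    p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_distribution(n):
        values = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(values)
        return [v / total for v in values]

    # Random start: uniform topic prior, random conditional distributions.
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_distribution(n_docs) for _ in range(n_topics)]
    p_w_z = [random_distribution(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_z = [0.0] * n_topics
        exp_d = [[0.0] * n_docs for _ in range(n_topics)]
        exp_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z) P(d|z) P(w|z).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    weight = n * joint[z] / total
                    exp_z[z] += weight
                    exp_d[z][d] += weight
                    exp_w[z][w] += weight
        grand_total = sum(exp_z)
        if grand_total == 0.0:
            break
        # M-step: renormalize the expected counts into probabilities.
        for z in range(n_topics):
            if exp_z[z] > 0.0:
                p_d_z[z] = [x / exp_z[z] for x in exp_d[z]]
                p_w_z[z] = [x / exp_z[z] for x in exp_w[z]]
        p_z = [x / grand_total for x in exp_z]
    return p_z, p_d_z, p_w_z
```

Called as `plsa(counts, 2)` on a small document-by-word count matrix, it returns one P(d|z) row and one P(w|z) row per topic, which are the kinds of values the textual measures store.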

Page 12

Proposed document warehouse model

ETL (Extract-Transform-Load) ③

Page 13

Multi-dimensional model

Relational DB schema

Page 14

Multi-dimensional model

Standard dimensions:

Document dimension: name, document type

Author dimension: name, email

Date dimension: publish date

Location dimension: city, country

Word dimension: all words from the stored document set

Topic dimension: topic hierarchy

M-M relationships:

Author-Group Bridge, Topic-Document-Group Bridge, Topic-Word-Group Bridge

Measures of the fact table and the topic and word dimension bridge tables:

Topics_Probab_TM: an average probability of topics

Documents_TM: probabilities of a document within topics

Word_Probab_TM: probabilities of a word within topics
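
The dimensions, bridge tables, and measures listed above could be laid out as a relational schema roughly as follows. This is only a sketch under assumptions: every table and column name is hypothetical, and the placement of each measure (Topics_Probab_TM in the fact table, Documents_TM and Word_Probab_TM in the bridge tables) is a guess based on the slide's wording, not the authors' actual schema.

```python
import sqlite3

# Hypothetical table and column names; only the dimension, bridge, and
# measure names come from the slide.
SCHEMA = """
CREATE TABLE dim_document (document_id INTEGER PRIMARY KEY, name TEXT, document_type TEXT);
CREATE TABLE dim_author (author_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, publish_date TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_word (word_id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE dim_topic (
    topic_id INTEGER PRIMARY KEY,
    name TEXT,
    parent_topic_id INTEGER REFERENCES dim_topic(topic_id)  -- topic hierarchy
);

-- Bridge tables resolve the many-to-many relationships.
CREATE TABLE bridge_author_group (
    author_group_id INTEGER,
    author_id INTEGER REFERENCES dim_author(author_id)
);
CREATE TABLE bridge_topic_document_group (
    topic_document_group_id INTEGER,
    topic_id INTEGER REFERENCES dim_topic(topic_id),
    documents_tm REAL       -- assumed location of Documents_TM
);
CREATE TABLE bridge_topic_word_group (
    topic_word_group_id INTEGER,
    topic_id INTEGER REFERENCES dim_topic(topic_id),
    word_id INTEGER REFERENCES dim_word(word_id),
    word_probab_tm REAL     -- assumed location of Word_Probab_TM
);

CREATE TABLE fact_document (
    document_id INTEGER REFERENCES dim_document(document_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    author_group_id INTEGER,
    topic_document_group_id INTEGER,
    topics_probab_tm REAL   -- assumed location of Topics_Probab_TM
);
"""


def create_schema(conn):
    """Create the illustrative star schema on an open SQLite connection."""
    conn.executescript(SCHEMA)


def table_names(conn):
    """Return the set of table names currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}
```

Running `create_schema(sqlite3.connect(":memory:"))` builds the ten tables in an in-memory database, which is enough to experiment with joins across the bridge tables.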

Page 15

Proposed document warehouse model

Multidimensional cube building ④

Page 16

Textual measures and aggregation function

Topics_Probab_TM measure

R : the number of documents recovered by the query

A : the total number of distinct topics in the documents recovered in AM
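
An averaging aggregation consistent with the definitions of R and A can be sketched as below. The exact formula is not given here, so the averaging rule (each topic's summed probability divided by R) is an assumption, as are the function name `average_topic_probability` and the `(document, topic, probability)` row layout.

```python
from collections import defaultdict


def average_topic_probability(rows):
    """Average each topic's probability over the documents in a query result.

    rows: iterable of (document_id, topic, probability) tuples.
    Returns a dict mapping each distinct topic to its average probability.
    """
    rows = list(rows)
    # R: the number of documents recovered by the query.
    r = len({document_id for document_id, _, _ in rows})
    if r == 0:
        return {}
    totals = defaultdict(float)
    for _, topic, probability in rows:
        totals[topic] += probability
    # The keys of `totals` are the A distinct topics in the recovered documents.
    return {topic: total / r for topic, total in totals.items()}
```

A topic that is absent from some of the R documents contributes zero for those documents under this rule, so its average is pulled down accordingly.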

Page 17

Textual measures and aggregation function

Documents_TM Measure

: each row in the query

B : the total number of distinct documents recovered in the query

m : the number of topics in the Topic dimension

Page 18

Textual measures and aggregation function

Word_Probab_TM Measure

: each row in the query

B : the total number of distinct words recovered in the query

m : the number of topics in the Topic dimension

Page 19

OLAP document visualization

Topics_Probab_TM : Document dimension - Type of Document

Page 20

OLAP document visualization

Topics_Probab_TM : Date Dimension - year

Page 21

OLAP document visualization

Topics_Probab_TM : Document type(rows) and year attribute(columns)

Page 22

OLAP document visualization

Topics_Probab_TM: year and document type attributes

Slice operation: document type = “Journal Article”

Page 23

OLAP document visualization

Topics_Probab_TM: year, document type, and author name attributes

Dice operation
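
The slice and dice operations used on these slides can be illustrated on a small in-memory cube. This is a sketch with made-up toy rows, not data from the paper; the field names and the helpers `slice_cube` and `dice_cube` are hypothetical.

```python
# Toy fact rows, invented for illustration only.
FIELDS = ("year", "document_type", "author", "topic", "probability")
FACTS = [
    (2013, "Journal Article", "Author A", "topic_1", 0.7),
    (2013, "Conference Paper", "Author B", "topic_1", 0.2),
    (2014, "Journal Article", "Author A", "topic_2", 0.5),
    (2014, "Journal Article", "Author B", "topic_1", 0.9),
]
FIELD_INDEX = {name: position for position, name in enumerate(FIELDS)}


def _select(facts, allowed):
    """Keep rows whose value for every named field is in its allowed set."""
    return [
        row for row in facts
        if all(row[FIELD_INDEX[field]] in values
               for field, values in allowed.items())
    ]


def slice_cube(facts, field, value):
    """Slice: fix one dimension attribute to a single value."""
    return _select(facts, {field: {value}})


def dice_cube(facts, **allowed):
    """Dice: restrict several dimension attributes to sets of values."""
    return _select(facts, allowed)
```

For example, `slice_cube(FACTS, "document_type", "Journal Article")` keeps only the journal articles, while `dice_cube(FACTS, year={2014}, document_type={"Journal Article"}, author={"Author A"})` narrows the cube on three attributes at once.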

Page 24

OLAP document visualization

Documents_TM: each topic and document

Page 25

OLAP document visualization

Documents_TM: each topic, year, and document

Page 26

Conclusion - Evaluation results

Execution time results

Page 27

Conclusion - Evaluation results

Execution time results

Page 28

Conclusion - Evaluation results

User satisfaction results: statistical frequency analysis

Page 29

Conclusion - Evaluation results

User satisfaction results: multivariate analysis

Page 30

Thank you

Page 31

Proposed document warehouse model

Results of Cosme: XML file (metadata)

Page 32

Proposed document warehouse model

Results of IGBHSK