
Upload: marina-shibut

Post on 24-Jan-2018


Selection and Aggregation of Sentences in the Knowledge Formation Process

M.S. Shibut, V.S. Yakovishin

The Academy of Public Administration under the aegis of the President of the Republic of Belarus, 17, Moskovskaya Str., 220007, Minsk, Republic of Belarus, [email protected], http://pac.by/en

Let $S_1, S_2, S_3, S_4, S_5$ be sentences expressed in the formal language, as shown in the figure below, where a, in, o are signs of the secondary sentence parts and p, pt, pPs are signs of the different predicates (present, past indefinite, and present simple passive, respectively).

According to the selection rule, the first sentence must be eliminated because of the intensional superiority of the second sentence ($S_1 \subseteq S_2$). The sentences $S_2, S_3, S_4, S_5$ can then be integrated in compliance with the aggregation rule. Let “man”, “young man”, and “library” be the subjects contained in the user's request. Then, as a result of integration on the given subjects, the following three subject knowledge descriptions are obtained: $s(\{\text{man}\}) = \{S_2, S_3, S_5\}$, $s(\{\text{man}, \text{man\_a.young}\}) = \{S_2, S_5\}$, $s(\{\text{library}\}) = \{S_2, S_4\}$.

Knowledge-based text adaptation. Subject knowledge formation can be used as a basis for the automatic creation (compiling) of adapted (user-oriented) text materials, such as
- various information-analytical reviews;
- individual electronic textbooks;
- any other adapted text materials.

Knowledge-based information search. The information search can be realized as a two-stage process (resembling ore processing):
- data search: the usual information retrieval is performed to draw information (as complete as possible) from a number of sources;
- knowledge search (“ore dressing”): the obtained results are processed to extract only the important information (“valuable elements”).

Knowledge-based machine translation. In the translation of a source text from one natural language to another, the subject knowledge base (where the lexical compatibility is fixed) can be used as a supporting interlingua that acts as an effective filter for screening out the inappropriate meanings of polysemous words.

Knowledge formation is presented as a process of selection and aggregation of input sentences. In this process, the text sentences are first transformed into the formal language and then integrated into the knowledge representation. The integration of the sentences that share one and the same subject is considered a subject knowledge representation, and any collection of subject knowledge representations produced in the knowledge formation process is considered a user-oriented (“highly tailored”) description of the subject field. It is assumed that the subject (usually characterized as “the something or someone that the sentence is about”, “the thing being talked about”) is expressed by a grammatically separated noun phrase. This noun phrase represents either the absolutely independent part of the sentence (the formal subject of the subject-predicate division) or the general determinative part, i.e. the attribute that relates to the whole sentence (the actual subject of the theme-rheme division, also known as topic-comment, representing the “reflection of the speaker's attitude towards what is said”).

The knowledge formation method presented here is based on the use of a special formal language. In the formal language, input text sentences are expressed in set-theoretical (parenthesis-free, “discrete”) form as sets of their syntactic elements (syntagmes), which reduces the semantic identification of sentences to the standard set-theoretical relation of inclusion.
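As a minimal sketch of this idea (in Python; the variable names are illustrative and do not come from the poster), sentences 1 and 2 of the poster's example can be held as sets of syntagme strings, so that checking whether one sentence is covered by another becomes an ordinary subset test:

```python
# Sentences 1 and 2 of the poster's example, written as sets of syntagmes.
s1 = frozenset({"man", "man_a.young", "man_p.read", "read_o.book"})
s2 = frozenset({"man", "man_a.young", "man_p.read", "read_o.book",
                "read_in.library"})

# Set inclusion stands in for semantic comparison: s2 says everything s1 says.
s1_covered_by_s2 = s1 <= s2

# The set difference shows what the richer sentence adds.
extra_in_s2 = s2 - s1

print(s1_covered_by_s2)     # True
print(sorted(extra_in_s2))  # ['read_in.library']
```

Immutable `frozenset` values are used here only so that sentences could later serve as dictionary keys or set members; plain `set` objects would support the same comparison.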

Subject knowledge formation is a growth process in which two formation rules are applied: the rule of selection and the rule of aggregation of sentences.

Selection rule: one of the sentences $S_1$ and $S_2$ must be eliminated if it is a subset of the other sentence, i.e.

$\{S_1, S_2\} \to S_1$, if $S_1 \supseteq S_2$.

Aggregation rule realizes the integration of the already selected sentences: if $S_1, S_2, \ldots$ are sentences that have the same subject $N$, they are united in a subject knowledge representation, i.e. $\{S_1, S_2, \ldots\} \to s(N)$.

Subject knowledge representation is a set $s(N)$ of sentences $S_1, S_2, \ldots$ with the common subject, represented by a noun phrase $N$ (contained in the user's request), i.e. $s(N) = \{S_i \mid S_i \supseteq N,\ i \ge 1\}$.

Subject field representation is any collection $s(N_1, N_2, \ldots)$ of subject knowledge representations produced in the knowledge formation process, i.e. $s(N_1, N_2, \ldots) = \{s(N_1), s(N_2), \ldots\}$, where $N_1, N_2, \ldots$ are noun phrases that play the role of subjects in the “subject-predicate” division or in the actual “theme-rheme” division.
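The two rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the representation of sentence 5 is not given on the poster and is assumed here, and proper inclusion is used in the selection step so that two identical sentences do not eliminate each other. One further assumption concerns matching: literal inclusion of N = {library} would miss sentence 2, which contains only read_in.library, whereas the poster's example lists it under “library”. The sketch therefore also treats a bare word as present when it occurs as the head or dependent member of a syntagme.

```python
def select(sentences):
    """Selection rule: drop each sentence that is a proper subset of another."""
    return {
        key: s for key, s in sentences.items()
        if not any(s < other for other in sentences.values())
    }


def words(sentence):
    """Bare words occurring as head or dependent member of any element."""
    found = set()
    for item in sentence:
        head, _, rest = item.partition("_")
        found.add(head)
        if rest:
            found.add(rest.partition(".")[2])
    return found


def aggregate(sentences, subject):
    """Aggregation rule: ids of the sentences that contain the subject N.

    Assumption of this sketch: a bare word of N also matches when it only
    occurs inside a syntagme (e.g. 'library' in 'read_in.library').
    """
    return {key for key, s in sentences.items() if subject <= (s | words(s))}


sentences = {
    1: frozenset({"man", "man_a.young", "man_p.read", "read_o.book"}),
    2: frozenset({"man", "man_a.young", "man_p.read", "read_o.book",
                  "read_in.library"}),
    3: frozenset({"man", "man_pt.walk", "walk_in.park"}),
    4: frozenset({"library", "library_pPs.situate", "situate_in.street",
                  "street_a.graceful"}),
    # Assumed representation: the poster does not list sentence 5.
    5: frozenset({"man", "man_a.young", "man_pt.kick", "kick_o.ball"}),
}

selected = select(sentences)

# Subject field representation: one subject knowledge representation per subject.
subjects = {
    "man": frozenset({"man"}),
    "young man": frozenset({"man", "man_a.young"}),
    "library": frozenset({"library"}),
}
field = {name: aggregate(selected, n) for name, n in subjects.items()}

print(sorted(selected))            # [2, 3, 4, 5]
print(sorted(field["man"]))        # [2, 3, 5]
print(sorted(field["young man"]))  # [2, 5]
print(sorted(field["library"]))    # [2, 4]
```

With these assumptions the output agrees with the three subject knowledge descriptions given in the poster's example.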

Syntagme (as in “The new book”):
$(X_1 \Delta X_2) = \{X_1 \Delta X_2\}$

Stepwise subordination (as in “The book of the new author”):
$(X_1 \Delta_1 (X_2 \Delta_2 X_3)) = \{X_1 \Delta_1 X_2,\ X_2 \Delta_2 X_3\}$

Collateral subordination (as in “The new book of the author”):
$((X_1 \Delta_1 X_2) \Delta_2 X_3) = \{X_1 \Delta_1 X_2,\ X_1 \Delta_2 X_3\}$

Multisyntagme (as in “The new and old books”):
$(X_1 \Delta (X_2 \nabla X_3)) = \{X_1 \Delta X_2,\ X_1 \Delta X_3\}$

Subject (absolutely independent part) (as in “The man reads a book”):
$((X_1 \Delta_1 X_2) \Delta_2 X_3) = \{X_1,\ X_1 \Delta_1 X_2,\ X_1 \Delta_2 X_3\}$

Theme (topic) (as in “In the evening, the man reads a book”):
$((X_1 \Delta_1 (X_2 \Delta_2 X_3)) \Delta_3 X_4) = \{\Delta_3 X_4,\ X_1,\ X_1 \Delta_1 X_2,\ X_2 \Delta_2 X_3\}$

[Figure: tree diagrams of the examples above, with nodes labelled as head member(s), dependent member(s), homogeneous parts, subject, and theme.]
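The flattening of nested constructs into a parenthesis-free set of syntagmes can be sketched in Python. This is a hedged illustration rather than the authors' procedure: the nested `(head, sign, dependent)` tuples, the list used for homogeneous parts, the function names, and the sign `of` are all assumptions of this sketch (the poster names only the signs a, in, o, p, pt, and pPs). Syntagmes are written in the `head_sign.dependent` notation of the poster's example.

```python
def head_word(expr):
    """Head word of a bare word or of a nested (head, sign, dependent) triple."""
    while isinstance(expr, tuple):
        expr = expr[0]
    return expr


def syntagmes(expr):
    """Flatten a nested expression into the set of its binary syntagmes."""
    if isinstance(expr, str):
        return set()  # a bare word contributes no syntagme by itself
    head, sign, dependent = expr
    # A list of dependents models homogeneous parts (the multisyntagme case).
    parts = dependent if isinstance(dependent, list) else [dependent]
    result = syntagmes(head)
    for part in parts:
        result |= syntagmes(part)
        result.add(f"{head_word(head)}_{sign}.{head_word(part)}")
    return result


def sentence(expr):
    """A sentence adds its subject (the head word) as a separate element."""
    return {head_word(expr)} | syntagmes(expr)


# Stepwise subordination: "The book of the new author"
stepwise = syntagmes(("book", "of", ("author", "a", "new")))
# Collateral subordination: "The new book of the author"
collateral = syntagmes((("book", "a", "new"), "of", "author"))
# Multisyntagme: "The new and old books"
multi = syntagmes(("book", "a", ["new", "old"]))
# Subject: "The man reads a book"
reads = sentence(("man", "p", ("read", "o", "book")))

print(sorted(stepwise))    # ['author_a.new', 'book_of.author']
print(sorted(collateral))  # ['book_a.new', 'book_of.author']
print(sorted(multi))       # ['book_a.new', 'book_a.old']
print(sorted(reads))       # ['man', 'man_p.read', 'read_o.book']
```

In this sketch the only difference between stepwise and collateral subordination is where the nesting sits: a nested dependent attaches to its own head, while a nested head passes its head word on to the outer syntagme.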

Input sentences
1. The young man reads a book.
2. The young man reads a book in the library.
3. The man walked in the park.
4. The library is situated in a graceful street.
5. The young man kicked the ball.
…

Knowledge representation (after transformation into the formal language)
1. man, man_a.young, man_p.read, read_o.book
2. man, man_a.young, man_p.read, read_o.book, read_in.library
3. man, man_pt.walk, walk_in.park
4. library, library_pPs.situate, situate_in.street, street_a.graceful
…

Knowledge representation (after the selection rule)
2. man, man_a.young, man_p.read, read_o.book, read_in.library
3. man, man_pt.walk, walk_in.park
4. library, library_pPs.situate, situate_in.street, street_a.graceful
…

Knowledge representation for “library”
4. library, library_pPs.situate, situate_in.street, street_a.graceful
2. man, man_a.young, man_p.read, read_o.book, read_in.library

Knowledge representation for “man”
2. man, man_a.young, man_p.read, read_o.book, read_in.library
3. man_pt.walk, walk_in.park
…
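The `head_sign.dependent` notation used in these listings is regular enough to be parsed mechanically. The following Python sketch is an illustration only; the `Syntagme` type and the function names are hypothetical, and the parser is deliberately tolerant of stray spaces around the sign and the dependent member.

```python
from typing import List, NamedTuple, Optional


class Syntagme(NamedTuple):
    head: str
    sign: Optional[str]       # None for a bare subject such as "man"
    dependent: Optional[str]


def parse_item(item: str) -> Syntagme:
    """Split one element such as 'read_in.library' into head, sign, dependent."""
    head, _, rest = item.strip().partition("_")
    if not rest:
        return Syntagme(head, None, None)
    sign, _, dependent = rest.partition(".")
    return Syntagme(head, sign.strip(), dependent.strip())


def parse_line(line: str) -> List[Syntagme]:
    """Parse a comma-separated sentence representation."""
    return [parse_item(part) for part in line.split(",")]


parsed = parse_line(
    "library, library_pPs.situate, situate_in.street, street_a.graceful"
)

# Predicate signs named on the poster: p, pt, pPs.
predicates = [s for s in parsed if s.sign in {"p", "pt", "pPs"}]

print(parsed[0])      # Syntagme(head='library', sign=None, dependent=None)
print(predicates[0])  # Syntagme(head='library', sign='pPs', dependent='situate')
```

Splitting the elements this way makes it possible to filter a representation by sign, for example to list only predicates or only attributes.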

User-oriented description of subject field (query “library”)
2. The young man reads a book in the library.
4. The library is situated in a graceful street.

User-oriented description of subject field (query “man”)
2. The young man reads a book in the library.
3. The man walked in the park.
5. The young man kicked the ball.

[Figure labels: Selection rule; Aggregation rule; Query “man”; Query “library”]

The described research was supported by the research program on the Development of the State System of Scientific and Technical Information of the Republic of Belarus for 2009-2010, task No. 3.3, sponsored by the State Committee for Science and Technology of the Republic of Belarus.

We are pleased to thank Prof. Rauf Sadykhov and Prof. Anatoly Sachenko for their assistance. We are also very grateful to Dr. Iryna Turchenko for the presentation of our paper.

[Figure labels: Transformation into the formal language; Knowledge formation]