[ieee 2009 asia-pacific conference on information processing, apcip - shenzhen, china...



Research of complex question parsing based on condition extraction

Xiuling Pang Department of Educational Science and Technology

Weifang University Weifang, China

[email protected]

Keliang Jia School of Information Management

Shandong Economic University Jinan, China

[email protected]

Abstract—A parsing algorithm based on condition extraction is proposed for complex condition questions. A complex condition question consists of two parts: the condition description and the question focus description. The two parts are processed separately and then merged to obtain the semantic representation of the complex condition question. The algorithm reflects the underlying structure of such questions. Experimental results show that the algorithm is reasonably effective in a restricted-domain question answering system.

Keywords-complex condition question parsing; condition extraction; pattern matching

I. INTRODUCTION

In a restricted-domain question answering system, users often ask complex questions. Parsing complex questions correctly can therefore effectively improve the precision of the question answering system.

With the support of the Text REtrieval Conference (TREC) [1], research on open-domain question answering systems has made great progress. Many systems include question parsing, for example Carnegie Mellon’s JAVELIN [2], Concordia’s QUANTUM [3], and the Travel Consultation System (TCS) [4], which parses Japanese questions. For Chinese question parsing, the usual method is template matching over keywords, parts of speech and wildcards. In the QA System for Character Relations in HONG LOU MENG, questions about character relations were parsed based on keywords and wildcards [7]. HUANG Yinfei proposed a syntactic question parser that applies a context-free grammar (CFG) [8].

All of the question parsing work above targets simple questions; we found no corresponding research on complex questions. This paper proposes a parsing algorithm based on condition extraction and studies how to extract the complex condition question representation from the user’s question.

II. THE ALGORITHM OF THE COMPLEX QUESTION PARSING

A complex question contains one or more condition statement sentences and a question focus sentence. The task of question parsing is to extract the question focus from the latter and the conditions from the former, and then to compose them into a complex condition question representation. From an analysis of a large number of complex questions, we find that the focus of a question centers on a role of the event. In HowNet, DONG proposed 69 event roles, which cover the roles of most events. Several related definitions are introduced as follows.

Definition 1: The focus of the information asked for by the question is called the Question Focus (QF).

Definition 2: The formalized expression of the semantic information of a simple question is called the Question Semantic Representation (QSR).

Definition 3: The formalized expression of the semantic information of a complex question is called the Complex Question Semantic Representation (CQSR). A condition is expressed in the form “name-value”, namely “condition name=condition value”.

The CQSR for a question that asks about event roles is given below.

$CQSR = \{condition_1, condition_2, \ldots, condition_n, QSR\}$

Here, $condition_1, condition_2, \ldots, condition_n$ represent the $n$ conditions in the complex question, and QSR represents the semantic representation of the question focus sentence.

For example, for a question Q, the CQSR is:

CQSR={startingPoint= , destination= , QSR={QT=EventRole, ?Role=patient, EnC=default, EvC= }}

Here, patient represents the role of the object.

From an analysis of the structure of complex condition questions, we find that the condition sentences in the CQ (complex question) correspond to the conditions in the CQSR, and the question focus sentence in the CQ corresponds to the QSR. The extraction of the QSR is based on pattern matching [9]. To extract the conditions from the condition sentences, the paper uses information extraction methods: condition extraction rules are constructed, and the method scans the CQ and extracts the conditions with them. The more condition rules there are, the longer it takes to select the applicable rule. To address this, we first extract the QF from the CQ, then look up the conditions of the QF in the knowledge database, and finally select only the condition rules that belong to those conditions.
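The rule-selection strategy described above can be pictured as a two-stage lookup. The following Python fragment is only a minimal sketch under stated assumptions, not the authors' implementation: the question-focus keys, the table contents and the rule identifiers are all hypothetical placeholders.

```python
# Hypothetical knowledge database: question focus -> names of the conditions it needs.
QF_CONDITIONS = {
    "bus_route": ["startingPoint", "destination"],
    "senior_pass": ["age", "homeAddress"],
}

# Hypothetical rule base: condition name -> identifiers of its extraction rules.
CONDITION_RULES = {
    "age": ["age-rule-1", "age-rule-2"],
    "homeAddress": ["home-rule-1", "home-rule-2"],
    "startingPoint": ["start-rule-1", "start-rule-2"],
    "destination": ["dest-rule-1"],
}


def select_rules(question_focus):
    """Return only the extraction rules for the conditions the QF requires."""
    needed = QF_CONDITIONS.get(question_focus, [])
    return {name: CONDITION_RULES.get(name, []) for name in needed}


selected = select_rules("bus_route")
print(sorted(selected))  # ['destination', 'startingPoint']
```

Only the rules for two of the four conditions are loaded here, which is the saving the paragraph above aims at: the scan of the CQ never has to try rules for conditions the question focus does not use.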

A. Scanning the Question

Scanning the question is the first step of the CQ parsing algorithm; its purpose is to extract the CQSR. The detailed algorithm is as follows.

2009 Asia-Pacific Conference on Information Processing

978-0-7695-3699-6/09 $25.00 © 2009 IEEE

DOI 10.1109/APCIP.2009.287


1) Segmentation and POS tagging: We use the source code of the ICTCLAS system with partial modifications. A domain word base was added to the system, and words in the domain word base are given higher priority. ICTCLAS was developed in VC, whereas our question-parsing program is written in Java, so we use JNI to invoke ICTCLAS from Java.

2) Loading the question semantic model rules: A QSM rule for querying event roles is given below.

<QSM>
<QSMID>CQSM_Event_Role_1</QSMID>
<CONTENT> QfC(type= )+EvC(DEF=TakeVehicle| ) </CONTENT>
<QSR_QT>EventRole</QSR_QT>
<QSR_Role>patient</QSR_Role>
<QSR_ENC>default</QSR_ENC>
<QSR_EVC> </QSR_EVC>
<EXAMPLE> + , + </EXAMPLE>
</QSM>

3) Question semantic model matching: The details of the matching algorithm are as follows.

Algorithm 1: Question semantic model matching algorithm for the complex question.

Input: a complex question CQ

Step 1: Divide the CQ into $N$ sub-sentences $SubCQ_i$ ($1 \le i \le N$, $N \ge 1$) at the punctuation marks; each sub-sentence is represented as $word_{i1}, word_{i2}, \ldots, word_{iP}$, where $P = |SubCQ_i|$;

Step 2: Use the question semantic model set QSMT, which includes $M$ rules; each rule is represented as $sc_1, sc_2, \ldots, sc_Q$, where $Q$ is the length of the rule.

Step 3: For ( i = 1 to N ) {

Step 4: For ( j = 1 to M ) {

Step 5: For ( k = 1 to P-Q+1 ) {

Step 6: For ( l = 1 to Q ) {

Step 7: If $word_{i,k+l-1}$ is not matched to $sc_l$, stop comparing at this position and go to Step 5 with the next k

Step 8: If $word_{i,k+l-1}$ is matched to $sc_l$, continue with the next l

Step 9: }

Step 10: If all Q items were matched, the matching succeeds; the CQSR is produced and returned

Step 11: Else, continue with the next k; if no k remains, j++, go to Step 4

Step 12: }

Step 13: }

Step 14: }

Step 15: The extraction of the CQSR has failed.

Output: CQSR
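The nested loops of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading rather than the authors' code. It assumes that a rule must match a contiguous window of a sub-sentence, and that "matched" means plain equality between a token and a rule item; in the paper the items are semantic concepts such as QfC and EvC, so a real matcher would compare concepts instead. The token labels and the function names are hypothetical.

```python
def split_subquestions(tokens, punctuation=(",", ".", "?", ";")):
    """Step 1: split a token list into sub-sentences at punctuation tokens."""
    subs, current = [], []
    for tok in tokens:
        if tok in punctuation:
            if current:
                subs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        subs.append(current)
    return subs


def match_rule(words, rule):
    """Steps 5-10: return the start index of the first window of `words`
    that matches every item of `rule`, or -1 if there is none."""
    q = len(rule)
    for k in range(len(words) - q + 1):
        if all(words[k + l] == rule[l] for l in range(q)):
            return k
    return -1


def match_question(tokens, rules):
    """Steps 3-15: try every rule on every sub-sentence and return the first
    hit as (rule_id, sub_sentence_index, start), or None if nothing matches."""
    for i, words in enumerate(split_subquestions(tokens)):
        for rule_id, rule in rules.items():
            start = match_rule(words, rule)
            if start >= 0:
                return rule_id, i, start
    return None


# Hypothetical, already segmented question and a single rule of length 2.
tokens = ["I", "am", "at", "LOC", ",", "which", "BUS", "TAKE", "?"]
rules = {"CQSM_Event_Role_1": ["BUS", "TAKE"]}
print(match_question(tokens, rules))  # ('CQSM_Event_Role_1', 1, 1)
```

Returning None corresponds to Step 15, where the extraction fails because no rule matches any sub-sentence.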

B. Acquisition of the Condition Extraction Rules

Acquiring the condition extraction rules is the second step of the CQ parsing algorithm. Its tasks are to extract the QF from the CQ, to find the conditions of the QF in the knowledge database, and to load the condition extraction rules for the conditions found.

1) Representation of conditions in the knowledge database: Parsing the CQ requires a knowledge database. The knowledge representation must be fine-grained, and the knowledge must be able to express the conditions required by the QF.

Three attributes are used to represent a condition in the knowledge database: name, type and restrict. Name is the unique ID of a condition and must consist of English letters, digits and similar characters. In this paper, five condition types are defined: Int, String, Date, Time and Set. For each condition type, we define a restriction that expresses the bounds of the condition. For example, a condition of the Int type:

<condition>
<name>age</name>
<type>int</type>
<restrict>18+</restrict>
</condition>

Here “18+” means “>=18”. If the condition needs no restriction, ANY can be filled in the restrict label.
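A condition entry of this form can be read and checked with a few lines of Python. The sketch below is a hedged illustration: it handles only the two restriction forms named in the text ("N+" on an Int condition and ANY), and the function names parse_condition and satisfies are hypothetical, not part of the described system.

```python
import xml.etree.ElementTree as ET

CONDITION_XML = """
<condition>
  <name>age</name>
  <type>int</type>
  <restrict>18+</restrict>
</condition>
"""


def parse_condition(xml_text):
    """Read the name, type and restrict attributes of one <condition>."""
    node = ET.fromstring(xml_text)
    return {tag: node.findtext(tag).strip() for tag in ("name", "type", "restrict")}


def satisfies(condition, value):
    """Check an extracted value against the restriction.

    Only "ANY" (no restriction) and "N+" on an int condition
    (meaning value >= N) are supported in this sketch."""
    restrict = condition["restrict"]
    if restrict == "ANY":
        return True
    if condition["type"].lower() == "int" and restrict.endswith("+"):
        return int(value) >= int(restrict[:-1])
    raise ValueError("unsupported restriction: " + restrict)


age = parse_condition(CONDITION_XML)
print(satisfies(age, 18), satisfies(age, 17))  # True False
```

Restrictions for the Date, Time and Set types would need their own syntax, which the text does not specify, so they are deliberately left unsupported here.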

2) Representation of the condition extraction patterns: Condition extraction patterns are defined for each condition. Pattern matching is widely used in information extraction, where an extraction pattern may represent a relation or an event in a particular domain. A pattern is a sequence of items, each of which corresponds to one or more words. The condition extraction patterns defined in this paper take the same form as information extraction patterns [10].

Let CP be a condition extraction pattern. Then $CP = Item_1, Item_2, \ldots, Item_n$, where $Item_i = \{W_{i1}, W_{i2}, \ldots, W_{it}\}$, $1 \le i \le n$, and $W_{ij}$ ($1 \le j \le t$) is a word. For example, some of the condition extraction patterns are:

age: “ ”<“ ”> (NUMBER) <“ ”>
age: [“ ”] (NUMBER) <“ ”>
homeAddress: “ ” “ ”<“ ”> (LOCATION)
homeAddress: <“ ”>“ ”<“ ”> (LOCATION)
startingPoint: “ ”(LOCATION) “ ”
startingPoint: [“ ”] <“ ”> (LOCATION)

Here, “ ”, “ ” etc. represent words; [“ ”], [“ ”] etc. represent character items; <“ ”>, <“ ”> etc. represent optional items; (NUMBER), (LOCATION) etc. represent the extracted items; and “age”, “homeAddress” etc. are the names of the corresponding conditions.
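To make the item kinds concrete, the following Python sketch matches patterns built from required words, optional items and extracted slots. It is a simplified illustration under several assumptions: the English trigger words stand in for the original pattern words, the LOCATIONS gazetteer is hypothetical, character items are not modeled separately, and optional items are matched greedily without backtracking.

```python
LOCATIONS = {"Station", "Park"}  # hypothetical gazetteer for the (LOCATION) slot

SLOT_TESTS = {
    "NUMBER": str.isdigit,
    "LOCATION": lambda tok: tok in LOCATIONS,
}


def match_at(tokens, start, pattern):
    """Match `pattern` against `tokens` from position `start`.
    Returns the extracted slot value, or None if the pattern does not fit."""
    pos, value = start, None
    for kind, arg in pattern:
        tok = tokens[pos] if pos < len(tokens) else None
        if kind == "word":            # required literal word
            if tok != arg:
                return None
            pos += 1
        elif kind == "optional":      # optional item: consume it only if present
            if tok == arg:
                pos += 1
        elif kind == "slot":          # extracted item such as (NUMBER)
            if tok is None or not SLOT_TESTS[arg](tok):
                return None
            value = tok
            pos += 1
    return value


def extract(tokens, patterns):
    """Scan the tokens with every pattern; keep the first value per condition."""
    found = {}
    for name, pattern in patterns:
        if name in found:
            continue
        for start in range(len(tokens)):
            value = match_at(tokens, start, pattern)
            if value is not None:
                found[name] = value
                break
    return found


PATTERNS = [
    ("age", [("word", "I"), ("optional", "am"), ("slot", "NUMBER"), ("optional", "years")]),
    ("startingPoint", [("word", "from"), ("optional", "the"), ("slot", "LOCATION")]),
]
tokens = "I am 68 and travel from the Station".split()
print(extract(tokens, PATTERNS))  # {'age': '68', 'startingPoint': 'Station'}
```

Listing several patterns under the same condition name, as the examples above do for age and homeAddress, simply gives the scanner more than one chance to fill that condition.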

After scanning the question, the system extracts the QF from the CQ, finds the conditions of the QF in the knowledge database, and finally loads the condition extraction rules for the conditions found.

C. Condition Extraction

Condition extraction is the third step of the CQ parsing algorithm. It has two tasks: first, to extract the conditions using the condition extraction rules; second, to combine the conditions and the QSR of the question focus sentence into a CQSR.

The following example shows how the conditions are extracted and the CQSR is produced.

Step 1: Give a complex question Q “ 68”

Step 2: Scan the question; the QSR of the question focus sentence is {QT= , EvC= , patient= , agent=default}.

Step 3: Select the condition extraction patterns; the patterns are as follows: {age: “ ” <“ ”> (NUMBER) <“ ”>, age: [“ ”] (NUMBER) <“ ”>} and {homeAddress: “ ” “ ” <“ ”> (LOCATION), homeAddress: <“ ”> “ ” <“ ”> (LOCATION)}

Step 4: Extract the conditions according to the condition extraction rules; the conditions are: age=18, homeAddress= .

Step 5: Combine the conditions and the QSR of the question focus sentence into a CQSR. CQSR={age=18, homeAddress= , QSR={QT= , EvC= , patient= , agent=default}}.
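The combination in Step 5 amounts to wrapping the extracted conditions and the QSR in one structure. The Python sketch below shows one possible layout; the class names, the render format and the placeholder values SomeDistrict and EventRole are assumptions made for illustration, since the concrete values of the example are not recoverable from the text.

```python
from dataclasses import dataclass, field


@dataclass
class QSR:
    """Semantic representation of the question focus sentence."""
    slots: dict = field(default_factory=dict)


@dataclass
class CQSR:
    """Complex question representation: n conditions plus one QSR."""
    conditions: dict
    qsr: QSR

    def render(self):
        """Print the structure in the name=value notation used in the paper."""
        parts = [f"{name}={value}" for name, value in self.conditions.items()]
        inner = ", ".join(f"{k}={v}" for k, v in self.qsr.slots.items())
        parts.append("QSR={" + inner + "}")
        return "CQSR={" + ", ".join(parts) + "}"


def combine(conditions, qsr):
    """Step 5: merge the extracted conditions with the QSR into a CQSR."""
    return CQSR(dict(conditions), qsr)


# Placeholder values only; the original example values are not known.
conditions = {"age": 18, "homeAddress": "SomeDistrict"}
qsr = QSR({"QT": "EventRole", "agent": "default"})
print(combine(conditions, qsr).render())
# CQSR={age=18, homeAddress=SomeDistrict, QSR={QT=EventRole, agent=default}}
```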

D. Answer Reasoning

Answer reasoning is the fourth step of the CQ parsing algorithm; its task is to compare the conditions in the CQSR with the conditions in the knowledge database and to infer the answer. For a question that asks about event roles, the reasoner queries the knowledge database using the values of the conditions, obtains the value of the role, and returns it to the user. The following example shows the process of answer reasoning.

For the question “”, the CQSR obtained by question analysis is:

CQSR={startingPoint= , destination= , QSR={QT=EventRole, ?Role=patient, EnC=default, EvC= }}.

In the knowledge database, the question focus “ ” needs two conditions (startingPoint) and (destination), whose restrictions are <restrict>ANY</restrict>. The reasoner cannot infer the answer directly, so the system queries the patient role of the event “ ” according to the conditions in the knowledge database and returns the results “3” and “k51”.
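The lookup performed by the reasoner can be sketched as a query over stored facts keyed by condition values. This Python fragment is a minimal illustration, not the system's reasoner: the fact layout and the place names PlaceA and PlaceB are hypothetical, the route values "3" and "k51" are taken from the example above, and the "NIL" fallback follows the convention described in the experiments section.

```python
# Hypothetical knowledge database: each fact stores condition values and role values.
FACTS = [
    {
        "conditions": {"startingPoint": "PlaceA", "destination": "PlaceB"},
        "roles": {"patient": ["3", "k51"]},
    },
]


def reason_answer(conditions, role, facts=FACTS):
    """Return the values of `role` for the fact whose conditions equal the
    extracted ones, or "NIL" when the knowledge database has no such fact."""
    for fact in facts:
        if fact["conditions"] == conditions and role in fact["roles"]:
            return fact["roles"][role]
    return "NIL"


query = {"startingPoint": "PlaceA", "destination": "PlaceB"}
print(reason_answer(query, "patient"))  # ['3', 'k51']
```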

III. EXPERIMENTS AND ANALYSIS

Because there is little research on complex questions, there are no standard evaluation data sets or standard evaluation methods for Chinese complex questions. Using the above method, we designed a question answering system for Jinan public transport. Precision is used to evaluate the system; the formula is as follows.

$precision = \frac{num_{corr}}{num_{all}}$ (1)

The system has been tested in two ways: a closed test and an open test. For the closed test, the test set consists of 200 questions drawn at random from the complex question set. The open test simulates actual use: ten randomly selected students were each asked to query 20 questions. In the main QA task [11] of TREC 2001, questions were no longer guaranteed to have an answer in the collection; systems returned a response of “NIL” to indicate their belief that no answer was present. Because the knowledge database is limited in size and does not contain enough information to answer every question, some user questions have no answer in the knowledge database. In our system, NIL is counted as correct when no correct answer to the question is known to exist in the knowledge database. The results are shown in Table I.

TABLE I. THE EXPERIMENT RESULT

Test     Answer R   Answer W   NIL R   NIL W   Total R   Precision (%)
Closed   141        59         0       0       141       70.5
Open     75         60         21      44      96        48

In the table, R denotes right and W denotes wrong. As the table shows, the precision of the open test is lower than that of the closed test, because the questions in the closed test come directly from our complex question set. Counting both right answers and right NIL responses, the total precision reaches 48% in the open test. These results indicate that the complex question parsing method based on condition extraction is feasible.
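The precision values can be re-derived from the counts in the table with formula (1). The short Python check below uses only the reported numbers; multiplying by 100 to obtain a percentage is a presentation choice of this sketch, since formula (1) itself is a plain ratio.

```python
def precision(num_corr, num_all):
    """Formula (1), expressed as a percentage."""
    return 100.0 * num_corr / num_all


# Counts from the result table: right/wrong answers and right/wrong NIL responses.
closed_test = precision(141 + 0, 141 + 59 + 0 + 0)
open_test = precision(75 + 21, 75 + 60 + 21 + 44)
print(closed_test, open_test)  # 70.5 48.0
```

Both tests contain 200 questions, and the open-test figure counts the 21 correct NIL responses as correct, which matches the statement that the total precision includes right NIL answers.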

An analysis of the wrong answers reveals three causes: a shortage of QSM rules, a shortage of knowledge in the knowledge database, and a shortage of condition extraction rules.

IV. CONCLUSION AND FUTURE WORK

In this paper, we proposed a complex question parsing method, based on condition extraction and pattern matching, that extracts the CQSR of a complex question. The experimental results show that the method is feasible, reaching a precision of 70.5% in the closed test and 48% in the open test. However, the algorithm still has some drawbacks. Future work to improve its performance includes: (1) exploring a more reasonable way of expressing the condition patterns; (2) finding a more intelligent method to extract the conditions from the question.

ACKNOWLEDGMENTS

This work is supported by the Research Foundation of Shandong Economic University (No.01611320). The authors are grateful to the anonymous reviewers for their constructive comments.

REFERENCES

[1] Text Retrieval Conference, http://trec.nist.gov.

[2] E. Nyberg, T. Mitamura, “The JAVELIN question answering system at TREC2002”, available at: http://trec.nist.gov/pubs/trec11/papers/cmu.javelin.pdf.

[3] L. Plamondon, G. Lapalme, “Université de Montréal, The QUANTUM Question Answering System”, Proc. the Tenth Text REtrieval Conference (TREC 10), Nov. 2001, pp. 579-586.

[4] I. Kobayashi, “A study on meaning processing of dialogue with an example of development of travel consultation system”, Information Sciences, vol. 144, Jan. 2002, pp. 45-74.

[5] Ask Jeeves, http://www.askjeeve.com.

[6] AnswerBus, http://www.answerbus.com/index.shtml.

[7] WANG Shu-xi, LIU Qun, BAI Shuo, “An Expert System About Relationship”, Journal of Guangxi Normal University, vol. 21, Jan. 2003, pp. 31-36.

[8] HUANG Yinfei, ZHENG Fang, YAN Pengju, XU Mingxing, WU Wenhu, “The Design and Implementation of Campus Navigation System: EasyNav”, Journal of Chinese Information Processing, vol. 15, Apr. 2001, pp. 35-40.

[9] CHEN Kang, FAN Xiaozhong, LIU Jie, JIA Keliang, “Calculation Method of Chinese Question Semantic Similarity Based on Question Semantic Representation”, Transactions of Beijing Institute of Technology, vol. 27, Dec. 2007, pp. 4700-4704.

[10] WANG Jinghua, LIU Jianyi, “A New Algorithm of Rule Generation for Chinese Information Extraction”, Proceedings of International Conference on Natural Language Processing and Knowledge Engineering 2005 (IEEE NLP-KE'05), Beijing University of Posts and Telecommunications Press, Oct. 2005, pp. 565-570.

[11] E. Voorhees, “Overview of the TREC 2001 Question Answering Track”, Proceedings of the 10th Text Retrieval Conference, Gaithersburg, NIST, Nov. 2001, pp. 42-51.