Automatic assignment of solutions to written indoor air quality complaints

Zoulikha Bellia Heddadji¹,², Séverine Kirchner¹, Nicole Vincent² and Georges Stamon²

¹ Scientific and Technical Centre for Building (CSTB)
² Université Paris Descartes

Introduction

Building occupants increasingly voice complaints about indoor air, but the multidisciplinary expertise needed to solve the problem is often unavailable at a local level. The aim of the project is to develop a computer system dedicated to resolving indoor air quality complaints written in French. A complaint's solution is a document outlining the possible circumstances that caused the problem cited in the complaint. Solutions also contain guidelines and corrective actions for avoiding the domestic pollution problem concerned.

The approach

Having observed recurring patterns among the complaints in our corpus, we proposed the following method. First, scenarios are built that group the different situations observed. Each scenario contains a set of resolved complaints that concern the same indoor air pollution theme and lead to the same solution (problems due to exposure to molds, dust mites, synthetic or man-made mineral fibers, chemical compounds, etc.). Then, using automatic information retrieval systems adapted to the natural language in which complaints are expressed, the text of a new complaint is matched against the corpus of resolved complaints clustered into scenarios. A general French dictionary of synonyms is used to match complaints that are written with different words but share the same meaning.

Several information retrieval systems, which depend on different properties of the studied texts (size, relevance, etc.), are implemented, including:

o The vector space model [4,5]

o The fuzzy proximity model [3]

o Our own model, the wave of information [1]

Finally, the new complaint is assigned the solution of the scenario that contains the resolved complaint most similar to it (Figure 1).

Figure 1. Architecture of the approach
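The matching and assignment steps described above can be sketched with a plain vector space model. This is a minimal illustration, not the authors' implementation: the TF-IDF weighting, the cosine measure, the tiny `SYNONYMS` table (standing in for the general French synonym dictionary) and the function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table; the real system relies on a full French dictionary.
SYNONYMS = {"moisi": "moisissure", "moisissures": "moisissure", "acariens": "acarien"}


def tokenize(text):
    """Lowercase, keep runs of letters, and map each word to a canonical synonym."""
    words = re.findall(r"[a-zàâçéèêëîïôûùüÿœ]+", text.lower())
    return [SYNONYMS.get(w, w) for w in words]


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in token_lists]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_solution(new_text, resolved, solutions):
    """Assign the solution of the scenario holding the most similar resolved complaint.

    resolved  -- list of (complaint_text, scenario_name) pairs
    solutions -- dict mapping scenario_name to its solution document
    """
    vectors, idf = tfidf_vectors([tokenize(text) for text, _ in resolved])
    # Terms unseen in the corpus get weight 0 and do not affect the match.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(new_text)).items()}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(resolved)), key=scores.__getitem__)
    scenario = resolved[best][1]
    return scenario, solutions[scenario], scores[best]
```

For instance, with two toy resolved complaints labelled `"molds"` and `"dust_mites"`, a new complaint mentioning "moisi" in the bathroom is mapped to the `"molds"` scenario because the synonym table folds "moisi" and "moisissures" into one term.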

Text matching is performed by comparing texts that concern similar topics. The user interface is therefore a set of questions, and the responses together constitute the new problem. It was unanimously noted that three essential items appear in the description of complaints: symptoms, a description of the dwelling and a description of the outdoor environment. We therefore use three semantically meaningful XML tags corresponding to the interface questions. The text delimited by each tag is the information filled in by the user (Figure 2).

XML schema to save complaints

<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<Number>78</Number>
<Symptoms>Je souffre de toux et d’irritation trachéale</Symptoms>
<Description_habitat>Présence de poussières et de moisissures. J’ai aussi une odeur d’égout dans la salle de bain. J’ai des trous et des fissures dans le mur du salon derrière le téléviseur. J’ai trouvé des excréments de chauve-souris dans l’ouverture de la toiture</Description_habitat>
<Description_outdoor_environment>Usine qui fait beaucoup de poussière en fabriquant du bois</Description_outdoor_environment>

Figure 2. An example of a complaint saved in XML format
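A complaint stored in this form could be read back into its three text fields roughly as follows. This is a sketch under assumptions: the fragment shown in Figure 2 has no single enclosing element, so the code wraps it in a hypothetical `<Complaint>` root before parsing, and the function name `parse_complaint` is illustrative.

```python
import xml.etree.ElementTree as ET

# Tag names taken from Figure 2.
FIELDS = ("Symptoms", "Description_habitat", "Description_outdoor_environment")


def strip_declaration(text):
    """Remove a leading <?xml ... ?> declaration, if present."""
    text = text.lstrip()
    if text.startswith("<?xml"):
        text = text[text.index("?>") + 2:]
    return text


def parse_complaint(fragment):
    """Return a dict with the complaint number and the three free-text fields."""
    # Hypothetical root element: the stored fragment has several top-level tags,
    # which a standard XML parser does not accept on its own.
    wrapped = "<Complaint>" + strip_declaration(fragment) + "</Complaint>"
    root = ET.fromstring(wrapped)
    record = {"Number": int(root.findtext("Number"))}
    for field in FIELDS:
        record[field] = (root.findtext(field) or "").strip()
    return record
```

Each field can then be passed to the text-matching step separately, so that symptoms are compared with symptoms and dwelling descriptions with dwelling descriptions.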

Results and conclusion

The interest of using semantics

The benefit of using semantics, provided by DICTIONNAIRE [2], in text matching with the vector space model is shown by recall-precision curves. The curve corresponding to the semantic model, which is generally the most efficient model, is the upper one.
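For readers unfamiliar with recall-precision curves, the points of such a curve can be computed from a ranked list of retrieved complaints as sketched below. The poster does not detail how its curves were produced, so this uses the standard definitions (recall = relevant retrieved / all relevant, precision = relevant retrieved / retrieved so far); the function name is illustrative.

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """Return (recall, precision) pairs, one per rank where a relevant item appears.

    ranked_ids   -- document identifiers ordered from most to least similar
    relevant_ids -- identifiers judged relevant for the query
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points
```

A model whose curve lies above another's reaches the same recall with higher precision, which is the sense in which the semantic model is reported as more efficient.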

Evaluation of the automatic assignment

Automatic assignment was tested on a set of 96 new complaints, and the results were compared with the judgments of three experts. The agreement between the solutions assigned by the information retrieval systems and the experts' judgments is satisfactory when compared with the agreement among the experts themselves: the success rates of automatic assignment range from 79.52% to 89.16%, while the levels of agreement between experts range from 88.54% to 88.75%.
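An agreement rate of this kind can be illustrated as the percentage of complaints on which two sources assign the same solution. The poster does not state exactly how its rates were derived, so the sketch below assumes simple percent agreement; the function names and labels are hypothetical.

```python
from itertools import combinations


def agreement_rate(labels_a, labels_b):
    """Percentage of complaints on which two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both sequences must cover the same complaints")
    if not labels_a:
        raise ValueError("at least one complaint is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def pairwise_agreement(labels_by_rater):
    """Agreement rate for every pair of raters, keyed by the pair of rater names."""
    return {
        (r1, r2): agreement_rate(labels_by_rater[r1], labels_by_rater[r2])
        for r1, r2 in combinations(sorted(labels_by_rater), 2)
    }
```

Comparing the system against each expert with `agreement_rate`, and the experts against each other with `pairwise_agreement`, gives figures that can be read side by side in the way the two rate ranges are compared above.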

Table 1. Success rates of automatic assignments of solutions

Table 2. The level of agreement between experts’ opinions

Figure 3. Evaluation of the interest of the use of semantics

1. Bellia Heddadji Z. 2008. Modélisation et classification de textes. Application aux plaintes liées à des situations de pollution de l’air intérieur. Ph.D. Thesis, Université Paris Descartes (France).

2. Manguin JL. 2005. La dictionnairique Internet : l'exemple du dictionnaire des synonymes du CRISCO. In: Proceedings of CORELA – Cognition, Représentation, Langage, Numéro spécial.

3. Mercier A. and Beigbeder M. 2004. Application de la logique floue à un modèle de recherche d'information basé sur la proximité. In: Proceedings des 12es rencontres francophones sur la Logique Floue et ses applications.

4. Salton G. and Buckley C. 1988. Term-weighting approaches in automatic text retrieval. In: Information Processing and Management.

5. Zargayouna H. 2005. Indexation sémantique de documents XML. Ph.D. Thesis, Université Paris XI Orsay (France).