Integration of data models for process design - first steps and experiences

Computers and Chemical Engineering 24 (2000) 599-605

www.elsevier.com/locate/compchemeng

Birgit Bayer *, Ralph Schneider, Wolfgang Marquardt

Lehrstuhl für Prozesstechnik, RWTH Aachen, Templergraben 55, D-52056 Aachen, Germany

Abstract

In recent years numerous approaches have been presented to support the activities in chemical process design by means of software tools, but significant improvements of the design process can only be achieved if it is supported in its entirety. Therefore, the integration of the existing software tools is of prime importance. All these tools are based on data models that capture the information handled during the design process. In this study, a conceptual data model is presented that has been designed for the integration of existing data models as a first step towards the integration of the software tools using these models. This conceptual lifecycle process model CLiP is based on the ideas of general systems theory. This universal approach renders the model very flexible and guarantees a well-defined structure for data model integration. Some integration mechanisms are presented and discussed. Finally, an architecture for tool integration based on a conceptual model is proposed. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Information modeling; Data model integration; Meta modeling; Process Data exchange Institute; Process design support

1. Introduction

The current state of computer support in chemical engineering design is mainly characterized by loosely linked software packages for distinct activities, such as simulators, CAx programs, heuristic systems for flowsheet synthesis, or costing tools. But in order to improve the design process significantly, as required by the increasingly competitive and global market, it has to be supported in its entirety by tight tool integration or even work process integration (Marquardt & Nagl, 1998). During the last few years there have been numerous approaches that focussed on such a support of the entire design process by integrating existing software tools.

The STEP initiative is concerned with the representation of engineering product data and the definition of standards for the data exchange between software tools in several application domains. These standards are based on common data models defined in several application protocols (AP). There are three APs in the area of chemical engineering: AP 231 (developed by the Process Data exchange Institute, pdXi) focuses on the conceptual design of chemical processes; AP 221 encompasses the detailed description of apparatuses and instruments; and AP 227 describes the exact layout of a plant. Further, there are information management systems under development, like n-dim at Carnegie Mellon University (Westerberg & n-dim Group, 1997). Batres and Naka (1999) propose the exchange of data between different tools and an integration of components into a platform based on a formal description of chemical engineering knowledge.

* Corresponding author. Tel.: +49-241-804672; fax: +49-241-8888326.

E-mail address: [email protected] (B. Bayer)

All these integration approaches and all software tools used during process design are based on data models to capture engineering knowledge in a computer-processable manner. These models have been developed for the design and realization of distinct support systems; they have their strong points regarding the software system they were designed for. But they also have their weaknesses, since they are rarely of general validity.

Therefore, we propose a well-structured data model that aims at covering the entire design lifecycle of a chemical process from product design to operation systems design (Schuler, 1998) to allow the integration of other information models in order to profit from the knowledge and experiences they capture.

0098-1354/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0098-1354(00)00426-9

Information modeling is a standard task during the software design process, where three different levels can be distinguished (Fowler & Scott, 1997). Conceptualization aims at a description of the domain, independent of implementation decisions and constraints. The second level is the specification, where the functionality of the developed software is defined by means of modules and their interfaces. Finally, at the implementation level, all details of the software realization are described. In order to enable the integration of several existing data models, a model has to be on the conceptual level (McKay, Bloor & de Pennington, 1996). Furthermore, the model has to be correct and complete, minimal, and understandable.

A conceptual information model provides a common vocabulary and describes the universe of discourse precisely in the sense of ontology. By building this model a better understanding can be obtained, not only for the domain itself, but also for already existing data models.

In the following section we will sketch a novel conceptual information model that has been designed with the given requirements in mind. In contrast to the modeling approaches reported previously, it provides detailed descriptions of the chemical plant from a functional and an implementational perspective together with mathematical models of varying detail (Marquardt, von Wedel & Bayer, 1999). In Section 3 some integration mechanisms between this model and some parts of the pdXi data model will be shown. A proposed architecture follows in Section 4, based on a conceptual data model serving as an integration framework and as a mediator between existing tools in an integrated software environment.

Fig. 1. The conceptual lifecycle process model CLiP with its three layers of abstraction.

2. A conceptual model for process engineering

This section is divided into two parts. In the first part we will present the product data framework of our conceptual lifecycle process model CLiP (which also includes a work process model not discussed here). The second part focuses on an important aspect of product data, the concepts describing the chemical plant itself. The concepts introduced here will be used later to illustrate the model integration mechanisms.

2.1. The modeling framework

As the root concept of the modeling framework the system is introduced (Marquardt et al., 1999). According to the ideas of general systems theory, a system is characterized by properties such as function and behavior (Bunge, 1979). An interface connects the system with other systems and its environment. A system can be decomposed into one or several parts, which are systems themselves, as indicated by the contains-association in Fig. 1. In addition to this decomposition, which is restricted to systems of the same type, we introduced a refers to-link, which indicates that systems of different types can be interrelated or dependent. Aggregated and structured systems can be distinguished: aggregated systems are accumulations of several, more or less interacting elements, while structured systems have a well-defined internal structure. These modeling concepts form the abstract basis of the framework.

Different refinements of systems can be distinguished on a more concrete level (see Fig. 1). Technical systems represent all kinds of technical artifacts that are built to fulfill some functionality. They are instances of structured systems. Technical systems are either devices or connections. Devices have the major functionality and are linked with connections. A technical system can be described by mathematical models, which are structured systems themselves. These contain mathematical expressions, which are used to describe and predict the behavior of a technical system. Furthermore, material and social systems are introduced as instances of aggregated systems. Material abstracts matter and substances that can be used in various manners by technical systems. A social system can be a group of persons or a single person.
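The abstract part of the framework can be illustrated by a small Python sketch. It is not the ConceptBase implementation of CLiP; the class and method names (System, add_part, refer_to, decomposition_type) are assumptions made for illustration, and "systems of the same type" is interpreted here as "systems of the same top-level refinement".

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class System:
    """Root concept: a named system with parts and links to other systems."""
    name: str
    parts: list = field(default_factory=list)       # contains-association
    references: list = field(default_factory=list)  # refers to-link

    def decomposition_type(self):
        # Assumption: parts must belong to the same top-level refinement.
        return System

    def add_part(self, part):
        """Decompose this system; only systems of the same type are allowed."""
        if not isinstance(part, self.decomposition_type()):
            raise TypeError(f"{part.name} cannot be a part of {self.name}")
        self.parts.append(part)

    def refer_to(self, other):
        """Relate this system to a system of any type."""
        self.references.append(other)


class TechnicalSystem(System):
    def decomposition_type(self):
        return TechnicalSystem


class Device(TechnicalSystem):
    pass


class Connection(TechnicalSystem):
    pass


class MathematicalModel(System):
    def decomposition_type(self):
        return MathematicalModel


# Hypothetical example objects.
feed_section = TechnicalSystem("feed section")
pump = Device("pump")
suction_line = Connection("suction line")
pump_model = MathematicalModel("pump model")

feed_section.add_part(pump)
feed_section.add_part(suction_line)
pump.refer_to(pump_model)  # a technical system described by a model
```

In this sketch, trying to add pump_model as a part of feed_section raises a TypeError, which mirrors the restriction of the contains-association; the refers to-link carries the cross-type relation instead.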

The concept of technical systems is further refined to chemical process systems, which consist of three distinct parts: the processing subsystem, the operating subsystem, and the management system. The processing subsystem holds all functionalities of materials processing, the operating subsystem comprises the technology for controlling and managing this processing subsystem, and finally, the management system - an instance of social system - refers to the personnel working on the chemical plant. There are two different instantiations of material on this level of detail: the processing material, which is processed in order to get a specified product, and the construction material used to build the chemical process system. The behavior of processing material can be described by material models. These are referenced by process models used to describe a chemical process system. Both material models and process models are refinements of mathematical model.

Fig. 2. Processing subsystem and its realization.

The conceptual framework contains three different layers of abstraction: the meta meta layer holding the general system, the meta layer with the technical and the social systems, and the simple class layer with the chemical process system. There are different possible specifications of meta data that are used to understand and describe data and the use of data in information systems (Jarke, Lenzerini, Vassiliou & Vassiliadis, 1999). Here, meta models are specified in the sense of a dictionary to describe data elements and relations between them. The model is implemented in the object base management system ConceptBase (Jarke, Gallersdörfer, Jeusfeld, Staudt & Eherer, 1995).

2.2. The chemical plant model

The chemical process system and its subsystems form a major part of the model. In the following section we will focus on the processing subsystem. As mentioned before, each system can be characterized by some properties. Requirement, specification, realization, and behavior were identified as the most important ones (Marquardt et al., 1999).

Fig. 2 shows the processing subsystem of a chemical process system together with the concepts from its realization partial model. The UML (Fowler & Scott, 1997) is used in this and the following diagrams to depict concepts by classes with their attributes and the associations between them. The realization of the abstract processing subsystem introduced above can be seen as the plant with all its elements, where chemical, physical, and biological processes are performed in order to produce a specified compound. We capture all plant elements by the concept of the plant item, which is associated with the processing subsystem. There are three subclasses: equipment, pipe, and nozzle. Equipment and pipe are realizations of processing device and processing connection, respectively (not shown in Fig. 2). Nozzles can be classified as equipment nozzles or pipe nozzles. The associations between equipment, equipment nozzle, pipe nozzle, and pipe depict the structure of the plant. Single pieces of equipment and groups of equipment are distinguished. The latter contain not only equipment but also the connecting pipes. A plant is a special group of equipment, whereas apparatuses and machines are pieces of equipment.
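The taxonomy described above can be written down as a compact class hierarchy. The following Python sketch only mirrors the concepts named in the text; the attributes (name, nozzles, members, owner, mate), the helper functions, and the multiplicities are assumptions for illustration and are not taken from Fig. 2.

```python
class PlantItem:
    """Any element of the plant (realization of the processing subsystem)."""
    def __init__(self, name):
        self.name = name


class Equipment(PlantItem):
    def __init__(self, name):
        super().__init__(name)
        self.nozzles = []  # equipment nozzles attached to this equipment


class PieceOfEquipment(Equipment):
    pass


class Apparatus(PieceOfEquipment):
    pass


class Machine(PieceOfEquipment):
    pass


class GroupOfEquipment(Equipment):
    def __init__(self, name, members=()):
        super().__init__(name)
        self.members = list(members)  # equipment and the connecting pipes


class Plant(GroupOfEquipment):
    pass


class Pipe(PlantItem):
    def __init__(self, name):
        super().__init__(name)
        self.nozzles = []  # pipe nozzles attached to this pipe


class Nozzle(PlantItem):
    def __init__(self, name, owner):
        super().__init__(name)
        self.owner = owner  # the equipment or pipe carrying the nozzle
        self.mate = None    # the nozzle on the other side, if connected
        owner.nozzles.append(self)


class EquipmentNozzle(Nozzle):
    pass


class PipeNozzle(Nozzle):
    pass


def connect(equipment_nozzle, pipe_nozzle):
    """Link an equipment nozzle and a pipe nozzle in both directions."""
    equipment_nozzle.mate = pipe_nozzle
    pipe_nozzle.mate = equipment_nozzle


def attached_pipes(equipment):
    """Pipes reachable from the equipment through connected nozzles."""
    return [n.mate.owner for n in equipment.nozzles if n.mate is not None]


# Hypothetical plant with one machine and one pipe.
pump = Machine("P-101")
line = Pipe("L-1")
connect(EquipmentNozzle("N1", pump), PipeNozzle("N2", line))
plant = Plant("demo plant", [pump, line])
```

Here the plant structure is recovered by following nozzle connections, so attached_pipes(pump) yields the single pipe of the example.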

3. Data model integration

The problem of the integration of different data models and schemata is well known in the area of database construction and data warehouse construction. Therefore, different mechanisms and techniques have been developed that provide the ability to deal with several information sources. All these approaches can be classified into two main categories (Calvanese, De Giacomo, Lenzerini, Nardi & Rosati, 1999): the construction of one integrated schema and the definition of mappings between different schemata.

In this section, four different mechanisms for data model integration will be presented (two for each category). The integration mechanisms will be illustrated by some examples taking the equipment and its subordinate concepts from our modeling framework and some parts of pdXi (ISO 10303, 1998) describing plant items and process equipment. In pdXi, plant items like process equipment are physical objects that are part of a process plant. There are four major groups of process equipment: vessels, heat exchanger equipment, material, and mass transfer equipment. Within these groups, different types are distinguished that are characterized by a large number of attributes. In order to have the same notation for both models, the IDEF1X diagrams of pdXi were manually transformed into the UML and the class names were changed by the addition of the prefix 'pdXi_'.


3.1. Mapping via bridges

Data models that need to be integrated usually overlap in scope and semantics. Each overlap can be used to define mappings between the different schemata. This can be done by the definition of bridges that link two classes from different models that describe the same concept (Mariño, Rechenmann & Uvieta, 1990).

Fig. 3 shows some classes of our model and the pdXi model representing plant items, pumps, and vessels. Gray arrows indicate the bridges between the two taxonomies: PlantItem matches pdXi_PlantItem, PieceOfEquipment matches pdXi_ProcessEquipment, and so forth. Mariño et al. (1990) introduced bridges between class hierarchies that represent perspectives of different experts of one domain, where concepts within different perspectives can describe exactly the same object. In contrast, there is usually a mismatch between the content and semantics of the concepts when existing data models are going to be integrated in an a-posteriori approach. This mismatch has to be overcome by means of a transformation. Therefore, the bridges need to be enriched by some transformation rules and constraints that reconcile the semantic differences between the hierarchies. Transformation rules can be defined using graph grammars that were introduced as a formal basis for the integration of different documents (Lefering & Schürr, 1996). In Fig. 3 only bridges between classes are shown. But an extension of this mapping mechanism to single attributes and relationships is possible.

Fig. 3. Connecting similar concepts via bridges.

Fig. 4. A merged schema of pdXi and CLiP.

A major drawback of the mapping via bridges is the high specification effort: concepts with similar meanings in the different models have to be identified, and transformations between these concepts must be defined largely manually.
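An enriched bridge can be pictured as a pair of class names plus a transformation rule. The Python sketch below uses plain functions instead of graph grammars, and the attribute names ("name", "tag") as well as the helper names (Bridge, rename, translate) are hypothetical; only the two class correspondences are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, object]  # attribute values of one object


@dataclass(frozen=True)
class Bridge:
    """Links two classes that describe the same concept in two models."""
    source: str
    target: str
    transform: Callable[[Record], Record]  # reconciles semantic differences


def rename(mapping: Dict[str, str]) -> Callable[[Record], Record]:
    """Build a simple transformation rule that renames attributes."""
    def apply(record: Record) -> Record:
        return {mapping.get(key, key): value for key, value in record.items()}
    return apply


# Bridges named in the text; the attribute renaming is an assumption.
BRIDGES = {
    "PlantItem": Bridge("PlantItem", "pdXi_PlantItem", rename({"name": "tag"})),
    "PieceOfEquipment": Bridge(
        "PieceOfEquipment", "pdXi_ProcessEquipment", rename({"name": "tag"})
    ),
}


def translate(class_name: str, record: Record):
    """Map an object of our model onto the bridged pdXi class."""
    bridge = BRIDGES[class_name]
    return bridge.target, bridge.transform(record)


result = translate("PieceOfEquipment", {"name": "P-101"})
```

Every bridge in BRIDGES has to be written by hand, which is the specification effort mentioned above.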

3.2. Schema integration

In schema integration different modeling schemata of existing or proposed databases are integrated into a global and unified schema. In this area, research has been done for a long time in information technology; Batini, Lenzerini and Navathe (1986) give an early review and still new approaches and results are published. All approaches of schema integration follow similar methodological steps in order to find and resolve conflicts between the different schemata and to merge them into one conformed schema. Different data models serve as the input; the output is usually one conformed, global schema and mappings between the global schema and each integrated model.

In Fig. 4 one schema is shown that can be obtained from schema integration of the parts of CLiP and pdXi introduced in Fig. 3. The names of the concepts were not changed in order to make visible which concept comes from which model. It is obvious that the more abstract, upper part of the merged schema is equivalent to our model, whereas the more concrete concepts were taken from pdXi. This comes from the fact that pdXi is more detailed in the description of single apparatuses and machines while our modeling activities focus on the description of the overall dependencies.

The major drawback of the merging approach for integrating data models is that for each new data model that has to be integrated, a new global schema has to be created as well as new mappings or modified queries for all integrated models. Even though much work has been done to support and automate the merging of data models, this mechanism remains very inflexible. On the other hand, the integration of some existing data models for the domain of interest can be the starting point for the definition of a data model onto which other models can be mapped as described in Section 3.1. The advantage of this approach is the expected generality of the global data model, in which concepts and views from different models and modelers are captured and merged into one model.
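A strongly simplified picture of schema merging is given by the following Python sketch. It treats each schema as a taxonomy (concept mapped to its parent), takes the correspondences between concepts as given, and resolves every overlap in favour of our model. This is an assumption-laden illustration, not one of the published integration methodologies; the pdXi subclass names used here (pdXi_Vessel, pdXi_Pump) are placeholders.

```python
def merge(clip, pdxi, equivalent):
    """Merge two taxonomies (concept -> parent, None for a root).

    `equivalent` maps a pdXi concept onto the CLiP concept it coincides
    with. Returns the global schema and the mapping of every pdXi concept
    onto a concept of the global schema.
    """
    to_global = {concept: equivalent.get(concept, concept) for concept in pdxi}
    merged = dict(clip)
    for concept, parent in pdxi.items():
        global_name = to_global[concept]
        if global_name in merged:
            continue  # already covered by the more abstract model
        merged[global_name] = None if parent is None else to_global[parent]
    return merged, to_global


clip = {
    "PlantItem": None,
    "Equipment": "PlantItem",
    "Pipe": "PlantItem",
    "PieceOfEquipment": "Equipment",
    "GroupOfEquipment": "Equipment",
}
pdxi = {
    "pdXi_PlantItem": None,
    "pdXi_ProcessEquipment": "pdXi_PlantItem",
    "pdXi_Vessel": "pdXi_ProcessEquipment",  # placeholder subclass name
    "pdXi_Pump": "pdXi_ProcessEquipment",    # placeholder subclass name
}
equivalent = {
    "pdXi_PlantItem": "PlantItem",
    "pdXi_ProcessEquipment": "PieceOfEquipment",
}

global_schema, pdxi_mapping = merge(clip, pdxi, equivalent)
```

The result keeps the abstract upper part from our model and hangs the detailed pdXi concepts below it. Integrating a third model would require calling merge again on the result and revising the mappings, which reflects the inflexibility discussed above.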


Fig. 5. Integration of pdXi taxonomy by a specialization link.

Fig. 6. Integration of different instantiations under one meta class.

3.3. Integration as subordinate concepts

When different data models for one domain exist on different levels of detail the integration into one schema can be facilitated by defining the concepts from the detailed model as subordinate concepts of the more general ones. As stated before, the pdXi model is more detailed than our modeling framework. This fact can be used to define an integrated model as shown in Fig. 5. The entire taxonomy of concepts for describing process equipment (i.e. pdXi_ProcessEquipment with all its subclasses, their attributes, and associations) was taken from pdXi and integrated without changes into our model of the chemical plant by defining a specialization relationship from Equipment to pdXi_ProcessEquipment (indicated by the dashed line in Fig. 5).

Once such a concept is identified, that is common for both models and that is the root of a detailed taxonomy in one of the models and a part of a more abstract taxonomy in another one, an integration by defining a specialization can be done very easily and with little loss of information. One can describe overall dependencies between data for a specific domain in a conceptual data model (comparable with the introduced modeling framework) and adopt more detailed descriptions from the existing data models. Subordinate taxonomies integrated by specializations of the conceptual classes can be taken from different data models. One can choose the best existing modeling approach for every single part of the conceptual model. But for each part of the model, only one existing schema can be integrated. An integration of two or more taxonomies without change as specializations of one concept would lead to redundancies and inconsistencies that can only be resolved by merging them (cf. Section 3.2).
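In terms of a taxonomy stored as a mapping from each concept to its parent, integration by specialization adds exactly one link. The following Python sketch illustrates this under that assumption; the function names and the pdXi subclass name pdXi_Vessel are hypothetical.

```python
def integrate_as_subordinate(general, detailed, root, parent):
    """Attach the detailed taxonomy below `parent` of the general one.

    Both taxonomies map a concept to its parent (None for a root). The
    detailed taxonomy is adopted unchanged; only its root gets a parent.
    """
    if root not in detailed or detailed[root] is not None:
        raise ValueError(f"{root} is not the root of the detailed taxonomy")
    if parent not in general:
        raise ValueError(f"{parent} is unknown in the general taxonomy")
    if set(general) & set(detailed):
        raise ValueError("overlapping concepts must be merged instead")
    combined = {**general, **detailed}
    combined[root] = parent  # the single specialization link
    return combined


def is_a(taxonomy, concept, ancestor):
    """Follow the parent links upwards to test a specialization."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = taxonomy[concept]
    return False


clip = {"PlantItem": None, "Equipment": "PlantItem", "Pipe": "PlantItem"}
pdxi = {
    "pdXi_ProcessEquipment": None,
    "pdXi_Vessel": "pdXi_ProcessEquipment",  # placeholder subclass name
}
integrated = integrate_as_subordinate(
    clip, pdxi, root="pdXi_ProcessEquipment", parent="Equipment"
)
```

After the call, every pdXi concept is also a PlantItem, while the pdXi taxonomy itself is untouched. The check for overlapping concepts is a crude stand-in for the redundancy problem that arises when a second taxonomy is attached to the same concept.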

3.4. Meta integration

The three integration mechanisms introduced above work on the simple class level of our modeling framework. At this level a rich semantic description of the individual concepts is provided in order to formalize design knowledge. By abstracting from these semantic details on the meta level of the data model, a conceptual integration of various existing data models can be enabled (Bayer, Schneider & Marquardt, 1999), as shown in Fig. 6.

Any technical system has some realization, indicated by the concept of the TechnicalSystemRealization. For a ProcessingSubsystem this realization is specified by PlantItems and subordinate concepts. In pdXi, the taxonomy with the root class (pdXi_)PlantItem also describes physical objects, which are the parts of a chemical plant. These physical objects represent the realization of a chemical process, which is a special technical system. Therefore, not only PlantItem is an instance of TechnicalSystemRealization, but also pdXi_PlantItem and its subclasses.

This instantiation of an abstract concept into an existing modeling schema is similar to the integration by subclassing; an entire taxonomy can be integrated into the model without changes. But instantiation is more flexible than subclassing. Modeling concepts from several existing data models can be instances of one abstract meta class and are thus related to each other. Fig. 6 shows that the taxonomies starting with PlantItem and pdXi_PlantItem describe the same real world objects - physical objects that are part of the realization of a chemical process system. Some meta modeling languages provide mechanisms to formulate constraints and rules that can further specify the relations between the different instantiations. Those can be used to refine the relations between the different taxonomies similar to bridges introduced in Section 3.1.
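The difference between subclassing and instantiation of a meta class can be made concrete with Python metaclasses, where a class is itself an instance of its metaclass. The sketch below is only an analogy to the ConceptBase model; the registry attribute and the taxonomy_roots helper are assumptions added for illustration.

```python
class TechnicalSystemRealization(type):
    """Meta class: its instances are classes describing realizations."""
    registry = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry.append(cls)  # remember every instantiated concept
        return cls


# Taxonomy of our model (simple class level).
class PlantItem(metaclass=TechnicalSystemRealization):
    pass


class Equipment(PlantItem):
    pass


# pdXi taxonomy, integrated without any link to PlantItem.
class pdXi_PlantItem(metaclass=TechnicalSystemRealization):
    pass


class pdXi_ProcessEquipment(pdXi_PlantItem):
    pass


def taxonomy_roots():
    """Root classes of all taxonomies instantiated from the meta class."""
    return [
        cls
        for cls in TechnicalSystemRealization.registry
        if not any(isinstance(base, TechnicalSystemRealization) for base in cls.__bases__)
    ]
```

Both taxonomies are related through the meta class only: pdXi_ProcessEquipment is an instance of TechnicalSystemRealization, yet it is not a subclass of PlantItem. Several taxonomies can therefore coexist under one meta class, which is not possible with the specialization link of Section 3.3.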


4. A proposed architecture for tool integration

For the realization of a software environment with tight tool integration providing data exchange and data sharing, some of the introduced integration mechanisms can be combined in an efficient manner. When existing tools are going to be integrated, their data models cannot be deleted or even changed. Therefore, transformations are always needed for them. Since the number of transformations should be minimized, a global schema should be established onto which all tool models are mapped for integration. There are different possibilities for the development of such a global schema. It may be obtained by merging the native data models of the tools to be integrated. However, when a new tool has to be integrated into the environment, the merging and the transformation definitions have to be repeated. Therefore, the global data model should be defined independently from the tools to be integrated. If parts of this schema are on a meta level, a classification of the modeling concepts used by the software tools by means of instantiation is enabled, which eases the definition of transformations.

This leads us to the following architecture for an integrated conceptual information model: a global data model exists covering several layers of abstraction on different meta levels. The simple class level has the same degree of detail as the data models in the integrated tools. There are transformations defined between the concepts of the global data model and the native tool models, similar to enriched bridges (Section 3.1). On the meta level, the relations between all modeling concepts are described: the ones in the global model (Section 3.4) as well as the ones in the tool models and the relations between them. When there is the need for data exchange between two tools, two transformations on the simple class level can be performed, from the first tool schema via the global schema to the second tool schema. When some data is needed that is stored in different tools, the meta model can provide the information where the data can be obtained from and how it has to be transformed for that specific use. With these two different features, our integration architecture provides additional functionality as compared with a neutral exchange format like STEP; it qualifies to serve as the conceptual basis for mediation in heterogeneous platforms to support the chemical process design process.
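The data exchange path of this architecture, from the first tool schema via the global schema to the second tool schema, can be sketched as a small mediator. The Python code below is a minimal illustration; the tool names, the record layouts, and the unit conversion are invented for the example and do not describe any existing tool.

```python
class Mediator:
    """Exchanges data between tools through the global schema."""

    def __init__(self):
        self._to_global = {}
        self._from_global = {}

    def register(self, tool, to_global, from_global):
        """Each tool contributes exactly two transformations."""
        self._to_global[tool] = to_global
        self._from_global[tool] = from_global

    def exchange(self, source, target, record):
        """Tool schema -> global schema -> tool schema."""
        global_record = self._to_global[source](record)
        return self._from_global[target](global_record)


mediator = Mediator()

# Hypothetical simulator: vessel volume stored in litres under "V".
mediator.register(
    "simulator",
    to_global=lambda r: {"name": r["tag"], "volume_m3": r["V"] / 1000.0},
    from_global=lambda g: {"tag": g["name"], "V": g["volume_m3"] * 1000.0},
)

# Hypothetical CAD tool: vessel volume stored in cubic metres under "VOLUME".
mediator.register(
    "cad",
    to_global=lambda r: {"name": r["TAG"], "volume_m3": r["VOLUME"]},
    from_global=lambda g: {"TAG": g["name"], "VOLUME": g["volume_m3"]},
)

cad_record = mediator.exchange("simulator", "cad", {"tag": "B-1", "V": 2500.0})
```

With n integrated tools this design needs 2n transformations, whereas direct pairwise exchange would need n(n-1) directed transformations. A newly added tool only has to register its own pair.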

5. Conclusions

For an effective support of the design lifecycle in chemical engineering, an integration of existing software tools and support systems that are currently used is inevitable. Since all computer-aided tools are based on some data models, an integration of these models is needed. This study presents some mechanisms for the integration of the existing model schemata. A global schema proved to be advantageous for all of these integration mechanisms. We developed the modeling framework CLiP from scratch that can serve as the global model for the integration of product data models. It is based on general systems theory in order to obtain a model as general and conceptual as possible. CLiP covers several meta levels of abstraction that can be used for conceptual integration.

The framework forms a central part of the specification of an integrated design environment that is currently being realized as a prototype within the Collaborative Research Center (CRC) IMPROVE at RWTH Aachen. Future work will focus on a partial implementation of the framework in order to demonstrate its capability for the integration of product data stored in the existing application tools.

Acknowledgements

This work is supported by the DFG, Deutsche Forschungsgemeinschaft, in the CRC 476.

References

Batini, C., Lenzerini, M., & Navathe, S. B. (1986). A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18, 323-364.

Batres, R., & Naka, Y. (1999). Process plant ontologies based on a multi-dimensional framework. 5th International Conference on Foundations of Computer-Aided Process Design. Breckenridge, Colorado.

Bayer, B., Schneider, R., & Marquardt, W. (1999). Product data modeling for chemical process design. In U. F. Baake, & R. N. Zobel, From product design to product marketing, society for computer simulation (pp. 139-143). Delft: Delft University Press.

Bunge, M. (1979). Treatise on basic philosophy ontology II: A world of systems, vol. 4. Dordrecht: Reidel.

Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., & Rosati, R. (1999). Source integration. In: Jarke, M., Lenzerini, M., Vassiliou, Y., & Vassiliadis, P. (1999), 27-45.

Fowler, M., & Scott, K. (1997). UML Distilled. Reading: Addison Wesley.

ISO 10303 (1998). Process engineering data: process design and process specifications of major equipment. Part 231, ISO TC184/SC4/WG3 N740.

Jarke, M., Gallersdörfer, R., Jeusfeld, M. A., Staudt, M., & Eherer, S. (1995). ConceptBase - a deductive object base for meta data management. Journal of Intelligent Information Systems, 4, 167-192.

Jarke, M., Lenzerini, M., Vassiliou, Y., & Vassiliadis, P. (1999). Fundamentals of data warehouses. Berlin: Springer.

Lefering, M., & Schürr, A. (1996). Specification of integration tools. In M. Nagl, Building tightly integrated software development environments (pp. 324-334). Berlin: Springer.

Mariño, O., Rechenmann, F., & Uvieta, P. (1990). Multiple perspectives and classification mechanism in object-oriented representation. 9th European Conference on Artificial Intelligence, Sweden.


Marquardt, W., & Nagl, M. (1998). Tool integration via interface standardization. DECHEMA-Monographie, 135, 95-126.

Marquardt, W., von Wedel, L., & Bayer, B. (1999). Perspectives on lifecycle process modeling. 5th International Conference on Foundations of Computer-Aided Process Design, Breckenridge, Colorado.

McKay, A., Bloor, M. S., & de Pennington, A. (1996). A framework for product data. IEEE Transactions on Knowledge and Data Engineering, 8, 825-838.

Schuler, H. (1998). Prozeßführung. Chemie-Ingenieur-Technik, 70, 1249-1264.

Westerberg, A., & n-dim Group (1997). Designing the process design process. Computers and Chemical Engineering, 21, S1-S9.
