

The Structure of Inverses in Schema Mappings

Ronald Fagin and Alan Nash
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120

Contact email: [email protected]

To appear: J. ACM

Abstract

A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). The notion of an inverse of a schema mapping is subtle, because a schema mapping may associate many target instances with each source instance, and many source instances with each target instance. In PODS 2006, Fagin defined a notion of the inverse of a schema mapping. This notion is tailored to the types of schema mappings that commonly arise in practice (those specified by “source-to-target tuple-generating dependencies”, or s-t tgds). We resolve the key open problem of the complexity of deciding whether there is an inverse. We also explore a number of interesting questions, including: What is the structure of an inverse? When is the inverse unique? How many non-equivalent inverses can there be? When does an inverse have an inverse? How big must an inverse be? Surprisingly, these questions are all interrelated. We show that for schema mappings M specified by full s-t tgds (those with no existential quantifiers), if M has an inverse, then it has a polynomial-size inverse of a particularly nice form, and there is a polynomial-time algorithm for generating it. We introduce the notion of “essential conjunctions” (or “essential atoms” in the full case), and show that they play a crucial role in the study of inverses. We use them to give greatly simplified proofs of some known results about inverses. What emerges is a much deeper understanding of this fundamental and complex operator.

Categories and Subject Descriptors: H.2.5 [Heterogeneous Databases]: Data translation; H.2.4 [Systems]: Relational databases

General Terms: Algorithms, theory

Additional Keywords and Phrases: Schema mapping, data exchange, data integration, model management, inverse, chase, dependencies, essential conjunction, essential atom

1 Introduction

Schema mappings are high-level specifications that describe the relationship between two database schemas. A schema mapping is defined to be a triple M = (S, T, Σ), where S (the source schema) and T (the target schema) are sequences of distinct relation symbols with no relation symbols in common, and Σ is a set of database dependencies that specify the association between source instances and target instances. If I is a source instance (an instance of the schema S), and J is a target instance (an instance of the schema T), then we say that J is a solution for I if the pair (I, J) together satisfies Σ. We sometimes identify the schema mapping M with the set of pairs (I, J) that satisfy Σ, and write (I, J) ∈ M. The most widely studied case arises when Σ is a finite set of source-to-target tuple-generating dependencies (s-t tgds). We refer to a schema mapping specified by a finite set of s-t tgds as an s-t tgd mapping. These mappings have also been used in data integration scenarios under the name of GLAV (global-and-local-as-view) assertions [Len02]. Our main focus in this article is on inverses for s-t tgd mappings.

1.1 Motivation and History

Since schema mappings form the essential building blocks of such crucial data inter-operability tasks as data exchange and data integration (see the surveys [Kol05, Len02]), several different operators on schema mappings have been singled out as deserving study in their own right [Ber03]. The composition operator and the inverse operator have emerged as two of the most fundamental operators on schema mappings. The composition operator is important, because we wish to understand what schema mapping is obtained by first applying one schema mapping and then another schema mapping. The inverse operator is important, because we wish to know how to “undo” the effects of a schema mapping (intuitively, to go back and retrieve the original data). Both composition and inversion arise naturally in the study of schema evolution [BM07].

The composition operator has been investigated in depth [FKPT05, MH03, Mel04, NBM07]; however, progress on the study of the inverse operator was not made until recently. Even finding the exact semantics of this operator is a delicate task, since unlike the traditional use of the name “mapping”, a schema mapping is not simply a function that maps an instance of the source schema to an instance of the target schema. Instead, for each source instance, the schema mapping may associate many target instances. Furthermore, for each target instance, there may be many corresponding source instances.

Fagin [Fag07] gave a formal definition of what it means for a schema mapping M′ to be an inverse of a schema mapping M. The intuition behind his approach is that the result of applying first M and then M′ (that is, the composition M ◦ M′) should be the identity mapping. In Fagin’s (and our) case of special interest, where M is an s-t tgd mapping, there is a complication with this intuition. Fagin showed that if M is an s-t tgd mapping, then there is no schema mapping M′ (s-t tgd mapping or otherwise) such that M ◦ M′ equals the standard identity mapping (which, intuitively, is the mapping that consists precisely of the pairs (I, J) where J = I). He showed that in a precise sense, the closest that M ◦ M′ can come to the standard identity mapping is for M ◦ M′ to be the copy mapping, which is specified by s-t tgds that “copy” the source instance to the target instance (we shall define the copy mapping formally later). Therefore, he defined M′ to be an inverse of M if M ◦ M′ is the copy mapping. He showed how to construct an inverse of an s-t tgd mapping that is itself an s-t tgd mapping when such an inverse exists. He also developed a number of tools for the study of inverses of s-t tgd mappings. He showed that deciding invertibility of an s-t tgd mapping is coNP-hard, and left open the question as to whether it is even decidable. We give a matching coNP upper bound, which shows that deciding invertibility is coNP-complete. Since Fagin’s coNP-hardness lower bound holds even when the s-t tgds that specify the schema mapping are full (have no existential quantifiers), it follows that deciding invertibility in the full case is also coNP-complete.

Fagin, Kolaitis, Popa and Tan [FKPT08] observed that most schema mappings that arise in practice do not have an inverse. Therefore, they introduced and studied a principled relaxation of the notion of an inverse of a schema mapping, which they called a quasi-inverse. Intuitively, it is obtained from the notion of an inverse by not differentiating between instances I and I′ that have the same set of solutions, and so are equivalent for data-exchange purposes. We shall not discuss the formal details of quasi-inverses here. Instead, we simply note that they showed that if M is an s-t tgd mapping, then every inverse of M is a quasi-inverse of M, but there are s-t tgd mappings with a quasi-inverse but no inverse.

Arenas, Perez, and Riveros [APR08] defined another relaxation of the notion of inverse. If M and M′ are schema mappings, they say that M′ is a recovery of M if (I, I) ∈ M ◦ M′ for every source instance I. They say that M′ is a maximum recovery of M if M′ is a recovery of M, and if M ◦ M′ ⊆ M ◦ M′′ for every recovery M′′ of M. Intuitively, M′ is a maximum recovery of M if M ◦ M′ comes as close as possible to being the standard identity mapping. Arenas, Perez, and Riveros showed that if M is an s-t tgd mapping, then every inverse of M is a maximum recovery of M. They also showed the key result that every s-t tgd mapping has a maximum recovery (which contrasts with the fact that there are s-t tgd mappings with no inverse).

Although we did not explicitly say this earlier, all of the work we just described (on inverses, quasi-inverses, and maximum recoveries) was based on the assumption that we restrict our attention to source instances that are ground (contain no null values). Fagin, Kolaitis, Popa and Tan [FKPT09] defined new notions of inverse and of recovery, called extended inverses and extended recoveries (the latter gives a corresponding notion of a maximum extended recovery), that they argue are more appropriate notions to use when the source instances may contain null values. They showed that every s-t tgd mapping that is extended invertible is invertible, but not conversely. Intuitively, it is harder for a schema mapping to be extended invertible, since there are more instances (i.e., non-ground source instances) to consider. Fagin, Kolaitis, Popa and Tan showed that when source instances may contain nulls, then there are s-t tgd mappings that do not have a maximum recovery, under the definition of maximum recovery by Arenas, Perez, and Riveros in [APR08], but that every s-t tgd mapping has a maximum extended recovery. In this article, we allow only ground source instances (in fact, as we shall explain shortly, many of the issues considered in this article are natural only under the assumption that source instances are restricted to being ground).

When it comes to “flavors of inverses” (that is, Fagin’s notion of inverse, Fagin, Kolaitis, Popa and Tan’s notions of quasi-inverse and of maximum extended recovery, and Arenas, Perez, and Riveros’s notion of maximum recovery), Fagin’s notion of inverse is the gold standard. It is the hardest to attain, and it is automatically an inverse in all of the other flavors. This is why we feel that it is worthwhile to investigate the structure of inverses, which we do in this article.

1.2 The Language of Inverses

For s-t tgd mappings, Fagin [Fag07] focused only on inverses specified by tgds, and left open the problem of characterizing the language needed to express inverses of s-t tgd mappings. Fagin, Kolaitis, Popa and Tan [FKPT08] resolved this problem. Specifically, they gave an algorithm for constructing a canonical candidate inverse for an s-t tgd mapping. It is specified by using what they called s-t tgds with constants and inequalities. These are like s-t tgds, but there may also be formulas const(x) (which say that x is a constant) and inequalities in the premise. They showed that if an s-t tgd mapping is invertible, then its canonical candidate inverse is indeed an inverse.

We define normal inverses, which are specified by special cases of s-t tgds with constants and inequalities. The canonical candidate inverse is a normal inverse. Hence, if an s-t tgd mapping has an inverse, then it has a normal inverse. It is not hard to see that this would not be true if we were to allow non-ground source instances. This is why we consider only ground source instances, in the tradition of the earlier articles [APR08, Fag07, FKPT08]. Normal inverses are especially nice, in that if I is a source instance, M is an s-t tgd mapping specified by Σ, and M′ is a normal inverse of M that is specified by Σ′, then the result of chasing I with Σ and then chasing the result with Σ′ gives back exactly I (this is not true of arbitrary inverses, even where the chase is well-defined).

We focus our study mainly on normal inverses.

1.3 Our Contributions

In addition to our result mentioned earlier, where we resolve the complexity of deciding whether an s-t tgd mapping is invertible, we obtain a number of other new results about inverses, which we now discuss.

Essential conjunctions and atoms. We introduce the notions of essential conjunctions (and, in the full case, of essential atoms), which turn out to play a fundamental role in the study of inverses. Roughly speaking, an essential conjunction for a relational atom A (with respect to an s-t tgd mapping) is a conjunction such that (a) the atoms in the conjunction arise in the chase of A, and (b) if all of these atoms arise together in a chase, then A is present in the source. We show that an s-t tgd mapping is invertible if and only if each atom has an essential conjunction (in the full case, if and only if each atom has an essential atom). Further, we show how to construct a normal inverse directly from the essential conjunctions.


Unique inverses. For most notions of “inverse” that arise in mathematics, if there is an inverse, then it is unique. However, as we show, no schema mapping has a unique inverse. What about a unique normal inverse? This is possible, and we give a characterization of those s-t tgd mappings with a unique normal inverse.

In the full case (where the s-t tgds have no existential quantifiers) there is an especially interesting story. Let us say that a full s-t tgd mapping M = (S, T, Σ) is onto if every target instance is the result of chasing some source instance with Σ. We show that if a full s-t tgd mapping is invertible and onto, then it has a unique normal inverse. What about the converse? We show that the converse fails. What if we enrich the language of possible inverses? Following [FKPT08], we define disjunctive tgds with inequalities by allowing inequalities in the premise and disjunctions in the conclusion (such mappings were shown to be necessary to express quasi-inverses of full s-t tgd mappings in [FKPT08]). We show that a full s-t tgd mapping M has a unique inverse specified by disjunctive tgds with inequalities if and only if M is invertible and onto. Furthermore, we show that M satisfies these conditions if and only if M is equivalent to a slight generalization of the copy mapping, called a p-copy mapping.

Inverse of an inverse. For most notions of “inverse” that arise in mathematics, if M′ is an inverse of M, then M is an inverse of M′. This is because M′ is a right-inverse of M (that is, M ◦ M′ is the identity) if and only if M is a left-inverse of M′, and most notions of inverse that arise in mathematics are 2-sided (every right-inverse is a left-inverse, and vice versa). An inverse of a schema mapping, as defined by Fagin in [Fag07], is only a right-inverse. In particular, it does not follow that if M′ is an inverse of a schema mapping M, then M is an inverse of M′. In fact, it does not even follow that if M′ is an inverse of a schema mapping, then M′ is invertible. Surprisingly, it turns out to be rare that a normal inverse of an s-t tgd mapping is itself invertible. We show that M is a full s-t tgd mapping with an invertible normal inverse if and only if M is again equivalent to a p-copy mapping. By combining this result with our results about unique inverses, we obtain the unexpected result that several nice properties of a full s-t tgd mapping are equivalent: M has an invertible normal inverse if and only if M has a unique inverse specified by disjunctive tgds with inequalities if and only if M is invertible and onto if and only if M is equivalent to a p-copy mapping. We also show that these nice properties are not equivalent if we remove the restriction that M be full.¹

The size of an inverse, and the complexity of computing an inverse. How big does a normal inverse need to be? We show that there is a family of full, invertible s-t tgd mappings M such that the size of the smallest normal inverse of M is exponential in the size of M. Therefore, we broaden the class of normal mappings by allowing not just inequalities but also Boolean combinations of equalities in the premises, and we call these mappings Boolean normal. Allowing Boolean normal mappings does not increase the expressive power of normal mappings, but allows a more compact representation. Indeed, we show that every invertible full s-t tgd mapping has a Boolean normal inverse of polynomial size. In fact, we give a polynomial-time algorithm for generating this Boolean normal inverse.

Is there a relationship between the number of normal inverses and the size of the minimal Boolean normal inverse? We cannot bound the number of normal inverses in terms of the size of the minimal Boolean normal inverse, since there are examples with an infinite number of inequivalent normal inverses. However, we show that if there are only a small number of inequivalent normal inverses, then the minimal number of constraints in a Boolean normal inverse is small. Specifically, we show that if M is a full s-t tgd mapping, with k source relation symbols and with exactly m ≥ 1 inequivalent normal inverses, then M has a Boolean normal inverse with at most k + log₂(m) constraints.

Simpler proofs of known results. We give greatly simplified proofs of two results whose previous proofs were quite complex. First, we give a simple proof of the result in [FKPT08] that for invertible s-t tgd mappings M, the canonical candidate inverse of M is indeed an inverse of M. The second result concerns the unique-solutions property, introduced by Fagin [Fag07], which says that no two distinct source instances have the same set of solutions. He showed that the unique-solutions property is a necessary condition for a schema mapping to have an inverse. He gave a complicated proof that for LAV mappings (those specified by s-t tgds with a singleton premise), the unique-solutions property is not only a necessary condition but also a sufficient condition for invertibility. We give a simple proof of this result.

¹ We do not define the property of being onto when M is not full. In fact, we remark that it is not completely clear what the “right” definition of onto should be in the nonfull case.


2 Preliminaries

In this section, we give a number of definitions. Most of these definitions are from [FKMP05], to which we refer the reader for an introduction to data exchange.

Schemas and Schema Mappings. A schema R is a finite sequence (R1, …, Rk) of relation symbols, each of a fixed arity. An instance I over R (which we may call an R-instance, or simply an instance, when R is understood) is a sequence (R1^I, …, Rk^I), where each Ri^I is a finite relation of the same arity as Ri. We shall often use Ri to denote both the relation symbol and the relation Ri^I that interprets it.

A schema mapping is a triple M = (S, T, Σ) consisting of a source schema S, a target schema T, and a set Σ of constraints (defined shortly). We say that M is specified by Σ. If Σ is a finite set of s-t tgds (defined shortly), then we may refer to M as an s-t tgd mapping. When S and T are clear from context, we will sometimes say Σ when we should say (S, T, Σ), and talk about a set of constraints when we should talk about a schema mapping.

Instances and Formulas. From now on, we assume that S and T are two fixed schemas. We call S the source schema and T the target schema. We refer to S-instances as source instances, and T-instances as target instances. Let C be a fixed countably infinite set of constants and let N be a fixed countably infinite set of nulls that is disjoint from C. We assume that all source instances have individual values (that is, individual entries of tuples) from the set C of constants only, while all target instances have individual values from C ∪ N. We may sometimes refer to S-instances as ground instances to emphasize the fact that all individual values in such instances are constants. Intuitively, schema mappings of the form M = (S, T, Σ) model the situation in which we perform data exchange from S to T: the individual values of source instances are known, while incomplete information in the specification of data exchange may give rise to null values in the target instances. We write dom(I) for the (active) domain of an instance I, that is, the set of all individual values that appear in I.

If P is an m-ary relation symbol in S, and x1, …, xm are variables, not necessarily distinct, then P(x1, …, xm) is a relational atom, or simply atom (over S). We may refer to it as a P-atom. In the context of a schema mapping M = (S, T, Σ), we may refer to a P-atom where P is in S as a source atom, and a P-atom where P is in T as a target atom. If P is an m-ary relation symbol in S, and c1, …, cm are values (constants or nulls), not necessarily distinct, then P(c1, …, cm) is a fact (over S). We may refer to it as a P-fact. We sometimes identify an instance with its set of facts. If I1 and I2 are instances, and the set of facts of I1 is a subset of the set of facts of I2, then we may write I1 ⊆ I2, and say that I1 is a subinstance of I2.

There is a special unary relation symbol const (that is not part of any schema). We refer to the formula const(x) for a variable x as a const formula; the intended interpretation of const is that const(x) should hold precisely if x is a constant.

If δ is a conjunction of relational atoms (but no const formulas), then we define I_δ to be an instance obtained from δ as follows. For each variable v, assign a distinct fixed constant c_v, and let the facts of I_δ consist of the facts P(c_v1, …, c_vk) where P(v1, …, vk) is an atom in δ. For example, if δ is P(x, y) ∧ Q(y), then I_δ is the instance {P(c_x, c_y), Q(c_y)}. If δ is a conjunction of relational atoms and at least one const formula, then we define I_δ as follows. For each variable v such that const(v) is in δ, assign a distinct fixed constant c_v, and for each remaining variable v assign a distinct fixed null n_v. Define I_δ to be the set of facts that result by taking each relational atom in δ and doing a replacement using the assignment we just described. For example, if δ is P(x, y) ∧ Q(y) ∧ const(x), then I_δ is the instance {P(c_x, n_y), Q(n_y)}. It is sometimes convenient to allow δ to contain also inequalities x ≠ y, where x and y are distinct variables among the variables appearing in relational atoms of δ. In that case, we simply ignore the inequalities in defining I_δ. Note that if δ and δ′ contain the same relational atoms, and δ has no const formula, while δ′ contains the formula const(x) for every variable x in the relational atoms of δ′, then I_δ and I_δ′ are the same, and contain only constants (no nulls). Throughout this paper, we reserve the symbol δ (possibly with subscripts or primes) to be a conjunction of relational atoms, const formulas const(x) for variables x, and inequalities x ≠ y for distinct variables x and y.
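The construction of the canonical instance from a conjunction δ can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration: an atom is a pair of a relation name and a tuple of variable names, the constant assigned to variable v is the string "c_v", the null assigned to v is the string "n_v", and inequalities are simply not represented because they are ignored by the definition. The name `canonical_instance` is hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation symbol, variables); illustrative encoding
Fact = Tuple[str, Tuple[str, ...]]  # (relation symbol, values)


def canonical_instance(
    atoms: Iterable[Atom], const_vars: Optional[Set[str]] = None
) -> FrozenSet[Fact]:
    """Build the canonical instance of a conjunction of relational atoms.

    const_vars is None : the conjunction has no const formula, so every
                         variable v is replaced by the constant "c_v".
    const_vars given   : (assumed non-empty, as in the definition) variables
                         with a const formula become "c_v", all others "n_v".
    """
    def value(v: str) -> str:
        if const_vars is None or v in const_vars:
            return "c_" + v
        return "n_" + v

    return frozenset((rel, tuple(value(v) for v in args)) for rel, args in atoms)


# The two examples from the text: P(x, y) AND Q(y), without and with const(x).
delta = [("P", ("x", "y")), ("Q", ("y",))]
ground = canonical_instance(delta)
with_null = canonical_instance(delta, {"x"})
```

Here `ground` contains only constants, while `with_null` replaces y by a null because only x carries a const formula.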

A renaming of variables is a one-to-one function that maps variables to variables. A weak renaming of variables is a function (not necessarily one-to-one) that maps variables to variables. We may sometimes refer to a renaming as a strict renaming, to distinguish it from a weak renaming.


Define a prime atom to be one that contains precisely the variables x1, x2, …, xk for some k, and where the initial appearance of xi precedes the initial appearance of xj if i < j. For example, P(x1, x2, x1, x3, x2) is a prime atom, but Q(x2, x1) and R(x2, x3) are not. Note that for every relational atom, there is a unique renaming of variables to obtain a prime atom.
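The unique renaming to a prime atom can be computed by numbering variables in order of first appearance. The following is a small sketch under the assumption that an atom is given as a relation name plus a tuple of variable names; the function names `to_prime` and `is_prime` are hypothetical.

```python
from typing import Dict, Tuple


def to_prime(rel: str, args: Tuple[str, ...]) -> Tuple[str, Tuple[str, ...]]:
    """Rename variables to x1, x2, ... in order of their first appearance."""
    renaming: Dict[str, str] = {}
    for v in args:
        if v not in renaming:
            renaming[v] = f"x{len(renaming) + 1}"
    return rel, tuple(renaming[v] for v in args)


def is_prime(rel: str, args: Tuple[str, ...]) -> bool:
    """An atom is prime exactly when the renaming above leaves it unchanged."""
    return to_prime(rel, args) == (rel, tuple(args))
```

On the examples in the text, `is_prime` accepts P(x1, x2, x1, x3, x2) and rejects Q(x2, x1) and R(x2, x3); `to_prime` maps the latter two to Q(x1, x2) and R(x1, x2).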

Constraints. All sets of constraints we consider are finite, unless otherwise specified. We consider constraints of several forms. A source-to-target tuple-generating dependency (s-t tgd) is a constraint of the form ∀x∀y(α(x, y) → ∃z β(x, z)), where α is a conjunction of source atoms and β is a conjunction of target atoms. The variables in x are exactly the variables that appear in both α and β. Note in particular that this gives us the safety condition that every variable in β that is not existentially quantified appears also in α. We will generally suppress writing the ∀x∀y part. If z is empty, we say that the s-t tgd is full. We use the standard notion of satisfaction of constraints, denoted |=. For example, if I is a source instance, J is a target instance, and Σ is a set of s-t tgds, then we may write (I, J) |= Σ to mean that the pair (I, J) satisfies the members of Σ.
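To make the satisfaction relation concrete, here is a brute-force sketch of checking (I, J) |= Σ for a set of s-t tgds. The encoding is an assumption for illustration only: an s-t tgd is a pair (premise atoms, conclusion atoms), conclusion variables that do not occur in the premise are treated as existentially quantified, and the helper names `matches` and `satisfies` are hypothetical. No attempt is made at efficiency.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (relation symbol, variables)
Fact = Tuple[str, Tuple[str, ...]]   # (relation symbol, values)
Tgd = Tuple[List[Atom], List[Atom]]  # (premise alpha, conclusion beta)


def matches(
    atoms: List[Atom], instance: Set[Fact], binding: Dict[str, str]
) -> Iterator[Dict[str, str]]:
    """Yield every extension of `binding` mapping all `atoms` to facts of `instance`."""
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(args):
            continue
        extended = dict(binding)
        # setdefault binds a new variable or returns its existing value.
        if all(extended.setdefault(v, val) == val for v, val in zip(args, values)):
            yield from matches(rest, instance, extended)


def satisfies(I: Set[Fact], J: Set[Fact], tgds: List[Tgd]) -> bool:
    """Check (I, J) |= Sigma: every premise match in I extends to a conclusion match in J."""
    for premise, conclusion in tgds:
        for binding in matches(premise, I, {}):
            # Keep only the variables shared with the conclusion (the tuple x).
            shared = {v: binding[v] for _, args in conclusion for v in args if v in binding}
            if next(matches(conclusion, J, shared), None) is None:
                return False
    return True
```

For the tgd P(x, y) → ∃z Q(x, z) and I = {P(a, b)}, the target instance {Q(a, n_1)} is accepted, while the empty instance and {Q(b, c)} are rejected.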

Homomorphisms. Let J, J′ be two instances. A function h that maps values to values is a homomorphism from J to J′ if for every constant c, we have that h(c) = c, and for every relation symbol R and each tuple (a1, …, an) ∈ R^J, we have that (h(a1), …, h(an)) ∈ R^J′. We write J → J′ if there is a homomorphism from J to J′. The instances J and J′ are said to be homomorphically equivalent if there are homomorphisms from J to J′ and from J′ to J. We then write J ↔ J′.
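A homomorphism between two small instances can be found by backtracking search. The sketch below assumes, purely for illustration, that instances are sets of (relation, values) pairs and that nulls are exactly the strings beginning with "n_"; every other value is treated as a constant and must be mapped to itself. The function names are hypothetical, and the search is exponential in the worst case.

```python
from typing import Dict, Optional, Set, Tuple

Fact = Tuple[str, Tuple[str, ...]]  # (relation symbol, values)


def is_null(value: str) -> bool:
    # Assumption of this sketch: nulls are the strings prefixed with "n_".
    return value.startswith("n_")


def find_homomorphism(source: Set[Fact], target: Set[Fact]) -> Optional[Dict[str, str]]:
    """Return a mapping on the nulls of `source` witnessing source -> target, or None."""
    facts = sorted(source)

    def extend(index: int, h: Dict[str, str]) -> Optional[Dict[str, str]]:
        if index == len(facts):
            return h
        rel, values = facts[index]
        for target_rel, target_values in target:
            if target_rel != rel or len(target_values) != len(values):
                continue
            candidate = dict(h)
            consistent = True
            for a, b in zip(values, target_values):
                # Constants are fixed; nulls are bound on first use.
                image = candidate.setdefault(a, b) if is_null(a) else a
                if image != b:
                    consistent = False
                    break
            if consistent:
                found = extend(index + 1, candidate)
                if found is not None:
                    return found
        return None

    return extend(0, {})


def homomorphically_equivalent(J: Set[Fact], K: Set[Fact]) -> bool:
    return find_homomorphism(J, K) is not None and find_homomorphism(K, J) is not None
```

Note that the result must be compared against `None` rather than tested for truthiness, since the empty mapping is a valid homomorphism for instances without nulls.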

Solutions and Universal Solutions. Let M = (S, T, Σ) be a schema mapping. Then J is a solution for I (under M) if (I, J) |= Σ. We write Sol(M, I) to denote the solutions for I under M. We say that a solution U for the source instance I is a universal solution [FKMP05] if U → J for every solution J for I.

Composition and Inverse. We recall the concept of the composition of two schema mappings, introduced in [FKPT05, Mel04], and the concept of an inverse of a schema mapping, introduced in [Fag07].

Let M12 = (S1, S2, Σ12) and M23 = (S2, S3, Σ23) be schema mappings. The composition M12 ◦ M23 is a schema mapping (S1, S3, Σ13) such that for every S1-instance I and every S3-instance J, we have that (I, J) |= Σ13 if and only if there is an S2-instance K such that (I, K) |= Σ12 and (K, J) |= Σ23. As noted in [FKPT05], the composition is unique. When the schemas are understood from the context, we will often write Σ12 ◦ Σ23 for the composition M12 ◦ M23.

Let Ŝ be a replica of the source schema S; that is, for every relation symbol R of S, the schema Ŝ contains a relation symbol R̂ that is not in S and has the same arity as R. We also assume that the replicas of distinct relation symbols of S are distinct. If A is a relational atom R(x1, …, xk), then Â is the relational atom R̂(x1, …, xk). Similarly, if F is a fact R(c1, …, ck), then F̂ is the fact R̂(c1, …, ck). If I is an instance of S1, define Î to be the corresponding instance of Ŝ1. Thus, Î consists precisely of the facts F̂ such that F is a fact of I.

The copy mapping is the schema mapping Id = (S, Ŝ, ΣId), where ΣId consists of the s-t tgds R(x1, …, xk) → R̂(x1, …, xk), where R is k-ary, and where R ranges over the relation symbols in S. Thus, if I1 is an S-instance and I2 is an Ŝ-instance, then (I1, I2) |= ΣId if and only if Î1 ⊆ I2.

Let M12 = (S1, S2, Σ12) be a schema mapping. We say that a schema mapping M21 = (S2, Ŝ1, Σ21) is an inverse of M12 if M12 ◦ M21 = Id, that is, the result of composing M12 with M21 is the copy mapping. So M21 is an inverse of M12 precisely if for every ground S1-instance I and every ground Ŝ1-instance J, we have that (I, J) |= Σ12 ◦ Σ21 if and only if Î ⊆ J. If M12 has an inverse, then we say that M12 is invertible.
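The copy mapping, against which inverses are measured, is easy to state operationally: a pair of instances satisfies it exactly when the replica of the first is contained in the second. The sketch below is illustrative only. It assumes instances are sets of (relation, values) pairs and models the replica of a relation symbol by appending the hypothetical suffix "_hat" to its name.

```python
from typing import FrozenSet, Iterable, Tuple

Fact = Tuple[str, Tuple[str, ...]]  # (relation symbol, values)


def replica(instance: Iterable[Fact]) -> FrozenSet[Fact]:
    """Rename every relation symbol R to its replica, modeled here as R + "_hat"."""
    return frozenset((rel + "_hat", values) for rel, values in instance)


def satisfies_copy_mapping(I1: Iterable[Fact], I2: Iterable[Fact]) -> bool:
    """(I1, I2) satisfies the copy mapping iff the replica of I1 is a subinstance of I2."""
    return replica(I1) <= frozenset(I2)


source = {("P", ("a", "b"))}
exact = {("P_hat", ("a", "b"))}
larger = {("P_hat", ("a", "b")), ("P_hat", ("c", "d"))}
```

Both `exact` and `larger` satisfy the copy mapping together with `source`, which shows concretely why the copy mapping is weaker than the standard identity mapping: additional target facts are permitted.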

Chasing. If M12 = (S1, S2, Σ12) is an s-t tgd mapping, then chasing, or applying the chase process [MMS79] to the S1-instance I with Σ12, produces an S2-instance U such that U is a universal solution for I under M12 [FKMP05]. We may write U = chase12(I), and say that U is the result of the chase. For definiteness, we use the version of the chase as defined in [FKPT05], although it does not really matter, since whatever version of the chase we use, the results are all homomorphically equivalent. Similarly, we may write chase21(I) for the result of chasing an S2-instance I with Σ21. We shall also extend this notation to cases where Σ12 or Σ21 are not simply sets of s-t tgds, but where we also allow const formulas and inequalities in the premises.
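As an illustration of chasing with s-t tgds, here is a minimal sketch of a naive ("oblivious") chase, which fires every tgd once for every match of its premise and invents a fresh null for each existentially quantified variable. This is not claimed to be the version of the chase from [FKPT05]; it is a simplification chosen for brevity, relying on the remark above that all versions give homomorphically equivalent results. The encoding of atoms, facts, and tgds, the null names "n_1", "n_2", …, and the function names are assumptions of the sketch.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (relation symbol, variables)
Fact = Tuple[str, Tuple[str, ...]]   # (relation symbol, values)
Tgd = Tuple[List[Atom], List[Atom]]  # (premise atoms, conclusion atoms)


def matches(
    atoms: List[Atom], instance: Set[Fact], binding: Dict[str, str]
) -> Iterator[Dict[str, str]]:
    """Yield every extension of `binding` mapping all `atoms` to facts of `instance`."""
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(args):
            continue
        extended = dict(binding)
        if all(extended.setdefault(v, val) == val for v, val in zip(args, values)):
            yield from matches(rest, instance, extended)


def chase(instance: Set[Fact], tgds: List[Tgd]) -> Set[Fact]:
    """Naive chase of a source instance with s-t tgds (a single pass suffices)."""
    fresh = itertools.count(1)
    result: Set[Fact] = set()
    for premise, conclusion in tgds:
        for binding in matches(premise, instance, {}):
            assignment = dict(binding)
            for rel, args in conclusion:
                for v in args:
                    if v not in assignment:  # existential variable: fresh null
                        assignment[v] = f"n_{next(fresh)}"
                result.add((rel, tuple(assignment[v] for v in args)))
    return result
```

Because the premises of s-t tgds mention only source relations and the conclusions only target relations, no tgd can be triggered by a fact produced during the chase, which is why one pass over the premise matches is enough in this setting.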


3 Deciding Invertibility

In [Fag07] it was shown that deciding invertibility of s-t tgd mappings is coNP-hard, and it was left open as to whether it is even decidable. In this section, we prove a matching coNP upper bound, which shows that deciding invertibility of s-t tgd mappings is coNP-complete. Since the coNP lower bound of [Fag07] holds also in the full case, this shows that deciding invertibility of full s-t tgd mappings is also coNP-complete.

It is not hard to verify (see [FKPT08]) that if M12 is an s-t tgd mapping, and I and I′ are source instances where I ⊆ I′, then Sol(M12, I′) ⊆ Sol(M12, I). If the opposite implication also necessarily holds, then M12 is said to have the subset property [FKPT08]. Thus, an s-t tgd mapping M12 has the subset property if whenever I and I′ are source instances where Sol(M12, I′) ⊆ Sol(M12, I), then I ⊆ I′. It was shown in [FKPT08] that the subset property (which they called the (=,=)-subset property) is a necessary and sufficient condition for invertibility of an s-t tgd mapping. We shall make use of the following property, which we call the “homomorphic version” of the subset property, and which we shall show is equivalent to the subset property: whenever I and I′ are source instances where chase12(I) → chase12(I′), then I ⊆ I′.

To prove our next result, we shall make use of the following simple lemma by Fagin, Kolaitis, Miller, and Popa [FKMP05].

Lemma 3.1 [FKMP05] If M12 = (S1, S2, Σ12) is an s-t tgd mapping, then the solutions for a ground instance I are exactly the target instances J such that chase12(I) → J.

Recall that if A is the relational atom P(v1, . . . , vk), then I_A is an instance that consists of the single fact P(c_{v1}, . . . , c_{vk}). We shall make use of the following proposition.

Proposition 3.2 For an s-t tgd mapping M12 = (S1, S2, Σ12), the following are equivalent:

1. M12 is invertible.

2. Whenever I and I′ are ground instances where Sol(M12, I′) ⊆ Sol(M12, I), then I ⊆ I′ (the subset property).

3. Whenever I and I′ are ground instances where chase12(I) → chase12(I′), then I ⊆ I′ (the homomorphic version of the subset property).

4. For every relational atom A and ground instance I, chase12(I_A) → chase12(I) implies I_A ⊆ I.

5. For every relational atom A and ground instance I with at most n facts, chase12(I_A) → chase12(I) implies I_A ⊆ I, where n is the number of facts in chase12(I_A).

Proof The equivalence of (1) and (2) was shown in [FKPT08]. We now show that (2) and (3) are equivalent. Assume first that (2) holds. To prove that (3) holds, assume that chase12(I) → chase12(I′); we must show that I ⊆ I′. By (2), it is sufficient to show that Sol(M12, I′) ⊆ Sol(M12, I). Assume that J ∈ Sol(M12, I′); we must show that J ∈ Sol(M12, I). Since J ∈ Sol(M12, I′), it follows from Lemma 3.1 that chase12(I′) → J. Since also chase12(I) → chase12(I′), it follows by transitivity of homomorphism that chase12(I) → J. It then follows from Lemma 3.1 that J ∈ Sol(M12, I), as desired.

Assume now that (3) holds. To prove that (2) holds, assume that Sol(M12, I′) ⊆ Sol(M12, I); we must show that I ⊆ I′. By (3), it is sufficient to show that chase12(I) → chase12(I′). Now chase12(I′) ∈ Sol(M12, I′). Therefore, since Sol(M12, I′) ⊆ Sol(M12, I), it follows that chase12(I′) ∈ Sol(M12, I). So by Lemma 3.1, it follows that chase12(I) → chase12(I′), as desired.


We now show that (3) and (4) are equivalent. It is clear that (3) implies (4), since (4) is a special case of (3) where I_A plays the role of I, and I plays the role of I′. To prove that (4) implies (3), we shall show that if (3) fails, then (4) fails. Assume that (3) fails. Therefore, there are ground instances I and I′ such that chase12(I) → chase12(I′) and I ⊈ I′. Since I ⊈ I′, we can assume (by renaming constants if needed) that there is an atom A such that I_A ⊆ I but I_A ⊈ I′. Since I_A ⊆ I, we have chase12(I_A) → chase12(I). Since also chase12(I) → chase12(I′), it follows by transitivity of homomorphism that chase12(I_A) → chase12(I′), witnessing that (4) fails (where I′ plays the role of I).

Finally, we show that (4) and (5) are equivalent. It is clear that (4) implies (5). Assume now that (5) holds; we shall show that (4) holds. Assume that chase12(I_A) → chase12(I). Then necessarily also chase12(I_A) → chase12(I′) for some I′ ⊆ I with at most n facts, since chase12(I_A) maps into at most n facts in chase12(I). So by (5), we have that I_A ⊆ I′, and hence I_A ⊆ I. Therefore, (4) holds. □

Proposition 3.2 gives us a very simple proof of the desired coNP upper bound on the problem of deciding invertibility of s-t tgd mappings, as the next proof shows.

Theorem 3.3 The problem of deciding if an s-t tgd mapping is invertible is coNP-complete.

Proof The proof of coNP-hardness is in [Fag07]. We now show the coNP upper bound. We make use of the equivalence of (1) and (5) in Proposition 3.2. To check that M12 = (S1, S2, Σ12) is not invertible, guess a relational atom A, a ground instance I such that I_A ⊈ I where I has at most n facts, and a homomorphism h : chase12(I_A) → chase12(I), where n is the number of facts in chase12(I_A). □
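The checking half of this guess-and-check argument can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is_noninvertibility_witness is invented here, the two chase results are supplied by hand rather than computed, nulls are assumed to be strings starting with "_", and the two-rule mapping in the usage example is hypothetical and does not come from the paper.

```python
def is_noninvertibility_witness(I_A, I, chase_I_A, chase_I, h, is_constant):
    """Check a guessed certificate (A, I, h) against condition (5) of Proposition 3.2."""
    n = len(chase_I_A)                  # size bound on the guessed instance I
    apply_h = lambda v: h.get(v, v)     # values not listed in h are left fixed
    # A homomorphism must be the identity on constants.
    fixes_constants = all(
        apply_h(v) == v for fact in chase_I_A for v in fact[1:] if is_constant(v)
    )
    # Every fact of chase12(I_A) must be sent to a fact of chase12(I).
    image = {(fact[0], *map(apply_h, fact[1:])) for fact in chase_I_A}
    return len(I) <= n and not I_A <= I and fixes_constants and image <= chase_I

# Hypothetical mapping (not from the paper): P(x) -> R(x) and Q(x) -> R(x).
is_constant = lambda v: not v.startswith("_")  # assumed convention for nulls
I_A = {("P", "cx")}
I = {("Q", "cx")}
chase_I_A = {("R", "cx")}   # chase of I_A, worked out by hand
chase_I = {("R", "cx")}     # chase of I, worked out by hand
print(is_noninvertibility_witness(I_A, I, chase_I_A, chase_I, {}, is_constant))
```

In this hypothetical mapping, the target cannot tell whether R(cx) came from P or from Q, so the certificate is accepted and the mapping is not invertible.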

Since Fagin’s coNP-hardness lower bound holds even when the schema mapping is full and LAV (that is, when each of the s-t tgds that specify the schema mapping is full and has a singleton premise), we obtain the following corollary.

Corollary 3.4 The following complexity results hold.

1. The problem of deciding if a full s-t tgd mapping is invertible is coNP-complete.

2. The problem of deciding if a LAV s-t tgd mapping is invertible is coNP-complete.

3. The problem of deciding if a full and LAV s-t tgd mapping is invertible is coNP-complete.

4 Normal Mappings

In this section, we study a class of mappings (that we call normal), which are an especially attractive choice for inverses of s-t tgd mappings. If an s-t tgd mapping M has an inverse, then it has a normal inverse, because (a) the canonical candidate inverse (defined later) is normal, and (b) if M has an inverse, then the canonical candidate inverse of M is indeed an inverse of M [FKPT08]. Since we are interested in inverses M21 = (S2, S1, Σ21) of an s-t tgd mapping M12 = (S1, S2, Σ12), the normal mappings of interest to us have source schema S2 and target schema S1.

Definition 4.1 A constraint is normal if it is of the form α ∧ χ_A ∧ η → A, where α is a conjunction of source atoms, A is a target atom, χ_A is the conjunction of the formulas const(x) for every variable x of A, and η is a conjunction (possibly empty) of inequalities of the form x ≠ y for distinct variables x, y of A. Further, there is the safety condition that every variable in A must appear in α. As usual, we have suppressed writing the leading universal quantifiers. A schema mapping is said to be normal if all of its constraints are normal. □


Notice that we require the const predicate on all variables in A, but just allow inequalities on variables in A. In particular, χ_A is fully determined by A (which is why we write it with the subscript A), whereas η is not determined by A (and can even be empty). Note also that every normal constraint is full (has no existential quantifiers).

An example of a normal constraint is

P(x1, x2, x3, x3) ∧ const(x2) ∧ const(x3) ∧ (x2 ≠ x3) → Q(x2, x3, x2)    (1)

If the formula const(x1) were added to the premise of (1), then the resulting constraint would not be normal, since the variable x1 does not appear in the conclusion of (1). If either of the formulas const(x2) or const(x3) were removed from (1), then the resulting constraint would not be normal, since the variables x2 and x3 appear in the conclusion of (1). If the inequality x2 ≠ x3 were removed from (1), then the resulting constraint would still be normal, since inequalities are not required. If the inequality x1 ≠ x2 were added to the premise of (1), then the resulting constraint would not be normal, since the variable x1 does not appear in the conclusion of (1).
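The conditions of Definition 4.1 are purely syntactic, so they can be checked mechanically. The following Python sketch is one way to do so, under an assumed encoding: a constraint is given as a list of premise atoms, a set of variables carrying const, a collection of inequality pairs, and a conclusion atom. The function name is_normal and this encoding are illustrative assumptions, not notation from the paper.

```python
def atom_vars(atom):
    """Variables of an atom encoded as (relation, var1, var2, ...)."""
    return set(atom[1:])

def is_normal(alpha, const_vars, inequalities, conclusion):
    """Check the three syntactic conditions of Definition 4.1."""
    head_vars = atom_vars(conclusion)
    body_vars = set().union(*(atom_vars(a) for a in alpha))
    # const(x) must occur for exactly the variables of the conclusion.
    const_ok = set(const_vars) == head_vars
    # Inequalities are optional, but may only relate distinct variables of the conclusion.
    ineq_ok = all(x != y and {x, y} <= head_vars for x, y in inequalities)
    # Safety: every variable of the conclusion appears in a premise atom.
    safe = head_vars <= body_vars
    return const_ok and ineq_ok and safe

# Constraint (1) and the variants discussed in the text.
alpha = [("P", "x1", "x2", "x3", "x3")]
head = ("Q", "x2", "x3", "x2")
print(is_normal(alpha, {"x2", "x3"}, [("x2", "x3")], head))                # constraint (1)
print(is_normal(alpha, {"x1", "x2", "x3"}, [("x2", "x3")], head))          # const(x1) added
print(is_normal(alpha, {"x3"}, [("x2", "x3")], head))                      # const(x2) removed
print(is_normal(alpha, {"x2", "x3"}, [], head))                            # inequality removed
print(is_normal(alpha, {"x2", "x3"}, [("x2", "x3"), ("x1", "x2")], head))  # x1 != x2 added
```

The five calls correspond, in order, to constraint (1) and the four modifications discussed in the paragraph above.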

Let M12 = (S1, S2, Σ12) and M21 = (S2, S1, Σ21) be schema mappings. Let us say that Σ21 is too strong (for M12) if there is a ground S1-instance I and a ground S1-instance J such that I ⊆ J but (I, J) ⊭ Σ12 ◦ Σ21. So Σ21 is not too strong precisely if whenever there is a ground S1-instance I and a ground S1-instance J such that I ⊆ J, then (I, J) ⊨ Σ12 ◦ Σ21. If Σ12 is a set of s-t tgds, and Σ21 is arbitrary, then it follows from Proposition 5.2 of [Fag07] that Σ21 is not too strong precisely if (I, I) ⊨ Σ12 ◦ Σ21 for every ground S1-instance I.

Example 4.2 Let S1 consist of the binary relation symbol P and the unary relation symbol T. Let S2 consist of the binary relation symbol P′ and the unary relation symbols Q and T′. Let Σ12 = {P(x, y) → P′(x, y), P(x, x) → Q(x), T(x) → T′(x), T(x) → P′(x, x)}, and let M12 = (S1, S2, Σ12). It was shown in [FKPT08] that M12 is invertible, and that the inverse “requires inequalities” (details are in [FKPT08]). Let Σ′21 consist only of the constraint P′(x, y) → P(x, y). We now show that Σ′21 is too strong for M12. Let I = {T(0)}, and let J = I. So of course I ⊆ J (in fact, we have equality). However, we now show that (I, J) ⊭ Σ12 ◦ Σ′21, which shows that Σ′21 is too strong. Assume by way of contradiction that (I, J) ⊨ Σ12 ◦ Σ′21. Then there is K such that (I, K) ⊨ Σ12 and (K, J) ⊨ Σ′21. Since (I, K) ⊨ Σ12, we know that K contains the fact P′(0, 0). Therefore, since (K, J) ⊨ Σ′21, we have that J contains P(0, 0), which contradicts the fact that J = {T(0)}. This is the desired contradiction. So indeed, Σ′21 is too strong for M12. Now let Σ′′21 = {P′(x, y) ∧ (x ≠ y) → P(x, y)} (so that Σ′′21 is obtained from Σ′21 by adding the inequality x ≠ y to the premise). Then (as we now show), Σ′′21 is not too strong for M12. In fact, if I is a ground S1-instance and J is a ground S1-instance such that I ⊆ J, and if U = chase12(I), then (I, U) ⊨ Σ12, and it is not hard to see that (U, J) ⊨ Σ′′21. Hence, (I, J) ⊨ Σ12 ◦ Σ′′21, as desired. □

Let us say that Σ21 is too weak (for M12) if there is a ground S1-instance I and a ground S1-instance J such that (I, J) ⊨ Σ12 ◦ Σ21 but I ⊈ J. So Σ21 is not too weak precisely if whenever there is a ground S1-instance I and a ground S1-instance J such that (I, J) ⊨ Σ12 ◦ Σ21, then I ⊆ J.

Example 4.3 Let M12 be as in Example 4.2. Let Σ′′′21 be the empty set. Then (I, J) ⊨ Σ12 ◦ Σ′′′21 for every choice of I and J, and it follows easily that Σ′′′21 is too weak for M12. As a more interesting example, let Σ′′21 be as in Example 4.2. We now show that Σ′′21 is too weak for M12. Let I = {T(0)}, let J be the empty set, and let K = {T′(0), P′(0, 0)}. It is easy to see that (I, K) ⊨ Σ12, and (K, J) ⊨ Σ′′21. So (I, J) ⊨ Σ12 ◦ Σ′′21. However, I ⊈ J. So indeed, Σ′′21 is too weak for M12. □

The following simple proposition follows immediately from our definitions.

Proposition 4.4 Let M12 = (S1, S2, Σ12) and M21 = (S2, S1, Σ21) be schema mappings. Then M21 is an inverse of M12 if and only if Σ21 is not too strong and not too weak for M12.


Example 4.5 Let M12 be as in Examples 4.2 and 4.3. Let Σ21 = {P′(x, y) ∧ (x ≠ y) → P(x, y), Q(x) → P(x, x), T′(x) → T(x)}, and let M21 = (S2, S1, Σ21). It can be shown that M21 is an inverse of M12, and so Σ21 is not too strong and not too weak for M12. □

If Σ21 is not too strong, then for every ground S1-instance I and every ground S1-instance J where I ⊆ J, there is an instance K “in the middle” such that (I, K) ⊨ Σ12 and (K, J) ⊨ Σ21. We may say that K witnesses that (I, J) ⊨ Σ12 ◦ Σ21. The next proposition says that if M21 is a normal inverse of M12, then any universal solution can play the role of this witness. This is quite useful as a tool in proving properties of normal inverses.

Proposition 4.6 Assume M21 = (S2, S1, Σ21) is a normal inverse of the s-t tgd mapping M12 = (S1, S2, Σ12). Let I be a ground S1-instance, and let U be an arbitrary universal solution for I with respect to M12. Then (U, I) ⊨ Σ21, and U witnesses (I, J) ⊨ Σ12 ◦ Σ21 when I ⊆ J.

Proof Since M21 is an inverse of M12, we know that (I, I) ⊨ Σ12 ◦ Σ21, and therefore there exists some K such that (I, K) ⊨ Σ12 and (K, I) ⊨ Σ21. Let U be an arbitrary universal solution for I with respect to M12. Then there is a homomorphism h : U → K that is the identity on all values that appear in I (since I is ground). Pick a constraint ϕ ∈ Σ21; by our normality assumption, it must be of the form

α(x, y) ∧ χ_A(x) ∧ η(x) → A(x).

Assume that U satisfies the premise of ϕ on a, b. Then K ⊨ α(h(a), h(b)).² Since U ⊨ χ_A(a), we have h(a) = a and therefore K ⊨ α(a, h(b)). Also, since U ⊨ η(a), we have K ⊨ η(a). So K satisfies the premise of ϕ on a, h(b). Hence, since (K, I) ⊨ Σ21, we must have I ⊨ A(a). This shows that (U, I) ⊨ ϕ. Since ϕ is an arbitrary member of Σ21, it follows that (U, I) ⊨ Σ21, as desired. Since (U, I) ⊨ Σ21 and I ⊆ J, it follows easily that (U, J) ⊨ Σ21. Since (I, U) ⊨ Σ12 and (U, J) ⊨ Σ21, we have that U witnesses (I, J) ⊨ Σ12 ◦ Σ21, as desired. □

We now give an example that shows that if M21 is not normal, then there may be no universal solution for I that witnesses (I, I) ⊨ Σ12 ◦ Σ21.

Example 4.7 Let S1 consist of the unary relation symbol S, and let S2 consist of the binary relation symbol T. Let Σ12 consist of the single s-t tgd S(x) → ∃y(T(x, y) ∧ T(y, x)), and let Σ21 consist of the single s-t tgd T(x, y) ∧ T(y, x) → S(x). Note that this latter s-t tgd is not a normal constraint, since it does not have the formula const(x) in its premise. Let M12 = (S1, S2, Σ12) and M21 = (S2, S1, Σ21).

Let I be an arbitrary nonempty ground instance, and let U be an arbitrary universal solution for I with respect to Σ12. Assume that S(c) is a fact of I. Then U must contain facts T(c, n) and T(n, c) for some null n. If (U, J) ⊨ Σ21, then necessarily J contains the fact S(n), which is not a fact of I. So U does not witness (I, I) ⊨ Σ12 ◦ Σ21. However, if we take K to be an instance that contains precisely the facts T(c, c) such that S(c) is a fact of I, then it is easy to see that K witnesses (I, I) ⊨ Σ12 ◦ Σ21. It is then straightforward to see that M21 is an inverse of M12. Thus, M21 is a (non-normal) inverse of M12 where no universal solution witnesses (I, I) ⊨ Σ12 ◦ Σ21.

Let M′21 = (S2, S1, Σ′21), where Σ′21 consists of the single normal constraint T(x, y) ∧ T(y, x) ∧ const(x) → S(x). Then M′21 is a normal inverse of M12, and as Proposition 4.6 tells us, every universal solution for I witnesses (I, J) ⊨ Σ12 ◦ Σ′21 when I ⊆ J. □

It is easy to see that the chase process can be applied for normal mappings. The final theorem of this section gives an elegant characterization of normal inverses of s-t tgd mappings. Specifically, it says that M21 is an inverse of M12 if and only if chase21(chase12(I)) = I for every ground instance I. To prove this theorem, we need two lemmas that characterize when a normal mapping is not too strong for an s-t tgd mapping, and when a normal mapping is not too weak for an s-t tgd mapping.

² If a = (a1, . . . , ar), then by h(a), we mean (h(a1), . . . , h(ar)).


Lemma 4.8 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping and M21 = (S2, S1, Σ21) be a normal mapping. Then Σ21 is not too strong if and only if chase21(chase12(I)) ⊆ I for every ground instance I.

Proof Assume first that chase21(chase12(I)) ⊆ I for every ground instance I. We must show that whenever there are ground instances I and J such that I ⊆ J, then (I, J) ⊨ Σ12 ◦ Σ21. Let I and J be ground instances such that I ⊆ J. Let U = chase12(I), and let U′ = chase21(U). So (I, U) ⊨ Σ12 and (U, U′) ⊨ Σ21. Also, by assumption, U′ ⊆ I. Since also I ⊆ J, it follows that U′ ⊆ J. Since (U, U′) ⊨ Σ21 and U′ ⊆ J, we see that (U, J) ⊨ Σ21. Since (I, U) ⊨ Σ12 and (U, J) ⊨ Σ21, it follows that (I, J) ⊨ Σ12 ◦ Σ21, as desired.

Assume now that Σ21 is not too strong. So (I, I) ⊨ Σ12 ◦ Σ21. Let U = chase12(I), and let U′ = chase21(U). We must show that U′ ⊆ I. The argument in the proof of Proposition 4.6 shows that (U, I) ⊨ Σ21. Since M21 is a normal mapping, it is easy to see that the result of chasing an arbitrary instance with Σ21 is a ground instance. In particular, U′ is a ground instance. Since U′ is the result of chasing U with Σ21, and U′ is ground, a standard property of the chase tells us that for every instance J such that (U, J) ⊨ Σ21, necessarily U′ ⊆ J. If we take J to be I, then we see that U′ ⊆ I, as desired. □

Lemma 4.9 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping and M21 = (S2, S1, Σ21) be a normal mapping. Then Σ21 is not too weak if and only if I ⊆ chase21(chase12(I)) for every ground instance I.

Proof Assume first that Σ21 is not too weak. So whenever there are ground instances I and J such that (I, J) ⊨ Σ12 ◦ Σ21, then I ⊆ J. Let J be chase21(chase12(I)). Then (I, J) ⊨ Σ12 ◦ Σ21. So I ⊆ J, that is, I ⊆ chase21(chase12(I)), as desired.

Assume now that I ⊆ chase21(chase12(I)) for every ground instance I. Let I and J be ground instances such that (I, J) ⊨ Σ12 ◦ Σ21; we must show that I ⊆ J. Since (I, J) ⊨ Σ12 ◦ Σ21, there is K such that (I, K) ⊨ Σ12 and (K, J) ⊨ Σ21. Since (I, K) ⊨ Σ12, a standard property of the chase tells us that chase12(I) → K. Similarly, chase21(K) → J. Since chase12(I) → K, we see (by chasing both sides with Σ21) that chase21(chase12(I)) → chase21(K). Hence, since also chase21(K) → J, we have chase21(chase12(I)) → J, by transitivity of homomorphism. Since also I ⊆ chase21(chase12(I)), it follows that I → J. Therefore, since I and J are both ground, we have I ⊆ J, as desired. □

We now discuss a nice property of normal inverses. Corollary 7.4 of [Fag07] says that if M12 = (S1, S2, Σ12) and M21 = (S2, S1, Σ21) are both full s-t tgd mappings, then M21 is an inverse of M12 if and only if chase21(chase12(I)) = I for every ground instance I. The next theorem says that this strong property (that chase21(chase12(I)) = I for every ground instance I) holds for normal inverses M21, even when M12 is not full.

Theorem 4.10 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping and M21 = (S2, S1, Σ21) a normal mapping. Then M21 is an inverse of M12 if and only if chase21(chase12(I)) = I for every ground instance I.

Proof This follows easily from Proposition 4.4, along with Lemmas 4.8 and 4.9. □
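The round-trip condition of Lemmas 4.8 and 4.9 and Theorem 4.10 can be tried out on the mappings of Examples 4.2, 4.3, and 4.5 with a small Python sketch. The sketch assumes that every rule has a single premise atom and no existential quantifiers, which holds for these examples but not in general. Since Σ12 is full here, chase12(I) is ground, so the const formulas that normality requires hold automatically and are omitted. The rule encoding and the names match and chase are illustrative assumptions.

```python
def match(atom, fact):
    """Return a variable binding that maps `atom` onto `fact`, or None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = {}
    for var, value in zip(atom[1:], fact[1:]):
        if binding.setdefault(var, value) != value:
            return None  # a repeated variable met two different values
    return binding

def chase(instance, rules):
    """Chase with rules of the form (premise atom, inequalities, conclusion atom)."""
    out = set()
    for premise, inequalities, conclusion in rules:
        for fact in instance:
            b = match(premise, fact)
            if b is not None and all(b[x] != b[y] for x, y in inequalities):
                out.add((conclusion[0], *(b[v] for v in conclusion[1:])))
    return out

sigma12 = [
    (("P", "x", "y"), [], ("P'", "x", "y")),
    (("P", "x", "x"), [], ("Q", "x")),
    (("T", "x"), [], ("T'", "x")),
    (("T", "x"), [], ("P'", "x", "x")),
]
sigma21 = [  # Example 4.5
    (("P'", "x", "y"), [("x", "y")], ("P", "x", "y")),
    (("Q", "x"), [], ("P", "x", "x")),
    (("T'", "x"), [], ("T", "x")),
]
sigma21_prime = [(("P'", "x", "y"), [], ("P", "x", "y"))]             # Example 4.2
sigma21_dprime = [(("P'", "x", "y"), [("x", "y")], ("P", "x", "y"))]  # Examples 4.2, 4.3

I = {("T", 0)}
U = chase(I, sigma12)
print(chase(U, sigma21) == I)         # round trip recovers I
print(chase(U, sigma21_prime) <= I)   # fails: the containment of Lemma 4.8 is violated
print(I <= chase(U, sigma21_dprime))  # fails: the containment of Lemma 4.9 is violated
```

A check on finitely many instances is only a sanity test, of course; the theorem quantifies over every ground instance I.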

Theorem 4.10 fails if we drop the assumption that M21 be normal. In particular, it is straightforward to verify that if M12 and M21 are as in Example 4.7, and I is an arbitrary nonempty ground instance, then chase21(chase12(I)) ⊈ I. It is more challenging to find an example where M21 is an inverse of M12 but I ⊈ chase21(chase12(I)). Such an example was given in [FKPT08], as we now describe.

Example 4.11 We now give an example from [FKPT08] where M21 is an inverse of M12 but where there is a ground instance I such that I ⊈ chase21(chase12(I)). Let S1 consist of the unary relation symbol P, and let S2 consist of the binary relation symbol Q. Let Σ12 consist of P(x) → ∃y Q(x, y), and let Σ21 consist of the constraints Q(x, y) → P(y) and Q(x, y) ∧ const(y) → P(x).³ Let M12 = (S1, S2, Σ12) and M21 = (S2, S1, Σ21). Note that M21 is not normal, because the second member of Σ21 has const(y) rather than const(x) in its premise.

We now show that M21 is an inverse of M12. To do this, we need to show that if I and J are ground instances, then (I, J) ⊨ Σ12 ◦ Σ21 if and only if I ⊆ J.

First, let I be a ground instance that consists of n facts P(c1), . . . , P(cn), and let K be {Q(ci, ci) : 1 ≤ i ≤ n}. It is easy to see that (I, K) ⊨ Σ12 and (K, I) ⊨ Σ21. Hence, (I, I) ⊨ Σ12 ◦ Σ21, which implies that if I ⊆ J, then (I, J) ⊨ Σ12 ◦ Σ21. Next, assume that I and J are ground instances such that (I, J) ⊨ Σ12 ◦ Σ21; we shall show that I ⊆ J. Since (I, J) ⊨ Σ12 ◦ Σ21, there is K such that (I, K) ⊨ Σ12 and (K, J) ⊨ Σ21. Suppose I consists of n facts P(c1), . . . , P(cn). Since (I, K) ⊨ Σ12, we know that K contains {Q(ci, yi) : 1 ≤ i ≤ n}, for some choices of y1, . . . , yn. There are two cases:

• Case 1. Some yi is not a constant. Because of the first constraint in Σ21, we see that J contains P(yi), and so J is not ground. Hence, this case is not possible.

• Case 2. Every yi is a constant. Because of the second constraint in Σ21, we see that J contains P(ci), for 1 ≤ i ≤ n, and so I ⊆ J, as desired.

This concludes the proof that M21 is an inverse of M12.

Now let I = {P(0)}. We have that chase12(I) = {Q(0, n)} for a null n. Then chase21(chase12(I)) = {P(n)}. So I ⊈ chase21(chase12(I)). □

5 Essential Conjunctions and Essential Atoms

In this section, we introduce the notions of essential conjunctions (and, in the full case, of essential atoms), which turn out to play a fundamental role for the study of inverses. Roughly speaking, an essential conjunction for a relational atom A (with respect to an s-t tgd mapping M12 = (S1, S2, Σ12)) is a conjunction such that (a) the atoms in the conjunction arise in the chase of A with Σ12, and (b) all of these atoms can arise together in a chase with Σ12 only if A is present in the source. We show that an s-t tgd mapping is invertible if and only if each atom has an essential conjunction (in the full case, if and only if each atom has an essential atom). Further, we show how to construct a normal inverse directly from the essential conjunctions. It is convenient to consider separately the two parts (a) and (b) above. When a conjunction satisfies (a), that is, roughly speaking, the atoms in the conjunction arise in the chase of A, then we say that the conjunction is relevant. When a conjunction satisfies (b), that is, roughly speaking, all of the atoms in the conjunction can arise together in a chase with Σ12 only if A is present in the source, then we say that the conjunction is demanding.

We now say more precisely what we mean for a conjunction δ to be relevant for a source atom A. The definition is given in terms of the existence of a certain homomorphism. In addition to having δ contain target atoms, we also allow δ to contain const formulas const(x) for some or all of the variables x in A, where the const formulas const(x) in δ tell us what variables must be mapped onto themselves in the homomorphism. This way, instead of simply requiring that the target atoms in δ be in the chase of A, we can be a little more general, and instead require only that a homomorphic image of the target atoms of δ be in the chase of A. This added level of generality is needed in the characterizations we give later of when normal mappings are inverses. We follow the simplifying convention established earlier that if δ contains no const formulas, this is treated the same as having it contain all of the const formulas const(x) for every variable x of A, in that the homomorphism must map each variable onto itself. In addition to target atoms and const formulas, we also allow δ to contain inequalities, for notational simplicity later, even though we ignore the inequalities in our definitions. Since the chase works on instances, not on atoms, we make use of I_δ and I_A, as defined earlier, rather than δ and A directly. We now give the formal definition of a relevant conjunction δ for a source atom A.

³ Note that our choice of Σ21, although clearly peculiar (given what Σ12 is), is what we intended: it is not a typographical error!


Definition 5.1 Let Σ12 be a finite set of s-t tgds. Assume that A is a relational atom, and δ is a conjunction α ∧ χ ∧ η, where α is a conjunction of relational atoms, χ is a conjunction (possibly empty) of const formulas const(x) for variables x in A, and η is a conjunction (possibly empty) of inequalities of the form x ≠ y for distinct variables x, y in A. Let us say that δ is relevant for A (with respect to Σ12) if I_δ → chase12(I_A). Note that the inequalities play no role, but are allowed for notational convenience. □

Full case: We now examine the notion of “relevant” in the case when Σ12 is full. The const formulas play no role in inverses of full s-t tgd mappings, as shown in [FKPT08] (see Proposition 7.11 for a precise statement of what we mean by “play no role”). Therefore, it is natural to assume in the full case that the relevant conjunction δ has no const formulas. Then I_δ contains only constants (no nulls). So we have that I_δ → chase12(I_A) holds if and only if I_δ ⊆ chase12(I_A). Hence, δ is relevant for A if and only if I_δ ⊆ chase12(I_A). This corresponds very well with the intuition we gave earlier that δ is relevant for A if and only if “the atoms in δ arise in the chase of A”.

We now consider examples of the notion of being relevant.

Example 5.2 We begin with examples in the full case. Let Σ12 consist of the full s-t tgds P(x, y) → (R(x, y) ∧ S(x, x) ∧ T(y)) and Q(x) → S(x, x). Let A be P(x, y). Then I_A is {P(c_x, c_y)}, where c_x and c_y are constants. Also, we have that chase12(I_A) is {R(c_x, c_y), S(c_x, c_x), T(c_y)}. Let δ1 be R(x, y) ∧ S(x, x), let δ2 be S(x, y), and let δ3 be S(x, x). Then we have that I_{δ1} is {R(c_x, c_y), S(c_x, c_x)}, that I_{δ2} is {S(c_x, c_y)}, and that I_{δ3} is {S(c_x, c_x)}. So δ1 and δ3 are both relevant for A, whereas δ2 is not relevant for A. □

Example 5.3 Assume that Σ12 consists of the s-t tgds P(x) → ∃z(R(x, z) ∧ S(x, x)) and Q(x, y) → S(x, x). Let A be P(x). Then I_A is {P(c_x)}, where c_x is a constant, and chase12(I_A) is {R(c_x, n), S(c_x, c_x)} for a null n. Let δ1 be R(x, y) ∧ const(x). Then I_{δ1} is {R(c_x, n_y)} for a null n_y. So I_{δ1} → chase12(I_A) under the homomorphism h1 where h1(c_x) = c_x and h1(n_y) = n. Therefore, δ1 is relevant for A. Now let δ2 be R(x, y). Then I_{δ2} is {R(c_x, c_y)} for constants c_x, c_y. So δ2 is not relevant for A, since if there were a homomorphism h2 : I_{δ2} → chase12(I_A), we would have h2(c_y) = n, which is not possible. Finally, let δ3 be S(x, y) ∧ const(x). Then I_{δ3} is {S(c_x, n_y)} for a null n_y. So I_{δ3} → chase12(I_A) under the homomorphism h3 where h3(c_x) = c_x and h3(n_y) = c_x. Therefore, δ3 is relevant for A. □
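The homomorphism tests in Example 5.3 can be reproduced with a small backtracking search. The Python sketch below assumes that nulls are encoded as strings starting with "_" and that everything else is a constant; the function names is_null and homomorphism are illustrative, and the instances are typed in by hand from the example rather than computed by a chase.

```python
def is_null(value):
    """Assumed convention: nulls are strings that start with an underscore."""
    return isinstance(value, str) and value.startswith("_")

def homomorphism(src, dst, h=None):
    """Return a map on nulls sending every fact of src into dst, or None if none exists."""
    h = h or {}
    src = list(src)
    if not src:
        return h
    (rel, *args), rest = src[0], src[1:]
    for fact in dst:
        if fact[0] != rel or len(fact) != len(args) + 1:
            continue
        g = dict(h)
        ok = True
        for a, b in zip(args, fact[1:]):
            # Constants must map to themselves; nulls may map to any one value.
            ok = (g.setdefault(a, b) == b) if is_null(a) else (a == b)
            if not ok:
                break
        if ok:
            found = homomorphism(rest, dst, g)
            if found is not None:
                return found
    return None

chase_I_A = {("R", "cx", "_n"), ("S", "cx", "cx")}       # chase12(I_A) in Example 5.3
print(homomorphism({("R", "cx", "_ny")}, chase_I_A))     # delta1
print(homomorphism({("R", "cx", "cy")}, chase_I_A))      # delta2
print(homomorphism({("S", "cx", "_ny")}, chase_I_A))     # delta3
```

A result of None for δ2 reflects that the constant c_y cannot be sent to the null n, while δ1 and δ3 each admit a homomorphism.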

We now consider the notion of a demanding conjunction. Roughly speaking, the conjunction δ is demanding for the source atom A if “all of the atoms in δ can arise together in a chase only if A is present in the source”. We now give the formal definition of a demanding conjunction δ for a source atom A.

Definition 5.4 Let Σ12, A, and δ be as in Definition 5.1. Let us say that δ is demanding for A (with respect to Σ12) if for every ground instance I such that I_δ → chase12(I), necessarily I_A ⊆ I. □

Full case: We now examine the notion of “demanding” when Σ12 is full. As in our discussion of the full case for the notion of “relevant”, we assume that δ has no const formulas. As before, I_δ then contains only constants (no nulls). So we have that I_δ → chase12(I) holds if and only if I_δ ⊆ chase12(I). Hence, δ is demanding for A if and only if whenever I_δ ⊆ chase12(I), then I_A ⊆ I. This corresponds very well to the intuition we gave earlier that δ is demanding for A if “all of the atoms in δ can arise together in a chase only if A is present in the source”. We now consider examples of the notion of being demanding.

Example 5.5 We begin with examples in the full case. Let Σ12, A, δ1, δ2, and δ3 be as in Example 5.2. We now show that δ1 is demanding for A. As before, I_{δ1} is {R(c_x, c_y), S(c_x, c_x)} and I_A is {P(c_x, c_y)}, where c_x and c_y are constants. Assume that I_{δ1} ⊆ chase12(I), that is, {R(c_x, c_y), S(c_x, c_x)} ⊆ chase12(I). In particular, {R(c_x, c_y)} ⊆ chase12(I). By looking at Σ12, we see that the only way this can happen is if {P(c_x, c_y)} ⊆ I. That is, I_A ⊆ I, as desired.

It is easy to see that there is no instance I such that I_{δ2} → chase12(I). Therefore, it is automatically true that δ2 is demanding for A. However, we now show that δ3 is not demanding for A. Let I be {Q(c_x)}. It is easy to see that I_{δ3} and chase12(I) are each {S(c_x, c_x)} for the constant c_x, so I_{δ3} = chase12(I). Therefore, I_{δ3} ⊆ chase12(I). However, I_A ⊈ I. So δ3 is not demanding for A. □
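In the full case both notions reduce to set containment, so Examples 5.2 and 5.5 can be replayed in a few lines of Python. In the sketch below, chase12 is hard-coded for the two full s-t tgds of Example 5.2, and counterexample performs a bounded search for a ground instance that refutes the demanding property. The search only ranges over facts built from the two constants cx and cy, so finding nothing is a sanity check and not a proof. All names are illustrative assumptions.

```python
from itertools import combinations, product

def chase12(instance):
    """Chase with P(x,y) -> R(x,y) AND S(x,x) AND T(y) and Q(x) -> S(x,x) (Example 5.2)."""
    out = set()
    for rel, *args in instance:
        if rel == "P":
            x, y = args
            out |= {("R", x, y), ("S", x, x), ("T", y)}
        elif rel == "Q":
            (x,) = args
            out.add(("S", x, x))
    return out

I_A = {("P", "cx", "cy")}
I_d1 = {("R", "cx", "cy"), ("S", "cx", "cx")}
I_d2 = {("S", "cx", "cy")}
I_d3 = {("S", "cx", "cx")}

def relevant(I_delta):
    """Full case of Definition 5.1: I_delta is contained in chase12(I_A)."""
    return I_delta <= chase12(I_A)

DOMAIN = ["cx", "cy"]
ALL_FACTS = [("P", a, b) for a, b in product(DOMAIN, repeat=2)] + [("Q", a) for a in DOMAIN]

def counterexample(I_delta):
    """Search small ground instances I with I_delta in chase12(I) but I_A not in I."""
    for k in range(len(ALL_FACTS) + 1):
        for facts in combinations(ALL_FACTS, k):
            I = set(facts)
            if I_delta <= chase12(I) and not I_A <= I:
                return I
    return None

for name, I_delta in [("delta1", I_d1), ("delta2", I_d2), ("delta3", I_d3)]:
    print(name, "relevant:", relevant(I_delta), "refuted as demanding:", counterexample(I_delta) is not None)
```

The instance {Q(c_x)} from Example 5.5 is one of the instances that the search can return for δ3, although the search may stop at a different refuting instance first.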


Example 5.6 Let Σ12, A, δ1, δ2, and δ3 be as in Example 5.3. We now show that δ1 is demanding for A. Let I be a ground instance such that I_{δ1} → chase12(I); we must show that I_A ⊆ I. Assume that the distinct P-facts in I are P(c1), . . . , P(ck) (also, I may contain some Q-facts). Then the R-facts in chase12(I) are precisely R(c1, n1), . . . , R(ck, nk) for distinct nulls n1, . . . , nk. Let h1 be a homomorphism such that h1 : I_{δ1} → chase12(I) (such a homomorphism exists by assumption). Since I_{δ1} is {R(c_x, n_y)} for a null n_y, and since h1(c_x) = c_x, it follows easily that one of c1, . . . , ck is c_x. So P(c_x) is a fact of I. But I_A is {P(c_x)}. Hence, I_A ⊆ I, as desired. So indeed, δ1 is demanding for A.

By a similar argument to that given in Example 5.3, we see that there is no instance I such that I_{δ2} → chase12(I). Therefore, it is automatically true that δ2 is demanding for A.

Finally, we show that δ3 is not demanding for A. As in Example 5.3, we have that I_{δ3} is {S(c_x, n_y)} for a null n_y. Let I be {Q(c_x, c_x)}. So chase12(I) is {S(c_x, c_x)}. Therefore, I_{δ3} → chase12(I). However, I_A ⊈ I. So δ3 is not demanding for A. □

We can now put together the notions of “relevant” and “demanding” to obtain the notion we really want, that of an essential conjunction.

Definition 5.7 Let Σ12, A, and δ be as in Definition 5.1. We say that δ is essential for A (with respect to Σ12) if δ is both relevant for A and demanding for A (with respect to Σ12). □

Example 5.8 Let Σ12, A, δ1, δ2, and δ3 be as in Examples 5.2 and 5.5. We see from these examples that δ1 is both relevant and demanding for A, that δ2 is demanding but not relevant for A, and that δ3 is relevant but not demanding for A. So δ1 is essential for A, but neither δ2 nor δ3 is essential for A. □

Example 5.9 Let Σ12, A, δ1, δ2, and δ3 be as in Examples 5.3 and 5.6. We see from these examples that δ1 is both relevant and demanding for A, that δ2 is demanding but not relevant for A, and that δ3 is relevant but not demanding for A. So δ1 is essential for A, but neither δ2 nor δ3 is essential for A. □

The s-t tgd mapping M12 = (S1, S2, Σ12) is said to have the constant-propagation property [Fag07] if for every ground instance I, every member of the active domain of I is a member of the active domain of chase12(I) (that is, dom(I) ⊆ dom(chase12(I))). Later, we shall make use of the following proposition from [Fag07].

Proposition 5.10 [Fag07] Every invertible s-t tgd mapping has the constant-propagation property.

The next proposition gives a similar propagation property.

Proposition 5.11 Assume that A is a source atom, and δ is an essential conjunction for A with respect to the set Σ12 of s-t tgds. Then every variable in A appears in δ.

Proof Assume that A is P(v1, . . . , vk), where v1, . . . , vk are variables, not necessarily distinct. Assume that the variable vi does not appear in δ; we shall derive a contradiction.

Let d be a new constant, and let I be obtained from I_A by replacing every occurrence of c_{vi} in I_A by d. Since δ is relevant for A, we know that there is a homomorphism h : I_δ → chase12(I_A). So for the same homomorphism h, we have h : I_δ → chase12(I), since c_{vi} does not appear in I_δ. Since I_δ → chase12(I), even though I_A ⊈ I, it follows that δ is not demanding for A, which contradicts the assumption that δ is essential for A. □

Recall that a weak renaming is a function that maps variables to variables (the word “weak” refers to the fact that the function is not necessarily one-to-one). If ϕ is a formula, and f is a weak renaming, let ϕ^f be the result of replacing every variable x in ϕ by f(x). We may refer to ϕ^f as a weak renaming of ϕ. If ϕ is a normal constraint with premise δ, then we say that f is consistent with the inequalities of δ if f(x) and f(y) are distinct for each inequality x ≠ y in δ. The intuition is that if f is consistent with the inequalities of the normal constraint ϕ, then ϕ^f is a special case of ϕ. For example, assume that ϕ is the normal constraint (1) of Section 4. Let f be the weak renaming that is the identity except that f(x1) = x2. Then ϕ^f is

P(x2, x2, x3, x3) ∧ const(x2) ∧ const(x3) ∧ (x2 ≠ x3) → Q(x2, x3, x2)    (2)

Note that (2) can be viewed as a special case of (1) where x1 and x2 are equal. In fact, the formulas ϕ^f where f is consistent with the inequalities of ϕ can be thought of as consisting of all of the special cases of ϕ.
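Applying a weak renaming is a plain substitution, which the following Python sketch illustrates on constraint (1). A constraint is encoded here as a tuple of premise atoms, const variables, inequality pairs, and a conclusion atom, and a weak renaming f is a dictionary that is the identity on variables it does not mention. The encoding and the names weak_rename and consistent are assumptions made for illustration.

```python
def rename_atom(atom, f):
    """Replace each variable x of the atom by f(x); unmentioned variables are kept."""
    return (atom[0], *(f.get(v, v) for v in atom[1:]))

def weak_rename(constraint, f):
    """Return the weak renaming of a constraint under the weak renaming f."""
    alpha, const_vars, inequalities, head = constraint
    return (
        [rename_atom(a, f) for a in alpha],
        {f.get(v, v) for v in const_vars},
        {(f.get(x, x), f.get(y, y)) for x, y in inequalities},
        rename_atom(head, f),
    )

def consistent(constraint, f):
    """f is consistent if it keeps both sides of every inequality distinct."""
    return all(f.get(x, x) != f.get(y, y) for x, y in constraint[2])

# Constraint (1) of Section 4.
phi = (
    [("P", "x1", "x2", "x3", "x3")],
    {"x2", "x3"},
    {("x2", "x3")},
    ("Q", "x2", "x3", "x2"),
)
f = {"x1": "x2"}  # the identity except that f(x1) = x2
print(consistent(phi, f), weak_rename(phi, f))
print(consistent(phi, {"x3": "x2"}))  # collapses x2 and x3, so it is not consistent
```

With f(x1) = x2 the result is the encoding of constraint (2), whereas a renaming that identifies x2 and x3 would turn the inequality into x2 ≠ x2 and is therefore excluded.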

The next theorem relates the notion of “not too strong” with the notion of “demanding”, and relates the notion of “not too weak” with the notion of “relevant”. Shortly, we shall discuss the intuition behind this rather technical theorem.

Theorem 5.12 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping and M21 = (S2, S1, Σ21) be a normal mapping. Then

1. Σ21 is not too strong if and only if every constraint in Σ21 is of the form δ → A, where δ^f is demanding for A^f for every weak renaming f consistent with the inequalities of δ.

2. Σ21 is not too weak if and only if for each source atom A, there is a relevant conjunction δ for A such that δ → A is a weak renaming of a constraint in Σ21.

Proof (1) Assume first that there is a constraint δ → A of Σ21 and a weak renaming f consistent with the inequalities of δ such that δ^f is not demanding for A^f. Let δ′ be δ^f, and let A′ be A^f. Then δ′ → A′ is a normal constraint that is a logical consequence of Σ21. Since δ′ is not demanding for A′, there is an instance I such that I_{δ′} → chase12(I), yet I_{A′} ⊈ I. Since I_{δ′} → chase12(I), it follows that I_{A′} is the result of chasing chase12(I) with δ′ → A′. So I_{A′} ⊆ chase21(chase12(I)) and therefore chase21(chase12(I)) ⊈ I. By Lemma 4.8, Σ21 is too strong.

Conversely, assume that Σ21 is too strong. Then, by Lemma 4.8, there is a ground instance I such that chase21(chase12(I)) ⊈ I. It follows that there must be a constraint of the form δ → A in Σ21 such that the result of chasing chase12(I) with δ → A produces a fact not in I. By renaming constants in I if needed, this tells us that there is a weak renaming f such that I_{δ^f} → chase12(I) and I_{A^f} ⊈ I. Hence, δ^f is not demanding for A^f.

(2) Assume first that Σ21 is not too weak. Pick a source atom A. By Lemma 4.9, we know that I_A ⊆ chase21(chase12(I_A)). So there must be a constraint in Σ21 that fires on chase12(I_A) to introduce I_A. Hence, there must be a weak renaming δ → A of a constraint in Σ21 such that I_δ → chase12(I_A). So δ is relevant for A.

Conversely, assume that for each source atom A, there is a relevant conjunction δ for A such that δ → A is a weak renaming of a constraint ϕ ∈ Σ21. Then I_δ → chase12(I_A) because δ is relevant for A, and therefore we have I_A ⊆ chase21(chase12(I_A)) because ϕ fires on chase12(I_A) to introduce I_A. Since I_A ⊆ chase21(chase12(I_A)) for each source atom A, it follows easily that I ⊆ chase21(chase12(I)) for every ground instance I. By Lemma 4.9, Σ21 is not too weak. □

Let us discuss the intuition behind Theorem 5.12. We consider first part (1) of Theorem 5.12. For Σ21 to be not too strong, the members of Σ21 must be restricted. Let ϕ be a member of Σ21. As we commented earlier, the formulas ϕ^f, where f is a weak renaming consistent with the inequalities of δ, can be viewed as all of the special cases of ϕ. So part (1) of Theorem 5.12 says that Σ21 is not too strong if and only if for every special case ϕ^f of every member ϕ of Σ21, if ϕ^f is δ^f → A^f, then δ^f is demanding for A^f. By the intuition we gave earlier for the notion of “demanding”, δ^f being demanding for A^f means that, roughly speaking, “all of the atoms in δ^f can arise together in a chase only if A^f is present in the source”.

Consider now part (2) of Theorem 5.12. For Σ21 to be not too weak, Σ21 must contain (or imply) certain constraints. If we again think of ϕ^f as a special case of a normal constraint ϕ, then part (2) of Theorem 5.12 says that Σ21 is not too weak if and only if for each source atom A, there is a relevant conjunction δ for A such that δ → A is a special case of a constraint in Σ21. By the intuition we gave earlier for the notion of “relevant”, δ being relevant for A means that, roughly speaking, “the atoms in δ arise in the chase of A”.
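The notion of “relevant” can be tested mechanically, since it only asks for a homomorphism I_δ → chase12(I_A). The following Python sketch does this for full s-t tgds. It is an illustration under stated assumptions, not the paper's algorithm: the encoding of atoms as tuples, the naming of constants as "c_" + variable, and the function names are ours, and we assume (as in Example 7.8) that variables x with const(x) become constants c_x while the remaining variables of δ become nulls.

```python
def assignments(atoms, instance, fixed=None):
    """Yield every mapping of the variables in `atoms` to values such that each
    atom becomes a fact of `instance`; `fixed` pins some variables in advance."""
    def extend(i, env):
        if i == len(atoms):
            yield dict(env)
            return
        rel, *args = atoms[i]
        for fact in instance:
            if fact[0] != rel or len(fact) != len(atoms[i]):
                continue
            new_env = dict(env)
            if all(new_env.setdefault(v, val) == val
                   for v, val in zip(args, fact[1:])):
                yield from extend(i + 1, new_env)
    yield from extend(0, dict(fixed or {}))

def chase_full(instance, tgds):
    """Chase a source instance with full s-t tgds (premise atoms, conclusion atoms).
    One pass suffices because premises mention only source relations."""
    result = set()
    for premise, conclusion in tgds:
        for env in assignments(premise, instance):
            for rel, *args in conclusion:
                result.add((rel, *[env[v] for v in args]))
    return result

def is_relevant(delta_atoms, const_vars, atom, tgds):
    """Check whether there is a homomorphism I_delta -> chase12(I_A)."""
    rel, *args = atom
    i_a = {(rel, *["c_" + v for v in args])}          # the instance I_A
    target = chase_full(i_a, tgds)
    fixed = {v: "c_" + v for v in const_vars}         # constants must be preserved
    return next(assignments(delta_atoms, target, fixed), None) is not None

# Sigma12 = { R(x) -> S(x, x) }, the mapping that reappears in Example 7.2.
sigma12 = [([("R", "x")], [("S", "x", "x")])]
A = ("R", "x")
```

With this encoding, S(x, y) ∧ const(x) is relevant for R(x), because the null for y can be sent to c_x, whereas S(x, y) ∧ const(x) ∧ const(y) is not, because the two distinct constants cannot both land on S(c_x, c_x). Checking “demanding” is a different matter, since it quantifies over all ground instances I.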

The next theorem characterizes normal inverses of s-t tgd mappings in terms of the notions of demanding, relevant, and essential.

Theorem 5.13 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping and M21 = (S2, S1, Σ21) be a normal mapping. Then M21 is an inverse of M12 if and only if

1. Every constraint in Σ21 is of the form δ → A, where δ^f is demanding for A^f for every weak renaming f consistent with the inequalities of δ.

2. For each source atom A, there is a relevant conjunction δ for A such that δ → A is a weak renaming of a constraint in Σ21. (By Part (1), this relevant conjunction is essential.)

Proof This follows immediately from Proposition 4.4 and Theorem 5.12. □

Definition 5.14 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping. Let e be a partial function whose domain consists of all prime source atoms that have an essential conjunction. If the prime source atom A has an essential conjunction, then e(A) is an essential conjunction for A that is a conjunction of relational atoms and the formulas const(x) for each variable x of A. (Note that each essential conjunction for A must contain all of the variables of A, by Proposition 5.11.) If A has no essential conjunction, then e(A) is undefined. Let Σ^e_21 consist of all formulas e(A) ∧ η_A → A, where A is a prime source atom and where e(A) is defined, and η_A consists of all inequalities of the form x ≠ y where x and y are distinct variables of A. □

The next theorem shows how we can construct an inverse out of essential conjunctions.

Theorem 5.15 Let M12 be an s-t tgd mapping. The following are equivalent.

1. M12 is invertible.

2. For every source atom A, there is an essential conjunction for A.

3. M^e_21 is an inverse of M12, for every partial function e as in Definition 5.14.

4. M^e_21 is an inverse of M12, for some partial function e as in Definition 5.14.

We defer the proof of Theorem 5.15 to Section 6. Note that the inverses M^e_21 in (3) and (4) of Theorem 5.15 are normal. The equivalence of (1) and (2) in Theorem 5.15 gives a clean necessary and sufficient condition for the existence of an inverse. Further, the equivalence of (1) and (2) shows the fundamental importance of essential conjunctions for the study of inverses (normal or otherwise).

The next two lemmas will be useful later.

Lemma 5.16 Let A and A′ be source atoms. Assume that δ is relevant for A and demanding for A′. Then A and A′ are the same atom.

Proof Since δ is relevant for A, we know that I_δ → chase12(I_A). Therefore, since δ is demanding for A′, we have I_{A′} ⊆ I_A. Since I_A and I_{A′} are both singletons, we have I_{A′} = I_A, so A and A′ are the same atom, as desired. □

Lemma 5.17 Let M12 be a full s-t tgd mapping, and M21 = (S2, S1, Σ21) a normal inverse for M12. Let A be a source atom and B a target atom where I_A ⊆ chase21(I_B). Then B is demanding for A with respect to Σ12.

Proof Assume that I_B ⊆ chase12(I); we must show that I_A ⊆ I. Let U = chase12(I). We know from Proposition 4.6 that (U, I) ⊨ Σ21. Since Σ21 is full, this implies further that chase21(U) ⊆ I. Since I_B ⊆ chase12(I) and I_A ⊆ chase21(I_B), it follows that I_A ⊆ chase21(chase12(I)) = chase21(U). Since also chase21(U) ⊆ I, we have that I_A ⊆ I, as desired. □

To prevent confusion, the reader should note that the inclusion I_A ⊆ chase21(I_B) in Lemma 5.17 uses chase21, not chase12. If we were to instead consider the inclusion I_A ⊆ chase12(I_B) (where we now take B to be a source atom and A a target atom), then this inclusion would say that A is relevant for B.


5.1 The Full Case

If Σ12 is full, then we are interested in the situation where δ has no const formulas. Then I_δ contains only constants (no nulls). If δ is simply a relational atom, and if δ is demanding, then we call δ a demanding atom. Similarly, we define a relevant atom and an essential atom. We are interested in the demanding atoms (and the essential atoms) in the full case because of the following proposition.

Proposition 5.18 Let M12 = (S1, S2, Σ12) be a full s-t tgd mapping. Assume that A is a source atom. Assume that the conjunction δ has no const formulas. If δ is demanding for A, then δ contains a demanding atom for A. If δ is essential for A, then δ contains an essential atom for A.

Proof Assume by way of contradiction that δ is demanding for A, but that no atom B of δ is demanding for A. So for every atom B of δ, there is a ground instance J_B such that I_B → chase12(J_B) and I_A ⊈ J_B. Since every member of I_B is a constant, and since I_B → chase12(J_B), it follows that I_B ⊆ chase12(J_B). Let I be the union of the instances J_B. So I_B ⊆ chase12(I). From our assumptions about δ, it follows easily that I_δ is the union, over all atoms B in δ, of I_B. Therefore, since I_B ⊆ chase12(I) for every atom B of δ, it follows that I_δ ⊆ chase12(I), and so I_δ → chase12(I). Since for every B we have that I_A ⊈ J_B, and I_A is a singleton set, it follows that I_A ⊈ I. So we have that I_δ → chase12(I) and I_A ⊈ I. This contradicts the assumption that δ is demanding for A. Therefore, δ contains a demanding atom for A, as desired.

Now assume that δ is essential for A. We have shown that δ contains a demanding atom B for A. Since δ is relevant for A, we have I_δ → chase12(I_A). But I_δ contains only constants (no nulls), and so I_δ ⊆ chase12(I_A). As we noted, I_δ is the union, over all atoms B in δ, of I_B. Therefore, I_B ⊆ I_δ ⊆ chase12(I_A), and so I_B → chase12(I_A). Therefore, B is relevant for A. Since B is both relevant and demanding for A, we see that B is essential for A, as desired. □

The next theorem says that, in the full case, we can strengthen the equivalence of (1) and (2) in Theorem 5.15.

Theorem 5.19 Let M12 be a full s-t tgd mapping. The following are equivalent.

1. M12 is invertible.

2. For every source atom A, there is an essential atom for A.

Proof Proposition 6.2 will tell us that in the full case, the essential conjunction in (2) of Theorem 5.15 can be taken to have no const formulas. The result then follows from Proposition 5.18. □

As with Theorem 5.15, the equivalence of (1) and (2) in Theorem 5.19 gives a clean necessary and sufficient condition for the existence of an inverse, this time in the full case. Further, the equivalence of (1) and (2) shows the fundamental importance of essential atoms for the study of inverses (normal or otherwise) in the full case.

In the full case, we can strengthen Proposition 5.11 as follows.

Proposition 5.20 Assume that A is a source atom, and B is an essential atom for A with respect to the set Σ12 of full s-t tgds. Then the variables in B are exactly the variables in A.

Proof By Proposition 5.11, we know that every variable in A appears in B. Since B is relevant for A, and Σ12 is full, we have I_B ⊆ chase12(I_A). Therefore, every variable in B appears in A. □


6 The Canonical Candidate Inverse

In [FKPT08], the “canonical candidate inverse” was defined for each s-t tgd mapping, and it was shown (with quite a complicated proof) that if an s-t tgd mapping M has an inverse, then the canonical candidate inverse of M is also an inverse of M. Since the canonical candidate inverse is a normal inverse, in this section we take advantage of the machinery we have developed to give a much simpler proof of this result (and of the fact that the canonical candidate inverse is the weakest possible normal inverse).

Definition 6.1 Let M12 = (S1, S2, Σ12) be an s-t tgd mapping. For each source atom A, let I_A be, as before, the instance containing the fact obtained by replacing each variable v in A by a distinct constant c_v. Let V_A be the result of chasing I_A with Σ12. Let ν_A be the conjunction of relational atoms obtained by replacing every constant c_v of V_A by the variable v, and replacing every null n of V_A by a new variable v_n (that does not appear in A). Let χ_A be the conjunction of the formulas const(x) for each variable x in A, and let ω_A be the conjunction of ν_A and χ_A. Let η_A be the conjunction of all inequalities of the form x ≠ y where x and y are distinct variables in A. □

Proposition 6.2 Let M12 = (S1, S2, Σ12) be an invertible s-t tgd mapping. Let A be a source atom. Then ω_A, as defined in Definition 6.1, is an essential conjunction for A. If M12 is full, then ν_A, as defined in Definition 6.1, is an essential conjunction for A.

Proof It is clear that ω_A is relevant for A (as is ν_A, in the full case).

We now show that ω_A is demanding for A. Assume that I_{ω_A} → chase12(I) for some ground instance I; we must show that I_A ⊆ I. Now I_{ω_A} = chase12(I_A). So chase12(I_A) → chase12(I). By the implication (1) ⇒ (4) of Proposition 3.2, it follows that I_A ⊆ I, as desired. A similar argument shows that ν_A is demanding for A in the full case. □

The canonical candidate inverse [FKPT08] of an invertible s-t tgd mapping M12 = (S1, S2, Σ12) is the normal mapping M^c_21 = (S2, S1, Σ^c_21) where Σ^c_21 contains, for every prime source atom A, the constraint ν_A ∧ χ_A ∧ η_A → A. By Propositions 5.11 and 6.2, every variable in A appears in ν_A, and so these constraints are well-defined, and specify normal mappings.
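The construction of ν_A, χ_A and η_A in Definition 6.1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are ours: every tgd is full with a single premise atom and a single conclusion atom (so no nulls arise), atoms are encoded as tuples, and the constant c_v is encoded as the pair ("c", v). The binary copy tgd used in the second call is a hypothetical example, not one taken from the text.

```python
def match(atom, fact):
    """Return the variable binding that turns `atom` into `fact`, or None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = {}
    for v, val in zip(atom[1:], fact[1:]):
        if env.setdefault(v, val) != val:
            return None
    return env

def chase_single(instance, tgds):
    """Chase with full tgds of the form (premise atom, conclusion atom)."""
    out = set()
    for premise, conclusion in tgds:
        for fact in instance:
            env = match(premise, fact)
            if env is not None:
                out.add((conclusion[0], *[env[v] for v in conclusion[1:]]))
    return out

def canonical_constraint(atom, tgds):
    """Build the parts of nu_A AND chi_A AND eta_A -> A for the source atom A."""
    rel, *variables = atom
    distinct = list(dict.fromkeys(variables))            # variables of A, in order
    i_a = {(rel, *[("c", v) for v in variables])}        # the instance I_A
    v_a = chase_single(i_a, tgds)                        # V_A = chase12(I_A)
    nu = sorted((f[0], *[c[1] for c in f[1:]]) for f in v_a)   # c_v back to v
    eta = [(x, y) for i, x in enumerate(distinct) for y in distinct[i + 1:]]
    return {"nu": nu, "chi": distinct, "eta": eta, "head": atom}

# Sigma12 = { R(x) -> S(x, x) }; the same tgd reappears in Example 7.2.
diag = canonical_constraint(("R", "x"), [(("R", "x"), ("S", "x", "x"))])

# Hypothetical binary copy mapping P(x, y) -> Q(x, y), to show a nonempty eta_A.
copy = canonical_constraint(("P", "x1", "x2"), [(("P", "x", "y"), ("Q", "x", "y"))])
```

For the first mapping the sketch yields S(x, x) ∧ const(x) → R(x); for the copy mapping it yields Q(x1, x2) ∧ const(x1) ∧ const(x2) ∧ (x1 ≠ x2) → P(x1, x2). Handling existential quantifiers would additionally require introducing nulls during the chase and fresh variables v_n in ν_A.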

We can now prove Theorem 5.15.

Proof of Theorem 5.15: The implications (3) ⇒ (4), and (4) ⇒ (1), are immediate. The implication (1) ⇒ (2) follows by Proposition 6.2. So we need only show the implication (2) ⇒ (3). Assume that (2) holds. Therefore, e(A) is an essential conjunction for A, for each prime source atom A. We now make use of Theorem 5.13, where the role of Σ21 is played by Σ^e_21. Let e(A) ∧ η_A → A be an arbitrary member of Σ^e_21. Let δ be e(A) ∧ η_A. Then δ is essential for A, and hence demanding for A and relevant for A. Let f be a weak renaming consistent with η_A. By construction of η_A, it follows that f is one-to-one on the variables of A. Therefore, since δ is demanding for A, also δ^f is demanding for A^f (the possibility that f is not necessarily one-to-one when we consider also the variables not in A cannot hurt). So part (1) of Theorem 5.13 holds when the role of Σ21 is played by Σ^e_21. Also, part (2) of Theorem 5.13 holds, where the weak renaming is a renaming obtained by renaming the variables in a prime source atom. Therefore, M^e_21 is an inverse of M12, and so (3) of Theorem 5.15 holds, as desired. □

It is shown in [FKPT08] that if M12 is an invertible s-t tgd mapping, then the canonical candidate inverse of M12 is indeed an inverse of M12. The proof in [FKPT08] is quite complicated. We will now give a proof, based on the following proposition, that is much simpler (given our machinery).

Theorem 6.3 [FKPT08] Assume that M12 = (S1, S2, Σ12) is an invertible s-t tgd mapping. Then the canonical candidate inverse of M12 is indeed an inverse of M12.

Proof Let e be the function that assigns to each prime source atom A the formula ω_A. By Proposition 6.2, we know that e(A) is an essential conjunction for A. So by the implication (1) ⇒ (3) of Theorem 5.15, we know that M^e_21 is an inverse of M12. But M^e_21 is the canonical candidate inverse of M12, and so the canonical candidate inverse of M12 is an inverse of M12, as desired. □


Our final proposition (which we shall make use of later) in this section says that the canonical candidate inverse of an invertible s-t tgd mapping M12 is the weakest normal inverse of M12. This proposition follows from results in [FKPT08] (although the notion of a normal inverse did not appear in [FKPT08]). Since the proofs of those results are rather complicated, we shall give a simple, self-contained proof of the next proposition.

Proposition 6.4 Let M12 be an s-t tgd mapping, let M^c_21 = (S2, S1, Σ^c_21) be the canonical candidate inverse of M12, and let M21 = (S2, S1, Σ21) be an arbitrary normal inverse of M12. Then Σ21 logically implies Σ^c_21.

Proof Let σ be an arbitrary member of Σ^c_21; we must show that Σ21 logically implies σ. Assume not; we shall derive a contradiction.

Since Σ21 does not logically imply σ, there is (K, J) such that (K, J) ⊨ Σ21 but (K, J) ⊭ σ. Let J′ consist of the ground facts of J; that is, J′ consists of all facts P(a1, ..., ak) of J where a1, ..., ak are constants. Since (K, J) ⊨ Σ21, it follows by normality of M21 that (K, J′) ⊨ Σ21 (intuitively, the non-ground facts of J have no effect in determining satisfaction of Σ21). Similarly, since (K, J) ⊭ σ, it follows that (K, J′) ⊭ σ.

Using the notation of Definition 6.1, let us write σ as ν_A ∧ χ_A ∧ η_A → A. Also, let V_A be as in Definition 6.1. Since (K, J′) ⊭ σ, we have (by renaming constants if necessary) that there is a homomorphism h : V_A → K (that preserves constants and respects the inequalities η_A), where I_A ⊈ J′. Since V_A is the result of chasing I_A with Σ12, we have (I_A, V_A) ⊨ Σ12. Therefore, since V_A → K, it follows from Lemma 3.1 that (I_A, K) ⊨ Σ12. Since also (K, J′) ⊨ Σ21, it follows that (I_A, J′) ⊨ Σ12 ◦ Σ21. Since M21 is an inverse of M12, and since I_A and J′ are ground instances, it follows that I_A ⊆ J′. This is our desired contradiction. □

7 Unique Inverses

Mathematicians are accustomed to inverses being unique. For example, an invertible function has a unique inverse, and an invertible finite matrix has a unique inverse. The reason that inverses are typically unique is that they are typically two-sided inverses. Thus, assume that Y and Y′ are both two-sided inverses of X for some associative operation ∗, and that I is the identity under this operation. Then X ∗ Y = I. Apply Y′ to both sides, and we get

Y′ ∗ (X ∗ Y) = Y′ ∗ I.     (3)

By associativity, the left-hand side of (3) is (Y′ ∗ X) ∗ Y = I ∗ Y = Y, and the right-hand side of (3) is Y′. Therefore, Y = Y′, and so the inverse of X is unique.

Sometimes inverses are not unique. This can happen in particular for infinite matrices, where multiplication is not necessarily associative. For a discussion of such phenomena on infinite matrices, we refer the interested reader to the first author’s first paper [Fag68]. In particular, Lemma 2 of that paper gives a sufficient condition for an infinite matrix to have a unique two-sided inverse. It also gives an example where that unique two-sided inverse is a unique right inverse, but where there are multiple left inverses.

In this section, we consider the notion of uniqueness of inverses of s-t tgd mappings (including questions such as whether or not a given s-t tgd mapping has a unique normal inverse). Our interest in this question was inspired by the following two examples of s-t tgd mappings, one with a unique normal inverse, and another with multiple normal inverses.

Example 7.1 Let M12 = (S1, S2, Σ12), where S1 consists of the unary relation symbol R, where S2 consists of the unary relation symbol S, and where Σ12 consists of the tgd R(x) → S(x). Let Σ21 consist of the normal constraint S(x) ∧ const(x) → R(x), and let M21 = (S2, S1, Σ21). It is easy to see that M21 is a normal inverse of M12. A theorem we shall give later (Theorem 7.7) implies that M21 is in fact the unique normal inverse of M12. □


Example 7.2 Let S1 consist of the unary relation symbol R, and let S2 consist of the binary relation symbol S. Let Σ12 consist of the tgd R(x) → S(x, x). Let Σ21 consist of the normal constraint S(x, x) ∧ const(x) → R(x), and let Σ′21 consist of the normal constraint S(x, y) ∧ const(x) → R(x). Let M12 = (S1, S2, Σ12), let M21 = (S2, S1, Σ21), and let M′21 = (S2, S1, Σ′21). It is straightforward to verify that M21 and M′21 are inequivalent normal inverses of M12. □
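The two inverses of Example 7.2 can be compared on small instances with the following Python sketch. It is an illustration only: the three chase functions are hand-coded for this example, nulls are modelled as `None` (an encoding assumption of ours), and checking a handful of instances is of course not a proof of invertibility.

```python
def chase12(source):
    """Sigma12 of Example 7.2: R(x) -> S(x, x)."""
    return {("S", a, a) for (_, a) in source}

def chase21(target):
    """Sigma21: S(x, x) AND const(x) -> R(x). A null (None) fails const."""
    return {("R", a) for (_, a, b) in target if a == b and a is not None}

def chase21_prime(target):
    """Sigma'21: S(x, y) AND const(x) -> R(x)."""
    return {("R", a) for (_, a, b) in target if a is not None}

# Round trip on a ground source instance: both candidates give back exactly I.
I = {("R", 0), ("R", 1)}
round_trip = chase21(chase12(I))
round_trip_prime = chase21_prime(chase12(I))

# A target instance that is not the chase of any source instance separates them.
J = {("S", 0, 1)}
fired = chase21(J)               # S(0, 1) does not match S(x, x)
fired_prime = chase21_prime(J)   # S(0, 1) matches S(x, y), so R(0) is required
```

Here both round trips return I, while on J the constraint of Σ′21 forces R(0) and the constraint of Σ21 forces nothing. So the pair consisting of J and the empty source instance satisfies Σ21 but not Σ′21, which witnesses that the two mappings are inequivalent.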

Because of these two examples (but where the focus was on unique inverses specified by tgds), Fagin [Fag07] says, “It might be interesting to examine the question of when there is a unique inverse mapping specified in a given language.” We note that a referee of [Fag07] commented that a reason why M12 in Example 7.2 does not have a unique inverse is that M12 is “not onto”. In this section, we give a definition of what it means for a full s-t tgd mapping to be onto, and show that this condition is a sufficient condition for an invertible full s-t tgd mapping to have a unique normal inverse (however, as we shall see, it is not a necessary condition).

In this section, we also show that no schema mapping has a unique inverse. Therefore, to attain uniqueness, we must restrict to a class of possible inverses, such as normal inverses. We give a necessary and sufficient condition for an s-t tgd mapping to have a unique normal inverse. We use this theorem to show that even a nonfull s-t tgd mapping can have a unique normal inverse. Since, as we noted, being onto is a sufficient but not necessary condition for an invertible full s-t tgd mapping to have a unique normal inverse, we find a larger class C of schema mappings (those specified by disjunctive tgds with inequalities) such that being onto is a necessary and sufficient condition for an invertible full s-t tgd mapping M to have a unique inverse in the class C. We also show the surprising result that such mappings M are very special: they are very close to being copy mappings.

Say that two schema mappings (S1, S2, Σ12) and (S1, S2, Σ′12) are equivalent if Σ12 and Σ′12 are logically equivalent. It is useful to consider a weaker notion than equivalence. Recall that if M12 = (S1, S2, Σ12) and M21 = (S2, S1, Σ21) are schema mappings, then M21 is an inverse of M12 if and only if for every pair I, J of ground instances, we have that (I, J) ⊨ Σ12 ◦ Σ21 if and only if I ⊆ J. Therefore, for pairs (J1, J2) where J2 is not a ground instance, the pair (J1, J2) satisfying or not satisfying Σ21 plays no role whatever in determining whether or not M21 is an inverse of M12. Based on this intuition, let us say that Σ21 and Σ′21 are weakly equivalent if whenever J1 is arbitrary and J2 is a ground instance, then (J1, J2) ⊨ Σ21 if and only if (J1, J2) ⊨ Σ′21.⁴ We may also say that (S2, S1, Σ21) and (S2, S1, Σ′21) are then weakly equivalent. If (S2, S1, Σ21) and (S2, S1, Σ′21) are both normal mappings, then they are weakly equivalent if and only if they are equivalent. This follows easily from the observation that if J′2 is the result of removing every non-ground fact from J2, then (J1, J2) ⊨ Σ21 if and only if (J1, J′2) ⊨ Σ21.

We capture the intuition about the irrelevance of pairs (J1, J2) where J2 is not a ground instance in the following simple proposition.

Proposition 7.3 Let M12 be a schema mapping, and let M21 and M′21 be weakly equivalent schema mappings. Then M21 is an inverse of M12 if and only if M′21 is an inverse of M12.

Proof By symmetry, we need only show that if M21 is an inverse of M12, then M′21 is an inverse of M12. Let I, J be ground instances. Since M21 is an inverse of M12, we know that (I, J) ⊨ Σ12 ◦ Σ21 if and only if I ⊆ J. To show that M′21 is an inverse of M12, we need only show that (I, J) ⊨ Σ12 ◦ Σ21 if and only if (I, J) ⊨ Σ12 ◦ Σ′21.

Now (I, J) ⊨ Σ12 ◦ Σ21 if and only if there is J′ such that (I, J′) ⊨ Σ12 and (J′, J) ⊨ Σ21. Since M21 and M′21 are weakly equivalent, and J is ground, we have that (J′, J) ⊨ Σ21 if and only if (J′, J) ⊨ Σ′21. Hence, (I, J) ⊨ Σ12 ◦ Σ21 if and only if there is J′ such that (I, J′) ⊨ Σ12 and (J′, J) ⊨ Σ′21, which happens if and only if (I, J) ⊨ Σ12 ◦ Σ′21. Therefore, (I, J) ⊨ Σ12 ◦ Σ21 if and only if (I, J) ⊨ Σ12 ◦ Σ′21. This was to be shown. □

We now make use of Proposition 7.3 to show that no schema mapping has a unique inverse. In the following theorem (and later), when we speak of “uniqueness”, we mean uniqueness up to equivalence.

⁴ This notion arises also in the full version of [FKPT08].

Theorem 7.4 No schema mapping has a unique inverse.

Proof Let M12 = (S1, S2, Σ12) be an invertible schema mapping. Assume that M21 = (S2, S1, Σ21) is an inverse of M12. Define Σ′21 by having (J1, J2) ⊨ Σ′21 if and only if (J1, J2) ⊨ Σ21 and J2 is ground.⁵ Define Σ′′21 by having (J1, J2) ⊨ Σ′′21 if and only if either (J1, J2) ⊨ Σ′21 or J2 contains a null value. Let M′21 = (S2, S1, Σ′21) and let M′′21 = (S2, S1, Σ′′21). By construction, we see that Σ21, Σ′21, and Σ′′21 are all weakly equivalent. It follows from Proposition 7.3 that since M21 is an inverse of M12, so are M′21 and M′′21. We also see from the construction that Σ′21 and Σ′′21 are not logically equivalent. So M′21 and M′′21 are inverses of M12 that are not equivalent. □

Because of Theorem 7.4, if we wish to study uniqueness of inverses, we must restrict our attention to particular classes (such as normal inverses). We have seen that normal inverses are an important class (in particular, every invertible s-t tgd mapping has a normal inverse). In Example 7.1, we gave an s-t tgd mapping with a unique normal inverse (although we have not proven uniqueness yet), and in Example 7.2, we gave an example with multiple normal inverses.

The next theorem gives a necessary and sufficient condition, based on our notions of “essential” and “demanding”, for an invertible s-t tgd mapping to have a unique normal inverse.

Theorem 7.5 An invertible s-t tgd mapping has a unique normal inverse if and only if for every source atom A, if δ is an essential conjunction for A, and δ′ is a demanding conjunction for A, both with formulas const(x) for exactly the variables x that appear in A, then I_δ → I_{δ′}.

Proof Assume first that M12 is an invertible s-t tgd mapping with a unique normal inverse. Let A be a source atom. Assume that δ is essential for A, and δ′ is demanding for A, and both have formulas const(x) for exactly the variables x that appear in A. Assume that we do not have I_δ → I_{δ′}; we shall derive a contradiction. Assume without loss of generality that δ′ has no inequalities as conjuncts (if necessary, remove them). Let e be as in Definition 5.14 with e(A) = δ. It follows from Theorem 5.15 that M^e_21 is an inverse of M12. Let σ′ be δ′ ∧ η_A → A, where η_A is a conjunction of the inequalities x ≠ y for distinct variables x, y of A. Let Σ21 = Σ^e_21 ∪ {σ′}. Let M21 = (S2, S1, Σ21).

We now show that M21 is also an inverse of M12. Let δ′′ be δ′ ∧ η_A. Let f be a weak renaming consistent with the inequalities of δ′′. It follows easily from Theorem 5.13 that we need only show that (δ′′)^f is demanding for A^f. Let I be a ground instance such that I_{(δ′′)^f} → chase12(I); we must show that I_{A^f} ⊆ I. Since const(x) appears in δ′′ only for variables x in A, and since f is consistent with the inequalities in η_A, we can assume without loss of generality (by renaming variables if needed) that f(x) = x for each variable x of A. So A^f is simply A. Since const(x) appears in δ′′ only for variables x where f(x) = x, it follows easily that f is a homomorphism from I_{δ′′} to I_{(δ′′)^f}. Furthermore, I_{δ′′} = I_{δ′}, since inequalities are ignored in computing I_{δ′′}. Therefore, I_{δ′} → I_{(δ′′)^f}. Hence, since I_{(δ′′)^f} → chase12(I), we have I_{δ′} → chase12(I). Since δ′ is demanding for A, this implies that I_A ⊆ I. But A = A^f. Hence, I_{A^f} ⊆ I, as desired.

We now show that M^e_21 and M21 are not equivalent, which gives our desired contradiction. Let J = I_{δ′}. Let I be the result of chasing J with Σ^e_21. Clearly (J, I) ⊨ Σ^e_21. We now show that (J, I) ⊭ σ′, and so (J, I) ⊭ Σ21. Note that because of the structure of Σ^e_21, it follows that I is a ground instance.

Let A(c) be the result of chasing J with σ′. We need only show that A(c) does not appear in I. Let σ be an arbitrary member of Σ^e_21. By construction of Σ^e_21, we know that σ is of the form e(A′) ∧ η_{A′} → A′, where A′ is a prime source atom, and where η_{A′} is a conjunction of the inequalities x ≠ y for distinct variables x, y of A′. We must show that the result of chasing J with σ does not produce A(c). There are two cases.

Case 1: A′ involves a different relation symbol than A does. So certainly the result of chasing J with σ does not produce A(c).

⁵ Note that we define Σ′21 not by writing formulas, but simply by saying which pairs (J1, J2) satisfy Σ′21.

Case 2: A′ involves the same relation symbol as A. There are two subcases.

Subcase 2a: A′ equals A. Since there is no homomorphism from I_δ to I_{δ′}, that is, from I_δ to J, and since e(A) = δ, it follows that σ does not fire on J.

Subcase 2b: A′ is different from A. Then the equality pattern of the variables in A′ is different from the equality pattern of the variables in A. Hence, the result of chasing J with σ again does not produce A(c).

We now prove the converse. Assume that for every source atom A, if δ is an essential conjunction for A, and δ′ is a demanding conjunction for A, both with formulas const(x) for exactly the variables x that appear in A, then I_δ → I_{δ′}. Let e be as in Definition 5.14. So M^e_21 = (S2, S1, Σ^e_21) is an inverse of M12, by Theorem 5.15. Let M21 = (S2, S1, Σ21) be an arbitrary normal inverse of M12. We need only show that Σ^e_21 and Σ21 are logically equivalent.

We first show that Σ21 logically implies Σ^e_21. Let σ be an arbitrary member of Σ^e_21. Then σ is of the form e(A) ∧ η_A → A. Let δ be e(A). By part (2) of Theorem 5.13, we know that there is an essential conjunction δ′ for A such that δ′ → A is a weak renaming of a constraint in Σ21. Since δ and δ′ are both essential for A, it follows by assumption that I_δ and I_{δ′} are homomorphically equivalent. It is not hard to see that this implies that Σ21 logically implies σ. Since σ is an arbitrary member of Σ^e_21, it follows that Σ21 logically implies Σ^e_21, as desired.

We now show that Σ^e_21 logically implies Σ21. Let σ be an arbitrary member of Σ21. By part (1) of Theorem 5.13, we know that σ is of the form δ′ → A, where A is a source atom and where (δ′)^f is demanding for A^f for every weak renaming f consistent with the inequalities of δ′. For each weak renaming f consistent with the inequalities of δ′, let τ_f be obtained from σ^f by adding to the premise of σ^f (if it is not already there) each inequality x ≠ y for every pair x, y of distinct variables in the conclusion of σ^f. It is fairly straightforward to see that σ is logically equivalent to the set of all such formulas τ_f. So to prove that Σ^e_21 logically implies Σ21, we need only show that Σ^e_21 logically implies each such constraint τ_f.

Now τ_f is a normal constraint of the form δ′′ ∧ η_{A′} → A′, where δ′′ is demanding for A′ (since as we said, (δ′)^f is demanding for A^f), and η_{A′} is the conjunction of all inequalities x ≠ y for distinct variables x, y of A′. By further renaming variables if needed, we can assume that A′ is a prime atom. Now there is an essential conjunction δ for A′ such that δ ∧ η_{A′} → A′ is a normal constraint in Σ^e_21. Let us denote this constraint by γ. Since δ is essential for A′, and δ′′ is demanding for A′, it follows by assumption that I_δ → I_{δ′′}. It follows easily that γ logically implies τ_f. So Σ^e_21 logically implies τ_f, as desired. □

From Theorem 7.5, we obtain a sufficient condition, that we shall utilize shortly, for a unique normal inverse.

Proposition 7.6 Let M12 = (S1, S2, Σ12) be an invertible s-t tgd mapping. Assume that whenever A is a source atom and δ′ is a demanding conjunction for A with formulas const(x) precisely for the variables x of A, then chase12(I_A) → I_{δ′}. Then M12 has a unique normal inverse.

Proof We shall make use of Theorem 7.5. Let A be an arbitrary source atom. Assume that δ is an essential conjunction for A, and δ′ is a demanding conjunction for A, both with formulas const(x) for exactly the variables x that appear in A; we must show that I_δ → I_{δ′}. Since δ is relevant for A, we have I_δ → chase12(I_A). By assumption, we have chase12(I_A) → I_{δ′}. So I_δ → I_{δ′}, as desired. □

Let us say that a full s-t tgd mapping is onto if every target instance is the result of chasing some source instance. That is, the full s-t tgd mapping M12 = (S1, S2, Σ12) is onto if for every target instance J there is a source instance I such that chase12(I) = J. Note that the mapping M12 of Example 7.1 is onto, whereas the mapping M12 of Example 7.2 is not onto.
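The contrast between the two examples can be checked by brute force on a small domain, as in the following Python sketch. This is an illustration under assumptions of ours: only ground instances over a fixed finite domain are enumerated, the source relation is the unary R and the target relation is S as in Examples 7.1 and 7.2, and the function names are hypothetical. Restricting source values to the same domain is harmless for these two mappings, since a source fact over any other value would produce a target fact over that value.

```python
from itertools import chain, combinations, product

def powerset(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def is_onto_over(domain, chase, target_arity):
    """Is every ground S-instance over `domain` the chase of some R-instance?"""
    sources = powerset(("R", a) for a in domain)
    images = {frozenset(chase(set(s))) for s in sources}
    targets = powerset(("S", *t) for t in product(domain, repeat=target_arity))
    return all(frozenset(t) in images for t in targets)

def chase_71(source):
    """Example 7.1: R(x) -> S(x)."""
    return {("S", a) for (_, a) in source}

def chase_72(source):
    """Example 7.2: R(x) -> S(x, x)."""
    return {("S", a, a) for (_, a) in source}

onto_71 = is_onto_over([0, 1], chase_71, target_arity=1)
onto_72 = is_onto_over([0, 1], chase_72, target_arity=2)
```

Over the domain {0, 1} the check succeeds for Example 7.1 and fails for Example 7.2, where for instance the target instance {S(0, 1)} is not the chase of any source instance.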

Theorem 7.7 A full s-t tgd mapping that is invertible and onto has a unique normal inverse.

We defer the proof until later, when we have developed more tools.


As an example of the use of Theorem 7.7, let us consider the mapping M12 of Example 7.1. This mapping is invertible and onto, and so has a unique normal inverse by Theorem 7.7.

Does the converse hold? That is, is every full s-t tgd mapping with a unique normal inverse necessarily onto? The next example shows that this is false.

Example 7.8 Let S1 consist of four unary relation symbols Pi, for 1 ≤ i ≤ 4, and let S2 consist of the four unary relation symbols Qi, for 1 ≤ i ≤ 4, and the unary relation symbol R. Let Σ12 consist of the full s-t tgds Pi(x) → Qi(x), for 1 ≤ i ≤ 4, along with the full s-t tgds P1(x) ∧ P2(x) → R(x) and P3(x) ∧ P4(x) → R(x). Let M12 = (S1, S2, Σ12). The mapping M12 is not onto, since the target instance whose set of facts is {Q1(0), Q2(0)} is not chase12(I) for any source instance I (such an instance I would have to contain the facts P1(0), P2(0), and so chase12(I), like every solution for I, would also contain the fact R(0)). Let M21 = (S2, S1, Σ21), where Σ21 = {Qi(x) ∧ const(x) → Pi(x) : 1 ≤ i ≤ 4}. Although M12 is not onto, we now show (by using Proposition 7.6) that M12 has a unique normal inverse, namely M21.

Let A be a source atom. By symmetry of the roles of the source atoms, we can assume without loss of generality that A is the source atom P1(x). Let δ′ be a demanding conjunction for A with const formula const(x) (and no other const formula). We now show that δ′ must contain Q1(x). Assume not; we shall derive a contradiction.

Now IA = {P1(cx)}, where cx is the constant associated with the variable x as in Section 2. Let d be a constant different from cx. Let I consist of the facts Pi(d) for 1 ≤ i ≤ 4, along with the facts Pi(cx) for 2 ≤ i ≤ 4. So chase12(I) consists of the facts Qi(d) for 1 ≤ i ≤ 4, along with the facts Qi(cx) for 2 ≤ i ≤ 4, along with the facts R(cx) and R(d). Since the only const formula in δ′ is const(x), it follows that Iδ′ contains only one constant, namely the constant cx, and possibly also null values. Let h be a function where h(cx) = cx and h(n) = d for every null n. Since δ′ does not contain Q1(x), it follows that Iδ′ does not contain Q1(cx). So Iδ′ contains some subset of {Q2(cx), Q3(cx), Q4(cx)}, possibly along with some facts Qi(n) for some nulls n and for 1 ≤ i ≤ 4, possibly along with R(cx), and possibly some facts R(n) for some nulls n. Hence, h is a homomorphism that maps Iδ′ to chase12(I). Since Iδ′ → chase12(I) but IA ⊈ I, this contradicts the assumption that δ′ is demanding for A. This contradiction shows that δ′ must contain Q1(x). Hence, chase12(IA) ⊆ Iδ′. So by Proposition 7.6, it follows that M12 has a unique normal inverse. □
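The claim that M12 of Example 7.8 is not onto can be checked by brute force. The following is a minimal sketch, not part of the original development: the encoding of tgds as pairs and the helper names chase12 and is_chase_of_some_source are assumptions made purely for illustration.

```python
from itertools import chain, combinations

# The six full s-t tgds of Example 7.8. Every atom is unary over the same
# variable x, so each tgd is encoded as (premise relations, conclusion relation).
SIGMA_12 = [
    (("P1",), "Q1"), (("P2",), "Q2"), (("P3",), "Q3"), (("P4",), "Q4"),
    (("P1", "P2"), "R"), (("P3", "P4"), "R"),
]

def chase12(source):
    """Chase a set of source facts (relation, constant) with SIGMA_12."""
    constants = {c for (_, c) in source}
    return {
        (conclusion, c)
        for premise, conclusion in SIGMA_12
        for c in constants
        if all((p, c) in source for p in premise)
    }

def is_chase_of_some_source(target):
    """Search for a source instance I with chase12(I) == target.

    Since Pi(c) always yields Qi(c), only constants that occur in `target`
    can occur in such an I, so the search space is finite.
    """
    constants = sorted({c for (_, c) in target})
    candidates = [(f"P{i}", c) for i in range(1, 5) for c in constants]
    subsets = chain.from_iterable(
        combinations(candidates, r) for r in range(len(candidates) + 1)
    )
    return any(chase12(set(s)) == target for s in subsets)

print(is_chase_of_some_source({("Q1", 0), ("Q2", 0)}))            # False
print(is_chase_of_some_source({("Q1", 0), ("Q2", 0), ("R", 0)}))  # True
```

The first call fails because any source instance producing Q1(0) and Q2(0) contains P1(0) and P2(0), which forces R(0) into the chase.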

Can a nonfull s-t tgd mapping have a unique normal inverse? The next example shows that the answer is “yes”.

Example 7.9 Let S1 consist of two unary relation symbols P and Q, and let S2 consist of the binary relation symbol R. Let Σ12 consist of the s-t tgds P(x) → ∃yR(x, y), Q(y) → ∃xR(x, y), and P(x) ∧ Q(y) → R(x, y). Note that the first tgd is not a logical consequence of the third tgd, since if the P relation in the source is nonempty and the Q relation in the source is empty, then the first tgd fires but the third tgd does not fire. Similarly, the second tgd is not a logical consequence of the third tgd. Let M12 = (S1, S2, Σ12), and let Mc21 be the canonical candidate inverse of M12. Thus, Mc21 = (S2, S1, Σc21), where Σc21 consists of the normal constraints R(x, y) ∧ const(x) → P(x) and R(x, y) ∧ const(y) → Q(y).

It is straightforward to verify that Mc21 is an inverse of M12. We now show that Mc21 is the unique normal inverse of M12. Let M21 = (S2, S1, Σ21) be another normal inverse of M12. By Proposition 6.4, we know that Σ21 logically implies Σc21. So we need only show that Σc21 logically implies Σ21. Let τ be an arbitrary member δ → A of Σ21; we must show that Σc21 logically implies τ. By symmetry in the roles of P and Q, we can assume without loss of generality that the conclusion A of τ is of the form P(x), where x is a variable. By normality of τ, we know that const(x) appears in δ. We now show that there is a variable y such that the relational atom R(x, y) appears in δ. Assume not; we shall derive a contradiction.

We now define a ground instance I. For each relational atom R(v, w) in δ, let P(cv) and Q(cw) be facts of I, where cv and cw are constants, as before. Because of the tgd P(x) ∧ Q(y) → R(x, y) in Σ12, we see that the fact R(cv, cw) is in chase12(I) for each relational atom R(v, w) in δ. It follows easily that Iδ → chase12(I). Since by assumption, there is no variable y such that the relational atom R(x, y) appears in δ, it follows by construction of I that IA ⊈ I. Because Iδ → chase12(I) and IA ⊈ I, we know that δ is not demanding for A. But δ → A is in Σ21, so there is a violation of condition (1) of Theorem 5.13 (where the weak renaming f is the identity map). This is our desired contradiction. Therefore, there is a variable y such that the relational atom R(x, y) appears in δ.

Since δ contains R(x, y) and const(x), it follows easily that τ is a logical consequence of the tgd R(x, y) ∧ const(x) → P(x), which is in Σc21. So Σc21 logically implies τ, as desired. This completes the proof that Mc21 is the unique normal inverse of M12.

Let Σ′12 consist of the first two s-t tgds in Σ12; that is, Σ′12 consists of the s-t tgds P(x) → ∃yR(x, y) and Q(y) → ∃xR(x, y). Let M′12 = (S1, S2, Σ′12). The curious reader may wonder whether M′12, like M12, has a unique normal inverse. The answer is “no”, as we now show. The canonical candidate inverse of M′12 is the same as the canonical candidate inverse of M12, and it is straightforward to see that this canonical candidate inverse is indeed an inverse of M′12. We now give another, inequivalent inverse. Let Σ′21 consist of the normal constraints R(x, y) ∧ const(x) → P(x) and R(x, y) ∧ const(y) → Q(y), as before, along with a third normal constraint R(x, x) ∧ R(x, y) ∧ const(y) → P(y). Let M′21 = (S2, S1, Σ′21). Since the conclusion of the third constraint is P(y) rather than Q(y), this third constraint is not a logical consequence of the first two. However, since R(x, x) does not arise in a chase with Σ′12, it is easy to see that chase′21(chase′12(I)) = I for each ground instance I, and so M′21 is another normal inverse of M′12. □
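The round trip through the mapping specified by the first two tgds (P(x) → ∃yR(x, y) and Q(y) → ∃xR(x, y)) and then through Σ′21 can be illustrated on a sample instance. This is only a sketch under assumed encodings: the Null class, the helper names, and the representation of R-facts as pairs are hypothetical and not taken from the paper. Each existential quantifier introduces a fresh null, so no fact of the form R(x, x) is ever produced and the third constraint of Σ′21 never fires.

```python
from itertools import count

_fresh = count()

class Null:
    """A labeled null; two nulls are equal only if they are the same object."""
    def __init__(self):
        self.label = next(_fresh)
    def __repr__(self):
        return f"n{self.label}"

def is_const(value):
    return not isinstance(value, Null)

def chase12_prime(source):
    """Chase with P(x) -> EXISTS y R(x, y) and Q(y) -> EXISTS x R(x, y)."""
    r_facts = set()
    for relation, c in source:
        if relation == "P":
            r_facts.add((c, Null()))
        elif relation == "Q":
            r_facts.add((Null(), c))
    return r_facts

def chase21_prime(r_facts):
    """Chase R-facts with the three normal constraints of the second inverse."""
    source = set()
    for x, y in r_facts:
        if is_const(x):          # R(x, y) AND const(x) -> P(x)
            source.add(("P", x))
        if is_const(y):          # R(x, y) AND const(y) -> Q(y)
            source.add(("Q", y))
    for x, x2 in r_facts:        # R(x, x) AND R(x, y) AND const(y) -> P(y)
        if x == x2:
            for x3, y in r_facts:
                if x3 == x and is_const(y):
                    source.add(("P", y))
    return source

I = {("P", "a"), ("Q", "a"), ("Q", "b")}
print(chase21_prime(chase12_prime(I)) == I)
```

A single instance proves nothing in general, of course; the point is only to make visible why the extra constraint is harmless on chase results.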

As we see from Example 7.8, being invertible and onto is not a necessary and sufficient condition for a full s-t tgd mapping to have a unique normal inverse. Is there a language with a richer set of constructs such that being invertible and onto is a necessary and sufficient condition for a full s-t tgd mapping to have a unique inverse in this language? We now give such a language.

Definition 7.10 A disjunctive tgd with inequalities is a constraint of the form α ∧ η → β, where α is a conjunction of source atoms, β is a disjunction of formulas of the form ∃zβ′ with β′ a conjunction of target atoms, and η is a conjunction (possibly empty) of inequalities of the form x ≠ y for distinct variables x, y of β that are not existentially quantified. Further, there is the safety condition that every variable in β that is not existentially quantified must appear in α. Again, we have suppressed writing the leading universal quantifiers. □

Disjunctive tgds with inequalities were defined in [FKPT08], where they were shown to be rich enough to specify quasi-inverses of quasi-invertible full s-t tgd mappings.⁶ It was also shown there that inequalities in the premise, and both disjunctions and existential quantifiers in the conclusion, are needed in general to specify quasi-inverses of quasi-invertible full s-t tgd mappings. Note that const formulas are not part of the syntax. Every invertible full s-t tgd mapping has an inverse specified in this language, even without the disjunctions, namely the canonical candidate inverse with the const formulas dropped. It is all right to drop the const formulas because of a simple result in [FKPT08], which we now state, and which says that const formulas play no role for inverses of full s-t tgd mappings.

Proposition 7.11 [FKPT08] Let M12 = (S1, S2, Σ12) be a full s-t tgd mapping. Let M21 = (S2, S1, Σ21), where Σ21 is a set of disjunctive s-t tgds with constants and inequalities. Let M′21 = (S2, S1, Σ′21), where Σ′21 is obtained from Σ21 by removing every const formula. Let I and J be ground instances. Then (I, J) ⊨ Σ12 ◦ Σ21 if and only if (I, J) ⊨ Σ12 ◦ Σ′21. In particular, M21 is an inverse of M12 if and only if M′21 is an inverse of M12.

We now give a variant of Proposition 4.6 that holds for inverses specified by disjunctive tgds with inequalities and that we shall find useful later.

⁶ In the definition of disjunctive tgds with inequalities that is given in [FKPT08], inequalities x ≠ y are allowed in the premise for any pair x, y of variables in the premise, and not just for variables x, y that appear in the conclusion and are not existentially quantified. However, the disjunctive tgds with inequalities used in their construction of quasi-inverses all satisfy our restriction on variables that may appear in inequalities. This restriction is not needed for our results here, but we make this restriction by analogy with our restriction on inequalities that we have for normal mappings, that the inequalities must involve only variables in the conclusion.


Proposition 7.12 Assume that M12 = (S1, S2, Σ12) is a full s-t tgd mapping, M21 = (S2, S1, Σ21) is an inverse of M12, and Σ21 is a set of disjunctive tgds with inequalities. Let I be a ground instance, and let U = chase12(I). Then (U, I) ⊨ Σ21, and U witnesses (I, J) ⊨ Σ12 ◦ Σ21 when I ⊆ J.

Proof Since M21 is an inverse of M12, we know that (I, I) ⊨ Σ12 ◦ Σ21 and therefore there exists some K such that (I, K) ⊨ Σ12 and (K, I) ⊨ Σ21. Since U is a universal solution for I with respect to M12, there is a homomorphism h : U → K that is the identity on I. Pick a constraint ϕ ∈ Σ21; by assumption, it must be of the form

α(x, y) ∧ η(x) → ψ(x),

where η(x) is a conjunction of inequalities (possibly empty) among the variables in x, and where ψ(x) is a disjunction of existentially-quantified conjunctions, and where the variables in ψ(x) that are not existentially quantified are exactly the members of x. Assume that U satisfies the premise of ϕ on a, b. That is, U ⊨ α(a, b) ∧ η(a). Since h is a homomorphism from U to K, we have that K ⊨ α(h(a), h(b)). Now every member of U is a constant, since U = chase12(I) and Σ12 is full. Therefore h(a) = a and h(b) = b, and so K ⊨ α(a, b) ∧ η(a). Since (K, I) ⊨ Σ21, we must have I ⊨ ψ(a). This shows that (U, I) ⊨ ϕ. Since ϕ is an arbitrary member of Σ21, it follows that (U, I) ⊨ Σ21, as desired. Since I ⊆ J, it follows easily that (U, J) ⊨ Σ21. Since (I, U) ⊨ Σ12 and (U, J) ⊨ Σ21, we have that U witnesses (I, J) ⊨ Σ12 ◦ Σ21, as desired. □

Recall that the copy mapping, which is used to define the inverse, is the schema mapping Id = (S, S, ΣId), where ΣId consists of the s-t tgds R(x) → R(x) as R ranges over the relation symbols in S. We now define a p-copy mapping (where the p stands for “pseudo” or “permutation”) that is a generalization of the copy mapping.

Definition 7.13 The schema mapping (S, T, Σ) is a p-copy mapping if:

1. Every member of Σ is of the form

P(x1, . . . , xk) → Q(xf(1), . . . , xf(k)),

where P is a source relation symbol, Q is a target relation symbol, x1, . . . , xk are distinct variables, and f is a permutation of {1, . . . , k}.

2. Every source relation symbol appears in exactly one premise of Σ.

3. Every target relation symbol appears in exactly one conclusion of Σ. □

For example, assume that S1 consists of the binary relation symbol P1 and the ternary relation symbol P2, and S2 consists of the binary relation symbol Q1 and the ternary relation symbol Q2. Assume that Σ12 consists of the s-t tgds P1(x, y) → Q1(y, x) and P2(x, y, z) → Q2(y, x, z). Then (S1, S2, Σ12) is a p-copy mapping.
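The three conditions of Definition 7.13 are purely syntactic, so they can be checked directly. The following sketch assumes a hypothetical encoding in which a schema is a dictionary from relation names to arities and each tgd is a tuple (P, premise variables, Q, conclusion variables); the function name is_p_copy is likewise an illustrative choice.

```python
from collections import Counter

def is_p_copy(source_schema, target_schema, sigma):
    """Check the three conditions of the p-copy definition syntactically.

    source_schema, target_schema: dict mapping relation name -> arity.
    sigma: list of tgds, each encoded as (P, premise_vars, Q, conclusion_vars).
    """
    for p, xs, q, ys in sigma:
        # Relation symbols must come from the right schemas, with the right arity.
        if source_schema.get(p) != len(xs) or target_schema.get(q) != len(ys):
            return False
        # The premise variables x1, ..., xk must be distinct.
        if len(set(xs)) != len(xs):
            return False
        # The conclusion must list the same variables in permuted order.
        if sorted(ys) != sorted(xs):
            return False
    premises = Counter(p for p, _, _, _ in sigma)
    conclusions = Counter(q for _, _, q, _ in sigma)
    # Each source symbol in exactly one premise, each target symbol in
    # exactly one conclusion.
    return (all(premises[p] == 1 for p in source_schema)
            and all(conclusions[q] == 1 for q in target_schema))

# The example from the text: P1(x, y) -> Q1(y, x) and P2(x, y, z) -> Q2(y, x, z).
example = [
    ("P1", ("x", "y"), "Q1", ("y", "x")),
    ("P2", ("x", "y", "z"), "Q2", ("y", "x", "z")),
]
print(is_p_copy({"P1": 2, "P2": 3}, {"Q1": 2, "Q2": 3}, example))
```

Note that this tests the syntactic form of a given Σ only; it does not decide whether a mapping is equivalent to some p-copy mapping.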

The next theorem says that disjunctive tgds with inequalities form a rich enough language that a full s-t tgd mapping has a unique inverse in this language if and only if it is invertible and onto. It also says that the only full s-t tgd mappings with these properties are p-copy mappings.

Theorem 7.14 Let M12 = (S1, S2, Σ12) be a full s-t tgd mapping. The following are equivalent.

1. M12 has a unique inverse specified by disjunctive tgds with inequalities.

2. M12 is invertible and onto.

3. M12 is equivalent to a p-copy mapping.


Proof We begin by showing that (3) ⇒ (2). From (3), we know that there is a schema mapping M′12 that is equivalent to M12 and that is a p-copy mapping. Clearly, M′12 is invertible and onto. It follows easily that M12 is invertible and onto, as desired.

We now show that (2) ⇒ (1). Assume that (2) holds. Since M12 is invertible, Theorem 6.3 tells us that the canonical candidate inverse is indeed an inverse of M12, so M12 has a normal inverse. By Proposition 7.11, the const formulas are irrelevant, and so M12 has an inverse specified by tgds with inequalities, and hence by disjunctive tgds with inequalities. Now assume that M21 = (S2, S1, Σ21) and M′21 = (S2, S1, Σ′21) are both inverses of M12, where Σ21 and Σ′21 are disjunctive tgds with inequalities. We must show that Σ21 and Σ′21 are logically equivalent. We now show that Σ21 logically implies Σ′21. By symmetry, we have that Σ′21 logically implies Σ21. Assume that (J, K) ⊨ Σ21. We must show that (J, K) ⊨ Σ′21. By replacing each null in (J, K) by a new constant, we obtain (J′, K′) where every entry of every tuple is a constant, such that (J′, K′) is isomorphic to (J, K) (but where the isomorphism may map constants into either constants or nulls, and may map nulls into either constants or nulls). Since Σ21 has no const formulas, it follows easily that (J′, K′) ⊨ Σ21. Since M12 is onto, there is a ground instance I such that J′ = chase12(I). So (I, J′) ⊨ Σ12. Since also (J′, K′) ⊨ Σ21, we have that (I, K′) ⊨ Σ12 ◦ Σ21. Therefore, since M21 is an inverse of M12, we have that I ⊆ K′. Hence, since M′21 is an inverse of M12, we have that (I, K′) ⊨ Σ12 ◦ Σ′21. So by Proposition 7.12, we have that (J′, K′) ⊨ Σ′21. Since Σ′21 has no const formulas, we have as before (J, K) ⊨ Σ′21. This was to be shown.

We conclude by showing that (1) ⇒ (3). Assume that (1) holds. We must show that M12 is equivalent to a p-copy mapping. By Theorem 5.19, we know that every source atom has an essential target atom. Let e be a function that maps every prime source atom A onto a target atom e(A) that is essential for A. By Theorem 5.15, we know that (S2, S1, Σe21) is an inverse of M12. Let ΣE21 be the result of removing every const formula from every member of Σe21, and let ME21 = (S2, S1, ΣE21). Note that every member of ΣE21 is of the form B ∧ ηA → A, where B is an essential atom for A, and where ηA consists of all inequalities of the form x ≠ y where x and y are distinct variables of A. Note that by Proposition 5.20, the variables in A and B are the same, so ηA has inequalities among all distinct variables of B also. By Proposition 7.11 we know that ME21 is an inverse of M12. Since (1) holds, ME21 is the unique inverse that is specified by disjunctive tgds with inequalities. We now prove the following claim:

Claim 1: chase12(IA) is a singleton for each source atom A.

Assume that A is a source atom where chase12(IA) is not a singleton; we shall derive a contradiction. Denote e(A) by B. Thus, B is essential for A. Since B is relevant for A, we have that IB ⊆ chase12(IA), so chase12(IA) is nonempty, and hence has more than one member. Now ΣE21 contains the formula B ∧ ηA → A. Let νA be as in Definition 6.1. In particular, νA contains B and at least one more distinct relational atom B′. Form Σ21 from ΣE21 by replacing B ∧ ηA → A by νA ∧ ηA → A, and let M21 = (S2, S1, Σ21). Since B is demanding for A, it is easy to see that νA is demanding for A. It follows from Theorem 5.13 (and the fact that every weak renaming consistent with ηA is a strict renaming, since A and νA have the same variables) that M21 is an inverse of M12. We now show that Σ21 is not logically equivalent to ΣE21.

Assume that B is an atom Q(x1, . . . , xt), and that F is a fact Q(a1, . . . , at). Let us say that F is an exact match for B if ai = aj if and only if xi and xj are the same variable, for all i, j. Similarly, we define what it means for one atom to be an exact match for another atom (with the same relational symbol). Let J consist of a single fact F that is an exact match for B. We now show that (J, ∅) ⊭ ΣE21, but (J, ∅) ⊨ Σ21. So Σ21 is not logically equivalent to ΣE21, as desired. The fact that (J, ∅) ⊭ ΣE21 follows from the fact that ΣE21 contains the formula B ∧ ηA → A, and J contains a fact that is an exact match for B. It remains to show that (J, ∅) ⊨ Σ21. Let σ be a member of Σ21 other than νA ∧ ηA → A. Since J consists of a single fact that is an exact match for B, it follows that σ does not fire on J, because otherwise F would be an exact match for the atom B′ in the premise of σ, and so B′ and B would be an exact match, which is not possible since they are essential for atoms that are not an exact match for each other. So (J, ∅) ⊨ σ. We now show that (J, ∅) ⊨ νA ∧ ηA → A. To show this, we must show that νA ∧ ηA → A does not fire on J. If it were to fire on J, then there would be a mapping h on the variables in νA that maps each atom in νA onto F and (because of ηA) is one-to-one on the variables in νA. Recall that B′ is a relational atom in νA other than B. Since B and B′ map onto the same fact F, it follows that B′ is, like B, a Q-atom. Assume that B′ is Q(xi1, . . . , xit), where each xir is in {x1, . . . , xt}. Assume that F is the fact Q(a1, . . . , at). Now h(xir) = ar = h(xr) for each r, since both B and B′ map onto F. Since h is one-to-one on variables, it follows that xir and xr are the same variable for each r. So B′ and B are the same atom, a contradiction. This contradiction shows that νA ∧ ηA → A does not fire on J, as desired. This concludes the proof that Σ21 is not logically equivalent to ΣE21.
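The notion of an exact match used in this argument is stronger than the existence of a homomorphism from the atom onto the fact: equal positions must carry equal values, and distinct variables must carry distinct values. A minimal sketch, assuming a hypothetical encoding of atoms and facts as pairs (relation name, tuple of arguments):

```python
def is_exact_match(fact, atom):
    """Return True if `fact` is an exact match for `atom`.

    fact: (relation, tuple of constants); atom: (relation, tuple of variables).
    The fact matches exactly when a_i == a_j holds if and only if x_i and x_j
    are the same variable, for all positions i, j.
    """
    (fact_rel, values), (atom_rel, variables) = fact, atom
    if fact_rel != atom_rel or len(values) != len(variables):
        return False
    n = len(values)
    return all(
        (values[i] == values[j]) == (variables[i] == variables[j])
        for i in range(n)
        for j in range(n)
    )

B = ("Q", ("x", "y", "x"))
print(is_exact_match(("Q", (1, 2, 1)), B))  # True
print(is_exact_match(("Q", (1, 1, 1)), B))  # False: x and y are distinct variables
print(is_exact_match(("Q", (1, 2, 3)), B))  # False: first and third positions differ
```

The second call shows the difference from a homomorphic image: Q(1, 1, 1) is the image of Q(x, y, x) under a mapping that sends both variables to 1, but it is not an exact match, which mirrors the role of the inequalities ηA in the premise.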

Since M21 and ME21 are both inverses of M12, specified by tgds with inequalities, even though Σ21 and ΣE21 are not logically equivalent, this contradicts our assumption that (1) holds. This proves Claim 1.

Define Σ′12 to consist of all s-t tgds of the form A → e(A), where A is a prime atom with all variables distinct, and where e(A) is as before (in particular, e(A) is an essential atom for A). Note for later use that the variables in the premise A and conclusion e(A) are the same by Proposition 5.20. Since e(A) is essential for A, it follows that e(A) is relevant for A, so Ie(A) ⊆ chase12(IA). Therefore, from Claim 1, we see that Ie(A) = chase12(IA). It is clear that Σ12 logically implies Σ′12. Later, we shall show that Σ12 is logically equivalent to Σ′12. First, we prove another claim.

Claim 2: Assume that A and B are atoms, and that B is essential for A with respect to Σ12. Then Σ′12 logically implies the s-t tgd A → B.

Assume that Claim 2 were false; we shall derive a contradiction. Assume that A is a P-atom. Define ν′A like νA, except that the chase is with Σ′12 instead of Σ12. Let B′ be ν′A. Note that B′ is a singleton atom, because it arises only by firing the s-t tgd in Σ′12 whose premise is the P-atom with all variables distinct. Since Σ′12 does not logically imply the s-t tgd A → B, we know that B is different from B′. Since B′ is derived as the result of a chase with Σ′12, and since Σ12 logically implies Σ′12, it follows that B′ is in νA. So νA contains at least the two distinct atoms B and B′. This contradicts Claim 1, which is our desired contradiction.

Claim 3: Σ12 is logically equivalent to Σ′12.

We already noted that Σ12 logically implies Σ′12, so Claim 3 is proven if we show that Σ′12 logically implies Σ12. Assume not; we shall derive a contradiction. We can assume without loss of generality that every member of Σ12 has a singleton conclusion. Let α → B be a member of Σ12 that is not a logical consequence of Σ′12. We now show that there is no atom A such that B is essential for A with respect to Σ12. Assume that there were. By Claim 2, it follows that Σ′12 logically implies A → B. Since α → B is a member of Σ12, we have IB ⊆ chase12(Iα). Therefore, since B is demanding for A with respect to Σ12, it follows that IA ⊆ Iα, that is, the atom A appears in α. Hence, since Σ′12 logically implies A → B, we have that Σ′12 logically implies α → B, a contradiction. This contradiction shows that there is no atom A such that B is essential for A with respect to Σ12.

Assume that B is the atom Q(x1, . . . , xt) where x1, . . . , xt are variables (not necessarily distinct). Let τ be an arbitrary member of Σ12 of the form δ → Q(z1, . . . , zt), where z1, . . . , zt are variables, not necessarily distinct, and where xi and xj are the same variable whenever zi and zj are the same variable. Let hτ be a function with domain the variables in τ such that hτ(zi) = xi for each i (this is well-defined, since xi and xj are the same variable whenever zi and zj are the same variable), and where hτ maps each variable in the premise δ of τ that is not in the conclusion Q(z1, . . . , zt) of τ onto a new variable. Let τ′ be the image of τ under hτ. Thus, τ′ is a weak renaming of τ. Note that τ′ is not necessarily a strict renaming of τ, since two distinct variables zi and zj will map onto the same variable if xi and xj are the same variable. It is straightforward to verify that if I is a source instance and the chase of I with τ produces a fact F that is an exact match for B, then the chase of I with τ′ also produces F.

By construction, the conclusion of τ′ is B. Assume that the premise of τ′ is δ′. Let ψτ be the formula ∃yδ′, where y consists of the variables in τ′ that are not in B. Let Z consist of all such formulas ψτ. In particular, Z contains ∃yα, where y consists of all variables in α that are not in B. Now Z is finite, since its size is at most the number of members of Σ12. Let γ be the formula B ∧ η → ζ, where η is the conjunction of all inequalities of the form xi ≠ xj where xi and xj are distinct variables in B (and hence distinct variables in α), and where ζ is the disjunction of the members of Z. Let Σ21 = ΣE21 ∪ {γ}, and let M21 = (S2, S1, Σ21).

We now show that M21 is an inverse of M12, and that Σ21 is not logically equivalent to ΣE21. To show that M21 is an inverse of M12, we must show that for all ground instances I and J:

(I, J) ⊨ Σ12 ◦ Σ21 if and only if I ⊆ J. (4)

Since ME21 is an inverse of M12, we know that for all ground instances I and J:

(I, J) ⊨ Σ12 ◦ ΣE21 if and only if I ⊆ J. (5)

Now Σ21 logically implies ΣE21, since Σ21 is a superset of ΣE21. It follows easily that Σ12 ◦ Σ21 logically implies Σ12 ◦ ΣE21. So if (I, J) ⊨ Σ12 ◦ Σ21, then (I, J) ⊨ Σ12 ◦ ΣE21, which, by (5), implies that I ⊆ J. Assume now that I ⊆ J; we must show that (I, J) ⊨ Σ12 ◦ Σ21. Let J∗ = chase12(I). Since (I, J∗) ⊨ Σ12, we need only show that (J∗, J) ⊨ Σ21. Now (J∗, I) ⊨ ΣE21, by Proposition 7.12, and so (J∗, J) ⊨ ΣE21 since I ⊆ J. Therefore, since Σ21 = ΣE21 ∪ {γ}, we need only show that (J∗, J) ⊨ γ. Assume that γ fires on J∗. Then there is a one-to-one mapping h (one-to-one because of η) from the variables of B to constants, that maps B onto a fact F of J∗. So F is an exact match for B. Since F is in J∗, there is a member τ of Σ12 that generates F in the chase of I with Σ12. Let τ′ and δ′ be as before. Since F is an exact match for B, it follows from an earlier comment that the chase of I with τ′ generates F. It follows fairly easily that ∃yδ′ is satisfied in I under h. Since I ⊆ J, it follows that ∃yδ′ is satisfied in J under h. But ∃yδ′ is a disjunct in the conclusion of γ. Therefore, (J∗, J) ⊨ γ, as desired. This concludes the proof that M21 is an inverse of M12.

We now show that Σ21 is not logically equivalent to ΣE21. It is clear that (IB, ∅) ⊭ γ, and so (IB, ∅) ⊭ Σ21. We now show that (IB, ∅) ⊨ ΣE21. Since, as we showed, there is no atom A such that B is essential for A with respect to Σ12, no member of ΣE21 has a strict renaming of B in its premise. So for each member B′ ∧ ηA′ → A′ of ΣE21, there is no mapping h that maps B′ onto B and satisfies ηA′. It follows that (IB, ∅) ⊨ ΣE21, as desired.

We have shown that M12 has two distinct, inequivalent inverses given by disjunctive tgds with inequalities, namely ME21 and M21. This contradiction shows that Claim 3 holds.

We now state and prove our final claim.

Claim 4: For every target relation symbol Q, there is exactly one member of Σ′12 whose conclusion is a Q-atom. No two variables appearing in this Q-atom are the same.

To prove this claim, we begin by showing that there must be at least one member of Σ′12 whose conclusion is a Q-atom, where no two variables appearing in this Q-atom are the same. Assume not; we shall derive a contradiction. Let γ be the formula Q(x1, . . . , xt) ∧ η → β, where the variables x1, . . . , xt are distinct, where η is a conjunction of the inequalities xi ≠ xj whenever i ≠ j, and where β is an arbitrary disjunction of source atoms whose variables altogether are exactly x1, . . . , xt.⁷ Let Σ21 = ΣE21 ∪ {γ}, and let M21 = (S2, S1, Σ21).

We now show that M21 is an inverse of M12, and that Σ21 is not logically equivalent to ΣE21. Let J∗ = chase12(I). Let K = chase′12(I), the result of chasing I with Σ′12. Since (by Claim 3) Σ12 is logically equivalent to Σ′12, and since both Σ12 and Σ′12 are full, it follows that K = J∗ (both K and J∗ are the unique, null-free core of the universal solutions for I). As in the proof of Claim 3, to show that M21 is an inverse of M12, we need only show that (J∗, J) ⊨ γ. Since by assumption there is no member of Σ′12 whose conclusion is a Q-atom with no two variables appearing in this Q-atom the same, and since, as we showed, J∗ = chase′12(I), it follows easily that J∗ does not contain a Q-fact Q(c1, . . . , ct), where c1, . . . , ct are distinct constants. So γ does not fire on J∗, and so (J∗, J) ⊨ γ, as desired.

We now show that Σ21 is not logically equivalent to ΣE21. Let J consist of the singleton fact Q(c1, . . . , ct), where c1, . . . , ct are distinct constants. Then (J, ∅) ⊭ γ, and so (J, ∅) ⊭ Σ21. We now show that (J, ∅) ⊨ ΣE21. Each member of ΣE21 is of the form e(A) ∧ ηA → A. Since, by assumption, there is no member of Σ′12 whose conclusion is a Q-atom with no two variables appearing in this Q-atom the same, and since each atom in the premise of a member of ΣE21 is a conclusion of a member of Σ′12, it follows that no atom in a premise of a member of ΣE21 is a Q-atom with no two variables appearing in this Q-atom the same. Therefore, no member of ΣE21 fires on J. So (J, ∅) ⊨ ΣE21, as desired.

⁷ A disjunction is required if no source atom has arity at least t.


We have shown that M12 has two distinct, inequivalent inverses given by disjunctive tgds with inequalities, namely ME21 and M21. This contradiction shows that there must be at least one member σ of Σ′12 whose conclusion is a Q-atom with no two variables appearing in this Q-atom the same.

We now show that there can be no other member σ′ of Σ′12 whose conclusion is a Q-atom. Assume not; we shall derive a contradiction. Assume that the premise of σ is a P-atom and the premise of σ′ is a P′-atom. Since σ and σ′ are different, we know that P and P′ are different, by construction of Σ′12. Since no two variables appearing in the conclusion of σ are the same, there is a mapping h that maps the variables in σ to the variables in σ′ that maps the conclusion of σ onto the conclusion of σ′. Let A be the P-atom that is the result of applying h to the premise of σ. So the chase of IA with Σ′12 is IB′, where B′ is the conclusion of σ′. This contradicts the fact that the conclusion of σ′ is essential for the premise of σ′. This contradiction shows that the only member of Σ′12 whose conclusion is a Q-atom is σ, where every variable in the conclusion is distinct. This completes the proof of Claim 4.

As we noted earlier, the variables in the source and target of each member of Σ′12 are the same by Proposition 5.20, since the target is essential for the source with respect to Σ12. By construction, for every source relation symbol P, there is exactly one member of Σ′12 whose premise is a P-atom, and every variable is distinct in this P-atom. By Claim 4, for every target relation symbol Q, there is exactly one member of Σ′12 whose conclusion is a Q-atom, and no two variables appearing in this Q-atom are the same. It follows that (S1, S2, Σ′12) is a p-copy mapping. Also, by Claim 3, Σ12 is logically equivalent to Σ′12. So (3) holds, as desired. This completes the proof that (1) ⇒ (3). □

Note that we cannot replace (2) in the statement of the theorem by simply “M12 is onto”, because of the schema mapping with source relation symbols P and R and the single target relation symbol Q, that is specified by the tgds P(x) → Q(x), R(x) → Q(x). This schema mapping is clearly onto but not invertible.
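The mapping given by P(x) → Q(x) and R(x) → Q(x) can be examined concretely. The sketch below uses hypothetical helper names (chase, preimage) and relies on two facts not restated in this section: for full tgds the solutions for I are exactly the target instances containing the chase of I, and, by [Fag07], an invertible mapping must have the unique-solutions property (distinct source instances have distinct sets of solutions).

```python
def chase(source):
    """Chase with the full tgds P(x) -> Q(x) and R(x) -> Q(x)."""
    return {("Q", c) for (relation, c) in source if relation in ("P", "R")}

def preimage(target):
    """One source instance whose chase is `target`, witnessing onto-ness."""
    return {("P", c) for (_, c) in target}

# Onto: every target instance (a set of Q-facts) is the chase of some source.
J = {("Q", 0), ("Q", 1)}
print(chase(preimage(J)) == J)

# Not invertible: two different source instances have the same chase, and
# hence the same solutions, so no inverse can tell them apart.
I1, I2 = {("P", 0)}, {("R", 0)}
print(I1 != I2 and chase(I1) == chase(I2))
```

Both lines print True, illustrating that onto-ness alone does not rule out two source instances collapsing to the same target.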

We can now give the proof of Theorem 7.7.

Proof of Theorem 7.7:

Assume that M12 = (S1, S2, Σ12) is a full s-t tgd mapping that is invertible and onto. By Theorem 7.14, we know that M12 is equivalent to a p-copy mapping. We can assume without loss of generality that M12 itself is a p-copy mapping. Since M12 is invertible, we know that the canonical candidate inverse is a normal inverse of M12. We now use Proposition 7.6 to show that M12 has a unique normal inverse. To apply Proposition 7.6, assume that A is a source atom and δ′ is a demanding conjunction for A with formulas const(x) precisely for the variables x of A; we must show that chase12(IA) ⊆ Iδ′.

If Σ12 includes the tgd P(x1, . . . , xk) → Q(xf(1), . . . , xf(k)), and if y1, . . . , yk are variables, not necessarily distinct, then let us refer to the atoms P(y1, . . . , yk) and Q(yf(1), . . . , yf(k)) as buddies. Let χA be the conjunction of the formulas const(x) for the variables x of A. Let γ be the conjunction of the buddies of the relational atoms in δ′, and let γ′ be γ ∧ χA. Since δ′ is demanding for A, and since Iδ′ = chase12(Iγ′), it follows that IA ⊆ Iγ′. So chase12(IA) ⊆ chase12(Iγ′). Clearly chase12(Iγ′) = Iδ′. Hence, chase12(IA) ⊆ Iδ′, as desired. □

Let us reconsider the schema mapping M12 from Example 7.8. It has a unique normal inverse, but since M12 is not equivalent to a p-copy mapping, it follows from Theorem 7.14 that M12 does not have a unique inverse specified by disjunctive tgds with inequalities. In addition to M21 = (S2, S1, Σ21) from Example 7.8, another inverse is specified by Σ21 along with the disjunctive tgd R(x) → (P1(x) ∨ P3(x)).
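As a sanity check on the extra disjunctive tgd, one can verify the necessary condition supplied by Proposition 7.12: for every ground instance I, the pair (chase12(I), I) must satisfy every constraint of an inverse. The sketch below does this by brute force over all source instances on two constants. The encoding and the helper names (chase12, holds_for_all_sources) are illustrative assumptions, and the check is a necessary condition only, not a proof that the constraints specify an inverse.

```python
from itertools import chain, combinations

# Full s-t tgds of Example 7.8, encoded as (premise relations, conclusion).
SIGMA_12 = [
    (("P1",), "Q1"), (("P2",), "Q2"), (("P3",), "Q3"), (("P4",), "Q4"),
    (("P1", "P2"), "R"), (("P3", "P4"), "R"),
]

def chase12(source):
    constants = {c for (_, c) in source}
    return {
        (conclusion, c)
        for premise, conclusion in SIGMA_12
        for c in constants
        if all((p, c) in source for p in premise)
    }

def holds_for_all_sources(disjuncts, constants=(0, 1)):
    """Check (chase12(I), I) |= Qi(x) -> Pi(x) and R(x) -> OR of `disjuncts`
    for every source instance I over the given constants."""
    facts = [(f"P{i}", c) for i in range(1, 5) for c in constants]
    sources = chain.from_iterable(
        combinations(facts, r) for r in range(len(facts) + 1)
    )
    for s in sources:
        source = set(s)
        for relation, c in chase12(source):
            if relation == "R":
                if not any((p, c) in source for p in disjuncts):
                    return False
            elif ("P" + relation[1:], c) not in source:
                return False
    return True

print(holds_for_all_sources(("P1", "P3")))  # True
print(holds_for_all_sources(("P1",)))       # False
```

The second call fails on the instance {P3(0), P4(0)}, whose chase contains R(0) although P1(0) is absent; this is why the conclusion needs the disjunction.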

Define a near p-copy mapping to be a full s-t tgd mapping M = (S, T, Σ) where (i) for each member σ of Σ, the premise and conclusion of σ are each singletons, with the same variables in the premise as in the conclusion, and with the variables in the conclusion all distinct, and where (ii) every member of T appears in the conclusion of exactly one member of Σ, and every member of S appears in the premise of at most one member of Σ. Thus, a near p-copy mapping may differ from being a p-copy mapping for two reasons. First, the variables in the premise are not necessarily distinct. Second, some member of S may fail to appear in Σ. By the constants-added version of an s-t tgd mapping, we mean the mapping that results by adding to the premise of every tgd the formulas const(x) for every variable x that appears in the conclusion. Returning again to Example 7.8, we see that the unique normal inverse M21 = (S2, S1, Σ21) is the constants-added version of a near p-copy mapping (it is only “near”, because the relation symbol R does not appear in Σ21). This is not a coincidence. As a consequence of a later result (Theorem 10.2) that relates the number of normal inverses to the number of constraints in an inverse, we obtain the following result, whose proof we shall give in Section 10.

Theorem 7.15 If a full s-t tgd mapping has a unique normal inverse M21, then M21 is equivalent to the constants-added version of a near p-copy mapping.
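The constants-added construction used in Theorem 7.15 is purely syntactic, and a small sketch may make it concrete. The encoding below (atoms as pairs of a relation symbol and a tuple of variable names, `const` treated as a unary atom, and the relation names `Q` and `P`) is an assumption made for illustration and is not notation from the paper.

```python
from typing import List, Tuple

# Hypothetical encoding: an atom is (relation symbol, tuple of variable names);
# a tgd is (premise atoms, conclusion atoms). "const" is treated as a unary atom.
Atom = Tuple[str, Tuple[str, ...]]
Tgd = Tuple[List[Atom], List[Atom]]


def constants_added(tgd: Tgd) -> Tgd:
    """Add const(x) to the premise for every variable x of the conclusion."""
    premise, conclusion = tgd
    variables: List[str] = []
    for _, args in conclusion:
        for v in args:
            if v not in variables:  # first-occurrence order, no duplicates
                variables.append(v)
    const_atoms = [("const", (v,)) for v in variables]
    return (list(premise) + const_atoms, list(conclusion))


def render(tgd: Tgd) -> str:
    """Pretty-print a tgd as 'premise → conclusion'."""
    def fmt(atoms: List[Atom]) -> str:
        return " ∧ ".join(f"{r}({', '.join(args)})" for r, args in atoms)
    premise, conclusion = tgd
    return f"{fmt(premise)} → {fmt(conclusion)}"


# A singleton-premise tgd over hypothetical relation symbols Q and P.
sigma = ([("Q", ("x", "y"))], [("P", ("y", "x"))])
print(render(constants_added(sigma)))
```

Applying `constants_added` to every tgd of a near p-copy mapping yields constraints of the shape that Theorem 7.15 describes.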

It is straightforward to verify that the schema mapping M21 in Example 7.9 is not equivalent to the constants-added version of a near p-copy mapping. Since also M21 is the unique normal inverse of a schema mapping, it follows that Theorem 7.15 fails in the nonfull case.

In addition to the fact that Theorem 7.14 allows disjunctions in the inverse and Theorem 7.15 does not, another difference between the two theorems is that Theorem 7.14 characterizes the mapping M12, whereas Theorem 7.15 characterizes the inverse mapping M21.

We close this section with an explanation of why const formulas are not allowed in the language for inverses used in Theorem 7.14. Would the theorem still be true if we were to enrich the language for inverses still further to be disjunctive tgds with constants and inequalities? It turns out that uniqueness is then hopeless. For example, consider the schema mapping M12 of Example 7.1. Let σ1 be the constraint S(x) ∧ const(x) → R(x), and let σ2 be the constraint S(x) → ∃y R(y). In addition to the inverse M21 given in Example 7.1, which is specified by σ1, another inverse is specified by {σ1, σ2}. The constraint σ2 is not logically implied by the constraint σ1, because of the const formula in the premise of σ1 but not σ2. More generally, if there were a full, invertible s-t tgd mapping M′12 with a unique inverse specified by disjunctive tgds with constants and inequalities, then all the more so it would have a unique inverse specified by disjunctive tgds with inequalities (there is at least one such inverse, namely the canonical candidate inverse). So from the implication (1) ⇒ (3) of Theorem 7.14, it would follow that M′12 is equivalent to a p-copy mapping. But then the obvious generalization of the construction we just gave for a second inverse of M12 of Example 7.1 would show that M′12 has inequivalent inverses specified by disjunctive tgds with constants and inequalities, a contradiction.

8 Inverse of the Inverse

In this section, we consider the question as to when a normal inverse of a schema mapping is itself invertible. Surprisingly, it turns out to be rare that a normal inverse of an s-t tgd mapping is invertible. We focus on the full case, and then show by example that our results do not hold in the nonfull case.

We now give an example of a schema mapping with an invertible normal inverse.

Example 8.1 Let M12 and M21 be as in Example 7.1. Then M21 is a normal inverse of M12, and M12 is an inverse of M21. So M21 is an invertible normal inverse of M12. □

More generally, it is straightforward to see that every p-copy mapping has an invertible normal inverse. Thus, as in Example 8.1, let M12 be a p-copy mapping, and let M21 be obtained from M12 by “reversing the arrows” and adding const formulas to the premises. Then, as in Example 8.1, M21 is a normal inverse of M12, and M12 is an inverse of M21. The next theorem tells us that p-copy mappings are the only full s-t tgd mappings with an invertible normal inverse. Before we state and prove this theorem, we need a simple lemma from [Fag07].

Lemma 8.2 [Fag07] Let M12 be an invertible s-t tgd mapping. If I1 ≠ I2, then chase12(I1) ≠ chase12(I2).
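As an informal sanity check of the “reversing the arrows” construction and of the injectivity stated in Lemma 8.2, the following sketch chases small ground instances with a toy p-copy mapping. The relation symbols `P`, `Q`, `R`, `S`, the tuple encoding, and the helper names are assumptions made for illustration. The const formulas are omitted because they hold trivially on ground instances.

```python
from itertools import combinations


def chase(instance, tgds):
    """Chase a ground instance with tgds P(x1..xk) -> Q(...) whose premise
    is a single atom with pairwise distinct variables (as in a p-copy mapping)."""
    out = set()
    for (prem_rel, prem_vars), (con_rel, con_vars) in tgds:
        for rel, values in instance:
            if rel == prem_rel and len(values) == len(prem_vars):
                binding = dict(zip(prem_vars, values))
                out.add((con_rel, tuple(binding[v] for v in con_vars)))
    return out


def reverse(tgds):
    """Swap premise and conclusion of every tgd."""
    return [(conclusion, premise) for premise, conclusion in tgds]


# Toy p-copy mapping (hypothetical): P(x, y) -> Q(y, x) and R(x) -> S(x).
sigma12 = [(("P", ("x", "y")), ("Q", ("y", "x"))),
           (("R", ("x",)), ("S", ("x",)))]
sigma21 = reverse(sigma12)

# All instances built from a small pool of source facts.
pool = [("P", (1, 2)), ("P", (2, 1)), ("R", (1,))]
instances = [frozenset(c) for n in range(len(pool) + 1)
             for c in combinations(pool, n)]

# Chasing forward and then backward returns the original instance.
round_trip_ok = all(chase(chase(I, sigma12), sigma21) == I for I in instances)

# Distinct instances have distinct chase results.
images = {frozenset(chase(I, sigma12)) for I in instances}
injective = len(images) == len(instances)

print(round_trip_ok, injective)
```

This checks only a handful of instances, so it illustrates the statements rather than proving them.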

We now give the main result of this section.

Theorem 8.3 Let M12 be a full s-t tgd mapping. Then M12 has an invertible normal inverse if and only if M12 is equivalent to a p-copy mapping.

Proof As we noted earlier, it is straightforward to see that every p-copy mapping M12 has an invertible normal inverse (in fact, M12 itself is an inverse of its normal inverse). We now prove the converse for full s-t tgd mappings. Thus, let M12 = (S1, S2, Σ12) be a full s-t tgd mapping, and let M21 = (S2, S1, Σ21) be an invertible normal inverse of M12. We shall show that M12 is equivalent to a p-copy mapping.

Whenever we speak of relevant, demanding, or essential atoms in this proof, we mean with respect to Σ12. We shall reserve A and A′ for source atoms (with relation symbols in S1), and B and B′ for target atoms (with relation symbols in S2).

Claim 1: If B is a relevant atom for a source atom A, then chase21(I_B) = I_A.

We now prove Claim 1. Assume that B is a relevant atom for A. Now chase21(I_B) is nonempty, since otherwise chase21(I_B) = chase21(∅), which is impossible by Lemma 8.2 (where M21 plays the role of M12). Since M21 is full, we know that chase21(I_B) has no nulls, and so every fact in chase21(I_B) is of the form I_A′ for some atom A′. The claim is proven if we show that whenever I_A′ ⊆ chase21(I_B), then A′ is the same atom as A. So assume that I_A′ ⊆ chase21(I_B). By Lemma 5.17, where the role of A is played by A′, we know that B is demanding for A′. Since B is also relevant for A, it follows from Lemma 5.16 that A′ is the same atom as A, as desired.

Claim 2: Each source atom A has exactly one relevant atom B, and B is essential for A.

We now prove Claim 2. Since M12 is invertible, it follows from Theorem 5.19 that A has some essential atom B. So B is relevant for A. Assume that A has another relevant atom B′; we shall derive a contradiction. By Claim 1, we have that chase21(I_B) and chase21(I_B′) both equal I_A, and so are equal. But this is impossible by Lemma 8.2 (where M21 plays the role of M12), since I_B ≠ I_B′.

Let us denote the unique relevant atom for A by B_A. For the next claim, recall that if ϕ is a formula, and f is a weak renaming, then ϕ^f is the result of replacing every variable x in ϕ by f(x).

Claim 3: Let f be a weak renaming. Then (B_A)^f = B_{A^f}.

We now prove Claim 3. Assume that A^f = A′. Since B_A is relevant for A, it is clear that the result (B_A)^f of weakly renaming B_A using f is relevant for A′. By definition, the unique relevant atom for A′ is B_A′. Therefore, (B_A)^f = B_A′ = B_{A^f}, as desired.

Claim 4: Let B be a target atom. Then B is relevant for some source atom.

We now prove Claim 4. We prove it first when every variable in B is distinct. Since M21 is invertible, we know that chase21(I_B) ≠ ∅, since otherwise chase21(I_B) = chase21(∅), which is impossible by Lemma 8.2 (where M21 plays the role of M12). So there is some member δ → A of Σ21 that fires on I_B. Hence, chase21(I_δ) includes I_A. By Claim 1, we have that chase21(I_{B_A}) = I_A. So chase21(I_{B_A}) ⊆ chase21(I_δ). Since M21 is invertible, it satisfies (the homomorphic version of) the subset property, as given in Section 3 (although the subset property and its homomorphic version are shown to be equivalent to invertibility for s-t tgd mappings, this holds also for normal mappings, by the same proof). So I_{B_A} ⊆ I_δ. Therefore, δ has B_A as a conjunct. Since the constraint δ → A of Σ21 fires on I_B, there is a homomorphism from B_A to B. Since every variable in B is distinct, it follows that B_A and B are the same up to a renaming of variables. Therefore, since B_A is relevant for A, we know that B is relevant for some atom obtained by renaming the variables of A. This completes the proof of Claim 4 when all of the variables in B are distinct.

Let B′ be a target atom where the variables need not be distinct. Let B be an atom where all of the variables are distinct and where B′ is obtained from B by a weak renaming f, that is, B′ = B^f. Since all of the variables in B are distinct, it follows from what we have shown that B is relevant for some source atom A, and so B is simply B_A. Therefore, B′ = (B_A)^f. Hence, by Claim 3, we know that B′ is B_{A^f}. So B′ is relevant for A^f.

Claim 5: Let A be a source atom with all variables distinct. Then every variable in B_A is distinct.

We now prove Claim 5. Since B_A is essential for A, it follows from Proposition 5.11 that B_A has exactly the same variables as A. We now show that every variable in B_A is distinct. Assume not; we shall derive a contradiction. Let B′ be an atom with the same relation symbol as B_A but with every variable distinct. So B′ has strictly more variables than A. Since also every variable in A is distinct, and every variable in B′ is distinct, it follows that the arity of B′ is strictly bigger than the arity of A. By Claim 4, B′ is relevant for some source atom A′. Since by Claim 2 we know that A′ has a unique relevant atom, and this atom is essential for A′, it follows that B′ is essential for A′. So by Proposition 5.11, we know that B′ and A′ have the same variables. Therefore, since every variable in B′ is distinct, the arity of A′ is at least the arity of B′, which as we noted is strictly bigger than the arity of A. So the arity of A′ is strictly bigger than the arity of A. Since B_A is obtained from B′ by a weak renaming, and B′ is relevant for A′, it follows that B_A is relevant for an atom A′′ obtained from A′ by a weak renaming. Since B_A is demanding for A, it follows from Lemma 5.16 that A′′ and A are the same atom. But this is impossible, since A′′ has the same arity as A′, and the arity of A′ is strictly bigger than the arity of A. This is our desired contradiction. This completes the proof of Claim 5.

Let Σ′12 consist of all of the constraints A → B_A, where A is a prime atom with all variables distinct. Let M′12 = (S1, S2, Σ′12).

Claim 6: Σ12 and Σ′12 are logically equivalent.

We now prove Claim 6. Clearly Σ12 logically implies Σ′12. We now show that Σ′12 logically implies Σ12. We first show that each of the constraints A′ → B_A′ is a logical consequence of Σ′12. Let A be an atom with the same relation symbol as A′ and with all variables distinct. So there is a renaming f where A′ is A^f. By Claim 3, we know that (B_A)^f = B_A′. So A′ → B_A′ is (A → B_A)^f. Therefore A′ → B_A′ is a logical consequence of A → B_A, and so of Σ′12, as desired.

Assume now that ϕ → B is a member of Σ12. By Claim 4, we know that B is B_A for some source atom A. Since chase12(I_A) is I_{B_A}, and chase12(I_ϕ) includes I_{B_A}, it follows that chase12(I_A) ⊆ chase12(I_ϕ), and therefore chase12(I_A) → chase12(I_ϕ). So by the homomorphic version of the subset property, I_A ⊆ I_ϕ. Therefore, A is in ϕ. So ϕ → B is a logical consequence of A → B, that is, of A → B_A, and so is a logical consequence of Σ′12. This completes the proof of Claim 6.

We conclude by showing that M′12 is a p-copy mapping. Let A → B_A be a member of Σ′12. By construction, every variable in A is distinct. As noted earlier, B_A has exactly the same variables as A, and by Claim 5, every variable in B_A is distinct. By construction, every source relation symbol appears in exactly one premise of Σ′12. To complete the proof that M′12 is a p-copy mapping, all that is left to show is that every target relation symbol appears in exactly one conclusion of Σ′12.

Let Q be an arbitrary target relation symbol, and let B′ be a Q-atom with every variable distinct. By Claim 4, we have that B′ is relevant for some source atom A′, and so B′ equals B_A′. Assume that A′ is a P-atom. Let A be the prime P-atom with all variables distinct. Let f be a weak renaming where A′ is A^f. By Claim 3, we know that B_A′, that is, B′, is (B_A)^f. Hence, since B′ is a Q-atom, so is B_A. So Q appears in some conclusion of Σ′12.

We now show that Q cannot be in more than one conclusion in Σ′12. Say Q were in the conclusion of the member of Σ′12 whose premise has relation symbol P and also in the conclusion of the member of Σ′12 whose premise has relation symbol P′. Let F be the fact P(0, ..., 0), where every variable is set to 0. Similarly, let F′ be the fact P′(0, ..., 0), where every variable is set to 0. Then the result of chasing F with Σ′12 is Q(0, ..., 0), and identically the result of chasing F′ with Σ′12 is Q(0, ..., 0). But this is impossible by Lemma 8.2, since Σ12 and Σ′12 are logically equivalent by Claim 6. This concludes the proof that M′12 is a p-copy mapping. □

In Theorem 7.14, we characterized full s-t tgd mappings that have a unique inverse specified by disjunctive tgds with inequalities, by showing that these are exactly those schema mappings equivalent to a p-copy mapping. In Theorem 8.3, we characterized full s-t tgd mappings with an invertible normal inverse by showing that these, too, are exactly those schema mappings equivalent to a p-copy mapping. We thereby obtain the unexpected result that a full s-t tgd mapping has a unique inverse specified by disjunctive tgds with inequalities if and only if it has an invertible normal inverse. The next theorem states this equivalence (along with the other equivalences that we obtain from Theorems 7.14 and 8.3).

Theorem 8.4 Let M12 be a full s-t tgd mapping. The following are equivalent.

1. M12 has a unique inverse specified by disjunctive tgds with inequalities.

2. M12 has an invertible normal inverse.

3. M12 is equivalent to a p-copy mapping.

4. M12 is invertible and onto.

What about the nonfull case? The technical condition (4) of Theorem 8.4 no longer makes sense then, because we have not even defined what it means for a nonfull mapping to be onto. We now show by example that the equivalence of (1) and (2) in Theorem 8.4 fails when we drop the assumption that M12 be full.

Example 8.5 Let S1 consist of the unary relation symbols P1 and P2, and let S2 consist of the unary relation symbols Q1 and Q2. Let Σ12 consist of the s-t tgds P1(x) → Q1(x) and P2(x) → ∃y(Q2(x) ∧ Q1(y)). Let Σ′12 consist of the s-t tgds P1(x) → Q1(x) and P2(x) → Q2(x). Let Σ21 consist of the normal constraints Q1(x) ∧ const(x) → P1(x) and Q2(x) ∧ const(x) → P2(x). Let Σ′21 consist of the normal constraints Q1(x) ∧ const(x) → P1(x) and Q2(x) ∧ Q1(y) ∧ const(x) → P2(x). Let M12 = (S1, S2, Σ12), let M′12 = (S1, S2, Σ′12), let M21 = (S2, S1, Σ21), and let M′21 = (S2, S1, Σ′21). It is straightforward to verify that M21 and M′21 are inequivalent normal inverses of M12, and M′12 is an inverse of M21. So condition (2) of Theorem 8.4 holds, since M21 is a normal inverse of M12, and M′12 is an inverse of M21. However, condition (1) of Theorem 8.4 fails, since M12 has two inequivalent normal inverses, namely M21 and M′21. Furthermore, it is not hard to see that condition (3) of Theorem 8.4 fails also. □

9 The Size of an Inverse, and Complexity of Computing an Inverse

In this section, we consider the question of whether there is a polynomial-size inverse in some language. The exact definition of size is not very important. For simplicity, we take the size of a relational atom to be its arity, the size of an inequality or a const formula to be 1, the size of a formula to be the sum of the sizes of the relational atoms, inequalities, and const formulas in it, and the size of a schema mapping (S, T, Σ) to be the sum of the sizes of the members of Σ. We show that there is a family of invertible, full s-t tgd mappings M where the minimal number of constraints in a normal inverse of M is exponential in the size of M (and so, since the size of a mapping is at least equal to the number of constraints in it, the normal inverse of M with the smallest size has size exponential in the size of M). We also show, however, that if we expand the language to allow the premise to contain Boolean combinations of equalities rather than simply conjunctions of inequalities, then every invertible, full s-t tgd mapping has a polynomial-size inverse that can be computed in polynomial time. Note that we cannot tell if the output of this polynomial-time algorithm is actually an inverse, since (by Corollary 3.4) the problem of deciding invertibility, even in the full case, is coNP-complete. Instead, what we know is that if the schema mapping M is invertible, then the output of this polynomial-time algorithm is an inverse of M. It is an interesting open problem as to whether similar results can be obtained for s-t tgd mappings that are not full.
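The size measure defined at the start of this section can be sketched as follows. The tagged-tuple encoding of formulas (`"atom"`, `"neq"`, `"const"`) and the function names are assumptions chosen for illustration. The example constraints are those of Σ′21 from Example 8.5.

```python
def formula_size(formula):
    """Sum of the sizes of the parts of a conjunction.

    A relational atom ("atom", name, args) has size equal to its arity;
    an inequality ("neq", x, y) and a const formula ("const", x) have size 1.
    """
    total = 0
    for part in formula:
        if part[0] == "atom":
            total += len(part[2])
        elif part[0] in ("neq", "const"):
            total += 1
        else:
            raise ValueError(f"unknown formula part: {part!r}")
    return total


def constraint_size(constraint):
    """Size of premise -> conclusion: both sides are counted."""
    premise, conclusion = constraint
    return formula_size(premise) + formula_size(conclusion)


def mapping_size(sigma):
    """Size of a schema mapping (S, T, Σ): sum over the members of Σ."""
    return sum(constraint_size(c) for c in sigma)


# Q1(x) ∧ const(x) -> P1(x)   and   Q2(x) ∧ Q1(y) ∧ const(x) -> P2(x)
sigma21_prime = [
    ([("atom", "Q1", ("x",)), ("const", "x")],
     [("atom", "P1", ("x",))]),
    ([("atom", "Q2", ("x",)), ("atom", "Q1", ("y",)), ("const", "x")],
     [("atom", "P2", ("x",))]),
]
print(mapping_size(sigma21_prime))
```

Under this measure a mapping's size is at least its number of constraints, which is the only property of the measure that the lower bound below relies on.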

Theorem 9.1 There is a family of full s-t tgd mappings, each of which is invertible, but where the minimal number of constraints in a normal inverse is exponential in the size of the schema mapping.

Proof The family is parameterized by the positive integer k. Let S1 = {P0, ..., Pk}, and let S2 = {P′0, ..., P′k, Q0, ..., Qk−1}. Assume that all of the relation symbols in S1 and S2 are 4k-ary.

Let x1, ..., x4k be distinct variables. In the rest of this proof, we also use S1, S2, and S3 as names for sets of s-t tgds; whether a schema or a set of tgds is meant is clear from context. Let S1 consist of the s-t tgds Pi(x1, x2, ..., x4k) → P′i(x1, x2, ..., x4k), for 0 ≤ i ≤ k. Define the tuple x^i, for 0 ≤ i ≤ k − 1, by letting x^i_{4i+2} = x_{4i+1}, x^i_{4i+4} = x_{4i+3}, and x^i_j = x_j if j ∉ {4i + 2, 4i + 4}. For example, x^0 is

(x1, x1, x3, x3, x5, x6, ..., x4k−1, x4k),

and x^1 is

(x1, x2, x3, x4, x5, x5, x7, x7, x9, x10, ..., x4k−1, x4k).
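The definition of the tuples x^i is easy to get wrong by one position, so a short sketch that generates them may help. The function name `x_tuple` and the string encoding of variables are assumptions made for illustration.

```python
def x_tuple(i, k):
    """Return the 4k-tuple x^i of variable names, for 0 <= i <= k - 1.

    Position 4i+2 is replaced by x_{4i+1} and position 4i+4 by x_{4i+3}
    (positions are 1-indexed, as in the text); all others are unchanged.
    """
    if not 0 <= i <= k - 1:
        raise ValueError("i must satisfy 0 <= i <= k - 1")
    xs = [f"x{j}" for j in range(1, 4 * k + 1)]
    xs[(4 * i + 2) - 1] = f"x{4 * i + 1}"
    xs[(4 * i + 4) - 1] = f"x{4 * i + 3}"
    return tuple(xs)


k = 3
for i in range(k):
    print(i, x_tuple(i, k))
```

For k = 3 the first two tuples printed agree with the examples x^0 and x^1 given above.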

Let S2 consist of the s-t tgds P_{i+1}(x^i) → P′0(x^i), for 0 ≤ i ≤ k − 1. Let S3 consist of the s-t tgds P0(x^i) → Qi(x^i), for 0 ≤ i ≤ k − 1. Let Σ12 = S1 ∪ S2 ∪ S3, and let M12 = (S1, S2, Σ12).

We begin by showing that M12 is invertible. Let T1 consist of the s-t tgds P′j(x1, x2, ..., x4k) → Pj(x1, x2, ..., x4k), for 1 ≤ j ≤ k (note that we do not include the case j = 0). Let T2 consist of the formula

P′0(x1, x2, ..., x4k) ∧ ((x1 ≠ x2) ∨ (x3 ≠ x4))
∧ ((x5 ≠ x6) ∨ (x7 ≠ x8))
∧ ...
∧ ((x4k−3 ≠ x4k−2) ∨ (x4k−1 ≠ x4k))
→ P0(x1, x2, ..., x4k).

Let T3 consist of the s-t tgds Qi(x^i) → P0(x^i), for 0 ≤ i ≤ k − 1. Let Σ21 = T1 ∪ T2 ∪ T3, and let M21 = (S2, S1, Σ21).

Note that chase21 is well-defined, even in the presence of the formula in T2.

We now show that M21 is an inverse of M12. It is sufficient to show that I = chase21(chase12(I)) for each ground instance I (this is because the analogue of Theorem 4.10 holds, by the same proof). We first show that I ⊆ chase21(chase12(I)). If Pi(a) is a fact of I, and if 1 ≤ i ≤ k, then we see from the tgds in S1 and T1 that Pi(a) is in chase21(chase12(I)). So assume that P0(a) is a fact of I. There are two cases.

Case 1: There is i with 0 ≤ i ≤ k − 1 such that a_{4i+2} = a_{4i+1} and a_{4i+4} = a_{4i+3}. Then the s-t tgd P0(x^i) → Qi(x^i) in S3 and the s-t tgd Qi(x^i) → P0(x^i) in T3 guarantee that P0(a) is a fact of chase21(chase12(I)).

Case 2: There is no i with 0 ≤ i ≤ k − 1 such that a_{4i+2} = a_{4i+1} and a_{4i+4} = a_{4i+3}. Then the s-t tgd P0(x1, x2, ..., x4k) → P′0(x1, x2, ..., x4k) in S1 and the formula in T2 guarantee that P0(a) is a fact of chase21(chase12(I)).

We now show the reverse inclusion, that chase21(chase12(I)) ⊆ I. Because the premise of each member of Σ12 and of Σ21 contains a single relational atom, we see that each fact of chase21(chase12(I)) is obtained by chasing a single fact Pi(a1, a2, ..., a4k) of I with a single member σ1 of Σ12, and then chasing the single tuple that results from this chase by a single member σ2 of Σ21. We must show that the result of this second chase is either the empty set, or is the fact Pi(a1, a2, ..., a4k).

We now consider cases.

Case 1: σ1 is the s-t tgd P0(x1, x2, ..., x4k) → P′0(x1, x2, ..., x4k) of S1. Assume that σ1 was applied to the fact P0(a1, a2, ..., a4k) of I to obtain the fact P′0(a1, a2, ..., a4k). The only member of Σ21 whose premise contains P′0 is the formula in T2, and so we may assume that σ2 is this formula. Since the conclusion of σ2 is P0(x1, x2, ..., x4k), it follows that chasing P′0(a1, a2, ..., a4k) with σ2 gives either the empty set or the fact P0(a1, a2, ..., a4k), as desired.

Case 2: σ1 is the s-t tgd Pi(x1, x2, ..., x4k) → P′i(x1, x2, ..., x4k) of S1, for some i with 1 ≤ i ≤ k. Assume σ1 was applied to the fact Pi(a1, a2, ..., a4k) of I to obtain the fact P′i(a1, a2, ..., a4k). The only member of Σ21 whose premise contains P′i is the tgd P′i(x1, x2, ..., x4k) → Pi(x1, x2, ..., x4k), and so we may assume that σ2 is this tgd. Clearly, chasing P′i(a1, a2, ..., a4k) with σ2 gives the fact Pi(a1, a2, ..., a4k), as desired.

Case 3: σ1 is the s-t tgd P_{i+1}(x^i) → P′0(x^i) of S2, for some i with 0 ≤ i ≤ k − 1. Assume that σ1 was applied to the fact P_{i+1}(a1, a2, ..., a4k) of I to obtain the fact P′0(a1, a2, ..., a4k). So necessarily a_{4i+2} = a_{4i+1} and a_{4i+4} = a_{4i+3}. The only member of Σ21 whose premise contains P′0 is the formula in T2, and so we may assume that σ2 is this formula. But this formula is not fired by P′0(a1, a2, ..., a4k), since a_{4i+2} = a_{4i+1} and a_{4i+4} = a_{4i+3}. So this case is not possible.

Case 4: σ1 is the s-t tgd P0(x^i) → Qi(x^i) of S3, for some i with 0 ≤ i ≤ k − 1. Assume that σ1 was applied to the fact P0(a1, a2, ..., a4k) of I to obtain Qi(a1, a2, ..., a4k). The only member of Σ21 whose premise contains Qi is the s-t tgd Qi(x^i) → P0(x^i) of T3, so we may assume that σ2 is this s-t tgd. Clearly, the result of chasing Qi(a1, a2, ..., a4k) with σ2 is either the empty set or the fact P0(a1, a2, ..., a4k), as desired.

This concludes the proof that chase21(chase12(I)) ⊆ I, which was the final step in the proof that M21 is an inverse of M12.
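For small k, the identity I = chase21(chase12(I)) can also be checked by brute force. The sketch below hard-codes the tgd sets S1, S2, S3 and T1, T2, T3 of this proof as conditions on the tuple a; the fact encoding `(symbol, index, tuple)` and all function names are assumptions made for illustration. Since every premise contains a single relational atom, the chase of an instance is the union of the chases of its facts, so it is enough to test single-fact instances.

```python
from itertools import product


def block_equal(a, i):
    """True if a_{4i+1} = a_{4i+2} and a_{4i+3} = a_{4i+4} (1-indexed)."""
    return a[4 * i] == a[4 * i + 1] and a[4 * i + 2] == a[4 * i + 3]


def chase12(instance, k):
    out = set()
    for _sym, idx, a in instance:          # source facts are ("P", idx, a)
        out.add(("P'", idx, a))            # S1: P_idx(x) -> P'_idx(x)
        if idx >= 1 and block_equal(a, idx - 1):
            out.add(("P'", 0, a))          # S2: P_{i+1}(x^i) -> P'_0(x^i)
        if idx == 0:
            for i in range(k):
                if block_equal(a, i):
                    out.add(("Q", i, a))   # S3: P_0(x^i) -> Q_i(x^i)
    return out


def chase21(instance, k):
    out = set()
    for sym, idx, a in instance:
        if sym == "P'" and idx >= 1:
            out.add(("P", idx, a))         # T1: P'_j(x) -> P_j(x), j >= 1
        if sym == "P'" and idx == 0 and not any(block_equal(a, i) for i in range(k)):
            out.add(("P", 0, a))           # T2: every block has an inequality
        if sym == "Q" and block_equal(a, idx):
            out.add(("P", 0, a))           # T3: Q_i(x^i) -> P_0(x^i)
    return out


def is_inverse_on_single_facts(k, domain):
    """Check I = chase21(chase12(I)) for every single-fact instance over domain."""
    for idx in range(k + 1):
        for a in product(domain, repeat=4 * k):
            instance = {("P", idx, a)}
            if chase21(chase12(instance, k), k) != instance:
                return False
    return True


print(is_inverse_on_single_facts(1, range(4)))
print(is_inverse_on_single_facts(2, range(2)))
```

A domain of four constants covers every equality pattern of a 4-tuple when k = 1; for k = 2 the two-element domain is only a partial check.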

We now show that the size of the smallest normal inverse of M12 is exponential in the size of M12. Assume that M′21 = (S2, S1, Σ′21) is a normal inverse of M12. It follows from Theorem 4.10 that for every ground instance I:

I = chase′21(chase12(I)). (6)

Let us refer to 4i + 1 and 4i + 3 as buddies, for 0 ≤ i ≤ k − 1. Let a = (a1, ..., a4k) be a 4k-tuple of constants. Let us call a special if:

1. for each pair i1, i2 of buddies, exactly one of the equalities a_{i1} = a_{i1+1} or a_{i2} = a_{i2+1} holds; and

2. these are the only equalities among members of a (that is, if a_i = a_j for distinct values i, j, then there is an odd t such that {i, j} = {t, t + 1}).

Let the equality profile of a be the 2k-tuple (ρ1, ρ3, ρ5, ..., ρ4k−1) where ρi = 0 if a_i = a_{i+1}, and ρi = 1 if a_i ≠ a_{i+1}. Let us say that an equality profile is special if it is the equality profile of a special tuple a. It is easy to see that an equality profile is special precisely if exactly one of ρ1 or ρ3 is 0, exactly one of ρ5 or ρ7 is 0, etc.
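The following sketch enumerates the special equality profiles for a given k and builds one special tuple for each, under the characterization just stated. Profiles are encoded as 0/1 tuples of length 2k, listing ρ1, ρ3, ..., ρ4k−1 in order; this encoding and the function names are assumptions made for illustration.

```python
from itertools import product


def special_profiles(k):
    """All profiles in which exactly one entry of each buddy pair is 0."""
    return [p for p in product((0, 1), repeat=2 * k)
            if all(p[2 * i] + p[2 * i + 1] == 1 for i in range(k))]


def special_tuple(profile):
    """Build a 4k-tuple of constants with the given equality profile,
    using fresh constants so that no other equalities hold."""
    a, fresh = [], 0
    for rho in profile:
        if rho == 0:                 # a_t = a_{t+1}
            a += [fresh, fresh]
            fresh += 1
        else:                        # a_t != a_{t+1}
            a += [fresh, fresh + 1]
            fresh += 2
    return tuple(a)


def equality_profile(a):
    """Profile (rho_1, rho_3, ...) of a tuple: 0 where a_t = a_{t+1}, t odd."""
    return tuple(0 if a[2 * t] == a[2 * t + 1] else 1
                 for t in range(len(a) // 2))


for k in (1, 2, 3):
    print(k, len(special_profiles(k)))
```

Each buddy pair contributes two choices, so the enumeration has 2^k elements, which is the count used later in the proof.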

For simplicity in what follows, when we say that an inequality xi ≠ xj appears in a formula, we mean that either the inequality xi ≠ xj or the inequality xj ≠ xi actually appears. Let σ be a member of Σ′21 whose conclusion is of the form P0(x), where x = (x_{m_1}, x_{m_2}, ..., x_{m_{4k}}). For each odd number i with 1 ≤ i ≤ 4k − 1, let us say that i is of type 1 with respect to σ if x_{m_i} and x_{m_{i+1}} are distinct variables and the inequality x_{m_i} ≠ x_{m_{i+1}} appears in the premise of σ. If i is not of type 1 with respect to σ, then let us say that it is of type 0 with respect to σ. Thus, i is of type 0 precisely if either (a) x_{m_i} and x_{m_{i+1}} are the same variable, or else (b) they are different variables and the inequality x_{m_i} ≠ x_{m_{i+1}} does not appear in the premise of σ.

Let ρ = (ρ1, ρ3, ρ5, ..., ρ4k−1) be a special equality profile, and let a be a special 4k-tuple of constants with equality profile ρ. Let I be a ground instance whose only fact is P0(a). Now chase12(I) consists of the single fact P′0(a) (the s-t tgds in S3 cannot be applied in the chase since a is special). It follows from (6) that there must be a member σ_ρ of Σ′21 such that the chase of P′0(a) with σ_ρ produces P0(a). It is clear that σ_ρ must have the following properties:

1. The conclusion of σ_ρ is of the form P0(x), where x = (x_{m_1}, x_{m_2}, ..., x_{m_{4k}});

2. variables x_{m_r} and x_{m_s} can be the same variable only if a_{m_r} = a_{m_s};

3. i is of type 0 with respect to σ_ρ for each i where ρi = 0; and

4. the only relation symbol that appears in the premise of σ_ρ is P′0.

We now show that for each odd i with 1 ≤ i ≤ 4k − 1, we have that i is of type ρi with respect to σ_ρ. We already have that if ρi = 0, then i is of type 0 with respect to σ_ρ (this follows from the third condition above for σ_ρ). So we need only show that if ρi = 1, then i is of type 1 with respect to σ_ρ. Let i1 = i, and let i2 be the buddy of i. Note for later use that i1 is odd (because i is odd, and i1 = i). Since a is special, and since ρ_{i1} = 1, it follows that ρ_{i2} = 0. Therefore, as noted before, i2 is of type 0 with respect to σ_ρ. Assume that i1 is also of type 0 with respect to σ_ρ; we shall derive a contradiction.

Let h be a mapping from the variables in σ_ρ to constants where h(x_{m_i}) = a_{m_i} for each i with 1 ≤ i ≤ 4k. This function is well-defined by the second condition about σ_ρ.

We now show that h respects each of the inequalities of σ_ρ, that is, that if y ≠ y′ is an inequality that appears in the premise of σ_ρ, then h(y) ≠ h(y′). Assume that y ≠ y′ is an inequality that appears in the premise of σ_ρ, but h(y) = h(y′); we shall derive a contradiction. Since h(y) = h(y′), and since a is a special 4k-tuple of constants with equality profile ρ, we know that there is some odd j such that {y, y′} = {x_{m_j}, x_{m_{j+1}}} and ρj = 0. By property (3) above (with j playing the role of i), we know that j is of type 0 with respect to σ_ρ. Hence, the inequality x_{m_j} ≠ x_{m_{j+1}}, that is, the inequality y ≠ y′, does not appear in the premise of σ_ρ. This is our desired contradiction.

Let J1 be the target instance that consists of all of the facts P′0(h(y1), ..., h(y4k)), where the atom P′0(y1, ..., y4k) appears in the premise of σ_ρ. Obtain J2 from J1 by replacing each occurrence of a_{i1+1} by a_{i1}. Define a′ = (a′1, ..., a′4k) by letting a′_{i1+1} = a_{i1}, and letting a′_j = a_j if j ≠ i1 + 1. Since a_{i2+1} = a_{i2} (because ρ_{i2} = 0), we have a′_{i2+1} = a_{i2+1} = a_{i2} = a′_{i2}. Thus, a′_{i2+1} = a′_{i2}.

Define h′ by letting h′(y) = h(y) if y is not x_{i1+1}, and letting h′(x_{i1+1}) = h(x_{i1}), that is, h′(x_{i1+1}) = a_{i1}. So J2 consists of all of the facts P′0(h′(y1), ..., h′(y4k)), where the atom P′0(y1, ..., y4k) appears in the premise of σ_ρ.

We now show that h′ respects each of the inequalities of σ_ρ. Let y ≠ y′ be an inequality that appears in the premise of σ_ρ. There are three cases.

Case 1: {y, y′} does not contain x_{i1+1}. Then h′(y) = h(y) and h′(y′) = h(y′). Now h(y) ≠ h(y′), since h respects the inequalities of σ_ρ. Therefore, h′(y) ≠ h′(y′), as desired.

Case 2: {y, y′} contains x_{i1+1} but not x_{i1}. In particular, either y or y′ is x_{i1+1}; assume without loss of generality that y is x_{i1+1}. Then h′(y) = h(x_{i1}) and h′(y′) = h(y′). Since y′ is not x_{i1} or x_{i1+1}, and i1 is odd, we know by the fact that a is a special 4k-tuple that h(y′) ≠ h(x_{i1}). So h′(y′) = h(y′) ≠ h(x_{i1}) = h′(y). Therefore, h′(y) ≠ h′(y′), as desired.

Case 3: {y, y′} = {x_{i1}, x_{i1+1}}. This case is not possible, since i1 is of type 0 with respect to σ_ρ, and so the inequality x_{m_{i1}} ≠ x_{m_{i1+1}} does not appear in the premise of σ_ρ.

Since J2 consists of all of the facts P′0(h′(y1), ..., h′(y4k)), where the atom P′0(y1, ..., y4k) appears in the premise of σ_ρ, and since h′ respects each of the inequalities of σ_ρ, it follows that the chase of J2 with σ_ρ contains P0(a′). Form the source instance I2 from the target instance J2 by replacing each fact P′0(b) by P0(b). Let τ1 be the s-t tgd P0(x1, x2, ..., x4k) → P′0(x1, x2, ..., x4k). Clearly, the chase of I2 with τ1 is J2.

Since i1 and i2 are buddies, there is s with 0 ≤ s ≤ k − 1 such that {i1, i2} = {4s + 1, 4s + 3}. Let I′2 be the set difference I2 \ {P0(a′)}, and let J′2 be the set difference J2 \ {P′0(a′)}. Then the chase of I′2 with τ1 is J′2. Let I′′2 consist of the fact P_{s+1}(a′). Let τ2 be the s-t tgd P_{s+1}(x^s) → P′0(x^s). Since a′_{i1} = a′_{i1+1} (by construction) and a′_{i2} = a′_{i2+1} (as noted earlier), the chase of I′′2 with τ2 contains the fact P′0(a′). Let I3 = I′2 ∪ I′′2. So the chase of I3 with {τ1, τ2} contains J′2 ∪ {P′0(a′)}, which contains J2. Since τ1 and τ2 are members of Σ12, it follows that the chase of I3 with Σ12 contains J2. Since the chase of I3 with Σ12 contains J2, and the chase of J2 with σ_ρ contains P0(a′), it follows that chase′21(chase12(I3)) contains P0(a′), which is not in I3. Therefore, I3 ≠ chase′21(chase12(I3)), which contradicts (6) when I is I3. This is our desired contradiction.

We just showed that if ρ = (ρ1, ρ3, ρ5, ..., ρ4k−1) is a special equality profile, then there is a member σ_ρ of Σ′21 such that for each odd i with 1 ≤ i ≤ 4k − 1, we have that i is of type ρi with respect to σ_ρ. Therefore, σ_ρ and σ_ρ′ are different when ρ ≠ ρ′. Hence, Σ′21 has at least as many members as there are special equality profiles. Clearly, there are 2^k distinct special equality profiles. So Σ′21 has at least 2^k members. Since the size of the schema mapping M12 is linear in k, it follows that the number of constraints in M′21 is exponential in the size of M12. Since M′21 is an arbitrary normal inverse of M12, this proves the theorem. □

Definition 9.2 A constraint is Boolean normal if it is of the form α ∧ χ_A ∧ θ → A, where α is a conjunction of source atoms, A is a target atom, χ_A is the conjunction of the formulas const(x) for every variable x of A, and θ is a Boolean combination (possibly empty) of equalities x = y for variables x, y of A. Further, there is the safety condition that every variable in A must appear in α. Again, we have suppressed writing the leading universal quantifiers. A schema mapping is said to be Boolean normal if all of its constraints are Boolean normal. □

Thus, we obtain the definition of “Boolean normal” from the definition of “normal” by allowing Boolean combinations of equalities in the premise, rather than simply conjunctions of inequalities. Of course, every normal schema mapping is a Boolean normal schema mapping. Furthermore, it is easy to see that every Boolean normal schema mapping is equivalent to a normal schema mapping. That is, allowing Boolean combinations of equalities in the premise, rather than simply conjunctions of inequalities, does not increase the expressive power. However, allowing Boolean combinations of equalities in the premise does potentially allow a more compact representation. In particular, we can see that this happens in the inverse mappings in the proof of Theorem 9.1. There (except for the fact that we did not bother to include the const formulas) we produced examples of inverses, specified by Boolean normal constraints, that are of polynomial size, and hence exponentially more compact than any inverse specified by normal constraints. The next theorem says that this is a general phenomenon: in the full case, we can always find, in polynomial time, a polynomial-size inverse (if an inverse exists). Before we prove this next theorem, we need some more machinery.
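To illustrate the gap in compactness, the sketch below expands the premise of the formula in T2 from the proof of Theorem 9.1 into conjunctions of inequalities by distributing ∧ over ∨. Each resulting conjunction, together with the atom P′0(x1, ..., x4k), would be the premise of one constraint with conclusion P0(x1, ..., x4k); const formulas are ignored, as in that proof. Encoding an inequality xi ≠ xj as the index pair `(i, j)` is an assumption made for illustration.

```python
from itertools import product


def t2_blocks(k):
    """The k disjunctions of T2: block i is (x_{4i+1} != x_{4i+2}) or
    (x_{4i+3} != x_{4i+4}), each inequality encoded as a pair of indices."""
    return [((4 * i + 1, 4 * i + 2), (4 * i + 3, 4 * i + 4)) for i in range(k)]


def expand_t2_premise(k):
    """Distribute the conjunction over the disjunctions: pick one inequality
    from each block. Every choice is one conjunction of k inequalities."""
    return [tuple(choice) for choice in product(*t2_blocks(k))]


for k in (1, 2, 3, 10):
    print(k, len(expand_t2_premise(k)))
```

The single Boolean formula has k disjunctions, while this particular expansion has 2^k conjunctions. Theorem 9.1 shows that no normal inverse can avoid exponentially many constraints; the expansion here only exhibits one way the blow-up arises.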

Let σ be an s-t tgd whose premise consists only of P-atoms for some single relational symbol P. Define an equivalence relation E_σ on the variables that appear in σ as follows. Assume that P is t-ary. For each i with 1 ≤ i ≤ t, let Y_i be the set of all variables that appear in the ith position of some atom in the premise of σ. Let E_σ be the most refined equivalence relation (largest number of equivalence classes) such that each Y_i is a subset of an equivalence class of E_σ. It is easy to see that each equivalence class of E_σ is a union of Y_i's. For each equivalence class, select a unique representative, and let [x] denote the representative of the equivalence class containing x. Form σ′ from σ by replacing each variable x by [x]. It is easy to see that each of the atoms in the premise of σ′ is the same atom (call it A). Intuitively, we have done a minimum unification of the atoms in the premise of σ, so that they all “unify” to A. Form σ† from σ′ by replacing the premise of σ′ by A. Now σ† is a “special case” of σ (thus, σ† is obtained from σ by identifying some variables and then replacing the conjunction A ∧ · · · ∧ A by simply A), and so σ† is a logical consequence of σ.

As an example, let σ be P(x, y, z) ∧ P(y, w, z) → Q(z, w). Then Y_1 = {x, y}, Y_2 = {y, w}, and Y_3 = {z}. There are two equivalence classes of E_σ, namely Y_1 ∪ Y_2 = {x, y, w} and {z}. If we take the representative of the equivalence class {x, y, w} to be x, then σ′ is P(x, x, z) ∧ P(x, x, z) → Q(z, x), and σ† is P(x, x, z) → Q(z, x). Note that if σ has a singleton premise, then σ† is simply σ itself.
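The construction of σ† can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the paper: the representation of atoms as `(relation, variables)` pairs, the function name `dagger`, and the rule "the earliest-seen variable is the representative" are all assumptions made here so that the output matches the worked example above.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation name, variables); hypothetical encoding


def dagger(premise: List[Atom], conclusion: List[Atom]) -> Tuple[List[Atom], List[Atom]]:
    """Return (premise, conclusion) of sigma-dagger for a tgd whose premise uses one relation."""
    assert len({rel for rel, _ in premise}) == 1, "premise must use a single relation symbol"
    parent: Dict[str, str] = {}
    order: Dict[str, int] = {}  # first-occurrence order, used to pick representatives

    for _, args in premise + conclusion:
        for v in args:
            if v not in parent:
                parent[v] = v
                order[v] = len(order)

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if order[ra] > order[rb]:  # keep the earliest-seen variable as representative
            ra, rb = rb, ra
        parent[rb] = ra

    # Y_i is the set of variables in position i; merge each Y_i into one class.
    arity = len(premise[0][1])
    for i in range(arity):
        column = [args[i] for _, args in premise]
        for v in column[1:]:
            union(column[0], v)

    def rename(atom: Atom) -> Atom:
        rel, args = atom
        return (rel, tuple(find(v) for v in args))

    unified = {rename(a) for a in premise}
    assert len(unified) == 1  # all premise atoms collapse to the single atom A
    return [unified.pop()], [rename(a) for a in conclusion]
```

Calling `dagger` on the premise P(x, y, z) ∧ P(y, w, z) and conclusion Q(z, w) yields P(x, x, z) → Q(z, x), in agreement with the example in the text.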

The next lemma shows why we are interested in these formulas σ†. We state this lemma in the full case (which is the only case where we shall apply it), although it holds also for the nonfull case, provided we carefully define what we mean when we say that two different chases (which might introduce different nulls) are equal.

Lemma 9.3 Let σ be a full s-t tgd whose premise consists only of P-atoms for some fixed relational symbol P, and let I be an instance consisting of a single P-fact. Then the chase of I with σ equals the chase of I with σ†.

Proof Since σ logically implies σ†, it follows easily that the chase of I with σ† is contained in the chase of I with σ. We now show the opposite inclusion. Assume that σ fires on I. Then there is a homomorphism h from the premise of σ to I. So if the Y_i's are as in the definition of σ†, then h(x) = h(y) whenever x and y are both in Y_i (because I consists of a single fact). It follows easily that h(x) = h(y) whenever x and y are in the same equivalence class of E_σ. Define h′ on the variables [x] in the premise of σ† by letting h′([x]) = h(x). Since h(x) = h(y) whenever x and y are in the same equivalence class of E_σ, it follows that h′ is well-defined. It is straightforward to verify that h′ is a homomorphism from the premise of σ† to I, and so σ† fires on I. Further, it is not hard to see that the homomorphic image of the conclusion of σ under h equals the homomorphic image of the conclusion of σ† under h′. It follows that the chase of I with σ is contained in the chase of I with σ†. This was to be shown. □
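Lemma 9.3 can be checked on the running example with a small chase routine. The following is a minimal sketch under stated assumptions: tgds are full, atoms and facts are `(relation, arguments)` pairs, and the names `matches`, `chase_full`, `SIGMA`, and `SIGMA_DAGGER` are hypothetical helpers introduced here, with σ and σ† taken from the example P(x, y, z) ∧ P(y, w, z) → Q(z, w).

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # atom over variables, or fact over constants


def matches(premise: List[Atom], instance: Set[Atom],
            h: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate homomorphisms from the premise atoms into the instance."""
    h = {} if h is None else h
    if not premise:
        yield dict(h)
        return
    (rel, args), rest = premise[0], premise[1:]
    for frel, fargs in instance:
        if frel != rel or len(fargs) != len(args):
            continue
        ext = dict(h)
        # setdefault binds a fresh variable or returns the existing binding.
        if all(ext.setdefault(v, c) == c for v, c in zip(args, fargs)):
            yield from matches(rest, instance, ext)


def chase_full(premise: List[Atom], conclusion: List[Atom], instance: Set[Atom]) -> Set[Atom]:
    """Facts produced by chasing the instance with one full tgd (no nulls are needed)."""
    return {(rel, tuple(h[v] for v in args))
            for h in matches(premise, instance)
            for rel, args in conclusion}


SIGMA = ([("P", ("x", "y", "z")), ("P", ("y", "w", "z"))], [("Q", ("z", "w"))])
SIGMA_DAGGER = ([("P", ("x", "x", "z"))], [("Q", ("z", "x"))])
```

On the single-fact instance {P(a, a, c)} both tgds produce exactly Q(c, a), and on {P(a, b, c)} neither fires, which is what the lemma predicts for single-fact instances.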

Theorem 9.4 There is a polynomial-time algorithm such that if the input is a schema mapping M12 specified by a finite set of full s-t tgds, then the output is a polynomial-size Boolean normal schema mapping that is an inverse of M12 if M12 has an inverse.⁸

Proof Let M12 = (S1, S2, Σ12), where Σ12 is a finite set of full s-t tgds. For each member ϕ(x) → (A_1 ∧ … ∧ A_r) of Σ12, where each A_i is an atom, let Σ′12 contain the s-t tgds ϕ(x) → A_1, …, ϕ(x) → A_r. Thus, Σ′12 is a finite set of full s-t tgds, each with a singleton conclusion, that is logically equivalent to Σ12.

We now give a procedure to augment Σ′12 to a set Σ′′12. Let U be the set of all of the s-t tgds σ†, as defined earlier, and let Σ′′12 = Σ′12 ∪ U. Since Σ′′12 consists of Σ′12 along with some logical consequences of Σ′12, it follows that Σ′′12 is logically equivalent to Σ′12. Since also Σ′12 is logically equivalent to Σ12, it follows that Σ′′12 is logically equivalent to Σ12. By renaming variables if needed, we can assume that no two distinct members of Σ′′12 have a variable in common. Furthermore, we find it convenient to assume that for each member σ of Σ′′12, there is another member σ̂ of Σ′′12 that is obtained from σ by renaming the variables in a one-to-one manner and with a disjoint set of variables from σ (we add σ̂ to Σ′′12 if needed). It is easy to see that there is a polynomial-time procedure for generating Σ′′12 from Σ12.

Let us say that a member σ of Σ′′12 is special if the premise contains a single atom, and if every variable in the premise appears in the conclusion (and hence the same variables appear in the premise and the conclusion). Let σ be a special member of Σ′′12. Assume that σ is P(x_{z_1}, …, x_{z_t}) → Q(x_{i_1}, …, x_{i_k}). So the conclusion of σ is a Q-atom. Let τ be an arbitrary member of Σ′′12, other than σ, such that the conclusion of τ is a Q-atom. Assume that the conclusion of τ is Q(x_{j_1}, …, x_{j_k}). Recall that σ and τ have no variables in common. Let E_τ be the most refined equivalence relation (largest number of equivalence classes) on the variables in σ and τ such that x_{i_ℓ} and x_{j_ℓ} are in the same equivalence class, for 1 ≤ ℓ ≤ k. Let θ^1_τ be a conjunction of equalities among the variables in σ, where the equality x_{i_r} = x_{i_s} is an atom in θ^1_τ precisely if x_{i_r} and x_{i_s} are in the same equivalence class of E_τ. Intuitively, θ^1_τ tells us how to equate variables to obtain a minimum unification of the conclusions of σ and τ.

For each equivalence class E of E_τ, select a unique representative. If this equivalence class E contains a variable in σ, then choose the representative of E to be a variable in σ. (The only times that the equivalence class E does not contain a variable in σ is when E consists of a variable in the premise of τ but not in the conclusion of τ. This is possible, since we are not assuming that τ is special.) Let [x]^τ denote the representative of the equivalence class of E_τ containing x. Let us refer to the variables [x_{i_1}]^τ, …, [x_{i_k}]^τ as distinguished. Let us say that a P-atom P(x_{w_1}, …, x_{w_t}) in the premise of τ is distinguished if [x_{w_ℓ}]^τ is distinguished for 1 ≤ ℓ ≤ t. If A is the distinguished P-atom P(x_{w_1}, …, x_{w_t}), define γ_A to be the conjunction of the equalities [x_{w_ℓ}]^τ = [x_{z_ℓ}]^τ for 1 ≤ ℓ ≤ t. Intuitively, γ_A tells us how to equate variables to obtain a minimum unification of A and the premise of σ. Define θ^2_τ to be the disjunction of the formulas γ_A for each distinguished P-atom A of τ. If this disjunction is empty (because τ has no distinguished P-atom), then θ^2_τ is the empty disjunction, which is logically equivalent to False. Intuitively, θ^2_τ tells us what possible collections of equalities among variables are needed to provide a minimum unification of some atom in the premise of τ with the premise of σ. Let θ_τ be the formula θ^1_τ → θ^2_τ. Intuitively, θ_τ says that if the conclusions of τ and σ can unify to the same atom, then some atom in the premise of τ can unify to the same atom as the premise of σ. Note that if τ has no distinguished P-atom, then θ_τ is logically equivalent to ¬θ^1_τ.

Let θ be the conjunction of the formulas θ_τ, over all members τ of Σ12 other than σ, where the conclusion of τ is a Q-atom. We now define σ∗ to be Q(x_{i_1}, …, x_{i_k}) ∧ θ → P(x_{z_1}, …, x_{z_t}). Note that (the hatted version of) the premise of σ is the conclusion of σ∗, and the conclusion of σ is the relational atom in the premise of σ∗. Let Σ21 consist of all of the formulas σ∗, where σ is a special member of Σ12. Obtain Σ′21 from Σ21 by adding to the premise of every member τ of Σ21 the conjuncts const(x) where x is a variable that appears in τ. Let M′21 = (S2, S1, Σ′21). The output of the algorithm is M′21. It is clear that our algorithm runs in polynomial time (and so, of course, the output M′21 is of polynomial size). Clearly M′21 is a Boolean normal schema mapping. We shall show that M′21 is an inverse of M12 if M12 has an inverse.
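To make the definitions of θ^1_τ, the distinguished atoms, γ_A and θ^2_τ concrete, here is a small Python sketch for one special σ and one τ. It is an illustration only, under assumptions made here: atoms are `(relation, variables)` pairs, the function name `theta_parts` is hypothetical, equalities are encoded as sorted pairs of variable names, and trivial equalities of the form v = v are dropped from γ_A (they are always true).

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Equality = Tuple[str, str]  # sorted pair (a, b) standing for a = b


def theta_parts(sigma_premise: Atom, sigma_conclusion: Atom,
                tau_premise: List[Atom], tau_conclusion: Atom
                ) -> Tuple[Set[Equality], List[FrozenSet[Equality]]]:
    """Return (theta1, theta2): theta1 is a conjunction, theta2 a disjunction of conjunctions."""
    sigma_vars = list(dict.fromkeys(sigma_premise[1] + sigma_conclusion[1]))
    tau_vars = [v for _, args in tau_premise + [tau_conclusion] for v in args]
    parent: Dict[str, str] = {v: v for v in sigma_vars + tau_vars}

    def find(v: str) -> str:
        while parent[v] != v:
            v = parent[v]
        return v

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Prefer a variable of sigma as the class representative.
            if rb in sigma_vars and ra not in sigma_vars:
                ra, rb = rb, ra
            parent[rb] = ra

    # E_tau: equate the l-th conclusion variable of sigma with that of tau.
    for x, y in zip(sigma_conclusion[1], tau_conclusion[1]):
        union(x, y)

    def eq(a: str, b: str) -> Equality:
        return tuple(sorted((a, b)))  # type: ignore[return-value]

    theta1 = {eq(a, b) for a, b in combinations(sigma_vars, 2) if find(a) == find(b)}
    distinguished = {find(x) for x in sigma_conclusion[1]}

    theta2: List[FrozenSet[Equality]] = []
    for rel, args in tau_premise:
        if rel != sigma_premise[0]:
            continue  # only P-atoms can unify with the premise of sigma
        reps = [find(w) for w in args]
        if all(r in distinguished for r in reps):
            gamma = frozenset(eq(r, find(z))
                              for r, z in zip(reps, sigma_premise[1]) if r != find(z))
            theta2.append(gamma)  # an empty gamma stands for True
    return theta1, theta2
```

For σ = P(x1, x2) → Q(x1, x2) and τ = R(y1) → Q(y1, y1), the sketch returns θ^1_τ = (x1 = x2) and an empty disjunction for θ^2_τ, so θ_τ is equivalent to ¬(x1 = x2). For τ = P(y1, y2) ∧ R(y2) → Q(y2, y1), θ^1_τ is empty (True) and θ^2_τ is (x1 = x2).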

⁸ Note that by part (1) of Corollary 3.4, we cannot hope for a polynomial-time algorithm for deciding invertibility. Therefore, the output of the algorithm is left unspecified if M12 has no inverse.

Assume that θ is a Boolean combination of equalities, and f is a weak renaming of variables. Let us say that θ holds under f if the Boolean expression that results by replacing each equality x = y by True when f(x) and f(y) are the same variable, and replacing each equality x = y by False when f(x) and f(y) are different variables, evaluates to True. Similarly, if g is a function that maps variables to constants, then say that θ holds under g if the Boolean expression that results by replacing each equality x = y by True when g(x) and g(y) are the same constant, and replacing each equality x = y by False when g(x) and g(y) are different constants, evaluates to True. Let us say that f and g agree on equalities if for each x and y, we have that f(x) = f(y) if and only if g(x) = g(y). Clearly, if f and g agree on equalities, then θ holds under f if and only if θ holds under g. As before, if ϕ is a formula, let ϕ^f be the result of replacing every variable x in ϕ by f(x). If A is an atom, let A^g be the fact that arises by replacing every variable x in A by g(x).
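The notions "θ holds under f" and "f and g agree on equalities" can be illustrated as follows. This is a sketch under assumptions of this note: a Boolean combination of equalities is encoded as a nested tuple with the hypothetical tags `"eq"`, `"not"`, `"and"`, `"or"`, `"implies"`, and both f (variables to variables) and g (variables to constants) are plain dictionaries.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple, Union

Formula = Union[Tuple[str, str, str], tuple]  # nested tuples; hypothetical encoding


def holds(theta: Formula, assignment: Dict[str, str]) -> bool:
    """Evaluate theta after replacing each equality x = y by (assignment[x] == assignment[y])."""
    op = theta[0]
    if op == "eq":
        return assignment[theta[1]] == assignment[theta[2]]
    if op == "not":
        return not holds(theta[1], assignment)
    if op == "and":
        return all(holds(t, assignment) for t in theta[1:])
    if op == "or":
        return any(holds(t, assignment) for t in theta[1:])
    if op == "implies":
        return (not holds(theta[1], assignment)) or holds(theta[2], assignment)
    raise ValueError(f"unknown connective: {op!r}")


def agree_on_equalities(f: Dict[str, str], g: Dict[str, str], variables: Iterable[str]) -> bool:
    """f(x) = f(y) if and only if g(x) = g(y), for all pairs of the given variables."""
    return all((f[x] == f[y]) == (g[x] == g[y]) for x, y in combinations(list(variables), 2))
```

Because `holds` only inspects which images coincide, two maps that agree on equalities necessarily give the same truth value, which is the observation used in the proof.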

Claim: For every constraint σ∗ in Σ21, which must be of the form β ∧ θ → α, where (1) α is a source atom, (2) β is a target atom with the same variables as α, and (3) θ is a Boolean combination of equalities among the variables, and for every weak renaming f, we have that θ holds under f if and only if β^f is an essential atom for α^f (with respect to Σ12).

Note that σ∗ is derived from σ in Σ′′12, where σ is α → β. Assume that α is a P-atom and β is a Q-atom. We now prove the Claim. Assume first that β^f is essential for α^f; we wish to show that θ holds under f. To show this, we must show that if τ is a member of Σ′′12 other than σ, and the conclusion of τ is a Q-atom, then θ_τ holds under f (this is because θ is the conjunction of such formulas θ_τ). So assume that θ^1_τ holds under f; we must show that θ^2_τ holds under f. Now the conclusion of σ^f is β^f. Since θ^1_τ holds under f, it follows that σ^f and τ^f have the same conclusion. So the conclusion of τ^f is β^f. Let g be a function that maps variables into constants and that agrees with f on equalities. So the conclusion of τ^g is β^g. Let I be an instance whose facts are the facts A^g for each atom A in the premise of τ. So the chase of I with τ includes β^g. Since β^f is essential for α^f, it follows that α^g is a fact in I. So α^g is A^g for some atom A in the premise of τ. It follows that γ_A, as defined earlier, holds under g, and so θ^2_τ holds under g. Since f and g agree on equalities, this implies that θ^2_τ holds under f, as desired.

Assume now that θ holds under f; we must show that β^f is an essential atom for α^f. Certainly β^f is relevant for α^f, since σ is in Σ12. So we must show that β^f is demanding for α^f. Let g be a function that maps variables into constants and that agrees with f on equalities. So θ holds under g. Let I be an instance where chase12(I) contains β^g; we need only show that α^g is a fact in I. It is easy to see that the results of chasing with Σ12 and Σ′′12 are the same. So the result of chasing I with Σ′′12 contains β^g. Hence, there is a constraint τ in Σ′′12 that fires on I and produces β^g. If τ is σ, then it is not hard to see that this implies that α^g is in I, as desired. If τ is not σ, it is straightforward to verify that θ^1_τ holds under g. Since also θ holds under g, this implies that θ^2_τ holds under g. So there is some distinguished atom A in the premise of τ such that γ_A holds under g. Hence, A^g and α^g are the same fact. Let us denote the conclusion of τ by C. Since θ^1_τ holds under g, we have that β^g = C^g. So C^g arises by chasing I with τ. Since also τ has A in its premise, we have that A^g is in I. But we showed that A^g and α^g are the same fact. So α^g is in I, as desired. This concludes the proof of the Claim.

Assume that M12 has an inverse. We now use the Claim to prove that M′21 is an inverse of M12.

Assume that σ∗ is a member of Σ21, and σ∗ is β ∧ θ → α. Let k be the number of variables that appear in σ∗. Define the set T_σ∗ as follows. For each weak renaming f of the variables in σ∗ such that the range of f is in {x_1, …, x_k} and such that θ holds under f, let T_σ∗ contain the constraint β^f ∧ η_f → α^f, where η_f is the conjunction of the inequalities f(x) ≠ f(y) where x and y are variables of σ∗ and where f(x) and f(y) are different variables. (The assumption that the range of f is in {x_1, …, x_k} is only to assure that T_σ∗ be finite.) It is straightforward to see that σ∗ is logically equivalent to T_σ∗, and similarly the constants-added version of σ∗ is logically equivalent to the constants-added version of T_σ∗ (recall that the constants-added version of an s-t tgd mapping is the mapping that results by adding to the premise of every tgd the formulas const(x) for every variable x that appears in the conclusion). Let Σ′′21 be the union of the constants-added versions of the sets T_σ∗ over all σ∗ in Σ21, and let M′′21 = (S2, S1, Σ′′21). By Proposition 7.11, we know that M′21 is an inverse of M12 if and only if M′′21 is an inverse of M12. So we need only show that M′′21 is an inverse of M12.

We now use Theorem 5.13 to show that M′′21 is an inverse of M12. Since each T_σ∗ was obtained by considering weak renamings f such that θ holds under f, it follows easily from the Claim that for every member ϕ of Σ′′21, the premise of ϕ is essential for the conclusion of ϕ. Hence, the first condition of Theorem 5.13 holds (when Σ′′21 plays the role of Σ21). We now show that the second condition also holds. Let A be a source atom. Since M12 is invertible, we know by Proposition 6.2 that ω_A is essential for A, and so contains an atom B that is essential for A (with respect to Σ12).


It follows from Lemma 9.3 that there is a member σ of Σ′′12 with a singleton premise (and a singleton conclusion) such that the chase of I_A is the same with σ as it is with Σ12. Write σ as α → β. So there is a weak renaming f such that α^f is A and β^f is B. We now show that every variable in α appears in β, and so σ is special. Assume that some variable x appears in α but not in β; we shall derive a contradiction. Let f′ be a weak renaming that is like f except that f′(x) is a new variable. So α^{f′} is different from A, although β^{f′} is the same as β^f, that is, B. So chase12(I_{α^{f′}}) contains I_B, even though I_A ⊈ I_{α^{f′}}. This contradicts the fact that B is essential for A. Hence, σ is special, as desired.

So there is θ such that σ∗ is β ∧ θ → α, and σ∗ is in Σ21. Since β^f (namely, B) is essential for α^f (namely, A), it follows from the Claim that θ holds under f. Therefore, if χ_A is the conjunction of the formulas const(x) for every variable x of A, and if η_f is as above, then B ∧ χ_A ∧ η_f → A is a weak renaming of a constraint in Σ′′21. Hence, the second condition of Theorem 5.13 holds (when Σ′′21 plays the role of Σ21), as desired. This completes the proof that M′′21 is an inverse of M12. □

It is open as to whether such a polynomial-time algorithm exists in the nonfull case. It is even open in the nonfull case as to whether or not there always exists a Boolean normal inverse of polynomial size if an inverse exists.

10 Relating Number of Inverses to Number of Constraints in an Inverse

In this section, we show that for each full s-t tgd mapping M12, there is a relationship between the number of normal inverses of M12 and the minimal number of constraints in a Boolean inverse for M12. We first show that we cannot bound the number of inverses in terms of the minimal number of constraints in a Boolean normal inverse, since there is a full s-t tgd mapping with infinitely many distinct normal inverses. We then show that we can bound the minimal number of constraints in a Boolean normal inverse in terms of the number of inverses (and the number of relation symbols).

We begin by giving an example of a full s-t tgd mapping with infinitely many distinct normal inverses.

Example 10.1 Let S1 consist of the unary relation symbol P, and let S2 consist of the binary relation symbol Q. Let Σ12 consist of the s-t tgd P(x) → Q(x, x). Let Σ^k_21 consist of the normal constraint

Q(x, y_1) ∧ Q(y_1, y_2) ∧ … ∧ Q(y_{k−1}, y_k) ∧ Q(y_k, x) ∧ const(x) → P(x).

Let M12 = (S1, S2, Σ12), and let M^k_21 = (S2, S1, Σ^k_21). It is straightforward to verify that for every ground instance I and for each k ≥ 1 we have chase^k_21(chase12(I)) = I (where chase^k_21(J) is the result of chasing J with Σ^k_21). It therefore follows from Theorem 4.10 that M^k_21 is an inverse of M12 for every k. It is also straightforward to verify that Σ^k_21 and Σ^{k′}_21 are not logically equivalent if k ≠ k′. So M12 has infinitely many inequivalent normal inverses. □
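The round trip in Example 10.1 can be tried out on small ground instances. The sketch below is illustrative only and rests on assumptions made here: facts are `(relation, constants)` pairs, all values are constants (so const(x) is trivially true and is omitted), and the names `matches`, `sigma_k_premise`, `chase12`, and `chase_k21` are hypothetical helpers.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]


def matches(premise: List[Atom], instance: Set[Atom],
            h: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate homomorphisms from the premise atoms into the instance."""
    h = {} if h is None else h
    if not premise:
        yield dict(h)
        return
    (rel, args), rest = premise[0], premise[1:]
    for frel, fargs in instance:
        if frel != rel or len(fargs) != len(args):
            continue
        ext = dict(h)
        if all(ext.setdefault(v, c) == c for v, c in zip(args, fargs)):
            yield from matches(rest, instance, ext)


def sigma_k_premise(k: int) -> List[Atom]:
    """Q(x, y1), Q(y1, y2), ..., Q(yk, x): a cycle of k + 1 Q-atoms through x."""
    chain = ["x"] + [f"y{i}" for i in range(1, k + 1)] + ["x"]
    return [("Q", (chain[i], chain[i + 1])) for i in range(k + 1)]


def chase12(instance: Set[Atom]) -> Set[Atom]:
    """Chase with P(x) -> Q(x, x)."""
    return {("Q", (args[0], args[0])) for rel, args in instance if rel == "P"}


def chase_k21(k: int, instance: Set[Atom]) -> Set[Atom]:
    """Chase a ground target instance with the single constraint of Sigma^k_21."""
    return {("P", (h["x"],)) for h in matches(sigma_k_premise(k), instance)}
```

For I = {P(a), P(b)} the round trip returns I for each k tried. As one concrete witness of inequivalence (an observation of this note, covering only k = 1 and k′ = 2), the target instance {Q(a, b), Q(b, a)} triggers the k = 1 constraint but not the k = 2 constraint.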

The next theorem says that we can bound the minimal number of constraints in a Boolean normal inverse in terms of the number of inverses (and the number of relation symbols).

Theorem 10.2 Let M12 be a full s-t tgd mapping, with k source relation symbols. Assume that M12 has exactly m ≥ 1 inequivalent normal inverses. Then M12 has a Boolean normal inverse with at most k + log_2(m) constraints.

Proof Assume that M12 = (S1, S2, Σ12). Let us say that the source atom B is good if chase12(I_B) has exactly one member. Let us say that B is bad if B is not good. Let b be the number of bad prime source atoms. We shall show that 2^b ≤ m (where m is the number of inequivalent normal inverses of M12) and that M12 has a Boolean normal inverse with k + b constraints. Since 2^b ≤ m, we have that b ≤ log_2(m), and so k + b ≤ k + log_2(m). The theorem follows.


For each source relation symbol P, let A_P be the P-atom P(x_1, …, x_r) where x_1, …, x_r are distinct. If B is a P-atom P(y_1, …, y_r), let ϕ_B be the formula that is the conjunction of (1) the equalities x_i = x_j for each i, j where y_i and y_j are the same variable, and (2) the inequalities of the form x_i ≠ x_j for each i, j where y_i and y_j are different variables. Intuitively, ϕ_B completely describes the equality pattern of the variables in B. Let θ_P be the disjunction of the formulas ϕ_B where B is a good P-atom. Let σ_P be the formula ω_{A_P} ∧ θ_P → A_P, where ω_{A_P} is defined as in Definition 6.1.
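The equality pattern ϕ_B is easy to compute from the argument list of B. The following sketch is only an illustration: the function name `equality_pattern` and the naming of the canonical variables as `x1`, `x2`, … are assumptions, and only pairs with i < j are listed, since the remaining pairs are symmetric or trivial.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def equality_pattern(atom_vars: Sequence[str]) -> Tuple[List[Pair], List[Pair]]:
    """Return (equalities, inequalities) over x1..xr describing which positions of B coincide."""
    equalities: List[Pair] = []
    inequalities: List[Pair] = []
    for i, j in combinations(range(len(atom_vars)), 2):
        pair = (f"x{i + 1}", f"x{j + 1}")
        if atom_vars[i] == atom_vars[j]:
            equalities.append(pair)      # conjunct x_i = x_j
        else:
            inequalities.append(pair)    # conjunct x_i != x_j
    return equalities, inequalities
```

For B = P(y, y, z) this gives the equality x1 = x2 and the inequalities x1 ≠ x3 and x2 ≠ x3.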

Let B_1, …, B_b be precisely the bad prime source atoms (they may involve various relation symbols). By Proposition 6.2, we know that ω_{B_i} is essential for B_i with respect to Σ12, for each i. Since B_i is bad, it follows that ω_{B_i} has more than one atom. By the construction in Definition 6.1, we see that ω_{B_i} contains the formulas const(x) for every variable x of B_i. So by Proposition 5.18, we know that some atom C_i in ω_{B_i} is essential for B_i with respect to Σ12, for each i. If B_i is P(y_1, …, y_r), define η_i to be the conjunction of the inequalities of the form y_j ≠ y_ℓ where y_j and y_ℓ are distinct variables. Let ψ^0_i be the constraint C_i ∧ η_i → B_i, and let ψ^1_i be the constraint ω_{B_i} ∧ η_i → B_i.

Let v = (v_1, …, v_b) be an arbitrary {0, 1}-vector of length b. Define Σ^v_21 to consist of the k formulas σ_P (one for each source relation symbol P) along with the b constraints ψ^{v_i}_i for 1 ≤ i ≤ b. Let M^v_21 = (S2, S1, Σ^v_21). We now show that each M^v_21 is an inverse of M12, and that M^v_21 and M^{v′}_21 are not equivalent if v ≠ v′. Since the number of vectors v is 2^b, this shows that 2^b ≤ m. Further, since each M^v_21 is a Boolean normal inverse with k + b constraints, this shows that M12 has a Boolean normal inverse with k + b constraints (in fact, it has at least 2^b Boolean normal inverses with k + b constraints). This is sufficient to complete the proof.

Fix v = (v_1, …, v_b). We begin by showing that M^v_21 is an inverse of M12. We now define a function e that maps each prime source atom B to an essential conjunction e(B) with respect to Σ12. For the bad prime source atom B_i, we let e(B_i) = C_i if v_i = 0, and e(B_i) = ω_{B_i} if v_i = 1. By construction, e(B_i) is essential for B_i if v_i = 0, and by Proposition 6.2, we know that e(B_i) is essential for B_i if v_i = 1. For each good prime source atom B, we let e(B) = ω_B. Again by Proposition 6.2, we know that e(B) is then essential for B. So by Theorem 5.15, M^e_21 is an inverse of M12. We now show that M^e_21 is equivalent to M^v_21, which completes the proof that M^v_21 is an inverse of M12.

For each prime source atom B where B is bad, M^e_21 and M^v_21 contain the same constraint with conclusion B. Let us now consider the good prime source atoms B. The formula σ_P is logically equivalent to the set consisting of all of the formulas ω_{A_P} ∧ ϕ_B → A_P, where B is a good prime P-atom. Assume that B is a good prime source atom. Let σ_1 be the formula ω_{A_P} ∧ ϕ_B → A_P, and let σ_2 be the formula ω_B ∧ η_B → B, where as before η_B is the conjunction of all inequalities of the form x ≠ y where x and y are distinct variables in B. By construction, σ_2 is the unique member of M^e_21 with conclusion B. So to complete the proof that M^e_21 is equivalent to M^v_21, we need only show that the formula σ_1 is logically equivalent to the formula σ_2.

Assume that B is the good atom P(y_1, …, y_r), where y_1, …, y_r are variables, not necessarily distinct. Let ψ_B be the formula obtained from σ_1 by replacing the variable x_i by y_i, for 1 ≤ i ≤ r. We now show that ψ_B is logically equivalent to both σ_1 and σ_2, which implies that σ_1 and σ_2 are logically equivalent, as desired. In forming ψ_B, two variables x_i and x_j in σ_1 are replaced by the same variable precisely if y_i and y_j are the same variable, which holds precisely if the equality x_i = x_j appears in ϕ_B. It follows easily that ψ_B is logically equivalent to σ_1. We now show that ψ_B is logically equivalent to σ_2.

It is easy to see that the conclusions of ψ_B and σ_2 are the same, and that the result of replacing the variable x_i by y_i, for 1 ≤ i ≤ r, in ϕ_B is equivalent to η_B. Let τ_B be the result of replacing the variable x_i by y_i in ω_{A_P}, for 1 ≤ i ≤ r. So we need only show that τ_B is equivalent to ω_B. Now the conjunct(s) of τ_B must be in ω_B, by properties of the chase with s-t tgds. Since ω_B is a singleton (because B is good), it follows easily that τ_B is the same as ω_B. This concludes the proof that M^e_21 is equivalent to M^v_21.

We conclude the proof by showing that M^v_21 and M^{v′}_21 are not equivalent if v ≠ v′. Say v ≠ v′, and that v = (v_1, …, v_b) and v′ = (v′_1, …, v′_b). So there is i with 1 ≤ i ≤ b such that v_i ≠ v′_i. Assume without loss of generality that v_i = 0 and v′_i = 1. We now show that (I_{C_i}, ∅) satisfies Σ^{v′}_21 but not Σ^v_21. This of course shows that M^v_21 and M^{v′}_21 are not equivalent. Clearly ψ^0_i fires on I_{C_i}, and so (I_{C_i}, ∅) does not satisfy ψ^0_i. Hence, (I_{C_i}, ∅) does not satisfy Σ^v_21, because Σ^v_21 contains ψ^0_i.

We now show that no member of Σ^{v′}_21 fires on I_{C_i}. Since B_i is bad, we know that ω_{B_i} has some other atom A in addition to C_i as a conjunct. Since C_i is essential for B_i, it follows from Proposition 5.11 that B_i and C_i have the same variables. Since Σ12 is full, every variable in A is in B_i, and hence in C_i. Assume that C_i is Q(y_1, …, y_m). Then I_{C_i} consists of the fact Q(c_{y_1}, …, c_{y_m}). If ψ^1_i were to fire on I_{C_i}, then there would be a homomorphism h from the premise of ψ^1_i to I_{C_i}. Since C_i is part of the premise of ψ^1_i, we must have h(y_ℓ) = c_{y_ℓ} for 1 ≤ ℓ ≤ m. Since h must map A onto Q(c_{y_1}, …, c_{y_m}), and since every variable in A is among y_1, …, y_m, it is easy to see that A must be C_i, which is a contradiction. So ψ^1_i does not fire on I_{C_i}.

We now show that no other member of Σ^{v′}_21 fires on I_{C_i}. If some member ψ^{v′_j}_j were to fire on I_{C_i} where j ≠ i, then because of the inequalities in ψ^{v′_j}_j, it would follow that some (and in fact, every) member of chase12(I_{B_j}) is of the form Q(c_1, …, c_m), where c_k = c_ℓ if and only if y_k = y_ℓ (intuitively, each member of the premise of ψ^{v′_j}_j is a Q-atom with the same equality pattern of variables as C_i). So there would be a homomorphism I_{C_i} → chase12(I_{B_j}). Since C_i is demanding for B_i, it follows that I_{B_i} ⊆ I_{B_j}. But this is impossible, since i ≠ j. So no member ψ^{v′_j}_j fires on I_{C_i} where j ≠ i. A similar argument shows that no σ_P fires on I_{C_i}. So no member of Σ^{v′}_21 fires on I_{C_i}, and hence (I_{C_i}, ∅) satisfies Σ^{v′}_21, as desired. □

It is an open problem as to whether a version of Theorem 10.2 holds in the nonfull case.

Note in particular from Theorem 10.2 that if the s-t tgd mapping M12 has a unique normal inverse (so that m = 1 in Theorem 10.2) then M12 has a Boolean normal inverse with at most k constraints, where k is the number of source relation symbols. This is the key to proving Theorem 7.15. We now give that proof.

Proof of Theorem 7.15: Assume that M12 = (S1, S2, Σ12) is a full s-t tgd mapping with a unique normal inverse M21 = (S2, S1, Σ21), and M21 is not equivalent to the constants-added version of a near p-copy mapping. Assume without loss of generality that Σ21 has the minimal number of constraints among the various sets of normal constraints logically equivalent to Σ21, and each constraint σ in Σ21 has the minimal size among normal constraints logically equivalent to σ. Since M12 has a unique normal inverse, it follows from Theorem 10.2 (where m = 1) that M21 has at most k constraints, where k is the number of source relation symbols.

Let P be an arbitrary source relation symbol, and let A_P be a P-atom with all variables distinct. If P is in the conclusion of no member of Σ21, then chase21(chase12(I_{A_P})) contains no P-fact, so Σ21 is too weak. Therefore, P is in the conclusion of some member of Σ21. Since also Σ21 has at most k constraints, it follows that P is in the conclusion of exactly one member of Σ21. Let σ_P be the member of Σ21 whose conclusion is a P-atom. Every variable in the conclusion of σ_P is distinct, or else chase21(chase12(I_{A_P})) does not contain I_{A_P}, so Σ21 is too weak. Let B_P be a P-atom with all variables the same. Now σ_P has no inequalities, or else chase21(chase12(I_{B_P})) does not contain I_{B_P}, so Σ21 is too weak.

The proof of Theorem 10.2 shows that A_P is good, that is, that chase12(I_{A_P}) is a singleton. This singleton is the only relevant atom (with respect to Σ12) for A_P, so it follows fairly easily from part (2) of Theorem 5.13 (and the assumption that constraints in Σ21 are of minimal size) that the premise of σ_P contains (along with const formulas) only a single relational atom. This relevant atom is also essential for P, as noted in Theorem 5.13. So the premise of σ_P has (along with const formulas) a single relational atom that is essential for the conclusion of σ_P. Note that this is also true about each weak renaming of σ_P (that is, the atom B′ in the premise of the weak renaming is essential for the conclusion A′ of the weak renaming). This is because A′ must have an essential atom, and the only candidate is B′.

Let Q be an arbitrary relation symbol in S2. We now show that at most one member of Σ21 can have Q appear in its premise. Assume that σ_P and σ_{P′} both have Q appear in their premises; we must show that P and P′ are the same. Let B_Q be a Q-atom with all variables the same. Then B_Q is essential for both B_P and B_{P′} (this follows from our earlier comment about weak renamings of σ_P). Hence, by Lemma 5.16, it follows that B_P and B_{P′} are the same atom, so P and P′ are the same, as desired. By Proposition 5.20, the variables in the premise and conclusion of σ_P are the same.

It follows from what we have shown that M21 is equivalent to the constants-added version of a near p-copy mapping. This was to be shown. □


11 Invertibility in the LAV Case

Recall that a schema mapping has the unique-solutions property if no two distinct source instances have the same set of solutions. Fagin [Fag07] showed that the unique-solutions property is a necessary condition for a schema mapping to have an inverse. Fagin [Fag07] also showed that for LAV mappings (those specified by s-t tgds with a singleton premise), the unique-solutions property is not only a necessary condition but also a sufficient condition for invertibility. The proof of this latter result was quite complicated. In this section, we give a very simple proof.

Just as we defined a homomorphic version of the subset property in Section 3, there is a homomorphic version of the unique-solutions property, namely, that I = I′ whenever chase12(I) ↔ chase12(I′). The unique-solutions property is equivalent to its homomorphic version because of the fact, shown in [FKMP05], that two source instances I and I′ have the same solutions if and only if they have homomorphic universal solutions. Note that it follows immediately from the two homomorphic versions that the subset property implies the unique-solutions property.

We now give our greatly simplified proof that the unique-solutions property characterizes invertibility in the LAV case.

Theorem 11.1 [Fag07] A LAV s-t tgd mapping is invertible if and only if it has the unique-solutions property.

Proof We just noted that the subset property implies the unique-solutions property. Since satisfying the subset property is equivalent to invertibility, the “only if” direction follows (even when the s-t tgd mapping is not LAV).

Assume now that M12 = (S1, S2, Σ12) is a LAV mapping that satisfies the unique-solutions property. We now show that M12 satisfies the homomorphic version of the subset property, and so is invertible. Assume that I and I′ are such that chase12(I) → chase12(I′). Then

chase12(I ∪ I′) = chase12(I) ∪ chase12(I′) ↔ chase12(I′),

where the equation follows from the fact that M12 is LAV, and the homomorphism chase12(I) ∪ chase12(I′) → chase12(I′) follows from the fact that we can select the variables generated in the chase so that chase12(I) and chase12(I′) have no nulls in common. Then by the homomorphic version of the unique-solutions property, I ∪ I′ = I′ and therefore I ⊆ I′. This shows that M12 satisfies the homomorphic version of the subset property, and so is invertible, as desired. □
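The step chase12(I ∪ I′) = chase12(I) ∪ chase12(I′) relies on every premise being a single atom. The sketch below illustrates this on toy mappings; it is not from the paper. It assumes full tgds only (so no nulls arise, which sidesteps the choice of nulls discussed in the proof), and the names `matches`, `chase`, `LAV`, and `NON_LAV` as well as the particular tgds are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Tgd = Tuple[List[Atom], List[Atom]]  # (premise atoms, conclusion atoms); full tgds only


def matches(premise: List[Atom], instance: Set[Atom],
            h: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate homomorphisms from the premise atoms into the instance."""
    h = {} if h is None else h
    if not premise:
        yield dict(h)
        return
    (rel, args), rest = premise[0], premise[1:]
    for frel, fargs in instance:
        if frel != rel or len(fargs) != len(args):
            continue
        ext = dict(h)
        if all(ext.setdefault(v, c) == c for v, c in zip(args, fargs)):
            yield from matches(rest, instance, ext)


def chase(tgds: List[Tgd], instance: Set[Atom]) -> Set[Atom]:
    """Chase a ground source instance with a set of full s-t tgds."""
    return {(rel, tuple(h[v] for v in args))
            for premise, conclusion in tgds
            for h in matches(premise, instance)
            for rel, args in conclusion}


# LAV: every premise is a single atom.
LAV: List[Tgd] = [([("P", ("x", "y"))], [("Q", ("x",))]),
                  ([("P", ("x", "x"))], [("R", ("x",))])]
# Not LAV: the premise joins two atoms.
NON_LAV: List[Tgd] = [([("A", ("x",)), ("B", ("x",))], [("T", ("x",))])]
```

With the LAV tgds, chasing {P(a, b)} ∪ {P(c, c)} gives the union of the two separate chases. With the two-atom premise, chasing {A(a)} and {B(a)} separately produces nothing, while chasing their union produces T(a), so the equation can fail outside the LAV case.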

12 Concluding Remarks and Open Problems

In addition to resolving the key problem left open in [Fag07] as to the complexity of deciding if an s-t tgd mapping has an inverse, and also providing greatly simplified proofs of some known results, we have explored a number of interesting issues about the structure of inverses, unique inverses, number of inverses, inverses of inverses, and sizes of inverses. We have shown that in the full case, these issues are, surprisingly, quite interrelated. We have also shown that in the nonfull case, these tight interconnections do not hold. We showed that in the full case, there is a polynomial-size Boolean inverse, and a polynomial-time algorithm for producing it. As we noted in Sections 9 and 10, there remain open problems about the size and about the number of constraints in inverses in the nonfull case. Perhaps the most interesting open problem is whether every invertible s-t tgd mapping (not necessarily full) has a polynomial-size Boolean inverse, and if so, whether there is a polynomial-time algorithm for producing it.

References

[APR08] M. Arenas, J. Perez, and C. Riveros. The recovery of a schema mapping: Bringing exchanged data back. In ACM Symposium on Principles of Database Systems (PODS), pages 13–22, 2008.

[Ber03] P. A. Bernstein. Applying model management to classical meta-data problems. In Conference on Innovative Data Systems Research (CIDR), pages 209–220, 2003.

[BM07] P. A. Bernstein and S. Melnik. Model management 2.0: Manipulating richer mappings. In ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1–12, 2007.

[Fag68] R. Fagin. Representation theory for a class of denumerable Markov chains. J. Math. Analysis and Applications, 23:500–530, 1968. Senior Thesis, Dartmouth College.

[Fag07] R. Fagin. Inverting schema mappings. ACM Transactions on Database Systems (TODS), 32(4), 2007.

[FKMP05] R. Fagin, Ph. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: Semantics and query answering. Theoretical Computer Science (TCS), 336(1):89–124, 2005.

[FKPT05] R. Fagin, Ph. G. Kolaitis, L. Popa, and W.-C. Tan. Composing schema mappings: Second-order dependencies to the rescue. ACM Transactions on Database Systems (TODS), 30(4):994–1055, 2005.

[FKPT08] R. Fagin, Ph. G. Kolaitis, L. Popa, and W.-C. Tan. Quasi-inverses of schema mappings. ACM Transactions on Database Systems (TODS), 33(2), 2008.

[FKPT09] R. Fagin, Ph. G. Kolaitis, L. Popa, and W.-C. Tan. Reverse data exchange: Coping with nulls. In ACM Symposium on Principles of Database Systems (PODS), pages 23–32, 2009.

[Kol05] Ph. G. Kolaitis. Schema mappings, data exchange, and metadata management. In ACM Symposium on Principles of Database Systems (PODS), pages 61–75, 2005.

[Len02] M. Lenzerini. Data integration: A theoretical perspective. In ACM Symposium on Principles of Database Systems (PODS), pages 233–246, 2002.

[Mel04] S. Melnik. Generic Model Management: Concepts and Algorithms, volume 2967 of Lecture Notes in Computer Science. Springer, 2004.

[MH03] J. Madhavan and A. Y. Halevy. Composing mappings among data sources. In International Conference on Very Large Data Bases (VLDB), pages 572–583, 2003.

[MMS79] D. Maier, A. O. Mendelzon, and Y. Sagiv. Testing implications of data dependencies. ACM Transactions on Database Systems (TODS), 4(4):455–469, 1979.

[NBM07] A. Nash, P. A. Bernstein, and S. Melnik. Composition of mappings given by embedded dependencies. ACM Transactions on Database Systems (TODS), 32(1), 2007.