Regular Object Types and XTATIC Based on: A Paper by Vladimir Gapeyev and Benjamin C. Pierce A Presentation of the paper by Benjamin C. Pierce Presented by: Lena Lempert

Post on 20-Dec-2015

Regular Object Types and XTATIC

Based on:
  A paper by Vladimir Gapeyev and Benjamin C. Pierce
  A presentation of the paper by Benjamin C. Pierce

Presented by: Lena Lempert

Introduction

Regular types have been proposed as a basis for statically typed processing of XML.
However, regular types have so far been explored only in special-purpose languages, that is, languages whose type systems are designed around regular types (XDuce, CDuce, XQuery).

Our objective

To develop the XTATIC language, whose goal is to bring regular types to a broader audience by offering them as a lightweight extension of a popular object-oriented language: C#.

Key ideas of XTATIC

The XTATIC data model is a combination of:
  The tree-structured data model of XDuce
  The classes-and-objects data model of an object-oriented language
It treats XML structures as objects.

FX: a core calculus for XTATIC

A formal core of the XTATIC design is being developed.
The tool for this investigation is a tiny language called FX.
FX features are drawn from:
  FJ (Featherweight Java)
  The core of XDuce
Points of interest include:
  A smooth interleaving of the two data models
  A definition of the "subtype" relation
  A natural encoding of XML documents using singleton classes

XTATIC example

An XML fragment:

<Person>
  <Name>Lena Lempert</Name>
  <Email>[email protected]</Email>
</Person>
<Person>
  <Name>Queen Elisabeth</Name>
  <Phone>+44 55 6666</Phone>
</Person>

The corresponding XTATIC value:

[ <Person>[
    <Name>[ 'Lena Lempert' ],
    <Email>[ '[email protected]' ]
  ],
  <Person>[
    <Name>[ 'Queen Elisabeth' ],
    <Phone>[ '+44 55 6666' ]
  ]
]

A type for this expression:

[ <Person>[ <Name>[ pcdata ],
            ( <Email>[ pcdata ] | <Phone>[ pcdata ] ) ]* ]

Notation:
  |  union
  *  repetition
  ,  concatenation

XTATIC example (cont.)

Sequence values can be examined using type-based pattern matching.

Example (list is a variable that contains a sequence of the type given on the previous slide):

match (list) {
  case [ <Person>[ <Name>[pcdata], <Email>[pcdata e] ], any rest ]:
    spamlist = [ spamlist, ',', e ];
  case [ <Person>[ <Name>[pcdata], <Phone>[pcdata] ] p, any rest ]:
    phonebook = [ phonebook, p ];
  case []:
    //..
}

First case: if the Person has an Email, the email is extracted into the pcdata variable e and used to extend the text in spamlist.
Second case: otherwise, the person must have a Phone. This case binds the whole entry to the variable p and adds it to the phonebook sequence.
Third case: the empty sequence.
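The three cases can be mimicked in ordinary code to see what the match does at run time. The Python sketch below is only an illustration, not XTATIC: it assumes a hypothetical encoding in which a tree is a (tag, children) tuple and character data is a plain string, the function name step is made up, and the address lena@example.org is a placeholder.

```python
def step(entries, spamlist, phonebook):
    """Apply the three cases once to the head of `entries`.

    Returns the remaining entries plus the updated spamlist and phonebook.
    """
    if not entries:                                   # case []
        return entries, spamlist, phonebook
    person, rest = entries[0], entries[1:]
    tag, children = person
    child_tags = [child_tag for child_tag, _ in children]
    if tag == "Person" and child_tags == ["Name", "Email"]:
        email = children[1][1]                        # plays the role of e
        spamlist = spamlist + "," + email             # [ spamlist, ',', e ]
    elif tag == "Person" and child_tags == ["Name", "Phone"]:
        phonebook = phonebook + [person]              # p is the whole entry
    else:
        # In XTATIC the static type of the list rules this case out.
        raise ValueError("entry does not match the expected type")
    return rest, spamlist, phonebook


people = [
    ("Person", [("Name", "Lena Lempert"), ("Email", "lena@example.org")]),
    ("Person", [("Name", "Queen Elisabeth"), ("Phone", "+44 55 6666")]),
]
rest, spam, book = step(people, "", [])
rest, spam, book = step(rest, spam, book)
```

Unlike this sketch, the XTATIC version needs no run-time error branch: the type of list guarantees that one of the three cases applies.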

Data model

The data model of a language is:
  The collection of values that programs in the language manipulate
  The types of those values
  Fundamental relations such as value typing and subtyping
Our primary goal is the combination of trees and objects (and their types). Therefore we will concentrate on the data model of FX, which is a combination of the data models of XDuce and FJ.

The XDuce Data Model

XDuce: the language of labels consists of:
  A set ℒ of label values
  A set of label types
  A denotation function [[·]] giving the set [[L]] ⊆ ℒ of label values that are members of each label type L
The subtyping relation:
  L1 <: L2 (L1 is a subtype of L2) iff [[L1]] ⊆ [[L2]]
A simple choice of label language:
  For each value l ∈ ℒ, consider l to be a label type as well. Then l is the singleton type whose denotation contains just l.
  A wildcard label type ~, denoting the whole set ℒ.
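For a finite label set, the denotation function and the induced subtyping relation can be written down directly. The following Python sketch assumes a small, made-up label set; the names LABELS, denotation and is_subtype are illustrative only.

```python
# Hypothetical finite set of label values (the real set is not finite).
LABELS = frozenset({"Person", "Name", "Email", "Phone"})
WILDCARD = "~"


def denotation(label_type):
    """Return the set of label values that belong to `label_type`."""
    if label_type == WILDCARD:
        return LABELS                      # ~ denotes every label
    if label_type in LABELS:
        return frozenset({label_type})     # a label used as a singleton type
    raise ValueError("unknown label type: %r" % (label_type,))


def is_subtype(l1, l2):
    """L1 is a subtype of L2 iff the denotation of L1 is included in that of L2."""
    return denotation(l1) <= denotation(l2)
```

With this choice every singleton label type is a subtype of ~, and two distinct singleton types are unrelated.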

The XDuce Data Model (cont.)

A tree value t consists of a label value and a sequence of children tree values:
  t ::= <(l)>[t1, …, tn]   where n ≥ 0
XDuce types are regular types: regular expressions over an "alphabet" consisting of tree types <(L)>[X]:
  T ::= <(L)>[X]    tree
        []          empty sequence
        T, T        concatenation
        T | T       union
        T*          repetition
Subtyping relation for regular types:
  T1 <: T2 iff [[T1]] ⊆ [[T2]]
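Membership of a sequence value in a regular type can be checked much like a regular expression match. The Python sketch below is a naive matcher under several stated assumptions: types are encoded as tagged tuples, a tree type holds its content type directly instead of a type name X, character data is modelled as string items matched by a built-in PCDATA type, and the example address is a placeholder. It checks membership only; it does not decide subtyping.

```python
EMPTY = ("empty",)
PCDATA = ("pcdata",)          # assumption: matches any run of string items


def ends(ty, seq, i):
    """Return every position j such that seq[i:j] is a member of type ty."""
    kind = ty[0]
    if kind == "empty":
        return {i}
    if kind == "pcdata":
        out, j = {i}, i
        while j < len(seq) and isinstance(seq[j], str):
            j += 1
            out.add(j)
        return out
    if kind == "tree":                       # ("tree", label type, content type)
        _, label, content = ty
        if i < len(seq) and isinstance(seq[i], tuple):
            tag, children = seq[i]
            if (label == "~" or label == tag) and matches(content, children):
                return {i + 1}
        return set()
    if kind == "cat":                        # T, T
        return {k for j in ends(ty[1], seq, i) for k in ends(ty[2], seq, j)}
    if kind == "alt":                        # T | T
        return ends(ty[1], seq, i) | ends(ty[2], seq, i)
    if kind == "star":                       # T*
        seen, frontier = {i}, {i}
        while frontier:
            frontier = {k for j in frontier for k in ends(ty[1], seq, j)} - seen
            seen |= frontier
        return seen
    raise ValueError("unknown type constructor: %r" % (kind,))


def matches(ty, seq):
    """True if the whole sequence seq is a member of type ty."""
    return len(seq) in ends(ty, seq, 0)


person = ("tree", "Person",
          ("cat", ("tree", "Name", PCDATA),
                  ("alt", ("tree", "Email", PCDATA),
                          ("tree", "Phone", PCDATA))))
people_type = ("star", person)

people = [
    ("Person", [("Name", ["Lena Lempert"]), ("Email", ["lena@example.org"])]),
    ("Person", [("Name", ["Queen Elisabeth"]), ("Phone", ["+44 55 6666"])]),
]
```

Here matches(people_type, people) holds, while a Person carrying both an Email and a Phone is rejected.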

The FJ Data Model

FJ (Featherweight Java) is a tiny calculus designed to capture the essential typing mechanisms of class-based object-oriented languages such as Java and C#.
Included: the core mechanisms of object creation, field access, method invocation, and inheritance.
Omitted: interfaces, overloading, static members, concurrency, and even assignment!

But how can we manage without assignment?…

Here’s the trick:
Demand that the fields of an object be initialized from its constructor arguments and never touched again.
A class definition must have the form:

class C {
  D1 f1; … Dn fn;
  C(D1 x1, …, Dn xn) { f1 = x1; … fn = xn; }
  … method definitions …
}

Now identify an object with the expression new C(a1, …, an) used to create it, i.e., just treat new expressions as values.
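The same discipline (fields set once by the constructor, objects treated as values) can be approximated in Python with a frozen dataclass. This is only an analogy and not part of FJ; the class Pair and its method swap are made-up names.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Pair:
    """Fields are initialized by the generated constructor and never reassigned."""
    first: object
    second: object

    def swap(self):
        # With no assignment available, a method builds a new object instead.
        return Pair(self.second, self.first)


p = Pair(1, 2)
try:
    p.first = 3                  # rejected: the instance is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Two Pair objects built from equal arguments compare equal, which matches the idea of identifying an object with the new expression that created it.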

The FJ Data Model (cont. 1)

An FJ program consists of a collection of class declarations plus a single expression to be evaluated.
FJ types = class names C
FJ values = objects
  o ::= new C(o1, o2, …, on)   (n ≥ 0)
The constructor arguments o1, o2, …, on (usually written just ō) must correspond exactly to the fields of class C.
Example (class diagram): class C has private fields a and b; class D, drawn as a subclass of C, has private fields e and f.
  d = new D(a, b, e, f)

The FJ Data Model (cont. 2)

We say that a value new C(ō) is a valid object of the class C if:
  Its field values ō conform to the field types declared for C
The denotation of a class C:
  The set of all valid objects of the class C and its subclasses
Subtyping relation:
  C1 <: C2 (C1 is a subtype of C2) iff [[C1]] ⊆ [[C2]]
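Validity and class denotation can be sketched over a small class table. The Python below is an illustration under assumptions: the class table (Leaf, Node, BigNode) is invented, an object new C(ō) is encoded as a (class name, argument list) tuple, and a subclass lists its inherited fields first.

```python
# Hypothetical class table: name -> (superclass, fields declared by this class).
CLASSES = {
    "Object": (None, []),
    "Leaf": ("Object", []),
    "Node": ("Object", [("Object", "left"), ("Object", "right")]),
    "BigNode": ("Node", [("Leaf", "tag")]),
}


def is_subclass(c, d):
    """True if class c is d or inherits from d."""
    while c is not None:
        if c == d:
            return True
        c = CLASSES[c][0]
    return False


def fields(c):
    """All fields of class c, inherited fields first."""
    if c is None:
        return []
    superclass, own = CLASSES[c]
    return fields(superclass) + own


def is_valid(obj):
    """The arguments must be valid objects that conform to the declared field types."""
    cls, args = obj
    declared = fields(cls)
    return len(args) == len(declared) and all(
        is_valid(arg) and is_subclass(arg[0], field_type)
        for arg, (field_type, _) in zip(args, declared)
    )


def member(obj, c):
    """obj belongs to the denotation of c: valid, and of class c or a subclass."""
    return is_valid(obj) and is_subclass(obj[0], c)


leaf = ("Leaf", [])
node = ("Node", [leaf, leaf])
big = ("BigNode", [leaf, node, leaf])
```

Because the denotation of a class includes the valid objects of its subclasses, member(big, "Node") holds, which is exactly the inclusion that makes BigNode a subtype of Node.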

The FX Data Model

The interleaving of:
  The XDuce data model
  The FJ data model
Observation 1: we can treat sequences of trees as objects.
  A special class Seq, whose subtypes are all the regular types.
  All tree values are translated to objects of the class Seq.

The FX Data Model (cont.)

Observation 2: we can treat the data model of classes and objects as a "label language":
  Objects are the labels in XDuce trees
  Classes are the label types

FX values and types

Values:
  a ::=               FX value
        new C(ā)      object
        [t̄]           delimited sequence
  t ::=               tree value
        <(a)>[t̄]

Types:
  A ::=               FX type
        C             class name
        [X]           regular type name
        [T]           regular type
  T ::=               regular type
        <(A)>[X]      tree type
        []            empty sequence
        T, T          concatenation
        T | T         union
        T*            repetition

a and A belong to the full FX language; t and T belong to the regular expression sub-language.

Program context

A program context is a tuple:
  Ctx = <Typenames, def, Classes, <:, fields, mtypes, mbody>
where:
  Typenames: a set of names for regular types
  def: a function that maps each name in Typenames to its definition
  Classes: a set of class names, containing the special names Object and Seq
  <: a transitive subclass relation, such that C <: Object for all C, and such that Seq has no subtypes or supertypes except Object
  fields: a list F1 f1 … Fn fn for each class, such that fields(Seq) and fields(Object) are empty, and if C is a subclass of D, then fields(D) is a prefix of fields(C)
  mtypes: method types
  mbody: method bodies

FX type membership

The syntax of values given so far allows ill-formed object values new C(ā), where the actual field values ā do not conform to the field types declared for class C in the program context.
To correct this, we introduce a type membership relation a ∈ A: a value a is valid if there is a type A such that a ∈ A.
Type denotation (the set of values of the type):
  [[A]] = { a | a ∈ A }
Denotation of Seq:
  Does not contain objects (new Seq(ā))
  Contains all valid sequence values
Subtyping in FX:
  A1 is a subtype of A2 iff [[A1]] ⊆ [[A2]]

The FX Language Syntax

The full-blown FX language syntax:
  e ::=                           expression
        x                         value variable
        new C(ē)                  object creation
        e.f                       field access
        e.m(ē)                    method call
        <(e)>[ē]                  tree
        [ē]                       sequence
        match(e) { case [P]: ē }  pattern match

The FX Language Syntax (cont.)

  Q ::=               FX pattern
        C             class
        [X]           pattern name
        [P]           regular pattern
        Q x           FX variable binding
  P ::=               regular pattern
        <(Q)>[P]      tree
        []            empty sequence
        P, P          concatenation
        P | P         alternative
        T*            type repetition
        P x           regular variable binding

FX syntax: explanations and constraints

Types (types of fields, or types appearing in method signatures) can be regular types as well as classes.
Variables can hold any FX values, either objects or sequences.
Only tree values can be members of sequence values.
  [ [t], (new C(a)), [s] ] is not allowed!
Sequence expressions: nested sequences are allowed!
  The reason is that we want the following expression to be legal if the method getPapers() returns values of a sequence type:
    [ db.getPapers("POPL"), db.getPapers("ICFP") ]
An object is never legal as a member of a sequence.
A tree expression <(e)>[d] is never allowed outside of sequence brackets […].

Pattern matching

Deconstruction of sequence values is done by matching them against patterns using the match construct.
  Syntactically resembles the C# switch
  Behaves like the XDuce match

match(d) {
  case [P1]: e1;
  case [P2]: e2;
  …
  case [Pn]: en;
}

Here d is a sequence and each [Pi] is a sequence pattern. There is no "fall through".

Pattern matching (cont. 1)

The syntax of FX patterns [P]:
  As in XDuce, a pattern is just a type annotated with variable binders.
  A class pattern has the form C x (C is a class name, x is a variable to be bound).
In pattern matching, we do not examine object fields for conformance with the declared field types; only the class tag of the object is checked (in contrast to the type membership relation).
Similarly, in pattern matching, the validity of sequences is not checked.
This is safe, as only valid objects and sequences exist at run time.

Pattern matching (cont. 2)

We can use a class pattern in the label position in a tree pattern.
  Classes can be the types of labels in tree types.
  This allows a label to be extracted from a tree as an object, for later use in the program.

Properties

We can now state for FX the standard results of static type safety:
  Preservation
  Progress
Formal definitions:
  A value environment Σ
  A typing environment Γ
  Σ conforms to Γ (Σ ∈ Γ) if:
    dom(Σ) = dom(Γ)
    Σ(x) ∈ Γ(x), for all x
  ● denotes an environment with an empty domain

Properties (cont.)

Expression e gets type A in the typing environment Γ:
  Γ ├ e ∈ A
Expression e evaluates to value a in the value environment Σ:
  Σ ├ e ↓ a
Evaluation of e gets stuck in a finite number of steps:
  Σ ├ e ↓

Properties: proposition

Proposition (pattern matching preserves validity):
  Let a ∈ A and let Q be a pattern.
  If a ►Q => Σ and ►Q => Γ,
  then A <: tyof(Q) and Σ ∈ Γ.
tyof(Q) is the type obtained from Q by erasing value binding annotations.
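Erasing binders is a straightforward recursion over the pattern syntax. The Python sketch below assumes a made-up tuple encoding of FX patterns (the constructor names "bind", "class", "name", "seq" and so on are not from the source) and shows one way tyof could be computed.

```python
def tyof(p):
    """Return pattern p with every variable binder removed."""
    kind = p[0]
    if kind == "bind":                      # Q x  or  P x: drop the variable
        return tyof(p[1])
    if kind in ("class", "name", "empty"):  # C, [X], []: nothing to erase
        return p
    if kind == "star":                      # T*: the body is already a type
        return p
    if kind == "seq":                       # [P]
        return ("seq", tyof(p[1]))
    if kind == "tree":                      # <(Q)>[P]
        return ("tree", tyof(p[1]), tyof(p[2]))
    if kind in ("cat", "alt"):              # P, P  and  P | P
        return (kind, tyof(p[1]), tyof(p[2]))
    raise ValueError("unknown pattern constructor: %r" % (kind,))


# <Email>[pcdata e], with the label given by a class pattern bound to t
pattern = ("tree",
           ("bind", ("class", "Tag<Email>"), "t"),
           ("bind", ("name", "pcdata"), "e"))
erased = tyof(pattern)
```

Applying tyof to an already binder-free pattern returns it unchanged, so the function is idempotent.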

Properties: preservation and soundness

Preservation:
  For Σ ∈ Γ, if Γ ├ e ∈ A and Σ ├ e ↓ a, then a ∈ A.
Soundness:
  If ● ├ e ∈ A, then not ● ├ e ↓.

XML in FX

How can the "leaf data" (PCDATA) be treated?
We extend the C# data model by introducing singleton classes for individual characters.
The program context Ctx provides:
  A class Char (the standard C# character class)
  For each character c, a class Char_c extending Char
  Each Char_c contains a single object: new Char_c()
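The family of singleton character classes can be imitated in Python by creating one subclass per character on demand. This is a sketch of the idea only: the helper char_class and the cache _CLASSES are hypothetical, and equality is defined so that all instances of one class behave as the same value.

```python
class Char:
    """Stand-in for the base character class."""

    def __eq__(self, other):
        # All objects of one singleton class are the same value.
        return type(self) is type(other)

    def __hash__(self):
        return hash(type(self))


_CLASSES = {}


def char_class(c):
    """Return the singleton class for character c, creating it on first use."""
    if c not in _CLASSES:
        _CLASSES[c] = type("Char_" + c, (Char,), {})
    return _CLASSES[c]


CharA = char_class("a")
CharB = char_class("b")
```

Each class is a subclass of Char, so a value such as CharA() can be used wherever a Char is expected, while the class itself still identifies the exact character.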

XML in FX: pcdata

We can define a regular type pcdata, representing XML character data:
  def(pcdata) = (<(Char)>[])*
This is a sequence of trees, where each tree:
  Has no children
  Has a character object as its label
<(Object)>[pcdata] is a tree whose body contains only character data.

Why not use C# String?

First reason: the pcdata representation opens the way to interesting uses of pattern matching for regular-expression processing of strings.
Since Char_a is a subtype of Char, we can write types that restrict text to a particular form.
Example: all character sequences starting with 'a' and ending with 'b':
  <('a')>[], pcdata, <('b')>[]
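For comparison, the same constraint on ordinary strings corresponds to the regular expression a.*b. The Python sketch below is an analogy using the standard re module, not the FX type checker; the function name is made up, and re.DOTALL is used on the assumption that character data may contain newlines.

```python
import re


def starts_with_a_ends_with_b(text):
    """True if text is 'a', then any characters, then 'b'."""
    return re.fullmatch(r"a.*b", text, re.DOTALL) is not None
```

The difference in FX is that the restriction is a static type, so it is checked when the program is typechecked instead of when the string is inspected.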

Why not use C# String? (cont.)

Second reason: in XML, two character sequences following each other are indistinguishable from a single larger character sequence.
pcdata satisfies this requirement:
  [pcdata, pcdata] = [<(Char)>[]*, <(Char)>[]*] = [<(Char)>[]*] = [pcdata]
String does not satisfy this requirement:
  [<String>[], <String>[]] ≠ [<String>[]]
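The difference shows up as soon as two pieces of text are concatenated. In the Python sketch below, a tree is a (label, children) tuple; as_pcdata and as_string_tree are hypothetical helpers standing for the two candidate encodings.

```python
def as_pcdata(text):
    """One childless tree per character, as in (<(Char)>[])*."""
    return [(c, []) for c in text]


def as_string_tree(text):
    """A single childless tree whose label is the whole string."""
    return [(text, [])]


joined_pcdata = as_pcdata("ab") + as_pcdata("cd")
joined_strings = as_string_tree("ab") + as_string_tree("cd")
```

Concatenating character sequences gives the same value as encoding the concatenated text, whereas the string encoding keeps a visible boundary between the two pieces.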

The encoding of XML documents in XTATIC

Encoding of XML tags: exactly the same intuition we used for characters!
  A special class Tag
  For each tag <g>, a singleton class Tag<g>
    Tag<g> is a subclass of Tag
    Tag<g> contains a single object: new Tag<g>()

The encoding of XML documents in XTATIC: example

XML fragment:
  <basket> <apple/> <banana/> </basket>
XTATIC value:
  <new Tag<basket>()>[ <new Tag<apple>()>[], <new Tag<banana>()>[] ]
XTATIC type:
  <Tag<basket>>[ <Tag<apple>>[], <Tag<banana>>[] ]
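The encoding can be tried out on real XML with a few lines of Python. The sketch below uses the standard xml.etree.ElementTree parser and a made-up tuple representation: a tree is (label, children), with labels written as the strings "Tag<g>" and "Char<c>" in place of singleton objects. Dropping whitespace-only text is an assumption made so that the result lines up with the value on the slide.

```python
import xml.etree.ElementTree as ET


def encode(elem):
    """Encode an XML element as a (label, children) tree."""
    body = []

    def add_text(text):
        # Assumption: whitespace-only character data is ignored.
        if text and text.strip():
            body.extend(("Char<%s>" % c, []) for c in text.strip())

    add_text(elem.text)
    for child in elem:
        body.append(encode(child))
        add_text(child.tail)            # text that follows the child element
    return ("Tag<%s>" % elem.tag, body)


basket = encode(ET.fromstring("<basket> <apple/> <banana/> </basket>"))
```

For the basket fragment this yields a Tag<basket> tree with two childless children, Tag<apple> and Tag<banana>, mirroring the XTATIC value.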

Status

FX language definition: more or less finished
Prototype typechecker / interpreter for FX: running
Pattern match compilation: underway
Run-time system: just starting
Extension with attributes: underway

Some of the remaining challenges

Run-time representation issues
Exploring alternative pattern matching primitives
Dealing with update operations on XML structures
  Possible approaches:
    Leave the type system alone; use run-time checking to maintain safety
    Add types for mutable XML structures
Namespaces
Additional XML features (e.g. from XML-Schema)
Integration with polymorphism (generics)
Dealing with large XML structures (streaming)

Related work

Current work at MS on integrating "native XML types" with C#
Work on adding regular expression types and patterns to OCaml
CDuce
Relax-NG