special interest activity

Special Interest Activity: Trio – A System for Integrated Management of Data, Uncertainty, and Lineage. ITK478, Yan Cui

Post on 11-Jan-2016

TRANSCRIPT

Page 1: Special Interest Activity

Special Interest Activity
Trio – A System for Integrated Management of Data, Uncertainty, and Lineage

ITK478
Yan Cui

Page 2: Special Interest Activity

Agenda

Understand the new Trio database system
System requirements for installing the Trio system
Source code/packages for the Trio system
Procedure of Trio system installation
Configuration of the Trio system
Experiment with the Trio DBMS using TrioExplorer and TrioPlus (demo)
Trio Query Language (TriQL) structure (demo)
Advantages and disadvantages of using the Trio DBMS
Documenting Trio system report/bugs

Page 3: Special Interest Activity

Understand the new Trio database system

Basic concept of the new Trio DBMS
◦ Trio is a new kind of database management system (DBMS), developed by a Stanford University lab in December 2006. It is based on an extended relational model called the Uncertainty-Lineage Database (ULDB) [1] and supports Trio’s own query language, called TriQL [1]. This database technology handles structured data, uncertainty about the data, and data lineage together in a fully integrated manner.

Trio system architecture (components)

Uncertainty-Lineage Database (ULDB)

TriQL: The Trio Query Language

Page 4: Special Interest Activity

Trio System architecture

Four primary components of the Trio DBMS
◦ Command-line client (TrioPlus)
◦ TrioExplorer
◦ Trio API and translator (Python)
◦ Standard relational DBMS (PostgreSQL)

Uncertainty-Lineage Database
◦ Encoded data table
◦ Lineage table
◦ Trio metadata
◦ Trio stored procedures

Page 5: Special Interest Activity

Uncertainty-Lineage Database (ULDB)

Alternatives
‘?’ (Maybe) annotations
Numerical confidences
Lineage
Sample: Drives(person, color, car) and Saw(witness, color, car) uncertainty tables, with and without confidences

Page 6: Special Interest Activity

Uncertainty-Lineage Database (ULDB)

Alternatives
◦ Definition: alternatives represent uncertainty about the contents of a tuple [2]
◦ ‘||’ annotation
◦ Drives(person, color, car) and Saw(witness, color, car) uncertainty tables
◦ ‘select * from Drives’
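The idea of alternatives can be sketched in a few lines of Python. This is only an illustration of the concept, not Trio's storage format or API: the class name XTuple, the render() helper, and the sample Saw rows are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[str, str, str]  # (witness, color, car)


@dataclass
class XTuple:
    """One x-tuple: a list of mutually exclusive alternative rows."""
    alternatives: List[Row]

    def render(self) -> str:
        # Join the alternatives with the '||' annotation named on the slide.
        return " || ".join("(" + ", ".join(alt) + ")" for alt in self.alternatives)


# Hypothetical Saw(witness, color, car) rows, for illustration only.
saw = [
    XTuple([("Amy", "blue", "Honda"), ("Amy", "red", "Toyota")]),  # two alternatives
    XTuple([("Betty", "green", "Mazda")]),                         # a certain tuple
]

rendered = [x.render() for x in saw]
for line in rendered:
    print(line)
```

Exactly one alternative of each x-tuple is taken to be true; an x-tuple with a single alternative behaves like an ordinary relational tuple.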

Page 7: Special Interest Activity

Uncertainty-Lineage Database (ULDB)

‘?’ (Maybe) annotations
◦ Definition: a ‘?’ annotation indicates that the existence of an x-tuple is itself uncertain; such an x-tuple is called a maybe x-tuple [2]
◦ Drives(person, color, car) and Saw(witness, color, car) uncertainty tables
◦ ‘select * from Drives’
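One way to see what alternatives and ‘?’ mean together is to enumerate the ordinary relations an uncertain table could stand for. The sketch below does that in plain Python; the function name possible_instances, the (alternatives, maybe) pair encoding, and the Drives rows are assumptions for illustration and are not taken from the Trio code.

```python
from itertools import product
from typing import List, Optional, Tuple

Row = Tuple[str, str, str]          # (person, color, car)
XTuple = Tuple[List[Row], bool]     # (alternatives, has '?' annotation)


def possible_instances(xtuples: List[XTuple]) -> List[List[Row]]:
    """Enumerate every ordinary relation the uncertain table can represent."""
    choices: List[List[Optional[Row]]] = []
    for alternatives, maybe in xtuples:
        options: List[Optional[Row]] = list(alternatives)
        if maybe:
            options.append(None)    # None stands for "this tuple is absent"
        choices.append(options)
    # Pick one option per x-tuple and drop the absent ones.
    return [[row for row in combo if row is not None] for combo in product(*choices)]


# Hypothetical Drives data: one x-tuple with two alternatives, one maybe x-tuple.
drives: List[XTuple] = [
    ([("Jimmy", "red", "Toyota"), ("Jimmy", "red", "Mazda")], False),
    ([("Billy", "blue", "Honda")], True),
]

worlds = possible_instances(drives)
print(len(worlds))
```

With two alternatives in the first x-tuple and a present-or-absent choice for the second, the table stands for four possible relations.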

Page 8: Special Interest Activity

Uncertainty-Lineage Database (ULDB)

Numerical confidences: a numerical confidence can also be interpreted as a probability [2].

Drives(person, color, car) and Saw(witness, color, car) uncertainty tables with confidences

‘select * from Drives’
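Reading confidences as probabilities can be made concrete with a small check. The sketch assumes the usual ULDB convention that the confidences of one x-tuple's alternatives sum to at most 1, with any shortfall being the probability that the tuple is absent (a ‘?’ x-tuple). The function name and the sample values are hypothetical.

```python
from fractions import Fraction
from typing import List


def absence_probability(confidences: List[Fraction]) -> Fraction:
    """Return the probability that no alternative of the x-tuple holds."""
    if any(c < 0 or c > 1 for c in confidences):
        raise ValueError("each confidence must lie between 0 and 1")
    total = sum(confidences, Fraction(0))
    if total > 1:
        raise ValueError("confidences of one x-tuple must sum to at most 1")
    return 1 - total


# Hypothetical confidences for an x-tuple with two alternatives.
confs = [Fraction(5, 10), Fraction(3, 10)]
missing = absence_probability(confs)
is_maybe = missing > 0   # a shortfall means the x-tuple carries a '?'
print(missing, is_maybe)
```

Fractions are used instead of floats so that the sum and the remainder are exact.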

Page 9: Special Interest Activity

Uncertainty-Lineage Database (ULDB)

Lineage
◦ Lineage is “recorded at the granularity of tuple alternatives: lineage connects an x-tuple alternative to other x-tuple alternatives” [2]
◦ Drives(person, color, car) and Saw(witness, color, car) uncertainty tables with confidences
◦ ‘select person from Drives’
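The sketch below illustrates lineage at the granularity of alternatives for a projection such as ‘select person from Drives’. It is a simplified model, not how Trio stores its lineage table: the (table, x-tuple number, alternative number) identifiers, the function name project_person, and the Drives rows are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]        # (person, color, car)
AltId = Tuple[str, int, int]      # (table name, x-tuple number, alternative number)

# Hypothetical Drives x-tuples, keyed by x-tuple number.
drives: Dict[int, List[Row]] = {
    1: [("Jimmy", "red", "Toyota"), ("Jimmy", "red", "Mazda")],
    2: [("Billy", "blue", "Honda")],
}


def project_person(table: Dict[int, List[Row]]):
    """Project the person column and record where each result alternative came from."""
    result: Dict[int, List[str]] = {}
    lineage: Dict[AltId, List[AltId]] = {}
    for xid, alternatives in table.items():
        result[xid] = []
        for j, (person, _color, _car) in enumerate(alternatives, start=1):
            result[xid].append(person)
            # Each result alternative points back to exactly one Drives alternative.
            lineage[("Result", xid, j)] = [("Drives", xid, j)]
    return result, lineage


result, lineage = project_person(drives)
for alt_id, sources in lineage.items():
    print(alt_id, "<-", sources)
```

A join would record one source alternative per joined table in the same way.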

Page 10: Special Interest Activity

TriQL: The Trio Query Language

Two major parts: built-in functions and predicates for querying confidence values and lineage, and regular SQL syntax [2]

Page 11: Special Interest Activity

System requirements for installing the Trio system

Operating systems: Linux, Mac OS X, and Win32 (XP, Vista, and 32-bit Server)

PostgreSQL database (versions 8.2.5, 8.1.10, 8.0.14, and 7.4.18): Linux and Win32

Python API: Windows, Linux/Unix, Mac OS X, OS/2, and Amiga

Page 12: Special Interest Activity

Source code/packages for the Trio system

Source listing:
◦ Python 2.4 can be downloaded from http://www.python.org/ .
◦ Easy_install (the file called ez_setup.py) can be downloaded from http://peak.telecommunity.com/DevCenter/EasyInstall .
◦ Readline 1.7.win32 can be downloaded from http://www.python.org/ .
◦ ctypes-1.0.2.win32-py2.4 can be downloaded from http://www.python.org/ .
◦ PostgreSQL 8.1 can be downloaded from http://www.postgresql.org/ .
◦ Graphviz 2.14 is the only version compatible with the Trio API. It is available at http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .
◦ PyGreSQL can be downloaded from http://www.pygresql.org/ .
◦ Pylons 0.9.5 can be downloaded from http://pylonshq.com/ .
◦ PLY 2.2 can be downloaded from http://www.dabeaz.com/ply/ .
◦ PyParsing can be downloaded from http://pyparsing.wikispaces.com/ .
◦ PyDot can be downloaded from http://code.google.com/p/pydot/downloads/list .
◦ Trio API 1.0 can be downloaded from http://infolab.stanford.edu/~theobald/sources/TRIO.zip .

Page 13: Special Interest Activity

Procedure of Trio system installation

Python
◦ Download the Windows version of Python 2.4 (python-2.4.4.msi).
◦ Install Python in the C:/Python24 directory.
◦ Set path=c:/Python24; in the environment variables.

Readline: download Readline-1.7.win32-py2.4.exe and install it into the Python directory.

Ctypes: download ctypes-1.0.2.win32-py2.4.exe and install it into the Python directory.

PostgreSQL
◦ Download the Windows version of PostgreSQL 8.1 (postgresql-8.1.msi).
◦ Install PostgreSQL 8.1 as follows:
    Language selection (Fig 10): English.
    Introduction screen (Fig 11): Next.
    Welcome message and instructions (Fig 12): Next.
    Feature selection (Fig 13): Next.
    Service installation (Fig 14): check "install as a service" and enter the account name ‘postgres’ and a password.
    Initdb (Fig 15): check "initialize database cluster" and enter the superuser name and password.
    Procedural languages (Fig 16): check PL/pgsql only.
    Contrib modules (Fig 17): check Admin81 only.
    Click Next in Figs 18, 19, 20, and 21 to complete the installation.
◦ After the installation completes, set path C:\Program Files\PostgreSQL\8.1\bin; in the environment variables.

Page 14: Special Interest Activity

Procedure of Trio system installation (cont)

Graphviz: download Graphviz version 2.14 and install it on your workstation. After the installation completes, set path C:\PROGRA~1\ATT\Graphviz\bin; in the environment variables.

Easy_install: download ez_setup.py into the C:/ directory.

PyGreSQL: at the command line, cd\ to the c: directory and run ‘python ez_setup.py PyGreSQL’ to install the components.

Pylons: at the command line, cd\ to the c: directory and run ‘python ez_setup.py Pylons==0.9.5’ to install Pylons. Set path c:\python24\Scripts in the environment variables.

PLY: at the command line, cd\ to the c: directory and run ‘python ez_setup.py Ply==2.2’ or ‘easy_install Ply==2.2’.

PyParsing: at the command line, cd\ to the c: directory and run ‘python ez_setup.py PyParsing’.

PyDot: download the source from the website, change to its folder at the command line, and install it manually by running ‘python setup.py install’.

Trio API
◦ Download the source code into any directory.
◦ Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and rename it to triospi.dll.
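Several of the steps above depend on directories being present in the PATH environment variable. The following sketch is one possible way to check that. It is not part of the Trio distribution, it targets a current Python 3 rather than the Python 2.4 that Trio itself uses, and the helper name missing_from_path is made up for this example. The directory list is taken from the installation slides.

```python
import ntpath
import os
from typing import List

# Directories the installation slides ask to be added to PATH.
REQUIRED_DIRS = [
    r"C:\Python24",
    r"C:\Python24\Scripts",
    r"C:\Program Files\PostgreSQL\8.1\bin",
    r"C:\PROGRA~1\ATT\Graphviz\bin",
]


def _norm(path: str) -> str:
    # Windows paths ignore case and accept '/' or '\', so normalise both
    # and drop any trailing separator before comparing.
    return ntpath.normcase(path.strip()).rstrip("\\")


def missing_from_path(path_value: str, required: List[str] = REQUIRED_DIRS) -> List[str]:
    """Return the required directories that do not appear in a Windows PATH string."""
    present = {_norm(p) for p in path_value.split(";") if p.strip()}
    return [d for d in required if _norm(d) not in present]


if __name__ == "__main__":
    for directory in missing_from_path(os.environ.get("PATH", "")):
        print("not on PATH:", directory)
```

The check compares strings only; it does not verify that the directories exist or that the short name C:\PROGRA~1 and the long name C:\Program Files refer to the same folder.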

Page 15: Special Interest Activity

Procedure of Trio system installation (cont)

PostgreSQL installation

Page 16: Special Interest Activity

Configuration of Trio System

Windows superuser authentication to access PostgreSQL

TrioExplorer

TrioPlus

Page 17: Special Interest Activity

Configuration of Trio System

Windows superuser authentication to access PostgreSQL
◦ Open Start -> All Programs -> PostgreSQL 8.1 -> pgAdmin III.
◦ Right-click Login Roles to create a new login role.
    Role name: ‘myname’ (the same as the Windows login account).
    Set a password (any password will do).
    Check all role privileges and click OK.
◦ Right-click Database to create a new database.
    Database name: ‘myname’ (the same as the username).
    Owner: ‘myname’; then click OK.
◦ Initialize the Trio schema information.
    In Trio-1.0\setup, open setup.py with Notepad, comment out the last three lines of code, and put the following lines in their place. (After the initialization completes, change the file back to the original.)

    os.system("psql %s %s < setup.sql" % (pgdbname, username))
    os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username))
    os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username))

    Save the file, then at the command line cd to \Trio-1.0\setup and run ‘python setup.py myname myname’. Provide the password when prompted to create the schema.
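To see what the three replacement lines actually execute, the sketch below expands the same format string without running anything. The helper name psql_commands is hypothetical; the script names and the ‘myname’ arguments come from the slide.

```python
from typing import List

# The three SQL scripts referenced by the replacement lines in setup.py.
SQL_SCRIPTS = ["setup.sql", "setup_triospi.sql", "trio_get_conf.sql"]


def psql_commands(pgdbname: str, username: str) -> List[str]:
    """Build the shell commands that the three os.system() calls would run."""
    # Same "psql <database> <user> < <script>" pattern as on the slide.
    return ["psql %s %s < %s" % (pgdbname, username, script) for script in SQL_SCRIPTS]


commands = psql_commands("myname", "myname")
for command in commands:
    print(command)
```

Each command feeds one script to psql through input redirection, so the scripts must be in the current directory (Trio-1.0\setup) and psql must be on the PATH.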

Page 18: Special Interest Activity

PostgreSQL 8.1->pgadmin III

Page 19: Special Interest Activity

Configuration of Trio System

TrioExplorer
◦ Make sure PostgreSQL is running.
◦ Running TrioExplorer: ensure that the path ‘c:\python24\Scripts;’ is in the environment variables, then double-click ‘start_te_server.bat’ under Trio-1.0\explorer.
◦ At the command line, you are now prompted for an admin user login to PostgreSQL, which should have been created along with your PostgreSQL installation and which will be used by TrioExplorer to create new user roles and database instances.
◦ TrioExplorer should now be reachable from your browser at http://localhost:8080/. New users can now press ‘Create a new user’ and create their own Trio login and database instances, which are then managed by the PostgreSQL server.

Page 20: Special Interest Activity

Configuration of Trio System

TrioPlus
◦ Create a new PostgreSQL user role and database instance.
    Run ‘createuser demo’.
    Run ‘createdb demo’; the database name must be the same as the username.
◦ Initialize the Trio schema information for the new user, following the same steps as for Windows superuser authentication to access PostgreSQL. With TrioExplorer this is easier: just press ‘Create new role’ in the Web interface.
◦ Connect to the new Trio database with the command-line client by running ‘python trioplus.py -u demo -d demo -p’.

Page 21: Special Interest Activity

Demo

Experiment with the Trio DBMS using TrioExplorer and TrioPlus

Trio Query Language (TriQL) structure (drop index/table, create Trio table/index, TriQL language) from http://infolab.stanford.edu/~widom/triql.html

Drives(person, color, car) and Saw(witness, color, car)

Page 22: Special Interest Activity

Table of TriQL contents

ULDBs: uncertain attributes, maybe annotations, and confidence values

SQL over ULDBs: selection, projection, join, subqueries, duplicate elimination, grouping and aggregation, aggregate variants, set operators, order by

Flatten and GroupAlts: Flatten is used to turn tuples with alternative values into regular tuples, while GroupAlts is used to create or restructure alternative values

Horizontal subqueries (the [ ] construct): [ ] in the where clause, [ ] with joins, syntactic shortcuts in [ ], [ ] in the select clause, [ ] with self-joins

Built-in functions Conf() and Maybe(): multi-table Conf()

Result confidences: result confidence evaluation, uniform and scaled result confidences, on-demand confidence computation

Built-in predicate Lineage(): the Lineage() predicate lets queries filter joined tuples based on whether they are related via lineage

Options NoLineage, NoConf, and NoMaybe: indicate that lineage, confidence values, and/or ?'s should be omitted from query results

Data modification: insert statement, delete statement, update statement
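The one-line description of Flatten and GroupAlts can be illustrated with plain Python lists. This is a loose sketch of the idea only, not TriQL syntax or Trio's implementation: the function names flatten and group_alts, the choice to group by a single column, and the Saw rows are assumptions, and confidences, ‘?’ annotations, and lineage are not modelled.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]     # (witness, color, car)
XTuple = List[Row]             # alternatives of one x-tuple


def flatten(xtuples: List[XTuple]) -> List[Row]:
    """Turn every alternative into a regular tuple of its own."""
    return [alt for alternatives in xtuples for alt in alternatives]


def group_alts(rows: List[Row], key_index: int) -> List[XTuple]:
    """Collect rows that share a key value into one x-tuple as alternatives."""
    groups: Dict[str, XTuple] = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return list(groups.values())   # dicts keep insertion order


# Hypothetical Saw data, for illustration only.
saw = [
    [("Amy", "blue", "Honda"), ("Amy", "red", "Toyota")],
    [("Betty", "green", "Mazda")],
]

flat = flatten(saw)
regrouped = group_alts(flat, key_index=0)   # group by witness
print(flat)
print(regrouped)
```

In this example, grouping the flattened rows by witness rebuilds the original x-tuples because each witness appears in only one x-tuple.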

Page 23: Special Interest Activity

Advantages and Disadvantages

Advantages
◦ Open source, with free support for any non-profit user who wants to try the new Trio DBMS
◦ Advanced components on top of a relational DBMS
◦ Computing confidences
◦ Efficient, convenient, safe, multi-user storage of and access to massive, persistent data

Disadvantages
◦ Time cost of queries
◦ Dependencies
◦ Still at the development stage

Page 24: Special Interest Activity

Advantages and Disadvantages

Disadvantages
◦ Time cost of queries

The query ‘SELECT attr-list FROM X1, X2, ..., Xn WHERE predicate’, used as an example in [6], allows a comparison between a relational database and a ULDB.

Over a standard relational database:
    For each tuple in the cross-product of X1, X2, ..., Xn
        Evaluate the predicate
        If true, project attr-list to create a result tuple

Over a ULDB:
    For each tuple in the cross-product of X1, X2, ..., Xn
        Create a “super tuple” T from all combinations of alternatives
        Evaluate the predicate on each alternative in T; keep only the true ones
        Project attr-list on each alternative to create a result tuple
    Details: ‘?’, lineage, confidences
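The two evaluation loops above can be written out in Python to make the extra work visible. This is a sketch of the procedure as listed on the slide, not Trio's translator: the function names, the Saw and Drives rows, and the predicate are assumptions for illustration, and the ‘?’, lineage, and confidence details are left out.

```python
from itertools import product
from typing import Callable, List, Tuple

Row = Tuple[str, ...]
Combo = Tuple[Row, ...]      # one row per table in the FROM clause


def relational_query(tables: List[List[Row]],
                     predicate: Callable[[Combo], bool],
                     project: Callable[[Combo], Row]) -> List[Row]:
    """Cross-product, filter, project: one predicate call per combination."""
    return [project(combo) for combo in product(*tables) if predicate(combo)]


def uldb_query(xtables: List[List[List[Row]]],
               predicate: Callable[[Combo], bool],
               project: Callable[[Combo], Row]) -> List[List[Row]]:
    """Same loop over x-tuples, with an inner loop over all alternative combinations."""
    results = []
    for xcombo in product(*xtables):        # one x-tuple from each table
        super_tuple = product(*xcombo)      # every combination of their alternatives
        kept = [project(alt) for alt in super_tuple if predicate(alt)]
        if kept:                            # surviving alternatives form one result x-tuple
            results.append(kept)
    return results


# Hypothetical data: each inner list holds the alternatives of one x-tuple.
saw = [[("Amy", "red", "Toyota"), ("Amy", "blue", "Honda")]]
drives = [[("Jimmy", "red", "Toyota"), ("Jimmy", "red", "Mazda")],
          [("Billy", "blue", "Honda")]]


def same_car(combo: Combo) -> bool:
    seen, driven = combo
    return seen[1:] == driven[1:]           # color and car must match


def witness_person(combo: Combo) -> Row:
    seen, driven = combo
    return (seen[0], driven[0])


accuses = uldb_query([saw, drives], same_car, witness_person)
certain = relational_query([[("Amy", "red", "Toyota")],
                            [("Jimmy", "red", "Toyota"), ("Billy", "blue", "Honda")]],
                           same_car, witness_person)
print(accuses)
print(certain)
```

The ULDB version evaluates the predicate once per combination of alternatives, so its cost grows with the product of the alternative counts of the joined x-tuples.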

Page 25: Special Interest Activity

Documenting Trio system report/bugs

The installation instructions on the website http://dbpubs.stanford.edu:8011/doku.php/trio:installation do not clearly state which version of Graphviz the Trio system needs. The Graphviz website http://www.graphviz.org/ offers only version 2.16, which is not compatible; only version 2.14 works. Graphviz version 2.14 is available for download at http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .

The Windows authentication superuser needs to be created in PostgreSQL first in order to connect to the database. Once the connection is established, TrioExplorer and TrioPlus can use the superuser’s login and password as Windows authentication to access the database system. However, the installation procedure does not mention at all how to create this type of new user. The only way to solve this is to create the user manually with PostgreSQL -> pgAdmin III.

After creating the superuser, I had to modify some code in setup.py in the Trio-1.0 -> setup directory in order to run ‘python setup.py -u myname -d myname -p’.

Many of the sample TriQL query statements at http://infolab.stanford.edu/~widom/triql.html#options do not work as described.

Page 26: Special Interest Activity

References

[1] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07), Pacific Grove, California, January 2007. Demonstration description.

[2] O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16, March 2006.

[3] Trio: A System for Integrated Management of Data, Uncertainty, and Lineage. Retrieved on November 18, 2007 from http://infolab.stanford.edu/trio/ .

[4] PostgreSQL. Retrieved on November 20, 2007 from http://www.postgresql.org/ .

[5] Python. Retrieved on November 15, 2007 from http://www.python.org/ .

[6] Trio: A System for Data, Uncertainty, and Lineage. Talk given by Jennifer Widom at various venues, 2006-07. PowerPoint slides.