Autocompletion for Mashups Presented by: Ido Schreier Writers: Neoklis Polyzotis Ohad Greenshpan Tova Milo Copyright 2009 VLDB Endowment Article link

Post on 21-Dec-2015


Autocompletion for Mashups

Presented by: Ido Schreier

Writers: Neoklis Polyzotis, Ohad Greenshpan, Tova Milo

Copyright 2009 VLDB Endowment

Agenda
Introduction
“MatchUp”: assists in creating Mashups
The Algorithm
Implementation
Experiments & Performance
Summary

Appendixes:
1. Computing Importance
2. Future plans

Mashup

What’s a Mashup? Music Video

A Sample from YouTube…

A Web Mashup

A Mashup is a technology for integrating data, services and applications that are available on the web into a single application.

A Web Mashup (cont’)

A collection of Web APIs.

[Figure: a Program is built on the Operating System's APIs (File System, Display, Sound, Network); a Mashup is built on the Web's APIs (Google Maps, RSS feeds, Songs Lyrics, Weather forecasting).]

Mashup Samples

Can be found at www.ProgrammableWeb.com, which will be our reference DB.

Statistics: 4551 Mashups, 1573 APIs.

Some sample Mashups: 10X10

Why Mashup?

1. Quick delivery of applications.
2. Reuse of existing (successful) resources.
3. The possibility to quickly change applications for new situations.

Why Mashup? (cont’)

4. Gain valuable insights through information remix.
5. Innovate and create value through community contribution.

Mashup Development

(1) Choose some relevant components (from the Components Repository).
(2) Decide which should be connected and learn their specs.
(3) Glue.

The Problem

Given a large number of components, selecting the right components and the appropriate connections between them is hard. It is especially hard for inexperienced developers, and it takes time.


The Solution

A system that assists developers by recommending possible compositions for their chosen components.

“MatchUp”: “Collective Wisdom”. Iterative creation of the Mashup.

“MatchUp” method: AutoCompletion

In other fields: file locations in Unix, e-mail programs, source code editors.

Data-model Components

The atomic Mashlets
The basic unit in a Mashup. Implements a specific functionality, e.g.:
News RSS feeds. Visual functionalities. Drawing a map. Extracting the coordinates of a place.

The compound Mashlet: Glue Pattern (GP)
A logical component that combines several atomic Mashlets or other GPs. Every GP is a Mashup, e.g.:
Glue the previously mentioned Mashlets: Map + coordinates of a place + RSS News feeds = display News in a map
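The data model above can be pictured with a minimal sketch. The class names, the `components` method and the intermediate GP "News with coordinates" are assumptions made for illustration; they are not taken from MatchUp itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mashlet:
    """An atomic Mashlet: the basic unit of a Mashup."""
    name: str


@dataclass(frozen=True)
class GluePattern:
    """A compound Mashlet (GP) gluing atomic Mashlets and/or other GPs."""
    name: str
    parts: tuple  # Mashlet or GluePattern instances

    def components(self):
        """Return the set of atomic Mashlets reachable from this GP."""
        found = set()
        for part in self.parts:
            if isinstance(part, GluePattern):
                found |= part.components()  # descend into nested GPs
            else:
                found.add(part)
        return found


# The example from the slide: Map + coordinates of a place + RSS News feeds.
map_view = Mashlet("Map")
geocoder = Mashlet("Coordinates of a place")
news = Mashlet("RSS News feeds")
# Hypothetical intermediate GP, only to show that GPs can nest.
located_news = GluePattern("News with coordinates", (geocoder, news))
news_on_map = GluePattern("Display News in a map", (map_view, located_news))
```

Here `news_on_map.components()` yields the three atomic Mashlets, even though one of its parts is itself a GP.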

Mashups Graph

Our problem domain

[Figure: the DB, containing the sets M and GP, and the user's selection U.]

M = {atomic Mashlets available on the Web}
GP = {GPs available on the Web}
U = {Collection of Mashlets chosen by the user}

Given a group U, “MatchUp” gives the user recommendations of k possible combinations. It chooses GPs from GP. It may ignore or replace Mashlets from U. It may add Mashlets from the DB.

“MatchUp”: Autocompletion for Mashups.

The Mashlets

Support an interface of variables (I/O) and methods that are visible to other Mashlets.
Internal data.
Rules (logic): what the output is for a given input. May be implemented as queries, or using a high-level programming language.
Web-Services.
Inheritance.

Inheritance of Mashlets

Mashlets may be similar to others with common functionality. They can be distributed into a small group of types: Chat, Sports, Travel, Photos, etc.

                      Atomic Mashlets (APIs)   Mashups (GPs)
# in total            1573                     4551
# of categories       51                       20
# in Maps category    99                       2134
# in Music category   60                       310

Inheritance of Mashlets (cont’)

Mashlet m2 inherits from Mashlet m1 if: {m1 interface} ⊆ {m2 interface}

Inheritance distance metric: quantifies the price of using m1 instead of m2. Dist(m2→m1) ∈ [0,1)
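A minimal sketch of the inheritance test, assuming an interface is modelled as a set of names. The slides do not say how Dist is computed, so the `dist` function below is a hypothetical stand-in chosen only because it stays inside [0,1); the interface members are made up as well.

```python
def inherits(m2_interface, m1_interface):
    """m2 inherits from m1 if m1's interface is contained in m2's interface."""
    return set(m1_interface) <= set(m2_interface)


def dist(m2_interface, m1_interface):
    """Hypothetical Dist(m2 -> m1): the fraction of m2's interface that m1 lacks.

    This formula is an assumption for illustration, not the metric of the paper.
    """
    m1, m2 = set(m1_interface), set(m2_interface)
    if not m1 or not m1 <= m2:
        raise ValueError("m2 must inherit from a non-empty m1")
    # m1 is a non-empty subset of m2, so the result is always below 1.
    return len(m2 - m1) / len(m2)


# Made-up interfaces: a generic map and a more specific traffic map.
generic_map = {"center", "zoom"}
traffic_map = {"center", "zoom", "traffic_layer", "route"}

is_child = inherits(traffic_map, generic_map)
price = dist(traffic_map, generic_map)
```

With these toy interfaces, replacing the traffic map by the generic map gives up half of its interface, so `price` is 0.5, while `dist(generic_map, generic_map)` is 0.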

GPs significance measurement

A GP g is called “a candidate completion” if it can link a non-empty subset of U.

Each g is mapped to a D-dimensional point, D = |DB| + 1:
g → Pg = (Pg[0], Pg[m1], …, Pg[m|DB|]), with Pg[i] ∈ [0,1] for i = 0..|DB|

Some definitions

Given a GP g and a Mashlet m:
Components(g) = {all Mashlets in g}
g(m) = m’, if m’ ∈ Components(g) and m’ is the closest generalization of m, i.e. Dist(m→m’) is minimal.
g(m) is undefined if no such m’ exists.
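The definition of g(m) can be sketched as a small search over Components(g). The `dist` callback, the use of `None` for "undefined" and the toy distance table are assumptions for illustration.

```python
def closest_generalization(m, components, dist):
    """Return g(m): the component m' of g that minimises Dist(m -> m'), or None.

    dist(m, m_prime) is assumed to return a value in [0, 1) when m inherits
    from m_prime, and None when it does not.
    """
    best, best_d = None, None
    for m_prime in components:
        d = dist(m, m_prime)
        if d is not None and (best_d is None or d < best_d):
            best, best_d = m_prime, d
    return best


# Made-up distances: "TrafficMap" generalises to "Map" and, further, to "Widget".
table = {("TrafficMap", "Map"): 0.5, ("TrafficMap", "Widget"): 0.9}


def lookup(m, m_prime):
    return table.get((m, m_prime))


match = closest_generalization("TrafficMap", ["Widget", "Map", "News"], lookup)
no_match = closest_generalization("TrafficMap", ["News"], lookup)
```

Here `match` is "Map", the generalization with the smallest distance, and `no_match` is `None` because the GP contains nothing that "TrafficMap" inherits from.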

Pg = (Pg[0], Pg[m1], …, Pg[m|U|], Pg[m|U|+1], …, Pg[m|DB|])

Importance of g against the other GPs in GP:
Imp(g) = static importance of GP g (will be discussed later)
IMP(M) = {Imp(g) | g ∈ GP}

User’s Mashlets: Pg[m] is one of Dist(m→m’), 1 or 0.

PIdeal = (0, 0, 0, 0, 0, …, 0)

[Figure: the points PIdeal and Pg in a space with axes m1, m2 and GP static importance.]

Scoring function: S(Pg)

Inversely related to the distance between Pg and PIdeal.

Monotonicity: for g, g’ ∈ GP, if every coordinate of Pg is lower than or equal to the same coordinate of Pg’, then S(g’) ≤ S(g).
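One possible scoring function with these two properties, as a sketch. The slides only require that S decreases with the distance to PIdeal and is monotone; the normalised Euclidean distance used here and the sample points are assumptions.

```python
import math


def score(point):
    """Hypothetical S(Pg): 1 minus the normalised Euclidean distance to PIdeal.

    PIdeal is the all-zero point, so the distance is just the norm of the point.
    With coordinates in [0, 1] the result also lies in [0, 1].
    """
    distance = math.sqrt(sum(x * x for x in point))
    return 1.0 - distance / math.sqrt(len(point))


# Made-up points: every coordinate of p_g is lower than the one in p_g_prime.
p_g = (0.2, 0.1, 0.0)
p_g_prime = (0.4, 0.3, 0.5)

s_g = score(p_g)
s_g_prime = score(p_g_prime)
```

Because `p_g` is coordinate-wise closer to the ideal point, `s_g` is at least `s_g_prime`, which is the monotonicity property the algorithm relies on.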


[Figure: the sorted lists. Each entry is <gp, score>.]
L0 (GP popularity): <g1,0> <g2,0.2> <g3,0.4> <g4,0.4> <g5,0.4> <g6,0.4> <g7,1>
L1 (Mashlet): <g7,0.1> <g4,0.2> <g6,0.2> <g1,0.3> <g5,0.4> <g2,0.5> <g3,0.7>
L2 (Mashlet): <g4,0> <g3,0.2> <g1,0.5> <g2,0.5> <g7,0.5> <g5,0.8>
L|DB| (Mashlet): <g1,0.1> <g2,0.6> <g7,0.6> <g6,0.7> <g4,0.8> <g5,0.8> <g3,0.9>

Algorithm internal data

<g,w> ∈ Lm ⇒ g ∈ GP, g: m→m’, w = Dist(m→m’)

Entries of L0: <g, Pg[0]>


Algorithm stops when: |PQueue| = k && S(PQueue(k)) ≥ S(g’)
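The stop condition is that of a threshold-style (TA) top-k scan, which can be sketched as follows. This is an illustration under stated assumptions, not the MatchUp code: the scoring function is a hypothetical monotone one, only three of the lists from the slides are used, and g6's weight in L2 (absent from the slide) is assumed to be 1.0.

```python
import math


def score(point):
    # Hypothetical monotone score: 1 minus the normalised Euclidean
    # distance between the point and the all-zero ideal point.
    return 1.0 - math.sqrt(sum(x * x for x in point)) / math.sqrt(len(point))


def top_k(lists, k):
    """lists: equally long lists of (gp, weight), each sorted by ascending weight."""
    lookup = [dict(lst) for lst in lists]  # random access: gp -> weight, per list
    scored = {}                            # gp -> S(Pg) for every GP seen so far

    def best():
        return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]

    for depth in range(len(lists[0])):
        last_seen = []
        for lst in lists:                  # sorted access, one step per list
            gp, weight = lst[depth]
            last_seen.append(weight)
            if gp not in scored:
                scored[gp] = score([table[gp] for table in lookup])
        queue = best()
        # g' is the virtual GP whose coordinates are the last weights seen.
        if len(queue) == k and queue[-1][1] >= score(last_seen):
            return queue
    return best()


L0 = [("g1", 0.0), ("g2", 0.2), ("g3", 0.4), ("g4", 0.4),
      ("g5", 0.4), ("g6", 0.4), ("g7", 1.0)]
L1 = [("g7", 0.1), ("g4", 0.2), ("g6", 0.2), ("g1", 0.3),
      ("g5", 0.4), ("g2", 0.5), ("g3", 0.7)]
# The slide's L2 has no entry for g6; a weight of 1.0 is assumed here.
L2 = [("g4", 0.0), ("g3", 0.2), ("g1", 0.5), ("g2", 0.5),
      ("g7", 0.5), ("g5", 0.8), ("g6", 1.0)]

result = top_k([L0, L1, L2], k=2)
```

On this data the scan returns g4 and g1 after reading only the first three entries of each list, without ever touching g5.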

Problem with the algorithm

The number of lists the algorithm accesses is very large.

Most of the Mashlet lists are unrelated to the user’s selection. The average Mashup contains fewer than 5 Mashlets!

The refined Algorithm

Iterates only over L0 to L|U|. Not 100% correct when using the same definition for S(g’). Why?

It is enough for our problem once we redefine the general threshold S(g’): g’ doesn’t connect any irrelevant Mashlets.
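A sketch of how the refined scan might look: only the lists L0 to L|U| are read in sorted order, and the threshold point g’ is padded with zeros for every other Mashlet, since g’ connects no irrelevant Mashlets. The function name `ac_star`, the scoring function and the toy points are assumptions for illustration.

```python
import math


def score(point):
    # Hypothetical monotone score: closer to the all-zero ideal point is better.
    return 1.0 - math.sqrt(sum(x * x for x in point)) / math.sqrt(len(point))


def ac_star(user_lists, points, k):
    """user_lists: the lists L0..L|U| as [(gp, weight), ...], ascending by weight.
    points: gp -> full point with |DB|+1 coordinates (random access); its first
    len(user_lists) coordinates are the ones stored in user_lists.
    """
    dims = len(next(iter(points.values())))
    padding = [0.0] * (dims - len(user_lists))  # g' has no irrelevant Mashlets
    scored = {}

    def best():
        return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]

    for depth in range(len(user_lists[0])):
        last_seen = []
        for lst in user_lists:
            gp, weight = lst[depth]
            last_seen.append(weight)
            scored.setdefault(gp, score(points[gp]))  # full point, all coordinates
        queue = best()
        if len(queue) == k and queue[-1][1] >= score(last_seen + padding):
            return queue
    return best()


# Made-up points: (L0, L1, L2, one Mashlet outside U).
points = {
    "g1": (0.0, 0.3, 0.5, 0.0),
    "g2": (0.2, 0.5, 0.5, 0.0),
    "g3": (0.4, 0.7, 0.2, 0.9),
    "g4": (0.4, 0.2, 0.0, 0.8),
}
user_lists = [
    sorted(((gp, p[i]) for gp, p in points.items()), key=lambda e: e[1])
    for i in range(3)
]
completions = ac_star(user_lists, points, k=2)
```

In this toy run g4 looks best on the user's lists but is pushed down by the irrelevant Mashlet it adds, so the two completions returned are g1 and g2.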

Correctness proofs

Lemma: Let S(g’) be the threshold at the end of one iteration. Let g be a candidate GP that has not yet been examined by AC*. Then S(g) ≤ S(g’).

[Figure: the sorted lists scanned by AC*. Each entry is <gp, score>.]
L0 (GP popularity): <g1,0> <g2,0.2> <g3,0.4> <g4,0.4> <g5,0.4> <g6,0.4> <g7,1>
L1 (Mashlet): <g7,0.1> <g4,0.2> <g6,0.2> <g1,0.3> <g5,0.4> <g2,0.5> <g3,0.7>
L2 (Mashlet): <g4,0> <g3,0.2> <g1,0.5> <g2,0.5> <g7,0.5> <g5,0.8>
L3 (Mashlet): <g1,0.1> <g2,0.6> <g7,0.6> <g6,0.7> <g4,0.8> <g5,0.8> <g3,0.9>

Lemma’s proof

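The Theorem

Algorithm AC* returns a correct solution.

A proof by contradiction: suppose some g was supposed to be chosen but was not returned.
S(gk) < S(g), since g belongs in the top k.
AC* didn’t find g, so by the Lemma S(g) ≤ S(g’).
AC* stopped, so S(g’) ≤ S(gk).
Together, S(g) ≤ S(gk) < S(g): a contradiction!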


MatchUp’s Implementation

Can be combined with any Mashup editor.

Tested on the IBM Mashup Center platform:
InfoSphere Mashup Hub: creates XML data feeds. These are the Mashlets.
Lotus Mashups: a visual layer to assemble data feeds. These are the GPs.

MatchUp’s Implementation (cont’)

Extensions to the current DB: inheritance information; a relational DB for the lists, GP scores, etc.

Written in Java. Wrapped as a Web Service.

Inputs: a list of Mashlets U; an integer k.

Output: a list of the top-k possible completions for U.

[Figure: architecture with the IBM Mashup Center, the WebSphere Application Server, the MatchUp algorithm and the knowledge base, connected by interaction steps numbered 1 to 5.]


Stage #1: Checking Performance

Number of Mashlets: 1 to 40000, with a 1:3.5 ratio between |M| and |GP|. 4000 GPs at ProgrammableWeb.com.

GP structure: GP complexity c. At ProgrammableWeb.com, 2 ≤ c ≤ 5.

Experiments & Performance (cont’)

Inheritance depth: split into sets. Maximal inheritance depth d. Doesn’t affect performance.

Mashlet importance: uniform distribution for the base function. a, b, c don’t affect performance.

User input: 2 ≤ |U| ≤ 20, 3 ≤ k ≤ 20

[Figure: Runtime vs. #Mashlets & k, for c = 5 and c = 10.]

Less than a second!

[Figure: Runtime vs. #Mashlets & User Mashlets, for c = 5, k = 3.]

Stage #2: Quality of the Results

10 users used the system to build a travel-related Mashup, with k = 10. Did the users adopt the recommendations? Could they find better completions?

The users ranked the given completions. Their ranking was about the same as MatchUp’s; the differences reflect personal “taste”.

What is better: omitting a Mashlet, or adding a redundant one?


Summary

What’s a Mashup?
“MatchUp” helps developers create Mashups. An autocompletion mechanism. Can be attached to any Mashup editor.
Takes advantage of previous work.
A TA algorithm, based on some ranking functions.
Efficient and effective.


Any Questions?

Appendix 1. Computing Importance

The static importance of a GP g, Imp(g), and of a Mashlet m, Imp(m).

A base function for each m and g: number of downloads, or an explicit rating system.

A PageRank-style importance: importance by inheritance, and importance by Mashlet-GP connections.

3 weight parameters: a + b + c = 1. In “MatchUp”, a = b = c = 1/3.
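The slides do not give the importance formula, so the following is only one way such a PageRank-style computation could look: each node's importance is repeatedly recomputed as a weighted mix of its base score, the average importance of its inheritance neighbours and the average importance of the Mashlets or GPs it is connected to. The function name, the averaging rule, the fixed number of rounds and the toy data are all assumptions.

```python
def _avg(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def importance(base, inherit, links, a=1/3, b=1/3, c=1/3, rounds=50):
    """Hypothetical PageRank-style importance with weights a + b + c = 1.

    base: node -> base score in [0, 1] (e.g. normalised downloads or ratings).
    inherit / links: node -> list of neighbouring nodes.
    """
    imp = dict(base)
    for _ in range(rounds):
        # The whole new dict is built from the previous round's values.
        imp = {
            n: a * base[n]
            + b * _avg(imp[x] for x in inherit.get(n, ()))
            + c * _avg(imp[x] for x in links.get(n, ()))
            for n in base
        }
    return imp


# Made-up example: two Mashlets glued by one GP, no inheritance links.
base = {"map": 0.9, "news": 0.3, "news_on_map": 0.6}
links = {
    "map": ["news_on_map"],
    "news": ["news_on_map"],
    "news_on_map": ["map", "news"],
}
imp = importance(base, inherit={}, links=links)
```

Since b + c < 1, each round shrinks the influence of the neighbours, so the values settle; here "map" ends up more important than "news" because its base score is higher and both share the same neighbour.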

Appendix 2. Future plans

Add personal preferences to the scoring function.

Declare syntactic inheritance, based on logic programming for relational DBs (Datalog-style).

Support querying Mashlets’ DBs all over the web.