Autocompletion for Mashups. Ohad Greenshpan, Tova Milo, Neoklis Polyzotis. Tel-Aviv University, UCSC

Slide 1
Autocompletion for Mashups
Ohad Greenshpan, Tova Milo, Neoklis Polyzotis
Tel-Aviv University, UCSC

Slide 2
Talk Roadmap
- Introduction on Mashups and Autocompletion
- Problem Definition
- The Algorithm
- Implementation & Experiments
- Conclusions & Related Work

Slide 3
Introduction - What is a mashup?
Mashup is a technology for integrating data, services, and applications available on the web into a single application.

Slide 4
Application Integration
[Figure: components with Data, Logic, and GUI layers integrated on a Mashup Platform]

Slide 5
Mashup Development is difficult...
- Choose some relevant components
- Decide which should be connected and learn their spec
[Figure labels: Components Repository, Glue, Mashup Repositories]

Slide 6
Knowledge?

Slide 7
Introduction - Mashup Autocompletion

Slide 8
The Mashup Model
[Figure labels: Data & logic, Mashlets & Mashlet-APIs, API, Mashlet, Glue Pattern]

Slide 9
Inheritance
[Figure labels: A, B]

Slide 10
Mashup Autocompletion Problem Definition
Given a database of mashlets and glue patterns (GPs) and a set of mashlets selected by the user, identify and rank GPs that link a subset of the selected mashlets.
Ranking is based on:
- Popularity
- Relevance to the user query
The ideal GP: the most popular one that connects only the user mashlets and nothing else.
Relaxations:
- Less popular
- Connects variants of the user mashlets
- Connects a subset of the user mashlets
- Connects additional mashlets

Slide 11
Inheritance

Slide 12
Problem Abstraction
- Each glue pattern is represented as a point in a multidimensional space.
- One dimension represents the GP popularity.
- The rest represent all mashlets: 1) user mashlets, 2) other mashlets.
- The algorithm's goal is to find the top-k GPs that link the given user mashlets (the ones close to the optimal GP).
[Figure: a simplified 3D illustration with axes m1, m2, and GP Popularity; vectors shown: 0 0 0 0 0 0 0 0 0 ... and g: 0.4 0.3 0.2 0 1 0 0 1 0 1 0 ...]
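The abstraction on slide 12 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the slides do not say which distance is used, so the Manhattan (L1) distance below is an assumption, as are the 0/1 mashlet coordinates, the popularity scale of 0 to 1, and all function and variable names.

```python
from typing import Dict, List, Set, Tuple


def gp_vector(popularity: float, linked: Set[str], all_mashlets: List[str]) -> List[float]:
    """Point for one glue pattern: popularity first, then one 0/1 coordinate per mashlet."""
    return [popularity] + [1.0 if m in linked else 0.0 for m in all_mashlets]


def ideal_vector(user_mashlets: Set[str], all_mashlets: List[str]) -> List[float]:
    """The ideal GP: maximal popularity, links exactly the user mashlets and nothing else."""
    return gp_vector(1.0, user_mashlets, all_mashlets)


def l1_distance(u: List[float], v: List[float]) -> float:
    """Manhattan distance (an assumed choice; the slides do not name a metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def rank_glue_patterns(
    gps: Dict[str, Tuple[float, Set[str]]],
    user_mashlets: Set[str],
    all_mashlets: List[str],
    k: int,
) -> List[str]:
    """Return the names of the k glue patterns closest to the ideal point."""
    target = ideal_vector(user_mashlets, all_mashlets)
    scored = sorted(
        (l1_distance(gp_vector(pop, linked, all_mashlets), target), name)
        for name, (pop, linked) in gps.items()
    )
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Hypothetical data: three mashlets, four glue patterns.
    mashlets = ["m1", "m2", "m3"]
    patterns = {
        "g1": (0.9, {"m1", "m2"}),        # popular, links exactly the user mashlets
        "g2": (0.9, {"m1"}),              # links only a subset
        "g3": (0.5, {"m1", "m2", "m3"}),  # less popular, links an extra mashlet
        "g4": (0.2, {"m3"}),              # unrelated to the query
    }
    print(rank_glue_patterns(patterns, {"m1", "m2"}, mashlets, k=2))
```

Each relaxation from slide 10 (less popular, subset, additional mashlets) shows up here as a larger distance from the ideal point. This brute-force ranking scans every GP, which is what a top-k algorithm is meant to avoid.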

Slide 13
Data Structure & Basic Top-k Algorithm
Each list holds <gp, score> pairs:
L1: <g7, 0.1> <g4, 0.2> <g6, 0.2> <g1, 0.3> <g5, 0.4> <g2, 0.5> <g3, 0.7>
L2: <g4, 0.1> <g3, 0.2> <g1, 0.5> <g2, 0.5> <g7, 0.5> <g5, 0.8> <g6, 0.8>
L0: <g1, 0.1> <g2, 0.2> <g3, 0.4> <g4, 0.4> <g5, 0.4> <g6, 0.4> <g7, 0.4>
L3: <g1, 0.1> <g2, 0.6> <g7, 0.6> <g6, 0.7> <g4, 0.8> <g5, 0.8> <g3, 0.9>
[Figure labels: Glue Patterns, Mashlets, GP Popularity]

Slide 14
Problems with the algorithm
- The number of lists the algorithm accesses is very large
- Most of the mashlet lists are unrelated to the user selection (query)

Slide 15
Data Structure
[Figure labels: Glue Patterns, Mashlets, GP Popularity, User mashlets]

Slide 16
Algorithm
[Formula garbled in extraction; surviving fragment: n n and p g [m]=0 for n < m |M all | n M]

Slide 17
Correctness of AC*
Theorem 4.1: Algorithm AC* returns a correct solution.
The proof is based on a lemma showing that any candidate that has not been encountered by AC* has a total score lower than the threshold.
Optimality of AC*
Competing algorithms: C, the class of deterministic algorithms that operate under the same access model as AC*.
- Algorithms receive as input the lists, the monotonic function, and k.
- Algorithms can use any access order (i.e., not specifically round-robin) and any thresholding scheme, and can rely on accessed elements.
Instance optimality: AC* is instance optimal within class C if there are constants c and c0 such that, for every input instance I, cost(AC*, I) ≤ c · cost(A, I) + c0 for any A ∈ C.

Slide 18
Calculating Popularity
Glue Pattern and Mashlets Rank
- PageRank-style algorithm
- Takes into account the popularity of mashlets and GPs, as well as the relationships between them
[Figure labels: M, GP]

Slide 19
Implementation
[Figure labels: IBM Mashup Center, WebSphere Application Server, MatchUp Algorithm, Knowledge base; steps numbered 1 to 5]

Slide 20
Experiments (synthetic dataset)
Synthetic dataset for large-scale experiments
- Generated a DB of 40k mashlets & GPs (ProgrammableWeb has 4k)
- Based on ProgrammableWeb characteristics
Experiments for synthetic dataset
- Varying # of total mashlets and GPs
- Varying k
- Varying # of user mashlets
- Varying GP complexity

Slide 21
Results (synthetic dataset)
GP Complexity = 5, varying k

Slide 22
Results (synthetic dataset)
GP Complexity = 10, varying k

Slide 23
Results (synthetic dataset)
Varying # of user mashlets

Slide 24
Experiments (real dataset)
Real dataset
- Used real-life mashlets from ProgrammableWeb and IBM Mashup Center
- Scenario: development of a travel-related mashup
Experiments for quality assessment
- IBM Mashup Center as the mashup platform
- Users placed mashlets
- MatchUp offered top-10 GPs for their mashlets
- Users searched for alternatives
Results
- User satisfaction was high
- High correlation between suggestions and users' lists
- Browsing for additional results was in general unsuccessful
- Gluing process was significantly expedited

Slide 25
Related Work
Autocompletion in many other domains
- Phrase Prediction (Nandi & Jagadish, VLDB 2007)
- File locations (Myers, CHI 2000)
Web service composition
- Model for WS composition (Berardi et al., VLDB 2005)
- Optimized and customized algorithm (McIlraith and Son, KR 2002)
Mashup assembly tools
- MashMaker (Ennals & Garofalakis, SIGMOD 2007): data -> widgets
- MashupAdvisor (Elmeleegy et al., ICWS 2008): mashup -> output recommendation -> assembly to achieve this output

Slide 26
Future Work
- Infer semantic inheritance automatically
- Distributed environment
- Incorporating context and user preference
Conclusions
- A novel autocompletion mechanism for rapid development of mashups
- Uses the collective wisdom of other users on the web
- A dedicated threshold-based top-k algorithm which reduces the search space
- PageRank-style calculation of mashlet and glue pattern popularity
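
The threshold-based top-k idea named in the conclusions can be illustrated on the four lists from slide 13. This is a minimal sketch of a generic threshold algorithm, not of AC* itself, whose details are not recoverable from the slides. Two assumptions are made: the aggregation function is a plain sum (the slides only say it is monotonic), and each value is treated as a distance-like cost where smaller is better, as the ascending order of the lists suggests. If the scores are similarity-like instead, the sort order and the comparison flip. The function name and the `SLIDE_13_LISTS` constant are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

ScoredList = List[Tuple[str, float]]

# Lists L0..L3 as shown on slide 13, each sorted by ascending score.
SLIDE_13_LISTS: List[ScoredList] = [
    [("g1", 0.1), ("g2", 0.2), ("g3", 0.4), ("g4", 0.4), ("g5", 0.4), ("g6", 0.4), ("g7", 0.4)],
    [("g7", 0.1), ("g4", 0.2), ("g6", 0.2), ("g1", 0.3), ("g5", 0.4), ("g2", 0.5), ("g3", 0.7)],
    [("g4", 0.1), ("g3", 0.2), ("g1", 0.5), ("g2", 0.5), ("g7", 0.5), ("g5", 0.8), ("g6", 0.8)],
    [("g1", 0.1), ("g2", 0.6), ("g7", 0.6), ("g6", 0.7), ("g4", 0.8), ("g5", 0.8), ("g3", 0.9)],
]


def threshold_top_k(lists: List[ScoredList], k: int) -> List[Tuple[str, float]]:
    """Return the k glue patterns with the smallest summed score.

    Assumes every list contains every glue pattern and is sorted ascending.
    """
    # Dictionaries stand in for random access into each list.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    totals: Dict[str, float] = {}

    for pos in range(len(lists[0])):
        # Sorted access: read one entry from each list, round-robin.
        for lst in lists:
            gp, _ = lst[pos]
            if gp not in totals:
                # Random access: fetch this GP's score from every list.
                totals[gp] = sum(index[gp] for index in lookup)

        # No unseen GP can have a total below the sum of the last values read.
        threshold = sum(lst[pos][1] for lst in lists)
        best = heapq.nsmallest(k, totals.items(), key=lambda item: item[1])
        if len(best) == k and best[-1][1] <= threshold:
            return best  # early stop: the remaining entries cannot improve the answer

    return heapq.nsmallest(k, totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, total in threshold_top_k(SLIDE_13_LISTS, k=2):
        print(name, round(total, 2))
```

On this data the loop stops before reaching the bottom of the lists, which is the point of the threshold test. The sketch still reads from every list on every round, which is the cost that slide 14 identifies as the problem with the basic algorithm when most mashlet lists are unrelated to the query.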

