I2.1: In-Network Storage. NS-CTA INARC Meeting, 23-24 March 2011, Cambridge, MA

Upload: basil-knight

Post on 18-Dec-2015


TRANSCRIPT

  • Slide 1
  • I2.1: In-Network Storage. NS-CTA INARC Meeting, 23-24 March 2011, Cambridge, MA
  • Slide 2
  • Research Objective: Devise techniques to enhance the efficacy of data dissemination and query processing in an information network that is distributed over an unreliable communication network.
  • Slide 3
  • Researchers (Category: Name, Institution/Center)
      • Lead: Mudhakar Srivatsa, IBM/INARC
      • Researcher: Tarek Abdelzaher, UIUC/INARC
      • Researcher: Arun Iyengar, IBM/INARC
      • Researcher: Xifeng Yan, UCSB/INARC
      • Collaborator: Guohong Cao, PSU/CNARC
      • Collaborator: Vikas Kawadia, BBN/IRC
  • Slide 4
  • Task Overview: Data dissemination (Guohong Cao, C2.1), joint work with CNARC: (social-aware) mobility, hybrid networks; provenance dissemination; in-network caching. Complex graph query processing on clusters (Xifeng Yan, I2.2): distributed graph query processing on DTNs. Distributed trust computation (Vikas Kawadia, IRC, T2.3). Diagram labels: CNARC, I2.1, INARC.
  • Slide 5
  • Outline: (1) Provenance dissemination (Srivatsa, Iyengar, PSU/CNARC): in-network storage for provenance queries. (2) Diversity-based in-network caching (Tarek, Iyengar, Srivatsa, PSU/CNARC): enhancing the operational capacity of the network. (3) Cyclades (Srivatsa, Yan, BBN/IRC): distributed graph query processing.
  • Slide 6
  • Provenance Dissemination: References
      • M. Srivatsa, W. Gao and A. Iyengar. "Provenance driven Data Dissemination in Disruption Tolerant Networks". Under review (Fusion 2011). (IBM/INARC & PSU/CNARC)
      • W. Gao, A. Iyengar, M. Srivatsa and G. Cao. "Supporting Cooperative Caching in Disruption Tolerant Networks". In IEEE Intl Conference on Distributed Computing Systems (ICDCS), 2011. (IBM/INARC & PSU/CNARC)
      • Y. Zhang, W. Gao, G. Cao, T. La Porta, B. Krishnamachari, and A. Iyengar. "Social-Aware Data Diffusion in Delay Tolerant MANETs". Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer. (IBM/INARC & PSU/CNARC)
      • W. Gao, A. Iyengar and M. Srivatsa. "System and Method for Caching Provenance Information". Patent filed. (IBM/INARC & PSU/CNARC)
      • Discussions with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC). Used our DTN in-network caching code to run experiments on traces typical of military scenarios.
  • Slide 7
  • © 2010 IBM Corporation. Disruption Tolerant Networks (DTNs): opportunistic and intermittent network connectivity; low node density; unpredictable node mobility.
  • Slide 8
  • Network Model: Network contact graph at time t: an edge (i, j) exists iff nodes i and j have been in contact before time t. Each edge is modeled by a contact process (e.g., a homogeneous Poisson process), so the pairwise inter-contact time is exponentially distributed. Parameter: the pairwise contact rate. With this model we can predict when the next contact will happen.
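The contact model above can be made concrete with a small sketch. It assumes a homogeneous Poisson contact process with pairwise rate `rate` (contacts per unit time); the function names are hypothetical and not taken from the slides. Under that assumption, the probability of at least one contact within a horizon T is 1 - exp(-rate * T), and the expected wait for the next contact is 1 / rate.

```python
import math
import random


def contact_probability(rate, horizon):
    """P(at least one contact within `horizon`) for a Poisson contact process."""
    return 1.0 - math.exp(-rate * horizon)


def expected_wait(rate):
    """Mean of the exponential inter-contact time."""
    return 1.0 / rate


def sample_contact_times(rate, horizon, rng):
    """Draw contact instants in [0, horizon) by summing exponential gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    print(contact_probability(0.5, 2.0))
    print(expected_wait(0.25))
    print(len(sample_contact_times(2.0, 10.0, rng)))
```

A caching node could use `contact_probability` to estimate whether a peer is likely to be reachable before a query deadline; that use is an illustration, not something the slides specify.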
  • Slide 9
  • Basic Idea: Semi-ring provenance model: $d = a \lor (b \land c)$. Provenance level partial order: $\{\} < \{b\}, \{c\} < \{a\}, \{b, c\} < \{a, b, c\}$. Quantifying the marginal provenance level of a data item $d = f(a_1, \ldots, a_n)$: $d = a \cdot (b + c) \rightarrow w^d_b = w^d_c = \ldots;\ w^d_a = \ldots$ Utility-based data placement: a unified probabilistic framework; two caching nodes optimize their caches upon contact.
  • Adapt Sedge to DTN: Master; Vertex -> Partition Map; Network Contact Graph; Routing Table (opportunistic path); Worker; Message Queue; Overlapped Partition (Cache). Superstep = time slot (e.g., 1 min, 1 hour, 1 day). [Figure: contact graph on nodes 1-6 with partitions P1-P6; labels: Contact Graph, Cluster Connection.]
  • Slide 23
  • Preliminary Experiments: Running Sedge on DTNs. Data: Web graph with 30M vertices and 150M edges; # of partitions = # of nodes in contact graph. Contact graphs: (1) Complete contact graph: 12 nodes; (2) MIT Bluetooth contact: 9 nodes, 24 hours (http://crawdad.cs.dartmouth.edu). Assumption: pairwise contacts follow a Poisson distribution; other, more sophisticated contact types are also supported by Sedge. Query: h-step random walk query.
  • Slide 24
  • Complete Contact Graph: random walk (RW) with random start. Plot parameter: average contacts per superstep.
  • Slide 25
  • MIT Bluetooth Contact (24 hours). [Figure: contact graph on nodes 1-9; edge weights w(e) = average contacts per hour: 0.37, 0.33, 0.82, 0.33, 0.71, 0.47, 0.4, 0.3, 1.0, 0.25, 0.39, 0.33.]
  • Slide 26
  • Cyclades: Cyclades is Sedge for DTNs. BSP model: a synch corresponds to a contact in the DTN. Naively running Sedge on DTNs may be inefficient (e.g., it requires a large number of contacts, hence high query latency). Revisit graph query processing under DTN constraints: machines have high data rates to each other whenever they are in contact, but contact opportunities may be rare. Cyclades innovations: a new metric (minimize the number of contacts, i.e., synchs in BSP); algorithms based on intra-machine speculative execution and inter-machine opportunistic aggregation. We present results on computing shortest-path-based metrics (e.g., node betweenness).
  • Slide 27
  • Cyclades initial results. Dataset: DBLP data with 1.2M papers. Information network: each paper is a node; two nodes have an undirected edge if they have a common author; the weight of an edge is the inverse of the Jaccard distance between the author lists of the papers. Problem: compute node betweenness centrality.
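As a rough illustration of the edge-weight rule on this slide, the sketch below computes the inverse of the Jaccard distance between two author lists. It assumes the standard definition, Jaccard distance = 1 - |A ∩ B| / |A ∪ B|, and makes two choices the slides do not state: papers with identical author lists get an infinite weight, and pairs with no common author get no edge (`None`). The function names are hypothetical.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of author names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def edge_weight(authors_u, authors_v):
    """Inverse Jaccard distance, or None when the papers share no author."""
    if not set(authors_u) & set(authors_v):
        return None  # no common author: the slide defines no edge here
    d = jaccard_distance(authors_u, authors_v)
    # Assumption: identical author lists (distance 0) map to infinity.
    return math.inf if d == 0.0 else 1.0 / d


if __name__ == "__main__":
    print(edge_weight(["Gao", "Cao"], ["Cao", "Iyengar"]))
    print(edge_weight(["Gao"], ["Yan"]))
```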
  • Slide 28
  • Cyclades initial results: Approach. Partition nodes into clusters; define a perimeter node as a node that has an edge to a node in another cluster. Intra-cluster: compute the all-pair shortest path matrix M between perimeter nodes (using only edges within the partition). Inter-cluster: on an opportunistic contact between two partitions i and j, merge the all-pair shortest path matrices M_i and M_j into M_ij. Guarantee: M_ij is the all-pair shortest path matrix on perimeter nodes in the merger of partitions i and j.
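One way to picture the inter-cluster merge step is the sketch below. It is not the Cyclades implementation: it assumes an undirected graph, disjoint perimeter node sets, matrices stored as nested dicts, and a hypothetical `cross_edges` list of `(u, v, weight)` edges between the two partitions. It seeds a combined matrix with M_i, M_j and the cross edges, then runs Floyd-Warshall over the union of perimeter nodes. Because every path that crosses between the partitions must pass through perimeter nodes, the result holds shortest distances within the merged partition.

```python
import math


def merge_perimeter_apsp(m_i, m_j, cross_edges):
    """Merge two perimeter shortest-path matrices (dict of dicts) into one."""
    nodes = list(m_i) + list(m_j)
    dist = {u: {v: math.inf for v in nodes} for u in nodes}
    # Seed with the intra-partition distances already computed.
    for m in (m_i, m_j):
        for u, row in m.items():
            for v, d in row.items():
                dist[u][v] = d
    for u in nodes:
        dist[u][u] = 0.0
    # Add the undirected edges that connect the two partitions.
    for u, v, w in cross_edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Floyd-Warshall relaxation over the union of perimeter nodes.
    for k in nodes:
        for u in nodes:
            for v in nodes:
                via = dist[u][k] + dist[k][v]
                if via < dist[u][v]:
                    dist[u][v] = via
    return dist


if __name__ == "__main__":
    m_i = {"a": {"a": 0, "b": 5}, "b": {"a": 5, "b": 0}}
    m_j = {"c": {"c": 0, "d": 1}, "d": {"c": 1, "d": 0}}
    merged = merge_perimeter_apsp(m_i, m_j, [("a", "c", 1), ("b", "d", 1)])
    print(merged["a"]["b"])  # the route a-c-d-b is shorter than the direct 5
```

The sketch keeps every input node in the output. Dropping nodes that stop being perimeter nodes after the merge would shrink the matrix, which is left out here.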
  • Slide 29
  • Cyclades initial results: Clustering on DBLP data with 100k and 200k nodes. Shows the number of perimeter nodes and edges, and the maximum size of clustered partitions.
  • Slide 30
  • Cyclades initial results: Comparison of four shortest path algorithms: Centralized (Dijkstra's algorithm); Pregel random (Pregel with random partitioning); Pregel cluster (Pregel with node clustering); Cyclades (requires provably minimum number of contacts in the BSP model; guarantees on communication and computation cost being explored).
  • Slide 31
  • Cyclades initial results: Number of synch operations and communication cost between partitions. Each synch operation requires at least one DTN contact. Since contacts in DTNs are rare, we have to minimize the number of synchs to ensure low query latency.
  • Slide 32
  • Cyclades initial results: Improved communication cost in our approach comes at the cost of higher computation and storage cost. Our approach trades off the number of synch operations (query latency) against computation and storage cost.
  • Slide 33
  • Cyclades initial results: Computing the node betweenness centrality. Sampling strategies: randomly chosen u-v pairs; random walk sampling (picks nodes with high degree); expansion sampling (greedily picks nodes with maximum expansion |N(S)|/|S|). Figure shows the accuracy of node betweenness versus the number of samples.
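The expansion sampling idea can be sketched as a simple greedy loop. This is an illustrative reading of the slide, not the authors' code: N(S) is taken to be the set of neighbors of S outside S, the graph is an adjacency dict of sets, and at each step the node that yields the largest |N(S)| is added. Since |S| is the same for every candidate at a given step, this also maximizes |N(S)|/|S|. Ties are broken by sorted node order, which is an arbitrary choice made for repeatability.

```python
def neighborhood(adj, s):
    """Nodes adjacent to some member of `s` but not in `s` themselves."""
    out = set()
    for u in s:
        out |= adj[u]
    return out - s


def expansion_sample(adj, k):
    """Greedily pick up to k nodes, each maximizing |N(S)| after its addition."""
    sample = set()
    order = []
    for _ in range(min(k, len(adj))):
        best = max(
            (v for v in sorted(adj) if v not in sample),
            key=lambda v: len(neighborhood(adj, sample | {v})),
        )
        sample.add(best)
        order.append(best)
    return order


if __name__ == "__main__":
    # Small undirected example: node 0 is a hub, node 3 bridges to node 4.
    adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
    print(expansion_sample(adj, 2))
```

This brute-force version recomputes the neighborhood for every candidate, so it suits small graphs only.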
  • Slide 34
  • Military and Network Science Relevance: Tactical military networks: intermittent connectivity, multiple modalities of communication, unreliable communication. They need disruption-tolerant data dissemination and query answering, and the ability to control tradeoffs between quality and performance. Enhancing trust in decision making: provenance dissemination and distributed trust computation. Joint analysis of communication and information networks to enhance the efficacy of information delivery and query processing. Examine a spectrum of expressiveness of information network models.
  • Slide 35
  • Path Ahead: In-network analytics and query answering on DTNs (with CNARC). Examine diversity-based (partial) redundancy elimination mechanisms; quantify tradeoffs between quality and performance of query answering. Characterize graph query processing algorithms that benefit from hierarchical decomposition and/or speculative execution. Determine better graph partitioning, sampling and clustering strategies to enhance query processing.
  • Slide 36
  • Collaborations
      • Within task: numerous telecons; Xifeng's student (UCSB) -> IBM (summer 2011) to work on distributed graph query processing in DTNs.
      • Within INARC: I2.2: provide query execution interface for the DTN (battlefield) context; modify distributed information network processing platform for DTNs. I1: I2.1 provides a scenario and data set for investigation of QoI metrics for data pools (in I1.2) that maximize quality of fusion; I2.1 offers a test scenario for algorithmic advances in I1.1 that focus on improving inference in resource-constrained networks.
      • With CNARC (Guohong Cao, PSU C2.1): research interest in DTNs and in-network storage; Guohong's student (PSU/CNARC) -> IBM/INARC (summer 2010 & 2011) to work on delta encoding in DTNs.
      • With IRC (Vikas Kawadia, BBN T2.3): distributed trust computation over in-network storage.
  • Slide 37
  • Impact
      • Publications
          • M. Srivatsa, W. Gao and A. Iyengar. "Provenance driven data dissemination in disruption tolerant networks". Under submission, Fusion 2011.
          • W. Gao, A. Iyengar, M. Srivatsa and G. Cao. "Supporting Cooperative Caching in Disruption Tolerant Networks". In ICDCS 2011.
          • Y. Zhang, W. Gao, G. Cao, T. La Porta, B. Krishnamachari, and A. Iyengar. "Social-Aware Data Diffusion in Delay Tolerant MANETs". Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer.
          • W. Gao, A. Iyengar and M. Srivatsa. "System and Method for Caching Provenance Information". Patent filed.
          • F. Le, M. Srivatsa, A. Iyengar and G. Cao. "Resolving Negative Interferences between In-Network Caching Methods". Under preparation.
      • Demo/Transitions
          • Md Y. S. Uddin, Guo-Jun Qi, Tarek Abdelzaher, and Guohong Cao. "PhotoNet: A Similarity-aware Image Delivery Service for Situation Awareness". IPSN Demo, April 2011.
          • Collaborations with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC). Used our DTN in-network caching code to run experiments on traces typical of military scenarios.
  • Slide 38
  • Questions Contact: Mudhakar Srivatsa ([email protected])