Ioannis Konstantinou, School of ECE, Computing Systems Laboratory
Posted on 19-Jan-2016
GridNEWS: A distributed Grid platform for efficient storage, annotation, indexing and searching of large audiovisual news content
Ioannis Konstantinou
School of ECE, Computing Systems Laboratory
National Technical University of Athens
Concept
Text-based search in audiovisual content
Search results: portions of video files containing the selected keywords
Example: a user searches for the keyword “Acropolis”; the video portions containing the spoken word “Acropolis” are located and presented to the user
YouTube-like functionality
Objectives
Keyword extraction from video files using automatic speech recognition (ASR) algorithms
Efficient and scalable distributed storage of large media content
Indexing of extracted metadata for efficient keyword search
YouTube-like user interface for video searching/downloading
Contributions to existing Grid middleware using GGF-standardized components
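The search concept and the indexing objective can be illustrated with a minimal inverted index that maps each spoken keyword to the videos and time positions where it occurs. This is a sketch, not the GridNEWS implementation. It assumes the ASR output for each video is a list of (word, start time in seconds) pairs, and the names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(transcripts):
    """transcripts: video URL -> list of (word, start_seconds) pairs.

    The (word, start) input format is an assumption about the ASR output.
    Returns keyword -> sorted list of (video, start_seconds) hits.
    """
    index = defaultdict(list)
    for video, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((video, start))  # case-insensitive keys
    for hits in index.values():
        hits.sort()  # deterministic order: by video, then by time
    return dict(index)


def search(index, keyword):
    """Return every (video, time position) where the keyword was spoken."""
    return index.get(keyword.lower(), [])


if __name__ == "__main__":
    idx = build_index({
        "video2.avi": [("Acropolis", 40.5), ("Athens", 12.0)],
        "video1.avi": [("the", 1.0), ("acropolis", 7.25)],
    })
    print(search(idx, "Acropolis"))
```

The time positions returned by `search` are what a portal would use to locate and present the matching video portions.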
Addressed Issues
Execution: distributed execution of CPU- and data-intensive speech recognition algorithms
Storage server load balancing using performance metrics
Client transfer time optimization using BitTorrent-like algorithms
Increased data availability
Multi-organizational data storage support using Virtual Organizations (VOs)
Video file keyword extraction procedure
[Figure: components MetadataDB, MetaScheduler, three clusters and storage elements (SEs); the application software (CVSP tool) uses a Data Client for GET/PUT file operations; arrows labeled Schedule Execution, MetadataFile Export, MetadataParser Import and Update.]
Distributed Execution Platform Architecture
[Figure: a central MDS server, the GridWay Metascheduler (Globus Toolkit 4, MDS) and the user interface sit on top; local MDS servers publish aggregate cluster information using SOAP/XML messages. Each Computing Element runs the PBS scheduler and Globus Toolkit 4 (MDS + WS GRAM) and collects Ganglia statistics from its worker nodes (WNs), which run the PBS client service and the Ganglia tool.]
MDS: Monitoring and Discovery System; PBS: Portable Batch System; WS GRAM: Grid Resource Allocation and Management (web services version); WN: Worker Node
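The following sketch shows how aggregated cluster information could drive cluster selection. It rests on several assumptions. Each worker node reports its CPU counts and the Ganglia `load_one` value. The local MDS server publishes a per-cluster summary. The metascheduler prefers the cluster with the most free CPUs. `WorkerStats`, `aggregate` and `pick_cluster` are hypothetical names. In GridWay itself, such a policy is expressed through the job template's REQUIREMENTS and RANK expressions rather than Python code.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    cpus: int
    busy_cpus: int
    load_one: float  # 1-minute load average, as reported by Ganglia


def aggregate(nodes):
    """Hypothetical per-cluster summary a local MDS server might publish."""
    return {
        "total_cpus": sum(n.cpus for n in nodes),
        "free_cpus": sum(n.cpus - n.busy_cpus for n in nodes),
        "avg_load": sum(n.load_one for n in nodes) / len(nodes),
    }


def pick_cluster(published, cpus_needed):
    """Choose the cluster with the most free CPUs; break ties by lower load.

    Returns None when no cluster can run the job right now.
    """
    candidates = {
        name: info for name, info in published.items()
        if info["free_cpus"] >= cpus_needed
    }
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda name: (candidates[name]["free_cpus"], -candidates[name]["avg_load"]),
    )


if __name__ == "__main__":
    published = {
        "clusterA": aggregate([WorkerStats(2, 2, 1.9), WorkerStats(2, 1, 1.0)]),
        "clusterB": aggregate([WorkerStats(2, 0, 0.1), WorkerStats(2, 0, 0.2),
                               WorkerStats(2, 1, 0.8)]),
    }
    print(pick_cluster(published, cpus_needed=2))
```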
Distributed Storage Platform Architecture
[Figure: the Storage Subsystem consists of storage elements (SEs). Each SE runs a GridFTP server, the Network Weather Service (NWS) and the DRLSCom service in a Globus WS (Apache Axis) container. The SEs form the DRLS over a Pastry peer-to-peer overlay and run a replication algorithm that uses NWS statistics. On the client machine, the GridNews DataClient component (GridFTP client plus Parallel Downloader) performs three tasks: it queries the Storage Subsystem with SOAP/XML messages to obtain candidate storage servers, gets/puts LFN to PFN mappings with SOAP/XML messages, and transfers files with GridFTP get/put.]
MDS: Monitoring and Discovery System; GridFTP: Grid File Transfer Protocol; DRLS: Distributed Replica Location Service; SE: Storage Element; PFN: Physical File Name; LFN: Logical File Name
Distributed Replica Location Service (DRLS)
Stores the mappings of LFNs to PFNs
DHT used: Pastry P2P
Logarithmic routing: in a network with n nodes, a query needs only O(log n) messages (hops)
Plaxton-style prefix routing keeps query latency low
Redundancy through replication eliminates single points of failure
Inherent load-balancing capabilities through consistent hashing
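The consistent-hashing and replication properties listed above can be shown with a toy model. It does not use the Rice Pastry API and does not simulate Pastry's routing tables. It hashes node names and LFNs onto a circular ID space and stores each LFN to PFN mapping on the k nodes whose IDs are numerically closest to the LFN's hash. `max_hops` gives the ⌈log_{2^b} n⌉ hop count of prefix routing with base-2^b digits. The 32-bit ID space, k = 2, the class `ToyDRLS` and the example host names are all illustrative assumptions. Whether GridNEWS replicates mappings exactly this way is also an assumption.

```python
import hashlib

ID_BITS = 32        # hypothetical small ID space; Pastry itself uses 128-bit nodeIds
RING = 2 ** ID_BITS


def to_id(text):
    """Hash a node name or an LFN onto the circular identifier space."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big") % RING


def ring_distance(a, b):
    """Numerical distance between two IDs on the ring."""
    d = abs(a - b)
    return min(d, RING - d)


def max_hops(n, b=4):
    """Prefix routing with base-2**b digits takes about ceil(log_{2^b} n) hops."""
    hops, reach = 0, 1
    while reach < n:  # integer arithmetic avoids floating-point log rounding
        reach *= 2 ** b
        hops += 1
    return hops


class ToyDRLS:
    """Each node keeps a local LFN -> set-of-PFNs table; a mapping is stored
    on the k nodes whose IDs are numerically closest to hash(LFN)."""

    def __init__(self, node_names, k=2):
        self.k = k
        self.tables = {to_id(name): {} for name in node_names}

    def responsible(self, lfn):
        key = to_id(lfn)
        return sorted(self.tables, key=lambda node: (ring_distance(node, key), node))[: self.k]

    def put(self, lfn, pfn):
        for node in self.responsible(lfn):
            self.tables[node].setdefault(lfn, set()).add(pfn)

    def get(self, lfn):
        for node in self.responsible(lfn):
            if lfn in self.tables[node]:
                return set(self.tables[node][lfn])
        return set()

    def fail(self, node_id):
        """Simulate a node crash: its local table disappears."""
        self.tables.pop(node_id, None)
```

When the closest node fails, the second-closest node becomes the closest. Its replica of the mapping therefore still answers the query, so there is no single point of failure.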
Load Balancing
Servers exchange load metrics: CPU, bandwidth, free disk space
Prediction algorithms (e.g. linear regression) forecast future metric values from historical data
Weighted Normalized Metric: $\mathrm{WNM} = W_m \times \frac{M_t}{M_{\max}}$
Total Server Load: $\mathrm{TSL} = \sum_{i=1}^{n} \mathrm{WNM}_i$
Servers maintain a numerically sorted list of TSL values, one per server
The TSL list is refreshed periodically
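The load-balancing formulas can be sketched as follows. Here the sum in TSL runs over the load metrics of one server. The sketch rests on several assumptions. Every metric is expressed as a load, so used disk space stands in for free disk space. The weights $W_m$ are hypothetical. Each prediction comes from an ordinary least-squares line fitted to the metric's recent history and is clamped to $[0, M_{\max}]$.

```python
def forecast_next(history):
    """Ordinary least-squares line through (t, value), evaluated at t = len(history)."""
    n = len(history)
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    return mean_y + (sxy / sxx) * (n - mean_x)


def wnm(weight, value, max_value):
    """Weighted Normalized Metric: W_m * (M_t / M_max)."""
    return weight * (value / max_value)


def sorted_tsl(servers, weights, maxima):
    """servers: name -> {metric: history}. Returns (server, TSL) pairs, least loaded first."""
    tsl = {}
    for name, histories in servers.items():
        total = 0.0
        for metric, weight in weights.items():
            predicted = forecast_next(histories[metric])
            predicted = min(max(predicted, 0.0), maxima[metric])  # keep within [0, M_max]
            total += wnm(weight, predicted, maxima[metric])
        tsl[name] = total
    return sorted(tsl.items(), key=lambda item: item[1])


if __name__ == "__main__":
    weights = {"cpu": 0.5, "bw": 0.3, "disk_used": 0.2}     # hypothetical W_m
    maxima = {"cpu": 100, "bw": 100, "disk_used": 100}      # hypothetical M_max
    servers = {
        "se1": {"cpu": [90, 95, 100], "bw": [50, 50, 50], "disk_used": [10, 10, 10]},
        "se2": {"cpu": [10, 10, 10], "bw": [20, 20, 20], "disk_used": [30, 30, 30]},
    }
    print(sorted_tsl(servers, weights, maxima))
```

In this example the rising CPU history of `se1` is forecast to exceed its maximum and is clamped, so `se2` heads the sorted list.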
Replication
Upon a STORE client request:
The top k servers are selected from the WNM list (k: configurable static replication factor)
The most suitable server is returned to the client
The client initiates a single GridFTP file upload
The server replicates the new file according to the WNM list and the factor k
The DRLS is informed about the new LFN->PFN mappings
The client is informed upon completion
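The STORE steps can be traced with a small simulation in which no data is actually transferred. It assumes that the server list is already sorted with the most suitable (least loaded) server first, and it uses a plain dictionary for the DRLS. It also assumes PFNs of the form `gsiftp://<server>/<lfn>`. `handle_store` and that PFN layout are hypothetical.

```python
def handle_store(lfn, ranked_servers, k, drls):
    """Simulate a STORE request.

    ranked_servers: server names, most suitable first (assumed pre-sorted).
    drls: dict LFN -> set of PFNs, standing in for the DRLS.
    Returns the server handed to the client and a log of the steps.
    """
    if not 1 <= k <= len(ranked_servers):
        raise ValueError("replication factor k must be between 1 and the number of servers")
    targets = ranked_servers[:k]   # top k servers
    primary = targets[0]           # most suitable server, returned to the client
    log = [f"client uploads {lfn} to {primary} via a single GridFTP transfer"]
    for replica in targets[1:]:    # server-side replication up to factor k
        log.append(f"{primary} replicates {lfn} to {replica}")
    drls.setdefault(lfn, set()).update(f"gsiftp://{s}/{lfn}" for s in targets)
    log.append(f"client notified: {lfn} stored on {len(targets)} servers")
    return primary, log


if __name__ == "__main__":
    drls = {}
    primary, log = handle_store("news/clip1.avi", ["se2", "se1", "se3"], 2, drls)
    print(primary)
    print("\n".join(log))
    print(drls)
```

Because the client uploads only once and the servers replicate among themselves, the client's upload time does not grow with k.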
Parallel Downloader
Upon a GET client request:
The server contacts the DRLS and retrieves the replica locations
The client establishes N GridFTP connections
The client issues N parallel (threaded) requests for small data chunks
After each successful retrieval, the client issues another request
Reduced file transfer time: the larger portion of the file is retrieved from the faster storage nodes
To be replaced by GridTorrent
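The chunk-pulling idea can be sketched with threads and a shared queue of chunk requests. The sketch makes several assumptions. It opens one connection per replica (N equals the number of replicas). Each `fetch(offset, length)` callable stands in for a partial GridFTP get. Failed chunks are not retried. Because every worker asks for the next chunk as soon as its previous one arrives, faster replicas serve more chunks. `parallel_download` and `make_replica` are hypothetical names.

```python
import queue
import threading
import time


def parallel_download(file_size, chunk_size, replicas):
    """replicas: name -> fetch(offset, length) -> bytes.

    Returns the reassembled data and how many chunks each replica served.
    """
    pending = queue.Queue()
    for offset in range(0, file_size, chunk_size):
        pending.put((offset, min(chunk_size, file_size - offset)))
    chunks = {}
    served = {name: 0 for name in replicas}
    lock = threading.Lock()

    def worker(name, fetch):
        while True:
            try:
                offset, length = pending.get_nowait()  # next small chunk request
            except queue.Empty:
                return  # nothing left to fetch
            data = fetch(offset, length)
            with lock:
                chunks[offset] = data
                served[name] += 1

    threads = [threading.Thread(target=worker, args=(name, fetch))
               for name, fetch in replicas.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b"".join(chunks[o] for o in sorted(chunks)), served


def make_replica(content, delay):
    """Simulated storage server that answers each chunk request after `delay` seconds."""
    def fetch(offset, length):
        time.sleep(delay)
        return content[offset:offset + length]
    return fetch


if __name__ == "__main__":
    content = bytes(range(256)) * 4
    data, served = parallel_download(len(content), 32, {
        "fast": make_replica(content, 0.001),
        "slow": make_replica(content, 0.03),
    })
    print(data == content, served)
```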
GridTorrent
Metadata fields: current ID, file size, piece size, hashes
Distributed RLS instead of a tracker
Partial GridFTP for the actual transfer
BitTorrent replica selection and tit-for-tat algorithms
Compatible with plain GridFTP servers
The PFN's prefix determines the protocol (gtp://site.fully.qualified.domain.name/path/to/file)
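Two parts of this design can be sketched: metadata holding per-piece hashes that a downloader uses to verify each received piece, and protocol selection from the PFN prefix. The sketch makes several assumptions. It hashes pieces with SHA-1, as BitTorrent does. It maps the `gsiftp` scheme to plain GridFTP servers. `TorrentMetadata`, `make_metadata`, `verify_piece` and `transfer_mode` are hypothetical names. Only the `gtp://` prefix comes from the slide.

```python
import hashlib
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class TorrentMetadata:
    current_id: str
    file_size: int
    piece_size: int
    hashes: list  # hex SHA-1 digest of each piece (assumed, as in BitTorrent)


def make_metadata(current_id, data, piece_size):
    """Split the file into fixed-size pieces (the last may be shorter) and hash each."""
    hashes = [hashlib.sha1(data[i:i + piece_size]).hexdigest()
              for i in range(0, len(data), piece_size)]
    return TorrentMetadata(current_id, len(data), piece_size, hashes)


def verify_piece(meta, index, piece):
    """Accept a received piece only if its hash matches the metadata."""
    return hashlib.sha1(piece).hexdigest() == meta.hashes[index]


def transfer_mode(pfn):
    """Choose the transfer protocol from the PFN's scheme prefix."""
    scheme = urlparse(pfn).scheme
    if scheme == "gtp":
        return "gridtorrent"
    if scheme == "gsiftp":  # assumption: plain GridFTP servers use gsiftp:// PFNs
        return "plain-gridftp"
    raise ValueError(f"unsupported PFN scheme: {scheme!r}")


if __name__ == "__main__":
    meta = make_metadata("id1", b"a" * 100, 32)
    print(len(meta.hashes), verify_piece(meta, 3, b"a" * 4))
    print(transfer_mode("gtp://site.fully.qualified.domain.name/path/to/file"))
```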
[Figure: GridTorrent operation. Peer1 and Peer2 run GridTorrent clients and Peer3 is a plain GridFTP server. Peers open new BitTorrent or striped GridFTP connections to each other and exchange file pieces. Peers publish their PFNs in the Distributed RLS, which maps a CurrentID to pfn1, pfn2, pfn3, …; a peer that wants to download a file obtains size, piece_size, hashes and currentID.]
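The BitTorrent algorithms that GridTorrent reuses can be sketched in simplified form. Piece selection here is rarest first, the usual BitTorrent policy. Peer selection is tit-for-tat: the client reciprocates to the peers that upload fastest to it and adds one random "optimistic unchoke", as in the classic BitTorrent scheme. This is not GridTorrent code. The function names and the default of three regular upload slots are illustrative assumptions.

```python
import random
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Pick the needed piece held by the fewest peers (ties: lowest index).

    needed: piece indices we still lack; peer_pieces: peer -> set of indices it has.
    """
    counts = Counter(p for pieces in peer_pieces.values() for p in pieces)
    available = [p for p in needed if counts[p] > 0]
    if not available:
        return None  # no connected peer has anything we need
    return min(available, key=lambda p: (counts[p], p))


def choose_unchoked(upload_rate_from, slots=3, rng=random):
    """Tit-for-tat: unchoke the `slots` peers that upload fastest to us,
    plus one randomly chosen peer from the rest (optimistic unchoke)."""
    ranked = sorted(upload_rate_from, key=upload_rate_from.get, reverse=True)
    unchoked = ranked[:slots]
    rest = ranked[slots:]
    if rest:
        unchoked.append(rng.choice(rest))
    return unchoked


if __name__ == "__main__":
    print(rarest_first([0, 1, 2], {"a": {0, 1}, "b": {0}, "c": {1, 2}}))
    print(choose_unchoked({"p1": 5, "p2": 1, "p3": 9, "p4": 3, "p5": 2},
                          slots=2, rng=random.Random(0)))
```

The optimistic unchoke lets a newcomer with nothing to offer yet receive its first pieces. It also lets the client discover faster peers.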
EXISTING MIDDLEWARE CONTRIBUTION
Design and development of a dynamic replica selection/placement algorithm
Support for multiple clusters using the GridWay Metascheduler
Replacement of the centralized replica location service with a scalable, distributed peer-to-peer solution
Enhanced BitTorrent-like file downloading from multiple sources
Development Testbed
Hardware
4× dual-core AMD Opteron(tm) Processor 875, 2.2 GHz (8 virtual CPUs), 16 GB RAM
Deployment of 5 Xen virtual machines, 2 GB RAM
Software
Globus Toolkit v4.0; Globus WS Core (Apache Axis WS container); Rice Pastry P2P (Java); Network Weather Service; Torque (OpenPBS) scheduler; GridWay Metascheduler
Virtualization
Xen hypervisor
Paravirtualization technique: guest OSes run a special “Xen-aware” kernel
Direct utilization of special CPU instructions
Faster than full virtualization (e.g. VMware)
Benefits of using the Xen hypervisor
Easy prototype management/administration
Simple control of the node lifecycle
Facilitates prototype deployment on many actual nodes
Current work
Replacing the Parallel Downloader with GridTorrent
Deploying the prototype on the PlanetLab testbed
Running experiments
Fine-tuning the designed algorithms
GridNEWS Portal
Users can:
Perform keyword searches in the automatically annotated multimedia content
View the videos in their browser, YouTube-style
Download only the fragment of the video where the keyword occurs
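Downloading only the relevant fragment requires turning keyword time positions into clip boundaries. The sketch below pads each occurrence by a fixed number of seconds, clips the result to the video's duration and merges overlapping clips. The 5-second padding and the name `keyword_fragments` are hypothetical choices, not taken from the portal.

```python
def keyword_fragments(times, duration, pad=5.0):
    """Turn keyword occurrence times (seconds) into merged (start, end) clips.

    Each occurrence t becomes [t - pad, t + pad], clipped to [0, duration];
    overlapping or touching clips are merged into one fragment.
    """
    fragments = []
    for t in sorted(times):
        start, end = max(0.0, t - pad), min(duration, t + pad)
        if fragments and start <= fragments[-1][1]:
            fragments[-1][1] = max(fragments[-1][1], end)  # extend previous clip
        else:
            fragments.append([start, end])
    return [tuple(f) for f in fragments]


if __name__ == "__main__":
    print(keyword_fragments([12.0, 3.0, 15.0, 100.0], duration=102.0))
```

Merging nearby occurrences means the user downloads one continuous fragment instead of several heavily overlapping ones.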
Screenshots
[Figure: portal screenshots showing the searched keyword, the matching video URLs and the keyword's time position in each video.]
Questions