P2P & Cloud Conference: Keynote, March 9, 2010
Michael Papish – Keynote
Agenda
• Who is MediaUnbound?
• What is hybrid-cloud computing?
• How does MediaUnbound use hybrid-cloud computing?
• How can you mine P2P data to drive technology?
• Coming Soon…
Introduction
• Based in Cambridge, MA—the real center of the digital media world
• Regional offices for content analysis in Germany and Japan
• Founded in January 2000
• Creates media personalization/recommendation technology
Representative Clients
Corporate Info
• 9+ year track record in learning about media personalization and providing production-grade technology to customers
• Strong team and technology advisors: Pattie Maes (MIT Media Lab), Robert Berwick (MIT AI Lab), Jonathan Zittrain (Harvard Law School)
• Privately held—profitable since 2003
• Company offers solutions to digital media providers, not individual consumers
• International focus: provides services for North America, Europe, South America, and Asia
Implementations: eMusic (powered by MediaUnbound)
Hybrid-Cloud Computing
Q: What is hybrid-cloud computing?
A: Hybrid-cloud computing is a fancy name for the practice of hosting parts of your service in multiple locations
Use of hybrid cloud
• We host critical components and data in a private datacenter
• But we use the “public” cloud (e.g., Amazon EC2) for:
• Spikes in demand. Very useful when clients can drive unanticipated demand
• Development and staging environments. The public cloud is ideal for providing pre-production services to clients, since these environments can be created on demand and completely walled off from sensitive private-cloud operations
• Large data runs. Access to an elastic public cloud allows MUI to instantiate a large number of compute nodes (e.g., hundreds) for one-off, large-scale computation runs. This is much cheaper than acquiring private-cloud capacity for one-off computations.
We use a hybrid public-private cloud
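The first public-cloud use, absorbing spikes in demand, is commonly called cloud bursting. The minimal Python sketch below illustrates the idea under stated assumptions: load is a single request-rate number and private capacity is fixed. `place_load` and `Placement` are hypothetical names for illustration, not MediaUnbound's actual scheduler. Private capacity is filled first, and only the overflow goes to the public cloud.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """How much load (e.g. requests/sec) lands in each environment."""
    private: float
    public: float


def place_load(demand: float, private_capacity: float) -> Placement:
    """Hypothetical overflow ("cloud bursting") rule.

    Load is served from the private datacenter first; only the portion
    that exceeds private capacity is sent to the elastic public cloud.
    """
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    private = min(demand, private_capacity)
    public = demand - private  # overflow only; zero when demand fits
    return Placement(private=private, public=public)


if __name__ == "__main__":
    # Normal day: everything fits in the private datacenter.
    print(place_load(demand=800, private_capacity=1000))
    # Unanticipated client-driven spike: the excess bursts to the public cloud.
    print(place_load(demand=2500, private_capacity=1000))
```

Defaulting to the private datacenter matches the slide's point that critical components and data stay private; a real deployment would also have to decide which request types are allowed to leave it.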
Personalization Platform schematic
[Figure: schematic of the MediaUnbound Personalization Platform in three regions (Client Side; MediaUnbound Personalization Layer; MediaUnbound Backend Data). Labeled components include the Feedback Distiller, the Real-time Recommendation / Feedback Processor, Client Recommendation, a Data Optimizer, the Content Delivery System (VOD, Music Store, Mobile Devices), client-side Feedback Interface and Client Functions, MediaUnbound Functions and Recommender Interface, cached content-specific data, Live User Profiles (Crunched), Heavy User Profiles, music profile and other content data, Content-Specific and Content-Neutral Indexes, and the MediaUnbound Core (Automated Data Processing, Algorithm Library, Territory- and Content-Specific Optimizations). Inputs: externally gathered usage data; human-generated data; third-party reporting sources (Omniture, Rentrak VOD, Nielsen TV, DoubleClick, CDNs including Akamai); and user feedback (explicit: personalization session, negative “Don’t Like” feedback, positive user ratings; implicit: usage data, demographic data, personal library analysis; “Power User” controls for viewing and modifying profile data). Outputs: content recommendations (audio, video, text) and ad targeting. Supporting tools: a Statistics Module (profile data and service-wide statistics reporting), a Client Admin Tool with client-specific recommendation tweaking tools, the Client QA Team, and MediaUnbound QA Team / Media Analysts.]
Reasons to adopt hybrid cloud
Evaluate the following before choosing a hybrid architecture:
• Ability to predict demand and usage
• Security needs. When hosting multiple Global 500 clients, a private-cloud approach can give clients greater confidence in data security
• Utilization. Elastic public-cloud resources can be ideal for one-off usage, since they are cheaper than acquiring a large amount of private capacity that might sit idle.
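The utilization argument comes down to simple arithmetic: public capacity is billed per node-hour actually used, while owned capacity costs the same whether it runs once or constantly. The sketch below is an illustration only. `RATE` and `NODE_PRICE` are placeholder prices, not real quotes, and the model deliberately ignores power, staffing, and hardware resale to keep the comparison visible.

```python
def public_cost(nodes: int, hours: float, hourly_rate: float) -> float:
    """Pay-as-you-go: billed only for node-hours actually used."""
    return nodes * hours * hourly_rate


def private_cost(nodes: int, cost_per_node: float) -> float:
    """Owned capacity: paid up front whether or not it is used again."""
    return nodes * cost_per_node


def break_even_hours(hourly_rate: float, cost_per_node: float) -> float:
    """Hours of use per node at which buying a node matches renting one."""
    return cost_per_node / hourly_rate


if __name__ == "__main__":
    # All prices are hypothetical placeholders for illustration.
    RATE = 0.50          # $ per node-hour in the public cloud
    NODE_PRICE = 3000.0  # $ to buy and rack one private server
    nodes, hours = 200, 12  # a one-off run on a few hundred nodes

    print(f"public:  ${public_cost(nodes, hours, RATE):,.2f}")
    print(f"private: ${private_cost(nodes, NODE_PRICE):,.2f}")
    print(f"break-even: {break_even_hours(RATE, NODE_PRICE):,.0f} h per node")
```

With these placeholder numbers, a node would need about 6,000 hours of use before owning it beats renting it, which is why a single large run favors the elastic public cloud.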
How to mine P2P data
• Large amount of easily accessible, public data
• In aggregate, reveals trends
• At the individual peer level, can find usage patterns on specific content and user types
Raw P2P traffic contains valuable information
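The two levels of analysis in the bullets above (aggregate trends and individual peers) can be sketched over crawled snapshots of shared collections. The `Snapshot` schema and the function names are hypothetical. The aggregate view counts distinct peers sharing each item per month, so a peer crawled several times in one month is not double-counted. The individual view unions everything one peer has shared.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One crawl of a single peer's shared collection (hypothetical schema)."""
    peer_id: str
    taken_on: date
    items: frozenset  # e.g. normalized "artist - title" strings


def monthly_trends(snapshots):
    """Aggregate view: number of distinct peers sharing each item, per month."""
    seen = defaultdict(set)  # (year, month) -> {(peer_id, item), ...}
    for s in snapshots:
        month = (s.taken_on.year, s.taken_on.month)
        for item in s.items:
            seen[month].add((s.peer_id, item))  # dedupes re-crawled peers
    return {month: Counter(item for _, item in pairs)
            for month, pairs in seen.items()}


def peer_profile(snapshots, peer_id):
    """Individual view: every item a single peer has ever shared."""
    items = set()
    for s in snapshots:
        if s.peer_id == peer_id:
            items |= s.items
    return items


if __name__ == "__main__":
    data = [
        Snapshot("p1", date(2009, 1, 5), frozenset({"a", "b"})),
        Snapshot("p1", date(2009, 1, 20), frozenset({"a"})),  # re-crawl
        Snapshot("p2", date(2009, 1, 9), frozenset({"a", "c"})),
        Snapshot("p2", date(2009, 2, 3), frozenset({"c", "d"})),
    ]
    print(monthly_trends(data)[(2009, 1)].most_common(1))
    print(sorted(peer_profile(data, "p2")))
```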
MUI’s P2P data warehouse
In our data warehouse, we have:
• 250M+ snapshots of individual shared collections
• Spanning 10 years (since 2000) and multiple networks (the original Napster, Soulseek/slsk, Gnutella, etc.)
• Data quality changes over the period
• Early collections are more representative of individual tastes; users had yet to realize how to turn off sharing
• Later collections are pared down and more focused on current content, with more superusers
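One way to quantify how collection quality drifts over the decade is to summarize collection sizes per crawl year. In this sketch the `(year, size)` input format and the `superuser_threshold` cutoff are assumptions (the talk gives no definition of a superuser), and the toy numbers are invented for illustration, not drawn from MUI's warehouse.

```python
from collections import defaultdict
from statistics import median


def yearly_profile(snapshots, superuser_threshold=5000):
    """Summarize shared-collection sizes per crawl year.

    snapshots: iterable of (year, collection_size) pairs.
    superuser_threshold: hypothetical cutoff (items shared) above which a
    peer counts as a "superuser".
    Returns {year: (median_size, superuser_fraction)}.
    """
    sizes = defaultdict(list)
    for year, size in snapshots:
        sizes[year].append(size)
    profile = {}
    for year, values in sorted(sizes.items()):
        heavy = sum(1 for v in values if v > superuser_threshold)
        profile[year] = (median(values), heavy / len(values))
    return profile


if __name__ == "__main__":
    # Toy numbers only; not real crawl data.
    toy = [(2000, 900), (2000, 1200), (2000, 700),
           (2008, 150), (2008, 90), (2008, 8000)]
    for year, (med, frac) in yearly_profile(toy).items():
        print(year, med, f"{frac:.0%}")
```

Tracking the median and the superuser share separately matters here: a few very large collections can raise the mean even while typical collections shrink.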
Coming Soon...
• Exciting announcement
• Strong push into video (movies, TV, etc.)
• Focus on non-PC devices
• Integrating local content with cloud content via intelligent recommendations and organization tools