Cloud Computing, Big Data, & CDN Emerging Technologies

Upload: adrian-lopez

Post on 06-Jul-2018


  • 8/17/2019 Cloud Computing, Big Data, & CDN Emerging Technologies


    Cloud Introduction

    Cloud Computing


    Cloud Computing 

    What does Cloud Computing do?

    • Provides online data storage

    • Enables configuration and accessing of online applications

    • Provides a variety of software usage

    • Provides computing platform and computing infrastructure


    Cloud Computing 

    Application Example

    • Using Gmail on my smartphone to check e-mail

    • I receive an e-mail with an MS PowerPoint file attached

    • However, neither MS PowerPoint nor Windows is installed on my smartphone!

    • Google Drive’s Docs, Sheets, and Slides services can be used to open the file


    Cloud Computing 

    What is a Cloud?

    • Cloud can provide services through a public or private network or the Internet, where the service hosting system is at a remote location

    • Cloud can support various applications

    ˗ E-mail, Web Conferencing, Games, Database Management, CRM (Customer Relationship Management), etc.


    Cloud Computing 

    Cloud Models


    Cloud Models

    • Public Cloud

    ˗ Enables public systems and service access

    ˗ Open architecture (e.g., e-mail)

    ˗ Could be less secure due to openness

    • Private Cloud

    ˗ Enables service access within an organization

    ˗ Due to its private nature, it is more secure


    • Community Cloud

    ˗ Cloud accessible by a group of organizations

    • Hybrid Cloud

    ˗ Hybrid Cloud = Public Cloud + Private Cloud

    ˗ Private cloud supports critical activities

    ˗ Public cloud supports non-critical activities


    Cloud Computing 

    Cloud Service Models

    • SaaS: Software as a Service

    • PaaS: Platform as a Service

    • IaaS: Infrastructure as a Service

    Each lower service model supports the management, computing power, and security of the service model above it


    Cloud Computing 

    Software as a Service (SaaS)

    • Provides a variety of software applications as a service to end users

    Platform as a Service (PaaS)

    • Provides a platform for running applications, along with development tools, etc.

    Infrastructure as a Service (IaaS)

    • Provides the fundamental computing and security resources for the entire cloud

    • Backup storage, computing power, VMs (Virtual Machines), etc.


    Cloud Computing 

    Cloud Service Models

    • There are many other service models

    • XaaS = Anything as a Service

    ˗ NaaS: Network as a Service

    ˗ DaaS: Database as a Service

    ˗ BaaS: Business as a Service

    ˗ etc.


    Cloud Computing 

    Cloud Benefits


    Cloud Computing 

    Characteristics


    Cloud Service Models

    Cloud Computing


    IaaS 

    IaaS (Infrastructure as a Service)

    • Infrastructure support over the Internet

    • Cloud’s Computing & Storage Resources

    • Computing Power

    • Storage Services

    • Software Packages & Bundles

    • VLAN (Virtual Local Area Network)

    • VM (Virtual Machine) Features


    IaaS 

    VM (Virtual Machine) Administration

    • IaaS enables control of computing resources through administrative access to VMs

    • Server virtualization features

    ˗ Access to computing resources is enabled by administrative access to VMs

    • VM administrative command examples

    ˗ Save data on cloud server

    ˗ Start web server

    ˗ Install new application


    IaaS 

    IaaS Procedures


    IaaS 

    IaaS Benefits

    • Flexible and Efficient Renting of Computer & Server Hardware

    • Rentable Resources

    ˗ VM, Storage, Bandwidth, IP Addresses, Monitoring Services, Firewalls, etc.

    • Rent Payment Basis

    ˗ Resource type

    ˗ Usage time

    ˗ Service packages
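The rent payment basis above (resource type, usage time, service packages) can be illustrated with a minimal billing sketch. Everything here is an assumption made for illustration: the rate table, the `Rental` record, and the idea that a service package is modeled as a flat discount are hypothetical and do not describe any real provider's pricing.

```python
from dataclasses import dataclass

# Hypothetical hourly rates per resource type (illustrative values only).
HOURLY_RATES = {"vm": 0.10, "storage_gb": 0.0001, "ip_address": 0.005}


@dataclass
class Rental:
    resource_type: str  # key into HOURLY_RATES
    quantity: float     # e.g. number of VMs or gigabytes stored
    hours: float        # usage time


def rental_cost(rental: Rental) -> float:
    """Cost of one rented resource: rate x quantity x usage time."""
    return HOURLY_RATES[rental.resource_type] * rental.quantity * rental.hours


def total_cost(rentals: list[Rental], package_discount: float = 0.0) -> float:
    """Sum all rentals, then apply an assumed package discount in [0, 1]."""
    subtotal = sum(rental_cost(r) for r in rentals)
    return subtotal * (1.0 - package_discount)


if __name__ == "__main__":
    bill = [Rental("vm", 2, 24), Rental("storage_gb", 500, 24)]
    print(f"One day of usage: {total_cost(bill):.2f}")
```

The point of the sketch is only that the bill scales with what is rented and for how long, rather than with owned hardware.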


    IaaS 

    IaaS Benefits

    • Portability & Interoperability with Legacy Applications

    ˗ Enables portability based on infrastructure resources that are used through Internet connections

    ˗ Enables a method to maintain interoperability with legacy applications and workloads between IaaS clouds


    PaaS 

    PaaS (Platform as a Service)

    • Provides development & deployment tools for application development

    • Provides a runtime environment for apps


    PaaS Types

    [Figure: four PaaS types offered as Cloud Services: Stand-Alone Development Environment, Application Delivery-Only Environment, Open Platform as a Service, Add-On Development Facilities]


    PaaS 

    PaaS Types

    • Application Delivery-Only Environment

    ˗ Provides on-demand scaling & application security

    • Stand-Alone Development Environment

    ˗ Provides an independent platform for a specific function

    • Open Platform as a Service

    ˗ Provides open source software to run applications for PaaS providers

    • Add-On Development Facilities

    ˗ Enables customization of existing SaaS platforms


    PaaS 

    PaaS Benefits


    PaaS 

    Benefits

    • Lower Administrative Overhead

    ˗ User does not need to be involved in any administration of the platform

    • Lower Total Cost of Ownership

    ˗ User does not need to purchase any hardware, memory, or server


    PaaS 

    • Scalable Solutions

    ˗ Resources are scaled automatically based on application resource demand

    • More Current System Software

    ˗ The cloud provider is responsible for software upgrades & patch installations
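Demand-based automatic scaling can be sketched as a simple sizing rule. This is a minimal illustration under stated assumptions, not how any particular PaaS implements it: the function name, the notion of a fixed capacity per instance, and the minimum and maximum limits are all hypothetical.

```python
import math


def desired_instances(current_load: float, capacity_per_instance: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Number of instances needed for current_load, clamped to [min, max].

    current_load and capacity_per_instance share one unit,
    for example requests per second.
    """
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (0, 250, 5000):
        print(load, "->", desired_instances(load, capacity_per_instance=100))
```

A platform would re-evaluate a rule of this kind periodically, adding instances when demand rises and releasing them when it falls.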


    SaaS 

    SaaS (Software as a Service)

    • Provides software applications as a service to the user

    • Software is deployed on a cloud server that is accessible through the Internet


    SaaS 

    Characteristics

    • On Demand Availability

    ˗ Cloud software is available anywhere that the cloud is reachable via the Internet

    • Easy Maintenance

    ˗ No user software upgrade or maintenance needed; all supported by the cloud

    • Flexible Scale Up or Scale Down

    • Centralized Management & Data


    SaaS 

    Characteristics

    • Enables a Shared Data Model

    ˗ Multiple users can share a single data model and database

    • Cost Effectiveness

    ˗ Pay based on usage

    ˗ No risk in buying the wrong software

    • Multitenant Programming Solutions

    ˗ Multiple programmers are guaranteed to use the same software version, so there are no version mismatch problems


    Software-as-a-service

    Open SaaS

    Applications


    Cloud Services

    Cloud Computing


    Cloud Services

    Google Cloud

    • Google App Engine

    ˗ Released as a preview in April 2008

    ˗ PaaS (Platform as a Service) for web applications

    ˗ Provides automatic scaling based on resource demands and server load

    • Google Cloud Storage

    ˗ Launched in May 2010

    ˗ Online file storage service


    Cloud Services

    Google Cloud

    • Google BigQuery

    ˗ Released in April 2012

    ˗ Data analysis tool that uses SQL-like queries to process big datasets in seconds

    • Google Compute Engine

    ˗ Released in June 2012

    ˗ IaaS (Infrastructure as a Service) support to enable on-demand launching of VMs (Virtual Machines)


    Cloud Services

    Google Cloud

    • Google Cloud Endpoints

    ˗ Released in November 2013

    ˗ Tool to create services inside App Engine

    ˗ Easily connects from Android, iOS, and JavaScript clients

    • Google Cloud DNS (Domain Name System)

    ˗ DNS service supported by the Google Cloud


    Cloud Services

    • Google Cloud Datastore

    ˗ NoSQL (non-relational, often read as "Not only SQL") data storage

    • Google Cloud SQL (Structured Query Language)

    ˗ Released in February 2014 as GA (General Availability)

    ˗ Fully managed MySQL database


    Cloud Services

    Amazon S3 (Simple Storage Service)

    • Online file storage web service offered by Amazon Web Services

    • Public web service released in the United States in March 2006 and in Europe in November 2007

    • Provides storage through web services interfaces (REST, SOAP, and BitTorrent)


    Cloud Services

    Amazon Cloud Drive

    • Amazon Cloud Drive was released in March 2011

    • Web storage application from Amazon

    • Storage Space Characteristics

    ˗ Can be accessed from up to eight specific devices (e.g., mobile devices & different computers) and by using different browsers on the same computer


    Cloud Services

    Amazon Cloud Drive

    • Cloud Player (Originally bundled)

    ˗ Users can play music in their Cloud Drive from any computer or Android device

    ˗ Music browsing based on song titles, albums, artists, genres (website only), and playlists


    Cloud Services

    Amazon Cloud Drive Options

    • Unlimited Photos

    ˗ Unlimited storage for photos & raw data files

    ˗ 5 gigabytes of video storage

    • Unlimited Everything

    ˗ Unlimited storage for photos, videos, documents, and various file types


    Cloud Services

    iCloud

    • Developed by Apple, Inc.

    • Public release in October 2011

    • Cloud Storage & Cloud Computing

    • Operating system

    ˗ OS X (10.7 Lion or later)

    ˗ Microsoft Windows 7 or later 

    ˗ iOS 5 or later 


    Cloud Services

    iCloud replaces MobileMe

    • MobileMe was a subscription-based collection of Apple’s online services and software

    • MobileMe was replaced by iCloud

    • MobileMe ceased services in June 2012

    • MobileMe users were allowed to transfer to iCloud until July 2012


    Cloud Services

    iCloud Features

    • Email, Contacts, and Calendars

    • Find My Friends

    • Backup & Restore

    ˗ Backup feature for device settings & data

    ˗ iOS 5 or later required

    • Find My iPhone

    ˗ Enables a user to track the location of an iOS device or Mac

    ˗ Formerly a feature of MobileMe


    Cloud Services

    ˗ Can manage lost or stolen Apple devices

    • Back to My Mac

    ˗ Enables remote login to other computers that have Back to My Mac installed (using the same Apple ID)

    • iWork for iCloud

    ˗ Apple's iWork suite (Pages, Numbers, and Keynote) made available through a web interface


    Cloud Services

    iCloud Features

    • Photo Stream

    ˗ Can store most recent 1,000 photos

    ˗ Free storage for up to 30 days

    • iCloud Photo Library

    ˗ Stores all photos at original resolution

    ˗ Stores photo metadata

    • Storage (Introduced in 2011)

    ˗ 5 GB of free storage per account


    Cloud Services

    • iCloud Drive

    ˗ Can save photos, videos, documents, and apps

    • iCloud Keychain

    ˗ Secure database for website and Wi-Fi passwords

    ˗ Secure credit card & debit card management for quick access and auto-fill


    Cloud Services

    • iTunes Match

    ˗ Scans the iTunes music library and matches tracks

    ˗ Serves tracks copied from CDs or other sources


    Big Data Examples

    Big Data


    Big Data

    New FLU Virus Starts in the U.S.!

    • H1N1 flu virus (which has combined virus elements of the bird and swine (pig) flu) started to spread in the U.S. in 2009

    • U.S. CDC (Centers for Disease Control and Prevention) was only collecting diagnostic data from medical doctors once a week

    • Using the CDC information to find how the flu was spreading would have an approximately 2-week lag, which is far too slow compared to the speed at which the virus was spreading


    Big Data

    New FLU Virus Starts in the U.S.!

    • What vaccine was needed?

    • How much vaccine was needed?

    • Where was the vaccine needed?

    • Vaccine preparation and delivery plans could not be set up fast enough to safely prevent the virus from spreading out of control


    Big Data

    • Fortunately, Google published a paper about how they could predict the spread of the winter flu in the U.S. accurately down to specific regions and states

    • This paper was published in the journal Nature a few weeks before the H1N1 virus made the headline news


    Big Data

    • Millions of the most common search terms and millions of different mathematical models were tested on Google’s database

    ˗ Google receives more than 3 billion search queries a day

    • Analysis system was set to look for correlation between the frequency of certain search queries and the spread of the flu over time and space
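The idea of looking for correlation between query frequency and flu spread can be sketched with a Pearson correlation coefficient. This is a simplified illustration, not Google's actual method: the weekly numbers, the query strings, and the function names are all made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_queries(query_counts, flu_cases):
    """Order search terms by how strongly their weekly counts track flu cases."""
    return sorted(query_counts,
                  key=lambda q: pearson(query_counts[q], flu_cases),
                  reverse=True)


# Made-up weekly data for one region (illustrative only).
FLU_CASES = [10, 20, 30, 40, 50]
QUERY_COUNTS = {
    "flu symptoms": [100, 210, 290, 405, 500],
    "football scores": [300, 280, 310, 290, 305],
}

if __name__ == "__main__":
    for query in rank_queries(QUERY_COUNTS, FLU_CASES):
        print(query, round(pearson(QUERY_COUNTS[query], FLU_CASES), 3))
```

Terms whose frequency rises and falls with reported cases score close to 1, while unrelated terms score much lower. A real system would repeat this over millions of terms and many regions.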


    Big Data

    • Google’s method of analysis did not use data provided from hospitals or Medical Doctors

    • Google used Big Data analysis on the most common search terms people use

    • Google’s system proved to be more accurate and faster than analyzing government statistics


    Big Data

    Wal-Mart

    • Wal-Mart’s Data Warehouse

    ˗ Stores 4 petabytes (4 × 10^15 bytes) of data

    • Records every single purchase

    ˗ Approximately 267 million transactions a day from 6,000 stores worldwide are recorded


    Big Data

    • Wal-Mart’s Data Analysis

    ˗ Focused on evaluating the effectiveness of pricing strategies and advertising campaigns

    ˗ Seeking ways to improve inventory management and supply chains


    Big Data

    Recommendation System using Big Data

    • Based on data analysis of simple elements

    ˗ What users purchased in the past

    ˗ Which items they have in their virtual shopping cart

    ˗ Which items customers rated and liked

    ˗ What influence the ratings had on other customers' purchases


    Big Data

    Amazon.com

    • Amazon.com’s Recommendation System

    ˗ Item-to-Item Collaborative Filtering Algorithm

    • Personalization of the online store, customized to each customer

    ˗ Each customer’s store is based on the customer’s personal interests

    ˗ Example: For a new mother, the store will display baby supplies and toys
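The general idea behind item-to-item collaborative filtering can be sketched as follows: two items are similar when the same customers bought both, and recommendations are the items most similar to what a customer already has. This is a simplified illustration that assumes cosine similarity over sets of buyers; the purchase data and function names are hypothetical, and it is not a description of Amazon's production system.

```python
import math
from collections import defaultdict

# Hypothetical purchase history: customer -> set of purchased items.
PURCHASES = {
    "alice": {"diapers", "baby bottle", "toy rattle"},
    "bob": {"diapers", "baby bottle"},
    "carol": {"laptop", "mouse"},
    "dave": {"diapers", "toy rattle", "mouse"},
}


def item_similarities(purchases):
    """Cosine similarity between items, based on which customers bought them."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            common = len(buyers[a] & buyers[b])
            if common:
                sims[a][b] = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(item, sims, top_n=2):
    """Items most similar to the given one (ties broken alphabetically)."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    sims = item_similarities(PURCHASES)
    print(recommend("diapers", sims))
```

Because the similarity table depends only on items, it can be computed ahead of time and then looked up quickly when a customer visits the store.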


    Big Data

    Citibank

    • Bank operations in 100 countries

    • Big Data analysis on the database of basic financial transactions can enable global insight on investments, market changes, trade patterns, and economic conditions

    • Many companies (e.g., Zara, H&M, etc.) work with Citibank to locate new stores and factories


    Big Data

    Product Development & Sales

    • For example, a smartphone takes significant time and money to manufacture

    • In addition, the duration of popularity for a new smartphone is limited

    • To maximize sales, a company needs to manufacture just the right amount of products and sell them in the right locations


    Big Data

    Product Development & Sales

    • Too much will result in leftovers and a big waste for the company!

    • Too little will result in a lost opportunity for company profit and growth!

    • Big Data analysis can help find how many smartphones could be sold, and where the products could be popular, based on common search terms that people use

    ˗ Use this to also estimate how many products could be sold in a certain location

    ˗ But why is this difficult?


    Big Data's 4 Vs

    Big Data


    Big Data

    Big Data’s 4 V Big Challenges

    • Volume – Data Size

    • Variety – Data Formats

    • Velocity – Data Streaming Speeds

    • Veracity – Data Trustworthiness


    Big Data

    Volume – Data Size

    • 40 Zettabytes (10^21) of data is predicted to be created by 2020

    • 2.5 Quintillion bytes (10^18) of data are created every day

    • 6 Billion (10^9) people have mobile phones

    • 100 Terabytes (10^12) of data (at least) is stored by most U.S. companies

    • 966 Petabytes (10^15) was the approximate storage size of the American manufacturing industry in 2009
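Because the figures above mix several decimal prefixes, a small conversion helper makes them easier to compare. This is a minimal sketch; the function names are chosen here for illustration, and the prefixes are the standard decimal (SI) ones.

```python
# Decimal (SI) prefixes used on the slide, as powers of ten.
PREFIXES = {
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
}


def to_bytes(value, prefix):
    """Convert a prefixed quantity to plain bytes."""
    return value * PREFIXES[prefix]


def convert(value, from_prefix, to_prefix):
    """Convert between two prefixes, e.g. petabytes to exabytes."""
    return to_bytes(value, from_prefix) / PREFIXES[to_prefix]


if __name__ == "__main__":
    # 2.5 quintillion bytes per day, written as an integer, is 2.5 exabytes.
    print(25 * 10**17 / PREFIXES["exa"], "EB per day")
    print(convert(966, "peta", "exa"), "EB (manufacturing, 2009)")
    print(convert(40, "zetta", "exa"), "EB (predicted by 2020)")
```

Expressed in a common unit, the daily volume, the 2009 manufacturing figure, and the 2020 prediction can be compared directly.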


    Big Data

    Variety – Data Formats

    • 150 Exabytes (10^18) was the estimated size of data for health care throughout the world in 2011

    • More than 4 Billion (10^9) hours each month are spent watching YouTube

    • 30 Billion pieces of content are exchanged every month on Facebook

    • 200 Million monthly active users exchange 400 Million tweets every day


    Big Data

    Velocity – Data Streaming Speeds

    • 1 Terabyte (10^12) of trade information is exchanged during every trading session at the New York Stock Exchange

    • 100 sensors (approximately) are installed in modern cars to monitor fuel level, tire pressure, etc.

    • 18.9 Billion network connections are predicted to exist by 2016


    Big Data

    Veracity – Data Trustworthiness

    • 1 out of 3 business leaders has experienced trust issues with their data when trying to make a business decision

    • $3.1 Trillion (10^12) a year is estimated to be wasted in the U.S. economy due to poor data quality


    Big Data

    New technology is needed to overcome these 4 V Big Data Challenges

    • Volume – Data Size

    • Variety – Data Formats

    • Velocity – Data Streaming Speeds

    • Veracity – Data Trustworthiness


    HADOOP

    Big Data

    Data Storage, Access, and Analysis

    • Hard drive storage capacity has tremendously increased
    • But the data read and write speeds to and from the hard drives have not significantly improved yet
    • Simultaneous parallel read and write of data with multiple hard disks requires advanced technology

    Data Storage, Access, and Analysis

    • Challenge 1: Hardware Failure
    ˗ When using many computers for data storage and analysis, the probability that at least one computer will fail is very high
    • Challenge 2: Cost
    ˗ To avoid losing data or computed analysis results, backup computers and memory are needed, which improves reliability but is very expensive
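The hardware-failure challenge can be made concrete with a little arithmetic. The sketch below assumes that machines fail independently and that each one has the same failure probability over the period of interest; both the function name and the 0.1% figure are illustrative assumptions, not values from the slides.

```python
def prob_at_least_one_failure(n_machines: int, p_fail: float) -> float:
    """Probability that at least one of n independent machines fails.

    p_fail is the assumed chance that a single machine fails during
    the period of interest.
    """
    return 1.0 - (1.0 - p_fail) ** n_machines


if __name__ == "__main__":
    # Even with a small per-machine failure chance, large clusters
    # are very likely to see at least one failure.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_at_least_one_failure(n, 0.001), 4))
```

Under these assumptions the chance of seeing some failure grows quickly with cluster size, which is why the storage layer has to tolerate failures rather than hope to avoid them.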

    Data Storage, Access, and Analysis

    • Challenge 3: Combining Analyzed Data
    ˗ Combining the analyzed data is very difficult
    ˗ If one part of the analyzed data is not ready, then the overall combining process has to be delayed
    ˗ If one part has errors in its analysis, then the overall combined result may be unreliable and useless

    Hadoop

    • Hadoop is a Reliable Shared Storage and Analysis System
    • Hadoop = HDFS + MapReduce + α
    ˗ HDFS provides Data Storage
    ˗ HDFS: Hadoop Distributed FileSystem
    ˗ MapReduce provides Data Analysis
    ˗ MapReduce = Map Function + Reduce Function

    HDFS: Hadoop Distributed FileSystem

    • DFS (Distributed FileSystem) is designed for storage management of a network of computers
    • HDFS is optimized to store huge files with streaming data access patterns
    • HDFS is designed to run on clusters of general computers

    HDFS: Hadoop Distributed FileSystem

    • HDFS was designed to be optimal in performance for a WORM (Write Once, Read Many times) pattern, which is a very efficient data processing pattern
    • HDFS was designed considering the time to read the whole dataset to be more important than the time required to read the first record

    HDFS

    • HDFS clusters use 2 types of nodes
    ˗ Namenode (master node)
    ˗ Datanode (worker node)

    HDFS: Namenode

    • Manages the filesystem namespace
    • Maintains the filesystem tree and the metadata for all the files and directories in the tree
    • Stores this information on the local disk using 2 file forms
    ˗ Namespace Image
    ˗ Edit Log

    HDFS: Datanodes

    • Workhorse of the filesystem
    • Store and retrieve blocks when requested by the client or the namenode
    • Report back to the namenode periodically with lists of blocks that were stored

    MapReduce

    • MapReduce is a programming model that abstracts the analysis problem from the stored data
    • MapReduce transforms the analysis problem into a computation process that uses a set of keys and values
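To illustrate how an analysis problem becomes a computation over keys and values, here is a minimal single-process sketch in Python. It is not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and word counting is chosen only because it is a small, familiar example.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: sum all counts that were emitted for one word."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Run map, group the intermediate values by key, then run reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


if __name__ == "__main__":
    lines = [(0, "big data"), (1, "Big cloud")]
    print(run_job(lines, map_fn, reduce_fn))
```

The person writing the job only supplies the two functions; grouping the intermediate pairs by key is the framework's responsibility, which the `run_job` helper imitates here.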

    MapReduce System Architecture

    • MapReduce was designed for tasks that consume several minutes or hours on a set of dedicated trusted computers connected with a broadband high-speed network managed by a single master data center

    MapReduce Characteristics

    • MapReduce uses a somewhat brute-force data analysis approach
    • The entire dataset (or a big part of the dataset) is processed for every query
    ˗ Batch Query Processor model

    MapReduce Characteristics

    • MapReduce enables the ability to run an ad hoc query against the whole dataset within a reasonable time
    • Many distributed systems combine data from multiple sources (which is very difficult), but MapReduce does this in a very effective and efficient way

    Technical Terms used in MapReduce

    • Seek Time is the delay in moving the disk head to the place on the disk where data is read or written
    • Transfer Rate is the speed at which data is moved to or from the disk
    • Transfer Rate has improved significantly more (i.e., now has much faster transfer speeds) compared to improvements in Seek Time (i.e., still relatively slow)

    MapReduce

    • MapReduce gains performance enhancement through optimal balancing of Seeking and Transfer operations
    ˗ Reduce Seek operations
    ˗ Effectively use Transfer operations
    • In the next lecture, we will compare MapReduce with a traditional RDBMS (Relational Database Management System)
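The balance between seek and transfer operations can be estimated with simple arithmetic. The sketch below assumes a 10 ms seek time and a 100 MB/s transfer rate; these figures, and the function name, are illustrative assumptions rather than measurements from the slides.

```python
def read_time_seconds(total_bytes, chunk_bytes,
                      seek_s=0.010, rate_bytes_per_s=100e6):
    """Time to read total_bytes when every chunk costs one seek
    plus the time to transfer its bytes."""
    n_chunks = -(-total_bytes // chunk_bytes)  # ceiling division
    return n_chunks * seek_s + total_bytes / rate_bytes_per_s


if __name__ == "__main__":
    one_gb = 10**9
    # Many small reads: the seeks dominate the total time.
    print(round(read_time_seconds(one_gb, 4096), 2))
    # A few large reads: the transfer dominates the total time.
    print(round(read_time_seconds(one_gb, 64 * 2**20), 2))
```

With these assumed numbers, reading the same gigabyte in 4 KB pieces is dominated by seeking, while reading it in 64 MB pieces takes roughly the raw transfer time. This is the reason for reducing seek operations and streaming large chunks.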


    MapReduce vs. RDBMS

    Big Data

    MapReduce vs. RDBMS

    • RDBMS (Relational Database Management System) Characteristics
    ˗ RDBMS is good for updating a small proportion of a big database
    ˗ RDBMS uses a traditional B-Tree, which is highly dependent on the time required to perform seek operations

    MapReduce vs. RDBMS

    • MapReduce Characteristics
    ˗ MapReduce is good for updating all (or a majority) of a big database
    ˗ MapReduce uses Sort and Merge to rebuild the database, which depends more on transfer operations
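A rough cost model shows why seek-bound updates suit small changes while a sort-and-merge rebuild suits large ones. The sketch assumes one seek per updated record, a 10 ms seek time, a 100 MB/s transfer rate, and a rebuild that reads and writes the whole database once; all of these are simplifying assumptions made for illustration.

```python
def seek_update_seconds(n_records, fraction, seek_s=0.010):
    """Seek-bound update: one disk seek per updated record (assumed)."""
    return n_records * fraction * seek_s


def rewrite_seconds(n_records, record_bytes, rate_bytes_per_s=100e6):
    """Transfer-bound rebuild: stream the whole database in and out once."""
    return 2 * n_records * record_bytes / rate_bytes_per_s


if __name__ == "__main__":
    n, size = 10**9, 100  # a billion 100-byte records (about 100 GB)
    for fraction in (1e-6, 1e-4, 1e-2):
        print(fraction,
              round(seek_update_seconds(n, fraction)),
              round(rewrite_seconds(n, size)))
```

In this model the rebuild cost is constant, while the seek-bound cost grows with the fraction of records touched, so beyond a small fraction the full rewrite becomes the cheaper option.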

    MapReduce vs. RDBMS

    • RDBMS is good for applications that require the datasets of the database to be very frequently updated (e.g., point queries or small dataset updates)
    • MapReduce is better for WORM (Write Once and Read Many times) based data applications
    • MapReduce is a complementary system to RDBMS

    MapReduce vs. RDBMS

                      RDBMS                      MapReduce
    Data Size         Gigabytes (10^9 bytes)     Petabytes (10^15 bytes)
    Access            Interactive & Batch        Batch
    Updates           Read & Write Many Times    WORM (Write Once, Read Many Times)
    Data Structure    Static Schema              Dynamic Schema
    Integrity         High                       Low
    Scalability       Nonlinear                  Linear

    MapReduce vs. RDBMS: Data Types

    • Structured Data: Data that has a formal defined structure (e.g., XML documents or database tables)
    • Semi-Structured Data: Data that has a looser format where the data structure is used as a guide and may be ignored
    • Unstructured Data: Data that does not have any formal structure (e.g., plain text or image data)

    MapReduce vs. RDBMS: Data Types

    • MapReduce is very effective on unstructured and semi-structured data
    • Why?
    ˗ MapReduce interprets data during the data processing sessions
    ˗ MapReduce does not use intrinsic properties of the data as input keys or input values. The parameters used are selected by the person analyzing the data

    MapReduce vs. RDBMS: Scalability

    • MapReduce has a programming model that is linearly scalable
    • MapReduce Functions: 2 types
    ˗ Map function
    ˗ Reduce function
    • Both of these functions define a Key-Value pair mapping relation (e.g., Key-Value pair 1 is mapped to Key-Value pair 2)

    Hadoop Release Series

    Feature                          1.x                                  0.22    2.x
    Secure authentication            Yes                                  No      Yes
    Old configuration names          Yes
    New configuration names          No                                   Yes     Yes
    Old MapReduce API                Yes                                  Yes     Yes
    New MapReduce API                Yes (with some missing libraries)    Yes     Yes
    MapReduce 1 runtime (Classic)    Yes                                  Yes     No
    MapReduce 2 runtime (YARN)       No                                   No      Yes
    HDFS Federation                  No                                   No      Yes
    HDFS High-Availability           No                                   No      Yes

    Release 2.6.0 became available Nov. 2014

    Hadoop Release Series

    • 2.x includes several major new features
    • MapReduce 2 is the new MapReduce runtime implemented on a new system called YARN
    • YARN
    ˗ Yet Another Resource Negotiator
    ˗ General resource management system for running distributed applications

    Hadoop Release Series

    • HDFS Federation partitions the HDFS namespace across multiple namenodes
    ˗ Enables improved support for clusters with very large numbers of files
    • HDFS High-Availability feature uses standby namenodes for backup, and therefore, the namenode is no longer a potential SPOF (Single Point of Failure)


    MapReduce

    Big Data

    Hadoop

    • Hadoop is a Reliable Shared Storage and Analysis System
    • Hadoop = HDFS + MapReduce + α
    ˗ HDFS provides Data Storage
    ˗ HDFS: Hadoop Distributed FileSystem
    ˗ MapReduce provides Data Analysis
    ˗ MapReduce = Map Function + Reduce Function

    Scaling Out

    • Scaling out is done by the DFS (Distributed FileSystem), where the data is divided and stored in distributed computers & servers
    • Hadoop uses HDFS to move the MapReduce computation to several distributed computing machines, each of which processes its assigned part of the divided data

    Jobs

    • A MapReduce job is a unit of work that needs to be executed
    • A job consists of the input data, the MapReduce program, configuration information, etc.
    • A job is executed by dividing it into tasks, of which there are two types
    ˗ Map Task
    ˗ Reduce Task

    Node types for Job execution

    • Job execution is controlled by 2 types of nodes
    ˗ Jobtracker
    ˗ Tasktracker
    • Jobtracker coordinates all jobs
    • Jobtracker schedules all tasks and assigns the tasks to tasktrackers

    • Tasktracker will execute its assigned task
    • Tasktracker will send progress reports to the Jobtracker
    • Jobtracker will keep a record of the progress of all jobs executed

    Data flow

    • Hadoop divides the input into input splits (or splits) suitable for the MapReduce job
    • Each split has a fixed size
    • Split size is commonly matched to the size of an HDFS block (64 MB) for maximum processing efficiency
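The way an input is cut into fixed-size splits can be sketched in a few lines. This is a simplified illustration, assuming splits are plain byte ranges of 64 MB; the function name is hypothetical, and real input formats also take record boundaries into account.

```python
BLOCK_BYTES = 64 * 1024 * 1024  # 64 MB, the block size used in the slides


def input_splits(file_bytes, split_bytes=BLOCK_BYTES):
    """Return (offset, length) pairs covering the file.

    Every split has the fixed size except possibly the last one.
    One map task would be created per returned split.
    """
    return [(offset, min(split_bytes, file_bytes - offset))
            for offset in range(0, file_bytes, split_bytes)]


if __name__ == "__main__":
    splits = input_splits(200 * 1024 * 1024)  # a 200 MB input file
    print(len(splits), "map tasks")
    print(splits[-1])
```

A 200 MB file therefore yields four splits, three of 64 MB and a final one of 8 MB.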

    Data flow

    • Map Task is created for each split
    • Map Task executes the map function for all records within the split
    • Hadoop commonly executes the Map Task on the node where the input data resides

    Data flow

    • Data-Local Map Task
    ˗ Data locality optimization: the map task runs on the node that stores its input, so it does not need to use the cluster network
    ˗ Data-local flow process shows why the Optimal Split Size = 64 MB HDFS Block Size

    Data flow

    • Rack-Local Map Task
    ˗ A node hosting the HDFS block replicas for a map task’s input split could be running other map tasks
    ˗ Job Scheduler will look for a free map slot on a node in the same rack as one of the blocks

    [Figure legend: Map Task, HDFS Block, Node, Rack, Data Center]

    Data flow

    • Off-Rack Map Task
    ˗ Needed when the Job Scheduler cannot perform data-local or rack-local map tasks
    ˗ Uses inter-rack network transfer
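The three placement cases (data-local, rack-local, off-rack) can be expressed as a simple preference order. The sketch below is a toy model, not the actual Hadoop scheduler: the node names, the rack map, and both function names are hypothetical.

```python
# Hypothetical cluster layout: node name -> rack name.
EXAMPLE_RACKS = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}

PREFERENCE = {"data-local": 0, "rack-local": 1, "off-rack": 2}


def locality(task_node, replica_nodes, rack_of):
    """Classify running a map task on task_node for the given block replicas."""
    if task_node in replica_nodes:
        return "data-local"
    replica_racks = {rack_of[node] for node in replica_nodes}
    if rack_of[task_node] in replica_racks:
        return "rack-local"
    return "off-rack"


def pick_node(free_nodes, replica_nodes, rack_of):
    """Choose the free node with the best locality for the input split."""
    return min(free_nodes,
               key=lambda node: PREFERENCE[locality(node, replica_nodes, rack_of)])


if __name__ == "__main__":
    replicas = {"n1"}
    print(pick_node(["n3", "n2"], replicas, EXAMPLE_RACKS))  # same rack as n1
    print(locality("n4", replicas, EXAMPLE_RACKS))
```

The scheduler in this model falls back to a rack-local node only when no node holding the data is free, and to an off-rack node only when neither is available.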

    Map

    • Map task will write its output to the local disk
    • Map task output is not the final output, it is only the intermediate output

    Reduce

    • Map task output is processed by Reduce Tasks to produce the final output
    • Reduce Task output is stored in HDFS
    • For a completed job, the Map Task output can be discarded

    Single Reduce Task

    • Node includes Split, Map, Sort, and Output unit
    • Light blue arrows show data transfers in a node
    • Black arrows show data transfers between nodes

    Single Reduce Task

    • Number of reduce tasks is specified independently, and is not based on the size of the input

    Combiner Function

    • User-specified function that runs on the Map output
    ˗ Its output forms the input to the Reduce function
    • Specifically designed to minimize the data transferred between Map Tasks and Reduce Tasks
    • Addresses the problem of limited network speed on the cluster and helps to reduce the time in completing MapReduce jobs
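A small sketch can show how a combiner shrinks the data sent from map tasks to reduce tasks without changing the final answer. It assumes a counting job; the function names are hypothetical. Note that this only works for operations such as sum or max that give the same result when applied in stages; an average, for example, could not be combined this way directly.

```python
from collections import defaultdict


def combine(map_output):
    """Combiner: sum the counts per key on the map side, before any transfer."""
    totals = defaultdict(int)
    for key, count in map_output:
        totals[key] += count
    return sorted(totals.items())


def reduce_counts(all_map_outputs):
    """Reducer: sum per key across every map task's (possibly combined) output."""
    totals = defaultdict(int)
    for output in all_map_outputs:
        for key, count in output:
            totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    raw = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
    combined = [combine(output) for output in raw]
    # Fewer pairs cross the network, but the reduced result is unchanged.
    print(sum(len(o) for o in raw), "pairs without combiner")
    print(sum(len(o) for o in combined), "pairs with combiner")
    print(reduce_counts(raw) == reduce_counts(combined))
```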

    Multiple Reducers

    • Map tasks partition their output, each creating one partition for each reduce task
    • Each partition may use many keys and key associated values
    • All records for a key are kept in a single partition
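One common way to guarantee that all records for a key land in a single partition is to hash the key and take the result modulo the number of reduce tasks. The sketch below uses `zlib.crc32` only because it is deterministic across Python runs; the function names are hypothetical and this is an illustration of the principle, not Hadoop's own code.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Map a key to a reduce task index in the range [0, num_reducers)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_output(records, num_reducers):
    """Split one map task's (key, value) output into one list per reduce task."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions


if __name__ == "__main__":
    output = [("big", 1), ("data", 1), ("big", 1), ("cloud", 1)]
    for index, part in enumerate(partition_output(output, 2)):
        print(index, part)
```

Because the partition index depends only on the key, every map task sends a given key to the same reduce task, while one partition may still hold many different keys.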

    Multiple Reducers

    • Shuffle process is used in the data flow between the Map tasks and Reduce tasks

    Zero Reducer

    • Zero reducer uses no shuffle process
    • Applied when all of the processing can be carried out in parallel Map tasks


    HDFS

    Big Data

    Hadoop

    • Hadoop is a Reliable Shared Storage and Analysis System
    • Hadoop = HDFS + MapReduce + α
    ˗ HDFS provides Data Storage
    ˗ HDFS: Hadoop Distributed FileSystem
    ˗ MapReduce provides Data Analysis
    ˗ MapReduce = Map Function + Reduce Function

    HDFS: Hadoop Distributed FileSystem

    • DFS (Distributed FileSystem) is designed for storage management of a network of computers
    • HDFS is optimized to store large, terabyte-size files with streaming data access patterns

    HDFS: Hadoop Distributed FileSystem

    • HDFS was designed to be optimal in performance for a WORM (Write Once, Read Many times) pattern
    • HDFS is designed to run on clusters of general computers & servers from multiple vendors

    HDFS Characteristics

    • HDFS is optimized for large scale and high throughput data processing
    • HDFS does not perform well in supporting applications that require minimum delay (e.g., tens of milliseconds range)

    Blocks

    • Files in HDFS are divided into block-size chunks
    ˗ 64 Megabyte default block size
    • A block is the minimum amount of data that HDFS can read or write
    • Blocks simplify the storage and replication process
    ˗ Provides fault tolerance & processing speed enhancement for larger files
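To show why block-level replication gives fault tolerance, the sketch below places each block on several datanodes and then checks whether a file remains readable after some nodes fail. The round-robin placement and the replication factor of 3 are assumptions made for illustration; HDFS's real placement policy also considers racks, and the function names are hypothetical.

```python
def place_blocks(n_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct datanodes, round-robin.

    Assumes replication <= len(datanodes).
    """
    return {
        block: [datanodes[(block + i) % len(datanodes)]
                for i in range(replication)]
        for block in range(n_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """A file is readable if every block still has at least one live replica."""
    return all(any(node not in failed_nodes for node in nodes)
               for nodes in placement.values())


if __name__ == "__main__":
    placement = place_blocks(4, ["d1", "d2", "d3", "d4"])
    print(placement[0])
    print(readable_after_failure(placement, {"d1"}))
    print(readable_after_failure(placement, {"d1", "d2", "d3"}))
```

Because each block is an independent unit, losing one datanode only removes one replica of the blocks it held, and the file can still be reassembled from the remaining copies.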

    HDFS

    • HDFS clusters use 2 types of nodes
    ˗ Namenode (master node)
    ˗ Datanode (worker node)

    Namenode

    • Manages the filesystem namespace
    ˗ Namenode keeps track of the datanodes that have blocks of a distributed file assigned
    • Maintains the filesystem tree and the metadata for all the files and directories in the tree
    • Stores this information on the local disk using 2 file forms
    ˗ Namespace Image
    ˗ Edit Log

    Namenode

    • Namenode holds the filesystem metadata in its memory
    • Namenode’s memory size determines the limit to the number of files in a filesystem
    • But then, what is Metadata?
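The link between namenode memory and the number of files can be estimated with back-of-the-envelope arithmetic. The sketch assumes roughly 150 bytes of namenode memory per file, directory, and block, a rule of thumb that is often quoted for HDFS; treat the constant as an assumption, not a measured value, and the function name as hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb cost per file, directory, or block


def namenode_memory_bytes(n_files, blocks_per_file=1, n_dirs=0):
    """Estimate the namenode memory needed to hold the namespace metadata."""
    n_objects = n_files + n_files * blocks_per_file + n_dirs
    return n_objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    # One million single-block files.
    print(namenode_memory_bytes(1_000_000) / 1e6, "MB")
    # One billion single-block files.
    print(namenode_memory_bytes(1_000_000_000) / 1e9, "GB")
```

Under this assumption, many small files are costly: the metadata grows with the number of files and blocks, not with the number of bytes stored.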

    Metadata

    • Traditional concept of the library card catalogs
    • Categorizes and describes the contents and context of the data files
    • Maximizes the usefulness of the original data file by making it easy to find and use

    Metadata Types

    • Structural Metadata
    ˗ Focuses on the data structure's design and specification
    • Descriptive Metadata
    ˗ Focuses on the individual instances of application data or the data content

    Datanodes

    • Workhorse of the filesystem
    • Store and retrieve blocks when requested by the client or the namenode
    • Periodically report back to the namenode with lists of blocks that were stored

    Client Access

    • Client can access the filesystem (on behalf of the user) by communicating with the namenode and datanodes
    • Client can use a filesystem interface (similar to POSIX (Portable Operating System Interface)) so the user code does not need to know about the namenode and datanodes to function properly

    Namenode Failure

    • Namenode keeps track of the datanodes that have blocks of a distributed file assigned
    ˗ Without the namenode, the filesystem cannot be used
    • If the computer running the namenode malfunctions, then reconstruction of the files (from the blocks on the datanodes) would not be possible
    ˗ Files on the filesystem would be lost

    Namenode Failure Resilience

    • Namenode failure prevention schemes

    1. Namenode File Backup

    2. Secondary Namenode

    1. Namenode File Backup

    • Back up the namenode files that form the persistent state of the filesystem’s metadata
    • Configure the namenode to write its persistent state to multiple filesystems
    ˗ Synchronous and atomic backup
    • Common backup configuration
    ˗ Copy to Local Disk and Remote FileSystem

    2. Secondary Namenode

  • 8/17/2019 Cloud Computing, Big Data, & CDN Emerging Technologies

    156/212

    • The secondary namenode does not play the same role as the namenode

    • The secondary namenode periodically merges the namespace image with the edit log to prevent the edit log from becoming too large

    • The secondary namenode usually runs on a separate computer, because the merge process requires significant processing capability and memory
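
    The merge of the namespace image with the edit log can be pictured as replaying logged operations onto a snapshot. The sketch below is a simplified illustration, not Hadoop's on-disk format: the dictionary layout of the image and the three edit operations ("create", "add_block", "delete") are assumptions made for this example.

```python
import copy


def apply_edit(image, edit):
    """Apply one edit-log entry to a namespace image (dict: path -> metadata)."""
    op = edit["op"]
    if op == "create":
        image[edit["path"]] = {"blocks": []}
    elif op == "add_block":
        image[edit["path"]]["blocks"].append(edit["block_id"])
    elif op == "delete":
        image.pop(edit["path"], None)
    else:
        raise ValueError(f"unknown edit op: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image.

    Returns the new image and an empty edit log, so the log
    starts growing from zero again after the merge.
    """
    new_image = copy.deepcopy(image)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []
```

    Because the merge works on a full copy of the image, it needs roughly as much memory as the namenode's own metadata, which matches the reason given above for running it on a separate computer.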

    Hadoop 2.x Release Series HDFS Reliability Enhancements

    • HDFS Federation

    • HDFS HA (High-Availability)

    HDFS Federation

    • Allows a cluster to scale by adding namenodes

    • Each namenode manages a namespace volume and a block pool

        • Namespace volume is made up of the metadata for the namespace

        • Block pool contains all the blocks for the files in the namespace


    • Namespace volumes are all independent

    • Namenodes do not communicate with each other 

    • Failure of a namenode is also independent of the other namenodes

        • A namenode failure does not affect the availability of another namenode’s namespace
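
    The independence of namespace volumes can be sketched with a small routing table. This is an illustration under assumptions, not Hadoop code: the NamespaceVolume and FederatedCluster classes are hypothetical, and the client-side table that maps path prefixes to namenodes is an assumption of this sketch that the slides do not describe.

```python
class NamespaceVolume:
    """Metadata managed by one namenode (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}     # path -> list of block IDs
        self.available = True  # False simulates a failed namenode


class FederatedCluster:
    def __init__(self, mounts):
        # mounts: path prefix -> NamespaceVolume (assumed client-side table)
        self.mounts = mounts

    def resolve(self, path):
        # The longest matching prefix decides which namenode is responsible
        matches = [prefix for prefix in self.mounts if path.startswith(prefix)]
        if not matches:
            raise KeyError(path)
        return self.mounts[max(matches, key=len)]

    def lookup(self, path):
        volume = self.resolve(path)
        if not volume.available:
            raise RuntimeError(f"namenode for {volume.name} is down")
        return volume.metadata[path]
```

    Marking one volume as unavailable only breaks lookups under its own prefix; paths that resolve to another namenode keep working.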

    HDFS High-Availability

    • A pair of namenodes (primary and standby) is set up in an active-standby configuration

    • The standby namenode keeps the latest edit log entries and an up-to-date block mapping

    • When the primary namenode fails, the standby namenode takes over serving client requests


    • Although the standby namenode can take over operation quickly (e.g., a few tens of seconds), standby namenode activation is executed only after a sufficient observation period (e.g., approximately a minute or a few minutes) to avoid unnecessary namenode switching
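
    The observation period can be sketched as a heartbeat timeout. The FailoverMonitor class below is hypothetical and greatly simplified (real HDFS HA relies on a separate failover controller); the 60-second default is only an example value in the range mentioned above.

```python
class FailoverMonitor:
    """Activates the standby only after the primary has been silent
    for a full observation period (hypothetical sketch)."""

    def __init__(self, observation_period=60.0):
        self.observation_period = observation_period
        self.last_heartbeat = None
        self.active = "primary"

    def heartbeat(self, now):
        # Called whenever the primary namenode reports that it is alive
        self.last_heartbeat = now

    def check(self, now):
        silent_for = None if self.last_heartbeat is None else now - self.last_heartbeat
        if (self.active == "primary"
                and silent_for is not None
                and silent_for >= self.observation_period):
            self.active = "standby"
        return self.active
```

    A short gap in heartbeats does not trigger a switch; only silence lasting the whole observation period does, which avoids unnecessary namenode switching.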




    CDN Introduction

    CDN (Content Delivery Network)

    Table of Contents

    • CDN Motivation & Structure

    • CDN Procedures

    • Hierarchical Content Delivery Model

    • CDN Market & Major Service Providers

    • CDN Research & Development

    CDN Motivation

    • A CDN is a network constructed from a group of strategically placed and geographically distributed caching servers

    • A CDN is one of the most efficient solutions for CPs (Content Providers) serving a large number of user devices, because it reduces content download time and network traffic


    • Network traffic generated by mobile users (e.g., smart devices) is rapidly increasing

    • Mobile network performance is highly dependent on the content download of multimedia data and applications

    • Several mobile network operators have suffered service outages or performance deterioration due to the significant increase in the use of mobile devices

    CDN Structure

    [Figure: content request and delivery routes between the User, the Caching Server, and the Content Provider, with and without CDN. The caching server stores popular contents in advance. Using CDN, both content download time and network traffic are reduced.]

    CDN in Mobile Networks

    • Mobile communication networks have a stronger need for both reduced traffic load and reduced content delivery time than broadband backbone networks, where capacity is abundant and traffic load reduction may not be as critical

    CDN Structure

    • A CDN usually consists of the CP (Content Provider) and caching servers

    • Caching servers are distributed in the network and contain copies of selected contents that the CP stores

    • The CP possesses all the contents to serve

    • When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache

    • Otherwise, the caching server redirects the user’s request to the remotely located CP

    CDN Procedures

    • When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache

    • If the requested content is not in the local server’s cache, the content request is redirected to the remotely located CP
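
    The two procedures above amount to a cache lookup with a fallback to the origin. The sketch below assumes hypothetical ContentProvider and CachingServer classes and a preloaded set of popular contents; it models only the decision logic, not real network redirection.

```python
class ContentProvider:
    """Remote origin that possesses all contents (hypothetical model)."""

    def __init__(self, contents):
        self.contents = dict(contents)

    def fetch(self, content_id):
        return self.contents[content_id]


class CachingServer:
    """Nearby server holding copies of selected contents."""

    def __init__(self, provider, preloaded=()):
        self.provider = provider
        # Popular contents are stored in advance
        self.cache = {cid: provider.fetch(cid) for cid in preloaded}

    def request(self, content_id):
        # Cache hit: deliver directly from the caching server
        if content_id in self.cache:
            return self.cache[content_id], "cache"
        # Cache miss: the request goes on to the remotely located CP
        return self.provider.fetch(content_id), "content provider"
```

    The second element of the returned pair shows which side served the request, which makes the hit and miss paths easy to tell apart.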

    Content Aging Procedure

    • Each content has a content update period, its TTL (Time to Live)

        • A few seconds for on-line trading

        • A few seconds for auction information

        • 24 hours or more for movies
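
    Content aging can be sketched as a cache whose entries expire after their TTL. The TTLCache class and the concrete TTL values used with it (5 seconds for a quote, 24 hours for a movie) are assumptions for illustration; time is passed in explicitly so the behavior is easy to follow.

```python
class TTLCache:
    """Cache whose entries age out after a per-content TTL (hypothetical sketch)."""

    def __init__(self):
        self._entries = {}  # content ID -> (data, expiry time in seconds)

    def put(self, content_id, data, ttl, now):
        self._entries[content_id] = (data, now + ttl)

    def get(self, content_id, now):
        entry = self._entries.get(content_id)
        if entry is None:
            return None
        data, expires_at = entry
        if now >= expires_at:
            # The content has aged out and must be fetched again
            del self._entries[content_id]
            return None
        return data
```

    With a 5-second TTL, a stock quote stored at time 0 is no longer served at time 6, while a movie stored with a 24-hour TTL still is.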




    CDN Hierarchical Content Delivery

    CDN (Content Delivery Network)

    Hierarchical Content Delivery

    • A caching server cannot store all the contents that the CP (Content Provider) serves

    • Retrieving contents from the remotely located CP can cause a long content download time. In addition, each server generates a large amount of traffic to route the content’s packets

    • For the given cache size of each server, it is important to maximize the hit rate of the local caching server so that requested contents do not have to be retrieved from the CP

    • To accomplish this objective on the Internet in a scalable way, hierarchical cooperative content delivery techniques are used to deliver content to local caching servers

    • CD & LCF (Content Distribution & Location Control Functions) controls the overall content delivery process and has all the content IDs of the CDN

    • CCF (Cluster Control Function) controls multiple CDPFs (Content Delivery Processing Functions) and saves the content IDs of the cluster

    • CDPF stores the contents and delivers them to the users

    [Figure: Hierarchical Content Delivery Network Example]

    Content Delivery Procedures

    • Case 1: Requested content is in the local cluster

        • Content request message is delivered to the CCF

        • CCF sends a session request message to the CDPF to deliver the content to the user

        • CDPF delivers the content to the user

    [Figure: Case 1 Procedures]

    • Case 2: Requested content is not in the local cluster, but another local cluster (i.e., the target cluster) has the content

        • Content request message is redirected from the local cluster to the CD & LCF

        • CD & LCF checks whether the requested content is in another cluster

        • Requested content can be delivered from the target cluster to the user directly, or through the local cluster (the local cluster can store the requested content)

    [Figure: Case 2 Procedures]

    • Case 3: Requested content is not in the CDN

        • Content request message is sent from the CD & LCF to the CP

        • CP delivers the content to the user through the local cluster

        • The requested content can be stored in the local cluster
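
    The three cases can be summarized as one lookup that widens step by step: local cluster, other clusters, then the CP. The deliver function below is a hypothetical sketch: each cluster is reduced to the set of content IDs its CCF knows, the union of those sets stands in for the CD & LCF's view, and the choice to always store a fetched content in the local cluster is an assumption (the slides say it can be stored).

```python
def deliver(content_id, local_cluster, clusters, content_provider):
    """Return (case number, source) for a content request.

    clusters: cluster name -> set of content IDs held in that cluster
    content_provider: set of all content IDs the CP possesses
    """
    # Case 1: the local cluster's CCF finds the content on one of its CDPFs
    if content_id in clusters[local_cluster]:
        return 1, local_cluster

    # Case 2: the CD & LCF finds the content in another (target) cluster
    for name, content_ids in clusters.items():
        if name != local_cluster and content_id in content_ids:
            clusters[local_cluster].add(content_id)  # assumed: keep a local copy
            return 2, name

    # Case 3: the content is not in the CDN, so the request goes to the CP
    if content_id in content_provider:
        clusters[local_cluster].add(content_id)  # assumed: keep a local copy
        return 3, "CP"

    raise KeyError(content_id)
```

    Because the local cluster keeps a copy in cases 2 and 3, a repeated request for the same content is then served as case 1.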

    [Figure: Case 3 Procedures]


    CDN Market

    CDN (Content Delivery Network)

    Measuring the CDN Market Value

    • There are many ways to evaluate the value of the CDN market

    • Evaluation is related to the diverse range of CDN industry participants

    • Examples of industry participants

        • CSPs (Communications Service Providers)

        • Industry manufacturers

        • CDN service providers

        • Content providers

    • For communication service providers, the CDN’s value includes improving retail service delivery and supporting their efforts to win and retain customers

    • For industry manufacturers, the market value is related to the demand from telcos, content providers, and other businesses

    CDN Market Size

    • The 2014 CDN market size was $3.71 billion

    • CDN market components

        • Content delivery technologies, hardware, analytics, monitoring, encoding, transparent caching, DRM (Digital Rights Management), CMS (Content Management System), OVP (Online Video Platform), etc.

    • CDN market estimations

        • Expected to grow to $12.16 billion by 2019

        • Predicted 26.3% CAGR (Compound Annual Growth Rate) from 2014 to 2019
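
    The growth rate can be checked against the two market sizes with the standard CAGR formula, (end / start) ** (1 / years) - 1. The cagr helper below is a hypothetical name for this sketch. Applied to the rounded figures of $3.71 billion and $12.16 billion over five years, it gives a value close to, though not exactly equal to, the quoted 26.3%; the small gap presumably comes from rounding or from a slightly different base in the original market report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of dollars, as quoted above
rate = cagr(3.71, 12.16, 2019 - 2014)
print(f"CAGR 2014 to 2019: {rate:.1%}")
```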

    CDN Service Providers

    • Akamai has about 110,000 servers around the world. Akamai's services include cloud computing, HD video delivery, etc.

    • Amazon CloudFront delivers static and streaming contents. Amazon CloudFront works seamlessly with other Amazon Web and Cloud Service solutions

        • S3 (Simple Storage Service)

        • EC2 (Elastic Compute Cloud)

    • CDNetworks has POPs (Points of Presence) on 6 continents, including 20 POPs in China. It is the world’s 3rd largest, and Asia’s #1, full-service provider

    • Level 3 supports a comprehensive encoding suite for video data and intelligent traffic manager services (i.e., load balancing)

    • Limelight has 6,000 servers at 75 POPs (Points of Presence), and more than