Presentation at Google Day on Big Data
![Page 1: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/1.jpg)
Big Data
![Page 3: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/3.jpg)
Data is growing at an exponential rate, and traditional tools like RDBMSs are not enough to process it
![Page 4: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/4.jpg)
Data is everywhere:
• Flickr (87 million registered members and 3.5 million photos per day)
• YouTube (4B videos streamed per day)
• Yahoo! Webmap (3 trillion links, 300TB compressed, 5PB disk)
• Facebook collects 500 terabytes of data a day
• Walmart handles more than 1 million customer transactions every hour
• IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day
![Page 5: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/5.jpg)
According to IDC, data is growing at a rate of 40% a year and will reach nearly 45 ZB by 2020
1 ZB is equal to 1 billion TB
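The unit conversion and the growth figure can be checked with quick arithmetic. The sketch below assumes decimal SI units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and assumes the 40% rate compounds once a year; the function names are illustrative and not from the slides.

```python
TB = 10**12  # bytes in a terabyte (decimal SI units, an assumption)
ZB = 10**21  # bytes in a zettabyte (decimal SI units, an assumption)

def zb_to_tb(zb):
    """Convert a whole number of zettabytes to terabytes."""
    return zb * ZB // TB

def years_to_grow(factor, annual_rate):
    """Whole years of compound growth needed to grow by at least `factor`."""
    years = 0
    size = 1.0
    while size < factor:
        size *= 1 + annual_rate
        years += 1
    return years

print(zb_to_tb(1))             # terabytes in one zettabyte
print(years_to_grow(2, 0.40))  # years for the data volume to at least double
```

At 40% a year the volume reaches 1.96x after two years, so under this assumption it roughly doubles every two years.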
![Page 6: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/6.jpg)
What is Big Data and what is not?
• Order details of an e-commerce site
• All orders across 1000s of e-commerce sites
• One person’s voter ID information
• Every citizen’s voter ID information dataset
Simple definition: Big Data is data that is too big to process on a single machine
![Page 7: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/7.jpg)
What is Big Data?
![Page 8: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/8.jpg)
3 V’s of Big Data
![Page 9: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/9.jpg)
Types of Data:
• Relational Data (Tables/Transaction/Legacy Data)
• Unstructured Data – Apache weblogs
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data – Social Network, Semantic Web (RDF)
• Streaming Data
![Page 10: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/10.jpg)
Data Processing Tasks:
• Aggregation and Statistics – Data warehouse
• Contextual Advertising – Real Time Bidding, Remarketing
• Indexing, Searching, and Querying – Keyword based search, Pattern recognition
• Knowledge discovery – Data Mining, Statistical Modeling
![Page 11: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/11.jpg)
Traditional Architecture
• Relational Data is everything
  – SQL
  – Embedded
  – Client-Server Based
• Data Stack
  – Web, CDN, Load Balancers, Application, Database and Storage
![Page 12: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/12.jpg)
Traditional Scalability
• Scale-up
  – Memory and hardware have limitations
• Scale-out
  – Reading
    • Cache is everything: Query Cache, Memcache
    • Pre-fetching, Replication
  – Writes
    • Redundant Disk Arrays, RAID
    • Sharding
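Two of the scale-out techniques named on this slide, a read cache and write sharding, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the presenter's design: the `ShardedStore` class and its methods are hypothetical, in-memory dictionaries stand in for database servers and for Memcache, and keys are assigned to shards by a simple hash-modulo rule.

```python
import hashlib

class ShardedStore:
    """Toy key-value store: writes are sharded, reads go through a cache."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]  # stand-ins for database servers
        self.cache = {}                                # stand-in for Memcache

    def shard_for(self, key):
        # A stable hash means the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def write(self, key, value):
        self.shards[self.shard_for(key)][key] = value
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def read(self, key):
        if key in self.cache:      # cache hit: no shard is touched
            return self.cache[key]
        value = self.shards[self.shard_for(key)].get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

store = ShardedStore(num_shards=4)
store.write("user:1", "alice")
print(store.read("user:1"))  # first read comes from a shard and fills the cache
```

Hash-modulo placement is the simplest choice, but changing the number of shards remaps most keys, which is one reason sharding a relational database is operationally painful.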
![Page 13: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/13.jpg)
NoSQL Solution
• Many companies emerged to solve the data problem
• Big Table: Google started to implement a massively distributed, scalable system
• Many companies followed, building scale-out architectures on commodity hardware
• ACID was deemed bad for scaling, so relaxed consistency models emerged
• Google Big Table and Amazon Dynamo are notable examples
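One well-known form of relaxed consistency, used in Dynamo-style stores, is quorum replication: with N replicas, a write is acknowledged by W of them and a read consults R of them, and a read is guaranteed to see the latest write only when R + W > N. The sketch below is a toy simulation of that rule and not the slides' own material; the helper names and the worst-case replica selection are assumptions made for illustration.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum shares a replica with every write quorum."""
    return r + w > n

def write(replicas, w, version, value):
    # Worst case for a later reader: only the first w replicas get the write.
    for i in range(w):
        replicas[i] = (version, value)

def read(replicas, r):
    # Worst case: the reader contacts the last r replicas.
    contacted = replicas[len(replicas) - r:]
    newest = max(contacted, key=lambda item: item[0])  # highest version wins
    return newest[1]

def simulate(n, r, w):
    replicas = [(0, "old")] * n  # every replica starts at version 0
    write(replicas, w, version=1, value="new")
    return read(replicas, r)

print(quorums_overlap(3, 2, 2), simulate(3, 2, 2))  # overlapping quorums
print(quorums_overlap(3, 1, 1), simulate(3, 1, 1))  # non-overlapping quorums
```

Choosing smaller R and W lowers latency and keeps the system available when replicas fail, at the cost of reads that may return stale data.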
![Page 14: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/14.jpg)
Big Data Tools
![Page 15: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/15.jpg)
Big Data Landscape
![Page 16: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/16.jpg)
Thanks
![Page 17: Presentation at Google Day on Big Data](https://reader031.vdocuments.site/reader031/viewer/2022013104/53edea6a8d7f7289708b5f79/html5/thumbnails/17.jpg)
Questions?