Big Data workshop introduction session - Ahmedabad Java Meetup
TRANSCRIPT
For Ahmedabad Java Meetup Group (300+ members strong now!)
Big Data Workshop: An introduction and workshop launch session
May 2014
Dhruv Gohil
From Ishi Systems
Welcome!
Why a workshop and not a presentation?
What should you do in the workshop?
What is expected from you in this session?
What should you expect from this session?
What are the upcoming sessions going to be like?
Seems too serious?
Now, this is much better!
So, let's change the font!
OK... So what are we gonna do today?
Workshop setup and series introduction
Already done! (See, it's easy!)
Big data is not only big.
Why do we need 'Big data'?
What 'Big data' is NOT?
Fear of Big data? Kick it off!
Let me tell you a story...
http://en.wikipedia.org/wiki/Information_Management_System
If you still think about 'Entities' and 'Tables'...
Everything you have been taught in college about databases is ALL WRONG.
http://slideshot.epfl.ch/play/suri_stonebraker
Big Data is...
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Big Data is not only big
Volume, Velocity, Variety
GB/TB vs PB/EB
Centralized vs Distributed
Structured vs Semi-Structured/Unstructured
Data Model vs Schema
Known relationships vs Flexible associations
What 'Big data' is NOT?
Big data ≠ Hadoop, Hadoop ≠ Big data!
What 'Big data' is NOT?
Applying for a job here?
Hadoop!
What 'Big data' is NOT?
Why does Hadoop always come to mind with big data?
What else should we know?
Tools vs. methodologies
Being too futuristic vs. being practical/economical
Big Data in your organization
http://www.fakingnews.firstpost.com/2014/04/transcript-of-rahul-gandhis-interview-for-job-of-a-c-programmer/
We brought RTSC. Right To Source Code.
Now, deal with it.
Big Data in your organization
Cost of tools/software decreases, but cost of knowledge increases
Being agile is the only way to deal with competition
Are you working with...
Social networking and media
Mobile devices
Internet transactions
Networked devices and sensors
Big Data in your product/service
Change your thinking: design from the perspective of access, not storage
Design based on when/where data is used rather than when/where data is produced.
Accept redundancy at the expense of storage cost
Understand NoSQL = Not Only SQL
Streams
In memory analytics
Massively parallel processing (Data crunching)
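To make "Streams" and "In memory analytics" a little more concrete, here is a minimal Python sketch (not from the slides). It keeps running per-key statistics in memory while events flow past, instead of storing every raw event and querying it later. The `event_stream` source, the `RunningStats` class, and the sensor readings are all hypothetical; a real stream would be unbounded and would come from a message queue or socket.

```python
def event_stream():
    """Hypothetical source of (sensor_id, reading) events."""
    yield from [
        ("s1", 10.0),
        ("s2", 4.0),
        ("s1", 14.0),
        ("s2", 6.0),
        ("s1", 12.0),
    ]


class RunningStats:
    """Keeps only a count and a sum per key, never the raw events."""

    def __init__(self):
        self.count = {}
        self.total = {}

    def update(self, key, value):
        # Constant work and constant memory per event.
        self.count[key] = self.count.get(key, 0) + 1
        self.total[key] = self.total.get(key, 0.0) + value

    def mean(self, key):
        # Raises KeyError for a key that was never seen.
        return self.total[key] / self.count[key]


stats = RunningStats()
for sensor_id, reading in event_stream():
    stats.update(sensor_id, reading)

print(stats.mean("s1"), stats.mean("s2"))
```

The answer is available at any moment during the stream, and memory grows with the number of keys rather than the number of events.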
Big Data in your project
Random research says... 99% of your clients who asked for a Big Data project ended up with fewer paying customers than you have fingers.
A project hits business scalability much, much earlier than technical scalability.
Big Data for your clients
Business first - technology second
Current reality for client projects:
Use big data tools that work at small scale :-)
Design with the domain in mind, not the database the client suggests.
Always design with read optimization in mind (the golden rule)
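One way to picture the golden rule is a read-optimized view that is built at write time. The Python sketch below is only an illustration under assumed names (`customer_views`, `record_order`, `customer_summary`, and the sample customers are all made up, not from the workshop): each write updates a per-customer summary, so the read is a single key lookup with no joins or scans. The price is redundant data and slightly more work on every write.

```python
# Read-optimized view: one pre-built summary per customer (illustrative names).
customer_views = {}


def record_order(customer_id, customer_name, order_id, amount):
    """Write path: do the extra work here so reads stay trivial."""
    view = customer_views.setdefault(
        customer_id,
        {"name": customer_name, "order_ids": [], "total_spent": 0.0},
    )
    view["order_ids"].append(order_id)
    view["total_spent"] += amount


def customer_summary(customer_id):
    """Read path: a single key lookup, no joins or scans."""
    return customer_views[customer_id]


record_order("c1", "Asha", "o-100", 250.0)
record_order("c1", "Asha", "o-101", 100.0)
record_order("c2", "Ravi", "o-102", 75.0)

summary = customer_summary("c1")
print(summary["name"], summary["total_spent"], summary["order_ids"])
```

The same idea carries over to a real data store: shape the stored record around the query the screen or report needs, even if that means keeping the customer name in more than one place.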
Big Data project for small data customers
If you can do it in PostgreSQL, then do it in PostgreSQL (the blue elephant rule)
A few important tips...
The CAP theorem - Basics of NoSQL databases
Read a lot about the design of a database before using any non-traditional database. Or read good negative posts to know when NOT to use it.
e.g.: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Now... the good parts!
It's your time to speak now!
Workshop session: About practical selection of technology and design for real-world use cases.
All references used in the workshop
Basic Hadoop introductory material: http://www.coreservlets.com/hadoop-tutorial/
Evaluate Hadoop without installation: http://go.cloudera.com/cloudera-live.html
PostgreSQL good parts: http://www.slideshare.net/Aveic/postgresql-34323147
PostgreSQL as a NoSQL key-value store: http://postgresguide.com/sexy/hstore.html
PostgreSQL for basic Elasticsearch functionality: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
Good big data compatible OSS software: http://netflix.github.io/
Practical HBase usage: https://www.facebook.com/UsingHbase
Using Cassandra for write-heavy applications: http://www.datastax.com/1-million-writes
Online analytics in Storm: http://hortonworks.com/hadoop/storm/
E-commerce domain-specific use case: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
Good use case of selecting a data store based on a proper understanding of the CAP theorem: http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/
Recommendation engine in Big Data scenarios: http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791
High-volume log processing: http://www.splunk.com/view/product-tour/SP-CAAAAGV
Open source alternatives: http://logstash.net/ and http://graylog2.org/