Logitech Journey to the Cloud - Next Generation Data Warehousing
Post on 15-Apr-2017
BIG DATA TODAY
Journey to the Cloud - next generation data warehousing
Steven Perelli-Minetti, Manager – Data Architecture, Logitech
Avi Deshpande, Principal – Big Data, Logitech
© 2015, Pentaho. All rights reserved. pentaho.com. Worldwide +1 (866) 660-7555
Journey to Cloud
Cloud empowers IT organizations to redefine the way data services are produced and delivered for Analytics.
More scalable
‒ can reconfigure to a larger cluster in an hour
More efficient
‒ can turn off over the weekend
‒ can clone prod for UAT and drop when done
More reliable
‒ AWS automatically does 90% of what our DBAs did
Challenges of our traditional warehouse
Our on-prem data warehouse could no longer be extended to effectively address our evolving business needs:
Growing too fast for Exadata
‒ smallest increase in any resource is a quarter rack
Difficult to set up and tune performance
Difficult to manage usage
‒ Resource usage over time
‒ Queries … impact of each team, process
Journey to the Cloud – Data Warehouse Architecture
[Architecture diagram. Components shown: ERP, POS, SFDC, Scrapy, Web Services, MDM, DRM, Data Interfaces, Pentaho DI, AWS S3, AWS Glacier, AWS Redshift, AWS EC2, RDS MySQL, Pentaho Operations Mart, Denodo, Pentaho BA, Tableau, GitHub, CloudWatch, SNS, IAM, CloudTrail]
Producer and Consumer processes
Working in the cloud requires a different architecture to optimize use of cloud resources and services
Producers extract data and load to S3 by batch
‒ Amazon Simple Storage Service (Amazon S3) provides developers and IT teams with secure, durable, highly-scalable object storage
Consumers take a batch from S3 to load to Redshift
Asynchronous processes provide a simple restart point
‒ If Redshift is down, we continue to run producers to load S3 batches, and restart consumers when Redshift is back up
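The decoupling described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Pentaho jobs used in the talk: a local temporary directory stands in for the S3 staging area, `FakeWarehouse` is a hypothetical stub standing in for Redshift, and the `batch_*.json` / `.done` naming is invented for the example.

```python
import json
import tempfile
from pathlib import Path


class WarehouseDown(Exception):
    """Raised by the hypothetical warehouse stub while it is unavailable."""


class FakeWarehouse:
    """Stand-in for Redshift: collects rows in memory and can be switched off."""

    def __init__(self):
        self.available = True
        self.rows = []

    def load(self, rows):
        if not self.available:
            raise WarehouseDown("warehouse is not reachable")
        self.rows.extend(rows)


def produce_batch(staging: Path, batch_id: int, rows: list) -> Path:
    """Producer: write one extracted batch to the staging area (S3 in the talk)."""
    path = staging / f"batch_{batch_id:06d}.json"
    path.write_text(json.dumps(rows))
    return path


def consume_pending(staging: Path, warehouse: FakeWarehouse) -> int:
    """Consumer: load staged batches in order and stop at the first failure.

    A batch is renamed to *.done only after a successful load, so a later
    run restarts from the first batch that has not been loaded yet.
    """
    loaded = 0
    for path in sorted(staging.glob("batch_*.json")):
        try:
            warehouse.load(json.loads(path.read_text()))
        except WarehouseDown:
            break  # leave this batch and the rest staged for the next run
        path.rename(path.with_suffix(".done"))
        loaded += 1
    return loaded


staging_dir = Path(tempfile.mkdtemp())
warehouse = FakeWarehouse()

produce_batch(staging_dir, 1, [{"sku": "A", "qty": 1}])
first_run = consume_pending(staging_dir, warehouse)

warehouse.available = False  # simulated outage: producers keep running
produce_batch(staging_dir, 2, [{"sku": "B", "qty": 2}])
produce_batch(staging_dir, 3, [{"sku": "C", "qty": 3}])
during_outage = consume_pending(staging_dir, warehouse)

warehouse.available = True  # warehouse is back up: consumer catches up
after_restart = consume_pending(staging_dir, warehouse)
```

Because the producer never talks to the warehouse, an outage only delays loading; the staged batches themselves act as the restart point.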
Data Producer Process (Source to S3)
Data Consumer Process (S3 to Redshift)
Template driven development
Templates provide consistent processes
Simplifies maintenance, enforcement of standards
Makes it easier to develop specs for offshore development
Supports faster development, testing, debugging
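As a rough illustration of the template idea, the sketch below fills one shared job definition for several source tables. It is not a Pentaho PDI template (those are transformation and job files built in the PDI tooling); the plain-text layout, the placeholder names, the `updated_at` column, and the `example-bucket` name are all assumptions made for the example.

```python
from string import Template

# Hypothetical producer template; the ${...} placeholders are the only parts
# that change from one source table to the next.
PRODUCER_TEMPLATE = Template(
    "job_name=producer_${source}_${table}\n"
    "extract_sql=SELECT * FROM ${table} WHERE updated_at > :last_run\n"
    "target=s3://${bucket}/${source}/${table}/\n"
)


def render_producer(source: str, table: str, bucket: str) -> str:
    """Fill the shared template for one source table.

    substitute() raises KeyError if a placeholder is left unfilled, which
    keeps incomplete specs from turning into half-configured jobs.
    """
    return PRODUCER_TEMPLATE.substitute(source=source, table=table, bucket=bucket)


# Each new feed is just one more (source, table) pair.
specs = [("erp", "orders"), ("pos", "sales")]
jobs = [render_producer(src, tbl, bucket="example-bucket") for src, tbl in specs]
```

With this shape, a development spec reduces to the list of parameter values, and a fix to the template applies to every generated job.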
Pentaho PDI Template (Producer Transformation)
Pentaho PDI Template (Producer Job)
Demonstration
Template-based development
Ops Data Mart supports better management
Collects data from our DI and BA servers
‒ Collects process metadata over time
‒ Out of the box reports / dashboards
Provides metadata supporting validations
‒ Raise a flag if today's run is outside 2 standard deviations of the average
Provides history to see changes / trends over time
‒ Raise a flag if job run time doubles in a month
‒ Raise a flag if report usage drops over time
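Two of the flags above are simple enough to sketch directly. The functions below assume the relevant history has already been pulled from the operations mart into plain Python lists; the function names, the choice of the sample standard deviation, and the sample numbers are illustrative assumptions, not part of the Pentaho Operations Mart.

```python
from statistics import mean, stdev


def outside_two_sigma(history: list, today: float) -> bool:
    """True if today's value is more than 2 sample standard deviations
    away from the mean of the historical values."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    avg = mean(history)
    spread = stdev(history)
    return abs(today - avg) > 2 * spread


def runtime_doubled(month_ago_seconds: float, latest_seconds: float) -> bool:
    """True if the latest run took at least twice as long as a month ago."""
    return month_ago_seconds > 0 and latest_seconds >= 2 * month_ago_seconds


# Hypothetical daily row counts for one load process.
row_counts = [1000, 1020, 980, 1010, 990]

flag_spike = outside_two_sigma(row_counts, today=1100)
flag_normal = outside_two_sigma(row_counts, today=1015)
flag_slow_job = runtime_doubled(month_ago_seconds=300, latest_seconds=650)
```

The report-usage flag would follow the same pattern, comparing a recent window of usage counts against an earlier one.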
Data Virtualization
Business Layer
‒ Keep the business from trolling through the backend database
‒ Data consistency through single object, multiple consumers
Security through Data Virtualization rather than in every tool
‒ Hard to keep security in sync across multiple analytic tools
Rapid Prototyping
‒ Add new data source in DV layer first
‒ Move to Redshift / Pentaho after virtual analytics are validated
Benefits…
Proactive – IT has embraced cloud as a model for achieving innovation through increased efficiency, reliability and agility
Reusability and template development
Rapid innovation within governance structure, balanced costs, risks and service levels
Greater efficiency and reliability, enabling broader audience to consume IT services via self-service
Summary
What We Covered Today:
Pros / Cons of traditional solution architecture
Benefits of moving to cloud
Benefits of template development
Benefits of Ops Data Mart
Benefits of Data Virtualization
Next Steps
Want to learn more?
Amazon AWS - https://aws.amazon.com/
Data Virtualization - https://en.wikipedia.org/wiki/Data_virtualization , http://www.denodo.com/en
Columnar databases - https://aws.amazon.com/redshift/
Pentaho DI and BA - http://community.pentaho.com/
Thank You
Join the Conversation
#PWorld15
Avi - @Avinash49799752
Steve - @StevePerelliMin