#bdam: cask market - big data's app store
cask.co
Big Data’s App Store
Albert Shau
BDAM 2017-02-08
The Big Data Promise
Solving New Problems with Powerful Infrastructure
Anomaly Detection
Natural Language Processing
Classification
Recommendation Systems
Realtime Event Monitoring
From Infrastructure to Applications
1. Easy to write powerful applications
2. Easy to distribute, discover, and install applications
Cask Market
Distribute, discover and install applications, plugins, and data
Architecture
Demo
Terminology
Package - A collection of jars, applications, datasets, streams, and configuration that can be installed on a CDAP instance
Package Spec - Metadata about a package and a list of installation steps
Catalog - List of packages in the market
API
GET market.cask.co/v1/packages.json
[
  {
    "name": "datapack-access-log",
    "version": "1.0.0",
    "description": "Sample access logs in Combined Log Format (CLF).",
    "label": "Access Log Sample",
    "author": "Cask",
    "org": "Cask Data, Inc.",
    "created": 1473901763,
    "beta": false,
    "categories": [ "datapack" ]
  },
  ...
]
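A minimal sketch of how a client might read the catalog, assuming the response is a JSON array shaped like the entry above. The helper name `packages_in_category` and the choice to skip beta packages are illustrative assumptions, not part of the Cask Market API; a real client would fetch the document over HTTP instead of embedding it.

```python
import json

# Sample catalog containing the single entry shown on the slide.
CATALOG_JSON = """
[
  {
    "name": "datapack-access-log",
    "version": "1.0.0",
    "description": "Sample access logs in Combined Log Format (CLF).",
    "label": "Access Log Sample",
    "author": "Cask",
    "org": "Cask Data, Inc.",
    "created": 1473901763,
    "beta": false,
    "categories": ["datapack"]
  }
]
"""


def packages_in_category(catalog, category):
    """Return (name, version) pairs for non-beta packages in a category.

    Hypothetical helper: the filtering rule is an assumption made for
    illustration only.
    """
    return [
        (pkg["name"], pkg["version"])
        for pkg in catalog
        if category in pkg.get("categories", []) and not pkg.get("beta", False)
    ]


catalog = json.loads(CATALOG_JSON)
print(packages_in_category(catalog, "datapack"))
```

Because the catalog is a single flat list, discovery on the client side can be plain filtering over one document.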
GET market.cask.co/v1/packages/usecase-sms-spam-pipeline/1.0.0/spec.json
{
  "actions": [
    ...
    {
      "type": "load_datapack",
      "label": "Labeled Messages Sample",
      "arguments": [
        { "name": "name", "value": "labeledSMSTexts" },
        { "name": "files", "value": [ "labeled_texts.tsv" ] }
      ]
    },
    ...
  ]
}
{
  "actions": [
    ...
    {
      "type": "create_pipeline",
      "label": "Spam Classifier Trainer",
      "arguments": [
        {
          "name": "artifact",
          "value": {
            "scope": "system",
            "name": "cdap-data-pipeline",
            "version": "4.0.0"
          }
        },
        { "name": "config", "value": "trainer.json" },
        ...
      ]
    }
  ]
}
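The package spec above is an ordered list of actions, each with a type, a label, and a list of name/value arguments. The following sketch shows one way a client could walk it. It assumes only the fields visible on the slides; the elided parts of the spec are left out, and `arguments_as_dict` is a hypothetical helper.

```python
import json

# The two actions shown on the slides; elided actions and arguments omitted.
SPEC_JSON = """
{
  "actions": [
    {
      "type": "load_datapack",
      "label": "Labeled Messages Sample",
      "arguments": [
        {"name": "name", "value": "labeledSMSTexts"},
        {"name": "files", "value": ["labeled_texts.tsv"]}
      ]
    },
    {
      "type": "create_pipeline",
      "label": "Spam Classifier Trainer",
      "arguments": [
        {"name": "artifact",
         "value": {"scope": "system", "name": "cdap-data-pipeline", "version": "4.0.0"}},
        {"name": "config", "value": "trainer.json"}
      ]
    }
  ]
}
"""


def arguments_as_dict(action):
    """Flatten an action's [{"name": ..., "value": ...}] list into a dict."""
    return {arg["name"]: arg["value"] for arg in action.get("arguments", [])}


spec = json.loads(SPEC_JSON)
# Keep the list order: the spec describes installation steps in sequence.
steps = [(action["type"], arguments_as_dict(action)) for action in spec["actions"]]
for step_type, args in steps:
    print(step_type, sorted(args))
```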
Execution
CDAP
Big Data Infrastructure
PUT /v3/namespaces/default/streams/smsTexts
POST /v3/namespaces/default/streams/smsTexts/batch
PUT /v3/namespaces/default/apps/SpamTrainer
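As a rough illustration of how installation steps could turn into CDAP REST calls like the ones above, the sketch below plans requests without sending them. The mapping from action type to request is an assumption made for illustration and is not taken from the Cask Market implementation. The `name` argument on `create_pipeline` and the example values `smsTexts` and `SpamTrainer` are likewise hypothetical, chosen to reproduce the paths on the slides.

```python
def plan_requests(action, namespace="default"):
    """Translate one install action into a list of (method, path) pairs.

    Hypothetical mapping, for illustration only:
      load_datapack   -> create a stream, then upload events in batch
      create_pipeline -> deploy an application under the given name
    """
    args = {arg["name"]: arg["value"] for arg in action.get("arguments", [])}
    base = f"/v3/namespaces/{namespace}"
    if action["type"] == "load_datapack":
        stream = f"{base}/streams/{args['name']}"
        return [("PUT", stream), ("POST", f"{stream}/batch")]
    if action["type"] == "create_pipeline":
        return [("PUT", f"{base}/apps/{args['name']}")]
    raise ValueError(f"unsupported action type: {action['type']}")


# Example actions; the argument values are assumptions, not spec contents.
load_action = {
    "type": "load_datapack",
    "arguments": [{"name": "name", "value": "smsTexts"}],
}
pipeline_action = {
    "type": "create_pipeline",
    "arguments": [{"name": "name", "value": "SpamTrainer"}],
}

plan = plan_requests(load_action) + plan_requests(pipeline_action)
for method, path in plan:
    print(method, path)
```

Planning the requests separately from sending them keeps the translation step easy to inspect before anything touches a CDAP instance.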
Hosting
Just serve static content through HTTP
/v1/packages.json
…
/v1/packages/hydrator-plugin-solrsearch/1.5.0/spec.json
/v1/packages/hydrator-plugin-solrsearch/1.5.0/solrsearch-plugins-1.5.0.json
/v1/packages/hydrator-plugin-solrsearch/1.5.0/solrsearch-plugins-1.5.0.jar
...
/v1/packages/usecase-sms-spam-pipeline/1.0.0/spec.json
/v1/packages/usecase-sms-spam-pipeline/1.0.0/trainer.json
/v1/packages/usecase-sms-spam-pipeline/1.0.0/classifier.json
/v1/packages/usecase-sms-spam-pipeline/1.0.0/texts.txt
/v1/packages/usecase-sms-spam-pipeline/1.0.0/labeled_texts.txt
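Since the market is only static files, its layout can be derived from a package name, a version, and a list of file names. The sketch below reproduces the path pattern listed above. The helper `package_paths` is hypothetical, and it assumes every package keeps a `spec.json` next to its resources, as both listed packages do.

```python
def package_paths(name, version, files, api_version="v1"):
    """Return the static paths for one package: its spec, then its resources."""
    base = f"/{api_version}/packages/{name}/{version}"
    return [f"{base}/spec.json"] + [f"{base}/{file_name}" for file_name in files]


paths = package_paths(
    "hydrator-plugin-solrsearch",
    "1.5.0",
    ["solrsearch-plugins-1.5.0.json", "solrsearch-plugins-1.5.0.jar"],
)
for path in paths:
    print(path)
```

A directory tree laid out this way can be served by any static file server, for example Python's built-in `python -m http.server` run from the directory that contains `v1/`.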
Looking Forward
Improve discoverability
Formalize external contribution process
Package dependencies
More Applications!
Summary
Cask Market is a way to distribute, discover, and install Big Data applications
Cask hosts a public market, but enterprises can easily host their own
Move from infrastructure to applications