just in time (series) - kairosdb


Upload: victor-anjos

Post on 14-Apr-2017

Category: Technology


TRANSCRIPT

Page 1: Just in time (series) - KairosDB

JUST in time: Time-series & KairosDB

USES FOR KAIROSDB

Page 2: Just in time (series) - KairosDB

Who is Victor Anjos?

• Bio-Informatics Engineer
• Business Analyst
• Data Warehouse Specialist
• System Operations / DevOps
• Founder & Lead Technologist
• Presenter, Speaker, Organizer
• Founder / Do-Gooder
• Engineering Manager

TWEET ABOUT US: @VictorFAnjos @Viafoura @PlanetCassandra #TCUG

Page 3: Just in time (series) - KairosDB

Why Real-Time?

Page 4: Just in time (series) - KairosDB

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... field1 int PRIMARY KEY,
        ... field2 int,
        ... field3 int);

Page 5: Just in time (series) - KairosDB

Keys in C*

cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1, 2, 3);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4, 5, 6);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7, 8, 9);

Page 6: Just in time (series) - KairosDB

Keys in C*

cqlsh:test> SELECT * FROM example;

 field1 | field2 | field3
--------+--------+--------
      1 |      2 |      3
      4 |      5 |      6
      7 |      8 |      9

Page 7: Just in time (series) - KairosDB

Keys in C*

[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
-------------------
RowKey: 4
=> (column=, value=, timestamp=1374546757815000)
=> (column=field2, value=00000005, timestamp=1374546757815000)
=> (column=field3, value=00000006, timestamp=1374546757815000)
-------------------
RowKey: 7
=> (column=, value=, timestamp=1374546761055000)
=> (column=field2, value=00000008, timestamp=1374546761055000)
=> (column=field3, value=00000009, timestamp=1374546761055000)

Page 9: Just in time (series) - KairosDB

Keys in C*

cqlsh:test> CREATE TABLE example (
        ... partitionKey1 text,
        ... partitionKey2 text,
        ... clusterKey1 text,
        ... clusterKey2 text,
        ... normalField1 text,
        ... normalField2 text,
        ... PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
        ... );

Page 10: Just in time (series) - KairosDB

Keys in C*

cqlsh:test> INSERT INTO example (partitionKey1,
        ... partitionKey2, clusterKey1, clusterKey2,
        ... normalField1, normalField2) VALUES (
        ... 'partitionVal1',
        ... 'partitionVal2',
        ... 'clusterVal1',
        ... 'clusterVal2',
        ... 'normalVal1',
        ... 'normalVal2');

Page 11: Just in time (series) - KairosDB

Keys in C*

cqlsh:test> SELECT * FROM example;

 partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
---------------+---------------+-------------+-------------+--------------+--------------
 partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2

Page 12: Just in time (series) - KairosDB

Keys in C*

[default@test] list example;
-------------------
RowKey: partitionVal1:partitionVal2
=> (column=clusterVal1:clusterVal2:, value=, timestamp=1374630892473000)
=> (column=clusterVal1:clusterVal2:normalfield1, value=6e6f726d616c56616c31, timestamp=1374630892473000)
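The `value=` and `timestamp=` fields in the listing above are easier to read once decoded. A minimal Python 3 sketch, assuming the value is the hex encoding of UTF-8 text and the timestamp is microseconds since the Unix epoch (the usual Cassandra write-timestamp convention); `decode_cell` is a hypothetical helper, not part of any Cassandra tool:

```python
from datetime import datetime, timezone


def decode_cell(hex_value, timestamp_micros):
    """Decode a hex-encoded text value and a microsecond write timestamp."""
    # Assumption: the column holds UTF-8 text, shown by the CLI as hex.
    text = bytes.fromhex(hex_value).decode("utf-8")
    # Assumption: the timestamp counts microseconds since 1970-01-01 UTC.
    written_at = datetime.fromtimestamp(timestamp_micros / 1_000_000, tz=timezone.utc)
    return text, written_at


text, written_at = decode_cell("6e6f726d616c56616c31", 1374630892473000)
print(text, written_at.isoformat())
```

The decoded text is the `normalVal1` string inserted earlier, which confirms that the column value itself is stored unchanged.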

Page 13: Just in time (series) - KairosDB

Keys in C*

1. The first part of a composite key [inside the inner brackets] is called the “Partition Key”; the rest [not inside the inner brackets] are the “Cluster Keys”.

2. Cassandra stores columns differently when composite keys are used. The partition key becomes the row key. The values of the remaining keys (the cluster keys) are concatenated with each column name, using “:” as the separator, to form the stored column names. Column values remain unchanged.

3. Cluster keys (unlike partition keys) are stored in sorted order. You cannot search on arbitrary columns: you have to specify the cluster keys in order, and you can run a range query only on the last one you specify.
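Point 2 can be sketched in a few lines of Python 3. This is only an illustration of the naming scheme seen in the `list example` output, not how Cassandra is implemented; `to_storage_row` and its arguments are hypothetical names:

```python
def to_storage_row(row, partition_keys, cluster_keys):
    """Map one CQL row (a dict) to a (row key, columns) pair."""
    # The partition key values, joined with ":", become the row key.
    row_key = ":".join(row[k] for k in partition_keys)
    # The cluster key values, joined with ":", prefix every column name.
    prefix = ":".join(row[k] for k in cluster_keys)
    columns = {prefix + ":": ""}  # marker column with an empty value
    for name, value in row.items():
        if name not in partition_keys and name not in cluster_keys:
            columns[prefix + ":" + name] = value  # the value is left unchanged
    return row_key, columns


row = {
    "partitionkey1": "partitionVal1",
    "partitionkey2": "partitionVal2",
    "clusterkey1": "clusterVal1",
    "clusterkey2": "clusterVal2",
    "normalfield1": "normalVal1",
    "normalfield2": "normalVal2",
}
row_key, columns = to_storage_row(
    row, ["partitionkey1", "partitionkey2"], ["clusterkey1", "clusterkey2"]
)
print(row_key)
for name in sorted(columns):
    print(name, "=>", columns[name])
```

Because the stored column names share the cluster key prefix and are kept sorted, a range scan over that prefix is cheap, which is why range queries are limited to the trailing cluster key.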

Page 14: Just in time (series) - KairosDB

A bit of data modelling

USER ACTIVITY DATA MODEL

CREATE TABLE user_activity (
    username varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);

CREATE TABLE user_activity_history (
    username varchar,
    interaction_date varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY ((username, interaction_date), interaction_time)
);
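The second table adds `interaction_date` to the partition key, presumably so that each user's history is split into one partition per day instead of one ever-growing row. A minimal Python 3 sketch of deriving that key on the client, assuming the date is stored as a `YYYY-MM-DD` string in UTC (the slide does not say which format is used); `history_partition_key` is a hypothetical helper:

```python
from datetime import datetime, timezone


def history_partition_key(username, interaction_time):
    """Return the (username, interaction_date) partition key for one event."""
    # Assumption: the day bucket is the UTC calendar date as YYYY-MM-DD.
    day = interaction_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (username, day)


key = history_partition_key(
    "alice", datetime(2013, 7, 24, 1, 54, 52, tzinfo=timezone.utc)
)
print(key)
```

Reading a user's history for a date range then means issuing one query per day in the range, each hitting a single bounded partition.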

Page 15: Just in time (series) - KairosDB

Data modelling 4 QUERIES

FIND A CAR IN A LOT

CREATE TABLE car_location_index (
    make varchar,
    model varchar,
    colour varchar,
    vehicle_id int,
    lot_id int,  -- type missing on the slide; int assumed from the inserted values
    PRIMARY KEY ((make, model, colour), vehicle_id)
);

Page 16: Just in time (series) - KairosDB

Data modelling 4 QUERIES

FIND A CAR IN A LOT

Truth(iness) Table

Page 17: Just in time (series) - KairosDB

Data modelling 4 QUERIES

FIND A CAR IN A LOT

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', 'Mustang', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', 'Mustang', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', '', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', '', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', 'Mustang', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', 'Mustang', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', '', 'Blue', 1234, 8675309);

Page 18: Just in time (series) - KairosDB

Data modelling 4 QUERIES

FIND A CAR IN A LOT

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND colour = 'Blue';

 vehicle_id | lot_id
------------+---------
       1234 | 8675309

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND colour = 'Blue';

 vehicle_id | lot_id
------------+---------
       1234 | 8675309
       8765 | 5551212
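The pattern on these slides is to write each car once per combination of known and blank fields, so that any query can be answered by an exact match on the partition key. A rough in-memory Python 3 sketch of that idea follows; the dictionary stands in for the `car_location_index` table, the function names are hypothetical, and the make and model of the second car are invented because the slides only show its ids:

```python
from collections import defaultdict
from itertools import product


def index_keys(make, model, colour):
    """Every (make, model, colour) combination that keeps at least one field."""
    choices = [(make, ""), (model, ""), (colour, "")]
    return [combo for combo in product(*choices) if any(combo)]


index = defaultdict(list)  # stand-in for the car_location_index table


def add_car(make, model, colour, vehicle_id, lot_id):
    # One write per key combination, mirroring the seven INSERTs.
    for key in index_keys(make, model, colour):
        index[key].append((vehicle_id, lot_id))


def find_cars(make="", model="", colour=""):
    # Unspecified fields stay blank, exactly as in the SELECT statements.
    return index.get((make, model, colour), [])


add_car("Ford", "Mustang", "Blue", 1234, 8675309)
add_car("Honda", "Civic", "Blue", 8765, 5551212)  # hypothetical make and model

print(find_cars(make="Ford", colour="Blue"))
print(find_cars(colour="Blue"))
```

The trade-off is write amplification: three searchable fields cost seven writes per car, and the count roughly doubles with each extra field.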

Page 19: Just in time (series) - KairosDB

@

A Bucketized Counter

void prepareTimeBucketStatements(Session session) {
    // TTL (in seconds, as a string) for each bucket granularity
    Map<TimeUnit, String> ttl = ImmutableMap.of(
        TimeUnit.SECONDS, String.valueOf(TimeUnit.DAYS.toSeconds(2)),
        TimeUnit.MINUTES, String.valueOf(TimeUnit.DAYS.toSeconds(14)),
        TimeUnit.HOURS, String.valueOf(TimeUnit.DAYS.toSeconds(2 * 365)),
        TimeUnit.DAYS, String.valueOf(TimeUnit.DAYS.toSeconds(3 * 365)));

    for (TimeUnit unit : mMetricUnits) {
        // "SECONDS" -> "second", "MINUTES" -> "minute", ...
        String unitName = unit.toString().toLowerCase().substring(0, unit.toString().length() - 1);
        switch (mDeliveryType) {
            case Transactional:
                mTimeInsertStatements.put(unit, session.prepare(
                    "INSERT INTO metrics_by_" + unitName
                    + "_count (row_section_uuid, row_route_verb, row_parameters, row_tschunk, "
                    + "cluster_response_code, cluster_section_uuid, txid, value)"
                    + " VALUES (?, ?, ?, ?, ?, ?, ?, ?) USING TTL " + ttl.get(unit)));
                mTimeReadStatements.put(unit, session.prepare(
                    "SELECT txid, value FROM metrics_by_" + unitName
                    + "_count WHERE row_route_verb = ? AND row_parameters = ? AND row_section_uuid = ? "
                    + "AND row_tschunk = ? AND cluster_response_code = ? AND cluster_section_uuid = ?"));
                break;
            case NonTransactional:
                mTimeUpdateStatements.put(unit, session.prepare(
                    "UPDATE metrics_by_" + unitName + "_counter USING TTL " + ttl.get(unit)
                    + " SET value = value + ? WHERE row_route_verb = ? AND row_parameters = ? AND row_section_uuid = ? AND "
                    + "row_tschunk = ? AND cluster_response_code = ? AND cluster_section_uuid = ?"));
        }
    }
}

void prepareMetricStatement(Session session) {
    mStatement = session.prepare(
        "INSERT INTO metrics (row_route_verb, row_parameters, row_section_uuid, "
        + "row_tschunk, cluster_response_code, cluster_ts, route, verb, parameters, response_time) VALUES "
        + "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)");
}
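The idea behind the per-unit tables can be sketched without a database: every event increments one counter per granularity, keyed by the start of the bucket it falls in. The Python 3 sketch below is an in-memory stand-in, assuming buckets are aligned to Unix-epoch boundaries; the TTL values are copied from the Java map above, while `bucket_start`, `record` and the `route` label are hypothetical names:

```python
from collections import Counter

DAY = 86400
BUCKET_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": DAY}
# Retention per granularity, mirroring the ttl map in the Java code.
TTL_SECONDS = {
    "second": 2 * DAY,
    "minute": 14 * DAY,
    "hour": 2 * 365 * DAY,
    "day": 3 * 365 * DAY,
}


def bucket_start(epoch_seconds, unit):
    """Round a timestamp down to the start of its bucket."""
    size = BUCKET_SECONDS[unit]
    return epoch_seconds - (epoch_seconds % size)


counters = Counter()  # stand-in for the metrics_by_<unit>_counter tables


def record(route, epoch_seconds, amount=1):
    # One increment per granularity, like one UPDATE per unit table.
    for unit in BUCKET_SECONDS:
        counters[(unit, route, bucket_start(epoch_seconds, unit))] += amount


record("GET /users", 1374630892)
record("GET /users", 1374630899)
print(counters[("minute", "GET /users", 1374630840)])
```

Fine-grained buckets expire quickly and coarse ones are kept for years, so recent traffic can be inspected per second while long-term trends stay cheap to store.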

Page 20: Just in time (series) - KairosDB

Enter KairosDB

[
  {
    "name": "archive.file.tracked",
    "datapoints": [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    }
  },
  {
    "name": "archive.file.search",
    "timestamp": 999,
    "value": 321,
    "tags": {"host": "test"}
  }
]

http://localhost:8080/api/v1/datapoints

http://localhost:8080/api/v1/datapoints/query
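The second URL accepts a JSON query body. A minimal Python 3 sketch of building one for the metric above, assuming the field names of the KairosDB REST query format (`start_relative`, `metrics`, `aggregators`, `sampling`) as I understand them from its documentation; `build_query` and `post_json` are hypothetical helpers, and the choice of a one-minute `sum` is only an example:

```python
import json
from urllib import request


def build_query(metric, tags, hours_back=1):
    """Build a query body summing a metric per minute over the last hours."""
    return {
        "start_relative": {"value": hours_back, "unit": "hours"},
        "metrics": [
            {
                "name": metric,
                "tags": tags,  # tag name -> list of accepted values
                "aggregators": [
                    {"name": "sum", "sampling": {"value": 1, "unit": "minutes"}}
                ],
            }
        ],
    }


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status (needs a running server)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status


query = build_query("archive.file.tracked", {"host": ["server1"]})
body = json.dumps(query)
print(body)
```

With a KairosDB instance running locally, `post_json("http://localhost:8080/api/v1/datapoints/query", query)` would send the request; it is not called here so the sketch runs without a server.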

Page 21: Just in time (series) - KairosDB

JAVA to KairosDB

public class KairosSynchronousWriter implements Writer {
    private final Gson mGson;
    private final HttpClient mClient = new DefaultHttpClient();
    private final String mKairosHost;
    private final String mKairosPort;

    public KairosSynchronousWriter(VfConfig config) {
        // Serialize Datapoint objects with the custom JSON serializer
        GsonBuilder gsonBuilder = new GsonBuilder();
        gsonBuilder.registerTypeAdapter(Datapoint.class, new Datapoint.DatapointJsonSerializer());
        mGson = gsonBuilder.create();
        mKairosHost = config.getString("Writer.kairosHost");
        mKairosPort = config.getString("Writer.kairosPort");
    }

    @Override
    public void enqueue(Collection<Datapoint> results) {
        HttpPost post = null;
        try {
            post = new HttpPost("http://" + mKairosHost + ":" + mKairosPort + "/api/v1/datapoints");
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
        // JSON body containing all datapoints in this batch
        StringEntity input = null;
        try {
            input = new StringEntity(mGson.toJson(results));
            input.setContentType("application/json");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        post.setEntity(input);
        try {
            HttpResponse response = mClient.execute(post);
        } catch (HttpException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Page 22: Just in time (series) - KairosDB

JAVA/KairosDB Monitoring

public class MonitoringClient {
    private final String mHostName;
    private final VfConfig mConfig;
    private final AggregatorPool mAggregatorPool;
    private final Writer mWriter;
    private boolean isActive = true;

    public enum AggregationType {...
    }

    public MonitoringClient() {
        this(new VfConfig("MonitoringClient.properties", "VfMonitoringClient/MonitoringClient.properties"));
    }

    public MonitoringClient(VfConfig config) {
        mConfig = config;
        try {
            mHostName = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            throw new RuntimeException("Unable to initialize Monitoring client", e);
        }
        mWriter = createWriter();
        mAggregatorPool = new AggregatorPool(mConfig, mWriter);
    }

    // Queue one datapoint for aggregation; it is dropped when the client is inactive
    public void record(String metricName, double value, AggregationType type, String[] tags) {
        if (isActive) {
            mAggregatorPool.enqueueInput(new Datapoint(new DatapointKey(metricName, makeTagMap(tags)),
                    value, System.currentTimeMillis(), type));
        }
    }

    // Pick the writer implementation named by the Writer.type setting
    private Writer createWriter() {
        String type = mConfig.getString("Writer.type");
        if (type.equals("log")) {
            return new LogWriter();
        } else if (type.equals("kairosSync")) {
            return new KairosSynchronousWriter(mConfig);
        } else {
            throw new RuntimeException(
                "Invalid configuration: Writer.type given invalid value, valid values are: kairosSync, log");
        }
    }

Page 23: Just in time (series) - KairosDB

PYTHON to KairosDB

def pushToKairos(metrics):
    """
    Let's push into KairosDB

    Data will come in as such:

    metrics: {
        'name' : 'filterList:<overall|entityName>:<entity|count>',
        'time_queried' : <timestamp>,
        'value' : <somevalue>,
        'tags' : {
            'filter|user1' : <filter|user1>,
            ...
            'filter|userN' : <filter|userN>,
            'entity1' : <entity1>,
            ...
            'entityN' : <entityN>,
            ...
            'textSentiment' : <positive|negative|neutral>
        }
    }
    """
    import json, requests

    ### YOU NEED TO CHANGE THIS TO YOUR KAIROS INSTALLATION ENDPOINT ###
    PORT = 8080
    BASE_URL = 'http://localhost:' + str(PORT) + '/api/v1/datapoints'

    return requests.post(url=BASE_URL, data=json.dumps(metrics))

Page 24: Just in time (series) - KairosDB

KairosDB Twitter Sentiment

metrics_base = {
    'name' : '_'.join(self.filters) + '/overall/sentiment',
    'timestamp' : time_queried,
    'value' : sentiment_score
}

metrics_entity = {}

if entities:
    for entity in entities:
        for what in ['sentiment', 'count', 'relevance']:
            what_name = what if what != 'sentiment' else 'entity_sentiment'
            value = entity[what] if 'score' not in entity[what] else entity[what]['score']
            print 'What_name: ', what_name, ' value: ', value, ' from: ', entity[what], '\n'
            metrics_entity = {
                'name' : '_'.join(self.filters) + '/' + entity['text'].lower().replace(' ', '_') + '/' + what_name,
                'timestamp' : time_queried,
                'value' : value if value and type(value) is not dict else 0
            }
            for eachtype in entity['type']:
                tags = {'type': eachtype }
                metrics.append( dict(metrics_entity, **{'tags': tags}) )
            if 'type' in entity[what]:
                tags = {'textSentiment': entity[what]['type'] }
                metrics.append( dict(metrics_entity, **{'tags': tags}) )

for filter in self.filters:
    tags = {'filter':filter}
    tags['textSentiment'] = sentiment_type if sentiment_type else 'not_applicable'
    metrics.append( dict(metrics_base, **{'tags': tags}) )

for individual_metric in metrics:
    status = pushToKairos(individual_metric)
    if status.status_code != 204:
        raise Exception('KairosDB Issue...', status.text)
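The `dict(metrics_base, **{'tags': tags})` idiom copies the shared metric fields and attaches a different tag set to each copy. A small Python 3 sketch of the per-filter step, assuming one metric is emitted per filter (the slide's flattened indentation leaves this ambiguous); `metrics_for_filters` and the sample values are hypothetical:

```python
def metrics_for_filters(base, filters, sentiment_type=None):
    """Return one copy of the base metric per filter, each with its own tags."""
    label = sentiment_type if sentiment_type else "not_applicable"
    # dict(base, tags=...) builds a new dict, so base itself is never modified.
    return [dict(base, tags={"filter": f, "textSentiment": label}) for f in filters]


base = {
    "name": "cassandra_kairosdb/overall/sentiment",
    "timestamp": 1374630892000,
    "value": 0.42,
}
metrics = metrics_for_filters(base, ["cassandra", "kairosdb"], "positive")
for metric in metrics:
    print(metric)
```

Tagging by filter this way lets a later query group or restrict the same metric name by `filter` or `textSentiment` without needing separate metric names.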

Page 25: Just in time (series) - KairosDB

All rolled into ONE!!!

https://gist.github.com/vanjos/6169734

Install CCM

Install KairosDB: https://code.google.com/p/kairosdb/wiki/GettingStarted

Page 26: Just in time (series) - KairosDB

EMPTY SLIDE

- overview of why real-time
- show some data modeling
- show a use for logging (our own Storm code)
- show a use for a/b testing (our API counters)
- show a use for debugging (our API counters)

- show KairosDB
- describe some features
- show some visualizations (using Alchemy & twitter)

- conclude with Gists

- announce next meetup with Calliope