Just in Time (Series) - KairosDB


JUST in Time(-Series) & KairosDB

USES FOR KAIROSDB

Who is Victor Anjos?

• Bio-Informatics Engineer
• Business Analyst
• Data Warehouse Specialist
• System Operations / DevOps
• Founder & Lead Technologist
• Presenter, Speaker, Organizer
• Founder / Do-Gooder
• Engineering Manager

@VictorFAnjos | @Viafoura | @PlanetCassandra | #TCUG

Why Real-Time?

Keys in C*

cqlsh:test> CREATE TABLE example (
        ...     field1 int PRIMARY KEY,
        ...     field2 int,
        ...     field3 int);

cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1, 2, 3);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4, 5, 6);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7, 8, 9);

cqlsh:test> SELECT * FROM example;

 field1 | field2 | field3
--------+--------+--------
      1 |      2 |      3
      4 |      5 |      6
      7 |      8 |      9

[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
-------------------
RowKey: 4
=> (column=, value=, timestamp=1374546757815000)
=> (column=field2, value=00000005, timestamp=1374546757815000)
=> (column=field3, value=00000006, timestamp=1374546757815000)
-------------------
RowKey: 7
=> (column=, value=, timestamp=1374546761055000)
=> (column=field2, value=00000008, timestamp=1374546761055000)
=> (column=field3, value=00000009, timestamp=1374546761055000)

cqlsh:test> CREATE TABLE example (
        ...     partitionKey1 text,
        ...     partitionKey2 text,
        ...     clusterKey1 text,
        ...     clusterKey2 text,
        ...     normalField1 text,
        ...     normalField2 text,
        ...     PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
        ... );

cqlsh:test> INSERT INTO example (partitionKey1, partitionKey2,
        ...     clusterKey1, clusterKey2, normalField1, normalField2)
        ... VALUES ('partitionVal1', 'partitionVal2',
        ...     'clusterVal1', 'clusterVal2', 'normalVal1', 'normalVal2');

cqlsh:test> SELECT * FROM example;

 partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
---------------+---------------+-------------+-------------+--------------+--------------
 partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2

[default@test] list example;
-------------------
RowKey: partitionVal1:partitionVal2
=> (column=clusterVal1:clusterVal2:, value=, timestamp=1374630892473000)
=> (column=clusterVal1:clusterVal2:normalfield1, value=6e6f726d616c56616c31, timestamp=1374630892473000)

1. The first part of the composite key [inside the inner brackets] is called the "partition key"; the rest [outside the inner brackets] are "cluster keys" (clustering keys).

2. Cassandra stores columns differently when composite keys are used: the partition key becomes the row key, and the remaining (cluster) key values are concatenated with each column name (":" as separator) to form the stored column names. Column values remain unchanged.

3. Cluster keys are ordered, and you cannot search on arbitrary columns: you have to specify the partition key in full, constrain the cluster keys in order, and you can only run a range query on the final portion of the key.
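To make note 3 concrete, here is a sketch against the composite-key example table above (these exact queries are not in the original deck):

cqlsh:test> SELECT * FROM example
        ...     WHERE partitionKey1 = 'partitionVal1' AND partitionKey2 = 'partitionVal2'
        ...     AND clusterKey1 = 'clusterVal1' AND clusterKey2 >= 'clusterVal2';

This works: the full partition key is given, the cluster keys are constrained in order, and the range is on the last one. By contrast,

cqlsh:test> SELECT * FROM example
        ...     WHERE partitionKey1 = 'partitionVal1' AND partitionKey2 = 'partitionVal2'
        ...     AND clusterKey2 = 'clusterVal2';

is rejected, because clusterKey1 has been skipped.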

A bit of data modelling

USER ACTIVITY DATA MODEL

CREATE TABLE user_activity (
    username varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);

CREATE TABLE user_activity_history (
    username varchar,
    interaction_date varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY ((username, interaction_date), interaction_time)
);
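As a sketch of the queries these tables are built to answer (not part of the deck; the username and date values are made up):

cqlsh:test> SELECT interaction_time, activity_code, detail
        ...     FROM user_activity WHERE username = 'victor' LIMIT 10;

returns the ten most recent interactions for that user, since the table clusters on interaction_time DESC. The history table partitions by (username, interaction_date), so the date bucket goes into the query as well:

cqlsh:test> SELECT interaction_time, activity_code, detail
        ...     FROM user_activity_history
        ...     WHERE username = 'victor' AND interaction_date = '2013-07-24';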

Data modelling 4 QUERIES

FIND A CAR IN A LOT

CREATE TABLE car_location_index (
    make varchar,
    model varchar,
    colour varchar,
    vehicle_id int,
    lot_id int,
    PRIMARY KEY ((make, model, colour), vehicle_id)
);

Truth(iness) Table
(table not captured in the transcript)

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', 'Mustang', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', 'Mustang', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', '', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('Ford', '', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', 'Mustang', 'Blue', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', 'Mustang', '', 1234, 8675309);

INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
VALUES ('', '', 'Blue', 1234, 8675309);

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = 'Ford' AND model = '' AND colour = 'Blue';

 vehicle_id | lot_id
------------+---------
       1234 | 8675309

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = '' AND model = '' AND colour = 'Blue';

 vehicle_id | lot_id
------------+---------
       1234 | 8675309
       8765 | 5551212
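A sketch of why all those permutation rows are written in the first place (this query is not in the deck): Cassandra requires the whole composite partition key, so asking for blue cars without also supplying make and model is rejected:

SELECT vehicle_id, lot_id
FROM car_location_index
WHERE colour = 'Blue';

Writing a row for every known/unknown combination, with '' standing in for "unknown", is what lets every search form hit a single, fully specified partition.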

A Bucketized Counter

void prepareTimeBucketStatements(Session session) {
    // How long each bucket is kept: 2 days of seconds, 14 days of minutes,
    // 2 years of hours, 3 years of days.
    Map<TimeUnit, String> ttl = ImmutableMap.of(
        TimeUnit.SECONDS, String.valueOf(TimeUnit.DAYS.toSeconds(2)),
        TimeUnit.MINUTES, String.valueOf(TimeUnit.DAYS.toSeconds(14)),
        TimeUnit.HOURS,   String.valueOf(TimeUnit.DAYS.toSeconds(2 * 365)),
        TimeUnit.DAYS,    String.valueOf(TimeUnit.DAYS.toSeconds(3 * 365)));

    for (TimeUnit unit : mMetricUnits) {
        // "SECONDS" -> "second", "HOURS" -> "hour", etc.
        String unitName = unit.toString().toLowerCase().substring(0, unit.toString().length() - 1);
        switch (mDeliveryType) {
            case Transactional:
                mTimeInsertStatements.put(unit, session.prepare(
                    "INSERT INTO metrics_by_" + unitName + "_count (row_section_uuid, row_route_verb, row_parameters, row_tschunk, "
                    + "cluster_response_code, cluster_section_uuid, txid, value)"
                    + " VALUES (?, ?, ?, ?, ?, ?, ?, ?) USING TTL " + ttl.get(unit)));
                mTimeReadStatements.put(unit, session.prepare(
                    "SELECT txid, value FROM metrics_by_" + unitName + "_count WHERE row_route_verb = ? AND row_parameters = ? AND row_section_uuid = ? "
                    + "AND row_tschunk = ? AND cluster_response_code = ? AND cluster_section_uuid = ?"));
                break;
            case NonTransactional:
                mTimeUpdateStatements.put(unit, session.prepare(
                    "UPDATE metrics_by_" + unitName + "_counter USING TTL " + ttl.get(unit)
                    + " SET value = value + ? WHERE row_route_verb = ? AND row_parameters = ? AND row_section_uuid = ? AND "
                    + "row_tschunk = ? AND cluster_response_code = ? AND cluster_section_uuid = ?"));
        }
    }
}

void prepareMetricStatement(Session session) {
    mStatement = session.prepare(
        "INSERT INTO metrics (row_route_verb, row_parameters, row_section_uuid, "
        + "row_tschunk, cluster_response_code, cluster_ts, route, verb, parameters, response_time) VALUES "
        + "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)");
}
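The NonTransactional path above updates a counter table per time bucket. The deck does not show that schema; as a rough sketch, such a table (here for the hour bucket, column names taken from the prepared statement, types assumed) could look like:

CREATE TABLE metrics_by_hour_counter (
    row_route_verb text,
    row_parameters text,
    row_section_uuid uuid,
    row_tschunk bigint,
    cluster_response_code int,
    cluster_section_uuid uuid,
    value counter,
    PRIMARY KEY ((row_route_verb, row_parameters, row_section_uuid, row_tschunk),
                 cluster_response_code, cluster_section_uuid)
);

The row_* columns form the composite partition key (one partition per route/parameters/section/time chunk) and the cluster_* columns are the clustering keys, mirroring the naming convention used in the statements.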

Enter KairosDB

[
  {
    "name": "archive.file.tracked",
    "datapoints": [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    }
  },
  {
    "name": "archive.file.search",
    "timestamp": 999,
    "value": 321,
    "tags": { "host": "test" }
  }
]

Push datapoints: http://localhost:8080/api/v1/datapoints

Query datapoints: http://localhost:8080/api/v1/datapoints/query
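The JSON above is the body you POST to the first endpoint. The query endpoint also takes a JSON body; a minimal sketch (not from the deck) that pulls back the metric pushed above for the last hour, summed per minute, might look like this:

{
  "start_relative": { "value": 1, "unit": "hours" },
  "metrics": [
    {
      "name": "archive.file.tracked",
      "tags": { "host": ["server1"] },
      "aggregators": [
        { "name": "sum", "sampling": { "value": 1, "unit": "minutes" } }
      ]
    }
  ]
}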

JAVA to KairosDB

public class KairosSynchronousWriter implements Writer {
    private final Gson mGson;
    private final HttpClient mClient = new DefaultHttpClient();
    private final String mKairosHost;
    private final String mKairosPort;

    public KairosSynchronousWriter(VfConfig config) {
        GsonBuilder gsonBuilder = new GsonBuilder();
        gsonBuilder.registerTypeAdapter(Datapoint.class, new Datapoint.DatapointJsonSerializer());
        mGson = gsonBuilder.create();
        mKairosHost = config.getString("Writer.kairosHost");
        mKairosPort = config.getString("Writer.kairosPort");
    }

    @Override
    public void enqueue(Collection<Datapoint> results) {
        HttpPost post = new HttpPost("http://" + mKairosHost + ":" + mKairosPort + "/api/v1/datapoints");
        StringEntity input = null;
        try {
            input = new StringEntity(mGson.toJson(results));
            input.setContentType("application/json");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        post.setEntity(input);
        try {
            // KairosDB answers 204 No Content on a successful push
            HttpResponse response = mClient.execute(post);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

JAVA/KairosDB Monitoring

public class MonitoringClient {
    private final String mHostName;
    private final VfConfig mConfig;
    private final AggregatorPool mAggregatorPool;
    private final Writer mWriter;
    private boolean isActive = true;

    public enum AggregationType { ... }

    public MonitoringClient() {
        this(new VfConfig("MonitoringClient.properties", "VfMonitoringClient/MonitoringClient.properties"));
    }

    public MonitoringClient(VfConfig config) {
        mConfig = config;
        try {
            mHostName = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            throw new RuntimeException("Unable to initialize Monitoring client", e);
        }
        mWriter = createWriter();
        mAggregatorPool = new AggregatorPool(mConfig, mWriter);
    }

    public void record(String metricName, double value, AggregationType type, String[] tags) {
        if (isActive) {
            mAggregatorPool.enqueueInput(new Datapoint(
                new DatapointKey(metricName, makeTagMap(tags)), value, System.currentTimeMillis(), type));
        }
    }

    private Writer createWriter() {
        String type = mConfig.getString("Writer.type");
        if (type.equals("log")) {
            return new LogWriter();
        } else if (type.equals("kairosSync")) {
            return new KairosSynchronousWriter(mConfig);
        } else {
            throw new RuntimeException(
                "Invalid configuration: Writer.type given invalid value, valid values are: kairosSync, log");
        }
    }
}

PYTHON to KairosDB

def pushToKairos(metrics):
    """ Let's push into KairosDB

    Data will come in as such:

    metrics: {
        'name' : 'filterList:<overall|entityName>:<entity|count>',
        'time_queried' : <timestamp>,
        'value' : <somevalue>,
        'tags' : {
            'filter|user1' : <filter|user1>,
            ...
            'filter|userN' : <filter|userN>,
            'entity1' : <entity1>,
            ...
            'entityN' : <entityN>,
            ...
            'textSentiment' : <positive|negative|neutral>
        }
    }
    """
    import json, requests

    ### YOU NEED TO CHANGE THIS TO YOUR KAIROS INSTALLATION ENDPOINT ###
    PORT = 8080
    BASE_URL = 'http://localhost:' + str(PORT) + '/api/v1/datapoints'

    return requests.post(url=BASE_URL, data=json.dumps(metrics))

KairosDB Twitter Sentiment

metrics_base = {
    'name' : '_'.join(self.filters) + '/overall/sentiment',
    'timestamp' : time_queried,
    'value' : sentiment_score
}

metrics_entity = {}

if entities:
    for entity in entities:
        for what in ['sentiment', 'count', 'relevance']:
            what_name = what if what != 'sentiment' else 'entity_sentiment'
            value = entity[what] if 'score' not in entity[what] else entity[what]['score']
            print 'What_name: ', what_name, ' value: ', value, ' from: ', entity[what], '\n'
            metrics_entity = {
                'name' : '_'.join(self.filters) + '/' + entity['text'].lower().replace(' ', '_') + '/' + what_name,
                'timestamp' : time_queried,
                'value' : value if value and type(value) is not dict else 0
            }
            for eachtype in entity['type']:
                tags = {'type': eachtype}
                metrics.append( dict(metrics_entity, **{'tags': tags}) )
            if 'type' in entity[what]:
                tags = {'textSentiment': entity[what]['type']}
                metrics.append( dict(metrics_entity, **{'tags': tags}) )

for filter in self.filters:
    tags = {'filter': filter}
    tags['textSentiment'] = sentiment_type if sentiment_type else 'not_applicable'
    metrics.append( dict(metrics_base, **{'tags': tags}) )

for individual_metric in metrics:
    status = pushToKairos(individual_metric)
    if status.status_code != 204:
        raise Exception('KairosDB Issue...', status.text)

All rolled into ONE!!!

https://gist.github.com/vanjos/6169734

Install CCM

Install KairosDB: https://code.google.com/p/kairosdb/wiki/GettingStarted

Speaker notes (talk outline):

- overview of why real-time
- show some data modeling
- show a use for logging (our own Storm code)
- show a use for a/b testing (our API counters)
- show a use for debugging (our API counters)
- show KairosDB
- describe some features
- show some visualizations (using Alchemy & Twitter)
- conclude with Gists
- announce next meetup with Calliope
