mcubed london - data science at the edge

Data science at the Edge With NiFi, TensorFlow and a proper cluster for good measure Simon Elliston Ball @sireb

Uploaded by: simon-elliston-ball

Posted on 29-Jan-2018


Category:

Data & Analytics



TRANSCRIPT

Page 1

Data science at the Edge

With NiFi, TensorFlow and a proper cluster for good measure

Simon Elliston Ball

@sireb

Page 2

Simon Elliston Ball

• Product Manager

• Data Scientist

• Elephant herder

• @sireb

Page 3

Data gravity

588,000,000 km

• Size

• Distance
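The two forces on this slide, size and distance, can be put into rough numbers. A minimal Python sketch, assuming the 588,000,000 km figure is a one-way distance and ignoring protocol overhead; the 1 TB over a 100 Mbit/s link case is a hypothetical example, not a figure from the talk:

```python
# Rough "data gravity" arithmetic: how long does data take to move?
SPEED_OF_LIGHT_KM_S = 299_792.458  # km/s in vacuum


def light_delay_seconds(distance_km: float) -> float:
    """One-way signal delay over a distance, at the speed of light."""
    return distance_km / SPEED_OF_LIGHT_KM_S


def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Time to push size_bytes through a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    delay = light_delay_seconds(588_000_000)
    print(f"one-way delay: {delay:.0f} s (~{delay / 60:.1f} min)")
    # Hypothetical example: 1 TB over a 100 Mbit/s uplink
    hours = transfer_seconds(1e12, 100e6) / 3600
    print(f"1 TB at 100 Mbit/s: {hours:.1f} h")
```

At that distance a single bit takes over half an hour to arrive one way, before bandwidth even enters the picture; the transfer-time function shows the size side of the same problem.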

Page 4

Other types of data gravity

• Compliance

• Legislation

• Politics

• Paranoia

Photo: https://flic.kr/p/JvW7qh

Page 5

Sampling vs Big Data: a quick history

• Before we had cloud, clusters and GPUs…

  • MPP

  • Supercomputers

  • Grids

• Cut down data size to fit in memory
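Cutting data down to fit in memory usually means sampling. One standard technique for a stream of unknown length is reservoir sampling (Algorithm R), which keeps a uniform random sample in O(k) memory. The sketch below is our illustration of that technique, not a method named in the talk:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length.

    Algorithm R: memory use is O(k), however large the stream is.
    """
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Item i replaces a random slot with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(100_000), k=10, rng=random.Random(42))
print(sample)
```

Passing an explicit random.Random makes runs reproducible, which helps when comparing a model trained on the sample with one trained on the full data.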

Page 6

A quick intro to NiFi

• Guaranteed Delivery

• Prioritized queuing and buffering

• Data provenance

• Bi-directional communication

• Security – Authentication and multi-role authorization

• Visual command and control

• Templating

• Robust API
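Prioritized queuing with buffering can be illustrated with a toy bounded priority queue. This is a simplified model, not NiFi's code: in NiFi, prioritizers and back-pressure thresholds are configured per connection, and the class and parameter names here are hypothetical:

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class PrioritizedConnection:
    """Toy model of a prioritized, bounded queue between two flow steps.

    Illustrative only, not NiFi's implementation. Lower priority number is
    served first; ties are served in arrival order (FIFO).
    """

    def __init__(self, back_pressure_threshold: int) -> None:
        self.threshold = back_pressure_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = itertools.count()  # arrival order, breaks ties

    def offer(self, item: Any, priority: int) -> bool:
        """Enqueue unless full; False tells the producer to back off."""
        if len(self._heap) >= self.threshold:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self) -> Optional[Any]:
        """Dequeue the highest-priority item, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = PrioritizedConnection(back_pressure_threshold=3)
q.offer("routine telemetry", priority=5)
q.offer("alert", priority=1)
q.offer("more telemetry", priority=5)
print(q.offer("overflow", priority=1))  # queue is full
print(q.poll(), "|", q.poll(), "|", q.poll())
```

Returning False instead of blocking mirrors the idea of back pressure: the upstream step is told to stop producing until the queue drains. NiFi's own thresholds are softer than this hard limit.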

Page 7

and lots of adapters

Page 8

Demo: sending stuff around

• Pushing camera frames to the cloud
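As a minimal sketch of the sending side, a device could POST each JPEG frame over HTTP to a NiFi ListenHTTP processor. The host, port 8081 and the X-Camera-Id header are hypothetical; contentListener is assumed to be the processor's default base path, and the talk does not say which transport the demo actually used:

```python
import urllib.request

# Hypothetical endpoint: assumes a NiFi ListenHTTP processor on port 8081
# with its default base path "contentListener". Adjust to your own flow.
NIFI_URL = "http://nifi.example.com:8081/contentListener"


def build_frame_request(
    jpeg_bytes: bytes, camera_id: str, url: str = NIFI_URL
) -> urllib.request.Request:
    """Wrap one JPEG frame in an HTTP POST; the camera id travels as a header."""
    return urllib.request.Request(
        url,
        data=jpeg_bytes,
        method="POST",
        headers={"Content-Type": "image/jpeg", "X-Camera-Id": camera_id},
    )


def send_frame(jpeg_bytes: bytes, camera_id: str, url: str = NIFI_URL) -> int:
    """POST the frame and return the HTTP status code."""
    req = build_frame_request(jpeg_bytes, camera_id, url)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

In practice you would capture frames with a camera library and call send_frame once per frame; nothing touches the network until send_frame is called.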

Page 9

Face detection

Key point locations

Lightweight models

Low contextual data

Page 10

Face detection

• Simple Haar cascade classifier in OpenCV: https://github.com/simonellistonball/nifi-OpenCV
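Haar cascades stay small and fast because each Haar-like feature is a difference of rectangle sums, and an integral image turns any rectangle sum into four lookups (the Viola-Jones trick that OpenCV's cascade classifiers build on). A pure-Python sketch of just that building block, not the OpenCV implementation; the function names are our own:

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left corner is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_vertical_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Haar-like edge feature: top half minus bottom half.

    A strongly negative value means the top half is darker, e.g. an eye
    region above brighter cheeks.
    """
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


img = [[10, 10], [10, 10], [50, 50], [50, 50]]  # dark top, bright bottom
ii = integral_image(img)
print(two_rect_vertical_feature(ii, 0, 0, 2, 4))
```

The integral image is built once per frame; after that every feature, at any size or position, costs the same handful of additions, which is what lets a 677KB cascade run on modest edge hardware.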

Page 11

Dlib Face Detection

• 68 Facial Point Model

• c. 100MB
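Once the 68 key points are located, cheap geometry on them can answer questions at the edge without shipping the frame anywhere. One example (our choice, not from the talk) is the eye aspect ratio (EAR) used for blink detection. The sketch assumes the 0-based iBUG 300-W index layout that dlib's 68-point shape predictor follows, with one eye at points 36-41 and the other at 42-47:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Assumed 0-based index ranges in the 68-point iBUG 300-W layout.
RIGHT_EYE = range(36, 42)
LEFT_EYE = range(42, 48)


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|) for one eye's six points.

    p1 and p4 are the eye corners; the other four sit on the lids, so the
    ratio drops towards 0 as the eye closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))


def mean_ear(landmarks: Sequence[Point]) -> float:
    """Average EAR over both eyes, given all 68 (x, y) landmarks."""
    right = [landmarks[i] for i in RIGHT_EYE]
    left = [landmarks[i] for i in LEFT_EYE]
    return (eye_aspect_ratio(right) + eye_aspect_ratio(left)) / 2.0


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(round(eye_aspect_ratio(open_eye), 3))
```

The (x, y) pairs would normally come from dlib's shape predictor; here they are plain tuples so the example runs without dlib installed.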

Page 12

TensorFlow in NiFi

• Our Haar cascade face detection didn’t do a great job

• Neural networks

• Relatively large models

  • Haar cascade: 677KB of XML

  • FaceNet model trained on LFW: 168 MB (and that’s zipped protobufs)

• TensorFlow: https://github.com/tspannhw/nifi-tensorflow-processor.git

Page 13

Face recognition

• Huge databases of face hashes and feature measures

• Extra information and context around the person

• Computationally expensive and heavy network use

• Apple Face ID demo: too many people had tried the device beforehand, which blew the database. One or two faces is easy; millions is another matter.
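Recognition, unlike detection, means comparing a face against everything enrolled. A FaceNet-style model maps each face to an embedding vector, and matching compares Euclidean distances between embeddings. The brute-force sketch below shows why the cost grows with the database; the threshold value and the tiny 3-dimensional vectors are hypothetical (the FaceNet paper uses 128-dimensional embeddings):

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Embedding = Sequence[float]


def identify(
    probe: Embedding, gallery: Dict[str, Embedding], threshold: float
) -> Tuple[Optional[str], float]:
    """Return (name, distance) of the closest enrolled face, or
    (None, distance) if even the best match is farther than threshold.

    Brute force: one distance computation per enrolled face, so the cost
    grows linearly with the size of the gallery.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, emb in gallery.items():
        d = math.dist(probe, emb)  # Euclidean (L2) distance
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > threshold:
        return None, best_dist  # unknown face
    return best_name, best_dist


gallery = {"alice": [0.0, 0.0, 1.0], "bob": [1.0, 0.0, 0.0]}  # toy embeddings
print(identify([0.1, 0.0, 0.9], gallery, threshold=0.5))
```

With one or two enrolled faces the scan is trivial; with millions it becomes the bottleneck, which is why large deployments typically switch to approximate nearest-neighbour indexes and more compute.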

Page 14

Rocket ship to the cloud

Photo: https://www.nasa.gov/sites/default/files/thumbnails/image/s83-35620-3k.jpg

Page 15

Cloud: ML all packaged up… for a price

Page 16

TensorFlow on Spark

• Why?

  • Doesn’t TensorFlow already have a distributed compute model?

  • Existing clusters, multi-purpose clusters:

    • TensorFrames, TensorFlowOnSpark, CaffeOnSpark, Spark ML, SQL

• When?

  • Training, batch scoring

Page 17

Broadening the example

• Where is your context?

• Why do you need context?

  • Detection

  • Explanation

Page 18

Body worn video

• Record everything

• Record when you remember to press the button

• Record when it matters

What about:

• Live assist

• Evidence and accountability
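"Record when it matters" can be approximated with a pre-event ring buffer: keep the last few frames in memory, and when something triggers, keep that lead-up plus everything after it. A minimal sketch; the class name is hypothetical and the trigger (a button, a sensor, a model) is left abstract:

```python
from collections import deque
from typing import Deque, List


class PreEventRecorder:
    """Hypothetical recorder: buffer only the last `pre_frames` frames until
    triggered, then keep everything until stopped. Frames are opaque objects."""

    def __init__(self, pre_frames: int) -> None:
        self._buffer: Deque[object] = deque(maxlen=pre_frames)
        self._recording: List[object] = []
        self.active = False

    def add_frame(self, frame: object) -> None:
        if self.active:
            self._recording.append(frame)
        else:
            self._buffer.append(frame)  # oldest frame falls off automatically

    def trigger(self) -> None:
        """Something matters: start recording, keeping the buffered lead-up."""
        if not self.active:
            self._recording = list(self._buffer)
            self._buffer.clear()
            self.active = True

    def stop(self) -> List[object]:
        """Finish the clip, return it, and go back to buffering."""
        clip, self._recording = self._recording, []
        self.active = False
        return clip


rec = PreEventRecorder(pre_frames=3)
for i in range(10):
    rec.add_frame(i)
rec.trigger()
rec.add_frame(10)
rec.add_frame(11)
print(rec.stop())
```

With a buffer of 3, frames 0 to 9 arrive before the trigger, so the clip starts at frame 7: the moments just before "the button" are not lost, and nothing else is kept.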

Page 19

Cybersecurity: progressive context

• Record everything: PCAP

• Send up the (maybe) interesting bits

• Fetch detail on demand

Diagram labels: PCAP at the edge; 1st-pass model; “small” data flow (NetFlow); Security Data Analytics Platform (adds context, more compute-intensive modelling, etc.); callouts “Hmmm… That’s interesting” and “Let me tell you more…”
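The progressive-context loop on this slide can be sketched as an edge sensor that records everything locally, sends up only what a first-pass model flags, and serves the full capture on demand. The record fields, the scoring function and the threshold are hypothetical; real NetFlow records carry many more fields:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class FlowRecord:
    """Hypothetical, heavily simplified flow summary."""
    flow_id: str
    src: str
    dst: str
    bytes_sent: int


@dataclass
class EdgeSensor:
    """Keeps full packet captures locally, ships only flagged flow summaries,
    and hands over the detail when the central platform asks for it."""

    first_pass: Callable[[FlowRecord], float]  # hypothetical cheap scoring model
    threshold: float
    pcap_store: Dict[str, bytes] = field(default_factory=dict)

    def observe(self, record: FlowRecord, packets: bytes) -> bool:
        """Record everything locally; True means send the summary upstream."""
        self.pcap_store[record.flow_id] = packets
        return self.first_pass(record) >= self.threshold

    def fetch_detail(self, flow_id: str) -> bytes:
        """Called when the platform says 'that's interesting'."""
        return self.pcap_store[flow_id]


def ship(sensor: EdgeSensor, traffic: Iterable[Tuple[FlowRecord, bytes]]) -> List[FlowRecord]:
    """The 'small' data flow: only the summaries the first pass flags."""
    return [rec for rec, pkts in traffic if sensor.observe(rec, pkts)]


sensor = EdgeSensor(first_pass=lambda r: r.bytes_sent / 1e6, threshold=1.0)
sent = ship(sensor, [
    (FlowRecord("f1", "10.0.0.1", "10.0.0.2", 500), b"..."),
    (FlowRecord("f2", "10.0.0.1", "203.0.113.9", 5_000_000), b"..."),
])
print([r.flow_id for r in sent])
```

The local store here grows without bound; a real sensor would age out old captures, which limits how far back "fetch detail on demand" can reach.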

Page 20

ANPR: or why you can’t hide from parking fines

Page 21

Summary: progressive enhancement of context

• Is it worth processing? 677KB local model

• Rough cut and hashing: O(100MB) models

• Expensive deep analysis: cloud-scale models and data

Example attributes: name = Simon Elliston Ball; cognitive.face.emotion = surprise; cognitive.face.exposure = overExposure; cognitive.face.noise = high
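The three tiers of the summary behave like a cascade with early exit: cheap stages decide whether the expensive ones run at all. A sketch in which each stage is a hypothetical stand-in (the lambdas) for a model of increasing size; face.count is an invented attribute, and cognitive.face.emotion is borrowed from the slide:

```python
from typing import Callable, Dict, List, Optional, Tuple

Attributes = Dict[str, str]
# A stage inspects the frame plus what earlier stages found and returns new
# attributes, or None to stop the cascade early.
Stage = Callable[[bytes, Attributes], Optional[Attributes]]


def run_cascade(
    frame: bytes, stages: List[Tuple[str, Stage]]
) -> Tuple[Attributes, List[str]]:
    """Run stages cheapest-first; return collected attributes and stages run."""
    attrs: Attributes = {}
    ran: List[str] = []
    for name, stage in stages:
        ran.append(name)
        found = stage(frame, attrs)
        if found is None:
            break  # not worth spending more compute on this frame
        attrs.update(found)
    return attrs, ran


# Hypothetical stand-ins for models of increasing size and cost.
STAGES: List[Tuple[str, Stage]] = [
    # Tiny local check: skip empty frames (stands in for a ~677KB model).
    ("worth processing?", lambda f, a: {} if f else None),
    # Rough cut (stands in for an O(100MB) model producing face hashes).
    ("rough cut", lambda f, a: {"face.count": "1"}),
    # Deep analysis (stands in for a cloud-scale service).
    ("deep analysis", lambda f, a: {"cognitive.face.emotion": "surprise"}),
]

print(run_cascade(b"", STAGES))
print(run_cascade(b"\xff\xd8", STAGES))
```

Ordering stages by cost means most frames are rejected by the smallest model, and only the survivors pay for network round trips and cloud-scale analysis.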

Page 22

Thank you!

@sireb