mcubed london - data science at the edge

Post on 29-Jan-2018

Data science at the Edge

With NiFi, TensorFlow and a proper cluster for good measure

Simon Elliston Ball

@sireb

Simon Elliston Ball

• Product Manager

• Data Scientist

• Elephant herder

• @sireb

Data gravity

588,000,000 km

• Size

• Distance
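As a rough worked example of how distance alone contributes to data gravity, the sketch below computes the minimum one-way signal delay over the 588,000,000 km quoted on the slide. It assumes that figure is the length of a signal path (the slide does not say what it measures) and that the signal travels at the speed of light in vacuum. The function name is illustrative only.

```python
# Speed of light in vacuum, metres per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def one_way_delay_seconds(distance_km: float) -> float:
    """Minimum one-way signal delay over distance_km, ignoring all processing."""
    return distance_km * 1000 / SPEED_OF_LIGHT_M_PER_S


# Assumption: the slide's figure is treated as a signal path length in km.
delay = one_way_delay_seconds(588_000_000)
print(f"{delay:.0f} s one way, about {delay / 60:.1f} minutes")
```

Under those assumptions the delay is a little over half an hour each way, before any bandwidth limit is considered.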

Other types of data gravity

• Compliance

• Legislation

• Political

• Paranoia

Photo: https://flic.kr/p/JvW7qh

Sampling vs Big Data: a quick history

• Before we had cloud, clusters and GPUs…

• MPP

• Super Computers

• Grids

• Cut down data size to fit in memory
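One common way to cut a data set down to fit in memory is reservoir sampling, which keeps a uniform random sample of fixed size from a stream of unknown length. The slides do not name a sampling method, so this is an illustrative choice rather than the speaker's approach; the function name and parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                sample[j] = item     # item i is kept with probability k / (i + 1)
    return sample


# A million-item stream reduced to 1000 items held in memory.
sample = reservoir_sample(range(1_000_000), k=1000, seed=42)
print(len(sample))
```

Only the k sampled items are ever held at once, so memory use does not grow with the size of the stream.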

A quick intro to NiFi

• Guaranteed Delivery

• Prioritized queuing and buffering

• Data provenance

• Bi-directional communication

• Security – Authentication and multi-role authorization

• Visual command and control

• Templating

• Robust API

and lots of adapters

Demo: sending stuff around

• Pushing camera frames to the cloud

Face detection

Key point locations

Lightweight models

Low contextual data

Face detection

• Simple Haar cascade in OpenCV: https://github.com/simonellistonball/nifi-OpenCV

Dlib Face Detection

• 68 Facial Point Model

• c. 100MB

TensorFlow in NiFi

• Our Haar cascade was… Face detection didn’t do a great job

• Neural Networks

• Relatively large models

• Haar cascade: 677KB of XML

• FaceNet trained model on LFW: 168 MB (and that’s zipped protobufs)

• TensorFlow: https://github.com/tspannhw/nifi-tensorflow-processor.git
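To put the two model sizes side by side, the sketch below works out their ratio and how long each would take to ship to an edge device. The sizes come from the slide; the 1 Mbit/s uplink is a made-up figure for illustration, decimal units are assumed, and the names are hypothetical.

```python
# Model sizes quoted on the slide (decimal units assumed: 1 MB = 1000 KB).
HAAR_CASCADE_BYTES = 677 * 1000
FACENET_BYTES = 168 * 1000 * 1000

# Hypothetical edge uplink, chosen only for illustration.
ASSUMED_UPLINK_BITS_PER_S = 1_000_000  # 1 Mbit/s


def transfer_seconds(size_bytes: int, bits_per_s: int = ASSUMED_UPLINK_BITS_PER_S) -> float:
    """Time to ship a model over the link, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_s


size_ratio = FACENET_BYTES / HAAR_CASCADE_BYTES
print(f"FaceNet model is about {size_ratio:.0f}x the size of the Haar cascade")
print(f"Haar cascade: {transfer_seconds(HAAR_CASCADE_BYTES):.1f} s")
print(f"FaceNet: {transfer_seconds(FACENET_BYTES) / 60:.1f} min")
```

On that assumed link the cascade arrives in seconds while the neural network model takes tens of minutes, which is one reason model size matters at the edge.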

Face recognition

• Huge databases of face hashes and feature measures

• Extra information and context around the person

• Computationally expensive and heavy network use

• Apple Face ID demo… too many people had tried the device beforehand, blew the database. One or two faces is easy, millions is another matter

Rocket ship to the cloud

https://www.nasa.gov/sites/default/files/thumbnails/image/s83-35620-3k.jpg

Cloud: ML all packaged up… for a price

TensorFlow on Spark

• Why?

• Doesn’t TensorFlow already have a distributed compute model?

Existing clusters, multi-purpose clusters:

• TensorFrames, TensorFlowOnSpark, CaffeOnSpark, Spark ML, SQL

• When?

• Training, batch scoring

Broadening the example

• Where is your context?

• Why do you need context?

• Detection

• Explanation

Body worn video

• Record everything

• Record when you remember to press the button

• Record when it matters
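One way "record when it matters" could work is a pre-roll buffer: the device keeps only the last few frames in memory and starts persisting them once something triggers it. The sketch below assumes the trigger is supplied from outside (for example by a lightweight on-device model); the class and attribute names are hypothetical, and a plain list stands in for durable storage.

```python
from collections import deque
from typing import Deque, List


class PreRollRecorder:
    """Keep the last `pre_roll` frames in memory; persist only after a trigger."""

    def __init__(self, pre_roll: int) -> None:
        self._buffer: Deque[bytes] = deque(maxlen=pre_roll)
        self.saved: List[bytes] = []     # stands in for durable storage
        self.recording = False

    def on_frame(self, frame: bytes, triggered: bool) -> None:
        if triggered and not self.recording:
            # Flush the pre-roll so the moments before the trigger are kept.
            self.saved.extend(self._buffer)
            self._buffer.clear()
            self.recording = True
        if self.recording:
            self.saved.append(frame)
        else:
            self._buffer.append(frame)   # oldest frame is dropped automatically


recorder = PreRollRecorder(pre_roll=3)
for n in range(10):
    # Assumption: the trigger fires once, on frame 6.
    recorder.on_frame(f"frame-{n}".encode(), triggered=(n == 6))
print([f.decode() for f in recorder.saved])
```

Here frames 3 to 9 are saved: the three frames before the trigger plus everything after it, while frames 0 to 2 are discarded.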

What about?

• Live assist

• Evidence and accountability

Netflow

Cybersecurity: progressive context

• Record everything: PCAP

• Send up the (maybe) interesting bits

• Fetch detail on demand
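The three steps above can be sketched as an edge component that stores every packet locally, forwards a compact summary only for packets that pass a cheap first check, and serves the full packet when the central platform asks for it. Everything here is illustrative: the class, the port-based check, and the in-memory dictionary standing in for PCAP storage are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Packet:
    packet_id: int
    dst_port: int
    payload: bytes


class EdgeCapture:
    """Stores every packet locally and forwards only compact summaries."""

    def __init__(self, suspicious_ports: Set[int]) -> None:
        self._store: Dict[int, Packet] = {}   # stands in for local PCAP storage
        self._suspicious_ports = suspicious_ports
        self.sent_upstream: List[dict] = []   # stands in for the uplink

    def capture(self, packet: Packet) -> None:
        self._store[packet.packet_id] = packet            # record everything
        if packet.dst_port in self._suspicious_ports:     # cheap first-pass check
            self.sent_upstream.append(
                {"id": packet.packet_id,
                 "dst_port": packet.dst_port,
                 "size": len(packet.payload)}
            )

    def fetch_detail(self, packet_id: int) -> Optional[Packet]:
        """Return the full packet when a summary looks worth a closer look."""
        return self._store.get(packet_id)


edge = EdgeCapture(suspicious_ports={23, 4444})
for i, port in enumerate([443, 23, 80, 4444]):
    edge.capture(Packet(i, port, b"x" * (i + 1)))
print(edge.sent_upstream)       # summaries only
print(edge.fetch_detail(1))     # full detail, fetched on demand
```

Only two small summaries leave the edge in this run; the payloads stay local until they are requested by id.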

PCAP at Edge

1st Pass Model

Security Data Analytics Platform: adds context, more compute-intensive modelling, etc.

Hmmm… That’s interesting

Let me tell you more…

“small” data flow

ANPR: or why you can’t hide from parking fines

Summary: progressive enhancement of context

• Is it worth processing? 677KB of local model

• Rough-cut and hashing: O(100MB) models

• Expensive deep analysis: cloud scale models and data

name: Simon Elliston Ball

cognitive.face.emotion: surprise

cognitive.face.exposure: overExposure

cognitive.face.noise: high

Thank you!

@sireb
