Using OpenSource in the Internet of Things

Upload: aaron-mefford

Post on 07-Aug-2015


TRANSCRIPT

Page 1: Using open source in the internet of things

Using OpenSource in the Internet of Things

Page 2: Using open source in the internet of things

What Things?

• Exercise equipment
• 20,000 communicating units
• Growing to 60k units in 3 years
• Each unit:
  – 1 snapshot per sec, sent every 30 sec
  – 86 bytes packed binary per snapshot

Page 3: Using open source in the internet of things

Storing?

• 112,908,571 snapshots per day
• 52,876,800,000 per month
• 634,521,600,000 per year
• 1,295 snapshots/sec on average
• 20,400 snapshots/sec peak

• That’s only with 20k machines
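
The monthly and yearly totals on this slide are consistent with multiplying the peak rate by a 30-day month and a 12-month year. The sketch below reproduces that arithmetic in Node.js. The 30-day month is an inference from the numbers, not something the slide states, and the daily figure does not come out of the same formula.

```javascript
// Back-of-envelope check of the slide's monthly and yearly totals.
// Assumption (inferred, not stated on the slide): totals are
// peak rate x 30-day month x 12 months.
const PEAK_SNAPSHOTS_PER_SEC = 20400; // from the slide
const SECONDS_PER_DAY = 24 * 60 * 60;

const snapshotsPerMonth = PEAK_SNAPSHOTS_PER_SEC * SECONDS_PER_DAY * 30;
const snapshotsPerYear = snapshotsPerMonth * 12;

console.log('per month:', snapshotsPerMonth);
console.log('per year: ', snapshotsPerYear);
```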

Page 4: Using open source in the internet of things

Bytes?

• 86 bytes per snapshot
• 860 bytes in JSON
• 51 TB/yr raw binary in S3
• 508 TB/yr in Structured Data

• That’s only with 20k machines
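
One reading that reproduces the 51 TB and 508 TB figures: multiply the yearly snapshot count by the bytes per snapshot, convert to GiB, then divide by 1000. That unit convention is inferred from the numbers and is an assumption, as is the helper name `yearlyVolume`.

```javascript
// Yearly storage volume for the two encodings on the slide.
// Assumption: "TB" on the slide means GiB / 1000 (inferred from the figures).
const SNAPSHOTS_PER_YEAR = 634521600000; // from the "Storing?" slide
const BINARY_BYTES_PER_SNAPSHOT = 86;
const JSON_BYTES_PER_SNAPSHOT = 860;
const BYTES_PER_GIB = 2 ** 30;

// Hypothetical helper: returns GiB per year and the slide-style "TB" figure.
function yearlyVolume(bytesPerSnapshot) {
  const gib = (SNAPSHOTS_PER_YEAR * bytesPerSnapshot) / BYTES_PER_GIB;
  return { gib, slideTb: Math.round(gib / 1000) };
}

const rawVolume = yearlyVolume(BINARY_BYTES_PER_SNAPSHOT);
const jsonVolume = yearlyVolume(JSON_BYTES_PER_SNAPSHOT);

console.log(`raw binary: ~${rawVolume.slideTb} TB/yr`);
console.log(`JSON:       ~${jsonVolume.slideTb} TB/yr`);
```

The tenfold gap between the two rows follows directly from the 86-byte versus 860-byte encodings.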

Page 5: Using open source in the internet of things

AWS Costs?

• $1,524/mo in S3
• $127k/mo in EBS

• Maybe we should sample some data?

• $1,044/mo EBS with 30 sec samples
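
Sampling at 30-second intervals means keeping one snapshot out of every 30, which cuts the structured volume roughly thirtyfold. A minimal sketch follows. The function name `sampleEvery` and the snapshot shape are illustrative assumptions, and the dollar figure above depends on pricing the slides do not give, so it is not reproduced here.

```javascript
// Keep one snapshot out of every `n`.
// The function name and the snapshot shape are illustrative only.
function sampleEvery(snapshots, n) {
  return snapshots.filter((_, index) => index % n === 0);
}

// One upload as described earlier: 30 one-second snapshots.
const upload = Array.from({ length: 30 }, (_, second) => ({ second }));
const kept = sampleEvery(upload, 30);

console.log(`kept ${kept.length} of ${upload.length} snapshots`);
```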

Page 6: Using open source in the internet of things

How we gonna do it?

Page 7: Using open source in the internet of things
Page 8: Using open source in the internet of things

Why Node.JS?

• Non-blocking, event-driven I/O
• C10k problem
• How about 1 Million Concurrent?
  – http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/
• Low Latency
• Excellent Binary Conversion
• Code already working
• It’s Open Source!
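
To illustrate the binary conversion point, here is a minimal sketch using Node's built-in `Buffer`. The 10-byte layout and the field names (`unitId`, `epochSec`, `heartRate`) are hypothetical; the real 86-byte snapshot format is not shown in the slides.

```javascript
// Hypothetical 10-byte layout for illustration only:
//   bytes 0-3  unitId    (uint32, big-endian)
//   bytes 4-7  epochSec  (uint32, big-endian)
//   bytes 8-9  heartRate (uint16, big-endian)
const SNAPSHOT_BYTES = 10;

function encodeSnapshot({ unitId, epochSec, heartRate }) {
  const buf = Buffer.alloc(SNAPSHOT_BYTES);
  buf.writeUInt32BE(unitId, 0);
  buf.writeUInt32BE(epochSec, 4);
  buf.writeUInt16BE(heartRate, 8);
  return buf;
}

function decodeSnapshot(buf) {
  return {
    unitId: buf.readUInt32BE(0),
    epochSec: buf.readUInt32BE(4),
    heartRate: buf.readUInt16BE(8),
  };
}

const packed = encodeSnapshot({ unitId: 1234, epochSec: 1438905600, heartRate: 142 });
const asJson = JSON.stringify(decodeSnapshot(packed));

// The JSON form is several times larger than the packed form.
console.log(packed.length, Buffer.byteLength(asJson));
```

The same size gap, at a larger scale, is what separates the 86-byte and 860-byte rows on the "Bytes?" slide.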

Page 9: Using open source in the internet of things

Why Flume-NG?

• Proven Log event processing
• Scale out architecture
• Flexible pipeline design
• Many options for connectors
  – Node.JS (via node-flume) Source
  – HDFS/S3 Sink
  – ElasticSearch Sink
  – MongoDB Sink (3rd Party)

• It’s Open Source!

Page 10: Using open source in the internet of things

Why MongoDB?

• Scalable Document Data Store
• Scale out architecture
• Flexible Data Model
• Written in C/C++
• Significant Existing Interest
• Massive Community Support
• It’s Open Source!

Page 11: Using open source in the internet of things

Why ElasticSearch?

• Super-Scalable Document Store
• Lightning Fast Queries
• Simple Faceting/Aggregation
• Scale out architecture
• Flexible Data Model
• Excellent Community Support
• It’s Open Source!
• It’s Easy

Page 12: Using open source in the internet of things

Why Not ElasticSearch?

• It’s written in Java!
• Lack of definition of query needs
• Lack of Time
• No Good Reason!

• I love ElasticSearch!

Page 13: Using open source in the internet of things
Page 14: Using open source in the internet of things

Why Amazon S3?

• Super-Scalable Data Store
• Cost Effective Storage
• Flume HDFS support
• Tight Integration with EMR

Page 15: Using open source in the internet of things

Why Amazon EC2?

• Customer needed Geo-Diversity
• Compute Power on Demand
  – Auto Scaling
  – CloudFormation
• Reduce Operational requirements
• Dynamic Dev/QA environments

Page 16: Using open source in the internet of things

Why EMR?

• Zero Maintenance Hadoop Cluster
• No Ongoing Costs
• Pay On Demand
• Compute Power On Demand
• Spot Instances
• S3 Integration

Page 17: Using open source in the internet of things

Flume - Modular

Page 18: Using open source in the internet of things

Flume - Scalable

Page 19: Using open source in the internet of things

Flume - Flexible

Page 20: Using open source in the internet of things

Flume-Config Pipeline

agent1.sources = thrift-json-source thrift-bin-source
agent1.channels = bin-channel json-channel
agent1.sinks = s3Sink mongo-sink

Page 21: Using open source in the internet of things

Flume-Config Channel

agent1.channels.bin-channel.type = memory
agent1.channels.bin-channel.capacity = 100000
agent1.channels.bin-channel.transactionCapacity = 800
agent1.channels.bin-channel.keep-alive = 3

Page 22: Using open source in the internet of things

Flume-Config Source

agent1.sources.thrift-bin-source.type = thrift
agent1.sources.thrift-bin-source.channels = bin-channel
agent1.sources.thrift-bin-source.bind = 0.0.0.0
agent1.sources.thrift-bin-source.port = 51515
agent1.sources.thrift-bin-source.threads = 10

Page 23: Using open source in the internet of things

Flume-Config Interceptors

agent1.sources.thrift-bin-source.interceptors = timestamp1 host1
agent1.sources.thrift-bin-source.interceptors.timestamp1.type = timestamp
agent1.sources.thrift-bin-source.interceptors.timestamp1.preserveExisting = true
agent1.sources.thrift-bin-source.interceptors.host1.type = host
agent1.sources.thrift-bin-source.interceptors.host1.hostHeader = hostname
agent1.sources.thrift-bin-source.interceptors.host1.preserveExisting = true

Page 24: Using open source in the internet of things

Flume-Config Mongo Sink

agent1.sinks.mongo-sink.type = org.riderzen.flume.sink.MongoSink
agent1.sinks.mongo-sink.host = mongo-test
agent1.sinks.mongo-sink.port = 27017
agent1.sinks.mongo-sink.model = single
agent1.sinks.mongo-sink.collection = TestCollection
agent1.sinks.mongo-sink.batch = 100
agent1.sinks.mongo-sink.channel = json-channel

Page 25: Using open source in the internet of things

Flume-Config S3 Sink

agent1.sinks.s3Sink.type = hdfs
# Specify the channel the sink should use
agent1.sinks.s3Sink.channel = bin-channel
# Name prefixed to files
agent1.sinks.s3Sink.hdfs.filePrefix = WorkoutSnapshots
# Number of seconds to wait before rolling
agent1.sinks.s3Sink.hdfs.rollInterval = 600
# Timeout after which inactive files get closed
# This corresponds to the roundValue below to avoid .tmp files
agent1.sinks.s3Sink.hdfs.idleTimeout = 900
# File size to trigger roll in bytes
# Currently this is 256MB, a good MapReduce number
agent1.sinks.s3Sink.hdfs.rollSize = 268435456
# Number of events to trigger a roll
agent1.sinks.s3Sink.hdfs.rollCount = 100000
# File Format: SequenceFile, DataStream or CompressedStream
# (1) DataStream will not compress
# (2) CompressedStream requires hdfs.codeC set
agent1.sinks.s3Sink.hdfs.fileType = DataStream
# HDFS Write Format: Text or Writable (default)
agent1.sinks.s3Sink.hdfs.writeFormat = Text
agent1.sinks.s3Sink.hdfs.path = s3n://<API_KEY>:<API_SECRET>@dapi-workoutsnapshots/%Y-%m-%d/%H%M
# Make S3 files in 15 minute buckets
# Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
agent1.sinks.s3Sink.hdfs.round = true
# Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
agent1.sinks.s3Sink.hdfs.roundValue = 15
# The unit of the round down value - second, minute or hour.
agent1.sinks.s3Sink.hdfs.roundUnit = minute

Page 26: Using open source in the internet of things

Flume-Config S3 Sink

agent1.sinks.s3Sink.type = hdfs
agent1.sinks.s3Sink.channel = bin-channel

Page 27: Using open source in the internet of things

Flume-Config S3 Sink

# Name prefixed to files
agent1.sinks.s3Sink.hdfs.filePrefix = WorkoutSnapshots

# Number of seconds to wait before rolling
agent1.sinks.s3Sink.hdfs.rollInterval = 600

# Timeout after which inactive files get closed
# This corresponds to the roundValue below to avoid .tmp files
agent1.sinks.s3Sink.hdfs.idleTimeout = 900

Page 28: Using open source in the internet of things

Flume-Config S3 Sink

# File size to trigger roll in bytes
# Currently this is 256MB, a good MapReduce number
agent1.sinks.s3Sink.hdfs.rollSize = 268435456

# Number of events to trigger a roll
agent1.sinks.s3Sink.hdfs.rollCount = 100000
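
The HDFS sink rolls a file as soon as any one of its roll triggers fires, so the three settings interact. The sketch below estimates which one fires first for a steady stream. It assumes each Flume event is a single snapshot of the given size and that one agent sees the whole stream; neither is confirmed by the slides, and `firstRollTrigger` is a hypothetical helper.

```javascript
// Values from the S3 sink config on these slides.
const ROLL_INTERVAL_SEC = 600;
const ROLL_SIZE_BYTES = 268435456;
const ROLL_COUNT_EVENTS = 100000;

// Returns the trigger that would fire first for a steady event stream.
// Assumption: every event carries one snapshot of `bytesPerEvent` bytes.
function firstRollTrigger(eventsPerSec, bytesPerEvent) {
  const secondsUntil = {
    rollInterval: ROLL_INTERVAL_SEC,
    rollSize: ROLL_SIZE_BYTES / (eventsPerSec * bytesPerEvent),
    rollCount: ROLL_COUNT_EVENTS / eventsPerSec,
  };
  // Sort the triggers by time-to-fire and take the earliest.
  const [name, seconds] = Object.entries(secondsUntil)
    .sort((a, b) => a[1] - b[1])[0];
  return { name, seconds };
}

// Average rate and snapshot size from the earlier slides.
console.log(firstRollTrigger(1295, 86));
```

Under those assumptions the event-count trigger fires long before a file approaches 256MB, which is worth checking against real event sizes before relying on rollSize.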

Page 29: Using open source in the internet of things

Flume-Config S3 Sink

# File Format: SequenceFile, DataStream or CompressedStream
# (1) DataStream will not compress
# (2) CompressedStream requires hdfs.codeC set
agent1.sinks.s3Sink.hdfs.fileType = DataStream

# HDFS Write Format: Text or Writable (default)
agent1.sinks.s3Sink.hdfs.writeFormat = Text

Page 30: Using open source in the internet of things

Flume-Config S3 Sink

agent1.sinks.s3Sink.hdfs.path = s3n://<API_KEY>:<API_SECRET>@dapi-workoutsnapshots/%Y-%m-%d/%H%M

# Make S3 files in 15 minute buckets
# Should the timestamp be rounded down
agent1.sinks.s3Sink.hdfs.round = true

# Rounded down to the highest multiple of this
agent1.sinks.s3Sink.hdfs.roundValue = 15

# The unit of the round down value - second, minute or hour.
agent1.sinks.s3Sink.hdfs.roundUnit = minute
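
The path escapes and the rounding settings combine to place each event in a 15-minute bucket. The sketch below mimics that mapping for `%Y-%m-%d/%H%M`. It uses UTC so the result is reproducible, whereas Flume resolves these escapes in the agent's configured time zone; `bucketPath` is a hypothetical helper, not part of Flume.

```javascript
// Mimics %Y-%m-%d/%H%M with round = true, roundValue = 15, roundUnit = minute.
// Assumption: UTC is used here; the real agent's time zone may differ.
function bucketPath(epochMs, roundMinutes = 15) {
  const bucketMs = roundMinutes * 60 * 1000;
  // Round the timestamp down to the start of its bucket.
  const rounded = new Date(Math.floor(epochMs / bucketMs) * bucketMs);
  const pad = (n) => String(n).padStart(2, '0');
  const day = [
    rounded.getUTCFullYear(),
    pad(rounded.getUTCMonth() + 1),
    pad(rounded.getUTCDate()),
  ].join('-');
  const time = pad(rounded.getUTCHours()) + pad(rounded.getUTCMinutes());
  return `${day}/${time}`;
}

// An event stamped 14:37:12 UTC on 7 August 2015 lands in the 14:30 bucket.
console.log(bucketPath(Date.UTC(2015, 7, 7, 14, 37, 12))); // 2015-08-07/1430
```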

Page 31: Using open source in the internet of things

Node.JS-node-logger

• Versions are important!
• Node-logger built for Flume v1 not NG
• Searched Much!
• Flume-NG changed Thrift definition
  – Uses TCompactProtocol
  – TCompactProtocol not supported in node-thrift
• Node-logger does not work with Flume-NG

Page 32: Using open source in the internet of things

Node.JS-new-logger

• Thank heavens for Open Source!
• C++ Has TCompactProtocol
• My C++ fu is not strong enough!
  – Get Sync POC working!
  – Can’t figure out how to do Async in C++
  – Node.JS really needs Async
• Randy Abernethy steps up after post to thrift list
• TCompactProtocol implemented in node-thrift
  – THRIFT-2511 available in 0.9.2
• Node.JS now communicating with Flume
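
For a sense of what the missing protocol involves: TCompactProtocol writes integers as ZigZag-encoded varints rather than fixed-width fields. The sketch below shows that encoding for a 32-bit integer. It is an illustration of the scheme only, not the node-thrift implementation, and the function names are hypothetical.

```javascript
// ZigZag maps signed integers to unsigned so small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
function zigzag32(n) {
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

// Varint: 7 payload bits per byte, high bit set on every byte except the last.
function varint(value) {
  const bytes = [];
  let v = value >>> 0;
  while (v > 0x7f) {
    bytes.push((v & 0x7f) | 0x80);
    v >>>= 7;
  }
  bytes.push(v);
  return bytes;
}

const encodeI32 = (n) => varint(zigzag32(n));

console.log(encodeI32(1));   // [ 2 ]
console.log(encodeI32(-1));  // [ 1 ]
console.log(encodeI32(300)); // [ 216, 4 ]
```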

Page 33: Using open source in the internet of things

Node.JS-Flume

• Closed Source is a bummer!

• Compile Thrift Stubs
  – thrift -r --gen js:node flume.thrift
  – Generates Client and Server
• Wrap Stubs in a module
• Send messages to Flume
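
A minimal sketch of the "wrap stubs in a module" step. It assumes the generated client exposes `append(event, callback)` taking an event with `headers` and `body`. The wrapper name `createFlumeSender` is hypothetical, and a stand-in client is injected so the sketch runs without Thrift or a Flume agent. With real stubs the event would be built from the generated types rather than a plain object.

```javascript
// Hypothetical wrapper around a Thrift-generated Flume client.
// Assumption: the client has append(event, callback), where `event`
// carries `headers` (string map) and `body` (binary).
function createFlumeSender(client) {
  return {
    // Sends one payload and resolves with whatever status the client reports.
    send(body, headers = {}) {
      return new Promise((resolve, reject) => {
        const event = { headers, body: Buffer.from(body) };
        client.append(event, (err, status) => (err ? reject(err) : resolve(status)));
      });
    },
  };
}

// Stand-in client that records events instead of talking to Flume.
const received = [];
const fakeClient = {
  append(event, callback) {
    received.push(event);
    callback(null, 'OK');
  },
};

const sender = createFlumeSender(fakeClient);
sender.send('snapshot-bytes', { unit: '42' }).then((status) => console.log(status));
```

Keeping the client behind a small interface like this makes it possible to swap the transport or protocol without touching the code that produces snapshots.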

Page 34: Using open source in the internet of things

Validation-Setup

• 2 Front-End Node.JS servers
  – m3.medium
  – AWS Elastic Load Balancer
• 3 Node MongoDB Replica Set
  – m3.xlarge
  – 4 EBS volumes per node, 100 IOPS
• 1-3 Node Load Generator Cluster
  – Custom Python Scripts

Page 35: Using open source in the internet of things

Validation-Requests

Page 36: Using open source in the internet of things

Validation-Node.JS CPU

Page 37: Using open source in the internet of things

Validation-Latency

Page 38: Using open source in the internet of things

Validation-MongoDB CPU

Page 39: Using open source in the internet of things

Validation-MongoDB Net

Page 40: Using open source in the internet of things

Failure-Setup

• Establish load
• Disable 1 Front-End
• Measure impact
• Restore Front-End
• Measure recovery

Page 41: Using open source in the internet of things

Failure-Node.JS

Page 42: Using open source in the internet of things

Failure-Node.JS Req

Page 43: Using open source in the internet of things

Failure-Node.JS Lat

Page 44: Using open source in the internet of things

Test Results

• Node.JS servers capable of 10k-20k requests per minute
• MongoDB can serve many front-ends
• Latency excellent
• Architecture can handle failures
• Ingested 23 million snapshots in 3 hours
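
These results can be related to the fleet size with some quick arithmetic. The sketch assumes every unit is active at once and makes one request per 30-second upload, which is a worst-case reading rather than something the slides state.

```javascript
// Figures taken from the slides.
const UNITS = 20000;
const UPLOAD_INTERVAL_SEC = 30;           // each unit sends every 30 sec
const SERVER_RPM_RANGE = [10000, 20000];  // per front-end, from the test results
const INGESTED_SNAPSHOTS = 23e6;
const TEST_HOURS = 3;

// Assumption: all units active, one request per upload.
const fleetRpm = UNITS * (60 / UPLOAD_INTERVAL_SEC);
const serversNeeded = SERVER_RPM_RANGE.map((rpm) => Math.ceil(fleetRpm / rpm));
const testSnapshotsPerSec = Math.round(INGESTED_SNAPSHOTS / (TEST_HOURS * 3600));

console.log({ fleetRpm, serversNeeded, testSnapshotsPerSec });
```

Under those assumptions the full fleet generates 40,000 requests per minute, which needs between two and four front-ends, and the test run sustained about 2,130 snapshots per second.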

Page 45: Using open source in the internet of things

Conclusion

• The Internet of Things is here!
• Things generate a lot of data!
• Avalon Consulting, LLC provided an awesome learning experience!
• I would look at Couchbase now!
  – Avalon Benchmark - MongoDB 3.0 vs. Couchbase Server 3.0.2
  – http://news.avalonconsult.com/2015/03/19/performance_benchmark/

Page 46: Using open source in the internet of things

• Leave Comments on Joind.in
  – https://joind.in/13908
• Avalon Consulting, LLC is Hiring!
  – http://www.avalonconsult.com/
  – http://www.avalonconsult.com/career-opportunities
• Avalon Consulting, LLC can help you implement!

Thank You!