Sinfonier: How I turned my grandmother into a data analyst - Fran J. Gomez - Codemotion Amsterdam...

Sinfonier: How I turned my grandmother into a data analyst. @ffranz AMSTERDAM 11-12 MAY 2016

Upload: codemotion

Post on 12-Apr-2017



My grandma's first time using Sinfonier

So here you can see my grandma's first time using Sinfonier :-D It's a joke. In fact, this is me the first time I used Storm.

Why must some technologies be difficult to use?

This is the question that popped into my head the first time I saw Storm's potential. Why can't people without deep technical skills use a great technology like Storm?

Let's not repeat our mistakes


What is Sinfonier trying to change?

Sinfonier is a change of focus with respect to current solutions for processing information in real time. We combine an easy-to-use, modular and adaptable interface with an advanced technological solution, so that you can tune it to your needs in information security.

We're trying to bring people together around the same technology and adapt it to all of them: developers and analysts working together.

How many of you know Apache Storm?

Hands up!

Hadoop | Storm
Large but finite jobs | Infinite computations called Topologies
Processes a lot of data at once | Processes infinite streams of data one tuple at a time
High latency | Low latency

MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not real-time systems, and there is no way to turn Hadoop into one. In a nutshell, those are the main differences between Hadoop and Storm.

Storm implements Topologies. A Topology is a graph where each node is a spout or a bolt. The edges indicate which bolt subscribes to which stream.

Storm users define topologies that describe how to process the data as it streams in from the spout. When the data comes in, it is processed and the results are passed to output systems.
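To make the graph idea concrete, here is a minimal sketch in plain Python. It does not use Storm at all: the node names, the two toy bolts and the `run_topology` helper are all hypothetical, and the spout emits a finite sample where a real stream would be unbounded. It only shows nodes, edges as subscriptions, and tuples moving through the graph one at a time.

```python
from collections import defaultdict

def sentence_spout():
    """Spout: emits tuples one by one (a finite sample stands in for a stream)."""
    yield {"sentence": "storm processes streams"}
    yield {"sentence": "one tuple at a time"}

def upper_bolt(tup):
    """Bolt: returns the list of output tuples produced for one input tuple."""
    return [{"sentence": tup["sentence"].upper()}]

def length_bolt(tup):
    """Bolt: emits the length of the sentence it receives."""
    return [{"length": len(tup["sentence"])}]

# Edges of the graph: each node maps to the nodes subscribed to its stream.
EDGES = {"spout": ["upper", "length"], "upper": ["length"]}
BOLTS = {"upper": upper_bolt, "length": length_bolt}

def run_topology(spout, edges, bolts, source="spout"):
    """Push every tuple through the graph as soon as the spout emits it."""
    emitted = defaultdict(list)
    for tup in spout():
        pending = [(node, tup) for node in edges.get(source, [])]
        while pending:
            node, item = pending.pop(0)
            for out in bolts[node](item):
                emitted[node].append(out)
                # Forward the output to every node subscribed to this one.
                pending.extend((nxt, out) for nxt in edges.get(node, []))
    return dict(emitted)

emitted = run_topology(sentence_spout, EDGES, BOLTS)
```

Here the "length" bolt subscribes to two streams, so it sees each sentence twice: once straight from the spout and once after the "upper" bolt.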

What do you need to use Storm?

Apache Storm Cluster

- You need to know how to deploy a cluster.
- Once your cluster is running, you must learn how to create a topology.
- A topology is just a few interconnected modules, so you need to create those modules.
- When you have your topology, it's time to deploy it using the command-line tool or the API.

What do you need to use Sinfonier?

Sinfonier tries to give users a real-time framework like Apache Storm, but with an intuitive graphical interface, an abstraction layer that eases the development of new functions and cluster management, and an open, collaborative community.

As we will see, Sinfonier supports modeling algorithms (Apache Storm works with DAGs, directed acyclic graphs) and creating software components (the modules from which algorithms are built) for processing and collecting information in a simple way, using multiple programming languages (Java and Python initially; languages like Ruby and Node.js will be added in the future).

Apache Storm Cluster

Sinfonier Drawer

 | Apache Storm | Sinfonier Project
Topologies | Programmatic DAG | Visual DAG
Components | Spouts and Bolts | Spouts, Bolts and Drains
Grouping | Shuffle, Field, All (+4) | Shuffle
Data Model | Tuples | JSON Tuple K,V

Spout: This collects the information from an external source. It typically reads from a queue system like Kestrel, RabbitMQ or Kafka, but a Spout can also generate its own data stream by reading from sources which have a Streaming API, such as Twitter.

Bolt: This is responsible for processing the input data it receives from a Spout or another Bolt and adding new output data. Most computing logic is performed in the Bolts, such as functions, filters, data links, communication with databases, etc.

Drain: This processes all the data that reaches it and sends it to an external service, usually to store information in a database, to show the information on a dashboard or just to send it to a log system.
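The three roles can be sketched with plain Python generators. This is only an illustration of the division of labour, not Sinfonier's module API: every function name here is hypothetical, the spout reads from an in-memory counter instead of a queue, and a list stands in for the external database.

```python
import json

def number_spout(limit):
    """Spout: collects data from a source (here a hypothetical in-memory counter)."""
    for n in range(limit):
        yield {"value": n}

def square_bolt(stream):
    """Bolt: processes each input tuple and adds new output data to it."""
    for tup in stream:
        yield {**tup, "square": tup["value"] ** 2}

def even_filter_bolt(stream):
    """Bolt: acts as a filter, keeping only tuples whose value is even."""
    for tup in stream:
        if tup["value"] % 2 == 0:
            yield tup

def list_drain(stream, store):
    """Drain: sends every tuple it receives to an external service (a list here)."""
    for tup in stream:
        store.append(json.dumps(tup, sort_keys=True))

store = []
list_drain(even_filter_bolt(square_bolt(number_spout(5))), store)
```

Chaining the generators mirrors connecting modules: each tuple travels from the Spout through both Bolts to the Drain before the next one is read.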

[Diagram: example topology made of two SPOUT, two BOLT and three DRAIN modules]

One of the requirements for a collaborative system like Sinfonier is that all modules can communicate with each other. For this reason, Sinfonier defines a data model based on a single JSON tuple, instead of the Apache Storm data model based on an indeterminate number of tuples. This approach allows users to share their modules and to use other users' modules on the platform. Sinfonier includes an API to manage this JSON tuple in a simple way, allowing users to add new fields, remove fields and check whether a field exists. The API currently supports Java and Python.
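A rough idea of what working with a single JSON tuple looks like is sketched below. The class and method names (`JsonTuple`, `add_field`, `remove_field`, `has_field`, `get_field`, `emit`) are invented for this example and are not the real Sinfonier API; they only mirror the three operations mentioned above.

```python
import json

class JsonTuple:
    """Hypothetical wrapper around the single JSON tuple passed between modules."""

    def __init__(self, raw="{}"):
        self._data = json.loads(raw)

    def add_field(self, key, value):
        self._data[key] = value

    def remove_field(self, key):
        # Removing a field that is not there is a no-op in this sketch.
        self._data.pop(key, None)

    def has_field(self, key):
        return key in self._data

    def get_field(self, key, default=None):
        return self._data.get(key, default)

    def emit(self):
        """Serialize the tuple so the next module can read it."""
        return json.dumps(self._data, sort_keys=True)

def add_title_length(raw):
    """A toy bolt: enrich the tuple when it has a title, then drop a scratch field."""
    tup = JsonTuple(raw)
    if tup.has_field("title"):
        tup.add_field("title_length", len(tup.get_field("title")))
    tup.remove_field("debug")
    return tup.emit()

out = add_title_length('{"title": "MQTT v3.1.1 now an OASIS Standard", "debug": true}')
```

Because every module reads and writes the same kind of object, a bolt written by one user can sit behind a spout written by another without either knowing the other's field layout in advance.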


Look how an XML item (from an RSS feed) is transformed into JSON.

http://mqtt.org/feed
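One way such a conversion could work is sketched below with Python's standard library. The XML snippet is a hand-written stand-in for a feed item (its values follow the entry shown later in this talk), and the mapping rule is an assumption, not Sinfonier's actual converter: attributes become keys and element text goes under a `content` key.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical RSS item, written by hand for this example.
ITEM_XML = """
<item>
  <title>MQTT v3.1.1 now an OASIS Standard</title>
  <pubDate>Fri, 07 Nov 2014 13:11:28 +0000</pubDate>
  <category>news</category>
  <guid isPermaLink="false">http://mqtt.org/?p=522</guid>
</item>
"""

def element_to_value(elem):
    """Plain leaf -> its text; otherwise a dict of attributes, children and content."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    value = dict(elem.attrib)
    for child in children:
        # Repeated child tags would overwrite each other in this simple sketch.
        value[child.tag] = element_to_value(child)
    text = (elem.text or "").strip()
    if text:
        value["content"] = text
    return value

def item_to_json(xml_text):
    root = ET.fromstring(xml_text.strip())
    return json.dumps(element_to_value(root), sort_keys=True)

doc = json.loads(item_to_json(ITEM_XML))
```

Note that XML attributes are always strings, so `isPermaLink` comes out as the string "false" here; turning it into a JSON boolean would need an extra typing step.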

* RandomSentenceSpout: the spout responsible for generating the sentences on which the word count will be performed.
* SplitSentence: the bolt responsible for identifying the words in each sentence, generating a list whose elements are the words of that sentence.
* WordCount: the bolt responsible for counting the words in the list generated by the previous module.

TopologyBuilder builder = new TopologyBuilder();
// The last argument of setSpout/setBolt is the parallelism hint.
builder.setSpout("spout", new RandomSentenceSpout(), 5);
// Sentences are distributed randomly among the "split" tasks.
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
// Tuples with the same "word" value always go to the same "count" task.
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
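The same word-count flow can be simulated without a cluster. The sketch below is plain Python rather than Storm code: the sample sentences, the toy hash in `fields_grouping` and the choice of three tasks are all assumptions made for illustration. It shows why grouping on the "word" field matters: each word is always counted by the same task, so no two tasks hold partial counts for one word.

```python
from collections import Counter

# Sample input standing in for the output of a sentence spout.
SENTENCES = ["the cow jumped over the moon", "the man went to the store"]

def split_sentence(sentence):
    """Stand-in for the split bolt: one word per emitted tuple."""
    return sentence.split()

def fields_grouping(word, num_tasks):
    """Route equal words to the same task index (stable toy hash, not Storm's)."""
    return sum(word.encode("utf-8")) % num_tasks

def word_count(sentences, num_tasks=3):
    """Stand-in for the count bolt running as num_tasks parallel tasks."""
    tasks = [Counter() for _ in range(num_tasks)]
    for sentence in sentences:
        for word in split_sentence(sentence):
            tasks[fields_grouping(word, num_tasks)][word] += 1
    return tasks

tasks = word_count(SENTENCES)
totals = sum(tasks, Counter())
```

With a shuffle grouping in front of the counting step instead, the occurrences of "the" could land on different tasks and each would only know part of the total.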

Sinfonier vs. Storm

Tool area: This contains the different modules which the user has added to his or her toolbox and which can be incorporated into the topology by simply selecting the module in question and dragging it to the Canvas area.

Canvas: This is the work area upon which the topology is constructed, incorporating the different modules and allowing them to connect to each other. It is also in this area where the topology will be edited in its different versions.

Context information: Two main sections can be distinguished within this area: Properties, where the topology is given a name and, optionally, a description, and the Minimap, which shows an overview of the topology.

RSS Demo

http://mqtt.org/feed

mongodb://mqtt:[email protected]:23000/mqtt

Developers, developers, developers


Name: Your module name. Must be UpperCamelCase.
Icon: Add an image.
Entity: Used to catalog the module.
Type: Choose your type of module: Spout, Bolt or Drain. It cannot be changed afterwards.
Language: Java or Python.
Code: URL pointing to gist.github.com.
Description: Describe what your module does.
Fields: Declare your parameters.
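The rules in that form can be written down as a small validation sketch. Everything here is hypothetical: the `ModuleDefinition` class, its attribute names, the example gist URL and the regular expression (a loose approximation of UpperCamelCase) are assumptions for illustration, and the Icon and Entity fields are left out.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class ModuleDefinition:
    """Hypothetical in-memory version of the module creation form."""
    name: str
    module_type: str
    language: str
    code_url: str
    description: str = ""
    parameters: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of rule violations; an empty list means the form is valid."""
        found = []
        # Loose check: starts with an uppercase letter, letters and digits only.
        if not re.fullmatch(r"[A-Z][A-Za-z0-9]*", self.name):
            found.append("name must be UpperCamelCase")
        if self.module_type not in {"Spout", "Bolt", "Drain"}:
            found.append("type must be Spout, Bolt or Drain")
        if self.language not in {"Java", "Python"}:
            found.append("language must be Java or Python")
        if urlparse(self.code_url).netloc != "gist.github.com":
            found.append("code must point to gist.github.com")
        return found

good = ModuleDefinition("RssReader", "Spout", "Python",
                        "https://gist.github.com/example/abc123")
bad = ModuleDefinition("rss_reader", "Sink", "Python", "https://example.com/x")
```

Calling `good.problems()` yields an empty list, while `bad.problems()` reports the name, the type and the code URL.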

States of the module

Developing: Cannot use in Topologies. Just declared.
Pending: Cannot use in Topologies. Validation pending.
Private: Owner can use in Topologies. Code changes must be validated.
Published: Everybody can use in Topologies.
Deleted: Module only available on existing topologies.

Developing: This is the first status of a module, just after its creation. At this point, the module can only be seen by its creator. Until the validation request is sent, the creator can change the module's code. After the request is sent, the module status moves to pending.

Pending: This is the status after the creation and development of a module. At this point, the module must be validated by a Sinfonier administrator. The administrator has two options: if the module is correct and the administrator deems it appropriate, the module is validated and its status switches to private. If it is not considered correct, the module is returned to the developing status.

Private: When a module is in this status, it can only be seen by its creator and can only be used in new topologies by this user. If the user wishes, the module can be published by moving its status to published.

Published: This status is very similar to the private status, with the difference that any Sinfonier user can now use the module in new topologies. The creator of the module can switch between the private and published statuses at any time. If the creator changes the code of a private or published module, a new validation is needed, so the status changes to pending again. Once validated, the module moves to private status and must be published again to return to the published status.

Deleted: This is a status of no return. When a module is deleted, it will never be available again. However, it can still be used in topologies which are already using it. Only the creator of a module can delete it.
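The lifecycle described above reads naturally as a small state machine. The sketch below is one possible encoding: the event names are invented for this example, and allowing deletion from every status except deleted is an assumption, since the notes only say that the creator can delete a module.

```python
# Transitions taken from the status descriptions; event names are hypothetical.
TRANSITIONS = {
    ("developing", "request_validation"): "pending",
    ("pending", "approve"): "private",
    ("pending", "reject"): "developing",
    ("private", "publish"): "published",
    ("published", "unpublish"): "private",
    ("private", "change_code"): "pending",
    ("published", "change_code"): "pending",
}

# Who may use a module in NEW topologies in each status.
USABLE_BY = {
    "developing": "nobody",
    "pending": "nobody",
    "private": "owner",
    "published": "everybody",
    "deleted": "nobody",
}

def next_status(status, event):
    """Return the status reached by applying event, or raise ValueError."""
    # Assumption: the creator may delete from any status except deleted.
    if event == "delete" and status != "deleted":
        return "deleted"
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"cannot apply {event!r} in status {status!r}") from None

status = "developing"
history = [status]
for event in ["request_validation", "approve", "publish", "change_code"]:
    status = next_status(status, event)
    history.append(status)
```

After this walk, `history` ends in "pending": changing the code of a published module sends it back for validation, exactly as the Published note describes.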


http://virtualmachine.sinfonier-project.net

Let's Play

https://github.com/search?utf8=%E2%9C%93&q=shodan+api&type=Code&ref=searchresults

https://public.ducksboard.com/XBc7Sa8nZmspsh_Oc6Gr/

https://api.shodan.io/api-info?key=oCiMsgM6rQWqiTvPxFHYcExlZgg7wvTt (DEV)
https://api.shodan.io/api-info?key=qAMQ71XPyyBz2dnmKOgX1Oou250OGQYL (DEV)
https://api.shodan.io/api-info?key=nKQx5VGqgryRvTNBbfg9SCAuQSs66IS2 (OSS)
https://api.shodan.io/api-info?key=Mawfb3ne6mFteBiRLkbDH6v5UfKyizdj (OSS)
https://api.shodan.io/api-info?key=BgGlm7PGISqGOEieypFvUE0kuQtIBKeP (OSS)
https://api.shodan.io/api-info?key=oykKBEq2KRySU33OxizNkOir5PgHpMLv (DEV)
https://api.shodan.io/api-info?key=v4YpsPUJ3wjDxEqywwu6aF5OZKWj8kik (OSS)
https://api.shodan.io/api-info?key=c1twrStvBUBJHq7euqxhap1XOZuMmouY (DEV)
https://api.shodan.io/api-info?key=uAs67OallytytIdagyHKO1nAWxYetniW (OSS)
https://api.shodan.io/api-info?key=aH7F5pcxsC3U9i7hmPUYA6vwdehxxNeP (OSS)
https://api.shodan.io/api-info?key=dj2asy53eJcEu6GAYu5PRrVoKHOVTCFl (OSS)
https://api.shodan.io/api-info?key=SjGOMa1LUc5RYsU4bWAmhNkInLaxphfC (OSS)

{
  "_id": { "$oid": "5731928be4b0be5c2b777447" },
  "guid": { "content": "http://mqtt.org/?p=522", "isPermaLink": false },
  "pubDate": "Fri, 07 Nov 2014 13:11:28 +0000",
  "category": "news",
  "title": "MQTT v3.1.1 now an OASIS Standard",
  "slash:comments": 5
}

{
  "_id-$oid": "5731928be4b0be5c2b777447",
  "guid-content": "http://mqtt.org/?p=522",
  "guid-isPermaLink": false,
  "pubDate": "Fri, 07 Nov 2014 13:11:28 +0000",
  "category": "news",
  "title": "MQTT v3.1.1 now an OASIS Standard",
  "slash:comments": 5
}
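The step from the nested document to the flat one can be reproduced with a short recursive function. This is a sketch of the idea rather than the module used in the demo: the `flatten` name and the choice to leave lists untouched are assumptions, while the "-" separator is taken from the keys shown above.

```python
def flatten(doc, parent_key="", sep="-"):
    """Collapse nested dicts into one level, joining key names with sep."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars (and lists, in this sketch) are copied as they are.
            flat[full_key] = value
    return flat

nested = {
    "_id": {"$oid": "5731928be4b0be5c2b777447"},
    "guid": {"content": "http://mqtt.org/?p=522", "isPermaLink": False},
    "pubDate": "Fri, 07 Nov 2014 13:11:28 +0000",
    "category": "news",
    "title": "MQTT v3.1.1 now an OASIS Standard",
    "slash:comments": 5,
}

flat = flatten(nested)
```

A flat key-value tuple like this is easier to feed to modules and dashboards that expect simple fields instead of sub-documents.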


https://public.ducksboard.com/4xvXKxfjjSDkWBdN26Fk/

Today's world is no longer driven by data; it's driven by the connections between them.

MATCH p=shortestPath(
  (user1:TWITTER_USER {name:"ffranz"})-[*]-(user2:TWITTER_USER {name:"Valentino Rossi"})
)
RETURN p

MATCH p=allShortestPaths(
  (user1:TWITTER_USER {name:"ffranz"})-[*]-(user2:TWITTER_USER {name:"Valentino Rossi"})
)
RETURN p

http://fiware-cosmos.readthedocs.io/en/latest/

https://github.com/sinfonier-project/

The Sinfonier project is part of the FIWARE project.

Join us!

http://sinfonier-project.net/
http://blog.sinfonier-project.net/
@e_sinfonier

@ffranz

All pictures belong to their respective authors.

AMSTERDAM 9-12 MAY 2016