Big Data Visualization


Posted on 14-Jul-2015





<ul>
<li><p>Raffael Marty, CEO</p><p>Big Data Visualization</p><p>London, February 2015</p></li>
<li><p>Overview</p><p>Visualization</p><p>Design Principles</p><p>Dashboards</p><p>SOC Dashboard</p><p>Data Discovery and Exploration</p><p>Data Requirements for Visualization</p><p>Big Data Lake</p><p>Security. Analytics. Insight.</p></li>
<li><p>I am Raffy - I do Viz!</p><p>IBM Research</p></li>
<li><p>Visualization</p></li>
<li><p>Why Visualization?</p><p>The stats ...</p><p>The data ...</p></li>
<li><p>Why Visualization?</p><p>The human analyst: pattern detection, remembers context, fantastic intuition, can predict</p></li>
<li><p>Visualization</p><p>To present / communicate</p><p>To discover / explore</p></li>
<li><p>Design Principles</p></li>
<li><p>Choosing Visualizations</p><p>Objective, Audience, Data</p></li>
<li><p>For Example: Lateral Movement</p><p>Objective: find attackers moving laterally in the network</p><p>Defines the data needed (netflow, sflow, ...)</p><p>Maybe restrict to a network segment</p><p>Audience: security analyst, risk team, ...</p><p>Informs how to visualize / present the data</p><p>Kill chain stages: Recon, Weaponize, Deliver, Exploit, Install, C2, Act</p></li>
<li><p>Principles of Analytic Design</p><p>Show comparisons, contrasts, differences.</p><p>Show causality, mechanism, explanation, systematic structure.</p><p>Show multivariate data; that is, show more than 1 or 2 variables.</p><p>by Edward Tufte</p></li>
<li><p>Show Context</p><p>42</p></li>
<li><p>Show Context</p><p>42 is just a number and means nothing without context.</p></li>
<li><p>Use Numbers to Highlight the Most Important Parts of the Data</p><p>Numbers, summaries</p></li>
<li><p>Add Context</p><p>Additional information about objects, such as:</p><p>Machines: role, criticality, location, owner</p><p>Users: role, office location</p><p>(Figure: source and destination annotated with machine and user context, e.g., machine role and user role)</p></li>
<li><p>Traffic Flow Analysis With Context</p></li>
<li><p>Aesthetics Matter</p><p>Black background, blue or green colors, glow</p></li>
<li><p>B O R I N G</p></li>
<li><p>Sexier</p></li>
<li><p>Dashboard Design Principles</p><p>Audience, audience, audience!</p><p>Comprehensive information (enough context)</p><p>Highlight important data</p><p>Use graphics when appropriate</p><p>Good choice of graphics and design</p><p>Aesthetically pleasing</p><p>Enough information to decide if action is necessary</p><p>No scrolling</p><p>Real-time vs. batch? (Refresh rates)</p><p>Clear organization</p></li>
<li><p>SOC Dashboards</p></li>
<li><p>Mostly Blank</p></li>
<li><p>Dashboards For Discovery</p><p>Information disappears too quickly</p><p>Analysts' focus is on their own screens</p><p>The SOC dashboard just distracts</p><p>Detailed information is not legible</p><p>Put the detailed dashboards on the analysts' screens!</p></li>
<li><p>Use the SOC Dashboard for Context</p><p>Provide analysts with context: what else is going on in the environment right now?</p><p>Bring into focus: turn something benign into something interesting</p><p>Disprove: turn something interesting into something benign</p><p>Environment informs detection policies</p></li>
<li><p>Show Comparisons</p><p>(Figure: current measure compared with the week prior)</p></li>
<li><p>What To Put on Screens</p><p>Provide context to individual security alerts</p><p>News feed summary (FS-ISAC feeds, mailing lists, threat feeds)</p><p>Monitoring Twitter or IRC for certain activity / keywords</p><p>Volumes or metrics (e.g., #firewall blocks, #IDS alerts, #failed transactions)</p><p>Top N metrics, e.g., top 10 suspicious users, top 10 servers connecting outbound</p></li>
<li><p>Data Discovery &amp; Exploration</p></li>
<li><p>Visualize Me Lots (&gt;1TB) of Data</p></li>
<li><p>Information Visualization Mantra</p><p>Overview, Zoom / Filter, Details on Demand</p><p>Principle by Ben Shneiderman</p><p>Summary / aggregation, data mining, signal detection (IDS, behavioral, etc.)</p></li>
<li><p>Visualization Challenges</p><p>Access to data</p><p>Parsed data and data context</p><p>Data architecture for central data access and fast queries</p><p>Application of data mining (how?, what?, scalable, ...)</p><p>Visualization tools that support:</p><p>complex visual types (parallel coordinates, treemaps, heat maps, link graphs)</p><p>linked views</p><p>data mining (clustering, ...)</p><p>collaboration, information sharing</p><p>Visual analytics workflow</p></li>
<li><p>Big Data Lake</p></li>
<li><p>The Big Data Lake</p><p>One central location to store all cyber security data</p><p>Data collected only once, with third-party software leveraging it</p><p>Scalability and interoperability</p><p>More than deploying an off-the-shelf product from a vendor</p><p>Data use influences both the data formats and the technologies used to store the data: search, analytics, relationships, distributed processing, correlation, and statistical summarization</p><p>What to do with context? Enrich or join?</p><p>Hard problems:</p><p>Parsing: can you re-parse?</p><p>Common naming scheme!</p><p>Data store capabilities (search, analytics, distributed processing, etc.)</p><p>Access to data: SQL (even in a Hadoop context); how can products access the data?</p></li>
<li><p>Federated Data Access</p><p>(Figure components: SIEM with SIEM connector and SIEM console, dispatcher, Prod A, Prod B, AD / LDAP, HR, IDS, FW, DBs, SNMP, Data Lake)</p><p>Caveats:</p><p>Dispatcher?</p><p>Standard access to dispatcher / products enabled</p><p>Data lake technology?</p></li>
<li><p>Multiple Data Stores</p><p>(Figure components: raw logs, key-value, structured, real-time processing, (un)structured data, context, SQL, storage, stats, index, queue, distributed processing, access, graph)</p><p>Caveat: multiple types of data stores are needed</p></li>
<li><p>Technologies (Example)</p><p>(Figure components: raw logs, key-value (Cassandra), columnar (Parquet), real-time processing (Spark), (un)structured data, context, SQL (Impala, SparkSQL), HDFS, aggregates, index (Elasticsearch), queue (Kafka), distributed processing (Spark), access, graph (GraphX))</p><p>Caveat: no out-of-the-box solution available</p></li>
<li><p>SIEM Integration - Log Management First</p><p>(Figure components: raw logs, HDFS, processing / filtering, parsing (e.g., Pig), columnar store or search engine or log management, processing, SQL or search interface, SIEM connector, SIEM, SIEM console)</p></li>
<li><p>Simple SIEM Integration</p><p>(Figure components: raw, CSV, or JSON log data, Flume, HDFS, SQL (Impala, with SerDe), SQL interface for Tableau, etc., SIEM connector, SIEM, SIEM console)</p><p>Requirement: a SIEM connector that forwards text-based data to Flume</p></li>
<li><p>SIEM Integration - Advanced</p><p>(Figure components: syslog data, raw logs, other data sources, queue (Kafka), processing, HDFS, columnar (Parquet), index (Elasticsearch), SQL (Impala, SparkSQL), access through SQL and search interfaces such as Tableau, Kibana, etc., SIEM connector, SIEM, SIEM console)</p><p>Requires parsing and formatting in a SIEM-readable format (e.g., CEF)</p></li>
<li><p>What I am Working On</p><p>(Screenshots: navigation for Data Stores, Analytics, Forensics, Models, Admin; anomalies with a decomposition into data, seasonal, and trend; anomaly details; Hunt, Explain, Visual Search)</p><p>Big data backend</p><p>Own visualization engine (web-based)</p><p>Visualization workflows</p></li>
<li><p>BlackHat Workshop</p><p>Visual Analytics - Delivering Actionable Security Intelligence</p><p>August 1-6, 2015, Las Vegas, USA</p><p>big data | analytics | visualization</p></li>
<li><p>Security Visualization Community</p><p>Share, discuss, challenge, and learn about security visualization.</p><p>List:</p><p>Twitter: @secviz</p></li>
<li><p>Further resources:</p><p>... and @secviz</p></li>
</ul>
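The lateral movement example starts from netflow-style data, possibly restricted to one network segment. One simple way to turn that objective into something visualizable is to count how many distinct internal hosts each internal source reached on remote-administration ports. This is a minimal sketch of a fan-out heuristic, not the author's method; the record fields ("src", "dst", "dport"), the port set, and the use of `is_private` to mean "internal" are all assumptions.

```python
from collections import defaultdict
from ipaddress import ip_address

# Assumption: ports commonly used for remote administration (SSH, RPC,
# SMB, RDP, WinRM); tune for your environment.
ADMIN_PORTS = {22, 135, 445, 3389, 5985}


def internal_fanout(flows, ports=ADMIN_PORTS):
    """Count the distinct internal destinations each internal source reached
    on the given ports, sorted by that count (highest first).

    Each flow is a dict with hypothetical keys "src", "dst" and "dport".
    "Internal" is approximated with ipaddress' is_private property.
    """
    reached = defaultdict(set)
    for f in flows:
        src, dst = ip_address(f["src"]), ip_address(f["dst"])
        if src.is_private and dst.is_private and f["dport"] in ports:
            reached[f["src"]].add(f["dst"])
    return sorted(((s, len(d)) for s, d in reached.items()),
                  key=lambda item: item[1], reverse=True)


# Hypothetical flows: one host touches two file shares, one uses RDP once.
flows = [
    {"src": "10.0.1.5", "dst": "10.0.2.9", "dport": 445},
    {"src": "10.0.1.5", "dst": "10.0.2.10", "dport": 445},
    {"src": "10.0.1.5", "dst": "10.0.2.9", "dport": 445},
    {"src": "10.0.1.8", "dst": "10.0.2.9", "dport": 3389},
    {"src": "10.0.3.7", "dst": "8.8.8.8", "dport": 53},
]
print(internal_fanout(flows))  # [('10.0.1.5', 2), ('10.0.1.8', 1)]
```

The ranked pairs map directly onto a bar chart or a link graph of sources to destinations, which is where the audience (security analyst vs. risk team) decides the presentation.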
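The slide on adding context lists machine attributes (role, criticality, location, owner) and user attributes (role, office location) attached to source and destination. A minimal sketch of that enrichment step follows; the inventory dictionaries, their keys, and the output field names are hypothetical stand-ins for whatever asset and identity sources an environment actually has.

```python
# Hypothetical inventories keyed by IP address. In practice these might be
# exported from an asset database and from AD / LDAP or HR systems.
ASSETS = {"10.0.2.9": {"role": "file server", "criticality": "high",
                       "location": "London", "owner": "IT"}}
USERS = {"10.0.1.5": {"user": "alice", "role": "finance",
                      "office": "London"}}


def enrich(flow, assets, users):
    """Return a copy of flow with machine and user context attached to both
    its source and its destination; unknown addresses get empty context."""
    out = dict(flow)
    for side in ("src", "dst"):
        ip = flow[side]
        out[side + "_machine"] = assets.get(ip, {})
        out[side + "_user"] = users.get(ip, {})
    return out


event = enrich({"src": "10.0.1.5", "dst": "10.0.2.9", "dport": 445}, ASSETS, USERS)
print(event["src_user"]["role"], "->", event["dst_machine"]["role"])
# finance -> file server
```

This is the "enrich" side of the "Enrich or join?" question raised on the data lake slide: enriching at ingest records context as it was when the event happened, while joining at query time uses the current inventory but needs a store that can join efficiently.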
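The Show Comparisons slide contrasts a current measure with the week prior. The sketch below computes that comparison for one time bucket; the hourly bucketing, the dictionary representation of the series, and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def week_over_week(series, now):
    """Compare the value in the bucket at `now` with the same bucket one week
    earlier. `series` maps bucket start times to counts (e.g. #IDS alerts).

    Returns (current, prior, percent_change). percent_change is None when the
    prior bucket is missing or zero, since a ratio would be meaningless.
    """
    current = series.get(now, 0)
    prior = series.get(now - timedelta(weeks=1))
    if not prior:
        return current, prior, None
    return current, prior, 100.0 * (current - prior) / prior


now = datetime(2015, 2, 10, 9)  # hypothetical hourly bucket
alerts = {now: 150, now - timedelta(weeks=1): 100}
print(week_over_week(alerts, now))  # (150, 100, 50.0)
```

Comparing against the same hour one week earlier, rather than the previous hour, keeps daily and weekday patterns from showing up as false changes.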
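Among the Top N metrics on the What To Put on Screens slide is "Top 10 servers connecting outbound". A minimal sketch of that ranking, assuming flow records with hypothetical "src" and "dst" keys and approximating "internal" with the standard library's `is_private` check:

```python
from collections import Counter
from ipaddress import ip_address


def top_outbound(flows, n=10):
    """Rank internal sources by the number of flows they opened to
    non-private destinations. Each flow is a dict with hypothetical "src"
    and "dst" keys; "internal" is approximated with is_private."""
    counts = Counter(
        f["src"] for f in flows
        if ip_address(f["src"]).is_private and not ip_address(f["dst"]).is_private
    )
    return counts.most_common(n)


flows = [
    {"src": "10.0.0.1", "dst": "93.184.216.34"},
    {"src": "10.0.0.1", "dst": "93.184.216.34"},
    {"src": "10.0.0.2", "dst": "8.8.8.8"},
    {"src": "10.0.0.3", "dst": "10.0.0.4"},  # internal to internal: ignored
]
print(top_outbound(flows))  # [('10.0.0.1', 2), ('10.0.0.2', 1)]
```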
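Shneiderman's mantra (overview, zoom / filter, details on demand) can be read as three query steps over the same data. The sketch below shows that progression on a handful of hypothetical flow records; the field names and the choice of "bytes per destination port" as the overview are assumptions, not something the slides prescribe.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical flow records; field names are assumptions for illustration.
FLOWS = [
    {"src": "10.0.1.5", "dst": "10.0.2.9", "dport": 445, "bytes": 1200},
    {"src": "10.0.1.5", "dst": "10.0.2.10", "dport": 445, "bytes": 900},
    {"src": "10.0.1.8", "dst": "10.0.2.9", "dport": 3389, "bytes": 5000},
    {"src": "10.0.3.7", "dst": "8.8.8.8", "dport": 53, "bytes": 80},
]


def overview(flows):
    """Overview: total bytes per destination port across all flows."""
    totals = Counter()
    for f in flows:
        totals[f["dport"]] += f["bytes"]
    return dict(totals)


def zoom(flows, subnet):
    """Zoom / filter: keep only flows whose source lies in subnet."""
    net = ip_network(subnet)
    return [f for f in flows if ip_address(f["src"]) in net]


def details(flows, dport):
    """Details on demand: the raw records behind one aggregate."""
    return [f for f in flows if f["dport"] == dport]


print(overview(FLOWS))             # {445: 2100, 3389: 5000, 53: 80}
segment = zoom(FLOWS, "10.0.1.0/24")
print(overview(segment))           # {445: 2100, 3389: 5000}
print(len(details(segment, 445)))  # 2
```

At data lake scale the overview would be a pre-computed aggregate or a SQL GROUP BY rather than a Python loop; the point here is only the order of the steps.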
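The Big Data Lake slide calls a common naming scheme a hard problem: each product names the same field differently. A minimal sketch of a per-product renaming table follows; the product keys and field names are hypothetical, and a real deployment would maintain one mapping per parser.

```python
# Hypothetical per-product mappings onto one common field-naming scheme.
FIELD_MAP = {
    "firewall_a": {"srcip": "src_ip", "dstip": "dst_ip", "dstport": "dst_port"},
    "ids_b": {"source_address": "src_ip", "destination_address": "dst_ip",
              "dest_port": "dst_port", "sig": "signature"},
}


def normalize(product, record):
    """Rename a parsed record's fields to the common scheme; fields without a
    mapping keep their original name so nothing is silently dropped."""
    mapping = FIELD_MAP.get(product, {})
    return {mapping.get(k, k): v for k, v in record.items()}


fw = normalize("firewall_a", {"srcip": "10.0.1.5", "dstport": 445, "action": "deny"})
ids = normalize("ids_b", {"source_address": "10.0.1.5", "sig": "SMB scan"})
print(fw["src_ip"] == ids["src_ip"])  # True
```

Keeping the raw log alongside the normalized record preserves the option to re-parse when a mapping changes, which is the other hard problem the slide names.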
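The SIEM Integration - Advanced slide notes that data sent back to the SIEM must be formatted in a SIEM-readable format such as CEF. A CEF line has a `CEF:0` prefix, six more pipe-delimited header fields (vendor, product, version, signature ID, name, severity), and space-separated key=value extensions; pipes and backslashes are escaped in the header, equal signs and backslashes in extension values. The sketch below assumes a flat dictionary of extensions; the vendor and product names in the example are hypothetical, while `src`, `dst` and `dpt` are standard CEF extension keys.

```python
def _escape_header(value):
    """CEF header fields: escape backslashes first, then pipes."""
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    """CEF extension values: escape backslashes, equal signs and line breaks."""
    return (str(value).replace("\\", "\\\\").replace("=", "\\=")
            .replace("\r", "\\r").replace("\n", "\\n"))


def to_cef(vendor, product, version, signature_id, name, severity, extensions):
    """Build a CEF:0 line: the header fields (vendor, product, version,
    signature ID, name, severity) followed by key=value extensions."""
    header = "|".join(_escape_header(v) for v in
                      (vendor, product, version, signature_id, name, severity))
    ext = " ".join(f"{k}={_escape_extension(v)}" for k, v in extensions.items())
    return f"CEF:0|{header}|{ext}"


print(to_cef("Acme", "FlowViz", "1.0", "100", "Possible lateral movement", 7,
             {"src": "10.0.1.5", "dst": "10.0.2.9", "dpt": 445}))
# CEF:0|Acme|FlowViz|1.0|100|Possible lateral movement|7|src=10.0.1.5 dst=10.0.2.9 dpt=445
```

Backslashes are escaped before the other characters so that the escape characters added in later steps are not themselves doubled.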
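The What I am Working On slide shows anomalies next to a decomposition of the data into seasonal and trend components. The slides do not say which decomposition the product uses, so the sketch below swaps in classical additive decomposition (centered moving-average trend, per-phase seasonal means) and flags residuals more than k standard deviations from their mean. The period of 4, k = 3, and the synthetic series are illustrative assumptions.

```python
from statistics import mean, pstdev


def decompose(xs, period):
    """Classical additive decomposition xs = trend + seasonal + residual.

    Assumes len(xs) covers at least two full periods. Trend and residual
    are None at the edges, where the centered moving average is undefined.
    """
    n, half = len(xs), period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = xs[i - half:i + half + 1]
        if period % 2:  # odd period: plain centered moving average
            trend[i] = sum(window) / period
        else:  # even period: 2 x period moving average, end points half-weighted
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period

    # Seasonal component: mean detrended value per phase, shifted to mean zero.
    by_phase = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            by_phase[i % period].append(xs[i] - t)
    seasonal = [mean(v) for v in by_phase]
    offset = mean(seasonal)
    seasonal = [s - offset for s in seasonal]

    residual = [None if t is None else xs[i] - t - seasonal[i % period]
                for i, t in enumerate(trend)]
    return trend, seasonal, residual


def anomalies(xs, period, k=3.0):
    """Indices whose residual is more than k standard deviations from the
    mean residual."""
    _, _, residual = decompose(xs, period)
    defined = [r for r in residual if r is not None]
    mu, sd = mean(defined), pstdev(defined)
    if sd == 0:
        return []
    return [i for i, r in enumerate(residual)
            if r is not None and abs(r - mu) > k * sd]


# Hypothetical counts with a period of 4 and one injected spike.
series = [10, 20, 30, 20] * 6
series[13] += 50
print(anomalies(series, period=4))  # [13]
```

Removing the seasonal pattern first is what lets a moderate spike stand out: the raw value 70 at index 13 would be unremarkable against a daily peak, but its residual is far outside the spread of the others.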

