IBM Extreme Blue FTP Discovery Week 2 Presentation

TRANSCRIPT

Hi everyone, we're trying a new format for our presentation this week. We've got 21 slides, with 20 seconds per slide. We're hoping not to fall behind, or go too fast and finish way before the slide changes, because that would be really embarrassing...

Today we'll tell you...

What our project is (again)

Where we were

What we've done

What we're going to do

...OK, so today we'll tell you, hopefully with better timing, what our project is, where we were last week, what we've done this week, and what our plans are for next week. I'll start with the overview, and then talk about the way we've organised ourselves. This week we've made some major technical progress, so the business content will be limited, but next week we'll have some really interesting market analysis to share!

What our project is (again)

FTP Discovery is a system for mapping FTP networks and showing companies the risks they are taking in using them. It should then help move infrastructure over to managed file transfer (MFT) systems, and provide ongoing governance to ensure that the system remains efficient and to curtail FTP growth. In implementing the project, one of our main concerns has been how to organise ourselves efficiently to get our work done.

How do we keep track?

So there's obviously lots of super tech stuff going on, but how do we keep track? We needed a solution everyone could access and edit, something extensible and easy to understand. The less time we spend managing our work, the more time we have to actually do it! We built a private wiki, and we're keeping all our tasks and documentation there. I'll hand over to the tech team now, who will give you an overview of our progress.

Where we were

PETER AND JAMES START

So where were we? You probably remember this pretty picture from last Monday, giving a rough overview of the sort of work we'd done. While it was a good starting point, it was still quite limited: we couldn't tell you anything about this data at all. Absolutely nothing. Except, perhaps, where a packet came from or where it was going to in the immediate vicinity.

What we had

One node collecting packet data of interaction with remote hosts

No datastore

No analysis

We had the sniffer working only on a single host, which would track all packets going across the network as they happened. The data we had was volatile; it didn't persist. We also didn't care about some 99% of the traffic, and we only had one central node. We didn't do any analysis of the traffic either, so we couldn't tell you how important any of the nodes were.

What we wanted

Every node in the network reporting data

Store this data

Analysis of important nodes

Visualisation of entire network

What we really wanted was for every single node in the network to report back to us: we wanted to start to be able to track data as it went across a massive network, not just as it hopped from one host to another. We wanted to know which nodes had a lot of data going through them, the ones which were likely to be most important. We also wanted to use our visualisation to see the traffic going to and from every node.

So what did we do?

Every node in the network reporting data

Store this data

Analysis of important nodes

Visualisation of entire network

So what did we actually manage to do? Well, everything. We currently have every node reporting data back to us, which is being stored in our yummy database. James's visualisation is showing the entire network, including links between remote nodes, and Vedika's analysis engine is extracting information about every header and every host, working its magic and starting to work out what's important to us.
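To make that concrete, here is a minimal sketch of the kind of record a sniffer node might report back to the central database. The class, table, and column names (PacketHeaderRecord, packet_headers, and so on) are invented for this example rather than taken from our actual code, and plain JDBC stands in for whatever the real reporting path looks like.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical per-packet record that a sniffer node reports to the central database.
public class PacketHeaderRecord {
    public final String srcHost;
    public final String dstHost;
    public final int srcPort;
    public final int dstPort;
    public final long payloadBytes;
    public final Instant capturedAt;

    public PacketHeaderRecord(String srcHost, String dstHost, int srcPort,
                              int dstPort, long payloadBytes, Instant capturedAt) {
        this.srcHost = srcHost;
        this.dstHost = dstHost;
        this.srcPort = srcPort;
        this.dstPort = dstPort;
        this.payloadBytes = payloadBytes;
        this.capturedAt = capturedAt;
    }

    // Insert this record into a (hypothetical) packet_headers table.
    public void save(Connection db) throws SQLException {
        String sql = "INSERT INTO packet_headers "
                   + "(src_host, dst_host, src_port, dst_port, payload_bytes, captured_at) "
                   + "VALUES (?, ?, ?, ?, ?, ?)";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setString(1, srcHost);
            stmt.setString(2, dstHost);
            stmt.setInt(3, srcPort);
            stmt.setInt(4, dstPort);
            stmt.setLong(5, payloadBytes);
            stmt.setTimestamp(6, Timestamp.from(capturedAt));
            stmt.executeUpdate();
        }
    }
}
```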

Lots of code-writing was involved...

Not all smooth sailing...

Talk about Jpcap

Our second problem wasn't that obvious at first. We were sniffing packets, and had an unusually large amount of data...
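For context, this is roughly what the sniffing side looks like with Jpcap, assuming the standard JpcapCaptor API: open a device, apply a capture filter so the bulk of traffic we don't care about never reaches us, and hand each remaining packet to a callback. It's a sketch of the approach rather than our exact code, and the port-21 filter is just an illustrative choice.

```java
import jpcap.JpcapCaptor;
import jpcap.NetworkInterface;
import jpcap.PacketReceiver;
import jpcap.packet.Packet;
import jpcap.packet.TCPPacket;

// Sketch of a sniffer node: capture packets with Jpcap and ignore
// everything that isn't FTP control traffic.
public class SnifferSketch {
    public static void main(String[] args) throws Exception {
        NetworkInterface[] devices = JpcapCaptor.getDeviceList();
        // Open the first device in promiscuous mode, 64KB snap length, 20ms timeout.
        JpcapCaptor captor = JpcapCaptor.openDevice(devices[0], 65535, true, 20);

        // Kernel-level filter: drop the ~99% of traffic we don't care about.
        captor.setFilter("tcp port 21", true);

        captor.loopPacket(-1, new PacketReceiver() {
            public void receivePacket(Packet packet) {
                if (packet instanceof TCPPacket) {
                    TCPPacket tcp = (TCPPacket) packet;
                    // In the real system this is where the header metadata
                    // would be reported back to the central database.
                    System.out.println(tcp.src_ip + ":" + tcp.src_port
                            + " -> " + tcp.dst_ip + ":" + tcp.dst_port);
                }
            }
        });
    }
}
```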

Architecture

Nodes collect data and report back to a centralised database

Analysis of database

Visualisation of network

We needed to come up with a sensible way to architect this project, and it was important that we keep it as modular and scalable as possible. We have a centralised database which all the nodes report back to. Vedika's analysis engine trawls through all of this data periodically and tries to work out which of our hosts and packets are important. Our visualisation also connects to this central database in order to build up its graph.

Architecture (2)

Before we can do anything else, we need data, and lots of it. Our current packet sniffer runs on a host-by-host basis, tracks the data going from that host to other hosts, and sends all of this information back to our central database. The database itself stores all of the metadata associated with each packet header, the links between nodes, and also the analysis metrics which Vedika will talk about in a few moments.
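As a rough illustration of what the central store could look like, here is one possible relational layout: a table of captured header metadata, a table of links between nodes for the visualisation, and a table of per-node metrics for the analysis engine to write into. The table and column names, and the in-memory H2 database, are assumptions made for the sketch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Illustrative schema for the central database (all names are hypothetical).
public class SchemaSketch {
    public static void createTables(Connection db) throws Exception {
        try (Statement stmt = db.createStatement()) {
            // Raw header metadata reported by each sniffer node.
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS packet_headers ("
              + "  id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,"
              + "  src_host VARCHAR(255), dst_host VARCHAR(255),"
              + "  src_port INT, dst_port INT,"
              + "  payload_bytes BIGINT, captured_at TIMESTAMP)");
            // Aggregated links between nodes, used by the visualisation.
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS links ("
              + "  src_host VARCHAR(255), dst_host VARCHAR(255),"
              + "  packet_count BIGINT, byte_count BIGINT,"
              + "  PRIMARY KEY (src_host, dst_host))");
            // Per-node metrics written back by the analysis engine.
            stmt.executeUpdate(
                "CREATE TABLE IF NOT EXISTS node_metrics ("
              + "  host VARCHAR(255) PRIMARY KEY,"
              + "  importance DOUBLE PRECISION,"
              + "  business_process_score DOUBLE PRECISION)");
        }
    }

    public static void main(String[] args) throws Exception {
        // Example only: an in-memory H2 database standing in for the real store.
        try (Connection db = DriverManager.getConnection("jdbc:h2:mem:ftpdiscovery")) {
            createTables(db);
        }
    }
}
```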

Demo!

Visualisation (screenshot)

This is the updated visualisation. While it might not look very different from what we had last week, it is actually getting information about every node and every link from the database, rather than running live as it was before. As you can see, we have our sniffer running on two different hosts, each of which is reporting back its network traffic data: this is the stuff that you saw in the previous slide.
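Reading the graph back out of the database is about as simple as it sounds: the visualisation only needs to query the link table and build an adjacency structure from it. A hedged sketch, reusing the hypothetical links table from the schema example above:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: build an adjacency map from the (hypothetical) links table
// so the visualisation can draw every node and every edge.
public class GraphLoader {
    public static Map<String, Set<String>> loadGraph(Connection db) throws Exception {
        Map<String, Set<String>> adjacency = new HashMap<>();
        try (Statement stmt = db.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT src_host, dst_host FROM links")) {
            while (rs.next()) {
                String src = rs.getString("src_host");
                String dst = rs.getString("dst_host");
                adjacency.computeIfAbsent(src, k -> new HashSet<>()).add(dst);
                adjacency.computeIfAbsent(dst, k -> new HashSet<>()); // keep isolated nodes visible
            }
        }
        return adjacency;
    }
}
```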

Next week

Refinement

Visualisation

Analysis

PETER AND JAMES END

Planning our analysis

So now we have a nice database with tons of data, but what do we do with it? We have to analyse it to get something sensible out of it. For this I was researching rules engines, but with the scope of the project and the time we have at hand, that did not seem feasible. I was kind of working like this guy. So I decided to build my own analysis engine, but in a way that we could replace it with a proper tool later.
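Keeping the engine replaceable mostly comes down to hiding it behind a small seam, so a proper rules engine could be dropped in later without touching the rest of the system. Something along these lines, where the interface name and method are ours for illustration:

```java
import java.sql.Connection;

// Hypothetical seam: the rest of the system only depends on this interface,
// so the home-grown engine can later be swapped for a proper rules engine.
public interface AnalysisEngine {
    // Read captured packet data from the central database, compute metrics,
    // and write the results back.
    void analyse(Connection db) throws Exception;
}
```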

The current analysis engine

V1.0 of the analysis engine is up and running. It analyses captured packet headers and runs analysis on the traffic through each node to estimate how important the node is. It also tries to measure the chance of a transfer being part of a business process. It is pretty rudimentary at the moment, and this example is based on simulated data, but it is still easy to see that node 5 is very underused.
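To give a flavour of how rudimentary "rudimentary" is, a node-importance metric at this stage could be as simple as each node's share of the total bytes observed, which is exactly the kind of number that makes an under-used node stand out. This is an illustrative stand-in, not the engine's actual formula, and the simulated byte counts below are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a rudimentary importance metric: each node's share of total traffic.
// Illustrative stand-in only, not the engine's exact formula.
public class ImportanceSketch {
    public static Map<String, Double> importanceByTrafficShare(Map<String, Long> bytesPerNode) {
        long total = bytesPerNode.values().stream().mapToLong(Long::longValue).sum();
        Map<String, Double> importance = new HashMap<>();
        for (Map.Entry<String, Long> e : bytesPerNode.entrySet()) {
            importance.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return importance;
    }

    public static void main(String[] args) {
        // Made-up simulated traffic volumes for three nodes.
        Map<String, Long> simulated = new HashMap<>();
        simulated.put("node1", 40_000L);
        simulated.put("node2", 35_000L);
        simulated.put("node5", 500L); // a very under-used node gets a score close to zero
        System.out.println(importanceByTrafficShare(simulated));
    }
}
```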

Integrating with the infrastructure

The engine has been integrated with the central database: it reads the captured packet data, runs its analysis, and writes the results back to the database. Both the captured data and the engine are basic, but this gives us a base infrastructure. I plan to iterate on the engine each week, making the results more refined and the metrics more sensible.
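The integration itself is little more than running the engine on a schedule against the central database. A minimal sketch of that loop, assuming the AnalysisEngine seam sketched earlier; the ten-minute interval, the placeholder engine body, and the JDBC URL are all stand-ins:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: run the analysis engine periodically against the central database.
public class AnalysisRunner {
    public static void main(String[] args) {
        // Placeholder engine body; the real v1.0 engine reads the captured
        // packet data, computes its metrics, and writes them back.
        AnalysisEngine engine = db -> System.out.println("analysis pass complete");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            // The JDBC URL is a stand-in for the real central database.
            try (Connection db = DriverManager.getConnection("jdbc:h2:mem:ftpdiscovery")) {
                engine.analyse(db);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 10, TimeUnit.MINUTES);
    }
}
```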

Thanks for listening, we hope you've enjoyed our presentation. Any questions?