SplunkLive! Milan 2015 - Fastweb

Post on 19-Aug-2015


Copyright © 2014 Splunk Inc.

Alessandro Bono Vincenzo Vignera

Splunk at Fastweb

2

Splunk at Fastweb

Alessandro Bono: Network Operations Control Coordinator

Vincenzo Vignera: Network Operations Control Professional

3

Fastweb Overview

Today FASTWEB is the Italian leader in Ultra Broadband (UBB)

– ~310K FTTH Customers, of which FASTWEB ~300K

– ~400K FTTC Customers, of which FASTWEB ~200K

– ~710K UBB Customers in total, of which FASTWEB ~500K (~70%)

With ~500K customers connected at speeds up to 100 Mbps, FASTWEB has a ~70% share of the UBB market

4

Background and Roles

Technology Division - Network Operations

– Business Process Improvement

– Operational Planning

– DataCenters Operation

– NOC

– Service Platforms

– Backbone

Alessandro Bono: in Fastweb since 2006, Backbone - Network Access Operations

Vincenzo Vignera: in Fastweb since 2001, Support Platforms - Monitoring Platforms

5

Backbone

Access Network

– 15k FTTC Devices

– 6k ADSL Devices

– 1k FTTH Devices

– 2k Core Network Devices

– 24k Access Equipment

6

Service Platforms and Monitoring Platforms

– OSS Platforms

– VAS & Mobile Data Platforms

– ~3.1 million mailboxes

– 815K MVNO USIMs

– ~200k DNS queries/sec

– 1.1 million ACS devices

– 2 million pay-per-use users

– 4k servers monitored with agents

– 200k network devices

– 4.5 million KPIs collected

7

Splunk at Fastweb

Release 1 - 2014

– Indexers

– Heavy Forwarders

– Search Head

Milano, Roma, Genova, Torino, Padova

Release 2 - April 2015

– Indexers

– Heavy Forwarders

– Search Head

– Universal Forwarders

200 GB/day

?

8

Reporting Delivered Services

Standard Reporting of Delivered Services

– Situation: The Service Platforms team and the Backbone team spend a lot of time reporting on delivered services

– Struggling with: Dozens of platforms, each reporting different KPIs

– Wanted: A centralized view for periodic reporting on delivered services

9

Reporting Delivered Services

Before:

# Monitoring Software

# CLI Command

# Database Queries

# Code

# …

After:

Enter Splunk: Splunk Enterprise enables reporting for different services with the same output

10

Analyze Bypass of Spammer Filters

– Situation: Real-time log analysis of transactions sent from a single IP address that satisfy at least two of the following conditions:

• 2 or more recipients

• At least 20 mails ("QUEUE From" with different IDs in 5 minutes)

• At least 2 different From addresses

• At least 1 e-mail known as spam (SPAM-BLOCKED)

- Next, starting from the mailbox used in «Auth»: drill-down report of mail sent and % of «Subject» flagged as spam

- Top spammers by source IP (latest 15 minutes)

- Internet forwarding check vs. the Fastwebnet domain (reporting mailboxes with more than 1 forward vs. Fastwebnet, with an external database lookup to retrieve the customer account)

SPAM Finder: Analyzing Problems

11

index="msr" sourcetype="c*_smtp" (transaction_type=QUEUE OR transaction_type=SPAM-BLOCKED)
| stats first(_time) AS time, values(transaction_type) AS type, values(Recipient) AS Recipients, dc(Recipient) AS nb_recipients, values(Relay) AS Relay, values(Auth) AS Auth, values(From) AS From BY transaction_id
| search Auth=*
| eval more_than_2_recipients=if(nb_recipients>=2,1,0)
| eval spam_blocked=if(type="SPAM-BLOCKED",1,0)
| stats first(time) AS first_time, dc(transaction_id) AS nb_mails, values(From) AS Froms, dc(From) AS nb_froms, sum(more_than_2_recipients) AS nb_more_than_2_recipients, sum(spam_blocked) AS nb_spam_blocked BY Relay, Auth
| eval more_than_2_recipients=if(nb_more_than_2_recipients>0,1,0)
| eval spam_blocked=if(nb_spam_blocked>0,1,0)
| eval more_than_20_mails=if(nb_mails>=20,1,0)
| eval more_than_2_froms=if(nb_froms>=2,1,0)
| eval possible_spam=more_than_2_recipients+more_than_20_mails+more_than_2_froms+spam_blocked
| where possible_spam>=2
| eval first_sent_at=strftime(first_time, "%H:%M:%S")
| eval possible_spam="yes"
| table first_sent_at Relay Auth Froms more_than_2_recipients more_than_20_mails more_than_2_froms spam_blocked possible_spam
| sort - first_sent_at

SPAM Finder: Analyzing Problems
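
To make the two-stage aggregation in the query easier to follow, here is a minimal Python sketch of the same scoring idea. It is not the presenters' code: the function name `find_possible_spammers` and the dictionary-based event format are assumptions for illustration, and it simplifies the search by assuming one Relay and one Auth value per transaction. The field names (`transaction_id`, `transaction_type`, `Recipient`, `Relay`, `Auth`, `From`) are taken from the query.

```python
from collections import defaultdict


def find_possible_spammers(events):
    """Return {(relay, auth): score} for senders meeting at least 2 of the 4 conditions.

    `events` is an iterable of dicts (hypothetical format) keyed by the field
    names used in the SPL search.
    """
    # Stage 1: collapse raw log events into one record per transaction_id.
    tx = defaultdict(lambda: {"types": set(), "recipients": set(),
                              "froms": set(), "relay": None, "auth": None})
    for e in events:
        t = tx[e["transaction_id"]]
        t["types"].add(e["transaction_type"])
        if e.get("Recipient"):
            t["recipients"].add(e["Recipient"])
        if e.get("From"):
            t["froms"].add(e["From"])
        t["relay"] = e.get("Relay") or t["relay"]
        t["auth"] = e.get("Auth") or t["auth"]

    # Stage 2: aggregate transactions per (Relay, Auth) sender.
    senders = defaultdict(lambda: {"mails": 0, "froms": set(),
                                   "multi_rcpt": 0, "blocked": 0})
    for t in tx.values():
        if not t["auth"]:  # mirrors `search Auth=*`
            continue
        s = senders[(t["relay"], t["auth"])]
        s["mails"] += 1
        s["froms"] |= t["froms"]
        s["multi_rcpt"] += len(t["recipients"]) >= 2
        s["blocked"] += "SPAM-BLOCKED" in t["types"]

    # Score each sender: one point per satisfied condition, keep scores >= 2.
    flagged = {}
    for key, s in senders.items():
        score = ((s["multi_rcpt"] > 0) + (s["mails"] >= 20)
                 + (len(s["froms"]) >= 2) + (s["blocked"] > 0))
        if score >= 2:
            flagged[key] = score
    return flagged
```

A sender with two transactions, one of them addressed to two recipients and each using a different From address, scores 2 and is flagged; a sender with a single one-recipient mail scores 0 and is not.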

12

Storming Detections

Detect Storming Network Devices

– Situation: Network devices can log thousands of syslog messages every second because of interface problems

– Wanted: A network devices dashboard to analyze trends

13

Storming Detections

- Enter Splunk:

- Analyzing trends supported by dashboards

- Automatic Actions

- Monitoring Deviations
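
The slides do not show how storming devices are identified, so the following is only a sketch of one simple approach: count syslog messages per device per time bucket and report devices that exceed a threshold. The function name `detect_storming`, the `(timestamp, device)` input format, and the default threshold of 1000 messages per one-second bucket are all assumptions chosen for illustration.

```python
from collections import Counter


def detect_storming(messages, threshold=1000, bucket_seconds=1):
    """Return {device: peak_count} for devices exceeding `threshold` in any bucket.

    `messages` is an iterable of (timestamp_in_seconds, device_name) pairs
    (hypothetical format).
    """
    # Count messages per (device, time bucket).
    counts = Counter()
    for ts, device in messages:
        counts[(device, int(ts // bucket_seconds))] += 1

    # Keep the highest bucket count for every device that crossed the threshold.
    peaks = {}
    for (device, _bucket), n in counts.items():
        if n > threshold:
            peaks[device] = max(n, peaks.get(device, 0))
    return peaks
```

A fixed threshold is the simplest choice; comparing each bucket against the device's own historical average would be closer to the "Monitoring Deviations" item, at the cost of keeping a baseline per device.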

14

Service Dashboard Monitoring

# Monitoring Software

# CLI Command

# Database Queries

# Code

# …

15

Logs and Scripts

Monitoring Backbone Link Customer Connectivity

16

Proactive Monitoring SNMP

SNMP App Single Device Check

17

Network Troubleshooting

Troubleshooting Bug on Network Devices

– Situation: A problem on 15k network devices; every ADSL board serves 48 customers, so ~700K customers were affected and unable to browse until the board was reset

– Struggling with: Thousands of calls to the Customer Center reporting the problem

– Wanted: Decrease recovery time from 3h to 1h

18

Network Troubleshooting – First Step

Enter Splunk:

– Customer Care uses automated tools to check customer connectivity

– Splunk intercepts the actions of the automated tools

– We reduced problem reports by 50%

19

Splunk – Resolution

Enter Splunk:

– Find the bug

– Implement an automated system to detect the bug

– Splunk launches an automated script to reset the board

Customer Care calls decreased by 100%
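
The slides do not describe the detection logic or the reset script, so the sketch below only illustrates the general pattern of "detect, then trigger an action". It assumes, hypothetically, that each intercepted connectivity check carries a `board`, a `customer`, and an `ok` flag, and that a board becomes a reset candidate once a minimum number of distinct customers on it fail the check. The names `boards_to_reset`, `run_resets`, and the `reset_board` callback are invented for this example.

```python
from collections import defaultdict


def boards_to_reset(checks, min_failed_customers=3):
    """Return sorted board IDs with at least `min_failed_customers` distinct failing customers."""
    failed = defaultdict(set)
    for check in checks:
        if not check["ok"]:
            failed[check["board"]].add(check["customer"])
    return sorted(board for board, customers in failed.items()
                  if len(customers) >= min_failed_customers)


def run_resets(checks, reset_board, min_failed_customers=3):
    """Call the caller-supplied `reset_board` callback once per suspect board."""
    boards = boards_to_reset(checks, min_failed_customers)
    for board in boards:
        reset_board(board)
    return boards
```

Passing the reset action in as a callback keeps the detection logic testable without touching real equipment; in a Splunk deployment the equivalent step would be an alert that runs a script.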

Thank You