SplunkLive! Milan 2015 - Fastweb
TRANSCRIPT
2
Splunk at Fastweb
Alessandro Bono, Network Operations Control Coordinator
Vincenzo Vignera, Network Operations Control Professional
3
Fastweb Overview
Today FASTWEB is the Italian leader in Ultra Broadband (UBB)
With ~500K customers connected at speeds up to 100 Mbps, FASTWEB has a ~70% share of the UBB market
~310K FTTH Customers, of which FASTWEB ~300K
~400K FTTC Customers, of which FASTWEB ~200K
~710K UBB Customers, of which FASTWEB ~500K (~70%)
4
Background and Roles
Business Process Improvement
Operational Planning
Technology Division - Network Operations
DataCenters Operation
NOC
Service Platforms
Backbone
Alessandro Bono: in Fastweb since 2006
Backbone - Network Access Operations
Vincenzo Vignera: in Fastweb since 2001
Support Platforms - Monitoring Platforms
5
Backbone
Access Network
15k FTTC Devices
6k ADSL Devices
1k FTTH Devices
2k Core Network Devices
24k Access Equipment
6
Service Platforms and Monitoring Platforms
OSS Platforms
VAS & Mobile Data Platforms
~3.1 million mailboxes
815K MVNO USIMs
~200k DNS queries/sec
1.1 million ACS devices
2 million pay-per-use users
4k servers monitored with agents
200k network devices
4.5 million KPIs collected
7
Splunk at Fastweb
Indexers, Heavy Forwarders, Search Head
Release 1 - 2014
Milano, Roma, Genova, Torino, Padova
Indexers, Heavy Forwarders, Search Head, Universal Forwarders
Release 2 - April 2015
200 GB/day
?
8
Reporting Delivered Services
Standard Reporting of Delivered Services
– Situation: The Service Platforms team and the Backbone team spend a lot of time reporting on delivered services
– Struggling with: Dozens of platforms reporting different KPIs
– Wanted: A centralized view for periodic reporting on delivered services
9
Reporting Delivered Services
Before:
# Monitoring Software
# CLI Command
# Database Queries
# Code
# …
After:
Enter Splunk: Splunk Enterprise enables reporting for different services with the same output
10
Analyze Bypass SPAMMER Filters
– Situation: Real-time log analysis of transactions sent from a single IP address that satisfy at least two of the following conditions:
• 2 or more recipients
• At least 20 mails ("QUEUE From" with different IDs in 5 minutes)
• At least 2 different From addresses
• At least 1 e-mail known as spam (SPAM-BLOCKED)
- Next, starting from the «Auth» mailbox used, a drill-down report of the mail sent and the % of «Subject» values flagged as SPAM
- Top spammers by source IP (latest 15 minutes)
- Internet forwarding check vs the Fastwebnet domain (report mailboxes with more than 1 forward vs Fastwebnet; external database lookup to retrieve the customer account)
SPAM Finder: Analyzing Problems
11
index="msr" sourcetype="c*_smtp" (transaction_type=QUEUE OR transaction_type=SPAM-BLOCKED)
| stats first(_time) AS time, values(transaction_type) AS type, values(Recipient) AS Recipients, dc(Recipient) AS nb_recipients, values(Relay) AS Relay, values(Auth) AS Auth, values(From) AS From BY transaction_id
| search Auth=*
| eval more_than_2_recipients=if(nb_recipients>=2,1,0)
| eval spam_blocked=if(type="SPAM-BLOCKED",1,0)
| stats first(time) AS first_time, dc(transaction_id) AS nb_mails, values(From) AS Froms, dc(From) AS nb_froms, sum(more_than_2_recipients) AS nb_more_than_2_recipients, sum(spam_blocked) AS nb_spam_blocked BY Relay, Auth
| eval more_than_2_recipients=if(nb_more_than_2_recipients>0,1,0)
| eval spam_blocked=if(nb_spam_blocked>0,1,0)
| eval more_than_20_mails=if(nb_mails>=20,1,0)
| eval more_than_2_froms=if(nb_froms>=2,1,0)
| eval possible_spam=more_than_2_recipients+more_than_20_mails+more_than_2_froms+spam_blocked
| where possible_spam>=2
| eval first_sent_at=strftime(first_time, "%H:%M:%S")
| eval possible_spam="yes"
| table first_sent_at Relay Auth Froms more_than_2_recipients more_than_20_mails more_than_2_froms spam_blocked possible_spam
| sort - first_sent_at
SPAM Finder: Analyzing Problems
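The scoring logic of the search above (flag an authenticated sender when at least two of the four conditions hold) can be sketched outside Splunk as follows. This is only an illustrative Python sketch, not the Fastweb implementation: the event dictionary keys (`transaction_id`, `transaction_type`, `relay`, `auth`, `sender`, `recipient`) and the function name are assumptions, and the 5-minute time window is assumed to have been applied by the caller when selecting the events.

```python
from collections import defaultdict


def find_possible_spammers(events, min_conditions=2):
    """Return {(relay, auth): flags} for accounts meeting >= min_conditions.

    `events` is an iterable of dicts with hypothetical keys that mirror the
    fields used in the search: transaction_id, transaction_type, relay,
    auth, sender, recipient.
    """
    # Step 1: collapse raw log events into one record per transaction.
    transactions = defaultdict(lambda: {
        "types": set(), "recipients": set(), "senders": set(),
        "relay": None, "auth": None,
    })
    for ev in events:
        tx = transactions[ev["transaction_id"]]
        tx["types"].add(ev["transaction_type"])
        if ev.get("recipient"):
            tx["recipients"].add(ev["recipient"])
        if ev.get("sender"):
            tx["senders"].add(ev["sender"])
        tx["relay"] = ev.get("relay") or tx["relay"]
        tx["auth"] = ev.get("auth") or tx["auth"]

    # Step 2: aggregate authenticated transactions per (relay, auth) pair.
    accounts = defaultdict(lambda: {
        "mails": 0, "senders": set(), "multi_recipient": 0, "spam_blocked": 0,
    })
    for tx in transactions.values():
        if not tx["auth"]:
            continue  # same effect as "search Auth=*"
        acc = accounts[(tx["relay"], tx["auth"])]
        acc["mails"] += 1
        acc["senders"] |= tx["senders"]
        acc["multi_recipient"] += int(len(tx["recipients"]) >= 2)
        acc["spam_blocked"] += int("SPAM-BLOCKED" in tx["types"])

    # Step 3: evaluate the four conditions and keep accounts meeting enough.
    suspects = {}
    for key, acc in accounts.items():
        flags = {
            "more_than_2_recipients": acc["multi_recipient"] > 0,
            "more_than_20_mails": acc["mails"] >= 20,
            "more_than_2_froms": len(acc["senders"]) >= 2,
            "spam_blocked": acc["spam_blocked"] > 0,
        }
        if sum(flags.values()) >= min_conditions:
            suspects[key] = flags
    return suspects
```

The thresholds (2 recipients, 20 mails, 2 From addresses) are taken from the conditions listed on the previous slide; everything else is a simplification for illustration.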
12
Storming Detections
Detect Storming Network Devices
– Situation: Network devices can log thousands of syslog messages every second because of interface problems
– Wanted: A network devices dashboard to analyze trends
13
Storming Detections
- Enter Splunk:
- Analyzing trends, supported by a dashboard
- Automatic actions
- Monitoring deviations
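One simple way to think about storm detection is to count syslog messages per device in fixed time windows and flag any window above a threshold. The Python sketch below illustrates that idea only; the slides do not describe the actual searches or thresholds, so the function name, the 60-second window and the 1000-message threshold are all assumptions.

```python
from collections import Counter


def detect_storming_devices(messages, window_seconds=60, threshold=1000):
    """Return {device: [(window_start, count), ...]} for storming devices.

    `messages` is an iterable of (timestamp_seconds, device) tuples.
    The window size and threshold are illustrative defaults, not values
    taken from the presentation.
    """
    # Count messages per device and per fixed-size time bucket.
    counts = Counter()
    for timestamp, device in messages:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(device, window_start)] += 1

    # Keep only the buckets whose count reaches the threshold.
    storms = {}
    for (device, window_start), count in counts.items():
        if count >= threshold:
            storms.setdefault(device, []).append((window_start, count))
    for windows in storms.values():
        windows.sort()
    return storms
```

In practice the threshold would probably be derived from each device's normal message rate, which is what "monitoring deviations" suggests, rather than fixed globally.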
17
Network Troubleshooting
Troubleshooting Bug on Network Devices
– Situation: Problem on 15k network devices; every ADSL board provides service to 48 customers, so ~700K customers are affected and unable to surf until the board is reset
– Struggling with: Thousands of calls to the customer center to report the problem
– Wanted: Decrease recovery time from 3h to 1h
18
Network Troubleshooting – First Step
Enter Splunk:
– Customer Care uses automated tools to check customer connectivity
– Intercept the actions of the automated tools
– We decreased reporting by 50%
19
Splunk – Resolution
Enter Splunk:
– Find the bug
– Implement an automated system to find the bug
– Splunk launches an automated script to reset the board
Customer Care calls decreased by 100%
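The detect-and-reset loop described above can be sketched roughly as follows. This is a hedged illustration, not the script used at Fastweb: the slides give neither the bug signature nor the reset mechanism, so the regular expression passed as `signature` (which must define the named groups `device` and `board`) and the `reset_board` callback are hypothetical placeholders supplied by the caller.

```python
import re


def boards_to_reset(log_lines, signature, already_reset=frozenset()):
    """Return the (device, board) pairs whose logs match the bug signature.

    `signature` is a hypothetical regular expression with the named groups
    `device` and `board`; the real signature is not given in the slides.
    """
    pattern = re.compile(signature)
    seen = set(already_reset)
    pending = []
    for line in log_lines:
        match = pattern.search(line)
        if not match:
            continue
        board = (match.group("device"), match.group("board"))
        if board not in seen:  # reset each board at most once
            seen.add(board)
            pending.append(board)
    return pending


def remediate(log_lines, signature, reset_board, already_reset=frozenset()):
    """Call reset_board(device, board) for every affected board found."""
    done = []
    for device, board in boards_to_reset(log_lines, signature, already_reset):
        reset_board(device, board)  # caller-supplied action (assumption)
        done.append((device, board))
    return done
```

Tracking the boards that were already reset avoids triggering repeated resets while the same log lines are still inside the search window.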