Application Logging Best Practices
Agenda
§ Reality of Event Logging
§ Liberating Application Data
§ Operational Best Practices
§ Data Enrichment | Other Data Sources
§ More Developer Tools
Reality of Event Logging
The Accelerating Pace of Data
Volume | Velocity | Variety | Variability
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
Machine data is the fastest growing, most complex, most valuable area of big data
Event Logs Suck
[Figure: sources of machine data: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID. Deployment: On-Premises, Private Cloud, Public Cloud.]
§ They have some structure
§ Structure is not consistent
§ Structure is non-standard
§ Keys can be stored separately
§ High volume, growing every day
§ Hard to access
§ Take up tons of space
§ Clog up the network
Event Logs Suck
050818 16:19:31 2 Query UPDATE xar_session_info SET xar_vars = 'XARSVuid|i:2;XARSVrand|i:343223999;XARSVuaid|s:2:\"29\";XARSVbrowsername|s:9:\"Netscape6\";XARSVbrowserversion|s:3:\"5.0\";XARSVosname|s:7:\"Unknown\";XARSVosversion|s:7:\"Unknown\";XARSVnavigationLocale|s:11:\"en_US.utf-8\";SPLUNKAPP_IP|N;', xar_lastused = 1124407171 WHERE xar_sessid ='ll7joq442223fl6h07v3f3vpd2'
10 Query UPDATE xar_session_info SET xar_vars = 'XARSVuid|i:2;XARSVrand|i:89426315;XARSVuaid|s:2:\"29\";XARSVbrowsername|s:9:\"Netscape6\";XARSVbrowserversion|s:3:\"5.0\";XARSVosname|s:7:\"Unknown\";XARSVosversion|s:7:\"Unknown\";XARSVnavigationLocale|s:11:\"en_US.utf-8\";SPLUNKAPP_IP|N;', xar_lastused = 1124407193 WHERE xar_sessid = 't2idg584t1co0scgj40qnnm'
31 Connect [email protected] on cave
Jun 2 13:36:50 DEBUG[1826]: Setting NAT on RTP to 0
Jun 2 13:36:50 DEBUG[1826]: Check for res for 5008office
Jun 2 13:36:50 DEBUG[1826]: Call from user '5008office' is 1 out of 0
Jun 2 13:36:50 DEBUG[1826]: build_route: Contact hop: <sip:[email protected]:5060>
Jun 2 13:36:50 VERBOSE[10887]: -- Executing Macro("SIP/5008office-dfbd", "
Apr 29 19:13:01 45.2.98.7 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow Scan",Type="NOTICE",Priority="High",Body="Appliance=roach_motel.enet.interop.net,Reporting Segment=ENET network,Action=Response disabled,Response=Slow Scan,Duration=90 seconds,Source Segment=Unprotected,Source IP=88.73.39.200,Source MAC=00:01:30:BC:93:90,Current Target Count=0"
Apr 29 19:13:01 45.2.98.7 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow Scan",Type="NOTICE",Priority="High",Body="Appliance=roach_motel.enet.interop.net,Reporting Segment=ENET network,Action=Response disabled,Response=Slow Scan,Duration=69 seconds,Source Segment=Unprotected,Source IP=68.163.20.95,Source MAC=00:01:30:BC:93:90,Current Target Count=0"
Apr 29 19:13:01 45.2.98.7 SentriantGenericAlert: Time="04/29/06 07:12 PM PDT",Host="roach_motel.enet.interop.net",Category="fabric_network_activity",Generator="Response:Slow
45.2.98.7 SentriantGenericAlert:Time="04
Event Logs Rock
• Ensure system security
• Meet compliance mandates
• Customer behavior and experience
• Product and service usage
• End-to-end transaction visibility
Definitive record of activity and behavior
Important insight for IT and the business
10.2.1.44 - [25/Sep/2009:09:52:30 -0700] type=USER_LOGIN msg=audit(1253898008.056:199891): user pid=25702 uid=0 auid=4294967295 msg='acct="TAYLOR": exe="/usr/sbin/sshd" (hostname=?, addr=10.2.1.48, terminal=sshd res=failed)'
Highlighted fields: User | IP | Action | Login Result
10.2.1.80 - - [25/Jan/2010:09:52:30 -0700] "GET /petstore/product.screen?product_id=AV-CB-01 HTTP/1.1" 200 9967 "http://10.2.1.224/petstore/category.screen?category_id=BIRDS" "Mozilla/5.0 (compatible; Konqueror/3.1; Linux)" "JSESSIONID=xZDTK81Gjq9gJLGWnt2NXrJ2tpGZb1HyHHV8hJGYFj1DFByvL5L!-1539148667"
Highlighted fields: User IP | Product | Category
Gold Mine of Information
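As a rough illustration of how the highlighted fields can be pulled out of the web access event above, here is a minimal Python sketch. The regular expression and the `extract_fields` helper are assumptions written for this one layout (the session cookie is omitted for brevity); they are not how Splunk itself extracts fields.

```python
import re
from urllib.parse import urlparse, parse_qs

# Web access event from the slide, joined onto one line (session cookie omitted).
EVENT = (
    '10.2.1.80 - - [25/Jan/2010:09:52:30 -0700] '
    '"GET /petstore/product.screen?product_id=AV-CB-01 HTTP/1.1" 200 9967 '
    '"http://10.2.1.224/petstore/category.screen?category_id=BIRDS" '
    '"Mozilla/5.0 (compatible; Konqueror/3.1; Linux)"'
)

# Hypothetical pattern for this layout only: client IP, request line, status, size, referer.
PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" \d+ \d+ "(?P<referer>[^"]*)"'
)


def extract_fields(event: str) -> dict:
    """Return the user IP, product and category found in one access-log event."""
    match = PATTERN.match(event)
    if match is None:
        return {}
    # The product is in the request's query string, the category in the referer's.
    request_query = parse_qs(urlparse(match["url"]).query)
    referer_query = parse_qs(urlparse(match["referer"]).query)
    return {
        "user_ip": match["ip"],
        "product": request_query.get("product_id", [None])[0],
        "category": referer_query.get("category_id", [None])[0],
    }


print(extract_fields(EVENT))
```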
Interpretation = Real Business Value
The Mighty Application Log
Operations
§ How many transactions are failing?
§ Which specific transactions are failing?
§ Is system performance falling behind?
Security
§ Who is accessing the app? When?
§ What activity looks suspicious?
§ Is the application behaving as expected?
Business Intelligence
§ What is the purchase volume over time?
§ How do purchases compare to last month?
§ How are customers affected by app issues?
Social/Mobile
§ How is the customer experience?
§ Are transactions taking too long?
§ Where are transactions happening?
Traditional Analytics
SELECT customers.* FROM customers WHERE customers.customer_id NOT IN (SELECT customer_id FROM orders WHERE year(orders.order_date) = 2004)
Early Structure Binding
Structure
§ Schema created at design time
§ Queries understood at design time
Data
§ Homogeneous
§ Must fit into tables or be converted to tables
§ Must match constraints
Analytics with Splunk
Late Structure Binding
Structure
§ Schema-less
§ Created at search time
§ Queries executed ad hoc
Data
§ Heterogeneous
§ Constantly changing
§ No conversion required
§ No constraints
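To make the idea of late structure binding concrete, the following Python sketch keeps events as raw text and only extracts fields when a query runs. The sample events, the `KV` pattern, and the `search` helper are invented for illustration and say nothing about how Splunk implements search-time extraction.

```python
import re

# Raw events are stored untouched; no schema is declared up front.
RAW_EVENTS = [
    "2014-08-12 09:16:35 INFO action=purchase user=alice amount=30",
    "2014-08-12 09:17:02 ERROR action=purchase user=bob errorcode=454",
    "2014-08-12 09:18:10 INFO action=login user=carol",
]

# Hypothetical extraction rule: any key=value token becomes a field.
KV = re.compile(r"(\w+)=(\S+)")


def search(events, **criteria):
    """Yield the fields of every event whose extracted fields match the criteria."""
    for raw in events:
        fields = dict(KV.findall(raw))  # structure is bound here, at read time
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


purchases = list(search(RAW_EVENTS, action="purchase"))
print(purchases)
```

Because nothing is normalized at write time, a new field such as `errorcode` can appear in some events without any change to the stored data or to earlier queries.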
Gain Intelligence Quickly
Early Structure Binding
§ Decide question to ask
§ Design the schema
§ Normalize data + write DB insertion code
§ Create SQL & feed into analytics tool
§ Days – Weeks – Months
§ Destructive
Late Structure Binding
§ Write Semantic Events
§ Collect
§ Create Searches, Reports & Graphs
§ Minutes
§ Non-Destructive
Liberating Application Data
Current State
§ You have no control over other systems' events
§ You have full control over events that YOU write
§ Most events are written by developers to help them debug
§ Some events are written to form an audit trail
Logging with Purpose
§ Logging for Debugging
  § Troubleshoot application problems
  § Identify trends
  § Categorize issues
§ Semantic Logging
  § Record the state of business processes
  § Examples: web clicks, financial trades, cell phone connections, audit trails, etc.
void submitPurchase(long purchaseId) {
    // Record the start of the business transaction.
    log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId);
    // Any of these calls may throw an exception:
    submitToCreditCard(...);
    generateInvoice(...);
    generateFulfillmentOrder(...);
    // Reached only if every step above succeeded.
    log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId);
}
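One payoff of logging both the start and the completion of a purchase is that failed purchases can be found by pairing the two events. The sketch below assumes the exact `action=... purchaseId=...` wording used above; the sample lines and the `incomplete_purchases` helper are made up for illustration.

```python
import re

# Hypothetical log lines in the format written by submitPurchase.
LOG_LINES = [
    "action=submitPurchaseStart, purchaseId=101",
    "action=submitPurchaseCompleted, purchaseId=101",
    "action=submitPurchaseStart, purchaseId=102",
    "action=submitPurchaseStart, purchaseId=103",
    "action=submitPurchaseCompleted, purchaseId=103",
]

EVENT = re.compile(r"action=(\w+), purchaseId=(\d+)")


def incomplete_purchases(lines):
    """Return the IDs of purchases that logged a start but never a completion."""
    started, completed = set(), set()
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue  # not a purchase event
        action, purchase_id = match.group(1), int(match.group(2))
        if action == "submitPurchaseStart":
            started.add(purchase_id)
        elif action == "submitPurchaseCompleted":
            completed.add(purchase_id)
    return sorted(started - completed)


print(incomplete_purchases(LOG_LINES))
```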
Liberating Log Data – In a Nutshell
§ Use clear key-value pairs
§ Create events humans can read
§ Use developer-friendly formats
§ Use timestamps for every event
§ Use unique identifiers (IDs)
§ Log in text format
§ Log more than debug events
§ Use categories
§ Identify the source
§ Minimize multi-line events
Use Clear Key-Value Pairs
§ Create Structure from Unstructured Data
  § Use space or comma delimiters
  § Wrap values that contain spaces in quotes
§ Automatic field extraction
  § Self-describing; does not require regular expressions to parse
  § Keys are stored alongside field values
  § No additional configuration work for the Splunk Admin or Knowledge Manager
Example (Good):
Log.debug("orderstatus=error,errorcode=454,user=%d,transactionid=%s", userId, transId)
Example (Bad):
Log.debug("error %d 454 - %s ", userId, transId)
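A small helper can keep key-value formatting consistent across an application. This Python sketch is one possible shape for such a helper: `format_kv` is a hypothetical name, and the choice to backslash-escape embedded quotes is an assumption, not something the slides prescribe.

```python
def format_kv(**fields) -> str:
    """Render fields as comma-delimited key=value pairs, quoting values when needed."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        # Quote values containing whitespace, commas, or quotes so they parse as one field.
        needs_quotes = text == "" or any(ch.isspace() or ch in ',"' for ch in text)
        if needs_quotes:
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return ", ".join(parts)


print(format_kv(orderstatus="error", errorcode=454, user=42, transactionid="abc-123"))
print(format_kv(orderstatus="error", reason="card declined"))
```

The resulting string can be passed to whatever logging call the application already uses.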
Create Human-Readable Events
§ Use ASCII Format
  § Avoid complex encoding
  § Avoid formats which require arbitrary code to decipher
§ Use Consistent Formatting
  § Separate events with different formats into individual files
§ Avoid Binary Data
  § Binary data is compressed, but requires decoding and does not segment
  § Splunk cannot meaningfully search or analyze binary data
  § If data must be in binary format:
    § Provide a tool to easily convert to ASCII
    § Create a custom Splunk search command to decode binary segments inline
    § Place textual metadata in the event
    § For example, do not log the binary data of a JPG file, but do log its image size, creation tool, username, camera, GPS location, etc.
Use Developer-Friendly Formats
§ JSON and XML are Readable by Humans and Machines
  § Parsed natively by most programming languages, and by JavaScript right in the browser
  § Self-describing, and useful for capturing hierarchy or membership
  § Easily interpreted by the Splunk spath command
{"widget": {
  "text": [
    {"data": "Click here", "size": 36},
    {"data": "Learn more", "size": 37},
    {"data": "Help", "size": 38}
  ]
}}
date        size  data
----------  ----  ----------
2014-08-12  36    Click here
            37    Learn more
            38    Help
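The table above can be reproduced from the JSON event with a few lines of code. The Python sketch below assumes the text entries are stored as a JSON array of objects, and the `text_rows` helper is hypothetical; it only mimics the kind of path-based extraction the slide attributes to spath and is not Splunk's implementation.

```python
import json

# The widget event, assuming the text entries form an array of objects.
EVENT = """
{"widget": {"text": [
    {"data": "Click here", "size": 36},
    {"data": "Learn more", "size": 37},
    {"data": "Help", "size": 38}
]}}
"""


def text_rows(raw: str):
    """Flatten widget.text[] into (size, data) rows."""
    document = json.loads(raw)
    return [(item["size"], item["data"]) for item in document["widget"]["text"]]


for size, data in text_rows(EVENT):
    print(size, data)
```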
Use Timestamps
§ Time is a First Class Citizen
  § Timestamps are critical to understanding the sequence of events for debugging, analytics, and deriving transactions
  § Timestamps are automatically detected, but it is best to use an intelligent format
§ Timestamp Dos
  § Use the finest granularity available, microseconds if possible, since events can become orphaned from the originating event
  § Place timestamps at the beginning of the event
  § Include a four-digit year
  § Include a time zone
§ Timestamp Do Nots
  § Do not use a time offset
Example (Good):
08/12/2014:09:16:35.842 GMT INFO key1=value1 key2=value2
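The timestamp guidance above translates into very little code. This Python sketch places a microsecond-resolution timestamp with a four-digit year and a named time zone at the start of each event; `format_event` is a hypothetical helper, and normalizing everything to UTC is an assumption made for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def format_event(level: str, when: Optional[datetime] = None, **fields) -> str:
    """Build one event line that starts with a UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    # %Y is a four-digit year, %f is microseconds, %Z is the zone name (no numeric offset).
    stamp = when.astimezone(timezone.utc).strftime("%m/%d/%Y:%H:%M:%S.%f %Z")
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{stamp} {level} {pairs}"


fixed = datetime(2014, 8, 12, 9, 16, 35, 842000, tzinfo=timezone.utc)
print(format_event("INFO", fixed, key1="value1", key2="value2"))
```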
Use Unique Identifiers (IDs)
§ More Power for Debugging and Analytics
  § Examples: Transaction IDs, user IDs
  § Used to find exact transactions
§ Carry Unique IDs Through Multiple Touch Points
  § Avoid changing format between modules or systems
  § Include transitive closures
Transaction:
transid=abcdef
transid=abcdef, otherid=qrstuv
. . .
otherid=qrstuv
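The transitive-closure idea can be sketched as follows: an event carrying only `otherid=qrstuv` still belongs to the `transid=abcdef` transaction, because another event carries both IDs. This Python example uses a small union-find to group such events; the sample events, the `ID_FIELDS` tuple, and `group_transactions` are illustrative assumptions rather than Splunk functionality.

```python
# Hypothetical events; the second one links the two ID spaces together.
EVENTS = [
    {"msg": "order received", "transid": "abcdef"},
    {"msg": "handed to billing", "transid": "abcdef", "otherid": "qrstuv"},
    {"msg": "invoice sent", "otherid": "qrstuv"},
    {"msg": "unrelated", "transid": "zzzzzz"},
]
ID_FIELDS = ("transid", "otherid")


def group_transactions(events):
    """Group events that share an ID value, directly or through a linking event."""
    parent = list(range(len(events)))

    def find(index):
        # Follow parent links to the group's root, compressing the path on the way.
        while parent[index] != index:
            parent[index] = parent[parent[index]]
            index = parent[index]
        return index

    first_seen = {}  # (field, value) -> index of the first event carrying it
    for index, event in enumerate(events):
        for field in ID_FIELDS:
            if field in event:
                key = (field, event[field])
                if key in first_seen:
                    parent[find(index)] = find(first_seen[key])
                else:
                    first_seen[key] = index

    groups = {}
    for index, event in enumerate(events):
        groups.setdefault(find(index), []).append(event["msg"])
    return list(groups.values())


print(group_transactions(EVENTS))
```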
Unique IDs Through Multiple Touch Points
[Figure: events from several sources (Order Processing, Care IVR, Middleware Error) linked through shared IDs: Order ID, Customer ID, Product ID, Twitter ID, Company's Twitter ID, Customer's Tweet, Time Waiting On Hold]
Minimize Multi-Line/Value Events
§ Multi-Line/Value Events are Less Efficient
  § More difficult for software to parse
  § Generate many segments, which affects indexing/search speed and disk compression
  § Break multi-line events into separate events
  § Break multi-value fields into separate events for easier manipulation
Example (Good):
<TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx
<TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
Example (Bad):
<TS> phonenumber=333-444-4444, app=angrybirds,facebook
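Turning the "bad" event into the "good" ones is a simple transformation that is best done before the event is written. The Python sketch below shows one way to do it; `split_multivalue` is a hypothetical helper and the comma separator is an assumption taken from the example above.

```python
def split_multivalue(event: dict, field: str, separator: str = ","):
    """Return one copy of the event per value found in a multi-value field."""
    values = [value.strip() for value in event[field].split(separator)]
    return [{**event, field: value} for value in values if value]


bad_event = {"phonenumber": "333-444-4444", "app": "angrybirds,facebook"}
for single in split_multivalue(bad_event, "app"):
    print(", ".join(f"{key}={value}" for key, value in single.items()))
```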
Log More Than Debug Events
§ Log anything that can add value when aggregated and/or visualized
  § User actions
  § Timing
  § Transactions
  § Audit trails
§ Log Category
  § Severity levels can aid navigation and baselining
§ Identify the Source
  § Use class, function, or filename
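Most logging libraries can add the category and the source automatically. As one example, Python's standard `logging` module exposes the severity, logger name, and calling function as format attributes. The logger name `orders.checkout` and the function below are hypothetical, and a real format string would normally start with a timestamp such as `%(asctime)s`, left out here to keep the output deterministic.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s source=%(name)s function=%(funcName)s %(message)s")
)

log = logging.getLogger("orders.checkout")  # hypothetical module name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger


def submit_order(order_id: int) -> None:
    log.info("action=submitOrder orderid=%d", order_id)


submit_order(7)
output = stream.getvalue()
print(output, end="")
```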
Operational Best Practices
§ Log locally to log files
  § Provides local buffer
  § Non-blocking during network failures
  § Use syslog-ng or rsyslog + Splunk forwarder for syslog data
§ Implement rotation policies
  § Logs take up space
  § Many compliance regulations require years of archival storage
  § Decide on destroying or backing up logs
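Logging to a local file with a rotation policy is often a configuration matter. As a sketch, Python's standard `RotatingFileHandler` caps the size of each file and keeps a fixed number of older files; the 1 KB limit, the three backups, and the temporary directory are arbitrary values chosen so the demo rotates quickly, not recommended settings.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()  # stand-in for the application's log directory
log_path = os.path.join(log_dir, "app.log")

# Roll over before a file would reach 1 KB; keep app.log.1 through app.log.3.
handler = logging.handlers.RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

log = logging.getLogger("rotation_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

for sequence in range(200):
    log.info("action=heartbeat sequence=%d", sequence)
handler.close()

files = sorted(os.listdir(log_dir))
print(files)
```

Older files beyond the backup count are deleted by the handler, so archival for compliance has to be handled separately before they age out.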
Operational Best Practices
§ Use Splunk Forwarders
  § Data collection in real time
  § Tracks and maintains state
  § Enable collection of data over many channels: HTTP | Queues | Multicast | Web services | Databases
§ Collect events from everything, everywhere
  § Application logs | database logs | network data | configuration files | performance data | time-based data
  § More data captured = more visibility
Copyright © 2014 Splunk Inc.
Un|Structured Data
Creating Value With Structured Data
§ Enrich search results with additional business context
§ Easily import data into Splunk for deeper analysis
§ Integrate multiple DBs concurrently
§ Simple set-up, non-invasive and secure
DB Connect provides reliable, scalable, real-time integration between Splunk and traditional relational databases
[Figure labels: Java Bridge Server; Database Lookup; Database Query; Connection Pooling; JDBC; Microsoft SQL Server; Oracle Database; Other Databases]
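Enriching events with business context from a relational database amounts to a lookup keyed on an ID the event already carries. The Python sketch below shows the general pattern with an in-memory SQLite table; the `customers` table, its columns, and the `enrich` helper are invented for illustration and do not represent DB Connect's interface.

```python
import sqlite3

# Hypothetical reference data living in a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, tier TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c1", "gold"), ("c2", "silver")],
)

EVENTS = [
    {"action": "purchase", "customer_id": "c1"},
    {"action": "purchase", "customer_id": "c9"},  # no matching row
]


def enrich(events, connection):
    """Attach the customer's tier to each event, or None when no row matches."""
    enriched = []
    for event in events:
        row = connection.execute(
            "SELECT tier FROM customers WHERE customer_id = ?",
            (event["customer_id"],),
        ).fetchone()
        enriched.append({**event, "tier": row[0] if row else None})
    return enriched


print(enrich(EVENTS, conn))
```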
It’s Hard to Turn Raw Data Into Refined Insights
§ Hadoop and NoSQL offer simple storage but hard analytics: difficult to explore, analyze, visualize
§ Hard-to-staff skills: require months of labor by specialists with rare and expensive skill sets
§ Inflexible approaches: must predefine fixed schemas or program MapReduce jobs
[Figure: Wide Range of Open Source Projects for Analytics and Data Visualization: Hadoop (MapReduce & HDFS), YARN, DataFu, Hive, Mahout, Pig, Sqoop, Azkaban, NoSQL Data Stores]
Integrated Analytics Platform for Diverse Data Stores
§ Full-featured, Integrated Product
§ Fast Insights for Everyone
§ Works with What You Have Today
[Figure labels: Explore, Analyze, Visualize, Dashboards, Share; Hadoop Clusters; NoSQL and Other Data Stores; Hadoop Client Libraries; Streaming Resource Libraries; Bi-directional Integration with Hadoop]
More Developer Tools
The Splunk Enterprise Platform
Collection
Indexing
Search Processing Language
Core Functions
Inputs, Apps, Other Content
SDK
Content
Core Engine
User and Developer Interfaces
Web Framework
REST API
What’s Possible with the Splunk Enterprise Platform?
Power Mobile Apps
Log Directly
Extract Data
Customer Dashboards
Integrate BI Tools
Integrate Platform Services
Developer Platform
Powerful Platform for Enterprise Developers
REST API
Web Framework
Ruby
C#
PHP
Data Models
Search Extensibility
Modular Inputs
SDKs
Simple XML
JavaScript
Django
Developers Can Customize and Extend
Splunk Software for Developers
Gain Application Intelligence
Build Splunk Apps
Integrate and Extend Splunk
Thank You