Large scale log collection using LogStash & mongoDB
Large scale log collection
Guided by Professor Simon Shim
Team #14
- Gaurav Bhardwaj <009297431>
- Vaibhav Bhor <009313434>
- Sumant Murke <009303879>
- Amod Rege <009259692>
CMPE 283: VIRTUALIZATION TECHNOLOGIES
AGENDA
1. Project Overview
2. Objective
3. Project Part-2
4. Project Part-1 (DRS-DPM)
5. Screenshots
6. Lessons learnt
7. Conclusion
Objective
- Manage and test Virtual Machines
- Simulate DRS-DPM functionality
- Develop a large scale analysis tool which collects VM as well as Host performance data
- Understand the need to gather and analyze log data
- Come up with a framework which provides a complete solution for Virtual Machine log file collection & analysis
Design
Components
- Agent
- Collector
- Aggregator
- Local storage (mongoDB)
- Central storage (MySQL)
- Visualization
Agent
- Uses the Java VI API to collect system metrics
- Collects Host as well as Virtual Machine stats
- Writes to a text file every 5 secs
- Takes the following parameters: VM Name, vHost Name, y/n
  - VM Name => name of the Virtual Machine it has to monitor
  - vHost Name => name of the vHost it has to monitor
  - y => collect stats for both vHost and VM; n => collect only VM stats
java -jar Agent.jar "vHost Name" "vm Name" "y/n"
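The three parameters can be illustrated with a small argument parser. This is only a sketch in Python (the agent itself is a Java program); it assumes the argument order of the command line shown above, and the names `parse_agent_args`, `vhost_name`, `vm_name` and `collect_host` are hypothetical.

```python
import argparse


def parse_agent_args(argv):
    """Parse the agent's three positional parameters into a small dict."""
    parser = argparse.ArgumentParser(prog="Agent")
    parser.add_argument("vhost_name", help="vHost to monitor")
    parser.add_argument("vm_name", help="Virtual Machine to monitor")
    parser.add_argument("collect_host", choices=["y", "n"],
                        help="y: vHost and VM stats, n: VM stats only")
    args = parser.parse_args(argv)
    return {
        "vhost": args.vhost_name,
        "vm": args.vm_name,
        # True means stats are collected for the vHost as well as the VM
        "collect_host": args.collect_host == "y",
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only
    print(parse_agent_args(["vhost-01", "vm-01", "y"]))
```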
Agent flow
Parsing file using LogStash
- LogStash reads the log file written by the agent.
- For every append to the log file, it detects the change and generates an event, parses each line of the log file, and stores it in mongoDB.
- Conf file (logshipper.conf) supplied to LogStash (schematic):
  input { file => "*.log" }
  filter { filter => json }
  output { output => mongoDB }
bin/logstash -f logshipper.conf
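What the JSON parsing step amounts to can be sketched in Python: each line the agent writes is treated as one JSON document and turned into a record ready for storage. This is an illustration of the idea, not LogStash's implementation; the field names in the sample line (`vm`, `cpu_mhz`) are hypothetical, and skipping malformed lines is an assumption of the sketch.

```python
import json


def parse_log_lines(lines):
    """Turn an iterable of JSON-formatted log lines into a list of dicts.

    Blank lines, lines that are not valid JSON, and JSON values that are
    not objects are skipped (an assumption of this sketch).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON: ignore the line
        if isinstance(record, dict):
            records.append(record)
    return records


if __name__ == "__main__":
    sample = ['{"vm": "vm-01", "cpu_mhz": 120}', "not json", ""]
    print(parse_log_lines(sample))
```

Each resulting dict corresponds to one document that would be stored in the local mongoDB.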
Collector
- Takes the IPs of all agents
- Connects to the local storage of each VM
- Pulls data in a round-robin manner
- Clears data from mongoDB after reading
- Stores the data in MySQL
- Reads connection information from a configuration file
- Runs automatically every 5 min using crontab
python collector.py "conf file"
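The pull, clear and store cycle can be sketched as follows. To keep the example self-contained, plain Python lists stand in for each VM's mongoDB collection and for the central MySQL table; the function name, the `batch_size` parameter and the IP addresses are hypothetical and not taken from collector.py.

```python
def collect_round_robin(local_stores, central, batch_size=100):
    """Drain every local store into `central`, one batch per agent per turn.

    local_stores: dict mapping agent IP -> list of records
                  (stand-in for each VM's mongoDB collection).
    central:      list receiving (agent_ip, record) rows
                  (stand-in for the MySQL table).
    Returns the number of records moved.
    """
    moved = 0
    pending = [ip for ip, rows in local_stores.items() if rows]
    while pending:
        still_pending = []
        for ip in pending:
            rows = local_stores[ip]
            batch = rows[:batch_size]
            central.extend((ip, row) for row in batch)
            del rows[:len(batch)]  # clear what was read from local storage
            moved += len(batch)
            if rows:
                still_pending.append(ip)
        pending = still_pending
    return moved


if __name__ == "__main__":
    stores = {"10.0.0.1": [1, 2, 3], "10.0.0.2": [4]}
    central_rows = []
    print(collect_round_robin(stores, central_rows, batch_size=2), central_rows)
```

Taking one batch per agent per turn keeps a single busy VM from starving the others.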
Aggregator & Central DB design
- 24 hour, 1 hour and 5 minute data
- VM and vHost stats
- Schema
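Rolling raw samples up into 5 minute, 1 hour and 24 hour data might look like the sketch below. The slides do not say which aggregate is stored, so averaging is an assumption here, as are the function name and the `(timestamp, value)` sample format.

```python
from collections import defaultdict

FIVE_MIN, ONE_HOUR, ONE_DAY = 300, 3600, 86400  # bucket widths in seconds


def roll_up(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns {bucket_start_timestamp: mean_value}, ordered by bucket start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket -> [running total, count]
    for ts, value in samples:
        bucket = ts - (ts % bucket_seconds)
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return {b: total / n for b, (total, n) in sorted(sums.items())}


if __name__ == "__main__":
    cpu = [(0, 10), (5, 20), (300, 30)]
    print(roll_up(cpu, FIVE_MIN))
    print(roll_up(cpu, ONE_HOUR))
```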
DRS-DPM (Part-1)
Initialize the environment and get the number of VMs and hosts.
Initialize standard variables vmCount and hostCount.
If number of virtual machines is greater than vmCount
    If new machine is powered on
        Move newly added virtual machine to host with minimum load.
    End if
End if
If number of host machines is greater than hostCount
    If CPU load of new host is less than 30%
        Migrate the virtual machine to host with minimum load.
        Power off the host.
    End if
    Find the VM with minimum load.
    Migrate the virtual machine under new host.
End if
Avoided ping-pong migration
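Two building blocks of the procedure above, picking the host with minimum load and the 30% power-off check, can be sketched in Python. The data structures (dicts of host loads and host-to-VM lists) and function names are hypothetical; the real implementation used the Java VI API. As a simplification, all VMs of an evacuated host go to the same target without recomputing loads.

```python
def least_loaded_host(host_loads, exclude=()):
    """Return the name of the host with the lowest CPU load (percent)."""
    candidates = {h: load for h, load in host_loads.items() if h not in exclude}
    if not candidates:
        raise ValueError("no candidate hosts")
    return min(candidates, key=candidates.get)


def plan_power_off(host_loads, host_vms, host, threshold=30.0):
    """Plan the evacuation of `host` if its CPU load is under `threshold`.

    Returns a list of (vm, target_host) migrations; an empty list means
    no migrations are planned.
    """
    if host_loads[host] >= threshold:
        return []
    target = least_loaded_host(host_loads, exclude=(host,))
    return [(vm, target) for vm in host_vms.get(host, [])]


if __name__ == "__main__":
    loads = {"h1": 70.0, "h2": 20.0, "h3": 45.0}
    vms = {"h2": ["vm-a", "vm-b"]}
    print(least_loaded_host(loads))
    print(plan_power_off(loads, vms, "h2"))
```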
Is our design good?
- Agents: will not append; will re-write to file
- DataBase (mongoDB): the Collector collects data, stores it in MySQL and removes it from local storage
- Collector: can connect to as many clients as are specified in the conf file
- DataBase (MySQL): the Aggregator purges (clears) the main table
- Visualization module is totally decoupled from server and storage
Visualization approach
Library: CanvasJS
- We used canvas.js, a JavaScript library, to plot the graphs.
- We chose canvas.js since it is easy to use and provides different types of visualization.
Data Source: MySQL Database
- The data plotted on the graphs was read from the MySQL database.
- MySQL was used to get the data in a structured format before plotting.
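Getting structured rows into a shape a CanvasJS chart can plot might look like the sketch below. CanvasJS data series carry a `type` and a `dataPoints` list of `x`/`y` objects; everything else here (the function names, the `(timestamp, value)` row format, and the use of Python rather than the project's JSP layer) is an assumption for illustration.

```python
import json


def rows_to_datapoints(rows):
    """Convert (timestamp, value) rows into CanvasJS-style data points."""
    return [{"x": ts, "y": value} for ts, value in rows]


def chart_series_json(rows, series_type="line"):
    """Serialize one data series (type + dataPoints) as JSON for the page."""
    series = {"type": series_type, "dataPoints": rows_to_datapoints(rows)}
    return json.dumps(series)


if __name__ == "__main__":
    # Hypothetical rows as they might come back from a MySQL query
    print(chart_series_json([(1, 12.5), (2, 17.0)]))
```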
Output Graphs
Tools & Technology
Agents
- Java VI API
Collectors
- Python script automated with crontab
Log file parsing
- LogStash with mongoDB plugin
Stress API
- Manually increase CPU, IO and RAM consumption
  stress --cpu 2 --io 1 --vm 1 --vm-bytes 128M --timeout 10s --verbose
Visualization tools
- CanvasJS JavaScript library
- JSP & HTML5
Programming languages
- Java, Python, JavaScript
Utilities
- PuTTY, WinSCP
Database
- MySQL
- mongoDB
Lessons learnt
- Using the VI Java API
- Concept behind DRS-DPM
- Never clone a vHost
- Not every Virtual Machine is Linux
- Automation using crontab
- ESX log files awareness
- Designing systems
- Working with SQL and NoSQL databases and understanding their usage context
THANK YOU...