using nosql databases to store radius and syslog data

Post on 20-Jan-2015

4.114 Views

Category:

Technology

3 Downloads

Preview:

Click to see full reader

DESCRIPTION

A seminar presentation done for TUT's NoSQL course. A brief look into the possibility and the feasibility of using NoSQL databases to store RADIUS accounting and Syslog data. In this particular case, Syslog-NG, Radiator RADIUS server and MongoDB were used as trial platforms. The presentation includes configuration examples and also some code.

TRANSCRIPT

Using NoSQL databases to store RADIUS and Syslog

data, part 1: IdeaKarri Huhtanen

18.9.2012

Some background

• currently RADIUS accounting data is stored usually in SQL databases with fixed database schema

• for Syslog messages an SQL database can be used, but commercial log analyzers (like Splunk) usually use their own solutions which may or may not be SQL databases

• Started thinking if NoSQL database could be applied to both or one of these?

RADIUS accounting messageWed  Aug    8  13:49:33  2012                User-­‐Name  =  "jotain@realm"                NAS-­‐Port  =  8                NAS-­‐IP-­‐Address  =  192.168.229.131                Framed-­‐IP-­‐Address  =  192.168.163.226                NAS-­‐Identifier  =  "Cisco_66:77:88"                Airespace-­‐WLAN-­‐Id  =  4                Acct-­‐Session-­‐Id  =  "50223ea9/00:11:22:33:44:55/2292"                Acct-­‐Authentic  =  Remote                Tunnel-­‐Type  =  0:VLAN                Tunnel-­‐Medium-­‐Type  =  0:802                Tunnel-­‐Private-­‐Group-­‐ID  =  0:222                Event-­‐Timestamp  =  1344422780                Acct-­‐Status-­‐Type  =  Alive                Acct-­‐Input-­‐Octets  =  1262012                Acct-­‐Input-­‐Gigawords  =  0                Acct-­‐Output-­‐Octets  =  13518133                Acct-­‐Output-­‐Gigawords  =  0                Acct-­‐Input-­‐Packets  =  11692                Acct-­‐Output-­‐Packets  =  11154                Acct-­‐Session-­‐Time  =  1235                Acct-­‐Delay-­‐Time  =  19                Calling-­‐Station-­‐Id  =  "00:11:22:33:44:55"                Called-­‐Station-­‐Id  =  "f4:7f:35:5e:bf:b0"                cisco-­‐avpair  =  "nas-­‐update=true"                Digest-­‐Response  =  "P"C<188>"                Digest-­‐Response  =  "P"C<194>"                Timestamp  =  1344422954

One message contains undetermined number of attributes.

Some can be interpreted, some stay unknown.

Because there can be a changing number of changing type of attributes I began to wonder if NoSQL could be used for storing these?

interpreted attributes,the unknown attributes are usually left in OID:FieldDataType binary format

Syslog messageThe  syslog  message  has  the  following  ABNF  [RFC5234]  definition:

           SYSLOG-­‐MSG            =  HEADER  SP  STRUCTURED-­‐DATA  [SP  MSG]

           HEADER                    =  PRI  VERSION  SP  TIMESTAMP  SP  HOSTNAME                                                SP  APP-­‐NAME  SP  PROCID  SP  MSGID            PRI                          =  "<"  PRIVAL  ">"            PRIVAL                    =  1*3DIGIT  ;  range  0  ..  191            VERSION                  =  NONZERO-­‐DIGIT  0*2DIGIT            HOSTNAME                =  NILVALUE  /  1*255PRINTUSASCII

           APP-­‐NAME                =  NILVALUE  /  1*48PRINTUSASCII            PROCID                    =  NILVALUE  /  1*128PRINTUSASCII            MSGID                      =  NILVALUE  /  1*32PRINTUSASCII

           TIMESTAMP              =  NILVALUE  /  FULL-­‐DATE  "T"  FULL-­‐TIME            FULL-­‐DATE              =  DATE-­‐FULLYEAR  "-­‐"  DATE-­‐MONTH  "-­‐"  DATE-­‐MDAY            DATE-­‐FULLYEAR      =  4DIGIT            DATE-­‐MONTH            =  2DIGIT    ;  01-­‐12            DATE-­‐MDAY              =  2DIGIT    ;  01-­‐28,  01-­‐29,  01-­‐30,  01-­‐31  based  on                                                                ;  month/year            FULL-­‐TIME              =  PARTIAL-­‐TIME  TIME-­‐OFFSET            PARTIAL-­‐TIME        =  TIME-­‐HOUR  ":"  TIME-­‐MINUTE  ":"  TIME-­‐SECOND                                                [TIME-­‐SECFRAC]            TIME-­‐HOUR              =  2DIGIT    ;  00-­‐23            TIME-­‐MINUTE          =  2DIGIT    ;  00-­‐59            TIME-­‐SECOND          =  2DIGIT    ;  00-­‐59            TIME-­‐SECFRAC        =  "."  1*6DIGIT            TIME-­‐OFFSET          =  "Z"  /  TIME-­‐NUMOFFSET            TIME-­‐NUMOFFSET    =  ("+"  /  "-­‐")  TIME-­‐HOUR  ":"  TIME-­‐MINUTE

           STRUCTURED-­‐DATA  =  NILVALUE  /  1*SD-­‐ELEMENT            SD-­‐ELEMENT            =  "["  SD-­‐ID  *(SP  SD-­‐PARAM)  "]"            SD-­‐PARAM                =  PARAM-­‐NAME  "="  %d34  PARAM-­‐VALUE  %d34            SD-­‐ID                      =  SD-­‐NAME            PARAM-­‐NAME            =  SD-­‐NAME            PARAM-­‐VALUE          =  UTF-­‐8-­‐STRING  ;  characters  '"',  '\'  and                                                                          ;  ']'  MUST  be  escaped.            SD-­‐NAME                  =  1*32PRINTUSASCII                                                ;  except  '=',  SP,  ']',  %d34  (")

           MSG                          =  MSG-­‐ANY  /  MSG-­‐UTF8            MSG-­‐ANY                  =  *OCTET  ;  not  starting  with  BOM            MSG-­‐UTF8                =  BOM  UTF-­‐8-­‐STRING            BOM                          =  %xEF.BB.BF

Until researching into this I thought

Syslog messages had fixed structure and

could be then handled with fixed database schema.

Then I read the RFC5424: http://

tools.ietf.org/html/rfc5424

Here we have once again parameters, although they are within one defined STRUCTURED-

DATA field.

So could NoSQL be used also for Syslog?

So what happens next?• Selection of NoSQL database:

• Likely Column Family Store if no one can suggest a better one?

• Something easy to setup and use, will concentrate into getting RADIUS server and/or Syslogd transferring data to database.

• Setting up a WiFi access point and/or controller to provide real RADIUS and Syslog data

• Storing data, retrieving data, searching data, deleting data to see what works

• Writing and presenting Part II: “Implementation and Results” of these slides

Results (hopefully)

• Is storing RADIUS accounting and Syslog messages into NoSQL database: a brilliant idea, brilliantly stupid idea or something else?

• How hard can it be? What does it require to do this, is it possible and how?

• Does it actually work? What can you do with data? Is there some indication of performance improvements or problems?

• Will not do complete performance measurements though, designing and setting up reliable measurement environment will probably take too much time.

Using NoSQL databases to store RADIUS and Syslog data,

part 1I: The Saga Continues

Karri Huhtanen27.11.2012

Happened earlier

• currently RADIUS accounting data is stored usually in SQL databases with fixed database schema

• for Syslog messages an SQL database can be used, but commercial log analyzers (like Splunk) usually use their own solutions which may or may not be SQL databases

• Started thinking if NoSQL database could be applied to both or one of these?

Results (luckily)

• Is storing RADIUS accounting and Syslog messages into NoSQL database: a brilliant idea, brilliantly stupid idea or something else?

• How hard can it be? What does it require to do this, is it possible and how?

• Does it actually work? What can you do with data? Is there some indication of performance improvements or problems?

• Will not do complete performance measurements though, designing and setting up reliable measurement environment will probably take too much time.

a good idea

easy, 1 night before presentation required

Yes. Store and Process. Unknown.Some issues to be considered.

Coded one Python script.

So what happened?• Selection of NoSQL database:

• Likely Column Family Store if no one can suggest a better one?

• Something easy to setup and use, will concentrate into getting RADIUS server and/or Syslogd transferring data to database.

• Setting up a WiFi access point and/or controller to provide real RADIUS and Syslog data

• Storing data, retrieving data, searching data, deleting data to see what works

• Writing and presenting Part II: “Implementation and Results” of these slides

MongoDB

Done, but not thoroughly

Done

That was the executive summary. Thank you.

Now some more detailed information and even some code.

storing RADIUS accounting and Syslog messages into NoSQL database

• It is a good idea because:

• When we have massive amount of log or accounting data, we need massive database clusters.

• Data is mainly stored, read, analyzed and occasionally deleted. Data will not be updated or changed and is relatively simple (few tables with a lot of columns).

• NoSQL may provide better way to scale this horizontally by distribution and sharding.

• It is already being done. Several log analyzers, stores already use NoSQL databases as backends. There exists projects such as Greylog2 etc. which provide complete solutions from log storage, visualization, analysis etc.

• Logs and accounting data are actually use cases for some NoSQL databases, for example: http://docs.mongodb.org/manual/use-cases/storing-log-data/

storing RADIUS accounting and Syslog messages into NoSQL database

• It is not a brilliant idea because:

• If we look what we need to do to optimize the performance it starts to look like a lot like designing and optimizing a SQL database: http://docs.mongodb.org/manual/use-cases/storing-log-data/

• You cannot forget datatypes or database design even with NoSQL databases especially when going into production.

• Prototypes may be faster and easier for developers, but creating a design and configuration which survices production use may be as hard as it has ever been. The difference is that instead of SQL database expert, you know need a NoSQL expert.

• ... but it is not a brilliantly stupid idea either, it is an idea worth considering depending of the project.

How hard can it be?

• With Ubuntu Linux Server 12.04 LTS:

• sudo apt-get install python-pymongo mongodb syslog-ng syslog-ng-mod-mongodb

• for Syslog-NG, just some configuration

• for Radiator, some configuration and coding an external Python script to handle accounting messages

• But this is far from production use, it is more like proto or proof of concept implementation done in 1 work day.

Demo

Radiator RADIUS server

# /etc/radiator/radiator.cfg## send all RADIUS accounting requests to external script#<Handler Request-Type = Accounting-Request> <AuthBy EXTERNAL> Command %D/acct2mongo.py </AuthBy> AcctLogFileName %L/acct-acct2mongodb-%Y-%M.log</Handler>

#!/usr/bin/env pythonfrom pymongo import Connectionimport datetimeimport sys

def main():

line = str() post = dict()

# opening connection connection = Connection( 'localhost', 27017) # database 'radius' db = connection['radius'] # collection 'accounting' collection = db['accounting']

post['acct2mongotimestamp'] = datetime.datetime.utcnow()

for line in sys.stdin.readlines(): pieces = line.split(' = ', 1) if len(pieces) == 2: post[pieces[0].strip().strip('"')]=pieces[1].strip().strip('"')

collection.insert(post) connection.end_request() connection.disconnect()

# 0 Means reply with an acceptance. For Access-Requests, # an Access-Accept will be sent. For Accounting-Requests, # an Accounting-Response will be sent. return 0

if __name__ == '__main__': main()

acct2mongo.py

Does it actually work? What can you do with data?

• Yes it does actually work, but once again it does not solve or be applicable to everything.

• One can store, read, search and delete data supposedly very efficiently, but anything more complicated is harder and must be implemented by developer.

• For example: MongoDB does not have a reliable decimal datatype. It is better to keep numbers as a string and convert them when processing data.

• Repeating earlier statement: “You cannot forget datatypes or database design even with NoSQL databases especially when going into production.”

Performance?

• Would need to be measured and verified and with real production environment or solution.

• Would also need to be compared with well designed and optimised SQL database, maybe even one functioning as NoSQL one.

• In the implementation this was not tested as the datasets were very small compared to real datasets.

Conclusions• NoSQL should be at least considered as an option

when designing and implementing large scale Syslog or Radius Accounting storages.

• For development it is flexible.

• For production use NoSQL solution still needs design, careful planning and testing to verify if the performance, reliability and security is enough. Probably as much as SQL database design.

• Key issue will probably be can the SQL database handle the data or is horizontal scaling required.

top related