
MAnagement of Security information and events in Service InFrastructures

MASSIF FP7-257475

D3.2.1 - Scenarios analysis and external languages specification

Activity A3 Workpackage WP3.2

Due Date December 2010 Submission Date 2011-02-04

Main Author(s) Hervé Debar (TSP)

Version v1.0(Rev : 92) Status Final

Dissemination Level

CO Nature R

Keywords security languages, event languages, alert languages

Reviewers Luigi Romano (CINI)

Claudio Soriente (UPM)

Part of the Seventh Framework Programme

Funded by the EC - DG INFSO

MASSIF - FP7-257475 D3.2.1 - Scenarios analysis and external languages specification

Version history

Rev Date Author Comments

V0.1 2011-01-14 Hervé Debar First draft for review

V1.0 2011-02-03 Hervé Debar Final version after 2nd review cycle

V1.0 2011-02-04 Elsa Prieto (Atos) Final review and delivery

© 2011 by MASSIF Consortium


Glossary of Acronyms

Abbr Abbreviation

BSCW Be Smart - Cooperate Worldwide

CEF Common Event Format

CLF Common Log Format

CSS Cascading Style Sheets

DoW Description of Work

EC European Commission

EU European Union

FP7 Seventh Framework Programme

FTP File Transfer Protocol

IETF Internet Engineering Task Force

LEA Log Extraction API

MASSIF MAnagement of Security information and events in Service InFrastructures

MSS Managed Security Service

MSSP Managed Security Service Provider

OASIS Organization for the Advancement of Structured Information Standards

ODBC Open Database Connectivity

PU Public Usage

R&D Research & Development

RSS Really Simple Syndication

SCP Secure Copy

SFTP Secure File Transfer Protocol

SIEM Security Information and Event Management

SNMP Simple Network Management Protocol

SSH Secure Shell

WMI Windows Management Instrumentation

W3C World Wide Web Consortium



Executive Summary

Deliverable D3.2.1 is one of the first technical productions of the MASSIF project. The description of work specifies that this document is an analysis of input and output formats from use case scenarios, and a specification of common message formats for these data streams. This document therefore has two objectives: to enumerate the data formats and models that have been used by the partners of the project in SIEM-related projects, and to provide a first glimpse at the use cases, from a data point of view, that will spread knowledge and understanding of these use cases among partners and provide a first evaluation of the importance of the aforementioned data formats. The document consists of two parts: Alert and Event Languages, describing security alerts and events, and Use-case specific data streams, describing log formats specific to the proposed use cases. The document concludes with an analysis highlighting several characteristics shared by these languages and event formats, among which are simplicity of the information representation (which must be easily readable), timestamping, and modularity of the format structure.


Contents

1 Introduction
  1.1 Deliverable objectives
  1.2 MASSIF architecture sketch

2 Alert and Event Languages
  2.1 Languages selection rationale
    2.1.1 Analysis of Commercial SIEMs
    2.1.2 Presentation of log sources selection
  2.2 The Common Event Format (CEF)
    2.2.1 Reference
    2.2.2 Objectives
    2.2.3 Structure
      Structure overview
      Links with other data formats
      Relationship with MASSIF
    2.2.4 Critical assessment of the format
      Advantages
      Issues
      Uses
  2.3 The Common Log Format (CLF)
    2.3.1 Reference
    2.3.2 Objectives
    2.3.3 Structure
  2.4 The Intrusion Detection Message Exchange Format (IDMEF)
    2.4.1 Reference
    2.4.2 Objectives
    2.4.3 Structure
  2.5 InterFace to Metadata Access Point (IF-MAP)
    2.5.1 Reference
    2.5.2 Objectives
    2.5.3 Structure
  2.6 Incident Object Description and Exchange Format (IODEF)
    2.6.1 Reference
    2.6.2 Objectives
    2.6.3 Structure
  2.7 IP Flow Information Export (ipfix)
    2.7.1 Reference
    2.7.2 Objectives
    2.7.3 Structure
  2.8 The Syslog Format
    2.8.1 Reference
    2.8.2 Objectives
    2.8.3 Structure
  2.9 Windows Management Instrumentation (WMI)
    2.9.1 Reference
    2.9.2 Objectives
    2.9.3 Structure
  2.10 WS-Eventing and WS-Notification
    2.10.1 Objectives
    2.10.2 Structure
      Architecture
      Function
      Delivery mode
      Filters
      Links with other data formats
      Relationship with MASSIF
    2.10.3 Advantages of the formats

3 Use-case specific data streams
  3.1 Olympic Games Scenario
    3.1.1 Motivation and description
    3.1.2 Novell Sentinel Interface: Syslog data format
      Description
      Advantages
      Drawbacks and issues
      Examples
    3.1.3 Novell Sentinel Interface: LEA API
      Description
      Drawbacks and issues
  3.2 Mobile Money Transfer Service scenario
    3.2.1 Motivation and description
    3.2.2 Mobile Money Service: proprietary data format
  3.3 Managed Enterprise Service Infrastructures scenario
    3.3.1 Motivation and description
    3.3.2 Tivoli TSOM interface: SNMP data format
    3.3.3 Tivoli TSOM interface: Syslog data format
  3.4 Critical Infrastructure Process Control (Dam) scenario
    3.4.1 Motivation and description
    3.4.2 Dam scenario: Modbus data format
      Structure overview (Modbus)
      Modbus Advantages
      Issues (Modbus)
      Modbus Example
    3.4.3 Dam scenario: WSN and CTP data formats
      WSN Example
      Advantages (WSN)
      Issues (WSN)
    3.4.4 Links with other data formats

4 Analysis and Conclusion
  4.1 Analysis of alert and event languages
  4.2 Analysis of use case specific data streams


List of Figures

1.1 MASSIF Blueprint Architecture (proposed)
2.1 An example metadata graph
2.2 Windows Management Infrastructure architecture data flow
3.1 Log example
3.2 General Modbus Frame
3.3 Modbus transaction (error free)
3.4 Modbus transaction (exception response)

List of Tables

2.1 RSA Envision collectors summary
2.2 Included log sources summary
2.3 Eliminated log sources summary
3.1 Money Transfer Message Elements
3.2 TIVOLI TSOM SNMP Trap content example

Chapter 1

Introduction

1.1 Deliverable objectives

This deliverable is one of the first technical productions of the MASSIF project. The description of work specifies that this document is an analysis of input and output formats from use case scenarios, and a specification of common message formats for these data streams. This document therefore has two objectives:

enumerate data formats and models that have been used by the partners of the project in SIEM-related projects, in order to give a broad overview of the richness of the field, and prepare the definition of the ontology (MASSIF Deliverable 3.2.2).

provide a first glimpse at use cases, from a data point of view, that will spread knowledge and understanding among partners on these use cases, and provide a first evaluation of the importance of the aforementioned data formats.

As one can see from these two items, data is at the core of the MASSIF project, since Security Information and Event Management is, at heart, about gathering data, analyzing it, and making informed decisions in the ICT security domain. With respect to data gathering, this document concentrates on the syntax and semantics of the information, regardless of location or actual transport mechanisms. Resilient event collection is handled in work package 3.1 (scalable event processing engine). The only assumption of this document is that whatever format is chosen will be available without restrictions through WP3.1. With respect to data analysis, methods will be studied in WP3.3 (event collection, parsing and propagation) on the sensor side and WP3.4 (event filtering, aggregation, abstraction, and correlation) on the SIEM platform side. We will thus focus on the syntax and semantics of as many data formats as the project's partners consider pertinent.

In accordance with the objectives of the document, we have segmented it in two main parts, as follows:

Alert and Event Languages: Chapter 2 gathers all formats and languages that represent transient information, that is, information that is time-driven and that has to be handled by the MASSIF SIEM system to manage the security status of the monitored system. In this area, we will focus on languages that are considered to have standard status, either through their publication mechanism or because of their widespread use.

Use-case specific data streams: Chapter 3 describes the data stream formats of the use cases. We are particularly interested in describing the specificities of the content of the data streams, such as the way they build syslog message contents, as most of the syntax should be covered in the previous chapters.

1.2 MASSIF architecture sketch

The work on data streams analysis has to be considered in relationship with the definition of the MASSIF platform architecture. While the mandate of this document is not to specify an architecture for the MASSIF project, we do introduce it with thoughts on a very simple architecture sketch shown in figure 1.1.

Figure 1.1: MASSIF Blueprint Architecture (proposed)

Figure 1.1 separates the world in two parts, the MASSIF SIEM Platform plane and the monitored business system plane. The former is under full control of the project and the latter should be left as undisturbed as possible, or at least the capabilities required by the MASSIF SIEM system in terms of monitoring and countermeasures should be fixed and acceptable to the business system owners.

Within the monitored system, we have separated three functions: intrusion detection sensors, business process components, and access control. The primary function of business process components is to service users; however, they also have auditing capabilities in the form of log files, and minimal policy enforcement capabilities such as startup and shutdown. The primary function of sensors is to detect and report sensitive events, either attacks or anomalies. Access control and identity management are security policy components, whose interaction with the MASSIF SIEM system will be the primary means of security response. In the current security literature, intrusion prevention systems should be considered as belonging to the last two categories.

Within the SIEM platform, we separate the operational decision support subsystem, which handles the alerts in real time, from the model management subsystem, which evaluates and updates the decision support system according to its past performance, to the evolution of the monitored system, and to the evolution of global knowledge (vulnerabilities, etc.).

The most relevant part of this architecture for the present deliverable is the set of exchanges between the two planes, which we model as follows:

Events (push): This stream describes events being pushed by the monitored business system to the MASSIF SIEM platform. These events are typically alerts or logs driven by the interactions that the monitored business system has with the outside world (users, updates, etc.). The formats used in this data stream are described in section 4.1, alert and event languages.

Events (pull): This stream describes events being requested by the MASSIF SIEM platform from the monitored business system. This allows the business system to store data and only make it available to the MASSIF SIEM if necessary. It is a way for the SIEM platform to ask questions or verify information that it has on the monitored system. The formats used in this data stream should be similar to the ones described in section 4.1, alert and event languages.

Configurations (Commands): This stream describes modifications of the behaviour of the business system that are driven by the MASSIF SIEM system, mainly for update or response purposes. This stream is important for alert correlation, but is outside the scope of this document.

Audits: This stream represents the interaction of the model management subsystem with the monitored business system. While it is analytically a different data stream, it might be assimilated to the combination of both event (push + pull) streams, and might be implemented in this way, to simplify the plane interface management. This stream is particularly important for model acquisition and maintenance, but is outside the scope of this document.
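The four streams can be summarized as a small data model. The following Python sketch is only an illustration of the list above: the class names, the field names and the choice of recording which plane initiates an exchange are assumptions made for this example, not part of the MASSIF architecture.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Plane(Enum):
    """The two planes of the blueprint architecture."""
    BUSINESS_SYSTEM = "monitored business system"
    SIEM_PLATFORM = "MASSIF SIEM platform"


@dataclass(frozen=True)
class Stream:
    name: str
    initiated_by: Tuple[Plane, ...]  # hypothetical field: who starts the exchange
    in_scope: bool                   # are its formats described in this deliverable?


STREAMS = [
    Stream("Events (push)", (Plane.BUSINESS_SYSTEM,), in_scope=True),
    Stream("Events (pull)", (Plane.SIEM_PLATFORM,), in_scope=True),
    Stream("Configurations (Commands)", (Plane.SIEM_PLATFORM,), in_scope=False),
    # Audits may be assimilated to the combination of push and pull events.
    Stream("Audits", (Plane.BUSINESS_SYSTEM, Plane.SIEM_PLATFORM), in_scope=False),
]


def streams_in_scope(streams):
    """Return the names of the streams whose formats this document covers."""
    return [s.name for s in streams if s.in_scope]
```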

The smaller blue arrows refine the data stream names in the case of sensors and should be treated as examples only for the purpose of this deliverable.

This blueprint architecture will further evolve as the specifications of the MASSIF SIEM prototype are developed.


Chapter 2

Alert and Event Languages

2.1 Languages selection rationale

Since it is impossible to produce a comprehensive list of all formats, we have specified selection criteria to include only a subset of the available data formats. One first needs to note that we are interested in formats, not in transport protocols. Unfortunately, there is a very close association between data formats and transport protocols in several cases, which makes it difficult to understand exactly the motivations of developers and users. Another consideration is that we do not need to describe all formats, but we need to identify formats that are also generic representations of information.

The following elements are the foundations of our rationale:

SIEM-market supported: We have looked at the SIEM market, and specifically at the adapters that SIEM products provide. We have specifically analyzed five major commercial SIEMs, OSSIM and Prelude (represented in the MASSIF project), as well as Envision from RSA Security, Novell Sentinel and ArcSight, to understand what kind of data formats they collect. This analysis is further detailed in section 2.1.1.

Standards-body driven: We are interested in using formats that are supported by open standards organizations, and that are freely available. In that group, we have selected standards defined by the Internet Engineering Task Force (IETF), the World Wide Web Consortium (W3C) and the Organization for the Advancement of Structured Information Standards (OASIS). Even though they may not be in production use today, they do provide an interesting and collective vision of the problem that we are addressing, and some of them have actually been used.

User-supported: Finally, we have also drawn from the collective experience and knowledge of the project's partners, particularly the commercial users and use case providers, to complement and confirm the first two criteria.



2.1.1 Analysis of Commercial SIEMs

Commercial SIEM vendors have a strong marketing incentive to collect information from as many data sources as possible, in order to market their products as a data warehouse for logging and compliance. They also have a strong technical incentive to limit the number of protocols they understand, in order to simplify not only development but also integration. Therefore, we expect the documentation of the capabilities of SIEM products to list many data sources but few protocols.

A summary of the list of connectors for the RSA EnVision SIEM is presented in table 2.1. It lists 186 products but only about 15 different connectors. We have counted in table 2.1 the number of times each connector type appears in the documentation¹. This summary shows that a majority of log sources are connected via Syslog. The three other important mechanisms are acquisition of log files via FTP, ODBC and SNMP; however, the documentation does not even mention whether SNMP collection relies on traps, or which management information bases are involved. The other connectors are dedicated to a specific set of tools (e.g. Check Point's LEA or Windows WMI).

Connector identity                     Number of instances   Percentage
Syslog                                 95                    51%
Log File FTP                           25                    13%
ODBC                                   25                    13%
SNMP                                   20                    11%
File Reader                            4                     2%
Agentless Windows                      4                     2%
Other connectors                       13                    7%
Total number of interfaced products    186                   100%

Table 2.1: RSA Envision collectors summary
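The percentages in table 2.1 follow directly from the instance counts. A minimal Python sketch of that arithmetic is given below; the function and variable names are illustrative, and the counts are copied from the table.

```python
# Instance counts copied from table 2.1
CONNECTOR_COUNTS = {
    "Syslog": 95,
    "Log File FTP": 25,
    "ODBC": 25,
    "SNMP": 20,
    "File Reader": 4,
    "Agentless Windows": 4,
    "Other connectors": 13,
}


def connector_shares(counts):
    """Return (total, {connector: share of the total, rounded to a whole percent})."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total) for name, n in counts.items()}
    return total, shares


total, shares = connector_shares(CONNECTOR_COUNTS)
```

Because each share is rounded independently, the rounded percentages add up to 99 rather than 100; the 100% in the last row of the table refers to the total of 186 products.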

Novell Sentinel documents connectivity to at least 61 products, using 11 different connectors. The identification of the collectors is extremely similar to the one shown in table 2.1. Even though we do not have the same level of detail available, we surmise that the results would be quite similar.

One of the major issues when dealing with SIEM tools is the lack of separation between the data format and the transport protocol. In fact, the operation of these products requires an understanding not only of the protocol, but also of the content and semantics of the message. That is why ArcSight has published the interface specification of its SIEM, the Common Event Format, hoping for wide adoption by the community of security tool vendors. While this adoption has not materialized, the format provides an interesting and important view of how SIEM vendors see their data providers today.

Finally, one needs to note that Prelude, one of the SIEM tools we are looking at in MASSIF, uses the IETF standard IDMEF for its data format, even though it does not use the companion IDXP protocol.

¹ Whenever a product listed several connectors, we selected the most represented one.



2.1.2 Presentation of log sources selection

One also needs to realize that this analysis gives us information about deployment in the field only in an approximate way. We have therefore added a third element, the experience of the partners in the field, to evaluate the data sources and reinforce our selection criteria. Table 2.2 presents our selection of log sources that are included in the alert and event languages description.

As one can see from table 2.2, our selection points us to 8 different alert and event languages. Beyond the ubiquitous syslog, we have included languages that are important either because of their standard status or because they will help us reach the goals of the project, even though they are not currently used in SIEM environments (to the best of our knowledge). When analyzing the existing SIEM environments, we have also eliminated some log sources from the description in this deliverable. The reasons for not selecting these sources are presented in table 2.3.

We will now proceed to the description of the alert and event languages, following as much as possible a homogeneous template. The description in itself is kept short, as the reader is referred to already existing documentation. We have rather focused on our experience with these data sources, and their relationship to the project.

2.2 The Common Event Format (CEF)

2.2.1 Reference

The Common Event Format (CEF)² is specified and provided without charge by ArcSight Inc.³, a SIEM vendor, as part of its strategy to foster interoperability between its SIEM and sensor vendors.

2.2.2 Objectives

The Common Event Format (CEF) is an open log management standard that improves the interoperability of security-related information from different security and network devices and applications. CEF has been designed to enable technology companies and customers to use a common event log format so that data can easily be collected and aggregated for analysis by an enterprise management system.

² http://www.arcsight.com/collateral/CEFstandards.pdf

³ http://www.arcsight.com/



2.2.3 Structure

Structure overview

CEF is an extensible, text-based, high-performance format designed to support multiple device types from both security and non-security devices and applications in the simplest manner possible, unlike other standards that target a single component of the security infrastructure, are tied to a specific transport protocol, or are designed specifically for applications and cannot support today's high-performance, real-time security requirements.

To simplify integration, the syslog message format is used as a transport mechanism. However, if an event producer is unable to write syslog messages, it is still possible to write the events to a file.

The basic grammar of the format includes the self-explanatory fields:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

An example of a CEF message taken from the documentation is:

Sep 19 08:26:10 zurich CEF:0|security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232
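To make the grammar concrete, the following Python sketch splits such a message into its header fields and its extension key-value pairs. It is a simplified illustration and not a complete CEF parser: the function name and the dictionary keys are assumptions made for this example, and escaped characters (for instance a pipe inside a header field) are not handled.

```python
import re

# Header fields in the order given by the grammar above
CEF_HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]

# key=value pairs; a value runs until the next " key=" or the end of the string
_EXTENSION_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_cef(line):
    """Split a syslog-transported CEF message into a dictionary."""
    start = line.find("CEF:")
    if start < 0:
        raise ValueError("not a CEF message")
    # Everything before "CEF:" is the syslog prefix (timestamp and host)
    syslog_prefix = line[:start].strip()
    parts = line[start + len("CEF:"):].split("|", 7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(CEF_HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    event["extension"] = dict(_EXTENSION_RE.findall(extension))
    event["syslog_prefix"] = syslog_prefix
    return event


EXAMPLE = ("Sep 19 08:26:10 zurich CEF:0|security|threatmanager|1.0|100|"
           "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232")
event = parse_cef(EXAMPLE)
```

Applied to the example message, the sketch yields the name "worm successfully stopped", the severity "10" and an extension dictionary with the keys src, dst and spt.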

Links with other data formats

CEF is fairly close to syslog in spirit, and also shares similarities with the Security Device Event Exchange (SDEE)⁴, a joint effort between Cisco and SourceFire to standardize events coming out of network-based intrusion detection sensors.

Relationship with MASSIF

This format should be considered in the light of competition. Owning the base data format is a way to lock customers into a specific SIEM platform, in this case ArcSight's, because of the investment in developing translation agents for custom logs and in deploying these agents in the field. It might be useful to have at least import capabilities from CEF into MASSIF.

⁴ http://www.cisco.com/en/US/docs/security/ips/specs/CIDEE_Specification.htm



2.2.4 Critical assessment of the format

Advantages

The Common Event Format promotes interoperability between various event (or log) generating devices. Although each vendor has its own format for reporting event information, these event formats often lack the key information necessary to integrate the events from their devices.

The ArcSight standard attempts to improve the interoperability of infrastructure devices by aligning the logging output from various technology vendors.

The Extension Dictionary of CEF provides a broad set of predefined extension keys which cover most event log requirements.

Issues

Custom extension keys are recommended for use only when no reasonable mapping of the information can be established for a predefined CEF key. While the custom extension key mechanism can be used to safely send information to CEF consumers for persistence, there are certain limitations as to when and how to access the data mapped into them.

Data submitted to ArcSight Logger using custom key extensions is retained in the system; however, it is not available for use in the Logger reporting infrastructure.

Uses

Use of the CEF format is limited to ArcSight deployments, despite the lobbying efforts made.

2.3 The Common Log Format (CLF)

2.3.1 Reference

The Common Log Format (CLF) and its sibling the Extended Common Log Format (ECLF) are specified by the W3C community⁵ and by the Apache developer community⁶. This format falls into the category of de facto standards; while it is widely adopted by web servers, there is no normative reference.

⁵ http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format

⁶ http://httpd.apache.org/docs/2.2/logs.html#common



2.3.2 Objectives

The Common Log Format is used by web servers, in particular the Apache web server, to trace all requests processed by the server. It is generally shared by all log files (access.log, error.log, and others). While the Apache web server offers the possibility to customize the log format, users tend to keep the default configuration, using either the simple CLF format or its extension, the ECLF format, which shares the same initial description.

2.3.3 Structure

Structure overview

The CLF format stores the following information:

IP address of the origin of the request as presented to the server. If the requesting browser is behind a proxy, the address of the proxy will show up in the logs.

identd identity of the client as specified in RFC 1413[8].

userid of the requester as determined by HTTP authentication.

Timestamp of the request.

Request line presented by the client, including the method, the URI and the protocol.

Status code that was returned to the client, indicating how the server was able to fulfil the request.

Size of the object returned to the client.

The ECLF format appends, after the information provided by CLF, additional information provided by the client, such as the referring URL and the user agent identifying the client's browser.
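The field list above maps naturally onto a regular expression. The Python sketch below parses a CLF line and, when present, the two additional ECLF fields. The sample line is illustrative and does not come from a MASSIF use case; the function name and the dictionary keys are assumptions made for this example, and escaped quotes inside fields are not handled.

```python
import re

CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<identd>\S+) (?P<userid>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    # Optional ECLF part: referring URL and user agent
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?$'
)


def parse_clf(line):
    """Parse one CLF or ECLF log line into a dictionary."""
    match = CLF_RE.match(line.strip())
    if match is None:
        raise ValueError("not a CLF/ECLF line")
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means that no object was returned to the client
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request line carries the method, the URI and the protocol
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["uri"], entry["protocol"] = parts
    return entry


# Illustrative sample line
SAMPLE_CLF = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_clf(SAMPLE_CLF)
```

For a plain CLF line the referer and user_agent entries are None; for an ECLF line they hold the two additional quoted fields.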


Links with other data formats

This format is similar in spirit to syslog (one line of timestamped textual information), but is tailored for web servers.


Relationship with MASSIF

We expect that all web servers providing information to the MASSIF platform will use one of these formats.




2.3.4 Critical assessment of the format

Advantages

The CLF format is very easy to use and very informative. Even though it limits itself to HTTP header information, it summarizes the important aspects of the activity of the web server from the point of view of security: who asked what, when, and how the server reacted. It is extremely compact and thus efficient in terms of processing. Being widely adopted by web server developers and proxy developers, it provides a solid basis for analysis and detection of malicious activity aiming to subvert the web server through the use of the HTTP protocol.

Issues

The CLF format does suffer from several issues that have an impact on the detection and diagnosis of attacks:

Multiplicity of lines: Since the HTTP server may serve multiple requests for a single page view, a complete diagnosis may require the analysis of multiple lines which do not necessarily share an identifying token.

Lack of payload information: The log file does not contain HTTP payload information. This means that for methods such as POST, the complete information is not available for diagnosis. This may be a serious limitation for diagnosing attacks such as XSS or SQL injection, for example if content is pushed into comments in dynamic web sites.

Lack of server-side information: The log file does not contain information identifying the web server (such as the virtual server accessed). This is a serious limitation in identifying the exact target of the attacker.

Uses

The CLF format is very widely used by web servers.



2.4 The Intrusion Detection Message Exchange Format (IDMEF)

2.4.1 Reference

The Intrusion Detection Message Exchange Format (IDMEF) is standardized by the Internet Engineering Task Force (IETF) as RFC 4765[5].

2.4.2 Objectives

The Intrusion Detection Message Exchange Format (IDMEF)[13] is intended to be a standard data format that automated intrusion detection systems can use to report alerts about events that they deem suspicious. The development of this standard format aims at enabling interoperability among commercial, open source, and research systems, allowing users to mix and match the deployment of these systems according to their strong and weak points to obtain an optimal implementation. It standardizes messages between a sensor providing security analysis and detecting threats, and a manager which receives and processes these messages. In the MASSIF context, the manager should be either the SIEM platform itself or a gateway to it.

2.4.3 Structure

Structure overview

IDMEF is built as a UML class diagram of components. The standard defines two types of messages, Alert (for security information) and Heartbeat (for management information). A message is an aggregation of components, modeling various entities that are part of an intrusion-detection sensor. At the top level, a message requires a timestamp (CreateTime in IDMEF), a meaning (Classification in IDMEF) and a generating sensor (Analyzer in IDMEF). The two other major components are the target and the source of the attack. Each of these blocks has a complex structure that attempts to capture the various facets that characterize a component of an information system. One example of the elementary components that compose these larger blocks is the notion of Node, which models a machine and is found in Analyzer, Source and Target.




IDMEF per se does not have links with other formats. However, several tools, including Prelude, provide mechanisms for parsing log formats such as syslog or CLF and transforming them into IDMEF messages. This parsing requires knowledge not only of the source format but also of its semantics, in order to provide a meaningful conversion.


The IDMEF format is the back-end format of Prelude. It is also used by 6cure and Télécom SudParis for their research activities, to represent and transmit alert information.


Advantages

Semantic IDMEF is extremely conscious of the semantics of the information it manipulates, and does much more than provide a syntax. Furthermore, it provides rationales and explanations to limit interpretation by developers and thus reduce ambiguity. IDMEF also includes many constants that strongly type objects. While the manner in which these constants are defined may not be the best, the idea of strongly typing objects contributes significantly to strong and clear semantics.

Modularity IDMEF is built from a set of components and is thus extremely modular. It also provides facilities for referencing components instead of including them in the message, which contributes to efficiency when transferring and sharing identical information.

Extensibility IDMEF provides facilities for including structured information in a message, in the form of the AdditionalData blob. This facility enables including original messages within IDMEF, or information that becomes available at a later stage.

Issues

Dissemination Even though IDMEF is an RFC, it is only an experimental one and it has not been widely picked up by security product developers, as sensor developers prefer simpler and less constrained solutions, and as SIEM developers have preferred to own their base formats.



XML IDMEF is an XML format, thus it is quite verbose. While it compresses quite well for transport purposes, it should not be used for storing information, nor for developing DB schemas. Also, the normative reference is the XML DTD and not the XML schema, thus type checking is less precise.

Extensibility IDMEF is extensible through the use of XML blobs. The idea is useful, but there is currently no way to create and share standard or useful patterns out of these blobs.

Uses

The IDMEF format is used mostly in the research community as a standard back-end for intrusion detection and alert correlation research projects and communities. It is also used by the Prelude SIEM environment (footnote 7) as its back-end data format (although the companion transport protocol IDXP is not used by Prelude).

2.5 InterFace to Metadata Access Point (IF-MAP)

2.5.1 Reference

trustedcomputing.org

http://www.trustedcomputinggroup.org/developers/trusted_network_connect/

Specification document of IF-MAP 2.0 [11]

Specification document of IF-MAP Metadata for Network Security [12]

2.5.2 Objectives

IF-MAP is an interface specification between a Metadata Access Point (MAP) Server and entities that either publish metadata or subscribe to metadata from the MAP. The entities are called IF-MAP clients, while the server is referred to as the Metadata Access Point (MAP) or as the IF-MAP Server. The latter provides functionalities to publish metadata, to search through the stored metadata, and to let clients subscribe to specific data and be notified when the data changes.

As IF-MAP aims to enable the structured collection and provision of data, it is not only a language to describe (security) events. Nevertheless, a specification of a metadata language for network security is part of IF-MAP [12]. As IF-MAP has been created by the TNC working group of the Trusted Computing Group, its foremost purpose is the gathering of information that can be used to make access decisions in a networking environment. Thus the metadata comprises elements like registered address bindings, authentication status, endpoint policy compliance status, endpoint behavior, and authorization status. But the specification is open and the process is not finished, which leaves room to influence the definition of models for metadata describing any kind of information.

7 http://www.prelude-ids.org/

Figure 2.1: An example metadata graph

2.5.3 Structure

Structure overview

The IF-MAP specification currently comprises two documents. One is the general description and SOAP binding, TNC_IFMAP_v2_0r36.pdf, also referred to as IF-MAP 2.0 [11]. The other is the specification of IF-MAP Metadata for Network Security, which is at v1.0 revision 25 at the time of writing this document [12]. Additionally, for a quick overview, we suggest reading the IF-MAP FAQ at www.trustedcomputing.org.

The session-based communication between a MAP client and server is always initiated by the client and is based on SOAP. The commands comprise different kinds of publish (update, delete, etc.), subscribe (e.g. notification poll) and search.

The data model of IF-MAP comprises two types of data: identifiers (e.g. identities of several types, mac-address, ip-address) and metadata, which can be related to each other by links. Figure 2.1 visualises the data model used in IF-MAP, where identifiers are represented by ovals, metadata is represented by rectangles, and links are represented by lines connecting identifiers.
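The interplay of identifiers, links, metadata and the publish, subscribe and search commands can be sketched with a toy in-memory store. This is only a conceptual model under strong assumptions: the class `ToyMapServer`, its method signatures and the metadata names are hypothetical, and the real protocol exchanges SOAP messages rather than Python calls.

```python
from collections import defaultdict


class ToyMapServer:
    """In-memory stand-in for a MAP server (not the SOAP-based protocol)."""

    def __init__(self):
        # A link is an unordered pair of identifiers carrying metadata.
        self._links = defaultdict(list)
        # Callbacks registered per identifier.
        self._subscribers = defaultdict(list)

    def publish(self, id_a, id_b, metadata):
        """Attach metadata to the link between two identifiers."""
        self._links[frozenset((id_a, id_b))].append(metadata)
        for identifier in {id_a, id_b}:
            for callback in self._subscribers[identifier]:
                callback(id_a, id_b, metadata)

    def subscribe(self, identifier, callback):
        """Be notified when metadata touching this identifier is published."""
        self._subscribers[identifier].append(callback)

    def search(self, start, max_depth=1):
        """Collect metadata on links within max_depth hops of start."""
        results, seen, visited_links = [], {start}, set()
        frontier = [start]
        for _ in range(max_depth):
            next_frontier = []
            for identifier in frontier:
                for link, metadata in self._links.items():
                    if identifier not in link or link in visited_links:
                        continue
                    visited_links.add(link)
                    results.extend(metadata)
                    for other in link - {identifier}:
                        if other not in seen:
                            seen.add(other)
                            next_frontier.append(other)
            frontier = next_frontier
        return results


# Identifiers are (type, value) pairs; metadata names are placeholders.
ip = ("ip-address", "192.0.2.5")
mac = ("mac-address", "00:11:22:33:44:55")
user = ("identity", "alice")

server = ToyMapServer()
notifications = []
server.subscribe(ip, lambda a, b, metadata: notifications.append(metadata))
server.publish(ip, mac, "ip-mac")
server.publish(mac, user, "example-login")
```

A search starting at the IP address with a depth of two reaches the user identity through the MAC address, which mirrors how a graph such as the one in the figure is traversed.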




The metadata description language is XML, thus any event description based on XML should be easy to introduce.


The IF-MAP specification allows a publish and subscribe model for information collection and processing. This could have a major impact on the different tasks of MASSIF, as it might provide an interface for the security information. This does not apply to a single use case only but to all four use cases, and could even enable a combination of the security information of the use cases and the different SIEM tools, in order to enable convergence and collaboration as well as a uniform presentation of the MASSIF appliances.


As some of these points have been described in previous subsections, this section mainly provides bullet points.

Advantages

Provision of an interface for various kinds of security information

A central database for information based on one protocol

A simple publish/subscribe data collector

Standard enables integration of application & system input & output from different vendors.

Opportunity to create a vocabulary explicitly for the needs of MASSIF and thereby have an influence on the standardisation process

IF-MAP is intrinsically defined to be extensible

Close contact of SIT with FHH (open source IF-MAP server irond) and Infoblox (IF-MAP server IBOS and IF-MAP starter kit), and opportunities for cooperation (user group) and dissemination (through Infoblox, who actively advertise every adoption of IF-MAP)



Issues

As the specification of the metadata is not concluded and currently covers only NAC information, there is no fully-fledged vocabulary. Nevertheless, one could add additional metadata types through the use of other tags.

The standardisation of IF-MAP is not finished, so the specification might evolve during the run of MASSIF. Standardisation with the IETF is planned for summer 2011 but usually takes several years.

Uses

As the metadata definition does not yet go beyond network security information, typical applications according to the TCG are:

Federation between remote access and network access control (NAC).

Integration of NAC with endpoint monitoring and e.g. data leak detection.

Integration of physical access control with NAC.

Federation of authentication information, single sign on/off.

Real time information gathering and processing.

There are many potential applications of specific interest to the goals of MASSIF. The TCG mentions applications in the fields of smart grid and cloud security, for reasons that also enable IF-MAP to facilitate SIEM integration, such as the aggregation, correlation and distribution of data from various applications and systems.

2.6 Incident Object Description Exchange Format (IODEF)

2.6.1 Reference

The Incident Object Description Exchange Format (IODEF) is standardized by the Internet Engineering Task Force (IETF) as RFC 5070[4].



2.6.2 Objectives

The Incident Object Description Exchange Format (IODEF) is a format for representing computer security information commonly exchanged between Computer Security Incident Response Teams (CSIRTs). It provides an XML representation for conveying incident information across administrative domains between parties that have an operational responsibility of remediation or a watch-and-warning over a defined constituency. The data model encodes information about hosts, networks, and the services running on these systems; attack methodology and associated forensic evidence; impact of the activity; and limited approaches for documenting workflow. The structured format provided by the IODEF allows for increased automation in processing of incident data; decreased effort in normalizing similar data from different sources; and a common format on which to build interoperable tools for incident handling and subsequent analysis, specifically when data comes from multiple constituencies.

2.6.3 Structure

Structure overview

The IODEF implementation is specified as an Extensible Markup Language (XML) Schema. The data model is composed of nineteen classes that describe the data related to the incident (e.g. incident ID, related activity, time, assessment, history, etc.). The data model serves as a transport format; it does not attempt to dictate a definition for an incident, but rather assumes a broad understanding of an incident that is flexible enough to encompass most operators. Since describing an incident for all definitions would require an extremely complex data model, the IODEF intends to be a framework to convey commonly exchanged incident information, ensuring that there are ample mechanisms for extensibility to support organization-specific information and techniques to reference the information kept outside the model.


The data model of the Intrusion Detection Message Exchange Format (IDMEF) influenced the design of the IODEF. The classes of the data model can be extended through the use of extensible classes, which provide the ability to have new atomic or XML-encoded data elements in all of the top-level classes of the Incident class and a few of the more complicated subordinate classes.

Similarly, while the IODEF supports different languages, the data model relies heavily on standardized enumerated attributes that can crudely approximate the contents of the document. With this approach, a CSIRT should be able to make some sense of an IODEF document it receives even if the text-based data elements are written in a language unfamiliar to the analyst.





Advantages

The overriding purpose of the IODEF is to enhance the operational capabilities of CSIRTs. Community adoption of the IODEF provides an improved ability to resolve incidents and convey situational awareness by simplifying collaboration and data sharing.

Implementing the IODEF in XML provides numerous advantages. Its extensibility makes it ideal for specifying a data encoding framework that supports various character encodings, such as UTF-8 and UTF-16. Likewise, the abundance of related technologies (e.g., XSL, XPath, XML-Signature) makes for simplified manipulation.

The data model supports multiple translations of free-form text. The intent is to allow the identical text to be encoded in different instances of the same class, but each being in a different language. This approach allows an IODEF document author to send recipients speaking different languages an identical document.

Issues

XML is fundamentally a text representation, which makes it inherently inefficient when binary data must be embedded or large volumes of data must be exchanged.

In order to support the changing activity of CSIRTs, the IODEF data model will need to evolve along with them. Internationalization and localization are of specific concern to the IODEF, since it is only through collaboration, often across language barriers, that certain incidents can be resolved. The IODEF supports this goal by depending on XML constructs, and through explicit design choices in the data model.

The domain of security analysis is not fully standardized and must rely on free-form textual descriptions. The IODEF attempts to strike a balance between supporting this free-form content and still allowing automated processing of incident information.

As the data encoded by the IODEF might be considered privacy sensitive by the parties exchanging the information or by those described by it, care needs to be taken in ensuring the appropriate disclosure during both document exchange and subsequent processing. Similarly, care must be taken by the parser to properly authenticate the recipient of the document and ascribe an appropriate confidence to the data prior to action.



Uses

We do not have specific information about the actual use of the IODEF by FIRST or CERT organizations.

2.7 IP Flow Information Export (IPFIX)

2.7.1 Reference

The Internet Protocol Flow Information Export (IPFIX) requirements are standardized by the Internet Engineering Task Force (IETF) as RFC 3917[10]. The protocol specification is standardized in RFC 5101[2].

2.7.2 Objectives

The Internet Protocol Flow Information Export (IPFIX) protocol was created from the need for a standard for exporting Internet Protocol flow information collected from routers, probes and other devices, to be used by mediation systems, accounting/billing systems and network management systems. The IPFIX standard defines how IP flow information has to be formatted and transferred from an exporter to a collector. Previously, many data network operators relied on the proprietary Cisco Systems NetFlow standard for traffic flow information export. The IPFIX Working Group chose NetFlow version 9 as the basis for the standardization. The working group submitted the IPFIX Protocol Specification to the IESG for approval in 2006.

2.7.3 Structure

Structure overview

IPFIX defines a flow as any number of packets observed in a specific timeslot and sharing a number of properties, such as "same source, same destination, same protocol". The IPFIX protocol defines a precise architecture for exporting flow data. This architecture includes an observation point for collecting IP packets belonging to a specific observation domain. A metering process filters data packets and aggregates information about these packets; this information defines the Flow Records. A Flow Record contains metrics related to packet header data, timestamping, sampling and classification. Flow Records are sent by the IPFIX exporter to an IPFIX collector, which is in charge of receiving and cataloguing IPFIX packets; exporters and collectors are in a many-to-many relation and work on a push-based paradigm.
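The aggregation step performed by a metering process can be sketched as grouping packets by the properties they share. The example below is a simplified illustration: the `Packet` tuple, the `meter` function and the choice of (source, destination, protocol) as the flow key are assumptions, and a real metering process also applies filtering, sampling and flow expiration.

```python
from collections import namedtuple

# Simplified view of an observed packet (hypothetical structure).
Packet = namedtuple("Packet", "timestamp src dst protocol length")


def meter(packets):
    """Aggregate packets into flow records keyed by shared properties."""
    flows = {}
    for packet in packets:
        # "Same source, same destination, same protocol" defines the flow.
        key = (packet.src, packet.dst, packet.protocol)
        record = flows.setdefault(key, {
            "first_seen": packet.timestamp,
            "last_seen": packet.timestamp,
            "packets": 0,
            "octets": 0,
        })
        record["first_seen"] = min(record["first_seen"], packet.timestamp)
        record["last_seen"] = max(record["last_seen"], packet.timestamp)
        record["packets"] += 1
        record["octets"] += packet.length
    return flows


# Hypothetical observations at one observation point.
observed = [
    Packet(0.0, "192.0.2.1", "198.51.100.7", "tcp", 60),
    Packet(0.5, "192.0.2.1", "198.51.100.7", "tcp", 1500),
    Packet(0.7, "192.0.2.2", "198.51.100.7", "udp", 120),
    Packet(1.2, "192.0.2.1", "198.51.100.7", "tcp", 40),
]
flow_records = meter(observed)
```

Four packets collapse into two flow records, which is what keeps the volume sent to the collector far below the volume of the observed traffic.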



The layout of the IPFIX data format is transmitted to the collector by means of Template Records, which can be standard or user-defined. A Template Record is an n-tuple of (type, length) pairs that entirely defines the structure and the semantics of a specific set of metrics sent to the collector. The collector distinguishes different Data Records by means of their Template ID. Data Records are composed of a certain number of Information Elements, which represent the described attributes.
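The way a collector uses a template to interpret a Data Record can be sketched as follows. This is a minimal illustration, not the IPFIX wire format: message and set headers, variable-length and enterprise-specific elements are ignored, and the function names are hypothetical. The Information Element numbers are those of the IANA IPFIX registry.

```python
import ipaddress

# A few Information Elements from the IANA IPFIX registry.
IE_NAMES = {
    1: "octetDeltaCount",
    2: "packetDeltaCount",
    4: "protocolIdentifier",
    8: "sourceIPv4Address",
    12: "destinationIPv4Address",
}

# Template ID -> ordered list of (information element id, length in octets).
templates = {}


def register_template(template_id, fields):
    """Store the layout announced by an exporter."""
    templates[template_id] = list(fields)


def decode_record(template_id, payload):
    """Decode one Data Record according to a previously received template."""
    record, offset = {}, 0
    for element_id, length in templates[template_id]:
        raw = payload[offset:offset + length]
        if len(raw) != length:
            raise ValueError("payload shorter than the template requires")
        name = IE_NAMES.get(element_id, f"element{element_id}")
        if name.endswith("IPv4Address"):
            record[name] = str(ipaddress.IPv4Address(raw))
        else:
            record[name] = int.from_bytes(raw, "big")  # network byte order
        offset += length
    return record


# Hypothetical template and record, for illustration only.
register_template(256, [(8, 4), (12, 4), (4, 1), (2, 8), (1, 8)])
payload = (
    ipaddress.IPv4Address("192.0.2.1").packed
    + ipaddress.IPv4Address("198.51.100.7").packed
    + bytes([6])                      # protocol 6 is TCP
    + (3).to_bytes(8, "big")          # packets
    + (1600).to_bytes(8, "big")       # octets
)
record = decode_record(256, payload)
```

Because the record itself carries no field names, the collector cannot interpret it without the template, which is why templates must reach the collector before the Data Records that reference them.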


IPFIX is not strictly related to other data formats, apart from Cisco Systems NetFlow version 9, its predecessor before standardization. Despite this isolation, the IPFIX data format can contain information for feeding an IDMEF message parser/sender: source and destination IP addresses, IP addresses of target machines, timestamps and data information. The format translation needs a proper IPFIX collector, in charge of extracting and classifying the needed information.


IPFIX messages and the protocol architecture supply information sent by several kinds of network devices: routers, sensors, and critical nodes and machines such as network management systems. These devices are in turn present in almost all the scenarios.


Advantages

Modularity The IPFIX architecture and its many-to-many paradigm are operationally modular and fit the needs of MASSIF for a distributed data metering system and for collecting data from remote sites.

Flexibility The IPFIX standard, by means of Template Records, provides ways to extend the data message format with user-defined fields, for example to introduce non-standard Information Elements. Moreover, it allows the structure of the messages to be defined. The standard works over different transport protocols such as TCP, UDP or SCTP.

Interoperability The IPFIX protocol is a standard and can rely on a large number of compliant devices from several vendors, reducing the number of ad-hoc solutions.

Extensibility IPFIX information is not limited to flows: it can also cover network behavior, performance behavior, application behavior, host behavior and security analysis.



Issues

Encryption Analysis of encrypted packets is a significant obstacle to proper data inspection. In encrypted scenarios, IP packet fields are encrypted and unobservable at several layers, so some metrics, for example those related to protocol headers, cannot be evaluated.

Hardware requirements Probes must be deployed on every link to be monitored. Moreover, deep inspection on high-bandwidth networks cannot be sustained by a simple router.

Collector flooding Since the protocol is push-based, the collector could suffer from excessive load coming from the probes. The export configuration must be planned carefully.

Uses

The IPFIX format is widely implemented and adopted by generic network devices, such as routers, and by network analysis devices provided by several vendors. IPFIX-compliant devices are used to support effective network measurement, providing vital information on the health of the managed networks. The collected network information can be used for several purposes; in particular, the standard provides a strong back-end for security functionalities such as intrusion detection.

2.8 The Syslog Format

2.8.1 Reference

The Syslog Protocol is standardized by the Internet Engineering Task Force (IETF) as RFC 5424[6].

2.8.2 Objectives

The need for a new layered specification has arisen because standardization efforts for reliable and secure syslog extensions suffer from the lack of a Standards-Track and transport-independent RFC. Without this, each other standard needs to define its own syslog packet format and transport mechanism, which over time will introduce subtle compatibility issues. The goal of this architecture is to separate message content from message transport while enabling easy extensibility for each layer.



2.8.3 Structure

Structure overview

This protocol utilizes a layered architecture, which allows the use of any number of transport protocols for transmission of syslog messages. It also provides a message format that allows vendor-specific extensions to be provided in a structured way. The syslog protocol does not provide acknowledgment of message delivery. Though some transports may provide status information, conceptually, syslog is a pure simplex communication protocol.

The syslog message has the following ABNF[3] definition:

SYSLOG-MSG      = HEADER SP STRUCTURED-DATA [SP MSG]

HEADER          = PRI VERSION SP TIMESTAMP SP HOSTNAME
                  SP APP-NAME SP PROCID SP MSGID
PRI             = "<" PRIVAL ">"
PRIVAL          = 1*3DIGIT ; range 0 .. 191
VERSION         = NONZERO-DIGIT 0*2DIGIT
HOSTNAME        = NILVALUE / 1*255PRINTUSASCII

APP-NAME        = NILVALUE / 1*48PRINTUSASCII
PROCID          = NILVALUE / 1*128PRINTUSASCII
MSGID           = NILVALUE / 1*32PRINTUSASCII

TIMESTAMP       = NILVALUE / FULL-DATE "T" FULL-TIME
FULL-DATE       = DATE-FULLYEAR "-" DATE-MONTH "-" DATE-MDAY
DATE-FULLYEAR   = 4DIGIT
DATE-MONTH      = 2DIGIT ; 01-12
DATE-MDAY       = 2DIGIT ; 01-28, 01-29, 01-30, 01-31
                         ; based on month/year
FULL-TIME       = PARTIAL-TIME TIME-OFFSET
PARTIAL-TIME    = TIME-HOUR ":" TIME-MINUTE ":" TIME-SECOND
                  [TIME-SECFRAC]
TIME-HOUR       = 2DIGIT ; 00-23
TIME-MINUTE     = 2DIGIT ; 00-59
TIME-SECOND     = 2DIGIT ; 00-59
TIME-SECFRAC    = "." 1*6DIGIT
TIME-OFFSET     = "Z" / TIME-NUMOFFSET
TIME-NUMOFFSET  = ("+" / "-") TIME-HOUR ":" TIME-MINUTE

STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT
SD-ELEMENT      = "[" SD-ID *(SP SD-PARAM) "]"
SD-PARAM        = PARAM-NAME "=" %d34 PARAM-VALUE %d34
SD-ID           = SD-NAME
PARAM-NAME      = SD-NAME
PARAM-VALUE     = UTF-8-STRING ; characters '"', '\' and ']'
                               ; MUST be escaped.
SD-NAME         = 1*32PRINTUSASCII
                  ; except '=', SP, ']', %d34 (")

MSG             = MSG-ANY / MSG-UTF8
MSG-ANY         = *OCTET ; not starting with BOM
MSG-UTF8        = BOM UTF-8-STRING
BOM             = %xEF.BB.BF

UTF-8-STRING    = *OCTET ; UTF-8 string as specified
                         ; in RFC 3629

OCTET           = %d00-255
SP              = %d32
PRINTUSASCII    = %d33-126
NONZERO-DIGIT   = %d49-57
DIGIT           = %d48 / NONZERO-DIGIT
NILVALUE        = "-"

Syslog message size limits are dictated by the syslog transport mapping in use. There is no upper limit per se. Each transport mapping defines the minimum maximum required message length support, and the minimum maximum must be at least 480 octets in length.

The TIMESTAMP field is a formalized timestamp derived from [RFC3339].

The HOSTNAME field identifies the machine that originally sent the syslog message.

The APP-NAME field should identify the device or application that originated the message. It is a string without further semantics. It is intended for filtering messages on a relay or collector.

The PROCID field is a value that is included in the message, having no interoperable meaning, except that a change in the value indicates there has been a discontinuity in syslog reporting. The field does not have any specific syntax or semantics; the value is implementation-dependent and/or operator-assigned.

The MSGID should identify the type of message. For example, a firewall might use the MSGID TCPIN for incoming TCP traffic and the MSGID TCPOUT for outgoing TCP traffic. Messages with the same MSGID should reflect events of the same semantics. The MSGID itself is a string without further semantics. It is intended for filtering messages on a relay or collector.



STRUCTURED-DATA provides a mechanism to express information in a well-defined, easily parseable and interpretable data format. There are multiple usage scenarios.
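The message layout described in this section can be parsed with a few regular expressions. The sketch below is a simplified reading of the grammar, not a conformant parser: the function names are hypothetical, field length limits and timestamp syntax are not checked, and the BOM that may precede a UTF-8 MSG is not handled. It relies on the fact that the PRI value encodes the facility multiplied by 8 plus the severity.

```python
import re

HEADER_RE = re.compile(
    r'^<(?P<prival>\d{1,3})>(?P<version>[1-9]\d{0,2}) '
    r'(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) '
    r'(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$',
    re.DOTALL,
)
# Inside a PARAM-VALUE, '"', '\' and ']' only appear escaped by a backslash.
SD_ELEMENT_RE = re.compile(
    r'\[(?P<sd_id>[^ =\]"]+)'
    r'(?P<params>(?: [^ =\]"]+="(?:\\.|[^"\\\]])*")*)\]'
)
SD_PARAM_RE = re.compile(r' (?P<name>[^ =\]"]+)="(?P<value>(?:\\.|[^"\\\]])*)"')


def _unescape(value):
    """Remove the backslash in front of escaped characters."""
    return re.sub(r'\\(["\\\]])', r'\1', value)


def parse_syslog(line):
    """Split a syslog message into header fields, structured data and MSG."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("header does not match the expected layout")
    fields = match.groupdict()
    prival = int(fields.pop("prival"))
    if prival > 191:
        raise ValueError("PRIVAL out of range")
    fields["facility"], fields["severity"] = divmod(prival, 8)
    # NILVALUE ("-") means the field is absent.
    for name in ("timestamp", "hostname", "app_name", "procid", "msgid"):
        if fields[name] == "-":
            fields[name] = None

    rest = fields.pop("rest")
    structured = {}
    if rest.startswith("-"):
        rest = rest[1:]
    else:
        position = 0
        while position < len(rest) and rest[position] == "[":
            element = SD_ELEMENT_RE.match(rest, position)
            if element is None:
                raise ValueError("malformed STRUCTURED-DATA")
            structured[element["sd_id"]] = {
                param["name"]: _unescape(param["value"])
                for param in SD_PARAM_RE.finditer(element["params"])
            }
            position = element.end()
        if position == 0:
            raise ValueError("missing STRUCTURED-DATA")
        rest = rest[position:]
    fields["structured_data"] = structured
    fields["msg"] = rest[1:] if rest.startswith(" ") else None
    return fields


# Sample message in the style of the examples given in the RFC.
sample = ('<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - '
          'ID47 [exampleSDID@32473 iut="3" eventSource="Application" '
          'eventID="1011"] An application event log entry')
parsed = parse_syslog(sample)
```

Filtering on a relay or collector can then operate on `app_name` and `msgid` without inspecting the free-text MSG part.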


The syslog protocol specified in RFC 5424 supersedes the BSD syslog protocol documented in RFC 3164[9].


Given its widespread use, we expect many of the use cases to partially rely on it. Beyond the project, supporting syslog is an absolute requirement for commercial success of a SIEM platform, be it as software or as a managed security service.


Advantages

The syslog format tries to provide a solid basis that allows code to be written once for each syslog feature rather than once for each transport. Without this format, each other standard would need to define its own syslog packet format and transport mechanism, which over time will introduce subtle compatibility issues.

Issues

Messages may contain control characters, including the NUL character. In addition, invalid UTF-8 sequences may be used by an attacker to inject ASCII control characters. Similarly, message truncation can be misused by an attacker to hide vital log information.

There is no mechanism in the syslog protocol to detect message replay. An attacker may record a set of messages that indicate normal activity of a machine. At a later time, that attacker may remove that machine from the network and replay the syslog messages to the relay or collector.

Some messages may be lost because there is no mechanism to ensure delivery, and the underlying transport may be unreliable (e.g., UDP).

Syslog can generate unlimited amounts of data. The transfer of this data over UDP is generally problematic, since UDP lacks congestion control mechanisms.

The syslog protocol does not have mechanisms to provide confidentiality for the messages in transit.

Network administrators must take the time to estimate the appropriate capacity of the syslog collector. An attacker may perform a Denial of Service attack by filling the disk of the collector with false messages.

Uses

Syslog is in widespread use, both for UNIX operating system hosts and for networking equipment.

2.9 Windows Management Instrumentation (WMI)

2.9.1 Reference

Windows Management Instrumentation (WMI) is the Microsoft implementation (footnote 8) of Web-based Enterprise Management (WBEM), which is an industry initiative to develop a standard technology for accessing management information in an enterprise environment.

WMI uses the Common Information Model (CIM) (footnote 9) industry standard to represent systems, applications, networks, devices, and other managed components. CIM is developed and maintained by the Distributed Management Task Force (DMTF). The Managed Object Format (MOF) (footnote 10) language is used to create new CIM classes.

2.9.2 Objectives

The main goal of WMI is to provide a standard for sharing management information between Windows-based management applications throughout the network. The aim of this set of specifications is to establish a uniform model that works in different environments and interacts with other existing management standards to access information from any source, such as DMI (Desktop Management Interface) or SNMP.

8 http://msdn.microsoft.com/en-us/library/aa384642(v=VS.85).aspx

9 http://www.dmtf.org/standards/cim

10 http://msdn.microsoft.com/en-us/library/aa823192%28v=vs.85%29.aspx



2.9.3 Structure

Structure overview

The Microsoft WMI implements the three-tiered model of the WBEM architecture for working with management data, which in this case includes the following components: a standard mechanism for storing object definitions (a CIM-compliant object repository), a standard protocol for collecting and distributing management data (such as COM/DCOM), and one or more Win32 dynamic-link libraries (DLLs) that function as WMI data providers.

Figure 2.2 shows the data flow in the WMI architecture (footnote 11).

Figure 2.2: Windows Management Instrumentation architecture data flow

11 http://msdn.microsoft.com/en-us/library/ff566343%28v=VS.85%29.aspx



It is important to highlight that WMI is an object model and not a language. Several scripting languages, such as VBScript or Windows PowerShell, can be used with WMI to manage the different Windows-based servers locally and remotely.

The Windows Management Instrumentation defines the objects, methods and properties which are needed to access the management information data from the different parts of the operating system. The model that WMI uses to store this information is the standard Common Information Model (CIM).

According to the CIM Specification 2.3 (footnote 12), there are three different levels of classes in the CIM model for storing information: the Core, Common and Extended classes.

The core model defines an information model that applies to all areas of management.

The common model applies to information that is common to particular management areas (such as systems, applications, networks and devices) but which is independent of a particular implementation or technology.

The extension schemas are extensions to the common model for a specific technology, for example for different operating systems such as Microsoft Windows or Unix.

On the other hand, according to the CIM definition provided by the DMTF, CIM is composed of a specification and a schema. The specification defines the details for integration with other management models, while the schema provides the actual model descriptions.

The specification can be described in Unified Modeling Language (UML), Managed Object Format (MOF), or Extensible Markup Language (XML). However, for creating and describing classes in the Common Information Model (CIM), the Managed Object Format (MOF) (footnote 13) is the most widely used language.


WMI is an implementation of Web-Based Enterprise Management (WBEM) and is fully compliant with the Common Information Model (CIM), defined by the DMTF, which is based upon UML.

MOF, the language that is used for describing the CIM classes, is based on the Interface Definition Language (IDL).

It is possible to use Windows Remote Management (WinRM) instead of the Distributed Component Object Model (DCOM) to obtain remote WMI management data, using the SOAP-based WS-Management protocol, whose messages are formatted in XML.


In the Olympic Games scenario there are Windows systems where WMI might be used to collect the logs, but at present the logs are forced into the standard format and moved to syslog.

12 http://www.dmtf.org/sites/default/files/standards/documents/DSP0004V2.3_final.pdf

13 http://www.dmtf.org/sites/default/files/standards/documents/DSP0004_2.6.0_0.pdf




Advantages

WMI is widely present in Windows-based applications, so it is a common way to access and share management information from local and remote computers. Besides, there is a variety of scripting languages (such as VBScript or Perl) that can be used in enterprise applications and administrative scripts to obtain WMI data or take actions through WMI.

CIM is a model that permits both a common model that applies to all areas and particular extensions to define different management information for systems, networks, applications, devices and services. This feature allows building semantically rich management information that will be exchanged throughout the network.

Issues

The WMI log files are being replaced by Event Tracing for Windows (ETW).

Vulnerabilities can be found in applications that use Windows Management Instrumentation. For example, in some applications, insufficient security protections on WMI providers could allow a local attacker to gain elevated privileges on the local system and use them to take control of it.

Uses

WMI scripts and applications are used to obtain and exchange management information on Windows-based systems. These scripts allow administrative tasks to be performed on parts of the operating system, and management data to be shared with different products. Examples of such products are Microsoft System Center Operations Manager and Windows Remote Management (WinRM).

2.10 WS-Eventing and WS-Notification

2.10.1 Objectives

WS-Eventing[1] and WS-Notification[7] are two competing specifications that standardize message formats and Web services interfaces for subscription management and notification delivery in WS-based event notification systems. A WS-based event notification system utilizes Web services technologies to deliver event notifications and manage subscriptions. In such a system, a SOAP-formatted subscription is sent to an event producer Web service, requesting that a certain kind of event notification be delivered to one or more event consumer Web services. As events occur, the event consumer Web services receive SOAP-formatted notification messages. The notification messages can be relayed through intermediaries and can use different transport mechanisms.

2.10.2 Structure

Architecture

The architectures presented in WS-Eventing and WS-Notification are remarkably similar despite their incompatibility. In fact, subsequent versions of each specification have converged towards each other, borrowing concepts from the other to mitigate their own deficiencies.

WS-Eventing and WS-Notification both possess an identical WS-based architecture and follow the Publisher/Subscriber design. Both define subscriber and subscription manager entities. The event sink defined in WS-Eventing is comparable to the notification consumer defined in WS-Notification. The subscribers are separated from notification consumers, so that notification consumers are required to handle only the received notification messages. They are not required to know the message broker location or to manage subscriptions. WS-Eventing does not separate the publisher from the event source. The event source in WS-Eventing combines the functions of the notification producer and the publisher defined in WS-Notification.

Function

WS-Eventing defines five operations, namely Subscribe, Renew, GetStatus, Unsubscribe and SubscriptionEnd. The Subscribe operation is used to create a subscription for an event sink. The Renew, GetStatus and Unsubscribe operations are provided by subscription managers to manage existing subscriptions. If an event source terminates a subscription unexpectedly, a SubscriptionEnd message is generated and sent to the address specified in the subscription request. If that address is not present in the subscription request, the SubscriptionEnd message is not generated.

WS-Notification offers operations comparable to these five. Even though it does not define GetStatus and SubscriptionEnd operations, they can be implemented using the (optional) WS-ResourceFramework, since WS-Notification can treat subscriptions as WS-Resources as defined in the WS-ResourceFramework specification.
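
As an illustration of the Subscribe operation, the following Python sketch assembles a SOAP-formatted subscription request. It is a minimal sketch and not taken from the specification text: the namespace URIs are those of the August 2004 drafts of WS-Eventing and WS-Addressing and should be checked against the version of the specification actually in use, while the function name `build_subscribe`, its parameters and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the 2004/08 drafts; verify against the
# specification version in use.
SOAP = "http://www.w3.org/2003/05/soap-envelope"
WSA = "http://schemas.xmlsoap.org/ws/2004/08/addressing"
WSE = "http://schemas.xmlsoap.org/ws/2004/08/eventing"


def build_subscribe(source_url, sink_url, end_to_url=None, expires="PT1H"):
    """Return a SOAP Subscribe request as an ElementTree element.

    source_url: address of the event source (hypothetical).
    sink_url:   address of the event sink that receives notifications.
    end_to_url: optional address for the SubscriptionEnd message.
    """
    envelope = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP}}}Header")
    ET.SubElement(header, f"{{{WSA}}}Action").text = WSE + "/Subscribe"
    ET.SubElement(header, f"{{{WSA}}}To").text = source_url

    body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
    subscribe = ET.SubElement(body, f"{{{WSE}}}Subscribe")

    # Without EndTo, no SubscriptionEnd message is sent.
    if end_to_url is not None:
        end_to = ET.SubElement(subscribe, f"{{{WSE}}}EndTo")
        ET.SubElement(end_to, f"{{{WSA}}}Address").text = end_to_url

    delivery = ET.SubElement(subscribe, f"{{{WSE}}}Delivery")
    notify_to = ET.SubElement(delivery, f"{{{WSE}}}NotifyTo")
    ET.SubElement(notify_to, f"{{{WSA}}}Address").text = sink_url
    ET.SubElement(subscribe, f"{{{WSE}}}Expires").text = expires
    return envelope


request = build_subscribe(
    "http://source.example/events",
    "http://sink.example/notify",
    end_to_url="http://sink.example/end",
)
print(ET.tostring(request, encoding="unicode"))
```

Omitting `end_to_url` produces a request without an EndTo element, which corresponds to the case where no SubscriptionEnd message is generated.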



Delivery mode

Both WS-Eventing and WS-Notification can use push, pull and wrapped modes to deliver notification messages. The wrapped delivery mode can encapsulate several notification messages into one for efficient delivery. The pull mode enables the event sink or notification manager to check an event source periodically for relevant events. In push mode, the event source waits for an acknowledgement for the notification message it sends.

Filters

WS-Notification defines three types of message filters, namely TopicExpression, ProducerProperties and MessageContent. A subscriber can use any or all of these filters. WS-Eventing allows at most one filter in subscription requests. The default filter is a content-based filter using XPath expressions in a specified dialect that evaluate to a Boolean value used as the filtering criterion. WS-Eventing does not specify a way to filter messages using ProducerProperties of publishers.


The WS-Eventing and WS-Notification specifications are composable with other WS-* specifications. Hence they define only the key publisher/subscriber functions and rely on other WS-* specifications to provide various value additions such as security, reliability and transactions. For instance, WS-Security can be used with WS-Eventing or WS-Notification to provide secure delivery of messages.


Both specifications are candidates for receiving events from Web services platforms.
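
The effect of a content-based filter can be approximated with the Python standard library. This is only a sketch: `xml.etree.ElementTree` supports a subset of XPath and cannot evaluate arbitrary Boolean XPath expressions, so the filter is modelled here as "the path selects at least one node". The payload structure, element names and the helper `passes_filter` are hypothetical.

```python
import xml.etree.ElementTree as ET


def passes_filter(notification_xml, path):
    """Return True if the path selects at least one node in the payload.

    This stands in for the Boolean XPath evaluation of a content-based
    filter; ElementTree only understands a subset of XPath 1.0.
    """
    root = ET.fromstring(notification_xml)
    return root.find(path) is not None


# Hypothetical notification payload; element names are illustrative only.
event = (
    "<notification>"
    "<alert><severity>high</severity><host>web01</host></alert>"
    "</notification>"
)

# Deliver only alerts whose severity is "high".
print(passes_filter(event, "alert[severity='high']"))
print(passes_filter(event, "alert[severity='low']"))
```

An implementation that needs full XPath 1.0 semantics would have to rely on a complete XPath engine rather than on this subset.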

2.10.3 Advantages of the formats

Both specifications provide means to develop distributed event notification systems using existing Web services technology, which intrinsically provides vendor-independent, platform-independent and programming-language-independent interoperability.

They are composable with other WS-* specifications to provide various value additions such as secure delivery, reliability and transactions.

They fit well with the asynchronous Web services invocation paradigm.



| Data source | SIEM | Standard | Experience | Rationale summary |
|---|---|---|---|---|
| CEF | Y(1) | N | Y | CEF is an interesting glance at data collection from an important SIEM vendor and is a public specification. |
| CLF | Y(all) | Y | Y | CLF is a major log format for web servers, being supported by Apache out of the box. It can be directly integrated in many SIEMs, e.g. Prelude and RSA. |
| IDMEF | Y(1) | Y | N | While IDMEF is not widely used in the community, and its significant overhead may prevent its further diffusion, it does provide a reference viewpoint for modeling alert information. At least 2 MASSIF partners have experience with IDMEF. |
| IF-MAP | N | Y | Y | IF-MAP is a recent newcomer and has industrial backing, although outside the SIEM community so far. One MASSIF partner has experience with IF-MAP. |
| IODEF | N | Y | N | IODEF addresses a different community than the classic SIEM world, so it provides an additional, alternative viewpoint about decision support modeling that has, to our knowledge, no equivalent, and that is important for the MASSIF decision support components. |
| IPFIX | N | Y | Y | IPFIX is becoming increasingly important in the networking world, where it may provide an alternative or a complement to syslog. |
| Syslog | Y(all) | Y | Y | This is the major data source. It is clearly used a lot in SIEMs, has standards backing and is used by professionals. It is the de facto data source standard for the ATOS use case and for many network operators. While the analysis of syslog messages needs to be refined to really understand the content, it does provide a first entry point for syntactic and semantic analysis. |
| WMI | Y | Y | Y | WMI is one of the major interfaces for managing Microsoft Windows systems, and as such is a way to retrieve information from them that is of interest to the MASSIF project. |
| WS-Eventing | N | Y | Y | While these languages are currently rarely included in SIEM environments, the focus of MASSIF on business process attack detection makes these languages important. |

Table 2.2: Included log sources summary



| Data source | SIEM | Standard | Experience | Rationale summary |
|---|---|---|---|---|
| ODBC | Y | N | N | While ODBC is used as a collection mechanism, it should be considered with caution. We believe that its use is oriented to Windows environments, and WMI provides a better alternative. Also, it is purely about transport and does not provide us with information about the data, and is thus considered out of scope of this deliverable. |
| SNMP | Y | Y | N | While SNMP is cited as a collection mechanism by several SIEMs, its use seems to be limited to transporting data. The management information bases used by SIEMs would have been in scope, but SIEM products do not publicly document this, and the transport protocol alone is out of the scope of this deliverable. |
| Log file pull | Y | N | Y | Several methods for pulling log files are mentioned in SIEM documentation, such as FTP, SFTP, SSH or SCP. This does not provide information about the content of the information handled and thus does not fall into the scope of this deliverable. |

Table 2.3: Eliminated log sources summary


Chapter 3

Use-case specific data streams

3.1 Olympic Games Scenario

3.1.1 Motivation and description

The Olympic Games SIEM definition follows business drivers; that is, the definition is tied to the specific technology that the customer (the Local Organizing Committee) chooses. Usually this decision follows sponsorship interests.

Hence, the event processing languages in the Olympic Games scenario are tied to the specific SIEM product development context. The choice of event processing language and protocol influences the internal representation, transmission and storage of the event data, but it is usually tied to the specific SIEM product. The current context is based on the Novell SIEM product (i.e. Novell Sentinel 6.1 in the Vancouver Winter Olympic Games project), and only two different protocols were used in the last Olympic Games: Syslog and LEA.

The Olympic Games SIEM uses the Novell Sentinel product. Novell Sentinel 6.1 (footnote 1) delivers real-time monitoring and remediation for automated security and compliance. With a single view of security and compliance events across the enterprise, Sentinel 6.1 combines identity management and security event management in real time. Sentinel 6 streamlines labor-intensive and error-prone processes, cuts costs through automation, and enables a more rigorous security and compliance program.

1 http://www.novell.com/products/sentinel/


3.1.2 Novell Sentinel Interface: Syslog data format

Description

Syslog (footnote 2; see section 2.8) is a standard for logging program messages. It allows separation of the software that generates messages from the system that stores them and the software that reports and analyzes them. It also provides devices, which would otherwise be unable to communicate, a means to notify administrators of problems or performance.

There are three main topics when defining the Olympic Games related events and languages:

1. How to collect the data: transmission (syslog, WMI, SNMP, etc.)

2. How to parse the data: format, spaces and commas

3. How to make sense out of the collected data: meaning/logic of the fields defined by the monitored application/system

Mapping these three topics into Novell Sentinel 6.1 we get the following Novell components:

Sources are systems that are being monitored.

Connectors define connectivity protocols. Only two different protocols were used in the last Olympic Games: Syslog and LEA.

Collectors define parsing rules and the mapping of the internal data representation into the Sentinel taxonomy. Examples of collectors used in the Olympic Games were Windows (through Snare agents), Sourcefire, Nortel switches/routers and Sophos Antivirus.
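
The division of work between sources, connectors and collectors can be sketched in Python as follows. This is a hypothetical illustration and does not use the Novell Sentinel API: the function names, the `key=value` record layout and the normalized field names are assumptions made for the example.

```python
def connector(raw_lines):
    """Connector stage: deliver non-empty raw records from a source."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line


def collector(record):
    """Collector stage: parse 'key=value' tokens separated by commas or
    spaces and map them onto a (hypothetical) normalized event."""
    fields = {}
    for token in record.replace(",", " ").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return {
        "source_ip": fields.get("src"),
        "target_ip": fields.get("dst"),
        "event_name": fields.get("action", "unknown"),
    }


# Source stage: a monitored system emitting raw records (made-up data).
source = [
    "src=10.0.0.5, dst=10.0.0.9, action=deny",
    "",
    "src=10.0.0.7 dst=10.0.0.9 action=allow",
]

events = [collector(record) for record in connector(source)]
for event in events:
    print(event)
```

The connector only moves records, while all knowledge of the record format and of the target taxonomy sits in the collector. A new device type would therefore need a new collector but could reuse an existing connector.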

Advantages

Syslog provides flexibility when dealing with different SIEM products and is a widely used log format.

Syslog is the preferred (de facto) format in the Olympic Games scenario.

Drawbacks and issues

We have used Syslog as the native, built-in logging function of the network devices, e.g. switches/routers, IDS, FW appliances, etc. These devices cannot speak IDMEF or similar formats.

2 http://www.syslog.org/



When monitoring Windows systems we could have used WMI to collect the logs, but we still enforced the standard format and moved to syslog by deploying Snare agents on each Windows system to translate Eventlog into Syslog.
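
The principle of translating an event record into a syslog line can be sketched as follows. This is not the output format of the Snare agents, which is not described in this document. It is a minimal illustration that formats a record using the syslog header layout shown in the examples of this section, and the function name, host name and record values are hypothetical.

```python
from datetime import datetime, timezone


def to_syslog(facility, severity, timestamp, host, app, msg,
              procid="-", msgid="-"):
    """Format one record as a syslog line without STRUCTURED-DATA.

    The priority value is facility * 8 + severity.
    """
    pri = facility * 8 + severity
    # Millisecond precision, expressed in UTC with a trailing "Z".
    utc = timestamp.astimezone(timezone.utc)
    ts = utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return f"<{pri}>1 {ts} {host} {app} {procid} {msgid} - {msg}"


# Hypothetical Windows event record translated into a syslog line.
line = to_syslog(
    facility=4,
    severity=5,
    timestamp=datetime(2010, 2, 12, 18, 0, 0, tzinfo=timezone.utc),
    host="winhost01",
    app="Security",
    msg="Logon failure for user alice",
)
print(line)
```

The "-" placeholders mark fields for which the record carries no value, in the same way as in the examples of this section.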

Examples

The following are examples of valid syslog messages. A description of each example can be found below it. The examples are based on similar examples from RFC 3164[9] and may be familiar to readers. The otherwise-unprintable Unicode BOM is represented as "BOM" in the examples.

Example 1 - with no STRUCTURED-DATA

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8

In this example, the VERSION is 1 and the Facility has the value of 4. The Severity is 2, which gives a priority value of 34 (4 × 8 + 2). The message was created on 11 October 2003 at 10:14:15pm UTC, 3 milliseconds into the next second. The message originated from a host that identifies itself as "mymachine.example.com". The APP-NAME is "su" and the PROCID is unknown. The MSGID is "ID47". The MSG is "'su root' failed for lonvick...", encoded in UTF-8. The encoding is defined by the BOM. There is no STRUCTURED-DATA present in the message; this is indicated by "-" in the STRUCTURED-DATA field.
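
The field breakdown described above can be reproduced with a short Python sketch. It assumes that the message starts with the priority value in angle brackets (`<34>` for Facility 4 and Severity 2, since the priority is Facility × 8 + Severity). The regular expression and the function name `parse_header` are illustrative and do not form a complete syslog parser, and the BOM is left out of the sample message.

```python
import re

# Header layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID rest
HEADER = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)",
    re.DOTALL,
)


def parse_header(line):
    """Split a syslog line into header fields and decode the priority."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a recognized syslog message")
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    # Priority = facility * 8 + severity.
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields


message = (
    "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 "
    "- 'su root' failed for lonvick on /dev/pts/8"
)
parsed = parse_header(message)
print(parsed["facility"], parsed["severity"], parsed["app_name"])
```

The `rest` field still contains the STRUCTURED-DATA marker followed by the MSG part, which a complete parser would have to split further.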

Example 2 - with no STRUCTURED-DATA

<165>1 2003-08-24T05:14:15.000003-07:00 192.0.2.1 myproc 8710 - - %% It's time to make the do-nuts.

In this example, the VERSION is 1. The Facility is 20 and the Severity is 5, which gives a priority value of 165 (20 × 8 + 5). The message was created on 24 August 2003 at 5:14:15am, with a -7 hour offset from UTC, 3 microseconds into the next second. The HOSTNAME is "192.0.2.1", so the syslog application did not know its FQDN and used one of its IPv4 addresses instead. The APP-NAME is "myproc" and the PROCID is "8710" (for example, this could be the UNIX PID). There is no STRUCTURED-DATA present in the message; this is indicated by "-" in the STRUCTURED-DATA field. There is no specific MSGID and this is indicated by the "-" in the MSGID field. The message is "%% It's time to make the do-nuts.". As the Unicode BOM is missing, the syslog application does not know the encoding of
