intrusion detection by intelligent analysis of data …intrusion detection by intelligent analysis...

Intrusion Detection by Intelligent analysis of data across multiple gateways inreal-time.

Joel Scanlan, Samuel Lorimer, Jacky Hartnett and Kevin Manderson

School of Computing, University of TasmaniaSandy Bay Tasmania

[email protected], [email protected], [email protected], [email protected]

Keywords: Intrusion Detection, Firewall, Multiple Gateway, Analysis

Abstract – Current firewalls and intrusiondetection systems are generally designed toprotect a single gateway in order to provideprotection for machines residing behind thegateway on an internal network. Whenconsidering a network incorporating multiplegateways across a range of IP addresses exposedto the Internet, interesting data can be gatheredwith regard to the types of scans occurringacross these gateways from the outside. Thevalidity of using a central server to amalgamate,reduce and analyse the log files of each gatewayis investigated in order to examine the activitiesof the scans across multiple gateways and portnumbers. The results from this analysis canthen be used to act against an attack throughheuristic driven rule creation.

INTRODUCTION

In the last few decades computers, and theirattached networks, have spread throughout theglobe to the stage where they are now pervasive inour everyday lives. However while thesetechnological leaps have arguably made our liveseasier and our jobs more efficient they have notnecessarily made our data (of whatever type orcontent) more secure. Work, home and governmentnetworks are now all connected through theinternet, leaving them open to attack, whether theattack’s origin is within the same city as the serverstoring the data, or is half a world away on adifferent continent.

A Trend Micro [1] survey of 500 corporations,government agencies, financial institutions,medical institutions, and universities revealed that85% of them had suffered a security breach in the

preceding 12 months. Everyone in the networkedworld is at risk of those who wish to cause harm tocomputer systems.

In an effort to counter these threats networkinfrastructure in the form of Firewalls andIntrusion Detection systems are implemented.Firewalls act as a perimeter defence around anetwork, blocking traffic that is not allowed toenter (or exit) a network. Intrusion detectionsystems are used to alert administrators to threatsthat have made it through the firewall onto thenetwork, often producing an automated response oraction.

Current intrusion detection systems and firewallsare designed to operate on a single gateway as anindividual location separate to the othercomponents of the network (or as an individuallistening sensor monitoring network traffic). Thisresults in them not having access to the contextualinformation of what is occurring to the network asa whole; which frequently involves having othergateways to other networks and the internet.

The following paper will describe the analysisprocesses and results of an examination of an Auditlog that was amalgamated from multiple gatewaysacross an entire class C IP address range. Attacksthat are being carried out against the network as awhole will be responded to, pre-empting theattack’s continuation against other gateways on thenetwork.

AUDIT LOG ANALYSIS

The most fundamental element in almost anyintrusion detection system is the presence of a log

file. Audit log files record the events that occur ona computer system, along with a time-stamp andother identifiers such as the user or IP address.Without this critical information it is impossible toknow what operations have been performed on thesystem. The log files used within this study werefrom the gateways (entry points) to a network.

Audit log files have primarily been used in the pastto analyse how an attack occurred upon a systemafter it has finished [2]. Clifford Stoll [3] in hisbook The Cuckoo’s Egg details how he tracked aseries of attacks by a hacker on and through hissystem by printing out his activities andlaboriously manually analysing what the hackerhad done. During the 1980’s audit logs changedfrom being massive mounds of weeklong printoutsto being stored electronically on the system.Developments in pattern matching techniquesallowed for automated analysis of electronicallystored audit logs [4]; these were the first intrusiondetection systems.

During the 1990’s advances in computing powerenabled the analysis of audit logs to occur in real-time, thus allowing these intrusion detectionsystems to respond immediately to attacks [5]. Theevents which usually result in a log being createdin most systems entail identification andauthentication mechanisms, creation, deletion ormodification of files and directories, networkactivity and administrative activity relating toprocesses and account creation [6].

Before analysis can occur the data stored within thefile is filtered, discarding information that isirrelevant to the analysis. Feature extraction is afurther process of log reduction; it examines thelog file entries, extracting specific relevantinformation and again discarding the remainder.These methods facilitate fast efficient extraction ofaudit log data.

There are two main methods by which attacks arediscovered upon a system through Log File

Analysis: Anomaly Detection and SignatureDetection.

Anomaly detection ID systems require a profile ofeach user or user group to be made to enable thesystem to “learn” what comprises normalbehaviour [7, 8]. The behaviour model is thencompared to user actions upon the system,searching for behaviour that does not fit the model;this behaviour is then classed as abnormalbehaviour and treated as an intrusion.

Anomaly detection is broader then just mappingprofiles of human usage. It is also applicable toprocesses and network access or usage [9].Network traffic analysis also yields profiles ofnormal usage that can be used in monitoringnetwork traffic for anomalies and thus to detectattacks.

Signature detection searches audit logs for knownattacks, matching malicious behaviour to pre-defined signatures. Signature or misuse detectionhas a database of attack signatures against which itcan compare network event patterns in order todiscover an attack. This results in signaturedetection systems being able to be operationaldirectly after they are installed without the need forany training of the system [8].

Signature based intrusion detection is significantlymore computationally efficient then anomaly baseddetection per item of knowledge as it does not needto create matrices for each system activity [10].However, Brox [11] comments that signaturedetection has a flaw in that it requires a signaturefor a given attack to be able to be detected, and insome instances this is a case of waiting for anattack to occur, to then be able to make a signatureto protect against it. Existing signature basedintrusion detection systems examine audit logswithin the context of a single gateway, and do nottherefore protect systems from a signature that inactuality is spread across several gateways. It isthis shortfall that is examined in this paper.

MULTIPLE GATEWAY ANALYSIS

Current Intrusion Detection Systems usingsignature or anomaly detection (or indeedcombinations of them both) work effectively upona single gateway or network device, however theylack the context of what is occurring across theentire network. Network infrastructure need to becontextually aware to be truly effective andefficient. An example of this can be seen in earlypacket filtering firewalls that lacked the contextualinformation of session data, and thus somewhatneedlessly and laboriously filtered each packetwithin a session, ignoring the previous conclusionthat the packets were not malicious in content.Likewise, multiple gateways across a singlenetwork could each be being (trivially) attackedsimultaneously, each largely ignoring the attack,when in reality the attack is occurring across thewhole network and is of a serious concern.

To be able to efficiently monitor and act upon suchattacks a centralized analysis module needs tooperate, having access to the complete audit logs ofeach gateway or sensor upon the network so as topreserve the network context. The Logamalgamation allows for signature and anomalydetection methods to be implemented networkwide, thus allowing for a unified defence across themultiple gateways of the network.

IMPLEMENTATION

The implementation utilises actual audit data froma gateway range that consists of multiple remotegateways along with the central server (ns1). Ns1is bound to the IP addresses of almost a completeC-class running from 0 to 252 in the last octet. Thisacts as an excellent range of ‘virtual’ consecutivegateways, as it will appear externally as thoughthere are 253 separate machines when really eachIP address will report to the same machine. Theaudit log still reports which IP address within thisrange was probed, meaning that it is possible toanalyse the data from this single machine as thoughit were 253 separate gateway machines (Figure 1).

Fig. 1. Audit Log Amalgamation.

The majority of the results discussed within thispaper were gathered from port probes that coveredthis IP range. One of the goals of the system is todevelop an effective threshold level of probeswhich if passed would result in action being takento protect the network as a whole.

The system developed to obtain results consists oftwo separate modules – an analysis module and atracking module. A database is utilised to store theprocessed data.

The analysis module is developed to maintain anoverall image of the state of the system at anygiven time. It analyses every incoming log entry, oran archived log file, and updates a table in thedatabase which specifies which source IP addressesmay be of interest. Rather than keeping a databaseentry for every instance of each IP within the log, itonly makes one entry for each individual IPaddress and then increments a count for allsubsequent instances along with additionalBoolean information indicating if an IP hasscanned more then a single gateway, or port.

This count that is recorded, is the value used by athat the threshold or heuristic that dictates howlong an IP address can probe the network before itis banned upon all the gateways; this enables theanalysis and tracking module to have a window ofopportunity to gather the required information to

ns1

Internet

A B C D E

assess whether or not an IP is conducting amultiple gateway attack and set a Boolean multiplegateway attack value.

The tracking module is developed to follow theexact activities of individual IP addresses whichhave been deemed to be performing suspicious, orpotentially interesting, types of scans acrossmultiple gateways or port numbers. It creates adatabase entry for each piece of activity within thelog file from the IP addresses that it is tracking.This results in a highly verbose, yet extremelycomprehensive, record of the activities of theseparticular IP addresses. This data can then beanalysed to gain an insight into the methods usedfor these scans.

The aim of these two modules is to track theactivities of IP addresses across the gateway rangein an attempt to recognise any patterns or methodsof attack that are currently being overlooked by theexisting security infrastructure examining themeach singularly.

RESULTS

The results that have been ascertained fall into twocategories, one for each module, Analysis Resultsand Tracking Results.

Analysis Module Results

The analysis module records a set of informationon each IP which probes one of the gateways onthe network; it records the number of times this IPhas probed the network, whether or not it hasprobed more then one gateway, what port it probedon and whether or not it has probed multiple ports.When examining these data sets the followingsimple statistic table can be identified (Table 1).

During the 10-day study period (September 1st till10th 2003) 6766 individual IP addresses probed thegateways on the network; of these 776 (11.5% of

SingleGateway

MultipleGateways

Source IPAddresses 5990 776

% of Total 88.5 11.5

Table 1. Individual IP address Statistics.

the total) probed more then one of the gateways.This demonstrates that there is a sizeable risk tosystems from malicious users who are approachinggateway access at a network wide level, thusjustifying central processing of audit data to retainnetwork context. Realising that the presence andmode of attack is present, however, is only the firststep to combating the problem itself. Detectingwhen these attacks are taking place with reasonableefficiency is the true goal of the Analysis module.

Upon further examination of the data collected bythe analysis module, it is possible to group thesource IP addresses based on the total number ofprobes sent. Fig. 2 (over page) shows that the vastmajority (83%) of source IP addresses sent only 3or fewer probes against the network, indicating thatperhaps it could be an appropriate level to test athreshold level heuristic. There are also slightincreases at 6 and 9 also which were tested asthreshold levels. This heuristic was then used inconjunction with the Boolean value stored forwhether or not an IP has probed more then onegateway to classify source IP’s, creating a lesscoarse heuristic rule set.

The results showed that at a threshold level of 3only 3.2% of IP’s were classified as potentiallyperforming scans on multiple gateways. With theoptimum of 11.5% to get all potential maliciousprobes it is a relatively poor result. By comparison,as the threshold level was increased to the levels of6 and 9, the result returned were 8.3% and 10%respectively. These results were much moreacceptable, however not quite at the levels desiredto achieve an acceptable efficiency.

527

919

4195

104 122359

20 44 72 120

500

1000

1500

2000

2500

3000

3500

4000

4500

1 2 3 4 5 6 7 8 9 10

Number of Probes attempted

Nu

mb

er o

f in

div

idu

al IP

s

Fig. 2. Distribution of the number of port probe attempts

Fig. 3 illustrates the final distribution aftertesting several other threshold levels. Theoptimum level of efficiency was found tobe when operating an 11 probe threshold.

Tracking Module Results

The tracking module produced veryinteresting results by using the flags thatwere triggered by the analysis module as aguide to which source IP addresses were

worth tracking. The graph in Figure 4depicts the ten days of Tracker activity onall source IP addresses that probed thegateway array. More than 40,000 probeswere received in total in order tostatistically analyse the scan activity on thegateway array as shown in Fig. 4. The scansacross multiple gateways from one or moresource IP addresses appear as vertical barswithin this graph because of the large scaleof the x-axis.

0

10

20

30

40

50

60

70

80

90

100

1 3 5 7 9 11 13 15 17 19 21 23 25

Threshold Value

% E

ffec

tive

nes

s o

f d

etec

tio

n

Fig. 3. Effectiveness of Detecting IP address probing Multiple Gateways

0

50

100

150

200

250

1-Sep 2-Sep 3-Sep 4-Sep 5-Sep 6-Sep 7-Sep 8-Sep 9-Sep 10-Sep

Date

Las

t O

ctet

of

Gat

eway

IP A

dd

ress

Port Probe

Fig 4 Ten days of gateway activity recorded by the Tracker Module

These scans are taking place over a timeinterval of between one minute and onehour, and will be referred to as ‘fast’ or‘normal’ scans. Generally these areoccurring as quickly as the internetconnection and processor speed of theattacking machine will allow. The diagonallines on a 45-degree angle spanning anumber of days are referred to as a ‘slow’scan. These will have a far larger timeinterval between each individual probe of

anywhere up to an hour, making them moredifficult to detect. Using the Trackingmodule it is possible to track a single IPand watch what it has done over a period oftime, Fig. 5 displays the methodicalness ofa port scan across the entire range of theclass C address as well as the fact the scanalso covered multiple ports on each of thegateways. It is this style of attack that ourimplementation is attempting to detect, andact upon to protect the given network.

0

50

100

150

200

250

9:06:29 9:06:37 9:06:46 9:06:55 9:07:03 9:07:12 9:07:21 9:07:29 9:07:38

Time

Fin

al O

ctet

of

Tar

get

IP A

dd

ress

Port Probe

Fig. 5 .Port scan across multiple gateways and ports from a lone source IP address.

Scan TypeVertical - Fast45o Angle - Slow

CONTINUING WORK

The implementation has progressed on toautomating the process to allow it to occurin real-time, while also building ininteraction between the Analysis andTracking module. A third module has alsobeen created called the Action modulewhich examines the results produced by theother two and formulates a firewall rule tobe sent to the gateways to provideprotection from a multiple gateway attack.

The goal of the Action module is be a pre-emptive defence mechanism to provideprotection the gateways on the network thathave not yet been attacked by a given IPaddress. For this to be truly worthwhile theprocess needs to be efficient and operate atreal-time during attacks.

The current work on the system is alsoaiming at discovering the optimum banlength for attackers to prevent both long-term slow scans and attackers who returnafter a period of inactivity. This optimumlength will be one of the heuristicsconsidered when creating the firewall rulesin response to a detected attack.

RELATED WORK

There have been continuous developmentsand advancements in the examination ofelectronic audit logs since the 1980’s.However this has only rarely branched intothe realm of amalgamating logs acrossgateways to examine threats at a networkwide level. Recent research in this area hasoccurred however with the MINDS project.

MINDS

The MINDS [12]Minnesota IntrusionDetection System) project has the objectiveof producing a system which will allowlarge scale analysis using data miningalgorithms to detect attacks [12]. TheMINDS system uses a combination ofsignature detection and anomaly detection

to provide protection to the University ofMinnesota network.

The MINDS system uses network trafficflow data collected from CISCO routers.This audit data is then filtered to removeextraneous entries before feature extractioncollates the required information foranalysis (source and destination IP’s andports, protocols, timestamp, flags). Alsocatalogued is derived contextualinformation such as the amount of traffic toa destination from a specific source. Theextracted, reduced log is then run throughthe Attack Detection Module of MINDSusing signature detection to discover anyknown attacks. The remaining log is thenfed through the Anomaly DetectionModules that allocates a score to eachconnection in relation to normal trafficpatterns. Connections that score highly arethen further analysed by the networkadministrators to moderate whether or notthe connection was an intrusion or a falsepositive. Connections that scored highly bythe Anomaly Module, and are not found tobe false positives by the administrators, arethen further analysed to produce newsignatures for emerging attacks. It is in thisway the MINDS system is able to not onlyprotect against the more common and wellknown attacks, but is also very strong onthe detection of novel attacks, or attackswhich are not yet supported by many otherIDS [13, 14].

The MINDS project has been developedduring the same period as our own system,and as such, there are some notabled i f fe rences be tween the twoimplementations. The MINDS project doesnot employ threshold level heuristics intheir detection mechanisms, and the systemis not fully intended to be automatedprocess.

CONCLUSION

In conclusion, this paper has demonstratedthat it is possible to detect and indeed trackmalicious scans across a series of gateways

through the centralised analysis of auditlogs. Detecting such attacks is impossiblefor individual gateways spread across anetwork as they lack the contextualinformation to recognise the true aims of asingle trivial probe against a port.

Once the central processing and analysishas occurred, and detected malicious sourceIP addresses it is possible to defend theremaining network gateways from futureattacks. Such pre-emptive defencemechanisms will better equip network administrators toprotect and maintain services to userswithin sizeable networks.

REFERENCES

[1] T. Micro, "The Real Cost of a VirusOutbreak," Trend Micro, CupertinoCA, White Paper 2002.

[2] C. P. Pfleeger and S. L. Pfleeger,Security in computing, 3rd Int ed.Upper Saddle River, N.J.: PrenticeHall PTR, 2003.

[3] C. Stoll, The cuckoo's egg : trackinga spy through the maze of computerespionage. London: Pan Books,1991.

[4] J. Anderson, "Computer SecurityThreat Monitoring andSurveillance.," Fort WashingtonApril 1980 1980.

[5] R. Kemmerer and G. Vigna,"Intrustion Detection: A BriefHistory and Overview," IEEEComputer, vol. Special publicationon Security and Privacy, 2002.

[6] E. G. Amoroso, Intrusion detection: an introduction to Internetsurveillance, correlation, traps,

trace back, and response, 1st ed.Sparta, N.J.: Intrusion.Net Books,1998.

[7] L. Heberlein, G. Dias, K. Levitt, B.Mukherjee, J. Wood, and D.Wolber, "A Network SecurityMonitor," Proceedings of the 1990IEEE Symposium on Research inSecurity and Privacy, pp. 296-304,1990.

[8] G. Holden, Guide to NetworkDefence and Countermeasures:Thomson, 2003.

[9] S. Hofmeyr, S. Forrest, and A.Somayaji, "Intrusion Detectionusing Sequences of System Calls,"Journal of Computer Security, vol.6, pp. 151/180, 1998.

[10] S. Kumar, "Classification anddetection of computer intrusions.,"in Computer Science: PurdueUniversity, 1995, pp. 180.

[11] A. Brox, "Signature Based orAnomaly Based Intrusion Detection– The Practice and Pitfalls,"Schmagazine, 2002.

[12] R. T. MINDS, "MINDS -Minnesota Intrusion DetectionSystem," V. Kumar and J.Srivastava, Eds., 2004.

[13] L. Ertoz, E. Eilertson, A. Lazarevic,P. Tan, P. Dokas, J. Srivastava, andV. Kumar, "Detection andSummarization of Novel NetworkAttacks Using Data Mining,"Technical Report 2003.

[14] L. Ertoz, E. Eilertson, A. Lazarevic,P. Tan, J. Srivastava, V. Kumar, andP. Dokas, "The MINDS - MinnesotaIntrusion Detection System," inNext Generation Data Mining,2004.

intrusion detection by intelligent analysis of data …intrusion detection by intelligent analysis...

Documents