Page 1

Network Intrusion Detection Using Distributed Data Mining

Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science
University of Minnesota

http://www.cs.umn.edu/~kumar

Collaborators: Paul Dokas, Eric Eilertson, Levent Ertoz, Aleksandar Lazarevic, Michael Steinbach, George Simon, Jaideep Srivastava, Pang-Ning Tan, Yongdae Kim, Zhi-li Zhang

Page 2

Information Assurance

Due to the proliferation of the Internet, more and more organizations are becoming vulnerable to cyber attacks
Sophistication of cyber attacks as well as their severity is also increasing
Cyber strategies can be a major force multiplier and equalizer
Security mechanisms always have inevitable vulnerabilities
Firewalls are not sufficient to ensure security in computer networks
    Insider attacks

[Figure: Incidents Reported to Computer Emergency Response Team/Coordination Center (CERT/CC), by year, 1990 to 2002; vertical axis from 0 to 90,000]

Traditional signature-based intrusion detection systems (IDSs) (e.g. SNORT) cannot detect emerging cyber threats
Data Mining can alleviate this limitation

www.snort.org

Example of SNORT rule (MS-SQL “Slammer” worm)

any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")
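The rule above can be read as "UDP traffic to port 1434 whose payload contains all three byte patterns". A minimal Python sketch of that matching logic follows. The `Packet` record and its field names are hypothetical, and this is only an illustration of content-signature matching, not how SNORT itself is implemented.

```python
# Illustrative only: a toy content-signature check, not SNORT's engine.
from dataclasses import dataclass

# Byte patterns copied from the rule above.
SLAMMER_PATTERNS = (
    bytes.fromhex("81F10301049B81F101"),
    b"sock",
    b"send",
)


@dataclass
class Packet:
    """Hypothetical minimal packet record used for this sketch."""
    protocol: str
    dst_port: int
    payload: bytes


def matches_slammer_rule(pkt: Packet) -> bool:
    """Return True if the packet is UDP to port 1434 and contains every pattern."""
    if pkt.protocol != "udp" or pkt.dst_port != 1434:
        return False
    return all(pattern in pkt.payload for pattern in SLAMMER_PATTERNS)
```

A payload that differs in any one of the three patterns no longer matches, which is why a fixed signature of this kind cannot flag attacks it was not written for.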

Page 3

Experts Race to Beat Computer Worm

U.S., Canada Try to Thwart Sobig by Disconnecting 17 Machines

The virus was disguised on Usenet as a pornographic photograph in an adult news group, Minor said. People who clicked on the photo had their PC infected with the virus, which then began to e-mail itself to every address on the infected computer's e-mail address book.

FBI cyber division spokesman Bill Murray said the bureau and the Department of Homeland Security would do everything they could, including serving subpoenas, to track the source of the worm.

The Sobig.F worm, a variation of a virus that's been around since January, quickly spread out of control this month. America Online Inc., the world's largest online service, reported that nearly 60 percent of the 38 million attachments to e-mail messages that it filtered Thursday contained the Sobig.F virus.

The computer worm was one of at least three viruses that have brought corporate, personal and government computer networks to a crawl over the past two weeks.

The FBI served a grand jury subpoena yesterday on EasyNews.com, a Phoenix-based Internet service provider whose network may have been used as a starting point for the Sobig worm.

The worm is thought to have been released originally on Usenet, a sort of Internet bulletin board, by someone who had an account at EasyNews.com, according to Michael Minor, the company's co-owner. The account was paid for with a stolen credit card number and established minutes before the virus was released on the Internet on Monday, Minor said. He added that the company is cooperating with the FBI.

BY BRIAN KREBS
Special to The Washington Post

Computer-security experts working with law enforcement officials in the United States and Canada raced yesterday to contain the Sobig.F computer worm before it could launch a new attack as authorities reported progress on finding the source of the virus.

Security experts who cracked the worm's code late Thursday night found that Sobig instructed infected computers to try to contact one of 20 other computers yesterday afternoon to download new instructions -- to do what is as yet unknown. But the worm either failed to seek those instructions or it was thwarted from doing so when security experts disconnected 17 of the 20 targeted computers before the anticipated 3 p.m. attack.

SCO claims that IBM illegally inserted Unix code into its version of Linux and has sent letters to corporations, warning them that they may be violating copyright laws by using the Linux operating system.

Raymond, president of the Open Source Initiative advocacy group, urged the hacker, if a member of the open-source community, to stop the attack, because it could do more harm than good.

"We're the good guys. But that doesn't matter if we aren't *seen* to be the good guys," Raymond wrote in the Sunday posting. "We cannot fight our war using vandalism and trespass and the suppression of speech, or SCO will paint us as crackers and maybe win."

In the posting, Raymond also made a reference to a planned counterattack by members of the open-source community against SCO to demonstrate the weakness of its legal case, but did not go into detail, saying "the element of surprise is part of it."

This weekend, a denial-of-service attack took down the Web site of The SCO Group, which is caught in an increasingly acrimonious row with the open-source community over the company's legal campaign against Linux.

SCO's Web site was largely out of commission until Monday morning, a representative of the Lindon, Utah-based Unix and Linux seller said Monday. Performance measurement statistics from Netcraft indicated that the site had been down since Friday night.

In a distributed denial-of-service (DDoS) attack, numerous computers simultaneously send so much data across a network that the targeted system slows to a crawl while trying to keep up with the traffic it's receiving. The SCO representative could not say where this weekend's strike originated. However, unofficial open-source spokesman Eric Raymond suggested in a posting Sunday to open-source news Web site NewsForge that the attack was launched by someone angry at comments from SCO executives criticizing the open-source community's role in the legal battles over Linux.

Hackers cut off SCO Web site

Martin LaMonica, Staff Writer, August 25, 2003

Just as these problems seemed to be receding, the latest version of a virus called SoBig invaded many computers in the form of attachments to e-mail notes. When an unwary user opened such an attachment, the virus could steal all the e-mail addresses residing on the computer and mail copies of itself to those people as well, inviting the unwary to click on the attachment and start the infection moving again. Virtually everyone involved in the Internet can share some of the blame for these lapses in security. Microsoft, whose Windows operating systems were the target of the worms, can be faulted for failing to design and test its software adequately. Internet service providers have not stressed security as much as ease of connection. Government and corporate network administrators have sometimes been slow to safeguard their systems. And individuals have been notoriously lax about keeping their antivirus protection up to date. Indeed, Microsoft offered a free software patch to protect against the Blaster worm in July, yet many users never bothered to download it.

It has been an anxious few weeks for computer users trying to ward off various worms and viruses, which can take over their laptops and desktop machines and even bring businesses and public agencies to a halt. Malicious computer programs have been around for years, with hundreds of viruses apt to be circulating at any one time, but the attacks this month have been particularly trying. Yesterday experts barely headed off a programmed acceleration of a viral attack that has already flooded the Internet with hundreds of millions of infected e-mail messages.

The latest round of computer woes started earlier this month when the Blaster worm took advantage of a weakness in the Windows operating system to invade computer hard drives, where it slowed operations and moved on to attack other computers as well. This irritation was promptly compounded by a similar worm, called Nachi or Welchia, that seems to have been designed to follow in the footsteps of Blaster.

An Onslaught of Computer Viruses

August 23, 2003

Blackout spurs cyberattack worry

By Kevin Maney and Michelle Kessler, USA TODAY, August 19, 2003

The electric industry is concerned enough that on Wednesday, one day before the blackout, the North American Electric Reliability Council (NERC) adopted the industry's first-ever cybersecurity standard. It outlines 16 things that utilities should do to protect themselves. "Some companies have gone well beyond this. Some have to catch up," says Lynn Constantini, NERC's chief information officer.

Yet, because the grid is so interconnected, experts note, companies that must catch up put the whole system at risk. "Most computer networks are only as good as the weakest point," says Ramnath Chellappa, computer business professor at the

The electric power grid might be more vulnerable to a cyberattack today than it was on Sept. 11, 2001. Officials doubt last week's massive blackout was caused by a terrorist or domestic hacker breaking into an electric power system via the Internet. Yet, the incident again brought to the forefront concerns that such an attack is possible.

"This power infrastructure is all Band-Aids and baling wire. And, of course, it's all dependent on computers," says Peter Neumann of research firm SRI International. "This stuff is riddled with security and reliability flaws."

often you'll find that the administrative networks are segmented from the core of the Department of Defense and that maybe they don't provide as much as security as some of the core networks."

Citibank finished mailing new cards Wednesday to replace the 13,000 that were compromised, said Glenn Flood, a Defense Department spokesman. More than half of the new cards have already been activated.

To reduce the chance of any unauthorized charges being made, the Navy is also beginning a gradual replacement of 9,000 other cards in the program that do not appear to have been compromised. Most of the cards have a $2,500 spending limit.

Federal Computer News reported the hacker attack and cancellation of the cards on Thursday.

BY ANITHA REDDY
Washington Post Staff Writer

The Navy has canceled 13,000 credit cards used for government expenses after discovering that hackers had downloaded card numbers and billing records, Defense Department officials said.

Citibank, the card issuer, has found no unusual activity in the card accounts since the hacking began in July and no fraud related to the incident had been reported as of Thursday, according to a Defense Department official.

Officials and investigative teams from the Navy and Department of Defense are still trying to figure out what vulnerabilities the hackers exploited and how to prevent such attacks in the future.

Hackers Steal 13,000 Credit Card Numbers

Navy Says No Fraud Has Been Noticed

Saturday, August 23, 2003

Saturday, August 23, 2003

Page 4

What are Intrusions?

Intrusions are actions that attempt to bypass security mechanisms of computer systems. They are caused by:

Attackers accessing the system from the Internet
Insider attackers - authorized users attempting to gain and misuse non-authorized privileges

Typical intrusion scenario

[Diagram labels: Scanning activity; Computer Network; Attacker; Machine with vulnerability]

Page 5

What are Intrusions?


Typical intrusion scenario

[Diagram labels: Computer Network; Attacker; Compromised Machine]

Page 6

Data Mining for Intrusion Detection

Increased interest in data mining based intrusion detection
    Attacks for which it is difficult to build signatures
    Attack stealthiness
    Unforeseen/Unknown/Emerging attacks
    Distributed/coordinated attacks

Data mining approaches for intrusion detection

Misuse detection
    Building predictive models from labeled data sets (instances are labeled as “normal” or “intrusive”) to identify known intrusions
    High accuracy in detecting many kinds of known attacks
    Cannot detect unknown and emerging attacks

Anomaly detection
    Detect novel attacks as deviations from “normal” behavior
    Potential high false alarm rate - previously unseen (yet legitimate) system behaviors may also be recognized as anomalies
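To make "deviation from normal behavior" concrete, here is a minimal sketch that scores a connection by its mean distance to the k nearest connections in a normal profile. The scoring function, the feature pair (packets, kilobytes) and the sample values are all assumptions chosen for illustration; the slides do not state which anomaly measure MINDS uses.

```python
# Illustrative sketch: distance-based anomaly scoring (an assumed technique,
# not necessarily the one used by MINDS).
import math


def knn_anomaly_score(point, normal_points, k=3):
    """Mean Euclidean distance from `point` to its k nearest normal points."""
    if not normal_points:
        raise ValueError("need at least one normal point")
    distances = sorted(math.dist(point, p) for p in normal_points)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Hypothetical normal profile: (packets, kilobytes) per connection.
NORMAL = [(10, 12.0), (12, 15.0), (11, 13.5), (9, 11.0), (13, 16.0)]

typical = knn_anomaly_score((11, 14.0), NORMAL)    # close to the profile
unusual = knn_anomaly_score((400, 900.0), NORMAL)  # far from the profile
```

A legitimate but previously unseen kind of connection would also land far from the profile and receive a high score, which is the false alarm concern noted above.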

Page 7

Data Mining for Intrusion Detection

Misuse Detection – Building Predictive Models

Attribute types shown: categorical, temporal, continuous, class

Training Set

Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes  Attack
1    206.135.38.95  11:07:20    160.94.179.223  139        192              No
2    206.163.37.95  11:13:56    160.94.179.219  139        195              No
3    206.163.37.95  11:14:29    160.94.179.217  139        180              No
4    206.163.37.95  11:14:30    160.94.179.255  139        199              No
5    206.163.37.95  11:14:32    160.94.179.254  139        19               Yes
6    206.163.37.95  11:14:35    160.94.179.253  139        177              No
7    206.163.37.95  11:14:36    160.94.179.252  139        172              No
8    206.163.37.95  11:14:38    160.94.179.251  139        285              Yes
9    206.163.37.95  11:14:41    160.94.179.250  139        195              No
10   206.163.37.95  11:14:44    160.94.179.249  139        163              Yes

[Diagram labels: Training Set; Learn Classifier; Model; Test Set]

Test Set

Tid  SrcIP          Start time  Dest IP         Number of bytes  Attack
1    206.163.37.81  11:17:51    160.94.179.208  150              ?
2    206.163.37.99  11:18:10    160.94.179.235  208              ?
3    206.163.37.55  11:34:35    160.94.179.221  195              ?
4    206.163.37.37  11:41:37    160.94.179.253  199              ?
5    206.163.37.41  11:55:19    160.94.179.244  181              ?

Anomaly Detection

Summarization of attacks using association rules

Rules Discovered:

{Src IP = 206.163.37.95, Dest Port = 139, Bytes ∈ [150, 200]} --> {ATTACK}

Key Technical Challenges

Large data size
High dimensionality
Temporal nature of the data
Skewed class distribution
Data preprocessing
On-line analysis
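A rule such as the one shown on this slide is usually judged by how many records it covers and how often the covered records carry the predicted label. The sketch below computes both over the ten training rows as transcribed from the slide. The function names and the tuple layout are hypothetical, and the rows are a small illustration, so the resulting numbers say nothing about real traffic.

```python
# Illustrative sketch: coverage and confidence of one rule over labeled records.
# Rows transcribed from the slide's training set:
# (src_ip, dest_port, num_bytes, is_attack)
TRAINING = [
    ("206.135.38.95", 139, 192, False),
    ("206.163.37.95", 139, 195, False),
    ("206.163.37.95", 139, 180, False),
    ("206.163.37.95", 139, 199, False),
    ("206.163.37.95", 139, 19, True),
    ("206.163.37.95", 139, 177, False),
    ("206.163.37.95", 139, 172, False),
    ("206.163.37.95", 139, 285, True),
    ("206.163.37.95", 139, 195, False),
    ("206.163.37.95", 139, 163, True),
]


def rule_matches(record):
    """Antecedent: Src IP = 206.163.37.95, Dest Port = 139, Bytes in [150, 200]."""
    src_ip, dest_port, num_bytes, _ = record
    return src_ip == "206.163.37.95" and dest_port == 139 and 150 <= num_bytes <= 200


def coverage_and_confidence(records):
    """Return (fraction of records matched, fraction of matched records that are attacks)."""
    covered = [r for r in records if rule_matches(r)]
    attacks = sum(1 for r in covered if r[3])
    coverage = len(covered) / len(records) if records else 0.0
    confidence = attacks / len(covered) if covered else 0.0
    return coverage, confidence


coverage, confidence = coverage_and_confidence(TRAINING)
```

With a skewed class distribution, one of the challenges listed above, a rule can cover many records and still have low confidence, so both numbers are worth checking.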

Page 8

The MINDS Project

MINDS – Minnesota Intrusion Detection System

[Diagram: MINDS system. Labels shown: network; Data capturing device; Net flow tools; tcpdump; Filtering; Feature Extraction; Known attack detection; Detected known attacks; Anomaly detection; Anomaly scores; Association pattern analysis; Summary and characterization of attacks; Human analyst; Labels; Detected novel attacks; MINDSAT]

MINDS is being used at the University of Minnesota and at the ARL Center for Intrusion Monitoring and Protection (CIMP) to detect attacks and intrusive behavior that cannot be detected using widely used intrusion detection systems, such as SNORT.

Anomalies/attacks picked by MINDS
    Scanning activities
    Worms
    Non-standard behavior
        Policy violations
        Insider attacks

Page 9

Typical Anomaly Detection Output

January 25

Even 24 hours after the “SQL Slammer/Sapphire” worm started, network connections related to the worm were ranked at the top by the anomaly detection algorithm

The behavior of the “slammer” worm is different from traditional scanning activities, since infected machines target random hosts

score     srcIP           srcPort  dstIP           dstPort  protocol  flags  packets  bytes       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
20826.69  128.171.137.62  1042     160.94.213.101  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
20344.83  128.171.137.62  1042     160.94.243.110  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
19295.82  128.171.137.62  1042     160.94.214.79   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
18717.1   128.171.137.62  1042     160.94.155.47   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
18147.16  128.171.137.62  1042     160.94.96.183   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
17484.13  128.171.137.62  1042     160.94.204.101  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
16715.61  128.171.137.62  1042     160.94.32.166   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
15973.26  128.171.137.62  1042     160.94.116.102  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
13084.25  128.171.137.62  1042     160.94.176.54   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
12797.73  128.171.137.62  1042     160.94.230.189  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
12428.45  128.171.137.62  1042     160.94.4.247    1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
11245.21  128.171.137.62  1042     160.94.131.58   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
9327.98   128.171.137.62  1042     160.94.148.135  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
7468.52   128.171.137.62  1042     160.94.182.91   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
5489.69   128.171.137.62  1042     160.94.31.30    1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
5070.5    128.171.137.62  1042     160.94.180.233  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
4558.72   128.171.137.62  1042     160.94.25.1     1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
4225.09   128.171.137.62  1042     160.94.133.143  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
4170.72   128.171.137.62  1042     160.94.109.225  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
2937.42   128.171.137.62  1042     160.94.135.75   1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
2458.61   128.171.137.62  1042     160.94.119.150  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1116.41   128.171.137.62  1042     160.94.187.255  1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1035.17   128.171.137.62  1042     160.94.50.50    1434     17        16     [0,2)    [387,1264)  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

Page 10

Typical Anomaly Detection Output

January 26, 2003 (48 hours after the “slammer” worm)

score     srcIP          sPort  dstIP          dPort  protocol  flags  packets  bytes     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
37674.69  63.150.X.253   1161   128.101.X.29   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
26676.62  63.150.X.253   1161   160.94.X.134   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
24323.55  63.150.X.253   1161   128.101.X.185  1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
21169.49  63.150.X.253   1161   160.94.X.71    1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19525.31  63.150.X.253   1161   160.94.X.19    1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19235.39  63.150.X.253   1161   160.94.X.80    1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
17679.1   63.150.X.253   1161   160.94.X.220   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
8183.58   63.150.X.253   1161   128.101.X.108  1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.58 0 0 0 0 0
7142.98   63.150.X.253   1161   128.101.X.223  1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
5139.01   63.150.X.253   1161   128.101.X.142  1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
4048.49   142.150.Y.101  0      128.101.X.127  2048   1         16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
4008.35   200.250.Z.20   27016  128.101.X.116  4629   17        16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3657.23   202.175.Z.237  27016  128.101.X.116  4148   17        16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3450.9    63.150.X.253   1161   128.101.X.62   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
3327.98   63.150.X.253   1161   160.94.X.223   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2796.13   63.150.X.253   1161   128.101.X.241  1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2693.88   142.150.Y.101  0      128.101.X.168  2048   1         16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2683.05   63.150.X.253   1161   160.94.X.43    1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2444.16   142.150.Y.236  0      128.101.X.240  2048   1         16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2385.42   142.150.Y.101  0      128.101.X.45   2048   1         16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2114.41   63.150.X.253   1161   160.94.X.183   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2057.15   142.150.Y.101  0      128.101.X.161  2048   1         16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1919.54   142.150.Y.101  0      128.101.X.99   2048   1         16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1634.38   142.150.Y.101  0      128.101.X.219  2048   1         16     [2,4)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1596.26   63.150.X.253   1161   128.101.X.160  1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1513.96   142.150.Y.107  0      128.101.X.2    2048   1         16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1389.09   63.150.X.253   1161   128.101.X.30   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1315.88   63.150.X.253   1161   128.101.X.40   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1279.75   142.150.Y.103  0      128.101.X.202  2048   1         16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1237.97   63.150.X.253   1161   160.94.X.32    1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1180.82   63.150.X.253   1161   128.101.X.61   1434   17        16     [0,2)    [0,1829)  0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0

Anomalous connections that correspond to the “slammer” worm
Anomalous connections that correspond to the ping scan
Connections corresponding to UM machines connecting to “half-life” game servers

Page 11

Detection of Anomalies on Real Network Data

Scans
    August 13, 2002, Detected scanning for Microsoft DS service on port 445/TCP (Ranked #1)
        Reported by CERT as recent DoS attacks that need further analysis (CERT August 9, 2002)
        Undetected by SNORT since the scanning was non-sequential (very slow). Rule added to SNORT in September
    August 13, 2002, Detected scanning for Oracle server (Ranked #2), Reported by CERT, June 13, 2002
        Undetected by SNORT because the scanning was hidden within another Web scanning
    October 10, 2002, Detected a distributed Windows networking scan from multiple source locations (Ranked #1)

Policy Violations
    August 8, 2002, Identified machine running Microsoft PPTP VPN server on non-standard ports (Ranked #1)
        Undetected by SNORT since the collected GRE traffic was part of the normal traffic
    August 10, 2002 & October 30, 2002, Identified compromised machines running FTP servers on non-standard ports, which is a policy violation (Ranked #1)
        Example of anomalous behavior following a successful Trojan horse attack
    February 6, 2003, The IP address 128.101.X.0 (not a real computer, but a network itself) has been targeted with IP Protocol 0 traffic from Korea (61.84.X.97) (bad since IP Protocol 0 is not legitimate)
    February 6, 2003, Detected a computer on the network apparently communicating with a computer in California over a VPN or on IPv6 (February 7, 2003)

Worms
    October 10, 2002, Detected several instances of slapper worm that were not identified by SNORT since they were variations of existing worm code
    February 6, 2003, Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with Stacheldraht worm (a DDoS agent)

Page 12

MINDS - Framework for Mining Associations

[Diagram labels: Anomaly Detection System; Ranked connections (attack, normal); Discriminating Association Pattern Generator; Knowledge Base (update)]

R1: TCP, DstPort=1863 → Attack
R100: TCP, DstPort=80 → Normal

1. Build normal profile
2. Study changes in normal behavior
3. Create attack summary
4. Detect misuse behavior
5. Understand nature of the attack
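One way to picture a discriminating pattern generator is to compare how often each attribute value occurs among connections ranked as attacks versus connections ranked as normal, and keep the values that are much more frequent on the attack side. The sketch below does this for single attribute values only. The ratio measure, the threshold, the smoothing constant and the sample connections are all assumptions made for illustration; the slides do not specify how MINDS scores patterns.

```python
# Illustrative sketch: single-item discriminating patterns by support ratio
# (an assumed measure, not necessarily the one used by MINDS).
from collections import Counter


def item_supports(connections):
    """Fraction of connections containing each (attribute, value) item."""
    counts = Counter()
    for conn in connections:
        counts.update(set(conn.items()))
    total = len(connections)
    return {item: count / total for item, count in counts.items()}


def discriminating_items(attack, normal, min_ratio=2.0, eps=1e-6):
    """Items whose support among attacks is at least min_ratio times that among normal."""
    attack_support = item_supports(attack)
    normal_support = item_supports(normal)
    found = []
    for item, support in attack_support.items():
        ratio = support / (normal_support.get(item, 0.0) + eps)
        if ratio >= min_ratio:
            found.append((item, ratio))
    return sorted(found, key=lambda pair: -pair[1])


# Hypothetical connections echoing rules R1 and R100 above.
ATTACK = [{"proto": "TCP", "dst_port": 1863}] * 3
NORMAL = [{"proto": "TCP", "dst_port": 80}] * 3 + [{"proto": "UDP", "dst_port": 53}]

patterns = discriminating_items(ATTACK, NORMAL)
```

In this toy data `proto = TCP` is common on both sides and is dropped, while `dst_port = 1863` appears only among the attack connections and is kept.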

Page 13

Summarization of Anomalous Connections

January 26, 2003 (48 hours after the Slammer worm)

score     srcIP          sPort  dstIP          dPort  protocol  flags  packets  bytes
37674.69  63.150.X.253   1161   128.101.X.29   1434   17        16     [0,2)    [0,1829)
26676.62  63.150.X.253   1161   160.94.X.134   1434   17        16     [0,2)    [0,1829)
24323.55  63.150.X.253   1161   128.101.X.185  1434   17        16     [0,2)    [0,1829)
21169.49  63.150.X.253   1161   160.94.X.71    1434   17        16     [0,2)    [0,1829)
19525.31  63.150.X.253   1161   160.94.X.19    1434   17        16     [0,2)    [0,1829)
19235.39  63.150.X.253   1161   160.94.X.80    1434   17        16     [0,2)    [0,1829)
17679.1   63.150.X.253   1161   160.94.X.220   1434   17        16     [0,2)    [0,1829)
8183.58   63.150.X.253   1161   128.101.X.108  1434   17        16     [0,2)    [0,1829)
7142.98   63.150.X.253   1161   128.101.X.223  1434   17        16     [0,2)    [0,1829)
5139.01   63.150.X.253   1161   128.101.X.142  1434   17        16     [0,2)    [0,1829)
4048.49   142.150.Y.101  0      128.101.X.127  2048   1         16     [2,4)    [0,1829)
4008.35   200.250.Z.20   27016  128.101.X.116  4629   17        16     [2,4)    [0,1829)
3657.23   202.175.Z.237  27016  128.101.X.116  4148   17        16     [2,4)    [0,1829)
3450.9    63.150.X.253   1161   128.101.X.62   1434   17        16     [0,2)    [0,1829)
3327.98   63.150.X.253   1161   160.94.X.223   1434   17        16     [0,2)    [0,1829)
2796.13   63.150.X.253   1161   128.101.X.241  1434   17        16     [0,2)    [0,1829)
2693.88   142.150.Y.101  0      128.101.X.168  2048   1         16     [2,4)    [0,1829)
2683.05   63.150.X.253   1161   160.94.X.43    1434   17        16     [0,2)    [0,1829)
2444.16   142.150.Y.236  0      128.101.X.240  2048   1         16     [2,4)    [0,1829)
2385.42   142.150.Y.101  0      128.101.X.45   2048   1         16     [0,2)    [0,1829)
2114.41   63.150.X.253   1161   160.94.X.183   1434   17        16     [0,2)    [0,1829)
2057.15   142.150.Y.101  0      128.101.X.161  2048   1         16     [0,2)    [0,1829)
1919.54   142.150.Y.101  0      128.101.X.99   2048   1         16     [2,4)    [0,1829)
1634.38   142.150.Y.101  0      128.101.X.219  2048   1         16     [2,4)    [0,1829)
1596.26   63.150.X.253   1161   128.101.X.160  1434   17        16     [0,2)    [0,1829)
1513.96   142.150.Y.107  0      128.101.X.2    2048   1         16     [0,2)    [0,1829)
1389.09   63.150.X.253   1161   128.101.X.30   1434   17        16     [0,2)    [0,1829)
1315.88   63.150.X.253   1161   128.101.X.40   1434   17        16     [0,2)    [0,1829)
1279.75   142.150.Y.103  0      128.101.X.202  2048   1         16     [0,2)    [0,1829)
1237.97   63.150.X.253   1161   160.94.X.32    1434   17        16     [0,2)    [0,1829)
1180.82   63.150.X.253   1161   128.101.X.61   1434   17        16     [0,2)    [0,1829)
1107.78   63.150.X.253   1161   160.94.X.154   1434   17        16     [0,2)    [0,1829)

Potential Rules:

1. {Dest Port = 1434/UDP #packets ∈ [0, 2)} --> Highly anomalous behavior (Slammer Worm)

2. {Src IP = 142.150.Y.101, Dest Port = 2048/ICMP #bytes ∈ [0, 1829]} --> Highly anomalous behavior (ping – scan)
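The value of such rules is that a long ranked list collapses into a few summary lines plus a short remainder for the analyst. The sketch below applies the two potential rules to six rows taken from the table on this slide and counts how many each rule explains. The `Conn` record, the function names and the first-match policy are hypothetical choices for illustration, and the bytes condition of rule 2 is left out because every sampled row falls in the same bytes bin.

```python
# Illustrative sketch: summarizing ranked connections with the two potential rules.
from typing import NamedTuple


class Conn(NamedTuple):
    """Hypothetical record for one ranked connection."""
    score: float
    src_ip: str
    s_port: int
    dst_ip: str
    d_port: int
    protocol: int  # 17 = UDP, 1 = ICMP
    packets: str   # binned packet count as shown in the table


# Six rows copied from the table on this slide.
CONNECTIONS = [
    Conn(37674.69, "63.150.X.253", 1161, "128.101.X.29", 1434, 17, "[0,2)"),
    Conn(4048.49, "142.150.Y.101", 0, "128.101.X.127", 2048, 1, "[2,4)"),
    Conn(4008.35, "200.250.Z.20", 27016, "128.101.X.116", 4629, 17, "[2,4)"),
    Conn(3657.23, "202.175.Z.237", 27016, "128.101.X.116", 4148, 17, "[2,4)"),
    Conn(3450.9, "63.150.X.253", 1161, "128.101.X.62", 1434, 17, "[0,2)"),
    Conn(2444.16, "142.150.Y.236", 0, "128.101.X.240", 2048, 1, "[2,4)"),
]

RULES = {
    "slammer": lambda c: c.d_port == 1434 and c.protocol == 17 and c.packets == "[0,2)",
    "ping_scan": lambda c: c.src_ip == "142.150.Y.101" and c.d_port == 2048 and c.protocol == 1,
}


def summarize(connections, rules):
    """Count connections explained by each rule (first match wins) and collect the rest."""
    counts = {name: 0 for name in rules}
    unexplained = []
    for conn in connections:
        for name, predicate in rules.items():
            if predicate(conn):
                counts[name] += 1
                break
        else:
            unexplained.append(conn)
    return counts, unexplained


counts, unexplained = summarize(CONNECTIONS, RULES)
```

Rule 2 names a single source address, so the ping connection from 142.150.Y.236 stays in the unexplained remainder together with the two connections from port 27016.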

Page 14

Typical Summarization Output

Detected anomalous connections on May 21, 2003

score  c1   c2  srcIP           sPort  dstIP           dPort  protocol  flags   packets    bytes      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
31.2   -    -   218.19.X.168    5002   134.84.X.129    4182   6         27      [5,6)      [0,2045)   0 0.01 0.01 0.03 0 0 0 0 0 0 0 0 0 0 1 0
3.04   138  12  64.156.X.74     -----  xxx.xxx.xxx.xx  -----  xxx       4       [0,2)      [0,2045)   0.12 0.48 0.26 0.58 0 0 0 0 0.07 0.27 0 0 0 0 0 0
15.4   -    -   218.19.X.168    5002   134.84.X.129    4896   6         27      [5,6)      [0,2045)   0.01 0.01 0.01 0.06 0 0 0 0 0 0 0 0 0 0 1 0
14.4   -    -   134.84.X.129    4770   218.19.X.168    5002   6         27      [5,6)      [0,2045)   0.01 0.01 0.05 0.01 0 0 0 0 0 0 1 0 0 0 0 0
7.81   -    -   134.84.X.129    3890   218.19.X.168    5002   6         27      [5,6)      [0,2045)   0.01 0.02 0.09 0.02 0 0 0 0 0 0 1 0 0 0 0 0
3.09   4    1   xxx.xxx.xxx.xx  4729   xxx.xxx.xxx.xx  -----  6         ------  ---------  ---------  0.14 0.33 0.17 0.47 0 0 0 0 0 0 0.2 0 0 0 0 0
2.41   64   8   xxx.xxx.xxx.xx  -----  200.75.X.2      -----  xxx       ------  ---------  [0,2045)   0.33 0.27 0.21 0.49 0 0 0 0 0 0 0 0 0.28 0.25 0.01 0
6.64   -    -   218.19.X.168    5002   134.84.X.129    3676   6         27      [5,6)      [0,2045)   0.03 0.03 0.03 0.15 0 0 0 0 0 0 0 0 0 0 0.99 0
5.6    -    -   218.19.X.168    5002   134.84.X.129    4626   6         27      [5,6)      [0,2045)   0.03 0.03 0.03 0.17 0 0 0 0 0 0 0 0 0 0 0.98 0
2.7    12   0   xxx.xxx.xxx.xx  -----  xxx.xxx.xxx.xx  113    6         2       [0,2)      [0,2045)   0.25 0.09 0.15 0.15 0 0 0 0 0 0 0.08 0 0.79 0.15 0.01 0
4.39   -    -   218.19.X.168    5002   134.84.X.129    4571   6         27      [5,6)      [0,2045)   0.04 0.05 0.05 0.26 0 0 0 0 0 0 0 0 0 0 0.96 0
4.34   -    -   218.19.X.168    5002   134.84.X.129    4572   6         27      [5,6)      [0,2045)   0.04 0.05 0.05 0.23 0 0 0 0 0 0 0 0 0 0 0.97 0
4.07   8    0   160.94.X.114    51827  64.8.X.60       119    6         24      [483,-)    [8424,-)   0.09 0.26 0.16 0.24 0 0 0 0.91 0 0 0 0 0 0 0 0
3.49   -    -   218.19.X.168    5002   134.84.X.129    4525   6         27      [5,6)      [0,2045)   0.06 0.06 0.06 0.35 0 0 0 0 0 0 0 0 0 0 0.93 0
3.48   -    -   218.19.X.168    5002   134.84.X.129    4524   6         27      [5,6)      [0,2045)   0.06 0.06 0.07 0.35 0 0 0 0 0 0 0 0 0 0 0.93 0
3.34   -    -   218.19.X.168    5002   134.84.X.129    4159   6         27      [5,6)      [0,2045)   0.06 0.07 0.07 0.37 0 0 0 0 0 0 0 0 0 0 0.92 0
2.46   51   0   200.75.X.2      -----  xxx.xxx.xxx.xx  21     6         2       ---------  [0,2045)   0.19 0.64 0.35 0.32 0 0 0 0 0.18 0.44 0 0 0 0 0 0
2.37   42   5   xxx.xxx.xxx.xx  21     200.75.X.2      -----  6         20      ---------  [0,2045)   0.35 0.31 0.22 0.57 0 0 0 0 0 0 0 0 0.18 0.28 0.01 0
2.45   58   0   200.75.X.2      -----  xxx.xxx.xxx.xx  21     6         ------  ---------  [0,2045)   0.19 0.63 0.35 0.32 0 0 0 0 0.18 0.44 0 0 0 0 0 0

UM computer connecting to a remote FTP server, running on port 5002
Summarized TCP reset packets received from 64.156.X.74, which is a victim of DoS attack, and we were observing backscatter, i.e. replies to spoofed packets
Summarization of FTP scan from a computer in Columbia, 200.75.X.2
Summary of IDENT lookups, where a remote computer tries to get user name
Summarization of a USENET server transferring a large amount of data

Page 15

Typical Summarization Output

Detected anomalous connections on Sept 11, 2003, 7:40 AM CT

score c1 c2 srcIP sPort dstIP dPort prot flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12

611 - - 128.118.x.96 873 160.94.x.50 4529 6 ---AP--- [24k,124k [20M,182M 0 0 0 0 0 0 0 1 0 0 0 0
348 - - 160.94.x.50 4529 128.118.x.96 873 6 ---A---- [24k,124k [3M,5M] 0 0 0 0 0 0 0 1 0 0 0 0
23.9 - - 128.101.x.33 20 200.95.x.225 5001 6 ---AP--- [24k,124k [20M,182M 0 0 0 0 0 0 0 1 0 0 0 0
10.9 - - 24.223.x.59 1135 160.94.x.1 554 6 ---APRSF [338,379] [15k,17k] 0.1 0.1 0.1 0.3 0 0 0 0.9 0 0 0 0
7.8 11 0 x.x.x.x 8200 160.94.x.154 --- 6 ---AP-SF [4,4] --- 0.4 0.4 0.7 0.1 0 0 0 0 0 0.2 0.1 0
10 - - 128.101.x.173 22 24.26.x.13 4949 6 ---AP--- [24k,124k [3M,5M] 0 0 0 0 0 0 0 1 0 0 0 0
9.6 - - 128.101.x.113 20 81.168.x.40 46892 6 ---AP-SF [24k,124k [20M,182M 0 0 0 0 0 0 0 1 0 0 0 0
9.5 - - 192.18.x.40 31734 134.84.x.19 36080 6 ---AP--F [24k,124k [20M,182M 0 0 0 0 0 0 0 1 0 0 0 0
9.5 - - 192.18.x.40 54037 134.84.x.19 36081 6 ---AP--F [24k,124k [20M,182M 0 0 0 0 0 0 0 1 0 0 0 0
9.4 - - 24.33.x.62 2011 160.94.x.150 3989 6 ---AP-SF [217,217] [252k,265k] 0.2 0.2 0.3 0.2 0 0 1 0.6 0 0 0 0
7.8 13 1 x.x.x.x 8200 134.84.x.21 --- 6 ---AP-SF [4,4] --- 0.4 0.4 0.7 0.3 0 0 0 0 0 0.1 0 0
9.1 - - 24.33.x.62 2011 160.94.x.150 4010 6 ---AP-SF [217,217] [252k,265k] 0.2 0.2 0.3 0.1 0 0 1 0.6 0 0 0 0
9.1 - - 24.33.x.62 2011 160.94.x.150 3995 6 ---AP-SF [217,217] [252k,265k] 0.2 0.2 0.3 0.1 0 0 1 0.6 0 0 0 0
9.1 - - 24.33.x.62 2011 160.94.x.150 3992 6 ---AP-SF [217,217] [252k,265k] 0.2 0.2 0.3 0.1 0 0 1 0.6 0 0 0 0
9 - - 24.33.x.62 2011 160.94.x.150 4007 6 ---AP-SF [217,217] [252k,265k] 0.2 0.2 0.3 0.1 0 0 1 0.6 0 0 0 0
8.9 - - 24.33.x.62 2011 160.94.x.150 4004 6 ---AP-SF [218,234] [265k,309k] 0.2 0.2 0.3 0.1 0 0 1 0.6 0 0 0 0
8.9 - - 24.33.x.62 2011 160.94.x.150 4001 6 ---AP-SF [217,217] [252k,265k] 0.2 0.2 0.3 0.1 0 0 1 0.6 0 0 0 0
5.7 10 # 63.251.x.177 8200 x.x.x.x --- 6 ---AP-SF [4,4] --- 0.4 0.4 0.3 0.4 0 0 0 0 0 0 0.1 0
7.3 27 7 66.151.x.190 8200 x.x.x.x --- 6 ---AP-SF [4,4] [559,559] 0.4 0.4 0.7 0.2 0 0 0 0 0 0.2 0 0

UM computers doing bulk transfers

Attack on Real-Media server

8200/tcp traffic possibly related to gotomypc.com, which allows users to remotely control a desktop (involves a third party)

Mysterious traffic currently being investigated
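The idea of collapsing many similar anomalous connections into one summary row can be sketched as a plain group-by. This is only an illustration under stated assumptions: it is not the summarization method used by MINDS, and the field names, the `summarize` helper, the `min_count` parameter, and the sample records (which use documentation addresses) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: three connections that differ only in destination
# port, plus one unrelated connection. Values are illustrative only.
connections = [
    {"score": 9.4, "src": "203.0.113.62", "sport": 2011, "dst": "198.51.100.150", "dport": 3989},
    {"score": 9.1, "src": "203.0.113.62", "sport": 2011, "dst": "198.51.100.150", "dport": 3992},
    {"score": 9.1, "src": "203.0.113.62", "sport": 2011, "dst": "198.51.100.150", "dport": 3995},
    {"score": 10.9, "src": "192.0.2.59", "sport": 1135, "dst": "198.51.100.1", "dport": 554},
]


def summarize(records, key_fields, min_count=2):
    """Collapse records that agree on key_fields into one summary row each.

    Returns (summaries, leftovers): groups with at least min_count members
    become a summary row; everything else is passed through unchanged.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)

    summaries, leftovers = [], []
    for key, members in groups.items():
        if len(members) >= min_count:
            row = dict(zip(key_fields, key))
            row["count"] = len(members)                       # connections covered
            row["max_score"] = max(m["score"] for m in members)
            summaries.append(row)
        else:
            leftovers.extend(members)
    return summaries, leftovers


# Group on source and destination host; the destination port is left free,
# so it acts as a wildcard in the summary row.
summaries, leftovers = summarize(connections, ("src", "sport", "dst"))
```

With these sample records, the three connections that share source, source port, and destination collapse into one row with a count, while the unrelated connection stays on its own line for the analyst.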

Page 16

Need for Distributed Intrusion Detection

Attacks on the network infrastructure may be launched from several different locations and may target multiple destinations

Attacks are often preceded by scanning for vulnerabilities and then compromising a few vulnerable machines (planting malicious code on them). Using these compromised machines as handlers, the attacker can spread attacks to an even wider part of the Internet

Stealthy coordinated attacks with low traffic volumes, which go undetected by an IDS based at a single network site, can best be detected, especially in their earlier stages, by correlating data collected at multiple network sites

Detecting distributed and coordinated attack patterns in a timely manner requires information not only from the local network site, but also from network sites elsewhere
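A minimal sketch of why cross-site correlation helps, assuming each site reports per-source probe counts to a central correlator. The report format, the `correlate_across_sites` function, the `LOCAL_THRESHOLD` and `min_sites` values, and the sample addresses are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

LOCAL_THRESHOLD = 20  # hypothetical per-site alarm threshold (probes per source)


def correlate_across_sites(reports, min_sites=3):
    """Flag sources that were observed at min_sites or more distinct sites.

    reports is an iterable of (site, source_ip, probe_count) tuples.
    """
    sites_seen = defaultdict(set)
    total_probes = defaultdict(int)
    for site, src, probes in reports:
        sites_seen[src].add(site)
        total_probes[src] += probes

    return {
        src: {"sites": len(sites), "probes": total_probes[src]}
        for src, sites in sites_seen.items()
        if len(sites) >= min_sites
    }


# Every per-site count stays below LOCAL_THRESHOLD, so no single site alarms.
reports = [
    ("site_a", "203.0.113.7", 4),
    ("site_b", "203.0.113.7", 6),
    ("site_c", "203.0.113.7", 5),
    ("site_a", "198.51.100.9", 3),
]

flagged = correlate_across_sites(reports)
```

Here the source seen at three sites is flagged even though each individual site observed too few probes to raise a local alarm; the source seen at only one site is not.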

Page 17

Cooperative/Global Intrusion Detection

Cooperative network of locally running IDSs that can share information (e.g., security alerts, suspected system vulnerabilities, mined attack patterns) to cooperatively detect, throttle and track down distributed attacks

Global IDS that uses patterns detected at local sites to discover patterns of emerging attacks that cannot be detected by a standalone IDS

[Diagram: networks connected through the Internet]

Global Intrusion Detection System (g-IDS)
- Detecting coordinated/distributed attacks
- Detecting anomalies at the global level

Local Intrusion Detection Systems (l-IDS)
- Analyzing Cyber Threats using Data Mining
- Clustering for Profiling
- Anomaly Detection
- Summarization of attacks
- On-line Early detection of Cyber Threats

Page 18

Challenges of Distributed IDSs

Large data size (millions of network connections)
  Typical network traffic at the university level reaches around 500 million connections per day
  Not sufficient bandwidth to transfer raw data across sites

Preservation of data privacy

Security

Heterogeneous data
  Raw network data, IDS alarms, high-level patterns
  Different sites may have different types of data or attributes (e.g. tcpdump, netflows, different IDS tools)

Temporal (streaming) nature of data
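One way to picture the bandwidth and privacy constraints together is for each site to ship small aggregated summaries with pseudonymized addresses instead of raw connection records. The sketch below is an assumption, not something described in the slides: the shared secret key, the `pseudonymize` and `build_summary` helpers, and the record layout are hypothetical, and keyed hashing is only as private as the key is secret.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(ip, key):
    """Replace an IP address with a keyed hash (HMAC-SHA256, truncated).

    Sites that share the same key map the same address to the same
    pseudonym, so summaries can still be correlated across sites.
    """
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]


def build_summary(connections, key):
    """Aggregate (src_ip, dst_ip, dst_port) records into per-source counts.

    Only the pseudonymized source, the destination port, and a count leave
    the site; destination addresses are dropped entirely.
    """
    counts = Counter()
    for src, _dst, dport in connections:
        counts[(pseudonymize(src, key), dport)] += 1
    return [
        {"src": src, "dport": dport, "connections": n}
        for (src, dport), n in sorted(counts.items())
    ]


shared_key = b"hypothetical-shared-secret"
raw = [
    ("203.0.113.7", "192.0.2.10", 21),
    ("203.0.113.7", "192.0.2.11", 21),
    ("203.0.113.7", "192.0.2.12", 21),
    ("198.51.100.9", "192.0.2.10", 113),
]
summary = build_summary(raw, shared_key)
```

Four raw records shrink to two summary rows in this toy case; on real traffic the reduction depends on how repetitive the connections are.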

Page 19

Data Mining Middleware for Grids

Components shown in the diagram:
Application
Execution, Representation and Management Systems (e.g., Chimera, Pegasus)
Data Mining and Exploration Services
Scheduling and Replication Services
Data and Policy Management Services
Data & Model Transport Services
Grid Control Services
Grid and Web Services (e.g. Globus, XML-RPC, DWTP, Condor)
Data Services (e.g. JDBC, SQL, SRB)

Joint work with B. Grossman, S. Ranka, and J. Weissman

Page 20

Grid-Based Data Mining: Distributed Network Intrusion Detection

Components shown in the diagram: Application; Execution, Representation and Management Systems; Data Mining and Exploration Services; Scheduling and Replication Services; Data and Policy Management Services; Data & Model Transport Services; Grid Control Services; Grid and Web Services; Data Services

Annotations on the diagram:
Detection of attacks by correlating suspicious events across sites
Locate computing resources needed for time-critical execution of the data mining query
Needed to protect privacy, but allow necessary data access

[Diagram: networks connected through the Internet]