Statistics for Engineering and Information Science

Series Editors M. Jordan, S.L. Lauritzen, J.P. Lawless, V. Nair

Springer Science+Business Media, LLC

Statistics for Engineering and Information Science

Akaike and Kitagawa: The Practice of Time Series Analysis.

Cowell, Dawid, Lauritzen, and Spiegelhalter: Probabilistic Networks and Expert Systems.

Doucet, de Freitas, and Gordon: Sequential Monte Carlo Methods in Practice.

Fine: Feedforward Neural Network Methodology.

Hawkins and Olwell: Cumulative Sum Charts and Charting for Quality Improvement.

Jensen: Bayesian Networks and Decision Graphs.

Marchette: Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint.

Vapnik: The Nature of Statistical Learning Theory, Second Edition.

David J. Marchette

Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint

With 86 Illustrations

Springer

David J. Marchette Naval Surface Warfare Center Code B10 17320 Dahlgren Road Dahlgren, VA 22448 USA [email protected]

Series Editors Michael Jordan Department of Computer Science University of California, Berkeley Berkeley, CA 94720 USA

Jerald F. Lawless Department of Statistics University of Waterloo Waterloo, Ontario N2L 3G1 Canada

Steffen L. Lauritzen Department of Mathematical Sciences Aalborg University Fredrik Bajers Vej 7G 9220 Aalborg East Denmark

Vijay Nair Department of Statistics University of Michigan Ann Arbor, MI 48109 USA

Library of Congress Cataloging-in-Publication Data Marchette, David J.

Computer intrusion detection and network monitoring: a statistical viewpoint / David J. Marchette. p. cm. - (Statistics for engineering and information science)

Includes bibliographical references and index. ISBN 978-1-4419-2937-2 ISBN 978-1-4757-3458-4 (eBook) DOI 10.1007/978-1-4757-3458-4

1. Computer security--Statistical methods. 2. Computer networks--Security measures--Statistical methods. I. Title. II. Series. QA76.9.A25 .M34 2001 005.8--dc21 2001032011

Printed on acid-free paper.

© 2001 Springer Science+Business Media New York

Originally published by Springer-Verlag New York, Inc. in 2001.

Softcover reprint of the hardcover 1st edition 2001

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Michael Koy; manufacturing supervised by Joe Quatela. Photocomposed copy prepared from the author's LaTeX2e files.

9 8 7 6 5 4 3 2 1

SPIN 10833837

Preface

In the fall of 1999, I was asked to teach a course on computer intrusion detection for the Department of Mathematical Sciences of The Johns Hopkins University. That course was the genesis of this book. I had been working in the field for several years at the Naval Surface Warfare Center, in Dahlgren, Virginia, under the auspices of the SHADOW program, with some funding by the Office of Naval Research.

In designing the class, I was concerned both with giving an overview of the basic problems in computer security and with providing information that was of interest to a department of mathematicians. Thus, the course was to focus more on methods for modeling and detecting intrusions than on how to secure one's computer against them.

The first task was to find a book from which to teach. I was familiar with several books on the subject, but they either were at a high level, focusing more on the political and policy aspects of the problem, or were written for security analysts, with little to interest a mathematician. I wanted to cover material that would appeal to the faculty members of the department, some of whom ended up sitting in on the course, as well as provide some interesting problems for students. None of the books on the market at the time had an adequate discussion of mathematical issues related to intrusion detection.

Lacking a text, I was thus forced to provide examples from articles, Web sites, and the like. After the course was over, I decided it would be a good idea to provide a compendium of the information that I had found. This book is the result. Its purpose is to provide an introduction to some of the issues in computer intrusion detection, with a focus on problems and techniques that would be of interest to a mathematician or statistician.

I have provided an extensive bibliography, covering much of the research in computer intrusion detection. This is not complete, but it does cover most of the important papers in the area.

My background is in pattern recognition and statistics, with a focus on computational statistics. This is the branch of statistics that is interested in the interface between statistics and computers. It considers issues related to computation, large data sets and high-dimensional data, visualization of complex data, and nonparametric models. Thus, computer intrusion detection was a natural area in which to become involved.

Dahlgren, Virginia, USA D.J. MARCHETTE

Acknowledgments

My mentors and teachers have had an important part in making this book possible. In particular, I am indebted to my dissertation advisor, Prof. Ed Wegman, for his encouragement, advice and friendship. I would not have learned about computer security and intrusion detection without John Green, Vicki Irwin and Stephen Northcutt. They have been extremely helpful, providing information, data, and training that were invaluable. I also want to thank the Department of Mathematical Sciences of The Johns Hopkins University, particularly Dan Naiman, John Wierman, Alan Goldman, and Carey Priebe. Many other people have had parts in bringing some of the information in this book to light, including Pat Carter, Jim Matthews, and Jeff Solka. Glen Moore, my boss at NSWC, has been extremely supportive, as has Wendy Martinez of the Office of Naval Research. I would particularly like to thank Matt Schonlau and Bill Cheswick for allowing me to use their graphics. Fred Kirby offered suggestions and caught several glaring errors. John Kimmel has been instrumental in bringing this work to fruition with a minimum of pain and suffering on my part. Finally, I must thank my family, particularly Susan, who has put up with a lot and spent a lot of time reading through and correcting manuscripts. The errors that remain are there in spite of her heroic efforts. Also, thanks to Steven, Jeffrey, and Katy for putting up with me while this work was written.

D.J. MARCHETTE

Contents

Preface

Acknowledgments

Introduction

Part I Networking Basics

1 TCP/IP Networking
1.1 Overview of Networking
1.2 tcpdump
1.3 Network Layering
1.4 Data Encapsulation
1.5 Header Information
1.6 Fragmentation
1.7 Routing
1.8 Domain Name Service
1.9 Miscellaneous Utilities
1.10 Further Reading

2 Network Statistics
2.1 Introduction
2.2 Network Traffic Intensities
2.3 Modeling Network Traffic
2.4 Mapping the Internet
2.5 Visualizing Network Traffic
2.6 Further Reading

3 Evaluation
3.1 Introduction
3.2 Evaluating Classifiers
3.3 Receiver Operator Characteristic Curves
3.4 The DARPA/MITLL ID Testbed
3.5 Live Network Testing
3.6 Further Reading

Part II Intrusion Detection

4 Network Monitoring
4.1 Introduction
4.2 tcpdump Filters
4.3 Common Attacks
4.4 SHADOW
4.5 Activity Profiling
4.6 EMERALD
4.7 WATCHERS
4.8 GrIDS
4.9 Miscellaneous Utilities
4.10 Further Reading

5 Host Monitoring
5.1 Introduction
5.2 Common Attacks
5.3 NIDES
5.4 Computer Immunology
5.5 User Profiling
5.6 Miscellaneous Utilities
5.7 Further Reading

Part III Viruses and Other Creatures

6 Computer Viruses and Worms
6.1 Introduction
6.2 How Viruses Replicate
6.3 How Virus Scanners Work
6.4 Epidemiology
6.5 An Immunology Approach
6.6 Virus Phylogenies
6.7 Computer Worms
6.8 Further Reading

7 Trojan Programs and Covert Channels
7.1 Introduction
7.2 Covert Channels
7.3 Steganography
7.4 Back Doors
7.5 Miscellaneous Trojans
7.6 Detecting Trojans
7.7 Further Reading

Appendix A Well-Known Port Numbers

Appendix B Trojan Port Numbers

Appendix C Country Codes

Appendix D Security Web Sites
D.1 Introduction
D.2 General Information Web Sites
D.3 Security
D.4 Cyber Crime
D.5 Software
D.6 Data
D.7 Intrusion Detection

Bibliography

Glossary

Acronyms

Author Index

Subject Index

Introduction

Computer networks are a rich source of interesting problems and data for statisticians. This book will explore some of the issues of interest to the statistician that arise from the general problem of protecting computers and computer networks from unauthorized use or malicious attacks. This book will not attempt to be comprehensive, but rather will focus on a few areas of particular interest that lend themselves to statistical or probabilistic analysis.

One reason to forgo any claim of comprehensiveness is the speed at which change occurs in networking and on the Internet. When I started this work, in December of 1999, I had intended a chapter on future threats, in which I placed distributed attacks. It was not more than a few months later that several major Web servers were shut down by distributed denial of service attacks. Thus, the future quickly becomes the past.

Another factor is the vast literature on networking and network modeling, which is of immense interest to a statistician and of only marginal interest in network defense. I will briefly touch on this topic in Chapter 2, but it deserves a separate book in its own right.

Since the subject of computer and network security is quite broad, some discussion of scope is in order. First, I will consider what I refer to as "network monitoring." A typical network within a corporation or university is a collection of machines that can communicate with each other and with machines on other networks (the Internet) through a gateway. A network monitor is a system designed to monitor the traffic in and out of the network (or between machines on the network) for the purpose of determining whether the network is working properly and whether it is being attacked from without.

Network monitoring can be as simple as collecting statistics on usage to determine such things as average and peak loads and other measures of the health of the network. It can also include characterizing the kind of traffic on the network as either "normal" (and hence not of concern) or "abnormal" (and hence warranting further investigation). Detecting and characterizing changing activity on the network is of interest, as are sudden deviations from "normal" activity. These ideas will be considered in some detail in Chapters 2 and 4.
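As a minimal sketch of the simplest kind of monitoring described above, the following Python fragment summarizes average and peak load and flags intervals that deviate sharply from a "normal" baseline. The per-interval packet counts, the function names, and the three-standard-deviation rule are illustrative assumptions, not a method taken from this book.

```python
from statistics import mean, stdev


def load_summary(counts):
    """Average and peak load for a sequence of per-interval packet counts."""
    return {"average": mean(counts), "peak": max(counts)}


def flag_deviations(baseline, observed, threshold=3.0):
    """Indices of observed counts lying more than `threshold` baseline
    standard deviations from the baseline mean (an assumed, simplistic rule)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(observed) if abs(x - mu) > threshold * sigma]


# Hypothetical packet counts per minute: a quiet baseline, then new observations.
baseline = [100, 104, 98, 101, 97, 103, 99, 102]
observed = [101, 99, 250, 100]

print(load_summary(baseline))
print(flag_deviations(baseline, observed))
```

Here only the third observation is flagged. A fixed threshold of this kind ignores daily and weekly cycles in traffic, so it should be read as a starting point for the ideas in Chapters 2 and 4 rather than a usable detector.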

An analogy to keep in mind as you read this book is the "envelope" analogy. The information sent across the network is broken into small "chunks," referred to as "packets." Each packet contains addressing information and data. Consider a standard (paper) letter. It contains an address (to and from) and some information as to how the letter is to be handled (e.g., return to sender if undeliverable) as well as content, which resides inside the letter and is generally inaccessible to the mail handlers. A packet is like a letter. It contains addressing and handling information (the "header") and private information (the "data"), which, unlike the contents of a letter, is also freely accessible to anyone who wants to look at it (although it can be encrypted for privacy).

Essentially, network monitoring involves measuring statistics on the individual packets sent across the network. One can keep statistics on the headers (the address information on the letter), or one can look at the content to try to infer the intent of the sender. Looking at content is problematic for several reasons:

• High network speeds require extremely fast processing to analyze content.

• Privacy issues often make it politically (or legally) difficult.

• The difficulty of parsing the content is comparable to that of natural language.

• Encryption can make it difficult or impossible to determine the content.

I take the position that network monitoring should primarily concern the address information (header) of the packets, while any content monitoring should be restricted to the individual hosts. Thus, we consider issues of analyzing content or specific individual actions in the chapter on host monitoring, Chapter 5.
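The header-only position can be sketched in a few lines of Python. The sketch below models a packet as a header plus data, following the envelope analogy, and tallies statistics from the header alone. The class names, the three header fields, and the sample addresses are assumptions chosen for illustration; real headers carry many more fields.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """The 'envelope': addressing information only (hypothetical, simplified)."""
    src: str
    dst: str
    dst_port: int


@dataclass(frozen=True)
class Packet:
    header: Header
    data: bytes  # the 'letter'; never inspected by the monitor below


def header_statistics(packets):
    """Count packets per source address and per destination port,
    reading only the header of each packet."""
    by_source = Counter()
    by_port = Counter()
    for packet in packets:
        by_source[packet.header.src] += 1
        by_port[packet.header.dst_port] += 1
    return by_source, by_port


# Imaginary addresses from a documentation-only range.
packets = [
    Packet(Header("192.0.2.1", "192.0.2.9", 80), b"GET / HTTP/1.0"),
    Packet(Header("192.0.2.1", "192.0.2.9", 25), b"HELO example"),
    Packet(Header("192.0.2.7", "192.0.2.9", 80), b"GET /index.html HTTP/1.0"),
]

by_source, by_port = header_statistics(packets)
print(by_source.most_common())
print(by_port.most_common())
```

Because `header_statistics` never touches `data`, it sidesteps the speed, privacy, parsing, and encryption problems listed above, at the cost of knowing nothing about what was actually said.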

Intrusion detection is more specific than network monitoring in the sense that it focuses not only on the detection of "abnormal" behavior but also on the determination that the behavior is undesirable and/or harmful. In order to make this determination, an intrusion detection system (IDS) must infer both the intent of the activity and the ultimate results of the activity, should it be successful.

There has been a lot of press about computer intrusions in the last few years. Usually the culprits are identified as "hackers," a term that has come to connote a person bent on illegal entry and malicious damage to a computer system. I will refrain from using this term for several reasons. The term "hacker" originally meant someone who was very good at writing computer programs, possibly to the point of obsession. To be a "hacker" was a badge of honor, for it denoted programmers who were at the top of their field. There are still those who hold to the old definition and prefer the term "cracker" for the person intent on damage. Rather than get involved in this battle, I have chosen to sidestep the issue entirely.

Another reason to avoid the term is that it still retains the connotation of a knowledgeable person, when in reality many so-called "hackers" are simply kids (literally or metaphorically) who come across programs that allow them to break into other people's computers. These programs require little skill, assuming the target computer is not well-defended.

Finally, there is the issue of the insider, a person with legitimate access to the computer who, for revenge or gain, decides to damage or otherwise make unauthorized use of the machine. These people are not necessarily expert users and often do no "hacking" in any usual sense of the word. I will refer to any of the above as an "attacker."

This is not a book on how to secure your computer from attack. I will, however, point out various utilities that can help you in this or that are useful for collecting data relevant to intrusion detection. These utilities are all Unix-based, although most of them are also available for other operating systems. All are also available for free. Although there are many commercial products that perform these and other useful security and monitoring functions, I will not cover any commercial products.

There are a number of very good books describing how to secure a given operating system. One I recommend for Linux is Toxen [2001].

The focus of the utilities discussed in this book is almost entirely on collecting data rather than securing a system. Many of the utilities also help to secure a system, and a few are really designed primarily for this task. There are many utilities that have not been listed, due to space limitations, and the interested reader is encouraged to check the Unix manual pages and the Web addresses in Appendix D.

This avoidance of commercial products extends to those designed specifically for intrusion detection. There are several books that cover these, such as Anonymous [1997], Escamilla [1998], Amoroso [1999], Northcutt [1999], and Bace [2000]. Also, products change so quickly that anything said about them will likely be inaccurate in a few months. Finally, in order to do a good job of evaluating commercial systems, I would feel the need to acquire them and test them out. This is not an option. Although we have several systems at NSWC that I could evaluate, I decided it best to leave the evaluation of these systems to others. Industry magazines are good places to find such evaluations.

Throughout the book, I have examples of IP addresses and machine names. These should all be considered imaginary, in no way corresponding to a real machine. This is particularly important in the examples of attacks. In no case does an attack example contain the name or IP address of the real attacker or victim, even in those cases where the data come from a real attack.

This book is organized into three sections, covering network basics, intrusion detection, and viruses. Computer professionals with a knowledge of basic networking and TCP/IP can skip most of the first section, whereas statisticians may find this material helpful.

The section on intrusion detection is split into network and host monitoring. Many of the same techniques are relevant to both of these areas, but each has unique features. I will describe some of the more common attacks and some of the approaches to detecting these and other attacks.

The final section covers viruses, worms, and other types of malicious code. The chapter on viruses describes how these operate and takes a slightly different approach to analysis. Rather than focusing on detection, I consider the problem of modeling virus propagation. This is similar to biological virus epidemiology and will make use of techniques from epidemiology. The chapter on trojan programs discusses some common examples of these and the more general problem of covert channels.
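To give a flavor of what borrowing from epidemiology can look like, here is a minimal sketch of a discrete-time susceptible-infected (SI) model written in Python. It is a generic textbook model chosen for illustration, not necessarily one of the models developed in Chapter 6, and the function name, population size, and contact rate are assumptions.

```python
def simulate_si(population, infected, beta, steps):
    """Discrete-time SI model. At each step the number of new infections is
    beta * I * S / N, where I is the infected count, S the susceptible
    count, and N the (fixed) population size."""
    history = [infected]
    for _ in range(steps):
        susceptible = population - infected
        new_infections = beta * infected * susceptible / population
        infected = min(population, infected + new_infections)
        history.append(infected)
    return history


# Hypothetical network of 1000 machines with one machine initially infected.
history = simulate_si(population=1000, infected=1, beta=0.5, steps=40)
for step in (0, 10, 20, 40):
    print(step, round(history[step], 1))
```

The infected count grows roughly geometrically at first and then levels off as susceptible machines run out, the familiar logistic shape. The model has no recovery or removal, so it cannot represent disinfection or patching.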

Since Unix may not be familiar to all readers of this book, a list of common commands follows.

• alias Rename a command (this is actually a shell command rather than a Unix command, but I will ignore this distinction). For example, I have the following on my computers:

alias ll "ls -lt"

which allows me to simply type "ll" when I want a time-ordered long listing of a directory.

• cd Change the current directory.

• chmod Change the mode (read, write, execute permission) of a file or directory.

• chown Change the owner of a file or directory.

• cp Copy a file.

• csh The C command shell (similar to the MS-DOS prompt).

• echo Echo the string to the terminal.

• grep Search a file for occurrences of a given string.

• gzip A file compression utility.

• head List the first few lines of a file.

• kill Stop the execution of a process.

• ls List a directory (similar to MS-DOS dir command).

• man Look up a command in the manual pages. I will refer to a manual page as a "man page," which is the standard terminology among Unix users.

• mkdir Create a directory.

• more View a file one page at a time.

• mv Move a file.

• perl A powerful language for scanning and extracting information from text files.

• rm Remove a file (similar to MS-DOS del command).

• rmdir Remove an empty directory.

• sh The Bourne command shell.

• su Substitute user. Change to another user (for example, root). Some people think "su" stands for "super user," since typing "su" alone is used to change to the "super user," known as "root" in the Unix world. Assuming you know the user's password, you can use "su" to change to that person's account. In particular, root can change to anyone's account. The syntax is

su username or su - username

The "-" makes the shell a login shell, and hence reads any initialization files that are read at login.

• tail List the last few lines in a file.

• vi A file editor. There are many text editors available. Vi is the classic Unix "visual editor" that is used by many programmers, particularly those who learned programming in the early days of Unix.

There are many books on Unix that provide information on the preceding commands and more. Rather than provide a list, I will leave it to the interested reader to visit a local bookstore.
