
Security in P2P Networks: A Study of the Gnutella Protocol and Its Weaknesses

By: Imran Qureshi

Date: December 9, 2004

Gnutella Security - Overview

• What is Gnutella? The history
• The topology of Gnutella
  - no central server (decentralized, second generation)
  - direct peer connection
• Gnutella Protocol
• Gnutella Descriptors
  - 5 descriptors: ping, pong, query, queryhit, push
  - byte structure of the descriptors
  - descriptor header: byte structure
• Communication in Gnutella
  - Finding and connecting to other servents
  - Downloading resources (offline)
  - Firewalled servents

Overview

• Security Risks
  - Spamming
  - Denial of service attacks
  - Pong attack
  - IP harvesting
  - Spreading viruses through the push descriptor
  - Man in the Middle attacks
• Solutions
  - Validation
  - Gnutella Proxy Server

Gnutella History

History of Gnutella

• Gnutella was developed at Nullsoft, a subsidiary of AOL, by Justin Frankel and Tom Pepper.

• Justin Frankel, whom some call “the world’s most dangerous geek”, created Winamp at the age of 18 and, a few years later, Gnutella while working for AOL.

• Gnutella was released on 14th March 2000.

• At the time, Napster was facing lawsuits over the sharing of copyrighted material. When word of Gnutella spread, a large number of people downloaded it.

• AOL forced Nullsoft to take down all links to Gnutella from its website, since it promoted piracy. But in the short time that Gnutella was available (about one day), a large group of people had already obtained it.

• The source code was never officially released, but people reverse engineered the protocol, and now we have different programs using the Gnutella protocol:

Gnutella Clients

Source: Peer-to-Peer Networks, by Prof. N.Vlajic

Gnutella Topology

Gnutella - Topology

• Gnutella’s topology is known as a “decentralized topology”, meaning that communication between two peers (also called users or nodes) on the network takes place directly. Each node acts as both a client and a server, granting other nodes permission to download its resources or asking other nodes for access to their resources.

• Famous P2P clients: Napster, Kazaa, Gnutella

• The total number of peers found on the Gnutella network during a weekday is around 43,546, sharing approximately 1,843,549 files.

• The communication does not go through a central server, unlike Napster.

• Each node or peer on the network is called a “servent”. The word servent comes from:

Each peer = SERVer + cliENT = “SERVENT”

Gnutella – Topology (contd…)

Napster (central server) vs. Gnutella (no central server)

Gnutella Protocol

Gnutella Protocol

• The Gnutella protocol is a set of rules by which users communicate over the network.

• All communication is done via “descriptors”.

• There are 5 basic descriptors, namely:

- Ping, Pong, Query, QueryHit and Push

• Each descriptor is preceded by a “descriptor header”.

• In the following slides, we will describe the purposes of the descriptors and their byte structure.

Gnutella Protocol – Byte Structures

The Descriptors:

• When a peer talks to another peer, the communication is done via descriptors.

• The byte structure of a typical message is as follows:

  Field:        Descriptor Header | Descriptor Payload
  Byte offset:  0 to 22           | 23 onward (variable length, 0…max)

• Note:

- All the following structures are in little-endian byte order (least significant byte is stored first) unless stated otherwise.
- All IP addresses are in IPv4 format, one byte per octet. For example, the address 208.17.50.4 is stored as:

  Byte:   byte1 | byte2 | byte3 | byte4
  Value:  0xD0  | 0x11  | 0x32  | 0x04

Descriptor Header:

• Byte structure:

  Descriptor ID | Payload Descriptor | TTL | Hops | Payload Length

- Descriptor ID: Unique identifier for the descriptor on the network (16-byte string).

- Payload Descriptor: This value depends on the descriptor being sent:

    ping     - 0x00
    pong     - 0x01
    query    - 0x80
    queryhit - 0x81
    push     - 0x40

- TTL (Time to Live, or Horizon): The number of times that the descriptor will still be forwarded. Each servent that receives a descriptor decrements the TTL and forwards the descriptor to the next peer. When the TTL reaches 0, the descriptor is no longer forwarded. TTL is the main mechanism available for limiting network traffic and preventing poor performance.

- Hops: Total number of times the descriptor has already been forwarded. The hop value is incremented by each peer that receives it, so at every point:

    TTL(initial) = TTL(current) + Hops(current)

- Payload Length: The length of the payload that immediately follows this header. It is used to find the beginning of the next descriptor.
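The header fields above can be sketched as a small pack/parse helper. This is only an illustration: the field widths (16 + 1 + 1 + 1 + 4 bytes, adding up to the 23-byte header) follow my reading of the Gnutella 0.4 specification rather than anything stated on the slides, and the names `Header`, `pack_header`, `parse_header` and `forward` are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

# Assumed layout: 16-byte ID, 1-byte payload descriptor, 1-byte TTL,
# 1-byte hops, 4-byte little-endian payload length (23 bytes in total).
HEADER = struct.Struct("<16sBBBI")

PAYLOAD_NAMES = {0x00: "ping", 0x01: "pong", 0x40: "push",
                 0x80: "query", 0x81: "queryhit"}


class Header(NamedTuple):
    descriptor_id: bytes
    payload_type: int
    ttl: int
    hops: int
    payload_length: int


def pack_header(header: Header) -> bytes:
    """Serialize a header into its 23-byte wire form."""
    return HEADER.pack(*header)


def parse_header(data: bytes) -> Header:
    """Read the first 23 bytes of `data` as a descriptor header."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 23 bytes for a descriptor header")
    return Header(*HEADER.unpack(data[:HEADER.size]))


def forward(header: Header) -> Optional[Header]:
    """Decrement TTL and increment hops; None means 'do not forward'."""
    if header.ttl <= 1:  # TTL would reach 0 after the decrement
        return None
    return header._replace(ttl=header.ttl - 1, hops=header.hops + 1)
```

Note that `forward` keeps TTL + Hops constant, which is exactly the invariant TTL(initial) = TTL(current) + Hops(current).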

• Right after the descriptor header comes the descriptor payload. This payload could be one of the following:

Ping

• A ping descriptor is used by a servent to find other servents on the network.

• A servent that receives a ping descriptor responds with a pong.

• Pings have a payload length of 0 and no payload. Hence they have no byte structure of their own.

• The descriptor header identifies a ping by a value of 0x00 in the payload descriptor field and a value of 0x00000000 in the payload length field.


Pong

• Sent as a response to a ping.

• Defining values:

- Port: The port at which the responding host can accept incoming connections.
- IP Address: IP address of the responding host (big-endian format).
- Number of files shared: Total number of files the responding host is sharing on the network (usually found in the “shared folder”).
- Number of Kb shared: Total number of kilobytes the responding host (with the given IP and port) is sharing.

  Port | IP Address | Number of files shared | Number of Kb shared
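To make the pong layout concrete, here is a minimal sketch of packing and parsing a pong payload. The field widths (2-byte port, 4-byte IP address, two 4-byte counters) are an assumption taken from the Gnutella 0.4 specification, not from the slides; the function names and the sample values are hypothetical.

```python
import ipaddress
import struct

PONG_LEN = 14  # assumed: 2 (port) + 4 (IP) + 4 (files) + 4 (kilobytes)


def pack_pong(port: int, ip: str, files: int, kilobytes: int) -> bytes:
    """Port and counters are little-endian; the IP address is big-endian."""
    return (struct.pack("<H", port)
            + ipaddress.IPv4Address(ip).packed
            + struct.pack("<II", files, kilobytes))


def parse_pong(payload: bytes):
    """Return (port, ip, files, kilobytes) from a 14-byte pong payload."""
    if len(payload) != PONG_LEN:
        raise ValueError("a pong payload is expected to be 14 bytes")
    (port,) = struct.unpack_from("<H", payload, 0)
    ip = str(ipaddress.IPv4Address(payload[2:6]))
    files, kilobytes = struct.unpack_from("<II", payload, 6)
    return port, ip, files, kilobytes
```

Nothing in this payload proves that the advertised address belongs to the sender, which is what the pong attack described later relies on.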


Query

• After a servent has the IP address and the port of other servents, it may search for particular files using the query descriptor.

• Defining values:

- Minimum Speed: The minimum speed (in kb/s) of the servents that should respond to this query. A query with a minimum speed requirement of m (kb/s) should only be answered with a queryhit by a servent whose speed is at least m.

- Search Criteria: A search string terminated by a null (0x00). The maximum length is bounded by the payload length field of the descriptor header, e.g. “nameofthesong.mp3”.

  Minimum Speed | Search Criteria
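A query payload can be sketched in a few lines. The 2-byte width of the minimum speed field is an assumption based on the Gnutella 0.4 specification; `pack_query`, `parse_query` and `may_respond` are hypothetical helper names.

```python
import struct


def pack_query(min_speed_kbps: int, criteria: str) -> bytes:
    """Minimum speed (assumed 2 bytes, little-endian) + null-terminated string."""
    return struct.pack("<H", min_speed_kbps) + criteria.encode("ascii") + b"\x00"


def parse_query(payload: bytes):
    """Return (min_speed_kbps, criteria) from a query payload."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    text, terminator, _ = payload[2:].partition(b"\x00")
    if not terminator:
        raise ValueError("search criteria is not null-terminated")
    return min_speed, text.decode("ascii")


def may_respond(own_speed_kbps: int, min_speed_kbps: int) -> bool:
    """A servent should answer only if it meets the requested minimum speed."""
    return own_speed_kbps >= min_speed_kbps
```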


QueryHit

- No. of Hits: Total number of hits (matches) for the query in the result set.
- Port: The port at which the responding host can accept incoming connections.
- IP Address: IP address of the responding host (big-endian format).
- Speed: Speed of the responding host.
- Result Set: The set of responses to the corresponding query, containing No. of Hits elements. In other words, one element for each file in the shared folder of the responding host that met the search criteria. Each element has the following structure:

    - File Index: Location and ID of the file matching the query (assigned by the responding host).
    - File Size: Size of the file in bytes.
    - File Name: Name of the file (double null terminated, 0x0000).

- Servent Identifier: Unique 16-byte string identifier of the responding servent on the network.

  No. of Hits | Port | IP Address | Speed | Result Set | Servent Identifier

  Each Result Set element:  File Index | File Size | File Name
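The nested structure of a queryhit is easier to follow in code. The sketch below assumes the field widths of the Gnutella 0.4 specification (1-byte hit count, 2-byte port, 4-byte IP, 4-byte speed, 4-byte file index and file size); those widths, the helper names and the sample values in any usage are assumptions for illustration only.

```python
import ipaddress
import struct


def pack_query_hit(port, ip, speed, results, servent_id: bytes) -> bytes:
    """`results` is a list of (file_index, file_size, file_name) tuples."""
    body = (struct.pack("<BH", len(results), port)
            + ipaddress.IPv4Address(ip).packed      # big-endian IP address
            + struct.pack("<I", speed))
    for index, size, name in results:
        # Each file name is terminated by a double null (0x0000).
        body += struct.pack("<II", index, size) + name.encode("ascii") + b"\x00\x00"
    return body + servent_id


def parse_query_hit(payload: bytes):
    """Return (port, ip, speed, results, servent_id)."""
    hits, port = struct.unpack_from("<BH", payload, 0)
    ip = str(ipaddress.IPv4Address(payload[3:7]))
    (speed,) = struct.unpack_from("<I", payload, 7)
    offset, results = 11, []
    for _ in range(hits):
        index, size = struct.unpack_from("<II", payload, offset)
        offset += 8
        end = payload.index(b"\x00\x00", offset)    # end of the file name
        results.append((index, size, payload[offset:end].decode("ascii")))
        offset = end + 2
    return port, ip, speed, results, payload[offset:offset + 16]
```

The No. of Hits field is what tells the parser how many result elements to read before the trailing 16-byte servent identifier.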


Push

• The basic purpose of a push descriptor is to reach a servent that is behind a firewall. This topic is discussed in detail later on.

• Defining values:

- Servent Identifier: The unique 16-byte string identifier of the targeted (firewalled) servent, which is being requested to push the file with index File Index.

- File Index: Index of the file to be pushed from the targeted servent’s shared folder.

- IP Address: IP address (big-endian format) of the servent to which the file will be pushed.

- Port: The port on the receiving host to which the file should be pushed.

  Servent Identifier | File Index | IP Address | Port
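A push payload is fixed-size, so it maps onto a single format string. As before, the widths (16-byte identifier, 4-byte file index, 4-byte IP, 2-byte port, 26 bytes in total) are assumed from the Gnutella 0.4 specification and the helper names are hypothetical.

```python
import ipaddress
import struct

# Assumed layout: servent ID, file index (LE), raw IP bytes (BE), port (LE).
PUSH = struct.Struct("<16sI4sH")


def pack_push(servent_id: bytes, file_index: int, ip: str, port: int) -> bytes:
    """Build the payload asking `servent_id` to push a file to ip:port."""
    if len(servent_id) != 16:
        raise ValueError("servent identifier must be 16 bytes")
    return PUSH.pack(servent_id, file_index,
                     ipaddress.IPv4Address(ip).packed, port)


def parse_push(payload: bytes):
    """Return (servent_id, file_index, ip, port)."""
    servent_id, file_index, raw_ip, port = PUSH.unpack(payload)
    return servent_id, file_index, str(ipaddress.IPv4Address(raw_ip)), port
```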

Communication in Gnutella


Finding servents

• In order to connect to a Gnutella network and share files, a servent needs to run one of the many Gnutella clients (e.g. BearShare, Morpheus, etc.).

• After the client is launched, this peer or node (let’s say A) will let its neighboring node (let’s say B) know of its existence. (You need to know the DNS name or IP address of some neighbor at the start.)

• A lets its neighbors know of its existence by sending out a ping descriptor.

• B in turn forwards the ping to its neighbors, and the descriptor keeps travelling through the network, letting the nodes know of A’s existence. In this way the information is broadcast, and it keeps going to different nodes on the network until the descriptor’s time-to-live (TTL) expires, i.e. reaches 0.

• Now A has become part of the network, and every servent the ping reached knows of its existence.

• If a servent wants to acknowledge, it sends a pong descriptor to A, letting it know its IP address and which of its ports is accepting traffic.

• In this way, A builds up a file of the IP addresses and ports of all the servents that responded with a pong descriptor.
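The way a ping spreads until its TTL runs out can be simulated on a toy network. This is a sketch only: the adjacency dictionary and the function name `flood_ping` are made up for illustration, and it assumes that a servent drops a descriptor it has already seen (duplicate suppression by Descriptor ID), which the slides do not discuss.

```python
from collections import deque


def flood_ping(neighbors: dict, origin: str, ttl: int) -> dict:
    """Return {servent: hops} for every servent the ping reaches."""
    reached = {}
    seen = {origin}
    queue = deque([(origin, ttl, 0)])
    while queue:
        node, ttl_left, hops = queue.popleft()
        if ttl_left == 0:          # TTL expired: do not forward any further
            continue
        for peer in neighbors[node]:
            if peer in seen:       # assumed: duplicates are dropped
                continue
            seen.add(peer)
            reached[peer] = hops + 1
            # The receiver decrements TTL and increments hops before forwarding.
            queue.append((peer, ttl_left - 1, hops + 1))
    return reached


# A chain A - B - C - D: with TTL 2, the ping from A never reaches D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood_ping(chain, "A", 2))
```

Every servent in the returned dictionary is a candidate to answer A with a pong.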


Servent A announcing existence to peers
Source: Prof. Igor Ivkovic, Dept. of Computer Science, Univ. of Waterloo, “Improving Gnutella Protocol”


Connecting to Servents

• Now that A has a file containing other servents’ addresses and ports, it will try to connect to one of those servents (let’s say B).

• After a TCP session is established with B, A sends the following command in ASCII:

  GNUTELLA CONNECT/<protocol version string>\n\n

  where the protocol version string is the current version of Gnutella (e.g. “0.4”).

• If B wants to connect, it responds to the command by sending:

  GNUTELLA OK\n\n

• Now there is a valid direct connection between A and B.

• If B responds with anything else, A knows that B is not willing to create a connection.
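The connection handshake is short enough to sketch directly. The code below is a minimal illustration, assuming an already connected TCP socket; `connect_request`, `read_until_blank_line` and `handshake` are hypothetical names, and error handling and timeouts are left out.

```python
import socket

OK_RESPONSE = b"GNUTELLA OK\n\n"


def connect_request(version: str = "0.4") -> bytes:
    """Build the ASCII connect command for the given protocol version."""
    return f"GNUTELLA CONNECT/{version}\n\n".encode("ascii")


def read_until_blank_line(sock: socket.socket, limit: int = 1024) -> bytes:
    """Read one byte at a time until the '\\n\\n' terminator (or the limit)."""
    data = b""
    while not data.endswith(b"\n\n") and len(data) < limit:
        chunk = sock.recv(1)
        if not chunk:              # peer closed the connection
            break
        data += chunk
    return data


def handshake(sock: socket.socket, version: str = "0.4") -> bool:
    """Send the connect command; True only if the peer answers GNUTELLA OK."""
    sock.sendall(connect_request(version))
    return read_until_blank_line(sock) == OK_RESPONSE
```

Reading byte by byte is slow but keeps the sketch from consuming descriptor bytes that may follow the handshake on the same connection.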


• Now that this connection has been established, the communication between A and B carries on using descriptor headers and descriptors, as described before:

Ping, Pong, Query, QueryHit and Push


Downloading resources or files from other servents

Before downloading is done, we need to search for the files.

Searching for files

• Let’s again take our two servents A and B.

• Suppose that A wants to search for a file called “ushersong.mp3”.

• It will send out a query descriptor as follows (let’s suppose that the minimum speed requirement is x):

  Minimum Speed | Search Criteria
  x             | ushersong.mp3

• If a servent has a file matching “ushersong.mp3” and has a speed >= x (kb/s), it may choose to send a queryhit descriptor as follows:

  No. of Hits | Port | IP Address   | Speed | Result Set  | Servent Identifier
  1           | 30   | 120.168.10.2 | > x   | (see below) | (responder’s ID)

  Result Set:

  File Index | File Size     | File Name
  2          | 4661248 bytes | “ushersong.mp3”

• A will receive the queryhit descriptor and ask to download the file.

Downloading

• All searches on the Gnutella network are carried over the network itself (online), while downloads take place outside it (offline), over a direct connection between the two servents.

• Hence, two servents that wish to transfer a file communicate using HTTP commands.

• So, in our example, A creates a TCP connection with B and sends the following command to download the file:

  GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
  Connection: Keep-Alive\r\n
  Range: bytes=0-\r\n
  User-Agent: Gnutella\r\n
  \r\n

source: Mattias Jansson, “Gnutella” Feb 1, 2004

• For our example, the HTTP command will read:

  GET /get/2/ushersong.mp3/ HTTP/1.0\r\n
  Connection: Keep-Alive\r\n
  Range: bytes=0-\r\n
  User-Agent: Gnutella\r\n
  \r\n

• A response to this could be:

  HTTP 200 OK\r\n
  Server: Gnutella\r\n
  Content-type: application/binary\r\n
  Content-length: 4661248\r\n
  \r\n
  … data …

source: Mattias Jansson, “Gnutella” Feb 1, 2004
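The download request and the head of the response can be built and read with plain string handling. This is a sketch under the assumption that the exchange looks exactly like the example above; `build_get` and `parse_response_head` are hypothetical helpers, not part of any Gnutella client.

```python
def build_get(file_index: int, file_name: str) -> bytes:
    """Build the HTTP request A sends to fetch a file named in a queryhit."""
    return (f"GET /get/{file_index}/{file_name}/ HTTP/1.0\r\n"
            "Connection: Keep-Alive\r\n"
            "Range: bytes=0-\r\n"
            "User-Agent: Gnutella\r\n"
            "\r\n").encode("ascii")


def parse_response_head(head: bytes):
    """Split a response head into (status_line, {lowercased name: value})."""
    lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in lines[1:]:
        if not line:               # blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers
```

The Content-length header tells A how many bytes of file data to expect after the blank line; for the example above it should equal the File Size from the queryhit.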


Firewalled Servents:

• If a targeted servent, from which a file needs to be downloaded, is behind a firewall, it is not possible to create a direct connection to it in order to download the file.

• The firewall will not allow incoming connections to its Gnutella port.

• Hence, the requesting servent sends a push descriptor.

• Upon receiving the push request, the targeted servent tries to create a TCP/IP connection to the requesting host. If this connection cannot be established, it means that both servents are behind a firewall, and the transfer cannot take place.

• Once connected, the targeted servent sends the following command:

  GIV <File Index>:<Servent Identifier>/<File Name>\n\n

• After receiving this command, the requesting servent sends the following HTTP GET request over the same connection:

  GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
  Connection: Keep-Alive\r\n
  Range: bytes=0-\r\n
  User-Agent: Gnutella\r\n
  \r\n

• The rest of the download process is the same as described before.
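On the requesting side, the GIV line has to be parsed before the GET request can be sent. The sketch below is hedged in two ways: it assumes the servent identifier is written as 32 hexadecimal characters, which is not stated here, and it tolerates either a space or a slash after GIV because write-ups of the command differ. The name `parse_giv` is hypothetical.

```python
import re

# Assumed form: GIV <index>:<32 hex chars>/<file name>\n\n
GIV_LINE = re.compile(r"GIV[ /](\d+):([0-9A-Fa-f]{32})/(.+)\n\n")


def parse_giv(line: str):
    """Return (file_index, servent_id_bytes, file_name) or raise ValueError."""
    match = GIV_LINE.fullmatch(line)
    if match is None:
        raise ValueError("not a well-formed GIV line")
    index, servent_hex, name = match.groups()
    return int(index), bytes.fromhex(servent_hex), name
```

The returned file index and file name are what the requesting servent places in the GET /get/<File Index>/<File Name>/ request that follows.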

Security Risks of Gnutella


Spamming and Denial of Service Attacks

• In email, spam messages can easily be deleted, and there is no further harm.

• But if you accept a spammed query, the consequences can be very harsh, and you could actively play a part in a Denial of Service (DoS) attack.

• DoS attacks in Gnutella are achieved very easily.

• If a user (A) wants to download a file, it sends a query that reaches another peer (B).

• Let’s say that B in our case is a malicious peer and is misbehaving on the network.

• B will receive the query from A, respond positively, and urge A to download the file from C (the host under attack).

• Hence A will start downloading the file from C, without knowing that it is actually downloading it from C.

• This way, the malicious B will direct many peers to download files from C and hence create a denial of service attack.

• The important thing to understand here is that anybody could be playing a role in a DoS attack without knowing it.

• At some point, the load on C could be so great that it is unable to allow connections from more peers, and it may even crash.

• It will also be very hard for anyone to identify who originated this attack, since requests to C could be coming from many different IP addresses and many different domains.


Pong Attack

• The concept behind a pong attack is the same as for the DoS attack.

• When the malicious B receives a ping from A, it might reply with a pong containing the IP address and port of C (the host under attack).

• A believes that a connection has been established with B and will start forwarding queries, even though they are going to C.

  Port | IP Address | Number of files shared | Number of Kb shared


IP Harvesting

• Hackers are always in search of people’s IP addresses.

• They continuously search and scan the internet in order to find people’s IP addresses.

• Since most web servers have highly protective firewalls, it is hard for them to break through.

• But in Gnutella, IP addresses are easily obtained.

• P2P networks work in a way that requires you to advertise your IP address.

• A hacker could easily gather (harvest) IP addresses and attack vulnerable users on the network.

• This is less of a problem for people with dial-up connections, since their IP address keeps changing.

• But people with static IP addresses (such as Montclair State University or other “.edu” domains) are in trouble.


Transferring viruses through the push descriptor

• A typical push descriptor contains the IP address of the host that will receive the file and the port that is accepting traffic.

• When a user sends out a query to a peer, that peer might lie and say that it has the file even though it doesn’t.

• Then the user will send a push request to the responding peer, and the responding peer will create a TCP/IP connection with the user.

• Now the responding host can easily transfer any file to the user, since it has already gained trust by lying.

• These files could be “.exe” files that transfer a virus to the user’s computer.


Man in the Middle Attacks:

• I will describe this with the use of an example.

• We have three people:

  A – searching for a file
  B – has the file
  C – malicious user

• A sends a query over the network, searching for a file.

• B has the file and responds with a queryhit.

• Suppose C receives one of these queryhits, changes the address in it to its own IP and port, and forwards it to A.

• A, who gets the reply from C, creates the connection with C and not with B.

• C, on the other hand, downloads the original file from B, infects it with malicious content, and then transfers it to A.

Solutions

Solutions

1) Validation

2) Unique Network Identifier

3) Reduce Network traffic

Thank You For Your Attention

Questions or Suggestions about any concepts discussed ?
