bittorrent: the protocol, its background and uses


Upload: teo

Post on 22-Jan-2016


Page 1: Bittorrent: The protocol, its background and uses

Bittorrent: The protocol, its background and uses

1. BitTorrent Background
a) What is BitTorrent?
b) Who’s the author, history

2. The Protocol
a) Terminology
b) Distributed Scenario
c) Structure of .torrent files
d) Protocol between peers and trackers

3. BitTorrent Applications
a) BitTorrent Inc, usages throughout industry

Page 2: Bittorrent: The protocol, its background and uses

BitTorrent

“You get so tired of having your work die,” he says. “I just wanted to make something that people would actually use.”

• The above quote is from Bram Cohen, BitTorrent’s author, in an interview with Wired in 2005.

Page 3: Bittorrent: The protocol, its background and uses

What is BitTorrent?

From 10,000 feet

Efficient content distribution system using file swarming.

Does not perform all the functions of a typical p2p system, like searching.

http://www.cs.uiowa.edu/~ghosh/bittorrent.ppt

Page 4: Bittorrent: The protocol, its background and uses

What is BitTorrent?

• BitTorrent introduced two novel concepts:
  • Rather than providing a search protocol itself, it was designed to integrate seamlessly with the Web and made files (torrents) available via Web pages, which could be searched for using standard Web search tools.
  • It enabled so-called file swarming; that is, once a peer starts downloading a file, it also makes whatever portion of the file it has downloaded immediately available for sharing.

Page 5: Bittorrent: The protocol, its background and uses

What is BitTorrent

• The file-swarming process is enabled through the use of a tracker:
  • an HTTP-based server used to dynamically synchronise and update the peers as they are downloading - it tracks the availability of pieces of the file on the network.
  • The tracker can also monitor users’ usage of the network: how much do they contribute?
  • The peers then apply a tit-for-tat scheme, which divides bandwidth according to how much a peer contributes to the other peers in the network: if you do not share, you cannot consume.
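The tit-for-tat idea can be illustrated with a small sketch. It assumes a simplified model in which a peer splits its upload capacity in direct proportion to what each neighbour has sent it; real clients decide this through choking and unchoking rather than an exact proportional split, and the function name, peer names and figures below are hypothetical.

```python
def share_upload(capacity_kbps, received_kb):
    """Split our upload capacity among peers in proportion to what each gave us.

    Simplified illustration only: real BitTorrent clients choke and unchoke
    connections instead of computing an exact proportional share.
    """
    if not received_kb:
        return {}
    total = sum(received_kb.values())
    if total == 0:
        # Nobody has contributed yet: split evenly so the swarm can get started.
        return {peer: capacity_kbps / len(received_kb) for peer in received_kb}
    return {peer: capacity_kbps * kb / total for peer, kb in received_kb.items()}


# Hypothetical swarm: peer_a sent us 300 KB, peer_b 100 KB, peer_c nothing.
allocation = share_upload(100, {"peer_a": 300, "peer_b": 100, "peer_c": 0})
print(allocation)
```

In this model a peer that contributes nothing receives no share of the upload capacity, which is the "if you do not share, you cannot consume" rule in its bluntest form.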

Page 6: Bittorrent: The protocol, its background and uses

BitTorrent

Bram Cohen

• Born 1975 - computer programmer
• Engineered large parts of Mojo Nation (mojonation.net) - parts of it similar in flavour to BitTorrent (pre April 2001).
• April 2001: focused on authoring the peer-to-peer (P2P) BitTorrent protocol and writing the first file sharing program to use the protocol, also known as BitTorrent.
• He is also the organizer of the San Francisco Bay Area P2P-hackers meeting, and the co-author of Codeville.
• Currently lives in the San Francisco Bay Area

Page 7: Bittorrent: The protocol, its background and uses

Start of BitTorrent - CodeCon

• Cohen unveiled his novel ideas at the first CodeCon conference in 2002
  • CodeCon is a conference for hackers and technology enthusiasts.
  • Co-organised by Bram and his roommate Len Sassaman.
• CodeCon was intended to be a low-cost conference (i.e. <$100) with a focus on developers doing presentations of working code, rather than on companies with products to sell.
• It remains an event for those seeking information about new directions in software, though BitTorrent continues to lay claim to the title of "most famous presentation".

Page 8: Bittorrent: The protocol, its background and uses

Features?

• Peer-to-peer in nature

Page 9: Bittorrent: The protocol, its background and uses

Taxonomy for Distributed Systems

Taxonomy is based on the following factors and their relation to centralization:

1. Resource Discovery: mechanism for discovering resources on a distributed system?
• Examples: DNS, Napster Lookup, Jini LUS, UDDI, Gnutella broadcast etc.

2. Resource Availability: scalability - do resources scale with the network? Does access to them scale with the network?

3. Resource Communication: two types:
• Brokered Communication (centralized): communication is passed through a central server - resources do not have direct references to each other.
• Point to point (decentralized - peer to peer): a direct connection between the sender and the receiver.

Page 10: Bittorrent: The protocol, its background and uses

Centralization of Point-to-Point Connections

[Figure: three communication patterns compared]
• Web server: many-to-one relationship between users and the web server, and therefore this can be considered centralized communication.
• True peer-to-peer, e.g. Gnutella: equal peers, balanced (equal) load on communication.
• BitTorrent: pieces are exchanged between the peers.

Page 11: Bittorrent: The protocol, its background and uses

Features?

• Peer-to-peer in nature
• Central server called a tracker
• Tracker uses HTTP
• Download and upload at the same time
• Efficiency improves the more a file is downloaded

Page 12: Bittorrent: The protocol, its background and uses

Downloading Speeds

Download speeds depend on two factors:

• BitTorrent keeps track of how much you contribute to hosting files for the group.
  • The more you share, the faster your downloads.
• The more people trading a file, the more options for obtaining its pieces.
  • So, unlike the old Napster, popularity doesn't bog down the process -- it gives it a shot of adrenaline
  • Trackers are also more dynamic than Napster servers - they provide updates

Page 13: Bittorrent: The protocol, its background and uses

File Swarming

• File swarming allows users to download files at the maximum download capacity of their broadband connection
• Enables simultaneous downloads of pieces of the same file from multiple users.
• Significant because broadband has a far lower upload bandwidth than download bandwidth
  • upload bandwidth can be ten times slower than download
  • Connecting to, say, ten peers balances this mismatch and enables full download capacity
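The arithmetic behind the "ten peers" figure can be sketched as follows. This is a back-of-the-envelope illustration that assumes every peer uploads at the same rate and devotes all of it to us; the bandwidth figures are hypothetical.

```python
import math


def peers_needed(download_kbps, peer_upload_kbps):
    """Number of uploading peers needed to saturate one download link.

    Assumes every peer uploads at the same rate and gives all of it to us.
    """
    return math.ceil(download_kbps / peer_upload_kbps)


# Hypothetical asymmetric link: 8000 kbit/s down, 800 kbit/s up (ten times slower).
print(peers_needed(8000, 800))
```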

Page 14: Bittorrent: The protocol, its background and uses

BitTorrent Protocol

• The BitTorrent protocol is an open specification

• Can be found in full on the BitTorrent Web site

• Is updated periodically in order to keep various BitTorrent applications compatible.

Page 15: Bittorrent: The protocol, its background and uses

Terminology 1

• Torrent - metadata file containing the information about a file to be shared on the BitTorrent network
• Peer - a participant in the network
• Seed - a peer that has a complete copy of the file (who probably created the torrent)
• Swarm - peers that are connected to (interested in) a particular file
• Tracker - server responsible for keeping track of the people in a swarm

Page 16: Bittorrent: The protocol, its background and uses

Terminology 2

• Choked - state of a connection when a peer does not wish to upload information at this time (perhaps because s/he already has too many connections)
• Interested - a client is “interested” if it wants to download pieces of a file from another BitTorrent node.
• Piece - a fixed-size portion of a file in BitTorrent; the size is typically a power of 2 and depends on the file size - common sizes are 256 KB, 512 KB or 1 MB.
• Bencoding - terse format for BitTorrent messages

Page 17: Bittorrent: The protocol, its background and uses

BitTorrent

A BitTorrent application generally has the following components:

• An 'original' downloader - seed
• An ordinary web server
• The end user web browsers - users click on:
  • A static 'metainfo' file (a .torrent file)
  • This starts the end user downloading app (BitTorrent)
• A BitTorrent tracker
• There are ideally many end users for a single file.

Page 18: Bittorrent: The protocol, its background and uses

[Figure: distributed scenario]

Participants:
• Web Server - Web sites contain .torrent files (here IansLectures.torrent)
• Seed - Ian T., with the lectures shared as a .torrent
• User Web Browser
• BitTorrent Client (enthusiastic student)
• Other BitTorrent Clients (enthusiastic students)
• Tracker

Steps:
1. Ian creates IansLectures.torrent (metadata) and uploads it to the Web site
2. User clicks IansLectures.torrent, which launches the BitTorrent Client because of the MIME mapping from .torrent to the BitTorrent application
3. Clients show interest in IansLectures.torrent
4. BitTorrent client contacts the specified tracker and finds “interested” clients
5. Clients connect to each other and to the seed to download pieces

Page 19: Bittorrent: The protocol, its background and uses

BitTorrent Messages - Bencoding

• Bencoding is a way to specify and organize data in a terse format. It supports the following 4 types:
  • Strings are encoded as follows: <string length>:<string data> - e.g. 4:spam represents the string "spam"
  • Integers are encoded as follows: i<integer>e - e.g. i3e represents the integer "3"
  • Lists are encoded as follows: l<bencoded values>e - e.g. l4:spam4:eggse represents the list of two strings: [ "spam", "eggs" ]
  • Dictionaries are encoded as follows: d<bencoded string><bencoded element>e - note keys must be bencoded strings. E.g. d4:spaml1:a1:bee represents the dictionary { "spam" => [ "a", "b" ] }
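The four rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `bencode` and `bdecode` are made up here, error handling is deliberately thin, and the sorting of dictionary keys follows the published BitTorrent specification rather than anything stated on the slide.

```python
def bencode(value):
    """Encode ints, str/bytes, lists and dicts using the four bencoding rules."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((key, item))
        # Assumption from the BitTorrent spec: keys appear in sorted order.
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def bdecode(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            item, pos = bdecode(data, pos)
            result[key] = item
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at position %d" % pos)


print(bencode({"spam": ["a", "b"]}))
print(bdecode(b"l4:spam4:eggse")[0])
```

Decoded strings come back as `bytes`, since bencoded strings are byte strings (the `pieces` value of a .torrent file, for example, is raw hash data and not text).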

Page 20: Bittorrent: The protocol, its background and uses

.torrent Files

The content of a ".torrent" is a bencoded dictionary, containing:

• announce: the URL of the tracker (string) - later versions have lists of trackers.
• info: a dictionary that describes the file(s) of the torrent - contains the following:
  • name: name for the file
  • piece length: number of bytes in each piece (integer)
  • pieces: string consisting of the concatenation of all 20-byte SHA1 hash values, one per piece (byte string)
• The format differs depending on whether there is one file (as above) or many: in the multi-file case, info holds a files list with one entry per file, each entry gives that file's length and path, and name is used as the directory name.
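As a rough sketch of how the `pieces` value is put together, the following splits some content into fixed-size pieces and concatenates their SHA1 digests, then assembles a single-file metainfo dictionary before bencoding. The helper names, file name and tracker URL are hypothetical, and the `length` key is taken from the published BitTorrent specification rather than from the slide.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Return the concatenated 20-byte SHA1 digests, one per piece."""
    digests = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]  # the last piece may be shorter
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


def build_metainfo(name, data, tracker_url, piece_length=256 * 1024):
    """Assemble the (not yet bencoded) dictionary for a single-file torrent."""
    return {
        "announce": tracker_url,
        "info": {
            "name": name,
            "length": len(data),  # assumption: key from the spec, not the slide
            "piece length": piece_length,
            "pieces": piece_hashes(data, piece_length),
        },
    }


# Hypothetical 600 KB payload: three pieces of at most 256 KB each.
content = b"x" * (600 * 1024)
meta = build_metainfo("IansLectures.zip", content, "http://tracker.example.com/announce")
print(len(meta["info"]["pieces"]) // 20, "pieces")
```

Because each downloaded piece can be checked against its own hash, a client can reject corrupt data piece by piece instead of discovering a bad file only at the end.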

Page 21: Bittorrent: The protocol, its background and uses

BitTorrent - Trackers

Centralised: All clients go to one server

The BitTorrent Solution: customers help distribute content

Their contribution grows at the same rate as their demand, creating limitless scalability for a fixed cost.

Tracker maintains the process

Page 22: Bittorrent: The protocol, its background and uses

Tracker Scenario

[Figure: the Seed, the Tracker and three clients (BT 1, BT 2 and BT 3). Arrow labels:]
• Step 1 - Pieces 1, 2 and 3
• Step 2 - Pieces 4, 5 and 6
• Step 2 - Piece 1
• Step 2 - Piece 2
• Step 2 - Piece 3
• Update !

Page 23: Bittorrent: The protocol, its background and uses

Tracker GET Request

Peer -> Tracker

• info_hash - 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file.
• peer_id - string of length 20 containing the ID of the downloader - generated at random at the start of a new download.
• ip - IP (or DNS name) of the peer.
• port - port number for the peer - tries port 6881 and, if that port is taken, tries 6882, then 6883, etc., and gives up after 6889.
• uploaded - total amount uploaded so far.
• downloaded - total amount downloaded so far.
• left - number of bytes this peer still has to download.
• event - optional key which maps to started, completed, or stopped (or empty, which is the same as not being present).
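A request with these parameters might be assembled as below. This is a sketch under stated assumptions: the function name, the tracker URL and the fixed example peer ID are hypothetical, the `ip` parameter is left out so that the tracker uses the connection's source address, and no request is actually sent.

```python
import hashlib
import os
from urllib.parse import urlencode


def announce_url(tracker_url, bencoded_info, left, port=6881,
                 uploaded=0, downloaded=0, event="started", peer_id=None):
    """Build the tracker GET URL from the parameters listed above."""
    params = {
        # 20-byte SHA1 of the bencoded info dictionary, percent-encoded by urlencode.
        "info_hash": hashlib.sha1(bencoded_info).digest(),
        # 20 random bytes unless the caller supplies its own ID.
        "peer_id": peer_id or os.urandom(20),
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if event:  # an empty event is the same as leaving the key out
        params["event"] = event
    return tracker_url + "?" + urlencode(params)


# Hypothetical values: a tiny bencoded info dictionary and a made-up tracker.
url = announce_url(
    "http://tracker.example.com/announce",
    b"d4:name4:spame",
    left=1000,
    peer_id=b"-XX0001-123456789012",
)
print(url)
```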

Page 24: Bittorrent: The protocol, its background and uses

Tracker Response

• Tracker -> peer
• Tracker responses are bencoded dictionaries.
• If a tracker response has a key failure reason, then that maps to a human readable string which explains why the query failed, and no other keys are required.
• Otherwise, it must have two keys:
  • interval, which maps to the number of seconds the downloader should wait between regular rerequests
  • peers, which maps to a list of dictionaries corresponding to peers, each of which contains the keys peer id, ip, and port, which map to the peer's self-selected ID, IP address or DNS name as a string, and port number, respectively.
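Handling such a response could look like the following sketch. It assumes the response has already been bdecoded into a Python dictionary with `bytes` keys and values; the function name and the sample peer are hypothetical.

```python
def read_tracker_response(response):
    """Interpret an already-bdecoded tracker response dictionary.

    Returns (interval_seconds, [(ip, port, peer_id), ...]) or raises
    RuntimeError carrying the tracker's failure reason.
    """
    if b"failure reason" in response:
        reason = response[b"failure reason"].decode("utf-8", "replace")
        raise RuntimeError("tracker refused the request: " + reason)
    interval = response[b"interval"]
    peers = [
        (peer[b"ip"].decode("ascii"), peer[b"port"], peer[b"peer id"])
        for peer in response[b"peers"]
    ]
    return interval, peers


# Hypothetical decoded response listing a single peer.
sample = {
    b"interval": 1800,
    b"peers": [
        {b"peer id": b"-XX0001-abcdefghijkl", b"ip": b"192.0.2.10", b"port": 6881},
    ],
}
interval, peers = read_tracker_response(sample)
print(interval, peers)
```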

Page 25: Bittorrent: The protocol, its background and uses

Scenario

[Figure: a Web Server hosting a web page with a link to the .torrent, a Tracker, the Downloader (“US”) and three peers A, B and C (one Seed and two Leeches). Arrow label: .torrent]

Page 26: Bittorrent: The protocol, its background and uses

Scenario

[Figure: same layout (Web Server with a web page linking to the .torrent, Tracker, Downloader “US”, peers A, B and C: one Seed and two Leeches). Arrow label: Get-announce]

Page 27: Bittorrent: The protocol, its background and uses

Scenario

[Figure: same layout (Web Server with a web page linking to the .torrent, Tracker, Downloader “US”, peers A, B and C: one Seed and two Leeches). Arrow label: Response-peer list]

Page 28: Bittorrent: The protocol, its background and uses

Scenario

[Figure: same layout (Web Server with a web page linking to the .torrent, Tracker, Downloader “US”, peers A, B and C: one Seed and two Leeches). Arrow labels: Shake-hand, Shake-hand]

Page 29: Bittorrent: The protocol, its background and uses

Scenario

[Figure: same layout (Web Server with a web page linking to the .torrent, Tracker, Downloader “US”, peers A, B and C: one Seed and two Leeches). Arrow labels: pieces, pieces]

Page 30: Bittorrent: The protocol, its background and uses

Scenario

[Figure: same layout (Web Server with a web page linking to the .torrent, Tracker, Downloader “US”, peers A, B and C: one Seed and two Leeches). Arrow labels: pieces, pieces, pieces]

Page 31: Bittorrent: The protocol, its background and uses

Scenario

[Figure: same layout (Web Server with a web page linking to the .torrent, Tracker, Downloader “US”, peers A, B and C: one Seed and two Leeches). Arrow labels: Get-announce, Response-peer list, pieces, pieces, pieces]

Page 32: Bittorrent: The protocol, its background and uses

Strengths

• Better bandwidth utilization
• Never-before-seen speeds
  • Up to 7 MB/s from the Internet.
• Limit free riding – tit-for-tat
• Limit leech attack – coupling upload & download
• Spurious files not propagated
• Ability to resume a download
• Open Source implementations !

Page 33: Bittorrent: The protocol, its background and uses

Potential Drawbacks

• Small files – latency, overhead
• Scalability
  • Millions of peers – Tracker behavior (uses 1/1000 of bandwidth)
  • Single point of failure - although there can be many trackers, there is only one tracker assigned to each torrent file
  • Difficult to load balance
  • Solved later by having lists of alternative trackers
• Robustness
  • System progress dependent on altruistic nature of seeds (and peers)
  • Malicious attacks and leeches.

Page 34: Bittorrent: The protocol, its background and uses

Who Uses it?

• 160 million clients, 100 million active users

• According to their website, the company has announced partnerships with some 55 companies, including:

Page 35: Bittorrent: The protocol, its background and uses

Bittorrent: summary

1. BitTorrent
a) Underlying file sharing protocol
b) Role of the .torrent
c) Use and role of the tracker
d) BitTorrent Scenario
e) How file swarming works