distributed algorithms. distributed computing key idea –buying 1000 machines of speed x is...
Post on 22-Dec-2015
- Slide 1
- Distributed Algorithms
- Slide 2
- Distributed computing
  - Key idea
    - Buying 1000 machines of speed x is significantly cheaper than buying one machine of speed 1000x
    - No one person has to buy all 1000 machines: a lot of computational, communication, and storage resources are already in place and can be harvested for bigger things
  - Key challenge
    - Making the machines work together for an effective speedup; communication between machines is the main obstacle
  - Approaches
    - Find problems that can be distributed easily
- Slide 3
- Distributed problems
  - Problems that can use decentralized computing
    - Weather prediction: weather in a location is most affected by weather nearby
    - Movie generation: individual frames can be generated separately
    - Google search engine: 10,000s of PCs, all of them cheap, many of them identical; can answer over 100,000,000 queries per day in sec or less each
    - Looking for the origin of the universe: can be localized like weather prediction
    - File swapping and access (distributed storage)
    - Looking for extraterrestrial intelligence
    - Content caching and distribution
- Slide 4
- Distributed computers
  - Scales of distributed computing
    - Cluster in a room: hundreds of machines, all dedicated to the task
    - PCs on a campus: thousands of machines, using spare cycles
    - SETI cluster: millions of machines, screen saver situation
- Slide 5
- Cluster in a Room
  - Machines are dedicated to the network
  - All machines run similar software
  - Problem is divided into pieces
  - Each piece is assigned to a machine in the cluster
  - Problem pieces should be loosely linked
  - Computation is faster than communication
- Slide 6
- PCs on a Campus
  - Loosely coupled on a local-area network
  - PCs do other things some of the time
  - When free cycles are available, they're used
  - Many more machines, but less of each machine available
- Slide 7
- Workstation Network at Google
  - Front end: 100 machines called www.google.com
  - Searching machines
  - Retrieving machines
  - Fit 40-80 machines in a 7x2x3 rack
- Slide 8
- SETI
  - Telescope at Arecibo, PR collects data
  - Data is processed in real time by fast machines
  - But no one looks for weak signals: too costly
  - SETI@Home project built to do this
- Slide 9
- SETI@Home
  - Receive data from Arecibo: 35 Gbytes per day by snail mail
  - Break into Work Units (WUs): 0.25 Mbyte each, so 140,000 WUs per day
  - A WU takes 20 hours to process
  - Need about 117,000 dedicated machines to process one day of data
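The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the slide; treating 1 Gbyte as 1000 Mbyte is an assumption, chosen because it reproduces the slide's 140,000 work units.

```python
# Figures from the slide; decimal units (1 Gbyte = 1000 Mbyte) are an assumption.
DATA_PER_DAY_MB = 35 * 1000   # 35 Gbytes received per day
WORK_UNIT_MB = 0.25           # size of one work unit (WU)
HOURS_PER_WU = 20             # time for one machine to process one WU
HOURS_PER_DAY = 24


def work_units_per_day() -> float:
    """Number of work units produced by one day of telescope data."""
    return DATA_PER_DAY_MB / WORK_UNIT_MB


def machines_needed() -> float:
    """Dedicated machines required to finish one day of data in one day."""
    machine_hours = work_units_per_day() * HOURS_PER_WU
    return machine_hours / HOURS_PER_DAY


if __name__ == "__main__":
    print(f"work units per day: {work_units_per_day():,.0f}")
    print(f"dedicated machines needed: {machines_needed():,.0f}")
```

The result, roughly 116,667 machines, matches the slide's "about 117,000".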
- Slide 10
- SETI@Home
  - Get individual users to download software
  - When the machine is idle, the screen saver runs the software
    - Download WU
    - Compute
    - When finished, send back result
  - Database at Berkeley reassembles results
  - Progress to date: Seti@HomeStats
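The download, compute, and send-back cycle can be sketched as a simple loop. Everything here is a hypothetical illustration, not the SETI@Home client: `WorkServer`, `analyze`, and `run_when_idle` are invented names, the server is an in-memory stand-in for the Berkeley database, and the "analysis" just reports the strongest sample.

```python
from collections import deque


class WorkServer:
    """Hypothetical in-memory stand-in for the central database."""

    def __init__(self, work_units):
        # Each pending entry is (work unit id, list of samples).
        self.pending = deque(enumerate(work_units))
        self.results = {}

    def download_wu(self):
        """Hand out the next work unit, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def upload_result(self, wu_id, result):
        """Store a finished result under its work unit id."""
        self.results[wu_id] = result


def analyze(samples):
    # Placeholder computation (assumption): report the strongest sample.
    return max(samples)


def run_when_idle(server, is_idle=lambda: True):
    """Process work units for as long as the machine stays idle."""
    while is_idle():
        wu = server.download_wu()
        if wu is None:
            break
        wu_id, samples = wu
        server.upload_result(wu_id, analyze(samples))


server = WorkServer([[1, 5, 2], [7, 3], [4]])
run_when_idle(server)
# The server reassembles results in work unit order.
reassembled = [server.results[i] for i in sorted(server.results)]
```

Because each work unit is independent, any number of volunteers could run the same loop against the same server without coordinating with each other.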
- Slide 11
- Medical/Biological Applications
  - Peer-to-Peer Medicine
  - Cancer Research
- Slide 12
- Distributed databases
  - Data spread across machines in different ways
    - Web pages, e.g. HTTP
    - MP3 collections, e.g. Napster, Gnutella, Morpheus, Kazaa, Music City
    - Auction items, e.g. eBay
- Slide 13
- Client-Server Model
  - Central server
  - Clients store and retrieve data from server
    - File manager
    - HTTP
- Slide 14
- Napster Model
  - Server is only used to find who has the file
  - Communication is peer-to-peer (P2P)
    - Client-to-client transfer without a real server
- Slide 15
- How Napster works
  - Initial registration: name, password, local directory for files
  - When a client connects to the Napster server
    - Client provides a list of files it will share
    - Napster updates its central index of available files
  - When a client asks for a file
    - Napster gives the client a list of online clients with that file
- Slide 16
- How Napster works (contd.)
  - When a client asks for a download from a given supplier
    - Napster asks the supplier to accept a request
    - Napster tells the client how to contact the supplier
    - Client opens a port and fetches the file from the supplier
    - Supplier and client report progress/completion status to Napster
  - Napster server directory is continually updated
  - Client ranks potential servers by bandwidth and latency
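The central index described on these two slides can be sketched as a pair of dictionaries. This is a minimal illustration under stated assumptions, not Napster's actual protocol: `CentralIndex`, `rank_suppliers`, and the `(bandwidth, latency)` measurements are hypothetical, and the file transfer itself is left out because it happens between the peers.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: knows who shares what, stores no files."""

    def __init__(self):
        self.files_by_client = {}
        self.clients_by_file = defaultdict(set)

    def connect(self, client, shared_files):
        """Register the list of files a client shares when it comes online."""
        self.disconnect(client)  # drop any stale entries first
        self.files_by_client[client] = set(shared_files)
        for name in shared_files:
            self.clients_by_file[name].add(client)

    def disconnect(self, client):
        """Remove a client and its files from the index."""
        for name in self.files_by_client.pop(client, set()):
            self.clients_by_file[name].discard(client)

    def lookup(self, name):
        """Return the online clients that share the named file."""
        return sorted(self.clients_by_file.get(name, set()))


def rank_suppliers(suppliers, stats):
    """Order suppliers by highest bandwidth, then lowest latency.

    stats maps client -> (bandwidth, latency); both values are assumed
    measurements supplied by the caller.
    """
    return sorted(suppliers, key=lambda c: (-stats[c][0], stats[c][1]))


index = CentralIndex()
index.connect("alice", ["song.mp3", "talk.mp3"])
index.connect("bob", ["song.mp3"])
suppliers = index.lookup("song.mp3")
best_first = rank_suppliers(suppliers, {"alice": (56, 120), "bob": (512, 40)})
```

The server only answers "who has this file"; the requesting client then contacts the top-ranked supplier directly.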
- Slide 17
- Napster Model
  - Server is only used to make connections
  - Communication is peer-to-peer (P2P)
    - Client-to-client transfer without a real server
  - Can we do this without a central server?
- Slide 18
- Gnutella
  - Gnutella design has no central server
  - Every machine is both client and server (called a servant in Gnutella)
  - To connect, you need to know any one machine already on the Gnutella network
- Slide 19
- How Gnutella works
  - To connect to the network, you only need to know of any servant that is already connected.
  - Your servant announces your presence to all of the servants it is already connected to, and so on until the message propagates throughout the entire network.
  - Each of these servants then responds to this message with a bit of information about itself: how many files it is sharing, how many KBs of space they take up, etc.
  - By connecting, you immediately know how much is available on the network to search through.
- Slide 20
- How Gnutella works - II
  - To search
    - You send out a search request, it is propagated through the network, and each servant that has matching terms passes back its result set
    - Each servant handles the search query in its own way
    - To save on bandwidth, a servant does not have to respond to a query if it has no matching items
    - The servant also has the option of returning only a limited result set
- Slide 21
- How Gnutella works - III
  - For file sharing, each servant acts as a miniature HTTP web server.
  - To prevent searches from going on forever
    - Gnutella messages have a TTL (Time To Live)
    - The TTL starts off at some low number, like 5
    - Each time a packet is routed through a servant, the servant lowers the TTL by 1
    - Once the TTL hits 0, the packet is no longer forwarded
    - Keeps messages from circling the network forever
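The TTL rule can be sketched as a bounded flood over a graph of servants. This is an illustrative model, not the Gnutella wire protocol: the `flood` function and the adjacency-list `graph` are hypothetical, and the `seen` set that suppresses duplicate deliveries is an added assumption to keep the example small.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return the set of servants reached by a message sent from origin.

    graph maps each servant to the list of servants it is connected to.
    """
    seen = {origin}                 # assumption: duplicates are dropped
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue                # TTL hit 0: do not forward any further
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                # Routing through one more servant lowers the TTL by 1.
                queue.append((neighbor, remaining - 1))
    return seen


# A chain of five servants: a - b - c - d - e
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
reached = flood(chain, "a", ttl=2)
```

With a TTL of 2 the message from `a` reaches only `b` and `c`; servants further away never see it, which is the price paid for keeping messages from circulating forever.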
- Slide 22
- What is KaZaA (now FastTrack)
  - KaZaA Media Desktop: 17 million downloads of KaZaA by April 2002, and 2nd place on C|Net Download.com.
  - Why don't you join the world's largest online media community right now. 91% of users are recommending it. What are you waiting for? Download it now - it's free.
  - KaZaA Media Desktop is a full-featured peer-to-peer file sharing application. You can search, download, organise and play your media files (audio, video, images and documents) with it.
  - It has a powerful search engine where you can search on 'meta data' such as categories, artist etc.
- Slide 23
- Users and Usage
  - 60M users of file-sharing in US
    - 8.5M logged in at a given time on average
  - 814M units of music sold in US last year
    - 140M digital tracks sold by music companies
  - As of Nov, 35% of all Internet traffic was for BitTorrent, a single file-sharing system
  - Major legal battles underway between recording industry and file-sharing companies
- Slide 24
- Share of Internet Traffic
- Slide 25
- Traffic on a College Network Chronicle of Higher Education 9/28/01
- Slide 26
- Types of File-Sharing Traffic
- Slide 27
- Chronicle of Higher Education 9/28/01
- Slide 28
- Number of Users
  - Others include BitTorrent, eDonkey, iMesh, Overnet, Gnutella
  - BitTorrent (and others) gains share from FastTrack. Why?
- Slide 29
- What about:
  - Copyrights
    - Who owns this stuff?
    - Why is this not like going to a store and stealing?
  - Breaches of trust
    - Try to download Friends but get pornography
    - Can download serious viruses
  - PU Bandwidth
    - This detracts from real work of the university?
- Slide 30
- Is it Legal?
  - Legal Opinion: Supreme Court hearing case as we speak
  - Public Opinion:
- Slide 31
- What's next
  - Problems for which there are no algorithms
  - Problems for which all algorithms run slowly
  - Applications of problems where algorithms run slowly
- Slide 32
- Hard problems
- Slide 33
- The big picture
  - We built a computer
  - We built an operating system
  - We attached the computer to a network
  - We wrote programs
  - We designed algorithms
  - We looked at distributed algorithms and systems
  - Next, we want to see if there are limitations
    - Mathematical proofs that things cannot be done?
- Slide 34
- Simple unsolvable problem
  - Consider the sentence: "This sentence is false."
  - Problem: Can you say, correctly, if the sentence is true or false?
- Slide 35
- Limitations of Computers?
  - Possible impossible tasks for computers?
    - Emotion
    - Creative thought
    - Beating the world chess champion?
      - Deep Blue v. Kasparov: who won the 6-game rematch in 1997?
    - Others?
- Slide 36
- Limitations of Computers?
  - How to show that such tasks are impossible?
    - Find a cleanly defined problem
    - Prove a theorem that says it can't be solved
    - Theorem is proved using axioms (like logic gates) that are assembled (like state machines and computers) to justify the result
  - The underlying field is called mathematical logic
- Slide 37
- Simple task: The Hello Assignment
  - You write a computer program which outputs Hello! and stops.
  - Assignment promises full credit for ANY program that outputs Hello! and then halts, and no credit otherwise.
  - Assignment is to be done on a computer that is as fast as you like and has as much memory as you like.
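One program that would earn full credit under these rules is sketched below. The choice of Python and the function name `main` are assumptions for illustration; the assignment as stated does not name a language.

```python
def main() -> None:
    # Output Hello! and then return, so the program halts.
    print("Hello!")


if __name__ == "__main__":
    main()
```

Checking this particular submission is easy; the assignment's promise covers ANY program, including ones whose behavior is far less obvious.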
- Slide 38
- The Hel