CS 138: Distributed Systems
cs.brown.edu/courses/cs138/s18/lectures/l1_intro.pdf
CS138: Distributed Systems
Staff
• Faculty
– Theophilus Benson (AKA Theo)
• Head TAs
– Siddarth Karamcheti
• UTAs
– Amedeo Felice Alberio
– Abdulla Aldilaijan
– Christopher Harvie
– Benjamin Shteinfeld
– Jared Siskin
Course Details
• Course Policies
• Class Website: http://cs.brown.edu/courses/cs138/s18/index.html
• Piazza: https://piazza.com/brown/spring2018/cs1380
• Cheating/Collaboration
• Waitlist!!!!
• Registered: ~69
• Waitlisted: ~108
• Attend class!
• If class is required for graduation (2018), have your academic advisor email us.
Workload
• Four programs (50%)
– LiteMiner (10%) NEW!!!!!!
– Tapestry (13%)
– Raft (13%)
– PuddleStore (13%)
• Four written homeworks (15%)
• One in-class midterm exam (15%)
• Final exam (20%)
• See http://cs.brown.edu/courses/cs138/s18/content/docs/syllabus.pdf
Skills Needed
• Ability to write and debug largish programs with threads
– CS32 or 33
• Ability to prove a theorem
– there won’t be many
– CS22 is helpful
• Willingness to learn a new programming language
– Go
Textbook
What Is A Distributed System?
“A collection of independent computers that appears to its users as a single coherent system.”
• Ideal: to present a single-system image:
– The distributed system “looks like” a single computer rather than a collection of separate computers.
Distributed Systems are Hard to Build
• Must deal with failures
• Coordination between nodes
• Consistency and persistence
• Concurrency
[Figure: OFC Bug History (Google)]
Why Do You Need A Distributed System?
• Fault tolerance
• Scalability
• Performance
• Resource sharing
• Information gathering
You would design a distributed system when…
• Interested in getting fast and efficient deep learning frameworks
https://training.databricks.com/databricks_guide/gentle_introduction/spark_cluster_tasks.png
You would design a distributed system when…
• Efficiently managing and coordinating bitcoin miners
https://medium.com/@lopp/the-future-of-bitcoin-mining-ac9c3dc39c60
You would design a distributed system when…
• Designing successful large online services
http://www.ageofinnovation.org/gallery/the-benefits-of-using-online-services-pictures/The-benefits-of-using-online-services.jpg
You would design a distributed system when…
• Designing IoT platforms
https://training.databricks.com/databricks_guide/gentle_introduction/spark_cluster_tasks.png
You would design a distributed system when…
• Designing self-driving cars
https://training.databricks.com/databricks_guide/gentle_introduction/spark_cluster_tasks.png
You would design a distributed system when…
• Designing trading platforms
https://3nlm2c1gjj0z2ju16293909h-wpengine.netdna-ssl.com/wp-content/uploads/2017/01/best-stock-trading-software-mac-
Why Do You Need A Distributed System?
• Building a cloud
• Scaling out the next AirBnB
• Designing a self-driving car
• Interested in getting fast and efficient deep learning frameworks
• Developing the next game framework
• Designing a trading platform for a hedge fund or investment bank
Case Study: Facebook
[Figure labels: CA, VA]
• Circa 2007, Facebook decided to add a second datacenter to its operations
Facebook Database Replication
https://www.facebook.com/notes/facebook-engineering/scaling-out/23844338919
Why?
• Major reason: latency
– can’t go faster than the speed of light yet
• Other reasons
– scale: need to handle rapidly increasing loads
– resiliency: what if an earthquake hits CA?
– power: sometimes availability of power limits the size of a datacenter!
Caching objects
• Facebook handles reads via memcached
Caching objects
• Cache invalidated on a new write
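The two caching slides describe reads served from a cache and a cache entry invalidated on a new write. The following is a minimal single-process sketch of that pattern, not Facebook's implementation: the `cachedStore` type, its fields, and the use of in-memory maps as stand-ins for the database and for memcached are all assumptions made for illustration.

```go
package main

import "fmt"

// cachedStore is a hypothetical stand-in: db plays the role of the
// database and cache plays the role of memcached.
type cachedStore struct {
	db      map[string]string
	cache   map[string]string
	dbReads int // number of reads that had to go to the database
}

func newCachedStore() *cachedStore {
	return &cachedStore{db: map[string]string{}, cache: map[string]string{}}
}

// read serves the value from the cache when possible and otherwise
// fetches it from the database and fills the cache.
func (s *cachedStore) read(key string) string {
	if v, ok := s.cache[key]; ok {
		return v
	}
	s.dbReads++
	v := s.db[key]
	s.cache[key] = v
	return v
}

// write updates the database and invalidates the cached copy, so the
// next read of this key goes back to the database.
func (s *cachedStore) write(key, value string) {
	s.db[key] = value
	delete(s.cache, key)
}

func main() {
	s := newCachedStore()
	s.write("status", "old")
	v := s.read("status") // miss: goes to the database
	fmt.Println(v, s.dbReads)
	v = s.read("status") // hit: served from the cache
	fmt.Println(v, s.dbReads)
	s.write("status", "new") // invalidates the cached entry
	v = s.read("status")     // miss again: sees the new value
	fmt.Println(v, s.dbReads)
}
```

With a single database and a single cache this is straightforward. The sketch deliberately ignores what happens once the database is replicated across datacenters.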
Adding a new Datacenter
• Initial design had a bug
Adding a new Datacenter
Introduction to Go
Where is Go used?
● Google, of course!
● Docker (Container management)
● CloudFlare (Content Delivery Network)
● Digital Ocean (VM hosting)
● Dropbox (Cloud storage/file sharing)
● … and many more!
Why use Go?
● Easy concurrency w/ goroutines (green threads)
● Garbage collection and memory safety
● Libraries provide easy RPC
● Channels for communication between goroutines
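As a small illustration of the goroutine and channel bullets above, the sketch below starts one goroutine per input value and collects the results over an unbuffered channel. The function name `sumSquares` is made up for this example.

```go
package main

import "fmt"

// sumSquares starts one goroutine per input value and collects the
// squared results over a channel.
func sumSquares(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(n int) {
			results <- n * n // send blocks until the collector receives
		}(n)
	}
	total := 0
	for range nums {
		total += <-results // receive exactly one result per goroutine
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

The results may arrive in any order. Addition is commutative, so the total does not depend on the scheduling.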
Example: Simple Program
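package main

import (
	"fmt"
	"os"
)

func main() {
	// Even numbers go to standard output, odd numbers to standard error.
	for count := 1; count < 100; count++ {
		if count%2 == 0 {
			fmt.Printf("Found even number: %v\n", count)
		} else {
			fmt.Fprintf(os.Stderr, "Not an even number: %v\n", count)
		}
	}
}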
● No parentheses around the for and if conditions
● “for { }” will loop forever
● “for condition { }” omits the initialization and afterthought clauses, similar to a while loop
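The three forms of `for` mentioned above can be seen side by side in the sketch below. The function names are invented for this example.

```go
package main

import "fmt"

// countEvens uses the three-clause form: init; condition; afterthought.
func countEvens(limit int) int {
	evens := 0
	for i := 1; i < limit; i++ {
		if i%2 == 0 {
			evens++
		}
	}
	return evens
}

// halvings uses the condition-only form, which behaves like a while loop.
func halvings(n int) int {
	steps := 0
	for n > 1 {
		n /= 2
		steps++
	}
	return steps
}

// firstMultiple uses the bare form, which loops until a return or break.
// The argument of must be positive.
func firstMultiple(of, atLeast int) int {
	n := atLeast
	for {
		if n%of == 0 {
			return n
		}
		n++
	}
}

func main() {
	fmt.Println(countEvens(100), halvings(64), firstMultiple(7, 30)) // prints 49 6 35
}
```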
Learning Go
● Project 0: Whatsup?
● Effective Go
● golang.org/doc
● tour.golang.org
● go-handout.pdf
PuddleStore
• A very distributed file system
– thousands of computers
• all over the world
– (or at least throughout the Sun Lab)
• no common administration
– each holds pieces of a few files
• pieces replicated on many computers
• Based on OceanStore
– and its Pond prototype
A File
[Figure: an Indirect Block and Data Blocks 1 to 4]
A Distributed File
[Figure: multiple copies of the Indirect Block and of the Data Blocks, spread across computers]
How Do You Find the Pieces?
• Hashing
• But...
– nodes may crash
• duplicates are required
• how do you find them?
– must provide good performance
• caching may be necessary
• pieces may have to be reassigned to other locations
Making It Work (sort of…)
• Assign each block a unique n-bit ID
– crypto hash of its contents
• Assign each computer a unique n-bit ID
• Store block at computer that has closest ID
• Route requests for that block to that computer
[Figure: computers with IDs 0x2a74ca56, 0x9da6f453, 0x529e02f8, 0xd53b7621; Data Block 1 has ID 0x87a6df52; a client says “I want Block 1”]
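The placement rule on this slide can be sketched in a few lines of Go. Several things here are assumptions, not part of the slides: the ID width is fixed at 32 bits to match the figure, SHA-256 stands in for the unspecified crypto hash, and "closest" is taken to mean the smallest absolute numeric difference. Tapestry, used later in the course, may define closeness differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// blockID derives a 32-bit ID from a block's contents by truncating a
// SHA-256 digest. Both the hash and the width are assumptions.
func blockID(contents []byte) uint32 {
	sum := sha256.Sum256(contents)
	return binary.BigEndian.Uint32(sum[:4])
}

// distance is the absolute numeric difference between two IDs
// (an assumed notion of closeness).
func distance(a, b uint32) uint32 {
	if a > b {
		return a - b
	}
	return b - a
}

// closestNode returns the node ID closest to id. nodes must be non-empty.
func closestNode(id uint32, nodes []uint32) uint32 {
	best := nodes[0]
	for _, n := range nodes[1:] {
		if distance(id, n) < distance(id, best) {
			best = n
		}
	}
	return best
}

func main() {
	// Node and block IDs taken from the figure.
	nodes := []uint32{0x2a74ca56, 0x9da6f453, 0x529e02f8, 0xd53b7621}
	fmt.Printf("block 0x87a6df52 -> node %#x\n", closestNode(0x87a6df52, nodes))
	// prints: block 0x87a6df52 -> node 0x9da6f453

	// Identical contents always map to the same ID.
	fmt.Println(blockID([]byte("hello")) == blockID([]byte("hello"))) // prints true
}
```

Because both storing and looking up a block apply the same rule to the same ID, a client that knows the block ID and the set of node IDs can work out where to send its request without any central index.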
Overlay Networks
Making It (really) Work (with high probability)
• Assign each block a unique n-bit ID
– crypto hash of its contents
• Assign each computer a unique n-bit ID
• Store multiple copies of blocks, each at a number of computers
• Store block addresses at computer that has closest ID
– addresses are cached at other nodes
• Route requests for that block to that computer
– request is redirected to nearest computer that has copy of block
[Figure: computers with IDs 0x2a74ca56, 0x9da6f453, 0x529e02f8, 0xd53b7621; copies of Data Block 1 (ID 0x87a6df52) at two computers; a record “0x87a6df52 locations: 0x2a74ca56 0xd53b7621”; a client says “I want Block 1”]
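The refinement on this slide separates where a block is stored from where its location record is kept. The sketch below models only the location record and the redirect step. The `directory` type, the `publish` and `nearestReplica` names, and the use of a latency table to decide which copy is "nearest" are all hypothetical, since the slides do not say how nearness is measured.

```go
package main

import "fmt"

// directory maps a block ID to the IDs of the computers holding a copy.
// In the scheme above, this record would live on the computer whose ID
// is closest to the block's ID.
type directory map[uint32][]uint32

// publish records that holder stores a copy of block.
func (d directory) publish(block, holder uint32) {
	d[block] = append(d[block], holder)
}

// nearestReplica picks the holder with the lowest latency to the
// requester. latencyMs is a hypothetical measurement table.
func (d directory) nearestReplica(block uint32, latencyMs map[uint32]int) (uint32, bool) {
	holders := d[block]
	if len(holders) == 0 {
		return 0, false // no known copy of this block
	}
	best := holders[0]
	for _, h := range holders[1:] {
		if latencyMs[h] < latencyMs[best] {
			best = h
		}
	}
	return best, true
}

func main() {
	// Block ID and holder IDs taken from the figure.
	d := directory{}
	d.publish(0x87a6df52, 0x2a74ca56)
	d.publish(0x87a6df52, 0xd53b7621)

	// Made-up latencies from the requesting client to each holder.
	latencyMs := map[uint32]int{0x2a74ca56: 40, 0xd53b7621: 5}
	node, ok := d.nearestReplica(0x87a6df52, latencyMs)
	fmt.Printf("%#x %v\n", node, ok) // prints 0xd53b7621 true
}
```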
Publishing
[Figure: routing diagram with elements labeled A, B, C, D, and R]
Tapestry
• Distributed object location and routing (DOLR)
– you implement it in the second programming assignment
More PuddleStore Issues
• How are files named?
– fileID = CryptoHash(filename)
• How are files updated?
– carefully…
Copy on Write (1)
[Figure: fileID, Version Node, Indirect Block with Data Blocks 1 to 4; a second Indirect Block with Modified Data Block 3 and Modified Data Block 4]
Copy on Write (2)
[Figure: fileID, Version Node, Indirect Block with Data Blocks 1 to 4; a second Indirect Block with Modified Data Block 3 and Modified Data Block 4]
More Redundancy
[Figure: fileID, Version Node, Indirect Block with Data Blocks 1 to 4; a second Indirect Block with Modified Data Block 3 and Modified Data Block 4]
Raft
• Multiple clients update file concurrently
• Each communicates with different servers
– servers propagate changes to all copies
• How do we ensure that all copies are updated in the same order?
– order matters...
• Raft
– third programming assignment
Final PuddleStore
• You put all this together
– we give you the B design
• if you implement it completely: you get a B
– if you improve it (reasonably well): you get an A (and it may count as a capstone)
• you’re encouraged to discuss your design with classmates
Top Secret
Announcements
• CS138 Social
• Where: CIT 3rd Floor Atrium
• When: 6-8pm
• What: Meet TAs
• Fill out and sign cheating policy
• TA will send this out tonight
• Project 1 out tonight
• Write a miner and mining pool manager
• Due: Feb 13