Riak Core: Building Distributed Applications Without Shared State
DESCRIPTION
Riak Core--an open-source Erlang library created by Basho Technologies that powers Riak KV and Riak Search--allows developers to build distributed, scalable, failure-tolerant applications based on a generalized version of Amazon's Dynamo architecture. In this talk, Rusty explains why Riak Core was built, discusses what problems it solves and how it works, and walks through the steps to using Riak Core in an Erlang application.
TRANSCRIPT
Rusty Klophaus (@rklophaus)
Basho Technologies
Riak Core: Building Distributed
Applications Without Shared State
Commercial Users of Functional Programming Baltimore, MD · October 2010
Wednesday, October 6, 2010
You suddenly feel an uncontrollable desire to
learn Erlang.
http://www.flickr.com/photos/procsilas/18014203
What is Riak Core?
How does it work?
How can you use it?
Distributed, scalable, failure-tolerant.
Distributed, scalable, failure-tolerant.
No central coordinator. Easy to setup/operate.
Distributed, scalable, failure-tolerant.
Horizontally scalable; add commodity hardware
to get more X.
Distributed, scalable, failure-tolerant.
Always available. No single point of failure.
Self-healing.
Basho Technologies
Riak KV
Distributed, scalable, failure-tolerant key/value datastore.
Started as a “Dynamo clone”.
Map/Reduce, Lightweight Data Relations, Client APIs
Riak Search
Distributed, scalable, failure-tolerant full-text search engine.
Near Realtime, Riak KV Integration, Solr Support
Riak KV
Riak Search
Riak Core
Riak Core is an Erlang library that helps you build
distributed, scalable, failure-tolerant applications.
Amazon Dynamo
“We Generalized the Dynamo Architecture and Open-Sourced the Bits.”
Wait, doesn’t *Erlang* let you build distributed, scalable, failure-tolerant
applications?
Client
Service A Service B
Resource D
Service C
Queue E
Erlang makes it easy to connect the components of your application.
Service
Node A Node B Node C Node D
Node E Node F Node G Node H
Node I Node J Node K Node L
Node M Node N Node O . . .
Riak Core helps you build a service that harnesses the power of many nodes.
“People use languages other than Erlang?!? I find that hilarious.”
- Al Gore actually said this to me.
http://www.flickr.com/photos/scobleizer/2216445692
How does Riak Core work?
Command ObjectName, Payload
Predictable Routing
Hash the Object Name
Command ObjectName, Payload
SHA1(ObjName), Payload
0 to 2^160
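The hashing step above can be sketched in a few lines of Erlang. This is only an illustration: the module and function names are made up for this example, and it assumes the object name is a binary. It uses `crypto:hash/2` from the standard `crypto` application to get the 160-bit SHA-1 digest and reads it as an unsigned integer.

```erlang
-module(hash_sketch).
-export([hash/1]).

%% Hash an object name (a binary) to an integer in the range 0 to 2^160 - 1.
%% hash_sketch and hash/1 are hypothetical names used for illustration only.
hash(ObjName) when is_binary(ObjName) ->
    %% SHA-1 always yields 20 bytes, i.e. 160 bits.
    <<Int:160/unsigned-integer>> = crypto:hash(sha, ObjName),
    Int.
```

The same name always hashes to the same integer, which is what makes the routing predictable.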
A Naive Approach
Command ObjectName, Payload
SHA1(ObjName), Payload
Node A Node B Node C Node D
A Naive Approach
Command ObjectName, Payload
SHA1(ObjName), Payload
Node A Node B Node C Node D Node E
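To see why the naive approach struggles when a node is added, here is a small sketch. It assumes the naive scheme is "hash modulo the number of nodes", which the slides do not spell out; the module and function names are hypothetical.

```erlang
-module(naive_sketch).
-export([node_for/2, moved/3]).

%% Naive routing (an assumed reading of the slide): hash the name,
%% then take the hash modulo the number of nodes.
node_for(ObjName, Nodes) ->
    <<Hash:160/unsigned-integer>> = crypto:hash(sha, ObjName),
    lists:nth((Hash rem length(Nodes)) + 1, Nodes).

%% Count how many names route to a different node once the node
%% list changes from Before to After.
moved(Names, Before, After) ->
    length([N || N <- Names, node_for(N, Before) =/= node_for(N, After)]).
```

For example, `naive_sketch:moved([integer_to_binary(I) || I <- lists:seq(1, 1000)], [a,b,c,d], [a,b,c,d,e])` reports how many of 1000 names change owner when a fifth node joins. Under this scheme most of them do, because almost every hash gives a different remainder for 4 and for 5.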
"All problems in computer science can be solved by
another level of indirection." - David Wheeler
Add VNodes
What
Virtual Node. Logical subdivision of the cluster.
Handles incoming commands, does work, replies.
For Parallelism
# of VNodes = maximum concurrent requests
For Rebalancing the Cluster
Smallest block that can be shifted to a new node.
For Resilience
The system restarts failed VNodes.
Routing with Consistent Hash
Command ObjectName, Payload
SHA1(ObjName), Payload
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D
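The extra level of indirection can be sketched as two separate lookups: name to vnode, then vnode to node. Everything below is an illustration under stated assumptions: the names are hypothetical, the ring is split into equal partitions, the vnode count is a power of two, and vnodes are assigned to nodes round-robin. None of this is taken from the riak_core source.

```erlang
-module(ring_sketch).
-export([vnode_for/2, claim/2, owner/2]).

-define(RING_TOP, (1 bsl 160)).

%% Which of NumVNodes equal partitions of the hash space holds this name?
%% Assumes NumVNodes is a power of two so the partitions divide evenly.
vnode_for(ObjName, NumVNodes) ->
    <<Hash:160/unsigned-integer>> = crypto:hash(sha, ObjName),
    Hash div (?RING_TOP div NumVNodes).

%% Assign vnodes to nodes round-robin; returns [{VNodeIndex, Node}].
claim(NumVNodes, Nodes) ->
    [{I, lists:nth((I rem length(Nodes)) + 1, Nodes)}
     || I <- lists:seq(0, NumVNodes - 1)].

%% Look up the node currently hosting a vnode.
owner(VNode, Ring) ->
    {VNode, Node} = lists:keyfind(VNode, 1, Ring),
    Node.
```

Note that `vnode_for/2` never looks at the node list. Adding or removing a node only changes the `claim/2` table, so a name keeps its vnode and only whole vnodes move between machines.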
Adding a Node
Command ObjectName, Payload
SHA1(ObjName), Payload
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D Node E
Removing a Node
Command ObjectName, Payload
SHA1(ObjName), Payload
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D Node E
The Ring
Hash Location
The Ring
Preflist
Writing Replicas (n_val)
Preflist when N=3
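A preflist for N replicas can be sketched as "the next N vnodes around the ring". The names below are hypothetical, and whether the list starts at the vnode that owns the hash location or at its successor is an assumption here, not something the slides state.

```erlang
-module(preflist_sketch).
-export([preflist/3]).

%% Starting at StartVNode, take N vnodes clockwise, wrapping around
%% a ring of NumVNodes vnodes numbered from 0.
preflist(StartVNode, N, NumVNodes) ->
    [(StartVNode + I) rem NumVNodes || I <- lists:seq(0, N - 1)].
```

With 8 vnodes and N=3, a hash that lands on vnode 6 would give the preflist `[6, 7, 0]`.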
Routing Around Failures
Preflist when N=3 and node 0 is down.
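One way to picture routing around a failure is to keep walking the ring until N reachable vnodes have been found. This is a rough sketch under assumptions: the names are hypothetical, "down" is modelled as a plain list of unreachable vnode indexes, and the real fallback selection in riak_core may differ.

```erlang
-module(fallback_sketch).
-export([preflist/4]).

%% Walk the ring clockwise from StartVNode, visiting each vnode once,
%% and keep the first N vnodes that are not listed in Down.
preflist(StartVNode, N, NumVNodes, Down) ->
    Walk = [(StartVNode + I) rem NumVNodes
            || I <- lists:seq(0, NumVNodes - 1)],
    Up = [V || V <- Walk, not lists:member(V, Down)],
    lists:sublist(Up, N).
```

With 8 vnodes, N=3, a start at vnode 6, and vnode 0 down, the walk skips 0 and returns `[6, 7, 1]`.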
Location of the Routing Layer
Router in the Middle
Client Client Client
Router
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D Node E
Riak Core - Router on Each Node
Client Client Client
Router Router Router Router Router
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D Node E
Eventually - Router on the Client
Client Client Client
Router Router Router
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D Node E
No Shared State
Router Router Router Router Router
VNode 0 VNode 1 VNode 2 VNode 3 VNode 4 VNode 5 VNode 6 VNode 7
Node A Node B Node C Node D Node E
Gossip
Local Ring State
Incoming Ring State
Are rings equivalent? Strictly descendant? Or different?
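The three-way question on this slide can be illustrated with a version vector attached to each ring state. The talk does not say how the comparison is made, so treat this as an assumption: the representation (a list of `{Node, Counter}` pairs) and all names below are hypothetical.

```erlang
-module(gossip_sketch).
-export([compare/2]).

%% A version vector is a list of {Node, Counter} pairs.
%% descends(A, B) is true when A has seen every update recorded in B.
descends(A, B) ->
    lists:all(fun({Node, CountB}) ->
                      proplists:get_value(Node, A, 0) >= CountB
              end, B).

%% Classify the incoming ring's version against the local one.
compare(Local, Incoming) ->
    case {descends(Local, Incoming), descends(Incoming, Local)} of
        {true, true}   -> equivalent;
        {true, false}  -> local_is_newer;
        {false, true}  -> incoming_is_newer;
        {false, false} -> different
    end.
```

Only the `different` case needs a real merge; in the other cases a node can keep its own ring or adopt the incoming one.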
Handoff
When
A node is added to the system.
A node is removed from the system.
A node has temporarily failed.
What
Ship the data backing a VNode from one node to another.
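Shipping a vnode's data can be pictured as encoding each item on the sending side and decoding it on the receiving side. The sketch below assumes the vnode keeps its data in an Erlang map and uses the built-in `term_to_binary/1` and `binary_to_term/1`. The module and function names are hypothetical, and no network transfer is shown.

```erlang
-module(handoff_sketch).
-export([encode_all/1, receive_all/2]).

%% Sender side: turn every {Key, Value} held by a vnode into a binary.
encode_all(Data) when is_map(Data) ->
    [term_to_binary(KV) || KV <- maps:to_list(Data)].

%% Receiver side: decode each binary and merge it into the local map.
receive_all(Binaries, Data) ->
    lists:foldl(fun(Bin, Acc) ->
                        {Key, Value} = binary_to_term(Bin),
                        maps:put(Key, Value, Acc)
                end, Data, Binaries).
```

A round trip such as `handoff_sketch:receive_all(handoff_sketch:encode_all(#{a => 1}), #{})` gives back the original map.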
Not Mentioned
Vector Clocks
Merkle Trees
Bloom Filters
Distinguished gentlemen
prefer Erlang.
http://www.flickr.com/photos/rebcal/3987226359
How do you use Riak Core?
Two Things to Think About
Command
Command = ObjectName, Payload
The commands/requests/operations that you will send through the system.
VNode Module
The callback module that will receive the commands.
VNode Module
Startup/Shutdown
init([Partition]) ->
    {ok, State}
terminate(State) ->
    ok
Commands
handle_command(Cmd, Sender, State) ->
    {noreply, State1} | {reply, Reply, State1}
handle_handoff_command(Cmd, Sender, State) ->
    {noreply, State1} | {reply, ok, State1}
VNode Module
Handoff Coordination
handoff_starting(Node, State) ->
    {Bool, State1}
encode_handoff_data(Data, State) ->
    <<Binary>>.
handle_handoff_data(Data, Sender, State) ->
    {reply, ok, State1}
handoff_finished(Node, State) ->
    {ok, State1}
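Putting the callbacks from these two slides together, a toy in-memory key/value vnode might look like the sketch below. It follows the signatures as shown on the slides and deliberately declares no behaviour, so it compiles without riak_core. The actual riak_core vnode behaviour may require additional callbacks or different arities. The module name, the `put`/`get` verbs, and the state shape are all assumptions made for illustration.

```erlang
-module(kv_vnode_sketch).
-export([init/1, terminate/1, handle_command/3, handle_handoff_command/3,
         handoff_starting/2, encode_handoff_data/2,
         handle_handoff_data/3, handoff_finished/2]).

%% State is {Partition, Map}; the map holds this vnode's data in memory.
init([Partition]) ->
    {ok, {Partition, #{}}}.

terminate(_State) ->
    ok.

%% Commands are {Verb, ObjName, Payload} tuples (hypothetical verbs).
handle_command({put, Name, Value}, _Sender, {P, Map}) ->
    {reply, ok, {P, maps:put(Name, Value, Map)}};
handle_command({get, Name, _Payload}, _Sender, {P, Map}) ->
    {reply, maps:find(Name, Map), {P, Map}}.

%% During handoff this sketch simply keeps serving commands locally.
handle_handoff_command(Cmd, Sender, State) ->
    handle_command(Cmd, Sender, State).

%% Agree to hand off whenever asked.
handoff_starting(_Node, State) ->
    {true, State}.

%% One {Name, Value} pair per binary.
encode_handoff_data(Data, _State) ->
    term_to_binary(Data).

handle_handoff_data(Bin, _Sender, {P, Map}) ->
    {Name, Value} = binary_to_term(Bin),
    {reply, ok, {P, maps:put(Name, Value, Map)}}.

handoff_finished(_Node, State) ->
    {ok, State}.
```

Because each vnode owns its own state and only changes it in response to commands, nothing here is shared between vnodes.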
Start the riak_core application
riak_core
riak_core_vnode_sup riak_core_handoff_* riak_core_ring_* riak_core_node_* riak_core_gossip_*
X_vnode . . . X_vnode X_vnode
application:start(riak_core).
riak_core_vnode_sup: Supervise vnode processes.
riak_core_handoff_*: Start, coordinate, and supervise handoff.
riak_core_ring_*: Maintain cluster membership information.
riak_core_node_*: Monitor node liveness, broadcast to registered modules.
riak_core_gossip_*: Send ring information to other nodes. Reconcile different views of the cluster. Rebalance cluster when nodes join or leave.
In your application...
Start the vnodes for your application.
Master = { riak_X_vnode_master,
           { riak_core_vnode_master, start_link, [riak_X_vnode] },
           permanent, 5000, worker, [riak_core_vnode_master] },
{ok, { {one_for_one, 5, 10}, [Master] } }.
Tell riak_core that your application is ready to receive requests.
riak_core:register_vnode_module(riak_X_vnode),
riak_core_node_watcher:service_up(riak_X, self())
Join to an existing node in the cluster.
riak_core_gossip:send_ring(ClusterNode, node())
Start Sending Commands
% Figure out the preflist...
{_Verb, ObjName, _Payload} = Command,
PrefList = riak_core_apl:get_apl(ObjName, NVal, riak_X),
% Send the command...
riak_core_vnode_master:command(PrefList, Command, riak_X_vnode_master)
Review
Riak Core
Open source Erlang library for building distributed, scalable, failure-tolerant applications.
Continual improvement in the coming months.
Riak KV
Key/Value datastore with map/reduce based on Riak Core.
Riak Search
Full-text, near real-time search engine based on Riak Core.
http://www.flickr.com/photos/jurvetson/469492885
At the center of the universe, beneath the bottom-most turtle,
is a cluster of Erlang nodes.
With 100% uptime.
Thanks! Questions?
Learn More
More Information: http://wiki.basho.com
Amazon’s Dynamo Paper
Get the Code
http://hg.basho.com/riak_core
Get in Touch
Twitter: @rklophaus, @basho/team
Email: [email protected]
END