dlm knowledge-sharing

DLM Introduction HA team knowledge sharing - 2016 Zhen Ren [email protected]

Upload: eric-ren

Post on 12-Apr-2017

Agenda

● What’s DLM?

● Where is DLM in HA cluster?

● How to configure DLM?

● DLM in userspace

● DLM in kernel

● Interaction between userspace and kernel

What’s DLM?

What is DLM?

The Distributed Lock Manager (DLM) coordinates access to shared resources among the nodes of a cluster.

● Node-level locking

● Not only for file locks
– Protects the whole filesystem
– Supports some filesystem semantics

Brief history

● 1982: VAX (3.0), developed by Digital Equipment Corporation (DEC)

● 1992: OpenVMS (5.1), acquired by Compaq

● 2001: Red Hat developed OpenDLM for GFS after giving up on IBM’s DLM

● 2005: David Teigland pushed DLM into the mainline kernel

● After 2012: DLM has been stable, with no heavy development since

Some highlights for DLM

● Availability: can recover from failures by keeping a duplicated cluster-wide lock database

● Performance: achieves excellent performance by increasing the likelihood of local processing

● Elimination of bottlenecks: distributes the lock server workload among all nodes in the cluster, eliminating memory, CPU and network bottlenecks

● Kernel implementation: the main users of DLM, OCFS2 and GFS2, are kernel filesystems

Where is DLM in HA cluster?

Where is DLM in HA cluster? (1/2)

● dlm_controld
– An agent daemon for the DLM kernel code: membership and user interface
– Based on the services provided by corosync, i.e. closed process group (cpg), quorum and configuration

● Resource agent
– Provided by pacemaker to control the DLM daemon

● libdlm
– Passes locking operations from ocfs2 tools, clvm, etc. to the kernel through the dlm devices

● dlm_tool
– Admin and debug

● DLM kernel module
– DLM core, used by ocfs2, gfs2, clvm and clusterMD

Where is DLM in HA cluster? (2/2)

Some Resources for DLM

● Source code
– DLM userspace code: https://git.fedorahosted.org/cgit/dlm.git
– DLM kernel module: fs/dlm/*

● RPM packages
– dlm-kmp-default-4.4.19-60.1.x86_64
– libdlm-4.0.4-15.2.x86_64
– libdlm3-4.0.4-15.2.x86_64

● DLM RA
– /usr/lib/ocf/resource.d/pacemaker/controld

How to configure DLM?

Configure DLM

0. We assume pacemaker and sbd are already properly configured.

1. Add the DLM resource
# crm configure primitive dlm ocf:pacemaker:controld \
    op start interval=0 timeout=90 \
    op stop interval=0 timeout=100 \
    op monitor interval=20 timeout=600

2. Put the DLM resource into a group
# crm configure group base-group dlm

3. Clone the group
# crm configure clone base-clone base-group

4. Check
# crm status full

DLM in userspace

DLM Daemon

DLM tools – usage overview

# dlm_tool -h
Usage:

dlm_tool [command] [options] [name]

Commands:
ls, status, dump, dump_config, fence_ack
log_plock, plocks
join, leave, lockdebug

DLM tools – Dump dlm_controld daemon state

# dlm_tool status

cluster nodeid 1084783247 quorate 1 ring seq 28 28

daemon now 348817 fence_pid 0

node 1084783118 M add 3523 rem 0 fail 0 fence 0 at 0 0

node 1084783247 M add 3523 rem 0 fail 0 fence 0 at 0 0

● seq x y

– x: cluster_ringid – ring ID from corosync quorum service

– y: daemon_ringid – to indicate if dlm_controld is in sync with quorum change

● Role

– M: member, U: starting-up node, X: others
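The status output above is line-oriented, so it is easy to post-process. The following is a minimal sketch, assuming the output always has the shape of the sample on this slide; the dictionary keys and the helper names (`parse_status`, `nodeid_to_ipv4_guess`) are made up for illustration. The second helper rests on a further assumption: that the node IDs were auto-generated by corosync from the IPv4 address with the highest bit cleared. That matches the `set_configfs_node` lines in the debug-buffer dump later in this deck, but it will not hold for clusters with manually assigned node IDs.

```python
import ipaddress

# Sample copied from the `dlm_tool status` slide.
SAMPLE = """\
cluster nodeid 1084783247 quorate 1 ring seq 28 28
daemon now 348817 fence_pid 0
node 1084783118 M add 3523 rem 0 fail 0 fence 0 at 0 0
node 1084783247 M add 3523 rem 0 fail 0 fence 0 at 0 0
"""

# Role letters as explained on the slide.
ROLES = {"M": "member", "U": "starting-up node", "X": "other"}


def parse_status(text):
    """Parse `dlm_tool status` output shaped like SAMPLE (assumed format)."""
    status = {"nodes": []}
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "cluster":
            # cluster nodeid <id> quorate <0|1> ring seq <x> <y>
            status["local_nodeid"] = int(words[2])
            status["quorate"] = words[4] == "1"
            status["cluster_ringid"] = int(words[7])
            status["daemon_ringid"] = int(words[8])
        elif words[0] == "node":
            # node <id> <role> add ... rem ... fail ... fence ... at ...
            status["nodes"].append(
                {"nodeid": int(words[1]), "role": ROLES.get(words[2], words[2])}
            )
    return status


def nodeid_to_ipv4_guess(nodeid):
    """Assumption: nodeid is an IPv4 address with bit 31 cleared; set it back."""
    return str(ipaddress.IPv4Address(nodeid | 0x80000000))


if __name__ == "__main__":
    info = parse_status(SAMPLE)
    in_sync = info["cluster_ringid"] == info["daemon_ringid"]
    print("quorate:", info["quorate"], "daemon in sync:", in_sync)
    for node in info["nodes"]:
        print(node["nodeid"], node["role"], nodeid_to_ipv4_guess(node["nodeid"]))
```

Comparing the two ring IDs is a quick way to see whether dlm_controld has caught up with the latest quorum change.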

DLM tools – List lockspace

# dlm_tool ls
dlm lockspaces
name          2E76BB09DD314581A62C032436F58344
id            0xc0dc6d2a
flags         0x00000000
change        member 1 joined 1 remove 0 failed 0 seq 1,1
Members       1084783247

● seq x,y

– x: the most recent completed change sequence number

– y: the most recent change sequence number
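The lockspace listing can be turned into structured data in the same way. This is a sketch under the assumption that `dlm_tool ls` prints one key/value pair per line, as in the sample embedded below; `parse_ls` and the result keys are illustrative names, not part of any DLM tool.

```python
# Sample laid out one field per line (assumed layout of `dlm_tool ls`).
SAMPLE = """\
dlm lockspaces
name          2E76BB09DD314581A62C032436F58344
id            0xc0dc6d2a
flags         0x00000000
change        member 1 joined 1 remove 0 failed 0 seq 1,1
Members       1084783247
"""


def parse_ls(text):
    """Return one dict per lockspace found in `dlm_tool ls` style output."""
    spaces = []
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        key = words[0].lower()
        if key == "name":
            current = {"name": words[1]}
            spaces.append(current)
        elif current is None:
            continue  # header line such as "dlm lockspaces"
        elif key in ("id", "flags"):
            current[key] = int(words[1], 16)
        elif key == "change":
            # "member 1 joined 1 remove 0 failed 0 seq x,y"
            pairs = dict(zip(words[1::2], words[2::2]))
            completed, latest = pairs.pop("seq").split(",")
            current["change"] = {k: int(v) for k, v in pairs.items()}
            current["seq_completed"] = int(completed)  # x on the slide
            current["seq_latest"] = int(latest)        # y on the slide
        elif key == "members":
            current["members"] = [int(w) for w in words[1:]]
    return spaces


if __name__ == "__main__":
    for space in parse_ls(SAMPLE):
        pending = space["seq_completed"] != space["seq_latest"]
        print(space["name"], "id", space["id"], "change pending:", pending)
```

For this sample the decimal form of the id is 3235671338, which is the value the debug-buffer dump shows being written to the lockspace's `id` file in sysfs.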

DLM tools – Dump dlm_controld debug buffer(1/3)

# dlm_tool dump
3523 dlm_controld 4.0.4 started
3523 our_nodeid 1084783247
…
3523 cmap totem.cluster_name = 'cluster'
3523 set cluster_name cluster
…
3523 cluster quorum 1 seq 28 nodes 2
3523 cluster node 1084783118 added seq 28
3523 set_configfs_node 1084783118 192.168.122.14 local 0
3523 cluster node 1084783247 added seq 28
3523 set_configfs_node 1084783247 192.168.122.143 local 1
3523 cpg_join dlm:controld …...
3523 daemon joined 1084783247
…
3523 daemon joined 1084783118
…

DLM tools – Dump dlm_controld debug buffer(2/3)

342259 uevent: add@/kernel/dlm/2E76BB09DD314581A62C032436F58344

342259 kernel: add@ 2E76BB09DD314581A62C032436F58344

342259 uevent: online@/kernel/dlm/2E76BB09DD314581A62C032436F58344

342259 kernel: online@ 2E76BB09DD314581A62C032436F58344

342259 2E76BB09DD314581A62C032436F58344 cpg_join dlm:ls:2E76BB09DD314581A62C032436F58344 ...

342259 2E76BB09DD314581A62C032436F58344 start_kernel cg 1 member_count 1

342259 write "3235671338" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/id"

342259 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/2E76BB09DD314581A62C032436F58344/nodes/1084783247"

342259 write "1" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"

342259 write "0" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/event_done"

DLM tools – Dump dlm_controld debug buffer(3/3)

344456 2E76BB09DD314581A62C032436F58344 add_change cg 2 counts member 2 joined 1 remove 0 failed 0
344456 2E76BB09DD314581A62C032436F58344 stop_kernel cg 2
344456 write "0" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"
…
344456 2E76BB09DD314581A62C032436F58344 check_fencing done
...
344456 2E76BB09DD314581A62C032436F58344 match_change 1084783118:1 matches cg 2
344456 2E76BB09DD314581A62C032436F58344 wait_messages cg 2 need 1 of 2
344456 2E76BB09DD314581A62C032436F58344 receive_start 1084783247:2 len 80
344456 2E76BB09DD314581A62C032436F58344 match_change 1084783247:2 matches cg 2
344456 2E76BB09DD314581A62C032436F58344 wait_messages cg 2 got all 2
344456 2E76BB09DD314581A62C032436F58344 start_kernel cg 2 member_count 2
344456 dir_member 1084783247
344456 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/2E76BB09DD314581A62C032436F58344/nodes/1084783118"
344456 write "1" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"

DLM tools – display of locks from the lockspace(1/3)

# dlm_tool lockdebug 2E76BB09DD314581A62C032436F58344

Resource len 12 "version_lock"
Master
LVB len 64 seq 1
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Granted
00000001 PR
…
Resource len 31 "P000000000000000000000000000000"
Master
Granted
00000007 EX

DLM tools – display of locks from the lockspace(2/3)

● On node 1084783247:

Resource len 31 "M0000000000000000000005b1a11150"
Master
LVB len 64 seq 9
05 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
15 f8 96 97 92 f1 32 7b 15 f8 96 95 9a f4 7e ff
15 f8 96 95 9a f4 7e ff 00 00 00 00 00 00 0f 38
41 ed 00 03 00 00 00 00 b1 a1 11 50 00 00 00 00
Granted
00000011 EX Remote: 1084783118 00000005
00000006 NL

DLM tools – display of locks from the lockspace(3/3)

● On node 1084783118

Resource len 31 "M0000000000000000000005b1a11150"

Master:1084783247

Granted

00000005 EX Master: 1084783247 00000011
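The 31-character resource names in these dumps appear to be OCFS2 lock names. A hedged sketch of how they can be taken apart, assuming the layout is one type character, six `0` padding characters, a 16-digit hexadecimal block number and an 8-digit hexadecimal generation (this layout is an assumption about OCFS2, not something stated in the deck); the function names are hypothetical.

```python
PAD = "000000"  # assumed fixed padding after the type character


def encode_lock_name(lock_type, blkno, generation):
    """Build a 31-character name: type + pad + blkno (16 hex) + generation (8 hex)."""
    return "%s%s%016x%08x" % (lock_type, PAD, blkno, generation)


def decode_lock_name(name):
    """Split a 31-character name into (type, block number, generation)."""
    if len(name) != 31:
        raise ValueError("expected a 31-character lock name")
    if name[1:7] != PAD:
        raise ValueError("unexpected padding in lock name")
    blkno = int(name[7:23], 16)        # 16 hex digits
    generation = int(name[23:31], 16)  # 8 hex digits
    return name[0], blkno, generation


if __name__ == "__main__":
    # Rebuild the "M...5b1a11150" name shown in the dumps, then decode it again.
    name = encode_lock_name("M", 5, 0xB1A11150)
    print(name, decode_lock_name(name))
```

Under this reading the resource on these two slides is a type `M` lock for block 5, and the generation bytes `b1 a1 11 50` also appear near the end of its LVB on the master node.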

DLM library

The DLM library (libdlm) is mainly used by ocfs2-tools, gfs2-utils and clvm.

[1] Programming Locking Applications:

http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf

DLM in kernel

core concepts (1/2)

● Lockspace
– Independent namespaces for different DLM service instances

● Resource block (RSB)
– Represents a shared resource
– Master RSB and copy RSB

● Resource directory
– Stores the mapping of resource names to master node ID

● Lock block (LKB)
– Represents a lock request
– LKB mode: NL, PR and EX
– Lock queues: granted queue, convert queue and waiting queue

● (Blocking) asynchronous system trap (AST/BAST)
– Supports non-blocking (asynchronous) locking
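How these concepts fit together can be shown with a toy model. The sketch below is not the kernel implementation: it models a single master resource with only the three modes listed above, a granted queue and a waiting queue (no convert queue, no remote copies), and plain Python callbacks standing in for AST and BAST. All class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def compatible(held: str, requested: str) -> bool:
    """NL conflicts with nothing, PR shares only with PR, EX shares with neither."""
    if "NL" in (held, requested):
        return True
    return held == requested == "PR"


@dataclass(eq=False)
class Lock:
    owner: str
    mode: str
    ast: Callable                    # completion callback: request was granted
    bast: Optional[Callable] = None  # blocking callback: this lock blocks a request


class Resource:
    """Toy stand-in for a master RSB with a granted and a waiting queue."""

    def __init__(self, name):
        self.name = name
        self.granted = []
        self.waiting = []

    def _fits(self, mode):
        return all(compatible(held.mode, mode) for held in self.granted)

    def request(self, owner, mode, ast, bast=None):
        lock = Lock(owner, mode, ast, bast)
        if not self.waiting and self._fits(mode):
            self.granted.append(lock)
            lock.ast(lock)  # granted immediately
        else:
            self.waiting.append(lock)
            # Tell every conflicting holder that somebody is waiting.
            for holder in self.granted:
                if holder.bast and not compatible(holder.mode, mode):
                    holder.bast(holder, mode)
        return lock

    def unlock(self, lock):
        self.granted.remove(lock)
        # Grant waiters in FIFO order for as long as the head of the queue fits.
        while self.waiting and self._fits(self.waiting[0].mode):
            nxt = self.waiting.pop(0)
            self.granted.append(nxt)
            nxt.ast(nxt)


if __name__ == "__main__":
    res = Resource("demo")
    a = res.request("node-a", "EX",
                    ast=lambda l: print("granted to", l.owner),
                    bast=lambda l, m: print(l.owner, "blocks a", m, "request"))
    res.request("node-b", "PR", ast=lambda l: print("granted to", l.owner))
    res.unlock(a)  # node-b's PR request is granted now
```

The caller never blocks in `request`: it learns about the grant through the completion callback, and a holder learns through the blocking callback that it stands in someone's way.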

core concepts (2/2)

● Lock status block (LKSB)
– Used by a DLM client to exchange lock status with DLM

● Lock value block (LVB)
– A 32- or 64-byte memory block
– Always kept up to date among lock holders

Resource Block

AST/BAST

LVB (1/2)

LVB (2/2)

Node  Operation type   Mode          Copying
1     initial request  any           A → B
2     initial request  any           A → C, C → E, E → D
1     conversion       NL to higher  A → B
1     conversion       EX to lower   B → A
2     conversion       NL to higher  A → C, C → E, E → D
2     conversion       EX to lower   D → F, F → C, C → A
1     unlock           EX            B → A
2     unlock           EX            D → F, F → C, C → A

DLM operations - Lockspace Operations

1. int dlm_new_lockspace(const char *name, const char *cluster,
       uint32_t flags, int lvblen,
       const struct dlm_lockspace_ops *ops, void *ops_arg,
       int *ops_result, dlm_lockspace_t **lockspace);

2. int dlm_release_lockspace(void *lockspace, int force);

DLM operations - acquiring or converting a Lock

1. int dlm_lock(dlm_lockspace_t *lockspace, int mode,
       struct dlm_lksb *lksb, uint32_t flags,
       void *name, unsigned int namelen, uint32_t parent_lkid,
       void (*ast) (void *astarg), void *astarg,
       void (*bast) (void *astarg, int mode));

2. int dlm_unlock(dlm_lockspace_t *lockspace, uint32_t lkid,
       uint32_t flags, struct dlm_lksb *lksb, void *astarg);

DLM operations - posix lock operations

1. int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number,
       struct file *file, int cmd, struct file_lock *fl);

2. int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number,
       struct file *file, struct file_lock *fl);

3. int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number,
       struct file *file, struct file_lock *fl);

Recovery

Recovery: the process of bringing a lockspace back into a working state

Interaction between userspace and kernel

configfs

# tree /sys/kernel/config/dlm
dlm
└── cluster
    ├── buffer_size
    ├── cluster_name
    ├── comms
    │   ├── 1084783118
    │   │   ├── addr
    │   │   ├── addr_list
    │   │   ├── local
    │   │   └── nodeid
    │   └── 1084783247
    ...
    ├── log_debug
    ├── log_info
    ├── new_rsb_count
    ├── protocol
    ├── recover_callbacks
    ├── recover_timer
    ├── rsbtbl_size
    ├── scan_secs
    ├── spaces
    │   ├── 2E76BB09DD314581A62C032436F58344
    │   │   └── nodes
    │   │       └── 1084783247
    │   │           ├── nodeid
    │   │           └── weight
    │   └── clvmd
    ….
    ├── tcp_port
    ├── timewarn_cs
    ├── toss_secs
    └── waitwarn_us
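Because dlm_controld publishes cluster membership through configfs, the node list can be read back with ordinary file operations. A minimal sketch, assuming each `comms/<nodeid>` directory holds `nodeid` and `local` files containing a decimal number as text (the file names come from the tree above, their content format is an assumption); `list_comms` is an illustrative name, and the root path is a parameter so the function can be tried against a scratch directory.

```python
from pathlib import Path


def list_comms(root="/sys/kernel/config/dlm/cluster"):
    """Return the nodes found under <root>/comms as a list of dicts."""
    nodes = []
    for entry in sorted((Path(root) / "comms").iterdir()):
        if not entry.is_dir():
            continue
        nodes.append({
            # Assumed content: decimal node id followed by a newline.
            "nodeid": int((entry / "nodeid").read_text().strip()),
            # Assumed content: "1" for the local node, "0" otherwise.
            "local": (entry / "local").read_text().strip() == "1",
        })
    return nodes


if __name__ == "__main__":
    try:
        for node in list_comms():
            print(node["nodeid"], "(local)" if node["local"] else "")
    except FileNotFoundError:
        print("no DLM configfs tree on this machine")
```

On a running cluster node this needs read access to configfs, which usually means running as root.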

debugfs

# tree /sys/kernel/debug/dlm/
/sys/kernel/debug/dlm/
├── BC65B6D742274FEDA223B6E605EF962C
├── BC65B6D742274FEDA223B6E605EF962C_all
├── BC65B6D742274FEDA223B6E605EF962C_locks
├── BC65B6D742274FEDA223B6E605EF962C_toss
├── BC65B6D742274FEDA223B6E605EF962C_waiters
├── clvmd
├── clvmd_all
├── clvmd_locks
├── clvmd_toss
└── clvmd_waiters

sysfs

# tree /sys/kernel/dlm/
/sys/kernel/dlm/
├── BC65B6D742274FEDA223B6E605EF962C
│   ├── control
│   ├── event_done
│   ├── id
│   ├── nodir
│   ├── recover_nodeid
│   └── recover_status
└── clvmd
    ├── control
    ├── event_done
    ├── id
    ├── nodir
    ├── recover_nodeid
    └── recover_status

dlm device and udev

# tree /dev/misc/
/dev/misc/
├── dlm_BC65B6D742274FEDA223B6E605EF962C -> ../dlm_BC65B6D742274FEDA223B6E605EF962C
├── dlm_clvmd -> ../dlm_clvmd
├── dlm-control -> ../dlm-control
├── dlm-monitor -> ../dlm-monitor
├── dlm_plock -> ../dlm_plock

# cat /usr/lib/udev/rules.d/51-dlm.rules
KERNEL=="dlm-control", MODE="0666", SYMLINK+="misc/dlm-control"
KERNEL=="dlm-monitor", MODE="0666", SYMLINK+="misc/dlm-monitor"
KERNEL=="dlm_plock", MODE="0666", SYMLINK+="misc/dlm_plock"
KERNEL=="dlm_*", MODE="0660", SYMLINK+="misc/%k"

Netlink and netlink socket

Thanks!

Q & A?