RAC+ASM: Stories to Share
DESCRIPTION
RAC+ASM: Lessons learned after 2 years in production
Managing over 70 databases for 4 major customers, I have some good stories to share, running almost all possible combinations of ASM, RAC, NetApp and NFS. Successes, failures and gotchas. This presentation condenses years of experience into the major highlights, in 45 minutes. To list a few stories:
TRANSCRIPT
RAC+ASM 3 years in production
Stories to share
Presented by: Christo Kutrovsky
© 2009/2010 Pythian - Presentation for ABC Company
Who Am I
•Oracle ACE
•10 years in Oracle field
•Joined Pythian 2003
•Part of Pythian Consulting Team
•Special projects
•Performance tuning
•Critical services
“oracle pinup”
Pythian Facts
Founded in 1997
90 employees
120 customers worldwide
20 customers more than $1 billion in revenue
5 offices in 5 countries
10 years profitable private company
What Pythian does
Pythian provides database and application infrastructure services.
Agenda
•2 nodes RAC
•ASMLIB with multipathing
•Migrating to new servers with ASM
•Thin provisioning
•ASM + restores = danger
•Device naming conventions
•spfile location
•JBOD configuration
2 Node RAC for High Availability
2 Node RACs for HA
•Two-node RAC
•13 databases
•Dev databases
•Shut down databases (and ASM) on node1
•Perform maintenance
•Unplug the interconnect cable
•What happens?
2 nodes RAC
[Diagram: Node 1 (SID_A1, SID_B1) and Node 2 (SID_A2, SID_B2), each with a VIP, joined by the interconnect and attached over Fibre Channel to shared storage (ASMDG, OCR/V).]
[Diagram: the same layout with the interconnect unplugged. One node reports “I can’t see Node 1”, the other “I can’t see Node 2”.]
One is not Quorum
•50% chance your working node gets restarted
•Depends on clusterware version
•Who will shoot the other guy first
One is not Quorum
•Conclusion?
•Turn off clusterware when you have only 2 nodes and are performing maintenance
•Upgrade to a more predictable clusterware
•Lowest ‘leader’ always survives
•Add a 3rd tie-breaker node
•Doesn’t have to run a database, just clusterware (observer)
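The tie-break above can be illustrated with a toy model. This is a sketch only, not Oracle Clusterware's actual eviction algorithm: the function names are hypothetical, and the rule (larger sub-cluster survives; on a tie, the sub-cluster holding the lowest node number survives) is an assumption taken from the "lowest leader" bullet.

```shell
# Toy model of split-brain resolution. NOT the real Clusterware algorithm.
# Assumed rule: the larger sub-cluster survives; on a tie, the sub-cluster
# that contains the lowest node number survives.

# count_nodes "1 3": number of nodes in a space-separated list
count_nodes() { set -- $1; echo $#; }

# lowest_node "2 3": smallest node number in the list
lowest_node() {
  min=""
  for n in $1; do
    if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
  done
  echo "$min"
}

# surviving_partition "LIST_A" "LIST_B": print the sub-cluster that stays up
surviving_partition() {
  a=$1; b=$2
  ca=$(count_nodes "$a"); cb=$(count_nodes "$b")
  if [ "$ca" -gt "$cb" ]; then echo "$a"
  elif [ "$cb" -gt "$ca" ]; then echo "$b"
  elif [ "$(lowest_node "$a")" -lt "$(lowest_node "$b")" ]; then echo "$a"
  else echo "$b"
  fi
}

surviving_partition "1" "2"     # two nodes: 1 vs 1, no majority, tie-break decides
surviving_partition "2 3" "1"   # three nodes: the pair has a majority
```

Under this assumed rule, a two-node split always keeps node 1, even if node 1 is the one under maintenance. With a third node, the side that can still see the observer wins regardless of numbering.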
One is not Quorum
Production cases, what happens if
•All network dies on one node?
•All disks die on one node?
ASMLIB with Multipathing
Building ASMLIB devices when multipathing is present
•Devices used for creating asmlib
•/dev/emcpowerc1
•/dev/mapper/raid10_data_disk
•Devices used to create asm diskgroup
•ASMLIB
•The reboot changes everything
•ASMLIB re-discovers the devices without multipath
•Difficult to diagnose
Visual
[Diagram: LUN_1 and LUN_2 are reached through HBA1 and HBA2, so they show up as four single-path devices (/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde), which multipathing presents as /dev/mapper/data1 and /dev/mapper/data2.]
Building ASMLIB devices when multipathing is present
•Do not use ASMLIB
•If you have to (why?)
•Must setup “ORACLEASM_SCANORDER”
•asm_diskstring parameter
•Permissions
•Udev files
•Boot/startup script
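If ASMLIB has to stay, the scan order goes into its configuration file. A minimal sketch, assuming device-mapper multipath and the usual /etc/sysconfig/oracleasm location. `ORACLEASM_SCANEXCLUDE` is not on the slide and is added here as an assumption; the prefixes must match the multipath driver in use (for example `emcpower` for PowerPath devices).

```shell
# /etc/sysconfig/oracleasm (fragment)

# Scan multipath devices first. Values are prefixes of device names as
# they appear in /proc/partitions (dm-0, dm-1, ...).
ORACLEASM_SCANORDER="dm"

# Assumption: skip the single-path sd* devices entirely, so ASMLIB cannot
# bind to one path of a LUN after a reboot. Do not use this if any ASM disk
# is a plain, non-multipathed sd* device.
ORACLEASM_SCANEXCLUDE="sd"
```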
Removing ASMLIB
•Why
•Extra layer
•Requires new driver for every new kernel
•Can cause downtime if not careful
•ASMLIB header is the same as ASM DISK header
•Just has extra field for ASMLIB name
•Disks can be accessed directly, without ASMLIB, without having to drop/recreate them
Removing ASMLIB
•Unmount all affected diskgroups
•Change or set asm_diskstring
•Remount diskgroups via new paths
•Can be done in rolling fashion in RAC
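The steps above can be sketched as a small generator for the SQL to run on one ASM instance at a time. This is a sketch under assumptions: the function name `asmlib_removal_sql`, the diskgroup names and the `/dev/mapper/*p1` disk string are hypothetical. Databases using the diskgroups on that node must be stopped first, and the output should be reviewed before it is fed to SQL*Plus.

```shell
# asmlib_removal_sql DISKSTRING DISKGROUP...
# Prints the statements for one ASM instance: dismount, repoint discovery
# at the raw multipath devices, mount again.
asmlib_removal_sql() {
  diskstring=$1; shift
  for dg in "$@"; do
    echo "alter diskgroup ${dg} dismount;"
  done
  # If the ASM instance uses a pfile, make the same change in that file too.
  echo "alter system set asm_diskstring='${diskstring}';"
  for dg in "$@"; do
    echo "alter diskgroup ${dg} mount;"
  done
}

# Hypothetical diskgroups and device pattern:
asmlib_removal_sql '/dev/mapper/*p1' DATA REDO
```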
SAN Migration
Migrating from EMC to 3PAR
•New SAN
•New concept
•Thin provisioning
•A big project
•Or not
Add/drop/go home
•No brainer
•Thin provisioning rocks
•SA adds disks
•Add disk to diskgroup
•Drop all old disks
•Wait
•Never be paged on space
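The add/drop/wait sequence above can be sketched as one ALTER DISKGROUP statement, so ASM rebalances once, plus a query to watch progress. The function name `san_swap_sql`, the device pattern and the disk names are hypothetical. Note that DROP DISK takes ASM disk names, not device paths.

```shell
# san_swap_sql DISKGROUP NEW_DISK_PATH OLD_DISK_NAME...
# Prints one combined add/drop statement and a progress query.
san_swap_sql() {
  dg=$1; new_path=$2; shift 2
  old_names=$(IFS=,; echo "$*")   # join the remaining arguments with commas
  echo "alter diskgroup ${dg} add disk '${new_path}' drop disk ${old_names};"
  # "Wait": the old disks are released only when the rebalance finishes.
  echo "select operation, state, est_minutes from v\$asm_operation;"
}

# Hypothetical new-SAN device pattern and old disk names:
san_swap_sql DATA '/dev/mapper/3par_data*' DATA_0000 DATA_0001
```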
Server Migration
Server migration
•Current setup
•2 nodes RAC with ASM
•New servers
•Better, Faster, Stronger
•Fastest (effort wise) way to migrate, with minimal downtime
•Possible with zero downtime
Server migration options
•Create standby on new server
•Requires extra copy of data
•Add the new nodes, drop existing ones
•Possible clusterware issues
•Move the LUNs
•Easy
•New servers tested
LUN Migration
•Install clusterware and create RAC database with same name
•Test hardware / wiring / configuration
•Migrate
•Stop production
•Re-assign LUNs
•Start production
ASM Restore creates database black hole
ASM + Same host restore = DANGER
•Production database
•Diskgroup +PROD
•Snapshot database
•Diskgroup +SNAP
•Rebuild monthly via duplicate database
•Except this one time…
The concept
•“SNAP” backups not taken
•If a given “SNAP” backup is to be restored, simply re-create it from the corresponding “PROD” backup
•Independent from Production
Restore with ASM
•Restore FRA files into separate directory
•Startup SNAP instance
•Catalog backup files
•Restore into SNAP diskgroup
•The missing piece?
•“restore” writes the datafiles back to their original (backed-up) location
•Must use “set newname for datafile” in run block
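The missing piece above can be sketched as a generator for the RMAN run block. This is an illustration, not the exact script used in this incident: the function name `rman_restore_block` is hypothetical, and the datafile numbers must come from the backed-up database (for example from `report schema`). Every datafile needs its own SET NEWNAME line, otherwise it is restored over the original.

```shell
# rman_restore_block TARGET_DISKGROUP FILE_NUMBER...
# Prints a run block that redirects each listed datafile to the target
# diskgroup before restoring.
rman_restore_block() {
  dg=$1; shift
  echo "run {"
  for f in "$@"; do
    echo "  set newname for datafile ${f} to '${dg}';"
  done
  echo "  restore database;"
  echo "  switch datafile all;"
  echo "}"
}

# Hypothetical datafile numbers 1 to 3, restored into +SNAP:
rman_restore_block +SNAP 1 2 3
```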
Restore with ASM – the result
•Unrecoverable corruption on production database
•Lost about 3-4 hours of changes
•If this was filesystem and not ASM, no corruption would have occurred
Corruption – what happened
[Diagram: SGA, disk and redo contents. Labels: “Original datafile” (blocks of 5 rows), “Partially overwritten datafile” (some blocks back to 2 rows), redo entries “BLK1 add Row 6” and “BLK3 add Row 3”.]
Corruption – what should’ve hap.
[Diagram: SGA, disk and redo contents. Labels: “Original datafile”, “Partially overwritten datafile”, redo entries “BLK1 add Row 6”, “BLK3 add Row 3” and “BLK3 add Row 6”.]
Corruption – what happened
Corruption
•Why wouldn’t this have happened with a filesystem?
•File names are just pointers to a data stream
•If a file is re-created, a new data stream is associated with it
•Processes that have the file currently open still use the old data stream
•This is why “undelete” is possible
•My blog about undeleting files
Corruption
[Diagram: Process 1 opens “file 1”; File 1 points to Stream X1.]
Corruption – recreate File 1
[Diagram: File 1 is re-created and Stream X2 appears, while Process 1 and Stream X1 remain.]
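The filesystem behaviour described above is easy to reproduce. A minimal sketch on an ordinary POSIX filesystem, using a temporary directory; the file names are arbitrary. It shows that an already-open descriptor keeps reading the old stream after the file name is removed and re-created.

```shell
demo_dir=$(mktemp -d)

echo "old contents" > "$demo_dir/file1"
exec 3< "$demo_dir/file1"                 # "Process 1" opens file 1 (stream X1)

rm "$demo_dir/file1"                      # the name goes away, the stream stays
echo "new contents" > "$demo_dir/file1"   # re-create: same name, new stream X2

read -r via_fd <&3                        # the open descriptor still sees X1
read -r via_name < "$demo_dir/file1"      # a fresh open by name sees X2

echo "open descriptor sees: $via_fd"
echo "file name sees:       $via_name"

exec 3<&-                                 # close the descriptor
rm -rf "$demo_dir"
```

Overwriting in place without removing the file first would reuse the same stream, which is closer to what happened inside ASM.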
Device names convention causes user error
Device naming conventions
•Using /dev/mapper/<name>
•ASM uses <name>p1 – first partition
•Permissions set script uses: “*p1”
•Then came /dev/mapper/backup1
•First partition is: /dev/mapper/backup1p1
Device naming conventions
•V$ASM_DISK
PATH                        HEADER_STATUS
--------------------------- -------------
/dev/mapper/backup1         CANDIDATE
/dev/mapper/redop1          MEMBER
/dev/mapper/backup1p1       MEMBER
/dev/mapper/data2p1         MEMBER
/dev/mapper/data1p1         MEMBER
Naming conventions
[Diagram: a DISK containing Partition 1, with the labels IN USE and ADDED.]
New convention
•Now we use generic names, as we do re-assign disks
•We also use a prefix and suffix with a clear delimiter
/dev/mapper/asm-raid5-dev01-part1
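Why the old wildcard was dangerous can be checked with plain shell pattern matching. A minimal sketch: the helper `matches` and the new-style pattern `asm-*-part1` are assumptions for illustration; the device names are the ones from the slides.

```shell
# matches PATTERN NAME: succeed if NAME matches the shell glob PATTERN
matches() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

for dev in backup1 backup1p1 data1p1 asm-raid5-dev01 asm-raid5-dev01-part1; do
  old=no; new=no
  if matches '*p1' "$dev"; then old=yes; fi
  if matches 'asm-*-part1' "$dev"; then new=yes; fi
  printf '%-24s old "*p1": %-3s  new "asm-*-part1": %s\n' "$dev" "$old" "$new"
done
```

The whole device `backup1` itself ends in "p1", so the old wildcard covers it along with its first partition `backup1p1`. With an explicit prefix and a `-part1` suffix, only partitions match.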
spfile location in RAC
spfile location
•Intended configuration
•init.ora: spfile='+ASM_DSKGRP/dbname.spfile'
•No local spfile
Changing parameters en masse
•create pfile='your_initials.ora' from spfile;
•edit
•create spfile='+ASM_DSK/spfile' from pfile='ck.ora';
What not to do
•create pfile from spfile;
•edit
•create spfile from pfile;
Result
•One node uses local spfile
•Other(s) use global spfile
•Parameter changes on the “BAD” node are sent to other nodes
•not persistent on GOOD nodes
•persistent on BAD nodes
•Parameter changes on GOOD nodes have reversed behaviour
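A "BAD" node can be spotted by looking for a local spfile in the default location, since Oracle picks up spfile<SID>.ora or spfile.ora there before it reads init<SID>.ora. A minimal sketch; the function name `find_local_spfile` is hypothetical, and the directory is assumed to be $ORACLE_HOME/dbs on each node.

```shell
# find_local_spfile DBS_DIR SID
# Prints any local spfile that would be used instead of the init<SID>.ora
# pointer file. Returns 0 if one is found, 1 otherwise.
find_local_spfile() {
  dbs_dir=$1; sid=$2; status=1
  for f in "$dbs_dir/spfile${sid}.ora" "$dbs_dir/spfile.ora"; do
    if [ -f "$f" ]; then
      echo "$f"
      status=0
    fi
  done
  return "$status"
}

# Usage on each node:
#   find_local_spfile "$ORACLE_HOME/dbs" "$ORACLE_SID" && echo "local spfile present"
```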
Adding ASM disks crashes databases
Adding disks
•Must be visible on all servers
•Otherwise your diskgroup gets dismounted on the nodes that don’t see the disk
•All databases using this diskgroup crash
ASM add disk process
1. Is the disk visible locally?
2. Initialize disk header, add it to diskgroup
3. Notify all nodes to rescan disks and add the new disk
4. If one or more nodes cannot see the disk, raise error
5. Dismount diskgroup on all nodes not seeing the new disk
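Because the check in step 4 comes after the header is already written, it is worth testing visibility on every node before running the add. A minimal sketch, assuming password-less ssh between nodes. The helper `run_on_node`, the function `disk_visible_on_all` and the node and device names are hypothetical. It only tests that the path is a block device; it does not prove the path is the same LUN or that permissions are right.

```shell
# run_on_node NODE COMMAND...: hypothetical transport, swap it if ssh is
# not how you reach the other nodes
run_on_node() {
  node=$1; shift
  ssh -n -o BatchMode=yes "$node" "$@"
}

# disk_visible_on_all DEVICE NODE...
# Returns 0 only if DEVICE exists as a block device on every listed node.
disk_visible_on_all() {
  dev=$1; shift
  rc=0
  for node in "$@"; do
    if run_on_node "$node" test -b "$dev"; then
      echo "ok      $node $dev"
    else
      echo "MISSING $node $dev"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   disk_visible_on_all /dev/mapper/data3p1 node1 node2 || echo "do not add the disk yet"
```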
ASM with JBOD welcomes simplicity
JBOD Configuration
•Linux Datawarehouse
•10 TB space
•28 disks of 430/285 GB
•All redundancy/striping provided by ASM
JBOD Configuration
•Simplicity
•No ASMLIB
•Straight devices
•Naming convention – use only 1 partition, and use partition 4
•/dev/sd*4
•is ASM partition
•is permissions wildcard
•is asm_diskstring
Testing your speed
•Verify read speed of each device
•Verifies each device is performing as expected
•Verify read speed from all devices
•Verify your total bandwidth
•Verify read speed from all devices, towards the end of the device
•Disk read speed is not linear
Read Speed of a single disk
[Chart: read speed of a single disk. Courtesy of Google image search.]
Testing your speed
•One device at a time
for dsk in /dev/sd[c-q]; do echo "$dsk"; dd if="$dsk" of=/dev/null iflag=direct bs=2M count=100; done
•All devices (total bandwidth)
for dsk in /dev/sd[c-q]; do echo "$dsk"; dd if="$dsk" of=/dev/null iflag=direct bs=2M count=100 & done; wait
•Test end speed
•Add skip=x
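For the end-of-device test, the value for skip= can be computed from the device size. A minimal sketch; the function name `end_skip` is hypothetical. dd counts skip= in blocks of the input block size, so with bs=2M the unit is 2 MiB.

```shell
# end_skip SIZE_BYTES BLOCK_BYTES COUNT
# Prints the skip= value that makes dd read the last COUNT blocks of a
# device of SIZE_BYTES bytes.
end_skip() {
  size=$1; bs=$2; count=$3
  blocks=$(( size / bs ))
  if [ "$blocks" -lt "$count" ]; then
    echo 0                          # device smaller than the test read
  else
    echo $(( blocks - count ))
  fi
}

# Usage against a real device (blockdev is part of util-linux):
#   size=$(blockdev --getsize64 /dev/sdc)
#   skip=$(end_skip "$size" $((2 * 1024 * 1024)) 100)
#   dd if=/dev/sdc of=/dev/null iflag=direct bs=2M count=100 skip="$skip"
```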
Sample output
/dev/sdc
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.60325 seconds, 131 MB/s
/dev/sdd
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.60188 seconds, 131 MB/s
/dev/sde
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.60067 seconds, 131 MB/s
/dev/sdf
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 1.59928 seconds, 131 MB/s
/dev/sdg
100+0 records in
100+0 records out
JBOD configuration
•Disk adding/removal is very easy
•Add disks in bulk:
alter diskgroup XXX add disk '/dev/sd[c-q]4';
•Performance rocks
•controller speed
•Diagnostic is easy
•iostat -x 5 /dev/sd*4
•Manageability is easy
•1 diskgroup – no filenames, no mountpoints
Final Thoughts
•RAC for HA requires 3 nodes
•ASM
•Keep it simple
•Reduce layers
•Runs fast
•Still need to be careful
The End
Thank You
Questions?
I blog at
http://www.pythian.com/news/author/kutrovsky/