
  • 7/31/2019 De-Confusing SSD for Oracle Database


    De-Confusing SSD (for Oracle Databases)

    Posted by Gwen Shapira on Nov 22, 2011 (http://www.pythian.com/news/28797/de-confusing-ssd-for-oracle-databases/)

    You never forget your first SSD.

    For me, the first time I really *noticed* SSDs was when one of my customers encountered serious corruption on one of their databases and we had to restore an entire database. It was not a small database, around 300G in total file size. After I started the RMAN restore and recovery process, the customer asked the inevitable question: "How long will this take?" I replied that I wasn't familiar with the performance of their storage, but from my experience a restore of this size could be expected to take 5 hours. Imagine my surprise when the restore was done after an hour.

    This was enough to convince me that SSD is magic, and that if you have money and an IO problem, you should just go SSD. Of course, if that were the end of the story, I wouldn't have much of a blog post.

    When I tried to share my excitement with other DBAs, I found out that SSD is actually kind of scary. Soon I became extremely confused by everything I'd seen and heard:

    SSD is fast for reads, but not for writes. It's fast for random writes, but not for sequential writes. You shouldn't use it for redo, except that Oracle does that on their appliances. SSD gets slow over time. SSD has a limited lifespan and is unreliable. Performance depends on exactly which SSD you use. You can have PCI or SATA or even SAN. You can use SSD for flash cache, but only specific versions, maybe. You can have MLC or SLC. It can be enterprise or home grade.

    With every conversation, the confusion grew, until I finally had enough (and also had some spare time) and could sit down and untangle the web. Here's what I found out:

    First, there are two types of SSD: flash-based (also called NAND flash) and RAM-based. For all practical purposes, RAM-based SSD doesn't actually exist and only serves to confuse DBAs a bit more. Just ignore it; for now, all SSDs are NAND flash.

    NAND flash SSDs come in two flavors: multi-level cell (MLC) and single-level cell (SLC). It sounds like multi-level is better, but that's wrong. MLC is slower and cheaper; SLC is faster and more expensive. Enterprise-grade SSDs are SLC.

    An SSD's basic memory unit is the cell, which holds 1 bit in SLC and 2 bits in MLC. Cells are organized into pages (usually 4K), and pages are organized into blocks (512K). Data can be read and written one page at a time, but it is always erased a whole block at a time. This will become really important in a moment.
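    To make the geometry concrete, here is a minimal Python sketch that uses the sizes quoted above: 4K pages, 512K blocks, 1 bit per SLC cell and 2 bits per MLC cell. The constant and function names are illustrative only. The number that matters is pages per block, because every erase wipes that many pages at once.

```python
# Example NAND geometry from the text; real devices use other sizes.
PAGE_BYTES = 4 * 1024          # unit of reading and writing
BLOCK_BYTES = 512 * 1024       # unit of erasing
BITS_PER_CELL = {"SLC": 1, "MLC": 2}


def pages_per_block(page_bytes: int = PAGE_BYTES, block_bytes: int = BLOCK_BYTES) -> int:
    """How many pages are wiped together by a single block erase."""
    return block_bytes // page_bytes


def cells_per_page(cell_type: str, page_bytes: int = PAGE_BYTES) -> int:
    """How many cells are needed to store one page for the given cell type."""
    return page_bytes * 8 // BITS_PER_CELL[cell_type]


print(pages_per_block())        # 128
print(cells_per_page("SLC"))    # 32768
print(cells_per_page("MLC"))    # 16384
```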

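    SSD is indeed very fast for reads: an SSD can read a 4K page in 25 *microseconds*. Remember that with spinning disks, a 1ms read is very reasonable, and reads from SAN storage often take significantly longer. That makes an SSD read roughly two orders of magnitude slower than RAM (a DRAM access takes on the order of 100 nanoseconds). It is about 40 times faster than a 1ms disk read, and a couple of hundred times faster than a typical multi-millisecond random read from spinning disk.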

    SSD is also very fast for writes: writing a 4K page takes 250 microseconds. That is much slower than an SSD read, but much faster than writing to magnetic disk.
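    The latency figures above can be turned into ratios with a few lines of Python. The SSD and disk numbers are the ones quoted in the text. The ~100 ns DRAM access time is an added assumption for comparison; it is a commonly cited ballpark, not a measurement.

```python
import math

# Latencies in microseconds. SSD and disk values are the ones quoted in the text.
SSD_READ_US = 25.0       # read one 4K page
SSD_WRITE_US = 250.0     # write one 4K page
DISK_READ_US = 1000.0    # a "very reasonable" 1 ms spinning-disk read
RAM_ACCESS_US = 0.1      # assumption: ~100 ns DRAM access, not from the text


def orders_of_magnitude(slow_us: float, fast_us: float) -> float:
    """log10 of how many times longer the slow operation takes."""
    return math.log10(slow_us / fast_us)


comparisons = {
    "SSD read vs RAM access": (SSD_READ_US, RAM_ACCESS_US),
    "disk read vs SSD read": (DISK_READ_US, SSD_READ_US),
    "SSD write vs SSD read": (SSD_WRITE_US, SSD_READ_US),
}
for label, (slow, fast) in comparisons.items():
    print(f"{label}: {slow / fast:.0f}x, "
          f"{orders_of_magnitude(slow, fast):.1f} orders of magnitude")
```

    With these inputs the ratios come out to 250x, 40x and 10x. Against a slower 5 to 10 ms random disk read, the SSD advantage grows to roughly 200x to 400x.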

    However, for SSD, almost all write-time statistics are misleading.

    SSD is much faster than spinning disk for random writes on new media, but it doesn't have much advantage when it comes to sequential writes. The main advantage of SSD over magnetic disks is that it has no moving parts. It never has to move a head or wait for a platter to rotate into position, so writing to any area on the device is just as fast. When the magnetic disk doesn't have to seek either, because it is writing the data to consecutive locations, SSD is not much faster. This is why DBAs advise against placing redo logs on SSD: redo is all sequential writes, so there won't be much performance improvement. Performance may actually degrade if the redo logs were previously placed on a volume striped across a large number of underlying disks. An SSD's parallelism is limited to the number of channels the device supports, usually fewer than 16.
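    A back-of-the-envelope model shows why the SSD advantage shrinks for sequential writes. This Python sketch is deliberately simplified. The disk parameters (4 ms average seek, 15,000 RPM, 150 MB/s media transfer) and the 8-channel SSD are hypothetical values chosen for illustration; only the 250 microsecond page write comes from the text.

```python
PAGE_BYTES = 4096


def disk_write_mb_s(io_bytes: int, sequential: bool, seek_ms: float = 4.0,
                    rpm: int = 15000, transfer_mb_s: float = 150.0) -> float:
    """Toy spinning-disk model: random IOs pay a seek plus half a rotation."""
    transfer_s = io_bytes / (transfer_mb_s * 1e6)
    if sequential:
        # The head is already in place, so only media transfer time counts.
        per_io_s = transfer_s
    else:
        half_rotation_s = 0.5 * 60.0 / rpm
        per_io_s = seek_ms / 1000.0 + half_rotation_s + transfer_s
    return io_bytes / per_io_s / 1e6


def ssd_write_mb_s(channels: int = 8, page_write_us: float = 250.0) -> float:
    """Toy SSD model: no seek, so random and sequential writes cost the same.
    Parallelism comes only from the channels, each writing one page at a time."""
    return channels * PAGE_BYTES / (page_write_us / 1e6) / 1e6


print(f"SSD, any pattern:       {ssd_write_mb_s():7.1f} MB/s")
print(f"1 disk, random 4K:      {disk_write_mb_s(PAGE_BYTES, sequential=False):7.1f} MB/s")
print(f"1 disk, sequential:     {disk_write_mb_s(PAGE_BYTES, sequential=True):7.1f} MB/s")
print(f"10 striped disks, seq.: {10 * disk_write_mb_s(PAGE_BYTES, sequential=True):7.1f} MB/s")
```

    Under these assumptions, the SSD comes out roughly 190 times faster than one disk for random 4K writes. For sequential writes it is slightly slower than a single disk. Against a sequential stream spread over ten striped disks, it is more than ten times slower, which mirrors the redo-log argument above.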

    The second problem with SSD writes is that *erasing* is very slow. Erasing is slow because even if you want to delete one page, the SSD can only erase an entire block. The controller needs to read the entire block, erase everything, and write back only the pages you want to keep. To make things even more fun, each SSD cell can only be erased and rewritten a limited number of times before it becomes unusable.
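    The read-erase-rewrite cycle can be modeled with a toy erase block in Python. This is a minimal sketch that assumes the 128-page block from the earlier example, and the class and method names are hypothetical. Real controllers avoid this naive in-place path wherever they can, using the techniques described next.

```python
class EraseBlock:
    """Toy NAND erase block: pages can be programmed once, then only erased together."""

    def __init__(self, pages_per_block: int = 128):
        self.pages = [None] * pages_per_block   # None means "erased, ready to program"
        self.erase_count = 0                    # wear: cells survive a limited number of erases
        self.pages_written = 0

    def program(self, index: int, data: str) -> None:
        # NAND cannot overwrite a page in place; the block must be erased first.
        if self.pages[index] is not None:
            raise RuntimeError("page already programmed; erase the block first")
        self.pages[index] = data
        self.pages_written += 1

    def erase(self) -> None:
        # Erasing always clears every page in the block.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

    def update_page(self, index: int, data: str) -> None:
        """Naive in-place update: read the block, erase it, write the survivors back."""
        saved = list(self.pages)       # read the whole block
        saved[index] = data            # apply the one-page change
        self.erase()
        for i, page in enumerate(saved):
            if page is not None:
                self.program(i, page)


block = EraseBlock()
for i in range(128):
    block.program(i, f"row-data-{i}")
before = block.pages_written
block.update_page(5, "new-row-data")
print(block.pages_written - before, "page writes and", block.erase_count, "erase for one 4K change")
```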

    To keep the re-write overhead low, manufacturers use several techniques:

    * They write to every free cell before resorting to erasing existing ones.
    * They over-provision the SSD (for example, build a 2G SSD but only show the OS 1.8G), so writes can be completed quickly using the spare space and the required erases can be completed asynchronously in the background.
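    The over-provisioning in that example is easy to quantify. Over-provisioning is commonly expressed as spare capacity relative to the capacity the OS sees. This short Python sketch computes it both ways for the 2G/1.8G device above; the function name is hypothetical.

```python
def over_provisioning(raw_gb: float, exposed_gb: float):
    """Return (spare GB, spare as a fraction of raw, spare as a fraction of exposed)."""
    if not 0 < exposed_gb <= raw_gb:
        raise ValueError("exposed capacity must be positive and no larger than raw capacity")
    spare_gb = raw_gb - exposed_gb
    return spare_gb, spare_gb / raw_gb, spare_gb / exposed_gb


spare, of_raw, of_exposed = over_provisioning(2.0, 1.8)
print(f"spare: {spare:.1f}G, {of_raw:.1%} of raw, {of_exposed:.1%} of exposed capacity")
```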

    As you can see, this requires the controller to be pretty smart and to maintain free lists of pages that can be used for writing and of pages that need cleaning. This is one of the major differences between SSD devices: how well the controller manages the write and erase cycles on the device.
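    The free-list bookkeeping can be sketched as a toy flash translation layer. This Python model illustrates the idea under simplifying assumptions; it is not how any particular controller works. Writes always go out-of-place to the next free page, and the superseded copy is marked stale (a page that needs cleaning). When erased blocks run low, a greedy collector copies the still-valid pages out of the block with the most stale pages and then erases it. The class name, the block sizes and the keep-two-free-blocks rule are all hypothetical.

```python
import random
from collections import deque


class ToyFTL:
    """Out-of-place writes, a free-block list, and greedy garbage collection."""

    def __init__(self, blocks: int = 16, pages_per_block: int = 8, logical_pages: int = 96):
        # Hidden spare space (over-provisioning) guarantees GC always finds stale pages.
        if logical_pages >= (blocks - 2) * pages_per_block:
            raise ValueError("toy model needs more than two blocks of spare space")
        self.ppb = pages_per_block
        self.logical_pages = logical_pages
        self.free_blocks = deque(range(blocks))        # erased blocks, ready for writing
        self.valid = [set() for _ in range(blocks)]    # logical pages living in each block
        self.stale = [0] * blocks                      # pages needing cleaning, per block
        self.l2p = {}                                  # logical page -> physical block
        self.active, self.offset = self.free_blocks.popleft(), 0
        self.host_writes = self.flash_writes = self.erases = 0

    def write(self, lpn: int) -> None:
        if not 0 <= lpn < self.logical_pages:
            raise IndexError("logical page out of range")
        self.host_writes += 1
        self._program(lpn)
        while len(self.free_blocks) < 2:   # keep a reserve so GC can relocate pages
            self._collect()

    def _program(self, lpn: int) -> None:
        if self.offset == self.ppb:        # active block is full: open an erased one
            self.active, self.offset = self.free_blocks.popleft(), 0
        old = self.l2p.get(lpn)
        if old is not None:                # the previous copy becomes stale
            self.valid[old].discard(lpn)
            self.stale[old] += 1
        self.l2p[lpn] = self.active
        self.valid[self.active].add(lpn)
        self.offset += 1
        self.flash_writes += 1

    def _collect(self) -> None:
        # Greedy victim: the full block (not active, not free) with the most stale pages.
        full = [b for b in range(len(self.valid))
                if b != self.active and b not in self.free_blocks]
        victim = max(full, key=lambda b: self.stale[b])
        for lpn in list(self.valid[victim]):   # copy still-valid pages forward
            self._program(lpn)
        self.stale[victim] = 0                 # erase: the whole block is reusable again
        self.erases += 1
        self.free_blocks.append(victim)

    @property
    def write_amplification(self) -> float:
        return self.flash_writes / self.host_writes if self.host_writes else 0.0


ftl = ToyFTL()
for lpn in range(ftl.logical_pages):      # fill the device once
    ftl.write(lpn)
rng = random.Random(42)
for _ in range(5000):                     # then overwrite random pages
    ftl.write(rng.randrange(ftl.logical_pages))
print(f"host writes {ftl.host_writes}, flash writes {ftl.flash_writes}, "
      f"erases {ftl.erases}, write amplification {ftl.write_amplification:.2f}")
```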

    To compare SSDs, you can check the write amplification number for each device. This number shows how many megabytes are actually written to flash, on average, when you attempt to write 1MB. The extra writes are due to the cleanup processes involved. Devices with an amplification factor closer to 1 will have better write performance.
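    The write amplification factor is a simple ratio. In this Python sketch the counter values are hypothetical. How you read host and flash write totals from a real device depends on the vendor's tools and is not shown. The second function applies the ratio the other way. If the flash itself can absorb a given write rate, host-visible write throughput is roughly that rate divided by the amplification factor. This is a simplification that ignores other controller overheads.

```python
def write_amplification(host_mb_written: float, flash_mb_written: float) -> float:
    """Megabytes physically written to flash per megabyte the host asked to write."""
    if host_mb_written <= 0:
        raise ValueError("host_mb_written must be positive")
    return flash_mb_written / host_mb_written


def effective_write_mb_s(flash_write_mb_s: float, amplification: float) -> float:
    """Host write throughput left over once cleanup writes take their share."""
    return flash_write_mb_s / amplification


# Hypothetical counters: the host wrote 1,000 MB and the flash absorbed 1,500 MB.
wa = write_amplification(1000.0, 1500.0)
print(f"write amplification: {wa:.2f}")                       # 1.50
print(f"host throughput at 300 MB/s of flash writes: "
      f"{effective_write_mb_s(300.0, wa):.0f} MB/s")         # 200
```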

    When looking at SSD benchmarks, it is important to remember that writing to a clean device is by definition faster than writing to a used device, so make sure you are looking at the right numbers. If you run your own benchmarks, expect throughput to drop significantly in the first 30 minutes of the test, and only consider the numbers you see after the initial drop.
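    Discarding the fresh-out-of-box period before averaging is easy to automate. This is a minimal Python sketch. The `Sample` record, the 30-minute cutoff and the synthetic throughput numbers are illustrative assumptions. On a real device you would pick the cutoff by looking at where throughput actually levels off.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: float      # minutes since the benchmark started
    mb_per_s: float    # write throughput measured in that interval


def mean_throughput(samples, warmup_minutes: float = 0.0) -> float:
    """Average throughput, ignoring samples taken during the warm-up window."""
    kept = [s.mb_per_s for s in samples if s.minute >= warmup_minutes]
    if not kept:
        raise ValueError("no samples after the warm-up window; run the test longer")
    return sum(kept) / len(kept)


# Synthetic run: fast while the device is clean, slower once cleanup kicks in.
run = [Sample(m, 400.0 if m < 30 else 150.0) for m in range(0, 60, 5)]
print(mean_throughput(run))                      # 275.0 (misleading)
print(mean_throughput(run, warmup_minutes=30))   # 150.0 (steady state)
```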

    Of course, once the device is nearly full, the controller has a much harder time finding free pages, and it has to perform more cleanup on every write. This is the SSD version of fragmentation. When considering benchmarks, also check how full the device was while the benchmark ran.
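    A simple model shows why fullness hurts so much. Assume, as a simplification, that every block the controller reclaims still holds a fraction u of valid pages. Erasing that block frees (1 - u) of a block for new data, but first costs u of a block in copies. Each page the host writes therefore costs 1 / (1 - u) page writes on flash. A fuller device tends to leave more valid data in each reclaimed block, which pushes u up. This Python sketch just tabulates that textbook-style approximation; it does not measure any device.

```python
def gc_write_amplification(valid_fraction: float) -> float:
    """Flash page writes per host page write when reclaimed blocks are this full of valid data."""
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1.0 / (1.0 - valid_fraction)


for u in (0.0, 0.5, 0.75, 0.9, 0.95):
    print(f"reclaimed blocks {u:.0%} valid -> write amplification {gc_write_amplification(u):.1f}")
```

    The amplification climbs from 1.0 on an empty device to 10 when reclaimed blocks are 90% valid, and 20 at 95%.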

    The SSD performance blog (http://www.ssdperformanceblog.com/) is an excellent resource with many well-run SSD benchmarks. You can use their results, but it's even better to learn from their methodology.


    The limit on how many times a cell can be used is not as scary as it initially sounds. On enterprise-grade SSDs, cells can take around 1M writes, which is usually enough to last around 20 years. Since most disks aren't expected to last over 3 years anyway, I wouldn't worry about that. The catch is that this estimate assumes you use all parts of the device at a uniform rate. That isn't a natural pattern of using data: we usually use some of the data much more frequently than other data. The SSD controller usually contains logic (wear leveling) to make sure it spreads the write-erase activity around regardless of how you use the data.
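    To see why wear leveling matters, this Python sketch compares the most-worn block with and without it. The workload is skewed so that 10% of the data receives 90% of the writes. The workload shape, block counts and function name are hypothetical. The leveling shown is the simple "dynamic" kind, where each rewrite goes to the least-worn erased block. Real controllers typically also relocate cold data so that its blocks share the wear.

```python
import heapq
import random


def max_block_wear(level, logical_blocks=100, spare_blocks=10, writes=100_000,
                   hot_fraction=0.1, hot_share=0.9, seed=1):
    """Highest erase count on any physical block after `writes` block rewrites."""
    rng = random.Random(seed)
    hot = max(1, int(logical_blocks * hot_fraction))
    erases = [0] * (logical_blocks + spare_blocks)
    mapping = list(range(logical_blocks))                        # logical -> physical block
    free = [(0, p) for p in range(logical_blocks, len(erases))]  # (wear, block) min-heap
    heapq.heapify(free)
    for _ in range(writes):
        # Skewed workload: hot_share of rewrites hit the first `hot` logical blocks.
        if rng.random() < hot_share:
            lb = rng.randrange(hot)
        else:
            lb = rng.randrange(hot, logical_blocks)
        if level:
            # Dynamic wear leveling: write the new version into the least-worn
            # erased block, then erase the old block and return it to the pool.
            _, target = heapq.heappop(free)
            old, mapping[lb] = mapping[lb], target
            erases[old] += 1
            heapq.heappush(free, (erases[old], old))
        else:
            # Fixed mapping: the same physical block is erased on every rewrite.
            erases[mapping[lb]] += 1
    return max(erases)


unleveled = max_block_wear(level=False)
leveled = max_block_wear(level=True)
print("most-worn block without wear leveling:", unleveled)
print("most-worn block with dynamic wear leveling:", leveled)
```

    With these made-up parameters, expect the leveled maximum to come out well below the fixed-mapping one. That lower maximum is what lets the uniform-wear lifetime estimate above stay roughly valid under a skewed workload.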

    Devices also differ in the interfaces they offer. The common interfaces are PCI-E and SATA. PCI-E is faster, offers more channels (better parallelism), and is more expensive. SATA is cheaper and fits into more legacy hardware.

    Now that we know a bit about how SSDs work and how to choose an SSD device, it's time for the important question: how do I use it to make my Oracle Database faster? Here are several options, from most effective to least recommended:

    * If you are not IO-bound, reconsider your decision to use SSD. If you don't see a significant number of physical reads and "db file sequential read" wait events in your AWR report, you won't notice much performance improvement from using SSD.

    * If you can afford it, put all your data files on SSD. This is a quick way to make sure you get excellent read and write performance.

    * If you have more data than space on your SSD, place on the SSD the segments that will benefit most from its random-read and random-write performance. Look for the segments where you do most of the reading and writing, and prefer segments with more reads than writes. Also prefer those with more sequential reads (Oracle's term for single-block, random-access reads) than scattered reads (multi-block reads from full scans). I have some queries that I use to pick the segments (https://github.com/gwenshap/Oracle-DBA-Scripts/blob/master/SSD.sql), and I also use the statistics in AWR reports.

    * If the most-used segments are the current partition in a partitioned table (a very frequent scenario), you'll want to write a job that creates each new partition on the SSD and moves it to magnetic storage when it stops being active.

    * If you can't decide which segments to use, you can use your SSD as a secondary cache for your SGA. The feature is called Database Smart Flash Cache, not to be confused with Exadata Smart Flash Cache. When you enable this feature, blocks that are evicted from the SGA are written to the SSD, where they can later be retrieved much faster than from disk. This means that writes will not get any performance benefit from the SSD, in the same way that a larger SGA won't benefit writes either.

    If your system is IO-bound and read-heavy, this is a very easy way to benefit from a smaller SSD without deeply inspecting your workload patterns and moving segments around. However, according to Guy Harrison's tests (http://guyharrison.squarespace.com/blog/2010/1/24/flash-tablespace-vs-db-flash-cache.html), the performance benefits from using SSD this way are not as impressive as selecting the right tables and placing them on SSD.
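    The flash cache behaviour described above can be illustrated with a toy two-tier cache. This Python sketch is a simplified model, not Oracle's implementation. The SGA and the flash cache are plain LRU lists of block numbers, and every block aged out of the SGA is copied to flash. The 25 microsecond flash read comes from the text; the 5 ms disk IO is an assumed figure. Writes are charged a disk write either way, which is why the flash cache does nothing for them.

```python
from collections import OrderedDict


class LRU:
    """Minimal LRU set of block numbers."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def hit(self, block: int) -> bool:
        if block in self.items:
            self.items.move_to_end(block)   # mark as most recently used
            return True
        return False

    def put(self, block: int):
        """Insert a block; return the block it pushed out, or None."""
        self.items[block] = None
        self.items.move_to_end(block)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)[0]
        return None


class BufferCacheModel:
    def __init__(self, sga_blocks: int, flash_blocks: int = 0,
                 flash_read_us: float = 25.0, disk_io_us: float = 5000.0):
        self.sga = LRU(sga_blocks)
        self.flash = LRU(flash_blocks) if flash_blocks > 0 else None
        self.flash_read_us, self.disk_io_us = flash_read_us, disk_io_us
        self.read_us = self.write_us = 0.0

    def _cache(self, block: int) -> None:
        aged_out = self.sga.put(block)
        if aged_out is not None and self.flash is not None:
            self.flash.put(aged_out)        # blocks aged out of the SGA go to flash

    def read(self, block: int) -> None:
        if self.sga.hit(block):
            return                          # buffer cache hit: no IO
        if self.flash is not None and self.flash.hit(block):
            self.read_us += self.flash_read_us
        else:
            self.read_us += self.disk_io_us
        self._cache(block)

    def write(self, block: int) -> None:
        self.write_us += self.disk_io_us    # the change still has to reach the data file
        if not self.sga.hit(block):
            self._cache(block)


def scan_cost(flash_blocks: int) -> BufferCacheModel:
    """Five passes over 1,000 blocks with room for only 100 of them in the SGA."""
    model = BufferCacheModel(sga_blocks=100, flash_blocks=flash_blocks)
    for _ in range(5):
        for block in range(1000):
            model.read(block)
    for block in range(100):
        model.write(block)
    return model


without, with_flash = scan_cost(0), scan_cost(2000)
print(f"reads:  {without.read_us / 1e6:.2f} s without flash, {with_flash.read_us / 1e6:.2f} s with flash")
print(f"writes: {without.write_us / 1e6:.2f} s without flash, {with_flash.write_us / 1e6:.2f} s with flash")
```

    In this run, reads drop from 25.00 s to 5.10 s once the second pass starts hitting flash. The writes cost 0.50 s in both cases.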
