TRANSCRIPT
Storage Efficiencies with Hyper-V at the Virtual and Physical Layer
The Perils & Benefits of Thin Provisioning, UNMAP & Snapshots
Didier Van Hoye, Technical Architect
© 2016 Veeam Software
Contents

Introduction
Thin Provisioning and UNMAP
  Thin provisioning
    Thin provisioning at the physical layer
    Thin provisioning at the virtual layer
    Thin provisioning at the physical and virtual layer
  UNMAP
    Space efficiencies at the virtual layer
    Elasticity at the virtual layer
  Performance issues with thin provisioning and UNMAP
Snapshots and checkpoints
  Storage arrays
    Snapshots are space efficient, not magic
    Don't run off to buy more storage before verification of the issue!
  Hyper-V checkpoints
  The effect of Snapshots and/or checkpoints on UNMAP
Is fragmentation an issue in all of this?
  On the storage array
  In regards to Hyper-V
    On the Hyper-V host
    Inside the VM
    The internal VHDX structure
  Closing thoughts on fragmentation
Other capabilities enhancing these efficiencies
  ODX
  ReFS
Conclusion
About the Author
About Veeam Software
Introduction

You might have read some articles or blog posts about the dangers of thin provisioning, UNMAP and snapshots. It's wise to point out these risks. Any powerful technology used in the wrong way is dangerous. In the end, no matter how automated our processes are, we are in control. So, the advice is to master the technologies we leverage and not become mindless slaves to them. Why? Because these technologies are about achieving storage efficiencies: efficiencies in delivering storage and offering data protection. We can use these technologies at the physical layer (storage array), at the virtual layer (Hyper-V) or a combination of both. What we choose depends on the environment, budget and workloads. It's about cost reduction, the ease and speed of recovering to points in time and offering data to Development Operations (DevOps) teams.
Make no mistake — when you have economies of scale, this is of the utmost importance. Saving 20% on $10 million of storage CapEx is serious money. When you're a small company, not having to buy another 10 TB of storage can make or break the budget for that year. It's a cost-cutting edge you need to compete at cloud scale. These efficiencies help Hyper-V surpass challenges beyond the ability of delivering a few hundred to over one million IOPS to virtual machines (VMs). These features are valuable to users of all sizes and types when used properly. So, let's take a closer look at the benefits, the risks and the smoke and mirrors. A lot of this is based on many years of running and supporting Hyper-V.
Storage Efficiencies with Hyper-V at the Virtual and Physical Layer
4© 2016 Veeam Software
Thin Provisioning and UNMAP

Thin provisioning

The benefits are clear. You don't waste storage space by handing it out only for it never to be used. The caveat with thin provisioning comes when you consume the maximum space allocated in the dynamically expanding VHDX files or on your storage array. You can deplete your storage capacity because you're basically lying to the operating system: it doesn't know that you've run out of storage under the hood.
This problem exists at both the virtual and the physical layer. You can create a 4 TB dynamically expanding VHDX file on a 2 TB LUN — thin provisioned or fully provisioned, it doesn't even matter here. Once your virtual hard disk outgrows the space available, your LUN won't have any operating room left and your VMs will suffer downtime. Likewise, when you create a 50 TB LUN on a SAN that only has 30 TB in total capacity, you'll be in trouble when you actually try to copy that much data onto the LUN.
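To make the risk concrete, here's a minimal sketch (illustrative only — the numbers come from the examples above, and the helper is not a real Hyper-V or SAN API) that computes how far storage has been promised beyond what physically exists:

```python
# Sketch: spotting over-commitment at either layer.
# Any ratio above 1.0 means the OS has been promised more space
# than is physically there to back it.

def overcommit_ratio(provisioned_tb, physical_tb):
    """Ratio of space promised to space actually available."""
    return provisioned_tb / physical_tb

# A 4 TB dynamically expanding VHDX on a 2 TB LUN:
print(overcommit_ratio(4, 2))    # 2.0: the VHDX can outgrow the LUN

# A 50 TB thin LUN on a SAN with only 30 TB of raw capacity:
print(overcommit_ratio(50, 30))  # about 1.67: trouble once real data nears 30 TB
```

Nothing breaks at the moment of provisioning; the downtime only hits when real data approaches the physical limit, which is exactly why monitoring matters.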
Note that Hyper-V, just like a decent SAN, will protect your data by no longer allowing it to be used. The actions of Hyper-V and/or your storage array prevent major data corruption issues. But you will suffer downtime for your virtual machines, because storage can't be generated out of thin air.
Barring the risk of running out of real, usable storage capacity, there are some other, performance-related risks with a thin provisioned LUN that we'll address when we discuss UNMAP and fragmentation.
It takes careful management and monitoring to make sure that you add storage capacity when it's needed. This is the responsibility of the administrator. Your organization must be able to handle ordering and deploying storage in a timely fashion for that to happen. Make sure you know your processes and timelines. Don't just assume them, or your extra storage capacity might never arrive or get provisioned. Also, know the procedure to actually deliver this storage to your VMs. Not all storage solutions are created equal, and not all can increase storage indefinitely.
So, plan ahead. Know your storage. Make sure you have some headroom and a safety margin. It's all about checks and balances. Best practices dictate that you do not overprovision and that you monitor your environment when the workloads are mission critical. Does this sound like a lot of work? Well no, not really. After the initial learning curve, it comes down to actually monitoring your environment and responding to alerts. You have to do it — and doing it well is far superior to not doing it at all or aiming for perfection. Trust me, managing storage is a hundred times less complex and expensive than managing people. You probably have dozens of middle managers doing that, so a meager 20% FTE for storage monitoring and management shouldn't be an issue.
Thin provisioning at the physical layer brings a lot of benefits, even when all your virtual disks are fixed. But it can also be combined with thin provisioning at the virtual layer. Dynamically expanding virtual disks provide thin provisioning for people who don't have a storage array with thin provisioning. We'll discuss these below.
But let's not just talk about the risks. There are real benefits, and you can do really interesting things that will save you loads of otherwise lost capacity. Performance issues are often not due to thin provisioning, because modern storage is often tiered or all-flash.
Thin provisioning at the physical layer
Thin provisioning at the physical layer means thin provisioning in the storage array. Let's take a look at this 10.5 TB LUN on a SAN where we disabled snapshots. It shows that the actual space we are using on the SAN is 402 GB; the rest of that 10.5 TB is not allocated.
Figure 1: A 10.5 TB LUN with 402.01 GB of actual storage consumed on the SAN
We'll delete about 200 GB of files and see that amount of space recuperated on the SAN.

Figure 2: That same 10.5 TB LUN after we deleted about 200 GB worth of data on it. After UNMAP — which happens automatically in Windows Server 2012 (R2) — the SAN is informed that this data has been deleted, and the space is reclaimed!
Do note that your SAN is often more capable than the OS at detecting truly used storage space. The OS doesn't know that a fixed VHDX is only really consuming 10 GB out of 50 GB or 127 GB. The SAN does. The same goes for dynamically expanding VHDX files on which space has been recovered on the SAN but where the file hasn't shrunk yet. So, the OS reports far higher storage use on a CSV LUN full of VMs with fixed-size or dynamically expanding VHDX files than the SAN does. As an extreme example, take a look at the same LUN as above.
Figure 3: The CSV reports the size the OS sees, not what’s consumed on the SAN.
There's a nice collection of fixed VHDX files on this CSV that are filled with just an OS and nothing more. So, they consume little space on the thin provisioned LUN on the SAN, but Windows doesn't know any better and reports the fixed VHDX file sizes.
Figure 4: Windows reporting 2.59 TB in use while in reality only 402 GB is consumed on the SAN.
Thin provisioning at the virtual layer
Even if you don't have a storage array that provides you with thin provisioning, you can enjoy its benefits thanks to the dynamically expanding VHDX. No storage capacity is required until you actually put data into the volume(s) on these virtual disks.
Figure 5: The real size consumed by the dynamically expanding virtual disks is reported on the Hyper-V host.
You might notice the difference in size between an empty VHD and VHDX. This is due to the new internal structure of the VHDX, which grows by default in block sizes of 32 MB versus 2 MB with a VHD. It also allocates some blocks as a buffer so it can fill requests fast while it expands with new blocks. The file sizes on the Hyper-V host reflect this when we write data to the dynamically expanding virtual disks.
Figure 6: The real size of the dynamically expanding virtual disks on the Hyper-V host has grown.
A thing to note here is that if the fixed VHDX files reside on a thin provisioned LUN on a storage array, they don't consume that space either; it's only consumed when data is written to them.
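The block-based growth described above can be sketched as a small calculation. The helper below is hypothetical and ignores VHDX metadata overhead (headers, block allocation table); it only illustrates that a dynamically expanding disk grows in whole blocks, and that the larger VHDX block means fewer growth operations:

```python
import math

def blocks_needed(data_mb, block_mb):
    # A dynamically expanding disk grows in whole blocks, so the data
    # area of the file is the written data rounded up to the block size.
    return math.ceil(data_mb / block_mb) * block_mb

# Writing 100 MB of data:
print(blocks_needed(100, 32))  # 128 -> VHDX: 4 grow operations of 32 MB
print(blocks_needed(100, 2))   # 100 -> VHD: 50 separate 2 MB grow operations
```

The slight over-allocation on the VHDX side (128 MB for 100 MB of data) is the buffer effect mentioned above: the disk can absorb new writes quickly instead of growing on every small write.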
The operating system inside the VM reports the size of the LUN as assigned, not the real size of the dynamically expanding virtual disks. The free space is never consumed on the host. This will not cause any issues as long as the space assigned is available when needed.

Figure 7: Inside the VM, the OS reports the assigned size of the LUNs and their free space.
I'll mention this for completeness, but today you should be using the VHDX format and not VHD. Much has been discussed about the benefits of VHDX, so I'm not going to repeat it here. Unless you need to move VMs back and forth between an older Windows Server 2008 R2 environment and your modern Windows Server 2012 R2 Hyper-V environment, you should really use VHDX and not look back. The performance is way better and the dynamically expanding VHDX is faster, on top of better resilience and data protection.
I didn't use dynamically expanding virtual disks in production at all prior to the introduction of the VHDX virtual disk file format. It was a lab/dev-only use case in the environments I worked at. This was because of a noticeable reduction in performance and an increased fragmentation of the VHD files on the LUN when too many dynamically expanding VHD files resided there and expansion happened frequently. The newer VHDX format has changed this significantly, as these disks now grow in larger block sizes. The performance is really good even during growth, and we've started using them in production for new deployments of "general purpose" VMs where we cannot predict the storage needs accurately. This avoids overprovisioning and saves wasted storage space on the storage array. It also prevents having to resize (shrink/extend) the virtual hard disk. Even if this can be done online while leveraging ODX for performance, it's not as easy as avoiding most of the issue. We are gradually trying larger VM sizes with dynamically expanding VHDX files, and their usage is growing overall. Having ODX helps with performance during the extension of the virtual disks, but this is most noticeable in cases where significant growth happens. We still leverage fixed VHDX files for really IO-intensive workloads.
Thin provisioning at the physical and virtual layer
There used to be a time when some SAN vendors didn't support dynamically expanding virtual hard disks on their thin provisioned LUNs. That was back in the days of VHDs and Windows 2008 (R2). But that's behind us, and it's perfectly fine with most vendors to combine thin provisioning at the physical and virtual layer. Dynamically expanding virtual disks and thin provisioned LUNs on a SAN complement each other.
UNMAP

This is the feature that allows both the virtual disk and the physical storage to deliver space efficiencies. It does this in multiple ways and at both the virtual and physical layer. Since Windows Server 2012, we have native support for UNMAP in the OS. This has made our lives a lot easier when it comes to telling the storage arrays that data has been deleted and the space can be recuperated. This used to be time consuming and required scripting and storage vendor agents to be deployed, causing an impact on performance (zeroing out the volumes).
With both a fixed VHDX and a dynamically expanding VHDX, UNMAP marks the free space that becomes available after the deletion of data as "not in use." This is passed via the host to the storage array which, if it's thinly provisioned, can also reclaim that space and use it elsewhere in the array.
Space efficiencies at the virtual layer
At the virtual layer, the free space in the virtual disk can be reused, which means that a dynamically expanding VHDX will not have to grow beyond what's really needed, as it knows very well that it has reusable free space. You may think that's also the case with a dynamically expanding VHD — and if it's a single volume, you're right — the experience is much the same. The benefit a VHDX with UNMAP has over a VHD becomes very visible, and more important, when a virtual disk has multiple volumes. Volumes in a VHD don't know about the free space at the disk level or whether space has been freed by another volume on that virtual disk. This means that when copying data into other volumes on the same dynamically expanding VHD, that disk will have to grow, which comes at a performance cost because it is allocating more space than is actually being used.
Another benefit of UNMAP is that a backup and Availability solution like Veeam® doesn't have to back up hard-deleted data in a VM. UNMAP has signaled to the VM that the data is deleted and that the space is free. This means Veeam Backup & Replication™ doesn't back it up either, reducing the overhead at your backup target. Starting from Veeam Availability Suite™ v9, thanks to the new BitLooker feature, we can skip blocks where deleted files were situated, even without UNMAP support from the guest OS/hypervisor.
Here's a logical example of UNMAP as an illustration. Let's assume it's a dynamically expanding virtual disk (VHD/VHDX). It has two volumes, which each have 50% of the maximum capacity of the dynamically expanding virtual disk.
Figure 8: A dynamically expanding VHDX/VHD with two volumes and no free space.
Figure 9: We copy data into volume 1. The VHDX/VHD grows as the volume fills.
Figure 10: We delete data from volume 1. There is free space.
Figure 11: We copy data into volume 2. UNMAP and VHDX allow the free disk space to be reused. The virtual disk does not have to expand.
Figure 12: We copy data into volume 2. VHD doesn't support UNMAP. The virtual disk does need to expand to provision space for volume 2. The result is a loss of performance and storage capacity.
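The sequence in Figures 8 through 12 can be mimicked with a toy model. This is not the on-disk VHD/VHDX format, just a hypothetical sketch of the accounting: with UNMAP, space freed in one volume is reusable disk-wide, so the file doesn't grow; without it, the disk must expand:

```python
# Toy model of a dynamically expanding disk holding two volumes.
class DynamicDisk:
    def __init__(self, supports_unmap):
        self.supports_unmap = supports_unmap
        self.file_size = 0   # space the virtual disk file occupies on the host
        self.free_pool = 0   # blocks known (via UNMAP) to be reusable

    def write(self, amount):
        if self.supports_unmap:
            # Reuse freed blocks first; only grow for the remainder.
            reused = min(amount, self.free_pool)
            self.free_pool -= reused
            self.file_size += amount - reused
        else:
            # Freed space is invisible at the disk level: always grow.
            self.file_size += amount

    def delete(self, amount):
        if self.supports_unmap:
            self.free_pool += amount  # UNMAP marks the space reusable

vhdx, vhd = DynamicDisk(True), DynamicDisk(False)
for disk in (vhdx, vhd):
    disk.write(100)   # Figure 9: copy data into volume 1
    disk.delete(50)   # Figure 10: delete data from volume 1
    disk.write(50)    # Figures 11/12: copy data into volume 2

print(vhdx.file_size)  # 100 -> freed space was reused, no expansion
print(vhd.file_size)   # 150 -> the VHD had to grow anyway
```

The extra 50 units on the VHD side are exactly the performance and capacity loss Figure 12 describes.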
Elasticity at the virtual layer
UNMAP allows a dynamically expanding VHDX to shrink. When you shut down the VM, the dynamically expanding VHDX will shrink. This provides efficiencies when you do not have a thin provisioned storage array. If you do have one, the UNMAP command is propagated via the Hyper-V host to the SAN, which can reclaim the space for use elsewhere.
There is a big difference to note between the virtual layer and the storage array. At the virtual layer, shrinking a dynamically expanding VHDX is possible. When you shut down the VM, the dynamically expanding VHDX will be shrunk by the amount of available free space. This requires the VM to shut down; it doesn't happen during restarts. That's OK. The use case for this is a large change in the space required in a dynamically expanding virtual disk. You do not want the yo-yo effect of a dynamically expanding virtual disk extending and shrinking all the time. This is senseless overhead which can affect performance, just like thinly provisioned LUNs at the physical layer. Highly volatile LUNs where large amounts of data are being written and deleted should not be thinly provisioned. That's one use case for thick provisioning via fixed virtual hard disks. I have described a similar situation in the blog post Mind the UNMAP Impact on Performance in Certain Scenarios.
Storage arrays are generally only good at expanding LUNs. Shrinking them isn't commonly available, to the best of my knowledge. True elasticity is only available at the virtual layer. Even with fixed VHDX there is elasticity, but it's not automatic: you have to shrink or expand the VHDX manually. Shrinking is only possible if you have free space on the disk. Also, don't forget that this requires a guest OS where you can resize volumes.
Performance issues with thin provisioning and UNMAP
It goes without saying that if you have heavy IO, fixed VHDX is the way to go. No matter how fast a dynamically expanding VHDX can grow, growth still has an impact on performance. ODX can mitigate part of this, but not everything. There are other issues you need to watch out for. When a dynamically expanding VHDX has shrunk at shutdown of the VM, it will have to grow again, needlessly. UNMAP itself comes at a cost when being executed. In most cases this is not an issue, but there are scenarios where it will hurt performance too much. This is the case with LUNs where large amounts of data are constantly written and then deleted after a certain amount of time, as with some monitoring systems, backups, etc.
Figure 13: A real-life example: a huge increase in the time needed to delete expired backups on the day UNMAP became active on the target host. After figuring this out, fully provisioning the LUNs was the solution.
It's wise to provision thick LUNs on the storage array to prevent performance issues in these scenarios. I prefer this to turning UNMAP off at the host level, which affects all LUNs on that host. A fully provisioned LUN on a SAN doesn't take away the thin provisioning/UNMAP benefits from the other LUNs. To understand when to use, or not use, thin provisioning, start with Plan and Deploy Thin Provisioning.
Snapshots and checkpoints

Both Hyper-V checkpoints (as snapshots are now called) and storage array snapshots provide us with a lot of flexibility for ease of recovery, data protection or delivering read-only data to certain workloads or test data to DevOps. They are fast and space efficient. That doesn't mean they are free: they can have an impact on your performance, and you can't make endless snapshots. They do consume storage space and memory on controllers, and they negatively influence performance, especially under certain conditions.
There are also some risks associated with ill-advised use of snapshots. But in our discussion here we'll mainly focus on the effect snapshots have on storage consumption, how they affect UNMAP and what their performance impact is.
You need to use them wisely and manage them with intent. Performance will suffer if you don't. Too much storage space will be consumed, and if you really get into trouble, they can cause downtime or even data loss. Knowledge of the technologies and of how to use them in your environment is key to optimizing the benefits they deliver while avoiding any, or most, issues.
Storage arrays

The technical details will differ between vendors, models, makes and versions of the storage arrays. So, to demonstrate that management and monitoring are needed to enjoy the benefits that snapshots deliver, we'll look at some real-life examples.
Note that most SANs support two types of snapshots: crash consistent ones and application consistent ones. The latter require a hardware Volume Shadow Copy Service (VSS) provider in order to create them. This is the type of snapshot you can leverage with your backup solution. Snapshots by themselves do not replace backups, as they violate the 3-2-1 rule.
Snapshots are space efficient, not magic
Snapshots are space efficient, but that doesn’t translate to endlessly creating snapshots without ever
running out of storage space .
Here's an example of a thinly provisioned LUN for SQL Server virtualization. We sized it at 1.9 TB (growing is easy, shrinking is not, so we're conservative in increasing LUNs). Thin provisioning at the physical layer (the SAN) means we only use 1.21 TB of the available storage for the actual data.
Figure 14: Note the space consumed by the snapshots of the LUN (light green).
Depending on the number of snapshots you retain, how frequently you take them and how volatile your environment is in adding and deleting data (leading to large deltas), the amount of storage consumed by snapshots on a LUN and in your SAN can become significant. Don't assume that just because snapshots are space efficient and only capture a delta, they are insignificant!
You can also see the RAID overhead (yellow). That's why the available storage is always less than the raw capacity of a storage array.
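As a rough back-of-the-envelope check (the formula and the numbers are illustrative assumptions, not a vendor sizing rule), you can estimate the space snapshots pin on a LUN from the change rate and the retention:

```python
def snapshot_overhead_tb(daily_change_tb, retention_days):
    """Rough upper bound: every retained day of snapshots pins roughly
    one day's worth of changed (overwritten or deleted) data."""
    return daily_change_tb * retention_days

# A volatile LUN changing 0.2 TB/day with 14 days of snapshots retained:
print(round(snapshot_overhead_tb(0.2, 14), 1))  # 2.8 -> TB pinned by snapshots alone
```

Even modest daily churn adds up quickly over a retention window, which is why "space efficient" never means "free."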
Don’t run off to buy more storage before verification of the issue!
Let me share another practical example of the importance of managing the snapshots in your storage arrays. In this case, some were already suggesting they urgently needed to buy extra storage capacity. But we dove into the environment first to see if the numbers made sense. They did, but not because the storage was being used for their real workloads. The actual storage needed for that, including snapshots, was way below the alerting threshold. The extra capacity "lost" was due to abandoned, orphaned snapshots in combination with a volatile, elastic environment where snapshots can be big.
Figure 15: Space reclaimed on the SAN by doing some long overdue maintenance.

So just doing maintenance — cleaning up snapshots that didn't expire for some reason, removing orphaned snapshots and making sure UNMAP was enabled on the hypervisor — recuperated over 80 TB of capacity. Monitoring and maintenance are one thing; you also need to know and understand your storage ecosystem, what's happening to it and why. A smooth sales guy could have earned a nice bonus here.
This was caused mainly by the orphaned snapshots, which also prevented UNMAP from doing its job. But just consider people who are not leveraging UNMAP with Hyper-V at all. You can often reclaim 15% to 20% of space after a number of years in production by turning it on!
Even if the above is an extreme example, do realize that you should always verify what's going on before you rush out to order additional capacity. You need to manage UNMAP and snapshots on both your Hyper-V hosts and your storage array. Remember that even when your storage array doesn't provide thin provisioning or UNMAP support, you'll gain benefits at the virtual layer, and that might be just what you need to make your resources go 15 to 20% further. That's real money right there — especially when you need to balance your budget.
Hyper-V checkpoints

Like storage based snapshots, Hyper-V checkpoints provide a space efficient and fast way of creating a point in time to revert back to. Much has been written and said about them. I'll summarize the most important aspects — both the benefits and the risks. The good news is that it's all done at the virtual layer, meaning it's a great feature for all of us who do not have hardware snapshot capabilities at our disposal.
When you want to test patches, deploy a new driver, or upgrade software or even the guest OS, it's a life saver. It saves us a ton of time in the lab and it's a blessing for demos. As soon as you notice that things are not working out, you can revert back to the original situation. You can chain checkpoints and revert back to distinct steps in the process that you're testing or performing. That's a great capability to have.
Do note that "soon" is important here. If you need to go back after a week or longer with a stateful application, you'll be in trouble. Basically, checkpoints are not supported in production; they are intended for very fast consumption shortly after being made, in test/development, by people who know what they're doing, and they shouldn't be kept around for a long time. You should use supported backup solutions, not checkpoints, for production. That's also the case with certain applications that have their own extensive and complex mechanisms for dealing with their data consistency, such as SQL Server, Exchange and Active Directory (AD). Sure, AD has become more intelligent about being virtualized, but people still get into trouble. Reading the documentation is very important here. One way to mitigate the issues above is taking down the VM before taking the checkpoint. That's actually the safest way of using them.
Checkpoints also allow for cloning of a VM. That means you can give a copy of a production VM to a developer to test an application upgrade, to a security auditor, or to an engineer to reproduce issues. These are nice capabilities, but you should not use them in production environments, and even for the above use cases you must really know your workloads to know what you can get away with.
Checkpoints have other issues you need to be aware of:
• They consume disk space and can become large in volatile environments
• People forget they need space for a checkpoint merge to happen
• They are bad for performance, and it only gets worse the bigger the checkpoint chain becomes — you pay even more in lost performance during backups
• They can complicate migrations to different hardware or Hyper-V versions, which matters when you need to replace the Hyper-V host or migrate to a newer version of the parent partition
• They can wreck applications or even wreak havoc in your environment leading to data loss, service
down time and the cost of recovering from this — they do NOT replace backups
• You can’t extend a VM’s disks that have checkpoints because it will cause problems and potential data loss
• Some things have improved over time, since we can now merge checkpoints online. In the past, people who deleted a checkpoint (or snapshot, as it was called then) and forgot about it were surprised that it still had to merge when they shut down the VM. That led to issues ranging from longer downtime, because a large avhdx takes time to merge, to outright failure: you need sufficient free space to merge the avhd(x) into the vhd(x), which might not be available after a longer period of time. But the good news is, merging checkpoints can leverage ODX!
• Tracking and managing large numbers of checkpoints is tedious and error prone
Figure 16: Those checkpoints can add up. Don’t leave them lingering around without a reason.
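The merge-space problem from the bullet list above can be sketched as a simple pre-flight check. The rule of thumb in the code is a hedged simplification (enough free space for the largest differencing disk), not Microsoft's exact space requirement, and the sizes are hypothetical:

```python
# Sketch: before deleting checkpoints, the .avhdx files must merge back
# into their parent, which temporarily needs free space on the volume.

def can_merge(avhdx_sizes_gb, free_space_gb):
    # Simplified rule of thumb: worst case, merging needs roughly as much
    # free space as the largest differencing disk still to be merged.
    return free_space_gb >= max(avhdx_sizes_gb)

chain = [120, 45, 80]          # a hypothetical chain of three avhdx files, in GB
print(can_merge(chain, 100))   # False: the 120 GB avhdx won't fit
print(can_merge(chain, 200))   # True
```

Running a check like this before cleanup day is far cheaper than discovering mid-merge that the volume is full.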
Note that we now have a second type of checkpoint in Windows Server 2016: “Production Checkpoints.” Production checkpoints use VSS from within the VM. This is similar to a hardware VSS-assisted SAN snapshot and behaves much like a backup: the VM and the applications are aware of what happened. It’s not just a point-in-time freeze. When you apply a production checkpoint, the VM will be turned off and you’ll need to boot it to restore the VM with application consistency. Applications like SQL Server and Exchange should be able to handle production checkpoints just like they handle normal backups. They have their own requirements and conditions regarding backups that you still have to meet, but this is a supported and feasible way of using a production checkpoint. Do note, however, that production checkpoints by themselves are no substitute for backups!
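As a quick sketch of what that looks like in practice (the VM name is a placeholder, and this assumes a Windows Server 2016 host with the Hyper-V PowerShell module):

```powershell
# Force production checkpoints for this VM; use -CheckpointType Production
# instead if you want a fallback to a standard checkpoint when the
# in-guest VSS request fails
Set-VM -Name 'SQLVM01' -CheckpointType ProductionOnly

# Take an application-consistent checkpoint via the guest's VSS writers
Checkpoint-VM -Name 'SQLVM01' -SnapshotName 'Pre-patch'
```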
The effect of Snapshots and/or checkpoints on UNMAP
When you are leveraging UNMAP together with snapshots or checkpoints, you might notice that it doesn’t seem to work, or that when it does, it’s with a certain time delay. It seems unpredictable to people. To understand why, we need to understand how the two technologies affect each other.
Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, so does the data in them: UNMAP will not free up space on thinly provisioned LUNs on the SAN. This is normal, because the data is still stored on the SAN for those snapshots, and hard-deleting it from the VM or host has no impact on the space the SAN uses until those snapshots are deleted or expire. Only the active portion of the data is directly affected.
This is also the case with Hyper-V checkpoints. When you create a checkpoint, the VHDX is kept and you write to the avhdx (differencing disk), meaning that any UNMAP activity will only affect data in the active avhdx file and not in the “frozen” parent file. When you have a longer chain, data is spread over even more avhdx files and the results of UNMAP become even less predictable.
The good news is, it all still works. As your snapshots expire or your checkpoints are deleted or merged, the benefits will materialize. This should not be an issue, as the true value of thin provisioning and UNMAP is realized over time. If it is a concern for certain VMs, don’t use checkpoints or snapshots on the LUN where they reside.
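If you want to see the space come back sooner once checkpoints are merged or SAN snapshots have expired, you can trigger UNMAP manually instead of waiting for the maintenance job. A minimal sketch, run inside the guest against a placeholder D: volume:

```powershell
# Report how much space could be reclaimed on the volume
Optimize-Volume -DriveLetter D -Analyze -Verbose

# Send UNMAP (retrim) for all free space down the storage stack
Optimize-Volume -DriveLetter D -ReTrim -Verbose
```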
Is fragmentation an issue in all of this?
You’ll most likely have more urgent concerns than fragmentation in your Hyper-V environment, at least on modern hardware with a well-designed Hyper-V deployment and the correct choice of virtual disk. But it does exist, and both extremes (“it’s a huge, widespread issue” and “it doesn’t matter at all”) aren’t very helpful. Let’s take a quick look at fragmentation, especially in relation to thin provisioning and dynamically expanding virtual disks.
On the storage array
Great news: the internal structure of the storage array isn’t your problem. It’s the vendor’s responsibility. In modern storage arrays with auto-tiering, snapshots, deduplication and virtualized storage systems, the LUNs aren’t exactly contiguous files, and Windows doesn’t know anything about that anyway. Then there’s the fact that with SSDs, “classic” fragmentation is a non-issue, and we’re seeing ever more SSDs out there. However, even with SSDs and “virtualized storage” in storage arrays, fragmentation still comes into play for performance. The point is that storage vendors make their money from delivering robust, reliable, highly performant and capable storage. They need to worry about how they solve the technical challenges.
The storage array is an abstraction layer, and the only thing you can do is ask your storage vendor for advice on what to do about defragmentation on the Windows and Hyper-V side of things. There’s a 95% chance they’ll tell you that they handle it “under the hood,” tell you to “ask Microsoft,” or say nothing at all. I’ve learned to stop worrying and choose a good storage solution that I love.
In regards to Hyper-V
There are three types of disk fragmentation you might need to deal with in regards to Hyper-V:
1. Fragmentation of the file system on the host LUN where the VMs reside
2. Fragmentation of the file systems on the LUNs inside of the VM
3. Block fragmentation of the VHDX itself: a potential issue with dynamically expanding and differencing disks
On the Hyper-V host
When you use a lot of dynamically expanding virtual disks, and they all grow frequently over time, fragmentation will build up. This may or may not cause performance issues; things aren’t as simple on a modern storage array as they used to be on a single HDD. The good news is, when this becomes a problem and you’ve weighed the impact on other subsystems, you can defragment your LUNs, and even the CSVs, without down time. Check out the Microsoft blog post on the subject, How to Run ChkDsk and Defrag on Cluster Shared Volumes in Windows Server 2012 R2. Decent third-party tools will support this by default.
Take a look at the Optimize Drives GUI from a Hyper-V host that has both local, non-CSV LUNs and CSV LUNs.
Figure 17: A Hyper-V cluster host with local and SAN drives.
The local disks are identified as hard disk drives; the SAN LUNs, both non-CSV and CSV, are identified as thin provisioned drives. Pretty smooth. Here’s another from a lab server of mine. You can see that Windows tries to identify the type of disk and will run the optimization based on that. If you think that SSD/flash storage means absolutely no fragmentation, I invite you to read Scott Hanselman’s excellent post on the subject, The real and complete story — Does Windows defragment your SSD?.
Figure 18: A server with local direct attached storage only, showing both solid state and hard disk drives
Inside the VM
That’s your standard, run-of-the-mill fragmentation within the OS and data partitions. Whether this is problematic and requires regular maintenance depends on the characteristics of the workload. Since Windows Server 2012, we have the storage optimizer, which runs as a maintenance job in Windows. The storage optimizer will optimize things for you as best it can in an intelligent way. Please note that its capabilities have evolved a lot over the various Windows versions, so read What’s new in defrag for Windows Server 2012/2012R2.
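You can verify that this maintenance job is actually enabled, and see what the optimizer would do, without changing anything. A sketch, assuming the default scheduled task location:

```powershell
# The storage optimizer runs as this built-in scheduled task
Get-ScheduledTask -TaskPath '\Microsoft\Windows\Defrag\' -TaskName 'ScheduledDefrag' |
    Select-Object TaskName, State

# Analyze only: reports fragmentation without touching the volume
Optimize-Volume -DriveLetter C -Analyze -Verbose
```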
I do need to clarify one thing. When you look at Optimize Drives inside a Windows Server 2012 R2 VM, you’ll see all disks identified as thin provisioned drives.
Figure 19: Optimize Drives in the guest OS reporting the media type — It’s all thin provisioned drives.
The screenshot is of a VM that has every type of disk attached. The Optimize Drives GUI inside this VM shows you a mix of fixed and dynamically expanding VHD/VHDX virtual disks. The VM resides on a CSV provided by a modern SAN. So why are they all identified as “thin provisioned drives”? A VHD doesn’t know about UNMAP, does it? How does that compute? Well, my understanding is that all virtual disks, dynamically expanding or fixed, both VHDX and VHD, are identified as thin provisioned disks no matter what type of physical disk they reside on (CSV, SAS, SATA, SSD, shared/non-shared). This allows UNMAP commands to be sent from the guest to the Hyper-V storage stack below. If it’s a VHD, those UNMAP commands are basically black-holed, just like they’d never be passed down to a local SATA HDD (on the host) that has no idea what UNMAP is or what it’s used for. The physical layer will figure out what to do.
But, to get back to the part about fragmentation: unless you see serious performance degradation, you’ll be fine without intervention. The key is to know your workloads and data/file behavior, so you know where fragmentation might cause issues, and to monitor for that.
The internal VHDX structure
The least-known one to watch is internal fragmentation of the VHDX file itself. Not because it’s a huge problem, but because few people are aware of it. It’s easy to spot via PowerShell, and it’s fixed by creating a new virtual disk using the fragmented one as the source. See Hyper-V and Disk Fragmentation. But that would cause down time, so the best way to handle it is to leverage storage live migration. If you can leverage ODX, it’s also pretty fast. So, always leave yourself some margin on storage space to be able to perform maintenance operations via storage live migration. However, there is little guidance on what percentage is acceptable. I wouldn’t go through the effort for less than 15% fragmentation. Your mileage may vary.
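As a sketch of both the check and the fix (the VM name and destination path are placeholders), block fragmentation shows up as the FragmentationPercentage property reported by Get-VHD, and a storage live migration rewrites the file at the destination:

```powershell
# Report block fragmentation for every virtual disk attached to the VM
Get-VM -Name 'FileServer01' | Get-VMHardDiskDrive | Get-VHD |
    Select-Object Path, VhdType, FragmentationPercentage

# If the percentage is high (say, above ~15%), move the storage; the
# copy lays the VHDX down contiguously, without down time for the VM
Move-VMStorage -VMName 'FileServer01' `
    -DestinationStoragePath 'C:\ClusterStorage\Volume2\FileServer01'
```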
Closing thoughts on fragmentation
Dynamically expanding virtual disks are more prone to fragmentation than fixed virtual disks. However, experience shows me that heavy IO workloads with many reads, writes and deletes, and potentially a large delta due to the volatile nature of the data, are not good candidates for dynamically expanding virtual disks in terms of performance anyway. So choose your workloads for dynamically expanding virtual disks wisely, and your fragmentation problems will be kept to a minimum as well.
Read up on the subject in What’s new in defrag for Windows Server 2012/2012R2 and you’ll learn that running defragmentation is not a simple yes-or-no question. Defragmentation itself has changed in what it does and how it does it. It has evolved with the versions of Windows and with modern tiered storage, SSD/flash, deduplication and the like. Just doing classic, old-school defragmentation out of habit is pure institutional inertia at work. Don’t be like that. Find out whether there is a real issue, and if so, solve it in an intelligent manner. Stating that it’s not an issue anymore is overly simplistic. Observe, decide, act (ODA) is the way to go. Don’t just “act” by default “just in case.” If you do decide to leverage third-party solutions for defragmentation, choose a good-quality, modern product that knows how to deal with modern storage and not just hard disk drives.
Other capabilities enhancing these efficiencies
ODX
In all situations where the storage array (or arrays) supports offloaded data transfer (ODX), you’ll see a significant increase in performance when copying or moving data. The array or arrays do the heavy lifting under the hood.
You’ll enjoy the benefits from ODX when:
• Resizing VHD/VHDX virtual hard disks
• Creating VHD/VHDX virtual hard disks
• Merging checkpoints or differencing disks
• Storage live migration or shared nothing live migration
• Copying data on the same LUN or between LUNs on the same servers
(physical and virtual) or different servers
As you can see, Hyper-V efficiencies are enhanced by ODX when it’s available at both the virtual and the physical layer.
There’s only one rule: the data has to live within a storage array, or a cluster of arrays, that supports ODX. At a high level, ODX works with a token.
Figure 20: ODX
Windows ODX allows direct data transfer within or between compatible storage devices, bypassing
the servers . This minimizes latencies, maximizes throughput and reduces CPU and network load on the
server hosts . It all happens transparently and requires no change in processes .
ODX uses a token for reading and writing data within or between intelligent storage arrays. The token is only exchanged between the source and the target. This token is what allows the process to be transparent to the operating system while the actual data transfer happens in or between the storage arrays. ODX avoids all the overhead associated with “ordinary” data copies or moves, which burn CPU cycles, incur multiple data copy actions and require context switches to push the data across the TCP/IP stack between servers.
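A quick way to confirm ODX is in play on a host, and to see the effect, is sketched below (the registry value comes from Microsoft’s ODX documentation; the file paths are placeholders):

```powershell
# 1 disables ODX on this host; 0, or the value being absent, means
# ODX is enabled
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' |
    Select-Object FilterSupportedFeaturesMode

# On an ODX-capable array, a copy between two LUNs on that array is
# offloaded: it finishes at array speed with barely any host CPU or I/O
Measure-Command { Copy-Item 'E:\VMs\template.vhdx' 'F:\VMs\clone.vhdx' }
```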
ReFS
In Windows Server 2016, ReFS is enhanced with capabilities that make it extremely fast at file operations that are important and useful in Hyper-V scenarios. The creation and merging of virtual disks now happens at lightning speed. For this set of actions, it’s like getting ODX behavior without needing a specialized storage array: an impressive and very economical way of getting even better results from commodity storage.
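You can see this for yourself with a quick experiment (the drive letters are placeholders for an ReFS and an NTFS volume on the same Windows Server 2016 host):

```powershell
# Creating a large fixed VHDX on ReFS is near-instant, while NTFS has
# to zero out the full 100 GB
Measure-Command { New-VHD -Path 'R:\refs-test.vhdx' -SizeBytes 100GB -Fixed }
Measure-Command { New-VHD -Path 'N:\ntfs-test.vhdx' -SizeBytes 100GB -Fixed }
```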
Conclusion
Thin provisioning, UNMAP and snapshots are very important tools, available to you to provide efficiencies and speed at both the virtual and the physical storage layer. It would be a tremendous waste to ignore them, whether by not realizing what they can do for you or out of fear of the risks. Learn to master your tools and use them to your advantage. That means knowing their strengths and weaknesses so you can manage the risk while achieving your goals. The raw power of storage (speed, low latency and capacity) can only shine brighter when assisted by correctly used intelligent features that help us deliver efficiencies. Other storage enhancements like ODX and ReFS act as force multipliers and help storage, which has become very scalable and capable in Windows Server 2012 R2 Hyper-V, shine even more. All we need to do is remember that with great power comes great responsibility. Knowledge is your friend. Thin provisioning, UNMAP and snapshots will deliver great value when used correctly for the right workloads and scenarios.
About the Author
Didier Van Hoye is an IT veteran with over 17 years of expertise in Microsoft technologies, storage, virtualization and networking. He works mainly as a subject matter expert advisor and infrastructure architect in Wintel environments, leveraging DELL hardware to build the best possible high-performance solutions with great value for money. He contributes his experience and knowledge to the global community as a Microsoft MVP in Hyper-V, a Veeam Vanguard, a member of the Microsoft Extended Experts Team in Belgium and a DELL TechCenter Rockstar. He shares his expertise through blogging, writing and public speaking.
Twitter @workinghardinit
Blog http://blog.workinghardinit.work
LinkedIn http://be.linkedin.com/in/didiervanhoye
About Veeam Software
Veeam® recognizes the new challenges companies across the globe face in enabling the Always-On Business™, a business that must operate 24/7/365. To address this, Veeam has pioneered a
new market of Availability for the Always-On Enterprise™ by helping organizations meet recovery
time and point objectives (RTPO™) of less than 15 minutes for all applications and data, through
a fundamentally new kind of solution that delivers high-speed recovery, data loss avoidance,
verified protection, leveraged data and complete visibility . Veeam Availability Suite™, which
includes Veeam Backup & Replication™, leverages virtualization, storage, and cloud technologies
that enable the modern data center to help organizations save time, mitigate risks, and
dramatically reduce capital and operational costs .
Founded in 2006, Veeam currently has 37,000 ProPartners and more than 183,000 customers
worldwide . Veeam's global headquarters are located in Baar, Switzerland, and the company has
offices throughout the world . To learn more, visit http://www.veeam.com .