8/12/2019 9 Storage Considerations Vm Administrators
http://slidepdf.com/reader/full/9-storage-considerations-vm-administrators 1/10
9 Storage Considerations for VM Administrators
What a VM administrator needs to know to avoid
performance problems caused by insufficient storage
capacity and plan for environment growth
WHITEPAPER BY ALEX ROSEMBLAT
© 2010 VKernel Corporation. All rights reserved.
Table of Contents
Introduction
Insufficient Storage Capacity Causes VM Performance Problems
How VM Data Flows to Shared Storage: From Host to SAN and Back
The Hardware and Design Decisions that Determine Storage Capacity
Storage Space
Throughput
Inter-connectivity/Fabric
Host Bus Adaptor
Storage Controller
Spindles
How to Forecast Storage Capacity Requirements
Virtualization will Vastly Increase Storage Volume
Virtual Environments Employ Numerous Host to Datastore Connection Permutations
Virtualization Will Cause Unpredictable Throughput Concentrations
Additional Storage Abstraction Layers are Used with Virtualization
Virtualization Introduces Environment Dynamism and Automation
Conclusion
Introduction
Many VM performance issues stem from bottlenecks in a data center’s storage throughput capacity.
These issues originate from inadequate planning for a virtual environment’s storage needs. They can be
avoided with visibility into the storage infrastructure and a planning process that expands storage
resources in line with the application usage growth that the environment will face. Even for data
centers that employed Storage Area Networks (SANs) for their physical servers before shifting to a virtual
infrastructure, virtualization adds intricacy that must be
factored into storage planning decisions, documentation procedures, and issue troubleshooting. This
whitepaper presents nine considerations that VM administrators must understand about storage to avoid
VM performance problems, forecast storage capacity needs, and understand how virtualization
makes data center storage more complex.
Insufficient Storage Capacity Causes VM Performance Problems
Planning for storage in a data center is critical as environments with insufficient storage capacity will
experience VM performance problems. Virtual environments that utilize a SAN for their storage needs
experience a complex resource sharing arrangement as VMs and hosts connect to a separate storage
area and share bandwidth within this connection, as well as the actual storage space. Disk I/O and the
storage space itself are two vital resources that a VM requires to function. If a VM needs storage space
and there is none left, the VM will cease to function. Without sufficient disk I/O capacity, commands will
be delayed as they wait their turn to pass through the interconnective “fabric” to communicate to the
SAN or host. When commands reach the actual disks, if the disks do not have the physical capacity or
have not been configured to handle the volume of requests, commands will once again be delayed as
they wait their turn to access the disk.
Due to the possibility of capacity bottlenecks which will cause VM performance problems, storage
implementations must be finely tuned to an organization’s requirements. This can become a
complicated process in and of itself as there is an immense number of vendors and configurations to
select from with a wide spectrum of costs at nearly every level of storage infrastructure. Finding the
right “mix” for a data center’s needs at an appropriate price can be challenging. To add further
complication to the process, many organizations employ a separate IT department to manage the
storage infrastructure. That department may not have visibility into the firm’s application requirements,
usage growth rates, and expansion forecasts which are vital to determine the amount of storage
capacity a data center needs.
How VM Data Flows to Shared Storage: From Host to SAN and Back
Shared storage requires an infrastructure with several different hardware components. Importantly,
data moves through the entire infrastructure only as fast as its lowest-capacity component, the
“weakest link”, allows. For example, if an organization has deployed high capacity spindles (the
actual “disks” that are being written to), but is using a much lower capacity “fabric” for
interconnectivity between the ESX or Hyper-V hosts and the SAN, the advantage of
having the higher capacity spindles will be negated. Consequently, the entire storage system will only be
able to handle the amount of data that can fit through the fabric. Due to the systemic nature of storage,
a holistic view of an organization’s application performance needs is required to define storage and
storage access capacity requirements.
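
The “weakest link” rule can be sketched in a few lines of Python. This is only an illustration: the component names and MB/s figures below are hypothetical values chosen for the example, not measurements or vendor specifications.

```python
# Hypothetical throughput ceilings for each component in the path, in MB/s.
# The numbers are illustrative assumptions only.
component_mbps = {
    "host_bus_adaptor": 800,
    "fabric": 100,
    "storage_controller": 1600,
    "spindles": 1200,
}


def effective_throughput(components):
    """Return (bottleneck_name, mbps) for components connected in series."""
    name = min(components, key=components.get)
    return name, components[name]


bottleneck, mbps = effective_throughput(component_mbps)
print(f"Path is limited to {mbps} MB/s by the {bottleneck}")
```

In this sketch the spindles could deliver far more than the fabric can carry, so upgrading the spindles would not change the result until the fabric is upgraded as well.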
Figure 1 illustrates how data flows from a host to a disk. As a command leaves the ESX or Hyper-V host
for the datastore, it travels to the Host Bus Adaptor, which acts as a “travel agent” for the
command and tells it where it needs to go within the SAN and how to get there. After the command passes
through the Host Bus Adaptor, it crosses the inter-connectivity layer, or fabric, to arrive at the
SAN. There it reaches the Storage Controller, which points the command to the correct
spindle. The command reaches the physical spindle, where it executes its instructions and is given a
response to deliver back to the ESX or Hyper-V host. The response then travels through the same
infrastructure levels in reverse, back to the host. A bottleneck or issue at any of the levels in this flow
will cause performance issues, which will be detected as increased latency.
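
As a rough illustration of why a problem at any layer shows up as latency, the round trip can be modeled as the sum of each layer’s delay in both directions plus the time the spindle takes to service the command. The layer names follow the flow described above; the millisecond values are made-up assumptions for the sketch.

```python
# Hypothetical one-way delay added by each layer, in milliseconds.
hop_latency_ms = [
    ("host_bus_adaptor", 0.05),
    ("fabric", 0.20),
    ("storage_controller", 0.50),
]
spindle_service_ms = 6.0  # assumed time for the disk to execute the command


def round_trip_ms(hops, service_ms):
    """Latency seen by the host: out through every hop, disk service, back again."""
    one_way = sum(ms for _, ms in hops)
    return 2 * one_way + service_ms


def slowest_layer(hops, service_ms):
    """Name the layer that contributes the most to the round trip."""
    layers = [(name, 2 * ms) for name, ms in hops] + [("spindle", service_ms)]
    return max(layers, key=lambda layer: layer[1])[0]


total_ms = round_trip_ms(hop_latency_ms, spindle_service_ms)
print(f"Round trip: {total_ms:.2f} ms, dominated by the {slowest_layer(hop_latency_ms, spindle_service_ms)}")
```

If the fabric delay in this model rises because commands queue behind one another, the host sees only a larger total, which is why latency alone does not reveal which layer is at fault.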
[Figure 1 diagram: ESX/Hyper-V Host -> Host Bus Adaptor -> Inter-connectivity/Fabric -> Storage Controller -> Spindle]
Figure 1 – The Hardware Components in the Storage Data Flow
The Hardware and Design Decisions that Determine Storage Capacity
As described above, there are many moving parts within the storage infrastructure. As a result, many
decision points go into determining the specifications and configurations of the different
hardware parts that make up this resource. To determine those specifications and plan appropriately
for environment growth, information on current data production, the number of data transactions, and,
importantly, the forecasted growth in applications and application use is vital. As mentioned above, the
weakest link in the “storage infrastructure” chain will determine the total throughput capacity for the
entire storage resource.
Figure 2 shows how the hardware decisions that must be made are interconnected. The hardware
supplied at each of these points and its configuration will determine total storage capacity. At its highest
level, storage capacity is made up of actual storage space (i.e., the rough number of gigabytes available
to store data) and the data throughput that the SAN can take in. Throughput is determined by the
volume of data that the connection from host to SAN can carry, and also by the capacity of the hardware
that processes the data to read and write it to disk, namely the host bus adaptors, storage controllers,
and spindles. More detailed descriptions of the data needed to determine a requirement at each level are
included below Figure 2.
[Figure 2 diagram: Storage Capacity is made up of Storage Space and Throughput; the connected decision
points are Storage Hardware, Overhead, Host Bus Adaptor, Inter-connectivity/Fabric, Storage Controller,
and Spindles]
Figure 2 – The Connections in Hardware Decision Points that Determine Storage Capacity
Storage Space
Storage space refers to the actual amount of disk space needed for a virtual environment to continue
running. Importantly, if a VM needs storage space and there is no more available, the VM or an
application on the VM will cease functioning. Virtualized infrastructure storage needs can be much
larger than the storage that was necessary when the same applications ran on physical servers, because VMs
will likely require more storage space than just their allocations. Each VM may have an associated
snapshot for quick maintenance purposes. Additional storage will also be needed to host all templates
used to quickly provision VMs, and there may be use cases where multiple copies of the same
VMDK or VHD file must exist. Additionally, as VMs can move around with vMotion, the entire storage
resource is always fluid and dynamic, and a buffer must be left so that the environment has enough
“slack” to rebalance itself when necessary.
Notably, decisions on data redundancy and the RAID scheme employed will impact the amount of
storage space needed. Depending on the RAID scheme, up to double the amount of physical storage
space will be needed for all actual data being stored. To further add complexity, different parts of the
SAN may be equipped at different RAID levels based on the criticality and performance needs of the
information being stored and accessed. Accordingly, the decisions that are made at the spindle level will
directly impact the amount of storage space that is needed.
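
The effect of the RAID scheme on physical space can be estimated with the usual usable-capacity fractions for common RAID levels (mirroring keeps half of the raw space, single parity loses one disk per group, double parity loses two). The sketch below assumes equally sized disks and a hypothetical group size of eight; the function names are invented for the example.

```python
def usable_fraction(raid_level, disks_per_group):
    """Fraction of raw capacity left for data, assuming equally sized disks."""
    if raid_level == "RAID0":
        return 1.0
    if raid_level in ("RAID1", "RAID10"):
        return 0.5  # every block is mirrored once
    if raid_level == "RAID5":
        return (disks_per_group - 1) / disks_per_group  # one disk's worth of parity
    if raid_level == "RAID6":
        return (disks_per_group - 2) / disks_per_group  # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


def raw_gb_needed(usable_gb, raid_level, disks_per_group=8):
    """Physical capacity to purchase in order to store usable_gb of data."""
    return usable_gb / usable_fraction(raid_level, disks_per_group)


for level in ("RAID0", "RAID10", "RAID5", "RAID6"):
    print(level, round(raw_gb_needed(1000, level)), "GB raw for 1000 GB of data")
```

The mirrored case is the “up to double” scenario mentioned above. When different parts of the SAN use different RAID levels, the same calculation would be applied to each part separately and the results summed.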
Throughput
Disk throughput is the most common resource in which capacity bottlenecks can arise. These issues can
be difficult to pinpoint, and are often sensed through a high latency value. Thus, ensuring that the
hardware in the storage solution accurately fits the expected data transaction need is critical.
Inter-connectivity/Fabric
The connection between the host and the datastore is a critical factor in determining the amount of
throughput that will be available to the SAN. Several options for both connection technology and file
standards exist, with price varying almost directly with the bandwidth available, and thus with the
throughput capacity of such hardware. Although some high capacity solutions such as Fibre Channel may be
very expensive, storage throughput has such a high impact on VM performance that such solutions
may be necessary to maintain a high level of service, depending on the types of applications that are
running in the environment.
Host Bus Adaptor
The host bus adaptor is the piece of hardware that will direct the command from the host to the disk,
and then catch the return message. Generally, this hardware component is specified by the host
hardware vendor.
Storage Controller
The storage controller is the piece of hardware that receives commands from the fabric at the SAN and
interfaces with the spindles. Depending on the RAID standard used, the controller may need to perform
additional calculations for complex RAID performance or data redundancy
implementations. If such RAID standards are used, additional time, measured in milliseconds, will be needed
every time that a command is sent to disk or returned. This could impact performance, especially if
other areas of the storage infrastructure become constricted.
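
One common way to put a number on this RAID overhead is the “write penalty” rule of thumb, which is not part of this whitepaper and is used here only as an assumption: each host write turns into roughly 2 back-end disk operations under mirroring, 4 under single parity, and 6 under double parity. A minimal sketch with hypothetical workload figures:

```python
# Rule-of-thumb back-end operations per host write (an assumption for this sketch).
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(frontend_iops, write_ratio, raid_level):
    """Disk operations the spindles must absorb for a given host workload."""
    writes = frontend_iops * write_ratio
    reads = frontend_iops - writes
    return reads + writes * WRITE_PENALTY[raid_level]


# Hypothetical workload: 1000 host operations per second, half of them writes.
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, backend_iops(1000, 0.5, level), "back-end operations per second")
```

Under these assumptions, the same host workload demands noticeably more from the spindles as the redundancy scheme becomes more complex, which is the effect described above.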
Spindles
The spindle is the actual disk that will store the data that is being written and accessed by the virtual
environment. Tremendous variability exists in the technical capabilities of spindles, with price
directly mapping to performance and amount of storage space. Additionally, a fabric must exist between
the storage controller and the spindles. Similar to the fabric that connects the SAN to the hosts, it can
vary greatly in throughput capacity and price. As mentioned previously, if the throughput capacity at the
spindle level does not match the other components in the storage infrastructure stack (most critically,
the fabric connecting the host to the SAN), some of the capacity that a disk has will not be accessible, or
vice versa.
The RAID standards that will be employed must be decided on at the spindle level, and this decision will
directly affect the aggregate performance of all disk reads and writes. RAID standards will also drive the
total amount of storage that will be needed, as more intensive RAID standards for data redundancy or
performance require higher storage overhead, which translates into greater amounts of storage space.
Additionally, if there is any level of deduplication of the data (i.e. redundancies shared by several data
objects such as when an often-used operating system within a VM is housed in one place), this will make
storage usage more efficient.
How to Forecast Storage Capacity Requirements
Without adequate storage or disk throughput resources, VM performance will be gravely affected and
an environment may be unable to grow. Thus, storage and disk I/O needs must be calculated in advance,
and if a separate department manages storage, it must be given visibility into application-side growth
for accurate storage capacity forecasting. To begin forecasting capacity,
system administrators must first take a baseline measurement of their current environments.
Importantly, the following questions must be answered:
- What is the total volume of data that is transacted between read and write operations per
  second for every second of a typical week and what is the breakdown per application?
- What are the high-water marks in total data volume transacted and when do they occur?
- What are the average and maximum data transaction sizes for the whole environment and per
  application?
- What are the average and maximum number of actual transactions at any given second for the
  whole environment and per application?
- What are the peak time periods for data volume and number of transactions?
- Is there a measure of VM sprawl and waste (i.e. storage being taken up by abandoned VM
  images, unused snapshots, unused templates, powered-off VMs that have not been deleted,
  and zombie VMs)?
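
Several of the baseline questions above can be answered from per-second samples once they have been collected. The sketch below assumes a hypothetical sample format of (timestamp, application, bytes transferred, transaction count); how the samples are gathered depends on the monitoring tools available and is outside the scope of the example.

```python
from collections import defaultdict

# Hypothetical per-second samples: (timestamp_s, application, bytes, transactions).
samples = [
    (0, "mail", 4_000_000, 200),
    (0, "db", 10_000_000, 900),
    (1, "mail", 6_000_000, 250),
    (1, "db", 30_000_000, 1500),
    (2, "db", 8_000_000, 700),
]


def baseline(rows):
    """Summarize the high-water mark and per-application transaction sizes."""
    bytes_per_second = defaultdict(int)
    per_app = defaultdict(lambda: {"bytes": 0, "transactions": 0})
    for ts, app, nbytes, tx in rows:
        bytes_per_second[ts] += nbytes
        per_app[app]["bytes"] += nbytes
        per_app[app]["transactions"] += tx
    peak_ts = max(bytes_per_second, key=bytes_per_second.get)
    return {
        "peak_second": peak_ts,
        "peak_bytes_per_s": bytes_per_second[peak_ts],
        "total_bytes_per_app": {app: t["bytes"] for app, t in per_app.items()},
        "avg_transaction_bytes": {
            app: t["bytes"] / t["transactions"] for app, t in per_app.items()
        },
    }


report = baseline(samples)
print(report)
```

A real baseline would cover a full typical week rather than three seconds, but the aggregation logic stays the same.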
After having established a solid baseline of both the number of transactions that occur at each second
and the data size of all the transactions, and having further stratified this data by application type, a
subjective assessment of application growth should be added by answering these questions:
- Which applications will see increased usage in the near future and by what growth margin?
- Which new applications or increased instances of existing applications will be provisioned in the
  near future?
With assumptions on application and usage growth in hand, forecasting for the amount of storage space
and throughput can be made. Importantly, a percentage of excess capacity should be added in as well,
as forecasting assumptions should remain on the liberal side because running out of storage space or
disk I/O will have grave ramifications for both VM performance and the ability to immediately provision
new VMs or increase allocations on existing VMs.
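
The forecasting step described above amounts to simple arithmetic: grow each application’s baseline by its expected growth rate, add the needs of new applications, and then add a percentage of excess capacity. The figures, application names, and the 25% headroom in this sketch are hypothetical assumptions, not recommendations.

```python
def forecast(baseline, growth_rates, new_apps, headroom=0.25):
    """Project a requirement from per-application baselines.

    baseline and new_apps map application name to a quantity (for example GB
    or MB/s); growth_rates maps application name to expected fractional growth.
    """
    grown = sum(
        amount * (1 + growth_rates.get(app, 0.0)) for app, amount in baseline.items()
    )
    total = grown + sum(new_apps.values())
    return total * (1 + headroom)


# Hypothetical inputs, in GB of storage space.
baseline_gb = {"mail": 2000, "db": 4000}
growth = {"mail": 0.10, "db": 0.50}
new_apps_gb = {"crm": 800}

required_gb = forecast(baseline_gb, growth, new_apps_gb)
print(f"Storage space to request: {required_gb:.0f} GB")
```

The same function can be run a second time with throughput baselines to produce the disk I/O side of the request.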
A storage administrator should be given a total amount of storage space that will be needed from
projections for the upcoming purchase period, as well as a list of new applications, or application
expansions with a requested performance level (low, medium, high) for each application. Any
information on baseline data transaction volume for existing applications will be highly valuable in this
process.
A storage administrator will then be able to make appropriate decisions on how to increase storage and
disk I/O capacity to serve the expansion of the environment.
Virtualization will Vastly Increase Storage Volume
Environments that virtualize will experience significant growth in storage needs. This stems from two
sources:
1. Additional File Creation for VM Maintenance – Virtual disks are nothing more than
VMDK (VMware) or VHD (Microsoft Hyper-V) files. These are large files as they include the data
for operating systems and other supporting software. Each VM may also have several additional
files of similar size created for maintenance purposes, such as snapshots or copies of the VM image.
Data-intensive templates may also be needed for each kind of VM instance that is provisioned.
Additionally, many environments suffer from VM data waste in abandoned VM images,
powered-off VMs that are not needed and not deleted, unused snapshots, unused templates,
and zombie VMs which are left powered-on and not used. These data objects can be difficult to
find and clean up, and can take up large amounts of storage.
2. IT Usage Growth – As VMs are fast and easy to create, virtualization often unlocks pent up
demand within the organization for new applications, or extended use of existing applications.
This translates into more VM images with their associated overhead in snapshots, templates and
other files as well as a growth in data produced within the applications that must be stored.
Also, if environments are thin provisioned, this IT usage growth can cause hard-to-detect
additional storage growth if a backup fails or other problems occur. In such a scenario, VM
rebalancing can take place for many VMs sharing the storage, and log files will begin to be
written at a breakneck pace until that growth is noticed, a full backup occurs, or the log files
completely fill up available storage.
This growth in data creation should be anticipated as storage administrators must not only extend their
storage capacity, but may also have to upgrade to higher capacity architectures and new hardware to
handle the increased storage and storage access demands.
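
To get a feel for how quickly maintenance files inflate the footprint, the estimate below multiplies each VM’s virtual disk by its snapshot overhead and adds the templates. Every input is a hypothetical assumption; in particular, the snapshot size is expressed as a fraction of the virtual disk, and a value of 1.0 would correspond to the “similar size” case described above.

```python
def environment_storage_gb(vm_count, disk_gb, snapshots_per_vm,
                           snapshot_fraction, template_count, template_gb):
    """Estimate total space for VM disks, their snapshots, and templates."""
    per_vm = disk_gb * (1 + snapshots_per_vm * snapshot_fraction)
    return vm_count * per_vm + template_count * template_gb


# Hypothetical environment: 100 VMs with 40 GB disks, two snapshots each at a
# quarter of the disk size, plus five 40 GB templates.
allocated_only = 100 * 40
with_overhead = environment_storage_gb(100, 40, 2, 0.25, 5, 40)
print(f"Allocations alone: {allocated_only} GB, with overhead: {with_overhead:.0f} GB")
```

Waste from abandoned images and zombie VMs would come on top of this figure and has to be measured rather than estimated.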
Virtual Environments Employ Numerous Host to Datastore Connection Permutations
Physical servers that employ shared storage feature a host to datastore architecture that is typically
quite simple. There is usually a one to one mapping between host and Logical Unit Number (LUN) as is
shown in Figure 3.
[Figure 3 diagram: one Physical Host connected to one LUN]
Figure 3 – Physical Host to LUN Connection
In the virtualized world, a host can map to a large number of datastores based on the needs of the
VMs within the host. Because an issue can occur within any one of these host to datastore connections,
the number of areas that must be monitored increases drastically. As Figure 4 shows, the sheer
number of permutations can add significant complexity if, for example, a latency issue needs to be
investigated to find the root cause. Some environments may also choose to replicate the existing
connections for redundancy, causing the total number of connections to grow even further.
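
The growth in monitoring points can be illustrated by counting host to datastore connections directly. The host and datastore names below are hypothetical, and `paths_per_connection` is an assumed parameter that models replicating each connection for redundancy.

```python
# Hypothetical mapping of virtualized hosts to the datastores they use.
host_datastores = {
    "host-a": {"ds1", "ds2", "ds3"},
    "host-b": {"ds2", "ds3"},
    "host-c": {"ds1", "ds3", "ds4"},
}


def connections_to_monitor(mapping, paths_per_connection=1):
    """Count every host to datastore connection, including redundant paths."""
    return sum(len(datastores) for datastores in mapping.values()) * paths_per_connection


physical_equivalent = len(host_datastores)  # one LUN per host in the physical case
single_path = connections_to_monitor(host_datastores)
dual_path = connections_to_monitor(host_datastores, paths_per_connection=2)
print(physical_equivalent, single_path, dual_path)
```

Even in this three-host example, the number of connections that could harbor a latency issue is several times the one-to-one physical case.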
[Figure 4 diagram: multiple Virtualized Hosts, each connected to multiple Datastores]
Figure 4 – Multiple Virtualized Host to Datastore Connections
Virtualization Will Cause Unpredictable Throughput Concentrations
As hosts and VMs share storage, they must also share the connections to the SAN. Each VM, however,
functions independently and experiences usage peaks and increased loads for its own reasons.
Additionally, as VMs can move around and gain access to different datastores, knowing how much
throughput will occur at any given time in any host to datastore connection is impossible. Hence, the
concentration of throughput at any part of the virtual infrastructure is unpredictable and an
environment must have enough capacity to handle not only regular operating needs, but also peaks in
usage. With this multiple host to datastore connection structure, issues can occur quickly with little to
no warning and can be hard to track down as the entire virtual environment keeps on shifting.
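
A small example shows why sizing a connection for average load is not enough when VMs peak independently. The per-interval demand figures and the 150 MB/s connection capacity are hypothetical assumptions.

```python
# Hypothetical MB/s demand per VM over four consecutive intervals on one
# host to datastore connection.
vm_demand = {
    "vm1": [20, 25, 90, 20],
    "vm2": [30, 30, 85, 30],
    "vm3": [10, 60, 15, 10],
}


def saturated_intervals(demand, capacity_mbps):
    """Return the combined demand per interval and the intervals over capacity."""
    totals = [sum(column) for column in zip(*demand.values())]
    over = [index for index, total in enumerate(totals) if total > capacity_mbps]
    return totals, over


totals, over = saturated_intervals(vm_demand, capacity_mbps=150)
average = sum(totals) / len(totals)
print(f"Average {average} MB/s, per-interval totals {totals}, saturated intervals {over}")
```

The average demand sits comfortably below the assumed capacity, yet one interval exceeds it because two VMs happen to peak together. If a VM then moves to another datastore, the same analysis has to be repeated for a different connection.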
Additional Storage Abstraction Layers are Used with Virtualization
While physical hosts connected to the SAN generally link to a single LUN (an abstraction layer of several
physical spindles), virtualization has introduced an additional layer of abstraction: the datastore (a
collection of LUNs). Although this new level of abstraction allows for flexibility and additional robustness
in storage redundancy, performance tiering, deploying VMs and allowing the virtual environment to
automatically balance itself, it also adds additional complexity. A VM knows which datastore it is
mapped to, but then additional investigation must be employed to know which LUNs make up the
datastore, and then which spindles make up the LUN. This process adds time and additional steps to
maintenance and issue troubleshooting.
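
The extra lookup steps can be pictured as two chained mappings: VM to datastore, datastore to LUNs, and LUN to spindles. The inventory below is entirely hypothetical; in practice the mappings would have to be exported from the virtualization and storage management tools and kept up to date.

```python
# Hypothetical inventory of the three abstraction layers.
vm_datastore = {"web01": "datastore1", "db01": "datastore2"}
datastore_luns = {"datastore1": ["lun10", "lun11"], "datastore2": ["lun12"]}
lun_spindles = {
    "lun10": ["disk0", "disk1"],
    "lun11": ["disk2", "disk3"],
    "lun12": ["disk1", "disk4"],
}


def spindles_for_vm(vm):
    """Walk VM -> datastore -> LUNs -> spindles and return the physical disks."""
    datastore = vm_datastore[vm]
    return sorted(
        {disk for lun in datastore_luns[datastore] for disk in lun_spindles[lun]}
    )


def vms_sharing_spindle(disk):
    """List the VMs whose data ultimately lands on the given spindle."""
    return sorted(vm for vm in vm_datastore if disk in spindles_for_vm(vm))


print(spindles_for_vm("web01"))
print(vms_sharing_spindle("disk1"))
```

Keeping such a mapping documented ahead of time removes the manual investigation steps when an issue on one spindle needs to be traced back to the affected VMs.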
Virtualization Introduces Environment Dynamism and Automation
One of the great cost and time savers of virtualization is that aspects of data center maintenance can be
automated, and because resources are shared, VMs can easily shift their resource usage and move
around when necessary. However, this flexibility also means that additional planning and capacity must
be available to enable this functionality. “Slack” capacity is needed in storage to ensure that DRS and
vMotion will work appropriately to deploy and shift VMs; otherwise, errors will occur. Also, because an
environment is dynamic and ever-changing, problems can appear and disappear, and trying to
troubleshoot or find the root cause of an issue becomes harder and more tedious. Further, without
adequate documentation and analytic abilities to piece together all circumstances that were present
when an issue occurred, finding the cause of the issue can become impossible.
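
A slack policy can be expressed as a simple placement check: a VM is only moved or deployed onto a datastore if the datastore would still keep a reserved share of its capacity free afterwards. The 20% reserve and the function name are assumptions made for this sketch and do not describe how DRS or vMotion decide placement internally.

```python
def can_place(vm_gb, datastore_free_gb, datastore_total_gb, slack_fraction=0.2):
    """True if the datastore keeps its slack reserve after receiving the VM."""
    remaining = datastore_free_gb - vm_gb
    return remaining >= slack_fraction * datastore_total_gb


# Hypothetical 1000 GB datastore with 350 GB currently free.
print(can_place(100, 350, 1000))  # leaves 250 GB free, above the 200 GB reserve
print(can_place(200, 350, 1000))  # would leave 150 GB free, below the reserve
```

Running a check like this across all datastores shows how many moves the environment can absorb before the slack is used up.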
Conclusion
Having visibility into the capacity for storage space and disk throughput is critical to maintaining a high
performing environment. Many capacity bottlenecks occur within the disk throughput resource area,
cause massive performance problems, and can be hard to track down. Because of the dynamic nature of
shared storage, slack must also be built into storage resource calculations to allow for additional
capacity needed at peak times or when self-balancing actions such as vMotion occur.
As environments are always growing, storage administrators need visibility into application and
application usage growth to adequately plan and purchase the necessary hardware to accommodate the
growth. The infrastructure to operate shared storage is complex and has many moving parts. The total
bandwidth of that infrastructure is only as robust as the “weakest link” of the infrastructure. Without
adequate storage planning, virtualized environments run the risk of running out of disk throughput
capacity and facing VM performance problems or stunted growth.