
Page 1: Storage technology

Storage Technology
Mehrvash Vosoughian

ITOCCO, September 2016

Page 2: Storage technology

Contents

1- Storage System
   - Introduction to Information: Data, Types of Data
   - Storage
   - Evolution of Storage Architecture: Volume Manager, File System
   - Connectivity: IDE/ATA and Serial ATA, SCSI and Serial SCSI, Fibre Channel, Internet Protocol
   - Storage
   - Disk Drive Components
   - Host Access to Data
2- Data Protection: RAID
3- Storage Networking Technology: DAS, NAS, SAN
4- Fibre Channel Storage Area Network
5- IP SAN and FCoE
6- Virtualization
7- Cloud Storage

Page 3: Storage technology

Information Storage: Data

Data is a collection of raw facts from which conclusions might be drawn. Handwritten letters, a printed book, a family photograph, printed and duly signed copies of mortgage papers, a bank's ledgers, and an airline ticket are all examples that contain data. Before the advent of computers, the methods adopted for data creation and sharing were limited to fewer forms, such as paper and film. Today, the same data can be converted into more convenient forms, such as an e-mail message, an e-book, a digital image, or a digital movie. This data can be generated using a computer and stored as strings of binary numbers (0s and 1s), as shown in Figure 1-1. Data in this form is called digital data and is accessible by the user only after a computer processes it.

Page 4: Storage technology

Types of Data

Data can be classified as structured or unstructured (see Figure 1-2) based on how it is stored and managed. Structured data is organized in rows and columns in a rigidly defined format so that applications can retrieve and process it efficiently. Structured data is typically stored using a database management system (DBMS).

Data is unstructured if its elements cannot be stored in rows and columns, which makes it difficult to query and retrieve by applications. Examples include customer contacts stored in various forms such as sticky notes, e-mail messages, business cards, or even digital files, such as .doc, .txt, and .pdf. Due to its unstructured nature, it is difficult to retrieve this data using a traditional customer relationship management application. A vast majority of new data being created today is unstructured. The industry is challenged with new architectures, technologies, techniques, and skills to store, manage, analyze, and derive value from unstructured data from numerous sources.

Page 5: Storage technology
Page 6: Storage technology

Storage

Data created by individuals or businesses must be stored so that it is easily accessible for further processing. In a computing environment, devices designed for storing data are termed storage devices or simply storage. The type of storage used varies based on the type of data and the rate at which it is created and used. Devices such as a media card in a cell phone or digital camera, DVDs, CD-ROMs, and disk drives in personal computers are examples of storage devices.

Businesses have several options available for storing data, including internal hard disks, external disk arrays, and tapes.

Page 7: Storage technology

Evolution of Storage Architecture

Historically, organizations had centralized computers (mainframes) and information storage devices (tape reels and disk packs) in their data center. The evolution of open systems, their affordability, and ease of deployment made it possible for business units/departments to have their own servers and storage. In earlier implementations of open systems, the storage was typically internal to the server. These storage devices could not be shared with any other servers.

This approach is referred to as server-centric storage architecture. In this architecture, each server has a limited number of storage devices, and any administrative task, such as maintenance of the server or increasing storage capacity, might result in unavailability of information. The proliferation of departmental servers in an enterprise resulted in unprotected, unmanaged, fragmented islands of information and increased capital and operating expenses.

Page 8: Storage technology

To overcome these challenges, storage evolved from server-centric to information-centric architecture. In this architecture, storage devices are managed centrally and independent of servers. These centrally managed storage devices are shared with multiple servers. When a new server is deployed in the environment, storage is assigned from the same shared storage devices to that server.

The capacity of shared storage can be increased dynamically by adding more storage devices without impacting information availability. In this architecture, information management is easier and cost-effective. Storage technology and architecture continue to evolve, which enables organizations to consolidate, protect, optimize, and leverage their data to achieve the highest return on information assets.

Page 9: Storage technology

Volume Manager

In the early days, disk drives appeared to the operating system as a number of continuous disk blocks. The entire disk drive would be allocated to the file system or other data entity used by the operating system or application. The disadvantage was lack of flexibility. When a disk drive ran out of space, there was no easy way to extend the file system's size. Also, as the storage capacity of the disk drive increased, allocating the entire disk drive for the file system often resulted in underutilization of storage capacity.

The evolution of Logical Volume Managers (LVMs) enabled dynamic extension of file system capacity and efficient storage management. The LVM is software that runs on the compute system and manages logical and physical storage. LVM is an intermediate layer between the file system and the physical disk. It can partition a larger-capacity disk into virtual, smaller-capacity volumes (a process called partitioning) or aggregate several smaller disks to form a larger virtual volume (a process called concatenation).

Page 10: Storage technology

Volume Manager

These volumes are then presented to applications. Disk partitioning was introduced to improve the flexibility and utilization of disk drives. In partitioning, a disk drive is divided into logical containers called logical volumes (LVs). For example, a large physical drive can be partitioned into multiple LVs to maintain data according to the file system and application requirements. The partitions are created from groups of contiguous cylinders when the hard disk is initially set up on the host. The host's file system accesses the logical volumes without any knowledge of partitioning and the physical structure of the disk.

Concatenation is the process of grouping several physical drives and presenting them to the host as one big logical volume.

The LVM provides optimized storage access and simplifies storage resource management. It hides details about the physical disk and the location of data on the disk. It enables administrators to change the storage allocation even when the application is running.
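To make the partitioning and concatenation ideas concrete, the Python sketch below models an LVM that either splits one physical disk into smaller logical volumes or concatenates several disks into one large volume. The class and function names are hypothetical illustrations, not any particular LVM's API.

```python
# Minimal sketch of LVM-style partitioning and concatenation (illustrative only).

class PhysicalDisk:
    def __init__(self, name: str, capacity_gb: int):
        self.name = name
        self.capacity_gb = capacity_gb

def partition(disk: PhysicalDisk, sizes_gb: list[int]) -> list[dict]:
    """Split one physical disk into smaller logical volumes."""
    if sum(sizes_gb) > disk.capacity_gb:
        raise ValueError("requested partitions exceed disk capacity")
    return [{"lv": f"{disk.name}-lv{i}", "size_gb": s} for i, s in enumerate(sizes_gb)]

def concatenate(disks: list[PhysicalDisk]) -> dict:
    """Group several physical disks and present them as one large logical volume."""
    return {"lv": "+".join(d.name for d in disks),
            "size_gb": sum(d.capacity_gb for d in disks)}

# Example: one 500 GB disk carved into three LVs, and two disks joined into one LV.
print(partition(PhysicalDisk("disk0", 500), [100, 150, 250]))
print(concatenate([PhysicalDisk("disk1", 300), PhysicalDisk("disk2", 300)]))
```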

Page 11: Storage technology
Page 12: Storage technology

File System

A file is a collection of related records or data stored as a unit with a name. A file system is a hierarchical structure of files. A file system enables easy access to data files residing within a disk drive, a disk partition, or a logical volume. A file system consists of logical structures and software routines that control access to files. It provides users with the functionality to create, modify, delete, and access files. Access to files on the disks is controlled by the permissions assigned to the file by the owner, which are also maintained by the file system.

A file system organizes data in a structured, hierarchical manner via the use of directories, which are containers for storing pointers to multiple files. All file systems maintain a pointer map to the directories, subdirectories, and files that are part of the file system. Examples of common file systems are:

- File Allocation Table (FAT 32) for Microsoft Windows
- NT File System (NTFS) for Microsoft Windows
- UNIX File System (UFS) for UNIX
- Extended File System (EXT2/3) for Linux

Page 13: Storage technology

File System Block

A file system block is the smallest "unit" allocated for storing data. Each file system block is a contiguous area on the physical disk. The block size of a file system is fixed at the time of its creation. The file system size depends on the block size and the total number of file system blocks. A file can span multiple file system blocks because most files are larger than the predefined block size of the file system. File system blocks cease to be contiguous and become fragmented when new blocks are added or deleted. Over time, as files grow larger, the file system becomes increasingly fragmented.
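The two size relationships described above are simple arithmetic; the short Python sketch below shows them with assumed example numbers:

```python
import math

def fs_size_bytes(block_size: int, total_blocks: int) -> int:
    """File system size depends on block size and total number of blocks."""
    return block_size * total_blocks

def blocks_for_file(file_size: int, block_size: int) -> int:
    """A file spans multiple blocks; the last block may be only partially used."""
    return math.ceil(file_size / block_size)

# Example: 4 KiB blocks and 2**20 blocks give a 4 GiB file system,
# and a 10 KiB file occupies 3 blocks (2 full blocks plus 1 partial block).
print(fs_size_bytes(4096, 2**20))        # 4294967296
print(blocks_for_file(10 * 1024, 4096))  # 3
```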

Page 14: Storage technology

The Process of Mapping User Files to the Disk Storage Subsystem with an LVM

1. Files are created and managed by users and applications.
2. These files reside in the file systems.
3. The file systems are mapped to file system blocks.
4. The file system blocks are mapped to logical extents of a logical volume.
5. These logical extents in turn are mapped to the disk physical extents either by the operating system or by the LVM.
6. These physical extents are mapped to the disk sectors in a storage subsystem.

If there is no LVM, then there are no logical extents. Without an LVM, file system blocks are directly mapped to disk sectors. A simple sketch of this mapping chain follows.
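The following Python sketch, with hypothetical names and sizes, walks a byte offset in a file down the chain described above: file system block, logical extent, physical extent, and finally a disk sector. It is illustrative only and assumes fixed-size extents and a one-to-one logical-to-physical extent mapping.

```python
# Illustrative mapping of a file offset down to a disk sector (all sizes hypothetical).
BLOCK_SIZE = 4096      # file system block size in bytes
EXTENT_BLOCKS = 1024   # file system blocks per logical extent
SECTOR_SIZE = 512      # bytes per disk sector

def map_offset(file_offset: int, lv_start_block: int, pv_start_sector: int) -> dict:
    fs_block = lv_start_block + file_offset // BLOCK_SIZE               # step 3: block in the LV
    logical_extent = fs_block // EXTENT_BLOCKS                          # step 4: logical extent
    physical_extent = logical_extent                                    # step 5: LE -> PE (1:1 here)
    sector = pv_start_sector + fs_block * (BLOCK_SIZE // SECTOR_SIZE)   # step 6: sector address
    return {"fs_block": fs_block, "logical_extent": logical_extent,
            "physical_extent": physical_extent, "sector": sector}

print(map_offset(file_offset=1_000_000, lv_start_block=0, pv_start_sector=2048))
```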

Page 15: Storage technology

Connectivity

Connectivity and communication between host and storage are enabled using physical components and interface protocols. The physical components of connectivity are the hardware elements that connect the host to storage. Three physical components of connectivity between the host and storage are the host interface device, port, and cable.

A host interface device or host adapter connects a host to other hosts and storage devices. Examples of host interface devices are the host bus adapter (HBA) and the network interface card (NIC). A host bus adapter is an application-specific integrated circuit (ASIC) board that performs I/O interface functions between the host and storage, relieving the CPU from additional I/O processing workload. A host typically contains multiple HBAs.

A port is a specialized outlet that enables connectivity between the host and external devices. An HBA may contain one or more ports to connect the host to the storage device. Cables connect hosts to internal or external devices using copper or fiber optic media.

Page 16: Storage technology

Interface Protocols

A protocol enables communication between the host and storage. Protocols are implemented using interface devices (or controllers) at both source and destination. The popular interface protocols used for host-to-storage communication are Integrated Device Electronics/Advanced Technology Attachment (IDE/ATA), Small Computer System Interface (SCSI), Fibre Channel (FC), and Internet Protocol (IP).

Page 17: Storage technology

IDE/ATA and Serial ATA

IDE/ATA is a popular interface protocol standard used for connecting storage devices, such as disk drives and CD-ROM drives. This protocol supports parallel transmission and therefore is also known as Parallel ATA (PATA) or simply ATA. IDE/ATA has a variety of standards and names. The Ultra DMA/133 version of ATA supports a throughput of 133 MB per second. In a master-slave configuration, an ATA interface supports two storage devices per connector. However, if the performance of the drive is important, sharing a port between two devices is not recommended. The serial version of this protocol supports single-bit serial transmission and is known as Serial ATA (SATA). With its high performance and low cost, SATA has largely replaced PATA in newer systems. SATA revision 3.0 provides a data transfer rate up to 6 Gb/s.

Page 18: Storage technology

SCSI and Serial SCSI

SCSI has emerged as a preferred connectivity protocol in high-end computers. This protocol supports parallel transmission and offers improved performance, scalability, and compatibility compared to ATA. However, the high cost associated with SCSI limits its popularity among home or personal desktop users. Over the years, SCSI has been enhanced and now includes a wide variety of related technologies and standards. SCSI supports up to 16 devices on a single bus and provides data transfer rates up to 640 MB/s (for the Ultra-640 version). Serial Attached SCSI (SAS) is a point-to-point serial protocol that provides an alternative to parallel SCSI. A newer version of serial SCSI (SAS 2.0) supports a data transfer rate up to 6 Gb/s. This book's Appendix B provides more details on the SCSI architecture and interface.

Page 19: Storage technology

Fibre Channel

Fibre Channel is a widely used protocol for high-speed communication to the storage device. The Fibre Channel interface provides gigabit network speed. It provides serial data transmission that operates over copper wire and optical fiber. The latest version of the FC interface (16FC) allows transmission of data up to 16 Gb/s.

Page 20: Storage technology

Internet Protocol (IP)

IP is a network protocol that has traditionally been used for host-to-host traffic. With the emergence of new technologies, an IP network has become a viable option for host-to-storage communication. IP offers several advantages in terms of cost and maturity and enables organizations to leverage their existing IP-based network. iSCSI and FCIP are common examples of protocols that leverage IP for host-to-storage communication.

Page 21: Storage technology

Storage

Storage is a core component in a data center. A storage device uses magnetic, optical, or solid state media. Disks, tapes, and diskettes use magnetic media, whereas CD/DVD uses optical media for storage. Removable Flash memory or Flash drives are examples of solid state media. In the past, tapes were the most popular storage option for backups because of their low cost. However, tapes have various limitations in terms of performance and management, as listed here:

- Data is stored on the tape linearly along the length of the tape. Search and retrieval of data are done sequentially, and it invariably takes several seconds to access the data. As a result, random data access is slow and time-consuming. This limits tapes as a viable option for applications that require real-time, rapid access to data.
- In a shared computing environment, data stored on tape cannot be accessed by multiple applications simultaneously, restricting its use to one application at a time.
- On a tape drive, the read/write head touches the tape surface, so the tape degrades or wears out after repeated use.
- The storage and retrieval requirements of data from the tape and the overhead associated with managing the tape media are significant.

Page 22: Storage technology

Disk Drive Components

The key components of a hard disk drive are the platter, spindle, read/write head, actuator arm assembly, and controller board. I/O operations in an HDD are performed by rapidly moving the arm across the rotating flat platters coated with magnetic particles. Data is transferred between the disk controller and magnetic platters through the read/write (R/W) head, which is attached to the arm. Data can be recorded and erased on magnetic platters any number of times. The following sections detail the different components of the disk drive, the mechanism for organizing and storing data on disks, and the factors that affect disk performance.

Page 23: Storage technology

2.6.1 Platter: A typical HDD consists of one or more flat circular disks called platters. The data is recorded on these platters in binary codes (0s and 1s). The set of rotating platters is sealed in a case, called the Head Disk Assembly (HDA). A platter is a rigid, round disk coated with magnetic material on both surfaces (top and bottom). The data is encoded by polarizing the magnetic areas, or domains, of the disk surface. Data can be written to or read from both surfaces of the platter. The number of platters and the storage capacity of each platter determine the total capacity of the drive.

2.6.2 Spindle: A spindle connects all the platters and is connected to a motor. The motor of the spindle rotates at a constant speed. The disk platter spins at a speed of several thousand revolutions per minute (rpm). Common spindle speeds are 5,400 rpm, 7,200 rpm, 10,000 rpm, and 15,000 rpm. The speed of the platter is increasing with improvements in technology, although the extent to which it can be improved is limited.
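Spindle speed determines rotational latency, the time spent waiting for the target sector to rotate under the head. The rule of thumb used below (average latency is half a revolution) is standard disk arithmetic rather than something stated on the slide:

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency = half of one full revolution, in milliseconds."""
    one_revolution_ms = 60_000 / rpm   # 60,000 ms per minute / revolutions per minute
    return one_revolution_ms / 2

for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm} rpm -> ~{avg_rotational_latency_ms(rpm):.1f} ms average rotational latency")
# 5,400 rpm -> ~5.6 ms, 7,200 rpm -> ~4.2 ms, 10,000 rpm -> ~3.0 ms, 15,000 rpm -> ~2.0 ms
```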

Page 24: Storage technology

Read/Write Head:

Read/Write (R/W) heads, as shown in Figure 2-7, read and write data from or to the platters. Drives have two R/W heads per platter, one for each surface of the platter. The R/W head changes the magnetic polarization on the surface of the platter when writing data. While reading data, the head detects the magnetic polarization on the surface of the platter. During reads and writes, the R/W head senses the magnetic polarization and never touches the surface of the platter. When the spindle is rotating, there is a microscopic air gap maintained between the R/W heads and the platters, known as the head flying height. This air gap is removed when the spindle stops rotating and the R/W head rests on a special area on the platter near the spindle. This area is called the landing zone. The landing zone is coated with a lubricant to reduce friction between the head and the platter.

Page 25: Storage technology

The logic on the disk drive ensures that the heads are moved to the landing zone before they touch the surface. If the drive malfunctions and the R/W head accidentally touches the surface of the platter outside the landing zone, a head crash occurs. In a head crash, the magnetic coating on the platter is scratched and may cause damage to the R/W head. A head crash generally results in data loss.

Actuator Arm Assembly: R/W heads are mounted on the actuator arm assembly, which positions the R/W head at the location on the platter where the data needs to be written or read. The R/W heads for all platters on a drive are attached to one actuator arm assembly and move across the platters simultaneously.

Drive Controller Board: The controller is a printed circuit board, mounted at the bottom of a disk drive. It consists of a microprocessor, internal memory, circuitry, and firmware. The firmware controls the power to the spindle motor and the speed of the motor. It also manages the communication between the drive and the host. In addition, it controls the R/W operations by moving the actuator arm and switching between different R/W heads, and it performs the optimization of data access.

Page 26: Storage technology

Physical Disk Structure:

Data on the disk is recorded on tracks, which are concentric rings on the platter around the spindle, as shown in Figure 2-8. The tracks are numbered, starting from zero, from the outer edge of the platter. The number of tracks per inch (TPI) on the platter (or the track density) measures how tightly the tracks are packed on a platter. Each track is divided into smaller units called sectors. A sector is the smallest, individually addressable unit of storage. The track and sector structure is written on the platter by the drive manufacturer using a low-level formatting operation. The number of sectors per track varies according to the drive type. The first personal computer disks had 17 sectors per track. Recent disks have a much larger number of sectors on a single track. There can be thousands of tracks on a platter, depending on the physical dimensions and recording density of the platter.

Page 27: Storage technology

Typically, a sector holds 512 bytes of user data, although some disks can be formatted with larger sector sizes. In addition to user data, a sector also stores other information, such as the sector number, head number or platter number, and track number. This information helps the controller locate the data on the drive. A cylinder is the set of identical tracks on both surfaces of each drive platter. The location of the R/W heads is referred to by the cylinder number, not by the track number.
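From this geometry, a drive's raw capacity can be estimated as cylinders × heads (recording surfaces) × sectors per track × bytes per sector. The numbers in the sketch below are illustrative, not taken from the slide:

```python
def disk_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                        bytes_per_sector: int = 512) -> int:
    """Raw capacity from CHS geometry: cylinders x heads x sectors/track x sector size."""
    return cylinders * heads * sectors_per_track * bytes_per_sector

# Hypothetical geometry: 16,383 cylinders, 16 heads, 63 sectors/track, 512-byte sectors.
cap = disk_capacity_bytes(16_383, 16, 63)
print(f"{cap} bytes (~{cap / 1e9:.1f} GB)")
```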

Page 28: Storage technology

Host Access to Data

Data is accessed and stored by applications using the underlying infrastructure. The key components of this infrastructure are the operating system (or file system), connectivity, and storage. The storage device can be internal and/or external to the host. In either case, the host controller card accesses the storage devices using predefined protocols, such as IDE/ATA, SCSI, or Fibre Channel (FC). IDE/ATA and SCSI are popularly used in small and personal computing environments for accessing internal storage. FC and iSCSI protocols are used for accessing data from an external storage device (or subsystem). External storage devices can be connected to the host directly or through the storage network. When the storage is connected directly to the host, it is referred to as direct-attached storage (DAS), which is detailed later in this chapter.

Understanding access to data over a network is important because it lays the foundation for storage networking technologies. Data can be accessed over a network in one of the following ways: block level, file level, or object level. In general, the application requests data from the file system (or operating system) by specifying the filename and location. The file system maps the file attributes to the logical block address of the data and sends the request to the storage device. The storage device converts the logical block address (LBA) to a cylinder-head-sector (CHS) address and fetches the data.
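The LBA-to-CHS conversion mentioned above is well-known disk addressing arithmetic; the sketch below uses an assumed geometry (16 heads, 63 sectors per track) purely for illustration:

```python
def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> tuple[int, int, int]:
    """Convert a logical block address to a (cylinder, head, sector) triple.
    Sectors are conventionally numbered starting at 1."""
    cylinder = lba // (heads_per_cylinder * sectors_per_track)
    head = (lba // sectors_per_track) % heads_per_cylinder
    sector = (lba % sectors_per_track) + 1
    return cylinder, head, sector

# Example with an assumed geometry of 16 heads and 63 sectors per track.
print(lba_to_chs(1_000_000, heads_per_cylinder=16, sectors_per_track=63))
```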

Page 29: Storage technology

In block-level access, the file system is created on a host, and data is accessed on a network at the block level. In this case, raw disks or logical volumes are assigned to the host for creating the file system. In file-level access, the file system is created on a separate file server or at the storage side, and the file-level request is sent over a network. Because data is accessed at the file level, this method has higher overhead compared to data accessed at the block level. Object-level access is an intelligent evolution, whereby data is accessed over a network in terms of self-contained objects with a unique object identifier.

Page 30: Storage technology

RAID Array Components

A RAID array is an enclosure that contains a number of disk drives and supporting hardware to implement RAID. A subset of disks within a RAID array can be grouped to form logical associations called logical arrays.

Page 31: Storage technology

RAID Techniques: Striping, Mirroring, and Parity

3.3.1 Striping: Striping is a technique to spread data across multiple drives (more than one) to use the drives in parallel. All the read/write heads work simultaneously, allowing more data to be processed in a shorter time and increasing performance. A minimal striping sketch follows.
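The following Python sketch (illustrative only) distributes consecutive data blocks round-robin across a set of drives, which is the essence of striping:

```python
def stripe(blocks: list[bytes], num_drives: int) -> list[list[bytes]]:
    """Distribute data blocks round-robin across drives so they can be accessed in parallel."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for i, block in enumerate(blocks):
        drives[i % num_drives].append(block)
    return drives

data = [f"block{i}".encode() for i in range(8)]
for d, contents in enumerate(stripe(data, num_drives=4)):
    print(f"drive {d}: {contents}")
# drive 0 holds block0 and block4, drive 1 holds block1 and block5, and so on.
```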

3.3.2 Mirroring:

Mirroring is a technique whereby the same data is stored on two different disk drives, yielding two copies of the data. If one disk drive fails, the data is intact on the surviving disk drive (see Figure 3-3), and the controller continues to service the host's data requests from the surviving disk of a mirrored pair.

Page 32: Storage technology

Parity

Parity is a method to protect striped data from disk drive failure without the cost of mirroring. An additional disk drive is added to hold parity, a mathematical construct that allows re-creation of the missing data. Parity is a redundancy technique that ensures protection of data without maintaining a full set of duplicate data. Calculation of parity is a function of the RAID controller.

Parity information can be stored on separate, dedicated disk drives or distributed across all the drives in a RAID set. The first four disks, labeled "Data Disks," contain the data. The fifth disk, labeled "Parity Disk," stores the parity information, which, in this case, is the sum of the elements in each row. Now, if one of the data disks fails, the missing value can be calculated by subtracting the sum of the rest of the elements from the parity value. Here, for simplicity, the computation of parity is represented as an arithmetic sum of the data. However, parity calculation is actually a bitwise XOR operation.
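A small Python sketch of bitwise XOR parity, showing both parity computation and reconstruction of a failed drive's strip (illustrative, not any particular controller's implementation):

```python
from functools import reduce

def xor_parity(strips: list[int]) -> int:
    """Parity is the bitwise XOR of all data strips."""
    return reduce(lambda a, b: a ^ b, strips)

def rebuild_missing(surviving_strips: list[int], parity: int) -> int:
    """XOR of the surviving strips with the parity recreates the missing strip."""
    return reduce(lambda a, b: a ^ b, surviving_strips, parity)

data = [0b1010, 0b0110, 0b1111, 0b0001]   # strips on four data disks
p = xor_parity(data)                       # stored on the parity disk
print(bin(p))                              # 0b10 in this example

# Suppose the disk holding data[1] fails; rebuild its strip from the rest plus parity.
recovered = rebuild_missing([data[0], data[2], data[3]], p)
assert recovered == data[1]
print(bin(recovered))
```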

Page 33: Storage technology

RAID Levels

Page 34: Storage technology

RAID 0

A RAID 0 configuration uses data striping techniques, where data is striped across all the disks within a RAID set. Therefore, it utilizes the full storage capacity of a RAID set. To read data, all the strips are put back together by the controller. Figure 3-5 shows RAID 0 in an array in which data is striped across five disks. When the number of drives in the RAID set increases, performance improves because more data can be read or written simultaneously. RAID 0 is a good option for applications that need high I/O throughput. However, if these applications require high availability during drive failures, RAID 0 does not provide data protection and availability.

Page 35: Storage technology

RAID 1

RAID 1 is based on the mirroring technique. In this RAID configuration, data is mirrored to provide fault tolerance (see Figure 3-6). A RAID 1 set consists of two disk drives, and every write is written to both disks. The mirroring is transparent to the host. During disk failure, the impact on data recovery in RAID 1 is the least among all RAID implementations. This is because the RAID controller uses the mirror drive for data recovery. RAID 1 is suitable for applications that require high availability and where cost is not a constraint.

Page 36: Storage technology

Nested RAID

Most data centers require data redundancy and performance from their RAID arrays. RAID 1+0 and RAID 0+1 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1. They use striping and mirroring techniques and combine their benefits. These types of RAID require an even number of disks, the minimum being four.

RAID 1+0 is also known as RAID 10 (Ten) or RAID 1/0. Similarly, RAID 0+1 is also known as RAID 01 or RAID 0/1. RAID 1+0 performs well for workloads with small, random, write-intensive I/Os. Some applications that benefit from RAID 1+0 include the following:

- High transaction rate Online Transaction Processing (OLTP)
- Large messaging installations
- Database applications with write-intensive random access workloads

Page 37: Storage technology
Page 38: Storage technology

RAID 3

RAID 3 stripes data for performance and uses parity for fault tolerance. Parity information is stored on a dedicated drive so that the data can be reconstructed if a drive fails in a RAID set. For example, in a set of five disks, four are used for data and one for parity. Therefore, the total disk space required is 1.25 times the size of the data disks. RAID 3 always reads and writes complete stripes of data across all disks because the drives operate in parallel. There are no partial writes that update one out of many strips in a stripe. Figure 3-8 illustrates the RAID 3 implementation. RAID 3 provides good performance for applications that involve large sequential data access, such as data backup or video streaming.

Page 39: Storage technology

RAID 4

Similar to RAID 3, RAID 4 stripes data for high performance and uses parity for improved fault tolerance. Data is striped across all disks except the parity disk in the array. Parity information is stored on a dedicated disk so that the data can be rebuilt if a drive fails. Unlike RAID 3, data disks in RAID 4 can be accessed independently so that specific data elements can be read or written on a single disk without reading or writing an entire stripe. RAID 4 provides good read throughput and reasonable write throughput.

Page 40: Storage technology

RAID 5

RAID 5 is a versatile RAID implementation. It is similar to RAID 4 because it uses striping, and the drives (strips) are also independently accessible. The difference between RAID 4 and RAID 5 is the parity location. In RAID 4, parity is written to a dedicated drive, creating a write bottleneck for the parity disk. In RAID 5, parity is distributed across all disks to overcome the write bottleneck of a dedicated parity disk. Figure 3-9 illustrates the RAID 5 implementation. RAID 5 is good for random, read-intensive I/O applications and is preferred for messaging, data mining, medium-performance media serving, and relational database management system (RDBMS) implementations in which database administrators (DBAs) optimize data access.
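A minimal sketch of how RAID 5 rotates the parity strip across disks from stripe to stripe. Actual layout conventions vary by controller; the rotation used here is only one common style and is an assumption for illustration:

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return a stripe-by-stripe map of which disk holds parity (P) vs. data (D)."""
    layout = []
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks   # parity rotates each stripe
        layout.append(["P" if d == parity_disk else "D" for d in range(num_disks)])
    return layout

for row in raid5_layout(num_disks=5, num_stripes=5):
    print(row)
# Each stripe places its parity on a different disk, so no single disk becomes a write bottleneck.
```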

Page 41: Storage technology

RAID 6

RAID 6 works the same way as RAID 5, except that RAID 6 includes a second parity element to enable survival if two disk failures occur in a RAID set. Therefore, a RAID 6 implementation requires at least four disks. RAID 6 distributes the parity across all the disks. The write penalty (explained later in this chapter) in RAID 6 is more than that in RAID 5; therefore, RAID 5 writes perform better than RAID 6. The rebuild operation in RAID 6 may take longer than that in RAID 5 due to the presence of two parity sets.
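As a rough illustration of the write penalty mentioned above, parity RAID turns one small random host write into several back-end disk I/Os. The sketch below uses the conventional sizing factors (2 for RAID 1, 4 for RAID 5, 6 for RAID 6); these are standard rules of thumb assumed here, not figures stated on the slide:

```python
WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4, "RAID 6": 6}

def disk_iops_needed(read_iops: float, write_iops: float, raid_level: str) -> float:
    """Back-end disk IOPS = reads + writes multiplied by the RAID write penalty."""
    return read_iops + write_iops * WRITE_PENALTY[raid_level]

# Example workload: 1,000 read IOPS and 500 write IOPS at the host.
for level in ("RAID 1", "RAID 5", "RAID 6"):
    print(level, disk_iops_needed(1_000, 500, level))
# RAID 1 -> 2,000; RAID 5 -> 3,000; RAID 6 -> 4,000 back-end IOPS for the same host load.
```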

Page 42: Storage technology

Direct-Attached Storage

DAS is an architecture in which storage is connected directly to the hosts. The internal disk drive of a host and the directly connected external storage array are examples of DAS. Although the implementation of storage networking technologies is gaining popularity, DAS has remained suitable for localized data access in a small environment, such as personal computing and workgroups. DAS is classified as internal or external, based on the location of the storage device with respect to the host.

In internal DAS architectures, the storage device is internally connected to the host by a serial or parallel bus. The physical bus has distance limitations and can be sustained only over a short distance for high-speed connectivity. In addition, most internal buses can support only a limited number of devices, and they occupy a large amount of space inside the host, making maintenance of other components difficult.

On the other hand, in external DAS architectures, the host connects directly to the external storage device, and data is accessed at the block level. In most cases, communication between the host and the storage device takes place over the SCSI or FC protocol. Compared to internal DAS, external DAS overcomes the distance and device count limitations and provides centralized management of storage devices.

Page 43: Storage technology
Page 44: Storage technology

DAS Benefits and Limitations

DAS requires a relatively lower initial investment than storage networking architectures. The DAS configuration is simple and can be deployed easily and rapidly. The setup is managed using host-based tools, such as the host OS, which makes storage management tasks easy for small environments. Because DAS has a simple architecture, it requires fewer management tasks and fewer hardware and software elements to set up and operate.

However, DAS does not scale well. A storage array has a limited number of ports, which restricts the number of hosts that can directly connect to the storage. When capacities are reached, service availability may be compromised. DAS does not make optimal use of resources due to its limited capability to share front-end ports. In DAS environments, unused resources cannot be easily reallocated, resulting in islands of over-utilized and under-utilized storage pools.

Page 45: Storage technology

Network-Attached Storage

File sharing, as the name implies, enables users to share files with other users. Traditional methods of file sharing involve copying files to portable media such as floppy diskettes, CDs, DVDs, or USB drives and delivering them to the users with whom the files are being shared. However, this approach is not suitable in an enterprise environment in which a large number of users at different locations need access to common files. Network-based file sharing provides the flexibility to share files over long distances among a large number of users.

File servers use client-server technology to enable file sharing over a network. To address the tremendous growth of file data in enterprise environments, organizations have been deploying large numbers of file servers. These servers are connected either to direct-attached storage (DAS) or to storage area network (SAN)-attached storage. This has resulted in the proliferation of islands of over-utilized and under-utilized file servers and storage. In addition, such environments have poor scalability, higher management cost, and greater complexity. Network-attached storage (NAS) emerged as a solution to these challenges.

Page 46: Storage technology

NAS is a dedicated, high-performance file sharing and storage device. NAS enables its clients to share files over an IP network. NAS provides the advantages of server consolidation by eliminating the need for multiple file servers. It also consolidates the storage used by the clients onto a single system, making it easier to manage the storage. NAS uses network and file-sharing protocols to provide access to the file data. These protocols include TCP/IP for data transfer, and Common Internet File System (CIFS) and Network File System (NFS) for network file service. NAS enables both UNIX and Microsoft Windows users to share the same data seamlessly.

A NAS device uses its own operating system and integrated hardware and software components to meet specific file-service needs. Its operating system is optimized for file I/O and, therefore, performs file I/O better than a general-purpose server. As a result, a NAS device can serve more clients than general-purpose servers and provide the benefit of server consolidation. A network-based file sharing environment is composed of multiple file servers or NAS devices. It might be required to move files from one device to another for reasons such as cost or performance. File-level virtualization, implemented in the file sharing environment, provides a simple, nondisruptive file-mobility solution. It enables the movement of files across NAS devices, even while the files are being accessed.

This chapter describes the components of NAS, different types of NAS implementations, and the file-sharing protocols used in NAS implementations. The chapter also explains factors that affect NAS performance and file-level virtualization.

Page 47: Storage technology

Components of NAS

- CPU and memory
- One or more network interface cards (NICs), which provide connectivity to the client network. Examples of network protocols supported by NICs include Gigabit Ethernet, Fast Ethernet, ATM, and Fiber Distributed Data Interface (FDDI).
- An optimized operating system for managing the NAS functionality. It translates file-level requests into block-storage requests and further converts the data supplied at the block level to file data.
- NFS, CIFS, and other protocols for file sharing
- Industry-standard storage protocols and ports to connect and manage physical disk resources

The NAS environment includes clients accessing a NAS device over an IP network using file-sharing protocols.

Page 48: Storage technology
Page 49: Storage technology

NAS File-Sharing Protocols

Most NAS devices support multiple file-service protocols to handle file I/O requests to a remote file system. As discussed earlier, NFS and CIFS are the common protocols for file sharing. NAS devices enable users to share file data across different operating environments and provide a means for users to migrate transparently from one operating system to another.

Page 50: Storage technology

NFS

NFS is a client-server protocol for file sharing that is commonly used on UNIX systems. NFS was originally based on the connectionless User Datagram Protocol (UDP). It uses a machine-independent model to represent user data. It also uses Remote Procedure Call (RPC) as a method of inter-process communication between two computers. The NFS protocol provides a set of RPCs to access a remote file system for the following operations:

- Searching files and directories
- Opening, reading, writing to, and closing a file
- Changing file attributes
- Modifying file links and directories

NFS creates a connection between the client and the remote system to transfer data. NFS (NFSv3 and earlier) is a stateless protocol, which means that it does not maintain any kind of table to store information about open files and associated pointers. Therefore, each call provides a full set of arguments to access files on the server. These arguments include a file handle reference to the file, a particular position to read or write, and the version of NFS. A small sketch of such a self-contained request follows.
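The Python sketch below illustrates, with entirely hypothetical field names, why a stateless protocol must carry the complete context (file handle, offset, count) in every read request instead of relying on server-side state:

```python
from dataclasses import dataclass

@dataclass
class NfsReadRequest:
    """Every stateless call is self-contained: nothing depends on an earlier 'open'."""
    file_handle: bytes   # opaque reference to the file on the server
    offset: int          # position in the file to read from
    count: int           # number of bytes to read
    nfs_version: int     # protocol version the client is speaking

# Two independent reads of the same file; the server needs no memory of the first
# request in order to serve the second one.
req1 = NfsReadRequest(file_handle=b"\x01\x02\x03", offset=0, count=4096, nfs_version=3)
req2 = NfsReadRequest(file_handle=b"\x01\x02\x03", offset=4096, count=4096, nfs_version=3)
print(req1)
print(req2)
```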

Page 51: Storage technology

CIFS

CIFS is a client-server application protocol that enables client programs to make requests for files and services on remote computers over TCP/IP. It is a public, or open, variation of the Server Message Block (SMB) protocol. The CIFS protocol enables remote clients to gain access to files on a server. CIFS enables file sharing with other clients by using special locks. Filenames in CIFS are encoded using Unicode characters. CIFS provides the following features to ensure data integrity:

- It uses file and record locking to prevent users from overwriting the work of another user on a file or a record.
- It supports fault tolerance and can automatically restore connections and reopen files that were open prior to an interruption. The fault tolerance features of CIFS depend on whether an application is written to take advantage of these features.

Moreover, CIFS is a stateful protocol because the CIFS server maintains connection information regarding every connected client. If a network failure or CIFS server failure occurs, the client receives a disconnection notification. User disruption is minimized if the application has the embedded intelligence to restore the connection. However, if the embedded intelligence is missing, the user must take steps to reestablish the CIFS connection.

Page 52: Storage technology

Benefits of NAS

- Comprehensive access to information: Enables efficient file sharing and supports many-to-one and one-to-many configurations. The many-to-one configuration enables a NAS device to serve many clients simultaneously. The one-to-many configuration enables one client to connect with many NAS devices simultaneously.
- Improved efficiency: NAS delivers better performance compared to a general-purpose file server because NAS uses an operating system specialized for file serving.
- Improved flexibility: Compatible with clients on both UNIX and Windows platforms using industry-standard protocols. NAS is flexible and can serve requests from different types of clients from the same source.
- Centralized storage: Centralizes data storage to minimize data duplication on client workstations and ensure greater data protection.
- Simplified management: Provides a centralized console that makes it possible to manage file systems efficiently.

Page 53: Storage technology

Benefits of NAS

- Scalability: Scales well with different utilization profiles and types of business applications because of its high-performance and low-latency design.
- High availability: Offers efficient replication and recovery options, enabling high data availability. NAS uses redundant components that provide maximum connectivity options. A NAS device supports clustering technology for failover.
- Security: Ensures security, user authentication, and file locking with industry-standard security schemas.
- Low cost: NAS uses commonly available and inexpensive Ethernet components.
- Ease of deployment: Configuration at the client is minimal, because the clients have the required NAS connection software built in.

Page 54: Storage technology

Factors Affecting NAS Performance

NAS uses an IP network; therefore, bandwidth and latency issues associated with IP affect NAS performance. Network congestion is one of the most significant sources of latency in a NAS environment. Other factors that affect NAS performance at different levels follow:

1. Number of hops: A large number of hops can increase latency because IP processing is required at each hop, adding to the delay caused at the router.
2. Authentication with a directory service such as Active Directory or NIS: The authentication service must be available on the network with enough resources to accommodate the authentication load. Otherwise, a large number of authentication requests can increase latency.
3. Retransmission: Link errors and buffer overflows can result in retransmission. This causes packets that have not reached the specified destination to be re-sent. Care must be taken to match both speed and duplex settings on the network devices and the NAS heads. Improper configuration might result in errors and retransmission, adding to latency.

Page 55: Storage technology

Factors Affecting NAS Performance

4. Over-utilized routers and switches: The amount of time that an over-utilized device in a network takes to respond is always more than the response time of an optimally utilized or underutilized device. Network administrators can view utilization statistics to determine the optimum utilization of switches and routers in a network. Additional devices should be added if the current devices are over-utilized.
5. File system lookup and metadata requests: NAS clients access files on NAS devices. The processing required to reach the appropriate file or directory can cause delays. Sometimes a delay is caused by deep directory structures and can be resolved by flattening the directory structure. Poor file system layout and an over-utilized disk system can also degrade performance.
6. Over-utilized NAS devices: Clients accessing multiple files can cause high utilization levels on a NAS device, which can be determined by viewing utilization statistics. High memory, CPU, or disk subsystem utilization levels can be caused by a poor file system structure or insufficient resources in a storage subsystem.
7. Over-utilized clients: The client accessing CIFS or NFS data might also be over-utilized. An over-utilized client requires a longer time to process the requests and responses. Specific performance-monitoring tools are available for various operating systems to help determine the utilization of client resources.

Page 56: Storage technology
Page 57: Storage technology

Network-Attached Storage

Network-attached storage (NAS) is basically a LAN-attached file server that serves files by using a network protocol, such as Network File System (NFS). NAS refers to storage elements that connect to a network and provide file access services to computer systems. An NAS storage element consists of an engine that implements the file services (by using access protocols, such as NFS or Common Internet File System (CIFS)) and one or more devices, on which data is stored. NAS elements might be attached to any type of network. From a SAN perspective, a SAN-attached NAS engine is treated just like any other server. However, NAS does not provide any of the activities that a server in a server-centric system typically provides, such as email, authentication, or file management. NAS allows more hard disk storage space to be added to a network that already uses servers, without shutting them down for maintenance and upgrades. With an NAS device, storage is not a part of the server. Instead, in this storage-centric design, the server still handles all of the processing of the data, but an NAS device delivers the data to the user. An NAS device does not need to be located within the server; it can exist anywhere in the LAN and can consist of multiple networked NAS devices. These units communicate to a host by using Ethernet and file-based protocols. This method is in contrast to the disk units that were already described, which use Fibre Channel Protocol (FCP) and block-based protocols to communicate.

Page 58: Storage technology

NAS

NAS storage provides acceptable performance and security, and it is often less expensive for servers to implement (for example, Ethernet adapters are less expensive than Fibre Channel adapters). To bridge the two worlds and open up new configuration options for clients, certain vendors, including IBM, sell NAS units that act as a gateway between IP-based users and SAN-attached storage. This configuration allows the connection of the storage device and shares the storage device between your high-performance database servers (attached directly through FC) and your users (attached through IP). These users do not have strict performance requirements.

NAS is an ideal solution for serving files that are stored on the SAN to users in cases where it is impractical and expensive to equip users with Fibre Channel adapters. NAS allows those users to access your storage through the IP-based network that they already have.

Page 59: Storage technology

Storage Area Networks

The Storage Networking Industry Association (SNIA) defines the storage area network (SAN) as a network whose primary purpose is the transfer of data between computer systems and storage elements. A SAN consists of a communication infrastructure, which provides physical connections. It also includes a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust. The term SAN is typically (but not necessarily) identified with block I/O services rather than file access services.

In simple terms, a SAN is a specialized, high-speed network that attaches servers and storage devices. The SAN is sometimes referred to as the network behind the servers. A SAN allows an any-to-any connection across the network, by using interconnect elements, such as switches and directors. The SAN eliminates the traditional dedicated connection between a server and storage, and the concept that the server effectively owns and manages the storage devices.

Page 60: Storage technology

The SAN also eliminates any restriction to the amount of data that a server can access. Traditionally, a server is limited by the number of storage devices that attach to the individual server. Instead, a SAN introduces the flexibility of networking to enable one server or many heterogeneous servers to share a common storage utility. A network might include many storage devices, including disk, tape, and optical storage. Additionally, the storage utility might be located far from the servers that it uses.

The SAN can be viewed as an extension to the storage bus concept. This concept enables storage devices and servers to interconnect by using similar elements, such as LANs and wide area networks (WANs).

SANs create new methods of attaching storage to servers. These new methods can enable great improvements in both availability and performance. The SANs of today are used to connect shared storage arrays and tape libraries to multiple servers, and they are used by clustered servers for failover.

A SAN can be used to bypass traditional network bottlenecks. A SAN facilitates direct, high-speed data transfers between servers and storage devices, potentially in any of the following three ways:

Server to storage: This method is the traditional model of interaction with storage devices. The advantage is that the same storage device might be accessed serially or concurrently by multiple servers.

Server to server: A SAN might be used for high-speed, high-volume communications between servers.

Storage to storage: This outboard data movement capability enables data to be moved without server intervention, therefore freeing up server processor cycles for other activities, such as application processing. Examples include a disk device that backs up its data to a tape device without server intervention, or a remote device mirroring across the SAN.

Page 61: Storage technology

SANs allow applications that move data to perform better, for example, by sending data directly from the source device to the target device with minimal server intervention. SANs also enable new network architectures where multiple hosts access multiple storage devices that connect to the same network. The use of a SAN can potentially offer the following benefits:

Improvements to application availability: Storage is independent of applications and accessible through multiple data paths for better reliability, availability, and serviceability.

Higher application performance: Storage processing is offloaded from servers and moved onto a separate network.

Centralized and consolidated storage: Simpler management, scalability, flexibility, and availability are possible.

Data transfer and vaulting to remote sites: A remote copy of data is enabled for disaster protection and against malicious attacks.

Simplified centralized management: A single image of storage media simplifies management.

Page 62: Storage technology

Storage Area Network Components

Fibre Channel (FC) is the predominant architecture on which most SAN implementations are built. IBM Fibre Channel connection (FICON®) is the standard protocol for IBM z/OS® systems, and Fibre Channel Protocol (FCP) is the standard protocol for open systems.

Page 63: Storage technology

Requirements

With this scenario in mind, you might consider several requirements for the storage infrastructures of today:

- Unlimited and just-in-time scalability: Businesses require the capability to flexibly adapt to the rapidly changing demands for storage resources without performance degradation.
- System simplification: Businesses require an easy-to-implement infrastructure with a minimum amount of management and maintenance. The more complex the enterprise environment, the more costs that are involved in terms of management. Simplifying the infrastructure can save costs and provide a greater return on investment (ROI).
- Flexible and heterogeneous connectivity: The storage resource must be able to support whatever platforms are within the IT environment. This resource is essentially an investment protection requirement that allows for the configuration of a storage resource for one set of systems and, later, the configuration of part of the capacity to other systems on an as-needed basis.
- Security: This requirement guarantees that data from one application or system does not become overlaid or corrupted by other applications or systems. Authorization also requires the ability to fence off the data of one system from other systems.
- Encryption: When sensitive data is stored, it must be read or written only from certain authorized systems. If for any reason the storage system is stolen, data must never be available to be read from the system.
- Hypervisors: This requirement is for the support of the server, application, and desktop virtualization hypervisor features for cloud computing.
- Speed: Storage networks and devices must be able to manage the high number of gigabytes and intensive I/O that are required by each business industry.

Page 64: Storage technology

- Availability: This requirement implies both the protection against media failure and the ease of data migration between devices, without interrupting application processing. This requirement certainly implies improvements to backup and recovery processes. Attaching disk and tape devices to the same networked infrastructure allows for fast data movement between devices, which provides the following enhanced backup and recovery capabilities:
  – Serverless backup: This capability is the ability to back up your data without using the computing processor of your servers.
  – Synchronous copy: This capability ensures that your data is at two or more places before your application goes to the next step.
  – Asynchronous copy: This capability ensures that your data is at two or more places within a short time. The disk subsystem controls the data flow.

Page 65: Storage technology

Storage Area Network Connectivity

Standards and models for storage connectivity: Networking is governed by adherence to standards and models. Data transfer is also governed by standards. By far the most common standard is the Small Computer System Interface (SCSI). SCSI is an American National Standards Institute (ANSI) standard that is one of the leading I/O buses in the computer industry.

An industry effort was started to create a stricter standard to allow devices from separate vendors to work together. This effort is recognized in the ANSI SCSI-1 standard. The SCSI-1 standard (circa 1985) is rapidly becoming obsolete. The current standard is SCSI-2. The SCSI-3 standard is in the production stage.

Fibre Channel is a serial interface (primarily implemented with fiber-optic cable). Fibre Channel is the primary architecture for most SANs. To support this interface, many vendors in the marketplace produce Fibre Channel adapters and other Fibre Channel devices. Fibre Channel brought these advantages by introducing a new protocol stack and by keeping the SCSI-3 CCS on top of it.

Page 66: Storage technology

Options for Storage Connectivity

We divided these components into three sections according to the abstraction level to which they belong: lower-level layers, middle-level layers, and higher-level layers.

Lower-level layers: Only three stacks can directly interact with the physical wire: Ethernet, SCSI, and Fibre Channel. Because of this configuration, these models are considered the lower-level layers. All of the other stacks are combinations of the layers, such as Internet SCSI (iSCSI), Fibre Channel over IP (FCIP), and Fibre Channel over Ethernet (FCoE), which are also called the middle-level layers.

We assume that you have a basic knowledge of Ethernet, which is typically used on conventional server-to-server or workstation-to-server network connections. The connections build up a common-bus topology by which every attached device can communicate with every other attached device by using this common bus. Ethernet speed is increasing as it becomes more pervasive in the data center.

Page 67: Storage technology

Middle-level layers

This section consists of the transport protocol and session layers.

Internet Small Computer System Interface

Internet Small Computer System Interface (iSCSI) is a transport protocol that carries SCSI commands from an initiator to a target. The iSCSI data storage networking protocol transports standard SCSI requests over the standard Transmission Control Protocol/Internet Protocol (TCP/IP) networking technology.

iSCSI enables the implementation of IP-based SANs, enabling clients to use the same networking technologies, for both storage and data networks. Because iSCSI uses TCP/IP, iSCSI is also suited to run over almost any physical network. By eliminating the need for a second network technology just for storage, iSCSI has the potential to lower the costs of deploying networked storage.

Fibre Channel Protocol

The Fibre Channel Protocol (FCP) is the interface protocol of SCSI on Fibre Channel (FC). It is a gigabit speed network technology that is primarily used for storage networking. Fibre Channel is standardized in the T11 Technical Committee of the International Committee of Information Technology Standards (INCITS), an ANSI-accredited standards committee. FCP started for use primarily in the supercomputer field, but FCP is now the standard connection type for SANs in enterprise storage. Despite its name, Fibre Channel signaling can run on both twisted-pair copper wire and fiber optic cables.

Page 68: Storage technology

Fibre Channel over IP

Fibre Channel over IP (FCIP) is also known as Fibre Channel tunneling or storage tunneling. It is a method to allow the transmission of Fibre Channel information to be tunneled through the IP network. Because most organizations already have an existing IP infrastructure, the attraction of being able to link geographically dispersed SANs, at a relatively low cost, is enormous. FCIP encapsulates Fibre Channel block data and then transports it over a TCP socket. TCP/IP services are used to establish connectivity between remote SANs. Congestion control and management and also data error and data loss recovery are handled by TCP/IP services and do not affect Fibre Channel fabric services.

The major consideration with FCIP is that it does not replace Fibre Channel with IP; it allows deployments of Fibre Channel fabrics by using IP tunneling. You might assume that the industry decided that Fibre Channel-based SANs are appropriate. Another possible assumption is that the IP connection is only needed to facilitate any distance requirement that is beyond the current scope of an FCP SAN.

Page 69: Storage technology

Fibre Channel connection

Fibre Channel connection (FICON) architecture is an enhancement of, rather than a replacement for, the traditional IBM Enterprise Systems Connection (ESCON) architecture. A SAN is Fibre Channel-based (FC-based). Therefore, FICON is a prerequisite for IBM z/OS systems to fully participate in a heterogeneous SAN, where the SAN switch devices allow the mixture of open systems and mainframe traffic.

FICON is a protocol that uses Fibre Channel as its physical medium. FICON channels can achieve data rates up to 200 MBps full duplex and extend the channel distance (up to 100 km (62 miles)). FICON can also increase the number of control unit images for each link and the number of device addresses for each control unit link. The protocol can also retain the topology and switch management characteristics of ESCON.

Page 70: Storage technology

Higher-level layers

This section consists of the presentation and application layers.

Server-attached storage

The earliest approach was to tightly couple the storage device with the server. This server-attached storage approach keeps performance overhead to a minimum. Storage is attached directly to the server bus by using an adapter, and the storage device is dedicated to a single server. The server itself controls the I/O to the device, issues the low-level device commands, and monitors device responses.

Initially, disk and tape storage devices had no onboard intelligence. They merely ran the I/O requests of the server. The subsequent evolution led to the introduction of control units (CUs). These units are storage offload servers that contain a limited level of intelligence. The CUs can perform functions, such as I/O request caching for performance improvements or dual copying of data (RAID 1) for availability. Many advanced storage functions are developed and implemented inside the CU.

Page 71: Storage technology

Fibre Channel advantages

Fibre Channel is an open, technical standard for networking that incorporates the channel transport characteristics of an I/O bus with the flexible connectivity and distance characteristics of a traditional network. Because of Fibre Channel’s channel-like qualities, hosts and applications see storage devices that are attached to the SAN as though they are locally attached storage. Because of Fibre Channel’s network characteristics, Fibre Channel can support multiple protocols and a broad range of devices, and it can be managed as a network. Fibre Channel can use either optical fiber (for distance) or copper cable links (for short distance at low cost).

Page 72: Storage technology

Fibre Channel model overview

Page 73: Storage technology

As in other networks, information is sent in structured packets or frames, and data is serialized before transmission. But, unlike other networks, the Fibre Channel architecture includes significant hardware processing to deliver high performance.

Fibre Channel uses a serial data transport scheme that is similar to other computer networks, which stream packets (frames) of bits, one behind the other, in a single data line to achieve high data rates. Serial transfer does not suffer from the problem of skew, so speed and distance are not restricted in the same way that parallel data transfers are restricted. Figure 3-5 shows the process of parallel data transfers versus serial data transfers.

Page 74: Storage technology

Serial transfer enables simpler cabling and connectors, and also the routing of information through switched networks. Fibre Channel can operate over long distances, both natively and by implementing cascading, and longer still with the introduction of repeaters. Just as LANs can be interlinked in wide area networks (WANs) by using high-speed gateways, campus SANs can be interlinked to build enterprise-wide SANs.

Therefore, Fibre Channel combines the best characteristics of traditional I/O channels with the characteristics of computer networks:

High performance for large data transfers by using simple transport protocols and extensive hardware assists

Serial data transmission

A physical interface with a low error rate definition

Reliable transmission of data with the ability to guarantee or confirm error-free delivery of the data

The ability to package data in packets (frames, in Fibre Channel terminology)

Flexibility in terms of the types of information that can be transported in frames (such as data, video, and audio)

Use of existing device-oriented command sets, such as SCSI and FCP

A vast expansion in the number of devices that can be addressed when compared to I/O interfaces: a theoretical maximum of more than 15 million ports.
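The port count quoted above follows from the width of the Fibre Channel port address identifier, which is 24 bits. A one-line arithmetic check of that figure (our illustration, not part of the standard text):

```python
address_bits = 24        # width of a Fibre Channel port address identifier
print(2 ** address_bits) # 16777216 theoretical addresses, i.e., more than 15 million
```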

Page 75: Storage technology

FC Layers

Fibre Channel (FC) is broken up into a series of five layers. The concept of layers, starting with the International Organization for Standardization/open systems interconnection (ISO/OSI) seven-layer model, allows the development of one layer to remain independent of the adjacent layers. Although Fibre Channel contains five layers, those layers follow the general principles that are stated in the ISO/OSI model.

The five layers that make up Fibre Channel can be categorized into the following groups:

Physical and signaling layers (FC-0, FC-1, and FC-2)

Upper layers (FC-3 and FC-4)

Page 76: Storage technology

Physical and signaling layers

The physical and signaling layers include the three lowest layers: FC-0, FC-1, and FC-2.

Physical interface and media: FC-0

The lowest layer, FC-0, defines the physical link in the system, including the cabling, connectors, and electrical parameters for the system at a wide range of data rates. This level is designed for maximum flexibility and allows the use of many technologies to match the needs of the configuration. A communication route between two nodes can be made up of links of different technologies. For example, in reaching its destination, a signal might start out on copper wire and be converted to single-mode fiber for longer distances. This flexibility allows for specialized configurations, depending on IT requirements.

Page 77: Storage technology

Laser safety

Fibre Channel often uses lasers to transmit data and can, therefore, present an optical health hazard. The FC-0 layer defines an open fiber control (OFC) system, which acts as a safety interlock for point-to-point fiber connections that use semiconductor laser diodes as the optical source. If the fiber connection is broken, the ports send a series of pulses until the physical connection is re-established and the necessary handshake procedures are followed.

Transmission protocol: FC-1

The second layer, FC-1, provides the methods for adaptive 8b/10b encoding to bind the maximum length of the code, maintain DC balance, and provide word alignment. This layer is used to integrate the data with the clock information that is required by serial transmission technologies.

Framing and signaling protocol: FC-2

Reliable communications result from the FC-2 framing and signaling protocol. FC-2 specifies a data transport mechanism that is independent of upper-layer protocols. FC-2 is self-configuring and supports point-to-point, arbitrated loop, and switched environments. FC-2, which is the third layer of the Fibre Channel Physical and Signaling interface (FC-PH), provides the transport methods to determine the following factors:

Topologies that are based on the presence or absence of a fabric

Communication models

Classes of service that are provided by the fabric and the nodes

General fabric model

Sequence and exchange identifiers

Segmentation and reassembly

Data is transmitted in 4-byte ordered sets that contain data and control characters. Ordered sets provide the ability to obtain bit and word synchronization, which also establishes word boundary alignment. Together, FC-0, FC-1, and FC-2 form the FC-PH.

Page 78: Storage technology

Common services: FC-3

FC-3 defines functions that span multiple ports on a single node or fabric. Functions that are currently supported include the following features:

Hunt groups: A hunt group is a set of associated node ports (N_ports) that is attached to a single node. This set is assigned an alias identifier that allows any frames that contain the alias to be routed to any available N_port within the set. This process decreases the latency in waiting for an N_port to become available.

Striping: Striping is used to multiply bandwidth by using multiple N_ports in parallel to transmit a single information unit across multiple links.

Multicast: Multicast delivers a single transmission to multiple destination ports. This method includes the ability to broadcast to all nodes or a subset of nodes.

Upper-layer protocol mapping: FC-4

The highest layer, FC-4, provides the application-specific protocols. Fibre Channel is equally adept at transporting both network and channel information and allows both protocol types to be transported concurrently over the same physical interface. Through mapping rules, a specific FC-4 describes how upper-layer protocol (ULP) processes of the same FC-4 type interoperate. A channel example is FCP, which is used to transfer SCSI data over Fibre Channel. A networking example is sending IP packets between the nodes. FICON is another ULP in use today for mainframe systems. FICON is a contraction of Fibre Connection and refers to running ESCON traffic over Fibre Channel.

Page 79: Storage technology

Fiber in the storage area network

Fibre Channel can be run over optical or copper media, but fiber-optic cables offer a major advantage in noise immunity. For this reason, fiber-optic cabling is preferred. However, copper is also used. In the short term, a mixed environment likely needs to be tolerated and supported, although a mixed environment becomes less likely as SANs mature. In addition to noise immunity, fiber-optic cabling provides distinct advantages over copper transmission lines that make it an attractive medium for many applications. The following advantages are at the forefront:

Greater distance capability than is generally possible with copper

Insensitivity to induced electromagnetic interference (EMI)

No emitted electromagnetic radiation, such as Radio Frequency Interference (RFI)

No electrical connection between two ports

No susceptibility to crosstalk

Compact and lightweight cables and connectors

However, fiber-optic and optical links have drawbacks, which include the following considerations:

Optical links tend to be more expensive than copper links over short distances.

Optical connections do not lend themselves to backplane-printed circuit wiring.

Optical connections might be affected by dirt and other contamination.

Overall, optical fibers provide a high-performance transmission medium that has been refined and proven over many years.

Page 80: Storage technology

Normally, fiber-optic cabling is referred to by mode or the frequencies of lightwaves that are carried by a particular cable type. Fiber cables come in two distinct types.

Multi-mode fiber for shorter distances

Multi-mode cabling is used with shortwave laser light and has either a 50-micron or a 62.5-micron core with a cladding of 125 microns. The 50-micron or 62.5-micron diameter is sufficiently large for injected light waves to be reflected off the core interior. Multi-mode fiber (MMF) allows more than one mode of light. Common multi-mode core sizes are 50 micron and 62.5 micron. MMF is better suited for shorter-distance applications. Where costly electronics are heavily concentrated, the primary cost of the system is not the cable. In this case, MMF is more economical because it can be used with inexpensive connectors and laser devices, therefore reducing the total system cost.

Single-mode fiber for longer distances

Single-mode fiber (SMF) allows only one pathway, or mode, of light to travel within the fiber. The core size is typically 8.3 micron. SMFs are used in applications where low signal loss and high data rates are required. An example of this type of application is a long span between two system devices or network devices where repeater and amplifier spacing needs to be maximized.

Page 81: Storage technology

Classes of service

Applications might require different levels of service and guarantees for delivery, connectivity, and bandwidth. Certain applications need bandwidth that is dedicated to the application during the data exchange. An example of this type of application is a tape backup. Other applications might be bursty in nature and not require a dedicated connection, but they might insist that an acknowledgment is sent for each successful transfer. The Fibre Channel standards provide different classes of service to accommodate different applications.

Page 82: Storage technology
Page 83: Storage technology

Class 1

In class 1 service, a dedicated connection between the source and destination is established through the fabric for the duration of the transmission. Class 1 service provides acknowledged service. This class of service ensures that the frames are received by the destination device in the same order in which they are sent. This class reserves full bandwidth for the connection between the two devices. It does not provide good utilization of the available bandwidth because it blocks other possible contenders for the same device. Because of this blocking and the necessary dedicated connections, class 1 is rarely used.

Class 2

Class 2 is a connectionless, acknowledged service. Class 2 makes better use of available bandwidth because it allows the fabric to multiplex several messages on a frame-by-frame basis. While frames travel through the fabric, they can take separate routes, so class 2 service does not guarantee in-order delivery. Class 2 relies on upper-layer protocols to take care of the frame sequence. The use of acknowledgments reduces available bandwidth, which needs to be considered in large-scale busy networks.

Page 84: Storage technology

Class 3

No dedicated connection is available in class 3, and the received frames are not acknowledged. Class 3 is also called datagram connectionless service. It optimizes the use of fabric resources, but it is up to the upper-layer protocol to ensure that all frames are received in the correct order. The upper-layer protocol also needs to request that the source device retransmit missing frames. Class 3 is a commonly used class of service in Fibre Channel networks.

Class 4

Class 4 is a connection-oriented service, which is similar to class 1. The major difference is that class 4 allocates only a fraction of the available bandwidth of the path through the fabric that connects two N_ports. Virtual circuits (VCs) are established between two N_ports with guaranteed quality of service (QoS), including bandwidth and latency. Like class 1, class 4 guarantees the in-order delivery of frames and provides an acknowledgment of delivered frames. However, now the fabric is responsible for multiplexing frames of different VCs. Class 4 service is intended for multimedia applications, such as video, and for applications that allocate an established bandwidth by department within the enterprise. Class 4 is included in the FC-PH-2 standard.

Page 85: Storage technology

Class 5

Class 5 is called isochronous service, and is intended for applications that require immediate delivery of the data as it arrives, with no buffering. Class 5 is not clearly defined yet, and it is not included in the FC-PH documents.

Class 6

Class 6 is a variant of class 1, and it is known as a multicast class of service. It provides dedicated connections for a reliable multicast. An N_port might request a class 6 connection for one or more destinations. A multicast server in the fabric establishes the connections, receives the acknowledgment from the destination ports, and sends the acknowledgment back to the originator. When a connection is established, the connection is retained and guaranteed by the fabric until the initiator ends the connection. Class 6 was designed for applications, such as audio and video, that require multicast functionality. Class 6 is included in the FC-PH-3 standard.

Class F

Class F service is defined in the Fibre Channel Switched Fabric (FC-SW) standard and the FC-SW-2 standard for use by switches that communicate through inter-switch links (ISLs). It is a connectionless service with notification of non-delivery between E_ports that are used for the control, coordination, and configuration of the fabric. Class F is similar to class 2. The major difference is that class 2 works with N_ports that send data frames. Class F is used by E_ports for the control and management of the fabric.

Page 86: Storage technology

Byte-encoding schemes

To transfer data over a high-speed serial interface, the data is encoded before transmission and decoded on reception. The encoding process ensures that sufficient clock information is present in the serial data stream. This information allows the receiver to synchronize to the embedded clock information and successfully recover the data at the required error rate. This 8b/10b encoding finds errors that a parity check cannot: a parity check detects only odd numbers of bit errors, not even numbers, whereas the 8b/10b encoding logic finds almost all errors. First developed by IBM, the 8b/10b encoding process converts each 8-bit byte into two possible 10-bit characters. This scheme is called 8b/10b encoding because it refers to the number of data bits that are input to the encoder and the number of bits that are output from the encoder. An 8b/10b character is named in the format Ann.m:

A represents D for data or K for a special character.

nn is the decimal value of the lower 5 bits (EDCBA).

The (.) is a period.

m is the decimal value of the upper 3 bits (HGF).

Page 87: Storage technology

The following steps occur in the encoding example:

1. The hexadecimal representation x’59’ is converted to binary: 01011001.
2. The upper 3 bits are separated from the lower 5 bits: 010 11001.
3. The order is reversed, and each group is converted to decimal: 25.2.
4. The letter notation D (for data) is assigned, and the character becomes D25.2.
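As a quick check of those steps, the following minimal Python sketch derives the Dnn.m name for any data byte. It covers only D characters, not the K special characters, and does not perform the actual 10-bit encoding:

```python
def dnn_m_name(byte_value: int) -> str:
    """Return the 8b/10b 'Dnn.m' name for a data byte (0-255)."""
    low5 = byte_value & 0b11111         # bits EDCBA -> nn
    high3 = (byte_value >> 5) & 0b111   # bits HGF   -> m
    return f"D{low5}.{high3}"

# The worked example from the text: x'59' -> D25.2
print(dnn_m_name(0x59))
```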

Page 88: Storage technology

Data transfer

Data is sent in frames. One or more related frames make up a sequence. One or more related sequences make up an exchange.

Frames

Fibre Channel places a restriction on the length of the data field of a frame at 528 transmission words, which is 2112 bytes. Larger amounts of data must be transmitted in several frames. This larger unit that consists of multiple frames is called a sequence. An entire transaction between two ports is made up of sequences that are administered by an even larger unit that is called an exchange.
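The following minimal Python sketch shows how an upper layer's payload maps onto frame-sized pieces that together form one sequence. It illustrates the size limit only; real FC-2 framing also adds headers, CRCs, and ordered sets, which are ignored here.

```python
FC_MAX_FRAME_PAYLOAD = 2112  # bytes: 528 transmission words x 4 bytes per word

def build_sequence(payload: bytes, max_payload: int = FC_MAX_FRAME_PAYLOAD):
    """Split a payload into frame-sized chunks; the related frames form one sequence."""
    return [payload[i:i + max_payload] for i in range(0, len(payload), max_payload)]

# Example: a 5000-byte transfer needs three frames (2112 + 2112 + 776 bytes).
frames = build_sequence(b"\x00" * 5000)
print(len(frames), [len(f) for f in frames])
```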

Page 89: Storage technology

Flow control

Now that you know that data is sent in frames, you also must understand that devices need to temporarily store the frames as they arrive. The data frames must be stored until they are assembled in sequence and then delivered to the upper-layer protocol. Because of the potentially high bandwidth of Fibre Channel, it is possible to inundate and overwhelm a target device with frames, so a mechanism must exist to prevent this situation. The ability of a device to accept a frame is called its credit. This credit is typically referred to as the number of buffers (its buffer credit) that a node maintains for accepting incoming data.
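The toy Python sketch below models the credit idea: a sender may transmit only while it holds credits, and each acknowledgment from the receiver (an R_RDY primitive in Fibre Channel terms) returns one credit. It is a simplified model of the concept, not of the actual FC-2 primitives.

```python
class CreditedLink:
    """Toy model of buffer-to-buffer credit flow control."""

    def __init__(self, buffer_credits: int):
        self.credits = buffer_credits

    def send_frame(self) -> bool:
        if self.credits == 0:
            return False          # must wait: the receiver's buffers may be full
        self.credits -= 1         # one receive buffer is now considered in use
        return True

    def receive_r_rdy(self):
        self.credits += 1         # the receiver freed a buffer and returned a credit

link = CreditedLink(buffer_credits=3)
print([link.send_frame() for _ in range(4)])  # [True, True, True, False]
link.receive_r_rdy()
print(link.send_frame())                      # True again after a credit is returned
```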

Framing rules

The following rules apply to the framing protocol:

A frame is the smallest unit of information transfer.

A sequence has at least one frame.

An exchange has at least one sequence.

Page 90: Storage technology

Fibre Channel topologies

Fibre Channel-based networks support three types of base topologies: point-to-point, arbitrated loop, and switched fabric. A switched fabric is the most commonly encountered topology today, and it has subclassifications of topology.

Page 91: Storage technology

Point-to-point topology

A point-to-point connection is the simplest topology. It is used when exactly two nodes exist, and future expansion is not predicted. Media is not shared, which allows the devices to use the total bandwidth of the link. A simple link initialization is needed before communications can begin.

Fibre Channel is a full-duplex protocol, which means that both paths transmit data simultaneously. For example, Fibre Channel connections that are based on the 1 Gbps standard can transmit at 100 megabytes per second (MBps) and receive at 100 MBps simultaneously. Fibre Channel connections that are based on the 2 gigabits per second (Gbps) standard can transmit at 200 MBps and receive at 200 MBps simultaneously. This scaling also extends to 4 Gbps, 8 Gbps, and 16 Gbps technologies.
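The pattern above generalizes: with 8b/10b encoding, each 1 Gbps of nominal link speed carries roughly 100 MBps of payload in each direction. The short Python sketch below applies that rule of thumb; the 16 Gbps figure is only approximate because 16GFC moves to a different (64b/66b) encoding.

```python
def approx_throughput_mbps(nominal_gbps: int) -> int:
    """Rule-of-thumb payload rate per direction for 8b/10b-encoded FC links."""
    return nominal_gbps * 100

for speed in (1, 2, 4, 8, 16):
    rate = approx_throughput_mbps(speed)
    print(f"{speed} Gbps FC ~ {rate} MBps each way, {2 * rate} MBps full duplex")
```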

Page 92: Storage technology

Arbitrated loop topology

Although this topology is rarely encountered anymore and is considered a historical topology, we include it for historical reasons.

Our second topology is Fibre Channel Arbitrated Loop (FC-AL). FC-AL is more useful for storage applications. It is a loop of up to 126 node loop ports (NL_ports) that is managed as a shared bus. Traffic flows in one direction, carrying data frames and primitives around the loop with a total bandwidth of 400 MBps (or 200 MBps for a loop-based topology on 2 Gbps technology).

Using arbitration protocol, a single connection is established between a sender and a receiver, and a data frame is transferred around the loop.

When the communication comes to an end between the two connected ports, the loop becomes available for arbitration and a new connection might be established. Loops can be configured with hubs to make connection management easier. A distance of up to 10 km (6.2 miles) is supported by the Fibre Channel standard for both of these configurations. However, latency on the arbitrated loop configuration is affected by the loop size.

Page 93: Storage technology

Switched fabric topology

Our third topology, and the most useful topology that is used in SAN implementations, is Fibre Channel Switched Fabric (FC-SW). It applies to switches and directors that support the FC-SW standard; that is, it is not limited to switches as its name suggests. A Fibre Channel fabric is one or more fabric switches in a single, sometimes extended, configuration. Switched fabrics provide full bandwidth for each port, compared to the shared bandwidth for each port in arbitrated loop implementations.

One key differentiator is that if you add a device into the arbitrated loop, you further divide the shared bandwidth. However, in a switched fabric, adding a device or a new connection between existing devices actually increases the aggregate bandwidth. For example, an 8-port switch (assume that it is based on 2 Gbps technology) with three initiators and three targets can support three concurrent 200 MBps conversations, or a total of 600 MBps throughput. This total equates to 1,200 MBps if full-duplex applications are available. This configuration is one of the major reasons why arbitrated loop is considered a historical SAN topology. A switched fabric is typically referred to as a fabric.

In terms of switch interconnections, the switched SAN topologies can be classified as the following types:

Single switch topology

Cascaded and ring topology

Mesh topology

Page 94: Storage technology

Single switch topology

The single switch topology has only one switch and no inter-switch links (ISLs). It is the simplest design for infrastructures that do not need any redundancy. Because this topology introduces a single point of failure (SPOF), it is rarely used.

Cascaded and ring topology

In a cascaded topology, switches are connected in a queue fashion. In a ring topology, the switches also connect in a queue fashion, but the ring topology forms a closed ring with an additional ISL.

Page 95: Storage technology

Mesh topology

In a full mesh topology, each switch is connected to every other switch in the fabric.

In terms of a tiered approach, the switched fabric can be further classified into the following topologies:

1. Core-edge topology
2. Edge-core-edge topology

Core-edge topology

In a core-edge topology, the servers are connected to the edge fabric and the storage is connected to the core switches.

Edge-core-edge topology

In this topology, the servers and storage are connected to the edge fabric, and the core switch connectivity is used only for scalability in terms of connecting the edge switches. This configuration can extend the SAN traffic flow over long distances by using dense wavelength division multiplexing (DWDM), and by connecting to virtualization appliances and encryption switches. Also, the servers might be isolated to one edge and the storage can be at the other edge, which helps with management.

Page 96: Storage technology

Fibre Channel Arbitrated Loop protocols

To support the shared behavior of Fibre Channel Arbitrated Loop (FC-AL), many loop-specific protocols are used. These protocols are used in the following ways:

Initialize the loop and assign addresses.

Arbitrate for access to the loop.

Open a loop circuit with another port in the loop.

Close a loop circuit when two ports complete their current use of the loop.

Implement the access fairness mechanism to ensure that each port has an opportunity to access the loop.

Page 97: Storage technology

Storage area network devices

A Fibre Channel SAN employs a fabric to connect devices, or end points. A fabric can be as simple as a single cable that connects two devices, similar to server-attached storage. However, the term is most often used to describe a more complex network that connects servers and storage by using switches, directors, and gateways. Independent of the size of the fabric, a good SAN environment starts with good planning and always includes an up-to-date map of the SAN.

Fibre Channel bridges

Fibre Channel bridges allow the integration of traditional SCSI devices in a Fibre Channel network. Fibre Channel bridges provide the capability for Fibre Channel and SCSI interfaces to support both SCSI and Fibre Channel devices seamlessly. Therefore, they are often referred to as FC-SCSI routers.

(Do not confuse Fibre Channel bridges with data center bridging (DCB), although fundamentally they serve the same purpose, which is to interconnect different protocols.)

A bridge is a device that converts signals and data from one form to another form. You can imagine these devices in a similar way as the bridges that we use to cross rivers. They act as a translator (a bridge) between two different protocols. These protocols can include the following types:

Fibre Channel

Internet Small Computer System Interface (iSCSI)

Serial Storage Architecture (SSA)

Fibre Channel over IP (FCIP)

We do not see many of these devices today, and they are considered historical devices.

Page 98: Storage technology

Arbitrated loop hubs and switched hubs

Fibre Channel Arbitrated Loop (FC-AL) is a Fibre Channel topology in which devices connect in a one-way loop fashion in a ring topology. In FC-AL, all devices on the loop share the bandwidth. The total number of devices that might participate in the loop is 126, without using any hubs or fabric. For practical reasons, however, the number tends to be limited to no more than 10 - 15.

Hubs are typically used in a SAN to attach devices or servers that do not support switched fabric, only FC-AL. They might be unmanaged hubs, managed hubs, or switched hubs. Unmanaged hubs serve as cable concentrators and as a means to configure the arbitrated loop based on the connections that the hub detects. When one of the interfaces on the hub, typically a gigabit interface converter (GBIC), senses that no cable is connected, that interface shuts down. The hub port is then bypassed as part of the arbitrated loop configuration. Managed hubs offer all of the benefits of unmanaged hubs, but in addition, they can be managed remotely by using Simple Network Management Protocol (SNMP).

By using FC-AL, you can connect many servers and storage devices without using costly Fibre Channel switches. FC-AL is not used much today because switched fabrics now lead in the Fibre Channel market.

Page 99: Storage technology

Switched hubs

Switched hubs allow devices to be connected in their own arbitrated loop. These loops are then internally connected by a switched fabric. A switched hub is useful to connect several FC-AL devices together while allowing them to communicate at full Fibre Channel bandwidth rather than all sharing the bandwidth. Switched hubs are typically managed hubs.

Switches and directors

Switches and directors allow Fibre Channel devices to be connected (cascaded) together, implementing a switched fabric topology between them. The switch intelligently routes frames from the initiator to the responder and operates at full Fibre Channel bandwidth. Switches can be connected in cascades and meshes by using inter-switch links (ISLs) or expansion ports (E_ports).

The switch also provides various fabric services and features. The following list provides examples:

Name service

Fabric control

Time service

Automatic discovery and registration of host and storage devices

Rerouting of frames, if possible, in the event of a port problem

Storage services (virtualization, replication, and extended distances)

It is common to refer to switches as either core switches or edge switches, depending on where they are in the SAN. If the switch forms, or is part of, the SAN backbone, it is a core switch. If the switch is mainly used to connect to hosts or storage, it is called an edge switch. Directors are also sometimes referred to as switches because they are essentially switches. Directors are large switches with higher redundancy than most normal switches.

Page 100: Storage technology

Multiprotocol routing

Certain devices are multiprotocol routers. Multiprotocol routers and devices provide improved scalability, security, and manageability by enabling devices in separate SAN fabrics to communicate without merging the fabrics into a single, large meta-SAN fabric. Depending on the manufacturer, multiprotocol routers and devices support many protocols and offer their own features, such as zoning. The following list shows the supported protocols:

Fibre Channel Protocol (FCP)

Fibre Channel over IP (FCIP)

Internet Fibre Channel Protocol (iFCP)

Internet Small Computer System Interface (iSCSI)

Internet Protocol (IP)

Page 101: Storage technology

Gigabit transport technology

In Fibre Channel technology, frames are moved from the source to the destination by using gigabit transport, which is required to achieve fast transfer rates. To communicate with gigabit transport, both sides must support this type of communication. You can obtain this support by installing this feature in the device or by using specially designed interfaces that can convert other communication transport into gigabit transport. The Fibre Channel standard allows a bit error rate (BER) of no more than one bit error in every 1,000,000,000,000 (10^12) bits. Gigabit transport can be used in a copper or fiber-optic infrastructure.
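As a back-of-the-envelope illustration of what that BER limit means in practice (this calculation is ours, not part of the standard), the worst-case average time between bit errors at a given line rate can be computed as follows:

```python
BER_LIMIT = 1e-12  # at most one bit error per 10^12 bits

def seconds_between_errors(line_rate_gbps: float, ber: float = BER_LIMIT) -> float:
    """Worst-case average time between bit errors at a given line rate."""
    bits_per_second = line_rate_gbps * 1e9
    return 1.0 / (bits_per_second * ber)

# At an 8 Gbps line rate, one error per 10^12 bits is roughly one error every 125 seconds.
print(seconds_between_errors(8))  # 125.0
```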

Layer 1 of the open systems interconnection (OSI) model is the layer at which the physical transmission of data occurs. The unit of transmission at Layer 1 is a bit. This section explains the common concepts of the Layer 1 level.

Page 102: Storage technology

Fibre Channel cabling

Fibre Channel cabling is available in two forms: fiber optic cabling or copper cabling. Fiber optic cabling is the typical cabling type, but Fibre Channel over Ethernet (FCoE) copper cabling is also available. Fiber optic cabling is more expensive than copper cabling: the optical components for devices and switches, and the client cabling itself, are typically more expensive to install. However, the higher costs are often easily justified by the benefits of fiber optic cabling. Fiber optic cabling provides for longer distances and is resistant to signaling distortion from electromagnetic interference. In fiber optic cabling, light is used to transmit the data, and the cabling is the medium for channeling the light signals between devices in the network.

Two modes of fiber optic signaling are explained in this chapter: single-mode and multi-mode. The difference between the modes is the wavelength of the light that is used for the transmission and the size of the fiber core.

Page 103: Storage technology

IP SAN

An IP SAN is a dedicated storage area network (SAN) that allows multiple servers to access pools of shared block storage devices by using storage protocols that depend on the Internet Engineering Task Force (IETF) standard Internet Protocol suite.

The storage protocols designed to move block-based data between a host server and storage array include the Internet Small Computer Systems Interface (iSCSI), Fibre Channel over IP (FCIP) and Internet Fibre Channel Protocol (iFCP).

The most common type of IP SAN uses iSCSI to encapsulate SCSI commands and assemble data into packets for transfer between the host servers and storage devices. IP SAN protocols typically run over a standard Ethernet network and use the Transmission Control Protocol/Internet Protocol (TCP/IP) for communication. An IP SAN for block-based data is often referred to as an iSCSI SAN.

An IP SAN is generally viewed as lower cost, less complex and simpler to manage than an FC SAN. An FC SAN requires special hardware such as host bus adapters and FC switches, whereas the IP SAN can use commodity Ethernet networking hardware. One potential disadvantage of an IP SAN is higher latency than an FC SAN, which uses deterministic layer 2 switching technology.

Page 104: Storage technology

Non-redundant IP SAN implementation

This is the simplest, non-redundant implementation of an IP SAN. The benefit is low infrastructure cost, but if any component within the IP SAN fails, access to the data is lost.

Page 105: Storage technology

Highly available IP SAN configuration (fully redundant iSCSI SAN with duplex controllers)

Such a configuration requires a larger investment, but if any component fails, the data is still accessible. This setup also allows you to use the high availability (HA) features of the DS3300, such as online firmware upgrades and load balancing between the two NICs in the server to a logical drive owned by Controller 0, for example. This is also the preferred option for using the DS3300 in an IP SAN.

Page 106: Storage technology

Storage Virtualization Technology

What Is It?

Storage virtualization is a technique that creates a unified view of all deployed storage devices no matter the vendor and allows them to be collectively managed in a uniform way. Virtualization hides the potential complexity created by having a variety of disparate storage devices, therefore providing the flexibility to make changes to the configuration of those storage devices without disrupting the users of the computers they serve. By doing so, storage virtualization also helps a business optimize storage efficiency, improve overall system performance, and maximize return on IT investments.

What's the Need?

Organizations of all sizes and in all industries are both consuming and creating information at an ever-increasing rate. This information is typically stored on a collection of different storage devices. Changing business needs, acquisitions, the advancement of technology, and other forces can lead to unwanted complexity and inefficiency even among the best managed IT infrastructures. In terms of disk storage, this can result in a heterogeneous collection of disk storage devices differing in size, speed, configuration, brand, features, and other characteristics, thereby increasing complexity.

Page 107: Storage technology

As the number of disk storage devices increases, so do inefficiencies and the effort necessary to manage the system. For example, moving data from one disk subsystem to another disk subsystem (to balance workloads, perform maintenance, recover from a failure, etc.) can be time consuming, error prone, and may require taking host computers down, thereby disrupting users. Further, a physical disk subsystem may be dedicated to a specific application or host, preventing any other application from using any unused storage capacity it may have.

Storage virtualization provides a way to insulate the host computers and their users from the complexity associated with mixing different types of storage systems. In doing so, storage virtualization can help a business simplify storage management, improve system availability, add advanced functions to older storage devices, improve storage utilization and efficiency, gain the flexibility necessary to balance workloads, optimize storage performance, and implement an effective disaster recovery strategy, thus improving business continuity.

Page 108: Storage technology

How Does it Work?

In computing terms, virtualization is a smarter way to deploy physical computing resources (memory, storage, network adapters, etc.). It is a clever technique that has been around for decades but has now reached a new level of sophistication that delivers maximum computing efficiency and flexibility. In the context of disk storage, virtualization can be implemented anywhere along the path from application programs running in the host to the physical disk drives themselves. Virtualization can be integrated into disk storage devices or implemented in a separate device. The SVC (IBM SAN Volume Controller) is an appliance, i.e., specialized hardware and software packaged together to implement a specific function (in this case, storage virtualization).

Page 109: Storage technology

Virtualization in SAN

Block-level Storage Virtualization

Block-level storage virtualization aggregates block storage devices (LUNs) and enables provisioning of virtual storage volumes, independent of the underlying physical storage. A virtualization layer, which exists at the SAN, abstracts the identity of physical storage devices and creates a storage pool from heterogeneous storage devices. Virtual volumes are created from the storage pool and assigned to the hosts. Instead of being directed to the LUNs on the individual storage arrays, the hosts are directed to the virtual volumes provided by the virtualization layer. For hosts and storage arrays, the virtualization layer appears as the target and initiator devices, respectively. The virtualization layer maps the virtual volumes to the LUNs on the individual arrays. The hosts remain unaware of the mapping operation and access the virtual volumes as if they were accessing the physical storage attached to them. Typically, the virtualization layer is managed via a dedicated virtualization appliance to which the hosts and the storage arrays are connected.
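A rough way to picture the mapping that the virtualization layer maintains is a simple lookup table from virtual volumes to back-end array LUNs. The Python sketch below uses hypothetical volume, array, and LUN names and shows only the mapping idea, including how repointing a mapping supports the nondisruptive migration described on the next page; a real virtualization appliance does this in the data path with far more machinery.

```python
# Hypothetical volume, array, and LUN names; a minimal sketch of the mapping idea.
virtual_to_physical = {
    "vvol-01": ("array-A", "lun-17"),
    "vvol-02": ("array-B", "lun-03"),
}

def route_io(virtual_volume: str, offset: int, length: int):
    """Redirect a host I/O on a virtual volume to the backing array LUN."""
    array, lun = virtual_to_physical[virtual_volume]
    return {"array": array, "lun": lun, "offset": offset, "length": length}

# The host only ever addresses "vvol-01"; the virtualization layer decides
# which array and LUN actually service the request.
print(route_io("vvol-01", offset=4096, length=8192))

# Nondisruptive migration: repoint the mapping; the host's target is unchanged.
virtual_to_physical["vvol-01"] = ("array-B", "lun-21")
print(route_io("vvol-01", offset=4096, length=8192))
```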

Page 110: Storage technology

The figure shows two physical servers, each of which has one virtual volume assigned. These virtual volumes are used by the servers and are mapped to the LUNs in the storage arrays. When an I/O is sent to a virtual volume, it is redirected through the virtualization layer at the storage network to the mapped LUNs. Depending on the capabilities of the virtualization appliance, the architecture may allow for more complex mapping between array LUNs and virtual volumes. Block-level storage virtualization enables extending the storage volumes online to meet application growth requirements. It consolidates heterogeneous storage arrays and enables transparent volume access.

Block-level storage virtualization also provides the advantage of nondisruptive data migration. In a traditional SAN environment, LUN migration from one array to another is an offline event because the hosts need to be updated to reflect the new array configuration. In other instances, host CPU cycles are required to migrate data from one array to the other, especially in a multivendor environment. With a block-level virtualization solution in place, the virtualization layer handles the back-end migration of data, which enables LUNs to remain online and accessible while data is migrating. No physical changes are required because the host still points to the same virtual targets on the virtualization layer. However, the mapping information on the virtualization layer must be changed. These changes can be executed dynamically and are transparent to the end user.

Page 111: Storage technology

Previously, block-level storage virtualization provided nondisruptive data migration only within a data center. The new generation of block-level storage virtualization enables nondisruptive data migration both within and between data centers. It provides the capability to connect the virtualization layers at multiple data centers. The connected virtualization layers are managed centrally and work as a single virtualization layer stretched across data centers. This enables the federation of block-storage resources both within and across data centers. The virtual volumes are created from the federated storage resources.

Page 112: Storage technology
Page 113: Storage technology
Page 114: Storage technology

Virtual SAN (VSAN)

A virtual SAN (also called a virtual fabric) is a logical fabric on an FC SAN, which enables communication among a group of nodes regardless of their physical location in the fabric. In a VSAN, a group of hosts or storage ports communicate with each other using a virtual topology defined on the physical SAN. Multiple VSANs may be created on a single physical SAN. Each VSAN acts as an independent fabric with its own set of fabric services, such as name server and zoning. Fabric-related configurations in one VSAN do not affect the traffic in another.

VSANs improve SAN security, scalability, availability, and manageability. VSANs provide enhanced security by isolating the sensitive data in a VSAN and by restricting access to the resources located within that VSAN. The same Fibre Channel address can be assigned to nodes in different VSANs, thus increasing the fabric scalability. Events causing traffic disruptions in one VSAN are contained within that VSAN and are not propagated to other VSANs. VSANs facilitate an easy, flexible, and less expensive way to manage networks. Configuring VSANs is easier and quicker compared to building separate physical FC SANs for various node groups. To regroup nodes, an administrator simply changes the VSAN configurations without moving nodes and recabling.
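The isolation and regrouping behavior can be pictured as nothing more than a port-to-VSAN assignment table, as in the minimal Python sketch below. The port names and VSAN IDs are hypothetical, and a real switch also gives each VSAN its own fabric services, which this does not model.

```python
# Hypothetical port names and VSAN IDs.
port_to_vsan = {
    "host-1-port": 10,    # engineering VSAN
    "host-2-port": 20,    # finance VSAN
    "array-1-port": 10,
    "array-2-port": 20,
}

def can_communicate(port_a: str, port_b: str) -> bool:
    """Two ports see each other only if they belong to the same VSAN."""
    return port_to_vsan[port_a] == port_to_vsan[port_b]

print(can_communicate("host-1-port", "array-1-port"))  # True  (same VSAN 10)
print(can_communicate("host-1-port", "array-2-port"))  # False (different VSANs)

# Regrouping a node is a configuration change, not a recabling exercise:
port_to_vsan["host-1-port"] = 20
print(can_communicate("host-1-port", "array-2-port"))  # True after the change
```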

Page 115: Storage technology

Storage cloud overview

A storage cloud provides storage as a service (SaaS) to storage consumers. It can be delivered in any of the previously described cloud delivery models (public, private, hybrid, and community). A storage cloud can be used to support a diverse range of storage needs, including mass data stores, file shares, backup, archive, and more. Implementations range from public user data stores to large private storage area networks (SAN) or network-attached storage (NAS), hosted in-house or at third-party managed facilities. The following examples are publicly available storage clouds:

IBM SmartCloud offers various storage options, including archive, backup, and object storage.

SkyDrive from Microsoft allows the public to store and share nominated files on the Microsoft public storage cloud service.

Email services, such as Hotmail, Gmail, and Yahoo, store user email and attachments in their respective storage clouds.

Facebook and YouTube allow users to store and share photos and videos.

Storage cloud capability can also be offered in the form of storage as a service, where you pay based on the amount of storage space used. A storage cloud can be used in various ways, based on your organization's specific requirements.

Page 116: Storage technology

Although devices can access SAN or NAS storage directly, the SAN or NAS storage can itself use a storage cloud for backup or other purposes.

Public Cloud

Private Cloud

Community Cloud

Hybrid Cloud

Page 117: Storage technology
Page 118: Storage technology

Cloud Computing Infrastructure

A cloud computing infrastructure is the collection of hardware and software that enables the five essential characteristics of cloud computing. Cloud computing infrastructure usually consists of the following layers:

Physical infrastructure

Virtual infrastructure

Applications and platform software

Cloud management and service creation tools

The resources of these layers are aggregated and coordinated to provide cloud services to the consumer.

Page 119: Storage technology

Physical Infrastructure

The physical infrastructure consists of physical computing resources, which include physical servers, storage systems, and networks. Physical servers are connected to each other, to the storage systems, and to the clients via networks, such as IP, FC SAN, IP SAN, or FCoE networks.

Cloud service providers may use physical computing resources from one or more data centers to provide services. If the computing resources are distributed across multiple data centers, connectivity must be established among them. The connectivity enables the data centers in different locations to work as a single large data center. This enables migration of business applications and data across data centers and provisioning cloud services using the resources from multiple data centers.

Page 120: Storage technology

Virtual Infrastructure

Cloud service providers employ virtualization technologies to build a virtual infrastructure layer on the top of the physical infrastructure. Virtualization enables fulfilling some of the cloud characteristics, such as resource pooling and rapid elasticity. It also helps reduce the cost of providing the cloud services. Some cloud service providers may not have completely virtualized their physical infrastructure yet, but they are adopting virtualization for better efficiency and optimization.

Virtualization abstracts physical computing resources and provides a consolidated view of the resource capacity. The consolidated resources are managed as a single entity called a resource pool. For example, a resource pool might group CPUs of physical servers within a cluster. The capacity of the resource pool is the sum of the power of all CPUs (for example, 10,000 megahertz) available in the cluster. In addition to the CPU pool, the virtual infrastructure includes other types of resource pools, such as memory pool, network pool, and storage pool.

Apart from resource pools, the virtual infrastructure also includes identity pools, such as VLAN ID pools and VSAN ID pools. The number of each type of pool and the pool capacity depend on the cloud service provider’s requirement to create different cloud services. Virtual infrastructure also includes virtual computing resources, such as virtual machines, virtual storage volumes, and virtual networks. These resources obtain capacities, such as CPU power, memory, network bandwidth, and storage space from the resource pools. The capacity is allocated to the virtual computing resources easily and flexibly based on the service requirement. Virtual networks are created using network identifiers, such as VLAN IDs and VSAN IDs from the respective identity pools. Virtual computing resources are used for creating cloud infrastructure services.
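The CPU pool described above can be pictured as a running total that virtual machines draw capacity from. The short Python sketch below uses hypothetical numbers that match the 10,000 megahertz example; real cloud platforms track many pool types (memory, network, storage, and identity pools) and enforce far richer policies.

```python
# Hypothetical capacities; a minimal sketch of resource pooling and allocation.
cluster_cpus_mhz = [2500, 2500, 2500, 2500]   # four physical CPUs in a cluster

cpu_pool_mhz = sum(cluster_cpus_mhz)          # pool capacity = 10,000 MHz
allocations = {}

def allocate(vm_name: str, mhz: int) -> bool:
    """Carve capacity for a virtual machine out of the shared CPU pool."""
    global cpu_pool_mhz
    if mhz > cpu_pool_mhz:
        return False                          # not enough free capacity left
    cpu_pool_mhz -= mhz
    allocations[vm_name] = mhz
    return True

print(allocate("vm-app-01", 3000))   # True; 7,000 MHz remains in the pool
print(allocate("vm-db-01", 8000))    # False; exceeds the remaining capacity
print(cpu_pool_mhz, allocations)
```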

Page 121: Storage technology

Applications and Platform Software

This layer includes a suite of business applications and platform software, such as the OS and database. Platform software provides the environment on which business applications run. Applications and platform software are hosted on virtual machines to create SaaS and PaaS. For SaaS, both the application and platform software are provided by cloud service providers. In the case of PaaS, only the platform software is provided by cloud service providers; consumers export their applications to the cloud.

Cloud Management and Service Creation Tools

The cloud management and service creation tools layer includes three types of software:

Physical and virtual infrastructure management software

Unified management software

User-access management software

This classification is based on the different functions performed by the software. These software components interact with each other to automate the provisioning of cloud services. The physical and virtual infrastructure management software is offered by the vendors of various infrastructure resources and by third-party organizations. For example, a storage array has its own management software. Similarly, networks and physical servers are managed independently by using network and compute management software, respectively. This software provides interfaces to construct a virtual infrastructure from the underlying physical infrastructure.

Page 122: Storage technology

References

1. Introduction to Storage Area Networks, IBM Redbooks, International Technical Support Organization, January 2016.

2. Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments, 2nd Edition, EMC Corporation, 2012.