Back To Basics — Disk Driver and Storage

Disk usage does not grow linearly; it tends to grow exponentially. As usage climbs, available disk space can run out sooner than expected, so planning storage is very important.
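To make the difference between linear and exponential growth concrete, here is a minimal sketch that estimates how long a volume lasts under each assumption. The function names and the example figures (4 TB used, 10 TB capacity, 5% monthly growth) are illustrative assumptions, not measurements.

```python
import math


def months_until_full_exponential(used_tb: float, capacity_tb: float,
                                  monthly_growth: float) -> float:
    """Months until the volume is full if usage compounds by a fixed
    fraction each month (0.05 means 5% per month)."""
    if used_tb <= 0 or monthly_growth <= 0:
        raise ValueError("used_tb and monthly_growth must be positive")
    if used_tb >= capacity_tb:
        return 0.0
    # Solve used * (1 + g)^m = capacity for m.
    return math.log(capacity_tb / used_tb) / math.log(1 + monthly_growth)


def months_until_full_linear(used_tb: float, capacity_tb: float,
                             monthly_increase_tb: float) -> float:
    """Months until the volume is full if usage grows by a fixed
    amount each month."""
    if monthly_increase_tb <= 0:
        raise ValueError("monthly_increase_tb must be positive")
    return max(0.0, (capacity_tb - used_tb) / monthly_increase_tb)


# Hypothetical volume: 4 TB used out of 10 TB.
exp_months = months_until_full_exponential(4, 10, 0.05)
lin_months = months_until_full_linear(4, 10, 0.2)  # 0.2 TB is 5% of 4 TB
print(f"exponential: {exp_months:.1f} months, linear: {lin_months:.1f} months")
```

With these assumed numbers, the exponential estimate runs out of space many months before the linear one, which is why a linear forecast can be dangerously optimistic.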

Let us look at the types of disk drives.

Hard Disk Drives

Hard disk drives (HDDs), also known as spinning drives, are traditional magnetic disk drives. Capacities of modern HDDs range anywhere from 500 GB to 6 TB. HDDs include mechanical components such as the motor that spins the disk platters and the servos that move the read/write heads over the spinning platters to read and write data.

The performance of an HDD depends in large part on how fast the disk platters spin. Disk speed is measured in revolutions per minute (RPM), with three speeds being common: 7.2k, 10k, and 15k. Higher-RPM drives perform better because the read/write heads wait less time for data to arrive under them. In addition, once the data arrives, it can be read or written faster because the magnetic medium of the platter travels past the heads at a higher speed.
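The waiting time described above can be estimated with simple arithmetic: on average, the heads wait half a revolution for the data to come around. The sketch below computes that average rotational latency for the three common speeds. It ignores seek time and transfer time, so it is only one component of real-world access time.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time for half a
    revolution of the platter."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2 * 1000


for rpm in (7200, 10000, 15000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

A 15k drive waits about half as long as a 7.2k drive for the same sector to arrive.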

Higher-RPM drives are also more expensive than slower-spinning drives because greater engineering care is needed to safely spin the platters at higher speeds.

Solid State Drives

Solid state drives (SSDs) are all-electronic devices with no moving parts. They are based on memory technology and are considerably faster than HDDs. They are also considerably more expensive, and they tend to have smaller capacities, typically 100 GB to 1 TB.

SSD storage is significantly faster than HDD storage because no moving parts are involved. SSD technology is on the order of a thousand times faster than HDD technology, though that doesn't necessarily mean a given SSD is 1,000 times faster than a given HDD; many other factors combine to determine the overall performance of a storage device. Still, SSD storage is orders of magnitude faster: HDD access time is measured in milliseconds (thousandths of a second), while SSD access time is measured in microseconds (millionths of a second).
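To see what the gap between milliseconds and microseconds means in practice, the sketch below converts a per-operation latency into a rough ceiling on sequential operations per second. The two latency figures (5,000 microseconds for an HDD, 100 microseconds for an SSD) are illustrative assumptions rather than figures for any particular drive.

```python
def max_ops_per_second(latency_us: float) -> float:
    """Upper bound on back-to-back operations per second for a device
    that takes latency_us microseconds per operation."""
    return 1_000_000 / latency_us


hdd_latency_us = 5_000  # assumed: 5 ms per random access
ssd_latency_us = 100    # assumed: 100 microseconds per random access

print(f"HDD: about {max_ops_per_second(hdd_latency_us):,.0f} ops/s")
print(f"SSD: about {max_ops_per_second(ssd_latency_us):,.0f} ops/s")
```

Real devices handle many requests in parallel, so actual throughput differs, but the ratio illustrates why latency units matter.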

SSD storage devices are based on flash memory, similar to the memory used in USB flash drives, but more reliable and considerably faster.

SSDs are considerably more expensive than HDDs of similar capacity, so for the time being, HDDs are more likely to fit within the budget. Most networks include a combination of SSD and HDD storage, reserving SSD for data where the speed benefit over HDD outweighs the price penalty.

Form Factor

Form factor refers to the physical size of the disk drives you will use. Both HDDs and SSDs come in two basic form factors: 3.5 inch, called LFF (large form factor), and 2.5 inch, called SFF (small form factor). Because 3.5 inch drives are larger, they have potentially higher capacity; the smaller 2.5 inch drives have lower capacity.

Considering Drive Interfaces

The drive interface manages the connection between the disk drive itself and the controller that the drive is attached to. The two interfaces in common use are SATA and SAS. In a desktop or laptop computer, the disk controller is built into the motherboard and almost always uses SATA. In a network server, the disk controller is often a separate card installed in the server's chassis. In that case, either the SATA interface or the more advanced SAS interface can be used.

SATA

SATA is the most popular interface for consumer devices. It is an evolution of the original disk interface used when hard drives were first introduced in IBM PCs. This interface was originally called IDE, which stood for Integrated Drive Electronics. That name was soon replaced by ATA, which stood for AT Attachment, because it was designed to work with IBM's PC-AT line of personal computers.

The original IDE and ATA interfaces were parallel interfaces, which meant that they transmitted and received 16 bits of data at a time. This arrangement required a total of 40 separate wires in the cables that connected the disk drives to the controllers, plus complicated circuitry to keep the data synchronized across all the wires.

Parallel interfaces found it increasingly difficult to keep up with rising disk transfer speeds, so IDE and ATA evolved into a serial interface, SATA (Serial ATA), in which data is transmitted one bit at a time.

SATA is used on nearly all desktop and laptop computers and on many low-end server computers. Most SATA disks can transmit data at 6 Gbps. There are two classes of SATA disk devices: consumer and enterprise. Consumer-class SATA disks are found in desktop and laptop computers and are the least expensive disk drives available. Enterprise-class SATA drives are preferred for server storage because they are about 10 times as reliable as consumer-class drives. They are a bit more expensive.

SAS

SAS is the preferred drive interface for network storage. It is an evolution of an older drive interface called SCSI, which stands for Small Computer System Interface. SAS, the serial version of SCSI, stands for Serial Attached SCSI.

The SAS interface is faster than the SATA interface. Most SAS devices transfer data from the disk to the controller at either 6 Gbps or 12 Gbps.
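The Gbps figures are line rates, not payload rates. As a rough sketch, the function below converts a line rate into usable megabytes per second, assuming 8b/10b encoding (10 bits on the wire for every 8 bits of data), which is the encoding used by 6 Gbps SATA and by 6 Gbps and 12 Gbps SAS. The function name and defaults are illustrative, and protocol overhead beyond the encoding is ignored.

```python
def usable_mb_per_s(line_rate_gbps: float,
                    payload_bits: int = 8,
                    encoded_bits: int = 10) -> float:
    """Approximate payload bandwidth in MB/s for a serial link, given
    its line rate and the encoding ratio (default assumes 8b/10b)."""
    data_bits_per_s = line_rate_gbps * 1e9 * payload_bits / encoded_bits
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


print(f" 6 Gbps link: about {usable_mb_per_s(6):.0f} MB/s")
print(f"12 Gbps link: about {usable_mb_per_s(12):.0f} MB/s")
```

Under that assumption, a 6 Gbps link carries roughly 600 MB/s of data and a 12 Gbps link roughly 1,200 MB/s.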

The ability to work at 12 Gbps is one of the main benefits of SAS over SATA, but reliability is another important factor. Enterprise-class SAS drives are about ten times more reliable than enterprise-class SATA drives. Both performance and reliability are important considerations for network storage, so we go with 12 Gbps SAS drives whenever the budget allows.

Considering RAID

Reliability is one of the most important considerations when planning your network storage. All disk devices will eventually fail. This includes SSDs as well as HDDs. In fact, SSDs and HDDs have about the same reliability: both fail at about the same rate.

There are ways to survive disk drive failures. The first line of defense is to use RAID, which groups disk drives into arrays that have built-in redundancy and automatic recovery when one of the drives in an array fails. There are three main RAID configurations: RAID 10, RAID 5, and RAID 6.

RAID 10

In a RAID 10 array, the disks are paired into mirror sets, in which both disks in each set contain the same data. Whenever data is written to one disk in a set, the exact same data is written to the other disk. Thus, if either of the two disks fails, the other disk in the set has a backup copy of the data.

RAID 10 is considered the safest form of RAID, but it is vulnerable to the loss of two disks in an array. If two disks fail at the same time, only luck will determine whether the entire array is lost. If the failing disks are in separate mirror sets, the array will survive. But if both disks in a single mirror set are lost, the entire array will be lost.
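The "luck" involved in a double failure can be quantified. The sketch below enumerates every possible pair of failed disks and counts how many of those pairs fall inside the same mirror set. It assumes that the two failures are equally likely to hit any pair of disks, and the disk numbering (disks 0 and 1 form the first mirror set, 2 and 3 the second, and so on) is just a convention for the example.

```python
from itertools import combinations


def raid10_two_failure_loss_probability(num_disks: int) -> float:
    """Probability that two simultaneous disk failures destroy a
    RAID 10 array, assuming every pair of disks is equally likely."""
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    failure_pairs = list(combinations(range(num_disks), 2))
    # Disks 2k and 2k+1 belong to mirror set k; losing both is fatal.
    fatal = sum(1 for a, b in failure_pairs if a // 2 == b // 2)
    return fatal / len(failure_pairs)


for n in (4, 8, 16):
    p = raid10_two_failure_loss_probability(n)
    print(f"{n:>2} disks: {p:.1%} chance a double failure loses the array")
```

The result works out to 1 in (number of disks minus 1), so larger arrays are less likely to lose both halves of the same mirror set in a double failure.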

RAID 5

In a RAID 5 array, multiple disks are combined into a single array, but the equivalent of one disk's worth of space is set aside for redundancy. The redundancy data is actually spread across all the disks in the array, but the total amount of disk space needed for the redundancy is equivalent to one full disk in the array.

If any disk in the array fails, the content of that disk is recovered to a new disk by calculating the data that was on the failed disk using the data that is on the surviving disks.
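The recovery calculation can be illustrated with XOR parity, which is the usual redundancy scheme for RAID 5. In this minimal sketch each "disk" is a short byte string, and the parity block is kept separate for simplicity, whereas a real array rotates parity across all the disks. The helper name `xor_blocks` and the sample data are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks, one per disk, plus one parity block.
data = [b"disk", b"raid", b"five"]
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # the contents of the lost block
```

Because XOR is its own inverse, XORing the surviving data blocks with the parity block yields exactly the missing block.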

The usable capacity of the array is the capacity of one drive fewer than the total number of drives in the array.

RAID 5 is more efficient than RAID 10 in terms of disk capacity. From a performance perspective, however, RAID 5 is considerably slower than RAID 10 when writing data to the disk. To write data to a RAID 5 array, the redundancy data must first be calculated. Then both the data to be written and the redundancy data must be written to the array. RAID 5 is slower because of the calculation and because of the need for multiple writes. Also, when a drive in a RAID 5 array fails, the array takes much longer to rebuild than when a drive in a RAID 10 array fails.
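The extra work on each write can be sketched as follows, again assuming XOR parity. To update a single block, a controller typically reads the old data and old parity, computes the new parity, and then writes both the new data and the new parity. The function names here are hypothetical and the blocks are tiny byte strings.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def new_parity_for_write(old_data: bytes, old_parity: bytes,
                         new_data: bytes) -> bytes:
    """Parity after replacing old_data with new_data: remove the old
    block's contribution, then add the new block's contribution."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Overwrite the middle block: two reads (old data, old parity),
# one calculation, and two writes (new data, new parity).
new_block = b"ZZZZ"
updated_parity = new_parity_for_write(stripe[1], parity, new_block)
stripe[1] = new_block

# The shortcut matches a full recalculation over the whole stripe.
full_recalc = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
print(updated_parity == full_recalc)
```

A mirrored write in RAID 10, by contrast, needs no calculation and no preliminary reads, only two writes of the same data.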

RAID 6

RAID 6 is one step more secure than RAID 5. Instead of calculating one set of redundancy data for the entire array, RAID 6 calculates two sets. Effectively, two of the disks in the array are set aside for redundancy. This allows the array to survive the loss of any two disks in the array, not just a single disk.

Of course, RAID 6 imposes a greater space penalty than RAID 5. RAID 6 is also a bit slower than RAID 5 because two sets of redundancy data must be calculated rather than one.
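The space penalties of the three configurations can be compared side by side. The sketch below follows the rules described above: RAID 10 gives up half the disks to mirroring, RAID 5 gives up one disk's worth of space, and RAID 6 gives up two. The function name and the example array (eight 4 TB disks) are illustrative assumptions.

```python
def usable_capacity_tb(level: str, num_disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of identical disks, in TB."""
    if level == "RAID 10":
        if num_disks < 4 or num_disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return num_disks / 2 * disk_tb      # half the disks hold mirror copies
    if level == "RAID 5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_tb    # one disk's worth of redundancy
    if level == "RAID 6":
        if num_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (num_disks - 2) * disk_tb    # two disks' worth of redundancy
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("RAID 10", "RAID 5", "RAID 6"):
    print(f"{level}: {usable_capacity_tb(level, 8, 4.0):.0f} TB usable of 32 TB raw")
```

For the assumed eight-disk array, RAID 10 leaves 16 TB usable, RAID 5 leaves 28 TB, and RAID 6 leaves 24 TB.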

Attachment Types

Disk storage must be attached to your servers. The basic approaches covered here are direct attached storage, storage area networks, and network attached storage.

Direct Attached Storage

Direct attached storage (DAS) is the simplest and most obvious way to attach storage to a server. With DAS, storage is directly connected to the hard disk controller within the server. This provides the fastest possible connection to the computer, but it is also the most limited because the storage can be used only by the computer to which it is directly attached.

In a typical desktop computer, the hard disk controller is on the motherboard, and the disk drive or drives are mounted inside the computer's case in internal drive bays. In a typical rack-mounted server computer, drive bays for DAS are also built into the case, but they are usually accessible from the front of the server and are usually hot swappable, which means they can be removed and replaced while the server is powered up.

Most server computers need at least a small amount of DAS installed directly in the server chassis. You can use this storage for the server's operating system or, if you use virtualization, for the hypervisor.

Because DAS is available only to the server it is attached to, it is not a good idea to attach a large amount of storage this way. Instead, you can use an external storage subsystem that can attach directly to more than one host. Such systems can typically be attached to anywhere from two to four host servers. The attachments are usually made with external SAS cables. This arrangement requires external SAS adapters on both the host servers and the external storage subsystem.

Storage Area Networks

A storage area network (SAN) is used when the number of storage devices or host computers makes it impractical to connect the storage directly to the hosts. Instead, a separate network of storage devices is created using a networking technology called Fibre Channel. Fibre Channel is similar in many ways to other networking technologies such as Ethernet, but it is designed specifically for connecting huge numbers of storage devices to servers. Fibre Channel networks can support thousands of storage devices.

Fibre Channel is also very fast, with top speeds of up to 128 Gbps. However, most Fibre Channel networks run at a more modest 16 Gbps. Fibre Channel usually operates over fiber-optic cable, but it can also run on copper cable at slower speeds.

Hosts and storage devices are connected to the SAN through Fibre Channel switches, typically with 16 Gbps fiber-optic cables and connectors.

Network Attached Storage

The final form of attaching storage in a network is called network attached storage (NAS). When NAS is used, storage devices are connected directly to the existing Ethernet network, and data is accessed over TCP/IP using a variety of protocols that enable normal disk and file handling operations to be encapsulated in IP packets. NAS is one of the easiest ways to add large amounts of storage to the network, but NAS doesn't have nearly the performance that SAN or DAS does. A NAS device is limited to the speed of the underlying network, which is typically 1 Gbps.
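The impact of link speed is easy to estimate. The sketch below computes the best-case time to move a given amount of data over links of different speeds, using the 1 Gbps, 12 Gbps, and 16 Gbps figures mentioned in this article. It ignores protocol overhead, encoding, and disk speed, so real transfers will take longer; the function name and the 1 TB example size are illustrative.

```python
def best_case_transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Minimum seconds to move size_gb gigabytes over a link running
    at link_gbps gigabits per second, with no overhead."""
    size_gigabits = size_gb * 8
    return size_gigabits / link_gbps


links = {
    "NAS over 1 Gbps Ethernet": 1,
    "DAS over 12 Gbps SAS": 12,
    "SAN over 16 Gbps Fibre Channel": 16,
}

for name, gbps in links.items():
    seconds = best_case_transfer_seconds(1000, gbps)  # 1 TB of data
    print(f"{name}: about {seconds / 60:.1f} minutes")
```

Under these assumptions, copying 1 TB takes over two hours on a 1 Gbps network but only minutes on the faster links.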

The most common form of NAS consists of appliance-like devices that are essentially small computers running as file servers with a large amount of disk storage. Users can access data on a NAS appliance as if it were any other file server on the network. NAS appliances usually have a web-based administrative console that can be used to set up shares, manage permissions, and so on.