By Anil Vasudeva, President and Chief Analyst,
IMEX Research
Industry Trends
Explosion in the growth of Storage Data
The internet has been the catalyst behind the explosive growth of
digital information, growing at an annual rate of 60% and estimated to
reach 1,800 Exabytes by 2011. Regardless of the state of the economy,
the amount of data that needs to be stored, accessed and managed will
continue to grow exponentially, especially as new types of data such as
that from social networking and internet video continue to mushroom.
Based on IMEX Research's work with the providers and deliverers of Web
2.0 content and services, four fundamental trends stand out that will
dictate the requirements of the new infrastructure architecture:
- The infrastructure needs to be designed to scale quickly in order
to dynamically react to the fluctuations in demand for capacity,
performance and high availability of stored data.
- The storage has to be managed holistically - with an ability to
deploy a 10PB system and manage the data easily with a small staff.
- With the increasing shift to support Web 2.0 and Cloud Computing
and Services, the economics of the infrastructure have to align with
the new business model, which needs an inexpensive infrastructure and
truly cheap storage to work. Google Search,
YouTube Video, Facebook or Flickr collaborative sites could not have
grown if they charged $10 a month to access them. Their monetizing
models have evolved to soft-sell viral ads, instead. In such cases a PB
of data at $15/GB wouldn't sell, nor would $3/GB if you need an army of
administrators to keep the system up and running.
- Enterprises have started to evolve into an Enterprise
DataCenter/Private Enterprise Cloud (using SOA, or Service Oriented
Architecture, running SaaS, or Software as a Service) and a Public
CloudCenter© (run by Service Providers and supporting compatible SOA
and software), wherein the applications can be equivalently run either
in the Private Enterprise or the Public CloudCenter©.
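The cost figures quoted in the third trend can be made concrete with a little arithmetic. The following is a minimal sketch, assuming decimal units (1 PB = 1,000,000 GB); the function name is illustrative and not taken from any IMEX Research model.

```python
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 10^6 GB


def raw_storage_cost(petabytes: float, dollars_per_gb: float) -> float:
    """Acquisition cost in dollars for a given capacity and unit price."""
    return petabytes * GB_PER_PB * dollars_per_gb


# Compare the two price points mentioned in the text for one petabyte.
for price in (15.0, 3.0):
    cost = raw_storage_cost(1, price)
    print(f"1 PB at ${price:.0f}/GB costs ${cost:,.0f}")
```

At either price point the acquisition cost alone runs to millions of dollars per petabyte, before any staffing cost is counted, which is the point the bullet makes about ad-funded services.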
How the storage industry is providing newer, cost-effective Serial
Attached SCSI (SAS)/Serial Advanced Technology Attachment (SATA)
storage technologies and platforms to keep pace with this growth and
required operational metrics is described in this article.
Market Drivers for SAS/SATA Storage
SMB markets - the perfect storm for SAS/SATA
As companies grew, sharing data among multiple PC clients over a Local
Area Network (LAN) gave rise to network-attached storage (NAS). Access to
shared storage by clients became a bottleneck whenever backups from
disk NAS to tape drives happened, as it completely hogged the LAN
bandwidth. That required a separate dedicated network (a la SAN) to do
LAN-less backups without disturbing the client's direct access to NAS.
These SAN networks had to be lossless, unlike Ethernet-based LANs, which
would wilt under high traffic and start dropping packets as a
consequence of their CSMA/CD design, resulting in constant retries and
a complete loss of performance. As a result, a lossless (albeit very
expensive) Fibre Channel Storage Area Network (FC SAN) was devised. It
needed special
components but could be used over long distances as required by large
companies with operations spread over multiple divisions, which only
their large dedicated storage IT staff could implement and maintain.
In the case of small-to-medium-size business (SMB), SAS fabrics are
being implemented in configurations where expensive FC SANs are not
attractive. These are configurations where high performance is needed
and the solution needs to be low cost, yet doesn't need to cover large
distances, such as a small-scale data center with fewer than 50 servers
and storage arrays. By using a SAS fabric, the data being written to
and read from the hard drives does not need to be translated to a
different protocol to move through the fabric, as in some FC-based SANs.
This creates a perfect storm for matching the needs of SMB markets with
SAS/SATA storage solutions. Companies implementing their IT
infrastructure using blade server-integrated storage within a small
chassis with multiple chassis in a rack, as the "datacenter in a room,"
are perfect candidates for the use of SAS/SATA storage.
Tiered Storage in the Enterprise
Given the different cost, performance and availability levels required
by different workloads, it is natural to match the storage
infrastructure to those needs. This resulted in multiple tiers in
storage systems, namely:
• Tier-0 Solid State Flash Drives,
• Tier-1 Primary Storage (Performance Optimized SAS/SATA based Disk Arrays),
• Tier-2 Nearline Back up Storage (Capacity Optimized SATA Storage),
• Tier-3 Archival Storage (SATA based VTL and Tape Libraries),
as shown in the diagram of Corporate Online Data Usage. See figure below.
SAS Evolves with the Industry
The storage industry responded by delivering high-capacity drives
that store 3-4x more data than traditional performance-optimized
drives, and are suitable for 24x7 use in rack-mount environments. These
high-capacity drives are helping IT managers meet their storage growth
requirements and will continue to increasingly penetrate the data
center as users learn how to better leverage these storage devices.
With its ability to support both SAS and SATA disk drives, SAS is also
making headway as the disk-drive interface of choice for external
storage in both JBODs and external RAID subsystems. SAS is beginning to
penetrate a segment that until now has primarily been the domain of
Fibre Channel.
In 2007, roughly 10.8 million capacity-optimized HDDs shipped in
datacenter storage solutions. Today SAS is the dominant storage
interface offered by server vendors worldwide.
Enterprise Level Storage System Features
SAS is Scalable, Sharable, Secure Storage
6Gb/s SAS is more than an improved set of features over the current
generation of SAS. It offers IT managers highly tangible benefits that
will make their data more reliable, secure and faster. SAS-2 (T10 SAS
standard designation) allows up to 10-meter cables, standardized
expander zoning and spread spectrum clocking to reduce radiated
emissions (EMI). Multiplexing, which allows multiple slower-speed data
streams to be aggregated into a 6Gb/s data stream, is an efficient way
of aggregating bandwidth. SAS-2 provides performance and reliability,
availability and serviceability (RAS) that are at least on par with
that provided today by Fibre Channel.
6Gb/s SAS products and systems will make a significant impact on
storage in 2009.
Some of the main features of SAS-2 include:
- Performance — doubles the link rate and bandwidth
- Multiplexing — optimizes bandwidth by enabling two 3Gb/s links to share a 6Gb/s port
- Increased zoning capabilities — enables partitioning of a domain into smaller sets of accessible devices
- Self-configuring expander devices — accelerates system initialization and change detection
- Diagnostics and robustness — improves status reporting and error notification
Flexibility - Combining High Performance with High-Capacity Drives
SATA drives are primarily designed for cost-effective bulk storage. To
achieve economies of scale, SATA drives feature lower spindle speeds
(typically 7,200 rpm), lower mean-time-between-failures (MTBF) ratings
and lower cost. Consequently, they tend to be applied where transaction
rates are low and data availability is not critical.
SAS drives, on the other hand, are built for high-performance,
high-availability use. SAS-2 has the ability to connect high-capacity
disk drives (SATA) alongside performance drives (SAS) in a storage
system. The SAS connector itself is designed as a single, uniform
backplane, so designing a system with both drive types is simple. This
compatibility reduces the cost and complexity of storage designs, since
SATA devices are fully compatible with SAS controllers - the SATA
Tunneling Protocol (STP), included within SAS, passes SATA commands
through to the SATA drives.
SAS and SATA compatibility also allows system builders to design hybrid
storage systems using common connectors and cabling. Installing or
upgrading either SATA or SAS drives in the same system is simply a
matter of replacing one drive type with the other as the SAS backplane
connectors receive both SAS and SATA devices. However, since SATA
backplanes connect only to SATA devices, backplanes should use SAS
connectors to provide the greatest system design flexibility.
SAS connects with SATA through one of the following techniques: (1)
using expanders with SATA Tunneling Protocol (STP)/SATA bridging, (2)
using SATA drive tailgate cards with Serial SCSI Protocol (SSP)/SATA
bridging, and (3) using high-capacity SAS drives with pure SSP. Each
approach has its advantages and disadvantages that should be considered
during architectural design.
Cable/Connector Consolidation
To meet the industry's demand for denser cabling solutions, the small
form factor (SFF) mini-SAS connector has been quickly adopted for both
internal and external connectivity.
Scalability
Data centers require storage architecture that is able to scale on
demand. By shifting more of the SAS topology discovery process from the
host controller to the expander, and by providing the added capability
of flexible table-to-table routing, SAS-2 now dramatically reduces SAS
messaging during topology discovery, resulting in reduced time to
discover, initialize and scale the ever-increasing number of devices
needed by large, tiered-storage solutions.
SAS expander hardware, in effect, functions as a switch to simplify
configuration of large external storage systems that can be scaled with
minimal latency degradation while preserving bandwidth for increased
workloads. The expander hardware enables highly flexible storage
topologies of up to 256 mixed SAS and SATA drives.
SAS today has the ability to connect multiple servers and thousands of
storage devices. Scalability at that level often requires that the
storage devices and/or subsystems be consistently assigned, or zoned,
to operate with multiple hosts in virtualized server deployments. This
ability to assign various operating domains for both shared and
separate pools of storage is accomplished through a capability scheme
referred to as SAS expander zoning.
This standardized zoning improves SAS' ability to effectively support
more complex topologies across multiple expander vendors, while
increasing the number of supported zones from 128 to 256.
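The zoning idea described above can be sketched as a small permission table: each device is assigned to a zone group, and two devices may communicate only if their groups have been granted permission to each other. This is a simplified illustration, not the data layout of any particular expander; the class name, device names and group numbers are all hypothetical.

```python
class ZonePermissionTable:
    """Toy model of expander zoning: zone groups plus a symmetric permission table."""

    def __init__(self, num_groups: int = 128):
        self.num_groups = num_groups
        self._allowed = set()   # unordered pairs of zone groups that may communicate
        self._group_of = {}     # device name -> zone group number

    def assign(self, device: str, group: int) -> None:
        """Place a device (server or storage pool) into a zone group."""
        if not 0 <= group < self.num_groups:
            raise ValueError(f"zone group {group} is out of range")
        self._group_of[device] = group

    def permit(self, group_a: int, group_b: int) -> None:
        """Allow traffic between two zone groups (permission is symmetric)."""
        self._allowed.add(frozenset((group_a, group_b)))

    def can_access(self, initiator: str, target: str) -> bool:
        """True if the initiator's zone group may talk to the target's group."""
        pair = frozenset((self._group_of[initiator], self._group_of[target]))
        return pair in self._allowed


# Hypothetical layout: two servers, one private pool each, one shared pool.
zones = ZonePermissionTable()
zones.assign("server_a", 8)
zones.assign("server_b", 9)
zones.assign("pool_a", 10)
zones.assign("pool_b", 11)
zones.assign("shared_pool", 12)
zones.permit(8, 10)
zones.permit(9, 11)
zones.permit(8, 12)
zones.permit(9, 12)

print(zones.can_access("server_a", "pool_a"))       # private pool of server_a
print(zones.can_access("server_a", "pool_b"))       # another server's pool
print(zones.can_access("server_b", "shared_pool"))  # shared pool
```

Keeping the table symmetric means a single entry covers both directions of a connection, which is why the pair is stored as an unordered set.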
Intelligent Self-Configuring Expanders
Expanders are capable of implementing self-configuration features. Each
expander device discovers the devices attached to it and completes its
own route table. Since all expanders are initializing at the same time,
the overall system topology is resolved quickly.
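A rough sketch of self-configuration follows, assuming a simple tree of expanders and considering downstream routes only. Each expander records, per phy, which addresses are reachable through that phy. The recursion here stands in for the parallel discovery that real expanders perform; the class and device names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Expander:
    name: str
    # phy number -> attached item: an end-device address (str) or another Expander
    phys: dict = field(default_factory=dict)
    # phy number -> set of addresses reachable through that phy
    route_table: dict = field(default_factory=dict)

    def self_configure(self) -> set:
        """Build this expander's route table and return every address behind it."""
        reachable = set()
        for phy, attached in self.phys.items():
            if isinstance(attached, Expander):
                # A downstream expander reports everything it can reach.
                addresses = attached.self_configure()
            else:
                # A directly attached end device is reachable by its own address.
                addresses = {attached}
            self.route_table[phy] = addresses
            reachable |= addresses
        return reachable


edge = Expander("edge", {0: "disk-a", 1: "disk-b"})
root = Expander("root", {0: "host-1", 1: edge})
root.self_configure()
print(root.route_table)
```

Because each expander fills in its own table, the host controller does not have to walk the whole topology and program every route itself.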
SAS as a Fabric
SAS was developed as the natural evolution of parallel SCSI, enabling
point-to-point drive connections via a serial interface. To support
direct-attach storage outside of the server, the concept of an expander
was defined. SAS expanders enable a simple switching topology and allow
multiple servers to connect to the same SAS JBOD, so that its storage
can be shared between them.
As larger and larger SAS-based topologies are being implemented, there
is discussion over using Serial Attached SCSI as a fabric technology.
To fully understand this phenomenon, it's important to understand the
roots of SAS and how SAS systems are being architected. SAS-2 provides
additional status and reporting information to facilitate diagnostic
functions. In the event of a fault, this status data can be used to
identify, isolate, and analyze fault and error conditions.
SAS expanders, built into SAS switches, enable SAS fabrics to be built.
SAS fabrics make it simple to add more storage to a configuration. SAS
hard drives in a SAS fabric are sharable to all servers connected to
the fabric. SAS zoning allows administrators to divide the storage into
segments and then decide which servers are allowed to access which
specific segments. Server blades connecting to multiple storage blades
form a good environment for use of a SAS fabric.
Command Queuing
Like SCSI, SAS includes advanced command queuing with 256 queue levels,
providing unique intelligent data handling features such as
head-of-queue and out-of-order queuing. These queuing features are
critical to enterprise applications because they allow a system to
reorder and reprioritize commands within the interface.
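The two queuing behaviors named above can be illustrated with a toy queue. In this sketch, head-of-queue commands jump to the front, and the remaining commands are reordered by logical block address (LBA). Sorting by LBA is an assumed policy chosen for illustration; real drives apply their own reordering logic, and the class and method names are hypothetical.

```python
from collections import deque


class CommandQueue:
    """Toy command queue with head-of-queue insertion and out-of-order service."""

    def __init__(self, depth: int = 256):
        self.depth = depth
        self._queue = deque()   # entries are (tag, lba, is_head_of_queue)

    def submit(self, tag: str, lba: int, head_of_queue: bool = False) -> None:
        if len(self._queue) >= self.depth:
            raise RuntimeError("queue full")
        if head_of_queue:
            self._queue.appendleft((tag, lba, True))   # jumps ahead of queued work
        else:
            self._queue.append((tag, lba, False))

    def drain(self) -> list:
        """Return tags in service order: urgent commands first, then the rest by LBA."""
        urgent = [cmd for cmd in self._queue if cmd[2]]
        normal = sorted((cmd for cmd in self._queue if not cmd[2]),
                        key=lambda cmd: cmd[1])
        self._queue.clear()
        return [cmd[0] for cmd in urgent + normal]


queue = CommandQueue()
queue.submit("read-a", lba=900)
queue.submit("read-b", lba=100)
queue.submit("log-write", lba=500, head_of_queue=True)
print(queue.drain())
```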
Increased Security through Zoning
Secure zoning is a part of the new SAS-2 specification which is defined
to increase security in a server and storage environment. A "zone" is
similar to a hardware firewall that compartmentalizes a group of disk
drives to create secure zones, segregating data. It works with 6Gb/s
and 3Gb/s SAS and SATA disk drives within a 6Gb/s SAS environment.
SAS-2 zoning enhances the SAS fabric by providing a hardware mechanism
to increase device segregation. SAS data storage systems may include a
variety of device types such as SAS and SATA, as well as data
protection mechanisms such as RAID and encryption. Zoning enables
segregation of these storage types at the system level to simplify
partitioning, provisioning and overall system management. Zoning can be
optionally secured by password to prevent unauthorized access,
malicious attacks and corruption of data by operator or application
error on the server.
Enterprise-level Data Integrity
Enterprise-level data integrity is provided by what is commonly
referred to as the Data Integrity Field (DIF). It allows both data and
commands to be protected end to end, from the application layer on the
host through the storage system to the disk drive.
Supporting Industry Standards-based Ecosystem
Decision Feedback Equalization (DFE) allows SAS cabling of up to 10m at
6Gb/s transfer rates, keeping pace with the throughput being offered in
PCI Express 2.0 servers. Second-generation (SAS-2) 6Gb/s controllers
are optimized to take full advantage of the 5GT/s per-lane speed of
PCIe 2.0, enabling peerless system robustness. The
improvement in bandwidth allows more disk drives to be added to the
high-performance SAS links without the need for additional host
controllers or ports, freeing up PCI Express slots for other system
expansion needs and reducing cable congestion.
Faster Performance
Network user demands for faster data keep growing. SAS-2 helps meet
these increased demands by doubling the throughput capability of SAS to
6Gb/s. Each SAS connection now supports up to 600MB/sec of throughput.
Common SAS controllers come with four or eight ports, which provide up
to 2.4GB/s and 4.8GB/s of aggregate throughput,
respectively.
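The throughput figures above follow from the link rate once line encoding is taken into account. A minimal sketch, assuming the 8b/10b encoding used by 6Gb/s SAS (each data byte occupies 10 bits on the wire) and ignoring protocol overhead; the function name is illustrative.

```python
def payload_mb_per_s(line_rate_gbps: float) -> float:
    """Usable MB/s on one link, assuming 8b/10b encoding (10 line bits per byte)."""
    bits_per_second = line_rate_gbps * 1_000_000_000
    bytes_per_second = bits_per_second / 10
    return bytes_per_second / 1_000_000


per_link = payload_mb_per_s(6.0)
for ports in (4, 8):
    total_gb = ports * per_link / 1000
    print(f"{ports} ports x {per_link:.0f} MB/s = {total_gb:.1f} GB/s")
```

The same function applied to a 3Gb/s link gives half the per-link figure, which is the doubling the text refers to.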
The full-duplex, point-to-point nature of SAS enables simultaneously
active connections among multiple initiators and high-performance SAS
targets. Narrow ports allow for a single serial link, while wide ports
support multiple links, allowing the aggregation of eight SAS or SATA
targets to increase total available bandwidth to 24 Gb/s, meeting the
significant bandwidth requirements of large SAS topologies.
Moving to 6Gb/s SAS means faster data throughput, yet it works with
3Gb/s SAS, which protects any current investment in SAS disk drives and
storage systems. With the use of 6Gb/s SAS expanders, twice as many
3Gb/s disk drives can be connected with the 6Gb/s SAS multiplexing
capability.
Solid State Drives to the rescue
Solid state drives (SSDs) have a promising future in the enterprise
space. They promise to overcome virtually all limitations of
traditional hard drives: power consumption, heat dissipation,
mean-time-between-failures, speed, IO/s, etc. Although 2.5-inch SAS
HDDs now offer high performance (at both 10K and 15K spindle speeds),
using them in conjunction with SSDs can significantly improve overall
system IOPS, typically by about 30 times, with response times of less
than two milliseconds, as compared to a 15K RPM Fibre Channel drive.
Commercial databases can get high levels of value from SSDs when they
are used intelligently, which is to say if sites put metadata rather
than data on them. SSDs represent what may come to be thought of as a
new storage tier, "tier 0".
For all of their good features, some of the shortcomings of SSDs relate
to their wear characteristics, which result in limited write cycles. To
overcome such limitations, SSD vendors have created wear-leveling
techniques that spread writes across the flash and wherein failing or
bad data blocks are diagnosed early on and automatically retired and
substituted by the controller, thus mitigating reliability concerns. In
fact, the larger the SSD storage size, the better these wear-leveling
algorithms perform. NAND-based SSDs have a growing opportunity in the
datacenter. Similar to HDDs, flash-based SSDs are offered with several
interface options, and most manufacturers are standardizing on a
SAS/SATA interface.
Application-Aware Storage Infrastructure
Certain storage applications requiring high IOPS to support the data
architecture and performance requirements of the computer system are
candidates for SSD-enabled acceleration (see the Application-aware
Storage Infrastructure chart above). Enterprise storage applications
can strongly benefit from the use of SSDs in conjunction with
cost-effective SAS/SATA drives.
The use of SSDs, besides improving performance, can also
significantly reduce power consumption since they have no spinning
media. SSDs consume a fraction of the power consumed by magnetic hard
drives. A 64GB flash drive, for example, can use 30 to 40 percent less
energy than a 73GB 15K RPM magnetic drive. Reduced power consumption
also means reduced heat dissipation. As such, the array as a whole will
have a lower thermal footprint and reduce air-conditioning
requirements.
Investment Protection
SAS-2 will have a 6Gb/s data rate and be backward compatible with
existing 1.5Gb/s SATA and 3Gb/s SAS/SATA products and infrastructure.
The 6Gb/s SAS interface not only enables faster data rates, it also
offers new benefits and opportunities for enterprise applications,
including the ability to spread I/O requests over a greater number of
HDDs. It also provides a higher performance interface for future SSD
devices designed with very fast I/O and data-rate capabilities. The
6Gb/s SAS interface also allows for the design of SAS storage solutions
that could compete with storage systems currently leveraging the Fibre
Channel interface.
Backward Compatibility
SAS-2 at 6Gb/s doubles the previous generation's bandwidth for each
link and adds link multiplexing to enable two 3Gb/s connections to
share one 6Gb/s link. 6Gb/s SAS is built to be backward compatible with
3Gb/s and 1.5Gb/s link rates. The SAS infrastructure supports a mix of
SAS and SATA drives and link rates.
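Link multiplexing can be pictured as two slower streams taking alternate time slots on one faster link. The sketch below is a simplified time-slot model under that assumption, not a description of the actual framing on the wire; the function names and the sample slot labels are hypothetical.

```python
from itertools import chain


def multiplex(stream_a: list, stream_b: list) -> list:
    """Interleave two equal-length slow streams into one fast stream."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must supply the same number of slots")
    return list(chain.from_iterable(zip(stream_a, stream_b)))


def demultiplex(fast_stream: list) -> tuple:
    """Split the fast stream back into its two slow streams."""
    return fast_stream[0::2], fast_stream[1::2]


fast = multiplex(["a0", "a1", "a2"], ["b0", "b1", "b2"])
print(fast)
print(demultiplex(fast))
```

Each slow stream gets every other slot, so each sees half of the fast link's rate, which matches two 3Gb/s connections sharing one 6Gb/s link.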
Scaling SAS Implementations
One of the benefits of SAS expanders is that they can be cascaded,
enabling very large configurations to be built. A single SAS domain may
support up to 16,384 devices, with every hard drive accessible. The
SAS-2 specification standardizes the concept of SAS zoning, whereby a
configuration of SAS hard drives is broken into groups or zones and
servers are enabled to communicate with drives in one or more zones. IT
managers are able to specifically prevent some servers from
communicating with some zones. SAS zoning is implemented in the SAS
expander, and this new expander essentially becomes a SAS switch. Once
SAS switches are incorporated into a configuration, a SAS fabric is
built. Such a switch is much less expensive to build and maintain than
Fibre Channel. SAS fills the need for a high-performance, low-cost
fabric that is not required to span very long distances.
SAS and SATA Compatibility
One of the main reasons that SAS has been able to scale is its
compatibility with SATA HDDs, which provide the highest capacity at the
lowest cost-per-gigabyte of any storage media. In addition, the use of
a SATA
Active/Active port selector to dual-port a SATA HDD enables fully
redundant storage architectures with greater system fault tolerance.
Where SATA drives are used for infrequently accessed data, such as
near-line storage or backup, Redundant Array of Independent Disks
(RAID) is commonly used to mitigate the reliability risks of SATA
storage.
The ability to support enterprise quality data storage with SAS, and
cost effective, high capacity storage with SATA, both using the same
SAS infrastructure, has led to economical, scalable storage and server
offerings. In addition, having a mix of SAS and SATA drives allows for
information lifecycle management (ILM), whereby data migrates from
primary 24/7 storage using SAS devices, to secondary/nearline storage
using SATA devices as it ages and is accessed less frequently. When
data has completed its useful life it is moved to tape for archiving.
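The ILM migration path described above amounts to a placement rule based on how recently data was accessed. A minimal sketch follows; the 30-day and 365-day thresholds are illustrative assumptions, not figures from the article, and the function name is hypothetical.

```python
def choose_tier(days_since_access: int, useful_life_days: int = 365) -> str:
    """Pick a storage tier from data age (thresholds are illustrative assumptions)."""
    if days_since_access < 30:
        return "primary (SAS)"        # active data on performance drives
    if days_since_access < useful_life_days:
        return "nearline (SATA)"      # aging data on capacity drives
    return "archive (tape)"           # data past its useful life


for age in (3, 90, 400):
    print(f"{age:>3} days since last access -> {choose_tier(age)}")
```

In practice the thresholds would be set per application, since the point at which data stops being "primary" differs widely between workloads.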
Management
The Serial Management Protocol (SMP) is enhanced to provide more
configuration, faster initialization and greater reporting for
diagnostic and status monitoring.
SAS and Windows
SAS' backward compatibility with previous-generation SCSI software and
middleware makes it easy to incorporate legacy components - hosts and
drives - into evolving SAS topologies, eliminating new training or
integration costs and the need for modifications to legacy software.
Several features are enhanced by SAS-2: a Data Integrity Field (DIF),
where 8 bytes of protection information per sector are used by the
drive and host system to validate the data; Full Disk Encryption (FDE)
for security; and improved external storage capabilities and features
(like virtualization) that take advantage of the high interface
bandwidth. All of these features combine to allow for bigger, more
function-rich computing environments.
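The 8 bytes of protection information per sector can be sketched as follows. The article does not break the 8 bytes down; the layout assumed here (a 2-byte CRC guard tag, a 2-byte application tag and a 4-byte reference tag, with the guard computed using CRC-16 polynomial 0x8BB7) follows the T10 protection information format as the editor understands it. The function names are hypothetical, and using the sector's LBA as the reference tag is one common convention rather than the only one.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def protection_info(sector: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Build the 8-byte field: guard (2) + application tag (2) + reference tag (4)."""
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, ref_tag)


def verify(sector: bytes, pi: bytes, expected_ref_tag: int) -> bool:
    """Check that the data matches its guard and sits at the expected block."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == expected_ref_tag


sector = bytes(range(256)) * 2                 # one 512-byte sector of sample data
pi = protection_info(sector, app_tag=0, ref_tag=1234)
print(len(pi), verify(sector, pi, expected_ref_tag=1234))
```

Because the guard travels with the sector from host to drive, any component along the path can recompute it and detect corruption or a misdirected write before the data is committed.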
SAS Market Penetration
In addition to the rapid transition to SAS for performance-optimized
drives, the vast majority of capacity-optimized drives shipping for
enterprise applications today employ the SATA interface. When
accounting for all performance and capacity-optimized HDDs that shipped
in 2008, more than 70% shipped with a serial interface (SAS or SATA) as
shown in the HDD shipments chart.
SAS leverages proven SCSI functionality and builds on the enterprise
expertise of multiple chip, board, drive, subsystem and server
manufacturers throughout the industry. In the enterprise, SAS has
crossed over to being pervasive in the industry.
The Future - Steps for SAS 2.X
In order for SAS to keep pace with the ever increasing needs for
more capacity and more complex capabilities beyond 6Gb/s SAS,
additional enhancements are planned for the 2011 timeframe - currently
referred to as SAS 2.X. The main areas of improvement include:
- Data Center Scale-out Capabilities – providing improved cabling
options – copper cables of 20 meters or more, and the potential for
optical connections for even longer cabling distances.
- Energy-efficiency Green Storage Features – providing power
management options that would bring SATA style power management into
the SAS system to improve power and cooling efficiencies.
When these enhancements are added to 6Gb/s SAS – providing even longer
distances, larger infrastructure support and improved power management
capabilities, it will allow for larger data system scale-outs and also
generally greener storage.
High Capacity SAS/SATA to Mitigate Data Proliferation
With virtually every IT department in today's corporate world facing
growing user demands and shrinking budgets, storage vendors are rushing
to deliver the cost efficiency of SAS/SATA systems with value-added
features and availability levels typically found only in enterprise
class facilities.
Energy Efficiency
The best path to storage efficiency is probably the use of data
deduplication. Every vendor now offers this capability (sometimes also
referred to as "single instancing"), which eliminates redundant data,
replacing redundant bytes or blocks with pointers, which take up much
less space. Deduping can reduce the overall need for storage capacity
by 10-30%, depending on the application and the kind of data being
stored. Deduping is an easy way to save on storage costs and on the
power needed to drive arrays, and it doesn't necessarily require
committing to new architectures.
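The pointer-replacement idea behind deduplication can be shown in a few lines. This is a minimal sketch assuming fixed-size blocks identified by their SHA-256 digest; commercial products use their own chunking and indexing schemes, and the function names here are hypothetical.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into blocks, keep one copy of each, and record pointers."""
    store = {}      # digest -> the single stored copy of that block
    pointers = []   # one digest per original block, in order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # store only the first occurrence
        pointers.append(digest)
    return store, pointers


def restore(store: dict, pointers: list) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[digest] for digest in pointers)


data = b"A" * 4096 * 3 + b"B" * 4096      # three identical blocks plus one unique
store, pointers = deduplicate(data)
print(f"{len(pointers)} blocks written, {len(store)} blocks stored")
```

How much capacity is saved depends entirely on how repetitive the data is, which is why the savings vary by application.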
MAID (Massive Arrays of Idle Disks) systems spin down their disks until
the data on them is needed, at which point they spin up again for as
long as they are in use. MAID is finding a home as a storage medium for
"persistent" data, information that needs to be available for reference
but for which speedy access is not necessary.
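The MAID behavior described above can be modeled as a disk that spins up on access and spins down after an idle period. The sketch below is illustrative only; the 600-second idle timeout, the class name and the method names are assumptions, not values from any MAID product.

```python
class MaidDisk:
    """Toy model of a MAID disk that spins down after a period without access."""

    def __init__(self, idle_timeout_s: float = 600.0):
        self.idle_timeout_s = idle_timeout_s
        self.spinning = False
        self.last_access = None

    def access(self, now: float) -> bool:
        """Serve a request at time `now`; return True if a spin-up was needed."""
        needed_spin_up = not self.spinning
        self.spinning = True
        self.last_access = now
        return needed_spin_up

    def tick(self, now: float) -> None:
        """Periodic check: spin down if the disk has been idle long enough."""
        if self.spinning and now - self.last_access >= self.idle_timeout_s:
            self.spinning = False


disk = MaidDisk()
print(disk.access(now=0))     # first access has to spin the disk up
print(disk.access(now=10))    # disk is already spinning
disk.tick(now=700)            # idle for longer than the timeout
print(disk.spinning)
```

The spin-up on first access is the latency cost that makes MAID suitable for reference data rather than for data needing speedy access.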
Opportunities for Embedding Intelligence in SAS Expanders
Computing infrastructure performance is important, but exponential
demands for new storage facilities have lately also been driven by
legislative requirements including Sarbanes-Oxley, HIPAA and others.
The costs of storage technology acquisition (capital expense) have
fallen below those of the ongoing operation and maintenance
(operational expense) of exploding data storage.
New opportunities exist in embedding intelligence for Deduplication,
MAID and Workload-aware Dynamic Provisioning of Virtualized Storage at
the SAS Expander level.
Towards a Scalable NextGen Data Center (NGDC)
www.imexresearch.com
http://www.scsita.org
This article is the copyrighted property of IMEX Research. It is
being provided to SCSI Trade Association as a courtesy to promote the
benefits of SAS technology as an all-encompassing next generation
enterprise storage interface.
© IMEX Research 2003-09. All rights reserved. Copying without written permission prohibited.