Fundamental to computing are three elements – CPUs, memory, and I/O (storage I/O and network I/O). In the last two decades these computing elements have progressed at breakneck speed: today's CPUs are ~1,000x faster, DRAM offers 1,000,000x better access, and storage capacity is 3,000,000x larger than two decades ago. The remaining problem is I/O.
In a perfect world, storage I/O would not be necessary, since what applications and workloads really want is infinite, cheap storage capacity ($/GB) with immediate access (i.e., low response time or latency) from first-level storage – in effect, very high IOPS at a minimal cost of storage (IOPS/$/GB). That has long been the Holy Grail for computer architects.
But architects (and applications/workloads) have had to accommodate the real-life constraints and tradeoffs of cost, access, reliability, and other factors, resulting in the price/performance positioning of the different storage technologies shown in the attached chart.
Price/Performance Gaps in Hierarchy of Storage Technologies
HDD
- HDD performance has always been gated by mechanical latency – the fastest HDDs can only sustain about 350 IOPS (see the sketch after this list)

DRAM
- Very fast, dense, volatile, not cheap, no internal file system – is it cache or disk?
- DRAM disk as (controller) cache replacement
- Issues: cost/GB, TCO, expandability/flexibility

NAND Flash
- Non-volatile, reasonably cheap, dense, but slow writes
- NAND Flash as HDD replacement
- Issues: write cycles, cost/GB, media lifetime, TCO
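That ~350 IOPS ceiling follows directly from drive mechanics: every random I/O pays an average seek plus half a platter rotation. A back-of-envelope sketch in Python (the seek times are assumed, illustrative figures):

```python
# Back-of-envelope model of the HDD IOPS ceiling: each random I/O pays an
# average seek plus half a platter rotation. Figures are illustrative.

def hdd_iops(rpm: float, avg_seek_ms: float) -> float:
    rotational_ms = 0.5 * 60_000 / rpm      # half a revolution, in ms
    return 1_000 / (avg_seek_ms + rotational_ms)

print(f"{hdd_iops(15_000, 2.9):.0f} IOPS")  # 15K RPM, 2.9 ms seek -> ~204
print(f"{hdd_iops(15_000, 0.9):.0f} IOPS")  # short-stroked, 0.9 ms seek -> ~345
```

Even an aggressively short-stroked 15K RPM drive tops out near the 350 IOPS figure above; the rotational term alone sets a hard floor of ~2 ms per I/O.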
With the use of new sophisticated controllers, SSDs are getting closer to the best of both worlds – HDD-like costs and DRAM-like performance – for IOPS-intensive storage workloads such as databases and OLTP, with SSD models now able to sustain over 40,000 IOPS.
SCM (Storage Class Memory) is solid-state memory that fills the gap between DRAM and HDDs by being low-cost, fast, and non-volatile. The marketplace is quickly segmenting SCMs into SATA- and PCIe-based SSDs.
Key Metrics Requirements for SCMs
- Device: capacity (GB), cost ($/GB) (combined with performance into a composite metric in the sketch after this list)
- Performance: latency (random/block R/W access time, ms); bandwidth (R/W, GB/s)
- Data integrity: BER (better than 1 in 10^17)
- Reliability: write endurance (number of writes before death); data retention (years); MTBF (millions of hours)
- Environmental: power consumption (Watts); volumetric density (TB/cu.in); power on/off time (sec)
- Resistance: shock/vibration (g-force); temperature/voltage extremes, 4-corner (°C, V); radiation (Rad)
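As an illustration of how these metrics combine into the IOPS/$/GB figure of merit mentioned earlier, here is a minimal sketch; the `Device` class and every price/capacity/IOPS figure below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    capacity_gb: float
    price_usd: float
    iops: float

    @property
    def cost_per_gb(self) -> float:
        return self.price_usd / self.capacity_gb

    @property
    def iops_per_dollar_per_gb(self) -> float:
        # the composite "high IOPS at minimal cost of storage" metric
        return self.iops / self.cost_per_gb

for d in (Device("HDD (hypothetical)", 600, 250, 350),
          Device("SSD (hypothetical)", 200, 700, 40_000)):
    print(f"{d.name}: ${d.cost_per_gb:.2f}/GB, "
          f"{d.iops_per_dollar_per_gb:,.0f} IOPS/($/GB)")
```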
PCIe Value Proposition
- SSD as back-end storage to DRAM as the front end
- 36 PCIe lanes available
- 3/6 GB/s performance (PCIe Gen2/Gen3 x8) – see the bandwidth arithmetic after this list
- Low latency, in microseconds
- Low cost (by eliminating the HBA)
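The 3/6 GB/s figures can be sanity-checked against raw PCIe link arithmetic: Gen2 runs 5 GT/s per lane with 8b/10b encoding, Gen3 runs 8 GT/s with 128b/130b. The sketch below computes raw x8 link bandwidth; real devices land lower once protocol overhead is paid, which is roughly where the 3/6 GB/s effective numbers come from:

```python
# Raw PCIe link bandwidth for an x8 slot, before protocol overhead.
def pcie_gbytes_per_s(lanes: int, gt_per_s: float, encoding: float) -> float:
    return lanes * gt_per_s * encoding / 8   # GT/s * encoding efficiency -> GB/s

print(f"Gen2 x8: {pcie_gbytes_per_s(8, 5.0, 8/10):.1f} GB/s")     # 4.0 GB/s raw
print(f"Gen3 x8: {pcie_gbytes_per_s(8, 8.0, 128/130):.1f} GB/s")  # 7.9 GB/s raw
```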
SATA Value Proposition
See the IMEX Research Industry Report "SSDs in the Enterprise" for exhaustive use cases and market forecasts comparing SATA vs. PCIe SSDs.
SLC vs. MLC vs. TLC SSD Technologies
By storing 2 bits/cell in MLC (multi-level cell) versus the 1 bit/cell of SLC (single-level cell), MLC NAND stores twice the capacity; as a result MLC offers higher density and lower cost/bit than SLC. With cost the key decision metric for adoption of flash storage in PCs and consumer computing gear, lower-cost/GB MLC-based SSDs became the driver needed to accelerate SSD adoption. But issues related to reliability (endurance, data retention…), performance, adaptability to existing storage interfaces, ease of management, etc. became the challenges to overcome.
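The density/cost tradeoff is simple arithmetic, sketched below (idealized: it ignores the extra ECC and overprovisioning that higher-bpc NAND requires in practice):

```python
# Idealized capacity/cost scaling with bits per cell. Each extra bit doubles
# the number of voltage states a cell must resolve -- the root of the
# reliability challenges discussed below.
for name, bpc in (("SLC", 1), ("MLC", 2), ("TLC", 3)):
    states = 2 ** bpc   # distinct voltage levels per cell
    print(f"{name}: {bpc} bit(s)/cell, {states} states, "
          f"{bpc}x SLC capacity at ~1/{bpc} the cost per bit")
```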
Challenges with enabling MLC SSDs
Raw Media Reliability
- Drivers:
  - No moving parts
  - Predictable wear-out
  - Post-infant-mortality catastrophic device failures are rare
- Challenges:
  - Higher MLC density increases the bit error rate
  - The high bit error rate increases further with wear
  - Program- and read-disturb prevention
  - Partial page programming
  - Data retention is poor at high temperature and wear

Media Performance
- Drivers:
  - Performance is excellent (compared to HDDs)
  - High performance per watt (IOPS/Watt)
  - Low pin count: shared command/data bus, good balance
- Challenges:
  - NAND is not really a random-access device; it is block oriented
  - Slow effective write (erase/transfer/program) latency
  - Imbalanced read/write access speeds
  - NAND performance changes with wear
  - Some controllers do read/erase/modify/write; others use inefficient garbage collection

Controller
- Drivers:
  - Transparently converts NAND Flash memory into a storage device
  - Manages the high bit error rate
  - Improves endurance to sustain a 5-year life cycle
- Challenges:
  - Interconnect; number of NAND Flash chips (die); number of buses (real/pipelined)
  - Data protection (internal/external RAID; DIF; ECC…)
  - Write-mitigation techniques; effective block (LBA; sector) size
  - Write amplification; garbage collection (GC) efficiency (see the sketch after this table)
  - Buffer capacity & management; metadata processing
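Several of the controller challenges above reduce to one number: write amplification, the ratio of physical NAND writes to host writes. A minimal sketch of the definition (the 100/150 GB figures are invented):

```python
# Write amplification: NAND writes in pages but erases in whole blocks, so
# garbage collection must recopy live pages, multiplying physical writes.
def write_amplification(host_writes_gb: float, gc_copies_gb: float) -> float:
    return (host_writes_gb + gc_copies_gb) / host_writes_gb

# If GC recopies 150 GB of live data while the host writes 100 GB:
print(write_amplification(100, 150))   # 2.5 -> each host GB costs 2.5 GB of NAND wear
```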
The Endurance numbers…
One serious drawback of MLC has been its lower endurance to data write/erase cycles (typically 10,000 vs. 100,000 for SLC), besides slower write speeds and higher bit error rates compared with SLC NAND. Thus:
- Moving from HDDs, with their mechanical issues, to SSDs with "hard" limits on writing can be complex
- Different vendors' raw NAND shows different wear levels
- As geometries shrink, so do endurance and reliability
Retaining Customer Data…
- Raw NAND retention is inversely proportional to write/erase cycles
- Different NAND media types also have different wear-out factors
- How long is good enough for enterprise SSDs? (see the lifetime sketch below)
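How long is "good enough" can be estimated from P/E cycles, write rate, and write amplification. A rough sketch with assumed figures (not vendor data):

```python
# Rough endurance-lifetime estimate from P/E cycles (all figures assumed).
def lifetime_years(capacity_gb: float, pe_cycles: int,
                   daily_host_writes_gb: float, write_amp: float) -> float:
    total_nand_writes_gb = capacity_gb * pe_cycles        # total wear budget
    daily_nand_writes_gb = daily_host_writes_gb * write_amp
    return total_nand_writes_gb / daily_nand_writes_gb / 365

# A 200 GB MLC drive (10,000 cycles) under 400 GB/day of host writes, WA = 2:
print(f"{lifetime_years(200, 10_000, 400, 2.0):.1f} years")   # ~6.8 years
```

This is why controlling write amplification matters so much: at WA = 4 the same drive would fall short of the 5-year enterprise life cycle discussed below.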
With the industry now on a solid roadmap of continuous cost reduction – increasing bit density by adopting 2, 3, and 4 bits per cell (bpc) – MLC-technology-based SSDs are being propelled toward mass adoption.
NAND Flash had its genesis as non-volatile memory amenable to semiconductor mass-production techniques. Turning it into a self-contained storage device required an interface to connect to the host and an advanced device controller alongside the NAND Flash components, all packaged in a single device ready to plug into computers.
To meet the rigorous requirements of the enterprise, where reliability and performance supersede cost, new sophisticated controllers and firmware had to be devised before SSDs could be adopted for mission-critical applications.
Now sophisticated controllers with advanced architectures are available from a number of manufacturers (for exhaustive industry updates see IMEX Research's Industry Report "SSD in the Enterprise") to mitigate the key challenges posed by MLC SSDs.
Earlier Shortfalls
- High cost due to use of low-density, single-bit SLC NAND
- Higher-density MLC increased the bit error rate
- The relatively high bit error rate increases with wear
- Program and read disturbs
- Partial page programming
- Data retention poor at high temperature and wear
Shortfall mitigation by Modern Controllers
Today MLC NAND is able to overcome the above shortfalls of previous years and now meets the cost/performance/reliability requirements of SSDs for enterprise use through techniques such as:
- COST
  - Using 2- and 3-bit-per-cell MLC NAND for cost reduction
These advanced controllers manage the above features to help make NAND Flash suitable as an "Enterprise-Ready SSD" (©2010 IMEX Research), meeting the expected:
- Fast I/O Performance required by business-critical applications and
- 5-Yr. Life Cycle Endurance required by mission-critical applications in the enterprise.
Combining the best features of SSDs – outstanding read performance (latency, IOPS) and throughput (MB/s) – with the extremely low cost of HDDs has given rise to a new class of storage: Hybrid Storage Devices (brought to market by Seagate, EMC, Nvelo, Violin Memory, etc.).
For an exhaustive in-depth study of markets, adoption rates, newer technologies, newer standards, vendor offerings and their competitive strategies and positioning, plus future directions, see IMEX Research's detailed report Solid State Storage in the Enterprise 2010.
Automated Tiered Solid State Storage is the next killer application for SSDs
EMC – FAST (Fully Automated Storage Tiering)
- Continuously monitor and analyze data access on the tiers
- Automatically elevate hot data to “Hot Tiers” and demote cool data/volumes to “Lower Tiers”
- Allocate and relocate volumes on each tier based on use
- Automated migration reduces the OPEX of otherwise managing SANs manually (a minimal policy sketch follows this list)
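As a sketch of what such a monitor/promote/demote loop might look like – the extent granularity and thresholds here are hypothetical, and EMC's actual algorithms are proprietary:

```python
# Minimal sketch of an automated-tiering policy in the spirit of FAST.
from collections import Counter

access_counts: Counter = Counter()   # extent id -> I/O count in this window
ssd_tier: set = set()                # extents currently on the hot tier

def record_io(extent: int) -> None:
    access_counts[extent] += 1       # continuous monitoring of data access

def rebalance(hot: int = 1000, cold: int = 10) -> None:
    for extent, count in access_counts.items():
        if count >= hot:
            ssd_tier.add(extent)       # elevate hot data to the SSD tier
        elif count <= cold:
            ssd_tier.discard(extent)   # demote cooled data to the HDD tier
    access_counts.clear()              # start a fresh monitoring window
```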
IBM – Smart Tiering Technology
Traditional Disk Mapping: Volumes have different characteristics; applications need to place them on the correct tiers of storage based on usage.

Smart Storage Mapping: All volumes appear "logically" homogeneous to applications, but data is placed at the right tier of storage based on its usage, through smart data placement and migration.
Workload I/O Monitoring & Smart Migration to SSD
Every workload has unique I/O access signatures and behavior over time. IBM's Smart Monitoring and Analysis Tool lets customers develop deeper insight into an application's behavior over time so the storage infrastructure supporting it can be optimized. Typical historical performance data for a LUN is shown, revealing performance skews and hot data regions in three LBA ranges.
Smart Tiering Technology identifies these hot LBA regions and non-disruptively migrates the hot data from HDD to SSD. Typically about 4-8% of data becomes a candidate for migration from HDD to SSD, depending on the workload. Result: response time reductions of 60-70+% at peak loads (a detection sketch follows).
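A toy version of the hot-region detection step: pick the smallest set of LBA regions that covers most of the I/O. The histogram, region granularity, and coverage target below are all invented; IBM's actual tool is far more sophisticated:

```python
# Sketch: find hot LBA regions from an access histogram by greedily taking
# the busiest regions until a target fraction of all I/O is covered.
def hot_regions(histogram: dict, coverage: float = 0.8) -> list:
    total = sum(histogram.values())
    picked, covered = [], 0
    for region, hits in sorted(histogram.items(), key=lambda kv: -kv[1]):
        if covered / total >= coverage:
            break
        picked.append(region)
        covered += hits
    return picked

# Three skewed LBA ranges dominate, as in the example above:
print(hot_regions({0: 50, 1: 4000, 2: 60, 3: 3500, 4: 40, 5: 2500}))  # [1, 3, 5]
```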
Response Time Improvement - Productivity Enhancements for OLTP Transactions using SSDs
Using Smart Tiering Technology monitoring, and automated relocation of hot-spot data (typically 5-10% of total data) to SSDs, organizations can typically achieve:
- Response time reductions of around 70+%, or
- Throughput (IOPS) increases of 200% for I/O-intensive loads such as time-perishable online transactions (airline reservations, Wall Street investment-banking stock transactions, hedge funds at financial institutions, etc.), as well as latency-sensitive high-performance clustered systems.
Brokerage Workload Optimization Using Smart Tiering
- Identify hot "database objects" and smartly place them in the right tier
- Scalable throughput improvement: 300%
- Substantial improvement in I/O-bound transaction response time: 45%-75%
Database
Databases have key elements in their commit files – logs, redo, undo, tempDB.
Structured versus Unstructured
- Structured data
  - Structured data access is an excellent fit for SSD
  - Exception: large, growing table spaces
- Unstructured data
  - Unstructured data access is a poor fit for SSD
  - Exception: small, non-growing, tagged files
  - OS images: boot-from-flash, page-to-DRAM

(A placement rule-of-thumb sketch follows this list.)
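A hypothetical rule-of-thumb distilling the list above into code; the size thresholds are invented for illustration:

```python
# Invented placement heuristic based on the structured/unstructured fit
# described above; thresholds are illustrative, not from the source.
def fits_ssd(structured: bool, growing: bool, size_gb: float) -> bool:
    if structured:
        return not (growing and size_gb > 500)  # exception: large, growing table spaces
    return (not growing) and size_gb < 1.0      # exception: small, non-growing tagged files

print(fits_ssd(structured=True, growing=False, size_gb=100))   # True
print(fits_ssd(structured=False, growing=True, size_gb=50))    # False
```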
Multiple companies have achieved outstanding results by using SSDs in combination with HDDs to get the best of both worlds – the excellent read performance of SSDs with the cost-effective low $/GB of HDDs. In a typical SAN environment, the attached graph depicts the cost reduction: $230K using the large numbers of Fibre Channel HDDs most commonly deployed in enterprises to reach a given performance level, vs. $130K using SSDs with lower-cost SATA HDDs – a TCO reduction of 76%, as shown. In the process, IOPS improvements of 475% and $/IOPS reductions of roughly 8x have been achieved. For more details refer to the IMEX Research Industry Report.
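The $/IOPS arithmetic behind such claims, with hypothetical inputs (not the report's raw data):

```python
# The arithmetic behind $/IOPS comparisons (all four inputs are hypothetical):
fc_cost, fc_iops   = 230_000, 50_000     # FC-HDD-only SAN
ssd_cost, ssd_iops = 130_000, 287_500    # SSD + SATA SAN at 5.75x the IOPS (+475%)

print(f"FC HDD:     ${fc_cost / fc_iops:.2f}/IOPS")    # $4.60
print(f"SSD + SATA: ${ssd_cost / ssd_iops:.2f}/IOPS")  # $0.45 -- ~10x better
```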
(Courtesy: J. Freitas, IBM)
New technologies are currently under development in research labs around the world that promise to replace today's NAND Flash. These technologies – collectively called Storage Class Memory (SCM) – are targeted to provide higher-performance, lower-cost, and more energy-efficient solutions than today's SLC/MLC NAND Flash products.
| Attribute | Improved Flash | FeRAM | MRAM | Racetrack | RRAM | Memristor | Solid Electrolyte | PCRAM |
|---|---|---|---|---|---|---|---|---|
| Knowledge level | advanced development | product | product | basic research | early development | early development | development | advanced development |
| Smallest cell demonstrated | 4F² (1F² per bit) | 15F² @130nm | 25F² @180nm | — | — | — | 8F² @90nm (4F² per bit) | 5.8F² (diode), 12F² (BJT) @90nm |
| Demonstrated devices | — | 64Mb prototype (0.13µm, 3.3V) | 4Mb product (0.18µm, 3.3V) | — | — | — | — | 4Mb product (0.25µm, 3.3V); 512Mb prototype (0.1µm, 1.8V) |
| Prospects for scalability | maybe (enough stored charge?) | poor (integration, signal loss) | poor (high currents) | unknown (too early to know; good potential) | unknown | unknown | promising (filament-based, but new materials) | promising (rapid progress to date) |
| …fast readout | yes | yes | yes | yes | yes | yes | yes | yes |
| …fast writing | NO | yes | yes | yes | sometimes | sometimes | yes | yes |
| …low switching power | yes | yes | NO | uncertain | sometimes | sometimes | yes | poor |
| …high endurance | poor (10^7 cycles) | yes | yes | should | poor | poor | unknown | yes |
| …non-volatility | yes | yes | yes | unknown | sometimes | sometimes | sometimes | yes |
| …MLC operation | yes | difficult | NO | yes (3-D) | yes | yes | yes | yes |
| Companies pursuing | Spansion, Infineon, Macronix, Samsung, Toshiba, NEC, Nano-x'tal, Freescale, Matsushita | Fujitsu, STMicro, TI, Toshiba, Infineon, Samsung, NEC, Hitachi, Rohm, HP, Cypress, Matsushita, Oki, Hynix, Celis, Seiko Epson | IBM, Infineon, Freescale, Philips, STMicro, HP, NVE, Honeywell, Toshiba, NEC, Sony, Fujitsu, Renesas, Samsung, Hynix, TSMC | — | IBM, Sharp, Unity, Spansion, Samsung | — | Axon, Infineon | Ovonyx, BAE, Intel, STMicro, Samsung, Elpida, IBM, Macronix, Infineon, Hitachi, Philips |