|Overclocking Data Storage Subsystems: Variable Channel Bandwidth|
|Written by Paul A. Mitchell, B.A., M.S.|
|Thursday, 29 July 2010|
Overclocking Data Storage Subsystems: One Approach to Variable Channel Bandwidth
In recent years, the performance of personal computer systems has improved by leaps and bounds through successful overclocking of CPU and memory bus speeds. By comparison, the widespread adoption of storage interface standards has limited the raw speeds of data transmission channels to relatively few options.
In particular, the current SATA "6G" standard still suffers from the 10/8 protocol (8b/10b line encoding), an overhead reminiscent of the bygone era when 9600-baud dial-up modems were state-of-the-art and framed every byte with a start bit and a stop bit. Thus, even with SATA channels now signaling at 6.0 Gigabits per second ("6G"), ten bits are still transmitted for every 8-bit byte, which caps the effective bandwidth at ~600 MBps (6 Gbps / 10).
This article briefly introduces the concept of variable channel frequencies by demonstrating a few of its expected benefits in one peer-to-peer computer hardware application.
Quad-Channel Memory Architectures: Coming Soon!
One of the reasons why Gigabit Ethernet (GbE) adapters remain so popular is their compatibility with legacy PCI slots. Transmitting 32 bits at 33 MHz, one legacy PCI slot supports a raw bandwidth of 1,056 Megabits per second, or just enough for one GbE card. Dividing by 8 bits per byte, a PCI slot performs at ~133 Megabytes per second, i.e. the "133" in "ATA-133" for Parallel ATA data channels. Ramping up to 10 Gigabit Ethernet ("10GbE") then results in a raw bandwidth ten times that of GbE, or ~1.25 Gigabytes ("GB") per second.
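The arithmetic in the preceding paragraph can be checked with a short sketch. The constant and function names are illustrative only, and 10GbE is taken at its nominal rate of 10 Gigabits per second.

```python
# Raw bandwidth arithmetic for a legacy PCI slot, GbE and 10GbE.
PCI_BUS_WIDTH_BITS = 32   # bits moved per clock on a legacy PCI slot
PCI_CLOCK_MHZ = 33        # clock as quoted in the text (nominally 33.33 MHz)
BITS_PER_BYTE = 8


def pci_raw_mbit_per_s(width_bits=PCI_BUS_WIDTH_BITS, clock_mhz=PCI_CLOCK_MHZ):
    """Raw PCI bandwidth in Megabits per second."""
    return width_bits * clock_mhz


def mbit_to_mbyte(mbit_per_s):
    """Convert Megabits per second to Megabytes per second."""
    return mbit_per_s / BITS_PER_BYTE


pci_mbit = pci_raw_mbit_per_s()
pci_mbyte = mbit_to_mbyte(pci_mbit)
gbe_mbyte = mbit_to_mbyte(1_000)        # Gigabit Ethernet, nominal
ten_gbe_mbyte = mbit_to_mbyte(10_000)   # 10 Gigabit Ethernet, nominal

print(f"Legacy PCI: {pci_mbit} Mbit/s = {pci_mbyte:.0f} MB/s")
print(f"GbE:        {gbe_mbyte:.0f} MB/s")
print(f"10GbE:      {ten_gbe_mbyte:.0f} MB/s")
```

With the clock rounded to 33 MHz the division gives 132 MB/s; the familiar ~133 figure follows from the nominal 33.33 MHz PCI clock.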
Neither of these alternatives is very exciting, when compared to the raw speeds now possible with conventional DDR3 RAM DIMMs operating in triple-channel mode. Early specifications confirmed raw bandwidths exceeding 25,000 MB per second at stock clock and latency settings. By overclocking the same RAM, assuming it supports the higher settings without any damage, raw bandwidths have been measured to exceed 30,000 MB per second (30 GBps) with triple-channel chipsets.
A truly exciting future should happen before long, when CPUs are designed to integrate quad-channel memory controllers. This development will permit computer motherboards to achieve very high memory bandwidths with only four DIMM slots, instead of six. To illustrate, by using a scaling factor of 4-to-3, four such DIMM slots running in quad-channel mode should yield raw bandwidths of roughly 33,000 MB/second (25,000 x 4/3) at stock settings, and approaching 40,000 MB/second (30,000 x 4/3) when overclocked. Constant change is here to stay (or so reads my bumper sticker).
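The 4-to-3 scaling can be sketched as follows. It assumes that bandwidth grows linearly with the number of memory channels, which is an idealization; the function name is hypothetical.

```python
def scale_channels(bandwidth_mb_s, from_channels, to_channels):
    """Project bandwidth to a different channel count, assuming linear scaling."""
    return bandwidth_mb_s * to_channels / from_channels


# Triple-channel DDR3 figures quoted in the text (MB per second).
TRIPLE_STOCK = 25_000
TRIPLE_OVERCLOCKED = 30_000

quad_stock = scale_channels(TRIPLE_STOCK, 3, 4)
quad_overclocked = scale_channels(TRIPLE_OVERCLOCKED, 3, 4)

print(f"Quad-channel, stock:       ~{quad_stock:,.0f} MB/s")
print(f"Quad-channel, overclocked: ~{quad_overclocked:,.0f} MB/s")
```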
Quad-Channel Designs for Data Storage
The question then arises: how about using quad-channel designs to accelerate storage subsystems too? This approach is already happening with RAID 0 subsystems that combine four storage devices like rotating hard disk drives ("HDD") or solid-state drives ("SSD") utilizing common variants of popular NAND flash memory technology.
Quite apart from the controller overheads that are a necessary feature of such RAID subsystems, their adherence to popular standards means that the computer industry is "stuck" with accepting fixed channel frequencies and outdated transmission protocols like the 10/8 overhead required of all Serial ATA ("SATA") devices.
Variable Channel Frequencies: Also Coming Soon?
As a result of using a little imagination, this author recently theorized what could happen if quad-channel storage subsystems were designed and implemented to remove the extra overhead in the 10/8 SATA protocol, and to permit the frequency of each data channel to be varied upwards to some practical engineering limit.
Specifically, suppose the current SATA/6G protocol is modified to use a minimum number of error correction code ("ECC") bits on a 4,096-byte "jumbo frame" transmitted initially at 6 Gigabits per second: ignoring the few ECC bits, this should result in an effective bandwidth of ~750 MBps (6 Gbps / 8).
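How close such a frame comes to the ~750 MBps ceiling depends on how many ECC bits it carries. The article does not specify that number, so the ECC sizes below are purely hypothetical, as are the function names; the sketch only compares per-frame overhead against the existing 10/8 overhead.

```python
FRAME_PAYLOAD_BYTES = 4096   # the 4K "jumbo frame" described in the text


def frame_efficiency(ecc_bits, payload_bytes=FRAME_PAYLOAD_BYTES):
    """Fraction of transmitted bits that carry payload in one frame."""
    payload_bits = payload_bytes * 8
    return payload_bits / (payload_bits + ecc_bits)


def payload_mb_per_s(line_rate_gbps, efficiency):
    """Effective payload bandwidth in MB/s at a given line rate."""
    return line_rate_gbps * 1000 / 8 * efficiency


LINE_RATE_GBPS = 6.0
TEN_EIGHT_EFFICIENCY = 8 / 10   # 8 payload bits per 10 transmitted bits

print(f"10/8 overhead: {payload_mb_per_s(LINE_RATE_GBPS, TEN_EIGHT_EFFICIENCY):.0f} MB/s")

# Hypothetical ECC sizes per frame; the article does not give a figure.
for ecc_bits in (0, 64, 256, 1024):
    eff = frame_efficiency(ecc_bits)
    rate = payload_mb_per_s(LINE_RATE_GBPS, eff)
    print(f"{ecc_bits:>5} ECC bits per frame: {eff:.2%} efficient, {rate:.0f} MB/s")
```

Even with 1,024 ECC bits per frame, which is an arbitrary upper value chosen for illustration, the frame stays about 97% efficient, compared with 80% under the 10/8 overhead.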
Also, the 6G transmission rate is increased to 8 Gigabits per second ("8G"), with a goal of balancing the aggregate bandwidth of quad channels with the use of a specialized controller that utilizes x8 or x16 PCI-Express 2.0 lanes ("PCI-E Gen2").
Thus, by signaling at 8 Gbps, each data channel now has a raw bandwidth of 8G / 8 = 1.0 GB per second. In quad-channel mode, we predict four times that rate, or 4.0 GB per second. This latter number also conveniently corresponds to the raw bandwidth of x8 PCI-E Gen2 lanes, i.e. x8 lanes @ 500 MB/second = 4.0 GB/second.
Now we are talking very fast storage subsystems, the raw bandwidth of which comes within an order of magnitude of the measured DDR3 RAM speeds cited above. Compare 4.0 GB/second to the maximum bandwidth of 10GbE above, which is roughly one-third as fast.
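The balance between the storage channels and the PCI-Express slot can be sketched as a simple bottleneck calculation. The function names are illustrative, and the sketch assumes no 10/8 overhead on the storage channels, as proposed above.

```python
PCIE_GEN2_MB_PER_LANE = 500   # one-way MB/s per PCI-E Gen2 lane, as in the text


def storage_mb_per_s(channels, rate_gbps):
    """Aggregate raw bandwidth of the storage channels, with no 10/8 overhead."""
    return channels * rate_gbps * 1000 / 8


def pcie_mb_per_s(lanes):
    """One-way raw bandwidth of a PCI-E Gen2 slot with the given lane count."""
    return lanes * PCIE_GEN2_MB_PER_LANE


quad_8g = storage_mb_per_s(channels=4, rate_gbps=8.0)

for lanes in (8, 16):
    slot = pcie_mb_per_s(lanes)
    bottleneck = min(quad_8g, slot)
    print(f"x{lanes}: storage {quad_8g:.0f} MB/s, slot {slot} MB/s, "
          f"usable {bottleneck:.0f} MB/s")
```

With an x8 slot the two sides match at 4,000 MB/s; an x16 slot leaves headroom that four 8G channels cannot fill.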
A Quad-Channel Example for High-Speed Peer-to-Peer File Transfers
Figure 1 shows a standard SFF-8088 "multi-lane" cable connecting two Highpoint RocketRAID 2722 controllers installed in two separate workstations. A custom protocol eliminates the 10/8 transmission overhead by using 4K jumbo frames that replace the two extra bits on each byte with far fewer ECC bits.
Also, the raw clock frequency of all four data channels is increased from 6G to 8G. In principle, then, each of those four data channels should be able to transmit raw data at 1.0 GB/second, for a total of 4.0 GB per second in quad-channel mode.
By using a controller with an x8 edge connector, the latter bandwidth of 4.0 GB per second corresponds exactly to the one-way bandwidth of the PCI-E bus, because x8 lanes transmit data in each direction at 500 MBps per lane (i.e. 8 x 500 = 4,000 MB/second).
Additionally, because the 2722 controller has two external multi-lane ports, Figure 1 allows for other physical and logical applications of that second SFF-8088 port, such as conventional RAID arrays or a second quad-channel connection for each workstation.
Assuming for now that all necessary device drivers can be implemented to support all of the capabilities described above, in addition to new controller circuits to oscillate data channels at 8G instead of 6G, we are then in a position to contemplate the possibilities which such high-speed storage brings much closer to reality.
For example, with a raw bandwidth of 4.0 GB/second, system and application software can be written to perform memory-to-memory transfers, even when the RAM involved in such transfers resides in two entirely different workstations.
Similarly, storing entire file systems in ramdisks is a popular option for many computer enthusiasts at present. For example, see this author's published review of RamDisk Plus by SuperSpeed LLC. Likewise, writing drive images of an OS partition will finish much faster when output rates are not hampered by the slow buffer-to-disk speeds of rotating hard drives.
This author can also foresee a not-too-distant future in which an OS is loaded entirely into RAM, including all of its program and data files. When that capability becomes a proven reality, the quad-channel example illustrated in Figure 1 should significantly accelerate the creation of routine drive images of such an OS.
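To put the raw numbers in perspective, the sketch below estimates how long a drive image would take to cross each link discussed in this article. The 50 GB image size is a hypothetical example, the rates are raw bandwidths rather than measured throughput, and the names are illustrative.

```python
# Raw link bandwidths in MB per second, taken from the figures in this article.
LINK_MB_PER_S = {
    "GbE": 125,
    "10GbE": 1_250,
    "Quad-channel 8G": 4_000,
}

IMAGE_SIZE_GB = 50   # hypothetical size of an OS drive image


def seconds_to_transfer(size_gb, rate_mb_per_s):
    """Ideal transfer time in seconds, ignoring all protocol and software overhead."""
    return size_gb * 1000 / rate_mb_per_s


for name, rate in LINK_MB_PER_S.items():
    t = seconds_to_transfer(IMAGE_SIZE_GB, rate)
    print(f"{name:<16} {t:7.1f} seconds")
```

These are best-case times; real transfers would also depend on the device drivers and controller circuits assumed above.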
An example of an overclocked quad-channel storage subsystem is presented which also modifies the standard SATA protocol with 4K jumbo frames and far fewer ECC bits. A specific combination of PCI-Express hardware and standard SFF-8088 multilane cables is briefly described for its potential to increase raw data bandwidth to 4.0 Gigabytes per second between two PCI-Express workstations. Overclocking the raw transmission rates of data storage channels is an idea whose time has arrived, in order to achieve significant improvements in the performance of data storage subsystems, particularly when solid-state memory technology is utilized as the storage medium in those same subsystems.
About the Author:
Paul A. Mitchell, B.A., M.S., is an instructor, inventor and systems development consultant, now living in Seattle, Washington State. He volunteers technical advice frequently at Tom's Hardware as MRFS e.g. search for site:www.tomshardware.com +"Best answer from MRFS".
Republished with permission from author. Original source: SupremeLaw