G.Skill RipJaws DDR3-1600 CL7 Memory Kit
Reviews - Featured Reviews: Memory
Written by Bruce Normann
Friday, 18 December 2009
Performance Test Results
Four benchmark applications for memory performance have been in rotation here at Benchmark Reviews for some time now, and no new contenders have emerged that offer any more or better information: Passmark Performance Test, Lavalys EVEREST, SiSoftware Sandra, and Crysis. The first three are synthetic benchmark suites specifically targeted at several aspects of memory performance. Each one has a unique approach, which provides a diverse set of measurements so that performance trends are brought to light. The last benchmark, Crysis, offers insight into how memory performance affects a gaming application that stresses the CPU and memory almost as much as it does the graphics subsystem. CPU speed is always a factor in memory tests, and we did our best to eliminate it as a variable. During overclocking, we had to adjust the Northbridge clock frequency, which has a knock-on effect on the rest of the system, but we were able to keep the CPU clock constant.
In Passmark Performance Test, there were only minimal gains, either from higher clock frequencies or tighter timings. The cached memory read test showed no difference at all between the five tested configurations; any differences are buried in experimental error. The uncached read test showed less than a 1% improvement between the 1066 MHz and 1744 MHz settings. One of the nice aspects of this benchmark is the consistency of its results; I feel confident that even the small improvement measured here is real and repeatable.
The write performance was the bright spot of this test, posting a 3% gain as clock speed increased. Once again, the results were very consistent for this test, and while 3% may not seem like a lot, at least it is real, measurable, and repeatable. I am hoping for more differentiation in the remaining tests, though.
EVEREST Ultimate Edition offers three simple memory bandwidth tests that focus on the basics: Read, Write, and Copy. In order to avoid concurrent threads competing for system memory bandwidth, the memory benchmarks utilize only one processor core and one thread.
The Everest Read benchmark measures the maximum achievable memory read bandwidth. The code behind this benchmark method is written in Assembly and is extremely optimized for every popular AMD and Intel processor core variant by utilizing the appropriate x86, MMX, 3DNow!, SSE, SSE2 or SSE4.1 instruction set extension. The benchmark reads a 16 MB, 1 MB-aligned data buffer from system memory into the CPU. Memory is read in the forward direction, continuously and without breaks.
In Lavalys EVEREST we see more dramatic performance differences between the speed settings, and we can also see the effect of timings. From the worst configuration to the best, read performance improved by 25%. We can also see how the tighter timings achieved at 1333 MHz almost made up the speed difference between 1333 and 1600 MHz. It's also interesting to note that the timing changes at 1066 MHz made very little difference.
The Everest Write benchmark measures the maximum achievable memory write bandwidth. The code behind this benchmark method is written in Assembly and is extremely optimized for every popular AMD and Intel processor core variant by utilizing the appropriate x86, MMX, 3DNow!, SSE or SSE2 instruction set extension. The benchmark writes a 16 MB, 1 MB-aligned data buffer from the CPU into system memory. Memory is written in the forward direction, continuously and without breaks.
The write performance is relatively flat as speed settings increase, until we get to the overclocked configuration, where we were able to bump up the memory clock by raising the Front Side Bus (FSB) by 9%, from 200 to 218 MHz. We reduced the CPU multiplier to keep the CPU clock the same, but as most people know, raising the FSB clock makes almost everything faster. In fact, the best performance is usually achieved by pushing the FSB even higher and using a lower FSB:DRAM strap. But that's not a fair way to test memory products...
The Everest Copy benchmark measures the maximum achievable memory copy speed. The code behind this benchmark method is written in Assembly and is extremely optimized for every popular AMD and Intel processor core variant by utilizing the appropriate x86, MMX, 3DNow!, SSE, SSE2 or SSE4.1 instruction set extension. The benchmark copies an 8 MB, 1 MB-aligned data buffer into another 8 MB, 1 MB-aligned buffer through the CPU. Memory is copied in the forward direction, continuously and without breaks.
Copy performance was influenced the most by cranking up the memory clocks. We achieved a 38% increase in performance on this benchmark, which seemed to depend mostly on clock speed and less on memory timings. Overall, there were some significant performance gains to be had in the EVEREST set of benchmark tests. Not bad for a product that is supposedly optimized for a completely different operating environment. So far, I see no reason these new low-voltage RAM kits can't be used to good effect on the "old" AMD platform.
Sandra is based on STREAM, a popular memory bandwidth benchmark that has been used on everything from personal computers to supercomputers. It measures sustained memory bandwidth, not burst or peak, so its results may be lower than those of other benchmarks. STREAM 2.0 uses static data (about 12 MB), while Sandra uses dynamic data (around 40-60% of physical system RAM). This means that on computers with fast memory, Sandra may yield lower results than STREAM. It's not feasible to make Sandra use static data; since Sandra is much more than a benchmark, doing so would needlessly tie up memory.
A major difference is that Sandra's algorithm is multi-threaded on SMP/SMT systems. This works by splitting the arrays and letting each thread work on its own portion. Sandra creates a thread for each CPU in the system and assigns each thread to an individual CPU. Another difference is the aggressive use of instruction scheduling/overlapping in order to maximize memory throughput even on "slower" processors. The loops should always be memory-bound rather than CPU-bound on all modern processors.
The other major difference is the use of alignment. Sandra dynamically changes the alignment of the streams until it finds the best combination, then repeatedly tests it to estimate the maximum throughput of the system. You can change the alignment in STREAM and recompile, but generally it is set to 0.
The results from SiSoftware Sandra look a lot like the read performance results in Lavalys EVEREST. They scale more with increasing clock speed than with tighter timings. Interestingly, the Integer and Floating Point results are almost identical, and the individual results were also very consistent from run to run. The overclocked pair, running at 1744 MHz with CL8 timings, bested the 1066 MHz CL7 configuration by 50% in both tests. That's a pretty significant gain, and a testament to the strength of the memory controller built into the AMD Phenom II architecture.
Crysis needs no introduction on this website. It is well known as one of the most demanding benchmarks, and our move to DirectX 10 has only increased the overall difficulty of achieving reasonable frame rates at high resolutions. In this scenario, where we want to reduce the influence of the video card in the results, we are primarily interested in the low resolution tests, and minimizing the video processing that is handled by the graphics subsystem.
Starting on the right and moving to the left, we can see that at 1680x1050 and 1280x1024 resolutions, there are minimal differences in gaming performance with changes in memory. Concentrating on the lowest resolution we tested, 1024x768, there is a noticeable 12 FPS difference in average frame rate between the lowest- and highest-performing memory configurations. I say noticeable, meaning that it is easily measured; I doubt that you or I could visually tell the difference between an average of 109 and 121 frames per second in Crysis.
Overall, the synthetic tests mostly showed measurable performance improvements from increased memory speeds and tighter timings. Our toughest gaming benchmark, in terms of CPU and memory usage, only showed measurable changes at low resolution. But as GPU power in the system increases, this influence will be felt at higher resolutions. Similarly, if you are still using DirectX 9, where the GPU has an easier task, the impact will be greater.
We're left with the question of value, then. How much difference does premium, high-speed memory make, especially compared to investing that money in other system components? Continue on to Final Thoughts for the answer to that question, and a discussion of how I really feel about XMP and other memory standards.