|OCZ Black Edition DDR3-1600 Memory Kit|
|Written by Bruce Normann|
|Wednesday, 13 January 2010|
Page 5 of 6
Performance Test Results
Four benchmark applications for memory performance have been in rotation here at Benchmark Reviews for some time now, and no new contenders offer any additional or better information: PassMark Performance Test, Lavalys EVEREST, SiSoftware Sandra, and Crysis. The first three are synthetic benchmark suites specifically targeted at several aspects of memory performance. Each one has a unique approach, which provides a diverse set of measurements so that performance trends are brought to light. The last benchmark, Crysis, offers insight into how memory performance affects a gaming application that stresses the CPU and memory almost as much as it does the graphics subsystem. CPU speed is always a factor in memory tests, and we did our best to eliminate it as a variable. During overclocking, we had to adjust the Northbridge clock frequency, which has a knock-on effect on the overall system, but we were able to keep the CPU clock the same.
In PassMark Performance Test, there were either minimal gains or slight losses from higher clock frequencies. The cached memory read test showed virtually no difference between the three standard JEDEC configurations, and a loss when I overclocked the memory to 1744 MHz. The uncached read test scored a 1% improvement between the 1066 MHz and 1600 MHz settings, and then again a minor loss at 1744 MHz. One of the nice aspects of this benchmark is the consistency of the results; even the small changes measured here are real and repeatable.
The write performance was the only bright spot of this test, clocking in a 2.5% gain as clock speed increased. The kicker here is that the maximum performance was achieved at 1333 MHz, with CL7 timings. The lower clock with tighter timings beat both the 1600 MHz and overclocked 1744 MHz configurations. Once again, the results were very consistent for this test, and while 2.5% may not seem like a lot, it is real, measurable and repeatable. Plus, it was interesting to see the advantage of the tighter timings at the lower frequencies. I predict that we will see more differentiation in the remaining tests, though.
EVEREST Ultimate Edition offers three simple memory bandwidth tests that focus on the basics: Read, Write, and Copy. In order to avoid concurrent threads competing for system memory bandwidth, the Memory benchmarks utilize only one processor core and one thread.
The EVEREST Read benchmark measures the maximum achievable memory read bandwidth. The code behind this benchmark method is written in Assembly, and it is extremely optimized for every popular AMD and Intel processor core variant by utilizing the appropriate x86, MMX, 3DNow!, SSE, SSE2 or SSE4.1 instruction set extension. The benchmark reads a 16 MB sized, 1 MB aligned data buffer from system memory into the CPU. Memory is read in the forward direction, continuously and without breaks.
In Lavalys EVEREST we see more dramatic performance differences between speed settings in the read test. From worst to best, there is a 25% improvement in read performance. We can also see how the tight CL7 timings at 1333 MHz almost made up the speed difference between 1333 and 1600 MHz.
The EVEREST Write benchmark measures the maximum achievable memory write bandwidth. The code behind this benchmark method is written in Assembly, and it is extremely optimized for every popular AMD and Intel processor core variant by utilizing the appropriate x86, MMX, 3DNow!, SSE or SSE2 instruction set extension. The benchmark writes a 16 MB sized, 1 MB aligned data buffer from the CPU into system memory. Memory is written in the forward direction, continuously and without breaks.
The write performance is much flatter than the read test as speed settings increase, but once again, the 1333 MHz CL7 setting yields excellent results, close to the overclocked configuration. That's significant, since we achieved the 1744 MHz overclock by increasing the Front Side Bus (FSB) 9%, from 200 to 218 MHz. We reduced the CPU multiplier to keep the CPU clock the same, but as most people know, increasing the FSB clock makes almost everything faster. In fact, the best performance is usually achieved by pushing the FSB even higher and using a lower FSB:DRAM strap. But that's not a fair way to test memory products...
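The clock arithmetic behind that overclock is worth spelling out. The sketch below reproduces it in Python; the reference clock figures come directly from the review, while the 8x FSB:DRAM multiplier is an assumption of mine that is consistent with the quoted 1600 and 1744 MHz data rates.

```python
# Reconstructing the overclock arithmetic described in the review.
# The 8x FSB:DRAM strap is an assumption that matches the quoted figures.

def ddr_data_rate(ref_clock_mhz, dram_ratio):
    """Effective DDR3 data rate (MT/s) = reference clock x DRAM multiplier."""
    return ref_clock_mhz * dram_ratio

STOCK_REF, OC_REF = 200.0, 218.0   # MHz, from the review
DRAM_RATIO = 8.0                   # assumed strap: 200 MHz x 8 = DDR3-1600

print(ddr_data_rate(STOCK_REF, DRAM_RATIO))        # 1600.0
print(ddr_data_rate(OC_REF, DRAM_RATIO))           # 1744.0
print(round((OC_REF - STOCK_REF) / STOCK_REF, 2))  # 0.09 -> the 9% FSB bump
```

The same arithmetic also shows why the lower-strap approach mentioned above is tempting: a higher reference clock with a smaller DRAM multiplier speeds up the rest of the system while holding the memory data rate roughly constant.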
The EVEREST Copy benchmark measures the maximum achievable memory copy speed. The code behind this benchmark method is written in Assembly, and it is extremely optimized for every popular AMD and Intel processor core variant by utilizing the appropriate x86, MMX, 3DNow!, SSE, SSE2 or SSE4.1 instruction set extension. The benchmark copies an 8 MB sized, 1 MB aligned data buffer into another 8 MB sized, 1 MB aligned data buffer through the CPU. Memory is copied in the forward direction, continuously and without breaks.
Copy performance was influenced the most by cranking up the memory clocks. We achieved a 38% increase in performance on this benchmark, which seemed to depend mostly on clock speed and less on timings. Overall, there were some significant performance gains to be had in the EVEREST set of benchmark tests.
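For readers curious what a copy test of this kind looks like in code, here is a minimal pure-Python sketch of the same idea: copy a fixed-size buffer repeatedly and divide the data moved by the elapsed time. The function name and iteration count are mine, and EVEREST's real implementation is hand-tuned assembly, so absolute numbers from this sketch will be far lower than the benchmark's.

```python
import time

def copy_bandwidth_mb_s(buf_mb=8, iterations=50):
    """Rough stand-in for an EVEREST-style Copy test: repeatedly copy a
    buf_mb-sized buffer through the CPU and report throughput in MB/s.
    (Pure Python, so far slower than hand-tuned SSE assembly.)"""
    src = bytearray(buf_mb * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(iterations):
        dst = src[:]  # one full forward copy of the buffer
    elapsed = time.perf_counter() - start
    return (buf_mb * iterations) / elapsed

print(f"{copy_bandwidth_mb_s():.0f} MB/s")
```

Even a crude loop like this shows the same qualitative behavior the review measures: throughput rises with memory clock, and run-to-run numbers are quite stable once the buffer is much larger than the CPU caches.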
Sandra is based on STREAM, a popular memory bandwidth benchmark that has been used on everything from personal computers to supercomputers. It measures sustained memory bandwidth, not burst or peak, so the results may be lower than those of other benchmarks. STREAM 2.0 uses static data (about 12 MB), while Sandra uses dynamic data (around 40-60% of physical system RAM). This means that on computers with fast memory, Sandra may yield lower results than STREAM. It's not feasible to make Sandra use static data; since Sandra is much more than a benchmark, doing so would needlessly tie up memory.
A major difference is that Sandra's algorithm is multi-threaded on SMP/SMT systems. This works by splitting the arrays and letting each thread work on its own portion. Sandra creates a thread for each CPU in the system and assigns each thread to an individual CPU. Another difference is the aggressive use of scheduling/overlapping of instructions in order to maximize memory throughput even on "slower" processors. The loops should always be memory bound rather than CPU bound on all modern processors.
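The array-splitting scheme described above can be illustrated with a toy example; the function names are hypothetical and this makes no claim to match Sandra's actual code. Each thread copies only its own slice of a shared buffer. Note that in CPython the GIL prevents these threads from running truly in parallel, so this demonstrates the partitioning, not the speedup.

```python
import threading

def copy_chunk(src, dst, lo, hi):
    # Each thread touches only its own slice, mirroring how Sandra
    # assigns one portion of the STREAM arrays to each CPU.
    dst[lo:hi] = src[lo:hi]

def threaded_copy(src, dst, n_threads=4):
    """Split [0, len(src)) into n_threads contiguous ranges and copy
    each range on its own thread."""
    n = len(src)
    bounds = [(i * n // n_threads, (i + 1) * n // n_threads)
              for i in range(n_threads)]
    threads = [threading.Thread(target=copy_chunk, args=(src, dst, lo, hi))
               for lo, hi in bounds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

src = bytearray(range(256)) * 1024   # 256 KB of test data
dst = bytearray(len(src))
threaded_copy(src, dst)
print(dst == src)  # True
```

Because the ranges never overlap, no locking is needed; that independence is exactly what lets a benchmark like Sandra saturate the memory bus from every core at once.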
The results from SiSoft Sandra look a lot like the Read performance results in Lavalys EVEREST. They scale more with increasing clock speed than with tighter timings. Interestingly, the Integer and Floating Point results are almost identical, and the individual results were also very consistent from run to run. The overclocked pair, running at 1744 MHz with CL8 timings, bested the 1066 MHz CL7 set by 48% in both tests. That's a pretty significant gain, and a testament to the strength of the memory controller built into the AMD Phenom II architecture.
Crysis needs no introduction on this website. It is well known as one of the most demanding benchmarks, and our move to DirectX 10 has only increased the overall difficulty of achieving reasonable frame rates at high resolutions. In this scenario, where we want to reduce the influence of the video card on the results, we are primarily interested in the low-resolution tests, which minimize the video processing handled by the graphics subsystem.
Starting on the right and moving to the left, we can see that at the 1680x1050 and 1280x1024 resolutions with quality settings on High, there are minimal differences in gaming performance with changes in memory. The only thing that has any effect is raising the FSB; even factoring out the accompanying increases in memory speed and CPU clock, raising the FSB has a positive effect. Concentrating on the lowest resolution we tested, 1024x768, there is a noticeable 10 FPS difference in average frame rate between the lowest and highest performing memory configurations. I say noticeable, meaning that it is both consistent and easily measured; I doubt that you or I could visually tell the difference between an average of 110 and 120 frames per second in Crysis. It's also interesting to note that tighter-than-standard timings at 1333 MHz performed better than stock timings at the higher 1600 MHz frequency.
Overall, the synthetic tests mostly showed measurable performance improvements from increased memory speeds and tighter timings. Our toughest gaming benchmark, in terms of CPU and memory usage, only showed measurable changes at low resolution. But as GPU power in the system increases, this influence will be felt at higher resolutions. Similarly, if you are still using DirectX 9, where the GPU has an easier task, the impact will be greater.
We're left with the question of value, then. How much difference does premium, high-speed memory make, especially compared to investing money in other system components? Continue on to Final Thoughts for the answer to that question, and a discussion of how I really feel about EPP, XMP, and AOD memory standards.