Archive Home arrow Reviews: arrow Video Cards arrow Zotac GeForce GTX-480 Fermi Video Card
Zotac GeForce GTX-480 Fermi Video Card E-mail
Reviews - Featured Reviews: Video Cards
Written by Olin Coles   
Tuesday, 11 May 2010
Table of Contents: Page Index
Zotac GeForce GTX-480 Fermi Video Card
Features and Specifications
NVIDIA GF100 GPU Fermi Architecture
Closer Look: Zotac GeForce GTX480
Video Card Testing Methodology
3DMark Vantage GPU Tests
Battlefield: Bad Company 2
BattleForge Performance
Crysis Warhead Tests
Far Cry 2 Benchmark
Resident Evil 5 Tests
Metro 2033 DX11 Performance
Unigine Heaven Benchmark
NVIDIA APEX PhysX Enhancements
NVIDIA 3D-Vision Effects
GeForce GTX480 Temperatures
VGA Power Consumption
Editors Opinion: Fermi GF100
ZOTAC GTX-480 Conclusion

NVIDIA GF100 GPU Fermi Architecture

NVIDIA's latest GPU is codenamed GF100, and is the first graphics processor based on the Fermi architecture. In this article, Benchmark Reviews explains the technical architecture behind NVIDIA's GF100 graphics processor and offers an insight into upcoming Fermi-based GeForce video cards. For those who are not familiar, NVIDIA's GF100 GPU is their first graphics processor to support DirectX-11 hardware features such as tessellation and DirectCompute, while also adding heavy particle and turbulence effects. The GF100 GPU is also the successor to the GT200 graphics processor, which launched in the GeForce GTX 280 video card back in June 2008. NVIDIA has since redefined their focus, and GF100 proves a dedication towards next generation gaming effects such as raytracing, order-independent transparency, and fluid simulations. Rest assured, the new GF100 GPU is more powerful than the GT200 could ever be, and early results indicate a Fermi-based video card delivers far more than twice the gaming performance over a GeForce GTX-280.

GF100 is not another incremental GPU step-up like we had going from G80 to GT200. While processor cores have grown from 128 (G80) and 240 (GT200), they now reach 512 and earn the title of NVIDIA CUDA (Compute Unified Device Architecture) cores. The key here is not only the name, but that the name now implies an emphasis on something more than just graphics. Each Fermi CUDA processor core has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). GF100 implements the new IEEE 754-2008 floating-point standard, providing the fused multiply-add (FMA) instruction for both single and double precision arithmetic. FMA improves over a multiply-add (MAD) instruction by doing the multiplication and addition with a single final rounding step, with no loss of precision in the addition. FMA minimizes rendering errors in closely overlapping triangles.

nvidia-fermi-gf100-gpu-block-diagram-benchmarkreviews-sm.png

NVIDIA Fermi GF100 Block Diagram (click for high-resolution)

Based on Fermi's third-generation Streaming Multiprocessor (SM) architecture, GF100 doubles the number of CUDA cores over the previous architecture. NVIDIA GeForce GF100 Fermi GPUs are based on a scalable array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The NVIDIA GF100 implements four GPCs, sixteen SMs, and six memory controllers. Expect NVIDIA to launch GF100 products with different configurations of GPCs, SMs, and memory controllers to address different price points.

CPU commands are read by the GPU via the Host Interface. The GigaThread Engine fetches the specified data from system memory and copies them to the frame buffer. GF100 implements six 64-bit GDDR5 memory controllers (384-bit total) to facilitate high bandwidth access to the frame buffer. The GigaThread Engine then creates and dispatches thread blocks to various SMs. Individual SMs in turn schedules warps (groups of 32 threads) to CUDA cores and other execution units. The GigaThread Engine also redistributes work to the SMs when work expansion occurs in the graphics pipeline, such as after the tessellation and rasterization stages.

GF100 implements 512 CUDA cores, organized as 16 SMs of 32 cores each. Each SM is a highly parallel multiprocessor supporting up to 48 warps at any given time. Each CUDA core is a unified processor core that executes vertex, pixel, geometry, and compute kernels. A unified L2 cache architecture services load, store, and texture operations. GF100 has 48 ROP units for pixel blending, antialiasing, and atomic memory operations. The ROP units are organized in six groups of eight. Each group is serviced by a 64-bit memory controller. The memory controller, L2 cache, and ROP group are closely coupled-scaling one unit automatically scales the others.

NVIDIA GigaThread Thread Scheduler

One of the most important technologies of the Fermi architecture is its two-level, distributed thread scheduler. At the chip level, a global work distribution engine schedules thread blocks to various SMs, while at the SM level, each warp scheduler distributes warps of 32 threads to its execution units. The first generation GigaThread engine introduced in G80 managed up to 12,288 threads in real-time. The Fermi architecture improves on this foundation by providing not only greater thread throughput, but dramatically faster context switching, concurrent kernel execution, and improved thread block scheduling.

What's new in Fermi?

With any new technology, consumers want to know what's new in the product. The goal of this article is to share in-depth information surrounding the Fermi architecture, as well as the new functionality unlocked in GF100. For clarity, the 'GF' letters used in the GF100 GPU name are not an abbreviation for 'GeForce'; they actually denote that this GPU is a Graphics solution based on the Fermi architecture. The next generation of NVIDIA GeForce-series desktop video cards will use the GF100 to promote the following new features:

  • Third Generation Streaming Multiprocessor (SM)
    o 32 CUDA cores per SM, 4x over GT200
    o 8x the peak double precision floating point performance over GT200
    o Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
    o 64 KB of RAM with a configurable partitioning of shared memory and L1 cache
  • Second Generation Parallel Thread Execution ISA
    o Unified Address Space with Full C++ Support
    o Optimized for OpenCL and DirectCompute
    o Full IEEE 754-2008 32-bit and 64-bit precision
    o Full 32-bit integer path with 64-bit extensions
    o Memory access instructions to support transition to 64-bit addressing
    o Improved Performance through Predication
  • Improved Memory Subsystem
    o NVIDIA Parallel DataCache hierarchy with Configurable L1 and Unified L2 Caches
    o First GPU with ECC memory support
    o Greatly improved atomic memory operation performance
  • NVIDIA GigaThread Engine
    o 10x faster application context switching
    o Concurrent kernel execution
    o Out of Order thread block execution
    o Dual overlapped memory transfer engines

Benchmark Reviews also more detail in our full-length NVIDIA GF100 GPU Fermi Graphics Architecture guide.



 

Comments 

 
# Other TestsFederico La Morgia 2010-05-13 22:57
3 x 480 ?
tests in OC ?
Report Comment
 
 
# Next Zotac 480 coming soonWW_Dagger 2010-05-26 16:44
I've set my heart on the recently announced Zotac FTX 480 AMP. This takes the cake! Faster yet quieter, and looks just awesome.
Report Comment
 
 
# RE: Next Zotac 480 coming soonFederico La Morgia 2010-05-26 22:09
Point my heart, I would like a Fermi limited to 480 SP, but having the full 512-SP.
Not only!, I wish to put in dual-GPU as my current GTX 295 and directly to the liquid cooled heat sink with copper entirely!
Report Comment
 
 
# HD5870Ansuex 2010-09-10 00:32
Is it just me? or is the 5870 losing less fps when resolution is turned higer compared to the GTX480 or?
Report Comment
 

Comments have been disabled by the administrator.

Search Benchmark Reviews
QNAP Network Storage Servers

Follow Benchmark Reviews on FacebookReceive Tweets from Benchmark Reviews on Twitter