NVIDIA GeForce GTX 460 768MB Video Card
Reviews - Featured Reviews: Video Cards
Written by Olin Coles   
Monday, 12 July 2010
Table of Contents: Page Index
NVIDIA GeForce GTX 460 768MB Video Card
Features and Specifications
NVIDIA GF104 GPU Fermi Architecture
NVIDIA GeForce GTX 460 Video Card
GeForce GTX-460 Partner Products
Video Card Testing Methodology
DX10: 3DMark Vantage
DX10: Crysis Warhead
DX10: Far Cry 2
DX10: Resident Evil 5
DX11: Aliens vs Predator
DX11: Battlefield Bad Company 2
DX11: BattleForge
DX11: Metro 2033
DX11: Unigine Heaven 2.1
NVIDIA APEX PhysX Enhancements
NVIDIA 3D-Vision Effects
GeForce GTX460 Temperatures
VGA Power Consumption
GeForce GTX 460 Overclocking
Editor's Opinion: NVIDIA Fermi
GeForce GTX 460 Conclusion

NVIDIA GF104 GPU Fermi Architecture

Based on the Fermi architecture, NVIDIA's latest GPU is codenamed GF104 and powers the GeForce GTX 460. In this article, Benchmark Reviews explains the technical architecture behind NVIDIA's GF104 graphics processor and offers insight into upcoming Fermi-based GeForce video cards. For those who are not familiar, GF100 was NVIDIA's first graphics processor to support DirectX-11 hardware features such as tessellation and DirectCompute, while also adding heavy particle and turbulence effects. GF100 is also the successor to the GT200 graphics processor, which launched in the GeForce GTX 280 video card back in June 2008. NVIDIA has since redefined its focus, and GF100/GF104 demonstrates a dedication to next-generation gaming effects such as raytracing, order-independent transparency, and fluid simulations. Although it is a smaller chip than GF100, the new GF104 GPU is still more powerful than GT200, and delivers DirectX-11 performance for NVIDIA's mid-range Fermi-based video card family.

GF100 was not another incremental GPU step-up like the move from G80 to GT200. Processor cores grew from 128 (G80) to 240 (GT200), and now reach 512 in GF100, where they earn the title of NVIDIA CUDA (Compute Unified Device Architecture) cores. GF104 features up to 336 CUDA cores. The key here is not only the name, but that the name now implies an emphasis on something more than just graphics. Each Fermi CUDA processor core has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). GF104 implements the IEEE 754-2008 floating-point standard, providing the fused multiply-add (FMA) instruction for both single and double precision arithmetic. FMA improves on a multiply-add (MAD) instruction by performing the multiplication and addition with a single final rounding step, so the intermediate product loses no precision before the addition. In graphics, this helps minimize rendering errors in closely overlapping triangles.
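The difference between a separately rounded multiply-add and a fused multiply-add can be illustrated with a small sketch. This is not GPU code: it assumes Python's double-precision floats and emulates the fused operation with exact rational arithmetic, and the function names `mad` and `fma_emulated` are hypothetical helpers chosen for illustration.

```python
from fractions import Fraction


def mad(a: float, b: float, c: float) -> float:
    """Multiply, then add, as two operations: the product is rounded before the add."""
    product = a * b      # first rounding happens here
    return product + c   # second rounding happens here


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational result
    return float(exact)                              # single final rounding


# Inputs chosen so that the exact product (1 - 2**-60) is not representable
# as a double and rounds to 1.0 when computed on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("separate multiply and add:", mad(a, b, c))
print("fused multiply-add:       ", fma_emulated(a, b, c))
```

With these inputs the separately rounded version returns exactly zero, while the fused version preserves the small negative residual. Two nearly coincident values that should compare as slightly different no longer collapse to the same result.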

NVIDIA Fermi GF104 Block Diagram

Based on Fermi's third-generation Streaming Multiprocessor (SM) architecture, GF104 could be mistaken for a divided GF100. NVIDIA GeForce GF100-series Fermi GPUs are based on a scalable array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. NVIDIA's GF100 GPU implemented four GPCs, sixteen SMs, and six memory controllers. By comparison, GF104 implements two GPCs, eight SMs, and four memory controllers. Where each SM contained 32 CUDA cores in GF100, NVIDIA now configures GF104 to deliver 48 cores per SM. As expected, NVIDIA GF100-series products are launching with different configurations of GPCs, SMs, and memory controllers to address different price points.
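The CUDA core counts quoted for each product follow directly from the number of active SMs multiplied by the cores per SM. The sketch below works through that arithmetic using the SM counts from the specification table later in this section. The `GpuConfig` class and its field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """One Fermi configuration: how many SMs are active and how wide each SM is."""
    name: str
    active_sms: int
    cores_per_sm: int

    @property
    def cuda_cores(self) -> int:
        # Total CUDA cores = active SMs x cores per SM
        return self.active_sms * self.cores_per_sm


CONFIGS = [
    GpuConfig("GF100 (full design)", 16, 32),
    GpuConfig("GeForce GTX 480", 15, 32),
    GpuConfig("GeForce GTX 470", 14, 32),
    GpuConfig("GeForce GTX 465", 11, 32),
    GpuConfig("GF104 (full design)", 8, 48),
    GpuConfig("GeForce GTX 460", 7, 48),
]

for cfg in CONFIGS:
    print(f"{cfg.name:<20} {cfg.active_sms:>2} SMs x {cfg.cores_per_sm} = {cfg.cuda_cores} CUDA cores")
```

Note that the GeForce GTX 460 reaches its 336 cores with seven active SMs, one fewer than the eight SMs in the GF104 design.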

CPU commands are read by the GPU via the Host Interface. The GigaThread Engine fetches the specified data from system memory and copies it to the frame buffer. GF104 implements four 64-bit GDDR5 memory controllers (256-bit total) to facilitate high-bandwidth access to the frame buffer. The GigaThread Engine then creates and dispatches thread blocks to the various SMs. Individual SMs in turn schedule warps (groups of 32 threads) to CUDA cores and other execution units. The GigaThread Engine also redistributes work to the SMs when work expansion occurs in the graphics pipeline, such as after the tessellation and rasterization stages.
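As a rough illustration of how a thread block is carved into warps before an SM schedules it, the sketch below assumes the CUDA warp size of 32 threads. The helper name `warps_per_block` is hypothetical, and the real scheduling hardware is considerably more involved.

```python
WARP_SIZE = 32  # threads per warp on CUDA hardware


def warps_per_block(threads_per_block: int) -> int:
    """Number of warps needed to cover a thread block (the last warp may be partly idle)."""
    if threads_per_block <= 0:
        raise ValueError("a thread block needs at least one thread")
    # Ceiling division: a partially filled warp still occupies a full warp slot.
    return -(-threads_per_block // WARP_SIZE)


for threads in (32, 100, 256):
    print(f"{threads:>3} threads -> {warps_per_block(threads)} warp(s)")
```

A block of 100 threads therefore occupies four warps, with 28 lanes of the last warp left unused.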

The GeForce GTX 460 exposes 336 CUDA cores, organized as 7 active SMs of 48 cores each (the GF104 design provides 8 SMs). Each SM is a highly parallel multiprocessor supporting up to 48 warps at any given time (four Dispatch Units per SM issue two instructions per warp, for four total instructions per clock per SM). Each CUDA core is a unified processor core that executes vertex, pixel, geometry, and compute kernels. A unified L2 cache architecture (384KB on the 768MB version or 512KB on 1GB cards) services load, store, and texture operations. GF104 is designed to offer a total of 32 ROP units for pixel blending, antialiasing, and atomic memory operations; the 768MB GeForce GTX 460 enables 24 of them, while the 1GB version enables all 32. In the full configuration, the ROP units are organized in four groups of eight, and each group is serviced by a 64-bit memory controller. The memory controller, L2 cache, and ROP group are closely coupled: scaling one unit automatically scales the others.
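The coupling between memory controllers, L2 cache, and ROP groups can be sketched as a simple per-partition calculation. The 128KB of L2 per partition is an inference from the figures above (512KB / 4 and 384KB / 3), not a number stated by NVIDIA in this article, and the function name `memory_subsystem` is hypothetical.

```python
ROPS_PER_PARTITION = 8       # one ROP group of eight units
BUS_BITS_PER_PARTITION = 64  # one 64-bit GDDR5 memory controller
L2_KB_PER_PARTITION = 128    # assumption: inferred from 512KB / 4 = 384KB / 3


def memory_subsystem(partitions: int) -> dict:
    """Scale ROPs, bus width, and L2 cache together with the number of active partitions."""
    if partitions <= 0:
        raise ValueError("at least one memory partition is required")
    return {
        "rop_units": partitions * ROPS_PER_PARTITION,
        "bus_width_bits": partitions * BUS_BITS_PER_PARTITION,
        "l2_cache_kb": partitions * L2_KB_PER_PARTITION,
    }


print("GTX 460 768MB:", memory_subsystem(3))
print("GTX 460 1GB:  ", memory_subsystem(4))
```

Disabling one of the four partitions yields the 24 ROPs, 192-bit interface, and 384KB of L2 found on the 768MB card.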

GF104 Specifications

  • Up to 336 CUDA Cores
  • 8 Geometry Units
  • 2 Raster Units
  • 64 Texture Units
  • 32 ROP Units
  • 256-bit GDDR5
  • DirectX-11 API Support

GeForce GTX 400 Specifications

Graphics Card | GeForce GTX 460 | GeForce GTX 465 | GeForce GTX 470 | GeForce GTX 480
GPU Transistors | 1.95 Billion | 3.2 Billion | 3.2 Billion | 3.2 Billion
Graphics Processing Clusters | 2 | 4 | 4 | 4
Streaming Multiprocessors | 7 | 11 | 14 | 15
CUDA Cores | 336 | 352 | 448 | 480
Texture Units | 56 | 44 | 56 | 60
ROP Units | 768MB=24 / 1GB=32 | 32 | 40 | 48
Graphics Clock (Fixed Function Units) | 675 MHz | 607 MHz | 607 MHz | 700 MHz
Processor Clock (CUDA Cores) | 1350 MHz | 1215 MHz | 1215 MHz | 1401 MHz
Memory Clock (Clock Rate/Data Rate) | 900/3600 MHz | 837/3348 MHz | 837/3348 MHz | 924/3696 MHz
Total Video Memory | 768MB / 1GB | 1024 MB | 1280 MB | 1536 MB
Memory Interface | 768MB=192-Bit / 1GB=256-Bit | 256-Bit | 320-Bit | 384-Bit
Total Memory Bandwidth | 86.4 / 115.2 GB/s | 102.6 GB/s | 133.9 GB/s | 177.4 GB/s
Texture Filtering Rate (Bilinear) | 37.8 GigaTexels/s | 26.7 GigaTexels/s | 34.0 GigaTexels/s | 42.0 GigaTexels/s
GPU Fabrication Process | 40 nm | 40 nm | 40 nm | 40 nm
Output Connections | 2x Dual-Link DVI-I, 1x Mini HDMI | 2x Dual-Link DVI-I, 1x Mini HDMI | 2x Dual-Link DVI-I, 1x Mini HDMI | 2x Dual-Link DVI-I, 1x Mini HDMI
Form Factor | Dual-Slot | Dual-Slot | Dual-Slot | Dual-Slot
Power Input | 2x 6-Pin | 2x 6-Pin | 2x 6-Pin | 6-Pin + 8-Pin
Thermal Design Power (TDP) | 768MB=150W / 1GB=160W | 200 Watts | 215 Watts | 250 Watts
Recommended PSU | 450 Watts | 550 Watts | 550 Watts | 600 Watts
GPU Thermal Threshold | 104°C | 105°C | 105°C | 105°C

GeForce Fermi Chart Courtesy of Benchmark Reviews
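
Two of the derived rows in the chart can be reproduced from the raw figures. The sketch below assumes the usual rules of thumb: memory bandwidth is the bus width in bytes multiplied by the data rate, and the bilinear texture rate is one texel per texture unit per graphics clock. The function names are hypothetical, and the inputs are taken from the chart above.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_mhz: int) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x million transfers per second."""
    return bus_width_bits / 8 * data_rate_mhz / 1000


def bilinear_texel_rate(texture_units: int, graphics_clock_mhz: int) -> float:
    """Peak bilinear filtering rate in GigaTexels/s, assuming one texel per unit per clock."""
    return texture_units * graphics_clock_mhz / 1000


# GeForce GTX 460 memory bandwidth for both memory configurations
print("GTX 460 768MB:", round(memory_bandwidth_gbs(192, 3600), 1), "GB/s")
print("GTX 460 1GB:  ", round(memory_bandwidth_gbs(256, 3600), 1), "GB/s")

# Texture filtering rate for each card: (texture units, graphics clock in MHz)
CARDS = {
    "GeForce GTX 460": (56, 675),
    "GeForce GTX 465": (44, 607),
    "GeForce GTX 470": (56, 607),
    "GeForce GTX 480": (60, 700),
}
for name, (units, clock) in CARDS.items():
    print(f"{name}: {round(bilinear_texel_rate(units, clock), 1)} GigaTexels/s")
```

The arithmetic shows why the GeForce GTX 460 out-filters the GeForce GTX 470 despite having the same number of texture units: its fixed-function clock is higher.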



 

Comments 

 
# RE: NVIDIA GeForce GTX-460 768MB Video Card - Servando Silva, 2010-07-11 21:27
Finally a decent Fermi GPU. Nvidia strikes back after almost 1 year. Thanks for this great Review Olin.
 
# First "Gaming" Fermi GPU...? - Bruce Normann, 2010-07-12 10:00
I can't help but think that the GF100-based Fermi cards were not really optimized for gaming. Engineers don't just throw a bunch of transistors into a rectangular pan and bake at 350F for 45 minutes. The architecture of the GF100 was designed to excel at something, it just wasn't gaming graphics. What I keep wondering is: what is the size and scope of the market that they WERE optimized for?
 
# RE: First "Gaming" Fermi GPU...? - Servando Silva, 2010-07-12 12:56
I think they focused a lot on CUDA and features (3D, Surround, PhysX) instead of performance. These GPUs could really be super fast for other applications, just not gaming. This seems to be their first product "really" targeted to gamers.
 
# It's only a matter of time - Avro Arrow, 2010-07-14 06:30
We need to keep in mind that ATi's HD 5xxx series has already been out almost a year (wow, has it really been that long?) and that nVidia was supposed to have released Fermi almost exactly 1 year ago. It's unknown what exactly ATi is going to release this year but we can be sure that it's most likely going to make the GTX 4xx series look like the FX 5xxx series...lol
 
# one question - Federico La Morgia, 2010-08-07 05:45
what is written on the RAM chips?
 
# RE: one question - Olin Coles, 2010-08-07 05:56
Textures are cached and buffered to the video memory. Some games buffer only 100MB, and others buffer up to 1GB.
