NVIDIA GeForce GTX 460 768MB Video Card
Reviews - Featured Reviews: Video Cards
Written by Olin Coles   
Monday, 12 July 2010

NVIDIA Fermi Features

In today's complex graphics, tessellation offers the means to store massive amounts of coarse geometry, with expand-on-demand functionality. In the NVIDIA GF104 GPU (part of the GF100 series), tessellation also enables more complex animations. In terms of model scalability, dynamic Level of Detail (LOD) allows quality to be traded against performance, adding geometric detail wherever it improves picture quality without a performance penalty. Composed of three layers (original geometry, tessellation geometry, and displacement map), the final product is far more detailed in shading and data expansion than if it were constructed with bump-map technology. In plain terms, tessellation produces real peaks and valleys with shadow detail in between, while previous-generation technology (bump mapping) only gives the illusion of detail.

Stages of Tessellation with NVIDIA Fermi Graphics

Using GPU-based tessellation, a game developer can send a compact geometric representation of an object or character and the tessellation unit can produce the correct geometric complexity for the specific scene. Consider the "Imp" character illustrated above. On the far left we see the initial quad mesh used to model the general outline of the figure; this representation is quite compact even when compared to typical game assets. The two middle images of the character are created by finely tessellating the description at the left. The result is a very smooth appearance, free of any of the faceting that resulted from limited geometry. Unfortunately this character, while smooth, is no more detailed than the coarse mesh. The final image on the right was created by applying a displacement map to the smoothly tessellated third character to the left.
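
The two steps described above (fine tessellation of a coarse mesh, then displacement) can be sketched in a few lines of Python. This is only an illustration of the idea, not how the GF104 hardware or any game engine implements it: the function names, the single flat quad, and the `bump` height map are all assumptions made for the example.

```python
def tessellate_quad(corners, n):
    """Split one coarse quad into an n-by-n grid by bilinear interpolation.

    corners: four (x, y, z) points in order (0,0), (1,0), (1,1), (0,1).
    Returns a list of (u, v, (x, y, z)) entries, (n + 1) ** 2 in total.
    """
    p00, p10, p11, p01 = corners
    verts = []
    for j in range(n + 1):
        v = j / n
        for i in range(n + 1):
            u = i / n
            point = tuple(
                (1 - u) * (1 - v) * a + u * (1 - v) * b + u * v * c + (1 - u) * v * d
                for a, b, c, d in zip(p00, p10, p11, p01)
            )
            verts.append((u, v, point))
    return verts


def displace(verts, height_map, normal, scale):
    """Push every tessellated vertex along the normal by the sampled height."""
    out = []
    for u, v, (x, y, z) in verts:
        h = height_map(u, v) * scale
        out.append((x + normal[0] * h, y + normal[1] * h, z + normal[2] * h))
    return out


def bump(u, v):
    # Hypothetical height map: one peak in the middle of the patch.
    return max(0.0, 1.0 - 4.0 * ((u - 0.5) ** 2 + (v - 0.5) ** 2))


coarse_quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
fine = tessellate_quad(coarse_quad, 8)       # smooth, but no more detailed
detailed = displace(fine, bump, normal=(0.0, 0.0, 1.0), scale=0.25)
```

The four-corner quad is all that has to be stored; the 81 vertices of the detailed surface are generated on demand, which is the compactness argument made for the "Imp" character.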

What's new in Fermi?

With any new technology, consumers want to know what's new in the product. The goal of this article is to share in-depth information surrounding the Fermi architecture, as well as the new functionality unlocked in GF100. For clarity, the 'GF' letters used in the GF100 GPU name are not an abbreviation for 'GeForce'; they actually denote that this GPU is a Graphics solution based on the Fermi architecture. The next generation of NVIDIA GeForce-series desktop video cards will use the GF100 to promote the following new features:

  • Third Generation Streaming Multiprocessor (SM)
    o 32 CUDA cores per SM, 4x over GT200
    o 8x the peak double precision floating point performance over GT200
    o Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
    o 64 KB of RAM with a configurable partitioning of shared memory and L1 cache
  • Second Generation Parallel Thread Execution ISA
    o Unified Address Space with Full C++ Support
    o Optimized for OpenCL and DirectCompute
    o Full IEEE 754-2008 32-bit and 64-bit precision
    o Full 32-bit integer path with 64-bit extensions
    o Memory access instructions to support transition to 64-bit addressing
    o Improved Performance through Predication
  • Improved Memory Subsystem
    o NVIDIA Parallel DataCache hierarchy with Configurable L1 and Unified L2 Caches
    o First GPU with ECC memory support
    o Greatly improved atomic memory operation performance
  • NVIDIA GigaThread Engine
    o 10x faster application context switching
    o Concurrent kernel execution
    o Out of Order thread block execution
    o Dual overlapped memory transfer engines
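
As a rough illustration of the "configurable partitioning" item in the list above, the sketch below models the 64 KB of on-chip RAM in each SM. It assumes the two splits NVIDIA documents for Fermi (16 KB or 48 KB of shared memory, with the remainder used as L1 cache); the constant and function names are made up for this example.

```python
SM_ON_CHIP_KB = 64                   # per-SM RAM shared by L1 cache and shared memory
SHARED_MEMORY_CHOICES_KB = (16, 48)  # assumed valid shared-memory sizes on Fermi


def l1_cache_kb(shared_kb):
    """Return the L1 cache size left over for a given shared-memory choice."""
    if shared_kb not in SHARED_MEMORY_CHOICES_KB:
        raise ValueError("shared memory must be 16 KB or 48 KB in this model")
    return SM_ON_CHIP_KB - shared_kb


for shared in SHARED_MEMORY_CHOICES_KB:
    print(f"{shared} KB shared memory -> {l1_cache_kb(shared)} KB L1 cache")
```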

Tessellation in DirectX-11

Control hull shaders run DX11 pre-expansion routines and operate explicitly in parallel across all points. Domain shaders run post-expansion operations on maps (u/v or x/y/z/w) and are implicitly parallel. Fixed-function tessellation is configured by Level of Detail (LOD) based on output from the control hull shader, and can also produce triangles and lines if requested. Tessellation is new to NVIDIA GPUs, and was not part of GT200 because of geometry bandwidth bottlenecks from sequential rendering/execution semantics.
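
To make the LOD step more concrete, here is a small sketch of the kind of calculation a hull shader might perform before handing a factor to the fixed-function tessellator. The distance-based formula and the `full_detail_distance` parameter are assumptions chosen for illustration; only the upper limit of 64 comes from the DirectX 11 tessellator itself.

```python
MAX_TESS_FACTOR = 64  # upper limit of the DirectX 11 fixed-function tessellator


def tess_factor(distance, full_detail_distance=5.0):
    """Pick a tessellation factor that falls off with distance to the camera.

    Patches closer than full_detail_distance get the maximum factor;
    farther patches get proportionally less, never dropping below 1.
    """
    if distance <= full_detail_distance:
        return float(MAX_TESS_FACTOR)
    factor = MAX_TESS_FACTOR * full_detail_distance / distance
    return max(1.0, factor)


for d in (1.0, 10.0, 10000.0):
    print(d, tess_factor(d))
```

A real engine would typically also consider screen-space edge length and silhouette edges; the point here is only that the factor is computed per patch, at runtime, from the current view.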

For the GF100-series graphics processor, NVIDIA has added new PolyMorph and Raster engines to handle world-space processing (PolyMorph) and screen-space processing (Raster). There are eight PolyMorph engines and two Raster engines on the GF104, which depend on an improved L2 cache to keep buffered geometric data produced by the pipeline on-die.

Four-Offset Gather4

The texture unit on previous processor architectures operated at the core clock of the GPU. On GF104, the texture units run at a higher clock, leading to improved texturing performance for the same number of units. GF104's texture units now add support for DirectX-11's BC6H and BC7 texture compression formats, reducing the memory footprint of HDR textures and render targets.

The texture units also support jittered sampling through DirectX-11's four-offset Gather4 feature, allowing four texels to be fetched from a 128×128 pixel grid with a single texture instruction. NVIDIA GF100-series GPUs implement DirectX-11 four-offset Gather4 in hardware, greatly accelerating shadow mapping, ambient occlusion, and post-processing algorithms. With jittered sampling, games can implement smoother soft shadows or custom texture filters efficiently. The GF100 series can also deliver 32x CSAA, a mode the previous GT200 GPU did not offer.
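
The idea behind four-offset Gather4 can be sketched in software: one call returns four texels, each at its own offset from a base coordinate, and a shadow filter then compares all four against the receiver depth. The code below is a CPU-side illustration only; the function names, the clamp-to-edge behavior, and the tiny shadow map are assumptions for the example, not the DirectX 11 API.

```python
def gather4_offsets(texture, x, y, offsets):
    """Return four texels, one per (dx, dy) offset from (x, y).

    texture is a list of rows; coordinates are clamped to the edges.
    """
    height = len(texture)
    width = len(texture[0])
    texels = []
    for dx, dy in offsets:
        sx = min(max(x + dx, 0), width - 1)
        sy = min(max(y + dy, 0), height - 1)
        texels.append(texture[sy][sx])
    return texels


def shadow_fraction(shadow_map, x, y, receiver_depth, offsets):
    """Fraction of the four jittered samples that occlude the receiver."""
    depths = gather4_offsets(shadow_map, x, y, offsets)
    occluded = sum(1 for d in depths if d < receiver_depth)
    return occluded / len(depths)


# Left half of the map holds a nearby occluder (depth 0.2), right half is far (0.9).
shadow_map = [
    [0.2, 0.2, 0.9, 0.9],
    [0.2, 0.2, 0.9, 0.9],
    [0.2, 0.2, 0.9, 0.9],
    [0.2, 0.2, 0.9, 0.9],
]
jitter = [(-1, -1), (2, -1), (-1, 1), (1, 2)]  # hypothetical jitter pattern
result = shadow_fraction(shadow_map, 1, 1, 0.5, jitter)
```

Here two of the four jittered samples land on the occluder, so the pixel is treated as half shadowed. Without programmable offsets, the same soft edge would need four separate texture fetches.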

GF104 Compute for Gaming

As developers continue to search for novel ways to improve their graphics engines, the GPU will need to excel at a diverse and growing set of graphics algorithms. Since these algorithms are executed via general compute APIs, a robust compute architecture is fundamental to a GPU's graphical capabilities. In essence, one can think of compute as the new programmable shader. GF100's compute architecture is designed to address a wider range of algorithms and to facilitate more pervasive use of the GPU for solving parallel problems. Many algorithms, such as ray tracing, physics, and AI, cannot exploit shared memory, because program memory locality is only revealed at runtime. GF104's cache architecture was designed with these problems in mind. With up to 48 KB of L1 cache per Streaming Multiprocessor (SM) and a global L2 cache, threads that access the same memory locations at runtime automatically run faster, irrespective of the choice of algorithm.

NVIDIA GF100-series GPUs are the first to offer full C++ support, the programming language of choice among game developers. To ease the transition to GPU programming, NVIDIA developed Nexus (codename NEXUS), a Microsoft Visual Studio programming environment for the GPU. Nexus brings CPU and GPU code development together in Microsoft Visual Studio 2008 on a shared process timeline, and also introduces the first hardware-based shader debugger. Together with new hardware features that provide better debugging support, developers will be able to enjoy CPU-class application development on the GPU. The end result is C++ and Visual Studio integration that brings HPC users onto the same development platform. NVIDIA offers several paths to deliver compute functionality on the GF104 GPU, such as CUDA C++ for video games.

Image processing, simulation, and hybrid rendering are three primary functions of GPU compute for gaming. Using NVIDIA GF100-series GPUs, interactive ray tracing becomes possible for the first time on a standard PC. Ray tracing performance on the NVIDIA GF100 is roughly 4x faster than it was on the GT200 GPU, according to NVIDIA tests. AI/path finding is a compute-intensive process well suited for GPUs. The NVIDIA GF100 can handle AI obstacles approximately 3x better than the GT200. The benefits of this improvement are faster collision avoidance and shortest-path searches for higher-performance path finding.

NVIDIA GigaThread Thread Scheduler

One of the most important technologies of the Fermi architecture is its two-level, distributed thread scheduler. At the chip level, a global work distribution engine schedules thread blocks to various SMs, while at the SM level, each warp scheduler distributes warps of 32 threads to its execution units. The first generation GigaThread engine introduced in G80 managed up to 12,288 threads in real-time. The Fermi architecture improves on this foundation by providing not only greater thread throughput, but dramatically faster context switching, concurrent kernel execution, and improved thread block scheduling.
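
The two levels described above can be sketched as follows. This is a simplified model, not NVIDIA's scheduling policy: the round-robin assignment of thread blocks to SMs and both function names are assumptions made for illustration. Only the warp size of 32 threads comes from the text.

```python
WARP_SIZE = 32  # threads per warp, as stated above


def distribute_blocks(num_blocks, num_sms):
    """Chip level: hand thread blocks to SMs (round-robin, for illustration)."""
    assignment = {sm: [] for sm in range(num_sms)}
    for block in range(num_blocks):
        assignment[block % num_sms].append(block)
    return assignment


def split_into_warps(threads_per_block):
    """SM level: group a block's thread ids into warps of up to 32 threads."""
    return [
        list(range(start, min(start + WARP_SIZE, threads_per_block)))
        for start in range(0, threads_per_block, WARP_SIZE)
    ]


blocks_per_sm = distribute_blocks(num_blocks=10, num_sms=4)
warps = split_into_warps(threads_per_block=100)
print(blocks_per_sm)
print([len(w) for w in warps])
```

A block of 100 threads yields three full warps and one partial warp of four threads, which is why thread-block sizes are usually chosen as multiples of 32.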



 

Comments 

 
# RE: NVIDIA GeForce GTX-460 768MB Video Card (Servando Silva, 2010-07-11 21:27)
Finally a decent Fermi GPU. Nvidia strikes back after almost 1 year. Thanks for this great Review Olin.
 
 
# First "Gaming" Fermi GPU...? (Bruce Normann, 2010-07-12 10:00)
I can't help but think that the GF100-based Fermi cards were not really optimized for gaming. Engineers don't just throw a bunch of transistors into a rectangular pan and bake at 350F for 45 minutes. The architecture of the GF100 was designed to excel at something, it just wasn't gaming graphics. What I keep wondering is: what is the size and scope of the market that they WERE optimized for?
 
 
# RE: First "Gaming" Fermi GPU...? (Servando Silva, 2010-07-12 12:56)
I think they focused a lot on CUDA and features (3D, Surround, PhysX) instead of performance. These GPUs could really be super fast for other applications, just not gaming. This seems to be their first product "really" targeted to gamers.
 
 
# It's only a matter of time (Avro Arrow, 2010-07-14 06:30)
We need to keep in mind that ATi's HD 5xxx series has already been out almost a year (wow, has it really been that long?) and that nVidia was supposed to have released Fermi almost exactly 1 year ago. It's unknown what exactly ATi is going to release this year but we can be sure that it's most likely going to make the GTX 4xx series look like the FX 5xxx series...lol
 
 
# one question (Federico La Morgia, 2010-08-07 05:45)
what is written on the RAM chips?
 
 
# RE: one question (Olin Coles, 2010-08-07 05:56)
Textures are cached and buffered to the video memory. Some games buffer only 100MB, and others buffer up to 1GB.
 
