Summary
While AMD and Nvidia battle for supremacy in the GPU computing arena, there's one obvious loser: Intel. AMD's 5870 appeared on schedule. Nvidia's Fermi is late, but its GTX 280 series is still competitive. Intel's Larrabee remains a no-show. End users who buy their systems by the teraflop have discovered and validated an alternative approach that requires fewer x86 CPUs, less power, and less space. GPU computing is here to stay, and the market will punish those who lack a competitive offering.
Analysis
At last week's GPU Technology Conference (GTC), representatives from two of the nation's major supercomputing centers, Oak Ridge National Laboratory and the University of Illinois at Urbana-Champaign, indicated their plans to incorporate GPUs as key computational elements in next-generation supercomputers now under construction at both sites. These organizations, along with hundreds of others, have spent the past three years experimenting with GPUs to boost their number-crunching capabilities, and it now appears that the scientific community is ready to fully embrace GPU computing. Almost 100 "poster papers" from researchers in academia and industry lined the halls of the conference facility, touching on topics as varied as "Nanophotonic simulation," "Accelerated path planning for multi-axis machine tools," and "Using GPUs for Internet routing processing."
Nvidia's been plugging away at these new uses for its GPUs since 2006. Fermi, the new GPU it unveiled last week, incorporates several features designed specifically to appeal to scientific users, including a far more robust 64-bit floating point capability (delivering a peak rate of 750 DP GFLOPS, compared with 80 GFLOPS in its current offering), and the use of error-correcting memory in the on-board frame buffer. Nvidia has also dramatically expanded its software capabilities, allowing developers to program its GPUs using C++, along with two new industry-standard software interfaces, OpenCL and DirectCompute. The company also introduced a new development tool, Nexus, that eases the problems of debugging multi-threaded applications in mixed CPU/GPU environments. There's no doubt Fermi is coming late to market (Jen-Hsun Huang, Nvidia's CEO, noted that he wished he had had it "six months ago"), but the scientists who use his chips seem willing to cut Nvidia some slack, and are ready to gobble up the new GPUs when they emerge in 2010.
AMD's also been dabbling in the GPU computing market since 2006, but Google's 2007 acquisition of a key AMD partner (PeakStream) forced AMD to reset its software program. AMD refocused its efforts around the new OpenCL and DirectCompute standards, but its GPUs lack the ability to run C++ and Fortran, two languages with great popularity in the high performance computing arena. AMD's new chips outperform Nvidia's forthcoming Fermi on single-precision (32-bit) floating point math (2.7 TFLOPS to 1.5 TFLOPS), but lag on double-precision (64-bit) math (500 GFLOPS to 750 GFLOPS). AMD's design lacks support for error-correcting memory; its engineers argue that they've never seen an ECC memory failure in years of hard searching for such errors, and that the feature is just not needed.
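For readers who want the relative gaps rather than the raw numbers, the short Python sketch below works out the ratios implied by the peak figures quoted in this article. It is only back-of-the-envelope arithmetic: the labels and the `ratio` helper are made up for illustration, the inputs are vendor peak rates as quoted here, and nothing in it measures real application performance.

```python
# Peak throughput figures as quoted in this article, converted to GFLOPS.
# These are vendor peak rates, not measured application performance.
PEAK_GFLOPS = {
    "AMD new chips": {"single": 2700.0, "double": 500.0},
    "Nvidia Fermi": {"single": 1500.0, "double": 750.0},
    "Nvidia current": {"double": 80.0},
}


def ratio(a: str, b: str, precision: str) -> float:
    """Return how many times faster chip `a` is than chip `b` at peak."""
    return PEAK_GFLOPS[a][precision] / PEAK_GFLOPS[b][precision]


# AMD's single-precision lead over Fermi.
amd_sp_lead = ratio("AMD new chips", "Nvidia Fermi", "single")

# Fermi's double-precision lead over AMD.
fermi_dp_lead = ratio("Nvidia Fermi", "AMD new chips", "double")

# Fermi's double-precision gain over Nvidia's current offering.
fermi_dp_gain = ratio("Nvidia Fermi", "Nvidia current", "double")

# Share of single-precision peak that each new chip keeps in double precision.
dp_share = {
    name: PEAK_GFLOPS[name]["double"] / PEAK_GFLOPS[name]["single"]
    for name in ("AMD new chips", "Nvidia Fermi")
}

if __name__ == "__main__":
    print(f"AMD single-precision lead over Fermi: {amd_sp_lead:.2f}x")
    print(f"Fermi double-precision lead over AMD: {fermi_dp_lead:.2f}x")
    print(f"Fermi double-precision gain over current Nvidia: {fermi_dp_gain:.2f}x")
    for name, share in dp_share.items():
        print(f"{name}: double precision is {share:.0%} of single-precision peak")
```

On these quoted figures, AMD leads by 1.8x in single precision, Fermi leads by 1.5x in double precision, and Fermi's double-precision peak is a bit more than nine times that of Nvidia's current part. Fermi also keeps half of its single-precision peak in double precision, against a little under a fifth for AMD, which is the figure that matters most to the scientific buyers discussed here.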
While AMD and Nvidia battle for supremacy in the GPU computing market, there's one obvious loser: Intel. AMD's 5870 appeared on schedule. Although Nvidia's Fermi is late, its prior-generation GTX 280 still has some life left in it. But Intel's many-core Larrabee is still a no-show, and the company's Larrabee demo at its recent Developers' Forum was universally regarded as brain-dead, if not an outright embarrassment. End users with high performance computational requirements previously filled their data centers with racks of x86 servers to handle those requirements. Now they have discovered and validated an alternative approach that requires fewer x86 CPUs, less power, and less space. GPU computing won't solve all the world's computing problems, but it will give users who buy their systems by the teraflop a new, more cost-effective alternative that will take some of the wind out of Intel's high performance computing sales.
Note: Readers can see my detailed architectural analysis of Fermi at http://tinyurl.com/ycerrxv
Analyses are solely the work of the authors and have not been edited or endorsed by GLG.