CUDA
GT200 was designed with CUDA computing in mind. In this mode, the new GPU can be treated as a programmable multiprocessor with 240 computing cores, on-chip memory, support for reads and writes at arbitrary addresses, and one gigabyte of dedicated high-bandwidth memory. According to NVIDIA, GeForce GTX 280 in this mode turns an ordinary PC into a small supercomputer that delivers almost a teraflop, which is useful for many scientific and applied tasks.
Several key factors make GeForce GTX 200 an excellent parallel processor. The first is CUDA itself, because software is always the most important part of parallel computing, and CUDA is a simple and powerful way to move computations from a CPU to a GPU. It is also important that GT200 was designed with non-graphics computing in mind: it has such features as shared memory and support for double-precision arithmetic.
As a result, GeForce GTX 280, with its 240 cores operating at 1.3 GHz, is one of the most powerful processors for floating-point computations. High memory bandwidth, provided by the 512-bit memory bus and fast GDDR3 video memory, also comes in handy here.
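As a rough check on the "almost a teraflop" figure, peak throughput can be estimated from the core count and the clock. The sketch below uses the rounded 1.3 GHz clock from the text and assumes that each core can issue up to 3 single-precision floating-point operations per clock (a multiply-add plus a multiply). That per-clock figure and the function name peak_gflops are our assumptions for illustration, not numbers from this section.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x operations per clock."""
    return cores * clock_ghz * flops_per_clock


# 240 cores and 1.3 GHz come from the text above.
# ASSUMPTION: 3 floating-point operations per core per clock.
gtx280_peak = peak_gflops(cores=240, clock_ghz=1.3, flops_per_clock=3)

print(f"Estimated peak: {gtx280_peak:.0f} GFLOPS")
```

This is a theoretical ceiling. Real applications reach only a fraction of it, depending on how well they keep all the cores busy.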
Many of the most demanding tasks can be moved from a CPU to a GPU with the help of CUDA, and even moving only part of the computations to a GPU can raise performance. This picture shows examples of CUDA in real tasks, with numbers that indicate how many times faster the GPU is than a CPU.
The tasks vary: video transcoding, molecular dynamics, astrophysics simulations, financial simulations, medical imaging, etc. Performance gains from moving computations to a GPU range from 20x to 140x. Thus, the new GPU can accelerate a wide range of algorithms once they are ported to CUDA.
One consumer application of GPU computing is video encoding and transcoding. Elemental moved the encoding task to the GPU in its RapidHD and obtained the following results:
GeForce GTX 280 performs brilliantly in this task: it is over 10 times as fast as the fastest CPU. It takes 231 seconds to encode a 2-minute video clip on a CPU and just 21 seconds on GT200. This GPU completes the task not just in real time, but several times faster.
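The arithmetic behind these claims is easy to reproduce. The short sketch below takes the three numbers quoted above (a 2-minute clip, 231 seconds on the CPU, 21 seconds on the GPU) and derives the speedup and the encoding speed relative to real time; the variable names are ours.

```python
clip_seconds = 2 * 60   # length of the source clip
cpu_seconds = 231       # encoding time on the CPU
gpu_seconds = 21        # encoding time on GeForce GTX 280

# How many times faster the GPU finishes the same job.
speedup = cpu_seconds / gpu_seconds

# Encoding speed relative to playback: above 1.0 means faster than real time.
cpu_realtime_factor = clip_seconds / cpu_seconds
gpu_realtime_factor = clip_seconds / gpu_seconds

print(f"GPU speedup over CPU: {speedup:.1f}x")
print(f"CPU encodes at {cpu_realtime_factor:.2f}x real time")
print(f"GPU encodes at {gpu_realtime_factor:.2f}x real time")
```

So the CPU needs almost twice the clip's duration to encode it, while the GPU gets through it more than five times faster than playback.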
Another task where you can get huge performance gains right now is Folding@Home, a distributed computing project that simulates protein folding in order to better understand diseases caused by misfolded proteins. GPUs compute such tasks tens or even hundreds of times faster than CPUs.
Simulation speed is measured in nanoseconds per day: how many nanoseconds of a protein's life can be simulated in a day of computing on a PC. A CPU can simulate only 4 ns/day, PlayStation 3 about 100 ns/day, and GeForce GTX 280 can reach 590 ns/day. So it is over 100 times as fast as a CPU, and three times as fast as the top single-GPU solution from its competitor.
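To put the Folding@Home figures side by side, the sketch below computes the ratios between the three platforms, using only the ns/day values quoted above. The dictionary layout and names are ours.

```python
# Simulation speed in nanoseconds of protein time per day of computing.
ns_per_day = {
    "CPU": 4,
    "PlayStation 3": 100,
    "GeForce GTX 280": 590,
}

gtx280 = ns_per_day["GeForce GTX 280"]

# Ratio of GTX 280 throughput to each of the other platforms.
gtx280_vs_cpu = gtx280 / ns_per_day["CPU"]
gtx280_vs_ps3 = gtx280 / ns_per_day["PlayStation 3"]

print(f"GTX 280 vs CPU: {gtx280_vs_cpu:.1f}x")
print(f"GTX 280 vs PlayStation 3: {gtx280_vs_ps3:.1f}x")
```

The first ratio is what the text rounds down to "over 100 times".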
Another important factor for distributed computing is that there are over 70 million NVIDIA cards with CUDA support around the world, with an average performance of 100 gigaflops each. Now imagine that just 1% of these cards are used in Folding@Home: that would add 70 petaflops of potential performance to the project. GPU power opens up truly remarkable possibilities.
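The 70-petaflop estimate follows directly from the numbers in the paragraph. A minimal sketch of the calculation, with the unit conversion spelled out (variable names are ours):

```python
installed_cards = 70_000_000   # CUDA-capable NVIDIA cards worldwide
participation_percent = 1      # share of cards assumed to run Folding@Home
avg_gflops_per_card = 100      # average performance per card

# Integer arithmetic keeps the card count exact.
participating_cards = installed_cards * participation_percent // 100

total_gflops = participating_cards * avg_gflops_per_card
total_pflops = total_gflops / 1_000_000   # 1 petaflop = 1,000,000 gigaflops

print(f"{participating_cards:,} cards -> {total_pflops:.0f} PFLOPS")
```

This is potential performance only: it assumes every participating card sustains the average figure.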
We are planning to publish a separate review of CUDA to dwell on aspects of its usage and examples of real applications in various fields. One example that is already useful for all users is GPU-assisted physics computing.