Deep dive: Nvidia's RTX 2080 and real-time ray tracing explained

As expected, Nvidia unveiled three new GeForce RTX graphics cards at its Gamescom event. We’ve covered a lot of rumors and speculation, but we now know pricing, features, performance—and yes, even the name. Nvidia will provide further details on the architecture in the coming days, but those details will be embargoed until a later date, presumably close to September 20, which is when the RTX 2080 Ti and RTX 2080 officially go on sale. But we have plenty of other information to dissect before then, so let’s dive in.

Nvidia’s last GeForce architecture was Pascal, which powered everything from top-tier graphics cards like the GTX 1080 and GTX 1080 Ti to the entry-level GTX 1050 and GT 1030. Last year, Nvidia released a new Volta architecture, which will apparently remain in the supercomputing and deep learning fields, because the new Turing architecture appears to beat it in nearly every meaningful way. If you just splurged on a Titan V, that’s bad news, but for gamers holding out for new graphics cards, your patience has paid off.

Core specs and pricing for the RTX 20-series graphics cards

There was a ton of speculation and, yes, blatantly wrong guesses as to what the Turing architecture would contain. Every single ‘leak’ prior to last week was wrong. Chew on that for a moment. We can make an educated guess about what Nvidia and AMD might do with a future architecture, but such guesses are bound to be wrong. Nvidia unveiled many core details of the Turing architecture at SIGGRAPH, and with the official announcement of the GeForce RTX 20-series we can finally put all the rumor-mongering to bed.

Quick disclaimer: I’ve used the ‘reference’ specs for all the GPUs in the following table. The 20-series Founders Edition cards carry a higher price but come with a 90MHz higher boost clock for Turing, putting them in the same range that factory overclocked models are likely to land. As for the ‘true reference’ cards, we don’t know what those will look like or how widely available they’ll be, particularly at launch. I suspect we won’t see the lower end of the price ranges listed below for at least a month or two after the graphics cards begin shipping.

Here are the specs (with a few areas still unknown, like die size and transistor counts for the smaller Turing core):

For traditional graphics work—what games have been using up until now—CUDA core counts are moderately improved across the line. The 2080 Ti has 21 percent more cores than the GTX 1080 Ti, the RTX 2080 has 15 percent more cores than the GTX 1080, and the RTX 2070 has 20 percent more cores than the GTX 1070. The result in theoretical TFLOPS is a similar 13.5 to 18.6 percent improvement—call it 15 percent on average. Here’s the important bit: those theoretical numbers represent more of a worst-case scenario for Turing.
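To make the arithmetic concrete, here is a rough sketch of how those percentages fall out. The core counts are the ones discussed above; the reference boost clocks (in MHz) are my assumption from the published reference specs, and the function names are purely illustrative.

```python
# (CUDA cores, reference boost clock in MHz).
# Boost clocks are assumed from published reference specs, not Founders Edition.
CARDS = {
    "RTX 2080 Ti": (4352, 1545),
    "GTX 1080 Ti": (3584, 1582),
    "RTX 2080": (2944, 1710),
    "GTX 1080": (2560, 1733),
    "RTX 2070": (2304, 1620),
    "GTX 1070": (1920, 1683),
}

PAIRS = [
    ("RTX 2080 Ti", "GTX 1080 Ti"),
    ("RTX 2080", "GTX 1080"),
    ("RTX 2070", "GTX 1070"),
]


def fp32_tflops(cores, boost_mhz):
    """Theoretical FP32 rate: 2 FLOPs (one fused multiply-add) per core per clock."""
    return 2 * cores * boost_mhz * 1e6 / 1e12


def percent_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1) * 100


for new, old in PAIRS:
    core_gain = percent_gain(CARDS[new][0], CARDS[old][0])
    tflops_gain = percent_gain(fp32_tflops(*CARDS[new]), fp32_tflops(*CARDS[old]))
    print(f"{new} vs {old}: {core_gain:.1f}% more cores, {tflops_gain:.1f}% more TFLOPS")
```

The TFLOPS gains come out slightly lower than the core count gains for the 2080 Ti and 2070 because the reference boost clocks assumed here are a bit lower than their Pascal counterparts.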

Architecturally, Nvidia has enhanced the CUDA cores this round. One major change is that the CUDA cores can do simultaneous FP32 and INT calculations. Most graphics work relies on floating-point computation (eg, 3.14159 * 2.71828), but integer calculations for memory addresses are also important. It’s not clear exactly how this ultimately affects graphics performance, but during his GeForce RTX presentation, Nvidia CEO Jensen Huang stated the Turing cores are “1.5 times faster” than the Pascal cores. If that figure is even close to reality, the new RTX 20-series GPUs will be substantially faster than the current 10-series.

The performance improvements don’t stop with more and faster CUDA cores. Turing will use 14 GT/s GDDR6 memory in the three parts revealed so far. That gives the 2080 Ti a modest 27 percent improvement in bandwidth, the 2080 sees a larger 40 percent boost, and the 2070 gets catapulted to equivalence with the 2080 and receives a 75 percent increase in bandwidth. Every GPU has a certain amount of memory bandwidth that it needs, beyond which faster memory doesn’t help as much. Nvidia has traditionally kept its top GPUs pretty well balanced, but the move to GDDR6 has altered things. I suspect the 2070 doesn’t really need all that bandwidth, but having extra certainly won’t hurt.
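Those bandwidth figures follow from a simple formula: bus width in bytes multiplied by the data rate. Here is a small sketch of the calculation. The bus widths and the Pascal data rates (11, 10, and 8 GT/s) are my assumptions from published specs, and the names are illustrative.

```python
def bandwidth_gbs(bus_width_bits, data_rate_gts):
    """Peak bandwidth in GB/s: bytes per transfer (bus width / 8) times GT/s."""
    return bus_width_bits / 8 * data_rate_gts


def percent_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1) * 100


# (bus width in bits, data rate in GT/s).
# Turing rows use the 14 GT/s GDDR6 figure; Pascal rows are assumed values.
MEMORY = {
    "RTX 2080 Ti": (352, 14), "GTX 1080 Ti": (352, 11),
    "RTX 2080": (256, 14), "GTX 1080": (256, 10),
    "RTX 2070": (256, 14), "GTX 1070": (256, 8),
}

for new, old in [("RTX 2080 Ti", "GTX 1080 Ti"),
                 ("RTX 2080", "GTX 1080"),
                 ("RTX 2070", "GTX 1070")]:
    new_bw = bandwidth_gbs(*MEMORY[new])
    old_bw = bandwidth_gbs(*MEMORY[old])
    print(f"{new}: {new_bw:.0f} GB/s vs {old_bw:.0f} GB/s "
          f"({percent_gain(new_bw, old_bw):.0f}% more)")
```

Because the 2070 and 2080 share the same assumed bus width and data rate, they land on identical bandwidth, which is why the 2070 sees the largest jump over its predecessor.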

Everything so far represents updates to Nvidia’s traditional GPU architecture. What comes next are the new additions, the RT and Tensor cores. The RT stands for ray-tracing, a technique first introduced back in 1979 by Turner Whitted. It’s likely no coincidence that Whitted joined Nvidia in 2014, working in its research division. The timing fits perfectly with Nvidia beginning serious efforts to implement real-time ray-tracing hardware, and Turing is the first clear fruit of those efforts. In a recent blog post, Whitted discussed some of his history with ray-tracing and global illumination.

I’ll come back to what ray-tracing is in a bit, but the new information from Nvidia is that the RT cores do about 10 TFLOPS of computations for each Giga Ray per second. It’s important to state that these are not general purpose TFLOPS; they are specific operations designed to accelerate ray-tracing calculations. Nvidia says the RT cores are used to compute ray-triangle intersections (where a ray hits a polygon), as well as BVH traversal. That second bit requires a lengthier explanation.
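Nvidia hasn’t said how the RT cores implement the intersection test in hardware. Purely as a software illustration, here is the Möller-Trumbore algorithm, one widely used way to check whether a ray hits a triangle. The function and helper names are mine, and this is a sketch rather than a description of what Turing does.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin


triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(ray_triangle_intersect((0.25, 0.25, 1), (0, 0, -1), *triangle))
print(ray_triangle_intersect((2, 2, 1), (0, 0, -1), *triangle))
```

Each test is only a handful of multiplies and adds, but a scene can require billions of them per second, which is the kind of repetitive work that benefits from dedicated hardware.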

BVH stands for “bounding volume hierarchy” and is a method for optimizing intersection calculations. Instead of checking rays against polygons, objects are encapsulated by larger, simple volumes. If a ray doesn’t intersect the large volume, no additional effort needs to be spent checking the object. Conversely, if a ray does intersect the bounding volume, then the next level of the hierarchy gets checked, with each level becoming more detailed. Basically, Nvidia is providing hardware that accelerates common functions used in ray-tracing, potentially speeding up the calculations by an order of magnitude (or more).
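Here is a minimal sketch of that idea in software, assuming axis-aligned bounding boxes and a tiny hand-built hierarchy. The scene, the object names, and the `Node` structure are all made up for illustration; real engines build the hierarchy automatically, and nothing here reflects how the RT cores are actually wired.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                      # minimum corner of the bounding box
    hi: tuple                      # maximum corner of the bounding box
    children: list = field(default_factory=list)
    name: str = ""                 # label for leaf objects


def ray_hits_box(origin, direction, lo, hi):
    """Slab test: intersect the ray with each axis-aligned pair of planes."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:
                return False       # parallel to this slab and outside it
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(node, origin, direction):
    """Return names of leaf objects whose boxes the ray hits."""
    if not ray_hits_box(origin, direction, node.lo, node.hi):
        return []                  # the entire subtree is skipped
    if not node.children:
        return [node.name]         # candidate for detailed triangle tests
    hits = []
    for child in node.children:
        hits.extend(traverse(child, origin, direction))
    return hits


left = Node((0, 0, 0), (4, 10, 10), [
    Node((1, 1, 1), (2, 2, 2), name="crate"),
    Node((3, 1, 1), (4, 2, 2), name="barrel"),
])
right = Node((6, 0, 0), (10, 10, 10), [
    Node((7, 7, 7), (8, 8, 8), name="lamp"),
    Node((8, 1, 1), (9, 2, 2), name="table"),
])
scene = Node((0, 0, 0), (10, 10, 10), [left, right])

print(traverse(scene, (1.5, 1.5, -5.0), (0.0, 0.0, 1.0)))
```

In this toy scene the ray never touches the right-hand box, so the lamp and table are never examined. With only four objects the savings are negligible, but with millions of triangles, discarding whole branches at once is where the big speedups come from.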

The final major architectural feature in Turing is the inclusion of Tensor cores. These are normally used for machine learning, so you might wonder why they’re even useful for gaming. There’s future potential for games to make use of such cores to enhance AI in games, but that seems unlikely, especially when for the next five or more years a large installed base of gamers won’t have Tensor cores available. In the more immediate future, these cores can be used in more practical ways.

Nvidia showed some examples of improved image upscaling quality, where machine learning that has been trained on millions of images can generate a better result with less blockiness and other artifacts. Imagine rendering a game at 1080p with a high framerate, but then using the Tensor cores to upscale that to a pseudo-4k without the massive hit to performance we currently incur. It wouldn’t necessarily be perfect, but suddenly the thought of 4k displays running at 144Hz with ‘native’ 4k content isn’t so far-fetched.

Nvidia also discussed a new DLSS (deep learning super sampling) algorithm that provides a better anti-aliasing experience than TAA (temporal AA). It’s not clear whether the Infiltrator demo is using DLSS, the Tensor cores, or what, but Nvidia says it runs at “78 fps” on an RTX 2080 Ti, compared to just “30-something” fps on a GTX 1080 Ti, both at 4k.

Turing will be manufactured using TSMC 12nm

One piece of news that wasn’t surprising at all is that Turing GPUs will be manufactured using TSMC’s 12nm FinFET process. Later Turing models could potentially be manufactured by Samsung, as was the case with the GTX 1050/1050 Ti and GT 1030 Pascal parts, but the first round of Turing GPUs will come from TSMC.

What does the move to 12nm from 16nm mean in practice? Various sources indicate TSMC’s 12nm is more of a refinement and tweak to the existing 16nm rather than a true reduction in feature sizes. In that sense, 12nm is more of a marketing term than a true die shrink, but optimizations to the process technology over the past two years should help improve clockspeeds, chip density, and power use—the holy trinity of faster, smaller, and cooler running chips. TSMC’s 12nm FinFET process is also mature at this point, with good yields, which allows Nvidia to create a very large GPU design.

The top TU102 Turing design will have 18.6 billion transistors and measures 754mm2. (Note that TU102 is what some places are calling it—Nvidia hasn’t officially named the chips as far as I’m aware. “A rose by any other name” and all that….) That’s a huge chip, far larger than the GP102 used in the GTX 1080 Ti (471mm2 and 11.8 billion transistors). It’s nearly as large as the GV100 used in the Tesla V100 and Titan V (815mm2), which is basically as large as Nvidia can go with TSMC’s current production line.

The TU102 supports a maximum of 4,608 CUDA cores, 576 Tensor cores, and 10 Giga Rays/sec spread across 36 streaming multiprocessors (SMs), with 128 CUDA cores and 16 Tensor cores per SM. As usual Nvidia can partially disable chips to create lower tier models—or more likely, it can harvest chips that are partially defective. The RTX 2080 Ti uses 34 SMs, giving it 4,352 CUDA cores and 544 Tensor cores as far as we can tell. Nvidia hasn’t given specific details on the RT core counts, but RTX 2080 Ti is rated at the top 10 Giga Rays/s that Nvidia also uses for the Quadro RTX 6000, so it doesn’t appear to have any disabled RT cores.

The second Turing chip for now is one step down in size, but Nvidia hasn’t provided any specific figures for the TU104 yet. It has a maximum of 24 SMs, and it will be used in the RTX 2080 and RTX 2070. The 2080 disables just one SM, giving it 2,944 CUDA cores and 368 Tensor cores from what we can tell. It’s also rated at 8 Giga Rays/s, suggesting the RT cores may not be directly integrated into the SMs. The RTX 2070 meanwhile disables six SMs, for 2,304 CUDA cores and 288 Tensor cores, and 6 Giga Rays/s. Die size is likely in the 500-550mm2 range, with around 12-14 billion transistors. More importantly, TU104 will cost less to manufacture, so it can more readily go into $500 parts.
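The core counts for each card are just the number of active SMs multiplied by the per-SM figures. Here is a quick sketch using the numbers above, which are inferred rather than confirmed by Nvidia; the function name is illustrative.

```python
# Per-SM figures as inferred in this article, not confirmed by Nvidia.
CUDA_PER_SM = 128
TENSOR_PER_SM = 16


def core_counts(total_sms, disabled_sms):
    """Return (CUDA cores, Tensor cores) for a chip with some SMs disabled."""
    active = total_sms - disabled_sms
    return active * CUDA_PER_SM, active * TENSOR_PER_SM


# (name, SMs on the full chip, SMs disabled)
CONFIGS = [
    ("TU102 (full)", 36, 0),
    ("RTX 2080 Ti", 36, 2),
    ("TU104 (full)", 24, 0),
    ("RTX 2080", 24, 1),
    ("RTX 2070", 24, 6),
]

for name, total, disabled in CONFIGS:
    cuda, tensor = core_counts(total, disabled)
    print(f"{name}: {total - disabled} SMs, {cuda} CUDA cores, {tensor} Tensor cores")
```

If Nvidia’s architecture disclosure turns out to use different per-SM figures, the two constants at the top are the only things that would need to change, along with the SM counts.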

Wrapping up the Turing and GeForce RTX hardware, all the new GPUs will use GDDR6 memory, and based on the VRAM capacities Nvidia is using 8Gb chips (whereas Quadro RTX uses 16Gb chips). The TU102 has up to a 384-bit interface, and the 2080 Ti disables one 32-bit channel to end up with a 352-bit interface. The TU104 has up to a 256-bit interface. Using 14 GT/s GDDR6 for both the 2070 and 2080 means they end up with the same memory bandwidth, which likely means the 2070 has more bandwidth than it will normally use. GDDR6 officially supports speeds of 14-16 GT/s, and Micron has demonstrated 18 GT/s modules, so Nvidia is going for the lower end of the spectrum right now. We could see faster memory in the future, or on partner cards.

What is ray-tracing, and is it really that big of a deal?

That’s it for the architecture (for now, at least), but I promised to get back to those RT cores and why they’re important. Nvidia is putting a lot of money into ray-tracing with Turing, which it often refers to as the “holy grail” of computer graphics. That’s because ray-tracing can have a profound impact on the way games are rendered. It’s a big enough change that Nvidia has dumped GTX branding on the new 20-series parts (at least the 2070 and above), shifting to RTX. You could try and say that it’s just marketing, but doing anything close to real-time ray-tracing is pretty incredible, and in 10 years we may be looking back at the introduction of RTX just like we currently look back at the introduction of programmable shaders.

Explaining what ray-tracing is, how it works, and why it’s better than alternative rendering models is a huge subject. Nvidia and many others have published lengthy explanations—here’s a good starting point if you want to know more, or check out this series of seven videos on RTX and games. Fundamentally, ray-tracing requires a lot more computational work than rasterization, but the resulting images are generally far more accurate than the approximations we’re used to seeing. Ray-tracing is particularly effective at simulating lighting, including global lighting, point lights, shadows, ambient occlusion, and more. With RTX, Nvidia enables developers to come much closer to simulating accurate lighting and shadows.

Instead of explaining exactly how ray-tracing works, it’s better to look at some examples of how it’s being used in games. There are currently 11 announced games in development that use Nvidia’s RTX ray-tracing (and probably others that haven’t been announced). There are 21 games total that use some portion of the new RTX enhancements that Nvidia’s Turing architecture provides, and here are a couple of specific examples of games using ray-tracing.

This clip from Shadow of the Tomb Raider shows how RTX ray-tracing can improve the lighting model. The key elements to notice are the point lights (candles) in the foreground and the shadows those create. Adding dynamic point lights can drastically degrade performance with traditional rasterization, and the more point lights you have, the worse it gets. Developers and artists spend a lot of time currently to come up with approximations that can look quite good, but there are limits to what can be done. Ray-tracing provides a far more accurate rendition of how light interacts with the environment.

Here’s another clip showing how ray-tracing improves the lighting in Shadow of the Tomb Raider, this time with two cone lights and two rectangular area lights. Everything looks good in the traditional mode, with shadows altering based on the lights, but the way those shadows blend doesn’t properly reflect the real world. The RTX lighting in contrast uses physically based modeling of the environment, and it shows the green and red spotlights blending together, haziness around the edges of shadows, and more.

Another ray-tracing example showcasing global illumination is Metro Exodus. Here the traditional model lights up the whole room a lot more, while the ‘correct’ ray-traced lighting has deep shadows in the corners, bright areas lit by direct lighting, and indirect lighting helping to make some areas still clearly visible while others are not. The opportunities this gives to artists and level designers are interesting, though I have to note that ‘realistic’ shadows aren’t always more fun.

I got a chance to play the Metro Exodus demo, which allowed me to dynamically switch between RTX on/off. Walking around some dilapidated buildings, I found the rooms much darker with RTX lighting. That can create a sense of dread, but it also makes it more difficult to spot objects and figure out where you’re going and what you can do. Regardless, the look and feel of the Metro world was excellent, and the RTX lighting makes for a very different experience. This isn’t just some fluffy tweak to graphics to provide slightly different shadows; RTX lighting clearly alters the environment and affects gameplay.

There is a second downside, however: RTX has higher performance requirements. All the games being shown are in alpha or beta states, so much could change, but it’s clear that enabling all the fancy RTX effects causes a performance impact. I saw periodic stutters in Shadow of the Tomb Raider, Metro Exodus, and Battlefield V, the three biggest names right now for RTX. The visual difference can be impressive, but if performance is cut in half compared to traditional rendering techniques, a lot of gamers are likely to end up disabling the effects. There’s work to be done, and hopefully that work comes more in the form of software updates to improve performance without sacrificing quality, rather than needing to wait for a couple more generations of hardware before this stuff becomes practical.

Nvidia’s RTX is the shape of the future

If you’ve followed the graphics industry at all, it’s always been clear that the goal was to get to real-time ray-tracing, or at least use some elements of ray-tracing in a real-time graphics engine. Our graphics chips have come a long way in the past 30 years, including milestones like the 3dfx Voodoo as the first mainstream consumer card that could do high performance 3D graphics, the GeForce 256 as the first GPU with acceleration of the transform and lighting process, and AMD’s Radeon 9700 Pro as the first fully programmable DirectX 9 GPU. Nvidia’s Turing architecture looks to be as big of a change relative to its predecessors as any of those products.

Like all change, this isn’t necessarily going to be a nice and clean break with the old and the beginning of something new. As cool as real-time ray-tracing might be, it requires new hardware. It’s the proverbial chicken and egg problem, where the software won’t support a new feature without the hardware, but building hardware to accelerate something that isn’t currently used is a big investment. Nvidia has made that investment with RTX and Turing, and only time will tell if it pays off.

Unfortunately, for the next five years at least, we’re going to have a messy situation where most gamers don’t have a card that can do RTX—or even the generic DirectX RT from Microsoft. I’m going to be talking with some developers that are using RTX for ray-tracing to find out just how difficult it is to add support to a game. Hopefully it’s not too hard, because most developers will need to continue supporting legacy products and rasterization technologies.

Even long-term, RTX extensions may not win out: it’s proprietary Nvidia technology, so AMD is completely locked out right now. Ideally, standards will develop, just like they did with Direct3D, and eventually games can support a single API that will do ray-tracing on whatever GPU or processor is in a system. We’re in a pretty good spot with DirectX 11/12 these days, so maybe DirectX RT will become that standard. But regardless of how we get there, real-time ray-tracing or some variant of it is set to be the next big thing in PC gaming. Now we just need to wait for the consoles and software to catch up to the hardware.

But how does the hardware actually perform? Stay tuned for our full review of the GeForce RTX 2080 Ti and RTX 2080, on or around September 20.
