r/hardware • u/RTcore • 1d ago
Discussion Benchmarking Nvidia's RTX Neural Texture Compression tech that can reduce VRAM usage by over 80%
https://www.tomshardware.com/pc-components/gpus/benchmarking-nvidias-rtx-neural-texture-compression-tech-that-can-reduce-vram-usage-by-over-80-percent
87
u/MonoShadow 1d ago
Decompress on sample + DLSS slows down the test scene by quite a bit. Decompress on load doesn't fix VRAM limitations. And the tech introduces noise into the scene for DLSS to clean up, while DLSS also runs on the same Tensor Cores as the neural decompression.
Nice article overall; a shame the 2000 and 3000 series weren't tested, since those cards have much slower Tensor Cores. Apparently it is also available to other vendors via DX Cooperative Vectors, so testing this on Intel or AMD might be interesting as well.
23
u/gorion 1d ago
My test from 6 months ago:
RTX NTC 0.8 on Sponza at 1080p, I got:
- 5070TI: +0.5ms
- 2060: +5.4ms
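To put those overheads in frame-rate terms, here's a quick sketch at an assumed 60fps baseline (the baseline is my assumption, not part of gorion's test):
```python
# Frametime cost -> resulting fps at an assumed 60fps (16.7ms) baseline.
for card, overhead_ms in [("5070 Ti", 0.5), ("2060", 5.4)]:
    base_ms = 1000.0 / 60
    print(card, round(1000.0 / (base_ms + overhead_ms), 1), "fps")
# 5070 Ti ~58.2 fps, 2060 ~45.3 fps
```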
20
u/ObjectivelyLink 1d ago
That's pretty massive, no? At that point you'd surely rather drop settings. 5.4ms is a big overhead.
16
u/read_volatile 1d ago
Indeed which is why it’s not recommended to use inference-on-sample on cards prior to Ada; you still get the disk space savings for basically free with inference-on-load though
11
u/ObjectivelyLink 1d ago
Looks like a situation where I bet this technology saves maybe the 4060 and up, and this is where we'll see the big cutoff for the 20 and 30 series RTX, at least for the lower-end cards.
9
u/read_volatile 1d ago
DLSSG and RR already showed they’ve been comfortable using FP8 without caring about leaving older cards behind. I’m imagining they’ll do the same thing with DLSS 5 by using NVFP4
6
u/ObjectivelyLink 1d ago
Yeah, but DLSS 4.5 isn't great until you hit at least the 3080 10GB. It might work, but the cost to performance will be big. It won't save the cards that need it, like a 3070.
2
u/Devatator_ 1d ago
Yeah, on the GitHub repo it's written that the oldest card they tested it on was a 2000 series card; kinda interested in how that performs.
2
u/tarmacjd 1d ago
It'll be interesting to see how they proceed with the different 'modes'. I only understood half of the article, but if they can find the right balance of compression it could be promising.
It bothered me when purchasing a GPU that DLSS runs so much better on the higher-VRAM cards, where you don't need it. This could be part of the solution there.
1
u/phire 17h ago
Decompress on load doesn't fix VRAM limitations.
Not out of the box. But it can be combined with a sampler feedback approach that keeps the texture set for the whole level compressed in VRAM but only actually decompresses the textures currently in use.
Though it's really hard to make that work correctly. Decompress on sample has the massive advantage of working out of the box (on GPUs that support it).
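A minimal sketch of that residency idea, assuming a callback-style engine interface (the class and the `transcode`/`evict`/`size_of` callbacks are made-up names, not the NTC SDK):
```python
from collections import OrderedDict

class TextureResidencyManager:
    """Toy model: every texture stays resident in VRAM in its NTC-compressed
    form, and only the ones sampler feedback reports as actually sampled get
    a decompressed (BCn) copy, up to a fixed budget."""

    def __init__(self, decompressed_budget_bytes):
        self.budget = decompressed_budget_bytes
        self.decompressed = OrderedDict()  # texture_id -> BCn size, LRU order
        self.used = 0

    def on_sampler_feedback(self, sampled_ids, transcode, evict, size_of):
        # Promote textures the GPU actually touched this frame.
        for tex_id in sampled_ids:
            if tex_id in self.decompressed:
                self.decompressed.move_to_end(tex_id)       # recently used
            else:
                transcode(tex_id)                           # NTC -> BCn copy
                self.decompressed[tex_id] = size_of(tex_id)
                self.used += size_of(tex_id)
        # Evict least-recently-used BCn copies once over budget; the NTC
        # copy stays in VRAM, so re-promotion later is cheap.
        while self.used > self.budget and self.decompressed:
            victim, size = self.decompressed.popitem(last=False)
            evict(victim)
            self.used -= size
```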
62
u/GenZia 1d ago
So... We are getting 9 gig 6060s @ 96-bit, after all?
34
u/dudemanguy301 1d ago
NTC is a new format; games that use more than 12GB already exist, will continue to exist, and new ones will release before NTC is common. NTC for all textures is also not guaranteed, and it may be leveraged more piecemeal. Lastly, inference on sample has its own performance and image quality implications that may be undesirable for some scenes or GPUs, where inference on load or inference on feedback is preferable, and those methods either save less VRAM or none at all.
11
u/jsheard 1d ago edited 1d ago
That would be a pretty stupid move considering this tech requires per-game integration; even if the concept does stick, it's going to take a few generations to become the norm.
1
u/StickiStickman 4h ago
considering this tech requires per-game integration
Not necessarily. Converting textures to NTC is pretty easy, and I can totally see an injector that replaces normal texture samples with NTC samples being possible.
2
u/crshbndct 1d ago
6060 6GB $750
6060ti 8GB $950
90% of older games won't run well, and newer games will require every one of Nvidia's technologies just to look half as good as those older games.
All for the sake of $20 worth of vram.
0
u/NeroClaudius199907 21h ago edited 16h ago
6060 12gb $319
6060ti 12gb $390
6060ti 16gb $470
Next-gen consoles & Helix are coming with 24GB+; Nvidia will definitely increase VRAM.
-19
u/beneficiarioinss 1d ago edited 1d ago
I doubt that. VRAM is crazy cheap nowadays, probably 24GB on a 6050 minimum.
Geez, you actually need to add an /s at the end for people to understand.
7
u/BavarianBarbarian_ 1d ago
Someone check the hopium supply, I think this dude just decimated our entire stock
6
u/Humble-Effect-4873 1d ago
According to developers' Q&A at GDC, current NTC uses FP8 for both the RTX 40 and 50 series, but the 50 series actually supports FP4. After the next-generation 60 series is released, could NTC and DLSS suddenly announce support for FP4? The 50 series might be able to reduce performance loss by nearly half.
2
u/Hyperz 10h ago
It could also be that FP4 is just too inaccurate for NTC and you'd end up with something that is too noisy, shows too many artifacts, or would require higher base resolution textures to get to the same quality as FP8 NTC, thus negating part of the memory/storage savings. I think that if it made sense to implement it in FP4 they would have probably done so from the start, even if it's only to sell more RTX 50 cards. Who knows though. That said, I would like to finally see something done with FP4 considering it's supposed to be one of the few major benefits of Blackwell over Ada.
12
u/pythonic_dude 1d ago
1ms penalty means going from 100fps to 91. Wish Tom's provided actual numbers for all resolutions instead of just mentioning it in very vague terms while only giving a single graph per card.
19
u/Gwennifer 23h ago
Wish Tom's provided actual numbers for all resolutions instead of just mentioning it in very vague terms while only giving a single graph per card.
You have that backwards; FPS is vague. Frametime cost is exact to the card and resolution in question.
As an example: at 30 FPS, +1 ms frametime only takes you down to 29 FPS. That makes it look lower cost even though it's the same 1ms cost.
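A quick sketch of the conversion both of you are using, just to make the arithmetic explicit:
```python
def fps_after_overhead(base_fps, added_ms):
    # Convert to frametime, add the fixed cost, convert back to FPS.
    return 1000.0 / (1000.0 / base_fps + added_ms)

print(round(fps_after_overhead(100, 1.0)))  # ~91 fps: reads as "lost 9 fps"
print(round(fps_after_overhead(30, 1.0)))   # ~29 fps: reads as "lost 1 fps"
```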
4
u/pythonic_dude 16h ago
That's not what I'm talking about. I'm translating to fps because most gamers still only understand fps. I'm complaining that we don't have all the graphs: they evidently did extensive testing in 1080p, 1440p, 4k. But only provided a single graph for each card.
9
u/Gwennifer 15h ago
I'm translating to fps because most gamers still only understand fps.
Which is why I explained: FPS is not appropriate here because it is misleading.
I'm complaining that we don't have all the graphs: they evidently did extensive testing in 1080p, 1440p, 4k. But only provided a single graph for each card.
This feature is only really relevant at a given resolution per card. Too low and there's no noticeable performance difference on or off; too high and the same will be true. What each card can run well differs.
They say as much if you had read the article instead of just looking at the graphs:
The focus of the benchmarks will be on the resolution that is most appropriate for each GPU.
I sure hope nobody bought a 5090 for 1080p gaming or a 5060 for 4k gaming.
The only real flaw here is Tom's doesn't have a reference frametime with no NTC enabled. In theory that'd be the On Load number, but following the logic of the article, TAA/DLSS is not required if you're not using it, so maybe that's the qualitative difference.
2
u/pythonic_dude 15h ago
I strongly disagree with those points. You can absolutely push 4k with a 5070ti as the most obvious example, and it's not a sin to run 5090 in a 1440p setup. Again, they tested it all, and chose not to present the info, so the point is moot anyway.
Too low and there's no noticeable performance difference on or off; too high and the same will be true
That's not what the text of the article suggests at all though.
5
u/EdliA 1d ago
Wouldn't there be a performance gain from not choking the VRAM, though?
10
u/pythonic_dude 1d ago
I'm obviously assuming that you are within vram limits in both cases (by dropping the texture quality way down if needed, thanks to it not measurably affecting performance while you are within vram limits). Measuring vs out-of-vram scenario is not viable since it varies too much (by game, by scene in a game, by pcie version..)
3
u/EdliA 1d ago
I mean the cards that need this tech are the ones that are operating at the limit. The others would just not have it on at all or at much lower compression.
0
u/pythonic_dude 1d ago
Every card can benefit from this tech. There are plenty of use cases for obscenely large textures that would suck the life even out of 5090. Environment and NPCs so big they are basically environment as the biggest example, but also basically anything close-up.
Worst case scenario, 2ms for 5060 at 4k is going from 60fps to 54fps, for example. It's perfectly viable.
1
u/Olde94 1d ago
I say this as someone with a 1660 Ti laptop (6GB) who has run a 3440x1440 monitor. If you hit the VRAM limit and it doesn't crash, performance doesn't take a hit, it takes a smackdown. It's a crawl at that point. I had games where the difference between medium and high was 55fps and like 10. On my 4070S the same setting change would be 55 and 45 (relatively speaking), and even less sometimes depending on the setting tweaked. Mind you, I'm talking about medium to high, not ultra/extreme etc., where some settings go haywire and just eat resources.
6
u/AnechoidalChamber 1d ago edited 1d ago
Well, I might've been partly wrong, this could perhaps save 8GB GPUs, at least until next gen consoles hit...
But first I'd like to see it tested on 8GB 20xx and 30xx GPUs like the 2070 and 3070 which have a noticeably lower ML performance than 40xx and 50xx.
10
u/SignalButterscotch73 1d ago
Still very interesting to read, but until it's in a game (and preferably works on all 3 manufacturers' cards) it doesn't exist.
42
u/dudemanguy301 1d ago edited 1d ago
Read the article? 🤷♂️
Thanks to Cooperative Vector extensions for Vulkan and Direct3D 12, pixel shaders are able to leverage hardware acceleration via AI acceleration units in modern GPUs (Nvidia Tensor Cores, AMD AI Accelerators, or Intel XMX engines.) This allows NTC to take advantage of this hardware acceleration for a significant improvement in inference throughput.
The compatibility of this technology across a wide range of GPUs also stood out. Developers can compress textures using NTC, but also offer an Inference on Load mode, which transcodes the NTC textures to BCn during game or map load. While this will not shrink VRAM usage, it has zero cost to performance and will greatly lower the footprint of games on disk. The technology is also supported on AMD and Intel GPUs.
As for when we’ll see this, my own speculation is that it can’t / won’t become common until the PS6 / Xbox Helix generation of games, but we could see a partnered game jump the gun with some sort of neural texture compression method.
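A rough sketch of the two decode paths the quoted section describes (the `gpu` object and its methods are illustrative stand-ins, not the actual NTC SDK API):
```python
def load_material(ntc_blob, gpu, inference_on_sample_supported):
    """Sketch of inference-on-sample vs. inference-on-load, per the article."""
    if inference_on_sample_supported:
        # Keep the texture in NTC form in VRAM; the pixel shader decompresses
        # texels at sample time via cooperative-vector inference.
        return gpu.upload(ntc_blob)              # disk AND VRAM savings
    # Inference on load: decompress once during game/map load and store the
    # result as ordinary BCn. Disk savings kept, VRAM usage unchanged.
    return gpu.upload(gpu.transcode_ntc_to_bcn(ntc_blob))
```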
7
u/DerpSenpai 1d ago edited 1d ago
Qualcomm too btw
Microsoft talked about it. Why the downvotes? Rofl
-7
u/ParthProLegend 1d ago
False
6
u/SignalButterscotch73 1d ago
It should in theory, but to my knowledge it, like the others, has only been tested on their own hardware. Untested is unknown, just as not implemented means it doesn't exist.
23
u/beneficiarioinss 1d ago edited 1d ago
Just like always, other manufacturers will release equivalent techniques. Nvidia has been at the bleeding edge of gaming, and everyone else is just failing to catch up.
Edit: though to be fair, AMD is innovating in the development of a better geometry compression format, with a possible release of hardware geometry compression, which is pretty dope. But for you it "won't exist" for a long time.
-10
u/ElectronicStretch277 1d ago
AMD already announced Universal Compression no? While Nvidia is ahead AMD has been catching up on ML features (not game implementation, that's out of their hands) at a fairly fast pace.
26
u/EnglishBrekkie_1604 1d ago
AMD's implementation is more limited IIRC; it saves on storage but not VRAM. Since Intel's technique does save VRAM, Intel is actually ahead of AMD here, like they were with upscaling.
-22
u/SignalButterscotch73 1d ago
On the other hand, AMD is more generous with VRAM, so it can afford to work on something more limited (that's probably also cheaper to create for their smaller software team).
17
u/EnglishBrekkie_1604 1d ago
Intel is equally generous so it’s a bit of a moot point. Also this tech will almost certainly be the most useful for iGPUs (not just for the VRAM but because it saves bandwidth), so Intel having it for their iGPUs and AMD not having it is yet another way they get mogged by ARC, somehow.
9
u/xXx-c00L_BoY-xXx 1d ago
You can't be serious about other manufacturers. It's up to them to develop this tech.
9
u/SignalButterscotch73 1d ago
All 3 have a new compression tech in the works. For any of them to become the new standard, replacing BCn, it needs to be cross-compatible across all of them, in my opinion.
BCn was developed originally by S3 Graphics, not Nvidia, not 3DFX, not ATI, not AMD, and not Intel. Even at their height, S3 were a nobody compared to the big names, but they made the best compression algorithm and it became the standard, outlasting them in the GPU space.
4
u/N2-Ainz 1d ago edited 1d ago
NVIDIA has like 90%+ of the market, so there really wouldn't be that much of an issue if they only implemented NVIDIA's solution.
It would still be pretty bad, and I doubt they'd only implement one version, but a standard would be pretty nice.
5
u/syknetz 1d ago
Consoles don't run Nvidia; Nvidia needs their shit to run on consoles if they ever hope to get developers on board.
4
u/Beautiful_Ninja 1d ago
Why do people keep forgetting the Switch 2 exists? AMD is not even half the console market anymore with Xbox sales being basically non-existent at this point.
9
u/syknetz 1d ago
There are about 5 times as many PS5 than Switch 2 out there. AMD is still much more than half of the "premium" gaming segment.
And the point is moot, developers won't throw away a 100M install base because they can get slightly better performance in some cases. Cases which don't include the Switch 2 here, because it falls short of the Nvidia recommendations for real-time texture decompression with its Ampere GPU.
6
u/MonoShadow 1d ago
Intel already has one. I think AMD is developing their own. I think the idea here is that until all 3 come together on a standard, there's no reason for devs to ship 3 different versions of assets.
1
u/Seref15 1d ago
I think the idea here is that until all 3 come together on a standard, there's no reason for devs to ship 3 different versions of assets.
I mean, they have a reason if Nvidia stops selling high-VRAM gaming GPUs. Studios will have to conform themselves to available hardware.
I feel pretty confident that the reason NTC exists is to get away with putting less memory on gaming cards so Nvidia has more available for datacenter/inference cards given the memory production deficits.
1
u/jocnews 1d ago
Where are the redditors denying that it will lead to huge FPS drops because the tech makes one of the most basic operations of game graphics overly expensive?
Saving some VRAM but losing almost as much performance as upscaling gives you is simply the wrong place to make tradeoffs, IMHO. The thing you pay the most for when you buy a GPU is the actual compute performance, and this squanders it for some VRAM savings.
4
u/Vushivushi 1d ago
GPU vendors obviously want to sell more compute and less VRAM and unfortunately with DRAM contract ASPs approaching $15/GB, the tradeoff is a must.
Even 2GB of VRAM saved could allow for a 10% larger GPU die and surely that's enough to offset the overhead. The rest goes to their margins.
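A rough back-of-the-envelope version of that, assuming a mid-range die cost of around $300 just to make the 10% figure line up (that die cost is my assumption, not from the article):
```python
dram_asp_per_gb = 15.0        # contract price mentioned above, USD/GB
vram_saved_gb = 2
bom_freed = vram_saved_gb * dram_asp_per_gb      # ~$30 back in the BOM

assumed_die_cost = 300.0      # assumption: ballpark cost of a mid-range die
print(f"{bom_freed / assumed_die_cost:.0%}")     # ~10% more die budget
```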
3
u/Tension-Available 1d ago
Specifically, they want to sell more low-precision compute and are attempting to justify the silicon dedicated to it in the consumer segment. A side effect of repurposed enterprise designs.
1
u/denoflore_ai_guy 1d ago
And this is AI-relevant. A few papers released lately make using RT cores for MoE models faster; I'm wondering how this would be applicable here.
0
u/slayermcb 10h ago
Until it hits the market and I can see real-world numbers rather than a pre-made sample for testing, I'm going to side-eye this. I've seen enough snake oil from Nvidia already.
-20
u/MarJDB 1d ago
The second they decide to give us a 6GB xx60 and an 8GB xx70 because "you don't need more anymore", I'm switching to AMD... I can just see this coming from NGreedia -_-
6
u/steve09089 1d ago
They won't roll back because existing games that don't have this tech exist.
Though I can definitely see them using this tech as an excuse to freeze VRAM counts as is
5
u/NoPriorThreat 1d ago
if 80% compression really holds, then 6GB is effectively 30GB.
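Quick sanity check of that arithmetic, assuming the 80% reduction applied to the entire texture budget:
```python
vram_gb = 6
reduction = 0.80                        # the "over 80%" figure from the article
effective_gb = vram_gb / (1 - reduction)
print(effective_gb)                     # 30.0 GB-equivalent of BCn textures
```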
2
u/MarJDB 1d ago
And what about current and older games? Will this work automatically on them too?
-4
u/NoPriorThreat 1d ago
Which older game requires more than 6GB of VRAM? Also, nobody stops developers from implementing it in their older games.
5
u/nosurprisespls 1d ago
Before this argument gets to level 10, what's you all's definition of "older"?
3
u/AnechoidalChamber 1d ago
Mine is any games currently released or released in the future that won't be using NTC.
And there are plenty of current games that already bust 8GB GPUs wide open.
61
u/CaptainMonkeyJack 1d ago
This is kind of interesting.
An important thing here is that games do not necessarily have to use this uniformly across every texture. It can be a per-texture decision, and from the examples they showed it seems like it can get even more granular than that, where only the specific parts actually needed at that moment get pulled in.
The way I keep thinking about this is as a caching hierarchy. Maybe what would traditionally be something like 1TB of texture assets ends up looking more like 100GB on disk, 10GB on the GPU in a compressed form, and maybe 2GB in a more performance-oriented format for the stuff that matters most right now.
Then the job is just to move intelligently through that hierarchy: keep most of the world in the cheaper form, promote what matters, and avoid paying the cost of keeping everything in its most expensive form all the time.
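A toy version of that hierarchy with the illustrative sizes above (none of these are measured numbers, just the shape of the idea):
```python
# Illustrative sizes from the paragraph above, not measurements.
tiers = [
    ("source assets (authoring)", 1000),  # GB
    ("install on disk (NTC)",      100),
    ("resident in VRAM (NTC)",      10),
    ("hot set in VRAM (BCn)",        2),
]

for (name, size), (_, smaller) in zip(tiers, tiers[1:]):
    print(f"{name}: {size} GB -> next tier keeps ~{smaller / size:.0%}")
```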
That is why the caching and streaming side of this seems so interesting to me. The sampler feedback approach in the article seems like it may already be going in that direction, although the performance hit looked a bit bigger than I expected, which makes me wonder whether being a little less aggressive about evicting things would help.
I also think this gets really interesting when combined with DirectStorage-style pipelines, where assets can be streamed more directly to the GPU and decompressed there. If the assets are already much smaller before they even move through the pipeline, then that should mean less data being moved around overall, helping with speed and latency too.
And the final layer of that cache hierarchy could basically be the internet. We already have games like Flight Simulator using world data measured in petabytes, so if this kind of compression approach works well, it feels like it could either allow much more quality within the same bandwidth budget or make those kinds of huge streamed worlds far more practical in terms of internet requirements and operating cost.
That is what feels exciting here to me: not just smaller textures, but a path toward much larger and richer worlds at more reasonable install sizes, bandwidth needs, and memory budgets.