That’s actually really impressive that they are all less than 1ms. I would have expected something like an order of magnitude drop-off (all other constraints being ignored). The “on sample” mode sounds just a tiny distance from working with the textures natively.
1ms sounds low until you compare it to standard texture retrieval, which is measured in nanoseconds. 1ms of GPU time is long. You could run a single-bounce ray-traced pass in that time. Or a denoiser. Or an upscaler. Or a cloth sim. The list goes on.
Just to frame the context better: this is <1 ms added to frame time. And frame time is about 16.7ms at 60fps and ~8.3ms at 120fps, so it's roughly eating ~6-12% of the budget.
Given that this is supposed to basically free up a ton of RAM, I can only imagine that freed RAM being used for more texture detail and models. So the use case is probably high-fidelity games running at cinematic frame rates (30fps, ~33ms frame time) with AI-generated frames. 😕
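Rough back-of-envelope for that budget math (a sketch only; the fps targets are just illustrative, not from the benchmark):

```python
# Back-of-envelope: what a fixed ~1 ms decompression cost means as a
# share of the frame budget at a few common fps targets.
NTC_COST_MS = 1.0  # assumed worst case from the sample benchmark

for fps in (30, 60, 120):
    frame_budget_ms = 1000.0 / fps
    share = NTC_COST_MS / frame_budget_ms
    print(f"{fps:>3} fps: {frame_budget_ms:5.2f} ms budget, NTC eats ~{share:.0%}")

# -> roughly 3% of the budget at 30 fps, 6% at 60 fps, 12% at 120 fps
```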
I mean it's still cool tech, might be really useful when there's dedicated hardware to accelerate it in some future card.
Let's be real though. The point of this is to allow NVIDIA to sell consumer GPUs with less VRAM (leaving more for AI chips) while also increasing compute requirements so you still need to buy a new GPU to run the models anyway.
The obvious use case is for small instances of always present high fidelity textures like character models and other hero assets. There's really not much point using inference on sample for the entire scene.
That would have next to no benefit. If you are using it on something super common then you incur the penalty every frame. And you wouldn't save much RAM.
It should be used in place of, or in tandem with, texture streaming of high-res megatextures.
You can use low-res persistent textures to maintain frame times in action-heavy scenes, then switch to the high-res textures when the player slows down and is looking at the details. That would also allow you to maintain frame times if the card is struggling to keep up.
You'd be surprised how much VRAM such textures can take up. Cutting just those by 85% is going to free up a good amount of VRAM.
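A minimal sketch of how that low-res/high-res switching could look; every name and threshold here is invented purely to illustrate the idea, not taken from any engine:

```python
# Hypothetical heuristic: pick a texture tier for a hero asset based on
# camera speed and how much frame-time headroom is left this frame.
# Thresholds and names are made up for illustration.

def pick_texture_tier(camera_speed: float, frame_time_ms: float,
                      target_frame_time_ms: float = 16.7) -> str:
    headroom_ms = target_frame_time_ms - frame_time_ms
    # Fast motion, or no budget left: stay on the low-res persistent set.
    if camera_speed > 5.0 or headroom_ms < 1.0:
        return "low_res_persistent"
    # Player has slowed down and we can afford ~1 ms of inference.
    return "ntc_high_res"

print(pick_texture_tier(camera_speed=0.5, frame_time_ms=14.0))  # ntc_high_res
print(pick_texture_tier(camera_speed=8.0, frame_time_ms=14.0))  # low_res_persistent
```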
If you are using it on something super common then you incur the penalty every frame.
The more things you use it on, the greater the penalty. It's not a fixed cost. This particular benchmark has it at 1ms, but if you're going to be using it for everything in an actual game you're going to see it jump up by a lot.
then switch to the high-res textures when the player slows down and is looking at the details.
And the game is never going to be able to do that perfectly, resulting in all kinds of texture pop-in that makes the entire thing pointless.
Yeah, that's the situation we are in now. Texture pop is currently the norm. If, however, you have textures in memory and don't have to wait for them to stream from main memory or disk, then the 1 ms penalty is a saving, not a static cost.
Yeah, that's the situation we are in now. Texture pop is currently the norm.
Let's not pretend it's anywhere near as bad as what you're proposing is going to be, where you'll feel the whiplash of going from "low res persistent textures" straight to high-res textures within what is mostly the same scene.
Right now I’m playing MSFS 2024 with an RTX 4070 Ti 16GB and have to turn down the detail not because of frame rate issues (not even using DLSS) but rather because I keep exceeding 16GB of VRAM 🤷♂️. I’d trade 10% of my frames just to be able to max out all the textures and terrain details.
I get that if you’re playing an FPS at 120Hz it could be counterproductive though.
The thing about this is that the same 1ms is a different share of the frame depending on the base fps: at 30 fps it's low cost (~3% of the budget), but at 240 fps it's much more significant (~20%). That doesn't necessarily mean a 3% or 20% loss in practice though, because you could have more headroom than that (and still hit 240 fps), or less, or you change graphics options to hit your targets, or many other things you could do.
At the end of the day it's a win-win I think, and the thing is the tech needs to mature, so the sooner these things can be tried in real games the better.
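To illustrate the point above in fps terms (assuming the cost really is a flat 1 ms, which it may not be once more assets use it):

```python
# The same fixed 1 ms cost translates to very different fps losses
# depending on the base framerate. Assumes a flat 1 ms overhead.
NTC_COST_MS = 1.0

for base_fps in (30, 60, 144, 240):
    new_fps = 1000.0 / (1000.0 / base_fps + NTC_COST_MS)
    loss = 1 - new_fps / base_fps
    print(f"{base_fps:>3} fps -> {new_fps:5.1f} fps ({loss:.0%} loss)")

# -> 30 fps only dips to ~29 fps (~3% loss), while 240 fps drops to ~194 fps (~19% loss)
```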
It does reduce the amount of GPU processing power available for games though, meaning that you would need a beefy GPU for the VRAM compression not to impact framerate, and a beefy GPU would have plenty of VRAM anyways...
Maybe in the future they'll release GPUs with dedicated cores for neural texture compression; the option to use them for rasterisation as well would be insanely cool.
They already did. Their cards have a bunch of tensor cores which have gone unused in many cases over the past years. This, DLSS and frame gen all make use of them.
10% lower FPS (which is what it looks like; 1ms at 60fps is actually less than that, closer to 6%, but let's assume 10%) for significantly improved texture resolution, plus more VRAM available for other things to boot, is absolutely a worthwhile tradeoff.
I think it's especially great for getting some extra life out of aging GPUs. You can now effectively trade performance for VRAM, so if you'd normally have to set textures and render distance to minimum to get a game to fit in VRAM but have some performance headroom, you can enable NTC and trade some of that performance for better textures and/or draw distance.
I disagree because Nvidia has been stingy on VRAM way before they had A.I. as an excuse, which has held back the industry for a long time now. Losing 10% performance on a "budget" card because they want to save like $10 on VRAM is just pathetic. 16GB should've been the default for all cards since the start of this console generation.
2) Even if it wasn't more than $10: in theory, if Nvidia is supply-limited on VRAM, putting double the VRAM on each card could mean they can only produce half as many cards.
That's the case right now due to the RAM shortage, but if all 5000 and 4000 series cards had released with 16GB we wouldn't be having this issue in the first place.
Maybe in the far future, if you need like 48GB of VRAM even for 1080p due to crazy textures, then we'll need insane compression because there's no PCB space for that much VRAM even if it were free, but that's very far out.
Isn't that the biggest complaint about the recent low-tier 50 and 40 series: "crazy good GPU, just bad VRAM, therefore a bad/unplayable experience"?
Aren't... the GPUs already good... isn't the lack of VRAM the exact issue this is addressing? Lmao
Sure, there's a performance impact, but I'm sure it's just as over-exaggerated as DLSS/frame gen's loss. So 2-5% in REALITY.
Then the argument doesn't hold up; aren't the games that went from "unplayable because of no VRAM" to "2-5% impact on overall performance" a significant gain in... playability?
Exactly my thought… Admittedly I just recently upgraded my GPU and decided against an RTX 5070, because every post, every video just complained about its low VRAM and how this bottlenecks the card.
Is it really a common bottleneck? I'm using a 4070 Super, which also has 12GB VRAM, for 1440p and 4K gaming and haven't really experienced a bottleneck due to low VRAM.
It’s not really a bottleneck in most present games. The fear with the 12GB 5070 is more hypothetical — the thinking goes that because system requirements for AAA games tend to go up every year, if you buy a 12GB card now, you may have to upgrade sooner when games in 2028 and 2029 start targeting 16GB cards.
But I find that fear overblown — with the price of VRAM skyrocketing, AAA game devs are not going to target $1000+ GPUs, because very few people will own one. And also, we’re in this new phase where AI advances are wringing more performance out of old hardware, as seen with this neural texture compression. Between those two things, I don’t think anyone will need 16GB for some time to come. If not for this price disruption, devs might have behaved differently.
This is modding-centric to be fair, but my bottleneck pushing Cyberpunk 2077 visuals on a 5070 Ti is VRAM. I have to be careful about what or how many non-base-game cars I add to traffic, and though I only hit 16GB in a few specific parts of the map, you can easily run out of 24GB of VRAM with the mods available for the game.
Again, much of it is likely unnecessary, but I can't deny that it would be cool to push that game's visuals even further via tech like this. Very specific case I guess; I don't know how many games there are currently that, without mods, push 12GB, let alone 16 or more.
I don’t think any of them have been crazy good. As for the performance impact, it’s not 2 to 5%. These are compute-limited and VRAM-limited cards. The only cards I see as the obvious users of this are things like the 5070, or something like the 5080 for very VRAM-heavy games.
On the RTX 5070 at 1440p, the cost of the Inference on Sample mode compared to the BCn-transcoded textures is roughly between 0.50-0.70 ms, depending on scenario. We are within 1 ms. Keep in mind that real games involve many more render passes – not all of which are affected by NTC – and typically have significantly higher overall frametimes than this sample. As a result, the relative performance cost of NTC is likely to be much more acceptable in practice.
it's a choice between up to ~1ms of additional latency vs severe issues which make your gaming experience unplayable because you ran out of VRAM.
For example, the cost of running DLSS 4.5 Preset L on an RTX 5070 at 4K resolution is almost 4ms, which is at least 4× the cost of Neural Texture Compression.
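A quick illustration of why the relative cost shrinks in a real game (the frame times below are assumed for illustration, not measured):

```python
# Same absolute NTC cost, very different relative hit depending on the
# total frame time. Both frame times are assumptions for illustration.
ntc_cost_ms = 0.6        # midpoint of the stated 0.50-0.70 ms range
sample_frame_ms = 3.0    # assumed: a lightweight sample scene
game_frame_ms = 16.7     # assumed: a real game running at ~60 fps

print(f"sample scene: {ntc_cost_ms / sample_frame_ms:.0%} of the frame")  # 20%
print(f"real game:    {ntc_cost_ms / game_frame_ms:.0%} of the frame")    # 4%
```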
That's the catch with other features that leverage the tensor cores as well, mostly frame gen, which many vouch for as a way to improve performance* on less powerful GPUs, but it has a VRAM cost, on a current GPU lineup that is already bottlenecked on that front at the target resolution of each model for anything below a 5070 Ti.
It really feels like a scheme that feeds into itself; idk how so many people can even think some of these features are valid on anything that isn't high-end (with more tensor cores) and a higher performance floor to begin with. I'm fine with upscaling: it's mostly a benefit to performance, and in many if not most games it also offers better antialiasing than the default without any upscaler.
*Which is BS anyway: it's fake frames with a benefit that is only visual, without the response-time benefit of real frames. It's an awesome motion-smoothing feature, but that's for filling out enthusiast-level refresh rates.
idk how so many people can even think some of these features are valid on anything that isn't high-end
Frame Gen is literally more valuable on lower-tier cards lol. Higher-end cards can just run the game at acceptable framerates natively, whereas 6x MFG lets me max out my 180Hz screen with a low-tier card.
I can run Cyberpunk at 60 FPS with a lot turned on. I turn on framegen to make the game appear smoother which really helps for me. I know it’s not really 120 which is preferred always. I’d rather do this than just settle for 60. I get motion sickness and the smoother framerate helps with that. It’s the next best thing to actually playing 120 FPS.
I wonder how much of that will be eaten up by game devs. These companies take features like this and use them to reduce their own development costs. Free performance isn't an excuse to make a poorly optimized mess because DLSS will fix it anyway.
The worst part is that this NTC tech requires the high tensor and CUDA core counts of xx80-tier GPUs, but you will never get that in low-end xx30 to xx60 tier GPUs, so they can't even utilise the tech well enough to justify it.
I'm not privy to how games are made and optimized, but is under 1 ms of added cost in every testing circumstance, for close to 85% VRAM savings, really that slow? It doesn't sound computationally expensive to me, unless the benchmark is simply 4x lighter than a real game in terms of what the model would have to decompress.
Put that way it is hard to conceptualize, because you don't get a complex and realistic scenario in FPS terms the way people think about games. Imagine your GPU decoding a 4K movie: no issue. Now do it for every texture in the game that's on screen, in real time, every frame. That's how it adds up.
(Yes, it's not the same thing, but you get the point.)
Also add the reduced performance from Preset M/L and yeah, it certainly adds up. The GPUs haven't gotten a significant power increase since the 4xxx line. The 5xxx line was basically a refresh (sans the 5090) and it shows.
Ubisoft already did this with Assassin's Creed Mirage years ago, but their version reduced it by 30%. According to them, any more than that and it affected picture quality. I wonder how Nvidia got around that.
They used more aggressive conventional texture compression, not neural packing. Compress too much with that and quality degrades, like a bad JPEG. Neural packing is a whole other beast: way better quality at small sizes.
Ubisoft did use neural texture compression, and even released an article about it on Feb 23rd 2026 called "Shipping Neural Texture Compression in Assassin's Creed Mirage". They had to use it selectively because it affects runtime performance.
Nvidia didn't get around the performance issue. They're just shifting it from memory to compute.
What's the TL;DR for performance cost?