
Why is the race for efficiency so important in elite-performance GPUs?

hi_im_snowman

29 months ago

I've been wondering this for a while now... and forgive me if this is a noob question, but why couldn't GPU makers increase the GPU die size to provide massive compute-power increases and forgo some efficiency? I don't really understand the "we have to do everything in our power to shrink the lithography in order to increase compute power without decreasing efficiency" mindset when it comes to elite-performance products.

To be honest, I'd actually consider a $200 price increase over the top-tier GPU if there was a new class of video cards available: the "more-power-consuming-and-requires-water-cooling-but-yields-40%-more-compute-power GPU." The "V12 engine," if you will, of the GPU world.

Consumers can buy a 1200W PSU that could power my neighbor's Tesla, but Nvidia gave us a 1080 Ti at 250W. That's great, but where's the "GTX Nitro" at 400W with 40%+ more performance than the Ti model?

I presume my layman's understanding of the hardware prevents me from drawing deeper conclusions, so I figured I'd ask the much smarter PCPP community what y'all think/know! :)

Comments

  • 29 months ago
  • 2 points

If you increase the die size, you increase latency between IP blocks, so you lose performance. You also decrease yield, which drives up the cost to build each GPU.
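
To put rough numbers on the yield point (the defect density, wafer cost, and die areas below are made-up illustrative figures, not real foundry data), here's a quick sketch using the common Poisson yield approximation:

```python
import math

# Made-up illustrative numbers: 0.1 defects/cm^2, $6,000 per 300 mm wafer.
DEFECTS_PER_CM2 = 0.1
WAFER_COST = 6000
WAFER_DIAMETER_MM = 300

def dies_per_wafer(die_area_mm2):
    """Gross die count: wafer area / die area, minus a rough edge-loss term."""
    r = WAFER_DIAMETER_MM / 2
    return math.pi * r ** 2 / die_area_mm2 - math.pi * WAFER_DIAMETER_MM / math.sqrt(2 * die_area_mm2)

def poisson_yield(die_area_mm2):
    """Fraction of dies that come out defect-free."""
    return math.exp(-DEFECTS_PER_CM2 * die_area_mm2 / 100)  # convert mm^2 to cm^2

for area in (300, 600):  # a mid-size die vs. a big die, in mm^2
    good = dies_per_wafer(area) * poisson_yield(area)
    print(f"{area} mm^2: ~{good:.0f} good dies/wafer, ~${WAFER_COST / good:.0f} per good die")
```

With those made-up numbers, doubling the die area roughly triples the cost of each working die, which is why big dies get expensive fast.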

Historically, AMD was contractually obligated to consume a certain amount of production from GlobalFoundries, so it had a big-die mentality. That's changing, though, so what people used to expect from AMD (big die, high power consumption, good price/performance, bad performance/watt) is going to change. AMD is now motivated to reduce die size since it renegotiated its GlobalFoundries contracts.

I think your idea of a GPU requiring water cooling is interesting. It would be difficult for Nvidia and AMD to justify at this point (i.e., in 2017) because they are being slammed; they can't even produce enough GPUs to meet demand. I think they'd tell you to look at Titans or SLI.

  • 29 months ago
  • 1 point

Very interesting, this bit about AMD, I'll definitely read up on that, thanks!

  • 29 months ago
  • 1 point

Well, a larger die produces enough heat to blow up the neighbor's Tesla, and draws enough power to suck its batteries dry. Maybe a bit of an exaggeration, but not by much. Cooling a normal GPU already takes a lot of cooling hardware and energy. More compute cores can also mean lower clock speeds, since the whole chip is limited by its worst-performing core. And increasing die size (again) isn't that simple: GPU cores scale pretty well, but information transfer speed within the die does not. A bigger die with more cores increases the latency between the furthest cores and the cache, slowing calculations in the GPU.
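
Back-of-the-envelope on that cross-die latency point (the ~100 ps/mm wire delay and the 1.5 GHz clock below are assumed ballpark values, not measured figures):

```python
# Rough time for a signal to cross the die on repeated global wires,
# assuming ~100 ps/mm and a 1.5 GHz clock (both assumed ballpark values).
PS_PER_MM = 100
CYCLE_PS = 1000 / 1.5  # one clock period at 1.5 GHz, ~667 ps

for die_edge_mm in (15, 30):  # a mid-size die vs. a reticle-sized die
    delay_ps = die_edge_mm * PS_PER_MM
    print(f"{die_edge_mm} mm: ~{delay_ps} ps, ~{delay_ps / CYCLE_PS:.1f} clock cycles")
```

Doubling the edge length roughly doubles the number of cycles it takes data to get from one corner to the other.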

  • 29 months ago
  • 1 point

Really, it's not that CPUs have been shrinking in overall size; it's that their constituent components (transistors) have been shrunk so more of them fit on a chip of a given size. To do what you're suggesting, you would need to increase the size of the chip and thus the socket. That would require an entirely new board, as well as new techniques for cooling such a large chip. The bright side is that as the chip size increases, so does the surface area available for cooling.

  • 29 months ago
  • 2 points

A bit off topic, but AMD's solution to this issue is to place several smaller dies on an interconnect fabric and space the dies far enough apart to distribute the heat load for more efficient cooling. That's why, despite being such a massive CPU package, Threadripper really doesn't run all that hot.

In the future I could definitely see AMD trying the same thing with GPUs to maximize thermal and manufacturing performance. It will require some advancements in multi-GPU driver support, though.

  • 29 months ago
  • 1 point

That's pretty interesting. I wonder what technology Intel employs to achieve similar results.

  • 29 months ago
  • 1 point

http://www.anandtech.com/show/9561/exploring-intels-omnipath-network-fabric

Omni-Path, but from what I've seen they only use it with their Xeon Phi parts, which are basically CPUs on a PCIe card used for things like artificial intelligence and machine learning. They don't have anything like it on their desktop or HEDT chips the way Ryzen has Infinity Fabric, I think.

  • 29 months ago
  • 1 point

It's kinda interesting.

Here is a shot of a delidded Skylake-X chip: https://abload.de/img/e42intelskylakex_delihps0y.jpg

and here is a shot of a delidded Threadripper "chip" (In reality "Chips"): http://cdn.overclock.net/c/c0/500x1000px-LL-c08df673_article-630x354.90c8d2d6.jpeg

It's hard to judge scale without them side by side, but it's pretty obvious that Skylake-X is one massive die versus four small dies on Threadripper. Which, as I mentioned in another comment, makes Ryzen much easier to manufacture and probably contributes to its lower cost per core.

  • 29 months ago
  • 1 point

Geez, the complexity us humans can come up with... Amazing

  • 29 months ago
  • 1 point

GPU, not CPU.

  • 29 months ago
  • 1 point

GPU, CPU, same principle.

  • 29 months ago
  • 1 point

Not completely the same, but your explanation above is pretty on-point regardless.

  • 29 months ago
  • 1 point

My bad.

  • 29 months ago
  • 1 point

Well, Nvidia is almost at the limit for the largest die possible with the V100's 815 mm², 21-billion-transistor die. They crammed 5,376 CUDA cores in there.
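
Just dividing those two published figures gives a feel for the density:

```python
transistors = 21e9   # ~21 billion, per the GV100 spec
die_area_mm2 = 815   # GV100 die area
print(f"~{transistors / die_area_mm2 / 1e6:.0f} million transistors per mm^2")
# prints: ~26 million transistors per mm^2
```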

  • 29 months ago
  • 1 point

As you dubbed it, the "GTX Nitro" would be a very niche card and not very beneficial for Nvidia to produce, since it would only appeal to such a small market (from a business standpoint). Back in the not-too-distant past, Nvidia and ATI were sticking two GPUs on one PCB (9800 GX2, GTX 690, R9 295X2, etc.), but it was such a small market. Look at the R9 295X2: you barely heard of that card in the mainstream because very few people bought it, since it required a 1000W PSU and cost around $1500 at launch. Plenty of people did buy it, but it paled in comparison to the sales and profits the 290/290X and 970/980/Ti were making in the high-end mainstream market. I could talk about this for hours, but what it really comes down to is finding that sweet spot of performance and price while still appealing to the largest consumer base. It's business. It's the same reason you don't see consumer-market cars with 2000 HP, even though the technology has been there for decades.

TL;DR : Companies make products that appeal to the largest consumer base possible.

  • 29 months ago
  • 1 point

1MW PSU, not TW. Otherwise, your explanation is on point.

  • 29 months ago
  • 1 point

I looked it up and I guess we are both wrong.. 1000W isn't even a Megawatt.. Edited my post to reflect..

  • 29 months ago
  • 1 point

Damn, now that I think about it, of course it's not! I'm dumb. It's a kilowatt. Mega is 1,000 kilos, giga is 1,000 megas, and tera is 1,000 gigas.
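
For anyone skimming, the prefix ladder in plain numbers (each step is a factor of 1,000):

```python
watts = 1_000  # a big consumer PSU
print(watts / 1e3,  "kW")  # 1.0   kilowatt
print(watts / 1e6,  "MW")  # 0.001 megawatt
print(watts / 1e9,  "GW")  # 1e-06 gigawatt
print(watts / 1e12, "TW")  # 1e-09 terawatt
```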

  • 29 months ago
  • 1 point

TL;DR : Companies make products that appeal to the largest consumer base possible.

I'll see your argument (which I agree with) but raise you another: wouldn't Nvidia's Titan X be evidence that they've already determined there's a large enough customer base interested in buying ultra-high-end cards at monumental markups?

What I'm pondering is why there isn't a far more power-hungry, mandatory-water-cooling GPU option that would eclipse the Ti models: a true "elite" option for consumers wanting the pinnacle of performance at almost any cost. Those chasing top-1% performance don't care about saving a few hundred watts (or dollars), imo.

In reality, I'm just confused as to why the Titan (today's "elite" option) is so close in performance to the Ti model, which costs half as much... There are many people, I believe, who would pay for a true top-1% option with real benchmarks to prove it. I don't see that option available anywhere today.

Strange.

  • 29 months ago
  • 1 point

It's not strange. The only reason Nvidia even releases a Titan is as an advertising scheme for their Ti-series cards. They release a Titan with each new series, and then about six months later they release the xx80 Ti for several hundred dollars less than the Titan, with similar or sometimes greater performance.

It starts as a cash grab from early adopters who want the best thing possible when a new series comes out (money is no object). Then once the Ti comes out, it matches or outperforms the "god card" and becomes the new go-to, and since it's much cheaper, people feel they're getting the performance of a $1200 card for only $700. Think about how many times you've seen somebody mention buying a Titan, only to see somebody instantly reply "Wait for the xx80 Ti" or "Why buy a Titan when the Ti is so much cheaper?" It's a very successful marketing scheme by Nvidia. Any time a company (any company) releases a product that competes with one of its own products, you have to wonder why it does it.

Money. It's Business 101.

  • 29 months ago
  • 1 point

Decreasing transistor size has many benefits. You can make chips cheaper (up until 28nm) that run faster while using less power.

Efficiency is important, as not everybody just ignores power consumption. Data centers make up a large chunk of GPU sales (and therefore future R&D funding), and running thousands of inefficient chips translates into a serious amount of extra money in electricity costs.
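
As a rough illustration (the GPU count, extra power draw, and electricity rate here are assumptions picked for the example, not real data-center figures):

```python
# Hypothetical: 10,000 GPUs, each drawing 50 W more than a more efficient part,
# running 24/7 at $0.10 per kWh.
gpus = 10_000
extra_watts = 50
hours_per_year = 24 * 365
rate_per_kwh = 0.10

extra_kwh = gpus * extra_watts * hours_per_year / 1000
print(f"~{extra_kwh:,.0f} extra kWh/year, ~${extra_kwh * rate_per_kwh:,.0f}/year in extra electricity")
```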

  • 29 months ago
  • 1 point

So the simplest answer is just that lower power consumption means lower temperatures, which means higher clock speeds. Even if you have a massive die, if it's energy efficient it produces less waste heat and can run at full speed without risk of thermal throttling.
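
That follows from the usual CMOS dynamic power relation, P ≈ α·C·V²·f; the activity factor, capacitance, voltage, and clock values below are arbitrary placeholders, just to show the scaling:

```python
# Dynamic power scales roughly as alpha * C * V^2 * f (placeholder values).
def dynamic_power_watts(voltage, freq_ghz, switched_cap_nf=300, activity=0.2):
    return activity * switched_cap_nf * 1e-9 * voltage ** 2 * freq_ghz * 1e9

print(f"{dynamic_power_watts(1.05, 1.8):.0f} W at 1.05 V / 1.8 GHz")
print(f"{dynamic_power_watts(0.90, 1.5):.0f} W at 0.90 V / 1.5 GHz")
# Small drops in voltage and clock cut power (and heat) by much more than they
# cut performance, which is the headroom an efficient chip can spend on
# holding higher clocks without throttling.
```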

The more complicated answer about why manufacturers don't just make massive chips comes down to the economics of chip manufacturing. The basic gist is that as die size grows, the cost per chip grows exponentially rather than linearly, mostly because yields drop. For a more complete explanation, watch this: https://www.youtube.com/watch?v=d9aaGyqm2m8

  • 29 months ago
  • 1 point

The law of diminishing returns.

[comment deleted by staff]
  • 29 months ago
  • 1 point

I agree with most of what you said, but I still think there is a race for performance per watt. Miners in particular care about MHash per watt, GPU noise goes down as performance per watt goes up, and I'd also argue better performance per watt is more ethical.

