• Secret300@sh.itjust.works
    10 upvotes · 10 months ago

    What is “end-to-end GPU acceleration”? Like for playing back video? Or for rendering stuff like in Blender?

    • IHeartBadCode@kbin.social
      21 upvotes · 10 months ago

      It’s a data science term. It means the whole job runs on the GPU. The CPU and system RAM aren’t involved beyond the (usually Python) interface that starts the job, monitors it, and collects the result.
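
      To make the “everything stays on the GPU” idea concrete, here is a toy sketch. It touches no real GPU: `Device` is a hypothetical stand-in that only counts how often data crosses the host/device boundary, and the three steps are made up for illustration. The point is the transfer count, not the math.

```python
# Toy model only: no real GPU is involved. "Device" is a hypothetical
# stand-in that counts host<->device copies.

class Device:
    def __init__(self):
        self.transfers = 0   # number of host<->device copies so far
        self._mem = {}       # pretend device memory

    def upload(self, name, data):
        """Copy data from host to device (counts as one transfer)."""
        self.transfers += 1
        self._mem[name] = list(data)

    def download(self, name):
        """Copy data from device back to host (counts as one transfer)."""
        self.transfers += 1
        return list(self._mem[name])

    def run(self, name, fn):
        """Pretend kernel: works on data already resident on the device."""
        self._mem[name] = [fn(x) for x in self._mem[name]]


# Three made-up processing steps.
STEPS = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]


def round_trip_pipeline(data):
    """Not end-to-end: data goes back to the host after every step."""
    dev = Device()
    for step in STEPS:
        dev.upload("buf", data)
        dev.run("buf", step)
        data = dev.download("buf")
    return data, dev.transfers


def end_to_end_pipeline(data):
    """End-to-end: upload once, run every step on the device, download once."""
    dev = Device()
    dev.upload("buf", data)
    for step in STEPS:
        dev.run("buf", step)
    return dev.download("buf"), dev.transfers


print(round_trip_pipeline([1, 2, 3]))   # ([9, 25, 49], 6)
print(end_to_end_pipeline([1, 2, 3]))   # ([9, 25, 49], 2)
```

      Same result either way, but the end-to-end version only crosses the boundary at the very start and the very end, which is roughly the role of the Python interface described above.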

      ROCm is AMD’s answer to CUDA, which fills the same role for nVidia.

        • IHeartBadCode@kbin.social
          5 upvotes · 10 months ago

          Both are vendor-specific implementations of processing on GPUs. That’s in contrast to open standards like OpenCL, which a lot of the exascale big boys out there use.

          nVidia spent a lot of cash on “outreach” to get CUDA into a lot of packages in R, Python, and whatnot. That displaced a lot of OpenCL stuff. These libraries are what a lot of folks get started with, since most of the legwork is already done for them in the library. With the exascale rigs, you literally have a team that does nothing but code very specific things for the machine in front of them. So yeah, they go with the thing that is the most portable, but that doesn’t exactly yield libraries for us mere mortals to use.

          AMD has only recently had the cash to start paying folks to write libs for their stuff, so we’re starting to see it show up in Python libs and whatnot. Likely, once it becomes a fight of CUDA vs. ROCm, people will start heading back over to OpenCL. The “worth it” of vendor lock-in for CUDA and ROCm will diminish more and more over time. But as it stands, with CUDA you do get a good bit of “squeezing that extra bit of steam out of your GPU” by selling your soul to nVidia.

          That last part also plays into the “why” of CUDA and ROCm. If you happen to NOT have a rig with 10,000 GPUs, then the difference between getting 98% of your GPU and 99.999% of your GPU means a lot to you. If you do have 10,000 GPUs, something like a 1% inefficiency is okay. Spread across 10,000 GPUs, a 1% loss is barely noticeable, and it’s not worth giving up the portability of OpenCL to get it back.
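
          A rough back-of-the-envelope version of that argument, as a sketch. It assumes throughput scales linearly with GPU count and utilization (real clusters don’t scale that cleanly), and `extra_gpus_needed` is a made-up helper name. The 98% and 99.999% figures are just the illustrative ones from above.

```python
import math


def extra_gpus_needed(gpu_count, portable_util, tuned_util):
    """Whole GPUs you'd have to add so a rig running at `portable_util`
    matches the throughput of the same rig running at `tuned_util`.

    Assumes throughput = gpu_count * utilization (an idealization).
    """
    target = gpu_count * tuned_util            # throughput of the tuned rig
    needed = math.ceil(target / portable_util)  # you can't buy a fraction of a GPU
    return needed - gpu_count


for rig in (1, 10_000):
    extra = extra_gpus_needed(rig, portable_util=0.98, tuned_util=0.99999)
    print(f"{rig} GPU(s): {extra} extra GPU(s), {100 * extra / rig:.2f}% more hardware")
```

          With one GPU, closing the gap by buying hardware means a whole second card, i.e. 100% more hardware. With 10,000 GPUs it’s 204 more cards, about 2% more. That’s why the small rig cares about squeezing out the last bit and the big rig can shrug it off.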