Powerful and reliable programming model and computing toolkit

NVIDIA CUDA Toolkit

NVIDIA CUDA Toolkit 12.9.1 (for Windows 10)

3.3 GB - Freeware

Sometimes the latest version of a program can cause issues when installed on older devices or on devices running an older version of the operating system.

Software makers usually fix these issues, but it can take some time. In the meantime, you can download and install an older version, such as NVIDIA CUDA Toolkit 12.9.1 (for Windows 10).


For those interested in downloading the most recent release of NVIDIA CUDA Toolkit or reading our review, simply click here.


All old versions distributed on our website are completely virus-free and available for download at no cost.



What's new in this version:

General CUDA:
CUDA Toolkit Major Components:
- Starting with CUDA 11, individual components within the CUDA Toolkit (for example: compiler, libraries, tools) are versioned independently
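Because components carry their own version numbers, it can be useful to read them programmatically. The CUDA runtime reports its version as a single integer encoded as 1000 × major + 10 × minor (for example, the value returned by `cudaRuntimeGetVersion`). The following is a minimal sketch of decoding such an integer in plain C++; `CudaVersion` and `decode_cuda_version` are hypothetical names used only for illustration and are not part of the toolkit.

```cpp
// Decoded form of a CUDA-style version integer.
// Hypothetical type for this sketch, not a toolkit API.
struct CudaVersion {
    int major_ver;
    int minor_ver;
};

// Split an integer encoded as 1000 * major + 10 * minor
// into its major and minor parts.
constexpr CudaVersion decode_cuda_version(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}
```

For instance, `decode_cuda_version(12090)` yields major 12 and minor 9. This encoding carries only the major and minor numbers, so each component's own release notes remain the place to check its exact version.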

New Features:
CUDA Developer Tools:
- For changes to nvprof and Visual Profiler, see the changelog
- For new features, improvements, and bug fixes in Nsight Systems, see the changelog
- For new features, improvements, and bug fixes in Nsight Visual Studio Edition, see the changelog
- For new features, improvements, and bug fixes in CUPTI, see the changelog
- For new features, improvements, and bug fixes in Nsight Compute, see the changelog
- For new features, improvements, and bug fixes in Compute Sanitizer, see the changelog
- For new features, improvements, and bug fixes in CUDA-GDB, see the changelog

Fixed:
CUDA Compiler:
- Starting with CUDA 12.8, we observed miscompilation issues caused by incorrect code generation for address calculations involving large immediate values (i.e., values that exceed the bounds of a 32-bit integer). This miscompiled code can lead to runtime errors such as “illegal memory access” on SM90 and SM100. The issue has been resolved in CUDA 12.9.1.
- The problem can be triggered by a PTX pattern in which a group of add instructions shares the same base operand but uses different immediate values as the second operand. These immediate values exceed the bounds of a 32-bit integer. The register values used in the add instructions are all warp-uniform, and an add instruction with the larger immediate value is scheduled before the one with the smaller immediate value.
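To make "values that exceed the bounds of a 32-bit integer" concrete, the sketch below classifies a 64-bit offset by whether it fits in that range. It is plain C++ and illustrates only the arithmetic range; it does not reproduce the compiler issue. The helper name `exceeds_int32` and the sample constants are hypothetical, and reading "32-bit integer" as the signed range is an assumption.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a 64-bit offset cannot be represented
// as a signed 32-bit integer (the assumed meaning of "large immediate").
constexpr bool exceeds_int32(std::int64_t offset) {
    return offset < std::numeric_limits<std::int32_t>::min() ||
           offset > std::numeric_limits<std::int32_t>::max();
}

// Illustrative offsets only; these values are not taken from real PTX.
// Two large immediates that would be added to the same base address:
constexpr std::int64_t kLargeOffsetA = 0x100000000LL;  // 2^32
constexpr std::int64_t kLargeOffsetB = 0x200000000LL;  // 2^33
// An offset that still fits in 32 bits, shown for contrast:
constexpr std::int64_t kSmallOffset = 0x7FFFFFFFLL;    // 2^31 - 1
```

Under these assumptions, `exceeds_int32` returns true for both large offsets and false for `kSmallOffset`. Code that indexes buffers larger than 2 GiB from a common base pointer is one place where offsets of this size can appear.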