Powerful and reliable programming model and computing toolkit

NVIDIA CUDA Toolkit

NVIDIA CUDA Toolkit 12.9.0 (for Windows 10)

3.3 GB - Freeware

Sometimes the latest version of a program causes problems when installed on older devices or on devices running an older version of the operating system.

Software makers usually fix these issues, but it can take them some time. In the meantime, you can download and install an older version of NVIDIA CUDA Toolkit.




All old versions distributed on our website are completely virus-free and available for download at no cost.



What's new in this version:

General CUDA:
- MPS client termination is now supported on Tegra platforms (for L4T users, starting with JetPack 7.0 only)
- Extended CUDA in Graphics (CIG) mode now supports Vulkan, expanding beyond the previous DirectX-only implementation
- CPU NUMA allocation support through cuMemCreate and cuMemAllocAsync is now available on Windows when using the driver in WDDM and MCDM modes, expanding this previously Linux-only feature
- CUDA Graphs functionality has been enhanced to support the inclusion of memory nodes in child graphs
- CUDA Toolkit 12.9 adds compiler target support for SM architecture 10.3 (sm_103, sm_103f, and sm_103a), enabling development for the latest GPU architectures with specific optimizations for each variant

CUDA Toolkit 12.9 introduces compiler support for a new target architecture class: family-specific architectures. Learn more in the NVIDIA blog post "Family-Specific Architecture Features".

Multiple enhancements to NVML and nvidia-smi:
- Added counters (in microseconds) for the throttling time for the following reasons:
  - nvmlClocksEventReasonGpuIdle
  - nvmlClocksEventReasonApplicationsClocksSetting
  - nvmlClocksEventReasonSwPowerCap
  - nvmlClocksThrottleReasonHwSlowdown
  - nvmlClocksEventReasonSyncBoost
  - nvmlClocksEventReasonSwThermalSlowdown
  - nvmlClocksThrottleReasonHwThermalSlowdown
  - nvmlClocksThrottleReasonHwPowerBrakeSlowdown
  - nvmlClocksEventReasonDisplayClockSetting
- Improved consistency for device identification between CUDA and NVML
- Added NVML chip-to-chip (C2C) telemetry APIs
- Added CTXSW metrics
- Implemented GPU average power counters
- Added PCIe bind/unbind events
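The throttle reasons listed above are also exposed by NVML as bit flags in a clocks-event-reasons bitmask. The sketch below decodes such a bitmask into the reason names from the list. It is a minimal illustration, not NVML's own API: the helper name decode_reasons is hypothetical, the flag values are assumed to match nvml.h and should be checked against your installed header, and the new microsecond counters themselves are not shown because the notes do not name the calls that return them.

```python
# Assumed flag values for the clocks event reasons, as defined in nvml.h.
# Verify these against the header shipped with your driver before relying on them.
REASON_FLAGS = {
    0x001: "nvmlClocksEventReasonGpuIdle",
    0x002: "nvmlClocksEventReasonApplicationsClocksSetting",
    0x004: "nvmlClocksEventReasonSwPowerCap",
    0x008: "nvmlClocksThrottleReasonHwSlowdown",
    0x010: "nvmlClocksEventReasonSyncBoost",
    0x020: "nvmlClocksEventReasonSwThermalSlowdown",
    0x040: "nvmlClocksThrottleReasonHwThermalSlowdown",
    0x080: "nvmlClocksThrottleReasonHwPowerBrakeSlowdown",
    0x100: "nvmlClocksEventReasonDisplayClockSetting",
}


def decode_reasons(mask):
    """Return the names of all reasons whose bit is set in mask (hypothetical helper)."""
    return [name for bit, name in REASON_FLAGS.items() if mask & bit]
```

For example, a mask with bits 0x004 and 0x040 set would decode to the software power cap and hardware thermal slowdown reasons, in that order.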

CUDA Compiler:
- Added a new compiler option --Ofast-compile=<level>, supported in nvcc, nvlink, nvrtc, and ptxas. This option prioritizes faster compilation over optimizations at varying levels, helping to accelerate development cycles. Refer to the fast-compile documentation for more details.
- Added a new compiler option --frandom-seed=<seed>, supported in nvcc and nvrtc. The user-specified seed replaces the random numbers used when generating symbol names and variable names, so the option can be used to generate deterministically identical PTX and object files. If the input value is a valid number (decimal, octal, or hex), it is used directly as the random seed; otherwise, the CRC value of the passed string is used instead. NVCC also passes the option and the user-specified value to the host compiler if that compiler is GCC or Clang, since both support the -frandom-seed option as well. Users are responsible for assigning different seeds to different files.
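The seed rule and the per-file responsibility described above can be sketched in a small build helper. This is an illustration under stated assumptions, not NVCC's implementation: resolve_seed and nvcc_command are hypothetical names, zlib.crc32 stands in for whichever CRC variant the compiler actually uses (the notes do not say), and the level passed to --Ofast-compile must be one accepted by the fast-compile documentation, since the notes do not list the valid values.

```python
import re
import zlib

# C-style numeric literals: hex (0x...), octal (leading 0), or decimal.
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")
_OCT = re.compile(r"0[0-7]+")
_DEC = re.compile(r"0|[1-9][0-9]*")


def resolve_seed(value):
    """Model the documented rule: numbers are used directly, other strings are CRC'd.

    zlib.crc32 is an assumption; the real compiler may use a different CRC.
    """
    if _HEX.fullmatch(value):
        return int(value, 16)
    if _OCT.fullmatch(value):
        return int(value, 8)
    if _DEC.fullmatch(value):
        return int(value, 10)
    return zlib.crc32(value.encode("utf-8"))


def nvcc_command(source, fast_compile_level=None):
    """Build an nvcc argument list that seeds each file with its own path.

    Passing the source path as the seed string gives every file a distinct,
    reproducible seed, because non-numeric strings are hashed by the compiler.
    """
    cmd = ["nvcc", "-c", source, "--frandom-seed=" + source]
    if fast_compile_level is not None:
        # The level string is passed through unchanged; valid values are
        # defined by the fast-compile documentation, not by this sketch.
        cmd.append("--Ofast-compile=" + fast_compile_level)
    return cmd
```

Using the file path as the seed is one simple design choice: it keeps builds reproducible without maintaining a separate table of seeds, at the cost of changing the seed whenever a file is moved or renamed.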

CUDA Developer Tools:
- For changes to nvprof and Visual Profiler, see the changelog
- For new features, improvements, and bug fixes in Nsight Systems, see the changelog
- For new features, improvements, and bug fixes in Nsight Visual Studio Edition, see the changelog
- For new features, improvements, and bug fixes in CUPTI, see the changelog
- For new features, improvements, and bug fixes in Nsight Compute, see the changelog
- For new features, improvements, and bug fixes in Compute Sanitizer, see the changelog
- For new features, improvements, and bug fixes in CUDA-GDB, see the changelog

Fixed:
CUDA Compiler:
- Resolved a segmentation fault that occurred when a lambda expression used as a class template argument was invoked inside a function template
- Resolved an NVCC internal assertion triggered when inheriting protected constructors
- Resolved an issue where C++20 template parameter lists in lambdas and the new auto syntax caused nvcc to fail
- Fixed an issue with incorrect C++20 if constexpr(concept) usage in template lambdas
- Resolved a template compile error in CUDA 12.6.1 when using MSVC with C++20
- Fixed an NVCC issue with incorrect initialization of a std::vector of std::any in C++ code