Powerful and reliable programming model and computing toolkit

NVIDIA CUDA Toolkit 11.2.1 (for Windows 10)

February 11th, 2021 - 2.7 GB - Freeware

Latest Version

NVIDIA CUDA Toolkit 12.9.0 (for Windows 11)
Operating System

Windows 10 (32-bit) / Windows 10 (64-bit)
Author / Product

NVIDIA Corporation / External Link
Filename

cuda_11.2.1_461.09_win10.exe

Sometimes the latest versions of the software can cause issues when installed on older devices or on devices running an older version of the operating system.

Software makers usually fix these issues, but it can take them some time. In the meantime, you can download and install an older version, such as NVIDIA CUDA Toolkit 11.2.1 (for Windows 10).

All old versions distributed on our website are completely virus-free and available for download at no cost.

What's new in this version:

CUDA Compiler:
Resolved Issues:
- Previously, when using recent versions of VS 2019 host compiler, a call to pow(double, int) or pow(float, int) in host or device code sometimes caused build failures. This issue has been resolved.
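For reference, the two call shapes named in this fix look like the following in host code. This is only a minimal sketch: the wrapper names are hypothetical, and the same expressions could equally appear in device code compiled with nvcc.

```cpp
#include <cmath>

// Hypothetical wrappers that exercise the two call shapes from the note.
// Under nvcc, the same expressions can also appear inside __device__ functions.
double pow_double_int(double base, int exponent) {
    return std::pow(base, exponent);  // pow(double, int)
}

double pow_float_int(float base, int exponent) {
    // pow(float, int): in standard C++11 both arguments are promoted to double.
    return std::pow(base, exponent);
}
```

If a project still hits the build failure on 11.2.0, a small translation unit like this can help confirm whether the toolkit update resolves it.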

CuSOLVER:
New Features:
- New singular value decomposition (GESVDR) is added. GESVDR computes partial spectrum with random sampling, an order of magnitude faster than GESVD
- libcusolver.so no longer links libcublas_static.a; instead, it depends on libcublas.so. This reduces the binary size of libcusolver.so. However, it breaks backward compatibility. The user has to link libcusolver.so with the correct version of libcublas.so.

CuSPARSE:
New Features:
- New Tensor Core-accelerated Block Sparse Matrix - Matrix Multiplication (cusparseSpMM) and introduction of the Blocked-Ellpack storage format
- New algorithms for CSR/COO Sparse Matrix - Vector Multiplication (cusparseSpMV) with better performance
- New algorithm (CUSPARSE_SPMM_CSR_ALG3) for Sparse Matrix - Matrix Multiplication (cusparseSpMM) with better performance especially for small matrices
- New routine for Sampled Dense Matrix - Dense Matrix Multiplication (cusparseSDDMM), which deprecates cusparseConstrainedGeMM and provides better performance
- Better accuracy of cusparseAxpby, cusparseRot, cusparseSpVV for bfloat16 and half regular/complex data types
- All routines support NVTX annotation for enhancing the profiler time line on complex applications
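To illustrate the idea behind the Blocked-Ellpack format mentioned above, here is a CPU-side sketch of a block-sparse matrix times dense matrix product. The struct, its field names, and the exact memory layout are assumptions made for illustration (square blocks, a fixed number of stored blocks per block-row, and -1 marking padding); it is not the cuSPARSE API and does not use Tensor Cores.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU container mirroring the Blocked-ELL idea.
struct BlockedEll {
    int rows;                    // matrix rows, assumed a multiple of block_size
    int block_size;              // edge length of each square block
    int blocks_per_row;          // stored blocks per block-row (ELL width in blocks)
    std::vector<int> block_col;  // (rows / block_size) * blocks_per_row entries, -1 = padding
    std::vector<float> values;   // row-major, rows x (blocks_per_row * block_size)
};

// C (rows x n) = A (Blocked-ELL) * B (dense, row-major, n columns).
std::vector<float> spmm_blocked_ell(const BlockedEll& a,
                                    const std::vector<float>& b, int n) {
    const int ell_cols = a.blocks_per_row * a.block_size;
    std::vector<float> c(static_cast<std::size_t>(a.rows) * n, 0.0f);
    for (int i = 0; i < a.rows; ++i) {
        const int block_row = i / a.block_size;
        for (int slot = 0; slot < a.blocks_per_row; ++slot) {
            const int bc = a.block_col[block_row * a.blocks_per_row + slot];
            if (bc < 0) continue;  // padding block, contributes nothing
            for (int j = 0; j < a.block_size; ++j) {
                const float v = a.values[i * ell_cols + slot * a.block_size + j];
                const int col = bc * a.block_size + j;  // column in the full matrix
                for (int x = 0; x < n; ++x) {
                    c[i * n + x] += v * b[col * n + x];
                }
            }
        }
    }
    return c;
}
```

Because every block-row stores the same number of blocks, the index array is a small dense table, which is the regularity that makes the format a natural fit for block-oriented hardware.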

Deprecations:
- cusparseConstrainedGeMM has been deprecated in favor of cusparseSDDMM
- cusparseCsrmvEx has been deprecated in favor of cusparseSpMV
- COO Array of Structure (CooAoS) format has been deprecated including cusparseCreateCooAoS, cusparseCooAoSGet, and its support for cusparseSpMV
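For readers migrating from cusparseConstrainedGeMM, the operation that cusparseSDDMM performs can be sketched on the CPU as follows: the dense product A*B is evaluated only at the positions already stored in the sparse output C. The function name, argument order, and CSR arrays below are illustrative assumptions, not the library signature.

```cpp
#include <vector>

// Reference SDDMM: for every stored entry (i, j) of C in CSR form,
//   C[i][j] = alpha * dot(A[i, :], B[:, j]) + beta * C[i][j]
// A is m x k and B is k x n, both dense and row-major.
void sddmm_csr(int m, int k, int n, float alpha,
               const std::vector<float>& a,
               const std::vector<float>& b,
               float beta,
               const std::vector<int>& row_ptr,
               const std::vector<int>& col_idx,
               std::vector<float>& c_values) {
    for (int i = 0; i < m; ++i) {
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            const int j = col_idx[p];
            float dot = 0.0f;
            for (int t = 0; t < k; ++t) {
                dot += a[i * k + t] * b[t * n + j];
            }
            c_values[p] = alpha * dot + beta * c_values[p];
        }
    }
}
```

The work is proportional to the number of stored entries in C times k, rather than to m * n * k, which is why sampling by the sparsity pattern pays off.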

Known Issues:
- Calling cusparseDestroySpVec, cusparseDestroyDnVec, cusparseDestroySpMat, cusparseDestroyDnMat, or cusparseDestroy with a NULL argument could cause a segmentation fault on Windows
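One defensive pattern for this known issue is to skip the destroy call when the handle was never created. The helper below is a hypothetical sketch, assuming the handle and descriptor types are pointer-like so they can be compared with nullptr; the destroy function is passed in rather than named, so no library call is assumed here.

```cpp
// Hypothetical guard: only invokes `destroy` when the handle is non-null,
// then clears the handle so a second call is a harmless no-op.
// Returns true if the destroy function was actually called.
template <typename Handle, typename DestroyFn>
bool destroy_if_set(Handle& handle, DestroyFn destroy) {
    if (handle == nullptr) {
        return false;  // skip the call that could crash with a NULL argument
    }
    destroy(handle);
    handle = nullptr;  // prevents an accidental double destroy
    return true;
}
```

In cleanup paths that run after a partially failed initialization, a guard like this keeps the teardown code uniform without relying on the library tolerating NULL.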

Resolved Issues:
- cusparseAxpby, cusparseGather, cusparseScatter, cusparseRot, cusparseSpVV, cusparseSpMV now support zero-size matrices
- cusparseCsr2cscEx2 now correctly handles empty matrices (nnz = 0)
- cusparseXcsr2csr_compress now uses 2-norm for the comparison of complex values instead of only the real part
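The practical effect of the cusparseXcsr2csr_compress change can be seen with a small comparison. This sketch assumes that an entry is kept when its measured size exceeds a tolerance; the two predicate names are hypothetical, and the exact form of the earlier real-part test is an assumption.

```cpp
#include <cmath>
#include <complex>

// Assumed older behavior: only the real part is compared against the tolerance.
bool keep_by_real_part(std::complex<float> v, float tol) {
    return std::abs(v.real()) > tol;
}

// Behavior described in the note: the 2-norm (magnitude) of the complex value is used.
bool keep_by_norm(std::complex<float> v, float tol) {
    return std::abs(v) > tol;  // sqrt(re^2 + im^2)
}
```

A purely imaginary entry such as 0 + 5i with a tolerance of 1 would be dropped by the real-part test but kept by the magnitude test, which is the kind of case this fix addresses.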

Extended functionalities for cusparseSpMV:
- Support for the CSC format
- Support for regular/complex bfloat16 data types for both uniform and mixed-precision computation
- Support for mixed regular-complex data type computation
- Support for deterministic and non-deterministic computation
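As an illustration of what CSC support means for a matrix-vector product, here is a CPU reference that computes y = A * x from column-compressed storage. The function and parameter names are illustrative only and are not the cusparseSpMV interface.

```cpp
#include <vector>

// y = A * x, with A stored in CSC: col_ptr has cols + 1 entries, and
// row_idx / values list the nonzeros of each column in turn.
std::vector<float> spmv_csc(int rows, int cols,
                            const std::vector<int>& col_ptr,
                            const std::vector<int>& row_idx,
                            const std::vector<float>& values,
                            const std::vector<float>& x) {
    std::vector<float> y(rows, 0.0f);
    for (int j = 0; j < cols; ++j) {
        for (int p = col_ptr[j]; p < col_ptr[j + 1]; ++p) {
            // Each column scatters its contribution into the output rows.
            y[row_idx[p]] += values[p] * x[j];
        }
    }
    return y;
}
```

Unlike the CSR case, several columns can update the same output row. In a parallel implementation those updates would have to be combined, and the order of combination affects floating-point rounding, which is one plausible reason for offering both deterministic and non-deterministic modes.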

NPP:
New features:
- New APIs added to compute Distance Transform using the Parallel Banding Algorithm (PBA): nppiDistanceTransformPBA_xxxxx_C1R_Ctx(), where xxxxx specifies the input and output combination (8u16u, 8s16u, 16u16u, 16s16u, 8u32f, 8s32f, 16u32f, 16s32f), and nppiSignedDistanceTransformPBA_32f_C1R_Ctx()
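To make the output of a distance transform concrete, the following brute-force CPU sketch computes, for each pixel, the Euclidean distance to the nearest "site" pixel. It is not the Parallel Banding Algorithm and not the NPP API: the function name and the single site_value parameter are assumptions for illustration, and NPP's own parameters and conventions may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Brute-force Euclidean distance transform over an 8-bit, single-channel image.
// Pixels equal to `site_value` are sites and get distance 0.
// If the image has no sites, every output is +infinity.
std::vector<float> distance_transform_bruteforce(const std::vector<std::uint8_t>& img,
                                                 int width, int height,
                                                 std::uint8_t site_value) {
    std::vector<float> out(static_cast<std::size_t>(width) * height,
                           std::numeric_limits<float>::infinity());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float best = std::numeric_limits<float>::infinity();
            for (int sy = 0; sy < height; ++sy) {
                for (int sx = 0; sx < width; ++sx) {
                    if (img[sy * width + sx] != site_value) continue;
                    const float dx = static_cast<float>(x - sx);
                    const float dy = static_cast<float>(y - sy);
                    const float d = std::sqrt(dx * dx + dy * dy);
                    if (d < best) best = d;
                }
            }
            out[y * width + x] = best;
        }
    }
    return out;
}
```

This version costs O(N^2) for N pixels, so it is only suitable as a correctness reference on small images; a result like this can be compared against a GPU implementation's 32f output.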

Resolved issues:
- Fixed an issue in which Label Markers added zero-valued pixels as an object region

NVJPEG:
New Features:
- nvJPEG decoder added a new API to support region of interest (ROI) based decoding for batched hardware decoder: nvjpegDecodeBatchedEx() and nvjpegDecodeBatchedSupportedEx()

CuFFT:
Resolved Issues:
- Previously, reduced performance of power-of-2 single precision FFTs was observed on GPUs with sm_86 architecture. This issue has been resolved
- Large prime factors in size decomposition and real to complex or complex to real FFT type no longer cause cuFFT plan functions to fail
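When checking whether a given transform size falls into either category described in these cuFFT fixes, it can help to classify the size first. The helpers below are a plain C++ sketch with hypothetical names; the notes do not state how large a prime factor had to be to trigger the failure, so no threshold is assumed.

```cpp
// Returns the largest prime factor of n (returns 1 for n <= 1).
long long largest_prime_factor(long long n) {
    long long largest = 1;
    for (long long p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            largest = p;  // p divides n, so strip it out completely
            n /= p;
        }
    }
    if (n > 1) {
        largest = n;  // whatever remains is a prime larger than any found so far
    }
    return largest;
}

// True when n is a positive power of two (1, 2, 4, 8, ...).
bool is_power_of_two(long long n) {
    return n > 0 && (n & (n - 1)) == 0;
}
```

For example, a size of 1024 is a power of two, while a size of 254 decomposes as 2 * 127 and therefore contains the prime factor 127.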

CUPTI:
Deprecations early notice:
- The following functions are scheduled to be deprecated in 11.3 and will be removed in a future release:
  - NVPW_MetricsContext_RunScript and NVPW_MetricsContext_ExecScript_Begin from the header nvperf_host.h
  - cuptiDeviceGetTimestamp from the header cupti_events.h