    Repositories list

    • cccl (Public)
      CUDA Core Compute Libraries
      C++ · Updated Dec 19, 2025
    • NeMo-Agent-Toolkit (Public)
      The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
      Python · Updated Dec 19, 2025
    • The CUDA target for Numba (see the kernel sketch after this list)
      Python · Updated Dec 19, 2025
    • spark-rapids-tools (Public)
      User tools for Spark RAPIDS
      Scala · Updated Dec 19, 2025
    • cuda-python (Public)
      CUDA Python: Performance meets Productivity (see the driver-API sketch after this list)
      Cython · Updated Dec 19, 2025
    • Ongoing research training transformer models at scale
      Python · Updated Dec 19, 2025
    • nv-ingest (Public)
      NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. It uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.
      Python · Updated Dec 19, 2025
    • Fuser (Public)
      A fusion code generator for NVIDIA GPUs (commonly known as "nvFuser")
      C++ · Updated Dec 19, 2025
    • OSMO (Public)
      The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute, unifying training GPUs, simulation clusters, and edge devices in a simple YAML spec.
      Python · Updated Dec 19, 2025
    • cuda-quantum (Public)
      C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
      C++ · Updated Dec 19, 2025
    • A unified library of SOTA model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed. (See the quantization sketch after this list.)
      Python · Updated Dec 19, 2025
    • TensorRT-LLM (Public)
      TensorRT LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way. (See the LLM API sketch after this list.)
      Python · Updated Dec 19, 2025
    • BioNeMo Framework: for building and adapting AI models in drug discovery at scale
      Jupyter Notebook · Updated Dec 18, 2025
    • MatX (Public)
      An efficient C++20 GPU numerical computing library with Python-like syntax
      C++ · Updated Dec 18, 2025
    • TileGym (Public)
      Helpful kernel tutorials and examples for tile-based GPU programming
      Python · Updated Dec 18, 2025
    • nvrc (Public)
      The NVRC project provides a Rust binary that implements a simple init system for microVMs.
      Rust · Updated Dec 18, 2025
    • open-gpu-kernel-modules (Public)
      NVIDIA Linux open GPU kernel module source
      C · Updated Dec 18, 2025
    • Experimental projects related to TensorRT
      MLIR · Updated Dec 18, 2025
    • RAPIDS Accelerator JNI For Apache Spark
      Cuda · Updated Dec 18, 2025
    • aistore (Public)
      AIStore: scalable storage for AI applications
      Go · Updated Dec 18, 2025
    • cuopt (Public)
      GPU-accelerated decision optimization
      Cuda · Updated Dec 18, 2025
    • skyhook (Public)
      A Kubernetes Operator to manage Node OS customizations.
      Go · Updated Dec 18, 2025
    • NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures and interruptions.
      Python · Updated Dec 18, 2025
    • Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
      Python · Updated Dec 18, 2025
    • NVFlare (Public)
      NVIDIA Federated Learning Application Runtime Environment
      Python · Updated Dec 18, 2025
    • A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference. (See the FP8 sketch after this list.)
      Python · Updated Dec 18, 2025
    • linux (Public)
      OpenBMC Linux kernel source tree
      C · Updated Dec 18, 2025
    • The NVIDIA NeMo Agent Toolkit UI streamlines interacting with NeMo Agent Toolkit workflows in an easy-to-use web application.
      TypeScript · Updated Dec 18, 2025
    • DALI (Public)
      A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing, used to accelerate deep learning training and inference applications. (See the pipeline sketch after this list.)
      C++ · Updated Dec 18, 2025
    • jaxpp (Public)
      JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
      Python · Updated Dec 18, 2025
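
Example sketches for selected repositories above

A minimal sketch of what "The CUDA target for Numba" enables: compiling a Python function into a CUDA kernel with the @cuda.jit decorator. The kernel body, array sizes, and launch configuration are illustrative, not taken from the repository.

    # Elementwise vector addition compiled as a CUDA kernel by Numba.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def vector_add(a, b, out):
        i = cuda.grid(1)           # global thread index
        if i < out.size:           # guard threads past the end of the array
            out[i] = a[i] + b[i]

    n = 1 << 20
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads = 256
    blocks = (n + threads - 1) // threads
    vector_add[blocks, threads](a, b, out)   # NumPy arrays are staged to/from the GPU automatically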
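
A sketch of cuda-python's low-level driver bindings: initializing the driver and counting visible devices. It assumes the legacy-style "from cuda import cuda" module path; newer releases also expose cuda.bindings, so check the repository docs for the current layout.

    # Query the device count through cuda-python's CUDA driver bindings.
    from cuda import cuda as drv   # assumption: legacy "from cuda import cuda" binding path

    (err,) = drv.cuInit(0)
    assert err == drv.CUresult.CUDA_SUCCESS

    err, count = drv.cuDeviceGetCount()
    assert err == drv.CUresult.CUDA_SUCCESS
    print(f"CUDA devices visible: {count}")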
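
A sketch of the model-optimization library's post-training quantization flow, assuming the modelopt.torch.quantization module and a config constant named INT8_DEFAULT_CFG; both names are assumptions based on the library's documented style, so treat this as illustrative.

    # Post-training INT8 quantization sketch (module and config names are assumptions).
    import torch
    import modelopt.torch.quantization as mtq

    model = torch.nn.Sequential(
        torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
    )

    def forward_loop(m):
        # Calibration pass over representative inputs (random tensors here, for illustration only).
        for _ in range(8):
            m(torch.randn(4, 128))

    model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
    # The quantized model can then be handed to a deployment framework such as TensorRT-LLM or TensorRT.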
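
A sketch of the TensorRT LLM Python API described above, assuming the high-level LLM / SamplingParams entry points; the model name is a placeholder and the exact output fields may differ between releases.

    # Offline generation sketch with the high-level TensorRT LLM Python API.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")   # placeholder model
    params = SamplingParams(max_tokens=64, temperature=0.8)

    for output in llm.generate(["What does TensorRT LLM optimize?"], params):
        print(output.outputs[0].text)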
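
A sketch of FP8 execution with the Transformer-acceleration library above, assuming the transformer_engine.pytorch module with te.Linear and fp8_autocast; the recipe settings are illustrative, and FP8 requires a supported GPU (Hopper, Ada, or Blackwell).

    # FP8 forward/backward through a Transformer Engine linear layer.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    layer = te.Linear(1024, 1024, bias=True).cuda()
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

    x = torch.randn(16, 1024, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = layer(x)            # the underlying GEMM runs in FP8 on supported GPUs
    y.sum().backward()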
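
A sketch of a DALI pipeline assembled from the library's building blocks (file reader, GPU decode, resize, normalize); the dataset directory is a placeholder.

    # Image-loading pipeline built from DALI operators.
    from nvidia.dali import pipeline_def, fn, types

    @pipeline_def(batch_size=32, num_threads=4, device_id=0)
    def image_pipeline(data_dir):
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
        images = fn.decoders.image(jpegs, device="mixed")            # decode on the GPU
        images = fn.resize(images, resize_x=224, resize_y=224)
        images = fn.crop_mirror_normalize(images, dtype=types.FLOAT, output_layout="CHW")
        return images, labels

    pipe = image_pipeline("/path/to/images")   # placeholder dataset location
    pipe.build()
    images, labels = pipe.run()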