v0.1
The initial release of tilus.
What's Changed
- [CI] Add workflow to deploy wheel to pypi (#10)
- [CI] Use deep checkout for diff (#9)
- [Docs] Update copyright and remove some redundant descriptions (#8)
- [CI] Use nvidia github runners for docs building (#7)
- [CI] Update docs and ci runner for format (#6)
- [CI] Fix the permission issue of deploy github pages (#5)
- [CI][Docs] Add workflow to deploy docs (#4)
- [CI] Migrate CI runners (#2)
- [License] Add license header and contribution guide
- [Misc] Add vscode settings
- [Docs] Add documentation for layout system
- [Docs] Add more sections in programming guides
- [Docs] Add the framework of programming guide
- [Docs] Add documentation for the remaining matmuls
- [Docs] Add docs for two matmul examples
- [Docs] Add the documentation for naive matmul
- [Docs] Add initial version of docs
- [Bugfix] Improve the performance of HoistLoopInvariants pass
- [Pass] Add HoistLoopInvariants pass
- [Pass] Add affine to recursive transformation pass
- [Pass] Explicitly list the used hidet passes
- [Bugfix] Fix OOM issue in attention example
- [Example] Optimize the attention operator by splitting the sequence of kv
- [Example] Remove explicit layout in examples
- [Operator] Optimize attention operator
- [Kernel] Optimize attention kernel
- [Tool] Update tilus IRPrinter
- [Script] Support script procedure
- [Operator] Optimize the attention operator by pipelining
- [Operator] Optimize attention operator with software pipelining
- [Bugfix] Fix a bug in attention example
- [Example] Add attention example
- [Package] Update information in `pyproject.toml`
- [Submodule] Remove .gitmodules
- [Feature] Automatic Layout Inference
- [Layout] Remove old layout definition
- [Layout] Use the new layout system in the emitters
- [Layout] Add the unified representation of layout system
- [Bug] Unify the segments of dynamic shape for tuning
- [Layout] Add transpose operation for register layout
- [Cache] Add cache for instantiated_script
- [Enhancement] Add more instructions and functionality
- [Reduce] Support reduce instruction
- [Instruction] Add repeat and repeat_interleave instructions
- [Optimize] Optimize layout for cast kernel
- [Options] Configure hidet option to avoid ftz flag in nvcc
- [CUDA] Avoid any cuda runtime api call during import
- [Fix] Fix a bug in quantization example
- [Sync] Upstream changes to hidet
- [Refactor] Simplify and generalize the dot instruction
- [Example] Add quantized matmul with full range of quantized data types
- [Quantization] Support low-precision data types in cast kernel
- [Tensor] Add tensor class and cast kernel
- [Fix] Fix a bug in mma emitter
- [Pass] Update the bound aware simplification pass
- [Example] Add example matmul-v7 with parallel-k implemented
- [Example] Add matmul-v6 that implements an efficient write back
- [Example] Add matmul-v5 example that implements software pipeline
- [Example] Add matmul-v4 that uses `copy_async` instruction
- [Instruction][Example] Refactor LoadMatrix instruction
- [Version] Use setuptools_scm to manage the version number
- [IR][Codegen] Add generic load/store instructions for shared tensor
- [Tools] Add `IRVerifier` to verify the integrity and correctness of IR
- [Examples] Add examples to the lint script and ci
- [Tuning] Add support of auto-tune
- [Script] Add `load_shared` and `store_shared` in Tilus Script
- [IR][GlobalTensor] Introduce GlobalTensor in the Tilus IR
- [Matmul] Add simple matmul example
- [Linter] Enable mypy disallow incomplete defs
- [Linter] Enable check-untyped-defs flag of mypy
- [IR] Refactor IR classes to enforce copy-on-write mechanism
- [Script] Add `tilus.script` module
- [Workflow] Refactor hidet installation and wheel building as actions
- [Build] Enable end to end compilation and build
- [IR] Add Program IR node to hold multiple functions
- [IR] Use short module name and fix some bugs
- [CI] Skip installation of hidet dependency to speed up format and lint
- [IR] Refactor the virtual machine IR
- [IR] Add the core IR of the tilus language
- [Init] Initial commit
Full Changelog: https://github.com/NVIDIA/tilus/commits/v0.1