v0.1

@yaoyaoding yaoyaoding released this 05 Aug 22:46
· 25 commits to main since this release
c5a0d16

The initial release of tilus.

What's Changed

  • [CI] Add workflow to deploy wheel to pypi (#10)
  • [CI] Use deep checkout for diff (#9)
  • [Docs] Update copyright and remove some redundant descriptions (#8)
  • [CI] Use nvidia github runners for docs building (#7)
  • [CI] Update docs and ci runner for format (#6)
  • [CI] Fix the permission issue of deploy github pages (#5)
  • [CI][Docs] Add workflow to deploy docs (#4)
  • [CI] Migrate CI runners (#2)
  • [License] Add license header and contribution guide
  • [Misc] Add vscode settings
  • [Docs] Add documentation for layout system
  • [Docs] Add more sections in programming guides
  • [Docs] Add the framework of programming guide
  • [Docs] Add documentation for the remaining matmuls
  • [Docs] Add docs for two matmul examples
  • [Docs] Add the documentation for naive matmul
  • [Docs] Add initial version of docs
  • [Bugfix] Improve the performance of HoistLoopInvariants pass
  • [Pass] Add HoistLoopInvariants pass
  • [Pass] Add affine to recursive transformation pass
  • [Pass] Explicitly list the used hidet passes
  • [Bugfix] Fix OOM issue in attention example
  • [Example] Optimize the attention operator by splitting the sequence of kv
  • [Example] Remove explicit layout in examples
  • [Operator] Optimize attention operator
  • [Kernel] Optimize attention kernel
  • [Tool] Update tilus IRPrinter
  • [Script] Support script procedure
  • [Operator] Optimize the attention operator by pipelining
  • [Operator] Optimize attention operator with software pipelining
  • [Bugfix] Fix a bug in attention example
  • [Example] Add attention example
  • [Package] Update information in pyproject.toml
  • [Submodule] Remove .gitmodules
  • [Feature] Automatic Layout Inference
  • [Layout] Remove old layout definition
  • [Layout] Use the new layout system in the emitters
  • [Layout] Add the unified representation of layout system
  • [Bug] Unify the segments of dynamic shape for tuning
  • [Layout] Add transpose operation for register layout
  • [Cache] Add cache for instantiated_script
  • [Enhancement] Add more instructions and functionality
  • [Reduce] Support reduce instruction
  • [Instruction] Add repeat and repeat_interleave instructions
  • [Optimize] Optimize layout for cast kernel
  • [Options] Configure hidet option to avoid ftz flag in nvcc
  • [CUDA] Avoid any cuda runtime api call during import
  • [Fix] Fix a bug in quantization example
  • [Sync] Upstream changes to hidet
  • [Refactor] Simplify and generalize the dot instruction
  • [Example] Add quantized matmul with full range of quantized data types
  • [Quantization] Support low-precision data types in cast kernel
  • [Tensor] Add tensor class and cast kernel
  • [Fix] Fix a bug in mma emitter
  • [Pass] Update the bound aware simplification pass
  • [Example] Add example matmul-v7 with parallel-k implemented
  • [Example] Add matmul-v6 that implements an efficient write back
  • [Example] Add matmul-v5 example that implements software pipeline
  • [Example] Add matmul-v4 that uses copy_async instruction
  • [Instruction][Example] Refactor LoadMatrix instruction
  • [Version] Use setuptools_scm to manage the version number
  • [IR][Codegen] Add generic load/store instructions for shared tensor
  • [Tools] Add IRVerifier to verify the integrity and correctness of IR
  • [Examples] Add examples to the lint script and ci
  • [Tuning] Add support of auto-tune
  • [Script] Add load_shared and store_shared in Tilus Script
  • [IR][GlobalTensor] Introduce GlobalTensor in the Tilus IR
  • [Matmul] Add simple matmul example
  • [Linter] Enable mypy disallow incomplete defs
  • [Linter] Enable check-untyped-defs flag of mypy
  • [IR] Refactor IR classes to enforce copy-on-write mechanism
  • [Script] Add tilus.script module
  • [Workflow] Refactor hidet installation and wheel building as actions
  • [Build] Enable end to end compilation and build
  • [IR] Add Program IR node to hold multiple functions
  • [IR] Use short module name and fix some bugs
  • [CI] Skip installation of hidet dependency to speed up format and lint
  • [IR] Refactor the virtual machine IR
  • [IR] Add the core IR of the tilus language
  • [Init] Initial commit

Full Changelog: https://github.com/NVIDIA/tilus/commits/v0.1