Commit 62298bc

perf: customize cublasLt algo for Llama 3.3 70B TP4 (#6315)
Signed-off-by: Zhenhua Wang <[email protected]>
1 parent 7b6aadc commit 62298bc

2 files changed, +4 -1 lines changed

.clangd

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ CompileFlags:
 # Tweak the clangd parse settings for all files
 CompileFlags:
   Compiler: clang++
-  CompilationDatabase: .
+  CompilationDatabase: cpp/build
   Add:
     # report all errors
     - "-ferror-limit=0"

cpp/tensorrt_llm/thop/cublasScaledMM.cpp

Lines changed: 3 additions & 0 deletions
@@ -66,6 +66,9 @@ AlgoListType fp8_algo_list = {
     {{8, 8192, 8192}, {393, 36, 1, 0, 0, 5, 2}},
     // [-algo66 -m_tile10 -m_stages36 -m_numsK1 -m_reduction0 -m_swizzle0 -m_custom1 -m_mma0 -m_cga2 -m_scheduling1]
     {{8, 8192, 57344}, {10, 36, 1, 0, 0, 1, 2}},
+    // Llama-3.3-70B TP4 (this is the default algo on B200. Here we aim to use the same algo on GB200.)
+    // [-algo66 -m_tile393 -m_stages36 -m_numsK1 -m_reduction0 -m_swizzle0 -m_custom1 -m_mma0 -m_cga4 -m_scheduling1]
+    {{8, 8192, 14336}, {393, 36, 1, 0, 1, 1, 4}},
 };
 
 void set_algo_attr(cublasLtMatmulAlgo_t& algo, std::array<int, 7> const& attr_list)
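
The hunk above adds one more shape-keyed entry to `fp8_algo_list`. As a rough illustration of how such a table can be consulted, here is a minimal, self-contained sketch. It is not the code in `cublasScaledMM.cpp`: the definition of `AlgoListType` is not part of this diff, so `ShapeKey`, `AlgoAttrs`, `AlgoTable`, `example_fp8_algo_table`, and `find_algo` are hypothetical names, and `std::map` is only a stand-in container. The meaning and order of the seven attributes (tile, stages, numsK, reduction, swizzle, custom, cga) is an assumption read off the flag comments in the diff, not something the diff confirms.

```cpp
#include <array>
#include <map>
#include <optional>

// Hypothetical stand-ins; the real AlgoListType is not shown in this diff.
using ShapeKey = std::array<int, 3>;  // assumed to be the {m, n, k} GEMM shape
using AlgoAttrs = std::array<int, 7>; // assumed order: tile, stages, numsK,
                                      // reduction, swizzle, custom, cga
using AlgoTable = std::map<ShapeKey, AlgoAttrs>;

// Table holding the three entries visible in the hunk, copied verbatim.
inline AlgoTable const& example_fp8_algo_table()
{
    static AlgoTable const table = {
        {{8, 8192, 8192}, {393, 36, 1, 0, 0, 5, 2}},
        {{8, 8192, 57344}, {10, 36, 1, 0, 0, 1, 2}},
        {{8, 8192, 14336}, {393, 36, 1, 0, 1, 1, 4}}, // entry added by this commit
    };
    return table;
}

// Returns the tuned attributes for an exact shape match, or std::nullopt so
// that a caller could fall back to whatever default algorithm it normally uses.
inline std::optional<AlgoAttrs> find_algo(AlgoTable const& table, int m, int n, int k)
{
    auto const it = table.find(ShapeKey{m, n, k});
    if (it == table.end())
    {
        return std::nullopt;
    }
    return it->second;
}
```

In this sketch, `find_algo(example_fp8_algo_table(), 8, 8192, 14336)` yields the new entry, while any shape not listed yields `std::nullopt`. Matching on the exact shape keeps the override narrow, so a tuned configuration cannot leak onto GEMM shapes it was never measured for.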