Commit 6cdfffd

Remove dynamic memory allocation during runtime
This commit addresses review comments. It also separates the legacy mnpack path from the matmul_tiled path in the tinyBLAS_Q0_PPC class.

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>

10-30% improvement in prompt-processing (PP) speed with Q4_0 and Q8_0 models. Tested with Meta-Llama3-8B quantized models using llama-bench and llama-batched-bench.
1 parent 52fb79b commit 6cdfffd

2 files changed

Lines changed: 44 additions & 372 deletions
