Commit 6cdfffd
Remove dynamic memory allocation during runtime
This commit addresses review comments.
Also, we have separated the legacy mnpack path from the
matmul_tiled path in the tinyBLAS_Q0_PPC class.
Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
10-30% improvement in PP (prompt processing) speed with Q4_0 and Q8_0 models.
Tested with Meta-Llama3-8B quantized models using llama-bench and
llama-batched-bench.
1 parent 52fb79b commit 6cdfffd
2 files changed
Lines changed: 44 additions & 372 deletions