Commit dd5cda0

Add CMake flag for pipeline parallelism for multi-GPU (#940)
The llama.cpp (LCPP) default is 4, which is a bit too much in my opinion. Setting it to 2 saves some VRAM (0.5-1%?), some compute, and some electricity, at the expense of some potential performance (prompt processing?) that I do not notice in usage. 2 is thus my own setting.
1 parent f7a0d25 commit dd5cda0

1 file changed, 2 additions and 0 deletions

CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,7 @@ set(LLAMA_GPROF OFF)
 set(LLAMA_SANITIZE_THREAD OFF)
 set(LLAMA_SANITIZE_ADDRESS OFF)
 set(LLAMA_SANITIZE_UNDEFINED OFF)
+set(LLAMA_SCHED_MAX_COPIES "2" CACHE STRING "llama: max input copies for pipeline parallelism")
 
 # instruction set specific
 option(LLAMA_AVX "llama: enable AVX" ON)
@@ -66,6 +67,7 @@ set(THREADS_PREFER_PTHREAD_FLAG ON)
 find_package(Threads REQUIRED)
 
 add_compile_definitions(LOG_DISABLE_LOGS)
+add_compile_definitions(GGML_SCHED_MAX_COPIES=${LLAMA_SCHED_MAX_COPIES})
 
 file(GLOB GGML_SOURCES_CUDA "ggml-cuda/*.cu")
 list(APPEND GGML_SOURCES_CUDA "ggml-cuda.cu")
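
Because `LLAMA_SCHED_MAX_COPIES` is declared as a cache variable, it should be overridable at configure time without editing CMakeLists.txt. A minimal sketch, assuming an out-of-tree build in a directory named `build` (the directory name is an assumption, and any other options your build needs, such as the GPU backend flag, are omitted here):

```shell
# Configure with 4 copies instead of the default of 2 set by this commit.
# -D writes the cache entry first, so the set(... CACHE STRING ...) line
# in CMakeLists.txt does not overwrite it.
cmake -B build -DLLAMA_SCHED_MAX_COPIES=4

# Build as usual; the value reaches the compiler as
# -DGGML_SCHED_MAX_COPIES=4 through add_compile_definitions.
cmake --build build
```

Since the value is cached, an existing build directory keeps whatever value it was first configured with until the variable is passed again with `-D` or the cache is cleared.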
