Prerequisites
Feature Description
The documentation for /completion still describes this obsolete field:
image_data: An array of objects to hold base64-encoded image data and their ids to be referenced in the prompt. You can determine the place of the image in the prompt as in the following: USER:[img-12]Describe the image in detail.\nASSISTANT:. In this case, [img-12] will be replaced by the embeddings of the image with id 12 in the following image_data array: {..., "image_data": [{"data": "<BASE64_STRING>", "id": 12}]}. Use image_data only with multimodal models, e.g., LLaVA.
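For reference, the request shape the quoted documentation implies would look roughly like this. This is only a sketch: the server address, image file name, and n_predict value are placeholder assumptions, not taken from the documentation.

```python
# Minimal sketch of the documented /completion request.
# Assumes a llama.cpp server at localhost:8080 and a local image file;
# the port, file name, and n_predict value are placeholders.
import base64
import requests

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # Per the documentation, [img-12] should be replaced by the
    # embeddings of the image with id 12 from the image_data array.
    "prompt": "USER:[img-12]Describe the image in detail.\nASSISTANT:",
    "n_predict": 128,
    "image_data": [{"data": image_b64, "id": 12}],
}

resp = requests.post("http://localhost:8080/completion", json=payload)
print(resp.json()["content"])
```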
However, when passing a prompt containing [img-1] to a multimodal model loaded along with its corresponding mmproj, the model does not understand the image. It works fine with the /chat/completions endpoint, though.
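For comparison, the equivalent request through the OpenAI-compatible endpoint does produce an image-aware answer. A rough sketch follows; the model name and the data-URL content format are assumptions based on the OpenAI-style chat API, not something specified in this issue.

```python
# Sketch of the working /v1/chat/completions request for comparison.
# The model name is a placeholder; the image is sent as a base64 data URL
# inside an OpenAI-style "image_url" content part.
import base64
import requests

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "llava",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```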
Motivation
My project does its own prompt formatting and communicates with llama.cpp through /completion. I would like to integrate llama.cpp's multimodal support but am unable to because of the limitation described above.
Possible Implementation
No response