
more streamlined APIs for typical use cases #330

Merged
davidkoski merged 6 commits into main from simple-api, Jun 13, 2025
Conversation

@davidkoski davidkoski requested review from angeloskath and awni June 12, 2025 16:58
  hub: HubApi, configuration: ModelConfiguration,
  progressHandler: @Sendable @escaping (Progress) -> Void
- ) async throws -> ModelContext {
+ ) async throws -> sending ModelContext {
davidkoski (Collaborator, Author):

This is the correct syntax for passing ownership back.
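A minimal standalone sketch of what `sending` means here (the type and function are hypothetical, not from this PR): a non-Sendable value may be returned across an isolation boundary because the callee transfers ownership of it to the caller.

```swift
// Hypothetical illustration of a `sending` result (Swift 6).
// `Model` is deliberately non-Sendable, yet the function can still hand it
// across an isolation boundary: `sending` transfers ownership of the value
// to the caller's isolation region.
final class Model {
    var name = ""
}

func makeModel() async throws -> sending Model {
    let model = Model()
    model.name = "example"
    return model  // the caller's region now owns `model`
}
```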


/// Default instance of HubApi to use. This is configured to save downloads into the caches directory.
public var defaultHubApi: HubApi = {
HubApi(downloadBase: FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first)
}()
davidkoski (Collaborator, Author):

Per feedback and experience, putting the downloaded weights in ~/Documents was problematic -- it synced to iCloud Documents and participated in backups on iOS devices.

Another option is to use ~/Downloads (like the MLXChatExample app does) but that requires specific entitlements. This will put them in ~/Library/Caches (or the equivalent in a container):

ls ~/Library/Caches/models/mlx-community 
Qwen3-0.6B-4bit Qwen3-4B-4bit   Qwen3-8B-4bit

davidkoski (Collaborator, Author):

See #332


public func load(
hub: HubApi = defaultHubApi, id: String,
progressHandler: @Sendable @escaping (Progress) -> Void = { _ in }
) async throws -> sending ModelContext {
davidkoski (Collaborator, Author):

All of these are missing documentation -- I will add that once we are in agreement on API.

davidkoski (Collaborator, Author):

This allows:

let model = try await LLMModelFactory.shared.load(id: "mlx-community/Qwen3-4B-4bit")

davidkoski (Collaborator, Author):

Question: we have had some feedback about load() doing both the download and the load. Do we want to address that here? It may be tricky as the current load() semantics cover both. We can easily add a download() method, for sure.

As far as simplicity goes, having this do both seems better, but it propagates the issue forward.
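A hedged sketch of what a separate download step might look like. No such method exists in this PR -- the comment only notes one could easily be added, so `download(id:)` is an assumed name and shape.

```swift
// Hypothetical split of the two phases: prefetch the weights first, then
// load from the local cache. `download(id:)` is an assumed name -- the PR
// only says such a method could be added.
let id = "mlx-community/Qwen3-4B-4bit"
try await LLMModelFactory.shared.download(id: id)         // fetch weights only
let model = try await LLMModelFactory.shared.load(id: id) // load from local cache
```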

Member:

I think we should keep it this way for now.

Member:

Sorry, naive question: why do we need LLMModelFactory.shared.load vs either just a free function load or simply LLMModelFactory.load?

davidkoski (Collaborator, Author):

Per discussion:

  • load() has layering issues as MLXLMCommon doesn't know how to load VLM/LLM (it is above those in the layering)
  • we can skip the .shared. part with a static function
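The second point could be sketched as a static convenience that simply forwards to the shared instance. This is an illustration of the idea, not the code that was merged:

```swift
// Hypothetical static convenience so callers can drop the `.shared.` hop.
// It only forwards to the existing instance method.
extension LLMModelFactory {
    public static func load(id: String) async throws -> sending ModelContext {
        try await LLMModelFactory.shared.load(id: id)
    }
}
```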

davidkoski (Collaborator, Author):

Though I wonder if we can do something dynamic ... let me try that.

davidkoski (Collaborator, Author):

Yep, that worked

Contributor:

Amazing!

public func load(
hub: HubApi = defaultHubApi, directory: URL,
progressHandler: @Sendable @escaping (Progress) -> Void = { _ in }
) async throws -> sending ModelContext {
davidkoski (Collaborator, Author):

This allows:

let model = try await LLMModelFactory.shared.load(directory: .homeDirectory.appending(component: "my-model"))

import Foundation
import MLX

private class Generator {
davidkoski (Collaborator, Author):

Code shared between the one-shot and session calls. It is a little more complex than just calling the methods as it handles some variants.

image: UserInput.Image? = nil, video: UserInput.Video? = nil,
processing: UserInput.Processing = .init(resize: CGSize(width: 512, height: 512)),
generateParameters: GenerateParameters = .init()
) async throws -> String {
davidkoski (Collaborator, Author):

A lot of arguments -- all but the prompt have defaults:

let model = try await LLMModelFactory.shared.load(id: "mlx-community/Qwen3-4B-4bit")
print(try await generate(model, "What are three things to see in Paris?"))

The others can be supplied if you are using a VLM and want to add an image, for example.
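For instance, the image argument might be supplied like this (a sketch; the model id and prompt are placeholders, and `.ciImage(CIImage.red)` mirrors the usage shown in the PR's test code):

```swift
// Hedged sketch: the same generate() call with an image attached, for a VLM.
// The model id here is a placeholder, not one used in this PR.
let vlm = try await VLMModelFactory.shared.load(id: "mlx-community/some-vlm-4bit")
let answer = try await generate(
    vlm, "Describe this image",
    image: .ciImage(CIImage.red))
print(answer)
```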

davidkoski (Collaborator, Author):

Do we like this API?

  • (pro) it matches the python API
  • (con) there are a lot of overloads of generate() -- is this confusing? we can highlight this one in the docs
  • (con) it doesn't match the naming of the FM api or the session api below (respond(to:))

davidkoski (Collaborator, Author):

Per discussion, removing these free functions -- they are covered by the ChatSession API, which has minimal overhead to create. You can just create & discard a session for the one-shot case.
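The one-shot pattern described here is then just the following, mirroring the ChatSession examples elsewhere in this PR:

```swift
// One-shot usage via ChatSession: create, ask once, discard.
let session = ChatSession(model)
print(try await session.respond(to: "What are three things to see in Paris?"))
```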

}

public func generate(
_ model: ModelContext, instructions: String? = nil, _ prompt: String,
davidkoski (Collaborator, Author):

Same as above but with a ModelContext (doesn't have the actor container). Actually the example code above uses this one.

return try await generator.generate()
}

public func stream(
davidkoski (Collaborator, Author):

Same as above but produce a streaming output:

for try await item in stream(model, prompt) {
    print(item, terminator: "")
}

Questions on the API:

  • should this be more like streamResponse(to:)?
  • stream() vs generate() -- should it be generateStream() or streamGenerate()?

return generator.stream()
}

public class ChatSession {
davidkoski (Collaborator, Author):

For chat sessions:

let session = ChatSession(model)

let questions = [
    "What are two things to see in San Francisco?",
    "How about a great place to eat?",
    "What city are we talking about?  I forgot!",
]

for question in questions {
    for try await item in session.streamResponse(to: question) {
        print(item, terminator: "")
    }
    print()
}

generateParameters: generateParameters)
}

public func respond(
davidkoski (Collaborator, Author):

These two methods closely match the FM API.

- hiddenSize: 128, hiddenLayers: 128, intermediateSize: 512, attentionHeads: 32,
- rmsNormEps: 0.00001, vocabularySize: 1500, kvHeads: 8)
+ hiddenSize: 64, hiddenLayers: 16, intermediateSize: 512, attentionHeads: 32,
+ rmsNormEps: 0.00001, vocabularySize: 100, kvHeads: 8)
davidkoski (Collaborator, Author):

I thought I reduced the size of these earlier -- no need for them to be that large, we just want to exercise the machinery.

)
}
)
}
davidkoski (Collaborator, Author):

The TestTokenizer gets a little more power -- it produces output like this:

rpxdjm twj rexpn tdrgdu tdrgdu xmrrds ldre lcowwy lcowwy lcowwy lcowwy nzlmfiz lmb lmb jkjkxz twj gefvypc lmb ldre klb ulipy cvvi tnxgjl oew cvvi xhqk unxxymp

print(try await session.respond(to: "what color is the sky?"))
print(try await session.respond(to: "why is that?"))
print(try await session.respond(to: "describe this image", image: .ciImage(CIImage.red)))
}
davidkoski (Collaborator, Author):

@awni @angeloskath some examples of the streamlined API

@@ -0,0 +1,50 @@
import MLXLLM
import MLXLMCommon
davidkoski (Collaborator, Author):

@angeloskath @awni this is meant as an example of the streamlined API with full integration

  • how does this look?
  • changes? improvements?
  • I tried to keep the code as simple as possible but still have legible output
  • if the output isn't considered it can be simpler:

let session = ChatSession(model)
print(try await session.respond(to: "What are three things to see in San Francisco?"))
print(try await session.respond(to: "How about a place to eat?"))

// add the assistant response to the chat messages
state.chat.append(.assistant(output))
// the kvcache now contains this context
state.chat.removeAll()
davidkoski (Collaborator, Author):

Unrelated to the API but I realized the messages/kvcache in the chat command line example were not right.


""")

for try await item in session.streamResponse(to: question) {
Member:

I slightly feel like this should be session.streamRespond(to: question) to match session.respond(to: question).

Member:

Or they could both be response instead of respond?

davidkoski (Collaborator, Author):

@awni
awni (Member) commented Jun 12, 2025:

This is really great. I left a couple inline comments / questions.

  • I think the main thing I'm unsure about (same as you) is if we should provide the generate API or not. It's a small improvement over the session in terms of usability. And yet it's also pretty nice and matches the Python version as well.
  • I'm wondering if we could simplify the loading to be ModelFactory.load("model/path") (or just a free function) and it figures out if it's an LLM or VLM dynamically?

Comment on lines +194 to +211
/// an `actor` providing an isolation context. Use this call when you control the isolation context
/// and can hold the `ModelContext` directly.
Member:

Should those be in double backticks?

Comment on lines +231 to +254
/// an `actor` providing an isolation context. Use this call when you control the isolation context
/// and can hold the `ModelContext` directly.
Member:

Same comment.


private let generator: Generator

/// Initialzie the ChatSession
Member:

Suggested change:
- /// Initialzie the ChatSession
+ /// Initialize the ChatSession

Also maybe ChatSession should be backticked?

generateParameters: generateParameters)
}

/// Initialzie the ChatSession
Member:

Suggested change:
- /// Initialzie the ChatSession
+ /// Initialize the ChatSession

/// - hub: optional HubApi -- by default uses ``defaultHubApi``
/// - directory: directory of configuration and weights
/// - progressHandler: optional callback for progress
/// - Returns: a ModelContainer
Member:

Maybe ModelContainer should be in backticks?

/// - hub: optional HubApi -- by default uses ``defaultHubApi``
/// - id: huggingface model identifier, e.g "mlx-community/Qwen3-4B-4bit"
/// - progressHandler: optional callback for progress
/// - Returns: a ModelContainer
Member:

Maybe ModelContainer should be in backticks?

/// - hub: optional HubApi -- by default uses ``defaultHubApi``
/// - id: huggingface model identifier, e.g "mlx-community/Qwen3-4B-4bit"
/// - progressHandler: optional callback for progress
/// - Returns: a ModelContext
Member:

ModelContext in backticks?

awni (Member) left a comment:

Minor nits in the docs. O/w looks awesome!

@davidkoski davidkoski merged commit 45563d4 into main Jun 13, 2025
1 check passed
@davidkoski davidkoski deleted the simple-api branch June 13, 2025 16:25