benoitc
diff --git a/docs/content/2026-news.md b/docs/content/2026-news.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/content/dirty.md b/docs/content/dirty.md
Lines changed: 139 additions & 5 deletions
diff --git a/docs/content/reference/settings.md b/docs/content/reference/settings.md
Lines changed: 27 additions & 3 deletions
diff --git a/gunicorn/__init__.py b/gunicorn/__init__.py
Lines changed: 1 addition & 1 deletion
diff --git a/gunicorn/config.py b/gunicorn/config.py
Lines changed: 27 additions & 3 deletions
@@ -16,6 +16,15 @@
   - Lifecycle hooks: `on_dirty_starting`, `dirty_post_fork`,
     `dirty_worker_init`, `dirty_worker_exit`
 
+- **Per-App Worker Allocation for Dirty Arbiters**: Control how many dirty workers
+  load each app, reducing memory use when apps hold heavy models
+  ([PR #3473](https://github.com/benoitc/gunicorn/pull/3473))
+  - Set `workers` class attribute on DirtyApp (e.g., `workers = 2`)
+  - Or use config format `module:class:N` (e.g., `myapp:HeavyModel:2`)
+  - Requests automatically routed to workers with the target app
+  - New exception `DirtyNoWorkersAvailableError` for graceful error handling
+  - Example: 8 workers × 10GB model = 80GB → with `workers=2`: 20GB (75% savings)
+
 - **HTTP/2 Support (Beta)**: Native HTTP/2 (RFC 7540) support for improved performance
   with modern clients ([PR #3468](https://github.com/benoitc/gunicorn/pull/3468))
   - Multiplexed streams over a single connection
 
@@ -89,8 +89,10 @@ This makes dirty apps ideal for ML inference, where loading a model once and reu
                                           |   |        |   |       |   |
                                           +---+--------+---+-------+---+
                                                        |
-                                     All workers load all dirty apps
-                                          [MLApp, ImageApp, ...]
+                                     Workers load apps based on allocation
+                                     Worker 1: [MLApp, ImageApp, HeavyApp]
+                                     Worker 2: [MLApp, ImageApp, HeavyApp]
+                                     Worker 3: [MLApp, ImageApp]  (HeavyApp workers=2)
 ```
 
 ### Process Relationships
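The allocation diagram in this hunk (and the Worker Distribution example added to dirty.md further down) places each limited app on the lowest-numbered workers. A minimal sketch of that rule, assuming apps fill workers from worker 1 in spawn order; `assign_apps` is a hypothetical helper, not gunicorn's arbiter code.

```python
from typing import Dict, List, Optional


def assign_apps(apps: Dict[str, Optional[int]], num_workers: int) -> Dict[int, List[str]]:
    """Return {worker_number: [app paths]} for workers numbered 1..num_workers.

    `apps` maps an app path to its worker limit; None means "every worker".
    Each app is placed on the first `limit` workers in spawn order.
    """
    allocation: Dict[int, List[str]] = {w: [] for w in range(1, num_workers + 1)}
    for app_path, limit in apps.items():
        count = num_workers if limit is None else min(limit, num_workers)
        for worker in range(1, count + 1):
            allocation[worker].append(app_path)
    return allocation


if __name__ == "__main__":
    layout = assign_apps({"LightApp": None, "HeavyModelApp": 2, "SingletonApp": 1}, num_workers=4)
    for worker, loaded in layout.items():
        print(f"Worker {worker}: {loaded}")
```

Run as a script, it prints the same four rows as the `dirty_workers=4` example in dirty.md (Worker 1 holds all three apps, Workers 3 and 4 only `LightApp`).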
@@ -138,6 +140,133 @@ gunicorn myapp:app \
 | `dirty_threads` | `1` | Threads per dirty worker |
 | `dirty_graceful_timeout` | `30` | Graceful shutdown timeout |
 
+## Per-App Worker Allocation
+
+By default, all dirty workers load all configured apps. For apps that consume significant memory (like large ML models), you can limit how many workers load a specific app.
+
+### Why Per-App Allocation?
+
+Consider a scenario with a 10GB ML model and 8 dirty workers:
+
+- **Default behavior**: 8 workers × 10GB = 80GB RAM
+- **With `workers=2`**: 2 workers × 10GB = 20GB RAM (75% savings)
+
+Requests for the limited app are routed only to workers that have it loaded.
+
+### Configuration Methods
+
+**Method 1: Class Attribute**
+
+Set the `workers` attribute on your DirtyApp class:
+
+```python
+from gunicorn.dirty import DirtyApp
+
+class HeavyModelApp(DirtyApp):
+    workers = 2  # Only 2 workers will load this app
+
+    def init(self):
+        self.model = load_10gb_model()
+
+    def predict(self, data):
+        return self.model.predict(data)
+
+    def close(self):
+        pass
+```
+
+**Method 2: Config Override**
+
+Use the `module:class:N` format in your config:
+
+```python
+# gunicorn.conf.py
+dirty_apps = [
+    "myapp.light:LightApp",           # All workers (default)
+    "myapp.heavy:HeavyModelApp:2",    # Only 2 workers
+    "myapp.single:SingletonApp:1",    # Only 1 worker
+]
+dirty_workers = 4
+```
+
+Config overrides take precedence over class attributes.
+
+### Worker Distribution
+
+When workers spawn, apps are assigned based on their limits:
+
+```
+Example with dirty_workers=4:
+  - LightApp (workers=None):  Loaded on workers 1, 2, 3, 4
+  - HeavyModelApp (workers=2): Loaded on workers 1, 2
+  - SingletonApp (workers=1):  Loaded on worker 1
+
+Worker 1: [LightApp, HeavyModelApp, SingletonApp]
+Worker 2: [LightApp, HeavyModelApp]
+Worker 3: [LightApp]
+Worker 4: [LightApp]
+```
+
+### Request Routing
+
+Requests are automatically routed to workers that have the target app:
+
+```python
+from gunicorn.dirty import get_dirty_client
+client = get_dirty_client()
+# Goes to any of 4 workers (round-robin)
+client.execute("myapp.light:LightApp", "action")
+
+# Goes to worker 1 or 2 only (round-robin between those)
+client.execute("myapp.heavy:HeavyModelApp", "predict", data)
+
+# Always goes to worker 1
+client.execute("myapp.single:SingletonApp", "process")
+```
+
+### Error Handling
+
+If no workers have the requested app loaded, a `DirtyNoWorkersAvailableError` is raised:
+
+```python
+from gunicorn.dirty import get_dirty_client
+from gunicorn.dirty.errors import DirtyNoWorkersAvailableError
+
+def my_view(request):
+    client = get_dirty_client()
+    try:
+        return {"result": client.execute("myapp.heavy:HeavyModelApp", "predict", data)}
+    except DirtyNoWorkersAvailableError as e:
+        # All workers with this app are down or app not configured
+        return {"error": "Service temporarily unavailable", "app": e.app_path}
+```
+
+### Worker Crash Recovery
+
+When a worker crashes, its replacement gets the **same apps** as the dead worker:
+
+```
+Timeline:
+  t=0: Worker 1 crashes (had HeavyModelApp)
+  t=1: Arbiter detects crash, queues respawn
+  t=2: New Worker 5 spawns with same apps as Worker 1
+  t=0..t=2: HeavyModelApp remains available on Worker 2 during the gap
+```
+
+This ensures:
+
+- No memory redistribution on existing workers
+- Predictable replacement behavior
+- Only the replacement worker pays the cost of loading the heavy model
+
+### Best Practices
+
+1. **Set realistic limits** - Don't set `workers=1` unless truly necessary (single point of failure)
+2. **Monitor memory** - Track per-worker memory to tune allocation
+3. **Handle unavailability** - Catch `DirtyNoWorkersAvailableError` gracefully
+4. **Use class attributes for app-specific limits** - Makes the limit part of the app definition
+5. **Use config for deployment-specific overrides** - Different limits for dev vs prod
+
 ## Creating a Dirty App
 
 Dirty apps inherit from `DirtyApp` and implement three methods:
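The Worker Crash Recovery section in this hunk says a replacement worker inherits exactly the apps of the worker it replaces. A minimal sketch of that bookkeeping, assuming the arbiter keeps a pid-to-apps map plus a FIFO of pending respawns; `WorkerTable`, `mark_dead` and `spawn_replacement` are hypothetical names, not gunicorn internals.

```python
from collections import deque
from typing import Deque, Dict, List


class WorkerTable:
    """Hypothetical arbiter bookkeeping: which apps each dirty worker pid loaded."""

    def __init__(self) -> None:
        self.apps_by_pid: Dict[int, List[str]] = {}
        self.pending: Deque[List[str]] = deque()  # app lists waiting for a respawn

    def register(self, pid: int, apps: List[str]) -> None:
        self.apps_by_pid[pid] = list(apps)

    def mark_dead(self, pid: int) -> None:
        # Forget the dead pid but queue its app list for the replacement.
        self.pending.append(self.apps_by_pid.pop(pid))

    def spawn_replacement(self, new_pid: int) -> List[str]:
        # The new worker gets exactly the dead worker's apps; no other worker changes.
        apps = self.pending.popleft()
        self.apps_by_pid[new_pid] = apps
        return apps

    def workers_for(self, app_path: str) -> List[int]:
        return [pid for pid, apps in self.apps_by_pid.items() if app_path in apps]


if __name__ == "__main__":
    table = WorkerTable()
    table.register(1, ["LightApp", "HeavyModelApp"])
    table.register(2, ["LightApp", "HeavyModelApp"])
    table.register(3, ["LightApp"])
    table.mark_dead(1)                          # t=0: worker 1 crashes
    print(table.workers_for("HeavyModelApp"))   # [2] during the gap
    table.spawn_replacement(5)                  # t=2: worker 5 takes over worker 1's apps
    print(table.workers_for("HeavyModelApp"))   # [2, 5]
```

Because the replacement reuses the dead worker's list, surviving workers never load or drop apps, which matches the "no memory redistribution" point in the diff.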
@@ -190,8 +319,9 @@ class MLApp(DirtyApp):
 
 ### DirtyApp Interface
 
-| Method | Description |
-|--------|-------------|
+| Method/Attribute | Description |
+|------------------|-------------|
+| `workers` | Class attribute. Number of dirty workers that load this app (`None` = all workers). |
 | `init()` | Called once when dirty worker starts, after instantiation. Load resources here. |
 | `__call__(action, *args, **kwargs)` | Handle requests from HTTP workers. |
 | `close()` | Called when dirty worker shuts down. Cleanup resources. |
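The new `workers` row in this table, together with the `module:Class:N` config format, implies a resolution order: a config suffix wins, then the class attribute, then "all workers". A sketch under that reading; `DirtyAppStub` stands in for `gunicorn.dirty.DirtyApp` so the code runs on its own, and capping the limit at the pool size is an assumption the diff does not state.

```python
from typing import Optional


class DirtyAppStub:
    """Stand-in for gunicorn.dirty.DirtyApp so the sketch runs on its own."""

    workers: Optional[int] = None  # None = load on every dirty worker


class HeavyModelApp(DirtyAppStub):
    workers = 2


def effective_limit(app_cls: type, config_limit: Optional[int], total_workers: int) -> int:
    """How many dirty workers should load `app_cls`.

    Order: a `:N` config suffix, then the class attribute, then all workers.
    Capping at `total_workers` is an assumption of this sketch.
    """
    limit = config_limit if config_limit is not None else getattr(app_cls, "workers", None)
    if limit is None:
        return total_workers
    return min(limit, total_workers)


if __name__ == "__main__":
    print(effective_limit(HeavyModelApp, None, 8))  # 2: class attribute
    print(effective_limit(HeavyModelApp, 1, 8))     # 1: config suffix wins
    print(effective_limit(DirtyAppStub, None, 8))   # 8: no limit anywhere
```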
@@ -604,12 +734,13 @@ watch -n 1 'pstree -p $(cat gunicorn.pid)'
 The dirty client raises specific exceptions:
 
 ```python
-from gunicorn.dirty import (
+from gunicorn.dirty.errors import (
     DirtyError,
     DirtyTimeoutError,
     DirtyConnectionError,
     DirtyAppError,
     DirtyAppNotFoundError,
+    DirtyNoWorkersAvailableError,
 )
 
 try:
@@ -620,6 +751,9 @@ except DirtyTimeoutError:
 except DirtyAppNotFoundError:
     # App not loaded in dirty workers
     pass
+except DirtyNoWorkersAvailableError as e:
+    # No workers have this app (all crashed or app limited to 0 workers)
+    print(f"No workers for app: {e.app_path}")
 except DirtyAppError as e:
     # Error during app execution
     print(f"App error: {e.message}, traceback: {e.traceback}")
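The exception example above catches `DirtyNoWorkersAvailableError` and reads `e.app_path`, while dirty.md describes round-robin routing restricted to workers that have the target app. A sketch of that routing, assuming one rotation cursor per app; `Router` and `NoWorkersAvailable` are hypothetical stand-ins, and gunicorn's real dispatcher may pick workers differently.

```python
from typing import Dict, List


class NoWorkersAvailable(Exception):
    """Local stand-in for gunicorn.dirty.errors.DirtyNoWorkersAvailableError."""

    def __init__(self, app_path: str) -> None:
        super().__init__(f"no dirty worker has {app_path!r} loaded")
        self.app_path = app_path


class Router:
    """Round-robin over the workers that have a given app loaded (hypothetical)."""

    def __init__(self, apps_by_worker: Dict[int, List[str]]) -> None:
        self.apps_by_worker = apps_by_worker
        self._cursors: Dict[str, int] = {}  # one rotation position per app

    def pick(self, app_path: str) -> int:
        eligible = sorted(w for w, apps in self.apps_by_worker.items() if app_path in apps)
        if not eligible:
            raise NoWorkersAvailable(app_path)
        position = self._cursors.get(app_path, 0)
        self._cursors[app_path] = position + 1
        return eligible[position % len(eligible)]


if __name__ == "__main__":
    router = Router({1: ["Light", "Heavy", "Single"], 2: ["Light", "Heavy"], 3: ["Light"], 4: ["Light"]})
    print([router.pick("Heavy") for _ in range(4)])  # [1, 2, 1, 2]
    print([router.pick("Light") for _ in range(5)])  # [1, 2, 3, 4, 1]
    try:
        router.pick("Missing")
    except NoWorkersAvailable as exc:
        print("unavailable:", exc.app_path)           # unavailable: Missing
```

Keeping a separate cursor per app means a burst of requests to one app does not skew the rotation of another.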
 
@@ -208,24 +208,48 @@ DirtyArbiter and the exiting DirtyWorker.
 
 Dirty applications to load in the dirty worker pool.
 
-A list of application paths in pattern ``$(MODULE_NAME):$(CLASS_NAME)``.
+A list of application paths in one of these formats:
+
+- ``$(MODULE_NAME):$(CLASS_NAME)`` - all workers load this app
+- ``$(MODULE_NAME):$(CLASS_NAME):$(N)`` - only N workers load this app
+
 Each dirty app must be a class that inherits from ``DirtyApp`` base class
 and implements the ``init()``, ``__call__()``, and ``close()`` methods.
 
 Example::
 
     dirty_apps = [
-        "myapp.ml:MLApp",
-        "myapp.images:ImageApp",
+        "myapp.ml:MLApp",           # All workers load this
+        "myapp.images:ImageApp",    # All workers load this
+        "myapp.heavy:HugeModel:2",  # Only 2 workers load this
     ]
 
+The per-app worker limit is useful for memory-intensive applications
+like large ML models. Instead of all 8 workers loading a 10GB model
+(80GB total), you can limit it to 2 workers (20GB total).
+
+Alternatively, you can set the ``workers`` class attribute on your
+DirtyApp subclass::
+
+    class HugeModel(DirtyApp):
+        workers = 2  # Only 2 workers load this app
+
+        def init(self):
+            self.model = load_10gb_model()
+
+Note: The config format (``module:Class:N``) takes precedence over
+the class attribute if both are specified.
+
 Dirty apps are loaded once when the dirty worker starts and persist
 in memory for the lifetime of the worker. This is ideal for loading
 ML models, database connection pools, or other stateful resources
 that are expensive to initialize.
 
 !!! info "Added in 25.0.0"
 
+!!! info "Changed in 25.1.0"
+    Added per-app worker allocation via ``:N`` format suffix.
+
 ### `dirty_workers`
 
 **Command line:** `--dirty-workers INT`
 
@@ -2,7 +2,7 @@
 # This file is part of gunicorn released under the MIT license.
 # See the NOTICE for more information.
 
-version_info = (24, 1, 1)
+version_info = (25, 0, 0)
 __version__ = ".".join([str(v) for v in version_info])
 SERVER = "gunicorn"
 SERVER_SOFTWARE = "%s/%s" % (SERVER, __version__)
@@ -2885,23 +2885,47 @@ class DirtyApps(Setting):
     desc = """\
         Dirty applications to load in the dirty worker pool.
 
-        A list of application paths in pattern ``$(MODULE_NAME):$(CLASS_NAME)``.
+        A list of application paths in one of these formats:
+
+        - ``$(MODULE_NAME):$(CLASS_NAME)`` - all workers load this app
+        - ``$(MODULE_NAME):$(CLASS_NAME):$(N)`` - only N workers load this app
+
         Each dirty app must be a class that inherits from ``DirtyApp`` base class
         and implements the ``init()``, ``__call__()``, and ``close()`` methods.
 
         Example::
 
             dirty_apps = [
-                "myapp.ml:MLApp",
-                "myapp.images:ImageApp",
+                "myapp.ml:MLApp",           # All workers load this
+                "myapp.images:ImageApp",    # All workers load this
+                "myapp.heavy:HugeModel:2",  # Only 2 workers load this
             ]
 
+        The per-app worker limit is useful for memory-intensive applications
+        like large ML models. Instead of all 8 workers loading a 10GB model
+        (80GB total), you can limit it to 2 workers (20GB total).
+
+        Alternatively, you can set the ``workers`` class attribute on your
+        DirtyApp subclass::
+
+            class HugeModel(DirtyApp):
+                workers = 2  # Only 2 workers load this app
+
+                def init(self):
+                    self.model = load_10gb_model()
+
+        Note: The config format (``module:Class:N``) takes precedence over
+        the class attribute if both are specified.
+
         Dirty apps are loaded once when the dirty worker starts and persist
         in memory for the lifetime of the worker. This is ideal for loading
         ML models, database connection pools, or other stateful resources
         that are expensive to initialize.
 
         .. versionadded:: 25.0.0
+
+        .. versionchanged:: 25.1.0
+           Added per-app worker allocation via ``:N`` format suffix.
         """
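The docstring above accepts both `module:Class` and `module:Class:N`. A minimal parser for those two shapes, assuming a spec contains no other colons; `parse_dirty_app_spec` is hypothetical, and rejecting `N < 1` is this sketch's own choice, since the docs do not say whether `:0` is valid.

```python
from typing import Optional, Tuple


def parse_dirty_app_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split "module:Class" or "module:Class:N" into (import path, limit).

    A missing ":N" yields None, i.e. defer to the class attribute, else all workers.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        return spec, None
    if len(parts) == 3:
        module, cls, raw_count = parts
        try:
            count = int(raw_count)
        except ValueError:
            raise ValueError(f"invalid worker count in {spec!r}") from None
        if count < 1:  # this sketch's own rule; the docs do not define :0
            raise ValueError(f"worker count must be >= 1 in {spec!r}")
        return f"{module}:{cls}", count
    raise ValueError(f"expected module:Class or module:Class:N, got {spec!r}")


if __name__ == "__main__":
    for spec in ["myapp.ml:MLApp", "myapp.heavy:HugeModel:2"]:
        print(spec, "->", parse_dirty_app_spec(spec))
```

Returning `None` rather than a number for the two-part form keeps the "config overrides class attribute" rule easy to apply later: only an explicit suffix counts as an override.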