
Conversation

sumukshashidhar (Collaborator)

configuration generator for the cli

@sumukshashidhar sumukshashidhar requested a review from alozowski June 4, 2025 11:44

@alozowski alozowski left a comment


This is a good CLI enhancement, but there are a few important points I commented on, especially around API key handling. We should also update the tests to cover .env edge cases (the .env file is missing, unreadable, or unwritable).

        i = int(idx.strip()) - 1
        if 0 <= i < len(models):
            selected.append(models[i]["model_name"])
    except ValueError:
Collaborator


This is a silent failure, and debugging it will be hard. I suggest logging it, something like this:

for idx in indices.split(","):
    try:
        i = int(idx.strip()) - 1
        if 0 <= i < len(models):
            selected.append(models[i]["model_name"])
        else:
            logger.warning(f"Model index {idx} is out of range (1-{len(models)})")
    except ValueError:
        logger.warning(f"Invalid model index '{idx}' - expected a number")

Collaborator Author


Fixed! Added proper logging for both out-of-range indices and invalid format errors. Now logs warnings with clear messages about what went wrong.

# Write to file
output.parent.mkdir(parents=True, exist_ok=True)
with open(output, "w") as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False, width=120)
Copy link
Collaborator


Writing here can fail for various reasons. Let's wrap it in a try/except:

try:
    ...  # the file operations here
except (OSError, PermissionError) as e:
    logger.error(f"Failed to write config: {e}")  # log the error
    raise typer.Exit(1)  # exit gracefully

This also helps when users try to write to read-only folders or lack the necessary permissions.

Collaborator Author


Wrapped the file operations in a try/except block with OSError and PermissionError handling. Shows user-friendly error messages and exits gracefully.
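As a rough stdlib-only sketch of that wrapping (json stands in for yaml and SystemExit for typer.Exit here, so the snippet runs without the project's dependencies; the function name is assumed):

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def write_config(config: dict, output: Path) -> None:
    """Write the generated config, failing gracefully on I/O errors."""
    try:
        output.parent.mkdir(parents=True, exist_ok=True)
        with open(output, "w") as f:
            json.dump(config, f, indent=2)
    except (OSError, PermissionError) as e:
        # Log a clear message instead of letting a traceback reach the user
        logger.error(f"Failed to write config to {output}: {e}")
        raise SystemExit(1)
```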


elif choice == 2:  # OpenAI Compatible
    config["base_url"] = "https://api.openai.com/v1"
    config["api_key"] = Prompt.ask("API key (use $VAR for env)", default="$OPENAI_API_KEY")
Collaborator


Right now, if a user enters a real API key instead of a variable reference like $OPENAI_API_KEY (there is no input validation), it gets written to config.yaml in plain text. This is a security risk, especially if the file is committed to git (we never know).

I suggest writing to an .env file instead and adding input validation, for example, blocking input that doesn't start with $

Collaborator Author


Added validation to prevent plain text API keys. Now enforces $VAR format and automatically creates/updates .env file with placeholders. Shows clear error if users try to enter real keys.
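A minimal sketch of how that validation and .env handling could look (the function names and the exact regex are assumptions for illustration, not the PR's actual code):

```python
import re
from pathlib import Path

# Accept only $VAR-style references such as "$OPENAI_API_KEY"
ENV_REF = re.compile(r"^\$[A-Z_][A-Z0-9_]*$")

def validate_env_ref(value: str) -> bool:
    """Reject anything that looks like a literal key rather than a $VAR reference."""
    return bool(ENV_REF.match(value))

def ensure_env_placeholder(env_path: Path, var_name: str) -> None:
    """Append a 'VAR=' placeholder to .env if the variable is not already listed."""
    existing = env_path.read_text() if env_path.exists() else ""
    names = {line.split("=", 1)[0].strip() for line in existing.splitlines() if "=" in line}
    if var_name not in names:
        with env_path.open("a") as f:
            f.write(f"{var_name}=\n")
```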

if Confirm.ask("Use custom tokenizer?", default=False):
    config["encoding_name"] = Prompt.ask("Encoding name", default="cl100k_base")
else:
    config["max_concurrent_requests"] = 16 if choice == 1 else 8
Collaborator


Magic numbers make code hard to maintain.

I suggest placing such values as named constants at the top of the file:

DEFAULT_CONCURRENT_REQUESTS_HF = 16
DEFAULT_CONCURRENT_REQUESTS_API = 8
DEFAULT_CHUNK_TOKENS = 256
DEFAULT_MAX_TOKENS = 16384

Collaborator Author


Moved all magic numbers to named constants at the top of the file for better maintainability.
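For illustration, the replacement of the inline literals might look like this (the helper function is hypothetical; the PR's actual structure may differ):

```python
# Defaults hoisted to module level, as suggested above
DEFAULT_CONCURRENT_REQUESTS_HF = 16
DEFAULT_CONCURRENT_REQUESTS_API = 8

def default_concurrency(choice: int) -> int:
    # choice == 1 means the Hugging Face backend, anything else an external API
    return DEFAULT_CONCURRENT_REQUESTS_HF if choice == 1 else DEFAULT_CONCURRENT_REQUESTS_API
```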

return {
    "ingestion": [model_name],
    "summarization": [model_name],
    "chunking": ["intfloat/multilingual-e5-large-instruct"],
Collaborator


I noticed we're assigning intfloat/multilingual-e5-large-instruct to the chunking stage by default, but the chunking stage is purely token-based since semantic_chunking was deprecated

Also, this model requires torch and transformers, which aren’t installed in the project, so this could break things or confuse users. Let’s remove it from model_roles["chunking"]?

TODO: update the example/configs/advanced_example.yaml as well and remove semantic_chunking

Collaborator Author


Removed the embedding model from chunking stage since it's purely token-based now. Also cleaned up model_roles to exclude chunking. Will update advanced_example.yaml in a follow-up commit.
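The cleaned-up default might look roughly like this (function name assumed for illustration):

```python
def default_model_roles(model_name: str) -> dict[str, list[str]]:
    # "chunking" omitted: that stage is purely token-based now,
    # so no embedding model (or torch/transformers) is needed
    return {
        "ingestion": [model_name],
        "summarization": [model_name],
    }
```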


@alozowski alozowski left a comment


Cool!

@sumukshashidhar sumukshashidhar merged commit ea584bb into main Jun 26, 2025
6 checks passed