Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations by danielaskdd · Pull Request #2401 · HKUDS/LightRAG

danielaskdd · 2025-11-21T04:14:28Z

Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations

Summary

Refactored the keyword_extraction parameter handling in OpenAI and Azure OpenAI LLM implementations to follow the DRY (Don't Repeat Yourself) principle. All keyword extraction logic is now centralized in the base *_complete_if_cache functions, eliminating code duplication across wrapper functions.

Enhance keyword extraction compatibility to handle cases where the LLM cannot reliably generate JSON output.

Changes

`lightrag/llm/openai.py`

Enhanced openai_complete_if_cache: Added keyword extraction handling that sets response_format to GPTKeywordExtractionFormat when keyword_extraction=True
Simplified wrapper functions: Removed redundant keyword extraction logic from:
- openai_complete() - removed inconsistent "json" format handling
- gpt_4o_complete() - removed duplicate format setting
- gpt_4o_mini_complete() - removed duplicate format setting

`lightrag/llm/azure_openai.py`

Added import: GPTKeywordExtractionFormat from lightrag.types
Enhanced azure_openai_complete_if_cache:
- Added keyword_extraction: bool = False parameter
- Implemented keyword extraction logic with GPTKeywordExtractionFormat
- Removed redundant kwargs.pop("keyword_extraction", None)
Updated azure_openai_complete: Now properly passes keyword_extraction parameter to the base function

Benefits

✅ Single source of truth: All keyword extraction logic centralized in base functions
✅ Consistency: Both implementations use the same GPTKeywordExtractionFormat
✅ Maintainability: Future changes only need to be made in one location per file
✅ Code quality: Eliminates code duplication and improves readability

Testing

Python syntax validation passed
Backward compatibility maintained
All wrapper functions continue to work as expected

Breaking Changes

None. This is a pure refactoring that maintains full backward compatibility.

• Move response format to core function • Remove duplicate format assignments • Standardize keyword extraction flow • Clean up redundant parameter handling • Improve Azure OpenAI compatibility

danielaskdd · 2025-11-21T04:16:06Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

lightrag/llm/openai.py

chatgpt-codex-connector · 2025-11-21T04:19:42Z

lightrag/llm/azure_openai.py

+    # Handle keyword extraction mode
+    if keyword_extraction:
+        kwargs["response_format"] = GPTKeywordExtractionFormat


Guard Azure keyword extraction parse responses

When keyword_extraction is true this new block sets response_format, causing the request to go through beta.chat.completions.parse, but the non-streaming path still assumes response.choices[0].message.content is a string and immediately probes it with "\u" in content. Structured parse responses often set content=None and populate message.parsed instead, so keyword extraction calls will now raise a TypeError before returning any result. Consider handling message.parsed or null content before performing string operations.

Useful? React with 👍 / 👎.

Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`. When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.

- Remove beta prefix from completions.parse - Update OpenAI dependency to >=2.0.0 - Fix whitespace formatting - Update all requirement files - Clean up pyproject.toml dependencies

danielaskdd · 2025-11-21T04:56:22Z

@codex review

chatgpt-codex-connector · 2025-11-21T05:03:49Z

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Refactor keyword extraction handling to centralize response format logic

c9e1c86

• Move response format to core function • Remove duplicate format assignments • Standardize keyword extraction flow • Clean up redundant parameter handling • Improve Azure OpenAI compatibility

chatgpt-codex-connector bot reviewed Nov 21, 2025

View reviewed changes

danielaskdd added 2 commits November 21, 2025 12:46

Update OpenAI client to use stable API and bump minimum version to 2.0.0

02fdceb

- Remove beta prefix from completions.parse - Update OpenAI dependency to >=2.0.0 - Fix whitespace formatting - Update all requirement files - Clean up pyproject.toml dependencies

danielaskdd merged commit 8777895 into HKUDS:main Nov 21, 2025
4 checks passed

danielaskdd mentioned this pull request Nov 21, 2025

[Bug]: Tools for quering vector DB not called when running gpt-oss:20b on llama.cpp server #2398

Open

2 tasks

danielaskdd deleted the fix-openai-keyword-extraction branch November 21, 2025 11:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations#2401

Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations#2401
danielaskdd merged 3 commits intoHKUDS:mainfrom
danielaskdd:fix-openai-keyword-extraction

danielaskdd commented Nov 21, 2025

Uh oh!

danielaskdd commented Nov 21, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector bot Nov 21, 2025

Uh oh!

danielaskdd commented Nov 21, 2025

Uh oh!

chatgpt-codex-connector bot commented Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danielaskdd commented Nov 21, 2025