Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations #2401
Conversation
- Move response format to core function
- Remove duplicate format assignments
- Standardize keyword extraction flow
- Clean up redundant parameter handling
- Improve Azure OpenAI compatibility
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
# Handle keyword extraction mode
if keyword_extraction:
    kwargs["response_format"] = GPTKeywordExtractionFormat
```
Guard Azure keyword extraction parse responses
When `keyword_extraction` is true, this new block sets `response_format`, causing the request to go through `beta.chat.completions.parse`, but the non-streaming path still assumes `response.choices[0].message.content` is a string and immediately probes it with `"\u" in content`. Structured parse responses often set `content=None` and populate `message.parsed` instead, so keyword extraction calls will now raise a `TypeError` before returning any result. Consider handling `message.parsed` or null content before performing string operations.
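To make the failure concrete, here is a simplified illustration of the flagged pattern (a hypothetical snippet, not the repo's exact code; `safe_unicode_decode` is assumed to be the repo's existing unicode helper):

```python
# Simplified illustration of the failure mode (hypothetical snippet).
# With response_format set, the request is routed through
# completions.parse, whose structured responses may return content=None
# and put the result in message.parsed instead.
content = response.choices[0].message.content  # may be None here

if r"\u" in content:  # TypeError when content is None
    content = safe_unicode_decode(content.encode("utf-8"))
```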
Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`. When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.
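A minimal sketch of that fallback, assuming `message.parsed` is a Pydantic model (the PR's actual code may differ in details):

```python
# Sketch of the structured-output fallback (assumed shape; the PR's
# actual code may differ in details).
message = response.choices[0].message

parsed = getattr(message, "parsed", None)
if parsed is not None:
    # Structured output: serialize the parsed Pydantic model to JSON.
    content = parsed.model_dump_json()
else:
    # Plain-text path: keep the existing message.content handling.
    content = message.content
```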
- Remove beta prefix from `completions.parse`
- Update OpenAI dependency to >=2.0.0
- Fix whitespace formatting
- Update all requirement files
- Clean up pyproject.toml dependencies
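For context, the namespace change looks roughly like this (a sketch; the model and messages values are placeholders):

```python
# Sketch of the namespace change (parameter values are placeholders).
# Older openai SDKs exposed structured-output parsing under beta:
#   response = await client.beta.chat.completions.parse(...)
# Newer SDKs (per the dependency bump above) promote it to the
# stable namespace:
response = await client.chat.completions.parse(
    model="gpt-4o-mini",
    messages=messages,
    response_format=GPTKeywordExtractionFormat,
)
```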
@codex review
Codex Review: Didn't find any major issues. Keep it up!
Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations
Summary
Refactored the `keyword_extraction` parameter handling in the OpenAI and Azure OpenAI LLM implementations to follow the DRY (Don't Repeat Yourself) principle. All keyword extraction logic is now centralized in the base `*_complete_if_cache` functions, eliminating code duplication across wrapper functions. Also enhances keyword extraction compatibility to handle cases where the LLM cannot reliably generate JSON output.
Changes
`lightrag/llm/openai.py`
- `openai_complete_if_cache`: Added keyword extraction handling that sets `response_format` to `GPTKeywordExtractionFormat` when `keyword_extraction=True`
- `openai_complete()`: removed inconsistent `"json"` format handling
- `gpt_4o_complete()`: removed duplicate format setting
- `gpt_4o_mini_complete()`: removed duplicate format setting

`lightrag/llm/azure_openai.py`
- Imported `GPTKeywordExtractionFormat` from `lightrag.types`
- `azure_openai_complete_if_cache`: Added a `keyword_extraction: bool = False` parameter, sets `response_format` to `GPTKeywordExtractionFormat` when the flag is set, and strips the flag from forwarded arguments via `kwargs.pop("keyword_extraction", None)`
- `azure_openai_complete`: Now properly passes the `keyword_extraction` parameter to the base function

A simplified sketch of this centralized flow is shown below.
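The sketch abbreviates the signatures (the real functions take additional arguments such as system prompts and history); it only illustrates where the `response_format` decision now lives:

```python
# Simplified sketch of the centralized flow (signatures abbreviated;
# the real functions take additional arguments).
async def openai_complete_if_cache(
    model: str,
    prompt: str,
    keyword_extraction: bool = False,
    **kwargs,
) -> str:
    # Single source of truth: only the base function sets the format.
    if keyword_extraction:
        kwargs["response_format"] = GPTKeywordExtractionFormat
    ...

# Wrappers just forward the flag instead of setting response_format:
async def gpt_4o_mini_complete(
    prompt: str, keyword_extraction: bool = False, **kwargs
) -> str:
    return await openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        keyword_extraction=keyword_extraction,
        **kwargs,
    )
```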
Benefits
- ✅ Single source of truth: All keyword extraction logic centralized in base functions
- ✅ Consistency: Both implementations use the same `GPTKeywordExtractionFormat`
- ✅ Maintainability: Future changes only need to be made in one location per file
- ✅ Code quality: Eliminates code duplication and improves readability
Testing
Breaking Changes
None. This is a pure refactoring that maintains full backward compatibility.