
support async chunking func to improve processing performance when a heavy chunking_func is passed in by user#2336

Merged
danielaskdd merged 2 commits into HKUDS:main from tongda:main
Nov 13, 2025
Conversation

@tongda (Contributor) commented Nov 9, 2025

Description

The current implementation calls chunking_func as a synchronous function. If chunking_func performs heavy operations, such as calling LLMs, the call blocks the main event loop for a long time.

I add support for async implementations of chunking_func so that they do not block the main loop. If chunking_func is a normal function, the code runs as before.
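To illustrate the problem, here is a minimal, self-contained sketch (not LightRAG code; the chunker names and the single-argument signature are hypothetical). A heartbeat task counts how often the event loop gets control while chunking runs: a synchronous heavy chunker starves it completely, while an awaited async chunker lets it keep running.

```python
import asyncio
import time

def heavy_sync_chunking(text: str) -> list[str]:
    """Hypothetical stand-in for a heavy synchronous chunking_func."""
    time.sleep(0.2)  # blocks the whole event loop while it runs
    return text.split(".")

async def heavy_async_chunking(text: str) -> list[str]:
    """Async variant: the await yields control back to the loop."""
    await asyncio.sleep(0.2)  # stands in for an awaited LLM call
    return text.split(".")

async def count_heartbeats(stop: asyncio.Event) -> int:
    """Count how often another task gets scheduled while chunking runs."""
    beats = 0
    while not stop.is_set():
        beats += 1
        await asyncio.sleep(0.02)
    return beats

async def measure(chunker, *, is_async: bool) -> int:
    stop = asyncio.Event()
    heartbeat = asyncio.create_task(count_heartbeats(stop))
    if is_async:
        await chunker("a.b")
    else:
        chunker("a.b")  # never yields: the heartbeat task is starved
    stop.set()
    return await heartbeat

sync_beats = asyncio.run(measure(heavy_sync_chunking, is_async=False))
async_beats = asyncio.run(measure(heavy_async_chunking, is_async=True))
print(f"sync: {sync_beats} heartbeats, async: {async_beats} heartbeats")
```

With the sync chunker the heartbeat task never runs until chunking finishes; with the async one it runs many times during the same 0.2 s wait.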

Related Issues

Changes Made

I simply add a condition check before calling chunking_func: if it is an async function, await it; otherwise, call it as before.

lightrag.py:apipeline_process_enqueue_documents:1761

if iscoroutinefunction(self.chunking_func):
  chunks = await self.chunking_func(
      self.tokenizer,
      content,
      split_by_character,
      split_by_character_only,
      self.chunk_overlap_token_size,
      self.chunk_token_size,
  )
else:
  chunks = self.chunking_func(
      self.tokenizer,
      content,
      split_by_character,
      split_by_character_only,
      self.chunk_overlap_token_size,
      self.chunk_token_size,
  )
chunks: dict[str, Any] = {
  compute_mdhash_id(dp["content"], prefix="chunk-"): {
      **dp,
      "full_doc_id": doc_id,
      "file_path": file_path,  # Add file path to each chunk
      "llm_cache_list": [],  # Initialize empty LLM cache list for each chunk
  }
  for dp in chunks
}
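The dispatch above can be sketched in isolation as follows. This is a runnable simplification, not the LightRAG code: the chunkers take a single hypothetical text argument, whereas the real chunking_func also receives the tokenizer, split flags, and token-size settings shown in the diff.

```python
import asyncio
from inspect import iscoroutinefunction

# Hypothetical one-argument chunkers standing in for chunking_func.
def sync_chunking(text: str) -> list[dict]:
    return [{"content": part} for part in text.split("|")]

async def async_chunking(text: str) -> list[dict]:
    await asyncio.sleep(0)  # stands in for an awaited LLM call
    return [{"content": part} for part in text.split("|")]

async def run_chunking(chunking_func, text: str) -> list[dict]:
    # Same conditional as the diff: await only functions declared
    # with `async def`; call plain functions as before.
    if iscoroutinefunction(chunking_func):
        return await chunking_func(text)
    return chunking_func(text)

sync_result = asyncio.run(run_chunking(sync_chunking, "a|b"))
async_result = asyncio.run(run_chunking(async_chunking, "a|b"))
print(sync_result, async_result)
```

Both paths produce the same chunk list, so existing synchronous chunking_func implementations keep working unchanged.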

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)


@danielaskdd (Collaborator) commented:

@codex review

@chatgpt-codex-connector commented:

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@tongda (Contributor, Author) commented Nov 10, 2025

I changed the code to a simpler but more general implementation.

The previous code could not work with objects that implement an async __call__ method.
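One common way to generalize the check (the PR's final diff may differ) is to call chunking_func first and then await the result only if it is awaitable. inspect.iscoroutinefunction is False for an instance whose __call__ is async, so the earlier check would skip the needed await; inspect.isawaitable on the returned value covers both cases. The class and function names below are hypothetical.

```python
import asyncio
import inspect

class HeavyChunker:
    """Hypothetical callable object with an async __call__.
    iscoroutinefunction(HeavyChunker()) is False, so the earlier
    check would invoke it without awaiting the resulting coroutine."""
    async def __call__(self, text: str) -> list[str]:
        await asyncio.sleep(0)
        return text.split()

def plain_chunker(text: str) -> list[str]:
    return text.split()

async def run_chunking(chunking_func, text: str):
    # Call first, then await only if the result is awaitable; this
    # covers `async def` functions and async __call__ objects alike.
    result = chunking_func(text)
    if inspect.isawaitable(result):
        result = await result
    return result

obj_result = asyncio.run(run_chunking(HeavyChunker(), "a b"))
fn_result = asyncio.run(run_chunking(plain_chunker, "a b"))
print(obj_result, fn_result)
```

This keeps the call site to a single code path while supporting plain functions, coroutine functions, and callable objects.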

@tongda (Contributor, Author) commented Nov 10, 2025

@codex review

@chatgpt-codex-connector commented:

To use Codex here, create a Codex account and connect it to GitHub.

@danielaskdd (Collaborator) commented:

@codex review

@chatgpt-codex-connector commented:

Codex Review: Didn't find any major issues. Nice work!


@danielaskdd danielaskdd merged commit 245df75 into HKUDS:main Nov 13, 2025
1 check passed