Make "fast_chunking" the default chunking mode and preserve semantic mode under config #38

Merged
merged 2 commits into from
Mar 31, 2025

sumukshashidhar
Member

Summary of Changes

  1. Default to fast_chunking: Introduced a new "fast_chunking" mode that creates chunks purely based on maximum token length. This mode skips embedding computation and similarity checks.
  2. Retain Semantic Chunking via Config: The existing embedding-based (semantic) mode is now activated only if the config specifies "chunking_mode": "semantic_chunking".
  3. Refactored Chunking Flow: The fast path never loads or runs an embedding model, which minimizes overhead.
  4. Added chunking_mode Field: The new field in ChunkingParameters selects which approach to use and defaults to "fast_chunking" when not explicitly configured.
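
To illustrate the dispatch described in the list above, here is a minimal sketch, not the code from this PR. Only `ChunkingParameters`, `chunking_mode`, and the two mode strings come from the summary; `l_max_tokens`, `fast_chunk`, `semantic_chunk`, and `chunk_document` are hypothetical names, and whitespace splitting stands in for a real tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChunkingParameters:
    # Field name and default follow the PR summary.
    chunking_mode: str = "fast_chunking"
    # Hypothetical name for the maximum chunk length, in tokens.
    l_max_tokens: int = 256


def fast_chunk(text: str, max_tokens: int) -> list[str]:
    """Split purely by token count; no embeddings or similarity checks.

    Assumption: whitespace-separated words approximate tokens here.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def semantic_chunk(text: str, params: ChunkingParameters) -> list[str]:
    # Placeholder for the embedding-based path, which this sketch does not model.
    raise NotImplementedError("semantic chunking requires an embedding model")


def chunk_document(text: str, params: ChunkingParameters) -> list[str]:
    """Pick the chunking strategy from params.chunking_mode."""
    if params.chunking_mode == "semantic_chunking":
        return semantic_chunk(text, params)
    if params.chunking_mode == "fast_chunking":
        # The embedding model is never touched on this path.
        return fast_chunk(text, params.l_max_tokens)
    raise ValueError(f"unknown chunking_mode: {params.chunking_mode!r}")
```

With this shape, a config that omits `chunking_mode` takes the fast path, and the semantic path is reached only when the mode is set explicitly.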

Tested to work

Member

@clefourrier clefourrier left a comment

LGTM!
Maybe add some doc on how this affects/would affect multihop (I assume we shouldn't multihop with the fast chunking, right?)

@clefourrier
Member

And ofc, fix the style first

@sumukshashidhar
Member Author

fixed style!

@sumukshashidhar
Member Author

multi-hop wouldn't be affected, as all multi-hop chunks are just combinations of single-hop chunks!
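
A rough sketch of why that holds: if multi-hop chunks are assembled only from the list of single-hop chunks, the assembly step does not care which chunking mode produced that list. The function name, its parameters, and the random sampling strategy below are all assumptions made for illustration, not the project's actual implementation.

```python
from __future__ import annotations

import random


def build_multihop_chunks(
    single_hop_chunks: list[str],
    hops: int = 2,
    num_groups: int = 3,
    seed: int = 0,
) -> list[list[str]]:
    """Combine existing single-hop chunks into multi-hop groups.

    Hypothetical helper: it only reads the finished chunk list, so it
    behaves the same for fast and semantic chunking.
    """
    if len(single_hop_chunks) < hops:
        return []
    rng = random.Random(seed)
    groups = []
    for _ in range(num_groups):
        # Sample distinct chunk indices and keep them in document order.
        indices = sorted(rng.sample(range(len(single_hop_chunks)), hops))
        groups.append([single_hop_chunks[i] for i in indices])
    return groups
```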

@sumukshashidhar sumukshashidhar merged commit 316201e into main Mar 31, 2025
1 check passed
Josephrp pushed a commit to Josephrp/yourbench that referenced this pull request Jun 5, 2025
Make "fast_chunking" the default chunking mode and preserve semantic mode under config
2 participants