
@TomTom101
Contributor

Fixes a previously unreported error that can occur in `RecursiveCharacterTextSplitter`.

An empty `new_separators` list (`[]`) is not `None`, so it falls through to the `else` branch of the condition below and is passed to `_split_text()`, which expects a non-empty list.

```python
if new_separators is None:
    ...
else:
    # _split_text() expects this array to be non-empty!
    other_info = self._split_text(s, new_separators)
```

This results in an `IndexError`:

```python
def _split_text(self, text: str, separators: List[str]) -> List[str]:
        """Split incoming text and return chunks."""
        final_chunks = []
        # Get appropriate separator to use
>       separator = separators[-1]
E       IndexError: list index out of range

langchain/text_splitter.py:425: IndexError
```
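The failure mode can be reproduced outside LangChain with a stripped-down recursive splitter. This is a hypothetical sketch, not LangChain's code: `split_text` and its `buggy` flag are illustrative names, a substring test stands in for the regex search, and short pieces are not merged back into chunks. It assumes that `new_separators` is the slice of separators after the one that matched, so it becomes `[]` when the last separator in the list is the match. The `not new_separators` guard is one possible way to treat `[]` like `None`, not necessarily the exact change merged here.

```python
from typing import List, Optional


def split_text(text: str, separators: List[str], chunk_size: int, buggy: bool = False) -> List[str]:
    """Hypothetical, simplified recursive splitter (NOT LangChain's implementation)."""
    # Fallback separator; this line raises IndexError if `separators` is [].
    separator = separators[-1]
    new_separators: Optional[List[str]] = None
    for i, sep in enumerate(separators):
        if sep == "":
            # Empty separator: split into characters, nothing finer remains.
            separator = sep
            break
        if sep in text:  # assumption: substring test instead of a regex search
            separator = sep
            # Everything after the matched separator; [] if it was the last one.
            new_separators = separators[i + 1:]
            break

    pieces = list(text) if separator == "" else text.split(separator)
    chunks: List[str] = []
    for piece in pieces:
        if len(piece) < chunk_size:
            chunks.append(piece)
            continue
        if buggy:
            # Mirrors the original check: [] is not None, so it falls through.
            nothing_finer = new_separators is None
        else:
            # Guard: treat an empty list the same as None.
            nothing_finer = not new_separators
        if nothing_finer:
            chunks.append(piece)  # keep the oversized piece as-is
        else:
            chunks.extend(split_text(piece, new_separators, chunk_size, buggy))
    return chunks


if __name__ == "__main__":
    print(split_text("aaaa bbbbbbbb", ["\n", " "], chunk_size=5))  # ['aaaa', 'bbbbbbbb']
    try:
        split_text("aaaa bbbbbbbb", ["\n", " "], chunk_size=5, buggy=True)
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: list index out of range
```

In this sketch an empty list can only arise when the last separator is non-empty and is the one that matched; ending the separator list with `""` avoids it even without the guard, because the recursion then bottoms out at single characters.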

Who can review?

@hwchase17 @eyurtsev

TomTom101 added 2 commits June 8, 2023 18:04
TomTom101 changed the title from "Tomtom101/textsplitter" to "Fix IndexError in RecursiveCharacterTextSplitter" on Jun 8, 2023

hwchase17 left a comment

thanks!

@TomTom101
Contributor Author

Thanks for adding the missing line break between the imports. Newbie mistake :)

(My excuse: `make lint` fails on my machine with a mypy "Should never get here in normal mode" error. I haven't found the time to investigate yet.)

hwchase17 merged commit ac3e6e3 into langchain-ai:master on Jun 10, 2023
TomTom101 deleted the tomtom101/textsplitter branch on June 11, 2023 20:01
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request on Jun 19, 2023

Co-authored-by: Harrison Chase <[email protected]>
This was referenced Jun 25, 2023
