Using "disect" makes node-tokenizer fundamentally broken

The way that the `Tokenizer` method uses the `disect` method is fundamentally broken: `disect` requires that "all indices superior to the one returned MUST validate the predicate as well" ([source](https://www.npmjs.com/package/disect)). In other words, the predicate must be monotonic: once it is true for some index, it must stay true for every higher index. The substring-based predicate in `Tokenizer` does not satisfy this requirement.

Example: if the rules allow tokens of length one and tokens of length greater than two (`/./` and `/...+/`), the predicate returns false for index 2 and true for all other indices. Depending on the remaining length of the input, `disect` either probes index 2 or it doesn't. If it does, it finds a token of length three; if it doesn't, it finds a token of length one.

So the parsing result depends on the length of the remaining input, **which makes the parser behave highly erratically**.
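
To make the effect concrete, here is a minimal sketch of the problem. It does not call `disect` or `node-tokenizer`: `bisectFirstTrue` is a hypothetical stand-in for a textbook bisection over `[0, max)`, and `predicate` simply models the situation described above (false at index 2, true everywhere else). How the real library maps indices to token lengths is not reproduced here, so treat the numbers as an illustration of the mechanism rather than the exact output of the tokenizer.

```javascript
// Hypothetical stand-in for a bisection helper (NOT the actual disect code).
// It returns the smallest index in [lo, hi) for which pred is true, but it is
// only correct if pred is monotonic (false ... false, true ... true).
function bisectFirstTrue(lo, hi, pred) {
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (pred(mid)) {
      hi = mid;      // assume every index above mid is true as well
    } else {
      lo = mid + 1;  // assume every index below mid is false as well
    }
  }
  return lo;
}

// Models the predicate from the example: false only at index 2.
const predicate = (index) => index !== 2;

// Run the same search with different upper bounds, i.e. different
// "remaining input lengths".
const results = [];
for (let remaining = 3; remaining <= 10; remaining++) {
  results.push(bisectFirstTrue(0, remaining, predicate));
}

console.log(results.join(" "));
```

With this stand-in, the returned index flips between two values purely as a function of the upper bound: whenever one of the probed midpoints happens to be 2, the search jumps past it and settles on 3; otherwise it never notices the gap and settles on 0. The predicate and the start of the input are identical in every run, so only the remaining length decides the outcome.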
