What steps will reproduce the problem?
1. Read in a large file of varied UTF characters
2. Run guessLanguage on it
3. It takes forever
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
The library is designed to deal with small chunks of data, which is fine.
However, if you feed it a large amount of data, it slows to a crawl.
This appears to be caused by the nonAlphaRe.sub() call in normalize(): the
compiled regex is thousands of characters long and is matched against every
character of the input.
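The effect can be sketched with a micro-benchmark. The pattern below is a small stand-in for the library's multi-thousand-character nonAlphaRe (the real pattern is not reproduced here), so absolute timings will differ, but it shows the two approaches being compared:

```python
# Micro-benchmark sketch: regex substitution vs. per-character isalpha().
# nonAlphaRe here is a placeholder pattern, NOT the library's real one.
import re
import timeit

# Placeholder: matches anything outside a few letter ranges.
nonAlphaRe = re.compile('[^a-zA-Z\u00c0-\u024f\u0370-\u03ff\u0400-\u04ff]')

text = "Hello, world! 123 " * 5000

def with_regex():
    # Replace every non-letter with a space via the compiled regex.
    return nonAlphaRe.sub(' ', text)

def with_isalpha():
    # Same replacement using str.isalpha() on each character.
    return ''.join(c if c.isalpha() else ' ' for c in text)

print('regex:  ', timeit.timeit(with_regex, number=10))
print('isalpha:', timeit.timeit(with_isalpha, number=10))
```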
A substantial speedup (100x or more) can be obtained by replacing the following
call in normalize():

u = nonAlphaRe.sub(' ', u)

with

u = ''.join(c if c.isalpha() else ' ' for c in u)

which I believe has the same effect.
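A minimal sketch of the proposed replacement, checking that alphabetic characters pass through and everything else becomes a space. str.isalpha() covers Unicode letters, which is why it should match the regex's intent; whether it is byte-for-byte identical to the library's nonAlphaRe is the reporter's belief, not verified here:

```python
def normalize_chars(u):
    # Keep letters (including non-ASCII ones), replace all else with a space.
    return ''.join(c if c.isalpha() else ' ' for c in u)

print(normalize_chars("Héllo, wörld! 123"))  # letters kept, rest spaced out
```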
Original issue reported on code.google.com by
ajshan...@gmail.com on 13 Jul 2011 at 1:18