
optimize regex handling #713


Merged: kmod merged 5 commits on Jul 17, 2015

kmod (Collaborator) commented Jul 17, 2015

I think these are good changes, but on the standard benchmarks the perf results are the same as for my non-perf-related PR, i.e. within noise:

       django_template.py             3.6s (2)             3.6s (2)  -0.2%
            pyxl_bench.py             3.5s (2)             3.4s (2)  -1.2%
sqlalchemy_imperative2.py             4.3s (2)             4.4s (2)  +0.4%
                  geomean                 3.8s                 3.8s  -0.4%

It speeds up some simple regex microbenchmarks by 4x though.

kmod added 5 commits July 17, 2015 00:09
This division is expensive; the divisor is always sizeof(char) or sizeof(Py_UNICODE),
and it seems to be faster to do a branch and then possibly a shift.
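A minimal sketch of the branch-then-shift idea from the commit above. It is not the actual Pyston patch: `fake_py_unicode`, `FAKE_UNICODE_SHIFT`, `nchars_div` and `nchars_shift` are hypothetical names, and the sketch assumes a UCS-4 build where `Py_UNICODE` is 4 bytes (a narrow build would use a 2-byte type and a shift of 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE, assuming a UCS-4 build (4 bytes).
   A narrow (UCS-2) build would use a 2-byte type and a shift of 1. */
typedef uint32_t fake_py_unicode;
#define FAKE_UNICODE_SHIFT 2
_Static_assert(sizeof(fake_py_unicode) == (1u << FAKE_UNICODE_SHIFT),
               "shift must match the character size");

/* Original style: a general integer division on every call. */
static size_t nchars_div(size_t nbytes, size_t charsize) {
    return nbytes / charsize;
}

/* Branch-then-shift style: charsize is only ever 1 or
   sizeof(fake_py_unicode), so test for the byte case and otherwise use
   a shift, which is much cheaper than a hardware divide. */
static size_t nchars_shift(size_t nbytes, size_t charsize) {
    assert(charsize == sizeof(char) || charsize == sizeof(fake_py_unicode));
    if (charsize == sizeof(char))
        return nbytes;
    return nbytes >> FAKE_UNICODE_SHIFT;
}
```

Both functions return the same result for the two legal character sizes; the second one just avoids emitting a `div` instruction on the hot path.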
- put it into a header file (and start including it)
- move the grow-the-array part into a separate function
  to encourage the fast path to get inlined.
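The fast-path/slow-path split described in this commit can be sketched as below. The commit message does not say which container was changed, so `int_vec` and its functions are hypothetical; `__attribute__((noinline))` is a GCC/Clang extension used here to keep the growth code out of line.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; only illustrates the pattern. */
typedef struct {
    int *data;
    size_t size;
    size_t capacity;
} int_vec;

/* Slow path, kept out of line so that the fast path below stays small
   enough for the compiler to inline at each call site. */
__attribute__((noinline)) static int int_vec_grow_and_append(int_vec *v, int x) {
    assert(v->size == v->capacity);
    size_t new_cap = v->capacity ? v->capacity * 2 : 8;
    int *p = realloc(v->data, new_cap * sizeof *p);
    if (p == NULL)
        return -1;                     /* old buffer is still valid */
    v->data = p;
    v->capacity = new_cap;
    v->data[v->size++] = x;
    return 0;
}

/* Fast path: one compare, one store, one increment. In a header, this is
   the part callers would see and inline. */
static inline int int_vec_append(int_vec *v, int x) {
    if (v->size < v->capacity) {
        v->data[v->size++] = x;
        return 0;
    }
    return int_vec_grow_and_append(v, x);
}
```

Because the rarely taken reallocation branch is a plain call, the inlined code at each call site stays a few instructions long.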
Particularly for string slicing, where we would
always memset the string data to zero, and then
immediately memcpy it.
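A sketch of the difference for a slice copy, assuming a simple length-plus-buffer string. `str_obj`, `slice_zeroed` and `slice_uninit` are hypothetical; Pyston's real string layout differs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string object: length plus a NUL-terminated buffer. */
typedef struct {
    size_t len;
    char *data;
} str_obj;

/* Wasteful: every byte is zeroed by memset and then immediately
   overwritten by memcpy. */
static str_obj slice_zeroed(const char *src, size_t start, size_t stop) {
    str_obj s;
    s.len = stop - start;
    s.data = malloc(s.len + 1);
    if (s.data) {
        memset(s.data, 0, s.len + 1);
        memcpy(s.data, src + start, s.len);
    }
    return s;
}

/* Better: leave the buffer uninitialized and let memcpy fill it; only the
   terminating NUL has to be written explicitly. */
static str_obj slice_uninit(const char *src, size_t start, size_t stop) {
    str_obj s;
    assert(start <= stop);
    s.len = stop - start;
    s.data = malloc(s.len + 1);
    if (s.data) {
        memcpy(s.data, src + start, s.len);
        s.data[s.len] = '\0';
    }
    return s;
}
```

Both produce identical strings; the second touches each byte of the new buffer once instead of twice.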
- copy CPython's implementation (that uses C slots)
- implement the C slots for str and list
- avoid doing a division for non-step slices
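The last point can be sketched as a slice-length helper. The general formula follows the one CPython uses in `PySlice_GetIndicesEx` after `start`/`stop` have been clamped to the sequence bounds; the `step == 1` shortcut is the "no division" fast path. `slice_length` is a hypothetical name, and the C-slot dispatch itself is not shown.

```c
#include <assert.h>

/* Number of elements selected by a[start:stop:step], where start and stop
   have already been clamped to the sequence bounds. */
static long slice_length(long start, long stop, long step) {
    assert(step != 0);
    if (step == 1)                      /* common case: no division needed */
        return stop > start ? stop - start : 0;
    if ((step < 0 && stop >= start) || (step > 0 && start >= stop))
        return 0;
    if (step < 0)
        return (stop - start + 1) / step + 1;
    return (stop - start - 1) / step + 1;
}
```

For example, `a[0:10:3]` selects indices 0, 3, 6, 9 (length 4), and `a[9::-2]` on a 10-element sequence (clamped stop of -1) selects 9, 7, 5, 3, 1 (length 5).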
kmod (Collaborator, Author) commented Jul 17, 2015

It looks like Travis CI tested the wrong commit, but I ran the tests manually and they seem to pass.

Here are the stats on the benchmarks I was looking at:

         django_lexing.py             2.0s (2)             1.8s (2)  -11.6%
      re_split_ubench2.py             2.9s (2)             1.8s (2)  -38.7%
       re_split_ubench.py             3.1s (2)             0.8s (2)  -75.3%

django_lexing is the lexing part of the template parsing; re_split_ubench2 is the regex portion of that lexing; and re_split_ubench is just a tiny regex benchmark (something like re.split("ana", "banana" * 100)).

kmod added a commit that referenced this pull request on Jul 17, 2015: "optimize regex handling"
kmod merged commit ef2d7ba into pyston:master on Jul 17, 2015
kmod deleted the perf3 branch on July 17, 2015 at 01:55