optimize regex handling #713

kmod · 2015-07-17T00:15:10Z

I think these are good changes, but the perf results are exactly the same as my non-perf-related PR:

       django_template.py             3.6s (2)             3.6s (2)  -0.2%
            pyxl_bench.py             3.5s (2)             3.4s (2)  -1.2%
sqlalchemy_imperative2.py             4.3s (2)             4.4s (2)  +0.4%
                  geomean                 3.8s                 3.8s  -0.4%

It speeds up some simple regex microbenchmarks by 4x though.


kmod · 2015-07-17T01:51:47Z

It looks like Travis-CI tested the wrong commit, but I ran the tests manually and it seems ok.

Here are the stats on the benchmarks I was looking at:

         django_lexing.py             2.0s (2)             1.8s (2)  -11.6%
      re_split_ubench2.py             2.9s (2)             1.8s (2)  -38.7%
       re_split_ubench.py             3.1s (2)             0.8s (2)  -75.3%

django_lexing is the lexing part of the template parsing; re_split_ubench2 is the regex portion of the lexing, and re_split_ubench is just a tiny regex (something like re.split("ana", "banana" * 100)).


kmod added 5 commits July 17, 2015 00:09

Optimization to CPython's regex library

d95b70f

This division is expensive; the divisor is always sizeof(char) or sizeof(Py_UNICODE), and it seems to be faster to do a branch and then possibly a shift.
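As a rough illustration of the branch-then-shift idea, here is a minimal sketch. The helper name `bytes_to_index` and the assumption that the element size is 1, 2, or 4 bytes are hypothetical and not taken from the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a byte offset into an element index.
   Assumes charsize is 1, 2, or 4 (i.e. sizeof(char) or a plausible
   sizeof(Py_UNICODE)); this is an illustration, not the patch itself. */
static inline size_t bytes_to_index(size_t nbytes, size_t charsize) {
    if (charsize == 1)
        return nbytes;                 /* byte strings: no arithmetic at all */
    assert(charsize == 2 || charsize == 4);
    /* Power-of-two size: a shift replaces the general-purpose division. */
    return nbytes >> (charsize == 2 ? 1 : 2);
}
```

The point is that a general division by a runtime value is slow, while a predictable branch plus a shift is cheap when the divisor can only take a couple of known values.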

Make listAppendInternal inlineable

b44f8a5

- put it into a header file (and start including it)
- move the grow-the-array part into a separate function to encourage the fast-path to get inlined.
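The fast-path/slow-path split described above might look roughly like the following. `IntList`, `intlist_grow`, and `intlist_append` are hypothetical names for a simplified list of ints; this is not the actual listAppendInternal code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified growable array of ints. */
typedef struct {
    int* items;
    size_t size;
    size_t capacity;
} IntList;

/* Slow path: kept as a separate function so the append below stays small
   enough for the compiler to inline at call sites. */
static void intlist_grow(IntList* l) {
    size_t new_cap = l->capacity ? l->capacity * 2 : 4;
    int* p = (int*)realloc(l->items, new_cap * sizeof(int));
    assert(p != NULL);
    l->items = p;
    l->capacity = new_cap;
}

/* Fast path: one comparison and one store in the common case. In a real
   project this would live in a header so every caller can inline it. */
static inline void intlist_append(IntList* l, int v) {
    if (l->size == l->capacity)
        intlist_grow(l);
    l->items[l->size++] = v;
}
```

Keeping the reallocation logic out of line means the inlined code at each call site is only the capacity check and the store.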

Reduce unnecessary string memsets

ba389a2

Particularly for string slicing, where we would always memset the string data to zero, and then immediately memcpy it.
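A minimal sketch of the idea, assuming a plain C string rather than Pyston's string object; `slice_copy` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slice helper: returns a newly allocated copy of s[start:stop].
   The buffer is allocated uninitialized (malloc, not calloc or memset),
   because every byte is overwritten by the memcpy right away. */
static char* slice_copy(const char* s, size_t start, size_t stop) {
    assert(start <= stop);
    size_t n = stop - start;
    char* out = (char*)malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, n);   /* fills bytes [0, n) */
    out[n] = '\0';               /* only the terminator still needs writing */
    return out;
}
```

Zero-filling first would touch every byte twice; skipping it is safe here only because the copy covers the whole buffer.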

Remove ObjLookupCache.objptr

3658143

It was unused

Optimize PySequence_GetSlice

5bd967f

- copy CPython's implementation (that uses C slots)
- implement the C slots for str and list
- avoid doing a division for non-step slices
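The last point can be sketched as follows. `slice_length` is a hypothetical helper that assumes already-clamped, non-negative bounds and a positive step; the real code also has to handle negative indices and steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements selected by [start:stop:step],
   assuming start and stop are already clamped to the sequence length and
   step >= 1. */
static size_t slice_length(size_t start, size_t stop, size_t step) {
    assert(step >= 1);
    if (stop <= start)
        return 0;
    if (step == 1)
        return stop - start;              /* plain slice: subtraction only */
    return (stop - start - 1) / step + 1; /* extended slice: needs a division */
}
```

Since most slices have no step, checking for `step == 1` first keeps the division off the common path.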

kmod added a commit that referenced this pull request Jul 17, 2015

Merge pull request #713 from kmod/perf3

ef2d7ba

optimize regex handling

kmod merged commit ef2d7ba into pyston:master Jul 17, 2015

kmod deleted the perf3 branch July 17, 2015 01:55
