
bpo-45045: Optimize mapping patterns of structural pattern matching #28043


Merged (4 commits) on Aug 30, 2021

@corona10 corona10 commented Aug 29, 2021

@corona10


+---------------+--------+----------------------+
| Benchmark     | base   | opt                  |
+===============+========+======================+
| bench pattern | 482 ns | 417 ns: 1.15x faster |
+---------------+--------+----------------------+

@@ -859,7 +859,7 @@ match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)
     if (dummy == NULL) {
         goto fail;
     }
-    values = PyList_New(0);
+    values = PyTuple_New(nkeys);

The size of the tuple is predictable.

Python/ceval.c Outdated
@@ -873,7 +873,8 @@ match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)
             }
             goto fail;
         }
-        PyObject *value = PyObject_CallFunctionObjArgs(get, key, dummy, NULL);
+        PyObject *args[] = { key, dummy };
+        PyObject *value = PyObject_Vectorcall(get, args, 2, NULL);

Just replacing PyObject_CallFunctionObjArgs with PyObject_Vectorcall gives a 2% speedup on the micro benchmark.

@Fidget-Spinner

Fidget-Spinner commented Aug 29, 2021

The changes LGTM. Tested locally on Win64:

python -m test test_patma -R 3:3
0:00:00 Run tests sequentially
0:00:00 [1/1] test_patma
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

BTW, I was wondering whether using _PyObject_GetMethod instead of _PyObject_GetAttrId would make your benchmark faster. The diff against your current change is not too large:

@@ -846,7 +846,9 @@ match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)
     // - Don't cause key creation or resizing in dict subclasses like
     //   collections.defaultdict that define __missing__ (or similar).
     _Py_IDENTIFIER(get);
-    PyObject *get = _PyObject_GetAttrId(map, &PyId_get);
+    PyObject *get_name = _PyUnicode_FromId(&PyId_get); // borrowed
+    PyObject *get = NULL;
+    int meth_found = _PyObject_GetMethod(map, get_name, &get);
     if (get == NULL) {
         goto fail;
     }
@@ -873,8 +875,14 @@ match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)
             }
             goto fail;
         }
-        PyObject *args[] = { key, dummy };
-        PyObject *value = PyObject_Vectorcall(get, args, 2, NULL);
+        PyObject *args[] = { map, key, dummy };
+        PyObject *value = NULL;
+        if (meth_found) {
+            value = PyObject_Vectorcall(get, args, 3, NULL);
+        }
+        else {
+            value = PyObject_Vectorcall(get, &args[1], 2, NULL);
+        }
         if (value == NULL) {
             goto fail;
         }
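
To make the intent of the two diffs easier to follow, here is a simplified Python model of the lookup loop. It is only a sketch: `match_keys_model` and `_DUMMY` are hypothetical names, the real logic lives in C in `match_keys`, and fetching `get` from the type is an approximation of the `meth_found` branch that ignores instance-level overrides of `get`.

```python
from collections import defaultdict

_DUMMY = object()  # sentinel meaning "key not present" (hypothetical name)


def match_keys_model(mapping, keys):
    """Return a tuple of the values for keys, or None if any key is missing."""
    # The result size is known up front, mirroring PyTuple_New(nkeys).
    values = [None] * len(keys)
    # Fetch get() from the type and pass the mapping explicitly, so no bound
    # method object is created. This roughly mirrors the meth_found case.
    get = type(mapping).get
    for i, key in enumerate(keys):
        # get() with a sentinel default never triggers __missing__, so
        # defaultdict-like subjects do not grow as a side effect of matching.
        value = get(mapping, key, _DUMMY)
        if value is _DUMMY:
            return None
        values[i] = value
    return tuple(values)


if __name__ == "__main__":
    subject = defaultdict(int, {"x": 1})
    print(match_keys_model(subject, ("x",)))
    print(match_keys_model(subject, ("x", "y")))
    print("y" in subject)
```

The last line of the usage example checks the property called out in the code comment of the diff: a failed lookup must not insert the missing key into the subject.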

@corona10

corona10 commented Aug 29, 2021

@Fidget-Spinner
Yeah it's better!


➜  cpython git:(bpo-45045) ✗ ./python.exe -m pyperf compare_to --table base.json suggestion.json
+---------------+--------+----------------------+
| Benchmark     | base   | suggestion           |
+===============+========+======================+
| bench pattern | 482 ns | 373 ns: 1.29x faster |
+---------------+--------+----------------------+
➜  cpython git:(bpo-45045) ✗ ./python.exe -m pyperf compare_to --table opt.json suggestion.json
+---------------+--------+----------------------+
| Benchmark     | opt    | suggestion           |
+===============+========+======================+
| bench pattern | 417 ns | 373 ns: 1.12x faster |
+---------------+--------+----------------------+

@corona10

With the new commit:

0:00:00 load avg: 5.05 Run tests sequentially
0:00:00 load avg: 5.05 [1/1] test_patma
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 1.1 sec
Tests result: SUCCESS

@Fidget-Spinner Fidget-Spinner left a comment

LGTM. Thanks!

@corona10

corona10 commented Aug 30, 2021

@Fidget-Spinner Thanks for the review.

Here is the final benchmark from an optimized build with thin LTO :)

+---------------+---------------+----------------------+
| Benchmark     | thin_lto_base | thin_lto_opt         |
+===============+===============+======================+
| bench pattern | 357 ns        | 287 ns: 1.24x faster |
+---------------+---------------+----------------------+

@corona10 corona10 deleted the bpo-45045 branch August 30, 2021 10:04
4 participants