* Improvements to the DTO_WAIT_METHODS
1. use busypoll and autotune as defaults
2. add tpause instruction
3. change to use compiler intrinsics for UMWAIT and TPAUSE
4. update the documentation for using UMWAIT
5. clean up wait_method paths
* Update README.md
Co-authored-by: Copilot <[email protected]>
* umwait and tpause come from waitpkg
* update size
---------
Co-authored-by: Copilot <[email protected]>
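
Item 3 of the commit message moves UMWAIT and TPAUSE to compiler intrinsics. The sketch below is not the code from this commit. It only illustrates how a wait loop built on the WAITPKG intrinsics (`_umonitor`, `_umwait`, `_tpause` from `<immintrin.h>`) might look with GCC or Clang. The helper names (`cpu_has_waitpkg`, `wait_umwait`, `wait_tpause`, `wait_busypoll`, `wait_for_completion`), the one-byte `status` flag, and the 10000-cycle budget are all hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#include <x86intrin.h>

/* CPUID.(EAX=07H,ECX=0):ECX bit 5 advertises WAITPKG (UMONITOR, UMWAIT, TPAUSE). */
static bool cpu_has_waitpkg(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ((ecx >> 5) & 1u) != 0;
}

/* Hypothetical helper: arm a monitor on the status byte, then sleep in
 * UMWAIT until the byte is written or the TSC deadline passes. */
__attribute__((target("waitpkg")))
static void wait_umwait(volatile uint8_t *status, uint64_t max_cycles)
{
    while (*status == 0) {
        _umonitor((void *)status);
        /* Re-check after arming so a store that landed in between is not missed. */
        if (*status == 0)
            _umwait(0, __rdtsc() + max_cycles); /* ctrl 0 requests C0.2 */
    }
}

/* Hypothetical helper: TPAUSE has no monitored address, so it simply
 * pauses until the deadline and then re-checks the status byte. */
__attribute__((target("waitpkg")))
static void wait_tpause(volatile uint8_t *status, uint64_t max_cycles)
{
    while (*status == 0)
        _tpause(0, __rdtsc() + max_cycles);
}
#endif

/* Portable fallback: spin until the status byte becomes non-zero. */
static void wait_busypoll(volatile uint8_t *status)
{
    while (*status == 0)
        ;
}

/* Use WAITPKG when the CPU reports it, otherwise fall back to busy polling. */
static void wait_for_completion(volatile uint8_t *status, bool use_tpause)
{
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_waitpkg()) {
        if (use_tpause)
            wait_tpause(status, 10000);
        else
            wait_umwait(status, 10000);
        return;
    }
#endif
    (void)use_tpause;
    wait_busypoll(status);
}
```

The `target("waitpkg")` attribute lets these two functions use the intrinsics without building the whole file with `-mwaitpkg`, and the CPUID check avoids executing the instructions on processors that lack them. The operating system can also cap the maximum wait time, so the deadline passed here is an upper bound rather than a guarantee.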
README.md: 8 additions & 1 deletion
@@ -26,6 +26,13 @@ To improve throughput for synchronous offload, DTO uses "pseudo asynchronous" ex
 calling thread cpu - cpu-centric numa awareness.
 3) In parallel, DTO performs the CPU portion of the job using std library on CPU.
 4) DTO waits for DSA to complete (if it hasn't completed already). The wait method can be configured using an environment variable DTO_WAIT_METHOD.
+The wait method can be one of the following: yield, busypoll, umwait, or tpause. The default is busypoll.
+
+For some workloads, complete offloading to the DSA device can result in improved performance and reduce power consumption via the UMWAIT (or TPAUSE) instruction.
+In this case, DTO_CPU_SIZE_FRACTION can be set to 0.0, which means that the CPU job is 0 bytes and the entire job is offloaded to DSA and AUTO_ADJUST_KNOBS is set to 0.
+Then during the wait step, DTO will use the UMWAIT (or TPAUSE) instruction to wait for the DSA job to complete. UMWAIT / TPAUSE are instructions that allow the CPU to enter a low-power state while waiting for the job to complete. For UMWAIT, please refer to this [enabling guide](https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/monitor-umonitor-performance-guidance.html) for more details.
+
+
 
 DTO also implements a heuristic to auto tune dsa_min_bytes and cpu_size_fraction parameters based on current DSA load. For example, if DSA is heavily loaded,
 DTO tries to reduce the DSA load by increasing cpu_size_fraction and dsa_min_bytes. Conversely, if DSA is lightly loaded, DTO tries to increase the DSA load by
@@ -179,4 +186,4 @@ When linking DTO using LD_PRELOAD environment variable special care is required
 in the script.
 - When the application is started by a script with #!<location of shell> which invokes another script with #!<location of shell>, for
 unknown reasons DTO causes a segmentation fault during a memset operation on an 8K sized buffer. This can be avoided by setting the minimum
-DTO size above 8K, or by avoiding this invocation sequence.
+DTO size above 8K, or by avoiding this invocation sequence.
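
The README text added in this diff lists four values for DTO_WAIT_METHOD (yield, busypoll, umwait, tpause) with busypoll as the default. As a minimal sketch of how such a setting could be mapped to an internal choice, assuming nothing about DTO's real parsing code, the hypothetical `parse_wait_method` function and `wait_method` enum below treat a missing or unrecognized value as busypoll. The fallback for unrecognized values is an assumption; the diff does not say how DTO handles them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum mirroring the four documented wait methods. */
enum wait_method {
    WAIT_YIELD,
    WAIT_BUSYPOLL,
    WAIT_UMWAIT,
    WAIT_TPAUSE
};

/* Map the value of DTO_WAIT_METHOD to an enum.
 * NULL (variable unset) yields the documented default, busypoll.
 * Falling back to busypoll for unknown strings is an assumption. */
static enum wait_method parse_wait_method(const char *value)
{
    if (value == NULL)
        return WAIT_BUSYPOLL;
    if (strcmp(value, "yield") == 0)
        return WAIT_YIELD;
    if (strcmp(value, "busypoll") == 0)
        return WAIT_BUSYPOLL;
    if (strcmp(value, "umwait") == 0)
        return WAIT_UMWAIT;
    if (strcmp(value, "tpause") == 0)
        return WAIT_TPAUSE;
    return WAIT_BUSYPOLL;
}
```

A caller would typically pass the result of `getenv("DTO_WAIT_METHOD")` (from `<stdlib.h>`) once at startup and cache the returned value, so the environment is not consulted on every wait.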