docs/_sources/quickstart.rst
Lines changed: 22 additions & 18 deletions
@@ -359,35 +359,37 @@ expression and are not limited to simply assigning to a tensor. Expanding on our
359
359
Instead of setting ``t1`` to a range, we multiply the range by 5.0, and add that range to a vector of ones using the ``ones`` generator. Without any
360
360
intermediate storage, we combined two generators, a multiply, and an add operator into a single kernel.
361
361
362
-
Executors
363
-
---------
364
-
As mentioned above, the ``exec`` function is an executor for launching operators onto the device. ``exec`` is a special type of executor since it can take
365
-
either views or operators as inputs and transform them in an element-wise kernel. Often the type of operation we are trying to do cannot be expressed as
366
-
an MatX element-wise operator, so ``exec`` cannot be used. Other types of executors exist for this purpose. These executors typically do more complex
367
-
transformations on the data compared to an element-wise kernel, and often use optimized libraries on the back-end to execute. Some examples are fft (Fast
368
-
Fourier Transform), matmul (Matrix Multiply), and sort.
369
-
370
-
MatX provides an easy-to-use API for executing complex functions, like those mentioned above. These executors currently cannot be part of an operator
371
-
expression and must be executed as their own statement:
362
+
Transforms
363
+
----------
364
+
As mentioned above, the ``run`` function takes an executor describing where to launch the work. In the examples above, the operator
365
+
expressions passed to ``run`` were each fused into a single element-wise operation. Often the type of operation we are trying to do cannot be expressed as
366
+
an element-wise operator and therefore cannot be fused with other operations without synchronization. These classes of operators are called *transforms*.
367
+
Transforms can be used anywhere operators are used:
372
368
373
369
.. code-block:: cpp
374
370
375
-
fft(B, A, stream);
371
+
(B = fft(A) * C).run(stream);
376
372
377
-
The ``fft`` executor above performs a 1D FFT on the tensor ``A``, and stores it in ``B``. All executors use the same calling convention where the outputs
378
-
are listed first, followed by inputs, and finally an optional stream. Except for ``exec``, executors can only operate on tensor views, and not
379
-
on generators or operators. For instance, you cannot take an fft of ``ones()``.
373
+
The ``fft`` transform above performs a 1D FFT on the tensor ``A``, multiplies the output by ``C``, and stores it in ``B``. Since the FFT
374
+
may require synchronizing before performing the multiply, MatX can internally create a temporary buffer for the FFT output and free it when
375
+
the expression goes out of scope.
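The role of the temporary buffer can be illustrated without MatX. The sketch below is plain C++ and does not use any MatX API: ``naive_dft`` and ``dft_times`` are hypothetical names, and a naive DFT stands in for the FFT. It only shows why a transform output generally has to be materialized before an element-wise stage can consume it. Every output element of the transform depends on every input element.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cdouble = std::complex<double>;

// Hypothetical stand-in for a transform: a naive O(n^2) DFT.
// Each output element depends on every input element, so it cannot be
// evaluated lazily one element at a time like an element-wise operator.
std::vector<cdouble> naive_dft(const std::vector<cdouble>& in) {
    const std::size_t n = in.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cdouble> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cdouble acc{0.0, 0.0};
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                -two_pi * static_cast<double>(k * j) / static_cast<double>(n);
            acc += in[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Hypothetical analogue of (B = fft(A) * C): the transform result is
// materialized in a temporary, then the element-wise multiply reads from it.
std::vector<cdouble> dft_times(const std::vector<cdouble>& a,
                               const std::vector<cdouble>& c) {
    assert(a.size() == c.size());
    const std::vector<cdouble> tmp = naive_dft(a);  // temporary buffer
    std::vector<cdouble> b(a.size());
    for (std::size_t i = 0; i < b.size(); ++i) {
        b[i] = tmp[i] * c[i];  // element-wise stage
    }
    return b;  // tmp is released here, when it goes out of scope
}
```

In this sketch the caller never sees ``tmp``, which mirrors the idea that the intermediate storage is an implementation detail of the expression.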
380
376
381
-
Unless documented otherwise, executors work on tensors of a specific size. Matrix multiplies require a 2D tensor (matrix), 1D FFTs require
377
+
Unless documented otherwise, transforms work on tensors of a specific size. Matrix multiplies require a 2D tensor (matrix), 1D FFTs require
382
378
a 1D tensor (vector), etc. If the dimension of the tensor is higher than the expected dimension, all higher dimensions will be batched. In the FFT
383
379
call above, if ``A`` and ``B`` are 4D tensors, the inner 3 dimensions will launch a batched 1D FFT with no change in syntax.
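As a rough mental model of batching, the sketch below applies a 1D operation to every row of a flattened, row-major buffer. It is plain C++, not MatX code: ``for_each_row`` and ``prefix_sum_1d`` are hypothetical helpers, and the assumption that the operation runs along the last (fastest-varying) dimension is ours. Every other dimension is simply folded into a batch count.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: apply a 1D operation to each innermost row of a
// row-major buffer. All remaining dimensions collapse into `batches`.
void for_each_row(std::vector<float>& data, std::size_t row_len,
                  const std::function<void(float*, std::size_t)>& op_1d) {
    assert(row_len > 0 && data.size() % row_len == 0);
    const std::size_t batches = data.size() / row_len;
    for (std::size_t b = 0; b < batches; ++b) {
        op_1d(data.data() + b * row_len, row_len);
    }
}

// Example 1D operation (a stand-in for a 1D transform): in-place prefix sum.
void prefix_sum_1d(float* row, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        row[i] += row[i - 1];
    }
}
```

The 1D operation itself is unaware of the batch dimensions, so the same call works for a 2D, 3D, or 4D buffer as long as ``row_len`` matches the last dimension.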
384
380
385
-
As mentioned above, the same tensor views can be used in operator expressions before or after executors:
381
+
As mentioned above, the same tensor views can be used in operator expressions before or after transforms:
386
382
387
383
.. code-block:: cpp
388
384
389
385
(a = b + 2).run(stream);
390
-
matmul(c, a, b, stream);
386
+
(c = matmul(a, d)).run(stream);
387
+
388
+
Or fused in a single line:
389
+
390
+
.. code-block:: cpp
391
+
392
+
(c = matmul(b + 2, d)).run(stream);
391
393
392
394
The code above executes a kernel to store the result of ``b + 2`` into ``a``, then subsequently performs the matrix multiply ``C = A * B``. Since
393
395
the operator and matrix multiply are launched in the same CUDA stream, they will be executed serially.
@@ -398,11 +400,13 @@ Common reduction executors are also available, such as ``sum()``, ``mean()``, ``
398
400
399
401
auto t4 = make_tensor<float>({100, 100, 100, 100});
400
402
auto t0 = make_tensor<float>();
401
-
sum(t0, t4);
403
+
(t0 = sum(t4)).run();
402
404
403
405
The above code performs an optimized sum reduction of ``t4`` into ``t0``. Currently reduction type executors *can* take operators as an input. Please
404
406
see the documentation for a list of which ones are compatible.
405
407
408
+
For more information about operation fusion, see :ref:`fusion`.
409
+
406
410
Random numbers
407
411
--------------
408
412
MatX can generate random numbers using the cuRAND library as the backend. Random number generation consumes memory on the device, so the construction