Merged
2 changes: 1 addition & 1 deletion docs/_sources/api/linalg/decomp/chol.rst
@@ -3,7 +3,7 @@
chol
####

Perform a Cholesky factorization and saves the result in either the uppoer or lower triangle of the output.
Perform a Cholesky factorization and saves the result in either the upper or lower triangle of the output.

.. note::
The input matrix must be positive semidefinite
45 changes: 44 additions & 1 deletion docs/_sources/basics/broadcast.rst
@@ -3,4 +3,47 @@
Broadcasting
############

TODO
MatX supports broadcasting of values to operators. Broadcasting allows lower-ranked operators to have
their values repeated into a higher-ranked operator. Broadcasting can improve both memory and cache behavior,
since fewer values are stored in memory and the same values are repeatedly loaded from cache.

In its simplest form, scalars can be broadcast to an arbitrary-sized tensor:

.. code-block:: cpp

(t = 4).run();

In the example above, the value `4` is broadcast to every element of `t`, regardless of the rank and sizes
of `t`. The same rule applies to 0D operators, which are effectively scalar values:

.. code-block:: cpp

auto t0 = make_tensor<float>();
  auto t2 = make_tensor<float>({10, 20});
(t2 = t0).run();

Besides scalars, operators of other ranks can also be broadcast:

.. code-block:: cpp

auto t1 = make_tensor<float>({20});
  auto t2 = make_tensor<float>({10, 20});
(t2 = t1).run();

In fact, any operator can be broadcast to another operator as long as the dimensions are *compatible*. For
broadcasting, compatible means either:

* The operator being broadcast from is a scalar or a 0D operator
* The dimensions of the operator being broadcast from match all of the right-most dimensions of the operator being assigned to

The first rule is discussed above. The second rule can be explained with this example:

.. code-block:: cpp

auto t3 = make_tensor<float>({3, 4, 20});
  auto t5 = make_tensor<float>({10, 6, 3, 4, 20});
(t5 = t3).run();

In this example `t3`'s dimensions (3, 4, 20) match the right-most (fastest-changing) dimensions of `t5`, so
the values in `t3` can be broadcast to `t5`. During this assignment the same values in memory in `t3` are
loaded repeatedly across the left-most dimensions of `t5` (10 and 6).
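The repeated loading described above amounts to an index mapping: each index into the higher-ranked output simply drops its left-most (broadcast) dimensions when reading the lower-ranked input. A minimal sketch of the `t5 = t3` assignment using flat row-major `std::vector` in place of MatX tensors (illustration only; row-major layout is an assumption here, and this is not how MatX executes the expression):

```cpp
#include <vector>

// Sketch of the broadcast assignment t5 = t3 from the example above.
// t3 has shape (3, 4, 20); t5 has shape (10, 6, 3, 4, 20). The left-most
// indices (a, b) of t5 are ignored when indexing t3, so every t3 value
// is loaded 10 * 6 times.
std::vector<float> broadcast_assign(const std::vector<float>& t3) {
  std::vector<float> t5(10 * 6 * 3 * 4 * 20);
  for (int a = 0; a < 10; a++)
    for (int b = 0; b < 6; b++)
      for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
          for (int k = 0; k < 20; k++) {
            // (a, b) index only t5; t3's flat index ignores them
            t5[(((a * 6 + b) * 3 + i) * 4 + j) * 20 + k] =
                t3[(i * 4 + j) * 20 + k];
          }
  return t5;
}
```

Each (a, b) slice of the output is an identical copy of `t3`, which is exactly the repetition that broadcasting performs without materializing the copies in the input.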
2 changes: 1 addition & 1 deletion docs/api/linalg/decomp/chol.html
@@ -609,7 +609,7 @@ <h2> Contents </h2>

<section id="chol">
<span id="chol-func"></span><h1>chol<a class="headerlink" href="#chol" title="Permalink to this heading">#</a></h1>
<p>Perform a Cholesky factorization and saves the result in either the uppoer or lower triangle of the output.</p>
<p>Perform a Cholesky factorization and saves the result in either the upper or lower triangle of the output.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The input matrix must be positive semidefinite</p>
36 changes: 35 additions & 1 deletion docs/basics/broadcast.html
@@ -599,7 +599,41 @@ <h1>Broadcasting</h1>

<section id="broadcasting">
<span id="broadcast"></span><h1>Broadcasting<a class="headerlink" href="#broadcasting" title="Permalink to this heading">#</a></h1>
<p>TODO</p>
<p>MatX supports broadcasting of values to operators. Broadcasting allows lower-ranked operators to have
their values repeated into a higher-ranked operator. Broadcasting can improve both memory and cache behavior,
since fewer values are stored in memory and the same values are repeatedly loaded from cache.</p>
<p>In its simplest form, scalars can be broadcast to an arbitrary-sized tensor:</p>
<div class="highlight-cpp notranslate"><div class="highlight"><pre><span></span><span class="p">(</span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4</span><span class="p">).</span><span class="n">run</span><span class="p">();</span>
</pre></div>
</div>
<p>In the example above, the value <cite>4</cite> is broadcast to every element of <cite>t</cite>, regardless of the rank and sizes
of <cite>t</cite>. The same rule applies to 0D operators, which are effectively scalar values:</p>
<div class="highlight-cpp notranslate"><div class="highlight"><pre><span></span><span class="k">auto</span><span class="w"> </span><span class="n">t0</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_tensor</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">();</span>
<span class="k">auto</span><span class="w"> </span><span class="n">t2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_tensor</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">({</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">);</span>
<span class="p">(</span><span class="n">t2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t0</span><span class="p">).</span><span class="n">run</span><span class="p">();</span>
</pre></div>
</div>
<p>Besides scalars, operators of other ranks can also be broadcast:</p>
<div class="highlight-cpp notranslate"><div class="highlight"><pre><span></span><span class="k">auto</span><span class="w"> </span><span class="n">t1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_tensor</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">({</span><span class="mi">20</span><span class="p">});</span>
<span class="k">auto</span><span class="w"> </span><span class="n">t2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_tensor</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">({</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">);</span>
<span class="p">(</span><span class="n">t2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t1</span><span class="p">).</span><span class="n">run</span><span class="p">();</span>
</pre></div>
</div>
<p>In fact, any operator can be broadcast to another operator as long as the dimensions are <em>compatible</em>. For
broadcasting, compatible means either:</p>
<ul class="simple">
<li><p>The operator being broadcast from is a scalar or a 0D operator</p></li>
<li><p>The dimensions of the operator being broadcast from match all of the right-most dimensions of the operator being assigned to</p></li>
</ul>
<p>The first rule is discussed above. The second rule can be explained with this example:</p>
<div class="highlight-cpp notranslate"><div class="highlight"><pre><span></span><span class="k">auto</span><span class="w"> </span><span class="n">t3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_tensor</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">({</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">});</span>
<span class="k">auto</span><span class="w"> </span><span class="n">t5</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">make_tensor</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">({</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">);</span>
<span class="p">(</span><span class="n">t5</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t3</span><span class="p">).</span><span class="n">run</span><span class="p">();</span>
</pre></div>
</div>
<p>In this example <cite>t3</cite>’s dimensions (3, 4, 20) match the right-most (fastest-changing) dimensions of <cite>t5</cite>, so
the values in <cite>t3</cite> can be broadcast to <cite>t5</cite>. During this assignment the same values in memory in <cite>t3</cite> are
loaded repeatedly across the left-most dimensions of <cite>t5</cite> (10 and 6).</p>
</section>

