diff --git a/content/english/hpc/pipelining/tables.md b/content/english/hpc/pipelining/tables.md
index ad90c400..5c6e9eba 100644
--- a/content/english/hpc/pipelining/tables.md
+++ b/content/english/hpc/pipelining/tables.md
@@ -30,7 +30,7 @@ You can get latency and throughput numbers for a specific architecture from spec
 
 Some comments:
 
-- Because our minds are so used to the cost model where "more" means "worse," people mostly use *reciprocals* of throughput instead of throughput.
+- Reciprocal throughput (a unit of time) is generally used instead of throughput (a frequency) because execution times can be summed directly when estimating the cost of an instruction sequence, while frequencies cannot. Reciprocal throughput also aligns with the natural intuition that higher values represent lower performance.
 - If a certain instruction is especially frequent, its execution unit could be duplicated to increase its throughput — possibly to even more than one, but not higher than the [decode width](/hpc/architecture/layout).
 - Some instructions have a latency of 0. This means that these instruction are used to control the scheduler and don't reach the execution stage. They still have non-zero reciprocal throughput because the [CPU front-end](/hpc/architecture/layout) still needs to process them.
 - Most instructions are pipelined, and if they have the reciprocal throughput of $n$, this usually means that their execution unit can take another instruction after $n$ cycles (and if it is below 1, this means that there are multiple execution units, all capable of taking another instruction on the next cycle). One notable exception is [integer division](/hpc/arithmetic/division): it is either very poorly pipelined or not pipelined at all.