Commit 0f2b44b

[Opset15][Spec] Add and update specifications for EmbeddingBag ops (#25100)
### Details:
- *Add specification for EmbeddingBagOffsets-15 and EmbeddingBagPacked-15*
- *Remove dead links and improve specification of EmbeddingBagOffsetsSum-3 and EmbeddingBagPackedSum-3*

### Tickets:
- *141862*
- *141863*
1 parent b80e30c commit 0f2b44b

4 files changed

+396
-8
lines changed
Lines changed: 184 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,184 @@
1+
.. {#openvino_docs_ops_sparse_EmbeddingBagOffsets_15}
2+
3+
EmbeddingBagOffsets
4+
======================
5+
6+
7+
.. meta::
8+
:description: Learn about EmbeddingBagOffsets-15 - a sparse operation, which
9+
can be performed on three required and two optional input tensors.
10+
11+
**Versioned name**: *EmbeddingBagOffsets-15*
12+
13+
**Category**: *Sparse*
14+
15+
**Short description**: Computes sums or means of "bags" of embeddings, without instantiating the intermediate embeddings.
16+
17+
**Detailed description**:
18+
19+
Operation EmbeddingBagOffsets is an implementation of ``torch.nn.EmbeddingBag`` with indices and offsets inputs being 1D tensors.
20+
21+
For each index in ``indices`` this operator gathers values from the ``emb_table`` embedding table. The values at indices belonging to the same bag (as defined by the ``offsets`` input) are then reduced according to the ``reduction`` attribute.
22+
23+
Values in ``offsets`` define the starting index of each "bag" in the ``indices`` tensor,
e.g. ``offsets`` with value ``[0, 3, 4, 4, 6]`` defines 5 "bags" containing ``[3, 1, 0, 2, num_indices-6]`` elements, corresponding to the ``[indices[0:3], indices[3:4], empty_bag, indices[4:6], indices[6:]]`` slices of indices per bag.
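
The mapping from ``offsets`` to per-bag slices can be sketched in plain Python. This is only an illustration: the helper name ``split_into_bags`` and the index values are hypothetical and not part of the specification.

```python
from typing import List, Sequence


def split_into_bags(indices: Sequence[int], offsets: Sequence[int]) -> List[List[int]]:
    """Return the slice of `indices` that belongs to each bag."""
    # Each bag ends where the next one starts; the last bag ends at len(indices).
    ends = list(offsets[1:]) + [len(indices)]
    return [list(indices[start:end]) for start, end in zip(offsets, ends)]


# Hypothetical index values; only their positions matter here (num_indices = 8).
indices = [10, 11, 12, 13, 14, 15, 16, 17]
bags = split_into_bags(indices, [0, 3, 4, 4, 6])
print([len(bag) for bag in bags])  # [3, 1, 0, 2, 2]
```

Two equal consecutive offsets (here ``4, 4``) produce an empty bag, which is the case handled by ``default_index``.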
25+
26+
EmbeddingBagOffsets is equivalent to the following NumPy snippet:
27+
28+
.. code-block:: py

    from typing import Literal, Optional

    import numpy as np


    def embedding_bag_offsets(
        emb_table: np.ndarray,
        indices: np.ndarray,
        offsets: np.ndarray,
        default_index: Optional[int] = None,
        per_sample_weights: Optional[np.ndarray] = None,
        reduction: Literal["sum", "mean"] = "sum",
    ):
        assert (
            reduction == "sum" or per_sample_weights is None
        ), "Input per_sample_weights is only supported in sum reduction."
        if per_sample_weights is None:
            per_sample_weights = np.ones_like(indices)
        # Gather one (weighted) embedding per index.
        embeddings = []
        for emb_idx, emb_weight in zip(indices, per_sample_weights):
            embeddings.append(emb_table[emb_idx] * emb_weight)
        previous_offset = offsets[0]
        bags = []
        # Append the length of indices so that the last bag has an end position.
        offsets = np.append(offsets, len(indices))
        for bag_offset in offsets[1:]:
            bag_size = bag_offset - previous_offset
            if bag_size != 0:
                embedding_bag = embeddings[previous_offset:bag_offset]
                reduced_bag = np.add.reduce(embedding_bag)
                if reduction == "mean":
                    reduced_bag = reduced_bag / bag_size
                bags.append(reduced_bag)
            else:
                # Empty bag case
                if default_index is not None and default_index != -1:
                    bags.append(emb_table[default_index])
                else:
                    bags.append(np.zeros(emb_table.shape[1:]))
            previous_offset = bag_offset
        return np.stack(bags, axis=0)
65+
66+
67+
**Attributes**:
68+
69+
* *reduction*
70+
71+
* **Description**: reduction mode.
72+
* **Range of values**:
73+
74+
* sum - compute weighted sum, using corresponding values of ``per_sample_weights`` as weights if provided.
75+
* mean - compute average of values in bag. Input ``per_sample_weights`` is not supported in this mode; providing it raises an exception.
76+
77+
* **Type**: ``string``
78+
* **Default value**: sum
79+
* **Required**: *no*
80+
81+
**Inputs**:
82+
83+
* **1**: ``emb_table`` tensor containing the embedding lookup table of the module of shape ``[num_emb, emb_dim1, emb_dim2, ...]`` and of type *T*. **Required.**
84+
* **2**: ``indices`` tensor of shape ``[num_indices]`` and of type *T_IND*. **Required.**
85+
* **3**: ``offsets`` tensor of shape ``[batch]`` and of type *T_IND* containing the starting index positions of each "bag" in ``indices``. Maximum value of offsets cannot be greater than length of ``indices``. **Required.**
86+
* **4**: ``default_index`` scalar of type *T_IND* containing default index in embedding table to fill empty "bags". If set to ``-1`` or not provided, empty "bags" are filled with zeros. Reverse indexing using negative values is not supported. **Optional.**
87+
* **5**: ``per_sample_weights`` tensor of the same shape as ``indices`` and of type *T*. Supported only when the *reduction* attribute is set to ``"sum"``. Each value in this tensor is multiplied with the values gathered from the embedding table for the corresponding index. If not provided, a tensor of ones is used. **Optional.**
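
The constraints stated in the inputs list can be collected into a small checker. This is a minimal sketch in plain Python: the function name ``check_inputs`` and the error messages are hypothetical, and the validation actually performed by an implementation may differ.

```python
from typing import Optional, Sequence


def check_inputs(
    indices: Sequence[int],
    offsets: Sequence[int],
    default_index: Optional[int] = None,
    per_sample_weights: Optional[Sequence[float]] = None,
    reduction: str = "sum",
) -> None:
    """Raise ValueError if a constraint from the inputs list is violated."""
    if reduction not in ("sum", "mean"):
        raise ValueError("reduction must be 'sum' or 'mean'")
    # The maximum value of offsets cannot be greater than the length of indices.
    if any(offset > len(indices) for offset in offsets):
        raise ValueError("offsets must not exceed the length of indices")
    # -1 means "fill with zeros"; other negative values are not supported.
    if default_index is not None and default_index < -1:
        raise ValueError("reverse indexing with negative default_index is not supported")
    if per_sample_weights is not None:
        if reduction != "sum":
            raise ValueError("per_sample_weights is only supported with reduction='sum'")
        if len(per_sample_weights) != len(indices):
            raise ValueError("per_sample_weights must have the same shape as indices")


# Passes silently for the values used in Example 1 below.
check_inputs([0, 2, 3, 4], [0, 2, 2], default_index=0, per_sample_weights=[0.5] * 4)
```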
88+
89+
**Outputs**:
90+
91+
* **1**: tensor of shape ``[batch, emb_dim1, emb_dim2, ...]`` and of type *T* containing embeddings for each bag.
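
The output shape follows directly from the shapes of ``offsets`` and ``emb_table``. A short sketch, with the hypothetical helper name ``output_shape``:

```python
from typing import List, Sequence


def output_shape(emb_table_shape: Sequence[int], offsets_shape: Sequence[int]) -> List[int]:
    """Shape of the result: one row per bag, with the trailing dims of emb_table."""
    (batch,) = offsets_shape  # offsets is a 1D tensor of shape [batch]
    return [batch] + list(emb_table_shape[1:])


print(output_shape([5, 2], [3]))  # [3, 2]
```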
92+
93+
**Types**
94+
95+
* *T*: any numeric type.
96+
* *T_IND*: ``int32`` or ``int64``.
97+
98+
**Example**
99+
100+
*Example 1: per_sample_weights are provided, default_index is set to 0 to fill the empty bag with values gathered from emb_table at the given index.*
101+
102+
.. code-block:: xml
103+
104+
<layer ... type="EmbeddingBagOffsets" ... >
105+
<data reduction="sum"/>
106+
<input>
107+
<port id="0"> <!-- emb_table value is: [[-0.2, -0.6], [-0.1, -0.4], [-1.9, -1.8], [-1., 1.5], [ 0.8, -0.7]] -->
108+
<dim>5</dim>
109+
<dim>2</dim>
110+
</port>
111+
<port id="1"> <!-- indices value is: [0, 2, 3, 4] -->
112+
<dim>4</dim>
113+
</port>
114+
<port id="2"> <!-- offsets value is: [0, 2, 2] - 3 "bags" containing [2,0,4-2] elements, second "bag" is empty -->
115+
<dim>3</dim>
116+
</port>
117+
<port id="3"/> <!-- default_index value is: 0 -->
118+
<port id="4"> <!-- per_sample_weights value is: [0.5, 0.5, 0.5, 0.5] -->
119+
<dim>4</dim>
120+
</port>
121+
</input>
122+
<output>
123+
<port id="5"> <!-- output value is: [[-1.05, -1.2], [-0.2, -0.6], [-0.1, 0.4]] -->
124+
<dim>3</dim>
125+
<dim>2</dim>
126+
</port>
127+
</output>
128+
</layer>
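
The output values in the example above can be checked by hand with a dependency-free sketch of the sum reduction. The helper ``embedding_bag_sum`` is hypothetical and, for brevity, assumes a 2D ``emb_table`` given as nested lists.

```python
import math
from typing import List, Optional, Sequence


def embedding_bag_sum(
    emb_table: Sequence[Sequence[float]],
    indices: Sequence[int],
    offsets: Sequence[int],
    default_index: Optional[int] = None,
    per_sample_weights: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Weighted sum per bag; assumes a 2D embedding table (illustration only)."""
    emb_dim = len(emb_table[0])
    if per_sample_weights is None:
        per_sample_weights = [1.0] * len(indices)
    ends = list(offsets[1:]) + [len(indices)]
    output = []
    for start, end in zip(offsets, ends):
        if start == end:
            # Empty bag: copy the default row, or fill with zeros.
            if default_index is not None and default_index != -1:
                output.append(list(emb_table[default_index]))
            else:
                output.append([0.0] * emb_dim)
            continue
        row = [0.0] * emb_dim
        for pos in range(start, end):
            for d in range(emb_dim):
                row[d] += emb_table[indices[pos]][d] * per_sample_weights[pos]
        output.append(row)
    return output


emb_table = [[-0.2, -0.6], [-0.1, -0.4], [-1.9, -1.8], [-1.0, 1.5], [0.8, -0.7]]
result = embedding_bag_sum(emb_table, [0, 2, 3, 4], [0, 2, 2], 0, [0.5, 0.5, 0.5, 0.5])
expected = [[-1.05, -1.2], [-0.2, -0.6], [-0.1, 0.4]]
assert all(
    math.isclose(r, e, abs_tol=1e-9)
    for res_row, exp_row in zip(result, expected)
    for r, e in zip(res_row, exp_row)
)
```

The second row equals ``emb_table[0]`` because the second bag is empty and ``default_index`` is ``0``.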
129+
130+
*Example 2: per_sample_weights are provided, default_index is set to -1 to fill empty bag with 0.*
131+
132+
.. code-block:: xml
133+
134+
<layer ... type="EmbeddingBagOffsets" ... >
135+
<data reduction="sum"/>
136+
<input>
137+
<port id="0"> <!-- emb_table value is: [[-0.2, -0.6], [-0.1, -0.4], [-1.9, -1.8], [-1., 1.5], [ 0.8, -0.7]] -->
138+
<dim>5</dim>
139+
<dim>2</dim>
140+
</port>
141+
<port id="1"> <!-- indices value is: [0, 2, 3, 4] -->
142+
<dim>4</dim>
143+
</port>
144+
<port id="2"> <!-- offsets value is: [0, 2, 2] - 3 "bags" containing [2,0,4-2] elements, second "bag" is empty -->
145+
<dim>3</dim>
146+
</port>
147+
<port id="3"/> <!-- default_index value is: -1 - fill empty bag with 0-->
148+
<port id="4"> <!-- per_sample_weights value is: [0.5, 0.2, -2, 1] -->
149+
<dim>4</dim>
150+
</port>
151+
</input>
152+
<output>
153+
<port id="5"> <!-- output value is: [[-0.48, -0.66], [0., 0.], [2.8, -3.7]] -->
154+
<dim>3</dim>
155+
<dim>2</dim>
156+
</port>
157+
</output>
158+
</layer>
159+
160+
*Example 3: reduction is set to mean; default_index and per_sample_weights are not provided, so the empty bag is filled with 0.*
161+
162+
.. code-block:: xml
163+
164+
<layer ... type="EmbeddingBagOffsets" ... >
165+
<data reduction="mean"/>
166+
<input>
167+
<port id="0"> <!-- emb_table value is: [[-0.2, -0.6], [-0.1, -0.4], [-1.9, -1.8], [-1., 1.5], [ 0.8, -0.7]] -->
168+
<dim>5</dim>
169+
<dim>2</dim>
170+
</port>
171+
<port id="1"> <!-- indices value is: [0, 2, 3, 4] -->
172+
<dim>4</dim>
173+
</port>
174+
<port id="2"> <!-- offsets value is: [0, 2, 2] - 3 "bags" containing [2,0,4-2] elements, second "bag" is empty -->
175+
<dim>3</dim>
176+
</port>
177+
</input>
178+
<output>
179+
<port id="3"> <!-- output value is: [[-1.05, -1.2], [0., 0.], [-0.1, 0.4]] -->
180+
<dim>3</dim>
181+
<dim>2</dim>
182+
</port>
183+
</output>
184+
</layer>
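
For the mean reduction, each non-empty bag is averaged over its own size, and empty bags are filled as in the sum case. A minimal sketch that reproduces the values of Example 3; the helper ``embedding_bag_mean`` is hypothetical and assumes a 2D ``emb_table`` given as nested lists.

```python
import math
from typing import List, Optional, Sequence


def embedding_bag_mean(
    emb_table: Sequence[Sequence[float]],
    indices: Sequence[int],
    offsets: Sequence[int],
    default_index: Optional[int] = None,
) -> List[List[float]]:
    """Average the rows of each bag; assumes a 2D embedding table (illustration only)."""
    emb_dim = len(emb_table[0])
    ends = list(offsets[1:]) + [len(indices)]
    output = []
    for start, end in zip(offsets, ends):
        bag = [emb_table[i] for i in indices[start:end]]
        if not bag:
            # Empty bag: copy the default row, or fill with zeros.
            if default_index is not None and default_index != -1:
                output.append(list(emb_table[default_index]))
            else:
                output.append([0.0] * emb_dim)
        else:
            # zip(*bag) iterates over columns, so each column is averaged separately.
            output.append([sum(column) / len(bag) for column in zip(*bag)])
    return output


emb_table = [[-0.2, -0.6], [-0.1, -0.4], [-1.9, -1.8], [-1.0, 1.5], [0.8, -0.7]]
result = embedding_bag_mean(emb_table, [0, 2, 3, 4], [0, 2, 2])
expected = [[-1.05, -1.2], [0.0, 0.0], [-0.1, 0.4]]
assert all(
    math.isclose(r, e, abs_tol=1e-9)
    for res_row, exp_row in zip(result, expected)
    for r, e in zip(res_row, exp_row)
)
```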

docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs/sparse/embedding-bag-offsets-sum-3.rst

Lines changed: 74 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,15 +14,56 @@ EmbeddingBagOffsetsSum
1414

1515
**Short description**: Computes sums of "bags" of embeddings, without instantiating the intermediate embeddings.
1616

17-
**Detailed description**: This is the second case of the PyTorch `EmbeddingBag <https://pytorch.org/docs/stable/nn.html#embeddingbag>`__ , it has indices in two 1D tensors provided as 2nd and 3rd inputs. For each index in ``indices`` this operator gets values from ``data`` embedding table and sums all values belonging to each bag. Values in ``offsets`` define starting index in ``indices`` tensor of each "bag", e.g. ``offsets`` with value ``[0,3,4,4,6]`` define 5 "bags" containing ``[3,1,0,2,n-6]`` elements.
17+
**Detailed description**:
18+
19+
Operation EmbeddingBagOffsetsSum is an implementation of ``torch.nn.EmbeddingBag`` in sum mode, with indices and offsets inputs being 1D tensors.
20+
21+
For each index in ``indices`` this operator gathers values from the ``emb_table`` embedding table. The values at indices belonging to the same bag (as defined by the ``offsets`` input) are then summed.
22+
23+
Values in ``offsets`` define starting index in ``indices`` tensor of each "bag",
24+
e.g. ``offsets`` with value ``[0, 3, 4, 4, 6]`` define 5 "bags" containing ``[3, 1, 0, 2, num_indices-6]`` elements corresponding to ``[indices[0:3], indices[3:4], empty_bag, indices[4:6], indices[6:]]`` slices of indices per bag.
25+
26+
EmbeddingBagOffsetsSum is equivalent to the following NumPy snippet:
27+
28+
.. code-block:: py

    from typing import Optional

    import numpy as np


    def embedding_bag_offsets(
        emb_table: np.ndarray,
        indices: np.ndarray,
        offsets: np.ndarray,
        default_index: Optional[int] = None,
        per_sample_weights: Optional[np.ndarray] = None,
    ):
        if per_sample_weights is None:
            per_sample_weights = np.ones_like(indices)
        # Gather one (weighted) embedding per index.
        embeddings = []
        for emb_idx, emb_weight in zip(indices, per_sample_weights):
            embeddings.append(emb_table[emb_idx] * emb_weight)
        previous_offset = offsets[0]
        bags = []
        # Append the length of indices so that the last bag has an end position.
        offsets = np.append(offsets, len(indices))
        for bag_offset in offsets[1:]:
            bag_size = bag_offset - previous_offset
            if bag_size != 0:
                embedding_bag = embeddings[previous_offset:bag_offset]
                reduced_bag = np.add.reduce(embedding_bag)
                bags.append(reduced_bag)
            else:
                # Empty bag case
                if default_index is not None and default_index != -1:
                    bags.append(emb_table[default_index])
                else:
                    bags.append(np.zeros(emb_table.shape[1:]))
            previous_offset = bag_offset
        return np.stack(bags, axis=0)
1859
1960
**Attributes**: EmbeddingBagOffsetsSum operation has no attributes.
2061

2162
**Inputs**:
2263

2364
* **1**: ``emb_table`` tensor containing the embedding lookup table of the module of shape ``[num_emb, emb_dim1, emb_dim2, ...]`` and of type *T*. **Required.**
2465
* **2**: ``indices`` tensor of shape ``[num_indices]`` and of type *T_IND*. **Required.**
25-
* **3**: ``offsets`` tensor of shape ``[batch]`` and of type *T_IND* containing the starting index positions of each "bag" in ``indices``. **Required.**
66+
* **3**: ``offsets`` tensor of shape ``[batch]`` and of type *T_IND* containing the starting index positions of each "bag" in ``indices``. Maximum value of offsets cannot be greater than length of ``indices``. **Required.**
2667
* **4**: ``default_index`` scalar of type *T_IND* containing default index in embedding table to fill empty "bags". If set to ``-1`` or not provided, empty "bags" are filled with zeros. Reverse indexing using negative values is not supported. **Optional.**
2768
* **5**: ``per_sample_weights`` tensor of the same shape as ``indices`` and of type *T*. Each value in this tensor is multiplied with the values gathered from the embedding table for the corresponding index. If not provided, a tensor of ones is used. **Optional.**
2869

@@ -37,7 +78,9 @@ EmbeddingBagOffsetsSum
3778

3879
**Example**
3980

40-
.. code-block:: cpp
81+
*Example 1: per_sample_weights are provided, default_index is set to 0 to fill the empty bag with values gathered from emb_table at the given index.*
82+
83+
.. code-block:: xml
4184
4285
<layer ... type="EmbeddingBagOffsetsSum" ... >
4386
<input>
@@ -52,7 +95,7 @@ EmbeddingBagOffsetsSum
5295
<dim>3</dim>
5396
</port>
5497
<port id="3"/> <!-- default_index value is: 0 -->
55-
<port id="4"/> <!-- per_sample_weigths value is: [0.5, 0.5, 0.5, 0.5] -->
98+
<port id="4"> <!-- per_sample_weights value is: [0.5, 0.5, 0.5, 0.5] -->
5699
<dim>4</dim>
57100
</port>
58101
</input>
@@ -64,4 +107,31 @@ EmbeddingBagOffsetsSum
64107
</output>
65108
</layer>
66109
110+
*Example 2: per_sample_weights are provided, default_index is set to -1 to fill empty bag with 0.*
111+
112+
.. code-block:: xml
67113
114+
<layer ... type="EmbeddingBagOffsetsSum" ... >
115+
<input>
116+
<port id="0"> <!-- emb_table value is: [[-0.2, -0.6], [-0.1, -0.4], [-1.9, -1.8], [-1., 1.5], [ 0.8, -0.7]] -->
117+
<dim>5</dim>
118+
<dim>2</dim>
119+
</port>
120+
<port id="1"> <!-- indices value is: [0, 2, 3, 4] -->
121+
<dim>4</dim>
122+
</port>
123+
<port id="2"> <!-- offsets value is: [0, 2, 2] - 3 "bags" containing [2,0,4-2] elements, second "bag" is empty -->
124+
<dim>3</dim>
125+
</port>
126+
<port id="3"/> <!-- default_index value is: -1 - fill empty bag with 0-->
127+
<port id="4"> <!-- per_sample_weights value is: [0.5, 0.5, 0.5, 0.5] -->
128+
<dim>4</dim>
129+
</port>
130+
</input>
131+
<output>
132+
<port id="5"> <!-- output value is: [[-1.05, -1.2], [0., 0.], [-0.1, 0.4]] -->
133+
<dim>3</dim>
134+
<dim>2</dim>
135+
</port>
136+
</output>
137+
</layer>
