Commit a77c345

Adding NB blocking a2a; minor updates
1 parent f0805cb commit a77c345

5 files changed

+146
-11
lines changed

content/nb_collectives_intro.tex

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -9,10 +9,10 @@
99
the operation is posted and returned immediately. All participants of the Team
1010
should call this routine.
1111

12-
\item Collective Types: In the current specification, not all blocking collectives have
13-
their nonblocking variants. The nonblocking variants supported include alltoall,
14-
broadcast, and reduction collectives. The reduction types supported
15-
are defined in the Table \ref{reducetypes}.
12+
\item Collective Types: The nonblocking variants supported include alltoall,
13+
broadcast, and reduction collectives. Other collective operations such as
14+
collect, barrier, alltoalls, and sync will not have nonblocking variants. The
15+
reduction types supported are defined in Table \ref{teamreducetypes}.
1616

1717
\item Completion semantics: \openshmem programs can learn the status of the collective operations
1818
using the \FUNC{shmem\_req\_test} routine and can be completed using
@@ -21,7 +21,7 @@
2121
\item Threads: While using SHMEM\_THREAD\_MULTIPLE, the \openshmem
2222
programs are allowed to call multiple collective operations on different threads
2323
and the same Team. The collective operations invoked on different threads
24-
are ordered by user-provided tag. The user may choose to not order the
24+
are ordered by a user-provided tag. The user may choose to not order the
2525
collective operations by using the library constant
2626
\CONST{SHMEM\_COLL\_UNORDERED} instead of specifying the tag.
2727

content/shmem_alltoall_nb.tex

Lines changed: 129 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,129 @@
1+
\apisummary{
2+
Exchanges a fixed amount of contiguous data blocks between all pairs
3+
of \acp{PE} participating in the collective routine.
4+
}
5+
6+
\begin{apidefinition}
7+
8+
%% C11
9+
\begin{C11synopsis}
10+
int @\FuncDecl{shmem\_alltoall\_nb}@(shmem_team_t team, TYPE *dest, const TYPE
11+
*source, size_t nelems, uint32_t tag, shmem_req_h *request);
12+
\end{C11synopsis}
13+
where \TYPE{} is one of the standard \ac{RMA} types specified by Table \ref{stdrmatypes}.
14+
15+
\begin{Csynopsis}
16+
\end{Csynopsis}
17+
\begin{CsynopsisCol}
18+
int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_alltoall\_nb}@(shmem_team_t team,
19+
TYPE *dest, const TYPE *source, size_t nelems, uint32_t tag, shmem_req_h *request);
20+
\end{CsynopsisCol}
21+
where \TYPE{} is one of the standard \ac{RMA} types and has a corresponding \TYPENAME{} specified by Table \ref{stdrmatypes}.
22+
23+
\begin{CsynopsisCol}
24+
int @\FuncDecl{shmem\_alltoallmem\_nb}@(shmem_team_t team, void *dest, const
25+
void *source, size_t nelems, uint32_t tag, shmem_req_h *request);
26+
\end{CsynopsisCol}
27+
28+
\begin{apiarguments}
29+
30+
\apiargument{IN}{team}{A valid \openshmem team handle.}%
31+
32+
\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive
33+
the combined total of \VAR{nelems} elements from each \ac{PE} in the
34+
team.
35+
The type of \dest{} should match that implied in the SYNOPSIS section.}
36+
\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems}
37+
elements of data for each \ac{PE} in the team, ordered according to
38+
destination \ac{PE}.
39+
The type of \source{} should match that implied in the SYNOPSIS section.}
40+
\apiargument{IN}{nelems}{
41+
The number of elements to exchange for each \ac{PE}.
42+
For \FUNC{shmem\_alltoallmem\_nb}, elements are bytes;
43+
for \FUNC{shmem\_alltoall\{32,64\}\_nb}, elements are 4 or 8 bytes,
44+
respectively.
45+
}
46+
\apiargument{IN}{tag}{A user-defined tag to order the collective operation;
47+
\CONST{SHMEM\_COLL\_UNORDERED} can be provided if no order is required.}
48+
\apiargument{OUT}{request}{An opaque request handle identifying the collective
49+
operation.}
50+
51+
\end{apiarguments}
52+
53+
\apidescription{
54+
The \FUNC{shmem\_alltoall\_nb} routines are collective routines. All
55+
\acp{PE} in the provided team must participate in the collective. If
56+
\VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is
57+
otherwise invalid, the behavior is undefined.
58+
59+
{\bf Invocation and completion}: A call to the nonblocking alltoall routine posts the operation and returns
60+
immediately without necessarily completing the operation. On the successful
61+
post of the operation, an opaque request handle is created and returned. The
62+
operation is completed after a call to \FUNC{shmem\_req\_test} or
63+
\FUNC{shmem\_req\_wait}. When the operation is complete, the request handle
64+
is deallocated and cannot be reused.
65+
66+
Although nonblocking alltoall differs from blocking alltoall in its invocation
67+
and completion semantics, the data exchange semantics are similar.
68+
69+
{\bf Data exchange semantics}:
70+
In this routine, each \ac{PE}
71+
participating in the operation exchanges \VAR{nelems} data elements
72+
with all other \acp{PE} participating in the operation.
73+
The size of a data element is:
74+
\begin{itemize}
75+
\item 32 bits for \FUNC{shmem\_alltoall32}
76+
\item 64 bits for \FUNC{shmem\_alltoall64}
77+
\item 8 bits for \FUNC{shmem\_alltoallmem}
78+
\item \FUNC{sizeof}(\TYPE{}) for alltoall routines taking typed \VAR{source} and \VAR{dest}
79+
\end{itemize}
80+
81+
The data being sent and received are
82+
stored in a contiguous symmetric data object. The total size of each \ac{PE}'s
83+
\VAR{source} object and \VAR{dest} object is \VAR{nelems} times the size of
84+
an element
85+
times \VAR{N}, where \VAR{N} equals the number of \acp{PE} participating
86+
in the operation.
87+
The \VAR{source} object contains \VAR{N} blocks of data
88+
(where the size of each block is defined by \VAR{nelems}) and each block of data
89+
is sent to a different \ac{PE}.
90+
91+
The same \dest{} and \source{}
92+
arrays, and the same value for \VAR{nelems},
93+
must be passed by all \acp{PE} that participate in the collective.
94+
95+
Given a \ac{PE} \VAR{i} that is the \kth \ac{PE}
96+
participating in the operation and a \ac{PE}
97+
\VAR{j} that is the \lth \ac{PE}
98+
participating in the operation,
99+
100+
\ac{PE} \VAR{i} sends the \lth block of its \VAR{source} object to
101+
the \kth block of
102+
the \VAR{dest} object of \ac{PE} \VAR{j}.
103+
104+
105+
Like the data exchange semantics, the entry and completion
106+
criteria of blocking and nonblocking alltoall are similar.
107+
108+
{\bf Entry criteria}: Before any \ac{PE} calls a \FUNC{shmem\_alltoall\_nb} routine,
109+
the following condition must be ensured:
110+
\begin{itemize}
111+
\item The \VAR{dest} data object on all \acp{PE} in the team is
112+
ready to accept the \FUNC{shmem\_alltoall\_nb} data.
113+
\end{itemize}
114+
Otherwise, the behavior is undefined.
115+
116+
{\bf Completion criteria}: Upon completion, the following is true for
117+
the local \ac{PE}:
118+
\begin{itemize}
119+
\item Its \VAR{dest} symmetric data object is completely updated and
120+
the data has been copied out of the \VAR{source} data object.
121+
\end{itemize}
122+
}
123+
124+
\apireturnvalues{
125+
Zero on successful posting of the collective operation; otherwise, nonzero.
126+
}
127+
128+
\end{apidefinition}
129+

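The data exchange semantics described in content/shmem_alltoall_nb.tex can be illustrated with a small serial model. The sketch below is plain C, not OpenSHMEM code: `alltoall_model` and `alltoall_buffer_bytes` are hypothetical helper names, and the per-PE symmetric objects are assumed to be laid out back to back in one ordinary array so that the block indexing can be checked without a runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Serial model of the alltoall data exchange (illustration only).
 * These helpers are hypothetical and are not part of any OpenSHMEM library.
 */

/* Size in bytes of one PE's source (or dest) object:
 * nelems times the element size times the number of PEs. */
size_t alltoall_buffer_bytes(size_t npes, size_t nelems, size_t elem_size)
{
    return npes * nelems * elem_size;
}

/*
 * src_all and dst_all each hold npes objects back to back; object i stands
 * in for the symmetric source/dest object of the i-th PE in the team.
 * The k-th PE sends the l-th block of its source object to the k-th block
 * of the dest object of the l-th PE.
 */
void alltoall_model(void *dst_all, const void *src_all,
                    size_t npes, size_t nelems, size_t elem_size)
{
    size_t block = nelems * elem_size;                 /* bytes per block  */
    size_t object = alltoall_buffer_bytes(npes, nelems, elem_size);
    unsigned char *dst = dst_all;
    const unsigned char *src = src_all;

    for (size_t k = 0; k < npes; k++) {        /* ordinal of the sender   */
        for (size_t l = 0; l < npes; l++) {    /* ordinal of the receiver */
            memcpy(dst + l * object + k * block,
                   src + k * object + l * block,
                   block);
        }
    }
}
```

For example, with 3 PEs and `nelems` equal to 2, calling `alltoall_model(dst, src, 3, 2, sizeof(int))` on `int src[3][6]` and `int dst[3][6]` leaves `dst[j][k*2 + e] == src[k][j*2 + e]` for every pair of ordinals `j` and `k`, which is the block rule stated in the description.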
content/shmem_broadcast_nb.tex

Lines changed: 5 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -43,8 +43,10 @@
4343
}
4444
\apiargument{IN}{PE\_root}{Zero-based ordinal of the \ac{PE}, with respect to
4545
the team, from which the data is copied.}
46-
\apiargument{IN}{tag}{A user defined tag to order the collective operation.}
47-
\apiargument{OUT}{request}{An opaque request handle identifying the collective operation}
46+
\apiargument{IN}{tag}{A user-defined tag to order the collective operation;
47+
\CONST{SHMEM\_COLL\_UNORDERED} can be provided if no order is required.}
48+
\apiargument{OUT}{request}{An opaque request handle identifying the collective
49+
operation.}
4850

4951

5052
\end{apiarguments}
@@ -97,7 +99,7 @@
9799

98100

99101
\apireturnvalues{
100-
Zero on successfull posting of the collective
102+
Zero on successful posting of the collective
101103
operation; otherwise, nonzero.
102104
}
103105

content/shmem_collective_test.tex

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,7 @@
55
\begin{apidefinition}
66

77
\begin{Csynopsis}
8-
int @\FuncDecl{shmem\_req\_test}@(shmem_req_t request);
8+
int @\FuncDecl{shmem\_req\_test}@(shmem_req_h request);
99
\end{Csynopsis}
1010

1111
\begin{apiarguments}
@@ -17,8 +17,8 @@
1717
\apidescription{
1818
A call to \FUNC{shmem\_req\_test} returns immediately. If the
1919
collective operation identified by the request is completed, it returns
20-
true (non-negative integer). The request object is deallocated. If the collective operation is not
21-
completed, it returns zero.
20+
zero and the request object is deallocated. If the collective operation is not
21+
completed, it returns a nonzero integer.
2222

2323
In a multithreaded environment, the collective and the
2424
\FUNC{shmem\_req\_test} can be

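The return convention in this hunk (zero when the operation has completed, nonzero while it is still pending) suggests a polling loop of the following shape. This is only a sketch against a mock: `mock_req`, `mock_req_test`, and `poll_until_complete` are hypothetical names invented for illustration, the mock takes a pointer so that it can record deallocation, and nothing here calls the real `shmem_req_test`.

```c
#include <assert.h>

/* Hypothetical stand-in for a request; NOT the opaque shmem_req_h type. */
typedef struct {
    int polls_left;  /* test calls remaining before the operation completes */
    int valid;       /* cleared once the request has been deallocated */
} mock_req;

/* Mirrors the convention in the diff: zero means complete (and the request
 * is deallocated), nonzero means the operation is still pending. */
int mock_req_test(mock_req *req)
{
    if (req->polls_left > 0) {
        req->polls_left--;
        return 1;
    }
    req->valid = 0;
    return 0;
}

/* Polls until completion; returns how many times the request was pending. */
int poll_until_complete(mock_req *req)
{
    int pending = 0;
    while (mock_req_test(req) != 0) {
        pending++;   /* a real program could overlap useful work here */
    }
    return pending;
}
```

With a mock request initialized as `{3, 1}`, `poll_until_complete` observes three pending results before the zero return, after which the mock marks the request as no longer valid.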
main_spec.tex

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -390,6 +390,10 @@ \subsection{Nonblocking Collective Routines}\label{subsec:nb_coll}
390390
\subsubsection{\textbf{SHMEM\_BROADCAST\_NB}}\label{subsec:shmem_broadcast_nb}
391391
\input{content/shmem_broadcast_nb.tex}
392392

393+
\subsubsection{\textbf{SHMEM\_ALLTOALL\_NB}}\label{subsec:shmem_alltoall_nb}
394+
\input{content/shmem_alltoall_nb.tex}
395+
396+
393397
\subsubsection{\textbf{SHMEM\_COLLECTIVE\_TEST}}\label{subsec:shmem_collective_test}
394398
\input{content/shmem_collective_test.tex}
395399
