|
| 1 | +\apisummary{ |
| 2 | + Exchanges a fixed amount of contiguous data blocks between all pairs |
| 3 | + of \acp{PE} participating in the collective routine. |
| 4 | +} |
| 5 | + |
| 6 | +\begin{apidefinition} |
| 7 | + |
| 8 | +%% C11 |
| 9 | +\begin{C11synopsis} |
| 10 | +int @\FuncDecl{shmem\_alltoall\_nb}@(shmem_team_t team, TYPE *dest, const TYPE |
| 11 | +*source, size_t nelems, uint32_t tag, shmem_req_h *request); |
| 12 | +\end{C11synopsis} |
| 13 | +where \TYPE{} is one of the standard \ac{RMA} types specified by Table \ref{stdrmatypes}. |
| 14 | + |
| 15 | +\begin{Csynopsis} |
| 16 | +\end{Csynopsis} |
| 17 | +\begin{CsynopsisCol} |
| 18 | +int @\FuncDecl{shmem\_\FuncParam{TYPENAME}\_alltoall\_nb}@(shmem_team_t team, |
| 19 | +TYPE *dest, const TYPE *source, size_t nelems, uint32_t tag, shmem_req_h *request); |
| 20 | +\end{CsynopsisCol} |
| 21 | +where \TYPE{} is one of the standard \ac{RMA} types and has a corresponding \TYPENAME{} specified by Table \ref{stdrmatypes}. |
| 22 | + |
| 23 | +\begin{CsynopsisCol} |
| 24 | +int @\FuncDecl{shmem\_alltoallmem\_nb}@(shmem_team_t team, void *dest, const |
| 25 | +void *source, size_t nelems, uint32_t tag, shmem_req_h *request); |
| 26 | +\end{CsynopsisCol} |
| 27 | + |
| 28 | +\begin{apiarguments} |
| 29 | + |
| 30 | +\apiargument{IN}{team}{A valid \openshmem team handle.}% |
| 31 | + |
| 32 | +\apiargument{OUT}{dest}{Symmetric address of a data object large enough to receive |
| 33 | + the combined total of \VAR{nelems} elements from each \ac{PE} in the |
| 34 | + team. |
| 35 | + The type of \dest{} should match that implied in the SYNOPSIS section.} |
| 36 | +\apiargument{IN}{source}{Symmetric address of a data object that contains \VAR{nelems} |
| 37 | + elements of data for each \ac{PE} in the team, ordered according to |
| 38 | + destination \ac{PE}. |
| 39 | + The type of \source{} should match that implied in the SYNOPSIS section.} |
| 40 | +\apiargument{IN}{nelems}{ |
| 41 | + The number of elements to exchange for each \ac{PE}. |
| 42 | + For \FUNC{shmem\_alltoallmem\_nb}, elements are bytes; |
| 43 | + for \FUNC{shmem\_alltoall\{32,64\}\_nb}, elements are 4 or 8 bytes, |
| 44 | + respectively. |
| 45 | +} |
| 46 | +\apiargument{IN}{tag}{A user-defined tag used to order the collective operation; |
| 47 | +SHMEM\_COLL\_UNORDERED can be provided if no ordering is required.} |
| 48 | +\apiargument{OUT}{request}{An opaque request handle identifying the collective |
| 49 | +operation.} |
| 50 | + |
| 51 | +\end{apiarguments} |
| 52 | + |
| 53 | +\apidescription{ |
| 54 | + The \FUNC{shmem\_alltoall\_nb} routines are collective routines. All |
| 55 | + \acp{PE} in the provided team must participate in the collective. If |
| 56 | + \VAR{team} compares equal to \LibConstRef{SHMEM\_TEAM\_INVALID} or is |
| 57 | + otherwise invalid, the behavior is undefined. |
| 58 | + |
| 59 | + {\bf Invocation and completion}: A call to the nonblocking alltoall routine posts the operation and returns |
| 60 | + immediately, without necessarily completing the operation. When the operation is |
| 61 | + posted successfully, an opaque request handle is created and returned in \VAR{request}. The |
| 62 | + operation is completed by a subsequent call to \FUNC{shmem\_req\_test} or |
| 63 | + \FUNC{shmem\_req\_wait} on that handle. Once the operation is complete, the request handle |
| 64 | + is deallocated and must not be reused. |
| 65 | + |
| 66 | + Although the nonblocking alltoall differs from the blocking alltoall in its |
| 67 | + invocation and completion semantics, the data exchange semantics are similar. |
| 68 | + |
| 69 | + {\bf Data exchange semantics}: |
| 70 | + In this routine, each \ac{PE} |
| 71 | + participating in the operation exchanges \VAR{nelems} data elements |
| 72 | + with all other \acp{PE} participating in the operation. |
| 73 | + The size of a data element is: |
| 74 | + \begin{itemize} |
| 75 | + \item 32 bits for \FUNC{shmem\_alltoall32\_nb} |
| 76 | + \item 64 bits for \FUNC{shmem\_alltoall64\_nb} |
| 77 | + \item 8 bits for \FUNC{shmem\_alltoallmem\_nb} |
| 78 | + \item \FUNC{sizeof}(\TYPE{}) for alltoall routines taking typed \VAR{source} and \VAR{dest} |
| 79 | + \end{itemize} |
| 80 | + |
| 81 | + The data being sent and received are |
| 82 | + stored in a contiguous symmetric data object. The total size of each \ac{PE}'s |
| 83 | + \VAR{source} object and \VAR{dest} object is \VAR{nelems} times the size of |
| 84 | + an element |
| 85 | + times \VAR{N}, where \VAR{N} equals the number of \acp{PE} participating |
| 86 | + in the operation. |
| 87 | + The \VAR{source} object contains \VAR{N} blocks of data |
| 88 | + (where the size of each block is defined by \VAR{nelems}) and each block of data |
| 89 | + is sent to a different \ac{PE}. |
| 90 | + |
| 91 | + The same \dest{} and \source{} |
| 92 | + arrays and the same value of \VAR{nelems} |
| 93 | + must be passed by all \acp{PE} that participate in the collective. |
| 94 | + |
| 95 | + Given a \ac{PE} \VAR{i} that is the \kth \ac{PE} |
| 96 | + participating in the operation and a \ac{PE} |
| 97 | + \VAR{j} that is the \lth \ac{PE} |
| 98 | + participating in the operation, |
| 99 | + \ac{PE} \VAR{i} sends the \lth block of |
| 100 | + its \VAR{source} object to |
| 101 | + the \kth block of |
| 102 | + the \VAR{dest} object of \ac{PE} \VAR{j}. |
| 103 | + |
| 104 | + |
| 105 | + Like the data exchange semantics, the entry and completion |
| 106 | + criteria of the blocking and nonblocking alltoall are similar. |
| 107 | + |
| 108 | + {\bf Entry criteria}: Before any \ac{PE} calls a \FUNC{shmem\_alltoall\_nb} routine, |
| 109 | + the following condition must be ensured: |
| 110 | + \begin{itemize} |
| 111 | + \item The \VAR{dest} data object on all \acp{PE} in the team is |
| 112 | + ready to accept the \FUNC{shmem\_alltoall\_nb} data. |
| 113 | + \end{itemize} |
| 114 | + Otherwise, the behavior is undefined. |
| 115 | + |
| 116 | + {\bf Completion criteria}: Upon completion, the following is true for |
| 117 | + the local \ac{PE}: |
| 118 | + \begin{itemize} |
| 119 | + \item Its \VAR{dest} symmetric data object is completely updated and |
| 120 | + the data has been copied out of the \VAR{source} data object. |
| 121 | + \end{itemize} |
| 122 | +} |
| 123 | + |
| 124 | +\apireturnvalues{ |
| 125 | + Zero if the operation is posted successfully. Nonzero otherwise. |
| 126 | +} |
| 127 | + |
| 128 | +\end{apidefinition} |
| 129 | + |
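The block layout described under "Data exchange semantics" may be easier to check with a small single-process model. The sketch below is plain C and makes no OpenSHMEM calls; `alltoall_bytes` and `simulate_alltoall` are hypothetical helper names introduced only for illustration, and the flat buffers `src_all` and `dst_all` are an assumed stand-in for the per-PE symmetric `source` and `dest` objects laid out back to back. It is not the proposed implementation, only a model of where each block is expected to land.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed for one PE's source (or dest) object:
 * nelems elements per block, one block per participating PE. */
size_t alltoall_bytes(size_t nelems, size_t elem_size, size_t npes)
{
    return nelems * elem_size * npes;
}

/* Hypothetical single-process model of the exchange (not an OpenSHMEM call).
 * The object of the p-th PE occupies elements [p * npes * nelems,
 * (p + 1) * npes * nelems) of src_all and dst_all. */
void simulate_alltoall(long *dst_all, const long *src_all,
                       size_t nelems, size_t npes)
{
    assert(dst_all != NULL && src_all != NULL);

    size_t obj = nelems * npes; /* elements in one PE's object */

    for (size_t k = 0; k < npes; k++) {     /* sender: the k-th PE */
        for (size_t l = 0; l < npes; l++) { /* receiver: the l-th PE */
            /* The l-th block of the sender's source goes to
             * the k-th block of the receiver's dest. */
            memcpy(dst_all + l * obj + k * nelems,
                   src_all + k * obj + l * nelems,
                   nelems * sizeof(long));
        }
    }
}
```

For example, with two PEs and `nelems = 2`, source contents `{0, 1, 10, 11}` on the first PE and `{100, 101, 110, 111}` on the second leave the first PE's dest holding `{0, 1, 100, 101}` and the second PE's dest holding `{10, 11, 110, 111}`. In effect the exchange transposes the matrix of blocks indexed by (sender, receiver).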