pMR is a C++ communication library optimized particularly for small messages. It exploits the communication offloading and persistence provided by the NIC, both typical features of RDMA. It supports only a limited, low-level feature set, which allows for a lightweight library design that keeps software overhead to a minimum.
While it uses a connection-oriented API that maps well onto many hardware-specific or hardware-near APIs, it hides all hardware-specific implementation details and quirks from the user. This lets users benefit from hardware-near optimizations while still writing hardware-agnostic code.
Apart from low-level point-to-point data transfers, global reduction is currently the only operation pMR supports.
In addition to the C++ API, a limited C API is provided.
pMR supports multi-threaded communication, with one restriction: only one thread at a time may work on a particular connection or on a resource associated with it. That is, calls to functions associated with a particular connection have to be serialized.
pMR uses Doxygen to generate HTML documentation from its annotated sources. See section Compilation for how to build the documentation in ./doc/html.
Open current pMR API documentation: pMR API Documentation
Building pMR as a static library using the included CMake files is recommended. Set up the build environment using CMake (only out-of-source builds are allowed):
mkdir build && cd build
cmake \
-DCMAKE_INSTALL_PREFIX:PATH="/install/path/for/this/library" \
-DCLUSTER="Target cluster to build for" \
-DBACKEND="Used backend" \
-DTHREAD="Multithreading environment" \
../
Supported values for CLUSTER:
- QPACE3: Cluster with a single-port Omni-Path HFI. Uses Omni-Path (OFI or PSM2) and shared memory (Cross-Memory Attach).
- QPACE2: Cluster with a 1D FBT InfiniBand topology. InfiniBand verbs only, no shared memory.
- QPACEB: Cluster with a single-port InfiniBand HCA. InfiniBand verbs only, no shared memory.
- iDataCool: Cluster with a single-port InfiniBand HCA. Uses InfiniBand verbs and shared memory (Cross-Memory Attach).
- SHM: Single node using shared memory (Cross-Memory Attach) only.
- MPI: MPI fallback. Uses MPI for both intra- and inter-node communication.
Supported values for BACKEND:
- MPI: For applications using MPI for multi-processing.
Supported values for THREAD:
- Single: Single-threaded application.
- Serialized: Multi-threaded application with serialized calls to pMR functions (except explicitly multi-threaded functions).
- Multiple: Multi-threaded application with non-serialized calls to pMR functions.
Note: Even in non-serialized multi-threaded applications, calls on any one connection still have to be serialized; i.e., no two threads may work on the same connection concurrently.
For a full list of optional parameters, see CMakeLists.txt. Selected optional parameters:
- -DMIC=ON: Cross-compile for the first-generation Intel Xeon Phi (KNC). Requires the Intel Compiler and Intel MPSS 3.7 (or newer).
- -DCAPI=ON: Include optional C API.
- -DPROFILING=ON: Enable profiling capability.
To build a static library and install all required files:
make
make install
To build documentation (HTML):
make doc