Skip to content

pjgeorg/pMR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pMR

pico Message Passing for RDMA

pMR is a C++ communication library particularly optimized for small messages. It exploits communication offloading and persistence provided by the NIC, which are typical features of RDMA. It only supports a limited low-level feature set, which allows to design a lightweight library that keeps software overhead to a minimum.

While it uses a connection-oriented API that fits quite nicely on top of many hardware-specific or hardware-near APIs, it hides all hardware specific implementation details and quirks from the user. This approach allows the user to benefit from hardware-near optimizations while still writing hardware-agnostic code.

Apart from low-level point-to-point data transfers, pMR currently only supports global reduction.

C API

In addition to the C++ API a limited C API is provided.

Thread Support

pMR supports multi-threaded communication. Thread support is limited to only one thread working on a particular connection, or a resource associated with it, simultaneously. I.e., calls to functions associated with a particular connection have to be serialized.

Documentation

pMR uses Doxygen to generate documentation (HTML) from its annotated sources. Check out section Compilation about how to build the documentation in ./doc/html.

Open current pMR API documentation: pMR API Documentation

Compilation

It is suggested to build pMR as a static library using the included CMake files. Set up build environment using CMake (only out-of-source builds are allowed):

mkdir build && cd build
cmake \
    -DCMAKE_INSTALL_PREFIX:PATH="/install/path/for/this/library" \
    -DCLUSTER="Target cluster to build for" \
    -DBACKEND="Used backend" \
    -DTHREAD="Multithreading environment" \
    ../

Supported options for CLUSTER

  • QPACE3: Cluster with single port Omni-Path HFI. Omni-Path (OFI or PSM2) and shared memory (Cross-Memory Attach).
  • QPACE2: Cluster with 1D FBT InfiniBand topology. InfiniBand verbs only, no shared memory.
  • QPACEB: Cluster with single port InfiniBand HCA. InfiniBand verbs only, no shared memory.
  • iDataCool: Cluster with single port InfiniBand HCA. InfiniBand verbs and shared memory (Cross-Memory Attach).
  • SHM: Single node using shared memory (Cross-Memory Attach) only.
  • MPI: MPI fallback support. Uses MPI for intra- and inter-node communication.

Supported options for BACKEND

  • MPI: For applications using MPI for multi-processing.

Supported options for THREAD

  • Single: Single-threaded application.
  • Serialized: Multi-threaded application with serialized calls to pMR functions (except explicitly multi-threaded functions).
  • Multiple: Multi-threaded application.

Note: Even in case of non-serialized multi-threaded applications, calls to one particular connection still have to be serialized. I.e. no two threads are allowed to be working on the same connection concurrently.

Optional configurations

For a full list of optional parameters see CMakeLists.txt.

  • -DMIC=ON: Cross-compile for the first generation Intel Xeon Phi (KNC). Requires Intel Compiler and Intel MPSS 3.7 (or newer).
  • -DCAPI=ON: Include optional C API.
  • -DPROFILING=ON: Enable profiling capability.

Build & Install

To build a static library and install all required files:

make
make install

To build documentation (HTML):

make doc

About

pico Message Passing for RDMA

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •