Author: Braden T Helmer
This project implements a custom Message Passing Interface (MPI) library from scratch, providing core MPI functionality including point-to-point communication, collective operations, and process management. The implementation is designed for educational purposes to understand the underlying mechanisms of parallel message passing systems.
Process Management:
- MPI_Init() - Initialize the MPI environment
- MPI_Comm_rank() - Get process rank
- MPI_Comm_size() - Get total number of processes
- MPI_Finalize() - Clean up the MPI environment
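A minimal usage sketch, assuming my_mpi.h mirrors the standard MPI C API (MPI_Init taking &argc/&argv, an MPI_COMM_WORLD communicator, and output-parameter rank/size); the actual signatures in this library may differ:

#include <stdio.h>
#include "my_mpi.h"

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);               /* set up communication using launcher-provided env vars */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank of this process, 0 .. size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                       /* close connections and clean up */
    return 0;
}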
Point-to-Point Communication:
- MPI_Send() - Blocking send operation
- MPI_Recv() - Blocking receive operation
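A blocking send/receive sketch between ranks 0 and 1, again assuming standard-MPI-style signatures, a tag argument, and an MPI_STATUS_IGNORE constant (all assumptions about my_mpi.h rather than documented API):

#include <stdint.h>
#include <stdio.h>
#include "my_mpi.h"

int main(int argc, char *argv[]) {
    int rank;
    int32_t value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Blocking send: one 32-bit integer to rank 1, tag 0 */
        MPI_Send(&value, 1, MPI_INT32_T, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: returns only after the message has arrived */
        MPI_Recv(&value, 1, MPI_INT32_T, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", (int)value);
    }

    MPI_Finalize();
    return 0;
}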
Collective Operations:
- MPI_Barrier() - Synchronization barrier
- MPI_Gather() - Gather data from all processes
- MPI_Bcast() - Broadcast data to all processes
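A sketch combining the three collectives, under the same assumption that the calls follow standard MPI argument order (buffer, count, datatype, root, communicator):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "my_mpi.h"

int main(int argc, char *argv[]) {
    int rank, size;
    int32_t seed = 0, mine;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) seed = 7;
    MPI_Bcast(&seed, 1, MPI_INT32_T, 0, MPI_COMM_WORLD);   /* root 0 sends the seed to every rank */

    mine = seed + rank;
    int32_t *all = malloc(size * sizeof *all);             /* receive buffer, only used at the root */
    MPI_Gather(&mine, 1, MPI_INT32_T, all, 1, MPI_INT32_T, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);                            /* all ranks synchronize before printing */
    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("rank %d contributed %d\n", i, (int)all[i]);
    }

    free(all);
    MPI_Finalize();
    return 0;
}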
Supported Data Types:
- MPI_INT32_T - 32-bit integers
- MPI_PROCESS_T - Process information structures
The implementation uses TCP sockets for inter-process communication and includes:
- Custom process launcher (my_prun) that works with SLURM
- Network-based communication using IP addresses and ports
- Support for both intra-node and inter-node communication
Requirements:
- GCC compiler
- Python 3 (for port allocation)
- SLURM environment (for multi-node execution)
make # Build the executable
make run # Build and run with my_prun launcher
make clean # Clean build artifacts
The included Round-Trip Time (RTT) benchmark tests message passing performance:
# Basic execution (requires SLURM environment)
./my_prun ./my_rtt
# For Case 4 testing (inter/intra node comparison)
# Compile with -DCASE4 flag first
./my_prun ./my_rtt inter # Test inter-node communication
./my_prun ./my_rtt intra # Test intra-node communication
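How the -DCASE4 build flag and the inter/intra argument could plausibly interact inside the benchmark; this is a hypothetical sketch, not the actual my_rtt.c logic:

#include <stdio.h>
#include <string.h>
#include "my_mpi.h"

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
#ifdef CASE4
    /* Built with -DCASE4: argv[1] chooses which pairing of ranks is timed. */
    int inter = (argc > 1 && strcmp(argv[1], "inter") == 0);
    printf("Case 4: timing %s-node communication\n", inter ? "inter" : "intra");
#else
    printf("Default RTT run\n");
#endif
    MPI_Finalize();
    return 0;
}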
The my_prun launcher sets up the following environment variables:
- MYMPI_ROOT_HOST - Hostname of the root process
- MYMPI_ROOT_PORT - Port number for the root process
- MYMPI_PORT - Port number for the current process
- MYMPI_RANK - Process rank
- MYMPI_NTASKS - Total number of tasks
- MYMPI_NNODES - Number of nodes
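A sketch of how a process might read these variables at startup, assuming they are consumed with getenv() (the internals of my_mpi.c are not shown in this README):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Values injected by my_prun; NULL if the program was not launched through it. */
    const char *root_host = getenv("MYMPI_ROOT_HOST");
    const char *root_port = getenv("MYMPI_ROOT_PORT");
    const char *rank      = getenv("MYMPI_RANK");
    const char *ntasks    = getenv("MYMPI_NTASKS");

    if (!root_host || !root_port || !rank || !ntasks) {
        fprintf(stderr, "not launched via my_prun\n");
        return 1;
    }
    printf("rank %s of %s, root at %s:%s\n", rank, ntasks, root_host, root_port);
    return 0;
}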
The RTT benchmark tests message passing latency across different message sizes:
- Message sizes: 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 2MB
- 100 iterations per message size for statistical accuracy
- Measures round-trip time for send-receive pairs
- Outputs timing data in scientific notation (nanoseconds)
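A sketch of the measurement pattern described above: 100 timed ping-pong iterations at a fixed message size, with clock_gettime() assumed for timing and the same assumed MPI-style signatures as earlier (the actual my_rtt.c may be structured differently):

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include "my_mpi.h"

enum { COUNT = (1 << 20) / 4, ITERS = 100 };   /* 1MB of int32_t, one of the tested sizes */
static int32_t buf[COUNT];

int main(int argc, char *argv[]) {
    int rank;
    struct timespec t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {                       /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, COUNT, MPI_INT32_T, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, COUNT, MPI_INT32_T, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {                /* rank 1 echoes the message back */
            MPI_Recv(buf, COUNT, MPI_INT32_T, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, COUNT, MPI_INT32_T, 0, 0, MPI_COMM_WORLD);
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (rank == 0) {
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("avg RTT: %e ns\n", ns / ITERS); /* scientific notation, nanoseconds */
    }

    MPI_Finalize();
    return 0;
}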
In performance testing against standard MPI implementations, this custom implementation performs significantly worse:
- Latency: 100x to 1000x slower than optimized MPI libraries
- Throughput: Reduced due to the lack of optimization techniques such as:
  - Zero-copy transfers
  - Shared memory optimizations
  - Advanced networking protocols (InfiniBand, etc.)
  - Pipelined message transfers
This performance difference illustrates the complexity and optimization present in production MPI implementations.
- my_mpi.h - Header file with MPI function declarations and data structures
- my_mpi.c - Implementation of core MPI functions
- my_rtt.c - RTT benchmark application
- my_prun - Custom process launcher script for SLURM environments
- Makefile - Build configuration
- Uses TCP sockets for reliable message delivery
- Process discovery through environment variables set by launcher
- Simple blocking communication model
- Basic error handling and process coordination
- Compatible with SLURM job scheduler for multi-node execution
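For the blocking TCP transport, a full-write helper of roughly this shape is the usual building block; this is an assumption about the internals, not the actual my_mpi.c code:

#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

/* Keep calling send() until the whole buffer has gone out; TCP may accept
 * only part of a large buffer per call. Returns 0 on success, -1 on error. */
int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0)
            return -1;          /* connection closed or socket error */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}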
Note: This is an educational implementation designed to demonstrate MPI concepts. For production use, utilize optimized MPI libraries like OpenMPI or MPICH.