🥗 Avial : An MLIR Dialect for Distributed Heterogeneous Computing

Avial is a compiler infrastructure built using MLIR that enables efficient execution of programs across distributed and heterogeneous computing systems (CPU, GPU, cluster). Avial introduces a novel task-centric intermediate representation (IR) where tasks are first-class citizens, capturing their parallelism, device targets, and interdependencies.

Why is parallel programming hard?

Parallel programming is notoriously difficult. Developers must reason about concurrency, memory consistency, synchronization, and performance optimization, all of which are challenging even in a single-threaded environment. The situation is further complicated by the fragmented nature of parallel programming frameworks: multicore CPUs are commonly programmed using POSIX threads or OpenMP, distributed memory systems use MPI, and accelerators like GPUs are typically programmed using CUDA, OpenCL, or OpenACC. Each of these paradigms comes with its own abstractions and programming idioms.

In parallel programming, you need to think about how and where your code runs, not just what it does.

Unifying these paradigms into a single coherent programming or compilation model is non-trivial due to fundamental differences in their memory models, synchronization semantics, and communication mechanisms. While there have been commendable efforts at unifying heterogeneous computing within a node, such as OpenCL, OpenACC, and more recently Mojo, there is a noticeable gap when it comes to extending these unifications across distributed environments. The gap remains largely due to the complexity of distributed computing: issues such as explicit data movement between nodes and network topology cannot be abstracted away as easily.

Why Avial Is Unique?

While MLIR includes dialects like omp for shared-memory parallelism and gpu for targeting accelerators such as CUDA or ROCm, there is currently no dialect that provides a unified abstraction for distributed heterogeneous computing, that is, for clusters of nodes with diverse compute units like CPUs and GPUs.

The mpi dialect in MLIR offers low-level building blocks that reflect traditional MPI operations (e.g., send, recv, bcast). However, it requires the programmer to manage rank assignments, data partitioning, topology awareness, and task scheduling manually. This is error-prone and non-trivial, especially as system complexity scales.
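To illustrate the level at which the mpi dialect operates, here is a rough sketch of a manually coordinated program. The exact op syntax varies across MLIR versions, and the buffer `%buf` plus all rank-selection and partitioning logic are elided; treat this as an illustration of the abstraction level, not exact IR:

```mlir
// Low-level style: the programmer manages initialization, ranks,
// tags, and destinations by hand.
%err        = mpi.init : !mpi.retval
%e, %rank   = mpi.comm_rank : !mpi.retval, i32
// Manually decide which rank sends which partition of the data...
%tag  = arith.constant 0 : i32
%dest = arith.constant 1 : i32
mpi.send(%buf, %tag, %dest) : memref<128xf32>, i32, i32
mpi.finalize
```

Every one of these decisions (who sends, to whom, with which tag) is the programmer's responsibility, which is exactly the burden Avial aims to lift.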

Our dialect builds on top of the mpi dialect but raises the level of abstraction. Users express what computation needs to be performed and whether it should run on a CPU or GPU, without worrying about the underlying distributed communication or resource allocation.

This dialect bridges the gap between device and cluster level parallelism, making it an MLIR dialect that can target distributed heterogeneous systems.

The CodeDrop Approach

In high-performance computing environments, applications often contain a mix of compute regions: some better suited for multicore CPUs, others for GPUs. Traditionally, orchestrating these parts involves complex code: different frameworks (e.g., OpenMP, CUDA, MPI), manual device management, and tedious boilerplate. The CodeDrop approach introduces a task-oriented, declarative model that streamlines this process:

Drop your computation. Declare the target. Let the dialect take care of the rest.

Here's how it works in practice:

  • Wrap your compute region inside a TaskOp. This region can contain operations from any dialect, whether affine loops, linalg ops, or custom dialects.
  • Attach a targetOp to specify where the task should execute (e.g., cpu or gpu).
  • Let the dialect handle the rest:
    • Automatically schedules tasks to the right hardware
    • Inserts the necessary MPI coordination
    • Lowers tasks to the appropriate backend (e.g., LLVM, CUDA, ROCm)
    • Handles device setup and data movement

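As a concrete sketch of the steps above, wrapping an affine loop in a task and tagging its target might look like the following. The op names (`avial.task`) and the `target` attribute syntax are hypothetical, inferred from the TaskOp/targetOp description above rather than taken from the dialect definition:

```mlir
// Hypothetical syntax: "avial.task" and the "target" attribute are
// illustrative names for the TaskOp/targetOp described above.
avial.task target = "gpu" {
  // The wrapped region uses ordinary affine/arith ops unchanged.
  affine.for %i = 0 to 1024 {
    %v = affine.load %A[%i] : memref<1024xf32>
    %r = arith.mulf %v, %v : f32
    affine.store %r, %B[%i] : memref<1024xf32>
  }
}
```

From here, the lowering passes would schedule the task onto a GPU-capable rank, insert the MPI coordination, and emit the backend-specific code, with no changes to the loop body itself.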
Thanks to the CodeDrop approach, integrating the Avial dialect into existing compiler pipelines is both trivial and non-intrusive. The process begins by identifying performance-critical regions, such as loops, compute kernels, or math-heavy operations, regardless of which dialect they are written in. These regions are then wrapped in a TaskOp. That's it. From there, Avial takes full control, automatically lowering tasks to the appropriate execution backends, including MPI for distributed execution, and ultimately to LLVM IR.

This approach not only simplifies integration but also scales easily across heterogeneous and distributed environments. Whether running on a single multicore CPU or across a CPU-GPU cluster with MPI, Avial ensures consistent handling of task distribution and coordination.
