Roadmap

Overview

DLSlime is dedicated to efficient data transmission over a variety of links, including but not limited to IBVerbs, CUDA IPC, TCP sockets, PCIe, NVShmem, Ascend (Direct), and NVMe-oF.

Transfer Engine

DLSlime provides a flexible and efficient peer-to-peer (P2P) Transfer Engine that enables customized, AI-workload-aware features such as Prefill-Decode separation and checkpoint transmission.
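To make the idea of a link-agnostic P2P transfer engine concrete, here is a minimal, hypothetical sketch. None of the names below (`Endpoint`, `LoopbackEndpoint`, `register_memory`, `write`, `read`) are DLSlime's actual API. The loopback endpoint copies bytes in-process to mimic one-sided Read/Write semantics, where the initiator moves data into or out of a peer's pre-registered buffer without the peer issuing a matching call.

```python
from abc import ABC, abstractmethod


class Endpoint(ABC):
    """Hypothetical link-agnostic endpoint; each link type would provide a subclass."""

    @abstractmethod
    def register_memory(self, name: str, size: int) -> None: ...

    @abstractmethod
    def write(self, name: str, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read(self, name: str, offset: int, length: int) -> bytes: ...


class LoopbackEndpoint(Endpoint):
    """In-process stand-in for a remote peer: one-sided ops touch its buffers directly."""

    def __init__(self) -> None:
        self._buffers: dict[str, bytearray] = {}

    def register_memory(self, name: str, size: int) -> None:
        # A real engine would pin/register memory with the NIC or driver; here we only allocate.
        self._buffers[name] = bytearray(size)

    def _check(self, name: str, offset: int, length: int) -> bytearray:
        # Reject accesses that fall outside the registered region.
        buf = self._buffers[name]
        if offset < 0 or offset + length > len(buf):
            raise ValueError("access outside registered region")
        return buf

    def write(self, name: str, offset: int, data: bytes) -> None:
        buf = self._check(name, offset, len(data))
        buf[offset:offset + len(data)] = data

    def read(self, name: str, offset: int, length: int) -> bytes:
        buf = self._check(name, offset, length)
        return bytes(buf[offset:offset + length])


if __name__ == "__main__":
    peer = LoopbackEndpoint()
    peer.register_memory("kv_cache", 16)
    peer.write("kv_cache", 4, b"KVKV")
    print(peer.read("kv_cache", 4, 4))  # b'KVKV'
```

In a Prefill-Decode setup, one could imagine the decode side reading KV-cache blocks out of a buffer the prefill side registered; keeping the endpoint interface separate from the link lets the same calling code run over different transports. This is an illustration of the pattern, not a description of how DLSlime implements it.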

Collective Ops

Inspired by DeepEP, DLSlime provides a buffer-based collective communication library that achieves ultra-low-latency, SM-free collective communication.
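The following hypothetical sketch illustrates what "buffer-based" can mean for an AllGather. Every rank owns a pre-allocated buffer with a fixed layout (one slot per rank), and each rank writes its chunk directly into its slot in every peer's buffer. The function name `buffer_allgather` is illustrative only, all ranks are simulated sequentially in one process, and GPU-side details such as SM-free data movement or completion flags are not modeled.

```python
def buffer_allgather(local_chunks: list[bytes]) -> list[bytearray]:
    """Simulate a buffer-based AllGather across len(local_chunks) ranks in one process.

    Each rank owns a receive buffer of world_size * chunk_size bytes (standing in for a
    registered or symmetric buffer). Rank r "puts" its chunk into slot r of every peer's
    buffer; once all puts have completed, every buffer holds all chunks in rank order.
    """
    if not local_chunks:
        return []
    world_size = len(local_chunks)
    chunk_size = len(local_chunks[0])
    if any(len(c) != chunk_size for c in local_chunks):
        raise ValueError("all ranks must contribute equal-sized chunks")

    # Pre-allocated, fixed-layout buffers: the layout is agreed on before any data moves.
    buffers = [bytearray(world_size * chunk_size) for _ in range(world_size)]

    for src, chunk in enumerate(local_chunks):
        offset = src * chunk_size
        for dst in range(world_size):
            # One-sided put: src writes straight into dst's buffer; dst runs no receive logic.
            buffers[dst][offset:offset + chunk_size] = chunk
    return buffers


if __name__ == "__main__":
    out = buffer_allgather([b"aa", b"bb", b"cc"])
    print([bytes(b) for b in out])  # every rank sees b'aabbcc'
```

Because the destination offset of every chunk is known in advance, no rank has to negotiate sizes or post receives at runtime, which is one reason buffer-based designs can keep latency low.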

Torch Wrapper

DLSlime provides a Torch communication backend to meet the needs of heterogeneous SPMD programs, such as heterogeneous pipeline-parallel training.

Transfer Engine Roadmap

  • IBVerbs Transfer Engine
      • ✅ Send/Recv Endpoint
      • ✅ RDMA Read/Write Endpoint
  • NVShmem
      • ✅ NVShmem Context and Send/Recv Kernel
      • ⚡ NVShmem put/get wrappers
  • TCP Socket
      • ✅ ZMQ bootstrap
      • ⏳ TCP Socket transfer engine
  • CUDA IPC
      • ✅ CUDA IPC Read/Write Endpoint
  • PCIe
      • ⏳ High-performance shared-memory transfer engine
      • ⏳ High-performance data offloading
  • Ascend
      • ✅ Ascend direct transfer engine
  • NVMe-oF
      • 💭 Planning
  • UB Mesh
      • 💭 Planning

Collective Ops Roadmap

  • IBVerbs
      • ✅ Send/Recv
      • ⚡ M2N for attention-FFN disaggregation
      • ⏳ AllGather
      • ⏳ AllReduce
      • ⏳ All2All
  • NVShmem
      • ⏳ Send/Recv
      • ✅ AllGather
      • ⏳ AllReduce
      • ⏳ All2All
  • CUDA IPC
      • ✅ AllGather
      • ⚡ High-performance AllGather using CUDA Multi-Mem

Torch Wrapper Roadmap

  • IBVerbs
      • ✅ Send/Recv
      • ⏳ AllGather
      • ⏳ AllReduce
      • ⏳ All2All