Skip to content

SlimeRPC

SlimeRPC is a typed RPC layer built on PeerAgent and the DLSlime RDMA data path. Use it when application logic should look like a Python service call, while connection setup, memory registration, and request/reply transport stay inside DLSlime.

The public API is:

from dlslime.rpc import (
    method,
    proxy,
    serve,
    serve_once,
    wait_all,
    RpcError,
    RemoteRpcError,
    RpcTimeoutError,
    SoftRnrMonitor,
)

Requirements

SlimeRPC requires:

  • NanoCtrl running and reachable by both peers.
  • Redis reachable through NanoCtrl.
  • Connected PeerAgents on both client and worker.
  • A DLSlime build with the C++ RPC session enabled. If the extension was built without RPC support, proxy() raises an error asking you to rebuild with BUILD_RPC=ON.
nanoctrl start
python examples/python/rpc_example.py --ctrl http://127.0.0.1:3000

Basic Service

Define a service class and mark RPC-callable methods with @method:

from dlslime.rpc import method


class CalcService:
    @method
    def add(self, a: int, b: int) -> int:
        return a + b

    @method
    def echo(self, msg: str) -> str:
        return f"echo: {msg}"

By default, SlimeRPC serializes arguments and return values with pickle. Both sides must import the same service class definition, because method tags are assigned deterministically by sorted method name.

Connect PeerAgents

SlimeRPC does not replace PeerAgent discovery. First create and connect the two agents:

from dlslime import PeerAgent

worker = PeerAgent(nanoctrl_url="http://127.0.0.1:3000", alias="worker:0")
driver = PeerAgent(nanoctrl_url="http://127.0.0.1:3000", alias="driver:0")

driver_conn = driver.connect_to("worker:0", ib_port=1, qp_num=1)
worker_conn = worker.connect_to("driver:0", ib_port=1, qp_num=1)

driver_conn.wait()
worker_conn.wait()

Worker Side

serve() blocks and dispatches requests for one peer:

from dlslime.rpc import serve

serve(worker, CalcService(), peer="driver:0")

For examples or tests, run it in a daemon thread:

import threading

threading.Thread(
    target=serve,
    args=(worker, CalcService(), "driver:0"),
    daemon=True,
).start()

serve() can infer peer only when exactly one peer is connected. Pass peer=... explicitly in multi-peer services.

Server dispatch settings:

Setting Purpose
max_workers= Overrides the handler dispatch pool size.
SLIME_RPC_SERVER_WORKERS Default handler pool size when max_workers is unset.
@method(parallel=True) Allows a handler to run concurrently on the same service instance.

Handlers are serialized by default under a per-service lock. This is safer for stateful services such as model runners, NCCL workers, and shared PyTorch objects.

Client Side

Create a proxy from the client PeerAgent to the worker alias:

from dlslime.rpc import proxy

calc = proxy(driver, "worker:0", CalcService)

future = calc.add(1, 2)
assert future.wait(timeout=30.0) == 3

echo_future = calc.echo("hello")
assert echo_future.wait() == "echo: hello"

Every proxy call returns a future. Use wait() to receive the result:

result = calc.add(1, 2).wait()

Wait for several calls in order:

from dlslime.rpc import wait_all

futures = [calc.add(i, i * 10) for i in range(5)]
results = wait_all(futures, timeout=30.0)

Channel Buffers

SlimeRPC creates an RDMA mailbox channel per (local_agent, peer_agent) pair. The channel registers receive memory under names like:

rpc:mailbox:<local_alias>:<peer_alias>

Buffer controls:

Setting Default Purpose
agent._rpc_buffer_size 32_000_000 bytes Per-inflight request/reply slot size.
agent._rpc_max_inflight SLIME_RPC_MAX_INFLIGHT or 16 Number of receive slots.
SLIME_RPC_MAX_INFLIGHT 16 Environment fallback for max inflight RPCs.

Set these before creating the proxy or starting serve():

driver._rpc_buffer_size = 64 * 1024 * 1024
driver._rpc_max_inflight = 8
worker._rpc_buffer_size = 64 * 1024 * 1024
worker._rpc_max_inflight = 8

Client and worker should use compatible slot sizes for large payloads.

Raw Mode

Use @method(raw=True) to bypass pickle. Raw handlers receive the channel, a request pointer, and request byte length:

import ctypes
from dlslime.rpc import method


class RawEcho:
    @method(raw=True)
    def echo(self, channel, ptr, nbytes):
        request = bytes((ctypes.c_char * nbytes).from_address(ptr))
        return request.upper()

The client passes one bytes payload and receives bytes:

raw = proxy(driver, "worker:0", RawEcho)
response = raw.echo(b"hello").wait()
assert response == b"HELLO"

This is the right mode for FlatBuffers, Cap'n Proto, protobuf bytes, or custom binary layouts. See examples/python/rpc_flatbuf_example.py for a complete FlatBuffers loopback.

In-Place Raw Replies

For very small hot-path handlers, @method(raw=True, inplace=True) lets the handler write the reply directly into the registered send buffer:

class InplaceService:
    @method(raw=True, inplace=True)
    def echo(self, req_ptr, req_nbytes, resp_ptr, resp_cap) -> int:
        n = min(req_nbytes, resp_cap)
        ctypes.memmove(resp_ptr, req_ptr, n)
        return n

The handler must return the number of bytes written. While the handler runs, the session send lock is held, so use this only for very short handlers.

Error Handling

Common client-visible errors:

Error Meaning
RpcTimeoutError future.wait(timeout=...) timed out.
RemoteRpcError The remote handler raised an exception.
RpcError Base class for SlimeRPC client-visible errors.
from dlslime.rpc import RemoteRpcError, RpcTimeoutError

try:
    result = calc.add(1, 2).wait(timeout=1.0)
except RpcTimeoutError:
    ...
except RemoteRpcError as exc:
    print(exc.type_name, exc.message)

One-Shot Serving

serve_once() dispatches exactly one incoming call and returns. It is useful for tests and controlled loops:

from dlslime.rpc import serve_once

serve_once(worker, CalcService(), peer="driver:0")

RNR Monitoring

For RDMA receiver-not-ready debugging, use SoftRnrMonitor:

from dlslime.rpc import SoftRnrMonitor

monitor = SoftRnrMonitor()
monitor.snapshot()

# run workload

print(monitor.delta())
print(monitor.total_delta())

Full Examples

  • examples/python/rpc_example.py: typed pickle RPC loopback.
  • examples/python/rpc_flatbuf_example.py: raw FlatBuffers RPC loopback.
  • bench/python/rpc_bench_slime_worker.py: benchmark worker.
  • bench/python/rpc_bench_slime_driver.py: benchmark driver.