DLSlimeCache: RDMA cache service assignment directory¶
Status: V0 landed in this branch.
One-line summary: DLSlimeCache is a small service that owns a preallocated memory region, exposes it through a composed PeerAgent, and records (peer_agent_id, version) -> AssignmentBatch manifests so clients can read cached bytes back through the existing DLSlime RDMA endpoint path.
This document describes the current cache-service design. It intentionally
does not model peer-to-peer references as a cache mode: a worker that wants
to read directly from another worker can already do that through
PeerAgent and RDMAEndpoint. DLSlimeCache starts only when bytes are
written into the cache service's own registered memory.
Goals¶
- Reuse DLSlime's existing data plane. The cache server is a PeerAgent peer with a registered memory region; clients use normal write and read operations.
- Keep cache metadata tiny. The C++ cache core stores only assignment manifests keyed by the original Engine/PeerAgent id plus a generated version.
- Keep the first version service-shaped. Users can run dlslime-cache start/status/stop, and Python examples can perform a real end-to-end RDMA roundtrip.
- Avoid inventing a new storage abstraction for peer-to-peer transfer. P2P remains P2P; cache means data has entered the cache service MR.
Non-goals¶
- No shallow mode. The old store(key, extents, mode="shallow") idea was removed because it duplicates PeerAgent's direct P2P path.
- No tiered allocator yet. V0 manages one fixed slab size per service instance; adaptive per-store slab classes are deferred.
- No persistence, replication, master election, SSD tier, or distributed cache protocol. Placement and replication policy belong in NanoDeploy.
- No server-side RDMA read on query. Clients issue their own RDMA reads using the returned manifest.
Data Model¶
The core object is an assignment manifest:
struct AssignmentManifest {
std::string peer_agent_id;
dlslime::AssignmentBatch assignments;
std::vector<uint64_t> slab_ids;
uint64_t version;
};
peer_agent_id is the original Engine/PeerAgent owner id. version is
generated by the cache server. slab_ids records the fixed cache slabs
owned by the manifest. assignments is a ready-to-run batch for the
consumer's RDMA read path.
The C++ cache core exposes:
AssignmentManifest store_assignments(peer_agent_id, assignments);
Optional<AssignmentManifest> query_assignments(peer_agent_id, version);
bool delete_assignments(peer_agent_id, version);
CacheStats stats();
void clear(); // test helper
There is no Extent, Manifest, CacheMode, store(key, ...),
load(key), or delete(key) API in the landed V0 surface.
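As a mental model, the directory behaves like a map keyed by (peer_agent_id, version). A toy Python sketch of that keying (not the landed C++ core, which additionally tracks slab ownership and stats; all names here are illustrative):

```python
class AssignmentDirectory:
    """Toy model: (peer_agent_id, version) -> manifest."""

    def __init__(self):
        self._entries = {}
        self._next_version = 1

    def store(self, peer_agent_id, assignments):
        version = self._next_version      # versions are generated server-side
        self._next_version += 1
        manifest = {"peer_agent_id": peer_agent_id,
                    "version": version,
                    "assignments": assignments}
        self._entries[(peer_agent_id, version)] = manifest
        return manifest

    def query(self, peer_agent_id, version):
        return self._entries.get((peer_agent_id, version))   # None on miss

    def delete(self, peer_agent_id, version):
        return self._entries.pop((peer_agent_id, version), None) is not None
```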
Service Flow¶
The cache service composes a real PeerAgent:
- dlslime-cache start --memory-size ... preallocates host memory.
- The service registers that buffer as a PeerAgent memory region, default name cache.
- GET /peer-agent tells clients the cache PeerAgent id, NanoCtrl address, cache MR name, slab size, memory size, and resource info.
- A client connects to the cache PeerAgent through NanoCtrl.
- The client stores a read manifest with POST /store; the service allocates cache slabs and rewrites cache-side offsets in the returned manifest.
- The client writes bytes into the allocated cache MR offsets with a normal RDMA write.
- A consumer queries the manifest with POST /query.
- The consumer feeds the returned assignments to agent.read(...).
- The client removes the manifest with POST /delete when done.
The example at examples/python/cache_client_example.py performs this full
roundtrip and checks correctness.
HTTP API¶
GET /healthz¶
Returns:
GET /stats¶
Returns assignment and slab counters:
{
"slab_size": 262144,
"memory_size": 1073741824,
"num_slabs": 4096,
"used_slabs": 3,
"free_slabs": 4093,
"num_assignment_peers": 1,
"num_assignment_entries": 1,
"num_assignments": 3,
"assignment_bytes": 655360
}
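The counters in the sample are self-consistent: num_slabs is memory_size divided by slab_size, and the 655360 cached bytes occupy three 256 KiB slabs. A quick check:

```python
# Values copied from the sample /stats response above.
slab_size = 262_144            # 256 KiB
memory_size = 1_073_741_824    # 1 GiB
assignment_bytes = 655_360

num_slabs = memory_size // slab_size
used_slabs = -(-assignment_bytes // slab_size)   # ceil division

assert num_slabs == 4096
assert used_slabs == 3
assert num_slabs - used_slabs == 4093            # free_slabs
```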
GET /peer-agent¶
Returns the cache service's PeerAgent and cache MR metadata:
{
"peer_agent_id": "cache-agent:0",
"cache_mr_name": "cache",
"cache_mr_handle": 123,
"nanoctrl_url": "http://127.0.0.1:3000",
"scope": null,
"slab_size": 262144,
"memory_size": 1073741824,
"resource": {}
}
The endpoint returns 503 if the service was started without a PeerAgent or without preallocated cache memory.
POST /store¶
Request:
{
"peer_agent_id": "engine-a",
"assignments": [
{
"mr_key": 11,
"remote_mr_key": 22,
"target_offset": 0,
"source_offset": 0,
"length": 655360
}
]
}
Response:
{
"peer_agent_id": "engine-a",
"version": 1,
"total_bytes": 655360,
"slab_ids": [0, 1, 2],
"assignments": [
{
"mr_key": 11,
"remote_mr_key": 22,
"target_offset": 0,
"source_offset": 0,
"length": 262144
}
]
}
Large assignments are split into chunks no larger than slab_size.
When preallocated memory is enabled, each returned assignment owns one
slab id, and source_offset points at slab_id * slab_size inside the
cache MR.
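The splitting rule above can be sketched as follows. This is a simplified Python model, not the landed C++ allocator; it assumes an in-order free list:

```python
def split_into_slabs(total_len, slab_size, free_slabs):
    """Split one stored assignment into chunks of at most slab_size,
    giving each chunk a slab id and a cache-MR source_offset."""
    chunks, covered = [], 0
    while covered < total_len:
        length = min(slab_size, total_len - covered)
        slab_id = free_slabs.pop(0)                 # allocate from the free list
        chunks.append({
            "slab_id": slab_id,
            "target_offset": covered,               # offset in the client buffer
            "source_offset": slab_id * slab_size,   # offset inside the cache MR
            "length": length,
        })
        covered += length
    return chunks

# A 655360-byte assignment with 256 KiB slabs yields 3 chunks:
# lengths 262144, 262144, 131072 at cache offsets 0, 262144, 524288.
chunks = split_into_slabs(655_360, 262_144, free_slabs=[0, 1, 2, 3])
```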
POST /query¶
Request:
{
  "peer_agent_id": "engine-a",
  "version": 1
}
Response is the stored assignment manifest, or 404 if not found.
POST /delete¶
Request:
{
  "peer_agent_id": "engine-a",
  "version": 1
}
Response:
CLI¶
The public lifecycle commands are:
dlslime-cache start
dlslime-cache status
dlslime-cache stop
Data mode requires preallocated memory:
nanoctrl start
dlslime-cache start --ctrl http://127.0.0.1:3000 \
--host 127.0.0.1 --port 8765 --memory-size 1G
--metadata-only exists only for parser/control-plane tests. It starts the
HTTP metadata wrapper without a usable cache MR, so real clients should not
use it.
Useful service knobs:
- --slab-size: maximum assignment slab bytes, default 256K. Supported startup range is 128K to 1G.
- --memory-size: preallocated cache MR size. Accepts suffixes such as 512M and 1G.
- --cache-mr-name: PeerAgent memory-region name, default cache.
- --ctrl: NanoCtrl address, default http://127.0.0.1:3000.
- --peer-agent-alias: optional fixed alias for the service PeerAgent.
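Size flags follow the usual K/M/G suffix convention; a hypothetical parser equivalent (the real CLI parsing may differ):

```python
def parse_size(text: str) -> int:
    """Parse CLI-style sizes such as '256K', '512M', '1G' into bytes."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)   # bare byte count
```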
Python Client¶
from dlslime.cache import CacheClient
client = CacheClient(url="http://127.0.0.1:8765", peer_agent=agent)
server = client.connect_to_server()
stored = client.store(assignments)
queried = client.query(stored["peer_agent_id"], stored["version"])
deleted = client.delete(stored["peer_agent_id"], stored["version"])
If no peer_agent is passed, connect_to_server() creates one using the
NanoCtrl information advertised by /peer-agent.
Correctness Contract¶
- Store/query/delete metadata is protected by a std::shared_mutex. query_assignments() and stats() take a shared lock; store_assignments(), delete_assignments(), and clear() take the write lock.
- Delete removes the manifest and returns its slab ids to the free list. It does not fence or cancel RDMA reads that a client has already issued.
- Callers should delete after read_future.wait() if they want the same correctness property as the example.
Because slabs are reusable, delete/evict must gain a lease or pin-count mechanism before production use, so memory cannot be recycled while a client has an in-flight read.
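One possible shape for that mechanism is a per-slab pin count, where delete defers reuse until outstanding readers drain. A sketch under that assumption (entirely hypothetical, not part of the landed V0):

```python
class PinnedSlabPool:
    """Sketch: slabs return to the free list only at pin count zero."""

    def __init__(self, num_slabs):
        self.free = list(range(num_slabs))
        self.pins = {}              # slab_id -> outstanding reader count
        self.pending_free = set()   # released while still pinned

    def pin(self, slab_id):
        """A consumer would call this before issuing an RDMA read."""
        self.pins[slab_id] = self.pins.get(slab_id, 0) + 1

    def unpin(self, slab_id):
        """Called after read_future.wait(); may complete a deferred free."""
        self.pins[slab_id] -= 1
        if self.pins[slab_id] == 0 and slab_id in self.pending_free:
            self.pending_free.discard(slab_id)
            self.free.append(slab_id)

    def release(self, slab_id):
        """Delete/evict path: defer recycling while a reader holds a pin."""
        if self.pins.get(slab_id, 0) > 0:
            self.pending_free.add(slab_id)
        else:
            self.free.append(slab_id)
```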
Slab Semantics¶
slab_size is currently a startup-time normalization and capacity unit:
- Store splits every assignment into chunks of at most slab_size.
- Supported range is 128K <= slab_size <= 1G. memory_size / slab_size gives the number of logical slabs.
- If memory_size > 0, store rejects manifests that would exceed the configured logical slab count.
- Store allocates slab ids from a free list and rewrites returned assignment source_offset values to cache MR slab offsets.
- Delete returns slab ids to the free list. used_slabs and free_slabs come from allocator state.
Per-store adaptive slab sizing is deferred until tiered capacity accounting and leases exist; changing the slab unit per manifest would make lifecycle semantics ambiguous in the current V0 directory.
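The rejection rule above amounts to a ceil-division admission check; a minimal sketch:

```python
def admit_store(total_bytes, slab_size, free_slab_count):
    """Reject a store whose chunk count would exceed the remaining slabs."""
    needed = -(-total_bytes // slab_size)   # ceil(total_bytes / slab_size)
    return needed <= free_slab_count

# 655360 bytes with 256 KiB slabs needs exactly 3 slabs.
assert admit_store(655_360, 262_144, 3)
assert not admit_store(655_360, 262_144, 2)
```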
Implementation Status¶
| Component | Status |
|---|---|
| C++ assignment directory | Landed |
| Pybind cache bindings | Landed |
| Python HTTP service | Landed |
| CacheClient wrapper | Landed |
| dlslime-cache start/status/stop | Landed |
| Real RDMA client example | Landed |
| HTTP delete path | Landed |
| Fixed slab allocator / reuse | Landed |
| Leases / pin counts | Not started |
| NanoDeploy placement integration | Not started |
Current C++ surface:
from dlslime._slime_c import Assignment, cache
srv = cache.CacheServer(slab_size=256 * 1024, memory_size=1024 * 1024)
m = srv.store_assignments("engine-a", [Assignment(1, 2, 0, 0, 655360)])
got = srv.query_assignments("engine-a", m.version)
srv.delete_assignments("engine-a", m.version)
Example¶
Start services:
nanoctrl start
dlslime-cache start --ctrl http://127.0.0.1:3000 \
--host 127.0.0.1 --port 8765 --memory-size 1G
Run the client:
python examples/python/cache_client_example.py
Expected success signal:
Stop the service:
dlslime-cache stop
Next Steps¶
- Add tiered slab sizing, e.g. 128K..1G, so each store can choose the smallest fitting slab class once capacity accounting is backed by real slab ownership instead of a single fixed startup unit.
- Add slab leases or pin counts so delete/evict cannot race with in-flight reads.
- Add metrics for assignment entries, bytes, logical slab pressure, and failed stores.
- Integrate NanoDeploy placement policy on top of the cache client.
- Add multi-client stress tests around store/query/delete/read ordering.
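For the tiered-sizing item, class selection could be as simple as picking the smallest power-of-two class that fits. A hypothetical sketch (class layout is an assumption, not a landed design):

```python
# Power-of-two slab classes from 128 KiB up to 1 GiB.
SLAB_CLASSES = [(128 << 10) << i for i in range(14)]   # 128K .. 1G

def pick_slab_class(length):
    """Smallest class that fits; oversized stores would still be chunked."""
    for size in SLAB_CLASSES:
        if length <= size:
            return size
    return SLAB_CLASSES[-1]
```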