Introduction

What is sind?

sind (Slurm-IN-Docker) creates and manages containerized Slurm clusters for development, testing, and CI/CD workflows. Each node runs as a separate Docker container with systemd as init, providing a realistic multi-node Slurm environment without requiring bare-metal infrastructure.

Inspired by kind (Kubernetes in Docker), sind offers a familiar CLI experience for spinning up and tearing down complete Slurm clusters in seconds.

Why sind?

Setting up a multi-node Slurm cluster for development or testing traditionally requires dedicated machines, VMs, or complex provisioning. Tools like Vagrant and Docker Compose can get you there, but they demand significant configuration effort and maintenance. sind complements these approaches by trading some flexibility for ease of use and speed — a single command creates a fully working cluster.

Instant clusters — go from zero to a working Slurm cluster in seconds, not hours
Reproducible environments — every team member gets the same setup, every CI run starts clean
No infrastructure needed — just Docker on your laptop or CI runner

Features

Multi-node, multi-cluster & multi-realm

Each cluster consists of individual containers for controller, submitter, and worker nodes. Run multiple clusters simultaneously with shared networking, and organize them into isolated realms for federation and multi-tenant testing scenarios.

System containers

Unlike typical Docker containers that run a single application process, sind nodes are full system containers running systemd as init — closely emulating bare-metal machines. Services like munge, sshd, and slurmctld start and interact exactly as they would on real nodes, with proper service dependencies, process supervision, and signal handling. This means you can apply the same configuration management tools you use on bare metal — Ansible, Chef, Puppet, Salt — directly to sind nodes.

Designed for CI/CD

sind runs rootless on standard GitHub Actions runners — no privileged containers, no custom runner images. The sind-action GitHub Action installs sind and creates clusters in a single workflow step. Use realms to isolate parallel matrix jobs on the same runner.

Worker lifecycle

Dynamically add worker nodes to running clusters and remove them again. Test how your workloads react to nodes joining and leaving, all without touching the controller.

Power cycle simulation

Shut down, reboot, freeze, and power-cycle individual nodes to simulate real-world failure scenarios. Validate your fault-tolerance strategies before they matter.

Minimal dependencies

sind needs nothing but Docker and a container image. Install a single binary and you’re ready to go. sind is also usable as a Go library for embedding cluster management into your own tooling.

AI-ready via MCP

sind includes a built-in Model Context Protocol (MCP) server that exposes all CLI commands as tools for AI assistants. Run sind mcp start to let tools like Claude, VS Code Copilot, or Cursor create clusters, check status, and manage nodes on your behalf. Register sind with your editor in one command — sind mcp claude enable, sind mcp vscode enable, or sind mcp cursor enable.
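
The registration commands above differ only in the editor name, so a setup script can build them from a single argument. The following is a minimal sketch: sind_mcp_enable_cmd is a hypothetical helper written for illustration, not part of sind, and it only prints the command rather than running it. It accepts just the three editor names mentioned in the text.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with sind): print the registration
# command for one of the editors named in the text above.
sind_mcp_enable_cmd() {
    case "$1" in
        claude|vscode|cursor)
            printf 'sind mcp %s enable\n' "$1"
            ;;
        *)
            # Any other name is rejected rather than guessed at.
            printf 'unsupported editor: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

# Dry run: show the command without executing it.
sind_mcp_enable_cmd vscode   # prints: sind mcp vscode enable
```

To register for real, run the printed command directly in a shell where sind is installed.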

Next steps

Ready to try it? Head to the Getting Started section to install sind and create your first cluster.