Atlas Development Guide
type: guide
title: Atlas Development Guide
status: Draft
status_detail: "DG1 S3: front-door doc for building on Atlas, proven against a real worked-example plugin (dev-guide-proof); author-from-guide gaps folded back. Map, not rewrite."
author: devops-lead
drafted: 2026-06-09
updated: 2026-06-09
summary: "The one doc a Codex or Claude developer reads before touching Atlas: how the dev loop and runtimes work, where plugins/skills/code live, how to use the SDK, how to write tests, and the documentation and delivery standards to follow."
source: "Workspace SoT: knowledge/atlas/development-guide.md. The BookStack copy on docs.netos.io is a rendered mirror, not a second source of truth."
related:
- knowledge/atlas/capability-spine/dev.yaml
- team/devops-lead/outputs/atlas-operating-foundation-design.md
- team/devops-lead/outputs/atlas-netos-delivery-master-plan.md
This is the single page you read before you build on Atlas. It is a map, not an encyclopedia: it tells you where everything lives and how the pieces fit, and it points you at the authoritative doc for each piece rather than repeating it. When this guide and a linked design doc disagree, the design doc wins for its own subject and this guide should be corrected.
How capabilities are cited. Atlas describes itself through the Capability
Spine — a grouped index of every Atlas capability, each citable by a stable id
of the form group.slug. Throughout this guide, a capability is cited by that id
and linked to the live Capabilities index. The ids are stable, so cite
them freely; to jump to one, open the index and filter by the id (the page has a
filter box that matches on id, name, surface, and owner). The spine itself is
gates.capability-spine.
Source of truth. This file, knowledge/atlas/development-guide.md in the workspace, is authoritative. It renders in Atlas at /file?path=knowledge%2Fatlas%2Fdevelopment-guide.md. The copy published to BookStack (docs.netos.io) is a rendered mirror for the wiki audience; never edit the wiki and expect the change to flow back.
1. What Atlas is, for a developer
Atlas v2 is an agent platform assembled from a small kernel plus plugins, driven
by LLM runtimes, deployed by Ansible-pull from Git. As a developer you work
across five repositories, all cloned under netos-gitlab/netos-agents/ in this
workspace and originating on uks-git01.prod.netos.io:
| Repo (clone path under netos-gitlab/netos-agents/) | Owns |
|---|---|
| netos-atlas | The kernel (packages/kernel/), the web app (apps/web/, Next.js), and the API (apps/api/, Fastify). Kernel modules live in modules/ (files, playbook-runtime, scheduler). |
| netos-atlas-plugins | Every plugin under plugins/<id>/. This is where most feature code lands. |
| netos-atlas-sdk | The published TypeScript SDK, @netos/atlas-sdk (packages/ts/). Plugins import their types and helpers from here. |
| netos-atlas-deploy | The deploy repo (Ansible). Symlinked into the workspace at platforms/ansible/playbooks/atlas-deploy/. Stages workspace files onto hosts, fetches plugin source, builds, flips the release. |
| netos-mcp | The MCP server exposing the netbox.* / support.* data tools and the offline bin/dryrun-playbook harness. |
The kernel / plugin / SDK boundary is the thing to internalize first:
- The kernel owns the runtime, the database, auth, the route table, and the module lifecycle. You rarely change it.
- A plugin is a self-contained unit that registers routes, actions, jobs, and UI against a PluginContext the kernel hands it at init(). This is your primary build surface.
- The SDK is the typed contract between the two: a plugin imports PluginContext and the other runtime types and helpers from @netos/atlas-sdk and never reaches into kernel internals.
For the why behind this shape — capability as the primary object, the in-Atlas dev loop, the authoring kit — read the Atlas Operating Foundation design (section 8 is the dev loop, section 6 the authoring kit). This guide does not restate it.
2. The dev loop — author → test → publish → deploy
This is the one section that is genuinely new connective tissue; everything else links out. The loop has four legs, and a change is not done until it has gone around all four.
```mermaid
flowchart LR
  A["Author<br/>(Codex / Claude runtime)"] --> B["Test<br/>(colocated suites + CI)"]
  B --> C["Publish<br/>(commit → main → push to uks-git01)"]
  C --> D["Deploy<br/>(atlas-deploy → host-verify)"]
  D -->|gap found| A
  classDef leg fill:#1f2937,color:#e5e7eb,stroke:#374151;
  class A,B,C,D leg;
```
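Author. Make the change in a Codex or Claude runtime (dev.codex-runtime, dev.claude-runtime).
Test. Write tests with the code (section 6). CI runs them on every change, and a real failure turns the pipeline red.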
Publish. Commit to main. Workspace files (KB, spine YAML, skills) sync via
source.workspace-mirror; repo clones push to uks-git01 via
source.gitlab-sync (the git-repo wrapper). The deploy repo is its own
clone, source.atlas-deploy-repo, and is not workspace-mirrored — it
must be pushed explicitly.
Deploy. Run the gate's named atlas-deploy command against the target host,
then verify ground truth on the host — never self-report (section 7). For multi-
step gated work the gates.gate-runner drives the steps and
gates.gated-delivery is the process they follow.
3. Where code lives
Four homes, by kind of thing:
- Plugins → netos-atlas-plugins/plugins/<id>/. A plugin is a directory with a plugin.json manifest, a src/ directory (TypeScript; entrypoint is usually src/index.ts), colocated tests/, and optionally schema/ and a settings.schema.json. The canonical read-only template to clone is plugins/system-capability-spine/ (itself a clone of system-designs-index).
- Agent skills → knowledge/skills/<skill>/. These are behaviors an agent performs (e.g. code_edit, host_deploy, branch_push), deployed to hosts via the skill-bundle mechanism, not compiled into a plugin.
- Kernel modules → netos-atlas/modules/<module>/. Core runtime concerns (files, playbook-runtime, scheduler). You touch these only when a change is genuinely kernel-level.
- Workspace knowledge → knowledge/ (this guide, the capability spine YAML, playbooks, brand hub). Authoritative reference; staged onto hosts by the deploy.
The "two kinds of skills" distinction
The word "skill" names two different layers; keep them separate (Operating Foundation section 4.4):
- An agent skill is how an agent does something: a behavior under knowledge/skills/.
- An MCP tool / capability is what data or action exists: the netbox.* / support.* surface served by netos-mcp (mcp.netos-mcp-server).
- A skill uses tools. State the link explicitly when you author one (e.g. kb-answer → support.find_topic / support.read_kb).
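Writing the skill-to-tool link down as data makes it checkable. The sketch below is hypothetical: SkillToolLink and unknownTools are illustrative names, not an Atlas or netos-mcp format. It only checks that each linked tool falls in the netbox.* or support.* namespaces that netos-mcp serves.

```ts
// Hypothetical record of which MCP tools an agent skill uses; not an Atlas format.
interface SkillToolLink {
  skill: string; // directory name under knowledge/skills/
  tools: string[]; // MCP tool names served by netos-mcp
}

// The netos-mcp data surface is namespaced netbox.* and support.*.
const MCP_NAMESPACES = ["netbox.", "support."];

// Returns every linked tool that is outside the known MCP namespaces.
function unknownTools(link: SkillToolLink): string[] {
  return link.tools.filter((t) => !MCP_NAMESPACES.some((ns) => t.startsWith(ns)));
}

const kbAnswer: SkillToolLink = {
  skill: "kb-answer",
  tools: ["support.find_topic", "support.read_kb"],
};
console.log(unknownTools(kbAnswer)); // both tools are in the support.* namespace
```

A non-empty result usually means the skill names a tool that does not exist, or that belongs to a surface other than netos-mcp.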
4. Using the SDK
Plugins build against @netos/atlas-sdk (the FND1 deliverable that consolidated
59 local type mirrors down to one published package — the story is in the
SDK mirror re-audit, and consumer docs are in the
SDK README).
Import types — never re-declare them locally. The SDK is types + helpers, not
a manifest builder: there is no defineManifest/definePlugin export. The
manifest is hand-authored JSON (plugin.json, below); from the SDK you import
the runtime contract your init() builds against:
```ts
import type { PluginContext, ModuleInitResult } from '@netos/atlas-sdk';
```
PluginContext is the single object the kernel hands your init(ctx). Its
surface (from packages/ts/src/index.ts):
| Field | What you use it for |
|---|---|
| ctx.routes | Register HTTP routes (ctx.routes.get/post(...)) with a requirePerm guard. |
| ctx.actions | Expose and call cross-plugin actions (the provides_api surface). |
| ctx.settings / ctx.secrets | Read plugin config and secrets. Keys must be lowercase: the seeder lowercases env keys before schema match, so an upper-case key seeds as unknown_key. |
| ctx.db | Scoped SQL handles, but only for the bindings you opted into via requires.db (see the allowlist below). |
| ctx.audit / ctx.capture | Emit audit records; resolve capture level. |
| ctx.health | Register a health probe (surfaced on the host health endpoint). |
| ctx.flow | Ambient flow-frame helpers for orchestration-driven plugins. |
| ctx.log | Structured logging. |
| ctx.api | The stable cross-plugin read API surface. |
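The lowercase rule in the ctx.settings row is easiest to see by modeling the seeder. The sketch below is hypothetical: seedSettings and its return shape are illustrative, not the kernel's code. It reproduces only the behavior described above, where env keys are lowercased before they are matched against the keys declared in settings.schema.json.

```ts
// Hypothetical model of the settings seeder; not the kernel's implementation.
// Env keys are lowercased before matching, so a schema key that is not itself
// lowercase can never match, and the value ends up as an unknown key.
interface SeedResult {
  seeded: Record<string, string>;
  unknownKeys: string[];
}

function seedSettings(env: Record<string, string>, schemaKeys: string[]): SeedResult {
  const declared = new Set(schemaKeys);
  const seeded: Record<string, string> = {};
  const unknownKeys: string[] = [];
  for (const [rawKey, value] of Object.entries(env)) {
    const key = rawKey.toLowerCase(); // lowercasing happens before the schema match
    if (declared.has(key)) {
      seeded[key] = value;
    } else {
      unknownKeys.push(key);
    }
  }
  return { seeded, unknownKeys };
}

// Lowercase schema key: the upper-case env var still seeds.
console.log(seedSettings({ API_URL: "https://example.test" }, ["api_url"]));
// Upper-case schema key: the same env var lands in unknownKeys instead.
console.log(seedSettings({ API_URL: "https://example.test" }, ["API_URL"]));
```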
The manifest (plugin.json)
- id, name, version, author, language: "ts", entrypoint: "src/index.ts".
- pack: which deploy pack the plugin belongs to. This field is the source of truth for pack membership; the deploy catalog mirrors it. A plugin only loads on a host if its slug is in an enabled pack. The catalog lives under the atlas_pack_catalog: key in netos-atlas-deploy/group_vars/all/packs.yml, so a new core plugin must be added to the atlas_pack_catalog.core list there (mirroring its pack: core) or it is filtered out at install and 404s after deploy.
- requires.plugins: other plugins this one depends on.
- requires.db: the DB-binding allowlist. A scoped ctx.db handle appears only if the key is in requires.db and registered in the kernel's db-handles.ts (the currently registered keys are usageRollups, mcpRequests, and mcpCalls, all camelCase). An unknown key is dropped at bind time rather than rejected at parse, so the plugin loads without the handle and surfaces only a misleading warning, not an error. Declare [] if you need no DB.
- provides_api: the action surface other plugins may call.
- routes, jobs, ui: declared surfaces (the ui entries become Atlas pages).
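For the worked manifest, read plugins/system-capability-spine/plugin.json: a read-only, requires.db: [], single-UI-page plugin that is the cleanest minimal example. The gate that actually enforces the manifest is the kernel's strict zod ManifestSchema (run by module-loader.loadOne() before a plugin reaches list()), not a schemas/plugin.schema.json in the plugins repo; the manifest JSON Schema lives in the SDK repo.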
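Both manifest mistakes above, an unregistered requires.db key and a missing pack-catalog entry, only surface after deploy. A small pre-flight check can catch them earlier. The sketch below is a hypothetical helper, not an existing Atlas script: ManifestSubset, PackCatalog, and preflight are illustrative names. It hard-codes the three db-handles.ts keys listed above and takes the catalog as a plain object, where a real check would read db-handles.ts and packs.yml.

```ts
// Hypothetical manifest pre-flight check; not part of the kernel, SDK, or deploy repo.
// Assumes the manifest fields described above and a catalog shaped like
// atlas_pack_catalog in packs.yml (pack name -> list of plugin ids).
interface ManifestSubset {
  id: string;
  pack: string;
  requires: { db: string[]; plugins: string[] };
}

type PackCatalog = Record<string, string[]>;

// Keys registered in the kernel's db-handles.ts at the time of writing (camelCase).
const REGISTERED_DB_KEYS = new Set(["usageRollups", "mcpRequests", "mcpCalls"]);

function preflight(manifest: ManifestSubset, catalog: PackCatalog): string[] {
  const problems: string[] = [];
  for (const key of manifest.requires.db) {
    if (!REGISTERED_DB_KEYS.has(key)) {
      // The kernel would drop this binding at bind time, leaving no ctx.db handle.
      problems.push(`requires.db key "${key}" is not registered in db-handles.ts`);
    }
  }
  const packList = catalog[manifest.pack] ?? [];
  if (!packList.includes(manifest.id)) {
    // Without this entry the plugin is filtered out at install and 404s after deploy.
    problems.push(`"${manifest.id}" is missing from atlas_pack_catalog.${manifest.pack}`);
  }
  return problems;
}

const candidate: ManifestSubset = {
  id: "dev-guide-proof",
  pack: "core",
  requires: { db: ["usage_rollups"], plugins: [] }, // snake_case: not a registered key
};
console.log(preflight(candidate, { core: ["another-plugin"] }));
```

Returning every problem, rather than throwing on the first, lets a single run report all of them.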
5. Writing a plugin
The reliable path is to scaffold from a proven read-only plugin, then adapt it:
- Clone the template. Copy plugins/system-capability-spine/ to plugins/<your-id>/. Rename id, name, routes, and provides_api in plugin.json. Set pack to the pack you intend (core for always-on).
- Write src/index.ts. Export async function init(ctx: PluginContext). Register routes with an explicit requirePerm (e.g. admin.plugins.read), and return any actions you expose. Keep the handler thin; put logic in sibling modules (src/scan.ts, etc.).
- Settings & secrets. Declare a settings.schema.json and read values via ctx.settings.get(...) / ctx.secrets. Lowercase every key.
- Pack allowlist. If the plugin is new, add its id under the matching atlas_pack_catalog.<pack> list in netos-atlas-deploy/group_vars/all/packs.yml (mirroring its plugin.json pack). This is a deploy-repo edit and must be pushed to uks-git01 separately (source.atlas-deploy-repo).
- Provide an API only if another plugin needs it. provides_api plus ctx.actions is how plugins call each other; don't expose internals you don't have to.
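The steps above reduce to a thin src/index.ts. The sketch below is hypothetical in its details. To stay self-contained, it declares StandInContext, a minimal stand-in for the parts of PluginContext it uses. The routes.get signature (path, options with requirePerm, handler), the synchronous settings.get, the display_label setting, the my-plugin route path, countItems, and the { actions } return shape are all assumptions. A real plugin imports PluginContext and ModuleInitResult from @netos/atlas-sdk and follows their actual signatures.

```ts
// Sketch of a thin plugin entrypoint. StandInContext is a hypothetical,
// minimal stand-in so this compiles on its own; a real plugin imports
// PluginContext (and ModuleInitResult) from '@netos/atlas-sdk' instead.
type Handler = () => Promise<unknown>;

interface StandInContext {
  routes: {
    // Assumed shape: path, a guard option, and a handler.
    get(path: string, opts: { requirePerm: string }, handler: Handler): void;
  };
  settings: { get(key: string): string | undefined };
}

// In a real plugin this logic lives in a sibling module (e.g. src/scan.ts).
function countItems(items: string[]): { count: number } {
  return { count: items.length };
}

export async function init(ctx: StandInContext) {
  const label = ctx.settings.get("display_label") ?? "default"; // lowercase key
  ctx.routes.get(
    "/api/plugins/my-plugin/summary",
    { requirePerm: "admin.plugins.read" },
    async () => ({ label, ...countItems(["a", "b"]) }),
  );
  // Expose an action only because another plugin is assumed to need it.
  return { actions: { summary: async () => countItems(["a", "b"]) } };
}
```

Because init() depends only on the context it is handed, a fake context that records registrations is enough to unit-test it without a kernel.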
The DG1 worked example — the dev-guide-proof plugin authored solely by
following this guide — is the concrete "now you do it" exhibit; see section 9.
6. Writing tests
Tests are real and gating now (the FND2 gate made CI execute the colocated suites and fail on red). Two patterns coexist:
- Colocated *.test.ts files next to src/, compiled and run via the plugin's own pnpm test. The plugins-repo CI test stage (scripts/run-plugin-tests.mjs) runs every plugin's suite on every change; a single failing test fails the pipeline. netos-atlas has the equivalent job for its kernel suites.
- node:test runner suites under tests/ (e.g. tests/*.test.mjs) for plugins that test against fixtures rather than the type build.
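A colocated suite in the node:test style can be as small as the sketch below. It uses only Node's built-in node:test and node:assert/strict modules. The ping handler is hypothetical and inlined so the file runs on its own; a real suite imports the handler from the plugin's src/. Its response shape mirrors the dev-guide-proof ping route in section 9.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical handler, inlined so the file runs on its own; a real suite
// imports it from the plugin's src/ directory.
async function ping(): Promise<{ ok: boolean; guide: string }> {
  return { ok: true, guide: "dev.development-guide" };
}

test("ping reports ok", async () => {
  const body = await ping();
  assert.equal(body.ok, true);
});

test("ping names the guide capability", async () => {
  const body = await ping();
  assert.deepEqual(body, { ok: true, guide: "dev.development-guide" });
});
```

A failing assert fails its test, which fails the plugin's pnpm test, which fails the CI test stage.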
Local checks before you push:
- Typecheck. The repo-root eslint config has no TS parser, so per-file lint errors are noise. Run tsc against the plugin's tsconfig instead: node node_modules/typescript/bin/tsc -p <abs path to plugin tsconfig>.
- Unit tests. Run the plugin's pnpm test in its own directory.
- Playbook dry-run. For playbook/journey changes, validate offline against fixtures with netos-mcp/bin/dryrun-playbook before any host run. The dry-run / approval discipline is safety.dry-run-approval.
CI is the backstop, not the substitute: a green local run plus a green pipeline on uks-git01 is the bar before deploy.
7. Publishing & deploying
Publish
- Workspace files (KB, knowledge/atlas/capability-spine/*.yaml, this guide, knowledge/skills/) are committed to workspace main; source.workspace-mirror syncs them. No feature branches that outlive their gate.
- Repo clones (netos-atlas, netos-atlas-plugins, netos-atlas-sdk) push to uks-git01 with the git-repo wrapper (source.gitlab-sync), which injects the right token. Pushes to uks-git01 need GIT_SSL_CAINFO=/usr/lib/python3/dist-packages/certifi/cacert.pem.
- The deploy repo (netos-atlas-deploy) is its own clone and is not workspace-mirrored, so edits to deploy tasks must be committed and pushed separately (source.atlas-deploy-repo).
Deploy
- Full deploy stages workspace files onto the host mount, fetches plugins-src@main, rebuilds web, and flips the release: platforms/ansible/bin/ansible deploy atlas-deploy --limit eus-az2-atlas-lab02.
- Plugins-only fast deploy (atlas-plugins.yml) re-fetches plugin source and restarts, taking about 3 minutes against about 15 for a full deploy. Use it when only plugin code changed and no new workspace file needs staging.
- No migrations run automatically. atlas-deploy applies neither kernel nor plugin DB migrations; a schema change must be applied and verified on the host by hand.
- Staging gaps are real. The deploy only stages the files its tasks name. A new workspace doc (like this guide) needs its own staging task, or it 404s on the host even though the commit landed.
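The choice between the two deploy paths, plus the migration and staging caveats, can be written as a small decision function. This is a hypothetical helper, not part of atlas-deploy. The path prefix netos-atlas-plugins/plugins/ and the rule that any schema/ or migrations/ path signals a manual migration are illustrative assumptions. "full" corresponds to atlas-deploy and "plugins-only" to atlas-plugins.yml.

```ts
// Hypothetical helper, not part of atlas-deploy: pick a deploy path from changed paths.
// The path prefix and the schema/migrations heuristic are illustrative assumptions.
interface DeployPlan {
  kind: "full" | "plugins-only" | "none";
  manualMigration: boolean; // atlas-deploy never applies DB migrations itself
  notes: string[];
}

const PLUGIN_PREFIX = "netos-atlas-plugins/plugins/";

function planDeploy(changed: string[], newWorkspaceFiles: string[] = []): DeployPlan {
  const notes: string[] = [];
  const manualMigration = changed.some((p) => p.includes("/schema/") || p.includes("/migrations/"));
  if (manualMigration) {
    notes.push("schema change: apply and verify the migration on the host by hand");
  }
  if (newWorkspaceFiles.length > 0) {
    notes.push("new workspace files need a staging task or they 404 on the host");
  }
  if (changed.length === 0) {
    return { kind: "none", manualMigration, notes };
  }
  // Fast path only when every change is plugin code and nothing new needs staging.
  const pluginsOnly =
    changed.every((p) => p.startsWith(PLUGIN_PREFIX)) && newWorkspaceFiles.length === 0;
  return { kind: pluginsOnly ? "plugins-only" : "full", manualMigration, notes };
}

console.log(planDeploy(["netos-atlas-plugins/plugins/dev-guide-proof/src/index.ts"]));
console.log(planDeploy(["knowledge/atlas/development-guide.md"], ["knowledge/atlas/development-guide.md"]));
```

The migration flag is reported alongside the plan rather than blocking it, because the deploy itself never runs migrations either way.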
Verify on the host — never self-report
A deploy step is "done" only when ground truth on the host confirms it
(safety.fail-closed-verify): read the live release with
readlink current, confirm the plugin loaded, check the journal, and probe the
real route with a host-minted admin cookie. Build-then-flip means a broken
redeploy leaves the prior release current, so a failed verify is safe to retry.
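Gate verify scripts must be fail-closed and must avoid echo | grep -q under pipefail. grep -q exits on its first match, which can kill echo with SIGPIPE; pipefail then reports the whole pipeline as failed, a false negative that pauses the gate. Use here-strings instead (grep -q pattern <<< "$var").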
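Fail-closed means every unexpected outcome counts as a failure, including errors and timeouts. A minimal sketch of a route probe in that spirit, assuming Node 18+ (global fetch and AbortSignal.timeout). ProbeResult, isVerified, probe, the 10-second timeout, and passing the host-minted admin cookie as a raw cookie header are all illustrative assumptions. This is not the gate's actual verify script.

```ts
// Fail-closed evaluation: only an exact, expected result counts as a pass.
interface ProbeResult {
  status: number | null; // null when the request never completed
  body: unknown;
}

function isVerified(result: ProbeResult, expected: Record<string, unknown>): boolean {
  if (result.status !== 200) return false;
  if (typeof result.body !== "object" || result.body === null) return false;
  const body = result.body as Record<string, unknown>;
  // Every expected field must match exactly; anything missing or different fails.
  return Object.entries(expected).every(([k, v]) => body[k] === v);
}

// Hypothetical probe: any thrown error (network, timeout, bad JSON) becomes a failure.
async function probe(url: string, cookie: string): Promise<ProbeResult> {
  try {
    const res = await fetch(url, {
      headers: { cookie },
      signal: AbortSignal.timeout(10_000),
    });
    return { status: res.status, body: await res.json() };
  } catch {
    return { status: null, body: null };
  }
}
```

For the dev-guide-proof route, a caller would pass the result of probe() to isVerified() with { ok: true, guide: "dev.development-guide" } and treat anything other than true as "not done".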
8. Documentation & delivery standards
- Design docs. Anything that describes how we'll build X before building it goes to team/<role>/outputs/<topic>-design.md with type: design YAML frontmatter (status Draft → Approved → Implemented → Deprecated). The -design.md filename heuristic puts it in gates.designs-index (/designs). Use ```mermaid blocks, not ASCII art; if you classDef a light fill:, also set a dark color:, or the text renders unreadable.
- The capability spine. When you ship a capability, add or update its entry in knowledge/atlas/capability-spine/<group>.yaml and let the gap-ledger loop keep coverage honest. The spine plugin's bin/validate-spine.mjs validates entries (schema, unique id, group.slug prefix, non-empty docs). The docs.kind enum is bookstack | design | schema | tool | playbook | skill | readme; there is no guide kind, so reference markdown uses readme. Adding an entry needs no kernel deploy: commit the YAML, let the deploy stage it, and a plugin Refresh re-scans it.
- Doc standard. The house Atlas docs standard is knowledge.docs-overhaul; playbooks are indexed under knowledge.playbook-index.
- Voice. Internal docs (this guide, designs, handovers) use plain US English and are unrestricted on punctuation. The em-dash ban and the brand voice apply only to customer-facing copy; that hub is knowledge.brand-voice-hub.
- Deploy on close. A gate or milestone is not closed until its changes are deployed to the surface where they run, with the deploy command's output as evidence. Exceptions must be named explicitly in the design doc.
- Gated delivery. Multi-step builds run as numbered gates with a handover doc per gate (gates.gated-delivery; full process in the gated-delivery process doc). gates.gate-runner executes the steps; keep close steps small so they don't overrun the wall-clock budget. The models a task runs under are selected by models.profiles.
- Sandbox & security. Untrusted file input flows through sandbox.file-ingest; code execution is confined by sandbox.nsjail (use a finite rlimit_fsize, never max/inf, or writes fail with EIO). The operational baseline is SECURITY.md.
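The spine validation rules in the capability spine item above (unique id, group.slug prefix, non-empty docs, and the docs.kind enum) can be restated as a pure function. This is a hypothetical restatement for illustration, not bin/validate-spine.mjs. The SpineEntry shape (an id plus docs as a list of { kind, ref }) is an assumption about the YAML, which is taken as already parsed. Two rules are also assumptions: the group in each id must match the <group>.yaml file, and slugs are lowercase letters, digits, and hyphens.

```ts
// Hypothetical restatement of the spine checks; not bin/validate-spine.mjs.
// Assumes entries are already parsed from knowledge/atlas/capability-spine/<group>.yaml.
const DOC_KINDS: readonly string[] = ["bookstack", "design", "schema", "tool", "playbook", "skill", "readme"];

interface SpineEntry {
  id: string;
  docs: { kind: string; ref: string }[];
}

function validateGroup(group: string, entries: SpineEntry[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    // id must be group.slug, with the group matching the file it lives in (assumed rule).
    const match = /^([a-z0-9-]+)\.([a-z0-9-]+)$/.exec(entry.id);
    if (!match || match[1] !== group) errors.push(`${entry.id}: id must be ${group}.<slug>`);
    if (seen.has(entry.id)) errors.push(`${entry.id}: duplicate id`);
    seen.add(entry.id);
    if (entry.docs.length === 0) errors.push(`${entry.id}: docs must be non-empty`);
    for (const doc of entry.docs) {
      if (!DOC_KINDS.includes(doc.kind)) errors.push(`${entry.id}: unknown docs.kind "${doc.kind}"`);
    }
  }
  return errors;
}

// Flags the "guide" kind; reference markdown should use "readme" instead.
console.log(
  validateGroup("dev", [
    { id: "dev.development-guide", docs: [{ kind: "guide", ref: "knowledge/atlas/development-guide.md" }] },
  ]),
);
```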
9. Worked example — dev-guide-proof
The proof that this guide is sufficient is a minimal plugin authored solely by
following it: netos-atlas-plugins/plugins/dev-guide-proof/. It is a read-only
TypeScript plugin scaffolded from system-capability-spine, with:
- plugin.json with id: dev-guide-proof, pack: core, language: ts, requires.db: [], requires.plugins: [].
- src/index.ts with one GET route, /api/plugins/dev-guide-proof/ping → { ok: true, guide: "dev.development-guide" }.
- A colocated test.
It exercises the highest-value path the guide documents: importing
PluginContext / ModuleInitResult from @netos/atlas-sdk with no local
mirror, hand-authoring a strict-schema-valid manifest, adding the slug to
atlas_pack_catalog.core in packs.yml, and running a colocated test. It is
additive and disposable — it can stay as a permanent reference exhibit or be
removed with no schema or DB impact.
Validated (DG1 S3), all from inside the plugin dir:
- pnpm install --no-frozen-lockfile: links the SDK via the file: dependency.
- node node_modules/typescript/bin/tsc -p tsconfig.json --noEmit: typecheck, exit 0.
- node --test --import tsx './tests/**/*.test.mjs': colocated suite, 2/2 pass.
- Manifest load: plugin.json parses clean against the kernel ManifestSchema (the exact gate module-loader.loadOne() runs before a plugin reaches list()).
Gaps found while authoring solely from this guide, folded back above: the
SDK has no defineManifest helper (section 4 import corrected); the manifest
JSON Schema lives in the SDK repo and the enforced gate is the kernel's strict
zod ManifestSchema, not a schemas/plugin.schema.json in the plugins repo
(section 4); and the pack catalog is nested under atlas_pack_catalog: with the
plugin.json pack field as source of truth (sections 4 and 5). The guide and
the real path are now in sync.
Capabilities cited in this guide
Each links to the Capabilities index; filter by the id to open the entry.
- dev.codex-runtime, dev.claude-runtime, dev.model-router, dev.runtime-profiles, dev.dev-lab-fleet: the dev runtimes, routing, profiles, and fleet.
- gates.gate-runner, gates.gated-delivery, gates.designs-index, gates.capability-spine: delivery.
- source.gitlab-sync, source.workspace-mirror, source.atlas-deploy-repo: publish and deploy plumbing.
- sandbox.file-ingest, sandbox.nsjail, safety.fail-closed-verify, safety.dry-run-approval: sandbox and safety.
- models.profiles, mcp.netos-mcp-server, knowledge.docs-overhaul, knowledge.playbook-index, knowledge.brand-voice-hub: models, data, docs.