Introduction
cmux (Coding Agent Multiplexer) is a cross-platform desktop application for AI-assisted development with git worktree integration.
What is cmux?
cmux helps you work with multiple coding assistants more effectively via:
- Isolated workspaces with a central view of git status updates
- Multi-model support (`sonnet-4-*`, `gpt-5-*`, `opus-4-*`)
- Supporting UI and keybinds for efficiently managing a suite of agents
- Rich markdown outputs (mermaid diagrams, LaTeX, etc.)
Quick Links
- Install - Download and installation instructions
- Why Parallelize? - The case for running multiple agents at once
- Keyboard Shortcuts - Complete keyboard reference
- AGENTS - Developer guide for AI assistants
License
cmux is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Copyright (C) 2025 Coder Technologies, Inc.
Install
Downloads
Release Builds
Download pre-built binaries from the releases page:
- macOS: Signed and notarized DMG (separate builds for Intel/Apple Silicon)
- Linux: AppImage
- Windows: not implemented, coming soon
Development Builds
Pre-built binaries are available from GitHub Actions:
- macOS: Signed and notarized DMG
  - `macos-dmg-x64` (Intel Macs)
  - `macos-dmg-arm64` (Apple Silicon)
- Linux: AppImage (portable, works on most distros)
To download:
- Go to the Build workflow
- Click on the latest successful run
- Scroll down to "Artifacts" section
- Download the appropriate artifact for your platform
Installation
macOS:
- Download the DMG file for your Mac:
  - Intel Mac: `macos-dmg-x64`
  - Apple Silicon: `macos-dmg-arm64`
- Open the DMG file
- Drag Cmux to Applications folder
- Open the app normally
The app is code-signed and notarized by Apple, so it will open without security warnings.
Linux:
- Download the AppImage file
- Make it executable: `chmod +x Cmux-*.AppImage`
- Run it: `./Cmux-*.AppImage`
Testing Pre-Release Builds
⚠️ Note: Only builds from the `main` branch are signed and notarized. If you're testing a build from a pull request or other branch, you'll need to bypass macOS Gatekeeper:
- After installing, open Terminal
- Run: `xattr -cr /Applications/Cmux.app`
- Run: `codesign --force --deep --sign - /Applications/Cmux.app`
- Now you can open the app normally
Why Parallelize?
Here are some specific use cases we enable:
- Contextual continuity between relevant changes: e.g. create workspaces for `code-review`, `refactor`, and `new-feature`
- GPT-5-Pro: use the slow but powerful GPT-5-Pro for complex issues
  - Run it in the background for hours on end
  - The stream automatically resumes after restarts or intermittent connection issues, and a subtle indicator appears when the model completes
- A/B testing: test a variety of approaches to the same problem and abandon the bad ones
- Tangent management: launch tangents in cmux away from your main work
Models
Currently we support the Sonnet 4 models and the GPT-5 family of models:
- `anthropic:claude-sonnet-4-5`
- `anthropic:claude-opus-4-1`
- `openai:gpt-5`
- `openai:gpt-5-pro`
- `openai:gpt-5-codex`
And we intend to always support the models used by 90% of the community.
Anthropic models are better supported than GPT-5 class models due to an outstanding issue in the Vercel AI SDK.
TODO: add issue link here.
Keyboard Shortcuts
cmux is designed to be keyboard-driven for maximum efficiency. All major actions have keyboard shortcuts.
Note: This document should be kept in sync with `src/utils/ui/keybinds.ts`, which is the source of truth for keybind definitions.
Platform Conventions
- macOS: Shortcuts use `⌘` (Command) as the primary modifier
- Linux/Windows: Shortcuts use `Ctrl` as the primary modifier

When documentation shows `Ctrl`, it means:
- `⌘` (Command) on macOS
- `Ctrl` on Linux/Windows
General
Action | Shortcut |
---|---|
Cancel / Close / Interrupt | Esc |
Chat & Messages
Action | Shortcut |
---|---|
Focus chat input | a , i , or Ctrl+I |
Send message | Enter |
New line in message | Shift+Enter |
Cancel editing message | Ctrl+Q |
Jump to bottom of chat | Shift+G |
Change model | Ctrl+/ |
Toggle thinking level | Ctrl+Shift+T |
Workspaces
Action | Shortcut |
---|---|
Create new workspace | Ctrl+N |
Next workspace | Ctrl+J |
Previous workspace | Ctrl+K |
Open workspace in terminal | Ctrl+T |
Modes
Action | Shortcut |
---|---|
Toggle between Plan and Exec modes | Ctrl+Shift+M |
Interface
Action | Shortcut |
---|---|
Open command palette | Ctrl+Shift+P |
Toggle sidebar | Ctrl+P |
Tips
- Vim-inspired navigation: We use `J`/`K` for next/previous navigation, similar to Vim
- VS Code conventions: Command palette is `Ctrl+Shift+P` and quick toggle is `Ctrl+P` (use `⌘` on macOS)
- Consistent modifiers: Most workspace/project operations use `Ctrl` as the modifier
- Natural expectations: We try to use shortcuts users would naturally expect (e.g., `Ctrl+N` for new)
- Focus anywhere: Use `Ctrl+I` to quickly jump to the chat input from anywhere in the application
- Per-model thinking: `Ctrl+Shift+T` toggles thinking on/off and remembers your last preference for each model
- Terminal access: `Ctrl+T` opens the current workspace in your system terminal
Vim Mode
cmux includes a built-in Vim mode for the chat input, providing familiar Vim-style editing for power users.
Enabling Vim Mode
Vim mode is always enabled. Press ESC to enter normal mode from insert mode.
Modes
Insert Mode (Default)
- This is the default mode when typing in the chat input
- Type normally, all characters are inserted
- Press ESC or Ctrl-[ to enter normal mode
Normal Mode
- Command mode for navigation and editing
- Indicated by "NORMAL" text above the input
- Pending commands are shown (e.g., "NORMAL d" when delete is pending)
- Press i, a, I, A, o, or O to return to insert mode
Navigation
Basic Movement
- h - Move left one character
- j - Move down one line
- k - Move up one line
- l - Move right one character
Word Movement
- w - Move forward to start of next word
- W - Move forward to start of next WORD (whitespace-separated)
- b - Move backward to start of previous word
- B - Move backward to start of previous WORD
- e - Move to end of current/next word
- E - Move to end of current/next WORD
Line Movement
- 0 - Move to beginning of line
- _ - Move to first non-whitespace character of line
- $ - Move to end of line
- Home - Same as 0
- End - Same as $
Column Preservation
When moving up/down with j/k, the cursor attempts to stay in the same column position. If a line is shorter, the cursor moves to the end of that line, but will return to the original column on longer lines.
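The behavior above can be sketched as follows (a hypothetical helper, not cmux's actual implementation; it assumes the editor remembers a desired column from the last horizontal move):

```typescript
// Sketch of j/k column preservation. `desired` is the column remembered from
// the last horizontal move; it survives trips across shorter lines.
function moveDown(
  lines: string[],
  row: number,
  desired: number
): { row: number; col: number } {
  const nextRow = Math.min(row + 1, lines.length - 1);
  // Clamp to the next line's length, but keep passing `desired` on later
  // moves so a longer line restores the original column.
  const col = Math.min(desired, lines[nextRow].length);
  return { row: nextRow, col };
}
```

Moving across a short line clamps the cursor to that line's end, yet a subsequent move onto a longer line returns it to the remembered column because `desired` is never overwritten by vertical motion.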
Entering Insert Mode
- i - Insert at cursor
- a - Append after cursor
- I - Insert at beginning of line
- A - Append at end of line
- o - Open new line below and insert
- O - Open new line above and insert
Editing Commands
Simple Edits
- x - Delete character under cursor
- p - Paste after cursor
- P - Paste before cursor
Undo/Redo
- u - Undo last change
- Ctrl-r - Redo
Line Operations
- dd - Delete line (yank to clipboard)
- yy - Yank (copy) line
- cc - Change line (delete and enter insert mode)
Operators + Motions
Vim's power comes from combining operators with motions. All operators work with all motions:
Operators
- d - Delete
- c - Change (delete and enter insert mode)
- y - Yank (copy)
Motions
- w - To next word
- b - To previous word
- e - To end of word
- $ - To end of line
- 0 - To beginning of line
- _ - To first non-whitespace character
Examples
- dw - Delete to next word
- de - Delete to end of word
- d$ - Delete to end of line
- cw - Change to next word
- ce - Change to end of word
- c0 - Change to beginning of line
- y$ - Yank to end of line
- ye - Yank to end of word
- yy - Yank line (doubled operator)
Shortcuts
- D - Same as d$ (delete to end of line)
- C - Same as c$ (change to end of line)
Text Objects
Text objects let you operate on semantic units:
Inner Word (iw)
- diw - Delete inner word (word under cursor)
- ciw - Change inner word
- yiw - Yank inner word
Text objects work from anywhere within the word - you don't need to be at the start.
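A minimal sketch of how `iw` can be resolved (illustrative only, not the cmux source): expand from the cursor to the surrounding run of word characters.

```typescript
// Find the inner-word range around `cursor` as a half-open [start, end) span.
// Works from anywhere inside the word, matching the `iw` behavior described above.
function innerWord(text: string, cursor: number): { start: number; end: number } {
  const isWord = (ch: string) => /\w/.test(ch);
  let start = cursor;
  let end = cursor;
  while (start > 0 && isWord(text[start - 1])) start--;
  while (end < text.length && isWord(text[end])) end++;
  return { start, end };
}
```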
Visual Feedback
- Cursor: Thin blinking cursor in insert mode, solid block in normal mode
- Mode Indicator: Shows current mode and pending commands (e.g., "NORMAL d" when waiting for motion)
Keybind Conflicts
ESC Key
ESC is used for exiting Vim normal mode (highest priority). It is NOT used for:
- Canceling edits (use Ctrl-Q instead)
- Interrupting streams (use Ctrl-C instead)
Tips
- Learn operators + motions: Instead of memorizing every command, learn the operators (d, c, y) and motions (w, b, $, 0). They combine naturally.
- Use text objects: `ciw` to change a word is more reliable than `cw` because it works from anywhere in the word.
- Column preservation: When navigating up/down, your column position is preserved across lines of different lengths.
Not Yet Implemented
Features that may be added in the future:
- ge - Backward end of word motion
- f{char}, t{char} - Find character motions
- i", i', i(, i[, i{ - More text objects
- 2w, 3dd, 5x - Count prefixes
- Visual mode - Character, line, and block selection
- Macros - Recording and replaying command sequences
- Marks - Named cursor positions
Context Management
Commands for managing conversation history length and token usage.
Comparison
Approach | /clear | /truncate | /compact | Start Here |
---|---|---|---|---|
Speed | Instant | Instant | Slower (uses AI) | Instant |
Context Preservation | None | Temporal | Intelligent | Intelligent |
Cost | Free | Free | Uses API tokens | Free |
Reversible | No | No | No | Yes |
Start Here
Start Here allows you to restart your conversation from a specific point, using that message as the entire conversation history. This is available on:
- Plans - Click "🎯 Start Here" on any plan to use it as your conversation starting point
- Final Assistant messages - Click "🎯 Start Here" on any completed assistant response
This is a form of "opportunistic compaction" - the content is already well-structured, so the operation is instant. You can review the new starting point before the old context is permanently removed, making this the only reversible context management approach (use Cmd+Z/Ctrl+Z to undo).
/clear - Clear All History
Remove all messages from conversation history.
Syntax
/clear
Notes
- Instant deletion of all messages
- Irreversible - all history is permanently removed
- Use when you want to start a completely new conversation
/compact - AI Summarization
Compress conversation history using AI summarization. Replaces the conversation with a compact summary that preserves context.
Syntax
/compact [-t <tokens>] [-m <model>]
[continue message on subsequent lines]
Options
- `-t <tokens>` - Maximum output tokens for the summary (default: ~2000 words)
- `-m <model>` - Model to use for compaction (default: workspace model). Supports abbreviations like `haiku`, `sonnet`, or full model strings
Examples
Basic compaction:
/compact
Limit summary size:
/compact -t 5000
Choose compaction model:
/compact -m haiku
Use Haiku for faster, lower-cost compaction.
Auto-continue with custom message:
/compact
Continue implementing the auth system
After compaction completes, automatically sends "Continue implementing the auth system" as a follow-up message.
Multiline continue message:
/compact
Now let's refactor the middleware to use the new auth context.
Make sure to add tests for the error cases.
Continue messages can span multiple lines for more detailed instructions.
Combine all options:
/compact -m haiku -t 8000
Keep working on the feature
Combine custom model, token limit, and auto-continue message.
Notes
- Uses the specified model (or workspace model by default) to summarize conversation history
- Preserves actionable context and specific details
- Irreversible - original messages are replaced
- Continue message is sent once after compaction completes (not persisted)
/truncate - Simple Truncation
Remove a percentage of messages from conversation history (from the oldest first).
Syntax
/truncate <percentage>
Parameters
- `percentage` (required) - Percentage of messages to remove (0-100)
Examples
/truncate 50
Remove oldest 50% of messages.
Notes
- Simple deletion, no AI involved
- Removes messages from oldest to newest
- About as fast as `/clear`; `/truncate 100` is equivalent to `/clear`
- Irreversible - messages are permanently removed
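The oldest-first behavior can be illustrated with a small sketch (assumed semantics for illustration, not the actual cmux code):

```typescript
// Drop the oldest `percentage` percent of messages, keeping the most recent ones.
function truncateHistory<T>(messages: T[], percentage: number): T[] {
  const clamped = Math.max(0, Math.min(100, percentage));
  const removeCount = Math.floor((messages.length * clamped) / 100);
  return messages.slice(removeCount);
}
```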
OpenAI Responses API Limitation
⚠️ `/truncate` does not work with OpenAI models due to the Responses API architecture:
- OpenAI's Responses API stores conversation state server-side
- Manual message deletion via `/truncate` doesn't affect the server-side state
- Instead, OpenAI models use automatic truncation (`truncation: "auto"`)
- When context exceeds the limit, the API automatically drops messages from the middle of the conversation

Workarounds for OpenAI:
- Use `/clear` to start a fresh conversation
- Use `/compact` to intelligently summarize and reduce context
- Rely on automatic truncation (enabled by default)
Instruction Files
Overview
cmux layers instructions from two locations:
- `~/.cmux/AGENTS.md` (+ optional `AGENTS.local.md`) - global defaults
- `<workspace>/AGENTS.md` (+ optional `AGENTS.local.md`) - workspace-specific context

Priority within each location: `AGENTS.md` → `AGENT.md` → `CLAUDE.md` (first match wins). If the base file is found, cmux also appends `AGENTS.local.md` from the same directory when present.
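The lookup order can be sketched like this (a hypothetical helper for illustration; the real cmux code differs, and the `exists` check is injected here instead of touching the filesystem):

```typescript
const CANDIDATES = ["AGENTS.md", "AGENT.md", "CLAUDE.md"];

// Return the instruction files to load from `dir`: first match wins,
// with AGENTS.local.md appended when the base file is found.
function resolveInstructions(
  dir: string,
  exists: (path: string) => boolean
): string[] {
  for (const name of CANDIDATES) {
    const base = `${dir}/${name}`;
    if (!exists(base)) continue;
    const local = `${dir}/AGENTS.local.md`;
    return exists(local) ? [base, local] : [base];
  }
  return []; // no instruction file in this location
}
```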
Mode Prompts
Use mode-specific sections to optimize context and customize the behavior of specific modes.
cmux reads mode context from sections inside your instruction files. Add a heading titled:
`Mode: <mode>` (case-insensitive), at any heading level (`#` through `######`)
Rules:
- Workspace instructions are checked first, then global instructions
- The first matching section wins (at most one section is used)
- The section's content is everything until the next heading of the same or higher level
- Missing sections are ignored (no error)
Example (in either `~/.cmux/AGENTS.md` or `my-project/AGENTS.md`):

```markdown
# General Instructions

- Be concise
- Prefer TDD

## Mode: Plan

When planning:
- Focus on goals, constraints, and trade-offs
- Propose alternatives with pros/cons
- Defer implementation detail unless asked

## Mode: Compact

When compacting conversation history:
- Preserve key decisions and their rationale
- Keep code snippets that are still relevant
- Maintain context about ongoing tasks
- Be extremely concise—prioritize information density
```
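The rules above can be sketched as a small parser (an illustrative reading of the spec, not the actual cmux implementation):

```typescript
// Extract the body of a "Mode: <mode>" section: case-insensitive heading match
// at any level; the body runs until the next heading of the same or higher level.
function extractModeSection(markdown: string, mode: string): string | null {
  const lines = markdown.split("\n");
  const heading = new RegExp(`^(#{1,6})\\s*mode:\\s*${mode}\\s*$`, "i");
  for (let i = 0; i < lines.length; i++) {
    const match = lines[i].match(heading);
    if (!match) continue;
    const level = match[1].length;
    const body: string[] = [];
    for (let j = i + 1; j < lines.length; j++) {
      const next = lines[j].match(/^(#{1,6})\s/);
      if (next && next[1].length <= level) break; // section ends here
      body.push(lines[j]);
    }
    return body.join("\n").trim();
  }
  return null; // missing sections are ignored
}
```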
Available modes
- `exec` - Default mode for normal operations
- `plan` - Activated when the user toggles plan mode in the UI
- `compact` - Automatically used during `/compact` operations to guide how the AI summarizes conversation history

Customizing the `compact` mode is particularly useful for controlling what information is preserved during automatic history compaction.
Practical layout
~/.cmux/
AGENTS.md # Global instructions
AGENTS.local.md # Personal tweaks (gitignored)
my-project/
AGENTS.md # Project instructions (may include "Mode: Plan", etc.)
AGENTS.local.md # Personal tweaks (gitignored)
Project Secrets
Securely manage environment variables for your projects in cmux. Project secrets are automatically injected when the agent executes bash commands, making it easy to provide API keys, tokens, and other sensitive configuration.
What Are Project Secrets?
Project secrets are key-value pairs stored per project that are:
- Automatically injected as environment variables when running bash commands
- Stored outside the repo in `~/.cmux/secrets.json`
- Project-scoped - each project has its own set of secrets
- Workspace-inherited - all workspaces in a project use the same secrets
Common Use Cases
- API Keys: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GITHUB_TOKEN`
- Authentication tokens: `NPM_TOKEN`, `DOCKER_HUB_TOKEN`
- Database credentials: `DATABASE_URL`, `POSTGRES_PASSWORD`
- Service endpoints: `API_BASE_URL`, `WEBHOOK_URL`
- Build configuration: `BUILD_ENV`, `FEATURE_FLAGS`
Managing Secrets
Opening the Secrets Modal
- Find your project in the left sidebar
- Hover over the project name
- Click the 🔑 key icon that appears
How Secrets Are Used
When the agent runs bash commands (via the `bash` tool), all project secrets are automatically injected as environment variables:
```shell
# If you have a secret: GH_TOKEN=ghp_abc123
# The agent can use it in commands:
gh api /user  # Uses GH_TOKEN from environment
```
The agent doesn't need to explicitly reference secrets - they're available as regular environment variables in all bash executions within that project's workspaces.
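Conceptually, the injection is just an environment merge, as in this hypothetical sketch (not the actual cmux code):

```typescript
// Layer project secrets over the inherited process environment; a secret
// wins over an inherited variable of the same name.
function buildEnv(
  processEnv: Record<string, string>,
  secrets: Record<string, string>
): Record<string, string> {
  return { ...processEnv, ...secrets };
}
```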
Security Considerations
Storage
- Secrets are stored in `~/.cmux/config.json`
- Stored in plaintext - the config file is not encrypted
- The config file has standard user-only file permissions
Related
- Agentic Git Identity - Configure Git credentials for AI commits using Project Secrets
Agentic Git Identity
Configure cmux to use a separate Git identity for AI-generated commits, making it easy to distinguish between human and AI contributions. Reasons to use a separate identity include:
- Clear attribution
- Preventing (accidental) destructive actions
- Enforcing review flow, e.g. preventing AI from merging into `main` while allowing humans to do so
Setup Overview
- Create a GitHub account for your agent (e.g., `username-agent`)
- Generate a Classic GitHub token
- Configure Git to use the agent identity
- Configure Git credentials to use the token
Step 1: Create Agent GitHub Account
Create a separate GitHub account for your agent:
- Sign up at github.com/signup
- Use a distinctive username (e.g., `yourname-agent`, `yourname-ai`)
- Use a separate email (GitHub allows plus-addressing: `yourname+ai@example.com`)
Note: This is optional but recommended. You can also use your main account with a different email/name.
Step 2: Generate Classic GitHub Token
Classic tokens are easier to configure than fine-grained tokens for repository access.
- Log into your agent GitHub account
- Go to Settings → Developer settings → Personal access tokens → Tokens (classic)
- Click "Generate new token (classic)"
- Configure the token:
- Note: "cmux agent token" (or similar)
- Expiration: Choose based on your security preferences
- Scopes: Select `repo` (Full control of private repositories)
- Click "Generate token"
- Copy the token immediately - you won't see it again
Step 3: Configure Git Identity
Add the Git identity environment variables as Project Secrets in cmux:
- Open cmux and find your project in the sidebar
- Click the 🔑 key icon to open the secrets modal
- Add the following four secrets:
- `GIT_AUTHOR_NAME` = `Your Name (Agent)`
- `GIT_AUTHOR_EMAIL` = `yourname+ai@example.com`
- `GIT_COMMITTER_NAME` = `Your Name (Agent)`
- `GIT_COMMITTER_EMAIL` = `yourname+ai@example.com`
- Click "Save"
These environment variables will be automatically injected when the agent runs Git commands in that project.
Note: If you need the agent identity outside of cmux, you can alternatively set these as global environment variables in your shell configuration (`~/.zshrc`, `~/.bashrc`, etc.)
Step 4: Configure GitHub Authentication
Install GitHub CLI
If you don't have it:
```shell
# macOS
brew install gh

# Windows
winget install --id GitHub.cli

# Linux
# See https://github.com/cli/cli/blob/trunk/docs/install_linux.md
```
Configure Git Credential Helper
Set up Git to use the GitHub CLI for authentication. The recommended approach is to use `gh auth setup-git`, which scopes the credential helper to GitHub only:
```shell
# Configure gh as credential helper for GitHub (recommended)
gh auth setup-git
```
This configures Git to use `gh` for GitHub authentication while preserving your existing credential helpers for other Git hosts.
Alternative: Manual configuration (for advanced users)
If you need more control or want to completely replace existing credential helpers:
```shell
# Scope to GitHub only (preserves other credential helpers)
git config --global credential.https://github.com.helper '!gh auth git-credential'

# OR: Replace all credential helpers (may break non-GitHub authentication)
git config --global --unset-all credential.helper
git config --global credential.helper ""
git config --global --add credential.helper '!gh auth git-credential'
```
⚠️ Warning: The "replace all" approach will disable platform keychain helpers and may break Git authentication for non-GitHub remotes (GitLab, Bitbucket, etc.).
Prompting Tips
Some tips and tricks from the cmux developers on getting the most out of your agents.
Persist lessons
When you notice agents make the same class of mistake repeatedly, ask them to modify their `AGENTS.md` to prevent the mistake from happening again. We have found this pattern is most effective when:
- You specify the size of the change: LLMs love fluff, so always add a constraint like "change at most two sentences"
- You ask the agent to focus on the general lesson, not the specific mistake
Codebases often have "watering hole" type files that are read in the course of
certain types of changes. For example, you may have a central file defining an API interface. When
the lesson is only relevant to a particular type of change it's often better to persist lessons as
source comments in such files vs. expanding the global AGENTS.md
.
Define the loop
Agents thrive on TDD. Try to define their task in terms of what checks need to pass before they can claim success.
For cmux development, we have a `wait_pr_checks.sh` script that polls GitHub and ensures that:
- There are no dirty changes
- All checks pass
- All review comments are resolved
- There are no merge conflicts
Create a similar script for your project and try asking your agent to work persistently until it passes.
Aggressively prune context
Even though Sonnet 4.5 has up to 1M tokens of potential context, we see a noticeable improvement in quality when context is kept under 100k tokens. We suggest running `/compact` with a continue message often to keep context small. For example:
```
/compact
<what you want next>
```
This will automatically send a follow-up message after compaction to keep the session flowing.
Keeping code clean
Some prompts that help you keep the codebase clean:
Elevate the fix to design level:
- We keep seeing this class of bug in component X, fix this at a design level
- There's bug X, provide a fix that solves the whole class of bugs
At the end of a long session before compaction, try asking:
- How can the code/architecture be improved to make similar changes easier?
- What notes in AGENTS.md would make this change easier for future Assistants?
At the end of a long session (ideally after compaction), try asking:
- DRY your work
- Strive for net LoC reduction
- Review in depth, simplify
System Prompt
cmux aims to support a variety of models at different levels of performance. To that end, we're built on the Vercel AI SDK, which does most of the heavy lifting in creating a unified API for all models.
Even with consistent support at the protocol layer, we have found that different models react very differently to the same set of tools and instructions. So, we strive to minimize the system prompt and let users figure out the prompting trade-offs.
Here's a snippet from `src/services/systemMessage.ts`, which contains our shared system prompt (minus tools).
```typescript
// The PRELUDE is intentionally minimal to not conflict with the user's instructions.
// cmux is designed to be model agnostic, and models have shown large inconsistency in how they
// follow instructions.
const PRELUDE = `
<prelude>
You are a coding agent.
<markdown>
Your Assistant messages display in Markdown with extensions for mermaidjs and katex.
When creating mermaid diagrams:
- Avoid side-by-side subgraphs (they display too wide)
- For comparisons, use separate diagram blocks or single graph with visual separation
- When using custom fill colors, include contrasting color property (e.g., "style note fill:#ff6b6b,color:#fff")
- Make good use of visual space: e.g. use inline commentary
- Wrap node labels containing brackets or special characters in quotes (e.g., Display["Message[]"] not Display[Message[]])

Use GitHub-style \`<details>/<summary>\` tags to create collapsible sections for lengthy content, error traces, or supplementary information. Toggles help keep responses scannable while preserving detail.
</markdown>
</prelude>
`;

function buildEnvironmentContext(workspacePath: string): string {
  return `
<environment>
You are in a git worktree at ${workspacePath}
- This IS a git repository - run git commands directly (no cd needed)
- Tools run here automatically
- Do not modify or visit other worktrees (especially the main project) without explicit user intent
- You are meant to do your work isolated from the user and other agents
</environment>
`;
}
```
Storybook
Storybook is a tool for developing and testing UI components in isolation. It provides a sandboxed environment where you can build, view, and test components without running the full Electron application.
Starting Storybook
```shell
make storybook
# or
bun run storybook
```
This will start the Storybook development server at `http://localhost:6006`.
Building Static Storybook
To build a static version of Storybook that can be deployed:
```shell
make storybook-build
# or
bun run storybook:build
```
The output will be in `storybook-static/`.
Writing Stories
Stories are colocated with their components. For example, `ErrorMessage.tsx` has its stories in `ErrorMessage.stories.tsx` in the same directory.
Basic Story Structure
```typescript
import type { Meta, StoryObj } from "@storybook/react";
import { MyComponent } from "./MyComponent";

const meta = {
  title: "Components/MyComponent",
  component: MyComponent,
  parameters: {
    layout: "centered", // or "fullscreen" or "padded"
  },
  tags: ["autodocs"], // Enables automatic documentation
} satisfies Meta<typeof MyComponent>;

export default meta;
type Story = StoryObj<typeof meta>;

export const Default: Story = {
  args: {
    prop1: "value1",
    prop2: "value2",
  },
};

export const Variant: Story = {
  args: {
    prop1: "different value",
    prop2: "another value",
  },
};
```
Component Examples
See the existing stories for reference:
- `src/components/ErrorMessage.stories.tsx` - Simple component with multiple states
- `src/components/Modal.stories.tsx` - Complex component with children and multiple variants
Global Styles
Storybook automatically applies the same global styles as the main app:
- Color variables (`GlobalColors`)
- Font definitions (`GlobalFonts`)
- Scrollbar styles (`GlobalScrollbars`)
These are configured in `.storybook/preview.tsx`.
Handling Electron APIs
Some components depend on `window.api` for Electron IPC communication. For these components:
- Preferred: Extract the component logic to accept props instead of calling IPC directly
- Alternative: Mock the `window.api` object in `.storybook/preview.tsx`
Example mock structure:
```typescript
window.api = {
  workspace: {
    create: async () => ({ success: true, metadata: { ... } }),
    list: async () => ({ success: true, workspaces: [...] }),
    // ...
  },
  // ... other IPC channels
};
```
Benefits
- Isolated Development: Build components without running the full Electron app
- Visual Testing: See all component states at once
- Documentation: Stories serve as living documentation with `autodocs`
- Faster Iteration: Hot reload is faster than Electron rebuilds
- Accessibility: Storybook addons can check accessibility issues
Configuration
- `.storybook/main.ts` - Main Storybook configuration
- `.storybook/preview.tsx` - Global decorators and parameters
- `tsconfig.json` - Includes `.storybook/**/*.ts` for type checking
Tips
- Keep stories simple and focused on visual states
- Use Storybook's Controls addon to make props interactive
- Add multiple stories for different states (loading, error, success, etc.)
- Use the `tags: ["autodocs"]` option to generate automatic documentation
Terminal Benchmarking
cmux ships with a headless adapter for Terminal-Bench. The adapter runs the Electron backend without opening a window and exercises it through the same IPC paths we use in integration tests. This page documents how to launch benchmarks from the repository tree.
Prerequisites
- Docker must be installed and running. Terminal-Bench executes each task inside a dedicated Docker container.
- `uv` is available in the nix `devShell` (provided via `flake.nix`), or install it manually from https://docs.astral.sh/uv/.
- Standard provider API keys (e.g. `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) should be exported so cmux can stream responses.
Optional environment overrides:
Variable | Purpose | Default |
---|---|---|
CMUX_AGENT_REPO_ROOT | Path copied into each task container | repo root inferred from the agent file |
CMUX_TRUNK | Branch checked out when preparing the project | main |
CMUX_WORKSPACE_ID | Workspace identifier used inside cmux | cmux-bench |
CMUX_MODEL | Preferred model (supports provider/model syntax) | anthropic/claude-sonnet-4-5 |
CMUX_THINKING_LEVEL | Optional reasoning level (off , low , medium , high ) | high |
CMUX_MODE | Starting mode (plan or exec ) | exec |
CMUX_TIMEOUT_MS | Optional stream timeout in milliseconds | no timeout |
CMUX_CONFIG_ROOT | Location for cmux session data inside the container | /root/.cmux |
CMUX_APP_ROOT | Path where the cmux sources are staged | /opt/cmux-app |
CMUX_PROJECT_PATH | Explicit project directory inside the task container | auto-detected from common paths |
Running Terminal-Bench
All commands below should be run from the repository root.
Quick smoke test (single task)
```shell
uvx terminal-bench run \
  --dataset terminal-bench-core==0.1.1 \
  --agent-import-path benchmarks.terminal_bench.cmux_agent:CmuxAgent \
  --n-tasks 1
```
This downloads the Terminal-Bench runner, copies the cmux sources into the container, and validates the adapter against the first task only. Use this before attempting a full sweep.
Full dataset
```shell
uvx terminal-bench run \
  --dataset terminal-bench-core==0.1.1 \
  --agent-import-path benchmarks.terminal_bench.cmux_agent:CmuxAgent
```
Results (pass/fail, token usage, wall-clock) are printed at the end of the run. Terminal-Bench also writes per-task logs under the current working directory; review them when diagnosing failures.
You can also use `make`:

```shell
TB_CONCURRENCY=6 TB_LIVESTREAM=1 \
  make benchmark-terminal TB_ARGS="--n-tasks 3 --model anthropic/claude-sonnet-4-20250514 --agent-kwarg mode=plan --agent-kwarg thinking_level=medium"
```
`TB_DATASET` defaults to `terminal-bench-core==0.1.1`, but can be overridden (e.g. `make benchmark-terminal TB_DATASET=terminal-bench-core==head`).
Use `--agent-kwarg mode=plan` to exercise the plan/execute workflow: the CLI will gather a plan first, then automatically approve it and switch to execution. Leaving the flag off (or setting `mode=exec`) skips the planning phase.
Use `TB_CONCURRENCY=<n>` to control `--n-concurrent` (number of concurrently running tasks) and `TB_LIVESTREAM=1` to stream log output live instead of waiting for the run to finish. These map to Terminal-Bench's `--n-concurrent` and `--livestream` flags.
How the Adapter Works
The adapter lives in `benchmarks/terminal_bench/cmux_agent.py`. For each task it:
- Copies the cmux repository (package manifests + `src/`) into `/tmp/cmux-app` inside the container.
- Ensures Bun exists, then runs `bun install --frozen-lockfile`.
- Launches `src/debug/agentSessionCli.ts` to prepare workspace metadata and stream the instruction, storing state under `CMUX_CONFIG_ROOT` (default `/root/.cmux`).
`CMUX_MODEL` accepts either the cmux colon form (`anthropic:claude-sonnet-4-5`) or the Terminal-Bench slash form (`anthropic/claude-sonnet-4-5`); the adapter normalises whichever you provide.
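A sketch of that normalisation (assumed behavior for illustration; the adapter's real code may differ):

```typescript
// Convert either accepted form to the colon form cmux uses internally.
function normalizeModel(model: string): string {
  const sep = model.search(/[/:]/); // first separator splits provider from model id
  if (sep === -1) return model; // no provider prefix; leave untouched
  return `${model.slice(0, sep)}:${model.slice(sep + 1)}`;
}
```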
Troubleshooting
- `command not found: bun` - ensure the container can reach Bun's install script, or pre-install Bun in your base image. The adapter aborts if the install step fails.
- Workspace creation errors - set `CMUX_PROJECT_PATH` to the project directory inside the task container if auto-discovery misses it.
- Streaming timeouts - pass `--n-tasks 1` while iterating on fixes, or set `CMUX_TIMEOUT_MS=180000` to reinstate a timeout if needed.
AGENT INSTRUCTIONS
Project Context
- Electron + React desktop application
- No existing users - migration code is not needed when changing data structures
AI-Generated Content Attribution
When creating public operations (commits, PRs, issues), always include:
- 🤖 emoji in the title
- "Generated with
cmux
" in the body (if applicable)
This ensures transparency about AI-generated contributions.
PR Management
Prefer to reuse existing PRs by force-pushing to the same branch, even if the branch name becomes irrelevant. Avoid closing and recreating PRs unnecessarily - PR spam clutters the repository history.
After submitting or updating PRs, always check merge status:
`gh pr view <number> --json mergeable,mergeStateStatus | jq '.'`
This is especially important with rapid development where branches quickly fall behind.
Wait for PR checks to complete:
`./scripts/wait_pr_checks.sh <pr_number>`
This script polls every 5 seconds and fails immediately on CI failure, bad merge status, or unresolved review comments.
Key status values:
- `mergeable: "MERGEABLE"` = No conflicts, can merge
- `mergeable: "CONFLICTING"` = Has conflicts, needs resolution
- `mergeStateStatus: "CLEAN"` = Ready to merge ✅
- `mergeStateStatus: "BLOCKED"` = Waiting for CI checks
- `mergeStateStatus: "BEHIND"` = Branch is behind base, rebase needed
- `mergeStateStatus: "DIRTY"` = Has conflicts
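These two fields can drive a small pure helper that decides the next step after a PR update. The status strings match the `gh pr view --json mergeable,mergeStateStatus` output above, but `nextPrAction` itself is an illustrative sketch, not an existing script in this repo:

```typescript
interface PrStatus {
  mergeable: "MERGEABLE" | "CONFLICTING" | "UNKNOWN";
  mergeStateStatus: "CLEAN" | "BLOCKED" | "BEHIND" | "DIRTY" | "UNKNOWN";
}

type PrAction = "ready" | "wait-for-ci" | "rebase" | "resolve-conflicts" | "recheck";

// Pure helper: classify what the PR needs next from the two status fields.
function nextPrAction(status: PrStatus): PrAction {
  // Conflicts show up as either CONFLICTING or DIRTY; resolve those first.
  if (status.mergeable === "CONFLICTING" || status.mergeStateStatus === "DIRTY") {
    return "resolve-conflicts";
  }
  switch (status.mergeStateStatus) {
    case "CLEAN":
      return "ready";
    case "BLOCKED":
      return "wait-for-ci";
    case "BEHIND":
      return "rebase";
    default:
      return "recheck";
  }
}
```

Keeping the decision logic pure makes it trivial to unit-test without spawning `gh`.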
If branch is behind:

```shell
git fetch origin
git rebase origin/main
git push --force-with-lease
```
⚠️ NEVER Auto-Merge PRs
DO NOT enable auto-merge (`gh pr merge --auto`) or merge PRs (`gh pr merge`) without explicit user instruction.
Reason: PRs may need human review, discussion, or additional changes based on review comments (e.g., Codex feedback). Always:
- Submit the PR
- Wait for checks to pass
- Report PR status to user
- Wait for user to decide whether to merge
Only merge if the user explicitly says "merge it" or similar.
Writing PR Descriptions
Write PR bodies for busy reviewers. Be concise and avoid redundancy:
- Each section should add new information - Don't restate the same thing in different words
- Structure emerges from content - Some fixes need problem/solution/testing, others just need "what changed and why"
- If it's obvious, omit it - Problem obvious from solution? Don't state it. Solution obvious from problem? Skip to implementation details.
- Avoid over-explaining - Comprehensive testing checklists, multiple code examples, and detailed edge case lists make PRs harder to review. State the change and why it matters.
❌ Bad (redundant):
Problem: Markdown rendering is slow, causing 50ms tasks
Solution: Make markdown rendering faster
Impact: Reduces task time to <16ms
✅ Good (each section adds value):
ReactMarkdown was re-parsing content on every parent render because plugin arrays
were created fresh each time. Moved to module scope for stable references.
Verify with React DevTools Profiler - MarkdownCore should only re-render when content changes.
Project Structure
- `src/main.ts` - Main Electron process
- `src/preload.ts` - Preload script for IPC
- `src/App.tsx` - Main React component
- `src/config.ts` - Configuration management
- `~/.cmux/config.json` - User configuration file
- `~/.cmux/src/<project_name>/<branch>` - Workspace directories for git worktrees
- `~/.cmux/sessions/<workspace_id>/chat.jsonl` - Session chat histories
Documentation Guidelines
Free-floating markdown docs are not permitted. Documentation must be organized:
- User-facing docs → `./docs/` directory
  - IMPORTANT: Read `docs/README.md` first before writing user-facing documentation
  - User docs are built with mdbook and deployed to https://cmux.io
  - Must be added to `docs/SUMMARY.md` to appear in the docs
  - Use standard markdown + mermaid diagrams
- Developer docs → inline with the code they document, as comments. Consider them notes to future assistants so they can understand the logic more quickly. DO NOT create standalone documentation files in the project root or random locations.

NEVER create markdown documentation files (README, guides, summaries, etc.) in the project root during feature development unless the user explicitly requests documentation. Code + tests + inline comments are complete documentation.
External API Docs
DO NOT visit https://sdk.vercel.ai/docs/ai-sdk-core. All of that content is already in `/tmp/ai-sdk-docs/**.mdx`. (Generate them with `./scripts/update_vercel_docs.sh` if they don't exist.)
Key Features
- Projects sidebar (left panel)
- Workspaces using git worktrees
- Configuration persisted to `~/.cmux/config.json`
Package Manager
- Using bun - All dependencies are managed with bun (not npm)
- Use bun over npm whenever possible, including to:
  - Install dependencies: `bun install`
  - Add packages: `bun add <package>`
  - Run scripts: `bun run <script>`
  - etc.
- If you hit missing module/type errors locally or in CI, run `bun install` before diving into deeper debugging.
Development Commands
This project uses Make as the primary build orchestrator. See `Makefile` for inline documentation.

Primary commands (use Make):
- `make dev` - Start development server (Vite + TypeScript watcher)
- `make start` - Build and start Electron app
- `make build` - Build the application (with parallelism)
- `make lint` - Run ESLint & typecheck
- `make lint-fix` - Run ESLint with `--fix`
- `make fmt` - Format all source files with Prettier
- `make fmt-check` - Check if files are formatted correctly
- `make typecheck` - Run TypeScript type checking
- `make test` - Run unit tests
- `make test-integration` - Run all tests (unit + integration)
- `make clean` - Clean build artifacts
- `make help` - Show all available targets
Backwards compatibility: Existing commands are available via `bun run` (e.g., `bun run dev` calls `make dev`). New commands will only be added to `Makefile`, not `package.json`.
Refactoring
- When refactoring, use `git mv` to preserve file history instead of rewriting files from scratch
Testing
Test-Driven Development (TDD)
TDD is the preferred development style for agents.
- Prefer relocating complex logic to places where it is easily tested
  - E.g. pure functions in `utils` are easier to test than complex logic in a React component
- Strive for broad coverage with minimal tests
- Prefer testing large blocks of composite logic
- Tests should be written with the end-user experience in mind
- Good tests create conditions where the feature matters, then verify the difference. Don't just test that requests succeed with a flag enabled—create the scenario where the flag changes the outcome (e.g., build history that exceeds limits, then verify flag prevents the error). Tests must prove the feature actually does something, not just that it doesn't break things.
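As a sketch of that principle (the `truncateHistory` helper below is hypothetical, not a real cmux API): a weak test only checks the call succeeds, while a strong test first builds a history that exceeds the limit, then verifies the flag changes the outcome.

```typescript
interface Message {
  role: "user" | "assistant";
  text: string;
}

// Hypothetical feature under test: keep only the most recent messages when over the limit.
function truncateHistory(history: Message[], limit: number, enabled: boolean): Message[] {
  if (!enabled || history.length <= limit) return history;
  return history.slice(history.length - limit);
}

// ❌ Weak - passes even if truncation is broken (history never exceeds the limit)
console.assert(truncateHistory([{ role: "user", text: "hi" }], 10, true).length === 1);

// ✅ Strong - create the condition where the flag matters, then verify the difference
const long: Message[] = Array.from({ length: 20 }, (_, i) => ({ role: "user", text: `m${i}` }));
console.assert(truncateHistory(long, 10, true).length === 10); // flag prevents overflow
console.assert(truncateHistory(long, 10, false).length === 20); // without it, overflow remains
```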
General Testing Guidelines
- Always run `make typecheck` after making changes to verify types (checks both main and renderer)
- Unit tests should be colocated with their business logic - Place unit test files (`*.test.ts`) in the same directory as the code they test (e.g., `aiService.test.ts` next to `aiService.ts`)
- Don't test simple mapping operations - If the test just verifies the code does what it obviously does from reading it, skip the test.
- E2E and integration tests may live in the `./tests/` directory, but unit tests must be colocated
- Strive to decompose complex logic away from components and into `src/utils/` - utils should be either pure functions or easily isolated (e.g. if they operate on the FS they accept a path). Testing them should not require complex mocks or setup.
- Integration tests:
  - Run specific integration test: `TEST_INTEGRATION=1 bun x jest tests/ipcMain/sendMessage.test.ts -t "test name pattern"`
  - Run all integration tests: `TEST_INTEGRATION=1 bun x jest tests` (~35 seconds, runs 40 tests)
  - Performance: Tests use `test.concurrent()` to run in parallel within each file
  - NEVER bypass IPC in integration tests - Integration tests must use the real IPC communication paths (e.g., `mockIpcRenderer.invoke()`) even when it's harder. Directly accessing services (HistoryService, PartialService, etc.) or manipulating config/state directly bypasses the integration layer and defeats the purpose of the test.

Examples of bypassing IPC (DON'T DO THIS):

```typescript
// ❌ BAD - Directly manipulating config
const config = env.config.loadConfigOrDefault();
config.projects.set(projectPath, { path: projectPath, workspaces: [] });
env.config.saveConfig(config);

// ❌ BAD - Directly accessing services
const history = await env.historyService.getHistory(workspaceId);
await env.historyService.appendToHistory(workspaceId, message);
```
**Correct approach (DO THIS):**

```typescript
// ✅ GOOD - Use IPC to save config
await env.mockIpcRenderer.invoke(IPC_CHANNELS.CONFIG_SAVE, {
  projects: Array.from(projectsConfig.projects.entries()),
});

// ✅ GOOD - Use IPC to interact with services
await env.mockIpcRenderer.invoke(IPC_CHANNELS.HISTORY_GET, workspaceId);
await env.mockIpcRenderer.invoke(IPC_CHANNELS.WORKSPACE_CREATE, projectPath, branchName);
```
Acceptable exceptions:
- Reading context (like `env.config.loadConfigOrDefault()`) to prepare IPC call parameters
- Verifying filesystem state (like checking if files exist) after IPC operations complete
- Loading existing data to avoid expensive API calls in test setup
If IPC is hard to test, fix the test infrastructure or IPC layer, don't work around it by bypassing IPC.
Command Palette (Cmd+Shift+P)
- Open with `Cmd+Shift+P` on macOS or `Ctrl+Shift+P` on Windows/Linux.
- Quick toggle sidebar is `Cmd+P`/`Ctrl+P`.
- Palette includes workspace switching/creation, navigation, chat utils, mode/model, projects, and slash-command prefixes:
  - `/` shows slash command suggestions (select to insert into Chat input).
  - `>` filters to actions only.
Styling
- Colors are centralized as CSS variables in `src/styles/colors.tsx`
- Use CSS variables (e.g., `var(--color-plan-mode)`) instead of hardcoded colors
- Fonts are centralized as CSS variables in `src/styles/fonts.tsx`
TypeScript Best Practices
- Avoid `as any` in all contexts - Never use `as any` casts. Instead:
  - Use proper type narrowing with discriminated unions
  - Leverage TypeScript's type guards and the compiler's type checking
  - Import and reuse existing types from dependencies rather than creating anonymous clones
  - If a type is truly complex, create a proper type definition or interface
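A minimal sketch of narrowing instead of casting (the `StreamEvent` union is illustrative, not a real cmux type):

```typescript
type StreamEvent =
  | { kind: "delta"; text: string }
  | { kind: "error"; message: string };

// ❌ Avoid: const text = (event as any).text; - compiles even for error events

// ✅ Narrow on the discriminant; each branch is fully typed and checked
function renderEvent(event: StreamEvent): string {
  switch (event.kind) {
    case "delta":
      return event.text;
    case "error":
      return `error: ${event.message}`;
  }
}
```

Because `kind` discriminates the union, the compiler knows `event.text` exists only in the `"delta"` branch, so no cast is needed.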
- Use `Record<EnumType, ValueType>` for exhaustive mappings - When mapping enum values to strings, colors, or other values, use `Record` types instead of switch statements or if/else chains. This ensures TypeScript catches missing or invalid cases at compile time.

```typescript
// ✅ Good - TypeScript ensures all modes are handled
const MODE_COLORS: Record<UIPermissionMode, string> = {
  plan: "var(--color-plan-mode)",
  edit: "var(--color-edit-mode)",
};

// ❌ Avoid - Can miss cases, typos won't be caught
switch (mode) {
  case "plan":
    return "blue";
  case "edits":
    return "green"; // Typo won't be caught!
}
```
- Leverage TypeScript's utility types for UI-specific data - Use `Omit`, `Pick`, and other utility types to create UI-specific versions of backend types. This prevents unnecessary re-renders and clearly separates concerns.

```typescript
// Backend type with all fields
export interface WorkspaceMetadata {
  id: string;
  projectName: string;
  permissionMode: UIPermissionMode;
  nextSequenceNumber: number; // Backend bookkeeping
}

// UI type excludes backend-only fields
export type WorkspaceMetadataUI = Omit<WorkspaceMetadata, "nextSequenceNumber">;
```
This pattern ensures:
- UI components don't re-render on backend-only changes
- Clear separation between UI and backend concerns
- Type safety - compiler catches if you try to access excluded fields
- Self-documenting code - types clearly show what data UI needs
- Prefer type-driven development - Let TypeScript guide your architecture. When types become complex or you need many runtime checks, it often indicates a design issue. Simplify by:
  - Creating focused types for specific contexts (UI vs backend)
  - Using discriminated unions for state variations
  - Leveraging the compiler to catch errors at build time
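For example, a fetch-style state union (illustrative names, not an existing cmux type) makes invalid state combinations unrepresentable and lets the compiler enforce that every variation is handled:

```typescript
type FetchState<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "loaded"; data: T }
  | { status: "failed"; error: string };

// Exhaustive handling: adding a new status breaks compilation until it is handled.
function describeState(state: FetchState<string[]>): string {
  switch (state.status) {
    case "idle":
      return "not started";
    case "loading":
      return "loading";
    case "loaded":
      return `${state.data.length} items`; // data only exists in this branch
    case "failed":
      return `failed: ${state.error}`;
  }
}
```

Compare this with separate `isLoading`/`error`/`data` fields, which permit contradictory states (e.g. `isLoading` true with `data` set) that then need runtime checks.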
- Use `using` for leakable system resources - Always use explicit resource management (`using` declarations) for resources that need cleanup such as child processes, file handles, database connections, etc. This ensures proper cleanup even when errors occur.

```typescript
// ✅ Good - Process is automatically cleaned up
using process = createDisposableProcess(spawn("command"));
const output = await readFromProcess(process);
// process.kill() called automatically when going out of scope

// ❌ Avoid - Process may leak if error occurs before cleanup
const process = spawn("command");
const output = await readFromProcess(process);
process.kill(); // May never be reached if error thrown
```
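One possible shape for such a wrapper (a sketch: `createDisposableProcess` is not defined in this guide, and `using`/`Symbol.dispose` require TypeScript ≥ 5.2 on a runtime with dispose support, such as Bun or Node ≥ 20.4):

```typescript
// Minimal shape we need from a process-like handle.
interface Killable {
  exitCode: number | null;
  kill(): boolean;
}

// Attach Symbol.dispose so a `using` declaration kills the process on scope exit.
function createDisposableProcess<T extends Killable>(proc: T): T & { [Symbol.dispose](): void } {
  return Object.assign(proc, {
    [Symbol.dispose]() {
      if (proc.exitCode === null) proc.kill(); // only kill if still running
    },
  });
}
```

`using` then calls `[Symbol.dispose]()` automatically when the binding goes out of scope, even if an error is thrown first.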
- This pattern maximizes type safety and prevents runtime errors from typos or missing cases
- Centralize magic constants - Define them in `src/constants/` and import everywhere. Never duplicate numbers/strings across backend, UI, tests, and schema descriptions.
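A minimal sketch of the pattern (the file and constant names here are illustrative, not existing cmux constants):

```typescript
// --- src/constants/history.ts (illustrative) ---
export const MAX_HISTORY_MESSAGES = 200;

// Consumers import the named constant instead of repeating the literal:
//   import { MAX_HISTORY_MESSAGES } from "../constants/history";
//   if (history.length > MAX_HISTORY_MESSAGES) { /* truncate */ }
```

When the limit changes, one edit updates backend, UI, tests, and schema descriptions together.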
Component State Management
For per-operation state tied to async workflows, parent components should own all localStorage operations. Child components should notify parents of user intent without manipulating storage directly, preventing bugs from stale or orphaned state across component lifecycles.
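A sketch of that ownership split (all names are illustrative; `StorageLike` stands in for `localStorage` so the logic stays testable):

```typescript
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// The parent owns every storage operation for per-workspace draft state.
function makeDraftOwner(storage: StorageLike) {
  const key = (workspaceId: string) => `draft:${workspaceId}`;
  return {
    // Child components call these with intent only - they never touch storage.
    onDraftChanged(workspaceId: string, draft: string): void {
      storage.setItem(key(workspaceId), draft);
    },
    onDraftSubmitted(workspaceId: string): void {
      storage.removeItem(key(workspaceId)); // no orphaned drafts after submit
    },
    readDraft(workspaceId: string): string | null {
      return storage.getItem(key(workspaceId));
    },
  };
}
```

Because only the parent writes to storage, child unmounts and remounts cannot leave stale or orphaned keys behind.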
Module Imports
- NEVER use dynamic imports - Always use static `import` statements at the top of files. Dynamic imports (`await import()`) are a code smell that indicates improper module structure.

```typescript
// ❌ BAD - Dynamic import hides circular dependency
const { getTokenizerForModel } = await import("../utils/tokenizer");

// ✅ GOOD - Static import at top of file
import { getTokenizerForModel } from "../utils/tokenizer";
```
- If you encounter circular dependencies - Restructure the code to eliminate them. Common solutions:
  - Extract shared types/interfaces into a separate file
  - Move shared utilities into a common module
  - Invert the dependency relationship
  - Use dependency injection instead of direct imports

Dynamic imports are NOT an acceptable workaround for circular dependencies.
Workspace IDs - NEVER Construct in Frontend
CRITICAL: Workspace IDs must NEVER be constructed in the frontend. This is a dangerous form of duplication that makes the codebase brittle.
- ❌ BAD - Constructing workspace ID from parts:

```typescript
const newWorkspaceId = `${projectName}-${newName}`; // WRONG!
```

- ✅ GOOD - Get workspace ID from backend:

```typescript
const result = await window.api.workspace.rename(workspaceId, newName);
if (result.success) {
  const newWorkspaceId = result.data.newWorkspaceId; // Backend provides it
}
```
Why this matters:
- Workspace ID format is a backend implementation detail
- If the backend changes ID format, frontend breaks silently
- Creates multiple sources of truth
- Leads to subtle bugs and inconsistencies
Always:
- Backend operations that change workspace IDs must return the new ID
- Frontend must use the returned ID, never construct it
- Backend is the single source of truth for workspace identity
IPC Type Boundaries
Backend types vs Frontend types - Keep them separate.
The IPC layer is the boundary between backend and frontend. Follow these rules to maintain clean separation:
Rules:
- IPC methods should return backend types - Use `WorkspaceMetadata`, not custom inline types

```typescript
// ✅ GOOD - Returns backend type
create(): Promise<{ success: true; metadata: WorkspaceMetadata } | { success: false; error: string }>

// ❌ BAD - Duplicates type definition inline
create(): Promise<{ success: true; workspace: { workspaceId: string; projectName: string; ... } }>
```
- Frontend types extend backend types with UI context - Frontend has information backend doesn't

```typescript
// Backend type (no projectPath - backend doesn't need it)
interface WorkspaceMetadata {
  id: string;
  projectName: string;
  workspacePath: string;
}

// Frontend type (adds projectPath and branch for UI)
interface WorkspaceSelection extends WorkspaceMetadata {
  projectPath: string; // Frontend initiated the call, so it has this
  branch: string; // Frontend tracks this for display
  workspaceId: string; // Alias for 'id' to match UI conventions
}
```
- Frontend constructs UI types from backend types + local context

```typescript
// ✅ GOOD - Frontend combines backend data with context it already has
const { recommendedTrunk } = await window.api.projects.listBranches(projectPath);
const trunkBranch = recommendedTrunk ?? "main";
const result = await window.api.workspace.create(projectPath, branchName, trunkBranch);
if (result.success) {
  setSelectedWorkspace({
    ...result.metadata,
    projectPath, // Frontend already had this
    branch: branchName, // Frontend already had this
    workspaceId: result.metadata.id,
  });
}

// ❌ BAD - Backend returns frontend-specific data
const { recommendedTrunk } = await window.api.projects.listBranches(projectPath);
const trunkBranch = recommendedTrunk ?? "main";
const result = await window.api.workspace.create(projectPath, branchName, trunkBranch);
if (result.success) {
  setSelectedWorkspace(result.workspace); // Backend shouldn't know about WorkspaceSelection
}
```
- Never duplicate type definitions in IPC layer - Always import and use existing types
Why this matters:
- Single source of truth - Backend types are defined once
- Clean boundaries - Backend doesn't know about UI concerns
- Type safety - Changes to backend types propagate to IPC automatically
- Prevents duplication - No need to keep inline types in sync with source types
Debugging
- `bun run debug ui-messages --workspace <workspace-name>` - Show UI messages for a workspace
- `bun run debug ui-messages --workspace <workspace-name> --drop <n>` - Show messages with last n dropped
- Workspace names can be found in `~/.cmux/sessions/`
UX Guidelines
- DO NOT add UX complexity without permission - Keep interfaces simple and predictable. Do not add features like auto-dismiss, animations, tooltips, or other UX enhancements unless explicitly requested by the user.

Example of adding unwanted complexity:

```typescript
// ❌ BAD - Added auto-dismiss without being asked
useEffect(() => {
  if (errorMessage) {
    const timer = setTimeout(() => {
      setErrorMessage(null);
    }, 10000);
    return () => clearTimeout(timer);
  }
}, [errorMessage]);
```
Instead, implement the simplest solution that meets the requirement.
DRY
When you notice you've made the same change many times, refactor to create a shared function or component, update all the duplicated code, and then continue with the original work. When repeating string literals (especially in error messages, UI text, or system instructions), extract them to named constants in a relevant constants/utils file - never define the same string literal multiple times across files.
Avoid unnecessary callback indirection: If a hook detects a condition and has access to all data needed to handle it, let it handle the action directly rather than passing callbacks up to parent components. Keep hooks self-contained when possible.
UX Considerations
- For every operation in the frontend, there should be a keyboard shortcut.
- Buttons, widgets, etc. that have a keybind should display a tooltip for it during hover.
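For example, a small pure helper (illustrative, not an existing cmux util) can produce the tooltip label from a platform-neutral keybind, matching the Cmd/Ctrl split used elsewhere in this guide:

```typescript
// Format a platform-neutral keybind ("Mod+Shift+P") for tooltip display.
function formatKeybind(keybind: string, platform: "darwin" | "linux" | "win32"): string {
  const mod = platform === "darwin" ? "Cmd" : "Ctrl";
  return keybind
    .split("+")
    .map((key) => (key === "Mod" ? mod : key))
    .join("+");
}
```

Centralizing the label formatting keeps tooltips consistent with the actual registered shortcuts.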
Logging
In the backend, use the `log` class from `log.ts` to log messages. Particularly spammy messages should go through `log.debug()`.
Solving Bugs
- When solving a new bug, consider whether there's a solution that simplifies the overall codebase while eliminating the bug.
- If you're fixing via simplification, a new test case is generally not necessary.
- If fixing through additional complexity, add a test case if a convenient harness already exists.
- Otherwise, if adding complexity, propose a new test harness to contain the new tests.
Mode: Exec
If a user requests `wait_pr_checks`, treat it as a directive to keep running that process and address failures continuously. Do not return to the user until the checks succeed or you encounter a blocker you cannot resolve alone. This mode signals that the user expects persistent execution without further prompting.

If static checks fail remotely, reproduce the error locally with `make static-check` before responding. If formatting issues are flagged, run `make fmt` to fix them before retrying CI.
If any test or check fails in CI, see if you can reproduce the failure locally before returning to wait_pr_checks. Try to run the minimal set of tests to reproduce the failure. This is in an effort to move fast. Take note of how long commands take to run and adjust your workflow to minimize time spent waiting.
Mode: Plan
In Plan Mode, attach a net LoC estimate to the recommended approach(es). This estimate should be focused on product code changes, not test code changes.