Computer use
Drive local desktop apps from an agent via accessibility trees, screenshots, and safe UI actions.
The orca computer CLI lets an agent inspect and control native desktop apps — list running apps, read accessibility trees, click controls, set values, type text, scroll, and take screenshots. Use it when a task needs to operate the OS or a third-party app rather than a terminal or the built-in browser.
First-time setup
Check the runtime and permissions:
orca status --json
orca computer permissions --json
orca computer capabilities --json
If the permissions command reports anything missing, grant Accessibility (and Screen Recording on macOS) to Orca Computer Use in System Settings, then re-run orca computer permissions --json to confirm.
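The three checks can be bundled into one preflight step. The sketch below is a minimal example under assumptions: the function name orca_preflight and the output file names are hypothetical, and it assumes each command exits nonzero when it cannot run, which this page does not state. A zero exit does not prove that permissions are granted, so the saved permissions.json still needs to be inspected.

```shell
# orca_preflight OUT_DIR
# Hypothetical helper: runs the three setup checks in order and saves each
# JSON report to OUT_DIR so an agent (or a person) can inspect it afterwards.
# Assumption: a command that cannot run exits nonzero; exit codes are not
# documented on this page.
orca_preflight() {
  mkdir -p "$1" || return 1

  orca status --json > "$1/status.json" ||
    { echo "orca status failed" >&2; return 1; }

  orca computer permissions --json > "$1/permissions.json" ||
    { echo "orca computer permissions failed" >&2; return 1; }

  orca computer capabilities --json > "$1/capabilities.json" ||
    { echo "orca computer capabilities failed" >&2; return 1; }
}
```

Calling orca_preflight ./orca-check before the first get-app-state gives a single place to stop if the runtime is not ready.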
Snapshot → act → snapshot
Every interaction follows the same loop: read the app's current state, act on a specific element, then re-read state to verify the result.
orca computer list-apps --json
orca computer get-app-state --app com.spotify.client --json
orca computer click --app com.spotify.client --element-index 42 --json
orca computer get-app-state --app com.spotify.client --json
Element indexes are scoped to the latest get-app-state result. Refresh state after navigation, focus changes, scrolling, or any app re-render before reusing an index.
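Because an index is only valid against the latest snapshot, it can help to pair every action with an immediate refresh. The following is a minimal sketch of that pairing; click_then_refresh is a hypothetical name, and the caller is assumed to have picked the index from its most recent get-app-state result.

```shell
# click_then_refresh APP INDEX STATE_FILE
# Hypothetical helper: act, then re-snapshot.
# INDEX must come from the caller's latest get-app-state result.
click_then_refresh() {
  # Act on the element chosen from the current snapshot.
  orca computer click --app "$1" --element-index "$2" --json || return 1

  # Re-read state. Any index used after this point must be taken
  # from STATE_FILE, not from the earlier snapshot.
  orca computer get-app-state --app "$1" --json > "$3"
}
```

After the call, the old snapshot should be treated as stale and discarded.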
Selecting an app
Prefer bundle IDs returned by list-apps:
orca computer get-app-state --app com.microsoft.edgemac --json
App names work when unambiguous (--app Spotify). Use --app pid:<number> only when neither the bundle ID nor the name identifies a single process, for example when two instances of the same app are running.
Available actions
orca computer click --app <app> --element-index <i> --json
orca computer set-value --app <app> --element-index <i> --value "text" --json
orca computer type-text --app <app> --text "text" --json
orca computer press-key --app <app> --key Return --json
orca computer hotkey --app <app> --key CmdOrCtrl+A --json
orca computer paste-text --app <app> --text "text" --json
orca computer scroll --app <app> --element-index <i> --direction down --json
orca computer drag --app <app> --from-x 100 --from-y 100 --to-x 300 --to-y 300 --json
orca computer perform-secondary-action --app <app> --element-index <i> --action <name> --json
Prefer semantic actions (click, set-value, perform-secondary-action) over raw type-text or press-key. Semantic actions target accessibility elements directly, so they still work after a focus change that would send keyboard input to the wrong control.
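One way to apply this preference is to try the semantic action first and fall back to keyboard input only if it fails. The sketch below rests on two assumptions that this page does not confirm: that set-value exits nonzero when an element does not accept a value, and that clicking a field gives it keyboard focus. The name fill_field is hypothetical.

```shell
# fill_field APP INDEX TEXT
# Hypothetical helper: semantic action first, keyboard input as a fallback.
fill_field() {
  # Preferred path: set the value on the accessibility element itself.
  if orca computer set-value --app "$1" --element-index "$2" --value "$3" --json; then
    return 0
  fi

  # Fallback (assumes the click focuses the field): click, then type.
  # type-text goes to whatever has focus, so this path is more fragile.
  orca computer click --app "$1" --element-index "$2" --json &&
    orca computer type-text --app "$1" --text "$3" --json
}
```

Either path changes the app, so the state should be refreshed before any index is reused.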
Sensitive input
Pass secrets through stdin so they don't land in shell history:
printf '%s' "$TEXT" | orca computer set-value \
  --app com.apple.Safari --element-index 7 --value-stdin --json
--text-stdin works the same way for type-text and paste-text.
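Secrets are often kept in a file that ends with a newline. The sketch below, using the hypothetical name type_secret_from_file, strips that trailing newline and pipes the rest to type-text through --text-stdin, so the secret never appears among the orca arguments. Whether a stray newline would be typed into the app as Return is an assumption; stripping it avoids having to find out.

```shell
# type_secret_from_file APP SECRET_FILE
# Hypothetical helper: sends the file's content to type-text over stdin.
type_secret_from_file() {
  [ -r "$2" ] || { echo "cannot read secret file: $2" >&2; return 1; }

  # Command substitution drops trailing newlines; printf '%s' adds none.
  # printf is a builtin in common shells, so the secret does not show up
  # in another process's argument list.
  printf '%s' "$(cat "$2")" |
    orca computer type-text --app "$1" --text-stdin --json
}
```

The same shape should work for paste-text, since it accepts --text-stdin as well.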
Screenshots
get-app-state returns an accessibility tree and, by default, a screenshot. With --json, the image bytes are written to disk and the path is returned in screenshot.path rather than embedded in the response. Pass --no-screenshot when pixels aren't needed (faster, smaller payload). Pass --restore-window to bring a hidden or minimized window into view before capture.
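Skipping the screenshot is most useful when polling for a change. The sketch below polls the tree with --no-screenshot until a given string shows up. It assumes that visible text appears verbatim somewhere in the JSON output, which this page does not specify, and it matches on raw text rather than parsing the JSON. The name wait_for_text and its retry count are hypothetical.

```shell
# wait_for_text APP TEXT [TRIES]
# Hypothetical helper: polls the accessibility tree (no pixels) once per
# second until TEXT appears in the output or TRIES attempts are used up.
wait_for_text() {
  tries=${3:-10}
  while [ "$tries" -gt 0 ]; do
    # Tree-only snapshot: faster and smaller than one with a screenshot.
    if orca computer get-app-state --app "$1" --no-screenshot --json |
      grep -qF -- "$2"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

Once the text appears, a regular get-app-state call (optionally with --restore-window) can capture the screenshot for the final check.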
Use it from an agent
The shipped computer-use skill packages the same command surface with safety guidance. Install it into the agent's skill directory:
npx skills add https://github.com/stablyai/orca --skill computer-use
See Skills registry & MCP for how skills are picked up.
Next steps
- Orca CLI overview — the rest of the CLI surface (worktrees, terminal, browser).
- Skills registry & MCP — distributing this CLI to agents.