Sandbox Capabilities

Advanced sandbox features — browser pairing, AI agent integration, human handoff, task completion reports, document processing, and storage options.

Sandboxes support optional capabilities that extend them beyond basic code execution. These are configured in the capabilities field when creating a sandbox.


Browser Pairing

A sandbox can be paired with a browser session, giving you both a code environment and a live browser in a single coordinated workspace.

Browser modes

| Mode | Description | Required Plan |
| --- | --- | --- |
| none | No browser (default) | All plans |
| paired-banata-browser | A dedicated Banata browser session is created alongside the sandbox | Pro and above |
| local-chromium | A local Chromium instance runs inside the sandbox container and is previewed as the real in-sandbox desktop/browser session | All plans |

Creating a sandbox with a paired browser

json
{
  "runtime": "bun",
  "size": "standard",
  "capabilities": {
    "browser": {
      "mode": "local-chromium",
      "persistentProfile": true,
      "streamPreview": true,
      "humanInLoop": true,
      "viewport": {
        "width": 1280,
        "height": 800
      }
    }
  }
}

| Field | Type | Default | Description | Required Plan |
| --- | --- | --- | --- | --- |
| mode | string | "none" | Browser mode (see table above) | - |
| recording | boolean | false | Record the browser session as MP4 | Pro and above |
| persistentProfile | boolean | false | Persist browser cookies/state across sessions | - |
| streamPreview | boolean | Auto | Enable live browser viewport streaming. In practice you normally leave this on for browser-enabled sandboxes. | - |
| humanInLoop | boolean | false | Enable human-in-the-loop control of the browser | - |
| viewport.width | number | Worker default | Requested browser viewport width in pixels | - |
| viewport.height | number | Worker default | Requested browser viewport height in pixels | - |
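
The browser fields above can be assembled into a request fragment with a small helper. This is a minimal sketch: `browser_capabilities_json` is a hypothetical function, not part of any Banata tool, and it only checks the mode against the three values listed in the modes table.

```bash
# Sketch: print a "capabilities" fragment for a browser-enabled sandbox.
# browser_capabilities_json is a hypothetical helper (an assumption, not a Banata command).
browser_capabilities_json() {
  mode=$1; width=$2; height=$3
  case "$mode" in
    none|paired-banata-browser|local-chromium) ;;   # modes listed in the table above
    *) echo "unknown browser mode: $mode" >&2; return 1 ;;
  esac
  printf '{"capabilities":{"browser":{"mode":"%s","viewport":{"width":%d,"height":%d}}}}\n' \
    "$mode" "$width" "$height"
}

browser_capabilities_json local-chromium 1280 800
```

Rejecting unknown modes locally gives a clearer error than waiting for the API to refuse the request.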

When the sandbox is ready, the session response includes pairedBrowser with connection details:

json
{
  "pairedBrowser": {
    "sessionId": "browser-session-id",
    "cdpUrl": "wss://...",
    "previewUrl": "https://...",
    "recording": true,
    "persistentProfile": true,
    "controlMode": "shared"
  }
}
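
To pull a single connection detail such as cdpUrl out of this response in a shell script, something like the following may be enough. It is a sketch under assumptions: `json_string_field` is a hypothetical helper, the sample URL is a placeholder, and the sed pattern only handles simple string values without escaped quotes. A real JSON parser is the safer choice for anything more complex.

```bash
# json_string_field KEY: read JSON on stdin, print the first string value found for KEY.
# Hypothetical helper; a sed shortcut that does not handle escaped quotes or non-string values.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response with placeholder values (not real connection details).
response='{"pairedBrowser":{"sessionId":"browser-session-id","cdpUrl":"wss://example.invalid/cdp","controlMode":"shared"}}'
printf '%s\n' "$response" | json_string_field cdpUrl
```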

Browser preview

When browser preview is enabled, fetch the preview metadata:

bash
curl "https://api.boxes.banata.dev/v1/sandboxes/browser-preview?id=SANDBOX_ID" \
  -H "Authorization: Bearer br_live_..."

The response includes WebSocket URLs for real-time browser viewport streaming. This lets you build UIs that show what the browser is doing. For native sandbox browsers (local-chromium), the preview is the actual in-sandbox desktop/browser session exposed through noVNC.

If you want to hand a human a simple browser-only page, the SDK also derives a standalone viewer URL on your app host. That viewer page only renders the browser and handoff state and can be opened outside the dashboard.

Controlling the browser

Switch control between AI automation and human interaction:

bash
curl -X POST "https://api.boxes.banata.dev/v1/sandboxes/browser-preview/control" \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "SANDBOX_ID",
    "mode": "human",
    "controller": "user@example.com",
    "leaseMs": 300000
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Sandbox session ID |
| mode | "ai" \| "human" \| "shared" | Yes | Who controls the browser |
| controller | string | No | Identifier for the current controller |
| leaseMs | number | No | How long the controller has exclusive access (ms) |
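
Because leaseMs is in milliseconds and mode accepts only three values, it can help to build the request body through a helper that checks both. The sketch below assumes a hypothetical `control_body` function and does no JSON escaping, so it is only suitable for simple identifiers.

```bash
# control_body ID MODE CONTROLLER LEASE_MS: print the JSON body for the control endpoint.
# Hypothetical helper; values are inserted as-is, without JSON escaping.
control_body() {
  id=$1; mode=$2; controller=$3; lease_ms=$4
  case "$mode" in
    ai|human|shared) ;;
    *) echo "mode must be ai, human, or shared" >&2; return 1 ;;
  esac
  printf '{"id":"%s","mode":"%s","controller":"%s","leaseMs":%d}\n' \
    "$id" "$mode" "$controller" "$lease_ms"
}

# A five-minute lease in milliseconds: 5 * 60 * 1000 = 300000.
control_body SANDBOX_ID human user@example.com $((5 * 60 * 1000))
```

The output can be passed to the curl call above with `-d "$(control_body ...)"`.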

OpenCode Integration

OpenCode is an AI agent that can generate and execute code inside the sandbox. It turns your sandbox into a prompt-driven development environment.

Enabling OpenCode

json
{
  "runtime": "bun",
  "size": "standard",
  "capabilities": {
    "opencode": {
      "enabled": true,
      "defaultAgent": "build",
      "allowPromptApi": true
    }
  }
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable OpenCode |
| defaultAgent | "build" \| "plan" | - | Default agent type for prompts |
| allowPromptApi | boolean | - | Allow sending prompts via the API. If false, OpenCode works internally but cannot receive prompts via /v1/sandboxes/opencode/prompt. |

Sending prompts

bash
curl -X POST "https://api.boxes.banata.dev/v1/sandboxes/opencode/prompt" \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "SANDBOX_ID",
    "prompt": "Create a TypeScript script that fetches the top 10 Hacker News stories and saves them as JSON."
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Sandbox session ID |
| prompt | string | Yes | The instruction for the AI agent |
| agent | "build" \| "plan" | No | Agent type (uses defaultAgent if omitted) |
| sessionId | string | No | OpenCode session ID for conversation continuity |
| noReply | boolean | No | If true, send the prompt without waiting for a response |

OpenCode reads the current workspace state, generates code, writes files, and runs commands to fulfill the prompt.

If you are building your own chat UI, treat OpenCode as a server-side agent runtime:

  • GET /v1/sandboxes/opencode/state for current session/runtime status
  • GET /v1/sandboxes/opencode/messages for message history
  • GET /v1/sandboxes/opencode/events for live SSE streaming
  • POST /v1/sandboxes/opencode/prompt or /prompt-async to submit work

This is the same model the hosted dashboard uses.
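
The events endpoint streams Server-Sent Events, which are line-oriented: each payload arrives on a line starting with `data:`. A minimal sketch of a filter for those lines follows. `sse_data` is a hypothetical helper, and the sample payload shape is invented for the demo, since the event schema is not described here.

```bash
# sse_data: read a text/event-stream on stdin and print each "data:" payload on its own line.
# Hypothetical helper; multi-line events are printed line by line rather than joined.
sse_data() {
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}                      # tolerate CRLF line endings
    case "$line" in
      "data: "*) printf '%s\n' "${line#data: }" ;;
      data:*)    printf '%s\n' "${line#data:}" ;;
    esac
  done
}

# Local demo with a made-up payload (the real event schema is an unknown here).
printf 'event: message\ndata: {"type":"demo"}\n\n' | sse_data
```

Against the live API this would sit behind `curl -N` (no output buffering) with the same Authorization header as the other calls. Whether the events endpoint takes the sandbox ID as an `id` query parameter, like the other GET endpoints shown here, is an assumption to verify.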

Checking OpenCode status

bash
curl "https://api.boxes.banata.dev/v1/sandboxes/runtime?id=SANDBOX_ID" \
  -H "Authorization: Bearer br_live_..."

The response includes the current state of OpenCode (idle, working, complete) along with any outputs.

Error responses

| Status | Body | Cause |
| --- | --- | --- |
| 403 | OpenCode prompt API is disabled for this sandbox | allowPromptApi is false |
| 409 | OpenCode is not enabled | capabilities.opencode.enabled is not true |
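
A script that submits prompts can branch on these two status codes. The sketch below assumes a hypothetical `explain_prompt_status` helper; the messages are illustrative wording, not text returned by the API.

```bash
# explain_prompt_status HTTP_STATUS: map a prompt-endpoint status code to a short hint.
# Hypothetical helper; only 403 and 409 come from the table above.
explain_prompt_status() {
  case "$1" in
    2??) echo "ok" ;;
    403) echo "prompt API disabled: set capabilities.opencode.allowPromptApi to true" ;;
    409) echo "OpenCode not enabled: set capabilities.opencode.enabled to true" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

explain_prompt_status 409
```

With curl, the status code alone can be captured by adding `-s -o /dev/null -w '%{http_code}'` to the prompt request and passing the result to the helper.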

Human Handoff

Automated workflows sometimes encounter situations that require human judgment — MFA prompts, ambiguous UI elements, or CAPTCHA challenges. The human handoff system lets the sandbox request human assistance and pause automation until the human completes the task.

Requesting handoff

bash
curl -X POST "https://api.boxes.banata.dev/v1/sandboxes/handoff/request" \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "SANDBOX_ID",
    "reason": "mfa",
    "message": "Please enter the 6-digit code from your authenticator app."
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Sandbox session ID |
| reason | string | Yes | One of: mfa, captcha, approval, login_failed, ambiguous_ui, file_download, custom |
| message | string | Yes | Human-readable description of what's needed |
| requestedBy | string | No | Who initiated the request: one of opencode, worker, sdk, dashboard. Defaults to sdk. |
| resumePrompt | string | No | Prompt to automatically execute when the human returns control |
| expiresInMs | number | No | How long until the handoff request expires (ms) |

Accepting handoff

A human sees the request through your application's UI and takes control:

bash
curl -X POST "https://api.boxes.banata.dev/v1/sandboxes/handoff/accept" \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "SANDBOX_ID",
    "controller": "user@example.com",
    "leaseMs": 300000
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Sandbox session ID |
| controller | string | Yes | Identifier for the human taking control |
| leaseMs | number | No | How long the human has control (ms) |

Completing handoff

After the human finishes, they return control to automation:

bash
curl -X POST "https://api.boxes.banata.dev/v1/sandboxes/handoff/complete" \
  -H "Authorization: Bearer br_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "SANDBOX_ID",
    "note": "MFA code entered successfully",
    "returnControlTo": "ai",
    "runResumePrompt": true
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Sandbox session ID |
| controller | string | No | Controller identifier |
| note | string | No | Completion note |
| returnControlTo | "ai" \| "shared" | No | Who gets control after completion |
| runResumePrompt | boolean | No | Execute the resumePrompt from the original request |

Checking handoff state

bash
curl "https://api.boxes.banata.dev/v1/sandboxes/handoff?id=SANDBOX_ID" \
  -H "Authorization: Bearer br_live_..."

Returns the current handoff state, for example:

json
{
  "humanHandoff": {
    "requestId": "...",
    "status": "pending",
    "reason": "mfa",
    "message": "Please enter the 6-digit code from your authenticator app.",
    "requestedBy": "sdk",
    "requestedAt": 1772078705353,
    "expiresAt": 1772079005353
  }
}
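
The requestedAt and expiresAt values are millisecond timestamps, so the time left on a request is a plain subtraction. A small sketch, assuming a hypothetical `ms_remaining` helper:

```bash
# ms_remaining EXPIRES_AT NOW: milliseconds left before a handoff request expires (0 if already past).
# Hypothetical helper; both arguments are epoch timestamps in milliseconds.
ms_remaining() {
  left=$(( $1 - $2 ))
  [ "$left" -gt 0 ] || left=0
  echo "$left"
}

# expiresAt minus requestedAt from the example above: a five-minute window.
ms_remaining 1772079005353 1772078705353   # prints 300000
```

For a live check, the current time in milliseconds can be approximated with `$(( $(date +%s) * 1000 ))` on systems whose `date` supports `+%s`.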

Handoff statuses: pending → accepted → completed (or cancelled / expired).
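
Automation that requested a handoff usually needs to wait until the request reaches one of the final statuses. The following is a polling sketch under stated assumptions: `wait_for_handoff` and the `BANATA_API_KEY` environment variable are hypothetical names, the status is extracted with a simple sed pattern rather than a JSON parser, and treating completed, cancelled, and expired as terminal is an interpretation of the status line above.

```bash
# wait_for_handoff SANDBOX_ID [INTERVAL_SECONDS]
# Polls the handoff state until it is completed, cancelled, or expired, then prints that status.
# Hypothetical helper; BANATA_API_KEY is an assumed environment variable holding the API key.
wait_for_handoff() {
  id=$1; interval=${2:-2}
  while :; do
    status=$(curl -s "https://api.boxes.banata.dev/v1/sandboxes/handoff?id=$id" \
      -H "Authorization: Bearer $BANATA_API_KEY" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
    case "$status" in
      completed|cancelled|expired) echo "$status"; return 0 ;;
      pending|accepted) sleep "$interval" ;;          # still in progress, poll again
      *) echo "unexpected handoff status: '$status'" >&2; return 1 ;;
    esac
  done
}
```

Calling `wait_for_handoff SANDBOX_ID 5` would poll every five seconds. A production version would likely add an overall timeout and handle HTTP errors explicitly.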


Task Completion Reports

When the AI agent finishes a task, it can submit a structured completion report. The report contains a detailed step-by-step recipe — URLs visited, selectors used, values entered, wait times, and error workarounds — that can be saved as a browser skill and replayed by a cheaper model later.

How it works

OpenCode calls the banata-complete-task CLI tool (automatically installed in every sandbox with OpenCode enabled):

bash
banata-complete-task \
  --result "Successfully scraped pricing data from 3 competitor sites" \
  --report "1. Navigate to https://example.com/pricing\n2. Wait for .pricing-table selector\n3. Extract text from .plan-name and .plan-price elements\n4. Click 'Show annual pricing' toggle (.billing-toggle)\n5. Re-extract prices for annual plans\n6. Save results to /workspace/output/pricing.json" \
  --url "https://example.com/pricing" \
  --steps 6 \
  --duration 45000

The report is automatically uploaded to R2 as a JSON artifact and synced to Convex.

Retrieving the last task report

bash
curl "https://api.boxes.banata.dev/v1/sandboxes/runtime?id=SANDBOX_ID" \
  -H "Authorization: Bearer br_live_..."

The response includes a lastTaskReport field:

json
{
  "lastTaskReport": {
    "result": "Successfully scraped pricing data from 3 competitor sites",
    "report": "1. Navigate to https://example.com/pricing\n2. Wait for .pricing-table selector\n3. ...",
    "url": "https://example.com/pricing",
    "stepsCompleted": 6,
    "durationMs": 45000,
    "completedAt": 1772078705353
  }
}

Report fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| result | string | Yes | High-level summary of what was accomplished |
| report | string | Yes | Detailed step-by-step recipe: URLs visited, selectors used, values entered, wait times, error workarounds. This is the procedural recipe that gets auto-saved as a browser skill for future replay. |
| url | string | No | Final URL the browser was on when the task completed |
| stepsCompleted | number | No | Total number of steps executed |
| actions | string[] | No | List of individual actions taken |
| durationMs | number | No | Total duration in milliseconds |
| usage | object | No | LLM token usage metadata |

Replaying reports

Task reports are designed as replay recipes for cheaper models. The report field contains enough detail — selectors, values, timing, and workarounds — that a smaller model can follow the same procedure without needing to reason from scratch. Reports are persisted as R2 artifacts under {artifactPrefix}/task-reports/.
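
A replay loop needs the report split back into individual steps. The sketch below assumes the numbered, newline-separated layout used in the examples in this section; `report_steps` is a hypothetical helper, and the report format itself is free text, so real reports may need looser parsing.

```bash
# report_steps REPORT: print one numbered recipe step per line.
# Hypothetical helper. Accepts real newlines or the literal two-character sequence \n;
# printf %b also expands any other backslash escapes present in the text.
report_steps() {
  printf '%b\n' "$1" | grep -E '^[0-9]+\. '
}

report='1. Navigate to https://example.com/pricing\n2. Wait for .pricing-table selector\n3. Extract text from .plan-name and .plan-price elements'
report_steps "$report" | while IFS= read -r step; do
  echo "replay: $step"     # a real replay would hand each step to the cheaper model here
done
```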


Document Processing

For sandboxes that need to work with office documents, enable the documents capability:

json
{
  "runtime": "base",
  "size": "standard",
  "capabilities": {
    "documents": {
      "libreofficeHeadless": true
    }
  }
}

This pre-installs LibreOffice in headless mode, enabling document conversion, rendering, and manipulation:

bash
# Convert DOCX to PDF
libreoffice --headless --convert-to pdf document.docx
 
# Convert spreadsheet to CSV
libreoffice --headless --convert-to csv spreadsheet.xlsx
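
For more than one file, the same command can be wrapped in a loop that writes results to a separate directory using LibreOffice's `--outdir` option. A minimal sketch, where `convert_dir_to_pdf` is a hypothetical helper and the directory names are up to you:

```bash
# convert_dir_to_pdf SRC_DIR OUT_DIR: convert every .docx in SRC_DIR to a PDF in OUT_DIR.
# Hypothetical helper built on the libreoffice command shown above.
convert_dir_to_pdf() {
  src=$1; out=$2
  mkdir -p "$out"
  for f in "$src"/*.docx; do
    [ -e "$f" ] || continue          # the glob matched nothing
    libreoffice --headless --convert-to pdf --outdir "$out" "$f"
  done
}
```

Converting files one at a time keeps a single corrupt document from aborting the whole batch.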

Storage Options

Configure workspace persistence and artifact organization:

json
{
  "capabilities": {
    "storage": {
      "workspace": "checkpointed",
      "artifactPrefix": "my-project/batch-1/"
    }
  }
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| workspace | "ephemeral" \| "checkpointed" | "ephemeral" | Whether the workspace can be checkpointed |
| artifactPrefix | string | - | Custom prefix for artifact storage paths |
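
Task reports are stored under {artifactPrefix}/task-reports/, so a script that lists or downloads them needs to derive that path from the configured prefix. The sketch below assumes the prefix and the suffix are joined with exactly one slash; how the service treats a trailing slash in artifactPrefix is not stated here, and `task_report_prefix` is a hypothetical helper.

```bash
# task_report_prefix ARTIFACT_PREFIX: print the assumed storage prefix for task reports.
# Hypothetical helper; the single-slash join is an assumption about the service's behavior.
task_report_prefix() {
  prefix=${1%/}                       # drop one trailing slash, if present
  printf '%s/task-reports/\n' "$prefix"
}

task_report_prefix "my-project/batch-1/"   # prints my-project/batch-1/task-reports/
```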

Combining Capabilities

Capabilities can be combined freely. A common pattern is a sandbox with browser pairing, OpenCode, and human handoff:

json
{
  "runtime": "bun",
  "size": "standard",
  "capabilities": {
    "browser": {
      "mode": "local-chromium",
      "recording": true,
      "streamPreview": true,
      "humanInLoop": true,
      "viewport": {
        "width": 1280,
        "height": 800
      }
    },
    "opencode": {
      "enabled": true,
      "defaultAgent": "build",
      "allowPromptApi": true
    },
    "storage": {
      "workspace": "checkpointed"
    }
  }
}

This creates an AI-driven workspace that can browse the web, write and execute code, request human help when automation gets stuck, checkpoint the workspace for later, and produce detailed task reports that cheaper models can replay.

All sandbox examples here use the single public sandbox size: standard.

