MCP Tools Reference
Haptix exposes a stable MCP tool surface through the protocol. Every tool is called by your AI agent — you don't call them directly. Everything available via MCP is also available through the haptix CLI.
Tool index
| Tool | Category | Description |
|---|---|---|
start_session |
Session | Begin an agent session (auto-discovers devices) |
end_session |
Session | End the active session |
device_status |
Device | Check connection state |
device_info |
Device | Get device model, OS, screen size |
list_devices |
Device | List all visible devices |
select_device |
Device | Bind session to a specific device |
screenshot |
Screenshot | Capture the screen |
screenshot_stream_start |
Screenshot | Start continuous streaming (USB only) |
screenshot_stream_stop |
Screenshot | Stop streaming |
accessibility_tree |
Accessibility | Get UI element hierarchy |
tap |
Gesture | Single tap |
double_tap |
Gesture | Double tap |
long_press |
Gesture | Long press (context menus) |
swipe |
Gesture | Swipe between two points |
drag |
Gesture | Drag between two points |
draw_path |
Gesture | One continuous stroke through many points |
scroll |
Gesture | Scroll page content |
pinch |
Gesture | Pinch gesture |
rotate |
Gesture | Rotation gesture |
type_text |
Text Input | Type into focused field |
clear_text |
Text Input | Clear focused field |
get_console_output |
Console | Read app console output |
set_activity |
Activity | Show/hide activity halo |
Session
start_session
Begin an agent session. This is the entry point — call it before any other tool.
start_session auto-discovers connected USB devices and auto-selects if only one is found. It returns the device name, model, screen size, and connection state. You don't need to call list_devices → select_device → device_info separately — start_session handles discovery for you.
| Parameter | Type | Required | Description |
|---|---|---|---|
name |
string | No | Session name or purpose description |
Returns a session code, device info (if a device is connected), and confirmation.
end_session
End the active session. Removes the activity halo, stops recording, and returns a summary with duration and tool call count. Always call when done.
No parameters.
Device
device_status
Check connection state. Returns connection and session status.
No parameters.
device_info
Get device details: model, screen size, OS version, orientation, battery level, app bundle ID, debugger state, and UI automation status.
No parameters.
Returns JSON with: model, osVersion, screenWidth, screenHeight, scaleFactor, orientation, batteryLevel, isCharging, bundleIdentifier, appName, debuggerAttached, uiAutomationEnabled.
list_devices
List all connected USB devices. Returns device name, app name, bundle ID, model, and connection status for each.
No parameters.
Most agents don't need this —
start_sessionauto-discovers and auto-selects devices. Uselist_devicesonly when you have multiple devices and need to choose.
select_device
Bind this session to a specific device. All subsequent commands target this device. If not called, commands target the first available device.
| Parameter | Type | Required | Description |
|---|---|---|---|
device |
string | Yes | Device identifier — matches service name, bundle ID, or app name (case-insensitive, partial match) |
Screenshot & Observation
screenshot
Capture the screen. The primary tool for visual discovery.
Screenshots are Mac-side compositor captures from CoreMediaIO. Haptix waits for a fresh frame from the selected device and fails clearly if it cannot get one. There is no device-side screenshot fallback, and Haptix refuses to guess across attached phones if capture-source identity drifts.
| Parameter | Type | Required | Description |
|---|---|---|---|
mode |
string | No | full (with status bar), app (default), or region |
format |
string | No | png (default) or jpeg |
annotated |
boolean | No | Overlay bounding boxes with labels, identifiers, and coordinates |
filter |
string | No | Filter annotations: interactive, adjustable, button, text, input, image |
highlight |
string | No | Spotlight one element by accessibility identifier (implies annotated) |
Annotated screenshots show green boxes for interactive elements and blue boxes for static elements.
screenshot_stream_start
Start continuous screenshot streaming at a specified frame rate. Requires a USB connection.
| Parameter | Type | Required | Description |
|---|---|---|---|
fps |
integer | No | Frames per second, 1–10. Default: 2 |
screenshot_stream_stop
Stop an active screenshot stream.
No parameters.
accessibility_tree
Get UI elements with labels, identifiers, values, traits, and coordinates.
| Parameter | Type | Required | Description |
|---|---|---|---|
mode |
string | No | compact (flat list, ~2K tokens, default) or full (nested hierarchy, ~20K tokens) |
Compact mode returns a flat list of meaningful elements. Full mode returns the complete nested view hierarchy. Prefer compact — it's far more token-efficient.
Gestures
All gesture tools return a multi-content response: a fresh follow-up screenshot of the resulting UI state, confirmation text, and any console output captured during the action. No need to call screenshot after every gesture unless you want a different format or a higher-resolution capture.
tap
Single tap. Prefer identifier or label over coordinates — they're more reliable and survive layout changes.
| Parameter | Type | Required | Description |
|---|---|---|---|
identifier |
string | No | Accessibility identifier (preferred) |
label |
string | No | Accessibility label (fallback) |
x |
number | No | X coordinate in points (last resort) |
y |
number | No | Y coordinate in points (last resort) |
Includes hit feedback showing which element was tapped: [Hit] "Submit" [button] identifier: "submitButton".
double_tap
Double tap. Same parameters as tap.
long_press
Long press. Triggers context menus. Specify duration for custom hold time.
| Parameter | Type | Required | Description |
|---|---|---|---|
identifier |
string | No | Accessibility identifier |
label |
string | No | Accessibility label |
x |
number | No | X coordinate |
y |
number | No | Y coordinate |
duration |
number | No | Hold duration in seconds. Default: 1.0 |
swipe
Swipe between two points. Use on background areas to scroll pages. Use on picker wheels to change values. Use on sliders to move the thumb.
| Parameter | Type | Required | Description |
|---|---|---|---|
startX |
number | Yes | Start X coordinate |
startY |
number | Yes | Start Y coordinate |
endX |
number | Yes | End X coordinate |
endY |
number | Yes | End Y coordinate |
duration |
number | No | Duration in seconds. Default: 0.3 |
drag
Drag between two points (slower than swipe, with press-and-hold). Use for reordering lists, moving items, or dragging handles. For sliders, prefer swipe.
| Parameter | Type | Required | Description |
|---|---|---|---|
startX |
number | Yes | Start X coordinate |
startY |
number | Yes | Start Y coordinate |
endX |
number | Yes | End X coordinate |
endY |
number | Yes | End Y coordinate |
holdDuration |
number | No | Dwell time at start before moving. Default: 0.15s |
duration |
number | No | Movement duration. Default: 1.0s |
draw_path
Draw one continuous one-finger stroke through an ordered list of waypoints. Use it for Freeform drawing, signatures, circles, lasso paths, and organic drags where breaking the motion into separate straight-line calls would incorrectly lift the finger.
| Parameter | Type | Required | Description |
|---|---|---|---|
points |
array | Yes | Ordered screen-point waypoints as [{x, y}, ...]. The first point is touch-down and the last point is lift. |
duration |
number | No | Total stroke duration in seconds. Default: 1.0s |
holdDuration |
number | No | Pause after touch-down before moving. Default: 0 |
closed |
boolean | No | Append the first point at the end if needed. Useful for circles. Default: false |
Separate draw_path calls imply lifting the finger on purpose.
scroll
Scroll page content in a direction. Simpler than swipe for basic navigation — no coordinates needed.
| Parameter | Type | Required | Description |
|---|---|---|---|
direction |
string | Yes | up, down, left, or right |
amount |
string | No | small (25%), medium (50%), large (75%), full_page (100%). Default: medium |
identifier |
string | No | Target a specific scrollable element by identifier |
pinch
Pinch gesture at a center point. scale less than 1 pinches in (zoom out), greater than 1 pinches out (zoom in).
| Parameter | Type | Required | Description |
|---|---|---|---|
centerX |
number | Yes | Center X coordinate |
centerY |
number | Yes | Center Y coordinate |
scale |
number | Yes | Scale factor |
duration |
number | No | Duration. Default: 0.5s |
Known issue: Pinch is currently broken. Gesture recognizers reject synthetic multi-touch events. See Compatibility.
rotate
Two-finger rotation at a center point.
| Parameter | Type | Required | Description |
|---|---|---|---|
centerX |
number | Yes | Center X coordinate |
centerY |
number | Yes | Center Y coordinate |
angle |
number | Yes | Rotation angle in degrees. Positive is clockwise |
duration |
number | No | Duration. Default: 0.5s |
Known issue: Rotate is currently broken. Same issue as pinch. See Compatibility.
Text Input
type_text
Type text into the focused input field. Tap the field first to focus it.
| Parameter | Type | Required | Description |
|---|---|---|---|
text |
string | Yes | The text to type |
clear_text
Clear all text in the focused input field.
No parameters.
Console
get_console_output
Get app console output including print(), NSLog(), os_log(), errors, and warnings. Console output is also auto-included in every gesture response, so you rarely need to call this directly.
| Parameter | Type | Required | Description |
|---|---|---|---|
since |
string | No | ISO 8601 timestamp — only return entries after this time |
limit |
integer | No | Maximum entries to return. Default: 100 |
contains |
string | No | Substring filter (case-insensitive) |
source |
string | No | Filter by source: stdout, stderr, or os_log |
level |
string | No | Filter by level: debug, info, notice, error, fault |
Activity
set_activity
Show or hide the activity halo on the device. The halo auto-activates on tool use and auto-deactivates after ~10 seconds of inactivity, so you typically only need to explicitly dismiss it.
| Parameter | Type | Required | Description |
|---|---|---|---|
active |
boolean | Yes | true to show, false to hide |
During an active session (start_session → end_session), the halo stays on and cannot be dismissed.
Key behaviors
- Auto-screenshot: Every gesture tool returns a screenshot of the resulting UI state. No need to call
screenshotafter each action. - Fresh-or-fail screenshots: Haptix waits for a fresh compositor frame from the selected device. If capture-source identity drifts across attached phones, it returns a clear error instead of another device's pixels.
- Hit feedback: Tap tools include which element was hit:
[Hit] "Submit" [button] identifier: "submitButton" -- base layer. - Auto-console: Console output captured during the gesture is included in every response.
- Coordinates: All coordinates are in screen points (not pixels), origin
(0, 0)at top-left. Read them from annotated screenshots — don't estimate from image pixels. - Settle delay: After each gesture, the server waits 500ms for the UI to settle before capturing the follow-up screenshot.
- Auto-discovery:
start_sessiondiscovers USB devices automatically. No need forlist_devices→select_deviceunless you have multiple devices.