AI-Powered macOS Application Automation via Model Context Protocol
AppMCP bridges the gap between AI models and macOS applications, enabling intelligent automation through visual inspection, UI interpretation, and precise control of native applications.
- Smart Screenshots: Capture high-resolution app windows using ScreenCaptureKit
- OCR Text Recognition: Extract text from screenshots using Apple's Vision Framework
- UI Tree Analysis: Extract detailed accessibility hierarchies for precise element targeting
- Multi-App Discovery: Identify and monitor multiple running applications simultaneously
- Precise Interactions: Mouse clicks, keyboard input, and gesture automation
- Smart Waiting: Intelligent delays and condition-based waiting mechanisms
- Error Recovery: Robust fallback strategies for reliable automation
- Permission Management: Seamless TCC (Transparency, Consent, and Control) integration
- Secure Communication: JSON-RPC over STDIO with structured error handling
- Bundle ID Validation: Verified application targeting for enhanced security
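Condition-based waiting, as listed under Smart Waiting above, can be illustrated from the client side with a small polling helper. This is a minimal sketch and not AppMCP's own implementation: the name `wait_until` and its `timeout` and `interval` parameters are hypothetical and chosen only for illustration.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Example: a condition that becomes true on the third poll.
polls = iter([False, False, "ready"])
print(wait_until(lambda: next(polls), timeout=1.0, interval=0.01))  # prints: ready
```

In practice the condition would wrap a real check, such as re-reading the accessibility tree and looking for an expected element, instead of sleeping for a fixed delay.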
- macOS 15.0+ (Sequoia or later)
- Swift 6.1+
- Xcode 16.0+
# Clone the repository
git clone https://github.com/your-username/AppMCP.git
cd AppMCP
# Build the project
swift build -c release
# Run the daemon
./.build/release/appmcpd --stdio
AppMCP requires the following macOS permissions:
- Accessibility: System Settings → Privacy & Security → Accessibility
- Screen Recording: System Settings → Privacy & Security → Screen Recording
The application will guide you through the permission setup process.
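import json
import subprocess

# Start the AppMCP server with pipes attached so messages can be exchanged
process = subprocess.Popen(
    ['./appmcpd', '--stdio'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send_mcp_request(request):
    """Send one JSON-RPC request and return the parsed response.

    Assumes newline-delimited JSON on STDIO and that the next line read is the
    reply to this request. An MCP session normally begins with an `initialize`
    exchange, which is omitted here.
    """
    process.stdin.write(json.dumps(request) + "\n")
    process.stdin.flush()
    return json.loads(process.stdout.readline())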
# Take screenshot of Weather app
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "app://app_screenshot"}
}
# Send request and get response
response = send_mcp_request(request)
print(f"Screenshot captured: {response['result']['contents'][0]['text']}")
# Get accessibility tree
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/read",
    "params": {"uri": "app://app_accessibility_tree"}
}
tree = send_mcp_request(request)
print(f"UI Elements: {tree['result']['contents'][0]['text']}")
# Click on coordinates
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "mouse_click",
        "arguments": {"x": 300, "y": 150}
    }
}
# Send the click request
send_mcp_request(request)
# Type text
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "type_text",
        "arguments": {"text": "Tokyo"}
    }
}
# Send the typing request
send_mcp_request(request)
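The four requests above repeat the same JSON-RPC envelope, and none of them checks for an error reply. The sketch below shows one way to factor that out. The helper names (`make_request`, `read_resource`, `call_tool`, `unwrap_result`, `MCPError`) are hypothetical and not part of AppMCP; only the `jsonrpc`, `id`, `method`, `params`, `result`, and `error` members come from JSON-RPC 2.0.

```python
import itertools

# JSON-RPC ids only need to be unique within a session.
_request_ids = itertools.count(1)


class MCPError(Exception):
    """Raised when a JSON-RPC response carries an `error` member."""

    def __init__(self, code, message, data=None):
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.data = data


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request with a fresh integer id."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def read_resource(uri):
    """Build a `resources/read` request for the given resource URI."""
    return make_request("resources/read", {"uri": uri})


def call_tool(name, arguments):
    """Build a `tools/call` request for the named tool."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def unwrap_result(response):
    """Return the `result` member, or raise MCPError if the reply is an error."""
    if "error" in response:
        err = response["error"]
        raise MCPError(err.get("code"), err.get("message"), err.get("data"))
    return response["result"]
```

With these helpers, the click example reduces to `unwrap_result(send_mcp_request(call_tool("mouse_click", {"x": 300, "y": 150})))`, and a failed call surfaces as an exception instead of a `KeyError` on `result`.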
graph TB
A[AI Model] -->|JSON-RPC| B[MCP Server]
B --> C[Resources]
B --> D[Tools]
C --> E[App Screenshot]
C --> F[Accessibility Tree]
C --> G[Running Apps]
D --> H[Mouse Control]
D --> I[Keyboard Input]
D --> J[Wait Functions]
E --> K[macOS Apps]
F --> K
G --> K
H --> K
I --> K
J --> K
| Component | Description | Technology |
|---|---|---|
| AppSelector | Application discovery and targeting | AppKit, NSWorkspace |
| ScreenCaptureProvider | High-quality screenshot capture | ScreenCaptureKit (macOS 15+) |
| AppAXTreeProvider | Accessibility tree extraction | Accessibility API |
| MouseClickTool | Precise mouse automation | CGEvent, Quartz |
| KeyboardTool | Text input and shortcuts | CGEvent, Carbon |
| TCCManager | Permission management | TCC Framework |
AppMCP/
├── Sources/
│   └── AppMCP/
│       ├── AppMCP.swift          # Core protocols & types
│       ├── MCPServer.swift       # Main MCP server
│       ├── Resources/            # Data providers
│       ├── Tools/                # Automation tools
│       └── Permissions/          # Security management
├── Sources/appmcpd/
│   └── Command.swift             # CLI daemon
├── Tests/
│   └── AppMCPTests/              # Comprehensive test suite
├── Package.swift                 # Swift Package configuration
└── CLAUDE.md                     # Development guidelines
swift test
- Unit Tests: Core functionality validation
- Integration Tests: End-to-end workflow testing
- Performance Tests: Response time benchmarking
- Security Tests: Permission and validation checks
Test Suite 'AppMCPTests' passed at 2025-06-04 16:42:04.049
Executed 19 tests, with 0 failures (0 unexpected) in 0.015 seconds
All tests passing
AppMCP provides the following specialized tools for macOS automation:
- capture_ui_snapshot: Capture screenshot with UI element hierarchy
  - Optional text recognition via Vision Framework
  - Element filtering with queries
  - Returns base64 screenshot + structured UI data
- recognize_text_in_screenshot: OCR text extraction from app windows
  - Multi-language support (en-US, ja-JP, zh-Hans, etc.)
  - Fast vs accurate recognition modes
  - Confidence scores and bounding boxes
- click_element: Element-based clicking with multi-button support
- input_text: Text input with setValue/type methods
- drag_drop: Drag and drop between elements
- scroll_window: Scrolling at specific element locations
- list_running_applications: Get all running apps with metadata
- list_application_windows: List windows with bounds and visibility
Text recognition is provided by Apple's Vision Framework and is configured through the tool arguments, for example:
{
  "bundleID": "com.apple.TextEdit",
  "includeTextRecognition": true,
  "recognitionLanguages": ["en-US", "ja-JP"],
  "recognitionLevel": "accurate"
}
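The arguments above can be wrapped in a `tools/call` request in the same way as the earlier click and typing examples. The sketch below assumes they belong to the `capture_ui_snapshot` tool, which the README does not state explicitly, and that `"fast"` is the alternative to `"accurate"` for `recognitionLevel`. The function name `capture_ui_snapshot_request` is hypothetical.

```python
def capture_ui_snapshot_request(request_id, bundle_id,
                                languages=("en-US",), level="accurate"):
    """Build a tools/call request for a UI snapshot with text recognition.

    Assumption: these arguments target `capture_ui_snapshot`, and the
    recognition level is either "fast" or "accurate".
    """
    if level not in ("fast", "accurate"):
        raise ValueError(f"unsupported recognition level: {level!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "capture_ui_snapshot",
            "arguments": {
                "bundleID": bundle_id,
                "includeTextRecognition": True,
                "recognitionLanguages": list(languages),
                "recognitionLevel": level,
            },
        },
    }


request = capture_ui_snapshot_request(
    5, "com.apple.TextEdit", languages=("en-US", "ja-JP")
)
```

Validating the level on the client side is a design choice for the sketch; the server may accept other values or report its own error.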
Recognition Results:
- Full text extraction in reading order
- Individual text regions with confidence scores
- Bounding boxes in normalized coordinates
- Support for 50+ languages
- Handwritten text detection
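Normalized bounding boxes have to be scaled before they can be used as click targets. The sketch below assumes the boxes follow the Vision convention, `(x, y, width, height)` in the range 0 to 1 with the origin at the bottom-left of the image, and converts them to pixel rectangles with a top-left origin. Whether AppMCP returns boxes in exactly this form is an assumption, and both function names are hypothetical.

```python
def normalized_to_pixels(box, image_width, image_height):
    """Convert a normalized, bottom-left-origin box to a top-left-origin pixel rect.

    `box` is (x, y, width, height) with every value in [0, 1].
    Returns (left, top, width, height) in pixels.
    """
    x, y, w, h = box
    left = x * image_width
    # Flip the vertical axis: the top edge sits at y + h in bottom-left coordinates.
    top = (1.0 - y - h) * image_height
    return (left, top, w * image_width, h * image_height)


def center_of(rect):
    """Return the center point (x, y) of a (left, top, width, height) rect."""
    left, top, width, height = rect
    return (left + width / 2, top + height / 2)


rect = normalized_to_pixels((0.25, 0.5, 0.5, 0.25), 800, 600)
print(rect)             # (200.0, 150.0, 400.0, 150.0)
print(center_of(rect))  # (400.0, 225.0)
```

The result is relative to the captured image. Mapping it to screen coordinates for a click would also need the window origin and the display scale factor, which are outside this sketch.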
- Weather app automation PoC
- Basic screenshot & UI tree extraction
- Mouse & keyboard automation
- Permission management
- Vision Framework OCR text recognition
- Multi-app simultaneous control
- DevTools integration
- Enhanced error recovery
- Performance optimizations
- HTTP transport support
- Shortcuts.app integration
- Plugin SDK for extensions
- Real-time UI streaming
We welcome contributions! Please see our Contributing Guidelines for details.
# Install dependencies
swift package resolve
# Run tests
swift test
# Format code
swift-format -i -r Sources/ Tests/
# Build for development
swift build
AppMCP is released under the MIT License. See LICENSE for details.
- Model Context Protocol - For the excellent MCP Swift SDK
- Apple Developer Team - For the powerful macOS automation APIs
- Swift Community - For the robust Swift ecosystem
Built with ❤️ for the AI automation community