AppMCP


1amageek/app-mcp

🤖 AppMCP

AI-Powered macOS Application Automation via Model Context Protocol


AppMCP bridges the gap between AI models and macOS applications, enabling intelligent automation through visual inspection, UI interpretation, and precise control of native applications.


✨ Features

🎯 Visual Intelligence

  • Smart Screenshots: Capture high-resolution app windows using ScreenCaptureKit
  • OCR Text Recognition: Extract text from screenshots using Apple's Vision Framework
  • UI Tree Analysis: Extract detailed accessibility hierarchies for precise element targeting
  • Multi-App Discovery: Identify and monitor multiple running applications simultaneously

🛠 Automation Toolkit

  • Precise Interactions: Mouse clicks, keyboard input, and gesture automation
  • Smart Waiting: Intelligent delays and condition-based waiting mechanisms
  • Error Recovery: Robust fallback strategies for reliable automation
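Condition-based waiting usually comes down to polling: ask "is it ready yet?" until the answer is yes or a deadline passes. The sketch below shows one way a client could do this in Python. `wait_until`, its default timeout and its polling interval are illustrative assumptions, not AppMCP's own waiting API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> T:
    """Hypothetical client-side helper (not part of AppMCP).

    Calls `condition` repeatedly until it returns a truthy value, which is
    returned. Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        # Back off briefly so the app under automation is not hammered
        time.sleep(interval)
```

For example, a client could poll the accessibility tree until a search field appears, and only then send keyboard input.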

🔒 Privacy & Security

  • Permission Management: Seamless TCC (Transparency, Consent, and Control) integration
  • Secure Communication: JSON-RPC over STDIO with structured error handling
  • Bundle ID Validation: Verified application targeting for enhanced security
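With JSON-RPC over STDIO, MCP frames each message as a single line of JSON. A failed call returns an `error` object instead of a `result`. Below is a minimal client-side sketch of that framing and error check. It assumes responses arrive in request order, and the helper and exception names are hypothetical.

```python
import json
from typing import Any, Dict


class MCPError(Exception):
    """Hypothetical exception for JSON-RPC responses that carry an `error`."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


def encode_message(message: Dict[str, Any]) -> str:
    # One JSON object per line: json.dumps escapes newlines inside strings,
    # so the only raw newline is the delimiter appended here.
    return json.dumps(message, separators=(",", ":")) + "\n"


def decode_response(line: str, expected_id: int) -> Any:
    """Parse one response line, returning `result` or raising MCPError."""
    response = json.loads(line)
    if response.get("id") != expected_id:
        raise ValueError(f"unexpected response id: {response.get('id')!r}")
    if "error" in response:
        error = response["error"]
        raise MCPError(error.get("code", 0), error.get("message", ""))
    return response["result"]
```

A complete client would also need to skip server-initiated notifications, which carry no `id`, rather than treat them as mismatched responses.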

🚀 Quick Start

Prerequisites

  • macOS 15.0+ (Sequoia or later)
  • Swift 6.1+
  • Xcode 16.0+

Installation

# Clone the repository
git clone https://github.com/1amageek/app-mcp.git
cd app-mcp

# Build the project
swift build -c release

# Run the daemon
./.build/release/appmcpd --stdio

Permissions Setup

AppMCP requires the following macOS permissions:

  1. 🔓 Accessibility: System Settings → Privacy & Security → Accessibility
  2. 📺 Screen Recording: System Settings → Privacy & Security → Screen Recording

The application will guide you through the permission setup process.


🎮 Usage Examples

Weather App Automation

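import json
import subprocess

# Start the AppMCP server, piping STDIO so we can exchange JSON-RPC messages
process = subprocess.Popen(['./.build/release/appmcpd', '--stdio'],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def send_mcp_request(request):
    """Send one newline-delimited JSON-RPC message and read one reply line."""
    process.stdin.write(json.dumps(request) + "\n")
    process.stdin.flush()
    return json.loads(process.stdout.readline())

# MCP requires an initialize handshake before any other request
send_mcp_request({"jsonrpc": "2.0", "id": 0, "method": "initialize",
                  "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                             "clientInfo": {"name": "example", "version": "0.1"}}})
process.stdin.write(json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n")
process.stdin.flush()

# Take screenshot of Weather app
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "app://app_screenshot"}
}

# Send request and get response
response = send_mcp_request(request)
print(f"📸 Screenshot captured: {response['result']['contents'][0]['text']}")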

UI Element Discovery

# Get accessibility tree
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/read",
    "params": {"uri": "app://app_accessibility_tree"}
}

tree = send_mcp_request(request)
print(f"🌳 UI Elements: {tree['result']['contents'][0]['text']}")
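The tree arrives as text, so a client has to parse and search it. The sketch below runs a depth-first search for elements by role and title. It assumes the text is JSON in which each node has `role`, `title` and `children` keys. That shape is hypothetical, so inspect an actual `app://app_accessibility_tree` response before relying on it.

```python
from typing import Any, Dict, Iterator, List

# Assumes a hypothetical node shape: {"role": ..., "title": ..., "children": [...]}
Node = Dict[str, Any]


def iter_nodes(node: Node) -> Iterator[Node]:
    """Depth-first, pre-order walk over a nested accessibility tree."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children in reverse so they are visited in document order
        stack.extend(reversed(current.get("children", [])))


def find_elements(tree: Node, role: str, title_contains: str = "") -> List[Node]:
    """Return nodes with the given role whose title contains the text (case-insensitive)."""
    needle = title_contains.lower()
    return [
        n for n in iter_nodes(tree)
        if n.get("role") == role and needle in (n.get("title") or "").lower()
    ]
```

If the payload has this shape, `find_elements(json.loads(tree['result']['contents'][0]['text']), "AXButton", "Search")` would list the matching buttons.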

Automated Interactions

# Click at screen coordinates (300, 150)
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "mouse_click",
        "arguments": {"x": 300, "y": 150}
    }
}
send_mcp_request(request)

# Type text
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "type_text",
        "arguments": {"text": "Tokyo"}
    }
}
send_mcp_request(request)
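Numbering `id` values by hand, as these examples do, becomes error-prone as a session grows. The hypothetical helper below issues increasing ids and builds `tools/call` requests. The class and method names are illustrative only.

```python
import itertools
from typing import Any, Dict, Optional


class RequestFactory:
    """Hypothetical helper that numbers JSON-RPC 2.0 requests automatically."""

    def __init__(self, start: int = 1) -> None:
        self._ids = itertools.count(start)

    def make(self, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        request: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(self._ids), "method": method}
        if params is not None:
            request["params"] = params
        return request

    def tool_call(self, name: str, /, **arguments: Any) -> Dict[str, Any]:
        # MCP tools are invoked via "tools/call" with a tool name and an
        # arguments object. `name` is positional-only so a tool argument
        # may itself be called "name".
        return self.make("tools/call", {"name": name, "arguments": arguments})
```

`RequestFactory().tool_call("mouse_click", x=300, y=150)` produces a message of the same shape as the click request above, with the id assigned automatically.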

๐Ÿ— Architecture

graph TB
    A[🤖 AI Model] -->|JSON-RPC| B[📡 MCP Server]
    B --> C[🔍 Resources]
    B --> D[🛠 Tools]
    
    C --> E[📱 App Screenshot]
    C --> F[🌳 Accessibility Tree]
    C --> G[📋 Running Apps]
    
    D --> H[🖱 Mouse Control]
    D --> I[⌨️ Keyboard Input]
    D --> J[⏱ Wait Functions]
    
    E --> K[📱 macOS Apps]
    F --> K
    G --> K
    H --> K
    I --> K
    J --> K
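The diagram's split between resources and tools mirrors the two MCP methods a client calls. `resources/read` is addressed by URI, and `tools/call` is addressed by tool name. The real server is written in Swift on top of the MCP Swift SDK. The Python sketch below only illustrates that routing, and the `Router` class and its registries are hypothetical.

```python
from typing import Any, Callable, Dict


class Router:
    """Illustrative routing only; AppMCP's actual server is Swift (MCPServer.swift)."""

    def __init__(self) -> None:
        self.resources: Dict[str, Callable[[], str]] = {}  # URI -> reader
        self.tools: Dict[str, Callable[..., Any]] = {}     # name -> tool

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        method = request.get("method")
        params = request.get("params") or {}
        if method == "resources/read":
            uri = params.get("uri")
            reader = self.resources.get(uri)
            if reader is None:
                return self._error(request, -32602, f"Unknown resource: {uri}")
            result = {"contents": [{"uri": uri, "text": reader()}]}
        elif method == "tools/call":
            name = params.get("name")
            tool = self.tools.get(name)
            if tool is None:
                return self._error(request, -32602, f"Unknown tool: {name}")
            output = tool(**(params.get("arguments") or {}))
            result = {"content": [{"type": "text", "text": str(output)}]}
        else:
            # JSON-RPC 2.0 reserves -32601 for "Method not found"
            return self._error(request, -32601, f"Method not found: {method}")
        return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

    @staticmethod
    def _error(request: Dict[str, Any], code: int, message: str) -> Dict[str, Any]:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": code, "message": message}}
```

Keeping resources and tools in separate registries matches the diagram: resources are read-only views of app state, while tools act on the app.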

Core Components

Each component is listed with its purpose and, in parentheses, the underlying technology:

  • 🎯 AppSelector: Application discovery and targeting (AppKit, NSWorkspace)
  • 📸 ScreenCaptureProvider: High-quality screenshot capture (ScreenCaptureKit, macOS 15+)
  • 🌳 AppAXTreeProvider: Accessibility tree extraction (Accessibility API)
  • 🖱 MouseClickTool: Precise mouse automation (CGEvent, Quartz)
  • ⌨️ KeyboardTool: Text input and shortcuts (CGEvent, Carbon)
  • 🔒 TCCManager: Permission management (TCC Framework)

📦 Package Structure

AppMCP/
├── 📁 Sources/
│   └── 📁 AppMCP/
│       ├── 🎯 AppMCP.swift          # Core protocols & types
│       ├── 🖥 MCPServer.swift        # Main MCP server
│       ├── 📁 Resources/            # Data providers
│       ├── 📁 Tools/                # Automation tools
│       └── 📁 Permissions/          # Security management
├── 📁 Sources/appmcpd/
│   └── 🚀 Command.swift             # CLI daemon
├── 📁 Tests/
│   └── 📁 AppMCPTests/              # Comprehensive test suite
├── 📋 Package.swift                 # Swift Package configuration
└── 📖 CLAUDE.md                     # Development guidelines

🧪 Testing

Run All Tests

swift test

Test Categories

  • 🔧 Unit Tests: Core functionality validation
  • 🔗 Integration Tests: End-to-end workflow testing
  • ⚡ Performance Tests: Response time benchmarking
  • 🛡 Security Tests: Permission and validation checks

Example Test Results

Test Suite 'AppMCPTests' passed at 2025-06-04 16:42:04.049
    Executed 19 tests, with 0 failures (0 unexpected) in 0.015 seconds
✅ All tests passing

🛠 API Reference

MCP Tools

AppMCP provides the following specialized tools for macOS automation:

Screenshot & UI Analysis

  • capture_ui_snapshot: Capture screenshot with UI element hierarchy

    • Optional text recognition via Vision Framework
    • Element filtering with queries
    • Returns base64 screenshot + structured UI data
  • recognize_text_in_screenshot: 🆕 OCR text extraction from app windows

    • Multi-language support (en-US, ja-JP, zh-Hans, etc.)
    • Fast vs accurate recognition modes
    • Confidence scores and bounding boxes
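`capture_ui_snapshot` returns the screenshot as base64, so a client has to decode it before saving or displaying it. The image format is not stated here. The sketch below assumes the payload is a PNG or JPEG and checks the file signature after decoding. `decode_screenshot` is a hypothetical helper.

```python
import base64
import binascii
from typing import Tuple

# Magic numbers at the start of common image formats (assumed candidates)
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def decode_screenshot(b64_data: str) -> Tuple[bytes, str]:
    """Decode a base64 screenshot payload and detect its image format."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("screenshot payload is not valid base64") from exc
    for signature, fmt in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return data, fmt
    raise ValueError("decoded payload is not a PNG or JPEG image")
```

Checking the signature catches a common mistake: writing a JSON error message or an empty payload to disk as if it were an image.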

Automation Controls

  • click_element: Element-based clicking with multi-button support
  • input_text: Text input with setValue/type methods
  • drag_drop: Drag and drop between elements
  • scroll_window: Scrolling at specific element locations

App Discovery

  • list_running_applications: Get all running apps with metadata
  • list_application_windows: List windows with bounds and visibility
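Targeting an app by bundle ID pairs naturally with `list_running_applications`. The response schema is not documented above, so the sketch below assumes each returned app is a JSON object with a `bundleID` field. Under that assumption, a client could validate the ID and select the matching app like this. The pattern requires reverse-DNS form and Apple's allowed bundle ID characters (letters, digits, hyphens, periods).

```python
import re
from typing import Any, Dict, List, Optional

# Reverse-DNS form using only A-Z, a-z, 0-9, '-' and '.' (at least one dot)
BUNDLE_ID_PATTERN = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_valid_bundle_id(bundle_id: str) -> bool:
    return BUNDLE_ID_PATTERN.fullmatch(bundle_id) is not None


def select_app(apps: List[Dict[str, Any]], bundle_id: str) -> Optional[Dict[str, Any]]:
    """Return the app whose hypothetical "bundleID" field matches, or None."""
    if not is_valid_bundle_id(bundle_id):
        raise ValueError(f"malformed bundle ID: {bundle_id!r}")
    return next((app for app in apps if app.get("bundleID") == bundle_id), None)
```

Rejecting a malformed ID before searching keeps a typo from silently turning into a "no such app" result.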

Text Recognition Features

The Vision Framework integration provides powerful OCR capabilities. For example, text recognition for a TextEdit window can be requested with arguments like these:

{
  "bundleID": "com.apple.TextEdit",
  "includeTextRecognition": true,
  "recognitionLanguages": ["en-US", "ja-JP"],
  "recognitionLevel": "accurate"
}
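The arguments above can also be assembled in code. The hypothetical helper below builds the same object and rejects any recognition level other than the two modes listed earlier (`fast` and `accurate`). The function name is an assumption.

```python
from typing import Any, Dict, List, Optional

RECOGNITION_LEVELS = {"fast", "accurate"}


def text_recognition_arguments(
    bundle_id: str,
    languages: Optional[List[str]] = None,
    level: str = "accurate",
) -> Dict[str, Any]:
    """Hypothetical builder for the text-recognition argument object."""
    if level not in RECOGNITION_LEVELS:
        raise ValueError(f"recognitionLevel must be one of {sorted(RECOGNITION_LEVELS)}")
    args: Dict[str, Any] = {
        "bundleID": bundle_id,
        "includeTextRecognition": True,
        "recognitionLevel": level,
    }
    if languages:
        # Vision treats the language list as ordered: earlier entries are preferred
        args["recognitionLanguages"] = list(languages)
    return args
```

The resulting dictionary goes into the `arguments` field of a `tools/call` request.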

Recognition Results:

  • Full text extraction in reading order
  • Individual text regions with confidence scores
  • Bounding boxes in normalized coordinates
  • Support for 50+ languages
  • Handwritten text detection

🎯 Roadmap

🌟 Current (v0.1.0)

  • Weather app automation PoC
  • Basic screenshot & UI tree extraction
  • Mouse & keyboard automation
  • Permission management
  • Vision Framework OCR text recognition

🚀 Near Future (v0.2.0)

  • Multi-app simultaneous control
  • DevTools integration
  • Enhanced error recovery
  • Performance optimizations

🔮 Long Term (v1.0.0)

  • HTTP transport support
  • Shortcuts.app integration
  • Plugin SDK for extensions
  • Real-time UI streaming

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Install dependencies
swift package resolve

# Run tests
swift test

# Format code
swift-format -i -r Sources/ Tests/

# Build for development
swift build

📄 License

AppMCP is released under the MIT License. See LICENSE for details.


🙏 Acknowledgments

  • Model Context Protocol - For the excellent MCP Swift SDK
  • Apple Developer Team - For the powerful macOS automation APIs
  • Swift Community - For the robust Swift ecosystem

Built with ❤️ for the AI automation community

📖 Documentation • 🐛 Issues • 💬 Discussions

Description

  • Swift Tools 6.1.0

Last updated: Sun Jul 27 2025 01:50:30 GMT-0900 (Hawaii-Aleutian Daylight Time)