UI Element-Based macOS Automation SDK
AppPilot is a modern Swift Package Manager library that provides intelligent UI automation for macOS applications. Instead of relying on brittle coordinate-based automation, AppPilot discovers actual UI elements using Accessibility APIs and performs smart, element-based operations.
- Smart Element Discovery: Find UI elements by role, title, and identifier using the Accessibility API
- Element-Based Actions: Click buttons, fill text fields, and interact with UI components
- Automatic Coordinate Calculation: No need to manually calculate button centers
- Universal Compatibility: Works with SwiftUI, AppKit, Electron, and web applications
- Intelligent Waiting: Wait for elements to appear or conditions to be met
- Screen Capture: Take screenshots of windows and applications using ScreenCaptureKit
- Multi-Language IME Support: Advanced composition input for Japanese, Chinese, Korean, and other languages with automatic candidate detection
- Graceful Fallback: Coordinate-based automation when element detection fails
- Type Safety: Built with Swift 6.1 and modern concurrency (Actor-based design)
- Comprehensive Testing: Swift Testing framework with dedicated TestApp integration
┌──────────────── AppPilot (Actor) ────────────────┐
│ • UI Element discovery and automation            │
│ • Smart element-based actions                    │
│ • Automatic coordinate conversion                │
│ • Application and window management              │
└──────────────────────────────────────────────────┘
                       │
          ┌────────────┼──────────────┐
          ▼            ▼              ▼
     ┌──────────┐ ┌──────────┐ ┌──────────────┐
     │ Element  │ │ CGEvent  │ │ Accessibility│
     │ Finder   │ │ Driver   │ │ Driver       │
     │          │ │          │ │              │
     │ AX Tree  │ │ Mouse &  │ │ Element      │
     │ Parser   │ │ Keyboard │ │ Detection    │
     └──────────┘ └──────────┘ └──────────────┘
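The layering in the diagram can be illustrated with a small, self-contained sketch: an actor facade that resolves an element to coordinates through a finder and then delegates to an input driver. Every name here (`ScreenPoint`, `ElementLocating`, `InputDriving`, `Automator`, and the fakes) is hypothetical and is not taken from AppPilot's source; it only shows one way such a design might be wired.

```swift
// Hypothetical stand-ins for the components in the diagram.
struct ScreenPoint: Sendable, Equatable {
    var x: Double
    var y: Double
}

protocol ElementLocating: Sendable {
    /// Returns the center of the element with the given title, if any.
    func center(ofElementTitled title: String) -> ScreenPoint?
}

protocol InputDriving: Sendable {
    /// Posts a click at a screen coordinate.
    func click(at point: ScreenPoint) async
}

enum AutomationError: Error, Equatable {
    case elementNotFound(String)
}

/// The actor protects its mutable state (the click log) from data races.
/// Calls can still interleave at `await` points, as with any actor.
actor Automator {
    private let finder: any ElementLocating
    private let driver: any InputDriving
    private(set) var clickLog: [ScreenPoint] = []

    init(finder: any ElementLocating, driver: any InputDriving) {
        self.finder = finder
        self.driver = driver
    }

    /// Resolves an element to coordinates, then delegates to the driver.
    func click(title: String) async throws {
        guard let point = finder.center(ofElementTitled: title) else {
            throw AutomationError.elementNotFound(title)
        }
        await driver.click(at: point)
        clickLog.append(point)
    }
}

// In-memory fakes so the sketch runs without touching the real UI.
struct FakeFinder: ElementLocating {
    let centers: [String: ScreenPoint]
    func center(ofElementTitled title: String) -> ScreenPoint? { centers[title] }
}

struct FakeDriver: InputDriving {
    func click(at point: ScreenPoint) async {}
}

let automator = Automator(
    finder: FakeFinder(centers: ["OK": ScreenPoint(x: 10, y: 20)]),
    driver: FakeDriver()
)
try await automator.click(title: "OK")
```

Keeping the finder and driver behind protocols makes it possible to test the facade with fakes, as done here, without requiring Accessibility permission.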
- macOS 15.0+
- Swift 6.1+
- Xcode 16.0+
- Accessibility Permission (System Settings → Privacy & Security → Accessibility)
Add AppPilot to your Package.swift:
dependencies: [
    .package(url: "https://github.com/your-username/AppPilot.git", from: "1.2.0")
]
Or add it via Xcode:
- File → Add Package Dependencies
- Enter the repository URL
- Select your target and add the package
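For context, a complete manifest might look like the sketch below. The package and target name `MyAutomationTool` are placeholders, and the product name `AppPilot` is an assumption; check the library's own Package.swift for the actual product name.

```swift
// swift-tools-version: 6.1
import PackageDescription

let package = Package(
    name: "MyAutomationTool",            // hypothetical package name
    platforms: [.macOS(.v15)],           // matches the macOS 15.0+ requirement
    dependencies: [
        .package(url: "https://github.com/your-username/AppPilot.git", from: "1.2.0")
    ],
    targets: [
        .executableTarget(
            name: "MyAutomationTool",
            dependencies: [
                // Assumes the library product is named "AppPilot".
                .product(name: "AppPilot", package: "AppPilot")
            ]
        )
    ]
)
```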
import AppPilot
let pilot = AppPilot()
// Find an application
let app = try await pilot.findApplication(name: "Calculator")
let window = try await pilot.findWindow(app: app, index: 0)
// Discover UI elements
let buttons = try await pilot.findElements(in: window, role: .button)
let numberFive = try await pilot.findButton(in: window, title: "5")
// Perform element-based actions
try await pilot.click(element: numberFive)
try await pilot.click(element: try await pilot.findButton(in: window, title: "+"))
try await pilot.click(element: try await pilot.findButton(in: window, title: "3"))
try await pilot.click(element: try await pilot.findButton(in: window, title: "="))
AppPilot prioritizes finding actual UI elements over blind coordinate clicking:
// Old approach: Hardcoded coordinates
try await pilot.click(window: window, at: Point(x: 200, y: 300))
// New approach: Smart element discovery
let submitButton = try await pilot.findButton(in: window, title: "Submit")
try await pilot.click(element: submitButton)
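To make the idea of automatic coordinate calculation concrete, here is a minimal sketch of how a click point could be derived from an element's frame. `ElementFrame` is a hypothetical type used only for illustration; AppPilot's internal representation may differ.

```swift
/// Hypothetical element frame in window coordinates.
struct ElementFrame {
    var x: Double
    var y: Double
    var width: Double
    var height: Double

    /// The midpoint of the frame, which is where an element-based click would land.
    var center: (x: Double, y: Double) {
        (x + width / 2, y + height / 2)
    }
}

// A 100x40 button whose top-left corner is at (150, 280).
let submitFrame = ElementFrame(x: 150, y: 280, width: 100, height: 40)
let target = submitFrame.center
print("Click at (\(target.x), \(target.y))")  // prints "Click at (200.0, 300.0)"
```

Because the center is recomputed from the frame each time, the click keeps working when the window is resized or the button moves, which a hardcoded coordinate would not.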
Find elements using semantic properties:
// Find by role and title
let saveButton = try await pilot.findElement(in: window, role: .button, title: "Save")
// Find by identifier
let searchFields = try await pilot.findElements(in: window, identifier: "search_input")
// Find all buttons
let allButtons = try await pilot.findElements(in: window, role: .button)
// Find text fields
let textField = try await pilot.findTextField(in: window)
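The optional search criteria can be thought of as a filter in which an omitted argument means "match anything". The sketch below shows that idea over a hypothetical, simplified element model (`UIElementInfo` and `filterElements` are illustrative names, not AppPilot API).

```swift
/// Hypothetical, simplified element model; AppPilot's real type may differ.
struct UIElementInfo {
    var role: String
    var title: String?
    var identifier: String?
}

/// Keeps elements that match every criterion that was supplied.
/// A nil criterion means "do not filter on this property".
func filterElements(
    _ elements: [UIElementInfo],
    role: String? = nil,
    title: String? = nil,
    identifier: String? = nil
) -> [UIElementInfo] {
    elements.filter { element in
        (role == nil || element.role == role)
            && (title == nil || element.title == title)
            && (identifier == nil || element.identifier == identifier)
    }
}

let sampleTree = [
    UIElementInfo(role: "button", title: "Save", identifier: "save_btn"),
    UIElementInfo(role: "button", title: "Cancel", identifier: nil),
    UIElementInfo(role: "textField", title: nil, identifier: "search_input"),
]

let allSampleButtons = filterElements(sampleTree, role: "button")
let saveMatches = filterElements(sampleTree, role: "button", title: "Save")
let searchMatches = filterElements(sampleTree, identifier: "search_input")
```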
Wait for UI changes and conditions:
// Wait for element to appear
let loadingComplete = try await pilot.waitForElement(
    in: window,
    role: .staticText,
    title: "Loading complete",
    timeout: 10.0
)
// Wait for specific time
try await pilot.wait(.time(seconds: 2.0))
// Wait for element to disappear
try await pilot.wait(.elementDisappear(window: window, role: .button, title: "Loading..."))
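Waiting for an element is commonly implemented as polling with a deadline. The following is a minimal sketch of that technique, not AppPilot's actual implementation; `waitUntil`, `WaitTimeout`, and `FakeUI` are hypothetical names, and the fake stands in for a real UI query.

```swift
struct WaitTimeout: Error {}

/// Polls `condition` until it returns a value or `timeout` seconds pass.
func waitUntil<T: Sendable>(
    timeout: Double,
    pollInterval: Double = 0.1,
    condition: @Sendable () async throws -> T?
) async throws -> T {
    let clock = ContinuousClock()
    let deadline = clock.now.advanced(by: .seconds(timeout))
    while true {
        if let value = try await condition() { return value }
        if clock.now >= deadline { throw WaitTimeout() }
        try await Task.sleep(for: .seconds(pollInterval))
    }
}

/// A stand-in for UI state that becomes ready after a few polls.
actor FakeUI {
    private var pollsUntilReady: Int
    init(pollsUntilReady: Int) { self.pollsUntilReady = pollsUntilReady }

    func findLabel() -> String? {
        pollsUntilReady -= 1
        return pollsUntilReady <= 0 ? "Loading complete" : nil
    }
}

/// Polls the fake UI until the label shows up on the third attempt.
func runWaitExample() async throws -> String {
    let ui = FakeUI(pollsUntilReady: 3)
    return try await waitUntil(timeout: 2.0, pollInterval: 0.01) {
        await ui.findLabel()
    }
}
```

Compared with a fixed sleep, this returns as soon as the condition holds and fails with an explicit error when it never does.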
AppPilot uses the Swift Testing framework with comprehensive test coverage:
# Run all tests
swift test
# Run specific test categories
swift test --filter ".unit" # Unit tests
swift test --filter ".integration" # Integration tests with TestApp
swift test --filter ".mouseClick" # Mouse click accuracy tests
swift test --filter ".keyboard" # Keyboard input tests
swift test --filter "CompositionInput" # Multi-language IME tests
# Run specific tests
swift test --filter "testElementDiscovery"
swift test --filter "CorrectTestFlowTests"
# Build the project
swift build
# Clean build artifacts
swift package clean
The project includes a dedicated TestApp for comprehensive automation validation following the "See, Understand, Action" pattern:
@Test("Complete TestApp integration test")
func testTestAppIntegration() async throws {
    let pilot = AppPilot()
    // Stage 1: See/Observe - Application & UI Discovery
    let testApp = try await pilot.findApplication(name: "TestApp")
    let window = try await pilot.findWindow(app: testApp, title: "Mouse Click")
    // Stage 2: Understand - Element Analysis
    let allElements = try await pilot.findElements(in: window)
    let clickTargets = allElements.filter { $0.role == .button }
    #expect(clickTargets.count >= 5, "Should find at least 5 click targets")
    // Stage 3: Action - Element-based Automation
    let testSession = try await TestSession.create(pilot: pilot, testType: .mouseClick)
    await testSession.resetState()
    for target in clickTargets {
        let beforeState = await testSession.getClickTargets()
        let beforeCount = beforeState.filter { $0.clicked }.count
        // Perform element-based click
        let result = try await pilot.click(element: target)
        #expect(result.success, "Click should succeed for \(target.title ?? target.id)")
        try await pilot.wait(.time(seconds: 0.5))
        // Verify via TestApp API
        let afterState = await testSession.getClickTargets()
        let afterCount = afterState.filter { $0.clicked }.count
        #expect(afterCount > beforeCount, "TestApp should detect click on \(target.title ?? target.id)")
    }
}
@Test("Input source management test")
func testInputSourceManagement() async throws {
    let pilot = AppPilot()
    // Test current input source
    let currentSource = try await pilot.getCurrentInputSource()
    #expect(!currentSource.identifier.isEmpty)
    // Test available sources
    let sources = try await pilot.getAvailableInputSources()
    #expect(sources.count > 0)
    // Test text input with different sources
    let testApp = try await pilot.findApplication(name: "TestApp")
    let window = try await pilot.findWindow(app: testApp, title: "Keyboard")
    let textField = try await pilot.findTextField(in: window)
    // Type with English input
    try await pilot.type(text: "Hello", into: textField, inputSource: .english)
    // Test Japanese composition input (if a Japanese input source is available)
    if sources.contains(where: { $0.identifier.contains("Japanese") }) {
        let result = try await pilot.input("konnichiwa", into: textField, with: .japaneseRomaji)
        #expect(result.success, "Japanese composition input should succeed")
        if result.needsUserDecision {
            // Test candidate selection
            let selection = try await pilot.selectCandidate(at: 0, for: textField)
            #expect(selection.success, "Candidate selection should succeed")
        }
    }
}
AppPilot requires specific macOS permissions to function:
- Open System Settings → Privacy & Security → Accessibility
- Authenticate when prompted to make changes
- Add your application to the list
- Ensure it's checked/enabled
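Before running any automation, it can help to check whether the process is already trusted for Accessibility. The sketch below uses the system function `AXIsProcessTrusted()` from ApplicationServices; the wrapper name `hasAccessibilityPermission` is hypothetical, and whether AppPilot performs a similar check internally is not stated here.

```swift
#if canImport(ApplicationServices)
import ApplicationServices
#endif

/// Returns true when the current process is trusted for Accessibility.
/// On platforms without ApplicationServices this sketch reports false.
func hasAccessibilityPermission() -> Bool {
    #if canImport(ApplicationServices)
    return AXIsProcessTrusted()
    #else
    return false
    #endif
}

if hasAccessibilityPermission() {
    print("Accessibility access granted")
} else {
    print("Accessibility access not granted yet")
}
```

Checking up front lets a tool show a clear setup message instead of failing later with a permission error in the middle of an automation run.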
For sandboxed applications, add the usage description to Info.plist and the Apple Events entitlement to your entitlements file:
<key>NSAppleEventsUsageDescription</key>
<string>AppPilot needs AppleEvents access for window management</string>
<key>com.apple.security.automation.apple-events</key>
<true/>
AppPilot 1.2 provides a comprehensive set of automation operations:
- findElements(in:role:title:identifier:) - Find UI elements with flexible criteria
- findElement(in:role:title:) - Find single UI element
- findButton(in:title:) - Find button by title
- findTextField(in:placeholder:) - Find text input field
- findClickableElements(in:) - Find all clickable elements
- findTextInputElements(in:) - Find all text input elements

- click(element:) - Click UI element at its center point
- input(text:into:) - Type text into UI element (enhanced)
- input(_:into:with:) - Composition input with IME support (NEW)
- selectCandidate(at:for:) - Select IME conversion candidate (NEW)
- commitComposition(for:) - Commit IME composition (NEW)
- cancelComposition(for:) - Cancel IME composition (NEW)
- getValue(from:) - Get value from UI element
- elementExists(_:) - Check if element is still valid

- wait(.time(seconds:)) - Wait for specific duration
- wait(.elementAppear(window:role:title:)) - Wait for element to appear
- wait(.elementDisappear(window:role:title:)) - Wait for element to disappear
- wait(.uiChange(window:timeout:)) - Wait for UI changes
- waitForElement(in:role:title:timeout:) - Wait for specific element

- getCurrentInputSource() - Get current keyboard layout
- getAvailableInputSources() - List all available input sources
- switchInputSource(to:) - Change keyboard layout
- type(_:inputSource:) - Type with specific input source (fallback)

- capture(window:) - Capture window screenshot
- ScreenCaptureUtility.convertToPNG(_:) - Convert to PNG data
- ScreenCaptureUtility.convertToJPEG(_:quality:) - Convert to JPEG data
- ScreenCaptureUtility.saveToFile(_:path:format:) - Save image to file

- click(window:at:button:count:) - Click at coordinates with app focus
- click(at:button:count:) - Legacy coordinate click (no focus management)
- type(text:) - Type to focused app (fallback)
- gesture(from:to:duration:) - Drag gesture between points
- drag(from:to:duration:) - Legacy drag operation

- listApplications() - Get all running applications
- findApplication(bundleId:) - Find app by bundle ID
- findApplication(name:) - Find app by name
- listWindows(app:) - Get windows for application
- findWindow(app:title:) - Find window by title
- findWindow(app:index:) - Find window by index
// List all applications
let apps = try await pilot.listApplications()
// Find application by bundle ID
let safari = try await pilot.findApplication(bundleId: "com.apple.Safari")
// Find application by name
let finder = try await pilot.findApplication(name: "Finder")
// Get windows for an application
let windows = try await pilot.listWindows(app: safari)
// Find specific window
let mainWindow = try await pilot.findWindow(app: safari, title: "Safari")
// Find elements with flexible criteria
let elements = try await pilot.findElements(
    in: window,
    role: .button,          // Optional: filter by role
    title: "Save",          // Optional: filter by title
    identifier: "save_btn"  // Optional: filter by identifier
)
// Specialized finders
let button = try await pilot.findButton(in: window, title: "OK")
let textField = try await pilot.findTextField(in: window, placeholder: "Enter text")
// Click discovered elements
let result = try await pilot.click(element: button)
// Type into text fields
try await pilot.type(text: "Hello World", into: textField)
// Get element values
let value = try await pilot.getValue(from: textField)
// Check element existence
let exists = try await pilot.elementExists(button)
// Get current input source
let currentSource = try await pilot.getCurrentInputSource()
print("Current: \(currentSource.displayName)")
// List available input sources
let sources = try await pilot.getAvailableInputSources()
for source in sources {
    print("\(source.displayName): \(source.identifier)")
}
// Switch input source and type
try await pilot.switchInputSource(to: .japanese)
try await pilot.type(text: "こんにちは", into: textField)
// Capture window screenshot
let image = try await pilot.capture(window: window)
// Convert to PNG data and save
if let pngData = ScreenCaptureUtility.convertToPNG(image) {
    let url = URL(fileURLWithPath: "/tmp/screenshot.png")
    try pngData.write(to: url)
}
import AppPilot
let pilot = AppPilot()
// Find TestApp
let testApp = try await pilot.findApplication(name: "TestApp")
let window = try await pilot.findWindow(app: testApp, title: "Mouse Click")
// Discover all clickable targets automatically
let buttons = try await pilot.findElements(in: window, role: .button)
print("Found \(buttons.count) clickable targets")
// Click each target using element-based automation
for button in buttons where button.isEnabled {
    print("Clicking: \(button.title ?? button.id)")
    try await pilot.click(element: button)
    // Verify via TestApp API
    let response = try await testAppAPI.getClickTargets()
    let clickedCount = response.filter { $0.clicked }.count
    print("Targets clicked: \(clickedCount)")
    try await pilot.wait(.time(seconds: 0.5))
}
let pilot = AppPilot()
// Find Weather app
let weatherApp = try await pilot.findApplication(bundleId: "com.apple.weather")
let window = try await pilot.findWindow(app: weatherApp, index: 0)
// Find and interact with search field
let searchField = try await pilot.findTextField(in: window)
try await pilot.click(element: searchField)
try await pilot.type(text: "Tokyo", into: searchField)
// Wait for and click search result
let tokyoResult = try await pilot.waitForElement(
    in: window,
    role: .button,
    title: "Tokyo",
    timeout: 5.0
)
try await pilot.click(element: tokyoResult)
let pilot = AppPilot()
// Find text editing app
let textEdit = try await pilot.findApplication(name: "TextEdit")
let window = try await pilot.findWindow(app: textEdit, index: 0)
let textArea = try await pilot.findTextField(in: window)
// Simple English input
try await pilot.input(text: "Hello World", into: textArea)
// Japanese composition input with automatic candidate handling
let result = try await pilot.input("konnichiwa", into: textArea, with: .japaneseRomaji)
// Handle IME candidates if user decision is needed
if result.needsUserDecision {
    if let candidates = result.compositionCandidates {
        print("Available candidates: \(candidates)")
        // Example: ["こんにちは", "こんにちわ", "今日は"]
        // Select the first candidate (こんにちは)
        let selection = try await pilot.selectCandidate(at: 0, for: textArea)
        // Commit the composition
        if !selection.isCompositionCompleted {
            try await pilot.commitComposition(for: textArea)
        }
    }
}
// Chinese input with Pinyin
let chineseResult = try await pilot.input("ni hao", into: textArea, with: .chinesePinyin)
// Handles: "ni hao" → "你好" with candidate selection if needed
// Korean input
let koreanResult = try await pilot.input("annyeong", into: textArea, with: .korean)
// Handles: "annyeong" → "안녕" with automatic composition
// Direct input (bypasses IME for final text)
try await pilot.input(text: "こんにちは", into: textArea) // Direct hiragana input
let pilot = AppPilot()
// Setup
let app = try await pilot.findApplication(name: "TextEdit")
let window = try await pilot.findWindow(app: app, index: 0)
let textField = try await pilot.findTextField(in: window)
// Complex Japanese input workflow
let inputResult = try await pilot.input("arigatougozaimasu", into: textField, with: .japaneseRomaji)
// Check if the IME presents multiple conversion candidates
if case .candidateSelection(_, let candidates, let selectedIndex) = inputResult.compositionData?.state {
    print("Candidates available:")
    for (index, candidate) in candidates.enumerated() {
        let marker = index == selectedIndex ? ">" : " "
        print("\(marker) \(index): \(candidate)")
    }
    // Select the first candidate if it is not already selected
    if selectedIndex != 0 {
        try await pilot.selectCandidate(at: 0, for: textField)
    }
    // Commit the final selection
    try await pilot.commitComposition(for: textField)
} else if case .committed(let finalText) = inputResult.compositionData?.state {
    print("Automatically committed: \(finalText)")
}
// Alternatively, cancel a pending composition instead of committing it
if inputResult.needsUserDecision {
    try await pilot.cancelComposition(for: textField)
}
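The decision logic in the example above can be isolated into a pure function, which makes it easy to test without an IME. The sketch below uses a hypothetical `CompositionState` enum that only mirrors the shape of the cases matched above; AppPilot's real types may carry different associated values, and `NextStep` and `nextStep(for:preferred:)` are illustrative names.

```swift
/// Hypothetical mirror of the states matched in the example above.
enum CompositionState: Equatable {
    case composing(text: String)
    case candidateSelection(original: String, candidates: [String], selectedIndex: Int)
    case committed(text: String)
}

/// What the caller should do next.
enum NextStep: Equatable {
    case keepTyping
    case select(index: Int)
    case commit
    case done(String)
}

/// Decides the next step, preferring `preferred` when the IME offers it.
func nextStep(for state: CompositionState, preferred: String) -> NextStep {
    switch state {
    case .composing:
        return .keepTyping
    case .candidateSelection(_, let candidates, let selectedIndex):
        // Switch candidates only if the preferred one exists and is not selected yet.
        if let wanted = candidates.firstIndex(of: preferred), wanted != selectedIndex {
            return .select(index: wanted)
        }
        return .commit
    case .committed(let text):
        return .done(text)
    }
}

let pendingState = CompositionState.candidateSelection(
    original: "konnichiwa",
    candidates: ["こんにちは", "こんにちわ", "今日は"],
    selectedIndex: 1
)
let pendingStep = nextStep(for: pendingState, preferred: "こんにちは")
```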
AppPilot provides comprehensive error handling:
public enum PilotError: Error {
    case permissionDenied(String)
    case applicationNotFound(String)
    case windowNotFound(WindowHandle)
    case elementNotFound(role: ElementRole, title: String?)
    case elementNotAccessible(String)
    case multipleElementsFound(role: ElementRole, title: String?, count: Int)
    case timeout(TimeInterval)
    case osFailure(api: String, code: Int32)
}
Handle errors gracefully:
do {
    let button = try await pilot.findButton(in: window, title: "Submit")
    try await pilot.click(element: button)
} catch PilotError.elementNotFound(_, let title) {
    print("Button '\(title ?? "unknown")' not found")
    // Fallback to coordinate-based clicking
    try await pilot.click(window: window, at: Point(x: 200, y: 300))
} catch PilotError.permissionDenied(let message) {
    print("Permission required: \(message)")
}
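If the same fallback pattern appears in many places, it could be factored into a small helper. The sketch below is one possible shape, using a hypothetical `LookupError` type rather than AppPilot's `PilotError` so that it runs on its own; `withElementFallback` is likewise an illustrative name.

```swift
/// Hypothetical error type for this sketch (not AppPilot's PilotError).
enum LookupError: Error, Equatable {
    case elementNotFound(title: String)
    case permissionDenied
}

/// Runs `primary`; if it fails with `elementNotFound`, runs `fallback` instead.
/// Any other error is rethrown, since a coordinate fallback cannot fix it.
func withElementFallback<T>(
    primary: () throws -> T,
    fallback: () throws -> T
) throws -> T {
    do {
        return try primary()
    } catch LookupError.elementNotFound {
        return try fallback()
    }
}

// The element lookup fails, so the coordinate-based strategy is used.
let chosenStrategy = try withElementFallback(
    primary: { () throws -> String in
        throw LookupError.elementNotFound(title: "Submit")
    },
    fallback: { "coordinate click at (200, 300)" }
)
print(chosenStrategy)  // prints "coordinate click at (200, 300)"
```

Only the missing-element case triggers the fallback; a permission error still propagates, because clicking by coordinates would fail for the same reason.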
- Element-First Approach: UI elements are discovered before actions
- Smart Targeting: Find elements by semantic properties, not coordinates
- Automatic Coordinate Calculation: No manual coordinate math required
- Better Error Messages: Descriptive errors about missing elements
- Input Source Management: Built-in support for multi-language input
- Enhanced Wait Operations: Wait for specific UI element conditions
- Improved Testing: Swift Testing framework with TestApp integration
// Old: Hardcoded coordinate clicking
try await pilot.click(window: window, at: Point(x: 534, y: 228))
// New: Smart element discovery and clicking
let button = try await pilot.findButton(in: window, title: "Submit")
try await pilot.click(element: button)
// Old: Focus app and type blindly
try await pilot.type(text: "Hello World")
// New: Find text field and type into it
let textField = try await pilot.findTextField(in: window)
try await pilot.type(text: "Hello World", into: textField)
// Old: No element discovery, manual coordinate calculation
let buttonCenter = Point(x: 200, y: 150)
try await pilot.click(window: window, at: buttonCenter)
// New: Automatic element discovery and interaction
let allButtons = try await pilot.findElements(in: window, role: .button)
for button in allButtons where button.isEnabled {
    try await pilot.click(element: button) // Automatically uses element.centerPoint
}
// Old: Fixed time waits
try await Task.sleep(nanoseconds: 2_000_000_000)
// New: Semantic wait conditions
try await pilot.waitForElement(in: window, role: .button, title: "Continue", timeout: 10.0)
try await pilot.wait(.elementDisappear(window: window, role: .dialog, title: "Loading"))
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass: swift test
- Submit a pull request
AppPilot is available under the MIT license. See LICENSE file for details.
- Documentation: Check the inline documentation and examples
- Issues: Report bugs and feature requests on GitHub
- Discussions: Join the community discussions for help and tips
AppPilot 1.2 - Intelligent UI automation for the modern Mac