SparkConnect

0.1.0

Apache Spark Connect Client for Swift
apache/spark-connect-swift

What's New

v0.1.0

2025-05-08T04:04:10Z

Apache Spark™ Connect Client for Swift is a subproject of Apache Spark that aims to provide a Swift implementation of Spark Connect. v0.1.0 is the initial release of the Apache Spark Connect client for Swift. It is still experimental.

Swift Package Index

https://swiftpackageindex.com/apache/spark-connect-swift

Documentation

https://swiftpackageindex.com/apache/spark-connect-swift/v0.1.0/documentation/sparkconnect

Full Changelog

https://github.com/apache/spark-connect-swift/commits/v0.1.0

Resolved Issues

  • [SPARK-51458] Add GitHub Action job to check ASF license
  • [SPARK-51459] Add merge_spark_pr.py and PULL_REQUEST_TEMPLATE
  • [SPARK-51461] Setup SparkConnect Swift package structure and CI to test build
  • [SPARK-51463] Add Spark Connect-generated Swift source code
  • [SPARK-51465] Use Apache Arrow Swift 19.0.1
  • [SPARK-51472] Add gRPC SparkConnectClient actor
  • [SPARK-51477] Enable autolink to SPARK jira issue
  • [SPARK-51481] Add RuntimeConf actor
  • [SPARK-51483] Add SparkSession and DataFrame actors
  • [SPARK-51485] Add How to use in your apps section to README.md
  • [SPARK-51490] Support iOS, watchOS, and tvOS
  • [SPARK-51493] Refine merge_spark_pr.py to use connect-swift-x.y.z version
  • [SPARK-51495] Add Integration Test GitHub Action job with 4.0.0-preview2
  • [SPARK-51504] Support select/limit/sort/orderBy/isEmpty for DataFrame
  • [SPARK-51508] Support collect(): [[String?]] for DataFrame
  • [SPARK-51510] Add SQL-file based SQLTests suite
  • [SPARK-51521] Add integral/floating/string/date type test and answer files
  • [SPARK-51524] Fix Package Author information to Apache Spark project
  • [SPARK-51529] Support TLS connections
  • [SPARK-51539] Refactor SparkConnectClient to use analyze helper function
  • [SPARK-51560] Support cache/persist/unpersist for DataFrame
  • [SPARK-51561] Upgrade gRPC Swift to 2.1.2 and gRPC Swift NIO Transport to 1.0.2
  • [SPARK-51570] Support filter/where for DataFrame
  • [SPARK-51572] Support binary type in show and collect
  • [SPARK-51620] Support columns for DataFrame
  • [SPARK-51621] Support sparkSession for DataFrame
  • [SPARK-51626] Support DataFrameReader
  • [SPARK-51636] Add StorageLevel struct
  • [SPARK-51642] Support explain for DataFrame
  • [SPARK-51656] Support time for SparkSession
  • [SPARK-51659] Add cache and describe-related sql test and answer files
  • [SPARK-51676] Support printSchema for DataFrame
  • [SPARK-51679] Support dtypes for DataFrame
  • [SPARK-51689] Support DataFrameWriter
  • [SPARK-51693] Support storageLevel for DataFrame
  • [SPARK-51702] Revise sparkSession/read/write/columns/schema/dtypes/storageLevel API
  • [SPARK-51708] Add CaseInsensitiveDictionary
  • [SPARK-51718] Update README.md with Spark 4.0.0 RC3
  • [SPARK-51719] Support table for SparkSession and DataFrameReader
  • [SPARK-51729] Support head/tail for DataFrame
  • [SPARK-51730] Add Catalog actor and support catalog/database APIs
  • [SPARK-51736] Make SparkConnectError and StorageLevel fields public
  • [SPARK-51743] Add describe_(database|table), show_(database|table), explain sql test and answer files
  • [SPARK-51749] Add MacOS integration test with Apache Spark 4.0.0 RC3
  • [SPARK-51750] Upgrade FlatBuffers to v25.2.10
  • [SPARK-51759] Add ErrorUtils and SQLHelper
  • [SPARK-51763] Support struct type in ArrowReader
  • [SPARK-51781] Update README.md and integration test with Apache Spark 4.0.0 RC4
  • [SPARK-51782] Add build-ubuntu-arm test pipeline
  • [SPARK-51784] Support xml in DataFrame(Reader/Writer)
  • [SPARK-51785] Support addTag/removeTag/getTags/clearTags in SparkSession
  • [SPARK-51787] Remove sessionID parameter from getExecutePlanRequest
  • [SPARK-51792] Support saveAsTable and insertInto
  • [SPARK-51793] Support ddlParse and jsonToDdl in SparkConnectClient
  • [SPARK-51799] Support user-specified schema in DataFrameReader
  • [SPARK-51804] Support sample in DataFrame
  • [SPARK-51807] Support drop and withColumnRenamed in DataFrame
  • [SPARK-51808] Use Swift 6.1 in GitHub Action CIs
  • [SPARK-51809] Support offset in DataFrame
  • [SPARK-51815] Add Row struct
  • [SPARK-51825] Add SparkFileUtils
  • [SPARK-51837] Support inputFiles for DataFrame
  • [SPARK-51839] Support except(All)?/intersect(All)?/union(All)?/unionByName in DataFrame
  • [SPARK-51841] Support isLocal and isStreaming for DataFrame
  • [SPARK-51846] Upgrade gRPC Swift Protobuf to 1.2 and gRPC Swift NIO Transport to 1.0.3
  • [SPARK-51850] Fix DataFrame.execute to reset previously received Arrow batch data
  • [SPARK-51851] Refactor to use withGPRC wrappers
  • [SPARK-51852] Support SPARK_CONNECT_AUTHENTICATE_TOKEN
  • [SPARK-51853] Improve DataFrame.show API to support all signatures
  • [SPARK-51854] Remove SwiftyTextTable dependency and unused import statements
  • [SPARK-51855] Support Spark SQL REPL
  • [SPARK-51857] Support token/userId/userAgent parameters in SparkConnectClient
  • [SPARK-51858] Support SPARK_REMOTE
  • [SPARK-51863] Support join and crossJoin in DataFrame
  • [SPARK-51864] Rename parameters and support case-insensitively
  • [SPARK-51870] Support SPARK_GENERATE_GOLDEN_FILES in SQLTests
  • [SPARK-51871] Improve SQLTests to check column names
  • [SPARK-51875] Support repartition(ByExpression)? and coalesce
  • [SPARK-51879] Support groupBy/rollup/cube in DataFrame
  • [SPARK-51911] Support lateralJoin in DataFrame
  • [SPARK-51912] Support semanticHash and sameSemantics in DataFrame
  • [SPARK-51916] Add create_(scala|table)_function and drop_function test scripts
  • [SPARK-51917] Add DataFrameWriterV2 actor
  • [SPARK-51934] Add MacOS integration test with Apache Spark 3.5.5
  • [SPARK-51942] Support selectExpr in DataFrame
  • [SPARK-51943] Upgrade setup-swift to 3.0 dev version
  • [SPARK-51967] Use discardableResult to prevent unnecessary warnings
  • [SPARK-51968] Support (cache|uncache|refresh)Table, refreshByPath, isCached, clearCache in Catalog
  • [SPARK-51969] Support createTable and (table|function)Exists in Catalog
  • [SPARK-51970] Support to create and drop temporary views in DataFrame and Catalog
  • [SPARK-51971] Improve DataFrame.collect to return the original values
  • [SPARK-51976] Add array, map, timestamp, posexplode test queries
  • [SPARK-51977] Improve SparkSQLRepl to support multiple lines
  • [SPARK-51986] Support Parameterized SQL queries in sql API
  • [SPARK-51990] Use Swift docker image on Linux environments
  • [SPARK-51991] Add SparkConnect.md, GettingStarted.md and SparkSession.md
  • [SPARK-51992] Support interrupt(Tag|Operation|All) in SparkSession
  • [SPARK-51993] Support emptyDataFrame and listColumns
  • [SPARK-51994] Fix ArrowType.Info.== to support complex types
  • [SPARK-51995] Support toDF, distinct and dropDuplicates(WithinWatermark)? in DataFrame
  • [SPARK-51996] Support describe and summary in DataFrame
  • [SPARK-51997] Mark nodoc to hide generated and internal classes from docs
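
Many of the DataFrame APIs listed above mirror their Scala and Python counterparts. As a rough sketch only (the method names come from the issues above, such as filter in SPARK-51570, selectExpr in SPARK-51942, and limit in SPARK-51504, but the exact Swift signatures may differ):

import SparkConnect

let spark = try await SparkSession.builder.getOrCreate()
// Chain a few of the DataFrame APIs delivered in this release.
let df = try await spark.range(100).filter("id % 2 == 0").selectExpr("id", "id * 2 AS doubled").limit(5)
try await df.show()
await spark.stop()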

Apache Spark Connect Client for Swift


This is an experimental Swift library that shows how to connect to a remote Apache Spark Connect server and run SQL statements to manipulate remote data.

So far, this library tracks upstream changes such as the Apache Spark 4.0.0 RC4 release and the Apache Arrow project's Swift support.

Requirement

How to use in your apps

Create a Swift project.

mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable

Add the SparkConnect package to the dependencies as follows:

$ cat Package.swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
  name: "SparkConnectSwiftApp",
  platforms: [
    .macOS(.v15)
  ],
  dependencies: [
    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "SparkConnectSwiftApp",
      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
    )
  ]
)
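
Since v0.1.0 is tagged, you can also pin the release instead of tracking the main branch; a sketch using standard SwiftPM syntax:

dependencies: [
  .package(url: "https://github.com/apache/spark-connect-swift.git", from: "0.1.0")
],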

Use SparkSession from the SparkConnect module in Swift.

$ cat Sources/main.swift

import SparkConnect

// Connect to the Spark Connect server and create a session.
let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")

let statements = [
  "DROP TABLE IF EXISTS t",
  "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
  "INSERT INTO t VALUES (1), (2), (3)",
]

// Execute each statement; count() forces the query to run on the server.
for s in statements {
  print("EXECUTE: \(s)")
  _ = try await spark.sql(s).count()
}
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()

// Write the even numbers of a 10-element range to ORC, then read them back.
try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()

await spark.stop()

Run your Swift application.

$ swift run
...
Connected to Apache Spark 4.0.0 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
| a |
+---+
| 2 |
| 1 |
| 3 |
+---+
+----+
| id |
+----+
| 2  |
| 6  |
| 0  |
| 8  |
| 4  |
+----+

You can find this example in the following repository.
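
Besides show(), you can retrieve results programmatically. A minimal sketch, assuming the collect() API added in SPARK-51508 and the Row struct added in SPARK-51815 (exact return types may differ):

import SparkConnect

let spark = try await SparkSession.builder.getOrCreate()
// collect() returns the query result to the client as rows.
let rows = try await spark.sql("SELECT * FROM t").collect()
for row in rows {
  print(row)
}
await spark.stop()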

How to use Spark SQL REPL via Spark Connect for Swift

This project also provides a Spark SQL REPL, which you can run directly from this repository.

$ swift run
...
Build of product 'SparkSQLRepl' complete! (2.33s)
Connected to Apache Spark 4.0.0 Server
spark-sql (default)> SHOW DATABASES;
+---------+
|namespace|
+---------+
|  default|
+---------+

Time taken: 30 ms
spark-sql (default)> CREATE DATABASE db1;
++
||
++
++

Time taken: 31 ms
spark-sql (default)> USE db1;
++
||
++
++

Time taken: 27 ms
spark-sql (db1)> CREATE TABLE t1 AS SELECT * FROM RANGE(10);
++
||
++
++

Time taken: 99 ms
spark-sql (db1)> SELECT * FROM t1;
+---+
| id|
+---+
|  1|
|  5|
|  3|
|  0|
|  6|
|  9|
|  4|
|  8|
|  7|
|  2|
+---+

Time taken: 80 ms
spark-sql (db1)> USE default;
++
||
++
++

Time taken: 26 ms
spark-sql (default)> DROP DATABASE db1 CASCADE;
++
||
++
++
spark-sql (default)> exit;

Apache Spark 4 supports SQL Pipe Syntax.

$ swift run
...
Build of product 'SparkSQLRepl' complete! (2.33s)
Connected to Apache Spark 4.0.0 Server
spark-sql (default)>
FROM ORC.`/opt/spark/examples/src/main/resources/users.orc`
|> AGGREGATE COUNT(*) cnt
   GROUP BY name
|> ORDER BY cnt DESC, name ASC
;
+------+---+
|  name|cnt|
+------+---+
|Alyssa|  1|
|   Ben|  1|
+------+---+

Time taken: 159 ms

You can use the SPARK_REMOTE environment variable to specify a Spark Connect connection string with more options.

SPARK_REMOTE=sc://localhost swift run
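
The Spark Connect connection string also accepts parameters after the host and port. For example, a sketch with placeholder values, assuming the token and user_id parameters from the token/userId support added in SPARK-51857:

SPARK_REMOTE="sc://localhost:15002/;token=mytoken;user_id=swift" swift run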

Description

  • Swift Tools 6.0.0
