Installation
Python
```bash
pip install hnswlib-spark==2.0.0b2
```
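The pip package provides the Python bindings; the matching JVM artifact still needs to be on Spark's classpath. Below is a minimal sketch of wiring that up when you create the session yourself, assuming the beta resolves the JVM side via `spark.jars.packages` (coordinate shown for Spark 3.5 / Scala 2.12; adjust to your build):

```python
from pyspark.sql import SparkSession

# Assumption: the pip package ships only the Python bindings, so the
# matching JVM artifact is pulled in via spark.jars.packages here.
spark = (
    SparkSession.builder
    .appName("hnswlib-spark-demo")
    .config("spark.jars.packages",
            "com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.2")
    .getOrCreate()
)
```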
JVM
sbt:

```scala
// for spark 3.4.x
libraryDependencies += "com.github.jelmerk" %% "hnswlib-spark_3_4" % "2.0.0-beta.2"

// for spark 3.5.x
libraryDependencies += "com.github.jelmerk" %% "hnswlib-spark_3_5" % "2.0.0-beta.2"

// for spark 4.0.x
libraryDependencies += "com.github.jelmerk" %% "hnswlib-spark_4_0" % "2.0.0-beta.2"
```
Maven:

```xml
<properties>
  <scala.binary.version>2.13</scala.binary.version>
</properties>

<dependencies>
  <!-- for spark 3.4.x -->
  <dependency>
    <groupId>com.github.jelmerk</groupId>
    <artifactId>hnswlib-spark_3_4_${scala.binary.version}</artifactId>
    <version>2.0.0-beta.2</version>
  </dependency>
  <!-- for spark 3.5.x -->
  <dependency>
    <groupId>com.github.jelmerk</groupId>
    <artifactId>hnswlib-spark_3_5_${scala.binary.version}</artifactId>
    <version>2.0.0-beta.2</version>
  </dependency>
  <!-- for spark 4.0.x -->
  <dependency>
    <groupId>com.github.jelmerk</groupId>
    <artifactId>hnswlib-spark_4_0_${scala.binary.version}</artifactId>
    <version>2.0.0-beta.2</version>
  </dependency>
</dependencies>
```
Gradle:

```groovy
ext.scalaBinaryVersion = '2.13'

dependencies {
  // for spark 3.4.x
  implementation("com.github.jelmerk:hnswlib-spark_3_4_$scalaBinaryVersion:2.0.0-beta.2")
  // for spark 3.5.x
  implementation("com.github.jelmerk:hnswlib-spark_3_5_$scalaBinaryVersion:2.0.0-beta.2")
  // for spark 4.0.x
  implementation("com.github.jelmerk:hnswlib-spark_4_0_$scalaBinaryVersion:2.0.0-beta.2")
}
```
Databricks
- Create a cluster if you don't have one already.
- In the Libraries tab of your cluster, go to Install New -> Maven -> Coordinates and enter the coordinate for your runtime:

  for DBR 13.3 LTS:

  ```
  com.github.jelmerk:hnswlib-spark_3_4_2.12:2.0.0-beta.2
  ```

  for DBR 14.3 LTS and above:

  ```
  com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.2
  ```

  for DBR 17.0 and above:

  ```
  com.github.jelmerk:hnswlib-spark_4_0_2.13:2.0.0-beta.2
  ```

  then press Install.
- Optionally add the following cluster settings for faster searches:

  Advanced Options -> Spark -> Environment variables:

  ```
  JNAME=zulu17-ca-amd64
  ```

  Advanced Options -> Spark -> Spark config:

  ```
  spark.executor.extraJavaOptions --enable-preview --add-modules jdk.incubator.vector
  ```
Now you can attach your notebook to the cluster and use hnswlib-spark!
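If you plan to use the Python API from your notebook, you may additionally need the PyPI package (`hnswlib-spark==2.0.0b2`) installed via Libraries -> Install New -> PyPI; whether the Maven artifact already bundles the Python bindings is an assumption you should verify against the project docs. A quick import in a notebook cell confirms the library is usable:

```python
# Run in a notebook cell attached to the cluster.
# The module path is an assumption carried over from the 1.x releases
# (pyspark_hnsw); adjust it if the 2.0.0 beta renamed the package.
from pyspark_hnsw.knn import HnswSimilarity

print(HnswSimilarity)
```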
Spark shell
```bash
# for spark 3.4.x
spark-shell --packages 'com.github.jelmerk:hnswlib-spark_3_4_2.12:2.0.0-beta.2'

# for spark 3.5.x
spark-shell --packages 'com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.2'

# for spark 4.0.x
spark-shell --packages 'com.github.jelmerk:hnswlib-spark_4_0_2.13:2.0.0-beta.2'
```
PySpark shell
```bash
# for spark 3.4.x
pyspark --packages 'com.github.jelmerk:hnswlib-spark_3_4_2.12:2.0.0-beta.2'

# for spark 3.5.x
pyspark --packages 'com.github.jelmerk:hnswlib-spark_3_5_2.12:2.0.0-beta.2'

# for spark 4.0.x
pyspark --packages 'com.github.jelmerk:hnswlib-spark_4_0_2.13:2.0.0-beta.2'
```
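Once the shell is up you can build an index and query it. The sketch below follows the `HnswSimilarity` estimator API from the 1.x releases; the module path and parameter names are assumptions that may have changed in the 2.0.0 beta, so check the project docs:

```python
# Module path and parameter names assumed from the 1.x API.
from pyspark_hnsw.knn import HnswSimilarity

# index_df: DataFrame of (id, features); query_df has the same schema.
hnsw = HnswSimilarity(
    identifierCol='id',
    featuresCol='features',
    distanceFunction='cosine',
    m=16,                # graph connectivity: higher = better recall, more memory
    efConstruction=200,  # build-time accuracy/speed trade-off
    k=5,                 # neighbours to return per query row
)

model = hnsw.fit(index_df)        # builds the distributed HNSW index
model.transform(query_df).show()  # adds the k nearest neighbours per row
```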