Databricks Runtime 14.0 (EoS)

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

The following release notes provide information about Databricks Runtime 14.0, powered by Apache Spark 3.5.0.

Databricks released this version in September 2023.

New features and improvements

Row tracking is GA

Row tracking for Delta Lake is now generally available. See Use row tracking for Delta tables.

Predictive I/O for updates is GA

Predictive I/O for updates is now generally available. See What is predictive I/O?.

Deletion vectors are GA

Deletion vectors are now generally available. See What are deletion vectors?.

Spark 3.5.0 is GA

Apache Spark 3.5.0 is now generally available. See Spark Release 3.5.0.

Public preview for user-defined table functions for Python

User-defined table functions (UDTFs) allow you to register functions that return tables instead of scalar values. See Python user-defined table functions (UDTFs).
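As a rough illustration of what a Python UDTF looks like, the sketch below defines a handler class whose eval method yields one tuple per output row. The class name SquareNumbers, the function name square_numbers, and the schema string are hypothetical and chosen only for this example. Registration is wrapped in a separate function that assumes the udtf helper in pyspark.sql.functions from Apache Spark 3.5, so the handler itself can be tried without a cluster.

```python
class SquareNumbers:
    """Handler for a table function that returns one row per integer in [start, end]."""

    def eval(self, start: int, end: int):
        # Each yielded tuple becomes one row of the output table.
        for num in range(start, end + 1):
            yield (num, num * num)


def register_square_numbers(spark):
    """Register the handler for SQL use; assumes a Spark 3.5+ session."""
    # Imported lazily so the handler above stays usable as plain Python.
    from pyspark.sql.functions import udtf

    square_udtf = udtf(SquareNumbers, returnType="num: int, squared: int")
    spark.udtf.register("square_numbers", square_udtf)
    return square_udtf
```

Once registered, a query such as SELECT * FROM square_numbers(1, 3) would be expected to return one row for each of 1, 2, and 3.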

Public preview for row-level concurrency

Row-level concurrency reduces conflicts between concurrent write operations by detecting changes at the row-level and automatically resolving competing changes in concurrent writes that update or delete different rows in the same data file. See Write conflicts with row-level concurrency.

Default current working directory has changed

The default current working directory (CWD) for code executed locally is now the directory containing the notebook or script being run. This includes code such as %sh and Python or R code not using Spark. See What is the default current working directory?.
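One practical consequence is that relative paths in local (non-Spark) Python code now resolve against the notebook or script directory. The sketch below is a minimal way to make that resolution explicit; the helper name resolve_relative and the example file path are hypothetical.

```python
import os
from pathlib import Path


def resolve_relative(path: str) -> Path:
    """Resolve a relative path against the current working directory."""
    # On Databricks Runtime 14.0 and above, the CWD is expected to be the
    # directory that contains the running notebook or script.
    return Path(os.getcwd()) / path


# Prints the absolute location that a relative reference would point to.
print(resolve_relative("config/settings.json"))
```

Printing the resolved path before and after an upgrade is a simple way to check whether code that relied on the previous default directory needs adjusting.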

Known issue with sparklyr

The installed version of the sparklyr package (version 1.8.1) is not compatible with Databricks Runtime 14.0. To use sparklyr, install version 1.8.3 or above.

Introducing Spark Connect in shared cluster architecture

In Databricks Runtime 14.0 and above, shared clusters use Spark Connect by default to communicate with the Spark driver from the Python REPL, replacing the legacy REPL integration. As a result, internal Spark APIs are no longer accessible from user code.

List available Spark versions API update

The Spark version API previously returned implementation-specific runtimes for each version. Now, enable Photon by setting runtime_engine = PHOTON, and enable aarch64 by choosing a Graviton instance type; Azure Databricks then sets the correct Databricks Runtime version. See GET /api/2.0/clusters/spark-versions in the REST API Reference.
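A cluster specification that follows this model might look like the sketch below. It assumes the spark_version, node_type_id, num_workers, and runtime_engine fields of the Clusters API; the cluster name, the version key 14.0.x-scala2.12, and the node type placeholder are illustrative values, not taken from this release note.

```python
import json


def build_cluster_spec(spark_version: str, node_type_id: str, photon: bool = True) -> dict:
    """Build a cluster spec that selects Photon through runtime_engine."""
    return {
        "cluster_name": "example-cluster",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": 2,
        # Photon is requested here rather than through a Photon-specific version key.
        "runtime_engine": "PHOTON" if photon else "STANDARD",
    }


spec = build_cluster_spec("14.0.x-scala2.12", "<node-type-id>")
print(json.dumps(spec, indent=2))
```

The point of the design is that the same generic version key is used whether or not Photon is enabled; only the runtime_engine field changes.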

Breaking changes

In Databricks Runtime 14.0 and above, clusters with standard access mode (formerly shared access mode) use Spark Connect for client-server communication. This includes the following changes.

For more on standard access mode limitations, see Standard compute requirements and limitations.

Python on clusters with standard access mode (formerly shared access mode)

  • sqlContext is not available. Azure Databricks recommends using the spark variable for the SparkSession instance.
  • Spark Context (sc) is no longer available in notebooks, or when using Databricks Connect on a cluster with standard access mode. The following sc functions are no longer available:
    • emptyRDD, range, init_batched_serializer, parallelize, pickleFile, textFile, wholeTextFiles, binaryFiles, binaryRecords, sequenceFile, newAPIHadoopFile, newAPIHadoopRDD, hadoopFile, hadoopRDD, union, runJob, setSystemProperty, uiWebUrl, stop, setJobGroup, setLocalProperty, getConf
  • The Dataset Info feature is no longer supported.
  • There is no longer a dependency on the JVM when querying Apache Spark. As a consequence, internal APIs related to the JVM, such as _jsc, _jconf, _jvm, _jsparkSession, _jreader, _jc, _jseq, _jdf, _jmap, and _jcols, are no longer supported.
  • When accessing configuration values using spark.conf, only dynamic runtime configuration values are accessible.
  • Lakeflow Declarative Pipelines analysis commands are not yet supported on shared clusters.
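Code that relied on sc or sqlContext can often be rewritten against the spark session instead. The helpers below are a minimal sketch of such replacements, assuming the standard SparkSession methods range, createDataFrame, and sql; the helper names and the id and label column names are hypothetical.

```python
def numbers_dataframe(spark, count: int):
    """Replacement for sc.parallelize(range(count)): build the data with the DataFrame API."""
    return spark.range(count)


def labelled_dataframe(spark, rows):
    """Replacement for sc.parallelize(rows).toDF(...): create the DataFrame directly."""
    return spark.createDataFrame(rows, schema=["id", "label"])


def run_query(spark, query: str):
    """Replacement for sqlContext.sql(query): call sql on the SparkSession."""
    return spark.sql(query)
```

In a notebook, these would be called with the predefined spark variable, for example numbers_dataframe(spark, 10).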

Delta on clusters with standard access mode (formerly shared access mode)

  • In Python, there is no longer a dependency on the JVM when querying Apache Spark. Internal APIs related to the JVM, such as DeltaTable._jdt, DeltaTableBuilder._jbuilder, DeltaMergeBuilder._jbuilder, and DeltaOptimizeBuilder._jbuilder, are no longer supported.

SQL on clusters with standard access mode (formerly shared access mode)

  • DBCACHE and DBUNCACHE commands are no longer supported.
  • Rare use cases like cache table db as show databases are no longer supported.

Library upgrades

  • Upgraded Python libraries:
    • asttokens from 2.2.1 to 2.0.5
    • attrs from 21.4.0 to 22.1.0
    • botocore from 1.27.28 to 1.27.96
    • certifi from 2022.9.14 to 2022.12.7
    • cryptography from 37.0.1 to 39.0.1
    • debugpy from 1.6.0 to 1.6.7
    • docstring-to-markdown from 0.12 to 0.11
    • executing from 1.2.0 to 0.8.3
    • facets-overview from 1.0.3 to 1.1.1
    • googleapis-common-protos from 1.56.4 to 1.60.0
    • grpcio from 1.48.1 to 1.48.2
    • idna from 3.3 to 3.4
    • ipykernel from 6.17.1 to 6.25.0
    • ipython from 8.10.0 to 8.14.0
    • Jinja2 from 2.11.3 to 3.1.2
    • jsonschema from 4.16.0 to 4.17.3
    • jupyter_core from 4.11.2 to 5.2.0
    • kiwisolver from 1.4.2 to 1.4.4
    • MarkupSafe from 2.0.1 to 2.1.1
    • matplotlib from 3.5.2 to 3.7.0
    • nbconvert from 6.4.4 to 6.5.4
    • nbformat from 5.5.0 to 5.7.0
    • nest-asyncio from 1.5.5 to 1.5.6
    • notebook from 6.4.12 to 6.5.2
    • numpy from 1.21.5 to 1.23.5
    • packaging from 21.3 to 22.0
    • pandas from 1.4.4 to 1.5.3
    • pathspec from 0.9.0 to 0.10.3
    • patsy from 0.5.2 to 0.5.3
    • Pillow from 9.2.0 to 9.4.0
    • pip from 22.2.2 to 22.3.1
    • protobuf from 3.19.4 to 4.24.0
    • pytoolconfig from 1.2.2 to 1.2.5
    • pytz from 2022.1 to 2022.7
    • s3transfer from 0.6.0 to 0.6.1
    • seaborn from 0.11.2 to 0.12.2
    • setuptools from 63.4.1 to 65.6.3
    • soupsieve from 2.3.1 to 2.3.2.post1
    • stack-data from 0.6.2 to 0.2.0
    • statsmodels from 0.13.2 to 0.13.5
    • terminado from 0.13.1 to 0.17.1
    • traitlets from 5.1.1 to 5.7.1
    • typing_extensions from 4.3.0 to 4.4.0
    • urllib3 from 1.26.11 to 1.26.14
    • virtualenv from 20.16.3 to 20.16.7
    • wheel from 0.37.1 to 0.38.4
  • Upgraded R libraries:
    • arrow from 10.0.1 to 12.0.1
    • base from 4.2.2 to 4.3.1
    • blob from 1.2.3 to 1.2.4
    • broom from 1.0.3 to 1.0.5
    • bslib from 0.4.2 to 0.5.0
    • cachem from 1.0.6 to 1.0.8
    • caret from 6.0-93 to 6.0-94
    • chron from 2.3-59 to 2.3-61
    • class from 7.3-21 to 7.3-22
    • cli from 3.6.0 to 3.6.1
    • clock from 0.6.1 to 0.7.0
    • commonmark from 1.8.1 to 1.9.0
    • compiler from 4.2.2 to 4.3.1
    • cpp11 from 0.4.3 to 0.4.4
    • curl from 5.0.0 to 5.0.1
    • data.table from 1.14.6 to 1.14.8
    • datasets from 4.2.2 to 4.3.1
    • dbplyr from 2.3.0 to 2.3.3
    • digest from 0.6.31 to 0.6.33
    • downlit from 0.4.2 to 0.4.3
    • dplyr from 1.1.0 to 1.1.2
    • dtplyr from 1.2.2 to 1.3.1
    • evaluate from 0.20 to 0.21
    • fastmap from 1.1.0 to 1.1.1
    • fontawesome from 0.5.0 to 0.5.1
    • fs from 1.6.1 to 1.6.2
    • future from 1.31.0 to 1.33.0
    • future.apply from 1.10.0 to 1.11.0
    • gargle from 1.3.0 to 1.5.1
    • ggplot2 from 3.4.0 to 3.4.2
    • gh from 1.3.1 to 1.4.0
    • glmnet from 4.1-6 to 4.1-7
    • googledrive from 2.0.0 to 2.1.1
    • googlesheets4 from 1.0.1 to 1.1.1
    • graphics from 4.2.2 to 4.3.1
    • grDevices from 4.2.2 to 4.3.1
    • grid from 4.2.2 to 4.3.1
    • gtable from 0.3.1 to 0.3.3
    • hardhat from 1.2.0 to 1.3.0
    • haven from 2.5.1 to 2.5.3
    • hms from 1.1.2 to 1.1.3
    • htmltools from 0.5.4 to 0.5.5
    • htmlwidgets from 1.6.1 to 1.6.2
    • httpuv from 1.6.8 to 1.6.11
    • httr from 1.4.4 to 1.4.6
    • ipred from 0.9-13 to 0.9-14
    • jsonlite from 1.8.4 to 1.8.7
    • KernSmooth from 2.23-20 to 2.23-21
    • knitr from 1.42 to 1.43
    • later from 1.3.0 to 1.3.1
    • lattice from 0.20-45 to 0.21-8
    • lava from 1.7.1 to 1.7.2.1
    • lubridate from 1.9.1 to 1.9.2
    • markdown from 1.5 to 1.7
    • MASS from 7.3-58.2 to 7.3-60
    • Matrix from 1.5-1 to 1.5-4.1
    • methods from 4.2.2 to 4.3.1
    • mgcv from 1.8-41 to 1.8-42
    • modelr from 0.1.10 to 0.1.11
    • nnet from 7.3-18 to 7.3-19
    • openssl from 2.0.5 to 2.0.6
    • parallel from 4.2.2 to 4.3.1
    • parallelly from 1.34.0 to 1.36.0
    • pillar from 1.8.1 to 1.9.0
    • pkgbuild from 1.4.0 to 1.4.2
    • pkgload from 1.3.2 to 1.3.2.1
    • pROC from 1.18.0 to 1.18.4
    • processx from 3.8.0 to 3.8.2
    • prodlim from 2019.11.13 to 2023.03.31
    • profvis from 0.3.7 to 0.3.8
    • ps from 1.7.2 to 1.7.5
    • Rcpp from 1.0.10 to 1.0.11
    • readr from 2.1.3 to 2.1.4
    • readxl from 1.4.2 to 1.4.3
    • recipes from 1.0.4 to 1.0.6
    • rlang from 1.0.6 to 1.1.1
    • rmarkdown from 2.20 to 2.23
    • Rserve from 1.8-12 to 1.8-11
    • RSQLite from 2.2.20 to 2.3.1
    • rstudioapi from 0.14 to 0.15.0
    • sass from 0.4.5 to 0.4.6
    • shiny from 1.7.4 to 1.7.4.1
    • sparklyr from 1.7.9 to 1.8.1
    • SparkR from 3.4.1 to 3.5.0
    • splines from 4.2.2 to 4.3.1
    • stats from 4.2.2 to 4.3.1
    • stats4 from 4.2.2 to 4.3.1
    • survival from 3.5-3 to 3.5-5
    • sys from 3.4.1 to 3.4.2
    • tcltk from 4.2.2 to 4.3.1
    • testthat from 3.1.6 to 3.1.10
    • tibble from 3.1.8 to 3.2.1
    • tidyverse from 1.3.2 to 2.0.0
    • tinytex from 0.44 to 0.45
    • tools from 4.2.2 to 4.3.1
    • tzdb from 0.3.0 to 0.4.0
    • usethis from 2.1.6 to 2.2.2
    • utils from 4.2.2 to 4.3.1
    • vctrs from 0.5.2 to 0.6.3
    • viridisLite from 0.4.1 to 0.4.2
    • vroom from 1.6.1 to 1.6.3
    • waldo from 0.4.0 to 0.5.1
    • xfun from 0.37 to 0.39
    • xml2 from 1.3.3 to 1.3.5
    • zip from 2.2.2 to 2.3.0
  • Upgraded Java libraries:
    • com.fasterxml.jackson.core.jackson-annotations from 2.14.2 to 2.15.2
    • com.fasterxml.jackson.core.jackson-core from 2.14.2 to 2.15.2
    • com.fasterxml.jackson.core.jackson-databind from 2.14.2 to 2.15.2
    • com.fasterxml.jackson.dataformat.jackson-dataformat-cbor from 2.14.2 to 2.15.2
    • com.fasterxml.jackson.datatype.jackson-datatype-joda from 2.14.2 to 2.15.2
    • com.fasterxml.jackson.datatype.jackson-datatype-jsr310 from 2.13.4 to 2.15.1
    • com.fasterxml.jackson.module.jackson-module-paranamer from 2.14.2 to 2.15.2
    • com.fasterxml.jackson.module.jackson-module-scala_2.12 from 2.14.2 to 2.15.2
    • com.github.luben.zstd-jni from 1.5.2-5 to 1.5.5-4
    • com.google.code.gson.gson from 2.8.9 to 2.10.1
    • com.google.crypto.tink.tink from 1.7.0 to 1.9.0
    • commons-codec.commons-codec from 1.15 to 1.16.0
    • commons-io.commons-io from 2.11.0 to 2.13.0
    • io.airlift.aircompressor from 0.21 to 0.24
    • io.dropwizard.metrics.metrics-core from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-graphite from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-healthchecks from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-jetty9 from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-jmx from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-json from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-jvm from 4.2.10 to 4.2.19
    • io.dropwizard.metrics.metrics-servlets from 4.2.10 to 4.2.19
    • io.netty.netty-all from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-buffer from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-codec from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-codec-http from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-codec-http2 from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-codec-socks from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-common from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-handler from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-handler-proxy from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-resolver from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-transport from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-transport-classes-epoll from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-transport-classes-kqueue from 4.1.87.Final to 4.1.93.Final
    • io.netty.netty-transport-native-epoll from 4.1.87.Final-linux-x86_64 to 4.1.93.Final-linux-x86_64
    • io.netty.netty-transport-native-kqueue from 4.1.87.Final-osx-x86_64 to 4.1.93.Final-osx-x86_64
    • io.netty.netty-transport-native-unix-common from 4.1.87.Final to 4.1.93.Final
    • org.apache.arrow.arrow-format from 11.0.0 to 12.0.1
    • org.apache.arrow.arrow-memory-core from 11.0.0 to 12.0.1
    • org.apache.arrow.arrow-memory-netty from 11.0.0 to 12.0.1
    • org.apache.arrow.arrow-vector from 11.0.0 to 12.0.1
    • org.apache.avro.avro from 1.11.1 to 1.11.2
    • org.apache.avro.avro-ipc from 1.11.1 to 1.11.2
    • org.apache.avro.avro-mapred from 1.11.1 to 1.11.2
    • org.apache.commons.commons-compress from 1.21 to 1.23.0
    • org.apache.hadoop.hadoop-client-runtime from 3.3.4 to 3.3.6
    • org.apache.logging.log4j.log4j-1.2-api from 2.19.0 to 2.20.0
    • org.apache.logging.log4j.log4j-api from 2.19.0 to 2.20.0
    • org.apache.logging.log4j.log4j-core from 2.19.0 to 2.20.0
    • org.apache.logging.log4j.log4j-slf4j2-impl from 2.19.0 to 2.20.0
    • org.apache.orc.orc-core from 1.8.4-shaded-protobuf to 1.9.0-shaded-protobuf
    • org.apache.orc.orc-mapreduce from 1.8.4-shaded-protobuf to 1.9.0-shaded-protobuf
    • org.apache.orc.orc-shims from 1.8.4 to 1.9.0
    • org.apache.xbean.xbean-asm9-shaded from 4.22 to 4.23
    • org.checkerframework.checker-qual from 3.19.0 to 3.31.0
    • org.glassfish.jersey.containers.jersey-container-servlet from 2.36 to 2.40
    • org.glassfish.jersey.containers.jersey-container-servlet-core from 2.36 to 2.40
    • org.glassfish.jersey.core.jersey-client from 2.36 to 2.40
    • org.glassfish.jersey.core.jersey-common from 2.36 to 2.40
    • org.glassfish.jersey.core.jersey-server from 2.36 to 2.40
    • org.glassfish.jersey.inject.jersey-hk2 from 2.36 to 2.40
    • org.javassist.javassist from 3.25.0-GA to 3.29.2-GA
    • org.mariadb.jdbc.mariadb-java-client from 2.7.4 to 2.7.9
    • org.postgresql.postgresql from 42.3.8 to 42.6.0
    • org.roaringbitmap.RoaringBitmap from 0.9.39 to 0.9.45
    • org.roaringbitmap.shims from 0.9.39 to 0.9.45
    • org.rocksdb.rocksdbjni from 7.8.3 to 8.3.2
    • org.scala-lang.modules.scala-collection-compat_2.12 from 2.4.3 to 2.9.0
    • org.slf4j.jcl-over-slf4j from 2.0.6 to 2.0.7
    • org.slf4j.jul-to-slf4j from 2.0.6 to 2.0.7
    • org.slf4j.slf4j-api from 2.0.6 to 2.0.7
    • org.xerial.snappy.snappy-java from 1.1.10.1 to 1.1.10.3
    • org.yaml.snakeyaml from 1.33 to 2.0

Apache Spark

Databricks Runtime 14.0 includes Apache Spark 3.5.0. This release includes all Spark fixes and improvements included in Databricks Runtime 13.3 LTS, as well as the following additional bug fixes and improvements made to Spark:

  • [SPARK-45109] [DBRRM-462][sc-142247][SQL][connect] Fix aes_decrypt and ln functions in Connect
  • [SPARK-44980] [DBRRM-462][sc-141024][PYTHON][connect] Fix inherited namedtuples to work in createDataFrame
  • [SPARK-44795] [DBRRM-462][sc-139720][CONNECT] CodeGenerator Cache should be classloader specific
  • [SPARK-44861] [DBRRM-498][sc-140716][CONNECT] jsonignore SparkListenerConnectOperationStarted.planRequest
  • [SPARK-44794] [DBRRM-462][sc-139767][CONNECT] Make Streaming Queries work with Connect's artifact management
  • [SPARK-44791] [DBRRM-462][sc-139623][CONNECT] Make ArrowDeserializer work with REPL generated classes
  • [SPARK-44876] [DBRRM-480][sc-140431][PYTHON] Fix Arrow-optimized Python UDF on Spark Connect
  • [SPARK-44877] [DBRRM-482][sc-140437][CONNECT][python] Support python protobuf functions for Spark Connect
  • [SPARK-44882] [DBRRM-463][sc-140430][PYTHON][connect] Remove function uuid/random/chr from PySpark
  • [SPARK-44740] [DBRRM-462][sc-140320][CONNECT][follow] Fix metadata values for Artifacts
  • [SPARK-44822] [DBRRM-464][python][SQL] Make Python UDTFs by default non-deterministic
  • [SPARK-44836] [DBRRM-468][sc-140228][PYTHON] Refactor Arrow Python UDTF
  • [SPARK-44738] [DBRRM-462][sc-139347][PYTHON][connect] Add missing client metadata to calls
  • [SPARK-44722] [DBRRM-462][sc-139306][CONNECT] ExecutePlanResponseReattachableIterator._call_iter: AttributeError: 'NoneType' object has no attribute 'message'
  • [SPARK-44625] [DBRRM-396][sc-139535][CONNECT] SparkConnectExecutionManager to track all executions
  • [SPARK-44663] [SC-139020][dbrrm-420][PYTHON] Disable arrow optimization by default for Python UDTFs
  • [SPARK-44709] [DBRRM-396][sc-139250][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control
  • [SPARK-44656] [DBRRM-396][sc-138924][CONNECT] Make all iterators CloseableIterators
  • [SPARK-44671] [DBRRM-396][sc-138929][PYTHON][connect] Retry ExecutePlan in case initial request didn't reach server in Python client
  • [SPARK-44624] [DBRRM-396][sc-138919][CONNECT] Retry ExecutePlan in case initial request didn't reach server
  • [SPARK-44574] [DBRRM-396][sc-138288][SQL][connect] Errors that moved into sql/api should also use AnalysisException
  • [SPARK-44613] [DBRRM-396][sc-138473][CONNECT] Add Encoders object
  • [SPARK-44626] [DBRRM-396][sc-138828][SS][connect] Followup on streaming query termination when client session is timed out for Spark Connect
  • [SPARK-44642] [DBRRM-396][sc-138882][CONNECT] ReleaseExecute in ExecutePlanResponseReattachableIterator after it gets error from server
  • [SPARK-41400] [DBRRM-396][sc-138287][CONNECT] Remove Connect Client Catalyst Dependency
  • [SPARK-44664] [DBRRM-396][python][CONNECT] Release the execute when closing the iterator in Python client
  • [SPARK-44631] [DBRRM-396][sc-138823][CONNECT][core][14.0.0] Remove session-based directory when the isolated session cache is evicted
  • [SPARK-42941] [DBRRM-396][sc-138389][SS][connect] Python StreamingQueryListener
  • [SPARK-44636] [DBRRM-396][sc-138570][CONNECT] Leave no dangling iterators
  • [SPARK-44424] [DBRRM-396][connect][PYTHON][14.0.0] Python client for reattaching to existing execute in Spark Connect
  • [SPARK-44637] [SC-138571] Synchronize accesses to ExecuteResponseObserver
  • [SPARK-44538] [SC-138178][connect][SQL] Reinstate Row.jsonValue and friends
  • [SPARK-44421] [SC-138434][spark-44423][CONNECT] Reattachable execution in Spark Connect
  • [SPARK-44418] [SC-136807][python][CONNECT] Upgrade protobuf from 3.19.5 to 3.20.3
  • [SPARK-44587] [SC-138315][sql][CONNECT] Increase protobuf marshaller recursion limit
  • [SPARK-44591] [SC-138292][connect][SQL] Add jobTags to SparkListenerSQLExecutionStart
  • [SPARK-44610] [SC-138368][sql] DeduplicateRelations should retain Alias metadata when creating a new instance
  • [SPARK-44542] [SC-138323][core] Eagerly load SparkExitCode class in exception handler
  • [SPARK-44264] [SC-138143][python] E2E Testing for Deepspeed
  • [SPARK-43997] [SC-138347][connect] Add support for Java UDFs
  • [SPARK-44507] [SQL][connect][14.x][14.0] Move AnalysisException to sql/api
  • [SPARK-44453] [SC-137013][python] Use difflib to display errors in assertDataFrameEqual
  • [SPARK-44394] [SC-138291][connect][WEBUI][14.0] Add a Spark UI page for Spark Connect
  • [SPARK-44611] [SC-138415][connect] Do not exclude scala-xml
  • [SPARK-44531] [SC-138044][connect][SQL][14.x][14.0] Move encoder inference to sql/api
  • [SPARK-43744] [SC-138289][connect][14.x][14.0] Fix class loading problem cau…
  • [SPARK-44590] [SC-138296][sql][CONNECT] Remove the arrow batch record limit for SqlCommandResult
  • [SPARK-43968] [SC-138115][python] Improve error messages for Python UDTFs with wrong number of outputs
  • [SPARK-44432] [SC-138293][ss][CONNECT] Terminate streaming queries when a session times out in Spark Connect
  • [SPARK-44584] [SC-138295][connect] Set client_type information for AddArtifactsRequest and ArtifactStatusesRequest in Scala Client
  • [SPARK-44552] [14.0][sc-138176][SQL] Remove private object ParseState definition from IntervalUtils
  • [SPARK-43660] [SC-136183][connect][PS] Enable resample with Spark Connect
  • [SPARK-44287] [SC-136223][sql] Use PartitionEvaluator API in RowToColumnarExec & ColumnarToRowExec SQL operators.
  • [SPARK-39634] [SC-137566][sql] Allow file splitting in combination with row index generation
  • [SPARK-44533] [SC-138058][python] Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze
  • [SPARK-44479] [SC-138146][python] Fix ArrowStreamPandasUDFSerializer to accept no-column pandas DataFrame
  • [SPARK-44425] [SC-138177][connect] Validate that user provided sessionId is an UUID
  • [SPARK-44535] [SC-138038][connect][SQL] Move required Streaming API to sql/api
  • [SPARK-44264] [SC-136523][ml][PYTHON] Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor
  • [SPARK-42098] [SC-138164][sql] Fix ResolveInlineTables can not handle with RuntimeReplaceable expression
  • [SPARK-44060] [SC-135693][sql] Code-gen for build side outer shuffled hash join
  • [SPARK-44496] [SC-137682][sql][CONNECT] Move Interfaces needed by SCSC to sql/api
  • [SPARK-44532] [SC-137893][connect][SQL] Move ArrowUtils to sql/api
  • [SPARK-44413] [SC-137019][python] Clarify error for unsupported arg data type in assertDataFrameEqual
  • [SPARK-44530] [SC-138036][core][CONNECT] Move SparkBuildInfo to common/util
  • [SPARK-36612] [SC-133071][sql] Support left outer join build left or right outer join build right in shuffled hash join
  • [SPARK-44519] [SC-137728][connect] SparkConnectServerUtils generated incorrect parameters for jars
  • [SPARK-44449] [SC-137818][connect] Upcasting for direct Arrow Deserialization
  • [SPARK-44131] [SC-136346][sql] Add call_function and deprecate call_udf for Scala API
  • [SPARK-44541] [SQL] Remove useless function hasRangeExprAgainstEventTimeCol from UnsupportedOperationChecker
  • [SPARK-44523] [SC-137859][sql] Filter's maxRows/maxRowsPerPartition is 0 if condition is FalseLiteral
  • [SPARK-44540] [SC-137873][ui] Remove unused stylesheet and javascript files of jsonFormatter
  • [SPARK-44466] [SC-137856][sql] Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX from modifiedConfigs
  • [SPARK-44477] [SC-137508][sql] Treat TYPE_CHECK_FAILURE_WITH_HINT as an error subclass
  • [SPARK-44509] [SC-137855][python][CONNECT] Add job cancellation API set in Spark Connect Python client
  • [SPARK-44059] [SC-137023] Add analyzer support of named arguments for built-in functions
  • [SPARK-38476] [SC-136448][core] Use error class in org.apache.spark.storage
  • [SPARK-44486] [SC-137817][python][CONNECT] Implement PyArrow self_destruct feature for toPandas
  • [SPARK-44361] [SC-137200][sql] Use PartitionEvaluator API in MapInBatchExec
  • [SPARK-44510] [SC-137652][ui] Update dataTables to 1.13.5 and remove some unreached png files
  • [SPARK-44503] [SC-137808][sql] Add SQL grammar for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls
  • [SPARK-38477] [SC-136319][core] Use error class in org.apache.spark.shuffle
  • [SPARK-44299] [SC-136088][sql] Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
  • [SPARK-44422] [SC-137567][connect] Spark Connect fine grained interrupt
  • [SPARK-44380] [SC-137415][sql][PYTHON] Support for Python UDTF to analyze in Python
  • [SPARK-43923] [SC-137020][connect] Post listenerBus events durin…
  • [SPARK-44303] [SC-136108][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
  • [SPARK-44294] [SC-135885][ui] Fix HeapHistogram column shows unexpectedly w/ select-all-box
  • [SPARK-44409] [SC-136975][sql] Handle char/varchar in Dataset.to to keep consistent with others
  • [SPARK-44334] [SC-136576][sql][UI] Status in the REST API response for a failed DDL/DML with no jobs should be FAILED rather than COMPLETED
  • [SPARK-42309] [SC-136703][sql] Introduce INCOMPATIBLE_DATA_TO_TABLE and sub classes.
  • [SPARK-44367] [SC-137418][sql][UI] Show error message on UI for each failed query
  • [SPARK-44474] [SC-137195][connect] Reenable “Test observe response” at SparkConnectServiceSuite
  • [SPARK-44320] [SC-136446][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
  • [SPARK-44310] [SC-136055][connect] The Connect Server startup log should display the hostname and port
  • [SPARK-44309] [SC-136193][ui] Display Add/Remove Time of Executors on Executors Tab
  • [SPARK-42898] [SC-137556][sql] Mark that string/date casts do not need time zone id
  • [SPARK-44475] [SC-137422][sql][CONNECT] Relocate DataType and Parser to sql/api
  • [SPARK-44484] [SC-137562][ss] Add batchDuration to StreamingQueryProgress json method
  • [SPARK-43966] [SC-137559][sql][PYTHON] Support non-deterministic table-valued functions
  • [SPARK-44439] [SC-136973][connect][SS] Fixed listListeners to only send ids back to client
  • [SPARK-44341] [SC-137054][sql][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec
  • [SPARK-43839] [SC-132680][sql] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL
  • [SPARK-44244] [SC-135703][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[2305-2309]
  • [SPARK-44201] [SC-136778][connect][SS] Add support for Streaming Listener in Scala for Spark Connect
  • [SPARK-44260] [SC-135618][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[1215-1245-2329] & Use checkError() to check Exception in _CharVarchar_Suite
  • [SPARK-42454] [SC-136913][sql] SPJ: encapsulate all SPJ related parameters in BatchScanExec
  • [SPARK-44292] [SC-135844][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]
  • [SPARK-44396] [SC-137221][connect] Direct Arrow Deserialization
  • [SPARK-44324] [SC-137172][sql][CONNECT] Move CaseInsensitiveMap to sql/api
  • [SPARK-44395] [SC-136744][sql] Add test back to StreamingTableSuite
  • [SPARK-44481] [SC-137401][connect][PYTHON] Make pyspark.sql.is_remote an API
  • [SPARK-44278] [SC-137400][connect] Implement a GRPC server interceptor that cleans up thread local properties
  • [SPARK-44264] [SC-137211][ml][PYTHON] Support Distributed Training of Functions Using Deepspeed
  • [SPARK-44430] [SC-136970][sql] Add cause to AnalysisException when option is invalid
  • [SPARK-44264] [SC-137167][ml][PYTHON] Incorporating FunctionPickler Into TorchDistributor
  • [SPARK-44216] [SC-137046] [PYTHON] Make assertSchemaEqual API public
  • [SPARK-44398] [SC-136720][connect] Scala foreachBatch API
  • [SPARK-43203] [SC-134528][sql] Move all Drop Table case to DataSource V2
  • [SPARK-43755] [SC-137171][connect][MINOR] Open AdaptiveSparkPlanHelper.allChildren instead of using copy in MetricGenerator
  • [SPARK-44264] [SC-137187][ml][PYTHON] Refactoring TorchDistributor To Allow for Custom “run_training_on_file” Function Pointer
  • [SPARK-43755] [SC-136838][connect] Move execution out of SparkExecutePlanStreamHandler and to a different thread
  • [SPARK-44411] [SC-137198][sql] Use PartitionEvaluator API in ArrowEvalPythonExec and BatchEvalPythonExec
  • [SPARK-44375] [SC-137197][sql] Use PartitionEvaluator API in DebugExec
  • [SPARK-43967] [SC-137057][python] Support regular Python UDTFs with empty return values
  • [SPARK-43915] [SC-134766][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[2438-2445]
  • [SPARK-43965] [SC-136929][python][CONNECT] Support Python UDTF in Spark Connect
  • [SPARK-44154] [SC-137050][sql] Added more unit tests to BitmapExpressionUtilsSuite and made minor improvements to Bitmap Aggregate Expressions
  • [SPARK-44169] [SC-135497][sql] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]
  • [SPARK-44353] [SC-136578][connect][SQL] Remove StructType.toAttributes
  • [SPARK-43964] [SC-136676][sql][PYTHON] Support arrow-optimized Python UDTFs
  • [SPARK-44321] [SC-136308][connect] Decouple ParseException from AnalysisException
  • [SPARK-44348] [SAS-1910][sc-136644][CORE][connect][PYTHON] Reenable test_artifact with relevant changes
  • [SPARK-44145] [SC-136698][sql] Callback when ready for execution
  • [SPARK-43983] [SC-136404][python][ML][connect] Enable cross validator estimator test
  • [SPARK-44399] [SC-136669][python][CONNECT] Import SparkSession in Python UDF only when useArrow is None
  • [SPARK-43631] [SC-135300][connect][PS] Enable Series.interpolate with Spark Connect
  • [SPARK-44374] [SC-136544][python][ML] Add example code for distributed ML for spark connect
  • [SPARK-44282] [SC-135948][connect] Prepare DataType parsing for use in Spark Connect Scala Client
  • [SPARK-44052] [SC-134469][connect][PS] Add util to get proper Column or DataFrame class for Spark Connect.
  • [SPARK-43983] [SC-136404][python][ML][connect] Implement cross validator estimator
  • [SPARK-44290] [SC-136300][connect] Session-based files and archives in Spark Connect
  • [SPARK-43710] [SC-134860][ps][CONNECT] Support functions.date_part for Spark Connect
  • [SPARK-44036] [SC-134036][connect][PS] Cleanup & consolidate tickets to simplify the tasks.
  • [SPARK-44150] [SC-135790][python][CONNECT] Explicit Arrow casting for mismatched return type in Arrow Python UDF
  • [SPARK-43903] [SC-134754][python][CONNECT] Improve ArrayType input support in Arrow Python UDF
  • [SPARK-44250] [SC-135819][ml][PYTHON][connect] Implement classification evaluator
  • [SPARK-44255] [SC-135704][sql] Relocate StorageLevel to common/utils
  • [SPARK-42169] [SC-135735] [SQL] Implement code generation for to_csv function (StructsToCsv)
  • [SPARK-44249] [SC-135719][sql][PYTHON] Refactor PythonUDTFRunner to send its return type separately
  • [SPARK-43353] [SC-132734][python] Migrate remaining session errors into error class
  • [SPARK-44133] [SC-134795][python] Upgrade MyPy from 0.920 to 0.982
  • [SPARK-42941] [SC-134707][ss][CONNECT][1/2] StreamingQueryListener - Event Serde in JSON format
  • [SPARK-43353] Revert “[SC-132734][es-729763][PYTHON] Migrate remaining session errors into error class”
  • [SPARK-44100] [SC-134576][ml][CONNECT][python] Move namespace from pyspark.mlv2 to pyspark.ml.connect
  • [SPARK-44220] [SC-135484][sql] Move StringConcat to sql/api
  • [SPARK-43992] [SC-133645][sql][PYTHON][connect] Add optional pattern for Catalog.listFunctions
  • [SPARK-43982] [SC-134529][ml][PYTHON][connect] Implement pipeline estimator for ML on spark connect
  • [SPARK-43888] [SC-132893][core] Relocate Logging to common/utils
  • [SPARK-42941] Revert “[SC-134707][ss][CONNECT][1/2] StreamingQueryListener - Event Serde in JSON format”
  • [SPARK-43624] [SC-134557][ps][CONNECT] Add EWM to SparkConnectPlanner.
  • [SPARK-43981] [SC-134137][python][ML] Basic saving / loading implementation for ML on spark connect
  • [SPARK-43205] [SC-133371][sql] fix SQLQueryTestSuite
  • [SPARK-43376] Revert “[SC-130433][sql] Improve reuse subquery with table cache”
  • [SPARK-44040] [SC-134366][sql] Fix compute stats when AggregateExec node above QueryStageExec
  • [SPARK-43919] [SC-133374][sql] Extract JSON functionality out of Row
  • [SPARK-42618] [SC-134433][python][PS] Warning for the pandas-related behavior changes in next major release
  • [SPARK-43893] [SC-133381][python][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF
  • [SPARK-43627] [SC-134290][spark-43626][PS][connect] Enable pyspark.pandas.spark.functions.{kurt, skew} in Spark Connect.
  • [SPARK-43798] [SC-133990][sql][PYTHON] Support Python user-defined table functions
  • [SPARK-43616] [SC-133849][ps][CONNECT] Enable pyspark.pandas.spark.functions.mode in Spark Connect
  • [SPARK-43133] [SC-133728] Scala Client DataStreamWriter Foreach support
  • [SPARK-43684] [SC-134107][spark-43685][SPARK-43686][spark-43691][CONNECT][ps] Fix (NullOps|NumOps).(eq|ne) for Spark Connect.
  • [SPARK-43645] [SC-134151][spark-43622][PS][connect] Enable pyspark.pandas.spark.functions.{var, stddev} in Spark Connect
  • [SPARK-43617] [SC-133893][ps][CONNECT] Enable pyspark.pandas.spark.functions.product in Spark Connect
  • [SPARK-43610] [SC-133832][connect][PS] Enable InternalFrame.attach_distributed_column in Spark Connect.
  • [SPARK-43621] [SC-133852][ps][CONNECT] Enable pyspark.pandas.spark.functions.repeat in Spark Connect
  • [SPARK-43921] [SC-133461][protobuf] Generate Protobuf descriptor files at build time
  • [SPARK-43613] [SC-133727][ps][CONNECT] Enable pyspark.pandas.spark.functions.covar in Spark Connect
  • [SPARK-43376] [SC-130433][sql] Improve reuse subquery with table cache
  • [SPARK-43612] [SC-132011][connect][PYTHON] Implement SparkSession.addArtifact(s) in Python client
  • [SPARK-43920] [SC-133611][sql][CONNECT] Create sql/api module
  • [SPARK-43097] [SC-133372][ml] New pyspark ML logistic regression estimator implemented on top of distributor
  • [SPARK-43783] [SC-133240][spark-43784][SPARK-43788][ml] Make MLv2 (ML on spark connect) supports pandas >= 2.0
  • [SPARK-43024] [SC-132716][python] Upgrade pandas to 2.0.0
  • [SPARK-43881] [SC-133140][sql][PYTHON][connect] Add optional pattern for Catalog.listDatabases
  • [SPARK-39281] [SC-131422][sql] Speed up Timestamp type inference with legacy format in JSON/CSV data source
  • [SPARK-43792] [SC-132887][sql][PYTHON][connect] Add optional pattern for Catalog.listCatalogs
  • [SPARK-43132] [SC-131623] [SS] [CONNECT] Python Client DataStreamWriter foreach() API
  • [SPARK-43545] [SC-132378][sql][PYTHON] Support nested timestamp type
  • [SPARK-43353] [SC-132734][python] Migrate remaining session errors into error class
  • [SPARK-43304] [SC-129969][connect][PYTHON] Migrate NotImplementedError into PySparkNotImplementedError
  • [SPARK-43516] [SC-132202][ml][PYTHON][connect] Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator
  • [SPARK-43128] Revert “[SC-131628][connect][SS] Make recentProgress and lastProgress return StreamingQueryProgress consistent with the native Scala Api”
  • [SPARK-43543] [SC-131839][python] Fix nested MapType behavior in Pandas UDF
  • [SPARK-38469] [SC-131425][core] Use error class in org.apache.spark.network
  • [SPARK-43309] [SC-129746][spark-38461][CORE] Extend INTERNAL_ERROR with categories and add error class INTERNAL_ERROR_BROADCAST
  • [SPARK-43265] [SC-129653] Move Error framework to a common utils module
  • [SPARK-43440] [SC-131229][python][CONNECT] Support registration of an Arrow-optimized Python UDF
  • [SPARK-43528] [SC-131531][sql][PYTHON] Support duplicated field names in createDataFrame with pandas DataFrame
  • [SPARK-43412] [SC-130990][python][CONNECT] Introduce SQL_ARROW_BATCHED_UDF EvalType for Arrow-optimized Python UDFs
  • [SPARK-40912] [SC-130986][core] Overhead of Exceptions in KryoDeserializationStream
  • [SPARK-39280] [SC-131206][sql] Speed up Timestamp type inference with user-provided format in JSON/CSV data source
  • [SPARK-43473] [SC-131372][python] Support struct type in createDataFrame from pandas DataFrame
  • [SPARK-43443] [SC-131024][sql] Add benchmark for Timestamp type inference when use invalid value
  • [SPARK-41532] [SC-130523][connect][CLIENT] Add check for operations that involve multiple data frames
  • [SPARK-43296] [SC-130627][connect][PYTHON] Migrate Spark Connect session errors into error class
  • [SPARK-43324] [SC-130455][sql] Handle UPDATE commands for delta-based sources
  • [SPARK-43347] [SC-130148][python] Remove Python 3.7 Support
  • [SPARK-43292] [SC-130525][core][CONNECT] Move ExecutorClassLoader to core module and simplify Executor#addReplClassLoaderIfNeeded
  • [SPARK-43081] [SC-129900] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data
  • [SPARK-43331] [SC-130061][connect] Add Spark Connect SparkSession.interruptAll
  • [SPARK-43306] [SC-130320][python] Migrate ValueError from Spark SQL types into error class
  • [SPARK-43261] [SC-129674][python] Migrate TypeError from Spark SQL types into error class.
  • [SPARK-42992] [SC-129465][python] Introduce PySparkRuntimeError
  • [SPARK-16484] [SC-129975][sql] Add support for Datasketches HllSketch
  • [SPARK-43165] [SC-128823][sql] Move canWrite to DataTypeUtils
  • [SPARK-43082] [SC-129112][connect][PYTHON] Arrow-optimized Python UDFs in Spark Connect
  • [SPARK-43084] [SC-128654] [SS] Add applyInPandasWithState support for spark connect
  • [SPARK-42657] [SC-128621][connect] Support to find and transfer client-side REPL classfiles to server as artifacts
  • [SPARK-43098] [SC-77059][sql] Fix correctness COUNT bug when scalar subquery has group by clause
  • [SPARK-42884] [SC-126662][connect] Add Ammonite REPL integration
  • [SPARK-42994] [SC-128333][ml][CONNECT] PyTorch Distributor support Local Mode
  • [SPARK-41498] [SC-125343] Revert “Propagate metadata through Union”
  • [SPARK-42993] [SC-127829][ml][CONNECT] Make PyTorch Distributor compatible with Spark Connect
  • [SPARK-42683] [LC-75] Automatically rename conflicting metadata columns
  • [SPARK-42874] [SC-126442][sql] Enable new golden file test framework for analysis for all input files
  • [SPARK-42779] [SC-126042][sql] Allow V2 writes to indicate advisory shuffle partition size
  • [SPARK-42891] [SC-126458][connect][PYTHON] Implement CoGrouped Map API
  • [SPARK-42791] [SC-126134][sql] Create a new golden file test framework for analysis
  • [SPARK-42615] [SC-124237][connect][PYTHON] Refactor the AnalyzePlan RPC and add session.version
  • [SPARK-41302] Revert “[ALL TESTS][sc-122423][SQL] Assign name to _LEGACY_ERROR_TEMP_1185”
  • [SPARK-40770] [SC-122652][python] Improved error messages for applyInPandas for schema mismatch
  • [SPARK-40770] Revert “[ALL TESTS][sc-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch”
  • [SPARK-42398] [SC-123500][sql] Refine default column value DS v2 interface
  • [SPARK-40770] [ALL TESTS][sc-122652][PYTHON] Improved error messages for applyInPandas for schema mismatch
  • [SPARK-40770] Revert “[SC-122652][python] Improved error messages for applyInPandas for schema mismatch”
  • [SPARK-40770] [SC-122652][python] Improved error messages for applyInPandas for schema mismatch
  • [SPARK-42038] [ALL TESTS] Revert “Revert “[SC-122533][sql] SPJ: Support partially clustered distribution””
  • [SPARK-42038] Revert “[SC-122533][sql] SPJ: Support partially clustered distribution”
  • [SPARK-42038] [SC-122533][sql] SPJ: Support partially clustered distribution
  • [SPARK-40550] [SC-120989][sql] DataSource V2: Handle DELETE commands for delta-based sources
  • [SPARK-40770] Revert “[SC-122652][python] Improved error messages for applyInPandas for schema mismatch”
  • [SPARK-40770] [SC-122652][python] Improved error messages for applyInPandas for schema mismatch
  • [SPARK-41302] Revert “[SC-122423][sql] Assign name to _LEGACY_ERROR_TEMP_1185”
  • [SPARK-40550] Revert “[SC-120989][sql] DataSource V2: Handle DELETE commands for delta-based sources”
  • [SPARK-42123] Revert “[SC-121453][sql] Include column default values in DESCRIBE and SHOW CREATE TABLE output”
  • [SPARK-42146] [SC-121172][core] Refactor Utils#setStringField to make maven build pass when sql module use this method
  • [SPARK-42119] Revert “[SC-121342][sql] Add built-in table-valued functions inline and inline_outer”

Highlights

  • Fix aes_decrypt and ln functions in Connect SPARK-45109
  • Fix inherited named tuples to work in createDataFrame SPARK-44980
  • CodeGenerator Cache is now classloader-specific SPARK-44795
  • Added SparkListenerConnectOperationStarted.planRequest SPARK-44861
  • Make Streaming Queries work with Connect's artifact management SPARK-44794
  • ArrowDeserializer works with REPL generated classes SPARK-44791
  • Fixed Arrow-optimized Python UDF on Spark Connect SPARK-44876
  • Scala and Go client support in Spark Connect SPARK-42554 SPARK-43351
  • PyTorch-based distributed ML Support for Spark Connect SPARK-42471
  • Structured Streaming support for Spark Connect in Python and Scala SPARK-42938
  • Pandas API support for the Python Spark Connect Client SPARK-42497
  • Introduce Arrow Python UDFs SPARK-40307
  • Support Python user-defined table functions SPARK-43798
  • Migrate PySpark errors onto error classes SPARK-42986
  • PySpark Test Framework SPARK-44042
  • Add support for Datasketches HllSketch SPARK-16484
  • Built-in SQL Function Improvement SPARK-41231
  • IDENTIFIER clause SPARK-43205
  • Add SQL functions into Scala, Python and R API SPARK-43907
  • Add named argument support for SQL functions SPARK-43922
  • Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated SPARK-41469
  • Distributed ML support for Spark Connect SPARK-42471
  • DeepSpeed Distributor SPARK-44264
  • Implement changelog checkpointing for RocksDB state store SPARK-43421
  • Introduce watermark propagation among operators SPARK-42376
  • Introduce dropDuplicatesWithinWatermark SPARK-42931
  • RocksDB state store provider memory management enhancements SPARK-43311
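
The Python UDTF entry in the list above (SPARK-43798) refers to handler classes whose eval method yields zero or more output rows per call. The following is a minimal sketch of that shape in plain Python, so it runs without a Spark session. The class name SquareNumbers and its arguments are illustrative assumptions and do not come from these release notes.

```python
class SquareNumbers:
    """Hypothetical UDTF-style handler: one output row per integer in [start, end]."""

    def eval(self, start: int, end: int):
        # Each yielded tuple stands for one row of the returned table.
        for num in range(start, end + 1):
            yield (num, num * num)


# Calling the handler directly shows the rows it would produce.
rows = list(SquareNumbers().eval(1, 3))
print(rows)
```

In PySpark 3.5, a class like this is typically wrapped with the udtf decorator from pyspark.sql.functions together with a returnType schema string such as "num: int, squared: int". See Python user-defined table functions (UDTFs) for the supported options.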

Spark Connect

  • Refactored the sql module into sql and sql-api to produce a minimal set of dependencies that can be shared between the Scala Spark Connect client and Spark, which avoids pulling in all of Spark's transitive dependencies. SPARK-44273
  • Introducing the Scala client for Spark Connect SPARK-42554
  • Pandas API support for the Python Spark Connect Client SPARK-42497
  • PyTorch-based distributed ML Support for Spark Connect SPARK-42471
  • Structured Streaming support for Spark Connect in Python and Scala SPARK-42938
  • Initial version of the Go client SPARK-43351
  • Many compatibility improvements between Spark native and the Spark Connect clients across Python and Scala
  • Improved debuggability and request handling for client applications (asynchronous processing, retries, long-lived queries)

Spark SQL

Features

  • Add metadata column file block start and length SPARK-42423
  • Support positional parameters in Scala/Java sql() SPARK-44066
  • Add named parameter support in parser for function calls SPARK-43922
  • Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation SPARK-43071
  • Add SQL grammar for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls SPARK-44503
  • Include column default values in DESCRIBE and SHOW CREATE TABLE output SPARK-42123
  • Add optional pattern for Catalog.listCatalogs SPARK-43792
  • Add optional pattern for Catalog.listDatabases SPARK-43881
  • Callback when ready for execution SPARK-44145
  • Support Insert By Name statement SPARK-42750
  • Add call_function for Scala API SPARK-44131
  • Stable derived column aliases SPARK-40822
  • Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values SPARK-43529
  • Support subqueries with correlation through INTERSECT/EXCEPT SPARK-36124
  • IDENTIFIER clause SPARK-43205
  • ANSI MODE: Conv should return an error if the internal conversion overflows SPARK-42427

Functions

  • Add support for Datasketches HllSketch SPARK-16484
  • Support the CBC mode by aes_encrypt()/aes_decrypt() SPARK-43038
  • Support TABLE argument parser rule for TableValuedFunction SPARK-44200
  • Implement bitmap functions SPARK-44154
  • Add the try_aes_decrypt() function SPARK-42701
  • array_insert should fail with 0 index SPARK-43011
  • Add to_varchar alias for to_char SPARK-43815
  • High-order function: array_compact implementation SPARK-41235
  • Add analyzer support of named arguments for built-in functions SPARK-44059
  • Add NULLs for INSERTs with user-specified lists of fewer columns than the target table SPARK-42521
  • Adds support for aes_encrypt IVs and AAD SPARK-43290
  • DECODE function returns wrong results when passed NULL SPARK-41668
  • Support udf 'luhn_check' SPARK-42191
  • Support implicit lateral column alias resolution on Aggregate SPARK-41631
  • Support implicit lateral column alias in queries with Window SPARK-42217
  • Add 3-args function aliases DATE_ADD and DATE_DIFF SPARK-43492
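
The luhn_check function listed above (SPARK-42191) validates a string of digits against the Luhn checksum. As a rough illustration of what that checksum computes, here is a plain Python sketch. It is not Spark's implementation, and the helper name luhn_is_valid is hypothetical.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if `number` is a non-empty ASCII digit string passing the Luhn check."""
    if not number or any(ch not in "0123456789" for ch in number):
        return False
    total = 0
    # Walk the digits from right to left, doubling every second one.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("79927398713"))
print(luhn_is_valid("79927398710"))
```

The first call uses a commonly cited valid test number. The second changes only the final check digit, so it fails the check.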

Data Sources

  • Char/Varchar Support for JDBC Catalog SPARK-42904
  • Support getting SQL keywords dynamically through the JDBC API and TVF SPARK-43119
  • DataSource V2: Handle MERGE commands for delta-based sources SPARK-43885
  • DataSource V2: Handle MERGE commands for group-based sources SPARK-43963
  • DataSource V2: Handle UPDATE commands for group-based sources SPARK-43975
  • DataSource V2: Allow representing updates as deletes and inserts SPARK-43775
  • Allow jdbc dialects to override the query used to create a table SPARK-41516
  • SPJ: Support partially clustered distribution SPARK-42038
  • DSv2 allows CTAS/RTAS to reserve schema nullability SPARK-43390
  • Add spark.sql.files.maxPartitionNum SPARK-44021
  • Handle UPDATE commands for delta-based sources SPARK-43324
  • Allow V2 writes to indicate advisory shuffle partition size SPARK-42779
  • Support lz4raw compression codec for Parquet SPARK-43273
  • Avro: writing complex unions SPARK-25050
  • Speed up Timestamp type inference with user-provided format in JSON/CSV data source SPARK-39280
  • Avro: support custom decimal type backed by Long SPARK-43901
  • Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but join expressions are compatible SPARK-41413
  • Change binary to unsupported dataType in CSV format SPARK-42237
  • Allow Avro to convert union type to SQL with field name stable with type SPARK-43333
  • Speed up Timestamp type inference with legacy format in JSON/CSV data source SPARK-39281

Query Optimization

  • Subexpression elimination support shortcut expression SPARK-42815
  • Improve join stats estimation if one side can keep uniqueness SPARK-39851
  • Introduce the group limit of Window for rank-based filter to optimize top-k computation SPARK-37099
  • Fix behavior of null IN (empty list) in optimization rules SPARK-44431
  • Infer and push down window limit through window if partitionSpec is empty SPARK-41171
  • Remove the outer join if they are all distinct aggregate functions SPARK-42583
  • Collapse two adjacent windows with the same partition/order in subquery SPARK-42525
  • Push down limit through Python UDFs SPARK-42115
  • Optimize the order of filtering predicates SPARK-40045

Code Generation and Query Execution

  • Runtime filter should support multi-level shuffle join side as filter creation side SPARK-41674
  • Codegen Support for HiveSimpleUDF SPARK-42052
  • Codegen Support for HiveGenericUDF SPARK-42051
  • Codegen Support for build side outer shuffled hash join SPARK-44060
  • Implement code generation for to_csv function (StructsToCsv) SPARK-42169
  • Make AQE support InMemoryTableScanExec SPARK-42101
  • Support left outer join build left or right outer join build right in shuffled hash join SPARK-36612
  • Respect RequiresDistributionAndOrdering in CTAS/RTAS SPARK-43088
  • Coalesce buckets in join applied on broadcast join stream side SPARK-43107
  • Set nullable correctly on coalesced join key in full outer USING join SPARK-44251
  • Fix IN subquery ListQuery nullability SPARK-43413

Other Notable Changes

  • Set nullable correctly for keys in USING joins SPARK-43718
  • Fix COUNT(*) is null bug in correlated scalar subquery SPARK-43156
  • Dataframe.joinWith outer-join should return a null value for unmatched row SPARK-37829
  • Automatically rename conflicting metadata columns SPARK-42683
  • Document the Spark SQL error classes in user-facing documentation SPARK-42706

PySpark

Features

Other Notable Changes

  • Add autocomplete support for df[|] in pyspark.sql.dataframe.DataFrame SPARK-43892
  • Deprecate & remove the APIs that will be removed in pandas 2.0 SPARK-42593
  • Make Python the first tab for code examples - Spark SQL, DataFrames and Datasets Guide SPARK-42493
  • Updating remaining Spark documentation code examples to show Python by default SPARK-42642
  • Use deduplicated field names when creating Arrow RecordBatch SPARK-41971
  • Support duplicated field names in createDataFrame with pandas DataFrame SPARK-43528
  • Allow columns parameter when creating DataFrame with Series SPARK-42194

Core

  • Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks SPARK-40082
  • Introduce PartitionEvaluator for SQL operator execution SPARK-43061
  • Allow ShuffleDriverComponent to declare if shuffle data is reliably stored SPARK-42689
  • Add max attempts limitation for stages to avoid potential infinite retry SPARK-42577
  • Support log level configuration with static Spark conf SPARK-43782
  • Optimize PercentileHeap SPARK-42528
  • Add reason argument to TaskScheduler.cancelTasks SPARK-42602
  • Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated SPARK-41469
  • Fixing accumulator undercount in the case of the retry task with rdd cache SPARK-41497
  • Use RocksDB for spark.history.store.hybridStore.diskBackend by default SPARK-42277
  • NonFateSharingCache wrapper for Guava Cache SPARK-43300
  • Improve the performance of MapOutputTracker.updateMapOutput SPARK-43043
  • Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service SPARK-43179
  • Add SPARK_DRIVER_POD_IP env variable to executor pods SPARK-42769
  • Mounts the hadoop config map on the executor pod SPARK-43504

Structured Streaming

  • Add support for tracking pinned blocks memory usage for RocksDB state store SPARK-43120
  • Add RocksDB state store provider memory management enhancements SPARK-43311
  • Introduce dropDuplicatesWithinWatermark SPARK-42931
  • Introduce a new callback onQueryIdle() to StreamingQueryListener SPARK-43183
  • Add option to skip commit coordinator as part of StreamingWrite API for DSv2 sources/sinks SPARK-42968
  • Implement Changelog based Checkpointing for RocksDB State Store Provider SPARK-43421
  • Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators SPARK-42792
  • Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming SPARK-42819
  • RocksDB StateStore lock acquisition should happen after getting input iterator from inputRDD SPARK-42566
  • Introduce watermark propagation among operators SPARK-42376
  • Cleanup orphan sst and log files in RocksDB checkpoint directory SPARK-42353
  • Expand QueryTerminatedEvent to contain error class if it exists in exception SPARK-43482

ML

  • Support Distributed Training of Functions Using DeepSpeed SPARK-44264
  • Base interfaces of sparkML for spark3.5: estimator/transformer/model/evaluator SPARK-43516
  • Make MLv2 (ML on spark connect) support pandas >= 2.0 SPARK-43783
  • Update MLv2 Transformer interfaces SPARK-43516
  • New pyspark ML logistic regression estimator implemented on top of distributor SPARK-43097
  • Add Classifier.getNumClasses back SPARK-42526
  • Write a DeepSpeed Distributed Learning Class DeepspeedTorchDistributor SPARK-44264
  • Basic saving / loading implementation for ML on spark connect SPARK-43981
  • Improve logistic regression model saving SPARK-43097
  • Implement pipeline estimator for ML on spark connect SPARK-43982
  • Implement cross validator estimator SPARK-43983
  • Implement classification evaluator SPARK-44250
  • Make PyTorch Distributor compatible with Spark Connect SPARK-42993

UI

  • Add a Spark UI page for Spark Connect SPARK-44394
  • Support Heap Histogram column in Executors tab SPARK-44153
  • Show error message on UI for each failed query SPARK-44367
  • Display Add/Remove Time of Executors on Executors Tab SPARK-44309

Build and Others

Removals, Behavior Changes and Deprecations

Upcoming Removal

The following features will be removed in the next Spark major release:

  • Support for Java 8 and Java 11. The minimum supported Java version will be Java 17.
  • Support for Scala 2.12. The minimum supported Scala version will be 2.13.

Migration Guides

Databricks ODBC/JDBC driver support

Databricks supports ODBC/JDBC drivers released in the past 2 years. Download the most recently released drivers and upgrade (download ODBC, download JDBC).

System environment

  • Operating System: Ubuntu 22.04.3 LTS
  • Java: Zulu 8.70.0.23-CA-linux64
  • Scala: 2.12.15
  • Python: 3.10.12
  • R: 4.3.1
  • Delta Lake: 2.4.0

Installed Python libraries

Library Version Library Version Library Version
anyio 3.5.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0
asttokens 2.0.5 attrs 22.1.0 backcall 0.2.0
beautifulsoup4 4.11.1 black 22.6.0 bleach 4.1.0
blinker 1.4 boto3 1.24.28 botocore 1.27.96
certifi 2022.12.7 cffi 1.15.1 chardet 4.0.0
charset-normalizer 2.0.4 click 8.0.4 comm 0.1.2
contourpy 1.0.5 cryptography 39.0.1 cycler 0.11.0
Cython 0.29.32 databricks-sdk 0.1.6 dbus-python 1.2.18
debugpy 1.6.7 decorator 5.1.1 defusedxml 0.7.1
distlib 0.3.7 docstring-to-markdown 0.11 entrypoints 0.4
executing 0.8.3 facets-overview 1.1.1 fastjsonschema 2.18.0
filelock 3.12.2 fonttools 4.25.0 GCC runtime library 1.10.0
googleapis-common-protos 1.60.0 grpcio 1.48.2 grpcio-status 1.48.1
httplib2 0.20.2 idna 3.4 importlib-metadata 4.6.4
ipykernel 6.25.0 ipython 8.14.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 jedi 0.18.1 jeepney 0.7.1
Jinja2 3.1.2 jmespath 0.10.0 joblib 1.2.0
jsonschema 4.17.3 jupyter-client 7.3.4 jupyter-server 1.23.4
jupyter_core 5.2.0 jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0
keyring 23.5.0 kiwisolver 1.4.4 launchpadlib 1.10.16
lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lxml 4.9.1
MarkupSafe 2.1.1 matplotlib 3.7.0 matplotlib-inline 0.1.6
mccabe 0.7.0 mistune 0.8.4 more-itertools 8.10.0
mypy-extensions 0.4.3 nbclassic 0.5.2 nbclient 0.5.13
nbconvert 6.5.4 nbformat 5.7.0 nest-asyncio 1.5.6
nodeenv 1.8.0 notebook 6.5.2 notebook_shim 0.2.2
numpy 1.23.5 oauthlib 3.2.0 packaging 22.0
pandas 1.5.3 pandocfilters 1.5.0 parso 0.8.3
pathspec 0.10.3 patsy 0.5.3 pexpect 4.8.0
pickleshare 0.7.5 Pillow 9.4.0 pip 22.3.1
platformdirs 2.5.2 plotly 5.9.0 pluggy 1.0.0
prometheus-client 0.14.1 prompt-toolkit 3.0.36 protobuf 4.24.0
psutil 5.9.0 psycopg2 2.9.3 ptyprocess 0.7.0
pure-eval 0.2.2 pyarrow 8.0.0 pycparser 2.21
pydantic 1.10.6 pyflakes 3.0.1 Pygments 2.11.2
PyGObject 3.42.1 PyJWT 2.3.0 pyodbc 4.0.32
pyparsing 3.0.9 pyright 1.1.294 pyrsistent 0.18.0
python-dateutil 2.8.2 python-lsp-jsonrpc 1.0.0 python-lsp-server 1.7.1
pytoolconfig 1.2.5 pytz 2022.7 pyzmq 23.2.0
requests 2.28.1 rope 1.7.0 s3transfer 0.6.1
scikit-learn 1.1.1 seaborn 0.12.2 SecretStorage 3.3.1
Send2Trash 1.8.0 setuptools 65.6.3 six 1.16.0
sniffio 1.2.0 soupsieve 2.3.2.post1 ssh-import-id 5.11
stack-data 0.2.0 statsmodels 0.13.5 tenacity 8.1.0
terminado 0.17.1 threadpoolctl 2.2.0 tinycss2 1.2.1
tokenize-rt 4.2.1 tomli 2.0.1 tornado 6.1
traitlets 5.7.1 typing_extensions 4.4.0 ujson 5.4.0
unattended-upgrades 0.1 urllib3 1.26.14 virtualenv 20.16.7
wadllib 1.3.6 wcwidth 0.2.5 webencodings 0.5.1
websocket-client 0.58.0 whatthepatch 1.0.2 wheel 0.38.4
widgetsnbextension 3.6.1 yapf 0.31.0 zipp 1.0.0
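
To compare an environment against the versions listed above, one option is the standard library's importlib.metadata. The sketch below is an illustration only: the helper name find_mismatches and the EXPECTED dictionary are hypothetical, and only three entries from the table are copied in.

```python
from importlib import metadata

# A few entries copied from the table above; extend as needed.
EXPECTED = {"numpy": "1.23.5", "pandas": "1.5.3", "pyarrow": "8.0.0"}


def find_mismatches(expected, lookup=metadata.version):
    """Return {name: (expected, installed_or_None)} for each library that differs."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None  # the library is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


# An empty dict means every listed library matches the table.
print(find_mismatches(EXPECTED))
```

The lookup parameter exists only so the comparison can be exercised without the real packages installed. By default it queries the current Python environment.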

Installed R libraries

R libraries are installed from the Posit Package Manager CRAN snapshot on 2023-07-13.

Library Version Library Version Library Version
arrow 12.0.1 askpass 1.1 assertthat 0.2.1
backports 1.4.1 base 4.3.1 base64enc 0.1-3
bit 4.0.5 bit64 4.0.5 blob 1.2.4
boot 1.3-28 brew 1.0-8 brio 1.1.3
broom 1.0.5 bslib 0.5.0 cachem 1.0.8
callr 3.7.3 caret 6.0-94 cellranger 1.1.0
chron 2.3-61 class 7.3-22 cli 3.6.1
clipr 0.8.0 clock 0.7.0 cluster 2.1.4
codetools 0.2-19 colorspace 2.1-0 commonmark 1.9.0
compiler 4.3.1 config 0.3.1 conflicted 1.2.0
cpp11 0.4.4 crayon 1.5.2 credentials 1.3.2
curl 5.0.1 data.table 1.14.8 datasets 4.3.1
DBI 1.1.3 dbplyr 2.3.3 desc 1.4.2
devtools 2.4.5 diagram 1.6.5 diffobj 0.3.5
digest 0.6.33 downlit 0.4.3 dplyr 1.1.2
dtplyr 1.3.1 e1071 1.7-13 ellipsis 0.3.2
evaluate 0.21 fansi 1.0.4 farver 2.1.1
fastmap 1.1.1 fontawesome 0.5.1 forcats 1.0.0
foreach 1.5.2 foreign 0.8-82 forge 0.2.0
fs 1.6.2 future 1.33.0 future.apply 1.11.0
gargle 1.5.1 generics 0.1.3 gert 1.9.2
ggplot2 3.4.2 gh 1.4.0 gitcreds 0.1.2
glmnet 4.1-7 globals 0.16.2 glue 1.6.2
googledrive 2.1.1 googlesheets4 1.1.1 gower 1.0.1
graphics 4.3.1 grDevices 4.3.1 grid 4.3.1
gridExtra 2.3 gsubfn 0.7 gtable 0.3.3
hardhat 1.3.0 haven 2.5.3 highr 0.10
hms 1.1.3 htmltools 0.5.5 htmlwidgets 1.6.2
httpuv 1.6.11 httr 1.4.6 httr2 0.2.3
ids 1.0.1 ini 0.3.1 ipred 0.9-14
isoband 0.2.7 iterators 1.0.14 jquerylib 0.1.4
jsonlite 1.8.7 KernSmooth 2.23-21 knitr 1.43
labeling 0.4.2 later 1.3.1 lattice 0.21-8
lava 1.7.2.1 lifecycle 1.0.3 listenv 0.9.0
lubridate 1.9.2 magrittr 2.0.3 markdown 1.7
MASS 7.3-60 Matrix 1.5-4.1 memoise 2.0.1
methods 4.3.1 mgcv 1.8-42 mime 0.12
miniUI 0.1.1.1 ModelMetrics 1.2.2.2 modelr 0.1.11
munsell 0.5.0 nlme 3.1-162 nnet 7.3-19
numDeriv 2016.8-1.1 openssl 2.0.6 parallel 4.3.1
parallelly 1.36.0 pillar 1.9.0 pkgbuild 1.4.2
pkgconfig 2.0.3 pkgdown 2.0.7 pkgload 1.3.2.1
plogr 0.2.0 plyr 1.8.8 praise 1.0.0
prettyunits 1.1.1 pROC 1.18.4 processx 3.8.2
prodlim 2023.03.31 profvis 0.3.8 progress 1.2.2
progressr 0.13.0 promises 1.2.0.1 proto 1.0.0
proxy 0.4-27 ps 1.7.5 purrr 1.0.1
r2d3 0.2.6 R6 2.5.1 ragg 1.2.5
randomForest 4.7-1.1 rappdirs 0.3.3 rcmdcheck 1.4.0
RColorBrewer 1.1-3 Rcpp 1.0.11 RcppEigen 0.3.3.9.3
readr 2.1.4 readxl 1.4.3 recipes 1.0.6
rematch 1.0.1 rematch2 2.1.2 remotes 2.4.2
reprex 2.0.2 reshape2 1.4.4 rlang 1.1.1
rmarkdown 2.23 RODBC 1.3-20 roxygen2 7.2.3
rpart 4.1.19 rprojroot 2.0.3 Rserve 1.8-11
RSQLite 2.3.1 rstudioapi 0.15.0 rversions 2.1.2
rvest 1.0.3 sass 0.4.6 scales 1.2.1
selectr 0.4-2 sessioninfo 1.2.2 shape 1.4.6
shiny 1.7.4.1 sourcetools 0.1.7-1 sparklyr 1.8.1
SparkR 3.5.0 spatial 7.3-15 splines 4.3.1
sqldf 0.4-11 SQUAREM 2021.1 stats 4.3.1
stats4 4.3.1 stringi 1.7.12 stringr 1.5.0
survival 3.5-5 sys 3.4.2 systemfonts 1.0.4
tcltk 4.3.1 testthat 3.1.10 textshaping 0.3.6
tibble 3.2.1 tidyr 1.3.0 tidyselect 1.2.0
tidyverse 2.0.0 timechange 0.2.0 timeDate 4022.108
tinytex 0.45 tools 4.3.1 tzdb 0.4.0
urlchecker 1.0.1 usethis 2.2.2 utf8 1.2.3
utils 4.3.1 uuid 1.1-0 vctrs 0.6.3
viridisLite 0.4.2 vroom 1.6.3 waldo 0.5.1
whisker 0.4.1 withr 2.5.0 xfun 0.39
xml2 1.3.5 xopen 1.0.0 xtable 1.8-4
yaml 2.3.7 zip 2.3.0

Installed Java and Scala libraries (Scala 2.12 cluster version)

Group ID Artifact ID Version
antlr antlr 2.7.7
com.amazonaws amazon-kinesis-client 1.12.0
com.amazonaws aws-java-sdk-autoscaling 1.12.390
com.amazonaws aws-java-sdk-cloudformation 1.12.390
com.amazonaws aws-java-sdk-cloudfront 1.12.390
com.amazonaws aws-java-sdk-cloudhsm 1.12.390
com.amazonaws aws-java-sdk-cloudsearch 1.12.390
com.amazonaws aws-java-sdk-cloudtrail 1.12.390
com.amazonaws aws-java-sdk-cloudwatch 1.12.390
com.amazonaws aws-java-sdk-cloudwatchmetrics 1.12.390
com.amazonaws aws-java-sdk-codedeploy 1.12.390
com.amazonaws aws-java-sdk-cognitoidentity 1.12.390
com.amazonaws aws-java-sdk-cognitosync 1.12.390
com.amazonaws aws-java-sdk-config 1.12.390
com.amazonaws aws-java-sdk-core 1.12.390
com.amazonaws aws-java-sdk-datapipeline 1.12.390
com.amazonaws aws-java-sdk-directconnect 1.12.390
com.amazonaws aws-java-sdk-directory 1.12.390
com.amazonaws aws-java-sdk-dynamodb 1.12.390
com.amazonaws aws-java-sdk-ec2 1.12.390
com.amazonaws aws-java-sdk-ecs 1.12.390
com.amazonaws aws-java-sdk-efs 1.12.390
com.amazonaws aws-java-sdk-elasticache 1.12.390
com.amazonaws aws-java-sdk-elasticbeanstalk 1.12.390
com.amazonaws aws-java-sdk-elasticloadbalancing 1.12.390
com.amazonaws aws-java-sdk-elastictranscoder 1.12.390
com.amazonaws aws-java-sdk-emr 1.12.390
com.amazonaws aws-java-sdk-glacier 1.12.390
com.amazonaws aws-java-sdk-glue 1.12.390
com.amazonaws aws-java-sdk-iam 1.12.390
com.amazonaws aws-java-sdk-importexport 1.12.390
com.amazonaws aws-java-sdk-kinesis 1.12.390
com.amazonaws aws-java-sdk-kms 1.12.390
com.amazonaws aws-java-sdk-lambda 1.12.390
com.amazonaws aws-java-sdk-logs 1.12.390
com.amazonaws aws-java-sdk-machinelearning 1.12.390
com.amazonaws aws-java-sdk-opsworks 1.12.390
com.amazonaws aws-java-sdk-rds 1.12.390
com.amazonaws aws-java-sdk-redshift 1.12.390
com.amazonaws aws-java-sdk-route53 1.12.390
com.amazonaws aws-java-sdk-s3 1.12.390
com.amazonaws aws-java-sdk-ses 1.12.390
com.amazonaws aws-java-sdk-simpledb 1.12.390
com.amazonaws aws-java-sdk-simpleworkflow 1.12.390
com.amazonaws aws-java-sdk-sns 1.12.390
com.amazonaws aws-java-sdk-sqs 1.12.390
com.amazonaws aws-java-sdk-ssm 1.12.390
com.amazonaws aws-java-sdk-storagegateway 1.12.390
com.amazonaws aws-java-sdk-sts 1.12.390
com.amazonaws aws-java-sdk-support 1.12.390
com.amazonaws aws-java-sdk-swf-libraries 1.11.22
com.amazonaws aws-java-sdk-workspaces 1.12.390
com.amazonaws jmespath-java 1.12.390
com.clearspring.analytics stream 2.9.6
com.databricks Rserve 1.8-3
com.databricks databricks-sdk-java 0.2.0
com.databricks jets3t 0.7.1-0
com.databricks.scalapb compilerplugin_2.12 0.4.15-10
com.databricks.scalapb scalapb-runtime_2.12 0.4.15-10
com.esotericsoftware kryo-shaded 4.0.2
com.esotericsoftware minlog 1.3.0
com.fasterxml classmate 1.3.4
com.fasterxml.jackson.core jackson-annotations 2.15.2
com.fasterxml.jackson.core jackson-core 2.15.2
com.fasterxml.jackson.core jackson-databind 2.15.2
com.fasterxml.jackson.dataformat jackson-dataformat-cbor 2.15.2
com.fasterxml.jackson.datatype jackson-datatype-joda 2.15.2
com.fasterxml.jackson.datatype jackson-datatype-jsr310 2.15.1
com.fasterxml.jackson.module jackson-module-paranamer 2.15.2
com.fasterxml.jackson.module jackson-module-scala_2.12 2.15.2
com.github.ben-manes.caffeine caffeine 2.9.3
com.github.fommil jniloader 1.1
com.github.fommil.netlib native_ref-java 1.1
com.github.fommil.netlib native_ref-java 1.1-natives
com.github.fommil.netlib native_system-java 1.1
com.github.fommil.netlib native_system-java 1.1-natives
com.github.fommil.netlib netlib-native_ref-linux-x86_64 1.1-natives
com.github.fommil.netlib netlib-native_system-linux-x86_64 1.1-natives
com.github.luben zstd-jni 1.5.5-4
com.github.wendykierp JTransforms 3.1
com.google.code.findbugs jsr305 3.0.0
com.google.code.gson gson 2.10.1
com.google.crypto.tink tink 1.9.0
com.google.errorprone error_prone_annotations 2.10.0
com.google.flatbuffers flatbuffers-java 1.12.0
com.google.guava guava 15.0
com.google.protobuf protobuf-java 2.6.1
com.helger profiler 1.1.1
com.jcraft jsch 0.1.55
com.jolbox bonecp 0.8.0.RELEASE
com.lihaoyi sourcecode_2.12 0.1.9
com.microsoft.azure azure-data-lake-store-sdk 2.3.9
com.microsoft.sqlserver mssql-jdbc 11.2.2.jre8
com.ning compress-lzf 1.1.2
com.sun.mail javax.mail 1.5.2
com.sun.xml.bind jaxb-core 2.2.11
com.sun.xml.bind jaxb-impl 2.2.11
com.tdunning json 1.8
com.thoughtworks.paranamer paranamer 2.8
com.trueaccord.lenses lenses_2.12 0.4.12
com.twitter chill-java 0.10.0
com.twitter chill_2.12 0.10.0
com.twitter util-app_2.12 7.1.0
com.twitter util-core_2.12 7.1.0
com.twitter util-function_2.12 7.1.0
com.twitter util-jvm_2.12 7.1.0
com.twitter util-lint_2.12 7.1.0
com.twitter util-registry_2.12 7.1.0
com.twitter util-stats_2.12 7.1.0
com.typesafe config 1.2.1
com.typesafe.scala-logging scala-logging_2.12 3.7.2
com.uber h3 3.7.0
com.univocity univocity-parsers 2.9.1
com.zaxxer HikariCP 4.0.3
commons-cli commons-cli 1.5.0
commons-codec commons-codec 1.16.0
commons-collections commons-collections 3.2.2
commons-dbcp commons-dbcp 1.4
commons-fileupload commons-fileupload 1.5
commons-httpclient commons-httpclient 3.1
commons-io commons-io 2.13.0
commons-lang commons-lang 2.6
commons-logging commons-logging 1.1.3
commons-pool commons-pool 1.5.4
dev.ludovic.netlib arpack 3.0.3
dev.ludovic.netlib blas 3.0.3
dev.ludovic.netlib lapack 3.0.3
info.ganglia.gmetric4j gmetric4j 1.0.10
io.airlift aircompressor 0.24
io.delta delta-sharing-spark_2.12 0.7.1
io.dropwizard.metrics metrics-annotation 4.2.19
io.dropwizard.metrics metrics-core 4.2.19
io.dropwizard.metrics metrics-graphite 4.2.19
io.dropwizard.metrics metrics-healthchecks 4.2.19
io.dropwizard.metrics metrics-jetty9 4.2.19
io.dropwizard.metrics metrics-jmx 4.2.19
io.dropwizard.metrics metrics-json 4.2.19
io.dropwizard.metrics metrics-jvm 4.2.19
io.dropwizard.metrics metrics-servlets 4.2.19
io.netty netty-all 4.1.93.Final
io.netty netty-buffer 4.1.93.Final
io.netty netty-codec 4.1.93.Final
io.netty netty-codec-http 4.1.93.Final
io.netty netty-codec-http2 4.1.93.Final
io.netty netty-codec-socks 4.1.93.Final
io.netty netty-common 4.1.93.Final
io.netty netty-handler 4.1.93.Final
io.netty netty-handler-proxy 4.1.93.Final
io.netty netty-resolver 4.1.93.Final
io.netty netty-transport 4.1.93.Final
io.netty netty-transport-classes-epoll 4.1.93.Final
io.netty netty-transport-classes-kqueue 4.1.93.Final
io.netty netty-transport-native-epoll 4.1.93.Final
io.netty netty-transport-native-epoll 4.1.93.Final-linux-aarch_64
io.netty netty-transport-native-epoll 4.1.93.Final-linux-x86_64
io.netty netty-transport-native-kqueue 4.1.93.Final-osx-aarch_64
io.netty netty-transport-native-kqueue 4.1.93.Final-osx-x86_64
io.netty netty-transport-native-unix-common 4.1.93.Final
io.prometheus simpleclient 0.7.0
io.prometheus simpleclient_common 0.7.0
io.prometheus simpleclient_dropwizard 0.7.0
io.prometheus simpleclient_pushgateway 0.7.0
io.prometheus simpleclient_servlet 0.7.0
io.prometheus.jmx collector 0.12.0
jakarta.annotation jakarta.annotation-api 1.3.5
jakarta.servlet jakarta.servlet-api 4.0.3
jakarta.validation jakarta.validation-api 2.0.2
jakarta.ws.rs jakarta.ws.rs-api 2.1.6
javax.activation activation 1.1.1
javax.el javax.el-api 2.2.4
javax.jdo jdo-api 3.0.1
javax.transaction jta 1.1
javax.transaction transaction-api 1.1
javax.xml.bind jaxb-api 2.2.11
javolution javolution 5.5.1
jline jline 2.14.6
joda-time joda-time 2.12.1
net.java.dev.jna jna 5.8.0
net.razorvine pickle 1.3
net.sf.jpam jpam 1.1
net.sf.opencsv opencsv 2.3
net.sf.supercsv super-csv 2.2.0
net.snowflake snowflake-ingest-sdk 0.9.6
net.snowflake snowflake-jdbc 3.13.29
net.sourceforge.f2j arpack_combined_all 0.1
org.acplt.remotetea remotetea-oncrpc 1.1.2
org.antlr ST4 4.0.4
org.antlr antlr-runtime 3.5.2
org.antlr antlr4-runtime 4.9.3
org.antlr stringtemplate 3.2.1
org.apache.ant ant 1.9.16
org.apache.ant ant-jsch 1.9.16
org.apache.ant ant-launcher 1.9.16
org.apache.arrow arrow-format 12.0.1
org.apache.arrow arrow-memory-core 12.0.1
org.apache.arrow arrow-memory-netty 12.0.1
org.apache.arrow arrow-vector 12.0.1
org.apache.avro avro 1.11.2
org.apache.avro avro-ipc 1.11.2
org.apache.avro avro-mapred 1.11.2
org.apache.commons commons-collections4 4.4
org.apache.commons commons-compress 1.23.0
org.apache.commons commons-crypto 1.1.0
org.apache.commons commons-lang3 3.12.0
org.apache.commons commons-math3 3.6.1
org.apache.commons commons-text 1.10.0
org.apache.curator curator-client 2.13.0
org.apache.curator curator-framework 2.13.0
org.apache.curator curator-recipes 2.13.0
org.apache.datasketches datasketches-java 3.1.0
org.apache.datasketches datasketches-memory 2.0.0
org.apache.derby derby 10.14.2.0
org.apache.hadoop hadoop-client-runtime 3.3.6
org.apache.hive hive-beeline 2.3.9
org.apache.hive hive-cli 2.3.9
org.apache.hive hive-jdbc 2.3.9
org.apache.hive hive-llap-client 2.3.9
org.apache.hive hive-llap-common 2.3.9
org.apache.hive hive-serde 2.3.9
org.apache.hive hive-shims 2.3.9
org.apache.hive hive-storage-api 2.8.1
org.apache.hive.shims hive-shims-0.23 2.3.9
org.apache.hive.shims hive-shims-common 2.3.9
org.apache.hive.shims hive-shims-scheduler 2.3.9
org.apache.httpcomponents httpclient 4.5.14
org.apache.httpcomponents httpcore 4.4.16
org.apache.ivy ivy 2.5.1
org.apache.logging.log4j log4j-1.2-api 2.20.0
org.apache.logging.log4j log4j-api 2.20.0
org.apache.logging.log4j log4j-core 2.20.0
org.apache.logging.log4j log4j-slf4j2-impl 2.20.0
org.apache.mesos mesos 1.11.0-shaded-protobuf
org.apache.orc orc-core 1.9.0-shaded-protobuf
org.apache.orc orc-mapreduce 1.9.0-shaded-protobuf
org.apache.orc orc-shims 1.9.0
org.apache.thrift libfb303 0.9.3
org.apache.thrift libthrift 0.12.0
org.apache.xbean xbean-asm9-shaded 4.23
org.apache.yetus audience-annotations 0.13.0
org.apache.zookeeper zookeeper 3.6.3
org.apache.zookeeper zookeeper-jute 3.6.3
org.checkerframework checker-qual 3.31.0
org.codehaus.jackson jackson-core-asl 1.9.13
org.codehaus.jackson jackson-mapper-asl 1.9.13
org.codehaus.janino commons-compiler 3.0.16
org.codehaus.janino janino 3.0.16
org.datanucleus datanucleus-api-jdo 4.2.4
org.datanucleus datanucleus-core 4.1.17
org.datanucleus datanucleus-rdbms 4.1.19
org.datanucleus javax.jdo 3.2.0-m3
org.eclipse.jetty jetty-client 9.4.51.v20230217
org.eclipse.jetty jetty-continuation 9.4.51.v20230217
org.eclipse.jetty jetty-http 9.4.51.v20230217
org.eclipse.jetty jetty-io 9.4.51.v20230217
org.eclipse.jetty jetty-jndi 9.4.51.v20230217
org.eclipse.jetty jetty-plus 9.4.51.v20230217
org.eclipse.jetty jetty-proxy 9.4.51.v20230217
org.eclipse.jetty jetty-security 9.4.51.v20230217
org.eclipse.jetty jetty-server 9.4.51.v20230217
org.eclipse.jetty jetty-servlet 9.4.51.v20230217
org.eclipse.jetty jetty-servlets 9.4.51.v20230217
org.eclipse.jetty jetty-util 9.4.51.v20230217
org.eclipse.jetty jetty-util-ajax 9.4.51.v20230217
org.eclipse.jetty jetty-webapp 9.4.51.v20230217
org.eclipse.jetty jetty-xml 9.4.51.v20230217
org.eclipse.jetty.websocket websocket-api 9.4.51.v20230217
org.eclipse.jetty.websocket websocket-client 9.4.51.v20230217
org.eclipse.jetty.websocket websocket-common 9.4.51.v20230217
org.eclipse.jetty.websocket websocket-server 9.4.51.v20230217
org.eclipse.jetty.websocket websocket-servlet 9.4.51.v20230217
org.fusesource.leveldbjni leveldbjni-all 1.8
org.glassfish.hk2 hk2-api 2.6.1
org.glassfish.hk2 hk2-locator 2.6.1
org.glassfish.hk2 hk2-utils 2.6.1
org.glassfish.hk2 osgi-resource-locator 1.0.3
org.glassfish.hk2.external aopalliance-repackaged 2.6.1
org.glassfish.hk2.external jakarta.inject 2.6.1
org.glassfish.jersey.containers jersey-container-servlet 2.40
org.glassfish.jersey.containers jersey-container-servlet-core 2.40
org.glassfish.jersey.core jersey-client 2.40
org.glassfish.jersey.core jersey-common 2.40
org.glassfish.jersey.core jersey-server 2.40
org.glassfish.jersey.inject jersey-hk2 2.40
org.hibernate.validator hibernate-validator 6.1.7.Final
org.ini4j ini4j 0.5.4
org.javassist javassist 3.29.2-GA
org.jboss.logging jboss-logging 3.3.2.Final
org.jdbi jdbi 2.63.1
org.jetbrains annotations 17.0.0
org.joda joda-convert 1.7
org.jodd jodd-core 3.5.2
org.json4s json4s-ast_2.12 3.7.0-M11
org.json4s json4s-core_2.12 3.7.0-M11
org.json4s json4s-jackson_2.12 3.7.0-M11
org.json4s json4s-scalap_2.12 3.7.0-M11
org.lz4 lz4-java 1.8.0
org.mariadb.jdbc mariadb-java-client 2.7.9
org.mlflow mlflow-spark 2.2.0
org.objenesis objenesis 2.5.1
org.postgresql postgresql 42.6.0
org.roaringbitmap RoaringBitmap 0.9.45
org.roaringbitmap shims 0.9.45
org.rocksdb rocksdbjni 8.3.2
org.rosuda.REngine REngine 2.1.0
org.scala-lang scala-compiler_2.12 2.12.15
org.scala-lang scala-library_2.12 2.12.15
org.scala-lang scala-reflect_2.12 2.12.15
org.scala-lang.modules scala-collection-compat_2.12 2.9.0
org.scala-lang.modules scala-parser-combinators_2.12 1.1.2
org.scala-lang.modules scala-xml_2.12 1.2.0
org.scala-sbt test-interface 1.0
org.scalacheck scalacheck_2.12 1.14.2
org.scalactic scalactic_2.12 3.2.15
org.scalanlp breeze-macros_2.12 2.1.0
org.scalanlp breeze_2.12 2.1.0
org.scalatest scalatest-compatible 3.2.15
org.scalatest scalatest-core_2.12 3.2.15
org.scalatest scalatest-diagrams_2.12 3.2.15
org.scalatest scalatest-featurespec_2.12 3.2.15
org.scalatest scalatest-flatspec_2.12 3.2.15
org.scalatest scalatest-freespec_2.12 3.2.15
org.scalatest scalatest-funspec_2.12 3.2.15
org.scalatest scalatest-funsuite_2.12 3.2.15
org.scalatest scalatest-matchers-core_2.12 3.2.15
org.scalatest scalatest-mustmatchers_2.12 3.2.15
org.scalatest scalatest-propspec_2.12 3.2.15
org.scalatest scalatest-refspec_2.12 3.2.15
org.scalatest scalatest-shouldmatchers_2.12 3.2.15
org.scalatest scalatest-wordspec_2.12 3.2.15
org.scalatest scalatest_2.12 3.2.15
org.slf4j jcl-over-slf4j 2.0.7
org.slf4j jul-to-slf4j 2.0.7
org.slf4j slf4j-api 2.0.7
org.threeten threeten-extra 1.7.1
org.tukaani xz 1.9
org.typelevel algebra_2.12 2.0.1
org.typelevel cats-kernel_2.12 2.1.1
org.typelevel spire-macros_2.12 0.17.0
org.typelevel spire-platform_2.12 0.17.0
org.typelevel spire-util_2.12 0.17.0
org.typelevel spire_2.12 0.17.0
org.wildfly.openssl wildfly-openssl 1.1.3.Final
org.xerial sqlite-jdbc 3.42.0.0
org.xerial.snappy snappy-java 1.1.10.3
org.yaml snakeyaml 2.0
oro oro 2.0.8
pl.edu.icm JLargeArrays 1.5
software.amazon.cryptools AmazonCorrettoCryptoProvider 1.6.1-linux-x86_64
software.amazon.ion ion-java 1.0.2
stax stax-api 1.0.1