Apache Spark is the technology powering compute clusters and SQL warehouses in Azure Databricks.
This page provides an overview of the documentation in this section.
## Get started
Get started working with Apache Spark on Databricks.
| Topic | Description |
|---|---|
| Apache Spark on Azure Databricks | Get answers to frequently asked questions about Apache Spark on Azure Databricks. |
| Tutorial: Load and transform data using Apache Spark DataFrames | Follow a step-by-step guide for working with Spark DataFrames in Python, R, or Scala for data loading and transformation. |
| PySpark basics | Learn the basics of using PySpark by walking through simple examples. |
## Additional resources
Explore other Spark capabilities and documentation.
| Topic | Description |
|---|---|
| Set Spark configuration properties on Azure Databricks | Set Spark configuration properties to customize settings in your compute environment and optimize performance. |
| Structured Streaming | Read an overview of Structured Streaming, a near real-time processing engine. |
| Diagnose cost and performance issues using the Spark UI | Learn to use the Spark UI for performance tuning, debugging, and cost optimization of Spark jobs. |
| Use Apache Spark MLlib on Azure Databricks | Perform distributed machine learning with Spark MLlib and integrate with popular ML frameworks. |
## Spark APIs
Work with Spark using your preferred programming language.
| Topic | Description |
|---|---|
| Reference for Apache Spark APIs | API reference overview for Apache Spark, including links to reference for Spark SQL, DataFrames, and RDD operations across supported languages. |
| PySpark | Use Python with Spark, including PySpark basics, custom data sources, and Python-specific optimizations. |
| Pandas API on Spark | Leverage familiar pandas syntax with the scalability of Spark for distributed data processing. |
| R for Spark | Work with R and Spark using SparkR and sparklyr for statistical computing and data analysis. |
| Scala for Spark | Build high-performance Spark applications using Scala with native Spark APIs and type safety. |