SparkJob Class 
A standalone Spark job.
Constructor
SparkJob(*, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, inputs: Dict[str, Input | str | bool | int | float] | None = None, outputs: Dict[str, Output] | None = None, compute: str | None = None, identity: Dict[str, str] | ManagedIdentityConfiguration | AmlTokenConfiguration | UserIdentityConfiguration | None = None, resources: Dict | SparkResourceConfiguration | None = None, **kwargs: Any)
Keyword-Only Parameters
| Name | Description | 
|---|---|
| driver_cores | The number of cores to use for the driver process, only in cluster mode. Default value: None | 
| driver_memory | The amount of memory to use for the driver process, formatted as strings with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g"). Default value: None | 
| executor_cores | The number of cores to use on each executor. Default value: None | 
| executor_memory | The amount of memory to use per executor process, formatted as strings with a size unit suffix ("k", "m", "g" or "t") (e.g. "512m", "2g"). Default value: None | 
| executor_instances | The initial number of executors. Default value: None | 
| dynamic_allocation_enabled | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Default value: None | 
| dynamic_allocation_min_executors | The lower bound for the number of executors if dynamic allocation is enabled. Default value: None | 
| dynamic_allocation_max_executors | The upper bound for the number of executors if dynamic allocation is enabled. Default value: None | 
| inputs | The mapping of input data bindings used in the job. Default value: None | 
| outputs | The mapping of output data bindings used in the job. Default value: None | 
| compute | The compute resource the job runs on. Default value: None | 
| identity | The identity that the Spark job will use while running on compute. Accepts a dict, ManagedIdentityConfiguration, AmlTokenConfiguration, or UserIdentityConfiguration. Default value: None | 
| resources | The compute resource configuration for the job. Default value: None | 
Examples
Configuring a SparkJob.
   from azure.ai.ml import Input, Output
   from azure.ai.ml.entities import SparkJob
   spark_job = SparkJob(
       code="./sdk/ml/azure-ai-ml/tests/test_configs/dsl_pipeline/spark_job_in_pipeline/basic_src",
       entry={"file": "sampleword.py"},
       conf={
           "spark.driver.cores": 2,
           "spark.driver.memory": "1g",
           "spark.executor.cores": 1,
           "spark.executor.memory": "1g",
           "spark.executor.instances": 1,
       },
       environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:33",
       inputs={
           "input1": Input(
               type="uri_file", path="azureml://datastores/workspaceblobstore/paths/python/data.csv", mode="direct"
           )
       },
       compute="synapsecompute",
       outputs={"component_out_path": Output(type="uri_folder")},
       args="--input1 ${{inputs.input1}} --output2 ${{outputs.component_out_path}}",
   )
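The same resource settings can also be supplied through the documented keyword-only parameters instead of a raw conf dictionary. The following is a minimal sketch, not an official sample: the source path, instance type, runtime version, and input/output names are placeholders, and it assumes serverless Spark compute configured through resources rather than an attached compute.
   from azure.ai.ml import Input, Output
   from azure.ai.ml.entities import (
       SparkJob,
       SparkResourceConfiguration,
       UserIdentityConfiguration,
   )
   # Illustrative sketch: resource settings passed as keyword-only parameters.
   # Paths, instance type, and runtime version are placeholder values.
   spark_job = SparkJob(
       code="./basic_src",
       entry={"file": "sampleword.py"},
       driver_cores=2,
       driver_memory="1g",
       executor_cores=1,
       executor_memory="1g",
       executor_instances=1,
       dynamic_allocation_enabled=True,
       dynamic_allocation_min_executors=1,
       dynamic_allocation_max_executors=4,
       identity=UserIdentityConfiguration(),
       resources=SparkResourceConfiguration(
           instance_type="Standard_E8S_V3", runtime_version="3.3"
       ),
       inputs={
           "input1": Input(
               type="uri_file",
               path="azureml://datastores/workspaceblobstore/paths/python/data.csv",
               mode="direct",
           )
       },
       outputs={"output1": Output(type="uri_folder")},
       args="--input1 ${{inputs.input1}} --output2 ${{outputs.output1}}",
   )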
Methods
| Name | Description | 
|---|---|
| dump | Dumps the job content into a file in YAML format. | 
| filter_conf_fields | Filters out the fields of the conf attribute that are not among the Spark configuration fields listed in azure.ai.ml._schema.job.parameterized_spark.CONF_KEY_MAP and returns them in their own dictionary. | 
dump
Dumps the job content into a file in YAML format.
dump(dest: str | PathLike | IO, **kwargs: Any) -> None
Parameters
| Name | Description | 
|---|---|
| dest (Required) | The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly. | 
Exceptions
| Type | Description | 
|---|---|
| | Raised if dest is a file path and the file already exists. | 
| | Raised if dest is an open file and the file is not writable. | 
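As a minimal illustration, assuming spark_job is the job configured above and "./spark_job.yaml" does not already exist, the job definition can be serialized to YAML with:
   # Write the job definition to a new YAML file; per the docs above,
   # this raises if the destination file already exists.
   spark_job.dump("./spark_job.yaml")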
filter_conf_fields
Filters out the fields of the conf attribute that are not among the Spark configuration fields listed in azure.ai.ml._schema.job.parameterized_spark.CONF_KEY_MAP and returns them in their own dictionary.
filter_conf_fields() -> Dict[str, str]
Returns
| Type | Description | 
|---|---|
| A dictionary of the conf fields that are not Spark configuration fields. | 
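As a hedged sketch of the described behavior, with a hypothetical custom key mixed into the conf dictionary (the key name "my.custom.setting" is illustrative, not part of the API):
   # Hypothetical illustration: per the description above, keys that are not
   # recognized Spark configuration fields are returned in their own dict.
   spark_job.conf = {"spark.driver.cores": 2, "my.custom.setting": "abc"}
   non_spark_fields = spark_job.filter_conf_fields()
   print(non_spark_fields)  # expected to contain only the unrecognized keys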
Attributes
base_path
creation_context
The creation context of the resource.
Returns
| Type | Description | 
|---|---|
| The creation metadata for the resource. | 
entry
environment
The Azure ML environment to run the Spark component or job in.
Returns
| Type | Description | 
|---|---|
| The Azure ML environment to run the Spark component or job in. | 
id
identity
The identity that the Spark job will use while running on compute.
Returns
| Type | Description | 
|---|---|
| The identity that the Spark job will use while running on compute. | 
inputs
log_files
outputs
resources
The compute resource configuration for the job.
Returns
| Type | Description | 
|---|---|
| The compute resource configuration for the job. | 
status
The status of the job.
Common values returned include "Running", "Completed", and "Failed". All possible values are:
- NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages: Docker image build or conda environment setup.
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
Returns
| Type | Description | 
|---|---|
| Status of the job. | 
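As a hedged sketch of reading the status back after submission, assuming an MLClient configured with valid credentials and placeholder subscription, resource group, and workspace values:
   from azure.ai.ml import MLClient
   from azure.identity import DefaultAzureCredential

   # Placeholder workspace details; replace with real values.
   ml_client = MLClient(
       DefaultAzureCredential(),
       "<subscription_id>",
       "<resource_group>",
       "<workspace_name>",
   )
   returned_job = ml_client.jobs.create_or_update(spark_job)
   print(returned_job.status)  # e.g. "Starting", "Running", "Completed"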
studio_url
CODE_ID_RE_PATTERN
CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)