This article describes how to configure Cloud Ingest subvolumes (blob upload with local purge) in Azure Container Storage enabled by Azure Arc. A Cloud Ingest subvolume facilitates limitless data ingestion from edge to blob, including ADLSgen2. Files written to this storage type are transferred to blob storage and, once the upload is confirmed, purged locally. This removal frees space for new data. This storage option also supports data integrity in disconnected environments: data is stored locally and synchronized when the network connection is restored.
For example, you can write a file to your Cloud Ingest Persistent Volume Claim (PVC), and a process scans for new files every minute. Once a new file is identified, it is sent for upload to your designated blob destination. After the upload is confirmed, the Cloud Ingest subvolume waits five minutes and then deletes the local copy of the file.
Prerequisites
If your final destination is blob storage or ADLSgen2, continue following the prerequisites and instructions in this article. If your final destination is OneLake, follow the instructions in Configure OneLake Identity for Cloud subvolumes first.
Create a storage account following the instructions in Create an Azure storage account.
Note
When you create your storage account, it's recommended that you create it under the same resource group and region/location as your Kubernetes cluster.
Create a container in the storage account that you created previously, following the instructions in Quickstart: Upload, download, and list blobs with the Azure portal > Create a container.
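If you prefer the command line to the portal, the two prerequisites above can be sketched with the Azure CLI. This is a minimal sketch, not the article's documented procedure: the function name `create_ingest_destination`, the `Standard_LRS` SKU, and the use of `--auth-mode login` (which requires that your signed-in account has a data-plane role on the storage account) are assumptions.

```shell
# Hypothetical helper: creates a storage account and a container with the Azure CLI.
# Arguments: resource group, location, storage account name, container name.
create_ingest_destination() {
  rg="$1"; location="$2"; account="$3"; container="$4"

  # Create a general-purpose v2 storage account.
  # Assumption: Standard_LRS is only an example SKU; choose the one you need.
  az storage account create \
    --name "$account" \
    --resource-group "$rg" \
    --location "$location" \
    --sku Standard_LRS \
    --kind StorageV2 || return 1

  # Create the container, authenticating with your own sign-in.
  az storage container create \
    --name "$container" \
    --account-name "$account" \
    --auth-mode login
}

# Example call (all values are placeholders):
# create_ingest_destination my-rg eastus mystorageacct ingest-container
```

Passing the resource group and location of your Kubernetes cluster follows the recommendation in the note above.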
Configure Extension Identity
Edge Volumes allows the use of a system-assigned extension identity for access to blob storage. This section describes how to use the system-assigned extension identity to grant access to your storage account, allowing you to upload Cloud Ingest subvolumes to these storage systems.
If you want to use Workload Identity with Azure Container Storage enabled by Azure Arc, follow the instructions in Configure Workload Identity for Cloud subvolumes.
Azure portal
- Navigate to your Arc-enabled cluster.
- Select Extensions.
- Select your Azure Container Storage enabled by Azure Arc extension.
- Note the Principal ID under Cluster Extension Details.
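The same principal ID can probably be read with the Azure CLI instead of the portal. The sketch below assumes that the `k8s-extension` CLI extension is installed and that the extension type string is `microsoft.arc.containerstorage`; neither is stated in this article, and the function name `get_extension_principal_id` is hypothetical.

```shell
# Hypothetical helper: prints the principal ID of the extension's system-assigned identity.
# Arguments: Arc-enabled cluster name, resource group.
# Assumption: the extension type is "microsoft.arc.containerstorage"; adjust if yours differs.
get_extension_principal_id() {
  az k8s-extension list \
    --cluster-name "$1" \
    --resource-group "$2" \
    --cluster-type connectedClusters \
    --query "[?extensionType=='microsoft.arc.containerstorage'].identity.principalId | [0]" \
    --output tsv
}

# Example call (placeholder values):
# get_extension_principal_id my-arc-cluster my-rg
```

If the command prints nothing, compare the `extensionType` values in the unfiltered `az k8s-extension list` output with the assumed string.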
Configure blob storage account for Extension Identity
Add Extension Identity permissions to a storage account
- Navigate to your storage account in the Azure portal.
- Select Access Control (IAM).
- Select Add+ -> Add role assignment.
- Select Storage Blob Data Owner, then select Next.
- Select +Select Members.
- To add your principal ID to the Selected Members: list, paste the ID and select + next to the identity.
- Click Select.
- To review and assign permissions, select Next, then select Review + Assign.
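The portal steps above correspond roughly to a single role assignment scoped to the storage account. A minimal sketch with the Azure CLI follows; the function name `grant_blob_data_owner` is hypothetical, and scoping the assignment to the whole storage account (rather than to one container) is an assumption that mirrors the portal flow.

```shell
# Hypothetical helper: assigns Storage Blob Data Owner to the extension identity.
# Arguments: principal ID, storage account name, resource group.
grant_blob_data_owner() {
  principal_id="$1"; account="$2"; rg="$3"

  # Look up the storage account's resource ID to use as the assignment scope.
  scope=$(az storage account show \
    --name "$account" \
    --resource-group "$rg" \
    --query id \
    --output tsv) || return 1

  # Assign the role to the identity, addressed by its object (principal) ID.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Owner" \
    --scope "$scope"
}

# Example call (placeholder values):
# grant_blob_data_owner 00000000-0000-0000-0000-000000000000 mystorageacct my-rg
```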
Create a Cloud Ingest Persistent Volume Claim (PVC)
To create a PVC for your Ingest subvolume, use the following process:
Create a file named cloudIngestPVC.yaml with the following content:

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      ### Create a name for your PVC ###
      name: <create-persistent-volume-claim-name-here>
      ### Use a namespace that matches your intended consuming pod, or "default" ###
      namespace: <intended-consuming-pod-or-default-here>
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 2Gi
      storageClassName: cloud-backed-sc

Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
- Edit the metadata.name value and create a name for your PVC. This name is referenced on the last line of deploymentExample.yaml in a later step.
- Edit the metadata.namespace value with the namespace of your intended consuming pod. If you don't have an intended consuming pod, set its value to default.
- The spec.resources.requests.storage parameter determines the size of the persistent volume. It's 2Gi (2 GiB) in this example, but can be modified to fit your needs.
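The naming note above can be checked before you apply the file. The sketch below implements the rule as this article states it (lowercase letters and dashes only) and additionally rejects a leading or trailing dash, which Kubernetes object names do not allow. The function name `is_valid_name` is hypothetical; Kubernetes itself also accepts digits, so this check is stricter than the cluster's.

```shell
# Hypothetical helper: succeeds if the name uses only lowercase letters and dashes
# and does not start or end with a dash.
is_valid_name() {
  # LC_ALL=C keeps the a-z range limited to ASCII lowercase letters.
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z]([a-z-]*[a-z])?$'
}

# Example use with a placeholder PVC name:
if is_valid_name "my-ingest-pvc"; then
  echo "name looks valid"
else
  echo "name must use only lowercase letters and dashes" >&2
fi
```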
To apply the cloudIngestPVC.yaml, run:
kubectl apply -f "cloudIngestPVC.yaml"
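After applying the file, you can confirm that the claim exists and see its current phase. This is a small sketch using standard `kubectl` output options; the function name `pvc_phase` is hypothetical, and the phase you see depends on your cluster.

```shell
# Hypothetical helper: prints the phase of a PVC (for example Pending or Bound).
# Arguments: PVC name, namespace.
pvc_phase() {
  kubectl get pvc "$1" --namespace "$2" --output jsonpath='{.status.phase}'
}

# Example call (placeholder values):
# pvc_phase my-ingest-pvc default
```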
Attach Ingest subvolume to Edge Volume
To create a subvolume for Ingest, using extension identity to connect to your storage account container, use the following process:
Get the name of the Edge Volume you created by running the following command:
kubectl get edgevolumes
Create a file named ingestSubvolume.yaml with the following content:

    apiVersion: "arccontainerstorage.azure.net/v1"
    kind: IngestSubvolume
    metadata:
      name: <create-a-subvolume-name-here>
    spec:
      edgevolume: <your-edge-volume-name-here>
      path: ingestSubDir # Don't use a preceding slash
      authentication:
        authType: MANAGED_IDENTITY
      storageAccountEndpoint: "https://<STORAGE ACCOUNT NAME>.blob.core.windows.net/"
      containerName: <your-blob-storage-account-container-name>
      ingest:
        order: newest-first
        minDelaySec: 60
      eviction:
        order: unordered
        minDelaySec: 120
      onDelete: trigger-immediate-ingest

Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
- metadata.name: Create a name for your subvolume.
- spec.edgevolume: This name was retrieved from the previous step.
- spec.path: Create your own subdirectory name under the mount path. The default name is ingestSubDir.
- spec.authentication.authType: This should be MANAGED_IDENTITY or WORKLOAD_IDENTITY, depending on the authentication mechanism chosen.
- spec.storageAccountEndpoint: Navigate to your storage account in the Azure portal. On the Overview page, near the top right of the screen, select JSON View. You can find the link under properties.primaryEndpoints.blob. Copy the entire link.
- spec.containerName: The container name in your storage account.
The following variables have reasonable defaults, but can be changed:
- spec.ingest.order: The order in which dirty files are uploaded. This is a best effort, not a guarantee. Options for order are: oldest-first or newest-first.
- spec.ingest.minDelaySec: The minimum number of seconds before a dirty file is eligible for ingest. This number can range between 0 and 31536000 (a year in seconds).
- spec.eviction.order: How files are evicted once they are uploaded to the cloud. Options for eviction order are: unordered or never.
- spec.eviction.minDelaySec: The number of seconds before a clean file is eligible for eviction. This number can range between 0 and 31536000 (a year in seconds).
- spec.onDelete: The action to take on this IngestSubvolume if/when it's requested to be deleted. Options are trigger-immediate-ingest, which immediately marks all files as eligible for ingest and attempts to ingest them, or abandon, which abandons all data in this ingest subvolume and deletes the subvolume.
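The allowed values listed above can be checked locally before you edit ingestSubvolume.yaml. The following is a sketch of such a pre-flight check, not part of the product: the function name `check_ingest_settings` is hypothetical, and it encodes only the options and ranges given in this article.

```shell
# Hypothetical pre-flight check for the tunable IngestSubvolume values.
# Arguments: ingest order, ingest minDelaySec, eviction order, eviction minDelaySec, onDelete.
check_ingest_settings() {
  # Accept only the options listed in this article.
  case "$1" in oldest-first|newest-first) ;; *) echo "invalid spec.ingest.order: $1" >&2; return 1 ;; esac
  case "$3" in unordered|never) ;; *) echo "invalid spec.eviction.order: $3" >&2; return 1 ;; esac
  case "$5" in trigger-immediate-ingest|abandon) ;; *) echo "invalid spec.onDelete: $5" >&2; return 1 ;; esac

  # minDelaySec values must be whole numbers between 0 and 31536000 (365 days).
  for delay in "$2" "$4"; do
    case "$delay" in ''|*[!0-9]*) echo "invalid minDelaySec: $delay" >&2; return 1 ;; esac
    [ "$delay" -le 31536000 ] || { echo "minDelaySec out of range: $delay" >&2; return 1; }
  done
}

# Example call with the values used in this article's ingestSubvolume.yaml:
check_ingest_settings newest-first 60 unordered 120 trigger-immediate-ingest && echo "settings look valid"
```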
Note
If you choose abandon for your spec.onDelete value, any dirty data in your subvolume will be lost. Please be careful and mindful before choosing this as an option.
To apply the ingestSubvolume.yaml, run:
kubectl apply -f "ingestSubvolume.yaml"
Attach your app (Kubernetes native application)
To configure a generic single pod (Kubernetes native application) against the PVC to use the Ingest capabilities, use the following process:
Create a file named deploymentExample.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cloudingestsubvol-deployment ### This must be unique for each deployment you choose to create.
    spec:
      replicas: 2
      selector:
        matchLabels:
          name: acsa-testclientdeployment
      template:
        metadata:
          name: acsa-testclientdeployment
          labels:
            name: acsa-testclientdeployment
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - acsa-testclientdeployment
                  topologyKey: kubernetes.io/hostname
          containers:
            ### Specify the container in which to launch the busy box. ###
            - name: ingest-deployment-container
              image: mcr.microsoft.com/azure-cli:2.57.0@sha256:c7c8a97f2dec87539983f9ded34cd40397986dcbed23ddbb5964a18edae9cd09
              command:
                - "/bin/sh"
                - "-c"
                - "dd if=/dev/urandom of=/data/ingestSubDir/acsaingesttestfile count=16 bs=1M && while true; do ls /data &>/dev/null || break; sleep 1; done"
              volumeMounts:
                ### This name must match the volumes.name attribute below ###
                - name: acsa-volume
                  ### This mountPath is where the PVC is attached to the pod's filesystem ###
                  mountPath: "/data"
          volumes:
            ### User-defined 'name' that's used to link the volumeMounts. This name must match volumeMounts.name as previously specified. ###
            - name: acsa-volume
              persistentVolumeClaim:
                ### This claimName must refer to your PVC metadata.name (Line 5)
                claimName: <your-pvc-metadata-name-from-line-5-of-pvc-yaml>

Note
Use only lowercase letters and dashes. For more information, see the Kubernetes object naming documentation.
- Edit the containers.name and volumes.persistentVolumeClaim.claimName values.
- If you edited the spec.path value in ingestSubvolume.yaml, the value ingestSubDir in this file must be updated with your new path name.
- The spec.replicas parameter determines the number of replica pods to create. It's 2 in this example, but can be modified to fit your needs.
To apply the deploymentExample.yaml and create the pod, run:
kubectl apply -f "deploymentExample.yaml"
Find the name of your pod to use in the next step:
kubectl get pods
Note
Because spec.replicas in deploymentExample.yaml is set to 2, two pods are created. You can use either pod name for the next step.
Run the following command to exec into the pod. Replace <name-of-pod> with your pod name from the previous step:
kubectl exec -it <name-of-pod> -- sh
Change directories into the /data mount path as specified in your deploymentExample.yaml file:
cd /data
You should see a directory that matches the value you set for spec.path in ingestSubvolume.yaml. If you used the default values, its name is ingestSubDir. Change to that subdirectory:
cd ingestSubDir
As an example, create a file named file1.txt and write to it:
echo "Hello World" > file1.txt
This file will be uploaded to your blob storage account container, and then purged locally after five minutes.
In the Azure portal, navigate to your storage account and find the container that matches the value you set for spec.containerName in ingestSubvolume.yaml. You should find file1.txt populated within the container. If the file is not there yet, wait approximately 1 minute; Edge Volumes waits a minute before uploading.
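Because a file is purged locally only after its upload is confirmed, you can also watch for the purge from inside the pod. The sketch below polls until the local file disappears; the function name `wait_for_local_purge` and its default timeout and interval are hypothetical, so pick values that suit your ingest and eviction delays.

```shell
# Hypothetical helper: polls until the given file no longer exists locally,
# or gives up once the timeout has elapsed.
# Arguments: file path, timeout in seconds (default 600), poll interval in seconds (default 10).
wait_for_local_purge() {
  file="$1"; timeout="${2:-600}"; interval="${3:-10}"
  waited=0
  while [ -e "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still present after ${waited}s: $file" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "purged locally after about ${waited}s: $file"
}

# Example call from the shell you opened in the pod:
# wait_for_local_purge /data/ingestSubDir/file1.txt
```

A non-zero return means only that the file was still present when the timeout elapsed; it does not by itself indicate that the upload failed.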
Next steps
- To learn how to configure Cloud Mirror subvolumes, see Configure Cloud Mirror subvolumes.
- To learn how to use Edge Volumes together, see Using Edge Volumes together.