Data streaming with AKS

Azure App Service
Azure API Management
Azure Container Registry
Azure Managed Redis
Azure Cosmos DB

Solution ideas

This article describes a solution idea. Your cloud architect can use this guidance to help visualize the major components for a typical implementation of this architecture. Use this article as a starting point to design a well-architected solution that aligns with your workload's specific requirements.

This article presents a solution for using Azure Kubernetes Service (AKS) to quickly process and analyze a large volume of streaming data from devices.

Apache®, Apache Kafka, and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks. Splunk is a registered trademark of Cisco.

Architecture

Architecture diagram that shows how streaming data from devices is ingested, processed, and analyzed.

Download a Visio file of this architecture.

Dataflow

  1. Sensors generate data and stream it to Azure API Management.
  2. An AKS cluster runs microservices that are deployed as containers behind a service mesh. The containers are built by using a DevOps process. The container images are stored in Azure Container Registry.
  3. An ingest service in AKS stores data in Azure Cosmos DB.
  4. Asynchronously, an analysis service in AKS receives the data and streams it to Apache Kafka on Azure HDInsight.
  5. Data scientists use machine learning models on Azure HDInsight and the Splunk platform to analyze the data.
  6. A processing service in AKS processes the data and stores the results in Azure Database for PostgreSQL. The service also caches the data in Azure Cache for Redis.
  7. A web app that runs in Azure App Service creates visualizations of the results.
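The dataflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the dictionaries and the deque are hypothetical in-memory stand-ins for Azure Cosmos DB, Kafka, Azure Database for PostgreSQL, and the Redis cache, and the function names and the reading fields (`device_id`, `seq`, `value`) are invented for the example. The real hand-off in step 4 is asynchronous; the sketch runs the steps sequentially to keep the data movement easy to follow.

```python
from collections import deque
from statistics import mean

# Hypothetical in-memory stand-ins for the managed services in the dataflow.
raw_store = {}       # stands in for Azure Cosmos DB (step 3)
stream = deque()     # stands in for the Kafka topic (step 4)
results_store = {}   # stands in for Azure Database for PostgreSQL (step 6)
cache = {}           # stands in for the Redis cache (step 6)


def ingest(reading):
    """Step 3: persist a raw sensor reading keyed by device and sequence number."""
    key = (reading["device_id"], reading["seq"])
    raw_store[key] = reading
    return key


def forward_for_analysis(key):
    """Step 4: hand the stored reading to the streaming layer."""
    stream.append(raw_store[key])


def process(device_id):
    """Step 6: aggregate a device's readings, store the result, and cache it."""
    values = [r["value"] for r in raw_store.values() if r["device_id"] == device_id]
    result = {"device_id": device_id, "count": len(values), "mean": mean(values)}
    results_store[device_id] = result
    cache[device_id] = result
    return result


def run_pipeline(readings):
    """Run steps 3, 4, and 6 for a batch of readings and return one result per device."""
    for reading in readings:
        key = ingest(reading)
        forward_for_analysis(key)
    device_ids = sorted({r["device_id"] for r in readings})
    return [process(device_id) for device_id in device_ids]
```

In a deployment, each function would be a separate microservice in AKS that calls the corresponding managed service through its own client library. The sketch only shows which data moves where.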

Components

  • AKS is a managed Kubernetes container orchestration service. In this architecture, it hosts containerized microservices that ingest, process, and route streaming data from sensors to various storage and analytics layers.

  • Apache Kafka is a distributed event streaming platform designed for high-throughput, low-latency data feeds. In this architecture, it receives real-time data from AKS microservices and streams it to Azure HDInsight for large-scale analytics.

  • API Management is a gateway for publishing, securing, and analyzing APIs. In this architecture, it receives incoming data from sensors and routes it to the AKS cluster for processing.

  • App Service is a fully managed platform for building and hosting web applications. In this architecture, it runs a web app that visualizes processed results from the PostgreSQL database.

  • Azure Cache for Redis is an in-memory data store that supports fast data access. In this architecture, it temporarily stores processed data from AKS microservices to accelerate access and reduce latency.

  • Azure Cosmos DB is a globally distributed NoSQL database service. In this architecture, it stores ingested data from AKS microservices.

  • Azure Database for PostgreSQL is a managed relational database service based on PostgreSQL. In this architecture, it stores processed results from AKS microservices for downstream reporting and visualization.

  • Azure HDInsight is a cloud-based service for big data analytics using open-source frameworks. In this architecture, it runs Apache Spark jobs to analyze streamed data from Kafka and supports machine learning workloads.

  • Azure Pipelines is a continuous integration and continuous delivery (CI/CD) service within Azure DevOps. In this architecture, it builds and deploys containerized microservices to AKS to enable automated and repeatable delivery workflows.

  • Container Registry is a managed Docker container registry service. In this architecture, it stores container images containing the microservices.

  • Splunk is a data analytics and visualization platform for machine-generated data. In this architecture, it analyzes real-time data from Azure HDInsight and creates visual dashboards for business intelligence.
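To illustrate how the cache and the PostgreSQL database in the list above can work together, the following sketch shows a cache-aside read with expiring entries. The article doesn't specify a caching pattern, so cache-aside is an assumption. `TTLCache`, `read_result`, and `load_from_database` are hypothetical names, and the in-memory dictionary stands in for Redis.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis cache whose entries expire after a fixed time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so that tests can control time
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def read_result(device_id, cache, load_from_database):
    """Return a processed result, querying the database only on a cache miss."""
    cached = cache.get(device_id)
    if cached is not None:
        return cached
    value = load_from_database(device_id)   # hypothetical database query
    cache.set(device_id, value)
    return value
```

The time-to-live bounds how stale a cached result can become. A shorter value keeps dashboards fresher at the cost of more database reads.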

Scenario details

This solution is a good fit for a scenario that involves millions of data points, where data sources include Internet of Things (IoT) devices, sensors, and vehicles. In such a situation, processing the large volume of data is one challenge. Quickly analyzing the data is another demanding task, as organizations seek to gain insight into complex scenarios.

Containerized microservices in AKS form a key part of the solution. These self-contained services ingest and process the real-time data stream, and they scale as needed. Because containers are portable, the services can run in different environments and process data from multiple sources. The solution uses DevOps practices and continuous integration and continuous delivery (CI/CD) to develop and deploy the microservices, which shortens the development cycle.

To store the ingested data, the solution uses Azure Cosmos DB. This database elastically scales throughput and storage, which makes it a good choice for large volumes of data.

The solution also uses Apache Kafka. This distributed event streaming platform is designed for high-throughput, low-latency handling of real-time data feeds.
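The streaming layer decouples the services that produce data from the services that consume it. The following sketch illustrates that decoupling with a bounded `asyncio.Queue` from the Python standard library. The queue is only a stand-in for a Kafka topic, not a Kafka client, and the function names, the buffer size, and the `None` sentinel are assumptions made for the example.

```python
import asyncio


async def ingest_service(readings, topic):
    """Producer: publish each reading, waiting whenever the buffer is full."""
    for reading in readings:
        await topic.put(reading)
    await topic.put(None)  # sentinel: no more readings


async def analysis_service(topic, sink):
    """Consumer: receive readings at its own pace until the sentinel arrives."""
    while True:
        reading = await topic.get()
        if reading is None:
            break
        sink.append(reading)


async def stream_readings(readings, buffer_size=100):
    """Run the producer and the consumer concurrently over a bounded buffer."""
    topic = asyncio.Queue(maxsize=buffer_size)
    sink = []
    await asyncio.gather(
        ingest_service(readings, topic),
        analysis_service(topic, sink),
    )
    return sink


def run(readings, buffer_size=100):
    """Synchronous entry point for the sketch."""
    return asyncio.run(stream_readings(readings, buffer_size))
```

Because the buffer is bounded, a slow consumer applies back pressure to the producer instead of letting memory grow without limit. Unlike this in-process queue, Kafka persists messages and distributes them across brokers.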

Another key solution component is Azure HDInsight, a managed cloud service for processing massive amounts of data by using popular open-source frameworks. In this solution, Azure HDInsight runs Apache Spark to analyze high-volume, high-velocity data. Splunk supports the analysis by creating visualizations from real-time data and providing business intelligence.

Potential use cases

This solution benefits the following areas:

  • Vehicle safety, especially in the automotive industry
  • Customer service in retail and other industries
  • Healthcare cloud solutions
  • Financial technology solutions in the finance industry

Next steps

Product documentation:

Microsoft training modules: