Minimal storage – change feed to replicate data

Azure Front Door
Azure App Service
Azure Functions
Azure Cosmos DB
Azure Table Storage

This article presents a high-availability solution for a web application dealing with large volumes of data that need to be accessible within a specific time frame. The solution involves using Azure Cosmos DB as the primary data store and using the Azure Cosmos DB change feed to replicate data to low-cost secondary storage. When the specified time period expires, Azure Functions is used to delete the data from Azure Cosmos DB. The data in secondary storage remains available for a longer period of time to enable other solutions for auditing and analysis purposes. The solution also provides high durability by replicating data to different data services.

Architecture

Architecture of a resilient system that uses two types of storage to reduce costs.

Download a Visio file of this architecture.

Dataflow

  1. The client authenticates with Microsoft Entra ID and is granted access to web applications hosted on Azure App Service.
  2. Azure Front Door, a firewall and layer-7 load balancer, switches user traffic to the standby region if there's a regional outage.
  3. App Service hosts websites and RESTful web APIs. Browser clients run Asynchronous JavaScript and XML (AJAX) applications that use the APIs.
  4. Web APIs delegate responsibility to code hosted by Functions to handle background tasks. The tasks are queued in Azure Queue Storage queues.
  5. The queued messages trigger the functions, which perform the background tasks.
  6. Azure Cache for Redis caches database data for the functions. By using the cache, the solution offloads database activity and speeds up the function apps and web apps.
  7. Azure Cosmos DB holds recently generated data.
  8. Azure Cosmos DB issues a change feed that can be used to replicate changes.
  9. A function app reads the change feed and replicates the changes to Azure Table Storage tables. Another function app periodically removes expired data from Azure Cosmos DB.
  10. Table Storage provides low-cost storage.

Components

  • Microsoft Entra ID is an identity and access management service that can synchronize with an on-premises directory. In this architecture, it authenticates users and grants access to web applications hosted on App Service.
  • Azure DNS is a high-availability hosting service for Domain Name System (DNS) domains. In this architecture, Azure DNS provides DNS resolution for the web app exposed through Azure Front Door.
  • Azure Front Door is a secure content delivery network and load balancer. In this architecture, it accelerates content delivery, provides failover capabilities, and protects apps from cyber threats.
  • App Service is a fully managed service for building, deploying, hosting, and scaling web apps. You can build apps by using .NET, .NET Core, Node.js, Java, Python, or PHP. Apps can run in containers or on Windows or Linux. In a mainframe migration, you can code the front-end screens or web interface as HTTP-based REST APIs. You can segregate them and make them stateless to orchestrate a microservices-based system. For more information about web APIs, see RESTful web API design. In this architecture, App Service hosts the web interface and REST APIs for the application.
  • Functions provides an environment to run small pieces of code, called functions, without having to establish an application infrastructure. You can use it to process bulk data, integrate systems, work with Internet of Things (IoT) devices, and build simple APIs and microservices. With microservices, you can create servers that connect to Azure services and always remain up to date. In this architecture, Functions handles background tasks like replicating data and deleting expired records.
  • Azure Storage is a set of massively scalable and secure cloud services for data, apps, and workloads. It includes Azure Files, which serves as an effective tool to migrate mainframe workloads.
    • Queue Storage provides simple, cost-effective, durable message queueing for large workloads. This architecture uses Queue Storage for task messaging.
    • Table Storage is a NoSQL key-value store for rapid development that uses massive semi-structured datasets. The tables are schemaless and adapt readily as needs change. Access is fast and cost-effective for many types of applications, and typically costs less than other types of keyed storage. This architecture uses Table Storage to store a synchronized and restructured copy of the data in Azure Cosmos DB.
  • Azure Cache for Redis is a fully managed in-memory caching service and message broker for sharing data and state among compute resources. It includes both the open-source Redis and a commercial product from Redis Labs as managed services. You can improve the performance of high-throughput online transaction processing (OLTP) applications by designing them to scale and to make use of an in-memory data store such as Azure Cache for Redis. In this architecture, Azure Cache for Redis accelerates access to frequently used data, which improves performance for both function apps and web apps.
  • Azure Cosmos DB is a globally distributed, multi-model database that enables your solutions to elastically and independently scale throughput and storage across any number of geographic regions. It provides throughput, latency, availability, and consistency guarantees with comprehensive service-level agreements (SLAs). In this architecture, Azure Cosmos DB stores recent data and emits a change feed used to replicate updates to Table Storage.

Alternatives

  • Azure Traffic Manager directs incoming DNS requests across the global Azure regions based on your choice of traffic routing methods. It also provides automatic failover and performance routing.
  • Azure Content Delivery Network caches static content in edge servers for quick response, and uses network optimizations to improve response for dynamic content. Content Delivery Network is especially useful when the user base is global.
  • Azure Container Apps is a fully managed, serverless container service used to build and deploy modern apps at scale.
  • Azure Kubernetes Service (AKS) is a fully managed Kubernetes service for deploying and managing containerized applications. You can use it to implement a microservices architecture whose components scale independently on demand.
  • Azure Container Instances provides a quick and simple way to run tasks without having to manage infrastructure. It's useful during development or for running unscheduled tasks.
  • Azure Service Bus is a reliable cloud messaging service for simple hybrid integration. It can be used instead of Queue Storage in this architecture. For more information, see Storage queues and Service Bus queues - compared and contrasted.

Scenario details

This solution uses Azure Cosmos DB to store the large volume of data that the web application uses. Web apps that handle massive amounts of data benefit from the ability of Azure Cosmos DB to elastically and independently scale throughput and storage.

Another key solution component is the Azure Cosmos DB change feed. When changes are made to the database, the change feed stream is sent to an event-driven Functions trigger. A function then runs and replicates the changes to Table Storage tables, which provide a low-cost storage solution. You can also orchestrate broader downstream data movement by using Azure Data Factory pipelines or Microsoft Fabric Data Factory to land data in analytics zones.
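The replication step can be sketched as a pure mapping from a changed Azure Cosmos DB document to a Table Storage entity. The following is a minimal sketch, not the article's implementation: the `deviceId` partition field, the `SourceTimestamp` property name, and the `upsert` callback are hypothetical, and the actual trigger binding and Table Storage client calls are left out. It relies only on two facts: Table Storage entities need string `PartitionKey` and `RowKey` values, and Table Storage properties hold primitive values, so nested objects are serialized to JSON strings here.

```python
import json
from typing import Any, Callable, Dict, Iterable

# Azure Cosmos DB system properties that this sketch does not copy.
SYSTEM_FIELDS = {"_rid", "_self", "_etag", "_attachments"}


def to_table_entity(doc: Dict[str, Any], partition_field: str = "deviceId") -> Dict[str, Any]:
    """Map one changed document to a flat entity (field names are assumptions)."""
    entity: Dict[str, Any] = {
        "PartitionKey": str(doc[partition_field]),
        "RowKey": str(doc["id"]),
    }
    for key, value in doc.items():
        if key in SYSTEM_FIELDS or key in ("id", partition_field):
            continue
        if key == "_ts":
            # Keep the source last-modified time under a hypothetical name.
            entity["SourceTimestamp"] = value
        elif isinstance(value, (dict, list)):
            # Nested values are stored as JSON text.
            entity[key] = json.dumps(value, sort_keys=True)
        else:
            entity[key] = value
    return entity


def replicate_batch(docs: Iterable[Dict[str, Any]],
                    upsert: Callable[[Dict[str, Any]], None]) -> int:
    """Send every changed document to the sink; return how many were sent."""
    count = 0
    for doc in docs:
        upsert(to_table_entity(doc))
        count += 1
    return count
```

In a function app, `docs` would be the batch that the change feed trigger delivers, and `upsert` would wrap whatever Table Storage client the function uses. Writing with upsert semantics keeps the replication idempotent if the same batch is delivered more than once.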

The web app needs the data for only a limited amount of time, and the solution takes advantage of that fact to further reduce costs. A second function runs periodically and deletes expired data from Azure Cosmos DB. That function runs on a schedule rather than in response to an event, because functions can be scheduled to run at set times as well as triggered.
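The expiry check that the scheduled function performs might look like the following sketch. It assumes that expiry is judged from the `_ts` system property, which Azure Cosmos DB sets to the last-modified time in epoch seconds, and that the retention period is expressed in days. The function names and the 30-day figure in the usage note are illustrative only, and the actual delete calls are omitted.

```python
from typing import Any, Dict, Iterable, List

SECONDS_PER_DAY = 86400


def expiry_cutoff(now_epoch: int, retention_days: int) -> int:
    """Return the epoch second before which a document counts as expired."""
    return now_epoch - retention_days * SECONDS_PER_DAY


def select_expired(docs: Iterable[Dict[str, Any]],
                   now_epoch: int,
                   retention_days: int) -> List[str]:
    """Return the ids of documents last modified before the cutoff."""
    cutoff = expiry_cutoff(now_epoch, retention_days)
    return [doc["id"] for doc in docs if doc["_ts"] < cutoff]
```

With a 30-day retention period, a timer-triggered function would pass the current time and then delete each returned id. In practice, the same filter would more likely be pushed into the query (for example, a condition on `c._ts`) so that only expired documents are read.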

Potential use cases

The architecture is appropriate for any application that:

  • Uses a massive amount of data.
  • Requires that data is always available when it's needed.
  • Uses data that expires.

Examples include apps that:

  • Personalize customer experience and drive engagement through live data feeds and sensors in physical locations.
  • Track customer spending habits and shopping behavior.
  • Track vehicle fleets by collecting data on vehicle location, performance, and driver behavior for improved efficiency and safety.
  • Forecast weather.
  • Offer or implement smart traffic systems that monitor traffic.
  • Analyze manufacturing IoT data.
  • Display and monitor smart meter data.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

  • When you implement and maintain this solution, you incur extra costs.
  • Using the change feed for replication requires less code maintenance than doing the replication in the core application.
  • You need to migrate existing data. The migration process requires ad hoc scripts or routines to copy old data to storage accounts. When you migrate the data, make sure that you use time stamps and copy flags to track migration progress.
  • To avoid deleting entries from the Table Storage secondary store, have the replication function ignore the delete events that result when your functions remove expired entries from Azure Cosmos DB.
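For the migration consideration above, one way to use time stamps and copy flags is a resumable batch copy. This is a sketch under stated assumptions: the `copied` and `copiedAt` field names, the `copy` callback, and the in-memory record list are all hypothetical stand-ins for the ad hoc scripts that the migration requires.

```python
from typing import Any, Callable, Dict, List


def migrate_batch(records: List[Dict[str, Any]],
                  copy: Callable[[Dict[str, Any]], None],
                  now_epoch: int,
                  batch_size: int = 100) -> int:
    """Copy up to batch_size unflagged records, oldest first; return the count."""
    copied = 0
    for record in sorted(records, key=lambda r: r["_ts"]):
        if copied >= batch_size:
            break
        if record.get("copied"):
            # Already migrated in an earlier run.
            continue
        # Hand the sink a snapshot so that tracking fields are not copied.
        copy(dict(record))
        record["copied"] = True          # copy flag (hypothetical name)
        record["copiedAt"] = now_epoch   # time stamp of the copy (hypothetical name)
        copied += 1
    return copied
```

Because each record is flagged only after its copy succeeds, the routine can be rerun after a failure and picks up where it stopped. A return value of zero indicates that nothing is left to migrate.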

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

  • Nabil Siddiqui | Cloud Solution Architect - Digital and Application Innovation

Next steps