Use Qlik to replicate mainframe and midrange data to Azure

Azure Event Hubs
Azure Data Lake
Azure Databricks

This solution uses an on-premises instance of Qlik to replicate on-premises data sources to Azure in real time.

Note

Pronounce "Qlik" like "click."

Apache® and Apache Kafka® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Architecture

Diagram of an architecture that uses Qlik to migrate data to Azure.

Download a Visio file of this architecture.

Workflow

  1. Host agent: The host agent on the on-premises system captures change log information from Db2, Information Management System (IMS), and Virtual Storage Access Method (VSAM) data stores and passes it to the Qlik replication server.

  2. Replication server: The Qlik replication server software passes the change log information to Kafka and Azure Event Hubs. In this example, Qlik is on-premises, but you can deploy it on a virtual machine in Azure.

  3. Stream ingestion: Kafka and Event Hubs provide message brokers to receive and store change log information.

  4. Kafka Connect: The Kafka Connect API receives data from Kafka to update Azure data stores like Azure Data Lake Storage, Azure Databricks, and Azure Synapse Analytics.

  5. Data Lake Storage: Data Lake Storage is a staging area for the change log data.

  6. Azure Databricks: Azure Databricks processes the change log data and updates the corresponding files on Azure.

  7. Azure data services: Azure provides the following efficient data storage services.

    • Relational database services:

      • SQL Server on Azure Virtual Machines
      • Azure SQL Database
      • Azure SQL Managed Instance
      • Azure Database for PostgreSQL
      • Azure Database for MySQL

      There are many factors to consider when you choose a data storage service. Consider the type of workload, cross-database queries, two-phase commit requirements, the ability to access the file system, amount of data, required throughput, and latency.

    • Azure Cosmos DB: Azure Cosmos DB is a NoSQL database that provides low-latency responses and automatic scalability.

    • Azure Synapse Analytics: Azure Synapse Analytics is an analytics service that combines data integration, enterprise data warehousing, and big data analytics. Use it to query data by using either serverless or dedicated resources at scale.

    • Microsoft Fabric: Microsoft Fabric is an all-in-one analytics solution for enterprises. It covers everything from data movement to data science, real-time analytics, and business intelligence. It provides a comprehensive suite of services, including data lake, data engineering, and data integration.
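
As a rough illustration of steps 5 and 6, the following Python sketch shows how ordered change log records might be applied to a keyed copy of a table. The `ChangeEvent` shape, its field names, and the `apply_changes` helper are hypothetical and chosen for illustration only. They aren't the Qlik message format or an Azure Databricks API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record (assumed shape, not the Qlik format)."""
    sequence: int                          # position in the source change log
    operation: str                         # "INSERT", "UPDATE", or "DELETE"
    key: str                               # primary key of the affected row
    data: Optional[Dict[str, Any]] = None  # full row image after the change


def apply_changes(
    table: Dict[str, Dict[str, Any]],
    events: Iterable[ChangeEvent],
) -> Dict[str, Dict[str, Any]]:
    """Return a new table with the events applied in change-log order."""
    # Copy the input so the caller's table is left unchanged.
    result = {key: dict(row) for key, row in table.items()}
    # Apply events by sequence number, regardless of arrival order.
    for event in sorted(events, key=lambda e: e.sequence):
        if event.operation in ("INSERT", "UPDATE"):
            # Upsert: the event carries the full row image.
            result[event.key] = dict(event.data or {})
        elif event.operation == "DELETE":
            result.pop(event.key, None)
        else:
            raise ValueError(f"Unknown operation: {event.operation}")
    return result


if __name__ == "__main__":
    incoming = [
        ChangeEvent(2, "UPDATE", "A1", {"balance": 150}),
        ChangeEvent(1, "INSERT", "A1", {"balance": 100}),
        ChangeEvent(3, "INSERT", "B2", {"balance": 10}),
        ChangeEvent(4, "DELETE", "B2"),
    ]
    print(apply_changes({}, incoming))
```

The sketch sorts by sequence number before applying changes because records can arrive out of order. Without the sort, an update applied before its insert would be overwritten by the older row image.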

Components

This architecture consists of several Azure cloud services and is divided into four categories of resources: networking, application, storage and databases, and monitoring. The following sections describe the services in each category and their roles.

Networking

When you design application architecture, prioritize networking components to help ensure security, performance, and manageability for interactions over the public internet or private connections.

  • Azure ExpressRoute is a dedicated, private connection between your on-premises infrastructure and Microsoft cloud services. In this architecture, it ensures secure, high-throughput connectivity to Azure and Microsoft 365 and bypasses the public internet for improved reliability and performance.

  • Azure VPN Gateway is a virtual network gateway that enables encrypted communication between Azure and on-premises environments over the public internet. In this architecture, it provides secure site-to-site or point-to-site VPN access for hybrid connectivity.

Application

Azure provides managed services that support more secure, scalable, and efficient application deployment. This architecture uses application tier services that can help you optimize your application architecture.

  • Apache Kafka is an open-source distributed event streaming platform used for high-throughput data pipelines, streaming analytics, and mission-critical applications. In this architecture, it receives the Db2, IMS, and VSAM change data that the Qlik replication server passes to it and makes that data available for real-time movement and transformation.

  • Azure Databricks is a cloud-based data engineering and analytics platform built on Apache Spark. It can process and transform massive quantities of data, and you can explore the data by using machine learning models. In this architecture, it transforms and analyzes large volumes of ingested change data. Jobs can be written in R, Python, Java, Scala, and Spark SQL.

  • Data Lake Storage is a scalable data lake built on Azure Blob Storage for storing structured and unstructured data. In this architecture, it serves as the persistent storage layer for processed change log data from on-premises systems.

  • Event Hubs is a big data streaming platform and event ingestion service that can receive and process millions of messages per second. In this architecture, it captures Db2, IMS, and VSAM change data messages. You can transform and store the event hub data by using a real-time analytics provider or a custom adapter.
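
Event Hubs also exposes a Kafka-compatible endpoint, so Kafka clients can typically connect to it by changing only their configuration. The following Python sketch builds such a configuration with librdkafka-style property names. The `event_hubs_kafka_config` helper and the `contoso-ns` namespace are hypothetical, and this article doesn't state whether the Qlik Kafka target is configured this way, so treat it as an assumption to verify.

```python
from typing import Dict


def event_hubs_kafka_config(namespace: str, connection_string: str) -> Dict[str, str]:
    """Build Kafka client settings for an Event Hubs namespace (illustrative helper)."""
    return {
        # The Kafka endpoint of an Event Hubs namespace listens on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal user name "$ConnectionString" tells Event Hubs that the
        # password field carries a namespace connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    # Placeholder values only; load real secrets from a secure store.
    config = event_hubs_kafka_config("contoso-ns", "<connection-string>")
    print(config["bootstrap.servers"])
```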

Storage and databases

This architecture uses scalable, more secure cloud storage and managed databases for flexible data management.

  • Azure Cosmos DB is a globally distributed NoSQL database service. In this architecture, it stores nontabular data migrated from mainframe systems and supports low-latency access across regions.

  • Azure Database for MySQL is a fully managed MySQL database service designed for scalability and high availability. In this architecture, it supports open-source relational workloads.

  • Azure Database for PostgreSQL is a fully managed, scalable PostgreSQL database service that has native connectivity with Azure services. In this architecture, it hosts relational data that benefits from advanced indexing, analytics, and compatibility with open-source tools.

  • Azure SQL is a family of cloud-based SQL database services that support migration, modernization, and development. This family includes the following offerings:

    • Azure SQL Edge is a lightweight SQL engine optimized for IoT and edge deployments. In this architecture, it processes and stores data close to devices in disconnected or latency-sensitive environments.

    • Azure SQL Managed Instance is a fully managed SQL Server instance with near 100% compatibility with on-premises SQL Server. In this architecture, it hosts migrated databases that benefit from simplified management and built-in high availability.

    • SQL Database is a fully managed relational database optimized for scalability and performance. In this architecture, it supports modernized workloads with elastic compute and built-in intelligence.

    • SQL Server on Azure Virtual Machines is a full-featured SQL Server instance that runs on Azure infrastructure. In this architecture, it supports legacy workloads that require full control over the operating system and database engine.

  • Azure Storage is a suite of scalable and secure cloud services for storing data, applications, and workloads. In this architecture, it provides foundational storage capabilities and includes the following offerings:

    • Azure Files is a fully managed file share service built on the Server Message Block (SMB) protocol. In this architecture, it stores migrated mainframe files and supports lift-and-shift scenarios for legacy workloads.

    • Azure Queue Storage is a messaging service for storing and retrieving messages between distributed application components. In this architecture, it enables asynchronous communication between microservices and back-end systems.

    • Azure Table Storage is a NoSQL key-value store for semi-structured data. In this architecture, it stores metadata and reference data from the mainframe system in a scalable format.

Monitoring

Monitoring tools provide comprehensive data analysis and valuable insights into application performance.

  • Application Insights is a feature of Azure Monitor that provides deep telemetry for application performance, availability, and usage. In this architecture, it monitors application behavior, detects anomalies, and supports distributed tracing to ensure reliability across services.

  • Azure Monitor is a comprehensive platform for collecting, analyzing, and acting on telemetry from Azure and on-premises environments. In this architecture, it serves as the central observability layer, which enables proactive monitoring and diagnostics across infrastructure and applications.

    • Log Analytics is a query tool within Azure Monitor that enables deep analysis of log data by using a powerful query language. In this architecture, it supports diagnostics, custom dashboards, and operational insights by joining and aggregating data across multiple sources.

Alternatives

  • The preceding diagram shows Qlik installed on-premises. This placement is a recommended practice because it keeps Qlik close to the on-premises data sources. An alternative is to install Qlik in the cloud on an Azure virtual machine.

  • Qlik Data Integration can deliver data directly to Azure Databricks without going through Kafka or an event hub.

  • Qlik Data Integration can't replicate data directly to Azure Cosmos DB, but you can integrate Azure Cosmos DB with an event hub by using event-sourcing architecture.

Scenario details

Many organizations use mainframe and midrange systems to run demanding and critical workloads. Most applications use shared databases, often across multiple systems. In this environment, modernizing to the cloud means that on-premises data must be provided to cloud-based applications. Therefore, data replication becomes an important modernization tactic.

The Qlik Data Integration platform includes Qlik Replicate, which performs data replication. It uses change data capture to replicate on-premises data stores in real time to Azure. The change data can come from Db2, IMS, and VSAM change logs. This replication technique eliminates inconvenient batch bulk loads.

Potential use cases

This solution might be appropriate for:

  • Hybrid environments that require replication of data changes from a mainframe or midrange system to Azure databases.

  • Online database migration from Db2 to an Azure SQL database with little downtime.

  • Data replication from various on-premises data stores to Azure for consolidation and analysis.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.

Reliability

Reliability helps ensure that your application can meet the commitments that you make to your customers. For more information, see Design review checklist for Reliability.

  • Qlik Data Integration can be configured in a high-availability cluster.

  • The Azure database services support zone redundancy and can be designed to fail over to a secondary node during a maintenance window or if an outage occurs.

Security

Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.

  • ExpressRoute provides a private and efficient connection to Azure from on-premises, but you can use a site-to-site VPN instead.

  • Azure resources can be authenticated by using Microsoft Entra ID, and permissions are managed through role-based access control.

  • Azure database services support various security options, such as:

    • Data encryption at rest.

    • Dynamic data masking.

    • Always-encrypted databases.

  • For more information, see Azure security documentation.

Cost Optimization

Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.

Use the Azure pricing calculator to estimate costs for your implementation.

Operational Excellence

Operational Excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Design review checklist for Operational Excellence.

You can combine Application Insights and Log Analytics features to monitor the health of Azure resources. You can set alerts so that you can manage problems proactively.

Performance Efficiency

Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.

Azure Databricks, Data Lake Storage, and the Azure database services have autoscaling capabilities. For more information, see Autoscaling.

Contributors

Microsoft maintains this article.