Missing events: how to prevent temporarily empty files from being processed

hoi 25 Reputation points
2025-10-20T10:12:38.8266667+00:00

We're using Azure Stream Analytics with ADLS as the input source.

Our source system (Kafka) creates files that are initially empty (0 bytes). After a short delay (milliseconds), data is flushed and the file is closed.

However, Stream Analytics processes each file only once, regardless of whether it contains data. If a file is processed while still empty, it will not be reprocessed once data is written—resulting in missing events upstream.

Unfortunately, we cannot change Kafka’s behavior.

Is there any way for Stream Analytics to handle this scenario more reliably?

Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.

Answer accepted by question author
  1. Smaran Thoomu 31,910 Reputation points Microsoft External Staff Moderator
    2025-10-21T13:58:07.81+00:00

    Hi hoi,
    Thank you for reaching out and for explaining your setup in detail. I understand that you’re using Azure Stream Analytics (ASA) with ADLS as the input and are encountering an issue where ASA picks up temporary 0-byte files before data is flushed, causing missing events since those files are not reprocessed once data is written.

    This is a known behavior - Stream Analytics processes each blob only once, based on its creation event. If a file is empty at that moment, it won’t be picked up again even if data is later appended.

    Here are some recommended approaches to mitigate this:

    Option 1: Introduce a file-ready signal or rename pattern

    If possible, modify the upstream process (or add a lightweight wrapper) to:

    • Write the file under a temporary name or non-monitored path (e.g., file.tmp),
    • Flush and close the file completely,
    • Then rename or copy it to the final monitored container (e.g., file.json or file.parquet). Stream Analytics will only trigger ingestion on the final file creation, ensuring the file contains data. A minimal sketch of this pattern follows below.
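
    A minimal sketch of this pattern, assuming a small Python wrapper around the writer that uses the azure-storage-file-datalake SDK. The account URL, filesystem, and folder names below are illustrative placeholders, not values from your environment:

```python
# Sketch only: write to a path ASA does not watch, then rename into the
# monitored folder once the data is fully written and the file is closed.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder
FILESYSTEM = "events"                  # container name (assumption)
STAGING_PATH = "staging/file.json"     # path not matched by ASA's input pattern
FINAL_PATH = "incoming/file.json"      # folder ASA monitors (assumption)

service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
fs = service.get_file_system_client(FILESYSTEM)

# 1. Write and close the file outside the monitored path.
staged = fs.get_file_client(STAGING_PATH)
staged.upload_data(b'{"event": "example"}\n', overwrite=True)

# 2. Rename into the monitored folder only after the write completes,
#    so the blob ASA picks up is never empty.
staged.rename_file(f"{FILESYSTEM}/{FINAL_PATH}")
```

    On ADLS Gen2 (hierarchical namespace) the rename is an atomic metadata operation, so the file appears under the monitored path already containing its data.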

    Option 2: Use Azure Data Factory (ADF) or Logic App as a pre-processor

    Use ADF or a Logic App to:

    • Monitor the ADLS folder for new files.
    • Filter out 0-byte files.
    • Move or copy non-empty files into a staging folder that ASA reads from. This ensures ASA only processes valid data files; a sketch of this filtering step follows below.
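
    The same filter-and-move logic, sketched here as a small Python job purely for illustration; in practice it would live in an ADF pipeline (e.g., Get Metadata, If Condition, and Copy/Delete activities) or a Logic App. Folder names are assumptions:

```python
# Sketch of the pre-processing step: skip 0-byte files in the landing folder
# and move completed files into the staging folder ASA reads from.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder
FILESYSTEM = "events"
LANDING_DIR = "landing"   # where the Kafka-produced files arrive (assumption)
STAGING_DIR = "staging"   # folder configured as the ASA input (assumption)

service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
fs = service.get_file_system_client(FILESYSTEM)

for item in fs.get_paths(path=LANDING_DIR, recursive=True):
    if item.is_directory or not item.content_length:
        continue  # ignore folders and files that are still 0 bytes
    file_name = item.name.rsplit("/", 1)[-1]
    # Move the completed file into the folder ASA monitors.
    fs.get_file_client(item.name).rename_file(f"{FILESYSTEM}/{STAGING_DIR}/{file_name}")
```

    Files that are still 0 bytes when the job runs are simply left in place and picked up on a later run, once the producer has flushed and closed them.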

    Option 3: Consider an Event-based or streaming alternative

    If changing file behavior is difficult, you might consider:

    • Event Hubs or Kafka directly as input to ASA (bypassing the file latency issue; see the producer sketch below), or
    • Using Dataflow/ADF pipelines with triggers to control when files are handed off to ASA.
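
    If you go the Event Hubs route, the producer side is only a few lines; below is a minimal sketch using the azure-eventhub SDK (the connection string and hub name are placeholders). Event Hubs also exposes a Kafka-compatible endpoint, so an existing Kafka producer can often be repointed with configuration changes only.

```python
# Sketch: publish events directly to Event Hubs so ASA consumes a stream
# instead of files. Connection string and hub name are placeholders.
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-connection-string>",
    eventhub_name="events",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"event": "example"}'))
    producer.send_batch(batch)
```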

    ASA doesn’t currently support “reprocessing” a blob after it’s updated, nor does it check file size or last-modified timestamp. Once a blob has been read, it is skipped even if it is updated later.

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.


1 additional answer

  1. hoi 25 Reputation points
    2025-10-23T12:22:14.35+00:00

    Hi, thanks for your detailed answer. Your 3 options align with our own thinking.

    1 person found this answer helpful.
