Missing events: how to prevent temporarily empty files from being processed

hoi 25 Reputation points
2025-10-20T10:12:38.8266667+00:00

We're using Azure Stream Analytics with ADLS as the input source.

Our source system (Kafka) creates files that are initially empty (0 bytes). After a short delay (milliseconds), data is flushed and the file is closed.

However, Stream Analytics processes each file only once, regardless of whether it contains data. If a file is processed while still empty, it will not be reprocessed once data is written—resulting in missing events upstream.

Unfortunately, we cannot change Kafka’s behavior.

Is there any way for Stream Analytics to handle this scenario more reliably?

Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.

Answer accepted by question author
  1. Smaran Thoomu 31,910 Reputation points Microsoft External Staff Moderator
    2025-10-21T13:58:07.81+00:00

    Hi hoi,
    Thank you for reaching out and for explaining your setup in detail. I understand that you’re using Azure Stream Analytics (ASA) with ADLS as the input and are encountering an issue where ASA picks up temporary 0-byte files before data is flushed, causing missing events since those files are not reprocessed once data is written.

    This is a known behavior - Stream Analytics processes each blob only once, based on its creation event. If a file is empty at that moment, it won’t be picked up again even if data is later appended.

    Here are some recommended approaches to mitigate this:

    Option 1: Introduce a file-ready signal or rename pattern

    If possible, modify the upstream process (or add a lightweight wrapper) to:

    • Write the file under a temporary name or non-monitored path (e.g., file.tmp),
    • Flush and close the file completely,
    • Then rename or copy it to the final monitored container (e.g., file.json or file.parquet). Stream Analytics will only trigger ingestion on the final file creation, ensuring the file contains data. A minimal sketch of this pattern follows below.
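
    A minimal sketch of this pattern, assuming a small Python wrapper around the writer that uses the azure-storage-file-datalake SDK. The account URL, filesystem, and folder names below are illustrative placeholders, not values from your environment:

```python
# Sketch only: write to a path ASA does not watch, then rename into the
# monitored folder once the data is fully written and the file is closed.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder
FILESYSTEM = "events"                  # container name (assumption)
STAGING_PATH = "staging/file.json"     # path not matched by ASA's input pattern
FINAL_PATH = "incoming/file.json"      # folder ASA monitors (assumption)

service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
fs = service.get_file_system_client(FILESYSTEM)

# 1. Write and close the file outside the monitored path.
staged = fs.get_file_client(STAGING_PATH)
staged.upload_data(b'{"event": "example"}\n', overwrite=True)

# 2. Rename into the monitored folder only after the write completes,
#    so the blob ASA picks up is never empty.
staged.rename_file(f"{FILESYSTEM}/{FINAL_PATH}")
```

    On ADLS Gen2 (hierarchical namespace) the rename is an atomic metadata operation, so the file appears under the monitored path already containing its data.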

    Option 2: Use Azure Data Factory (ADF) or Logic App as a pre-processor

    Use ADF or a Logic App to:

    • Monitor the ADLS folder for new files.
    • Filter out 0-byte files.
    • Move or copy non-empty files into a staging folder that ASA reads from. This ensures ASA only processes valid data files; a sketch of this filtering step follows below.
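
    The same filter-and-move logic, sketched here as a small Python job purely for illustration; in practice it would live in an ADF pipeline (e.g., Get Metadata, If Condition, and Copy/Delete activities) or a Logic App. Folder names are assumptions:

```python
# Sketch of the pre-processing step: skip 0-byte files in the landing folder
# and move completed files into the staging folder ASA reads from.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder
FILESYSTEM = "events"
LANDING_DIR = "landing"   # where the Kafka-produced files arrive (assumption)
STAGING_DIR = "staging"   # folder configured as the ASA input (assumption)

service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
fs = service.get_file_system_client(FILESYSTEM)

for item in fs.get_paths(path=LANDING_DIR, recursive=True):
    if item.is_directory or not item.content_length:
        continue  # ignore folders and files that are still 0 bytes
    file_name = item.name.rsplit("/", 1)[-1]
    # Move the completed file into the folder ASA monitors.
    fs.get_file_client(item.name).rename_file(f"{FILESYSTEM}/{STAGING_DIR}/{file_name}")
```

    Files that are still 0 bytes when the job runs are simply left in place and picked up on a later run, once the producer has flushed and closed them.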

    Option 3: Consider an Event-based or streaming alternative

    If changing file behavior is difficult, you might consider:

    • Event Hubs or Kafka directly as input to ASA (bypassing the file latency issue; see the producer sketch below), or
    • Using Dataflow/ADF pipelines with triggers to control when files are handed off to ASA.
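
    If you go the Event Hubs route, the producer side is only a few lines; below is a minimal sketch using the azure-eventhub SDK (the connection string and hub name are placeholders). Event Hubs also exposes a Kafka-compatible endpoint, so an existing Kafka producer can often be repointed with configuration changes only.

```python
# Sketch: publish events directly to Event Hubs so ASA consumes a stream
# instead of files. Connection string and hub name are placeholders.
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-connection-string>",
    eventhub_name="events",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"event": "example"}'))
    producer.send_batch(batch)
```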

    ASA doesn’t currently support “reprocessing” a blob after it’s updated, nor does it check file size or last-modified timestamp. Once a blob has been read, it is skipped even if it is updated later.

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.


1 additional answer

  1. hoi 25 Reputation points
    2025-10-23T12:22:14.35+00:00

    Hi, thanks for your detailed answer. Your 3 options align with our own thinking.

    1 person found this answer helpful.
