Hi,
Thank you for reaching out and for explaining your setup in detail. I understand that you're using Azure Stream Analytics (ASA) with ADLS as input, and that ASA picks up temporary 0-byte files before the data is flushed, causing missing events because those files are not reprocessed once data is written.
This is a known behavior - Stream Analytics processes each blob only once, based on its creation event. If a file is empty at that moment, it won’t be picked up again even if data is later appended.
Here are some recommended approaches to mitigate this:
Option 1: Introduce a file-ready signal or rename pattern
If possible, modify the upstream process (or add a lightweight wrapper) to:
- Write the file under a temporary name (e.g., file.tmp),
- Flush and close the file completely,
- Then rename or copy it to the final monitored container (e.g., file.json or file.parquet).

Stream Analytics will only trigger ingestion on the final file's creation event, ensuring the file already contains data.
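If the upstream writer is something you control (for example a Python script or an Azure Function), the pattern could look like the sketch below. It assumes the azure-storage-file-datalake package, a connection string in an environment variable, and placeholder container, folder, and file names; it illustrates the temp-write-then-rename idea rather than being a drop-in implementation.

```python
# Sketch: write under a temporary name, flush, then rename so ASA only sees a complete file.
# Assumes the azure-storage-file-datalake package; all names below are placeholders.
import os
from azure.storage.filedatalake import DataLakeServiceClient

CONN_STR = os.environ["ADLS_CONNECTION_STRING"]   # assumed environment variable
FILESYSTEM = "asa-data"                           # placeholder container
STAGING_NAME = "staging/events.tmp"               # temp path outside the path ASA monitors
FINAL_NAME = "incoming/events.json"               # final path that ASA's input pattern covers

service = DataLakeServiceClient.from_connection_string(CONN_STR)
fs = service.get_file_system_client(FILESYSTEM)

payload = b'{"deviceId": "sensor-1", "value": 42}\n'

# 1. Upload under the temporary name; upload_data creates and flushes the file in one call.
tmp_file = fs.get_file_client(STAGING_NAME)
tmp_file.upload_data(payload, overwrite=True)

# 2. Rename to the final, monitored path only after the data is fully written.
#    rename_file expects the destination as "<filesystem>/<path>".
tmp_file.rename_file(f"{FILESYSTEM}/{FINAL_NAME}")
```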
Option 2: Use Azure Data Factory (ADF) or Logic App as a pre-processor
Use ADF or a Logic App to:
- Monitor the ADLS folder for new files.
- Filter out 0-byte files.
- Move or copy non-empty files into a staging folder that ASA reads from.

This ensures ASA only processes valid data files.
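If ADF or Logic Apps aren't a good fit, the same "skip 0-byte files" rule can also be expressed in code, for example in an Azure Function or a scheduled script. The sketch below uses the ADLS SDK; the container, folder, and environment-variable names are assumptions.

```python
# Illustrative sketch of the Option 2 filter logic using the ADLS SDK.
# Assumes the azure-storage-file-datalake package; all names below are placeholders.
import os
from azure.storage.filedatalake import DataLakeServiceClient

CONN_STR = os.environ["ADLS_CONNECTION_STRING"]   # assumed environment variable
LANDING_FS = "raw-landing"                        # where upstream writes files (placeholder)
STAGING_FS = "asa-input"                          # container ASA reads from (placeholder)

service = DataLakeServiceClient.from_connection_string(CONN_STR)
landing = service.get_file_system_client(LANDING_FS)
staging = service.get_file_system_client(STAGING_FS)

for path in landing.get_paths(path="incoming", recursive=True):
    if path.is_directory or path.content_length == 0:
        continue  # skip folders and 0-byte (still-empty) files

    # Copy the non-empty file into the container ASA monitors.
    data = landing.get_file_client(path.name).download_file().readall()
    staging.get_file_client(path.name).upload_data(data, overwrite=True)
```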
Option 3: Consider an Event-based or streaming alternative
If changing file behavior is difficult, you might consider:
- Event Hubs or Kafka directly as input to ASA (bypassing the file latency issue), or
- Using Dataflow/ADF pipelines with triggers to control when files are handed off to ASA.
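For the Event Hubs route, the upstream producer would publish events directly instead of writing files. A minimal sketch with the azure-eventhub package follows; the connection-string variable and hub name are placeholders.

```python
# Sketch: send events straight to an Event Hub that ASA uses as input.
# Assumes the azure-eventhub package; connection string and hub name are placeholders.
import json
import os
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    os.environ["EVENTHUB_CONNECTION_STRING"],   # assumed environment variable
    eventhub_name="asa-input-hub",              # placeholder Event Hub name
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps({"deviceId": "sensor-1", "value": 42})))
    producer.send_batch(batch)  # ASA receives the event as soon as it lands in the hub
```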
ASA doesn't currently support "reprocessing" a blob after it's updated, nor does it check file size or last-modified timestamp. Once a blob has been read, it is skipped even if it is updated later.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.