Hi Ravi Sai Mahsiva,
Welcome to the Microsoft Q&A platform. Thank you for reaching out, and I hope you are doing well.
I had a look at your code and the output, and I can see what’s happening here.
The offset value you see in your Databricks output (like 58) is coming from the Kafka-compatible interface that Event Hubs uses. This is a Kafka-style offset, which just tracks the message position inside each partition. It’s not the same as the Event Hubs offset that you see in the Azure portal (the large number like 55834574848).
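To make the distinction concrete, here is a small illustrative sketch (plain Python, not the actual Event Hubs implementation): a Kafka-style offset is just a per-partition message index, while the Event Hubs offset behaves like a position in the partition's log, which is why the portal shows much larger numbers.

```python
# Illustrative only — these helpers are made up to show why the two
# offset values look so different; they are not Event Hubs internals.

def kafka_style_offsets(messages):
    """Kafka-style: a sequential 0-based index per message in the partition."""
    return list(range(len(messages)))

def eventhub_style_offsets(messages, start=0):
    """Event Hubs-style: a position in the partition log that grows with
    message size, so the values get large quickly (illustrative)."""
    offsets, pos = [], start
    for m in messages:
        offsets.append(pos)
        pos += len(m)
    return offsets

msgs = [b"x" * 1024] * 3
print(kafka_style_offsets(msgs))     # [0, 1, 2]      — small, like your Databricks output
print(eventhub_style_offsets(msgs))  # [0, 1024, 2048] — grows fast, like the portal value
```

So both numbers are valid offsets; they just come from two different interfaces over the same partition.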
Right now, in your code, you’re reading Event Hubs using the .format("kafka") option, which is why you only get the Kafka offset. If you want to get the actual Event Hubs offset, that value is stored inside the system properties of each event — it’s not part of your message payload.
You can read that offset by switching to the Event Hubs connector instead of the Kafka one and accessing the system property like this:
```python
from pyspark.sql.functions import col

# Read from Event Hubs with the native connector instead of the Kafka interface
event_hub_df = (
    spark.read
        .format("eventhubs")
        .options(**event_hub_config)
        .load()
)

# body holds the message payload; the Event Hubs offset comes from the
# event's system properties, not from the payload itself
df = event_hub_df.select(
    col("body").cast("string").alias("value"),
    col("systemProperties.offset").alias("eventhub_offset")
)

display(df)
```
Also, in your current schema you added an "Offset" field inside the JSON structure, but that field doesn't actually exist in your event data, which is why it always shows as null.
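As a quick illustration (using a made-up payload, not your actual event data), a schema field that isn't present in the JSON simply resolves to null, which is exactly what from_json does with your "Offset" field:

```python
import json

# Hypothetical event body — your real payload will have different fields
payload = '{"deviceId": "sensor-1", "temperature": 21.5}'
event = json.loads(payload)

# "Offset" is not a key in the payload, so there is nothing to parse:
print(event.get("Offset"))  # None — same reason your column is always null
```

Removing "Offset" from the JSON schema and taking the value from the connector instead avoids the always-null column.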
In short:
- The small offset you see now is from Kafka, not Event Hubs.
- The large offset in the Azure portal is part of the Event Hubs system properties.
- To get that value, use the Event Hubs connector and read it from systemProperties.offset.
Kindly let us know if the above helps or if you need further assistance on this issue.
If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Thanks,
Manoj