Facing issue while extracting the offset value from Event Hubs in Databricks

Ravi Sai Mahsiva 20 Reputation points
2025-10-16T14:11:49.0533333+00:00

Hi Team,

Could you please help us with code to extract the offset value from Event Hubs? I have attached a screenshot of the Event Hub details. We are using the code below, and I have even tried passing the offset column in the schema with matching case. In the final result (also attached), the offset column contains the sequence number instead. Can you please tell us how to extract the offset column value in Databricks using Spark?

[Screenshots attached: Event Hub details and query output]


from pyspark.sql.types import StructType, StructField, StringType, TimestampType

def get_geofence_event_schema():
    # Schema applied to the JSON event body. A field is only populated
    # when the payload actually contains a matching key.
    return StructType([
        StructField("id", StringType(), True),
        StructField("type", StringType(), True),
        StructField("source", StringType(), True),
        StructField("specversion", StringType(), True),
        StructField("time", TimestampType(), True),
        StructField("datacontenttype", StringType(), True),
        StructField("pubsubname", StringType(), True),
        StructField("topic", StringType(), True),
        StructField("traceid", StringType(), True),
        StructField("traceparent", StringType(), True),
        StructField("tracestate", StringType(), True),
        StructField("Offset", StringType(), True),
        StructField("data", StringType(), True)
    ])


from pyspark.sql.functions import col, from_json

df_batch = (
    spark.read
        .format("kafka")
        .options(**event_hub_config)
        .load()
        .limit(10)
        .withColumn(
            "geofenceevent",
            from_json(col("value").cast("string"), get_geofence_event_schema()),
        )
)

display(df_batch)

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Answer accepted by question author
  1. Manoj Kumar Boyini 330 Reputation points Microsoft External Staff Moderator
    2025-10-16T16:07:33.41+00:00

    Hi Ravi Sai Mahsiva,

    Welcome to Microsoft Q&A Platform. Thank you for reaching out & hope you are doing well. 

    I had a look at your code and the output, and I can see what’s happening here.

    The offset value you see in your Databricks output (like 58) is coming from the Kafka-compatible interface that Event Hubs uses. This is a Kafka-style offset, which just tracks the message position inside each partition. It’s not the same as the Event Hubs offset that you see in the Azure portal (the large number like 55834574848).

    Right now, in your code, you’re reading Event Hubs using the .format("kafka") option, which is why you only get the Kafka offset. If you want to get the actual Event Hubs offset, that value is stored inside the system properties of each event — it’s not part of your message payload.

    You can read that offset by switching to the Event Hubs connector (azure-eventhubs-spark) instead of the Kafka one. That connector exposes the Event Hubs offset directly as a metadata column on the DataFrame:

    from pyspark.sql.functions import col

    # Requires the azure-eventhubs-spark connector library on the cluster.
    event_hub_df = (
        spark.read
            .format("eventhubs")
            .options(**event_hub_config)
            .load()
    )

    # The connector surfaces Event Hubs metadata (offset, sequenceNumber,
    # enqueuedTime, ...) as top-level columns; "offset" is the same large
    # value you see in the Azure portal.
    df = event_hub_df.select(
        col("body").cast("string").alias("value"),
        col("offset").alias("eventhub_offset")
    )

    display(df)

    Also, in your current schema, you added an "Offset" field inside the JSON structure, but that field doesn't actually exist in your event data, which is why it always shows as null.

    In short:

    - The small offset you see now is from Kafka, not Event Hubs.
    - The large offset in the Azure portal is an Event Hubs system property.
    - To get that value, use the Event Hubs connector and read the offset column it exposes.
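    To illustrate the null point above with a quick plain-Python sketch (the payload below is hypothetical sample data, not your actual event):

```python
import json

# Hypothetical geofence event body; the real fields come from your
# producer. Note that the payload has no "Offset" key at all.
sample_body = json.dumps({
    "id": "evt-001",
    "type": "geofence.entered",
    "source": "demo",
})

parsed = json.loads(sample_body)

# from_json behaves the same way: a schema field with no matching JSON
# key comes back as null, which is why the "Offset" column was empty --
# the number that did appear came from Kafka metadata, not the body.
print("Offset" in parsed)  # prints False
```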

    Kindly let us know if the above helps or if you need further assistance with this issue.

    If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Thanks,
    Manoj
