User Guide

Follow these steps to get PerfGazer up and running.

Step 1 — Annotate your Spark code

To make the listener data easier to analyze, add job descriptions to your Spark code using setJobDescription or setJobGroup:

spark.sparkContext.setJobDescription("my-etl-job: loading customer data")

or with a job group:

spark.sparkContext.setJobGroup("my-etl-job", "loading customer data")

These labels will appear in the collected reports and help you correlate metrics back to specific parts of your application.
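If you set descriptions in many places, it is easy to forget to clear them, and a stale label then attaches to unrelated jobs. A small helper can scope the label to a block of code; this is an illustrative sketch (the `job_description` helper is not part of PerfGazer), assuming `sc` is a `SparkContext` and only the `setJobDescription` call shown above:

```python
from contextlib import contextmanager

@contextmanager
def job_description(sc, description):
    """Set a Spark job description for the enclosed block, then clear it.

    Illustrative helper: `sc` is assumed to be a SparkContext; only its
    setJobDescription method is used. Clearing (setting None) on exit
    keeps the label from leaking into later, unrelated jobs.
    """
    sc.setJobDescription(description)
    try:
        yield
    finally:
        sc.setJobDescription(None)

# Usage:
#   with job_description(spark.sparkContext, "my-etl-job: loading customer data"):
#       df.write.saveAsTable("customers")
```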

Step 2 — Set up the listener

The listener can be configured in two ways: declaratively via Spark properties, or programmatically in your application code. The default and recommended approach is Spark properties, which requires no code changes.
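With the properties-based approach, Spark's standard `spark.extraListeners` setting registers a listener class at startup. A minimal sketch of what this could look like in `spark-defaults.conf` (the fully qualified class name below is a placeholder, not PerfGazer's actual class; check the project's reference for the real value):

```properties
# spark-defaults.conf — class name is illustrative, not PerfGazer's actual class
spark.extraListeners    com.example.perfgazer.PerfGazerListener
```

The same property can be passed on the command line via `--conf spark.extraListeners=...` when submitting the application.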

For Databricks-specific setup, see Databricks.

Step 3 — Run your Spark application

Run your application as usual. PerfGazer will collect metrics in the background and flush them to the configured sink.

PerfGazer registers a shutdown hook that ensures the listener is closed gracefully when the driver JVM exits, regardless of which setup method you used.
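The shutdown-hook mechanism works the same way in general: on driver exit, a registered callback flushes any buffered metrics to the sink before the process terminates. A minimal pure-Python sketch of the pattern (illustrative only, not PerfGazer's actual implementation):

```python
import atexit

class MetricsBuffer:
    """Illustrative stand-in for a listener's in-memory metric buffer."""

    def __init__(self):
        self.records = []
        self.closed = False

    def flush_and_close(self):
        # A real listener would write buffered metrics to the configured
        # sink here; this sketch just drains the buffer and marks closure.
        self.records.clear()
        self.closed = True

buffer = MetricsBuffer()
# Register the flush as a shutdown hook so metrics collected near the end
# of the run are not lost when the process exits.
atexit.register(buffer.flush_and_close)
```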

Step 4 — Analyze the listener data

Once your application has run, the collected reports can be analyzed in different ways depending on your environment and preference:

Note: at application shutdown, PerfGazer prints view creation snippets in the logs that match your configuration. These are a convenient starting point for SQL analysis.
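To give a sense of their shape: such snippets typically create a view over the sink's storage path so the reports become queryable with SQL. The sketch below is hypothetical (the view name, path, and parquet format are assumptions; the snippets PerfGazer actually prints depend on your configuration):

```python
def view_snippet(view_name, sink_path):
    """Build a CREATE VIEW statement over a parquet sink.

    Hypothetical sketch of the kind of snippet printed at shutdown;
    the exact SQL depends on the configured sink and format.
    """
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name} "
        f"USING parquet OPTIONS (path '{sink_path}')"
    )

# Usage: pass the generated SQL to spark.sql(...) in an analysis session.
```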