User Guide
Follow these steps to get PerfGazer up and running.
Step 1 — Annotate your Spark code
To make the listener data easier to analyze, add job descriptions to your Spark code using setJobDescription or setJobGroup (both are backed by setLocalProperty):
spark.sparkContext.setJobDescription("my-etl-job: loading customer data")
or with a job group:
spark.sparkContext.setJobGroup("my-etl-job", "loading customer data")
These labels will appear in the collected reports and help you correlate metrics back to specific parts of your application.
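If you set descriptions in several places, a small helper keeps labels from leaking from one job into the next. The sketch below is not part of PerfGazer; it is a generic context manager of our own that works with any object exposing Spark's setJobDescription and getLocalProperty methods, such as a SparkContext:

```python
from contextlib import contextmanager


@contextmanager
def job_description(sc, description):
    """Temporarily set a Spark job description, restoring the previous one on exit."""
    # "spark.job.description" is the local property that setJobDescription writes.
    previous = sc.getLocalProperty("spark.job.description")
    sc.setJobDescription(description)
    try:
        yield
    finally:
        # Restore whatever was set before (None clears the description).
        sc.setJobDescription(previous)
```

Every action run inside the with block carries the label; on exit the earlier description, or none at all, is restored automatically.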
Step 2 — Set up the listener
The listener can be configured in two ways: via Spark properties (the default and recommended approach, which requires no code changes) or programmatically in your application code.
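As a sketch of the properties-based setup: Spark's standard spark.extraListeners property registers a listener class on the driver at startup. The class name and package coordinates below are illustrative placeholders, not PerfGazer's actual values; use the ones published with your PerfGazer release:

```properties
# spark-defaults.conf (or pass each entry with --conf on spark-submit)
# NOTE: the listener class and package below are placeholders
spark.extraListeners    com.example.perfgazer.PerfGazerListener
spark.jars.packages     com.example:perfgazer_2.12:0.1.0
```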
For Databricks-specific setup, see Databricks.
Step 3 — Run your Spark application
Run your application as usual. PerfGazer will collect metrics in the background and flush them to the configured sink.
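Concretely, a run might look like the spark-submit sketch below. The listener class and package coordinates are the same placeholders as above, standing in for whatever your Step 2 configuration actually uses:

```shell
# Illustrative only: class and package names are placeholders
spark-submit \
  --conf spark.extraListeners=com.example.perfgazer.PerfGazerListener \
  --conf spark.jars.packages=com.example:perfgazer_2.12:0.1.0 \
  my_app.py
```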
PerfGazer registers a shutdown hook that ensures the listener is closed gracefully when the driver JVM exits, regardless of which setup method you used.
Step 4 — Analyze the listener data
Once your application has run, the collected reports can be analyzed in different ways depending on your environment and preference:
Note: at application shutdown, PerfGazer prints view creation snippets in the logs that match your configuration. These are a convenient starting point for SQL analysis.
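For illustration, such a snippet might resemble the SQL below; the view name, format, and path are invented here and will differ depending on your sink configuration, so the authoritative version is always the one PerfGazer prints in your logs:

```sql
-- Hypothetical shape of a printed snippet; your logs show the real one
CREATE OR REPLACE TEMPORARY VIEW perfgazer_reports
AS SELECT * FROM json.`/tmp/perfgazer/reports`;
```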