Setup via Spark Properties
This approach requires no code changes. You only need the PerfGazer JAR on the classpath.
A typical invocation with `spark-shell` is shown below; `spark-submit` accepts the same `--packages` and `--conf` options.
Replace `0.0.1` in the example below with the latest release version.
```bash
spark-shell \
  --packages io.github.amadeusitgroup:perfgazer_spark_3-5-2_2.12:0.0.1 \
  --conf spark.driver.bindAddress=127.0.0.1 \
  --conf spark.driver.host=127.0.0.1 \
  --conf spark.extraListeners=com.amadeus.perfgazer.PerfGazer \
  --conf spark.perfgazer.sink.class=com.amadeus.perfgazer.JsonSink \
  --conf spark.perfgazer.sink.json.destination=/tmp/perfgazer/jsonsink/date={{perfgazer.now.year}}-{{perfgazer.now.month}}-{{perfgazer.now.day}}/applicationId={{spark.app.id}}
```
Note:
`spark.driver.bindAddress` and `spark.driver.host` force Spark to bind to the loopback interface (`127.0.0.1`). This is required on macOS to prevent the OS firewall from blocking Spark's internal Netty RPC channel. Without these settings, macOS may prompt you to allow network access, and Spark may fail if access is denied.
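To avoid retyping the same flags for `spark-shell` and `spark-submit`, they can be collected once in a Bash array. This is a minimal sketch, not part of PerfGazer itself: the variable names `PERFGAZER_VERSION` and `PERFGAZER_OPTS` are hypothetical, and the loopback settings are added only when running on macOS, where the note above says they are required.

```bash
#!/usr/bin/env bash
# Hypothetical variable names (not part of PerfGazer): collect the flags
# from the spark-shell example once so spark-shell and spark-submit can share them.
PERFGAZER_VERSION="0.0.1"  # replace with the latest release version

# The destination is single-quoted as a precaution, so the shell passes
# the {{...}} placeholders through literally for PerfGazer to expand.
PERFGAZER_OPTS=(
  --packages "io.github.amadeusitgroup:perfgazer_spark_3-5-2_2.12:${PERFGAZER_VERSION}"
  --conf spark.extraListeners=com.amadeus.perfgazer.PerfGazer
  --conf spark.perfgazer.sink.class=com.amadeus.perfgazer.JsonSink
  --conf 'spark.perfgazer.sink.json.destination=/tmp/perfgazer/jsonsink/date={{perfgazer.now.year}}-{{perfgazer.now.month}}-{{perfgazer.now.day}}/applicationId={{spark.app.id}}'
)

# Loopback binding, added only on macOS (uname reports "Darwin" there).
if [ "$(uname -s)" = "Darwin" ]; then
  PERFGAZER_OPTS+=(
    --conf spark.driver.bindAddress=127.0.0.1
    --conf spark.driver.host=127.0.0.1
  )
fi
```

The array is then expanded with quotes so each element stays a single argument: `spark-shell "${PERFGAZER_OPTS[@]}"`, or `spark-submit "${PERFGAZER_OPTS[@]}" <application-jar> [application-arguments]`.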
Available properties
PerfGazer settings
| Property | Default | Description |
|---|---|---|
| `spark.perfgazer.sql.enabled` | `true` | Enable/disable SQL-level metrics collection |
| `spark.perfgazer.jobs.enabled` | `true` | Enable/disable job-level metrics collection |
| `spark.perfgazer.stages.enabled` | `true` | Enable/disable stage-level metrics collection |
| `spark.perfgazer.tasks.enabled` | `false` | Enable/disable task-level metrics collection |
| `spark.perfgazer.max.cache.size` | `100` | Maximum number of events to keep in memory |
| `spark.perfgazer.sink.class` | (none) | Fully qualified class name of the sink to use |
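The four `*.enabled` properties follow the same naming pattern, so the flags for a chosen set of collection levels can be generated rather than typed by hand. The sketch below assumes Bash; the function `perfgazer_level_flags` and the array `PERFGAZER_LEVEL_FLAGS` are hypothetical names. It sets every level explicitly, so calling it with `sql jobs stages` reproduces the defaults listed above, and adding `tasks` also turns on task-level collection.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array PERFGAZER_LEVEL_FLAGS with
# --conf flags that enable exactly the levels given as arguments
# (any of: sql jobs stages tasks) and disable all the others.
perfgazer_level_flags() {
  PERFGAZER_LEVEL_FLAGS=()
  local level value wanted
  for level in sql jobs stages tasks; do
    value=false
    for wanted in "$@"; do
      if [ "$wanted" = "$level" ]; then
        value=true
      fi
    done
    PERFGAZER_LEVEL_FLAGS+=(--conf "spark.perfgazer.${level}.enabled=${value}")
  done
}
```

For example, after `perfgazer_level_flags stages tasks`, passing `"${PERFGAZER_LEVEL_FLAGS[@]}"` to `spark-shell` collects only stage- and task-level metrics. Emitting a flag for every level, including those left at their default, keeps the effective configuration visible in the command line itself.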
JsonSink settings
| Property | Default | Description |
|---|---|---|
| `spark.perfgazer.sink.json.destination` | (none) | Destination path for JSON output |
| `spark.perfgazer.sink.json.writeBatchSize` | `100` | Number of records to accumulate before writing to disk |
| `spark.perfgazer.sink.json.fileSizeLimit` | `209715200` (200 MiB) | File size threshold, in bytes, before rolling over to a new file |
| `spark.perfgazer.sink.json.asyncFlushTimeoutMillisecsKey` | (none) | Maximum time between periodic flushes, in milliseconds |
| `spark.perfgazer.sink.json.waitForCloseTimeoutMillisecsKey` | (none) | Maximum time to wait for the sink to close gracefully, in milliseconds |
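Because `fileSizeLimit` takes a raw byte count, shell arithmetic is a convenient way to derive it from a size in MiB. This is a minimal Bash sketch with hypothetical variable names; the 64 MiB limit and the batch size of 500 are illustrative values, not recommendations.

```bash
#!/usr/bin/env bash
# Illustrative JsonSink overrides (hypothetical names and example values).
# fileSizeLimit is a byte count: the default 209715200 equals 200 * 1024 * 1024.
FILE_SIZE_LIMIT=$(( 64 * 1024 * 1024 ))  # 64 MiB expressed in bytes
WRITE_BATCH_SIZE=500                     # records accumulated per write

JSONSINK_OPTS=(
  --conf "spark.perfgazer.sink.json.fileSizeLimit=${FILE_SIZE_LIMIT}"
  --conf "spark.perfgazer.sink.json.writeBatchSize=${WRITE_BATCH_SIZE}"
)
```

A lower limit produces more, smaller files per run; a larger batch size means fewer writes to disk but more records held in memory between writes.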
Note:
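`JsonSink` uses the POSIX file interface on the driver to write JSON files, so `spark.perfgazer.sink.json.destination` must be a path on a file system the driver can access directly (local or mounted).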
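Since the files land on the driver's file system, a quick way to check what a run produced is to list them on the driver host. This Bash sketch uses a hypothetical function name, `list_perfgazer_output`, and defaults to the destination root from the `spark-shell` example; it makes no assumption about file names or extensions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every file under a JsonSink destination root
# on the driver host, sorted by path. Defaults to the root used in the
# spark-shell example; returns 1 if the directory does not exist.
list_perfgazer_output() {
  local root="${1:-/tmp/perfgazer/jsonsink}"
  if [ ! -d "$root" ]; then
    echo "no output under $root" >&2
    return 1
  fi
  find "$root" -type f | sort
}
```

For example, `list_perfgazer_output` with no argument lists everything under `/tmp/perfgazer/jsonsink`, grouped by the `date=.../applicationId=...` subdirectories the destination template creates.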