Setup via Spark Properties

This approach requires no code changes. You only need the PerfGazer JAR on the classpath.

A typical spark-shell invocation is shown below; spark-submit accepts the same options. Use the latest release version, listed on the project's GitHub Releases page.

spark-shell \
  --packages io.github.amadeusitgroup:perfgazer_spark_3-5-2_2.12:0.0.1 \
  --conf spark.driver.bindAddress=127.0.0.1 \
  --conf spark.driver.host=127.0.0.1 \
  --conf spark.extraListeners=com.amadeus.perfgazer.PerfGazer \
  --conf spark.perfgazer.sink.class=com.amadeus.perfgazer.JsonSink \
  --conf spark.perfgazer.sink.json.destination=/tmp/perfgazer/jsonsink/date={{perfgazer.now.year}}-{{perfgazer.now.month}}-{{perfgazer.now.day}}/applicationId={{spark.app.id}}

Note: spark.driver.bindAddress and spark.driver.host force Spark to bind to the loopback interface (127.0.0.1). This is required on macOS to prevent the OS firewall from blocking Spark's internal Netty RPC channel. Without these settings, macOS may prompt you to allow network access, and Spark fails if the request is denied.
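Because spark-submit takes the same options, the settings above can be wrapped in a small shell function and reused across jobs. The following is a minimal sketch, not something PerfGazer ships: the function name perfgazer_submit, the class com.example.MyApp, and my-app.jar are hypothetical placeholders.

```shell
# Hypothetical helper: runs spark-submit with the PerfGazer listener and
# JsonSink configured as in the spark-shell example above.
# Any extra arguments are passed through unchanged.
perfgazer_submit() {
  spark-submit \
    --packages io.github.amadeusitgroup:perfgazer_spark_3-5-2_2.12:0.0.1 \
    --conf spark.extraListeners=com.amadeus.perfgazer.PerfGazer \
    --conf spark.perfgazer.sink.class=com.amadeus.perfgazer.JsonSink \
    --conf 'spark.perfgazer.sink.json.destination=/tmp/perfgazer/jsonsink/date={{perfgazer.now.year}}-{{perfgazer.now.month}}-{{perfgazer.now.day}}/applicationId={{spark.app.id}}' \
    "$@"
}

# Example call (class and JAR names are placeholders):
# perfgazer_submit --class com.example.MyApp my-app.jar
```

The single quotes around the destination keep the shell from interpreting the double braces, so the placeholders reach PerfGazer untouched. For a local run on macOS, add the two spark.driver settings described in the note above.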

Available properties

PerfGazer settings

| Property | Default | Description |
| --- | --- | --- |
| spark.perfgazer.sql.enabled | true | Enable/disable SQL-level metrics collection |
| spark.perfgazer.jobs.enabled | true | Enable/disable job-level metrics collection |
| spark.perfgazer.stages.enabled | true | Enable/disable stage-level metrics collection |
| spark.perfgazer.tasks.enabled | false | Enable/disable task-level metrics collection |
| spark.perfgazer.max.cache.size | 100 | Maximum number of events to keep in memory |
| spark.perfgazer.sink.class | | Fully qualified class name of the sink to use |
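When several of these properties are overridden, a properties file can be easier to maintain than a long list of --conf flags. The sketch below writes one and passes it through Spark's standard --properties-file option. The file path, the simplified destination, and the chosen values (task-level metrics on, cache size 500) are illustrative assumptions, not recommendations from the PerfGazer project.

```shell
# Write the PerfGazer settings to a properties file (the path is an arbitrary choice).
conf_file=/tmp/perfgazer.conf
cat > "$conf_file" <<'EOF'
spark.extraListeners                   com.amadeus.perfgazer.PerfGazer
spark.perfgazer.sink.class             com.amadeus.perfgazer.JsonSink
spark.perfgazer.sink.json.destination  /tmp/perfgazer/jsonsink
spark.perfgazer.sql.enabled            true
spark.perfgazer.jobs.enabled           true
spark.perfgazer.stages.enabled         true
spark.perfgazer.tasks.enabled          true
spark.perfgazer.max.cache.size         500
EOF

# Then start Spark with the file, for example:
# spark-shell \
#   --packages io.github.amadeusitgroup:perfgazer_spark_3-5-2_2.12:0.0.1 \
#   --properties-file "$conf_file"
```

Only spark.perfgazer.tasks.enabled and spark.perfgazer.max.cache.size differ from the defaults listed above; the other switches are spelled out so the file documents the full configuration.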

JsonSink settings

| Property | Default | Description |
| --- | --- | --- |
| spark.perfgazer.sink.json.destination | | Destination path for JSON output |
| spark.perfgazer.sink.json.writeBatchSize | 100 | Number of records to accumulate before writing to disk |
| spark.perfgazer.sink.json.fileSizeLimit | 209715200 (200 MB) | File size threshold before rolling to a new file |
| spark.perfgazer.sink.json.asyncFlushTimeoutMillisecsKey | | Max time between periodic flushes (ms) |
| spark.perfgazer.sink.json.waitForCloseTimeoutMillisecsKey | | Max time to wait for graceful sink close (ms) |

Note: JsonSink uses the POSIX interface on the driver to write JSON files.
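The fileSizeLimit default of 209715200 corresponds to 200 × 1024 × 1024, which suggests the value is given in bytes. Under that assumption, the sketch below converts a size in MiB to bytes and builds the matching --conf flags. The helper name mib_to_bytes and the override values (50 MiB, batches of 500 records) are illustrative only.

```shell
# Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# The documented default of 200 MB, expressed in bytes.
default_limit=$(mib_to_bytes 200)   # 209715200

# Illustrative override: roll files at 50 MiB and write in batches of 500 records.
json_sink_flags="--conf spark.perfgazer.sink.json.fileSizeLimit=$(mib_to_bytes 50) --conf spark.perfgazer.sink.json.writeBatchSize=500"
echo "$json_sink_flags"
```

The printed flags can be appended to the spark-shell or spark-submit command line. A smaller file size limit produces more, smaller files; a larger batch size means fewer writes but more records held in memory between them.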