Getting Started
Quick Start
The fastest way to try PerfGazer is via spark-shell:
spark-shell \
--packages io.github.amadeusitgroup:perfgazer_spark_3-5-2_2.12:0.0.1 \
--conf spark.extraListeners=com.amadeus.perfgazer.PerfGazer \
--conf spark.perfgazer.sink.class=com.amadeus.perfgazer.JsonSink \
--conf spark.perfgazer.sink.json.destination=/tmp/perfgazer/output
Note
Replace 0.0.1 in the --packages coordinate with the latest release version.
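For an application that builds its own session instead of using spark-shell, the same three settings can presumably be supplied through `SparkSession.builder`. The following is a minimal sketch, not the project's documented setup: it assumes the PerfGazer package from the command above is already on the application classpath, and the object name, application name, and `local[*]` master are illustrative placeholders. The settings are applied before the session is created because Spark registers `spark.extraListeners` when the SparkContext starts.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; the object and app names are placeholders.
object PerfGazerQuickStart {

  // Same keys and values as the --conf flags in the spark-shell command above.
  val perfGazerConf: Map[String, String] = Map(
    "spark.extraListeners" -> "com.amadeus.perfgazer.PerfGazer",
    "spark.perfgazer.sink.class" -> "com.amadeus.perfgazer.JsonSink",
    "spark.perfgazer.sink.json.destination" -> "/tmp/perfgazer/output"
  )

  def main(args: Array[String]): Unit = {
    // Apply every setting to the builder before the session is created.
    val baseBuilder = SparkSession.builder()
      .appName("perfgazer-quickstart")
      .master("local[*]")
    val spark = perfGazerConf
      .foldLeft(baseBuilder) { case (builder, (key, value)) => builder.config(key, value) }
      .getOrCreate()

    // Run an action so that there is some job activity to report on.
    spark.range(1000000).groupBy("id").count().collect()

    spark.stop()
  }
}
```

Calling `getOrCreate()` when a session already exists returns that session, so this approach only takes effect when the builder runs before any other code has started Spark.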
Run a Spark action in the shell to generate some activity, for example:
spark.range(1000000).groupBy("id").count().collect()
Then explore the generated reports:
ls /tmp/perfgazer/output/
# job-reports-*.json, stage-reports-*.json, sql-reports-*.json
You can now query the reports directly with Spark SQL. For example, for job reports:
CREATE OR REPLACE TEMPORARY VIEW job
USING json
OPTIONS (path '/tmp/perfgazer/output/job-reports-*.json');
SELECT * FROM job;
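The SQL above runs as-is in a SQL prompt such as spark-sql. Inside the spark-shell session started at the top of this section, which accepts Scala rather than raw SQL, the same view can be sketched as follows. This assumes the predefined `spark` session of spark-shell and that report files already exist at the destination configured above. No column names are assumed here; `printSchema()` shows which fields the reports actually contain.

```scala
// Paste into spark-shell, where `spark` is the predefined SparkSession.
val jobReports = spark.read.json("/tmp/perfgazer/output/job-reports-*.json")

// Inspect the inferred schema before writing queries against specific fields.
jobReports.printSchema()

// Register the same temporary view name as in the SQL example, then query it.
jobReports.createOrReplaceTempView("job")
spark.sql("SELECT * FROM job").show(truncate = false)
```

The same pattern should apply to the stage and SQL reports by changing the file name pattern and the view name.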
Next Steps
- User Guide - Full setup, configuration options, and data analysis
- Contributor Guide - Build instructions and development setup