Data Model Reference

PerfGazer writes reports as JSON files. Each report type maps to a SQL temporary view. The schemas below describe the structure of each view.

job view

Job-level execution report. One row per completed Spark job.

| Column | SQL Type | Unit | Description |
| --- | --- | --- | --- |
| jobId | BIGINT | | Unique job identifier |
| groupId | STRING | | Job group identifier |
| jobName | STRING | | Name of the job |
| jobStartTime | BIGINT | ms | Epoch timestamp when the job started |
| jobEndTime | BIGINT | ms | Epoch timestamp when the job ended |
| sqlId | STRING | | Associated SQL execution identifier |
| stages | ARRAY<INT> | | List of stage IDs in this job |
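
As a minimal sketch of how these columns combine, the snippet below derives a job's wall-clock duration and a readable start time from one row. The row is a hypothetical Python dict whose keys follow the table above; its values are invented for illustration, and the helper names `job_duration_ms` and `epoch_ms_to_utc` are not part of PerfGazer.

```python
from datetime import datetime, timezone

# Hypothetical job row: keys follow the job view schema, values are invented.
job = {
    "jobId": 7,
    "groupId": "nightly-etl",
    "jobName": "load_orders",
    "jobStartTime": 1_700_000_000_000,  # epoch milliseconds
    "jobEndTime": 1_700_000_042_500,    # epoch milliseconds
    "sqlId": "3",
    "stages": [11, 12, 13],
}


def job_duration_ms(row):
    """Wall-clock duration of a job in milliseconds."""
    return row["jobEndTime"] - row["jobStartTime"]


def epoch_ms_to_utc(ms):
    """Convert an epoch-millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


duration_ms = job_duration_ms(job)
started_at = epoch_ms_to_utc(job["jobStartTime"])
print(
    f"job {job['jobId']} ran {duration_ms / 1000:.1f} s "
    f"across {len(job['stages'])} stages, started {started_at.isoformat()}"
)
```

Both timestamps are in milliseconds, so the difference needs no unit conversion before it is compared with other millisecond columns.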

sql view

SQL query execution report with SQL plans (logical, physical, ...) and their node metrics. One row per completed SQL execution.

| Column | SQL Type | Unit | Description |
| --- | --- | --- | --- |
| sqlId | BIGINT | | Unique SQL execution identifier |
| description | STRING | | SQL query description |
| details | STRING | | Extended query execution plan |
| nodes | ARRAY<STRUCT<sqlId: BIGINT, jobName: STRING, nodeName: STRING, coordinates: STRING, metrics: MAP<STRING, STRING>, isLeaf: BOOLEAN, parentNodeName: STRING>> | | Physical plan nodes with execution metrics |

SqlNode

Element type of the nodes array in the sql view. One entry per physical plan node.

| Column | SQL Type | Unit | Description |
| --- | --- | --- | --- |
| sqlId | BIGINT | | SQL execution this node belongs to |
| jobName | STRING | | Name of the job that triggered this SQL |
| nodeName | STRING | | Spark physical plan operator name |
| coordinates | STRING | | Dot-separated position in the plan tree, e.g. '0.1.2' |
| metrics | MAP<STRING, STRING> | | Operator metrics as key-value pairs |
| isLeaf | BOOLEAN | | True if this node has no children in the plan tree |
| parentNodeName | STRING | | Name of the parent operator in the plan tree |
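
The coordinates column can be used to rebuild the shape of the plan tree. The sketch below assumes, without confirmation from the schema, that each dot-separated component is a child index on the path from the root, so that a node's parent has the same coordinates minus the last component and the root is '0'. The node list and the helper names are hypothetical.

```python
def parse_coordinates(coordinates):
    """Split a dot-separated coordinates string into a tuple of ints."""
    return tuple(int(part) for part in coordinates.split("."))


def depth(coordinates):
    """Depth in the plan tree, with the root at depth 0."""
    return len(parse_coordinates(coordinates)) - 1


def parent_coordinates(coordinates):
    """Coordinates of the parent node, or None for the root (assumed layout)."""
    parts = coordinates.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None


# Hypothetical SqlNode entries in arbitrary order; only the fields used here.
nodes = [
    {"nodeName": "Scan parquet", "coordinates": "0.1.0", "isLeaf": True},
    {"nodeName": "Sort", "coordinates": "0.0", "isLeaf": False},
    {"nodeName": "SortMergeJoin", "coordinates": "0", "isLeaf": False},
    {"nodeName": "Scan parquet", "coordinates": "0.0.0", "isLeaf": True},
    {"nodeName": "Sort", "coordinates": "0.1", "isLeaf": False},
]

# Sorting on the parsed tuple (not the raw string) yields a pre-order walk
# and stays correct when a child index has more than one digit.
ordered = sorted(nodes, key=lambda n: parse_coordinates(n["coordinates"]))

for node in ordered:
    print("  " * depth(node["coordinates"]) + node["nodeName"])
```

Values in the metrics map are strings, so any numeric comparison on them needs an explicit conversion first.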

stage view

Stage-level execution report. One row per completed Spark stage.

| Column | SQL Type | Unit | Description |
| --- | --- | --- | --- |
| stageId | INT | | Unique stage identifier |
| stageSubmissionTime | BIGINT | ms | Epoch timestamp when the stage was submitted |
| stageCompletionTime | BIGINT | ms | Epoch timestamp when the stage completed |
| readBytes | BIGINT | bytes | Total input bytes read |
| writeBytes | BIGINT | bytes | Total output bytes written |
| shuffleReadBytes | BIGINT | bytes | Total shuffle bytes read |
| shuffleWriteBytes | BIGINT | bytes | Total shuffle bytes written |
| execCpuNs | BIGINT | ns | Executor CPU time |
| execRunNs | BIGINT | ns | Executor run time |
| execJvmGcNs | BIGINT | ns | Executor JVM garbage collection time |
| attempt | INT | | Stage attempt number |
| memoryBytesSpilled | BIGINT | bytes | In-memory size of the data that was spilled |
| diskBytesSpilled | BIGINT | bytes | On-disk size of the data that was spilled |
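
Because the three executor time columns share the same unit (ns), they can be divided directly. The sketch below computes the share of executor run time spent on CPU and on JVM garbage collection for one stage. The row values are invented, and `safe_ratio` is a hypothetical helper, not a PerfGazer function.

```python
def safe_ratio(numerator, denominator):
    """Divide two numbers, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical stage row: keys follow the stage view schema, values are invented.
stage = {
    "stageId": 11,
    "attempt": 0,
    "stageSubmissionTime": 1_700_000_001_000,  # epoch milliseconds
    "stageCompletionTime": 1_700_000_021_000,  # epoch milliseconds
    "execRunNs": 40_000_000_000,
    "execCpuNs": 30_000_000_000,
    "execJvmGcNs": 2_000_000_000,
    "memoryBytesSpilled": 0,
    "diskBytesSpilled": 0,
}

# All three columns are in nanoseconds, so the ratios are unit-free.
cpu_share = safe_ratio(stage["execCpuNs"], stage["execRunNs"])
gc_share = safe_ratio(stage["execJvmGcNs"], stage["execRunNs"])

# Submission and completion are epoch milliseconds, so this is wall-clock ms.
wall_ms = stage["stageCompletionTime"] - stage["stageSubmissionTime"]

# A nonzero disk spill means the stage ran short of execution memory.
spilled = stage["diskBytesSpilled"] > 0

print(f"stage {stage['stageId']}: cpu={cpu_share:.0%} gc={gc_share:.0%} "
      f"wall={wall_ms} ms spilled={spilled}")
```

The executor times are in nanoseconds while the stage timestamps are in milliseconds, so one must be converted before the two are compared.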

task view

Task-level execution metrics. One row per completed Spark task.

| Column | SQL Type | Unit | Description |
| --- | --- | --- | --- |
| stageId | INT | | Stage this task belongs to |
| taskId | BIGINT | | Unique task identifier |
| taskDuration | BIGINT | ms | Wall-clock duration of the task |
| taskLaunchTime | BIGINT | ms | Epoch timestamp when the task was launched |
| taskFinishTime | BIGINT | ms | Epoch timestamp when the task finished |
| executorRunTime | BIGINT | ms | Time spent running the task on the executor |
| executorCpuTime | BIGINT | ns | CPU time consumed by the executor |
| executorDeserializeTime | BIGINT | ms | Time to deserialize the task on the executor |
| executorDeserializeCpuTime | BIGINT | ns | CPU time spent deserializing the task |
| resultSize | BIGINT | bytes | Size of the serialized task result |
| diskBytesSpilled | BIGINT | bytes | On-disk size of the data that was spilled |
| memoryBytesSpilled | BIGINT | bytes | In-memory size of the data that was spilled |
| bytesRead | BIGINT | bytes | Input bytes read |
| recordsRead | BIGINT | | Input records read |
| jvmGCTime | BIGINT | ms | Time spent in JVM garbage collection |
| bytesWritten | BIGINT | bytes | Output bytes written |
| recordsWritten | BIGINT | | Output records written |
| peakExecutionMemory | BIGINT | bytes | Peak execution memory used |
| resultSerializationTime | BIGINT | ms | Time spent serializing the result |
| fetchWaitTime | BIGINT | ms | Time spent waiting for shuffle fetch |
| localBlocksFetched | BIGINT | | Number of local blocks fetched during shuffle |
| localBytesRead | BIGINT | bytes | Bytes read from local shuffle blocks |
| remoteBlocksFetched | BIGINT | | Number of remote blocks fetched during shuffle |
| remoteBytesRead | BIGINT | bytes | Bytes read from remote shuffle blocks |
| remoteBytesReadToDisk | BIGINT | bytes | Remote shuffle bytes read to disk |
| totalRecordsRead | BIGINT | | Total records read including shuffle |
| remoteRequestsDuration | BIGINT | ms | Time spent on remote shuffle requests |
| shuffleBytesWritten | BIGINT | bytes | Shuffle bytes written |
| shuffleRecordsWritten | BIGINT | | Shuffle records written |
| shuffleWriteTime | BIGINT | ns | Time spent writing shuffle data |
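
The task view mixes millisecond and nanosecond columns, so time values need a common unit before they are compared. The sketch below converts the three nanosecond columns listed above to milliseconds and then derives a few ratios. The row values are invented, and `NS_COLUMNS` and `normalize_to_ms` are hypothetical names introduced for illustration.

```python
NS_PER_MS = 1_000_000

# Columns documented in nanoseconds; every other time column is in milliseconds.
NS_COLUMNS = ("executorCpuTime", "executorDeserializeCpuTime", "shuffleWriteTime")


def normalize_to_ms(row):
    """Return a copy of the row with nanosecond columns converted to float ms."""
    out = dict(row)
    for col in NS_COLUMNS:
        if col in out:
            out[col] = out[col] / NS_PER_MS
    return out


# Hypothetical task row: keys follow the task view schema, values are invented.
task = {
    "stageId": 11,
    "taskId": 204,
    "taskDuration": 1250,                     # ms
    "executorRunTime": 1200,                  # ms
    "executorCpuTime": 900_000_000,           # ns
    "executorDeserializeCpuTime": 4_000_000,  # ns
    "jvmGCTime": 60,                          # ms
    "shuffleWriteTime": 15_000_000,           # ns
    "localBytesRead": 1024,
    "remoteBytesRead": 4096,
}

t = normalize_to_ms(task)

# With everything in ms, ratios against executorRunTime are meaningful.
cpu_share = t["executorCpuTime"] / t["executorRunTime"]
gc_share = t["jvmGCTime"] / t["executorRunTime"]

# Shuffle bytes read by the task: local plus remote blocks.
shuffle_bytes_read = t["localBytesRead"] + t["remoteBytesRead"]

print(f"task {t['taskId']}: cpu={cpu_share:.0%} gc={gc_share:.0%} "
      f"shuffle_write={t['shuffleWriteTime']} ms shuffle_read={shuffle_bytes_read} B")
```

Dividing executorCpuTime by executorRunTime without this conversion would overstate the CPU share by a factor of one million.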