Data Model Reference
PerfGazer writes reports as JSON files. Each report type maps to a SQL temporary view. The schemas below describe the structure of each view.
job view
Job-level execution report. One row per completed Spark job.
| Column | SQL Type | Unit | Description |
|---|---|---|---|
| jobId | BIGINT | | Unique job identifier |
| groupId | STRING | | Job group identifier |
| jobName | STRING | | Name of the job |
| jobStartTime | BIGINT | ms | Epoch timestamp when the job started |
| jobEndTime | BIGINT | ms | Epoch timestamp when the job ended |
| sqlId | STRING | | Associated SQL execution identifier |
| stages | ARRAY<INT> | | List of stage IDs in this job |
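
Since `jobStartTime` and `jobEndTime` are both epoch milliseconds, a job's wall-clock duration is a plain subtraction. The following is a minimal Python sketch, assuming a job row has already been loaded into a dict keyed by the column names above. The helper names and the sample values are hypothetical and are not part of PerfGazer.

```python
from datetime import datetime, timezone


def job_duration_ms(row: dict) -> int:
    """Wall-clock duration of a job in milliseconds."""
    return row["jobEndTime"] - row["jobStartTime"]


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert an epoch-millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Hypothetical row: the values are illustrative, not taken from a real report.
job = {
    "jobId": 7,
    "groupId": "g-1",
    "jobName": "example",
    "jobStartTime": 1_700_000_000_000,
    "jobEndTime": 1_700_000_004_500,
    "sqlId": "3",
    "stages": [10, 11],
}

duration = job_duration_ms(job)  # 4500 ms
started_at = epoch_ms_to_utc(job["jobStartTime"])
```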
sql view
SQL query execution report with SQL plans (logical, physical, ...) and their node metrics. One row per completed SQL execution.
| Column | SQL Type | Unit | Description |
|---|---|---|---|
| sqlId | BIGINT | | Unique SQL execution identifier |
| description | STRING | | SQL query description |
| details | STRING | | Extended query execution plan |
| nodes | ARRAY<STRUCT<sqlId: BIGINT, jobName: STRING, nodeName: STRING, coordinates: STRING, metrics: MAP<STRING, STRING>, isLeaf: BOOLEAN, parentNodeName: STRING>> | | Physical plan nodes with execution metrics |
SqlNode
Element type of the nodes array in the sql view. One entry per physical plan node.
| Column | SQL Type | Unit | Description |
|---|---|---|---|
| sqlId | BIGINT | | SQL execution this node belongs to |
| jobName | STRING | | Name of the job that triggered this SQL |
| nodeName | STRING | | Spark physical plan operator name |
| coordinates | STRING | | Dot-separated position in the plan tree, e.g. '0.1.2' |
| metrics | MAP<STRING, STRING> | | Operator metrics as key-value pairs |
| isLeaf | BOOLEAN | | True if this node has no children in the plan tree |
| parentNodeName | STRING | | Name of the parent operator in the plan tree |
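
Two fields here need parsing before use: `coordinates` is a dot-separated string, and every value in `metrics` is a string even when it holds a number. The Python sketch below shows one way to handle both. It assumes that each component of `coordinates` is an integer, which the example '0.1.2' suggests but the schema does not guarantee. The helper names, sample nodes, and metric key are hypothetical.

```python
from typing import Optional, Tuple


def parse_coordinates(coordinates: str) -> Tuple[int, ...]:
    """Split a dot-separated position such as '0.1.2' into integer components.

    Assumption: every component is an integer.
    """
    return tuple(int(part) for part in coordinates.split("."))


def metric_as_int(metrics: dict, key: str) -> Optional[int]:
    """Return a metric as an int, or None if it is missing or not a plain integer."""
    raw = metrics.get(key)
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


# Hypothetical nodes: only the fields used below are filled in.
nodes = [
    {"nodeName": "Scan parquet", "coordinates": "0.0.0",
     "metrics": {"number of output rows": "1000"}},
    {"nodeName": "Project", "coordinates": "0", "metrics": {}},
    {"nodeName": "Filter", "coordinates": "0.0",
     "metrics": {"number of output rows": "250"}},
]

# Sorting on the parsed tuple compares components numerically, so '0.10'
# sorts after '0.2', which a plain string sort would get wrong.
ordered = sorted(nodes, key=lambda n: parse_coordinates(n["coordinates"]))
names = [n["nodeName"] for n in ordered]  # ['Project', 'Filter', 'Scan parquet']
```

Returning `None` for values that cannot be parsed keeps a missing metric distinct from a metric that is genuinely zero.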
stage view
Stage-level execution report. One row per completed Spark stage.
| Column | SQL Type | Unit | Description |
|---|---|---|---|
| stageId | INT | | Unique stage identifier |
| stageSubmissionTime | BIGINT | ms | Epoch timestamp when the stage was submitted |
| stageCompletionTime | BIGINT | ms | Epoch timestamp when the stage completed |
| readBytes | BIGINT | bytes | Total input bytes read |
| writeBytes | BIGINT | bytes | Total output bytes written |
| shuffleReadBytes | BIGINT | bytes | Total shuffle bytes read |
| shuffleWriteBytes | BIGINT | bytes | Total shuffle bytes written |
| execCpuNs | BIGINT | ns | Executor CPU time |
| execRunNs | BIGINT | ns | Executor run time |
| execJvmGcNs | BIGINT | ns | Executor JVM garbage collection time |
| attempt | INT | | Stage attempt number |
| memoryBytesSpilled | BIGINT | bytes | In-memory size of spilled data |
| diskBytesSpilled | BIGINT | bytes | Bytes spilled to disk |
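
The stage view mixes units: the submission and completion timestamps are in milliseconds, while the three `exec*Ns` columns are in nanoseconds. The Python sketch below derives a few summary figures from one stage row, assuming the row is available as a dict keyed by the column names above. The `stage_summary` helper and the sample values are hypothetical.

```python
NS_PER_MS = 1_000_000


def stage_summary(row: dict) -> dict:
    """Derive wall-clock time and CPU/GC ratios from one stage row."""
    run_ns = row["execRunNs"]
    return {
        # Both timestamps are epoch milliseconds, so the difference is in ms.
        "wall_ms": row["stageCompletionTime"] - row["stageSubmissionTime"],
        # Convert nanoseconds to milliseconds for display next to wall_ms.
        "exec_run_ms": run_ns / NS_PER_MS,
        # Ratios of same-unit columns need no conversion; guard against zero.
        "cpu_fraction": row["execCpuNs"] / run_ns if run_ns else 0.0,
        "gc_fraction": row["execJvmGcNs"] / run_ns if run_ns else 0.0,
    }


# Hypothetical row: the values are illustrative, not taken from a real report.
stage = {
    "stageId": 10,
    "stageSubmissionTime": 1_700_000_000_000,
    "stageCompletionTime": 1_700_000_003_000,
    "execCpuNs": 6_000_000_000,
    "execRunNs": 8_000_000_000,
    "execJvmGcNs": 400_000_000,
}

summary = stage_summary(stage)
# wall_ms = 3000, exec_run_ms = 8000.0, cpu_fraction = 0.75, gc_fraction = 0.05
```

Executor run time can exceed the stage's wall-clock time when it is accumulated over tasks that run in parallel, so the two figures are best compared as separate signals rather than subtracted from each other.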
task view
Task-level execution metrics. One row per completed Spark task.
| Column | SQL Type | Unit | Description |
|---|---|---|---|
| stageId | INT | | Stage this task belongs to |
| taskId | BIGINT | | Unique task identifier |
| taskDuration | BIGINT | ms | Wall-clock duration of the task |
| taskLaunchTime | BIGINT | ms | Epoch timestamp when the task was launched |
| taskFinishTime | BIGINT | ms | Epoch timestamp when the task finished |
| executorRunTime | BIGINT | ms | Time spent running the task on the executor |
| executorCpuTime | BIGINT | ns | CPU time consumed by the executor |
| executorDeserializeTime | BIGINT | ms | Time to deserialize the task on the executor |
| executorDeserializeCpuTime | BIGINT | ns | CPU time spent deserializing the task |
| resultSize | BIGINT | bytes | Size of the serialized task result |
| diskBytesSpilled | BIGINT | bytes | Bytes spilled to disk |
| memoryBytesSpilled | BIGINT | bytes | In-memory size of spilled data |
| bytesRead | BIGINT | bytes | Input bytes read |
| recordsRead | BIGINT | | Input records read |
| jvmGCTime | BIGINT | ms | Time spent in JVM garbage collection |
| bytesWritten | BIGINT | bytes | Output bytes written |
| recordsWritten | BIGINT | | Output records written |
| peakExecutionMemory | BIGINT | bytes | Peak execution memory used |
| resultSerializationTime | BIGINT | ms | Time spent serializing the result |
| fetchWaitTime | BIGINT | ms | Time spent waiting for shuffle fetch |
| localBlocksFetched | BIGINT | | Number of local blocks fetched during shuffle |
| localBytesRead | BIGINT | bytes | Bytes read from local shuffle blocks |
| remoteBlocksFetched | BIGINT | | Number of remote blocks fetched during shuffle |
| remoteBytesRead | BIGINT | bytes | Bytes read from remote shuffle blocks |
| remoteBytesReadToDisk | BIGINT | bytes | Remote shuffle bytes read to disk |
| totalRecordsRead | BIGINT | | Total records read including shuffle |
| remoteRequestsDuration | BIGINT | ms | Time spent on remote shuffle requests |
| shuffleBytesWritten | BIGINT | bytes | Shuffle bytes written |
| shuffleRecordsWritten | BIGINT | | Shuffle records written |
| shuffleWriteTime | BIGINT | ns | Time spent writing shuffle data |
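
Task rows also mix milliseconds and nanoseconds: `executorRunTime` is in ms while `executorCpuTime` and `shuffleWriteTime` are in ns, so dividing one by the other without converting gives a result that is off by a factor of one million. The Python sketch below normalizes to milliseconds first, assuming a task row is available as a dict keyed by the column names above. The helper names and sample values are hypothetical.

```python
NS_PER_MS = 1_000_000


def task_cpu_fraction(row: dict) -> float:
    """Fraction of executor run time spent on CPU, with both sides in ms."""
    run_ms = row["executorRunTime"]              # already in milliseconds
    if run_ms <= 0:
        return 0.0
    cpu_ms = row["executorCpuTime"] / NS_PER_MS  # nanoseconds to milliseconds
    return cpu_ms / run_ms


def shuffle_write_ms(row: dict) -> float:
    """Shuffle write time converted from nanoseconds to milliseconds."""
    return row["shuffleWriteTime"] / NS_PER_MS


# Hypothetical row: the values are illustrative, not taken from a real report.
task = {
    "stageId": 10,
    "taskId": 42,
    "executorRunTime": 2_000,            # ms
    "executorCpuTime": 1_500_000_000,    # ns
    "shuffleWriteTime": 250_000_000,     # ns
}

cpu_fraction = task_cpu_fraction(task)   # 0.75
write_ms = shuffle_write_ms(task)        # 250.0
```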