The Teradata database is an expensive system that serves many people in the company, directly or indirectly. Efficient utilization of its resources is therefore essential, both financially and for user satisfaction.
This series of posts follows the methodology, steps and results of a real-life Teradata optimization project.
Levels of performance optimization
We can typically apply three levels of Teradata performance optimization:
a. Business logic level
Meaning: Analyze user requests and system activity, then refine them, eliminating unnecessary activities.
This level checks whether the executed processes are really needed and whether they deliver what the user actually needs. E.g.: eliminate unnecessary data loads or reports; limit a report to a shorter period
- Big gains possible without any trade-off
- Requires comprehensive analysis (incl. finding and interviewing the appropriate people)
b. SQL level
Meaning: Pick individual SQL statements and optimize their execution.
This approach finds the top resource-consuming SQL statements and reduces their resource consumption, while producing the same results.
E.g.: change the PI; add/refresh statistics; rewrite the SQL
- Easy analysis
- Low risk (small effect, good testing possibilities)
- Needs SQL level analysis
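To illustrate the kinds of interventions meant here, a minimal sketch follows; the object names (`sales_db.daily_sales`, `customer_id`) are made up for the example:

```sql
-- Hypothetical examples of SQL-level interventions (object names are illustrative).

-- Refresh statistics so the optimizer sees current data demographics:
COLLECT STATISTICS COLUMN (customer_id) ON sales_db.daily_sales;

-- Inspect the execution plan before and after rewriting a query:
EXPLAIN
SELECT customer_id, SUM(amount)
FROM   sales_db.daily_sales
GROUP BY 1;
```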
c. System (infrastructure) level
Meaning: Tune low-level system (RDBMS, OS, hardware) parameters.
Tuning at the infrastructure level is a global intervention, regardless of users, business areas or importance.
- Can work miracles in some special cases
- High risk (affects the whole system)
- Complex cross-dependencies
I prefer and recommend SQL-level optimization (b.) at the beginning, because it delivers high value along with low risk and good project scalability. Later on, the other levels can come into focus.
Goal of SQL optimization
The goals of an optimization project can differ. Some examples:
- “Finish load procedure until 6am”
- “Present my report 2 hours earlier”
- “Reduce CPU load by 20%”
- “Postpone the hardware extension by 6 months”
The goal of our project was to speed up the “load batch” process, so that a specific, identified data area finishes loading and becomes available for further usage 1 hour earlier than before.
The key input for an SQL optimization is the information about what exactly happens:
What, when, how:
- Which SQL statements ran, when, what steps were executed, how many resources they consumed, with what skewness, etc.
- All these can be found in the DBQL tables of Teradata (dbc.dbql*)
- Switching logging on will not cause significant extra load, but the logs must be archived/purged regularly (see the PDCR tool in the Teradata package).
Teradata Administrator enables switching query logging on (Tools/Query logging…), and the actual settings can be checked with:
select * from dbc.dbqlruletbl;
- The Explain/Obj/Sql/Step flags should be set to ‘T’; the others are not necessary. Global logging (one record with UserID: 00000000) is recommended.
- If continuous logging is not possible, logging should be switched on at least 1 week before starting the optimization project.
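As a sketch, global DBQL logging with the flags above can also be switched on from SQL instead of the Teradata Administrator GUI (the exact options may need tailoring to your release):

```sql
-- Hypothetical example: enable global query logging with EXPLAIN, object,
-- SQL-text and step-level details. SQLTEXT=0 suppresses the default snippet
-- in DBQLogTbl; the full text goes to DBQLSqlTbl via the SQL option.
BEGIN QUERY LOGGING WITH EXPLAIN, OBJECTS, SQL, STEPINFO LIMIT SQLTEXT=0 ON ALL;
```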
What / Who runs the statements?
- Teradata provides a great feature for this: Query Banding
- All executed SQL statements can be tagged with several pieces of information about the environment, job, script, version etc., which helps identify a specific one.
- The query band data will appear in the DBQL logs – if the running environment is configured to tag the queries being run.
- Query banding is not essential for an optimization, but it helps a lot in identifying the exact point in the data warehouse ecosystem where the modifications should be made (which job/job step/script).
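A minimal sketch, with made-up job and script names, of how an ETL session might tag its queries and how the tags can be found again in the logs:

```sql
-- Hypothetical example: tag every query of this session; key names are free-form.
SET QUERY_BAND = 'Job=daily_load;Script=load_customer.sql;Version=1.2;' FOR SESSION;

-- Later, the tags appear in the DBQL log records:
SELECT QueryID, QueryBand, AMPCPUTime
FROM   dbc.DBQLogTbl
WHERE  QueryBand LIKE '%Job=daily_load%';
```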
What are the environmental conditions?
- E.g. data volumes, statistics, etc.
- This information is available in the data dictionary (DBC tables and views), but only as of the current moment – not for the time the statement ran.
- In most cases this is OK for us, but this property must be kept in mind.
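For example, the current size of the involved tables can be read from the dictionary. A sketch follows; the database name is a placeholder, and note again that this reflects the volume now, not at the time the analyzed statement ran:

```sql
-- Hypothetical example: current permanent space of tables in one database, in GB.
SELECT  DatabaseName
      , TableName
      , SUM(CurrentPerm) / 1024 / 1024 / 1024 AS CurrentPerm_GB
FROM    dbc.TableSizeV
WHERE   DatabaseName = 'MyDatabase'
GROUP BY 1, 2
ORDER BY 3 DESC;
```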
When all these are in place, we are ready to carry out a successful performance optimization project.
The next post will show how to identify good candidate SQL statements for optimization.