We are closing in on the results…
We have arrived at the final part; soon we will know all the details of how we achieved a remarkable 30% performance gain.
We already know the theory and the techniques.
The only things ahead are to test, go live and harvest the glory 🙂 Let’s go!
Our concept treats the target SQL as a “black box”, which takes inputs and produces outputs; we pry into the details only if necessary. We want to modify this box so that it produces the same outputs for the same inputs, while consuming fewer resources / running quicker. The testing methodology fits into the same schema: feed in the same input the original SQL got, and compare the outputs.
What input should be used?
Theoretically, we should use input data that covers all “special cases” and is representative in volume and distribution in every aspect. That sounds very complex. For real-life testing, I typically use exactly the same data that a daily execution of the target SQL received. It usually provides enough coverage and represents the live environment best, since it is the live environment. Take care: if different days produce significantly different input data (e.g. bill-run days vs. non-bill-run days), use the data from the “heavy” days.
How do we get the inputs?
What type is the SQL? Report (persistent input) or ETL (input usually already deleted, depending on whether temp data is preserved or dropped)?
Is it a daily batch or a continuous Data Warehouse load? The first ensures a stable data environment during the day, while the second does not.
- If the production input data is available, copy it and run the optimized SQL on it.
- If not, reproduce it (from an earlier, still available phase of the dataflow, sometimes from “stage” data).
- If it is not reproducible (continuously changing DB), save a “snapshot” as a stable input for both the original and the optimized SQL.
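Saving a snapshot can be as simple as copying the input into a frozen table; a minimal sketch, where all table and column names are hypothetical:

```sql
-- Freeze the current state of a continuously changing input table,
-- so that both the original and the optimized SQL read identical data.
-- Keeping the original PI preserves the data distribution:
CREATE TABLE input_snapshot AS (
  SELECT * FROM live_input_table
) WITH DATA
PRIMARY INDEX (transaction_id);
```

Run both versions against `input_snapshot` (e.g. by redefining a view on top of it), then compare their outputs.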
How to produce the outputs?
It depends again.
We need two outputs: the original SQL’s and the optimized SQL’s.
Of the original:
- If the original output is still available, copy a snapshot of it.
- If it is not available, produce it with the original code, but be sure to use the same input as the optimized SQL will use.
- If we created an input snapshot, produce the output based on it.
Of the optimized:
- We have to run the optimized code anyway. Be sure it uses the same input as the “original” SQL had.
How to compare the outputs?
It depends on whether the output is a file or a table.
If it is a file: the content must be sorted deterministically (based on a unique value set), and then the Unix “diff” command gives the answer. Before using “diff”, “ls -la” tells the file sizes: if the two files differ in size, the outputs cannot be the same, but equal sizes alone are not enough.
If it is a table: the two table structures must be precisely the same. This is a must, so take care of it; it is part of the concept not just of the testing, but of the optimization as well!
- The 4-step check:
  - a. select count(*) from “orig_out_table”;
  - b. select count(*) from “opt_out_table”;
  - c. select * from “orig_out_table” MINUS select * from “opt_out_table”;
  - d. select * from “opt_out_table” MINUS select * from “orig_out_table”;
- a. and b. must be equal
- c. and d. must both be empty
If all of this holds, the two result sets are equivalent – OK
This method also handles NULL values correctly (in set operations NULL equals NULL, as opposed to the ‘=’ operator)
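To see why the MINUS-based check is NULL-safe, consider this sketch (table and column names are hypothetical):

```sql
-- A comparison joined on '=' silently drops rows containing NULLs,
-- because NULL = NULL evaluates to UNKNOWN, not TRUE:
SELECT COUNT(*)
FROM orig_out_table o
JOIN opt_out_table  n
  ON  o.cust_id = n.cust_id
  AND o.amount  = n.amount;   -- rows with a NULL amount never match

-- MINUS compares whole rows with set semantics, where NULL equals NULL,
-- so identical rows containing NULLs cancel out correctly:
SELECT * FROM orig_out_table
MINUS
SELECT * FROM opt_out_table;
```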
A performance optimization is the same kind of delivery as a regular functional development on the Data Warehouse.
- It is highly advisable to align with the standards, even if the issues are “very urgent”.
- Documentation and version control are mandatory; only patches are allowed (no direct SQL DDLs), as the company standards and methodologies declare.
- Documentation is very important for us, too, since the output of a performance optimization is the amount of resources gained, which is sometimes not as obvious as a new function.
Without documentation we are unable, or at least hard pressed, to demonstrate our achievements.
Each drop has its own specialities: some have long-running patches, others need additional temp or perm space. Take care to collect these requirements in advance; do not frustrate the DBA with a lot of failures during or after the installation of the optimization.
Our goal is to achieve satisfaction and leave a good impression, since performance optimization is a recurring process. So let’s describe everything that is required (space, permissions, external activities like archiving) and everything that is expected (what goes faster, outage of a table for a while, etc.).
Last but not least, a very important thing: after each drop goes live, document the exact go-live time and script versions to be able to track everything back later, including possible cross-effects of other developments.
The basis of SQL-level optimization is to optimize “isolated” components of the system. However, this isolation is never entirely true.
The Data Warehouse is a very complex set of dependent components and processes. If I modify something, it will influence other things in more than one way:
The ETL and reporting processes are typically organized by some dependency system. If we optimize one or more components, their runtimes change significantly (see our results below), and the timing of the processes can shift accordingly: some tasks will now run in parallel with each other that never met before, and vice versa. And if the dependencies are not defined correctly, and in practice only a long-running task ensured that result data was available for a consumer process, then optimizing that long task eliminates the delay: the implicit dependency will no longer hold.
When we optimize Teradata queries, we use techniques that have effects on other processes, for example:
- Table PI changes
- Defining secondary/join indices
- Collecting new / refreshing old statistics
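A hedged sketch of these techniques in Teradata SQL (table, index and column names are hypothetical; in production each of these ships as a documented patch):

```sql
-- 1. Table PI change: rebuild the table with the new primary index,
--    then swap the names:
CREATE TABLE sales_copy AS (SELECT * FROM sales) WITH DATA
PRIMARY INDEX (customer_id);
RENAME TABLE sales TO sales_old;
RENAME TABLE sales_copy TO sales;

-- 2. Defining secondary / join indices:
CREATE INDEX (order_date) ON sales;
CREATE JOIN INDEX sales_ji AS
  SELECT customer_id, order_date, amount
  FROM sales
PRIMARY INDEX (order_date);

-- 3. Collecting new / refreshing old statistics:
COLLECT STATISTICS ON sales COLUMN (customer_id);  -- define + collect
COLLECT STATISTICS ON sales;                       -- refresh all defined stats
```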
After we have delivered these kinds of modifications, we have to check the SQLs that use the affected objects: did their performance stay OK? PTA provides good help for this with the “Object usage search” function.
I have an additional piece of advice regarding statistics changes. When we create a new statistic, we have to take care of its regular refreshing, especially if it covers a fast-changing, low-cardinality (typically partitioning) column. Otherwise it will turn into a time bomb later…
For the first 1-2 days after a drop went live, it is highly recommended to check the DBQL logs to see whether everything goes as planned.
Though we test everything very carefully, performance is not a 100% stable thing. The Teradata Optimizer decides on the execution plan at the very moment the query is executed.
And things can change: data distribution, random-AMP samples and other settings may differ between the testing and the go-live. It is not a tragedy, but expect some percentage of iterations.
The measurement goes similarly to the “before” round: identify the log items and take a one-week average.
However, the identification is a bit different. While the “before” is always one SQL statement per item, the “after” may be several: sometimes the optimized version is broken down into more SQL statements, and the additional statistics and data preparation steps also have to be included. Therefore we have to collect manually which statements are the exact successors of the tuned query.
PTA helps again: search for the session of the first run of the optimized query, and all its SQLs will be there.
Collect the appropriate ReqNums (request number: a unique ID within a session) of the successor SQLs, and register them in the documentation.
On the following days it is easier: use the PTA “Advanced search”. Fill in the queryband and the list of ReqNums, separated by commas (,). The result will contain all the items you need.
Switch to the “Summary view”, and you will get the resource totals broken down to daily level. This way a whole week’s data can be collected in one round.
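If PTA is not at hand, roughly the same totals can be pulled directly from DBQL; a sketch assuming DBQL logging is enabled, with a hypothetical queryband value and ReqNum list:

```sql
-- Daily resource totals of the successor statements of the tuned query,
-- identified by queryband + the registered request numbers:
SELECT CAST(StartTime AS DATE) AS run_date
     , SUM(AMPCPUTime)         AS total_cpu_sec
     , SUM(TotalIOCount)       AS total_io
FROM dbc.DBQLogTbl
WHERE QueryBand LIKE '%JobName=MY_TUNED_JOB%'  -- hypothetical queryband
  AND RequestNum IN (12, 13, 14)               -- the registered ReqNums
  AND StartTime >= CURRENT_DATE - 7            -- last week
GROUP BY 1
ORDER BY 1;
```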
Results of our project – facts in numbers
Let’s see what we could achieve in our project. I summarize the numerical results in this table:
The reasons for the performance problems were these:
| ID | Reason | Solution (high level) |
|----|--------|-----------------------|
| 1 | SQL breakup | Store partial results (filtered table) in volatile w/appropriate PI |
| 2 | MissingStat | Collect stats on join columnset |
| 3 | HashCollision | Patch to MULTISET, no PI change, dedup replaced by GROUP BY |
| 4 | MissingStat | Collect stat on a filter column (low cardinality) |
| 5 | MissingStat | Collect stat on a filter column (low cardinality) |
| 6 | HashCollision | Patch to MULTISET, no PI change, no dedup needed |
| 7 | SQL breakup | “Hidden product join”. Store partial results with good PI |
| 8 | PI change | Target table PI change |
| 9 | OLAP distribution | The “ORDER BY” clause extended with a high cardinality postfix (distribution) |
| 10 | SQL rewrite | “OR”-ed join condition replaced with “UNION” |
| 11 | PI change | Target table PI change |
| 12 | HashCollision | Patch to MULTISET, no PI change, no dedup needed |
| 13 | PI change | Source table PI change |
| 14 | SQL breakup | Store partial results in volatile w/appropriate PI, collect stat on them |
| 15 | SQL rewrite | Product Join decomposition, as described here |
| 16 | SQL breakup | Store partial results in volatile w/appropriate PI, collect stat on them |
| 17 | PI change | Target table PI change – extreme gain |
| 18 | SQL breakup | Store partial results in volatile w/appropriate PI, collect stat on them |
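The “SQL breakup” solutions in the table follow a common pattern; a minimal sketch with hypothetical names:

```sql
-- Materialize the selective partial result in a volatile table whose
-- PI matches the subsequent join column, then collect stats on it:
CREATE VOLATILE TABLE active_subs AS (
  SELECT subscriber_id, tariff_id
  FROM subscriber
  WHERE status = 'ACTIVE'          -- the selective filter
) WITH DATA
PRIMARY INDEX (subscriber_id)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS ON active_subs COLUMN (subscriber_id);

-- The main query then joins the small, well-distributed partial result:
SELECT t.call_date, SUM(t.amount)
FROM cdr t
JOIN active_subs s
  ON s.subscriber_id = t.subscriber_id
GROUP BY 1;
```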
From 350h 59min down to 6h 46min, which is a saving of 344h 13min = 98% gain
From 7h 7min down to 0h 49min, which is a saving of 6h 18min = 88% gain
The original daily average of the total Impact CPU was 1112 hours.
The gained 344 hours mean 344/1112 = 30.9% gain on the system total
- Optimized 18 scripts
- Gained 98% of the Impact CPU on them
- Reduced the system CPU load by 30.9%
- The total project span was 4 weeks, using 12 man-working-days of human resources
Thanks for following us. TRY THIS AT HOME 🙂
The next post is about a Teradata compress optimization project.
A 16% gain on a system where multi-value compression is already in place.
Stay tuned for the details!