Product join (of tables A and B) is the simplest method of join implementation:
Produce every <A;B> record combination: take each record of A and match it with each record of B, one by one.
Test the join condition on each produced <A;B> record pair, and eliminate the combinations where the condition fails.
The two steps are usually combined: the test is executed right after a record combination is generated, and invalid combinations are dropped immediately. This saves a lot of temp space.
Why don’t we like product joins?
Well, it has a really bad reputation: it is slow, burns CPU, etc.
Yes, it usually is. It is the brute-force method of executing a join, with a cost in the order of N*M (where N and M are the record counts of the joined tables).
Still, there are situations when it is the best choice, or the only feasible way.
When is it good or necessary?
Please note that product join is the one method that is always applicable, regardless of circumstances.
Product join is typically a simple, dumb and slow algorithm, which is why we do not like it, but it has a very important advantage: it requires no pre-processing.* This is why we LIKE it :)
If we have to join a really large table to a very small one (a couple of records), product join is by far the most effective method, since sorting a very large table (in the order of N*logN) can cost a lot, while joining to 1-2 records is really not a big deal.
There are join situations where the only way to go is the product join. Why? Because of the join condition. The "clever" joins (merge, hash) require some information and/or condition that enables cheating the full A x B comparison: reducing it to the comparisons that are really necessary, done in a more effective manner.
* OK, in Teradata this means: it only requires that the matchable records from both tables be on the same AMP. This implies that the "small" table is duplicated to all AMPs.
Merge join example
join B on A.customer_id = B.customer_id
and A.trx_dt between B.eff_dt and B.exp_dt
The customer_id clause is in AND condition with the others.
Customer_id is selective enough that hash(customer_id) can reduce the comparisons reasonably.
Note that A and B must be sorted (or re-sorted) by the hash of customer_id.
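A complete form of the example above might look like this (the table and column names beyond those in the condition are illustrative). The equality on customer_id is what lets the optimizer use a merge join; the BETWEEN part is evaluated on the matched pairs afterwards:

```sql
-- Sketch: equality condition enables a merge join on hash(customer_id)
SELECT a.customer_id, a.trx_dt, b.eff_dt, b.exp_dt
FROM trx_table a
JOIN customer_dim b
  ON  a.customer_id = b.customer_id
  AND a.trx_dt BETWEEN b.eff_dt AND b.exp_dt;
```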
Product join example
join B on substr(A.telephone_no,1,B.prefix_length) = B.telephone_no_prefix
There is no comparison-reducing partial condition.
Note that neither of the tables is required to be sorted in a specific order.
Unavoidable product joins
Function used (eg. substr())
Dependent expression is used (eg. A.x+B.y = A.z)
Cross join: intentional Cartesian product
Avoidable product joins
Data type mismatch
The merge join example above works only if customer_id in tables A and B has the same "style" of data type, since their hash values will match only in this case. Say hash(13674) <> hash('13674'); however, integer is compatible with decimal, and char is compatible with varchar.
Pay attention to data type consistency during physical data modeling.
Use domains to eliminate the possibility of mismatch
Align with the existing data types when defining temp tables, or use "create table as …" statements.
If you cannot avoid a mismatch, relocate the necessary data to temp tables with the proper data types during processing.
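A minimal sketch of such a relocation (the table and column names are hypothetical; the point is the CAST to the type used on the other join side):

```sql
-- Copy the mismatched data into a volatile table whose customer_id
-- data type matches the other join side, so the hash values can match.
CREATE VOLATILE TABLE b_fixed AS (
  SELECT CAST(customer_id AS INTEGER) AS customer_id
       , eff_dt
       , exp_dt
  FROM b_table
) WITH DATA
PRIMARY INDEX (customer_id)
ON COMMIT PRESERVE ROWS;
```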
Let’s assume the following join condition:
select ... from A
join B on A.col1 = B.Col1 OR A.Col2 = B.Col2
This is equivalent, without a compulsory product join:
select ... from A join B on A.col1 = B.Col1
UNION
select ... from A join B on A.Col2 = B.Col2
As I mentioned before, product join is the most effective join between a very large and a really small (couple of records) table. If the optimizer thinks that a table is pretty small while in fact it is not, it may choose a product join in good faith, misled by stale or missing statistics.
Define those statistics and keep them fresh, so that the optimizer can determine the size of the joinable record sets properly.
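In Teradata this is done with COLLECT STATISTICS (the table and column names below are illustrative):

```sql
-- Let the optimizer estimate the joinable record sets correctly.
COLLECT STATISTICS COLUMN (customer_id) ON small_table;

-- Later, refresh all defined statistics by simply re-issuing:
COLLECT STATISTICS ON small_table;
```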
How to find avoidable product joins
It is not trivial to list the avoidable product joins. Practically, all product joins have to be examined one by one and judged avoidable or not, and if avoidable, what to do about them.
I strongly recommend using PRISE Tuning Assistant both for finding the product joins and for analyzing the possibility and necessity of their elimination:
List top consuming queries with product join(s)
Check the PROD JOIN steps: which tables are processed that way
Check those join conditions for cases described above
What to do if it cannot be avoided?
In this case I recommend trying the decomposition described here.
It can help reduce the number of comparisons, saving CPU and runtime.
This is the most familiar error message in the Teradata world:
“Failure 2646 No more spool space”
What does it really mean, what is it caused by?
Let’s get back to the basics.
What is spool space?
Spool space is a temporary area used to store partial results during query processing, as well as volatile tables. All free space in the database that is not allocated by PERM data can technically be used as spool area, since PERM data does not claim it.
Each database user may have a "spool limit" that prevents the user from allocating more spool area at a time than the limit. Keep in mind that all active sessions of a username share the spool limit together.
Teradata is a massively parallel system, therefore the spool limit must be interpreted at AMP level:
Eg: on a 100-AMP system, a user with a 10G spool limit gets 100M spool/AMP.
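A user's spool limit and current usage can be checked from the DBC views; a sketch (adjust the username):

```sql
-- dbc.DiskSpaceV has one row per AMP (Vproc), so the per-AMP
-- maximum also reveals skewed spool usage.
SELECT DatabaseName
     , SUM(MaxSpool)     / 1024**3 AS spool_limit_GB
     , SUM(CurrentSpool) / 1024**3 AS spool_used_GB
     , MAX(CurrentSpool) / 1024**2 AS max_spool_per_amp_MB
FROM dbc.DiskSpaceV
WHERE DatabaseName = 'my_user'
GROUP BY 1;
```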
What is the spool space limit good for?
This limitation method is a quite simple way to cut off queries that would suck too many resources from the system. There is no exact relationship between high spool usage and ineffective queries, but statistically the correlation is high.
Practically: a bad query is kicked out before it consumes too many resources unnecessarily.
No more spool space scenarios
System ran out of spool space
This is the rarest situation, almost forget about it. There is too little free space on the system, but this situation is usually avoided by defining a "SpoolReserve" database in which no objects are created, so that area is always available for spool.
If many "big spool limit" users run high-spool queries in parallel, this rare situation can still occur.
Multiple sessions of the user are active together
This is also a quite rare situation. Check the active sessions of the user in dbc.SessionInfo.
All existing volatile tables reside in your available spool space, reducing what remains. If you create many of them, especially with skewed distribution, you can fill your spool up. Choose the "primary index" carefully when defining volatile tables, too.
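For example (the names are illustrative), picking a high-cardinality PI column spreads the volatile table evenly across the AMPs, so no single AMP's spool fills up:

```sql
CREATE VOLATILE TABLE tmp_trx (
  trx_id      BIGINT
, customer_id INTEGER
, amount      DECIMAL(18,2)
)
PRIMARY INDEX (trx_id)       -- high-cardinality column: even distribution
ON COMMIT PRESERVE ROWS;
```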
Improper execution plan
These cause more than 90% of the "No more spool space" errors. Let's see how:
“Duplication to all AMPs” of a non-small set of records
The root cause is typically missing or stale statistics. Either the system thinks that far fewer records will be duplicated than in reality (sometimes billions of records get into these spools), or it knows the number exactly, but the other branch of the query has even lower quality estimations, and this execution seems cheaper.
Redistribution of records by a hash that causes skewed distribution
Check the corresponding blog post: Accelerate skewed joins
Retrieving a huge amount of records into spool (locally or redistributed onto the AMPs)
Specific query structures imply this execution, e.g. a join to a view that "union all"-s big tables.
I suggest using PRISE Tuning Assistant to identify the problem. It spectacularly displays which execution step runs into the problems above. Increasing the spool limit will not solve the problem in most cases.
Too big task
Sometimes a given SQL query requires a big spool area even with the best execution plan.
This is the only case when raising the spool limit is the solution. But first you have to make sure that the task is really that big. PRISE Tuning Assistant is a good tool for identifying this in a minute.
We have already finished with the SQL tuning part, where the resource gain was in the order of 100k USD (projected onto one year of operation) within a 2-3 week project.
There is another really good opportunity to increase our savings by the same order of magnitude: data storage.
Why do we discuss it as performance optimization?
You will be convinced later.
Teradata DBs store huge amounts of data, sometimes petabytes. The efficiency of storing and processing this incredible volume of data seriously affects the Data Warehouse costs. Fortunately, Teradata includes marvelous features for optimizing this area as well.
This post will show a project that gained 16% of storage space on a system where manual compression had already been applied.
Beliefs regarding compression
There are a lot of fears regarding compression, coming from the old days.
“Compression is slow“, “Compression is unsafe“, “We use compression only for the archives“.
These beliefs have some basis, but it is time to re-assess them.
The biggest fear about compression is speed. Why do we discuss a method that slows things down in a series about performance?
Because compression does not only slow things down. Some things speed up, even though it is hard to believe. Anyway, a lot of new, effective compression algorithms and codecs have been developed, so it is time to consider compression in the production stack.
The speed of compression and decompression is not equal; the latter is typically less costly. The write and read access frequencies differ significantly in a Data Warehouse environment, in favour of reads.
When we access data, the amount of I/O required depends on the number of data blocks the table allocates. If the data is compressed, we have to access far fewer data blocks with I/O operations. I/O is the most expensive resource in the Data Warehouse.
We do not have to compress everything. There are big and rarely used data elements whose compressed storage is very advisable.
We do not have to decompress everything. Teradata is clever enough to decompress only the required columns. In addition, it will decompress data only at the appropriate stage of execution; until then, data "travels" compressed, consuming less spool and cache memory.
Reliability and comfort
Embedded compression methods hide all these kinds of things from the users. Everything can be handled the same way as before compression: fully transparent. Users will not even notice that compression exists.
Types of compression
There is life beyond "ZIP". There are compression methods that require far less resource for encoding and decoding than the "ZIP"-style algorithms. They are efficient in special cases and achieve lower compression rates, but they are the ones we are looking for.
The Teradata compression
Teradata offers 3 types of compression
BLC – Block Level Compression
This is like a "ZIP" on each data block. All the data is compressed at a high rate. The whole block has to be decompressed to reach any data within it. Very costly, and requires "compression card" hardware within the Teradata servers for good performance.
Very simple: it just has to be switched on/off. Can be set at system level or at finer grains.
ALC – Algorithmic Compression
This is defined on table columns. Several compression algorithms are offered; they must be chosen manually. You have to know the data and the algorithms: which are good for which types of data, and what their resource costs are.
ALC is useful in special cases (like long URLs), but its usage is not really widespread.
MVC – Multi Value Compression
MVC is the magic in Teradata regarding efficient storage and processing. It is a simple "frequent value" type of compression. You specify the values whose occurrence is stored on at most 8 bits (1 byte), regardless of the length of the field value. Teradata solves the internals (the encoding and decoding); the usage is fully transparent. SQLs are not affected, and query execution plans do not change!
But you are responsible for defining the values to be compressed. A bad choice may result in a bigger table than the original, while a good choice results in an average 50% compression, or even more.
The really cool advantages are:
It can be switched on with simple* SQL operations, and it is 100% transparent, so no other modifications or even tests are required.
It can be used without additional hardware, with good performance. A quite slight CPU overhead appears, mostly on the encoding side.
The fewer data blocks the tables have, the fewer data blocks have to be moved when the data is processed. This is directly reflected in the table scans' I/O consumption. Queries may run faster.
CPU saving (!!!)
How can it save CPU? Through caching. There will be significantly fewer data blocks to be processed during data reads. This means less cache activity (which is a quite CPU-intensive operation). Queries may run faster, even those not affected by the compression, by having a better cache hit ratio. This is the way we gain the CPU back.
Uncompress only when required
Teradata decompresses the field values only when it is really required. MVC compression remains in charge throughout the spools as well, and the value is uncompressed only when it is really needed, sometimes many execution steps after the record is read from the table, as opposed to BLC.
* Simple means ALTER TABLE commands, but the exact parameters have to be chosen carefully.
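A minimal sketch of switching MVC on (the table, column and value list are illustrative; in a real project the value list comes from a frequency analysis of the column):

```sql
-- Re-specify the column with a COMPRESS value list: occurrences of
-- the listed values are then stored on at most 1 byte.
ALTER TABLE trx_table
  ADD trx_type VARCHAR(20) COMPRESS ('PURCHASE', 'REFUND', 'TRANSFER');
```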
MVC is the quickest and cheapest way to release free space in a Teradata Database – without losing any data.
The compress optimization project
This section presents a real-life compress optimization project, which gained 16% of storage space while not requiring more than a couple of man-days of work, plus the PCW software.
This amount of storage resource would cost in the order of 100k USD to purchase on the market. In addition, the installation of new hardware components implies system outage.
The project discussed below required no database downtime at all.
The project scope in this case is quite simple: which tables to deal with. The DBA typically has a pattern in mind of which tables can be good candidates:
Which are not in system or temporary databases
Which are not working tables (naming conventions…)
Which haven't been examined/optimized for a while
Which are not planned to be dropped in the next month
And which are not re-created every day by the ETL process
And so on…
This is a lot of information, all of which is accessible from the DBC, from some custom repository, or can be hardcoded in an SQL.
PCW has a customizable SQL-based table-selection feature, providing a tailored one-click tool that presents your custom preference tables every time.
Since it is SQL, you can use any tables in the database, like:
Table sizes, owner, creation, last use
Last optimization time
Any information previously set, like exclusions and inclusions.
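A minimal sketch of such a selection SQL (the DBC view and column names are standard; the filters are illustrative):

```sql
-- Candidate tables: ordinary data tables of at least 1 GB,
-- outside the system databases.
SELECT s.DatabaseName
     , s.TableName
     , SUM(s.CurrentPerm) / 1024**3 AS size_GB
FROM dbc.TableSizeV s
JOIN dbc.TablesV t
  ON  t.DatabaseName = s.DatabaseName
  AND t.TableName    = s.TableName
WHERE t.TableKind = 'T'                       -- ordinary tables only
  AND s.DatabaseName NOT IN ('DBC', 'SysAdmin')
GROUP BY 1, 2
HAVING SUM(s.CurrentPerm) / 1024**3 >= 1
ORDER BY 3 DESC;
```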
When the scope (= table list) is fixed, the next step is the analysis: what would be the optimal MVC settings, and how much space gain can be reached with them.
It is straightforward, however exhausting work, if done manually. Statistics calculation for tens of thousands of columns, optimum calculation with tons of exceptions and version-dependent knowledge, SQL creation and measurements: a set of tasks born to be performed by dedicated software.
PCW executes all these tasks on even thousands of tables, without continuous supervision. It works on sample data, which is good enough for a very precise calculation, meanwhile saving a huge amount of CPU and I/O.
The analysis ran like normal reports and had no interference with the daily life of the Data Warehouse.
The results went into the PCW repository, including the actual compression level and the consumed CPU and I/O resources (fetched from the DBQL logs automatically, if required). An Excel report was exported from the PCW GUI, which is the basis of the decision paper: which tables to involve in the implementation phase.
Yes, PCW delivers a very accurate estimation of the reachable gain (GBytes) before any modification is done.
When the analysis report is at hand, the only thing left is to make a selection of the tables: which ones provide a good savings possibility.
The current filtering was:
The savings reach 1G on a table
The gain reaches 10%
The lower the percentage, the more data must be processed for the specific gain. Below a limit it is not worth it, except for very big tables.
In this project we selected 178 tables to implement.
The implementation is the most exciting part of the project: this is where the gains are realized.
Setting up Multi Value Compression on a table is a pure SQL procedure: ALTER TABLE, or CREATE TABLE + INSERT SELECT commands can execute it.
The tricks are in the details:
Statistics: some Teradata versions do not allow ALTER if statistics exist
Indexes / constraints: must be re-created if CREATE + INSERT is applied
Is data accessibility ensured continuously? What level of locking is applied?
The process must be 100% safe
No data loss or inconsistency is allowed. Loads must be blocked during the operation. A revert possibility is needed in case of emergency.
Everything must be logged and measured to be able to report the achievements, even if thousands of tables are involved.
Order of execution
If space is very limited: what is the optimal sequence of optimization? The gained space can be rolled ahead to give place for further tables' optimization.
This seems not that simple; however, PCW does all of this. Meanwhile there is no black box: the transformations are done by BTEQ SQL scripts, which can be reviewed before being executed, from the PCW GUI or in the company-standard patch procedure.
These scripts contain all of the above: safety checks, locks, statistics/index maintenance and logging. The DBA does not need to modify the scripts; they are ready-made.
The implementation can be done even table-by-table, as it fits the system's patch timing schedule. The storage gains arise table-by-table. The results can be continually tracked on the PCW GUI.
The analysis and the implementation are executed with different users, each with the appropriate access level, for the highest data safety.
In the project we used the company's standard patch process: we created BTEQ scripts and delivered them to the DBAs to be checked, scheduled and executed.
When the implementation is ready, all data is available for a comprehensive "achievements report", including space gained, data volume touched, CPU and I/O consumed, and the time span of the project.
This was one of the best-ROI projects in the Data Warehouse: some days of human work in total, almost a terabyte of free space, which is ~10% of the total database volume.
We reached this on a system which had already had a comprehensive, however manual, compress optimization before.
How much does a GByte of storage space cost in Teradata?
Enter it into PCW and it will calculate how much you saved with the project.
Storage space: 16% gain on the optimized tables, resulting in a 937 GByte increase of database free space
The analysis phase ran 42 hours, scanning 320 tables allocating 10 TBytes in total
The implementation phase ran 22 hours, transforming 178 tables originally allocating 5.8 TBytes
The implementation consumed 79 CPU hours and 692M I/Os
Gains in $$$
The space gained was needed by an urgent development. Compress optimization solved this without HW extension. The storage space released by the project could have been bought for ~144k USD, assuming 1 GB costs $150.
The next post will demonstrate how PCW supports each phase of the compress optimization project.
The only things ahead are to test, go live and harvest the glory 🙂 Let’s go!
Our concept handles the target SQL as a "black box", which gets inputs and produces outputs. We pry into the details only if necessary. We want to modify this box to produce the same outputs for the same inputs, while consuming less resources / running quicker. Therefore the testing methodology also fits this schema: let's feed in the same input as the original SQL got, and compare the outputs.
What input to be used?
Theoretically, we should use input data which covers all "special cases" and contains representative data regarding volume and distribution in all aspects. Sounds very complex. For real-life testing I typically use exactly the same data as a daily execution of the target SQL got. Usually it covers enough, and it represents the live environment best, since it is the live environment. Take care: if different days produce significantly different input data (eg. bill run days / non bill run days), use the data from the "heavy days".
How to get the inputs?
What type is the SQL? Report (persistent input) or ETL (input usually already deleted, depending on the preserve/delete handling of temp data)?
Is it a daily batch or a continuous Data Warehouse load? The first case ensures a stable data environment during the day, while the second does not.
If production input data is available, copy it and run the optimized SQL on it.
If not, reproduce it (from an earlier available phase of the dataflow, sometimes from the "stage" data).
If not reproducible (continuously changing DB), save a "snapshot" as a stable input for both the original and the optimized SQL.
How to produce the outputs?
It depends again.
We need two outputs: the original and the optimized SQL's.
If the original output is still available, copy a snapshot.
If the original is not available, produce it with the original code, but be sure to use the same input as the optimized SQL will use.
If we created an input snapshot, produce the output based on it.
Output of the optimized SQL
In any case we have to run the optimized code. Be sure that the same input is used as the "original" SQL had.
How to compare the outputs?
We branch depending on whether the output is a table or a file.
If it is a file: the content must be sorted deterministically (based on a unique value set);
the Unix "diff" command will give the answer.
Before using "diff", "ls -la" tells the file sizes. If the two files differ in size, the results cannot be the same; however, equal size alone is not enough.
If it is a table: we have precisely the same table structures, which is a must (take care of it; it is part of the concept of not just the testing, but the optimization as well).
A 4-step check:
a. select count(*) from "orig_out_table";
b. select count(*) from "opt_out_table";
c. select * from "orig_out_table" MINUS select * from "opt_out_table";
d. select * from "opt_out_table" MINUS select * from "orig_out_table";
a. and b. must be equal
c. and d. must both be empty
If all of these apply, the two sets are equivalent – OK
This method handles the NULL values as well (null equals null, opposite to the '=' operator)
A performance optimization is the same as a regular functional development on the Data Warehouse.
It is highly advisable to align to the standards, even if the issues are “very urgent”.
Documentation and version control are mandatory; only patches are allowed (no direct SQL DDLs), as the company standards and methodologies declare.
Documentation is very important for us as well, since the output of performance optimization is the amount of resources gained, which is sometimes not as obvious as a new function.
Without documentation we are unable, or at least challenged, to demonstrate our achievements.
Each drop will have its specialities. Some have long-running patches, others need additional temp or perm space. Pay attention to collect these in advance; do not frustrate the DBA with a lot of failures during/after the installation of the optimization.
Our goal is to reach satisfaction and leave good impressions, since performance optimization is a recurring process. So let's describe all the required things (space, permissions, external activities like archiving) and the expected ones (what goes faster, outage of a table for a while, etc).
Last but not least, a very important thing for us after each drop goes live: document the exact go-live time and script versions, to be able to track back everything, including other developments' possible cross-effects.
The basis of SQL-level optimization is to optimize "isolated" components of the system. However, this cannot always really be true.
The Data Warehouse is a very complex set of dependent components and processes. If I modify something, it will influence other things in several ways:
The ETL and reporting processes are typically organized by some dependency system. If we optimize one or more components, we significantly modify their runtime (see our results below). The timing of the processes can be significantly modified this way: some things will run parallel to each other that never met before, and vice versa. If the dependencies are not defined correctly, and only some long-running task ensured (in practice only) the availability of a result data set for a consumer process, the optimization of the long task will eliminate this delay: the dependency will not apply.
When we optimize Teradata queries, we use techniques, which have effects on other processes. These are for example:
Table PI changes
Defining secondary/join indices
Collecting new / refreshing old statistics
After we have delivered these kinds of modifications, we have to check those SQLs that use the affected objects: whether their performance stayed OK. PTA provides good help for this with the "Object usage search" function.
I have an additional piece of advice regarding statistics changes. When we create a new statistics we have to look after its regular refreshing, principally if it covers a fast-changing, low-cardinality (typically partitioning) column. Otherwise it will turn into a time bomb later…
After a drop goes live, it is highly recommended to check the DBQL logs for the next 1-2 days, to see whether everything goes as planned.
Though we test everything very carefully, performance is not a 100% stable thing. The Teradata Optimizer decides on the execution plan right at the time the query is executed.
And things can change. Data distribution, random-AMP stats and other settings can change between the testing and the go-live. It is not a tragedy, but expect some percentage of iterations.
The measurement goes similarly to the "before" round: identify the log items and take a one-week average.
However, the identification is a bit different. While the "before" is always one SQL statement per item, the "after" may be several. Sometimes the optimized version is broken down into more SQL statements. The additional statistics and data preparation steps also have to be involved. Therefore we have to collect manually which statements are the exact successors of the tuned query.
PTA helps again: search for the session of the first run of the optimized query, and all its SQLs will be there.
Collect the appropriate ReqNums (request number: unique ID within a session) of the successor SQLs, and register them in the documentation.
The next days it is easier: use the PTA "Advanced search". Fill in the queryband and the list of the ReqNums, separated by commas (,). The result will contain all the items you need.
Switch to the "Summary view", and you will get the resource totals broken down to daily level. This way a whole week's data can be collected in one round.
Results of our project – facts in numbers
Let's see what we could achieve in our project. I summarize the numerical results in this table:
The reasons for the performance problems were these:
Solutions (high level):
Store partial results (filtered table) in a volatile table with an appropriate PI
Collect stats on the join column set
Patch to MULTISET, no PI change, dedup replaced by GROUP BY
Our concept of optimizing a query is quite similar to the way we find the good candidate queries. Instead of deeply understanding the whole picture, we try to find the weak points and explore their reasons. For this we need the appropriate information prepared, and good highlighting, which helps us focus on the point. We do not want to understand each query's meaning and SQL text, except when it is unavoidable for the tuning. This saves us really a lot of time in many cases.
Similarly to the "selection" phase, we will use the PRISE Tuning Assistant for the individual query tunings. Let's see how.
Save the ID(s) of the query(es)
When tuning the selected queries, we will return to each one several times: validating an idea, checking things, etc. PTA enables saving a selected list of IDs to CSV, which can be loaded into the search window at any time, and the selected queries will appear on demand.
Check typical patterns
We have a standard way of examining a query, testing it against about half a dozen basic patterns (this part is discussed below). If one of them matches, we can skip deeply understanding the query. Otherwise PTA will help to understand what exactly is happening.
Sometimes it is worth checking the idea on different instances of the SQL (other days' executions).
Document the idea and solution
It is very important to write down what we've found and what the proposed solution is. I typically do this iteratively: first I write down my first impressions for each target query, then I go back and refine them. At the end it is important to have an exact, detailed specification of what to do with the SQLs. I use a Word document with a standard format. Each query has a chapter with a standard structure:
ID in DBQL to find the query
Place where the SQL is issued
Sequence within the session, to identify it
Bad performance case
Detailed description of the bad performance reason
Action: detailed description of what to do. This is the specification of the development.
I usually check the concept in a test environment before I finalize the "Action". It is important to have a validation environment similar to production; even so, the same result is not guaranteed.
Implement the modifications – in drops
When we know what to do with each of the SQLs, we can assign them to delivery drops, negotiated with operations and developers. I typically use these aspects:
Big gains first
Same job in same drop
Big operations (patches) spread evenly
Number of queries spread evenly
Query analysis – Typical patterns
The first general rule, when optimizing a query is:
THERE ARE NO ALMIGHTY RULES!
We only have high-probability rules, but they are good enough in most cases.
My initial checklist for a query
I check whether there is a "peak step" that runs far longer than the others, or whether the total runtime / resource consumption is divided among several dominant steps. For this I check the "Duration%" field of the steps. If the yellow-background step (the longest running) has 50%+, we have a peak-step query.
Resource skewness check
Let's see the query's resource totals and skewness. If it is >50-70%, it is most often a skewness problem, caused by a non-even distribution of the data or of the tasks to be done.
Object skewness check
I browse through the objects taking part in the query, and check whether we have big ones with significant skewness (>10-20%).
“Peak step” examination
The goal is to learn what causes the long run:
Are the sources skewed?
Is the operation skewed?
Is the result a lot of data?
Is the result placement skewed?
Is the big result "duplicated-to-all-AMPs"?
“Peak step” drilldown
I start from the longest-running step (typically the most resource-hungry),
catch an overview and browse through its predecessors,
looking for bad Fidelity*, high skew or other suspicious things. All of these are vividly highlighted by PTA; I just have to catch sight of them.
* Fidelity: the ratio of the estimated and the actual row count of a spool. Bad values indicate missing/stale statistics or too many join steps.
When I have run over the checklist above (~1 minute for a very complex query, with some experience), I have a good impression of the case:
If peaked, it is probably easy to get rid of the problem. If not, maybe better to deal with another query, if possible, because we have to deeply understand what is happening.
If skewed, then either the data is skewed or an evil hash collision exists.
Skewed source table(s) often imply skewed execution; the tables' PI may need revising.
What type of operation runs long?
This is one of the most important ones.
Most often patterns
Hash collision at Insert / Create_table_as_select
Symptom: peak step is MRG, target table is of SET type w/ NUPI, often high skew
Solution: change to MULTISET, or choose a different PI
Big table duplication
Symptom: peak step is RET, Placing: Dupl, NO confidence, SpoolSize: huge
Symptom: peak step is RET, Placing: Dupl, HI confidence, SpoolSize: huge; the spool is used later in a JOIN, where the other source has bad Fidelity (>10..100)
Solution: some statistics are missing or stale. Find the NO confidence or the bad Fidelity branch, and check the “filter” condition in the explain (PTA has a condition visualization). In case of NO confidence, the source table lacks some statistics (on the filter condition columns); also check the other branch of the join step – some statistics will be missing there, too.
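Collecting the missing statistics can be sketched like this (table and column names are hypothetical, not from any original query):

```sql
/* Sketch – placeholders only; collect statistics on the columns
   that appear in the filter and join conditions of the suspect step */
COLLECT STATISTICS ON db.big_table COLUMN (filter_col);  /* WHERE condition column */
COLLECT STATISTICS ON db.big_table COLUMN (join_col);    /* join condition column  */
```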
Redistribution resulting in skew
Symptom: step target is highly skewed (>70-80%)
Solution: it depends. Check the source tables, exclude peak values, break up the query. Typically needs deeper analysis.
Sliding-window join over many partitions
Symptom: merge join of the “sliding-window” type, with more than a couple of partitions involved
Solution: apply exactly the same partitioning definition to the joined tables
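A minimal sketch of what “same partitioning definition” means (table, column names and the date range are hypothetical):

```sql
/* Both joined tables get the very same PARTITION BY expression */
CREATE MULTISET TABLE db.sales_a (
  sale_id   INTEGER,
  sale_date DATE
) PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (sale_date BETWEEN DATE '2020-01-01' AND DATE '2025-12-31'
                      EACH INTERVAL '1' MONTH);

/* db.sales_b is created with an identical PARTITION BY clause, so the
   optimizer can do a direct partition-to-partition merge join instead
   of a sliding-window one */
```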
Case examples in the project
Hash collision insert
This is a very simple case.
Peak step is a MRG (merge the result into the target table)
Target table has NUPI, is SET type
PI cardinality is low (9M distinct compared to 169M rows)
This means that we have a serious number of hash collisions in the table, which slows down the insert dramatically. We have two options:
change PI or
change to MULTISET
Both have their side effects; in this case we changed the table to MULTISET.
Method: since the table is not re-created at every script run, I created a “patch” script to convert the table to the new structure:
Create new structure as empty
Insert data into
Drop old table
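The patch script could be sketched like this (all names are placeholders; note that a RENAME step – not listed above – is also needed so the new table takes over the old name):

```sql
/* 1. Create the new structure as empty: same PI, but MULTISET */
CREATE MULTISET TABLE db.target_new AS db.target WITH NO DATA;

/* 2. Insert the data into the new table */
INSERT INTO db.target_new SELECT * FROM db.target;

/* 3. Drop the old table and take over its name */
DROP TABLE db.target;
RENAME TABLE db.target_new TO db.target;
```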
Exclude peak value
In this case the symptom is very simple: a RET (retrieve records into a spool) step creates extremely high skew (98%). The redistribution itself is OK – it has a good reason: it supports the next “merge join” step.
However, we have one peak value in the join column: ‘X X X X X’, which has no corresponding record in the joined table – I checked this manually.
Solution: I simply excluded this value in the “WHERE” condition. This eliminated the high skew, and the query ran fine, with the same result.
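A sketch of the exclusion (table and column names are hypothetical; ‘X X X X X’ is the peak value from the case, verified to have no match anyway):

```sql
SELECT a.*, b.other_col
FROM   db.big_table    a
JOIN   db.joined_table b ON a.join_col = b.join_col
WHERE  a.join_col <> 'X X X X X';  /* drop the peak value before redistribution */
```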
Materialize a joined view
This SQL was very complex (12 tables involved), and all tables had the appropriate statistics. PTA showed that the optimizer did a complicated query rewrite, and Fidelity went rather bad as data flowed from spool to spool:
There was a view involving 5 tables, some of them big; however, the view resulted in only ~100 records.
I materialized the view and collected statistics on it. This resulted in a brand new execution plan, which was now accurate.
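The materialization can be sketched like this (all names are placeholders):

```sql
/* Materialize the ~100-record view result into a real table */
CREATE TABLE work_db.complex_v_mat AS (SELECT * FROM view_db.complex_v) WITH DATA;

/* Collect statistics on the join column(s) of the materialized table */
COLLECT STATISTICS ON work_db.complex_v_mat COLUMN (join_col);

/* The big query then references work_db.complex_v_mat instead of the view */
```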
Work from a saved list
Follow the workflow
Document the concept before implementing
The next post is about: “Testing, Delivery and Evaluation – enjoy the results“
First of all, we have to make clear the business requirement of the project. This time we have to finish the load process of a specific data area in the Data Warehouse 1 hour earlier than before.
The requirement implies the following conclusions in this case:
All scripts that typically run after 2pm are out of scope, since our target load actually finishes around 2pm today
Because the target system is CPU-bound, we will focus on “Impact CPU” as the key resource by which we search for the top consumer queries
We also focus on long running SQLs, principally those on the critical path of the targeted data area load (they may be low consumers in absolute rank, but they are blockers)
Define the scope
Find good candidate SQLs – before agreeing on the scope
What SQLs are good candidates? Those which:
Consume much/skewed resources
Here is the trick: recurring queries, i.e. those that run regularly. Finding query recurrence is not a trivial “grouping query” operation: date or other constant values often change in the SQL text over time. These textually differing queries are still the same from the aspect of regularity.
I use the PRISE Tuning Assistant (PTA) tool for finding the candidates. The method is demonstrated in this short video:
What you can see in the video:
I choose one whole week as the basis of the examination
Search top 100 queries for every day on the week (Advanced search)
Find the repetitive top consumers – Recurring query view
Choose one instance of each interesting recurring SQL
Build an Excel list of them (Summary view -> Export function)
The results are ready in a few minutes, and give a good impression of the query pattern. They reflect the affected total resources (which is the theoretical limit of the gain) and show the “low hanging fruits” immediately.
Propose and agree on the project scope
Step 1 gave good insight into the possibilities. The next step is to define our project goals and scope, to limit the expenses and the time frame. This means that we have to refine our initial list; I use PTA and Excel for this. Take PTA’s exported query list (sorted by the chosen resource type, descending), examine the items individually with PTA, try to find out which queries can be easily optimized, and choose the top 10-20 of them.
When choosing queries, mind these aspects:
Which queries are on the critical path of the targeted data area
Other development dependencies
Do not touch an SQL being developed or soon to be dropped
Budget and timeframe available
When deciding the number of queries
As a good rule of thumb, choose no more than 10-20 SQLs for a project (depending on your time and cost limitations), and divide them into delivery drops (3-5 scripts into each)
This way the whole project will fit in 2-3 weeks, and delivery (incl. testing and go live) will spread evenly across the timeframe.
Please note that the differing go-live dates must be taken into account when measuring the results.
As a rule of thumb, I generally calculate with an average 70% resource reduction (100 resource units reduced to 30) when planning a project, to be on the safe side. My personal average saving rate is over 90%; this project achieved an extreme 98% gain on the Impact CPU of the affected queries.
Save a “before” report
The scope elements should be documented in an Excel sheet to easily track them (PTA’s Excel export is an excellent basis for it):
Identifier (Script name, Queryband ID, “Requestnum”, etc), which also helps to find it
Drop ID (in which it goes live)
Average “before” resources
This “before” report will be the essential basis of the final “gain” report, which is very important to demonstrate the achievements and measure the ROI of the optimization project. Mine looks like this:
It identifies the place from where the SQL was run. Good queryband settings make it accessible directly from the DBQL.
SQL scripts typically have a linear structure (no branches). In this case the Request num (statement running order within a session) uniquely identifies the given SQL within a script across the days in the DBQL logs.
The average amount of Impact CPU consumed by the query during the monitored 1-week period
The average runtime of the query during the monitored 1-week period. Along with the Impact CPU, this will be the reference for measuring the achievements of the optimization
The ID of one query instance within the monitored interval. It enables finding the appropriate log record within the DBQL easily
Teradata Fastload is a very fast and efficient tool for loading huge amounts of data into a Teradata database. It has good documentation to browse as a reference; however, there are a few small things that are good to know to use it more comfortably – especially in challenging situations.
Tip1: How to understand “Error1” table
When a record fails to load because of any Teradata error (conversion, null problem, etc.), the record gets into the “Error1” table, which is set in the “BEGIN LOADING” section of the control script.
The structure of the “Error1” table is fixed:
Which column was problematic
The Teradata error code; look it up in DBC.ErrorMsgs if the documentation is not at hand
This field contains the record, transformed into a hexadecimal string format
Unfortunately, DataParcel’s hexadecimal codes are unreadable for a human in Queryman or other GUIs.
With this short BTEQ script you can export the original record layout into a file:
.SET RECORDMODE ON
.EXPORT DATA FILE=badrecord.out
SELECT DataParcel FROM <Errortable1>;
.EXPORT RESET
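To look up the meaning of an error code found in the Error1 table, a query like this can be used (2679, a typical conversion error, is just an example code):

```sql
SELECT ErrorCode, ErrorText
FROM   DBC.ErrorMsgs
WHERE  ErrorCode = 2679;
```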
Tip2: How to switch off deduplication?
There is no way to switch it off. Teradata Fastload always does a full-record-layout deduplication. Even if we use a non-unique primary index (NUPI) and a MULTISET table definition, Teradata will deduplicate eagerly.
If you must load every record, use a preprocessor script (Perl, AWK, etc.) that adds a running-number column to your data. This way there will surely be no identical records at all.
Tip3: Deduplication vs. “Error2” table
Why do we have an “Error2” table if Teradata Fastload always does a deduplication? This error table contains those records which passed the full-record-layout deduplication, but failed the unique primary index (UPI) duplication check. That is, the data contained another record with the same PI value, but differing in one or more other columns. To load those records as well, use a NUPI or NO PI table.
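A hypothetical illustration of the two checks (the table and the data are made up):

```sql
CREATE TABLE db.stage_t (
  id  INTEGER,
  val CHAR(1)
) UNIQUE PRIMARY INDEX (id);

/* Input file rows and their fate during Fastload:
   (1,'a')  -- loaded
   (1,'a')  -- full-record duplicate: silently dropped by the deduplication
   (1,'b')  -- same UPI value, different val: goes into the “Error2” table
   (2,'c')  -- loaded
*/
```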
Tip4: What is the good “Sessions” number?
SESSIONS is a parameter to be specified in the “BEGIN LOADING” section. It can fall between 1 and the number of AMPs. The bigger the number, the better the loading performance, but the longer the logon time. So for a small file use 1; for a large number of rows, use the maximum allowed. For medium files I generally use 10.
If a lot of small files are to be loaded into different tables, consider using another technique, since Teradata Fastload has no advantage in that league.
Tip5: Table is “Being loaded”…
When a Teradata Fastload session is stopped for reaching error limit, the target and the error tables cannot be accessed due to locking:
Error 2652: Operation not allowed: <db>.<table> is being Loaded.
Run this short Fastload script to release the locks and regain access to the tables (placeholders as above; ERRORFILES must name the error tables of the failed load):
LOGON <tdpid>/<user>,<password>;
BEGIN LOADING <db>.<table> ERRORFILES <db>.<err1>, <db>.<err2>;
END LOADING;
LOGOFF;
After this script finishes, the error tables are readable, so the root causes of the load problem can be identified – see the “Error1” table tip above.
The Teradata database is a pretty expensive box that serves a lot of people in the company – in a direct or indirect way. The efficient utilization of its resources is essential, from both the financial and the user satisfaction aspects.
This series of posts will follow a real-life Teradata optimization project’s methodology, steps and results, with this achievement:
Levels of performance optimization
We can typically apply 3 levels of Teradata performance optimization:
Meaning: Analyze user requests and system activity, then refine them, eliminating unnecessary activities.
This level checks whether the executed processes are really needed, and whether they do what the user really needs. Eg: eliminate unnecessary data loads or reports; limit a report to a shorter period
Big gains possible w/o any trade-off
Comprehensive analysis needed (incl. finding and interviewing the appropriate persons)
Meaning: Pick individual SQLs and optimize their execution
This approach finds the top resource consumer SQL statements and reduces their resource consumption, while producing the same results.
Eg: Change PI; add/refresh statistics; rewrite SQL
Low risk (small effect radius, good testing possibilities)
Needs SQL level analysis
Meaning: Tune low level system (RDBMS, OS, Hardware) parameters
Tuning on infrastructure level is a global intervention regardless of users/business areas or importance
Can work miracles in some special cases
High risk (affects the whole system)
In the beginning I prefer and recommend SQL level optimization (b.), because it delivers high value along with low risk and good project scalability. Later on, the others can come into focus.
Goal of SQL optimization
The goals of an optimization project can be different. Some examples:
“Finish load procedure until 6am”
“Present my report 2 hours earlier”
“Reduce CPU load by 20%”
“Let the hardware extension be postponed by 6 months”
The goal of our project was: speed up the “load batch” process, in order to let a specific identified data area’s load finish and become available for further usage 1 hour earlier than before.
The key input for an SQL optimization is the information about what exactly happens:
What, when, how:
Which SQL statements ran, when, what steps were executed, how much resource did they consume, with what skewness, etc.
All these can be found in the DBQL tables of Teradata (dbc.dbql*)
Switching them on does not cause significant extra load, but it is required to archive/purge them regularly (see the PDCR tool in the Teradata package).
Teradata Administrator lets you switch logging on (Tools/Query logging…), and one can check the actual settings:
select * from dbc.dbqlruletbl;
The Explain/Obj/Sql/Step flags should be set to ‘T’; others are not necessary. Global logging (1 record with UserID: 00000000) is recommended.
If continuous logging is not possible, logging should be switched on 1 week before starting the optimization project
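As an alternative to the Teradata Administrator GUI, the global logging rule can be switched on with an SQL statement – a sketch:

```sql
/* Global rule: log SQL text, explain, objects and step info for all users */
BEGIN QUERY LOGGING WITH SQL, EXPLAIN, OBJECTS, STEPINFO ON ALL;
```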
All executed SQL statements can be flagged with several pieces of information regarding the environment, job, script, version etc., which helps to identify a specific one.
The queryband data will appear in the DBQL logs – if the running environment is configured to flag the queries being run.
Querybanding is not essential for an optimization, but it really helps a lot to identify the exact point in the data warehouse ecosystem where the modifications should be done (which job/jobstep/script).
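A sketch of setting the queryband for a session (the key/value pairs are examples – use your site’s naming conventions):

```sql
SET QUERY_BAND = 'Job=daily_load;Script=load_customer.sql;Step=040;' FOR SESSION;
```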
What are the environmental conditions?
Eg. Data volumes, statistics, etc.
This information is available in the data dictionary (DBC tables and views), but only as a current snapshot – not as of the time the statement ran.
In most cases this is OK for us, but this property needs to be kept in mind.
When all these are finished, we are ready to do a successful performance optimization project.
Teradata Fastexport is a nice, powerful tool for exporting mass volumes of data to a file. Its default execution flow is the following:
Apply locks on the affected tables
Execute the SELECT statement
Place the result in a SPOOL
Export the data from the SPOOL
It has an advantage: the locks are released immediately after the result data has been copied into the SPOOL. Therefore there is no need to wait until the data is written to the export file outside the database – which is typically a single-process operation and can last for a long time.
The price to be paid for this advantage: all the exported data must first be copied into the SPOOL. When the SELECT is a simple operation (no join, sort or other extras), this copy phase could be saved – it can account for up to 50% of the total resource and time consumption.
The NOSPOOL trick
For these simple cases one can use the NOSPOOL/NOSPOOLONLY mode.
While NOSPOOL is a soft switch, which is ignored if not applicable, NOSPOOLONLY makes the statement fail if the nospool option is not available for the given query.
In NOSPOOL mode the database does not make a copy of the result data. Instead, it passes the data directly to the client, which has another benefit: the execution of the SELECT runs dominantly in parallel with the file write operation, instead of sequentially as in the SPOOL (default) case.
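A minimal Fastexport control script sketch using the NOSPOOL mode (logon data and object names are placeholders; SPOOLMODE is an option of the .BEGIN EXPORT command):

```
.LOGTABLE <db>.fexp_log;
.LOGON <tdpid>/<user>,<password>;
.BEGIN EXPORT SESSIONS 8 SPOOLMODE NOSPOOL;
.EXPORT OUTFILE export_data.out;
/* simple SELECT: no join, sort or other extras, so NOSPOOL is applicable */
SELECT * FROM <db>.<table>;
.END EXPORT;
.LOGOFF;
```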