Avoiding Product Joins

How to eliminate Product Joins

What is product join?

Product join is one of the implementation methods of an SQL JOIN operation.
Do not confuse it with the cross join (Cartesian product), which is one of the SQL join types.

SQL join types, e.g.: inner join, left outer join, full outer join, cross (Cartesian) join
Join implementation types, e.g.: nested join, merge join, hash join, product join.

Product join (of tables A and B) is the simplest join implementation method:

  • Produce every <A;B> record combination: take each record of A and match it with each record of B one by one.
  • Test the join condition on each produced <A;B> record pair, and eliminate the combinations where the condition fails.
The two steps are usually combined: the join condition is tested right after a record combination is generated, and the invalid combinations are dropped immediately. This saves a lot of temp space.

Why don’t we like product joins?

Well, it has a really bad reputation: it is slow, it burns CPU, etc.
Yes, it usually is and does. It is the brute-force method of executing a join, with a cost in the order of N*M (where N and M are the record counts of the joined tables).

Still, there are situations when it is the best choice, or the only feasible way.

When is it good or necessary?

Please note that the product join is the one method that is always applicable, regardless of the circumstances.

Good

Product join is typically a simple, dumb and slow algorithm – this is why we do not like it – but it has a very important advantage: it requires no pre-processing.* This is why we LIKE it :)
If we have to join a really large table to a very small one (a couple of records), product join is by far the most effective method, since sorting a very large table (order of N*logN) costs a lot, while joining to 1-2 records is really not a big deal.

Necessary

There are join situations where the only way to go is the product join. Why? Because of the join condition. The “clever” joins (merge, hash) require some information and/or condition that enables them to avoid the full A x B comparison: to reduce it to the comparisons that are really necessary and perform them in a more effective manner.

* OK, in Teradata this only requires that the matchable records of both tables reside on the same AMP. This implies that the “small” table is duplicated to all AMPs.

Merge join example


select ...
from A
join  B on A.customer_id = B.customer_id
       and A.trx_dt between B.eff_dt and B.exp_dt

  • The customer_id predicate is ANDed with the other conditions
  • customer_id is selective enough that hash(customer_id) reduces the number of comparisons reasonably
  • Note that A and B must be sorted (re-sorted) by the hash of customer_id

Product join example


select ...
from A
join  B on substr(A.telephone_no,1,B.prefix_length) = B.telephone_no_prefix

  • There is no partial condition that could reduce the number of comparisons
  • Note that neither table is required to be sorted in any specific order.

Unavoidable product joins

  • Non-equality condition
  • Function used (e.g. substr())
  • Dependent expression is used (e.g. A.x+B.y = A.z)
  • Cross join: intentional Cartesian product

Avoidable product joins

Data type mismatch

The merge join example above works only if customer_id in tables A and B has the same “style” of data type, since their hash values will match only in this case. E.g. hash(13674) <> hash(‘13674’); however, integer is compatible with decimal, and char is compatible with varchar.
Pay attention to data type consistency during physical data modeling.
  • Use domains to eliminate the possibility of mismatch
  • Align to the existing data types when defining temp tables, or use “create table as …” statements
  • If you cannot avoid the mismatch, relocate the necessary data to temp tables with proper data types during processing (see the sketch below).
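
A minimal sketch of such a relocation (table and column names are just examples): copy the mismatching side into a volatile table whose customer_id type matches the other join side, so the hash values can match:

create volatile table a_fix as
( select cast(customer_id as integer) as customer_id
       , trx_dt
  from   a_source
) with data
primary index (customer_id)
on commit preserve rows;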

OR condition

Let’s assume the following join condition:
select ...
from A
join  B on A.col1 = B.col1
        or A.col2 = B.col2

This is equivalent, without forcing a product join:

select ...
from A
join  B on A.col1 = B.col1

UNION

select ...
from A
join  B on A.col2 = B.col2

Missing/stale statistics

As mentioned before, the product join is the most effective join between a very large table and a really small (couple of records) one. If the optimizer thinks that a table is pretty small, but in fact it is not, it may choose a product join in good faith, misled by stale or missing statistics.
Define and keep fresh the statistics that let the optimizer determine the size of the joinable record sets properly.
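
A minimal sketch of keeping such statistics fresh and checking what is already collected (table and column names are examples; the exact syntax may differ by Teradata release):

collect statistics on small_dim_table column (customer_id);
help statistics small_dim_table;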

How to find avoidable product joins

It is not trivial to list the avoidable product joins. Practically, every product join has to be examined one by one and judged to be avoidable or not – and if avoidable, what to do about it.

I strongly recommend using PRISE Tuning Assistant both for finding the product joins and for analyzing the possibility and necessity of elimination:

  • List the top consuming queries with product join(s)
  • Check the PROD JOIN steps: which tables are processed that way
  • Check those join conditions for the cases described above

What to do if it cannot be avoided?

In this case I recommend trying the decomposition described here.
It can help reduce the number of comparisons, saving CPU and runtime.

 
Have a successful optimization!
 

The “No more spool space” Error Explained

Why do I get “No more spool space” error?

This is the most familiar error message in Teradata world:
“Failure 2646 No more spool space”
What does it really mean, what is it caused by?
Let’s get back to the basics.

What is spool space?

Spool space is a temporary area that is used to store partial results during query processing, as well as volatile tables. Technically, all free space in the database that is not allocated by PERM data can be used as spool area, since spool simply occupies the space that PERM data does not currently need.

Each database user may have a “spool limit” that prevents the user from allocating more spool area at a time than the limit. Keep in mind that all active sessions of a username share that spool limit.

Teradata is a massively parallel system, therefore the spool limit must be interpreted at AMP level.
E.g. on a 100-AMP system, a user with a 10G spool limit gets 100M of spool per AMP.

What is spool space limit good for?

This limitation is a quite simple way of cutting off queries that would suck too many resources from the system. There is no exact relationship between high spool usage and ineffective queries, but statistically the correlation is high.
Practically: a bad query is kicked out before it consumes too many resources unnecessarily.

No more spool space scenarios

System ran out of spool space

This is the rarest situation – you can almost forget about it. It means there is too little free space on the system, but this is usually avoided by defining a “SpoolReserve” database in which no objects are created, so that area is always available for spool.
If many “big spool limit” users run high-spool queries in parallel, this rare situation can still occur.

Multiple sessions of the user are active together

This is also a quite rare situation. Check the active sessions of the user in dbc.sessioninfo.
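
A quick way to check this, as a sketch (dbc.sessioninfo is a standard DBC view; the user name is an example):

select UserName, count(*) as active_sessions
from dbc.sessioninfo
where UserName = 'etl_user'
group by 1;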

Volatile tables

All existing volatile tables reside in your available spool space, reducing what remains. If you create many of them, especially with skewed distribution, you can fill up your spool. Choose the “primary index” carefully when defining volatile tables, too.

Improper execution plan

These cause >90% of the “No more spool space” errors. Let’s see how:
  • “Duplication to all AMPs” of a non-small set of records
    The root cause is typically missing or stale statistics. Either the system thinks that far fewer records will be duplicated than in reality (sometimes billions of records end up in these spools), or it knows the number exactly, but the other branch of the query has even worse estimations, and this execution plan seems cheaper.
  • Redistribution of records by a hash that causes skewed distribution
    Check the corresponding blog post: Accelerate skewed joins
  • Retrieving a huge number of records into spool (locally or redistributed onto the AMPs)
    Specific query structures imply this kind of execution, e.g. joining to a view that “union all”-s big tables.
I suggest using PRISE Tuning Assistant to identify the problem. It clearly displays which execution step runs into one of the problems above.
Increasing the spool limit will not solve the problem in most cases.

Too big task

Sometimes a given SQL query requires a big spool area even with the best execution plan.

This is the only case when raising the spool limit is the solution. But first you have to verify that the task is really that big. PRISE Tuning Assistant is a good tool to identify this in a minute.
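
If the task proves to be genuinely big, a minimal sketch of checking and raising the limit (user name and size are examples):

select DatabaseName, sum(MaxSpool) as SpoolLimit, sum(CurrentSpool) as SpoolInUse
from dbc.DiskSpaceV
where DatabaseName = 'etl_user'
group by 1;

modify user etl_user as spool = 200000000000 bytes;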

Teradata performance optimization project
V. Multi Value Compression

Compress your Teradata by MVC

Focus on the storage

We have already finished the SQL tuning part, where the resource gain was in the order of 100k USD (projected onto 1 year of operation), within a 2-3 week project.

There is another really good opportunity to increase our savings by the same order of magnitude: the data storage.

Why do we discuss this as performance optimization?
You will be convinced later.

Teradata DBs store huge amounts of data, sometimes petabytes. The efficiency of storing and processing this incredible volume of data seriously affects Data Warehouse costs. Fortunately, Teradata includes marvelous features for optimizing this area as well.

This post presents a project that gained 16% of storage space on a system where manual compression had already been applied.

Beliefs regarding compression

There are a lot of fears regarding compression, coming from the old days:
“Compression is slow”, “Compression is unsafe”, “We use compression only for archives”.
These beliefs have some basis, but it is time to re-assess them.

  • Speed
    The biggest fear about compression is speed. Why do we discuss a method regarding performance that slows things down?
    Because compression does not only slow things down; some things actually speed up, even though it is hard to believe. Besides, a lot of new, effective compression algorithms and codecs have been developed, so it is time to consider compression in the production stack.
  • Speed II
    The costs of compression and decompression are not equal – the latter is typically cheaper. The write and read access frequencies also differ significantly in a Data Warehouse environment – in favour of reads.
    When we access data, the amount of I/O required depends on the number of data blocks the table allocates. If the data is compressed, we have to access far fewer data blocks with I/O operations. I/O is the most expensive resource in the Data Warehouse.
    We do not have to compress everything. There are big and rarely used data elements whose compressed storage is very advisable.
    We do not have to decompress everything. Teradata is clever enough to decompress only the required columns. In addition, it decompresses data only at the appropriate stage of execution; until then the data “travels” compressed, consuming less spool and cache memory.
  • Reliability and comfort
    Embedded compression methods hide all these details from the users. Everything can be handled the same way as before compression: it is fully transparent. Users will not even notice that compression exists.
  • Types of compression
    There is life beyond “ZIP”. There are compression methods that require far less resource for encoding and decoding than the “ZIP”-style algorithms. They are efficient in special cases and achieve a lower compression rate, but they are the ones we are looking for.

The Teradata compression

Teradata offers 3 types of compression:

  • BLC – Block Level Compression
    This is like a “ZIP” on each data block. All the data is compressed at a high rate, but the whole block has to be decompressed to reach any data within it. It is very costly, and requires “compression card” hardware in the Teradata servers for good performance.
    It is very simple: it just has to be switched on/off. It can be set at system level or at finer grains.
  • ALC – Algorithmic Compression
    This is defined on table columns. Several compression algorithms are offered, and one must be chosen manually. You have to know the data and the algorithms: which are good for which types of data, and what their resource costs are.
    ALC is useful in special cases (like long URLs), but its usage is not really widespread.
  • MVC – Multi Value Compression
    MVC is the magic in Teradata regarding efficient storage and processing. It is a simple “frequent value” type of compression. You specify the values whose occurrence is stored on at most 8 bits (1 byte), regardless of the length of the field value. Teradata solves the internals – the encoding and decoding – so the usage is fully transparent. SQLs are not affected, and query execution plans do not change!

    But you are responsible for defining the values to be compressed. A bad choice may result in a bigger table than the original, while a good choice results in an average 50% compression, or even more.

    The really cool advantages are:

    • Fast setup
      It can be switched on with simple* SQL operations, and it is 100% transparent, so no other modifications or even tests are required.
    • Fast operation
      It can be used without additional hardware with good performance. The slight CPU overhead appears rather on the encoding side.
    • I/O saving
      The fewer data blocks the tables have, the fewer data blocks have to be moved when the data is processed. This is directly reflected in the table scans’ I/O consumption. Queries may run faster.
    • CPU saving (!!!)
      How can it save CPU? Through caching. There are significantly fewer data blocks to be processed during data reads. This means less cache activity (which is a quite CPU-intensive operation). Queries may run faster – even those not affected by the compression – by having a better cache hit ratio. This is the way we gain the CPU back.
    • Uncompress only when required
      Teradata decompresses the field values only when really required. MVC compression remains in effect throughout the spools as well, and a value is uncompressed only when it is really needed, sometimes many execution steps after the record is read from the table – as opposed to BLC.

    * Simple means ALTER TABLE commands, but the exact parameters have to be chosen carefully.

MVC is the quickest and cheapest way to release free space in a Teradata Database – without losing any data.
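
A minimal sketch of switching MVC on for one column (table, column and value list are examples – the value list must come from the actual data demographics; it also assumes a Teradata release that allows adding MVC to an existing column via ALTER TABLE):

alter table db.sales_fact
  add order_status compress ('OPEN','CLOSED','CANCELLED');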

The compress optimization project

This section presents a real-life compress optimization project, which gained 16% of storage space, while it required no more than a couple of man-days of work – plus the PCW software.

This amount of storage resource would cost in the order of 100k USD to purchase on the market. In addition, installing new hardware components implies system outage.
The project discussed below required no database downtime at all.

Scoping

The project scope in this case is quite simple: which tables to deal with. The DBA typically has a pattern in mind for which tables are good candidates:

Big tables
Which are not in system or temporary databases
Not working tables (naming conventions…)
Those that haven’t been examined/optimized for a while
Not planned to be dropped in the next month
And not re-created every day by the ETL process
And so on…

A lot of information, all of which is accessible from the DBC or some custom repository, or can be hardcoded in an SQL.

PRISE Compress Wizard

has a customizable, SQL-based table-selection feature, giving you a tailored one-click tool that presents your custom preference tables every time.
Since it is SQL, you can use any tables in the database, like:

  • DBC tables
    Table sizes, owner, creation, last use
  • DBQL tables
    Usage frequency
  • PCW repository
    Last optimization time
  • Custom tables
    Any information previously set, like exclusions and inclusions.


Fig. 1 – Custom SQL based table selector

Analysis

When the scope (= the table list) is fixed, the next step is the analysis: what would be the optimal MVC settings, and how much space gain can be reached with them.

This is straightforward but exhausting work – if done manually: statistics calculation for tens of thousands of columns, optimum calculation with tons of exceptions and version-dependent knowledge, SQL creation and measurements. A set of tasks born to be performed by dedicated software.

PCW executes all these tasks even on thousands of tables, without continuous supervision. It works on sample data, which is good enough for a very precise calculation, while saving a huge amount of CPU and I/O.

The analysis ran like a normal report, with no interference with the daily life of the Data Warehouse.

The results went into the PCW repository, including the actual compression level and the consumed CPU and I/O resources (fetched automatically from the DBQL logs if required). An Excel report was exported from the PCW GUI, which is the basis of the decision paper: which tables to involve in the implementation phase.

Yes, PCW delivers a very accurate estimation of the reachable gain (GBytes) before any modification is done.


Fig. 2 – Result of the Analysis phase

Decision

When the analysis report is at hand, the only remaining thing is to select the tables that provide good savings potential.

In this project the filter was:

  • The savings reach 1GB on a table
  • The gain reaches 10%
    The lower the percentage, the more data must be processed for a given gain. Below a limit it is not worth it, except for the very big tables.

In this project we selected 178 tables for implementation.

Implementation

The implementation is the most exciting part of the project: the gains are realized.

The Multi Value Compression setup on a table is a pure SQL procedure: ALTER TABLE, or CREATE TABLE + INSERT SELECT commands can execute it (a bare sketch of the latter follows the list below).

The tricks are in the details:

  • Environmental conditions
    Statistics: some Teradata versions do not allow ALTER if statistics exist
    Indexes / constraints: must be re-created if CREATE + INSERT is applied
  • Business continuity
    Is data accessibility ensured continuously? What level of locking is applied?
  • The process must be 100% safe
    No data loss or inconsistency is allowed. Loads must be blocked during the operation. A revert possibility is needed in case of emergency.
  • Logging
    Everything must be logged and measured to be able to report the achievements – even if thousands of tables are involved.
  • Order of execution
    If space is very limited: what is the optimal sequence of optimization? The gained space can be rolled forward to give room for further tables’ optimization.

This does not seem that simple; however, PCW does all of it. Meanwhile there is no black box: the transformations are done by BTEQ SQL scripts, which can be reviewed before execution – from the PCW GUI or in the company’s standard patch procedure.

These scripts contain all of the above: safety checks, locks, statistics/index maintenance and logging. The DBA does not need to modify the scripts; they are ready-made.

The implementation can be done even table by table, as it fits the system’s patch timing schedule. The storage gains arise table by table. The results can be continually tracked on the PCW GUI.

The analysis and the implementation are executed with different users, each with the appropriate access level, for the highest data safety.

In the project we used the company’s standard patch process: we created the BTEQ scripts and delivered them to the DBAs to be checked, scheduled and executed.


Fig. 3 – Generating the optimizer BTEQ SQL scripts

Evaluation

When the implementation is ready, all data is available for a comprehensive “achievements report”, including space gained, data volume touched, CPU and I/O consumed, and the time span of the project.

This was one of the best-ROI projects in the Data Warehouse: a couple of days of human work in total, for almost a terabyte of free space, which is ~10% of the total database volume.
We reached this on a system which had already gone through a comprehensive, though manual, compress optimization before.

How much does a GByte of storage space cost in Teradata?
Enter it into PCW and it will calculate how much you saved with the project.


Fig. 4 – Realized achievements of the project

Summary

  • Storage space
    16% gain on the optimized tables, resulting in a 937 GByte increase in database free space
  • Runtime
    The Analysis phase ran for 42 hours, scanning 320 tables allocating 10 TBytes in total
    The Implementation phase ran for 22 hours, transforming 178 tables originally allocating 5.8 TBytes
  • Resources
    The implementation consumed 79 CPU hours and 692M I/Os
  • Gains in $$$
    The space gained was needed by an urgent development; compress optimization delivered it without HW extension. The storage space released by the project would have cost ~144k USD to buy, assuming 1GB is $150.

 

The next post will demonstrate how PCW supports each phase of the compress optimization project.

Stay tuned for the demo!

 

 

Teradata performance optimization project
IV. Testing, Delivery and the Results

Successful Teradata optimization with PTA

The results are within reach…

We have arrived at the final part; soon we’ll know all the details of how we managed to reach a remarkable 30% performance gain.
We already know the background from the previous parts of the series.

 

The only things ahead are to test, go live, and harvest the glory 🙂 Let’s go!

Testing methodology

Our concept treats the target SQL as a “black box” which gets inputs and produces outputs, and we pry into the details only if necessary. We want to modify this box to produce the same outputs for the same inputs, while consuming fewer resources / running quicker. Therefore the testing methodology also fits this schema: feed it the same input as the original SQL got, and compare the outputs.

What input to be used?

Theoretically, we should use input data which covers all “special cases” and contains representative data regarding volume and distribution in all aspects. Sounds very complex. For real-life testing I typically use exactly the same data a daily execution of the target SQL got. It is usually covering enough, and it represents the live environment best – since it is the live environment. Take care: if different days produce significantly different input data (e.g. bill run days vs. non bill run days), use the data from the “heavy days”.

How to get the inputs?

It depends.
What type is the SQL? A report (persistent input) or ETL (input usually already deleted, depending on whether temp data is preserved or deleted)?
Is it a daily batch or a continuous Data Warehouse load? The first ensures a stable data environment during the day, while the second does not.

  • If the production input data is available, copy it and run the optimized SQL on it.
  • If not, reproduce it (from an earlier available phase of the dataflow, sometimes from “stage” data)
  • If it is not reproducible (continuously changing DB), save a “snapshot” as a stable input for both the original and the optimized SQL

How to produce the outputs?

It depends again.
We need two outputs: the original and the optimized SQL’s.

The original’s
  • If the original output is still available, copy a snapshot.
  • If the original output is not available, produce it with the original code, but be sure to use the same input as the optimized SQL will use.
  • If we created an input snapshot, produce the output based on it.
The optimized’s
Either way, we have to run the optimized code. Be sure that the same input is used as the “original” SQL had.

How to compare the outputs?

We branch depending on whether the output is a table or a file.

File

In this case the content must be sorted deterministically (based on a unique value set).
The Unix “diff” command will give the answer.
Before using “diff”, “ls -la” will tell the file sizes. If the two files differ in size, the results cannot be the same – but equal sizes alone are not enough.

Table

We have exactly the same table structures – it is a must (take care of it: it is part of the concept of not only the testing, but the optimization as well!).

4-step check
  a. select count(*) from "orig_out_table";
  b. select count(*) from "opt_out_table";
  c. select * from "orig_out_table" MINUS select * from "opt_out_table";
  d. select * from "opt_out_table" MINUS select * from "orig_out_table";
Evaluation
a. and b. must be equal
c. and d. must both be empty

If all of these hold, the two sets are equivalent – OK.
This method also handles NULL values (NULL equals NULL here, as opposed to the ‘=’ operator).

The Delivery

A performance optimization is the same as a regular functional development on the Data Warehouse.

  • It is highly advisable to align with the standards, even if the issues are “very urgent”.
  • Documentation and version control are mandatory; only patches are allowed (no direct SQL DDLs), as the company standards and methodologies declare.
  • Documentation is very important for us as well, since the output of a performance optimization is the amount of resource gained, which is sometimes not as obvious as a new function.
    Without documentation we are unable (or at least challenged) to demonstrate our achievements.

The Drops

Each drop has its own specialities. Some have long-running patches, others need additional temp or perm space. Pay attention to collecting these requirements in advance; do not frustrate the DBA with a lot of failures during/after the installation of the optimization.

Our goal is to reach satisfaction and leave a good impression, since performance optimization is a recurring process. So let’s describe everything that is required (space, permissions, external activities like archiving) and expected (what goes faster, outage of a table for a while, etc.).

Last but not least, a very important thing for us: after each drop goes live, document the exact go-live time and script versions to be able to track back everything – including other developments’ possible cross-effects.

Special effects

The basis of SQL-level optimization is to optimize “isolated” components of the system. However, this cannot always be entirely true.
The Data Warehouse is a very complex set of dependent components and processes. If I modify something, it will influence other things in several ways:

  • Scheduling
    The ETL and reporting processes are typically organized by some dependency system. If we optimize one or more components, their runtime may change significantly – see our results below. The timing of the processes can thus shift considerably: some tasks will run in parallel that never met before, and vice versa. If the dependencies are not defined correctly, and only a long-running task ensured (in practice) the availability of a result data set for a consumer process, the optimization of the long task eliminates this delay: the implicit dependency will no longer hold.
  • DB Objects

    When we optimize Teradata queries, we use techniques which have effects on other processes, for example:

    • Table PI changes
    • Defining secondary/join indices
    • Collecting new / refreshing old statistics

    After delivering these kinds of modifications, we have to check those SQLs that use the affected objects: whether their performance stayed OK. PTA provides good help for this with the “Object usage search” function.

    I have an additional piece of advice regarding statistics changes. When we create new statistics, we have to take care of refreshing them regularly, especially if they cover a fast-changing, low-cardinality (typically partitioning) column. Otherwise they will turn into a time bomb later…

Checking results

After a drop goes live, it is highly recommended to check the DBQL logs for the next 1-2 days to see whether everything goes as planned.

Though we test everything very carefully, performance is not a 100% stable thing. The Teradata Optimizer decides on the execution plan at the time the query is executed.
And things can change: data distribution, random-AMP stats and other settings can change between testing and go-live. It is not a tragedy, but expect some percentage of iterations.

Evaluation

Measurement methodology

The measurement goes similarly to the “before” round: identify the log items and take a one-week average.

However, the identification is a bit different. While the “before” is always one SQL statement per item, the “after” may be several: sometimes the optimized version is broken down into more SQL statements, and the additional statistics and data preparation steps also have to be included. Therefore we have to collect manually which statements are the exact successors of the tuned query.


PTA helps again: search for the session of the first run of the optimized query, and all its SQLs will be there.
Collect the appropriate ReqNums (request number: unique ID within a session) of the successor SQLs, and register them in the documentation.

On the following days it is easier: use the PTA “Advanced search”. Fill in the queryband and the list of ReqNums, separated by commas (,). The result will contain all the items you need.
Switch to the “Summary view”, and you will get the resource totals broken down to daily level. This way a whole week’s data can be collected in one round.

Results of our project – facts in numbers

Let’s see what we could achieve in our project. The numerical results are summarized in this table:

Fig. 1 – Optimization result

The reasons for the performance problems, and the high-level solutions, were these:

ID – Reason – Solution (high level)
1 – SQL breakup – Store partial results (filtered table) in a volatile table w/ appropriate PI
2 – Missing stat – Collect stats on the join column set
3 – Hash collision – Patch to MULTISET, no PI change, dedup replaced by GROUP BY
4 – Missing stat – Collect stats on a filter column (low cardinality)
5 – Missing stat – Collect stats on a filter column (low cardinality)
6 – Hash collision – Patch to MULTISET, no PI change, no dedup needed
7 – SQL breakup – “Hidden product join”: store partial results with a good PI
8 – PI change – Target table PI change
9 – OLAP distribution – The “ORDER BY” clause extended with a high-cardinality postfix (distribution)
10 – SQL rewrite – “OR”-ed join condition replaced with “UNION”
11 – PI change – Target table PI change
12 – Hash collision – Patch to MULTISET, no PI change, no dedup needed
13 – PI change – Source table PI change
14 – SQL breakup – Store partial results in volatile tables w/ appropriate PI, collect stats on them
15 – SQL rewrite – Product join decomposition, as described here
16 – SQL breakup – Store partial results in volatile tables w/ appropriate PI, collect stats on them
17 – PI change – Target table PI change – extreme gain
18 – SQL breakup – Store partial results in volatile tables w/ appropriate PI, collect stats on them

Summary

  • Impact CPU
    Decreased from 350h 59min to 6h 46min – a saving of 344h 13min = 98% gain
  • Runtime
    Decreased from 7h 7min to 0h 49min – a saving of 6h 18min = 88% gain
  • System load
    The original daily average of total Impact CPU was 1112 hours.
    The gained 344 hours mean 344/1112 = 30.9% gain on the system total
 

Conclusion:

  • Optimized 18 scripts
  • Gained 98% Impact CPU on them
  • Reduced the system CPU load by 30.9%
  • The total project span was 4 weeks, using 12 man-days of human resources
 
 

Thanks for following us. TRY THIS AT HOME 🙂

The next post is about a Teradata compress optimization project:
16% gain on a system where multi value compression was already in use.

Stay tuned for the details!

Teradata performance optimization project
III. Optimizing a query

Optimizing a query with PTA

Concept of optimization

Our concept of optimizing a query is quite similar to the way we find the good candidate queries. Instead of deeply understanding the whole picture, we try to find the weak points and explore their reasons. For this we need the appropriate information prepared, and good highlighting that helps us focus on the point. We do not want to understand each query’s meaning and SQL text, unless it is unavoidable for the tuning. This saves a lot of time in many cases.

Similar to the “selection” phase, we will use the PRISE Tuning Assistant for the individual query tunings. Let’s see how.

The workflow

Save the ID(s) of the query(ies)

When tuning the selected queries, we will return to each of them several times: to validate an idea, check things, etc. PTA enables saving a selected list of IDs to CSV, which can be loaded into the search window any time, and the selected queries will appear on demand.

Check typical patterns

We have a standard way of examining a query: we test it against about half a dozen basic patterns – this part is discussed below. If one of them matches, we can skip deeply understanding the query. Otherwise PTA will help to understand what exactly is happening.
Sometimes it is worth checking the idea on different instances of the SQL (other days’ executions).

Document the idea and solution

It is very important to write down what we’ve found and what the proposed solution is. I typically do this iteratively: first I write down my first impressions for each target query, then I go back and refine them. In the end it is important to have an exact, detailed specification of what to do with the SQLs. I use a Word document with a standard format. Each query has a chapter with a standard structure:

Query ID: ID in DBQL to find the query
Job/Step name: the place where the SQL is issued
RequestNum: sequence number within the session, to identify the statement
Reason: the bad performance case
Desc: detailed description of the reason for the bad performance
Action: detailed description of what to do. This is the specification of the development.

I usually check the concept on a test environment before I finalize the “Action”. It is important to have an environment similar to production for validation; even so, the same result is not guaranteed.

Implement the modifications – in drops

When we know what to do with each of the SQLs, we can assign them to delivery drops, negotiated with operations and developers. I typically use these aspects:

  1. Big gains first
  2. Same job in the same drop
  3. Big operations (patches) spread evenly
  4. Number of queries spread evenly

Query analysis – Typical patterns

The first general rule when optimizing a query is:

THERE ARE NO ALMIGHTY RULES!

We only have high-probability rules, but they are good enough in most cases.

My init checklist on a query

  1. “Peakiness” check
    I check whether there is a “peak step” which runs far longer than the others, or whether the total runtime / resource consumption is divided among several dominant steps. For this I check the “Duration%” field of the steps. If the yellow-background step (the longest running) has 50%+, we have a peak-step query.

    Fig. 1 – Peakiness check
  2. Resource skewness check
    Let’s look at the query’s resource totals and the skewness. If it is >50-70%, it is most often a skewness problem, caused by an uneven distribution of data or tasks.

    Fig. 2 – Resource skewness check
  3. Object skewness check
    I browse through the objects taking part in the query, and check whether we have big ones with significant skewness (>10-20%).

    Fig. 3 – Object skewness check
  4. “Peak step” examination
    The goal is to learn what causes the long run:

    • Are the sources skewed?
    • Is the operation skewed?
    • Is the result a large amount of data?
    • Is the result placement skewed?
    • Is the big result “duplicated to all AMPs”?

    Fig. 4 – Peak step examination
  5. “Peak step” drilldown
    I start from the longest running step (typically the most resource hungry),
    catch an overview and browse through its predecessors,
    looking for bad Fidelity*, high skew or other suspicious things. All of these are vividly highlighted by PTA – I just have to catch sight of them.

    Fig. 5 – Peak step drilldown
  • Fidelity: the ratio of the estimated and the actual row count of a spool. Bad values indicate missing/stale statistics or too many join steps.

Checklist conclusions

When I have run through the checklist above (~1 minute for a very complex query, with some experience), I have a good impression of the case:

  • Step-peaked?
    If peaked, it is probably easy to get rid of the problem. If not, maybe we should deal with another query if possible, because we would have to deeply understand what is happening.
  • Skewed processing?
    If skewed, then either the data is skewed or an evil hash collision exists.
  • Skewed source?
    Skewed source table(s) often imply skewed execution; the tables’ PI may need revision.
  • What type of operation runs long?
    This is one of the most important questions.

Most often patterns

  • Hash collision at Insert / Create_table_as_select

    Symptoms
    Peak step: MRG; target table: SET type w/ NUPI; often high skew
    Solution
    MULTISET table and/or a different PI
  • Big table duplication

    Symptoms
    1. Peak step: RET, placing: Dupl, NO confidence, spool size: huge
    2. Peak step: RET, placing: Dupl, HI confidence, spool size: huge,

    and the spool is used later in a JOIN where the other source has bad Fidelity (>10..100)

    Solution

    Some statistics are missing or stale. Find the NO confidence or bad Fidelity branch, and check the “filter” condition in the explain (PTA has a condition visualization).
    In the “NO confidence” case, the source table lacks some statistics (on the filter condition columns). In the HI confidence case, check the other branch of the join step: there some statistics will be missing.

  • Redistribution resulting in skew

    Symptoms
    The step target is highly skewed (>70-80%)
    Solution
    It depends. Check the source tables, exclude peak values, break up the query. This typically needs deeper analysis.
  • Sliding-window join over many partitions

    Symptoms
    A merge join of “sliding-window” type, with more than a couple of partitions involved
    Solution
    Apply exactly the same partitioning definition on the joined tables

Case examples in the project

Hash collision insert

This is a very simple case:

  • The peak step is a MRG (merge the result into the target table)
  • The target table has a NUPI and is of SET type
  • The PI cardinality is low (9M distinct values compared to 169M rows)

Fig. 6 – PI cardinality check

This means that we have a serious number of hash collisions in the table, which slows down the insert extremely. We have two options:

  • change the PI, or
  • change the table to MULTISET

Both have their side effects; in this case we changed it to MULTISET.

Method: since the table is not re-created at every script run, I created a “patch” script to turn the table over to the new structure (a sketch follows the list):

  1. Create the new structure as an empty table
  2. Insert the data into it
  3. Copy the statistics
  4. Drop the old table
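
A hedged sketch of the patch (table, columns and PI are examples; copying statistics this way assumes a Teradata release that supports COLLECT STATISTICS … FROM, and the final rename is one possible way of taking over the old name):

create multiset table db.trx_new              -- 1. new structure: MULTISET, same PI
( customer_id integer
, trx_dt      date
, amount      decimal(18,2)
) primary index (customer_id);

insert into db.trx_new select * from db.trx;  -- 2. insert the data

collect statistics on db.trx_new from db.trx; -- 3. copy the statistics

drop table db.trx;                            -- 4. drop the old table, then take over its name
rename table db.trx_new to db.trx;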

Exclude peak value

Fig. 7 – High skew after redistribution

In this case the symptom is very simple: a RET step (retrieve records into a spool) creates extremely high skew (98%). The redistribution itself is OK: it has a good reason, it supports the next “merge join” step.

However, we have one peak value in the join column: ‘X X X X X’, which has no corresponding record in the joined table – I checked this manually.

Solution: I simply excluded this value in the “WHERE” condition. This eliminated the high skew, and the query ran fine, with the same result.
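
The fix itself is a one-line addition to the query (the column name is an example; ‘X X X X X’ is the peak value found above, known to have no match on the other side):

select ...
from   A
join   B on A.join_col = B.join_col
where  A.join_col <> 'X X X X X';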

Materialize a joined view

Fig. 8 – Fidelity goes worse

This SQL was very complex (12 tables involved), and all the tables had the appropriate statistics. PTA showed that the optimizer performed a complicated query rewrite, and the Fidelity got worse and worse as the data flowed from spool to spool:

There was a view which involved 5 tables, some of them big; however, the view resulted in only ~100 records.
I materialized the view and collected statistics on it. This resulted in a brand new execution plan, which was now accurate.
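
A sketch of the materialization (names are examples; collecting statistics on a volatile table assumes a release that supports it – otherwise use a regular temp table):

create volatile table small_dim as
( select * from db.complex_5table_view )
with data
primary index (dim_id)
on commit preserve rows;

collect statistics on small_dim column (dim_id);

The main query then joins to small_dim instead of the original view.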

Summary

  • Work from a saved list
  • Follow the workflow
  • Find patterns
  • Document the concept before implementing

The next post is about: “Testing, Delivery and Evaluation – enjoy the results”

Stay tuned!

Teradata performance optimization project
II. How to define the scope


Current goals

First of all, we have to make the business requirement of the project clear. This time we have to finish the load process of a specific data area in the Data Warehouse 1 hour earlier than before.


The requirement implies the following conclusions in this case:

  • All scripts that typically run after 2pm are out of scope, since our target data area currently finishes at around 2pm
  • Because the target system is CPU bound, we will focus on “Impact CPU” as the key resource by which we search for the top consumer queries
  • We also focus on long-running SQLs, principally those on the critical path of the targeted data area’s load (they may rank low in absolute consumption, but they are blockers)

Define the scope

Find good candidate SQLs – before agreeing on the scope

What SQLs are good candidates? Those which:

  • Consume much/skewed resources

    (CPU; I/O; runtime)

  • Are recurring queries

    Here is the trick!!! Recurring queries are the ones that run regularly. Finding query recurrence is not a trivial “grouping query” operation: date or other constant values often change in the SQL text over time. These textually differing queries are still the same from the aspect of regularity.

I use the PRISE Tuning Assistant (PTA) tool to find the candidates. The method is demonstrated in this short video:

What you can see in the video:

  • I choose one whole week as the basis of the examination
  • Search for the top 100 queries of every day of the week (Advanced search)
  • Find the repetitive top consumers – Recurring query view
  • Choose one instance of each interesting recurring SQL
  • Build an Excel list of them (Summary view -> Export function)

The results are ready in a few minutes, and give a good impression of the query pattern. They reflect the affected total resources (which is the theoretical limit of the gain) and show the “low hanging fruits” immediately.

Propose and agree on the project scope

Step 1 gave good insight into the possibilities. The next step is to define our project goals and scope, to limit the expenses and the time frame. This means that we have to refine our initial list. I use PTA and Excel for this. Take PTA’s exported query list (sorted by the chosen resource type in decreasing order), examine the items individually with PTA, try to find out which queries can be easily optimized, and choose the top 10-20 of them.

When choosing queries, mind these aspects:

  • Which queries are on the critical path of the targeted data area
  • Other development dependencies
    Do not touch an SQL that is being developed or is soon to be dropped
  • The budget and timeframe available
    Mind these when deciding the number of queries

As a good rule of thumb, choose no more than 10-20 SQLs for a project (depending on your time and cost limitations), and divide them into delivery drops (3-5 scripts in each).

This way the whole project will fit into 2-3 weeks, and the delivery (incl. testing and go-live) will spread evenly across the timeframe.

Please note that the differing go-live dates must be taken into account when measuring the results.

As a rule of thumb, I generally calculate with an average 70% resource reduction (100 resource units reduced to 30) when planning a project, to be on the safe side. My personal average saving rate is over 90%, and this project achieved an extreme 98% gain on the impact CPU of the affected queries.

Save a “before” report

The scope elements should be documented in an Excel sheet to track them easily (PTA’s Excel export is an excellent basis for it):

  • Identifier (script name, queryband ID, “RequestNum”, etc.), which also helps to find it
  • Drop ID (in which drop it goes live)
  • Average “before” resources

This “before” report will be the essential basis of the final “gain” report, which is very important to demonstrate the achievements and measure the ROI of the optimization project. Mine looks like this:


Fig. 1 – Optimization project “before” report

Explanation

  • Job, Script:
    Identifies the place from where the SQL was run. Good queryband settings make it accessible directly from the DBQL.
  • ReqNo
    SQL scripts typically have a linear structure (no branches). In this case the request number (the statement’s running order within a session) uniquely identifies the given SQL within a script across the days in the DBQL logs.
  • Impact CPU
    The average amount of Impact CPU consumed by the query during the monitored 1-week period
  • Runtime
    The average runtime of the query during the monitored 1-week period. Along with the Impact CPU, this will be the reference for measuring the achievements of the optimization
  • QueryID
    The ID of one query instance within the monitored interval. It enables finding the appropriate log record within the DBQL easily
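
A hedged sketch of collecting the one-week “before” averages from DBQL (the queryband key ‘JobName’ and the date range are examples; AMPCPUTime – total CPU seconds – is used here as an approximation of the Impact CPU figure):

select GetQueryBandValue(QueryBand, 0, 'JobName') as JobName
     , count(*)          as Runs
     , avg(AMPCPUTime)   as AvgCPUsec
     , avg(TotalIOCount) as AvgIO
from   dbc.DBQLogTbl
where  cast(StartTime as date) between date '2014-03-01' and date '2014-03-07'
group by 1
order by 3 desc;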

Summary

  • Identify the business goal
  • Collect good candidate SQLs (10-20 pieces)
    • Recurring queries
    • High consumers
  • Agree the scope
  • Measure the past and document

The next post is about: Optimizing Teradata query using the PRISE Tuning Assistant

Stay tuned!

5 tips for Teradata Fastload


Teradata Fastload – use effectively

Teradata Fastload is a very fast and efficient tool for loading huge amounts of data into a Teradata database. It has good documentation to browse as a reference; however, there are a few small things that are good to know in order to use it more comfortably – even in challenging situations.

Tip1: How to understand “Error1” table

When a record fails to load because of any Teradata error (conversion, null problem, etc.), the record gets into the “Error1” table, which is set in the “BEGIN LOADING” section of the control script.
The structure of the “Error1” table is fixed:

  1. ErrorFieldName:
    Which column was problematic
  2. ErrorCode:
    The Teradata error code; look it up in DBC.ErrorMsgs if the documentation is not at hand
  3. DataParcel
    This field contains the record transformed into hexadecimal string format

Unfortunately, DataParcel’s hexadecimal codes are unreadable for a human in Queryman or other GUIs.
With this short BTEQ script you can export the original record layout into a file:

.SET RECORDMODE ON
.EXPORT DATA FILE=badrecord.out
SELECT DataParcel FROM <Errortable1>;
.EXPORT RESET

Tip2: How to switch off deduplication?

There is no way to switch it off. Teradata Fastload always does a full-record-layout deduplication, no matter whether we use a non-unique primary index (NUPI) and a MULTISET table definition – Teradata will deduplicate eagerly.

If you must load every record, use a preprocessor script (PERL, AWK, etc.) which adds a running-number column to your data. This way there will surely be no identical records at all.

Tip3: Deduplication vs. “Error2” table

Why do we have an “Error2” table if Teradata Fastload always does a deduplication? This error table contains those records which passed the full-record-layout deduplication but failed the unique primary index (UPI) duplication check. So the data contained another record with the same PI value that differed in one or more other columns. To load those records as well, use a NUPI or NO PI table.

Tip4: What is the good “Sessions” number?

SESSIONS is a parameter specified in the “BEGIN LOADING” section. It can fall between 1 and the number of AMPs. The bigger the number, the better the loading performance, but the longer the logon time. So for a small file use 1; for a large number of rows use the maximum allowed. For medium files I generally use 10.

If a lot of small files are to be loaded into different tables, consider using another technique, since Teradata Fastload has no advantage in that league.

Tip5: Table is “Being loaded”…

When a Teradata Fastload session is stopped because the error limit is reached, the target and the error tables cannot be accessed due to locking:
Error 2652: Operation not allowed: <db>.<table> is being Loaded.

Run this Fastload script to release the locks and get access to the tables:

BEGIN LOADING <db>.<table>;

END LOADING;

After this script has finished, the error tables are readable, so you can identify the root causes of the load problem – see the “Error1” table tip above.

Teradata performance optimization project
I. Basics and prerequisites

Teradata performance optimization

Introduction

The Teradata database is a pretty expensive box which serves a lot of people in the company – directly or indirectly. The efficient utilization of its resources is essential, from both the financial and the user satisfaction aspects.


This series of posts follows a real-life Teradata optimization project’s methodology, steps and results, with this achievement:

System level gain: 31.6%

Levels of performance optimization

We can typically apply 3 levels of Teradata performance optimization:

Application

Meaning: Analyze the user requests and system activity, then refine them, eliminating unnecessary activities.
This level checks whether the executed processes are really needed and whether they do what the user really needs. E.g.: eliminate unnecessary data loads or reports; limit a report to a shorter period.

  • Big gains are possible without any trade-off
  • Requires comprehensive analysis (incl. finding and interviewing the appropriate persons)

SQL

Meaning: Pick individual SQLs and optimize their execution.
This approach finds the top resource consumer SQL statements and reduces their resource consumption while producing the same results.
E.g.: change the PI; add/refresh statistics; rewrite the SQL

  • Easy analysis
  • Low risk (small effect, good testing possibilities)
  • Needs SQL level analysis

Infrastructure

Meaning: Tune low-level system (RDBMS, OS, hardware) parameters.
Tuning on the infrastructure level is a global intervention, regardless of users/business areas or importance.

  • Can work miracles in some special cases
  • High risk (affects the whole system)
  • Complex cross-dependencies

I prefer and recommend starting with the SQL level optimization, because it delivers high value along with low risk and good project scalability. Later on, the other levels can come into focus.

Goal of SQL optimization

The goals of an optimization project can differ. Some examples:
“Finish the load procedure by 6am”
“Present my report 2 hours earlier”
“Reduce the CPU load by 20%”
“Let the hardware extension be postponed by 6 months”

The goal of our project was to speed up the “load batch” process, so that a specific, identified data area’s load finishes and becomes available for further usage 1 hour earlier than before.

Prerequisites

The key input for an SQL optimization is information about what exactly happens:

What, when, how:

Which SQL statements ran, when, what steps were executed, how much resource did they consume and with what skewness, etc.
All of these can be found in the DBQL tables of Teradata (dbc.dbql*).
Switching them on will not cause significant extra load, but it is required to archive/purge them regularly (see the PDCR tool in the Teradata package).
Teradata Administrator enables switching logging on (Tools/Query logging…), and one can check the actual settings:

select * from dbc.dbqlruletbl;
The Explain/Obj/Sql/Step flags should be set to ‘T’; the others are not necessary. Global logging (1 record with UserID: 00000000) is recommended.
If continuous logging is not possible, logging should be switched on at least 1 week before starting the optimization project.
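
If query logging is not yet enabled, a minimal sketch of switching it on globally (check your site standards and Teradata release first):

begin query logging with sql, explain, objects, stepinfo limit sqltext=0 on all;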

What / Who runs the statements?

Teradata provides a great feature for this: Query Banding.
All executed SQL statements can be flagged with several pieces of information regarding the environment, job, script, version etc., which helps to identify a specific one.
The queryband data appears in the DBQL logs – if the running environment is configured to flag the queries being run.
Querybanding is not essential for an optimization, but it helps a lot to identify the exact point in the data warehouse ecosystem where the modifications should be done (which job/job step/script).
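
A typical flagging pattern issued by the loading scripts looks like this (the key names and values are examples – align them with your site standard):

set query_band = 'JobName=DWH_DAILY_LOAD;ScriptName=load_customer.btq;Version=1.3;' for session;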

What are the environmental conditions?

E.g. data volumes, statistics, etc.
This information is available in the data dictionary (DBC tables and views), but only as a current snapshot – not for the time the statement ran.
In most cases this is fine for us, but this property needs to be kept in mind.

When all these are in place, we are ready to run a successful performance optimization project.

The next post is about how to identify the good candidate SQL statements for optimization.

Stay tuned!

How to speed up Teradata Fastexport

About Teradata Fastexport

Teradata Fastexport is a nice, powerful tool for exporting mass volumes of data to file. Its default execution flow is the following:

  1. Apply locks on the affected tables
  2. Execute the SELECT statement
  3. Place the result in a SPOOL
  4. Release the locks
  5. Export the data from the SPOOL

This has an advantage: the locks are released immediately after the result data has been copied into the SPOOL. Therefore there is no need to wait until the data is written to the export file outside the database – which is typically a single-process operation and can last a long time.

The price to be paid for this advantage: all the exported data must first be copied to the SPOOL. When the SELECT is a simple operation (no join, sort or other extras), this copy phase could be spared, and it can account for up to 50% of the total resource and time consumption.

The NOSPOOL trick

For these simple cases one can use the NOSPOOL/NOSPOOLONLY mode.
While NOSPOOL is a soft switch, which is ignored if not applicable, NOSPOOLONLY makes the statement fail if the nospool option is not available for the given query.

In NOSPOOL mode the database does not make a copy of the result data. Instead, it passes the data directly to the client, which has another benefit: the execution of the SELECT runs dominantly in parallel with the file write operation, instead of sequentially as in the SPOOL (default) case.

Usage:
.BEGIN EXPORT
SESSIONS …
[SPOOLMODE SPOOL|NOSPOOL|NOSPOOLONLY];
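
A complete minimal FastExport script sketch with NOSPOOL (logon credentials, table, column and file names are placeholders):

.LOGTABLE utl_db.fexp_log;
.LOGON tdpid/username,password;
.BEGIN EXPORT SESSIONS 8
       SPOOLMODE NOSPOOL;
.EXPORT OUTFILE big_table.dat;
SELECT customer_id, trx_dt, amount
FROM   db.big_table
WHERE  trx_dt >= DATE '2014-01-01';
.END EXPORT;
.LOGOFF;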

The tradeoff

The NOSPOOL mode only supports “simple” SELECT statements. The unsupported operations are:

  • SORT (ORDER BY, HAVING)
  • JOIN
  • Aggregations (GROUP BY, SUM())
  • OLAP functions
  • Non-table data (date, session, user)
  • Multiple statements

But allows:

  • Sampling
  • Column selection
  • Scalar functions (like col1 + col2, substr())
  • “Where” clause (incl. partition elimination)

The disadvantages of NOSPOOL mode:

  • No ActivityCount is available
  • Locks are maintained until the end of the export session
  • Data conversion errors can happen during the exporting phase
  • (Only available from V13.10)

What is it really good for?
It can almost double the throughput in the case of:

  • DataMart exports (to another database)
  • Archive dumps
  • Copying data to a test environment
  • Migration to other database systems

Measures

We executed a Teradata-to-Netezza migration, where we had to transport ~12 TBytes of data through a file interface. After we activated the NOSPOOL option, the average export speed grew by 84%.