Avoiding Product Joins

How to eliminate Product Joins

What is a product join?

A product join is one of the implementation methods of an SQL JOIN operation.
Do not confuse it with the cross join (Cartesian product), which is one of the SQL join types.

SQL join types, e.g.: inner join, left outer join, full outer join, cross (Cartesian) join
Join implementation types, e.g.: nested join, merge join, hash join, product join

A product join (of tables A and B) is the simplest method of join implementation:

  • Produce all <A;B> record combinations, that is, take each record of A and match it with each record of B one by one.
  • Test the join condition on each produced <A;B> record pair, and eliminate those combinations where the condition fails.
The two steps are usually combined: the "testing phase" is executed right after a record combination is generated, and invalid combinations are dropped immediately. This saves a lot of temp space.

Why don’t we like product joins?

Well, it has a really bad reputation: it is slow, it burns a lot of CPU, etc.
Yes, it usually is, and it does. It is the brute force method of executing a join, with a cost on the order of N*M (where N and M are the record counts of the joined tables).

Nevertheless, there are situations when it is the best choice, or the only feasible way.

When is it good or necessary?

Please note that the product join is the method that is always applicable, independently of the circumstances.

Good

The product join is typically a simple, dumb and slow algorithm, which is why we do not like it, but it has a very important advantage: it requires no pre-processing.* This is why we LIKE IT :)
If we have to join a really large table to a very small one (a couple of records), the product join is by far the most effective method, since sorting a very large table (order of N*logN) can cost a lot, while joining to 1-2 records is really not a big deal.
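As a hedged illustration (the table and column names below are invented for this sketch): a huge transaction table joined to a two-row parameter table. The optimizer can simply duplicate the tiny table to all AMPs and product join it, without sorting or redistributing the big one.

select t.trx_id, p.param_value
from   big_transaction t          -- very large table (N rows)
join   tiny_param      p          -- a couple of rows only
  on   t.trx_dt between p.valid_from and p.valid_to;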

Necessary

There are join situations when the only way to go is the product join. Why? Because of the join condition. The "clever" joins (merge, hash) require some information and/or condition that enables cheating the full A x B comparison: reducing it to the pairs that are really necessary, and doing so in a more effective manner.

* OK, in Teradata this only requires that the matchable records from both tables reside on the same AMP. This implies that the "small" table is duplicated to all AMPs.

Merge join example


select ...
from   A
join   B on A.customer_id = B.customer_id
       and A.trx_dt between B.eff_dt and B.exp_dt

  • The customer_id clause is in an AND condition with the others
  • customer_id is selective enough that hash(customer_id) can reduce the comparisons reasonably
  • Note that A and B must be sorted (re-sorted) by the hash of customer_id

Product join example


select ...
from   A
join   B on substr(A.telephone_no,1,B.prefix_length) = B.telephone_no_prefix

  • There is no partial condition that could reduce the comparisons
  • Note that neither of the tables is required to be sorted in a specific order.

Unavoidable product joins

  • Non-equality condition
  • Function used (e.g. substr())
  • Dependent expression is used (e.g. A.x+B.y = A.z)
  • Cross join: intentional Cartesian product

Avoidable product joins

Data type mismatch

The merge join example above works only if customer_id in tables A and B has the same "style" of data type, since their hash values will match only in that case. Say hash(13674) <> hash('13674'); however, integer is compatible with decimal, and char is compatible with varchar.
Pay attention to data type consistency during physical data modeling.
  • Use domains to eliminate the possibility of mismatch
  • Align to the used data types when defining temp tables, or use "create table as …" statements
  • If you cannot avoid the mismatch, relocate the necessary data to temp tables with proper data types during processing (see the sketch below).
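A minimal sketch of the last option (the table and column names follow the merge join example above; the rest is assumed): relocate the mismatched column into a volatile table with the proper data type, so the hash values match again and a merge/hash join becomes possible.

-- Assume A.customer_id is INTEGER, while B.customer_id was created as VARCHAR
create volatile table B_fixed as
( select cast(customer_id as integer) as customer_id
       , eff_dt
       , exp_dt
  from  B
) with data
primary index (customer_id)
on commit preserve rows;

collect statistics on B_fixed column (customer_id);

-- Join A to B_fixed instead of B: the data types now match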

OR condition

Let’s assume the following join condition:
select ...
from   A
join   B on A.col1 = B.Col1
        or  A.Col2 = B.Col2

This is equivalent to the following, without the compulsory product join:

select ...
from   A
join   B on A.col1 = B.Col1

UNION

select ...
from   A
join   B on A.Col2 = B.Col2

Missing/stale statistics

As I mentioned before, the product join is the most effective join between a very large and a really small (couple of records) table. If the optimizer thinks that a table is pretty small, but in fact it is not, it may choose a product join in all good faith, misled by stale or missing statistics.
Define and keep fresh those statistics by which the optimizer can determine the size of the joinable record sets properly.
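Sticking to the merge join example above, the missing statistics could be supplied, for instance, like this:

collect statistics on A column (customer_id);
collect statistics on B column (customer_id);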

How to find avoidable product joins

It is not trivial to list the avoidable product joins. Practically all product joins have to be examined one by one and judged to be avoidable or not, and if avoidable, what to do about them.

I strongly recommend using PRISE Tuning Assistant both for finding the product joins and for analyzing the possibility and necessity of elimination:

  • List top consuming queries with product join(s)
  • Check the PROD JOIN steps: which tables are processed that way
  • Check those join conditions for the cases described above

What to do if it cannot be avoided?

In this case I recommend trying the decomposition described here (see the "Accelerate PRODUCT JOIN by decomposition" section below).
It can help reduce the number of comparisons, saving CPU and runtime.

 
Have a successful optimization!
 

How to optimize a Teradata query?

Teradata SQL optimization techniques

Introduction

The typical goal of SQL optimization is to get the result (data set) with fewer computing resources consumed and/or with a shorter response time. We can follow several methodologies depending on our experience and studies, but in the end we have to get the answers to the following questions:

  • Is the task really heavy, or is just the execution of the query non-optimal?
  • What is/are the weak point(s) of the query execution?
  • What can I do to make the execution optimal?

Methodologies

The common part of all methodologies is that we have to understand – more or less – what is happening during the execution. The more we understand the things behind the scenes, the better we can find the appropriate point of intervention. One can start with the trivial stuff: collect some statistics, build indices, and continue with query rewrites, or even with modifying the base table structures.

What is our goal?

First of all we should decide what we have to do:
  1. Optimize a specific query that has been run before and we have the execution detail info
    Step details clearly show where the big resources were burnt
  2. In general, optimize the non-optimal queries: find them, solve them
    Like case 1, but first find those queries, and then solve them one by one
  3. Optimize a query that has no detailed execution info, just the SQL (and "explain")
    Deeper knowledge of the base data and of the "Teradata way of thinking" is required, since no easy and trustworthy resource peak detection is available. You have to imagine what will happen, and what could be done better

Optimization in practice

This section describes case 2, and expects detailed DBQL data to be available.
In this post I will not attach example SQLs, because I have also switched to using PRISE Tuning Assistant for getting all the required information for performance tuning, instead of writing complex SQL queries and making heaps of paper notes.

Prerequisites

My opinion is that DBQL (DataBase Query Logging) is the fundamental basis of Teradata system performance management – from the SQL optimization point of view. I strongly recommend switching DBQL comprehensively ON (SQL, Step, Explain and Object logging are important; XML can be excluded, since it is huge and adds little extra), and using daily archiving from the online tables – just follow the Teradata recommendation.
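As a sketch only (verify the exact option names against your Teradata release), comprehensive DBQL logging for all users could be switched on roughly like this:

begin query logging with sql, explain, objects, stepinfo
limit sqltext=0                 -- full SQL text goes to DBQLSqlTbl, not into the main table
on all;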

Finding good candidate queries

DBQL is an excellent source for selecting the "low hanging fruits" of performance tuning. The basic rule: we can gain big savings only on expensive items, so let's focus on the top resource consuming queries first. But what is high resource consumption? I usually check the top queries by one or more of these properties:

  • Absolute CPU (CPU totals used by AMPs)
  • Impact CPU (CPU usage corrected by skewness)
  • Absolute I/O (I/O totals used by AMPs)
  • Impact I/O   (Disk I/O usage corrected by skewness)
  • Spool usage
  • Run duration
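Without PTA, a minimal DBQL query along these lines can serve as a starting point (a sketch; the column names are from dbc.DBQLogTbl, hashamp()+1 returns the number of AMPs, and the impact formula is an approximation):

select top 50
       username
     , queryid
     , ampcputime                                     -- absolute CPU
     , maxampcputime * (hashamp()+1) as impact_cpu    -- CPU corrected by skewness (approx.)
     , totaliocount                                   -- absolute I/O
     , spoolusage
     , (firstresptime - starttime) day(4) to second as duration
from   dbc.dbqlogtbl
where  cast(starttime as date) = current_date - 1
order  by ampcputime desc;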
PRISE Tuning Assistant supplies an easy-to-use and quick search function for this.

Finding the weak point of a query

Examining a query begins with the following steps:
  • Does it have few or many "peak steps" that consume most of the resources?
    • Which one(s)?
    • What type of operations are they?
  • Does it have high skewness?
    Bad parallel efficiency, very harmful
  • Does it consume an extremely large spool?
    Compared to other queries…
PRISE Tuning Assistant again.
Check the yellow highlights in the middle, those are the top consuming steps:

    Most of the queries will have one "peak step" that consumes most of the total resources. Typical cases:

    • “Retrieve step” with redistribution
      Large number of rows and/or skewed target spool
    • “Retrieve step” with “duplication-to-all-AMPs”
      Large number of rows duplicated to all AMPs
    • Product join
      Huge number of comparisons: N * M
    • Merge or Hash join
      Skewed base or prepared (spool) data
    • OLAP function
      Large data set or skewed operation
    • Merge step
      Skewness and/or many hash collisions
    • Any kind of step
      Non small, but strongly skewed result
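    Without a GUI, the step level figures of one query can be listed with a sketch like this (column names as in dbc.DBQLStepTbl; verify them on your release):

    select steplev1num
         , steplev2num
         , stepname                    -- e.g. RET, JIN, MRG, SUM ...
         , cputime
         , iocount
         , rowcount                    -- actual rows produced by the step
         , estrowcount                 -- optimizer estimate: "fidelity" = rowcount vs. estrowcount
    from   dbc.dbqlsteptbl
    where  queryid = 123456789012345678    -- QueryID of the query picked above
    order  by cputime desc;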

    What can we do?

    The Teradata optimizer tries its best when producing the execution plan for a query, however it sometimes lacks proper information or its algorithms are not perfect. We – as humans – may have additional knowledge of either the data or the execution, and we can nudge the optimizer towards better decisions. Let's see our possibilities.
    • Supplement missing / refresh stale statistics
    • Drop disturbing statistics (sometimes occurs…)
    • Restructure the query
    • Break up the query: place the partial result into a volatile table with a good PI and collect statistics on it
    • Correct primary index of target / source tables
    • Build secondary/join index/indices
    • Add extra components to the query.
      You may know some additional "easy" filter that lightens the work. E.g. if you know that the join will match only the last 3 days' data of a year-covering table, you can add a date filter, which costs pennies compared to the join (see the sketch after this list).
    • Restrict the result to the real information demand.
      Does the end-user really need that huge amount of data, or just an extract of it?
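    A hedged sketch of the "extra easy filter" idea above (the table names and the 3-day rule are invented for the example):

    select ...
    from   trx_year t                      -- covers a whole year
    join   dim      d on t.dim_id = d.dim_id
    where  t.trx_dt >= current_date - 3;   -- we know only the last 3 days can match:
                                           -- costs pennies compared to the full join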

    What should we do?

    First of all, we have to find the root cause(s). Why does that specific top step consume such a huge amount of resources, or execute so skewed? If we find the cause and eliminate it, the problem is usually solved.
    My method is the following:
    1. Find the top consuming step, and determine why it is a high consumer
      • Its result is huge
      • Its result is skewed
      • Its work is huge
      • Its input(s) is/are huge
    2. Track the spool flow backwards from the top step, and find
      • Low fidelity results (row count falls far from estimated row count)
      • NO CONFIDENCE steps, specifically w/low fidelity
      • Skewed spool, specifically non small ones
      • Big duplications, specifically w/NO CONFIDENCE
    3. Find the solution
      • Supplement missing statistics, typically on the PI, join fields or filter conditions
        NO CONFIDENCE, low fidelity, big duplications
      • Break up the query
        Store the partial result into a volatile table where fidelity is very bad or the spool is skewed, and choose a better PI for it (see the sketch after this list)
      • Modify the PI of the target table
        Slow MERGE step: typical hash-collision problem
      • Eliminate product joins
      • Decompose large product joins
      • Etc.
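    The "Break up the query" option above could look like this sketch (names are invented; the point is the explicitly chosen PRIMARY INDEX and the statistics collected on the intermediate result):

    create volatile table part_result as
    ( select a.customer_id
           , a.trx_dt
           , sum(a.amount) as amount
      from  A a
      group by 1,2
    ) with data
    primary index (customer_id)            -- chosen to support the subsequent join
    on commit preserve rows;

    collect statistics on part_result column (customer_id);

    select ...
    from   part_result p
    join   B b on p.customer_id = b.customer_id;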
    Have a good optimization! 🙂

    PRISE Tuning Assistant use cases

    What is PRISE Tuning Assistant (PTA) good for?

    PRISE developed this tool to help DBAs and developers solve performance issues quickly and easily. In this post I describe a couple of typical situations in which it can assist you.
    A fully functioning trial version of PRISE Tuning Assistant is available for download.

    In general, the PTA application consists of two main functions:

    • Query finder
      Enables applying lots of search criteria, like: thresholds, skewness, queryband, SQL text, Explain text, used objects, time window, etc.
    • Query analyzer
      Unfolds the execution process of one selected query and helps in understanding the query and its performance bottlenecks. It collects all required supplementary information, like table sizes, DDLs, statistics, etc., into one comprehensive dashboard, and provides a "walking tour" feature with which you can roam the execution plan.
    All this enables even an averagely skilled Teradata user to find which queries should be optimized and to figure out the weak points of even a complex query in minutes.

    PTA’s key architectural properties:

    • Supports active DBQL (dbc.dbqlogtbl and friends…) and also archived DBQL tables.
    • Optimized DBQL querying (fast response times and low resource usage)
    • Requires read-only access to the database
    • Portable, multi platform Java application, requires no installation

    Let’s see typical use cases of the application.

    Find the TOP consuming queries

    The optimization's "low hanging fruits" are the top consuming queries, that is, the SQL statements that burn the most CPU or I/O. They are often few in number, yet consume a significant percentage of the total system resources, as I mentioned in this post.

    How to find them?
    Simply set the time interval you are interested in and select "CPU", "I/O" or "Duration" (for the long running ones) as the TOP criteria, then click the "Get DBQL data" button. You will get the list of the qualifying queries in descending order of your TOP criteria. Afterwards you can sort it by the other KPIs, like:

    • Skewness
    • Spool usage
    • LHR/RHL (Larry Higa ratio)
    • QueryBand (and specific variables of it)
    • Etc.

    Skewed CPU or I/O usage

    Teradata is a massively parallel system. It can utilize its full power only if the work is spread evenly across the AMPs; otherwise (skewed queries) the response time increases dramatically while the reported resource consumption can remain low.

    How to find the bad guys?
    Start the same way as in the "TOP queries" case, but choose "CPU Impact" or "I/O Impact" as the TOP criteria. This will return the queries that have the highest "Impact" (skew-normalized resource consumption) on the system.

    “No more spool space”

    Spool limiting in Teradata in fact serves to cancel queries with bad execution plans. Therefore "Failure 2646: No more spool space" usually indicates a bad query execution plan, and the solution is usually not raising the spool limit. Still, those spool limits are often raised until the query is able to finish, and the bad queries throttle the system… Why does a query usually run out of spool?

    • Duplicates a large spool/table to all AMPs
      The optimizer thinks it is small
    • Redistributes a spool/table by a skewed value
      The optimizer thinks it is non-skewed
    • Processes (copies to spool) an unreasonably large number of records

    The first two cases are usually caused by missing or stale statistics, which can easily be found and supplied.

    How to select?

    Start the same way as in the "TOP queries" case, but choose "Spool" as the TOP criteria. This will return the queries with the highest spool consumption, and the SQL dashboard will help you figure out which statistics should be collected. In addition you can filter for "Error 2646" with the Error code option, which will return the queries that have reached the spool limit (see the sketch below).
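    A minimal sketch of the error code based search directly against DBQL (ErrorCode is a dbc.DBQLogTbl column):

    select queryid
         , username
         , starttime
         , spoolusage
    from   dbc.dbqlogtbl
    where  errorcode = 2646                -- "No more spool space"
    order  by starttime desc;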

    Find a specific query

    The DBA gets a question: "My query runs slowly, please help me!" – occasionally with an SQL attached in the email. PTA helps figure out which query it was in the DBQL. You can filter on:

    • Username
    • Session ID
    • Full query text (with LIKE)
    • QueryID (internal DBQL ID)
    • QueryBand (with LIKE)
    • Error code

    Search disliked plan elements

    PTA enables text search in the Explain text of the queries. E.g. quickly find:

    • Product joins
    • Sliding-window joins
    • “Redistributed by”
    If this is combined with a CPU and/or I/O limitation (CPU >= / I/O >= criteria), then we can get those queries where these plan elements really cause a problem.

    Object usage

    We have a table or an index in the database, and we look for all the queries that utilize that object.
    This is very useful for:

    • Preparing for a PI change
      A PI change is usually triggered by skewed data distribution, but the more important factor is the access path it provides. Before you change a PI, you need to examine the workload accessing that table. What are the typical filter/join criteria? Does the current/new PI support a good access path? PTA enables you to do this quickly and easily.
    • Determining the usage of a table
      How many times a day is a table accessed? Is it used extensively, or just kept loaded for nothing?
    • Determining the usage of a hash/join index
      A hash/join index requires processing and storage resources. Is it well utilized, or not?
    How?
    Simply set the "Object used" filter criteria. In addition you can use the "Thru join/hash index" option to include the join/hash index based data accesses.
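    Behind the scenes this is essentially a lookup in the object level DBQL table; a rough sketch (column names as I recall dbc.DBQLObjTbl, and 'MY_TABLE' is a placeholder):

    select objectdatabasename
         , objecttablename
         , count(distinct queryid) as query_cnt    -- how many queries touched the object
    from   dbc.dbqlobjtbl
    where  objecttablename = 'MY_TABLE'
    group  by 1,2
    order  by query_cnt desc;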

    Recurring queries

    Which query should be optimized? There may be medium consuming queries that run many times a day, consuming a lot of resources in total. Use the "Recurring query finder" function of PTA to see aggregated information: total and average Duration, CPU and I/O.

    What was a query run by?

    If your ETL and OLAP systems use QueryBand properly, you can set specific variables (max. 6) to be parsed into separate columns in the resulting query list. Sort the result by them, and find which queries of a load job should be optimized.

    Accelerate PRODUCT JOIN by decomposition

    How to optimize slow product joins

    Case description

    There are SQL queries that cannot be executed any other way but with the product join method, because of the content of the join condition. E.g.:

    • OR predicate
      Example:
      ON (a.x = b.x OR a.y = b.y)
    • BETWEEN / LIKE operator
      Examples:

          ON (a.x LIKE b.y)
          ON (a.x LIKE b.y || '%')
          ON (a.x between b.y and b.z)
    • Comparison (=) of different datatype fields
      Example (a.x is INTEGER, b.x is VARCHAR)
      ON (a.x = b.x)
    • Arithmetic expression usage
      Example
      ON (a.x = b.x + 1)
    • SQL function (eg. substr() ) or UDF usage
      Example
      ON (substr(a.x,1,characters(b.x)) = b.x)

    Product join is a technique where the execution matches each record combination of the two joined tables and evaluates the join condition on each of them. Product joins usually cause huge CPU consumption and long response times.

    How to identify

    The query typically runs for a long time, contains a "PRODUCT JOIN" step in the Explain description, and that step consumes high AMPCPUTime and lasts long. Those queries usually have an LHR (Larry Higa Ratio, the rate of CPU to I/O usage) much greater than 1, typically in the 10s, 100s or more.

    In the DBQL you should find high CPU usage queries (dbc.DBQLogTbl.AMPCPUTime > 1000, depending on system size) which also have the "product join" expression in the execution plan text (dbc.DBQLExplainTbl.ExplainText like '%product join%').
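    Put into a query, the above search could look like this sketch (the CPU threshold is system dependent):

    select l.queryid
         , l.username
         , l.ampcputime
         , l.starttime
    from   dbc.dbqlogtbl      l
    join   dbc.dbqlexplaintbl e
      on   l.procid  = e.procid
     and   l.queryid = e.queryid
    where  l.ampcputime > 1000                      -- depends on system size
     and   e.explaintext like '%product join%'
    order  by l.ampcputime desc;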

    PRISE Tuning Assistant supplies easy-to-use GUI based search function.

    Explanation of product join execution

    Let's assume that we join two tables: Table1 (N records, the bigger table) and Table2 (M records, the smaller table). Join processing requires that the matchable record pairs reside on the same AMP. Since the product join compares each Table1 record to each Table2 record, all records of one of the tables must reside on all AMPs, therefore the PRODUCT JOIN is preceded by a "Duplicated to all AMPs" step of the smaller table.
    Each record pair is evaluated: if the JOIN condition is satisfied, the pair gets into the result spool, otherwise it is discarded.
    The number of required comparisons is N x M, and the cost (approximately the required CPU time) of one comparison depends on the complexity of the join expression.

    Solution

    In most cases the JOIN condition of the product join is satisfied by only a small fraction of all possible combinations. In practice we can often identify the following situation:
    a significant subset of the bigger table's records can only match a small subset of the smaller table's records.
    Telco example: number analysis. Most of the call records are directed to national number areas (>90%), but the number area description table contains dominantly international number regions (>80..95%). We can declare that national calls will never match international areas. In addition, it is very simple to identify whether a "Number" or a "Number area" is national or international.
    The base query looks like this:

    select Table1.Number,Table2.Area_code
    from Table1
    join Table2 ON Table1.Number BETWEEN Table2.Area_Start_number and Table2.Area_end_number;

    Let’s decompose the query into two parts:

    select Table1.Number,Table2.Area_code
    from Table1
    join Table2 ON Table1.Number BETWEEN Table2.Area_Start_number and Table2.Area_end_number
    where substr(Table1.Number,1,1) = '1'   -- USA area code
    and substr(Table2.Area_Start_number,1,1) = '1' -- USA area code
    and substr(Table2.Area_end_number,1,1)   = '1' -- USA area code
    UNION ALL
    select Table1.Number,Table2.Area_code
    from Table1
    join Table2 ON Table1.Number BETWEEN Table2.Area_Start_number and Table2.Area_end_number
    where
    NOT (substr(Table1.Number,1,1) = '1')   -- non-USA area code
    and NOT
    (
    substr(Table2.Area_Start_number,1,1) = '1'     -- exclude areas lying
    and substr(Table2.Area_end_number,1,1)   = '1' -- entirely within the USA prefix
    );



    This way we added some low cost operations (full scans on the tables to identify national/international numbers) and the cost of UNION ALL-ing the results, but we eliminated a lot of trivially non-matching comparisons.

    The following figures show the processing cost; the red area represents the number of comparisons, i.e. the cost:

    Figure 1: Original case
    Figure 2: Decomposed case

    Let's do some maths, with assumed proportions:
    N1: international calls
    N2: national calls
    M1: international area descriptions
    M2: national area descriptions
    90% of the calls (N) are national (N2)
    90% of the area descriptions (M) are international (M1)
    Originally we have to do N x M comparisons.
    The decomposed query compares national calls only with national areas, and international calls only with the non-national areas, that is
    ((0.9 x N) x (0.1 x M)) + ((0.1 x N) x (0.9 x M)) = 0.09 x N x M + 0.09 x N x M = 0.18 x N x M

    The optimized query will do only 18% of the original comparisons, with the tradeoff of two full scans (I/O intensive) of the base tables and one UNION ALL-ing of the results (low cost).

    In this case we get roughly 4 times faster, CPU-saving execution.

    Sometimes it can be worthwhile to decompose into 3 or more processing phases, depending on the data.

    It is important that if there are further joins or transformations on the result data, they should be done on the UNION ALL-ed result, and not be duplicated in the decomposed phases, for code management reasons.

    Why can't the Teradata PE do the same?
    The decomposition requires knowledge of the data, and will vary from query to query, which is currently beyond the scope and intelligence of an automatic optimizer.

    Summary

    Eliminate trivially invalid record pairs from the PRODUCT JOIN by breaking the query into more parts.

    What’s next

    The next post will discuss slow OUTER (LEFT/RIGHT) JOINs.