How to eliminate Product Joins
What is product join?
Product join is one of the implementation methods of an SQL JOIN operation.
Do not mix up with cross join (Cartesian product), which is one type of SQL joins.
SQL join types, eg.: inner join, left outer join, full outer join, cross (Cartesian) join
Join implementation types, eg.: nested join, merge join, hash join, product join.
Product join (of tables A and B ) is the most simple method of join implementation:
- Produce each of <A;B> record combinations, say take each records from A singly, and match it with each records of B one-by-one.
- Test the join condition on each produced <A;B> record pairs, and eliminate those combinations where the condition fails.
Why don’t we like product joins?
Well, it has a really bad reputation. It is slow, stuffs CPU, etc.
Yes, it usually is, does. It is the brute force method for executing a join, with costs in order of N*M (where N, M are the record numbers of the joinable tables)
Indeed there are situations when it is the best choice, or the only feasible way.
When is it good or necessary?
Product join is typically simple, dumb and slow algorithm, this is why we do not like it, but has a very important advantage: requires no pre-processing.* This is why we LIKE IT:)
If we have to join a really large table to a very small table (couple of records) product join is far the most effective method, since the sort of a very large table ( order of N*logN ) can cost a lot, while joining to 1-2 records is really not a big deal.
There are join situations when the only way to go is the product join. Why? Because of the join condition. The “clever joins” (merge, hash) require some information and/or condition that somehow enables to cheat the A x B comparisons: reduce them to the ones that really necessary, and be done in a more effective manner.
* OK, in Teradata this means: only requires that the matchable records from both tables must be on the same AMP. This implies the “small” table to be duplicated to all AMPs.
Merge join example
join B on A.customer_id = B.customer_id
and A.trx_dt between B.eff_dt and B.exp_dt
- Customer_id clause is in AND condition with the others
- Customer_id is selective enough that hash(customer_id) can reduce the comparisons reasonably
- Note that A and B must be sorted (re-sorted) by the hash of customer_id
Product join example
join B on substr(A.telephone_no,1,B.prefix_length) = B.telephone_no_prefix
- There is no comparison reducing partial-condition
- Note that neither of the tables required to be sorted in a specific order.
Unavoidable product joins
- Non-eqality condition
- Function used (eg. substr())
- Dependent expression is used (eg. A.x+B.y = A.z)
- Cross join: intentional Cartesian product
Avoidable product joins
Data type mismatch
- Use domains to eliminate the possibility of mismatch
- Align to used data types when defining temp tables, or use “create table as …” statements
- If you cannot avoid mismatch, relocate the necessary data to temp tables with proper data types during processing.
join B on A.col1 = B.Col1
A.Col2 = B.Col2
This is equivalent, w/o compulsory product join :
join B on A.col1 = B.Col1
join B on A.Col2 = B.Col2
How to find avoidable product joins
I strongly recommend to use PRISE Tuning Assistant for both finding the product joins and analyzing the possibility and necessity of elimination:
- List top consuming queries with product join(s)
- Check the PROD JOIN steps: which tables are processed that way
- Check those join conditions for cases described above
What to do if cannot be avoided?
In this case I recommend to try the decomposition, described here.
It can help reducing the number of comparisons, saving CPU and runtime.