DBQL analysis I. – Monitor “Top CPU consumers”

DBQL analysis

About DBQL

What is it?

DataBase Query Logging.

It is a useful feature of the Teradata RDBMS that comprehensively logs the execution of issued queries – if it is switched on.
Configuration can be checked/administered e.g. in the Teradata administration tools or from DBC.DBQLRuleTbl.
Logging can be set at global or user level, and with different levels of detail (see DBQL tables).

For detailed information please refer to the Teradata documentation for your version.
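Logging is switched on and off with DDL statements – a sketch, assuming you have the necessary DBC rights and that the DBC.DBQLRules view (over DBQLRuleTbl) exists under this name in your version:

```sql
-- check which logging rules are currently in effect
select * from dbc.dbqlrules;

-- switch on detailed logging for all users
-- (sql text, explain, objects and step info; no xml / summary)
begin query logging with sql, explain, objects, stepinfo on all;

-- switch it off again
end query logging on all;
```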

DBQL tables

Table           Content
DBQLogTbl       Central table, one record for each query.
DBQLSQLTbl      Whole SQL command, broken into 30K blocks.
DBQLStepTbl     Execution steps of the query, one row for each step.
DBQLObjTbl      Objects participating in the query, logged at different levels (database, table, column, index, etc.).
DBQLExplainTbl  English Explain text, broken into 30K blocks.
DBQLXMLTbl      Explain in XML format, broken into 30K blocks.
DBQLSummaryTbl  Aggregate table of the PEs, which accounts at the desired level.

The DBQL tables are logically organized into a 1:N structure, where DBQLogTbl is the master entity and the others (except DBQLSummaryTbl) are its children.
The join fields are ProcID and QueryID together, e.g.:

...
from 
    DBQLogTbl a
    join DBQLStepTbl b on 
        a.ProcID = b.ProcID 
        and 
        a.QueryID = b.QueryID
...

Unfortunately the primary indices (PIs) of the DBQL tables are not in sync with the logical PK-FK relations (not even in the latest version, V14.10), therefore joined selects against the online DBQL tables are not optimal.
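If you analyze DBQL regularly, it can be worth copying the logs into archive tables whose PI matches this join key, so that the joins become AMP-local. A sketch, assuming an archive area like the dbql_arch database appearing later in this post (all names are illustrative):

```sql
-- archive table with PI on the logical join key (ProcID, QueryID);
-- add a LogDate column and partitioning as needed
create table dbql_arch.dbqlogtbl_hst as dbc.dbqlogtbl
with no data
primary index (ProcID, QueryID);
```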

Cost of using DBQL

DBQL consumes a basically negligible amount of processing resources, since it uses cached, batched writes and generates data proportional to the number of issued queries (the flush rate is a DBSControl parameter).
It is important to regularly purge/archive the log records from the DBC tables; Teradata publishes a recommendation for this. Regular purging keeps the PERM space consumption of DBQL low.
In an environment where ~1M SQLs are issued a day, comprehensive logging generates ~8..10GB of DBQL data daily, without XML and Summary. Fewer SQLs generate proportionally less data.

It is worth switching on all options except XML and Summary: the former generates a huge data volume (roughly doubling the total), and the latter is similar to the Acctg info. If you want to utilize them, switch them on, of course.
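A minimal sketch of such a daily purge/archive job, assuming an archive table like the dbql_arch one appearing later in this post (the names, the column layout and the 35-day online retention are illustrative):

```sql
-- copy yesterday's records into the archive,
-- then purge the records older than the online retention from DBC
insert into dbql_arch.dbqlogtbl_hst
select cast(cast(StartTime as char(10)) as date) as LogDate, t.*
from dbc.dbqlogtbl t
where cast(cast(StartTime as char(10)) as date) = current_date - 1;

delete from dbc.dbqlogtbl
where cast(cast(StartTime as char(10)) as date) < current_date - 35;
```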

What is it good for?

It contains:

  • Query run time, duration
  • Consumed resources
  • Environment info (user, default database, etc.)
  • SQL text
  • Explain
  • Step resource info
  • Objects involved
  • Etc.

One can extract a lot of useful aggregated and query-specific tuning information from it; I will share some of this on the blog.
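For example, a quick look at the day's top 10 CPU consumers takes only the central table (a sketch; join DBQLSQLTbl on ProcID and QueryID if you need the full SQL text):

```sql
select top 10
     QueryID
    ,UserName
    ,AMPCPUTime
    ,TotalIOCount
    ,QueryText   /* only the first part of the SQL is stored here */
from dbc.dbqlogtbl
where cast(cast(StartTime as char(10)) as date) = '2013-12-10' (date)
order by AMPCPUTime desc;
```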

CPU usage distribution info

(Everything applies to I/O as well – just replace CPU with I/O and AMPCPUTime with TotalIOCount…)

Do you think Query optimization is rewarding?

Yes, I know it is hard work to find out why ONE query runs sub-optimally, and what to do about it.

But guess how many queries it takes to consume a given percentage of a whole day's processing resources (CPU).
Take a guess and write it down for these CPU% levels: 5%, 10%, 25% and 50%.

And now run the query below, which computes the answer for you. (Replace the date value; you may also have to adjust the date filtering according to your local settings.)

select 
    'How many queries?' as "_"
    ,min(limit5) as "TOP5%CPU"
    ,min(limit10) as "TOP10%CPU"
    ,min(limit25) as "TOP25%CPU"
    ,min(limit50) as "TOP50%CPU" 
    ,max(rnk) as TotalQueries
from
(
    select
        case when CPURatio < 5.00 then null else rnk end as limit5
        ,case when CPURatio < 10.00 then null else rnk end as limit10
        ,case when CPURatio < 25.00 then null else rnk end as limit25
        ,case when CPURatio < 50.00 then null else rnk end as limit50
        ,rnk
    from
    (
        select
            /* grand total CPU of the day: explicit frame spanning the whole partition */
            sum(ampcputime) over (
                order by ampcputime desc
                rows between unbounded preceding and unbounded following
             ) as totalCPU
            /* running total CPU, in descending order of consumption */
            ,sum(ampcputime) over (
                order by ampcputime desc
                rows unbounded preceding
             ) as subtotalCPU
            ,subtotalCPU * 100.00 / totalCPU as CPUratio
            /* rank of the query by CPU consumption */
            ,sum(1) over (
                order by ampcputime desc
                rows unbounded preceding
             ) as rnk
        from
        (
            select *
            /* For archived DBQL
            from dbql_arch.dbqlogtbl_hst 
            where 
                logdate=1131201 
                and 
                ampcputime > 0 
            */
            /* For online DBQL*/
            from dbc.dbqlogtbl 
            where
                cast(cast(starttime as char(10)) as date) = '2013-12-10' (date)
                and 
                ampcputime > 0
        ) x
    ) y
) z
group by 1;

Are you surprised?
I bet:

  • Less than 10 queries will consume 5% of the CPU
  • Less than 1% of the queries will consume 50% of the CPU

Let’s calculate.
How much does your Teradata system cost a year? That price buys storage and processing capacity.
If you can save e.g. X% of CPU & I/O by query optimization and X% of storage by MVC optimization, you have saved X% of the price of the Teradata system, through:

  • Improved user experience (earlier load, faster responses)
  • Resources for additional reports and applications
  • The possibility of postponing a very expensive Teradata hardware upgrade
PRISE Tuning Assistant

helps you find those queries and get the hang of how to accelerate them.

