Compress tables without performance tradeoff
The product was specifically developed to reduce Teradata disk consumption, utilizing internal Teradata Multi-Value Compression (MVC) feature.
PCW analyzes tables, produces the optimal MVC settings, and computes the possible space gains w/o modifying the tables.
I am a CIO/IT manager
PRISE Compress Wizard (PCW)
Log into the target Teradata instance. PCW includes user/repository assistant for easy setup. Configures logon, permissions and repository.
Select your candidate tables through DB tree browser or from a result of custom SQL select. Displays size and last compress info. Customize SQL by your taste. Add chosen tables to "Target list".
Adjust Compress optimization algorithm parameters.
Follow the Analysis process as it happens. You are kept informed on what is happening, and what amount of PERM space can be freed up. Note: nothing has changed yet!
Browse repository for previous Analysis' results. View summary of selected items or create Excel report including the consumed CPU and I/O. Transform tables immediately or generate BTEQ scripts for it to do later.
Set up the Script Generator and the Implementation process. Choose transformation method, locking, statistics handling, space management, channel, etc. You can set the DB user to be used for transforming the tables.
Follow the Implementation process on the fly. You are kept informed what is happening, and what amount of PERM space has been already freed up. BTEQ and JDBC logs are displayed.
Browse repository for Implementation results, history included. View summary of selected items or create Excel report including the consumed CPU and I/O.
Configure Analysis batch(es) for automatic check of compression on a regular basis. Customize preset SQL(s) to get the appropriate table list and adjust parameters. Schedule the batch(es) with Crontab or At to receive both the Analysis report via e-mail and data registered into the repository.
PCW contains context-aware online help. Click "help" icon, and point to the control you are interested in.
PRISE Compress Wizard pays key attention for the safety to your data:
PRISE Compress Wizard
|Vendor:||On-site visit, presentation, requirements definition|
|Recipient:||DBA and decision-maker get familiar with the product, resource planning, technical preparations|
Target areas identified. PRISE Compress Wizard assembled according to requirements and run. PRISE evaluates results within 3 working days.
|Vendor:||Consults, prepares and supervises product configuration and scheduled run|
|Recipient:||Participate in identifying focus areas, available for assistance|
Findings and recommendations prepared and presented. Q&A session.
|Vendor:||Calculates forecasted benefits, creates reports and perform presentation|
|Recipient:||DBA and decision-maker get precise information about the achievable benefits|
Recipient makes decision about implementing PRISE Compress Wizard optimization. No-go decision concludes the project with zero costs to Recipient and implementation will not take place.
Optimization scripts are provided, deployed, scheduled and run.
|Vendor:||Deploys optimization scripts and assists in running|
|Recipient:||DBA fully involved in implementation, managing EDW processes and running optimization scripts|
Log data evaluated and the results of the overall project are assessed and presented. Project closure with benefits shared as previously agreed. Upon mutual satisfaction Recipient issues a reference certificate.
|Vendor:||Assesses and presents results|
PRISE Compress Wizard storage optimization is based on fine tuning Teradata Multi-Value Compression (MVC) functionality. MVC is a 100% transparent feature, using it does not influence the applications. MVC is usually manually configured and will not automatically adapt to the changing business and data environment.
From technical point of view the project has two separate tasks: analysis and implementation. During the analysis phase the vendor's expert identifies the set of tables in focus in cooperation with the Recipient's delegated DBA. The qualified tables will be analyzed by a proprietary software - PRISE Compress Wizard, which inspects the distribution of data values, calculates the optimal MVC settings and calculates the possible space gain for each involved table.
The Vendor then evaluates the result statistics and presents the suggested optimization prospects.
The second technical phase of the project covers the implementation, which enables the Recipient to apply the benefits in practice. The Vendor's and Recipient's DBAs create a deploy time plan to effectively execute the suggested modifications in the database. According to the plan the Vendor delivers SQL scripts that relocate data into the optimized table structure. The scripts are specifically designed to avoid all incompatibilities or system disturbances. For details please refer to Annex B.
The cost structure of a data warehouse consists of several different components:·Storage and processing hardware costs, e.g. servers and disk arrays ·License and support fees in proportion to the system performance ·Operational expenses like backup, space and performance management, optimization ·Electric power consumption of the hardware components ·Server room reservation and cooling costs ·Personnel costs
Implementing PRISE Compress Wizard will decrease all cost components, saving significant expenses for the company, without noticeable disadvantages due to Teradata's unique technological solution utilized by PRISE Compress Wizard. Compression is recommended to be performed from time to time in order to exploit the maximum advantages of the product.
Benefits achieved can be classified into two categories. For further details please refer to Annex A.
Instant advantages·Lower storage consumption, instantly freed disk space ·Less resource consumption -> faster queries to the users ·Shorter backup windows, less archive media ·Deferred hardware extension ·No side effects or application disturbances
Occasional advantages·Shorter migration, reconfiguration times at upgrades ·Shorter disaster recovery time ·Slower growth of database size, less frequent capacity expansions ·More applications and reports can be served by the current infrastructure ·Significantly less DBA time is required for manual database optimization tasks
On the optimized tables PRISE Compress Wizard managed to reach 17% storage space saving, which resulted in 4%..92% savings on individual tables. To visualize the achievement, the gained storage area is enough to cover 11 months of data volume increase. Detailed estimations and statistics are available in Annex C.
Both the CPU and IO load of the system has been decreased by 15% and 13% respectively, at significantly increased user activity. The chart shows the system measurement results:
Figure 2: Overall Workload and CPU&IO usage and trends, including optimization phases
Telenor Hungary's Teradata RDBMS has the DBQL (Database Query Logging) feature switched on continuously, and stores ~1-2 months of historical data. The chart of Figure 2 is based on DBQL data and shows the total CPU and IO load of the system over the time.
|·“Phases”:||shows the optimization implementations in time (phase 4,5,7,8)|
|·System CPU time:||7 days moving average of daily SUM() of CPU seconds|
|·System IOs:||7 days moving average of daily SUM() of IO|
|·Workload:||7 days moving average of number of SQL statements issued daily|
The optimization project was initiated by PRISE in Q1 2011. It required noticeably low human resources from Telenor side, only one DBA, who prepared the working environment and then executed the implementation package. Telenor DBA time spent on the project did not exceed 5 working days. Decision maker time involvement was limited to a few hours in meetings.
The time frame of the project was quite long compared to the required human and machine resources. The reason was low resource availability, which lead to 8 implementation phases. The project did not have a drawback due to the lengthier timing, all benefits were delivered.
Figure 3: PRISE Compress Wizard implementation project timing
Some of the achieved advantages will not be visible immediately, but on the long run they will undoubtedly apply: the company's growing business produces more and more data and rising market competition demands more complex algorithms, both of which require developments of hardware infrastructure sooner or later or periodically. Optimizing available resources will decrease investment into additional hardware resources along with their costs.
Teradata license and support fees' calculation model is based on the idea of TPERF, which is a weighted mixture of processing capacities. The more CPU and IO capacity a system has, the bigger the TPERF value of an installation, and raises the fees higher.
More server nodes and disk bundles consume more electric energy. Researches demonstrated that the power consumption expenses of IT infrastructure fall into the price range of license fees. The power energy consumption additionally doubles for server room cooling.
Other cost components are reduced immediately. Backup and archive times shorten proportional to freed disk capacity. Companies often suffer from the lack of free space. In this case difficult decisions must be made: investing into additional hardware or reducing either the scope of stored information or the depth of history. These will be painful to some division of the company – unlike compression, where there are no tradeoffs.
It is important to highlight, that PRISE Compress Wizard is 100% transparent for all kind of applications, therefore introduction or optimization of the product will not involve ETL, application or report development, which means there is no hidden cost.
For quality assurance purposes we prefer to use Teradata's DBQL feature. We use 3-4 weeks log data as basis of resource consumption calculation for analysis & implementation as well as final results.
We used the Teradata data dictionary to make a pre-selection of the candidate tables. Based on a consultation with development, reporting and operation staff, we excluded the unnecessary databases and tables from the scope
Project decided to take the TOP 1000 tables by size to be examined, and after excluding the unnecessary and stale ones, 889 of them were candidate for analysis.
The analysis of the <target> is a 100% percent automatic process which has the following prerequisites:
The process was run in multiple parts, during the nights, in order to minimize the disturbance of EDW users or daytime if the priority settings enable seamless EDW usage. The time the whole process takes depends on
The calculation process runs as an ordinary ad-hoc report, does not disturb loads (uses access locking) and other reporting and analysis tasks.
The calculation process in Telenor Hungary was run with the following details:
The result of the calculation process contains the original and the estimated optimized table sizes, to be able to estimate the amount of workspace that will be necessary for implementing the optimization and the storage gain which can be achieved – before any modifications are done on the database.
After the completion of the calculation process we found that the following gains are available.
The table contains only the statistics for the tables that are currently not optimal.
|Gain % range||Orig. sum GBytes||Gainable Gbytes||No. of tables|
If we want to cut the time and resource demands of the implementation, it's reasonable to shorten the <target> table list. We've chosen to eliminate those tables that did not promise more than 20Mbytes of space gain. This way we had 488 tables left for implementation phase.
Table compression parameters are creation-time defined; non modifiable in the available versions of Teradata, therefore the modification of compression is only applicable by recreating the tables, which will cause object and data absence for a while.
For this reason we had to choose low activity time windows to implement optimizations. The most appropriate time was the end of business time: 17:00
PRISE Compress Wizard generates all the necessary scripts that need to be run to achieve the possible storage gain by recompressing the tables, but they must be run by the local Teradata DBA. Before running we recommend that the DBA checks all the SQLs before they are applied.
The implementation was broken up into 8 phases since it was quite difficult to find continuous time frames with available resources of both DBA and database.
The total results of the 8 implementation phases are shown in the table below:
|Phase||No of Tables||Original Gbytes||Gain Gbytes||Gain%||Runtime|
The implementation phase requires the optimized space amount of the target table as an additional free space during the technical run of the scripts, for one table at a time. This space will be immediately de-allocated after the table's optimization finishes.
The implementation consumed:
Which are 80% and 25% respectively, of the regular daily system loads.
The test measurements proved that the CPU consumption of an INSERT statement into a compressed table can be 50-200% more, than into its uncompressed equivalent. However, one can easily check that a suboptimal compression definition will demand the same, or more CPU.
Load procedures typically have two important time factors: longer preparation & transformation and significantly shorter final “INSERT”, however the more resource intensive first part is not affected at all.
After the implementation finished, we performed an investigation of the CPU and IO consumption of the INSERT statements issued against the optimized tables:
Figure 4: How did CPU usage of affected tables' INSERT statements change
Performance impact is displayed on the horizontal axis, 1x means no change, 0,5x half, 2x double amount of CPU for the similar INSERT operation. The number of tables where INSERT phase fell into a given performance impact range can be found on the vertical axis. One can read that the modal impact was around 0.95, therefore INSERT statements consumed a little less CPU than before optimization.
Production Data Warehouses are in continuous operation and will not easily go down for maintenance. For this reason PRISE developed a safe process to foreclose the possibility of data damage or loss and minimize the possibility of unavailability.
In Telenor Hungary's PRISE Compress Wizard implementation project we reached these goals 100%:·No particle of data was lost or corrupted
PRISE undertakes to sign a very comprehensive non-disclosure agreement as a first step. PRISE Compress Wizard will not modify or keep any personal data. Additionally we only work with your data in your safe environment.
PRISE provides detailed documentation for ITIL change & configuration management process considerations. PRISE Compress Wizard can be tested initially in a test environment, then gradually implemented across the entire scope with maximum safety.
Open in PDF format: Telenor Hungary full reference case study