How to store date and time info effectively
Data Warehouse databases usually contain significant amount of date/time information. Physical modeling technique can seriously influence their storage space and usability.
Date/time information can be stored in different ways/data types, each of them will have its own specialities.
- Joint storage: Timestamp
- Timestamp(n) , when n means the fractional digits of seconds
- Separate storage: Date & Time
- Date column + Time column
|Integer time (integer format ’99:99:99′)||4 bytes|
|Time(n)||6 bytes, independent of n*, where n:[0..6]|
|Time(n) with time zone||8 bytes, independent of n*, where n:[0..6]|
|Timestamp(n)||10 bytes, independent of n*, where n:[0..6]|
|Timestamp(n) with time zone||12 bytes, independent of n*, where n:[0..6]|
* n means the precision digits of second
- Date: implicit conversions work, easy and comfortable
- Integer time: works fine, but insert-select will loose the formatting, only the integer value will remain
- Time(n): implicit conversion to string is not working. This fails: select cast(‘2014-01-31’ as date) || ‘ ‘ ||cast(’12:00:00’ as time(0))
- Timestamp(n): brrr. Different precisions will not convert automatically either. I don’t like it.
- Filtering: comparing date/datetime values with < / <= /between operators
- Joint storage (timestamps)
Straightforward, just use the values – if they are equivalent data types
- Separate storage
You have to convert to a “joint” format, either a string or a timestamp before
- Date: ok, adding a constant, subtracting dates work fine
- Integer time: do not use arithmetic, results are bad!
- Time(n): interval types accepted. Not really comfortable, eg max 99 second long interval is accepted (V13.10)
- Timestamp(n): same as Time(n)
Regarding arithmetic I suggest building your own UDF library, that will ease your life.
Choosing data type
I recommend to choose data types depending on the table type and usage purposes.
I differentiate “transaction” and “all other” table types, because transaction tables are usually allocate most of the PERM space, while others are many in number, but allocate “negligible” space.
- Separate storage
- Integer time
- All others
- Joint type (timestamp)
Saving space – store “delta”
- Choose a “primary date”
Must be not null, and typically used as partitioning key also, since it is the most often date filtering condition.In our case this will be the Call_start
- Choose separate date-time storing
Eg. Date and Integer time , as this combination requires the least space
- Store the non-primary dates as delta, multi value comressed
Compute it in the load process, like this:
Call_end_delta := Call_end-Call_start
- Compress the “delta” columns
They will reflect low deviation, highly compressible, use PRISE Compress Wizard
- Convert to absolute dates back in the view layer
Call_start + Call_end_delta as “Call_end”
CREATE TABLE T2000_CALL_TRX
Call_start_date Date NOT NULL
Call_end_date_delta Integer COMPRESS (0)
) PRIMARY INDEX (…,Call_start_date)
PARTITION BY RANGE_N ( Call_start_date BETWEEN date ‘2010-01-01’ AND date ‘2020-12-31’ EACH interval ‘1’ day, NO RANGE, UNKNOWN);
CREATE VIEW V2000_CALL_TRX
, Call_end_date_delta +Call_start_date as “Call_end_date“