Logo

Go back

Storing datetime values in a SQL Server-based EDW

Chip Hartney (The Datamology Company)

First written: January 19, 2023

Last updated: January 18, 2024

Contents

Abstract 1

Assumptions. 1

Choices. 1

SMALLDATETIME. 1

DATETIME. 1

DATETIME2. 1

Philosophy. 1

Processing of ingested values. 1

Usage of stored values. 1

DATETIMEOFFSET. 1

Philosophy. 1

Processing of ingested values. 1

Usage of stored values. 1

Recommendation: DATETIMEOFFSET. 1

 

Abstract

To satisfy the basic requirements of an enterprise data warehouse (EDW), one must store datetime values.  The question arises as to which data type one should utilize for datetime values in the EDW tables.  Each has implications regarding how you must handle those values throughout the EDW.

In this article, I compare the data types available in SQL Server, recommend use of DATETIMEOFFSET, and explain why I make that recommendation.

Assumptions

In the following choices, I assume that you will do your best to manage time as accurately as possible given the particular choice.  Goals included in these best efforts are:

·        Support ingestion of datetime values that originate from other time zones.

·        Support sequencing of datetime values by their actual occurrence.

·        Retain the local time of datetime values for reporting and interpretation.

I also assume that you will not use a hybrid approach (different data types in different tables) as this would be very confusing and very likely to lead to mistakes.

In the following, I use “MT” (short for Mountain Time) to indicate the Mountain Standard Time clock that is observed where I live.

Note: The discussion here is strictly about the data type used in EDW tables.  It is a separate matter, altogether, as to how time values are presented in views (where data is actually exposed to data consumers and analysts).

Choices

SMALLDATETIME

Data type specs:

·        Storage: 4 bytes

·        Date range: 1900-01-01 through 2079-06-06

·        Time precision type: Fixed

·        Time precision accuracy: to 1 second

·        UTC anchor: none

This is a legacy SQL Server data type that is no longer recommended for enterprise applications.

Therefore, I do not consider it further here.

DATETIME

Data type specs:

·        Storage: 8 bytes

·        Date range: 1753-01-01 through 9999-12-31

·        Time precision type: Fixed

·        Time precision accuracy: to ~.003 second

·        UTC anchor: none

This is a legacy SQL Server data type that is no longer recommended for enterprise applications.

Therefore, I do not consider it further here.

DATETIME2

Data type specs:

·        Storage: 6-8 bytes

·        Date range: 0001-01-01 through 9999-12-31

·        Time precision type: Variable

·        Time precision accuracy: to .0000001 second

·        UTC anchor: none

This is an ANSI and ISO 8601 compliant data type and is recommended for enterprise applications in which UTC anchoring is not needed.

Philosophy

Store datetime values based on when they occurred with respect to Colorado (MT).

DATETIME2 stores the following components:

·        The date (in Colorado) when the event occurred.

·        The time (in Colorado) when the event occurred.

With both components we can determine:

·        The absolute datetime of the event (UTC).

·        The matching local datetime on any other clock.

Those times are available to us through the AT TIME ZONE clause and the Time Zone setting for the Managed Instance.

We cannot, however, determine:

·        The local datetime as reported (because it was converted to MT before storage).

Processing of ingested values

If the provided datetime value (a string) is in DATETIMEOFFSET form, we convert the value to MT as in:

CONVERT(DATETIMEOFFSET, '<datetime-value>', <datetime-format>) AT TIME ZONE 'Mountain Standard Time'

Otherwise, we must know the clock utilized in the source data.  It must either be specified in the contract (implicit) or in the data (explicit).  The clock name would then be used with the AT TIME ZONE clause to establish the absolute time and then converted to MT as in:

CONVERT(DATETIME2, '<datetime-value>', <datetime-format>) AT TIME ZONE '<clock-name>' AT TIME ZONE 'Mountain Standard Time'

In both of the above cases, the statement returns a DATETIMEOFFSET.  But the value can be used as a DATETIME2 (in which case the offset is truncated without any adjustment to the date and time).

Usage of stored values

To see the absolute time (UTC):

<column-name> AT TIME ZONE 'UTC'

To see the Colorado time:

<column-name>

To see the local time in any other locale:

<column-name> AT TIME ZONE 'Mountain Standard Time' AT TIME ZONE 'Pacific Standard Time'

The first and third above (using AT TIME ZONE) return a DATETIMEOFFSET data type.  The second returns a DATETIME2 data type.  The default formatted value will look something like the following, respectively:

2023-01-01 19:00:00 +00:00

2023-01-01 12:00:00

2023-01-01 11:00:00 -08:00

One is free to format the output value as desired, of course.

The DATETIME2 value can also be converted to DATETIMEOFFSET as in:

CONVERT(DATETIMEOFFSET,<column-name>) AT TIME ZONE 'Mountain Standard Time'

DATETIMEOFFSET

Data type specs:

·        Storage: 8-10 bytes

·        Date range: 0001-01-01 through 9999-12-31

·        Time precision type: Variable

·        Time precision accuracy: to .0000001 second

·        UTC anchor: via offset (in minutes)

This is an ANSI and ISO 8601 compliant data type and is recommended for enterprise applications in which UTC anchoring is needed.

Philosophy

Store both aspects of the datetime (local and absolute) as reported to us.

DATETIMEOFFSET stores the following components:

·        The local date of the event as reported.

·        The local time of the event as reported.

·        The difference (offset) between the local time and UTC.

With both components we can determine:

·        The local datetime as reported.

·        The absolute datetime of the event (UTC).

·        The matching local datetime on any other clock (including MT in Colordo).

Those times are available to us through the AT TIME ZONE clause and the Time Zone setting for the Managed Instance.

Processing of ingested values

Given that the ingested values in an EDW are typically strings, we first have to convert the value to a datetime data type as in:

CONVERT(<data-type>, '<datetime-value>', <datetime-format>)

If the provided value is already in DATETIMEOFFSET form, we use the converted form directly.

Otherwise, we must know the clock utilized in the source data.  It must either be specified in the contract (implicit) or in the data (explicit).  The clock name would then be used with the AT TIME ZONE clause as in:

<converted-form> AT TIME ZONE '<clock-name>'

Usage of stored values

To see the absolute time (UTC):

<column-name> AT TIME ZONE 'UTC'

To see the Colorado time:

<column-name> AT TIME ZONE 'Mountain Standard Time'

To see the local time in any other locale:

<column-name> AT TIME ZONE 'Pacific Standard Time'

Each of the above returns a value with DATETIMEOFFSET data type.  The default formatted value will look something like the following, respectively:

2023-01-01 19:00:00 +00:00

2023-01-01 12:00:00 -07:00

2023-01-01 11:00:00 -08:00

One is free to format the output value as desired, of course.

The DATETIMEOFFSET value can also be converted to DATETIME2 as in:

CONVERT(DATETIME2,<column-name>)

Recommendation: DATETIMEOFFSET

Benefits of Option 3 and Option 4 include:

·        Each supports ingestion of datetime values that originate from other time zones.

·        Each supports sequencing of datetime values by their actual occurrence.

·        Because each uses AT TIME ZONE clause to interpret provided datetime values, each addresses the ambiguity inherent in clock changes (for daylight saving time, for example).

Benefits of Option 3 (as compared to Option 4) include:

·        Same approach used by FRHP DW (and possibly other data stores).

Shortcomings of Option 3 (as compared to Option 4) include:

·        Loss of local time.

o   I.e., after converting values to MT and storing them in that form, there is no (easy) way to recall the local time that it represented.  I.e., each of the following would be stored as 1pm MT:

§  11am in Los Angeles

§  Noon in Denver

§  1pm in Chicago

§  2pm in New York

o   The only way to recover the local time would be to infer the time difference from location information (which may or may not be stored in other columns) and the clock used for that location and adjust the stored time by that difference.

·        More complex manipulation of ingested values.

·        More complex manipulation of stored values (but only when needing to see values in other local times, which would be very rare).

·        Inconsistent data types returned from usage of stored values (unless one adds the AT TIME ZONE clause to the simple use case).  But even then, the resultant data type does not match the original column’s data type.

Given the above comparison, I recommend:

Use DATETIMEOFFSET data type in an EDW to store datetime values.

Note that, in a greenfield analytical environment, we would definitely and strongly recommend using DATETIMEOFFSET.