Transactional grain:
In transactional grain Fact data tables holds the source record
exactly without any modifications to actual data.
Exceptions to this rule are the standard business rules that are
applied to enhance the data.
Good example for this scenario is sales table containing customer
orders
- Each event is stored in the fact table only once.
- It has a date column indicating when the event occurred.
- It has an identifier column which identifies each event.
- The number of rows is the same as the source table.
Lets say on Monday we had 50 orders, Tuesday 60 orders,
Wednesday 80, and like that.
So on Monday night we load the 50 rows for Monday into the data
warehouse.
On Tuesday night we load the 60 rows for Tuesday, and on Wednesday
night we load the 80 rows for Wednesday.
In addition to 60 new orders on Tuesday, we also have updates to
some of the 50 Monday orders.
This is they key difference to the Periodic Snapshot fact table
Transactional fact table updates existing rows, and therefore lost
some history.
Observation:
- In this Sales Fact Table, every order is stored only once.
- The 60 orders on Tuesday are different to the 50 orders on
Monday and are also different to the 80 Wednesday orders.
- For this we use the order date column. In the above example the
Monday, Tuesday and Wednesday are the order date. This order date
column indicates when the event occurred, when the order
happened.
- In this sales table we also have a sales identifier, such as
order number if it is a shop, or ticket number if it is a
restaurant.
- On Wednesday night, after the warehouse load finishes, we have
50+60+80 = 190 rows, the same as in the source system.
Periodic Snapshot grain:
- A snapshot table has non-cumulative facts and hence the design
includes a “Time” context, which is mandatory.
- Inventory data and Financial General Ledger data are example
for this type of fact.
- The underlying data is calculated along the defined Time
context and populated in fact table.
- The summary in a bank account statement is that kind of
periodic snapshot: starting balance, ending balance, interest.
- Here in Periodic Snapshot grain
- The whole source system is copied into the fact table
regularly.
- The same event is stored multiple times.
- It has a snapshot date column indicating when a copy of the
source table was created.
Lets consider bank account balance. so at the end of the day,
the balances of every customer account in the bank is stored in
this account balance table.
Say there were 10,000 customers on Monday; 11,000 customers on
Tuesday and 12,000 customers on Wednesday.
Observation:
- Daily we extract the whole content of the account balance table
into the periodic snapshot fact table.
- So on Monday night we stored 10,000 rows in the account
balances periodic snapshot fact table, on Tuesday night 11,000 rows
and on Wednesday night 12,000 rows. Here an account is copied each
day to the fact table and each day with likely a different balance
amount.
- In the fact table we have a column called snapshot date. For
all the rows created on Monday night, we set the snapshot date
column to (for example) 22nd Mar 2020. The rows created on Tuesday
night we set the snapshot date to 23rd Mar 2020 and for the
Wednesday rows we set the snapshot date to 24th Mar 2020.
- In the fact table, the Monday data set contains these accounts,
but the Tuesday data set doesn’t contain these accounts, and
neither does the Wednesday data set and there are accounts which
were updated on Tuesday. These changes will be reflected on the
Tuesday snapshot in the fact table, different to their Monday
rows.
Hence we need understand that as a we should never have
different datamart with different pattern, it is up to business
which is sutiable for them if they are concern with the captures of
the mesurement at the most atomic dimensional level at the pint in
time of the transaction which wil provide robust dimensional
grouping and rool-up and drill down reporting capabilities for the
business users.
However if the business uses periodic snapshot grain it will
capture the state of the metrics at a specified point time to
quickly access the performance of the measures over specified time
intervals (week, month, quater, year and so on.