Friday, January 2, 2009

Time series modelling

In this post I'll show how to implement time series functionality (month ago and year ago) manually on a physical layer using tables TIMES_MAGO i TIMES_YAGO.

Times_mago and times_yago tables

select min(calendar_month_desc), max(calendar_month_desc) from times

Minimal month is 1998-01 and a maximum is 2002-12.

First, we create table TIMES_MAGO. It shows for each month days on the left side and days of the previous period on the right side, starting from 1998-02, because it's first that has a previous month.

create table times_mago as
select nvl(norm.calendar_month_id,mago.calendar_month_id+1) calendar_month_id,
norm.calendar_month_desc,
norm.time_id,
norm.day_number_in_month,
mago.calendar_month_id mago_calendar_month_id,
mago.calendar_month_desc mago_calendar_month_desc,
mago.time_id mago_time_id,
mago.day_number_in_month mago_day_number_in_month
from times norm,times mago
where norm.day_number_in_month(+)=mago.day_number_in_month and norm.calendar_month_id(+)-1=mago.calendar_month_id

Here is a left outer join to get days from the previous month if the previous month has more days then the current month.

SQL returns 2002-12 on the right side and null on the left side so we don't have this because the maximum date from the left side is 2002-12, so we can delete this:

delete from times_mago where calendar_month_id>1731--ili mago_calendar_month_desc='2002-12'

For days that exist in the previous period and doesn't exist in the current we have null for calendar_month_desc (left side) so we need to update it so that we could use later time series functionality that we'll manually implement in our physical data model.

update times_mago a
set a.calendar_month_desc=
(select distinct calendar_month_desc from times where calendar_month_id=a.calendar_month_id)

Now we create table TIMES_YAGO that for each month shows it's days on the left side and days from the same month previous year on the right side, starting from 1999-01 because this month is the first that has a previous.

create table times_yago as
select nvl(norm.calendar_month_id,yago.calendar_month_id+12) calendar_month_id,
norm.calendar_month_desc,
norm.time_id,
norm.day_number_in_month,
yago.calendar_month_id yago_calendar_month_id,
yago.calendar_month_desc yago_calendar_month_desc,
yago.time_id yago_time_id,
yago.day_number_in_month yago_day_number_in_month
from times norm, times yago
where norm.day_number_in_month(+)=yago.day_number_in_month and norm.calendar_month_id(+)-12=yago.calendar_month_id

Here is a left outer join to get days from the same month previous year if it has more days then the previous.

SQL returns months of the 2002 year on the right side and null on the left side so we don't have this because the maximum date from the left side is 2002-12, so we can delete this:

delete from times_yago where calendar_month_id>1731--substr(mago_calendar_month_desc,1,4)='2002'

Update is the same as for TIMES_MAGO.

update times_yago a
set a.calendar_month_desc=
(select distinct calendar_month_desc from times where calendar_month_id=a.calendar_month_id)

Check:

select * from times_mago--calendar_month_desc from 1998-02 to 2002-12
select * from times_yago--calendar_month_desc from 1999-01 to 2002-12

Time series model in Administrator

We import tables into Administrator:



The idea is to have on BMM just TIMES i SALES logical tables, also with PRODUCTS dimensional table. After we imported TIMES_MAGO i TIMES_YAGO tables we make aliases for SALES table with Mago i Yago suffixes. After that we add FK from SALES Mago to TIMES_MAGO and from SALES Yago to TIMES_YAGO.

We add PRODUCTS to a SALES and to all associated aliases. That is required if we want to see PRODUCTS attribut in WHERE or GROUP BY part of the report in all SQL queries that Oracle BI Server will generate (sales-times, sales mago-times mago, sales yago-times yago) if we choose it. For example, we add condition PROD_CATEGORY='Hardware' and this will appear in all three SQL queries. We'll show this later.

Our physical model:



Foreign keys:

PRODUCTS.PROD_ID = SALES.PROD_ID
PRODUCTS.PROD_ID = "SALES Mago".PROD_ID
PRODUCTS.PROD_ID = "SALES Yago".PROD_ID
TIMES.TIME_ID = SALES.TIME_ID
TIMES_MAGO.MAGO_TIME_ID = "SALES Mago".TIME_ID
TIMES_YAGO.YAGO_TIME_ID = "SALES Yago".TIME_ID

We create a new model in BMM and put SALES and TIMES inside it and add complex join (logical join) if it is not created automatically. Drag and drop columns AMOUNT_SOLD and QUANTITY_SOLD into SALES logical table

*Note that we don't have to add explicitly physical join in BMM (add physical table to logical table source option) between SALES Mago and TIMES_MAGO and between SALES Yago and TIMES_YAGO.



*Also, if we didn't add physical join in phisical layer from TIMES_MAGO and TIMES_YAGO to TIMES how we know that if we choose for example 2001-01 month that Oracle BI Server will generate other SQL queries (for mago and
yago) for the choosen measure?

*This part we'll solve in BMM in TIMES logical table. We add TIMES_MAGO i TIMES_YAGO logical table source inside it. We want to have for example a report that will give us data for CALENDAR_MONTH_DESC so to have a data for mago and yago our column CALENDAR_MONTH_DESC need to have two more sources (from TIMES_MAGO and TIMES_YAGO). We repeat this step for CALENDAR_MONTH_ID. For this reason we did update of CALENDAR_MONTH_ID from TIMES_MAGO and TIMES_YAGO tables before. Every column for which we want to see mago and yago data must have this two additional sources. After drag and drop columns a new source will appear (this can be seen also in column properties).



When we choose CALENDAR_MONTH_DESC to group by or in filter, Oracle BI Server will generate three separeted SQL queries for the choosen value from TIMES-a, and reference this value to the CALENDAR_MONTH_DESC to other two queries (TIMES_MAGO, TIMES_YAGO). That's the way that OBIEE takes measure from SALES Mago and SALES Yago.

Presentation model:



Testing in Answers

Basic test in Answers:



Result:





Why do we get data for months that does not exist in the SALES table? The answer is that Oracle BI Server generates a full outer join to all three SQL queries if you do not select a condition under which you want to filter. Remember that in SALES minimum month is 1998-01 and the maximum is 2001-12

NQQuery.log:



Which would looks something like this if we link those three queries with a full outer join (not in the NQQuery.log):

SELECT c1, SUM (c2), SUM (c3), SUM (c4)
FROM
(
SELECT
DISTINCT CASE
WHEN d1.c2 IS NOT NULL
THEN d1.c2
WHEN d0.c2 IS NOT NULL
THEN d0.c2
WHEN d2.c2 IS NOT NULL
THEN d2.c2
END AS c1,
d1.c1 AS c2,
d0.c1 AS c3,
d2.c1 AS c4
FROM
(SELECT
SUM (t20550.amount_sold) AS c1,
t20553.calendar_month_desc AS c2
FROM times t20553, sales t20550
WHERE (t20550.time_id = t20553.time_id)
GROUP BY t20553.calendar_month_desc) d1
FULL OUTER JOIN
(SELECT
t21046.calendar_month_desc AS c2,
SUM (t20551.amount_sold) AS c1
FROM sales t20551 /* SALES Mago */, times_mago t21046
WHERE (t20551.time_id = t21046.mago_time_id)
GROUP BY t21046.calendar_month_desc) d0
ON (d1.c2 = d0.c2)
FULL OUTER JOIN
(SELECT
t21046.calendar_month_desc AS c2,
SUM (t20551.amount_sold) AS c1
FROM sales t20551 /* SALES Yago */, times_yago t21046
WHERE (t20551.time_id = t21046.yago_time_id)
GROUP BY t21046.calendar_month_desc) d2
ON d1.c2 = d2.c2
ORDER BY c1)
GROUP BY c1
ORDER BY 1

If we don't want to get months that don't exist in the SALES we can define a repository variable (maximum month) and set a filter in the Administrator to TIMES, TIMES _MAGO and TIMES_YAGO:



Repeat this for TIMES_MAGO and TIMES_YAGO:



Now we get only months that we have in SALES (till 2001-12):



NQQuery.log:



If we don't group by months, but only by PROD_CATEGORY and choose measures AMOUNT_SOLD_MAGO and AMOUNT_SOLD_YAGO then we will get the same amount for all three measures, because we did not select CALENDAR_MONTH_DESC, so the right part of the model didn't activate (TIMES, TIMES_MAGO, TIMES_YAGO).





In case that we group by months, it will reference itself on TIMES, TIMES_MAGO and TIMES_YAGO:







And that is correct, as long as we group by months we get the correct results for time series measures.

Now we remove repository variable v_maks_mjesec from all filters in Administrator and drag column CALENDAR_YEAR from a BMM to presentation layer.

In Answers kmake a report and put a filter on CALENDAR_YEAR (note that CALENDAR_YEAR is only from TIMES logical table source):



Results:



NQQuery.log:


We can see that by selecting an attribute that exists only in the TIMES, filter is applied only to the first request (TIMES) and Oracle BI Server generates a left join on other two queries, so in this case we do not have a full outer join.

No comments: