Thursday, November 26, 2009

Aggregates outside OBIEE - materialized views and query rewrite

In this post I explained how to use aggregates in OBIEE. Then we did manually create aggregate tables on the database and set each logical table source to triggering only on the certain level of dimension.

We then used aggregate tables SALES_MONTHS, SALES_YEAR_CAT and SALES_MONTHS_CAT_CH and dimension tables CATEGORIES, MONTHS i YEARS. Here, we don't need that.

In this post we'll try to explain and set up materialized views and the query rewrite to get the same queries as in the post above, but without setting anything in the BMM in the logical table sources.

I'll use oracle 10g database, oracle SH schema and the measure from the SALES fact table.

For using dbms_mview.explain_rewrite we need to have rewrite_table table (file utlxplan.sql) and for dbms_mview.explain_mview table mv_capabilities_table (file utlxmv.sql).

First set:

ALTER SYSTEM SET QUERY_REWRITE_ENABLED=TRUE
ALTER SYSTEM SET QUERY_REWRITE_INTEGRITY='TRUSTED'

This is the level of the query rewrite. I set TRUSTED only to be able to test this. You should see other options as well. In TRUSTED mode, the optimizer trusts that the relationships declared in dimensions and RELY constraints are correct. In this mode, the optimizer also uses prebuilt materialized views or materialized views based on views, and it uses relationships that are not enforced as well as those that are enforced. In this mode, the optimizer also trusts declared but not ENABLED VALIDATED primary or unique key constraints and data relationships specified using dimensions. This mode offers greater query rewrite capabilities but also creates the risk of incorrect results if any of the trusted relationships you have declared are incorrect (Text reference:
Oracle Database Data Warehousing Guide 11g Release 1 (11.1)).

RELY constraints:

We use SALES table as the reference. Existing constraints we need to modify to RELY. RELY only affects those constraints that are ENABLE NOVALIDATE. Parameter QUERY_REWRITE_INTEGRITY is set to TRUSTED (TRUSTED informations are constraints (NOVALIDATE RELY) and dimensions). Oracle will not do the check whether relationships defined with RELY constraints are TRUE. That refers to primary key and unique key constraints (RELY ENABLE NOVALIDATE). Query rewrite also use joinback method for recognition attribute that is not in the materialized view query but can be retrieved with joinback. For example, the query rewrite materialized view has CALENDAR_MONTH_ID and we want to group by CALENDAR_MONTH_DESC and then the query optimizer make the join between materialized view and the TIMES table one more to get CALENAR_MONTH_DESC. TIMES table is joinback table.

Because of the connection with the higher levels we need to have dimensions:



I didnt create them, they are already on the oracle SH schema.

Modify all SALES table constraints to RELY ENABLE NOVALIDATE:

alter table sales modify constraint sales_product_fk RELY ENABLE NOVALIDATE
alter table sales modify constraint sales_channel_fk RELY ENABLE NOVALIDATE
alter table sales modify constraint sales_time_fk RELY ENABLE NOVALIDATE
alter table products modify constraint products_pk RELY ENABLE NOVALIDATE
alter table times modify constraint times_pk RELY ENABLE NOVALIDATE
alter table channels modify constraint channels_pk RELY ENABLE NOVALIDATE

We create materialized view to support all queries like in the
post:

create materialized view mv_sales_all
build immediate
refresh force on demand
with primary key
enable query rewrite
as
select t.calendar_month_id,
s.prod_id,
s.channel_id,
grouping_id(t.calendar_month_id, s.prod_id, s.channel_id) as gr_id,
sum(s.amount_sold) as amount_sold,
sum(s.quantity_sold) as quantity_sold
from sales s, times t
where s.time_id=t.time_id
group by
grouping sets
(
(t.calendar_month_id),--gr_id 3
(t.calendar_month_id,s.prod_id), --gr_id 1
(t.calendar_month_id,s.prod_id,s.channel_id)--gr_id 0
)

In the grouping sets we support all three combinations like in the
post.

Grouping_id function will get the decimal interpretation of the binary. If the attribute gives the contribution to aggregation then the value is 0, otherwise it is 1.

For example, calendar_month_id has value 3 because it's in the combination:

(0, 1, 1) = (calendar_month_id, prod_id, channel_id)

Check:

select bin_to_num(0, 1, 1) from dual--3 decimal
select bin_to_num(0, 0, 0) from dual--0 decimal
select bin_to_num(0, 0, 1) from dual--1 decimal

Example of combinations:



To explain materialized view query we use the table mv_capabilities_table and the procedure dbms_mview.explain_mview.

BMM (clean model):



The focus is on how this works with queries that OBIEE generates, not how to refresh materialized views during the part of the job of the ETL process.

To test this we need to refresh materialized view:

begin
dbms_snapshot.refresh('MV_SALES_ALL','C');
end;

Get schema statistics:

begin
dbms_stats.gather_schema_stats('SH', CASCADE=>TRUE);
end;

Now, if we choose:



NQQuery.log:



Explain plan, table plan_table:



See the joinback to TIMES table to get the CALENDAR_MONTH_DESC.

If we instead of CALENDAR_MONTH_DESC put the CALENDAR_MONTH_ID there is no joiback to TIMES because we use CALENDAR_MONTH_ID which is already in the materialized view query.



To verify that the query did rewrite we can use dbms_mview.explain_rewrite, and the table rewrite_table:



If we choose:



NQQuery.log:



Explain plan, table plan_table:



If we choose:



NQQuery.log:



Explain plan, plan_table:



We see that in all three queries the query rewrite works correctly, query has been rewritten.

I really try to show how this works when you are using OBIEE queries. If you have any question or suggestion please post the comment.

2 comments:

gerardnico said...

Yeah.
Very good, professional and not easy.

Thanks
Nico

LM Bryant said...

Our team is having a problem where we need to display running sums on charts by date. However, there will be some dates for which there are no values (closed change requests). When that happens the chart drops the line to zero which is unacceptable since as a running sum, the value should be the same as the previous date's value. Any ideas on how to work around this problem using Pivot table data?