Oracle Database 12 Release 2 makes it a lot easier to be able to manage this vast collection of scripts by includes a new DBMS_STATS preference called PREFERENCE_OVERRIDES_PARAMETER. When this preference is set to TRUE, it allows preference settings to override the parameter values specifically set in a DBMS_STATS command.
For example, if the global preference ESTIMATE_PERCENT is set to the default DBMS_STATS.AUTO_SAMPLE_SIZE, but a 10% sample size was specified in the in a DBMS_STATS.GATHER_TABLE_STATS command, setting PREFERENCE_OVERRIDES_PARAMETER to TRUE would automatically replace the 10% sample size with the default preference DBMS_STATS.AUTO_SAMPLE_SIZE.
Let’s take a look at this in action. we have a SALES table with nearly 1 million rows in it.
15:06:27 SQL> SELECT TO_CHAR(SYSDATE,'DD-MON-YYYY HH24:MI:SS') WALL_CLOCK_TIME
15:17:49 2 FROM dual;
WALL_CLOCK_TIME
--------------------
15-JUL-2017 15:17:49
Elapsed: 00:00:00.00
15:17:49 SQL>
15:02:18 SQL> SELECT COUNT(*)
2 FROM sales;
COUNT(*)
----------
918843
15:02:20 SQL> BEGIN
2 DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SH', tabname=>'SALES', estimate_percent=>10);
3 END;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:09.08
If we check the sample size used for gathering statistics, we see that only 10% of the rows were used as requested and that a FREQUENCY histogram has been created on the CUST_ID column of the SALES table.
15:03:38 SQL> DBMS_STATS.GET_PREFS(pname=>'ESTIMATE_PRECENT' , ownname=>'SH'
SP2-0734: unknown command beginning "DBMS_STATS..." - rest of line ignored.
15:04:02 SQL>
15:04:02 SQL> SELECT TABLE_NAME, sample_size
15:04:19 2 FROM user_tables
15:04:19 3 WHERE TABLE_NAME= 'SALES';
TABLE_NAME SAMPLE_SIZE
------------------------------ -----------
SALES 91619
Elapsed: 00:00:00.09
15:04:20 SQL> SELECT column_name, num_distinct, num_nulls, histogram
15:04:51 2 FROM user_tab_col_statistics
15:04:51 3 WHERE TABLE_NAME='SALES';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
-------------- --------------- ------------ ----------
PROD_ID 651 0 NONE
CUST_ID 630 0 FREQUENCY
TIME_ID 620 0 NONE
CHANNEL_ID 5 0 NONE
PROMO_ID 116 0 NONE
QUANTITY_SOLD 44 0 NONE
AMOUNT_SOLD 583 0 NONE
7 rows selected.
Elapsed: 00:00:00.12
Unfortunately, these statistics are not that accurate as there are over 600 distinct PROD_IDs and CUST_IDs in the SALES table but the statistics gather missed this information because it looked at such a small sample.
15:04:52 SQL> SELECT COUNT(DISTINCT prod_id), COUNT(DISTINCT cust_id)
15:05:31 2 FROM sales;
COUNT(DISTINCTPROD_ID) COUNT(DISTINCTCUST_ID)
---------------------- ----------------------
651 630
In order to improve the accuracy of the statistics, we should use the recommended best practice value for the ESTIMATE_PERECENT parameter, AUTO_SAMPLE_SIZE. Let’s check the preference value for the ESTIMATE_PERECENT parameter on the SALES table.
15:05:34 SQL> SELECT
DBMS_STATS.GET_PREFS(pname=>'ESTIMATE_PRECENT' , ownname=>'SH', tabname=>'SALES')
FROM Dual;
DBMS_STATS.GET_PREFS(PNAME=>'ESTIMATE_PERCENT',OWNNAME=>'SH',TABNAME=>'SALES')
--------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE
We see it’s actually set to DBMS_STATS.AUTO_SAMPLE_SIZE, which is good news for us.
Rather than having to find and edit all of the existing statistics gathering script that refer to the SALES table, we can set the DBMS_STATS preference PREFERENCE_OVERRIDES_PARAMETER to TRUE on the SALES table, which will automatically override all of the none default parameter values used in the DBMS_STATS command with the corresponding preferences values.
15:05:39 SQL> BEGIN
DBMS_STATS.SET_TABLE_PREFS(ownname=>'SH', tabname=>'SALES',
pname=>'PREFERENCE_OVERRIDES_PARAMETER', pvalue=>'TRUE');
END;
/
PL/SQL PROCEDURE successfully completed.
So, in our case the ESTIMATE_PERCENT of 10% will actually be overridden and DBMS_STATS.AUTO_SAMPLE_SIZE will be used. Let’s see what impact that has on both the statistics gathering performance and the quality of the statistics gathered.
15:05:40 SQL> BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SH', tabname=>'SALES', estimate_percent=>10);
END;
/
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:00:06.59
15:05:48 SQL> SELECT TABLE_NAME, SAMPLE_SIZE
FROM user_tables
WHERE TABLE_NAME= 'SALES';
TABLE_NAME SAMPLE_SIZE
------------------------------ -----------
SALES 918843
So, you can see that the time it took to gather statistics was actually less than the time for the 10% but the sample size was the total number of rows in the table. But what about the quality of the statistics?
15:06:04 SQL> SELECT column_name, num_distinct, num_nulls, histogram
FROM user_tab_col_statistics
WHERE TABLE_NAME='SALES';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
-------------- --------------- ------------ ----------
PROD_ID 651 0 NONE
CUST_ID 630 0 TOP-FREQUENCY
TIME_ID 620 0 NONE
CHANNEL_ID 5 0 NONE
PROMO_ID 116 0 NONE
QUANTITY_SOLD 44 0 NONE
AMOUNT_SOLD 583 0 NONE
As you can see, we have a much more accurate set of statistics this time with each of the number of distinct values being correct. You will also notice we got a new type of histogram a TOP-FREQUENCY histogram this time. That’s because this new type of histogram is only gathered if the ESTIMATE_PERCENT is set to the default.
Remember setting PREFERENCE_OVERRIDES_PARAMETER to TRUE will override all none default parameters set in the DBMS_STATS.GATHER_*_STATS command, so you may need to adjust the default preferences for some of the parameters, for example METHOD_OPT if you don’t want to revert to the default.
You can set a table preference using the DBMS_STATS.SET_TABLE_PREFS procedure.
15:06:10 SQL>BEGIN
DBMS_STATS.SET_TABLE_PREFS(ownname=>'SH', tabname=>'SALES', pname=>METHOD_OPT,
pvalue=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS CUST_ID SIZE 254');
END;
For example, if the global preference ESTIMATE_PERCENT is set to the default DBMS_STATS.AUTO_SAMPLE_SIZE, but a 10% sample size was specified in the in a DBMS_STATS.GATHER_TABLE_STATS command, setting PREFERENCE_OVERRIDES_PARAMETER to TRUE would automatically replace the 10% sample size with the default preference DBMS_STATS.AUTO_SAMPLE_SIZE.
Let’s take a look at this in action. we have a SALES table with nearly 1 million rows in it.
15:06:27 SQL> SELECT TO_CHAR(SYSDATE,'DD-MON-YYYY HH24:MI:SS') WALL_CLOCK_TIME
15:17:49 2 FROM dual;
WALL_CLOCK_TIME
--------------------
15-JUL-2017 15:17:49
Elapsed: 00:00:00.00
15:17:49 SQL>
15:02:18 SQL> SELECT COUNT(*)
2 FROM sales;
COUNT(*)
----------
918843
15:02:20 SQL> BEGIN
2 DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SH', tabname=>'SALES', estimate_percent=>10);
3 END;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:09.08
If we check the sample size used for gathering statistics, we see that only 10% of the rows were used as requested and that a FREQUENCY histogram has been created on the CUST_ID column of the SALES table.
15:03:38 SQL> DBMS_STATS.GET_PREFS(pname=>'ESTIMATE_PRECENT' , ownname=>'SH'
SP2-0734: unknown command beginning "DBMS_STATS..." - rest of line ignored.
15:04:02 SQL>
15:04:02 SQL> SELECT TABLE_NAME, sample_size
15:04:19 2 FROM user_tables
15:04:19 3 WHERE TABLE_NAME= 'SALES';
TABLE_NAME SAMPLE_SIZE
------------------------------ -----------
SALES 91619
Elapsed: 00:00:00.09
15:04:20 SQL> SELECT column_name, num_distinct, num_nulls, histogram
15:04:51 2 FROM user_tab_col_statistics
15:04:51 3 WHERE TABLE_NAME='SALES';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
-------------- --------------- ------------ ----------
PROD_ID 651 0 NONE
CUST_ID 630 0 FREQUENCY
TIME_ID 620 0 NONE
CHANNEL_ID 5 0 NONE
PROMO_ID 116 0 NONE
QUANTITY_SOLD 44 0 NONE
AMOUNT_SOLD 583 0 NONE
7 rows selected.
Elapsed: 00:00:00.12
Unfortunately, these statistics are not that accurate as there are over 600 distinct PROD_IDs and CUST_IDs in the SALES table but the statistics gather missed this information because it looked at such a small sample.
15:04:52 SQL> SELECT COUNT(DISTINCT prod_id), COUNT(DISTINCT cust_id)
15:05:31 2 FROM sales;
COUNT(DISTINCTPROD_ID) COUNT(DISTINCTCUST_ID)
---------------------- ----------------------
651 630
In order to improve the accuracy of the statistics, we should use the recommended best practice value for the ESTIMATE_PERECENT parameter, AUTO_SAMPLE_SIZE. Let’s check the preference value for the ESTIMATE_PERECENT parameter on the SALES table.
15:05:34 SQL> SELECT
DBMS_STATS.GET_PREFS(pname=>'ESTIMATE_PRECENT' , ownname=>'SH', tabname=>'SALES')
FROM Dual;
DBMS_STATS.GET_PREFS(PNAME=>'ESTIMATE_PERCENT',OWNNAME=>'SH',TABNAME=>'SALES')
--------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE
We see it’s actually set to DBMS_STATS.AUTO_SAMPLE_SIZE, which is good news for us.
Rather than having to find and edit all of the existing statistics gathering script that refer to the SALES table, we can set the DBMS_STATS preference PREFERENCE_OVERRIDES_PARAMETER to TRUE on the SALES table, which will automatically override all of the none default parameter values used in the DBMS_STATS command with the corresponding preferences values.
15:05:39 SQL> BEGIN
DBMS_STATS.SET_TABLE_PREFS(ownname=>'SH', tabname=>'SALES',
pname=>'PREFERENCE_OVERRIDES_PARAMETER', pvalue=>'TRUE');
END;
/
PL/SQL PROCEDURE successfully completed.
So, in our case the ESTIMATE_PERCENT of 10% will actually be overridden and DBMS_STATS.AUTO_SAMPLE_SIZE will be used. Let’s see what impact that has on both the statistics gathering performance and the quality of the statistics gathered.
15:05:40 SQL> BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SH', tabname=>'SALES', estimate_percent=>10);
END;
/
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:00:06.59
15:05:48 SQL> SELECT TABLE_NAME, SAMPLE_SIZE
FROM user_tables
WHERE TABLE_NAME= 'SALES';
TABLE_NAME SAMPLE_SIZE
------------------------------ -----------
SALES 918843
So, you can see that the time it took to gather statistics was actually less than the time for the 10% but the sample size was the total number of rows in the table. But what about the quality of the statistics?
15:06:04 SQL> SELECT column_name, num_distinct, num_nulls, histogram
FROM user_tab_col_statistics
WHERE TABLE_NAME='SALES';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
-------------- --------------- ------------ ----------
PROD_ID 651 0 NONE
CUST_ID 630 0 TOP-FREQUENCY
TIME_ID 620 0 NONE
CHANNEL_ID 5 0 NONE
PROMO_ID 116 0 NONE
QUANTITY_SOLD 44 0 NONE
AMOUNT_SOLD 583 0 NONE
As you can see, we have a much more accurate set of statistics this time with each of the number of distinct values being correct. You will also notice we got a new type of histogram a TOP-FREQUENCY histogram this time. That’s because this new type of histogram is only gathered if the ESTIMATE_PERCENT is set to the default.
Remember setting PREFERENCE_OVERRIDES_PARAMETER to TRUE will override all none default parameters set in the DBMS_STATS.GATHER_*_STATS command, so you may need to adjust the default preferences for some of the parameters, for example METHOD_OPT if you don’t want to revert to the default.
You can set a table preference using the DBMS_STATS.SET_TABLE_PREFS procedure.
15:06:10 SQL>BEGIN
DBMS_STATS.SET_TABLE_PREFS(ownname=>'SH', tabname=>'SALES', pname=>METHOD_OPT,
pvalue=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS CUST_ID SIZE 254');
END;