The need to count distinct values of column is common operation in data analysis. It helps to decide whether an index will be useful or not. Prior to 12c database, we need to execute a query on the table to count the distinct values of column. Below is an example of the same.
select count(distinct object_name) from dummy_data;
If table is small then we can execute in above given manner but what if data is very huge? Other option is checking DBA_TAB_COLUMNS view for number of distinct. This view stores this information of table when we gather statistics of table. But what if statistics are very old or stale ? In such case we need to query the table and if table contains very huge data then this step is very time consuming.
Oracle 12c database version 18.104.22.168 introduces a APPROX_COUNT_DISTINCT function which provides approximate count distinct aggregation of the column. Processing of large volumes of data using this function is significantly faster than the exact aggregation, especially for data sets with a large number of distinct values, with negligible deviation from the exact result.
The following statement returns the approximate number of rows with distinct values for manager_id:
SELECT APPROX_COUNT_DISTINCT(manager_id) AS "Active Managers" FROM employees; Active Managers --------------- 18
The following statement returns the approximate number of distinct customers for each product:
SELECT prod_id, APPROX_COUNT_DISTINCT(cust_id) AS "Number of Customers" FROM sales GROUP BY prod_id ORDER BY prod_id; PROD_ID Number of Customers ---------- ------------------- 13 2516 14 2030 15 2105 16 2367 17 2093 18 2975 19 2630 20 3791 . . .