APPROX_COUNT_DISTINCT Function – Oracle 12c

The need to count distinct values of column is common operation in data analysis. It helps to decide whether an index will be useful or not. Prior to 12c database, we need to execute a query on the table to count the distinct values of column. Below is an example of the same.

select count(distinct object_name) from dummy_data;

If table is small then we can execute in above given manner but what if data is very huge? Other option is checking DBA_TAB_COLUMNS view for number of distinct. This view stores this information of table when we gather statistics of table. But what if statistics are very old or stale ? In such case we need to query the table and if table contains very huge data then this step is very time consuming.

Oracle 12c database version 12.1.0.2 introduces a APPROX_COUNT_DISTINCT function which provides approximate count distinct aggregation of the column. Processing of large volumes of data using this function is significantly faster than the exact aggregation, especially for data sets with a large number of distinct values, with negligible deviation from the exact result.

The following statement returns the approximate number of rows with distinct values for manager_id:

SELECT APPROX_COUNT_DISTINCT(manager_id) AS "Active Managers"
  FROM employees;

Active Managers
---------------
             18

The following statement returns the approximate number of distinct customers for each product:

SELECT prod_id, APPROX_COUNT_DISTINCT(cust_id) AS "Number of Customers"
  FROM sales
  GROUP BY prod_id
  ORDER BY prod_id;

   PROD_ID Number of Customers
---------- -------------------
        13                2516
        14                2030
        15                2105
        16                2367
        17                2093
        18                2975
        19                2630
        20                3791
. . .

Reference:

Oracle Database Online Documentation 12c

APPROX_COUNT_DISTINCT Function – Oracle 12c

Leave a Reply