I am the founder and CEO of Data Geekery GmbH, located in Zurich, Switzerland. With our company, we have been selling database products and services around Java and SQL since 2013. Ever since my Master's studies at EPFL in 2006, I have been fascinated by the interaction of Java and SQL. Most of this experience I have obtained in the Swiss E-Banking field through various variants (JDBC, Hibernate, mostly with Oracle). I am happy to share this knowledge at various conferences, JUGs, in-house presentations and on our blog. Lukas is a DZone MVB and is not an employee of DZone and has posted 255 posts at DZone.

How to Use Recursive SQL for Data Normalization

10.28.2013

Recursive SQL can be awesome, although a bit hard to read in its SQL standard beauty. Let’s assume you have some aggregated data with dates and a number of events per date:

|                           DATE | COUNT |
|--------------------------------|-------|
| October, 01 2013 00:00:00+0000 |     2 |
| October, 02 2013 00:00:00+0000 |     1 |
| October, 03 2013 00:00:00+0000 |     3 |
| October, 04 2013 00:00:00+0000 |     4 |
| October, 05 2013 00:00:00+0000 |     2 |
| October, 06 2013 00:00:00+0000 |     0 |
| October, 07 2013 00:00:00+0000 |     2 |

Now let’s assume you want to normalise or “unaggregate” this data, generating one record per event, i.e. “COUNT” records per date, numbered from 1. The desired output is this:

|                           DATE | EVENT_NUMBER |
|--------------------------------|--------------|
| October, 01 2013 00:00:00+0000 |            1 |
| October, 01 2013 00:00:00+0000 |            2 |
| October, 02 2013 00:00:00+0000 |            1 |
| October, 03 2013 00:00:00+0000 |            1 |
| October, 03 2013 00:00:00+0000 |            2 |
| October, 03 2013 00:00:00+0000 |            3 |
| October, 04 2013 00:00:00+0000 |            1 |
| October, 04 2013 00:00:00+0000 |            2 |
| October, 04 2013 00:00:00+0000 |            3 |
| October, 04 2013 00:00:00+0000 |            4 |
| October, 05 2013 00:00:00+0000 |            1 |
| October, 05 2013 00:00:00+0000 |            2 |
| October, 07 2013 00:00:00+0000 |            1 |
| October, 07 2013 00:00:00+0000 |            2 |

As you may have noticed, there are no records for those dates with zero events (October 06). With recursive SQL, this is rather simple to achieve.

with recursive
 
-- Data could also be a regular table containing
-- the actual data
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
),
 
-- This is the recursive common table expression
-- It starts with all data where count > 0
-- ... and then recurses by subtracting one
recurse(date, count) as (
  select date, count
  from data
  where count > 0
  union all
  select date, count - 1
  from recurse
  where count > 1
)
select date, count event_number from recurse
order by date asc, event_number asc;
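
To see why the recursion yields exactly COUNT rows per date, it can help to replay the evaluation by hand. The following Python sketch is not part of the original query; it only mimics, under simplifying assumptions, how the recurse CTE could be evaluated one pass at a time. The names DATA and unaggregate are made up for this illustration.

```python
from datetime import date

# Same sample data as the data(date, count) CTE above.
DATA = [
    (date(2013, 10, 1), 2),
    (date(2013, 10, 2), 1),
    (date(2013, 10, 3), 3),
    (date(2013, 10, 4), 4),
    (date(2013, 10, 5), 2),
    (date(2013, 10, 6), 0),
    (date(2013, 10, 7), 2),
]


def unaggregate(rows):
    """Mimic the recursive CTE: seed once, then repeat the recursive step."""
    # Seed (first SELECT of the CTE): all rows where count > 0.
    working = [(d, c) for d, c in rows if c > 0]
    result = []
    while working:
        # Every pass adds its rows to the overall result (UNION ALL).
        result.extend(working)
        # Recursive step: subtract one, but only from rows where count > 1.
        working = [(d, c - 1) for d, c in working if c > 1]
    # Outer SELECT: order by date, then by event number.
    return sorted(result)


events = unaggregate(DATA)
for d, event_number in events:
    print(d.isoformat(), event_number)
```

Each pass shrinks the working set, because a row with count 1 is emitted but no longer fed back into the next pass. That is what makes the recursion stop.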

See also this SQLFiddle to see the above CTE in action.
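
If the SQLFiddle is not at hand, the query can also be tried locally. Here is a minimal sketch using Python's built-in sqlite3 module, assuming a SQLite version with recursive CTE support (3.8.3 or later). SQLite has no date literal syntax, so this sketch stores the dates as ISO strings and renames the columns to day and cnt. Those changes, and the names QUERY and run_query, are assumptions of the sketch and not part of the original query.

```python
import sqlite3

# The recursive CTE from the article, adapted for SQLite:
# ISO date strings instead of date literals, columns renamed to day/cnt.
QUERY = """
with recursive
data(day, cnt) as (
  select '2013-10-01', 2 union all
  select '2013-10-02', 1 union all
  select '2013-10-03', 3 union all
  select '2013-10-04', 4 union all
  select '2013-10-05', 2 union all
  select '2013-10-06', 0 union all
  select '2013-10-07', 2
),
recurse(day, cnt) as (
  select day, cnt
  from data
  where cnt > 0
  union all
  select day, cnt - 1
  from recurse
  where cnt > 1
)
select day, cnt as event_number
from recurse
order by day asc, event_number asc
"""


def run_query():
    """Run the query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows = run_query()
for day, event_number in rows:
    print(day, event_number)
```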

Incredibly, Oracle’s CONNECT BY clause doesn’t seem to be an option here. I challenge you to find a better solution, though! For instance, here is a beautiful solution that works with PostgreSQL, using generate_series():

with recursive
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
)
select date, generate_series(1, count) event_number
from data
where count > 0
order by date asc, event_number asc;
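
For readers who have not used generate_series() before, here is a rough Python sketch of what the query above does with it: every input row is expanded into the numbers 1 to count. This is only an illustration of the idea, not of how PostgreSQL implements it; DATA and expand are hypothetical names, and range(1, c + 1) stands in for generate_series(1, count).

```python
from datetime import date

# Same sample data as the data(date, count) CTE above.
DATA = [
    (date(2013, 10, 1), 2),
    (date(2013, 10, 2), 1),
    (date(2013, 10, 3), 3),
    (date(2013, 10, 4), 4),
    (date(2013, 10, 5), 2),
    (date(2013, 10, 6), 0),
    (date(2013, 10, 7), 2),
]


def expand(rows):
    """Emit (date, event_number) pairs, numbering events from 1 to count."""
    expanded = [
        (d, event_number)
        for d, c in rows
        if c > 0  # mirrors the WHERE count > 0 filter
        for event_number in range(1, c + 1)
    ]
    # Mirrors ORDER BY date, event_number.
    return sorted(expanded)


expanded = expand(DATA)
for d, event_number in expanded:
    print(d.isoformat(), event_number)
```

No recursion is involved here: each row is expanded on its own, which is why this variant is shorter than the recursive one.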

Published at DZone with permission of Lukas Eder, author and DZone MVB.

Comments

David Johnston replied on Tue, 2013/10/29 - 4:04pm

With respect to the PostgreSQL version:

The "recursive" is pointless; the query is a simple non-recursive one.

Also, the "SELECT ... UNION ALL SELECT ..." idiom is functional but not typical; a VALUES expression is much easier to code and understand, IMO.

with
data(date, count) as (
  VALUES
    (date '2013-10-01', 2),
    (date '2013-10-02', 1),
    (date '2013-10-03', 3),
    (date '2013-10-04', 4),
    (date '2013-10-05', 2),
    (date '2013-10-06', 0),
    (date '2013-10-07', 2)
)
select date, generate_series(1, count) AS event_number
from data
where count > 0
order by date asc, event_number asc;

Lukas Eder replied on Wed, 2013/10/30 - 1:45am in response to: David Johnston

Thanks, David. You're right, of course. The "recursive" was a copy-paste error, and the "VALUES" table value constructor is something I keep forgetting about, unfortunately. I wish all databases supported it.
