Overlapping Date Spans in SQL vs Some Crazy Math

Often times we need to deal with spans of time in SQL. There are challenges to finding and removing overlapping spans. In this article, I will share two of the methods I’ve used to deal with this problem. I can’t take credit for initially creating them, but I’m happy to make your work easier by passing them on.

First, we need some tables and and a small set of data. Below is a small script to create a table and insert two members with multiple overlapping spans. Please also notice that MemberID 2 has a gap between 4/16/2015 and 4/30/2015. When these spans are combined we should have one span for MemberID 1 and two spans for MemberID 2.

CREATE TABLE MembershipSpans(
MembershipSpanID INT IDENTITY(1,1) PRIMARY KEY,
MemberID int NOT NULL,
StartDate date NOT NULL,
EndDate DATE NOT NULL)
go

INSERT INTO dbo.MembershipSpans(MemberID, StartDate, EndDate)
VALUES
( 1, ‘1/1/2015’, ‘3/31/2015’),
( 1, ‘1/25/2015’, ‘4/15/2015’),
( 1, ‘3/10/2015’, ‘3/16/2015’),
( 1, ‘4/1/2015’, ‘6/30/2015’),
( 1, ‘5/1/2015′, ’12/31/2015’),

( 2, ‘2/1/2015’, ‘3/31/2015’),
( 2, ‘3/25/2015’, ‘4/15/2015’),
( 2, ‘2/10/2015’, ‘3/16/2015’),
( 2, ‘5/1/2015’, ‘6/30/2015’),
( 2, ‘6/1/2015′, ’12/31/2015’)

Method 1: Simple but slow

This method creates a record for each day and then creates the spans based on continuous dates. First, we need an easy way to create the daily records. Here is a simple function that I found online to help.

CREATE Function [dbo].[fnDateTable]
(
@StartDate datetime,
@EndDate datetime,
@DayPart char(5) — support ‘day’,’month’,’year’,’hour’, default ‘day’
)
Returns @Result Table
([Date] date)
As
Begin
Declare @CurrentDate datetime
Set @CurrentDate=@StartDate
While @CurrentDate<=@EndDate
Begin
   Insert Into @Result Values (@CurrentDate)
   Select @CurrentDate=
      Case
      When @DayPart=’year’ Then DateAdd(yy,1,@CurrentDate)
      When @DayPart=’month’ Then DateAdd(mm,1,@CurrentDate)
      When @DayPart=’hour’ Then DateAdd(hh,1,@CurrentDate)
      Else
      DateAdd(dd,1,@CurrentDate)
      End
End
Return
END

Next we create our daily records and group the consecutive days with a common “DateGroup”. The “DateGroup” works by giving a ranking number to each record and then subtracting that number of days from the date on that member’s daily record. This gives consecutive days the same “DateGroup”.

SELECT
m.*,
dt.[Date],
DateGroup = DATEADD(day, -DENSE_RANK() OVER (PARTITION BY                                  MemberID ORDER BY [Date] asc), [Date])
INTO #MemberDays
FROM
MembershipSpans m
INNER JOIN
dbo.fnDateTable(‘1/1/2005′, ’12/31/2017’, ‘day’) dt
   ON
   dt.[Date] BETWEEN m.StartDate AND m.EndDate

Finally, we create spans by taking the minimum and maximum dates for each “DateGroup” per member. I added an “seq” column just to easily see how many spans are in each member, but it is not necessary.

SELECT
MemberID,
MIN([Date]) StartDate,
MAX([Date]) EndDate,
seq = ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY MIN([Date]) ASC)
FROM
#MemberDays
GROUP BY
MemberID,
DateGroup

Here are the results.

As you can see, the resulting spans correctly represent each member’s time periods. Also, notice the gap discussed earlier.

Method 2: Complicated but fast

Method 2 uses multiple CTEs to create a table with a row for each “StartDate” and each “EndDate”. It also adds a “DateGroup” column and ends by doing the same thing that method 1 did. See the code below:

WITH C1 AS
(
SELECT MemberID, StartDate AS ts, +1 AS type, NULL AS e,
ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY StartDate, MemberID) AS s
FROM
dbo.MembershipSpans

UNION ALL

SELECT MemberID, EndDate AS ts, -1 AS type,
ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY EndDate, MemberID) AS e,
NULL AS s
FROM
dbo.MembershipSpans
),
C2 AS
(
SELECT C1.*, ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY ts, type DESC, MemberID) AS se

FROM
C1
),
C3 AS
(
SELECT MemberID, ts,
FLOOR((ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY ts) – 1) / 2 + 1) AS DateGroup

FROM
C2
WHERE
COALESCE(s – (se – s) – 1, (se – e) – e) = 0
)
SELECT MemberID, MIN(ts) AS StartDate, max(ts) AS EndDate,
seq = ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY MIN(ts) ASC)
FROM
C3
GROUP BY
MemberID ,
DateGroup;

Here are the results:

As you can see, the results are the same. To keep this post brief, I’m not going into detail as to exactly how “DateGroup” column is generated or how the “Coalesce” statement in the “Where” clause works. We’ll just call it crazy math (it really isn’t that crazy). Feel free to discover how it works for yourself.

Both methods are accurate, so speed is what we care about. For small sets of data, either method will be fast. But what about larger data sets? To test, I created a set of 5,000 members each with 20 spans on my development VM. Some spans were overlapping and some were not. Both returned the same results. Method 1 took too long for me to wait probably due to the huge temp table. Method 2 only took 364 milliseconds. As you can see, except for a few special situations, you should pretty much always use Method 2.

In this post I demonstrated two ways to tackle a common SQL problem. There are numerous other ways and I’ve tested many of them. In the end, I almost always use Method 2 due it’s speed. I hope this post helps to make you SQL work a little easier. As always, feel free to comment or contact me with any questions.