Often times we need to deal with spans of time in SQL. There are challenges to finding and removing overlapping spans. In this article, I will share two of the methods I’ve used to deal with this problem. I can’t take credit for initially creating them, but I’m happy to make your work easier by passing them on.
First, we need some tables and and a small set of data. Below is a small script to create a table and insert two members with multiple overlapping spans. Please also notice that MemberID 2 has a gap between 4/16/2015 and 4/30/2015. When these spans are combined we should have one span for MemberID 1 and two spans for MemberID 2.
CREATE TABLE MembershipSpans(
MembershipSpanID INT IDENTITY(1,1) PRIMARY KEY,
MemberID int NOT NULL,
StartDate date NOT NULL,
EndDate DATE NOT NULL)
go
INSERT INTO dbo.MembershipSpans(MemberID, StartDate, EndDate)
VALUES
( 1, ‘1/1/2015’, ‘3/31/2015’),
( 1, ‘1/25/2015’, ‘4/15/2015’),
( 1, ‘3/10/2015’, ‘3/16/2015’),
( 1, ‘4/1/2015’, ‘6/30/2015’),
( 1, ‘5/1/2015′, ’12/31/2015’),
( 2, ‘2/1/2015’, ‘3/31/2015’),
( 2, ‘3/25/2015’, ‘4/15/2015’),
( 2, ‘2/10/2015’, ‘3/16/2015’),
( 2, ‘5/1/2015’, ‘6/30/2015’),
( 2, ‘6/1/2015′, ’12/31/2015’)
Method 1: Simple but slow
This method creates a record for each day and then creates the spans based on continuous dates. First, we need an easy way to create the daily records. Here is a simple function that I found online to help.
CREATE Function [dbo].[fnDateTable]
(
@StartDate datetime,
@EndDate datetime,
@DayPart char(5) — support ‘day’,’month’,’year’,’hour’, default ‘day’
)
Returns @Result Table
([Date] date)
As
Begin
Declare @CurrentDate datetime
Set @CurrentDate=@StartDate
While @CurrentDate<=@EndDate
Begin
Insert Into @Result Values (@CurrentDate)
Select @CurrentDate=
Case
When @DayPart=’year’ Then DateAdd(yy,1,@CurrentDate)
When @DayPart=’month’ Then DateAdd(mm,1,@CurrentDate)
When @DayPart=’hour’ Then DateAdd(hh,1,@CurrentDate)
Else
DateAdd(dd,1,@CurrentDate)
End
End
Return
END
Next we create our daily records and group the consecutive days with a common “DateGroup”. The “DateGroup” works by giving a ranking number to each record and then subtracting that number of days from the date on that member’s daily record. This gives consecutive days the same “DateGroup”.
SELECT
m.*,
dt.[Date],
DateGroup = DATEADD(day, -DENSE_RANK() OVER (PARTITION BY MemberID ORDER BY [Date] asc), [Date])
INTO #MemberDays
FROM
MembershipSpans m
INNER JOIN
dbo.fnDateTable(‘1/1/2005′, ’12/31/2017’, ‘day’) dt
ON
dt.[Date] BETWEEN m.StartDate AND m.EndDate
Finally, we create spans by taking the minimum and maximum dates for each “DateGroup” per member. I added an “seq” column just to easily see how many spans are in each member, but it is not necessary.
SELECT
MemberID,
MIN([Date]) StartDate,
MAX([Date]) EndDate,
seq = ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY MIN([Date]) ASC)
FROM
#MemberDays
GROUP BY
MemberID,
DateGroup
Here are the results.
As you can see, the resulting spans correctly represent each member’s time periods. Also, notice the gap discussed earlier.
Method 2: Complicated but fast
Method 2 uses multiple CTEs to create a table with a row for each “StartDate” and each “EndDate”. It also adds a “DateGroup” column and ends by doing the same thing that method 1 did. See the code below:
WITH C1 AS
(
SELECT MemberID, StartDate AS ts, +1 AS type, NULL AS e,
ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY StartDate, MemberID) AS s
FROM
dbo.MembershipSpans
UNION ALL
SELECT MemberID, EndDate AS ts, -1 AS type,
ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY EndDate, MemberID) AS e,
NULL AS s
FROM
dbo.MembershipSpans
),
C2 AS
(
SELECT C1.*, ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY ts, type DESC, MemberID) AS se
FROM
C1
),
C3 AS
(
SELECT MemberID, ts,
FLOOR((ROW_NUMBER() OVER(PARTITION by MemberID ORDER BY ts) – 1) / 2 + 1) AS DateGroup
FROM
C2
WHERE
COALESCE(s – (se – s) – 1, (se – e) – e) = 0
)
SELECT MemberID, MIN(ts) AS StartDate, max(ts) AS EndDate,
seq = ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY MIN(ts) ASC)
FROM
C3
GROUP BY
MemberID ,
DateGroup;
Here are the results:
As you can see, the results are the same. To keep this post brief, I’m not going into detail as to exactly how “DateGroup” column is generated or how the “Coalesce” statement in the “Where” clause works. We’ll just call it crazy math (it really isn’t that crazy). Feel free to discover how it works for yourself.
Both methods are accurate, so speed is what we care about. For small sets of data, either method will be fast. But what about larger data sets? To test, I created a set of 5,000 members each with 20 spans on my development VM. Some spans were overlapping and some were not. Both returned the same results. Method 1 took too long for me to wait probably due to the huge temp table. Method 2 only took 364 milliseconds. As you can see, except for a few special situations, you should pretty much always use Method 2.
In this post I demonstrated two ways to tackle a common SQL problem. There are numerous other ways and I’ve tested many of them. In the end, I almost always use Method 2 due it’s speed. I hope this post helps to make you SQL work a little easier. As always, feel free to comment or contact me with any questions.