Quantcast
Channel: How can I get running totals of recent rows faster? - Database Administrators Stack Exchange
Viewing all articles
Browse latest Browse all 4

Answer by Joe Obbish for How can I get running totals of recent rows faster?

$
0
0

I recommend testing with a bit more data to get a better idea of what's going on and to see how different approaches perform. I loaded 16 million rows into a table with the same structure. You can find the code to populate the table at the bottom of this answer.

The following approach takes 19 seconds on my machine:

SELECT TOP (10) seq    ,value    ,sum(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING) totalFROM dbo.[Table_1_BIG]ORDER BY seq DESC;

Actual plan here. Most of the time is spent calculating the sum and doing the sort. Worryingly, the query plan does almost all of the work for the entire result set and filters to the 10 rows that you requested at the very end. This query's runtime scales with the size of the table instead of with the size of the result set.

This option takes 23 seconds on my machine:

SELECT *    ,(        SELECT SUM(value)        FROM dbo.[Table_1_BIG]        WHERE seq <= t.seq        ) totalFROM (    SELECT TOP (10) seq        ,value    FROM dbo.[Table_1_BIG]    ORDER BY seq DESC    ) tORDER BY seq DESC;

Actual plan here. This approach scales with both the number of requested rows and the size of the table. Almost 160 million rows are read from the table:

hello

To get correct results you must sum rows for the entire table. Ideally you would perform this summation only once. It's possible to do this if you change the way that you approach the problem. You can calculate the sum for the entire table then subtract a running total from the rows in the result set. That allows you to the find the sum for the Nth row. One way to do this:

SELECT TOP (10) seq,value, [value]    - SUM([value]) OVER (ORDER BY seq DESC ROWS UNBOUNDED PRECEDING)+ (SELECT SUM([value]) FROM dbo.[Table_1_BIG]) AS totalFROM dbo.[Table_1_BIG]ORDER BY seq DESC;

Actual plan here. The new query runs in 644 ms on my machine. The table is scanned once to get the complete total then an additional row is read for each row in the result set. There's no sorting and nearly all of the time is spent calculating the sum in the parallel part of the plan:

pretty_good

If you'd like for this query to go even faster you just need to optimize the part that calculates the complete sum. The above query does a clustered index scan. The clustered index includes all columns but you only need the [value] column. One option is to create a nonclustered index on that column. Another option is to create a nonclustered columnstore index on that column. Both will improve performance. If you're on Enterprise a great option is to create an indexed view like the following:

CREATE OR ALTER VIEW dbo.Table_1_BIG__SUMWITH SCHEMABINDINGASSELECT SUM([value]) SUM_VALUE, COUNT_BIG(*) FOR_UFROM dbo.[Table_1_BIG];GOCREATE UNIQUE CLUSTERED INDEX CI ON dbo.Table_1_BIG__SUM (SUM_VALUE);

This view returns a single row so it takes up almost no space. There will be a penalty when doing DML but it shouldn't be much different than index maintenance. With the indexed view in play the query now takes 0 ms:

enter image description here

Actual plan here. The best part about this approach is the runtime isn't changed by the size of the table. The only thing that matters is how many rows are returned. For example, if you get the first 10000 rows the query now takes 18 ms to execute.

Code to populate table:

DROP TABLE IF EXISTS dbo.[Table_1_BIG];CREATE TABLE dbo.[Table_1_BIG] (    [seq] [int] NOT NULL,    [value] [bigint] NOT NULL);DROP TABLE IF EXISTS #t;CREATE TABLE #t (ID BIGINT);INSERT INTO #t WITH (TABLOCK)SELECT TOP (4000) -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))FROM master..spt_values t1CROSS JOIN master..spt_values t2OPTION (MAXDOP 1);INSERT INTO dbo.[Table_1_BIG] WITH (TABLOCK)SELECT t1.ID * 4000 + t2.ID, 8 * t2.ID + t1.IDFROM (SELECT TOP (4000) ID FROM #t) t1CROSS JOIN #t t2;ALTER TABLE dbo.[Table_1_BIG]ADD CONSTRAINT [PK_Table_1] PRIMARY KEY ([seq]);

Viewing all articles
Browse latest Browse all 4

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>