feat: optimizer rule for windowed sort (#4874)

* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* implement physical rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: install windowed sort physical rule and optimize partition ranges

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add logs and sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: introduce PartSortExec for partitioned sorting

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune exec nodes' properties and metrics

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* debug: add more info on very wrong

* debug: also print overlap ranges

* feat: add check when emit PartSort Stream

* dbg: info on overlap working range

* feat: check batch range is inside part range

* set distinguish partition range param

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: more logs

* update sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune optimizer

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix lints

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix windowed sort rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: early terminate sort stream

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: remove min/max check

* chore: remove unused windowed_sort module, uuid feature and refactor region_scanner to synchronous

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: print more fuzz log

* chore: more log

* fix: part sort should skip empty part

* chore: remove insert logs

* tests: empty PartitionRange

* refactor: testcase

* docs: update comment & tests: all empty

* ci: enlarge etcd cpu limit

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: discord9 <discord9@163.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
Ruihang Xia
2024-10-29 15:46:05 +08:00
committed by GitHub
parent 0ee455a980
commit 03f2fa219d
21 changed files with 930 additions and 230 deletions


@@ -0,0 +1,81 @@
CREATE TABLE test(i INTEGER, t TIMESTAMP TIME INDEX);
Affected Rows: 0
INSERT INTO test VALUES (1, 1), (NULL, 2), (1, 3);
Affected Rows: 3
ADMIN FLUSH_TABLE('test');
+---------------------------+
| ADMIN FLUSH_TABLE('test') |
+---------------------------+
| 0                         |
+---------------------------+
INSERT INTO test VALUES (2, 4), (2, 5), (NULL, 6);
Affected Rows: 3
ADMIN FLUSH_TABLE('test');
+---------------------------+
| ADMIN FLUSH_TABLE('test') |
+---------------------------+
| 0                         |
+---------------------------+
INSERT INTO test VALUES (3, 7), (3, 8), (3, 9);
Affected Rows: 3
ADMIN FLUSH_TABLE('test');
+---------------------------+
| ADMIN FLUSH_TABLE('test') |
+---------------------------+
| 0                         |
+---------------------------+
INSERT INTO test VALUES (4, 10), (4, 11), (4, 12);
Affected Rows: 3
SELECT * FROM test ORDER BY t LIMIT 5;
+---+-------------------------+
| i | t                       |
+---+-------------------------+
| 1 | 1970-01-01T00:00:00.001 |
|   | 1970-01-01T00:00:00.002 |
| 1 | 1970-01-01T00:00:00.003 |
| 2 | 1970-01-01T00:00:00.004 |
| 2 | 1970-01-01T00:00:00.005 |
+---+-------------------------+
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE SELECT * FROM test ORDER BY t LIMIT 5;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_GlobalLimitExec: skip=0, fetch=5 REDACTED
|_|_|_SortPreservingMergeExec: [t@1 ASC NULLS LAST] REDACTED
|_|_|_WindowedSortExec REDACTED
|_|_|_PartSortExec t@1 ASC NULLS LAST REDACTED
|_|_|_SeqScan: region=REDACTED, partition_count=2 (1 memtable ranges, 1 file 1 ranges) REDACTED
|_|_|_|
|_|_| Total rows: 5_|
+-+-+-+
DROP TABLE test;
Affected Rows: 0


@@ -0,0 +1,26 @@
CREATE TABLE test(i INTEGER, t TIMESTAMP TIME INDEX);
INSERT INTO test VALUES (1, 1), (NULL, 2), (1, 3);
ADMIN FLUSH_TABLE('test');
INSERT INTO test VALUES (2, 4), (2, 5), (NULL, 6);
ADMIN FLUSH_TABLE('test');
INSERT INTO test VALUES (3, 7), (3, 8), (3, 9);
ADMIN FLUSH_TABLE('test');
INSERT INTO test VALUES (4, 10), (4, 11), (4, 12);
SELECT * FROM test ORDER BY t LIMIT 5;
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE SELECT * FROM test ORDER BY t LIMIT 5;
DROP TABLE test;
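
Note: the plan above stacks `PartSortExec` under `WindowedSortExec` for an `ORDER BY t LIMIT 5` query. The following is only a rough Python sketch of that idea, not the Rust implementation in this commit. It assumes the partition time ranges do not overlap (the commit log suggests the real operators also handle overlapping ranges), and the names `part_sort` and `windowed_sort` are hypothetical. It sorts each partition on its own, skips empty partitions, and stops as soon as the limit is reached.

```python
from typing import List, Optional, Tuple

# (i, t), mirroring the columns of the test table.
Row = Tuple[Optional[int], int]


def part_sort(partitions: List[List[Row]]) -> List[List[Row]]:
    """Sort each partition by t independently, skipping empty partitions."""
    return [sorted(p, key=lambda r: r[1]) for p in partitions if p]


def windowed_sort(partitions: List[List[Row]], limit: int) -> List[Row]:
    """Concatenate sorted runs in time order and stop early at `limit`.

    Assumption: partition time ranges do not overlap, so ordering the runs
    by their smallest timestamp yields a globally sorted stream.
    """
    runs = part_sort(partitions)
    runs.sort(key=lambda run: run[0][1])
    out: List[Row] = []
    for run in runs:
        for row in run:
            if len(out) == limit:
                return out  # early termination: later runs are never read
            out.append(row)
    return out


# Same rows as the sqlness case, plus one empty partition.
partitions = [
    [(1, 1), (None, 2), (1, 3)],
    [(2, 4), (2, 5), (None, 6)],
    [],
    [(3, 7), (3, 8), (3, 9)],
    [(4, 10), (4, 11), (4, 12)],
]
result = windowed_sort(partitions, 5)
```

Under these assumptions `result` holds the same five rows, in the same order, as the `SELECT * FROM test ORDER BY t LIMIT 5` output recorded in the result file, and the last two partitions are never consumed.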