* feat/manual-compaction-parallelism:
### Add Parallelism Support to Compaction Requests
- **`Cargo.lock` & `Cargo.toml`**: Updated `greptime-proto` dependency to a new revision.
- **`flush_compact_table.rs`**: Enhanced `parse_compact_params` to support a new `parallelism` parameter, allowing users to
specify the level of parallelism for table compaction.
- **`handle_compaction.rs`**: Integrated `parallelism` into the compaction scheduling process, defaulting to 1 if not
specified.
- **`request.rs` & `region_request.rs`**: Modified `CompactRequest` to include `parallelism`, with logic to handle unspecified
values.
- **`requests.rs`**: Updated `CompactTableRequest` structure to include an optional `parallelism` field.
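
The defaulting described above can be sketched as follows. This is a minimal illustration, not the code from this change: the helper name `resolve_parallelism` is hypothetical, and it assumes an unspecified value arrives either as `None` or as `0` (the usual reading of an unset numeric protobuf field).

```rust
/// Fallback used when the caller does not specify a parallelism.
const DEFAULT_PARALLELISM: u32 = 1;

/// Hypothetical helper: maps an optional, possibly-zero request value to the
/// parallelism the compaction scheduler should use.
fn resolve_parallelism(requested: Option<u32>) -> u32 {
    match requested {
        // Assumption: 0 means "not specified", like an unset proto3 scalar.
        Some(0) | None => DEFAULT_PARALLELISM,
        Some(n) => n,
    }
}
```

Treating `0` as unspecified keeps the scheduler from ever being asked to run with zero workers.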
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/manual-compaction-parallelism:
### Enhance Compaction Request Handling
- **`flush_compact_table.rs`**:
  - Renamed `parse_compact_params` to `parse_compact_request`.
  - Introduced `DEFAULT_COMPACTION_PARALLELISM` constant.
  - Updated parsing logic to handle keyword arguments for `strict_window` and `regular` compaction types, including `parallelism` and `window`.
  - Modified tests to reflect changes in parsing logic and default parallelism handling.
- **`request.rs`**:
  - Updated `parallelism` handling in `RegionRequestBody::Compact` to use the new default value.
- **`requests.rs`**:
  - Changed `CompactTableRequest` to use a non-optional `parallelism` field with a default value of 1.
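
A rough sketch of keyword-argument parsing with a default parallelism is shown below. It is illustrative only: the `CompactKind` and `CompactRequest` types, the comma-separated `key=value` option syntax, the string-slice parameters, and the rule that `window` is rejected for `regular` compaction are all assumptions made for this example, not the actual signatures in `flush_compact_table.rs`.

```rust
/// Parallelism used when the caller does not pass `parallelism=...`.
const DEFAULT_COMPACTION_PARALLELISM: u32 = 1;

#[derive(Debug, PartialEq)]
enum CompactKind {
    Regular,
    StrictWindow { window_secs: Option<i64> },
}

#[derive(Debug, PartialEq)]
struct CompactRequest {
    table: String,
    kind: CompactKind,
    parallelism: u32,
}

/// Hypothetical parser: `params` is `[table, compaction type, options]`,
/// where the last two entries are optional.
fn parse_compact_request(params: &[&str]) -> Result<CompactRequest, String> {
    if params.is_empty() || params.len() > 3 {
        return Err(format!("expected 1 to 3 parameters, got {}", params.len()));
    }

    let mut parallelism = DEFAULT_COMPACTION_PARALLELISM;
    let mut window_secs = None;

    // Assumed option syntax: "key=value" pairs separated by commas.
    if let Some(options) = params.get(2) {
        for pair in options.split(',').map(str::trim).filter(|p| !p.is_empty()) {
            let (key, value) = pair
                .split_once('=')
                .ok_or_else(|| format!("expected key=value, got `{pair}`"))?;
            let value = value.trim();
            match key.trim().to_ascii_lowercase().as_str() {
                "parallelism" => {
                    parallelism = value
                        .parse::<u32>()
                        .map_err(|e| format!("invalid parallelism `{value}`: {e}"))?;
                }
                "window" => {
                    let secs = value
                        .parse::<i64>()
                        .map_err(|e| format!("invalid window `{value}`: {e}"))?;
                    window_secs = Some(secs);
                }
                other => return Err(format!("unknown option `{other}`")),
            }
        }
    }

    let kind = match params.get(1).map(|k| k.to_ascii_lowercase()).as_deref() {
        None | Some("regular") => {
            // Assumption: `window` is only meaningful for strict_window.
            if window_secs.is_some() {
                return Err("`window` only applies to strict_window".to_string());
            }
            CompactKind::Regular
        }
        Some("strict_window") => CompactKind::StrictWindow { window_secs },
        Some(other) => return Err(format!("unknown compaction type `{other}`")),
    };

    Ok(CompactRequest {
        table: params[0].to_string(),
        kind,
        parallelism,
    })
}
```

With this sketch, `parse_compact_request(&["t"])` yields a regular compaction with parallelism 1, while a malformed option such as `parallelism=x` is reported as an error rather than silently falling back to the default.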
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/manual-compaction-parallelism:
### Update `flush_compact_table.rs` Parameter Validation
- Tightened parameter validation in the `parse_compact_request` function of `flush_compact_table.rs`, reducing the maximum number of accepted parameters from 4 to 3.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/manual-compaction-parallelism:
Update `greptime-proto` dependency
- Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: struct value
Signed-off-by: Ning Sun <sunning@greptime.com>
* feat: update for proto module
* feat: wip struct type
* feat: implement more vector operations
* feat: make datatype and api
* feat: resolve some compilation issues
* feat: resolve all compilation issues
* chore: format update
* test: resolve tests
* test: test and refactor value-to-pb
* feat: add more tests and fix for value types
* chore: remove dbg
* feat: test and fix iterator
* fix: resolve struct_type issue
* refactor: use vec for struct items
* chore: update proto to main branch
* refactor: address some of review issues
* refactor: update for further review
* Add validation on new methods
* feat: update struct/list json serialization
* refactor: reimplement get in struct_vector
* refactor: struct vector functions
* refactor: fix lint issue
* refactor: address review comments
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
* feat: use correct projection index for old format
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: remove allow dead_code from format
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: check and convert old format to flat format
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: subtract primary key num from projection
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: always convert the batch in FlatRowGroupReader
Signed-off-by: evenyag <realevenyag@gmail.com>
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: Change &Option<&[]> to Option<&[]>
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: only build arrow schema once
Adds a method flat_sst_arrow_schema_column_num() to get the number of fields.
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: Handle flat format and old format separately
Adds two structs ParquetFlat and ParquetPrimaryKeyToFlat.
ParquetPrimaryKeyToFlat delegates stats and projection to the
PrimaryKeyReadFormat.
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: handle non string tag correctly
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: do not register file cache twice
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: clean temp files
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: add rows and bytes to flush success log
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: convert format in memtable
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: add compaction flag to ScanInput
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: compaction should use old format for sparse encoding
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: merge schema use old format in sparse encoding
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: read legacy format without converting if skip_auto_convert is set
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: support sparse encoding in bulk parts
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: add datafusion-postgres dependency
* refactor: move and include pg_catalog udfs
* chore: update upstream
* feat: register table function pg_get_keywords
* feat: bridge CatalogInfo for our CatalogManager
Signed-off-by: Ning Sun <sunning@greptime.com>
* feat: convert pg_catalog table to our system table
* feat: bridge system catalog with datafusion-postgres
Signed-off-by: Ning Sun <sunning@greptime.com>
* feat: add more udfs
* feat: add compatibility rewriter to postgres handler
* fix: various fixes
* fmt: fix
* fix: use functions from pg_catalog library
* fmt
* fix: sqlness runner
Signed-off-by: Ning Sun <sunning@greptime.com>
* test: adapt to arrow 56.0 to 56.1 memory size change
* fix: add additional udfs
* chore: format
* refactor: return None when creating system table failed
Signed-off-by: Ning Sun <sunning@greptime.com>
* chore: provide safety comments about expect usage
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
fix/disable-parquet-stats-truncate:
- **Update `memcomparable` Dependency**: Switched from crates.io to a Git repository for `memcomparable` in `Cargo.lock`, `mito-codec/Cargo.toml`, and removed it from `mito2/Cargo.toml`.
- **Enhance Parquet Writer Properties**: Added `set_statistics_truncate_length` and `set_column_index_truncate_length` to `WriterProperties` in `parquet.rs`, `bulk/part.rs`, `partition_tree/data.rs`, and `writer.rs`.
- **Add Test for Corrupt Scan**: Introduced a new test module `scan_corrupt.rs` in `mito2/src/engine` to verify handling of corrupt data.
- **Update Test Data**: Modified test data in `flush.rs` to reflect changes in file sizes and sequences.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-sequence-on-region-edit:
### Refactor `get_last_seq_num` Method Across Engines
- **Change Return Type**: Updated the `get_last_seq_num` method to return `Result<SequenceNumber, BoxedError>` instead of `Result<Option<SequenceNumber>, BoxedError>` in the following files:
  - `src/datanode/src/tests.rs`
  - `src/file-engine/src/engine.rs`
  - `src/metric-engine/src/engine.rs`
  - `src/metric-engine/src/engine/read.rs`
  - `src/mito2/src/engine.rs`
  - `src/query/src/optimizer/test_util.rs`
  - `src/store-api/src/region_engine.rs`
- **Enhance Region Edit Handling**: Modified `RegionWorkerLoop` in `src/mito2/src/worker/handle_manifest.rs` to update file sequences during region edits.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* add committed_sequence to RegionEdit
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-sequence-on-region-edit:
### Refactor Sequence Retrieval Method
- **Renamed Method**: Changed `get_last_seq_num` to `get_committed_sequence` across multiple files to better reflect its purpose of retrieving the latest committed sequence.
  - Affected files: `tests.rs`; `engine.rs` in `file-engine`, `metric-engine`, and `mito2`; `test_util.rs`; and `region_engine.rs`.
- **Removed Unused Struct**: Deleted `RegionSequencesRequest` struct from `region_request.rs` as it is no longer needed.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-sequence-on-region-edit:
**Add Committed Sequence Handling in Region Engine**
- **`engine.rs`**: Introduced a new test module `bump_committed_sequence_test` to verify committed sequence handling.
- **`bump_committed_sequence_test.rs`**: Added a test to ensure the committed sequence is correctly updated and persisted across region reopenings.
- **`action.rs`**: Updated `RegionManifest` and `RegionManifestBuilder` to include `committed_sequence` for tracking.
- **`manager.rs`**: Adjusted manifest size assertion to accommodate new committed sequence data.
- **`opener.rs`**: Implemented logic to override committed sequence during region opening.
- **`version.rs`**: Added `set_committed_sequence` method to update the committed sequence in `VersionControl`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-sequence-on-region-edit:
**Enhance `test_bump_committed_sequence` in `bump_committed_sequence_test.rs`**
- Updated the test to include row operations using `build_rows`, `put_rows`, and `rows_schema` to verify the committed sequence behavior.
- Adjusted assertions to reflect changes in committed sequence after row operations and region edits.
- Added comments to clarify the expected behavior of committed sequence after reopening the region and replaying the WAL.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-sequence-on-region-edit:
**Enhance Region Sequence Management**
- **`bump_committed_sequence_test.rs`**: Updated test to handle region reopening and sequence management, ensuring committed sequences are correctly set and verified after edits.
- **`opener.rs`**: Improved committed sequence handling by overriding it only if the manifest's sequence is greater than the replayed sequence. Added logging for mutation sequence replay.
- **`region_write_ctx.rs`**: Modified `push_mutation` and `push_bulk` methods to adopt sequence numbers from parameters, enhancing sequence management during write operations.
- **`handle_write.rs`**: Updated `RegionWorkerLoop` to pass sequence numbers in `push_bulk` and `push_mutation` methods, ensuring consistent sequence handling.
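
The override rule in `opener.rs` can be sketched roughly as below. The function name and the plain `u64` alias are assumptions for illustration; the sketch only shows the comparison described above, where the manifest's committed sequence wins only when it is greater than the sequence recovered by WAL replay.

```rust
/// Stand-in alias for this sketch; the real type lives in the storage API.
type SequenceNumber = u64;

/// Hypothetical helper: picks the committed sequence to install after a
/// region is opened and its WAL has been replayed.
fn committed_sequence_on_open(
    replayed: SequenceNumber,
    manifest: SequenceNumber,
) -> SequenceNumber {
    // Override only when the manifest is strictly ahead of the replayed
    // sequence; otherwise keep what replay produced.
    if manifest > replayed {
        manifest
    } else {
        replayed
    }
}
```

Under this rule the committed sequence never moves backwards on reopen, whichever of the two sources is ahead.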
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-sequence-on-region-edit:
### Remove Debug Logging from `opener.rs`
- Removed debug logging for mutation sequences in `opener.rs` to clean up the output and improve performance.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: support flat format in SeqScan
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support flat format in unordered scan
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support parallel read for flat format in SeqScan
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: rename flat DedupReader to FlatDedupReader
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: address review comments
It also precomputes the input arrow schema
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>