* chore/expose-symbols:
### Commit Message
Enhance `merge_and_dedup` Functionality in `flush.rs`
- **Function Signature Update**: Modified the `merge_and_dedup` function to accept `append_mode` and `merge_mode` as separate parameters instead of using `options`.
- **Function Accessibility**: Changed the visibility of `merge_and_dedup` to `pub` to allow external access.
- **Function Calls Update**: Updated calls to `merge_and_dedup` within `memtable_flat_sources` to align with the new function signature, passing `options.append_mode` and `options.merge_mode()` directly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
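The signature change above can be illustrated with a minimal sketch. The `Row` type, the `MergeMode` variants, and the in-memory `Vec` sources are hypothetical stand-ins chosen for illustration; the real function in `flush.rs` works on record batch iterators and may behave differently.

```rust
/// Hypothetical stand-in for the merge mode taken from the region options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeMode {
    LastRow,
    LastNonNull,
}

/// Hypothetical row: primary key, timestamp, sequence number, nullable value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub key: u32,
    pub ts: i64,
    pub seq: u64,
    pub value: Option<i64>,
}

/// Merges all sources and, unless `append_mode` is set, keeps one row per
/// (key, ts). The modes are passed directly instead of a whole options struct.
pub fn merge_and_dedup(
    sources: Vec<Vec<Row>>,
    append_mode: bool,
    merge_mode: MergeMode,
) -> Vec<Row> {
    let mut rows: Vec<Row> = sources.into_iter().flatten().collect();
    // Order by (key, ts), with the newest sequence first inside each group.
    rows.sort_by(|a, b| (a.key, a.ts).cmp(&(b.key, b.ts)).then(b.seq.cmp(&a.seq)));
    if append_mode {
        // Append mode keeps duplicates.
        return rows;
    }
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        let is_dup = matches!(out.last(), Some(last) if last.key == row.key && last.ts == row.ts);
        if !is_dup {
            out.push(row);
        } else if merge_mode == MergeMode::LastNonNull {
            // Fill a null in the newest row from an older duplicate.
            if let Some(last) = out.last_mut() {
                if last.value.is_none() {
                    last.value = row.value;
                }
            }
        }
    }
    out
}
```

Taking the two modes as plain parameters means a caller outside the crate does not need to build a full options value just to merge a few sources.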
* chore/expose-symbols:
### Add Merge and Deduplication Functionality
- **File**: `src/mito2/src/flush.rs`
- Introduced `merge_and_dedup` function to merge multiple record batch iterators and apply deduplication based on specified modes.
- Added detailed documentation for the function, explaining its arguments, behavior, and usage examples.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: update proto to include native histogram
* ci: add a CI check to ensure whitelisted dependencies are using their main branch
* chore: add changes to Cargo.toml to trigger CI
* chore: update proto
* test: update test to include histogram
* refactor(cli): unify storage configuration for export command
- Utilize ObjectStoreConfig to unify storage configuration for export command
- Support export command for Fs, S3, OSS, GCS and Azblob
- Fix the Display implementation for SecretString, which always returned the string
"SecretString([REDACTED])" even when the internal secret was empty.
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
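A rough sketch of the kind of `Display` fix described above. The wrapper type here is a simplified stand-in, and rendering an empty secret as an empty string is an assumption; the commit does not state what the fixed implementation prints in that case.

```rust
use std::fmt;

/// Simplified stand-in for a secret wrapper (not the project's actual type).
pub struct SecretString(String);

impl SecretString {
    pub fn new(secret: impl Into<String>) -> Self {
        SecretString(secret.into())
    }
}

impl fmt::Display for SecretString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.0.is_empty() {
            // Assumption: an empty secret renders as nothing, so callers can
            // tell "no secret" apart from "secret present but hidden".
            Ok(())
        } else {
            // Never print the secret itself.
            write!(f, "SecretString([REDACTED])")
        }
    }
}
```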
* refactor(cli): unify storage configuration for export command
- Make the configuration options of every storage backend public.
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
* refactor(cli): unify storage configuration for export command
- Reimplement ObjectStoreConfig::build_xxx() using macros
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
* refactor(cli): unify storage configuration for export command
- Introduce config validation for each storage type
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
* refactor(cli): unify storage configuration for export command
- Enable trait-based polymorphism for storage type handling
(from inherent impl to trait impl)
- Extract helper functions to reduce code duplication
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
* refactor(cli): unify storage configuration for export command
- Improve SecretString handling and validation
(Distinguishing between "not provided" and "empty string")
- Add validation when using filesystem storage
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
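One way to distinguish "not provided" from "empty string" is to model the field as an `Option` and report the two cases separately. The sketch below uses hypothetical names (`FieldError`, `validate_required`) and is not the validation code from this change.

```rust
/// Hypothetical error type separating the two failure cases.
#[derive(Debug, PartialEq, Eq)]
pub enum FieldError {
    /// The option was not passed at all.
    Missing(&'static str),
    /// The option was passed, but with an empty value.
    Empty(&'static str),
}

/// Validates a required field given as `Option<&str>`.
pub fn validate_required(name: &'static str, value: Option<&str>) -> Result<(), FieldError> {
    match value {
        None => Err(FieldError::Missing(name)),
        Some(v) if v.is_empty() => Err(FieldError::Empty(name)),
        Some(_) => Ok(()),
    }
}
```

Keeping the cases apart lets the CLI produce a more precise message than a generic "invalid value" error.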
* refactor(cli): unify storage configuration for export command
- Refactor storage field validation with macro
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
* refactor(cli): unify storage configuration for export command
- support GCS Application Default Credentials (like GKE, Cloud Run, or local development with ) in export
(Enabling ADC without validating or to be present)
(Making optional in GCS validation (defaults to https://storage.googleapis.com))
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
* refactor(cli): unify storage configuration for export command
This commit refactors the validation logic for object store configurations in the CLI to leverage clap features and reduce boilerplate.
Key changes:
- Update wrap_with_clap_prefix macro to use clap's requires attribute.
This ensures that storage-specific options (e.g., --s3-bucket) are only accepted when the corresponding backend is enabled (e.g., --s3).
- Simplify FieldValidator trait by removing the is_provided method, as dependency checks are now handled by clap.
- Introduce validate_backend! macro to standardize the validation of required fields for enabled backends.
- Refactor ExportCommand to remove explicit validation calls (validate_s3, etc.) and rely on the validation within backend constructors.
- Add integration tests for ExportCommand to verify build success with S3, OSS, GCS, and Azblob configurations.
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
* refactor(cli): unify storage configuration for export command
- Use macros to simplify storage export implementation
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
* refactor(cli): unify storage configuration for export command
- Roll back the StorageExport trait implementation to not use macros, for better code clarity and maintainability
- Introduce format_uri helper function to unify URI formatting logic
- Fix OSS URI path bug inherited from legacy code
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
* refactor(cli): unify storage configuration for export command
- Remove unnecessary async_trait
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
---------
Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
* refactor/expose-symbols:
## Refactor `bulk/part.rs` to Simplify Mutation Handling
- Removed the `mutations_to_record_batch` function and its associated helper functions, including `ArraysSorter`, `timestamp_array_to_iter`, and `binary_array_to_dictionary`, to simplify the mutation handling logic in `bulk/part.rs`.
- Deleted related test functions `check_binary_array_to_dictionary` and `check_mutations_to_record_batches` from the test module, along with their associated test cases.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-symbols:
### Commit Message
**Refactor and Enhance Deduplication Logic**
- **`flush.rs`**: Refactored `maybe_dedup_one` function to accept `append_mode` and `merge_mode` as parameters instead of `RegionOptions`. This change enhances flexibility in deduplication logic.
- **`memtable/bulk.rs`**: Made `BulkRangeIterBuilder` struct and its fields public to allow external access and modification, improving extensibility.
- **`sst.rs`**: Corrected a typo in the schema documentation, changing `__prmary_key` to `__primary_key` for clarity and accuracy.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: adds the foundational types and SQL parsing support for vector index
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: by suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: ensure index option values must be greater than zero
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: validate connectivity strictly
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: compile error
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: disable SIMD for ci
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix/flight-stuck-on-first-message:
**Refactor GRPC Stream Handling and Table Resolution**
- **`grpc.rs`**: Refactored the `GrpcQueryHandler` to resolve table references and check permissions only once per stream, improving efficiency. Introduced a mechanism to handle table resolution and permission checks after receiving the first `RecordBatch`.
- **`flight.rs`**: Enhanced `PutRecordBatchRequestStream` to manage stream states (`Init` and `Ready`) for better handling of schema and table name extraction. Improved error handling and logging for unexpected flight messages.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
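The `Init`/`Ready` handling described above can be pictured as a small state machine. The message and stream types below are simplified, hypothetical stand-ins (the real stream deals with Arrow Flight data); the sketch only shows how the table name is captured once and then reused for every batch.

```rust
/// Simplified stand-in for incoming flight messages.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FlightMessage {
    Schema { table: String },
    RecordBatch(Vec<i64>),
}

#[derive(Debug, PartialEq, Eq)]
pub enum StreamState {
    /// No schema message seen yet.
    Init,
    /// Schema seen; the table name is known for the rest of the stream.
    Ready { table: String },
}

/// Hypothetical stream wrapper, not the actual request stream type.
pub struct PutStreamSketch {
    state: StreamState,
}

impl PutStreamSketch {
    pub fn new() -> Self {
        PutStreamSketch { state: StreamState::Init }
    }

    /// Returns `Ok(None)` for the schema message and `Ok(Some(..))` for each
    /// record batch; anything out of order is reported as an error.
    pub fn on_message(
        &mut self,
        msg: FlightMessage,
    ) -> Result<Option<(String, Vec<i64>)>, String> {
        match msg {
            FlightMessage::Schema { table } => {
                if self.state != StreamState::Init {
                    return Err("unexpected schema message".to_string());
                }
                self.state = StreamState::Ready { table };
                Ok(None)
            }
            FlightMessage::RecordBatch(rows) => match &self.state {
                StreamState::Init => Err("record batch received before schema".to_string()),
                StreamState::Ready { table } => Ok(Some((table.clone(), rows))),
            },
        }
    }
}
```

Because the table name lives in the `Ready` state, work such as table resolution and permission checks can be done once on the handler side rather than per message.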
* chore: add some doc
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: support function aliases and add MySQL-compatible aliases
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
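A function alias can be thought of as a second name that resolves to an already registered function. The registry below is a hypothetical sketch, not the project's function registry; `UCASE` is used as the example because MySQL treats it as a synonym for `UPPER`.

```rust
use std::collections::HashMap;

type ScalarFn = fn(&str) -> String;

/// Hypothetical registry mapping canonical names and aliases to functions.
#[derive(Default)]
pub struct FunctionRegistry {
    functions: HashMap<String, ScalarFn>,
    aliases: HashMap<String, String>,
}

impl FunctionRegistry {
    pub fn register(&mut self, name: &str, func: ScalarFn) {
        self.functions.insert(name.to_lowercase(), func);
    }

    /// Registers `alias` for an existing function; fails if the target is unknown.
    pub fn register_alias(&mut self, alias: &str, target: &str) -> Result<(), String> {
        let target = target.to_lowercase();
        if !self.functions.contains_key(&target) {
            return Err(format!("unknown function: {target}"));
        }
        self.aliases.insert(alias.to_lowercase(), target);
        Ok(())
    }

    /// Case-insensitive lookup that resolves aliases to the canonical name first.
    pub fn get(&self, name: &str) -> Option<ScalarFn> {
        let name = name.to_lowercase();
        let canonical = self.aliases.get(&name).unwrap_or(&name);
        self.functions.get(canonical).copied()
    }
}

/// Example function used with the registry.
pub fn upper(s: &str) -> String {
    s.to_uppercase()
}
```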
* fix: get_table_function_source
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: add function_alias mod
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: license
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: collect per file metrics
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: divide build_cost to build_part_cost and build_reader_cost
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: limit the file metrics num to display
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: use sorted iter to get sorted files
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: output metrics in desc order
Signed-off-by: evenyag <realevenyag@gmail.com>
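The last few commits (limiting the number of file metrics and printing them in descending order) can be sketched as a sort followed by a truncate. The `FileMetrics` struct and the choice of total cost as the sort key are assumptions made for illustration; the commits do not say which field drives the ordering.

```rust
use std::time::Duration;

/// Hypothetical per-file metrics, using the two cost names from the commits.
#[derive(Debug, Clone, PartialEq)]
pub struct FileMetrics {
    pub file_id: String,
    pub build_part_cost: Duration,
    pub build_reader_cost: Duration,
}

impl FileMetrics {
    fn total_cost(&self) -> Duration {
        self.build_part_cost + self.build_reader_cost
    }
}

/// Returns at most `limit` entries, most expensive first.
/// Ties are broken by file id so the output is deterministic.
pub fn top_file_metrics(mut metrics: Vec<FileMetrics>, limit: usize) -> Vec<FileMetrics> {
    metrics.sort_by(|a, b| {
        b.total_cost()
            .cmp(&a.total_cost())
            .then_with(|| a.file_id.cmp(&b.file_id))
    });
    metrics.truncate(limit);
    metrics
}
```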
---------
Signed-off-by: evenyag <realevenyag@gmail.com>