* Add dynamic cache size adjustment for InvertedIndexConfig
* Increase cache sizes in integration tests for HTTP
- Updated `metadata_cache_size` from 32MiB to 64MiB
* Remove cache size settings from config and update drop_lines_with_inconsistent_results function to handle them
* Add cache size configurations for inverted index metadata and content
- Introduced `metadata_cache_size` with a default of 64MiB.
- Introduced `content_cache_size` with a default of 128MiB.
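
For illustration only, the two defaults above could be modeled as follows. This is a minimal sketch: the struct name `InvertedIndexCacheSizes`, the `MIB` constant, and the use of plain `u64` byte counts are assumptions, and only the two field names and their default values come from the commit messages.

```rust
/// Hypothetical stand-in for the inverted index cache settings.
/// Only the field names and default values are taken from the log above.
#[derive(Debug, Clone, PartialEq)]
pub struct InvertedIndexCacheSizes {
    /// Capacity of the index metadata cache, in bytes.
    pub metadata_cache_size: u64,
    /// Capacity of the index content cache, in bytes.
    pub content_cache_size: u64,
}

/// Bytes in one MiB (assumed helper, not from the source).
const MIB: u64 = 1024 * 1024;

impl Default for InvertedIndexCacheSizes {
    fn default() -> Self {
        Self {
            metadata_cache_size: 64 * MIB,
            content_cache_size: 128 * MIB,
        }
    }
}
```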
* chore/index-content-cache-default-size: Add cache size configuration options for Mito engine's inverted index
* fix/reader-metrics:
Refactor cache hit/miss logic and update metrics in mito2
- Simplify cache retrieval logic in CacheManager by removing inline update_hit_miss function call.
- Add separate functions for incrementing cache hit and miss metrics.
- Update RowGroupLastRowCachedReader to use new cache hit/miss functions and refactor to new helper methods for creating Hit and Miss variants.
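
The split into separate hit and miss functions might look roughly like the sketch below. The static counters, the `get_cached` helper, and the `HashMap` cache are assumptions made for illustration; the actual code reports to a metrics registry and uses `CacheManager`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counters standing in for the real cache metrics.
static CACHE_HIT: AtomicU64 = AtomicU64::new(0);
static CACHE_MISS: AtomicU64 = AtomicU64::new(0);

/// Increments the cache hit counter.
pub fn cache_hit() {
    CACHE_HIT.fetch_add(1, Ordering::Relaxed);
}

/// Increments the cache miss counter.
pub fn cache_miss() {
    CACHE_MISS.fetch_add(1, Ordering::Relaxed);
}

/// Returns the current `(hits, misses)` pair.
pub fn hit_miss_counts() -> (u64, u64) {
    (
        CACHE_HIT.load(Ordering::Relaxed),
        CACHE_MISS.load(Ordering::Relaxed),
    )
}

/// Looks up `key` and records a hit or a miss at the call site,
/// instead of hiding the bookkeeping inside the cache itself.
pub fn get_cached<'a>(cache: &'a HashMap<u64, String>, key: u64) -> Option<&'a String> {
    let value = cache.get(&key);
    if value.is_some() {
        cache_hit();
    } else {
        cache_miss();
    }
    value
}
```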
* Add caching for last row reader and expose cache manager
- Implement `RowGroupLastRowCachedReader` to handle cache hits and misses for last row reads.
* Add projection field to SelectorResultValue and refactor RowGroupLastRowReader
- Introduced `projection` field in `SelectorResultValue` to store projection indices.
* Add PruneReader for optimized row filtering and error handling
- Introduced `PruneReader` to replace `RowGroupReader` for optimized row filtering.
* Make ReaderMetrics fields public for external access
* Add row selection support to SeqScan and FileRange readers
- Updated `SeqScan::build_part_sources` to accept an optional `TimeSeriesRowSelector`.
* Refactor `scan_region.rs` and enhance `file_range.rs`
- Removed unnecessary cloning of `series_row_selector` in `scan_region.rs`.
- Added a `select_all` method in `file_range.rs` to check whether all rows in a row group are selected.
- Updated the `reader` method to use `LastRowReader` only when all rows are selected and no DELETE operations are present.
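
The condition for using the last row reader can be sketched as below. `RowGroupSelection`, its fields, and `can_use_last_row_reader` are hypothetical names for illustration; only `select_all` and the rule itself (all rows selected, no DELETE operations) come from the commit message.

```rust
/// Hypothetical summary of one row group and the rows selected from it.
pub struct RowGroupSelection {
    /// Total number of rows in the row group.
    pub total_rows: usize,
    /// Number of selected rows, or `None` when no row selection was applied.
    pub selected_rows: Option<usize>,
    /// Whether the file may contain DELETE operations.
    pub has_delete: bool,
}

impl RowGroupSelection {
    /// Returns true if every row in the row group is selected.
    pub fn select_all(&self) -> bool {
        match self.selected_rows {
            None => true,
            Some(n) => n == self.total_rows,
        }
    }

    /// The last row reader is only safe when nothing was filtered out
    /// and no deletions could hide the latest row.
    pub fn can_use_last_row_reader(&self) -> bool {
        self.select_all() && !self.has_delete
    }
}
```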
* Enhance PruneReader and ParquetReader with reset functionality and metrics handling
- Made `Source` enum public in `prune.rs`.
* chore: Update src/mito2/src/sst/parquet/reader.rs
---------
Co-authored-by: Yingwen <realevenyag@gmail.com>
* feat/copy-to-parquet-parameter:
Enhance Parquet Writer with Column-wise Configuration
- Introduced `column_wise_config` function to customize per-column properties in Parquet writer.
* feat/copy-to-parquet-parameter:
Enhance Parquet File Format Handling for Specific Data Types
- Added `ConcreteDataType` import to support specific data type handling.
* feat/copy-to-parquet-parameter:
Refactor Parquet file format configuration
* feat/copy-to-parquet-parameter:
Enhance Parquet file format handling for timestamp columns
- Added logic to disable dictionary encoding and set DELTA_BINARY_PACKED encoding for timestamp columns in the Parquet file format configuration.
* feat/copy-to-parquet-parameter:
Disable dictionary encoding for timestamp columns in Parquet writer and update default max_active_window_runs in TwcsOptions
- Modified Parquet writer to disable dictionary encoding for timestamp columns to optimize for increasing timestamp data.
* feat/copy-to-parquet-parameter:
Update compaction settings in tests
- Modified `test_compaction_region` to include new compaction options: `compaction.type`,
`compaction.twcs.max_active_window_runs`, and `compaction.twcs.max_inactive_window_runs`.
- Updated `test_merge_mode_compaction` to use `compaction.twcs.max_active_window_runs` and
`compaction.twcs.max_inactive_window_runs` instead of `max_active_window_files` and
`max_inactive_window_files`.
* feat/inverted-index-cache:
Update dependencies and add caching for inverted index reader
- Updated `atomic` to 0.6.0 and `uuid` to 1.9.1 in `Cargo.lock`.
- Added `moka` and `uuid` dependencies in `Cargo.toml`.
- Introduced `seek_read` method in `InvertedIndexBlobReader` for common seek and read operations.
- Added `cache.rs` module to implement caching for inverted index reader using `moka`.
- Updated `async-compression` to 0.4.11 in `puffin/Cargo.toml`.
* feat/inverted-index-cache:
Refactor InvertedIndexReader and Add Index Cache Support
- Refactored `InvertedIndexReader` to include `seek_read` method and default implementations for `fst` and `bitmap`.
- Implemented `seek_read` in `InvertedIndexBlobReader` and `CachedInvertedIndexBlobReader`.
- Introduced `InvertedIndexCache` in `CacheManager` and `SstIndexApplier`.
- Updated `SstIndexApplierBuilder` to accept and utilize `InvertedIndexCache`.
- Added `From<FileId> for Uuid` implementation.
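
The idea of a cached reader wrapping `seek_read` can be sketched as follows. This is a simplified, synchronous model: the `SeekRead` trait, `BlobReader`, `CachedBlobReader`, the `u64` file id, and the plain `HashMap` cache are all assumptions for illustration. The real reader in this log is described as using `moka`, and its exact signatures are not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous version of a "seek then read" operation.
pub trait SeekRead {
    fn seek_read(&mut self, offset: u64, size: u32) -> Vec<u8>;
}

/// Reads directly from an in-memory blob and counts physical reads.
pub struct BlobReader {
    pub blob: Vec<u8>,
    pub reads: usize,
}

impl SeekRead for BlobReader {
    fn seek_read(&mut self, offset: u64, size: u32) -> Vec<u8> {
        self.reads += 1;
        let start = offset as usize;
        // Clamp the range so out-of-bounds requests return fewer bytes.
        let end = (start + size as usize).min(self.blob.len());
        self.blob[start.min(end)..end].to_vec()
    }
}

/// Wraps another reader and caches results by (file id, offset, size).
pub struct CachedBlobReader<R> {
    pub file_id: u64,
    pub inner: R,
    pub cache: HashMap<(u64, u64, u32), Vec<u8>>,
}

impl<R: SeekRead> CachedBlobReader<R> {
    pub fn new(file_id: u64, inner: R) -> Self {
        Self { file_id, inner, cache: HashMap::new() }
    }
}

impl<R: SeekRead> SeekRead for CachedBlobReader<R> {
    fn seek_read(&mut self, offset: u64, size: u32) -> Vec<u8> {
        let key = (self.file_id, offset, size);
        if let Some(hit) = self.cache.get(&key) {
            return hit.clone();
        }
        // Cache miss: read from the wrapped reader and remember the bytes.
        let data = self.inner.seek_read(offset, size);
        self.cache.insert(key, data.clone());
        data
    }
}
```

Including the file id in the key keeps entries from different files apart when one cache is shared by many readers.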
* feat/inverted-index-cache:
Update Cargo.toml and refactor SstIndexApplier
- Moved `uuid.workspace` entry in Cargo.toml for better organization.
* feat/inverted-index-cache:
Refactor InvertedIndexCache to use type alias for Arc
- Replaced `Arc<InvertedIndexCache>` with `InvertedIndexCacheRef` type alias.
* feat/inverted-index-cache:
Add Prometheus metrics and caching improvements for inverted index
- Introduced `prometheus` and `puffin` dependencies for metrics.
* feat/inverted-index-cache:
Refactor InvertedIndexReader and Cache handling
- Simplified `InvertedIndexReader` trait by removing seek-related comments.
* feat/inverted-index-cache:
Add configurable cache sizes for inverted index metadata and content
- Introduced `index_metadata_size` and `index_content_size` in `CacheManagerBuilder`.
* feat/inverted-index-cache:
Refactor and optimize inverted index caching
- Removed `metrics.rs` and integrated cache metrics into `index.rs`.
* feat/inverted-index-cache:
Remove unused dependencies from Cargo.lock and Cargo.toml
- Removed `moka`, `prometheus`, and `puffin` dependencies from both Cargo.lock and Cargo.toml.
* feat/inverted-index-cache:
Replace Uuid with FileId in CachedInvertedIndexBlobReader
- Updated `file_id` type from `Uuid` to `FileId` in `CachedInvertedIndexBlobReader` and related methods.
* feat/inverted-index-cache:
Refactor cache configuration for inverted index
- Moved `inverted_index_metadata_cache_size` and `inverted_index_cache_size` from `MitoConfig` to `InvertedIndexConfig`.
* feat/inverted-index-cache:
Remove unnecessary conversion of `file_id` in `SstIndexApplier`
- Simplified the initialization of `CachedInvertedIndexBlobReader` by removing the redundant `into()` conversion for `file_id`.
* refactor: add Compactor trait
* chore: add compact() in Compactor trait and expose compaction module
* refactor: add CompactionRequest and open_compaction_region
* refactor: export the compaction api
* refactor: add DefaultCompactor::new_from_request
* refactor: no need to pass mito_config in open_compaction_region()
* refactor: CompactionRequest -> &CompactionRequest
* fix: typo
* docs: add docs for public apis
* refactor: remove 'Picker' from Compactor
* chore: add logs
* chore: change pub attribute for Picker
* refactor: remove do_merge_ssts()
* refactor: update comments
* refactor: use CompactionRegion argument in Picker
* chore: make compaction module public and remove unnecessary clone
* refactor: move build_compaction_task() in CompactionScheduler{}
* chore: use in open_compaction_region() and add some comments for public structure
* refactor: add 'manifest_dir()' in store-api
* refactor: move the default implementation to DefaultCompactor
* refactor: remove Options from MergeOutput
* chore: minor modification
* fix: clippy errors
* fix: unit test errors
* refactor: remove 'manifest_dir()' from store-api crate(already have one in opener)
* refactor: use 'region_dir' in CompactionRequest
* refactor: refine naming
* refactor: refine naming
* refactor: remove clone()
* chore: add comments
* refactor: add PickerOutput field in CompactorRequest
* feat: introduce RemoteJobScheduler
* feat: add RemoteJobScheduler in schedule_compaction_request()
* refactor: use Option type for senders field of CompactionFinished
* refactor: modify CompactionJob
* refactor: schedule remote compaction job by options
* refactor: remove unused Options
* build: remove unused log
* refactor: fallback to local compaction if the remote compaction failed
* fix: clippy errors
* refactor: add plugins in mito2
* refactor: add from_u64() for JobId
* refactor: make schedule module public
* refactor: add error for RemoteJobScheduler
* refactor: add Notifier
* refactor: use Arc for Notifier
* refactor: add 'remote_compaction' in compaction options
* fix: clippy errors
* fix: unrecognized table option
* refactor: add 'start_time' in CompactionJob
* refactor: modify error type of RemoteJobScheduler
* chore: revert changes for request
* refactor: code refactor by review comment
* refactor: use string type for JobId
* refactor: add 'waiters' field in DefaultNotifier
* fix: build error
* refactor: take coderabbit's review comment
* refactor: use uuid::Uuid as JobId
* refactor: return waiters when schedule failed and add on_failure for DefaultNotifier
* refactor: move waiters from notifier to Job
* refactor: use ObjectStoreManagerRef in open_compaction_region()
* refactor: implement for JobId and add related unit tests
* fix: failing unit tests
* refactor: add RemoteJobSchedulerError
* fix: add serialize_ignore_column_ids() to fix failure to deserialize region options from a JSON string
* refactor: return empty vector if column_id is empty
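
The "fallback to local compaction if the remote compaction failed" behavior in the commits above might be sketched like this. The trait method signature, the `Scheduled` enum, the `String` job id, and the two example schedulers are assumptions for illustration; only the names `RemoteJobScheduler`, `RemoteJobSchedulerError`, and the `remote_compaction` option appear in the log.

```rust
/// Hypothetical error returned when a remote job cannot be scheduled.
#[derive(Debug)]
pub struct RemoteJobSchedulerError(pub String);

/// Hypothetical, synchronous shape of a remote job scheduler.
pub trait RemoteJobScheduler {
    fn schedule(&self, region_id: u64) -> Result<String, RemoteJobSchedulerError>;
}

/// Where the compaction ended up running.
#[derive(Debug, PartialEq)]
pub enum Scheduled {
    Remote(String),
    Local,
}

/// Tries the remote scheduler when enabled, and falls back to local
/// compaction if it is missing or returns an error.
pub fn schedule_compaction(
    region_id: u64,
    remote_compaction: bool,
    scheduler: Option<&dyn RemoteJobScheduler>,
) -> Scheduled {
    if remote_compaction {
        if let Some(s) = scheduler {
            match s.schedule(region_id) {
                Ok(job_id) => return Scheduled::Remote(job_id),
                Err(e) => eprintln!("remote compaction failed, falling back: {:?}", e),
            }
        }
    }
    Scheduled::Local
}

/// Example scheduler that always accepts the job.
pub struct AlwaysOk;
impl RemoteJobScheduler for AlwaysOk {
    fn schedule(&self, region_id: u64) -> Result<String, RemoteJobSchedulerError> {
        Ok(format!("job-{}", region_id))
    }
}

/// Example scheduler that always rejects the job.
pub struct AlwaysFail;
impl RemoteJobScheduler for AlwaysFail {
    fn schedule(&self, _region_id: u64) -> Result<String, RemoteJobSchedulerError> {
        Err(RemoteJobSchedulerError(String::from("unavailable")))
    }
}
```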
* feat: add functions to find and merge sorted runs
* chore: refactor code
* chore: remove some duplicates
* chore: remove one clone
* refactor: change max_active_window_files to max_active_window_runs
* feat: integrate with sorted runs
* fix: unit tests
* feat: limit num of sorted runs during compaction
* fix: some test
* fix: some cr comments
* feat: use smallvec
* chore: rebase main
* feat/reduce-sorted-runs:
Refactor compaction logic and update test configurations
- Refactored `merge_all_runs` function to use `sort_ranged_items` for sorting.
- Improved item merging logic by iterating with `into_iter` and handling overlaps.
- Updated test configurations to use `max_active_window_runs` instead of `max_active_window_files` for consistency.
---------
Co-authored-by: tison <wander4096@gmail.com>
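
To illustrate what "finding sorted runs" means in the commits above, here is a minimal sketch using a greedy interval partitioning. It is not necessarily the algorithm used in the change: the `Range` type and `find_sorted_runs` are hypothetical, and the actual code works on file metadata rather than bare ranges.

```rust
/// Inclusive time range of one file; a hypothetical stand-in for SST metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: i64,
    pub end: i64,
}

/// Groups ranges into sorted runs. Within each run the ranges are ordered
/// by start and do not overlap, so a run can be read sequentially.
pub fn find_sorted_runs(mut items: Vec<Range>) -> Vec<Vec<Range>> {
    items.sort_by_key(|r| (r.start, r.end));
    let mut runs: Vec<Vec<Range>> = Vec::new();
    for item in items {
        // Reuse the first run whose last range ends before this one starts.
        let slot = runs
            .iter()
            .position(|run| run.last().map_or(true, |last| last.end < item.start));
        match slot {
            Some(i) => runs[i].push(item),
            None => runs.push(vec![item]),
        }
    }
    runs
}
```

Under this model, a setting such as `max_active_window_runs` would bound the number of runs returned for a window rather than the number of files.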
* feat: add update_mode to region options
* test: add test
* feat: last not null iter
* feat: time series last not null
* feat: partition tree update mode
* feat: partition tree
* fix: last not null iter slice
* test: add test for compaction
* test: use second resolution
* style: fix clippy
* chore: merge two lines
Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
* chore: address CR comments
* refactor: UpdateMode -> MergeMode
* refactor: LastNotNull -> LastNonNull
* chore: return None earlier
* feat: validate region options
Make merge mode optional and use the default when it is None
* test: fix tests
---------
Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
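
The difference between keeping the last row and keeping the last non-null value per field can be sketched as follows. The `LastRow` variant name, its choice as the default, the `merge_versions` function, and the `Vec<Option<i64>>` row representation are assumptions for illustration; only `MergeMode` and `LastNonNull` are named in the commits above.

```rust
/// Hypothetical merge modes; `LastRow` as the default is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum MergeMode {
    /// Keep only the newest version of a row.
    #[default]
    LastRow,
    /// For each field, keep the newest value that is not null.
    LastNonNull,
}

/// Merges versions of one row (same key), ordered from oldest to newest.
/// Returns `None` when there are no versions.
pub fn merge_versions(
    mode: MergeMode,
    versions: &[Vec<Option<i64>>],
) -> Option<Vec<Option<i64>>> {
    let newest = versions.last()?;
    match mode {
        MergeMode::LastRow => Some(newest.clone()),
        MergeMode::LastNonNull => {
            let mut merged = newest.clone();
            // Walk older versions, newest first, filling fields still null.
            for older in versions.iter().rev().skip(1) {
                for (slot, value) in merged.iter_mut().zip(older) {
                    if slot.is_none() {
                        *slot = *value;
                    }
                }
            }
            Some(merged)
        }
    }
}
```

An optional merge mode can then fall back to the default with `Option::unwrap_or_default`.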
* refactor: remove compaction_options and use RegionOptions type for region_options
* refactor: add file_purger field in CompactionRegion
* refactor: add SerializedPickerOutput
* refactor: rename CompactorRequest to OpenCompactionRegionRequest and remove PickerOutput
* refactor: use &PickerOutput instead of clone()
* feat: introduce bulk memtable encoder/decoder
* chore: rebase main
* chore: resolve some comments
* refactor: only carries time unit in ArraysSorter
* fix: some comments
* chore: make RegionOptions serializable and add region_dir in CompactionRegion
* refactor: make `PickerOutput` and `MergeOutput` serializable and deserializable
* refactor: remove Serialize and Deserialize from PickerOutput
* chore: revert changes for file.rs
* chore: revert changes for compactor.rs and compaction.rs
---------
Co-authored-by: tison <wander4096@gmail.com>
* refactor: RangeBase
* feat: memtable range
* feat: scanner use mem range
* feat: remove base from mem range context
* feat: impl ranges for memtables
* chore: fix warnings
* refactor: make predicate cheap to clone
* refactor: MemRange -> MemtableRange
* feat: pub empty memtable to fix warnings
* test: fix sqlness result