- Introduced `DedupReader` and `MergeReader` in `src/mito2/src/read/sync/dedup.rs` and `src/mito2/src/read/sync/merge.rs` to handle deduplication and merging of sorted batches.
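
To make the merge-then-dedup flow concrete, here is a minimal sketch. It is not the code in `dedup.rs` or `merge.rs`: the `Row` type, the function names, and the assumption that rows are ordered by key, then timestamp, then descending sequence (so the newest write wins) are all illustrative.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical row type; the real readers operate on batches, not single rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub key: String,
    pub ts: i64,
    pub seq: u64,
    pub value: i64,
}

/// K-way merge of sources that are each sorted by (key, ts, seq descending).
pub fn merge_sorted(sources: &[Vec<Row>]) -> Vec<Row> {
    // Min-heap over (key, ts, Reverse(seq), source index, position in source).
    let mut heap = BinaryHeap::new();
    for (src, rows) in sources.iter().enumerate() {
        if let Some(row) = rows.first() {
            heap.push(Reverse((row.key.clone(), row.ts, Reverse(row.seq), src, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((_, _, _, src, pos))) = heap.pop() {
        out.push(sources[src][pos].clone());
        if let Some(next) = sources[src].get(pos + 1) {
            heap.push(Reverse((next.key.clone(), next.ts, Reverse(next.seq), src, pos + 1)));
        }
    }
    out
}

/// Keeps only the first row per (key, ts); after the merge above that is the
/// row with the highest sequence number.
pub fn dedup_last_row(merged: Vec<Row>) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in merged {
        match out.last() {
            Some(prev) if prev.key == row.key && prev.ts == row.ts => {} // older duplicate, skip
            _ => out.push(row),
        }
    }
    out
}
```

Calling `dedup_last_row(merge_sorted(&sources))` yields one row per key and timestamp, taken from whichever source holds the latest sequence.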
### Enhanced `BulkMemtable` Iteration
- Updated `BulkMemtable` in `src/mito2/src/memtable/bulk.rs` to support deduplication and merge modes during iteration.
- Added `BulkIterContext` to manage iteration context.
### Testing Enhancements
- Added comprehensive tests for `BulkMemtable` and `BulkPart` in `src/mito2/src/memtable/bulk.rs` and `src/mito2/src/memtable/bulk/part.rs`.
### Code Refactoring
- Made `BulkPart` and `BulkPartMeta` cloneable in `src/mito2/src/memtable/bulk/part.rs`.
- Exposed internal test modules for better test coverage in `src/mito2/src/memtable/time_series.rs` and `src/mito2/src/read/merge.rs`.
### New Modules
- Created `sync` module in `src/mito2/src/read.rs` to organize synchronous read operations.
**Enhance BulkPart Metadata Handling**
- Updated `BulkMemtable` in `bulk.rs` to track and update `max_sequence`, `max_timestamp`, `min_timestamp`, and `num_rows` using `BulkPart` metadata.
- Extended `BulkPartMeta` in `bulk/part.rs` to include `max_sequence`.
- Modified the `mutations_to_record_batch` function in `bulk/part.rs` to return `max_sequence` along with the timestamps.
- Adjusted `BulkPartEncoder` to handle the new `max_sequence` metadata in `bulk/part.rs`.
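
The bookkeeping described above can be sketched as follows. The struct layouts and the `update` method are assumptions for illustration; only the field names `max_sequence`, `max_timestamp`, `min_timestamp`, and `num_rows` come from the change itself, and the real memtable may well use atomics rather than plain fields.

```rust
/// Hypothetical stand-in for the metadata carried by each bulk part.
#[derive(Debug, Clone)]
pub struct PartMeta {
    pub num_rows: usize,
    pub min_timestamp: i64,
    pub max_timestamp: i64,
    pub max_sequence: u64,
}

/// Hypothetical running statistics kept by the memtable.
#[derive(Debug)]
pub struct MemtableStats {
    pub num_rows: usize,
    pub min_timestamp: i64,
    pub max_timestamp: i64,
    pub max_sequence: u64,
}

impl MemtableStats {
    pub fn new() -> Self {
        Self {
            num_rows: 0,
            min_timestamp: i64::MAX,
            max_timestamp: i64::MIN,
            max_sequence: 0,
        }
    }

    /// Folds one part's metadata into the running statistics.
    pub fn update(&mut self, meta: &PartMeta) {
        self.num_rows += meta.num_rows;
        self.min_timestamp = self.min_timestamp.min(meta.min_timestamp);
        self.max_timestamp = self.max_timestamp.max(meta.max_timestamp);
        self.max_sequence = self.max_sequence.max(meta.max_sequence);
    }
}
```

Because each part already carries its own bounds, the memtable never has to rescan row data to answer range or sequence queries.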
- Introduced `PrimaryKeyEncoding` to differentiate between dense and sparse primary key encodings.
- Updated `BulkMemtableBuilder` to conditionally create memtables based on primary key encoding.
- Integrated `PartitionTreeMemtableBuilder` as a fallback for dense encodings.
- Modified `RegionWriteCtx` to handle mutations differently based on primary key encoding.
- Adjusted `RegionWorkerLoop` to skip bulk encoding for dense primary key mutations.
- Refactored `SparseEncoder` to support conditional compilation for testing purposes.
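
The dispatch on primary key encoding might look roughly like the sketch below. The trait shapes, the `kind` method, and the constructor are hypothetical and much simpler than the real `mito2` traits; the only behaviour taken from the change is that sparse encodings get a bulk memtable while dense encodings fall back to the partition tree builder.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimaryKeyEncoding {
    Dense,
    Sparse,
}

/// Simplified stand-ins for the real memtable traits.
pub trait Memtable {
    fn kind(&self) -> &'static str;
}

pub trait MemtableBuilder {
    fn build(&self, encoding: PrimaryKeyEncoding) -> Box<dyn Memtable>;
}

struct BulkMemtable;
impl Memtable for BulkMemtable {
    fn kind(&self) -> &'static str {
        "bulk"
    }
}

struct PartitionTreeMemtable;
impl Memtable for PartitionTreeMemtable {
    fn kind(&self) -> &'static str {
        "partition_tree"
    }
}

pub struct PartitionTreeMemtableBuilder;
impl MemtableBuilder for PartitionTreeMemtableBuilder {
    fn build(&self, _encoding: PrimaryKeyEncoding) -> Box<dyn Memtable> {
        Box::new(PartitionTreeMemtable)
    }
}

pub struct BulkMemtableBuilder {
    fallback: Box<dyn MemtableBuilder>,
}

impl BulkMemtableBuilder {
    pub fn new(fallback: Box<dyn MemtableBuilder>) -> Self {
        Self { fallback }
    }
}

impl MemtableBuilder for BulkMemtableBuilder {
    fn build(&self, encoding: PrimaryKeyEncoding) -> Box<dyn Memtable> {
        match encoding {
            // Bulk parts assume sparse-encoded keys in this sketch.
            PrimaryKeyEncoding::Sparse => Box::new(BulkMemtable),
            // Dense keys are delegated to the fallback builder.
            PrimaryKeyEncoding::Dense => self.fallback.build(encoding),
        }
    }
}
```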
### Implement Sparse Primary Key Encoding
- **Added `SparseEncoder`**: Introduced a new module `encoder.rs` to implement sparse primary key encoding, replacing the previous dense encoding approach.
- **Updated `BulkPartEncoder`**: Modified `BulkPartEncoder` in `bulk/part.rs` to utilize `SparseEncoder` for encoding primary keys.
- **Refactored `PartitionTree`**: Updated `partition_tree/tree.rs` to use the new `SparseEncoder` for primary key encoding.
- **Code Adjustments**: Removed redundant code and adjusted imports in `key_values.rs` and `partition_tree/tree.rs` to align with the new encoding strategy.
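
As a rough illustration of what "sparse" means here, the sketch below writes only the primary key columns that have a value, each tagged with its column id. The byte layout (big-endian `u32` id, `u32` length, UTF-8 bytes) and both function names are assumptions made for this example and are not the format produced by `SparseEncoder`.

```rust
/// Encodes only the columns that carry a value, as (column id, length, bytes).
/// The layout is illustrative, not the actual on-disk format.
pub fn encode_sparse(columns: &[(u32, Option<&str>)]) -> Vec<u8> {
    let mut buf = Vec::new();
    for (column_id, value) in columns {
        if let Some(v) = value {
            buf.extend_from_slice(&column_id.to_be_bytes());
            buf.extend_from_slice(&(v.len() as u32).to_be_bytes());
            buf.extend_from_slice(v.as_bytes());
        }
    }
    buf
}

/// Decodes the layout written by `encode_sparse`; returns `None` on malformed input.
pub fn decode_sparse(mut buf: &[u8]) -> Option<Vec<(u32, String)>> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 8 {
            return None; // truncated header
        }
        let id = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
        let len = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]) as usize;
        let end = 8usize.checked_add(len)?;
        let bytes = buf.get(8..end)?;
        out.push((id, String::from_utf8(bytes.to_vec()).ok()?));
        buf = &buf[end..];
    }
    Some(out)
}
```

A dense encoding would instead emit a slot for every primary key column in schema order, including the empty ones.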
Add allocation tracking to `BulkMemtable` methods
- Updated `write_bulk` in `bulk.rs` to track memory allocation using `alloc_tracker.on_allocation`.
- Modified `freeze` in `bulk.rs` to signal completion of allocation with `alloc_tracker.done_allocating`.
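
A minimal sketch of this pattern, assuming a tracker that simply counts bytes: the method names `on_allocation` and `done_allocating` come from the change, while the struct definitions, the byte-vector parts, and the accessor methods are hypothetical stand-ins.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the real allocation tracker.
#[derive(Default)]
pub struct AllocTracker {
    bytes: AtomicUsize,
    done: AtomicBool,
}

impl AllocTracker {
    pub fn on_allocation(&self, bytes: usize) {
        self.bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    pub fn done_allocating(&self) {
        self.done.store(true, Ordering::Relaxed);
    }

    pub fn bytes_allocated(&self) -> usize {
        self.bytes.load(Ordering::Relaxed)
    }

    pub fn is_done_allocating(&self) -> bool {
        self.done.load(Ordering::Relaxed)
    }
}

/// Simplified memtable: each part is just a byte buffer in this sketch.
#[derive(Default)]
pub struct BulkMemtable {
    parts: Mutex<Vec<Vec<u8>>>,
    pub alloc_tracker: AllocTracker,
}

impl BulkMemtable {
    /// Records the part's size before storing it.
    pub fn write_bulk(&self, part: Vec<u8>) {
        self.alloc_tracker.on_allocation(part.len());
        self.parts.lock().unwrap().push(part);
    }

    /// Freezing means no further writes, so allocation is finished.
    pub fn freeze(&self) {
        self.alloc_tracker.done_allocating();
    }
}
```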
**feat(memtable): Add BulkMemtableBuilder and BulkMemtable**
- Introduced `BulkMemtableBuilder` and `BulkMemtable` in `memtable.rs` and `bulk.rs` to support bulk operations.
- Added an environment variable check for `enable_bulk_memtable` to conditionally use `BulkMemtableBuilder`.
- Implemented `MemtableBuilder` for `BulkMemtableBuilder` and `Memtable` for `BulkMemtable`.
- Included new fields `dedup` and `merge_mode` in `BulkMemtable` to handle deduplication and merge operations.
- Temporarily disabled reads in `BulkMemtable` with `EmptyIter` as a placeholder iterator.
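
The environment gate and the placeholder iterator could be sketched as below. How the real code parses the variable is not stated here, so `parse_flag` (accepting only `true` or `false`) is an assumption, as is the `Vec<u8>` item type of `EmptyIter`.

```rust
use std::env;

/// Hypothetical parsing rule: only a literal "true" enables the feature.
pub fn parse_flag(raw: Option<&str>) -> bool {
    raw.and_then(|v| v.trim().parse::<bool>().ok()).unwrap_or(false)
}

/// Reads the `enable_bulk_memtable` environment variable.
pub fn bulk_memtable_enabled() -> bool {
    parse_flag(env::var("enable_bulk_memtable").ok().as_deref())
}

/// Placeholder iterator that never yields anything, used while reads are disabled.
pub struct EmptyIter;

impl Iterator for EmptyIter {
    // The item type is a stand-in; the real iterator yields batches.
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        None
    }
}
```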
- **Refactored SST File Handling**:
  - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
  - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
  - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
  - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.
- **Enhanced Indexer Management**:
  - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
  - Updated `ParquetWriter` to handle multiple indexers and file IDs.
  - Files affected: `index.rs`, `parquet.rs`, `writer.rs`.
- **Removed Redundant File ID Handling**:
  - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
  - Updated related logic to dynamically generate file IDs where necessary.
  - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.
- **Test Adjustments**:
  - Updated tests to align with new path and indexer management.
  - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
  - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.
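
The idea behind the path provider refactor can be sketched as follows: the writer generates the file ID itself and asks a provider for the paths, instead of receiving fixed paths in the request. The method names, the `u64` file ID, the path layout, and `write_sst` are assumptions for illustration; only the type names `FilePathProvider`, `RegionFilePathFactory`, and `FixedPathProvider` appear in the change.

```rust
/// Hypothetical file ID; the real type is not shown in this changelog.
pub type FileId = u64;

/// Assumed shape of the trait: one method per kind of file.
pub trait FilePathProvider {
    fn build_sst_file_path(&self, file_id: FileId) -> String;
    fn build_index_file_path(&self, file_id: FileId) -> String;
}

/// Derives paths from a region directory (layout is illustrative).
pub struct RegionFilePathFactory {
    pub region_dir: String,
}

impl FilePathProvider for RegionFilePathFactory {
    fn build_sst_file_path(&self, file_id: FileId) -> String {
        format!("{}/{}.parquet", self.region_dir, file_id)
    }

    fn build_index_file_path(&self, file_id: FileId) -> String {
        format!("{}/index/{}.puffin", self.region_dir, file_id)
    }
}

/// Test helper that always returns the same paths, whatever ID is requested.
pub struct FixedPathProvider {
    pub sst_path: String,
    pub index_path: String,
}

impl FilePathProvider for FixedPathProvider {
    fn build_sst_file_path(&self, _file_id: FileId) -> String {
        self.sst_path.clone()
    }

    fn build_index_file_path(&self, _file_id: FileId) -> String {
        self.index_path.clone()
    }
}

/// Stand-in for the writer: it picks the file ID and resolves both paths.
pub fn write_sst(provider: &dyn FilePathProvider, next_file_id: FileId) -> (String, String) {
    (
        provider.build_sst_file_path(next_file_id),
        provider.build_index_file_path(next_file_id),
    )
}
```

With this split, callers such as compaction and flush no longer need to allocate a `file_id` up front, which matches the removal of that field from the request types.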
* chore: merge main
* refactor/generate-file-id-in-parquet-writer:
**Enhance Logging in Compactor**
- Updated `compactor.rs` to improve logging of the compaction process.
- Added `itertools::Itertools` for efficient string joining.
- Moved logging of compaction inputs and outputs into the async block for better context.
- Enhanced the log message to include both input and output file names for better traceability.
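
A log line of this shape could be built as in the sketch below. The message text, the function name, and the parameters are hypothetical, and the example uses the standard library's slice `join` so it compiles without extra crates; with `itertools::Itertools` in scope, `join` can be called directly on an iterator of file IDs instead.

```rust
/// Builds a single log line naming both the input and output files of a compaction.
/// The wording of the message is illustrative only.
pub fn format_compaction_log(region_id: u64, inputs: &[&str], outputs: &[&str]) -> String {
    format!(
        "Compacted SST files, region_id: {}, input: [{}], output: [{}]",
        region_id,
        inputs.join(","),
        outputs.join(",")
    )
}
```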