--- Feature Name: Export/Import V2 Tracking Issue: TBD Date: 2025-12-30 Author: @jeremy --- # Summary This RFC proposes a redesigned export/import system (V2) for GreptimeDB that addresses fundamental issues in the current implementation. The new design leverages time-series characteristics for efficient chunking, provides clear storage semantics, and ensures data reliability through comprehensive validation mechanisms. # Motivation ## Problems with V1 The current export/import implementation has several critical issues: 1. **Ambiguous path semantics**: `output_dir` serves dual purposes (local vs remote), causing confusion 2. **No chunking strategy**: Large databases cannot be exported/imported efficiently 3. **Poor reliability**: No resume capability, no progress tracking 4. **Limited format support**: Relies on SQL dumps, not optimized for time-series data ## Goals 1. **Clear semantics**: Explicit distinction between remote storage and server-local paths 2. **Scalability**: Support TB-scale databases through time-based chunking 3. **Reliability**: Resume capability, progress tracking, data integrity verification 4. **Performance**: Streaming export/import using native COPY DATABASE ## Non-Goals - Schema evolution/migration (out of scope) - Cross-version compatibility (V2 does not support V1 format) - Real-time replication (use dedicated replication mechanism) # Guide-level Explanation ## Core Concepts ### Snapshot A **snapshot** represents the complete state of a GreptimeDB catalog at a point in time, including schemas and data. **Snapshot structure**: ``` snapshot-20250101/ ├── manifest.json # Snapshot metadata and chunk index ├── schema/ │ ├── schemas.json # Schema definitions (JSON) │ ├── tables.json # Table definitions (JSON) │ └── views.json # View definitions (JSON) └── data/ ├── 1/ │ ├── public.metrics.parquet │ └── public.logs.parquet ├── 2/ │ ├── public.metrics.parquet │ └── public.logs.parquet └── 3/ ├── public.metrics.parquet └── public.logs.parquet ``` **Key properties**: - Self-contained (all information needed for restore) - Immutable (content never changes after creation) - Verifiable (checksums at file, chunk, and snapshot levels) - Schema-only snapshots contain only `manifest.json` and `schema/`; `data/` is absent, `chunks` is empty, and later data append is rejected (use `--force` to recreate) ### Chunk A **chunk** is a time-range partition of data. Each chunk is independently exportable/importable and retryable. **Chunk properties**: - Has explicit `start_time` and `end_time` (recorded in manifest) - Non-overlapping with other chunks - Covers a contiguous time range - Independent (can be exported/imported in any order) - Atomic (either fully succeeds or fully fails) **Chunk directory naming**: - Sequential numbers: `1/`, `2/`, `3/`, ... - Time ranges are recorded in manifest.json ### Storage Types V2 supports two storage types: | Type | Example | Use Case | | ------------------ | ----------------------- | ------------------------ | | **Remote Storage** | `s3://bucket/snapshots` | Production (recommended) | | **Server Path** | `file:///data/backup` | Local dev/testing | **Important**: Local paths (e.g., `/tmp/export`, `./backup`) are **not supported** because schema export (CLI) and data export (server) run in different processes, which would split the snapshot across two machines. ## Basic Usage ### Export ```bash # Full snapshot to S3 greptime export create \ --to s3://my-bucket/snapshots/prod-20250101 # Incremental snapshot (time range) greptime export create \ --start-time 2024-12-01T00:00:00Z \ --end-time 2024-12-31T23:59:59Z \ --to s3://my-bucket/snapshots/prod-december # Schema-only export greptime export create \ --schema-only \ --to s3://my-bucket/snapshots/prod-schema-only Schema-only snapshots cannot be resumed with data; use `--force` to recreate. # Export with specific format (default: parquet) greptime export create \ --format csv \ --to s3://my-bucket/snapshots/prod-csv # Resume interrupted export (automatic if snapshot exists) greptime export create \ --to s3://my-bucket/snapshots/prod-20250101 # Force recreate (delete existing and start over) greptime export create \ --to s3://my-bucket/snapshots/prod-20250101 \ --force ``` ### Import ```bash # Full import greptime import \ --from s3://my-bucket/snapshots/prod-20250101 # Partial import (selected schemas) greptime import \ --from s3://my-bucket/snapshots/prod-20250101 \ --schemas public,private # Dry-run (verify without importing) greptime import \ --from s3://my-bucket/snapshots/prod-20250101 \ --dry-run ``` # Reference-level Explanation ## Architecture The export/import system consists of four main components: 1. **CLI**: Command parsing, progress display, state management 2. **Coordinator**: Snapshot planning, chunk scheduling, retry logic 3. **Schema Engine**: DDL extraction, JSON serialization, schema validation 4. **Data Engine**: Time-based chunking, streaming export/import via COPY DATABASE All components use OpenDAL for storage abstraction, supporting S3, OSS, GCS, Azure Blob, and local filesystem. ## Data Format ### Manifest File The manifest is a JSON file containing snapshot metadata and chunk index: **Key fields**: - `snapshot_id`: Unique identifier (UUID) - `catalog`, `schemas`: Catalog and schema list - `time_range`: Overall time range covered - `schema_only`: Whether the snapshot contains schema only - `chunks[]`: Array of chunk metadata - `format`: Data format for exported files - `checksum`: Snapshot-level SHA256 checksum **Chunk metadata structure**: Each chunk entry in the manifest contains: - `id`: Chunk identifier (sequential number) - `time_range`: Start and end timestamps - `status`: Export status (Pending, InProgress, Completed, Failed) - `files`: List of data files in the chunk directory - `checksum`: Chunk-level checksum for integrity verification ### Schema Files Schema definitions are stored as JSON (not SQL) for better version compatibility and programmatic processing. **Why JSON instead of SQL?** - Version-agnostic (can handle schema evolution) - Programmatically processable (direct deserialization) - Extensible (easy to add new fields) ### Data Files Data is exported via COPY DATABASE, supporting multiple formats: - **Parquet** (default): Columnar format, efficient compression, recommended for production use - **CSV**: Human-readable, universally compatible, useful for debugging and third-party integration - **JSON**: Structured text format, flexible schema representation - Other formats supported by COPY DATABASE Format is specified via `--format` flag and recorded in manifest.json. Import automatically detects the format from manifest. ## Core Design Decisions ### 1. Storage Path Validation Export/import operations validate storage paths to prevent misconfigurations: **Path types**: - `s3://`, `oss://`, `gs://`, `azblob://` → Remote storage (recommended) - `file:///` → Server-local path (only allowed when CLI and server are co-located) - `/path`, `./path` → Rejected (would split snapshot across machines) **Validation**: - Detect path type by URI scheme - For `file:///`, verify server endpoint resolves to localhost or local IP - Reject bare paths without URI scheme ### 2. Time-based Chunking Data is partitioned into time-range chunks for efficient parallel processing and retry. **Algorithm**: - Generate non-overlapping half-open intervals with configurable time window (default: 1 day) - Chunks are numbered sequentially (1, 2, 3, ...) - Each chunk's time range is recorded in manifest.json - Empty chunks (no data in time window) are skipped and not recorded in manifest - Guarantees: no gaps, no overlaps, complete coverage of time range **Chunk time window selection**: The optimal chunk time window depends on data density (volume per unit time): - **Target**: 100MB - 1GB per chunk (balances parallelism and retry cost) - **Default**: 1 day (suitable for most workloads) - **Recommendations**: - High density (>1GB/day): Use smaller windows like 1h, 6h, or 12h - Low density (<100MB/day): Use larger windows like 7d or 30d - Time windows can be adjusted flexibly (not required to align to day boundaries) **Example**: 500GB database spanning 30 days → ~16.7GB/day → use 1h chunks → ~695MB/chunk ### 3. Data Export via COPY DATABASE V2 leverages the existing `COPY DATABASE TO` for data export, with additional tooling layer for chunking, resume, and metadata management. **How it works**: 1. **Export tool** generates chunks based on time range and chunk window 2. For each chunk, **calls COPY DATABASE** with specific time range: ```sql COPY DATABASE TO '' WITH ( START_TIME = '', END_TIME = '', FORMAT = 'parquet' ) ``` 3. **Export tool** records chunk metadata, calculates checksums, and updates manifest **Separation of concerns**: - **COPY DATABASE** (data layer): Streaming export, format support, time filtering - **Export tool** (tooling layer): Chunking, resume, manifest management, checksum calculation, schema export ### 4. Data Integrity Three-layer checksum validation ensures data integrity: 1. **File-level**: SHA256 of each Parquet file 2. **Chunk-level**: Aggregate checksum of all files in a chunk 3. **Snapshot-level**: Aggregate checksum of all chunks Checksums are verified during import before data is written to the database. ### 5. Retry and Resume **Chunk-level retry**: - Each chunk is an independent unit of work - Failed chunks are retried with exponential backoff (capped at 5 minutes) - Successful chunks are never re-exported **Resume capability**: - Manifest tracks chunk status (Pending, InProgress, Completed, Failed) - Export/import automatically resumes when executed on existing snapshot - Skips completed chunks, retries failed/in-progress chunks, processes pending chunks - Works across process restarts - Use `--force` (export only) to delete existing snapshot and start over ### 6. Concurrent Export Safety **Scenario 1**: Export to different paths ✅ - Multiple exports can run simultaneously to different storage locations - No conflicts (only reads database, writes to different paths) **Scenario 2**: Export to same path ⚠️ - Concurrent exports to the same path can corrupt the snapshot - With default resume behavior: both processes will resume the same export (usually safe) - Race condition exists during initial manifest creation, but unlikely in practice - Recommendation: Include timestamp in snapshot path to avoid conflicts ## CLI Interface ### Export Command ``` greptime export create [OPTIONS] --to Required: --to Target storage location Optional: --catalog Catalog name (default: greptime) --schemas Comma-separated schema list (default: all) --start-time Time range start (default: earliest) --end-time Time range end (default: now) --chunk-time-window Chunk time window (default: 1d) --parallelism Concurrency level (default: 1) --format Export format for data file: parquet (default), csv, json, or other formats supported by COPY DATABASE --schema-only Export schema only, no data --force Delete existing snapshot and recreate Behavior: - If snapshot doesn't exist: create new snapshot - If snapshot exists: automatically resume export (skip completed chunks, retry failed chunks, process pending chunks) - If --force is specified: delete existing snapshot first, then create new one ``` ### Import Command ``` greptime import [OPTIONS] --from Required: --from Source snapshot location Optional: --catalog Catalog name (default: greptime) --schemas Comma-separated schema list (default: all) --parallelism Concurrency level (default: 1) --dry-run Verify without importing --time-range Import partial time range only Behavior: - Automatically resumes if import was previously interrupted - Skips completed chunks, retries failed chunks, processes pending chunks ``` ### Management Commands ```bash # List snapshots greptime export list --location s3://bucket/snapshots # Verify snapshot integrity greptime export verify --snapshot s3://bucket/snapshots/prod-20250101 # Delete snapshot greptime export delete --snapshot s3://bucket/snapshots/old-snapshot ``` # Drawbacks 1. **Breaking change**: V2 does not support V1 format (migration required) 2. **Storage dependency**: Relies on object storage for production use 3. **Time-series assumption**: Assumes data has time index (not suitable for non-time-series tables) 4. **Chunk granularity**: Very sparse data may result in many small chunks # Rationale and Alternatives ## Why time-based chunking? **Alternatives considered**: - **Size-based chunking**: Requires scanning data to estimate size (double I/O overhead) - **Adaptive chunking**: Complex, requires metadata scanning, marginal benefit **Decision**: Fixed time-window chunking - Simple and predictable - Aligns with time-series data characteristics - User can adjust based on data density - No pre-scanning required ## Why JSON for schema, not SQL? **Alternatives**: SQL dumps (V1 approach) **Decision**: JSON schema format - Version-agnostic (can add compatibility layer) - Programmatically processable - Easier to validate and transform - Avoids SQL dialect issues # Unresolved Questions 1. **Cross-version restore**: Should V2 support restoring to older GreptimeDB versions? 2. **Partial schema export**: Should we support table-level filtering (not just schema-level)? # Future Possibilities 1. **Incremental backup**: Export only data changes since last backup (requires WAL integration) 2. **Parallel chunk processing**: Export/import multiple chunks simultaneously (requires careful resource management) 3. **Snapshot metadata service**: Centralized snapshot registry and lifecycle management