feat: Prototype of the storage engine (#107)

* feat: memtable flush (#63)

* wip: memtable flush

* optimize schema conversion

* remove unnecessary import

* add parquet file verification

* add backtrace to error

* chore: upgrade opendal to 0.9 and fixed some problems

* rename error

* fix: error description

Co-authored-by: Dennis Zhuang <killme2008@gmail.com>

* feat: region manifest service (#57)

* feat: adds Manifest API

* feat: impl region manifest service

* refactor: by CR comments

* fix: storage error mod test

* fix: tweak storage cargo

* fix: tweak storage cargo

* refactor: by CR comments

* refactor: rename current_version

* feat: add wal writer (#60)

* feat: add Wal

* upgrade engine for wal

* fix: unit test for wal

* feat: wal into region

* fix: unit test

* fix clippy

* chore: by cr

* chore: by cr

* chore: prevent test data pollution

* chore: by cr

* minor fix

* chore: by cr

* feat: Implement flush (#65)

* feat: Flush framework

- feat: Add id to memtable
- refactor: Rename MemtableSet/MutableMemtables to MemtableVersion/MemtableSet
- feat: Freeze memtable
- feat: Trigger flush
- feat: Background job pool
- feat: flush job
- feat: Sst access layer
- feat: Custom Deserialize for StringBytes
- feat: Use RegionWriter to apply file metas
- feat: Apply version edit
- chore: Remove unused imports

refactor: Use ParquetWriter to replace FlushTask

refactor: FsAccessLayer takes object store as param

chore: Remove todo from doc comments

feat: Move wal to WriterContext

chore: Fix clippy

chore: Add backtrace to WriteWal error

* feat: adds manifest to region and refactor sst/manifest dir config (#72)

* feat: adds manifest to region and refactor sst/manifest dir with EngineConfig

* refactor: ensure path ends with '/' in ManifestLogStorage

* fix: style

* refactor: normalize storage directory path and minor changes by CR

* refactor: doesn't need slash any more

* feat: Implement apply_edit() and add timestamp index to schema (#73)

* feat: Implement VersionControl::apply_edit()

* feat: Add timestamp index to schema

* feat: Implement Schema::timestamp_column()

* feat: persist region metadata to manifest (#74)

* feat: persist metadata when creating region or sst files

* fix: revert FileMeta comment

* feat: resolve todo

* fix: clippy warning

* fix: revert files_to_remove type in RegionEdit

* feat: impl SizeBasedStrategy for flush (#76)

* feat: impl SizeBasedStrategy for flush

* doc: get_mutable_limitation

* fix: code style and comment

* feat: align timestamp (#75)

* feat: align timestamps in write batch

* fix cr comments

* fix timestamp overflow

* simplify overflow check

* fix cr comments

* fix clippy issues

* test: Fix region tests (comment out some unsupported tests) (#82)

* feat: flush job (#80)

* feat: flush job

* fix cr comments

* move file name instead of clone

* comment log file test (#84)

* feat: improve MemtableVersion (#78)

* feat: improve MemtableVersion

* feat: remove flushed immutable memtables and test MemtableVersion

* refactor: by CR comments

* refactor: clone kv in iterator

* fix: clippy warning

* refactor: Make BatchIterator supertrait of Iterator (#85)

* refactor: rename Version to ManifestVersion and move out manifest from ShareData (#83)

* feat: Insert multiple memtables by time range (#77)

* feat: memtable::Inserter supports insert multiple memtables by time range

* chore: Update timestamp comment

* test: Add tests for Inserter

* test: Fix region tests (comment out some unsupported tests)

* refactor: align_timestamp() use TimestampMillis::aligned_by_bucket()

* chore: rename aligned_by_bucket to align_by_bucket

* fix: Fix compile errors

* fix: sst and manifest dir (#86)

* Set RowKeyDescriptor::enable_version_column to false by default

* feat: Implement write stall (#90)

* feat: Implement write stall

* chore: Update comments

* feat: Support reading multiple memtables (#93)

* feat: Support reading multiple memtables

* test: uncomment tests rely on snapshot read

* feat: wal format (#70)

* feat: wal codec

* chore: minor fix

* chore: comment

* chore: by cr

* chore: write_batch_codec mod

* chore: by cr

* chore: upgrade proto

* chore: by cr

* fix failing test

* fix failing test

* feat: manifest to wal (#100)

* feat: write manifest to wal

* chore: sequence into wal

* chore: by cr

* chore: by cr

* refactor: create log store (#104)

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: fariygirl <clickmetoday@163.com>
Co-authored-by: Jiachun Feng <jiachun_feng@proton.me>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: Fix clippy

Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Jiachun Feng <jiachun_feng@proton.me>
Co-authored-by: fariygirl <clickmetoday@163.com>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
evenyag
2022-07-25 15:26:00 +08:00
committed by GitHub
parent 2b064265bf
commit bf5975ca3e
95 changed files with 5675 additions and 543 deletions

View File

@@ -4,7 +4,7 @@ version = "0.1.0"
 edition = "2021"
 [dependencies]
-bytes = "1.1"
+bytes = { version = "1.1", features = ["serde"] }
 common-error = { path = "../error" }
 paste = "1.0"
 serde = { version = "1.0", features = ["derive"] }

View File

@@ -1,9 +1,9 @@
 use std::ops::Deref;
-use serde::{Serialize, Serializer};
+use serde::{Deserialize, Deserializer, Serialize, Serializer};
 /// Bytes buffer.
-#[derive(Debug, Default, Clone, PartialEq, Eq, PartialOrd, Ord)]
+#[derive(Debug, Default, Clone, PartialEq, Eq, PartialOrd, Ord, Deserialize, Serialize)]
 pub struct Bytes(bytes::Bytes);
 impl From<bytes::Bytes> for Bytes {
@@ -56,15 +56,6 @@ impl PartialEq<Bytes> for [u8] {
     }
 }
-impl Serialize for Bytes {
-    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
-    where
-        S: Serializer,
-    {
-        self.0.serialize(serializer)
-    }
-}
 /// String buffer that can hold arbitrary encoding string (only support UTF-8 now).
 ///
 /// Now this buffer is restricted to only hold valid UTF-8 string (only allow constructing `StringBytes`
@@ -128,6 +119,17 @@ impl Serialize for StringBytes {
     }
 }
+// Custom Deserialize to ensure UTF-8 check is always done.
+impl<'de> Deserialize<'de> for StringBytes {
+    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
+    where
+        D: Deserializer<'de>,
+    {
+        let s = String::deserialize(deserializer)?;
+        Ok(StringBytes::from(s))
+    }
+}
 #[cfg(test)]
 mod tests {
     use super::*;
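
The custom `Deserialize` above decodes into a `String` first, so the UTF-8 check cannot be skipped on the way in. The following is a minimal std-only sketch of that invariant, not the crate's code: `Vec<u8>` stands in for `bytes::Bytes`, and `as_utf8` and `decode_string_bytes` are hypothetical names used only for illustration.

```rust
use std::string::FromUtf8Error;

/// Simplified stand-in for `StringBytes` (assumption: `Vec<u8>` replaces
/// `bytes::Bytes`). It is only ever built from an already validated `String`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StringBytes(Vec<u8>);

impl StringBytes {
    /// Hypothetical accessor: views the buffer as `&str`. This cannot fail
    /// because the only constructor starts from a valid `String`.
    pub fn as_utf8(&self) -> &str {
        std::str::from_utf8(&self.0).expect("constructed from valid UTF-8")
    }
}

impl From<String> for StringBytes {
    fn from(s: String) -> Self {
        StringBytes(s.into_bytes())
    }
}

/// Hypothetical decode path that mirrors the shape of the custom
/// `Deserialize`: raw input -> `String` (validated) -> `StringBytes`.
pub fn decode_string_bytes(raw: Vec<u8>) -> Result<StringBytes, FromUtf8Error> {
    let s = String::from_utf8(raw)?; // rejects invalid UTF-8 here
    Ok(StringBytes::from(s))
}
```

Because validation happens once at the boundary, later reads can treat the buffer as text without re-checking it.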

View File

@@ -34,6 +34,11 @@ pub enum StatusCode {
     TableNotFound,
     TableColumnNotFound,
     // ====== End of catalog related status code =======
+    // ====== Begin of storage related status code =====
+    /// Storage is temporarily unable to handle the request
+    StorageUnavailable,
+    // ====== End of storage related status code =======
 }
 impl fmt::Display for StatusCode {

View File

@@ -9,4 +9,4 @@ pub use global::{
     spawn_read, spawn_write, write_runtime,
 };
-pub use crate::runtime::{Builder, JoinHandle, Runtime};
+pub use crate::runtime::{Builder, JoinError, JoinHandle, Runtime};

View File

@@ -6,13 +6,13 @@ use metrics::{decrement_gauge, increment_gauge};
 use snafu::ResultExt;
 use tokio::runtime::{Builder as RuntimeBuilder, Handle};
 use tokio::sync::oneshot;
-pub use tokio::task::JoinHandle;
+pub use tokio::task::{JoinError, JoinHandle};
 use crate::error::*;
 use crate::metric::*;
 /// A runtime to run future tasks
-#[derive(Clone)]
+#[derive(Clone, Debug)]
 pub struct Runtime {
     handle: Handle,
     // Used to receive a drop signal when dropper is dropped, inspired by databend
@@ -20,6 +20,7 @@ pub struct Runtime {
 }
 /// Dropping the dropper will cause runtime to shutdown.
+#[derive(Debug)]
 pub struct Dropper {
     close: Option<oneshot::Sender<()>>,
 }

View File

@@ -11,7 +11,7 @@ pub struct TimeRange<T> {
 }
 impl<T> TimeRange<T> {
-    /// Create a new range that contains timestamp in `[start, end)`.
+    /// Creates a new range that contains timestamp in `[start, end)`.
     ///
     /// Returns `None` if `start` > `end`.
     pub fn new<U: PartialOrd + Into<T>>(start: U, end: U) -> Option<TimeRange<T>> {
@@ -23,6 +23,14 @@ impl<T> TimeRange<T> {
         }
     }
+    /// Given a value, creates an empty time range that `start == end == value`.
+    pub fn empty_with_value<U: Clone + Into<T>>(value: U) -> TimeRange<T> {
+        TimeRange {
+            start: value.clone().into(),
+            end: value.into(),
+        }
+    }
     /// Returns the lower bound of the range (inclusive).
     #[inline]
     pub fn start(&self) -> &T {
@@ -71,6 +79,10 @@ mod tests {
         assert_eq!(range_eq.start(), range_eq.end());
         assert_eq!(None, RangeMillis::new(1, 0));
+        let range = RangeMillis::empty_with_value(1024);
+        assert_eq!(range.start(), range.end());
+        assert_eq!(1024, *range.start());
     }
     #[test]
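
Since the range is half-open (`[start, end)`), a range built by `empty_with_value` contains no timestamp at all, not even `value` itself. The sketch below restates the two constructors from the diff in a self-contained form to show this. The `contains` helper is a hypothetical addition for illustration and is not part of the diff.

```rust
/// Half-open range `[start, end)`, restated from the diff above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TimeRange<T> {
    start: T,
    end: T,
}

impl<T> TimeRange<T> {
    /// Returns `None` if `start` > `end`.
    pub fn new<U: PartialOrd + Into<T>>(start: U, end: U) -> Option<TimeRange<T>> {
        if start <= end {
            Some(TimeRange { start: start.into(), end: end.into() })
        } else {
            None
        }
    }

    /// Creates an empty range with `start == end == value`.
    pub fn empty_with_value<U: Clone + Into<T>>(value: U) -> TimeRange<T> {
        TimeRange { start: value.clone().into(), end: value.into() }
    }

    pub fn start(&self) -> &T {
        &self.start
    }

    pub fn end(&self) -> &T {
        &self.end
    }
}

impl<T: PartialOrd> TimeRange<T> {
    /// Hypothetical helper (not in the diff): `start` is inclusive,
    /// `end` is exclusive.
    pub fn contains(&self, value: &T) -> bool {
        value >= &self.start && value < &self.end
    }
}
```

An empty range still carries a position, which is presumably why the constructor takes a value rather than defaulting to zero.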

View File

@@ -1,6 +1,8 @@
 use std::cmp::Ordering;
 /// Unix timestamp in millisecond resolution.
+///
+/// Negative timestamp is allowed, which represents timestamp before '1970-01-01T00:00:00'.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
 pub struct TimestampMillis(i64);
@@ -18,6 +20,29 @@ impl TimestampMillis {
     pub const fn new(ms: i64) -> TimestampMillis {
         TimestampMillis(ms)
     }
+    /// Returns the timestamp aligned by `bucket_duration` in milliseconds or
+    /// `None` if overflow occurred.
+    ///
+    /// # Panics
+    /// Panics if `bucket_duration <= 0`.
+    pub fn align_by_bucket(self, bucket_duration: i64) -> Option<TimestampMillis> {
+        assert!(bucket_duration > 0);
+        let ts = if self.0 >= 0 {
+            self.0
+        } else {
+            // `bucket_duration > 0` implies `bucket_duration - 1` won't overflow.
+            self.0.checked_sub(bucket_duration - 1)?
+        };
+        Some(TimestampMillis(ts / bucket_duration * bucket_duration))
+    }
+    /// Returns the timestamp value as i64.
+    pub fn as_i64(&self) -> i64 {
+        self.0
+    }
 }
 impl From<i64> for TimestampMillis {
@@ -60,6 +85,7 @@ mod tests {
         let timestamp = TimestampMillis::from(ts);
         assert_eq!(timestamp, ts);
         assert_eq!(ts, timestamp);
+        assert_eq!(ts, timestamp.as_i64());
         assert_ne!(TimestampMillis::new(0), timestamp);
         assert!(TimestampMillis::new(-123) < TimestampMillis::new(0));
@@ -70,4 +96,28 @@ mod tests {
         assert_eq!(i64::MAX - 1, TimestampMillis::MAX);
         assert_eq!(i64::MIN, TimestampMillis::MIN);
     }
+    #[test]
+    fn test_align_by_bucket() {
+        let bucket = 100;
+        assert_eq!(0, TimestampMillis::new(0).align_by_bucket(bucket).unwrap());
+        assert_eq!(0, TimestampMillis::new(1).align_by_bucket(bucket).unwrap());
+        assert_eq!(0, TimestampMillis::new(99).align_by_bucket(bucket).unwrap());
+        assert_eq!(
+            100,
+            TimestampMillis::new(100).align_by_bucket(bucket).unwrap()
+        );
+        assert_eq!(
+            100,
+            TimestampMillis::new(199).align_by_bucket(bucket).unwrap()
+        );
+        assert_eq!(0, TimestampMillis::MAX.align_by_bucket(i64::MAX).unwrap());
+        assert_eq!(
+            i64::MAX,
+            TimestampMillis::INF.align_by_bucket(i64::MAX).unwrap()
+        );
+        assert_eq!(None, TimestampMillis::MIN.align_by_bucket(bucket));
+    }
 }
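
The test above covers non-negative inputs and the `MIN` overflow case, but not ordinary negative timestamps. Rust's integer division truncates toward zero, so a negative value has to be shifted down by `bucket_duration - 1` before dividing in order to round toward negative infinity. The sketch below mirrors the method's logic as a free function over plain `i64`. The free-function form is an assumption made to keep the example self-contained.

```rust
/// Free-function mirror of `TimestampMillis::align_by_bucket`, operating on
/// raw `i64` milliseconds. Returns `None` if the shift for a negative
/// timestamp would overflow.
///
/// Panics if `bucket_duration <= 0`.
pub fn align_by_bucket(ts: i64, bucket_duration: i64) -> Option<i64> {
    assert!(bucket_duration > 0);
    let shifted = if ts >= 0 {
        ts
    } else {
        // `/` truncates toward zero, so shift negative values down by
        // `bucket_duration - 1` to get floor behaviour after dividing.
        ts.checked_sub(bucket_duration - 1)?
    };
    Some(shifted / bucket_duration * bucket_duration)
}
```

With a bucket of 100, `-1` aligns to `-100`, `-100` stays at `-100`, and `-101` aligns to `-200`. Timestamps within `bucket_duration - 1` of `i64::MIN` yield `None`, because the shift itself overflows.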