From a547c523c2daba9e661fd2caa6770830bb0b52bb Mon Sep 17 00:00:00 2001
From: Will Jones
Date: Fri, 28 Mar 2025 11:04:31 -0700
Subject: [PATCH] feat!: change default read_consistency_interval=5s (#2281)

Previously, when we loaded the next version of the table, we would block all reads with a write lock. Now, we only do that if `read_consistency_interval=0`. Otherwise, we load the next version asynchronously in the background. This should mean that `read_consistency_interval > 0` won't have a meaningful impact on latency.

Along with this change, I felt it was safe to change the default consistency interval to 5 seconds. The current default is `None`, which means we will **never** check for a new version by default. I think that default is contrary to most users expectations.
---
 docs/src/guides/tables.md | 42 ++++--
 docs/src/js/interfaces/ConnectionOptions.md | 2 +-
 docs/src/troubleshooting.md | 1 +
 node/src/integration_test/test.ts | 2 +-
 nodejs/__test__/connection.test.ts | 2 +-
 nodejs/__test__/table.test.ts | 2 +-
 nodejs/examples/basic.test.ts | 30 +++++
 nodejs/src/connection.rs | 12 +-
 nodejs/src/lib.rs | 4 +-
 python/python/lancedb/__init__.py | 14 +-
 python/python/lancedb/db.py | 9 +-
 python/python/tests/docs/test_guide_tables.py | 13 +-
 python/python/tests/test_db.py | 11 +-
 python/python/tests/test_table.py | 6 +-
 python/src/connection.rs | 4 +-
 rust/ffi/node/src/lib.rs | 2 +-
 rust/lancedb/src/catalog/listing.rs | 8 +-
 rust/lancedb/src/connection.rs | 21 +--
 rust/lancedb/src/table.rs | 17 +--
 rust/lancedb/src/table/dataset.rs | 126 +++++++++++++++---
 20 files changed, 246 insertions(+), 82 deletions(-)

diff --git a/docs/src/guides/tables.md b/docs/src/guides/tables.md
index a202d2cc..5dcca406 100644
--- a/docs/src/guides/tables.md
+++ b/docs/src/guides/tables.md
@@ -1001,9 +1001,11 @@ In LanceDB OSS, users can set the `read_consistency_interval` parameter on conne
 
 There are three possible settings for `read_consistency_interval`:
 
-1. **Unset (default)**: The database does not check for updates to tables made by other processes. This provides the best query performance, but means that clients may not see the most up-to-date data. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
-2. **Zero seconds (Strong consistency)**: The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS.
-3. **Custom interval (Eventual consistency)**: The database checks for updates at a custom interval, such as every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.
+1. **Unset**: The database does not check for updates to tables made by other processes. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
+2. **Zero seconds (Strong consistency)**: The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS. For best performance, combine this setting with the storage option `new_table_enable_v2_manifest_paths` set to `true`.
+3. **Custom interval (Eventual consistency, the default)**: The database checks for updates at a custom interval. By default, this is every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.
+
+You can always force a synchronization by calling `checkout_latest()` / `checkoutLatest()` on a table.
 
 !!! tip "Consistency in LanceDB Cloud"
 
@@ -1041,7 +1043,21 @@ There are three possible settings for `read_consistency_interval`:
 --8<-- "python/python/tests/docs/test_guide_tables.py:table_async_eventual_consistency"
 ```
 
- By default, a `Table` will never check for updates from other writers. To manually check for updates you can use `checkout_latest`:
+ For no consistency, use `None`:
+
+ === "Sync API"
+
+ ```python
+ --8<-- "python/python/tests/docs/test_guide_tables.py:table_no_consistency"
+ ```
+
+ === "Async API"
+
+ ```python
+ --8<-- "python/python/tests/docs/test_guide_tables.py:table_async_no_consistency"
+ ```
+
+ To manually check for updates you can use `checkout_latest`:
 
 === "Sync API"
 
@@ -1059,15 +1075,25 @@ There are three possible settings for `read_consistency_interval`:
 To set strong consistency, use `0`:
 
 ```ts
- const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 0 });
- const tbl = await db.openTable("my_table");
+ --8<-- "nodejs/examples/basic.test.ts:table_strong_consistency"
 ```
 
 For eventual consistency, specify the update interval as seconds:
 
 ```ts
- const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 5 });
- const tbl = await db.openTable("my_table");
+ --8<-- "nodejs/examples/basic.test.ts:table_eventual_consistency"
+ ```
+
+ For no consistency, use `null`:
+
+ ```ts
+ --8<-- "nodejs/examples/basic.test.ts:table_no_consistency"
+ ```
+
+ To manually check for updates you can use `checkoutLatest`:
+
+ ```ts
+ --8<-- "nodejs/examples/basic.test.ts:table_checkout_latest"
 ```
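
Reviewer note: the three `read_consistency_interval` settings documented in this patch can be modelled with a small toy class. This is only a sketch to make the semantics concrete, not LanceDB's implementation. `VersionedReader`, its `latest_version` callback, and the injectable `clock` are hypothetical names introduced for illustration. The sketch also refreshes inline for simplicity, whereas the commit message says intervals greater than zero load the next version in the background.

```python
import time
from datetime import timedelta
from typing import Callable, Optional


class VersionedReader:
    """Toy model of a table handle (hypothetical, not LanceDB code).

    read_consistency_interval:
      None          -> never look for a newer version on read
      timedelta(0)  -> look for a newer version on every read
      timedelta(>0) -> look again once the interval has elapsed
    """

    def __init__(
        self,
        latest_version: Callable[[], int],
        read_consistency_interval: Optional[timedelta] = timedelta(seconds=5),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._latest_version = latest_version
        self._interval = read_consistency_interval
        self._clock = clock
        self.version = latest_version()
        self._last_check = clock()

    def checkout_latest(self) -> None:
        # Force a synchronization regardless of the configured interval.
        self.version = self._latest_version()
        self._last_check = self._clock()

    def read(self) -> int:
        # Return the version this read would observe.
        if self._interval is not None:
            elapsed = self._clock() - self._last_check
            if elapsed >= self._interval.total_seconds():
                # Done inline here; the commit describes a background load
                # for intervals greater than zero.
                self.checkout_latest()
        return self.version
```

With a fake clock, a reader created with `None` keeps returning its original version until `checkout_latest()` is called, a reader created with `timedelta(0)` sees every new version immediately, and a reader with the 5-second default catches up only after the interval has passed.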