From 9d29c83f8184d3ec1503589a459aa650108f8708 Mon Sep 17 00:00:00 2001
From: Oz Katz
Date: Tue, 28 Oct 2025 14:12:50 -0400
Subject: [PATCH] docs: remove DynamoDB commit store section (#2715)

This PR removes the section about needing the DynamoDB Commit Store.

Reasoning:

* S3 now supports [conditional writes](https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-writes.html)
* Upstream lance was updated to use this capability in https://github.com/lancedb/lance/issues/2793
* lanceDB itself was updated to include this (see @wjones127's comment [here](https://github.com/lancedb/lancedb/issues/1614#issuecomment-2725687260))
---
 docs/src/guides/storage.md | 111 -------------------------------------
 1 file changed, 111 deletions(-)

diff --git a/docs/src/guides/storage.md b/docs/src/guides/storage.md
index a0a17dbe..e4f73721 100644
--- a/docs/src/guides/storage.md
+++ b/docs/src/guides/storage.md
@@ -397,117 +397,6 @@ For **read-only access**, LanceDB will need a policy such as:
 }
 ```
 
-#### DynamoDB Commit Store for concurrent writes
-
-By default, S3 does not support concurrent writes. Having two or more processes
-writing to the same table at the same time can lead to data corruption. This is
-because S3, unlike other object stores, does not have any atomic put or copy
-operation.
-
-To enable concurrent writes, you can configure LanceDB to use a DynamoDB table
-as a commit store. This table will be used to coordinate writes between
-different processes. To enable this feature, you must modify your connection
-URI to use the `s3+ddb` scheme and add a query parameter `ddbTableName` with the
-name of the table to use.
-
-=== "Python"
-
-    === "Sync API"
-
-        ```python
-        import lancedb
-        db = lancedb.connect(
-            "s3+ddb://bucket/path?ddbTableName=my-dynamodb-table",
-        )
-        ```
-    === "Async API"
-
-        ```python
-        import lancedb
-        async_db = await lancedb.connect_async(
-            "s3+ddb://bucket/path?ddbTableName=my-dynamodb-table",
-        )
-        ```
-
-=== "JavaScript"
-
-    ```javascript
-    const lancedb = require("lancedb");
-
-    const db = await lancedb.connect(
-      "s3+ddb://bucket/path?ddbTableName=my-dynamodb-table",
-    );
-    ```
-
-The DynamoDB table must be created with the following schema:
-
-- Hash key: `base_uri` (string)
-- Range key: `version` (number)
-
-You can create this programmatically with:
-
-=== "Python"
-
-
-    ```python
-    import boto3
-
-    dynamodb = boto3.client("dynamodb")
-    table = dynamodb.create_table(
-        TableName=table_name,
-        KeySchema=[
-            {"AttributeName": "base_uri", "KeyType": "HASH"},
-            {"AttributeName": "version", "KeyType": "RANGE"},
-        ],
-        AttributeDefinitions=[
-            {"AttributeName": "base_uri", "AttributeType": "S"},
-            {"AttributeName": "version", "AttributeType": "N"},
-        ],
-        ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
-    )
-    ```
-
-=== "JavaScript"
-
-
-    ```javascript
-    import {
-      CreateTableCommand,
-      DynamoDBClient,
-    } from "@aws-sdk/client-dynamodb";
-
-    const dynamodb = new DynamoDBClient({
-      region: CONFIG.awsRegion,
-      credentials: {
-        accessKeyId: CONFIG.awsAccessKeyId,
-        secretAccessKey: CONFIG.awsSecretAccessKey,
-      },
-      endpoint: CONFIG.awsEndpoint,
-    });
-    const command = new CreateTableCommand({
-      TableName: table_name,
-      AttributeDefinitions: [
-        {
-          AttributeName: "base_uri",
-          AttributeType: "S",
-        },
-        {
-          AttributeName: "version",
-          AttributeType: "N",
-        },
-      ],
-      KeySchema: [
-        { AttributeName: "base_uri", KeyType: "HASH" },
-        { AttributeName: "version", KeyType: "RANGE" },
-      ],
-      ProvisionedThroughput: {
-        ReadCapacityUnits: 1,
-        WriteCapacityUnits: 1,
-      },
-    });
-    await client.send(command);
-    ```
-
 #### S3-compatible stores