remote_storage: AWS_PROFILE with endpoint overrides in ~/.aws/config (updates AWS SDKs) (#7664)

Before this PR, using the AWS SDK profile feature for running against
minio didn't work because
* our SDK versions were too old and didn't include
  https://github.com/awslabs/aws-sdk-rust/issues/1060, and
* we didn't massage the S3 client config builder correctly.

This PR
* updates all the AWS SDKs we use, each to the latest version I
  could find on crates.io (Is there a better process?),
* changes the way `remote_storage` constructs the S3 client, and
* documents how to run the test suite against real S3 & local minio.

Regarding the changes to `remote_storage`: the SDK docs make it clear that the
recommended way is to use `aws_config::from_env` and then customize the
resulting config.
What we were doing instead was using the `aws_sdk_s3` builder directly.
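For illustration, here is a minimal sketch of the recommended pattern (not the exact code in this PR; the `force_path_style` option shown is an assumption for S3-compatible endpoints like minio):

```rust
// Sketch, assuming the aws_config and aws_sdk_s3 crates: load the shared
// config from the environment (this is what picks up AWS_PROFILE and the
// endpoint overrides in ~/.aws/config), then derive the S3-specific config
// from it and customize.
async fn build_s3_client() -> aws_sdk_s3::Client {
    let sdk_config = aws_config::from_env().load().await;
    let s3_config = aws_sdk_s3::config::Builder::from(&sdk_config)
        // Assumption: path-style addressing is commonly needed for
        // S3-compatible services such as minio.
        .force_path_style(true)
        .build();
    aws_sdk_s3::Client::from_conf(s3_config)
}
```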

To get the `local-minio` setup in the added docs working, I needed to both
update the SDKs and make the changes to `remote_storage`. See the
commit history in this PR for details.

Refs:
* byproduct: https://github.com/smithy-lang/smithy-rs/pull/3633
* follow-up on deprecation:
https://github.com/neondatabase/neon/issues/7665
* follow-up for scrubber S3 setup:
https://github.com/neondatabase/neon/issues/7667

@@ -92,6 +92,166 @@ Exit after the first test failure:
`./scripts/pytest -x ...`
(there are many more pytest options; run `pytest -h` to see them.)
#### Running Python tests against real S3 or S3-compatible services
Neon's `libs/remote_storage` supports multiple implementations of remote storage.
At the time of writing, these are:
```rust
pub enum RemoteStorageKind {
    /// Storage based on local file system.
    /// Specify a root folder to place all stored files into.
    LocalFs(Utf8PathBuf),
    /// AWS S3 based storage, storing all files in the S3 bucket
    /// specified by the config
    AwsS3(S3Config),
    /// Azure Blob based storage, storing all files in the container
    /// specified by the config
    AzureContainer(AzureConfig),
}
```
The test suite has a Python enum with the same name but a different meaning:
```python
@enum.unique
class RemoteStorageKind(str, enum.Enum):
    LOCAL_FS = "local_fs"
    MOCK_S3 = "mock_s3"
    REAL_S3 = "real_s3"
```
* `LOCAL_FS` => `LocalFs`
* `MOCK_S3`: starts [`moto`](https://github.com/getmoto/moto)'s S3 implementation, then configures Pageserver with `AwsS3`
* `REAL_S3` => configure `AwsS3` as detailed below
When a test in the test suite needs an `AwsS3`, it is supposed to call `remote_storage.s3_storage()`.
That function checks env var `ENABLE_REAL_S3_REMOTE_STORAGE`:
* If it is not set, use `MOCK_S3`
* If it is set, use `REAL_S3`.
For `REAL_S3`, the test suite creates the dict/toml representation of `RemoteStorageKind::AwsS3` based on these env vars:
```rust
pub struct S3Config {
    // test suite env var: REMOTE_STORAGE_S3_BUCKET
    pub bucket_name: String,
    // test suite env var: REMOTE_STORAGE_S3_REGION
    pub bucket_region: String,
    // test suite determines this
    pub prefix_in_bucket: Option<String>,
    // no env var exists; test suite sets it for MOCK_S3, because that's how moto works
    pub endpoint: Option<String>,
    ...
}
```
*Credentials* are not part of the config, but discovered by the AWS SDK.
See the `libs/remote_storage` Rust code.
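To see which credentials the SDK's default provider chain ends up resolving (env vars, `AWS_PROFILE`, SSO cache, ...), a debugging sketch like the following can help; it is not part of the codebase and assumes the `aws-config` and `aws-credential-types` crates:

```rust
// Debugging sketch (not part of the codebase): print which access key the
// SDK's default credential provider chain resolved from the environment.
use aws_credential_types::provider::ProvideCredentials;

async fn print_resolved_credentials() {
    let sdk_config = aws_config::from_env().load().await;
    match sdk_config.credentials_provider() {
        Some(provider) => match provider.provide_credentials().await {
            Ok(creds) => eprintln!("resolved access key id: {}", creds.access_key_id()),
            Err(err) => eprintln!("no credentials resolved: {err}"),
        },
        None => eprintln!("no credentials provider configured"),
    }
}
```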
The test suite supports two credential mechanisms (see `remote_storage.py`):
**Credential mechanism 1**: env vars `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
Populate the env vars with AWS access keys that you created in IAM.
Our CI uses this mechanism.
However, it is _not_ recommended for interactive use by developers ([learn more](https://docs.aws.amazon.com/sdkref/latest/guide/access-users.html#credentials-long-term)).
Instead, use profiles (next section).
**Credential mechanism 2**: env var `AWS_PROFILE`.
This uses the AWS SDK's (and CLI's) profile mechanism.
Learn more about it [in the official docs](https://docs.aws.amazon.com/sdkref/latest/guide/file-format.html).
After configuring a profile (e.g. via the `aws` CLI), set the env var to its name.
Putting it all together, the full command lines are:
```bash
# with long-term AWS access keys
ENABLE_REAL_S3_REMOTE_STORAGE=true \
REMOTE_STORAGE_S3_BUCKET=mybucket \
REMOTE_STORAGE_S3_REGION=eu-central-1 \
AWS_ACCESS_KEY_ID=... \
AWS_SECRET_ACCESS_KEY=... \
./scripts/pytest
```
<!-- Don't forget to update the Minio example when changing these -->
```bash
# with AWS_PROFILE
ENABLE_REAL_S3_REMOTE_STORAGE=true \
REMOTE_STORAGE_S3_BUCKET=mybucket \
REMOTE_STORAGE_S3_REGION=eu-central-1 \
AWS_PROFILE=... \
./scripts/pytest
```
If you're using SSO, make sure to `aws sso login --profile $AWS_PROFILE` first.
##### Minio
If you want to run the tests without a cloud setup, we recommend [minio](https://min.io/docs/minio/linux/index.html).
```bash
# Start in Terminal 1
mkdir /tmp/minio_data
minio server /tmp/minio_data --console-address 127.0.0.1:9001 --address 127.0.0.1:9000
```
In another terminal, create an `aws` CLI profile for it:
```ini
# append to ~/.aws/config
[profile local-minio]
services = local-minio-services
[services local-minio-services]
s3 =
  endpoint_url=http://127.0.0.1:9000/
```
Now configure the credentials (this is going to write `~/.aws/credentials` for you).
It's an interactive prompt.
```bash
# Terminal 2
$ aws --profile local-minio configure
AWS Access Key ID [None]: minioadmin
AWS Secret Access Key [None]: minioadmin
Default region name [None]:
Default output format [None]:
```
Now create a bucket `mybucket` using the CLI.
```bash
# (use --profile as shown, or set the AWS_PROFILE env var)
aws --profile local-minio s3 mb s3://mybucket
```
(If it doesn't work, make sure you update your AWS CLI to a recent version.
The [service-specific endpoint feature](https://docs.aws.amazon.com/sdkref/latest/guide/feature-ss-endpoints.html)
that we're using is quite new.)
Finally, run the test suite against minio:
```bash
# with AWS_PROFILE
ENABLE_REAL_S3_REMOTE_STORAGE=true \
REMOTE_STORAGE_S3_BUCKET=mybucket \
REMOTE_STORAGE_S3_REGION=doesntmatterforminio \
AWS_PROFILE=local-minio \
./scripts/pytest
```
NB: you can avoid the `--profile` flag by setting the `AWS_PROFILE` env var.
Just like the AWS SDKs, the `aws` CLI respects it.
#### Running Rust tests against real S3 or S3-compatible services
We have some Rust tests that only run against real S3, e.g., [here](https://github.com/neondatabase/neon/blob/c18d3340b5e3c978a81c3db8b6f1e83cd9087e8a/libs/remote_storage/tests/test_real_s3.rs#L392-L397).
They use the same env vars as the Python test suite (see previous section)
but interpret them on their own.
However, at this time, the interpretation is identical,
so the instructions above apply to the Rust tests as well.
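As a rough illustration of that interpretation (assumed shape, not the actual test code; the helper name is made up):

```rust
// Hypothetical helper showing how a Rust test can read the same env vars as
// the Python suite; the real tests differ in detail but use the same variables.
fn real_s3_config_from_env() -> Option<(String, String)> {
    if std::env::var("ENABLE_REAL_S3_REMOTE_STORAGE").is_err() {
        eprintln!("skipping: ENABLE_REAL_S3_REMOTE_STORAGE is not set");
        return None;
    }
    let bucket = std::env::var("REMOTE_STORAGE_S3_BUCKET").expect("REMOTE_STORAGE_S3_BUCKET");
    let region = std::env::var("REMOTE_STORAGE_S3_REGION").expect("REMOTE_STORAGE_S3_REGION");
    Some((bucket, region))
}
```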
### Writing a test
Every test needs a Neon Environment, or NeonEnv to operate in. A Neon Environment