Commit Graph

1526 Commits

Author SHA1 Message Date
Dmitry Rodionov
9dfa145c7c tone down tenant not found error 2022-05-04 00:47:52 +03:00
Stas Kelvich
5642d0b2b8 Change shutdown_process_on_error thread spawn settings.
Now princeple is following: acceptor threads (libpq and http) error will
bring the pageserver down, but all per-tenant thread failures will be treated
as an error.
2022-05-04 00:42:57 +03:00
Dmitry Rodionov
2f83f793bc print more details when thread fails 2022-05-03 18:31:23 +03:00
Anastasia Lubennikova
2f9b17b9e5 Add simple test of pageserver recovery after crash. To cause a crash, use failpoints in checkpointer 2022-05-03 17:13:09 +03:00
Dmitry Rodionov
e7cba0b607 use thiserror instead of anyhow in disk_btree 2022-05-03 15:34:23 +03:00
Dmitry Rodionov
ff7e9a86c6 turn panic into an error with more details 2022-05-03 12:44:42 +03:00
Heikki Linnakangas
9ede38b6c4 Support finding LSN from a commit timestamp.
A new `get_lsn_by_timestamp` command is added to the libpq page service
API.

An extra timestamp field is now stored in an extra field after each
Clog page. It is the timestamp of the latest commit, among all the
transactions on the Clog page. To find the overall latest commit, we
need to scan all Clog pages, but this isn't a very frequent operation
so that's not too bad.

To find the LSN that corresponds to a timestamp, we perform a binary
search. The binary search starts with min = last LSN when GC ran, and
max = latest LSN on the timeline. On each iteration of the search we
check if there are any commits with a higher-than-requested timestamp
at that LSN.

Implements github issue 1361.
2022-05-03 09:28:57 +03:00
Heikki Linnakangas
62449d6068 Bump vendor/postgres (#1573)
This brings us the performance improvements to WAL redo from
https://github.com/neondatabase/postgres/pull/144
2022-05-03 09:25:12 +03:00
Konstantin Knizhnik
baa59512b8 Traverse frozen layer in get_reconstruct_data in reverse order (#1601)
* Traverse frozen layer in get_reconstruct_data in reverse order

* Fix comments on frozen layers.

Note explicitly the order that the layers are in the queue.

* Add fail point to reproduce failpoint iteration error

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2022-05-03 08:07:14 +03:00
Heikki Linnakangas
87a6c4d051 RFC on connection routing and authentication.
This documents how we want this to work. We're not quite there yet.
2022-05-02 23:39:06 +03:00
Stas Kelvich
801b749e1d Set correct authEndpoint for the new proxy 2022-05-02 21:46:32 +03:00
Kirill Bulatov
5cb501c2b3 Make remote storage test less flacky 2022-05-02 20:04:48 +03:00
Dmitry Rodionov
ad25736f3a Exit pageserver process with correct error code
When we shutdown pageserver due to an error (e g one of th important
thrads panicked) use 1 exit code so systemd can properly restart it
2022-05-02 19:04:45 +03:00
Stas Kelvich
9a396e1feb Support SNI-based routing in proxy 2022-05-02 18:32:18 +03:00
Stas Kelvich
0323bb5870 [proxy] Refactor cplane API and add new console SCRAM auth API
Now proxy binary accepts `--auth-backend` CLI option, which determines
auth scheme and cluster routing method. Following backends are currently
implemented:

* legacy
    old method, when username ends with `@zenith` it uses md5 auth dbname as
    the cluster name; otherwise, it sends a login link and waits for the console
    to call back
* console
    new SCRAM-based console API; uses SNI info to select the destination
    cluster
* postgres
    uses postgres to select auth secrets of existing roles. Useful for local
    testing
* link
    sends login link for all usernames
2022-05-02 18:32:18 +03:00
Dmitry Ivanov
af0195b604 [proxy] Introduce cloud::Api for communication with Neon Cloud
* `cloud::legacy` talks to Cloud API V1.
* `cloud::api` defines Cloud API v2.
* `cloud::local` mocks the Cloud API V2 using a local postgres instance.
* It's possible to choose between API versions using the `--api-version` flag.
2022-05-02 18:32:18 +03:00
Dmitry Ivanov
9df8915b03 [proxy] sasl::Mechanism may return Output during exchange
This is needed to forward the `ClientKey` that's required
to connect the proxy to a compute.

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2022-05-02 18:32:18 +03:00
Dmitry Ivanov
4b1bd32e4a Drop Debug impl for ScramKey and ServerSecret
There's a notion that accidental misuse of those implementations
might reveal authentication secrets.
2022-05-02 18:32:18 +03:00
Andrey Taranik
68ba6a58a0 authEndpoint fix 2022-05-02 17:55:13 +03:00
Andrey Taranik
8f479a712f minor fixes in proxy deployment 2022-05-02 17:55:13 +03:00
Stas Kelvich
2477d2f9e2 Deploy standalone SRAM proxy on staging 2022-05-02 17:55:13 +03:00
Dhammika Pathirana
992874c916 Fix update ps settings doc
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-05-01 13:52:08 -07:00
Dhammika Pathirana
3128e8c75c Fix tenant conf test
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-05-01 13:13:25 -07:00
Dhammika Pathirana
f3f12db2cb Add gc churn threshold knob (#1594)
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-05-01 13:13:17 -07:00
Andrey Taranik
038ea4c128 proxy notice message update (#1600) 2022-04-30 22:04:08 +03:00
Kirill Bulatov
7e1db8c8a1 Show which virtual file got the deserialization errors 2022-04-29 21:40:57 +03:00
Andrey Taranik
aa933d3961 proxy settings update for new domain (#1597) 2022-04-29 20:05:14 +03:00
Dmitry Rodionov
67b4e38092 remporarily disable test_backpressure_received_lsn_lag 2022-04-29 15:53:56 +03:00
Dmitry Rodionov
05f8e6a050 Use fsync+rename for atomic downloads from remote storage
Use failpoint in test_remote_storage to check the behavior
2022-04-29 15:53:56 +03:00
chaitanya sharma
76388abeb6 Rename READMEs with .md extension, and fix links to them.
Commit edba2e97 renamed pageserver/README to pageserver/README.md, but
forgot to update links to it. Fix.

Rename libs/postgres_ffi/README and safekeeper/README files to also
have the the .md extension, so that github can render them nicely.

Quote ascii-diagram in safekeeper/README.md so that it renders
correctly.
2022-04-29 14:23:42 +03:00
Kirill Bulatov
2911eb084a Remove timeline files on detach 2022-04-29 09:19:18 +03:00
Kirill Bulatov
6cca57f95a Properly remove from the local timeline map 2022-04-29 09:19:18 +03:00
Kirill Bulatov
4a46b01caf Properly populate local timeline map 2022-04-29 09:19:18 +03:00
Anastasia Lubennikova
5c5c3c64f3 Fix tenant config parsing. Add a test 2022-04-28 11:49:19 +03:00
Arthur Petukhovsky
29539b0561 Set wal_keep_size to zero (#1507)
wal_keep_size is already set to 0 in our cloud setup, but we don't use this value in tests. This commit fixes wal_keep_size in control_plane and adds tests for WAL recycling and lagging safekeepers.
2022-04-27 19:09:28 +03:00
Dmitry Rodionov
695b5f9d88 Remove obsolete failpoint in proxy
When failpoint feature is disabled it throws away passed code so code
inside is not guaranteed to compile when feature is disabled. In this
particular case code is obsolete so removing it.
2022-04-27 14:34:33 +03:00
Dhammika Pathirana
66694e736a Fix add ps tenant config
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-04-27 00:05:13 -07:00
Dhammika Pathirana
091cefaa92 Fix add compaction for key partitioning
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-04-27 00:05:13 -07:00
Dhammika Pathirana
aeb4f81c3b Add branch traversal unit test
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-04-27 00:05:13 -07:00
Dhammika Pathirana
6391862d8a Add branch traversal test
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-04-27 00:05:13 -07:00
Dhammika Pathirana
b2e35fffa6 Fix ancestor layer traversal (#1484)
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-04-27 00:05:13 -07:00
Arseny Sher
8b9d523f3c Remove old WAL on safekeepers.
Remove when it is consumed by all of 1) pageserver (remote_consistent_lsn) 2)
safekeeper peers 3) s3 WAL offloading.

In test s3 offloading for now is mocked by directly bumping s3_wal_lsn.

ref #1403
2022-04-26 23:02:23 +04:00
Arseny Sher
3fd234da07 Enable etcd for safekeepers in deploy. 2022-04-26 18:13:50 +04:00
Kirill Bulatov
778744d35c Limit concurrent S3 and IAM interactions 2022-04-26 13:49:37 +03:00
Dmitry Rodionov
eabf6f89e4 Use item.get for tenant config toml parsing
Previously we've used table interface, but there was no easy way to pass
it as an override to pageserver through cli. Use the same strategy as
for remote storage config parsing
2022-04-26 10:15:19 +03:00
Kirill Bulatov
fec050ce97 Fix macos clippy issues 2022-04-25 16:23:34 +03:00
Kirill Bulatov
d060a97c54 Simplify clippy runs 2022-04-25 16:23:34 +03:00
Anastasia Lubennikova
78a6cb247f allow the users to create extensions: GRANT CREATE ON DATABASE 2022-04-25 15:35:44 +03:00
Kirill Bulatov
8f6a161271 Show better layer load errors 2022-04-25 14:54:39 +03:00
Andrey Taranik
56f6269a8e rename docker images to neondatabase docker account (#1570)
* rename docker images to neondatabase docker account

* docker images build fix (permisions for Cargo.lock)
2022-04-25 11:34:51 +03:00