Comparing 9cada8b59d...80f68a0029 - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-17 10:22:56 +00:00

Author	SHA1	Message	Date
Arpad Müller	80f68a0029	Use bounds() function	2024-07-02 14:57:20 +02:00
Arpad Müller	9a134a8f18	Fix tests	2024-07-02 12:11:46 +02:00
Arpad Müller	9290f57750	Return the entire buffer in BlobWriter as well	2024-07-02 01:28:27 +02:00
Arpad Müller	6202c84408	Actually return the same slice	2024-07-02 01:17:04 +02:00
Arpad Müller	85260b4905	Fix tests as well	2024-07-01 19:30:07 +02:00
Arpad Müller	bdedd2192b	Use Slice<_> in write path instead of B: BoundedBuf<...>	2024-07-01 18:52:10 +02:00
Heikki Linnakangas	9ce193082a	Restore running xacts from CLOG on replica startup (#7288 ) We have one pretty serious MVCC visibility bug with hot standby replicas. We incorrectly treat any transactions that are in progress in the primary, when the standby is started, as aborted. That can break MVCC for queries running concurrently in the standby. It can also lead to hint bits being set incorrectly, and that damage can last until the replica is restarted. The fundamental bug was that we treated any replica start as starting from a shut down server. The fix for that is straightforward: we need to set 'wasShutdown = false' in InitWalRecovery() (see changes in the postgres repo). However, that introduces a new problem: with wasShutdown = false, the standby will not open up for queries until it receives a running-xacts WAL record from the primary. That's correct, and that's how Postgres hot standby always works. But it's a problem for Neon, because: * It changes the historical behavior for existing users. Currently, the standby immediately opens up for queries, so if they now need to wait, we can break existing use cases that were working fine (assuming you don't hit the MVCC issues). * The problem is much worse for Neon than it is for standalone PostgreSQL, because in Neon, we can start a replica from an arbitrary LSN. In standalone PostgreSQL, the replica always starts WAL replay from a checkpoint record, and the primary arranges things so that there is always a running-xacts record soon after each checkpoint record. You can still hit this issue with PostgreSQL if you have a transaction with lots of subtransactions running in the primary, but it's pretty rare in practice. To mitigate that, we introduce another way to collect the running-xacts information at startup, without waiting for the running-xacts WAL record: We can scan the CLOG for XIDs that haven't been marked as committed or aborted. 
It has limitations with subtransactions too, but should mitigate the problem for most users. See https://github.com/neondatabase/neon/issues/7236. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-01 12:58:12 +03:00
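The CLOG scan described in the commit above can be pictured roughly as follows. This is a minimal sketch, not the code from the Neon or PostgreSQL repositories. It assumes the standard PostgreSQL CLOG encoding of two status bits per transaction; the function names (`xact_status`, `running_xids`) and the flat in-memory byte slice are hypothetical stand-ins for the real page-based storage, and XID wraparound and subtransaction handling are ignored.

```rust
/// Transaction status values as stored in the PostgreSQL commit log (CLOG).
/// 0x00 means "in progress" and 0x03 means "sub-committed".
const STATUS_COMMITTED: u8 = 0x01;
const STATUS_ABORTED: u8 = 0x02;

/// Two status bits per transaction, so four transactions per byte.
const XACTS_PER_BYTE: u32 = 4;

/// Hypothetical helper: look up the status of `xid` in a flat CLOG image
/// whose first byte holds the status of XIDs 0..=3.
fn xact_status(clog: &[u8], xid: u32) -> u8 {
    let byte = clog[(xid / XACTS_PER_BYTE) as usize];
    let shift = (xid % XACTS_PER_BYTE) * 2;
    (byte >> shift) & 0b11
}

/// Collect the XIDs in `oldest..next` that are neither committed nor aborted.
/// Assumption: no wraparound inside the range.
fn running_xids(clog: &[u8], oldest: u32, next: u32) -> Vec<u32> {
    (oldest..next)
        .filter(|&xid| {
            let status = xact_status(clog, xid);
            status != STATUS_COMMITTED && status != STATUS_ABORTED
        })
        .collect()
}

fn main() {
    // XIDs 0..=3 are committed, aborted, in progress, committed:
    // bit pairs (xid 3 down to xid 0) are 01 00 10 01 = 0x49.
    // XIDs 4..=7 live in the second byte and are all still in progress.
    let clog = [0x49u8, 0x00];
    println!("{:?}", running_xids(&clog, 0, 6));
}
```

The result of such a scan is only a candidate list of running transactions, which is why the commit message notes that the approach still has limitations with subtransactions.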
Heikki Linnakangas	75c84c846a	tests: Make neon_xlogflush() flush all WAL, if you omit the LSN arg This makes it much more convenient to use in the common case that you want to flush all the WAL. (Passing pg_current_wal_insert_lsn() as the argument doesn't work for the same reasons as explained in the comments: we need to back off to the beginning of a page if the previous record ended at a page boundary.) I plan to use this to fix the issue that Arseny Sher called out at https://github.com/neondatabase/neon/pull/7288#discussion_r1660063852	2024-07-01 12:58:08 +03:00
Heikki Linnakangas	57535c039c	tests: remove a leftover 'running' flag (#8216 ) The 'running' boolean was replaced with a semaphore in commit `f0e2bb79b2`, but this initialization was missed. Remove it so that if a test tries to access it, you get an error rather than always claiming that the endpoint is not running. Spotted by Arseny at https://github.com/neondatabase/neon/pull/7288#discussion_r1660068657	2024-07-01 11:23:31 +03:00
Heikki Linnakangas	30027d94a2	Fix tracking of the nextMulti in the pageserver's copy of CheckPoint (#6528 ) Whenever we see an XLOG_MULTIXACT_CREATE_ID WAL record, we need to update the nextMulti and NextMultiOffset fields in the pageserver's copy of the CheckPoint struct, to cover the new multi-XID. In PostgreSQL, this is done by updating an in-memory struct during WAL replay, but because in Neon you can start a compute node at any LSN, we need to have an up-to-date value pre-calculated in the pageserver at all times. We do the same for nextXid. However, we had a bug in WAL ingestion code that does that: the multi-XIDs will wrap around at 2^32, just like XIDs, so we need to do the comparisons in a wraparound-aware fashion. Fix that, and add tests. Fixes issue #6520 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-01 01:49:49 +03:00
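A wraparound-aware comparison of the kind this commit refers to can be sketched as below. It follows the modulo-2^32 signed-difference trick that PostgreSQL uses for 32-bit IDs; the names `id_precedes` and `advance_next` are hypothetical and are not taken from the pageserver's WAL ingestion code, and the sketch ignores any reserved ID values that the real code has to skip.

```rust
/// Returns true if `a` logically precedes `b` on the 32-bit ID circle.
/// The subtraction wraps, and the sign of the result tells the direction.
fn id_precedes(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}

/// Hypothetical helper: move `next` forward so that it covers `seen`,
/// never moving it backwards, even across the 2^32 wraparound point.
fn advance_next(next: &mut u32, seen: u32) {
    let candidate = seen.wrapping_add(1);
    if id_precedes(*next, candidate) {
        *next = candidate;
    }
}

fn main() {
    // Just before wraparound: a plain `<` would refuse to advance here.
    let mut next = u32::MAX - 1;
    advance_next(&mut next, 5);
    println!("next after wraparound: {next}");

    // An older ID must not move the counter backwards.
    advance_next(&mut next, 2);
    println!("next after stale ID: {next}");
}
```

A plain `next < candidate` comparison gives the wrong answer in the first call, which is the class of bug the commit describes.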
Alex Chi Z	bc704917a3	fix(pageserver): ensure tenant harness has different names (#8205 ) rename the tenant test harness name Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-28 15:13:25 -04:00
John Spray	b8bbaafc03	storage controller: fix heatmaps getting disabled during shard split (#8197 ) ## Problem At the start of do_tenant_shard_split, we drop any secondary location for the parent shards. The reconciler uses presence of secondary locations as a condition for enabling heatmaps. On the pageserver, child shards inherit their configuration from parents, but the storage controller assumes the child's ObservedState is the same as the parent's config from the prepare phase. The result is that some child shards end up with inaccurate ObservedState, and until something next migrates or restarts, those tenant shards aren't uploading heatmaps, so their secondary locations are downloading everything that was resident at the moment of the split (including ancestor layers which are often cleaned up shortly after the split). Closes: https://github.com/neondatabase/neon/issues/8189 ## Summary of changes - Use PlacementPolicy to control enablement of heatmap upload, rather than the literal presence of secondaries in IntentState: this way we avoid switching them off during shard split - test: during tenant split test, assert that the child shards have heatmap uploads enabled.	2024-06-28 18:27:13 +01:00
Arthur Petukhovsky	e1a06b40b7	Add rate limiter for partial uploads (#8203 ) Too many concurrent partial uploads can hurt disk performance, so this commit adds a limiter. Context: https://neondb.slack.com/archives/C04KGFVUWUQ/p1719489018814669?thread_ts=1719440183.134739&cid=C04KGFVUWUQ	2024-06-28 18:16:21 +01:00
John Spray	babbe125da	pageserver: drop out of secondary download if iteration time has passed (#8198 ) ## Problem Very long running downloads can be wasteful, because the heatmap they're using is outdated after a few minutes. Closes: https://github.com/neondatabase/neon/issues/8182 ## Summary of changes - Impose a deadline on timeline downloads, using the same period as we use for scheduling, and returning an UpdateError::Restart when it is reached. This restart will involve waiting for a scheduling interval, but that's a good thing: it helps let other tenants proceed. - Refactor download_timeline so that the part where we update the state for local layers is done even if we fall out of the layer download loop with an error: this is important, especially for big tenants, because only layers in the SecondaryDetail state will be considered for eviction.	2024-06-28 17:05:09 +00:00
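The deadline behaviour described in this commit can be illustrated with a small sketch. Only the `UpdateError::Restart` name comes from the commit message; the function `download_layers`, its signature, and the way progress is reported are assumptions made for illustration. The point is that the loop checks the deadline before each layer and still reports how far it got, so the caller can update local state even when the loop exits early.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum UpdateError {
    /// The deadline passed; the caller should restart with a fresh heatmap.
    Restart,
}

/// Hypothetical download loop: returns how many layers were downloaded
/// together with the outcome, so progress is never lost on early exit.
fn download_layers(
    layers: &[&str],
    deadline: Instant,
    mut download: impl FnMut(&str),
) -> (usize, Result<(), UpdateError>) {
    let mut done = 0;
    for &layer in layers {
        if Instant::now() >= deadline {
            return (done, Err(UpdateError::Restart));
        }
        download(layer);
        done += 1;
    }
    (done, Ok(()))
}

fn main() {
    let layers = ["layer-a", "layer-b", "layer-c"];
    let deadline = Instant::now() + Duration::from_secs(60);
    let (done, result) = download_layers(&layers, deadline, |name| {
        println!("downloading {name}");
    });
    println!("downloaded {done} layers, result: {result:?}");
}
```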
Heikki Linnakangas	ca2f7d06b2	Cherry-pick upstream fix for TruncateMultiXact assertion (#8195 ) We hit that bug in a new test being added in PR #6528. We'd get the fix from upstream with the next minor release anyway, but cherry-pick it now to unblock PR #6528. Upstream commit b1ffe3ff0b. See https://github.com/neondatabase/neon/pull/6528#issuecomment-2167367910	2024-06-28 16:47:05 +03:00
Arthur Petukhovsky	c22c6a6c9e	Add buckets to safekeeper ops metrics (#8194 ) In #8188 I forgot to specify buckets for new operations metrics. This commit fixes that.	2024-06-28 11:09:11 +01:00
Christian Schwarz	deec3bc578	virtual_file: take a `Slice` in the read APIs, eliminate `read_exact_at_n`, fix UB for engine `std-fs` (#8186 ) part of https://github.com/neondatabase/neon/issues/7418 I reviewed what the VirtualFile API's `read` methods look like and came to the conclusion that we've been using `IoBufMut` / `BoundedBufMut` / `Slice` wrong. This patch rectifies the situation. # Change 1: take `tokio_epoll_uring::Slice` in the read APIs Before, we took an `IoBufMut`, which is too low of a primitive and while it _seems_ convenient to be able to pass in a `Vec<u8>` without any fuss, it's actually very unclear at the callsite that we're going to fill up that `Vec` up to its `capacity()`, because that's what `IoBuf::bytes_total()` returns and that's what `VirtualFile::read_exact_at` fills. By passing a `Slice` instead, a caller that "just wants to read into a `Vec`" is forced to be explicit about it, adding either `slice_full()` or `slice(x..y)`, and these methods panic if the read is outside of the bounds of the `Vec::capacity()`. Last, passing slices is more similar to what the `std::io` APIs look like. # Change 2: fix UB in `virtual_file_io_engine=std-fs` While reviewing call sites, I noticed that the `io_engine::IoEngine::read_at` method for `StdFs` mode has been constructing an `&mut[u8]` from raw parts that were uninitialized. We then used `std::fs::File::read_exact` to initialize that memory, but, IIUC we must not even be constructing an `&mut[u8]` where some of the memory isn't initialized. So, stop doing that and add a helper ext trait on `Slice` to do the zero-initialization. # Change 3: eliminate `read_exact_at_n` The `read_exact_at_n` doesn't make sense because the caller can just 1. `slice = buf.slice()` the exact memory it wants to fill 2. `slice = read_exact_at(slice)` 3. `buf = slice.into_inner()` Again, the `std::io` APIs specify the length of the read via the Rust slice length. 
We should do the same for the owned buffers IO APIs, i.e., via `Slice::bytes_total()`. # Change 4: simplify filling of `PageWriteGuard` The `PageWriteGuardBuf::init_up_to` was never necessary. Remove it. See changes to doc comment for more details. --- Reviewers should probably look at the added test case first, it illustrates my case a bit.	2024-06-28 11:20:37 +02:00
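The three-step pattern from Change 3 (slice the buffer, read into the slice, take the buffer back) can be sketched with plain standard-library types. The `Slice` struct, the free function `slice`, and `read_exact_at` below are hypothetical stand-ins written for this illustration. They are not the `tokio_epoll_uring` or `VirtualFile` APIs, they are synchronous, and they bound the range by `len()` rather than `capacity()` so that no unsafe code is needed.

```rust
use std::io;
use std::ops::Range;

/// Hypothetical owned slice: a buffer plus the byte range the callee may fill.
struct Slice {
    buf: Vec<u8>,
    start: usize,
    end: usize,
}

/// Step 1: name exactly the memory to fill. Panics if the range is out of bounds.
fn slice(buf: Vec<u8>, range: Range<usize>) -> Slice {
    assert!(range.start <= range.end && range.end <= buf.len(), "slice out of bounds");
    Slice { buf, start: range.start, end: range.end }
}

impl Slice {
    /// The read length is carried by the slice itself.
    fn bytes_total(&self) -> usize {
        self.end - self.start
    }

    /// Step 3: hand the owned buffer back to the caller.
    fn into_inner(self) -> Vec<u8> {
        self.buf
    }
}

/// Step 2: fill exactly `slice.bytes_total()` bytes from `file` at `offset`.
fn read_exact_at(file: &[u8], mut slice: Slice, offset: usize) -> io::Result<Slice> {
    let n = slice.bytes_total();
    let src = file
        .get(offset..offset + n)
        .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))?;
    slice.buf[slice.start..slice.end].copy_from_slice(src);
    Ok(slice)
}

fn main() -> io::Result<()> {
    let file = b"hello world";
    let buf = vec![0u8; 8];
    let s = slice(buf, 2..7);
    let s = read_exact_at(file, s, 6)?;
    let buf = s.into_inner();
    println!("{:?}", String::from_utf8_lossy(&buf[2..7]));
    Ok(())
}
```

Because the length travels with the slice, a separate length argument such as the one `read_exact_at_n` took becomes redundant, which is the argument the commit message makes.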
John Spray	063553a51b	pageserver: remove tenant create API (#8135 ) ## Problem For some time, we have created tenants with calls to location_conf. The legacy "POST /v1/tenant" path was only used in some tests. ## Summary of changes - Remove the API - Relocate TenantCreateRequest to the controller API file (this used to be used in both pageserver and controller APIs) - Rewrite tenant_create test helper to use location_config API, as control plane and storage controller do - Update docker-compose test script to create tenants with location_config API (this small commit is also present in https://github.com/neondatabase/neon/pull/7947)	2024-06-28 09:14:19 +01:00
Tristan Partin	5700233a47	Add application_name to compute activity monitor connection string This was missed in my previous attempt to mark every connection string with an application name. See `0c3e3a8667`.	2024-06-27 12:38:15 -07:00
Arthur Petukhovsky	1d66ca79a9	Improve slow operations observability in safekeepers (#8188 ) After https://github.com/neondatabase/neon/pull/8022 was deployed to staging, I noticed many cases of timeouts. After inspecting the logs, I realized that some operations take ~20 seconds and run while holding the shared state lock. Usually it happens right after a redeploy, because compute reconnections put high load on disks. This commit tries to improve observability around slow operations. Non-observability changes: - `TimelineState::finish_change` now skips update if nothing has changed - `wal_residence_guard()` timeout is set to 30s	2024-06-27 18:39:43 +01:00
Alex Chi Z	23827c6b0d	feat(pageserver): add delta layer iterator (#8064 ) part of https://github.com/neondatabase/neon/issues/8002 ## Summary of changes Add delta layer iterator and tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-27 16:03:48 +00:00
Christian Schwarz	66b0bf41a1	fix: shutdown does not kill walredo processes (#8150 ) While investigating Pageserver logs from the cases where systemd hangs during shutdown (https://github.com/neondatabase/cloud/issues/11387), I noticed that even if Pageserver shuts down cleanly[^1], there are lingering walredo processes. [^1]: Meaning, pageserver finishes its shutdown procedure and calls `exit(0)` on its own terms, instead of hitting the systemd unit's `TimeoutSec=` limit and getting SIGKILLed. While systemd should never lock up like it does, maybe we can avoid hitting that bug by cleaning up properly. Changes ------- This PR adds a shutdown method to `WalRedoManager` and hooks it up to tenant shutdown. We keep track of intent to shut down through the new `enum ProcessOnceCell` stored inside the pre-existing `redo_process` field. A gate is added to keep track of running processes, using the new type `struct Process`. Future Work ----------- Requests that don't need the redo process will not observe the shutdown (see doc comment). Doing so would be nice for completeness' sake, but doesn't provide much benefit because `Tenant` and `Timeline` already shut down all walredo users. Testing ------- I did manual testing to confirm that the problem exists before this PR and that it's gone after. Setup: * `neon_local` with a single tenant, create some data using `pgbench` * ensure walredo process is running, note pid * watch `strace -e kill,wait4 -f -p "$(pgrep pageserver)"` * `neon_local pageserver stop` With this PR, we always observe ``` $ strace -e kill,wait4 -f -p "$(pgrep pageserver)" ... [pid 591120] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=591215, si_uid=1000} --- [pid 591134] kill(591174, SIGKILL) = 0 [pid 591134] wait4(591174, <unfinished ...> [pid 591142] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=591174, si_uid=1000, si_status=SIGKILL, si_utime=0, si_stime=0} --- [pid 591134] <... 
wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 591174 ... +++ exited with 0 +++ ``` Before this PR, we'd usually observe just ``` ... [pid 596239] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=596455, si_uid=1000} --- ... +++ exited with 0 +++ ``` Refs ---- refs https://github.com/neondatabase/cloud/issues/11387	2024-06-27 15:58:28 +02:00
Vlad Lazar	89cf8df93b	storcon: bump number of concurrent reconciles per operation (#8179 ) ## Problem Background node operations take a long time for loaded nodes. ## Summary of changes Increase number of concurrent reconciles an operation is allowed to spawn. This should make drain and fill operations faster and the new value is still well below the total limit of concurrent reconciles.	2024-06-27 13:16:41 +00:00
Alexander Bayandin	54a06de4b5	CI: Use `runner.arch` in cache keys along with `runner.os` (#8175 ) ## Problem The cache keys that we use on CI are the same for X64 and ARM64 (`runner.arch`) ## Summary of changes - Include `runner.arch` along with `runner.os` into cache keys	2024-06-27 13:56:03 +01:00
Arseny Sher	6f20a18e8e	Allow to change compute safekeeper list without restart. - Add --safekeepers option to neon_local reconfigure - Add it to python Endpoint reconfigure - Implement config reload in walproposer by restarting the whole bgw when safekeeper list changes. ref https://github.com/neondatabase/neon/issues/6341	2024-06-27 15:08:35 +03:00
Vlad Lazar	d557002675	storcon: don't overcommit when making node fill plan (#8171 ) ## Problem The fill requirement was not taken into account when looking through the shards of a given node to fill from. ## Summary of Changes Ensure that we do not fill a node past the recommendation from `Scheduler::compute_fill_requirement`.	2024-06-27 11:56:57 +01:00
Cihan Demirci	32b75e7c73	CI: additional trigger on merge to main (#8176 ) Before we consolidate workflows we want to be triggered by merges to main. https://github.com/neondatabase/cloud/issues/14862	2024-06-26 22:36:41 +00:00
Heikki Linnakangas	d2753719e3	test: Add helper function for importing a Postgres cluster (#8025 ) Also, modify the "neon_local timeline import" command so that it doesn't create the endpoint any more. I don't see any reason to bundle that in the same command, the "timeline create" and "timeline branch" commands don't do that either. I plan to add more tests similar to 'test_import_at_2bil', this will help to reduce the copy-pasting.	2024-06-26 21:54:29 +00:00
Alex Chi Z	04b2ac3fed	test: use aux file v2 policy in benchmarks (#8174 ) Use aux file v2 in benchmarks. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-26 20:33:15 +00:00
John Spray	c39d5b03e8	pageserver: remove legacy tenant config code, clean up redundant generation none/broken usages (#7947 ) ## Problem In https://github.com/neondatabase/neon/pull/5299, the new config-v1 tenant config file was added to hold the LocationConf type. We left the old config file in place for forward compat, and because running without generations (therefore without LocationConf) was still useful before the storage controller was ready for prime-time. Closes: https://github.com/neondatabase/neon/issues/5388 ## Summary of changes - Remove code for reading and writing the legacy config file - Remove Generation::Broken: it was unused. - Treat missing config file on disk as an error loading a tenant, rather than defaulting it. We can now remove LocationConf::default, and thereby guarantee that we never construct a tenant with a None generation. - Update some comments + add some assertions to clarify that Generation::None is only used in layer metadata, not in the state of a running tenant. - Update docker compose test to create tenants with a generation	2024-06-26 19:53:59 +00:00
Arthur Petukhovsky	76fc3d4aa1	Evict WAL files from disk (#8022 ) Fixes https://github.com/neondatabase/neon/issues/6337 Add safekeeper support to switch between `Present` and `Offloaded(flush_lsn)` states. The offloading is disabled by default, but can be controlled using new cmdline arguments: ``` --enable-offload Enable automatic switching to offloaded state --delete-offloaded-wal Delete local WAL files after offloading. When disabled, they will be left on disk --control-file-save-interval <CONTROL_FILE_SAVE_INTERVAL> Pending updates to control file will be automatically saved after this interval [default: 300s] ``` The manager watches state updates and detects when there is no activity on the timeline and the partial backup upload in remote storage is current. When all conditions are met, the state can be switched to offloaded. In `timeline.rs` there is a `StateSK` enum to support switching between states. When offloaded, code can access only the control file structure and cannot use `SafeKeeper` to accept new WAL. `FullAccessTimeline` is now renamed to `WalResidentTimeline`. This struct contains a guard to notify the manager about active tasks requiring on-disk WAL access. All guards are issued by the manager, and all requests are sent via a channel using `ManagerCtl`. When the manager receives a request to issue a guard, it unevicts the timeline if it's currently evicted. Fixed a bug in partial WAL backup: it previously used `term` instead of `last_log_term`. After this commit is merged, the next step is to roll this change out, as in issue #6338.	2024-06-26 18:58:56 +01:00
Vlad Lazar	dd3adc3693	docker: downgrade openssl to 1.1.1w (#8168 ) ## Problem We have seen numerous segfault and memory corruption issues for clients using libpq and openssl 3.2.2. I don't know if this is a bug in openssl or libpq. Downgrading to 1.1.1w fixes the issues for the storage controller and pgbench. ## Summary of Changes: Use openssl 1.1.1w instead of 3.2.2	2024-06-26 17:27:23 +00:00
Heikki Linnakangas	5b871802fd	Add counters for commands processed through the libpq page service API (#8089 ) I was looking for metrics on how many computes are still using protocol version 1 and 2. This provides counters for that as "pagestream" and "pagestream_v2" commands, but also all the other commands. The new metrics are global for the whole pageserver instance rather than per-tenant, so the additional metrics bloat should be fairly small.	2024-06-26 19:53:03 +03:00
Heikki Linnakangas	24ce73ffaf	Silence compiler warning (#8153 ) I saw this compiler warning on my laptop: pgxn/neon_walredo/walredoproc.c:178:10: warning: using the result of an assignment as a condition without parentheses [-Wparentheses] if (err = close_range_syscall(3, ~0U, 0)) { ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pgxn/neon_walredo/walredoproc.c:178:10: note: place parentheses around the assignment to silence this warning if (err = close_range_syscall(3, ~0U, 0)) { ^ ( ) pgxn/neon_walredo/walredoproc.c:178:10: note: use '==' to turn this assignment into an equality comparison if (err = close_range_syscall(3, ~0U, 0)) { ^ == 1 warning generated. I'm not sure what compiler version or options cause that, but it's a good warning. Write the call a little differently, to avoid the warning and to make it a little more clear anyway. (The 'err' variable wasn't used for anything, so I'm surprised we were not seeing a compiler warning on the unused value, too.)	2024-06-26 19:19:27 +03:00
Arthur Petukhovsky	3118c24521	Panic on unexpected error in simtests (#8169 )	2024-06-26 16:46:14 +01:00
Alexander Bayandin	5af9660b9e	CI(build-tools): don't install Postgres 14 (#6540 ) ## Problem We install Postgres 14 in `build-tools` image, but we don't need it. We use Postgres binaries, which we build ourselves. ## Summary of changes - Remove Postgresql 14 installation from `build-tools` image	2024-06-26 16:37:04 +01:00
Conrad Ludgate	d7e349d33c	proxy: report blame for passthrough disconnect io errors (#8170 ) ## Problem Hard to debug the disconnection reason currently. ## Summary of changes Keep track of error-direction, and therefore error source (client vs compute) during passthrough.	2024-06-26 15:11:26 +00:00
Arthur Petukhovsky	47e5bf3bbb	Improve term reject message in walproposer (#8164 ) Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-06-26 15:26:52 +01:00
Alex Chi Z	5d2f9ffa89	test(bottom-most-compaction): wal apply order (#8163 ) A follow-up on https://github.com/neondatabase/neon/pull/8103/. Previously, main branch fails with: ``` assertion `left == right` failed left: b"value 3@0x10@0x30@0x28@0x40" right: b"value 3@0x10@0x28@0x30@0x40" ``` This gets fixed after #8103 gets merged. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-26 09:34:41 -04:00
Heikki Linnakangas	fdadd6a152	Remove primary_is_running (#8162 ) This was a half-finished mechanism to allow a replica to enter hot standby mode sooner, without waiting for a running-xacts record. It had issues, and we are working on a better mechanism to replace it. The control plane might still set the flag in the spec file, but compute_ctl will simply ignore it.	2024-06-26 15:13:03 +03:00
Peter Bendel	9b623d3a2c	add commit hash to S3 object identifier for artifacts on S3 (#8161 ) In future we may want to run periodic tests on dedicated cloud instances that are not GitHub action runners. To allow these to download artifact binaries for a specific commit hash we want to make the search by commit hash possible and prefix the S3 objects with `artifacts/${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}` --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-26 05:46:52 +00:00
Alex Chi Z	9b98823d61	bottom-most-compaction: use in test_gc_feedback + fix bugs (#8103 ) Adds manual compaction trigger; add gc compaction to test_gc_feedback Part of https://github.com/neondatabase/neon/issues/8002 ``` test_gc_feedback[debug-pg15].logical_size: 50 Mb test_gc_feedback[debug-pg15].physical_size: 2269 Mb test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1 ``` ## Summary of changes * Add the manual compaction trigger * Use in test_gc_feedback * Add a guard to avoid running it with retain_lsns * Fix: Do `schedule_compaction_update` after compaction * Fix: Supply deltas in the correct order to reconstruct value --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-25 23:00:14 +00:00
Alex Chi Z	76864e6a2a	feat(pageserver): add image layer iterator (#8006 ) part of https://github.com/neondatabase/neon/issues/8002 ## Summary of changes This pull request adds the image layer iterator. It buffers a fixed number of key-value pairs in memory and gives developers an iterator abstraction over the image layer. Once the buffer is exhausted, it will issue 1 I/O to fetch the next batch. Due to the Rust lifetime mysteries, the `get_stream_from` API has been refactored to `into_stream` and consumes `self`. Delta layer iterator implementation will be similar, therefore I'll add it after this pull request gets merged. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-25 20:49:29 +00:00
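The buffering scheme described for the image layer iterator (hold a bounded batch in memory, issue one I/O when the batch runs out) might look roughly like the following. This is a synchronous sketch under stated assumptions: `BatchedIter`, its fields, the `fetch` callback standing in for a single I/O, and the `collect_batched` helper are all hypothetical and do not reflect the pageserver's actual types.

```rust
use std::collections::VecDeque;

/// Hypothetical batched iterator: keeps at most `batch_size` pairs in memory
/// and calls `fetch` (one simulated I/O) whenever the buffer runs dry.
struct BatchedIter<F> {
    fetch: F,
    buffer: VecDeque<(u64, String)>,
    next_index: usize,
    batch_size: usize,
    exhausted: bool,
}

impl<F> Iterator for BatchedIter<F>
where
    F: FnMut(usize, usize) -> Vec<(u64, String)>,
{
    type Item = (u64, String);

    fn next(&mut self) -> Option<Self::Item> {
        if self.buffer.is_empty() && !self.exhausted {
            let batch = (self.fetch)(self.next_index, self.batch_size);
            // A short batch means the underlying layer has no more entries.
            self.exhausted = batch.len() < self.batch_size;
            self.next_index += batch.len();
            self.buffer.extend(batch);
        }
        self.buffer.pop_front()
    }
}

/// Iterate over `data` in batches; returns the items and the number of fetches.
fn collect_batched(data: &[(u64, String)], batch_size: usize) -> (Vec<(u64, String)>, usize) {
    let mut io_count = 0;
    let iter = BatchedIter {
        fetch: |start: usize, max: usize| -> Vec<(u64, String)> {
            io_count += 1;
            data.iter().skip(start).take(max).cloned().collect()
        },
        buffer: VecDeque::new(),
        next_index: 0,
        batch_size,
        exhausted: false,
    };
    let items: Vec<_> = iter.collect();
    (items, io_count)
}

fn main() {
    let data: Vec<(u64, String)> = (0..5).map(|k| (k, format!("value {k}"))).collect();
    let (items, io_count) = collect_batched(&data, 2);
    println!("{} items in {} fetches", items.len(), io_count);
}
```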
Conrad Ludgate	6c5d3b5263	proxy fix wake compute console retry (#8141 ) ## Problem 1. Proxy is retrying errors from cplane that shouldn't be retried 2. ~~Proxy is not using the retry_after_ms value~~ ## Summary of changes 1. Correct the could_retry impl for ConsoleError. 2. ~~Update could_retry interface to support returning a fixed wait duration.~~	2024-06-25 18:07:54 +00:00
Christian Schwarz	cd9a550d97	clippy-deny the `todo!()` macro (#4340 ) `todo!()` shouldn't slip into prod code	2024-06-25 18:03:27 +00:00
John Spray	07f21dd6b6	pageserver: remove attach/detach apis (#8134 ) ## Problem These APIs have been deprecated for some time, but were still used from test code. Closes: https://github.com/neondatabase/neon/issues/4282 ## Summary of changes - It is still convenient to do a "tenant_attach" from a test without having to write out a location_conf body, so those test methods have been retained with implementations that call through to their location_conf equivalent.	2024-06-25 17:38:06 +01:00
Heikki Linnakangas	64a4461191	Fix submodule references to match the REL_*_STABLE_neon branches (#8159 ) No code changes, just point to the correct commit SHAs.	2024-06-25 19:05:13 +03:00
Yuchen Liang	961fc0ba8f	feat(pageserver): add metrics for number of valid leases after each refresh (#8147 ) Part of #7497, closes #8120. ## Summary of changes This PR adds a metric to track the number of valid leases after `GCInfo` gets refreshed each time. Besides this metric, we should also track disk space and synthetic size (after #8071 is closed) to make sure leases are used properly. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-25 15:43:12 +00:00
Alexander Bayandin	9b2f9419d9	CI: upload docker cache only from main (#8157 ) ## Problem The Docker build cache gets invalidated by PRs ## Summary of changes - Upload cache only from the main branch	2024-06-25 16:18:22 +01:00
Christian Schwarz	947f6da75e	L0 flush: avoid short-lived allocation when checking key_range empty (#8154 ) We only use `keys` to check if it's empty so we can bail out early. No need to collect the keys for that. Found this while doing research for https://github.com/neondatabase/neon/issues/7418	2024-06-25 17:04:44 +02:00
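The optimization in this commit is a general Rust idiom: an emptiness check does not need a collected `Vec`. The sketch below contrasts the two forms on a `BTreeMap`; the map type and function names are assumptions for illustration, since the commit message does not show the data structure that the L0 flush code iterates over.

```rust
use std::collections::BTreeMap;

/// Allocates a short-lived Vec only to ask whether it is empty.
fn is_empty_collecting(map: &BTreeMap<u64, Vec<u8>>) -> bool {
    let keys: Vec<&u64> = map.keys().collect();
    keys.is_empty()
}

/// Asks the iterator for its first element instead; no allocation needed.
fn is_empty_lazy(map: &BTreeMap<u64, Vec<u8>>) -> bool {
    map.keys().next().is_none()
}

fn main() {
    let mut map = BTreeMap::new();
    println!("empty: {}", is_empty_lazy(&map));
    map.insert(1u64, vec![0u8; 4]);
    println!("empty: {}", is_empty_lazy(&map));
    // Both forms agree; only the allocation behaviour differs.
    assert_eq!(is_empty_collecting(&map), is_empty_lazy(&map));
}
```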
Vlad Lazar	7026dde9eb	storcon: update db related dependencies (#8155 ) ## Problem Storage controller runs into memory corruption issue on the drain/fill code paths. ## Summary of changes Update db related dependencies in the unlikely case that the issue was fixed in diesel.	2024-06-25 15:06:18 +01:00
Heikki Linnakangas	d502313841	Fix MVCC bug with prepared xact with subxacts on standby (#8152 ) We did not recover the subtransaction IDs of prepared transactions when starting a hot standby from a shutdown checkpoint. As a result, such subtransactions were considered as aborted, rather than in-progress. That would lead to hint bits being set incorrectly, and the subtransactions suddenly becoming visible to old snapshots when the prepared transaction was committed. To fix, update pg_subtrans with prepared transactions' subxids when starting hot standby from a shutdown checkpoint. The snapshots taken from that state need to be marked as "suboverflowed", so that we also check the pg_subtrans. Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0%40iki.fi NEON: cherry-picked from the upstream thread ahead of time, to unblock https://github.com/neondatabase/neon/pull/7288. I expect this to be committed to upstream in the next few days, superseding this. NOTE: I did not include the new regression test on v15 and v14 branches, because the test would need some adapting, and we don't run the perl tests on Neon anyway.	2024-06-25 16:29:32 +03:00
Yuchen Liang	219e78f885	feat(pageserver): add an optional lease to the get_lsn_by_timestamp API (#8104 ) Part of #7497, closes #8072. ## Problem Currently the `get_lsn_by_timestamp` and branch creation pageserver APIs do not provide a pleasant client experience where the looked-up LSN might be GC-ed between the two API calls. This PR attempts to prevent common races between GC and branch creation by making use of LSN leases provided in #8084. A lease can be optionally granted to a looked-up LSN. With the lease, GC will not touch layers needed to reconstruct all pages at this LSN for the duration of the lease. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-24 20:12:24 +00:00
John Spray	1ea5d8b132	tests: accommodate some messages that can fail tests (#8144 ) ## Problem - `test_storage_controller_many_tenants` can fail with warnings in the storage controller about tenant creation holding a lock for too long, because this test stresses the machine running the test with many concurrent timeline creations - `test_tenant_delete_smoke` can fail when synthetic remote storage errors show up ## Summary of changes - tolerate warnings about slow timeline creation in test_storage_controller_many_tenants - tolerate both possible errors during error_tolerant_delete	2024-06-24 17:03:53 +00:00
John Spray	3d760938e1	storcon_cli: remove old tenant-scatter command (#8127 ) ## Problem This command was used in the very early days of sharding, before the storage controller had anti-affinity + scheduling optimization to spread out shards. ## Summary of changes - Remove `storcon_cli tenant-scatter`	2024-06-24 12:57:16 -04:00
Alex Chi Z	9211de0df7	test(pageserver): add delta records tests for gc-compaction (#8078 ) Part of https://github.com/neondatabase/neon/issues/8002 This pull request adds tests for bottom-most gc-compaction with delta records. Also fixed a bug in the compaction process that creates overlapping delta layers by force splitting at the original delta layer boundary. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-24 15:50:31 +00:00
Alex Chi Z	d8ffe662a9	fix(pageserver): handle version number in draw timeline (#8102 ) We now have a `vX` number in the file name, i.e., `000000067F0000000400000B150100000000-000000067F0000000400000D350100000000__00000000014B7AC8-v1-00000001` The related pull request for new-style path was merged a month ago https://github.com/neondatabase/neon/pull/7660 ## Summary of changes Fixed the draw timeline dir command to handle it. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-24 15:31:06 +00:00
Arthur Petukhovsky	a4db2af1f0	Truncate waltmp file on creation (#8133 ) Previously in safekeeper code, a new segment file was opened without the truncate option. I don't think there is a reason to do it, so this commit replaces it with `File::create` to make it simpler and remove the `clippy::suspicious_open_options` linter warning.	2024-06-24 14:07:59 +00:00
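The difference between the two ways of opening the file can be shown with the standard library alone. In the sketch below the file name and both helper functions are hypothetical; what is standard behaviour is that `File::create` opens for writing, creates the file if missing, and truncates it, while `OpenOptions` with `write(true).create(true)` and no `truncate(true)` leaves any existing contents in place.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Opens without `truncate`: old bytes past the newly written data survive.
fn write_without_truncate(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new().write(true).create(true).open(path)?;
    f.write_all(data)
}

/// `File::create` truncates an existing file before writing.
fn write_with_create(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(data)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, not a real safekeeper path.
    let path = std::env::temp_dir().join("waltmp_sketch.partial");

    fs::write(&path, b"0123456789")?;
    write_without_truncate(&path, b"abc")?;
    println!("{}", String::from_utf8_lossy(&fs::read(&path)?)); // stale tail remains

    write_with_create(&path, b"abc")?;
    println!("{}", String::from_utf8_lossy(&fs::read(&path)?)); // only the new data

    fs::remove_file(&path)
}
```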
John Spray	47fdf93cf0	tests: fix a flake in test_sharding_split_compaction (#8136 ) ## Problem This test could occasionally trigger a "removing local file ... because it has unexpected length" log when the `compact-shard-ancestors-persistent` failpoint is in use, which is unexpected because that failpoint stops the process when the remote metadata is in sync with local files. It was because there are two shards on the same pageserver, and while the one being compacted explicitly stops at the failpoint, another shard was compacting in the background and failing at an unclean point. The test intends to disable background compaction, but was mistakenly revoking the value of `compaction_period` when it updated `pitr_interval`. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8123/9602976462/index.html#/testresult/7dd6165da7daef40 ## Summary of changes - Update `TENANT_CONF` in the test to use properly typed values, so that it is usable in pageserver APIs as well as via neon_local. - When updating tenant config with `pitr_interval`, retain the overrides from the start of the test, so that there won't be any background compaction going on during the test.	2024-06-24 14:54:54 +01:00
John Spray	de05f90735	pageserver: add more info-level logging in shard splits (#8137 ) ## Problem `test_sharding_autosplit` is occasionally failing on warnings about shard splits taking longer than expected (`Exclusive lock by ShardSplit was held for`...) It's not obvious which part is taking the time (I suspect remote storage uploads). Example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/9618788427/index.html#testresult/b395294d5bdeb783/ ## Summary of changes - Since shard splits are infrequent events, we can afford to be very chatty: add a bunch of info-level logging throughout the process.	2024-06-24 11:53:43 +01:00
John Spray	188797f048	pageserver: remove code that resumes tenant deletions after restarts (#8091 ) #8082 removed the legacy deletion path, but retained code for completing deletions that were started before a pageserver restart. This PR cleans up that remaining code, and removes all the pageserver code that dealt with tenant deletion markers and resuming tenant deletions. The release at https://github.com/neondatabase/neon/pull/8138 contains https://github.com/neondatabase/neon/pull/8082, so we can now merge this to `main`	2024-06-24 11:41:11 +01:00
Arpad Müller	5446e08891	Move remote_storage config related code into dedicated module (#8132 ) Moves `RemoteStorageConfig` and related structs and functions into a dedicated module. Also implements `Serialize` for the config structs (requested in #8126). Follow-up of #8126	2024-06-24 12:29:54 +02:00
Conrad Ludgate	78d9059fc7	proxy: update tokio-postgres to allow arbitrary config params (#8076 ) ## Problem Fixes https://github.com/neondatabase/neon/issues/1287 ## Summary of changes tokio-postgres now supports arbitrary server params through the `param(key, value)` method. Some keys are special so we explicitly filter them out.	2024-06-24 10:20:27 +00:00
Arpad Müller	75747cdbff	Use serde for RemoteStorageConfig parsing (#8126 ) Adds a `Deserialize` impl to `RemoteStorageConfig`. We thus achieve the same as #7743 but with less repetitive code, by deriving `Deserialize` impls on `S3Config`, `AzureConfig`, and `RemoteStorageConfig`. The disadvantage is less useful error messages. The git history of this PR contains a state where we go via an intermediate representation, leveraging the `serde_json` crate, without it ever being actual json though. Also, the PR adds deserialization tests. Alternative to #7743 .	2024-06-22 17:57:09 +00:00
Vlad Lazar	8fe3f17c47	storcon: improve drain and fill shard placement (#8119 ) ## Problem While adapting the storage controller scale test to do graceful rolling restarts via drain and fill, I noticed that secondaries are also being rescheduled, which, in turn, caused the storage controller to optimise attachments. ## Summary of changes * Introduce a transactional looking rescheduling primitive (i.e. "try to schedule to this secondary, but leave everything as is if you can't") * Use it for the drain and fill stages to avoid calling into `Scheduler::schedule` and having secondaries move around.	2024-06-22 14:20:58 +00:00
Anastasia Lubennikova	8776089c70	Remove kq_imcx extension support per customer request neondatabase/cloud#13648	2024-06-21 20:22:54 +01:00
John Spray	b74232eb4d	tests: allow-list neon_local endpoint errors from storage controller (#8123 ) ## Problem For testing, the storage controller has a built-in hack that loads neon_local endpoint config from disk, and uses it to reconfigure endpoints when the attached pageserver changes. Some tests that stop an endpoint while the storage controller is running could occasionally fail on log errors from the controller trying to use its special test-mode calls into neon local Endpoint. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8117/9592392425/index.html#/testresult/9d2bb8623d0d53f8 ## Summary of changes - Give NotifyError an explicit NeonLocal variant, to avoid munging these into generic 500s (I don't want to ignore 500s in general) - Allow-list errors related to the local notification hook. The expectation is that tests using endpoints/workloads should be independently checking that those endpoints work: if neon_local generates an error inside the storage controller, that's ignorable.	2024-06-21 17:23:31 +00:00
Vlad Lazar	ee3081863e	storcon: implement endpoints for cancellation of drain and fill operations (#8029 ) ## Problem There's no way to cancel drain and fill operations. ## Summary of changes Implement HTTP endpoints to allow cancelling of background operations. When the operation is cancelled successfully, the node scheduling policy will revert to `Active`.	2024-06-21 17:13:51 +01:00
John Spray	15728be0e1	pageserver: always detach before deleting (#8082 ) In #7957 we enabled deletion without attachment, but retained the old-style deletion (return 202, delete in background) for attached tenants. In this PR, we remove the old-style deletion path, such that if the tenant delete API is invoked while a tenant is attached, it is simply detached before completing the deletion. This intentionally doesn't rip out all the old deletion code: in case a deletion was in progress at time of upgrade, we keep around the code for finishing it for one release cycle. The rest of the code removal happens in https://github.com/neondatabase/neon/pull/8091 Now that deletion will always be via the new path, the new path is also updated to use some retries around remote storage operations, to avoid tripping up the control plane with 500s if S3 has an intermittent issue.	2024-06-21 15:39:19 +01:00
Arthur Petukhovsky	f45cf28247	Add eviction_state to control file (#8125 ) This is a preparation for #8022, to make the PR both backwards and forward compatible. This commit adds `eviction_state` field to control file. Adds support for reading it, but writes control file in old format where possible, to keep the disk format forward compatible. Note: in `patch_control_file`, new field gets serialized to json like this: - `"eviction_state": "Present"` - `"eviction_state": {"Offloaded": "0/8F"}`	2024-06-21 13:15:02 +01:00
Peter Bendel	82266a252c	Allow longer timeout for starting pageserver, safe keeper and storage controller in test cases to make test cases less flaky (#8079 ) ## Problem see https://github.com/neondatabase/neon/issues/8070 ## Summary of changes the neon_local subcommands to - start neon - start pageserver - start safekeeper - start storage controller get a new option -t=xx or --start-timeout=xx which allows to specify a longer timeout in seconds we wait for the process start. This is useful in test cases where the pageserver has to read a lot of layer data, like in pagebench test cases. In addition we exploit the new timeout option in the python test infrastructure (python fixtures) and modify the flaky testcase to increase the timeout from 10 seconds to 1 minute. Example from the test execution ```bash RUST_BACKTRACE=1 NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=1 DEFAULT_PG_VERSION=15 BUILD_TYPE=release ./scripts/pytest test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py ... 2024-06-19 09:29:34.590 INFO [neon_fixtures.py:1513] Running command "/instance_store/neon/target/release/neon_local storage_controller start --start-timeout=60s" 2024-06-19 09:29:36.365 INFO [broker.py:34] starting storage_broker to listen incoming connections at "127.0.0.1:15001" 2024-06-19 09:29:36.365 INFO [neon_fixtures.py:1513] Running command "/instance_store/neon/target/release/neon_local pageserver start --id=1 --start-timeout=60s" 2024-06-19 09:29:36.366 INFO [neon_fixtures.py:1513] Running command "/instance_store/neon/target/release/neon_local safekeeper start 1 --start-timeout=60s" ```	2024-06-21 10:36:12 +00:00
John Spray	59f949b4a8	pageserver: remove unused load/ignore APIs (#8122 ) ## Problem These APIs have been unused for some time. They were superseded by /location_conf: the equivalent of ignoring a tenant is now to put it in secondary mode. ## Summary of changes - Remove APIs - Remove tests & helpers that used them - Remove error variants that are no longer needed.	2024-06-21 10:02:15 +00:00
Vlad Lazar	01399621d5	storcon: avoid promoting too many shards of the same tenant (#8099 ) ## Problem The fill planner introduced in https://github.com/neondatabase/neon/pull/8014 selects tenant shards to promote strictly based on attached shard count load (tenant shards on nodes with the most attached shard counts are considered first). This approach runs the risk of migrating too many shards belonging to the same tenant on the same primary node. This is bad for availability and causes extra reconciles via the storage controller's background optimisations. Also see https://github.com/neondatabase/neon/pull/8014#discussion_r1642456241. ## Summary of changes Refine the fill plan to avoid promoting too many shards belonging to the same tenant on the same node. We allow for `max(1, shard_count / node_count)` shards belonging to the same tenant to be promoted.	2024-06-21 10:19:01 +01:00
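The per-tenant cap quoted in the commit above, `max(1, shard_count / node_count)`, can be worked through with a small sketch. The function name is hypothetical and not taken from the storage controller; integer division and the handling of a zero node count are assumptions.

```rust
/// Hypothetical helper (not the storage controller's actual code): the
/// maximum number of shards of a single tenant that a fill may promote
/// onto one node, following `max(1, shard_count / node_count)`.
fn max_promote_for_tenant(shard_count: usize, node_count: usize) -> usize {
    // Assumption: integer division. A zero `node_count` is treated as
    // "no quotient" rather than a panic, so the floor of 1 still applies.
    shard_count
        .checked_div(node_count)
        .unwrap_or(0)
        .max(1)
}
```

Under these assumptions, a tenant with 8 shards in a 4-node cluster may have at most 2 shards promoted onto the same node, while a tenant with 2 shards in the same cluster is capped at 1.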
Jure Bajic	0792bb6785	Add tracing for shared locks in `id_lock_map` (#7618 ) ## Problem Storage controller shared locks do not print a warning when held for long time spans. ## Summary of changes This extends https://github.com/neondatabase/neon/issues/7108, in which tracing was added for exclusive locks in `id_lock_map`, by adding the same for shared locks. It was mentioned in the comment https://github.com/neondatabase/neon/pull/7397#discussion_r1587961160	2024-06-21 09:47:04 +01:00
Vlad Lazar	f8ac3b0e0e	storcon: use attached shard counts for initial shard placement (#8061 ) ## Problem When creating a new shard the storage controller schedules via Scheduler::schedule_shard. This does not take into account the number of attached shards. What it does take into account is the node affinity: when a shard is scheduled, all its nodes (primaries and secondaries) get their affinity incremented. For two node clusters and shards with one secondary we have a pathological case where all primaries are scheduled on the same node. Now that we track the count of attached shards per node, this is trivial to fix. Still, the "proper" fix is to use the pageserver's utilization score. Closes https://github.com/neondatabase/neon/issues/8041 ## Summary of changes Use attached shard count when deciding which node to schedule a fresh shard on.	2024-06-20 17:32:01 +01:00
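The idea in the commit above, placing a fresh shard on the node with the fewest attached shards, can be sketched as follows. This is only an illustration: `NodeLoad` and `pick_least_attached` are hypothetical names, and the sketch ignores the affinity scoring that the commit message says the scheduler also tracks.

```rust
/// Hypothetical per-node view; the field names are illustrative only.
struct NodeLoad {
    node_id: u64,
    attached_shard_count: usize,
}

/// Pick the node with the fewest attached shards. Ties are broken by the
/// lower node id so that the choice is deterministic (an assumption).
fn pick_least_attached(nodes: &[NodeLoad]) -> Option<u64> {
    nodes
        .iter()
        .min_by_key(|n| (n.attached_shard_count, n.node_id))
        .map(|n| n.node_id)
}
```

In the two-node case described above, once one node holds an attached shard the next fresh shard goes to the other node, which avoids placing all primaries on the same node.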
Christian Schwarz	02ecdd137b	fix: preinitialize `pageserver_basebackup_query_seconds` metric (#8121 ) Without this patch, the Pageserver 4 Golden Signals dashboard shows no data if there are no basebackups (observed in pre-prod).	2024-06-20 15:50:43 +00:00
Christian Schwarz	79401638df	remove materialized page cache (#8105 ) part of Epic https://github.com/neondatabase/neon/issues/7386 # Motivation The materialized page cache adds complexity to the code base, which increases the maintenance burden and risk for subtle and hard to reproduce bugs such as #8050. Further, the best hit rate that we currently achieve in production is ca 1% of materialized page cache lookups for `task_kind=PageRequestHandler`. Other task kinds have hit rates <0.2%. Last, caching page images in Pageserver rewards under-sized caches in Computes because reading from Pageserver's materialized page cache over the network is often sufficiently fast (low hundreds of microseconds). Such Computes should upscale their local caches to fit their working set, rather than repeatedly requesting the same page from Pageserver. Some more discussion and context in internal thread https://neondb.slack.com/archives/C033RQ5SPDH/p1718714037708459 # Changes This PR removes the materialized page cache code & metrics. The infrastructure for different key kinds in `PageCache` is left in place, even though the "Immutable" key kind is the only remaining one. This can be further simplified in a future commit. Some tests started failing because their total runtime was dependent on high materialized page cache hit rates. This PR makes them fixed-runtime or raises pytest timeouts: * test_local_file_cache_unlink * test_physical_replication * test_pg_regress # Performance I focussed on ensuring that this PR will not result in a performance regression in prod. * getpage requests: our production metrics have shown the materialized page cache to be irrelevant (low hit rate). Also, Pageserver is the wrong place to cache page images, it should happen in compute. * ingest (`task_kind=WalReceiverConnectionHandler`): prod metrics show 0 percent hit rate, so, removing will not be a regression. * get_lsn_by_timestamp: important API for branch creation, used by control plane. The clog pages that this code uses are not materialize-page-cached because they're not 8k. No risk of introducing a regression here. We will watch the various nightly benchmarks closely for more results before shipping to prod.	2024-06-20 11:56:14 +02:00
Alexander Bayandin	c789ec21f6	CI: miscellaneous cleanups (#8073 ) ## Problem There are a couple of small CI cleanups that seem too small for dedicated PRs ## Summary of changes - Create release PR with the title that matches the title in the description - Tune error message for disallowing `ubuntu-latest` to explicitly mention what to do - Remove junit output from pytest, we use allure instead	2024-06-19 19:21:09 +01:00
Alexander Bayandin	558a57b15b	CI(test-images): add dockerhub auth (#8115 ) ## Problem ``` Unable to find image 'neondatabase/neon:9583413584' locally docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit. ``` ## Summary of changes - add `docker/login-action@v3` for `test-images` job	2024-06-19 16:54:07 +00:00
John Spray	f0e2bb79b2	tests: use semaphore instead of lock for Endpoint.running (#8112 ) ## Problem Ahem, let's try this again. https://github.com/neondatabase/neon/pull/8110 had a spooky failure in test_multi_attach where a call to Endpoint.stop() timed out waiting for a lock, even though we can see an earlier call completing and releasing the lock. I suspect something weird is going on with the way pytest runs tests across processes, or use of asyncio perhaps. Anyway: the simplest fix is to just use a semaphore instead: if we don't lock we can't deadlock. ## Summary of changes - Make Endpoint.running a semaphore, where we add a unit to its counter when starting the process and atomically decrement it when stopping.	2024-06-19 16:07:14 +00:00
MMeent	fd0b22f5cd	Make sure we can handle temporarily offline PS when we first connect (#8094 ) Fixes https://github.com/neondatabase/neon/issues/7897 ## Problem `shard->delay_us` was potentially uninitialized when we connect to PS, as it wasn't set to a non-0 value until we've first connected to the shard's pageserver. That caused the exponential backoff to use an initial value (multiplier) of 0 for the first connection attempt to that pageserver, thus causing a hot retry loop with connection attempts to the pageserver without significant delay. That in turn caused attempts to reconnect to quickly fail, rather than showing the expected 'wait until pageserver is available' behaviour. ## Summary of changes We initialize shard->delay_us before connection initialization if we notice it is not initialized yet.	2024-06-19 15:05:31 +02:00
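The failure mode in the commit above follows from the arithmetic of exponential backoff: a delay that starts at 0 stays at 0 however often it is doubled. The sketch below illustrates the described fix in Rust rather than in the extension's own code. Only the field name `delay_us` comes from the commit message; the struct, the constants and their values are assumptions.

```rust
use std::time::Duration;

/// Hypothetical reconnect state; only the field name `delay_us` is taken
/// from the commit message.
struct ShardBackoff {
    delay_us: u64,
}

// Assumed bounds, chosen for illustration only.
const MIN_DELAY_US: u64 = 1_000;
const MAX_DELAY_US: u64 = 1_000_000;

impl ShardBackoff {
    /// Returns how long to wait before the next connection attempt and
    /// doubles the stored delay, capped at `MAX_DELAY_US`.
    fn next_delay(&mut self) -> Duration {
        // The fix described above: a zero delay means "not initialized
        // yet", so seed it before use. Without this, 0 * 2 stays 0 and
        // every retry happens immediately.
        if self.delay_us == 0 {
            self.delay_us = MIN_DELAY_US;
        }
        let current = self.delay_us;
        self.delay_us = current.saturating_mul(2).min(MAX_DELAY_US);
        Duration::from_micros(current)
    }
}
```

Starting from `delay_us: 0`, successive calls yield 1 ms, 2 ms, 4 ms and so on up to the assumed cap, instead of a run of zero-length waits.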
Peter Bendel	56da624870	allow storage_controller error during pagebench (#8109 ) ## Problem `test_pageserver_max_throughput_getpage_at_latest_lsn` is a pagebench testcase which creates several tenants/timelines to verify pageserver performance. The test swaps environments around in the tenant duplication stage, so the storage controller uses two separate db instances (one in the duplication stage and another one in the benchmarking stage). In the benchmarking stage, the storage controller starts without any knowledge of nodes, but with knowledge of tenants (via attachments.json). When we re-attach and attempt to update the scheduler stats, the scheduler rightfully complains about the node not being known. The setup should preserve the storage controller across the two envs, but I think it's fine to just allow list the error in this case. ## Summary of changes add the error message `2024-06-19T09:38:27.866085Z ERROR Scheduler missing node 1` to the list of allowed errors for storage_controller	2024-06-19 13:04:29 +00:00
Conrad Ludgate	b998b70315	proxy: reduce some per-task memory usage (#8095 ) ## Problem Some tasks are using around upwards of 10KB of memory at all times, sometimes having buffers that swing them up to 30MB. ## Summary of changes Split some of the async tasks in selective places and box them as appropriate to try and reduce the constant memory usage. Especially in the locations where the large future is only a small part of the total runtime of the task. Also, reduces the size of the CopyBuffer buffer size from 8KB to 1KB. In my local testing and in staging this had a minor improvement. sadly not the improvement I was hoping for :/ Might have more impact in production	2024-06-19 13:34:15 +01:00
John Spray	76aa6936e8	tests: make Endpoint.stop() thread safe (occasional flakes in `test_multi_attach`) (#8110 ) ## Problem Tests using the `Workload` helper would occasionally fail in a strange way, where the endpoint appears to try and stop twice concurrently, and the second stop fails because the pidfile is already gone. `test_multi_attach` suffered from this. Workload has a `__del__` that stops the endpoint, and python is destroying this object in a different thread than NeonEnv.stop is called, resulting in racing stop() calls. Endpoint has a `running` attribute that avoids calling neon_local's stop twice, but that doesn't help in the concurrent case. ## Summary of changes - Make `Endpoint.stop` thread safe with a simple lock held across the updates to `running` and the actual act of stopping it. One could also work around this by letting Workload.endpoint outlive the Workload, or making Workload a context manager, but this change feels most robust, as it avoids all test code having to know that it must not try and stop an endpoint from a destructor.	2024-06-19 13:14:50 +01:00
Christian Schwarz	438fd2aaf3	neon_local: `background_process`: launch all processes in repo dir (or `datadir`) (#8058 ) Before this PR, storage controller and broker would run in the PWD of neon_local, i.e., most likely the checkout of neon.git. With this PR, the shared infrastructure for background processes sets the PWD. Benefits: * easy listing of processes in a repo dir using `lsof`, see added comment in the code * coredumps go in the right directory (next to the process) * generally matching common expectations, I think Changes: * set the working directory in `background_process` module * drive-by: fix reliance of storage_controller on NEON_REPO_DIR being set by neon_local for the local compute hook to work correctly	2024-06-19 13:59:36 +02:00
Vlad Lazar	e7d62a257d	test: fix tenant duplication utility generation numbers (#8096 ) ## Problem We have this set of test utilities which duplicate a tenant by copying everything that's in remote storage and then attaching a tenant to the pageserver and storage controller. When the "copied tenants" are created on the storage controller, they start off from generation number 0. This means that they can't see anything past that generation. This issue has existed ever since generation numbers have been introduced, but we've largely been lucky for the generation to stay stable during the template tenant creation. ## Summary of Changes Extend the storage controller debug attach hook to accept a generation override. Use that in the tenant duplication logic to set the generation number to something greater than the naturally reached generation. This allows the tenants to see all layer files.	2024-06-19 11:55:59 +01:00
Vlad Lazar	5778d714f0	storcon: add drain and fill background operations for graceful cluster restarts (#8014 ) ## Problem Pageserver restarts cause read availability downtime for tenants. See `Motivation` section in the [RFC](https://github.com/neondatabase/neon/pull/7704). ## Summary of changes * Introduce a new `NodeSchedulingPolicy`: `PauseForRestart` * Implement the first take of drain and fill algorithms * Add a node status endpoint which can be polled to figure out when an operation is done The implementation follows the RFC, so it might be useful to peek at it as you're reviewing. Since the PR is rather chunky, I've made sure all commits build (with warnings), so you can review by commit if you prefer that. RFC: https://github.com/neondatabase/neon/pull/7704 Related https://github.com/neondatabase/neon/issues/7387	2024-06-19 11:55:30 +01:00
Sergey Melnikov	4753b8f390	Copy release images to prod ECR (#8101 ) ## Problem We want to have all released images in production ECR repository ## Summary of changes Copy all docker images to production ECR repository cc: https://github.com/neondatabase/cloud/issues/10177	2024-06-19 09:33:21 +00:00
Alex Chi Z	68476bb4ba	feat(pageserver): add iterator API for btree reader (#8083 ) The new image iterator and delta iterator use an iterator-based API. https://github.com/neondatabase/neon/pull/8006 / part of https://github.com/neondatabase/neon/issues/8002 This requires the underlying thing (the btree) to have an iterator API, and the iterator should have a type name so that it can be stored somewhere. ```rust pub struct DeltaLayerIterator { index_iterator: BTreeIterator } ``` versus ```rust pub struct DeltaLayerIterator { index_iterator: impl Stream<....> } ``` (this requires a nightly flag and is still buggy in the Rust compiler) There are multiple ways to achieve this: 1. Either write a BTreeIterator from scratch that provides `async next`. This is the most efficient way to do that. 2. Or wrap the current `get_stream` API, which is the current approach in the pull request. In the future, we should do (1), and the `get_stream` API should be refactored to use the iterator API. With (2), we have to wrap the `get_stream` API with `Pin<Box<dyn Stream>>`, where we have the overhead of dynamic dispatch. However, (1) needs a rewrite of the `visit` function, which would take some time to write and review. I'd like to define this iterator API first and work on a real iterator API later. ## Summary of changes Add `DiskBtreeIterator` and related tests. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-18 20:02:57 +00:00
Arseny Sher	6bb8b1d7c2	Remove dead code from walproposer_pg.c Now that logical walsenders fetch WAL from safekeepers recovery in walproposer is not needed. Fixes warnings.	2024-06-18 21:12:02 +03:00
Yuchen Liang	30b890e378	feat(pageserver): use leases to temporarily block gc (#8084 ) Part of #7497, extracts from #7996, closes #8063. ## Problem With the LSN lease API introduced in https://github.com/neondatabase/neon/issues/7808, we want to implement the real lease logic so that GC will keep all the layers needed to reconstruct all pages at all the leased LSNs with valid leases at a given time. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-18 17:37:06 +00:00
Heikki Linnakangas	560627b525	Replace a few references to Zenith with neon	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	1c1b4b0c04	Add a bunch of items for new changes that we've made	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	b774ab54d4	Remove obsolete ones - Relation size cache was moved to extension - the changes in visibilitymap.c and freespace.c became unnecessary with v16, thanks to changes in upstream code - WALProposer was moved to extension - The hack in ReadBuffer_common to not throw an error on unexpected data beyond EOF was removed in v16 rebase. We haven't seen such errors, so I guess that was some early issue that was fixed long time ago. - The ginfast.c diff was made unnecessary by upstream commit 56b662523f	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	33a09946fc	Prefetching has been implemented	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	0396ed67f7	Update comments on various items To update things that have changed since this was written, and to reflect discussions at offsite meeting.	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	8ee6724167	Update overview section to reflect current code organization	2024-06-18 20:01:32 +03:00
dependabot[bot]	8a9fa0a4e4	build(deps): bump urllib3 from 1.26.18 to 1.26.19 (#8086 )	2024-06-18 16:40:46 +01:00
dependabot[bot]	cf60e4c0c5	build(deps): bump ws from 8.16.0 to 8.17.1 in /test_runner/pg_clients/typescript/serverless-driver (#8087 )	2024-06-18 16:40:27 +01:00
Arpad Müller	68a2298973	Add support to specifying storage account in AzureConfig (#8090 ) We want to be able to specify the storage account via the toml configuration, so that we can connect to multiple storage accounts in the same process. https://neondb.slack.com/archives/C06SJG60FRB/p1718702144270139	2024-06-18 16:03:23 +02:00
Arseny Sher	4feb6ba29c	Make pull_timeline work with auth enabled. - Make safekeeper read SAFEKEEPER_AUTH_TOKEN env variable with JWT token to connect to other safekeepers. - Set it in neon_local when auth is enabled. - Create simple rust http client supporting it, and use it in pull_timeline implementation. - Enable auth in all pull_timeline tests. - Make sk http_client() by default generate safekeeper wide token, it makes easier enabling auth in all tests by default.	2024-06-18 15:45:39 +03:00
Arseny Sher	29a41fc7b9	Implement holding off WAL removal for pull_timeline.	2024-06-18 15:45:39 +03:00
Arseny Sher	d8b2a49c55	safekeeper: streaming pull_timeline - Add /snapshot http endpoint streaming tar archive timeline contents up to flush_lsn. - Add check that term doesn't change, corresponding test passes now. - Also prepares infra to hold off WAL removal during the basebackup. - Sprinkle fsyncs to persist the pull_timeline result. ref https://github.com/neondatabase/neon/issues/6340	2024-06-18 15:45:39 +03:00
John Spray	ed9ffb9af2	pageserver: eliminate CalculateSyntheticSizeError::LsnNotFound (`test_metric_collection` flake) (#8065 ) ## Problem ``` ERROR synthetic_size_worker: failed to calculate synthetic size for tenant ae449af30216ac56d2c1173f894b1122: Could not find size at 0/218CA70 in timeline d8da32b5e3e0bf18cfdb560f9de29638\n') ``` e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/main/9518948590/index.html#/testresult/30a6d1e2471d2775 This test had allow lists but was disrupted by https://github.com/neondatabase/neon/pull/8051. In that PR, I had kept an error path in fill_logical_sizes that covered the case where we couldn't find sizes for some of the segments, but that path could only be hit in the case that some Timeline was shut down concurrently with a synthetic size calculation, so it makes sense to just leave the segment's size None in this case: the subsequent size calculations do not assume it is Some. ## Summary of changes - Remove `CalculateSyntheticSizeError::LsnNotFound` and just proceed in the case where we used to return it - Remove defunct allow list entries in `test_metric_collection`	2024-06-18 13:44:30 +01:00
Christian Schwarz	6c6a7f9ace	[v2] Include openssl and ICU statically linked (#8074 ) We had to revert the earlier static linking change due to libicu version incompatibilities: - original PR: https://github.com/neondatabase/neon/pull/7956 - revert PR: https://github.com/neondatabase/neon/pull/8003 Specifically, the problem manifests for existing projects as error ``` DETAIL: The collation in the database was created using version 153.120.42, but the operating system provides version 153.14.37. ``` So, this PR reintroduces the original change but with the exact same libicu version as in Debian `bullseye`, i.e., the libicu version that we're using today. This avoids the version incompatibility. Additional changes made by Christian ==================================== - `hashFiles` can take multiple arguments, use that feature - validation of the libicu tarball checksum - parallel build (`-j $(nproc)`) for openssl and libicu Follow-ups ========== Debian bullseye has a few patches on top of libicu: https://sources.debian.org/patches/icu/67.1-7/ We still decide whether we need to include these patches or not. => https://github.com/neondatabase/cloud/issues/14527 Eventually, we'll have to figure out an upgrade story for libicu. That work is tracked in epic https://github.com/neondatabase/cloud/issues/14525. The OpenSSL version in this PR is arbitrary. We should use `1.1.1w` + Debian patches if applicable. See https://github.com/neondatabase/cloud/issues/14526. Longer-term: * https://github.com/neondatabase/cloud/issues/14519 * https://github.com/neondatabase/cloud/issues/14525 Refs ==== Co-authored-by: Christian Schwarz <christian@neon.tech> refs https://github.com/neondatabase/cloud/issues/12648 --------- Co-authored-by: Rahul Patil <rahul@neon.tech>	2024-06-18 09:42:22 +02:00
MMeent	e729f28205	Fix log rates (#8035 ) ## Summary of changes - Stop logging HealthCheck message passing at INFO level (moved to DEBUG) - Stop logging /status accesses at INFO (moved to DEBUG) - Stop logging most occurrences of `missing config file "compute_ctl_temp_override.conf"` - Log memory usage only when the data has changed significantly, or if we've not recently logged the data, rather than always every 2 seconds.	2024-06-17 18:57:49 +00:00
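The last item in the commit above describes a common log-throttling rule: emit a sample only if it differs significantly from the last logged value, or if nothing has been logged for a while. A minimal sketch of that rule follows. The type, the field names and both thresholds are hypothetical, and this is not the code the commit changed.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle for periodic memory-usage log lines.
struct MemoryLogThrottle {
    /// Time and value of the last sample that was actually logged.
    last_logged: Option<(Instant, u64)>,
    /// Assumed threshold: relative change that counts as "significant".
    min_change_fraction: f64,
    /// Assumed threshold: longest time to stay silent.
    max_silence: Duration,
}

impl MemoryLogThrottle {
    /// Decide whether the sample taken at `now` should be logged, and
    /// remember it if so.
    fn should_log(&mut self, now: Instant, usage_bytes: u64) -> bool {
        let log = match self.last_logged {
            // Nothing logged yet: always log the first sample.
            None => true,
            Some((at, prev)) => {
                let delta = (usage_bytes as f64 - prev as f64).abs();
                let changed = delta > prev as f64 * self.min_change_fraction;
                let stale = now.duration_since(at) >= self.max_silence;
                changed || stale
            }
        };
        if log {
            self.last_logged = Some((now, usage_bytes));
        }
        log
    }
}
```

With a 10% change threshold and a 60 second silence limit, a sampler that runs every 2 seconds produces a line only when usage moves noticeably or once a minute has passed since the last line.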
Alexander Bayandin	b6e1c09c73	CI(check-build-tools-image): change build-tools image persistent tag (#8059 ) ## Problem We don't rebuild `build-tools` image for changes in a workflow that builds this image itself (`.github/workflows/build-build-tools-image.yml`) or in a workflow that determines which tag to use (`.github/workflows/check-build-tools-image.yml`) ## Summary of changes - Use a hash of `Dockerfile.build-tools` and workflow files as a persistent tag instead of using a commit sha.	2024-06-17 12:47:20 +01:00
Vlad Lazar	16d80128ee	storcon: handle entire cluster going unavailable correctly (#8060 ) ## Problem A period of unavailability for all pageservers in a cluster produced the following fallout in staging: all tenants became detached and required manual operation to re-attach. Manually restarting the storage controller re-attached all tenants due to a consistency bug. Turns out there are two related bugs which caused the issue: 1. Pageserver re-attach can be processed before the first heartbeat. Hence, when handling the availability delta produced by the heartbeater, `Node::get_availability_transition` claims that there's no need to reconfigure the node. 2. We would still attempt to reschedule tenant shards when handling offline transitions even if the entire cluster is down. This puts tenant shards into a state where the reconciler believes they have to be detached (no pageserver shows up in their intent state). This is doubly wrong because we don't mark the tenant shards as detached in the database, thus causing memory vs database consistency issues. Luckily, this bug allowed all tenant shards to re-attach after restart. ## Summary of changes * For (1), abuse the fact that re-attach requests do not contain an utilisation score and use that to differentiate from a node that replied to heartbeats. * For (2), introduce a special case that skips any rescheduling if the entire cluster is unavailable. * Update the storage controller heartbeat test with an extra scenario where the entire cluster goes for lunch. Fixes https://github.com/neondatabase/neon/issues/8044	2024-06-17 11:40:35 +01:00
Arseny Sher	2ba414525e	Install rust binaries before running rust tests. cargo test (or nextest) might rebuild the binaries with different features/flags, so do install immediately after the build. Triggered by the particular case of nextest invocations missing $CARGO_FEATURES, which recompiled safekeeper without 'testing' feature which made python tests needing it (failpoints) not run in the CI. Also add CARGO_FEATURES to the nextest runs anyway because there doesn't seem to be an important reason not to.	2024-06-17 06:23:32 +03:00
Peter Bendel	46210035c5	add halfvec indexing and queries to periodic pgvector performance tests (#8057 ) ## Problem halfvec data type was introduced in pgvector 0.7.0 and is popular because it allows smaller vectors, smaller indexes and potentially better performance. So far we have not tested halfvec in our periodic performance tests. This PR adds halfvec indexing and halfvec queries to the test.	2024-06-14 18:36:50 +02:00
Alex Chi Z	81892199f6	chore(pageserver): vectored get target_keyspace directly accums (#8055 ) follow up on https://github.com/neondatabase/neon/pull/7904 avoid a layer of indirection introduced by `Vec<Range<Key>>` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-14 11:57:58 -04:00
Alexander Bayandin	83eb02b07a	CI: downgrade docker/setup-buildx-action (#8062 ) ## Problem I've bumped `docker/setup-buildx-action` in #8042 because I wasn't able to reproduce the issue from #7445. But now the issue appears again in https://github.com/neondatabase/neon/actions/runs/9514373620/job/26226626923?pr=8059 The steps to reproduce aren't clear, it required `docker/setup-buildx-action@v3` and rebuilding the image without cache, probably ## Summary of changes - Downgrade `docker/setup-buildx-action@v3` to `docker/setup-buildx-action@v2`	2024-06-14 11:43:51 +00:00
Arseny Sher	a71f58e69c	Fix test_segment_init_failure. Graceful shutdown broke it.	2024-06-14 14:24:15 +03:00
Conrad Ludgate	e6eb0020a1	update rust to 1.79.0 (#8048 ) ## Problem Rust 1.79 enables new lints by default. ## Summary of changes * update to rust 1.79 * `s/default_features/default-features/` * fix proxy dead code. * fix pageserver dead code.	2024-06-14 13:23:52 +02:00
John Spray	eb0ca9b648	pageserver: improved synthetic size & find_gc_cutoff error handling (#8051 ) ## Problem This PR refactors some error handling to avoid log spam on tenant/timeline shutdown. - "ignoring failure to find gc cutoffs: timeline shutting down." logs (https://github.com/neondatabase/neon/issues/8012) - "synthetic_size_worker: failed to calculate synthetic size for tenant ...: Failed to refresh gc_info before gathering inputs: tenant shutting down", for example here: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9502988669/index.html#suites/3fc871d9ee8127d8501d607e03205abb/1a074a66548bbcea Closes: https://github.com/neondatabase/neon/issues/8012 ## Summary of changes - Refactor: Add a PageReconstructError variant to GcError: this is the only kind of error that find_gc_cutoffs can emit. - Functional change: only ignore shutdown PageReconstructError variant: for other variants, treat it as a real error - Refactor: add a structured CalculateSyntheticSizeError type and use it instead of anyhow::Error in synthetic size calculations - Functional change: while iterating through timelines gathering logical sizes, only drop out if the whole tenant is cancelled: individual timeline cancellations indicate deletion in progress and we can just ignore those.	2024-06-14 11:08:11 +01:00
John Spray	6843fd8f89	storage controller: always wait for tenant detach before delete (#8049 ) ## Problem This test could fail with a timeout waiting for tenant deletions. Tenant deletions could get tripped up on nodes transitioning from offline to online at the moment of the deletion. In a previous reconciliation, the reconciler would skip detaching a particular location because the node was offline, but then when we do the delete the node is marked as online and can be picked as the node to use for issuing a deletion request. This hits the "Unexpectedly still attached path", which would still work if the caller kept calling DELETE, but if a caller does a Delete,get,get,get poll, then it doesn't work because the GET calls fail after we've marked the tenant as detached. ## Summary of changes Fix the undesirable storage controller behavior highlighted by this test failure: - Change tenant deletion flow to _always_ wait for reconciliation to succeed: it was unsound to proceed and return 202 if something was still attached, because after the 202 callers can no longer GET the tenant. Stabilize the test: - Add a reconcile_until_idle to the test, so that it will not have reconciliations running in the background while we mark a node online. This test is not meant to be a chaos test: we should test that kind of complexity elsewhere. - This reconcile_until_idle also fixes another failure mode where the test might see a None for a tenant location because a reconcile was mutating it (https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7288/9500177581/index.html#suites/8fc5d1648d2225380766afde7c428d81/4acece42ae00c442/) It remains the case that a motivated tester could produce a situation where a DELETE gives a 500, when precisely the wrong node transitions from offline to available at the precise moment of a deletion (but the 500 is better than returning 202 and then failing all subsequent GETs). 
Note that nodes don't go through the offline state during normal restarts, so this is super rare. We should eventually fix this by making DELETE to the pageserver implicitly detach the tenant if it's attached, but that should wait until nobody is using the legacy-style deletes (the ones that use 202 + polling)	2024-06-14 10:37:30 +01:00
Alexander Bayandin	edc900028e	CI: Update outdated GitHub Actions (#8042 ) ## Problem We have a number of outdated actions in the CI pipeline, and GitHub complains about some of them. ## Summary of changes - Update `actions/checkout@1` (a really old one) in `vm-compute-node-image` - Update `actions/checkout@3` in `build-build-tools-image` - Update `docker/setup-buildx-action` in all workflows / jobs, it was downgraded in https://github.com/neondatabase/neon/pull/7445, but it seems to work fine now	2024-06-14 10:24:13 +01:00
Heikki Linnakangas	789196572e	Fix test_replica_query_race flakiness (#8038 ) This failed once with `relation "test" does not exist` when trying to run the query on the standby. It's possible that the standby is started before the CREATE TABLE is processed in the pageserver, and the standby opens up for queries before it has received the CREATE TABLE transaction from the primary. To fix, wait for the standby to catch up to the primary before starting to run the queries. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8025/9483658488/index.html	2024-06-14 11:51:12 +03:00
John Spray	425eed24e8	pageserver: refine shutdown handling in secondary download (#8052 ) ## Problem Some code paths during secondary mode download are returning Ok() rather than UpdateError::Cancelled. This is functionally okay, but it means that the end of TenantDownloader::download has a sanity check that the progress is 100% on success, and prints a "Correcting drift..." warning if not. This warning can be emitted in a test, e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9503642976/index.html#/testresult/fff1624ba6adae9e. ## Summary of changes - In secondary download cancellation paths, use Err(UpdateError::Cancelled) rather than Ok(), so that we drop out of the download function and do not reach the progress sanity check.	2024-06-14 09:39:31 +01:00
James Broadhead	f67010109f	extensions: pgvector-0.7.2 (#8037 ) Update pgvector to 0.7.2 Purely mechanical update to pgvector.patch, just as a place to start from	2024-06-14 10:17:43 +02:00
Tristan Partin	0c3e3a8667	Set application_name for internal connections to computes This will help when analyzing the origins of connections to a compute like in [0]. [0]: https://github.com/neondatabase/cloud/issues/14247	2024-06-13 12:06:10 -07:00
Christian Schwarz	82719542c6	fix: vectored get returns incorrect result on inexact materialized page cache hit (#8050 ) # Problem Suppose our vectored get starts with an inexact materialized page cache hit ("cached lsn") that is shadowed by a newer image layer. Like so: ``` <inmemory layers> +-+ < delta layer \| \| -\|-\|----- < image layer \| \| \| \| -\|-\|----- < cached lsn for requested key +_+ ``` The correct visitation order is 1. inmemory layers 2. delta layer records in LSN range `[image_layer.lsn, oldest_inmemory_layer.lsn_range.start)` 3. image layer However, when the vectored get code visits the delta layer, it (incorrectly!) returns with state `Complete`. The reason why it returns is that it calls `on_lsn_advanced` with `self.lsn_range.start`, i.e., the layer's LSN range. Instead, it should use `lsn_range.start`, i.e., the LSN range from the correct visitation order listed above. # Solution Use `lsn_range.start` instead of `self.lsn_range.start`. # Refs discovered by & fixes https://github.com/neondatabase/neon/issues/6967 Co-authored-by: Vlad Lazar <vlad@neon.tech>	2024-06-13 18:20:47 +00:00
Alex Chi Z	d25f7e3dd5	test(pageserver): add test wal record for unit testing (#8015 ) https://github.com/neondatabase/neon/issues/8002 We need mock WAL record to make it easier to write unit tests. This pull request adds such a record. It has `clear` flag and `append` field. The tests for legacy-enhanced compaction are not modified yet and will be part of the next pull request. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-13 09:44:37 -04:00
Anna Khanova	fbccd1e676	Proxy process updated errors (#8026 ) ## Problem Respect errors classification from cplane	2024-06-13 14:42:26 +02:00
Heikki Linnakangas	dc2ab4407f	Fix on-demand SLRU download on standby starting at WAL segment boundary (#8031 ) If a standby is started right after switching to a new WAL segment, the request in the SLRU download request would point to the beginning of the segment (e.g. 0/5000000), while the not-modified-since LSN would point to just after the page header (e.g. 0/5000028). It's effectively the same position, as there cannot be any WAL records in between, but the pageserver rightly errors out on any request where the request LSN < not-modified since LSN. To fix, round down the not-modified since LSN to the beginning of the page like the request LSN. Fixes issue #8030	2024-06-13 00:31:31 +03:00
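The rounding described in the commit above can be checked with a small worked example. This is a sketch only: the helper names are hypothetical, and the 8192-byte WAL page size is PostgreSQL's default `XLOG_BLCKSZ`, assumed here rather than taken from the commit.

```python
XLOG_BLCKSZ = 8192  # PostgreSQL's default WAL page size (assumed)


def parse_lsn(text: str) -> int:
    """Parse the usual 'hi/lo' hexadecimal LSN notation into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format an integer LSN back into 'hi/lo' hexadecimal notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def round_down_to_page(lsn: int) -> int:
    """Drop the offset within the WAL page, e.g. a page header's length."""
    return lsn - (lsn % XLOG_BLCKSZ)
```

With these helpers, `format_lsn(round_down_to_page(parse_lsn("0/5000028")))` gives `"0/5000000"`, so the request LSN and the not-modified-since LSN from the commit message compare equal and the `request LSN < not-modified-since LSN` check no longer trips.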
MMeent	ad0ab3b81b	Fix query error in vm-image-spec.yaml (#8028 ) This query causes metrics exporter to complain about missing data because it can't find the correct column. Issue was introduced with https://github.com/neondatabase/neon/pull/7761	2024-06-12 11:25:04 -07:00
Alex Chi Z	836d1f4af7	test(pageserver): add test keyspace into collect_keyspace (#8016 ) Some test cases add random keys into the timeline, but they are not part of `collect_keyspace`, which causes compaction to remove the keys. The pull request adds a field to supply extra keyspaces during unit tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-12 17:42:43 +00:00
a-masterov	9dda13ecce	Add the image version to the neon-test-extensions image (#8032 ) ## Problem The version was missing in the image name causing the error during the workflow ## Summary of changes Added the version to the image name	2024-06-12 18:15:20 +02:00
Peter Bendel	9ba9f32dfe	Reactivate page bench test in CI after ignoring CopyFail error in pageserver (#8023 ) ## Problem Testcase page bench test_pageserver_max_throughput_getpage_at_latest_lsn had been deactivated because it was flaky. We now ignore copy fail error messages like in `270d3be507/test_runner/regress/test_pageserver_getpage_throttle.py (L17-L20)` and want to reactivate it to see if it is still flaky ## Summary of changes - reactivate the test in CI - ignore CopyFail error message during page bench test cases ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-06-12 16:10:57 +02:00
Vlad Lazar	3099e1a787	storcon_cli: do not drain to undesirable nodes (#8027 ) ## Problem The previous code would attempt to drain to unavailable or unschedulable nodes. ## Summary of Changes Remove such nodes from the list of nodes to fill.	2024-06-12 12:33:54 +01:00
a-masterov	f749437cec	Resolve the docker compose problem caused by the extensions tests (#8024 ) ## Problem The merging of #7818 caused a problem with the docker-compose file. Running docker compose is now impossible due to the unavailability of the neon-test-extensions:latest image ## Summary of changes Fix the problem: Add the latest tag to the neon-test-extensions image and use the profiles feature of the docker-compose file to avoid loading the neon-test-extensions container if it is not needed.	2024-06-12 12:25:13 +02:00
Heikki Linnakangas	0a256148b0	Update documentation on running locally with Docker (#8020 ) - Fix the dockerhub URLs - `neondatabase/compute-node` image has been replaced with Postgres version specific images like `neondatabase/compute-node-v16` - Use TAG=latest in the example, rather than some old tag. That's a sensible default for people to copy-paste - For convenience, use a Postgres connection URL in the `psql` example that also includes the password. That way, there's no need to set up .pgpass - Update the image names in `docker ps` example to match what you get when you follow the example	2024-06-12 07:06:00 +00:00
Heikki Linnakangas	69aa1aca35	Update default Postgres version in docker-compose.yml (#8019 ) Let's be modern.	2024-06-12 09:19:24 +03:00
Heikki Linnakangas	9983ae291b	Another attempt at making test_vm_bits less flaky (#7989 ) - Split the first and second parts of the test to two separate tests - In the first test, disable the aggressive GC, compaction, and autovacuum. They are only needed by the second test. I'd like to get the first test to a point that the VM page is never all-zeros. Disabling autovacuum in the first test is hopefully enough to accomplish that. - Compare the full page images, don't skip page header. After fixing the previous point, there should be no discrepancy. LSN still won't match, though, because of commit `387a36874c`. Fixes issue https://github.com/neondatabase/neon/issues/7984	2024-06-12 09:18:52 +03:00
Sasha Krassovsky	b7a0c2b614	Add On-demand WAL Download to logicalfuncs (#7960 ) We implemented on-demand WAL download for walsender, but other things that may want to read the WAL from safekeepers don't do that yet. This PR makes it do that by adding the same set of hooks to logicalfuncs. Addresses https://github.com/neondatabase/neon/issues/7959 Also relies on: https://github.com/neondatabase/postgres/pull/438 https://github.com/neondatabase/postgres/pull/437 https://github.com/neondatabase/postgres/pull/436	2024-06-11 17:59:32 -07:00
Arpad Müller	27518676d7	Rename S3 scrubber to storage scrubber (#8013 ) The S3 scrubber contains "S3" in its name, but we want to make it generic in terms of which storage is used (#7547). Therefore, rename it to "storage scrubber", following the naming scheme of already existing components "storage broker" and "storage controller". Part of #7547	2024-06-11 22:45:22 +00:00
Heikki Linnakangas	78a59b94f5	Copy editor config for the neon extension from PostgreSQL (#8009 ) This makes IDEs and github diff format the code the same way as PostgreSQL sources, which is the style we try to maintain.	2024-06-11 23:19:18 +03:00
Vlad Lazar	7121db3669	storcon_cli: add 'drain' command (#8007 ) ## Problem We need the ability to prepare a subset of storage controller managed pageservers for decommissioning. The storage controller cannot currently express this in terms of scheduling constraints (it's a pretty special case, so I'm not sure it even should). ## Summary of Changes A new `drain` command is added to `storcon_cli`. It takes a set of nodes to drain and migrates primary attachments outside of said set. Simple round-robin assignment is used under the assumption that nodes outside of the draining set are evenly balanced. Note that secondary locations are not migrated. This is fine for staging, but the migration API will have to be extended for prod in order to allow migration of secondaries as well. I've tested this out against a neon local cluster. The immediate use for this command will be to migrate staging to ARM (AArch64) pageservers. Related https://github.com/neondatabase/cloud/issues/14029	2024-06-11 16:39:38 +00:00
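The round-robin assignment mentioned in the `drain` commit above might look roughly like the following. This is a sketch under stated assumptions, not the `storcon_cli` code: `plan_drain`, the shard and node identifiers, and the plain-dictionary data model are all hypothetical.

```python
from itertools import cycle


def plan_drain(
    attachments: dict[str, str], drain: set[str], all_nodes: list[str]
) -> dict[str, str]:
    """Map each shard attached to a draining node to a new node.

    Targets are picked round-robin from the nodes outside the drain set,
    which only balances well if those nodes start out evenly loaded.
    """
    targets = [node for node in all_nodes if node not in drain]
    if not targets:
        raise ValueError("no nodes left outside the drain set")
    next_target = cycle(targets)
    plan = {}
    # Sort for a deterministic plan; shards on other nodes stay put.
    for shard, node in sorted(attachments.items()):
        if node in drain:
            plan[shard] = next(next_target)
    return plan
```

Only primary attachments are modelled here, matching the commit's note that secondary locations are not migrated.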
Vlad Lazar	126bcc3794	storcon: track number of attached shards for each node (#8011 ) ## Problem The storage controller does not track the number of shards attached to a given pageserver. This is a requirement for various scheduling operations (e.g. draining and filling will use this to figure out if the cluster is balanced) ## Summary of Changes Track the number of shards attached to each node. Related https://github.com/neondatabase/neon/issues/7387	2024-06-11 16:03:25 +01:00
Alex Chi Z	4c2100794b	feat(pageserver): initial code sketch & test case for combined gc+compaction at gc_horizon (#7948 ) A demo for a building block for compaction. The GC-compaction operation iterates over all layers below or intersecting the GC horizon and does a full layer rewrite of all of them. The end result will be an image layer covering the full keyspace at the GC horizon, and a bunch of delta layers above the GC horizon. This helps us collect the garbage of the test_gc_feedback test case to reduce space amplification. This operation can be manually triggered using an HTTP API or be triggered based on some metrics. Actual method TBD. The test is very basic and it's very likely that most of the algorithm will be rewritten. I would like to get this merged so that I can have a basic skeleton for the algorithm and then make incremental changes. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-11 14:14:51 +00:00
Joonas Koivunen	d3b892e9ad	test: fix duplicated harness name (#8010 ) We need unique tenant harness names in case you want to inspect the results of the last failing run. We are not using any proc macros to get the test name as there is no stable way of doing that, and there will not be one in the future, so we need to fix these duplicates. Also, clean up the duplicated tests to not mix `?` and `unwrap/assert`.	2024-06-11 10:10:05 -04:00
Joonas Koivunen	7515d0f368	fix: stop storing TimelineMetadata in index_part.json as bytes (#7699 ) We've stored metadata as bytes within the `index_part.json` for long fixed reasons. #7693 added support for reading out normal json serialization of the `TimelineMetadata`. Change the serialization to only write `TimelineMetadata` as json for going forward, keeping the backward compatibility to reading the metadata as bytes. Because of failure to include `alias = "metadata"` in #7693, one more follow-up is required to make the switch from the old name to `"metadata": <json>`, but that affects only the field name in serialized format. In documentation and naming, an effort is made to add enough warning signs around TimelineMetadata so that it will receive no changes in the future. We can add those fields to `IndexPart` directly instead. Additionally, the path to cleaning up `metadata.rs` is documented in the `metadata.rs` module comment. If we must extend `TimelineMetadata` before that, the duplication suggested in [review comment] is the way to go. [review comment]: https://github.com/neondatabase/neon/pull/7699#pullrequestreview-2107081558	2024-06-11 15:38:54 +03:00
a-masterov	e27ce38619	Add testing for extensions (#7818 ) ## Problem We need automated tests of extensions shipped with Neon to detect possible problems. ## Summary of changes A new image neon-test-extensions is added. Workflow changes to test the shipped extensions are added as well. Currently, the regression tests, shipped with extensions are in use. Some extensions, i.e. rum, timescaledb, rdkit, postgis, pgx_ulid, pgtap, pg_tiktoken, pg_jsonschema, pg_graphql, kq_imcx, wal2json_2_5 are excluded due to problems or absence of internal tests. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-06-11 13:07:51 +02:00
Joonas Koivunen	e46692788e	refactor: Timeline layer flushing (#7993 ) The new features have deteriorated layer flushing, most recently with #7927. Changes: - inline `Timeline::freeze_inmem_layer` to the only caller - carry the TimelineWriterState guard to the actual point of freezing the layer - this allows us to `#[cfg(feature = "testing")]` the assertion added in #7927 - remove duplicate `flush_frozen_layer` in favor of splitting the `flush_frozen_layers_and_wait` - this requires starting the flush loop earlier for `checkpoint_distance < initdb size` tests	2024-06-10 19:34:34 +03:00
Alex Chi Z	a8ca7a1a1d	docs: highlight neon env comes with an initial timeline (#7995 ) Quite a few existing test cases create their own timelines instead of using the default one. This pull request highlights that and hopefully people can write simpler tests in the future. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>	2024-06-10 12:08:16 -04:00
Joonas Koivunen	b52e31c1a4	fix: allow layer flushes more often (#7927 ) As seen with the pgvector 0.7.0 index builds, we can receive large batches of images, leading to very large L0 layers in the range of 1GB. These large layers are produced because we are only able to roll the layer after we have witnessed two different Lsns in a single `DataDirModification::commit`. As the single Lsn batches of images can span over multiple `DataDirModification` lifespans, we will rarely get to write two different Lsns in a single `put_batch` currently. The solution is to remember the TimelineWriterState instead of eagerly forgetting it until we really open the next layer or someone else flushes (while holding the write_guard). Additional changes are test fixes to avoid "initdb image layer optimization" or ignoring initdb layers for assertion. Cc: #7197 because small `checkpoint_distance` will now trigger the "initdb image layer optimization"	2024-06-10 13:50:17 +00:00
Heikki Linnakangas	5a7e285c2c	Simplify scanning compute logs in tests (#7997 ) Implement LogUtils in the Endpoint fixture class, so that the "log_contains" function can be used on compute logs too. Per discussion at: https://github.com/neondatabase/neon/pull/7288#discussion_r1623633803	2024-06-10 12:52:49 +00:00
Christian Schwarz	ae5badd375	Revert "Include openssl and ICU statically linked" (#8003 ) Reverts neondatabase/neon#7956 Rationale: compute incompatibilities Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH Relevant quotes from @hlinnaka > If we go through with the current release candidate, but the compute is pinned, people who create new projects will get that warning, which is silly. To them, it looks like the ICU version was downgraded, because initdb was run with newer version. > We should upgrade the ICU version eventually. And when we do that, users with old projects that use ICU will start to see that warning. I think that's acceptable, as long as we do homework, notify users, and communicate that properly. > When we do that, we should try to upgrade the storage and compute versions at roughly the same time.	2024-06-10 13:20:20 +02:00
Alex Chi Z	3e63d0f9e0	test(pageserver): quantify compaction outcome (#7867 ) A simple API to collect some statistics after compaction to easily understand the result. The tool reads the layer map and analyzes it range by range instead of doing single-key operations, which is more efficient than doing a benchmark to collect the result. It currently computes two key metrics: * Latest data access efficiency, which finds how many delta layers / image layers the system needs to iterate before returning any key in a key range. * (Approximate) PiTR efficiency, as in https://github.com/neondatabase/neon/issues/7770, which is simply the number of delta files in the range. The reasoning is that, assuming no image layer is created, PiTR efficiency is simply the cost of collecting records from the delta layers plus the replay time. Number of delta files (or in the future, estimated size of reads) is a simple yet efficient way of estimating how much effort the page server needs to reconstruct a page. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-10 10:42:13 +02:00
Rahul Patil	3b647cd55d	Include openssl and ICU statically linked (#7956 ) ## Problem Due to the upcoming End of Life (EOL) for Debian 11, we need to upgrade the base OS for Pageservers from Debian 11 to Debian 12 for security reasons. When deploying a new Pageserver on Debian 12 with the same binary built on Debian 11, we encountered the following errors: ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v16/bin/initdb: error while loading shared libraries: libicuuc.so.67: cannot open shared object file: No such file or directory ``` and ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v14/bin/initdb: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory ``` These issues occur when creating new projects. ## Summary of changes - To address these issues, we configured PostgreSQL build to use statically linked OpenSSL and ICU libraries. - This resolves the missing shared library errors when running the binaries on Debian 12. Closes: https://github.com/neondatabase/cloud/issues/12648 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [x] Do not forget to reformat commit message to not include the above checklist	2024-06-07 17:28:10 +00:00
Tristan Partin	26c68f91f3	Move SQL migrations out of line It makes them much easier to reason about, and allows other SQL tooling to operate on them like language servers, formatters, etc. I also brought back the removed migrations such that we can more easily understand what they were. I included a "-- SKIP" comment describing why those migrations are now skipped. We no longer skip migrations by checking if it is empty, but instead check to see if the migration starts with "-- SKIP".	2024-06-07 08:35:55 -07:00
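A minimal sketch of the skip rule described in the commit above, assuming migrations are held as an ordered list of SQL strings and that `applied` counts how many have already run. The function name and that bookkeeping are hypothetical; only the `-- SKIP` prefix check comes from the commit message.

```python
def pending_migrations(migrations: list[str], applied: int) -> list[tuple[int, str]]:
    """Return (index, sql) pairs for migrations that still need to run.

    A migration whose text starts with "-- SKIP" keeps its slot in the
    numbering, so later migrations keep their indices, but is never run.
    """
    to_run = []
    for index, sql in enumerate(migrations):
        if index < applied:
            continue  # already applied on an earlier start
        if sql.startswith("-- SKIP"):
            continue  # retained for history only
        to_run.append((index, sql))
    return to_run
```

Checking for an explicit marker rather than an empty string is what allows the removed migrations to be restored with their original text and an explanatory comment.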
a-masterov	2078dc827b	CI: copy run-* labels from external contributors' PRs (#7915 ) ## Problem We don't carry run-* labels from external contributors' PRs to ci-run/pr-* PRs. This is not really convenient. Need to sync labels in approved-for-ci-run workflow. ## Summary of changes Added the procedure of transition of labels from the original PR ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-07 10:04:59 +02:00
Joonas Koivunen	8ee191c271	test_local_only_layers_after_crash: various fixes (#7986 ) In #7927 I needed to fix this test case, but the fixes should be possible to land irrespective of the layer ingestion code change. The most important fix is the behavior if an image layer is found: the assertion message formatting raises a runtime error, which obscures the fact that we found an image layer.	2024-06-07 10:18:05 +03:00
Anastasia Lubennikova	66c6b270f1	Downgrade No response from reading prefetch entry WARNING to LOG	2024-06-06 20:56:19 +01:00
Arthur Petukhovsky	e4e444f59f	Remove random sleep in partial backup (#7982 ) We had a random sleep in the beginning of partial backup task, which was needed for the first partial backup deploy. It helped with gradual upload of segments without causing network overload. Now partial backup is deployed everywhere, so we don't need this random sleep anymore. We also had an issue related to this, in which manager task was not shut down for a long time. The cause of the issue is this random sleep that didn't take timeline cancellation into account, meanwhile manager task waited for partial backup to complete. Fixes https://github.com/neondatabase/neon/issues/7967	2024-06-06 17:54:44 +00:00
Joonas Koivunen	d46d19456d	raise the warning for oversized L0 to 2*target (#7985 ) currently we warn even when going over by a single byte. even that will be hit much more rarely once #7927 lands, but get this in earlier. rationale for 2*checkpoint_distance: anything smaller is not really worth a warn. we have a global allowed_error for this warning, which cannot be removed yet, nor can it be removed with #7927, because of the many tests with very small `checkpoint_distance`.	2024-06-06 20:18:39 +03:00
Alex Chi Z	5d05013857	fix(pageserver): skip metadata compaction if LSN is not accumulated enough (#7962 ) close https://github.com/neondatabase/neon/issues/7937 Only trigger metadata image layer creation if enough delta layers are accumulated. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 11:34:44 -04:00
Alex Chi Z	014509987d	fix(pageserver): more flexible layer size test (#7945 ) M-series macOS has different alignments/size for some fields (which I did not investigate in detail) and therefore this test cannot pass on macOS. Fixed by using `<=` for the comparison so that we do not test for an exact match. observed by @yliang412 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 14:40:58 +00:00
Arpad Müller	75bca9bb19	Perform retries on azure bulk deletion (#7964 ) This adds retries to the bulk deletion, because if there is a certain chance n that a request fails, the chance that at least one of the requests in a chain of requests fails increases exponentially. We've had similar issues with the S3 DR tests, which in the end resulted in adding retries at the remote_storage level. Retries at the top level are not sufficient when one remote_storage "operation" is multiple network requests in a trench coat, especially when there is no notion of saving the progress: even if prior deletions had been successful, we'd still need to get a 404 in order to continue the loop and get to the point where we failed in the last iteration. Maybe we'll fail again but before we've even reached it. Retries at the bottom level avoid this issue because they have the notion of progress and also when one network operation fails, only that operation is retried. First part of #7931.	2024-06-06 14:21:27 +00:00
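The argument in the commit above can be made concrete with a short sketch. If each request fails independently with probability p, all n requests in a chain succeed with probability (1 - p)^n, so the chance of at least one failure is 1 - (1 - p)^n. The helper names below, the `delete_one` callback, and the use of `OSError` as the transient error are assumptions for illustration, not the `remote_storage` API.

```python
def chain_failure_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent requests fails."""
    return 1.0 - (1.0 - p) ** n


def delete_all(keys, delete_one, max_attempts: int = 3) -> None:
    """Delete keys one by one, retrying only the request that failed.

    Progress is kept: keys deleted before a failure are not touched
    again, unlike a retry wrapped around the whole loop.
    """
    for key in keys:
        for attempt in range(1, max_attempts + 1):
            try:
                delete_one(key)
                break
            except OSError:
                if attempt == max_attempts:
                    raise  # give up on this key after the last attempt
```

For example, with a 1% per-request failure rate, a chain of 100 requests fails somewhere roughly 63% of the time, which is why retrying at the level of the individual request matters.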
Joonas Koivunen	a8be07785e	fix: do TimelineMetrics::shutdown only once (#7983 ) Related to #7341 tenant deletion will end up shutting down timelines twice, once before actually starting and the second time when per timeline deletion is requested. Shutting down TimelineMetrics causes underflows. Add an atomic boolean and only do the shutdown once.	2024-06-06 14:20:54 +00:00
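The "only do the shutdown once" guard from the commit above follows a common pattern, sketched here in Python with a lock-protected flag standing in for the atomic boolean. The class name and the `live_gauge` counter are hypothetical stand-ins for the real metrics.

```python
import threading


class TimelineMetricsSketch:
    """Hypothetical stand-in for a metrics object with an idempotent shutdown."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._shutdown = False
        self.live_gauge = 1  # decremented exactly once on shutdown

    def shutdown(self) -> bool:
        """Return True if this call performed the shutdown, False otherwise."""
        with self._lock:
            if self._shutdown:
                return False  # a second caller must not decrement again
            self._shutdown = True
        self.live_gauge -= 1
        return True
```

Calling `shutdown()` twice leaves `live_gauge` at 0 rather than -1, which is the underflow the commit describes avoiding.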
Yuchen Liang	630cfbe420	refactor(pageserver): designated api error type for cancelled request (#7949 ) Closes #7406. ## Problem When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is exposed to handle that case, which verbosely logs the error. However, we don't benefit from having the full backtrace provided by anyhow in this case. ## Summary of changes This PR introduces a new `ApiError` type to handle errors caused by cancelled request more robustly. - A new enum variant `ApiError::Cancelled` - Currently the cancelled request is mapped to status code 500. - Need to handle this error in proxy's `http_util` as well. - Added a failpoint test to simulate cancelled `get_lsn_by_timestamp` request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-06 14:00:14 +00:00
Christian Schwarz	0a65333fff	chore(walredo): avoid duplicate tenant_id and shard_slug fields (#7977 ) spotted during reviews of async walredo work in #6628	2024-06-06 15:10:16 +02:00
John Spray	91dd99038e	pageserver/controller: enable tenant deletion without attachment (#7957 ) ## Problem As described in #7952, the controller's attempt to reconcile a tenant before finally deleting it can get hung up waiting for the compute notification hook to accept updates. The fact that we try to reconcile a tenant at all during deletion is part of a more general design issue (#5080), where deletion was implemented as an operation on an attached tenant, requiring the tenant to be attached in order to delete it, which is not in principle necessary. Closes: #7952 ## Summary of changes - In the pageserver deletion API, only do the traditional deletion path if the tenant is attached. If it's secondary, then tear down the secondary location, and then do a remote delete. If it's not attached at all, just do the remote delete. - In the storage controller, instead of ensuring a tenant is attached before deletion, do a best-effort detach of the tenant, and then call into some arbitrary pageserver to issue a deletion of remote content. The pageserver retains its existing delete behavior when invoked on attached locations. We can remove this later when all users of the API are updated to do a detach-before-delete. This will enable removing the "weird" code paths during startup that sometimes load a tenant and then immediately delete it, and removing the deletion markers on tenants.	2024-06-05 20:22:54 +00:00
Christian Schwarz	83ab14e271	chore!: remove walredo_process_kind config option & kind type (#7756 ) refs https://github.com/neondatabase/neon/issues/7753 Preceding PR https://github.com/neondatabase/neon/pull/7754 laid out the plan, this one wraps it up.	2024-06-05 14:21:10 +02:00
Peter Bendel	85ef6b1645	upgrade pgvector from 0.7.0 to 0.7.1 (#7954 ) ## Problem ## Summary of changes performance improvements in pgvector 0.7.1 for hnsw index builds, see https://github.com/pgvector/pgvector/issues/570	2024-06-05 10:32:03 +02:00
Alex Chi Z	1a8d53ab9d	feat(pageserver): compute aux file size on initial logical size calculation (#7958 ) close https://github.com/neondatabase/neon/issues/7822 close https://github.com/neondatabase/neon/issues/7443 Aux file metrics is computed incrementally. If the size is not initialized, the metrics will never show up. This pull request adds the functionality to compute the aux file size on initial logical size calculation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-04 13:47:48 -04:00
Joonas Koivunen	3d6e389aa2	feat: support changing IndexPart::metadata_bytes to json in future release (#7693 ) ## Problem Currently we serialize the `TimelineMetadata` into bytes to put it into `index_part.json`. This `Vec<u8>` (hopefully `[u8; 512]`) representation was chosen because of problems serializing TimelineId and Lsn between different serializers (bincode, json). After #5335, the serialization of those types became serialization format aware or format agnostic. We've removed the pageserver local `metadata` file writing in #6769. ## Summary of changes Allow switching from the current serialization format to plain JSON for the legacy TimelineMetadata format in the future by adding a competitive serialization method to the current one (`crate::tenant::metadata::modern_serde`), which accepts both old bytes and new plain JSON. The benefits of this are that dumping the index_part.json with pretty printing no longer produces more than 500 lines of output, but after enabling it produces lines only proportional to the layer count, like: ```json { "version": ???, "layer_metadata": { ... }, "disk_consistent_lsn": "0/15FD5D8", "legacy_metadata": { "disk_consistent_lsn": "0/15FD5D8", "prev_record_lsn": "0/15FD5A0", "ancestor_timeline": null, "ancestor_lsn": "0/0", "latest_gc_cutoff_lsn": "0/149FD18", "initdb_lsn": "0/149FD18", "pg_version": 15 } } ``` In the future, I propose we completely stop using this legacy metadata type and wasting time trying to come up with another version numbering scheme in addition to the informative-only one already found in `index_part.json`, and go ahead with storing metadata or feature flags on the `index_part.json` itself. #7699 is the "one release after" changes which starts to produce metadata in the index_part.json as json.	2024-06-04 19:36:22 +03:00
Christian Schwarz	17116f2ea9	fix(pageserver): abort on duplicate layers, before doing damage (#7799 ) fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most of the issue description here for posterity) # Background From the time before always-authoritative `index_part.json`, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen: `a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)` As of #5198 , we should not be exposed to that problem anymore. # Problem 1 We still have 1. [code in Pageserver](`82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521)`) that handles duplicate layers 2. [tests in the test suite](`d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15)`) that demonstrate the problem using a failpoint However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production. What it does instead is to return early with an `Ok()`, so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised. That "return early" would be a bug in the routine if it happened in production. So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior. # Problem 2 Further, if production code _did_ (it nowadays doesn't!) create a duplicate layer, the code in Pageserver that handles the condition (item 1 above) is too little and too late: * the code handles it by discarding the newer `struct Layer`; that's good. * however, on disk, we have already overwritten the old with the new layer file * the fact that we do it atomically doesn't matter because ...
* if the new layer file is not bit-identical, then we have a cache coherency problem * PS PageCache block cache: caches old bit pattern * blob_io offsets stored in variables, based on pre-overwrite bit pattern / offsets * => reading based on these offsets from the new file might yield different data than before # Solution - Remove the test suite code pertaining to Problem 1 - Move & rename test suite code that actually tests RFC-27 crash-consistent layer map. - Remove the Pageserver code that handles duplicate layers too late (Problem 1) - Use `RENAME_NOREPLACE` to prevent the rename during `.finish()` from overwriting an existing file, bail with an error if it happens (Problem 2) - This bailing prevents the caller from even trying to insert into the layer map, as they don't even get a `struct Layer` at hand. - Add `abort`s in the place where we have the layer map lock and check for duplicates (Problem 2) - Note again, we can't reach there because we bail from `.finish()` much earlier in the code. - Share the logic to clean up after failed `.finish()` between image layers and delta layers (drive-by cleanup) - This exposed that test `image_layer_rewrite` was overwriting layer files in place. Fix the test. # Future Work This PR adds a new failure scenario that was previously "papered over" by the overwriting of layers: 1. Start a compaction that will produce 3 layers: A, B, C 2. Layer A is `finish()`ed successfully. 3. Layer B fails mid-way at some `put_value()`. 4. Compaction bails out, sleeps 20s. 5. Some disk space gets freed in the meantime. 6. Compaction wakes from sleep, another iteration starts, it attempts to write Layer A again. But the `.finish()` fails because A already exists on disk. The failure in step 6 is new with this PR, and it causes the compaction to get stuck. Before, it would silently overwrite the file and "successfully" complete the second iteration. The mitigation for this is to `/reset` the tenant.	2024-06-04 16:16:23 +00:00
John Spray	fd22fc5b7d	pageserver: include heatmap in tenant deletion (#7928 ) ## Problem This was an oversight when adding heatmaps: because they are at the top level of the tenant, they aren't included in the catch-all list & delete that happens for timeline paths. This doesn't break anything, but it leaves behind a few kilobytes of garbage in the S3 bucket after a tenant is deleted, generating work for the scrubber. ## Summary of changes - During deletion, explicitly remove the heatmap file - In test_tenant_delete_smoke, upload a heatmap so that the test would fail its "remote storage empty after delete" check if we didn't delete it.	2024-06-04 16:16:50 +01:00
Joonas Koivunen	0112097e13	feat(rtc): maintain dirty and uploaded IndexPart (#7833 ) RemoteTimelineClient maintains a copy of "next IndexPart" as a number of fields which are like an IndexPart but this is not immediately obvious. Instead of multiple fields, maintain a `dirty` ("next IndexPart") and `clean` ("uploaded IndexPart") fields. Additional cleanup: - rename `IndexPart::disk_consistent_lsn` accessor `duplicated_disk_consistent_lsn` - no one except scrubber should be looking at it, even scrubber is a stretch - remove usage elsewhere (pagectl used by tests, metadata scan endpoint) - serialize index part before the index upload operation - avoid upload operation being retried because of serialization error - serialization error is fatal anyway for timeline -- it can only make transient local progress after that, at least the error is bubbled up now - gather exploded IndexPart fields into single actual `UploadQueueInitialized::dirty` of which the uploaded snapshot is serialized - implement the long wished monotonicity check with the `clean` IndexPart with an assertion which is not expected to fire Continued work from #7860 towards next step of #6994.	2024-06-04 17:27:08 +03:00
Joonas Koivunen	9d4c113f9b	build(Dockerfile.compute-node): do not log tar contents (#7953 ) in build logs we get a lot of lines for building the compute node images because of verbose tar unpack. we know the sha256 so we don't need to log the contents. my hope is that this will allow us more reliably use the github live updating log view.	2024-06-04 12:42:57 +01:00
Joonas Koivunen	0acb604fa3	test: no missed wakeups, cancellation and timeout flow to downloads (#7863 ) I suspected a wakeup could be lost with `remote_storage::support::DownloadStream` if the cancellation and inner stream wakeups happen simultaneously. The next poll would only return the cancellation error without setting the wakeup. There is no lost wakeup because the single future for getting the cancellation error is consumed when the value is ready, and a new future is created for the next value. The new future is always polled. Similarly, if only the `Stream::poll_next` is being used after a `Some(_)` value has been yielded, it makes no sense to have an expectation of a wakeup for the (N+1)th stream value already set because when a value is wanted, `Stream::poll_next` will be called. A test is added to show that the above is true. Additionally, there was a question of these cancellations and timeouts flowing to attached or secondary tenant downloads. A test is added to show that this, in fact, happens. Lastly, a warning message is logged when a download stream is polled after a timeout or cancellation error (currently unexpected) so we can rule it out while troubleshooting.	2024-06-04 14:19:36 +03:00
Konstantin Knizhnik	387a36874c	Set page LSN when reconstructing VM in page server (#7935 ) ## Problem The page LSN is not set during VM updates. This may be the reason for test_vm_bits flakiness, but a wrong LSN can also cause more serious issues. Related: https://github.com/neondatabase/neon/pull/7935 ## Summary of changes - In `apply_in_neon`, set the LSN bytes when applying records of type `ClearVisibilityMapFlags`	2024-06-04 09:56:03 +01:00
Anna Khanova	00032c9d9f	[proxy] Fix dynamic rate limiter (#7950 ) ## Problem There was a bug in dynamic rate limiter, which exhausted CPU in proxy and proxy wasn't able to accept any connections. ## Summary of changes 1. `if self.available > 1` -> `if self.available >= 1` 2. remove `timeout_at` to use just timeout 3. remove potential infinite loops which can exhaust CPUs.	2024-06-04 05:07:54 +01:00
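The first item of this fix is a boundary condition. The sketch below is a hypothetical, much simplified permit counter (`Permits` and its methods are not the proxy's limiter). It only illustrates why `>= 1` matters: with `> 1`, the last available permit would never be handed out, and a limit of exactly 1 would admit nobody.

```rust
/// Hypothetical, simplified permit counter used only to illustrate the
/// `> 1` versus `>= 1` boundary; the real limiter is more involved.
pub struct Permits {
    available: usize,
}

impl Permits {
    pub fn new(limit: usize) -> Self {
        Self { available: limit }
    }

    /// Take one permit if any is left.
    pub fn try_acquire(&mut self) -> bool {
        // With `self.available > 1` the final permit could never be taken.
        if self.available >= 1 {
            self.available -= 1;
            true
        } else {
            false
        }
    }

    /// Return a permit to the pool.
    pub fn release(&mut self) {
        self.available += 1;
    }
}
```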
John Spray	11bb265de1	pageserver: don't squash all image layer generation errors into anyhow::Error (#7943 ) ## Problem CreateImageLayersError and CompactionError had proper From implementations, but compact_legacy was explicitly squashing all image layer errors into an anyhow::Error anyway. This led to errors like: ``` Error processing HTTP request: InternalServerError(timeline shutting down Stack backtrace: 0: <<anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from as core::ops::function::FnOnce<(pageserver::tenant::timeline::CreateImageLayersError,)>>::call_once at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5 1: <core::result::Result<alloc::vec::Vec<pageserver::tenant::storage_layer::layer::ResidentLayer>, pageserver::tenant::timeline::CreateImageLayersError>>::map_err::<anyhow::Error, <anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from> at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:829:27 2: <pageserver::tenant::timeline::Timeline>::compact_legacy::{closure#0} at pageserver/src/tenant/timeline/compaction.rs:125:36 3: <pageserver::tenant::timeline::Timeline>::compact::{closure#0} at pageserver/src/tenant/timeline.rs:1719:84 4: pageserver::http::routes::timeline_checkpoint_handler::{closure#0}::{closure#0} ``` Closes: https://github.com/neondatabase/neon/issues/7861	2024-06-03 22:10:13 +02:00
John Spray	69026a9a36	storcon_cli: add 'drop' and eviction interval utilities (#7938 ) The storage controller has 'drop' APIs for tenants and nodes, for use in situations where something weird has happened: - node-drop is useful until we implement proper node decom, or if we have a partially provisioned node that somehow gets registered with the storage controller but is then dead. - tenant-drop is useful if we accidentally add a tenant that shouldn't be there at all, or if we want to make the controller forget about a tenant without deleting its data. For example, if one uses the tenant-warmup command with a bad tenant ID and needs to clean that up. The drop commands require an `--unsafe` parameter, to reduce the chance that someone incorrectly assumes these are the normal/clean ways to delete things. This PR also adds a convenience command for setting the time based eviction parameters on a tenant. This is useful when onboarding an existing tenant that has high resident size due to storage amplification in compaction: setting a lower time based eviction threshold brings down the resident size ahead of doing a shard split.	2024-06-03 18:13:01 +00:00
Konstantin Knizhnik	7006caf3a1	Store logical replication origin in KV storage (#7099 ) ## Problem See #6977 ## Summary of changes * Extract origin_lsn from commit WAL record * Add ReplOrigin key to KV storage and store origin_lsn * In basebackup replace snapshot origin_lsn with last committed origin_lsn at basebackup LSN ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Alex Chi Z <chi@neon.tech>	2024-06-03 19:37:33 +03:00
John Spray	69d18d6429	s3_scrubber: add `pageserver-physical-gc` (#7925 ) ## Problem Currently, we leave `index_part.json` objects from old generations behind each time a pageserver restarts or a tenant is migrated. This doesn't break anything, but it's annoying when a tenant has been around for a long time and starts to accumulate 10s-100s of these. Partially implements: #7043 ## Summary of changes - Add a new `pageserver-physical-gc` command to `s3_scrubber` The name is a bit of a mouthful, but I think it makes sense: - GC is the accurate term for what we are doing here: removing data that takes up storage but can never be accessed. - "physical" is a necessary distinction from the "normal" GC that we do online in the pageserver, which operates at a higher level in terms of LSNs+layers, whereas this type of GC is purely about S3 objects. - "pageserver" makes clear that this command deals exclusively with pageserver data, not safekeeper.	2024-06-03 17:16:23 +01:00
Arpad Müller	acf0a11fea	Move keyspace utils to inherent impls (#7929 ) The keyspace utils like `is_rel_size_key` or `is_rel_fsm_block_key` and many others are free functions and have to be either imported separately or specified with the full path starting in `pageserver_api::key::`. This is less convenient than if these functions were just inherent impls. Follow-up of #7890 Fixes #6438	2024-06-03 16:18:07 +02:00
Alex Chi Z	c1f55c1525	feat(pageserver): collect aux file tombstones (#7900 ) close https://github.com/neondatabase/neon/issues/7800 This is a small change to enable the tombstone -> exclude from image layer path. Most of the pull request is unit tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-03 09:56:36 -04:00
Joonas Koivunen	34f450c05a	test: allow no vectored gets happening (#7939 ) when running the regress tests locally without any environment variables we use on CI, `test_pageserver_compaction_smoke` fails with division by zero. fix it temporarily by allowing no vectored read happening. to be cleaned when vectored get validation gets removed and the default value can be changed. Cc: https://github.com/neondatabase/neon/issues/7381	2024-06-03 09:37:11 -04:00
Arpad Müller	db477c0b8c	Add metrics for Azure blob storage (#7933 ) In issue #5590 it was proposed to implement metrics for Azure blob storage. This PR implements them except for the part that performs the rename, which is left for a followup. Closes #5590	2024-06-02 14:10:56 +00:00
Arthur Petukhovsky	a345cf3fc6	Fix span for WAL removal task (#7930 ) During refactoring in https://github.com/neondatabase/neon/pull/7887 I forgot to add "WAL removal" span with ttid. This commit fixes it.	2024-06-01 12:23:59 +01:00
Arthur Petukhovsky	e98bc4fd2b	Run gc on too many partial backup segments (#7700 ) The general partial backup idea is that each safekeeper keeps only one partial segment in remote storage at a time. Sometimes this is not true, for example if we uploaded an object to S3 but got an error when we tried to remove the previous upload. In this case we still keep a list of all potentially uploaded objects in safekeeper state. This commit prints a warning to logs if there are too many objects in safekeeper state. This is not expected and we should try to fix this state, which we can do by running gc. I haven't seen this being an issue anywhere, but printing a warning is something that I wanted to do and forgot in the initial PR.	2024-06-01 00:18:56 +01:00
John Spray	7e60563910	pageserver: add GcError type (#7917 ) ## Problem - Because GC exposes all errors as an anyhow::Error, we have intermittent issues with spurious log errors during shutdown, e.g. in this failure of a performance test https://neon-github-public-dev.s3.amazonaws.com/reports/main/9300804302/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/214a2154f6f0217a/ ``` Gc failed 1 times, retrying in 2s: shutting down ``` GC really doesn't do a lot of complicated IO: it doesn't benefit from the backtrace capabilities of anyhow::Error, and can be expressed more robustly as an enum. ## Summary of changes - Add GcError type and use it instead of anyhow::Error in GC functions - In `gc_iteration_internal`, return GcError::Cancelled on shutdown rather than Ok(()) (we only used Ok before because we didn't have a clear cancellation error variant to use). - In `gc_iteration_internal`, skip past timelines that are shutting down, to avoid having to go through another GC iteration if we happen to see a deleting timeline during a GC run. - In `refresh_gc_info_internal`, avoid an error case where a timeline might not be found after being looked up, by carrying an Arc<Timeline> instead of a TimelineId between the first loop and second loop in the function. - In HTTP request handler, handle Cancelled variants as 503 instead of turning all GC errors into 500s.	2024-05-31 22:20:06 +01:00
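The move from `anyhow::Error` to an enum, and the 503/500 split in the HTTP handler, can be sketched as below. The variant set and the `http_status` helper are assumptions for illustration; the commit confirms only that a `GcError` type with a `Cancelled` variant exists and that cancellation maps to 503.

```rust
use std::fmt;

/// Hypothetical shape of a dedicated GC error type: a cancellation
/// variant plus a catch-all, instead of one opaque anyhow::Error.
#[derive(Debug)]
pub enum GcError {
    /// The tenant or timeline is shutting down; not a real failure.
    Cancelled,
    /// Anything else (illustrative catch-all).
    Other(String),
}

impl fmt::Display for GcError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GcError::Cancelled => write!(f, "cancelled"),
            GcError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for GcError {}

/// Hypothetical handler mapping: cancellation is 503 (retry later),
/// everything else stays an internal error.
pub fn http_status(err: &GcError) -> u16 {
    match err {
        GcError::Cancelled => 503,
        GcError::Other(_) => 500,
    }
}
```

Because callers can match on `Cancelled`, they can decide not to log shutdown as an error at all.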
Joonas Koivunen	ef83f31e77	pagectl: key command for dumping what we know about the key (#7890 ) What we know about the key via added `pagectl key $key` command: - debug formatting - shard placement when `--shard-count` is specified - different boolean queries in `key.rs` - aux files v2 Example: ``` $ cargo run -qp pagectl -- key 000000063F00004005000060270000100E2C parsed from hex: 000000063F00004005000060270000100E2C: Key { field1: 0, field2: 1599, field3: 16389, field4: 24615, field5: 0, field6: 1052204 } rel_block: true rel_vm_block: false rel_fsm_block: false slru_block: false inherited: true rel_size: false slru_segment_size: false recognized kind: None ```	2024-05-31 18:19:41 +00:00
John Spray	9fda85b486	pageserver: remove AncestorStopping error variants (#7916 ) ## Problem In all cases, AncestorStopping is equivalent to Cancelled. This became more obvious in https://github.com/neondatabase/neon/pull/7912#discussion_r1620582309 when updating these error types. ## Summary of changes - Remove AncestorStopping, always use Cancelled instead	2024-05-31 17:02:10 +01:00
Alex Chi Z	87afbf6b24	test(pageserver): add test interface to create artificial layers (#7899 ) This pull request adds necessary interfaces to deterministically create scenarios we want to test. Simplify some test cases to use this interface to make it stable + reproducible. Compaction test will be able to use this interface. Also the upcoming delete tombstone tests will use this interface to make test reproducible. ## Summary of changes * `force_create_image_layer` * `force_create_delta_layer` * `force_advance_lsn` * `create_test_timeline_with_states` * `branch_timeline_test_with_states` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-31 12:00:40 -04:00
Arthur Petukhovsky	16b2e74037	Add FullAccessTimeline guard in safekeepers (#7887 ) This is a preparation for https://github.com/neondatabase/neon/issues/6337. The idea is to add FullAccessTimeline, which will act as a guard for tasks requiring access to WAL files. Eviction will be blocked on these tasks and WAL won't be deleted from disk as long as there is at least one active FullAccessTimeline. To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`. After eviction is implemented, this function will be responsible for downloading missing WAL files and waiting until the download finishes. This commit also contains other small refactorings: - Separate `get_tenant_dir` and `get_timeline_dir` functions for building a local path. This is useful for looking at usages and finding tasks requiring access to the local filesystem. - `timeline_manager` is now responsible for spawning all background tasks - WAL removal task is now spawned instantly after horizon is updated	2024-05-31 13:19:45 +00:00
John Spray	5a394fde56	pageserver: avoid spurious "bad state" logs/errors during shutdown (#7912 ) ## Problem - Initial size calculations tend to fail with `Bad state (not active)` Closes: https://github.com/neondatabase/neon/issues/7911 ## Summary of changes - In `wait_lsn`, return WaitLsnError::Cancelled rather than BadState when the state is Stopping - Replace PageReconstructError's `Other` variant with a specific `BadState` variant - Avoid returning anyhow::Error from get_ready_ancestor_timeline -- this was only used for the case where there was no ancestor. All callers of this function had implicitly checked that the ancestor timeline exists before calling it, so they can pass in the ancestor instead of handling an error.	2024-05-31 13:31:42 +01:00
Arseny Sher	7ec70b5eff	safekeeper: rename epoch to last_log_term. epoch is a historical and potentially confusing name. It semantically means lastLogTerm from the raft paper, so let's use it. This commit changes only internal namings, not public interface (http).	2024-05-31 12:59:13 +03:00
Arseny Sher	1fcc2b37eb	Add test checking term change during pull_timeline.	2024-05-31 12:58:59 +03:00
Arseny Sher	af40bf3c2e	Fix term/epoch confusion in python tests. Call epoch last_log_term and add separate term field.	2024-05-31 12:58:59 +03:00
Arseny Sher	e6db8069b0	neon_walreader: check after local read that the segment still exists. Otherwise the read might receive zeros/garbage if the file is recycled (renamed) as a future segment.	2024-05-31 12:57:56 +03:00
John Spray	98dadf8543	pageserver: quieten some shutdown logs around logical size and flush (#7907 ) ## Problem Looking at several noisy shutdown logs: - In https://github.com/neondatabase/neon/issues/7861 we're hitting a log error with `InternalServerError(timeline shutting down\n'` on the checkpoint API handler. - In the field, we see initial_logical_size_calculation errors on shutdown, via DownloadError - In the field, we see errors logged from layer download code (independent of the error propagated) during shutdown Closes: https://github.com/neondatabase/neon/issues/7861 ## Summary of changes The theme of these changes is to avoid propagating anyhow::Errors for cases that aren't really unexpected error cases that we might want a stacktrace for, and avoid "Other" error variants unless we really do have unexpected error cases to propagate. - On the flush_frozen_layers path, use the `FlushLayerError` type throughout, rather than munging it into an anyhow::Error. Give FlushLayerError an explicit from_anyhow helper that checks for timeline cancellation, and uses it to give a Cancelled error instead of an Other error when the timeline is shutting down. - In logical size calculation, remove BackgroundCalculationError (this type was just a Cancelled variant and an Other variant), and instead use CalculateLogicalSizeError throughout. This can express a PageReconstructError, and has a From impl that translates cancel-like page reconstruct errors to Cancelled. - Replace CalculateLogicalSizeError's Other(anyhow::Error) variant case with a Decode(DeserializeError) variant, as this was the only kind of error we actually used in the Other case. - During layer download, drop out early if the timeline is shutting down, so that we don't do an `error!()` log of the shutdown error in this case.	2024-05-31 09:18:58 +01:00
Arpad Müller	c18b1c0646	Update tokio-epoll-uring for linux-raw-sys (#7918 ) Updates the `tokio-epoll-uring` dependency. There is [only one change](`342ddd197a...08ccfa94ff`), the adoption of linux-raw-sys for `statx` instead of using libc. Part of #7889.	2024-05-30 17:45:48 +02:00
Alex Chi Z	f20a9e760f	chore(pageserver): warn on delete non-existing file (#7847 ) Consider the following sequence of migration: ``` 1. user starts compute 2. force migrate to v2 3. user continues to write data ``` At the time of (3), the compute node is not aware that the page server does not contain replication states any more, and might continue to ingest neon-file records into the safekeeper. This will leave the pageserver store a partial replication state and cause some errors. For example, the compute could issue a deletion of some aux files in v1, but this file does not exist in v2. Therefore, we should ignore all these errors until everyone is migrated to v2. Also note that if we see this warning in prod, it is likely because we did not fully suspend users' compute when flipping the v1/v2 flag. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-30 14:45:34 +00:00
Alex Chi Z	33395dcf4e	perf(pageserver): postpone vectored get fringe keyspace construction (#7904 ) Perf shows a significant amount of time is spent on `Keyspace::merge`. This pull request postpones merging keyspace until retrieving the layer, which contributes to a 30x improvement in aux keyspace basebackup time. ``` --- old 10000 files found in 0.580569459s --- new 10000 files found in 0.02995075s ``` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-30 10:31:57 -04:00
Alex Chi Z	1eca8b8a6b	fix(pageserver): ensure to_i128 works for metadata keys (#7895 ) field2 of metadata keys can be 0xFFFF because of the mapping. Allow 0xFFFF for `to_i128`. An alternative is to encode 0xFFFF as 0xFFFFFFFF (which is allowed in the original `to_i128`). But checking the places where field2 is referenced, the rest part of the system does not seem to depend on this assertion. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-30 10:03:17 -04:00
YukiSeino	167394a073	refactor: VirtualFile::open uses AsRef (#7908 ) ## Problem #7371 ## Summary of changes * The VirtualFile::open, open_with_options, and create methods use AsRef, similar to the standard library's std::fs APIs.	2024-05-30 15:58:20 +02:00
Conrad Ludgate	9a081c230f	proxy: lazily parse startup pg params (#7905 ) ## Problem proxy params being a `HashMap<String,String>` when it contains just ``` application_name: psql database: neondb user: neondb_owner ``` is quite wasteful allocation wise. ## Summary of changes Keep the params in the wire protocol form, eg: ``` application_name\0psql\0database\0neondb\0user\0neondb_owner\0 ``` Using a linear search for the map is fast enough at small sizes, which is the normal case.	2024-05-30 11:02:38 +00:00
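Keeping the parameters in wire form and searching them linearly can be sketched as follows. `StartupParams` and its methods are hypothetical names chosen for this example, and the sketch assumes the buffer is valid UTF-8 laid out as `key\0value\0...` as shown in the commit message.

```rust
/// Hypothetical wrapper that keeps startup parameters in wire form
/// (`key\0value\0key\0value\0`) instead of building a HashMap.
pub struct StartupParams {
    raw: String,
}

impl StartupParams {
    pub fn new(raw: impl Into<String>) -> Self {
        Self { raw: raw.into() }
    }

    /// Iterate over `(key, value)` pairs, borrowing from the raw buffer.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &str)> {
        let mut parts = self.raw.split_terminator('\0');
        std::iter::from_fn(move || {
            let key = parts.next()?;
            let value = parts.next()?;
            Some((key, value))
        })
    }

    /// Linear search; cheap for the handful of parameters a client sends.
    pub fn get(&self, name: &str) -> Option<&str> {
        self.iter().find(|(key, _)| *key == name).map(|(_, value)| value)
    }
}
```

A lookup such as `params.get("database")` walks at most all pairs once and allocates nothing beyond the single raw buffer.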
Conrad Ludgate	fddd11dd1a	proxy: upload postgres connection options as json in the parquet upload (#7903 ) ## Problem https://github.com/neondatabase/cloud/issues/9943 ## Summary of changes Captures the postgres options, converts them to json, uploads them in parquet.	2024-05-30 11:10:27 +01:00
Conrad Ludgate	238fa47bee	proxy fix wake compute rate limit (#7902 ) ## Problem We were rate limiting wake_compute in the wrong place ## Summary of changes Move wake_compute rate limit to after the permit is acquired. Also makes a slight refactor on normalize, as it caught my eye	2024-05-30 11:09:27 +01:00
a-masterov	b0a954bde2	CI: switch ubuntu-latest with ubuntu-22.04 (#7256 ) (#7901 ) ## Problem We use ubuntu-latest as a default OS for running jobs. It can cause problems due to instability, so we should use the LTS version of Ubuntu. ## Summary of changes The image ubuntu-latest was changed with ubuntu-22.04 in workflows. ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-05-30 08:25:10 +02:00
Konstantin Knizhnik	7ac11d3942	Do not produce error if gin page is not restored in redo (#7876 ) ## Problem See https://github.com/neondatabase/cloud/issues/10845 ## Summary of changes Do not report error if GIN page is not restored --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-29 22:18:09 +03:00
Conrad Ludgate	c8cebecabf	proxy: reintroduce dynamic limiter for compute lock (#7737 ) ## Problem Computes that are healthy can manage many connection attempts at a time. Unhealthy computes cannot. We initially handled this with a fixed concurrency limit, but it seems this inhibits pgbench. ## Summary of changes Support AIMD for connect_to_compute lock to allow varying the concurrency limit based on compute health	2024-05-29 11:17:05 +01:00
Arpad Müller	14df69d0e3	Drop postgres-native-tls in favour of tokio-postgres-rustls (#7883 ) Get rid of postgres-native-tls and openssl in favour of rustls in our dependency tree. Do further steps to completely remove native-tls and openssl. Among other advantages, this allows us to do static musl builds more easily: #7889	2024-05-28 15:40:52 +00:00
John Spray	352b08d0be	pageserver: fix a warning on secondary mode downloads after evictions (#7877 ) ## Problem In `4ce6e2d2fc` we added a warning when progress stats don't look right at the end of a secondary download pass. This `Correcting drift in progress stats` warning fired in staging on a pageserver that had been doing some disk usage eviction. The impact is low because in the same place we log the warning, we also fix up the progress values. ## Summary of changes - When we skip downloading a layer because it was recently evicted, update the progress stats to ensure they still reach a clean complete state at the end of a download pass. - Also add a log for evicting secondary location layers, for symmetry with attached locations, so that we can clearly see when eviction has happened for a particular tenant's layers when investigating issues. This is a point fix -- the code would also benefit from being refactored so that there is some "download result" type with a Skip variant, to ensure that we are updating the progress stats uniformly for those cases.	2024-05-28 16:06:47 +01:00
Peter Bendel	f9f69a2ee7	clarify how to load the dbpedia vector embeddings into a postgres database (#7894 ) ## Problem Improve the readme for the data load step in the pgvector performance test.	2024-05-28 17:21:09 +03:00
Peter Bendel	fabeff822f	Performance test for pgvector HNSW index build and queries (#7873 ) ## Problem We want to regularly verify the performance of pgvector HNSW parallel index builds and parallel similarity search using HNSW indexes. The first release that considerably improved the index-build parallelism was pgvector 0.7.0 and we want to make sure that we do not regress by our neon compute VM settings (swap, memory over commit, pg conf etc.) ## Summary of changes Prepare a Neon project with 1 million openAI vector embeddings (vector size 1536). Run HNSW indexing operations in the regression test for the various distance metrics. Run similarity queries using pgbench with 100 concurrent clients. I have also added the relevant metrics to the grafana dashboards pgbench and olape --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-05-28 11:05:33 +00:00
Arseny Sher	4a0ce9512b	Add safekeeper test truncating WAL. We do it as a part of more complicated tests like test_compute_restarts, but let's have a simple test as well.	2024-05-28 11:08:29 +03:00
Konstantin Knizhnik	d61e924103	Fix connect to PS on MacOS/X (#7885 ) ## Problem After [`0e4f182680`], which introduced async connect, Neon is not able to connect to the page server. ## Summary of changes Perform sync commit at MacOS/X --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-27 15:57:57 +03:00
Arseny Sher	b2d34a82b9	Make python Safekeeper datadir Path instead of str.	2024-05-25 06:06:32 +03:00
Arseny Sher	3797566c36	safekeeper: test pull_timeline with WAL gc. Do pull_timeline while WAL is being removed. To this end - extract pausable_failpoint to utils, sprinkle pull_timeline with it - add 'checkpoint' sk http endpoint to force WAL removal. After fixing checking for pull file status code test fails so far which is expected.	2024-05-25 06:06:32 +03:00
Conrad Ludgate	43f9a16e46	proxy: fix websocket buffering (#7878 ) ## Problem Seems the websocket buffering was broken for large query responses only ## Summary of changes Move buffering until after the underlying stream is ready. Tested locally confirms this fixes the bug. Also fixes the pg-sni-router missing metrics bug	2024-05-24 17:56:12 +01:00
Alexander Bayandin	71a7fd983e	CI(release): tune Storage & Compute release PR title (#7870 ) ## Problem A title for automatic proxy release PRs is `Proxy release`, and for storage & compute, it's just `Release` ## Summary of changes - Amend PR title for Storage & Compute releases to "Storage & Compute release"	2024-05-24 14:11:51 +01:00
Joonas Koivunen	a3f5b83677	chore: lower gate guard drop logging threshold to 100ms (#7862 ) We have some 1001ms cases, which do not yield gate guard context.	2024-05-24 14:07:58 +01:00
John Spray	1455f5a261	pageserver: revert concurrent secondary downloads, make DownloadStream always yield Err after cancel (#7866 ) ## Problem Ongoing hunt for secondary location shutdown hang issues. ## Summary of changes - Revert the functional changes from #7675 - Tweak a log in secondary downloads to make it more apparent when we drop out on cancellation - Modify DownloadStream's behavior to always return an Err after it has been cancelled. This _should_ not impact anything, but it makes the behavior simpler to reason about (e.g. even if the poll function somehow got called again, it could never end up in an un-cancellable state) Related #https://github.com/neondatabase/cloud/issues/13576	2024-05-24 11:45:34 +03:00
John Spray	3860bc9c6c	pageserver: post-shard-split layer rewrites (2/2) (#7531 ) ## Problem - After a shard split of a large existing tenant, child tenants can end up with oversized historic layers indefinitely, if those layers are prevented from being GC'd by branchpoints. This PR follows https://github.com/neondatabase/neon/pull/7531, and adds rewriting of layers that contain a mixture of needed & un-needed contents, in addition to dropping un-needed layers. Closes: https://github.com/neondatabase/neon/issues/7504 ## Summary of changes - Add methods to ImageLayer for reading back existing layers - Extend `compact_shard_ancestors` to rewrite layer files that contain a mixture of keys that we want and keys we do not, if unwanted keys are the majority of those in the file. - Amend initialization code to handle multiple layers with the same LayerName properly - Get rid of renaming bad layer files to `.old` since that's now expected on restarts during rewrites.	2024-05-24 08:33:19 +00:00
Roman Zaynetdinov	c1f4028fc0	Export db size metrics for 10 user databases (#7857 ) ## Problem One database is too limiting. We have agreed to raise this limit to 10.	2024-05-24 09:05:20 +01:00
MMeent	0e4f182680	Rework PageStream connection state handling: (#7611 ) * Make PS connection startup use async APIs This allows for improved query cancellation when we start connections * Make PS connections have per-shard connection retry state. Previously they shared global backoff state, which is bad for quickly getting all connections started and/or back online. * Make sure we clean up most connection state on failed connections. Previously, we could technically leak some resources that we'd otherwise clean up. Now, the resources are correctly cleaned up. * pagestore_smgr.c now PANICs on unexpected response message types. Unexpected responses are likely a symptom of having a desynchronized view of the connection state. As a desynchronized connection state can cause corruption, we PANIC, as we don't know what data may have been written to buffers: the only solution is to fail fast & hope we didn't write wrong data. * Catch errors in sync pagestream request handling. Previously, if a query was cancelled after a message was sent to the pageserver, but before the data was received, the backend could forget that it sent the synchronous request, and let others deal with the repercussions. This could then lead to incorrect responses, or errors such as "unexpected response from page server with tag 0x68"	2024-05-23 23:26:42 +02:00
Sasha Krassovsky	ea2e830707	Remove apostrophe (#7868 )	2024-05-23 20:35:59 +00:00
Joonas Koivunen	7cf726e36e	refactor(rtc): remove the duplicate IndexLayerMetadata (#7860 ) Once upon a time, we used to have duplicated types for runtime IndexPart and whatever we stored. Because of the serde fixes in #5335 we have no need for duplicated IndexPart type anymore, but the `IndexLayerMetadata` stayed. - remove the type - remove LayerFileMetadata::file_size() in favor of direct field access Split off from #7833. Cc: #3072.	2024-05-23 23:24:31 +03:00
Alex Chi Z	6b3164269c	chore(pageserver): reduce logging related to image layers (#7864 ) * Reduce the logging level for create image layers of metadata keys. (question: is it possible to adjust logging levels at runtime?) * Do info logging of image layers only after the layer is created. Now there are a lot of cases where we create the image layer writer but then discard that image layer because it does not contain any key. Therefore, I changed the new image layer logging to trace, and create image layer logging to info. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-23 15:30:43 +00:00
Arpad Müller	75a52ac7fd	Use Timeline::create_image_layer_for_rel_blocks in tiered compaction (#7850 ) Reduces duplication between tiered and legacy compaction by using the `Timeline::create_image_layer_for_rel_blocks` function. This way, we also use vectored get in tiered compaction, so the change has two benefits in one. fixes #7659 --------- Co-authored-by: Alex Chi Z. <iskyzh@gmail.com>	2024-05-23 15:10:24 +00:00
Alex Chi Z	e28e46f20b	fix(pageserver): make wal connstr a connstr (#7846 ) The list timeline API gives something like `"wal_source_connstr":"PgConnectionConfig { host: Domain(\"safekeeper-5.us-east-2.aws.neon.build\"), port: 6500, password: Some(REDACTED-STRING) }"`, which is weird. This pull request makes it somehow like a connection string. This field is not used at least in the neon database, so I assume no one is reading or parsing it. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-23 09:45:29 -04:00
Arpad Müller	d5d15eb6eb	Warn if a blob in an image is larger than 256 MiB (#7852 ) We'd like to get some bits reserved in the length field of image layers for future usage (compression). This PR is based on the assumption that we don't have any blobs that require more than 28 bits (3 bytes + 4 bits) to store the length. As preparation, before erroring, we first want to emit warnings in case the assumption is wrong; such warnings are less disruptive than errors. A metric would be even less disruptive (log messages are slower; if we have a LOT of such large blobs, it would take a lot of time to print them). At the same time, such 256 MiB blobs will likely occupy an entire layer file, as they are larger than our target size. For layer files we already log something, so there shouldn't be a large increase in overhead. Part of #5431	2024-05-23 14:28:05 +02:00
Joonas Koivunen	49d7f9b5a4	test_import_from_pageserver_small: try to make less flaky (#7843 ) With #7828 and proper fullbackup testing the test became flaky ([evidence]). - produce better assertion messages in `assert_pageserver_backups_equal` - use read only endpoint to confirm the row count [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7839/9192447962/index.html#suites/89cfa994d71769e01e3fc4f475a1f3fa/49009214d0f8b8ce	2024-05-23 14:44:08 +03:00
Peter Bendel	95a49f0075	remove march=native from pgvector Makefile's OPTFLAGS (#7854 ) ## Problem By default, pgvector compiles with `-march=native` on some platforms for best performance. However, this can lead to `Illegal instruction` errors if trying to run the compiled extension on a different machine. I had this problem when trying to run the Neon compute docker image on MacOS with Apple Silicon with Rosetta. see `ff9b22977e/README.md (L1021)` ## Summary of changes Pass OPTFLAGS="" to make.	2024-05-23 10:08:06 +00:00
John Spray	545f7e8cd7	tests: fix an allow list entry (#7856 ) https://github.com/neondatabase/neon/pull/7844 typo'd one of the expressions: https://neon-github-public-dev.s3.amazonaws.com/reports/main/9196993886/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/e420fbfdb193bf80/	2024-05-23 10:50:21 +01:00
Anna Khanova	cd6d811213	[proxy] Do not fail after parquet upload error (#7858 ) ## Problem If the parquet upload was unsuccessful, it will panic. ## Summary of changes Write error in logs instead.	2024-05-23 09:41:29 +00:00
Arthur Petukhovsky	8f3c316bae	Skip unnecessary shared state updates in safekeepers (#7851 ) I looked at the metrics from https://github.com/neondatabase/neon/pull/7768 on staging and it seems that manager does too many iterations. This is probably caused by background job `remove_wal.rs` which iterates over all timelines and tries to remove WAL and persist control file. This causes shared state updates and wakes up the manager. The fix is to skip notifying about the updates if nothing was updated.	2024-05-23 09:45:24 +01:00
Joonas Koivunen	58e31fe098	test_attach_tenant_config: add allowed error (#7839 ) [evidence] of quite rare flaky. the detach can cause this with the right timing. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7650/9191613501/index.html#suites/7745dadbd815ab87f5798aa881796f47/2190222925001078	2024-05-23 11:25:38 +03:00
John Spray	a43a1ad1df	pageserver: fix API-driven secondary downloads possibly colliding with background downloads (#7848 ) ## Problem We've seen some strange behaviors when doing lots of migrations involving secondary locations. One of these was where a tenant was apparently stuck in the `Scheduler::running` list, but didn't appear to be making any progress. Another was a shutdown hang (https://github.com/neondatabase/cloud/issues/13576). ## Summary of changes - Fix one issue (probably not the only one) where a tenant in the `pending` list could proceed to `spawn` even if the same tenant already had a running task via `handle_command` (this could have resulted in a weird value of SecondaryProgress) - Add various extra logging: - log before as well as after layer downloads so that it would be obvious if we were stuck in remote storage code (we shouldn't be, it has built in timeouts) - log the number of running + pending jobs from the scheduler every time it wakes up to do a scheduling iteration (~10s) -- this is quite chatty, but not compared with the volume of logs on a busy pageserver. It should give us confidence that the scheduler loop is still alive, and visibility of how many tasks the scheduler thinks are running.	2024-05-23 09:13:55 +01:00
Oleg Vasilev	eb0c026aac	Bump vm-builder v0.28.1 -> v0.29.3 (#7849 ) One change: runner: allow coredump collection (#931)	2024-05-22 21:48:59 +00:00
Alex Chi Z	ff560a1113	chore(pageserver): use kebab case for compaction algorithms (#7845 ) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 21:28:47 +00:00
Alex Chi Z	4a278cce7c	chore(pageserver): add force aux file policy switch handler (#7842 ) For existing users, we want to allow doing a force switch for their aux file policy. Part of #7462 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 19:05:26 +00:00
John Spray	f98fdd20e3	tests: add a couple of allow lists for shutdown cases (#7844 ) ## Problem Failures on some of our uglier shutdown log messages: https://neon-github-public-dev.s3.amazonaws.com/reports/main/9192662995/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/51b365408678c66f/ ## Summary of changes - Allow-list these errors.	2024-05-22 18:38:22 +00:00
John Spray	014f822a78	tests: refine test_secondary_background_downloads (#7829 ) ## Problem This test relied on some sleeps, and was failing ~5% of the time. ## Summary of changes Use log-watching rather than straight waits, and make timeouts more generous for the CI environment.	2024-05-22 19:17:47 +01:00
Alex Chi Z	ddd8ebd253	chore(pageserver): use kebab case for aux file flag (#7840 ) part of https://github.com/neondatabase/neon/issues/7462 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 17:06:00 +00:00
Conrad Ludgate	9cfe08e3d9	proxy password threadpool (#7806 ) ## Problem Despite making password hashing async, it can still take time away from the network code. ## Summary of changes Introduce a custom threadpool, inspired by rayon. Features: ### Fairness Each task is tagged with its endpoint ID. The more times we have seen the endpoint, the more likely we are to skip the task if it comes up in the queue. This is using a min-count-sketch estimator for the number of times we have seen the endpoint, resetting it every 1000+ steps. Since tasks are immediately rescheduled if they do not complete, the worker could get stuck in a "always work available loop". To combat this, we check the global queue every 61 steps to ensure all tasks quickly get a worker assigned to them. ### Balanced Using crossbeam_deque, like rayon does, we have workstealing out of the box. I've tested it a fair amount and it seems to balance the workload accordingly	2024-05-22 17:05:43 +00:00
Alex Chi Z	64577cfddc	feat(pageserver): auto-detect previous aux file policy (#7841 ) ## Problem If an existing user already has some aux v1 files, we don't want to switch them to the global tenant-level config. Part of #7462 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 12:41:13 -04:00
Heikki Linnakangas	37f81289c2	Make 'neon.protocol_version = 2' the default, take two (#7819 ) Once all the computes in production have restarted, we can remove protocol version 1 altogether. See issue #6211. This was done earlier already in commit `0115fe6cb2`, but reverted before it was released to production in commit `bbe730d7ca` because of issue https://github.com/neondatabase/neon/issues/7692. That issue was fixed in commit `22afaea6e1`, so we are ready to change the default again.	2024-05-22 18:24:52 +03:00
Heikki Linnakangas	9217564026	Fix issues with determining request LSN in read replica (#7795 ) Don't set last-written LSN of a page when the record is replayed, only when the page is evicted from cache. For comparison, we don't update the last-written LSN on every page modification on the primary either, only when the page is evicted. Do update the last-written LSN when the page update is skipped in WAL redo, however. In neon_get_request_lsns(), don't be surprised if the last-written LSN is equal to the record being replayed. Use the LSN of the record being replayed as the request LSN in that case. Add a long comment explaining how that can happen. In neon_wallog_page, update last-written LSN also when Shutdown has been requested. We might still fetch and evict pages for a while, after shutdown has been requested, so we better continue to do that correctly. Enable the check that we don't evict a page with zero LSN also in standby, but make it a LOG message instead of PANIC Fixes issue https://github.com/neondatabase/neon/issues/7791	2024-05-22 18:24:21 +03:00
Heikki Linnakangas	3404e76a51	Fix confusion between 1-based Buffer and 0-based index (#7825 ) The code was working correctly, but was incorrectly using Buffer for a 0-based index into the BufferDesc array.	2024-05-22 18:24:21 +03:00
Joonas Koivunen	62aac6c8ad	fix(Layer): carry gate until eviction is complete (#7838 ) the gate was accidentally being dropped before the final blocking phase, possibly explaining the resident physical size global problems during deletions. it could have caused more harm as well, but the path is not actively being tested because cplane no longer puts locationconfigs with higher generation number during normal operation which prompted the last wave of fixes. Cc: #7341.	2024-05-22 18:13:45 +03:00
John Spray	e015b2bf3e	safekeeper: use CancellationToken instead of watch channel (#7836 ) ## Problem Safekeeper Timeline uses a channel for cancellation, but we have a dedicated type for that. ## Summary of changes - Use CancellationToken in Timeline	2024-05-22 16:10:58 +01:00
Alexander Bayandin	a7f31f1a59	CI: build multi-arch images (#7696 ) ## Problem We don't build our docker images for ARM arch, and that makes it harder to run images on ARM (on MacBooks with Apple Silicon, for example). ## Summary of changes - Build `neondatabase/neon` for ARM and create a multi-arch image - Build `neondatabase/compute-node-vXX` for ARM and create a multi-arch image - Run `test-images` job on ARM as well	2024-05-22 16:06:05 +01:00
Alexander Bayandin	325f3784f9	CI(promote-images): simplify & fix the job (#7826 ) ## Problem Currently, `latest` tag is added to the images in several cases: ``` github.ref_name == 'main' \|\| github.ref_name == 'release' \|\| github.ref_name == 'release-proxy' ``` This leads to a race; the `latest` tag jumps back and forth depending on the branch that has built images. ## Summary of changes - Do not push `latest` images to prod ECR (we don't use it) - Use `docker buildx imagetools` instead of `crane` for tagging images - Unify `vm-compute-node-image` job with others and use dockerhub as a first source for images (sync images with ECR) - Tag images with `latest` only for commits in `main`	2024-05-22 15:02:20 +00:00
Tristan Partin	900f391115	Make postgres_version action input default to a string This is "required" by GitHub Actions, though they must do some coercion on their side.	2024-05-22 09:20:00 -05:00
Tristan Partin	8901ce9c99	Fix typos in action definitions	2024-05-22 09:20:00 -05:00
Joonas Koivunen	ce44dfe353	openapi: document timeline ancestor detach (#7650 ) The openapi description with the error descriptions: - 200 is used for "detached or has been detached previously" - 400 is used for "cannot be detached right now" -- it's an odd thing, but good enough - 404 is used for tenant or timeline not found - 409 is used for "can never be detached" (root timeline) - 500 is used for transient errors (basically ill-defined shutdown errors) - 503 is used for busy (other tenant ancestor detach underway, pageserver shutdown) Cc: #6994	2024-05-22 13:55:34 +00:00
Alexander Bayandin	d1d55bbd9f	CI(report-benchmarks-failures): fix condition (#7820 ) ## Problem `report-benchmarks-failures` got skipped if a dependent job fails. ## Summary of changes - Fix the if-condition by adding `&& failures()` to it; it'll make the job run if the dependent job fails.	2024-05-22 14:43:10 +01:00
Joonas Koivunen	df9ab1b5e3	refactor(test): duplication with fullbackup, tar content hashing (#7828 ) "taking a fullbackup" is an ugly multi-liner copypasted in multiple places, most recently with timeline ancestor detach tests. move it under `PgBin` which is not a great place, but better than yet another utility function. Additionally: - cleanup `psql_env` repetition (PgBin already configures that) - move the backup tar comparison as a yet another free utility function - use backup tar comparison in `test_import.py` where a size check was done previously - cleanup extra timeline creation from test Cc: #7715	2024-05-22 15:43:21 +03:00
Heikki Linnakangas	ef96c82c9f	Fix zenith_test_evict mode and clear_buffer_cache() function Using InvalidateBuffer is wrong, because if the page is concurrently dirtied, it will throw away the dirty page without calling smgrwrite(). In Neon, that means that the last-written LSN update for the page is missed. In v16, use the new InvalidateVictimBuffer() function that does what we need. In v15 and v14, backport the InvalidateVictimBuffer() function. Fixes issue https://github.com/neondatabase/neon/issues/7802	2024-05-22 14:26:03 +03:00
Arseny Sher	b43f6daa48	One more iteration on making walcraft test more robust. Some WAL might be inserted on the page boundary before XLOG_SWITCH lands there, repeat construction in this case.	2024-05-22 14:23:49 +03:00
Arpad Müller	664f92dc6e	Refactor PageServerHandler::process_query parsing (#7835 ) In the process_query function in page_service.rs there was some redundant duplication. Remove it and create a vector of whitespace separated parts at the start and then use `slice::strip_prefix`. Only use `starts_with` in the places with multiple whitespace separated parameters: here we want to preserve grep/rg ability. Followup of #7815, requested in https://github.com/neondatabase/neon/pull/7815#pullrequestreview-2068835674	2024-05-22 12:43:03 +02:00
Arthur Petukhovsky	bd5cb9e86b	Implement timeline_manager for safekeeper background tasks (#7768 ) In safekeepers we have several background tasks. Previously `WAL backup` task was spawned by another task called `wal_backup_launcher`. That task received notifications via `wal_backup_launcher_rx` and decided to spawn or kill existing backup task associated with the timeline. This was inconvenient because each code segment that touched shared state was responsible for pushing notification into `wal_backup_launcher_tx` channel. This was error prone because it's easy to miss and could lead to deadlock in some cases, if notification pushing was done in the wrong order. We also had a similar issue with `is_active` timeline flag. That flag was calculated based on the state and code modifying the state had to call function to update the flag. We had a few bugs related to that, when we forgot to update `is_active` flag in some places where it could change. To fix these issues, this PR adds a new `timeline_manager` background task associated with each timeline. This task is responsible for managing all background tasks, including `is_active` flag which is used for pushing broker messages. It is subscribed for updates in timeline state in a loop and decides to spawn/kill background tasks when needed. There is a new structure called `TimelinesSet`. It stores a set of `Arc<Timeline>` and allows to copy the set to iterate without holding the mutex. This is what replaced `is_active` flag for the broker. Now broker push task holds a reference to the `TimelinesSet` with active timelines and use it instead of iterating over all timelines and filtering by `is_active` flag. Also added some metrics for manager iterations and active backup tasks. Ideally manager should be doing not too many iterations and we should not have a lot of backup tasks spawned at the same time. Fixes #7751 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-05-22 09:34:39 +01:00
Em Sharnoff	00d66e8012	compute_ctl: Fix handling of missing /neonvm/bin/resize-swap (#7832 ) The logic added in the original PR (#7434) only worked before sudo was used, because 'sudo foo' will only fail with NotFound if 'sudo' doesn't exist; if 'foo' doesn't exist, then sudo will fail with a normal error exit. This means that compute_ctl may fail to restart if it exits after successfully enabling swap.	2024-05-21 16:52:48 -07:00
Arpad Müller	679e031cf6	Add dummy lsn lease http and page service APIs (#7815 ) We want to introduce a concept of temporary and expiring LSN leases. This adds both a http API as well as one for the page service to obtain temporary LSN leases. This adds a dummy implementation to unblock integration work of this API. A functional implementation of the lease feature is deferred to a later step. Fixes #7808 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-21 23:31:20 +02:00
Alex Chi Z	e3f6a07ca3	chore(pageserver): remove metrics for in-memory ingestion (#7823 ) The metrics was added in https://github.com/neondatabase/neon/pull/7515/ to observe if https://github.com/neondatabase/neon/pull/7467 introduces any perf regressions. The change was deployed on 5/7 and no changes are observed in the metrics. So it's safe to remove the metrics now. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-21 13:33:29 -04:00
Joonas Koivunen	a8a88ba7bc	test(detach_ancestor): ensure L0 compaction in history is ok (#7813 ) detaching a timeline from its ancestor can leave the resulting timeline with more L0 layers than the compaction threshold. most of the time, the detached timeline has made progress, and next time the L0 -> L1 compaction happens near the original branch point and not near the last_record_lsn. add a test to ensure that inheriting the historical L0s does not change fullbackup. additionally: - add `wait_until_completed` to test-only timeline checkpoint and compact HTTP endpoints. with `?wait_until_completed=true` the endpoints will wait until the remote client has completed uploads. - for delta layers, describe L0-ness with the `/layer` endpoint Cc: #6994	2024-05-21 20:08:43 +03:00
John Spray	353afe4fe7	neon_local: run controller's postgres with fsync=off (#7817 ) ## Problem In `test_storage_controller_many_tenants` we [occasionally](https://neon-github-public-dev.s3.amazonaws.com/reports/main/9155810417/index.html#/testresult/8fbdf57a0e859c2d) see it hit the retry limit on serializable transactions. That's likely due to a combination of relatively slow fsync on the hetzner nodes running the test, and the way the test does lots of parallel timeline creations, putting high load on the drive. Running the storage controller's db with fsync=off may help here. ## Summary of changes - Set `fsync=off` in the postgres config for the database used by the storage controller in tests	2024-05-21 18:13:54 +03:00
Tristan Partin	1988ad8db7	Extend test_unlogged to include a sequence Unlogged sequences were added in v15, so let's just test to make sure they work on Neon.	2024-05-21 09:18:11 -05:00
Tristan Partin	e3415706b7	Upgrade Postgres v16 to 16.3	2024-05-21 09:18:11 -05:00
Tristan Partin	9d081851ec	Upgrade Postgres v15 to 15.7	2024-05-21 09:18:11 -05:00
Tristan Partin	781352bd8e	Upgrade Postgres v14 to 14.12	2024-05-21 09:18:11 -05:00
Tristan Partin	8030b8e4c5	Fix test_pg_regress for unlogged relations Previously we worked around file comparison issues by dropping unlogged relations in the pg_regress tests, but this would lead to an unnecessary diff when compared to upstream in our Postgres fork. Instead, we can precompute the files that we know will be different, and ignore them.	2024-05-21 09:18:11 -05:00
Tristan Partin	9a4b896636	Use a constant for database name in test_pg_regress	2024-05-21 09:18:11 -05:00
Tristan Partin	e8b8ebfa1d	Allow check_restored_datadir_content to ignore certain files Some files may have known differences that we are okay with.	2024-05-21 09:18:11 -05:00
Tristan Partin	d9d471e3c4	Add some Python typing in a few test files	2024-05-21 09:18:11 -05:00
Arseny Sher	d43dcceef9	Minimize hot standby feedback xmins to next_xid. Hot standby feedback xmins can be greater than next_xid due to sparse update of nextXid on pageserver (to do less writes it advances next xid on 1024). ProcessStandbyHSFeedback ignores such xids from the future; to fix, minimize received xmin to next_xid. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Arseny Sher	f2771a99b7	Add metric for pageserver standby horizon. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Arseny Sher	f54c3b96e0	Fix bugs in hot standby feedback propagation and add test for it. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Arseny Sher	478cc37a70	Propagate standby apply LSN to pageserver to hold off GC. To avoid pageserver gc'ing data needed by standby, propagate standby apply LSN through standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Iteration of GC resets the value to remove the horizon when standby goes away -- pushes are assumed to happen at least once between gc iterations. As a safety guard max allowed lag compared to normal GC horizon is hardcoded as 10GB. Add test for the feature. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
John Spray	4ce6e2d2fc	pageserver: fix secondary progress stats when layers are 404 (#7814 ) ## Problem Noticed this issue in staging. When a tenant is under somewhat heavy timeline creation/deletion thrashing, it becomes quite common for secondary downloads to encounter 404s downloading layers. This is tolerated by design, because heatmaps are not guaranteed to be up to date with what layers/timelines actually exist. However, we were not updating the SecondaryProgress structure in this case, so after such a download pass, we would leave a SecondaryProgress state with lower "downloaded" stats than "total" stats. This causes the storage controller to consider this secondary location ineligible for optimization actions such as those we do after shard splits. This issue has relatively low impact because a typical tenant will eventually upload a heatmap where we do download all the layers and thereby enable the controller to progress with migrations -- the heavy thrashing of timeline creation/deletion is an artifact of our nightly stress tests. ## Summary of changes - In the layer 404 case, subtract the skipped layer's stats from the totals, so that at the end of this download pass we should still end up in a complete state. - When updating `last_downloaded`, do a sanity check that our progress is complete. In debug builds, assert out if this is not the case. In prod builds, correct the stats and log a warning.	2024-05-21 13:46:04 +01:00
dependabot[bot]	baeb58432f	build(deps): bump requests from 2.31.0 to 2.32.0 (#7816 )	2024-05-21 10:48:17 +00:00
Sasha Krassovsky	6f3e043a76	Add some more replication slot metrics (#7761 ) ## Problem We want to add alerts for when people's replication slots break, and also metrics for retained WAL so that we can warn customers when their storage gets bloated. ## Summary of changes Adds the metrics. Addresses https://github.com/neondatabase/neon/issues/7593	2024-05-21 00:00:47 +00:00
Alex Chi Z	6810d2aa53	feat(pageserver): do not read past image layers for vectored get (#7773 ) ## Problem Part of https://github.com/neondatabase/neon/issues/7462 On metadata keyspace, vectored get will not stop if a key is not found, and will read past the image layer. However, the semantics are different from single get, because if a key does not exist in the image layer, it means that the key did not exist in the past, or has been deleted. This pull request fixes it by recording image layer coverage during the vectored get process and stopping when the full keyspace is covered by an image layer. A corresponding test case is added to ensure that generating an image layer reduces the number of delta layers read. This optimization (or bug fix) also applies to rel block keyspaces. If a key is missing, we can know it's missing once the first image layer is reached. Page server will not attempt to read lower layers, which potentially incurs layer downloads + evictions. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-20 14:24:18 -04:00
Andy Hattemer	2d7091871f	Update banner image in Readme (#7801 ) Update the readme banner with updated branding.	2024-05-20 12:15:43 -04:00
Alex Chi Z	7701ca45dd	feat(pageserver): generate image layers for sparse keyspace (#7567 ) Part of https://github.com/neondatabase/neon/issues/7462 Sparse keyspace does not generate image layers for now. This pull request adds support for generating image layers for sparse keyspace. ## Summary of changes * Use the scan interface to generate compaction data for sparse keyspace. * Track the number of delta layer reads during scan. * Read-triggered compaction: when a scan on the keyspace touches too many delta files, generate an image layer. There is one hard-coded threshold for now: max delta layers we want to touch for a scan. * L0 compaction does not need to compute holes for metadata keyspace. Known issue: the scan interface currently reads past the image layer, which causes `delta_layer_accessed` to keep increasing even if image layers are generated. The pull request to fix that will be separate, and orthogonal to this one. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-20 16:08:45 +00:00
Arseny Sher	de8dfee4bd	safekeeper: log LSNs on walreceiver/walsender exit. Useful for observability.	2024-05-20 15:43:10 +03:00
Arseny Sher	e3f51abadf	safekeeper: close connection when COPY stream ends. We can't gracefully exit COPY mode (and don't need that), so close connection to prevent further attempts to use it.	2024-05-20 15:43:10 +03:00
Peter Bendel	a7b84cca5a	Upgrade of pgvector to 0.7.0 (#7726 ) Upgrade pgvector to 0.7.0. This PR is based on Heikki's PR #6753 and just uses pgvector 0.7.0 instead of 0.6.0 I have now done all planned manual tests. The pull request is ready to be reviewed and merged and can be deployed in production together / after swap enablement. See (https://github.com/neondatabase/autoscaling/issues/800) Fixes https://github.com/neondatabase/neon/issues/6516 Fixes https://github.com/neondatabase/neon/issues/7780 ## Documentation input for usage recommendations ### maintenance_work_mem In Neon `maintenance_work_mem` is very small by default (depends on configured RAM for your compute but can be as low as 64 MB). To optimize pgvector index build time you may have to bump it up according to your working set size (size of tuples for vector index creation). You can do so in the current session using `SET maintenance_work_mem='10 GB';` The target value you choose should fit into the memory of your compute size and not exceed 50-60% of available RAM. The value above has been successfully used on a 7CU endpoint. ### max_parallel_maintenance_workers max_parallel_maintenance_workers is also small by default (2). For efficient parallel pgvector index creation you have to bump it up with `SET max_parallel_maintenance_workers = 7` to make use of all the CPUs available, assuming you have configured your endpoint to use 7CU. ## ID input for changelog pgvector extension in Neon has been upgraded from version 0.5.1 to version 0.7.0. 
Please see https://github.com/pgvector/pgvector/ for documentation of new capabilities in pgvector version 0.7.0 If you have existing databases with pgvector 0.5.1 already installed there is a slight difference in behavior in the following corner cases even if you don't run `ALTER EXTENSION UPDATE`: ### L2 distance from NULL::vector For the following script, comparing the NULL::vector to non-null vectors the resulting output changes: ```sql SET enable_seqscan = off; CREATE TABLE t (val vector(3)); INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL); CREATE INDEX ON t USING hnsw (val vector_l2_ops); INSERT INTO t (val) VALUES ('[1,2,4]'); SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector); ``` and now the output is ``` val --------- [1,1,1] [1,2,4] [1,2,3] [0,0,0] (4 rows) ``` For the following script ```sql SET enable_seqscan = off; CREATE TABLE t (val vector(3)); INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL); CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1); INSERT INTO t (val) VALUES ('[1,2,4]'); SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector); ``` the output now is ``` val --------- [0,0,0] [1,2,3] [1,1,1] [1,2,4] (4 rows) ``` ### changed error messages If you provide invalid literals for datatype vector you may get improved/changed error messages, for example: ```sql neondb=> SELECT '[4e38,1]'::vector; ERROR: "4e38" is out of range for type vector LINE 1: SELECT '[4e38,1]'::vector; ^ ``` --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-20 12:07:25 +02:00
John Spray	291fcb9e4f	pageserver: use the heatmap upload interval to set the secondary download interval (#7793 ) ## Problem The heatmap upload period is configurable, but secondary mode downloads were using a fixed download period. Closes: #6200 ## Summary of changes - Use the upload period in the heatmap to adjust the download period. In practice, this will reduce the frequency of downloads from its current 60 second period to what heatmaps use, which is 5-10m depending on environment. This is an improvement rather than being optimal: we could be smarter about periods, and schedule downloads to occur around the time we expect the next upload, rather than just using the same period, but that's something we can address in future if it comes up.	2024-05-20 09:25:25 +01:00
Conrad Ludgate	a5ecca976e	proxy: bump parquet (#7782 ) ## Summary of changes Updates the parquet lib. one change left that we need is in an open PR against upstream, hopefully we can remove the git dependency by 52.0.0 https://github.com/apache/arrow-rs/pull/5773 I'm not sure why the parquet files got a little bit bigger. I tested them and they still open fine. 🤷 side effect of the update, chrono updated and added yet another deprecation warning (hence why the safekeepers change)	2024-05-19 19:45:53 +00:00
Heikki Linnakangas	5caee4ca54	Fix calculation in test The comment says that this checks if there's enough space on the page for logical message and an XLOG_SWITCH. So the sizes of the logical message and the XLOG_SWITCH record should be added together, not subtracted. I saw a panic in the test that led me to investigate and notice this (https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7803/9142396223/index.html): RuntimeError: Run ['/tmp/neon/bin/wal_craft', 'in-existing', 'last_wal_record_xlog_switch_ends_on_page_boundary', "host=localhost port=16165 user=cloud_admin dbname=postgres options='-cstatement_timeout=120s '"] failed: stdout: stderr: thread 'main' panicked at libs/postgres_ffi/wal_craft/src/lib.rs:370:27: attempt to subtract with overflow stack backtrace: 0: rust_begin_unwind at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5 1: core::panicking::panic_fmt at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14 2: core::panicking::panic at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5 3: <wal_craft::LastWalRecordXlogSwitchEndsOnPageBoundary as wal_craft::Crafter>::craft::<postgres::client::Client> at libs/postgres_ffi/wal_craft/src/lib.rs:370:27 4: wal_craft::main::{closure#0} at libs/postgres_ffi/wal_craft/src/bin/wal_craft.rs:21:17 5: wal_craft::main at libs/postgres_ffi/wal_craft/src/bin/wal_craft.rs:66:47 6: <fn() -> core::result::Result<(), anyhow::Error> as core::ops::function::FnOnce<()>>::call_once at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.	2024-05-19 21:49:51 +03:00
Alex Chi Z	e1a9669d05	feat(pagebench): add aux file bench (#7746 ) part of https://github.com/neondatabase/neon/issues/7462 ## Summary of changes This pull request adds two APIs to the pageserver management API: list_aux_files and ingest_aux_files. The aux file pagebench is intended to be used on an empty timeline because the data do not go through the safekeeper. LSNs are advanced by 8 for each ingestion, to avoid invariant checks inside the pageserver. For now, I only care about space amplification / read amplification, so the bench is designed in a very simple way: ingest 10000 files, and I will manually dump the layer map to analyze. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-17 20:04:02 +00:00
Alex Chi Z	aaf60819fa	feat(pageserver): persist aux file policy in index part (#7668 ) Part of https://github.com/neondatabase/neon/issues/7462 ## Summary of changes Tenant config is not persisted unless it's attached on the storage controller. In this pull request, we persist the aux file policy flag in the `index_part.json`. Admins can set `switch_aux_file_policy` in the storage controller or using the page server API. Upon the first aux file gets written, the write path will compare the aux file policy target with the current policy. If it is switch-able, we will do the switch. Otherwise, the original policy will be used. The test cases show what the admins can do / cannot do. The `last_aux_file_policy` is stored in `IndexPart`. Updates to the persisted policy are done via `schedule_index_upload_for_aux_file_policy_update`. On the write path, the writer will update the field. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-17 19:22:49 +00:00
John Spray	c84656a53e	pageserver: implement auto-splitting (#7681 ) ## Problem Currently tenants are only split into multiple shards if a human being calls the API to do it. Issue: #7388 ## Summary of changes - Add a pageserver API for returning the top tenants by size - Add a step to the controller's background loop where if there is no reconciliation or optimization to be done, it looks for things to split. - Add a test that runs pgbench on many tenants concurrently, and checks that splitting happens as expected as tenants grow, without interrupting the client I/O. This PR is quite basic: there is a tasklist in https://github.com/neondatabase/neon/issues/7388 for further work. This PR is meant to be safe (off by default), and sufficient to enable our staging environment to run lots of sharded tenants without a human having to set them up.	2024-05-17 16:01:24 +00:00
John Spray	af99c959ef	storage controller: use SERIALIZABLE isolation level (#7792 ) ## Problem The storage controller generally assumes that things like updating generation numbers are atomic: it should use a strict isolation level. ## Summary of changes - Wrap all database operations in a SERIALIZABLE transaction. - Retry serialization failures, as these do not indicate problems and are normal when plenty of concurrent work is happening. Using this isolation level for all reads is overkill, but much simpler than reasoning about it on a per-operation basis, and does not hurt performance. Tested this with a modified version of storage_controller_many_tenants test with 128k shards, to check that our performance is still fine: it is.	2024-05-17 16:44:33 +01:00
John Spray	a8e6d259cb	pageserver: fixes for layer path changes (#7786 ) ## Problem - When a layer with legacy local path format is evicted and then re-downloaded, a panic happened because the path downloaded by remote storage didn't match the path stored in Layer. - While investigating, I also realized that secondary locations would have a similar issue with evictions. Closes: #7783 ## Summary of changes - Make remote timeline client take local paths as an input: it should not have its own ideas about local paths, instead it just uses the layer path that the Layer has. - Make secondary state store an explicit local path, populated on scan of local disk at startup. This provides the same behavior as for Layer, that our local_layer_path is a _default_, but the layer path can actually be anything (e.g. an old style one). - Add tests for both cases.	2024-05-17 13:24:03 +01:00
Joonas Koivunen	c1390bfc3b	chore: update defaults for timeline_detach_ancestor (#7779 ) by having 100 copy operations in flight we climb up to 2500 requests per min or 41/s. This is still probably less than is allowed, but fast enough for our purposes.	2024-05-17 12:25:01 +02:00
Christian Schwarz	6d951e69d6	test_suite: patch, don't replace, the `tenant_config` field, where appropriate (#7771 ) Before this PR, the changed tests would overwrite the entire `tenant_config` because `pageserver_config_override` is merged non-recursively into the `ps_cfg`. This meant they would override the `PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM`, impacting our matrix build for `compaction_algorithm=Tiered\|Legacy` in https://github.com/neondatabase/neon/pull/7748. I found the tests fixed in this PR using the `NEON_PAGESERVER_PANIC_ON_UNSPECIFIED_COMPACTION_ALGORITHM` env var that I added in #7748. Therefore, I think this is an exhaustive fix. This is better than just searching the code base for `tenant_config`, which is what I had sketched in #7747. refs #7749	2024-05-17 12:24:02 +02:00
Arpad Müller	4b8809b280	Tiered compaction: improvements to the windows (#7787 ) Tiered compaction employs two sliding windows over the keyspace: `KeyspaceWindow` for the image layer generation and `Window` for the delta layer generation. Do some fixes to both windows: * The distinction between the two windows is not very clear. Do the absolute minimum to mention where they are used in the rustdoc description of the struct. Maybe we should rename them (say `WindowForImage` and `WindowForDelta`) or merge them into one window implementation. * Require the keys to strictly increase. The `accum_key_values` already combines the key, so there is no logic needed in `Window::feed` for the same key repeating. This is a follow-up to address the request in https://github.com/neondatabase/neon/pull/7671#pullrequestreview-2051995541 * In `choose_next_delta`, we claimed in the comment to use 1.25 as the factor but it was 1.66 instead. Fix this discrepancy by using `*5/4` as the two operations.	2024-05-16 22:25:19 +02:00
Arpad Müller	4c5afb7b10	Remove SSO_ACCOUNT_ID from scrubber docs and BucketConfig (#7774 ) As of #6202 we support `AWS_PROFILE` as well, which is more convenient. Change the docs to use it instead of `SSO_ACCOUNT_ID`. Also, remove `SSO_ACCOUNT_ID` from BucketConfig as it is confusing to the code's reader: it's not the "main" way of setting up authentication for the scrubber any more. It is a breaking change for the on-disk format as we persist `sso_account_id` to disk, but it was quite inconsistent with the other methods which are not persisted. Also, I don't think we want to support the case where one version writes the json and another version reads it. Related: #7667	2024-05-16 19:35:13 +02:00
Arpad Müller	ec069dc45e	tiered compaction: introduce PAGE_SZ constant and use it (#7785 ) pointed out by @problame : we use the literal 8192 instead of a properly defined constant. replace the literal by a PAGE_SZ constant.	2024-05-16 16:48:49 +02:00
Conrad Ludgate	790c05d675	proxy: swap tungstenite for a simpler impl (#7353 ) ## Problem I wanted to do a deep dive of the tungstenite codebase. tokio-tungstenite is incredibly convoluted... In my searching I found [fastwebsockets by deno](https://github.com/denoland/fastwebsockets), but it wasn't quite sufficient. This also removes the default 16MB/64MB frame/message size limitation. framed-websockets solves this by inserting continuation frames for partially received messages, so the whole message does not need to be entirely read into memory. ## Summary of changes I took the fastwebsockets code as a starting off point and rewrote it to be simpler, server-only, and be poll-based to support our Read/Write wrappers. I have replaced our tungstenite code with my framed-websockets fork. <https://github.com/neondatabase/framed-websockets>	2024-05-16 13:05:50 +02:00
Andrew Rudenko	923cf91aa4	compute_ctl: catalog API endpoints (#7575 ) ## Problem There are two cloud's features that require extra compute endpoints. 1. We are running pg_dump to get DB schemas. Currently, we are using a special service for this. But it would be great to execute pg_dump in an isolated environment. And we already have such an environment, it's our compute! And likely enough pg_dump already exists there too! (see https://github.com/neondatabase/cloud/issues/11644#issuecomment-2084617832) 2. We need to have a way to get databases and roles from compute after time travel (see https://github.com/neondatabase/cloud/issues/12109) ## Summary of changes It adds two API endpoints to compute_ctl HTTP API that target both of the aforementioned cases. --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-05-16 12:04:16 +02:00
John Spray	03c6039707	pageserver: refine tenant_id->shard lookup (#7762 ) ## Problem This is tech debt from when shard splitting was implemented, to handle more nicely the edge case of a client reconnect at the moment of the split. During shard splits, there were edge cases where we could incorrectly return NotFound to a getpage@lsn request, prompting an unwanted reconnect/backoff from the client. It is already the case that parent shards during splits are marked InProgress before child shards are created, so `resolve_attached_shard` will not match on them, thereby implicitly preferring child shards (good). However, we were not doing any elegant handling of InProgress in general: `get_active_tenant_with_timeout` was previously mostly dead code: it was inspecting the slot found by `resolve_attached_shard` and maybe waiting for InProgress, but that path is never taken because since `ef7c9c2ccc` the resolve function only ever returns attached slots. Closes: https://github.com/neondatabase/neon/issues/7044 ## Summary of changes - Change return value of `resolve_attached_shard` to distinguish between true NotFound case, and the case where we skipped slots that were InProgress. - Rework `get_active_tenant_with_timeout` to loop over calling resolve_attached_shard, waiting if it sees an InProgress result. The resulting behavior during a shard split is: - If we look up a shard early in split when parent is InProgress but children aren't created yet, we'll wait for the parent to be shut down. This corresponds to the part of the split where we wait for LSNs to catch up: so a small delay to the request, but a clean enough handling. - If we look up a shard while child shards are already present, we will match on those shards rather than the parent, as intended.	2024-05-16 08:26:34 +00:00
Alex Chi Z	c6d5ff944d	fix(test): ensure fixtures are correctly used for pageserver_aux_file_policy (#7769 ) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-15 18:29:12 +00:00
Alex Chi Z	4b97683338	feat(pageserver): use fnv hash for aux file encoding (#7742 ) FNV hash is simple, portable, and stable. This pull request vendors the FNV hash implementation from servo and modified it to use the u128 variant. replaces https://github.com/neondatabase/neon/pull/7644 ref https://github.com/neondatabase/neon/issues/7462 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-15 13:17:57 -04:00
Jure Bajic	affc18f912	Add performance regress `test_ondemand_download_churn.py` (#7242 ) Add performance regress test for on-demand download throughput. Closes https://github.com/neondatabase/neon/issues/7146 Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-05-15 18:41:12 +02:00
Christian Schwarz	3ef6e21211	fixup #7747 : actually use the fixture for neon_env_builder (#7767 ) The `= None` makes it not use the fixture. This slipped due to last-minute changes.	2024-05-15 18:17:55 +02:00
Arpad Müller	1075386d77	Add test_uploads_and_deletions test (#7758 ) Adds a test that is a reproducer for many tiered compaction bugs, both ones that have since been fixed as well as still unfixed ones: * (now fixed) #7296 * #7707 * #7759 * Likely also #7244 but I haven't tried that. The key ordering bug can be reproduced by switching to `merge_delta_keys` instead of `merge_delta_keys_buffered`, so reverting a big part of #7661, although it only sometimes reproduces (30-50% of cases). part of https://github.com/neondatabase/neon/issues/7554	2024-05-15 15:32:47 +02:00
Christian Schwarz	c3dd646ab3	chore!: always use async walredo, warn if sync is configured (#7754 ) refs https://github.com/neondatabase/neon/issues/7753 This PR is step (1) of removing sync walredo from Pageserver. Changes: * Remove the sync impl * If sync is configured, warn! and use async instead * Remove the metric that exposes `kind` * Remove the tenant status API that exposes `kind` Future Work ----------- After we've released this change to prod and are sure we won't roll back, we will 1. update the prod Ansible to remove the config flag from the prod pageserver.toml. 2. remove the remaining `kind` code in pageserver These two changes need no release inbetween. See https://github.com/neondatabase/neon/issues/7753 for details.	2024-05-15 15:04:52 +02:00
Christian Schwarz	bc78b0e9cc	chore(deps): use upstream svg_fmt after they merged our PR (#7764 ) They have merged our PR https://github.com/nical/rust_debug/pull/4 but they haven't released a new crate version yet. refs https://github.com/neondatabase/neon/issues/7763	2024-05-15 14:18:02 +02:00
John Spray	f342b87f30	pageserver: remove Option<> around remote storage, clean up metadata file refs (#7752 ) ## Problem This is historical baggage from when the pageserver could be run with local disk only: we had a bunch of places where we had to treat remote storage as optional. Closes: https://github.com/neondatabase/neon/issues/6890 ## Changes - Remove Option<> around remote storage (in https://github.com/neondatabase/neon/pull/7722 we made remote storage clearly mandatory) - Remove code for deleting old metadata files: they're all gone now. - Remove other references to metadata files when loading directories, as none exist. I checked last 14 days of logs for "found legacy metadata", there are no instances.	2024-05-15 12:05:24 +00:00
Alexander Bayandin	438bacc32e	CI(neon-extra-builds): Use small-arm64 runners instead of large-arm64 (#7740 ) ## Problem There are not enough arm runners and jobs in `neon-extra-builds` workflow take about the same amount of time on a small-arm runner as on large-arm. ## Summary of changes - Switch `neon-extra-builds` workflow from `large-arm64` to `small-arm64` runners	2024-05-15 14:29:12 +03:00
Arseny Sher	1a2a3cb446	Add restart_lsn metric for logical slots.	2024-05-15 11:19:33 +03:00
Christian Schwarz	4eedb3b6f1	test suite: allow overriding default compaction algorithm via env var (#7747 ) This PR allows setting the `PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM` env var to override the `tenant_config.compaction_algorithm` field in the initial `pageserver.toml` for all tests. I tested manually that this works by halting a test using pdb and inspecting the `effective_config` in the tenant status management API. If the env var is set, the tests are parametrized by the `kind` tag field, allowing to do a matrix build in CI and let Allure summarize everything in a nice report. If the env var is not set, the tests are not parametrized. So, merging this PR doesn't cause problems for flaky test detection. In fact, it doesn't cause any runtime change if the env var is not set. There are some tests in the test suite that used to override the entire tenant_config using `NeonEnvBuilder.pageserver_config_override`. Since config overrides are merged non-recursively, such overrides that don't specify `kind = ` cause a fallback to pageserver's built-in `DEFAULT_COMPACTION_ALGORITHM`. Such cases can be found using ``` ["']tenant_config\s*[='"] ``` We'll deal with these tests in a future PR. closes https://github.com/neondatabase/neon/issues/7555	2024-05-14 18:03:08 +02:00
Arpad Müller	e67fcf9563	Update mold to 2.31 (#7757 ) The [2.31.0 release](https://github.com/rui314/mold/releases/tag/v2.31.0) of mold includes a 10% speed improvement for binaries with a lot of debug info. As we have such, it might be useful to update mold to the latest release. The jump is from 2.4.0 to 2.31.0, but it's not been many releases in between as the version number was raised by the mold maintainers to 2.30.0 after 2.4.1 [to avoid confusion for some tools](https://github.com/rui314/mold/releases/tag/v2.30.0).	2024-05-14 17:49:19 +02:00
John Spray	82960b2175	pageserver: skip waiting for logical size on shard >0 (#7744 ) ## Problem Shards with number >0 could hang waiting for `await_initial_logical_size`, as we don't calculate logical size on these shards. This causes them to hold onto semaphore units and starve other tenants out from proceeding with warmup activation. That doesn't hurt availability (we still have on-demand activation), but it does mean that some background tasks like consumption metrics would omit some tenants. ## Summary of changes - Skip waiting for logical size calculation on shards >0 - Upgrade unexpected code paths to use debug_assert!(), which acts as an implicit regression test for this issue, and make the info() one into a warn()	2024-05-14 16:39:17 +01:00
Alex Chi Z	30d15ad403	chore(test): add version check for forward compat test (#7685 ) A test for https://github.com/neondatabase/neon/pull/7684. This pull request checks if the pageserver version we specified is the one actually running by comparing the git hash in forward compatibility tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-14 10:36:48 -04:00
Alexander Bayandin	b6ee91835b	CI(report-benchmarks-failures): fix condition (#7745 ) ## Problem `report-benchmarks-failures` job is triggered for any failure in the CI pipeline, but we need it to be triggered only for failed `benchmarks` job ## Summary of changes - replace `failure()` with `needs.benchmarks.result == 'failure'` in the condition	2024-05-14 13:39:59 +03:00
John Spray	df0f1e359b	pageserver: switch on new-style local layer paths (#7660 ) We recently added support for local layer paths that contain a generation number: - https://github.com/neondatabase/neon/pull/7609 - https://github.com/neondatabase/neon/pull/7640 Now that we've cut a [release](https://github.com/neondatabase/neon/pull/7735) that includes those changes, we can proceed to enable writing the new format without breaking forward compatibility.	2024-05-14 09:37:48 +01:00
John Spray	cd0e344938	pageserver: do fewer heatmap uploads for tiny tenants (#7731 ) ## Problem Currently we do a large number of heatmap uploads for tiny tenants. "tiny" in this context is defined as being less than a single layer in size. These uploads are triggered by atime changes rather than changes in the set of layers. Uploading heatmaps for atime changes on small tenants isn't useful, because even without bumping these atimes, disk usage eviction still avoids evicting the largest resident layer of a tenant, which in practice keeps tiny/empty tenants mostly resident irrespective of atimes. ## Summary of changes - For tenants smaller than one checkpoint interval, only upload heatmap if the set of layers has changed, not if only the atimes have changed. - Include the heatmap period in the uploaded heatmap, as a precursor to implementing https://github.com/neondatabase/neon/issues/6200 (auto-adjusting download intervals to match upload intervals)	2024-05-14 09:31:26 +01:00
Heikki Linnakangas	22afaea6e1	Always use Lsn::MAX as the request LSN in the primary (#7708 ) The new protocol version supports sending two LSNs to the pageserver: request LSN and a "not_modified_since" hint. A primary always wants to read the latest version of each page, so having two values was not strictly necessary, and the old protocol worked fine with just the "not_modified_since" LSN and a flag to request the latest page version. Nevertheless, it seemed like a good idea to set the request LSN to the current insert/flush LSN, because that's logically the page version that the primary wants to read. However, that made the test_gc_aggressive test case flaky. When the primary requests a page with the last inserted or flushed LSN, it's possible that by the time that the pageserver processes the request, more WAL has been generated by other processes in the compute and already digested by the pageserver. Furthermore, if the PITR horizon in the pageserver is set to 0, and GC runs during that window, it's possible that the GC horizon has advanced past the request LSN before the pageserver processes the request. It is still correct to send the latest page version in that case, because the compute either has the page locked so that it cannot have been modified in the primary, or it's a prefetch request, and we will validate the LSNs when the prefetch response is processed and discard it if the page has been modified. But the pageserver doesn't know that and rightly complains. To fix, modify the compute so that the primary always uses Lsn::MAX in the requests. This reverts the primary's behavior to how protocol version 1 worked. In protocol version 1, there was only one LSN, the "not_modified_since" hint, and a flag was set to read the latest page version, whatever that might be. Requests from computes that are still using protocol version 1 were already mapped to Lsn::MAX in the pageserver, now we do the same with protocol version 2 for primary's requests.
(I'm a bit sad about losing the information in the pageserver, what the last LSN was at the time that the request was made. We never had it with protocol version 1, but I wanted to make it available for debugging purposes.) Add another field, 'effective_request_lsn', to track what the flush LSN was when the request was made. It's not sent to the pageserver, Lsn::MAX is now used as the request LSN, but it's still needed internally in the compute to track the validity of prefetch requests. Fixes issue https://github.com/neondatabase/neon/issues/7692	2024-05-14 09:32:43 +03:00
Heikki Linnakangas	ba20752b76	Refactor the request LSNs to a separate struct (#7708 ) We had a lot of code that passed around the two LSNs that are associated with each GetPage request. Introduce a new struct to encapsulate them. I'm about to add a third LSN to the struct in the next commit, this is a mechanical refactoring in preparation for that.	2024-05-14 09:32:43 +03:00
Arpad Müller	3a6fa76828	Tiered compaction: cut deltas along lsn as well if needed (#7671 ) In general, tiered compaction is splitting delta layers along the key dimension, but this can only continue until a single key is reached: if the changes from a single key don't fit into one layer file, we used to create layer files of unbounded sizes. This patch implements the method listed as TODO/FIXME in the source code. It does the following things: * Make `accum_key_values` take the target size and if one key's modifications exceed it, make it fill `partition_lsns`, a vector of lsns to use for partitioning. * Have `retile_deltas` use that `partition_lsns` to create delta layers separated by lsn. * Adjust the `test_many_updates_for_single_key` to allow layer files below 0.5 the target size. This situation can create arbitrarily small layer files: an arbitrary amount of data can sit between the point where a new delta was just cut and the key that needs to be split along lsn. This data will end up in a dedicated layer and it can be arbitrarily small. * Ignore single-key delta layers for depth calculation: in theory we might have only single-key delta layers in a tier, and this might confuse depth calculation as well, but this should be unlikely. Fixes #7243 Part of #7554 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-14 01:13:25 +02:00
Alex Chi Z	9ffb852359	fix(test): ensure compatibility test uses the correct compute node (#7741 ) Use the old compute node for compat tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-13 17:14:08 -04:00
John Spray	972470b174	pageserver: use adaptive concurrency in secondary layer downloads (#7675 ) ## Problem Secondary downloads are a low priority task, and intentionally do not try to max out download speeds. This is almost always fine when they are used through the life of a tenant shard as a continuous "trickle" of background downloads. However, there are sometimes circumstances where we would like to populate a secondary location as fast as we can, within the constraint that we don't want to impact the activity of attached tenants: - During node removal, where we will need to create replacements for secondary locations on the node being removed - After a shard split, we need new secondary locations for the new shards to populate before the shards can be migrated to their final location. ## Summary of changes - Add an activity() function to the remote storage interface, enabling callers to query how busy the remote storage backend is - In the secondary download code, use a very modest amount of concurrency, driven by the remote storage's state: we only use concurrency if the remote storage semaphore is 75% free, and scale the amount of concurrency used within that range. This is not a super clever form of prioritization, but it should accomplish the key goals: - Enable secondary downloads to happen faster when the system is idle - Make secondary downloads a much lower priority than attached tenants when the remote storage is busy. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-05-13 17:38:30 +00:00
Vlad Lazar	1412e9b3e8	pagectl: fix diagrams generation for paths containing generations (#7739 ) ## Problem When layer paths include generations, the lsn parsing does not work and `pagectl` errors out. ## Summary of changes If the last "word" of the layer path contains 8 characters, discard it for the purpose of lsn parsing.	2024-05-13 18:24:12 +01:00
John Spray	be0c73f8e7	pageserver: improve API for invoking GC (#7655 ) ## Problem In https://github.com/neondatabase/neon/pull/7531, I had a test flaky because the GC API endpoint fails if the tenant happens not to be active yet. ## Summary of changes While adding that wait for the tenant to be active, I noticed that this endpoint is kind of strange (spawns a TaskManager task) and has a comment `// TODO: spawning is redundant now, need to hold the gate`, so this PR cleans it up to just run the GC inline while holding a gate. The GC code is updated to avoid assuming it runs inside a task manager task. Avoiding checking the task_mgr cancellation token is safe, because our timeline shutdown always cancels Timeline::cancel.	2024-05-13 17:59:59 +01:00
Alex Chi Z	7f51764001	feat(pageserver): add metrics for aux file size (#7623 ) ref https://github.com/neondatabase/neon/issues/7443 ## Summary of changes This pull request adds a size estimator for aux files. Each timeline stores a cached `isize` for the estimated total size of aux files. It gets reset on basebackup, and gets updated for each aux file modification. TODO: print a warning when it exceeds the size. The size metric is not accurate. A race between `on_basebackup` and other functions could create a negative basebackup size, but this is rare. In any case, this does not impose any extra I/Os on the storage, as everything is computed in-memory. The aux files are only stored on shard 0. As basebackups are only generated on shard 0, only shard 0 will report this metric. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-13 15:33:41 +00:00
Joonas Koivunen	4d8a10af1c	fix: do not create metrics contention from background task permit (#7730 ) The background task loop permit metrics call `with_label_values` twice, very often. Change the codepath to cache the counters on first access into a `Lazy` with `enum_map::EnumMap`. The expectation is that this will not fix the metric collection failures under load, but it doesn't hurt. Cc: #7161	2024-05-13 17:49:50 +03:00
Alexander Bayandin	55ba885f6b	CI(report-benchmarks-failures): report benchmarks failures to slack (#7678 ) ## Problem The `benchmarks` job that we run on main doesn't block anything, so it's easy to miss its failure. Ref https://github.com/neondatabase/cloud/issues/13087 ## Summary of changes - Add a `report-benchmarks-failures` job that reports failures of the `benchmarks` job to a Slack channel	2024-05-13 14:16:03 +01:00
Christian Schwarz	6ff74295b5	chore(pageserver): plumb through RequestContext to VirtualFile open methods (#7725 ) This PR introduces no functional changes. The `open()` path will be done separately. refs https://github.com/neondatabase/neon/issues/6107 refs https://github.com/neondatabase/neon/issues/7386 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-13 14:52:06 +02:00
Vlad Lazar	bbe730d7ca	Revert protocol version upgrade (#7727 ) ## Problem "John pointed out that the switch to protocol version 2 made test_gc_aggressive test flaky: https://github.com/neondatabase/neon/issues/7692. I tracked it down, and that is indeed an issue. Conditions for hitting the issue: The problem occurs in the primary. GC horizon is set to a very low value, e.g. 0. If the primary is actively writing WAL, and GC runs in the pageserver at the same time that the primary sends a GetPage request, it's possible that the GC advances the GC horizon past the GetPage request's LSN. I'm working on a fix here: https://github.com/neondatabase/neon/pull/7708." - Heikki ## Summary of changes Use protocol version 1 as default.	2024-05-13 13:41:14 +01:00
Jure Bajic	5a0da93c53	Fix `test_lock_time_tracing` flakiness (#7712 ) ## Problem Closes [test_lock_time_tracing](https://github.com/neondatabase/neon/issues/7691) ## Summary of changes Taking a look at the execution of the same test in logs, it can be concluded that the time we are holding the lock is sometimes not enough(must be above 30s) to cause the second log to be shown by the thread that is creating a timeline. In the [successful execution](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7663/9021247520/index.html#testresult/a21bce8c702b37f0) it can be seen that the log `Operation TimelineCreate on key 5e088fc2dd14945020d0fa6d9efd1e36 has waited 30.000887709s for shared lock` was on the edge of being logged, if it was below 30s it would not be shown. ``` 2024-05-09T18:02:32.552093Z WARN request{method=PUT path=/control/v1/tenant/5e088fc2dd14945020d0fa6d9efd1e36/policy request_id=af7e4a04-d181-4acb-952f-9597c8eba5a8}: Lock on UpdatePolicy was held for 31.001892592s 2024-05-09T18:02:32.552109Z INFO request{method=PUT path=/control/v1/tenant/5e088fc2dd14945020d0fa6d9efd1e36/policy request_id=af7e4a04-d181-4acb-952f-9597c8eba5a8}: Request handled, status: 200 OK 2024-05-09T18:02:32.552271Z WARN request{method=POST path=/v1/tenant/5e088fc2dd14945020d0fa6d9efd1e36/timeline request_id=d3af756e-dbb3-476b-89bd-3594f19bbb67}: Operation TimelineCreate on key 5e088fc2dd14945020d0fa6d9efd1e36 has waited 30.000887709s for shared lock ``` In the [failed execution](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7663/9022743601/index.html#/testresult/deb90136aeae4fce): ``` 2024-05-09T20:14:33.526311Z INFO request{method=POST path=/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/timeline request_id=1daa8c31-522d-4805-9114-68cdcffb9823}: Creating timeline 68194ffadb61ca11adcbb11cbeb4ec6e/f72185990ed13f0b0533383f81d877af 2024-05-09T20:14:36.441165Z INFO Heartbeat round complete for 1 nodes, 0 offline 2024-05-09T20:14:41.441657Z INFO Heartbeat round complete 
for 1 nodes, 0 offline 2024-05-09T20:14:41.535227Z INFO request{method=POST path=/upcall/v1/validate request_id=94a7be88-474e-4163-92f8-57b401473add}: Handling request 2024-05-09T20:14:41.535269Z INFO request{method=POST path=/upcall/v1/validate request_id=94a7be88-474e-4163-92f8-57b401473add}: handle_validate: 68194ffadb61ca11adcbb11cbeb4ec6e(gen 1): valid=true (latest Some(00000001)) 2024-05-09T20:14:41.535284Z INFO request{method=POST path=/upcall/v1/validate request_id=94a7be88-474e-4163-92f8-57b401473add}: Request handled, status: 200 OK 2024-05-09T20:14:46.441854Z INFO Heartbeat round complete for 1 nodes, 0 offline 2024-05-09T20:14:51.441151Z INFO Heartbeat round complete for 1 nodes, 0 offline 2024-05-09T20:14:56.441199Z INFO Heartbeat round complete for 1 nodes, 0 offline 2024-05-09T20:15:01.440971Z INFO Heartbeat round complete for 1 nodes, 0 offline 2024-05-09T20:15:03.516320Z INFO request{method=PUT path=/control/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/policy request_id=0edfdb5b-2b05-486b-9879-d83f234d2f0d}: failpoint "tenant-update-policy-exclusive-lock": sleep done 2024-05-09T20:15:03.518474Z INFO request{method=PUT path=/control/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/policy request_id=0edfdb5b-2b05-486b-9879-d83f234d2f0d}: Updated scheduling policy to Stop tenant_id=68194ffadb61ca11adcbb11cbeb4ec6e shard_id=0000 2024-05-09T20:15:03.518512Z WARN request{method=PUT path=/control/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/policy request_id=0edfdb5b-2b05-486b-9879-d83f234d2f0d}: Scheduling is disabled by policy Stop tenant_id=68194ffadb61ca11adcbb11cbeb4ec6e shard_id=0000 2024-05-09T20:15:03.518540Z WARN request{method=PUT path=/control/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/policy request_id=0edfdb5b-2b05-486b-9879-d83f234d2f0d}: Lock on UpdatePolicy was held for 31.003712703s 2024-05-09T20:15:03.518570Z INFO request{method=PUT path=/control/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/policy request_id=0edfdb5b-2b05-486b-9879-d83f234d2f0d}: 
Request handled, status: 200 OK 2024-05-09T20:15:03.518804Z WARN request{method=POST path=/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/timeline request_id=1daa8c31-522d-4805-9114-68cdcffb9823}: Scheduling is disabled by policy Stop tenant_id=68194ffadb61ca11adcbb11cbeb4ec6e shard_id=0000 2024-05-09T20:15:03.518815Z INFO request{method=POST path=/v1/tenant/68194ffadb61ca11adcbb11cbeb4ec6e/timeline request_id=1daa8c31-522d-4805-9114-68cdcffb9823}: Creating timeline on shard 68194ffadb61ca11adcbb11cbeb4ec6e/f72185990ed13f0b0533383f81d877af, attached to node 1 (localhost) ``` we can see that the difference between starting to create timeline `2024-05-09T20:14:33.526311Z` and creating timeline `2024-05-09T20:15:03.518815Z` is not above 30s and will not cause any logs to appear. The proposed solution is to prolong how long we will pause to ensure that the thread that creates the timeline waits above 30s.	2024-05-13 13:18:14 +01:00
Joonas Koivunen	d9dcbffac3	python: allow using allowed_errors.py (#7719 ) See #7718. Fix it by renaming all `types.py` to `common_types.py`. Additionally, add an advert for using `allowed_errors.py` to test any added regex.	2024-05-13 15:16:23 +03:00
John Spray	f50ff14560	pageserver: refuse to run without remote storage (#7722 ) ## Problem Since https://github.com/neondatabase/neon/pull/6769, the pageserver is intentionally not usable without remote storage: its purpose is to act as a cache to an object store, rather than as a source of truth in its own right. ## Summary of changes - Make remote storage configuration mandatory: the pageserver will refuse to start if it is not provided. This is a precursor that will make it safe to subsequently remove all the internal Option<>s	2024-05-13 13:05:46 +01:00
Christian Schwarz	b58a615197	chore(pageserver): plumb through RequestContext to VirtualFile read methods (#7720 ) This PR introduces no functional changes. The `open()` path will be done separately. refs https://github.com/neondatabase/neon/issues/6107 refs https://github.com/neondatabase/neon/issues/7386	2024-05-13 09:22:10 +00:00
Joonas Koivunen	1a1d527875	test: allow vectored get validation failure during shutdown (#7716 ) Per [evidence] the timeline ancestor detach tests can panic while shutting down on vectored get validation. Allow the error because tenant is restarted twice in the test. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7708/9058185709/index.html#suites/a1c2be32556270764423c495fad75d47/d444f7e5c0a18ce9	2024-05-13 09:21:49 +00:00
Joonas Koivunen	216fc5ba7b	test: fix confusing limit and logging (#7589 ) The test has been flaky since 2024-04-11 for an unknown reason, and the logging was off. Fix the logging and raise the limit a bit. The problematic ratio reproduces with pg14 and added sleep (not included) but not on pg15. The new ratio abs diff limit works for all inspected examples. Cc: #7536	2024-05-13 11:56:07 +03:00
Joonas Koivunen	4270e86eb2	test(ancestor detach): verify with fullbackup (#7706 ) In timeline detach ancestor tests there is no way to really be sure that there were no subtle off-by-one bugs. One such bug is demoed and reverted. Add a check that the fullbackup is equal before and after detaching the ancestor. Fullbackup is expected to be equal apart from `zenith.signal`, which is known to be good because the endpoint can be started without the detached branch receiving writes.	2024-05-13 10:58:03 +03:00
Joonas Koivunen	6351313ae9	feat: allow detaching from ancestor for timelines without writes (#7639 ) The first implementation #7456 did not include `index_part.json` changes in an attempt to keep the amount of changes down. This change tracks the historic reparentings and earlier detach in `index_part.json`. - `index_part.json` receives a new field `lineage: Lineage` - `Lineage` is queried through RemoteTimelineClient during basebackup, creating `PREV LSN: none` for the invalid prev record lsn just as it would have been created for a newly created timeline - as `struct IndexPart` grew, it is now boxed in places Cc: #6994	2024-05-10 22:30:05 +03:00
Anastasia Lubennikova	95098c3216	Fix checkpoint metric (#7701 ) Split checkpoint_stats into two separate metrics: checkpoints_req and checkpoints_timed Fixes commit `21e1a496a3` --------- Co-authored-by: Peter Bendel <peterbendel@neon.tech>	2024-05-10 16:20:14 +00:00
Arpad Müller	d7c68dc981	Tiered compaction: fix early exit check in main loop (#7702 ) The old test was based on the immutable `target_file_size` that was a parameter to the function. It makes no sense to go further once `current_level_target_height` has reached `u64::MAX`, as lsns are u64 typed. In practice, we should only run into this if there is a bug, as the practical lsn range usually ends much earlier. Testing on `target_file_size` makes less sense: it basically implements an invocation mode that turns off the looping and only runs one iteration of it. @hlinnaka agrees that `current_level_target_height` is better here. Part of #7554	2024-05-10 18:50:47 +03:00
Joonas Koivunen	6206f76419	build: run doctests (#7697 ) While switching to use nextest with the repository in `f28bdb6`, we had not noticed that it doesn't yet support running doctests. Run the doc tests before other tests.	2024-05-10 16:46:50 +02:00
Joonas Koivunen	d7f34bc339	draw_timeline_dir: draw branch points and gc cutoff lines (#7657 ) in addition to layer names, expand the input vocabulary to recognize lines in the form of: ${kind}:${lsn} where: - kind in `gc_cutoff` or `branch` - lsn is accepted in Lsn display format (x/y) or hex (as used in layer names) gc_cutoff and branch have different colors.	2024-05-10 17:41:34 +03:00
Joonas Koivunen	86905c1322	openapi: resolve the synthetic_size duplication (#7651 ) We had accidentally left two endpoints for `tenant`: `/synthetic_size` and `/size`. Size had the more extensive description but has returned 404 since renaming. Remove the `/size` in favor of the working one and describe the `text/html` output.	2024-05-10 17:15:11 +03:00
Arthur Petukhovsky	0b02043ba4	Fix permissions for safekeeper failpoints (#7669 ) We didn't check permissions in the `"/v1/failpoints"` endpoint, which means that anyone with a per-tenant token could modify the failpoints. This commit fixes that.	2024-05-10 13:32:42 +01:00
Andrey Taranik	873b222080	use own arm64 gha runners (#7373 ) ## Problem Move from aws based arm64 runners to bare-metal based ## Summary of changes Changes in GitHub action workflows where `runs-on: arm64` used. More parallelism added, build time for `neon with extra platform builds` workflow reduced from 45m to 25m	2024-05-10 11:04:23 +00:00
John Spray	13d9589c35	pageserver: don't call get_vectored with empty keyspace (#7686 ) ## Problem This caused a variation of the stats bug fixed by https://github.com/neondatabase/neon/pull/7662. That PR also fixed this case, but we still shouldn't make redundant get calls. ## Summary of changes - Only call get in the create image layers loop at the end of a range if some keys have been accumulated	2024-05-10 11:01:39 +00:00
Anna Khanova	be1a88e574	Proxy added per ep rate limiter (#7636 ) ## Problem There is no global per-ep rate limiter in proxy. ## Summary of changes * Return global per-ep rate limiter back. * Rename weak compute rate limiter (the cli flags were not used anywhere, so it's safe to rename).	2024-05-10 12:17:00 +02:00
Alex Chi Z	b9fd8dcf13	fix(test): update the config for neon_binpath in from_repo_dir (#7684 ) ## Problem https://github.com/neondatabase/neon/pull/7637 breaks forward compat test. On commit `ea531d448e`. https://neon-github-public-dev.s3.amazonaws.com/reports/main/8988324349/index.html ``` test_create_snapshot 2024-05-07T16:03:11.331883Z INFO version: git-env:ea531d448eb65c4f58abb9ef7d8cd461952f7c5f failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 16:03:11.316131763 UTC build_tag: build_tag-env:5159 test_forward_compatibility 2024-05-07T16:07:02.310769Z INFO version: git-env:ea531d448eb65c4f58abb9ef7d8cd461952f7c5f failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 16:07:02.294676183 UTC build_tag: build_tag-env:5159 ``` The forward compatibility test is actually using the same tag as the current build. The commit before that, https://neon-github-public-dev.s3.amazonaws.com/reports/main/8988126011/index.html ``` test_create_snapshot 2024-05-07T15:47:21.900796Z INFO version: git-env:2dbd1c1ed5cd0458933e8ffd40a9c0a5f4d610b8 failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 15:47:21.882784185 UTC build_tag: build_tag-env:5158 test_forward_compatibility 2024-05-07T15:50:48.828733Z INFO version: git-env:c4d7d5982553d2cf66634d1fbf85d95ef44a6524 failpoints: true, features: ["testing"] launch_timestamp: 2024-05-07 15:50:48.816635176 UTC build_tag: build_tag-env:release-5434 ``` This pull request patches the bin path so that the new neon_local will use the old binary. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-09 15:52:56 -04:00
dependabot[bot]	5ea117cddf	build(deps): bump Npgsql from 8.0.2 to 8.0.3 in /test_runner/pg_clients/csharp/npgsql (#7680 )	2024-05-09 17:55:57 +00:00
Alex Chi Z	2682e0254f	Revert "chore(neon_test_utils): restrict installation to superuser" (#7679 ) This reverts commit `1173ee6a7e`. ## Problem It breaks autoscaling tests	2024-05-09 15:15:19 +00:00
Arpad Müller	41fb838799	Fix tiered compaction k-merge bug and use in-memory alternative (#7661 ) This PR does two things: First, it fixes a bug with tiered compaction's k-merge implementation. It ignored the lsn of a key during ordering, so multiple updates of the same key could be read in arbitrary order, say from different layers. For example, if there are layers `[(a, 2),(b, 3)]` and `[(a, 1),(c, 2)]` in the heap, they might return `(a,2)` and `(a,1)`. Ultimately, this change wasn't enough to fix the ordering issues in #7296; in other words, there are likely still bugs in the k-merge. So as the second thing, we switch away from the k-merge to an in-memory based one, similar to #4839, but leave the code around to be improved and maybe switched to later on. Part of #7296	2024-05-09 16:01:16 +02:00
John Spray	107f535294	storage controller: fix handling of tenants with no timelines during scheduling optimization (#7673 ) ## Problem Storage controller was using a zero layer count in SecondaryProgress as a proxy for "not initialized". However, in tenants with zero timelines (a legitimate state), the layer count remains zero forever. This caused https://github.com/neondatabase/neon/pull/7583 to destabilize the storage controller scale test, which creates lots of tenants, some of which don't get any timelines. ## Summary of changes - Use a None mtime instead of zero layer count to determine if a SecondaryProgress should be ignored. - Adjust the test to use a shorter heatmap upload period to let it proceed faster while waiting for scheduling optimizations to complete.	2024-05-09 12:33:09 +01:00
John Spray	39c712f2ca	tests: adjust log allow list since reqwest upgrade (#7666 ) ## Problem Various performance test cases were destabilized by the recent upgrade of `reqwest`, because it changes an error string. Examples: - https://neon-github-public-dev.s3.amazonaws.com/reports/main/9005532594/index.html#testresult/3f984e471a9029a5/ - https://neon-github-public-dev.s3.amazonaws.com/reports/main/9005532594/index.html#testresult/8bd0f095fe0402b7/ The performance tests suffer from this more than most tests, because they churn enough data that the pageserver is still trying to contact the storage controller while it is shut down at the end of tests. ## Summary of changes s/Connection refused/error sending request/	2024-05-09 10:07:59 +01:00
Christian Schwarz	ab10523cc1	remote_storage: AWS_PROFILE with endpoint overrides in ~/.aws/config (updates AWS SDKs) (#7664 ) Before this PR, using the AWS SDK profile feature for running against minio didn't work because * our SDK versions were too old and didn't include https://github.com/awslabs/aws-sdk-rust/issues/1060 and * we didn't massage the s3 client config builder correctly. This PR * updates all the AWS SDKs we use to, respectively, the latest version I could find on crates.io (Is there a better process?) * changes the way remote_storage constructs the S3 client, and * documents how to run the test suite against real S3 & local minio. Regarding the changes to `remote_storage`: if one reads the SDK docs, it is clear that the recommended way is to use `aws_config::from_env`, then customize. What we were doing instead is to use the `aws_sdk_s3` builder directly. To get the `local-minio` in the added docs working, I needed to update both the SDKs and make the changes to the `remote_storage`. See the commit history in this PR for details. Refs: * byproduct: https://github.com/smithy-lang/smithy-rs/pull/3633 * follow-up on deprecation: https://github.com/neondatabase/neon/issues/7665 * follow-up for scrubber S3 setup: https://github.com/neondatabase/neon/issues/7667	2024-05-09 10:58:38 +02:00
Vlad Lazar	d5399b729b	pageserver: fix division by zero in layer counting metric (#7662 ) For aux file keys (v1 or v2) the vectored read path does not return an error when they're missing. Instead they are omitted from the resulting btree (this is a requirement, not a bug). Skip updating the metric in these cases to avoid infinite results.	2024-05-08 18:29:16 +00:00
Konstantin Knizhnik	b06eec41fa	Ignore page header when comparing VM pages in test_vm_bits.py (#7499 ) ## Problem See #6714, #6967 ## Summary of changes Completely ignore page header when comparing VM pages. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-08 20:58:35 +03:00
John Spray	ca154d9cd8	pageserver: local layer path followups (#7640 ) - Rename "filename" types which no longer map directly to a filename (LayerFileName -> LayerName) - Add a -v1- part to local layer paths to smooth the path to future updates (we anticipate a -v2- that uses checksums later) - Rename methods that refer to the string-ized version of a LayerName to no longer be called "filename" - Refactor reconcile() function to use a LocalLayerFileMetadata type that includes the local path, rather than carrying local path separately in a tuple and unwrap()'ing it later.	2024-05-08 16:50:21 +00:00
Alex Chi Z	1173ee6a7e	chore(neon_test_utils): restrict installation to superuser (#7624 ) The test utils should only be used during tests. Users should not be able to create this extension on their own. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-08 11:53:54 -04:00
Sasha Krassovsky	21e1a496a3	Expose LSN and replication delay as metrics (#7610 ) ## Problem We currently have no way to see what the current LSN of a compute is, and in case of read replicas, we don't know what the difference in LSNs is. ## Summary of changes Adds these metrics	2024-05-08 08:49:57 -07:00
Arthur Petukhovsky	0457980728	Fix flaky test_gc_of_remote_layers (#7647 ) Fixes flaky test `test_gc_of_remote_layers`, which was failing because of the `Nothing to GC` pageserver log. I looked into the failures; it seems that the background `gc_loop` sometimes started GC for the initial tenant, which wasn't configured to disable GC. The fix is to not create the initial tenant with GC enabled at all. Fixes #7538	2024-05-08 15:22:13 +00:00
Christian Schwarz	8728d5a5fd	neon_local: use `pageserver.toml` as source of truth for `struct PageServerConf` (#7642 ) Before this PR, `neon_local` would store a copy of a subset of the initial `pageserver.toml` in its `.neon/config`, e.g., `listen_pg_addr`. That copy is represented as `struct PageServerConf`. This copy was used to inform e.g. `neon_local endpoint` and other commands that depend on Pageserver about which port to connect to. The problem with that scheme is that the duplicated information in `.neon/config` can get stale if `pageserver.toml` is changed. This PR fixes that by eliminating the copy and populating `struct PageServerConf` from the `pageserver.toml`s. The `[[pageservers]]` TOML table in the `.neon/config` is obsolete. As of this PR, `neon_local` will fail to start and print an error informing about this change. Code-level changes: - Remove the `--pg-version` flag, it was only used for some checks during `neon_local init` - Remove the warn-but-continue behavior for when auth key creation fails but auth keys are not required. It's just complexity that is unjustified for a tool like `neon_local`. - Introduce a type-system-level distinction between the runtime state and the two (!) toml formats that are almost the same but not quite. - runtime state: `struct PageServerConf`, now without `serde` derives - toml format 1: the state in `.neon/config` => `struct OnDiskState` - toml format 2: the `neon_local init --config TMPFILE` that, unlike `struct OnDiskState`, allows specifying `pageservers` - Remove `[[pageservers]]` from the `struct OnDiskState` and load the data from the individual `pageserver.toml`s instead.	2024-05-08 14:32:21 +00:00
Alexander Bayandin	a4a4d78993	build(deps): bump moto from 4.1.2 to 5.0.6 (#7653 ) ## Problem The main point of this PR is to get rid of `python-jose` and `ecdsa` packages as transitive dependencies through `moto`. They have a bunch of open vulnerabilities[1][2][3] (which don't affect us directly), but it's nice not to have them at all. - [1] https://github.com/advisories/GHSA-wj6h-64fc-37mp - [2] https://github.com/advisories/GHSA-6c5p-j8vq-pqhj - [3] https://github.com/advisories/GHSA-cjwg-qfpm-7377 ## Summary of changes - Update `moto` from 4.1.2 to 5.0.6 - Update code to accommodate breaking changes in `moto_server`	2024-05-08 12:26:56 +01:00
Arpad Müller	870786bd82	Improve tiered compaction tests (#7643 ) Improves the tiered compaction tests: * Adds a new test that is a simpler version of the ignored `test_many_updates_for_single_key` test. * Reduces the amount of data that `test_many_updates_for_single_key` processes to make it execute more quickly. * Adds logging support.	2024-05-08 13:22:55 +02:00
Arpad Müller	b6d547cf92	Tiered compaction: add order asserts after delta key k-merge (#7648 ) Adds ordering asserts to the output of the delta key iterator `MergeDeltaKeys` that implements a k-merge. Part of #7296 : the asserts added by this PR get hit in the reproducers of #7296 as well, but they are earlier in the pipeline.	2024-05-08 13:22:27 +02:00
Conrad Ludgate	e3a2631df9	proxy: do not invalidate cache for permit errors (#7652 ) ## Problem If a permit cannot be acquired to connect to compute, the cache is invalidated. This had the observed effect of sending more traffic to ProxyWakeCompute on cplane. ## Summary of changes Make sure that permit acquire failures are marked as "should not invalidate cache".	2024-05-08 10:33:41 +00:00
Christian Schwarz	02d42861e4	`neon_local init`: write `pageserver.toml` directly; no `pageserver --init --config-override` (#7638 ) This does to `neon_local` what https://github.com/neondatabase/aws/pull/1322 does to our production deployment. After both are merged, there are no users of `pageserver --init` / `pageserver --config-override` left, and we can remove those flags eventually.	2024-05-08 09:03:29 +00:00
John Spray	586e77bb24	tests: common log allow list for ancestor detach tests (#7645 ) These log lines were repeated, and `test_detached_receives_flushes_while_being_detached` had an incomplete definition. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7531/8989511410/index.html#suites/a1c2be32556270764423c495fad75d47/992897d3a3369210	2024-05-08 08:50:34 +01:00
Em Sharnoff	b827e7b330	compute_ctl: Fix unused variable on non-Linux (#7646 ) Introduced by refactorings from #7577. See an example check-macos-build failure here: https://github.com/neondatabase/neon/actions/runs/8992211409/job/24701531264	2024-05-07 22:35:23 +00:00
Em Sharnoff	26b1483204	compute_ctl: Lift drop(startup_context_guard) into main() (#7577 ) Part of applying the changes from #7600. This piece technically can change the semantics because now the context guard is held before process_cli, but... the difference is likely quite small. Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-07 13:58:46 -07:00
Em Sharnoff	d709bcba81	compute_ctl: Break up main() into discrete phases (#7577 ) This commit is intentionally designed to have as small a diff as possible. To that end, the basic idea is that each distinct "chunk" of the previous main() has been wrapped in its own function, with the return values from each function being passed directly into the next. The structure of main() is now visible from its contents, which have a handful of smaller functions. There's a lot of other work that can / should(?) be done beyond this, but I figure that's more opinionated, and this should be a solid start. Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-07 13:58:46 -07:00
Em Sharnoff	b158a5eda0	compute_ctl: Non-functional prep changes to reduce diff (#7577 ) A couple lines moved further down in main(), and one case of using Option<&str> instead of Option<&String>.	2024-05-07 13:58:46 -07:00
Conrad Ludgate	0c99e5ec6d	proxy: cull http connections (#7632 ) ## Problem Some HTTP client connections can stay open for quite a long time. ## Summary of changes When there are too many HTTP client connections, pick a random connection and gracefully cancel it.	2024-05-07 18:15:06 +01:00
John Spray	0af66a6003	pageserver: include generation number in local layer paths (#7609 ) ## Problem In https://github.com/neondatabase/neon/pull/7531, we would like to be able to rewrite layers safely. One option is to make `Layer` able to rewrite files in place safely (e.g. by blocking evictions/deletions for an old Layer while a new one is created), but that's relatively fragile. It's more robust in general if we simply never overwrite the same local file: we can do that by putting the generation number in the filename. ## Summary of changes - Add `local_layer_path` (counterpart to `remote_layer_path`) and convert all locations that manually constructed a local layer path by joining LayerFileName to timeline path - In the layer upload path, construct remote paths with `remote_layer_path` rather than trying to build them out of a local path. - During startup, carry the full path to layer files through `init::reconcile`, and pass it into `Layer::for_resident` - Add a test to make sure we handle upgrades properly. - Comment out the generation part of `local_layer_path`, since we need to maintain forward compatibility for one release. A tiny followup PR will enable it afterwards. We could make this a bit simpler if we bulk renamed existing layers on startup instead of carrying literal paths through init, but that is operationally risky on existing servers with millions of layer files. We can always do a renaming change in future if it becomes annoying, but for the moment it's kind of nice to have a structure that enables us to change local path names again in future quite easily. We should rename `LayerFileName` to `LayerName` or somesuch, to make it more obvious that it's not a literal filename: this was already a bit confusing where that type is used in remote paths. That will be a followup, to avoid polluting this PR's diff.	2024-05-07 18:03:12 +01:00
Alex Chi Z	017c34b773	feat(pageserver): generate basebackup from aux file v2 storage (#7517 ) This pull request adds the new basebackup read path + aux file write path. In the regression test, all logical replication tests are run with matrix aux_file_v2=false/true. Also fixed the vectored get code path to correctly return missing key error when being called from the unified sequential get code path. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-07 16:30:18 +00:00
Christian Schwarz	308227fa51	remove `neon_local --pageserver-config-override` (#7614 ) Preceding PR https://github.com/neondatabase/neon/pull/7613 reduced the usage of `--pageserver-config-override`. This PR builds on top of that work and fully removes the `neon_local --pageserver-config-override`. Tests that need a non-default `pageserver.toml` control it using two options: 1. Specify `NeonEnvBuilder.pageserver_config_override` before `NeonEnvBuilder.init_start()`. This uses a new `neon_local init --pageserver-config` flag. 2. After `init_start()`: `env.pageserver.stop()` + `NeonPageserver.edit_config_toml()` + `env.pageserver.start()` A few test cases were using `env.pageserver.start(overrides=("--pageserver-config-override...",))`. I changed them to use one of the options above. Future Work ----------- The `neon_local init --pageserver-config` flag still uses `pageserver --config-override` under the hood. In the future, neon_local should just write the `pageserver.toml` directly. The `NeonEnvBuilder.pageserver_config_override` field should be renamed to `pageserver_initial_config`. Let's save this churn for a separate refactor commit.	2024-05-07 16:29:59 +00:00
Joonas Koivunen	d041f9a887	refactor(rtc): remove excess cloning (#7635 ) RemoteTimelineClient has a lot of mandatory cloning. By using a single way of creating IndexPart out of UploadQueueInitialized we can simplify things and also avoid cloning the latest files for each `index_part.json` upload (the contents will still be cloned).	2024-05-07 19:22:29 +03:00
Christian Schwarz	ea531d448e	fix(test suite): forward compat test is not using latest neon_local (#7637 ) The `test_forward_compatibility` test runs the old production binaries, but is supposed to always run the latest neon_local binary. I think commit `6acbee23` broke that by accident because in that commit, `from_repo_dir` is introduced and runs an `init_start()` before the `test_forward_compatibility` gets a chance to patch up the neon_local_binpath.	2024-05-07 15:43:04 +00:00
dependabot[bot]	2dbd1c1ed5	build(deps): bump flask-cors from 3.0.10 to 4.0.1 (#7633 )	2024-05-07 16:29:40 +01:00
Alexander Bayandin	51376ef3c8	Add Postgres commit sha to Postgres version (#4603 ) ## Problem Ref https://neondb.slack.com/archives/C036U0GRMRB/p1688122168477729 ## Summary of changes - Add sha from postgres repo into postgres version string (via `--with-extra-version`) - Add a test that Postgres version matches the expected one - Remove build-time hard check and allow only related tests to fail	2024-05-07 15:18:17 +00:00
dependabot[bot]	5a3d8e75ed	build(deps): bump jinja2 from 3.1.3 to 3.1.4 (#7626 )	2024-05-07 12:53:52 +00:00
dependabot[bot]	6e4e578841	build(deps): bump werkzeug from 3.0.1 to 3.0.3 (#7625 )	2024-05-07 13:12:53 +01:00
Joonas Koivunen	3c9b484c4d	feat: Timeline detach ancestor (#7456 ) ## Problem Timelines cannot be deleted if they have children. In many production cases, a branch or a timeline has been created off the main branch for various reasons to the effect of having now a "new main" branch. This feature will make it possible to detach a timeline from its ancestor by inheriting all of the data before the branchpoint to the detached timeline and by also reparenting all of the ancestor's earlier branches to the detached timeline. ## Summary of changes - Earlier added copy_lsn_prefix functionality is used - RemoteTimelineClient learns to adopt layers by copying them from another timeline - LayerManager adds support for adding adopted layers - `timeline::Timeline::{prepare_to_detach,complete_detaching}_from_ancestor` and `timeline::detach_ancestor` are added - HTTP PUT handler Cc: #6994 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-05-07 13:47:57 +03:00
John Spray	af849a1f61	pageserver: post-shard-split layer trimming (1/2) (#7572 ) ## Problem After a shard split of a large existing tenant, child tenants can end up with oversized historic layers indefinitely, if those layers are prevented from being GC'd by branchpoints. This PR is followed by https://github.com/neondatabase/neon/pull/7531 Related issue: https://github.com/neondatabase/neon/issues/7504 ## Summary of changes - Add a new compaction phase `compact_shard_ancestors`, which identifies layers that are no longer needed after a shard split. - Add a Timeline->LayerMap code path called `rewrite_layers` , which is currently only used to drop layers, but will later be used to rewrite them as well in https://github.com/neondatabase/neon/pull/7531 - Add a new test that compacts after a split, and checks that something is deleted. Note that this doesn't have much impact on a tenant's resident size (since unused layers would end up evicted anyway), but it: - Makes index_part.json much smaller - Makes the system easier to reason about: avoid having tenants which are like "my physical size is 4TiB but don't worry I'll never actually download it", instead have tenants report the real physical size of what they might download. Why do we remove these layers in compaction rather than during the split? Because we have existing split tenants that need cleaning up. We can add it to the split operation in future as an optimization.	2024-05-07 11:15:58 +01:00
Christian Schwarz	ac7dc82103	use less `neon_local --pageserver-config-override` / `pageserver -c` (#7613 )	2024-05-06 22:31:26 +02:00
Anna Khanova	f1b654b77d	proxy: reduce number of concurrent connections (#7620 ) ## Problem Usually, the connection itself is quite fast (below 10ms for p999: https://neonprod.grafana.net/goto/aOyn8vYIg?orgId=1). It doesn't make much sense to wait a long time for the lock: if it takes a long time to acquire it, something has probably gone wrong. We also spawn a lot of retries, but they are not super helpful (0 means that it was connected successfully, 1, most probably, that it was a re-request of the compute node address https://neonprod.grafana.net/goto/J_8VQvLIR?orgId=1). Let's try to keep a small number of retries.	2024-05-06 19:03:25 +00:00
Sasha Krassovsky	7dd58e1449	On-demand WAL download for walsender (#6872 ) ## Problem There's allegedly a bug where if we connect a subscriber before WAL is downloaded from the safekeeper, it creates an error. ## Summary of changes Adds support for pausing safekeepers from sending WAL to computes, and then creates a compute and attaches a subscriber while it's in this paused state. Fails to reproduce the issue, but probably a good test to have --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-05-06 10:54:07 -07:00
Arpad Müller	f3af5f4660	Fix test_ts_of_lsn_api flakiness (#7599 ) Changes parameters to fix the flakiness of `test_ts_of_lsn_api`. Already now, the amount of flakiness of the test is pretty low. With this, it's even lower. cc #5768	2024-05-06 16:41:51 +00:00
Joonas Koivunen	a96e15cb6b	test: less flaky test_synthetic_size_while_deleting (#7622 ) #7585 introduced test case for deletions while synthetic size is being calculated. The test has a race against deletion, but we only accept one outcome. Fix it to accept 404 as well, as we cannot control from outside which outcome happens. Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7456/8970595458/index.html#/testresult/32a5b2f8c4094bdb	2024-05-06 15:52:51 +00:00
Christian Schwarz	df1def7018	refactor(pageserver): remove --update-init flag (#7612 ) We don't actually use it. refs https://github.com/neondatabase/neon/issues/7555	2024-05-06 16:40:44 +02:00
Tristan Partin	69337be5c2	Fix grammar in provider.rs error message s/temporary/temporarily --------- Co-authored-by: Barry Grenon <barry_grenon@yahoo.ca>	2024-05-06 09:14:42 -05:00
John Spray	67a2215163	pageserver: label tenant_slots metric by slot type (#7603 ) ## Problem The current `tenant_slots` metric becomes less useful once we have lots of secondaries, because we can't tell how many tenants are really attached (without doing a sum() on some other metric). ## Summary of changes - Add a `mode` label to this metric - Update the metric with `slot_added` and `slot_removed` helpers that are called at all the places we mutate the tenants map. - Add a debug assertion at shutdown that checks the metrics add up to the right number, as a cheap way of validating that we're calling the metric hooks in all the right places.	2024-05-06 14:07:15 +01:00
John Spray	3764dd2e84	pageserver: call maybe_freeze_ephemeral_layer from a dedicated task (#7594 ) ## Problem In testing of the earlier fix for OOMs under heavy write load (https://github.com/neondatabase/neon/pull/7218), we saw that the limit on ephemeral layer size wasn't being reliably enforced. That was diagnosed as being due to overwhelmed compaction loops: most tenants were waiting on the semaphore for background tasks, and thereby not running the function that proactively rolls layers frequently enough. Related: https://github.com/neondatabase/neon/issues/6939 ## Summary of changes - Create a new per-tenant background loop for "ingest housekeeping", which invokes maybe_freeze_ephemeral_layer() without taking the background task semaphore. - Downgrade to DEBUG a log line in maybe_freeze_ephemeral_layer that had been INFO, but turns out to be pretty common in the field. There's some discussion on the issue (https://github.com/neondatabase/neon/issues/6939#issuecomment-2083554275) about alternatives for calling maybe_freeze_ephemeral_layer periodically without it getting stuck behind compaction. A whole task just for this feels like kind of a big hammer, but we may in future find that there are other pieces of lightweight housekeeping that we want to do here too. Why is it okay to call maybe_freeze_ephemeral_layer outside of the background tasks semaphore? - This is the same work we would do anyway if we receive writes from the safekeeper, just done a bit sooner. - The period of the new task is generously jittered (+/- 5%), so when the ephemeral layer size tips over the threshold, we shouldn't see an excessively aggressive thundering herd of layer freezes (and only layers larger than the mean layer size will be frozen) - All that said, this is an imperfect approach that relies on having a generous amount of RAM to dip into when we need to freeze somewhat urgently. 
It would be nice in future to also block compaction/GC when we recognize resource stress and need to do other work (like layer freezing) to reduce memory footprint.	2024-05-06 14:07:07 +01:00
Heikki Linnakangas	0115fe6cb2	Make 'neon.protocol_version = 2' the default (#7616 ) Once all the computes in production have restarted, we can remove protocol version 1 altogether. See issue #6211.	2024-05-06 14:37:55 +03:00
Arseny Sher	e6da7e29ed	Add option allowing running multiple endpoints on the same branch. This is used by safekeeper tests.	2024-05-06 11:08:51 +03:00
Arseny Sher	0353a72a00	pg_waldump segment on safekeeper in test_pg_waldump. To test it as well.	2024-05-06 07:18:38 +03:00
Arseny Sher	ce4d3da3ae	Properly initialize first WAL segment on safekeepers. Previously its segment header and page header of first record weren't initialized because compute streams data only since first record LSN. Also, fix a bug in the existing code for initialization: xlp_rem_len must not include page header. These changes make first segment pg_waldump'able.	2024-05-06 07:18:38 +03:00
Arseny Sher	5da3e2113a	Allow bad state (not active) pageserver error/warns in walcraft test. The top reason for it being flaky.	2024-05-06 06:45:27 +03:00
Heikki Linnakangas	4deb8dc52e	compute_ctl: Be more precise in how startup time is calculated (#7601 ) - On a non-pooled start, do not reset the 'start_time' after launching the HTTP service. In a non-pooled start, it's fair to include that in the total startup time. - When setting wait_for_spec_ms and resetting start_time, call Utc::now() only once. It's a waste of cycles to call it twice, but also, it ensures the time between setting wait_for_spec_ms and resetting start_time is included in one or the other time period. These differences should be insignificant in practice, in the microsecond range, but IMHO it seems more logical and readable this way too. Also fix and clarify some of the surrounding comments. (This caught my eye while reviewing PR #7577)	2024-05-04 08:44:18 +03:00
Em Sharnoff	64f0613edf	compute_ctl: Add support for swap resizing (#7434 ) Part of neondatabase/cloud#12047. Resolves #7239. In short, this PR: 1. Adds `ComputeSpec.swap_size_bytes: Option<u64>` 2. Adds a flag to compute_ctl: `--resize-swap-on-bind` 3. Implements running `/neonvm/bin/resize-swap` with the value from the compute spec before starting postgres, if both the value in the spec AND the flag are specified. 4. Adds `sudo` to the final image 5. Adds a file in `/etc/sudoers.d` to allow `compute_ctl` to resize swap Various bits of reasoning about design decisions in the added comments. In short: We have both a compute spec field and a flag to make rollout easier to implement. The flag will most likely be removed as part of cleanups for neondatabase/cloud#12047.	2024-05-03 12:57:45 -07:00
Christian Schwarz	1e7cd6ac9f	refactor: move `NodeMetadata` to `pageserver_api`; use it from `neon_local` (#7606 ) This is the first step towards representing all of Pageserver configuration as clean `serde::Serialize`able Rust structs in `pageserver_api`. The `neon_local` code will then use those structs instead of the crude `toml_edit` / string concatenation that it does today. refs https://github.com/neondatabase/neon/issues/7555 --------- Co-authored-by: Alex Chi Z <iskyzh@gmail.com>	2024-05-03 13:15:38 -04:00
Alex Chi Z	ef03b38e52	fix(pageserver): remove update_gc_info calls in tests (#7608 ) introduced by https://github.com/neondatabase/neon/pull/7468 conflicting with https://github.com/neondatabase/neon/pull/7584 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 16:01:33 +00:00
Conrad Ludgate	9b65946566	proxy: add connect compute concurrency lock (#7607 ) ## Problem Too many connect_compute attempts can overwhelm postgres, getting the connections stuck. ## Summary of changes Limit number of connection attempts that can happen at a given time.	2024-05-03 15:45:24 +00:00
Alex Chi Z	a3fe12b6d8	feat(pageserver): add scan interface (#7468 ) This pull request adds the scan interface. Scan operates on a sparse keyspace and retrieves all the key-value pairs from the keyspaces. Currently, scan only supports the metadata keyspace, and by default does not retrieve anything from the ancestor branch. This should be fixed in the future if we need to have some keyspaces that inherit from the parent. The scan interface reuses the vectored get code path by disabling the missing key errors. This pull request also changes the behavior of vectored get on aux file v1/v2 key/keyspace: if the key is not found, it is simply not included in the result, instead of throwing a missing key error. TODOs in future pull requests: limit memory consumption, ensure the search stops when all keys are covered by the image layer, remove `#[allow(dead_code)]` once the code path is used in basebackups / aux files, remove unnecessary fine-grained keyspace tracking in vectored get (or have another code path for scan) to improve performance. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 10:43:30 -04:00
John Spray	b5a6e68e68	storage controller: check warmth of secondary before doing proactive migration (#7583 ) ## Problem The logic in Service::optimize_all would sometimes choose to migrate a tenant to a secondary location that was only recently created, resulting in Reconciler::live_migrate hitting its 5 minute timeout warming up the location, and proceeding to attach a tenant to a location that doesn't have a warm enough local set of layer files for good performance. Closes: #7532 ## Summary of changes - Add a pageserver API for checking download progress of a secondary location - During `optimize_all`, connect to pageservers of candidate optimization secondary locations, and check they are warm. - During shard split, do heatmap uploads and start secondary downloads, so that the new shards' secondary locations start downloading ASAP, rather than waiting minutes for background downloads to kick in. I have intentionally not implemented this by continuously reading the status of locations, to avoid dealing with the scale challenge of efficiently polling & updating 10k-100k locations status. If we implement that in the future, then this code can be simplified to act based on latest state of a location rather than fetching it inline during optimize_all.	2024-05-03 14:28:23 +00:00
Christian Schwarz	ce0ddd749c	test_runner: remove unused `NeonPageserver.config_override` field (#7605 ) refs https://github.com/neondatabase/neon/issues/7555	2024-05-03 16:05:00 +02:00
Arpad Müller	426598cf76	Update rust to 1.78.0 (#7598 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html Prior update was in #7198	2024-05-03 15:59:28 +02:00
John Spray	8b4dd5dc27	pageserver: jitter secondary periods (#7544 ) ## Problem After some time the load from heatmap uploads gets rather spiky. They're unintentionally synchronising. Chart (does this make a _boing_ sound in anyone else's head?): ![image](https://github.com/neondatabase/neon/assets/944640/18829fc8-c5b7-4739-9a9b-491b5d6fcade) ## Summary of changes - Add a helper `period_jitter` and apply a 5% jitter from downloader and heatmap_uploader when updating the next runtime at the end of an iteration. - Refactor existing places that we pick a startup interval into `period_warmup`, so that the intent is obvious.	2024-05-03 12:31:25 +00:00
Joonas Koivunen	ed9a114bde	fix: find gc cutoff points without holding Tenant::gc_cs (#7585 ) The current implementation of finding timeline gc cutoff Lsn(s) is done while holding `Tenant::gc_cs`. In recent incidents long create branch times were caused by holding the `Tenant::gc_cs` over extremely long `Timeline::find_lsn_by_timestamp`. The fix is to find the GC cutoff values before taking the `Tenant::gc_cs` lock. This change is safe to do because the GC cutoff values and the branch points have no dependencies on each other. In the case of `Timeline::find_gc_cutoff` taking a long time with this change, we should no longer see `Tenant::gc_cs` interfering with branch creation. Additionally, the `Tenant::refresh_gc_info` is now tolerant of timeline deletions (or any other failures to find the pitr_cutoff). This helps with the synthetic size calculation being constantly completed instead of having a break for a timely timeline deletion. Fixes: #7560 Fixes: #7587	2024-05-03 14:57:26 +03:00
John Spray	b7385bb016	storage_controller: fix non-timeline passthrough GETs (#7602 ) ## Problem We were matching on `/tenant/:tenant_id` and `/tenant/:tenant_id/timeline`, but not non-timeline tenant sub-paths. There aren't many: this was only noticeable when using the synthetic_size endpoint by hand. ## Summary of changes - Change the wildcard from `/tenant/:tenant_id/timeline` to `/tenant/:tenant_id/*` - Add test lines that exercise this	2024-05-03 12:52:43 +01:00
Vlad Lazar	37b1930b2f	tests: relax test download remote layers api (#7604 ) ## Problem This test triggers layer download failures on demand. It is possible to modify the failpoint during a `Timeline::get_vectored` right between the vectored read and its validation read. This means that one of the reads can fail while the other one succeeds and vice versa. ## Summary of changes These errors are expected, so allow them to happen.	2024-05-03 12:40:09 +01:00
Arpad Müller	d76963691f	Increase Azure parallelism limit to 100 (#7597 ) After #5563 has been addressed we can now set the Azure storage parallelism limit to 100 like it is for S3. Part of #5567	2024-05-03 13:23:11 +02:00
Joonas Koivunen	60f570c70d	refactor(update_gc_info): split GcInfo to compose out of GcCutoffs (#7584 ) Split `GcInfo` and replace `Timeline::update_gc_info` with a method that simply finds gc cutoffs `Timeline::find_gc_cutoffs` to be combined as `Timeline::gc_info` at the caller. This change will be followed up with a change that finds the GC cutoff values before taking the `Tenant::gc_cs` lock. Cc: #7560	2024-05-03 13:11:51 +03:00
Alex Chi Z	3582a95c87	fix(pageserver): compile warning of download_object.ctx on macos (#7596 ) fix macOS compile warning introduced in `45ec8688ea` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-03 10:55:48 +02:00
Jure Bajic	00423152c6	Store operation identifier in `IdLockMap` on exclusive lock (#7397 ) ## Problem Issues around operation and tenant locks would have been hard to debug since there was little observability around them. ## Summary of changes - As suggested in the issue, a wrapper was added around `OwnedRwLockWriteGuard` called `IdentifierLock` that removes the operation currently holding the exclusive lock when it's dropped. - The value in `IdLockMap` was extended to hold a pair of locks and operations that can be accessed and locked independently. - When requesting an exclusive lock besides returning the lock on that resource, an operation is changed if the lock is acquired. Closes https://github.com/neondatabase/neon/issues/7108	2024-05-03 09:38:19 +01:00
Anna Khanova	240efb82f9	Proxy reconnect pubsub before expiration (#7562 ) ## Problem Proxy reconnects to redis only after it's already unavailable. ## Summary of changes Reconnects every 6h.	2024-05-03 10:00:29 +02:00
Arpad Müller	5f099dc760	Use streaming downloads for Azure as well (#7579 ) The main challenge was in the second commit, as `DownloadStream` requires the inner to be Sync but the stream returned by the Azure SDK wasn't Sync. This left us with three options: * Change the Azure SDK to return Sync streams. This was abandoned after we realized that we couldn't just make `TokenCredential`'s returned future Sync: it uses the `async_trait` macro and as the `TokenCredential` trait is used in dyn form, one can't use Rust's new "async fn in Trait" feature. * Change `DownloadStream` to not require `Sync`. This was abandoned after it turned into a safekeeper refactoring project. * Put the stream into a `Mutex` and make it obtain a lock on every poll. This adds some performance overhead but locks that actually don't do anything should be comparatively cheap. We went with the third option in the end as the change still represents an improvement. Follow up of #5446 , fixes #5563	2024-05-02 20:19:00 +02:00
Arpad Müller	7a49e5d5c2	Remove tenant_id from TenantLocationConfigRequest (#7469 ) Follow-up of #7055 and #7476 to remove `tenant_id` from `TenantLocationConfigRequest` completely. All components of our system should now not specify the `tenant_id`. cc https://github.com/neondatabase/cloud/pull/11791	2024-05-02 20:18:13 +02:00
Christian Schwarz	45ec8688ea	chore(pageserver): plumb through RequestContext to VirtualFile write methods (#7566 ) This PR introduces no functional changes. The read path will be done separately. refs https://github.com/neondatabase/neon/issues/6107 refs https://github.com/neondatabase/neon/issues/7386	2024-05-02 18:58:10 +02:00
Alex Chi Z	4b55dad813	vm-image: add sqlexporter for autoscaling metrics (#7514 ) As discussed in https://github.com/neondatabase/autoscaling/pull/895, we want to have a separate sql_exporter for simple metrics to avoid overloading the database because the autoscaling agent needs to scrape at a higher interval. The new exporter is exposed at port 9499. I didn't do any testing for this pull request but given it's just a configuration change I assume this works. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-02 12:43:36 -04:00
Matt Podraza	ab95942fc2	storage controller: make the initial database wait configurable (#7591 ) This allows passing a humantime string in the CLI to configure the initial wait for the database. It defaults to the previously hard-coded value of 5 seconds.	2024-05-02 15:19:51 +00:00
Alex Chi Z	f656db09a4	fix(pageserver): properly propagate missing key error for vectored get (#7569 ) Some part of the code requires missing key error to be propagated to the code path correctly (i.e., aux key range scan). Currently, it's an anyhow error. * remove `stuck_lsn` from the missing key error. * as a result, when matching missing key, we do not distinguish the case `stuck_lsn = false/true`. * vectored get now use the unified missing key error. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-02 09:19:45 -04:00
Anastasia Lubennikova	69bf1bae7d	Fix usage of pg_waldump --ignore option (#7578 ) Previously, the --ignore option was only used when reading from a single file. With this PR pg_waldump -i is enough to open any neon WAL segments	2024-05-02 11:52:30 +00:00
Anna Khanova	25af32e834	proxy: keep track on the number of events from redis by type. (#7582 ) ## Problem It's unclear what the distribution of the messages that proxy consumes from redis is. ## Summary of changes Add counter.	2024-05-02 09:50:11 +00:00
Conrad Ludgate	cb4b4750ba	update to reqwest 0.12 (#7561 ) ## Problem #7557 ## Summary of changes	2024-05-02 11:16:04 +02:00
Sasha Krassovsky	d43d77389e	Add retry loops and bump test timeout in test_pageserver_connection_stress (#7281 )	2024-05-01 21:36:50 -07:00
Alex Chi Z	5558457c84	chore(pageserver): categorize basebackup errors (#7523 ) close https://github.com/neondatabase/neon/issues/7391 ## Summary of changes Categorize basebackup error into two types: server error and client error. This makes it easier to set up alerts. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-01 16:31:59 +00:00
Alex Chi Z	26e6ff8ba6	chore(pageserver): concise error message for layer traversal (#7565 ) Instead of showing the full path of layer traversal, we now only show tenant (in tracing context)+timeline+filename. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-01 11:44:42 -04:00
Arthur Petukhovsky	50a45e67dc	Discover safekeepers via broker request (#7279 ) We had an incident where pageserver requests timed out because pageserver couldn't fetch WAL from safekeepers. This incident was caused by a bug in safekeeper logic for timeline activation, which prevented pageserver from finding safekeepers. This bug has since been fixed, but there is still a chance of a similar bug in the future due to overall complexity. We add a new broker message to "signal interest" for a timeline. This signal will be sent by the pageserver's `wait_lsn`, and safekeepers will receive this signal to start broadcasting broker messages. Then every broker subscriber will be able to find the safekeepers and connect to them (to start fetching WAL). This feature is not limited to pageservers: any service that wants to download WAL from safekeepers will be able to use this discovery request. This commit changes pageserver's connection_manager (walreceiver) to send a SafekeeperDiscoveryRequest when there is no information about safekeepers present in memory. The current implementation will send these requests only if there is an active wait_lsn() call and no more often than once per 10 seconds. Add `test_broker_discovery` to test this: safekeepers started with `--disable-periodic-broker-push` will not push info to the broker, so the pageserver must use a discovery request to start fetching WAL. Add task_stats in safekeepers broker module to log a warning if there is no message received from the broker for the last 10 seconds. Closes #5471 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-04-30 18:50:03 +00:00
Andrew Rudenko	fcbe60f436	Makefile: DISABLE_HOMEBREW variable (#7556 ) ## Problem The current Makefile assumes that homebrew is used on macos. There are other ways to install dependencies on MacOS (nix, macports, "manually"). It would be great to allow the one who wants to use other options to disable homebrew integration. ## Summary of changes It adds DISABLE_HOMEBREW variable that if set skips extra homebrew-specific configuration steps.	2024-04-30 19:44:02 +02:00
John Spray	e018cac1f7	tests: tweak log allow list in test_sharding_split_failures (#7549 ) ## Problem This test became flaky recently with failures like: ``` AssertionError: Log errors on storage_controller: (129, '2024-04-29T16:41:03.591506Z ERROR request{method=PUT path=/control/v1/tenant/b38c0447fbdbcf4e1c023f00b0f7c221/shard_split request_id=34df4975-2ef3-4ed8-b167-2956650e365c}: Error processing HTTP request: InternalServerError(Reconcile error on shard b38c0447fbdbcf4e1c023f00b0f7c221-0002: Cancelled\n') ``` Likely due to #7508 changing how errors are reported from Reconcilers. ## Summary of changes - Tolerate `Reconcile error.*Cancelled` log errors	2024-04-30 18:00:24 +01:00
John Spray	a74b60066c	storage controller: test for large shard counts (#7475 ) ## Problem Storage controller was observed to have unexpectedly large memory consumption when loaded with many thousands of shards. This was recently fixed: - https://github.com/neondatabase/neon/pull/7493 ...but we need a general test that the controller is well behaved with thousands of shards. Closes: https://github.com/neondatabase/neon/issues/7460 Closes: https://github.com/neondatabase/neon/issues/7463 ## Summary of changes - Add test test_storage_controller_many_tenants to exercise the system's behaviour with a more substantial workload. This test measures memory consumption and reproduces #7460 before the other changes in this PR. - Tweak reconcile_all's return value to make it nonzero if it spawns no reconcilers, but _would_ have spawned some reconcilers if they weren't blocked by the reconcile concurrency limit. This makes the test's reconcile_until_idle behave as expected (i.e. not complete until the system is nice and calm). - Fix an issue where tenant migrations would leave a spurious secondary location when migrated to some location that was not already their secondary (this was an existing low-impact bug that tripped up the test's consistency checks). On the test with 8000 shards, the resident memory per shard is about 20KiB. This is not really per-shard memory: the primary source of memory growth is the number of concurrent network/db clients we create. With 8000 shards, the test takes 125s to run on my workstation.	2024-04-30 15:21:54 +00:00
Arseny Sher	3a2f10712a	Add more context to s3 listing error.	2024-04-30 18:19:52 +03:00
Arseny Sher	4ac4b21598	Add retries to cloud_admin client.	2024-04-30 18:19:52 +03:00
Arseny Sher	9f792f9c0b	Recheck tenant_id in find_timeline_branch. As it turns out we have at least one case of the same timeline_id in different projects.	2024-04-30 18:19:52 +03:00
Arseny Sher	7434674d86	Decrease CONSOLE_CONCURRENCY. Last run with 128 created too much load on cplane.	2024-04-30 18:19:52 +03:00
Arseny Sher	ea37234ccc	s3_scrubber: revive garbage collection for safekeepers. - pageserver_id in project details is now optional, fix it - add active_timeline_count guard/stat similar to active_tenant_count - fix safekeeper prefix - count and log deleted keys	2024-04-30 18:19:52 +03:00
Arseny Sher	3da54e6d90	s3_scrubber: implement scan-metadata for safekeepers. It works by listing postgres table with memory dump of safekeepers state. s3 contents for each timeline are checked then against timeline_start_lsn and backup_lsn. If inconsistency is found, before complaining timeline (branch) is checked at control plane; it might have been deleted between the dump take and s3 check.	2024-04-30 18:19:52 +03:00
Arpad Müller	010f0a310a	Make test_random_updates and test_read_at_max_lsn compatible with new compaction (#7551 ) Makes two of the tests work with the tiered compaction that I had to ignore in #7283. The issue was that tiered compaction actually created image layers, but the keys didn't appear in them as `collect_keyspace` didn't include them. Not a compaction problem, but due to how the test is structured. Fixes #7287	2024-04-30 16:52:54 +02:00
John Spray	eb53345d48	pageserver: reduce runtime of init_tenant_mgr (#7553 ) ## Problem `init_tenant_mgr` blocks the rest of pageserver startup, including starting the admin API. This was noticeable in #7475 , where the init_tenant_mgr runtime could be long enough to trip the controller's 30 second heartbeat timeout. ## Summary of changes - When detaching tenants during startup, spawn the background deletes as background tasks instead of doing them inline - Write all configs before spawning any tenants, so that the config writes aren't fighting tenants for system resources - Write configs with some concurrency (16) rather than writing them all sequentially.	2024-04-30 15:16:15 +01:00
Alex Chi Z	45c625fb34	feat(pageserver): separate sparse and dense keyspace (#7503 ) extracted (and tested) from https://github.com/neondatabase/neon/pull/7468, part of https://github.com/neondatabase/neon/issues/7462. The current codebase assumes the keyspace is dense -- which means that if we have a keyspace of 0x00-0x100, we assume every key (e.g., 0x00, 0x01, 0x02, ...) exists in the storage engine. However, the assumption does not hold any more in metadata keyspace. The metadata keyspace is sparse. It is impossible to do per-key check. Ideally, we should not have the assumption of dense keyspace at all, but this would incur a lot of refactors. Therefore, we split the keyspaces we have to dense/sparse and handle them differently in the code for now. At some point in the future, we should assume all keyspaces are sparse. ## Summary of changes * Split collect_keyspace to return dense+sparse keyspace. * Do not allow generating image layers for sparse keyspace (for now -- will fix this next week, we need image layers anyways). * Generate delta layers for sparse keyspace. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-30 09:39:10 -04:00
Cihan Demirci	84b6b95783	docs: fix unintentional file link (#7506 ) Not sure if this should actually be a link pointing to the `persistence.rs` file but following the conventions of the rest of the file, change `persistence.rs` reference to simply be a file name mention.	2024-04-30 14:17:01 +01:00
John Spray	577982b778	pageserver: remove workarounds from #7454 (#7550 ) PR #7454 included a workaround that let any existing bugged databases start up. Having used that already, we may now Closes: https://github.com/neondatabase/neon/issues/7480	2024-04-30 11:04:54 +01:00
John Spray	574645412b	pageserver: shard-aware keyspace partitioning (#6778 ) ## Problem Followup to https://github.com/neondatabase/neon/pull/6776 While #6776 makes compaction safe on sharded tenants, the logic for keyspace partitioning remains inefficient: it assumes that the size of data on a pageserver can be calculated simply as the range between start and end of a Range -- this is not the case in sharded tenants, where data within a range belongs to a variety of shards. Closes: https://github.com/neondatabase/neon/issues/6774 ## Summary of changes I experimented with using a sharding-aware range type in KeySpace to replace all the Range<Key> uses, but the impact on other code was quite large (many places use the ranges), and not all of them need this property of being able to approximate the physical size of data within a key range. So I compromised on expressing this as a ShardedRange type, but only using that type selectively: during keyspace repartition, and in tiered compaction when accumulating key ranges. - keyspace partitioning methods take sharding parameters as an input - new `ShardedRange` type wraps a Range<Key> and a shard identity - ShardedRange::page_count is the shard-aware replacement for key_range_size - Callers that don't need to be shard-aware (e.g. vectored get code that just wants to count the number of keys in a keyspace) can use ShardedRange::raw_size to get the faster, shard-naive code (same as old `key_range_size`) - Compaction code is updated to carry a shard identity so that it can use shard aware calculations - Unit tests for the new fragmentation logic. - Add a test for compaction on sharded tenants, that validates that we generate appropriately sized image layers (this fails before fixing keyspace partitioning)	2024-04-29 17:46:46 +00:00
Alex Chi Z	11945e64ec	chore(pageserver): improve in-memory layer vectored get (#7467 ) previously in https://github.com/neondatabase/neon/pull/7375, we observed that for in-memory layers, we will need to iterate every key in the key space in order to get the result. The operation can be more efficient if we use BTreeMap as the in-memory layer representation, even if we are doing vectored get in a dense keyspace. Imagine a case that the in-memory layer covers a very little part of the keyspace, and most of the keys need to be found in lower layers. Using a BTreeMap can significantly reduce probes for nonexistent keys. ## Summary of changes * Use BTreeMap as in-memory layer representation. * Optimize the vectored get flow to utilize the range scan functionality of BTreeMap. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-29 17:16:42 +00:00
Arpad Müller	cddafc79e1	Update azure_* crates to 0.19 (#7539 ) Updates the four azure SDK crates used by remote_storage to 0.19.	2024-04-29 19:02:53 +02:00
Vlad Lazar	af7cca4949	pageserver: tweak vec get validation for ancestor lsn wait (#7533 ) ## Problem Sequential get runs after vectored get, so it is possible for the latter to time out while waiting for its ancestor's Lsn to become ready and for the former to succeed (it essentially has a doubled wait time). ## Summary of Changes Relax the validation to allow for such rare cases.	2024-04-29 17:35:08 +01:00
Alex Chi Z	89cae64e38	chore(vm-image): specify sql exporter listen port (#7526 ) Extracted from https://github.com/neondatabase/neon/pull/7514, 9399 is the default port. We want to specify it b/c we will start a second sql exporter for autoscaling agent soon. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-29 12:33:01 -04:00
Vlad Lazar	1f417af9fd	pageserver: use vectored read path in benchmarks (#7498 ) ## Problem Benchmarks don't use the vectored read path. ## Summary of changes * Update the benchmarks to use the vectored read path for both singular and vectored gets. * Disable validation for the benchmarks	2024-04-29 17:26:35 +01:00
Anna Khanova	1684bbf162	proxy: Create disconnect events (#7535 ) ## Problem It's not possible to get the duration of the session from proxy events. ## Summary of changes * Added a separate events folder in s3, to record disconnect events. * Disconnect events are exactly the same as normal events, but also have `disconnect_timestamp` field not empty. * @oruen suggested to fill it with the same information as the original events to avoid potentially heavy joins.	2024-04-29 15:22:13 +02:00
Anna Khanova	90cadfa986	proxy: Adjust retry wake compute (#7537 ) ## Problem Right now we always do retry wake compute. ## Summary of changes Create a list of errors when we could avoid needless retries.	2024-04-29 12:26:21 +00:00
John Spray	2226acef7c	s3_scrubber: add `tenant-snapshot` (#7444 ) ## Problem Downloading tenant data for analysis/debug with `aws s3 cp` works well for small tenants, but for larger tenants it is unlikely that one ends up with an index that matches layer files, due to the time taken to download. ## Summary of changes - Add a `tenant-snapshot` command to the scrubber, which reads timeline indices and then downloads the layers referenced in the index, even if they were deleted. The result is a snapshot of the tenant's remote storage state that should be usable when imported (#7399 ).	2024-04-29 12:16:00 +00:00
Anna Khanova	24ce878039	proxy: Exclude compute and retries (#7529 ) ## Problem Alerts fire if the connection to the compute is slow. ## Summary of changes Exclude compute and retry from latencies.	2024-04-29 11:49:42 +02:00
John Spray	84914434e3	storage controller: send startup compute notifications in background (#7495 ) ## Problem Previously, we try to send compute notifications in startup_reconcile before completing that function, with a time limit. Any notifications that don't happen within the time limit result in tenants having their `pending_compute_notification` flag set, which causes them to spawn a Reconciler next time the background reconciler loop runs. This causes two problems: - Spawning a lot of reconcilers after startup caused a spike in memory (this is addressed in https://github.com/neondatabase/neon/pull/7493) - After https://github.com/neondatabase/neon/pull/7493, spawning lots of reconcilers will block some other operations, e.g. a tenant creation might fail due to lack of reconciler semaphore units while the controller is busy running all the Reconcilers for its startup compute notifications. When the code was first written, ComputeHook didn't have internal ordering logic to ensure that notifications for a shard were sent in the right order. Since that was added in https://github.com/neondatabase/neon/pull/7088, we can use it to avoid waiting for notifications to complete in startup_reconcile. Related to: https://github.com/neondatabase/neon/issues/7460 ## Summary of changes - Add a `notify_background` method to ComputeHook. - Call this from startup_reconcile instead of doing notifications inline - Process completions from `notify_background` in `process_results`, and if a notification failed then set the `pending_compute_notification` flag on the shard. The result is that we will only spawn lots of Reconcilers if the compute notifications _fail_, not just because they take some significant amount of time. Test coverage for this case is in https://github.com/neondatabase/neon/pull/7475	2024-04-29 08:59:22 +00:00
John Spray	b655c7030f	neon_local: add "tenant import" (#7399 ) ## Problem Sometimes we have test data in the form of S3 contents that we would like to run live in a neon_local environment. ## Summary of changes - Add a storage controller API that imports an existing tenant. Currently this is equivalent to doing a create with a high generation number, but in future this would be something smarter to probe S3 to find the shards in a tenant and find generation numbers. - Add a `neon_local` command that invokes the import API, and then inspects timelines in the newly attached tenant to create matching branches.	2024-04-29 08:52:18 +01:00
Joonas Koivunen	3695a1efa1	metrics: record time to update gc info as a per timeline metric (#7473 ) We know that updating gc info can take a very long time from [recent incident], and holding `Tenant::gc_cs` affects many per-tenant operations in the system. We need a direct way to observe the time it takes. The solution is to add metrics so that we know when this happens: - 2 new per-timeline metrics - 1 new global histogram Verified that the buckets are okay-ish in [dashboard]. In our current state, we will see a lot more of `Inf,` but that is probably okay; at least we can learn which timelines are having issues. Can we afford to add these metrics? A bit unclear, see [another dashboard] with top pageserver `/metrics` response sizes. [dashboard]: https://neonprod.grafana.net/d/b7a5a5e2-1276-4bb0-9e3a-b4528adb6eb6/storage-operations-histograms-in-prod?orgId=1&var-datasource=ZNX49CDVz&var-instance=All&var-operation=All&from=now-7d&to=now [another dashboard]: https://neonprod.grafana.net/d/MQx4SN-Vk/metric-sizes-on-prod-and-some-correlations?orgId=1 [recent incident]: https://neondb.slack.com/archives/C06UEMLK7FE/p1713817696580119?thread_ts=1713468604.508969&cid=C06UEMLK7FE	2024-04-29 07:14:53 +03:00
Alex Chi Z	75b4440d07	fix(virtual_file): compile warnings on macos (#7525 ) starting at commit `dbb0c967d5`, macOS reports warning for a few functions in the virtual file module. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-26 17:09:51 -04:00
Alex Chi Z	ee3437cbd8	chore(pageserver): shrink aux keyspace to 0x60-0x7F (#7502 ) extracted from https://github.com/neondatabase/neon/pull/7468, part of https://github.com/neondatabase/neon/issues/7462. In the page server, we use i128 (instead of u128) to do the integer representation of the key, which indicates that the highest bit of the key should not be 1. This constrains our keyspace to <= 0x7F. Also fix the bug of `to_i128` that dropped the highest 4b. Now we keep 3b of them, dropping the sign bit. And on that, we shrink the metadata keyspace to 0x60-0x7F for now, and once we add support for u128, we can have a larger metadata keyspace. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-26 13:35:01 -04:00
Alex Chi Z	dbe0aa653a	feat(pageserver): add aux-file-v2 flag on tenant level (#7505 ) Changing metadata format is not easy. This pull request adds a tenant-level flag on whether to enable aux file v2. As long as we don't roll this out to the user and guarantee our staging projects can persist tenant config correctly, we can test the aux file v2 change with setting this flag. Previous discussion at https://github.com/neondatabase/neon/pull/7424. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-26 11:48:47 -04:00
Arpad Müller	39427925c2	Return Past instead of Present or Future when commit_lsn < min_lsn (#7520 ) Implements an approach different from the one #7488 chose: We now return `past` instead of `present` (or `future`) when encountering the edge case where commit_lsn < min_lsn. In my opinion, both `past` and `present` are correct responses, but `past` is slightly better as the lsn returned by `present` with #7488 is one too "new". In practice, this shouldn't matter much, but shrug. We agreed in slack that this is the better approach: https://neondb.slack.com/archives/C03F5SM1N02/p1713871064147029	2024-04-26 16:23:25 +02:00
Vlad Lazar	af43f78561	pageserver: fix image layer creation check that inhibited compaction (#7420 ) ## Problem PR #7230 attempted to introduce a WAL ingest threshold for checking whether enough deltas are stacked to warrant creating a new image layer. However, this check was incorrectly performed at the compaction partition level instead of the timeline level. Hence, it inhibited GC for any keys outside of the first partition. ## Summary of Changes Hoist the check up to the timeline level.	2024-04-26 14:53:05 +01:00
Christian Schwarz	ed57772793	perf!: use larger buffers for blob_io and ephemeral_file (#7485 ) part of https://github.com/neondatabase/neon/issues/7124 # Problem (Re-stating the problem from #7124 for posterity) The `test_bulk_ingest` benchmark shows about 2x lower throughput with `tokio-epoll-uring` compared to `std-fs`. That's why we temporarily disabled it in #7238. The reason for this regression is that the benchmark runs on a system without memory pressure and thus std-fs writes don't block on disk IO but only copy the data into the kernel page cache. `tokio-epoll-uring` cannot beat that at this time, and possibly never. (However, under memory pressure, std-fs would stall the executor thread on kernel page cache writeback disk IO. That's why we want to use `tokio-epoll-uring`. And we likely want to use O_DIRECT in the future, at which point std-fs becomes an absolute show-stopper.) More elaborate analysis: https://neondatabase.notion.site/Why-test_bulk_ingest-is-slower-with-tokio-epoll-uring-918c5e619df045a7bd7b5f806cfbd53f?pvs=4 # Changes This PR increases the buffer size of `blob_io` and `EphemeralFile` from PAGE_SZ=8k to 64k. Longer-term, we probably want to do double-buffering / pipelined IO. # Resource Usage We currently do not flush the buffer when freezing the InMemoryLayer. That means a single Timeline can have multiple 64k buffers alive, esp if flushing is slow. This poses an OOM risk. We should either bound the number of frozen layers (https://github.com/neondatabase/neon/issues/7317), or change the freezing code to flush the buffer and drop the allocation. However, that's future work. # Performance (Measurements done on i3en.3xlarge.) The `test_bulk_insert.py` is too noisy, even with instance storage. It varies by 30-40%. I suspect that's due to compaction. Raising the amount of data by 10x doesn't help with the noisiness. So, I used the `bench_ingest` from @jcsp 's #7409 . 
Specifically, the `ingest-small-values/ingest 128MB/100b seq` and `ingest-small-values/ingest 128MB/100b seq, no delta` benchmarks.
| buffer size | IO engine | seq | seq, no delta |
|-----|-------------------|-----|---------------|
| 8k | std-fs | 55 | 165 |
| 8k | tokio-epoll-uring | 37 | 107 |
| 64k | std-fs | 55 | 180 |
| 64k | tokio-epoll-uring | 48 | 164 |
The `8k` is from before this PR, the `64k` is with this PR. The values are the throughput reported by the benchmark (MiB/s). We see that this PR gets `tokio-epoll-uring` from 67% to 87% of `std-fs` performance in the `seq` benchmark. Notably, `seq` appears to hit some other bottleneck at `55 MiB/s`. CC'ing #7418 due to the apparent bottlenecks in writing delta layers. For `seq, no delta`, this PR gets `tokio-epoll-uring` from 64% to 91% of `std-fs` performance.	2024-04-26 11:34:28 +00:00
John Spray	f1de18f1c9	Remove unused import (#7519 ) Linter error from a merge collision	2024-04-26 11:15:05 +00:00
Christian Schwarz	dbb0c967d5	refactor(ephemeral_file): reuse owned_buffers_io::BufferedWriter (#7484 ) part of https://github.com/neondatabase/neon/issues/7124 Changes ------- This PR replaces the `EphemeralFile::write_blob`-specific `struct Writer` with re-use of `owned_buffers_io::write::BufferedWriter`. Further, it restructures the code to cleanly separate * the high-level aspect of EphemeralFile's write_blob / read_blk API * the page-caching aspect * the aspect of IO * performing buffered write IO to an underlying VirtualFile * serving reads from either the VirtualFile or the buffer if it hasn't been flushed yet * the annoying "feature" that reads past the end of the written range are allowed and expected to return zeroed memory, as long as one remains within one PAGE_SZ	2024-04-26 13:01:26 +02:00
Christian Schwarz	bf369f4268	refactor(owned_buffer_io::util::size_tracking_writer): make generic over underlying writer (#7483 ) part of https://github.com/neondatabase/neon/issues/7124	2024-04-26 09:19:41 +00:00
Christian Schwarz	70f4a16a05	refactor(owned_buffers_io::BufferedWriter): be generic over the type of buffer (#7482 )	2024-04-26 08:30:20 +00:00
John Spray	d63185fa6c	storage controller: log hygiene & better error type (#7508 ) These are testability/logging improvements spun off from #7475 - Don't log warnings for shutdown errors in compute hook - Revise logging around heartbeats and reconcile_all so that we aren't emitting such a large volume of INFO messages under normal quiet conditions. - Clean up the `last_error` of TenantShard to hold a ReconcileError instead of a String, and use that properly typed error to suppress reconciler cancel errors during reconcile_all_now. This is important for tests that iteratively call that, as otherwise they would get 500 errors when some reconciler in flight was cancelled (perhaps due to a state change on the tenant shard starting a new reconciler).	2024-04-26 08:15:59 +00:00
Heikki Linnakangas	ca8fca0e9f	Add test to demonstrate the problem with protocol version 1 (#7377 )	2024-04-25 20:45:37 +03:00
Heikki Linnakangas	0397427dcf	Add test for SLRU download (#7377 ) Before PR #7377, on-demand SLRU download always used the basebackup's LSN in the SLRU download, but that LSN might get garbage-collected away in the pageserver. We should request the latest LSN, like with GetPage requests, with the LSN just indicating that we know that the page hasn't been changed since the LSN (since the basebackup in this case). Add test to demonstrate the problem. Without the fix, it fails with "tried to request a page version that was garbage collected" error from the pageserver. I wrote this test as part of earlier PR #6693, but that fell through the cracks and was never applied. PR #7377 superseded the fix from that older PR, but the test is still valid.	2024-04-25 20:45:37 +03:00
Heikki Linnakangas	a2a44ea213	Refactor how the request LSNs are tracked in compute (#7377 ) Instead of thinking in terms of 'latest' and 'lsn' of the request, each request has two LSNs: the request LSN and 'not_modified_since' LSN. The request is nominally made at the request LSN, that determines what page version we want to see. But as a hint, we also include 'not_modified_since'. It tells the pageserver that the page has not been modified since that LSN, which allows the pageserver to skip waiting for newer WAL to arrive, and could allow more optimizations in the future. Refactor the internal functions to calculate the request LSN to calculate both LSNs. Sending two LSNs to the pageserver requires using the new protocol version 2. The previous commit added the server support for it, but we still default to the old protocol for compatibility with old pageservers. The 'neon.protocol_version' GUC can be used to use the new protocol. The new protocol addresses one cause of issue #6211, although you can still get the same error if you have a standby that is lagging behind so that the page version it needs is genuinely GC'd away.	2024-04-25 20:45:37 +03:00
Heikki Linnakangas	4917f52c88	Server support for new pagestream protocol version (#7377 ) In the old protocol version, the client sent with each request: - latest: bool. If true, the client requested the latest page version, and the 'lsn' was just a hint of when the page was last modified - lsn: Lsn, the page version to return This protocol didn't allow requesting a page at a particular non-latest LSN and also sending a hint on when the page was last modified. That put a read only compute into an awkward position where it had to either request each page at the replay-LSN, which could be very close to the last LSN written in the primary and therefore require the pageserver to wait for it to arrive, or an older LSN which could already be garbage collected in the pageserver, resulting in an error. The new protocol version fixes that by allowing a read only compute to send both LSNs. To use the new protocol version, use "pagestream_v2" command instead of just "pagestream". The old protocol version is still supported, for compatibility with old computes (and in fact there is no client support yet, it is added by the next commit).	2024-04-25 20:45:37 +03:00
Heikki Linnakangas	04a682021f	Remove the now-unused 'latest' arguments (#7377 ) The 'latest' argument was passed to the functions in pgdatadir_mapping.rs to know when they can update the relsize cache. Commit `e69ff3fc00` changed how the relsize cache is updated, making the 'latest' argument unused.	2024-04-25 20:45:37 +03:00
Alex Chi Z	c59abedd85	chore(pageserver): temporary metrics on ingestion time (#7515 ) As a follow-up on https://github.com/neondatabase/neon/pull/7467, also measure the ingestion operation speed. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-25 16:39:27 +00:00
Anna Khanova	5357f40183	proxy: Workaround switch to the regional redis (#7513 ) ## Problem Start switching from the global redis to the regional one ## Summary of changes * Publish cancellations to the regional redis * Listen for notifications from both: global and regional	2024-04-25 15:26:18 +00:00
Vlad Lazar	e4a279db13	pageserver: coalesce read paths (#7477 ) ## Problem We are currently supporting two read paths. No bueno. ## Summary of changes High level: use vectored read path to serve get page requests - gated by `get_impl` config Low level: 1. Add ps config, `get_impl` to specify which read path to use when serving get page requests 2. Fix base cached image handling for the vectored read path. This was subtly broken: previously we would not mark keys that went past their cached lsn as complete. This is a self standing change which could be its own PR, but I've included it here because writing separate tests for it is tricky. 3. Fork get page to use either the legacy or vectored implementation 4. Validate the use of vectored read path when serving get page requests against the legacy implementation. Controlled by `validate_vectored_get` ps config. 5. Use the vectored read path to serve get page requests in tests (with validation). ## Note Since the vectored read path does not go through the page cache to read buffers, this change also amounts to a removal of the buffer page cache. Materialized page cache is still used.	2024-04-25 13:29:17 +01:00
Anna Khanova	b1d47f3911	proxy: Fix cancellations (#7510 ) ## Problem Cancellations were published to the channel, that was never read. ## Summary of changes Fallback to global redis publishing.	2024-04-25 11:38:51 +00:00
Anna Khanova	a3d62b31bb	Update connect to compute and wake compute retry configs (#7509 ) ## Problem ## Summary of changes Decrease waiting time	2024-04-25 11:16:27 +00:00
Conrad Ludgate	cdccab4bd9	reduce complexity of proxy protocol parse (#7078 ) ## Problem The `WithClientIp` AsyncRead/Write abstraction never filled me with much joy. I would just rather read the protocol header once and then get the remaining buf and reader. ## Summary of changes * Replace `WithClientIp::wait_for_addr` with `read_proxy_protocol`. * Replace `WithClientIp` with `ChainRW`. * Optimise `ChainRW` to make the standard path more optimal.	2024-04-25 11:14:04 +01:00
John Spray	e8814b6f81	controller: limit Reconciler concurrency (#7493 ) ## Problem Storage controller memory can spike very high if we have many tenants and they all try to reconcile at the same time. Related: - https://github.com/neondatabase/neon/issues/7463 - https://github.com/neondatabase/neon/issues/7460 Not closing those issues in this PR, because the test coverage for them will be in https://github.com/neondatabase/neon/pull/7475 ## Summary of changes - Add a CLI arg `--reconciler-concurrency`, defaulted to 128 - Add a semaphore to Service with this many units - In `maybe_reconcile_shard`, try to acquire semaphore unit. If we can't get one, return a ReconcileWaiter for a future sequence number, and push the TenantShardId onto a channel of delayed IDs. - In `process_result`, consume from the channel of delayed IDs if there are semaphore units available and call maybe_reconcile_shard again for these delayed shards. This has been tested in https://github.com/neondatabase/neon/pull/7475, but will land that PR separately because it contains other changes & needs the test stabilizing. This change is worth merging sooner, because it fixes a practical issue with larger shard counts.	2024-04-25 10:46:07 +01:00
Arpad Müller	c18d3340b5	Ability to specify the upload_storage_class in S3 bucket configuration (#7461 ) Currently we move data to the intended storage class via lifecycle rules, but those are a daily batch job so data first spends up to a day in standard storage. Therefore, make it possible to specify the storage class used for uploads to S3 so that the data doesn't have to be migrated automatically. The advantage of this is that it gives cleaner billing reports. Part of https://github.com/neondatabase/cloud/issues/11348	2024-04-24 18:48:25 +02:00
Alex Chi Z	447a063f3c	fix(metrics): correct maxrss metrics on macos (#7487 ) macOS max_rss is in bytes, while Linux is in kilobytes. https://stackoverflow.com/a/59915669 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-24 15:09:23 +00:00
Vlad Lazar	c12861cccd	pageserver: finish vectored get early (#7490 ) ## Problem If the previous step of the vectored get left no further keyspace to investigate (i.e. keyspace remains empty after removing keys completed in the previous step), then we'd still grab the layers lock, potentially add an in-mem layer to the fringe and at some further point read its index without reading any values from it. ## Summary of changes If there's nothing left in the current keyspace, then skip the search and just select the next item from the fringe as usual. When running `test_pg_regress[release-pg16]` with the vectored read path for singular gets this improved perf drastically (see PR cover letter). ## Correctness Since no keys remained from the previous range (i.e. we are on a leaf node) there's nothing that search can find in deeper nodes.	2024-04-24 15:36:23 +01:00
Vlad Lazar	2a3a8ee31d	pageserver: publish the same metrics from both read paths (#7486 ) ## Problem Vectored and non-vectored read paths don't publish the same set of metrics. Metrics parity is needed for coalescing the read paths. ## Summary of changes * Publish reconstruct time and fetching data for reconstruct time from the vectored read path * Remove pageserver_getpage_reconstruct_seconds{res="err"} - wasn't used anyway	2024-04-24 13:52:46 +00:00
Anna Khanova	5dda371c2b	Fix a bug with retries (#7494 ) ## Problem ## Summary of changes By default, it's 5s retry.	2024-04-24 14:13:18 +01:00
Joonas Koivunen	a60035b23a	fix: avoid starving background task permits in eviction task (#7471 ) As seen with a recent incident, eviction tasks can cause pageserver-wide permit starvation on the background task semaphore when synthetic size calculation takes a long time for a tenant that has more than our permit number of timelines or multiple tenants that have slow synthetic size and total number of timelines exceeds the permits. Metric links can be found in the internal [slack thread]. As a solution, release the permit while waiting for the state guarding the synthetic size calculation. This will most likely hurt the eviction task eviction performance, but that does not matter because we are hoping to get away from it using OnlyImitiate policy anyway and rely solely on disk usage-based eviction. [slack thread]: https://neondb.slack.com/archives/C06UEMLK7FE/p1713810505587809?thread_ts=1713468604.508969&cid=C06UEMLK7FE	2024-04-24 11:38:59 +03:00
Arpad Müller	18fd73d84a	get_lsn_by_timestamp: clamp commit_lsn to be >= min_lsn (#7488 ) There was an edge case where `get_lsn_by_timestamp`/`find_lsn_for_timestamp` could return an LSN below the limits we enforce, namely when we found SLRU entries with timestamps before the one we searched for. The API contract of `get_lsn_by_timestamp` is to not return something before the ancestor LSN. cc https://neondb.slack.com/archives/C03F5SM1N02/p1713871064147029	2024-04-24 00:46:48 +02:00
John Spray	ee9ec26808	pageserver: change pitr_interval=0 behavior (#7423 ) ## Problem We already made a change in #6407 to make pitr_interval authoritative for synthetic size calculations (do not charge users for data retained due to gc_horizon), but that change didn't cover the case where someone entirely disables time-based retention by setting pitr_interval=0 Relates to: https://github.com/neondatabase/neon/issues/6374 ## Summary of changes When pitr_interval is zero, do not set `pitr_cutoff` based on gc_horizon. gc_horizon is still enforced, but separately (its value is passed separately, there was never a need to clamp pitr_cutoff to gc_horizon) ## More detail ### Issue 1 Before this PR, we would skip the update_gc_info for timelines with last_record_lsn() < gc_horizon. Let's call such timelines "tiny". The rationale for that presumably was that we can't GC anything in the tiny timelines, so why bother to call update_gc_info(). However, synthetic size calculation relies on up-to-date update_gc_info() data. Before this PR, tiny timelines would never get an updated GcInfo::pitr_horizon (it remained Lsn(0)). Even on projects with pitr_interval=0d. With this PR, update_gc_info is always called, hence GcInfo::pitr_horizon is always updated, thereby providing synthetic size calculation with up-to-date data. ### Issue 2 Before this PR, regardless of whether the timeline is "tiny" or not, GcInfo::pitr_horizon was clamped to at least last_record_lsn - gc_horizon, even if the pitr window in terms of LSN range was shorter (=less than) the gc_horizon. With this PR, that clamping is removed, so, for pitr_interval=0, the pitr_horizon = last_record_lsn.	2024-04-23 17:16:17 +01:00
John Spray	e22c072064	remote_storage: fix prefix handling in remote storage & clean up (#7431 ) ## Problem Split off from https://github.com/neondatabase/neon/pull/7399, which is the first piece of code that does a WithDelimiter object listing using a prefix that isn't a full directory name. ## Summary of changes - Revise list function to not append a `/` to the prefix -- prefixes don't have to end with a slash. - Fix local_fs implementation of list to not assume that WithDelimiter case will always use a directory as a prefix. - Remove `list_files`, `list_prefixes` wrappers, as they add little value and obscure the underlying list function -- we need callers to understand the semantics of what they're really calling (listobjectsv2)	2024-04-23 16:24:51 +01:00
Alex Chi Z	89f023e6b0	feat(pageserver): add metadata key range and aux key encoding (#7401 ) Extracted from https://github.com/neondatabase/neon/pull/7375. We assume everything >= 0x80 are metadata keys. AUX file keys are part of the metadata keys, and we use `0x90` as the prefix for AUX file keys. The AUX file encoding is described in the code comment. We use xxhash128 as the hash algorithm. It seems to be portable according to the introduction, > xxHash is an Extremely fast Hash algorithm, processing at RAM speed limits. Code is highly portable, and produces hashes identical across all platforms (little / big endian). ...though whether the Rust version follows the same convention is unknown and might need manual review of the library. Anyways, we can always change the hash algorithm before rolling it out in staging/end-user, and I made a quick decision to use xxhash here because it generates 128b hash + portable. We can save the discussion of which hash algorithm to use later. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-23 15:16:04 +00:00
John Spray	8426fb886b	storage_controller: wait for db on startup (#7479 ) ## Problem In some dev/test environments, there aren't health checks to guarantee the database is available before starting the controller. This creates friction for the developer. ## Summary of changes - Wait up to 5 seconds for the database to become available on startup	2024-04-23 14:20:12 +01:00
Vlad Lazar	28e7fa98c4	pageserver: add read depth metrics and test (#7464 ) ## Problem We recently went through an incident where compaction was inhibited by a bug. We didn't observe this until quite late because we did not have alerting on deep reads. ## Summary of changes + Tweak an existing metric that tracks the depth of a read on the non-vectored read path: * Give it a better name * Track all layers * Larger buckets + Add a similar metric for the vectored read path + Add a compaction smoke test which uses these metrics. This test would have caught the compaction issue mentioned earlier. Related https://github.com/neondatabase/neon/issues/7428	2024-04-23 14:05:02 +01:00
Vlad Lazar	a9fda8c832	pageserver: fix vectored read aux key handling (#7404 ) ## Problem Vectored get would descend into ancestor timelines for aux files. This is not the behaviour of the legacy read path and blocks cutting over to the vectored read path. Fixes https://github.com/neondatabase/neon/issues/7379 ## Summary of Changes Treat non inherited keys specially in vectored get. At the point when we want to descend into the ancestor, mark all pending non inherited keys as errored out at the key level. Note that this diverges from the standard vectored get behaviour for missing keys, which is a top level error. This divergence is required to avoid blocking compaction in case such an error is encountered when compacting aux file keys. I'm pretty sure the bug I just described predates the vectored get implementation, but it's still worth fixing.	2024-04-23 14:03:33 +01:00
Arpad Müller	fa12d60237	Don't pass tenant_id in location_config requests from storage controller (#7476 ) Tested this locally via a simple patch, the `tenant_id` is now gone from the json. Follow-up of #7055, prerequisite for #7469.	2024-04-23 11:42:58 +00:00
Vlad Lazar	d551bfee09	pageserver: remove import/export script previously used for breaking format changes (#7458 ) ## Problem The `export_import_between_pageservers` script was used to do major storage format changes in the past. If we have to do such breaking changes in the future this approach wouldn't be suitable because: 1. It doesn't scale to the current size of the fleet 2. It loses history ## Summary of changes Remove the script and its associated test. Keep `fullbasebackup` and friends because it's useful for debugging. Closes https://github.com/neondatabase/cloud/issues/11648	2024-04-23 11:36:56 +01:00
Heikki Linnakangas	e69ff3fc00	Refactor updating relation size cache on reads (#7376 ) Instead of trusting that a request with latest == true means that the request LSN was at least last_record_lsn, remember explicitly when the relation cache was initialized. Incidentally, this allows updating the relation size cache also on reads from read-only endpoints, when the endpoint is at a relatively recent LSN (more recent than the end of the timeline when the timeline was loaded in the pageserver). Add a comment to wait_or_get_last_lsn() that it might be better to use an older LSN when possible. Note that doing that would be unsafe, without the relation cache changes in this commit!	2024-04-22 19:40:08 +03:00
Alex Chi Z	25d9dc6eaf	chore(pageserver): separate missing key error (#7393 ) As part of https://github.com/neondatabase/neon/pull/7375 and to improve the current vectored get implementation, we separate the missing key error out. This also saves us several Box allocations in the get page implementation. ## Summary of changes * Create a caching field of layer traversal id for each of the layer. * Remove box allocations for layer traversal id retrieval and implement MissingKey error message as before. This should be a little bit faster. * Do not format error message until `Display`. * For in-mem layer, the descriptor is different before/after frozen. I'm using once lock for that. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-22 10:40:35 -04:00
Christian Schwarz	139d1346d5	pagectl draw-timeline-dir: include layer file name as an SVG comment (#7455 ) fixes https://github.com/neondatabase/neon/issues/7452 Also, drive-by improve the usage instructions with commands I found useful during that incident. The patch in the fork of `svg_fmt` is [being upstreamed](https://github.com/nical/rust_debug/pull/4), but, in the meantime, let's commit what we have because it was useful during the incident.	2024-04-22 12:55:17 +00:00
John Spray	0bd16182f7	pageserver: fix unlogged relations with sharding (#7454 ) ## Problem - #7451 INIT_FORKNUM blocks must be stored on shard 0 to enable including them in basebackup. This issue can be missed in simple tests because creating an unlogged table isn't sufficient -- to repro I had to create an _index_ on an unlogged table (then restart the endpoint). Closes: #7451 ## Summary of changes - Add a reproducer for the issue. - Tweak the condition for `key_is_shard0` to include anything that isn't a normal relation block _and_ any normal relation block whose forknum is INIT_FORKNUM. - To enable existing databases to recover from the issue, add a special case that omits relations if they were stored on the wrong INITFORK. This enables postgres to start and the user to drop the table and recreate it.	2024-04-22 11:47:24 +00:00
Anna Khanova	6a5650d40c	proxy: Make retries configurable and record it. (#7438 ) ## Problem Currently we cannot configure retries, and we don't really have visibility into what's going on there. ## Summary of changes * Added cli params * Improved logging * Decreased the number of retries: it feels like most retries don't help. Once there is better error handling, we can increase it again.	2024-04-22 11:37:22 +00:00
Joonas Koivunen	47addc15f1	relaxation: allow using layers across timelines (#7453 ) Before, we asserted that a layer would only be loaded by the timeline that initially created it. Now, with the ancestor detach, we will want to utilize remote copy as much as possible, so we will need to open other timeline layers as our own. Cc: #6994	2024-04-22 13:04:37 +03:00
Joonas Koivunen	b91c58a8bf	refactor(Timeline): simpler metadata updates (#7422 ) Currently, any `Timeline::schedule_uploads` will generate a fresh `TimelineMetadata` instead of updating the values, which it means to update. This makes it impossible for #6994 to work while `Timeline` receives layer flushes by overwriting any configured new `ancestor_timeline_id` and possible `ancestor_lsn`. The solution is to only make full `TimelineMetadata` "updates" from one place: branching. At runtime, update only the three fields, same as before in `Timeline::schedule_updates`.	2024-04-22 11:57:14 +03:00
Heikki Linnakangas	00d9c2d9a8	Make another walcraft test more robust (#7439 ) There were two issues with the test at page boundaries: 1. If the first logical message with 10 bytes payload crossed a page boundary, the calculated 'base_size' was too large because it included the page header. 2. If it was inserted near the end of a page so that there was not enough room for another one, we did "remaining_lsn += XLOG_BLCKSZ" but that didn't take into account the page headers either. As a result, the test would fail if the WAL insert position at the beginning of the test was too close to the end of a WAL page. Fix the calculations by repeating the 10-byte logical message if the starting position is not suitable. I bumped into this with PR #7377; it changed the arguments of a few SQL functions in neon_test_utils extension, which changed the WAL positions slightly, and caused a test failure. This is similar to https://github.com/neondatabase/neon/pull/7436, but for a different test.	2024-04-22 10:58:28 +03:00
Heikki Linnakangas	3a673dce67	Make test less sensitive to exact WAL positions (#7436 ) As noted in the comment, the craft_internal() function fails if the inserted WAL happens to land at page boundary. I bumped into that with PR #7377; it changed the arguments of a few SQL functions in neon_test_utils extension, which changed the WAL positions slightly, and caused a test failure.	2024-04-22 10:58:10 +03:00
Em Sharnoff	35e9fb360b	Bump vm-builder v0.23.2 -> v0.28.1 (#7433 ) Only one relevant change, from v0.28.0: - neondatabase/autoscaling#887 Double-checked with `git log neonvm/tools/vm-builder`.	2024-04-21 17:35:01 -07:00
Heikki Linnakangas	0d21187322	update rustls ## Problem `cargo deny check` is complaining about our rustls versions, causing CI to fail: ``` error[vulnerability]: `rustls::ConnectionCommon::complete_io` could fall into an infinite loop based on network input ┌─ /__w/neon/neon/Cargo.lock:395:1 │ 395 │ rustls 0.21.9 registry+https://github.com/rust-lang/crates.io-index │ ------------------------------------------------------------------- security vulnerability detected │ = ID: RUSTSEC-2024-0336 = Advisory: https://rustsec.org/advisories/RUSTSEC-2024-0336 = If a `close_notify` alert is received during a handshake, `complete_io` does not terminate. Callers which do not call `complete_io` are not affected. `rustls-tokio` and `rustls-ffi` do not call `complete_io` and are not affected. `rustls::Stream` and `rustls::StreamOwned` types use `complete_io` and are affected. = Announcement: https://github.com/rustls/rustls/security/advisories/GHSA-6g7w-8wpp-frhj = Solution: Upgrade to >=0.23.5 OR >=0.22.4, <0.23.0 OR >=0.21.11, <0.22.0 (try `cargo update -p rustls`) error[vulnerability]: `rustls::ConnectionCommon::complete_io` could fall into an infinite loop based on network input ┌─ /__w/neon/neon/Cargo.lock:396:1 │ 396 │ rustls 0.22.2 registry+https://github.com/rust-lang/crates.io-index │ ------------------------------------------------------------------- security vulnerability detected │ = ID: RUSTSEC-2024-0336 = Advisory: https://rustsec.org/advisories/RUSTSEC-2024-0336 = If a `close_notify` alert is received during a handshake, `complete_io` does not terminate. Callers which do not call `complete_io` are not affected. `rustls-tokio` and `rustls-ffi` do not call `complete_io` and are not affected. `rustls::Stream` and `rustls::StreamOwned` types use `complete_io` and are affected. 
= Announcement: https://github.com/rustls/rustls/security/advisories/GHSA-6g7w-8wpp-frhj = Solution: Upgrade to >=0.23.5 OR >=0.22.4, <0.23.0 OR >=0.21.11, <0.22.0 (try `cargo update -p rustls`) ``` ## Summary of changes `cargo update -p rustls@0.21.9 -p rustls@0.22.2`	2024-04-21 21:10:05 +01:00
Alexander Bayandin	e8a98adcd0	CI: downgrade docker/setup-buildx-action to v2 - Cleanup part for `docker/setup-buildx-action` started to fail with the following error (for no obvious reason): ``` /nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:175 throw new Error(`Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.`); ^ Error: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved. at Object.rejected (/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:175:1) at Generator.next (<anonymous>) at fulfilled (/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:29:1) ``` - Downgrade `docker/setup-buildx-action` from v3 to v2	2024-04-21 21:10:05 +01:00
John Spray	98be8b9430	storcon_cli: `tenant-warmup` command (#7432 ) ## Problem When we migrate a large existing tenant, we would like to be able to ensure it has pre-loaded layers onto a pageserver managed by the storage controller. ## Summary of changes - Add `storcon_cli tenant-warmup`, which configures the tenant into PlacementPolicy::Secondary (unless it's already attached), and then polls the secondary download API reporting progress. - Extend a test case to check that when onboarding with a secondary location pre-created, we properly use that location for our first attachment.	2024-04-19 12:32:58 +01:00
Vlad Lazar	6eb946e2de	pageserver: fix cont lsn jump on vectored read path (#7412 ) ## Problem Vectored read path may return an image that's newer than the request lsn under certain circumstances. ``` LSN ^ \| \| 500 \| ------------------------- -> branch point 400 \| X 300 \| X 200 \| ------------------------------------> requested lsn 100 \| X \|---------------------------------> Key Legend: * X - page images ``` The vectored read path inspects each ancestor timeline one by one starting from the current one. When moving into the ancestor timeline, the current code resets the current search lsn (called `cont_lsn` in code) to the lsn of the ancestor timeline ([here](`d5708e7435/pageserver/src/tenant/timeline.rs (L2971)`)). For instance, if the request lsn was 200, we would: 1. Look into the current timeline and find nothing for the key 2. Descend into the ancestor timeline and set `cont_lsn=500` 3. Return the page image at LSN 400 Myself and Christian find it very unlikely for this to have happened in prod since the vectored read path is always used at the last record lsn. This issue was found by a regress test during the work to migrate get page handling to use the vectored implementation. I've applied my fix to that wip branch and it fixed the issue. ## Summary of changes The fix is to set the current search lsn to the min between the requested LSN and the ancestor lsn. Hence, at step 2 above we would set the current search lsn to 200 and ignore the images above that. A test illustrating the bug is also included. Fails without the patch and passes with it.	2024-04-18 18:40:30 +01:00
dependabot[bot]	681a04d287	build(deps): bump aiohttp from 3.9.2 to 3.9.4 (#7429 )	2024-04-18 16:47:34 +00:00
Joonas Koivunen	3df67bf4d7	fix(Layer): metric regression with too many canceled evictions (#7363 ) #7030 introduced an annoying papercut, deeming a failure to acquire a strong reference to `LayerInner` from `DownloadedLayer::drop` as a canceled eviction. Most of the time, it wasn't that, but just timeline deletion or tenant detach with the layer not wanting to be deleted or evicted. When a Layer is dropped as part of a normal shutdown, the `Layer` is dropped first, and the `DownloadedLayer` the second. Because of this, we cannot detect eviction being canceled from the `DownloadedLayer::drop`. We can detect it from `LayerInner::drop`, which this PR adds. Test case is added which before had 1 started eviction, 2 canceled. Now it accurately finds 1 started, 1 canceled.	2024-04-18 15:27:58 +00:00
John Spray	0d8e68003a	Add a docs page for storage controller (#7392 ) ## Problem External contributors need information on how to use the storage controller. ## Summary of changes - Background content on what the storage controller is. - Deployment information on how to use it. This is not super-detailed, but should be enough for a well motivated third party to get started, with an occasional peek at the code.	2024-04-18 13:45:25 +00:00
John Spray	637ad4a638	pageserver: fix secondary download scheduling (#7396 ) ## Problem Some tenants were observed to stop doing downloads after some time ## Summary of changes - Fix a rogue `<` that was incorrectly scheduling work when `now` was _before_ the scheduling target, rather than after. This usually resulted in too-frequent execution, but could also result in never executing, if the current time has advanced ahead of `next_download` at the time we call `schedule()`. - Fix in-memory list of timelines not being amended after timeline deletion: this resulted in repeated harmless logs about the timeline being removed, and redundant calls to remove_dir_all for the timeline path. - Add a log at startup to make it easier to see a particular tenant starting in secondary mode (this is for parity with the logging that exists when spawning an attached tenant). Previously searching on tenant ID didn't provide a clear signal as to how the tenant was started during pageserver start. - Add a test that exercises secondary downloads using the background scheduling, whereas existing tests were using the API hook to invoke download directly.	2024-04-18 13:16:03 +01:00
Joonas Koivunen	8d0f701767	feat: copy delta layer prefix or "truncate" (#7228 ) For "timeline ancestor merge" or "timeline detach," we need to "cut" delta layers at particular LSN. The name "truncate" is not used as it would imply that a layer file changes, instead of what happens: we copy keys with Lsn less than a "cut point". Cc: #6994 Add the "copy delta layer prefix" operation to DeltaLayerInner, re-using some of the vectored read internals. The code is `cfg(test)` until it will be used later with a more complete integration test.	2024-04-18 10:43:04 +03:00
Anna Khanova	5191f6ef0e	proxy: Record only valid rejected events (#7415 ) ## Problem Sometimes the rejected metric might record invalid events. ## Summary of changes * Only record it if `rejected` was explicitly set. * Change order in logs. * Report metrics if not under high-load.	2024-04-18 06:09:12 +01:00
Conrad Ludgate	a54ea8fb1c	proxy: move endpoint rate limiter (#7413 ) ## Problem ## Summary of changes Rate limit for wake_compute calls	2024-04-18 06:00:33 +01:00
Anna Khanova	d5708e7435	proxy: Record role to span (#7407 ) ## Problem ## Summary of changes Add dbrole to span.	2024-04-17 14:16:11 +02:00
Anna Khanova	fd49005cb3	proxy: Improve logging (#7405 ) ## Problem It's unclear from logs what's going on with the regional redis. ## Summary of changes Make logs better.	2024-04-17 11:33:31 +00:00
Vlad Lazar	3023de156e	pageserver: demote range end fallback log (#7403 ) ## Problem This trace is emitted whenever a vectored read touches the end of a delta layer file. It's a perfectly normal case, but I expected it to be more rare when implementing the code. ## Summary of changes Demote log to debug.	2024-04-17 11:32:07 +01:00
Jure Bajic	e49e931bc4	Add for `add-help-for-timeline-arg` for `timeline` command (#7361 ) ## Problem When calling `./neon_local timeline` a confusing error message pops up: `command failed: no tenant subcommand provided` ## Summary of changes Add `add-help-for-timeline-arg` for timeline commands so when no argument for the timeline is provided help is printed.	2024-04-17 10:23:55 +01:00
Anna Khanova	13b9135d4e	proxy: Cleanup unused rate limiter (#7400 ) ## Problem There is unused dead code. ## Summary of changes Let's remove it. If we need it in the future, we can always bring it back. Also removed cli arguments. They shouldn't be used by anyone but us.	2024-04-17 11:11:49 +02:00
Alexander Bayandin	41bb1e42b8	CI(check-build-tools-image): fix getting build-tools image tag (#7402 ) ## Problem For PRs, by default, we check out a phantom merge commit (merge a branch into the main), but we use the real branch's head when finding the `build-tools` image tag. ## Summary of changes - Change `COMMIT_SHA` to use `${{ github.sha }}` instead of `${{ github.event.pull_request.head.sha }}` for PRs ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-04-17 09:50:58 +01:00
Alex Chi Z	cb4b40f9c1	chore(compute_ctl): add error context to apply_spec (#7374 ) Make it faster to identify which part of apply spec goes wrong by adding an error context. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-17 09:11:04 +03:00
Alex Chi Z	9e567d9814	feat(neon_local): support listen addr for safekeeper (#7328 ) Leftover from my LFC benchmarks. Safekeepers only listen on `127.0.0.1` for `neon_local`. This pull request adds support for listening on other address. To specify a custom address, modify `.neon/config`. ``` [[safekeepers]] listen_addr = "192.168.?.?" ``` Endpoints created by neon_local still use 127.0.0.1 and I will fix them later. I didn't fix it in the same pull request because my benchmark setting does not use neon_local to create compute nodes so I don't know how to fix it yet -- maybe replacing a few `127.0.0.1`s. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-17 09:10:01 +03:00
Vlad Lazar	1c012958c7	pageserver/http: remove status code boilerplate from swagger spec (#7385 ) ## Problem We specify a bunch of possible error codes in the pageserver api swagger spec. This is error prone and annoying to work with. https://github.com/neondatabase/cloud/pull/11907 introduced generic error handling on the control plane side, so we can now clean up the spec. ## Summary of changes * Remove generic error codes from swagger spec * Update a couple route handlers which would previously return an error without a `msg` field in the response body. Tested via https://github.com/neondatabase/cloud/pull/12340 Related https://github.com/neondatabase/cloud/issues/7238	2024-04-16 16:24:09 +01:00
Conrad Ludgate	e5c50bb12b	proxy: rate limit authentication by masked IPv6. (#7316 ) ## Problem Many users have access to ipv6 subnets (eg a /64). That gives them 2^64 addresses to play with ## Summary of changes Truncate the address to /64 to reduce the attack surface. Todo: ~~Will NAT64 be an issue here? AFAIU they put the IPv4 address at the end of the IPv6 address. By truncating we will lose all that detail.~~ It's the same problem as a host sharing IPv6 addresses between clients. I don't think it's up to us to solve. If a customer is getting DDoSed, then they likely need to arrange a dedicated IP with us.	2024-04-16 14:16:34 +00:00
John Spray	926662eb7c	storage_controller: suppress misleading log (#7395 ) ## Problem - https://github.com/neondatabase/neon/issues/7355 The optimize_secondary function calls schedule_shard to check for improvements, but if there are exactly the same number of nodes as there are replicas of the shard, it emits some scary looking logs about no nodes being eligible. Closes https://github.com/neondatabase/neon/issues/7355 ## Summary of changes - Add a mode to SchedulingContext that controls logging: this should be useful in future any time we add a log to the scheduling path, to avoid it becoming a source of spam when the scheduler is called during optimization.	2024-04-16 12:41:48 +00:00
John Spray	3366cd34ba	pageserver: return ACCEPTED when deletion already in flight (#7384 ) ## Problem test_sharding_smoke recently got an added section that checks deletion of a sharded tenant. The storage controller does a retry loop for deletion, waiting for a 404 response. When deletion is a bit slow (debug builds), the retry of deletion was getting a 500 response -- this caused the test to become flaky (example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/release-proxy/8659801445/index.html#testresult/b4cbf5b58190f60e/retries) There was a false comment in the code: ``` match tenant.current_state() { TenantState::Broken { .. } \| TenantState::Stopping { .. } => { - // If a tenant is broken or stopping, DeleteTenantFlow can - // handle it: broken tenants proceed to delete, stopping tenants - // are checked for deletion already in progress. ``` If the tenant is stopping, DeleteTenantFlow does not in fact handle it, but returns a 500-yielding error. ## Summary of changes Before calling into DeleteTenantFlow, if the tenant is in stopping\|broken state then return 202 if a deletion is in progress. This makes the API friendlier for retries. The historic AlreadyInProgress (409) response still exists for if we enter DeleteTenantFlow and unexpectedly see the tenant stopping. That should go away when we implement #5080 . For the moment, callers that handle 409s should continue to do so.	2024-04-16 09:39:18 +01:00
Christian Schwarz	2d5a8462c8	add `async` walredo mode (disabled-by-default, opt-in via config) (#6548 ) Before this PR, the `nix::poll::poll` call would stall the executor. This PR refactors the `walredo::process` module to allow for different implementations, and adds a new `async` implementation which uses `tokio::process::ChildStd{in,out}` for IPC. The `sync` variant remains the default for now; we'll do more testing in staging and gradual rollout to prod using the config variable. Performance ----------- I updated `bench_walredo.rs`, demonstrating that a single `async`-based walredo manager used by N=1...128 tokio tasks has lower latency and higher throughput. I further did manual less-micro-benchmarking in the real pageserver binary. Methodology & results are published here: https://neondatabase.notion.site/2024-04-08-async-walredo-benchmarking-8c0ed3cc8d364a44937c4cb50b6d7019?pvs=4 tl;dr: - use pagebench against a pageserver patched to answer getpage request & small-enough working set to fit into PS PageCache / kernel page cache. - compare knee in the latency/throughput curve - N tenants, each 1 pagebench clients - sync better throughput at N < 30, async better at higher N - async generally noticeable but not much worse p99.X tail latencies - eyeballing CPU efficiency in htop, `async` seems significantly more CPU efficient at ca N=[0.5ncpus, 1.5ncpus], worse than `sync` outside of that band Mental Model For Walredo & Scheduler Interactions ------------------------------------------------- Walredo is CPU-/DRAM-only work. This means that as soon as the Pageserver writes to the pipe, the walredo process becomes runnable. To the Linux kernel scheduler, the `$ncpus` executor threads and the walredo process thread are just `struct task_struct`, and it will divide CPU time fairly among them. 
In `sync` mode, there are always `$ncpus` runnable `struct task_struct` because the executor thread blocks while `walredo` runs, and the executor thread becomes runnable when the `walredo` process is done handling the request. In `async` mode, the executor threads remain runnable unless there are no more runnable tokio tasks, which is unlikely in a production pageserver. The above means that in `sync` mode, there is an implicit concurrency limit on concurrent walredo requests (`$num_runtimes * $num_executor_threads_per_runtime`). And executor threads do not compete in the Linux kernel scheduler for CPU time, due to the blocked-runnable-ping-pong. In `async` mode, there is no concurrency limit, and the walredo tasks compete with the executor threads for CPU time in the kernel scheduler. If we're not CPU-bound, `async` has a pipelining and hence throughput advantage over `sync` because one executor thread can continue processing requests while a walredo request is in flight. If we're CPU-bound, under a fair CPU scheduler, the fixed number of executor threads has to share CPU time with the aggregate of walredo processes. It's trivial to reason about this in `sync` mode due to the blocked-runnable-ping-pong. In `async` mode, at 100% CPU, the system arrives at some (potentially sub-optiomal) equilibrium where the executor threads get just enough CPU time to fill up the remaining CPU time with runnable walredo process. Why `async` mode Doesn't Limit Walredo Concurrency -------------------------------------------------- To control that equilibrium in `async` mode, one may add a tokio semaphore to limit the number of in-flight walredo requests. However, the placement of such a semaphore is non-trivial because it means that tasks queuing up behind it hold on to their request-scoped allocations. In the case of walredo, that might be the entire reconstruct data. We don't limit the number of total inflight Timeline::get (we only throttle admission). 
So, that queue might lead to an OOM. The alternative is to acquire the semaphore permit before collecting reconstruct data. However, what if we need to on-demand download? A combination of semaphores might help: one for reconstruct data, one for walredo. The reconstruct data semaphore permit is dropped after acquiring the walredo semaphore permit. This scheme effectively enables both a limit on in-flight reconstruct data and walredo concurrency. However, sizing the amount of permits for the semaphores is tricky: - Reconstruct data retrieval is a mix of disk IO and CPU work. - If we need to do on-demand downloads, it's network IO + disk IO + CPU work. - At this time, we have no good data on how the wall clock time is distributed. It turns out that, in my benchmarking, the system worked fine without a semaphore. So, we're shipping async walredo without one for now. Future Work ----------- We will do more testing of `async` mode and gradual rollout to prod using the config flag. Once that is done, we'll remove `sync` mode to avoid the temporary code duplication introduced by this PR. The flag will be removed. The `wait()` for the child process to exit is still synchronous; the comment [here]( `655d3b6468/pageserver/src/walredo.rs (L294-L306)`) is still a valid argument in favor of that. The `sync` mode had another implicit advantage: from tokio's perspective, the calling task was using up coop budget. But with `async` mode, that's no longer the case -- to tokio, the writes to the child process pipe look like IO. We could/should inform tokio about the CPU time budget consumed by the task to achieve fairness similar to `sync`. However, the [runtime function for this is `tokio_unstable`](`https://docs.rs/tokio/latest/tokio/task/fn.consume_budget.html). Refs ---- refs #6628 refs https://github.com/neondatabase/neon/issues/2975	2024-04-15 22:14:42 +02:00
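The two-semaphore handoff that commit 2d5a8462c8 discusses (and explicitly does not ship) can be sketched as below. This is an illustration under assumptions: it uses a small blocking semaphore built on `std::sync` as a stand-in for an async tokio semaphore, and `Semaphore`, `Permit` and `get_page` are hypothetical names, not pageserver APIs.

```rust
use std::sync::{Condvar, Mutex};

/// Minimal blocking counting semaphore (stand-in for an async semaphore).
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// A held permit; dropping it returns the permit to its semaphore.
pub struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut n = self.permits.lock().unwrap();
        while *n == 0 {
            n = self.available.wait(n).unwrap();
        }
        *n -= 1;
        Permit { sem: self }
    }

    pub fn available_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.available.notify_one();
    }
}

/// Hypothetical request path: hold a reconstruct-data permit while collecting
/// data, and release it only once the walredo permit has been acquired.
/// Returns the free permits observed while the walredo request is "in flight".
pub fn get_page(reconstruct: &Semaphore, walredo: &Semaphore) -> (usize, usize) {
    let reconstruct_permit = reconstruct.acquire();
    // ... collect reconstruct data here ...
    let _walredo_permit = walredo.acquire();
    drop(reconstruct_permit); // handoff: reconstruct slot is free again
    // ... walredo request would run here ...
    (reconstruct.available_permits(), walredo.available_permits())
}
```

The handoff bounds both the amount of reconstruct data held in memory and the number of concurrent walredo requests; as the commit message notes, choosing the two permit counts is the hard part.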
Anna Khanova	110282ee7e	proxy: Exclude private ip errors from recorded metrics (#7389 ) ## Problem Right now we record errors from internal VPC. ## Summary of changes * Exclude it from the metrics. * Simplify pg-sni-router	2024-04-15 20:21:50 +02:00
Christian Schwarz	f752c40f58	storage release: stop using no-op deployProxy / deployPgSniRouter (#7382 ) As of https://github.com/neondatabase/aws/pull/1264 these options are no-ops. This PR unblocks removal of the variables in https://github.com/neondatabase/aws/pull/1263	2024-04-15 15:05:44 +02:00
John Spray	83cdbbb89a	pageserver: improve readability of shard.rs (#7330 ) No functional changes, this is a comments/naming PR. While merging sharding changes, some cleanup of the shard.rs types was deferred. In this PR: - Rename `is_zero` to `is_shard_zero` to make clear that this method doesn't literally mean that the entire object is zeros, just that it refers to the 0th shard in a tenant. - Pull definitions of types to the top of shard.rs and add a big comment giving an overview of which type is for what. Closes: https://github.com/neondatabase/neon/issues/6072	2024-04-15 11:50:26 +01:00
dependabot[bot]	5288f9621e	build(deps): bump idna from 3.3 to 3.7 (#7367 )	2024-04-12 10:15:40 +01:00
Tristan Partin	e8338c60f9	Fix typo in pg_ctl shutdown mode (#7365 ) The allowed modes as of Postgres 17 are: smart, fast, and immediate. $ cargo neon stop Finished dev [unoptimized + debuginfo] target(s) in 0.24s Running `target/debug/neon_local stop` postgres stop failed: pg_ctl failed, exit code: exit status: 1, stdout: , stderr: pg_ctl: unrecognized shutdown mode "fast " Try "pg_ctl --help" for more information.	2024-04-11 23:42:18 -05:00
Alexander Bayandin	94505fd672	CI: speed up Allure reports upload (#7362 ) ## Problem `create-test-report` job takes more than 8 minutes, the longest step is uploading Allure report to S3: Before: ``` + aws s3 cp --recursive --only-show-errors /tmp/pr-7362-1712847045/report s3://neon-github-public-dev/reports/pr-7362/8647730612 real 6m10.572s user 6m37.717s sys 1m9.429s ``` After: ``` + s5cmd --log error cp '/tmp/pr-7362-1712858221/report/*' s3://neon-github-public-dev/reports/pr-7362/8650636861/ real 0m9.698s user 1m9.438s sys 0m6.419s ``` ## Summary of changes - Add `s5cmd`(https://github.com/peak/s5cmd) to build-tools image - Use `s5cmd` instead of `aws s3` for uploading Allure reports	2024-04-11 23:35:30 +01:00
Conrad Ludgate	e92fb94149	proxy: fix overloaded db connection closure (#7364 ) ## Problem It is possible for the database connections to not close in time. ## Summary of changes Force the closing of connections if the client has hung up.	2024-04-11 20:55:05 +00:00
Anna Khanova	40f15c3123	Read cplane events from regional redis (#7352 ) ## Problem Actually read redis events. ## Summary of changes This is a revert of https://github.com/neondatabase/neon/pull/7350 + fixes. * Fixed events parsing * Added timeout after connection failure * Separated regional and global redis clients.	2024-04-11 18:24:34 +00:00
Conrad Ludgate	5299f917d6	proxy: replace prometheus with measured (#6717 ) ## Problem My benchmarks show that prometheus is not very good. https://github.com/conradludgate/measured We're already using it in storage_controller and it seems to be working well. ## Summary of changes Replace prometheus with my new measured crate in proxy only. Apologies for the large diff. I tried to keep it as minimal as I could. The label types add a bit of boiler plate (but reduce the chance we mistype the labels), and some of our custom metrics like CounterPair and HLL needed to be rewritten.	2024-04-11 16:26:01 +00:00
Alexander Bayandin	99a56b5606	CI(build-build-tools-image): Do not cancel concurrent workflows (#7226 ) ## Problem The `build-build-tools-image` workflow is designed to run as only one instance across the whole repository. Currently, the job gets cancelled if a newer one is scheduled, here's an example: https://github.com/neondatabase/neon/actions/runs/8419610607 ## Summary of changes - Explicitly set `cancel-in-progress: false` for all jobs that aren't supposed to be cancelled	2024-04-11 15:23:08 +01:00
John Spray	1628b5b145	compute hook: use shared client with explicit timeout (#7359 ) ## Problem We are seeing some mysterious long waits when sending requests. ## Summary of changes - To eliminate risk that we are incurring some unreasonable overheads from setup, e.g. DNS, use a single Client (internally a pool) instead of repeatedly constructing a fresh one. - To make it clearer where a timeout is occurring, apply a 10 second timeout to requests as we send them.	2024-04-11 14:14:09 +00:00
Arthur Petukhovsky	db72543f4d	Reenable test_forward_compatibility (#7358 ) It was disabled due to https://github.com/neondatabase/neon/pull/6530 breaking forward compatibility. Now that we have deployed it to production, we can reenable the test	2024-04-11 12:31:27 +02:00
Konstantin Knizhnik	d47e4a2a41	Remember last written LSN when it is first requested (#7343 ) ## Problem See https://neondb.slack.com/archives/C03QLRH7PPD/p1712529369520409 In the case of statements such as CREATE TABLE AS SELECT... or INSERT FROM SELECT... we fetch data from the source table and store it in the destination table. This causes problems with prefetch if the last-written LSN is not known for the pages of the source table (which happens, for example, after a compute restart). In this case we get the global value of last-written-lsn, which changes frequently because we are writing pages of the destination table. As a result, the request-lsn used for the prefetch and the request-lsn used when the page is actually needed are different, and we get an expired prefetch request. So it effectively disarms prefetch. ## Summary of changes The proposed simple patch stores the last-written LSN for the page when it is not found. So the next time we request the last-written LSN for this page, we will get the same value (provided the page was not changed). ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-04-11 07:47:45 +03:00
Em Sharnoff	f86845f64b	compute_ctl: Auto-set dynamic_shared_memory_type (#7348 ) Part of neondatabase/cloud#12047. The basic idea is that for our VMs, we want to enable swap and disable Linux memory overcommit. Alongside these, we should set postgres' dynamic_shared_memory_type to mmap, but we want to avoid setting it to mmap if swap is not enabled. Implementing this in the control plane would be fiddly, but it's relatively straightforward to add to compute_ctl.	2024-04-10 13:13:48 +00:00
Anna Khanova	0bb04ebe19	Revert "Proxy read ids from redis (#7205 )" (#7350 ) This reverts commit `dbac2d2c47`. ## Problem Proxy pods fail to install in k8s clusters, blocking the cplane release. ## Summary of changes Revert	2024-04-10 10:12:55 +00:00
Anna Khanova	5efe95a008	proxy: fix credentials cache lookup (#7349 ) ## Problem Incorrect processing of `-pooler` connections. ## Summary of changes Fix TODO: add e2e tests for caching	2024-04-10 08:30:09 +00:00
Conrad Ludgate	c0ff4f18dc	proxy: hyper1 for only proxy (#7073 ) ## Problem hyper1 offers control over the HTTP connection that hyper0_14 does not. We're blocked on switching all services to hyper1 because of how we use tonic, but no reason we can't switch proxy over. ## Summary of changes 1. hyper0.14 -> hyper1 1. self managed server 2. Remove the `WithConnectionGuard` wrapper from `protocol2` 2. Remove TLS listener as it's no longer necessary 3. include first session ID in connection startup logs	2024-04-10 08:23:59 +00:00
Arpad Müller	fd88d4608c	Add command to time travel recover prefixes (#7322 ) Adds another tool to the DR toolbox: ability in pagectl to recover arbitrary prefixes in remote storage. Requires remote storage config, the prefix, and the travel-to timestamp parameter to be specified as cli args. The done-if-after parameter is also supported. Example invocation (after `aws login --profile dev`): ``` RUST_LOG=remote_storage=debug AWS_PROFILE=dev cargo run -p pagectl time-travel-remote-prefix 'remote_storage = { bucket_name = "neon-test-bucket-name", bucket_region = "us-east-2" }' wal/3aa8fcc61f6d357410b7de754b1d9001/641e5342083b2235ee3deb8066819683/ 2024-04-05T17:00:00Z ``` This has been written to resolve a customer recovery case: https://neondb.slack.com/archives/C033RQ5SPDH/p1712256888468009 There is validation of the prefix to prevent accidentally specifying overly generic prefixes, which can cause corruption and data loss if used wrongly. Still, the validation is not perfect and it is important that the command is used with caution. If possible, `time_travel_remote_storage` should be used instead which has additional checks in place.	2024-04-10 09:12:07 +02:00
Vlad Lazar	221414de4b	pageserver: time based rolling based on the first write timestamp (#7346 ) Problem Currently, we base our time based layer rolling decision on the last time we froze a layer. This means that if we roll a layer and then go idle for longer than the checkpoint timeout the next layer will be rolled after the first write. This is of course not desirable. Summary of changes Record the timepoint of the first write to an open layer and use that for time based layer rolling decisions. Note that I had to keep `Timeline::last_freeze_ts` for the sharded tenant disk consistent lsn skip hack. Fixes #7241	2024-04-10 06:31:28 +01:00
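The rolling rule from commit 221414de4b (decide based on the first write to the open layer, not on the last freeze) can be sketched like this. It is a simplified illustration: `OpenLayer`, `record_write` and `should_roll` are hypothetical names, and the real pageserver logic also takes other inputs into account.

```rust
use std::time::{Duration, Instant};

/// Hypothetical open (in-memory) layer that remembers when it was first written to.
pub struct OpenLayer {
    first_write_at: Option<Instant>,
}

impl OpenLayer {
    pub fn new() -> Self {
        OpenLayer { first_write_at: None }
    }

    /// Record a write; only the first write's timestamp is kept.
    pub fn record_write(&mut self, now: Instant) {
        if self.first_write_at.is_none() {
            self.first_write_at = Some(now);
        }
    }

    /// Roll once the layer has been open for writes for at least the timeout.
    /// A layer that has never been written to is never rolled on time alone.
    pub fn should_roll(&self, now: Instant, checkpoint_timeout: Duration) -> bool {
        match self.first_write_at {
            Some(first) => now.duration_since(first) >= checkpoint_timeout,
            None => false,
        }
    }
}
```

With this rule, a long idle period before the first write no longer causes the layer to be rolled immediately after that write.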
Anna Khanova	dbac2d2c47	Proxy read ids from redis (#7205 ) ## Problem Proxy doesn't know about existing endpoints. ## Summary of changes * Added caching of all available endpoints. * Under high load, use it before going to cplane. * Report metrics for the outcome. * For rate limiter and credentials caching don't distinguish between `-pooled` and not TODOs: * Make metrics more meaningful * Consider integrating it with the endpoint rate limiter * Test it together with cplane in preview	2024-04-10 02:40:14 +02:00
Alexander Bayandin	4f4f787119	Update staging hostname (#7347 ) ## Problem ``` Could not resolve host: console.stage.neon.tech ``` ## Summary of changes - replace `console.stage.neon.tech` with `console-stage.neon.build`	2024-04-09 12:03:46 +01:00
Alexander Bayandin	bcab344490	CI(flaky-tests): remove outdated restriction (#7345 ) ## Problem After switching the default pageserver io-engine to `tokio-epoll-uring` on CI, we tuned a query that finds flaky tests (in https://github.com/neondatabase/neon/pull/7077). It has been almost a month since then, additional query tuning is not required anymore. ## Summary of changes - Remove extra condition from flaky tests query - Also return back parameterisation to the query	2024-04-09 10:50:43 +01:00
Conrad Ludgate	f212630da2	update measured with some more convenient features (#7334 ) ## Problem Some awkwardness in the measured API. Missing process metrics. ## Summary of changes Update measured to use the new convenience setup features. Added measured-process lib. Added measured support for libmetrics	2024-04-08 18:01:41 +00:00
Kevin Mingtarja	a306d0a54b	implement Serialize/Deserialize for SystemTime with RFC3339 format (#7203 ) ## Problem We have two places that use a helper (`ser_rfc3339_millis`) to get serde to stringify SystemTimes into the desired format. ## Summary of changes Created a new module `utils::serde_system_time` and inside it a wrapper type `SystemTime` for `std::time::SystemTime` that serializes/deserializes to the RFC3339 format. This new type is then used in the two places that were previously using the helper for serialization, thereby eliminating the need to decorate structs. Closes #7151.	2024-04-08 15:53:07 +01:00
Christian Schwarz	1081a4d246	pageserver: option to run with just one tokio runtime (#7331 ) This PR is an off-by-default revision v2 of the (since-reverted) PR #6555 / commit `3220f830b7fbb785d6db8a93775f46314f10a99b`. See that PR for details on why running with a single runtime is desirable and why we should be ready. We reverted #6555 because it showed regressions in prodlike cloudbench, see the revert commit message `ad072de4209193fd21314cf7f03f14df4fa55eb1` for more context. This PR makes it an opt-in choice via an env var. The default is to use the 4 separate runtimes that we have today, there shouldn't be any performance change. I tested manually that the env var & added metric works. ``` # undefined env var => no change to before this PR, uses 4 runtimes ./target/debug/neon_local start # defining the env var enables one-runtime mode, value defines that one runtime's configuration NEON_PAGESERVER_USE_ONE_RUNTIME=current_thread ./target/debug/neon_local start NEON_PAGESERVER_USE_ONE_RUNTIME=multi_thread:1 ./target/debug/neon_local start NEON_PAGESERVER_USE_ONE_RUNTIME=multi_thread:2 ./target/debug/neon_local start NEON_PAGESERVER_USE_ONE_RUNTIME=multi_thread:default ./target/debug/neon_local start ``` I want to use this change to do more manual testing and potentially testing in staging. Future Work ----------- Testing / deployment ergonomics would be better if this were a variable in `pageserver.toml`. It can be done, but, I don't need it right now, so let's stick with the env var.	2024-04-08 16:27:08 +02:00
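The `NEON_PAGESERVER_USE_ONE_RUNTIME` values listed in commit 1081a4d246 suggest a small grammar: `current_thread`, `multi_thread:<n>`, or `multi_thread:default`. A parser for that grammar might look like the sketch below. `OneRuntimeMode` and `parse_one_runtime` are hypothetical names, and rejecting a worker count of zero is an assumption rather than documented behaviour.

```rust
/// Hypothetical parsed form of the env var value.
#[derive(Debug, PartialEq, Eq)]
pub enum OneRuntimeMode {
    CurrentThread,
    /// `None` means "use the runtime's default worker count".
    MultiThread { workers: Option<usize> },
}

/// Parse the raw env var value. `None` (variable unset) means one-runtime
/// mode is disabled and the separate runtimes are used.
pub fn parse_one_runtime(value: Option<&str>) -> Result<Option<OneRuntimeMode>, String> {
    let v = match value {
        Some(v) => v,
        None => return Ok(None),
    };
    if v == "current_thread" {
        return Ok(Some(OneRuntimeMode::CurrentThread));
    }
    match v.strip_prefix("multi_thread:") {
        Some("default") => Ok(Some(OneRuntimeMode::MultiThread { workers: None })),
        Some(n) => match n.parse::<usize>() {
            // Assumption: zero workers is treated as invalid.
            Ok(workers) if workers > 0 => {
                Ok(Some(OneRuntimeMode::MultiThread { workers: Some(workers) }))
            }
            _ => Err(format!("invalid worker count: {n:?}")),
        },
        None => Err(format!("unrecognized one-runtime mode: {v:?}")),
    }
}
```

In a real binary the input would come from something like `std::env::var("NEON_PAGESERVER_USE_ONE_RUNTIME").ok()`.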
Arpad Müller	47b705cffe	Remove async_trait from CompactionDeltaLayer (#7342 ) Removes usage of async_trait from the `CompactionDeltaLayer` trait. Split off from #7301 Related earlier work: https://github.com/neondatabase/neon/pull/6305, https://github.com/neondatabase/neon/pull/6464, https://github.com/neondatabase/neon/pull/7303	2024-04-08 14:59:08 +02:00
Christian Schwarz	2d3c9f0d43	refactor(pageserver): use tokio::signal instead of spawn_blocking (#7332 ) It's just unnecessary to use spawn_blocking there, and with https://github.com/neondatabase/neon/pull/7331 , it will result in really just one executor thread when enabling one-runtime with current_thread executor.	2024-04-08 09:35:32 +00:00
Joonas Koivunen	21b3e1d13b	fix(utilization): return used as does df (#7337 ) We can currently underflow `pageserver_resident_physical_size_global`, so the used disk bytes would show `u63::MAX` by mistake. The assumption of the API (and the documented behavior) was to give the layer files disk usage. Switch to reporting numbers that match `df` output. Fixes: #7336	2024-04-08 09:01:38 +03:00
John Spray	0788760451	tests: further stabilize test_deletion_queue_recovery (#7335 ) This is the other main failure mode called out in #6092 , that the test can shut down the pageserver while it has "future layers" in the index, and that this results in unexpected stats after restart. We can avoid this nondeterminism by shutting down the endpoint, flushing everything from SK to PS, checkpointing, and then waiting for that final LSN to be uploaded. This is more heavyweight than most of our tests require, but useful in the case of tests that expect a particular behavior after restart wrt layer deletions.	2024-04-07 21:21:18 +00:00
John Spray	74b2314a5d	control_plane: revise compute_hook locking (don't serialise all calls) (#7088 ) ## Problem - Previously, an async mutex was held for the duration of `ComputeHook::notify`. This served multiple purposes: - Ensure updates to a given tenant are sent in the proper order - Prevent concurrent calls into neon_local endpoint updates in test environments (neon_local is not safe to call concurrently) - Protect the inner ComputeHook::state hashmap that is used to calculate when to send notifications. This worked, but had the major downside that while we're waiting for a compute hook request to the control plane to succeed, we can't notify about any other tenants. Notifications block progress of live migrations, so this is a problem. ## Summary of changes - Protect `ComputeHook::state` with a sync lock instead of an async lock - Use a separate async lock ( `ComputeHook::neon_local_lock` ) for preventing concurrent calls into neon_local, and only take this in the neon_local code path. - Add per-tenant async locks in ShardedComputeHookTenant, and use these to ensure that only one remote notification can be sent at once per tenant. If several shards update concurrently, their updates will be coalesced. - Add an explicit semaphore that limits concurrency of calls into the cloud control plane.	2024-04-06 19:51:59 +00:00
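The locking layout described in commit 74b2314a5d (a short-lived sync lock around the shared map, plus one lock per tenant held for the duration of a notification) can be sketched as follows. This is an approximation under assumptions: the per-tenant locks in the commit are async, whereas this sketch uses `std::sync::Mutex` throughout, and `NotifyLocks`, `lock_for` and the `u64` tenant id are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical tenant identifier, simplified to an integer.
pub type TenantId = u64;

/// The outer mutex is held only long enough to look up or create a
/// per-tenant lock; it is never held while a notification is being sent.
#[derive(Default)]
pub struct NotifyLocks {
    per_tenant: Mutex<HashMap<TenantId, Arc<Mutex<()>>>>,
}

impl NotifyLocks {
    /// Return the lock that serialises notifications for one tenant.
    pub fn lock_for(&self, tenant: TenantId) -> Arc<Mutex<()>> {
        let mut map = self.per_tenant.lock().unwrap();
        map.entry(tenant).or_default().clone()
    }
}
```

A slow notification for one tenant then blocks only later notifications for that same tenant, not notifications for other tenants.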
Christian Schwarz	edcaae6290	fixup: PR #7319 defined workload.py `def stop()` twice (#7333 ) Somehow it made it through CI.	2024-04-05 19:11:04 +00:00
John Spray	4fc95d2d71	pageserver: apply shard filtering to blocks ingested during initdb (#7319 ) ## Problem Ingest filtering wasn't being applied to timeline creations, so a timeline created on a sharded tenant would use 20MB+ on each shard (each shard got a full copy). This didn't break anything, but is inefficient and leaves the system in a harder-to-validate state where shards initially have some data that they will eventually drop during compaction. Closes: https://github.com/neondatabase/neon/issues/6649 ## Summary of changes - in `import_rel`, filter block-by-block with is_key_local - During test_sharding_smoke, check that per-shard physical sizes are as expected - Also extend the test to check deletion works as expected (this was an outstanding tech debt task)	2024-04-05 18:07:35 +01:00
John Spray	534c099b42	tests: improve stability of `test_deletion_queue_recovery` (#7325 ) ## Problem As https://github.com/neondatabase/neon/issues/6092 points out, this test was (ab)using a failpoint!() with 'pause', which was occasionally causing index uploads to get hung on a stuck executor thread, resulting in timeouts waiting for remote_consistent_lsn. That is one of several failure modes, but by far the most frequent. ## Summary of changes - Replace the failpoint! with a `sleep_millis_async`, which is not only async but also supports clean shutdown. - Improve debugging: log the consistent LSN when scheduling an index upload - Tidy: remove an unnecessary checkpoint in the test code, where last_flush_lsn_upload had just been called (this does a checkpoint internally)	2024-04-05 18:01:31 +01:00
John Spray	ec01292b55	storage controller: rename TenantState to TenantShard (#7329 ) This is a widely used type that had a misleading name: it's not the total state of a tenant, but represents one shard.	2024-04-05 16:29:53 +00:00
John Spray	66fc465484	Clean up 'attachment service' names to storage controller (#7326 ) The binary etc were renamed some time ago, but the path in the source tree remained "attachment_service" to avoid disruption to ongoing PRs. There aren't any big PRs out right now, so it's a good time to cut over. - Rename `attachment_service` to `storage_controller` - Move it to the top level for symmetry with `storage_broker` & to avoid mixing the non-prod neon_local stuff (`control_plane/`) with the storage controller which is a production component.	2024-04-05 16:18:00 +01:00
Conrad Ludgate	55da8eff4f	proxy: report metrics based on cold start info (#7324 ) ## Problem Would be nice to have a bit more info on cold start metrics. ## Summary of changes * Change connect compute latency to include `cold_start_info`. * Update `ColdStartInfo` to include HttpPoolHit and WarmCached. * Several changes to make more use of interned strings	2024-04-05 16:14:50 +01:00
Arpad Müller	0fa517eb80	Update test-context dependency to 0.3 (#7303 ) Updates the `test-context` dev-dependency of the `remote_storage` crate to 0.3. This removes a lot of `async_trait` instances. Related earlier work: #6305, #6464	2024-04-05 15:53:29 +02:00
Arthur Petukhovsky	8ceb4f0a69	Fix partial zero segment upload (#7318 ) Found these logs on staging safekeepers: ``` INFO Partial backup{ttid=X/Y}: failed to upload 000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial: Failed to open file "/storage/safekeeper/data/X/Y/000000010000000000000000.partial" for wal backup: No such file or directory (os error 2) INFO Partial backup{ttid=X/Y}:upload{name=000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial}: starting upload PartialRemoteSegment { status: InProgress, name: "000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial", commit_lsn: 0/0, flush_lsn: 0/0, term: 173 } ``` This is because partial backup tries to upload zero segment when there is no data in timeline. This PR fixes this bug introduced in #6530.	2024-04-05 11:48:08 +01:00
John Spray	6019ccef06	tests: extend log allow list in test_storcon_cli (#7321 ) This test was occasionally flaky: it already allowed the log for the scheduler complaining about Stop state, but not the log for maybe_reconcile complaining.	2024-04-05 11:44:15 +01:00
John Spray	0c6367a732	storage controller: fix repeated location_conf returning no shards (#7314 ) ## Problem When a location_conf request was repeated with no changes, we failed to build the list of shards in the result. ## Summary of changes Remove conditional that only generated a list of updates if something had really changed. This does some redundant database updates, but it is preferable to having a whole separate code path for no-op changes. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-04-04 17:34:05 +00:00
John Spray	e17bc6afb4	pageserver: update mgmt_api to use TenantShardId (#7313 ) ## Problem The API client was written around the same time as some of the server APIs changed from TenantId to TenantShardId Closes: https://github.com/neondatabase/neon/issues/6154 ## Summary of changes - Refactor mgmt_api timeline_info and keyspace methods to use TenantShardId to match the server This doesn't make pagebench sharding aware, but it paves the way to do so later.	2024-04-04 18:23:45 +01:00
John Spray	ac7fc6110b	pageserver: handle WAL gaps on sharded tenants (#6788 ) ## Problem In the test for https://github.com/neondatabase/neon/pull/6776, a test case uses tiny layer sizes and tiny stripe sizes. This hits a scenario where a shard's checkpoint interval spans a region where none of the content in the WAL is ingested by this shard. Since there is no layer to flush, we do not advance disk_consistent_lsn, and this causes the test to fail while waiting for LSN to advance. ## Summary of changes - Pass an LSN through `layer_flush_start_tx`. This is the LSN to which we have frozen at the time we ask the flush to flush layers frozen up to this point. - In the layer flush task, if the layers we flush do not reach `frozen_to_lsn`, then advance disk_consistent_lsn up to this point. - In `maybe_freeze_ephemeral_layer`, handle the case where last_record_lsn has advanced without writing a layer file: this ensures that disk_consistent_lsn and remote_consistent_lsn advance anyway. The net effect is that the disk_consistent_lsn is allowed to advance past regions in the WAL where a shard ingests no data, and that we uphold our guarantee that remote_consistent_lsn always eventually reaches the tip of the WAL. The case of no layer at all is hard to test at present due to >0 shards being polluted with SLRU writes, but I have tested it locally with a branch that disables SLRU writes on shards >0. We can tighten up the testing on this in future as/when we refine shard filtering (currently shards >0 need the SLRU because they use it to figure out cutoff in GC using timestamp-to-lsn).	2024-04-04 16:54:38 +00:00
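The advancement rule from commit ac7fc6110b can be reduced to a small pure function. This is a sketch, not the flush task itself: `Lsn` is simplified to a wrapper around `u64`, and `next_disk_consistent_lsn` and its parameters are hypothetical names.

```rust
/// Simplified log sequence number (assumption: a plain ordered integer).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

/// Compute the new disk_consistent_lsn after a flush request.
///
/// `flushed_up_to` is the end LSN of the layers actually written, or `None`
/// if this shard ingested nothing in the range. Either way the result
/// reaches `frozen_to_lsn`, and it never moves backwards.
pub fn next_disk_consistent_lsn(
    current: Lsn,
    flushed_up_to: Option<Lsn>,
    frozen_to_lsn: Lsn,
) -> Lsn {
    let flushed = flushed_up_to.unwrap_or(current);
    current.max(flushed).max(frozen_to_lsn)
}
```

A shard that ingested nothing between LSN 100 and 250 still reports 250 afterwards, which is what lets remote_consistent_lsn eventually reach the tip of the WAL.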
John Spray	862a6b7018	pageserver: timeout on deletion queue flush in timeline deletion (#7315 ) Some time ago, we had an issue where a deletion queue hang was also causing timeline deletions to hang. This was unnecessary because the timeline deletion doesn't _need_ to flush the deletion queue, it just does it as a pleasantry to make the behavior easier to understand and test. In this PR, we wrap the flush calls in a 10 second timeout (typically the flush takes milliseconds) so that in the event of issues with the deletion queue, timeline deletions are slower but not entirely blocked. Closes: https://github.com/neondatabase/neon/issues/6440	2024-04-04 17:51:44 +01:00
Christian Schwarz	4810c22607	fix(walredo spawn): coalescing stalls other executors std::sync::RwLock (#7310 ) part of #6628 Before this PR, we used a std::sync::RwLock to coalesce multiple callers on one walredo spawning. One thread would win the write lock and others would queue up either at the read() or write() lock call. In a scenario where a compute initiates multiple getpage requests from different Postgres backends (= different page_service conns), and we don't have a walredo process around, this means all these page_service handler tasks will enter the spawning code path, one of them will do the spawning, and the others will stall their respective executor thread because they do a blocking read()/write() lock call. I don't know exactly how bad the impact is in reality because posix_spawn uses CLONE_VFORK under the hood, which means that the entire parent process stalls anyway until the child does `exec`, which in turn resumes the parent. But, anyway, we won't know until we fix this issue. And, there's definitely a future way out of stalling the pageserver on posix_spawn, namely, forking template walredo processes that fork again when they need to be per-tenant. This idea is tracked in https://github.com/neondatabase/neon/issues/7320. Changes ------- This PR fixes that scenario by switching to use `heavier_once_cell` for coalescing. There is a comment on the struct field that explains it in a bit more nuance. ### Alternative Design An alternative would be to use tokio::sync::RwLock. I did this in the first commit in this PR branch, before switching to `heavier_once_cell`. Performance ----------- I re-ran the `bench_walredo` and updated the results, showing that the changes are negligible. For the record, the earlier commit in this PR branch that uses `tokio::sync::RwLock` also has updated benchmark numbers, and the results / kinds of tiny regression were equivalent to `heavier_once_cell`. 
Note that the above doesn't measure performance on the cold path, i.e., when we need to launch the process and coalesce. We don't have a benchmark for that, and I don't expect any significant changes. We have metrics and we log spawn latency, so, we can monitor it in staging & prod. Risks ----- As "usual", replacing a std::sync primitive with something that yields to the executor risks exposing concurrency that was previously implicitly limited to the number of executor threads. This would be the first one for walredo. The risk is that we get descheduled while the reconstruct data is already there. That could pile up reconstruct data. In practice, I think the risk is low because once we get scheduled again, we'll likely have a walredo process ready, and there is no further await point until walredo is complete and the reconstruct data has been dropped. This will change with async walredo PR #6548, and I'm well aware of it in that PR.	2024-04-04 17:54:14 +02:00
Vlad Lazar	9d754e984f	storage_controller: setup sentry reporting (#7311 ) ## Problem No alerting for storage controller is in place. ## Summary of changes Set up sentry for the storage controller.	2024-04-04 13:41:04 +01:00
John Spray	375e15815c	storage controller: grant 'admin' access to all APIs (#7307 ) ## Problem Currently, using `storcon-cli` requires user to select a token with either `pageserverapi` or `admin` scope depending on which endpoint they're using. ## Summary of changes - In check_permissions, permit access with the admin scope even if the required scope is missing. The effect is that an endpoint that required `pageserverapi` now accepts either `pageserverapi` or `admin`, and for the CLI one can simply use an `admin` scope token for everything.	2024-04-04 11:22:08 +00:00
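The permission rule from commit 375e15815c (the `admin` scope is accepted wherever another scope is required) can be sketched as below. `Scope` and this `check_permissions` signature are simplified, hypothetical stand-ins for the storage controller's real types.

```rust
/// Hypothetical token scopes, reduced to the two named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    PageServerApi,
    Admin,
}

/// Allow the request if the token carries the required scope, or if it
/// carries the admin scope, which is accepted for every endpoint.
pub fn check_permissions(token_scope: Scope, required: Scope) -> Result<(), String> {
    if token_scope == required || token_scope == Scope::Admin {
        Ok(())
    } else {
        Err(format!(
            "scope {token_scope:?} does not grant access; {required:?} or Admin is required"
        ))
    }
}
```

An endpoint that requires `pageserverapi` therefore accepts either scope, while an admin-only endpoint still rejects a `pageserverapi` token.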
Anna Khanova	7ce613354e	Fix length (#7308 ) ## Problem Bug ## Summary of changes Use `compressed_data.len()` instead of `data.len()`.	2024-04-04 10:29:10 +00:00
Konstantin Knizhnik	ae15acdee7	Fix bug in prefetch cleanup (#7277 ) ## Problem Running test_pageserver_restarts_under_workload in PR #7275 I get the following assertion failure in prefetch: ``` #5 0x00005587220d4bf0 in ExceptionalCondition ( conditionName=0x7fbf24d003c8 "(ring_index) < MyPState->ring_unused && (ring_index) >= MyPState->ring_last", fileName=0x7fbf24d00240 "/home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c", lineNumber=644) at /home/knizhnik/neon.main//vendor/postgres-v16/src/backend/utils/error/assert.c:66 #6 0x00007fbf24cebc9b in prefetch_set_unused (ring_index=1509) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:644 #7 0x00007fbf24cec613 in prefetch_register_buffer (tag=..., force_latest=0x0, force_lsn=0x0) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:891 #8 0x00007fbf24cef21e in neon_prefetch (reln=0x5587233b7388, forknum=MAIN_FORKNUM, blocknum=14110) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:2055 (gdb) p ring_index $1 = 1509 (gdb) p MyPState->ring_unused $2 = 1636 (gdb) p MyPState->ring_last $3 = 1636 ``` ## Summary of changes Check status of `prefetch_wait_for` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-04-04 13:28:22 +03:00
Vlad Lazar	c5f64fe54f	tests: reinstate some synthetic size tests (#7294 ) ## Problem `test_empty_tenant_size` was marked `xfail` and a few other tests were skipped. ## Summary of changes Stabilise `test_empty_tenant_size`. This test attempted to disable checkpointing for the postgres instance and expected that the synthetic size remains stable for an empty tenant. When debugging I noticed that postgres was issuing a checkpoint after the transaction in the test (perhaps something changed since the test was introduced). Hence, I relaxed the size check to allow for the checkpoint key written on the pageserver. Also removed the checks for synthetic size inputs since the expected values differ between postgres versions. Closes https://github.com/neondatabase/neon/issues/7138	2024-04-04 09:45:14 +00:00
Conrad Ludgate	40852b955d	update ordered-multimap (#7306 ) ## Problem ordered-multimap was yanked ## Summary of changes `cargo update -p ordered-multimap`	2024-04-04 08:55:43 +00:00
Christian Schwarz	b30b15e7cb	refactor(Timeline::shutdown): rely more on Timeline::cancel; use it from deletion code path (#7233 ) This PR is a fallout from work on #7062. # Changes - Unify the freeze-and-flush and hard shutdown code paths into a single method `Timeline::shutdown` that takes the shutdown mode as an argument. - Replace `freeze_and_flush` bool arg in callers with that mode argument, makes them more expressive. - Switch timeline deletion to use `Timeline::shutdown` instead of its own slightly-out-of-sync copy. - Remove usage of `task_mgr::shutdown_watcher` / `task_mgr::shutdown_token` where possible # Future Work Do we really need the freeze_and_flush? If we could get rid of it, then there'd be no need for a specific shutdown order. Also, if you undo this patch's changes to the `eviction_task.rs` and enable RUST_LOG=debug, it's easy to see that we do leave some task hanging that logs under span `Connection{...}` at debug level. I think it's a pre-existing issue; it's probably a broker client task.	2024-04-03 17:49:54 +02:00
Vlad Lazar	36b875388f	pageserver: replace the locked tenant config with arcswaps (#7292 ) ## Problem For reasons unrelated to this PR, I would like to make use of the tenant conf in the `InMemoryLayer`. Previously, this was not possible without copying and manually updating the copy to keep it in sync with updates. ## Summary of Changes: Replace the `Arc<RwLock<AttachedTenantConf>>` with `Arc<ArcSwap<AttachedTenantConf>>` (how many `Arc(s)` can one fit in a type?). The most interesting part of this change is the updating of the tenant config (`set_new_tenant_config` and `set_new_location_config`). In theory, these two may race, although the storage controller should prevent this via the tenant exclusive op lock. Particular care has been taken to not "lose" a location config update by using the read-copy-update approach when updating only the config.	2024-04-03 16:46:25 +01:00
Arthur Petukhovsky	3f77f26aa2	Upload partial segments (#6530 ) Add support for backing up partial segments to remote storage. Disabled by default, can be enabled with `--partial-backup-enabled`. Safekeeper timeline has a background task which is subscribed to `commit_lsn` and `flush_lsn` updates. After the partial segment was updated (`flush_lsn` was changed), the segment will be uploaded to S3 in about 15 minutes. The filename format for partial segments is `Segment_Term_Flush_Commit_skNN.partial`, where: - `Segment` – the segment name, like `000000010000000000000001` - `Term` – current term - `Flush` – flush_lsn in hex format `{:016X}`, e.g. `00000000346BC568` - `Commit` – commit_lsn in the same hex format - `NN` – safekeeper_id, like `1` The full object name example: `000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial` Each safekeeper will keep info about remote partial segments in its control file. Code updates state in the control file before doing any S3 operations. This way control file stores information about all potentially existing remote partial segments and can clean them up after uploading a newer version. Closes #6336	2024-04-03 15:20:51 +00:00
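The filename format for partial segments described in the commit above can be reproduced with a few lines of Rust. This is a sketch under stated assumptions, not the safekeeper's actual code: the function name `partial_segment_name` and its parameter types are hypothetical, while the `Segment_Term_Flush_Commit_skNN.partial` layout and the `{:016X}` LSN formatting are taken from the commit message.

```rust
/// Builds the remote object name for a partial segment:
/// `Segment_Term_Flush_Commit_skNN.partial`.
/// The function name and parameter types are assumptions for this example.
pub fn partial_segment_name(
    segment: &str,      // segment name, e.g. "000000010000000000000002"
    term: u64,          // current term
    flush_lsn: u64,     // rendered as 16 upper-case hex digits
    commit_lsn: u64,    // rendered in the same hex format
    safekeeper_id: u64, // appended after the `sk` prefix
) -> String {
    format!("{segment}_{term}_{flush_lsn:016X}_{commit_lsn:016X}_sk{safekeeper_id}.partial")
}
```

Calling `partial_segment_name("000000010000000000000002", 2, 0x2534868, 0x2534410, 1)` yields the full object name quoted in the commit message.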
John Spray	8b10407be4	pageserver: on-demand activation of tenant on GET tenant status (#7250 ) ## Problem (Follows https://github.com/neondatabase/neon/pull/7237) Some API users will query a tenant to wait for it to activate. Currently, we return the current status of the tenant, whatever that may be. Under heavy load, a pageserver starting up might take a long time to activate such a tenant. ## Summary of changes - In `tenant_status` handler, call wait_to_become_active on the tenant. If the tenant is currently waiting for activation, this causes it to skip the queue, similar to other API handlers that require an active tenant, like timeline creation. This avoids external services waiting a long time for activation when polling GET /v1/tenant/<id>.	2024-04-03 16:53:43 +03:00
Arpad Müller	944313ffe1	Schedule image layer uploads in tiered compaction (#7282 ) Tiered compaction hasn't scheduled the upload of image layers. In the `test_gc_feedback.py` test this has caused warnings like the following with tiered compaction: ``` INFO request[...] Deleting layer [...] not found in latest_files list, never uploaded? ``` which in turn caused errors like: ``` ERROR layer_delete[...] was unlinked but was not dangling ``` Fixes #7244	2024-04-03 13:42:45 +02:00
Joonas Koivunen	d443d07518	wal_ingest: global counter for bytes received (#7240 ) Fixes #7102 by adding a metric for global total received WAL bytes: `pageserver_wal_ingest_bytes_received`.	2024-04-03 13:30:14 +03:00
Christian Schwarz	3de416a016	refactor(walreceiver): eliminate task_mgr usage (#7260 ) We want to move the code base away from task_mgr. This PR refactors the walreceiver code such that it doesn't use `task_mgr` anymore. # Background As a reminder, there are three tasks in a Timeline that's ingesting WAL. `WalReceiverManager`, `WalReceiverConnectionHandler`, and `WalReceiverConnectionPoller`. See the documentation in `task_mgr.rs` for how they interact. Before this PR, cancellation was requested through task_mgr::shutdown_token() and `TaskHandle::shutdown`. Wait-for-task-finish was implemented using a mixture of `task_mgr::shutdown_tasks` and `TaskHandle::shutdown`. This drawing might help: <img width="300" alt="image" src="https://github.com/neondatabase/neon/assets/956573/b6be7ad6-ecb3-41d0-b410-ec85cb8d6d20"> # Changes For cancellation, the entire WalReceiver task tree now has a `child_token()` of `Timeline::cancel`. The `TaskHandle` no longer is a cancellation root. This means that `Timeline::cancel.cancel()` is propagated. For wait-for-task-finish, all three tasks in the task tree hold the `Timeline::gate` open until they exit. The downside of using the `Timeline::gate` is that we can no longer wait for just the walreceiver to shut down, which is particularly relevant for `Timeline::flush_and_shutdown`. Effectively, it means that we might ingest more WAL while the `freeze_and_flush()` call is ongoing. Also, drive-by-fix the assertions around task kinds in `wait_lsn`. The check for `WalReceiverConnectionHandler` was ineffective because that never was a task_mgr task, but a TaskHandle task. Refine the assertion to check whether we would wait, and only fail in that case. # Alternatives I contemplated (ab-)using the `Gate` by having a separate `Gate` for `struct WalReceiver`. All the child tasks would use _that_ gate instead of `Timeline::gate`. And `struct WalReceiver` itself would hold an `Option<GateGuard>` of the `Timeline::gate`.
Then we could have a `WalReceiver::stop` function that closes the WalReceiver's gate, then drops the `WalReceiver::Option<GateGuard>`. However, such design would mean sharing the WalReceiver's `Gate` in an `Arc`, which seems awkward. A proper abstraction would be to make gates hierarchical, analogous to CancellationToken. In the end, @jcsp and I talked it over and we determined that it's not worth the effort at this time. # Refs part of #7062	2024-04-03 12:28:04 +02:00
John Spray	bc05d7eb9c	pageserver: even more debug for test_secondary_downloads (#7295 ) The latest failures of test_secondary_downloads are spooky: layers are missing on disk according to the test, but present according to the pageserver logs: - Make the pageserver assert that layers are really present on disk and log the full path (debug mode only) - Make the test dump a full listing on failure of the assert that failed the last two times Related: #6966	2024-04-03 11:23:44 +01:00
Conrad Ludgate	d8da51e78a	remove http timeout (#7291 ) ## Problem https://github.com/neondatabase/cloud/issues/11051 additionally, I felt like the http logic was a bit complex. ## Summary of changes 1. Removes timeout for HTTP requests. 2. Split out header parsing to a `HttpHeaders` type. 3. Moved db client handling to `QueryData::process` and `BatchQueryData::process` to simplify the logic of `handle_inner` a bit.	2024-04-03 11:23:26 +01:00
John Spray	6e3834d506	controller: add `storcon-cli` (#7114 ) ## Problem During incidents, we may need to quickly access the storage controller's API without trying API client code or crafting `curl` CLIs on the fly. A basic CLI client is needed for this. ## Summary of changes - Update storage controller node listing API to only use public types in controller_api.rs - Add a storage controller API for listing tenants - Add a basic test that the CLI can list and modify nodes and tenants.	2024-04-03 10:07:56 +00:00
Anna Khanova	582cec53c5	proxy: upload consumption events to S3 (#7213 ) ## Problem If vector is unavailable, we are missing consumption events. https://github.com/neondatabase/cloud/issues/9826 ## Summary of changes Added integration with the consumption bucket.	2024-04-02 21:46:23 +02:00
Vlad Lazar	9957c6a9a0	pageserver: drop the layer map lock after planning reads (#7215 ) ## Problem The vectored read path holds the layer map lock while visiting a timeline. ## Summary of changes * Rework the fringe order to hold `Layer` on `Arc<InMemoryLayer>` handles instead of descriptions that are resolved by the layer map at the time of read. Note that previously `get_values_reconstruct_data` was implemented for the layer description which already knew the lsn range for the read. Now it is implemented on the new `ReadableLayer` handle and needs to get the lsn range as an argument. * Drop the layer map lock after updating the fringe. Related https://github.com/neondatabase/neon/issues/6833	2024-04-02 17:16:15 +01:00
John Spray	a5777bab09	tests: clean up compat test workarounds (#7097 ) - Cleanup from https://github.com/neondatabase/neon/pull/7040#discussion_r1521120263 -- in that PR, we needed to let compat tests manually register a node, because it would run an old binary that doesn't self-register. - Cleanup vectored get config workaround - Cleanup a log allow list for which the underlying log noise has been fixed.	2024-04-02 16:46:24 +01:00
Alexander Bayandin	90a8ff55fa	CI(benchmarking): Add Sharded Tenant for pgbench (#7186 ) ## Problem During Nightly Benchmarks, we want to collect pgbench results for sharded tenants as well. ## Summary of changes - Add pre-created sharded project for pgbench	2024-04-02 14:39:24 +01:00
macdoos	3b95e8072a	test_runner: replace all `.format()` with f-strings (#7194 )	2024-04-02 14:32:14 +01:00
Conrad Ludgate	8ee54ffd30	update tokio 1.37 (#7276 ) ## Problem ## Summary of changes `cargo update -p tokio`. The only risky change I could see is the `tokio::io::split` moving from a spin-lock to a mutex but I think that's ok.	2024-04-02 10:12:54 +01:00
Alex Chi Z	3ab9f56f5f	fixup(#7278/compute_ctl): remote extension download permission (#7280 ) Fix #7278 ## Summary of changes * Explicitly create the extension download directory and assign correct permissions. * Fix the problem that the extension download failure will cause all future downloads to fail. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-29 17:59:30 +00:00
Alex Chi Z	7ddc7b4990	neonvm: add LFC approximate working set size to metrics (#7252 ) ref https://github.com/neondatabase/autoscaling/pull/878 ref https://github.com/neondatabase/autoscaling/issues/872 Add `approximate_working_set_size` to sql exporter so that autoscaling can use it in the future. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Peter Bendel <peterbendel@neon.tech>	2024-03-29 12:11:17 -04:00
John Spray	63213fc814	storage controller: scheduling optimization for sharded tenants (#7181 ) ## Problem - When we scheduled locations, we were doing it without any context about other shards in the same tenant - After a shard split, there wasn't an automatic mechanism to migrate the attachments away from the split location - After a shard split and the migration away from the split location, there wasn't an automatic mechanism to pick new secondary locations so that the end state has no concentration of locations on the nodes where the split happened. Partially completes: https://github.com/neondatabase/neon/issues/7139 ## Summary of changes - Scheduler now takes a `ScheduleContext` object that can be populated with information about other shards - During tenant creation and shard split, we incrementally build up the ScheduleContext, updating it for each shard as we proceed. - When scheduling new locations, the ScheduleContext is used to apply a soft anti-affinity to nodes where a tenant already has shards. - The background reconciler task now has an extra phase `optimize_all`, which runs only if the primary `reconcile_all` phase didn't generate any work. The separation is that `reconcile_all` is needed for availability, but optimize_all is purely "nice to have" work to balance work across the nodes better. - optimize_all calls into two new TenantState methods called optimize_attachment and optimize_secondary, which seek out opportunities to improve placement: - optimize_attachment: if the node where we're currently attached has an excess of attached shard locations for this tenant compared with the node where we have a secondary location, then cut over to the secondary location. - optimize_secondary: if the node holding our secondary location has an excessive number of locations for this tenant compared with some other node where we don't currently have a location, then create a new secondary location on that other node.
- a new debug API endpoint is provided to run background tasks on-demand. This returns a number of reconciliations in progress, so callers can keep calling until they get a `0` to advance the system to its final state without waiting for many iterations of the background task. Optimization is run at an implicitly low priority by: - Omitting the phase entirely if reconcile_all has work to do - Skipping optimization of any tenant that has reconciles in flight - Limiting the total number of optimizations that will be run from one call to optimize_all to a constant (currently 2). The idea of that low priority execution is to minimize the operational risk that optimization work overloads any part of the system. It happens to also make the system easier to observe and debug, as we avoid running large numbers of concurrent changes. Eventually we may relax these limitations: there is no correctness problem with optimizing lots of tenants concurrently, and optimizing multiple shards in one tenant just requires housekeeping changes to update ShardContext with the result of one optimization before proceeding to the next shard.	2024-03-28 18:48:52 +00:00
Vlad Lazar	090123a429	pageserver: check for new image layers based on ingested WAL (#7230 ) ## Problem Part of the legacy (but current) compaction algorithm is to find a stack of overlapping delta layers which will be turned into an image layer. This operation is exponential in terms of the number of matching layers and we do it roughly every 20 seconds. ## Summary of changes Only check if a new image layer is required if we've ingested a certain amount of WAL since the last check. The amount of WAL is expressed in terms of multiples of checkpoint distance, with the intuition being that there's little point doing the check if we only have two new L1 layers (not enough to create a new image).	2024-03-28 17:44:55 +00:00
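The gating idea in the commit above (only run the expensive image-layer check after enough WAL has been ingested) can be sketched as below. Everything named here is hypothetical: the `ImageLayerCheck` type, the `should_check` method and the `multiple` parameter are illustrative assumptions, and LSNs are modeled as plain `u64` byte positions. Only the rule "a multiple of the checkpoint distance since the last check" comes from the commit message.

```rust
/// Hypothetical tracker for the LSN at which the last image-layer check ran.
pub struct ImageLayerCheck {
    last_check_lsn: u64,
}

impl ImageLayerCheck {
    pub fn new(start_lsn: u64) -> Self {
        Self { last_check_lsn: start_lsn }
    }

    /// Returns true, and records `current_lsn` as the new baseline, only when at
    /// least `multiple * checkpoint_distance` bytes of WAL have been ingested
    /// since the previous check. Otherwise the check is skipped.
    pub fn should_check(&mut self, current_lsn: u64, checkpoint_distance: u64, multiple: u64) -> bool {
        let threshold = checkpoint_distance.saturating_mul(multiple);
        let ingested = current_lsn.saturating_sub(self.last_check_lsn);
        if ingested >= threshold {
            self.last_check_lsn = current_lsn;
            true
        } else {
            false
        }
    }
}
```

With a checkpoint distance of 100 and a multiple of 3, a check at LSN 100 is skipped and the next one runs at LSN 300 or later.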
John Spray	39d1818ae9	storage controller: be more tolerant of control plane blocking notifications (#7268 ) ## Problem - Control plane can deadlock if it calls into a function that requires reconciliation to complete, while refusing compute notification hooks API calls. ## Summary of changes - Fail faster in the notify path in 438 errors: these were originally expected to be transient, but in practice it's more common that a 438 results from an operation blocking on the current API call, rather than something happening in the background. - In ensure_attached, relax the condition for spawning a reconciler: instead of just the general maybe_reconcile path, do a pre-check that skips trying to reconcile if the shard appears to be attached. This avoids doing work in cases where the tenant is attached, but is dirty from a reconciliation point of view, e.g. due to a failed compute notification.	2024-03-28 17:38:08 +00:00
Alex Chi Z	90be79fcf5	spec: allow neon extension auto-upgrade + softfail upgrade (#7231 ) reverts https://github.com/neondatabase/neon/pull/7128, unblocks https://github.com/neondatabase/cloud/issues/10742 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-28 17:22:35 +00:00
Alexander Bayandin	c52b80b930	CI(deploy): Do not deploy storage controller to preprod for proxy releases (#7269 ) ## Problem Proxy release to a preprod automatically triggers a deployment of storage controller (`deployStorageController=true` by default) ## Summary of changes - Set `deployStorageController=false` for proxy releases to preprod - Set explicitly `deployStorageController=true` for storage releases to preprod and prod	2024-03-28 16:51:45 +00:00
Anastasia Lubennikova	722f271f6e	Specify caller in 'unexpected response from page server' error (#7272 ) Tiny improvement for log messages to investigate https://github.com/neondatabase/cloud/issues/11559	2024-03-28 15:28:58 +00:00
Alex Chi Z	be1d8fc4f7	fix: drop replication slot causes postgres stuck on exit (#7192 ) Fix https://github.com/neondatabase/neon/issues/6969 Ref https://github.com/neondatabase/postgres/pull/395 https://github.com/neondatabase/postgres/pull/396 Postgres will get stuck on exit if the replication slot is not dropped before shutting down. This is caused by Neon's custom WAL record to record replication slots. The pull requests in the postgres repo fix the problem, and this pull request bumps the postgres commit. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-28 15:24:36 +00:00
Vlad Lazar	25c4b676e0	pageserver: fix oversized key on vectored read (#7259 ) ## Problem During this week's deployment we observed panics due to the blobs for certain keys not fitting in the vectored read buffers. The likely cause of this is a bloated AUX_FILE_KEY caused by logical replication. ## Summary of changes This pr fixes the issue by allocating a buffer big enough to fit the widest read. It also has the benefit of saving space if all keys in the read have blobs smaller than the max vectored read size. If the soft limit for the max size of a vectored read is violated, we print a warning which includes the offending key and lsn. A randomised (but deterministic) end to end test is also added for vectored reads on the delta layer.	2024-03-28 14:27:15 +00:00
John Spray	6633332e67	storage controller: tenant scheduling policy (#7262 ) ## Problem In the event of bugs with scheduling or reconciliation, we need to be able to switch this off at a per-tenant granularity. This is intended to mitigate risk of issues with https://github.com/neondatabase/neon/pull/7181, which makes scheduling more involved. Closes: #7103 ## Summary of changes - Introduce a scheduling policy per tenant, with API to set it - Refactor persistent.rs helpers for updating tenants to be more general - Add tests	2024-03-28 14:19:25 +00:00
Arpad Müller	5928f6709c	Support compaction_threshold=1 for tiered compaction (#7257 ) Many tests like `test_live_migration` or `test_timeline_deletion_with_files_stuck_in_upload_queue` set `compaction_threshold` to 1, to create a lot of changes/updates. The compaction threshold was passed as `fanout` parameter to the tiered_compaction function, which didn't support values of 1 however. Now we change the assert to support it, while still retaining the exponential nature of the increase in range in terms of lsn that a layer is responsible for. A large chunk of the failures in #6964 was due to hitting this issue that we now resolved. Part of #6768.	2024-03-28 13:48:47 +01:00
Konstantin Knizhnik	63b2060aef	Drop connections with all shards involved in prefetch in case of error (#7249 ) ## Problem See https://github.com/neondatabase/cloud/issues/11559 If we have multiple shards, we need to reset connections to all shards involved in prefetch (having active prefetch requests) if connection with any of them is lost. ## Summary of changes In `prefetch_on_ps_disconnect` drop connection to all shards with active page requests. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-03-28 08:16:05 +02:00
Sasha Krassovsky	24c5a5ac16	Revert "Revoke REPLICATION" (#7261 ) Reverts neondatabase/neon#7052	2024-03-27 18:07:51 +00:00
Alexander Bayandin	7f9cc1bd5e	CI(trigger-e2e-tests): set e2e-platforms (#7229 ) ## Problem We don't want to run an excessive e2e test suite on neonvm if there are no relevant changes. ## Summary of changes - Check PR diff and if there are no relevant compute changes (in `vendor/`, `pgxn/`, `libs/vm_monitor` or `Dockerfile.compute-node`) - Switch job from `small` to `ubuntu-latest` runner to make it possible to use GitHub CLI	2024-03-27 13:10:37 +00:00
Christian Schwarz	cdf12ed008	fix(walreceiver): Timeline::shutdown can leave a dangling handle_walreceiver_connection tokio task (#7235 ) # Problem As pointed out through doc-comments in this PR, `drop_old_connection` is not cancellation-safe. This means we can leave a `handle_walreceiver_connection` tokio task dangling during Timeline shutdown. More details described in the corresponding issue #7062. # Solution Don't cancel-by-drop the `connection_manager_loop_step` from the `tokio::select!()` in the task_mgr task. Instead, transform the code to use a `CancellationToken` --- specifically, `task_mgr::shutdown_token()` --- and make code responsive to it. The `drop_old_connection()` is still not cancellation-safe and also doesn't get a cancellation token, because there's no point inside the function where we could return early if cancellation were requested using a token. We rely on the `handle_walreceiver_connection` to be sensitive to the `TaskHandle`s cancellation token (argument name: `cancellation`). Currently it checks for `cancellation` on each WAL message. It is probably also sensitive to `Timeline::cancel` because ultimately all that `handle_walreceiver_connection` does is interact with the `Timeline`. In summary, the above means that the following code (which is found in `Timeline::shutdown`) now might take longer, but actually ensures that all `handle_walreceiver_connection` tasks are finished: ```rust task_mgr::shutdown_tasks( Some(TaskKind::WalReceiverManager), Some(self.tenant_shard_id), Some(self.timeline_id) ) ``` # Refs refs #7062	2024-03-27 12:04:31 +01:00
Conrad Ludgate	12512f3173	add authentication rate limiting (#6865 ) ## Problem https://github.com/neondatabase/cloud/issues/9642 ## Summary of changes 1. Make `EndpointRateLimiter` generic, renamed as `BucketRateLimiter` 2. Add support for claiming multiple tokens at once 3. Add `AuthRateLimiter` alias. 4. Check `(Endpoint, IP)` pair during authentication, weighted by how many hashes proxy would be doing. TODO: handle ipv6 subnets. will do this in a separate PR.	2024-03-26 19:31:19 +00:00
John Spray	b3b7ce457c	pageserver: remove bare mgr::get_tenant, mgr::list_tenants (#7237 ) ## Problem This is a refactor. This PR was a precursor to a much smaller change `e5bd602dc1`, where as I was writing it I found that we were not far from getting rid of the last non-deprecated code paths that use `mgr::` scoped functions to get at the TenantManager state. We're almost done cleaning this up as per https://github.com/neondatabase/neon/issues/5796. The only significant remaining mgr:: item is `get_active_tenant_with_timeout`, which is page_service's path for fetching tenants. ## Summary of changes - Remove the bool argument to get_attached_tenant_shard: this was almost always false from API use cases, and in cases when it was true, it was readily replaceable with an explicit check of the returned tenant's status. - Rather than letting the timeline eviction task query any tenant it likes via `mgr::`, pass an `Arc<Tenant>` into the task. This is still an ugly circular reference, but should eventually go away: either when we switch to exclusively using disk usage eviction, or when we change metadata storage to avoid the need to imitate layer accesses. - Convert all the mgr::get_tenant call sites to use TenantManager::get_attached_tenant_shard - Move list_tenants into TenantManager.	2024-03-26 18:29:08 +00:00
John Spray	6814bb4b59	tests: add a log allow list to stabilize benchmarks (#7251 ) ## Problem https://github.com/neondatabase/neon/pull/7227 destabilized various tests in the performance suite, with log errors during shutdown. It's because we switched shutdown order to stop the storage controller before the pageservers. ## Summary of changes - Tolerate "connection failed" errors from pageservers trying to validate their deletion queue.	2024-03-26 17:44:18 +00:00
John Spray	b3bb1d1cad	storage controller: make direct tenant creation more robust (#7247 ) ## Problem - Creations were not idempotent (unique key violation) - Creations waited for reconciliation, which control plane blocks while an operation is in flight ## Summary of changes - Handle unique key constraint violation as an OK situation: if we're creating the same tenant ID and shard count, it's reasonable to assume this is a duplicate creation. - Make the wait for reconcile during creation tolerate failures: this is similar to location_conf, where the cloud control plane blocks our notification calls until it is done with calling into our API (in future this constraint is expected to relax as the cloud control plane learns to run multiple operations concurrently for a tenant)	2024-03-26 16:57:35 +00:00
John Spray	47d2b3a483	pageserver: limit total ephemeral layer bytes (#7218 ) ## Problem Follows: https://github.com/neondatabase/neon/pull/7182 - Sufficient concurrent writes could OOM a pageserver from the size of indices on all the InMemoryLayer instances. - Enforcement of checkpoint_period only happened if there were some writes. Closes: https://github.com/neondatabase/neon/issues/6916 ## Summary of changes - Add `ephemeral_bytes_per_memory_kb` config property. This controls the ratio of ephemeral layer capacity to memory capacity. The weird unit is to enable making the ratio less than 1:1 (set this property to 1024 to use 1MB of ephemeral layers for every 1MB of RAM, set it smaller to get a fraction). - Implement background layer rolling checks in Timeline::compaction_iteration -- this ensures we apply layer rolling policy in the absence of writes. - During background checks, if the total ephemeral layer size has exceeded the limit, then roll layers whose size is greater than the mean size of all ephemeral layers. - Remove the tick() path from walreceiver: it isn't needed any more now that we do equivalent checks from compaction_iteration. - Add tests for the above. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-03-26 15:45:32 +00:00
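The arithmetic behind `ephemeral_bytes_per_memory_kb` and the "roll layers larger than the mean" rule from the commit above can be worked through in a short sketch. The function names `ephemeral_limit_bytes` and `layers_to_roll` are hypothetical, the mean is computed with integer division as a simplification, and layers are represented only by their sizes; the pageserver's real implementation is not shown in the commit message and may differ.

```rust
/// Ephemeral layer budget in bytes: for every KiB of memory, allow
/// `ephemeral_bytes_per_memory_kb` bytes of ephemeral layers.
/// A value of 1024 therefore gives a 1:1 ratio.
pub fn ephemeral_limit_bytes(memory_bytes: u64, ephemeral_bytes_per_memory_kb: u64) -> u64 {
    (memory_bytes / 1024).saturating_mul(ephemeral_bytes_per_memory_kb)
}

/// If the total ephemeral size exceeds `limit_bytes`, returns the indices of
/// the layers whose size is greater than the mean layer size; otherwise
/// returns an empty list. (Hypothetical helper, integer mean assumed.)
pub fn layers_to_roll(layer_sizes: &[u64], limit_bytes: u64) -> Vec<usize> {
    let total: u64 = layer_sizes.iter().sum();
    if layer_sizes.is_empty() || total <= limit_bytes {
        return Vec::new();
    }
    let mean = total / layer_sizes.len() as u64;
    layer_sizes
        .iter()
        .enumerate()
        .filter(|&(_, &size)| size > mean)
        .map(|(index, _)| index)
        .collect()
}
```

For example, with 1 MiB of memory and the property set to 1024, the budget is 1 MiB; with layer sizes of 100, 300 and 200 bytes and a limit of 500 bytes, the total (600) exceeds the limit, the mean is 200, and only the 300-byte layer is selected.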
John Spray	8dfe3a070c	pageserver: return 429 on timeline creation in progress (#7225 ) ## Problem Currently, we return 409 (Conflict) in two cases: - Temporary: Timeline creation cannot proceed because another timeline with the same ID is being created - Permanent: Timeline creation cannot proceed because another timeline exists with different parameters but the same ID. Callers which time out a request and retry should be able to distinguish these cases. Closes: #7208 ## Summary of changes - Expose `AlreadyCreating` errors as 429 instead of 409	2024-03-26 15:20:05 +00:00
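The distinction the commit above draws between temporary and permanent failures can be illustrated with a small status mapping. This is a sketch with assumed names: `AlreadyCreating` appears in the commit message, but the `Conflict` variant, the `CreateTimelineError` enum as written here, and both helper functions are hypothetical.

```rust
/// Simplified, hypothetical error type for timeline creation.
pub enum CreateTimelineError {
    /// Temporary: another creation of the same timeline ID is in progress.
    AlreadyCreating,
    /// Permanent: a timeline with this ID exists with different parameters.
    Conflict,
}

/// Maps each error to the HTTP status described in the commit message.
pub fn status_code(err: &CreateTimelineError) -> u16 {
    match err {
        CreateTimelineError::AlreadyCreating => 429,
        CreateTimelineError::Conflict => 409,
    }
}

/// A caller that timed out and retries can use the status to decide
/// whether retrying makes sense (assumed client-side policy).
pub fn is_retryable(status: u16) -> bool {
    status == 429
}
```

A client following this policy would keep retrying on 429 and give up on 409.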
Alexander Bayandin	3426619a79	test_runner/performance: skip test_bulk_insert (#7238 ) ## Problem `test_bulk_insert` becomes too slow, and it fails constantly: https://github.com/neondatabase/neon/issues/7124 ## Summary of changes - Skip `test_bulk_insert` until it's fixed	2024-03-26 15:10:15 +00:00
Vlad Lazar	de03742ca3	pageserver: drop layer map lock in Timeline::get (#7217 ) ## Problem We currently hold the layer map read lock while doing IO on the read path. This is not required for correctness. ## Summary of changes Drop the layer map lock after figuring out which layer we wish to read from. Why is this correct: * `Layer` models the lifecycle of an on disk layer. In the event the layer is removed from local disk, it will be on demand downloaded * `InMemoryLayer` holds the `EphemeralFile` which wraps the on disk file. As long as the `InMemoryLayer` is in scope, it's safe to read from it. Related https://github.com/neondatabase/neon/issues/6833	2024-03-26 14:35:36 +00:00
Christian Schwarz	ad072de420	Revert "pageserver: use a single tokio runtime (#6555 )" (#7246 )	2024-03-26 15:24:18 +01:00
Anna Khanova	6c18109734	proxy: reuse sess_id as request_id for the cplane requests (#7245 ) ## Problem https://github.com/neondatabase/cloud/issues/11599 ## Summary of changes Reuse the same sess_id for requests within the one session. TODO: get rid of `session_id` in query params.	2024-03-26 11:27:48 +00:00
John Spray	5dee58f492	tests: wait for uploads in test_secondary_downloads (#7220 ) ## Problem - https://github.com/neondatabase/neon/issues/6966 This test occasionally failed with some layers unexpectedly not present on the secondary pageserver. The issue in that failure is the attached pageserver uploading heatmaps that refer to not-yet-uploaded layers. ## Summary of changes After uploading heatmap, drain upload queue on attached pageserver, to guarantee that all the layers referenced in the heatmap are uploaded.	2024-03-26 10:59:16 +00:00
John Spray	6313f1fa7a	tests: tolerate transient unavailability in test_sharding_split_failures (#7223 ) ## Problem While most forms of split rollback don't interrupt clients, there are a couple of cases that do -- this interruption is brief, driven by the time it takes the controller to kick off Reconcilers during the async abort of the split, so it's operationally fine, but can trip up a test. - #7148 ## Summary of changes - Relax test check to require that the tenant is eventually available after split failure, rather than immediately. In the vast majority of cases this will pass on the first iteration.	2024-03-26 09:56:47 +00:00
Christian Schwarz	f72415e1fd	refactor(remote_timeline_client): infallible stop() and shutdown() (#7234 ) preliminary refactoring for https://github.com/neondatabase/neon/pull/7233 part of #7062	2024-03-25 18:42:18 +01:00
George Ma	d837ce0686	chore: remove repetitive words (#7206 ) Signed-off-by: availhang <mayangang@outlook.com>	2024-03-25 11:43:02 -04:00
John Spray	2713142308	tests: stabilize compat tests (#7227 ) This test had two flaky failure modes: - pageserver log error for timeline not found: this resulted from changes for DR when timeline destroy/create was added, but endpoint was left running during that operation. - storage controller log error because the test was running for long enough that a background reconcile happened at almost the exact moment of test teardown, and our test fixtures tear down the pageservers before the controller. Closes: #7224	2024-03-25 14:35:24 +00:00
Arseny Sher	a6c1fdcaf6	Try to fix test_crafted_wal_end flakiness. Postgres can always write some more WAL, so previous checks that WAL doesn't change after something had been crafted were wrong; remove them. Add comments here and there. should fix https://github.com/neondatabase/neon/issues/4691	2024-03-25 14:53:06 +03:00
John Spray	adb0526262	pageserver: track total ephemeral layer bytes (#7182 ) ## Problem Large quantities of ephemeral layer data can lead to excessive memory consumption (https://github.com/neondatabase/neon/issues/6939). We currently don't have a way to know how much ephemeral layer data is present on a pageserver. Before we can add new behaviors to proactively roll layers in response to too much ephemeral data, we must calculate that total. Related: https://github.com/neondatabase/neon/issues/6916 ## Summary of changes - Create GlobalResources and GlobalResourceUnits types, where timelines carry a GlobalResourceUnits in their TimelineWriterState. - Periodically update the size in GlobalResourceUnits: - During tick() - During layer roll - During put() if the latest value has drifted more than 10MB since our last update - Expose the value of the global ephemeral layer bytes counter as a prometheus metric. - Extend the lifetime of TimelineWriterState: - Instead of dropping it in TimelineWriter::drop, let it remain. - Drop TimelineWriterState in roll_layer: this drops our guard on the global byte count to reflect the fact that we're freezing the layer. - Ensure the validity of the layer in the writer state by clearing the state in the same place we freeze layers, and asserting on the write-ability of the layer in `writer()` - Add a 'context' parameter to `get_open_layer_action` so that it can skip the prev_lsn==lsn check when called in tick() -- this is needed because now tick is called with a populated state, where prev_lsn==Some(lsn) is true for an idle timeline. - Extend layer rolling test to use this metric	2024-03-25 11:52:50 +00:00
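The guard-style accounting described above might look roughly like this. It is a sketch under assumptions: the type name `GlobalResourceUnits` and the 10MB drift threshold are taken from the commit message, while the fields, method names, and the use of a borrowed `AtomicU64` are illustrative choices rather than the actual implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const MB: u64 = 1024 * 1024;
/// Drift threshold from the commit message (10MB).
const MAX_DRIFT_BYTES: u64 = 10 * MB;

/// Sketch of a per-timeline guard over a global byte counter.
struct GlobalResourceUnits<'a> {
    global: &'a AtomicU64,
    /// The contribution this guard has currently published to `global`.
    published: u64,
}

impl<'a> GlobalResourceUnits<'a> {
    fn new(global: &'a AtomicU64) -> Self {
        Self { global, published: 0 }
    }

    /// Unconditionally publish `current` (e.g. from tick() or a layer roll).
    fn publish(&mut self, current: u64) {
        if current >= self.published {
            self.global.fetch_add(current - self.published, Ordering::Relaxed);
        } else {
            self.global.fetch_sub(self.published - current, Ordering::Relaxed);
        }
        self.published = current;
    }

    /// Cheap path for put(): only touch the shared counter once the local
    /// value has drifted past the threshold. Returns whether it published.
    fn maybe_publish(&mut self, current: u64) -> bool {
        if current.abs_diff(self.published) > MAX_DRIFT_BYTES {
            self.publish(current);
            true
        } else {
            false
        }
    }
}

impl Drop for GlobalResourceUnits<'_> {
    /// Dropping the guard (as when a layer is frozen) removes its share.
    fn drop(&mut self) {
        self.global.fetch_sub(self.published, Ordering::Relaxed);
    }
}
```

Publishing only on drift keeps the hot write path from contending on the shared counter, at the cost of the global total lagging by up to the threshold per timeline.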
John Spray	0099dfa56b	storage controller: tighten up secrets handling (#7105 ) - Remove code for using AWS secrets manager, as we're deploying with k8s->env vars instead - Load each secret independently, so that one can mix CLI args with environment variables, rather than requiring that all secrets are loaded with the same mechanism. - Add a 'strict mode', enabled by default, which will refuse to start if secrets are not loaded. This avoids the risk of accidentally disabling auth by omitting the public key, for example	2024-03-25 11:52:33 +00:00
Vlad Lazar	3a4ebfb95d	test: fix `test_pageserver_recovery` flakyness (#7207 ) ## Problem We recently introduced log file validation for the storage controller. The heartbeater will WARN when it fails for a node, hence the test fails. Closes https://github.com/neondatabase/neon/issues/7159 ## Summary of changes * Warn only once for each set of heartbeat retries * Allow list heartbeat warns	2024-03-25 09:38:12 +00:00
Christian Schwarz	3220f830b7	pageserver: use a single tokio runtime (#6555 ) Before this PR, each core had 3 executor threads from 3 different runtimes. With this PR, we just have one runtime, with one thread per core. Switching to a single tokio runtime should reduce that effective over-commit of CPU and in theory help with tail latencies -- iff all tokio tasks are well-behaved and yield to the runtime regularly. Are All Tasks Well-Behaved? Are We Ready? ----------------------------------------- Sadly there doesn't seem to be good out-of-the box tokio tooling to answer this question. We believe all tasks are well behaved in today's code base, as of the switch to `virtual_file_io_engine = "tokio-epoll-uring"` in production (https://github.com/neondatabase/aws/pull/1121). The only remaining executor-thread-blocking code is walredo and some filesystem namespace operations. Filesystem namespace operations work is being tracked in #6663 and not considered likely to actually block at this time. Regarding walredo, it currently does a blocking `poll` for read/write to the pipe file descriptors we use for IPC with the walredo process. There is an ongoing experiment to make walredo async (#6628), but it needs more time because there are surprisingly tricky trade-offs that are articulated in that PR's description (which itself is still WIP). What's relevant for this PR is that 1. walredo is always CPU-bound 2. 
production tail latencies for walredo request-response (`pageserver_wal_redo_seconds_bucket`) are - p90: with few exceptions, low hundreds of micro-seconds - p95: except on very packed pageservers, below 1ms - p99: all below 50ms, vast majority below 1ms - p99.9: almost all around 50ms, rarely at >= 70ms - [Dashboard Link](https://neonprod.grafana.net/d/edgggcrmki3uof/2024-03-walredo-latency?orgId=1&var-ds=ZNX49CDVz&var-pXX_by_instance=0.9&var-pXX_by_instance=0.99&var-pXX_by_instance=0.95&var-adhoc=instance%7C%21%3D%7Cpageserver-30.us-west-2.aws.neon.tech&var-per_instance_pXX_max_seconds=0.0005&from=1711049688777&to=1711136088777) The ones below 1ms are below our current threshold for when we start thinking about yielding to the executor. The tens of milliseconds stalls aren't great, but, not least because of the implicit overcommit of CPU by the three runtimes, we can't be sure whether these tens of milliseconds are inherently necessary to do the walredo work or whether we could be faster if there was less contention for CPU. On the first item (walredo being always CPU-bound work): it means that walredo processes will always compete with the executor threads. We could yield, using async walredo, but then we hit the trade-offs explained in that PR. tl;dr: the risk of stalling executor threads through blocking walredo seems low, and switching to one runtime cleans up one potential source for higher-than-necessary stall times (explained in the previous paragraphs). Code Changes ------------ - Remove the 3 different runtime definitions. - Add a new definition called `THE_RUNTIME`. - Use it in all places that previously used one of the 3 removed runtimes. - Remove the argument from `task_mgr`. - Fix failpoint usage where `pausable_failpoint!` should have been used. We encountered some actual failures because of this, e.g., hung `get_metric()` calls during test teardown that would client-timeout after 300s. 
As indicated by the comment above `THE_RUNTIME`, we could take this clean-up further. But before we create so much churn, let's first validate that there's no perf regression. Performance ----------- We will test this in staging using the various nightly benchmark runs. However, the worst-case impact of this change is likely compaction (=>image layer creation) competing with compute requests. Image layer creation work can't be easily generated & repeated quickly by pagebench. So, we'll simply watch getpage & basebackup tail latencies in staging. Additionally, I have done manual benchmarking using pagebench. Report: https://neondatabase.notion.site/2024-03-23-oneruntime-change-benchmarking-22a399c411e24399a73311115fb703ec?pvs=4 Tail latencies and throughput are marginally better (no regression = good), except in a workload with 128 clients against one tenant. There, the p99.9 and p99.99 getpage latency is about 2x worse (at slightly lower throughput). A dip in throughput every 20s (`compaction_period`) is clearly visible, and probably responsible for that worse tail latency. This has potential to improve with async walredo, and is an edge case workload anyway. Future Work ----------- 1. Once this change has shown satisfying results in production, change the codebase to use the ambient runtime instead of explicitly referencing `THE_RUNTIME`. 2. Have a mode where we run with a single-threaded runtime, so we uncover executor stalls more quickly. 3. Switch or write our own failpoints library that is async-native: https://github.com/neondatabase/neon/issues/7216	2024-03-23 19:25:11 +01:00
Conrad Ludgate	72103d481d	proxy: fix stack overflow in cancel publisher (#7212 ) ## Problem stack overflow in blanket impl for `CancellationPublisher` ## Summary of changes Removes `async_trait` and fixes the impl order to make it non-recursive.	2024-03-23 06:36:58 +00:00
Alex Chi Z	643683f41a	fixup(#7204 / postgres): revert `IsPrimaryAlive` checks (#7209 ) Fix #7204. https://github.com/neondatabase/postgres/pull/400 https://github.com/neondatabase/postgres/pull/401 https://github.com/neondatabase/postgres/pull/402 These commits never go into prod. Detailed investigation will be posted in another issue. Reverting the commits so that things can keep running in prod. This pull request adds the test to start two replicas. It fails on the current main https://github.com/neondatabase/neon/pull/7210 but passes in this pull request. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-23 01:01:51 +00:00
Konstantin Knizhnik	35f4c04c9b	Remove Get/SetZenithCurrentClusterSize from Postgres core (#7196 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1711003752072899 ## Summary of changes Move keeping of cluster size to neon extension --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-03-22 13:14:31 -04:00
John Spray	1787cf19e3	pageserver: write consumption metrics to S3 (#7200 ) ## Problem The service that receives consumption metrics has lower availability than S3. Writing metrics to S3 improves their availability. Closes: https://github.com/neondatabase/cloud/issues/9824 ## Summary of changes - The same data as consumption metrics POST bodies is also compressed and written to an S3 object with a timestamp-formatted path. - Set `metric_collection_bucket` (same format as `remote_storage` config) to configure the location to write to	2024-03-22 14:52:14 +00:00
Alexander Bayandin	2668a1dfab	CI: deploy release version to a preprod region (#6811 ) ## Problem We want to deploy releases to a preprod region first to perform required checks ## Summary of changes - Deploy `release-XXX` / `release-proxy-YYY` docker tags to a preprod region	2024-03-22 14:42:10 +00:00
Conrad Ludgate	77f3a30440	proxy: unit tests for auth_quirks (#7199 ) ## Problem I noticed code coverage for auth_quirks was pretty bare ## Summary of changes Adds 3 happy path unit tests for auth_quirks * scram * cleartext (websockets) * cleartext (password hack)	2024-03-22 13:31:10 +00:00
John Spray	62b318c928	Fix ephemeral file warning on secondaries (#7201 ) A test was added which exercises secondary locations more, and there was a location in the secondary downloader that warned on ephemeral files. This was intended to be fixed in this faulty commit: `8cea866adf`	2024-03-22 10:10:28 +00:00
Anna Khanova	6770ddba2e	proxy: connect redis with AWS IAM (#7189 ) ## Problem Support of IAM Roles for Service Accounts for authentication. ## Summary of changes * Obtain aws 15m-long credentials * Retrieve redis password from credentials * Update every 1h to keep connection for more than 12h * For now allow to have different endpoints for pubsub/stream redis. TODOs: * PubSub doesn't support credentials refresh, consider using stream instead. * We need an AWS role for proxy to be able to connect to both: S3 and elasticache. Credentials obtaining and connection refresh was tested on xenon preview. https://github.com/neondatabase/cloud/issues/10365	2024-03-22 09:38:04 +01:00
Arpad Müller	3ee34a3f26	Update Rust to 1.77.0 (#7198 ) Release notes: https://blog.rust-lang.org/2024/03/21/Rust-1.77.0.html Thanks to #6886 the diff is reasonable, only for one new lint `clippy::suspicious_open_options`. I added `truncate()` calls to the places where it is obviously the right choice to me, and added allows everywhere else, leaving it for followups. I had to specify cargo install --locked because the build would fail otherwise. This was also recommended by upstream.	2024-03-22 06:52:31 +00:00
Christian Schwarz	fb60278e02	walredo benchmark: throughput-oriented rewrite (#7190 ) See the updated `bench_walredo.rs` module comment. tl;dr: we measure avg latency of single redo operations issued against a single redo manager from N tokio tasks. part of https://github.com/neondatabase/neon/issues/6628	2024-03-21 15:24:56 +01:00
Conrad Ludgate	d5304337cf	proxy: simplify password validation (#7188 ) ## Problem for HTTP/WS/password hack flows we imitate SCRAM to validate passwords. This code was unnecessarily complicated. ## Summary of changes Copy in the `pbkdf2` and 'derive keys' steps from the `postgres_protocol` crate in our `rust-postgres` fork. Derive the `client_key`, `server_key` and `stored_key` from the password directly. Use constant time equality to compare the `stored_key` and `server_key` with the ones we are sent from cplane.	2024-03-21 13:54:06 +00:00
John Spray	06cb582d91	pageserver: extend /re-attach response to include tenant mode (#6941 ) This change improves the resilience of the system to unclean restarts. Previously, re-attach responses only included attached tenants - If the pageserver had local state for a secondary location, it would remain, but with no guarantee that it was still _meant_ to be there. After this change, the pageserver will only retain secondary locations if the /re-attach response indicates that they should still be there. - If the pageserver had local state for an attached location that was omitted from a re-attach response, it would be entirely detached. This is wasteful in a typical HA setup, where an offline node's tenants might have been re-attached elsewhere before it restarts, but the offline node's location should revert to a secondary location rather than being wiped. Including secondary tenants in the re-attach response enables the pageserver to avoid throwing away local state unnecessarily. In this PR: - The re-attach items are extended with a 'mode' field. - Storage controller populates 'mode' - Pageserver interprets it (default is attached if missing) to construct either a SecondaryTenant or a Tenant. - A new test exercises both cases.	2024-03-21 13:39:23 +00:00
John Spray	bb47d536fb	pageserver: quieten log on shutdown-while-attaching (#7177 ) ## Problem If a shutdown happens when a tenant is attaching, we were logging at ERROR severity and with a backtrace. Yuck. ## Summary of changes - Pass a flag into `make_broken` to enable quietening this non-scary case.	2024-03-21 12:56:13 +00:00
John Spray	59cdee749e	storage controller: fixes to secondary location handling (#7169 ) Stacks on: - https://github.com/neondatabase/neon/pull/7165 Fixes while working on background optimization of scheduling after a split: - When a tenant has secondary locations, we weren't detaching the parent shards' secondary locations when doing a split - When a reconciler detaches a location, it was feeding back a locationconf with `Detached` mode in its `observed` object, whereas it should omit that location. This could cause the background reconcile task to keep kicking off no-op reconcilers forever (harmless but annoying). - During shard split, we were scheduling secondary locations for the child shards, but no reconcile was run for these until the next time the background reconcile task ran. Creating these ASAP is useful, because they'll be used shortly after a shard split as the destination locations for migrating the new shards to different nodes.	2024-03-21 12:06:57 +00:00
Vlad Lazar	c75b584430	storage_controller: add metrics (#7178 ) ## Problem Storage controller had basically no metrics. ## Summary of changes 1. Migrate the existing metrics to use Conrad's [`measured`](https://docs.rs/measured/0.0.14/measured/) crate. 2. Add metrics for incoming http requests 3. Add metrics for outgoing http requests to the pageserver 4. Add metrics for outgoing pass through requests to the pageserver 5. Add metrics for database queries Note that the metrics response for the attachment service does not use chunked encoding like the rest of the metrics endpoints. Conrad has kindly extended the crate such that it can now be done. Let's leave it for a follow-up since the payload shouldn't be that big at this point. Fixes https://github.com/neondatabase/neon/issues/6875	2024-03-21 12:00:20 +00:00
Conrad Ludgate	5ec6862bcf	proxy: async aware password validation (#7176 ) ## Problem spawn_blocking in #7171 was a hack ## Summary of changes https://github.com/neondatabase/rust-postgres/pull/29	2024-03-21 11:58:41 +01:00
Jure Bajic	94138c1a28	Enforce LSN ordering of batch entries (#7071 ) ## Summary of changes Enforce LSN ordering of batch entries. Closes https://github.com/neondatabase/neon/issues/6707	2024-03-21 09:17:24 +00:00
Joonas Koivunen	2206e14c26	fix(layer): remove the need to repair internal state (#7030 ) ## Problem The current implementation of struct Layer supports canceled read requests, but those will leave the internal state such that a following `Layer::keep_resident` call will need to repair the state. In pathological cases seen during generation numbers resetting in staging or with too many in-progress on-demand downloads, this repair activity will need to wait for the download to complete, which stalls disk usage-based eviction. Similar stalls have been observed in staging near disk-full situations, where downloads failed because the disk was full. Fixes #6028 or the "layer is present on filesystem but not evictable" problems by: 1. not canceling pending evictions by a canceled `LayerInner::get_or_maybe_download` 2. completing post-download initialization of the `LayerInner::inner` from the download task Not canceling evictions above case (1) and always initializing (2) lead to plain `LayerInner::inner` always having the up-to-date information, which leads to the old `Layer::keep_resident` never having to wait for downloads to complete. Finally, the `Layer::keep_resident` is replaced with `Layer::is_likely_resident`. These fix #7145. ## Summary of changes - add a new test showing that a canceled get_or_maybe_download should not cancel the eviction - switch to using a `watch` internally rather than a `broadcast` to avoid hanging eviction while a download is ongoing - doc changes for new semantics and cleanup - fix `Layer::keep_resident` to use just `self.0.inner.get()` as truth as `Layer::is_likely_resident` - remove `LayerInner::wanted_evicted` boolean as no longer needed Builds upon: #7185. Cc: #5331.	2024-03-21 03:19:08 +02:00
Joonas Koivunen	a95c41f463	fix(heavier_once_cell): take_and_deinit should take ownership (#7185 ) Small fix to remove confusing `mut` bindings. Builds upon #7175, split off from #7030. Cc: #5331.	2024-03-21 00:42:38 +02:00
Tristan Partin	041b653a1a	Add state diagram for compute Models a compute's lifetime.	2024-03-20 17:10:46 -05:00
Alex Chi Z	55c4ef408b	safekeeper: correctly handle signals (#7167 ) errno is not preserved in the signal handler. This pull request fixes it. Maybe related: https://github.com/neondatabase/neon/issues/6969, but does not fix the flaky test problem. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-20 15:22:25 -04:00
Alex Chi Z	5f0d9f2360	fix: add safekeeper team to pgxn codeowners (#7170 ) `pgxn/` also contains WAL proposer code, so modifications to this directory should be able to be approved by the safekeeper team. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-20 18:40:48 +00:00
Arpad Müller	34fa34d15c	Dump layer map json in test_gc_feedback.py (#7179 ) The layer map json is an interesting file for that test, so dump it to make debugging easier.	2024-03-20 18:39:46 +00:00
Joonas Koivunen	e961e0d3df	fix(Layer): always init after downloading in the spawned task (#7175 ) Before this PR, cancellation for `LayerInner::get_or_maybe_download` could occur so that we have downloaded the layer file in the filesystem, but because of the cancellation chance, we have not set the internal `LayerInner::inner` or initialized the state. With the detached init support introduced in #7135 and in place in #7152, we can now initialize the internal state after successfully downloading in the spawned task. The next PR will fix the remaining problems that this PR leaves: - `Layer::keep_resident` is still used because - `Layer::get_or_maybe_download` always cancels an eviction, even when canceled Split off from #7030. Stacked on top of #7152. Cc: #5331.	2024-03-20 20:37:47 +02:00
John Spray	2726b1934e	pageserver: extra debug for test_secondary_downloads failures (#7183 ) - Enable debug logs for this test - Add some debug logging detail in downloader.rs - Add an info-level message in scheduler.rs that makes it obvious if a command is waiting for an existing task rather than spawning a new one.	2024-03-20 18:07:45 +00:00
Joonas Koivunen	3d16cda846	refactor(layer): use detached init (#7152 ) The second part of work towards fixing `Layer::keep_resident` so that it does not need to repair the internal state. #7135 added a nicer API for initialization. This PR uses it to remove a few indentation levels and the loop construction. The next PR #7175 will use the refactorings done in this PR, and always initialize the internal state after a download. Cc: #5331	2024-03-20 18:03:09 +02:00
Joonas Koivunen	fb66a3dd85	fix: ResidentLayer::load_keys should not create INFO level span (#7174 ) Since #6115 with more often used get_value_reconstruct_data and friends, we should not have needless INFO level span creation near hot paths. In our prod configuration, INFO spans are always created, but in practice, very rarely anything at INFO level is logged underneath. `ResidentLayer::load_keys` is only used during compaction so it is not that hot, but this aligns the access paths and their span usage. PR changes the span level to debug to align with others, and adds the layer name to the error which was missing. Split off from #7030.	2024-03-20 15:08:03 +01:00
Conrad Ludgate	6d996427b1	proxy: enable sha2 asm support (#7184 ) ## Problem faster sha2 hashing. ## Summary of changes enable asm feature for sha2. this feature will be default in sha2 0.11, so we might as well lean into it now. It provides a noticeable speed boost on macos aarch64. Haven't tested on x86 though	2024-03-20 12:26:31 +00:00
Vlad Lazar	4ba3f3518e	test: fix on demand activation test flakyness (#7180 ) Warm-up (and the "tenant startup complete" metric update) happens in a background tokio task. The tenant map is eagerly updated (can happen before the task finishes). The test assumed that if the tenant map was updated, then the metric should reflect that. That's not the case, so we tweak the test to wait for the metric. Fixes https://github.com/neondatabase/neon/issues/7158	2024-03-20 10:24:59 +00:00
John Spray	a5d5c2a6a0	storage controller: tech debt (#7165 ) This is a mixed bag of changes split out for separate review while working on other things, and batched together to reduce load on CI runners. Each commit stands alone for review purposes: - do_tenant_shard_split was a long function and had a synchronous validation phase at the start that could readily be pulled out into a separate function. This also avoids the special casing of ApiError::BadRequest when deciding whether an abort is needed on errors - Add a 'describe' API (GET on tenant ID) that will enable storcon-cli to see what's going on with a tenant - the 'locate' API wasn't really meant for use in the field. It's for tests: demote it to the /debug/ prefix - The `Single` placement policy was a redundant duplicate of Double(0), and Double was a bad name. Rename it Attached. (https://github.com/neondatabase/neon/issues/7107) - Some neon_local commands were added for debug/demos, which are now replaced by commands in storcon-cli (#7114 ). Even though that's not merged yet, we don't need the neon_local ones any more. Closes https://github.com/neondatabase/neon/issues/7107 ## Backward compat of Single/Double -> `Attached(n)` change A database migration is used to convert any existing values.	2024-03-19 16:08:20 +00:00
Tristan Partin	64c6dfd3e4	Move functions for creating/extracting tarballs into utils Useful for other code paths which will handle zstd compression and decompression.	2024-03-19 10:50:41 -05:00
Alex Chi Z	a8384a074e	fixup(#7168 ): neon_local: use pageserver defaults for known but unspecified config overrides (#7166 ) e2e tests cannot run on macOS unless the file engine env var is supplied. ``` ./scripts/pytest test_runner/regress/test_neon_superuser.py -s ``` will fail with tokio-epoll-uring not supported. This is because we persist the file engine config by default. In this pull request, we only persist when someone specifies it, so that it can use the default platform-variant config in the page server. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-19 10:43:24 -04:00
John Spray	b80704cd34	tests: log hygiene checks for storage controller (#6710 ) ## Problem As with the pageserver, we should fail tests that emit unexpected log errors/warnings. ## Summary of changes - Refactor existing log checks to be reusable - Run log checks for attachment_service - Add allow lists as needed.	2024-03-19 10:30:33 +00:00
Conrad Ludgate	49be446d95	async password validation (#7171 ) ## Problem password hashing can block main thread ## Summary of changes spawn_blocking the password hash call	2024-03-18 23:57:32 +01:00
Arthur Petukhovsky	ad5efb49ee	Support backpressure for sharding (#7100 ) Add shard_number to PageserverFeedback and parse it on the compute side. When compute receives a new ps_feedback, it calculates min LSNs among feedbacks from all shards, and uses those LSNs for backpressure. Add `test_sharding_backpressure` to verify that backpressure slows down compute to wait for the slowest shard.	2024-03-18 21:54:44 +00:00
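The per-shard aggregation described above can be sketched as follows. The commit only states that feedback carries a `shard_number` and that the compute takes the minimum LSNs across shards; the struct layout, the three LSN field names, and the `FeedbackTracker` type are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified feedback message from one pageserver shard.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ShardFeedback {
    shard_number: u32,
    last_received_lsn: u64,
    disk_consistent_lsn: u64,
    remote_consistent_lsn: u64,
}

/// Keeps the most recent feedback per shard.
#[derive(Default)]
struct FeedbackTracker {
    latest: HashMap<u32, ShardFeedback>,
}

impl FeedbackTracker {
    /// Record a new feedback, replacing any older one from the same shard.
    fn update(&mut self, feedback: ShardFeedback) {
        self.latest.insert(feedback.shard_number, feedback);
    }

    /// Minimum of each LSN across all shards seen so far, so that
    /// backpressure follows the slowest shard. `None` before any feedback.
    fn min_lsns(&self) -> Option<(u64, u64, u64)> {
        let mut feedbacks = self.latest.values();
        let first = feedbacks.next()?;
        let init = (
            first.last_received_lsn,
            first.disk_consistent_lsn,
            first.remote_consistent_lsn,
        );
        Some(feedbacks.fold(init, |acc, fb| {
            (
                acc.0.min(fb.last_received_lsn),
                acc.1.min(fb.disk_consistent_lsn),
                acc.2.min(fb.remote_consistent_lsn),
            )
        }))
    }
}
```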
Christian Schwarz	2bc2fd9cfd	fixup(#7160 / tokio_epoll_uring_ext): double-panic caused by info! in thread-local's drop() (#7164 ) Manual testing of the changes in #7160 revealed that, if the thread-local destructor ever runs (it apparently doesn't in our test suite runs, otherwise #7160 would not have auto-merged), we can encounter an `abort()` due to a double-panic in the tracing code. This github comment here contains the stack trace: https://github.com/neondatabase/neon/pull/7160#issuecomment-2003778176 This PR reverts #7160 and uses an atomic counter to identify the thread-local in log messages, instead of the memory address of the thread local, which may be re-used.	2024-03-18 16:12:01 +01:00
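The idea of identifying a thread-local by a counter rather than by its address can be illustrated with the standard library alone. The names `NEXT_STATE_ID`, `THREAD_STATE_ID` and `current_thread_state_id` are hypothetical; the actual code in the PR may be structured differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter; each value is handed out exactly once, so an ID
/// is never re-used even if a later thread-local lands at the same address.
static NEXT_STATE_ID: AtomicU64 = AtomicU64::new(0);

thread_local! {
    /// Initialized lazily on first access from each thread.
    static THREAD_STATE_ID: u64 = NEXT_STATE_ID.fetch_add(1, Ordering::Relaxed);
}

/// Returns the stable ID of the calling thread's state, suitable for logs.
fn current_thread_state_id() -> u64 {
    THREAD_STATE_ID.with(|id| *id)
}
```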
Joonas Koivunen	877fd14401	fix: spanless log message (#7155 ) with `immediate_gc` the span only covered the `gc_iteration`, make it cover the whole needless spawned task, which also does waiting for layer drops and stray logging in tests. also clarify some comments while we are here. Fixes: #6910	2024-03-18 16:27:53 +02:00
Christian Schwarz	db749914d8	fixup(#7141 / tokio_epoll_uring_ext): high frequency log message (#7160 ) The PR #7141 added log message ``` ThreadLocalState is being dropped and id might be re-used in the future ``` which was supposed to be emitted when the thread-local is destroyed. Instead, it was emitted on _each_ call to `thread_local_system()`, i.e., on each tokio-epoll-uring operation. Testing ------- Reproduced the issue locally and verified that this PR fixes the issue.	2024-03-18 12:29:20 +00:00
John Spray	1d3ae57f18	pageserver: refactoring in TenantManager to reduce duplication (#6732 ) ## Problem Followup to https://github.com/neondatabase/neon/pull/6725 In that PR, code for purging local files from a tenant shard was duplicated. ## Summary of changes - Refactor detach code into TenantManager - `spawn_background_purge` method can now be common between detach and split operations	2024-03-18 10:37:20 +00:00
Joonas Koivunen	30a3d80d2f	build: make procfs linux only dependency (#7156 ) the dependency refuses to build on macos so builds on `main` are broken right now, including the `release` PR.	2024-03-18 09:28:45 +00:00
Christian Schwarz	5cec5cb3cf	fixup(#7120 ): the macOS code used an outdated constant name, broke the build (#7150 )	2024-03-15 19:48:51 +00:00
Christian Schwarz	0694ee9531	tokio-epoll-uring: retry on launch failures due to locked memory (#7141 ) refs https://github.com/neondatabase/neon/issues/7136 Problem ------- Before this PR, we were using `tokio_epoll_uring::thread_local_system()`, which panics on tokio_epoll_uring::System::launch() failure. As we've learned in [the past](https://github.com/neondatabase/neon/issues/6373#issuecomment-1905814391), some older Linux kernels account io_uring instances as locked memory. And while we've raised the limit in prod considerably, we did hit it once on 2024-03-11 16:30 UTC. That was after we enabled tokio-epoll-uring fleet-wide, but before we had shipped release-5090 (`c6ed86d3d0`) which did away with the last mass-creation of tokio-epoll-uring instances as per commit `3da410c8fe` Author: Christian Schwarz <christian@neon.tech> Date: Tue Mar 5 10:03:54 2024 +0100 tokio-epoll-uring: use it on the layer-creating code paths (#6378) Nonetheless, it highlighted that panicking in this situation is probably not ideal, as it can leave the pageserver process in a semi-broken state. Further, due to low sampling rate of Prometheus metrics, we don't know much about the circumstances of this failure instance. Solution -------- This PR implements a custom thread_local_system() that is pageserver-aware and will do the following on failure: - dump relevant stats to `tracing!`, hopefully they will be useful to understand the circumstances better - if it's not the locked memory failure (or any other ENOMEM): abort() the process - if it's ENOMEM, retry with exponential back-off, capped at 3s. - add metric counters so we can create an alert This makes sense in the production environment where we know that _usually_, there's ample locked memory allowance available, and we know that failures are rare.	2024-03-15 19:46:15 +00:00
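The capped exponential back-off mentioned above can be sketched as a pure delay function. Only the 3s cap comes from the commit message; the function name `backoff_delay`, the doubling factor, and any base delay passed in are assumptions for illustration.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// never exceeding `cap`. Overflow at large attempt counts saturates to `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}
```

A retry loop would sleep for `backoff_delay(attempt, base, Duration::from_secs(3))` between launch attempts and increment `attempt` after each failure.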
John Spray	9752ad8489	pageserver, controller: improve secondary download APIs for large shards (#7131 ) ## Problem The existing secondary download API relied on the caller to wait as long as it took to complete -- for large shards that could be a long time, so typical clients that might have a baked-in ~30s timeout would have a problem. ## Summary of changes - Take a `wait_ms` query parameter to instruct the pageserver how long to wait: if the download isn't complete in this duration, then 201 is returned instead of 200. - For both 200 and 201 responses, include response body describing download progress, in terms of layers and bytes. This is sufficient for the caller to track how much data is being transferred and log/present that status. - In storage controller live migrations, use this API to apply a much longer outer timeout, with smaller individual per-request timeouts, and log the progress of the downloads. - Add a test that injects layer download delays to exercise the new behavior	2024-03-15 19:45:58 +00:00
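The calling pattern described above (a long outer timeout built from short per-request waits) might be sketched like this. Everything here is hypothetical except the 200/201 distinction and the idea of progress in layers and bytes: the `DownloadProgress` fields, the `wait_for_download` function, and the closure standing in for the HTTP request are illustrative, and a real client would also handle error statuses and measure wall-clock time.

```rust
use std::time::Duration;

/// Hypothetical shape of the progress body returned with both 200 and 201.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DownloadProgress {
    layers_downloaded: u64,
    layers_total: u64,
    bytes_downloaded: u64,
    bytes_total: u64,
}

/// Repeatedly issues a request that waits up to `per_request_wait` on the
/// server side. Returns `Ok` with the final progress on 200, or `Err` with
/// the last seen progress once the accumulated wait reaches `outer_timeout`.
/// Any status other than 200 is treated as "not complete yet" in this sketch.
fn wait_for_download(
    mut request: impl FnMut(Duration) -> (u16, DownloadProgress),
    per_request_wait: Duration,
    outer_timeout: Duration,
) -> Result<DownloadProgress, DownloadProgress> {
    let mut waited = Duration::ZERO;
    loop {
        let (status, progress) = request(per_request_wait);
        if status == 200 {
            return Ok(progress);
        }
        // Count at least 1ms per round so a zero wait cannot loop forever.
        waited += per_request_wait.max(Duration::from_millis(1));
        if waited >= outer_timeout {
            return Err(progress);
        }
    }
}
```

Because each response carries progress, the caller can log how much data has been transferred after every round instead of blocking silently on one long request.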
Christian Schwarz	ad6f538aef	tokio-epoll-uring: use it for on-demand downloads (#6992 ) # Problem On-demand downloads are still using `tokio::fs`, which we know is inefficient. # Changes - Add `pagebench ondemand-download-churn` to quantify on-demand download throughput - Requires dumping layer map, which required making `history_buffer` impl `Deserialize` - Implement an equivalent of `tokio::io::copy_buf` for owned buffers => `owned_buffers_io` module and children. - Make layer file download sensitive to `io_engine::get()`, using VirtualFile + above copy loop - For this, I had to move some code into the `retry_download`, e.g., `sync_all()` call. Drive-by: - fix missing escaping in `scripts/ps_ec2_setup_instance_store` - if we failed in retry_download to create a file, we'd try to remove it, encounter `NotFound`, and `abort()` the process using `on_fatal_io_error`. This PR treats `NotFound` as a success. # Testing Functional - The copy loop is generic & unit tested. Performance - Used the `ondemand-download-churn` benchmark to manually test against real S3. - Results (public Notion page): https://neondatabase.notion.site/Benchmarking-tokio-epoll-uring-on-demand-downloads-2024-04-15-newer-code-03c0fdc475c54492b44d9627b6e4e710?pvs=4 - Performance is equivalent at low concurrency. Jumpier situation at high concurrency, but, still less CPU / throughput with tokio-epoll-uring. - It’s a win. # Future Work Turn the manual performance testing described in the above results document into a performance regression test: https://github.com/neondatabase/neon/issues/7146	2024-03-15 18:57:05 +00:00
John Spray	1aa159acca	pageserver: cancellation for remote ops in tenant deletion on shutdown (#6105 ) ## Problem Tenant deletion had a couple of TODOs where we weren't using proper cancellation tokens that would have aborted the deletions during process shutdown. ## Summary of changes - Refactor enough that deletion/shutdown code has access to the TenantManager's cancellation token - Use that cancellation token in tenant deletion instead of dummy tokens.	2024-03-15 18:03:49 +00:00
Christian Schwarz	60f30000ef	tokio-epoll-uring: fallback to std-fs if not available & not explicitly requested (#7120 ) fixes https://github.com/neondatabase/neon/issues/7116 Changes: - refactor PageServerConfigBuilder: support not-set values - implement runtime feature test - use runtime feature test to determine `virtual_file_io_engine` if not explicitly configured in the config - log the effective engine at startup - drive-by: improve assertion messages in `test_pageserver_init_node_id` This needed a tiny bit of tokio-epoll-uring work, hence bumping it. Changelog: ``` git log --no-decorate --oneline --reverse 868d2c42b5d54ca82fead6e8f2f233b69a540d3e..342ddd197a060a8354e8f11f4d12994419fff939 c7a74c6 Bump mio from 0.8.8 to 0.8.11 4df3466 Bump mio from 0.8.8 to 0.8.11 (#47) 342ddd1 lifecycle: expose `LaunchResult` enum (#49) ```	2024-03-15 17:46:04 +00:00
John Spray	bc1efa827f	pageserver: exclude gc_horizon from synthetic size calculation (#6407 ) ## Problem See: - https://github.com/neondatabase/neon/issues/6374 ## Summary of changes Whereas previously we calculated synthetic size from the gc_horizon or the pitr_interval (whichever is the lower LSN), now we ignore gc_horizon and exclusively start from the `pitr_interval`. This is a more generous calculation for billing, where we do not charge users for data retained due to gc_horizon.	2024-03-15 16:07:36 +00:00
John Spray	67522ce83d	docs: shard splitting RFC (#6358 ) Extend the previous sharding RFC with functionality for dynamically splitting shards to increase the total shard count on existing tenants.	2024-03-15 16:00:04 +00:00
John Spray	7d32af5ad5	.github: apply timeout to pytest `regress` (#7142 ) These test runs usually take 20-30 minutes. If something hangs, we see actions proceeding for several hours: it's more convenient to have them time out sooner so that we notice faster that something has hung.	2024-03-15 15:57:01 +00:00
Joonas Koivunen	59b6cce418	heavier_once_cell: add detached init support (#7135 ) Aiming for the design where `heavier_once_cell::OnceCell` is initialized by a future factory led to awkwardness with how `LayerInner::get_or_maybe_download` looks right now with the `loop`. The loop helps with two situations: - an eviction has been scheduled but has not yet happened, and a read access should cancel the eviction - a previous `LayerInner::get_or_maybe_download` that canceled a pending eviction was canceled leaving the `heavier_once_cell::OnceCell` uninitialized but needing repair by the next `LayerInner::get_or_maybe_download` By instead supporting detached initialization in `heavier_once_cell::OnceCell` via an `OnceCell::get_or_detached_init`, we can fix what the monolithic #7030 does: - spawned off download task initializes the `heavier_once_cell::OnceCell` regardless of the download starter being canceled - a canceled `LayerInner::get_or_maybe_download` no longer stops eviction but can win it if not canceled Split off from #7030. Cc: #5331	2024-03-15 15:54:28 +00:00
Joonas Koivunen	bf187aa13f	fix(layer): metric miscalculations (#7137 ) Split off from #7030: - each early exit is counted as canceled init, even though it most likely was just `LayerInner::keep_resident` doing the no-download repair check - `downloaded_after` could have been accounted for multiple times, and also when repairing to match on-disk state Cc: #5331	2024-03-15 17:30:13 +02:00
John Spray	22c26d610b	pageserver: remove un-needed "uninit mark" (#5717 ) Switched the order; doing https://github.com/neondatabase/neon/pull/6139 first then can remove uninit marker after. ## Problem Previously, existence of a timeline directory was treated as evidence of the timeline's logical existence. That is no longer the case since we treat remote storage as the source of truth on each startup: we can therefore do without this mark file. The mark file had also been used as a pseudo-lock to guard against concurrent creations of the same TimelineId -- now that persistence is no longer required, this is a bit unwieldy. In #6139 the `Tenant::timelines_creating` was added to protect against concurrent creations on the same TimelineId, making the uninit mark file entirely redundant. ## Summary of changes - Code that writes & reads mark file is removed - Some nearby `pub` definitions are amended to `pub(crate)` - `test_duplicate_creation` is added to demonstrate that mutual exclusion of creations still works.	2024-03-15 17:23:05 +02:00
John Spray	516f793ab4	remote_storage: make last_modified and etag mandatory (#7126 ) ## Problem These fields were only optional for the convenience of the `local_fs` test helper -- real remote storage backends provide them. It complicated any code that actually wanted to use them for anything. ## Summary of changes - Make these fields non-optional - For azure/S3 it is an error if the server doesn't provide them - For local_fs, use random strings as etags and the file's mtime for last_modified.	2024-03-15 13:37:49 +00:00
John Spray	6443dbef90	tests: extend log allow list for test_sharding_split_failures (#7134 ) Failure types that panic the storage controller can cause unlucky pageservers to emit log warnings that they can't reach the generation validation API: https://neon-github-public-dev.s3.amazonaws.com/reports/main/8284495687/index.html Tolerate this log message: it's an expected behavior.	2024-03-15 13:18:12 +00:00
John Spray	23416cc358	docs: sharding phase 1 RFC (#5432 ) We need to shard our Tenants to support larger databases without those large databases dominating our pageservers and/or requiring dedicated pageservers. This RFC aims to define an initial capability that will permit creating large-capacity databases using a static configuration defined at time of Tenant creation. Online re-sharding is deferred as future work, as is offloading layers for historical reads. However, both of these capabilities would be implementable without further changes to the control plane or compute: this RFC aims to define the cross-component work needed to bootstrap sharding end-to-end.	2024-03-15 11:14:25 +00:00
Anna Khanova	46098ea0ea	proxy: add more missing warm logging (#7133 ) ## Problem There is one more missing thing about cached connections for `cold_start_info`. ## Summary of changes Fix and add comments.	2024-03-15 11:13:15 +00:00
Conrad Ludgate	49bc734e02	proxy: add websocket regression tests (#7121 ) ## Problem We have no regression tests for websocket flow ## Summary of changes Add a hacky implementation of the postgres protocol over websockets just to verify the protocol behaviour does not regress over time.	2024-03-15 10:21:48 +01:00
Alex Chi Z	76c44dc140	spec: disable neon extension auto upgrade (#7128 ) This pull request disables neon extension auto upgrade to help make the next compute image upgrade smooth. ## Summary of changes We have two places to auto-upgrade neon extension: during compute spec update, and when the compute node starts. The compute spec update logic is always there, and the compute node start logic is added in https://github.com/neondatabase/neon/pull/7029. In this pull request, we disable both of them, so that we can still roll back to an older version of compute before figuring out the best way of extension upgrade-downgrade. https://github.com/neondatabase/neon/issues/6936 We will enable auto-upgrade in the next release following this release. There are no other extension upgrades from release 4917 and therefore after this pull request, it would be safe to revert to release 4917. Impact: * Projects created after unpinning the compute image -> if we need to roll back, they will be stuck, because the default neon extension version is 1.3. Need to manually pin the compute image version if such things happen. * Projects already stuck on staging due to not being downgradeable -> I don't know their current status, maybe they are already running the latest compute image? * Other projects -> can be rolled back to release 4917. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-14 19:45:38 +00:00
Joonas Koivunen	58ef78cf41	doc(README): note cargo-nextest usage (#7122 ) We have been using #5681 for quite some time, and at least since #6931 the tests have assumed `cargo-nextest` to work around our use of global statics. Unlike `cargo test`, `cargo nextest run` runs each test as a separate process that can be timed out. Add a mention of using `cargo-nextest` in the top-level README.md. Sub-crates can still declare they support `cargo test`, like `compute_tools/README.md` does.	2024-03-14 18:49:42 +00:00
John Spray	678ed39de2	storage controller: validate DNS of registering nodes (#7101 ) A node with a bad DNS configuration can register itself with the storage controller, and the controller will try and schedule work onto the node, but never succeed because it can't reach the node. The DNS case is a special case of asymmetric network issues. The general case isn't covered here -- but might make sense to tighten up after #6844 merges -- then we can avoid assuming a node is immediately available in re_attach.	2024-03-14 16:48:38 +00:00
Vlad Lazar	3d8830ac35	test_runner: re-enable large slru benchmark (#7125 ) Previously disabled due to https://github.com/neondatabase/neon/issues/7006.	2024-03-14 16:47:32 +00:00
Vlad Lazar	38767ace68	storage_controller: periodic pageserver heartbeats (#7092 ) ## Problem If a pageserver was offline when the storage controller started, there was no mechanism to update the storage controller state when the pageserver becomes active. ## Summary of changes * Add a heartbeater module. The heartbeater must be driven by an external loop. * Integrate the heartbeater into the service. - Extend the types used by the service and scheduler to keep track of a node's utilisation score. - Add a background loop to drive the heartbeater and update the state based on the deltas it generated - Do an initial round of heartbeats at start-up	2024-03-14 15:21:36 +00:00
Arseny Sher	9fe0193e51	Bump vendor/postgres v15 v14.	2024-03-14 18:06:53 +04:00
Christian Schwarz	8075f0965a	fix(test suite) `virtual_file_io_engine` and `get_vectored_impl` parametrization doesn't work (#7113 ) # Problem While investigating #7124, I noticed that the benchmark was always using the `DEFAULT_*` `virtual_file_io_engine` , i.e., `tokio-epoll-uring` as of https://github.com/neondatabase/neon/pull/7077. The fundamental problem is that the `control_plane` code has its own view of `PageServerConfig`, which, I believe, will always be a subset of the real pageserver's `pageserver/src/config.rs`. For the `virtual_file_io_engine` and `get_vectored_impl` parametrization of the test suite, we were constructing a dict on the Python side that contained these parameters, then handed it to `control_plane::PageServerConfig`'s derived `serde::Deserialize`. The default in serde is to ignore unknown fields, so, the Deserialize impl silently ignored the fields. In consequence, the fields weren't propagated to the `pageserver --init` call, and the tests ended up using the `pageserver/src/config.rs::DEFAULT_` values for the respective options all the time. Tests that explicitly used overrides in `env.pageserver.start()` and similar were not affected by this. But, it means that all the test suite runs with parametrization didn't properly exercise the code path. # Changes - use `serde(deny_unknown_fields)` to expose the problem - With this change, the Python tests that override `virtual_file_io_engine` and `get_vectored_impl` fail on `pageserver --init`, exposing the problem. - use destructuring to uncover the issue in the future - fix the issue by adding the missing fields to the `control_plane` crate's `PageServerConf` - A better solution would be for control plane to re-use a struct provided by the pageserver crate, so that everything is in one place in `pageserver/src/config.rs`, but, our config parsing code is (almost) beyond repair anyways. 
- fix the `pageserver_virtual_file_io_engine` to be responsive to the env var - => required to make parametrization work in benchmarks # Testing Before merging this PR, I re-ran the regression tests & CI with the full matrix of `virtual_file_io_engine` and `tokio-epoll-uring`, see `9c7ea364e0`	2024-03-14 11:18:55 +00:00
John Spray	44f42627dd	pageserver/controller: error handling for shard splitting (#7074 ) ## Problem Shard splits worked, but weren't safe against failures (e.g. node crash during split) yet. Related: #6676 ## Summary of changes - Introduce async rwlocks at the scope of Tenant and Node: - exclusive tenant lock is used to protect splits - exclusive node lock is used to protect new reconciliation process that happens when setting node active - exclusive locks used in both cases when doing persistent updates (e.g. node scheduling conf) where the update to DB & in-memory state needs to be atomic. - Add failpoints to shard splitting in control plane and pageserver code. - Implement error handling in control plane for shard splits: this detaches child shards and ensures parent shards are re-attached. - Crash-safety for storage controller restarts requires little effort: we already reconcile with nodes over a storage controller restart, so as long as we reset any incomplete splits in the DB on restart (added in this PR), things are implicitly cleaned up. - Implement reconciliation with offline nodes before they transition to active: - (in this context reconciliation means something like startup_reconcile, not literally the Reconciler) - This covers cases where split abort cannot reach a node to clean it up: the cleanup will eventually happen when the node is marked active, as part of reconciliation. - This also covers the case where a node was unavailable when the storage controller started, but becomes available later: previously this allowed it to skip the startup reconcile. - Storage controller now terminates on panics. We only use panics for true "should never happen" assertions, and these cases can leave us in an un-usable state if we keep running (e.g. panicking in a shard split). In the unlikely event that we get into a crashloop as a result, we'll rely on kubernetes to back us off. 
- Add `test_sharding_split_failures` which exercises a variety of failure cases during shard split.	2024-03-14 09:11:57 +00:00
Conrad Ludgate	3bd6551b36	proxy http cancellation safety (#7117 ) ## Problem hyper auto-cancels the request futures on connection close. `sql_over_http::handle` is not 'drop cancel safe', so we need to do some other work to make sure connections are queries in the right way. ## Summary of changes 1. tokio::spawn the request handler to resolve the initial cancel-safety issue 2. share a cancellation token, and cancel it when the request `Service` is dropped. 3. Add a new log span to be able to track the HTTP connection lifecycle.	2024-03-14 08:20:56 +00:00
Christian Schwarz	69338e53e3	throttling: fixup interactions with Timeline::get_vectored (#7089 ) ## Problem Before this PR, `Timeline::get_vectored` would be throttled twice if the sequential option was enabled or if validation was enabled. Also, `pageserver_get_vectored_seconds` included the time spent in the throttle, which turns out to be undesirable for what we use that metric for. ## Summary of changes Double-throttle: * Add `Timeline::get0` method which is unthrottled. * Use that method from within the `Timeline::get_vectored` code path. Metric: * return throttled time from `throttle()` method * deduct the value from the observed time * globally rate-limited logging of duration subtraction errors, like in all other places that do the throttled-time deduction from observations	2024-03-13 17:49:17 +00:00
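The throttled-time deduction described in the commit above can be sketched as follows. This is a minimal illustration, not the pageserver's implementation: the `Recorder` type, its fields, and the choice to drop the sample on a subtraction error are all assumptions made for the example.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a latency histogram plus an error counter.
#[derive(Default)]
struct Recorder {
    observations: Vec<Duration>,
    subtraction_errors: u64,
}

impl Recorder {
    /// Records `elapsed` minus the time spent waiting in the throttle.
    fn observe(&mut self, elapsed: Duration, throttled: Duration) {
        match elapsed.checked_sub(throttled) {
            // Normal case: only the un-throttled time is observed.
            Some(net) => self.observations.push(net),
            // throttled > elapsed should not happen; count it so that a
            // rate-limited log message could be emitted (assumption).
            None => self.subtraction_errors += 1,
        }
    }
}
```

`Duration::checked_sub` returns `None` instead of panicking when the throttled time exceeds the elapsed time, which is what makes the error case explicit.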
Arpad Müller	5309711691	Make tenant_id in TenantLocationConfigRequest optional (#7055 ) The `tenant_id` in `TenantLocationConfigRequest` in the `location_config` endpoint was only used in the storage controller/attachment service, and there it was only used for assertions and the creation part.	2024-03-13 17:30:29 +01:00
Joonas Koivunen	8a53d576e6	fix(metrics): time individual layer flush operations (#7109 ) Currently, the flushing operation could flush multiple frozen layers to the disk and store the aggregate time in the histogram. The result is a bimodal distribution with short and over 1000-second flushes. Change it so that we record how long one layer flush takes.	2024-03-13 15:10:20 +00:00
Anna Khanova	b0aff04157	proxy: add new dimension to exclude cplane latency (#7011 ) ## Problem Currently cplane communication is a part of the latency monitoring, which makes it impossible to set up proper alerting based on proxy latency. ## Summary of changes Added dimension to exclude cplane latency.	2024-03-13 13:50:05 +01:00
Anna Khanova	0554bee022	proxy: Report warm cold start if connection is from the local cache (#7104 ) ## Problem * quotes in serialized string * no status if connection is from local cache ## Summary of changes * remove quotes * report warm if connection is from local cache	2024-03-13 11:45:19 +00:00
Conrad Ludgate	83855a907c	proxy http error classification (#7098 ) ## Problem Missing error classification for SQL-over-HTTP queries. Not respecting `UserFacingError` for SQL-over-HTTP queries. ## Summary of changes Adds error classification. Adds user facing errors.	2024-03-13 07:35:49 +01:00
John Spray	1b41db8bdd	pageserver: enable setting stripe size inline with split request. (#7093 ) ## Summary - Currently we can set stripe size at tenant creation, but it doesn't mean anything until we have multiple shards - When onboarding an existing tenant, it will always get a default shard stripe size, so we would like to be able to pick the actual stripe size at the point we split. ## Why do this inline with a split? The alternative to this change would be to have a separate endpoint on the storage controller for setting the stripe size on a tenant, and only permit writes to that endpoint when the tenant has only a single shard. That would work, but be a little bit more work for a client, and not appreciably simpler (instead of having a special argument to the split functions, we'd have a special separate endpoint, and a requirement that the controller must sync its config down to the pageserver before calling the split API). Either approach would work, but this one feels a bit more robust end-to-end: the split API is the _very last moment_ that the stripe size is mutable, so if we aim to set it before splitting, it makes sense to do it as part of the same operation.	2024-03-12 20:41:08 +00:00
Jure Bajic	bac06ea1ac	pageserver: fix read path max lsn bug (#7007 ) ## Summary of changes The problem it fixes is when `request_lsn` is `u64::MAX-1` the `cont_lsn` becomes `u64::MAX` which is the same as `prev_lsn` which stops the loop. Closes https://github.com/neondatabase/neon/issues/6812	2024-03-12 16:32:47 +00:00
John Spray	7ae8364b0b	storage controller: register nodes in re-attach request (#7040 ) ## Problem Currently we manually register nodes with the storage controller, and use a script during deploy to register with the cloud control plane. Rather than extend that script further, nodes should just register on startup. ## Summary of changes - Extend the re-attach request to include an optional NodeRegisterRequest - If the `register` field is set, handle it like a normal node registration before executing the normal re-attach work. - Update tests/neon_local that used to rely on doing an explicit register step that could be enabled/disabled. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-03-12 14:47:12 +00:00
Conrad Ludgate	1f7d54f987	proxy refactor tls listener (#7056 ) ## Problem Now that we have tls-listener vendored, we can refactor and remove a lot of bloated code and make the whole flow a bit simpler ## Summary of changes 1. Remove dead code 2. Move the error handling to inside the `TlsListener` accept() function 3. Extract the peer_addr from the PROXY protocol header and log it with errors	2024-03-12 13:05:40 +00:00
Arthur Petukhovsky	580e136b2e	Forward all backpressure feedback to compute (#7079 ) Previously we aggregated ps_feedback on each safekeeper and sent it to walproposer with every AppendResponse. This PR changes it to send ps_feedback to walproposer right after receiving it from pageserver, without aggregating it in memory. Also contains some preparations for implementing backpressure support for sharding.	2024-03-12 12:14:02 +00:00
Conrad Ludgate	09699d4bd8	proxy: cancel http queries on timeout (#7031 ) ## Problem On HTTP query timeout, we should try and cancel the current in-flight SQL query. ## Summary of changes Trigger a cancellation command in postgres once the timeout is reached	2024-03-12 11:52:00 +00:00
John Spray	89cf714890	tests/neon_local: rename "attachment service" -> "storage controller" (#7087 ) Not a user-facing change, but can break any existing `.neon` directories created by neon_local, as the name of the database used by the storage controller changes. This PR changes all the locations apart from the path of `control_plane/attachment_service` (waiting for an opportune moment to do that one, because it's the most conflict-ish wrt ongoing PRs like #6676 )	2024-03-12 11:36:27 +00:00
Heikki Linnakangas	621ea2ec44	tests: try to make restored-datadir comparison tests not flaky v2 This test occasionally fails with a difference in "pg_xact/0000" file between the local and restored datadirs. My hypothesis is that something changed in the database between the last explicit checkpoint and the shutdown. I suspect autovacuum, it could certainly create transactions. To fix, be more precise about the point in time that we compare. Shut down the endpoint first, then read the last LSN (i.e. the shutdown checkpoint's LSN), from the local disk with pg_controldata. And use exactly that LSN in the basebackup. Closes #559	2024-03-11 23:29:32 +04:00
Heikki Linnakangas	74d09b78c7	Keep walproposer alive until shutdown checkpoint is safe on safekeepers The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one difference from real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited. As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example. Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core. After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always. In commit `568f91420a`, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again. Some tests wrongly assumed that no WAL could have been written between pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing flush_ep_to_pageserver which first stops the endpoint and then waits till all committed WAL reaches the pageserver. 
In passing extract safekeeper http client to its own module.	2024-03-11 23:29:32 +04:00
Arseny Sher	0cf0731d8b	SIGQUIT instead of SIGKILL prewarmed postgres. To avoid orphaned processes using wiped datadir with confusing logging.	2024-03-11 22:36:52 +04:00
Sasha Krassovsky	98723844ee	Don't return from inside PG_TRY (#7095 ) ## Problem Returning from PG_TRY is a bug, and we currently do that ## Summary of changes Make it break and then return false. This should also help stabilize test_bad_connection.py	2024-03-11 18:36:39 +00:00
Alex Chi Z	73a8c97ac8	fix: warnings when compiling neon extensions (#7053 ) proceeding https://github.com/neondatabase/neon/pull/7010, close https://github.com/neondatabase/neon/issues/6188 ## Summary of changes This pull request (should) fix all warnings except `-Wdeclaration-after-statement` in the neon extension compilation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-11 17:49:58 +00:00
Christian Schwarz	17a3c9036e	follow-up(#7077 ): adjust flaky-test-detection cutoff date for tokio-epoll-uring (#7090 ) Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-03-11 16:36:49 +00:00
Joonas Koivunen	8c5b310090	fix: Layer delete on drop and eviction can outlive timeline shutdown (#7082 ) This is a follow-up to #7051 where `LayerInner::drop` and `LayerInner::evict_blocking` were not noticed to require a gate before the file deletion. The lack of entering a gate opens up a similar possibility of deleting a layer file which a newer Timeline instance has already checked out to be resident in a similar case as #7051.	2024-03-11 16:54:06 +01:00
Christian Schwarz	8224580f3e	fix(tenant/timeline metrics): race condition during shutdown + recreation (#7064 ) Tenant::shutdown or Timeline::shutdown completes and becomes externally observable before the corresponding Tenant/Timeline object is dropped. For example, after observing a Tenant::shutdown to complete, we could attach the same tenant_id again. The shut down Tenant object might still be around at the time of the attach. The race is then the following: - old object's metrics are still around - new object uses with_label_values - old object calls remove_label_values The outcome is that the new object will have the metric objects (they're an Arc internally) but the metrics won't be part of the internal registry and hence they'll be missing in `/metrics`. Later, when the new object gets shut down and tries to remove_label_values, it will observe an error because the metric was already removed by the old object. Changes ------- This PR moves metric removal to `shutdown()`. An alternative design would be to multi-version the metrics using a distinguishing label, or, to use a better metrics crate that allows removing metrics from the registry through the locally held metric handle instead of interacting with the (globally shared) registry. refs https://github.com/neondatabase/neon/pull/7051	2024-03-11 15:41:41 +01:00
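The interleaving described in the commit above can be replayed deterministically with a toy registry. This sketch uses only the standard library and is not the `prometheus` crate: the `Registry` type is a hypothetical stand-in that borrows only the method names `with_label_values` and `remove_label_values` from the commit message.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Toy stand-in for a globally shared metrics registry keyed by label value.
#[derive(Default)]
struct Registry {
    counters: Mutex<HashMap<String, Arc<AtomicU64>>>,
}

impl Registry {
    /// Returns the already-registered counter for `label`, or registers a new one.
    fn with_label_values(&self, label: &str) -> Arc<AtomicU64> {
        let mut map = self.counters.lock().unwrap();
        map.entry(label.to_string()).or_default().clone()
    }

    /// Unregisters the counter; fails if nothing is registered under `label`.
    fn remove_label_values(&self, label: &str) -> Result<(), String> {
        let mut map = self.counters.lock().unwrap();
        map.remove(label)
            .map(|_| ())
            .ok_or_else(|| format!("no metric registered for {label}"))
    }

    fn is_registered(&self, label: &str) -> bool {
        self.counters.lock().unwrap().contains_key(label)
    }
}

/// Replays the race and returns
/// (same handle shared, still registered, new object's value, second removal failed).
fn replay_race() -> (bool, bool, u64, bool) {
    let registry = Registry::default();
    // Old Tenant object: shut down, but not yet dropped.
    let old = registry.with_label_values("tenant-1");
    // New Tenant object for the same tenant_id gets the same underlying counter.
    let new = registry.with_label_values("tenant-1");
    let same_handle = Arc::ptr_eq(&old, &new);
    // Old object is finally dropped and unregisters the shared counter.
    registry.remove_label_values("tenant-1").unwrap();
    // New object keeps updating its handle, but the registry no longer exports it.
    new.fetch_add(1, Ordering::Relaxed);
    let still_registered = registry.is_registered("tenant-1");
    // Later shutdown of the new object observes an error on removal.
    let second_removal_failed = registry.remove_label_values("tenant-1").is_err();
    (same_handle, still_registered, new.load(Ordering::Relaxed), second_removal_failed)
}
```

In this model the new object's counter still holds a value, yet it is absent from the registry, which corresponds to the metric missing from `/metrics`.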
Christian Schwarz	2b0f3549f7	default to tokio-epoll-uring in CI tests & on Linux (#7077 ) All of production is using it now as of https://github.com/neondatabase/aws/pull/1121 The change in `flaky_tests.py` resets the flakiness detection logic. The alternative would have been to repeat the choice of io engine in each test name, which would junk up the various test reports too much. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-03-11 14:35:59 +00:00
John Spray	b4972d07d4	storage controller: refactor non-mutable members up into Service (#7086 ) result_tx and compute_hook were in ServiceState (i.e. behind a sync mutex), but didn't need to be. Moving them up into Service removes a bunch of boilerplate clones. While we're here, create a helper `Service::maybe_reconcile_shard` which avoids writing out all the `&self.` arguments to `TenantState::maybe_reconcile` everywhere we call it.	2024-03-11 14:29:32 +00:00
Joonas Koivunen	26ae7b0b3e	fix(metrics): reset TENANT_STATE metric on startup (#7084 ) Otherwise, it might happen that we never get to witness the same state on subsequent restarts, thus the time series will show the value from a few restarts ago. The actual case here was that "Activating" was showing `3` while I was doing tenant migration testing on staging. The number 3 was however from a startup that happened some time ago which had been interrupted by another deployment.	2024-03-11 13:25:53 +00:00
John Spray	f8483cc4a3	pageserver: update swagger for HA APIs (#7070 ) - The type of heatmap_period in tenant config was wrong - Secondary download and heatmap upload endpoints weren't in swagger.	2024-03-11 09:32:17 +00:00
Conrad Ludgate	cc5d6c66b3	proxy: categorise new cplane error message (#7057 ) ## Problem `422 Unprocessable Entity: compute time quota of non-primary branches is exceeded` being marked as a control plane error. ## Summary of changes Add the manual checks to make this a user error that should not be retried.	2024-03-11 09:20:09 +01:00
Roman Zaynetdinov	d894d2b450	Export db size, deadlocks and changed row metrics (#7050 ) ## Problem We want to report metrics for the oldest user database.	2024-03-11 08:10:04 +00:00
Joonas Koivunen	b09d686335	fix: on-demand downloads can outlive timeline shutdown (#7051 ) ## Problem Before this PR, it was possible that on-demand downloads were started after `Timeline::shutdown()`. For example, we have observed a walreceiver-connection-handler-initiated on-demand download that was started after `Timeline::shutdown()`s final `task_mgr::shutdown_tasks()` call. The underlying issue is that `task_mgr::shutdown_tasks()` isn't sticky, i.e., new tasks can be spawned during or after `task_mgr::shutdown_tasks()`. Cc: https://github.com/neondatabase/neon/issues/4175 in lieu of a more specific issue for task_mgr. We already decided we want to get rid of it anyways. Original investigation: https://neondb.slack.com/archives/C033RQ5SPDH/p1709824952465949 ## Changes - enter gate while downloading - use timeline cancellation token for cancelling download thereby, fixes #7054 Entering the gate might also remove recent "kept the gate from closing" in staging.	2024-03-09 13:09:08 +00:00
Christian Schwarz	74d24582cf	throttling: exclude throttled time from basebackup (fixup of #6953 ) (#7072 ) PR #6953 only excluded throttled time from the handle_pagerequests (aka smgr metrics). This PR implements the deduction for `basebackup ` queries. The other page_service methods either don't use Timeline::get or they aren't used in production. Found by manually inspecting in [staging logs](https://neonprod.grafana.net/explore?schemaVersion=1&panes=%7B%22wx8%22:%7B%22datasource%22:%22xHHYY0dVz%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhostname%3D%5C%22pageserver-0.eu-west-1.aws.neon.build%5C%22%7D%20%7C~%20%60git-env%7CERR%7CWARN%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22xHHYY0dVz%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22to%22:%221709919114642%22,%22from%22:%221709904430898%22%7D%7D%7D).	2024-03-09 13:37:02 +01:00
Sasha Krassovsky	4834d22d2d	Revoke REPLICATION (#7052 ) ## Problem Currently users can cause problems with replication ## Summary of changes Don't let them replicate	2024-03-08 22:24:30 +00:00
Anastasia Lubennikova	86e8c43ddf	Add downgrade scripts for neon extension. (#7065 ) ## Problem When we start compute with newer version of extension (i.e. 1.2) and then rollback the release, downgrading the compute version, next compute start will try to update extension to the latest version available in neon.control (i.e. 1.1). Thus we need to provide downgrade scripts like neon--1.2--1.1.sql These scripts must revert the changes made by the upgrade scripts in the reverse order. This is necessary to ensure that the next upgrade will work correctly. In general, we need to write upgrade and downgrade scripts to be more robust and add IF EXISTS / CREATE OR REPLACE clauses to all statements (where applicable). ## Summary of changes Adds downgrade scripts. Adds test cases for extension downgrade/upgrade. fixes #7066 This is a follow-up for https://app.incident.io/neondb/incidents/167?tab=follow-ups Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Alex Chi Z <iskyzh@gmail.com> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2024-03-08 20:42:35 +00:00
John Spray	7329413705	storage controller: enable setting PlacementPolicy in tenant creation (#7037 ) ## Problem Tenants created via the storage controller have a `PlacementPolicy` that defines their HA/secondary/detach intent. For backward compat we can just set it to Single, for onboarding tenants using /location_conf it is automatically set to Double(1) if there are at least two pageservers, but for freshly created tenants we didn't have a way to specify it. This unblocks writing tests that create HA tenants on the storage controller and do failure injection testing. ## Summary of changes - Add optional fields to TenantCreateRequest for specifying PlacementPolicy. This request structure is used both on pageserver API and storage controller API, but this method is only meaningful for the storage controller (same as existing `shard_parameters` attribute). - Use the value from the creation request in tenant creation, if provided.	2024-03-08 15:34:53 +00:00
Conrad Ludgate	2c132e45cb	proxy: do not store ephemeral endpoints in http pool (#6819 ) ## Problem For the ephemeral endpoint feature, it's not really too helpful to keep them around in the connection pool. This isn't really pressing but I think it's still a bit better this way. ## Summary of changes Add `is_ephemeral` function to `NeonOptions`. Allow `serverless::ConnInfo::endpoint_cache_key()` to return an `Option`. Handle that option appropriately	2024-03-08 07:56:23 +00:00
Vlad Lazar	0f05ef67e2	pageserver: revert open layer rolling revert (#6962 ) ## Problem We reverted https://github.com/neondatabase/neon/pull/6661 a few days ago. The change led to OOMs in benchmarks followed by large WAL reingests. The issue was that we removed [this code](`d04af08567/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs (L409-L417)`). That call may trigger a roll of the open layer due to the keepalive messages received from the safekeeper. Removing it meant that enforcement of the checkpoint timeout became even more lax and led to using up large amounts of memory for the in-memory layer indices. ## Summary of changes Piggyback on keepalive messages to enforce checkpoint timeout. This is a hack, but it's exactly what the current code is doing. ## Alternatives Christian, Joonas and I sketched out a timer-based approach [here](https://github.com/neondatabase/neon/pull/6940). While discussing it further, it became obvious that's also a bit of a hack and not the desired end state. I chose not to take that further since it's not what we ultimately want and it'll be harder to rip out. Right now it's unclear what the ideal system behaviour is: * early flushing on memory pressure, or ... * detaching tenants on memory pressure	2024-03-07 19:53:10 +00:00
Conrad Ludgate	02358b21a4	update rustls (#7048 ) ## Summary of changes Update rustls from 0.21 to 0.22. reqwest/tonic/aws-smithy still use rustls 0.21. no upgrade route available yet.	2024-03-07 18:23:19 +00:00
Sasha Krassovsky	2fc89428c3	Hopefully stabilize test_bad_connection.py (#6976 ) ## Problem It seems that even though we have a retry on basebackup, it still sometimes fails to fetch it with the failpoint enabled, resulting in a test error. ## Summary of changes If we fail to get the basebackup, disable the failpoint and try again.	2024-03-07 10:12:06 -08:00
Arpad Müller	ce7a82db05	Update svg_fmt (#7049 ) Gets upstream PR https://github.com/nical/rust_debug/pull/3 , removes trailing "s from output.	2024-03-07 17:32:09 +00:00
John Spray	d5a6a2a16d	storage controller: robustness improvements (#7027 ) ## Problem Closes: https://github.com/neondatabase/neon/issues/6847 Closes: https://github.com/neondatabase/neon/issues/7006 ## Summary of changes - Pageserver API calls are wrapped in timeout/retry logic: this prevents a reconciler getting hung on a pageserver API hang, and prevents reconcilers having to totally retry if one API call returns a retryable error (e.g. 503). - Add a cancellation token to `Node`, so that when we mark a node offline we will cancel any API calls in progress to that node, and avoid issuing any more API calls to that offline node. - If the dirty locations of a shard are all on offline nodes, then don't spawn a reconciler - In re-attach, if we have no observed state object for a tenant then construct one with conf: None (which means "unknown"). Then in Reconciler, implement a TODO for scanning such locations before running, so that we will avoid spuriously incrementing a generation in the case of a node that was offline while we started (this is the case that tripped up #7006) - Refactoring: make Node contents private (and thereby guarantee that updates to availability mode reliably update the cancellation token.) - Refactoring: don't pass the whole map of nodes into Reconciler (and thereby remove a bunch of .expect() calls) Some of this was discovered/tested with a new failure injection test that will come in a separate PR, once it is stable enough for CI.	2024-03-07 17:10:03 +00:00
Vlad Lazar	871977f14c	pageserver: fix early bail out in vectored get (#7038 ) ## Problem When vectored get encountered a portion of the key range that could not be mapped to any layer in the current timeline it would incorrectly bail out of the current timeline. This is incorrect since we may have had layers queued for a visit in the fringe. ## Summary of changes * Add a repro unit test * Remove the early bail out path * Simplify range search return value	2024-03-07 16:02:20 +00:00
Joonas Koivunen	602a4da9a5	bench: run branch_creation_many at 500, seeded (#6959 ) We have a benchmark for creating a lot of branches, but it does random things, and the branch count is not the largest maximum we aim to support. If this PR stabilizes the benchmark's total duration, it means that there are some structures which are very much slower than others. Then we should add a seed-outputting variant to help find and reproduce such cases. Additionally, record for the benchmark: - shutdown duration - startup metrics once done (on restart) - duration of first compaction completion via debug logging	2024-03-07 16:23:42 +02:00
John Spray	d3c583efbe	Rename binary attachment_service -> storage_controller (#7042 ) ## Problem The storage controller binary still has its historic `attachment_service` name -- it will be painful to change this later because we can't atomically update this repo and the helm charts used to deploy. Companion helm chart change: https://github.com/neondatabase/helm-charts/pull/70 ## Summary of changes - Change the name of the binary to `storage_controller` - Skipping renaming things in the source right now: this is just to get rid of the legacy name in external interfaces. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-03-07 14:06:48 +00:00
Vlad Lazar	d03ec9d998	pageserver: don't validate vectored get on shut-down (#7039 ) ## Problem We attempted validation for cancelled errors under the assumption that if vectored get fails, sequential get will too. That's not right 100% of the time, though, because sequential get may have the values cached and slip them through even when shutting down. ## Summary of changes Don't validate if either search impl failed due to tenant shutdown.	2024-03-07 12:37:52 +00:00
Conrad Ludgate	c2876ec55d	proxy http tls investigations (#7045 ) ## Problem Some HTTP-specific TLS errors ## Summary of changes Add more logging, vendor `tls-listener` with minor modifications.	2024-03-07 12:36:47 +00:00
Alex Chi Z	0b330e1310	upgrade neon extension on startup (#7029 ) ## Problem Fix https://github.com/neondatabase/neon/issues/7003. Fix https://github.com/neondatabase/neon/issues/6982. Currently, the neon extension is only upgraded when a new compute spec gets applied, for example, when creating a new role or creating a new database. This also resolves `neon.lfc_stat` not found warnings in prod. ## Summary of changes This pull request adds the logic to spawn a background thread to upgrade the neon extension version if the compute is a primary. If for whatever reason the upgrade fails, it reports an error to the console and does not impact compute node state. This change can be further applied to 3rd-party extension upgrades. We can silently upgrade the version of 3rd-party extensions in the background in the future. Questions: * Does `ALTER EXTENSION` take some kind of lock that will block user requests? * Does `ALTER EXTENSION` write to the database if nothing needs to be upgraded? (may impact storage size). Otherwise it's safe to land this pull request. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-06 12:20:44 -05:00
Alexander Bayandin	f40b13d801	Update client libs for test_runner/pg_clients to their latest versions (#7022 ) ## Problem Closes https://github.com/neondatabase/neon/security/dependabot/56 Supersedes https://github.com/neondatabase/neon/pull/7013 Workflow run: https://github.com/neondatabase/neon/actions/runs/8157302480 ## Summary of changes - Update client libs for `test_runner/pg_clients` to their latest versions	2024-03-06 17:09:54 +00:00
John Spray	a9a4a76d13	storage controller: misc fixes (#7036 ) ## Problem Collection of small changes, batched together to reduce CI overhead. ## Summary of changes - Layer download messages include size -- this is useful when watching a pageserver hydrate its on disk cache in the log. - Controller migrate API could put an invalid NodeId into TenantState - Scheduling errors during tenant create could result in creating some shards and not others. - Consistency check could give hard-to-understand failures in tests if a reconcile was in process: explicitly fail the check if reconciles are in progress instead.	2024-03-06 16:47:32 +00:00
Alex Chi Z	5dc2088cf3	fix(test): drop subscription when test completes (#6975 ) This pull request mitigates https://github.com/neondatabase/neon/issues/6969, but the longer-term problem is that we cannot properly stop Postgres if there is a subscription. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-06 15:52:24 +00:00
John Spray	4a31e18c81	storage controller: include stripe size in compute notifications (#6974 ) ## Problem - The storage controller is the source of truth for a tenant's stripe size, but doesn't currently have a way to propagate that to compute: we're just using the default stripe size everywhere. Closes: https://github.com/neondatabase/neon/issues/6903 ## Summary of changes - Include stripe size in `ComputeHookNotifyRequest` - Include stripe size in `LocationConfigResponse` The stripe size is optional: it will only be advertised for multi-sharded tenants. This enables the controller to defer the choice of stripe size until we split a tenant for the first time.	2024-03-06 13:56:30 +00:00
John Spray	a3ef50c9b6	storage controller: use 'lazy' mode for location_config (#6987 ) ## Problem If large numbers of shards are attached to a pageserver concurrently, for example after another node fails, it can cause excessive I/O queue depths due to all the newly attached shards trying to calculate logical sizes concurrently. #6907 added the `lazy` flag to handle this. ## Summary of changes - Use `lazy=true` from all /location_config calls in the storage controller Reconciler.	2024-03-06 11:26:29 +00:00
Arpad Müller	2f88e7a921	Move compaction code to compaction.rs (#7026 ) Moves some of the (legacy) compaction code to compaction.rs. No functional changes, just moves of code. Before, compaction.rs was only for the new tiered compaction mechanism, now it's for both the old and new mechanisms. Part of #6768	2024-03-06 01:40:23 +00:00
Christian Schwarz	eacdc179dc	fixup(#6991 ): it broke the macOS build (#7024 )	2024-03-05 17:03:51 +00:00
Vlad Lazar	2daa2f1d10	test: disable large slru basebackup bench in ci (#7025 ) The test is flaky due to https://github.com/neondatabase/neon/issues/7006.	2024-03-05 15:41:05 +00:00
Anna Khanova	15b3665dc4	proxy: fix bug with populating the data (#7023 ) ## Problem Branch/project and coldStart were not populated to data events. ## Summary of changes Populate it. Also added logging for the coldstart info.	2024-03-05 15:32:58 +00:00
Arpad Müller	e69a25542b	Minor improvements to tiered compaction (#7020 ) Minor non-functional improvements to tiered compaction, mostly consisting of comment fixes. Followup of #6830, part of #6768	2024-03-05 16:26:51 +01:00
Alex Chi Z	b036c32262	fix -Wmissing-prototypes for neon extension (#7010 ) ## Problem ref https://github.com/neondatabase/neon/issues/6188 ## Summary of changes This pull request fixes `-Wmissing-prototypes` for the neon extension. Note that (1) the gcc versions in CI and on macOS are different, therefore some of the warnings do not get reported when developing the neon extension locally. (2) the CI env variable `COPT = -Werror` does not get passed into the docker build process, therefore warnings are not treated as errors on CI. `e62baa9704/.github/workflows/build_and_test.yml (L22)` There will be follow-up pull requests on solving other warnings. By the way, I did not figure out the default compile parameters in the CI env, and therefore this pull request is tested by manually adding `-Wmissing-prototypes` into the `COPT`. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-05 10:03:44 -05:00
Anna Khanova	bdbb2f4afc	proxy: report redis broken message metric (#7021 ) ## Problem Not really a problem. Improving visibility around redis communication. ## Summary of changes Added metric on the number of broken messages.	2024-03-05 16:02:51 +01:00
Christian Schwarz	270d3be507	feat(per-tenant throttling): exclude throttled time from page_service metrics + regression test (#6953 ) part of https://github.com/neondatabase/neon/issues/5899 Problem ------- Before this PR, the time spent waiting on the throttle was charged towards the higher-level page_service metrics, i.e., `pageserver_smgr_query_seconds`. The metrics are the foundation of internal SLIs / SLOs. A throttled tenant would cause the SLI to degrade / SLO alerts to fire. Changes ------- - don't charge time spent in throttle towards the page_service metrics - record time spent in throttle in RequestContext and subtract it from the elapsed time - this works because the page_service path doesn't create child context, so, all the throttle time is recorded in the parent - it's quite brittle and will break if we ever decide to spawn child tasks that need child RequestContexts, which would have separate instances of the `micros_spent_throttled` counter. - however, let's punt that to a more general refactoring of RequestContext - add a test case that ensures that - throttling happens for getpage requests; this aspect of the test passed before this PR - throttling delays aren't charged towards the page_service metrics; this aspect of the test only passes with this PR - drive-by: make the throttle log message `info!`, it's an expected condition Performance ----------- I took the same measurements as in #6706 , no meaningful change in CPU overhead. Future Work ----------- This PR enables us to experiment with the throttle for select tenants without affecting the SLI metrics / triggering SLO alerts. 
Before declaring this feature done, we need more work to happen, specifically: - decide on whether we want to retain the flexibility of throttling any `Timeline::get` call, filtered by TaskKind - versus: separate throttles for each page_service endpoint, potentially with separate config options - the trouble here is that this decision implies changes to the TenantConfig, so, if we start using the current config style now, then decide to switch to a different config, it'll be a breaking change Nice-to-haves but probably not worth the time right now: - Equivalent tests to ensure the throttle applies to all other page_service handlers.	2024-03-05 13:44:00 +00:00
Vlad Lazar	9dec65b75b	pageserver: fix vectored read path delta layer index traversal (#7001 ) ## Problem Last week's enablement of vectored get generated a number of panics. From them, I diagnosed two issues in the delta layer index traversal logic: 1. The `key >= range.start && lsn >= lsn_range.start` assertion was too aggressive. Lsns are not monotonically increasing in the delta layer index (keys are though), so we cannot assert on them. 2. Lsns greater than or equal to `lsn_range.end` were not skipped. This caused the query to consider records newer than the request Lsn. ## Summary of changes * Fix the issues mentioned above inline * Refactor the layer traversal logic to make it unit testable * Add unit test which reproduces the failure modes listed above.	2024-03-05 13:35:45 +00:00
Vlad Lazar	ae8468f97e	pageserver: fix AUX key vectored get validation (#7018 ) ## Problem The value reconstruct of AUX_FILES_KEY from records is not deterministic since it uses a hash map under the hood. This caused vectored get validation failures when enabled in staging. ## Summary of changes Deserialise AUX_FILES_KEY blobs before comparing. All other keys should reconstruct deterministically, so we simply compare the blobs.	2024-03-05 13:30:43 +00:00
Christian Schwarz	f3e4f85e65	layer file download: final rename: fix durability (#6991 ) Before this PR, the layer file download code would fsync the inode after rename instead of the timeline directory. That is not in line with what a comment further up says we're doing, and it's obviously not achieving the goal of making the rename durable. part of https://github.com/neondatabase/neon/issues/6663	2024-03-05 11:09:13 +00:00
Joonas Koivunen	752bf5a22f	build: clippy disallow futures::pin_mut macro (#7016 ) `std` has had `pin!` macro for some time, there is no need for us to use the older alternatives. Cannot disallow `tokio::pin` because tokio macros use that.	2024-03-05 10:14:37 +00:00
Christian Schwarz	3da410c8fe	tokio-epoll-uring: use it on the layer-creating code paths (#6378 ) part of #6663 See that epic for more context & related commits. Problem ------- Before this PR, the layer-file-creating code paths were using VirtualFile, but under the hood these were still blocking system calls. Generally this meant we'd stall the executor thread, unless the caller "knew" and used the following pattern instead: ``` spawn_blocking(|| { Handle::block_on(async { VirtualFile::....().await; }) }).await ``` Solution -------- This PR adopts `tokio-epoll-uring` on the layer-file-creating code paths in pageserver. Note that on-demand downloads still use `tokio::fs`; these will be converted in a future PR. Design: Avoiding Regressions With `std-fs` ------------------------------------------ If we make the VirtualFile write path truly async using `tokio-epoll-uring`, should we then remove the `spawn_blocking` + `Handle::block_on` usage upstack in the same commit? No, because if we’re still using the `std-fs` io engine, we’d then block the executor in those places where previously we were protecting ourselves from that through the `spawn_blocking`. So, if we want to see benefits from `tokio-epoll-uring` on the write path while also preserving the ability to switch between `tokio-epoll-uring` and `std-fs`, where `std-fs` will behave identically to what we have now, we need to **conditionally use `spawn_blocking + Handle::block_on`**. I.e., in the places where we use that now, we’ll need to make that conditional based on the currently configured io engine. It boils down to investigating all the places where we do `spawn_blocking(... block_on(... VirtualFile::...))`. Detailed [write-up of that investigation in Notion](https://neondatabase.notion.site/Surveying-VirtualFile-write-path-usage-wrt-tokio-epoll-uring-integration-spawn_blocking-Handle-bl-5dc2270dbb764db7b2e60803f375e015?pvs=4 ), made publicly accessible. 
tl;dr: Preceding PRs addressed the relevant call sites: - `metadata` file: turns out we could simply remove it (#6777, #6769, #6775) - `create_delta_layer()`: made sensitive to `virtual_file_io_engine` in #6986 NB: once we are switched over to `tokio-epoll-uring` everywhere in production, we can deprecate `std-fs`; to keep macOS support, we can use `tokio::fs` instead. That will remove this whole headache. Code Changes In This PR ----------------------- - VirtualFile API changes - `VirtualFile::write_at` - implement an `ioengine` operation and switch `VirtualFile::write_at` to it - `VirtualFile::metadata()` - curiously, we only use it from the layer writers' `finish()` methods - introduce a wrapper `Metadata` enum because `std::fs::Metadata` cannot be constructed by code outside rust std - `VirtualFile::sync_all()` and for completeness sake, add `VirtualFile::sync_data()` Testing & Rollout ----------------- Before merging this PR, we ran the CI with both io engines. Additionally, the changes will soak in staging. We could have a feature gate / add a new io engine `tokio-epoll-uring-write-path` to do a gradual rollout. However, that's not part of this PR. Future Work ----------- There's still some use of `std::fs` and/or `tokio::fs` for directory namespace operations, e.g. `std::fs::rename`. We're not addressing those in this PR, as we'll need to add the support in tokio-epoll-uring first. Note that rename itself is usually fast if the directory is in the kernel dentry cache, and only the fsync after rename is slow. These fsyncs are using tokio-epoll-uring, so, the impact should be small.	2024-03-05 09:03:54 +00:00
Alex Chi Z	b7db912be6	compute_ctl: only try zenith_admin if could not authenticate (#6955 ) ## Problem Fix https://github.com/neondatabase/neon/issues/6498 ## Summary of changes Only re-authenticate with zenith_admin if authentication fails. Otherwise, directly return the error message. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-04 14:28:45 -05:00
Alexander Bayandin	3dfae4be8d	upgrade mio 0.8.10 => 0.8.11 (#7009 ) ## Problem `cargo deny` fails - https://rustsec.org/advisories/RUSTSEC-2024-0019 - https://github.com/tokio-rs/mio/security/advisories/GHSA-r8w9-5wcg-vfj7 > The vulnerability is Windows-specific, and can only happen if you are using named pipes. Other IO resources are not affected. ## Summary of changes - Upgrade `mio` from 0.8.10 to 0.8.11 (`cargo update -p mio`)	2024-03-04 19:16:07 +00:00
Christian Schwarz	e62baa9704	upgrade tokio 1.34 => 1.36 (#7008 ) tokio 1.36 has been out for a month. Release notes don't indicate major changes. Skimming through their issue tracker, I can't find open `C-bug` issues that would affect us. (My personal motivation for this is `JoinSet::try_join_next`.)	2024-03-04 18:36:29 +01:00
Alexander Bayandin	191d8ac7e0	vm-image: update pgbouncer from 1.22.0 to 1.22.1 (#7005 ) pgbouncer 1.22.1 has been released > This release fixes issues caused by some clients using COPY FROM STDIN queries. Such queries could introduce memory leaks, performance regressions and prepared statement misbehavior. - NEWS: https://www.pgbouncer.org/2024/03/pgbouncer-1-22-1 - CHANGES: https://github.com/pgbouncer/pgbouncer/compare/pgbouncer_1_22_0...pgbouncer_1_22_1 ## Summary of changes - vm-image: update pgbouncer from 1.22.0 to 1.22.1	2024-03-04 16:04:12 +00:00
Roman Zaynetdinov	0d2395fe96	Update postgres-exporter to v0.12.1 (#7004 ) Fixes https://github.com/neondatabase/neon/issues/6996 Thanks to @bayandin	2024-03-04 16:02:10 +00:00
Christian Schwarz	f0be9400f2	fix(test_remote_storage_upload_queue_retries): became flakier since #6960 (#6999 ) This PR increases the `wait_until` timeout. These are where things became more flaky as of https://github.com/neondatabase/neon/pull/6960. Most likely because it doubles the work in the `churn_while_failpoints_active_thread`. Slack context: https://neondb.slack.com/archives/C033RQ5SPDH/p1709554455962959?thread_ts=1709286362.850549&cid=C033RQ5SPDH	2024-03-04 15:47:13 +01:00
Alex Chi Z	e938bb8157	fix epic issue template (#6920 ) The template does not parse on GitHub	2024-03-04 09:17:14 -05:00
Christian Schwarz	944cac950d	layer file creation: fsync timeline directories using `VirtualFile::sync_all()` (#6986 ) Except for the involvement of the VirtualFile fd cache, this is equivalent to what happened before at runtime. Future PR https://github.com/neondatabase/neon/pull/6378 will implement `VirtualFile::sync_all()` using tokio-epoll-uring if that's configured as the io engine. This PR is preliminary work for that. part of https://github.com/neondatabase/neon/issues/6663	2024-03-04 13:31:09 +00:00
Anna Khanova	e1c032fb3c	Fix type (#6998 ) ## Problem Typo ## Summary of changes Fix	2024-03-04 13:26:16 +00:00
Christian Schwarz	c861d71eeb	layer file creation: fatal_err on timeline dir fsync (#6985 ) As pointed out in the comments added in this PR: the in-memory state of the filesystem already has the layer file in its final place. If the fsync fails, but pageserver continues to execute, it's quite easy for subsequent pageserver code to observe the file being there and assume it's durable, when it really isn't. It can happen that we get ENOSPC during the fsync. However, 1. the timeline dir is small (remember, the big layer _file_ has already been synced). Small data means ENOSPC due to delayed allocation races etc are less likely. 2. what else are we going to do in that case? If we decide to bubble up the error, the file remains on disk. We could try to unlink it and fsync after the unlink. If that fails, we would _definitely_ need to error out. Is it worth the trouble though? Side note: all this logic about not carrying on after fsync failure implies that we `sync` the filesystem successfully before we restart the pageserver. We don't do that right now, but should (=> https://github.com/neondatabase/neon/issues/6989) part of https://github.com/neondatabase/neon/issues/6663	2024-03-04 12:18:22 +00:00
Alexander Bayandin	6e46204712	CI(deploy): use separate workflow for proxy deploys (#6995 ) ## Problem The current implementation of `deploy-prod` workflow doesn't allow to run parallel deploys on Storage and Proxy. ## Summary of changes - Call `deploy-proxy-prod` workflow that deploys only Proxy components, and that can be run in parallel with `deploy-prod` for Storage.	2024-03-04 12:08:44 +00:00
Andreas Scherbaum	5c6d78d469	Rename "zenith" to "neon" (#6957 ) Usually RFC documents are not modified, but the vast mentions of "zenith" in early RFC documents make it desirable to update the product name to today's name, to avoid confusion. ## Problem Early RFC documents use the old "zenith" product name a lot, which is not something everyone is aware of after the product was renamed. ## Summary of changes Replace occurrences of "zenith" with "neon". Images are excluded. --------- Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-03-04 13:02:18 +01:00
Christian Schwarz	3fd77eb0d4	layer file creation: remove redundant fsync()s (#6983 ) The `writer.finish()` methods already fsync the inode, using `VirtualFile::sync_all()`. All that the callers need to do is fsync their directory, i.e., the timeline directory. Note that there's a call in the new compaction code that is apparently dead-at-runtime, so, I couldn't fix up any fsyncs there [Link](`502b69b33b/pageserver/src/tenant/timeline/compaction.rs (L204-L211)`). Note that layer durability still matters somewhat, even after #5198 which made remote storage authoritative. We do have the layer file length as an indicator, but no checksums on the layer file contents. So, a series of overwrites without fsyncs in the middle, plus a subsequent crash, could cause us to end up in a state where the file length matches but the contents are garbage. part of https://github.com/neondatabase/neon/issues/6663	2024-03-04 12:33:42 +01:00
Anna Khanova	3114be034a	proxy: change is cold start to enum (#6948 ) ## Problem Actually, it's a good idea to distinguish between cases when it's a cold start, but we took the compute from the pool ## Summary of changes Updated to enum.	2024-03-04 10:31:28 +01:00
John Spray	8dc7dc79dd	tests: debugging for `test_secondary_downloads` failures (#6984 ) ## Problem - #6966 - Existing logs aren't pointing to a cause: it looks like heatmap upload and download are happening, but for some reason the evicted layer isn't removed on the secondary location. ## Summary of changes - Assert evicted layer is gone from heatmap before checking it's gone from local disk: this will give clarity on whether the issue is with the uploads or downloads. - On assertion failures, log the contents of heatmap.	2024-03-04 09:10:04 +00:00
John Spray	fad9be4598	pageserver: mention key in walredo errors (#6988 ) ## Problem - Walredo errors, e.g. during image creation, mention the LSN affected but not the key. ## Summary of changes - Add key to "error applying ... WAL records" log message	2024-03-04 08:56:55 +00:00
John Spray	20d0939b00	control_plane/attachment_service: implement PlacementPolicy::Secondary, configuration updates (#6521 ) During onboarding, the control plane may attempt ad-hoc creation of a secondary location to facilitate live migration. This gives us two problems to solve: - Accept 'Secondary' mode in /location_config and use it to put the tenant into secondary mode on some physical pageserver, then pass through /tenant/xyz/secondary/download requests - Create tenants with no generation initially, since the initial `Secondary` mode call will not provide us a generation. This PR also fixes modification of a tenant's TenantConf during /location_conf, which was previously ignored, and refines the flow for config modification: - avoid bumping generations when the only reason we're reconciling an attached location is a config change - increment TenantState.sequence when spawning a reconciler: usually schedule() does this, but when we do config changes that doesn't happen, so without this change waiters would think reconciliation was done immediately. `sequence` is a bit of a murky thing right now, as it's dual-purposed for tracking waiters, and for checking if an existing reconciliation is already making updates to our current sequence. I'll follow up at some point to clarify its purpose. - test config modification at the end of onboarding test	2024-03-01 20:25:53 +00:00
Alex Chi Z	ea0d35f3ca	neon_local: improved docs and fix wrong connstr (#6954 ) The user created with the `--create-test-user` flag is `test` instead of `user`. ref https://github.com/neondatabase/neon/pull/6848 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-01 14:54:07 -05:00
John Spray	e34059cd18	pageserver: increase DEFAULT_MAX_WALRECEIVER_LSN_WAL_LAG (#6970 ) ## Problem At high ingest rates, pageservers spuriously disconnect from safekeepers because stats updates don't come in frequently enough to keep the broker/safekeeper LSN delta under the wal lag limit. ## Summary of changes - Increase DEFAULT_MAX_WALRECEIVER_LSN_WAL_LAG from 10MiB to 1GiB. This should be enough for realistic per-timeline throughputs.	2024-03-01 16:49:37 +00:00
John Spray	d999c46692	pageserver: handle temp_download files in secondary locations (#6990 ) ## Problem PR #6837 fixed secondary locations to avoid spamming log warnings on temp files, but we also have ".temp_download" files to consider. ## Summary of changes - Give temp_download files the same behavior as temp files. - Refactor the relevant helper to pub(crate) from pub	2024-03-01 16:19:40 +00:00
Arpad Müller	82853cc1d1	Fix warnings and compile errors on nightly (#6886 ) Nightly has added a bunch of compiler and linter warnings. There are also two dependencies that fail compilation on latest nightly due to using the old `stdsimd` feature name. This PR fixes them.	2024-03-01 17:14:19 +01:00
Vlad Lazar	1efaa16260	test: add test for checkpoint timeout flushing (#6950 ) ## Problem https://github.com/neondatabase/neon/pull/6661 changed the layer flushing logic and led to OOMs in staging. The issue turned out to be holding on to in-memory layers for too long. After OOMing we'd need to replay potentially a lot of WAL. ## Summary of changes Test that open layers get flushed after the `checkpoint_timeout` config and do not require WAL reingest upon restart. The workload creates a number of timelines and writes some data to each, but not enough to trigger flushes via the `checkpoint_distance` config. I ran this test against https://github.com/neondatabase/neon/pull/6661 and it was indeed failing.	2024-03-01 14:43:33 +00:00
Bodobolero	4dbb74b559	new test for LFC stats in explain (#6968 ) ## Problem PR https://github.com/neondatabase/neon/pull/6851 implemented new output in PostgreSQL explain. this is a test case for the new function. ## Summary of changes ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [no ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [no] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-03-01 14:33:08 +00:00
Joonas Koivunen	5ab10d051d	metrics: record more details of the responding (#6979 ) On eu-west-1 during benchmarks we sometimes lose samples. Add more time measurements.	2024-03-01 14:04:39 +00:00
John Spray	f8bdce1015	pageserver: fix duplicate shard_id in span (#6981 ) ## Problem shard_id in span is repeated: - https://github.com/neondatabase/neon/issues/6723 Closes: #6723 ## Summary of changes - Only add shard_id to the span when fetching a cached timeline, as it is already added when loading an uncached timeline.	2024-03-01 13:26:45 +00:00
Bodobolero	7ba50708e3	Testcase for neon extension function approximate_working_set_size() (#6980 ) ## Problem PR https://github.com/neondatabase/neon/pull/6935 introduced a new function in neon extension: approximate_working_set_size. This test case verifies that it works correctly. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-03-01 13:29:08 +01:00
Christian Schwarz	e9e77ee744	tests: add optional cursor to `log_contains` + fix truthiness issues in callers (#6960 ) Extracted from https://github.com/neondatabase/neon/pull/6953 Part of https://github.com/neondatabase/neon/issues/5899 Core Change ----------- In #6953, we need the ability to scan the log _after_ a specific line and ignore anything before that line. This PR changes `log_contains` to return a tuple of `(matching line, cursor)`. Hand that cursor to a subsequent `log_contains` call to search the log for the next occurrence of the pattern. Other Changes ------------- - Inspect all the callsites of `log_contains` to handle the new tuple return type. - Above inspection unveiled many callers aren't using `assert log_contains(...) is not None` but some weaker version of the code that breaks if `log_contains` ever returns a not-None but falsy value. Fix that. - Above changes unveiled that `test_remote_storage_upload_queue_retries` was using `wait_until` incorrectly; after fixing the usage, I had to raise the `wait_until` timeout. So, maybe this will fix its flakiness.	2024-03-01 10:45:39 +01:00
Joonas Koivunen	ee93700a0f	dube: timeout individual layer evictions, log progress and record metrics (#6131 ) Because of bugs, evictions could hang and pause the disk usage eviction task. One such bug is known and fixed: #6928. Guard each layer eviction with a modest timeout, treating timed-out evictions as failures, to be conservative. In addition, add logging and metrics recording on each eviction iteration: - log collection completed with duration and number of layers - per tenant collection time is observed in a new histogram - per tenant layer count is observed in a new histogram - record metric for collected, selected and evicted layer counts - log if eviction takes more than 10s - log eviction completion with eviction duration Additionally remove dead code for which no dead code warnings appeared in earlier PR. Follow-up to: #6060.	2024-02-29 20:54:16 +00:00
Christian Schwarz	502b69b33b	refactor(compaction): `RequestContext` shouldn't be `Clone`, only `RequestContextAdaptor` uses it (#6961 ) Extracted from https://github.com/neondatabase/neon/pull/6953 Part of https://github.com/neondatabase/neon/issues/5899	2024-02-29 19:50:23 +00:00
Alex Chi Z	76ab57f33f	test: disable test_superuser on pg15 (#6972 ) ref https://github.com/neondatabase/neon/issues/6969 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-29 18:51:15 +00:00
Vlad Lazar	5984edaecd	libs: fix expired token in auth decode test (#6963 ) The test token expired earlier today (1709200879). I regenerated the token, but without an expiration date this time.	2024-02-29 13:55:38 +00:00
Konstantin Knizhnik	3eb83a0ebb	Provide appoximation of working set using hyper-log-log algorithm in LFC (#6935 ) ## Summary of changes Calculate number of unique page accesses at compute. It can be used to estimate working set size and adjust cache size (shared_buffers or local file cache). Approximation is made using the HyperLogLog algorithm. It is performed by the local file cache and so is available only when the local file cache is enabled. This calculation doesn't take into account accesses to pages present in shared buffers, but includes pages available in the local file cache. This information can be retrieved using the approximate_working_set_size(reset bool) function from the neon extension. The reset parameter can be used to reset the statistic and so collect unique accesses for a particular interval. Below is an example of estimating working set size after pgbench -c 10 -S -T 100 -s 10: ``` postgres=# select approximate_working_set_size(false); approximate_working_set_size ------------------------------ 19052 (1 row) postgres=# select pg_table_size('pgbench_accounts')/8192; ?column? ---------- 16402 (1 row) ``` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-29 15:54:58 +02:00
Joonas Koivunen	4d426f6fbe	feat: support lazy, queued tenant attaches (#6907 ) Add off-by-default support for lazy queued tenant activation on attach. This should be useful on bulk migrations as some tenants will be activated faster due to operations or endpoint startup. Eventually all tenants will get activated by reusing the same mechanism we have at startup (`PageserverConf::concurrent_tenant_warmup`). The difference between lazily attached tenants and those activated at startup is that we leave their initial logical size calculation to be triggered by WalReceiver or consumption metrics. Fixes: #6315 Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-29 13:26:29 +02:00
John Spray	d04af08567	control_plane: storage controller secrets by env (#6952 ) ## Problem Sometimes folks prefer not to expose secrets as CLI args. ## Summary of changes - Add ability to load secrets from environment variables. We can eventually remove the AWS SM code path here if nobody is using it -- we don't need to maintain three ways to load secrets.	2024-02-29 10:00:01 +00:00
Alexander Bayandin	54586d6b57	CI: create compute-tools image from compute-node image (#6899 ) ## Problem We build compute-tools binary twice — in `compute-node` and in `compute-tools` jobs, and we build them slightly differently: - `cargo build --locked --profile release-line-debug-size-lto` (previously in `compute-node`) - `mold -run cargo build -p compute_tools --locked --release` (previously in `compute-tools`) Before: - compute-node: 6m 34s - compute-tools (as a separate job): 7m 47s After: - compute-node: 7m 34s - compute-tools (as a separate step, within compute-node job): 5s ## Summary of changes - Move compute-tools image creation to `Dockerfile.compute-node` - Delete `Dockerfile.compute-tools`	2024-02-28 15:24:35 +00:00
John Spray	e5384ebefc	pageserver: accelerate tenant activation on HTTP API timeline read requests (#6944 ) ## Problem Callers of the timeline creation API may issue timeline GETs ahead of creation to e.g. check if their intended timeline already exists, or to learn the LSN of a parent timeline. Although the timeline creation API already triggers activation of a timeline if it's currently waiting to activate, the GET endpoint doesn't, so such callers will encounter 503 responses for several minutes after a pageserver restarts, while tenants are lazily warming up. The original scope of which APIs will activate a timeline was quite small, but really it makes sense to do it for any API that needs a particular timeline to be active. ## Summary of changes - In the timeline detail GET handler, use wait_to_become_active, which triggers immediate activation of a tenant if it was currently waiting for the warmup semaphore, then waits up to 5 seconds for the activation to complete. If it doesn't complete promptly, we return a 503 as before. - Modify active_timeline_for_active_tenant to also use wait_to_become_active, which indirectly makes several other timeline-scope request handlers fast-activate a tenant when called. This is important because a timeline creation flow could also use e.g. get_lsn_for_timestamp as a precursor to creating a timeline. - There is some risk to this change: an excessive number of timeline GET requests could cause too many tenant activations to happen at the same time, leading to excessive queue depth to the S3 client. However, this was already the case for e.g. many concurrent timeline creations.	2024-02-28 14:53:35 +00:00
Alexander Bayandin	60a232400b	CI(pin-build-tools-image): pass secrets to the job (#6949 ) ## Problem `pin-build-tools-image` job doesn't have access to secrets and thus fails. Missed in the original PR[0] - [0] https://github.com/neondatabase/neon/pull/6795 ## Summary of changes - pass secrets to `pin-build-tools-image` job	2024-02-28 14:36:17 +00:00
Andreas Scherbaum	edd809747b	English keyboard has "z" and "y" switched (#6947 ) ## Problem The "z" and "y" letters are switched on the English keyboard, and I'm used to a German keyboard. Very embarrassing. ## Summary of changes Fix syntax error in README Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-02-28 14:10:58 +01:00
Conrad Ludgate	48957e23b7	proxy: refactor span usage (#6946 ) ## Problem Hard to find error reasons by endpoint for HTTP flow. ## Summary of changes I want all root spans to have session id and endpoint id. I want all root spans to be consistent.	2024-02-28 17:10:07 +04:00
Alexander Bayandin	1d5e476c96	CI: use build-tools image from dockerhub (#6795 ) ## Problem Currently, after updating `Dockerfile.build-tools` in a PR, it requires a manual action to make it `pinned`, i.e., the default for everyone. It also makes all opened PRs use such images (even created in the PR and without such changes). This PR overhauls the way we build and use `build-tools` image (and uses the image from Docker Hub). ## Summary of changes - The `neondatabase/build-tools` image gets tagged with the latest commit sha for the `Dockerfile.build-tools` file - Each PR calculates the tag for `neondatabase/build-tools`, tries to pull it, and rebuilds the image with such tag if it doesn't exist. - Use `neondatabase/build-tools` as a default image - When running on `main` branch — create a `pinned` tag and push it to ECR - Use `concurrency` to ensure we don't build `build-tools` image for the same commit in parallel from different PRs	2024-02-28 12:38:11 +00:00
Vlad Lazar	2b11466b59	pageserver: optimise disk io for vectored get (#6780 ) ## Problem The vectored read path proposed in https://github.com/neondatabase/neon/pull/6576 seems to be functionally correct, but in my testing (see below) it is about 10-20% slower than the naive sequential vectored implementation. ## Summary of changes There are three parts to this PR: 1. Supporting vectored blob reads. This is actually trickier than it sounds because on-disk blobs are prefixed with a variable length size header. Since the blobs are not necessarily fixed size, we need to juggle the offsets such that the callers can retrieve the blobs from the resulting buffer. 2. Merge disk read requests issued by the vectored read path up to a maximum size. Again, the merging is complicated by the fact that blobs are not fixed size. We keep track of the begin and end offset of each blob and pass them into the vectored blob reader. In turn, the reader will return a buffer and the offsets at which the blobs begin and end. 3. A benchmark for basebackup requests against a tenant with large SLRU block counts is added. This required a small change to pagebench and a new config variable for the pageserver which toggles the vectored get validation. We can probably optimise things further by adding a little bit of concurrency for our IO. In principle, it's as simple as spawning a task which deals with issuing IO and doing the serialisation and handling on the parent task which receives input via a channel.	2024-02-28 12:06:00 +00:00
Christian Schwarz	b6bd75964f	Revert "pageserver: roll open layer in timeline writer (#6661 )" + PR #6842 (#6938 ) This reverts commits `587cb705b8` (PR #6661) and `fcbe9fb184` (PR #6842). Conflicts: pageserver/src/tenant.rs pageserver/src/tenant/timeline.rs The conflicts were with * pageserver: adjust checkpoint distance for sharded tenants (#6852) * pageserver: add vectored get implementation (#6576) Also we had to keep the `allowed_errors` to make `test_forward_compatibility` happy, see the PR thread on GitHub for details.	2024-02-28 11:38:23 +00:00
Joonas Koivunen	fcb77f3d8f	build: add a timeout for test-images (#6942 ) normal runtime seems to be 3min, add 20min timeout.	2024-02-28 12:58:13 +02:00
Vlad Lazar	c3a40a06f3	test: wait for storage controller readiness (#6930 ) ## Problem Starting up the pageserver before the storage controller is ready can lead to a round of reconciliation, which leads to the previous tenant being shut down. This disturbs some tests. ## Summary of changes Wait for the storage controller to become ready on neon env start-up. Closes https://github.com/neondatabase/neon/issues/6724	2024-02-28 09:52:22 +00:00
Joonas Koivunen	1b1320a263	fix: allow evicting wanted deleted layers (#6931 ) Not allowing evicting wanted deleted layers is something I've forgotten to implement on #5645. This PR makes it possible to evict such layers, which should reduce the amount of hanging evictions. Fixes: #6928 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-02-28 00:02:44 +02:00
Konstantin Knizhnik	e1b4d96b5b	Limit number of AUX files deltas to reduce reconstruct time (#6874 ) ## Problem After commit [`840abe3954`] (store AUX files as deltas) we avoid quadratic growth of storage size when storing LR snapshots but get a quadratic slowdown of reconstruct time. As a result, storing 70k snapshots at my local Neon instance took more than 3 hours, and starting a node (creation of basebackup) took ~10 minutes. In prod, 70k AUX files cause an increase of startup time to 40 minutes: https://neondb.slack.com/archives/C03F5SM1N02/p1708513010480179 ## Summary of changes Enforce storing the full AUX directory (some analog of FPI) every 1024 files. The time to create 70k snapshots is reduced to 6 minutes and startup time to 1.5 minutes (100 seconds). ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-27 21:18:46 +02:00
John Spray	a8ec18c0f4	refactor: move storage controller API structs into pageserver_api (#6927 ) ## Problem This is a precursor to adding a convenience CLI for the storage controller. ## Summary of changes - move controller api structs into pageserver_api::controller_api to make them visible to other crates - rename pageserver_api::control_api to pageserver_api::upcall_api to match the /upcall/v1/ naming in the storage controller. Why here rather than a totally separate crate? It's convenient to have all the pageserver-related stuff in one place, and if we ever wanted to move it to a different crate it's super easy to do that later.	2024-02-27 17:24:01 +00:00
Arpad Müller	045bc6af8b	Add new compaction abstraction, simulator, and implementation. (#6830 ) Rebased version of #5234, part of #6768 This consists of three parts: 1. A refactoring and new contract for implementing and testing compaction. The logic is now in a separate crate, with no dependency on the 'pageserver' crate. It defines an interface that the real pageserver must implement, in order to call the compaction algorithm. The interface models things like delta and image layers, but just the parts that the compaction algorithm needs to make decisions. That makes it easier to unit test the algorithm and experiment with different implementations. I did not convert the current code to the new abstraction, however. When compaction algorithm is set to "Legacy", we just use the old code. It might be worthwhile to convert the old code to the new abstraction, so that we can compare the behavior of the new algorithm against the old one, using the same simulated cases. If we do that, we have to be careful that the converted code really is equivalent to the old. This includes only trivial changes to the main pageserver code. All the new code is behind a tenant config option. So this should be pretty safe to merge, even if the new implementation is buggy, as long as we don't enable it. 2. A new compaction algorithm, implemented using the new abstraction. The new algorithm is tiered compaction. It is inspired by the PoC at PR #4539, although I did not use that code directly, as I needed the new implementation to fit the new abstraction. The algorithm here is less advanced, I did not implement partial image layers, for example. I wanted to keep it simple on purpose, so that as we add bells and whistles, we can see the effects using the included simulator. One difference to #4539 and your typical LSM tree implementations is how we keep track of the LSM tree levels. This PR doesn't have a permanent concept of a level, tier or sorted run at all. 
There are just delta and image layers. However, when compaction starts, we look at the layers that exist, and arrange them into levels, depending on their shapes. That is ephemeral: when the compaction finishes, we forget that information. This allows the new algorithm to work without any extra bookkeeping. That makes it easier to transition from the old algorithm to new, and back again. There is just a new tenant config option to choose the compaction algorithm. The default is "Legacy", meaning the current algorithm in 'main'. If you set it to "Tiered", the new algorithm is used. 3. A simulator, which implements the new abstraction. The simulator can be used to analyze write and storage amplification, without running a test with the full pageserver. It can also draw an SVG animation of the simulation, to visualize how layers are created and deleted. To run the simulator: cargo run --bin compaction-simulator run-suite --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-27 17:15:46 +01:00
siegerts	c8ac4c054e	readme: Update Neon link URL (#6918 ) ## Problem ## Summary of changes Updates the neon.tech link to point to a /github page in order to correctly attribute visits originating from the repo.	2024-02-27 11:08:43 -05:00
Anna Khanova	896d51367e	proxy: introdice is cold start for analytics (#6902 ) ## Problem Data team cannot distinguish between cold start and not cold start. ## Summary of changes Report `is_cold_start` to analytics. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2024-02-27 19:53:02 +04:00
Joonas Koivunen	a691786ce2	fix: logical size calculation gating (#6915 ) Noticed that we are failing to handle `Result::Err` when entering a gate for logical size calculation. Audited rest of the gate enters, which seem fine, unified two instances. Noticed that the gate guard allows to remove a failpoint, then noticed that adjacent failpoint was blocking the executor thread instead of using `pausable_failpoint!`, fix both. eviction_task.rs now maintains a gate guard as well. Cc: #4733	2024-02-27 14:27:13 +00:00
Roman Zaynetdinov	2991d01b61	Export connection counts from sql_exporter (#6926 ) ## Problem We want to show connection counts to console users. ## Summary of changes Start exporting connection counts grouped by database name and connection state.	2024-02-27 13:47:05 +00:00
Konstantin Knizhnik	e895644555	Show LFC statistic in EXPLAIN (#6851 ) ## Problem LFC has high impact on Neon application performance but there is no way for user to check efficiency of its usage ## Summary of changes Show LFC statistic in EXPLAIN ANALYZE ## Description Local file cache (LFC) A layer of caching that stores frequently accessed data from the storage layer in the local memory of the Neon compute instance. This cache helps to reduce latency and improve query performance by minimizing the need to fetch data from the storage layer repeatedly. Externalization of LFC in explain output The EXPLAIN ANALYZE output is extended to display important counts for local file cache (LFC) hits and misses. This works for both EXPLAIN text and JSON output. File cache: hits Whenever the Postgres backend retrieves a page/block from SMGR, and it is not found in shared buffers but the page is already present in the LFC, this counter is incremented. File cache: misses Whenever the Postgres backend retrieves a page/block from SMGR, and it is found neither in shared buffers nor in the LFC, so the page is retrieved from Neon storage (page server), this counter is incremented. 
Example (for explain text output) ```sql explain (analyze,buffers,prefetch,filecache) select count(*) from pgbench_accounts; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=214486.94..214486.95 rows=1 width=8) (actual time=5195.378..5196.034 rows=1 loops=1) Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Gather (cost=214486.73..214486.94 rows=2 width=8) (actual time=5195.366..5196.025 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Partial Aggregate (cost=213486.73..213486.74 rows=1 width=8) (actual time=5187.670..5187.670 rows=1 loops=3) Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Parallel Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..203003.02 rows=4193481 width=0) (actual time=0.574..4928.995 rows=3333333 loops=3) Heap Fetches: 3675286 Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 ``` The json output uses the following keys and provides integer values for those keys: ``` ... "File Cache Hits": 141826, "File Cache Misses": 1865 ... ``` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? 
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-27 14:45:54 +02:00
Christian Schwarz	62d77e263f	test_remote_timeline_client_calls_started_metric: fix flakiness (#6911 ) fixes https://github.com/neondatabase/neon/issues/6889 # Problem The failure in the last 3 flaky runs on `main` is ``` test_runner/regress/test_remote_storage.py:460: in test_remote_timeline_client_calls_started_metric churn("a", "b") test_runner/regress/test_remote_storage.py:457: in churn assert gc_result["layers_removed"] > 0 E assert 0 > 0 ``` That's this code `cd449d66ea/test_runner/regress/test_remote_storage.py (L448-L460)` So, the test expects GC to remove some layers but the GC doesn't. # Fix My impression is that the VACUUM isn't re-using pages aggressively enough, but I can't really prove that. Tried to analyze the layer map dump but it's too complex. So, this PR: - Creates more churn by doing the overwrite twice. - Forces image layer creation. It also drive-by removes the redundant call to timeline_compact, because, timeline_checkpoint already does that internally.	2024-02-27 10:55:10 +01:00
Alex Chi Z	b2bbc20311	fix: only alter default privileges when public schema exists (#6914 ) ## Problem Following up https://github.com/neondatabase/neon/pull/6885, only alter default privileges when the public schema exists. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-26 11:48:56 -09:00
Vlad Lazar	5accf6e24a	attachment_service: JWT auth enforcement (#6897 ) ## Problem Attachment service does not do auth based on JWT scopes. ## Summary of changes Do JWT based permission checking for requests coming into the attachment service. Requests into the attachment service must use different tokens based on the endpoint: * `/control` and `/debug` require `admin` scope * `/upcall` requires `generations_api` scope * `/v1/...` requires `pageserverapi` scope Requests into the pageserver from the attachment service must use `pageserverapi` scope.	2024-02-26 18:17:06 +00:00
Andreas Scherbaum	0881d4f9e3	Update README, include cleanup details (#6816 ) ## Problem README.md is missing cleanup instructions ## Summary of changes Add cleanup instructions Add instructions how to handle errors during initialization --------- Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-02-26 18:53:48 +01:00
Alexander Bayandin	975786265c	CI: Delete GitHub Actions caches once PR is closed (#6900 ) ## Problem > Approaching total cache storage limit (9.25 GB of 10 GB Used) > Least recently used caches will be automatically evicted to limit the total cache storage to 10 GB. [Learn more about cache usage.](https://docs.github.com/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy) From https://github.com/neondatabase/neon/actions/caches Some of these caches are from closed/merged PRs. ## Summary of changes - Add a workflow that deletes caches for closed branches	2024-02-26 18:17:22 +01:00
Christian Schwarz	c4059939e6	fixup(#6893 ): report_size() still used pageserver_created_persistent_* metrics (#6909 ) Use the remote_timeline_client metrics instead, they work for layer file uploads and are reasonably close to what the `pageserver_created_persistent_*` metrics were. Should we wait for empty upload queue before calling `report_size()`? part of https://github.com/neondatabase/neon/issues/6737	2024-02-26 17:28:00 +01:00
Bodobolero	75baf83fce	externalize statistics on LFC cache usage (#6906 ) ## Problem Customers should be able to determine the size of their workload's working set to right size their compute. Since Neon uses Local file cache (LFC) instead of shared buffers on bigger compute nodes to cache pages we need to externalize a means to determine LFC hit ratio in addition to shared buffer hit ratio. Currently the following end user documentation `fb7cd3af0e/content/docs/manage/endpoints.md (L137)` is wrong because it describes how to right size a compute node based on shared buffer hit ratio. Note that the existing functionality in extension "neon" is NOT available to end users but only to superuser / cloud_admin. ## Summary of changes - externalize functions and views in neon extension to end users - introduce a new view `NEON_STAT_FILE_CACHE` with the following DDL ```sql CREATE OR REPLACE VIEW NEON_STAT_FILE_CACHE AS WITH lfc_stats AS ( SELECT stat_name, count FROM neon_get_lfc_stats() AS t(stat_name text, count bigint) ), lfc_values AS ( SELECT MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE NULL END) AS file_cache_misses, MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE NULL END) AS file_cache_hits, MAX(CASE WHEN stat_name = 'file_cache_used' THEN count ELSE NULL END) AS file_cache_used, MAX(CASE WHEN stat_name = 'file_cache_writes' THEN count ELSE NULL END) AS file_cache_writes, -- Calculate the file_cache_hit_ratio within the same CTE for simplicity CASE WHEN MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE 0 END) + MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END) = 0 THEN NULL ELSE ROUND((MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END)::DECIMAL / (MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END) + MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE 0 END))) * 100, 2) END AS file_cache_hit_ratio FROM lfc_stats ) SELECT file_cache_misses, file_cache_hits, 
file_cache_used, file_cache_writes, file_cache_hit_ratio from lfc_values; ``` This view can be used by an end user as follows: ```sql CREATE EXTENSION NEON; SELECT * from neon.NEON_STAT_FILE_CACHE; ``` The output looks like the following: ``` select * from NEON_STAT_FILE_CACHE; file_cache_misses | file_cache_hits | file_cache_used | file_cache_writes | file_cache_hit_ratio -------------------+-----------------+-----------------+-------------------+---------------------- 2133643 | 108999742 | 607 | 10767410 | 98.08 (1 row) ``` ## Checklist before requesting a review - [x ] I have performed a self-review of my code. - [x ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-02-26 16:06:00 +00:00
Roman Zaynetdinov	459c2af8c1	Expose LFC cache size limit from sql_exporter (#6912 ) ## Problem We want to report how much cache was used and what the limit was. ## Summary of changes Added one more query to sql_exporter to expose `neon.file_cache_size_limit`.	2024-02-26 10:36:11 -05:00
Arpad Müller	51a43b121c	Fix test_remote_storage_upload_queue_retries flakiness (#6898 ) * decreases checkpointing and compaction targets for even more layer files * write 10 thousand rows 2 times instead of writing 20 thousand rows 1 time so that there is more to GC. Before it was noisily jumping between 1 and 0 layer files, now it's jumping between 19 and 20 layer files. The 0 caused an assertion error that gave the test most of its flakiness. * larger timeout for the churn while failpoints are active thread: this is mostly so that the test is more robust on systems with more load Fixes #3051	2024-02-26 13:21:40 +01:00
John Spray	256058f2ab	pageserver: only write out legacy tenant config if no generation (#6891 ) ## Problem Previously we always wrote out both legacy and modern tenant config files. The legacy write enabled rollbacks, but we are long past the point where that is needed. We still need the legacy format for situations where someone is running tenants without generations (that will be yanked as well eventually), but we can avoid writing it out at all if we do have a generation number set. We implicitly also avoid writing the legacy config if our mode is Secondary (secondary mode is newer than generations). ## Summary of changes - Make writing legacy tenant config conditional on there being no generation number set.	2024-02-26 10:24:58 +00:00
Christian Schwarz	ceedc3ef73	Timeline::repartition: enforce no concurrent callers & lsn to not move backwards (#6862 ) This PR enforces aspects of `Timeline::repartition` that were already true at runtime: - it's not called concurrently, so, bail out if it is anyway (see comment why it's not called concurrently) - the `lsn` should never be moving backwards over the lifetime of a Timeline object, because last_record_lsn() can only move forwards over the lifetime of a Timeline object The switch to tokio::sync::Mutex blows up the size of the `partitioning` field from 40 bytes to 72 bytes on Linux x86_64. That would be concerning if it was a hot field, but, `partitioning` is only accessed every 20s by one task, so, there won't be excessive cache pain on it. (It still sucks that it's now >1 cache line, but I need the Send-able MutexGuard in the next PR) part of https://github.com/neondatabase/neon/issues/6861	2024-02-26 11:22:15 +01:00
Christian Schwarz	5273c94c59	pageserver: remove two obsolete/unused per-timeline metrics (#6893 ) over-compensating the addition of a new per-timeline metric in https://github.com/neondatabase/neon/pull/6834 part of https://github.com/neondatabase/neon/issues/6737	2024-02-26 09:19:24 +00:00
Christian Schwarz	dedf66ba5b	remove `gc_feedback` mechanism (#6863 ) It's been dead-code-at-runtime for 9 months, let's remove it. We can always re-introduce it at a later point. Came across this while working on #6861, which will touch `time_for_new_image_layer`. This is an opportunity to make that function simpler.	2024-02-26 10:05:24 +01:00
John Spray	8283779ee8	pageserver: remove legacy attach/detach APIs from swagger (#6883 ) ## Problem Since the location config API was added, the attach and detach endpoints are deprecated. Hiding them from consumers of the swagger definition is a precursor to removing them. Neon's cloud no longer uses this API since https://github.com/neondatabase/cloud/pull/10538 Fully removing the APIs will implicitly make use of generation numbers mandatory, and should happen alongside https://github.com/neondatabase/neon/issues/5388, which will happen once we're happy that the storage controller is ready for prime time. ## Summary of changes - Remove /attach and /detach from pageserver's swagger file	2024-02-25 14:53:17 +00:00
Joonas Koivunen	b8f9e3a9eb	fix(flaky): typo Stopping/Stopped (#6894 ) introduced in `8dee9908f8`, should help with the #6681 common problem which is just a mismatched allowed error.	2024-02-24 21:32:41 +00:00
Christian Schwarz	ec3efc56a8	Revert "Revert "refactor(VirtualFile::crashsafe_overwrite): avoid Handle::block_on in callers"" (#6775 ) Reverts neondatabase/neon#6765 , bringing back #6731 We concluded that #6731 never was the root cause for the instability in staging. More details: https://neondb.slack.com/archives/C033RQ5SPDH/p1708011674755319 However, the massive amount of concurrent `spawn_blocking` calls from the `save_metadata` calls during startups might cause a performance regression. So, we'll merge this PR here after we've stopped writing the metadata (#6769).	2024-02-23 17:16:43 +01:00
Alexander Bayandin	94f6b488ed	CI(release-proxy): fix a couple missed release-proxy branch handling (#6892 ) ## Problem In the original PR[0], I've missed a couple of `release` occurrences that should also be handled for `release-proxy` branch - [0] https://github.com/neondatabase/neon/pull/6797 ## Summary of changes - Add handling for `release-proxy` branch to allure report - Add handling for `release-proxy` branch to e2e tests	2024-02-23 14:12:09 +00:00
Anastasia Lubennikova	a12e4261a3	Add neon.primary_is_running GUC. (#6705 ) We set it for neon replica, if primary is running. Postgres uses this GUC at the start, to determine if replica should wait for RUNNING_XACTS from primary or not. Corresponding cloud PR is https://github.com/neondatabase/cloud/pull/10183 * Add test for hot-standby replica startup. * Extract oldest_running_xid from XlRunningXacts WAL records. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-23 13:56:41 +00:00
Christian Schwarz	cd449d66ea	stop writing `metadata` file (#6769 ) Building atop #6777, this PR removes the code that writes the `metadata` file and adds a piece of migration code that removes any remaining `metadata` files. We'll remove the migration code after this PR has been deployed. part of https://github.com/neondatabase/neon/issues/6663 More cleanups punted into follow-up issue, as they touch a lot of code: https://github.com/neondatabase/neon/issues/6890	2024-02-23 14:33:47 +01:00
Alexander Bayandin	6f8f7c7de9	CI: Build images using docker buildx instead of kaniko (#6871 ) ## Problem To "build" a compute image that doesn't have anything new, kaniko takes 13m[0], docker buildx does it in 5m[1]. Also, kaniko doesn't fully support bash expressions in the Dockerfile `RUN`, so we have to use different workarounds for this (like `bash -c ...`). - [0] https://github.com/neondatabase/neon/actions/runs/8011512414/job/21884933687 - [1] https://github.com/neondatabase/neon/actions/runs/8008245697/job/21874278162 ## Summary of changes - Use docker buildx to build `compute-node` images - Use docker buildx to build `neon-image` image - Use docker buildx to build `compute-tools` image - Use docker hub for image cache (instead of ECR)	2024-02-23 12:36:18 +01:00
Alex Chi Z	12487e662d	compute_ctl: move default privileges grants to handle_grants (#6885 ) ## Problem Following up https://github.com/neondatabase/neon/pull/6884, hopefully, a real final fix for https://github.com/neondatabase/neon/issues/6236. ## Summary of changes `handle_migrations` is done over the main `postgres` db connection. Therefore, the privileges assigned here do not work with databases created later (i.e., `neondb`). This pull request moves the grants to `handle_grants`, so that it runs for each DB created. The SQL is added into the `BEGIN/END` block, so that it takes only one RTT to apply all of them. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-22 17:00:03 -05:00
Arseny Sher	5bcae3a86e	Drop LR slots if too many .snap files are found. PR #6655 turned out to be not enough to prevent .snap file bloat; some subscribers just don't ack the flushed position, thus never advancing the slot. Probably other bloating scenarios are also possible, so add a more direct restriction -- drop all slots if too many .snap files have been discovered.	2024-02-23 01:12:49 +04:00
Konstantin Knizhnik	47657f2df4	Flush logical messages with snapshots and replication origin (#6826 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1708363190710839 ## Summary of changes Flush logical message with snapshot and origin state ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-22 21:33:38 +02:00
Sasha Krassovsky	d669dacd71	Add pgpartman (#6849 ) ## Problem ## Summary of changes ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-02-22 10:05:37 -08:00
Alex Chi Z	837988b6c9	compute_ctl: run migrations to grant default grantable privileges (#6884 ) ## Problem Following up on https://github.com/neondatabase/neon/pull/6845, we did not make the default privileges grantable before, and therefore, even if the users have full privileges, they are not able to grant them to others. Should be a final fix for https://github.com/neondatabase/neon/issues/6236. ## Summary of changes Add `WITH GRANT` to migrations so that neon_superuser can grant the permissions. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-22 17:49:02 +00:00
John Spray	9c6145f0a9	control_plane: fix a compilation error from racing PRs (#6882 ) Merge of two green PRs raced, and ended up with a non-compiling result.	2024-02-22 16:51:46 +00:00
Alexander Bayandin	2424d90883	CI: Split Proxy and Storage releases (#6797 ) ## Problem We want to release Proxy at a different cadence. ## Summary of changes - build-and-test workflow: - Handle the `release-proxy` branch - Tag images built on this branch with `release-proxy-XXX` tag - Trigger deploy workflow with `deployStorage=true` & `deployStorageBroker=true` on `release` branch - Trigger deploy workflow with `deployPgSniRouter=true` & `deployProxy=true` on `release-proxy` branch - release workflow (scheduled creation of release branch): - Schedule Proxy releases for Thursdays (a random day to make it different from Storage releases)	2024-02-22 17:15:18 +01:00
John Spray	cf3baf6039	storage controller: fix consistency check (#6855 ) - Some checks weren't properly returning an error when they failed - TenantState::to_persistent wasn't setting generation_pageserver properly - Changes to node scheduling policy weren't being persisted.	2024-02-22 14:10:49 +00:00
John Spray	9c48b5c4ab	controller: improved handling of offline nodes (#6846 ) Stacks on https://github.com/neondatabase/neon/pull/6823 - Pending a heartbeating mechanism (#6844 ), use /re-attach calls as a cue to mark an offline node as active, so that a node which is unavailable during controller startup doesn't require manual intervention if it later starts/restarts. - Tweak scheduling logic so that when we schedule the attached location for a tenant, we prefer to select from secondary locations rather than picking a fresh one. This is an interim state until we implement #6844 and full chaos testing for handling failures.	2024-02-22 14:01:06 +00:00
Christian Schwarz	c671aeacd4	fix(per-tenant throttling): incorrect `allowed_rps` field in log message (#6869 ) The `refill_interval` switched from a milliseconds usize to a Duration during a review follow-up, hence this slipped through manual testing. Part of https://github.com/neondatabase/neon/issues/5899	2024-02-22 14:19:11 +01:00
Joonas Koivunen	bc7a82caf2	feat: bare-bones /v1/utilization (#6831 ) PR adds a simple at most 1Hz refreshed informational API for querying pageserver utilization. In this first phase, no actual background calculation is performed. Instead, the worst possible score is always returned. The returned bytes information is however correct. Cc: #6835 Cc: #5331	2024-02-22 13:58:59 +02:00
John Spray	b5246753bf	storage controller: miscellaneous improvements (#6800 ) - Add some context to logs - Add tests for pageserver restarts when managed by storage controller - Make /location_config tolerate compute hook failures on shard creations, not just modifications.	2024-02-22 09:33:40 +00:00
John Spray	c1095f4c52	pageserver: don't warn on tempfiles in secondary location (#6837 ) ## Problem When a secondary mode location starts up, it scans local layer files. Currently it warns on any layers whose names don't parse as a LayerFileName, generating warning spam from perfectly normal tempfiles. ## Summary of changes - Refactor local vars to build a Utf8PathBuf for the layer file candidate - Use the crate::is_temporary check to identify + clean up temp files. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-02-22 09:32:27 +00:00
Anna Khanova	1718c0b59b	Proxy: cancel query on connection drop (#6832 ) ## Problem https://github.com/neondatabase/cloud/issues/10259 ## Summary of changes Make sure that the request is dropped once the connection was dropped.	2024-02-21 22:43:55 +00:00
Joe Drumgoole	8107ae8377	README: Fix the link to the free tier request (#6858 )	2024-02-21 23:42:24 +01:00
dependabot[bot]	555ee9fdd0	build(deps): bump cryptography from 42.0.2 to 42.0.4 (#6870 )	2024-02-21 21:41:51 +00:00
Alex Chi Z	6921577cec	compute_ctl: grant default privileges on table to `neon_superuser` (#6845 ) ## Problem fix https://github.com/neondatabase/neon/issues/6236 again ## Summary of changes This pull request adds a setup command in compute spec to modify default privileges of public schema to have full permission on table/sequence for neon_superuser. If an extension upgrades to superuser during creation, the tables/sequences they create in the public schema will be automatically granted to neon_superuser. Questions: * does it impose any security flaws? public schema should be fine... * for all extensions that create tables in schemas other than public, we will need to manually handle them (e.g., pg_anon). * we can modify some extensions to remove their superuser requirement in the future. * we may contribute to Postgres to allow for the creation of extensions with a specific user in the future. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-21 16:09:34 -05:00
Arpad Müller	20fff05699	Remove stray del and TODO (#6867 ) The TODO has made it into #6821. I originally just put it there for bookmarking purposes. The `del` has been added by #6818 but is also redundant.	2024-02-21 19:39:14 +00:00
Alexander Bayandin	f2767d2056	CI: run check-permissions before all jobs (#6794 ) ## Problem For PRs from external contributors, we're still running `actionlint` and `neon_extra_builds` workflows (which could fail due to lack of permissions to secrets). ## Summary of changes - Extract `check-permissions` job to a separate reusable workflow - Depend all jobs from `actionlint` and `neon_extra_builds` workflows on `check-permissions`	2024-02-21 20:32:12 +01:00
Tristan Partin	76b92e3389	Fix multithreaded postmaster on macOS curl_global_init() with an IPv6 enabled curl build on macOS will cause the calling program to become multithreaded. Unfortunately for shared_preload_libraries, that means the postmaster becomes multithreaded, which CANNOT happen. There are checks in Postgres to make sure that this is not the case.	2024-02-21 13:22:30 -06:00
Arthur Petukhovsky	03f8a42ed9	Add walsenders_keep_horizon option (#6860 ) Add `--walsenders-keep-horizon` argument to safekeeper cmdline. It will prevent deleting WAL segments from disk if they are needed by the active START_REPLICATION connection. This is useful for sharding. Without this option, if one of the shards falls behind, it starts to read WAL from S3, which is much slower than disk. This can result in huge shard lagging.	2024-02-21 19:09:40 +00:00
Conrad Ludgate	60e5a56a5a	proxy: include client IP in ip deny message (#6854 ) ## Problem Debugging IP deny errors is difficult for our users ## Summary of changes Include the client IP in the deny message	2024-02-21 18:24:59 +01:00
John Spray	afda4420bd	test_sharding_ingress: bigger data, skip in debug mode (#6859 ) ## Problem Accidentally merged #6852 without this test stability change. The test as-written could sometimes fail on debug-pg14. ## Summary of changes - Write more data so that the test can more reliably assert on the ratio of total layers to small layers - Skip the test in debug mode, since writing any more than a tiny bit of data tends to result in a flaky test in the much slower debug environment.	2024-02-21 17:03:55 +00:00
John Spray	ce1673a8c4	tests: improve stability of tests using `wait_for_upload_queue_empty` (#6856 ) ## Problem PR #6834 introduced an assertion that the sets of metric labels on finished operations should equal those on started operations, which is not true if no operations have finished yet for a particular set of labels. ## Summary of changes - Instead of asserting out, wait and re-check in the case that finished metrics don't match started	2024-02-21 16:00:17 +00:00
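The "wait and re-check" approach described in the commit above can be sketched as a small polling helper. This is an illustrative sketch only: `wait_until_labels_match` and its parameters are hypothetical names, not the helper used in the neon test suite.

```python
import time
from typing import Callable, Set


def wait_until_labels_match(
    started: Callable[[], Set[str]],
    finished: Callable[[], Set[str]],
    timeout_s: float = 10.0,
    interval_s: float = 0.1,
) -> None:
    """Poll until both label sets are equal instead of asserting once.

    `started` and `finished` are hypothetical callables that return the
    current sets of metric labels for started and finished operations.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if started() == finished():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("finished labels never matched started labels")
        # Not equal yet: some operations may simply not have finished.
        time.sleep(interval_s)
```

A single assertion fails if it runs before the first operation for a label set finishes; polling up to a deadline tolerates that window while still failing on a real mismatch.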
John Spray	532b0fa52b	Revise CODEOWNERS (#6840 ) ## Problem - Current file has ambiguous ownership for some paths - The /control_plane/attachment_service is storage specific & updates there don't need to request reviews from other teams. ## Summary of changes - Define a single owning team per path, so that we can make reviews by that team mandatory in future. - Remove the top-level /control_plane as no one specific team owns neon_local, and we would rarely see a PR that exclusively touches that path. - Add an entry for /control_plane/attachment_service, which is newer storage-specific code.	2024-02-21 15:45:22 +00:00
Arpad Müller	4de2f0f3e0	Implement a sharded time travel recovery endpoint (#6821 ) The sharding service didn't have support for S3 disaster recovery. This PR adds a new endpoint to the attachment service, which is slightly different from the endpoint on the pageserver, in that it takes the shard count history of the tenant as json parameters: we need to do time travel recovery for both the shard count at the target time and the shard count at the current moment in time, as well as the past shard counts that either still reference. Fixes #6604, part of https://github.com/neondatabase/cloud/issues/8233 --------- Co-authored-by: John Spray <john@neon.tech>	2024-02-21 16:35:37 +01:00
Joonas Koivunen	41464325c7	fix: remaining missed cancellations and timeouts (#6843 ) As noticed in #6836 some occurrences of error conversions were missed in #6697: - `std::io::Error` popped up by `tokio::io::copy_buf` containing `DownloadError` was turned into `DownloadError::Other` - similarly for secondary downloader errors These changes come at the loss of pathname context. Cc: #6096	2024-02-21 15:20:59 +00:00
Joonas Koivunen	7257ffbf75	feat: imitiation_only eviction_task policy (#6598 ) mostly reusing the existing and perhaps controversially sharing the histogram. in practice we don't configure this per-tenant. Cc: #5331	2024-02-21 16:57:30 +02:00
John Spray	84f027357d	pageserver: adjust checkpoint distance for sharded tenants (#6852 ) ## Problem Where the stripe size is the same order of magnitude as the checkpoint distance (such as with default settings), tenant shards can easily pass through `checkpoint_distance` bytes of LSN without actually ingesting anything. This results in emitting many tiny L0 delta layers. ## Summary of changes - Multiply checkpoint distance by shard count before comparing with LSN distance. This is a heuristic and does not guarantee that we won't emit small layers, but it fixes the issue for typical cases where the writes in a (checkpoint_distance * shard_count) range of LSN bytes are somewhat distributed across shards. - Add a test that checks the size of layers after ingesting to a sharded tenant; this fails before the fix. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-21 14:12:35 +00:00
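The heuristic in the commit above, scaling the checkpoint distance by the shard count before comparing it with the LSN distance, can be sketched as follows. This is a simplified model, not the pageserver code: the function name, the `>=` comparison, and treating a shard count of 0 as unsharded are assumptions made for illustration.

```python
def should_roll_layer(lsn_distance: int, checkpoint_distance: int, shard_count: int) -> bool:
    """Hypothetical check: has this shard covered enough LSN range to roll a layer?"""
    # Assumption: a shard count of 0 or 1 means the tenant is unsharded.
    effective_distance = checkpoint_distance * max(shard_count, 1)
    return lsn_distance >= effective_distance


MIB = 1024 * 1024
# Unsharded: roll after checkpoint_distance bytes of LSN.
print(should_roll_layer(256 * MIB, 256 * MIB, shard_count=1))  # True
# Eight shards: the same LSN distance is no longer enough,
# since each shard ingests only a fraction of that range.
print(should_roll_layer(256 * MIB, 256 * MIB, shard_count=8))  # False
```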
Heikki Linnakangas	428d9fe69e	tests: Make test_vm_bit_clear_on_heap_lock more robust again. (#6714 ) When checking that the contents of the VM page in cache and in pageserver match, ignore the LSN on the page. It could be different, if the page was flushed from cache by a checkpoint, for example. Here's one such failure from the CI that this hopefully fixes: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6687/7847132649/index.html#suites/8545ca7650e609b2963d4035816a356b/5f9018db15ef4408/ In the passing, also remove some log.infos from the loop. I added them while developing the tests, but now they're just noise.	2024-02-21 12:36:57 +00:00
Conrad Ludgate	e0af945f8f	proxy: improve error classification (#6841 ) ## Problem ## Summary of changes 1. Classify further cplane API errors 2. add 'serviceratelimit' and make a few of the timeout errors return that. 3. a few additional minor changes	2024-02-21 10:04:09 +00:00
John Spray	e7452d3756	storage controller: concurrency + deadlines during startup reconcile (#6823 ) ## Problem During startup_reconcile we do a couple of potentially-slow things: - Calling out to all nodes to read their locations - Calling out to the cloud control plane to notify it of all tenants' attached nodes The read of node locations was not being done concurrently across nodes, and neither operation was bounded by a well defined deadline. ## Summary of changes - Refactor the async parts of startup_reconcile into separate functions - Add concurrency and deadline to `scan_node_locations` - Add deadline to `compute_notify_many` - Run `cleanup_locations` in the background: there's no need for startup_reconcile to wait for this to complete.	2024-02-21 09:54:25 +00:00
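The combination of concurrency and a deadline described in the commit above can be illustrated with `asyncio`. This is a sketch under stated assumptions: the storage controller is written in Rust, so the Python function below only borrows the name `scan_node_locations` for illustration, and `fetch` is a hypothetical stand-in for the per-node API call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def scan_node_locations(
    nodes: List[str],
    fetch: Callable[[str], Awaitable[list]],
    deadline_s: float,
) -> Dict[str, Optional[list]]:
    """Query all nodes concurrently; a node that misses the deadline maps to None."""

    async def one(node: str):
        try:
            return node, await asyncio.wait_for(fetch(node), timeout=deadline_s)
        except asyncio.TimeoutError:
            # A slow or unreachable node must not stall startup.
            return node, None

    # All requests run concurrently, so the total wait is bounded by
    # roughly deadline_s rather than deadline_s * len(nodes).
    results = await asyncio.gather(*(one(n) for n in nodes))
    return dict(results)
```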
Vlad Lazar	5d6083bfc6	pageserver: add vectored get implementation (#6576 ) This PR introduces a new vectored implementation of the read path. The search is basically a DFS if you squint at it long enough. LayerFringe tracks the next layers to visit and acts as our stack. Vertices are tuples of (layer, keyspace, lsn range). Continuously pop the top of the stack (most recent layer) and do all the reads for one layer at once. The search maintains a fringe (`LayerFringe`) which tracks all the layers that intersect the current keyspace being searched. Continuously pop the top of the fringe (layer with highest LSN) and get all the data required from the layer in one go. Said search is done on one timeline at a time. If data is still required for some keys, then search the ancestor timeline. Apart from the high level layer traversal, vectored variants have been introduced for grabbing data from each layer type. They still suffer from read amplification issues and that will be addressed in a different PR. You might notice that in some places we duplicate the code for the existing read path. All of that code will be removed when we switch the non-vectored read path to proxy into the vectored read path. In the meantime, we'll have to contend with the extra cruft for the sake of testing and gentle releasing.	2024-02-21 09:49:46 +00:00
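The fringe-based traversal described in the commit above can be modelled with a priority queue. The sketch below is a toy, not the pageserver implementation: `Layer`, `Timeline`, and `get_vectored` are hypothetical Python stand-ins, each layer is reduced to a single end LSN plus a key-to-value map, and value reconstruction from deltas is ignored entirely.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Layer:
    end_lsn: int
    values: Dict[str, str]  # toy model: key -> value stored in this layer


@dataclass
class Timeline:
    layers: List[Layer] = field(default_factory=list)
    ancestor: Optional["Timeline"] = None


def get_vectored(timeline: Optional[Timeline], keys: List[str]) -> Dict[str, str]:
    remaining = set(keys)
    found: Dict[str, str] = {}
    while timeline is not None and remaining:
        # Fringe: layers intersecting the remaining keyspace, ordered so the
        # layer with the highest LSN is popped first (heapq is a min-heap,
        # hence the negated LSN; the index breaks ties).
        fringe = [
            (-layer.end_lsn, i, layer)
            for i, layer in enumerate(timeline.layers)
            if remaining.intersection(layer.values)
        ]
        heapq.heapify(fringe)
        while fringe and remaining:
            _, _, layer = heapq.heappop(fringe)
            # Do all reads for this layer in one go.
            hits = remaining.intersection(layer.values)
            for key in hits:
                found[key] = layer.values[key]
            remaining -= hits
        # Keys still missing are searched in the ancestor timeline.
        timeline = timeline.ancestor
    return found
```

Each key is resolved by the most recent layer that contains it, and only keys that are still unresolved trigger a visit to the ancestor timeline.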
Alex Chi Z	3882f57001	neon_local: add flag to create test user and database (#6848 ) This pull request adds two flags: `--update-catalog true` for `endpoint create`, and `--create-test-user true` for `endpoint start`. The former enables catalog updates for neon_superuser permission and many other things, while the latter adds the user `test` and the database `neondb` when setting up the database. A combination of these two flags will create a Postgres similar to the production environment so that it would be easier for us to test if extensions behave correctly when added to Neon Postgres. Example output: ``` ❯ cargo neon endpoint start main --create-test-user true Finished dev [unoptimized + debuginfo] target(s) in 0.22s Running `target/debug/neon_local endpoint start main --create-test-user true` Starting existing endpoint main... Starting postgres node at 'postgresql://cloud_admin@127.0.0.1:55432/postgres' Also at 'postgresql://user@127.0.0.1:55432/neondb' ``` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-21 00:20:42 +00:00
Alexander Bayandin	04190a1fea	CI(test_runner): misc small changes (#6801 ) ## Problem A set of small changes that are too small to open a separate PR for each. A notable change is adding `pytest-repeat` plugin, which can help to ensure that a flaky test is fixed by running such a test several times. ## Summary of changes - Update Allure from 2.24.0 to 2.27.0 - Update Ruff from 0.1.11 to 0.2.2 (update `[tool.ruff]` section of `pyproject.toml` for it) - Install pytest-repeat plugin	2024-02-20 20:45:00 +00:00
Vlad Lazar	fcbe9fb184	test: adjust checkpoint distance in `test_layer_map` (#6842 ) `587cb705b8` changed the layer rolling logic to more closely obey the `checkpoint_distance` config. Previously, this test was getting layers significantly larger than the 8K it was asking for. Now the payload in the layers is closer to 8K (which means more layers in total). Tweak the `checkpoint_distance` to get a number of layers more reasonable for this test. Note that we still get more layers than before (~8K vs ~5K).	2024-02-20 19:42:54 +00:00
Nikita Kalyanov	cbb599f353	Add /terminate API (#6745 ) this is to speed up suspends, see https://github.com/neondatabase/cloud/issues/10284 ## Problem ## Summary of changes ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-02-20 19:42:36 +02:00
Christian Schwarz	e49602ecf5	feat(metrics): per-timeline metric for on-demand downloads, remove calls_started histogram (#6834 ) refs #6737 # Problem Before this PR, on-demand downloads weren't measured per tenant_id. This makes root-cause analysis of latency spikes harder, requiring us to resort to log scraping for ``` {neon_service="pageserver"} |= `downloading on-demand` |= `$tenant_id` ``` which can be expensive when zooming out in Grafana. Context: https://neondb.slack.com/archives/C033RQ5SPDH/p1707809037868189 # Solution / Changes - Remove the calls_started histogram - I did the diligence, there are only 2 dashboards using this histogram, and in fact only one uses it as a histogram, the other just as a counter. - [Link 1](`8115b54d9f/neonprod/dashboards/hkXNF7oVz/dashboard-Z31XmM24k.yaml (L1454)`): `Pageserver Thrashing` dashboard, linked from playbook, will fix. - [Link 2](`8115b54d9f/neonprod/dashboards/CEllzAO4z/dashboard-sJqfNFL4k.yaml (L599)`): one of my personal dashboards, unused for a long time, already broken in other ways, no need to fix. - replace `pageserver_remote_timeline_client_calls_unfinished` gauge with a counter pair - Required `Clone`-able `IntCounterPair`, made the necessary changes in the `libs/metrics` crate - fix tests to deal with the fallout A subsequent PR will remove a timeline-scoped metric to compensate. Note that we don't need additional global counters for the per-timeline counters affected by this PR; we can use the `remote_storage` histogram for those, which, conveniently, also include the secondary-mode downloads, which aren't covered by the remote timeline client metrics (should they?).	2024-02-20 17:52:23 +01:00
John Spray	eb02f4619e	tests: add a shutdown log noise case to test_location_conf_churn (#6828 ) This test does lots of shutdowns, and we may emit this layer warning during shutdown. Saw a spurious failure here: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6820/7964134049/index.html#/testresult/784218040583d963	2024-02-20 17:34:12 +01:00
Arthur Petukhovsky	9b8df2634f	Fix active_timelines_count metric (#6839 )	2024-02-20 15:55:51 +00:00
John Spray	d152d4f16f	pageserver: fix treating all download errors as 'Other' (#6836 ) ## Problem `download_retry` correctly uses a fatal check to avoid retrying forever on cancellations and NotFound cases. However, `download_layer_file` was casting all download errors to "Other" in order to attach an anyhow::Context. Noticed this issue in the context of secondary downloads, where requests to download layers that might not exist are issued intentionally, and this resulted in lots of error spam from retries that shouldn't have happened. ## Summary of changes - Remove the `.context()` so that the original DownloadError is visible to backoff::retry	2024-02-20 13:40:46 +00:00
Christian Schwarz	b467d8067b	fix(test_ondemand_download_timetravel): occasionally fails with WAL timeout during layer creation (#6818 ) refs https://github.com/neondatabase/neon/issues/4112 amends https://github.com/neondatabase/neon/pull/6687 Since my last PR #6687 regarding this test, the type of flakiness that has been observed has shifted to the beginning of the test, where we create the layers: ``` timed out while waiting for remote_consistent_lsn to reach 0/411A5D8, was 0/411A5A0 ``` [Example Allure Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6789/7932503173/index.html#/testresult/ddb877cfa4062f7d) Analysis -------- I suspect there was the following race condition: - endpoints push out some tiny piece of WAL during their endpoints.stop_all() - that WAL reaches the SK (it's just one SK according to logs) - the SKs send it into the walreceiver connection - the SK gets shut down - the checkpoint is taken, with last_record_lsn = 0/411A5A0 - the PS's walreceiver_connection_handler processes the WAL that was sent into the connection by the SKs; this advances last_record_lsn to 0/411A5D8 - we get current_lsn = 0/411A5D8 - nothing flushes a layer Changes ------- There's no testing / debug interface to shut down / sever all walreceiver connections. So, this PR restarts pageserver to achieve it. Also, it lifts the "wait for image layer uploads" further up, so that after this first restart, the pageserver really does _nothing_ by itself, and so, the original physical size mismatch issue quoted in #6687 should be fixed. (My initial suspicion hasn't changed that it was due to the tiny chunk of endpoint.stop_all() WAL being ingested after the second PS restart.)	2024-02-20 14:09:15 +01:00
Christian Schwarz	a48b23d777	fix(startup + remote_timeline_client): no-op deletion ops scheduled during startup (#6825 ) Before this PR, if remote storage is configured, `load_layer_map`'s call to `RemoteTimelineClient::schedule_layer_file_deletion` would schedule an empty UploadOp::Delete for each timeline. It's just CPU overhead, no actual interaction with deletion queue on-disk state or S3, as far as I can tell. However, it shows up in the "RemoteTimelineClient calls started metrics", which I'm refining in an orthogonal PR.	2024-02-20 14:06:25 +01:00
Conrad Ludgate	21a86487a2	proxy: fix #6529 (#6807 ) ## Problem `application_name` for HTTP is not being recorded ## Summary of changes get `application_name` query param	2024-02-20 11:58:01 +01:00
Conrad Ludgate	686b3c79c8	http2 alpn (#6815 ) ## Problem Proxy already supported HTTP2, but I expect no one is using it because we don't advertise it in the TLS handshake. ## Summary of changes #6335 without the websocket changes.	2024-02-20 10:44:46 +00:00
John Spray	02a8b7fbe0	storage controller: issue timeline create/delete calls concurrently (#6827 ) ## Problem Timeline creation is meant to be very fast: it should only take approximately one S3 PUT latency. When we have many shards in a tenant, we should preserve that responsiveness. ## Summary of changes - Issue create/delete pageserver API calls concurrently across all >0 shards - During tenant deletion, delete shard zero last, separately, to avoid confusing anything using GETs on the timeline. - Return 201 instead of 200 on creations to make cloud control plane happy --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-20 10:13:21 +00:00
Alexander Bayandin	feb359b459	CI: Update deprecated GitHub Actions (#6822 ) ## Problem We use a bunch of deprecated actions. See https://github.com/neondatabase/neon/actions/runs/7958569728 (Annotations section) ``` Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-java@v3, actions/cache@v3, actions/github-script@v6. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/. ``` ## Summary of changes - `actions/cache@v3` -> `actions/cache@v4` - `actions/checkout@v3` -> `actions/checkout@v4` - `actions/github-script@v6` -> `actions/github-script@v7` - `actions/setup-java@v3` -> `actions/setup-java@v4` - `actions/upload-artifact@v3` -> `actions/upload-artifact@v4`	2024-02-19 21:46:22 +00:00
John Spray	0c105ef352	storage controller: debug observability endpoints and self-test (#6820 ) This PR stacks on https://github.com/neondatabase/neon/pull/6814 Observability: - Because we only persist a subset of our state, and our external API is pretty high level, it can be hard to get at the detail of what's going on internally (e.g. the IntentState of a shard). - Add debug endpoints for getting a full dump of all TenantState and SchedulerNode objects - Enrich the /control/v1/node listing endpoint to include full in-memory detail of `Node` rather than just the `NodePersistence` subset Consistency checks: - The storage controller maintains separate in-memory and on-disk states, by design. To catch subtle bugs, it is useful to occasionally cross-check these. - The Scheduler maintains reference counts for shard->node relationships, which could drift if there was a bug in IntentState: exhaustively cross check them in tests.	2024-02-19 20:29:23 +00:00
John Spray	4f7704af24	storage controller: fix spurious reconciles after pageserver restarts (#6814 ) ## Problem When investigating test failures (https://github.com/neondatabase/neon/issues/6813) I noticed we were doing a bunch of Reconciler runs right after splitting a tenant. It's because the splitting test does a pageserver restart, and there was a bug in /re-attach handling, where we would update the generation correctly in the database and intent state, but not observed state, thereby triggering a reconciliation on the next call to maybe_reconcile. This didn't break anything profound (underlying rules about generations were respected), but caused the storage controller to do an un-needed extra round of bumping the generation and reconciling. ## Summary of changes - Start adding metrics to the storage controller - Assert on the number of reconciles done in test_sharding_split_smoke - Fix /re-attach to update `observed` such that we don't spuriously re-reconcile tenants.	2024-02-19 17:44:20 +00:00
Arpad Müller	e0c12faabd	Allow initdb preservation for broken tenants (#6790 ) Often times the tenants we want to (WAL) DR are the ones which the pageserver marks as broken. Therefore, we should allow initdb preservation also for broken tenants. Fixes #6781.	2024-02-19 17:27:02 +01:00
John Spray	2f8a2681b8	pageserver: ensure we never try to save empty delta layer (#6805 ) ## Problem Sharded tenants could panic during compaction when they try to generate an L1 delta layer for a region that contains no keys on a particular shard. This is a variant of https://github.com/neondatabase/neon/issues/6755, where we attempt to save a delta layer with no keys. It is harder to reproduce than the case of image layers fixed in https://github.com/neondatabase/neon/pull/6776. It will become even less likely once https://github.com/neondatabase/neon/pull/6778 tweaks keyspace generation, but even then, we should not rely on keyspace partitioning to guarantee at least one stored key in each partition. ## Summary of changes - Move construction of `writer` in `compact_level0_phase1`, so that we never leave a writer constructed but without any keys.	2024-02-19 15:07:07 +00:00
John Spray	7e4280955e	control_plane/attachment_service: improve Scheduler (#6633 ) ## Problem One of the major shortcuts in the initial version of this code was to construct a fresh `Scheduler` each time we need it, which is an O(N^2) cost as the tenant count increases. ## Summary of changes - Keep `Scheduler` alive through the lifetime of ServiceState - Use `IntentState` as a reference tracking helper, updating Scheduler refcounts as nodes are added/removed from the intent. There is an automated test that checks things don't get pathologically slow with thousands of shards, but it's not included in this PR because tests that implicitly test the runner node performance take some thought to stabilize/land in CI.	2024-02-19 14:12:20 +00:00
John Spray	349b375010	pageserver: remove heatmap file during tenant delete (#6806 ) ## Problem Secondary mode locations keep a local copy of the heatmap, which needs cleaning up during deletion. Closes: https://github.com/neondatabase/neon/issues/6802 ## Summary of changes - Extend test_live_migration to reproduce the issue - Remove heatmap-v1.json during tenant deletion	2024-02-19 14:01:36 +00:00
Conrad Ludgate	d0d4871682	proxy: use postgres_protocol scram/sasl code (#4748 ) 1) `scram::password` was used in tests only. can be replaced with `postgres_protocol::password`. 2) `postgres_protocol::authentication::sasl` provides a client impl of SASL which improves our ability to test	2024-02-19 12:54:17 +00:00
Vlad Lazar	587cb705b8	pageserver: roll open layer in timeline writer (#6661 ) ## Problem One WAL record can actually produce an arbitrary amount of key value pairs. This is problematic since it might cause our frozen layers to bloat past the max allowed size of S3 single shot uploads. [#6639](https://github.com/neondatabase/neon/pull/6639) introduced a "should roll" check after every batch of `ingest_batch_size` (100 WAL records by default). This helps, but the original problem still exists. ## Summary of changes This patch moves the responsibility of rolling the currently open layer to the `TimelineWriter`. Previously, this was done ad-hoc via calls to `check_checkpoint_distance`. The advantages of this approach are: * ability to split one batch over multiple open layers * less layer map locking * remove ad-hoc check_checkpoint_distance calls More specifically, we track the current size of the open layer in the writer. On each `put` check whether the current layer should be closed and a new one opened. Keeping track of the currently open layer results in less contention on the layer map lock. It only needs to be acquired on the first write and on writes that require a roll afterwards. Rolling the open layer can be triggered by: 1. The distance from the last LSN we rolled at. This bounds the amount of WAL that the safekeepers need to store. 2. The size of the currently open layer. 3. The time since the last roll. It helps safekeepers to regard pageserver as caught up and suspend activity. Closes #6624	2024-02-19 12:34:27 +00:00
Alexander Bayandin	4d2bf55e6c	CI: temporarily disable coverage report for regression tests (#6798 ) ## Problem The merging coverage data step recently started to be too flaky. This failure blocks staging deployment and along with the flakiness of regression tests might require 4-5-6 manual restarts of a CI job. Refs: - https://github.com/neondatabase/neon/issues/4540 - https://github.com/neondatabase/neon/issues/6485 - https://neondb.slack.com/archives/C059ZC138NR/p1704131143740669 ## Summary of changes - Disable code coverage report for functional tests	2024-02-19 11:07:27 +00:00
John Spray	5667372c61	pageserver: during shard split, wait for child to activate (#6789 ) ## Problem test_sharding_split_unsharded was flaky with log errors from tenants not being active. This was happening when the split function enters wait_lsn() while the child shard might still be activating. It's flaky rather than an outright failure because activation is usually very fast. This is also a real bug fix, because in realistic scenarios we could proceed to detach the parent shard before the children are ready, leading to an availability gap for clients. ## Summary of changes - Do a short wait_to_become_active on the child shards before proceeding to wait for their LSNs to advance --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-18 15:55:19 +00:00
Alexander Bayandin	61f99d703d	test_create_snapshot: do not try to copy pg_dynshmem dir (#6796 ) ## Problem `test_create_snapshot` is flaky[0] on CI and fails constantly on macOS, but with a slightly different error: ``` shutil.Error: [('/Users/bayandin/work/neon/test_output/test_create_snapshot[release-pg15-1-100]/repo/endpoints/ep-1/pgdata/pg_dynshmem', '/Users/bayandin/work/neon/test_output/compatibility_snapshot_pgv15/repo/endpoints/ep-1/pgdata/pg_dynshmem', "[Errno 2] No such file or directory: '/Users/bayandin/work/neon/test_output/test_create_snapshot[release-pg15-1-100]/repo/endpoints/ep-1/pgdata/pg_dynshmem'")] ``` Also (on macOS) `repo/endpoints/ep-1/pgdata/pg_dynshmem` is a symlink to `/dev/shm/`. - [0] https://github.com/neondatabase/neon/issues/6784 ## Summary of changes Ignore `pg_dynshmem` directory while copying a snapshot	2024-02-18 12:16:07 +00:00
John Spray	24014d8383	pageserver: fix sharding emitting empty image layers during compaction (#6776 ) ## Problem Sharded tenants would sometimes try to write empty image layers during compaction: this was more noticeable on larger databases. - https://github.com/neondatabase/neon/issues/6755 Note to reviewers: the last commit is a refactor that de-indents a whole block, I recommend reviewing the earlier commits one by one to see the real changes ## Summary of changes - Fix a case where when we drop a key during compaction, we might fail to write out keys (this was broken when vectored get was added) - If an image layer is empty, then do not try and write it out, but leave `start` where it is so that if the subsequent key range meets criteria for writing an image layer, we will extend its key range to cover the empty area. - Add a compaction test that configures small layers and compaction thresholds, and asserts that we really successfully did image layer generation. This fails before the fix.	2024-02-18 08:51:12 +00:00
Konstantin Knizhnik	e3ded64d1b	Support pg-ivm extension (#6793 ) ## Problem See https://github.com/neondatabase/cloud/issues/10268 ## Summary of changes Add pg_ivm extension ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-02-17 22:13:25 +02:00
dependabot[bot]	9b714c8572	build(deps): bump cryptography from 42.0.0 to 42.0.2 (#6792 )	2024-02-17 19:15:21 +00:00
Alex Chi Z	29fb675432	Revert "fix superuser permission check for extensions (#6733 )" (#6791 ) This reverts commit `9ad940086c`. This pull request reverts #6733 to avoid incompatibility with pgvector and I will push further fixes later. Note that after reverting this pull request, the postgres submodule will point to some detached branches.	2024-02-16 20:50:09 +00:00
Christian Schwarz	ca07fa5f8b	per-TenantShard read throttling (#6706 )	2024-02-16 21:26:59 +01:00
John Spray	5d039c6e9b	libs: add 'generations_api' auth scope (#6783 ) ## Problem Even if you're not enforcing auth, the JwtAuth middleware barfs on scopes it doesn't know about. Add `generations_api` scope, which was invented in the cloud control plane for the pageserver's /re-attach and /validate upcalls: this will be enforced in storage controller's implementation of these in a later PR. Unfortunately the scope's naming doesn't match the other scope's naming styles, so needs a manual serde decorator to give it an underscore. ## Summary of changes - Add `Scope::GenerationsApi` variant - Update pageserver + safekeeper auth code to print appropriate message if they see it.	2024-02-16 15:53:09 +00:00
Calin Anca	36e1100949	bench_walredo: use tokio multi-threaded runtime (#6743 ) fixes https://github.com/neondatabase/neon/issues/6648 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-02-16 16:31:54 +01:00
Alexander Bayandin	59c5b374de	test_pageserver_max_throughput_getpage_at_latest_lsn: disable on CI (#6785 ) ## Problem `test_pageserver_max_throughput_getpage_at_latest_lsn` is flaky which makes CI status red pretty frequently. `benchmarks` is not a blocking job (doesn't block `deploy`), so having it red might hide failures in other jobs Ref: https://github.com/neondatabase/neon/issues/6724 ## Summary of changes - Disable `test_pageserver_max_throughput_getpage_at_latest_lsn` on CI until it is fixed	2024-02-16 15:30:04 +00:00
Arpad Müller	0f3b87d023	Add test for pageserver_directory_entries_count metric (#6767 ) Adds a simple test to ensure the metric works. The test creates a bunch of relations to activate the metric. Follow-up of #6736	2024-02-16 14:53:36 +00:00
Konstantin Knizhnik	c19625a29c	Support sharding for compute_ctl (#6787 ) ## Problem See https://github.com/neondatabase/neon/issues/6786 ## Summary of changes Split connection string in compute.rs when requesting basebackup	2024-02-16 14:50:09 +00:00
John Spray	f2e5212fed	storage controller: background reconcile, graceful shutdown, better logging (#6709 ) ## Problem Now that the storage controller is working end to end, we start burning down the robustness aspects. ## Summary of changes - Add a background task that periodically calls `reconcile_all`. This ensures that if earlier operations couldn't succeed (e.g. because a node was unavailable), we will eventually retry. This naive initial implementation can start an unlimited number of reconcile tasks: limiting reconcile concurrency is a later item in #6342 - Add a number of tracing spans in key locations: each background task, each reconciler task. - Add a top level CancellationToken and Gate, and use these to implement a graceful shutdown that waits for tasks to shut down. This is not bulletproof yet, because within these tasks we have remote HTTP calls that aren't wrapped in cancellation/timeouts, but it creates the structure, and if we don't shutdown promptly then k8s will kill us. - To protect shard splits from background reconciliation, expose the `SplitState` in memory and use it to guard any APIs that require an attached tenant.	2024-02-16 13:00:53 +00:00
Christian Schwarz	568bc1fde3	fix(build): production flamegraphs are useless (#6764 )	2024-02-16 10:12:34 +00:00
Christian Schwarz	45e929c069	stop reading local `metadata` file (#6777 )	2024-02-16 09:35:11 +00:00
John Spray	6b980f38da	libs: refactor ShardCount.0 to private (#6690 ) ## Problem The ShardCount type has a magic '0' value that represents a legacy single-sharded tenant, whose TenantShardId is formatted without a `-0001` suffix (i.e. formatted as a traditional TenantId). This was error-prone in code locations that wanted the actual number of shards: they had to handle the 0 case specially. ## Summary of changes - Make the internal value of ShardCount private, and expose `count()` and `literal()` getters so that callers have to explicitly say whether they want the literal value (e.g. for storing in a TenantShardId), or the actual number of shards in the tenant. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-15 21:59:39 +00:00
MMeent	f0d8bd7855	Update Makefile (#6779 ) This fixes issues where `neon-pg-ext-clean-vYY` is used as target and resolves using the `neon-pg-ext-%` template with `$` resolving as `clean-vYY`, for older versions of GNU Make, rather than `neon-pg-ext-clean-%` using `$` = `vYY` ## Problem ``` $ make clean ... rm -f pg_config_paths.h Compiling neon clean-v14 mkdir -p /Users/<user>/neon-build//pg_install//build/neon-clean-v14 /Applications/Xcode.app/Contents/Developer/usr/bin/make PG_CONFIG=/Users/<user>/neon-build//pg_install//clean-v14/bin/pg_config CFLAGS='-O0 -g3 ' \ -C /Users/<user>/neon-build//pg_install//build/neon-clean-v14 \ -f /Users/<user>/neon-build//pgxn/neon/Makefile install make[1]: /Users/<user>/neon-build//pg_install//clean-v14/bin/pg_config: Command not found make[1]: * No rule to make target `install'. Stop. make: * [neon-pg-ext-clean-v14] Error 2 ```	2024-02-15 19:48:50 +00:00
Joonas Koivunen	046d9c69e6	fix: require wider jwt for changing the io engine (#6770 ) io-engine should not be changeable with any JWT token, for example the tenant_id scoped token which computes have.	2024-02-15 16:58:26 +00:00
Alexander Bayandin	c72cb44213	test_runner/performance: parametrize benchmarks (#6744 ) ## Problem Currently, we don't store `PLATFORM` for Nightly Benchmarks. It causes them to be merged as reruns in Allure report (because they have the same test name). ## Summary of changes - Parametrize benchmarks by - Postgres Version (14/15/16) - Build Type (debug/release/remote) - PLATFORM (neon-staging/github-actions-selfhosted/...) --------- Co-authored-by: Bodobolero <peterbendel@neon.tech>	2024-02-15 15:53:58 +00:00
Arpad Müller	cd3e4ac18d	Rename TEST_IMG function to test_img (#6762 ) The latter follows the canonical way of naming functions in Rust.	2024-02-15 15:14:51 +00:00
Alex Chi Z	9ad940086c	fix superuser permission check for extensions (#6733 ) close https://github.com/neondatabase/neon/issues/6236 This pull request bumps neon postgres dependencies. The corresponding postgres commits fix the checks for superuser permission when creating an extension. Also, for creating native functions, it now allows neon_superuser only in the extension creation process. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-15 14:59:13 +00:00
Joonas Koivunen	936f2ee2a5	fix: accidental wide span in tests (#6772 ) introduced in a PR without other #[tracing::instrument] changes.	2024-02-15 13:48:44 +00:00
Heikki Linnakangas	1af047dd3e	Fix typo in CI message (#6749 )	2024-02-15 14:34:19 +02:00
John Spray	5fa747e493	pageserver: shard splitting refinements (parent deletion, hard linking) (#6725 ) ## Problem - We weren't deleting parent shard contents once the split was done - Re-downloading layers into child shards is wasteful ## Summary of changes - Hard-link layers into child shard local storage during split - Delete parent shards content at the end --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-15 10:21:53 +02:00
Joonas Koivunen	80854b98ff	move timeouts and cancellation handling to remote_storage (#6697 ) Cancellation and timeouts are handled at remote_storage callsites, if they are. However they should always be handled, because we've had transient problems with remote storage connections. - Add cancellation token to the `trait RemoteStorage` methods - For `download`, `list` methods there is `DownloadError::{Cancelled,Timeout}` - For the rest now using `anyhow::Error`, it will have root cause `remote_storage::TimeoutOrCancel::{Cancel,Timeout}` - Both types have `::is_permanent` equivalent which should be passed to `backoff::retry` - New generic RemoteStorageConfig option `timeout`, defaults to 120s - Start counting timeouts only after acquiring concurrency limiter permit - Cancellable permit acquiring - Download stream timeout or cancellation is communicated via an `std::io::Error` - Exit backoff::retry by marking cancellation errors permanent Fixes: #6096 Closes: #4781 Co-authored-by: arpad-m <arpad-m@users.noreply.github.com>	2024-02-14 23:24:07 +00:00
Christian Schwarz	024372a3db	Revert "refactor(VirtualFile::crashsafe_overwrite): avoid Handle::block_on in callers" (#6765 ) Reverts neondatabase/neon#6731 On high tenant count Pageservers in staging, memory and CPU usage shoots to 100% with this change. (NB: staging currently has tokio-epoll-uring enabled) Will analyze tomorrow. https://neondb.slack.com/archives/C03H1K0PGKH/p1707933875639379?thread_ts=1707929541.125329&cid=C03H1K0PGKH	2024-02-14 19:17:12 +00:00
Shayan Hosseini	fff2468aa2	Add resource consume test funcs (#6747 ) ## Problem Building on #5875 to add handy test functions for autoscaling. Resolves #5609 ## Summary of changes This PR makes the following changes to #5875: - Enable `neon_test_utils` extension in the compute node docker image, so we could use it in the e2e tests (as discussed with @kelvich). - Removed test functions related to disk as we don't use them for autoscaling. - Fix the warning with printf-ing unsigned long variables. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-14 18:45:05 +00:00
Anna Khanova	c7538a2c20	Proxy: remove fail fast logic to connect to compute (#6759 ) ## Problem Flaky tests ## Summary of changes Remove failfast logic	2024-02-14 18:43:52 +00:00
Arpad Müller	a2d0d44b42	Remove unused allow's (#6760 ) These allow's became redundant some time ago so remove them, or address them if addressing is very simple.	2024-02-14 18:16:05 +00:00
Christian Schwarz	7d3cdc05d4	fix(pageserver): pagebench doesn't work with released artifacts (#6757 ) The canonical release artifact of neon.git is the Docker image with all the binaries in them: ``` docker pull neondatabase/neon:release-4854 docker create --name extract neondatabase/neon:release-4854 docker cp extract:/usr/local/bin/pageserver ./pageserver.release-4854 chmod +x pageserver.release-4854 cp -a pageserver.release-4854 ./target/release/pageserver ``` Before this PR, these artifacts didn't expose the `keyspace` API, thereby preventing `pagebench get-page-latest-lsn` from working. Having working pagebench is useful, e.g., for experiments in staging. So, expose the API, but don't document it, as it's not part of the interface with control plane.	2024-02-14 17:01:15 +00:00
John Spray	840abe3954	pageserver: store aux files as deltas (#6742 ) ## Problem Aux files were stored with an O(N^2) cost, since on each modification the entire map is re-written as a page image. This addresses one axis of the inefficiency in logical replication's use of storage (https://github.com/neondatabase/neon/issues/6626). It will still be writing a large amount of duplicative data if writing the same slot's state every 15 seconds, but the impact will be O(N) instead of O(N^2). ## Summary of changes - Introduce `NeonWalRecord::AuxFile` - In `DatadirModification`, if the AUX_FILES_KEY has already been set, then write a delta instead of an image	2024-02-14 15:01:16 +00:00
Christian Schwarz	774a6e7475	refactor(virtual_file) make write_all_at take owned buffers (#6673 ) context: https://github.com/neondatabase/neon/issues/6663 Building atop #6664, this PR switches `write_all_at` to take owned buffers. The main challenge here is the `EphemeralFile::mutable_tail`, for which I'm picking the ugly solution of an `Option` that is `None` while the IO is in flight. After this, we will be able to switch `write_at` to take owned buffers and call tokio-epoll-uring's `write` function with that owned buffer. That'll be done in #6378.	2024-02-14 15:59:06 +01:00
Christian Schwarz	df5d588f63	refactor(VirtualFile::crashsafe_overwrite): avoid Handle::block_on in callers (#6731 ) Some callers of `VirtualFile::crashsafe_overwrite` call it on the executor thread, thereby potentially stalling it. Others are more diligent and wrap it in `spawn_blocking(..., Handle::block_on, ... )` to avoid stalling the executor thread. However, because `crashsafe_overwrite` uses VirtualFile::open_with_options internally, we spawn a new thread-local `tokio-epoll-uring::System` in the blocking pool thread that's used for the `spawn_blocking` call. This PR refactors the situation such that we do the `spawn_blocking` inside `VirtualFile::crashsafe_overwrite`. This unifies the situation for the better: 1. Callers who didn't wrap in `spawn_blocking(..., Handle::block_on, ...)` before no longer stall the executor. 2. Callers who did it before now can avoid the `block_on`, resolving the problem with the short-lived `tokio-epoll-uring::System`s in the blocking pool threads. A future PR will build on top of this and divert to tokio-epoll-uring if it's configured as the IO engine. Changes ------- - Convert implementation to std::fs and move it into `crashsafe.rs` - Yes, I know, Safekeepers (cc @arssher ) added `durable_rename` and `fsync_async_opt` recently. However, `crashsafe_overwrite` is different in the sense that it's higher level, i.e., it's more like `std::fs::write` and the Safekeeper team's code is more building block style. - The consequence is that we don't use the VirtualFile file descriptor cache anymore. - I don't think it's a big deal because we have plenty of slack wrt production file descriptor limit rlimit (see [this dashboard](https://neonprod.grafana.net/d/e4a40325-9acf-4aa0-8fd9-f6322b3f30bd/pageserver-open-file-descriptors?orgId=1)) - Use `tokio::task::spawn_blocking` in `VirtualFile::crashsafe_overwrite` to call the new `crashsafe::overwrite` API. 
- Inspect all callers to remove any double-`spawn_blocking` - spawn_blocking requires the captured data to be 'static + Send. So, refactor the callers. We'll need this for future tokio-epoll-uring support anyway, because tokio-epoll-uring requires owned buffers. Related Issues -------------- - overall epic to enable write path to tokio-epoll-uring: #6663 - this is also kind of relevant to the tokio-epoll-uring System creation failures that we encountered in staging, investigation being tracked in #6667 - why is it relevant? Because this PR removes two uses of `spawn_blocking+Handle::block_on`	2024-02-14 14:22:41 +00:00
John Spray	f39b0fce9b	Revert #6666 "tests: try to make restored-datadir comparison tests not flaky" (#6751 ) The #6666 change appears to have made the test fail more often. PR https://github.com/neondatabase/neon/pull/6712 should re-instate this change, along with its change to make the overall flow more reliable. This reverts commit `568f91420a`.	2024-02-14 10:57:01 +00:00
Conrad Ludgate	a9ec4eb4fc	hold cancel session (#6750 ) ## Problem In a recent refactor, we accidentally dropped the cancel session early ## Summary of changes Hold the cancel session during proxy passthrough	2024-02-14 10:26:32 +00:00
Heikki Linnakangas	a97b54e3b9	Cherry-pick Postgres bugfix to 'mmap' DSM implementation Cherry-pick Upstream commit fbf9a7ac4d to neon stable branches. We'll get it in the next PostgreSQL minor release anyway, but we need it now, if we want to start using the 'mmap' implementation. See https://github.com/neondatabase/autoscaling/issues/800 for the plans on doing that.	2024-02-14 11:37:52 +02:00
Heikki Linnakangas	a5114a99b2	Create a symlink from pg_dynshmem to /dev/shm See included comment and issue https://github.com/neondatabase/autoscaling/issues/800 for details. This has no effect, unless you set "dynamic_shared_memory_type = mmap" in postgresql.conf.	2024-02-14 11:37:52 +02:00
Arpad Müller	ee7bbdda0e	Create new metric for directory counts (#6736 ) There are O(n^2) issues due to how we store these directories (#6626), so it's good to keep an eye on them and ensure the numbers stay low. The new per-timeline metric `pageserver_directory_entries_count` isn't perfect, namely we don't calculate it every time we attach the timeline, but only if there is an actual change. Also, it is a collective metric over multiple scalars. Lastly, we only emit the metric if it is above a certain threshold. However, the metric still gives a feel for the general size of the timeline. We care less for small values as the metric is mainly there to detect and track tenants with large directory counts. We also expose the directory counts in `TimelineInfo` so that one can get the detailed size distribution directly via the pageserver's API. Related: #6642 , https://github.com/neondatabase/cloud/issues/10273	2024-02-14 02:12:00 +01:00
Konstantin Knizhnik	b6e070bf85	Do not perform fast exit for catalog pages in redo filter (#6730 ) ## Problem See https://github.com/neondatabase/neon/issues/6674 Current implementation of `neon_redo_read_buffer_filter` performs a fast exit for catalog pages: ``` /* * Out of an abundance of caution, we always run redo on shared catalogs, * regardless of whether the block is stored in shared buffers. See also * this function's top comment. */ if (!OidIsValid(NInfoGetDbOid(rinfo))) return false; ``` As a result, the last written LSN and relation size for the FSM fork are not correctly updated for catalog relations. ## Summary of changes Do not perform fast path return for catalog relations. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-13 20:41:17 +02:00
Christian Schwarz	7fa732c96c	refactor(virtual_file): take owned buffer in VirtualFile::write_all (#6664 ) Building atop #6660 , this PR converts VirtualFile::write_all to owned buffers. Part of https://github.com/neondatabase/neon/issues/6663	2024-02-13 18:46:25 +01:00
Anna Khanova	331935df91	Proxy: send cancel notifications to all instances (#6719 ) ## Problem If a cancel request ends up on the wrong proxy instance, it doesn't take effect. ## Summary of changes Send redis notifications to all proxy pods about the cancel request. Related issue: https://github.com/neondatabase/neon/issues/5839, https://github.com/neondatabase/cloud/issues/10262	2024-02-13 17:58:58 +01:00
John Spray	a8eb4042ba	tests: test_secondary_mode_eviction: avoid use of mocked statvfs (#6698 ) ## Problem Test sometimes fails with `used_blocks > total_blocks`, because when using mocked statvfs with the total blocks set to the size of data on disk before starting, we are implicitly asserting that nothing at all can be written to disk between startup and calling statvfs. Related: https://github.com/neondatabase/neon/issues/6511 ## Summary of changes - Use HTTP API to invoke disk usage eviction instead of mocked statvfs	2024-02-13 09:00:50 +02:00
Arthur Petukhovsky	4be2223a4c	Discrete event simulation for safekeepers (#5804 ) This PR contains the first version of a [FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc) simulation testing for safekeeper and walproposer. ### desim This is a core "framework" for running deterministic simulation. It operates on threads, allowing to test synchronous code (like walproposer). `libs/desim/src/executor.rs` contains implementation of a deterministic thread execution. This is achieved by blocking all threads, and each time allowing only a single thread to make an execution step. All executor's threads are blocked using `yield_me(after_ms)` function. This function is called when a thread wants to sleep or wait for an external notification (like blocking on a channel until it has a ready message). `libs/desim/src/chan.rs` contains implementation of a channel (basic sync primitive). It has unlimited capacity and any thread can push or read messages to/from it. `libs/desim/src/network.rs` has a very naive implementation of a network (only reliable TCP-like connections are supported for now), that can have arbitrary delays for each packet and failure injections for breaking connections with some probability. `libs/desim/src/world.rs` ties everything together, to have a concept of virtual nodes that can have network connections between them. ### walproposer_sim Has everything to run walproposer and safekeepers in a simulation. `safekeeper.rs` reimplements all necessary stuff from `receive_wal.rs`, `send_wal.rs` and `timelines_global_map.rs`. `walproposer_api.rs` implements all walproposer callback to use simulation library. `simulation.rs` defines a schedule – a set of events like `restart <sk>` or `write_wal` that should happen at time `<ts>`. It also has code to spawn walproposer/safekeeper threads and provide config to them. 
### tests `simple_test.rs` has tests that just start walproposer and 3 safekeepers together in a simulation, and tests that they are not crashing right away. `misc_test.rs` has tests checking more advanced simulation cases, like crashing or restarting threads, testing memory deallocation, etc. `random_test.rs` is the main test, it checks thousands of random seeds (schedules) for correctness. It roughly corresponds to running a real python integration test in an environment with very unstable network and cpu, but in a deterministic way (each seed results in the same execution log) and much much faster. Closes #547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-02-12 20:29:57 +00:00
Anna Khanova	fac50a6264	Proxy refactor auth+connect (#6708 ) ## Problem Not really a problem, just refactoring. ## Summary of changes Separate authenticate from wake compute. Do not call wake compute second time if we managed to connect to postgres or if we got it not from cache.	2024-02-12 18:41:02 +00:00
Arpad Müller	a1f37cba1c	Add test that runs the S3 scrubber (#6641 ) In #6079 it was found that there is no test that executes the scrubber. We now add such a test, which does the following things: * create a tenant, write some data * run the scrubber * remove the tenant * run the scrubber again Each time, the scrubber runs the scan-metadata command. Before #6079 we would have errored, now we don't. Fixes #6080	2024-02-12 19:15:21 +01:00
Christian Schwarz	8b8ff88e4b	GH actions: label to disable CI runs completely (#6677 ) I don't want my very-early-draft PRs to trigger any CI runs. So, add a label `run-no-ci`, and piggy-back on the `check-permissions` job.	2024-02-12 15:25:33 +00:00
Joonas Koivunen	7ea593db22	refactor(LayerManager): resident layers query (#6634 ) Refactor out layer accesses so that we can have easy access to resident layers, which are needed for a number of cases instead of layers for eviction. Simplifies the heatmap building by only using Layers, not RemoteTimelineClient. Cc: #5331	2024-02-12 17:13:35 +02:00
Conrad Ludgate	789a71c4ee	proxy: add more http logging (#6726 ) ## Problem hard to see where time is taken during HTTP flow. ## Summary of changes add a lot more for query state. add a conn_id field to the sql-over-http span	2024-02-12 15:03:45 +00:00
Christian Schwarz	242dd8398c	refactor(blob_io): use owned buffers (#6660 ) This PR refactors the `blob_io` code away from using slices towards taking owned buffers and return them after use. Using owned buffers will eventually allow us to use io_uring for writes. part of https://github.com/neondatabase/neon/issues/6663 Depends on https://github.com/neondatabase/tokio-epoll-uring/pull/43 The high level scheme is as follows: - call writing functions with the `BoundedBuf` - return the underlying `BoundedBuf::Buf` for potential reuse in the caller NB: Invoking `BoundedBuf::slice(..)` will return a slice that _includes the uninitialized portion of `BoundedBuf`_. I.e., the portion between `bytes_init()` and `bytes_total()`. It's a safe API that actually permits access to uninitialized memory. Not great. Another wrinkle is that it panics if the range has length 0. However, I don't want to switch away from the `BoundedBuf` API, since it's what tokio-uring uses. We can always weed this out later by replacing `BoundedBuf` with our own type. Created an issue so we don't forget: https://github.com/neondatabase/tokio-epoll-uring/issues/46	2024-02-12 15:58:55 +01:00
Conrad Ludgate	98ec5c5c46	proxy: some more parquet data (#6711 ) ## Summary of changes add auth_method and database to the parquet logs	2024-02-12 13:14:06 +00:00
Anna Khanova	020e607637	Proxy: copy bidirectional fork (#6720 ) ## Problem `tokio::io::copy_bidirectional` doesn't close the connection once one of the sides closes it. It's not really suitable for the postgres protocol. ## Summary of changes Fork `copy_bidirectional` and initiate a shutdown for both connections. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-02-12 14:04:46 +01:00
Joonas Koivunen	c77411e903	cleanup around `attach` (#6621 ) The smaller changes I found while looking around #6584. - rustfmt was not able to format handle_timeline_create - fix Generation::get_suffix always allocating - Generation was missing a `#[track_caller]` for panicky method - attach has a lot of issues, but even with this PR it cannot be formatted by rustfmt - moved the `preload` span to be on top of `attach` -- it is awaited inline - make disconnected panic! or unreachable! into expect, expect_err	2024-02-12 14:52:20 +02:00
Joonas Koivunen	aeda82a010	fix(heavier_once_cell): assertion failure can be hit (#6722 ) @problame noticed that the `tokio::sync::AcquireError` branch assertion can be hit like in the added test. We haven't seen this yet in production, but I'd prefer not to see it there. There `take_and_deinit` is being used, but this race must be quite timing sensitive. Rework of earlier: #6652.	2024-02-12 09:57:29 +00:00
Heikki Linnakangas	e5daf366ac	tests: Remove unnecessary port config with VanillaPostgres class VanillaPostgres constructor prints the "port={port}" line to the config file, no need to do it in the callers. The TODO comment that it would be nice if VanillaPostgres could pick the port by itself is still valid though.	2024-02-11 01:34:31 +02:00
Heikki Linnakangas	d77583c86a	tests: Remove obsolete allowlist entries Commit `9a6c0be823` removed the code that printed these warnings: marking {} as locally complete, while it doesnt exist in remote index No timelines to attach received Remove those warnings from all the allowlists in tests.	2024-02-11 01:34:31 +02:00
Heikki Linnakangas	241dcbf70c	tests: Remove "Running in ..." log message from every CLI call It's always the same directory, the test's "repo" directory.	2024-02-11 01:34:31 +02:00
Heikki Linnakangas	da626fb1fa	tests: Remove "postgres is running on ... branch" messages It seems like useless chatter. The endpoint.start() itself prints a "Running command ... neon_local endpoint start" message too.	2024-02-11 01:34:31 +02:00
John Spray	12b39c9db9	control_plane: add debug APIs for force-dropping tenant/node (#6702 ) ## Problem When debugging/supporting this service, we sometimes need it to just forget about a tenant or node, e.g. because of an issue cleanly tearing them down. For example, if I create a tenant with a PlacementPolicy that can't be scheduled on the nodes we have, we would never be able to schedule it for a DELETE to work. ## Summary of changes - Add APIs for dropping nodes and tenants that do no teardown other than removing the entity from the DB and removing any references to it.	2024-02-10 11:56:52 +00:00
Heikki Linnakangas	df5e2729a9	Remove now unused allowlisted errors. I'm not sure when we stopped emitting these, but they don't seem to be needed anymore.	2024-02-10 12:05:02 +02:00
Heikki Linnakangas	0fd3cd27cb	Tighten up the check for garbage after end-of-tar. Turn the warning into an error, if there is garbage after the end of imported tar file. However, it's normal for 'tar' to append extra empty blocks to the end, so tolerate those without warnings or errors.	2024-02-10 12:05:02 +02:00
Christian Schwarz	5779c7908a	revert two recent `heavier_once_cell` changes (#6704 ) This PR reverts - https://github.com/neondatabase/neon/pull/6589 - https://github.com/neondatabase/neon/pull/6652 because there's a performance regression that's particularly visible at high layer counts. Most likely it's because the switch to RwLock inflates the ``` inner: heavier_once_cell::OnceCell<ResidentOrWantedEvicted>, ``` size from 48 to 88 bytes, which, by itself is almost a doubling of the cache footprint, and probably the fact that it's now larger than a cache line also doesn't help. See this chat on the Neon discord for more context: https://discord.com/channels/1176467419317940276/1204714372295958548/1205541184634617906 I'm reverting 6652 as well because it might also have perf implications, and we're getting close to the next release. We should re-do its changes after the next release, though. cc @koivunej cc @ivaxer	2024-02-09 22:22:40 +00:00
Sasha Krassovsky	1a4dd58b70	Grant pg_monitor to neon_superuser (#6691 ) ## Problem The people want pg_monitor https://github.com/neondatabase/neon/issues/6682 ## Summary of changes Gives the people pg_monitor	2024-02-09 20:22:53 +00:00
Conrad Ludgate	cbd3a32d4d	proxy: decode username and password (#6700 ) ## Problem usernames and passwords can be URL 'percent' encoded in the connection string URL provided by serverless driver. ## Summary of changes Decode the parameters when getting conn info	2024-02-09 19:22:23 +00:00
Christian Schwarz	ca818c8bd7	fix(test_ondemand_download_timetravel): occasionally fails with slightly higher physical size (#6687 )	2024-02-09 20:09:37 +01:00
Arseny Sher	1bb9abebf2	Remove WAL segments from s3 in batches. Do list-delete operations in batches instead of doing full list first, to ensure deletion makes progress even if there are a lot of files to remove. To this end, add max_keys limit to remote storage list_files.	2024-02-09 22:11:53 +04:00
Conrad Ludgate	96d89cde51	Proxy error reworking (#6453 ) ## Problem Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and doing a bit less radical changes. smaller commits. We currently don't report error classifications in proxy as the current error handling made it hard to do so. ## Summary of changes 1. Add a `ReportableError` trait that all errors will implement. This provides the error classification functionality. 2. Handle Client requests a strongly typed error * this error is a `ReportableError` and is logged appropriately 3. The handle client error only has a few possible error types, to account for the fact that at this point errors should be returned to the user.	2024-02-09 15:50:51 +00:00
John Spray	89a5c654bf	control_plane: follow up for embedded migrations (#6647 ) ## Problem In https://github.com/neondatabase/neon/pull/6637, we remove the need to run migrations externally, but for compat tests to work we can't remove those invocations from the neon_local binary. Once that previous PR merges, we can make the followup changes without upsetting compat tests.	2024-02-09 14:26:50 +00:00
Heikki Linnakangas	5239cdc29f	Fix test_vm_bit_clear_on_heap_lock test The test was supposed to reproduce the bug fixed in commit `66fa176cc8`, i.e. that the clearing of the VM bit was not replayed in the pageserver on HEAP_LOCK records. But it was broken in many ways and failed to reproduce the original problem if you reverted the fix: - The comparison of XIDs was broken. The test read the XID in to a variable in python, but it was treated as a string rather than an integer. As a result, e.g. "999" > "1000". - The test accessed the locked tuple too early, in the loop. Accessing it early, before the pg_xact page had been removed, set the hint bits. That masked the problem on subsequent accesses. - The on-demand SLRU download that was introduced in commit `9a9d9beaee` hid the issue. Even though an SLRU segment was removed by Postgres, when it later tried to access it, it could still download it from the pageserver. To ensure that doesn't happen, shorten the GC period and compact and GC aggressively in the test. I also added a more direct check that the VM page is updated, using the get_page_at_lsn() debugging function. Right after locking the row, we now fetch the VM page from pageserver and directly compare it with the VM page in the page cache. They should match. That assertion is more robust to things like on-demand SLRU download that could mask the bug.	2024-02-09 15:56:41 +02:00
Heikki Linnakangas	84a0e7b022	tests: Allow setting shutdown mode separately from 'destroy' flag In neon_local, the default mode is now always 'fast', regardless of 'destroy'. You can override it with the "neon_local endpoint stop --mode=immediate" flag. In python tests, we still default to 'immediate' mode when using the stop_and_destroy() function, and 'fast' with plain stop(). I kept that to avoid changing behavior in existing tests. I don't think existing tests depend on it, but I wasn't 100% certain.	2024-02-09 15:56:41 +02:00
John Spray	8d98981fe5	tests: deflake test_sharding_split_unsharded (#6699 ) ## Problem This test was a subset of the larger sharding test, and it missed the validate() call on workload that was implicitly waiting for a tenant to become active before trying to split it. It could therefore fail to split due to tenant not yet being active. ## Summary of changes - Insert .validate() call, and move the Workload setup to after the check of shard ID (as the shard ID check should pass immediately)	2024-02-09 13:20:04 +00:00
Joonas Koivunen	eb919cab88	prepare to move timeouts and cancellation handling to remote_storage (#6696 ) This PR is preliminary cleanups and refactoring around `remote_storage` for next PR which will move the timeouts and cancellation into `remote_storage`. Summary: - smaller drive-by fixes - code simplification - refactor common parts like `DownloadError::is_permanent` - align error types with `RemoteStorage::list_*` to use more `download_retry` helper Cc: #6096	2024-02-09 12:52:58 +00:00
Anastasia Lubennikova	eec1e1a192	Pre-install anon extension from compute_ctl if anon is in shared_preload_libraries. Users cannot install it themselves, because superuser is required. GRANT all privileges needed to use it to db_owner. We use the neon fork of the extension, because a small change to the sql file is needed to allow db_owner to use it. This feature is behind a feature flag AnonExtension, so it is not enabled by default.	2024-02-09 12:32:07 +00:00
Conrad Ludgate	ea089dc977	proxy: add per query array mode flag (#6678 ) ## Problem Drizzle needs to be able to configure the array_mode flag per query. ## Summary of changes Adds an array_mode flag to the query data json that will otherwise default to the header flag.	2024-02-09 10:29:20 +00:00
John Spray	951c9bf4ca	control_plane: fix shard splitting on unsharded tenant (#6689 ) ## Problem Previous test started with a new-style TenantShardId with a non-zero ShardCount. We also need to handle the case of a ShardCount() (aka `unsharded`) parent shard. A followup PR will refactor ShardCount to make its inner value private and thereby make this kind of mistake harder. ## Summary of changes - Fix a place we were incorrectly treating a ShardCount as a number of shards rather than as a thing that can be zero or the number of shards. - Add a test for this case.	2024-02-09 10:12:40 +00:00
Heikki Linnakangas	568f91420a	tests: try to make restored-datadir comparison tests not flaky (#6666 ) This test occasionally fails with a difference in "pg_xact/0000" file between the local and restored datadirs. My hypothesis is that something changed in the database between the last explicit checkpoint and the shutdown. I suspect autovacuum, it could certainly create transactions. To fix, be more precise about the point in time that we compare. Shut down the endpoint first, then read the last LSN (i.e. the shutdown checkpoint's LSN), from the local disk with pg_controldata. And use exactly that LSN in the basebackup. Closes #559. I'm proposing this as an alternative to https://github.com/neondatabase/neon/pull/6662.	2024-02-09 11:34:15 +02:00
Joonas Koivunen	a18aa14754	test: shutdown endpoints before deletion (#6619 ) this avoids a page_service error in the log sometimes. keeping the endpoint running while deleting has no function for this test.	2024-02-09 09:01:07 +00:00
Konstantin Knizhnik	529a79d263	Increment generation when LFC is disabled by assigning 0 to neon.file_cache_size_limit (#6692 ) ## Problem test_lfc_resize sometimes failed with an assertion failure when acquiring the lock in a write operation: ``` if (lfc_ctl->generation == generation) { Assert(LFC_ENABLED()); ``` ## Summary of changes Increment generation when 0 is assigned to neon.file_cache_size_limit ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-09 08:14:41 +02:00
Joonas Koivunen	c09993396e	fix: secondary tenant relative order eviction (#6491 ) Calculate the `relative_last_activity` using the total evicted and resident layers similar to what we originally planned. Cc: #5331	2024-02-09 00:37:57 +02:00
Joonas Koivunen	9a31311990	fix(heavier_once_cell): assertion failure can be hit (#6652 ) @problame noticed that the `tokio::sync::AcquireError` branch assertion can be hit like in the first commit. We haven't seen this yet in production, but I'd prefer not to see it there. There `take_and_deinit` is being used, but this race must be quite timing sensitive.	2024-02-08 22:40:14 +02:00
Arpad Müller	c0e0fc8151	Update Rust to 1.76.0 (#6683 ) [Release notes](https://github.com/rust-lang/rust/releases/tag/1.75.0).	2024-02-08 19:57:02 +01:00
John Spray	e8d2843df6	storage controller: improved handling of node availability on restart (#6658 ) - Automatically set a node's availability to Active if it is responsive in startup_reconcile - Impose a 5s timeout of HTTP request to list location conf, so that an unresponsive node can't hang it for minutes - Do several retries if the request fails with a retryable error, to be tolerant of concurrent pageserver & storage controller restarts - Add a readiness hook for use with k8s so that we can tell when the startup reconciliation is done and the service is fully ready to do work. - Add /metrics to the list of un-authenticated endpoints (this is unrelated but we're touching the line in this PR already, and it fixes auth error spam in deployed container.) - A test for the above. Closes: #6670	2024-02-08 18:00:53 +00:00
John Spray	af91a28936	pageserver: shard splitting (#6379 ) ## Problem One doesn't know at tenant creation time how large the tenant will grow. We need to be able to dynamically adjust the shard count at runtime. This is implemented as "splitting" of shards into smaller child shards, which cover a subset of the keyspace that the parent covered. Refer to RFC: https://github.com/neondatabase/neon/pull/6358 Part of epic: #6278 ## Summary of changes This PR implements the happy path (does not cleanly recover from a crash mid-split, although won't lose any data), without any optimizations (e.g. child shards re-download their own copies of layers that the parent shard already had on local disk) - Add `/v1/tenant/:tenant_shard_id/shard_split` API to pageserver: this copies the shard's index to the child shards' paths, instantiates child `Tenant` object, and tears down parent `Tenant` object. - Add `splitting` column to `tenant_shards` table. This is written into an existing migration because we haven't deployed yet, so don't need to cleanly upgrade. - Add `/control/v1/tenant/:tenant_id/shard_split` API to attachment_service, - Add `test_sharding_split_smoke` test. This covers the happy path: future PRs will add tests that exercise failure cases.	2024-02-08 15:35:13 +00:00
Konstantin Knizhnik	43eae17f0d	Drop unused replication slots (#6655 ) ## Problem See #6626 If there is an inactive replication slot then Postgres will not be able to shrink WAL and delete unused snapshots. If some other active subscription is present, then snapshots created every 15 seconds will overflow AUX_DIR. Setting `max_slot_wal_keep_size` doesn't solve the problem, because even a small WAL segment will be enough to overflow AUX_DIR if there is no other activity on the system. ## Summary of changes If there are active subscriptions and some logical replication slots are not used during the `neon.logical_replication_max_time_lag` interval, then the unused slots are dropped. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-08 17:31:15 +02:00
Anna Khanova	6c34d4cd14	Proxy: set timeout on establishing connection (#6679 ) ## Problem There is no timeout on the handshake. ## Summary of changes Set the timeout on the establishing connection.	2024-02-08 13:52:04 +00:00
Anna Khanova	c63e3e7e84	Proxy: improve http-pool (#6577 ) ## Problem The password check logic for the sql-over-http is a bit non-intuitive. ## Summary of changes 1. Perform scram auth using the same logic as for websocket cleartext password. 2. Split establish connection logic and connection pool. 3. Parallelize param parsing logic with authentication + wake compute. 4. Limit the total number of clients	2024-02-08 12:57:05 +01:00
Christian Schwarz	c52495774d	tokio-epoll-uring: expose its metrics in pageserver's `/metrics` (#6672 ) context: https://github.com/neondatabase/neon/issues/6667	2024-02-07 23:58:54 +00:00
Andreas Scherbaum	9a017778a9	Update copyright notice, set it to current year (#6671 ) ## Problem Copyright notice is outdated ## Summary of changes Replace the initial year `2022` with `2022 - 2024`, after brief discussion with Stas about the format Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-02-08 00:48:31 +01:00
Christian Schwarz	c561ad4e2e	feat: expose locked memory in pageserver `/metrics` (#6669 ) context: https://github.com/neondatabase/neon/issues/6667	2024-02-07 19:39:52 +00:00
John Spray	3bd2a4fd56	control_plane: avoid feedback loop with /location_config if compute hook fails. (#6668 ) ## Problem The existing behavior isn't exactly incorrect, but is operationally risky: if the control plane compute hook breaks, then all the control plane operations trying to call /location_config will end up retrying forever, which could put more load on the system. ## Summary of changes - Treat 404s as fatal errors to do fewer retries: a 404 either indicates we have the wrong URL, or some control plane bug is failing to recognize our tenant ID as existing. - Do not return an error on reconciliation errors in a non-creating /location_config response: this allows the control plane to finish its Operation (and we will eventually retry the compute notification later)	2024-02-07 19:14:18 +00:00
Tristan Partin	128fae7054	Update Postgres 16 to 16.2	2024-02-07 11:10:48 -08:00
Tristan Partin	5541244dc4	Update Postgres 15 to 15.6	2024-02-07 11:10:48 -08:00
Tristan Partin	2e9b1f7aaf	Update Postgres 14 to 14.11	2024-02-07 11:10:48 -08:00
Christian Schwarz	51f9385b1b	live-reconfigurable virtual_file::IoEngine (#6552 ) This PR adds an API to live-reconfigure the VirtualFile io engine. It also adds a flag to `pagebench get-page-latest-lsn`, which is where I found this functionality to be useful: it helps compare the io engines in a benchmark without re-compiling a release build, which took ~50s on the i3en.3xlarge where I was doing the benchmark. Switching the IO engine is completely safe at runtime.	2024-02-07 17:47:55 +00:00
Sasha Krassovsky	7b49e5e5c3	Remove compute migrations feature flag (#6653 )	2024-02-07 07:55:55 -09:00
Abhijeet Patil	75f1a01d4a	Optimise e2e run (#6513 ) ## Problem We have finite amount of runners and intermediate results are often wanted before a PR is ready for merging. Currently all PRs get e2e tests run and this creates a lot of throwaway e2e results which may or may not get to start or complete before a new push. ## Summary of changes 1. Skip e2e test when PR is in draft mode 2. Run e2e when PR status changes from draft to ready for review (change this to having its trigger in below PR and update results of build and test) 3. Abstract e2e test in a Separate workflow and call it from the main workflow for the e2e test 5. Add a label, if that label is present run e2e test in draft (run-e2e-test-in-draft) 6. Auto add a label(approve to ci) so that all the external contributors PR , e2e run in draft 7. Document the new label changes and the above behaviour Draft PR : https://github.com/neondatabase/neon/actions/runs/7729128470 Ready To Review : https://github.com/neondatabase/neon/actions/runs/7733779916 Draft PR with label : https://github.com/neondatabase/neon/actions/runs/7725691012/job/21062432342 and https://github.com/neondatabase/neon/actions/runs/7733854028 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-02-07 16:14:10 +00:00
John Spray	090a789408	storage controller: use PUT instead of POST (#6659 ) This was a typo, the server expects PUT.	2024-02-07 13:24:10 +00:00
John Spray	3d4fe205ba	control_plane/attachment_service: database connection pool (#6622 ) ## Problem This is mainly to limit our concurrency, rather than to speed up requests (I was doing some sanity checks on performance of the service with thousands of shards) ## Summary of changes - Enable the `diesel:r2d2` feature, which provides an async connection pool - Acquire a connection before entering spawn_blocking for a database transaction (recall that diesel's interface is sync) - Set a connection pool size of 99 to fit within default postgres limit (100) - Also set the tokio blocking thread count to accommodate the same number of blocking tasks (the only thing we use spawn_blocking for is database calls).	2024-02-07 13:08:09 +00:00
Arpad Müller	f7516df6c1	Pass timestamp as a datetime (#6656 ) This saves some repetition. I did this in #6533 for `tenant_time_travel_remote_storage` already.	2024-02-07 12:56:53 +01:00
Konstantin Knizhnik	f3d7d23805	Some small WAL records can write a lot of data to KV storage, so perform checkpoint check more frequently (#6639 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1707149618314539?thread_ts=1707081520.140049&cid=C04DGM6SMTM ## Summary of changes Perform checkpoint check after processing `ingest_batch_size` (default 100) WAL records. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-07 08:47:19 +02:00
Alexander Bayandin	9f75da7c0a	test_lazy_startup: fix statement_timeout setting (#6654 ) ## Problem Test `test_lazy_startup` is flaky[0], sometimes (pretty frequently) it fails with `canceling statement due to statement timeout`. - [0] https://neon-github-public-dev.s3.amazonaws.com/reports/main/7803316870/index.html#suites/355b1a7a5b1e740b23ea53728913b4fa/7263782d30986c50/history ## Summary of changes - Fix setting `statement_timeout` setting by reusing a connection for all queries. - Also fix label (`lazy`, `eager`) assignment - Split `test_lazy_startup` into two, by `slru` laziness and make tests smaller	2024-02-07 00:31:26 +00:00
Alexander Bayandin	f4cc7cae14	CI(build-tools): Update Python from 3.9.2 to 3.9.18 (#6615 ) ## Problem We use an outdated version of Python (3.9.2) ## Summary of changes - Update Python to the latest patch version (3.9.18) - Unify the usage of python caches where possible	2024-02-06 20:30:43 +00:00
John Spray	4f57dc6cc6	control_plane/attachment_service: take public key as value (#6651 ) It's awkward to point to a file when doing some kinds of ad-hoc deployment (like right now, when I'm hacking a helm chart having not quite hooked up secrets properly yet). We take all the rest of the secrets as CLI args directly, so let's do the same for public key.	2024-02-06 19:08:39 +00:00
Heikki Linnakangas	dc811d1923	Add a span to 'create_neon_superuser' for better OpenTelemetry traces (#6644 ) create_neon_superuser runs the first queries in the database after cold start. Traces suggest that those first queries can make up a significant fraction of the cold start time. Make it more visible by adding an explicit tracing span to it; currently you just have to deduce it by looking at the time spent in the parent 'apply_config' span subtracted by all the other child spans.	2024-02-06 20:37:35 +02:00
Alexander Bayandin	e65f0fe874	CI(benchmarks): make job split consistent across reruns (#6614 ) ## Problem We've got several issues with the current `benchmarks` job setup: - `benchmark_durations.json` file (that we generate in runtime to split tests into several jobs[0]) is not consistent between these jobs (and very not consistent with the file if we rerun the job). I.e. test selection for each job can be different, which could end up in missed tests in a test run. - `scripts/benchmark_durations` doesn't fetch all tests from the database (it doesn't expect any extra directories inside `test_runner/performance`) - For some reason, currently split into 4 groups ends up with the 4th group has no tests to run, which fails the job[1] - [0] https://github.com/neondatabase/neon/pull/4683 - [1] https://github.com/neondatabase/neon/issues/6629 ## Summary of changes - Generate `benchmark_durations.json` file once before we start `benchmarks` jobs (this makes it consistent across the jobs) and pass the file content through the GitHub Actions input (this makes it consistent for reruns) - `scripts/benchmark_durations` fix SQL query for getting all required tests - Split benchmarks into 5 jobs instead of 4 jobs.	2024-02-06 17:00:55 +00:00
Joonas Koivunen	bb92721168	build: migrate check-style-rust to small runners (#6588 ) We have more small runners than large runners, and often a shortage of large runners. Migrate `check-style-rust` to run on small runners.	2024-02-06 15:53:04 +00:00
Christian Schwarz	d7b29aace7	refactor(walredo): don't create WalRedoManager for broken tenants (#6597 ) When we'll later introduce a global pool of pre-spawned walredo processes (https://github.com/neondatabase/neon/issues/6581), this refactoring avoids plumbing through the reference to the pool to all the places where we create a broken tenant. Builds atop the refactoring in #6583	2024-02-06 16:20:02 +01:00
Christian Schwarz	53a3ed0a7e	debug_assert presence of `shard_id` tracing field (#6572 ) also: fixes https://github.com/neondatabase/neon/issues/6638	2024-02-06 14:43:33 +00:00
dependabot[bot]	27a3c9ecbe	build(deps): bump cryptography from 41.0.6 to 42.0.0 (#6643 )	2024-02-06 13:15:07 +00:00
John Spray	6297843317	tests: flakiness fixes in pageserver tests (#6632 ) Fix several test flakes: - test_sharding_service_smoke had log failures on "Dropped LSN updates" - test_emergency_mode had log failures on a deletion queue shutdown check, where the check was incorrect because it was expecting channel receiver to stay alive after cancellation token was fired. - test_secondary_mode_eviction had racing heatmap uploads because the test was using a live migration hook to set up locations, where that migration was itself uploading heatmaps and generally making the situation more complex than it needed to be. These are the failure modes that I saw when spot checking the last few failures of each test. This will mostly/completely address #6511, but I'll leave that ticket open for a couple days and then check if either of the tests named in that ticket are flaky. Related #6511	2024-02-06 12:49:41 +00:00
Vadim Kharitonov	dae56ef60c	Do not suspend compute if there is an active logical replication subscription. (#6570 ) ## Problem the idea is to keep compute up and running if there are any active logical replication subscriptions. ### Rationale Rationale: - The Write-Ahead Logging (WAL) files, which contain the data changes, will need to be retained on the publisher side until the subscriber is able to connect again and apply these changes. This could potentially lead to increased disk usage on the publisher - and we do not want to disrupt the source - I think it is more pain for our customer to resolve storage issues on the source than to pay for the compute at the target. - Upon resuming the compute resources, the subscriber will start consuming and applying the changes from the retained WAL files. The time taken to catch up will depend on the volume of changes and the configured vCPUs. we can avoid explaining complex situations where we lag behind (in extreme cases we could lag behind hours, days or even months) - I think an important use case for logical replication from a source is a one-time migration or release upgrade. In this case the customer would not mind if we are not suspended for the duration of the migration. We need to document this in the release notes and the documentation in the context of logical replication where Neon is the target (subscriber) ### See internal discussion here https://neondb.slack.com/archives/C04DGM6SMTM/p1706793400746539?thread_ts=1706792628.701279&cid=C04DGM6SMTM	2024-02-06 12:15:42 +00:00
Christian Schwarz	0de46fd6f2	heavier_once_cell: switch to tokio::sync::RwLock (#6589 ) Using the RwLock reduces contention on the hot path. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-06 14:04:15 +02:00
Joonas Koivunen	53743991de	uploader: avoid cloning vecs just to get Bytes (#6645 ) Fix cloning the serialized heatmap on every attempt by just turning it into `bytes::Bytes` before clone so it will be a refcounted instead of refcounting a vec clone later on. Also fixes one cancellation token cloning I had missed in #6618. Cc: #6096	2024-02-06 11:34:13 +00:00
John Spray	431f4234d4	storage controller: embed database migrations in binary (#6637 ) ## Problem We don't have a neat way to carry around migration .sql files during deploy, and in any case would prefer to avoid depending on diesel CLI to deploy. ## Summary of changes - Use `diesel_migrations` crate to embed migrations in our binary - Run migrations on startup - Drop the diesel dependency in the `neon_local` binary, as the attachment_service binary just needs the database to exist. Do database creation with a simple `createdb`. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-06 10:07:10 +00:00
Christian Schwarz	edcde05c1c	refactor(walredo): split up the massive `walredo.rs` (#6583 ) Part of https://github.com/neondatabase/neon/issues/6581	2024-02-06 09:44:49 +00:00
Christian Schwarz	e196d974cc	pagebench: actually implement `--num_clients` (#6640 ) Will need this to validate per-tenant throttling in https://github.com/neondatabase/neon/issues/5899	2024-02-06 10:34:16 +01:00
Joonas Koivunen	947165788d	refactor: needless cancellation token cloning (#6618 ) The solution we ended up with for `backoff::retry` always requires cloning cancellation tokens even though there is just `.await`. Fix that, and also turn the return type into `Option<Result<T, E>>`, avoiding the need for the `E::cancelled()` fn passed in. Cc: #6096	2024-02-06 09:39:06 +02:00
John Spray	8e114bd610	control_plane/attachment_service: make --database-url optional (#6636 ) ## Problem This change was left out of #6585 accidentally -- just forgot to push the very last version of my branch. Now that we can load database url from Secrets Manager, we don't always need it on the CLI any more. We should let the user omit it instead of passing `--database-url ""` ## Summary of changes - Make `--database-url` optional	2024-02-05 20:31:55 +01:00
John Spray	cb7c89332f	control_plane: fix tenant GET, clean up endpoints (#6553 ) Cleanups from https://github.com/neondatabase/neon/pull/6394 - There was a rogue `*` breaking the `GET /tenant/:tenant_id`, which passes through to shard zero - There was a duplicate migrate endpoint - There are un-prefixed API endpoints that were only needed for compat tests and can now be removed.	2024-02-05 14:29:05 +00:00
Conrad Ludgate	74c5e3d9b8	use string interner for project cache (#6578 ) ## Problem Running some memory profiling with high concurrent request rate shows seemingly some memory fragmentation. ## Summary of changes Eventually, we will want to separate global memory (caches) from local memory (per connection handshake and per passthrough). Using a string interner for project info cache helps reduce some of the fragmentation of the global cache by having a single heap dedicated to project strings, and not scattering them throughout all requests. At the same time, the interned key is 4 bytes vs the 24 bytes that `SmolStr` offers. Important: we should only store verified strings in the interner because there's no way to remove them afterwards. Good for caching responses from console.	2024-02-05 14:27:25 +00:00
Joonas Koivunen	5e8deca268	metrics: remove broken tenants (#6586 ) Before tenant migration it made sense to leak broken tenants in the metrics until restart. Nowadays it makes less sense because on cancellations we set the tenant broken. The set metric still allows filterable alerting. Fixes: #6507	2024-02-05 14:49:35 +02:00
Joonas Koivunen	db89b13aaa	fix: use the shared constant download buffer size (#6620 ) Noticed that we had forgotten to use `remote_timeline_client.rs::BUFFER_SIZE` in one instance.	2024-02-05 13:10:08 +01:00
Abhijeet Patil	01c57ec547	Removed Uploading of perf result to git repo 'zenith-perf-data' (#6590 ) ## Problem We were archiving the perf benchmarks to - neon DB - git repo `zenith-perf-data` As the perf batches ran in parallel, uploading the results to the `zenith-perf-data` git repo resulted in merge conflicts, which made the run flaky and, as a side effect, the build started failing. The problem is described in https://github.com/neondatabase/neon/issues/5160 ## Summary of changes As the results in the git repo were not used, uploading them was redundant, hence this PR removes the uploading of perf results to the git repo. The shell script `generate_and_push_perf_report.sh` was using a py script [git-upload](https://github.com/neondatabase/neon/compare/remove-perf-benchmark-git-upload?expand=1#diff-c6d938e7f060e487367d9dc8055245c82b51a73c1f97956111a495a8a86e9a33) and [scripts/generate_perf_report_page.py](https://github.com/neondatabase/neon/pull/6590/files#diff-81af2147e72d07e4cf8ee4395632596d805d6168ba75c71cab58db2659956ef8) which are not used anywhere else in the repo, hence also cleaning those up ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat the commit message to not include the above checklist	2024-02-05 10:08:20 +00:00
Arpad Müller	56cf360439	Don't preserve temp files on creation errors of delta layers (#6612 ) There is currently no cleanup done after a delta layer creation error, so delta layers can accumulate. The problem gets worse as the operation gets retried and delta layers accumulate on the disk. Therefore, delete them from disk (if something has been written to disk).	2024-02-05 09:53:37 +00:00
Heikki Linnakangas	df7bee7cfa	Fix compilation with recent glibc headers with close_range(2). I was getting an error: /home/heikki/git-sandbox/neon//pgxn/neon_walredo/walredoproc.c:161:5: error: conflicting types for ‘close_range’; have ‘int(unsigned int, unsigned int, unsigned int)’ 161 \| int close_range(unsigned int start_fd, unsigned int count, unsigned int flags) { \| ^~~~~~~~~~~ In file included from /usr/include/x86_64-linux-gnu/bits/sigstksz.h:24, from /usr/include/signal.h:328, from /home/heikki/git-sandbox/neon//pgxn/neon_walredo/walredoproc.c:50: /usr/include/unistd.h:1208:12: note: previous declaration of ‘close_range’ with type ‘int(unsigned int, unsigned int, int)’ 1208 \| extern int close_range (unsigned int __fd, unsigned int __max_fd, \| ^~~~~~~~~~~ The discrepancy is in the 3rd argument. Apparently in the glibc wrapper it's signed. As a quick fix, rename our close_range() function, the one that calls syscall() directly, to avoid the clash with the glibc wrapper. In the long term, an autoconf test would be nice, and some equivalent on macOS, see issue #6580.	2024-02-05 11:50:45 +02:00
Joonas Koivunen	70f646ffe2	More logging fixes (#6584 ) I was on-call this week, these would have helped me understand the system more quickly: - move stray attaching start logging inside the span it starts, add generation - log ancestor timeline_id or bootstrapping in the beginning of timeline creation	2024-02-05 09:34:03 +02:00
Vadim Kharitonov	7e8529bec1	Revert "Update pgvector to v0.6.0, third attempt" (#6610 ) The issue is still unsolved because of shmem size in VMs. Need to figure it out before applying this patch. For more details: ``` ERROR: could not resize shared memory segment "/PostgreSQL.2892504480" to 16774205952 bytes: No space left on device ``` As an example, the same issue in community pgvector/pgvector#453.	2024-02-04 22:27:07 +00:00
Clarence	09519c1773	chore: update wording in docs to improve readability (#6607 ) ## Problem Found typos while reading the docs ## Summary of changes Fixed the typos found	2024-02-04 19:33:38 +00:00
Joonas Koivunen	9dd69194d4	refactor(proxy): std::io::Write for BytesMut exists (#6606 ) Replace TODO with an existing implementation via `BufMut::writer`.	2024-02-03 22:15:59 +00:00
Heikki Linnakangas	647b85fc15	Update pgvector to v0.6.0, third attempt This includes a compatibility patch that is needed because pgvector now skips WAL-logging during the index build, and WAL-logs the index only in one go at the end. That's how GIN, GiST and SP-GIST index builds work in core PostgreSQL too, but we need some Neon-specific calls to mark the beginning and end of those build phases. pgvector is the first index AM that does that with parallel workers, so I had to modify those functions in the Neon extension to be aware of parallel workers. Only the leader needs to create the underlying file and perform the WAL-logging. (In principle, the parallel workers could participate in the WAL-logging too, but pgvector doesn't do that. This will need some further work if that changes). The previous attempt at this (#6592) missed that parallel workers needed those changes, and segfaulted in parallel build that spilled to disk. Testing ------- We don't have a place for regression tests of extensions at the moment. I tested this manually with the following script: ``` CREATE EXTENSION IF NOT EXISTS vector; DROP TABLE IF EXISTS tst; CREATE TABLE tst (i serial, v vector(3)); INSERT INTO tst (v) SELECT ARRAY[random(), random(), random()] FROM generate_series(1, 15000) g; -- Serial build, in memory ALTER TABLE tst SET (parallel_workers=0); SET maintenance_work_mem='50 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); -- Test that the index works. (The table contents are random, and the -- search is approximate anyway, so we cannot check the exact values. 
-- For now, just eyeball that they look reasonable) set enable_seqscan=off; explain SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Serial build, spills to on disk ALTER TABLE tst SET (parallel_workers=0); SET maintenance_work_mem='5 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Parallel build, in memory ALTER TABLE tst SET (parallel_workers=4); SET maintenance_work_mem='50 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Parallel build, spills to disk ALTER TABLE tst SET (parallel_workers=4); SET maintenance_work_mem='5 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; ```	2024-02-03 09:19:37 +02:00
Heikki Linnakangas	c96aead502	Reorganize .dockerignore Author: Alexander Bayandin <alexander@neon.tech>	2024-02-03 09:19:37 +02:00
Arpad Müller	aac8eb2c36	Minor logging improvements (#6593 ) * log when `lsn_by_timestamp` finished together with its result * add back logging of the layer name as suggested in https://github.com/neondatabase/neon/pull/6549#discussion_r1475756808	2024-02-03 02:16:20 +01:00
Clarence	3d1b08496a	Update words in docs for better readability (#6600 ) ## Problem Found typos while reading the docs ## Summary of changes Fixed the typos found	2024-02-03 00:59:39 +00:00
Arpad Müller	0ac2606c8a	S3 restore test: Use a workaround to enable moto's self-copy support (#6594 ) While working on https://github.com/getmoto/moto/pull/7303 I discovered that if you enable bucket encryption, moto allows self-copies. So we can un-ignore the test. I tried it out locally, it works great. Followup of #6533, part of https://github.com/neondatabase/cloud/issues/8233	2024-02-02 23:45:57 +01:00
Em Sharnoff	d820d64e38	Bump vm-builder v0.21.0 -> v0.23.2 (#6480 ) Relevant changes were all from v0.23.0: - neondatabase/autoscaling#724 - neondatabase/autoscaling#726 - neondatabase/autoscaling#732 Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-02-02 22:39:20 +00:00
Arthur Petukhovsky	f2aa96f003	Console split RFC (#1997 ) [Rendered](https://github.com/neondatabase/neon/blob/rfc-console-split/docs/rfcs/017-console-split.md) Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>	2024-02-02 23:41:55 +02:00
Sasha Krassovsky	2fd8e24c8f	Switch sleeps to wait_until (#6575 ) ## Problem I didn't know about `wait_until` and was relying on `sleep` to wait for stuff. This caused some tests to be flaky. https://github.com/neondatabase/neon/issues/6561 ## Summary of changes Switch to `wait_until`, this should make the tests less flaky	2024-02-02 21:32:40 +00:00
Heikki Linnakangas	c9876b0993	Fix double-free bug in walredo process. (#6534 ) At the end of ApplyRecord(), we called pfree on the decoded record, if it was "oversized". However, we had already linked it to the "decode queue" list in XLogReaderState. If we later called XLogBeginRead(), it called ResetDecoder and tried to free the same record again. The conditions to hit this are: - a large WAL record (larger than about 64 kB I think, per DEFAULT_DECODE_BUFFER_SIZE), and - another WAL record processed by the same WAL redo process after the large one. I think the reason we haven't seen this earlier is that you don't get WAL records that large that are sent to the WAL redo process, except when logical replication is enabled. Logical replication adds data to the WAL records, making them larger. To fix, allocate the buffer ourselves, and don't link it to the decode queue. Alternatively, we could perhaps have just removed the pfree(), but frankly I'm a bit scared about the whole queue thing.	2024-02-02 21:49:11 +02:00
John Spray	786e9cf75b	control_plane: implement HTTP compute hook for attachment service (#6471 ) ## Problem When we change which physical pageservers a tenant is attached to, we must update the control plane so that it can update computes. This will be done via an HTTP hook, as described in https://www.notion.so/neondatabase/Sharding-Service-Control-Plane-interface-6de56dd310a043bfa5c2f5564fa98365#1fe185a35d6d41f0a54279ac1a41bc94 ## Summary of changes - Optional CLI args `--control-plane-jwt-token` and `-compute-hook-url` are added. If these are set, then we will use this HTTP endpoint, instead of trying to use neon_local LocalEnv to update compute configuration. - Implement an HTTP-driven version of ComputeHook that calls into the configured URL - Notify for all tenants on startup, to ensure that we don't miss notifications if we crash partway through a change, and carry a `pending_compute_notification` flag at runtime to allow notifications to fail without risking never sending the update. - Add a test for all this One might wonder: why not do a "forever" retry for compute hook notifications, rather than carrying a flag on the shard to call reconcile() again later. The reason is that we will later limit concurrency of reconciles, when dealing with larger numbers of shards, and if reconcile is stuck waiting for the control plane to accept a notification request, it could jam up the whole system and prevent us making other changes. Anyway: from the perspective of the outside world, we _do_ retry forever, but we don't retry forever within a given Reconciler lifetime. The `pending_compute_notification` logic is predicated on later adding a background task that just calls `Service::reconcile_all` on a schedule to make sure that anything+everything that can fail a Reconciler::reconcile call will eventually be retried.	2024-02-02 19:22:03 +00:00
Vadim Kharitonov	0b91edb943	Revert pgvector 0.6.0 (#6592 ) It doesn't work in our VMs. Need more time to investigate	2024-02-02 18:36:31 +00:00
John Spray	2e5eab69c6	tests: remove test_gc_cutoff (#6587 ) This test became flaky when postgres retry handling was fixed to use backoff delays -- each iteration in this test's loop was taking much longer because pgbench doesn't fail until postgres has given up on retrying to the pageserver. We are just removing it, because the condition it tests is no longer risky: we reload all metadata from remote storage on restart, so crashing directly between making local changes and doing remote uploads isn't interesting any more. Closes: https://github.com/neondatabase/neon/issues/2856 Closes: https://github.com/neondatabase/neon/issues/5329	2024-02-02 18:20:18 +00:00
Joonas Koivunen	caf868e274	test: assert we eventually free space (#6536 ) in `test_statvfs_pressure_{usage,min_avail_bytes}` we now race against initial logical size calculation on-demand downloading the layers. first wait out the initial logical sizes, then change the final asserts to be "eventual", which is not great but it is faster than failing and retrying. this issue seems to happen only in debug mode tests. Fixes: #6510	2024-02-02 19:46:47 +02:00
John Spray	7e2436695d	storage controller: use AWS Secrets Manager for database URL, etc (#6585 ) ## Problem Passing secrets in via CLI/environment is awkward when using helm for deployment, and not ideal for security (secrets may show up in ps, /proc). We can bypass these issues by simply connecting directly to the AWS Secrets Manager service at runtime. ## Summary of changes - Add dependency on aws-sdk-secretsmanager - Update other aws dependencies to latest, to match transitive dependency versions - Add `Secrets` type in attachment service, using AWS SDK to load if secrets are not provided on the command line.	2024-02-02 16:57:11 +00:00
Conrad Ludgate	6506fd14c4	proxy: more refactors (#6526 ) ## Problem not really any problem, just some drive-by changes ## Summary of changes 1. move wake compute 2. move json processing 3. move handle_try_wake 4. move test backend to api provider 5. reduce wake-compute concerns 6. remove duplicate wake-compute loop	2024-02-02 16:07:35 +00:00
John Spray	46fb1a90ce	pageserver: avoid calculating/sending logical sizes on shard !=0 (#6567 ) ## Problem Sharded tenants only maintain accurate relation sizes on shard 0. Therefore logical size can only be calculated on shard 0. Fortunately it is also only _needed_ on shard 0, to provide Safekeeper feedback and to send consumption metrics. Closes: #6307 ## Summary of changes - Send 0 for logical size to safekeepers on shards !=0 - Skip logical size warmup task on shards !=0 - Skip imitate_layer_accesses on shards !=0	2024-02-02 15:52:03 +00:00
John Spray	56171cbe8c	pageserver: more permissive activation timeout when testing (#6564 ) ## Problem The 5 second activation timeout is appropriate for production environments, where we want to give a prompt response to the cloud control plane, and if we fail it will retry the call. In tests however, we don't want every call to e.g. timeline create to have to come with a retry wrapper. This issue has always been there, but it is more apparent in sharding tests that concurrently attach several tenant shards. Closes: https://github.com/neondatabase/neon/issues/6563 ## Summary of changes When `testing` feature is enabled, make `ACTIVE_TENANT_TIMEOUT` 30 seconds instead of 5 seconds.	2024-02-02 15:14:42 +01:00
Arpad Müller	48b05b7c50	Add a time_travel_remote_storage http endpoint (#6533 ) Adds an endpoint to the pageserver to S3-recover an entire tenant to a specific given timestamp. Required input parameters: * `travel_to`: the target timestamp to recover the S3 state to * `done_if_after`: a timestamp that marks the beginning of the recovery process. retries of the query should keep this value constant. it must be after `travel_to`, and also after any changes we want to revert, and must represent a point in time before the endpoint is being called, all of these time points in terms of the time source used by S3. these criteria need to hold even in the face of clock differences, so I recommend waiting a specific amount of time, then taking `done_if_after`, then waiting some amount of time again, and only then issuing the request. Also important to note: the timestamps in S3 work at second accuracy, so one needs to add generous waits before and after for the process to work smoothly (at least 2-3 seconds). We ignore the added test for the mocked S3 for now due to a limitation in moto: https://github.com/getmoto/moto/issues/7300 . Part of https://github.com/neondatabase/cloud/issues/8233	2024-02-02 14:52:12 +01:00
Conrad Ludgate	0856fe6676	proxy: remove per client bytes (#5466 ) ## Problem Follow up to #5461 In my memory usage/fragmentation measurements, these metrics came up as a large source of small allocations. The replacement metric has been in use for a long time now so I think it's good to finally remove this. Per-endpoint data is still tracked elsewhere ## Summary of changes remove the per-client bytes metrics	2024-02-02 12:28:48 +00:00
Alexander Bayandin	4133d14a77	Compute: pgbouncer 1.22.0 (#6582 ) ## Problem Update pgbouncer from 1.21 (and patches[0][1]) to 1.22 (which includes these patches) - [0] https://github.com/pgbouncer/pgbouncer/pull/972 - [1] https://github.com/pgbouncer/pgbouncer/pull/998 ## Summary of changes - Build pgbouncer 1.22.0 for neonVMs from upstream	2024-02-02 11:49:11 +00:00
Alexander Bayandin	30c9e145d7	check-macos-build: switch job to macos-14 (M1) (#6539 ) ## Problem - GitHub made available `macos-14` runners, and they run on M1 processors[0] - The price is the same as Intel-based runners — "macOS \| 3 or 4 (M1 or Intel) \| $0.08"[1], but runners on Apple Silicon should be significantly faster than their Intel counterparts. - Most developers who use macOS use Apple Silicon-based Macs nowadays. - [0] https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/ - [1] https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates ## Summary of changes - Run `check-macos-build` on `macos-14`	2024-02-02 10:51:20 +00:00
John Spray	24e916d37f	pageserver: fix a syntax error in swagger (#6566 ) A description was written as a follow-on to a section line, rather than in the proper `description:` part. This caused swagger parsers to rightly reject it.	2024-02-02 10:35:09 +00:00
Andreas Scherbaum	23f58145ed	Update wording for better readability (#6559 ) Update wording, add spaces in commandline arguments Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-02-02 11:22:32 +01:00
Heikki Linnakangas	350865392c	Print checkpoint key contents with "pagectl print-layer-file" (#6541 ) This was very useful in debugging the bugs fixed in #6410 and #6502. There's a lot more we could do. This only adds the printing to delta layers, not image layers, for example, and it might be useful to print details of more record types. But this is a good start.	2024-02-02 01:35:31 +02:00
Christian Schwarz	1be5e564ce	feat(walredo): use posix_spawn by moving close_fds() work to walredo C code (#6574 ) The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually using `gdb` that we're now forking out the walredo process using `posix_spawn`. refs https://github.com/neondatabase/neon/issues/6565	2024-02-01 22:38:34 +01:00
Christian Schwarz	7a70ef991f	feat(walredo): various observability improvements (#6573 ) - log when we start walredo process - include tenant shard id in walredo argv - dump some basic walredo state in tenant details api - more suitable walredo process launch histogram buckets - avoid duplicate tracing labels in walredo launch spans	2024-02-01 21:59:40 +01:00
Sasha Krassovsky	be30388901	Add retry to fetching basebackup (#6537 ) ## Problem Currently we have no retry mechanism for fetching basebackup. If there's an unstable connection, starting compute will just fail. ## Summary of changes Adds an exponential backoff with 7 retries to get the basebackup.	2024-02-01 20:50:04 +00:00
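The retry scheme named in the commit above (exponential backoff with 7 retries) can be sketched roughly as follows. This is an illustrative, synchronous sketch and not the code from the PR: `backoff_delay`, `retry_with_backoff`, the base delay, and the delay cap are all hypothetical names and values chosen for the example.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based),
/// computed as base * 2^attempt and capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Hypothetical helper: runs `op` once, then retries it up to `max_retries`
/// times, sleeping with exponentially growing delays between attempts.
/// Returns the first success, or the last error once retries are exhausted.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // All retries used up: surface the last error to the caller.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 7`, the operation runs at most 8 times (one initial attempt plus 7 retries). The cap keeps later delays bounded so a flaky connection does not stall startup indefinitely.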
Heikki Linnakangas	3525080031	Fix pgvector 0.6.0 with Neon. (#6571 ) The previous patch was broken. rd_smgr was not open yet; we need to use RelationGetSmgr() to access it.	2024-02-01 20:48:31 +00:00
Arpad Müller	527cdbc010	Don't require AWS access keys for S3 pytests (#6556 ) Don't require AWS access keys (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) for S3 usage in the pytests, and also allow AWS_PROFILE to be passed. One of the two methods is required however. This allows local development like: ``` aws sso login --profile dev export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty REMOTE_STORAGE_S3_REGION=eu-central-1 REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests AWS_PROFILE=dev cargo build_testing && RUST_BACKTRACE=1 ./scripts/pytest -k debug-pg16 test_runner/regress/test_tenant_delete.py::test_tenant_delete_smoke ``` related earlier PR for the cargo unit tests of the `remote_storage` crate: #6202 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-02-01 20:18:07 +00:00
Alexander Bayandin	39be2b0108	Makefile: set PQ_LIB_DIR to avoid linkage with system libpq (#6538 ) ## Problem Initially spotted on macOS. When building `attachment_service`, it might get linked with system `libpq`: ``` $ otool -L target/debug/attachment_service target/debug/attachment_service: /opt/homebrew/opt/libpq/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.16.0) /System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 61040.61.1) /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 2202.0.0) /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1) ``` After this PR: ``` $ otool -L target/debug/attachment_service target/debug/attachment_service: /Users/bayandin/work/neon/pg_install/v16/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.16.0) /System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 61040.61.1) /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 2202.0.0) /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1) ``` ## Summary of changes - Set `PQ_LIB_DIR` to bundled Postgres 16 lib dir	2024-02-01 17:34:48 +00:00
Alexander Bayandin	fa52cd575e	Remove old tests results and old coverage collection (#6376 ) ## Problem We have switched to new test results and new coverage results, so no need to collect these data in old formats. ## Summary of changes - Remove "Upload coverage report" for old coverage report - Remove "Store Allure test stat in the DB" for old test results format	2024-02-01 13:36:55 +00:00
Vlad Lazar	d2c410c748	pageserver_api: remove overlaps from KeySpace (#6544 ) This commit adds a function to `KeySpace` which updates a key space by removing all overlaps with a second key space. This can involve splitting or removing of existing ranges. The implementation is not particularly efficient: O(M * N * log(N)) where N is the number of ranges in the current key space and M is the number of ranges in the key space we are checking against. In practice, this shouldn't matter much since, in the short term, the only caller of this function will be the vectored read path and the number of key spaces involved will be small. This follows from the upper bound placed on the number of keys accepted by the vectored read path. A couple other small utility functions are added. They'll be used by the vectored search path as well.	2024-02-01 13:14:35 +00:00
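To illustrate the operation described in the commit above (removing from one key space every part that overlaps a second one, splitting or dropping ranges as needed), here is a minimal sketch. It is not the `KeySpace` implementation from the PR: it assumes a key space can be modelled as a sorted list of non-overlapping half-open `Range<u32>` values, and `remove_overlaps` is a hypothetical free function. It uses a plain nested loop, O(M * N) in the simple case, rather than whatever structure the actual code uses.

```rust
use std::ops::Range;

/// Hypothetical stand-in for key space subtraction: returns the parts of
/// `current` that are not covered by any range in `other`.
/// Both inputs are assumed to hold half-open ranges; `current` is assumed
/// to be sorted and non-overlapping, so the output is as well.
fn remove_overlaps(current: &[Range<u32>], other: &[Range<u32>]) -> Vec<Range<u32>> {
    let mut result: Vec<Range<u32>> = current.to_vec();
    for cut in other {
        if cut.start >= cut.end {
            // Empty range: nothing to remove.
            continue;
        }
        let mut next = Vec::with_capacity(result.len() + 1);
        for r in &result {
            if cut.end <= r.start || cut.start >= r.end {
                // No overlap: keep the range unchanged.
                next.push(r.clone());
                continue;
            }
            // Keep the piece to the left of the cut, if any.
            if r.start < cut.start {
                next.push(r.start..cut.start);
            }
            // Keep the piece to the right of the cut, if any.
            if cut.end < r.end {
                next.push(cut.end..r.end);
            }
            // A range fully covered by the cut contributes nothing.
        }
        result = next;
    }
    result
}
```

For example, subtracting `[3..5, 8..22]` from `[0..10, 20..30]` splits the first range around `3..5`, trims it again at 8, and trims the second range at 22, leaving `[0..3, 5..8, 22..30]`.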
Vlad Lazar	221531c9db	pageserver: lift ancestor timeline logic from read path (#6543 ) When the read path needs to follow a key into the ancestor timeline, it needs to wait for said ancestor to become active and aware of its branching lsn. The logic is lifted into a separate function with its own new error type. This is done because the vectored read path needs the same logic. It's also the reason for the newly introduced error type. When we'll switch the read path to proxy into `get_vectored`, we can remove the duplicated variants from `PageReconstructError`.	2024-02-01 10:35:18 +00:00
Christian Schwarz	4c173456dc	pagebench: fix percentiles reporting (#6547 ) Before this patch, pagebench was always showing the same value. refs https://github.com/neondatabase/neon/issues/6509	2024-01-31 23:29:48 +00:00
Christian Schwarz	e82625b77d	refactor(pageserver main): signal handling (#6554 ) This refactoring makes it easier to experimentally replace BACKGROUND_RUNTIME with a single-threaded runtime. Found this useful [during benchmarking](https://github.com/neondatabase/neon/pull/6555).	2024-01-31 23:25:57 +00:00
Christian Schwarz	0ac1e71524	update tokio-epoll-uring (#6558 ) to pull in fixes for https://github.com/neondatabase/tokio-epoll-uring/issues/37	2024-01-31 22:54:54 +00:00
Anna Khanova	271133d960	Proxy: reduce number of get role secret calls (#6557 ) ## Problem Right now, if the get_role_secret response wasn't cached (e.g. the cache already reached max size), it will send a second, identical request. ## Summary of changes Avoid needless request.	2024-01-31 22:16:56 +00:00
Joonas Koivunen	3d5fab127a	rewrite Gate impl for better observability (#6542 ) changes: - two messages instead of message every second when gate was closing - replace the gate name string by using a pointer - slow GateGuards are likely to log who they were (see example) example found in regress tests: <https://github.com/neondatabase/neon/pull/6542#issuecomment-1919009256>	2024-01-31 22:15:58 +00:00
Joonas Koivunen	66719d7eaf	logging: fix span usage (#6549 ) Fixes some duplication due to extra or misconfigured `#[instrument]`, while filling in the `timeline_id` to delete timeline flow calls.	2024-01-31 20:52:00 +00:00
Konstantin Knizhnik	9a9d9beaee	Download SLRU segments on demand (#6151 ) ## Problem See https://github.com/neondatabase/cloud/issues/8673 ## Summary of changes Download missed SLRU segments from page server ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-31 21:39:18 +02:00
John Spray	2bfc831c60	control_plane/attachment_service: make --path optional (#6545 ) ## Problem The `--path` argument is only used in testing, for compat tests that use a JSON snapshot of state rather than the postgres database. In regular deployments, it should be omitted (currently one has to specify `--path ""`) ## Summary of changes Make `--path` optional.	2024-01-31 17:02:41 +00:00
Joonas Koivunen	799db161d3	tests: support for running on single pg version, use in one place (#6525 ) Some tests that are more like unit tests do not need to run on different pg versions. The logging test is one of them, which I found for unrelated reasons. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-01-31 17:37:25 +02:00
Arpad Müller	47380be12d	Remove version param from get_lsn_by_timestamp (#6551 ) This removes the last remnants of the version param added by #5608 , concluding the transition plan laid out in https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911 . It follows PR https://github.com/neondatabase/cloud/pull/9202, which we now assume has been deployed to all environments. Full history: * https://github.com/neondatabase/neon/pull/5608 * https://github.com/neondatabase/cloud/pull/7553 * https://github.com/neondatabase/neon/pull/6178 * https://github.com/neondatabase/cloud/pull/9202	2024-01-31 15:30:19 +01:00
Conrad Ludgate	c7b02ce8ec	proxy: use jemalloc (#6531 ) ## Summary of changes Experiment with jemalloc in proxy	2024-01-31 14:51:11 +01:00
John Spray	4010adf653	control_plane/attachment_service: complete APIs (#6394 ) Depends on: https://github.com/neondatabase/neon/pull/6468 ## Problem The sharding service will be used as a "virtual pageserver" by the control plane -- so it needs the set of pageserver APIs that the control plane uses, and to present them under identical URLs, including prefix (/v1). ## Summary of changes - Add missing APIs: - Tenant deletion - Timeline deletion - Node list (used in test now, later in tools) - `/location_config` API (for migrating tenants into the sharding service) - Rework attachment service URLs: - `/v1` prefix is used for pageserver-compatible APIs - `/upcall/v1` prefix is used for APIs that are called by the pageserver (re-attach and validate) - `/debug/v1` prefix is used for endpoints that are for testing - `/control/v1` prefix is used for new sharding service APIs that do not mimic a pageserver API, such as registering and configuring nodes. - Add test_sharding_service. The sharding service already had some collateral coverage from its use in general tests, but this is the first dedicated testing for it.	2024-01-31 12:23:06 +00:00
Konstantin Knizhnik	e10a7ee391	Prevent too frequent reconnects in case of race condition errors returned by PS (tenant not found) (#6522 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1706531433057289 ## Summary of changes 1. Do not decrease the reconnect timeout until the maximal interval value (1 second) is reached. 2. Compute the reconnect time after the connection attempt is made, to exclude the connect time itself from the interval measurement. As a result, a backend should not perform more than 4 reconnect attempts per second. Note, however, that backoff is performed locally in each backend, so if there are many active backends, the connection (and thus error) rate may be much higher. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-31 09:17:32 +02:00
Sasha Krassovsky	e8c9a51273	Allow creating subscriptions as neon_superuser (#6484 ) ## Problem We currently can't create subscriptions in PG14 and PG15 because only superusers can, and PG16 requires adding roles to pg_create_subscription. ## Summary of changes I added changes to PG14 and PG15 that allow neon_superuser to bypass the superuser requirement. For PG16, I didn't do that but added a migration that adds neon_superuser to pg_create_subscription. Also added a test to make sure it works.	2024-01-30 22:32:33 -08:00
Alexander Bayandin	3c3ee8f3e8	Compute: add compatibility patch for pgvector (#6527 ) ## Problem `pgvector` requires a patch to work well with Neon (a patch created by @hlinnaka) ## Summary of changes - Apply the patch to `pgvector`	2024-01-30 17:33:24 +00:00
Arpad Müller	6928a34f59	S3 DR: Large prefix improvements (#6515 ) ## Problem PR #6500 has removed the limiting by number of versions/deletions for time travel calls. We never get informed about how many versions there are, and thus the call would just hang without any indication of progress. ## Summary of changes We improve the pageserver's behaviour with large prefixes, i.e. those with many keys, removed or currently still available. * Add a hard limit of 100k versions/deletions. For the reasoning see https://github.com/neondatabase/cloud/issues/8233#issuecomment-1915021625 , but TLDR it will roughly support tenants of 2 TiB size, of course depending on general write activity and duration of the s3 retention window. The goal is to have a limit at all so that the process doesn't accumulate increasing numbers of versions until an eventual crash. * Lower the RAM footprint for the `VerOrDelete` datastructure. This means we now don't cache a lot of redundant metadata in RAM like the owner ID. The top level datastructure's footprint goes down from 264 bytes to 80 (but it contains strings that are not counted in there). Follow-up of #6500, part of https://github.com/neondatabase/cloud/issues/8233 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-01-30 15:57:27 +00:00
Arseny Sher	bc684e9d3b	Make WAL segment init atomic. Since fdatasync is used for flushing WAL, changing file size is unsafe. Make segment creation atomic by using tmp file + rename to avoid using partially initialized segments. fixes https://github.com/neondatabase/neon/issues/6402	2024-01-30 18:05:22 +04:00
Arseny Sher	08532231ee	Fix find_end_of_wal busy loop. It hung if the file size was less than that of a normal segment. Normally that doesn't happen, but it might in case of a crash during segment init. We're going to fix that half-initialized segment by durably renaming it after cooking, so this fix won't be needed, but it's better to avoid the busy loop anyway. fixes https://github.com/neondatabase/neon/issues/6401	2024-01-30 18:05:22 +04:00
Christian Schwarz	79137a089f	fix(#6366 ): pageserver: incorrect log level for Tenant not found during basebackup (#6400 ) Before this patch, when requesting basebackup for a not-found tenant or timeline, we'd emit an ERROR-level log entry with a huge stack trace. See #6366 "Details" section for an example With this patch, we log at INFO level and only a single line. Example: ``` 2024-01-19T14:16:11.479800Z INFO page_service_conn_main{peer_addr=127.0.0.1:43448}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4b 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Tenant d69a536d529a68fcf85bc070030cdf4b not found 2024-01-19T14:19:35.807819Z INFO page_service_conn_main{peer_addr=127.0.0.1:48862}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4a 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Timeline d69a536d529a68fcf85bc070030cdf4a/035484e9c28d8d0138a492caadd03ffd was not found ``` fixes https://github.com/neondatabase/neon/issues/6366 Changes ------- - Change `handle_basebackup_request` to return a `QueryError` - The new `impl From<WaitLsnError> for QueryError` is needed so the `?` at `wait_lsn()` call in `handle_basebackup_request` works again. It's duplicating `impl From<WaitLsnError> for PageStreamError`. - Remove hard-to-spot conversion of `handle_basebackup_request` return value to anyhow::Result (the place where I replaced `anyhow::Ok` with `Result::<(), QueryError>::Ok(())` - Add forgotten distinguished handling for "Tenant not found" case in `impl From<GetActiveTenantError> for QueryError` This was not at all pleasant, and I find it very hard to follow the various error conversions. It took me a while to spot the hard-to-spot `anyhow::Ok` thing above. It would have been caught by the compiler if we weren't auto-converting `anyhow::Error` into `QueryError::Other`. 
We should move away from that, in my opinion, instead forcing each `.context()` site to become `.context().map_err(QueryError::Other)`. But that's for a future PR.	2024-01-30 13:10:48 +00:00
Joonas Koivunen	e3cb715e8a	fix: capture initdb stderr, discard others (#6524 ) When using spawn + wait_with_output instead of std::process::Command::output or tokio::process::Command::output we must configure the redirection. Fixes: #6523 by discarding the stdout completely, we only care about stderr if any.	2024-01-30 14:07:58 +01:00
dependabot[bot]	c70bf9150f	build(deps): bump aiohttp from 3.9.0 to 3.9.2 (#6518 )	2024-01-30 10:46:49 +00:00
Alexander Bayandin	8e4da52069	Compute: pgvector 0.6.0 (#6517 ) Update pgvector extension from 0.5.1 to 0.6.0	2024-01-30 09:29:45 +00:00
Arthur Petukhovsky	2ff1a5cecd	Patch safekeeper control file on HTTP request (#6455 ) Closes #6397	2024-01-29 18:20:57 +00:00
Conrad Ludgate	ec8dcc2231	flatten proxy flow (#6447 ) ## Problem Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and doing a bit less radical changes. smaller commits. Proxy flow was quite deeply nested, which makes adding more interesting error handling quite tricky. ## Summary of changes I recommend reviewing commit by commit. 1. move handshake logic into a separate file 2. move passthrough logic into a separate file 3. no longer accept a closure in CancelMap session logic 4. Remove connect_to_db, copy logic into handle_client 5. flatten auth_and_wake_compute in authenticate 6. record info for link auth	2024-01-29 17:38:03 +00:00
Arpad Müller	b844c6f0c7	Do pagination in list_object_versions call (#6500 ) ## Problem The tenants we want to recover might have tens of thousands of keys, or more. At that point, the AWS API returns a paginated response. ## Summary of changes Support paginated responses for `list_object_versions` requests. Follow-up of #6155, part of https://github.com/neondatabase/cloud/issues/8233	2024-01-29 17:59:26 +01:00
Alexander Bayandin	6a85a06e1b	Compute: build rdkit without freetype support (#6495 ) ## Problem `rdkit` extension is built with `RDK_BUILD_FREETYPE_SUPPORT=ON` (by default), which requires a bunch of additional dependencies, but the support of freetype fonts isn't required for Postgres. With `RDK_BUILD_FREETYPE_SUPPORT=ON`: ``` ldd /usr/local/pgsql/lib/rdkit.so linux-vdso.so.1 (0x0000ffff82ea8000) libfreetype.so.6 => /usr/lib/aarch64-linux-gnu/libfreetype.so.6 (0x0000ffff825e5000) libboost_serialization.so.1.74.0 => /usr/lib/aarch64-linux-gnu/libboost_serialization.so.1.74.0 (0x0000ffff82590000) libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff8255f000) libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffff82387000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff822dc000) libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffff822b8000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff82144000) libpng16.so.16 => /usr/lib/aarch64-linux-gnu/libpng16.so.16 (0x0000ffff820fd000) libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000ffff820d3000) libbrotlidec.so.1 => /usr/lib/aarch64-linux-gnu/libbrotlidec.so.1 (0x0000ffff820b8000) /lib/ld-linux-aarch64.so.1 (0x0000ffff82e78000) libbrotlicommon.so.1 => /usr/lib/aarch64-linux-gnu/libbrotlicommon.so.1 (0x0000ffff82087000) ``` With `RDK_BUILD_FREETYPE_SUPPORT=OFF`: ``` ldd /usr/local/pgsql/lib/rdkit.so linux-vdso.so.1 (0x0000ffffbba75000) libboost_serialization.so.1.74.0 => /usr/lib/aarch64-linux-gnu/libboost_serialization.so.1.74.0 (0x0000ffffbb259000) libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffffbb228000) libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffffbb050000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffbafa5000) libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffffbaf81000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbae0d000) /lib/ld-linux-aarch64.so.1 (0x0000ffffbba45000) 
``` ## Summary of changes - Build `rdkit` with `RDK_BUILD_FREETYPE_SUPPORT=OFF` - Remove extra dependencies from the Compute image	2024-01-29 16:16:37 +00:00
John Spray	b04a6acd6c	docker: add attachment_service binary (#6506 ) ## Problem Creating sharded tenants will require an instance of the sharding service -- the initial goal is to deploy one of these in a staging region (https://github.com/neondatabase/cloud/issues/9718). It will run as a kubernetes container, similar to the storage broker, so needs to be built into the container image. ## Summary of changes Add `attachment_service` binary to container image	2024-01-29 13:31:56 +00:00
Vlad Lazar	0c7b89235c	pageserver: add range layer map search implementation (#6469 ) ## Problem There's no efficient way of querying the layer map for a range. ## Summary of changes Introduce a range query for the layer map (`LayerMap::range_search`). There are two broad steps to it: 1. Find all coverage changes for layers that intersect the queried range (see `LayerCoverage::range_overlaps`). The slightly tricky part is dealing with the start of the range. We can either be aligned with a layer or not and we need to treat these cases differently. 2. Iterate over the coverage changes and collect the result. For this we use a two pointer approach: the trailing pointer tracks the start of the current range (current location in the key space) and the forward pointer tracks the next coverage change. Plugging the range search into the read path is deferred to a future PR. ## Performance I adapted the layer map benchmarks on a local branch. Range searches are between 2x and 2.5x slower than point searches. That's in line with what I expected since we query the layer map twice. Since `Timeline::get` will proxy to `Timeline::get_vectored` we can special case the one element layer map range search at that point.	2024-01-29 09:47:12 +00:00
Joonas Koivunen	1e9a50bca8	disk_usage_eviction_task: cleanup summaries (#6490 ) This is the "partial revert" of #6384. The summaries turned out to be expensive due to naive vec usage, but also inconclusive because of the additional context required. In addition to removing summary traces, small refactoring is done.	2024-01-29 10:38:40 +02:00
Conrad Ludgate	511e730cc0	hll experiment (#6312 ) ## Problem Measuring cardinality using logs is expensive and slow. ## Summary of changes Implement a pre-aggregated HyperLogLog-based cardinality estimate. HyperLogLog estimates the cardinality of a set by using the probability that the uniform hash of a value will have a run of n 0s at the end is `1/2^n`, therefore, having observed a run of `n` 0s suggests we have measured `2^n` distinct values. By using multiple shards, we can use the harmonic mean to get a more accurate estimate. We record this into a Prometheus time-series. HyperLogLog counts can be merged by taking the `max` of each shard. We can apply a `max_over_time` in order to find the estimate of cardinality of distinct values over time	2024-01-29 07:26:20 +00:00
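The recording and merging steps described in the commit above can be sketched as follows. This is only an illustration under stated assumptions, not the proxy's implementation: the `Sketch` type, the 16-shard layout, and the use of the standard library's `DefaultHasher` are all hypothetical choices, and the harmonic-mean estimate step is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical layout: 2^4 = 16 shards, selected by the top 4 hash bits.
const SHARD_BITS: u32 = 4;
const SHARDS: usize = 1 << SHARD_BITS;

/// A minimal HyperLogLog-style register set (illustrative only).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Sketch {
    shards: [u8; SHARDS],
}

impl Sketch {
    pub fn new() -> Self {
        Sketch { shards: [0; SHARDS] }
    }

    /// Hash the value, pick a shard from the top bits, and remember the
    /// longest run of trailing zero bits seen in that shard.
    pub fn record<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        let shard = (h >> (64 - SHARD_BITS)) as usize;
        // Store run length + 1 so that an untouched shard (0) differs from
        // a shard that only saw hashes with no trailing zeros.
        let run = (h.trailing_zeros().min(64 - SHARD_BITS) + 1) as u8;
        self.shards[shard] = self.shards[shard].max(run);
    }

    /// Merge two sketches by taking the max of each shard.
    pub fn merge(&self, other: &Sketch) -> Sketch {
        let mut out = self.clone();
        for (o, r) in out.shards.iter_mut().zip(other.shards.iter()) {
            *o = (*o).max(*r);
        }
        out
    }
}
```

Because each shard only ever keeps a maximum, merging the sketches of two sets gives the same registers as recording the union of both sets into a single sketch, which is the property that makes a `max`-style aggregation over time series meaningful.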
Konstantin Knizhnik	c1148dc9ac	Fix calculation of maximal multixact in ingest_multixact_create_record (#6502 ) ## Problem See https://neondb.slack.com/archives/C06F5UJH601/p1706373716661439 ## Summary of changes Use None instead of 0 as initial accumulator value for calculating maximal multixact XID. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-29 07:39:16 +02:00
Anna Khanova	8253cf1931	proxy: Relax endpoint check (#6503 ) ## Problem http-over-sql allows the host to be in the format api.aws.... however, that's not the case for the websocket flow. ## Summary of changes Relax endpoint check for the ws serverless connections.	2024-01-28 21:27:14 +00:00
Christian Schwarz	3a82430432	fixup(#6492 ): also switch the benchmarks that runs on merge-to-main back to std-fs (#6501 )	2024-01-28 00:15:11 +01:00
Arpad Müller	734755eaca	Enable nextest retries for the arm build (#6496 ) Also make the NEXTEST_RETRIES declaration more local. Requested in https://github.com/neondatabase/neon/pull/6493#issuecomment-1912110202	2024-01-27 05:16:11 +01:00
Christian Schwarz	e34166a28f	CI: switch back to std-fs io engine for soak time before next release (#6492 ) PR #5824 introduced the concept of io engines in pageserver and implemented `tokio-epoll-uring` in addition to our current method, `std-fs`. We used `tokio-epoll-uring` in CI for a day to get more exposure to the code. Now it's time to switch CI back so that we test with `std-fs` as well, because that's what we're (still) using in production.	2024-01-26 22:48:34 +01:00
Christian Schwarz	3a36a0a227	fix(test suite): some tests leak child processes (#6497 )	2024-01-26 18:23:53 +00:00
John Spray	58f6cb649e	control_plane: database persistence for attachment_service (#6468 ) ## Problem Spun off from https://github.com/neondatabase/neon/pull/6394 -- this PR is just the persistence parts and the changes that enable it to work nicely ## Summary of changes - Revert #6444 and #6450 - In neon_local, start a vanilla postgres instance for the attachment service to use. - Adopt `diesel` crate for database access in attachment service. This uses raw SQL migrations as the source of truth for the schema, so it's a soft dependency: we can switch libraries pretty easily. - Rewrite persistence.rs to use postgres (via diesel) instead of JSON. - Preserve JSON read+write at startup and shutdown: this enables using the JSON format in compatibility tests, so that we don't have to commit to our DB schema yet. - In neon_local, run database creation + migrations before starting attachment service - Run the initial reconciliation in Service::spawn in the background, so that the pageserver + attachment service don't get stuck waiting for each other to start, when restarting both together in a test.	2024-01-26 17:20:44 +00:00
Arpad Müller	dcc7610ad6	Do backoff::retry in s3 timetravel test (#6493 ) The top level retries weren't enough, probably because we do so many network requests. Fine grained retries ensure that there is higher potential for the entire test to succeed. To demonstrate this, consider the following example: let's assume that each request has 5% chance of failing and we do 10 requests. Then chances of success without any retries is 0.95^10 = 0.6. With 3 top level retries it is 1-0.4^3 = 0.936. With 3 fine grained retries it is (1-0.05^3)^10 = 0.9988 (roundings implicit). So chances of failure are 6.4% for the top level retry vs 0.12% for the fine grained retry. Follow-up of #6155	2024-01-26 16:43:56 +00:00
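The retry arithmetic in the commit above can be reproduced with a small sketch. The function names are hypothetical, and "3 retries" is read as three attempts in total, which is what the formulas in the message imply; requests are assumed to fail independently.

```rust
/// Probability that a run of `requests` independent requests succeeds when
/// only the whole run is retried, with `attempts` attempts in total.
pub fn success_with_top_level_retries(p_fail: f64, requests: i32, attempts: i32) -> f64 {
    // One run succeeds only if every request in it succeeds.
    let one_run = (1.0 - p_fail).powi(requests);
    // The test fails only if every attempt at the whole run fails.
    1.0 - (1.0 - one_run).powi(attempts)
}

/// Probability that all `requests` succeed when each individual request
/// gets `attempts` attempts of its own.
pub fn success_with_fine_grained_retries(p_fail: f64, requests: i32, attempts: i32) -> f64 {
    // A single request fails only if all of its attempts fail.
    (1.0 - p_fail.powi(attempts)).powi(requests)
}
```

Calling `success_with_top_level_retries(0.05, 10, 1)` gives about 0.599 (no retries), `success_with_top_level_retries(0.05, 10, 3)` gives about 0.935 without the intermediate rounding to 0.4 used in the message, and `success_with_fine_grained_retries(0.05, 10, 3)` gives about 0.9988.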
Alexander Bayandin	4c245b0f5a	update_build_tools_image.yml: Push build-tools image to Docker Hub (#6481 ) ## Problem - `docker.io/neondatabase/build-tools:pinned` image is frequently outdated on Docker Hub because there's no automated way to update it. - `update_build_tools_image.yml` workflow contains legacy roll-back logic, which is not required anymore because it updates only a single image. ## Summary of changes - Make `update_build_tools_image.yml` workflow push images to both ECR and Docker Hub - Remove unneeded roll-back logic	2024-01-26 16:12:49 +00:00
John Spray	55b7cde665	tests: add basic coverage for sharding (#6380 ) ## Problem The support for sharding in the pageserver was written before https://github.com/neondatabase/neon/pull/6205 landed, so when it landed we couldn't directly test sharding. ## Summary of changes - Add `test_sharding_smoke` which tests the basics of creating a sharding tenant, creating a timeline within it, checking that data within it is distributed. - Add modes to pg_regress tests for running with 4 shards as well as with 1.	2024-01-26 14:40:47 +00:00
Vlad Lazar	5b34d5f561	pageserver: add vectored get latency histogram (#6461 ) This patch introduces a new set of grafana metrics for a histogram: pageserver_get_vectored_seconds_bucket{task_kind="Compaction\|PageRequestHandler"}. While it has a `task_kind` label, only compaction and SLRU fetches are tracked. This reduces the increase in cardinality to 24. The metric should allow us to isolate performance regressions while the vectorized get is being implemented. Once the implementation is complete, it'll also allow us to quantify the improvements.	2024-01-26 13:40:03 +00:00
Alexander Bayandin	26c55b0255	Compute: fix rdkit extension build (#6488 ) ## Problem `rdkit` extension build started to fail because of the changed checksum of the Comic Neue font: ``` Downloading https://fonts.google.com/download?family=Comic%20Neue... CMake Error at Code/cmake/Modules/RDKitUtils.cmake:257 (MESSAGE): The md5 checksum for /rdkit-src/Code/GraphMol/MolDraw2D/Comic_Neue.zip is incorrect; expected: 850b0df852f1cda4970887b540f8f333, found: b7fd0df73ad4637504432d72a0accb8f ``` https://github.com/neondatabase/neon/actions/runs/7666530536/job/20895534826 Ref https://neondb.slack.com/archives/C059ZC138NR/p1706265392422469 ## Summary of changes - Disable comic fonts for `rdkit` extension	2024-01-26 12:39:20 +00:00
Vadim Kharitonov	12e9b2a909	Update plv8 (#6465 )	2024-01-26 09:56:11 +00:00
Christian Schwarz	918b03b3b0	integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824 )	2024-01-26 09:25:07 +01:00
Alexander Bayandin	d36623ad74	CI: cancel old e2e-tests on new commits (#6463 ) ## Problem Triggered `e2e-tests` job is not cancelled along with other jobs in a PR if the PR gets new commits. We can improve the situation by setting `concurrency_group` for the remote workflow (https://github.com/neondatabase/cloud/pull/9622 adds `concurrency_group` group input to the remote workflow). Ref https://neondb.slack.com/archives/C059ZC138NR/p1706087124297569 Cloud's part added in https://github.com/neondatabase/cloud/pull/9622 ## Summary of changes - Set `concurrency_group` parameter when triggering `e2e-tests` - At the beginning of a CI pipeline, trigger Cloud's `cancel-previous-in-concurrency-group.yml` workflow which cancels previously triggered e2e-tests	2024-01-25 19:25:29 +00:00
Christian Schwarz	689ad72e92	fix(neon_local): leaks child process if it fails to start & pass checks (#6474 ) refs https://github.com/neondatabase/neon/issues/6473 Before this PR, if process_started() didn't return Ok(true) until we ran out of retries, we'd return an error but leave the process running. Try it by adding a 20s sleep to the pageserver `main()`, e.g., right before we claim the pidfile. Without this PR, output looks like so: ``` (.venv) cs@devvm-mbp:[~/src/neon-work-2]: ./target/debug/neon_local start Starting neon broker at 127.0.0.1:50051. storage_broker started, pid: 2710939 . attachment_service started, pid: 2710949 Starting pageserver node 1 at '127.0.0.1:64000' in ".neon/pageserver_1"..... pageserver has not started yet, continuing to wait..... pageserver 1 start failed: pageserver did not start in 10 seconds No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/pageserver_1/pageserver.pid" No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/safekeepers/sk1/safekeeper.pid" Stopping storage_broker with pid 2710939 immediately....... storage_broker has not stopped yet, continuing to wait..... neon broker stop failed: storage_broker with pid 2710939 did not stop in 10 seconds Stopping attachment_service with pid 2710949 immediately....... attachment_service has not stopped yet, continuing to wait..... 
attachment service stop failed: attachment_service with pid 2710949 did not stop in 10 seconds ``` and we leak the pageserver process ``` (.venv) cs@devvm-mbp:[~/src/neon-work-2]: ps aux \| grep pageserver cs 2710959 0.0 0.2 2377960 47616 pts/4 Sl 14:36 0:00 /home/cs/src/neon-work-2/target/debug/pageserver -D .neon/pageserver_1 -c id=1 -c pg_distrib_dir='/home/cs/src/neon-work-2/pg_install' -c http_auth_type='Trust' -c pg_auth_type='Trust' -c listen_http_addr='127.0.0.1:9898' -c listen_pg_addr='127.0.0.1:64000' -c broker_endpoint='http://127.0.0.1:50051/' -c control_plane_api='http://127.0.0.1:1234/' -c remote_storage={local_path='../local_fs_remote_storage/pageserver'} ``` After this PR, there is no leaked process.	2024-01-25 19:20:02 +01:00
Christian Schwarz	fd4cce9417	test_pageserver_max_throughput_getpage_at_latest_lsn: remove n_tenants=100 combination (#6477 ) Need to fix the neon_local timeouts first (https://github.com/neondatabase/neon/issues/6473) and also not run them on every merge, but only nightly: https://github.com/neondatabase/neon/issues/6476	2024-01-25 18:17:53 +00:00
Arpad Müller	d52b81340f	S3 based recovery (#6155 ) Adds a new `time_travel_recover` function to the `RemoteStorage` trait that allows time travel like functionality for S3 buckets, regardless of their content (it is not even pageserver related). It takes a different approach from the more complicated one in [this post](https://aws.amazon.com/blogs/storage/point-in-time-restore-for-amazon-s3-buckets/). It takes as input a prefix, a target timestamp, and a limit timestamp: * executes [`ListObjectVersions`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html) * obtains the latest version that comes before the target timestamp * copies that latest version to the same prefix * if there are versions newer than the limit timestamp, it doesn't do anything for the file The limit timestamp is meant to be some timestamp before the start of the recovery operation and after any changes that one wants to revert. For example, it might be the time point after a tenant was detached from all involved pageservers. The limiting mechanism ensures that the operation is idempotent and can be retried without causing additional writes/copies. The approach fulfills all the requirements laid out in 8233, and is a recoverable operation. Nothing is deleted permanently, only new entries added to the version log. I also enable [nextest retries](https://nexte.st/book/retries.html) to help with some general S3 flakiness (on top of low level retries). Part of https://github.com/neondatabase/cloud/issues/8233	2024-01-25 18:23:18 +01:00
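The per-key selection rule listed in the commit above can be sketched as below. This is a simplified illustration, not the actual `time_travel_recover` code: the `Version` struct, the plain integer timestamps, and the function name are hypothetical, and delete markers and pagination are ignored.

```rust
/// One entry from a version listing for a single key (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
pub struct Version {
    pub version_id: String,
    /// Last-modified time, here simply seconds since some epoch.
    pub last_modified: u64,
}

/// Decide which version of one key should be copied back, if any.
/// `versions` holds all known versions of that key.
pub fn version_to_restore(versions: &[Version], target: u64, limit: u64) -> Option<&Version> {
    // Anything newer than the limit means the key was already touched after
    // the recovery cutoff (for example by an earlier run), so skip it.
    if versions.iter().any(|v| v.last_modified > limit) {
        return None;
    }
    // Otherwise pick the latest version strictly before the target time.
    versions
        .iter()
        .filter(|v| v.last_modified < target)
        .max_by_key(|v| v.last_modified)
}
```

The limit check is what makes a retry harmless in this sketch: once a version has been copied back, the copy itself is newer than the limit, so a second run returns `None` for that key and performs no further copy.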
Joonas Koivunen	8dee9908f8	fix(compaction_task): wrong log levels (#6442 ) Filter what we log on compaction task. Per discussion in last triage call, fixing these by introducing and inspecting the root cause within anyhow::Error instead of rolling out proper conversions. Fixes: #6365 Fixes: #6367	2024-01-25 18:45:17 +02:00
Konstantin Knizhnik	19ed230708	Add support for PS sharding in compute (#6205 ) refer #5508 replaces #5837 ## Problem This PR implements sharding support on the compute side. Relations are split into stripes, and `get_page` requests are redirected to the particular shard where the stripe is located. All other requests (i.e. get relation or database size) are always sent to shard 0. ## Summary of changes Sharding support on the compute side includes three things: 1. Make it possible to specify, and change at runtime, connections to more than one page server 2. Send each `get_page` request to the particular shard (determined by the hash of the page key) 3. Support multiple servers in prefetch ring requests --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: John Spray <john@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-25 15:53:31 +02:00
Joonas Koivunen	463b6a26b5	test: show relative order eviction with "fast growing tenant" (#6377 ) Refactor out test_disk_usage_eviction tenant creation and add a custom case with 4 tenants, 3 made with pgbench scale=1 and 1 made with pgbench scale=4. Because the tenants are created in order of scales [1, 1, 1, 4] this is simple enough to demonstrate the problem with using absolute access times, because on a disk usage based eviction run we will disproportionally target the first scale=1 tenant(s), and the later larger tenant does not lose anything. This test is not enough to show the difference between `relative_equal` and `relative_spare` (the fudge factor); much larger scale will be needed for "the large tenant", but that will make debug mode tests slower. Cc: #5304	2024-01-25 15:38:28 +02:00
John Spray	c9b1657e4c	pageserver: fixes for creation operations overlapping with shutdown/startup (#6436 ) ## Problem For #6423, creating a reproducer turned out to be very easy, as an extension to test_ondemand_activation. However, before I had diagnosed the issue, I was starting with a more brute force approach of running creation API calls in the background while restarting a pageserver, and that shows up a bunch of other interesting issues. In this PR: - Add the reproducer for #6423 by extending `test_ondemand_activation` (confirmed that this test fails if I revert the fix from https://github.com/neondatabase/neon/pull/6430) - In timeline creation, return 503 responses when we get an error and the tenant's cancellation token is set: this covers the cases where we get an anyhow::Error from something during timeline creation as a result of shutdown. - While waiting for tenants to become active during creation, don't .map_err() the result to a 500: instead let the `From` impl map the result to something appropriate (this includes mapping shutdown to 503) - During tenant creation, we were calling `Tenant::load_local` because no Preload object is provided. This is usually harmless because the tenant dir is empty, but if there are some half-created timelines in there, bad things can happen. Propagate the SpawnMode into Tenant::attach, so that it can properly skip _any_ attempt to load timelines if creating. - When we call upsert_location, there's a SpawnMode that tells us whether to load from remote storage or not. But if the operation is a retry and we already have the tenant, it is not correct to skip loading from remote storage: there might be a timeline there. This isn't strictly a correctness issue as long as the caller behaves correctly (does not assume that any timelines are persistent until the creation is acked), but it's a more defensive position. - If we shut down while the task in Tenant::attach is running, it can end up spawning rogue tasks. 
Fix this by holding a GateGuard through here, and in upsert_location shutting down a tenant after calling tenant_spawn if we can't insert it into tenants_map. This fixes the expected behavior that after shutdown_all_tenants returns, no tenant tasks are running. - Add `test_create_churn_during_restart`, which runs tenant & timeline creations across pageserver restarts. - Update a couple of tests that covered cancellation, to reflect the cleaner errors we now return.	2024-01-25 12:35:52 +00:00
Arpad Müller	b92be77e19	Make RemoteStorage not use async_trait (#6464 ) Makes the `RemoteStorage` trait not be based on `async_trait` any more. To avoid recursion in async (not supported by Rust), we made `GenericRemoteStorage` generic on the "Unreliable" variant. That allows us to have the unreliable wrapper never contain/call itself. related earlier work: #6305	2024-01-24 21:27:54 +01:00
Arthur Petukhovsky	8cb8c8d7b5	Allow remove_wal.rs to run on inactive timelines (#6462 ) Temporarily enable it on staging to help with https://github.com/neondatabase/neon/issues/6403 Can also be deployed to prod if it works well on staging.	2024-01-24 16:48:56 +00:00
Conrad Ludgate	210700d0d9	proxy: add newtype wrappers for string based IDs (#6445 ) ## Problem too many string based IDs. easy to mix up ID types. ## Summary of changes Add a bunch of `SmolStr` wrappers that provide convenience methods but are type safe	2024-01-24 16:38:10 +00:00
Joonas Koivunen	a0a3ba85e7	fix(page_service): walredo logging problem (#6460 ) Fixes: #6459 by formatting full causes of an error to log, while keeping the top level string for end-user. Changes user visible error detail from: ``` -DETAIL: page server returned error: Read error: Failed to reconstruct a page image: +DETAIL: page server returned error: Read error ``` However on pageserver logs: ``` -ERROR page_service_conn_main{...}: error reading relation or page version: Read error: Failed to reconstruct a page image: +ERROR page_service_conn_main{...}: error reading relation or page version: Read error: reconstruct a page image: launch walredo process: spawn process: Permission denied (os error 13) ```	2024-01-24 15:47:17 +00:00
Arpad Müller	d820aa1d08	Disable initdb cancellation (#6451 ) ## Problem The initdb cancellation added in #5921 is not sufficient to reliably abort the entire initdb process. Initdb also spawns children. The tests added by #6310 (#6385) and #6436 now do initdb cancellations on a more regular basis. In #6385, I attempted to issue `killpg` (after giving it a new process group ID) to kill not just the initdb but all its spawned subprocesses, but this didn't work. Initdb doesn't take that long in the end either, so we just wait until it concludes. ## Summary of changes * revert initdb cancellation support added in #5921 * still return `Err(Cancelled)` upon cancellation, but this is just to not have to remove the cancellation infrastructure * fixes to the `test_tenant_delete_races_timeline_creation` test to make it reliably pass Fixes #6385	2024-01-24 13:06:05 +01:00
Christian Schwarz	996abc9563	pagebench-based GetPage@LSN performance test (#6214 )	2024-01-24 12:51:53 +01:00
John Spray	a72af29d12	control_plane/attachment_service: implement PlacementPolicy::Detached (#6458 ) ## Problem The API for detaching things wasn't implemented yet, but one could hit this case indirectly from tests when using attach-hook, and find tenants unexpectedly attached again because their policy remained Single. ## Summary of changes Add PlacementPolicy::Detached, and: - add the behavior for it in schedule() - in tenant_migrate, refuse if the policy is detached - automatically set this policy in attach-hook if the caller has specified pageserver=null.	2024-01-24 12:49:30 +01:00
Sasha Krassovsky	4f51824820	Fix creating publications for all tables	2024-01-23 22:41:00 -08:00
Christian Schwarz	743f6dfb9b	fix(attachment_service): corrupted attachments.json when parallel requests (#6450 ) The pagebench integration PR (#6214) issues attachment requests in parallel. We observed corrupted attachments.json from time to time, especially in the test cases with high tenant counts. The atomic overwrite added in #6444 exposed the root cause cleanly: the `.commit()` calls of two request handlers could interleave or be reordered. See also: https://github.com/neondatabase/neon/pull/6444#issuecomment-1906392259 This PR makes changes to the `persistence` module to fix the above race: - mpsc queue for PendingWrites - one writer task performs the writes in mpsc queue order - request handlers that need to do writes do it using the new `mutating_transaction` function. `mutating_transaction`, while holding the lock, does the modifications, serializes the post-modification state, and pushes that as a `PendingWrite` into the mpsc queue. It then releases the lock and `await`s the completion of the write. The writer task executes the `PendingWrites` in queue order. Once the write has been executed, it wakes the writing tokio task.	2024-01-23 19:14:32 +00:00
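The single-writer scheme described in #6450 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it uses `std::sync::mpsc` and an OS thread in place of tokio's channel and task, a `Vec<String>` in place of the real persistent state, and an in-memory `sink` in place of `attachments.json`. The names `Inner`, `sink`, and `Persistence::new` are hypothetical; only `PendingWrite` and `mutating_transaction` come from the commit message.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// One serialized snapshot waiting to be written, plus a completion signal.
struct PendingWrite {
    serialized: String,
    done: Sender<()>,
}

/// Hypothetical state: the data and the queue handle share a single lock,
/// so snapshots are enqueued in exactly the order the mutations happened.
struct Inner {
    tenants: Vec<String>,
    queue: Sender<PendingWrite>,
}

struct Persistence {
    inner: Mutex<Inner>,
}

impl Persistence {
    /// Spawns the single writer; `sink` stands in for the file on disk.
    fn new(sink: Arc<Mutex<Vec<String>>>) -> Self {
        let (tx, rx) = mpsc::channel::<PendingWrite>();
        thread::spawn(move || {
            // The only place that "writes": strictly in queue order.
            for write in rx {
                sink.lock().unwrap().push(write.serialized);
                let _ = write.done.send(()); // wake the waiting caller
            }
        });
        Persistence {
            inner: Mutex::new(Inner { tenants: Vec::new(), queue: tx }),
        }
    }

    /// Mutate, serialize, and enqueue under the lock; wait after releasing it.
    fn mutating_transaction(&self, f: impl FnOnce(&mut Vec<String>)) {
        let (done_tx, done_rx) = mpsc::channel();
        {
            let mut inner = self.inner.lock().unwrap();
            f(&mut inner.tenants);
            let serialized = inner.tenants.join(",");
            inner
                .queue
                .send(PendingWrite { serialized, done: done_tx })
                .expect("writer thread exited");
        } // lock released here, before waiting for the write
        done_rx.recv().expect("writer thread exited");
    }
}
```

Because enqueueing happens while the lock is held, two handlers can no longer have their snapshots reordered between serialization and the write, which is the race the commit message describes.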
Arpad Müller	faf275d4a2	Remove initdb on timeline delete (#6387 ) This PR: * makes `initdb.tar.zst` be deleted by default on timeline deletion (#6226), mirroring the safekeeper: https://github.com/neondatabase/neon/pull/6381 * adds a new `preserve_initdb_archive` endpoint for a timeline, to be used during the disaster recovery process, see reasoning [here](https://github.com/neondatabase/neon/issues/6226#issuecomment-1894574778) * makes the creation code look for `initdb-preserved.tar.zst` in addition to `initdb.tar.zst`. * makes the tests use the new endpoint fixes #6226	2024-01-23 18:22:59 +00:00
Vlad Lazar	001f0d6db7	pageserver: fix import failure caused by merge race (#6448 ) PR #6406 raced with #6372 and broke main.	2024-01-23 18:07:01 +01:00
Christian Schwarz	42c17a6fc6	attachment_service: use atomic overwrite to persist attachments.json (#6444 ) The pagebench integration PR (#6214) is the first to SIGQUIT & then restart attachment_service. With many tenants (100), we have found frequent failures on restart in the CI[^1]. [^1]: [Allure](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6214/7615750160/index.html#suites/e26265675583c610f99af77084ae58f1/851ff709578c4452/) ``` 2024-01-22T19:07:57.932021Z INFO request{method=POST path=/attach-hook request_id=2697503c-7b3e-4529-b8c1-d12ef912d3eb}: Request handled, status: 200 OK 2024-01-22T19:07:58.898213Z INFO Got SIGQUIT. Terminating 2024-01-22T19:08:02.176588Z INFO version: git-env:d56f31639356ed8e8ce832097f132f27ee19ac8a, launch_timestamp: 2024-01-22 19:08:02.174634554 UTC, build_tag build_tag-env:7615750160, state at /tmp/test_output/test_pageserver_max_throughput_getpage_at_latest_lsn[10-13-30]/repo/attachments.json, listening on 127.0.0.1:15048 thread 'main' panicked at /__w/neon/neon/control_plane/attachment_service/src/persistence.rs:95:17: Failed to load state from '/tmp/test_output/test_pageserver_max_throughput_getpage_at_latest_lsn[10-13-30]/repo/attachments.json': trailing characters at line 1 column 8957 (maybe your .neon/ dir was written by an older version?) 
stack backtrace: 0: rust_begin_unwind at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5 1: core::panicking::panic_fmt at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14 2: attachment_service::persistence::PersistentState::load_or_new::{{closure}} at ./control_plane/attachment_service/src/persistence.rs:95:17 3: attachment_service::persistence::Persistence::new::{{closure}} at ./control_plane/attachment_service/src/persistence.rs:103:56 4: attachment_service::main::{{closure}} at ./control_plane/attachment_service/src/main.rs:69:61 5: tokio::runtime::park::CachedParkThread::block_on::{{closure}} at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/park.rs:282:63 6: tokio::runtime::coop::with_budget at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/coop.rs:107:5 7: tokio::runtime::coop::budget at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/coop.rs:73:5 8: tokio::runtime::park::CachedParkThread::block_on at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/park.rs:282:31 9: tokio::runtime::context::blocking::BlockingRegionGuard::block_on at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/context/blocking.rs:66:9 10: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}} at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/scheduler/multi_thread/mod.rs:87:13 11: tokio::runtime::context::runtime::enter_runtime at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/context/runtime.rs:65:16 12: tokio::runtime::scheduler::multi_thread::MultiThread::block_on at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/scheduler/multi_thread/mod.rs:86:9 13: tokio::runtime::runtime::Runtime::block_on at 
./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/runtime.rs:350:50 14: attachment_service::main at ./control_plane/attachment_service/src/main.rs:99:5 15: core::ops::function::FnOnce::call_once at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. ``` The attachment_service handles SIGQUIT by just exiting the process. In theory, the SIGQUIT could come in while we're writing out the `attachments.json`. Now, in above log output, there's a 1 second gap between the last request completing and the SIGQUIT coming in. So, there must be some other issue. But, let's have this change anyways, maybe it helps uncover the real cause for the test failure.	2024-01-23 17:21:06 +01:00
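A common way to implement an atomic overwrite like the one #6444 refers to is to write the new contents to a temporary file in the same directory and then rename it over the target. The sketch below shows that general pattern only; it is not taken from the PR, and the function name `atomic_overwrite` and the `.tmp` suffix are assumptions made for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` with `contents` so that readers see either the old file
/// or the complete new one, never a partially written mix.
fn atomic_overwrite(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Hypothetical temp name; keeping it in the same directory keeps the
    // rename on one filesystem.
    let tmp = path.with_extension("tmp");

    let mut file = File::create(&tmp)?;
    file.write_all(contents)?;
    // Flush file data to disk before the rename makes it visible.
    file.sync_all()?;

    // On POSIX systems rename() replaces the destination atomically.
    fs::rename(&tmp, path)?;
    Ok(())
}
```

A process killed in the middle of `write_all` leaves a stale temporary file behind but does not touch the previous `attachments.json`, so a restart would not trip over a half-written file.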
Vlad Lazar	37638fce79	pageserver: introduce vectored Timeline::get interface (#6372 ) 1. Introduce a naive `Timeline::get_vectored` implementation The return type is intended to be flexible enough for various types of callers. We return the pages in a map keyed by `Key` such that the caller doesn't have to map back to the key if it needs to know it. Some callers can ignore errors for specific pages, so we return a separate `Result<Bytes, PageReconstructError>` for each page and an overarching `GetVectoredError` for API misuse. The overhead of the mapping will be small and bounded since we enforce a maximum key count for the operation. 2. Use the `get_vectored` API for SLRU segment reconstruction and image layer creation.	2024-01-23 14:23:53 +00:00
Christian Schwarz	50288c16b1	fix(pagebench): avoid CopyFail error in success case (#6443 ) PR #6392 fixed CopyFail in the case where we get cancelled. But, we also want to use `client.shutdown()` if we don't get cancelled.	2024-01-23 15:11:32 +01:00
Conrad Ludgate	e03f8abba9	eager parsing of ip addr (#6446 ) ## Problem Parsing the IP address at check time is a little wasteful. ## Summary of changes Parse the IP when we get it from cplane. Adding a `None` variant to still allow malformed patterns	2024-01-23 13:25:01 +00:00
Anna Khanova	1905f0bced	proxy: store role not found in cache (#6439 ) ## Problem There are a lot of responses with 404 role not found error, which are not getting cached in proxy. ## Summary of changes If there was returned an empty secret but with the project_id, store it in cache.	2024-01-23 13:15:05 +01:00
Conrad Ludgate	72de1cb511	remove some duped deps (#6422 ) ## Problem duplicated deps ## Summary of changes little bit of fiddling with deps to reduce duplicates needs consideration: https://github.com/notify-rs/notify/blob/main/CHANGELOG.md#notify-600-2023-05-17	2024-01-23 11:17:15 +00:00
Konstantin Knizhnik	00d9bf5b61	Implement lockless update of pageserver_connstring GUC in shared memory (#6314 ) ## Problem There is "neon.pageserver_connstring" GUC with PGC_SIGHUP option, allowing to change it using pg_reload_conf(). It is used by control plane to update pageserver connection string if page server has crashed, been relocated, or new shards are added. It is copied to shared memory because config can not be loaded during query execution and we need to reestablish connection to page server. ## Summary of changes Copying connection string to shared memory is done by postmaster. And other backends should check update counter to determine if connection URL is changed and connection needs to be reestablished. We can not use standard Postgres LW-locks, because postmaster has no proc entry and so can not wait on this primitive. This is why lockless access algorithm is implemented using two atomic counters to enforce consistent reading of connection string value from shared memory. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-23 07:55:05 +02:00
Sasha Krassovsky	71f495c7f7	Gate it behind feature flags	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	0a7e050144	Fix test one last time	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	55bfa91bd7	Fix test again again	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	d90b2b99df	Fix test again	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	27587e155d	Fix test	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	55aede2762	Prevent duplicate insertions	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	9f186b4d3e	Fix query	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	585687d563	Fix syntax error	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	65a98e425d	Switch to bigint	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	b2e7249979	Sleep	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	844303255a	Cargo fmt	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	6d8df2579b	Fix dumb thing	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	3c3b53f8ad	Update test	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	30064eb197	Add scary comment	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	869acfe29b	Make migrations transactional	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	11a91eaf7b	Uncomment the thread	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	394ef013d0	Push the migrations test	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	a718287902	Make migrations happen on a separate thread	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	2eac1adcb9	Make clippy happy	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	3f90b2d337	Fix test_ddl_forwarding	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	a40ed86d87	Add test for migrations, add initial migration	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	1bf8bb88c5	Add support for migrations within compute_ctl	2024-01-22 14:53:29 -08:00
Vlad Lazar	f1901833a6	pageserver_api: migrate keyspace related functions from `pgdatadir_mapping` (#6406 ) The idea is to achieve separation between keyspace layout definition and operating on said keyspace. I've inlined all these function since they're small and we don't use LTO in the storage release builds at the moment. Closes https://github.com/neondatabase/neon/issues/6347	2024-01-22 19:16:38 +00:00
Arthur Petukhovsky	b41ee81308	Log warning on slow WAL removal (#6432 ) Also add `safekeeper_active_timelines` metric. Should help investigating #6403	2024-01-22 18:38:05 +00:00
Christian Schwarz	205b6111e6	attachment_service: /attach-hook: correctly handle detach (#6433 ) Before this patch, we would update the `tenant_state.intent` in memory but not persist the detachment to disk. I noticed this in https://github.com/neondatabase/neon/pull/6214 where we stop, then restart, the attachment service.	2024-01-22 18:27:05 +00:00
John Spray	93572a3e99	pageserver: mark tenant broken when cancelling attach (#6430 ) ## Problem When a tenant is in Attaching state, and waiting for the `concurrent_tenant_warmup` semaphore, it also listens for the tenant cancellation token. When that token fires, Tenant::attach drops out. Meanwhile, Tenant::set_stopping waits forever for the tenant to exit Attaching state. Fixes: https://github.com/neondatabase/neon/issues/6423 ## Summary of changes - In the absence of a valid state for the tenant, it is set to Broken in this path. A more elegant solution will require more refactoring, beyond this minimal fix.	2024-01-22 15:50:32 +00:00
Christian Schwarz	15c0df4de7	fixup(#6037 ): actually fix the issue, #6388 failed to do so (#6429 ) Before this patch, the select! still returned immediately if `futs` was empty. Must have tested a stale build in my manual testing of #6388.	2024-01-22 14:27:29 +00:00
Anna Khanova	3290fb09bf	Proxy: fix gc (#6426 ) ## Problem Gc currently doesn't work properly. ## Summary of changes Change statement on running gc.	2024-01-22 13:24:10 +00:00
hamishc	efdb2bf948	Added missing PG_VERSION arg into compute node dockerfile (#6382 ) ## Problem If you build the compute-node dockerfile with the PG_VERSION argument passed in (e.g. `docker build -f Dockerfile.compute-node --build-arg PG_VERSION=v15 .`), it fails, as some of the stages don't have the PG_VERSION arg defined. ## Summary of changes Added the PG_VERSION arg to the plv8-build, neon-pg-ext-build, and pg-embedding-pg-build stages of Dockerfile.compute-node	2024-01-22 11:05:27 +00:00
Conrad Ludgate	5559b16953	bump shlex (#6421 ) ## Problem https://rustsec.org/advisories/RUSTSEC-2024-0006 ## Summary of changes `cargo update -p shlex`	2024-01-22 09:14:30 +00:00
Konstantin Knizhnik	1aea65eb9d	Fix potential overflow in update_next_xid (#6412 ) ## Problem See https://neondb.slack.com/archives/C06F5UJH601/p1705731304237889 Adding 1 to xid in `update_next_xid` can cause overflow in debug mode. 0xffffffff is valid transaction ID. ## Summary of changes Use `wrapping_add` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-21 22:11:00 +02:00
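The overflow fixed in #6412 comes from Rust's integer semantics: in debug builds `xid + 1` panics when `xid` is `0xffffffff`, whereas `wrapping_add` wraps around to 0. A minimal sketch of the difference, using a hypothetical helper `xid_plus_one` rather than the actual `update_next_xid` code:

```rust
/// Hypothetical helper standing in for the "xid + 1" step; not the neon code.
fn xid_plus_one(xid: u32) -> u32 {
    // A plain `xid + 1` panics with "attempt to add with overflow" in debug
    // builds when xid == u32::MAX. wrapping_add is defined for every input.
    xid.wrapping_add(1)
}

/// checked_add makes the overflow case explicit instead of wrapping.
fn xid_plus_one_checked(xid: u32) -> Option<u32> {
    xid.checked_add(1)
}
```

Here `xid_plus_one(0xffff_ffff)` yields 0 and `xid_plus_one_checked(0xffff_ffff)` yields `None`. How the real function treats the wrapped value (for example, skipping special XIDs) is outside this sketch.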
Conrad Ludgate	34ddec67d9	proxy small tweaks (#6398 ) ## Problem In https://github.com/neondatabase/neon/pull/6283 I did a couple changes that weren't directly related to the goal of extracting the state machine, so I'm putting them here ## Summary of changes - move postgres vs console provider into another enum - reduce error cases for link auth - slightly refactor link flow	2024-01-21 09:58:42 +01:00
Anna Khanova	9ace36d93c	Proxy: do not store empty key (#6415 ) ## Problem Currently we store in cache even if the project is undefined. That makes invalidation impossible. ## Summary of changes Do not store if project id is empty.	2024-01-20 16:14:53 +00:00
Heikki Linnakangas	e4898a6e60	Don't pass InvalidTransactionId to update_next_xid. (#6410 ) update_next_xid() doesn't have any special treatment for the invalid or other special XIDs, so it will treat InvalidTransactionId (0) as a regular XID. If old nextXid is smaller than 2^31, 0 will look like a very old XID, and nothing happens. But if nextXid is greater than 2^31 0 will look like a very new XID, and update_next_xid() will incorrectly bump up nextXID.	2024-01-20 18:04:16 +02:00
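The "very old" versus "very new" behaviour described in #6410 follows from comparing 32-bit XIDs on a circle. The sketch below models that with a signed-difference comparison in the style PostgreSQL uses for normal XIDs. It assumes `update_next_xid` compares in a comparable way, and the function name `xid_precedes` is hypothetical.

```rust
/// Circular comparison of 32-bit XIDs: `a` precedes `b` if the signed
/// 32-bit difference a - b is negative. No special-casing of XID 0,
/// mirroring the lack of special treatment described in the commit message.
fn xid_precedes(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}
```

With `next_xid = 1000` (below 2^31), `xid_precedes(0, 1000)` is true, so XID 0 looks old and nothing happens. With `next_xid = 3_000_000_000` (above 2^31), `xid_precedes(0, 3_000_000_000)` is false while `xid_precedes(3_000_000_000, 0)` is true, so 0 looks newer than `next_xid` and would incorrectly bump it.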
Joonas Koivunen	c77981289c	build: terminate long running tests (#6389 ) configures nextest to kill tests after 1 minute. slow period is set to 20s which is how long our tests currently take in total, there will be 2 warnings and then the test will be killed and its output logged. Cc: #6361 Cc: #6368 -- likely this will be enough for longer time, but it will be counter productive when we want to attach and debug; the added line would have to be commented out.	2024-01-20 17:41:55 +02:00
Anna Khanova	f003dd6ad5	Remove rename in parameters (#6411 ) ## Problem Name in notifications is not compatible with console name. ## Summary of changes Rename fields to make it compatible.	2024-01-20 10:20:53 +00:00
Conrad Ludgate	7e7e9f5191	proxy: add more columns to parquet upload (#6405 ) ## Problem Some fields were missed in the initial spec. ## Summary of changes Adds a success boolean (defaults to false unless specifically marked as successful). Adds a duration_us integer that tracks how many microseconds were taken from session start through to request completion.	2024-01-20 09:38:11 +00:00
Christian Schwarz	760a48207d	fixup(#6037 ): page_service hangs up within 10ms if there's no message (#6388 ) From #6037 on, until this patch, if the client opens the connection but doesn't send a `PagestreamFeMessage` within the first 10ms, we'd close the connection because `self.timeline_cancelled()` returns. It returns because `self.shard_timelines` is still empty at that point: it gets filled lazily within the handlers for the incoming messages. Changes ------- The question is: if we can't check for timeline cancellation, what else do we need to be cancellable for? `tenant.cancel` is also a bad choice because the `tenant` (shard) we pick at the top of handle_pagerequests might indeed go away over the course of the connection lifetime, but other shards may still be there. The correct solution, I think, is to be responsive to task_mgr cancellation, because the connection handler runs in a task_mgr task and it is already the current canonical way we shut down a tenant's / timeline's page_service connections (see `Tenant::shutdown` / `Timeline::shutdown`). So, rename the function and make it sensitive to task_mgr cancellation.	2024-01-19 19:16:01 +00:00
Arseny Sher	88df057531	Delete WAL segments from s3 when timeline is deleted. In the most straightforward way; safekeeper performs it in DELETE endpoint implementation, with no coordination between sks. delete_force endpoint in the code is renamed to delete as there is only one way to delete.	2024-01-19 20:11:24 +04:00
Alexander Bayandin	c65ac37a6d	zenbenchmark: attach perf results to allure report (#6395 ) ## Problem For PRs with `run-benchmarks` label, we don't upload results to the db, making it harder to debug such tests. The only way to see some numbers is by examining GitHub Action output which is really inconvenient. This PR adds zenbenchmark metrics to Allure reports. ## Summary of changes - Create a json file with zenbenchmark results and attach it to allure report	2024-01-18 20:59:43 +00:00
Arthur Petukhovsky	a092127b17	Fix truncateLsn initialization (#6396 ) In `7f828890cf` we changed the logic for persisting control_files. Previously it was updated if `peer_horizon_lsn` jumped more than one segment, which made `peer_horizon_lsn` initialized on disk as soon as safekeeper has received a first `AppendRequest`. This caused an issue with `truncateLsn`, which now can be zero sometimes. This PR fixes it, and now `truncateLsn/peer_horizon_lsn` can never be zero once we know `timeline_start_lsn`. Closes https://github.com/neondatabase/neon/issues/6248	2024-01-18 18:55:24 +00:00
Christian Schwarz	e8f773387d	pagebench: avoid noise about `CopyFail` in PS logs (#6392 ) Before this patch, pagebench get-page-latest-lsn would sometimes cause noisy errors in pageserver log about `CopyFail` protocol message. refs https://github.com/neondatabase/neon/issues/6390	2024-01-18 18:50:42 +00:00
Christian Schwarz	00936d19e1	pagebench: use tracing panic hook (#6393 )	2024-01-18 18:39:38 +00:00
Joonas Koivunen	57155ada77	temp: human readable summaries for relative access time compared to absolute (#6384 ) With testing the new eviction order there is a problem of all of the (currently rare) disk usage based evictions being rare and unique; this PR adds a human readable summary of what absolute order would have done and what the relative order does. Assumption is that these loggings will make the few eviction runs in staging more useful. Cc: #5304 for allowing testing in the staging	2024-01-18 17:21:08 +02:00
Konstantin Knizhnik	02b916d3c9	Use [NEON_SMGR] tag for all messages in neon extension (#6313 ) ## Problem Use [NEON_SMGR] for all log messages produced by neon extension. ## Summary of changes ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-18 17:08:34 +02:00
Anastasia Lubennikova	e6e013b3b7	Fix pgbouncer settings update: - Start pgbouncer in VM from postgres user, to allow connection to pgbouncer admin console. - Remove unused compute_ctl options --pgbouncer-connstr and --pgbouncer-ini-path. - Fix and cleanup code of connection to pgbouncer, add retries because pgbouncer may not be instantly ready when compute_ctl starts.	2024-01-18 11:27:12 +00:00
John Spray	bd19290d9f	pageserver: add shard_id to metric labels (#6308 ) ## Problem tenant_id/timeline_id is no longer a full identifier for metrics from a `Tenant` or `Timeline` object. Closes: https://github.com/neondatabase/neon/issues/5953 ## Summary of changes Include `shard_id` label everywhere we have `tenant_id`/`timeline_id` label.	2024-01-18 10:52:18 +00:00
Joonas Koivunen	a584e300d1	test: figure out the relative eviction order assertions (#6375 ) I just failed to see this earlier on #6136. layer counts are used as an abstraction, and each of the two tenants lose proportionally about the same amount of layers. sadly there is no difference in between `relative_spare` and `relative_equal` as both of these end up evicting the exact same amount of layers, but I'll try to add later another test for those. Cc: #5304	2024-01-18 12:39:45 +02:00
Joonas Koivunen	e247ddbddc	build: update h2 (#6383 ) Notes: https://github.com/hyperium/h2/releases/tag/v0.3.24 Related: https://rustsec.org/advisories/RUSTSEC-2024-0003	2024-01-18 09:54:15 +00:00
Konstantin Knizhnik	0dc4c9b0b8	Relsize hash lru eviction (#6353 ) ## Problem Currently relation hash size is limited by "neon.relsize_hash_size" GUC with default value 64k. 64k relations is not so small number... but it is enough to create 376 databases to exhaust it. ## Summary of changes Use LRU replacement algorithm to prevent hash overflow ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-17 20:34:30 +02:00
John Spray	b6ec11ad78	control_plane: generalize attachment_service to handle sharding (#6251 ) ## Problem To test sharding, we need something to control it. We could write python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with large number of shards/tenants, Rust is a better fit efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high level such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs. 
These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-01-17 18:01:08 +00:00
John Spray	4cec95ba13	pageserver: add list API for LocationConf (#6329 ) ## Problem The `/v1/tenant` listing API only applies to attached tenants. For an external service to implement a global reconciliation of its list of shards vs. what's on the pageserver, we need a full view of what's in TenantManager, including secondary tenant locations, and InProgress locations. Dependency of https://github.com/neondatabase/neon/pull/6251 ## Summary of changes - Add methods to Tenant and SecondaryTenant to reconstruct the LocationConf used to create them. - Add `GET /v1/location_config` API	2024-01-17 13:34:51 +00:00
Arpad Müller	ab86060d97	Copy initdb if loading from different timeline ID (#6363 ) Previously, if we: 1. created a new timeline B from a different timeline's A initdb 2. deleted timeline A the initdb for timeline B would be gone, at least in a world where we are deleting initdbs upon timeline deletion. This world is imminent (#6226). Therefore, if the pageserver is instructed to load the initdb from a different timeline ID, copy it to the newly created timeline's directory in S3. This ensures that we can disaster recover the new timeline as well, regardless of whether the original timeline was deleted or not. Part of https://github.com/neondatabase/neon/issues/5282.	2024-01-17 12:42:42 +01:00
Arpad Müller	6ffdcfe6a4	remote_storage: unify azure and S3 tests (#6364 ) The remote_storage crate contains two copies of each test, one for azure and one for S3. The repetition is not necessary and makes the tests more prone to drift, so we remove it by moving the tests into a shared module. The module has a different name depending on where it is included, so that each test still has "s3" or "azure" in its full path, allowing you to just test the S3 test or just the azure tests. Earlier PR that removed some duplication already: #6176 Fixes #6146.	2024-01-16 18:45:19 +01:00
Arpad Müller	4b0204ede5	Add copy operation tests and implement them for azure blobs (#6362 ) This implements the `copy` operation for azure blobs, added to S3 by #6091, and adds tests both to s3 and azure ensuring that the copy operation works.	2024-01-16 12:07:20 +00:00
John Spray	bf4e708646	pageserver: eviction for secondary mode tenants (#6225 ) Follows #6123 Closes: https://github.com/neondatabase/neon/issues/5342 The approach here is to avoid using `Layer` from secondary tenants, and instead make the eviction types (e.g. `EvictionCandidate`) have a variant that carries a Layer for attached tenants, and a different variant for secondary tenants. Other changes: - EvictionCandidate no longer carries a `Timeline`: this was only used for providing a witness reference to remote timeline client. - The types for returning eviction candidates are all in disk_usage_eviction_task.rs now, whereas some of them were in timeline.rs before. - The EvictionCandidate type replaces LocalLayerInfoForDiskUsageEviction type, which was basically the same thing.	2024-01-16 10:29:26 +00:00
John Spray	887e94d7da	page_service: more efficient page_service -> shard lookup (#6037 ) ## Problem In #5980 the page service connection handler gets a simple piece of logic for finding the right Timeline: at connection time, it picks an arbitrary Timeline, and then when handling individual page requests it checks if the original timeline is the correct shard, and if not looks one up. This is pretty slow in the case where we have to go look up the other timeline, because we take the big tenants manager lock. ## Summary of changes - Add a `shard_timelines` map of ShardIndex to Timeline on the page service connection handler - When looking up a Timeline for a particular ShardIndex, consult `shard_timelines` to avoid hitting the TenantsManager unless we really need to. - Re-work the CancellationToken handling, because the handler now holds gateguards on multiple timelines, and so must respect cancellation of _any_ timeline it has in its cache, not just the timeline related to the request it is currently servicing. --------- Co-authored-by: Vlad Lazar <vlad@neon.tech>	2024-01-16 09:39:19 +00:00
John Spray	df9e9de541	pageserver: API updates for sharding (#6330 ) The theme of the changes in this PR is that they're enablers for #6251 which are superficial struct/api changes. This is a spinoff from #6251: - Various APIs + clients thereof take TenantShardId rather than TenantId - The creation API gets a ShardParameters member, which may be used to configure shard count and stripe size. This enables the attachment service to present a "virtual pageserver" creation endpoint that creates multiple shards. - The attachment service will use tenant size information to drive shard splitting. Make a version of `TenantHistorySize` that is usable for decoding these API responses. - ComputeSpec includes a shard stripe size.	2024-01-16 09:21:00 +00:00
Anna Khanova	3f2187eb92	Proxy relax sni check (#6323 ) ## Problem Using the same domain name () for serverless driver can help with connection caching. https://github.com/neondatabase/neon/issues/6290 ## Summary of changes Relax SNI check.	2024-01-16 08:42:13 +00:00
John Khvatov	2a3cfc9665	Remove PAGE_CACHE_ACQUIRE_PINNED_SLOT_TIME histogram. (#6356 ) Fixes #6343. ## Problem PAGE_CACHE_ACQUIRE_PINNED_SLOT_TIME is used on hot path and it adds noticeable latency to GetPage@LSN. ## Refs https://discordapp.com/channels/1176467419317940276/1195022264115151001/1196370689268125716	2024-01-15 17:19:19 +01:00
Cihan Demirci	d34adf46b4	do not provide disclaimer input for the deploy-prod workflow (#6360 ) We've removed this input from the deploy-prod workflow.	2024-01-15 16:15:34 +00:00
Conrad Ludgate	0bac8ddd76	proxy: fix serverless error message info (#6279 ) ## Problem https://github.com/neondatabase/serverless/issues/51#issuecomment-1878677318 ## Summary of changes 1. When we have a db_error, use db_error.message() as the message. 2. include error position. 3. line should be a string (weird?) 4. `datatype` -> `dataType`	2024-01-15 16:43:19 +01:00
Christian Schwarz	0e1ef3713e	fix(pagebench): #6325 broke running without `--runtime` (#6351 ) After PR #6325, when running without --runtime, we wouldn't wait for start_work_barrier, causing the benchmark to not start at all.	2024-01-15 08:54:19 +00:00
Konstantin Knizhnik	31a4eb40b2	Do not suspend compute if autovacuum is active (#6322 ) ## Problem See https://github.com/orgs/neondatabase/projects/49/views/13?pane=issue&itemId=48282912 ## Summary of changes Do not suspend compute if there are active autovacuum workers ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-14 09:33:57 +02:00
Arpad Müller	60ced06586	Fix timeline creation and tenant deletion race (#6310 ) Fixes the race condition between timeline creation and tenant deletion outlined in #6255. Related: #5914, which is a similar race condition about the uninit marker file. Fixes #6255	2024-01-13 09:15:58 +01:00
Christian Schwarz	b76454ae41	add script to set up EC2 storage-optimized instance store for benchmarking (#6350 ) Been using this all the time in https://github.com/neondatabase/neon/pull/6214 Part of https://github.com/neondatabase/neon/issues/5771 Should consider this in https://github.com/neondatabase/neon/issues/6297	2024-01-12 19:25:17 +00:00
Arthur Petukhovsky	97b48c23f8	Compact some compute_ctl logs (#6346 ) Print postgres roles in a single line and add some info.	2024-01-12 18:24:22 +00:00
Christian Schwarz	cd48ea784f	TenantInfo: expose generation number (#6348 ) Generally useful when debugging / troubleshooting. I found this useful when manually duplicating a tenant from a script[^1] where I can't use `neon_fixtures.Pageserver.tenant_attach`'s automatic integration with the neon_local's attachment_service. [^1]: https://github.com/neondatabase/neon/pull/6349	2024-01-12 18:27:11 +01:00
Alexey Kondratov	1c432d5492	[compute_ctl] Do not miss short-living connections (#6008 ) ## Problem Currently, activity monitor in `compute_ctl` has 500 ms polling interval. It also looks at the list of current client backends, looking for an active one or one with the most recent state change. This means we can miss short-living connections. Yet, during testing this PR I realized that it's usually not a problem with pooled connections, as pgbouncer maintains connections to Postgres even though client connections are short-living. We can still miss direct connections. ## Summary of changes This commit introduces another way to detect user activity on the compute. It polls a sum of `active_time` and sum of `sessions` from all non-system databases in the `pg_stat_database` [1]. If a user runs some queries or just opens a direct connection, it will rise; if a user drops a db, it can go down, but it's still a change and will be detected as activity. New statistic-based logic seems to be working fine. Yet, after having it running for a couple of hours I've seen several odd cases with connections via pgbouncer: 1. Sometimes, if you run just `psql pooler_connstr -c 'select 1;'` `active_time` might not be updated immediately, and it may take a couple of dozen seconds. This doesn't seem critical, though. 2. Same query with pooler, `active_time` can be bumped a bit, then pgbouncer keeps open connection to Postgres for ~10 minutes, then it disconnects, and `active_time` could be bumped a bit again. 'Could be' because I've seen it once, but it didn't reproduce on a second try. I think this can create false-positives (hopefully rare), when we will not suspend some computes because of lagged statistics update OR because some non-user processes will try to connect to user databases. Currently, we don't touch them outside of startup and `postgres_exporter` is configured not to discover other databases, but this can change in the future. 
New behavior is covered by feature flag `activity_monitor_experimental`, which should be provided by control plane via neondatabase/cloud#9171 [1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW Related to neondatabase/cloud#7966, neondatabase/cloud#7198	2024-01-12 18:15:41 +01:00
Vlad Lazar	02c6abadf0	pageserver: remove dependency of pagebench on pageserver (#6334 ) To achieve this I had to lift the BlockNumber and key_to_rel_block definitions to pageserver_api (similar to a change in #5980). Closes #6299	2024-01-12 17:11:19 +00:00
John Spray	7af4c676c0	pageserver: only upload initdb from shard 0 (#6331 ) ## Problem When creating a timeline on a sharded tenant, we call into each shard. We don't need to upload the initdb from every shard: only do it on shard zero. ## Summary of changes - Move the initdb upload into a function, and only call it on shard zero.	2024-01-12 15:32:27 +01:00
John Spray	aafe79873c	page_service: handle GetActiveTenantError::Cancelled (#6344 ) ## Problem Occasional test failures with QueryError::Other errors saying "cancelled" that get logged at error severity. ## Summary of changes Avoid casting GetActiveTenantError::Cancelled into QueryError::Other -- it should be QueryError::Shutdown, which is not logged as an error.	2024-01-12 12:43:14 +00:00
Christian Schwarz	eae74383c1	pageserver client: mgmt_api: expose reset API (#6326 ) By-product of some hack work that will be thrown away.	2024-01-12 11:07:16 +00:00
Christian Schwarz	8b657a1481	pagebench: getpage: cancellation & better logging (#6325 ) Needed these while working on https://github.com/neondatabase/neon/issues/5479	2024-01-12 11:53:18 +01:00
Christian Schwarz	42613d4c30	refactor(NeonEnv): shutdown of child processes (#6327 ) Also shuts down `Broker`, which, before this PR, we did start in `start()` but relied on the fixture to stop. Do it a bit earlier so that, after `NeonEnv.stop()` returns, there are no child processes using `repo_dir`. Also, drive-by-fixes inverted logic around `ps_assert_metric_no_errors`, missed during https://github.com/neondatabase/neon/pull/6295 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-01-12 10:23:21 +01:00
Arseny Sher	7f828890cf	Extract safekeeper per timeline state from safekeeper.rs safekeeper.rs is mostly about consensus, but state is wider. Also form SafekeeperState which encapsulates persistent part + in memory layer with API for atomic updates. Moves remote_consistent_lsn back to SafekeeperMemState, fixes its absence from memory dump. Also renames SafekeeperState to TimelinePersistentState, as TimelineMemState and TimelinePersistent state are created.	2024-01-12 10:58:22 +04:00
Sasha Krassovsky	1eb30b40af	Bump postgres version to support CREATE PUBLICATION FOR ALL TABLES	2024-01-11 15:30:33 -08:00
dependabot[bot]	8551a61014	build(deps): bump jinja2 from 3.1.2 to 3.1.3 (#6333 )	2024-01-11 19:49:28 +00:00
Christian Schwarz	087526b81b	neon_local init: add `--force` mode that allows an empty dir (#6328 ) Need this in https://github.com/neondatabase/neon/pull/6214 refs https://github.com/neondatabase/neon/issues/5771	2024-01-11 18:11:44 +00:00
Christian Schwarz	915fba146d	pagebench: getpage: optional keyspace cache file (#6324 ) Proved useful when benchmarking 20k tenant setup when validating https://github.com/neondatabase/neon/issues/5479	2024-01-11 17:42:11 +00:00
Vlad Lazar	da7a7c867e	pageserver: do not bump priority of background task for timeline status requests (#6301 ) ## Problem Previously, `GET /v1/tenant/:tenant_id/timeline` and `GET /v1/tenant/:tenant_id/timeline/:timeline_id` would bump the priority of the background task which computes the initial logical size by cancelling the wait on the synchronisation semaphore. However, the request would still return an approximate logical size. It's undesirable to force background work for a status request. ## Summary of changes This PR updates the priority used by the timeline status requests such that they don't do priority boosting by default anymore. An optional query parameter, `force-await-initial-logical-size`, is added for both mentioned endpoints. When set to true, it will skip the concurrency limiting semaphore and wait for the background task to complete before returning the exact logical size. In order to exercise this behaviour in a test I had to add an extra failpoint. If you think it's too intrusive, it can be removed. Also fixed a small bug where the cancellation of a download is reported as an opaque download failure upstream. This caused `test_location_conf_churn` to fail at teardown due to a WARN log line. Closes https://github.com/neondatabase/neon/issues/6168	2024-01-11 15:55:32 +00:00
Conrad Ludgate	551f0cc097	proxy: refactor how neon-options are handled (#6306 ) ## Problem HTTP connection pool was not respecting the PitR options. ## Summary of changes 1. refactor neon_options a bit to allow easier access to cache_key 2. make HTTP not go through `StartupMessageParams` 3. expose SNI processing to replace what was removed in step 2.	2024-01-11 14:58:31 +00:00
Anna Khanova	a84935d266	Extend unsupported startup parameter error message (#6318 ) ## Problem Unsupported startup parameter error happens with pooled connections. However, the reason for this error might not be obvious to the user. ## Summary of changes Send a more descriptive message with the link to our troubleshooting page: https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter. Resolves: https://github.com/neondatabase/neon/issues/6291	2024-01-11 12:09:26 +00:00
Christian Schwarz	3ee981889f	compaction: avoid no-op timeline dir fsync (#6311 ) Random find while looking at an idle 20k tenant pageserver where each tenant has 9 tiny L0 layers and compaction produces no new L1s / image layers. The aggregate CPU cost of running this every 20s for 20k tenants is actually substantial, due to the use of `spawn_blocking`.	2024-01-11 10:32:39 +00:00
Christian Schwarz	fc66ba43c4	Revert "revert recent VirtualFile asyncification changes (#5291 )" (#6309 ) This reverts commit `ab1f37e908`. Thereby fixes #5479 Updated Analysis ================ The problem with the original patch was that it, for the first time, exposed the `VirtualFile` code to tokio task concurrency instead of just thread-based concurrency. That caused the VirtualFile file descriptor cache to start thrashing, effectively grinding the system to a halt. Details ------- At the time of the original patch, we had a _lot_ of runnable tasks in the pageserver. The symptom that prompted the revert (now being reverted in this PR) is that our production systems fell into a valley of zero goodput, high CPU, and zero disk IOPS shortly after PS restart. We lay out the root cause for that behavior in this subsection. At the time, there was no concurrency limit on the number of concurrent initial logical size calculations. Initial size calculation was initiated for all timelines within the first 10 minutes as part of consumption metrics collection. On a PS with 20k timelines, we'd thus have 20k runnable tasks. Before the original patch, the `VirtualFile` code never returned `Poll::Pending`. That meant that once we entered it, the calling tokio task would not yield to the tokio executor until we were done performing the VirtualFile operation, i.e., doing a blocking IO system call. The original patch switched the VirtualFile file descriptor cache's synchronization primitives to those from `tokio::sync`. It did not change that we were doing synchronous IO system calls. And the cache had more slots than we have tokio executor threads. So, these primitives never actually needed to return `Poll::Pending`. But, the tokio scheduler makes tokio sync primitives return `Pending` artificially, as a mechanism for the scheduler to get back into control more often ([example](https://docs.rs/tokio/1.35.1/src/tokio/sync/batch_semaphore.rs.html#570)). 
So, the new reality was that VirtualFile calls could now yield to the tokio executor. Tokio would pick one of the other 19999 runnable tasks to run. These tasks were also using VirtualFile. So, we now had a lot more concurrency in that area of the code. The problem with more concurrency was that caches started thrashing, most notably the VirtualFile file descriptor cache: each time a task would be rescheduled, it would want to do its next VirtualFile operation. For that, it would first need to evict another (task's) VirtualFile fd from the cache to make room for its own fd. It would then do one VirtualFile operation before hitting an await point and yielding to the executor again. The executor would run the other 19999 tasks for fairness before circling back to the first task, which would find its fd evicted. The other cache that would theoretically be impacted in a similar way is the pageserver's `PageCache`. However, for initial logical size calculation, it seems much less relevant in experiments, likely because of the random access nature of initial logical size calculation. 
Fixes ===== We fixed the above problems by - raising VirtualFile cache sizes - https://github.com/neondatabase/cloud/issues/8351 - changing code to ensure forward-progress once cache slots have been acquired - https://github.com/neondatabase/neon/pull/5480 - https://github.com/neondatabase/neon/pull/5482 - tbd: https://github.com/neondatabase/neon/issues/6065 - reducing the amount of runnable tokio tasks - https://github.com/neondatabase/neon/pull/5578 - https://github.com/neondatabase/neon/pull/6000 - fix bugs that caused unnecessary concurrency induced by connection handlers - https://github.com/neondatabase/neon/issues/5993 I manually verified that this PR doesn't negatively affect startup performance as follows: create a pageserver in production configuration, with 20k tenants/timelines, 9 tiny L0 layer files each; Start it, and observe ``` INFO Startup complete (368.009s since start) elapsed_ms=368009 ``` I further verified in that same setup that, when using `pagebench`'s getpage benchmark at as-fast-as-possible request rate against 5k of the 20k tenants, the achieved throughput is identical. The VirtualFile cache isn't thrashing in that case. Future Work =========== We are still exposed to the cache thrashing risk from outside factors, e.g., request concurrency is unbounded, and initial size calculation skips the concurrency limiter when we establish a walreceiver connection. Once we start thrashing, we will degrade non-gracefully, i.e., encounter a valley as was seen with the original patch. However, we have sufficient means to deal with that unlikely situation: 1. we have dashboards & metrics to monitor & alert on cache thrashing 2. 
we can react by scaling the bottleneck resources (cache size) or by manually shedding load through tenant relocation Potential systematic solutions are future work: * global concurrency limiting * per-tenant rate limiting => #5899 * pageserver-initiated load shedding Related Issues ============== This PR unblocks the introduction of tokio-epoll-uring for asynchronous disk IO ([Epic](#4744)).	2024-01-11 11:29:14 +01:00
Arthur Petukhovsky	544284cce0	Collapse multiline queries in compute_ctl (#6316 )	2024-01-10 22:25:28 +04:00
Arthur Petukhovsky	71beabf82d	Join multiline postgres logs in compute_ctl (#5903 ) Postgres can write multiline logs, and they are difficult to handle after they are mixed with other logs. This PR combines multiline logs from postgres into a single line, where previous line breaks are replaced with unicode zero-width spaces. Then postgres logs are written to stderr with `PG:` prefix. It makes it easy to distinguish postgres logs from all other compute logs with a simple grep, e.g. `\|= "PG:"`	2024-01-10 15:11:43 +00:00
Anna Khanova	76372ce002	Added auth info cache with notifications to redis. (#6208 ) ## Problem Current cache doesn't support any updates from the cplane. ## Summary of changes * Added redis notifier listener. * Added cache which can be invalidated with the notifier. If the notifier is not available, it's just a normal ttl cache. * Updated cplane api. The motivation behind this organization of the data is the following: * In the Neon data model there are projects. Projects could have multiple branches and each branch could have more than one endpoint. * Also there is one special `main` branch. * Password reset works per branch. * Allowed IPs are the same for every branch in the project (except, maybe, the main one). * The main branch can be changed to the other branch. * The endpoint can be moved between branches. Every event described above requires some special processing on the proxy (or cplane) side. The idea of invalidating for the project is that whenever one of the events above is happening with the project, proxy can invalidate all entries for the entire project. This approach also requires some additional API change (returning project_id inside the auth info).	2024-01-10 11:51:05 +00:00
Christian Schwarz	4e1b0b84eb	pagebench: fixup after is_rel_block_key changes in #6266 (#6303 ) PR #6266 broke the getpage_latest_lsn benchmark. Before this patch, we'd fail with ``` not implemented: split up range ``` because `r.start = rel size key` and `r.end = rel size key + 1`. The filtering of the key ranges in that loop is a bit ugly, but, I measured: * setup with 180k layer files (20k tenants * 9 layers). * total physical size is 463GiB * 5k tenants, the range filtering takes `0.6 seconds` on an i3en.3xlarge. That's a tiny fraction of the overall time it takes for pagebench to get ready to send requests. So, this is good enough for now / there are other bottlenecks that are bigger.	2024-01-09 19:00:37 +01:00
John Spray	f94abbab95	pageserver: clean up a redundant tenant_id attribute (#6280 ) This was a small TODO(sharding) thing in TenantHarness.	2024-01-09 12:10:15 +00:00
John Spray	4b9b4c2c36	pageserver: cleanup redundant create/attach code, fix detach while attaching (#6277 ) ## Problem The code for tenant create and tenant attach was just a special case of what upsert_location does. ## Summary of changes - Use `upsert_location` for create and attach APIs - Clean up error handling in upsert_location so that it can generate appropriate HTTP response codes - Update tests that asserted the old non-idempotent behavior of attach - Rework the `test_ignore_while_attaching` test, and fix tenant shutdown during activation, which this test was supposed to cover, but it was actually just waiting for activation to complete.	2024-01-09 10:37:54 +00:00
Arpad Müller	8186f6b6f9	Drop async_trait usage from three internal traits (#6305 ) This uses the [newly stable](https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html) async trait feature for three internal traits. One requires `Send` bounds to be present so uses `impl Future<...> + Send` instead. Advantages: * less macro usage * no extra boxing Disadvantages: * impl syntax needed for `Send` bounds is a bit more verbose (but only required in one place)	2024-01-09 11:20:08 +01:00
Christian Schwarz	90e0219b29	python tests: support overlayfs for NeonEnvBuilder.from_repo_dir (#6295 ) Part of #5771 Extracted from https://github.com/neondatabase/neon/pull/6214 This PR makes the test suite sensitive to the new env var `NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS`. If it is set, `NeonEnvBuilder.from_repo_dir` uses overlayfs to duplicate the snapshot repo dir contents. Since mounting requires root privileges, we use sudo to perform the mounts. That, and macOS support, is also why copytree remains the default. If we ever run on a filesystem with copy reflink support, we should consider that as an alternative. This PR can be tried on a Linux machine on the `test_backward_compatiblity` test, which uses `from_repo_dir`.	2024-01-09 10:15:46 +00:00
Christian Schwarz	4b6004e8c9	fix(page_service client): correctly deserialize pagestream errors (#6302 ) Before this PR, we wouldn't advance the underlying `Bytes`'s cursor. fixes https://github.com/neondatabase/neon/issues/6298	2024-01-09 10:22:43 +01:00
Em Sharnoff	9bf7664049	vm-monitor: Remove spammy log line (#6284 ) During a previous incident, we noticed that this particular line can be repeatedly logged every 100ms if the memory usage is persistently high enough to warrant upscaling. Per the added comment: Ideally we'd still like to include this log line, because it's useful information, but the simple way to include it produces far too many log lines, and the more complex ways to deduplicate the log lines while still including the information are probably not worth the effort right now.	2024-01-08 21:12:39 -08:00
Arpad Müller	d5e3434371	Also allow unnecessary_fallible_conversions lint (#6294 ) This fixes the clippy lint firing on macOS on the conversion which is needed for portability. For some reason, the logic in https://github.com/rust-lang/rust-clippy/pull/11669 to avoid an overlap is not working.	2024-01-09 04:22:36 +00:00
Christian Schwarz	66c52a629a	RFC: vectored `Timeline::get` (#6250 )	2024-01-08 15:00:01 +00:00
Conrad Ludgate	8a646cb750	proxy: add request context for observability and blocking (#6160 ) ## Summary of changes ### RequestMonitoring We want to add an event stream with information on each request for easier analysis than what we can do with diagnostic logs alone (https://github.com/neondatabase/cloud/issues/8807). This RequestMonitoring will keep a record of the final state of a request. On drop it will be pushed into a queue to be uploaded. Because this context is a bag of data, I don't want this information to impact logic of request handling. I personally think that weakly typed data (such as all these options) makes for spaghetti code. I will however allow for this data to impact rate-limiting and blocking of requests, as this does not _really_ change how a request is handled. ### Parquet Each `RequestMonitoring` is flushed into a channel where it is converted into `RequestData`, which is accumulated into parquet files. Each file will have a certain number of rows per row group, and several row groups will eventually fill up the file, which we then upload to S3. We will also upload smaller files if they take too long to construct.	2024-01-08 11:42:43 +00:00
Arpad Müller	a4ac8e26e8	Update Rust to 1.75.0 (#6285 ) [Release notes](https://github.com/rust-lang/rust/releases/tag/1.75.0).	2024-01-08 11:46:16 +01:00
John Spray	b3a681d121	s3_scrubber: updates for sharding (#6281 ) This is a lightweight change to keep the scrubber providing sensible output when using sharding. - The timeline count was wrong when using sharding - When checking for tenant existence, we didn't re-use results between different shards in the same tenant Closes: https://github.com/neondatabase/neon/issues/5929	2024-01-08 09:19:10 +00:00
John Spray	b5ed6f22ae	pageserver: clean up a TODO comment (#6282 ) These functions don't need updating for sharding: it's fine for them to remain shard-naive, as they're only used in the context of dumping a layer file. The sharding metadata doesn't live in the layer file, it lives in the index.	2024-01-08 09:19:00 +00:00
John Spray	d1c0232e21	pageserver: use `pub(crate)` in metrics.rs, and clean up unused items (#6275 ) ## Problem Noticed while making other changes that there were `pub` items that were unused. ## Summary of changes - Make everything `pub(crate)` in metrics.rs, apart from items used from `bin/` - Fix the timelines eviction metric: it was never being incremented - Remove an unused ephemeral_bytes counter.	2024-01-08 03:53:15 +00:00
Arseny Sher	a41c4122e3	Don't suspend compute if there is active LR subscriber. https://github.com/neondatabase/neon/issues/6258	2024-01-06 01:24:44 +04:00
Alexander Bayandin	7de829e475	test_runner: replace black with ruff format (#6268 ) ## Problem `black` is slow sometimes, we can replace it with `ruff format` (a new feature in 0.1.2 [0]), which produces pretty similar to black style [1]. On my local machine (MacBook M1 Pro 16GB): ``` # `black` on main $ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ." Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black . Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s] Range (min … max): 3.047 s … 3.354 s 10 runs ``` ``` # `ruff format` on the current PR $ hyperfine "RUFF_NO_CACHE=true poetry run ruff format" Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms] Range (min … max): 267.5 ms … 420.2 ms 10 runs ``` ## Summary of changes - Replace `black` with `ruff format` everywhere - [0] https://docs.astral.sh/ruff/formatter/ - [1] https://docs.astral.sh/ruff/formatter/#black-compatibility	2024-01-05 15:35:07 +00:00
John Spray	3c560d27a8	pageserver: implement secondary-mode downloads (#6123 ) Follows on from #6050 , in which we upload heatmaps. Secondary locations will now poll those heatmaps and download layers mentioned in the heatmap. TODO: - [X] ~Unify/reconcile stats for behind-schedule execution with warn_when_period_overrun (https://github.com/neondatabase/neon/pull/6050#discussion_r1426560695)~ - [x] Give downloads their own concurrency config independent of uploads Deferred optimizations: - https://github.com/neondatabase/neon/issues/6199 - https://github.com/neondatabase/neon/issues/6200 Eviction will be the next PR: - #5342	2024-01-05 12:29:20 +00:00
Christian Schwarz	d260426a14	is_rel_block_key: exclude the relsize key (#6266 ) Before this PR, `is_rel_block_key` returns true for the blknum `0xffffffff`, which is a blknum that's actually never written by Postgres, but used by Neon Pageserver to store the relsize. Quoting @MMeent: > PostgreSQL can't extend the relation beyond size of 0xFFFFFFFF blocks, > so block number 0xFFFFFFFE is the last valid block number. This PR changes the definition of the function to exclude blknum 0xffffffff. My motivation for doing this change is to fix the `pagebench` getpage benchmark, which uses `is_rel_block_key` to filter the keyspace for valid pages to request from page_service. fixes https://github.com/neondatabase/neon/issues/6210 I checked other users of the function. The first one is `key_is_shard0`, which already had added an exemption for 0xffffffff. So, there's no functional change with this PR. The second one is `DatadirModification::flush`[^1]. With this PR, `.flush()` will skip the relsize key, whereas it didn't before. This means we will pile up all the relsize key-value pairs `(Key,u32)` in `DatadirModification::pending_updates` until `.commit()` is called. The only place I can think of where that would be a problem is if we import from a full basebackup, and don't `.commit()` regularly, like we currently don't do in `import_basebackup_from_tar`. It exposes us to input-controlled allocations. However, that was already the case for the other keys that are skipped, so, one can argue that this change is not making the situation much worse. [^1]: That type's `flush()` and `commit()` methods are terribly named, but, that's for another time	2024-01-05 11:48:06 +01:00
Arthur Petukhovsky	f3b5db1443	Add API for safekeeper timeline copy (#6091 ) Implement API for cloning a single timeline inside a safekeeper. Also add API for calculating a sha256 hash of WAL, which is used in tests. `/copy` API works by copying objects inside S3 for all but the last segments, and the last segments are copied on-disk. A special temporary directory is created for a timeline, because copy can take a lot of time, especially for large timelines. After all file segments have been prepared, this directory is mounted to the main tree and timeline is loaded to memory. Some caveats: - large timelines can take a lot of time to copy, because we need to copy many S3 segments - caller should wait indefinitely for the HTTP call to finish and not close the HTTP connection, because closing it will stop the process, which is not continued in the background - `until_lsn` must be a valid LSN, otherwise bad things can happen - API will return 200 if specified `timeline_id` already exists, even if it's not a copy - each safekeeper will try to copy S3 segments, so it's better to not call this API in-parallel on different safekeepers	2024-01-04 17:40:38 +00:00
John Spray	18e9208158	pageserver: improved error handling for shard routing error, timeline not found (#6262 ) ## Problem - When a client requests a key that isn't found in any shard on the node (edge case that only happens if a compute's config is out of date), we should prompt them to reconnect (as this includes a backoff), since they will not be able to complete the request until they eventually get a correct pageserver connection string. - QueryError::Other is used excessively: this contains a type-ambiguous anyhow::Error and is logged very verbosely (including backtrace). ## Summary of changes - Introduce PageStreamError to replace use of anyhow::Error in request handlers for getpage, etc. - Introduce Reconnect and NotFound variants to QueryError - Map the "shard routing error" case to PageStreamError::Reconnect -> QueryError::Reconnect - Update type conversions for LSN timeouts and tenant/timeline not found errors to use PageStreamError::NotFound->QueryError::NotFound	2024-01-04 10:40:03 +00:00
Sasha Krassovsky	7662df6ca0	Fix minimum backoff to 1ms	2024-01-03 21:09:19 -08:00
John Spray	c119af8ddd	pageserver: run at least 2 background task threads Otherwise an assertion in CONCURRENT_BACKGROUND_TASKS will trip if you try to run the pageserver on a single core.	2024-01-03 14:22:40 +00:00
John Spray	a2e083ebe0	pageserver: make walredo shard-aware This does not have a functional impact, but enables all the logging in this code to include the shard_id label.	2024-01-03 14:22:40 +00:00
John Spray	73a944205b	pageserver: log details on shard routing error	2024-01-03 14:22:40 +00:00
John Spray	34ebfbdd6f	pageserver: fix handling getpage with multiple shards on one node Previously, we would wait for the LSN to be visible on whichever timeline we happened to load at the start of the connection, then proceed to look up the correct timeline for the key and do the read. If the timeline holding the key was behind the timeline we used for the LSN wait, then we might serve an apparently-successful read result that actually contains data from behind the requested lsn.	2024-01-03 14:22:40 +00:00
John Spray	ef7c9c2ccc	pageserver: fix active tenant lookup hitting secondaries with sharding If there is some secondary shard for a tenant on the same node as an attached shard, the secondary shard could trip up this code and cause page_service to incorrectly get an error instead of finding the attached shard.	2024-01-03 14:22:40 +00:00
John Spray	6c79e12630	pageserver: drop unwanted keys during compaction after split	2024-01-03 14:22:40 +00:00
John Spray	753d97bd77	pageserver: don't delete ancestor shard layers	2024-01-03 14:22:40 +00:00
John Spray	edc962f1d7	test_runner: test_issue_5878 log allow list (#6259 ) ## Problem https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6254/7388706419/index.html#suites/5a4b8734277a9878cb429b80c314f470/e54c4f6f6ed22672 ## Summary of changes Permit the log message: because the test helper's detach function increments the generation number, a detach/attach cycle can cause the error if the test runner node is slow enough for the opportunistic deletion queue flush on detach not to complete by the time we call attach.	2024-01-03 14:22:17 +00:00
Arseny Sher	65b4e6e7d6	Remove empty safekeeper init since truncateLsn. It has caveats such as creating half empty segment which can't be offloaded. Instead we'll pursue approach of pull_timeline, seeding new state from some peer.	2024-01-03 18:20:19 +04:00
Alexander Bayandin	17b256679b	vm-image-spec: build pgbouncer from Neon's fork (#6249 ) ## Problem We need to add one more patch to pgbouncer (for https://github.com/neondatabase/neon/issues/5801). I've decided to cherry-pick all required patches to a pgbouncer fork (`neondatabase/pgbouncer`) and use it instead. See https://github.com/neondatabase/pgbouncer/releases/tag/pgbouncer_1_21_0-neon-1 ## Summary of changes - Revert the previous patch (for deallocate/discard all) — the fork already contains it. - Remove `libssl-dev` dependency — we build pgbouncer without `openssl` support. - Clone git tag and build pgbouncer from source code.	2024-01-03 13:02:04 +00:00
John Spray	673a865055	tests: tolerate 304 when evicting layers (#6261 ) In tests that evict layers, explicit eviction can race with automatic eviction of the same layer and result in a 304	2024-01-03 11:50:58 +00:00
Cuong Nguyen	fb518aea0d	Add batch ingestion mechanism to avoid high contention (#5886 ) ## Problem For context, this problem was observed in a research project where we try to make neon run in multiple regions and I was asked by @hlinnaka to make this PR. In our project, we use the pageserver in a non-conventional way such that we would send a larger number of requests to the pageserver than normal (imagine postgres without the buffer pool). I measured the time from the moment a WAL record left the safekeeper to when it reached the pageserver ([code](`e593db1f5a/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs (L282-L287)`)) and observed that when the number of get_page_at_lsn requests was high, the wal receiving time increased significantly (see the left side of the graphs below). Upon further investigation, I found that the delay was caused by this line `d2ca410919/pageserver/src/tenant/timeline.rs (L2348)` The `get_layer_for_write` method is called for every value during WAL ingestion and it tries to acquire layers write lock every time, thus this results in high contention when read lock is acquired more frequently. ![Untitled](https://github.com/neondatabase/neon/assets/6244849/85460f4d-ead1-4532-bc64-736d0bfd7f16) ![Untitled2](https://github.com/neondatabase/neon/assets/6244849/84199ab7-5f0e-413b-a42b-f728f2225218) ## Summary of changes It is unnecessary to call `get_layer_for_write` repeatedly for all values in a WAL message since they would end up in the same memory layer anyway, so I created the batched versions of `InMemoryLayer::put_value`, `InMemoryLayer ::put_tombstone`, `Timeline::put_value`, and `Timeline::put_tombstone`, that acquire the locks once for a batch of values. Additionally, `DatadirModification` is changed to store multiple versions of uncommitted values, and `WalIngest::ingest_record()` can now ingest records without immediately committing them. 
With these new APIs, the new ingestion loop can be changed to commit for every `ingest_batch_size` records. The `ingest_batch_size` variable is exposed as a config. If it is set to 1 then we get the same behavior as before this change. I found that setting this value to 100 seems to work the best, and you can see its effect on the right side of the above graphs. --------- Co-authored-by: John Spray <john@neon.tech>	2024-01-03 10:41:58 +00:00
John Spray	42f41afcbd	tests: update pytest and boto3 dependencies (#6253 ) ## Problem The version of pytest we were using emits a number of DeprecationWarnings on latest python: these are fixed in latest release. boto3 and python-dateutil also have deprecation warnings, but unfortunately these aren't fixed upstream yet. ## Summary of changes - Update pytest - Update boto3 (this doesn't fix deprecation warnings, but by the time I figured that out I had already done the update, and it's good hygiene anyway)	2024-01-03 10:36:53 +00:00
Arseny Sher	f71110383c	Remove second check for max_slot_wal_keep_size download size. Already checked in GetLogRepRestartLSN, a rebase artifact.	2024-01-03 13:13:32 +04:00
Arseny Sher	ae3eaf9995	Add [WP] prefix to all walproposer logging. - rename walpop_log to wp_log - create also wpg_log which is used in postgres-specific code - in passing format messages to start with lower case	2024-01-03 11:10:27 +04:00
Christian Schwarz	aa9f1d4b69	pagebench get-page: default to latest=true, make configurable via flag (#6252 ) fixes https://github.com/neondatabase/neon/issues/6209	2024-01-02 16:57:29 +00:00
Joonas Koivunen	946c6a0006	scrubber: use adaptive config with retries, check subset of tenants (#6219 ) The tool still needs a lot of work. These are the easiest fix and feature: - use similar adaptive config with s3 as remote_storage, use retries - process only particular tenants Tenants need to be from the correct region, they are not deduplicated, but the feature is useful for re-checking a small number of tenants after a large run.	2024-01-02 15:22:16 +00:00
Sasha Krassovsky	ce13281d54	MIN not MAX	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	4e1d16f311	Switch to exponential rate-limiting	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	091a0cda9d	Switch to rate-limiting strategy	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	ea9fad419e	Add exponential backoff to page_server->send	2024-01-02 06:28:49 -08:00
Arseny Sher	e92c9f42c0	Don't split WAL record across two XLogData's when sending from safekeepers. As protocol demands. Not following this makes standby complain about corrupted WAL in various ways. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 closes https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:50:20 +04:00
Arseny Sher	aaaa39d9f5	Add large insertion and slow WAL sending to test_hot_standby. To exercise MAX_SEND_SIZE sending from safekeeper; we've had a bug with WAL records torn across several XLogData messages. Add failpoint to safekeeper to slow down sending. Also check for corrupted WAL complaints in standby log. Make the test a bit simpler in passing, e.g. we don't need explicit commits as autocommit is enabled by default. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:50:20 +04:00
Arseny Sher	e79a19339c	Add failpoint support to safekeeper. Just a copy paste from pageserver.	2024-01-02 10:50:20 +04:00
Arseny Sher	dbd36e40dc	Move failpoint support code to utils. To enable them in safekeeper as well.	2024-01-02 10:50:20 +04:00
Arseny Sher	90ef48aab8	Fix safekeeper START_REPLICATION (term=n). It was giving WAL only up to commit_lsn instead of flush_lsn, so recovery of uncommitted WAL since `cdb08f03` hung. Add test for this.	2024-01-01 20:44:05 +04:00
Arseny Sher	9a43c04a19	compute_ctl: kill postgres and sync-safekeepers on exit. Otherwise they are left orphaned when compute_ctl is terminated with a signal. It was invisible most of the time because normally neon_local or k8s kills postgres directly and then compute_ctl finishes gracefully. However, in some tests compute_ctl gets stuck waiting for sync-safekeepers which intentionally never ends because safekeepers are offline, and we want to stop compute_ctl without leaving orphans behind. This is quite a rough approach which doesn't wait for children termination. A better way would be to convert compute_ctl to async which would make waiting easy.	2024-01-01 20:44:05 +04:00
Abhijeet Patil	f28bdb6528	Use nextest for rust unittests (#6223 ) ## Problem `cargo test` doesn't support timeouts or junit output format ## Summary of changes - Add `nextest` to `build-tools` image - Switch `cargo test` with `cargo nextest` on CI - Set timeout	2023-12-30 13:45:31 +00:00
Conrad Ludgate	1c037209c7	proxy: fix compute addr parsing (#6237 ) ## Problem control plane should be able to return domain names and not just IP addresses. ## Summary of changes 1. add regression tests 2. use rsplit to split the port from the back, then trim the ipv6 brackets	2023-12-29 09:32:24 +00:00
Bodobolero	e5a3b6dfd8	Pg stat statements reset for neon superuser (#6232 ) ## Problem Extension pg_stat_statements has function pg_stat_statements_reset(). In vanilla Postgres this function can only be called by superuser role or other users/roles explicitly granted. In Neon no end user can use superuser role. Instead we have neon_superuser role. We need to grant execute on pg_stat_statements_reset() to neon_superuser ## Summary of changes Modify the Postgres v14, v15, v16 contrib in our compute docker file to grant execute on pg_stat_statements_reset() to neon_superuser. (Modifying it in our docker file is preferable to changes in neondatabase/postgres because we want to limit the changes in our fork that we have to carry with each new version of Postgres). Note that the interface of proc/function pg_stat_statements_reset changed in pg_stat_statements version 1.7 So for versions up to and including 1.6 we must `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO neon_superuser;` and for versions starting from 1.7 we must `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint) TO neon_superuser;` If we just use `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO neon_superuser;` for all version this results in the following error for versions 1.7+: ```sql neondb=> create extension pg_stat_statements; ERROR: function pg_stat_statements_reset() does not exist ``` ## Checklist before requesting a review - [x ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. 
## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist ## I have run the following test and could now invoke pg_stat_statements_reset() using default user ```bash (neon) peterbendel@Peters-MBP neon % kubectl get pods \| grep compute-quiet-mud-88416983 compute-quiet-mud-88416983-74f4bf67db-crl4c 3/3 Running 0 7m26s (neon) peterbendel@Peters-MBP neon % kubectl set image deploy/compute-quiet-mud-88416983 compute-node=neondatabase/compute-node-v15:7307610371 deployment.apps/compute-quiet-mud-88416983 image updated (neon) peterbendel@Peters-MBP neon % psql postgresql://peterbendel:<secret>@ep-bitter-sunset-73589702.us-east-2.aws.neon.build/neondb psql (16.1, server 15.5) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. neondb=> select version(); version --------------------------------------------------------------------------------------------------- PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit (1 row) neondb=> create extension pg_stat_statements; CREATE EXTENSION neondb=> select pg_stat_statements_reset(); pg_stat_statements_reset -------------------------- (1 row) ```	2023-12-27 18:15:17 +01:00
Sasha Krassovsky	136aab5479	Bump postgres submodule versions	2023-12-27 08:39:00 -08:00
Anastasia Lubennikova	6e40900569	Manage pgbouncer configuration from compute_ctl: - add pgbouncer_settings section to compute spec; - add pgbouncer-connstr option to compute_ctl. - add pgbouncer-ini-path option to compute_ctl. Default: /etc/pgbouncer/pgbouncer.ini Apply pgbouncer config on compute start and respec to override default spec. Save pgbouncer config updates to pgbouncer.ini to preserve them across pgbouncer restarts.	2023-12-26 15:17:09 +00:00
Arseny Sher	ddc431fc8f	pgindent walproposer condvar comment	2023-12-26 14:12:53 +04:00
Arseny Sher	bfc98f36e3	Refactor handling responses in walproposer. Remove confirm_wal_streamed; we already apply both write and flush positions of the slot to commit_lsn which is fine because 1) we need to wake up waiters 2) committed WAL can be fetched from safekeepers by neon_walreader now.	2023-12-26 14:12:53 +04:00
Arseny Sher	d5fbfe2399	Remove test_wal_deleted_after_broadcast. It is superseded by stronger test_lagging_sk.	2023-12-26 14:12:53 +04:00
Arseny Sher	1f1c50e8c7	Don't re-add neon_walreader socket to waiteventset if possible. Should make recovery slightly more efficient (likely negligibly).	2023-12-26 14:12:53 +04:00
Arseny Sher	854df0f566	Do PQgetCopyData before PQconsumeInput in libpqwp_async_read. To avoid a lot of redundant memmoves and bloated input buffer. fixes https://github.com/neondatabase/neon/issues/6055	2023-12-26 14:12:53 +04:00
Arseny Sher	9c493869c7	Perform synchronous WAL download in wp only for logical replication. wp -> sk communication now uses neon_walreader which will fetch missing WAL on demand from safekeepers, so doesn't need this anymore. Also, cap WAL download by max_slot_wal_keep_size to be able to start compute if lag is too high.	2023-12-26 14:12:53 +04:00
Arseny Sher	df760e6de5	Add test_lagging_sk.	2023-12-26 14:12:53 +04:00
Arseny Sher	14913c6443	Adapt rust walproposer to neon_walreader.	2023-12-26 14:12:53 +04:00
Arseny Sher	cdb08f0362	Introduce NeonWALReader downloading sk -> compute WAL on demand. It is similar to XLogReader, but when either requested segment is missing locally or requested LSN is before basebackup_lsn NeonWALReader asynchronously fetches WAL from one of safekeepers. Patch includes walproposer switch to NeonWALReader, splitting wouldn't make much sense as it is hard to test otherwise. This finally removes risk of pg_wal explosion (as well as slow start time) when one safekeeper is lagging, at the same time allowing to recover it. In the future reader should also be used by logical walsender for similar reasons (currently we download the tail on compute start synchronously). The main test is test_lagging_sk. However, I also run it manually a lot varying MAX_SEND_SIZE on both sides (on safekeeper and on walproposer), testing various fragmentations (one side having small buffer, another, both), which brought up https://github.com/neondatabase/neon/issues/6055 closes https://github.com/neondatabase/neon/issues/1012	2023-12-26 14:12:53 +04:00
Konstantin Knizhnik	572bc06011	Do not copy WAL for lagged slots (#6221 ) ## Problem See https://neondb.slack.com/archives/C026T7K2YP9/p1702813041997959 ## Summary of changes Do not take invalidated slots into account when calculating the restart_lsn position for basebackup at the page server ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-12-22 20:47:55 +02:00
Arpad Müller	a7342b3897	remote_storage: store last_modified and etag in Download (#6227 ) Store the content of the `last-modified` and `etag` HTTP headers in `Download`. This serves both as the first step towards #6199 and as a preparation for tests in #6155 .	2023-12-22 14:13:20 +01:00
John Spray	e68ae2888a	pageserver: expedite tenant activation on delete (#6190 ) ## Problem During startup, a tenant delete request might have to retry for many minutes waiting for a tenant to enter Active state. ## Summary of changes - Refactor delete_tenant into TenantManager: this is not a functional change, but will avoid merge conflicts with https://github.com/neondatabase/neon/pull/6105 later - Add 412 responses to the swagger definition of this endpoint. - Use Tenant::wait_to_become_active in `TenantManager::delete_tenant` --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-12-22 10:22:22 +00:00
Arpad Müller	83000b3824	buildtools: update protoc and mold (#6222 ) These updates aren't very important but I would like to try out the new process as of #6195	2023-12-21 18:07:21 +01:00
Arpad Müller	a21b719770	Use neon-github-ci-tests S3 bucket for remote_storage tests (#6216 ) This bucket is already used by the pytests. The current bucket github-public-dev is more meant for longer living artifacts. slack thread: https://neondb.slack.com/archives/C039YKBRZB4/p1703124944669009 Part of https://github.com/neondatabase/cloud/issues/8233 / #6155	2023-12-21 17:28:28 +01:00
Alexander Bayandin	1dff98be84	CI: fix build-tools image tag for PRs (#6217 ) ## Problem Fix build-tools image tag calculation for PRs. Broken in https://github.com/neondatabase/neon/pull/6195 ## Summary of changes - Use `pinned` tag instead of `$GITHUB_RUN_ID` if there's no changes in the dockerfile (and we don't build such image)	2023-12-21 14:55:24 +00:00
Arpad Müller	7d6fc3c826	Use pre-generated initdb.tar.zst in test_ingest_real_wal (#6206 ) This implements the TODO mentioned in the test added by #5892.	2023-12-21 14:23:09 +00:00
Abhijeet Patil	61b6c4cf30	Build dockerfile from neon repo (#6195 ) ## Fixing GitHub workflow issue related to build and push images ## Summary of changes Followup of PR#608[move docker file from build repo to neon to solve some issues. The build started failing because it missed a validation in the logic that determines changes in the docker file. Also, all the dependent jobs were skipped because of the build and push of the image job. To address the above issues, the following changes were made - we are adding validation to generate an image tag even if it's a merge to the repo. - All the dependent jobs won't skip even if the build and push image job is skipped. - We have moved the logic to generate a tag into the sub-workflow. As the tag name had to be passed to the sub-workflow, it made sense to abstract that away where it was needed and then store it as an output variable so that downstream dependent jobs could access the value. - This made the dependency logic easy and we don't need complex expressions to check the condition on which it will run - An earlier PR that tried solving a similar problem was closed; it has some feedback and context from before this PR was created https://github.com/neondatabase/neon/pull/6175 ## Checklist before requesting a review - [x] Move the tag generation logic from the main workflow to the sub-workflow of build and push the image - [x] Add a condition to generate an image tag for a non-PR-related run - [x] remove the complex condition from the job if conditions --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Abhijeet Patil <abhijeet@neon.tech>	2023-12-21 12:46:51 +00:00
Bodobolero	f93d15f781	add comment to run vacuum for clickbench (#6212 ) ## Problem This is a comment only change. To ensure that our benchmarking results are fair we need to have correct stats in catalog. Otherwise optimizer chooses seq scan instead of index only scan for some queries. Added comment to run vacuum after data prep.	2023-12-21 13:34:31 +01:00
Christian Schwarz	5385791ca6	add pageserver component-level benchmark (`pagebench`) (#6174 ) This PR adds a component-level benchmarking utility for pageserver. Its name is `pagebench`. The problem solved by `pagebench` is that we want to put Pageserver under high load. This isn't easily achieved with `pgbench` because it needs to go through a compute, which has significant performance overhead compared to accessing Pageserver directly. Further, compute has its own performance optimizations (most importantly: caches). Instead of designing a compute-facing workload that defeats those internal optimizations, `pagebench` simply bypasses them by accessing pageserver directly. Supported benchmarks: * getpage@latest_lsn * basebackup * triggering logical size calculation This code has no automated users yet. A performance regression test for getpage@latest_lsn will be added in a later PR. part of https://github.com/neondatabase/neon/issues/5771	2023-12-21 13:07:23 +01:00
Conrad Ludgate	2df3602a4b	Add GC to http connection pool (#6196 ) ## Problem HTTP connection pool will grow without being pruned ## Summary of changes Remove connection clients from pools once idle, or once they exit. Periodically clear pool shards. GC Logic: Each shard contains a hashmap of `Arc<EndpointPool>`s. Each connection stores a `Weak<EndpointPool>`. During a GC sweep, we take a random shard write lock, and check that if any of the `Arc<EndpointPool>`s are unique (using `Arc::get_mut`). - If they are unique, then we check that the endpoint-pool is empty, and sweep if it is. - If they are not unique, then the endpoint-pool is in active use and we don't sweep. - Idle connections will self-clear from the endpoint-pool after 5 minutes. Technically, the uniqueness of the endpoint-pool should be enough to consider it empty, but the connection count check is done for completeness sake.	2023-12-21 12:00:10 +00:00
Arpad Müller	48890d206e	Simplify inject_index_part test function (#6207 ) Instead of manually constructing the directory's path, we can just use the `parent()` function. This is a drive-by improvement from #6206	2023-12-21 12:52:38 +01:00
Arpad Müller	baa1323b4a	Use ProfileFileCredentialsProvider for AWS SDK configuration (#6202 ) Allows usage via `aws sso login --profile=<p>; AWS_PROFILE=<p>`. Now there is no need to manually configure things any more via `SSO_ACCOUNT_ID` and others. Now one can run the tests locally (given Neon employee access to aws): ``` aws sso login --profile dev export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty REMOTE_STORAGE_S3_REGION=eu-central-1 REMOTE_STORAGE_S3_BUCKET=neon-github-public-dev AWS_PROFILE=dev cargo test -p remote_storage -j 1 s3 -- --nocapture ``` Also makes the scrubber use the same region for auth that it does its operations in (not touching the hard coded role name and start_url values here, they are not ideal though).	2023-12-20 22:38:58 +00:00
Joonas Koivunen	48f156b8a2	feat: relative last activity based eviction (#6136 ) Adds a new disk usage based eviction option, EvictionOrder, which selects whether to use the current `AbsoluteAccessed` or this new proposed but not yet tested `RelativeAccessed`. Additionally a fudge factor was noticed while implementing this, which might help sparing smaller tenants at the expense of targeting larger tenants. Cc: #5304 Co-authored-by: Arpad Müller <arpad@neon.tech>	2023-12-20 18:44:19 +00:00
John Spray	ac38d3a88c	remote_storage: don't count 404s as errors (#6201 ) ## Problem Currently a chart of S3 error rate is misleading: it can show errors any time we are attaching a tenant (probing for index_part generation, checking for remote delete marker). Considering 404 successful isn't perfectly elegant, but it enables the error rate to be used as a more meaningful alert signal: it would indicate if we were having auth issues, sending bad requests, getting throttled, etc. ## Summary of changes Track 404 requests in the AttemptOutcome::Ok bucket instead of the AttemptOutcome::Err bucket.	2023-12-20 17:00:29 +00:00
Arthur Petukhovsky	0f56104a61	Make sk_collect_dumps also possible with teleport (#4739 ) Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-12-20 15:06:55 +00:00
John Spray	f260f1565e	pageserver: fixes + test updates for sharding (#6186 ) This is a precursor to: - https://github.com/neondatabase/neon/pull/6185 While that PR contains big changes to neon_local and attachment_service, this PR contains a few unrelated standalone changes generated while working on that branch: - Fix restarting a pageserver when it contains multiple shards for the same tenant - When using location_config api to attach a tenant, create its timelines dir - Update test paths where generations were previously optional to make them always-on: this avoids tests having to spuriously assert that attachment_service is not None in order to make the linter happy. - Add a TenantShardId python implementation for subsequent use in test helpers that will be made shard-aware - Teach scrubber to read across shards when checking for layer existence: this is a refactor to track the list of existent layers at tenant-level rather than locally to each timeline. This is a precursor to testing shard splitting.	2023-12-20 12:26:20 +00:00
Joonas Koivunen	c29df80634	fix(layer): move backoff to spawned task (#5746 ) Move the backoff to spawned task as it can still be useful; make the sleep cancellable.	2023-12-20 10:26:06 +02:00
Em Sharnoff	58dbca6ce3	Bump vm-builder v0.19.0 -> v0.21.0 (#6197 ) Only applicable change was neondatabase/autoscaling#650, reducing the vector scrape interval (inside the VM) from 15 seconds to 1 second.	2023-12-19 23:48:41 +00:00
Arthur Petukhovsky	613906acea	Support custom types in broker (#5761 ) Old methods are unchanged for backwards compatibility. Added `SafekeeperDiscoveryRequest` and `SafekeeperDiscoveryResponse` types to serve as example, and also as a prerequisite for https://github.com/neondatabase/neon/issues/5471	2023-12-19 17:06:43 +00:00
Christian Schwarz	82809d2ec2	fix metric `pageserver_initial_logical_size_start_calculation` (#6191 ) It wasn't being incremented. Fixup of commit `1c88824ed0` Author: Christian Schwarz <christian@neon.tech> Date: Fri Dec 1 12:52:59 2023 +0100 initial logical size calculation: add a bunch of metrics (#5995)	2023-12-19 17:44:49 +01:00
Anastasia Lubennikova	0bd79eb063	Handle role deletion when project has no databases. (#6170 ) There is still the default 'postgres' database, which may contain objects owned by the role or some ACLs. We need to reassign objects in this database too. ## Problem If a customer deleted all databases and then tries to delete a role that has some non-standard ACLs, the `apply_config` operation will get stuck because of the failing role deletion.	2023-12-19 16:27:47 +00:00
Konstantin Knizhnik	8ff5387da1	eliminate GCC warning for unchecked result of fread (#6167 ) ## Problem GCC produces a warning that the fread result is not checked. It doesn't affect program logic, but it is better to live without warnings. ## Summary of changes Check read result. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2023-12-19 18:17:11 +02:00
Arpad Müller	8b91bbc38e	Update jsonwebtoken to 9 and sct to 0.7.1 (#6189 ) This increases the list of crates that are based on `ring` 0.17.	2023-12-19 15:45:17 +00:00
Christian Schwarz	e6bf6952b8	higher resolution histograms for getpage@lsn (#6177 ) part of https://github.com/neondatabase/cloud/issues/7811	2023-12-19 14:46:17 +01:00
Arpad Müller	a2fab34371	Update zstd to 0.13 (#6187 ) This updates the `zstd` crate to 0.13, and `zstd-sys` with it (it contains C so we should always run the newest version of that).	2023-12-19 13:16:53 +00:00
Vadim Kharitonov	c52384752e	Compile `pg_semver` extension (#6184 ) Closes #6183	2023-12-19 15:10:07 +02:00
Bodobolero	73d247c464	Analyze clickbench performance with explain plans and pg_stat_statements (#6161 ) ## Problem To understand differences in performance between neon, aurora and rds we want to collect explain analyze plans and pg_stat_statements for selected benchmarking runs ## Summary of changes Add workflow input options to collect explain and pg_stat_statements for benchmarking workflow Co-authored-by: BodoBolero <bodobolero@gmail.com>	2023-12-19 11:44:25 +00:00
Arseny Sher	b701394d7a	Fix WAL waiting in walproposer for v16. Just preparing cv right before waiting is not enough as we might have already missed the flushptr change & wakeup, so re-check before sleep. https://neondb.slack.com/archives/C03QLRH7PPD/p1702830965396619?thread_ts=1702756761.836649&cid=C03QLRH7PPD	2023-12-19 15:34:14 +04:00
John Spray	d89af4cf8e	pageserver: downgrade 'connection reset' WAL errors (#6181 ) This squashes a particularly noisy warn-level log that occurs when safekeepers are restarted. Unfortunately the error type from `tonic` doesn't provide a neat way of matching this, so we use a string comparison	2023-12-19 10:38:00 +00:00
Christian Schwarz	6ffbbb2e02	include timeline ids in tenant details response (#6166 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771 This allows getting the list of tenants and timelines without triggering initial logical size calculation by requesting the timeline details API response, which would skew our results.	2023-12-19 10:32:51 +00:00
Arpad Müller	fbb979d5e3	remote_storage: move shared utilities for S3 and Azure into common module (#6176 ) The PR does two things: * move the util functions present in the remote_storage Azure and S3 test files into a shared one, deduplicating them. * add a `s3_upload_download_works` test as a copy of the Azure test The goal is mainly to fight duplication and make the code a little bit more generic (like removing mentions of s3 and azure from function names). This is a first step towards #6146.	2023-12-19 11:29:50 +01:00
Arpad Müller	a89d6dc76e	Always send a json response for timeline_get_lsn_by_timestamp (#6178 ) As part of the transition laid out in [this](https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911) comment, don't read the `version` query parameter in `timeline_get_lsn_by_timestamp`, but always return the structured json response. Follow-up of https://github.com/neondatabase/neon/pull/5608	2023-12-19 11:29:16 +01:00
Christian Schwarz	c272c68e5c	RFC: Per-Tenant GetPage@LSN Throttling (#5648 ) Implementation epic: https://github.com/neondatabase/neon/issues/5899	2023-12-19 11:20:56 +01:00
Anna Khanova	6e6e40dd7f	Invalidate credentials on auth failure (#6171 ) ## Problem If the user resets their password, the cache could receive this information only after `ttl` minutes. ## Summary of changes Invalidate password on auth failure.	2023-12-18 23:24:22 +01:00
Heikki Linnakangas	6939fc3db6	Remove declarations of non-existent global variables and functions FileCacheMonitorMain was removed in commit `b497d0094e`.	2023-12-18 21:05:31 +02:00
Heikki Linnakangas	c4c48cfd63	Clean up #includes - No need to include c.h, port.h or pg_config.h, they are included in postgres.h - No need to include postgres.h in header files. Instead, the assumption in PostgreSQL is that all .c files include postgres.h. - Reorder includes to alphabetical order, and system headers before pgsql headers - Remove bunch of other unnecessary includes that got copy-pasted from one source file to another	2023-12-18 21:05:29 +02:00
Heikki Linnakangas	82215d20b0	Mark some variables 'static' Move initialization of neon_redo_read_buffer_filter. This allows marking it 'static', too.	2023-12-18 21:05:24 +02:00
Sasha Krassovsky	62737f3776	Grant BYPASSRLS and REPLICATION explicitly to neon_superuser roles	2023-12-18 10:54:14 -08:00
Christian Schwarz	1f9a7d1cd0	add a Rust client for Pageserver page_service (#6128 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771 Stacked atop https://github.com/neondatabase/neon/pull/6145	2023-12-18 18:17:19 +00:00
John Spray	4ea4812ab2	tests: update python dependencies (#6164 ) ## Problem Existing dependencies didn't work on Fedora 39 (python 3.12) ## Summary of changes - Update pyyaml 6.0 -> 6.0.1 - Update yarl 1.8.2->1.9.4 - Update the `dnf install` line in README to include dependencies of python packages (unrelated to upgrades, just noticed absences while doing fresh pysync run)	2023-12-18 15:47:09 +00:00
Anna Khanova	00d90ce76a	Added cache for get role secret (#6165 ) ## Problem Currently if we are getting many consecutive connections to the same user/ep we will send a lot of traffic to the console. ## Summary of changes Cache with ttl=4min proxy_get_role_secret response. Note: this is the temporary hack, notifier listener is WIP.	2023-12-18 16:04:47 +01:00
John Khvatov	33cb9a68f7	pageserver: Reduce tracing overhead in timeline::get (#6115 ) ## Problem Compaction process (specifically the image layer reconstructions part) is lagging behind wal ingest (at speed ~10-15MB/s) for medium-sized tenants (30-50GB). CPU profile shows that significant amount of time (see flamegraph) is being spent in `tracing::span::Span::new`. mainline (commit: `0ba4cae491`): ![reconstruct-mainline-0ba4cae491c2](https://github.com/neondatabase/neon/assets/289788/ebfd262e-5c97-4858-80c7-664a1dbcc59d) ## Summary of changes By lowering the tracing level in get_value_reconstruct_data and get_or_maybe_download from info to debug, we can reduce the overhead of span creation in prod environments. On my system, this sped up the image reconstruction process by 60% (from 14500 to 23160 page reconstruction per sec) pr: ![reconstruct-opt-2](https://github.com/neondatabase/neon/assets/289788/563a159b-8f2f-4300-b0a1-6cd66e7df769) `create_image_layers()` (it's 1 CPU bound here) mainline vs pr: ![image](https://github.com/neondatabase/neon/assets/289788/a981e3cb-6df9-4882-8a94-95e99c35aa83)	2023-12-18 13:33:23 +00:00
Conrad Ludgate	17bde7eda5	proxy refactor large files (#6153 ) ## Problem The `src/proxy.rs` file is far too large ## Summary of changes Creates 3 new files: ``` src/metrics.rs src/proxy/retry.rs src/proxy/connect_compute.rs ```	2023-12-18 10:59:49 +00:00
John Spray	dbdb1d21f2	pageserver: on-demand activation cleanups (#6157 ) ## Problem #6112 added some logs and metrics: clean these up a bit: - Avoid counting startup completions for tenants launched after startup - exclude no-op cases from timing histograms - remove a rogue log message	2023-12-18 10:29:19 +00:00
Arseny Sher	e1935f42a1	Don't generate core dump when walproposer intentionally panics. Walproposer sometimes intentionally PANICs when its term is defeated as the basebackup is likely spoiled by that time. We don't want core dumped in this case.	2023-12-18 11:03:34 +04:00
Alexander Bayandin	9bdc25f0af	Revert "CI: build build-tools image" (#6156 ) It turns out the issue with skipped jobs is not so trivial (because Github checks jobs transitively), a possible workaround with `if: always() && contains(fromJSON('["success", "skipped"]'), needs.build-buildtools-image.result)` will tangle the workflow really bad. We'll need to come up with a better solution. To unblock the main I'm going to revert https://github.com/neondatabase/neon/pull/6082.	2023-12-16 12:32:00 +00:00
Christian Schwarz	47873470db	pageserver: add method to dump keyspace in mgmt api client (#6145 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771	2023-12-16 10:52:48 +00:00
Abhijeet Patil	8619e6295a	CI: build build-tools image (#6082 ) ## Currently our build docker file is located in the build repo; it makes sense to have it as a part of our neon repo ## Summary of changes The docker file that we use to build our binary and other tools resided in the build repo. It made sense to bring the docker file into the repo where it is used, so that contributors can also view it and amend it if required. It will reduce maintenance: docker file changes and code changes can be accommodated in the same PR. Also, building the image and pushing it to ECR is abstracted into a reusable workflow. The idea is to use that for other jobs too. ## Checklist before requesting a review - [x] Moved the docker file used to build the binary from the build repo to the neon repo - [x] adding gh workflow to build and push the image - [x] adding gh workflow to tag the pushed image - [x] update readMe file --------- Co-authored-by: Abhijeet Patil <abhijeet@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-12-16 10:33:52 +00:00
Conrad Ludgate	83811491da	update zerocopy (#6148 ) ## Problem https://github.com/neondatabase/neon/security/dependabot/48 ``` $ cargo tree -i zerocopy zerocopy v0.7.3 └── ahash v0.8.5 └── hashbrown v0.13.2 ``` ahash doesn't use the affected APIs so we are not vulnerable, but best to update to silence the alert anyway ## Summary of changes ``` $ cargo update -p zerocopy --precise 0.7.31 Updating crates.io index Updating syn v2.0.28 -> v2.0.32 Updating zerocopy v0.7.3 -> v0.7.31 Updating zerocopy-derive v0.7.3 -> v0.7.31 ```	2023-12-16 09:06:00 +00:00
John Spray	d066dad84b	pageserver: prioritize activation of tenants with client requests (#6112 ) ## Problem During startup, a client request might have to wait a long time while the system is busy initializing all the attached tenants, even though most of the attached tenants probably don't have any client requests to service, and could wait a bit. ## Summary of changes - Add a semaphore to limit how many Tenant::spawn()s may concurrently do I/O to attach their tenant (i.e. read indices from remote storage, scan local layer files, etc). - Add Tenant::activate_now, a hook for kicking a tenant in its spawn() method to skip waiting for the warmup semaphore - For tenants that attached via warmup semaphore units, wait for logical size calculation to complete before dropping the warmup units - Set Tenant::activate_now in `get_active_tenant_with_timeout` (the page service's path for getting a reference to a tenant). - Wait for tenant activation in HTTP handlers for timeline creation and deletion: like page service requests, these require an active tenant and should prioritize activation if called.	2023-12-15 20:37:47 +00:00
John Spray	56f7d55ba7	pageserver: basic cancel/timeout for remote storage operations (#6097 ) ## Problem Various places in remote storage were not subject to a timeout (thereby stuck TCP connections could hold things up), and did not respect a cancellation token (so things like timeline deletion or tenant detach would have to wait arbitrarily long). ## Summary of changes - Add download_cancellable and upload_cancellable helpers, and use them in all the places we wait for remote storage operations (with the exception of initdb downloads, where it would not have been safe). - Add a cancellation token arg to `download_retry`. - Use cancellation token args in various places that were missing one per #5066 Closes: #5066 Why is this only "basic" handling? - Doesn't express difference between shutdown and errors in return types, to avoid refactoring all the places that use an anyhow::Error (these should all eventually return a more structured error type) - Implements timeouts on top of remote storage, rather than within it: this means that operations hitting their timeout will lose their semaphore permit and thereby go to the back of the queue for their retry. - Doing a nicer job is tracked in https://github.com/neondatabase/neon/issues/6096	2023-12-15 17:43:02 +00:00
Christian Schwarz	1a9854bfb7	add a Rust client for Pageserver management API (#6127 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771 This PR moves the control plane's spread-all-over-the-place client for the pageserver management API into a separate module within the pageserver crate. I need that client to be async in my benchmarking work, so, this PR switches to the async version of `reqwest`. That is also the right direction generally IMO. The switch to async in turn mandated converting most of the `control_plane/` code to async. Note that some of the client methods should be taking `TenantShardId` instead of `TenantId`, but, none of the callers seem to be sharding-aware. Leaving that for another time: https://github.com/neondatabase/neon/issues/6154	2023-12-15 18:33:45 +01:00
John Spray	de1a9c6e3b	s3_scrubber: basic support for sharding (#6119 ) This doesn't make the scrubber smart enough to understand that many shards are part of the same tenants, but it makes it understand paths well enough to scrub the individual shards without thinking they're malformed. This is a prerequisite to being able to run tests with sharding enabled. Related: #5929	2023-12-15 15:48:55 +00:00
Arseny Sher	e62569a878	A few comments on rust walproposer build.	2023-12-15 19:31:51 +04:00
John Spray	bd1cb1b217	tests: update allow list for `negative_env` (#6144 ) Tests attaching the tenant immediately after the fixture detaches it could result in LSN updates failing validation e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6142/7211196140/index.html#suites/7745dadbd815ab87f5798aa881796f47/32b12ccc0b01b122	2023-12-15 15:08:28 +00:00
Conrad Ludgate	98629841e0	improve proxy code cov (#6141 ) ## Summary of changes saw some low-hanging codecov improvements. even if code coverage is somewhat of a pointless game, might as well add tests where we can and delete code if it's unused	2023-12-15 12:11:50 +00:00
Arpad Müller	215cdd18c4	Make initdb upload retries cancellable and seek to beginning (#6147 ) * initdb uploads had no cancellation token, which means that when we were stuck in upload retries, we wouldn't be able to delete the timeline. in general, the combination of retrying forever and not having cancellation tokens is quite dangerous. * initdb uploads wouldn't rewind the file. this wasn't discovered in the purposefully unreliable test-s3 in pytest because those fail on the first byte always, not somewhere during the connection. we'd be getting errors from the AWS sdk that the file was at an unexpected end. slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1702632247784079	2023-12-15 12:11:25 +00:00
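The second point of the commit above (retries must rewind the file) can be illustrated with a minimal sketch. It uses only `std::io`; the function name `upload_with_retries` and the closure-based `upload` parameter are hypothetical and are not the pageserver's actual API. Cancellation, the first point of the commit, is deliberately left out.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Retry `upload` up to `max_attempts` times, seeking `src` back to the
/// start before every attempt. (Hypothetical helper, for illustration only.)
fn upload_with_retries<R, F>(src: &mut R, max_attempts: usize, mut upload: F) -> io::Result<()>
where
    R: Read + Seek,
    F: FnMut(&mut R) -> io::Result<()>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no attempts were made");
    for _ in 0..max_attempts {
        // Without this rewind, a retry after a partial read would start
        // mid-file and the receiver would see a truncated body.
        src.seek(SeekFrom::Start(0))?;
        match upload(&mut *src) {
            Ok(()) => return Ok(()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

With the `seek` call removed, a second attempt over the same reader would read zero bytes, which matches the "unexpected end" symptom the commit message describes.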
Joonas Koivunen	0fd80484a9	fix: Timeline deletion during busy startup (#6133 ) Compaction was holding back timeline deletion because the compaction lock had been acquired, but the semaphore was waited on. Timeline deletion was waiting on the same lock for 1500s. This replaces the `pageserver::tenant::tasks::concurrent_background_tasks_rate_limit` (which looks correct) with a simpler `..._permit` which is just an infallible acquire, which is easier to spot "aah this needs to be raced with cancellation tokens". Ref: https://neondb.slack.com/archives/C03F5SM1N02/p1702496912904719 Ref: https://neondb.slack.com/archives/C03F5SM1N02/p1702578093497779	2023-12-15 11:59:24 +00:00
Joonas Koivunen	07508fb110	fix: better Json parsing errors (#6135 ) Before, the only errors returned for JSON parsing in the HTTP API were per-field errors. Now they are produced using `serde_path_to_error`, which at least helped greatly with the `disk_usage_eviction_run` used for testing. I don't think this can conflict with anything added in #5310.	2023-12-15 12:18:22 +02:00
Arseny Sher	5bb9ba37cc	Fix python list_segments of sk. Fixes rare test_peer_recovery flakiness as we started to compare tmp control file. https://neondb.slack.com/archives/C04KGFVUWUQ/p1702310929657179	2023-12-15 13:43:11 +04:00
John Spray	f1cd1a2122	pageserver: improved handling of concurrent timeline creations on the same ID (#6139 ) ## Problem Historically, the pageserver used an "uninit mark" file on disk for two purposes: - Track which timeline dirs are incomplete for handling on restart - Avoid trying to create the same timeline twice at the same time. The original purpose of handling restarts is now defunct, as we use remote storage as the source of truth and clean up any trash timeline dirs on startup. Using the file to mutually exclude creation operations is error prone compared with just doing it in memory, and the existing checks happened some way into the creation operation, and could expose errors as 500s (anyhow::Errors) rather than something clean. ## Summary of changes - Creations are now mutually excluded in memory (using `Tenant::timelines_creating`), rather than relying on a file on disk for coordination. - Acquiring unique access to the timeline ID now happens earlier in the request. - Creating the same timeline which already exists is now a 201: this simplifies retry handling for clients. - 409 is still returned if a timeline with the same ID is still being created: if this happens it is probably because the client timed out an earlier request and has retried. - Colliding timeline creation requests should no longer return 500 errors This paves the way to entirely removing uninit markers in a subsequent change. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-12-15 08:51:23 +00:00
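The in-memory mutual exclusion described in the commit above can be sketched roughly as follows. Only the name `timelines_creating` comes from the commit message; `TenantState`, `CreateGuard`, `StartCreate` and `start_creating` are hypothetical names, and string IDs stand in for the real ID type.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct TenantState {
    /// Timelines that already exist.
    timelines: HashSet<String>,
    /// Timelines whose creation is currently in flight.
    timelines_creating: HashSet<String>,
}

/// Holds the exclusive right to create one timeline; releases it on drop.
struct CreateGuard {
    state: Arc<Mutex<TenantState>>,
    id: String,
}

impl Drop for CreateGuard {
    fn drop(&mut self) {
        self.state.lock().unwrap().timelines_creating.remove(&self.id);
    }
}

enum StartCreate {
    /// Caller may proceed with creation.
    Proceed(CreateGuard),
    /// Timeline already exists: the commit maps this case to a 201.
    AlreadyExists,
    /// Another request is still creating it: the commit maps this to a 409.
    InProgress,
}

fn start_creating(state: &Arc<Mutex<TenantState>>, id: &str) -> StartCreate {
    let mut s = state.lock().unwrap();
    if s.timelines.contains(id) {
        return StartCreate::AlreadyExists;
    }
    // `insert` returns false if the ID was already present.
    if !s.timelines_creating.insert(id.to_string()) {
        return StartCreate::InProgress;
    }
    StartCreate::Proceed(CreateGuard { state: Arc::clone(state), id: id.to_string() })
}
```

Because the claim is released in `Drop`, a failed or cancelled creation frees the ID without any on-disk marker to clean up, which is the property the commit relies on.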
Joonas Koivunen	f010479107	feat(layer): pageserver_layer_redownloaded_after histogram (#6132 ) this is aimed at replacing the current mtime only based trashing alerting later. Cc: #5331	2023-12-14 21:32:54 +02:00
Conrad Ludgate	cc633585dc	gauge guards (#6138 ) ## Problem The websockets gauge for active db connections seems to be growing more than the gauge for client connections over websockets, which does not make sense. ## Summary of changes refactor how our counter-pair gauges are represented. not sure if this will improve the problem, but it should be harder to mess-up the counters. The API is much nicer though now and doesn't require scopeguard::defer hacks	2023-12-14 17:21:39 +00:00
Christian Schwarz	aa5581d14f	utils::logging: TracingEventCountLayer: don't use with_label_values() on hot path (#6129 ) fixes #6126	2023-12-14 16:31:41 +01:00
John Spray	c4e0ef507f	pageserver: heatmap uploads (#6050 ) Dependency (commits inline): https://github.com/neondatabase/neon/pull/5842 ## Problem Secondary mode tenants need a manifest of what to download. Ultimately this will be some kind of heat-scored set of layers, but as a robust first step we will simply use the set of resident layers: secondary tenant locations will aim to match the on-disk content of the attached location. ## Summary of changes - Add heatmap types representing the remote structure - Add hooks to Tenant/Timeline for generating these heatmaps - Create a new `HeatmapUploader` type that is external to `Tenant`, and responsible for walking the list of attached tenants and scheduling heatmap uploads. Notes to reviewers: - Putting the logic for uploads (and later, secondary mode downloads) outside of `Tenant` is an opinionated choice, motivated by: - Enable future smarter scheduling of operations, e.g. uploading the stalest tenant first, rather than having all tenants compete for a fair semaphore on a first-come-first-served basis. Similarly for downloads, we may wish to schedule the tenants with the hottest un-downloaded layers first. - Enable accessing upload-related state without synchronization (it belongs to HeatmapUploader, rather than being some Mutex<>'d part of Tenant) - Avoid further expanding the scope of Tenant/Timeline types, which are already among the largest in the codebase - You might reasonably wonder how much of the uploader code could be a generic job manager thing. Probably some of it: but let's defer pulling that out until we have at least two users (perhaps secondary downloads will be the second one) to highlight which bits are really generic. Compromises: - Later, instead of using digests of heatmaps to decide whether anything changed, I would prefer to avoid walking the layers in tenants that don't have changes: tracking that will be a bit invasive, as it needs input from both remote_timeline_client and Layer.	2023-12-14 13:09:24 +00:00
Conrad Ludgate	6987b5c44e	proxy: add more rates to endpoint limiter (#6130 ) ## Problem Single rate bucket is limited in usefulness ## Summary of changes Introduce a secondary bucket allowing an average of 200 requests per second over 1 minute, and a tertiary bucket allowing an average of 100 requests per second over 10 minutes. Configured by using a format like ```sh proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 100@10s --endpoint-rps-limit 50@1m ``` If the bucket limits are inconsistent, an error is returned on startup ``` $ proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 10@10s Error: invalid endpoint RPS limits. 10@10s allows fewer requests per bucket than 300@1s (100 vs 300) ```	2023-12-13 21:43:49 +00:00
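The consistency rule implied by the error message above (a longer bucket must not admit fewer total requests than a shorter one) can be sketched as below. This is an illustration inferred from the `100 vs 300` example only; `RateBucket`, `parse_bucket` and `validate_buckets` are hypothetical names, the parser handles just the `s` and `m` suffixes, and the proxy may enforce further rules not shown here.

```rust
use std::time::Duration;

/// One limit of the form `<rps>@<interval>`, e.g. `300@1s`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RateBucket {
    rps: u64,
    interval: Duration,
}

impl RateBucket {
    /// Total requests admitted over the whole interval (rps * seconds).
    fn max_requests(&self) -> u64 {
        self.rps * self.interval.as_secs()
    }
}

/// Parse strings such as "300@1s" or "50@1m" (assumed format; seconds and minutes only).
fn parse_bucket(spec: &str) -> Result<RateBucket, String> {
    let (rps, dur) = spec
        .split_once('@')
        .ok_or_else(|| format!("missing '@' in {spec:?}"))?;
    let rps: u64 = rps.parse().map_err(|e| format!("bad rate in {spec:?}: {e}"))?;
    let (digits, scale) = if let Some(d) = dur.strip_suffix('s') {
        (d, 1)
    } else if let Some(d) = dur.strip_suffix('m') {
        (d, 60)
    } else {
        return Err(format!("unknown duration unit in {spec:?}"));
    };
    let n: u64 = digits.parse().map_err(|e| format!("bad duration in {spec:?}: {e}"))?;
    Ok(RateBucket { rps, interval: Duration::from_secs(n * scale) })
}

/// Sort by interval and reject any bucket that admits fewer requests
/// than a shorter-interval bucket before it.
fn validate_buckets(buckets: &mut [RateBucket]) -> Result<(), String> {
    buckets.sort_by_key(|b| b.interval);
    for pair in buckets.windows(2) {
        let (short, long) = (pair[0], pair[1]);
        if long.max_requests() < short.max_requests() {
            return Err(format!(
                "invalid endpoint RPS limits ({} vs {})",
                long.max_requests(),
                short.max_requests()
            ));
        }
    }
    Ok(())
}
```

For the two command lines in the commit message, the first set (`300@1s`, `100@10s`, `50@1m`) admits 300, 1000 and 3000 requests per bucket and passes; the second (`300@1s`, `10@10s`) admits 300 and then only 100, so it is rejected.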
Alexander Bayandin	0cd49cac84	test_compatibility: make it use initdb.tar.zst	2023-12-13 15:04:25 -06:00
Alexander Bayandin	904dff58b5	test_wal_restore_http: cleanup test	2023-12-13 15:04:25 -06:00
Arthur Petukhovsky	f401a21cf6	Fix test_simple_sync_safekeepers There is a postgres 16 version encoded in a binary message.	2023-12-13 15:04:25 -06:00
Tristan Partin	158adf602e	Update Postgres 16 series to 16.1	2023-12-13 15:04:25 -06:00
Tristan Partin	c94db6adbb	Update Postgres 15 series to 15.5	2023-12-13 15:04:25 -06:00
Tristan Partin	85720616b1	Update Postgres 14 series to 14.10	2023-12-13 15:04:25 -06:00
George MacKerron	d6fcc18eb2	Add Neon-Batch- headers to OPTIONS response for SQL-over-HTTP requests (#6116 ) This is needed to allow use of batch queries from browsers. ## Problem SQL-over-HTTP batch queries fail from web browsers because the relevant headers, `Neon-Batch-isolation-Level` and `Neon-Batch-Read-Only`, are not included in the server's OPTIONS response. I think we simply forgot to add them when implementing the batch query feature. ## Summary of changes Added `Neon-Batch-Isolation-Level` and `Neon-Batch-Read-Only` to the OPTIONS response.	2023-12-13 17:18:20 +00:00
Vadim Kharitonov	c2528ae671	Increase pgbouncer pool size to 64 for VMs (#6124 ) The pool size was changed for pods (https://github.com/neondatabase/cloud/pull/8057). The idea is to increase it for VMs too	2023-12-13 16:23:24 +00:00
Joonas Koivunen	a919b863d1	refactor: remove eviction batching (#6060 ) We no longer have `layer_removal_cs` since #5108, we no longer need batching.	2023-12-13 18:05:33 +02:00
Joonas Koivunen	2d22661061	refactor: calculate_synthetic_size_worker, remove PRE::NeedsDownload (#6111 ) Changes I wanted to make on #6106 but decided to leave out to keep that commit clean for including in the #6090. Finally remove `PageReconstructionError::NeedsDownload`.	2023-12-13 14:23:19 +00:00
John Spray	e3778381a8	tests: make test_bulk_insert recreate tenant in same generation (#6113 ) ## Problem Test deletes tenant and recreates with the same ID. The recreation bumps generation number. This could lead to stale generation warnings in the logs. ## Summary of changes Handle this more gracefully by re-creating in the same generation that the tenant was previously attached in. We could also update the tenant delete path to have the attachment service to drop tenant state on delete, but I like having it there: it makes debug easier, and the only time it's a problem is when a test is re-using a tenant ID after deletion.	2023-12-13 14:14:38 +00:00
Conrad Ludgate	c8316b7a3f	simplify endpoint limiter (#6122 ) ## Problem 1. Using chrono for durations only is wasteful 2. The arc/mutex was not being utilised 3. Locking every shard in the dashmap every GC could cause latency spikes 4. More buckets ## Summary of changes 1. Use `Instant` instead of `NaiveTime`. 2. Remove the `Arc<Mutex<_>>` wrapper, utilising that dashmap entry returns mut access 3. Clear only a random shard, update gc interval accordingly 4. Multiple buckets can be checked before allowing access When I benchmarked the check function, it took on average 811ns when multithreaded over the course of 10 million checks.	2023-12-13 13:53:23 +00:00
Stas Kelvich	8460654f61	Add per-endpoint rate limiter to proxy	2023-12-13 07:03:21 +02:00
Arpad Müller	7c2c87a5ab	Update azure SDK to 0.18 and use open range support (#6103 ) * Update `azure-` crates to 0.18 Use new open ranges support added by upstream in https://github.com/Azure/azure-sdk-for-rust/pull/1482 Part of #5567. Prior update PR: #6081	2023-12-12 18:20:12 +01:00
Arpad Müller	5820faaa87	Use extend instead of groups of append calls in tests (#6109 ) Repeated calls to `.append` don't line up as nicely as they might get formatted in different ways. Also, it is more characters and the lines might be longer. Saw this while working on #5912.	2023-12-12 18:00:37 +01:00
John Spray	dfb0a6fdaf	scrubber: handle initdb files, fix an issue with prefixes (#6079 ) - The code for calculating the prefix in the bucket was expecting a trailing slash (as it is in the tests), but that's an awkward expectation to impose for use in the field: make the code more flexible by only trimming a trailing character if it is indeed a slash. - initdb archives were detected by the scrubber as malformed layer files. Teach it to recognize and ignore them.	2023-12-12 16:53:08 +00:00
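The first fix in the commit above, trimming a trailing character only when it is a slash, amounts to the following in Rust. `normalize_prefix` is a hypothetical name, and this is a sketch of the idea rather than the scrubber's actual code.

```rust
/// Drop one trailing '/' from a bucket prefix if present; otherwise return it unchanged.
/// (Hypothetical helper: an unconditional "remove last character" would
/// corrupt prefixes that were passed without a trailing slash.)
fn normalize_prefix(prefix: &str) -> &str {
    prefix.strip_suffix('/').unwrap_or(prefix)
}
```

Both `pageserver/v1/` and `pageserver/v1` then normalize to the same value, so callers no longer need to remember the trailing slash.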
Alexander Bayandin	6acbee2368	test_runner: add `from_repo_dir` method (#6087 ) ## Problem We need a reliable way to restore a project state (in this context, I mean data on pageservers, safekeepers, and remote storage) from a snapshot. The existing method (that we use in `test_compatibility`) heavily relies on config files, which makes it harder to add/change fields in the config. The proposed solution uses config file only to get `default_tenant_id` and `branch_name_mappings`. ## Summary of changes - Add `NeonEnvBuilder#from_repo_dir` method, which allows using the `neon_env_builder` fixture with data from a snapshot. - Use `NeonEnvBuilder#from_repo_dir` in compatibility tests Required for https://github.com/neondatabase/neon/issues/6033	2023-12-12 16:24:13 +00:00
Konstantin Knizhnik	aec1acdbac	Do not inherit replication slots in branch (#5898 ) ## Problem See https://github.com/neondatabase/company_projects/issues/111 https://neondb.slack.com/archives/C03H1K0PGKH/p1700166126954079 ## Summary of changes Do not search for AUX_FILES_KEY in parent timelines --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-12-12 14:24:21 +02:00
Konstantin Knizhnik	8bb4a13192	Do not materialize null images in PS (#5979 ) ## Problem PG16 writes null images during relation extension, and the page server implements an optimisation which replaces a WAL record containing an FPI with the page image. So instead of a WAL record of ~30 bytes we store an 8kb null page image. And this image is almost useless, because most likely it will shortly be rewritten with actual page content. ## Summary of changes Do not materialize WAL records with null page FPI. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-12-12 14:23:45 +02:00
Anna Khanova	9e071e4458	Propagate information about the protocol to console (#6102 ) ## Problem In snowflake logs there is currently no information about the protocol that the client uses. ## Summary of changes Propagate the information about the protocol together with the app_name, in the format `{app_name}/{sql_over_http/tcp/ws}`. This will give @stepashka more observability on what our clients are using.	2023-12-12 11:42:51 +00:00
John Spray	fead836f26	swagger: remove 'format: hex' from tenant IDs (#6099 ) ## Problem TenantId is changing to TenantShardId in many APIs. The swagger had `format: hex` attributes on some of these IDs. That isn't formally defined anywhere, but a reasonable person might think it means "hex digits only", which will no longer be the case once we start using shard-aware IDs (they're like `<tenant_id>-0001`). ## Summary of changes - Remove these `format` attributes from all `tenant_id` fields in the swagger definition	2023-12-12 10:39:34 +00:00
John Spray	20e9cf7d31	pageserver: tweaks to slow/hung task logging (#6098 ) ## Problem - `shutdown_tasks` would log when a particular task was taking a long time to shut down, but not when it eventually completed. That left one uncertain as to whether the slow task was the source of a hang, or just a precursor. ## Summary of changes - Add a log line after a slow task shutdown - Add an equivalent in Gate's `warn_if_stuck`, in case we ever need it. This isn't related to the original issue but was noticed when checking through these logging paths.	2023-12-12 07:19:59 +00:00
Joonas Koivunen	3b04f3a749	fix: accidental return Ok (#6106 ) Error indicating request cancellation OR timeline shutdown was deemed as a reason to exit the background worker that calculated synthetic size. Fix it to only be considered for avoiding logging of such errors.	2023-12-11 21:27:53 +00:00
Arpad Müller	c49fd69bd6	Add initdb_lsn to TimelineInfo (#6104 ) This way, we can query it. Background: I want to do statistics for how reproducible `initdb_lsn` really is, see https://github.com/neondatabase/cloud/issues/8284 and https://neondb.slack.com/archives/C036U0GRMRB/p1701895218280269	2023-12-11 21:08:14 +00:00
Tristan Partin	5ab9592a2d	Add submodule paths as safe directories as a precaution The check-codestyle-rust-arm job requires this for some reason, so let's just add them everywhere we do this workaround.	2023-12-11 13:08:37 -06:00
Tristan Partin	036558c956	Fix git ownership issue in check-codestyle-rust-arm We have this workaround for other jobs. Looks like this one was forgotten about.	2023-12-11 13:08:37 -06:00
John Spray	6a922b1a75	tests: start adding tests for secondary mode, live migration (#5842 ) These tests have been loitering on a branch of mine for a while: they already provide value even without all the secondary mode bits landed yet, and the Workload helper is handy for other tests too. - `Workload` is a re-usable test workload that replaces some of the arbitrary "write a few rows" SQL that I've found myself repeating, and adds a systematic way to append data and check that reads properly reflect the changes. This append+validate stuff is important when doing migrations, as we want to detect situations where we might be reading from a pageserver that has not properly seen latest changes. - test_multi_attach is a validation of how the pageserver handles attaching the same tenant to multiple pageservers, from a safety point of view. This is intentionally separate from the larger testing of migration, to provide an isolated environment for multi-attachment. - test_location_conf_churn is a pseudo-random walk through the various states that TenantSlot can be put into, with validation that attached tenants remain externally readable when they should, and as a side effect validating that the compute endpoint's online configuration changes work as expected. - test_live_migration is the reference implementation of how to drive a pair of pageservers through a zero-downtime migration of a tenant. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-12-11 16:55:43 +00:00
John Spray	f1fc1fd639	pageserver: further refactoring from TenantId to TenantShardId (#6059 ) ## Problem In https://github.com/neondatabase/neon/pull/5957, the most essential types were updated to use TenantShardId rather than TenantId. That unblocked other work, but didn't fully enable running multiple shards from the same tenant on the same pageserver. ## Summary of changes - Use TenantShardId in page cache key for materialized pages - Update mgr.rs get_tenant() and list_tenants() functions to use a shard id, and update all callers. - Eliminate the exactly_one_or_none helper in mgr.rs and all code that used it - Convert timeline HTTP routes to use tenant_shard_id Note on page cache: ``` struct MaterializedPageHashKey { /// Why is this TenantShardId rather than TenantId? /// /// Usually, the materialized value of a page@lsn is identical on any shard in the same tenant. However, /// this is not the case for certain internally-generated pages (e.g. relation sizes). In future, we may make this /// key smaller by omitting the shard, if we ensure that reads to such pages always skip the cache, or are /// special-cased in some other way. tenant_shard_id: TenantShardId, timeline_id: TimelineId, key: Key, } ```	2023-12-11 15:52:33 +00:00
Alexander Bayandin	66a7a226f8	test_runner: use toml instead of formatted strings (#6088 ) ## Problem A bunch of refactorings extracted from https://github.com/neondatabase/neon/pull/6087 (not required for it); the most significant one is using toml instead of formatted strings. ## Summary of changes - Use toml instead of formatted strings for config - Skip pageserver log check if `pageserver.log` doesn't exist - `chmod -x test_runner/regress/test_config.py`	2023-12-11 15:13:27 +00:00
Joonas Koivunen	f0d15cee6f	build: update azure-* to 0.17 (#6081 ) this is a drive-by upgrade while we refresh the access tokens at the same time.	2023-12-11 12:21:02 +01:00
Sasha Krassovsky	0ba4cae491	Fix RLS/REPLICATION granting (#6083 )	2023-12-08 12:55:44 -08:00
Andrew Rudenko	df1f8e13c4	proxy: pass neon options in deep object format (#6068 ) --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2023-12-08 19:58:36 +01:00
John Spray	e640bc7dba	tests: allow-lists for occasional failures (#6074 ) test_creating_tenant_conf_after... - Test detaches a tenant and then re-attaches immediately: this causes a race between pending remote LSN update and the generation bump in the attachment. test_gc_cutoff: - Test rapidly restarts a pageserver before one generation has had the chance to process deletions from the previous generation	2023-12-08 17:32:16 +00:00
Christian Schwarz	cf024de202	virtual_file metrics: expose max size of the fd cache (#6078 ) And also leave a comment on how to determine current size. Kind of follow-up to #6066 refs https://github.com/neondatabase/cloud/issues/8351 refs https://github.com/neondatabase/neon/issues/5479	2023-12-08 17:23:50 +00:00
Conrad Ludgate	e1a564ace2	proxy simplify cancellation (#5916 ) ## Problem The cancellation code was confusing and error prone (as seen before in our memory leaks). ## Summary of changes * Use the new `TaskTracker` primitive instead of JoinSet to gracefully wait for tasks to shutdown. * Updated libs/utils/completion to use `TaskTracker` * Remove `tokio::select` in favour of `futures::future::select` in a specialised `run_until_cancelled()` helper function	2023-12-08 16:21:17 +00:00
Christian Schwarz	f5b9af6ac7	page cache: improve eviction-related metrics (#6077 ) These changes help with identifying thrashing. The existing `pageserver_page_cache_find_victim_iters_total` is already useful, but, it doesn't tell us how many individual find_victim() calls are happening, only how many clock-LRU steps happened in the entire system, without info about whether we needed to actually evict other data vs just scan for a long time, e.g., because the cache is large. The changes in this PR allows us to 1. count each possible outcome separately, esp evictions 2. compute mean iterations/outcome I don't think anyone except me was paying close attention to `pageserver_page_cache_find_victim_iters_total` before, so, I think the slight behavior change of also counting iterations for the 'iters exceeded' case is fine. refs https://github.com/neondatabase/cloud/issues/8351 refs https://github.com/neondatabase/neon/issues/5479	2023-12-08 15:27:21 +00:00
John Spray	5e98855d80	tests: update tests that used local_fs&mock_s3 to use one or the other (#6015 ) ## Problem This was wasting resources: if we run a test with mock s3 we don't then need to run it again with local fs. When we're running in CI, we don't need to run with the mock/local storage as well as real S3. There is some value in having CI notice/spot issues that might otherwise only happen when running locally, but that doesn't justify the cost of running the tests so many more times on every PR. ## Summary of changes - For tests that used available_remote_storages or available_s3_storages, update them to either specify no remote storage (therefore inherit the default, which is currently local fs), or to specify s3_storage() for the tests that actually want an S3 API.	2023-12-08 14:52:37 +00:00
Conrad Ludgate	699049b8f3	proxy: make auth more type safe (#5689 ) ## Problem `a5292f7e67/proxy/src/auth/backend.rs (L146-L148)` `a5292f7e67/proxy/src/console/provider/neon.rs (L90)` `a5292f7e67/proxy/src/console/provider/neon.rs (L154)` ## Summary of changes 1. Test backend is only enabled on `cfg(test)`. 2. Postgres mock backend + MD5 auth keys are only enabled on `cfg(feature = testing)` 3. Password hack and cleartext flow will have their passwords validated before proceeding. 4. Distinguish between ClientCredentials with endpoint and without, removing many panics in the process	2023-12-08 11:48:37 +00:00
John Spray	2c544343e0	pageserver: filtered WAL ingest for sharding (#6024 ) ## Problem Currently, if one creates many shards they will all ingest all the data: not much use! We want them to ingest a proportional share of the data each. Closes: #6025 ## Summary of changes - WalIngest object gets a copy of the ShardIdentity for the Tenant it was created by. - While iterating the `blocks` part of a decoded record, blocks that do not match the current shard are ignored, apart from on shard zero where they are used to update relation sizes in `observe_decoded_block` (but not stored). - Before committing a `DataDirModificiation` from a WAL record, we check if it's empty, and drop the record if so. This check is necessary (rather than just looking at the `blocks` part) because certain record types may modify blocks in non-obvious ways (e.g. `ingest_heapam_record`). - Add WAL ingest metrics to record the total received, total committed, and total filtered out - Behaviour for unsharded tenants is unchanged: they will continue to ingest all blocks, and will take the fast path through `is_key_local` that doesn't bother calculating any hashes. After this change, shards store a subset of the tenant's total data, and accurate relation sizes are only maintained on shard zero. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-12-08 10:12:37 +00:00
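The filtering described in the commit above can be illustrated with a minimal sketch. It assumes a toy hash and a simplified `ShardIdentity`; the field names, `toy_hash`, `ingest`, and `IngestCounts` are hypothetical and are not the pageserver's real types or key-to-shard mapping. It also works per block rather than per WAL record and omits the shard-zero relation-size special case.

```rust
/// Hypothetical stand-in for the pageserver's `ShardIdentity` (assumption).
#[derive(Clone, Copy, Debug)]
struct ShardIdentity {
    number: u32,
    count: u32,
}

/// Toy hash for illustration only; the real mapping is not shown in the commit.
fn toy_hash(key: u64) -> u64 {
    key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32
}

impl ShardIdentity {
    /// Unsharded tenants take the fast path and never compute a hash.
    fn is_key_local(&self, key: u64) -> bool {
        if self.count <= 1 {
            return true;
        }
        toy_hash(key) % u64::from(self.count) == u64::from(self.number)
    }
}

/// Counters mirroring the "received / committed / filtered" ingest metrics.
#[derive(Debug, Default, PartialEq)]
struct IngestCounts {
    received: u64,
    committed: u64,
    filtered: u64,
}

/// Keeps only the blocks that belong to `shard` and counts the rest as filtered.
fn ingest(shard: ShardIdentity, block_keys: &[u64]) -> IngestCounts {
    let mut counts = IngestCounts::default();
    for &key in block_keys {
        counts.received += 1;
        if shard.is_key_local(key) {
            counts.committed += 1;
        } else {
            counts.filtered += 1;
        }
    }
    counts
}

fn main() {
    let keys: Vec<u64> = (0..1000).collect();
    // Unsharded tenant: every block is committed.
    println!("{:?}", ingest(ShardIdentity { number: 0, count: 1 }, &keys));
    // Four shards: each block is committed by exactly one of them.
    for number in 0..4 {
        println!("{:?}", ingest(ShardIdentity { number, count: 4 }, &keys));
    }
}
```

Because every key hashes to exactly one shard number, the committed counts across all shards add up to the number of blocks received.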
Arseny Sher	193e60e2b8	Fix/edit pgindent confusing places in neon.	2023-12-08 14:03:13 +04:00
Arseny Sher	1bbd6cae24	pgindent pgxn/neon	2023-12-08 14:03:13 +04:00
Arseny Sher	65f48c7002	Make targets to run pgindent on core and neon extension.	2023-12-08 14:03:13 +04:00
Alexander Bayandin	d9d8e9afc7	test_tenant_reattach: fix reattach mode names (#6070 ) ## Problem Ref https://neondb.slack.com/archives/C033QLM5P7D/p1701987609146109?thread_ts=1701976393.757279&cid=C033QLM5P7D ## Summary of changes - Make reattach mode names unique for `test_tenant_reattach`	2023-12-08 08:39:45 +00:00
Arpad Müller	7914eaf1e6	Buffer initdb.tar.zst to a temporary file before upload (#5944 ) In https://github.com/neondatabase/neon/pull/5912#pullrequestreview-1749982732 , Christian liked the idea of using files instead of buffering the archive to RAM for the download path. This is for the upload path, which is a very similar situation.	2023-12-08 03:33:44 +01:00
Joonas Koivunen	37fdbc3aaa	fix: use larger buffers for remote storage (#6069 ) Currently using 8kB buffers, raise that to 32kB to hopefully cut `spawn_blocking` usage to 1/4. Also a drive-by fix of the last `tokio::io::copy` to `tokio::io::copy_buf`.	2023-12-07 19:36:44 +00:00
Tristan Partin	7aa1e58301	Add support for Python 3.12	2023-12-07 12:30:42 -06:00
Christian Schwarz	f2892d3798	virtual_file metrics: distinguish first and subsequent open() syscalls (#6066 ) This helps with identifying thrashing. I don't love the name, but, there is already "close-by-replace". While reading the code, I also found a case where we waste work in a cache pressure situation: https://github.com/neondatabase/neon/issues/6065 refs https://github.com/neondatabase/cloud/issues/8351	2023-12-07 16:17:33 +00:00
Joonas Koivunen	b492cedf51	fix(remote_storage): buffering, by using streams for upload and download (#5446 ) There is double buffering in remote_storage and in pageserver for 8KiB in using `tokio::io::copy` to read `BufReader<ReaderStream<_>>`. Switches downloads and uploads to use `Stream<Item = std::io::Result<Bytes>>`. Caller and only caller now handles setting up buffering. For reading, `Stream<Item = ...>` is also a `AsyncBufRead`, so when writing to a file, we now have `tokio::io::copy_buf` reading full buffers and writing them to `tokio::io::BufWriter` which handles the buffering before dispatching over to `tokio::fs::File`. Additionally implements streaming uploads for azure. With azure downloads are a bit nicer than before, but not much; instead of one huge vec they just hold on to N allocations we got over the wire. This PR will also make it trivial to switch reading and writing to io-uring based methods. Cc: #5563.	2023-12-07 15:52:22 +00:00
John Spray	880663f6bc	tests: use tenant_create() helper in test_bulk_insert (#6064 ) ## Problem Since #5449 we enable generations in tests by default. Running benchmarks was missed while merging that PR, and there was one that needed updating. ## Summary of changes Make test_bulk_insert use the proper generation-aware helper for tenant creation.	2023-12-07 14:52:16 +00:00
John Spray	e89e41f8ba	tests: update for tenant generations (#5449 ) ## Problem Some existing tests are written in a way that's incompatible with tenant generations. ## Summary of changes Update all the tests that need updating: this is things like calling through the NeonPageserver.tenant_attach helper to get a generation number, instead of calling directly into the pageserver API. There are various more subtle cases.	2023-12-07 12:27:16 +00:00
Conrad Ludgate	f9401fdd31	proxy: fix channel binding error messages (#6054 ) ## Problem For channel binding failed messages we were still saying "channel binding not supported" in the errors. ## Summary of changes Fix error messages	2023-12-07 11:47:16 +00:00
Joonas Koivunen	b7ffe24426	build: update tokio to 1.34.0, tokio-utils 0.7.10 (#6061 ) We should still remember to bump minimum crates for libraries beginning to use task tracker.	2023-12-07 11:31:38 +00:00
Joonas Koivunen	52718bb8ff	fix(layer): metric splitting, span rename (#5902 ) Per [feedback], split the Layer metrics, also finally account for lost and [re-submitted feedback] on `layer_gc` by renaming it to `layer_delete`, `Layer::garbage_collect_on_drop` renamed to `Layer::delete_on_drop`. References to "gc" dropped from metric names and elsewhere. Also fixes how the cancellations were tracked: there was one rare counter. Now there is a top level metric for cancelled inits, and the rare "download failed but failed to communicate" counter is kept. Fixes: #6027 [feedback]: https://github.com/neondatabase/neon/pull/5809#pullrequestreview-1720043251 [re-submitted feedback]: https://github.com/neondatabase/neon/pull/5108#discussion_r1401867311	2023-12-07 11:39:40 +02:00
Joonas Koivunen	10c77cb410	temp: increase the wait tenant activation timeout (#6058 ) 5s is causing way too much noise; this is of course a temporary fix, we should prioritize tenants for which there are pagestream openings the highest, second highest the basebackups. Deployment thread for context: https://neondb.slack.com/archives/C03H1K0PGKH/p1701935048144479?thread_ts=1701765158.926659&cid=C03H1K0PGKH	2023-12-07 09:01:08 +00:00
Heikki Linnakangas	31be301ef3	Make simple_rcu::RcuWaitList::wait() async (#6046 ) The gc_timeline() function is async, but it calls the synchronous wait() function. In the worst case, that could lead to a deadlock by using up all tokio executor threads. In passing, fix a few typos in comments. Fixes issue #6045. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-12-07 10:20:40 +02:00
Joonas Koivunen	a3c7d400b4	fix: avoid allocations with logging a slug (#6047 ) to_string forces allocating a less than pointer sized string (costing on stack 4 usize), using a Display formattable slug saves that. the difference seems small, but at the same time, we log these a lot.	2023-12-07 07:25:22 +00:00
Vadim Kharitonov	7501ca6efb	Revert timescaledb for pg14 and pg15 (#6056 ) ``` could not start the compute node: compute is in state "failed": db error: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory Caused by: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory ```	2023-12-06 15:12:36 +00:00
Christian Schwarz	987c9aaea0	virtual_file: fix the metric for close() calls done by VirtualFile::drop (#6051 ) Before this PR we would inc() the counter for `Close` even though the slot's FD had already been closed. Especially visible when subtracting `open` from `close+close-by-replace` on a system that does a lot of attach and detach. refs https://github.com/neondatabase/cloud/issues/8440 refs https://github.com/neondatabase/cloud/issues/8351	2023-12-06 12:05:28 +00:00
Konstantin Knizhnik	7fab731f65	Track size of FSM fork while applying records at replica (#5901 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1700560921471619 ## Summary of changes Update relation size cache for FSM fork in WAL records filter Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-12-05 18:49:24 +02:00
John Spray	483caa22c6	pageserver: logging tweaks (#6039 ) - The `Attaching tenant` log message omitted some useful information like the generation and mode - info-level messages about writing configuration files were unnecessarily verbose - During process shutdown, we don't emit logs about the various phases: this is very cheap to log since we do it once per process lifetime, and is helpful when figuring out where something got stuck during a hang.	2023-12-05 16:11:15 +00:00
John Spray	da5e03b0d8	pageserver: add a /reset API for tenants (#6014 ) ## Problem Traditionally we would detach/attach directly with curl if we wanted to "reboot" a single tenant. That's kind of inconvenient these days, because one needs to know a generation number to issue an attach request. Closes: https://github.com/neondatabase/neon/issues/6011 ## Summary of changes - Introduce a new `/reset` API, which remembers the LocationConf from the current attachment so that callers do not have to work out the correct configuration/generation to use. - As an additional support tool, allow an optional `drop_cache` query parameter, for situations where we are concerned that some on-disk state might be bad and want to clear that as well as the in-memory state. One might wonder why I didn't call this "reattach" -- it's because there's already a PS->CP API of that name and it could get confusing.	2023-12-05 15:38:27 +00:00
John Spray	be885370f6	pageserver: remove redundant unsafe_create_dir_all (#6040 ) This non-fsyncing analog to our safe directory creation function was just duplicating what tokio's fs::create_dir_all does.	2023-12-05 15:03:07 +00:00
Alexey Kondratov	bc1020f965	compute_ctl: Notify waiters when Postgres failed to start (#6034 ) In case of configuring the empty compute, API handler is waiting on condvar for compute state change. Yet, previously if Postgres failed to start we were just setting compute status to `Failed` without notifying. It causes a timeout on control plane side, although we can return a proper error from compute earlier. With this commit API handler should be properly notified.	2023-12-05 13:38:45 +01:00
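The fix described in the commit above comes down to notifying the condition variable on the failure path as well. The sketch below shows that pattern with the standard library only; `Compute`, `Status`, `set_status`, `wait_for_outcome`, and `start_and_wait` are hypothetical names for illustration, not `compute_ctl`'s actual types.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical, simplified compute status (assumption, not compute_ctl's enum).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Init,
    Running,
    Failed,
}

struct Compute {
    status: Mutex<Status>,
    changed: Condvar,
}

impl Compute {
    /// Every status change, including `Failed`, wakes the waiters.
    fn set_status(&self, status: Status) {
        *self.status.lock().unwrap() = status;
        self.changed.notify_all();
    }

    /// Blocks until the status leaves `Init`, then returns the outcome.
    fn wait_for_outcome(&self) -> Status {
        let guard = self.status.lock().unwrap();
        let guard = self
            .changed
            .wait_while(guard, |status| *status == Status::Init)
            .unwrap();
        *guard
    }
}

/// Simulates a startup attempt on a worker thread and waits for its result.
fn start_and_wait(postgres_starts: bool) -> Status {
    let compute = Arc::new(Compute {
        status: Mutex::new(Status::Init),
        changed: Condvar::new(),
    });
    let worker = {
        let compute = Arc::clone(&compute);
        thread::spawn(move || {
            let outcome = if postgres_starts {
                Status::Running
            } else {
                Status::Failed
            };
            compute.set_status(outcome);
        })
    };
    let outcome = compute.wait_for_outcome();
    worker.join().unwrap();
    outcome
}

fn main() {
    println!("{:?}", start_and_wait(true));
    println!("{:?}", start_and_wait(false));
}
```

If `set_status` skipped `notify_all` for `Failed`, the waiter would sleep until some external timeout fired, which is the behaviour the commit message describes on the control plane side.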
John Spray	61fe9d360d	pageserver: add Key->Shard mapping logic & use it in page service (#5980 ) ## Problem When a pageserver receives a page service request identified by TenantId, it must decide which `Tenant` object to route it to. As in earlier PRs, this stuff is all a no-op for tenants with a single shard: calls to `is_key_local` always return true without doing any hashing on a single-shard ShardIdentity. Closes: https://github.com/neondatabase/neon/issues/6026 ## Summary of changes - Carry immutable `ShardIdentity` objects in Tenant and Timeline. These provide the information that Tenants/Timelines need to figure out which shard is responsible for which Key. - Augment `get_active_tenant_with_timeout` to take a `ShardSelector` specifying how the shard should be resolved for this tenant. This mode depends on the kind of request (e.g. basebackups always go to shard zero). - In `handle_get_page_at_lsn_request`, handle the case where the Timeline we looked up at connection time is not the correct shard for the page being requested. This can happen whenever one node holds multiple shards for the same tenant. This is currently written as a "slow path" with the optimistic expectation that usually we'll run with one shard per pageserver, and the Timeline resolved at connection time will be the one serving page requests. There is scope for optimization here later, to avoid doing the full shard lookup for each page. - Omit consumption metrics from nonzero shards: only the 0th shard is responsible for tracing accurate relation sizes. Note to reviewers: - Testing of these changes is happening separately on the `jcsp/sharding-pt1` branch, where we have hacked neon_local etc needed to run a test_pg_regress. - The main caveat to this implementation is that page service connections still look up one Timeline when the connection is opened, before they know which pages are going to be read. 
If there is one shard per pageserver then this will always also be the Timeline that serves page requests. However, if multiple shards are on one pageserver then get page requests will incur the cost of looking up the correct Timeline on each getpage request. We may look to improve this in future with a "sticky" timeline per connection handler so that subsequent requests for the same Timeline don't have to look up again, and/or by having postgres pass a shard hint when connecting. This is tracked in the "Loose ends" section of https://github.com/neondatabase/neon/issues/5507	2023-12-05 12:01:55 +00:00
Conrad Ludgate	f60e49fe8e	proxy: fix panic in startup packet (#6032 ) ## Problem Panic when less than 8 bytes is presented in a startup packet. ## Summary of changes We need there to be a 4 byte message code, so the expected min length is 8.	2023-12-05 11:24:16 +01:00
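A minimal sketch of the length check described in the commit above, assuming the packet starts with a 4-byte length prefix followed by a 4-byte big-endian message code. `StartupError`, `MIN_STARTUP_PACKET_LEN`, and `read_message_code` are hypothetical names, not the proxy's actual code.

```rust
/// Hypothetical error type for this sketch (assumption).
#[derive(Debug, PartialEq)]
enum StartupError {
    TooShort { len: usize },
}

/// 4-byte length prefix plus 4-byte message code.
const MIN_STARTUP_PACKET_LEN: usize = 8;

/// Returns the message code, or an error instead of panicking on short input.
fn read_message_code(packet: &[u8]) -> Result<u32, StartupError> {
    if packet.len() < MIN_STARTUP_PACKET_LEN {
        return Err(StartupError::TooShort { len: packet.len() });
    }
    // Bytes 0..4 hold the length; bytes 4..8 hold the code.
    Ok(u32::from_be_bytes([packet[4], packet[5], packet[6], packet[7]]))
}

fn main() {
    // Length 8, code 0x0003_0000 (protocol version 3.0).
    println!("{:?}", read_message_code(&[0, 0, 0, 8, 0, 3, 0, 0]));
    // Only three bytes: rejected without indexing out of bounds.
    println!("{:?}", read_message_code(&[0, 0, 0]));
}
```

Checking the length before indexing turns the out-of-bounds panic into an ordinary error the caller can report.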
Anna Khanova	c48918d329	Rename metric (#6030 ) ## Problem It looks like because of reallocation of the buckets in previous PR, the metric is broken in Grafana. ## Summary of changes Renamed the metric.	2023-12-05 10:03:07 +00:00
Sasha Krassovsky	bad686bb71	Remove trusted from wal2json (#6035 )	2023-12-04 21:10:23 +00:00
Alexey Kondratov	85d08581ed	[compute_ctl] Introduce feature flags in the compute spec (#6016 ) ## Problem In the past we've rolled out all new `compute_ctl` functionality right to all users, which could be risky. I want to have a more fine-grained control over what we enable, in which env and to which users. ## Summary of changes Add an option to pass a list of feature flags to `compute_ctl`. If not passed, it defaults to an empty list. Any unknown flags are ignored. This allows us to release new experimental features safer, as we can then flip the flag for one specific user, only Neon employees, free / pro / etc. users and so on. Or control it per environment. In the current implementation feature flags are passed via compute spec, so they do not allow controlling behavior of `empty` computes. For them, we can either stick with the previous approach, i.e. add separate cli args or introduce a more generic `--features` cli argument.	2023-12-04 19:54:18 +01:00
Christian Schwarz	c7f1143e57	concurrency-limit low-priority initial logical size calculation [v2] (#6000 ) Problem ------- Before this PR, there was no concurrency limit on initial logical size computations. While logical size computations are lazy in theory, in practice (production), they happen in a short timeframe after restart. This means that on a PS with 20k tenants, we'd have up to 20k concurrent initial logical size calculation requests. This is self-inflicted needless overload. This hasn't been a problem so far because the `.await` points on the logical size calculation path never return `Pending`, hence we have a natural concurrency limit of the number of executor threads. But, as soon as we return `Pending` somewhere in the logical size calculation path, other concurrent tasks get scheduled by tokio. If these other tasks are also logical size calculations, they eventually pound on the same bottleneck. For example, in #5479, we want to switch the VirtualFile descriptor cache to a `tokio::sync::RwLock`, which makes us return `Pending`, and without measures like this patch, after PS restart, VirtualFile descriptor cache thrashes heavily for 2 hours until all the logical size calculations have been computed and the degree of concurrency / concurrent VirtualFile operations is down to regular levels. See the Experiment section below for details. <!-- Experiments (see below) show that plain #5479 causes heavy thrashing of the VirtualFile descriptor cache. The high degree of concurrency is too much for In the case of #5479 the VirtualFile descriptor cache size starts thrashing heavily. --> Background ---------- Before this PR, initial logical size calculation was spawned lazily on first call to `Timeline::get_current_logical_size()`. 
In practice (prod), the lazy calculation is triggered by `WalReceiverConnectionHandler` if the timeline is active according to storage broker, or by the first iteration of consumption metrics worker after restart (`MetricsCollection`). The spawns by walreceiver are high-priority because logical size is needed by Safekeepers (via walreceiver `PageserverFeedback`) to enforce the project logical size limit. The spawns by metrics collection are not on the user-critical path and hence low-priority. [^consumption_metrics_slo] [^consumption_metrics_slo]: We can't delay metrics collection indefinitely because there are TBD internal SLOs tied to metrics collection happening in a timely manner (https://github.com/neondatabase/cloud/issues/7408). But let's ignore that in this issue. The ratio of walreceiver-initiated spawns vs consumption-metrics-initiated spawns can be reconstructed from logs (`spawning logical size computation from context of task kind {:?}"`). PRs #5995 and #6018 add metrics for this. First investigation of the ratio led to the discovery that walreceiver spawns 75% of init logical size computations. That's because of two bugs: - In Safekeepers: https://github.com/neondatabase/neon/issues/5993 - In interaction between Pageservers and Safekeepers: https://github.com/neondatabase/neon/issues/5962 The safekeeper bug is likely primarily responsible but we don't have the data yet. The metrics will hopefully provide some insights. When assessing production-readiness of this PR, please assume that neither of these bugs are fixed yet. Changes In This PR ------------------ With this PR, initial logical size calculation is reworked as follows: First, all initial logical size calculation task_mgr tasks are started early, as part of timeline activation, and run a retry loop with long back-off until success. This removes the lazy computation; it was needless complexity because in practice, we compute all logical sizes anyways, because consumption metrics collects it. 
Second, within the initial logical size calculation task, each attempt queues behind the background loop concurrency limiter semaphore. This fixes the performance issue that we pointed out in the "Problem" section earlier. Third, there is a twist to queuing behind the background loop concurrency limiter semaphore. Logical size is needed by Safekeepers (via walreceiver `PageserverFeedback`) to enforce the project logical size limit. However, we currently do open walreceiver connections even before we have an exact logical size. That's bad, and I'll build on top of this PR to fix that (https://github.com/neondatabase/neon/issues/5963). But, for the purposes of this PR, we don't want to introduce a regression, i.e., we don't want to provide an exact value later than before this PR. The solution is to introduce a priority-boosting mechanism (`GetLogicalSizePriority`), allowing callers of `Timeline::get_current_logical_size` to specify how urgently they need an exact value. The effect of specifying high urgency is that the initial logical size calculation task for the timeline will skip the concurrency limiting semaphore. This should yield effectively the same behavior as we had before this PR with lazy spawning. Last, the priority-boosting mechanism obsoletes the `init_order`'s grace period for initial logical size calculations. It's a separate commit to reduce the churn during review. We can drop that commit if people think it's too much churn, and commit it later once we know this PR here worked as intended. Experiment With #5479 --------------------- I validated this PR combined with #5479 to assess whether we're making forward progress towards asyncification. The setup is an `i3en.3xlarge` instance with 20k tenants, each with one timeline that has 9 layers. All tenants are inactive, i.e., not known to SKs nor storage broker. This means all initial logical size calculations are spawned by consumption metrics `MetricsCollection` task kind. 
The consumption metrics worker starts requesting logical sizes at low priority immediately after restart. This is achieved by deleting the consumption metrics cache file on disk before starting PS.[^consumption_metrics_cache_file] [^consumption_metrics_cache_file] Consumption metrics worker persists its interval across restarts to achieve persistent reporting intervals across PS restarts; delete the state file on disk to get predictable (and I believe worst-case in terms of concurrency during PS restart) behavior. Before this patch, all of these timelines would all do their initial logical size calculation in parallel, leading to extreme thrashing in page cache and virtual file cache. With this patch, the virtual file cache thrashing is reduced significantly (from 80k `open`-system-calls/second to ~500 `open`-system-calls/second during loading). ### Critique The obvious critique with above experiment is that there's no skipping of the semaphore, i.e., the priority-boosting aspect of this PR is not exercised. If even just 1% of our 20k tenants in the setup were active in SK/storage_broker, then 200 logical size calculations would skip the limiting semaphore immediately after restart and run concurrently. Further critique: given the two bugs wrt timeline inactive vs active state that were mentioned in the Background section, we could have 75% of our 20k tenants being (falsely) active on restart. So... (next section) This Doesn't Make Us Ready For Async VirtualFile ------------------------------------------------ This PR is a step towards asynchronous `VirtualFile`, aka, #5479 or even #4744. But it doesn't yet enable us to ship #5479. The reason is that this PR doesn't limit the amount of high-priority logical size computations. If there are many high-priority logical size calculations requested, we'll fall over like we did if #5479 is applied without this PR. 
And currently, at the very least due to the bugs mentioned in the Background section, we run thousands of high-priority logical size calculations on PS startup in prod. So, at a minimum, we need to fix these bugs. Then we can ship #5479 and #4744, and things will likely be fine under normal operation. But in high-traffic situations, overload problems will still be more likely to happen, e.g., VirtualFile cache descriptor thrashing. The solution candidates for that are orthogonal to this PR though: * global concurrency limiting * per-tenant rate limiting => #5899 * load shedding * scaling bottleneck resources (fd cache size (neondatabase/cloud#8351), page cache size(neondatabase/cloud#8351), spread load across more PSes, etc) Conclusion ---------- Even with the remarks in the previous section, we should merge this PR because: 1. it's an improvement over the status quo (esp. if the aforementioned bugs wrt timeline active / inactive are fixed) 2. it prepares the way for https://github.com/neondatabase/neon/pull/6010 3. it gets us close to shipping #5479 and #4744	2023-12-04 17:22:26 +00:00
Christian Schwarz	7403d55013	walredo: stderr cleanup & make explicitly cancel safe (#6031 ) # Problem I need walredo to be cancellation-safe for https://github.com/neondatabase/neon/pull/6000#discussion_r1412049728 # Solution We are only `async fn` because of `wait_for(stderr_logger_task_done).await`, added in #5560 . The `stderr_logger_cancel` and `stderr_logger_task_done` were there out of precaution that the stderr logger task might for some reason not stop when the walredo process terminates. That hasn't been a problem in practice. So, simplify things: - remove `stderr_logger_cancel` and the `wait_for(...stderr_logger_task_done...)` - use `tokio::process::ChildStderr` in the stderr logger task - add metrics to track number of running stderr logger tasks so in case I'm wrong here, we can use these metrics to identify the issue (not planning to put them into a dashboard or anything)	2023-12-04 16:06:41 +00:00
Anna Khanova	12f02523a4	Enable dynamic rate limiter (#6029 ) ## Problem Limit the number of open connections between the control plane and proxy. ## Summary of changes Enable dynamic rate limiter in prod. Unfortunately the latency metrics are a bit broken, but from logs I see that on staging for the past 7 days only 2 times latency for acquiring was greater than 1ms (for most of the cases it's insignificant).	2023-12-04 15:00:24 +00:00
Arseny Sher	207c527270	Safekeepers: persist state before timeline deactivation. Without it, sometimes on restart we lose latest remote_consistent_lsn which leads to excessive ps -> sk reconnections. https://github.com/neondatabase/neon/issues/5993	2023-12-04 18:22:36 +04:00
John Khvatov	eae49ff598	Perform L0 compaction before creating new image layers (#5950 ) If there are too many L0 layers before compaction, the compaction process becomes slow because of slow `Timeline::get`. As a result of the slowdown, the pageserver will generate even more L0 layers for the next iteration, further exacerbating the slow performance. Change to perform L0 -> L1 compaction before creating new images. The simple change speeds up compaction and `Timeline::get` by up to 5x. `Timeline::get` is faster on top of L1 layers. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-12-04 12:35:09 +00:00
Alexander Bayandin	e6b2f89fec	test_pg_clients: fix test that reads from stdout (#6021 ) ## Problem `test_pg_clients` reads the actual result from a .stdout file, https://github.com/neondatabase/neon/pull/5977 has added a header to such files, so `test_pg_clients` started to fail. ## Summary of changes - Use `capture_stdout` and compare the expected result with the output instead of .stdout file content	2023-12-04 11:18:41 +00:00
John Spray	1d81e70d60	pageserver: tweak logs for index_part loading (#6005 ) ## Problem On pageservers upgraded to enable generations, these INFO level logs were rather frequent. If a tenant timeline hasn't written new layers since the upgrade, it will emit the "No index_part.json*" log every time it starts. ## Summary of changes - Downgrade two log lines from info to debug - Add a tiny unit test that I wrote for sanity-checking that there wasn't something wrong with our Generation-comparing logic when loading index parts.	2023-12-04 09:57:47 +00:00
Anastasia Lubennikova	e3512340c1	Override neon.max_cluster_size for the time of compute_ctl (#5998 ) Temporarily reset neon.max_cluster_size to avoid the possibility of hitting the limit, while we are applying config: creating new extensions, roles, etc...	2023-12-03 15:21:44 +00:00
Christian Schwarz	e43cde7aba	initial logical size: remove CALLS metric from hot path (#6018 ) Only introduced a few hours ago (#5995), I took a look at the numbers from staging and realized that `get_current_logical_size()` is on the walingest hot path: we call it for every `ReplicationMessage::XLogData` that we receive. Since the metric is global, it would be quite a busy cache line. This PR replaces it with a new metric purpose-built for what's most interesting right now.	2023-12-01 22:45:04 +01:00
Alexey Kondratov	c1295bfb3a	[compute_ctl] Use correct HTTP code in the /configure errors (#6017 ) It was using `PRECONDITION_FAILED` for errors during `ComputeSpec` to `ParsedSpec` conversion, but this disobeys the OpenAPI spec [1] and the correct code should be `BAD_REQUEST` for any spec processing errors. While at it, I also noticed that the `compute_ctl` OpenAPI spec has an invalid format and fixed it. [1] `fd81945a60/compute_tools/src/http/openapi_spec.yaml (L119-L120)`	2023-12-01 18:19:55 +01:00
Joonas Koivunen	711425cc47	fix: use create_new instead of create for mutex file (#6012 ) Using create_new makes the uninit marker work as a mutual exclusion primitive. Temporary hopefully.	2023-12-01 18:30:51 +02:00
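The mutual-exclusion property mentioned in this commit can be sketched with the standard library alone. This is a minimal illustration, not the pageserver's code: the function name `try_claim_marker` and the marker path are hypothetical. It relies on `OpenOptions::create_new(true)`, which fails with `AlreadyExists` when the file is already present, so only one caller can win.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical helper: try to claim `marker` as an exclusive marker file.
/// Returns Ok(true) if this call created the file, Ok(false) if it already existed.
pub fn try_claim_marker(marker: &Path) -> io::Result<bool> {
    // `create_new(true)` makes "check that it does not exist" and "create it"
    // a single atomic step, unlike `create(true)`, which succeeds either way.
    match OpenOptions::new().write(true).create_new(true).open(marker) {
        Ok(_file) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

With plain `create(true)`, two concurrent callers would both succeed in opening the file, so the marker could not tell them apart.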
bojanserafimov	fd81945a60	Use TEST_OUTPUT envvar in pageserver (#5984 )	2023-12-01 09:16:24 -05:00
bojanserafimov	e49c21a3cd	Speed up rel extend (#5983 )	2023-12-01 09:11:41 -05:00
Anastasia Lubennikova	92e7cd40e8	add sql_exporter to vm-image (#5949 ) expose LFC metrics	2023-12-01 13:40:49 +00:00
Alexander Bayandin	7eabfc40ee	test_runner: use separate directory for each rerun (#6004 ) ## Problem While investigating https://github.com/neondatabase/neon/issues/5854, we hypothesised that logs/repo-dir from the initial failure might leak into reruns. Use different directories for each run to avoid such a possibility. ## Summary of changes - make each test rerun use different directories - update `pytest-rerunfailure` plugin from 11.1.2 to 13.0	2023-12-01 13:26:19 +00:00
Christian Schwarz	ce1652990d	logical size: better represent level of accuracy in the type system (#5999 ) I would love to not expose the inaccurate value in the mgmt API at all, and in fact control plane doesn't use it [^1]. But our tests do, and I have no desire to change them at this time. [^1]: https://github.com/neondatabase/cloud/pull/8317	2023-12-01 14:16:29 +01:00
Christian Schwarz	8cd28e1718	logical size calculation: make .current_size() infallible (#5999 ) ... by panicking on overflow; It was made fallible initially due to a lack of confidence in logical size calculation. However, the error has never happened since I have been at Neon. Let's stop worrying about this by converting the overflow check into a panic.	2023-12-01 14:16:29 +01:00
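Converting an overflow check into a panic, as this commit describes, might look roughly like the sketch below. The `SizeCounter` type, its fields, and both method names are hypothetical stand-ins; the commit message does not show the real signature of `.current_size()`.

```rust
/// Hypothetical stand-in for an incrementally maintained logical size:
/// an initial value plus a signed delta accumulated since it was computed.
pub struct SizeCounter {
    pub initial: u64,
    pub delta: i64,
}

impl SizeCounter {
    /// Fallible variant: returns None if initial + delta leaves the u64 range.
    pub fn current_size_checked(&self) -> Option<u64> {
        self.initial.checked_add_signed(self.delta)
    }

    /// Infallible variant: the same check, but a failure is treated as a bug
    /// and panics instead of being propagated to every caller.
    pub fn current_size(&self) -> u64 {
        self.current_size_checked()
            .expect("logical size calculation overflowed")
    }
}
```

The trade-off is that callers no longer have to handle an error that, per the commit message, has never been observed in practice.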
Christian Schwarz	1c88824ed0	initial logical size calculation: add a bunch of metrics (#5995 ) These will help us answer questions such as: - when & at what do calculations get started after PS restart? - how often is the api to get current incrementally-computed logical size called, and does it return Exact vs Approximate? I'd also be interested in a histogram of how much wall clock time size calculations take, but, I don't know good bucket sizes, and, logging it would introduce yet another per-timeline log message during startup; don't think that's worth it just yet. Context - https://neondb.slack.com/archives/C033RQ5SPDH/p1701197668789769 - https://github.com/neondatabase/neon/issues/5962 - https://github.com/neondatabase/neon/issues/5963 - https://github.com/neondatabase/neon/pull/5955 - https://github.com/neondatabase/cloud/issues/7408	2023-12-01 12:52:59 +01:00
Arpad Müller	1ce1c82d78	Clean up local state if index_part.json request gives 404 (#6009 ) If `index_part.json` is (verifiably) not present on remote storage, we should regard the timeline as inexistent. This lets `clean_up_timelines` purge the partial local disk state, which is important in the case of incomplete creations leaving behind state that hinders retries. For incomplete deletions, we also want the timeline's local disk content be gone completely. The PR removes the allowed warnings added by #5390 and #5912, as we now are only supposed to issue info level messages. It also adds a reproducer for #6007, by parametrizing the `test_timeline_init_break_before_checkpoint_recreate` test added by #5390. If one reverts the .rs changes, the "cannot create its uninit mark file" log line occurs once one comments out the failing checks for the local disk state being actually empty. Closes #6007 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-12-01 10:58:06 +00:00
Vadim Kharitonov	f784e59b12	Update timescaledb to 2.13.0 (#5975 ) TimescaleDB has released 2.13.0. This version is compatible with Postgres16	2023-11-30 17:12:52 -06:00
Arpad Müller	b71b8ecfc2	Add existing_initdb_timeline_id param to timeline creation (#5912 ) This PR adds an `existing_initdb_timeline_id` option to timeline creation APIs, taking an optional timeline ID. Follow-up of #5390. If the `existing_initdb_timeline_id` option is specified via the HTTP API, the pageserver downloads the existing initdb archive from the given timeline ID and extracts it, instead of running initdb itself. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-30 22:32:04 +01:00
Arpad Müller	3842773546	Correct RFC number for Pageserver WAL DR RFC (#5997 ) When I opened #5248, 27 was an unused RFC number. Since then, two RFCs have been merged, so now 27 is taken. 29 is free though, so move it there.	2023-11-30 21:01:25 +00:00
Conrad Ludgate	f39fca0049	proxy: chore: replace strings with SmolStr (#5786 ) ## Problem no problem ## Summary of changes replaces boxstr with arcstr as it's cheaper to clone. mild perf improvement. probably should look into other small-string optimisations tbh, they will likely be even better. The longest endpoint name I was able to construct is something like `ep-weathered-wildflower-12345678` which is 32 bytes. Most string optimisations top out at 23 bytes	2023-11-30 20:52:30 +00:00
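The size argument in this commit can be checked with a few lines of Rust. The constant and the function name are illustrative only; the 23-byte figure is the one quoted in the commit message, not a property of any particular crate verified here.

```rust
/// Inline capacity that the commit message quotes for most small-string optimisations.
pub const TYPICAL_INLINE_CAPACITY: usize = 23;

/// Hypothetical check: would a string of this length fit in such an inline buffer?
pub fn fits_inline(s: &str) -> bool {
    // `str::len` counts bytes, which is what matters for inline storage.
    s.len() <= TYPICAL_INLINE_CAPACITY
}
```

For the example name, `"ep-weathered-wildflower-12345678".len()` is 32 (2 + 1 + 9 + 1 + 10 + 1 + 8), so it would not fit in a 23-byte inline buffer and would fall back to a heap representation.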
Joonas Koivunen	b451e75dc6	test: include cmdline in captured output (#5977 ) aiming for faster to understand a bunch of `.stdout` and `.stderr` files, see example echo_1.stdout differences: ``` +# echo foobar abbacd + foobar abbacd ``` it can be disabled and is disabled in this PR for some tests; use `pg_bin.run_capture(..., with_command_header=False)` for that. as a bonus this cleans up the echoed newlines from s3_scrubber output which are also saved to file but echoed to test log. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-11-30 17:31:03 +00:00
Anna Khanova	3657a3c76e	Proxy fix metrics record (#5996 ) ## Problem Some latency metrics are recorded in inconsistent way. ## Summary of changes Make sure that everything is recorded in seconds.	2023-11-30 16:33:54 +00:00
Joonas Koivunen	eba3bfc57e	test: python needs thread safety as well (#5992 ) we have test cases which launch processes from threads, and they capture output assuming this counter is thread-safe. at least according to my understanding this operation in python requires a lock to be thread-safe.	2023-11-30 15:48:40 +00:00
John Spray	57ae9cd07f	pageserver: add `flush_ms` and document `/location_config` API (#5860 ) - During migration of tenants, it is useful for callers to `/location_conf` to flush a tenant's layers while transitioning to AttachedStale: this optimization reduces the redundant WAL replay work that the tenant's new attached pageserver will have to do. Test coverage for this will come as part of the larger tests for live migration in #5745 #5842 - Flushing is controlled with `flush_ms` query parameter: it is the caller's job to decide how long they want to wait for a flush to complete. If flush is not complete within the time limit, the pageserver proceeds to succeed anyway: flushing is only an optimization. - Add swagger definitions for all this: the location_config API is the primary interface for driving tenant migration as described in docs/rfcs/028-pageserver-migration.md, and will eventually replace the various /attach /detach /load /ignore APIs. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 14:22:07 +00:00
Christian Schwarz	3bb1030f5d	walingest: refactor if-cascade on `decoded.xl_rmid` into match statement (#5974 ) refs https://github.com/neondatabase/neon/issues/5962 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 14:07:41 +00:00
John Spray	5d3c3636fc	tests: add a log allow list entry in `test_timeline_deletion_with_files_stuck_in_upload_queue` (#5981 ) Test failure seen here: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5860/7032903218/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/c0f1c79a70a3b9ab ``` E AssertionError: assert not [(302, '2023-11-29T13:23:51.046801Z ERROR request{method=PUT path=/v1/tenant/f6b845de60cb0e92f4426e0d6af1d2ea/timeline/69a8c98004abe71a281cff8642a45274/checkpoint request_id=eca33d8a-7af2-46e7-92ab-c28629feb42c}: Error processing HTTP request: InternalServerError(queue is in state Stopped\n')] ``` This appears to be a legitimate log: the test is issuing checkpoint requests in the background, and deleting (therefore shutting down) a timeline.	2023-11-30 13:44:14 +00:00
Conrad Ludgate	0c87d1866b	proxy: fix wake_compute error prop (#5989 ) ## Problem fixes #5654 - WakeComputeErrors occurring during a connect_to_compute got propagated as IO errors, which get forwarded to the user as "Couldn't connect to compute node" with no helpful message. ## Summary of changes Handle WakeComputeError during ConnectionError properly	2023-11-30 13:43:21 +00:00
Arpad Müller	8ec6033ed8	Pageserver disaster recovery RFC (#5248 ) Enable the pageserver to recover from data corruption events by implementing a feature to re-apply historic WAL records in parallel to the already occurring WAL replay. The feature is outside of the user-visible backup and history story, and only serves as a second-level backup for the case that there is a bug in the pageservers that corrupted the served pages. The RFC proposes the addition of two new features: * recover a broken branch from WAL (downtime is allowed) * a test recovery system to recover random branches to make sure recovery works	2023-11-30 14:30:17 +01:00
Anna Khanova	e12e2681e9	IP allowlist on the proxy side (#5906 ) ## Problem Per-project IP allowlist: https://github.com/neondatabase/cloud/issues/8116 ## Summary of changes Implemented IP filtering on the proxy side. To retrieve ip allowlist for all scenarios, added `get_auth_info` call to the control plane for: * sql-over-http * password_hack * cleartext_hack Added cache with ttl for sql-over-http path This might slow down a bit, consider using redis in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2023-11-30 13:14:33 +00:00
Joonas Koivunen	1e57ddaabc	fix: flush loop should also keep the gate open (#5987 ) I was expecting this to already be in place, because this should not conflict how we shutdown (0. cancel, 1. shutdown_tasks, 2. close gate).	2023-11-30 14:26:11 +02:00
John Khvatov	3e094e90d7	update aws sdk to 1.0.x (#5976 ) This change will be useful for experimenting with S3 performance. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 14:17:58 +02:00
Christian Schwarz	292281c9df	pagectl: add subcommand to rewrite layer file summary (#5933 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771	2023-11-30 11:34:30 +00:00
Rahul Modpur	50d959fddc	refactor: use serde for TenantConf deserialization Fixes: #5300 (#5310 ) Remove handcrafted TenantConf deserialization code. Use `serde_path_to_error` to include the field which failed parsing. Leaves the duplicated TenantConf in pageserver and models, does not touch PageserverConf handcrafted deserialization. Error change: - before change: "configure option `checkpoint_distance` cannot be negative" - after change: "`checkpoint_distance`: invalid value: integer `-1`, expected u64" Fixes: #5300 Cc: #3682 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 12:47:13 +02:00
Conrad Ludgate	fc77c42c57	proxy: add flag to enable http pool for all users (#5959 ) ## Problem #5123 ## Summary of changes Add `--sql-over-http-pool-opt-in true` default cli arg. Allows us to set `--sql-over-http-pool-opt-in false` region-by-region	2023-11-30 10:19:30 +00:00
Conrad Ludgate	f05d1b598a	proxy: add more db error info (#5951 ) ## Problem https://github.com/neondatabase/serverless/issues/51 ## Summary of changes include more error fields in the json response	2023-11-30 10:18:59 +00:00
Christian Schwarz	ca597206b8	walredo: latency histogram for spawn duration (#5925 ) fixes https://github.com/neondatabase/neon/issues/5891	2023-11-29 18:44:37 +00:00
Rahul Modpur	46f20faa0d	neon_local: fix endpoint api to prevent two primary endpoints (#5520 ) `neon_local endpoint` subcommand currently allows creating two primary endpoints for the same branch which leads to shutdown of both endpoints `neon_local endpoint start` new behavior: 1. Fail if endpoint doesn't exist 2. Fail if two primary conflict detected Fixes #4959 Closes #5426 Signed-off-by: Rahul Modpur <rmodpur2@gmail.com> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-29 19:38:03 +02:00
John Spray	9e55ad4796	pageserver: refactor TenantId to TenantShardId in Tenant & Timeline (#5957 ) (includes two preparatory commits from https://github.com/neondatabase/neon/pull/5960) ## Problem To accommodate multiple shards in the same tenant on the same pageserver, we must include the full TenantShardId in local paths. That means that all code touching local storage needs to see the TenantShardId. ## Summary of changes - Replace `tenant_id: TenantId` with `tenant_shard_id: TenantShardId` on Tenant, Timeline and RemoteTimelineClient. - Use TenantShardId in helpers for building local paths. - Update all the relevant call sites. This doesn't update absolutely everything: things like PageCache, TaskMgr, WalRedo are still shard-naive. The purpose of this PR is to update the core types so that other code can be added/updated incrementally without churning the most central shared types.	2023-11-29 14:52:35 +00:00
John Spray	70b5646fba	pageserver: remove redundant serialization helpers on DeletionList (#5960 ) Precursor for https://github.com/neondatabase/neon/pull/5957 ## Problem When DeletionList was written, TenantId/TimelineId didn't have human-friendly modes in their serde. #5335 added those, such that the helpers used in serialization of HashMaps are no longer necessary. ## Summary of changes - Add a unit test to ensure that this change isn't changing anything about the serialized form - Remove the serialization helpers for maps of Id	2023-11-29 10:39:12 +00:00
Konstantin Knizhnik	64890594a5	Optimize storing of null page in WAL (#5910 ) ## Problem PG16 (https://github.com/neondatabase/postgres/pull/327) adds a new function to SMGR: zeroextend. Its implementation in Neon actually WAL-logs the zero pages of the extended relation. Each zero page is WAL-logged using XLOG_FPI. Because the page is all zeros, the hole optimization (excluding from the image everything between pd_lower and pd_upper) doesn't work. ## Summary of changes In case of a zero page (`PageIsNull()` returns true) assume `hole_size=BLCKSZ` --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-11-29 12:08:20 +02:00
Arseny Sher	78e73b20e1	Notify safekeeper readiness with systemd. To avoid downtime during deploy, as in busy regions initial load can currently take ~30s.	2023-11-29 14:07:06 +04:00
John Spray	c48cc020bd	pageserver: fix race between deletion completion and incoming requests (#5941 ) ## Problem This is a narrow race that can leave a stuck Stopping tenant behind, while emitting a log error "Missing InProgress marker during tenant upsert, this is a bug" - Deletion request 1 puts tenant into Stopping state, and fires off background part of DeleteTenantFlow - Deletion request 2 acquires a SlotGuard for the same tenant ID, leaves a TenantSlot::InProgress in place while it checks if the tenant's state is acceptable. - DeleteTenantFlow finishes, calls TenantsMap::remove, which removes the InProgress marker. - Deletion request 2 calls SlotGuard::revert, which upserts the old value (the Tenant in Stopping state), and emits the telltale log message. Closes: #5936 ## Summary of changes - Add a regression test which uses pausable failpoints to reproduce this scenario. - TenantsMap::remove is only called by DeleteTenantFlow. Its behavior is tweaked to express the different possible states, especially `InProgress` which carries a barrier. - In DeleteTenantFlow, if we see such a barrier result from remove(), wait for the barrier and then try removing again. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-29 09:32:26 +00:00
dependabot[bot]	a15969714c	build(deps): bump openssl from 0.10.57 to 0.10.60 in /test_runner/pg_clients/rust/tokio-postgres (#5966 ) Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.57 to 0.10.60. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-29 02:17:15 +01:00
dependabot[bot]	8c195d8214	build(deps): bump cryptography from 41.0.4 to 41.0.6 (#5970 ) Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-29 02:16:35 +01:00
dependabot[bot]	0d16874960	build(deps): bump openssl from 0.10.55 to 0.10.60 (#5965 ) Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.55 to 0.10.60. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-29 01:24:02 +01:00
Alexander Bayandin	fd440e7d79	neonvm: add pgbouncer patch to support DEALLOCATE/DISCARD ALL (#5958 ) pgbouncer 1.21.0 doesn't play nicely with DEALLOCATE/DISCARD ALL if prepared statement support is enabled (max_prepared_statements > 0). There's a patch[0] that improves this (it will be included in the next release of pgbouncer). This PR applies this patch on top of the 1.21.0 release tarball. For some reason, the tarball doesn't include `test/test_prepared.py` (which is modified by the patch as well), so the patch can't be applied cleanly. I use `filterdiff` (from the `patchutils` package) to apply the required changes. [0] `a7b3c0a5f4`	2023-11-28 23:43:24 +00:00
bojanserafimov	65160650da	Add walingest test (#5892 )	2023-11-28 12:50:53 -05:00
dependabot[bot]	12dd6b61df	build(deps): bump aiohttp from 3.8.6 to 3.9.0 (#5946 )	2023-11-28 17:47:15 +00:00
bojanserafimov	5345c1c21b	perf readme fix (#5956 )	2023-11-28 17:31:42 +00:00
Joonas Koivunen	105edc265c	fix: remove layer_removal_cs (#5108 ) Quest: https://github.com/neondatabase/neon/issues/4745. Follow-up to #4938. - add in locks for compaction and gc, so we don't have multiple executions at the same time in tests - remove layer_removal_cs - remove waiting for uploads in eviction/gc/compaction - #4938 will keep the file resident until upload completes Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-28 19:15:21 +02:00
Shany Pozin	8625466144	Move run_initdb to be async and guarded by max of 8 running tasks. Fixes #5895 . Use tenant.cancel for cancellation (#5921 ) ## Problem https://github.com/neondatabase/neon/issues/5895	2023-11-28 14:49:31 +00:00
John Spray	1ab0cfc8cb	pageserver: add sharding metadata to `LocationConf` (#5932 ) ## Problem The TenantShardId in API URLs is sufficient to uniquely identify a tenant shard, but not for it to function: it also needs to know its full sharding configuration (stripe size, layout version) in order to map keys to shards. ## Summary of changes - Introduce ShardIdentity: this is the superset of ShardIndex (#5924 ) that is required for translating keys to shard numbers. - Include ShardIdentity as an optional attribute of LocationConf - Extend the public `LocationConfig` API structure with a flat representation of shard attributes. The net result is that at the point we construct a `Tenant`, we have a `ShardIdentity` (inside LocationConf). This enables the next steps to actually use the ShardIdentity to split WAL and validate that page service requests are reaching the correct shard.	2023-11-28 13:14:51 +00:00
John Spray	ca469be1cf	pageserver: add shard indices to layer metadata (#5928 ) ## Problem For sharded tenants, the layer keys must include the shard number and shard count, to disambiguate keys written by different shards in the same tenant (shard number), and disambiguate layers written before and after splits (shard count). Closes: https://github.com/neondatabase/neon/issues/5924 ## Summary of changes There are no functional changes in this PR: everything behaves the same for the default ShardIndex::unsharded() value. Actual construct of sharded tenants will come next. - Add a ShardIndex type: this is just a wrapper for a ShardCount and ShardNumber. This is a subset of ShardIdentity: whereas ShardIdentity contains enough information to filter page keys, ShardIndex contains just enough information to construct a remote key. ShardIndex has a compact encoding, the same as the shard part of TenantShardId. - Store the ShardIndex as part of IndexLayerMetadata, if it is set to a different value than ShardIndex::unsharded. - Update RemoteTimelineClient and DeletionQueue to construct paths using the layer metadata. Deletion code paths that previously just passed a `Generation` now pass a full `LayerFileMetadata` to capture the shard as well. Notes to reviewers: - In deletion code paths, I could have used a (Generation, ShardIndex) instead of the full LayerFileMetadata. I opted for the full object partly for brevity, and partly because in future when we add checksums the deletion code really will care about the full metadata in order to validate that it is deleting what was intended. - While ShardIdentity and TenantShardId could both use a ShardIndex, I find that they read more cleanly as "flat" structs that spell out the shard count and number field separately. Serialization code would need writing out by hand anyway, because TenantShardId's serialized form is not a serde struct-style serialization. 
- ShardIndex doesn't _have_ to exist (we could use ShardIdentity everywhere), but it is a worthwhile optimization, as we will have many copies of this as part of layer metadata. In future the size difference between ShardIndex and ShardIdentity may become larger if we implement more sophisticated key distribution mechanisms (i.e. new values of ShardIdentity::layout). --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-28 11:47:25 +00:00
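The shape of `ShardIndex` described in this commit, a small wrapper around a shard number and a shard count that is only stored when it differs from the unsharded default, could be sketched as follows. This is a rough illustration under stated assumptions: the `u8` field widths, the representation of `unsharded()` as number 0 of count 0, and the helper `shard_for_metadata` are all guesses made for the example, not details confirmed by the commit message.

```rust
/// Hypothetical newtypes; the real field widths are not stated in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardNumber(pub u8);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardCount(pub u8);

/// Just enough information to construct a remote key: which shard, out of how many.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardIndex {
    pub shard_number: ShardNumber,
    pub shard_count: ShardCount,
}

impl ShardIndex {
    /// Assumption: the unsharded default is represented as shard 0 of count 0.
    pub fn unsharded() -> Self {
        ShardIndex {
            shard_number: ShardNumber(0),
            shard_count: ShardCount(0),
        }
    }

    pub fn is_unsharded(&self) -> bool {
        *self == Self::unsharded()
    }
}

/// Hypothetical helper: keep the shard in layer metadata only when it differs
/// from the unsharded default, so existing unsharded metadata stays unchanged.
pub fn shard_for_metadata(shard: ShardIndex) -> Option<ShardIndex> {
    if shard.is_unsharded() {
        None
    } else {
        Some(shard)
    }
}
```

Keeping the type to two small fields matches the stated motivation: many copies of it live in layer metadata, so it should stay smaller than the full `ShardIdentity`.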
Christian Schwarz	286f34dfce	test suite: add method for generation-aware detachment of a tenant (#5939 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771	2023-11-28 09:51:37 +00:00
Sasha Krassovsky	f290b27378	Fix check for if shmem is valid to take into account detached shmem (#5937 ) ## Problem We can segfault if we update connstr inside of a process that has detached from shmem (e.g. inside stats collector) ## Summary of changes Add a check to make sure we're not detached	2023-11-28 03:14:42 +00:00
Sasha Krassovsky	4cd18fcebd	Compile wal2json (#5893 ) Add wal2json extension	2023-11-27 18:17:26 -08:00
Anastasia Lubennikova	4c29e0594e	Update neon extension relocatable for existing installations (#5943 )	2023-11-27 23:29:24 +00:00
Anastasia Lubennikova	3c56a4dd18	Make neon extension relocatable to allow SET SCHEMA (#5942 )	2023-11-27 21:45:41 +00:00
Conrad Ludgate	316309c85b	channel binding (#5683 ) ## Problem channel binding protects scram from sophisticated MITM attacks where the attacker is able to produce 'valid' TLS certificates. ## Summary of changes get the tls-server-end-point channel binding, and verify it is correct for the SCRAM-SHA-256-PLUS authentication flow	2023-11-27 21:45:15 +00:00
Arpad Müller	e09bb9974c	bootstrap_timeline: rename initdb_path to pgdata_path (#5931 ) This is a rename without functional changes, in preparation for #5912. Split off from #5912 as per review request.	2023-11-27 20:14:39 +00:00
Anastasia Lubennikova	5289f341ce	Use test specific directory in test_remote_extensions (#5938 )	2023-11-27 18:57:58 +00:00
Joonas Koivunen	683ec2417c	deflake: test_live_reconfig_get_evictions_low_residence_... (#5926 ) - disable extra tenant - disable compaction which could try to repartition while we assert Split from #5108.	2023-11-27 15:20:54 +02:00
Christian Schwarz	a76a503b8b	remove confusing no-op .take() of init_tenant_load_remote (#5923 ) The `Tenant::spawn()` method already `.take()`s it. I think this was an oversight in https://github.com/neondatabase/neon/pull/5580 .	2023-11-27 12:50:19 +00:00
Anastasia Lubennikova	92bc2bb132	Refactor remote extensions feature to request extensions from proxy (#5836 ) instead of direct S3 request. Pros: - simplify code a lot (no need to provide AWS credentials and paths); - reduce latency of downloading extension data as proxy resides near computes; - reduce AWS costs as proxy has cache and 1000 computes asking the same extension will not generate 1000 downloads from S3. - we can use only one S3 bucket to store extensions (and get rid of regional buckets which were introduced to reduce latency); Changes: - deprecate remote-ext-config compute_ctl parameter, use http://pg-ext-s3-gateway if any old format remote-ext-config is provided; - refactor tests to use mock http server;	2023-11-27 12:10:23 +00:00
John Spray	b80b9e1c4c	pageserver: remove defunct local timeline delete markers (#5699 ) ## Problem Historically, we treated the presence of a timeline on local disk as evidence that it logically exists. Since #5580 that is no longer the case, so we can always rely on remote storage. If we restart and the timeline is gone in remote storage, we will also purge it from local disk: no need for a marker. Reference on why this PR is for timeline markers and not tenant markers: https://github.com/neondatabase/neon/issues/5080#issuecomment-1783187807 ## Summary of changes Remove code paths that read + write deletion marker for timelines. Leave code path that deletes these markers, just in case we deploy while there are some in existence. This can be cleaned up later. (https://github.com/neondatabase/neon/issues/5718)	2023-11-27 09:31:20 +00:00
Anastasia Lubennikova	87b8ac3ec3	Only create neon extension in postgres database; (#5918 ) Create neon extension in neon schema.	2023-11-26 08:37:01 +00:00
Joonas Koivunen	6b1c4cc983	fix: long timeline create cancelled by tenant delete (#5917 ) Fix the fallible vs. infallible check order with `UninitTimeline::finish_creation` so that the incomplete timeline can be removed. Currently the order of drop guard unwrapping causes uninit files to be left on pageserver, blocking the tenant deletion. Cc: #5914 Cc: #investigation-2023-11-23-stuck-tenant-deletion	2023-11-24 16:17:56 +00:00
Joonas Koivunen	831fad46d5	tests: fix allowed_error for compaction detecting a shutdown (#5919 ) This has been causing flaky tests, [example evidence]. Follow-up to #5883 where I forgot to fix this. [example evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5917/6981540065/index.html#suites/9d2450a537238135fd4007859e09aca7/6fd3556a879fa3d1	2023-11-24 16:14:32 +00:00
Joonas Koivunen	53851ea8ec	fix: log cancelled request handler errors (#5915 ) noticed during [investigation] with @problame a major point of lost error logging which would have sped up the investigation. Cc: #5815 [investigation]: https://neondb.slack.com/archives/C066ZFAJU85/p1700751858049319	2023-11-24 15:54:06 +02:00
Joonas Koivunen	044375732a	test: support validating allowed_errors against a logfile (#5905 ) this will make it easier to test if an added allowed_error does in fact match for example against a log file from an allure report. ``` $ python3 test_runner/fixtures/pageserver/allowed_errors.py --help usage: allowed_errors.py [-h] [-i INPUT] check input against pageserver global allowed_errors optional arguments: -h, --help show this help message and exit -i INPUT, --input INPUT Pageserver logs file. Reads from stdin if no file is provided. ``` Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-11-24 12:43:25 +00:00
Konstantin Knizhnik	ea63b43009	Check if LFC was initialized in local_cache_pages function (#5911 ) ## Problem There is no check that LFC is initialised (`lfc_max_size != 0`) in `local_cache_pages` function ## Summary of changes Add proper check. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-11-24 08:23:00 +02:00
Conrad Ludgate	a56fd45f56	proxy: fix memory leak again (#5909 ) ## Problem The connections.join_next helped but it wasn't enough... The way I implemented the improvement before was still faulty but it mostly worked so it looked like it was working correctly. From [`tokio::select` docs](https://docs.rs/tokio/latest/tokio/macro.select.html): > 4. Once an <async expression> returns a value, attempt to apply the value to the provided <pattern>, if the pattern matches, evaluate <handler> and return. If the pattern does not match, disable the current branch and for the remainder of the current call to select!. Continue from step 3. The `connections.join_next()` future would complete and `Some(Err(e))` branch would be evaluated but not match (as the future would complete without panicking, we would hope). Since the branch doesn't match, it's disabled. The select continues but never attempts to call `join_next` again. Getting unlucky, more TCP connections are created than we attempt to join_next. ## Summary of changes Replace the `Some(Err(e))` pattern with `Some(e)`. Because of the auto-disabling feature, we don't need the `if !connections.is_empty()` step as the `None` pattern will disable it for us.	2023-11-23 19:11:24 +00:00
Anastasia Lubennikova	582a42762b	update extension version in test_neon_extension	2023-11-23 18:53:03 +00:00
Anastasia Lubennikova	f5dfa6f140	Create extension neon in existing databases too	2023-11-23 18:53:03 +00:00
Anastasia Lubennikova	f8d9bd8d14	Add extension neon to all databases. - Run CREATE EXTENSION neon for template1, so that it was created in all databases. - Run ALTER EXTENSION neon in all databases, to always have the newest version of the extension in computes. - Add test_neon_extension test	2023-11-23 18:53:03 +00:00
Anastasia Lubennikova	04e6c09f14	Add pgxn/neon/README.md	2023-11-23 18:53:03 +00:00
Arpad Müller	54327bbeec	Upload initdb results to S3 (#5390 ) ## Problem See #2592 ## Summary of changes Compresses the results of initdb into a .tar.zst file and uploads them to S3, to enable usage in recovery from lsn. Generations should not be involved I think because we do this only once at the very beginning of a timeline. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-23 18:11:52 +00:00
Shany Pozin	35f243e787	Move weekly release PR trigger to Monday morning (#5908 )	2023-11-23 19:09:34 +02:00
Shany Pozin	b7a988ba46	Support cancellation for find_lsn_for_timestamp API (#5904 ) ## Problem #5900 ## Summary of changes Added cancellation token as param in all relevant code paths and actually used it in the find_lsn_for_timestamp main loop	2023-11-23 17:08:32 +02:00
Christian Schwarz	a0e61145c8	fix: cleanup of layers from the future can race with their re-creation (#5890 ) fixes https://github.com/neondatabase/neon/issues/5878 obsoletes https://github.com/neondatabase/neon/issues/5879 Before this PR, it could happen that `load_layer_map` schedules removal of the future image layer. Then a later compaction run could re-create the same image layer, scheduling a PUT. Due to lack of an upload queue barrier, the PUT and DELETE could be re-ordered. The result was IndexPart referencing a non-existent object. ## Summary of changes * Add support to `pagectl` / Python tests to decode `IndexPart` * Rust * new `pagectl` Subcommand * `IndexPart::{from,to}_s3_bytes()` methods to internalize knowledge about encoding of `IndexPart` * Python * new `NeonCli` subclass * Add regression test * Rust * Ability to force repartitioning; required to ensure image layer creation at last_record_lsn * Python * The regression test. * Fix the issue * Insert an `UploadOp::Barrier` after scheduling the deletions.	2023-11-23 13:33:41 +00:00
Konstantin Knizhnik	6afbadc90e	LFC fixes + statistics (#5727 ) ## Problem ## Summary of changes See #5500 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-11-23 08:59:19 +02:00
Anastasia Lubennikova	2a12e9c46b	Add documentation for our sample pre-commit hook (#5868 )	2023-11-22 12:04:36 +00:00
Christian Schwarz	9e3c07611c	logging: support output to stderr (#5896 ) (part of the getpage benchmarking epic #5771) The plan is to make the benchmarking tool log on stderr and emit results as JSON on stdout. That way, the test suite can simply capture stdout and json.loads() it, while interactive users of the benchmarking tool have a reasonable experience as well. Existing logging users continue to print to stdout, so this change should be a no-op functionally and performance-wise.	2023-11-22 11:08:35 +00:00
Christian Schwarz	d353fa1998	refer to our rust-postgres.git fork by branch name (#5894 ) This way, `cargo update -p tokio-postgres` just works. The `Cargo.toml` communicates more clearly that we're referring to the `main` branch. And the git revision is still pinned in `Cargo.lock`.	2023-11-22 10:58:27 +00:00
Joonas Koivunen	0d10992e46	Cleanup compact_level0_phase1 fsyncing (#5852 ) While reviewing code noticed a scary `layer_paths.pop().unwrap()` then realized this should be further asyncified, something I forgot to do when I switched the `compact_level0_phase1` back to async in #4938. This keeps the double-fsync for new deltas as #4749 is still unsolved.	2023-11-21 15:30:40 +02:00
Arpad Müller	3e131bb3d7	Update Rust to 1.74.0 (#5873 ) [Release notes](https://github.com/rust-lang/rust/releases/tag/1.74.0).	2023-11-21 11:41:41 +01:00
Sasha Krassovsky	81b2cefe10	Disallow CREATE DATABASE WITH OWNER neon_superuser (#5887 ) ## Problem Currently, control plane doesn't know about neon_superuser, so if a user creates a database with owner neon_superuser it causes an exception when it tries to forward it. It is also currently possible to ALTER ROLE neon_superuser. ## Summary of changes Disallow creating database with owner neon_superuser. This is probably fine, since I don't think you can create a database with owner normal superuser. Also forbids altering neon_superuser	2023-11-20 22:39:47 +00:00
Christian Schwarz	d2ca410919	build: back to opt-level=0 in debug builds, for faster compile times (#5751 ) This change brings down incremental compilation for me from > 1min to 10s (and this is a pretty old Ryzen 1700X). More details: "incremental compilation" here means to change one character in the `failed to read value from offset` string in `image_layer.rs`. The command for incremental compilation is `cargo build_testing`. The system on which I got these numbers uses `mold` via `~/.cargo/config.toml`. As a bonus, `rust-gdb` is now at least a little fun again. Some tests are timing out in debug builds due to these changes. This PR makes them skip for debug builds. We run both with debug and release build, so, the loss of coverage is marginal. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-11-20 15:41:37 +01:00
Joonas Koivunen	d98ac04136	chore(background_tasks): missed allowed_error change, logging change (#5883 ) - I am always confused by the log for the error wait time, now it will be `2s` or `2.0s` not `2.0` - fix missed string change introduced in #5881 [evidence] [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/main/6921062837/index.html#suites/f9eba3cfdb71aa6e2b54f6466222829b/87897fe1ddee3825	2023-11-20 07:33:17 +00:00
Joonas Koivunen	ac08072d2e	fix(layer): VirtualFile opening and read errors can be caused by contention (#5880 ) A very low number of layer loads have been marked wrongly as permanent, as I did not remember that `VirtualFile::open` or reading could fail transiently for contention. Return separate errors for transient and persistent errors from `{Delta,Image}LayerInner::load`. Includes drive-by comment changes. The implementation looks quite ugly because having the same type be both the inner (operation error) and outer (critical error), but with the alternatives I tried I did not find a better way.	2023-11-19 14:57:39 +00:00
John Spray	d22dce2e31	pageserver: shut down idle walredo processes (#5877 ) The longer a pageserver runs, the more walredo processes it accumulates from tenants that are touched intermittently (e.g. by availability checks). This can lead to getting OOM killed. Changes: - Add an Instant recording the last use of the walredo process for a tenant - After compaction iteration in the background task, check for idleness and stop the walredo process if idle for more than 10x compaction period. Cc: #3620 Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Shany Pozin <shany@neon.tech>	2023-11-19 14:21:16 +00:00
Joonas Koivunen	3b3f040be3	fix(background_tasks): first backoff, compaction error stacktraces (#5881 ) The first compaction/gc error backoff starts from 0, which is less than the 2s it was before #5672. This is now fixed to be the intended 2**n. Additionally noticed that `compaction_iteration` creating an `anyhow::Error` via `into()` always captures a stacktrace, even if we had a stacktraceful anyhow error within the CompactionError, because there is no stable api for querying that.	2023-11-19 14:16:31 +00:00
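A minimal sketch of the 2**n backoff described in the commit above. The function name, the 1-based retry index, and the cap are assumptions made for illustration; they are not taken from the pageserver code.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before the n-th consecutive retry (n >= 1).
/// The delay grows as 2**n seconds, so the first retry waits 2s rather than 0s,
/// and it is capped at `max` (the cap is an assumption of this sketch).
fn error_backoff(n: u32, max: Duration) -> Duration {
    // checked_pow returns None on overflow; treat that as "larger than any cap".
    let secs = 2u64.checked_pow(n).unwrap_or(u64::MAX);
    Duration::from_secs(secs).min(max)
}
```

Calling `error_backoff(1, cap)` gives 2s, `error_backoff(2, cap)` gives 4s, and so on until the cap is reached.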
Em Sharnoff	cad0dca4b8	compute_ctl: Remove deprecated flag `--file-cache-on-disk` (#5622 ) See neondatabase/cloud#7516 for more.	2023-11-18 12:43:54 +01:00
Em Sharnoff	5d13a2e426	Improve error message when neon.max_cluster_size reached (#4173 ) Changes the error message encountered when the `neon.max_cluster_size` limit is reached. Reasoning is that this is user-visible, and so should probably use language that's closer to what users are familiar with.	2023-11-16 21:51:26 +00:00
khanova	0c243faf96	Proxy log pid hack (#5869 ) ## Problem Improve observability for the compute node. ## Summary of changes Log pid from the compute node. Doesn't work with pgbouncer.	2023-11-16 20:46:23 +00:00
Em Sharnoff	d0a842a509	Update vm-builder to v0.19.0 and move its customization here (#5783 ) ref neondatabase/autoscaling#600 for more	2023-11-16 18:17:42 +01:00
khanova	6b82f22ada	Collect number of connections by sni type (#5867 ) ## Problem We don't know the number of users with the different kind of authentication: ["sni", "endpoint in options" (A and B from [here](https://neon.tech/docs/connect/connection-errors)), "password_hack"] ## Summary of changes Collect metrics by sni kind.	2023-11-16 12:19:13 +00:00
John Spray	ab631e6792	pageserver: make TenantsMap shard-aware (#5819 ) ## Problem When using TenantId as the key, we are unable to handle multiple tenant shards attached to the same pageserver for the same tenant ID. This is an expected scenario if we have e.g. 8 shards and 5 pageservers. ## Summary of changes - TenantsMap is now a BTreeMap instead of a HashMap: this enables looking up by range. In future, we will need this for page_service, as incoming requests will just specify the Key, and we'll have to figure out which shard to route it to. - A new key type TenantShardId is introduced, to act as the key in TenantsMap, and as the id type in external APIs. Its human readable serialization is backward compatible with TenantId, and also forward-compatible as long as sharding is not actually used (when we construct a TenantShardId with ShardCount(0), it serializes to an old-fashioned TenantId). - Essential tenant APIs are updated to accept TenantShardIds: tenant/timeline create, tenant delete, and /location_conf. These are the APIs that will enable driving sharded tenants. Other apis like /attach /detach /load /ignore will not work with sharding: those will soon be deprecated and replaced with /location_conf as part of the live migration work. Closes: #5787	2023-11-15 23:20:21 +02:00
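To illustrate why a BTreeMap enables the range lookup mentioned in the commit above, here is a small sketch. `ShardKey` and `shards_of` are hypothetical, simplified stand-ins and do not reflect the real `TenantShardId` layout.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified key: ordering is derived field by field, so all
/// shards of one tenant sort next to each other in the map.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ShardKey {
    tenant_id: u128,
    shard_number: u8,
}

/// Return the shard numbers stored for `tenant_id` using a single range scan,
/// which a HashMap keyed the same way could not offer.
fn shards_of<V>(map: &BTreeMap<ShardKey, V>, tenant_id: u128) -> Vec<u8> {
    let lo = ShardKey { tenant_id, shard_number: u8::MIN };
    let hi = ShardKey { tenant_id, shard_number: u8::MAX };
    map.range(lo..=hi).map(|(k, _)| k.shard_number).collect()
}
```

The same idea would let a request handler find the candidate shards for a tenant without iterating over every entry.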
Alexander Bayandin	f84ac2b98d	Fix baseline commit and branch for code coverage (#5769 ) ## Problem `HEAD` commit for a PR is a phantom merge commit which skews the baseline commit for coverage reports. See https://github.com/neondatabase/neon/pull/5751#issuecomment-1790717867 ## Summary of changes - Use commit hash instead of `HEAD` for finding baseline commits for code coverage - Use the base branch for PRs or the current branch for pushes	2023-11-15 12:40:21 +01:00
dependabot[bot]	5cd5b93066	build(deps): bump aiohttp from 3.8.5 to 3.8.6 (#5864 )	2023-11-15 11:08:49 +00:00
khanova	2f0d245c2a	Proxy control plane rate limiter (#5785 ) ## Problem Proxy might overload the control plane. ## Summary of changes Implement rate limiter for proxy<->control plane connection. Resolves https://github.com/neondatabase/neon/issues/5707 Used implementation ideas from https://github.com/conradludgate/squeeze/	2023-11-15 09:15:59 +00:00
Joonas Koivunen	462f04d377	Smaller test addition and change (#5858 ) - trivial serialization roundtrip test for `pageserver::repository::Value` - add missing `start_paused = true` to 15s test making it <0s test - completely unrelated future clippy lint avoidance (helps beta channel users)	2023-11-14 18:04:34 +01:00
Arpad Müller	31a54d663c	Migrate links from wiki to notion (#5862 ) See the slack discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1696429688621489?thread_ts=1695647103.117499	2023-11-14 15:36:47 +00:00
John Spray	7709c91fe5	neon_local: use remote storage by default, add `cargo neon tenant migrate` (#5760 ) ## Problem Currently the only way to exercise tenant migration is via python test code. We need a convenient way for developers to do it directly in a neon local environment. ## Summary of changes - Add a `--num-pageservers` argument to `cargo neon init` so that it's easy to run with multiple pageservers - Modify default pageserver overrides in neon_local to set up `LocalFs` remote storage, as any migration/attach/detach stuff doesn't work in the legacy local storage mode. This also unblocks removing the pageserver's support for the legacy local mode. - Add a new `cargo neon tenant migrate` command that orchestrates tenant migration, including endpoints.	2023-11-14 09:51:51 +00:00
Arpad Müller	f7249b9018	Fix comment in find_lsn_for_timestamp (#5855 ) We still subtract 1 from low to compute `commit_lsn`. the comment moved/added by #5844 should point this out.	2023-11-11 00:32:00 +00:00
Joonas Koivunen	74d150ba45	build: upgrade ahash (#5851 ) `cargo deny` was complaining the version 0.8.3 was yanked (for possible DoS attack [wiki]), but the latest version (0.8.5) also includes aarch64 fixes which may or may not be relevant. Our usage of ahash limits to proxy, but I don't think we are at any risk. [wiki]: https://github.com/tkaitchuck/aHash/wiki/Yanked-versions	2023-11-10 19:10:54 +00:00
Joonas Koivunen	b7f45204a2	build: deny async-std and friends (#5849 ) rationale: some crates pull these in as default; hopefully these hints will require less cleanup-after and Cargo.lock file watching. follow-up to #5848.	2023-11-10 18:02:22 +01:00
Joonas Koivunen	a05f104cce	build: remove async-std dependency (#5848 ) Introduced by accident (missing `default-features = false`) in `e09d5ada6a`. We directly need only `http_types::StatusCode`.	2023-11-10 16:05:21 +02:00
John Spray	d672e44eee	pageserver: error type for collect_keyspace (#5846 ) ## Problem This is a log hygiene fix, for an occasional test failure. warn-level logging in imitate_timeline_cached_layer_accesses can't distinguish actual errors from shutdown cases. ## Summary of changes Replaced anyhow::Error with an explicit CollectKeySpaceError type, that includes conversion from PageReconstructError::Cancelled.	2023-11-10 13:58:18 +00:00
Rahul Modpur	a6f892e200	metric: add started and killed walredo processes counter (#5809 ) In OOM situations, knowing exactly how many walredo processes there were at a time would help afterwards to understand why the pageserver was OOM killed. Add `pageserver_wal_redo_process_total` metric to keep track of total wal redo processes started, shut down and killed since pageserver start. Closes #5722 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com> Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <me@cschwarz.com>	2023-11-10 15:05:22 +02:00
Alexander Bayandin	71b380f90a	Set BUILD_TAG for build-neon job (#5847 ) ## Problem I've added `BUILD_TAG` to docker images. (https://github.com/neondatabase/neon/pull/5812), but forgot to add it to services that we build for tests ## Summary of changes - Set `BUILD_TAG` in `build-neon` job	2023-11-10 12:49:52 +00:00
Alexander Bayandin	6e145a44fa	workflows/neon_extra_builds: run check-codestyle-rust & build-neon on arm64 (#5832 ) ## Problem Some developers use workstations with arm CPUs, and sometimes x86-64 code is not fully compatible with it (for example, https://github.com/neondatabase/neon/pull/5827). Although we don't have arm CPUs in the prod (yet?), it is worth having some basic checks for this architecture to have a better developer experience. Closes https://github.com/neondatabase/neon/issues/5829 ## Summary of changes - Run `check-codestyle-rust`-like & `build-neon`-like jobs on Arm runner - Add `run-extra-build-*` label to run all available extra builds	2023-11-10 12:45:41 +00:00
Arpad Müller	8e5e3971ba	find_lsn_for_timestamp fixes (#5844 ) Includes the changes of #3689 that address point 1 of #3689, plus some further improvements. In particular, this PR does: * set `min_lsn` to a safe value to create branches from (and verify it in tests) * return `min_lsn` instead of `max_lsn` for `NoData` and `Past` (verify it in test for `Past`, `NoData` is harder and not as important) * return `commit_lsn` instead of `max_lsn` for Future (and verify it in the tests) * add some comments Split out of #5686 to get something more minimal out to users.	2023-11-10 13:38:44 +01:00
Joonas Koivunen	8dd29f1e27	fix(pageserver): spawn all kinds of tenant shutdowns (#5841 ) Minor bugfix, something noticed while manual code-review. Use the same joinset for inprogress tenants so we can get the benefit of the buffering logging just as we get for attached tenants, and no single inprogress task can hold up shutdown of other tenants.	2023-11-09 21:36:57 +00:00
Joonas Koivunen	f5344fb85a	temp: log all layer loading errors while we lose them (#5816 ) Temporary workaround while some errors are not being logged. Cc: #5815.	2023-11-09 21:31:53 +00:00
Arpad Müller	f95f001b8b	Lsn for get_timestamp_of_lsn should be string, not integer (#5840 ) The `get_timestamp_of_lsn` pageserver endpoint has been added in #5497, but the yml it added was wrong: the lsn is expected in hex format, not in integer (decimal) format.	2023-11-09 16:12:18 +00:00
John Spray	e0821e1eab	pageserver: refined Timeline shutdown (#5833 ) ## Problem We have observed the shutdown of a timeline taking a long time when a deletion arrives at a busy time for the system. This suggests that we are not respecting cancellation tokens promptly enough. ## Summary of changes - Refactor timeline shutdown so that rather than having a shutdown() function that takes a flag for optionally flushing, there are two distinct functions, one for graceful flushing shutdown, and another that does the "normal" shutdown where we're just setting a cancellation token and then tearing down as fast as we can. This makes things a bit easier to reason about, and enables us to remove the hand-written variant of shutdown that was maintained in `delete.rs` - Layer flush task checks cancellation token more carefully - Logical size calculation's handling of cancellation tokens is simplified: rather than passing one in, it respects the Timeline's cancellation token. This PR doesn't touch RemoteTimelineClient, which will be a key thing to fix as well, so that a slow remote storage op doesn't hold up shutdown.	2023-11-09 16:02:59 +00:00
bojanserafimov	4469b1a62c	Fix blob_io test (#5818 )	2023-11-09 10:47:03 -05:00
Joonas Koivunen	842223b47f	fix(metric): remove pageserver_wal_redo_wait_seconds (#5791 ) the meaning of the values recorded in this histogram changed with #5560 and we never had it visualized as a histogram, just the `increase(_sum)`. The histogram is not too interesting to look at, so remove it per discussion in [slack thread](https://neondb.slack.com/archives/C063LJFF26S/p1699008316109999?thread_ts=1698852436.637559&cid=C063LJFF26S).	2023-11-09 16:40:52 +02:00
Anna Stepanyan	893616051d	Update epic-template.md (#5709 ) replace the checkbox list with a proper task list in the epic template NB: this PR does not change the code, it only touches the github issue templates	2023-11-09 15:24:43 +01:00
Conrad Ludgate	7cdde285a5	proxy: limit concurrent wake_compute requests per endpoint (#5799 ) ## Problem A user can perform many database connections at the same instant of time - these will all cache miss and materialise as requests to the control plane. #5705 ## Summary of changes I am using a `DashMap` (a sharded `RwLock<HashMap>`) of endpoints -> semaphores to apply a limiter. If the limiter is enabled (permits > 0), the semaphore will be retrieved per endpoint and a permit will be awaited before continuing to call the wake_compute endpoint. ### Important details This dashmap would grow uncontrollably without maintenance. It's not a cache so I don't think an LRU-based reclamation makes sense. Instead, I've made use of the sharding functionality of DashMap to lock a single shard and clear out unused semaphores periodically. I ran a test in release, using 128 tokio tasks among 12 threads each pushing 1000 entries into the map per second, clearing a shard every 2 seconds (64 second epoch with 32 shards). The endpoint names were sampled from a gamma distribution to make sure some overlap would occur, and each permit was held for 1ms. The histogram for time to clear each shard settled between 256-512us without any variance in my testing. Holding a lock for under a millisecond for 1 of the shards does not concern me as blocking	2023-11-09 14:14:30 +00:00
John Spray	9c30883c4b	remote_storage: use S3 SDK's adaptive retry policy (#5813 ) ## Problem Currently, we aren't doing any explicit slowdown in response to 429 responses. Recently, as we hit remote storage a bit harder (pageserver does more ListObjectsv2 requests than it used to since #5580 ), we're seeing storms of 429 responses that may be the result of not just doing too many requests, but continuing to do those extra requests without backing off any more than our usual backoff::exponential. ## Summary of changes Switch from AWS's "Standard" retry policy to "Adaptive" -- docs describe this as experimental but it has been around for a long time. The main difference between Standard and Adaptive is that Adaptive rate-limits the client in response to feedback from the server, which is meant to avoid scenarios where the client would otherwise repeatedly hit throttling responses.	2023-11-09 13:50:13 +00:00
Arthur Petukhovsky	0495798591	Fix walproposer build on aarch64 (#5827 ) There was a compilation error due to `std::ffi::c_char` being different type on different platforms. Clippy also complained due to a similar reason.	2023-11-09 13:05:17 +00:00
Sasha Krassovsky	87389bc933	Add test simulating bad connection between pageserver and compute (#5728 ) ## Problem We have a funny 3-day timeout for connections between the compute and pageserver. We want to get rid of it, so to do that we need to make sure the compute is resilient to connection failures. Closes: https://github.com/neondatabase/neon/issues/5518 ## Summary of changes This test makes the pageserver randomly drop the connection if the failpoint is enabled, and ensures we can keep querying the pageserver. This PR also reduces the default timeout to 10 minutes from 3 days.	2023-11-08 19:48:57 +00:00
Arpad Müller	ea118a238a	JWT logging improvements (#5823 ) * lower level on auth success from info to debug (fixes #5820) * don't log stacktraces on auth errors (as requested on slack). we do this by introducing an `AuthError` type instead of using `anyhow` and `bail`. * return errors that have been censored for improved security.	2023-11-08 16:56:53 +00:00
Christian Schwarz	e9b227a11e	cleanup unused RemoteStorage fields (#5830 ) Found this while working on #5771	2023-11-08 16:54:33 +00:00
John Spray	40441f8ada	pageserver: use `Gate` for stronger safety check in `SlotGuard` (#5793 ) ## Problem #5711 and #5367 raced -- the `SlotGuard` type needs `Gate` to properly enforce its invariant that we may not drop an `Arc<Tenant>` from a slot. ## Summary of changes Replace the TODO with the intended check of Gate.	2023-11-08 13:00:11 +00:00
John Spray	a8a39cd464	test: de-flake test_deletion_queue_recovery (#5822 ) ## Problem This test could fail if timing is unlucky, and the deletions in the test land in two deletion lists instead of one. ## Summary of changes We await _some_ validations instead of _all_ validations, because our execution failpoint will prevent validation proceeding for any but the first DeletionList. Usually the workload just generates one, but if it generates two due to timing, then we must not expect that the second one will be validated.	2023-11-08 12:41:48 +00:00
John Spray	b989ad1922	extend test_change_pageserver for failure case, rework changing pageserver (#5693 ) Reproducer for https://github.com/neondatabase/neon/issues/5692 The test change in this PR intentionally fails, to demonstrate the issue. --------- Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>	2023-11-08 11:26:56 +00:00
Em Sharnoff	acef742a6e	vm-monitor: Remove dependency on workspace_hack (#5752 ) neondatabase/autoscaling builds libs/vm-monitor during CI because it's a necessary component of autoscaling. workspace_hack includes a lot of crates that are not necessary for vm-monitor, which artificially inflates the build time on the autoscaling side, so hopefully removing the dependency should speed things up. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 09:41:20 -08:00
duguorong009	11d9d801b5	pageserver: improve the shutdown log error (#5792 ) ## Problem - Close #5784 ## Summary of changes - Update the `GetActiveTenantError` -> `QueryError` conversion process in `pageserver/src/page_service.rs` - Update the pytest logging exceptions in `./test_runner/regress/test_tenant_detach.py`	2023-11-07 16:57:26 +00:00
Andrew Rudenko	fc47af156f	Passing neon options to the console (#5781 ) The idea is to pass neon_* prefixed options to control plane. It can be used by cplane to dynamically create timelines and computes. Such options also should be excluded from passing to compute. Another issue is how connection caching works now: because the compute instance now depends not only on the hostname but probably on such options too, I included them in the cache key.	2023-11-07 16:49:26 +01:00
Arpad Müller	e310533ed3	Support JWT key reload in pageserver (#5594 ) ## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 15:43:29 +01:00
John Spray	1d68f52b57	pageserver: move deletion failpoint inside backoff (#5814 ) ## Problem When enabled, this failpoint would busy-spin in a loop that emits log messages. ## Summary of changes Move the failpoint inside a backoff::exponential block: it will still spam the log, but at much lower rate. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 14:25:51 +00:00
Alexander Bayandin	4cd47b7d4b	Dockerfile: Set BUILD_TAG for storage services (#5812 ) ## Problem https://github.com/neondatabase/neon/pull/5576 added `build-tag` reporting to `libmetrics_build_info`, but it's not reported because we didn't set the corresponding env variable in the build process. ## Summary of changes - Add `BUILD_TAG` env var while building services	2023-11-07 13:45:59 +00:00
Fernando Luz	0141c95788	build: Add warning when missing postgres submodule during the build (#5614 ) I forked the project and in my local repo, I wasn't able to compile the project and in my search, I found the solution in neon forum. After a PR discussion, I made a change in the makefile to alert the missing `git submodules update` step. --------- Signed-off-by: Fernando Luz <prof.fernando.luz@gmail.com> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 12:13:05 +00:00
Shany Pozin	0ac4cf67a6	Use self.tenants instead of TENANTS (#5811 )	2023-11-07 11:38:02 +00:00
Joonas Koivunen	4be6bc7251	refactor: remove unnecessary unsafe (#5802 ) unsafe impls for `Send` and `Sync` should not be added by default. in the case of `SlotGuard` removing them does not cause any issues, as the compiler automatically derives those. This PR adds requirement to document the unsafety (see [clippy::undocumented_unsafe_blocks]) and opportunistically adds `#![deny(unsafe_code)]` to most places where we don't have unsafe code right now. TRPL on Send and Sync: https://doc.rust-lang.org/book/ch16-04-extensible-concurrency-sync-and-send.html [clippy::undocumented_unsafe_blocks]: https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks	2023-11-07 10:26:25 +00:00
John Spray	a394f49e0d	pageserver: avoid converting an error to anyhow::Error (#5803 ) This was preventing it getting cleanly converted to a CalculateLogicalSizeError::Cancelled, resulting in "Logical size calculation failed" errors in logs.	2023-11-07 09:35:45 +00:00
John Spray	c00651ff9b	pageserver: start refactoring into TenantManager (#5797 ) ## Problem See: https://github.com/neondatabase/neon/issues/5796 ## Summary of changes Completing the refactor is quite verbose and can be done in stages: each interface that is currently called directly from a top-level mgr.rs function can be moved into TenantManager once the relevant subsystems have access to it. Landing the initial change to create TenantManager is useful because it enables new code to use it without having to be altered later, and sets us up to incrementally fix the existing code to use an explicit Arc<TenantManager> instead of relying on the static TENANTS.	2023-11-07 09:06:53 +00:00
Richy Wang	bea8efac24	Fix comments in 'receive_wal.rs'. (#5807 ) ## Problem Some comments in 'receive_wal.rs' are not suitable. They may have been copied from 'send_wal.rs' and left unchanged. ## Summary of changes This commit fixes two comments in the code: Changed "/// Unregister walsender." to "/// Unregister walreceiver." Changed "///Scope guard to access slot in WalSenders registry" to "///Scope guard to access slot in WalReceivers registry."	2023-11-07 09:13:01 +01:00
Conrad Ludgate	ad5b02e175	proxy: remove unsafe (#5805 ) ## Problem `unsafe {}` ## Summary of changes `CStr` has a method to parse the bytes up to a null byte, so we don't have to do it ourselves.	2023-11-06 17:44:44 +00:00
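The commit above does not name the `CStr` method it used; a likely candidate in the standard library is `CStr::from_bytes_until_nul`. The sketch below shows that approach, and the helper name `read_cstr` is hypothetical.

```rust
use std::ffi::CStr;

/// Read a NUL-terminated string from the front of `buf` without any `unsafe`.
/// Returns None if `buf` contains no NUL byte or the bytes before it are not
/// valid UTF-8. Bytes after the first NUL are ignored.
fn read_cstr(buf: &[u8]) -> Option<&str> {
    CStr::from_bytes_until_nul(buf).ok()?.to_str().ok()
}
```

Because the standard library performs the scan for the terminator, no manual pointer arithmetic or unchecked slicing is needed.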
Arpad Müller	b09a851705	Make azure blob storage not do extra metadata requests (#5777 ) Load the metadata from the returned `GetBlobResponse` and avoid downloading it via a separate request. As it turns out, the SDK does return the metadata: https://github.com/Azure/azure-sdk-for-rust/issues/1439 . This PR will reduce the number of requests to Azure caused by downloads. Fixes #5571	2023-11-06 15:16:55 +00:00
John Spray	85cd97af61	pageserver: add `InProgress` tenant map state, use a sync lock for the map (#5367 ) ## Problem Follows on from #5299 - We didn't have a generic way to protect a tenant undergoing changes: `Tenant` had states, but for our arbitrary transitions between secondary/attached, we need a general way to say "reserve this tenant ID, and don't allow any other ops on it, but don't try and report it as being in any particular state". - The TenantsMap structure was behind an async RwLock, but it was never correct to hold it across await points: that would block any other changes for all tenants. ## Summary of changes - Add the `TenantSlot::InProgress` value. This means: - Incoming administrative operations on the tenant should retry later - Anything trying to read the live state of the tenant (e.g. a page service reader) should retry later or block. - Store TenantsMap in `std::sync::RwLock` - Provide an extended `get_active_tenant_with_timeout` for page_service to use, which will wait on InProgress slots as well as non-active tenants. Closes: https://github.com/neondatabase/neon/issues/5378 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-06 14:03:22 +00:00
Arpad Müller	e6470ee92e	Add API description for safekeeper copy endpoint (#5770 ) Adds a yaml API description for a new endpoint that allows creation of a new timeline as the copy of an existing one. Part of #5282	2023-11-06 15:00:07 +01:00
bojanserafimov	dc72567288	Layer flush minor speedup (#5765 ) Convert keys to `i128` before sorting	2023-11-06 08:58:20 -05:00
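A sketch of the idea in the commit above: comparing one packed integer per key is cheaper than comparing several fields on every comparison. The three-field `Key` below is hypothetical and much smaller than the real key type.

```rust
/// Hypothetical multi-field key used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Key {
    hi: u32,
    mid: u32,
    lo: u32,
}

impl Key {
    /// Pack the fields most-significant first, so that integer order matches
    /// field-by-field (lexicographic) order.
    fn to_i128(self) -> i128 {
        ((self.hi as i128) << 64) | ((self.mid as i128) << 32) | (self.lo as i128)
    }
}

/// Sort keys by their packed representation. `sort_by_cached_key` computes
/// the i128 once per element instead of once per comparison.
fn sort_keys(keys: &mut [Key]) {
    keys.sort_by_cached_key(|k| k.to_i128());
}
```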
John Spray	6defa2b5d5	pageserver: add `Gate` as a partner to CancellationToken for safe shutdown of `Tenant` & `Timeline` (#5711 ) ## Problem When shutting down a Tenant, it isn't just important to cause any background tasks to stop. It's also important to wait until they have stopped before declaring shutdown complete, in cases where we may re-use the tenant's local storage for something else, such as running in secondary mode, or creating a new tenant with the same ID. ## Summary of changes A `Gate` class is added, inspired by [seastar::gate](https://docs.seastar.io/master/classseastar_1_1gate.html). For types that have an important lifetime that corresponds to some physical resource, use of a Gate as well as a CancellationToken provides a robust pattern for async requests & shutdown: - Requests must always acquire the gate as long as they are using the object - Shutdown must set the cancellation token, and then `close()` the gate to wait for requests in progress before returning. This is not for memory safety: it's for expressing the difference between "Arc<Tenant> exists", and "This tenant's files on disk are eligible to be read/written". - Both Tenant and Timeline get a Gate & CancellationToken. - The Timeline gate is held during eviction of layers, and during page_service requests. - Existing cancellation support in page_service is refined to use the timeline-scope cancellation token instead of a process-scope cancellation token. This replaces the use of `task_mgr::associate_with`: tasks no longer change their tenant/timeline identity after being spawned. The Tenant's Gate is not yet used, but will be important for Tenant-scoped operations in secondary mode, where we must ensure that our secondary-mode downloads for a tenant are gated wrt the activity of an attached Tenant. 
This is part of a broader move away from using the global-state driven `task_mgr` shutdown tokens: - less global state where we rely on implicit knowledge of what task a given function is running in, and more explicit references to the cancellation token that a particular function/type will respect, making shutdown easier to reason about. - eventually avoid the big global TASKS mutex. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 12:39:20 +00:00
duguorong009	b3d3a2587d	feat: improve the serde impl for several types(`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
Heikki Linnakangas	b85fc39bdb	Update control plane API path for getting compute spec. (#5357 ) We changed the path in the control plane. The old path is still accepted for compatibility with existing computes, but we'd like to phase it out.	2023-11-06 09:26:09 +02:00
duguorong009	09b5954526	refactor: use streaming in safekeeper `/v1/debug_dump` http response (#5731 ) - Update the handler for `/v1/debug_dump` http response in safekeeper - Update the `debug_dump::build()` to use the streaming in JSON build process	2023-11-05 10:16:54 +00:00
John Spray	306c4f9967	s3_scrubber: prepare for scrubbing buckets with generation-aware content (#5700 ) ## Problem The scrubber didn't know how to find the latest index_part when generations were in use. ## Summary of changes - Teach the scrubber to do the same dance that pageserver does when finding the latest index_part.json - Teach the scrubber how to understand layer files with generation suffixes. - General improvement to testability: scan_metadata has a machine readable output that the testing `S3Scrubber` wrapper can read. - Existing test coverage of scrubber was false-passing because it just didn't see any data due to prefixing of data in the bucket. Fix that. This is incremental improvement: the more confidence we can have in the scrubber, the more we can use it in integration tests to validate the state of remote storage. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-11-03 17:36:02 +00:00
Konstantin Knizhnik	5ceccdc7de	Logical replication startup fixes (#5750 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1698226491736459 ## Summary of changes Update WAL affected buffers when restoring WAL from safekeeper --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-11-03 18:40:27 +02:00
Conrad Ludgate	cdcaa329bf	proxy: no more statements (#5747 ) ## Problem my prepared statements change in tokio-postgres landed in the latest release. it didn't work as we intended ## Summary of changes https://github.com/neondatabase/rust-postgres/pull/24	2023-11-03 08:30:58 +00:00
Joonas Koivunen	27bdbf5e36	chore(layer): restore logging, doc changes (#5766 ) Some of the log messages were lost with the #4938. This PR adds some of them back, most notably: - starting to on-demand download - successful completion of on-demand download - ability to see when there were many waiters for the layer download - "unexpectedly on-demand downloading ..." is now `info!` Additionally some rare events are logged as error, which should never happen.	2023-11-02 19:05:33 +00:00
khanova	4c7fa12a2a	Proxy introduce allowed ips (#5729 ) ## Problem Proxy doesn't accept wake_compute responses with the allowed IPs. ## Summary of changes Extend wake_compute api to be able to return allowed_ips.	2023-11-02 16:26:15 +00:00
Em Sharnoff	367971a0e9	vm-monitor: Remove support for file cache in tmpfs (#5617 ) ref neondatabase/cloud#7516. We switched everything over to file cache on disk, now time to remove support for having it in tmpfs.	2023-11-02 16:06:16 +00:00
bojanserafimov	51570114ea	Remove outdated and flaky perf test (#5762 )	2023-11-02 10:43:59 -04:00
Joonas Koivunen	098d3111a5	fix(layer): get_and_upgrade and metrics (#5767 ) when introducing `get_and_upgrade` I forgot that an `evict_and_wait` would have already incremented the counter for started evictions, but an upgrade would just "silently" cancel the eviction as no drop would ever run. these metrics are likely sources for alerts with the next release, so it's important to keep them correct.	2023-11-02 13:06:14 +00:00
Joonas Koivunen	3737fe3a4b	fix(layer): error out early if layer path is non-file (#5756 ) In an earlier PR https://github.com/neondatabase/neon/pull/5743#discussion_r1378625244 I added a FIXME and there's a simple solution suggested by @jcsp, so implement it. Wondering why I did not implement this originally, there is no concept of a permanent failure, so this failure will happen quite often. I don't think the frequency is a problem however. Sadly for std::fs::FileType there is only decimal and hex formatting, no octal.	2023-11-02 11:03:38 +00:00
John Spray	5650138532	pageserver: helpers for explicitly dying on fatal I/O errors (#5651 ) Following from discussion on https://github.com/neondatabase/neon/pull/5436 where hacking an implicit die-on-fatal-io behavior into an Error type was a source of disagreement -- in this PR, dying on fatal I/O errors is explicit, with `fatal_err` and `maybe_fatal_err` helpers in the `MaybeFatalIo` trait, which is implemented for std::io::Result. To enable this approach with `crashsafe_overwrite`, the return type of that function is changed to std::io::Result -- the previous error enum for this function was not used for any logic, and the utility of saying exactly which step in the function failed is outweighed by the hygiene of having an I/O function return an io::Result. The initial use case for these helpers is the deletion queue.	2023-11-02 09:14:26 +00:00
Joonas Koivunen	2dca4c03fc	feat(layer): cancellable get_or_maybe_download (#5744 ) With the layer implementation as was done in #4938, it is possible via cancellation to cause two concurrent downloads on the same path, due to how `RemoteTimelineClient::download_remote_layer` does tempfiles. Thread the init semaphore through the spawned task of downloading to make this impossible to happen.	2023-11-02 08:06:32 +00:00
bojanserafimov	0b790b6d00	Record wal size in import benchmark (#5755 )	2023-11-01 17:02:58 -04:00
Joonas Koivunen	e82d1ad6b8	fix(layer): reinit on access before eviction happens (#5743 ) Right before merging, I added a loop to `fn LayerInner::get_or_maybe_download`, which was always supposed to be there. However I had forgotten to restart initialization instead of waiting for the eviction to happen to support original design goal of "eviction should always lose to redownload (or init)". This was wrong. After this fix, if `spawn_blocking` queue is blocked on something, nothing bad will happen. Part of #5737.	2023-11-01 17:38:32 +02:00
Muhammet Yazici	4f0a8e92ad	fix: Add bearer prefix to Authorization header (#5740 ) ## Problem Some requests with `Authorization` header did not properly set the `Bearer ` prefix. Problem explained here https://github.com/neondatabase/cloud/issues/6390. ## Summary of changes Added `Bearer ` prefix to missing requests.	2023-11-01 09:41:48 +03:00
Konstantin Knizhnik	5952f350cb	Always handle POLLHUP in walredo error poll loop (#5716 ) ## Problem test_stderr hangs on MacOS. See https://neondb.slack.com/archives/C036U0GRMRB/p1698438997903919 ## Summary of changes Always handle POLLHUP to prevent infinite loop. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-31 20:57:03 +02:00
Tristan Partin	726c8e6730	Add docs for updating Postgres for new minor versions	2023-10-31 12:31:14 -05:00
Em Sharnoff	f7067a38b7	compute_ctl: Assume --vm-monitor-addr arg is always present (#5611 ) It has a default value, so this should be sound. Treating its presence as semantically significant was leading to spurious warnings.	2023-10-31 10:00:23 -07:00
Joonas Koivunen	896347f307	refactor(layer): remove version checking with atomics (#5742 ) The `LayerInner::version` never needed to be read in more than one place. Clarified while fixing #5737 of which this is the first step. This decrements possible wrong atomics usage in Layer, but does not really fix anything.	2023-10-31 18:40:08 +02:00
John Spray	e5c81fef86	tests: minor improvements (#5674 ) Minor changes from while I have been working on HA tests: - Manual pytest executions came with some warnings from `log.warn()` usage - When something fails in a generations-enabled test, it is useful to have a log from the attachment service of what attached when, and with which generation. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-31 11:44:35 +00:00
Christian Schwarz	7ebe9ca1ac	pageserver: `/attach`: clarify semantics of 409 (#5698 ) context: https://app.incident.io/neondb/incidents/75 specifically: https://neondb.slack.com/archives/C0634NXQ6E7/p1698422852902959?thread_ts=1698419362.155059&cid=C0634NXQ6E7	2023-10-31 09:32:58 +01:00
Shany Pozin	1588601503	Move release PR creation to Friday (#5721 ) Prepare for a new release workflow * Release PR is created on Fridays * The discussion/approval happens during Friday * Sunday morning the deployment will be done in central-il and perf tests will be run * On Monday early IST morning gradually start rolling (starting from US regions as they are still in weekend time) See slack for discussion: https://neondb.slack.com/archives/C04P81J55LK/p1698565305607839?thread_ts=1698428241.031979&cid=C04P81J55LK	2023-10-30 22:10:24 +01:00
John Spray	9c35e1e6e5	pageserver: downgrade slow task warnings from warn to info (#5724 ) ## Problem In #5658 we suppressed the first-iteration output from these logs, but the volume of warnings is still problematic. ## Summary of changes - Downgrade all slow task warnings to INFO. The information is still there if we actively want to know about which tasks are running slowly, without polluting the overall stream of warnings with situations that are unsurprising to us. - Revert the previous change so that we output on the first iteration as we used to do. There is no reason to suppress these, now that the severity is just info.	2023-10-30 18:32:30 +00:00
Conrad Ludgate	d8c21ec70d	fix nightly 1.75 (#5719 ) ## Problem Neon doesn't compile on nightly and had numerous clippy complaints. ## Summary of changes 1. Fixed troublesome dependency 2. Fixed or ignored the lints where appropriate	2023-10-30 16:43:06 +00:00
Konstantin Knizhnik	ad99fa5f03	Grant BYPASSRLS and REPLICATION to existing roles (#5657 ) ## Problem A role needs the REPLICATION privilege to be usable for logical replication. New roles are created with this option. This PR tries to update existing roles. ## Summary of changes Update roles in `handle_roles` method --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-30 15:29:25 +00:00
John Spray	e675f4cec8	tests: fix missing comma in test_timeline_deletion_with_files_stuck_… (#5713 ) …in_upload_queue This was a syntax mistake in https://github.com/neondatabase/neon/pull/5149 We didn't notice because the situation the log allow list covers is a relatively rare race.	2023-10-30 15:18:32 +00:00
Joonas Koivunen	4db8efb2cf	Layer: logging fixes (#5676 ) - include Layer generation in the default display, with Generation::Broken as `-broken` - omit layer from `layer_gc` span because the api it works with needs to support N layers, so the api needs to log each layer	2023-10-30 16:21:30 +02:00
John Spray	07c2b29895	pageserver: fix error logging on stray timeline files (#5712 ) ## Problem If there were stray files in the timelines/ dir after tenant deletion, pageserver could panic on out of range. ## Summary of changes Use iterator `take()`, which doesn't care if the number of elements available is less than requested.	2023-10-30 13:24:52 +00:00
Konstantin Knizhnik	9cdffd164a	Prevent SIGSEGV in apply_error_callback when record was not decoded (#5703 ) ## Problem See https://neondb.slack.com/archives/C036U0GRMRB/p1698652221399419?thread_ts=1698438997.903919&cid=C036U0GRMRB ## Summary of changes Check if record pointer is not NULL before trying to print record descriptor Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-30 12:06:08 +02:00
John Spray	87db4b441c	pageserver: cleaner shutdown in timeline delete (#5701 ) The flush task logs a backtrace if it tries to upload and remote timeline client is already in stopped state. Therefore we cannot shut them down concurrently: flush task must be shut down first. This wasn't more obvious because: - Timeline deletions IRL usually happen when not much is being written - In tests, there is a global allow-list for this log It's not obvious whether removing the global log allow list is safe, this PR was prompted by how the log spam got in my way when testing deletion changes.	2023-10-30 09:18:40 +00:00
Conrad Ludgate	964c5c56b7	proxy: dont retry server errors (#5694 ) ## Problem accidental spam ## Summary of changes don't spam control plane if control plane is down :)	2023-10-30 08:38:56 +00:00
Arpad Müller	bd59349af3	Fix Rust 1.74 warnings (#5702 ) Fixes new warnings and clippy changes introduced by version 1.74 of the rust compiler toolchain.	2023-10-28 03:47:26 +02:00
Joonas Koivunen	2bd79906d9	fix: possible page_service hang on cancel (#5696 ) Fixes #5341, one more suspected case, see: https://github.com/neondatabase/neon/issues/5341#issuecomment-1783052379 - races `MaybeWriteOnly::shutdown` with cancellation - switches to using `AsyncWriteExt::write_buf` - notes cancellation safety for shutdown	2023-10-27 19:09:34 +01:00
Conrad Ludgate	493b47e1da	proxy: exclude client latencies in metrics (#5688 ) ## Problem In #5539, I moved the connect_to_compute latency to start counting before authentication - this is because authentication will perform some calls to the control plane in order to get credentials and to eagerly wake a compute server. It felt important to include these times in the latency metric as these are times we should definitely care about reducing. What is not interesting to record in this metric is the roundtrip time during authentication when we wait for the client to respond. ## Summary of changes Implement a mechanism to pause the latency timer, resuming on drop of the pause struct. We pause the timer right before we send the authentication message to the client, and we resume the timer right after we complete the authentication flow.	2023-10-27 17:17:39 +00:00
John Spray	c13e932c3b	pageserver: add generation fields in openapi spec (#5690 ) These optional fields have existed for a while, but weren't mentioned in `openapi_spec.yaml` yet.	2023-10-27 14:20:04 +01:00
Gleb Novikov	a5292f7e67	Some minor renames in attachment service API (#5687 )	2023-10-27 12:36:34 +01:00
Arthur Petukhovsky	262348e41b	Fix safekeeper log spans (#5643 ) We were missing spans with ttid in "WAL backup" and several other places, this commit should fix it. Here are the examples of logs before and after: https://gist.github.com/petuhovskiy/711a4a4e7ddde3cab3fa6419b2f70fb9	2023-10-27 12:09:02 +01:00
Joonas Koivunen	68f15cf967	fix: schedule_compaction_update must only unlink (#5675 ) #5649 added the concept of dangling layers which #4938 uses but only partially. I forgot to change `schedule_compaction_update` to not schedule deletions to uphold the "have a layer, you can read it". With the now remembered fix, I don't think these checks should ever fail except for a mistake I already did. These changes might be useful for protecting future changes, even though the Layer carrying the generation AND the `schedule_(gc\|compaction)_update` require strong arcs. Rationale for keeping the `#[cfg(feature = "testing")]` is worsening any leak situation which might come up.	2023-10-27 11:16:01 +01:00
duguorong009	39f8fd6945	feat: add `build_tag` env support for `set_build_info_metric` (#5576 ) - Add a new util `project_build_tag` macro, similar to `project_git_version` - Update the `set_build_info_metric` to accept and make use of `build_tag` info - Update all codes which use the `set_build_info_metric`	2023-10-27 10:47:11 +01:00
John Spray	83567f9e4e	tests: revise perf test that interfered with local disk state (#5682 ) This benchmark started failing after #5580 merged. It was manually deleting some local content on a pageserver, and expecting the behavior that the pageserver would "forget" about the timeline on startup as a result. That is no longer our behavior: pageservers use the remote storage as the source of truth. Rather than having the test go manually delete things at all, we can just delete the whole tenant via the pageserver API, and thereby start from a clean situation.	2023-10-27 09:23:49 +01:00
Conrad Ludgate	71611f4ab3	proxy: prepare to remove high cardinality metrics (#5461 ) ## Problem High cardinality metrics are bad ## Summary of changes Preparing to remove high cardinality metrics. Will actually remove in #5466	2023-10-26 22:54:37 +01:00
John Spray	7c16b5215e	scrubber: add separate find/purge garbage commands (#5409 ) ## Problem The previous garbage cleanup functionality relied on doing a dry run, inspecting logs, and then doing a deletion. This isn't ideal, because what one actually deletes might not be the same as what one saw in the dry run. It's also risky UX to rely on presence/absence of one CLI flag to control deletion: ideally the deletion command should be totally separate from the one that scans the bucket. Related: https://github.com/neondatabase/neon/issues/5037 ## Summary of changes This is a major re-work of the code, which results in a net decrease in line count of about 600. The old code for removing garbage was built around the idea of doing discovery and purging together: a "delete_batch_producer" sent batches into a deleter. The new code writes out both procedures separately, in functions that use the async streams introduced in https://github.com/neondatabase/neon/pull/5176 to achieve fast concurrent access to S3 while retaining the readability of a single function. - Add `find-garbage`, which writes out a JSON file of tenants/timelines to purge - Add `purge-garbage` which consumes the garbage JSON file, applies some extra validations, and does deletions. - The purge command will refuse to execute if the garbage file indicates that only garbage was found: this guards against classes of bugs where the scrubber might incorrectly deem everything garbage. - The purge command defaults to only deleting tenants that were found in "deleted" state in the control plane. This guards against the risk that using the wrong console API endpoint could cause all tenants to appear to be missing. Outstanding work for a future PR: - Make whatever changes are needed to adapt to the Console/Control Plane separation. 
- Make purge even safer by checking S3 `Modified` times for index_part.json files (not doing this here, because it will depend on the generation-aware changes for finding index_part.json files) --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com> Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-26 20:36:28 +01:00
Em Sharnoff	39b148b74e	Bump vm-builder v0.18.2 -> v0.18.4 (#5666 ) Only applicable change was neondatabase/autoscaling#584, setting pgbouncer auth_dbname=postgres in order to fix superuser connections from preventing dropping databases.	2023-10-26 20:04:57 +01:00
Sasha Krassovsky	116c342cad	Support changing pageserver dynamically (#5542 ) ## Problem We currently require full restart of compute if we change the pageserver url ## Summary of changes Makes it so that we don't have to do a full restart, but can just send SIGHUP	2023-10-26 10:56:07 -07:00
John Spray	ba4fe9e10f	pageserver: fix the second "AUX files" warning (#5673 ) In https://github.com/neondatabase/neon/pull/5669 I didn't notice that the same warning is logged in two places: fix the other one.	2023-10-26 13:54:52 +00:00
John Spray	de90bf4663	pageserver: always load remote metadata (no more `spawn_load`) (#5580 ) ## Problem The pageserver had two ways of loading a tenant: - `spawn_load` would trust on-disk content to reflect all existing timelines - `spawn_attach` would list timelines in remote storage. It was incorrect for `spawn_load` to trust local disk content, because it doesn't know if the tenant might have been attached and written somewhere else. To make this correct would require some generation number checks, but the payoff is to avoid one S3 op per tenant at startup, so it's not worth the complexity -- it is much simpler to have one way to load a tenant. ## Summary of changes - `Tenant` objects are always created with `Tenant::spawn`: there is no more distinction between "load" and "attach". - The ability to run without remote storage (for `neon_local`) is preserved by adding a branch inside `attach` that uses a fallback `load_local` if no remote_storage is present. - Fix attaching a tenant when it has a timeline with no IndexPart: this can occur if a newly created timeline manages to upload a layer before it has uploaded an index. - The attach marker file that used to indicate whether a tenant should be "loaded" or "attached" is no longer needed, and is removed. - The GenericRemoteStorage interface gets a `list()` method that maps more directly to what ListObjects does, returning both keys and common prefixes. The existing `list_files` and `list_prefixes` methods are just calls into `list()` now -- these can be removed later if we would like to shrink the interface a bit. - The remote deletion marker is moved into `timelines/` and detected as part of listing timelines rather than as a separate GET request. If any existing tenants have a marker in the old location (unlikely, only happens if something crashes mid-delete), then they will rely on the control plane retrying to complete their deletion. 
- Revise S3 calls for timeline listing and tenant load to take a cancellation token, and retry forever: it never makes sense to make a Tenant broken because of a transient S3 issue. ## Breaking changes - The remote deletion marker is moved from `deleted` to `timelines/deleted` within the tenant prefix. Markers in the old location will be ignored: it is the control plane's responsibility to retry deletions until they succeed. Markers in the new location will be tolerated by the previous release of pageserver via https://github.com/neondatabase/neon/pull/5632 - The local `attaching` marker file is no longer written. Therefore, if the pageserver is downgraded after running this code, the old pageserver will not be able to distinguish between partially attached tenants and fully attached tenants. This would only impact tenants that were partway through attaching at the moment of downgrade. In the unlikely event that we do experience an incident that prompts us to roll back, then we may check for attach operations in flight, and manually insert `attaching` marker files as needed. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-26 14:48:44 +01:00
John Spray	8360307ea0	pageserver: exponential backoff on compaction/GC failures (#5672 ) Previously, if walredo process crashed we would try to spawn a fresh one every 2 seconds, which is expensive in itself, but also results in a high I/O load from the part of the compaction prior to the failure, which we re-run every 2 seconds. Closes: https://github.com/neondatabase/neon/issues/5671	2023-10-26 14:00:26 +01:00
MMeent	6129077d31	WALRedo: Limit logging to log_level = ERROR and above (#5587 ) This fixes issues in pageserver's walredo process where WALRedo logs of loglevel=LOG are interpreted as errors. ## Problem See #5560 ## Summary of changes Set the log level to something that doesn't include LOG.	2023-10-26 12:21:41 +01:00
John Spray	e0ebdfc7ce	pageserver: suppress compaction/gc errors while stopping (#5670 ) ## Problem Tenant deletions would sometimes be accompanied by compaction stack traces, because `shutdown()` puts the tenant into stopping state before it joins background tasks. ## Summary of changes Treat GC+Compaction as no-ops on a Stopping tenant.	2023-10-26 10:59:24 +01:00
Joonas Koivunen	c508d3b5fa	reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer (#4938 ) Implement a new `struct Layer` abstraction which manages downloadness internally, requiring no LayerMap locking or rewriting to download or evict providing a property "you have a layer, you can read it". The new `struct Layer` provides ability to keep the file resident via a RAII structure for new layers which still need to be uploaded. The previous solution handled this with `RemoteTimelineClient::wait_completion`, which led to bugs like #5639. Evicting or the final local deletion after garbage collection is done using Arc'd value `Drop`. With a single `struct Layer` the closed open ended `trait Layer`, `trait PersistentLayer` and `struct RemoteLayer` are removed following noting that compaction could be simplified by simply not using any of the traits in between: #4839. The new `struct Layer` is a preliminary to remove `Timeline::layer_removal_cs` documented in #4745. Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058, #5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off: #5057, #5134.	2023-10-26 12:36:38 +03:00
John Spray	acda65d7d4	pageserver: quieten "Failed to get info about AUX files" (#5669 ) ## Problem This line caused lots of errors to be emitted for healthy tenants. ## Summary of changes Downgrade to debug, since it is an expected code path we'll take for tenants at startup.	2023-10-26 09:53:18 +01:00
dependabot[bot]	378daa358b	build(deps): bump werkzeug from 2.2.3 to 3.0.1 (#5665 )	2023-10-25 22:50:35 +00:00
Alexander Bayandin	85f4514e7d	Get env var for real Azure tests from GitHub (#5662 ) ## Problem We'll need to switch `REMOTE_STORAGE_AZURE_REGION` from the current `eastus2` region to something `eu-central-1`-like. This may require changing `AZURE_STORAGE_ACCESS_KEY`. To make it possible to switch from one place (not to break a lot of builds on CI), move `REMOTE_STORAGE_AZURE_CONTAINER` and `REMOTE_STORAGE_AZURE_REGION` to GitHub Variables. See https://github.com/neondatabase/neon/settings/variables/actions ## Summary of changes - Get values for `REMOTE_STORAGE_AZURE_CONTAINER` & `REMOTE_STORAGE_AZURE_REGION` from GitHub Variables	2023-10-25 22:54:23 +01:00
Joonas Koivunen	f70019797c	refactor(rtc): schedule compaction update (#5649 ) a single operation instead of N uploads and 1 deletion scheduling with write(layer_map) lock releasing in the between. Compaction update will make for a much better place to change how the operation will change in future compared to more general file based operations. builds upon #5645. solves the problem of difficult to see hopeful correctness w.r.t. other `index_part.json` changing operations. Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-25 22:25:43 +01:00
Joonas Koivunen	325258413a	fix: trampling on global physical size metric (#5663 ) All loading (attached, or from disk) timelines overwrite the global gauge for physical size. The `_set` method cannot be used safely, so remove it and just "add" the physical size.	2023-10-25 19:29:12 +01:00
Konstantin Knizhnik	4ddbc0e46d	Ignore missed AUX_FILES_KEY when generating image layer (#5660 ) ## Problem Logical replication requires the new AUX_FILES_KEY, which is definitely absent in existing databases. We do not have a function to check whether a key exists in our KV storage, so I have to handle the error in the `list_aux_files` method. But this key is also included in the key space range and accessed by the `create_image_layer` method. ## Summary of changes Check if AUX_FILES_KEY exists before including it in keyspace. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-10-25 18:35:23 +01:00
Arpad Müller	a673e4e7a9	Optionally return json from get_lsn_by_timestamp (#5608 ) This does two things: first a minor refactor to not use HTTP/1.x style header names and also to not panic if some certain requests had no "Accept" header. As a second thing, it addresses the third bullet point from #3689: > Change `get_lsn_by_timestamp` API method to return LSN even if we only found commit before the specified timestamp. This is done by adding a version parameter to the `get_lsn_by_timestamp` API call and making its behaviour depend on the version number. Part of #3414 (but doesn't address it in its entirety). --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-25 18:46:34 +02:00
bojanserafimov	c155cc0c3f	Fix test instructions readme (#5644 )	2023-10-25 11:53:04 -04:00
Conrad Ludgate	32126d705b	proxy refactor serverless (#4685 ) ## Problem Our serverless backend was a bit jumbled. As a comment indicated, we were handling SQL-over-HTTP in our `websocket.rs` file. I've extracted out the `sql_over_http` and `websocket` files from the `http` module and put them into a new module called `serverless`. ## Summary of changes ```sh mkdir proxy/src/serverless mv proxy/src/http/{conn_pool,sql_over_http,websocket}.rs proxy/src/serverless/ mv proxy/src/http/server.rs proxy/src/http/health_server.rs mv proxy/src/metrics proxy/src/usage_metrics.rs ``` I have also extracted the hyper server and handler from websocket.rs into `serverless.rs`	2023-10-25 15:43:03 +01:00
John Spray	5683ae9eab	pageserver: suppress some of the most common spurious warnings (#5658 ) Two of the most common spurious log messages: - broker connections terminate & we log at error severity. Unfortunately tonic gives us an "Unknown" error so to suppress these we're doing string matching. It's hacky but worthwhile for operations. - the first iteration of tenant background tasks tends to over-run its schedule and emit a warning. Ultimately we should fix these to run on time, but for now we are not benefiting from polluting our logs with the warnings.	2023-10-25 14:55:37 +01:00
Alexander Bayandin	4778b6a12e	Switch to querying new tests results DB (#5616 ) ## Problem We started to store test results in a new format in https://github.com/neondatabase/neon/pull/4549. This PR switches scripts to query this db. (we can completely remove old DB/ingestions scripts in a couple of weeks after the PR merged) ## Summary of changes - `scripts/benchmark_durations.py` query new database - `scripts/flaky_tests.py` query new database	2023-10-25 14:25:13 +01:00
John Spray	8b8be7bed4	tests: don't fail tests on torn log lines (#5655 ) ## Problem Tests that force-kill and restart a service can generate torn log lines that might match WARN\|ERROR, but not match the allow expression that a test has loaded, e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5651/6638398772/index.html#suites/7538959189f4501983ddd9e167836c8b/d272ba8a73e6945c ## Summary of changes Ignore log lines which match a regex for torn log lines on restart: they have two timestamps and the second line is an "INFO version"... message.	2023-10-25 13:29:30 +01:00
Conrad Ludgate	a461c459d8	fix http pool test (#5653 ) ## Problem We defer the returning of connections to the connection pool. It's possible for our test to be faster than the returning of connections, in which case it gets a differing process ID because it opens a new connection. ## Summary of changes 1. Delay the tests just a little (20ms) to give more chance for connections to return. 2. Correlate connection IDs with the connection logs a bit more	2023-10-25 13:20:45 +01:00
Joonas Koivunen	4ae2d1390d	refactor(remote_timeline_client): Split deletion into unlinking + deletion (#5645 ) Quest: #4745. Prerequisite for #4938. Original https://github.com/neondatabase/neon/pull/4938#issuecomment-1777150665. The new Layer implementation has so far been using `RemoteTimelineClient::schedule_layer_file_deletion` from `Layer::drop` but it was noticed that this could mean that the L0s compaction wanted to remove could linger in the index part for longer time or be left there for longer time. Solution is to split the `RemoteTimelineClient::schedule_layer_file_deletion` into two parts: - unlinking from index_part.json, to be called from end of compaction and gc - scheduling of actual deletions, to be called from `Layer::drop` The added methods are added unused.	2023-10-25 15:01:19 +03:00
Joonas Koivunen	c5949e1fd6	misc smaller improvements (#5527 ) - finally add an `#[instrument]` to Timeline::create_image_layers, making it easier to see that something is happening because we create image layers - format some macro context code - add a warning not to create new validation functions a la parse do not validate Split off from #5198.	2023-10-25 14:59:43 +03:00
John Spray	127837abb0	tests: de-flake test_eviction_across_generations (#5650 ) ## Problem There was an edge case where initial logical size calculation can be downloading a layer that wasn't hit by the test's `SELECT`, and it's on-disk but still marked as remote in the pageserver's internal state, so evicting it fails. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5648/6630099807/index.html#categories/dee044ec96f666edb90a77c01099a941/e38e97a2735ffa8c/ ## Summary of changes Use pageserver API to learn about layers, instead of inspecting local disk, so that we will always agree with the pageserver about which layers are local.	2023-10-25 10:55:45 +01:00
Conrad Ludgate	b2c96047d0	move wake compute after the auth quirks logic (#5642 ) ## Problem https://github.com/neondatabase/neon/issues/5568#issuecomment-1777015606 ## Summary of changes Make the auth_quirks_creds return the authentication information, and push the wake_compute loop to after, inside `auth_quirks`	2023-10-25 08:30:47 +01:00
Em Sharnoff	44202eeb3b	Bump vm-builder v0.18.1 -> v0.18.2 (#5646 ) Only applicable change was neondatabase/autoscaling#571, removing the postgres_exporter flags `--auto-discover-databases` and `--exclude-databases=...`	2023-10-24 16:04:28 -07:00
Arpad Müller	4bef977c56	Use tuples instead of manual comparison chain (#5637 ) Makes code a little bit simpler	2023-10-24 17:16:23 +00:00
John Spray	a0b862a8bd	pageserver: schedule frozen layer uploads inside the layers lock (#5639 ) ## Problem Compaction's source of truth for what layers exist is the LayerManager. `flush_frozen_layer` updates LayerManager before it has scheduled upload of the frozen layer. Compaction can then "see" the new layer, decide to delete it, schedule uploads of replacement layers, all before `flush_frozen_layer` wakes up again and schedules the upload. When the upload is scheduled, the local layer file may be gone, in which case we end up with no such layer in remote storage, but an entry still added to IndexPart pointing to the missing layer. ## Summary of changes Schedule layer uploads inside the `self.layers` lock, so that whenever a frozen layer is present in LayerManager, it is also present in RemoteTimelineClient's metadata. Closes: #5635	2023-10-24 13:57:01 +01:00
Conrad Ludgate	767ef29390	proxy: filter out more quota exceeded errors (#5640 ) ## Problem Looking at logs, I saw more retries being performed for other quota exceeded errors ## Summary of changes Filter out all quota exceeded family of errors	2023-10-24 13:13:23 +01:00
Alexander Bayandin	a8a800af51	Run real Azure tests on CI (#5627 ) ## Problem We do not run real Azure-related tests on CI ## Summary of changes - Set required env variables to run real Azure blob storage tests on CI	2023-10-24 12:12:11 +01:00
Arpad Müller	1e250cd90a	Cleanup in azure_upload_download_works test (#5636 ) The `azure_upload_download_works` test is not cleaning up after itself, leaving behind the files it is uploading. I found these files when looking at the contents of the bucket in #5627. We now clean up the file we uploaded before, like the other tests do it as well. Follow-up of #5546	2023-10-23 19:08:56 +01:00
John Spray	eaaa18f6ed	attachment_service: graceful SIGQUIT (#5626 ) `attachment_service` doesn't explicitly handle signals, which causes a backtrace when `neon_local` kills it with SIGQUIT. Closes: https://github.com/neondatabase/neon/issues/5613	2023-10-23 17:30:25 +01:00
John Spray	188f67e1df	pageserver: forward compat: be tolerant of deletion marker in `timelines/` (#5632 ) ## Problem https://github.com/neondatabase/neon/pull/5580 will move the remote deletion marker into the `timelines/` path. This would cause old pageserver code to fail loading the tenant due to an apparently invalid timeline ID. That would be a problem if we had to roll back after deploying #5580 ## Summary of changes If a `deleted` file is in `timelines/` just ignore it.	2023-10-23 17:51:38 +02:00
John Spray	7e805200bb	pageserver: parallel load of configs (#5607 ) ## Problem When the number of tenants is large, sequentially issuing the open/read calls for their config files is a ~1000ms delay during startup. It's not a lot, but it's simple to fix. ## Summary of changes Put all the config loads into spawn_blocking() tasks and run them in a JoinSet. We can simplify this a bit later when we have full async disk I/O. --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-23 15:32:34 +01:00
Christian Schwarz	c6ca1d76d2	consumption_metrics: fix periodicness behavior & reporting (#5625 ) Before this PR, the ticker was running at default miss behavior `Delay`. For example, here is the startup output with 25k tenants: ``` 2023-10-19T09:57:21.682466Z INFO synthetic_size_worker: starting calculate_synthetic_size_worker 2023-10-19T10:50:44.678202Z WARN synthetic_size_worker: task iteration took longer than the configured period elapsed=3202.995707156s period=10m task=ConsumptionMetricsSyntheticSizeWorker 2023-10-19T10:52:17.408056Z WARN synthetic_size_worker: task iteration took longer than the configured period elapsed=2695.72556035s period=10m task=ConsumptionMetricsSyntheticSizeWorker ``` The first message's `elapsed` value is correct. It matches the delta between the log line timestamps. The second one is logged ca 1.5min after, though, but reports a much larger `elapsed` than 1.5min. This PR fixes the behavior by copying what `eviction_task.rs` does.	2023-10-23 16:31:38 +02:00
Conrad Ludgate	94b4e76e13	proxy: latency connect outcome (#5588 ) ## Problem I recently updated the latency timers to include cache miss and pool miss, as well as connection protocol. By moving the latency timer to start before authentication, we count a lot more failures and it's messed up the latency dashboard. ## Summary of changes Add another label to LatencyTimer metrics for outcome. Explicitly report on success	2023-10-23 15:17:28 +01:00
khanova	b514da90cb	Set up timeout for scram protocol execution (#5551 ) ## Problem Context: https://github.com/neondatabase/neon/issues/5511#issuecomment-1759649679 Some of our scram protocol executions timed out only after 17 minutes. ## Summary of changes Make timeout for scram execution meaningful and configurable.	2023-10-23 15:11:05 +01:00
Conrad Ludgate	7d17f1719f	reduce cancel map contention (#5555 ) ## Problem Every database request locks this cancel map rwlock. At high requests per second this would have high contention ## Summary of changes Switch to dashmap which has a sharded rwlock to reduce contention	2023-10-23 14:12:41 +01:00
John Spray	41ee75bc71	pageserver: do config writes in a spawn_blocking (#5603 ) ## Problem We now persist tenant configuration every time we spawn a tenant. The persist_tenant_config function is doing a series of non-async filesystem I/O, because `crashsafe::` isn't async yet. This isn't a demonstrated problem, but is a source of uncertainty when reasoning about what's happening with our startup times. ## Summary of changes - Wrap `crashsafe_overwrite` in `spawn_blocking`. - Although I think this change makes sense, it does not have a measurable impact on load time when testing with 10k tenants. - This can be reverted when we have full async I/O	2023-10-23 09:19:01 +01:00
Christian Schwarz	11e523f503	walredo: fix EAGAIN/"os error 11" false page reconstruction failures (#5560 ) Stacked atop https://github.com/neondatabase/neon/pull/5559 Before this PR, there was the following race condition: ``` T1: polls for writeable stdin T1: writes to stdin T1: enters poll for stdout/stderr T2: enters poll for stdin write WALREDO: writes to stderr KERNEL: wakes up T1 and T2 Tx: reads stderr and prints it Ty: reads stderr and gets EAGAIN (valid values for (x, y) are (1, 2) or (2, 1)) ``` The concrete symptom that we observed repeatedly was with PG16, which started logging `registered custom resource manager` to stderr always, during startup, thereby giving us repeated opportunity to hit the above race condition. PG14 and PG15 didn't log anything to stderr, hence we could have only hit this race condition if there was an actual error happening. This PR fixes the race by moving the reading of stderr into a tokio task. It exits when the stderr is closed by the child process, which in turn happens when the child exits, either by itself or because we killed it. The downside is that the async scheduling can reorder the log messages, which can be seen in the new `test_stderr`, which runs in a single-threaded runtime. I included the output below. Overall I think we should move the entire walredo to async, as Joonas proposed many months ago. This PR's asyncification is just the first step to resolve these false page reconstruction errors. After this is fixed, we should stop printing that annoying stderr message on walredo startup; it causes noise in the pageserver logs. That work is tracked in #5399 .
``` 2023-10-13T19:05:21.878858Z ERROR apply_wal_records{tenant_id=d546fb76ba529195392fb4d19e243991 pid=753986}: failed to write out the walredo errored input: No such file or directory (os error 2) target=walredo-1697223921878-1132-0.walredo length=1132 2023-10-13T19:05:21.878932Z DEBUG postgres applied 2 WAL records (1062 bytes) in 114666 us to reconstruct page image at LSN 0/0 2023-10-13T19:05:21.878942Z ERROR error applying 2 WAL records 0/16A9388..0/16D4080 (1062 bytes) to base image with LSN 0/0 to reconstruct page image at LSN 0/0 n_attempts=0: apply_wal_records Caused by: WAL redo process closed its stdout unexpectedly 2023-10-13T19:05:21.879027Z INFO kill_and_wait_impl{pid=753986}: wait successful exit_status=signal: 11 (SIGSEGV) (core dumped) 2023-10-13T19:05:21.879079Z DEBUG wal-redo-postgres-stderr{pid=753986 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task started 2023-10-13T19:05:21.879104Z ERROR wal-redo-postgres-stderr{pid=753986 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: received output output="2023-10-13 19:05:21.769 GMT [753986] LOG: registered custom resource manager \"neon\" with ID 134\n" 2023-10-13T19:05:21.879116Z DEBUG wal-redo-postgres-stderr{pid=753986 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task finished 2023-10-13T19:05:22.004439Z ERROR apply_wal_records{tenant_id=d546fb76ba529195392fb4d19e243991 pid=754000}: failed to write out the walredo errored input: No such file or directory (os error 2) target=walredo-1697223922004-1132-0.walredo length=1132 2023-10-13T19:05:22.004493Z DEBUG postgres applied 2 WAL records (1062 bytes) in 125344 us to reconstruct page image at LSN 0/0 2023-10-13T19:05:22.004501Z ERROR error applying 2 WAL records 0/16A9388..0/16D4080 (1062 bytes) to base image with LSN 0/0 to reconstruct page image at LSN 0/0 n_attempts=1: apply_wal_records Caused by: WAL redo process closed its stdout unexpectedly 
2023-10-13T19:05:22.004588Z INFO kill_and_wait_impl{pid=754000}: wait successful exit_status=signal: 11 (SIGSEGV) (core dumped) 2023-10-13T19:05:22.004624Z DEBUG wal-redo-postgres-stderr{pid=754000 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task started 2023-10-13T19:05:22.004653Z ERROR wal-redo-postgres-stderr{pid=754000 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: received output output="2023-10-13 19:05:21.884 GMT [754000] LOG: registered custom resource manager \"neon\" with ID 134\n" 2023-10-13T19:05:22.004666Z DEBUG wal-redo-postgres-stderr{pid=754000 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task finished ```	2023-10-23 09:00:13 +01:00
Konstantin Knizhnik	b1a1126152	Grant replication permission to newly created users (#5615 ) Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-20 21:29:17 +03:00
John Spray	a8899e1e0f	pageserver: apply timeout when waiting for tenant loads (#5601 ) ## Problem Loading tenants shouldn't hang. However, if it does, we shouldn't let one hung tenant prevent the entire process from starting background jobs. ## Summary of changes Generalize the timeout mechanism that we already applied to loading initial logical sizes: each phase in startup where we wait for a barrier is subject to a timeout, and startup will proceed if it doesn't complete within timeout. Startup metrics will still reflect the time when a phase actually completed, rather than when we skipped it. The code isn't the most beautiful, but that kind of reflects the awkwardness of await'ing on a future and then stashing it to await again later if we time out. I could imagine making this cleaner in future by waiting on a structure that doesn't self-destruct on wait() the way Barrier does, then make InitializationOrder into a structure that manages the series of waits etc.	2023-10-20 09:15:34 +01:00
Arseny Sher	2fbd5ab075	Add safekeeper test_late_init.	2023-10-20 10:57:59 +03:00
Arseny Sher	702382e99a	Add check that WAL segments are identical after recovery.	2023-10-20 10:57:59 +03:00
Arseny Sher	1b53b3e200	Make test_pageserver_http_get_wal_receiver_success not wait for keepalive.	2023-10-20 10:57:59 +03:00
Arseny Sher	b332268cec	Introduce safekeeper peer recovery. Implements fetching of WAL by a safekeeper from another safekeeper by imitating the behaviour of the last elected leader. This avoids WAL accumulation on compute and facilitates faster compute startup, as it doesn't need to download any WAL. Actually removing WAL download in walproposer is a matter of another patch though. There is a per-timeline task which always runs, checking regularly if it should start recovery from someone, meaning there is something to fetch and there is no streaming compute. It then proceeds with fetching, finishing when there is nothing more to receive. Implements https://github.com/neondatabase/neon/pull/4875	2023-10-20 10:57:59 +03:00
Arseny Sher	76c702219c	Don't use AppendRequestHeader.epoch_start_lsn. It is simpler to get it once from ProposerElected.	2023-10-20 10:57:59 +03:00
Arthur Petukhovsky	ba856140e7	Fix neon_extra_build.yml (#5605 ) Build walproposer-lib in gather-rust-build-stats, fix nproc usage, fix walproposer-lib on macos.	2023-10-19 22:20:39 +01:00
Em Sharnoff	2cf6a47cca	vm-monitor: Deny not fail downscale if no memory stats yet (#5606 ) Fixes an issue we observed on staging that happens when the autoscaler-agent attempts to immediately downscale the VM after binding, which is typical for pooled computes. The issue was occurring because the autoscaler-agent was requesting downscaling before the vm-monitor had gathered sufficient cgroup memory stats to be confident in approving it. When the vm-monitor returned an internal error instead of denying downscaling, the autoscaler-agent retried the connection and immediately hit the same issue (in part because cgroup stats are collected per-connection, rather than globally).	2023-10-19 19:09:37 +01:00
Konstantin Knizhnik	5a8bcdccb0	Fix elog format error in wallog_mapping_file (#5602 ) ## Problem Fix elog format error in wallog_mapping_file ## Summary of changes Use proper case to avoid compilation warning=error in C at MacOS. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-19 20:24:35 +03:00
Em Sharnoff	2c8741a5ed	vm-monitor: Log full error on message handling failure (#5604 ) There's currently an issue with the vm-monitor on staging that's not really feasible to debug because the current display impl gives no context to the errors (just says "failed to downscale"). Logging the full error should help. For communications with the autoscaler-agent, it's ok to only provide the outermost cause, because we can cross-reference with the VM logs. At some point in the future, we may want to change that.	2023-10-19 18:10:33 +02:00
Shany Pozin	893b7bac9a	Fix neon_extra_builds.yml : nproc is not supported in mac os (#5598 ) ## Problem nproc is not supported in mac os, use sysctl -n hw.ncpu instead	2023-10-19 15:24:23 +01:00
Arthur Petukhovsky	66f8f5f1c8	Call walproposer from Rust (#5403 ) Create Rust bindings for C functions from walproposer. This allows to write better tests with real walproposer code without spawning multiple processes and starting up the whole environment. `make walproposer-lib` stage was added to build static libraries `libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can be statically linked to any executable to call walproposer functions. `libs/walproposer/src/walproposer.rs` contains `test_simple_sync_safekeepers` to test that walproposer can be called from Rust to emulate sync_safekeepers logic. It can also be used as a usage example.	2023-10-19 14:17:15 +01:00
Alexander Bayandin	3a19da1066	build(deps): bump rustix from 0.37.19 to 0.37.25 (#5596 ) ## Problem @dependabot has bumped `rustix` 0.36 version to the latest in https://github.com/neondatabase/neon/pull/5591, but didn't bump 0.37. Also, update all Rust dependencies for `test_runner/pg_clients/rust/tokio-postgres`. Fixes - https://github.com/neondatabase/neon/security/dependabot/39 - https://github.com/neondatabase/neon/security/dependabot/40 ## Summary of changes - `cargo update -p rustix@0.37.19` - Update all dependencies for `test_runner/pg_clients/rust/tokio-postgres`	2023-10-19 13:49:06 +01:00
Conrad Ludgate	572eda44ee	update tokio-postgres (#5597 ) https://github.com/neondatabase/rust-postgres/pull/23	2023-10-19 14:32:19 +02:00
Arpad Müller	b1d6af5ebe	Azure blobs: Simplify error conversion by addition of to_download_error (#5575 ) There is a bunch of duplication and manual Result handling that can be simplified by moving the error conversion into a shared function, using `map_err`, and the question mark operator.	2023-10-19 14:31:09 +02:00
Arpad Müller	f842b22b90	Add endpoint for querying time info for lsn (#5497 ) ## Problem See #5468. ## Summary of changes Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp associated with the given lsn. Fixes #5468. --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-19 04:50:49 +02:00
dependabot[bot]	d444d4dcea	build(deps): bump rustix from 0.36.14 to 0.36.16 (#5591 ) Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.14 to 0.36.16. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-19 03:43:49 +01:00
Tristan Partin	c8637f3736	Remove specific file references in NOTICE Seems like a burden to update this file with each major release.	2023-10-18 14:58:48 -05:00
John Spray	ecf759be6d	tests: allow-list S3 500 on DeleteObjects key (#5586 ) ## Problem S3 can give us a 500 whenever it likes: when this happens at request level we eat it in `backoff::retry`, but when it happens for a key inside a DeleteObjects request, we log it at warn level. ## Summary of changes Allow-list this class of log message in all tests.	2023-10-18 15:16:58 +00:00
Arthur Petukhovsky	9a9d9eba42	Add test_idle_reconnections	2023-10-18 17:09:26 +03:00
Arseny Sher	1f4805baf8	Remove remnants of num_computes field. Fixes https://github.com/neondatabase/neon/issues/5581	2023-10-18 17:09:26 +03:00
Konstantin Knizhnik	5c88213eaf	Logical replication (#5271 ) ## Problem See https://github.com/neondatabase/company_projects/issues/111 ## Summary of changes Save logical replication files in WAL at compute and include them in basebackup at the pageserver. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-10-18 16:42:22 +03:00
John Spray	607d19f0e0	pageserver: clean up page service Result handling for shutdown/disconnect (#5504 ) ## Problem - QueryError always logged at error severity, even though disconnections are not true errors. - QueryError type is not expressive enough to distinguish actual errors from shutdowns. - In some functions we're returning Ok(()) on shutdown, in others we're returning an error ## Summary of changes - Add QueryError::Shutdown and use it in places we check for cancellation - Adopt consistent Result behavior: disconnects and shutdowns are always QueryError, not ok - Transform shutdown+disconnect errors to Ok(()) at the very top of the task that runs query handler - Use the postgres protocol error code for "admin shutdown" in responses to clients when we are shutting down. Closes: #5517	2023-10-18 13:28:38 +01:00
dependabot[bot]	1fa0478980	build(deps): bump urllib3 from 1.26.17 to 1.26.18 (#5582 )	2023-10-18 12:21:54 +01:00
Christian Schwarz	9da67c4f19	walredo: make request_redo() an async fn (#5559 ) Stacked atop https://github.com/neondatabase/neon/pull/5557 Prep work for https://github.com/neondatabase/neon/pull/5560 These changes have a 2% impact on `bench_walredo`. That's likely because of the `block_on()` in the innermost piece of benchmark-only code. So, it doesn't affect production code. The use of closures in the benchmarking code prevents a straightforward conversion of the whole benchmarking code to async. before: ``` $ cargo bench --features testing --bench bench_walredo Compiling pageserver v0.1.0 (/home/cs/src/neon/pageserver) Finished bench [optimized + debuginfo] target(s) in 2m 11s Running benches/bench_walredo.rs (target/release/deps/bench_walredo-d99a324337dead70) Gnuplot not found, using plotters backend short/short/1 time: [26.363 µs 27.451 µs 28.573 µs] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild short/short/2 time: [64.340 µs 64.927 µs 65.485 µs] Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) low mild short/short/4 time: [101.98 µs 104.06 µs 106.13 µs] short/short/8 time: [151.42 µs 152.74 µs 154.03 µs] short/short/16 time: [296.30 µs 297.53 µs 298.88 µs] Found 14 outliers among 100 measurements (14.00%) 10 (10.00%) high mild 4 (4.00%) high severe medium/medium/1 time: [225.12 µs 225.90 µs 226.66 µs] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild medium/medium/2 time: [490.80 µs 491.64 µs 492.49 µs] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild medium/medium/4 time: [934.47 µs 936.49 µs 938.52 µs] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe medium/medium/8 time: [1.8364 ms 1.8412 ms 1.8463 ms] Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild medium/medium/16 time: [3.6694 ms 3.6896 ms 3.7104 ms] ``` after: ``` $ cargo bench --features testing --bench bench_walredo Compiling pageserver v0.1.0 
(/home/cs/src/neon/pageserver) Finished bench [optimized + debuginfo] target(s) in 2m 11s Running benches/bench_walredo.rs (target/release/deps/bench_walredo-d99a324337dead70) Gnuplot not found, using plotters backend short/short/1 time: [28.345 µs 28.529 µs 28.699 µs] change: [-0.2201% +3.9276% +8.2451%] (p = 0.07 > 0.05) No change in performance detected. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) low severe 5 (5.00%) high mild 8 (8.00%) high severe short/short/2 time: [66.145 µs 66.719 µs 67.274 µs] change: [+1.5467% +2.7605% +3.9927%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) low mild short/short/4 time: [105.51 µs 107.52 µs 109.49 µs] change: [+0.5023% +3.3196% +6.1986%] (p = 0.02 < 0.05) Change within noise threshold. short/short/8 time: [151.90 µs 153.16 µs 154.41 µs] change: [-1.0001% +0.2779% +1.4221%] (p = 0.65 > 0.05) No change in performance detected. short/short/16 time: [297.38 µs 298.26 µs 299.20 µs] change: [-0.2953% +0.2462% +0.7763%] (p = 0.37 > 0.05) No change in performance detected. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild medium/medium/1 time: [229.76 µs 230.72 µs 231.69 µs] change: [+1.5804% +2.1354% +2.6635%] (p = 0.00 < 0.05) Performance has regressed. medium/medium/2 time: [501.14 µs 502.31 µs 503.64 µs] change: [+1.8730% +2.1709% +2.5199%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 1 (1.00%) high mild 5 (5.00%) high severe medium/medium/4 time: [954.15 µs 956.74 µs 959.33 µs] change: [+1.7962% +2.1627% +2.4905%] (p = 0.00 < 0.05) Performance has regressed. medium/medium/8 time: [1.8726 ms 1.8785 ms 1.8848 ms] change: [+1.5858% +2.0240% +2.4626%] (p = 0.00 < 0.05) Performance has regressed. 
Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe medium/medium/16 time: [3.7565 ms 3.7746 ms 3.7934 ms] change: [+1.5503% +2.3044% +3.0818%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild ```	2023-10-18 11:23:06 +01:00
Em Sharnoff	16c87b5bda	Bump vm-builder v0.17.12 -> v0.18.1 (#5583 ) Only applicable change was neondatabase/autoscaling#566, updating pgbouncer to 1.21.0 and enabling support for prepared statements.	2023-10-18 11:10:01 +02:00
Em Sharnoff	9fe5cc6a82	vm-monitor: Switch from memory.high to polling memory.stat (#5524 ) tl;dr it's really hard to avoid throttling from memory.high, and it counts tmpfs & page cache usage, so it's also hard to make sense of. In the interest of fixing things quickly with something that should be good enough, this PR switches to instead periodically fetch memory statistics from the cgroup's memory.stat and use that data to determine if and when we should upscale. This PR fixes #5444, which has a lot more detail on the difficulties we've hit with memory.high. This PR also supersedes #5488.	2023-10-17 15:30:40 -07:00
Conrad Ludgate	543b8153c6	proxy: add flag to reject requests without proxy protocol client ip (#5417 ) ## Problem We need a flag to require proxy protocol (prerequisite for #5416) ## Summary of changes Add a cli flag to require client IP addresses. Error if IP address is missing when the flag is active.	2023-10-17 16:59:35 +01:00
Christian Schwarz	3a8959a4c4	page_cache: remove dead code (#5493 )	2023-10-17 15:56:16 +01:00
Christian Schwarz	4a50483861	docs: error handling: document preferred anyhow context & logging style (#5178 ) We already had strong support for this many months ago on Slack: https://neondb.slack.com/archives/C0277TKAJCA/p1673453329770429	2023-10-17 15:41:47 +01:00
Conrad Ludgate	f775928dfc	proxy: refactor how and when connections are returned to the pool (#5095 ) ## Problem Transactions break connections in the pool fixes #4698 ## Summary of changes * Pool `Client`s are smart objects that return themselves to the pool * Pool `Client`s can be 'discard'ed * Pool `Client`s are discarded when certain errors are encountered. * Pool `Client`s are discarded when ReadyForQuery returns a non-idle state.	2023-10-17 13:55:52 +00:00
John Spray	ea648cfbc6	tests: fix test_eviction_across_generations trying to evict temp files (#5579 ) This test is listing files in a timeline and then evicting them: if the test ran slowly this could encounter temp files for unfinished downloads: fix by filtering these out in evict_all_layers.	2023-10-17 13:26:11 +01:00
Arpad Müller	093f8c5f45	Update rust to 1.73.0 (#5574 ) [Release notes](https://blog.rust-lang.org/2023/10/05/Rust-1.73.0.html)	2023-10-17 13:13:12 +01:00
Arpad Müller	00c71bb93a	Also try to login to Azure via SDK provided methods (#5573 ) ## Problem We ideally use the Azure SDK's way of obtaining authorization, as pointed out in https://github.com/neondatabase/neon/pull/5546#discussion_r1360619178 . ## Summary of changes This PR adds support for Azure SDK based authentication, using [DefaultAzureCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.DefaultAzureCredential.html), which tries the following credentials: * [EnvironmentCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.EnvironmentCredential.html), reading from various env vars * [ImdsManagedIdentityCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.ImdsManagedIdentityCredential.html), using managed identity * [AzureCliCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.AzureCliCredential.html), using Azure CLI closes #5566.	2023-10-17 11:59:57 +01:00
Christian Schwarz	9256788273	limit imitate accesses concurrency, using same semaphore as compactions (#5578 ) Before this PR, when we restarted pageserver, we'd see a rush of `$number_of_tenants` concurrent eviction tasks starting to do imitate accesses building up in the period of `[init_order allows activations, $random_access_delay + EvictionPolicyLayerAccessThreshold::period]`. We simply cannot handle that degree of concurrent IO. We already solved the problem for compactions by adding a semaphore. So, this PR shares that semaphore for use by evictions. Part of https://github.com/neondatabase/neon/issues/5479 Which is again part of https://github.com/neondatabase/neon/issues/4743 Risks / Changes In System Behavior ================================== * we don't do evictions as timely as we currently do * we log a bunch of warnings about eviction taking too long * imitate accesses and compactions compete for the same concurrency limit, so, they'll slow each other down through this shared semaphore Changes ======= - Move the `CONCURRENT_COMPACTIONS` semaphore into `tasks.rs` - Rename it to `CONCURRENT_BACKGROUND_TASKS` - Use it also for the eviction imitate accesses: - Imitate accesses are both per-TIMELINE and per-TENANT - The per-TENANT is done through coalescing all the per-TIMELINE tasks via a tokio mutex `eviction_task_tenant_state`. - We acquire the CONCURRENT_BACKGROUND_TASKS permit early, at the beginning of the eviction iteration, much before the imitate accesses start (and they may not even start at all in the given iteration, as they happen only every $threshold). - Acquiring early is sub-optimal because when the per-timeline tasks coalesce on the `eviction_task_tenant_state` mutex, they are already holding a CONCURRENT_BACKGROUND_TASKS permit. - It's also unfair because tenants with many timelines win the CONCURRENT_BACKGROUND_TASKS more often. 
- I don't think there's another way though, without refactoring more of the imitate accesses logic, e.g, making it all per-tenant. - Add metrics for queue depth behind the semaphore. I found these very useful to understand what work is queued in the system. - The metrics are tagged by the new `BackgroundLoopKind`. - On a green slate, I would have used `TaskKind`, but we already had pre-existing labels whose names didn't map exactly to task kind. Also the task kind is kind of a lower-level detail, so, I think it's fine to have a separate enum to identify background work kinds. Future Work =========== I guess I could move the eviction tasks from a ticker to "sleep for $period". The benefit would be that the semaphore automatically "smears" the eviction task scheduling over time, so, we only have the rush on restart but a smeared-out rush afterward. The downside is that this perverts the meaning of "$period", as we'd actually not run the eviction at a fixed period. It also means the "took too long" warning & metric becomes meaningless. Then again, that is already the case for the compaction and gc tasks, which do sleep for `$period` instead of using a ticker.	2023-10-17 11:29:48 +02:00
Joonas Koivunen	9e1449353d	crash-consistent layer map through index_part.json (#5198 ) Fixes #5172 as it: - removes reconciliation with remote index_part.json and accepts remote index_part.json as the truth, deleting any local progress which is yet to be reflected in remote - moves to prefer remote metadata Additionally: - tests with single LOCAL_FS parametrization are cleaned up - adds a test case for branched (non-bootstrap) local only timeline availability after restart --------- Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: John Spray <john@neon.tech>	2023-10-17 10:04:56 +01:00
John Spray	b06dffe3dc	pageserver: fixes to `/location_config` API (#5548 ) ## Problem I found some issues with the `/location_config` API when writing new tests. ## Summary of changes - Calling the API with the "Detached" state is now idempotent. - `Tenant::spawn_attach` now takes a boolean to indicate whether to expect a marker file. Marker files are used in the old attach path, but not in the new location conf API. They aren't needed because in the New World, the choice of whether to attach via remote state ("attach") or to trust local state ("load") will be revised to cope with the transitions between secondary & attached (see https://github.com/neondatabase/neon/issues/5550). It is okay to merge this change ahead of that ticket, because the API is not used in the wild yet. - Instead of using `schedule_local_tenant_processing`, the location conf API handler does its own directory creation and calls `spawn_attach` directly. - A new `unsafe_create_dir_all` is added. This differs from crashsafe::create_dir_all in two ways: - It is intentionally not crashsafe, because in the location conf API we are no longer using directory or config existence as the signal for any important business logic. - It is async and uses `tokio::fs`.	2023-10-17 10:21:31 +02:00
Christian Schwarz	b08a0ee186	walredo: fix race condition where shutdown kills the wrong process (#5557 ) Before this PR, the following race condition existed: ``` T1: does the apply_wal_records() call and gets back an error T2: does the apply_wal_records() call and gets back an error T2: does the kill_and_shutdown T2: new loop iteration T2: launches new walredo process T1: does the kill_and_shutdown of the new process ``` That last step is wrong, T2 already did the kill_and_shutdown. The symptom of this race condition was that T2 would observe an error when it tried to do something with the process after T1 killed it. For example, but not limited to: `POLLHUP` / `"WAL redo process closed its stderr unexpectedly"`. The fix in this PR is the following: * Use Arc to represent walredo processes. The Arc lives at least as long as the walredo process. * Use Arc::ptr_eq to determine whether to kill the process or not. The price is an additional RwLock to protect the new `redo_process` field that holds the Arc. I guess that could perhaps be an atomic pointer swap some day. But, let's get one race fixed without risking introducing a new one. The use of Arc/drop is also not super great here because it now allows for an unlimited number of to-be-killed processes to exist concurrently. See the various `NB` comments above `drop(proc)` for why it's "ok" right now due to the blocking `wait` inside `drop`. Note: an earlier fix attempt was https://github.com/neondatabase/neon/pull/5545 where we apply_batch_postgres would compare stdout_fd for equality. That's incorrect because the kernel can reuse the file descriptor when T2 launches the new process. Details: https://github.com/neondatabase/neon/pull/5545#pullrequestreview-1676589373	2023-10-17 09:55:39 +02:00
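The identity check behind this fix can be sketched with `Arc::ptr_eq` from the standard library. This is an illustrative sketch only: `Process`, `Manager`, and `kill_if_current` are hypothetical names, and the real kill-and-wait step is reduced to clearing the slot.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for a walredo child process handle.
struct Process {
    pid: u32,
}

// Hypothetical manager holding the currently active process, if any.
struct Manager {
    redo_process: RwLock<Option<Arc<Process>>>,
}

impl Manager {
    // Clear the slot only if it still holds the exact process the caller
    // saw fail. Returns true if this caller was the one that removed it.
    fn kill_if_current(&self, failed: &Arc<Process>) -> bool {
        let mut slot = self.redo_process.write().unwrap();
        // Compare allocation identity, not contents: a new process that
        // happens to look the same must not be treated as the failed one.
        let is_current =
            matches!(slot.as_ref(), Some(current) if Arc::ptr_eq(current, failed));
        if is_current {
            // Real code would kill the process and wait for it here.
            *slot = None;
        }
        is_current
    }
}
```

A late caller that still holds the old `Arc` gets `false` back and leaves the replacement process alone, which is the situation T1 is in at the last step of the race described above.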
Arpad Müller	3666df6342	azure_blob.rs: use division instead of left shift (#5572 ) Should have been a right shift but I did a left shift. It's constant folded anyways so we just use a shift.	2023-10-16 19:52:07 +01:00
Alexey Kondratov	0ca342260c	[compute_ctl+pgxn] Handle invalid databases after failed drop (#5561 ) ## Problem In `89275f6c1e` we fixed an issue, when we were dropping db in Postgres even though cplane request failed. Yet, it introduced a new problem that we now de-register db in cplane even if we didn't actually drop it in Postgres. ## Summary of changes Here we revert extension change, so we now again may leave db in invalid state after failed drop. Instead, `compute_ctl` is now responsible for cleaning up invalid databases during full configuration. Thus, there are two ways of recovering from failed DROP DATABASE: 1. User can just repeat DROP DATABASE, same as in Vanilla Postgres. 2. If they didn't, then on next full configuration (dbs / roles changes in the API; password reset; or data availability check) invalid db will be cleaned up in the Postgres and re-created by `compute_ctl`. So again it follows pretty much the same semantics as Vanilla Postgres -- you need to drop it again after failed drop. That way, we have a recovery trajectory for both problems. See this commit for info about `invalid` db state: `a4b4cc1d60` According to it: > An invalid database cannot be connected to anymore, but can still be dropped. While on it, this commit also fixes another issue, when `compute_ctl` was trying to connect to databases with `ALLOW CONNECTIONS false`. Now it will just skip them. Fixes #5435	2023-10-16 20:46:45 +02:00
John Spray	ded7f48565	pageserver: measure startup duration spent fetching remote indices (#5564 ) ## Problem Currently it's unclear how much of the `initial_tenant_load` period is in S3 objects, and therefore how impactful it is to make changes to remote operations during startup. ## Summary of changes - `Tenant::load` is refactored to load remote indices in parallel and to wait for all these remote downloads to finish before it proceeds to construct any `Timeline` objects. - `pageserver_startup_duration_seconds` gets a new `phase` value of `initial_tenant_load_remote` which counts the time from startup to when the last tenant finishes loading remote content. - `test_pageserver_restart` is extended to validate this phase. The previous version of the test was relying on order of dict entries, which stopped working when adding a phase, so this is refactored a bit. - `test_pageserver_restart` used to explicitly create a branch, now it uses the default initial_timeline. This avoids startup getting held up waiting for logical sizes, when one of the branches is not in use.	2023-10-16 18:21:37 +01:00
Arpad Müller	e09d5ada6a	Azure blob storage support (#5546 ) Adds prototype-level support for [Azure blob storage](https://azure.microsoft.com/en-us/products/storage/blobs). Some corners were cut, see the TODOs and the followup issue #5567 for details. Steps to try it out: * Create a storage account with block blobs (this is a per-storage account setting). * Create a container inside that storage account. * Set the appropriate env vars: `AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_ACCESS_KEY, REMOTE_STORAGE_AZURE_CONTAINER, REMOTE_STORAGE_AZURE_REGION` * Set the env var `ENABLE_REAL_AZURE_REMOTE_STORAGE=y` and run `cargo test -p remote_storage azure` Fixes #5562	2023-10-16 17:37:09 +02:00
Conrad Ludgate	8c522ea034	proxy: count cache-miss for compute latency (#5539 ) ## Problem Would be good to view latency for hot-path vs cold-path ## Summary of changes add some labels to latency metrics	2023-10-16 16:31:04 +01:00
John Spray	44b1c4c456	pageserver: fix eviction across generations (#5538 ) ## Problem Bug was introduced by me in `83ae2bd82c` When eviction constructs a RemoteLayer to replace the layer it just evicted, it is building a LayerFileMetadata using its _current_ generation, rather than the generation of the layer. ## Summary of changes - Retrieve Generation from RemoteTimelineClient when evicting. This will no longer be necessary when #4938 lands. - Add a test for the scenario in question (this fails without the fix).	2023-10-15 20:23:18 +01:00
Christian Schwarz	99c15907c1	walredo: trim public interfaces (#5556 ) Stacked atop https://github.com/neondatabase/neon/pull/5554.	2023-10-13 19:35:53 +01:00
Christian Schwarz	c3626e3432	walredo: remove legacy wal-redo-datadir cleanup code (#5554 ) It says it in the comment.	2023-10-13 19:16:15 +01:00
Christian Schwarz	dd6990567f	walredo: apply_batch_postgres: get a backtrace whenever it encounters an error (#5541 ) For 2 weeks we've seen rare, spurious, not-reproducible page reconstruction failures with PG16 in prod. One of the commits we deployed this week was Commit commit `fc467941f9` Author: Joonas Koivunen <joonas@neon.tech> Date: Wed Oct 4 16:19:19 2023 +0300 walredo: log retryed error (#546) With the logs from that commit, we learned that some read() or write() system call that walredo does fails with `EAGAIN`, aka `Resource temporarily unavailable (os error 11)`. But we have no idea where exactly in the code we get back that error. So, use anyhow instead of fake std::io::Error's as an easy way to get a backtrace when the error happens, and change the logging to print that backtrace (i.e., use `{:?}` instead of `utils::error::report_compact_sources(e)`). The `WalRedoError` type had to go because we add additional `.context()` further up the call chain before we `{:?}`-print it. That additional `.context()` further up doesn't see that there's already an anyhow::Error inside the `WalRedoError::ApplyWalRecords` variant, and hence captures another backtrace and prints that one on `{:?}`-print instead of the original one inside `WalRedoError::ApplyWalRecords`. If we ever switch back to `report_compact_sources`, we should make sure we have some other way to uniquely identify the places where we return an error in the error message.	2023-10-13 14:08:23 +00:00
khanova	21deb81acb	Fix case for array of jsons (#5523 ) ## Problem Currently proxy doesn't handle array of json parameters correctly. ## Summary of changes Added one more level of quotes escaping for the array of jsons case. Resolves: https://github.com/neondatabase/neon/issues/5515	2023-10-12 14:32:49 +02:00
khanova	dbb21d6592	Make http timeout configurable (#5532 ) ## Problem Currently http timeout is hardcoded to 15 seconds. ## Summary of changes Added an option to configure it via cli args. Context: https://neondb.slack.com/archives/C04DGM6SMTM/p1696941726151899	2023-10-12 11:41:07 +02:00
Joonas Koivunen	ddceb9e6cd	fix(branching): read last record lsn only after Tenant::gc_cs (#5535 ) Fixes #5531, at least the latest error of not being able to create a branch from the head under write and gc pressure.	2023-10-11 16:24:36 +01:00
John Spray	0fc3708de2	pageserver: use a backoff::retry in Deleter (#5534 ) ## Problem The `Deleter` currently doesn't use a backoff::retry because it doesn't need to: it is already inside a loop when doing the deletion, so can just let the loop go around. However, this is a problem for logging, because we log on errors, which includes things like 503/429 cases that would usually be swallowed by a backoff::retry in most places we use the RemoteStorage interface. The underlying problem is that RemoteStorage doesn't have a proper error type, and an anyhow::Error can't easily be interrogated for its original S3 SdkError because downcast_ref requires a concrete type, but SdkError is parametrized on response type. ## Summary of changes Wrap remote deletions in Deleter in a backoff::retry to avoid logging warnings on transient 429/503 conditions, and for symmetry with how RemoteStorage is used in other places.	2023-10-11 15:25:08 +01:00
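The effect of wrapping an operation in a retry helper can be sketched as follows. This is a generic, synchronous illustration and not the `backoff::retry` function the commit refers to, whose signature is not shown in this log; `retry_with_backoff` and its parameters are hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

// Retry `op` up to `max_attempts` times, doubling the delay after each
// failure. Intermediate failures are swallowed; only the last error is
// returned, so transient errors never reach the caller's logging.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    base_delay: Duration,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            // Transient failure: wait, then try again with a longer delay.
            Err(_) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}
```

With a helper of this shape, a caller that logs on `Err` only logs once the retries are exhausted, rather than on every transient 429 or 503.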
John Spray	e0c8ad48d4	remote_storage: log detail errors in delete_objects (#5530 ) ## Problem When we got an error in the payload of a DeleteObjects response, we only logged how many errors, not what they were. ## Summary of changes Log up to 10 specific errors. We do not log all of them because that would be up to 1000 log lines per request.	2023-10-11 13:22:00 +01:00
John Spray	39e144696f	pageserver: clean up `mgr.rs` types that needn't be public (#5529 ) ## Problem These types/functions are public and it prevents clippy from catching unused things. ## Summary of changes Move to `pub(crate)` and remove the error enum that becomes clearly unused as a result.	2023-10-11 11:50:16 +00:00
Alexander Bayandin	653044f754	test_runners: increase some timeouts to make tests less flaky (#5521 ) ## Problem - `test_heavy_write_workload` is flaky, and fails because of statement timeout - `test_wal_lagging` is flaky and fails because of the default pytest timeout (see https://github.com/neondatabase/neon/issues/5305) ## Summary of changes - `test_heavy_write_workload`: increase statement timeout to 5 minutes (from default 2 minutes) - `test_wal_lagging`: increase pytest timeout to 600s (from default 300s)	2023-10-11 10:49:15 +01:00
Vadim Kharitonov	80dcdfa8bf	Update pgvector to 0.5.1 (#5525 )	2023-10-11 09:47:19 +01:00
Arseny Sher	685add2009	Enable /metrics without auth. To enable auth faster.	2023-10-10 20:06:25 +03:00
Conrad Ludgate	d4dc86f8e3	proxy: more connection metrics (#5464 ) ## Problem Hard to tell 1. How many clients are connected to proxy 2. How many requests clients are making 3. How many connections are made to a database 1 and 2 are different because of the properties of HTTP. We have 2 already tracked through `proxy_accepted_connections_total` and `proxy_closed_connections_total`, but nothing for 1 and 3 ## Summary of changes Adds 2 new counter gauges. * `proxy_opened_client_connections_total`,`proxy_closed_client_connections_total` - how many client connections are open to proxy * `proxy_opened_db_connections_total`,`proxy_closed_db_connections_total` - how many active connections are made through to a database. For TCP and Websockets, we expect all 3 of these quantities to be roughly the same, barring users connecting but with invalid details. For HTTP: * client_connections/connections can differ because the client connections can be reused. * connections/db_connections can differ because of connection pooling.	2023-10-10 16:33:20 +01:00
Alex Chi Z	5158de70f3	proxy: breakdown wake up failure metrics (#4933 ) ## Problem close https://github.com/neondatabase/neon/issues/4702 ## Summary of changes This PR adds a new metric for wake up errors and breaks it down by most common reasons (mostly follows the `could_retry` implementation).	2023-10-10 13:17:37 +01:00
khanova	aec9188d36	Added timeout for http requests (#5514 ) # Problem Proxy timeout for HTTP-requests ## Summary of changes If the HTTP-request exceeds 15s, it would be killed. Resolves: https://github.com/neondatabase/neon/issues/4847	2023-10-10 13:39:38 +02:00
John Spray	acefee9a32	pageserver: flush deletion queue on detach (#5452 ) ## Problem If a caller detaches a tenant and then attaches it again, pending deletions from the old attachment might not have happened yet. This is not a correctness problem, but it causes: - Risk of leaking some objects in S3 - Some warnings from the deletion queue when pending LSN updates and pending deletions don't pass validation. ## Summary of changes - Deletion queue now uses UnboundedChannel so that the push interfaces don't have to be async. - This was pulled out of https://github.com/neondatabase/neon/pull/5397, where it is also useful to be able to drive the queue from non-async contexts. - Why is it okay for this to be unbounded? The only way the unbounded-ness of the channel can become a problem is if writing out deletion lists can't keep up, but if the system were that overloaded then the code generating deletions (GC, compaction) would also be impacted. - DeletionQueueClient gets a new `flush_advisory` function, which is like flush_execute, but doesn't wait for completion: this is appropriate for use in contexts where we would like to encourage the deletion queue to flush, but don't need to block on it. - This function is also expected to be useful in next steps for seamless migration, where the option to flush to S3 while transitioning into AttachedStale will also include flushing deletion queue, but we wouldn't want to block on that flush. - The tenant_detach code in mgr.rs invokes flush_advisory after stopping the `Tenant` object. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-10-10 10:46:24 +01:00
Conrad Ludgate	bf065aabdf	proxy: update locked error retry filter (#5376 ) ## Problem We don't want to retry customer quota exhaustion errors. ## Summary of changes Make sure both types of quota exhaustion errors are not retried	2023-10-10 08:59:16 +01:00
Konstantin Knizhnik	fe74fac276	Fix handling flush error in prefetch (#5473 ) ## Problem See https://neondb.slack.com/archives/C05U648A9NJ In case of failure of flush in prefetch, prefetch state is reset. We need to retry register buffer attempt, otherwise we will get assertion failure. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-10 07:43:37 +03:00
Alexander Bayandin	b91ac670e1	Update plpgsql_check extension to 2.5.3 (#5437 )	2023-10-09 17:07:43 +01:00
John Spray	b3195afd20	tests: fix a race in test_deletion_queue_recovery on loaded nodes (#5495 ) ## Problem Seen in CI for https://github.com/neondatabase/neon/pull/5453 -- the time gap between validation completing and the header getting written is long enough to fail the test, where it was doing a cheeky 1 second sleep. ## Summary of changes - Replace 1 second sleep with a wait_until to see the header file get written - Use enums as test params to make the results more readable (instead of True-False parameters) - Fix the temp suffix used for deletion queue headers: this worked fine, but resulted in `..tmp` extension.	2023-10-09 16:28:28 +01:00
John Spray	7eaa7a496b	pageserver: cancellation handling in writes to postgres client socket (#5503 ) ## Problem Writes to the postgres client socket from the page server were not wrapped in cancellation handling, so a stuck client connection could prevent tenant shutdown. ## Summary of changes All the places we call flush() to write to the socket, we should be respecting the cancellation token for the task. In this PR, I explicitly pass around a CancellationToken rather than doing inline `task_mgr::shutdown_token` calls, to avoid coupling it to the global task_mgr state and make it easier to refactor later. I have some follow-on commits that add a Shutdown variant to QueryError and use it more extensively, but that's pure refactor so will keep separate from this bug fix PR. Closes: https://github.com/neondatabase/neon/issues/5341	2023-10-09 15:54:17 +01:00
Joonas Koivunen	4772cd6c93	fix: deny branching, starting compute from not yet uploaded timelines (#5484 ) Part of #5172. First commits show that we used to allow starting up a compute or creating a branch off a not yet uploaded timeline. This PR moves activation of a timeline to happen after initial layer file(s) (if any) and `index_part.json` have been uploaded. Simply moving activation to be after downloads have finished works because we now spawn a task per http request handler. Current behaviour of uploading on the timelines on next startup is kept, to be removed later as part of #5172. Adds: - `NeonCli.map_branch` and corresponding `neon_local` implementation: allow creating computes for timelines managed via pageserver http client/api - possibly duplicate tests (I did not want to search for, will cleanup in a follow-up if these are duplicated) Changes: - make `wait_until_tenant_state` return immediately on `Broken` and not wait more	2023-10-09 17:03:38 +03:00
Shany Pozin	010b4d0d5c	Move ApiError 404 to info level (#5501 ) ## Problem Moving ApiError 404 to info level logging (see https://github.com/neondatabase/neon/pull/5489#issuecomment-1750211212)	2023-10-09 13:54:46 +03:00
Rahul Modpur	477cb3717b	Fix neon_local pageserver status command (#5475 ) ## Problem Fix neon_local pageserver status command #5430 ## Summary of changes Fix clap config for pageserver status subcommand ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-10-09 09:13:57 +01:00
John Spray	ea5a97e7b4	pageserver: implement emergency mode for operating without control plane (#5469 ) ## Problem Pageservers with `control_plane_api` configured require a control plane to start up: in an incident this might be a problem. ## Summary of changes Note to reviewers: most of the code churn in mgr.rs is the refactor commit that enables the later emergency mode commit: you may want to review commits separately. - Add `control_plane_emergency_mode` configuration property - Refactor init_tenant_mgr to separate loading configurations from the main loop where we construct Tenant, so that the generations fetch can peek at the configs in emergency mode. - During startup, in emergency mode, attach any tenants that were attached on their last run, using the same generation number. Closes: #5381 Closes: https://github.com/neondatabase/neon/issues/5492	2023-10-06 17:25:21 +01:00
John Spray	547914fe19	pageserver: adjust timeline deletion for generations (#5453 ) ## Problem Spun off from https://github.com/neondatabase/neon/pull/5449 Timeline deletion does the following: 1. Delete layers referenced in the index 2. Delete everything else in the timeline prefix, except the index 3. Delete the index. When generations were added, the filter in step 2 got outdated, such that the index objects were deleted along with everything else at step 2. That didn't really break anything, but it makes an automated test unhappy and is a violation of the original intent of the code, which presumably intends to uphold an invariant that as long as any objects for a timeline exist, the index exists. (Eventually, this index-object-last complexity can go away: when we do https://github.com/neondatabase/neon/issues/5080, there is no need to keep the index_part around, as deletions can always be retried any time any where.) ## Summary of changes After object listing, split the listed objects into layers and index objects. Delete the layers first, then the index objects.	2023-10-06 16:15:18 +00:00
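The "layers first, index objects last" ordering from this commit can be sketched with `Iterator::partition`. The key format is an assumption made for illustration: `is_index` treats any object whose file name starts with `index_part.json` as an index object, and both `is_index` and `deletion_order` are hypothetical helpers rather than pageserver functions.

```rust
// Assumption for illustration: an index object is any key whose file name
// starts with "index_part.json", with or without a generation suffix.
fn is_index(key: &str) -> bool {
    key.rsplit('/')
        .next()
        .map_or(false, |name| name.starts_with("index_part.json"))
}

// Return the listed keys in deletion order: all layers first, index
// objects last, so an index outlives every other object of the timeline.
fn deletion_order(listed: Vec<String>) -> Vec<String> {
    let (index, mut layers): (Vec<String>, Vec<String>) =
        listed.into_iter().partition(|key| is_index(key));
    layers.extend(index);
    layers
}
```

Deleting in the returned order keeps the invariant the commit message describes: as long as any object for the timeline exists, an index object exists too.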
Arpad Müller	607b185a49	Fix 1.73.0 clippy lints (#5494 ) Doesn't do an upgrade of rustc to 1.73.0 as we want to wait for the cargo response of the curl CVE before updating. In preparation for an update, we address the clippy lints that are newly firing in 1.73.0.	2023-10-06 14:17:19 +01:00
Christian Schwarz	bfba5e3aca	page_cache: ensure forward progress on miss (#5482 ) Problem ======= Prior to this PR, when we had a cache miss, we'd get back a write guard, fill it, then drop it and retry the read from cache. If there's severe contention for the cache, it could happen that the just-filled data gets evicted before our retry, resulting in lost work and no forward progress. Solution ======== This PR leverages the now-available `tokio::sync::RwLockWriteGuard`'s `downgrade()` functionality to turn the filled slot write guard into a read guard. We don't drop the guard at any point, so, forward progress is ensured. Refs ==== Stacked atop https://github.com/neondatabase/neon/pull/5480 part of https://github.com/neondatabase/neon/issues/4743 specifically part of https://github.com/neondatabase/neon/issues/5479	2023-10-06 13:41:13 +01:00
Christian Schwarz	ecc7a9567b	page_cache: inline `{,try_}lock_for_write` into `memorize_materialized_page` (#5480 ) Motivation ========== It's the only user, and the name of `_for_write` is wrong as of commit `7a63685cde` Author: Christian Schwarz <christian@neon.tech> Date: Fri Aug 18 19:31:03 2023 +0200 simplify page-caching of EphemeralFile (#4994) Notes ===== This also allows us to get rid of the WriteBufResult type. Also rename `search_mapping_for_write` to `search_mapping_exact`. It makes more sense that way because there is no `_for_write`-locking anymore. Refs ==== part of https://github.com/neondatabase/neon/issues/4743 specifically https://github.com/neondatabase/neon/issues/5479 this is prep work for https://github.com/neondatabase/neon/pull/5482	2023-10-06 13:38:02 +02:00
Joonas Koivunen	45f98dd018	debug_tool: get page at lsn and keyspace via http api (#5057 ) If there are any layermap or layer file related problems, having a reproducible `get_page@lsn` easily usable for fast debugging iteration is helpful. Split off from #4938. Later evolved to add http apis for: - `get_page@lsn` at `/v1/tenant/:tenant_id/timeline/:timeline_id/get?key=<hex>&lsn=<lsn string>` - collecting the keyspace at `/v1/tenant/:tenant_id/timeline/:timeline_id/keyspace?[at_lsn=<lsn string>]` - defaults to `last_record_lsn` collecting the keyspace seems to yield some ranges for which there is no key.	2023-10-06 12:17:38 +01:00
John Spray	bdfe27f3ac	swagger: add a 503 definition to each endpoint (#5476 ) ## Problem The control plane doesn't have generic handling for this. ## Summary of changes Add a 503 response to every endpoint.	2023-10-06 11:31:49 +01:00
Joonas Koivunen	a15f9b3baa	pageserver: Tune 503 Resource unavailable (#5489 ) 503 Resource Unavailable appears as an error in logs, but it is not really an error that should ever fail a test, or even be logged as an error in prod, [evidence]. Changes: - log 503 as `info!` level - use `Cow<'static, str>` instead of `String` - add an additional `wait_until_tenant_active` in `test_actually_duplicate_l1` We ought to have in tests "wait for tenants to complete loading" but this is easier to implement for now. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5485/6423110295/index.html#/testresult/182de66203864fc0	2023-10-06 09:59:14 +01:00
Alexander Bayandin	ce92638185	test_runner: allow race in test_tenant_delete_is_resumed_on_attach (#5478 ) ## Problem `test_tenant_delete_is_resumed_on_attach` is flaky ## Summary of changes - Allow race in `test_tenant_delete_is_resumed_on_attach` - Cleanup `allowed_errors` in the file a bit	2023-10-06 09:49:31 +01:00
Joonas Koivunen	a3c82f19b8	tests: prettier subprocess output in test log (#5485 ) Clean subprocess output so that: - one line of output is just one line without a linebreak - like shells handle `echo subshell says: $(echo foo)` - multiple lines are indented like other pytest output - error output is dedented and then indented to be like other pytest output Minor readability changes remove friction.	2023-10-05 20:15:55 +00:00
Arthur Petukhovsky	8b15252f98	Move walproposer state into struct (#5364 ) This patch extracts all postgres-dependent functions in a separate `walproposer_api` functions struct. It helps to compile walproposer as static library without compiling all other postgres server code. This is useful to allow calling walproposer C code from Rust, or linking this library with anything else. All global variables containing walproposer state were extracted to a separate `WalProposer` struct. This makes it possible to run several walproposers in the same process, in separate threads. There were no logic changes and PR mostly consists of shuffling functions between several files. We have a good test coverage for walproposer code and I've seen no issues with tests while I was refactoring it, so I don't expect any issues after merge. ref https://github.com/neondatabase/neon/issues/547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-10-05 18:48:01 +01:00
Alexander Bayandin	522aaca718	Temporary deploy staging preprod region from main (#5477 ) ## Problem Staging preprod region can't use `release-XXX` right now: the config is unified across all regions and supports only `XXX`. Ref https://neondb.slack.com/archives/C03H1K0PGKH/p1696506459720909?thread_ts=1696437812.365249&cid=C03H1K0PGKH ## Summary of changes - Deploy staging-preprod from main	2023-10-05 14:02:20 +00:00
John Spray	7cbb39063a	tests: stabilize + extend deletion queue recovery test (#5457 ) ## Problem This test was unstable when run in parallel with lots of others: if the pageserver stayed up long enough for some of the deletions to get validated, they won't be discarded on restart the way the test expects when keep_attachment=True. This was a test bug, not a pageserver bug. ## Summary of changes - Add failpoints to control plane api client - Use failpoint to pause validation in the test to cover the case where it had been flaky - Add a metric for the number of deleted keys validated - Add a permutation to the test to additionally exercise the case where we _do_ validate lists before restart: this is a coverage enhancement that seemed sensible when realizing that the test was relying on nothing being validated before restart. - the test will now always enter the restart with nothing or everything validated.	2023-10-05 11:22:05 +01:00
John Spray	baa5fa1e77	pageserver: location configuration API, attachment modes, secondary locations (#5299 ) ## Problem These changes are part of building seamless tenant migration, as described in the RFC: - https://github.com/neondatabase/neon/pull/5029 ## Summary of changes - A new configuration type `LocationConf` supersedes `TenantConfOpt` for storing a tenant's configuration in the pageserver repo dir. It contains `TenantConfOpt`, as well as a new `mode` attribute that describes what kind of location this is (secondary, attached, attachment mode etc). It is written to a file called `config-v1` instead of `config` -- this prepares us for neatly making any other profound changes to the format of the file in future. Forward compat for existing pageserver code is achieved by writing out both old and new style files. Backward compat is achieved by checking for the old-style file if the new one isn't found. - The `TenantMap` type changes, to hold `TenantSlot` instead of just `Tenant`. The `Tenant` type continues to be used for attached tenants only. Tenants in other states (such as secondaries) are represented by a different variant of `TenantSlot`. - Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>, they now hold a reference to an AttachedTenantConf, which includes the extra information from LocationConf. This enables them to know the current attachment mode. - The attachment mode is used as an advisory input to decide whether to do compaction and GC (AttachedStale is meant to avoid doing uploads, AttachedMulti is meant to avoid doing deletions). - A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config` to drive new location configuration. 
This provides a superset of the functionality of attach/detach/load/ignore: - Attaching a tenant is just configuring it in an attached state - Detaching a tenant is configuring it to a detached state - Loading a tenant is just the same as attaching it - Ignoring a tenant is the same as configuring it into Secondary with warm=false (i.e. retain the files on disk but do nothing else). Caveats: - AttachedMulti tenants don't do compaction in this PR, but they do in the follow on #5397 - Concurrent updates to the `location_config` API are not handled elegantly in this PR, a better mechanism is added in the follow on https://github.com/neondatabase/neon/pull/5367 - Secondary mode is just a placeholder in this PR: the code to upload heatmaps and do downloads on secondary locations will be added in a later PR (but that shouldn't change any external interfaces) Closes: https://github.com/neondatabase/neon/issues/5379 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-05 09:55:10 +01:00
Conrad Ludgate	c216b16b0f	proxy: fix memory leak (#5472 ) ## Problem these JoinSets live for the duration of the process. they might have many millions of connections spawned on them and they never get cleared. Fixes #4672 ## Summary of changes Drain the connections as we go	2023-10-05 07:30:28 +01:00
John Spray	c5ea91f831	pageserver: fix loading control plane JWT token (#5470 ) ## Problem In #5383 this configuration was added, but it missed the parts of the Builder class that let it actually be used. ## Summary of changes Add `control_plane_api_token` hooks to PageserverConfigBuilder	2023-10-05 01:31:17 +01:00
Em Sharnoff	6489a4ea40	vm-monitor: Remove mem::forget of tokio::sync::mpsc::Sender (#5441 ) If the cgroup integration was not enabled, this would cause compute_ctl to leak memory. Thankfully, we never use vm-monitor without the cgroup handling enabled, so this wasn't actually impacting us, but... it still looked suspicious, so figured it was worth changing.	2023-10-04 15:08:10 -07:00
Arthur Petukhovsky	f8a7498965	Wait for sk tli init in test_timeline_status (#5467 ) Fix #5447	2023-10-04 22:53:34 +01:00
Joonas Koivunen	7dce62a9ee	test: duplicate L1 layer (#5412 ) We overwrite L1 layers if compaction gets interrupted. We did not have a test showing that we do in fact do this. The test might be a bit flaky due to timestamp usage, but it is separated out for a smaller diff as part of #5172. Also removes an unrelated 200s pgbench from the test suite.	2023-10-04 16:52:32 +01:00
Alexander Bayandin	7a2cafb34d	Use zstd to compress large allure artifacts (#5458 ) ## Problem - Because we compress artifacts file by file, we don't need to put them into `tar` containers (i.e. instead of `tar.gz` we can use just `gz`). - Python's gz is single-threaded and pretty slow. A benchmark has shown a ~20 times speedup (19.876176291 vs 0.8748335830000009) on my laptop (for a pageserver.log of size 1.3M) ## Summary of changes - Replace tarfile with zstandard - Update allure to 2.24.0	2023-10-04 16:20:16 +01:00
duguorong009	25a37215f3	fix: replace all `std::PathBuf`s with `camino::Utf8PathBuf` (#5352 ) Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with `camino::Utf8Path`, `camino::Utf8PathBuf` in - pageserver - safekeeper - control_plane - libs/remote_storage Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 17:52:23 +03:00
Conrad Ludgate	f002b1a219	proxy: http limits (#5460 ) ## Problem 1MB request body is apparently too small for some clients ## Summary of changes Update to 10 MB request body. Also revert the removal of response limits while we don't have streaming support.	2023-10-04 15:01:05 +01:00
Joonas Koivunen	fc467941f9	walredo: log retried error (#5462 ) We currently lose the actual reason the first walredo attempt failed. Together with the implicit retry, this makes it difficult to eyeball what is happening. The PR version keeps logging the same error message twice, which is what we've been doing all along. However, correlating the retrying case and the finally returned error is difficult, because the actual error message was left out before this PR. Lastly, log the final error we present to postgres in the same span, not outside it. Additionally, suppress the stacktrace as the comment suggested.	2023-10-04 14:19:19 +01:00
Christian Schwarz	25bf791568	metrics: distinguish page reconstruction success & failure (#5463 ) Here are the existing dashboards that use the metric: https://github.com/search?q=repo%3Aneondatabase%2Fgrafana-dashboard-export%20pageserver_getpage_reconstruct_seconds&type=code Looks like only `_count` and `_sum` values are used currently. We can fix them up easily post merge. I think the histogram is worth keeping, though. follow-up to https://github.com/neondatabase/neon/pull/5459#pullrequestreview-1657072882 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 13:40:00 +01:00
Joonas Koivunen	dee2bcca44	fix: time the reconstruction, not future creation (#5459 ) `pageserver_getpage_reconstruct_seconds` histogram had been only recording the time it takes to create a future, not await on it. Since: `eb0a698adc`.	2023-10-04 11:01:07 +01:00
Joonas Koivunen	db8ff9d64b	testing: record walredo failures to test reports (#5451 ) We have rare walredo failures with pg16. Let's introduce recording of failing walredo input in `#[cfg(feature = "testing")]`. There is additional logging (the value reconstruction path logging usually shown with not found keys), keeping it for `#[cfg(feature = "testing")]`. Cc: #5404.	2023-10-04 11:24:30 +03:00
Rahul Modpur	af6a20dfc2	Improve CrashsafeOverwriteError source printing (#5410 ) ## Problem Duplication of error in log Fixes #5366 ## Summary of changes Removed `{0}` from error description above each enum due to presence of `#[source]` to avoid duplication Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-10-04 02:38:42 +02:00
Alexander Bayandin	fec94ad5b3	Update checksums for pg_jsonschema & pg_graphql (#5455 ) ## Problem Folks have re-tagged releases for `pg_jsonschema` and `pg_graphql` (to increase timeouts on their CI). For us, these are no-op changes, but unfortunately, this will cause our builds to fail due to a checksum mismatch (this might not strike right away because of the build cache). - `8ba7c7be9d` - `aa7509370a` ## Summary of changes - `pg_jsonschema` update checksum - `pg_graphql` update checksum	2023-10-03 18:42:39 +01:00
John Spray	ace0c775fc	pageserver: prefer 503 to 500 for transient unavailability (#5439 ) ## Problem The 500 status code should only be used for bugs or unrecoverable failures: situations we did not expect. Currently, the pageserver is misusing this response code for some situations that are totally normal, like requests targeting tenants that are in the process of activating. The 503 response is a convenient catch-all for "I can't right now, but I will be able to". ## Summary of changes - Change some transient availability error conditions to return 503 instead of 500 - Update the HTTP client configuration in integration tests to retry on 503 After these changes, things like creating a tenant and then trying to create a timeline within it will no longer require carefully checking its status first, or retrying on 500s. Instead, a client which is properly configured to retry on 503 can quietly handle such situations.	2023-10-03 17:00:55 +01:00
dependabot[bot]	78dde31827	build(deps): bump urllib3 from 1.26.11 to 1.26.17 (#5442 )	2023-10-03 11:50:27 +01:00
Christian Schwarz	de0e96d2be	remote_storage: separate semaphores for read and write ops (#5440 ) Before this PR, a compaction that queues a lot of uploads could grab all the semaphore permits. Any readers that need on-demand downloads would queue up, causing getpage@lsn outliers. Internal context: https://neondb.slack.com/archives/C05NXJFNRPA/p1696264359425419?thread_ts=1696250393.840899&cid=C05NXJFNRPA	2023-10-03 11:22:11 +03:00
Alexander Bayandin	00369c8c2a	Update pg_jsonschema & pg_graphql extensions (#5438 ) - Update `pg_jsonschema` to 0.2.0 with Postgres 16 support - Update `pg_graphql` to 1.4.0 with Postgres 16 support - Remove `pgx` (old name of `pgrx`) layer from Dockerfile	2023-10-02 23:50:27 +01:00
Vadim Kharitonov	c1dcf61ca2	Update pgx-ulid extension (#5382 ) - Update `pgx-ulid` from 0.1.0 to 0.1.3, and add it to Postgres 16 - Add `pg_tiktoken` to Postgres 16 image Closes #5374 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-10-02 15:52:45 +01:00
Sasha Krassovsky	89275f6c1e	Fix invalid database resulting from failed DROP DB (#5423 ) ## Problem If the control plane happened to respond to a DROP DATABASE request with a non-200 response, we'd abort the DROP DATABASE transaction in the usual spot. However, Postgres for some reason actually performs the drop inside of `standard_ProcessUtility`. As such, the database was left in a weird state after aborting the transaction. We had test coverage of a failed CREATE DATABASE but not a failed DROP DATABASE. ## Summary of changes Since DROP DATABASE can't be inside of a transaction block, we can just forward the DDL changes to the control plane inside of `ProcessUtility_hook`, and if we respond with 500 bail out of `ProcessUtility` before we perform the drop. This change also adds a test, which reproduced the invalid database issue before the fix was applied.	2023-09-29 19:39:28 +01:00
Christian Schwarz	c07eef8ea5	page_cache: find_victim: don't spin while there's no chance for a slot (#5319 ) It is wasteful to cycle through the page cache slots trying to find a victim slot if all the slots are currently un-evictable because a read / write guard is alive. We suspect this wasteful cycling to be the root cause for an "indigestion" we observed in staging (#5291). The hypothesis is that we `.await` after we get ahold of a read / write guard, and that tokio actually deschedules us in favor of another future. If that other future then needs a page slot, it can't get ours because we're holding the guard. Repeat this, and eventually, the other future(s) will find themselves doing `find_victim` until they hit `exceeded evict iter limit`. The `find_victim` is wasteful and CPU-starves the futures that are already holding the read/write guard. A `yield` inside `find_victim` could mitigate the starvation, but wouldn't fix the wasting of CPU cycles. So instead, this PR queues waiters behind a tokio semaphore that counts evictable slots. The downside is that this stops the clock page replacement if we have 0 evictable slots. Also, as explained by the big block comment in `find_victims`, the semaphore doesn't fully prevent starvation because we can't make tokio prioritize those tasks executing `find_victim` that have been trying the longest. Implementation =============== We need to acquire the semaphore permit before locking the slot. Otherwise, we could deadlock / discover that all permits are gone and would have to relinquish the slot, having moved forward the Clock LRU without making progress. The downside is that we never get full throughput for read-heavy workloads, because, until the reader coalesces onto an existing permit, it'll hold its own permit. 
Addendum To Root-Cause Analysis In #5291 ======================================== Since merging that PR, @arpad-m pointed out that we couldn't have reached the `slot.write().await` with his patches because the VirtualFile slots can't have all been write-locked, because we only hold them locked while the IO is ongoing, and the IO is still done with synchronous system calls in that patch set, so, we can have had at most $number_of_executor_threads locked at any given time. I count 3 tokio runtimes that do `Timeline::get`, each with 8 executor threads in our deployment => $number_of_executor_threads = 3*8 = 24 . But the virtual file cache has 100 slots. We both agree that nothing changed about the core hypothesis, i.e., additional await points inside VirtualFile caused higher concurrency resulting in exhaustion of page cache slots. But we'll need to reproduce the issue and investigate further to truly understand the root cause, or find out that & why we were indeed using 100 VirtualFile slots. TODO: could it be compaction that needs to hold guards of many VirtualFile's in its iterators?	2023-09-29 20:03:56 +02:00
Alexander Bayandin	86dd28d4fb	Bump hermit-abi & num_cpus packages (#5427 ) ## Problem I've noticed that `hermit-abi` 0.3.1 [1] has been yanked from crates.io (looks like nothing too bad [2]). Also, we have 2 versions of `hermit-abi` in dependencies (0.3.* and 0.2.*), so update `num-cpus` to use the latest `hermit-abi` 0.3.3. - [1] https://crates.io/crates/hermit-abi/0.3.1 - [2] https://github.com/hermit-os/hermit-rs/issues/436 ## Summary of changes - `cargo update -p num-cpus` - `cargo update -p hermit-abi` - Unignore `RUSTSEC-2023-0052` in `deny.toml` (it has been fixed in https://github.com/neondatabase/neon/pull/5069)	2023-09-29 12:57:45 +01:00
Conrad Ludgate	fd20bbc6cb	proxy: log params when no endpoint (#5418 ) ## Problem Our SNI error dashboard features IP addresses but it's not immediately clear who that is still (#5369) ## Summary of changes Log some startup params with this error	2023-09-29 09:40:27 +01:00
John Spray	6a1903987a	tests: use approximate equality in test_get_tenant_size_with_multiple_branches (#5411 ) ## Problem This test has been flaky for a long time. As far as I can tell, the test was simply wrong to expect postgres activity to result in deterministic sizes: making the match fuzzy is not a hack, it's just matching the reality that postgres doesn't promise to write exactly the same number of pages every time it runs a given query. ## Summary of changes Equalities now tolerate up to 4 pages different. This is big enough to tolerate the deltas we've seen in practice. Closes: https://github.com/neondatabase/neon/issues/2962	2023-09-29 09:15:43 +01:00
John Spray	1881373ec4	Update CODEOWNERS (#5421 ) It is usually not intended to notify a random member of the compute team for pageserver PRs. Leaving the notification of the storage team in place, because this serves a purpose when some external contributor opens a PR and isn't sure who to ask.	2023-09-28 17:34:51 +01:00
John Spray	ca3ca2bb9c	pageserver: don't try and recover deletion queue if no remote storage (#5419 ) ## Problem Because `neon_local` by default runs with no remote storage, it was not running the deletion queue workers, and the attempt to call into `recover()` was failing. This is a bogus configuration that will go away when we make remote storage mandatory. ## Summary of changes Don't try and do deletion queue recovery when remote storage is disabled. The reason we don't just unset `control_plane_api` to avoid this is that generations will soon become mandatory, irrespective of when we make remote storage mandatory.	2023-09-28 17:20:34 +01:00
Em Sharnoff	b497d0094e	file cache: Remove free space monitor (#5406 ) This effectively reverts #3832. There's a couple issues we just discovered with the free space monitor, and to my knowledge, the fact we're putting the file cache on a separate filesystem (even when on disk) that's guaranteed to have more room than the maximum size means that this free space monitor should have no effect. More details: 1. The control plane sets the maximum file cache size based on max CU 2. The control plane sets the size of the filesystem underlying the file cache based on the maximum user selectable CU (or, if the endpoint is larger, then that size), so that there's always enough room 3. If postmaster gets SIGKILL'd, then the free space monitor process does not exit 4. If the free space monitor is acting on the cache file but not subject to locking or up-to-date metadata from a newer postgres instance, then this could lead to data corruption. So, in practice I believe the risk of data corruption is low but not nothing, and given the issues we hit because of (3), and given that the free space monitor shouldn't be necessary because of (1) and (2), it's best to just remove it outright. See also: neondatabase/autoscaling#534, #5405	2023-09-28 06:47:44 -07:00
Conrad Ludgate	528fb1bd81	proxy: metrics2 (#5179 ) ## Problem We need to count metrics always when a connection is open. Not only when the transfer is 0. We also need to count bytes usage for HTTP. ## Summary of changes New structure for usage metrics. A `DashMap<Ids, Arc<Counters>>`. If the arc has 1 owner (the map) then I can conclude that no connections are open. If the counters has "open_connections" non zero, then I can conclude a new connection was opened in the last interval and should be reported on. Also, keep count of how many bytes processed for HTTP and report it here.	2023-09-28 11:38:26 +01:00
Joonas Koivunen	af28362a47	tests: Default to LOCAL_FS for pageserver remote storage (#5402 ) Part of #5172. Builds upon #5243, #5298. Includes the test changes: - no more RemoteStorageKind.NOOP - no more testing of pageserver without remote storage - benchmarks now use LOCAL_FS as well Support for running without RemoteStorage is still kept but in practice, there are no tests and should not be any tests. Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-28 12:25:20 +03:00
John Spray	6b4bb91d0a	docs/rfcs: add RFC for fast tenant migration/failover (#5029 ) ## Problem Currently we don't have a way to migrate tenants from one pageserver to another without a risk of gap in availability. ## Summary of changes This follows on from https://github.com/neondatabase/neon/pull/4919 Migrating tenants between pageservers is essential to operating a service at scale, in several contexts: 1. Responding to a pageserver node failure by migrating tenants to other pageservers 2. Balancing load and capacity across pageservers, for example when a user expands their database and they need to migrate to a pageserver with more capacity. 3. Restarting pageservers for upgrades and maintenance Currently, a tenant may be migrated by attaching to a new node, re-configuring endpoints to use the new node, and then later detaching from the old node. This is safe once [generation numbers](025-generation-numbers.md) are implemented, but does not meet our seamless/fast/efficient goals: Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-28 10:07:11 +01:00
Em Sharnoff	5fdc80db03	Bump vm-builder v0.17.11 -> v0.17.12 (#5407 ) Only relevant change is neondatabase/autoscaling#534 - refer there for more details.	2023-09-28 09:52:39 +02:00
Em Sharnoff	48e85460fc	vm-monitor: Unset memory.high on start + refactor cgroup handling (#5348 ) ## Problem Over the past couple days, we've had a couple VMs hit issues with postgres getting hit by memory.high throttling, even after #5303 was supposed to fix that. The tl;dr of those issues is that because vm-monitor startup sets the file cache size first, before interacting with the cgroup, cgroup throttling can mean we time out connecting to the file cache and never reset the cgroup, even if memory has been upscaled since then. See e.g.: - https://neondb.slack.com/archives/C03F5SM1N02/p1695218132208249 - https://neondb.slack.com/archives/C03F5SM1N02/p1695314613696659 ## Summary of changes This PR adds an additional step into vm-monitor startup, where we first set the cgroup's memory.high value to 'max', removing the capacity for throttling. This is preferable to just setting memory.high before the file cache, because it's theoretically possible that the new value of memory.high could still be less than the current memory usage, in which case postgres could continue to be throttled without sufficient memory events to relieve that. Implementing this properly involved adding a method to our internal cgroup interface, and it seemed like there was duplicated functionality there, so this PR unifies that as well, making things a bit more consistent.	2023-09-27 21:27:23 -07:00
Christian Schwarz	090a644392	metrics for resident & remote physical size without tenant/timeline dimension (#5389 ) So that we can compute worst-case /storage size dashboard panel more cheaply.	2023-09-27 13:18:05 +01:00
John Spray	2cced770da	pageserver: add control_plane_api_token config (#5383 ) ## Problem Control plane API calls in prod will need authentication. ## Summary of changes `control_plane_api_token` config is loaded and set as HTTP `Authorization` header. Closes: https://github.com/neondatabase/neon/issues/5139	2023-09-27 13:12:13 +01:00
MMeent	7038ce40ce	Fix neon_zeroextend's WAL logging (#5387 ) When you log more than a few blocks, you need to reserve the space in advance. We didn't do that, so we got errors. Now we do that, and shouldn't get errors.	2023-09-27 13:48:30 +02:00
Joonas Koivunen	ce45fd4cc7	test_pageserver_metric_collection: allowed synthetic size to be cancelled at shutdown (#5398 ) [evidence] of these messages during shutdown. They can happen if we are unlucky enough. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/main/6323709725/index.html#suites/e557ea0d920cfebd45c1921296031273/4120269a64eed172	2023-09-27 12:00:49 +01:00
Joonas Koivunen	6cc8c31fd8	disk_usage_based_eviction: switch warmup to use full table scans (#5384 ) Fixes #3978. `test_partial_evict_tenant` can fail multiple times so even though we retry it as flaky, it will still haunt us. Originally was going to just relax the comparison, then ended up replacing warming up to use full table scans instead of `pgbench --select-only`. This seems to help by producing the expected layer accesses. There might be something off with how many layers pg16 produces compared to pg14 and pg15. Created #5392.	2023-09-27 10:00:21 +01:00
John Spray	ba92668e37	pageserver: deletion queue & generation validation for deletions (#5207 ) ## Problem Pageservers must not delete objects or advertise updates to remote_consistent_lsn without checking that they hold the latest generation for the tenant in question (see [the RFC]( https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md)) In this PR: - A new "deletion queue" subsystem is introduced, through which deletions flow - `RemoteTimelineClient` is modified to send deletions through the deletion queue: - For GC & compaction, deletions flow through the full generation verifying process - For timeline deletions, deletions take a fast path that bypasses generation verification - The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced with a mechanism that maintains a "projected" lsn (equivalent to the previous property), and a "visible" LSN (which is the one that we may share with safekeepers). - Until `control_plane_api` is set, all deletions skip generation validation - Tests are introduced for the new functionality in `test_pageserver_generations.py` Once this lands, if a pageserver is configured with the `control_plane_api` configuration added in https://github.com/neondatabase/neon/pull/5163, it becomes safe to attach a tenant to multiple pageservers concurrently. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-26 16:11:55 +01:00
Joonas Koivunen	16f0622222	fix: real_s3 flakiness with rust tests (#5386 ) Fixes #5072. See proof from https://github.com/neondatabase/neon/issues/5072#issuecomment-1735580798. Turns out multiple threads can get the same nanoseconds since epoch, so switch to using millis (for finding the prefix later on) and randomness via `thread_rng` (to protect against adversarial CI runners). Also changes the "per test looking alike" prefix to a more "general" prefix.	2023-09-26 15:59:25 +01:00
Christian Schwarz	3322b6c5b0	page cache: metrics: add page content kind dimension (#5373 ) The TaskKind dimension added in #5339 is insufficient to understand what kind of data causes the cache hits. Regarding performance considerations: I'm not too worried because we're moving from 3 to 4 one-byte sized fields; likely the space now used by the new field was padding before. Didn't check this, though, and it doesn't matter, we need the data. What I don't like about this PR is that we have an `Unknown` content type, and I also don't like that there's no compile-time way to assert that it's set to something != `Unknown` when calling the page cache. But, this is what I could come up with before tomorrow’s release, and I think it covers the hot paths.	2023-09-26 10:01:09 +03:00
Konstantin Knizhnik	c338bb7423	Update last written LSN after WAL-logging all createdb stuff (#5340 ) ## Problem See https://neondb.slack.com/archives/C033RQ5SPDH/p1694595347598249 ## Summary of changes Update last written LSN after WAL-logging all createdb stuff ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-26 09:20:56 +03:00
Em Sharnoff	a24cd69589	Bump vm-builder v0.17.10 -> v0.17.11 (#5371 ) This only includes the changes from neondatabase/autoscaling#525, which improves graceful VM shutdown.	2023-09-25 19:49:07 +01:00
Christian Schwarz	1d98d3e4c1	VirtualFile::atomic_overwrite: add basic unit tests (#5191 ) Should have added them in the initial PR #5186. Would have been nice to test the failure cases as well, but, without mocking the FS, that's too hard / platform-dependent.	2023-09-25 17:16:36 +00:00
Christian Schwarz	a0c82969a2	page cache: per-task-kind access stats (#5339 ) This PR adds a `task_kind` label to page cache access metrics. These are to validate our hypothesis that the high page cache hit rate we observe in prod is due to internal tasks, not getpage requests from compute. We believe the latter should near-always be a pageserver-page-cache _miss_ because compute has its own page cache, and hence there is no locality of reference for its accesses to pageserver page cache. Before this PR, we didn't have `RequestContext` propagation to any code below the on-demand downloader. The vast majority of changes in this PR are concerned with adding that propagation.	2023-09-25 18:30:10 +02:00
George MacKerron	d8977d5199	Altered retry timing parameters for connect to compute, to get more and quicker retries (#5358 ) ## Problem Compute start time has improved, but the timing of connection retries from the proxy is rather slow, meaning we could be making clients wait hundreds of milliseconds longer than necessary. ## Summary of changes Previously, retry time in ms was `100 * 1.5**n`, and `n` starts at 1, giving: 150, 225, 337, 506, 759, 1139, 1709, ... This PR changes that to `25 * sqrt(2)**(n - 1)` instead, giving: 25, 35, 50, 71, 100, 141, 200, ...	2023-09-25 12:27:41 +01:00
Alexander Bayandin	211f882428	Update hyper-tungstenite to 0.11 (#5361 )	2023-09-23 18:06:25 +01:00
Alexander Bayandin	3a2e6a03bc	Forbid installation of `hnsw` extension (#5346 ) ## Problem Do not allow new installation of deprecated `hnsw` extension. The same approach as in https://github.com/neondatabase/neon/pull/5345 ## Summary of changes - Remove `trusted = true` from `hnsw.control` - Remove `hnsw` related targets from Makefile	2023-09-23 16:47:57 +01:00
Vadim Kharitonov	6d33d8b092	Update rust to 1.72.1 (#5359 )	2023-09-22 16:55:55 +01:00
Alexander Bayandin	3048a5f0e2	Deploy releases to staging-preprod first (#5308 ) ## Problem Before releasing new version to production, we'd like to run a set of required checks on the incoming release. The simplest approach, which doesn't require many changes — dedicate one staging region to `preprod` installation. The proposed changes to the release flow are the following: - When a release PR is merged into the release branch — trigger deployment from the release branch to a dedicated staging-preprod region (for now, it's going to be `eu-west-1` — Ireland) Corresponding infrastructure PR: https://github.com/neondatabase/aws/pull/585 ## Summary of changes - Trigger `deploy.dev` workflow with `-f deployPreprodRegion=true` for release branch	2023-09-22 14:17:43 +01:00
dependabot[bot]	ae79978ae4	build(deps): bump cryptography from 41.0.3 to 41.0.4 (#5349 )	2023-09-22 13:15:33 +01:00
Heikki Linnakangas	810a355b9d	Add script to download a basebackup from pageserver. (#5344 ) I used this while investigating a production issue, and it seems like it could come in handy in the future, too.	2023-09-22 11:11:28 +00:00
Vadim Kharitonov	e1e1c08563	Forbid installation of `pg_embedding` extension (#5345 )	2023-09-21 22:28:56 +02:00
John Spray	97a571091e	README: update for libicu dependency (#5343 ) ## Problem In `83e7e5dbbd` dependencies were only updated for Mac users. Without libicu, postgres 16 build fails. ## Summary of changes Update dependencies on Ubuntu and fedora to include libicu.	2023-09-21 10:27:58 +02:00
Christian Schwarz	93b41cbb58	page cache metrics: remove unused read_accesses_ephemeral & read_hits_ephemeral (#5338 ) We removed the user of this in #4994 . But the metrics field was `pub`, so, didn't cause an unused-warning in #4994. This is preliminary for: #5339	2023-09-20 15:55:58 +00:00
Konstantin Knizhnik	6723a79bec	Do not handle lfc_change_limit in processes not having PGPROC structure (#5332 ) ## Problem See https://neondb.slack.com/archives/C05L7D1JAUS/p1693775881474019 ## Summary of changes Do not perform local file cache resizing in processes that have no PGPROC. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-19 21:55:36 +03:00
Joonas Koivunen	5d8597c2f0	refactor(consumption_metrics): post-split cleanup (#5327 ) Split off from #5297. Builds upon #5326. Handles original review comments which I did not move to earlier split PRs. Completes test support for verifying events by notifying of the last batch of events. Adds cleaning up of tempfiles left because of an unlucky shutdown or SIGKILL. Finally closes #5175. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-09-18 23:30:01 +03:00
Em Sharnoff	722e5260bf	vm-monitor: Don't set cgroup memory.max (#5333 ) All it does is make postgres OOM more often (which, tbf, means we're less likely to have e.g. compute_ctl get OOM-killed, but that tradeoff isn't worth it). Internally, this means removing all references to `memory.max` and the places where we calculate or store the intended value. As discussed in the sync earlier. ref: - https://neondb.slack.com/archives/C03H1K0PGKH/p1694698949252439?thread_ts=1694505575.693449&cid=C03H1K0PGKH - https://neondb.slack.com/archives/C03H1K0PGKH/p1695049198622759	2023-09-18 17:47:48 +00:00
Em Sharnoff	18f3a706da	Bump vm-builder v0.17.5 -> v0.17.10 (#5334 ) Only notable change is including neondatabase/autoscaling#523, which we hope will help with making sure that TCP connections are properly terminated before shutdown (which hopefully fixes a leak in the pageserver).	2023-09-18 17:30:34 +00:00
Alexander Bayandin	70b17981a7	Enable compatibility tests on Postgres 16 (#5314 ) ## Problem We didn't have a Postgres 16 snapshot of data to run compatibility tests on, but now we have it (since the release). ## Summary of changes - remove `@skip_on_postgres(PgVersion.V16, ...)` from compatibility tests	2023-09-18 12:58:34 +01:00
Alexander Bayandin	0904d8cf4a	Downgrade plv8 for Postgres 14/15 (#5320 ) Backport https://github.com/neondatabase/neon/pull/5318 from release into main	2023-09-18 12:55:49 +01:00
Joonas Koivunen	55371af711	test: workaround known bad mock_s3 ListObjectsV2 response (#5330 ) this should allow test test_delete_tenant_exercise_crash_safety_failpoints with debug-pg16-Check.RETRY_WITH_RESTART-mock_s3-tenant-delete-before-remove-timelines-dir-True to pass more reliably.	2023-09-18 09:24:53 +02:00
Joonas Koivunen	e62ab176b8	refactor(consumption_metrics): split (#5326 ) Split off from #5297. Builds upon #5325, should contain only the splitting. Next up: #5327.	2023-09-16 18:45:08 +03:00
Joonas Koivunen	a221ecb0da	test: test_download_remote_layers_api again (#5322 ) The test is still flaky, perhaps more after #5233, see #3831. Do one more `timeline_checkpoint` after shutting down safekeepers before shutting down pageserver. Put more effort into not compacting or creating image layers.	2023-09-16 18:27:19 +03:00
Joonas Koivunen	9cf4ae86ff	refactor(consumption_metrics): pre-split cleanup (#5325 ) Cleanups in preparation to splitting the consumption_metrics.rs in #5326. Split off from #5297.	2023-09-16 18:08:33 +03:00
Joonas Koivunen	74d99b5883	refactor(test_consumption_metrics): split for pageserver and proxy (#5324 ) With the addition of the "stateful event verification" the test_consumption_metrics.py is now too crowded. Split it up for pageserver and proxy. Split from #5297.	2023-09-16 18:05:35 +03:00
Joonas Koivunen	f902777202	fix: consumption metrics on restart (#5323 ) Write collected metrics to disk to recover previously sent metrics on restart. Recover the previously collected metrics during startup, send them over at right time - send cached synthetic size before actual is calculated - when `last_record_lsn` rolls back on startup - stay at last sent `written_size` metric - send `written_size_delta_bytes` metric as 0 Add test support: stateful verification of events in python tests. Fixes: #5206 Cc: #5175 (loggings, will be enhanced in follow-up)	2023-09-16 11:24:42 +03:00
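The rollback rule in the commit above (stay at the last sent `written_size` and send a zero delta when `last_record_lsn` moves backwards on startup) can be illustrated with a minimal sketch. The type and function names here (`SentWrittenSize`, `WrittenSizeReport`, `next_report`) are hypothetical, and computing the delta as the difference from the previously sent value is an assumption, not something the commit message states.

```rust
/// Hypothetical record of what was last sent for one timeline
/// (as recovered from disk after a restart).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SentWrittenSize {
    last_record_lsn: u64,
    written_size: u64,
}

/// Hypothetical pair of values to put into the next event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WrittenSizeReport {
    written_size: u64,
    written_size_delta_bytes: u64,
}

/// Decide what to report next, given the previously sent state and the
/// values observed now.
fn next_report(
    prev: SentWrittenSize,
    current_lsn: u64,
    current_written_size: u64,
) -> WrittenSizeReport {
    if current_lsn < prev.last_record_lsn {
        // last_record_lsn rolled back: keep reporting the last sent
        // written_size and report no growth.
        WrittenSizeReport {
            written_size: prev.written_size,
            written_size_delta_bytes: 0,
        }
    } else {
        // Assumption: the delta is the growth since the last sent value.
        WrittenSizeReport {
            written_size: current_written_size,
            written_size_delta_bytes: current_written_size.saturating_sub(prev.written_size),
        }
    }
}
```

With a previous state of LSN 100 and size 100, observing LSN 80 yields a report of size 100 with delta 0, so the reported `written_size` never appears to shrink across a restart.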
Joonas Koivunen	a7f4ee02a3	fix(consumption_metrics): exp backoff retry (#5317 ) Split off from #5297. Depends on #5315. Cc: #5175 for retry	2023-09-16 01:11:01 +03:00
Joonas Koivunen	00c4c8e2e8	feat(consumption_metrics): remove event deduplication support (#5316 ) We no longer use pageserver deduplication anywhere. Give out a warning instead. Split off from #5297. Cc: #5175 for dedup.	2023-09-16 00:06:19 +03:00
Joonas Koivunen	c5d226d9c7	refactor(consumption_metrics): prereq refactorings, tests (#5315 ) Split off from #5297. There should be no functional changes here: - refactor tenant metric "production" like previously timeline, allows unit testing, though not interesting enough yet to test - introduce type aliases for tuples - extra refactoring for `collect`, was initially thinking it was useful but will do an inline later - shorter binding names - support for future allocation reuse quests with IdempotencyKey - move code out of tokio::select to make it rustfmt-able - generification, allow later replacement of `&'static str` with enum - add tests that assert sent event contents exactly	2023-09-15 19:44:14 +03:00
Konstantin Knizhnik	66fa176cc8	Handle update of VM in XLOG_HEAP_LOCK/XLOG_HEAP2_LOCK_UPDATED WAL records (#4896 ) ## Problem The VM (visibility map) should be updated if the XLH_LOCK_ALL_FROZEN_CLEARED flag is set in XLOG_HEAP_LOCK or XLOG_HEAP2_LOCK_UPDATED WAL records. ## Summary of changes Add handling of these records in walingest.rs. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-15 17:47:29 +03:00
Heikki Linnakangas	9e6b5b686c	Add a test case for "CREATE DATABASE STRATEGY=file_copy". (#5301 ) It was utterly broken on v15 before commit `83e7e5dbbd`, which fixed the incorrect definition of XLOG_DBASE_CREATE_WAL_LOG. We never noticed because we had no tests for it.	2023-09-15 16:50:57 +03:00
Rahul Modpur	e6985bd098	Move tenant & timeline dir method to NeonPageserver and use them everywhere (#5262 ) ## Problem In many places in test code, paths are built manually from what NeonEnv.tenant_dir and NeonEnv.timeline_dir could do. ## Summary of changes 1. NeonEnv.tenant_dir and NeonEnv.timeline_dir moved under class NeonPageserver as the path they use is per-pageserver instance. 2. Used these everywhere to replace manual path building Closes #5258 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-09-15 11:17:18 +01:00
Konstantin Knizhnik	e400a38fb9	References to old and new blocks were mixed in xlog_heap_update handler (#5312 ) ## Problem See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029 https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178 ## Summary of changes Swap old/new block references --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-15 10:32:25 +03:00
Alexander Bayandin	bd36d1c44a	approved-for-ci-run.yml: fix variable name and permissions (#5307 ) ## Problem - `gh pr list` fails with `unknown argument "main"; please quote all values that have spaces` due to using a variable with the wrong name - `permissions: write-all` is too wide for the job ## Summary of changes - For variable name `HEAD` -> `BRANCH` - Grant only required permissions for each job --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-14 20:18:49 +03:00
Alexander Bayandin	0501b74f55	Update checksum for pg_hint_plan (#5309 ) ## Problem The checksum for `pg_hint_plan` doesn't match: ``` sha256sum: WARNING: 1 computed checksum did NOT match ``` Ref https://github.com/neondatabase/neon/actions/runs/6185715461/job/16793609251?pr=5307 It seems that the release was retagged yesterday: https://github.com/ossc-db/pg_hint_plan/releases/tag/REL16_1_6_0 I don't see any malicious changes from 15_1.5.1: https://github.com/ossc-db/pg_hint_plan/compare/REL15_1_5_1...REL16_1_6_0, so it should be ok to update. ## Summary of changes - Update checksum for `pg_hint_plan` 16_1.6.0	2023-09-14 18:17:50 +03:00
Em Sharnoff	3895829bda	vm-monitor: Fix cgroup throttling (#5303 ) I believe this (not actual IO problems) is the cause of the "disk speed issue" that we've had for VMs recently. See e.g.: 1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH 2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659 The vm-informant (and now, the vm-monitor, its replacement) is supposed to gradually increase the `neon-postgres` cgroup's memory.high value, because otherwise the kernel will throttle all the processes in the cgroup. This PR fixes a bug with the vm-monitor's implementation of this behavior. --- Other references, for the vm-informant's implementation: - Original issue: neondatabase/autoscaling#44 - Original PR: neondatabase/autoscaling#223	2023-09-14 13:21:50 +03:00
Joonas Koivunen	ffd146c3e5	refactor: globals in tests (#5298 ) Refactor tests to have fewer globals. This will hopefully allow writing more complex tests for our new metric collection requirements in #5297. Includes reverted work from #4761 related to test globals. Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: MMeent <matthias@neon.tech>	2023-09-13 22:05:30 +03:00
Konstantin Knizhnik	1697e7b319	Fix lfc_ensure_function which now disables LFC (#5294 ) ## Problem There was a bug in lfc_ensure_opened which actually disables LFC ## Summary of changes Return true if the LFC file is opened normally --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-13 08:56:03 +03:00
bojanserafimov	8556d94740	proxy http: reproduce issue with transactions in pool (#5293 ) xfail test reproducing issue https://github.com/neondatabase/neon/issues/4698	2023-09-12 17:13:25 -04:00
MMeent	3b6b847d76	Fixes for Pg16: (#5292 ) - pagestore_smgr.c had unnecessary WALSync() (see #5287 ) - Compute node dockerfile didn't build the neon_rmgr extension - Add PostgreSQL 16 image to docker-compose tests - Fix issue with high CPU usage in Safekeeper due to a bug in WALSender Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-09-12 22:02:03 +03:00
Alexander Bayandin	2641ff3d1a	Use CI_ACCESS_TOKEN to create release PR (#5286 ) ## Problem If @github-actions creates release PR, the CI pipeline is not triggered (but we have `release-notify.yml` workflow that we expect to run on this event). I suspect this happened because @github-actions is not a repository member. Ref https://github.com/neondatabase/neon/pull/5283#issuecomment-1715209291 ## Summary of changes - Use `CI_ACCESS_TOKEN` to create a PR - Use `gh` instead of `thomaseizinger/create-pull-request` - Restrict permissions for GITHUB_TOKEN to `contents: write` only (required for `git push`)	2023-09-12 20:01:21 +01:00
Alexander Bayandin	e1661c3c3c	approved-for-ci-run.yml: fix ci-run/pr-* branch deletion (#5278 ) ## Problem `ci-run/pr-*` branches (and attached PRs) should be deleted automatically when their parent PRs get closed. But they are not. ## Summary of changes - Fix if-condition	2023-09-12 19:29:26 +03:00
Alexander Bayandin	9c3f38e10f	Document how to run CI for external contributors (#5279 ) ## Problem We don't have this instruction written anywhere but in internal Slack ## Summary of changes - Add `How to run a CI pipeline on Pull Requests from external contributors` section to `CONTRIBUTING.md` --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-09-12 16:53:13 +01:00
Christian Schwarz	ab1f37e908	revert recent VirtualFile asyncification changes (#5291 ) Motivation ========== We observed two "indigestion" events on staging, each shortly after restarting `pageserver-0.eu-west-1.aws.neon.build`. It has ~8k tenants. The indigestion manifests as `Timeline::get` calls failing with `exceeded evict iter limit`. The error is from `page_cache.rs`; it was unable to find a free page and hence failed with the error. The indigestion events started occurring after we started deploying builds that contained the following commits: ``` [~/src/neon]: git log --oneline c0ed362790caa368aa65ba57d352a2f1562fd6bf..15eaf78083ecff62b7669091da1a1c8b4f60ebf8 `15eaf7808` Disallow block_in_place and Handle::block_on (#5101) `a18d6d9ae` Make File opening in VirtualFile async-compatible (#5280) `76cc87398` Use tokio locks in VirtualFile and turn with_file into macro (#5247) ``` The second and third commit are interesting. They add .await points to the VirtualFile code. Background ========== On the read path, which is the dominant user of page cache & VirtualFile during pageserver restart, `Timeline::get`, `page_cache`, and VirtualFile interact as follows: 1. Timeline::get tries to read from a layer 2. This read goes through the page cache. 3. If we have a page miss (which is known to be common after restart), page_cache uses `find_victim` to find an empty slot, and once it has found a slot, it gives exclusive ownership of it to the caller through a `PageWriteGuard`. 4. The caller is supposed to fill the write guard with data from the underlying backing store, i.e., the layer `VirtualFile`. 5. So, we call into `VirtualFile::read_at` to fill the write guard. The `find_victim` method finds an empty slot using a basic implementation of the clock page replacement algorithm. Slots that are currently in use (`PageReadGuard` / `PageWriteGuard`) cannot become victims. If there have been too many iterations, `find_victim` gives up with error `exceeded evict iter limit`. 
Root Cause For Indigestion ========================== The second and third commit quoted in the "Motivation" section introduced `.await` points in the VirtualFile code. These enable tokio to preempt us and schedule another future __while__ we hold the `PageWriteGuard` and are calling `VirtualFile::read_at`. This was not possible before these commits, because there simply were no await points that weren't Poll::Ready immediately. With the offending commits, there is now actual usage of `tokio::sync::RwLock` to protect the VirtualFile file descriptor cache. And we __know__ from other experiments that, during the post-restart "rush", the VirtualFile fd cache __is__ too small, i.e., all slots are taken by _ongoing_ VirtualFile operations and cannot be victims. So, assume that VirtualFile's `find_victim_slot`'s `RwLock::write().await` calls _will_ yield control to the executor. The above can lead to the pathological situation if we have N runnable tokio tasks, each wanting to do `Timeline::get`, but only M slots, N >> M. Suppose M of the N tasks win a PageWriteGuard and get preempted at some .await point inside `VirtualFile::read_at`. Now suppose tokio schedules the remaining N-M tasks for fairness, then schedules the first M tasks again. Each of the N-M tasks will run `find_victim()` until it hits the `exceeded evict iter limit`. Why? Because the first M tasks took all the slots and are still holding them tight through their `PageWriteGuard`. The result is massive wastage of CPU time in `find_victim()`. The effort to find a page is futile, but each of the N-M tasks still attempts it. This delays the time when tokio gets around to schedule the first M tasks again. Eventually, tokio will schedule them, they will make progress, fill the `PageWriteGuard`, release it. But in the meantime, the N-M tasks have already bailed with error `exceeded evict iter limit`. 
Eventually, higher level mechanisms will retry for the N-M tasks, and this time, there won't be as many concurrent tasks wanting to do `Timeline::get`. So, it will shake out. But, it's a massive indigestion until then. This PR ======= This PR reverts the offending commits until we find a proper fix. ``` Revert "Use tokio locks in VirtualFile and turn with_file into macro (#5247)" This reverts commit `76cc87398c`. Revert "Make File opening in VirtualFile async-compatible (#5280)" This reverts commit `a18d6d9ae3`. ```	2023-09-12 17:38:31 +02:00
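The starvation described in the commit message above can be reproduced in a small single-threaded model. This is only a sketch under stated assumptions: `PageCacheModel`, `Slot`, `simulate`, and the iteration limit are hypothetical stand-ins, not the pageserver's actual `page_cache.rs` types. A "pinned" slot stands in for one held through a `PageWriteGuard` by a task that was preempted inside `VirtualFile::read_at`.

```rust
/// Hypothetical, simplified page cache slot (not the real pageserver type).
#[derive(Clone, Copy)]
struct Slot {
    /// Stands in for an outstanding PageReadGuard / PageWriteGuard.
    pinned: bool,
    /// Clock "second chance" counter.
    usage_count: u8,
}

struct PageCacheModel {
    slots: Vec<Slot>,
    hand: usize,
    max_iters: usize,
}

impl PageCacheModel {
    fn new(num_slots: usize, max_iters: usize) -> Self {
        assert!(num_slots > 0, "the model needs at least one slot");
        Self {
            slots: vec![Slot { pinned: false, usage_count: 0 }; num_slots],
            hand: 0,
            max_iters,
        }
    }

    /// Basic clock replacement: skip pinned slots, give recently used
    /// slots a second chance, give up after `max_iters` steps.
    fn find_victim(&mut self) -> Result<usize, &'static str> {
        for _ in 0..self.max_iters {
            let idx = self.hand;
            self.hand = (self.hand + 1) % self.slots.len();
            let slot = &mut self.slots[idx];
            if slot.pinned {
                continue; // in use, cannot become a victim
            }
            if slot.usage_count > 0 {
                slot.usage_count -= 1;
                continue;
            }
            slot.pinned = true; // caller now holds the "write guard"
            return Ok(idx);
        }
        Err("exceeded evict iter limit")
    }

    /// Drop the "write guard" after the page was filled.
    fn release(&mut self, idx: usize) {
        let slot = &mut self.slots[idx];
        slot.pinned = false;
        slot.usage_count = 1;
    }
}

/// N tasks each call find_victim once; winners are "preempted" while
/// holding their slot, so they only release after all N had their turn.
/// Returns (tasks that got a slot, tasks that failed).
fn simulate(n_tasks: usize, m_slots: usize, max_iters: usize) -> (usize, usize) {
    let mut cache = PageCacheModel::new(m_slots, max_iters);
    let mut holders = Vec::new();
    let mut failed = 0;
    for _ in 0..n_tasks {
        match cache.find_victim() {
            Ok(idx) => holders.push(idx),
            Err(_) => failed += 1,
        }
    }
    let succeeded = holders.len();
    for idx in holders {
        cache.release(idx);
    }
    (succeeded, failed)
}
```

In this model, `simulate(10, 3, 30)` lets the first 3 tasks win a slot while the other 7 each burn all 30 iterations before failing, which mirrors the wasted CPU time in `find_victim()` that the message attributes to the N-M tasks.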
MMeent	83e7e5dbbd	Feat/postgres 16 (#4761 ) This adds PostgreSQL 16 as a vendored postgresql version, and adapts the code to support this version. The important changes to PostgreSQL 16 compared to the PostgreSQL 15 changeset include the addition of a neon_rmgr instead of altering Postgres's original WAL format. Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-12 15:11:32 +02:00
Christian Schwarz	5be8d38a63	fix deadlock around TENANTS (#5285 ) The sequence that can lead to a deadlock: 1. DELETE request gets all the way to `tenant.shutdown(progress, false).await.is_err() ` , while holding TENANTS.read() 2. POST request for tenant creation comes in, calls `tenant_map_insert`, it does `let mut guard = TENANTS.write().await;` 3. Something that `tenant.shutdown()` needs to wait for needs a `TENANTS.read().await`. The only case identified in exhaustive manual scanning of the code base is this one: Imitate size access does `get_tenant().await`, which does `TENANTS.read().await` under the hood. In the above case (1) waits for (3), (3)'s read-lock request is queued behind (2)'s write-lock, and (2) waits for (1). Deadlock. I made a reproducer/proof-that-above-hypothesis-holds in https://github.com/neondatabase/neon/pull/5281 , but, it's not ready for merge yet and we want the fix _now_. fixes https://github.com/neondatabase/neon/issues/5284	2023-09-12 11:23:46 +02:00
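The three-step sequence in the commit above relies on read requests queueing behind an earlier write request, which is how a fair (write-preferring) read-write lock such as tokio's `RwLock` behaves. The following toy model is a sketch of that queueing rule only; `FairRwLockModel` and `tenants_deadlock_scenario` are hypothetical names, and nothing here actually blocks or uses the real `TENANTS` lock.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Read,
    Write,
}

/// Toy model of a fair read-write lock: a request is granted immediately
/// only if nobody is queued ahead of it and the lock state allows it.
#[derive(Default)]
struct FairRwLockModel {
    active_readers: usize,
    writer_active: bool,
    waiting: VecDeque<(&'static str, Mode)>,
}

impl FairRwLockModel {
    /// Returns true if the lock is granted right away, false if queued.
    fn request(&mut self, who: &'static str, mode: Mode) -> bool {
        let grantable = self.waiting.is_empty()
            && !self.writer_active
            && (mode == Mode::Read || self.active_readers == 0);
        if grantable {
            match mode {
                Mode::Read => self.active_readers += 1,
                Mode::Write => self.writer_active = true,
            }
        } else {
            self.waiting.push_back((who, mode));
        }
        grantable
    }

    fn waiting_names(&self) -> Vec<&'static str> {
        self.waiting.iter().map(|(who, _)| *who).collect()
    }
}

/// Replays steps (1)-(3) from the commit message against the model.
/// Returns (create granted?, size access granted?, wait queue).
fn tenants_deadlock_scenario() -> (bool, bool, Vec<&'static str>) {
    let mut tenants = FairRwLockModel::default();
    // (1) DELETE holds a read lock while awaiting tenant.shutdown().
    assert!(tenants.request("delete", Mode::Read));
    // (2) Tenant creation asks for the write lock and has to queue.
    let create_granted = tenants.request("create", Mode::Write);
    // (3) Something shutdown waits for asks for a read lock; it queues
    //     behind the writer even though only a reader holds the lock.
    let size_granted = tenants.request("imitate size access", Mode::Read);
    (create_granted, size_granted, tenants.waiting_names())
}
```

Neither queued request can proceed until "delete" releases its read lock, and "delete" in turn waits for "imitate size access", which closes the cycle the message describes.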
John Spray	36c261851f	s3_scrubber: remove atty dependency (#5171 ) ## Problem - https://github.com/neondatabase/neon/security/dependabot/28 ## Summary of changes Remove atty, and remove the `with_ansi` arg to scrubber's stdout logger.	2023-09-12 10:11:41 +01:00
Arpad Müller	15eaf78083	Disallow block_in_place and Handle::block_on (#5101 ) ## Problem `block_in_place` is a quite expensive operation, and if it is used, we should explicitly have to opt into it by allowing the `clippy::disallowed_methods` lint. For more, see https://github.com/neondatabase/neon/pull/5023#discussion_r1304194495. Similar arguments exist for `Handle::block_on`, but we don't do this yet as there are still usages. ## Summary of changes Adds a clippy.toml file, configuring the [`disallowed_methods` lint](https://rust-lang.github.io/rust-clippy/master/#/disallowed_method).	2023-09-12 00:11:16 +00:00
Arpad Müller	a18d6d9ae3	Make File opening in VirtualFile async-compatible (#5280 ) ## Problem Previously, we were using `observe_closure_duration` in `VirtualFile` file opening code, but this doesn't support async open operations, which we want to use as part of #4743. ## Summary of changes * Move the duration measurement from the `with_file` macro into an `observe_duration` macro. * Some smaller drive-by fixes to replace the old strings with the new variant names introduced by #5273 Part of #4743, follow-up of #5247.	2023-09-11 18:41:08 +02:00
Arpad Müller	76cc87398c	Use tokio locks in VirtualFile and turn with_file into macro (#5247 ) ## Problem For #4743, we want to convert everything up to the actual I/O operations of `VirtualFile` to `async fn`. ## Summary of changes This PR is the last change in a series of changes to `VirtualFile`: #5189, #5190, #5195, #5203, and #5224. It does the last preparations before the I/O operations are actually made async. We are doing the following things: * First, we change the locks for the file descriptor cache to tokio's locks that support Send. This is important when one wants to hold locks across await points (which we want to do), otherwise the Future won't be Send. Also, one shouldn't generally block in async code as executors don't like that. * Due to the lock change, we now take an approach for the `VirtualFile` destructors similar to the one proposed by #5122 for the page cache, to use `try_write`. Similarly to the situation in the linked PR, one can make an argument that if we are in the destructor and the slot has not been reused yet, we are the only user accessing the slot due to owning the lock mutably. It is still possible that we are not obtaining the lock, but the only cause for that is the clock algorithm touching the slot, which should be quite an unlikely occurrence. For the instance of `try_write` failing, we spawn an async task to destroy the lock. As just argued however, most of the time the code path where we spawn the task should not be visited. * Lastly, we split `with_file` into a macro part, and a function part that contains most of the logic. The function part returns a lock object, that the macro uses. The macro exists to perform the operation in a more compact fashion, saving code from putting the lock into a variable and then doing the operation while measuring the time to run it. We take the locks approach because Rust has no support for async closures. 
One can make normal closures return a future, but that approach gets into lifetime issues the moment you want to pass data with a lifetime to these closures via parameters (captures work). For details, see [this](https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/) and [this](https://users.rust-lang.org/t/function-that-takes-an-async-closure/61663) link. In #5224, we ran into a similar problem with the `test_files` function, and we ended up passing the path and the `OpenOptions` by-value instead of by-ref, at the expense of a few extra copies. This can be done as the data is cheaply copyable, and we are in test code. But here, we are not, and while `File::try_clone` exists, it [issues system calls internally](`1e746d7741/library/std/src/os/fd/owned.rs (L94-L111)`). Also, it would allocate an entirely new file descriptor, something that the fd cache was built to prevent. * We change the `STORAGE_IO_TIME` metrics to support async. Part of #4743.	2023-09-11 17:35:05 +02:00
bojanserafimov	c0ed362790	Measure pageserver wal recovery time and fix flush() method (#5240 )	2023-09-11 09:46:06 -04:00
duguorong009	d7fa2dba2d	fix(pageserver): update the `STORAGE_IO_TIME` metrics to avoid expensive operations (#5273 ) Introduce the `StorageIoOperation` enum, `StorageIoTime` struct, and `STORAGE_IO_TIME_METRIC` static which provides lockless access to histograms consumed by `VirtualFile`. Closes #5131 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-11 14:58:15 +03:00
Joonas Koivunen	a55a78a453	Misc test flakiness fixes (#5233 ) Assorted flakiness fixes from #5198, might not be flaky on `main`. Migrate some tests using neon_simple_env to just neon_env_builder and using initial_tenant to make the flakiness easier to understand. (Did not understand the flakiness of `test_timeline_create_break_after_uninit_mark`.) `test_download_remote_layers_api` is flaky because we have no atomic "wait for WAL, checkpoint, wait for upload and do not receive any more WAL". `test_tenant_size` fixes are just boilerplate which should have always existed; we should wait for the tenant to be active. Similarly for `test_timeline_delete`. `test_timeline_size_post_checkpoint` fails often for me with reading zero from metrics. Give it a few attempts.	2023-09-11 11:42:49 +03:00
Rahul Modpur	999fe668e7	Ack tenant detach before local files are deleted (#5211 ) ## Problem Detaching a tenant can involve many thousands of local filesystem metadata writes, but the control plane would benefit from us not blocking detach/delete responses on these. ## Summary of changes After rename of local tenant directory ack tenant detach and delete tenant directory in background #5183 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-09-10 22:59:51 +03:00
Alexander Bayandin	d33e1b1b24	approved-for-ci-run.yml: use token to checkout the repo (#5266 ) ## Problem Another thing I overlooked regarding `approved-for-ci-run`: - When we create a PR, the action is associated with @vipvap and this triggers the pipeline; this is good. - When we update the PR by force-pushing to the branch, the action is associated with @github-actions, which doesn't trigger a pipeline; this is bad. Initially spotted in #5239 / #5211 ([link](https://github.com/neondatabase/neon/actions/runs/6122249456/job/16633919558?pr=5239)): `check-permissions` should not fail. ## Summary of changes - Use `CI_ACCESS_TOKEN` to check out the repo (I expect this token will be reused in the following `git push`)	2023-09-10 20:12:38 +01:00
Alexander Bayandin	15fd188fd6	Fix GitHub Autocomment for `ci-run/pr`s (#5268 ) ## Problem When PR `ci-run/pr-*` is created, the GitHub Autocomment with test results is supposed to be posted to the original PR; currently, this doesn't work. I created this PR from a personal fork to debug and fix the issue. ## Summary of changes - `scripts/comment-test-report.js`: use `pull_request.head` instead of `pull_request.base` 🤦	2023-09-10 20:06:10 +01:00
Alexander Bayandin	34e39645c4	GitHub Workflows: add actionlint (#5265 ) ## Problem Add a CI pipeline that checks GitHub Workflows with https://github.com/rhysd/actionlint (it uses `shellcheck` for shell scripts in steps) To run it locally: `SHELLCHECK_OPTS=--exclude=SC2046,SC2086 actionlint` ## Summary of changes - Add `.github/workflows/actionlint.yml` - Fix actionlint warnings	2023-09-10 20:05:07 +01:00
Em Sharnoff	1cac923af8	vm-monitor: Rate-limit upscale requests (#5263 ) Some VMs, when already scaled up as much as possible, end up spamming the autoscaler-agent with upscale requests that will never be fulfilled. If postgres is using memory greater than the cgroup's memory.high, it can emit new memory.high events 1000 times per second, which... just means unnecessary load on the rest of the system. This changes the vm-monitor so that we skip sending upscale requests if we already sent one within the last second, to avoid spamming the autoscaler-agent. This matches previous behavior that the vm-informant had.	2023-09-10 20:33:53 +03:00
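The "skip if we already sent one within the last second" rule from the commit above can be sketched as a small rate limiter. This is an illustration only: `UpscaleRateLimiter` and `should_send` are hypothetical names, not the vm-monitor's actual code, and the current time is passed in explicitly so the behavior is easy to check.

```rust
use std::time::{Duration, Instant};

/// Hypothetical limiter: allows at most one request per `min_interval`.
struct UpscaleRateLimiter {
    min_interval: Duration,
    last_sent: Option<Instant>,
}

impl UpscaleRateLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_sent: None }
    }

    /// Returns true if an upscale request should be sent at `now`, and
    /// records it as sent. Returns false if one was sent too recently.
    fn should_send(&mut self, now: Instant) -> bool {
        match self.last_sent {
            // A request went out less than `min_interval` ago: skip.
            Some(prev) if now.saturating_duration_since(prev) < self.min_interval => false,
            // Never sent, or the interval has elapsed: send and remember when.
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}
```

With a one-second interval, a burst of memory.high events arriving every millisecond results in a single request per second, because every call after the first returns false until the interval has passed.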
Em Sharnoff	853552dcb4	vm-monitor: Don't include Args in top-level span (#5264 ) It makes the logs too verbose. ref https://neondb.slack.com/archives/C03F5SM1N02/p1694281232874719?thread_ts=1694272777.207109&cid=C03F5SM1N02	2023-09-10 20:15:53 +03:00
Alexander Bayandin	1ea93af56c	Create GitHub release from release tag (#5246 ) ## Problem This PR creates a GitHub release from a release tag with an autogenerated changelog: https://github.com/neondatabase/neon/releases ## Summary of changes - Call GitHub API to create a release	2023-09-09 22:02:28 +01:00
Konstantin Knizhnik	f64b338ce3	Ignore DISK_FULL error when performing availability check for client (#5010 ) See #5001. "No space" is what's expected if we're at the size limit. Of course, if the SK incorrectly returned "no space", the availability check wouldn't fire. But users would notice such a bug quite soon anyway. So ignoring "no space" is the right trade-off. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-09 21:51:04 +03:00
Konstantin Knizhnik	ba06ea26bb	Fix issues with re-enabling LFC (#5209 ) refer #5208 ## Problem See https://neondb.slack.com/archives/C03H1K0PGKH/p1693938336062439?thread_ts=1693928260.704799&cid=C03H1K0PGKH #5208 disables LFC forever in case of an error. This is not good, because the problem causing the error (for example ENOSPC) can be resolved, and it would be nice to re-enable LFC after it is fixed. Also, #5208 disables LFC locally in one backend, but other backends may still see corrupted data. This should not cause problems right now with the "permission denied" error, because there should be no backend that is able to open LFC normally. But in case of an out-of-disk-space error, another backend can read corrupted data. ## Summary of changes 1. Clean up the hash table after an error to prevent access to stale or corrupted data 2. Perform the disk write under an exclusive lock (hoping it will not affect performance, because a write usually just copies data from user to system space) 3. Use generations to prevent access to stale data in lfc_read --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-09 17:51:16 +03:00
Joonas Koivunen	6f28da1737	fix: LocalFs root in test_compatibility is PosixPath('...') (#5261 ) I forgot a `str(...)` conversion in #5243. This led to log lines such as: ``` Using fs root 'PosixPath('/tmp/test_output/test_backward_compatibility[debug-pg14]/compatibility_snapshot/repo/local_fs_remote_storage/pageserver')' as a remote storage ``` This surprisingly works, creating a hierarchy of directories under the current working directory (`repo_dir` for tests): - `PosixPath('` - `tmp` .. up until .. `local_fs_remote_storage` - `pageserver')` It should not work, but right now the test_compatibility.py tests find local metadata and layers, which end up being used. After #5172, when remote storage is the source of truth, it will no longer work.	2023-09-08 20:27:00 +03:00
Heikki Linnakangas	60050212e1	Update rdkit to version 2023_03_03. (#5260 ) It includes PostgreSQL 16 support.	2023-09-08 19:40:29 +03:00
Joonas Koivunen	66633ef2a9	rust-toolchain: use 1.72.0, same as CI (#5256 ) Switches everyone without an `rustup override` to 1.72.0. Code changes required already done in #5255. Depends on https://github.com/neondatabase/build/pull/65.	2023-09-08 19:36:02 +03:00
Alexander Bayandin	028fbae161	Miscellaneous fixes for tests-related things (#5259 ) ## Problem A bunch of fixes for different test-related things ## Summary of changes - Fix test_runner/pg_clients (`subprocess_capture` return value has changed) - Do not run create-test-report if check-permissions failed for not cancelled jobs - Fix Code Coverage comment layout after flaky tests. Add another healing "\n" - test_compatibility: add an instruction for local run Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-08 16:28:09 +01:00
John Spray	7b6337db58	tests: enable multiple pageservers in `neon_local` and `neon_fixture` (#5231 ) ## Problem Currently our testing environment only supports running a single pageserver at a time. This is insufficient for testing failover and migrations. - Dependency of writing tests for #5207 ## Summary of changes - `neon_local` and `neon_fixture` now handle multiple pageservers - This is a breaking change to the `.neon/config` format: any local environments will need recreating - Existing tests continue to work unchanged: - The default number of pageservers is 1 - `NeonEnv.pageserver` is now a helper property that retrieves the first pageserver if there is only one, else throws. - Pageserver data directories are now at `.neon/pageserver_{n}` where n is 1,2,3... - Compatibility tests get some special casing to migrate neon_local configs: these are not meant to be backward/forward compatible, but they were treated that way by the test.	2023-09-08 16:19:57 +01:00
Konstantin Knizhnik	499d0707d2	Perform throttling for concurrent build index which is done outside transaction (#5048 ) See https://neondb.slack.com/archives/C03H1K0PGKH/p1692550646191429 ## Problem A concurrent index build writes WAL outside a transaction. `backpressure_throttling_impl` doesn't perform throttling for read-only transactions (those with no assigned XID). This causes a huge write lag, which can cause a large delay in accessing the table. ## Summary of changes Look at `PROC_IN_SAFE_IC` in the process state, which is set during a concurrent index build. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-08 18:05:08 +03:00
Joonas Koivunen	720d59737a	rust-1.72.0 changes (#5255 ) Prepare to upgrade rust version to latest stable. - `rustfmt` has learned to format `let irrefutable = $expr else { ... };` blocks - There's a new warning about virtual (workspace) crate resolver, picked the latest resolver as I suspect everyone would expect it to be the latest; should not matter anyways - Some new clippies, which seem alright	2023-09-08 16:28:41 +03:00
Joonas Koivunen	ff87fc569d	test: Remote storage refactorings (#5243 ) Remote storage cleanup split from #5198: - pageserver, extensions, and safekeepers now have their separate remote storage - RemoteStorageKind has the configuration code - S3Storage has the cleanup code - with MOCK_S3, pageserver, extensions, safekeepers use different buckets - with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used as path, where $user is `pageserver`, `safekeeper` - no more `NeonEnvBuilder.enable_xxx_remote_storage` but one `enable_{pageserver,extensions,safekeeper}_remote_storage` Should not have any real changes. These will allow us to default to `LOCAL_FS` for pageserver on the next PR, remove `RemoteStorageKind.NOOP`, work towards #5172. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-09-08 13:54:23 +03:00
Heikki Linnakangas	cdc65c1857	Update pg_cron to version 1.6.0 (#5252 ) This includes PostgreSQL 16 support. There are no catalog changes, so this is a drop-in replacement, no need to run "ALTER EXTENSION UPDATE".	2023-09-08 12:42:46 +03:00
Heikki Linnakangas	dac995e7e9	Update plpgsql_check extension to version v2.4.0 (#5249 ) This brings v16 support.	2023-09-08 10:46:02 +03:00
Alexander Bayandin	b80740bf9f	test_startup: increase timeout (#5238 ) ## Problem `test_runner/performance/test_startup.py::test_startup` started to fail more frequently because of the timeout. Let's increase the timeout to see the failures on the perf dashboard. ## Summary of changes - Increase timeout for `test_startup` from 600 to 900 seconds	2023-09-08 01:57:38 +01:00
Heikki Linnakangas	57c1ea49b3	Update hypopg extension to version 1.4.0 (#5245 ) The v1.4.0 includes changes to make it compile with PostgreSQL 16. The commit log doesn't call it out explicitly, but I tested it manually. v1.4.0 includes some new functions, but I tested manually that the v1.3.1 functionality works with the v1.4.0 version of the library. That means that this doesn't break existing installations. Users can do "ALTER EXTENSION hypopg UPDATE" if they want to use the new v1.4.0 functionality, but they don't have to.	2023-09-08 03:30:11 +03:00
Heikki Linnakangas	6c31a2d342	Upgrade prefix extension to version 1.2.10 (#5244 ) This version includes trivial changes to make it compile with PostgreSQL 16. No functional changes.	2023-09-08 02:10:01 +03:00
Heikki Linnakangas	252b953f18	Upgrade postgresql-hll to version 2.18. (#5241 ) This includes PostgreSQL 16 support. No other changes, really. The extension version in the upstream was changed from 2.17 to 2.18, however, there is no difference between the catalog objects. So if you had installed 2.17 previously, it will continue to work. You can run "ALTER EXTENSION hll UPDATE", but all it will do is update the version number in the pg_extension table.	2023-09-08 02:07:17 +03:00
Heikki Linnakangas	b414360afb	Upgrade ip4r to version 2.4.2 (#5242 ) Includes PostgreSQL v16 support. No functional changes.	2023-09-08 02:06:53 +03:00
Arpad Müller	d206655a63	Make VirtualFile::{open, open_with_options, create,sync_all,with_file} async fn (#5224 ) ## Problem Once we use async file system APIs for `VirtualFile`, these functions will also need to be async fn. ## Summary of changes Makes the functions `open, open_with_options, create,sync_all,with_file` of `VirtualFile` async fn, including all functions that call it. Like in the prior PRs, the actual I/O operations are not using async APIs yet, as per request in the #4743 epic. We switch towards not using `VirtualFile` in the par_fsync module, hopefully this is only temporary until we can actually do fully async I/O in `VirtualFile`. This might cause us to exhaust fd limits in the tests, but it should only be an issue for the local developer as we have high ulimits in prod. This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.	2023-09-08 00:50:50 +02:00
Heikki Linnakangas	e5adc4efb9	Upgrade h3-pg to version 4.1.3. (#5237 ) This includes v16 support.	2023-09-07 21:39:12 +03:00
Heikki Linnakangas	c202f0ba10	Update PostGIS to version 3.3.3 (#5236 ) It's a good idea to keep up-to-date in general. One noteworthy change is that PostGIS 3.3.3 adds support for PostgreSQL v16. We'll need that. PostGIS 3.4.0 has already been released, and we should consider upgrading to that. However, it's a major upgrade and requires running "SELECT postgis_extensions_upgrade();" in each database, to upgrade the catalogs. I don't want to deal with that right now.	2023-09-07 21:38:55 +03:00
Alexander Bayandin	d15563f93b	Misc workflows: fix quotes in bash (#5235 )	2023-09-07 19:39:42 +03:00
Rahul Modpur	485a2cfdd3	Fix pg_config version parsing (#5200 ) ## Problem Fix pg_config version parsing ## Summary of changes Use regex to capture major version of postgres #5146	2023-09-07 15:34:22 +02:00
Alexander Bayandin	1fee69371b	Update `plv8` to 3.1.8 (#5230 ) ## Problem We likely need this to support Postgres 16 It's also been asked by a user https://github.com/neondatabase/neon/discussions/5042 The latest version is 3.2.0, but it requires some changes in the build script (which I haven't checked, but it didn't work right away) ## Summary of changes ``` 3.1.8 2023-08-01 - force v8 to compile in release mode 3.1.7 2023-06-26 - fix byteoffset issue with arraybuffers - support postgres 16 beta 3.1.6 2023-04-08 - fix crash issue on fetch apply - fix interrupt issue ``` From https://github.com/plv8/plv8/blob/v3.1.8/Changes	2023-09-07 14:21:38 +01:00
Alexander Bayandin	f8a91e792c	Even better handling of `approved-for-ci-run` label (#5227 ) ## Problem We've got `approved-for-ci-run` to work 🎉 But it's still a bit rough, this PR should improve the UX for external contributors. ## Summary of changes - `build_and_test.yml`: add `check-permissions` job, which fails if PR is created from a fork. Make all jobs in the workflow dependent on `check-permissions` to fail fast - `approved-for-ci-run.yml`: add `cleanup` job to close `ci-run/pr-` PRs and delete linked branches when the parent PR is closed - `approved-for-ci-run.yml`: fix the layout for the `ci-run/pr-` PR description - GitHub Autocomment: add a comment with tests result to the original PR (instead of a PR from `ci-run/pr-*` )	2023-09-07 14:21:01 +01:00
duguorong009	706977fb77	fix(pageserver): add the walreceiver state to tenant timeline GET api endpoint (#5196 ) Add a `walreceiver_state` field to `TimelineInfo` (response of `GET /v1/tenant/:tenant_id/timeline/:timeline_id`) and while doing that, refactor out a common `Timeline::walreceiver_state(..)`. No OpenAPI changes, because this is an internal debugging addition. Fixes #3115. Co-authored-by: Joonas Koivunen <joonas.koivunen@gmail.com>	2023-09-07 14:17:18 +03:00
Arpad Müller	7ba0f5c08d	Improve comment in page cache (#5220 ) It was easy to interpret the comment in the page cache initialization code as justifying why we leak here at all, not just why this specific type of leaking is done (which is what the comment was actually meant to describe). See https://github.com/neondatabase/neon/pull/5125#discussion_r1308445993 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-06 21:44:54 +02:00
Arpad Müller	6243b44dea	Remove Virtual from FileBlockReaderVirtual variant name (#5225 ) With #5181, the generics for `FileBlockReader` have been removed, so having a `Virtual` postfix makes less sense now.	2023-09-06 20:54:57 +02:00
Joonas Koivunen	3a966852aa	doc: tests expect lsof (#5226 ) On a clean system `lsof` needs to be installed. Add it to the list just to keep things nice and copy-pasteable (except for poetry).	2023-09-06 21:40:00 +03:00
duguorong009	31e1568dee	refactor(pageserver): refactor pageserver router state creation (#5165 ) Fixes #3894 by: - Refactor the pageserver router creation flow - Create the router state in `pageserver/src/bin/pageserver.rs`	2023-09-06 21:31:49 +03:00
Chengpeng Yan	9a9187b81a	Complete the missing metrics for files_created/bytes_written (#5120 )	2023-09-06 14:00:15 -04:00
Chengpeng Yan	dfe2e5159a	remove the duplicate entries in postgresql.conf (#5090 )	2023-09-06 13:57:03 -04:00
Alexander Bayandin	e4b1d6b30a	Misc post-merge fixes (#5219 ) ## Problem - `SCALE: unbound variable` from https://github.com/neondatabase/neon/pull/5079 - The layout of the GitHub auto-comment is broken if the code coverage section follows flaky test section from https://github.com/neondatabase/neon/pull/4999 ## Summary of changes - `benchmarking.yml`: Rename `SCALE` to `TEST_OLAP_SCALE` - `comment-test-report.js`: Add an extra new-line before Code coverage section	2023-09-06 20:11:44 +03:00
Alexander Bayandin	76a96b0745	Notify Slack channel about upcoming releases (#5197 ) ## Problem When the next release is coming, we want to let everyone know about it by posting a message to the Slack channel with a list of commits. ## Summary of changes - `.github/workflows/release-notify.yml` is added - the workflow sends a message to `vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID` (or [#test-release-notifications](https://neondb.slack.com/archives/C05QQ9J1BRC) if not configured) - On each PR update, the workflow updates the list of commits in the message (it doesn't send additional messages)	2023-09-06 17:52:21 +01:00
Arpad Müller	5e00c44169	Add WriteBlobWriter buffering and make VirtualFile::{write,write_all} async (#5203 ) ## Problem We want to convert the `VirtualFile` APIs to async fn so that we can adopt one of the async I/O solutions. ## Summary of changes This PR is a follow-up of #5189, #5190, and #5195, and does the following: * Move the used `Write` trait functions of `VirtualFile` into inherent functions * Add optional buffering to `WriteBlobWriter`. The buffer is discarded on drop, similarly to how tokio's `BufWriter` does it: drop is neither async nor does it support errors. * Remove the generics by `Write` impl of `WriteBlobWriter`, always using `VirtualFile` * Rename `WriteBlobWriter` to `BlobWriter` * Make various functions in the write path async, like `VirtualFile::{write,write_all}`. Part of #4743.	2023-09-06 18:17:12 +02:00
Alexander Bayandin	d5f1858f78	approved-for-ci-run.yml: use different tokens (#5218 ) ## Problem `CI_ACCESS_TOKEN` has quite limited access (which is good), but this doesn't allow it to remove labels from PRs (which is bad) ## Summary of changes - Use `GITHUB_TOKEN` to remove labels - Use `CI_ACCESS_TOKEN` to create PRs	2023-09-06 18:50:59 +03:00
John Spray	61d661a6c3	pageserver: generation number fetch on startup and use in /attach (#5163 ) ## Problem - #5050 Closes: https://github.com/neondatabase/neon/issues/5136 ## Summary of changes - A new configuration property `control_plane_api` controls other functionality in this PR: if it is unset (default) then everything still works as it does today. - If `control_plane_api` is set, then on startup we call out to control plane `/re-attach` endpoint to discover our attachments and their generations. If an attachment is missing from the response we implicitly detach the tenant. - Calls to pageserver `/attach` API may include a `generation` parameter. If `control_plane_api` is set, then this parameter is mandatory. - RemoteTimelineClient's loading of index_part.json is generation-aware, and will try to load the index_part with the most recent generation <= its own generation. - The `neon_local` testing environment now includes a new binary `attachment_service` which implements the endpoints that the pageserver requires to operate. This is on by default if running `cargo neon` by hand. In `test_runner/` tests, it is off by default: existing tests continue to run in the legacy generation-less mode. Caveats: - The re-attachment during startup assumes that we are only re-attaching tenants that have previously been attached, and not totally new tenants -- this relies on the control plane's attachment logic to keep retrying so that we should eventually see the attach API call. That's important because the `/re-attach` API doesn't tell us which timelines we should attach -- we still use local disk state for that. Ref: https://github.com/neondatabase/neon/issues/5173 - Testing: generations are only enabled for one integration test right now (test_pageserver_restart), as a smoke test that all the machinery basically works. Writing fuller tests that stress tenant migration will come later, and involve extending our test fixtures to deal with multiple pageservers.
- I'm not in love with "attachment_service" as a name for the neon_local component, but it's not very important because we can easily rename these test bits whenever we want. - Limited observability when in re-attach on startup: when I add generation validation for deletions in a later PR, I want to wrap up the control plane API calls in some small client class that will expose metrics for things like errors calling the control plane API, which will act as a strong red signal that something is not right. Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-06 14:44:48 +01:00
Alexander Bayandin	da60f69909	approved-for-ci-run.yml: use our bot (#5216 ) ## Problem Pull Requests created by the GitHub Actions bot don't have access to secrets, so we need to use our bot for it to auto-trigger a test run. See previous PRs #4663, #5210, #5212 ## Summary of changes - Use our bot to create PRs	2023-09-06 14:55:11 +03:00
John Spray	743933176e	scrubber: add `scan-metadata` and hook into integration tests (#5176 ) ## Problem - Scrubber's `tidy` command requires presence of a control plane - Scrubber has no tests at all ## Summary of changes - Add re-usable async streams for reading metadata from a bucket - Add a `scan-metadata` command that reads from those streams and calls existing `checks.rs` code to validate metadata, then returns a summary struct for the bucket. Command returns nonzero status if errors are found. - Add an `enable_scrub_on_exit()` function to NeonEnvBuilder so that tests using remote storage can request to have the scrubber run after they finish - Enable remote storage and scrub_on_exit in test_pageserver_restart and test_pageserver_chaos This is a "toe in the water" of the overall space of validating the scrubber. Later, we should: - Enable scrubbing at end of tests using remote storage by default - Make the success condition stricter than "no errors": tests should declare what tenants+timelines they expect to see in the bucket (or sniff these from the functions tests use to create them) and we should require that the scrubber reports on these particular tenants/timelines. The `tidy` command is untouched in this PR, but it should be refactored later to use similar async streaming interface instead of the current batch-reading approach (the streams are faster with large buckets), and to also be covered by some tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2023-09-06 11:55:24 +01:00
Alexander Bayandin	8e25d3e79e	test_runner: add scale parameter to tpc-h tests (#5079 ) ## Problem It's hard to find out which DB size we use for OLAP benchmarks (TPC-H in particular). This PR adds handling of the `TEST_OLAP_SCALE` env var, which gets added to a test name as a parameter. This is required for performing larger periodic benchmarks. ## Summary of changes - Handle `TEST_OLAP_SCALE` in `test_runner/performance/test_perf_olap.py` - Set `TEST_OLAP_SCALE` in `.github/workflows/benchmarking.yml` to a TPC-H scale	2023-09-06 13:22:57 +03:00
duguorong009	4fec48f2b5	chore(pageserver): remove unnecessary logging in tenant task loops (#5188 ) Fixes #3830 by adding the `#[cfg(not(feature = "testing"))]` attribute to unnecessary loggings in `pageserver/src/tenant/tasks.rs`. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-06 13:19:19 +03:00
Vadim Kharitonov	88b1ac48bd	Create Release PR at 7:00 UTC every Tuesday (#5213 )	2023-09-06 13:17:52 +03:00
Alexander Bayandin	15ff4e5fd1	approved-for-ci-run.yml: trigger on pull_request_target (#5212 ) ## Problem Continuation of #4663, #5210 We're still getting an error: ``` GraphQL: Resource not accessible by integration (removeLabelsFromLabelable) ``` ## Summary of changes - trigger `approved-for-ci-run.yml` workflow on `pull_request_target` instead of `pull_request`	2023-09-06 13:14:07 +03:00
Alexander Bayandin	dbfb4ea7b8	Make CI more friendly for external contributors. Second try (#5210 ) ## Problem `approved-for-ci-run` label logic doesn't work as expected: - https://github.com/neondatabase/neon/pull/4722#issuecomment-1636742145 - https://github.com/neondatabase/neon/pull/4722#issuecomment-1636755394 Continuation of https://github.com/neondatabase/neon/pull/4663 Closes #2222 (hopefully) ## Summary of changes - Create a twin PR automatically - Allow `GITHUB_TOKEN` to manipulate labels	2023-09-06 10:06:55 +01:00
Alexander Bayandin	c222320a2a	Generate lcov coverage report (#4999 ) ## Problem We want to display coverage information for each PR. - an example of a full coverage report: https://neon-github-public-dev.s3.amazonaws.com/code-coverage/abea64800fb390c32a3efe6795d53d8621115c83/lcov/index.html - an example of GitHub auto-comment with coverage information: https://github.com/neondatabase/neon/pull/4999#issuecomment-1679344658 ## Summary of changes - Use patched[*](`426e7e7a22`) lcov to generate coverage report - Upload HTML coverage report to S3 - `scripts/comment-test-report.js`: add coverage information	2023-09-06 00:48:15 +01:00
MMeent	89c64e179e	Fix corruption issue in Local File Cache (#5208 ) Fix issue where updating the size of the Local File Cache could lead to invalid reads: ## Problem LFC cache can get re-enabled when lfc_max_size is set, e.g. through an autoscaler configuration, or PostgreSQL not liking us setting the variable. 1. initialize: LFC enabled (lfc_size_limit > 0; lfc_desc = 0) 2. Open LFC file fails, lfc_desc = -1. lfc_size_limit is set to 0; 3. lfc_size_limit is updated by autoscaling to >0 4. read() now thinks LFC is enabled (size_limit > 0) and lfc_desc is valid, but doesn't try to read from the invalid file handle and thus doesn't update the buffer content with the page's data, but does think the data was read... Any buffer we try to read from local file cache is essentially uninitialized memory. Those are likely 0-bytes, but might potentially be any old buffer that was previously read from or flushed to disk. Fix this by adding a more definitive disable flag, plus better invalid state handling.	2023-09-05 20:00:47 +02:00
Alexander Bayandin	7ceddadb37	Merge custom extension CI jobs (#5194 ) ## Problem When a remote custom extension build fails, it looks a bit confusing on neon CI: - `trigger-custom-extensions-build` is green - `wait-for-extensions-build` is red - `build-and-upload-extensions` is red But to restart the build (to get everything green), you need to restart the only passed `trigger-custom-extensions-build`. ## Summary of changes - Merge `trigger-custom-extensions-build` and `wait-for-extensions-build` jobs into `trigger-custom-extensions-build-and-wait`	2023-09-05 14:02:37 +01:00
Arpad Müller	4904613aaa	Convert `VirtualFile::{seek,metadata}` to async (#5195 ) ## Problem We want to convert the `VirtualFile` APIs to async fn so that we can adopt one of the async I/O solutions. ## Summary of changes Convert the following APIs of `VirtualFile` to async fn (as well as all of the APIs calling it): * `VirtualFile::seek` * `VirtualFile::metadata` * Also, prepare for deletion of the write impl by writing the summary to a buffer before writing it to disk, as suggested in https://github.com/neondatabase/neon/issues/4743#issuecomment-1700663864 . This change adds an additional warning for the case when the summary exceeds a block. Previously, we'd have silently corrupted data in this (unlikely) case. * `WriteBlobWriter::write_blob`, in preparation for making `VirtualFile::write_all` async.	2023-09-05 12:55:45 +02:00
Nikita Kalyanov	77658a155b	support deploying in IPv6-only environments (#4135 ) A set of changes to enable neon to work in IPv6 environments. The changes are backward-compatible but allow deploying neon even to IPv6-only environments: - bind to both IPv4 and IPv6 interfaces - allow connections to Postgres from IPv6 interface - parse the address from control plane that could also be IPv6	2023-09-05 12:45:46 +03:00
Arpad Müller	128a85ba5e	Convert many VirtualFile APIs to async (#5190 ) ## Problem `VirtualFile` does both reading and writing, and it would be nice if both could be converted to async, so that it doesn't have to support an async read path and a blocking write path (especially for the locks this is annoying as none of the lock implementations in std, tokio or parking_lot have support for both async and blocking access). ## Summary of changes This PR is some initial work on making the `VirtualFile` APIs async. It can be reviewed commit-by-commit. * Introduce the `MaybeVirtualFile` enum to be generic in a test that compares real files with virtual files. * Make various APIs of `VirtualFile` async, including `write_all_at`, `read_at`, `read_exact_at`. Part of #4743 , successor of #5180. Co-authored-by: Christian Schwarz <me@cschwarz.com>	2023-09-04 17:05:20 +02:00
Arpad Müller	6cd497bb44	Make VirtualFile::crashsafe_overwrite async fn (#5189 ) ## Problem The `VirtualFile::crashsafe_overwrite` function was introduced by #5186 but it was not turned `async fn` yet. We want to make these functions async fn as part of #4743. ## Summary of changes Make `VirtualFile::crashsafe_overwrite` async fn, as well as all the functions calling it. Don't make anything inside `crashsafe_overwrite` use async functionalities, as per #4743 instructions. Also, add rustdoc to `crashsafe_overwrite`. Part of #4743.	2023-09-04 12:52:35 +02:00
John Spray	80f10d5ced	pageserver: safe deletion for tenant directories (#5182 ) ## Problem If a pageserver crashes partway through deleting a tenant's directory, it might leave a partial state that confuses a subsequent startup/attach. ## Summary of changes Rename tenant directory to a temporary path before deleting. Timeline deletions already have deletion markers to provide safety. In future, it would be nice to exploit this to send responses to detach requests earlier: https://github.com/neondatabase/neon/issues/5183	2023-09-04 08:31:55 +01:00
Christian Schwarz	7e817789d5	VirtualFile: add crash-safe overwrite abstraction & use it (#5186 ) (part of #4743) (preliminary to #5180) This PR adds a special-purpose API to `VirtualFile` for write-once files. It adopts it for `save_metadata` and `persist_tenant_conf`. This is helpful for the asyncification efforts (#4743) and specifically asyncification of `VirtualFile` because the above two functions were the only ones that needed the VirtualFile to be an `std::io::Write`. (There was also `manifest.rs` that needed the `std::io::Write`, but, it isn't used right now, and likely won't be used because we're taking a different route for crash consistency, see #5172. So, let's remove it. It'll be in Git history if we need to re-introduce it when picking up the compaction work again; that's why it was introduced in the first place). We can't remove the `impl std::io::Write for VirtualFile` just yet because of the `BufWriter` in ```rust struct DeltaLayerWriterInner { ... blob_writer: WriteBlobWriter<BufWriter<VirtualFile>>, } ``` But, @arpad-m and I have a plan to get rid of that by extracting the append-only-ness-on-top-of-VirtualFile that #4994 added to `EphemeralFile` into an abstraction that can be re-used in the `DeltaLayerWriterInner` and `ImageLayerWriterInner`. That'll be another PR. ### Performance Impact This PR adds more fsyncs compared to before because we fsync the parent directory every time. 1. For `save_metadata`, the additional fsyncs are unnecessary because we know that `metadata` fits into a kernel page, and hence the write won't be torn on the way into the kernel. However, the `metadata` file in general is going to lose significance very soon (=> see #5172), and the NVMes in prod can definitely handle the additional fsync. So, let's not worry about it. 2. For `persist_tenant_conf`, which we don't check to fit into a single kernel page, this PR makes it actually crash-consistent.
Before, we could crash while writing out the tenant conf, leaving a prefix of the tenant conf on disk.	2023-09-02 10:06:14 +02:00
John Spray	41aa627ec0	tests: get test name automatically for remote storage (#5184 ) ## Problem Tests using remote storage have manually entered `test_name` parameters, which: - Are easy to accidentally duplicate when copying code to make a new test - Omit parameters, so don't actually create unique S3 buckets when running many tests concurrently. ## Summary of changes - Use the `request` fixture in neon_env_builder fixture to get the test name, then munge that into an S3 compatible bucket name. - Remove the explicit `test_name` parameters to enable_remote_storage	2023-09-01 17:29:38 +01:00
Conrad Ludgate	44da9c38e0	proxy: error typo (#5187 ) ## Problem https://github.com/neondatabase/neon/pull/5162#discussion_r1311853491	2023-09-01 19:21:33 +03:00
Christian Schwarz	cfc0fb573d	pageserver: run all Rust tests with remote storage enabled (#5164 ) For [#5086](https://github.com/neondatabase/neon/pull/5086#issuecomment-1701331777) we will require remote storage to be configured in pageserver. This PR enables `localfs`-based storage for all Rust unit tests. Changes: - In `TenantHarness`, set up localfs remote storage for the tenant. - `create_test_timeline` should mimic what real timeline creation does, and real timeline creation waits for the timeline to reach remote storage. With this PR, `create_test_timeline` now does that as well. - All the places that create the harness tenant twice need to shut down the tenant before the re-create through a second call to `try_load` or `load`. - Without shutting down, upload tasks initiated by/through the first incarnation of the harness tenant might still be ongoing when the second incarnation of the harness tenant is `try_load`/`load`ed. That doesn't make sense in the tests that do that, they generally try to set up a scenario similar to pageserver stop & start. - There was one test that recreates a timeline, not the tenant. For that case, I needed to create a `Timeline::shutdown` method. It's a refactoring of the existing `Tenant::shutdown` method. - The remote_timeline_client tests previously set up their own `GenericRemoteStorage` and `RemoteTimelineClient`. Now they re-use the one that's pre-created by the TenantHarness. Some adjustments to the assertions were needed because the assertions now need to account for the initial image layer that's created by `create_test_timeline` to be present.	2023-09-01 18:10:40 +02:00
Christian Schwarz	aa22000e67	FileBlockReader<File> is never used (#5181 ) part of #4743 preliminary to #5180	2023-09-01 17:30:22 +02:00
Christian Schwarz	5edae96a83	rfc: Crash-Consistent Layer Map Updates By Leveraging index_part.json (#5086 ) This RFC describes a simple scheme to make layer map updates crash consistent by leveraging the index_part.json in remote storage. Without such a mechanism, crashes can induce certain edge cases in which broadly held assumptions about system invariants don't hold.	2023-09-01 15:24:58 +02:00
Christian Schwarz	40ce520c07	remote_timeline_client: tests: run upload ops on the tokio::test runtime (#5177 ) The `remote_timeline_client` tests use `#[tokio::test]` and rely on the fact that the test runtime that is set up by this macro is single-threaded. In PR https://github.com/neondatabase/neon/pull/5164, we observed interesting flakiness with the `upload_scheduling` test case: it would observe the upload of the third layer (`layer_file_name_3`) before we did `wait_completion`. Under the single-threaded-runtime assumption, that wouldn't be possible, because the test code doesn't await in between scheduling the upload and calling `wait_completion`. However, RemoteTimelineClient was actually using `BACKGROUND_RUNTIME`. That means there was parallelism where the tests didn't expect it, leading to flakiness such as execution of an UploadOp task before the test calls `wait_completion`. The most confusing scenario is code like this: ``` schedule_upload(A); wait_completion.await; // B schedule_upload(C); wait_completion.await; // D ``` On a single-threaded executor, it is guaranteed that the upload of C doesn't run before D, because we (the test) don't relinquish control to the executor before D's `await` point. However, RemoteTimelineClient actually scheduled onto the BACKGROUND_RUNTIME, so, `A` could start running before `B` and `C` could start running before `D`. This would cause flaky tests when making assertions about the state manipulated by the operations. The concrete issue that led to the discovery of this bug was an assertion about `remote_fs_dir` state in #5164.	2023-09-01 16:24:04 +03:00
Alexander Bayandin	e9f2c64322	Wait for custom extensions build before deploy (#5170 ) ## Problem Currently, the `deploy` job doesn't wait for the custom extension job (in another repo) and can be started even with failed extensions build. This PR adds another job that polls the status of the extension build job and fails if the extension build fails. ## Summary of changes - Add `wait-for-extensions-build` job, which waits for a custom extension build in another repo.	2023-09-01 12:59:19 +01:00
John Spray	715077ab5b	tests: broaden a log allow regex in `test_ignored_tenant_stays_broken_without_metadata` (#5168 ) ## Problem - https://github.com/neondatabase/neon/issues/5167 ## Summary of changes Accept "will not become active" log line with _either_ Broken or Stopping state, because we may hit it while in the process of doing the `/ignore` (earlier in the test than the test expects to see the same line with Broken)	2023-09-01 08:36:38 +01:00
John Spray	616e7046c7	s3_scrubber: import into the main `neon` repository (#5141 ) ## Problem The S3 scrubber currently lives at https://github.com/neondatabase/s3-scrubber We don't have tests that use it, and it has copies of some data structures that can get stale. ## Summary of changes - Import the s3-scrubber as `s3_scrubber/` - Replace copied_definitions/ in the scrubber with direct access to the `utils` and `pageserver` crates - Modify visibility of a few definitions in `pageserver` to allow the scrubber to use them - Update scrubber code for recent changes to `IndexPart` - Update `KNOWN_VERSIONS` for IndexPart and move the definition into index.rs so that it is easier to keep up to date As a future refinement, it would be good to pull the remote persistence types (like IndexPart) out of `pageserver` into a separate library so that the scrubber doesn't have to link against the whole pageserver, and so that it's clearer which types need to be public. Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-08-31 19:01:39 +01:00
Conrad Ludgate	1b916a105a	proxy: locked is not retriable (#5162 ) ## Problem Management service returns Locked when quotas are exhausted. We cannot retry on those ## Summary of changes Makes Locked status unretriable	2023-08-31 15:50:15 +03:00
Conrad Ludgate	d11621d904	Proxy: proxy protocol v2 (#5028 ) ## Problem We need to log the client IP, not the IP of the NLB. ## Summary of changes Parse the proxy [protocol version 2](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) if possible	2023-08-31 14:30:25 +03:00
John Spray	43bb8bfdbb	pageserver: fix flake in test_timeline_deletion_with_files_stuck_in_upload_queue (#5149 ) ## Problem Test failing on a different ERROR log than it anticipated. Closes: https://github.com/neondatabase/neon/issues/5148 ## Summary of changes Add the "could not flush frozen layer" error log to the permitted errors.	2023-08-31 10:42:32 +01:00
John Spray	300a5aa05e	pageserver: fix test v4_indexpart_is_parsed (#5157 ) ## Problem Two recent PRs raced: - https://github.com/neondatabase/neon/pull/5153 - https://github.com/neondatabase/neon/pull/5140 ## Summary of changes Add missing `generation` argument to IndexLayerMetadata construction	2023-08-31 10:40:46 +01:00
Nikita Kalyanov	b9c111962f	pass JWT to management API (#5151 ) support authentication with JWT from env for proxy calls to mgmt API	2023-08-31 12:23:51 +03:00
John Spray	83ae2bd82c	pageserver: generation number support in keys and indices (#5140 ) ## Problem To implement split brain protection, we need tenants and timelines to be aware of their current generation, and use it when composing S3 keys. ## Summary of changes - A `Generation` type is introduced in the `utils` crate -- it is in this broadly-visible location because it will later be used from `control_plane/` as well as `pageserver/`. Generations can be a number, None, or Broken, to support legacy content (None), and Tenants in the broken state (Broken). - Tenant, Timeline, and RemoteTimelineClient all get a generation attribute - IndexPart's IndexLayerMetadata has a new `generation` attribute. Legacy layers' metadata will deserialize to Generation::none(). - Remote paths are composed with a trailing generation suffix. If a generation is equal to Generation::none() (as it currently always is), then this suffix is an empty string. - Functions for composing remote storage paths added in remote_timeline_client: these avoid the way that we currently always compose a local path and then strip the prefix, and avoid requiring a PageserverConf reference on functions that want to create remote paths (the conf is only needed for local paths). These are less DRY than the old functions, but remote storage paths are a very rarely changing thing, so it's better to write out our paths clearly in the functions than to compose timeline paths from tenant paths, etc. - Code paths that construct a Tenant take a `generation` argument in anticipation that we will soon load generations on startup before constructing Tenant. Until the whole feature is done, we don't want any generation-ful keys though: so initially we will carry this everywhere with the special Generation::none() value. Closes: https://github.com/neondatabase/neon/issues/5135 Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-08-31 09:19:34 +01:00
Alexey Kondratov	f2c21447ce	[compute_ctl] Create check availability data during full configuration (#5084 ) I've moved it to the API handler in `589cf1ed2` so as not to delay compute start. Yet, we now skip full configuration and catalog updates in the hottest path -- waking up suspended compute, and only do it at: - first start - start with applying new configuration - start for availability check so it doesn't really matter anymore. The problem with creating the table and test record in the API handler is that someone can fill up the timeline to the logical limit. Then it's suspended and an availability check is scheduled, so it fails. If table + test row are created at the very beginning, we reserve an 8 KB page for future checks, which theoretically will last almost forever. For example, my ~1y old branch still has an 8 KB sized test table: ```sql cloud_admin@postgres=# select pg_relation_size('health_check'); pg_relation_size ------------------ 8192 (1 row) ``` --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-08-30 17:44:28 +02:00
Conrad Ludgate	93dcdb293a	proxy: password hack hack (#5126 ) ## Problem fixes #4881 ## Summary of changes	2023-08-30 16:20:27 +01:00
John Spray	a93274b389	pageserver: remove vestigial `timeline_layers` attribute (#5153 ) ## Problem `timeline_layers` was write-only since `b95addddd5` We deployed the version that no longer requires it for deserializing, so now we can stop including it when serializing. ## Summary of changes Fully remove `timeline_layers`.	2023-08-30 16:14:04 +01:00
Anastasia Lubennikova	a7c0e4dcd0	Check if custom extension is enabled. This check was lost in the latest refactoring. If an extension is not present in 'public_extensions' or 'custom_extensions', don't download it	2023-08-30 17:47:06 +03:00
Conrad Ludgate	3b81e0c86d	chore: remove webpki (#5069 ) ## Problem webpki is unmaintained Closes https://github.com/neondatabase/neon/security/dependabot/33 ## Summary of changes Update all dependents of webpki.	2023-08-30 15:14:03 +01:00
Anastasia Lubennikova	e5a397cf96	Form archive_path for remote extensions on the fly	2023-08-30 13:56:51 +03:00
Joonas Koivunen	05773708d3	fix: add context for ancestor lsn wait (#5143 ) In logs it is confusing to see seqwait timeouts which seemingly arise from the branched lsn but actually are about the ancestor, leading to questions like "has the last_record_lsn went back". Noticed by @problame.	2023-08-30 12:21:41 +03:00
John Spray	382473d9a5	docs: add RFC for remote storage generation numbers (#4919 ) ## Summary A scheme of logical "generation numbers" for pageservers and their attachments is proposed, along with changes to the remote storage format to include these generation numbers in S3 keys. Using the control plane as the issuer of these generation numbers enables strong anti-split-brain properties in the pageserver cluster without implementing a consensus mechanism directly in the pageservers. ## Motivation Currently, the pageserver's remote storage format does not provide a mechanism for addressing split brain conditions that may happen when replacing a node during failover or when migrating a tenant from one pageserver to another. From a remote storage perspective, a split brain condition occurs whenever two nodes both think they have the same tenant attached, and both can write to S3. This can happen in the case of a network partition, pathologically long delays (e.g. suspended VM), or software bugs. This blocks robust implementation of failover from unresponsive pageservers, due to the risk that the unresponsive pageserver is still writing to S3. --------- Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-08-30 09:49:55 +01:00
Arpad Müller	eb0a698adc	Make page cache and read_blk async (#5023 ) ## Problem `read_blk` does I/O and thus we would like to make it async. We can't make the function async as long as the `PageReadGuard` returned by `read_blk` isn't `Send`. The page cache is called by `read_blk`, and thus it can't be async without `read_blk` being async. Thus, we have a circular dependency. ## Summary of changes Due to the circular dependency, we convert both the page cache and `read_blk` to async at the same time: We make the page cache use `tokio::sync` synchronization primitives as those are `Send`. This makes all the places that acquire a lock require async though, which we then also do. This includes also asyncification of the `read_blk` function. Builds upon #4994, #5015, #5056, and #5129. Part of #4743.	2023-08-30 09:04:31 +02:00
Arseny Sher	81b6578c44	Allow walsender in recovery mode give WAL till dynamic flush_lsn. Instead of fixed during the start of replication. To this end, create term_flush_lsn watch channel similar to commit_lsn one. This allows to continue recovery streaming if new data appears.	2023-08-29 23:19:40 +03:00
Arseny Sher	bc49c73fee	Move wal_stream_connection_config to utils. It will be used by safekeeper as well.	2023-08-29 23:19:40 +03:00
Arseny Sher	e98580b092	Add term and http endpoint to broker messaged SkTimelineInfo. We need them for safekeeper peer recovery https://github.com/neondatabase/neon/pull/4875	2023-08-29 23:19:40 +03:00
Arseny Sher	804ef23043	Rename TermSwitchEntry to TermLsn. Add derive Ord for easy comparison of <term, lsn> pairs. part of https://github.com/neondatabase/neon/pull/4875	2023-08-29 23:19:40 +03:00
Arseny Sher	87f7d6bce3	Start and stop per timeline recovery task. Slightly refactors init: now load_tenant_timelines is also async to properly init the timeline, but to keep global map lock sync we just acquire it anew for each timeline. Recovery task itself is just a stub here. part of https://github.com/neondatabase/neon/pull/4875	2023-08-29 23:19:40 +03:00
Arseny Sher	39e3fbbeb0	Add safekeeper peers to TimelineInfo. Now available under GET /tenant/xxx/timeline/yyy for inspection.	2023-08-29 23:19:40 +03:00
Em Sharnoff	8d2a4aa5f8	vm-monitor: Add flag for when file cache on disk (#5130 ) Part 1 of 2, for moving the file cache onto disk. Because VMs are created by the control plane (and that's where the filesystem for the file cache is defined), we can't rely on any kind of synchronization between releases, so the change needs to be feature-gated (kind of), with the default remaining the same for now. See also: neondatabase/cloud#6593	2023-08-29 12:44:48 -07:00
Joonas Koivunen	d1fcdf75b3	test: enhanced logging for curious mock_s3 (#5134 ) Possible flakiness with mock_s3. Add logging in hopes this will happen again. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-08-29 14:48:50 +03:00
Alexander Bayandin	7e39a96441	scripts/flaky_tests.py: Improve flaky tests detection (#5094 ) ## Problem We still need to rerun some builds manually because flaky tests weren't detected automatically. I found two reasons for it: - If a test is flaky on a particular build type, on a particular Postgres version, there's a high chance that this test is flaky on all configurations, but we don't automatically detect such cases. - We detect flaky tests only on the main branch, which requires manual retrigger runs for freshly made flaky tests. Both of them are fixed in the PR. ## Summary of changes - Spread flakiness of a single test to all configurations - Detect flaky tests in all branches (not only in the main) - Look back only at 7 days of test history (instead of 10)	2023-08-29 11:53:24 +01:00
Vadim Kharitonov	babefdd3f9	Upgrade pgvector to 0.5.0 (#5132 )	2023-08-29 12:53:50 +03:00
Arpad Müller	805fee1483	page cache: small code cleanups (#5125 ) ## Problem I saw these things while working on #5111. ## Summary of changes * Add a comment explaining why we use `Vec::leak` instead of `Vec::into_boxed_slice` plus `Box::leak`. * Add another comment explaining what `valid` is doing, it wasn't very clear before. * Add a function `set_usage_count` to not set it directly.	2023-08-29 11:49:04 +03:00
Felix Prasanna	85d6d9dc85	monitor/compute_ctl: remove references to the informant (#5115 ) Also added some docs to the monitor :) Co-authored-by: Em Sharnoff <sharnoff@neon.tech>	2023-08-29 02:59:27 +03:00
Em Sharnoff	e40ee7c3d1	remove unused file 'vm-cgconfig.conf' (#5127 ) Honestly no clue why it's still here, should have been removed ages ago. This is handled by vm-builder now.	2023-08-28 13:04:57 -07:00
Christian Schwarz	0fe3b3646a	page cache: don't proactively evict EphemeralFile pages (#5129 ) Before this patch, when dropping an EphemeralFile, we'd scan the entire `slots` to proactively evict its pages (`drop_buffers_for_immutable`). This was _necessary_ before #4994 because the page cache was a write-back cache: we'd be deleting the EphemeralFile from disk after, so, if we hadn't evicted its pages before that, write-back in `find_victim` would have failed. But, since #4994, the page cache is a read-only cache, so, it's safe to keep read-only data cached. It's never going to get accessed again and eventually, `find_victim` will evict it. The only remaining advantage of `drop_buffers_for_immutable` over relying on `find_victim` is that `find_victim` has to do the clock page replacement iterations until the count reaches 0, whereas `drop_buffers_for_immutable` can kick the page out right away. However, weigh that against the cost of `drop_buffers_for_immutable`, which currently scans the entire `slots` array to find the EphemeralFile's pages. Alternatives have been proposed in #5122 and #5128, but, they come with their own overheads & trade-offs. Also, the real reason why we're looking into this piece of code is that we want to make the slots rwlock async in #5023. Since `drop_buffers_for_immutable` is called from drop, and there is no async drop, it would be nice to not have to deal with this. So, let's just stop doing `drop_buffers_for_immutable` and observe the performance impact in benchmarks.	2023-08-28 20:42:18 +02:00
Em Sharnoff	529f8b5016	compute_ctl: Fix switched vm-monitor args (#5117 ) Small switcheroo from #4946.	2023-08-28 14:55:41 +02:00
Joonas Koivunen	fbcd174489	load_layer_map: schedule deletions for any future layers (#5103 ) Unrelated fixes noticed while integrating #4938. - Stop leaking future layers in remote storage - We schedule extra index_part uploads if layer name to be removed was not actually present	2023-08-28 10:51:49 +03:00
Felix Prasanna	7b5489a0bb	compute_ctl: start pg in cgroup for vms (#4920 ) Starts `postgres` in cgroup directly from `compute_ctl` instead of from `vm-builder`. This is required because the `vm-monitor` cannot be in the cgroup it is managing. Otherwise, it itself would be frozen when freezing the cgroup. Requires https://github.com/neondatabase/cloud/pull/6331, which adds the `AUTOSCALING` environment variable letting `compute_ctl` know to start `postgres` in the cgroup. Requires https://github.com/neondatabase/autoscaling/pull/468, which prevents `vm-builder` from starting the monitor and putting postgres in a cgroup. This will require a `VM_BUILDER_VERSION` bump.	2023-08-25 15:59:12 -04:00
Felix Prasanna	40268dcd8d	monitor: fix filecache calculations (#5112 ) ## Problem An underflow bug in the filecache calculations. ## Summary of changes Fixed the bug, cleaned up calculations in general.	2023-08-25 13:29:10 -04:00
Vadim Kharitonov	4436c84751	Change codeowners (#5109 )	2023-08-25 19:48:16 +03:00
John Spray	b758bf47ca	pageserver: refactor TimelineMetadata serialization in IndexPart (#5091 ) ## Problem The `metadata_bytes` field of IndexPart required explicit deserialization & error checking everywhere it was used -- there isn't anything special about this structure that should prevent it from being serialized & deserialized along with the rest of the structure. ## Summary of changes - Implement Serialize and Deserialize for TimelineMetadata - Replace IndexPart::metadata_bytes with a simpler `metadata`, that can be used directly. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-08-25 16:16:20 +01:00
Felix Prasanna	024e306f73	monitor: improve logging (#5099 )	2023-08-25 10:09:53 -04:00
Alek Westover	f71c82e5de	remove obsolete `need` dependency (#5087 )	2023-08-25 09:10:26 -04:00
Conrad Ludgate	faf070f288	proxy: dont return connection pending (#5107 ) ## Problem We were returning Pending when a connection had a notice/notification (introduced recently in #5020). When returning pending, the runtime assumes you will call `cx.waker().wake()` in order to continue processing. We weren't doing that, so the connection task would get stuck ## Summary of changes Don't return pending. Loop instead	2023-08-25 15:08:45 +03:00
Arpad Müller	8c13296add	Remove `BlockReader::read_blk` in favour of `BlockCursor` (#5015 ) ## Problem We want to make `read_blk` an async function, but outside of `async_trait`, which allocates, and nightly features, we can't use async fn's in traits. ## Summary of changes * Remove all uses of `BlockReader::read_blk` in favour of using block cursors, at least where the type of the `BlockReader` is behind a generic * Introduce a `BlockReaderRef` enum that lists all implementors of `BlockReader::read_blk`. * Remove `BlockReader::read_blk` and move its implementations into inherent functions on the types instead. We don't turn `read_blk` into an async fn yet, for that we also need to modify the page cache. So this is a preparatory PR, albeit an important one. Part of #4743.	2023-08-25 12:28:01 +02:00
Felix Prasanna	18537be298	monitor: listen on correct port to accept agent connections (#5100 ) ## Problem The previous arguments have the monitor listen on `localhost`, which the informant can connect to since it's also in the VM, but which the agent cannot. Also, the port is wrong. ## Summary of changes Listen on `0.0.0.0:10301`	2023-08-24 17:32:46 -04:00
Felix Prasanna	3128eeff01	compute_ctl: add vm-monitor (#4946 ) Co-authored-by: Em Sharnoff <sharnoff@neon.tech>	2023-08-24 15:54:37 -04:00
Arpad Müller	227c87e333	Make EphemeralFile::write_blob function async (#5056 ) ## Problem The `EphemeralFile::write_blob` function accesses the page cache internally. We want to require `async` for these accesses in #5023. ## Summary of changes This removes the implementation of the `BlobWriter` trait for `EphemeralFile` and turns the `write_blob` function into an inherent function. We can then make it async as well as the `push_bytes` function. We move the `SER_BUFFER` thread-local into the `InMemoryLayerInner` so that the same buffer can be accessed by different threads as the async is (potentially) moved between threads. Part of #4743, preparation for #5023.	2023-08-24 19:18:30 +02:00
Alek Westover	e8f9aaf78c	Don't use non-existent docker tags (#5096 )	2023-08-24 19:45:23 +03:00
Chengpeng Yan	fa74d5649e	rename `EphemeralFile::size` to `EphemeralFile::len` (#5076 ) ## Problem close https://github.com/neondatabase/neon/issues/5034 ## Summary of changes Based on the [comment](https://github.com/neondatabase/neon/pull/4994#discussion_r1297277922). Just rename `EphemeralFile::size` to `EphemeralFile::len`.	2023-08-24 16:41:57 +02:00
Joonas Koivunen	f70871dfd0	internal-devx: pageserver future layers (#5092 ) I've personally forgotten why/how we can have future layers during reconciliation. Adds `#[cfg(feature = "testing")]` logging when we upload such index_part.json, with a cross reference to where the cleanup happens. Latest private slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1692879032573809?thread_ts=1692792276.173979&cid=C033RQ5SPDH Builds upon #5074. Should have been considered in #4837.	2023-08-24 17:22:36 +03:00
Alek Westover	99a1be6c4e	remove upload step from neon, it is in private repo now (#5085 )	2023-08-24 17:14:40 +03:00
Joonas Koivunen	76aa01c90f	refactor: single phase Timeline::load_layer_map (#5074 ) The current implementation first calls `load_layer_map`, which loads all local layers, cleans up files, and leaves cleaning up the rest to the "second function". When the "second function" is finally called, it does not do the cleanup, and some of the first function's setup can be torn down. The "second function" is actually both `reconcile_with_remote` and `create_remote_layers`. This change makes it a bit more verbose but in one phase with the following sub-steps: 1. scan the timeline directory 2. delete extra files - now including on-demand download files - fixes #3660 3. reconcile the two sources of layers (directory, index_part) 4. rename_to_backup future layers, short layers 5. create the remaining as layers Needed by #4938. It was also noticed that this is blocking code in an `async fn` so just do it in a `spawn_blocking`, which should be healthy for our startup times. Other effects include hopefully halving the number of `stat` calls; extra calls which were not done previously are now done for the future layers. Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: John Spray <john@neon.tech>	2023-08-24 16:07:40 +03:00
John Spray	3e2f0ffb11	libs: make backoff::retry() take a cancellation token (#5065 ) ## Problem Currently, anything that uses backoff::retry will delay the join of its task by however long its backoff sleep is, multiplied by its max retries. Whenever we call a function that sleeps, we should be passing in a CancellationToken. ## Summary of changes - Add a `Cancel` type to backoff::retry that wraps a CancellationToken and an error `Fn` to generate an error if the cancellation token fires. - In call sites that already run in a `task_mgr` task, use `shutdown_token()` to provide the token. In other locations, use a dead `CancellationToken` to satisfy the interface, and leave a TODO to fix it up when we broaden the use of explicit cancellation tokens.	2023-08-24 14:54:46 +03:00
Arseny Sher	d597e6d42b	Track list of walreceivers and their voting/streaming state in shmem. Also add both walsenders and walreceivers to TimelineStatus (available under v1/tenant/xxx/timeline/yyy). Prepares for https://github.com/neondatabase/neon/pull/4875	2023-08-23 16:04:08 +03:00
Christian Schwarz	71ccb07a43	ci: fix upload-postgres-extensions-to-s3 job (#5063 ) This is a cherry-picked-then-improved version of release branch commit `4204960942` (PR #4861) The commit commit `5f8fd640bf` Author: Alek Westover <alek.westover@gmail.com> Date: Wed Jul 26 08:24:03 2023 -0400 Upload Test Remote Extensions (#4792) switched to using the release tag instead of `latest`, but, the `promote-images` job only uploads `latest` to the prod ECR. The switch to using release tag was good in principle, but, it broke the release pipeline. So, switch release pipeline back to using `latest`. Note that a proper fix should abandon use of `:latest` tag at all: currently, if a `main` pipeline runs concurrently with a `release` pipeline, the `release` pipeline may end up using the `main` pipeline's images. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-08-22 22:45:25 +03:00
Joonas Koivunen	ad8d777c1c	refactor: remove is_incremental=true for ImageLayers footgun (#5061 ) Accidentally giving is_incremental=true for ImageLayers costs a lot of debugging time. Removes all API which would allow doing that. They can easily be restored later when needed. Split off from #4938.	2023-08-22 22:12:05 +03:00
Joonas Koivunen	2f97b43315	build: update tar, get rid of duplicate xattr (#5071 ) `tar` recently pushed to 0.4.40. No big changes, but less Cargo.lock and one less nagging from `cargo-deny`. The diff: https://github.com/alexcrichton/tar-rs/compare/0.4.38...0.4.40.	2023-08-22 21:21:44 +03:00
Joonas Koivunen	533a92636c	refactor: pre-cleanup Layer, PersistentLayer and impls (#5059 ) Remove pub but dead code, move trait methods as inherent methods, remove unnecessary. Split off from #4938.	2023-08-22 21:14:28 +03:00
Alek Westover	bf303a6575	Trigger workflow in remote (private) repo to build and upload private extensions (#4944 )	2023-08-22 13:32:29 -04:00
Christian Schwarz	8cd20485f8	metrics: smgr query time: add a pre-aggregated histogram (#5064 ) When doing global queries in VictoriaMetrics, the per-timeline histograms make us run into cardinality limits. We don't want to give them up just yet because we don't have an alternative for drilling down on timeline-specific performance issues. So, add a pre-aggregated histogram and add observations to it whenever we add observations to the per-timeline histogram. While we're at it, switch to using a strummed enum for the operation type names.	2023-08-22 20:08:31 +03:00
Joonas Koivunen	933a869f00	refactor: compaction becomes async again (#5058 ) #4938 will make on-demand download of layers in compaction possible, so it's not suitable for our "policy" of no `spawn_blocking(|| ... Handle::block_on(async { spawn_blocking(...).await })` because this poses a clear deadlock risk. Nested spawn_blockings are because of the download using `tokio::fs::File`. - Remove `spawn_blocking` from caller of `compact_level0_phase1` - Remove `Handle::block_on` from `compact_level0_phase1` (indentation change) - Revert to `AsLayerDesc::layer_desc` usage temporarily (until it becomes field access in #4938)	2023-08-22 20:03:14 +03:00
Conrad Ludgate	8c6541fea9	chore: add supported targets to deny (#5070 ) ## Problem many duplicate windows crates pollute the cargo deny output ## Summary of changes we don't build those crates, so remove those targets from being checked	2023-08-22 19:44:31 +03:00
Alek Westover	5cf75d92d8	Fix cargo deny errors (#5068 ) ## Problem cargo deny lint broken Links to the CVEs: [rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052) [rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053) One is fixed, the other one isn't so we allow it (for now), to unbreak CI. Then later we'll try to get rid of webpki in favour of the rustls fork. ## Summary of changes ``` +ignore = ["RUSTSEC-2023-0052"] ```	2023-08-22 18:41:32 +03:00
Conrad Ludgate	0b001a0001	proxy: remove connections on shutdown (#5051 ) ## Problem On shutdown, proxy connections are staying open. ## Summary of changes Remove the connections on shutdown	2023-08-21 19:20:58 +01:00
Felix Prasanna	4a8bd866f6	bump vm-builder version to v0.16.3 (#5055 ) This change to autoscaling allows agents to connect directly to the monitor, completely removing the informant.	2023-08-21 13:29:16 -04:00
John Spray	615a490239	pageserver: refactor Tenant/Timeline args into structs (#5053 ) ## Problem There are some common types that we pass into tenants and timelines as we construct them, such as remote storage and the broker client. Currently the list is small, but this is likely to grow -- the deletion queue PR (#4960) pushed some methods to the point of clippy complaining they had too many args, because of the extra deletion queue client being passed around. There are some shared objects that currently aren't passed around explicitly because they use a static `once_cell` (e.g. CONCURRENT_COMPACTIONS), but as we add more resource management and concurrency control over time, it will be more readable & testable to pass a type around in the respective Resources object, rather than to coordinate via static objects. The `Resources` structures in this PR will make it easier to add references to central coordination functions, without having to rely on statics. ## Summary of changes - For `Tenant`, the `broker_client` and `remote_storage` are bundled into `TenantSharedResources` - For `Timeline`, the `remote_client` is wrapped into `TimelineResources`. Both of these structures will get an additional deletion queue member in #4960.	2023-08-21 17:30:28 +01:00
John Spray	b95addddd5	pageserver: do not read redundant `timeline_layers` from IndexPart, so that we can remove it later (#4972 ) ## Problem IndexPart contains two redundant lists of layer names: a set of the names, and then a map of name to metadata. We already required that all the layers in `timeline_layers` are also in `layers_metadata`, in `initialize_with_current_remote_index_part`, so if there were any index_part.json files in the field that relied on these sets being different, they would already be broken. ## Summary of changes `timeline_layers` is made private and no longer read at runtime. It is still serialized, but not deserialized. `disk_consistent_lsn` is also made private, as this field only exists for convenience of humans reading the serialized JSON. This prepares us to entirely remove `timeline_layers` in a future release, once this change is fully deployed, and therefore no pageservers are trying to read the field.	2023-08-21 14:29:36 +03:00
Joonas Koivunen	130ccb4b67	Remove initial timeline id troubles (#5044 ) I made a mistake when adding `env.initial_timeline: Optional[TimelineId]` in #3839; I should have just generated it and used it to create a specific timeline. This PR fixes those mistakes, and removes some extra calls into psql, which must be slower than Python field access.	2023-08-20 12:33:19 +03:00
Dmitry Rodionov	9140a950f4	Resume tenant deletion on attach (#5039 ) I'm still a bit nervous about attach -> crash case. But it should work. (unlike case with timeline). Ideally would be cool to cover this with test. This continues tradition of adding bool flags for Tenant::set_stopping. Probably lifecycle project will help with fixing it.	2023-08-20 12:28:50 +03:00
Arpad Müller	a23b0773f1	Fix DeltaLayer dumping (#5045 ) ## Problem Before, DeltaLayer dumping (via `cargo run --release -p pagectl -- print-layer-file` ) would crash as one can't call `Handle::block_on` in an async executor thread. ## Summary of changes Avoid the problem by using `DeltaLayerInner::load_keys` to load the keys into RAM (which we already do during compaction), and then load the values one by one during dumping.	2023-08-19 00:56:03 +02:00
Joonas Koivunen	368ee6c8ca	refactor: failpoint support (#5033 ) - move them to pageserver which is the only dependant on the crate fail - "move" the exported macro to the new module - support at init time the same failpoints as runtime Found while debugging test failures and making tests more repeatable by allowing "exit" from pageserver start via environment variables. Made those changes to `test_gc_cutoff.py`. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-08-19 01:01:44 +03:00
Felix Prasanna	5c6a692cf1	bump `VM_BUILDER_VERSION` to v0.16.2 (#5031 ) A very slight change that allows us to configure the UID of the neon-postgres cgroup owner. We start postgres in this cgroup so we can scale it with the cgroups v2 api. Currently, the control plane overwrites the entrypoint set by `vm-builder`, so `compute_ctl` (and thus postgres), is not started in the neon-postgres cgroup. Having `compute_ctl` start postgres in the cgroup should fix this. However, at the moment it appears that it does not have the correct permissions. Configuring the neon-postgres UID to `postgres` (which is the UID `compute_ctl` runs under) should hopefully fix this. See #4920 - the PR to modify `compute_ctl` to start postgres in the cgroup. See: neondatabase/autoscaling#480, neondatabase/autoscaling#477. Both these PRs are part of an effort to increase `vm-builder`'s configurability and allow us to adjust it as we integrate the monitor.	2023-08-18 14:29:20 -04:00
Dmitry Rodionov	30888a24d9	Avoid flakiness in test_timeline_delete_fail_before_local_delete (#5032 ) The problem was that timeline detail can return timelines in not only active state. And by the time request comes timeline deletion can still be in progress if we're unlucky (test execution happened to be slower for some reason) Reference for failed test run https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5022/5891420105/index.html#suites/f588e0a787c49e67b29490359c589fae/dab036e9bd673274 The error was `Exception: detail succeeded (it should return 404)` reported by @koivunej	2023-08-18 20:49:11 +03:00
Dmitry Rodionov	f6c671c140	resume timeline deletions on attach (#5030 ) closes [#5036](https://github.com/neondatabase/neon/issues/5036)	2023-08-18 20:48:33 +03:00
Christian Schwarz	ed5bce7cba	rfcs: archive my MVCC S3 Notion Proposal (#5040 ) This is a copy from the [original Notion page](https://www.notion.so/neondatabase/Proposal-Pageserver-MVCC-S3-Storage-8a424c0c7ec5459e89d3e3f00e87657c?pvs=4), taken on 2023-08-16. This is for archival mostly. The RFC that we're likely to go with is https://github.com/neondatabase/neon/pull/4919.	2023-08-18 19:34:29 +02:00
Christian Schwarz	7a63685cde	simplify page-caching of EphemeralFile (#4994 ) (This PR is the successor of https://github.com/neondatabase/neon/pull/4984 ) ## Summary The current way in which `EphemeralFile` uses `PageCache` complicates the Pageserver code base to a degree that isn't worth it. This PR refactors how we cache `EphemeralFile` contents, by exploiting the append-only nature of `EphemeralFile`. The result is that `PageCache` only holds `ImmutableFilePage` and `MaterializedPage`. These types of pages are read-only and evictable without write-back. This allows us to remove the writeback code from `PageCache`, also eliminating an entire failure mode. Further, many great open-source libraries exist to solve the problem of a read-only cache, much better than our `page_cache.rs` (e.g., better replacement policy, less global locking). With this PR, we can now explore using them. ## Problem & Analysis Before this PR, `PageCache` had three types of pages: * `ImmutableFilePage`: caches Delta / Image layer file contents * `MaterializedPage`: caches results of Timeline::get (page materialization) * `EphemeralPage`: caches `EphemeralFile` contents `EphemeralPage` is quite different from `ImmutableFilePage` and `MaterializedPage`: * Immutable and materialized pages are for the acceleration of (future) reads of the same data using `PAGE_CACHE_SIZE * PAGE_SIZE` bytes of DRAM. * Ephemeral pages are a write-back cache of `EphemeralFile` contents, i.e., if there is pressure in the page cache, we spill `EphemeralFile` contents to disk. `EphemeralFile` is only used by `InMemoryLayer`, for the following purposes: * write: when filling up the `InMemoryLayer`, via `impl BlobWriter for EphemeralFile` * read: when doing page reconstruction for a page@lsn that isn't written to disk * read: when writing L0 layer files, we re-read the `InMemoryLayer` and put the contents into the L0 delta writer (`create_delta_layer`). 
This happens every 10min or when InMemoryLayer reaches 256MB in size. The access patterns of the `InMemoryLayer` use case are as follows: * write: via `BlobWriter`, strictly append-only * read for page reconstruction: via `BlobReader`, random * read for `create_delta_layer`: via `BlobReader`, dependent on data, but generally random. Why? * in classical LSM terms, this function is what writes the memory-resident `C0` tree into the disk-resident `C1` tree * in our system, though, the values of InMemoryLayer are stored in an EphemeralFile, and hence they are not guaranteed to be memory-resident * the function reads `Value`s in `Key, LSN` order, which is `!=` insert order What do these `EphemeralFile`-level access patterns mean for the page cache? * write: * the common case is that `Value` is a WAL record, and if it isn't a full-page-image WAL record, then it's smaller than `PAGE_SIZE` * So, the `EphemeralPage` pages act as a buffer for these `< PAGE_CACHE` sized writes. * If there's no page cache eviction between subsequent `InMemoryLayer::put_value` calls, the `EphemeralPage` is still resident, so the page cache avoids doing a `write` system call. * In practice, a busy page server will have page cache evictions because we only configure 64MB of page cache size. * reads for page reconstruction: read acceleration, just as for the other page types. * reads for `create_delta_layer`: * The `Value` reads happen through a `BlockCursor`, which optimizes the case of repeated reads from the same page. * So, the best case is that subsequent values are located on the same page; hence `BlockCursor`s buffer is maximally effective. * The worst case is that each `Value` is on a different page; hence the `BlockCursor`'s 1-page-sized buffer is ineffective. * The best case translates into `256MB/PAGE_SIZE` page cache accesses, one per page. 
* the worst case translates into `#Values` page cache accesses * again, the page cache accesses must be assumed to be random because the `Value`s aren't accessed in insertion order but `Key, LSN` order. ## Summary of changes Preliminaries for this PR were: - #5003 - #5004 - #5005 - uncommitted microbenchmark in #5011 Based on the observations outlined above, this PR makes the following changes: * Rip out `EphemeralPage` from `page_cache.rs` * Move the `block_io::FileId` to `page_cache::FileId` * Add a `PAGE_SIZE`d buffer to the `EphemeralFile` struct. It's called `mutable_tail`. * Change `write_blob` to use `mutable_tail` for the write buffering instead of a page cache page. * if `mutable_tail` is full, it writes it out to disk, zeroes it out, and re-uses it. * There is explicitly no double-buffering, so that memory allocation per `EphemeralFile` instance is fixed. * Change `read_blob` to return different `BlockLease` variants depending on `blknum` * for the `blknum` that corresponds to the `mutable_tail`, return a ref to it * Rust borrowing rules prevent `write_blob` calls while refs are outstanding. * for all non-tail blocks, return a page-cached `ImmutablePage` * It is safe to page-cache these as ImmutablePage because EphemeralFile is append-only. ## Performance How do the changes above affect performance? My claim is: not significantly. * write path: * before this PR, the `EphemeralFile::write_blob` didn't issue its own `write` system calls. * If there were enough free pages, it didn't issue any `write` system calls. * If it had to evict other `EphemeralPage`s to get a page for its writes (`get_buf_for_write`), the page cache code would implicitly issue the writeback of victim pages as needed. * With this PR, `EphemeralFile::write_blob` always issues all of its own `write` system calls. 
* Also, the writes are explicit instead of implicit through page cache write back, which will help #4743 * The perf impact of always doing the writes is the CPU overhead and syscall latency. * Before this PR, we might have never issued them if there were enough free pages. * We don't issue `fsync` and can expect the writes to only hit the kernel page cache. * There is also an advantage in issuing the writes directly: the perf impact is paid by the tenant that caused the writes, instead of whatever tenant evicts the `EphemeralPage`. * reads for page reconstruction: no impact. * The `write_blob` function pre-warms the page cache when it writes the `mutable_tail` to disk. * So, the behavior is the same as with the EphemeralPages before this PR. * reads for `create_delta_layer`: no impact. * Same argument as for page reconstruction. * Note for the future: * going through the page cache likely causes read amplification here. Why? * Due to the `Key,Lsn`-ordered access pattern, we don't read all the values in the page before moving to the next page. In the worst case, we might read the same page multiple times to read different `Values` from it. * So, it might be better to bypass the page cache here. * Idea drafts: * bypass PS page cache + prefetch pipeline + iovec-based IO * bypass PS page cache + use `copy_file_range` to copy from ephemeral file into the L0 delta file, without going through user space	2023-08-18 20:31:03 +03:00
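The `mutable_tail` write buffering described in #4994 above can be illustrated with a minimal sketch. This is not the pageserver code: `TailBufferedFile`, `append`, and `read_blk` are hypothetical names, and a `Vec` of pages stands in for both the on-disk file and the page cache. It only shows the idea that a single fixed `PAGE_SIZE` buffer absorbs small appends, is flushed and zeroed when full, and serves reads for the last block.

```rust
const PAGE_SIZE: usize = 8192;

/// Hypothetical stand-in for an append-only ephemeral file.
/// `disk` models the flushed, immutable pages of the file.
struct TailBufferedFile {
    disk: Vec<[u8; PAGE_SIZE]>,
    mutable_tail: [u8; PAGE_SIZE],
    tail_len: usize,
}

impl TailBufferedFile {
    fn new() -> Self {
        Self { disk: Vec::new(), mutable_tail: [0; PAGE_SIZE], tail_len: 0 }
    }

    /// Appends `src` and returns the file offset at which it starts.
    fn append(&mut self, mut src: &[u8]) -> u64 {
        let start = (self.disk.len() * PAGE_SIZE + self.tail_len) as u64;
        while !src.is_empty() {
            // Copy as much as fits into the remaining space of the tail.
            let n = (PAGE_SIZE - self.tail_len).min(src.len());
            self.mutable_tail[self.tail_len..self.tail_len + n].copy_from_slice(&src[..n]);
            self.tail_len += n;
            src = &src[n..];
            if self.tail_len == PAGE_SIZE {
                // Tail is full: "write" it out, zero it, and re-use it.
                self.disk.push(self.mutable_tail);
                self.mutable_tail = [0; PAGE_SIZE];
                self.tail_len = 0;
            }
        }
        start
    }

    /// Returns the tail for the last block, a flushed page otherwise.
    fn read_blk(&self, blknum: usize) -> Option<&[u8; PAGE_SIZE]> {
        if blknum == self.disk.len() {
            Some(&self.mutable_tail)
        } else {
            self.disk.get(blknum)
        }
    }
}
```

Because `read_blk` borrows `self` immutably and `append` needs `&mut self`, the borrow checker rejects an append while a reference to the tail is still alive, which mirrors the point made in the commit message about outstanding refs.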
Joonas Koivunen	0a082aee77	test: allow race with flush and stopped queue (#5027 ) A lucky race can happen with the shutdown order I guess right now. Seen in [test_tenant_delete_smoke]. The message is not the greatest to match against. [test_tenant_delete_smoke]: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5892262320/index.html#suites/3556ed71f2d69272a7014df6dcb02317/189a0d1245fb5a8c	2023-08-18 19:36:25 +03:00
Arthur Petukhovsky	0b90411380	Fix safekeeper recovery with auth (#5035 ) Fix missing a password in walrcv_connect for a safekeeper recovery. Add a test which restarts endpoint and triggers a recovery.	2023-08-18 16:48:55 +01:00
Arpad Müller	f4da010aee	Make the compaction warning more tolerant (#5024 ) ## Problem The performance benchmark in `test_runner/performance/test_layer_map.py` is currently failing due to the warning added in #4888. ## Summary of changes The test mentioned has a `compaction_target_size` of 8192, which is just one page size. This is an unattainable goal, as we generate at least three pages: one for the header, one for the b-tree (minimally sized ones have just the root node in a single page), one for the data. Therefore, we add two pages to the warning limit. The warning text becomes a bit less accurate but I think this is okay.	2023-08-18 16:36:31 +02:00
Conrad Ludgate	ec10838aa4	proxy: pool connection logs (#5020 ) ## Problem Errors and notices that happen during a pooled connection lifecycle have no session identifiers ## Summary of changes Using a watch channel, we set the session ID whenever it changes. This way we can see the status of a connection for that session Also, adding a connection id to be able to search the entire connection lifecycle	2023-08-18 11:44:08 +01:00
Joonas Koivunen	67af24191e	test: cleanup remote_timeline_client tests (#5013 ) I will have to change these as I change remote_timeline_client api in #4938. So a bit of cleanup, handle my comments which were just resolved during initial review. Cleanup: - use unwrap in tests instead of mixed `?` and `unwrap` - use `Handle` instead of `&'static Reactor` to make the RemoteTimelineClient more natural - use arrays in tests - use plain `#[tokio::test]`	2023-08-17 19:27:30 +03:00
Joonas Koivunen	6af5f9bfe0	fix: format context (#5022 ) We return an error with unformatted `{timeline_id}`.	2023-08-17 14:30:25 +00:00
Dmitry Rodionov	64fc7eafcd	Increase timeout once again. (#5021 ) When failpoint is early in deletion process it takes longer to complete after failpoint is removed. Example was: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5889544346/index.html#suites/3556ed71f2d69272a7014df6dcb02317/49826c68ce8492b1	2023-08-17 15:37:28 +03:00
Conrad Ludgate	3e4710c59e	proxy: add more sasl logs (#5012 ) ## Problem A customer is having trouble connecting to neon from their production environment. The logs show a mix of "Internal error" and "authentication protocol violation" but not the full error ## Summary of changes Make sure we don't miss any logs during SASL/SCRAM	2023-08-17 12:05:54 +01:00
Dmitry Rodionov	d8b0a298b7	Do not attach deleted tenants (#5008 ) Rather temporary solution before proper: https://github.com/neondatabase/neon/issues/5006 It requires more plumbing so let's not attach deleted tenants first and then implement resume. Additionally fix `assert_prefix_empty`. It had a buggy prefix calculation, and since we always asserted for absence of stuff it worked. Here I started to assert for presence of stuff too and it failed. Added more "presence" asserts to other places to be confident that it works. Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)	2023-08-17 13:46:49 +03:00
Alexander Bayandin	c8094ee51e	test_compatibility: run amcheck unconditionally (#4985 ) ## Problem The previous version of neon (that we use in the forward compatibility test) has installed `amcheck` extension now. We can run `pg_amcheck` unconditionally. ## Summary of changes - Run `pg_amcheck` in compatibility tests unconditionally	2023-08-17 11:46:00 +01:00
Christian Schwarz	957af049c2	ephemeral file: refactor write_blob impl to concentrate mutable state (#5004 ) Before this patch, we had the `off` and `blknum` as function-wide mutable state. Now it's contained in the `Writer` struct. The use of `push_bytes` instead of index-based filling of the buffer also makes it easier to reason about what's going on. This is prep for https://github.com/neondatabase/neon/pull/4994	2023-08-17 13:07:25 +03:00
Anastasia Lubennikova	786c7b3708	Refactor remote extensions index download. Don't download ext_index.json from s3, but instead receive it as a part of spec from control plane. This eliminates s3 access for most compute starts, and also allows us to update extensions spec on the fly	2023-08-17 12:48:33 +03:00
Joonas Koivunen	d3612ce266	delta_layer: Restore generic from last week (#5014 ) Restores #4937 work relating to the ability to use `ResidentDeltaLayer` (which is an Arc wrapper) in #4938 for the ValueRef's by removing the borrow from `ValueRef` and providing it from an upper layer. This should not have any functional changes, most importantly, the `main` will continue to use the borrowed `DeltaLayerInner`. It might be that I can change #4938 to be like this. If that is so, I'll gladly rip out the `Ref` and move the borrow back. But I'll first want to look at the current test failures.	2023-08-17 11:47:31 +03:00
Christian Schwarz	994411f5c2	page cache: newtype the blob_io and ephemeral_file file ids (#5005 ) This makes it more explicit that these are different u64-sized namespaces. Re-using one in place of the other would be catastrophic. Prep for https://github.com/neondatabase/neon/pull/4994 which will eliminate the ephemeral_file::FileId and move the blob_io::FileId into page_cache. It makes sense to have this preliminary commit though, to minimize amount of new concept in #4994 and other preliminaries that depend on that work.	2023-08-16 18:33:47 +02:00
Conrad Ludgate	25934ec1ba	proxy: reduce global conn pool contention (#4747 ) ## Problem As documented, the global connection pool will be high contention. ## Summary of changes Use DashMap rather than Mutex<HashMap>. Of note, DashMap currently uses a RwLock internally, but it's partially sharded to reduce contention by a factor of N. We could potentially use flurry which is a port of Java's concurrent hashmap, but I have no good understanding of its performance characteristics. DashMap is at least equivalent to HashMap but with less contention. See the read heavy benchmark to analyse our expected performance <https://github.com/xacrimon/conc-map-bench#ready-heavy> I also spoke with the developer of dashmap recently, and they are working on porting the implementation to use concurrent HAMT FWIW	2023-08-16 17:20:28 +01:00
Arpad Müller	0bdbc39cb1	Compaction: unify key and value reference vecs (#4888 ) ## Problem PR #4839 has already reduced the number of b-tree traversals and vec creations from 3 to 2, but as pointed out in https://github.com/neondatabase/neon/pull/4839#discussion_r1279167815 , we would ideally just traverse the b-tree once during compaction. After #4836, the two vecs created are one for the list of keys, lsns and sizes, and one for the list of `(key, lsn, value reference)`. However, they are not equal, as pointed out in https://github.com/neondatabase/neon/pull/4839#issuecomment-1660418012 and the following comment: the key vec creation combines multiple entries for which the lsn is changing but the key stays the same into one, with the size being the sum of the sub-sizes. In SQL, this would correspond to something like `SELECT key, lsn, SUM(size) FROM b_tree GROUP BY key;` and `SELECT key, lsn, val_ref FROM b_tree;`. Therefore, the join operation is non-trivial. ## Summary of changes This PR merges the two lists of keys and value references into one. It's not a trivial change and affects the size pattern of the resulting files, which is why this is in a separate PR from #4839 . The key vec is used in compaction for determining when to start a new layer file. The loop uses various thresholds to come to this conclusion, but the grouping via the key has led to the behaviour that regardless of the threshold, it only starts a new file when either a new key is encountered, or a new delta file. The new code now does the combination after the merging and sorting of the various keys from the delta files. This mostly does the same as the old code, except for a detail: with the grouping done on a per-delta-layer basis, the sorted and merged vec would still have multiple entries for multiple delta files, but now, we don't have an easy way to tell when a new input delta layer file is encountered, so we cannot create multiple entries on that basis easily. 
To prevent possibly infinite growth, our new grouping code compares the combined size with the threshold, and if it is exceeded, it cuts a new entry so that the downstream code can cut a new output file. Here, we perform a tradeoff however, as if the threshold is too small, we risk putting entries for the same key into multiple layer files, but if the threshold is too big, we can in some instances exceed the target size. Currently, we set the threshold to the target size, so in theory we would stay below or roughly at double the `target_file_size`. We also fix the way the size was calculated for the last key. The calculation was wrong and accounted for the old layer's btree, even though we already account for the overhead of the in-construction btree. Builds on top of #4839 .	2023-08-16 18:27:18 +03:00
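The threshold-bounded grouping described in #4888 above can be sketched as follows. This is one possible reading of the description, not the actual compaction code: `Entry`, `group_by_key`, and the exact cut condition (start a new entry when adding the next size would exceed the threshold) are assumptions made for illustration. The input is assumed to be already merged and sorted by `(key, lsn)`.

```rust
/// Hypothetical (key, lsn, size) entry; the input slice is assumed
/// to be sorted by (key, lsn).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: u64,
    lsn: u64,
    size: u64,
}

/// Combines consecutive entries with the same key by summing their sizes,
/// but cuts a new entry once the combined size would exceed `threshold`,
/// so that downstream code gets a chance to start a new output file.
fn group_by_key(sorted: &[Entry], threshold: u64) -> Vec<Entry> {
    let mut out: Vec<Entry> = Vec::new();
    for e in sorted {
        if let Some(last) = out.last_mut() {
            if last.key == e.key && last.size + e.size <= threshold {
                // Same key and still within the threshold: fold into the group.
                last.size += e.size;
                continue;
            }
        }
        // New key, or the threshold would be exceeded: start a new entry.
        out.push(e.clone());
    }
    out
}
```

With a small threshold, entries for one key are split into several groups (and may end up in different layer files); with a large one, a single group can grow past the target size. That is the tradeoff the commit message describes.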
Dmitry Rodionov	96b84ace89	Correctly remove orphaned objects in RemoteTimelineClient::delete_all (#5000 ) Previously list_prefixes was incorrectly used for that purpose. Change to use list_files. Add a test. Some drive by refactorings on python side to move helpers out of specific test file to be widely accessible resolves https://github.com/neondatabase/neon/issues/4499	2023-08-16 17:31:16 +03:00
Christian Schwarz	368b783ada	ephemeral_file: remove FileExt impl (was only used by tests) (#5003 ) Extracted from https://github.com/neondatabase/neon/pull/4994	2023-08-16 15:41:25 +02:00
Dmitry Rodionov	0f47bc03eb	Fix delete_objects in UnreliableWrapper (#5002 ) For `delete_objects` it was injecting failures for whole delete_objects operation and then for every delete it contains. Make it fail once for the whole operation.	2023-08-16 14:08:53 +03:00
Arseny Sher	fdbe8dc8e0	Fix test_s3_wal_replay flakiness. ref https://github.com/neondatabase/neon/issues/4466	2023-08-16 12:57:43 +03:00
Arthur Petukhovsky	1b97a3074c	Disable neon-pool-opt-in (#4995 )	2023-08-15 20:57:56 +03:00
John Spray	5c836ee5b4	tests: extend timeout in timeline deletion test (#4992 ) ## Problem This was set to 5 seconds, which was very close to how long a compaction took on my workstation, and when deletion is blocked on compaction the test would fail. We will fix this to make compactions drop out on deletion, but for the moment let's stabilize the test. ## Summary of changes Change timeout on timeline deletion in `test_timeline_deletion_with_files_stuck_in_upload_queue` from 5 seconds to 30 seconds.	2023-08-15 20:14:03 +03:00
Arseny Sher	4687b2e597	Test that auth on pg/http services can be enabled separately in sks. To this end add 1) -e option to 'neon_local safekeeper start' command appending extra options to safekeeper invocation; 2) Allow multiple occurrences of the same option in safekeepers, the last value is taken. 3) Allow to specify empty string for *-auth-public-key-path opts, it disables auth for the service.	2023-08-15 19:31:20 +03:00
Arseny Sher	13adc83fc3	Allow to enable http/pg/pg tenant only auth separately in safekeeper. The same option enables auth and specifies public key, so this allows to use different public keys as well. The motivation is to 1) Allow to e.g. change pageserver key/token without replacing all compute tokens. 2) Enable auth gradually.	2023-08-15 19:31:20 +03:00
Dmitry Rodionov	52c2c69351	fsync directory before mark file removal (#4986 ) ## Problem Deletions can be possibly reordered. Use fsync to avoid the case when mark file doesn't exist but other tenant/timeline files do. See added comments. resolves #4987	2023-08-15 19:24:23 +03:00
Alexander Bayandin	207919f5eb	Upload test results to DB right after generation (#4967 ) ## Problem While adding new test results format, I've also changed the way we upload Allure reports to S3 (`722c7956bb`) to avoid duplicated results from previous runs. But it broke links at earlier results (results are still available but on different URLs). This PR fixes this (by reverting logic in `722c7956bb` changes), and moves the logic for storing test results into db to allure generate step. It allows us to avoid test results duplicates in the db and saves some time on extra s3 downloads that happened in a different job before the PR. Ref https://neondb.slack.com/archives/C059ZC138NR/p1691669522160229 ## Summary of changes - Move test results storing logic from a workflow to `actions/allure-report-generate`	2023-08-15 15:32:30 +01:00
George MacKerron	218be9eb32	Added deferrable transaction option to http batch queries (#4993 ) ## Problem HTTP batch queries currently allow us to set the isolation level and read only, but not deferrable. ## Summary of changes Add support for deferrable. Echo deferrable status in response headers only if true. Likewise, now echo read-only status in response headers only if true.	2023-08-15 14:52:00 +01:00
Joonas Koivunen	8198b865c3	Remote storage metrics follow-up (#4957 ) #4942 left old metrics in place for migration purposes. It was noticed that from new metrics the total number of deleted objects was forgotten, add it. While reviewing, it was noticed that the delete_object could just be delete_objects of one. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-08-15 12:30:27 +03:00
Arpad Müller	baf395983f	Turn BlockLease associated type into an enum (#4982 ) ## Problem The `BlockReader` trait is not ready to be asyncified, as associated types are not supported by asyncification strategies like via the `async_trait` macro, or via adopting enums. ## Summary of changes Remove the `BlockLease` associated type from the `BlockReader` trait and turn it into an enum instead, bearing the same name. The enum has two variants, one of which is gated by `#[cfg(test)]`. Therefore, outside of test settings, the enum has zero overhead over just having the `PageReadGuard`. Using the enum allows us to impl `BlockReader` without needing the page cache. Part of https://github.com/neondatabase/neon/issues/4743	2023-08-14 18:48:09 +02:00
Arpad Müller	ce7efbe48a	Turn BlockCursor::{read_blob,read_blob_into_buf} async fn (#4905 ) ## Problem The `BlockCursor::read_blob` and `BlockCursor::read_blob_into_buf` functions are calling `read_blk` internally, so if we want to make that function async fn, they need to be async themselves. ## Summary of changes * We first turn `ValueRef::load` into an async fn. * Then, we switch the `RwLock` implementation in `InMemoryLayer` to use the one from `tokio`. * Last, we convert the `read_blob` and `read_blob_into_buf` functions into async fn. In three instances we use `Handle::block_on`: * one use is in compaction code, which currently isn't async. We put the entire loop into an `async` block to prevent the potentially hot loop from doing cross-thread operations. * one use is in dumping code for `DeltaLayer`. The "proper" way to address this would be to enable the visit function to take async closures, but then we'd need to be generic over async vs non-async, which [isn't supported by rust right now](https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-generics.html). The other alternative would be to do a first pass where we cache the data into memory, and only then to dump it. * the third use is in writing code, inside a loop that copies from one file to another. It is synchronous and we'd like to keep it that way (for now?). Part of #4743	2023-08-14 17:20:37 +02:00
Tristan Partin	ef4a76c01e	Update Postgres to v15.4 and v14.9 (#4965 )	2023-08-14 16:19:45 +01:00
George MacKerron	1ca08cc523	Changed batch query body to from [...] to { queries: [...] } (#4975 ) ## Problem It's nice if `single query : single response :: batch query : batch response`. But at present, in the single case we send `{ query: '', params: [] }` and get back a single `{ rows: [], ... }` object, while in the batch case we send an array of `{ query: '', params: [] }` objects and get back not an array of `{ rows: [], ... }` objects but a `{ results: [ { rows: [] , ... }, { rows: [] , ... }, ... ] }` object instead. ## Summary of changes With this change, the batch query body becomes `{ queries: [{ query: '', params: [] }, ... ] }`, which restores a consistent relationship between the request and response bodies.	2023-08-14 16:07:33 +01:00
Dmitry Rodionov	4626d89eda	Harden retries on tenant/timeline deletion path. (#4973 ) Originated from test failure where we got SlowDown error from s3. The patch generalizes `download_retry` to not be download specific. Resulting `retry` function is moved to utils crate. `download_retries` is now a thin wrapper around this `retry` function. To ensure that all needed retries are in place test code now uses `test_remote_failures=1` setting. Ref https://neondb.slack.com/archives/C059ZC138NR/p1691743624353009	2023-08-14 17:16:49 +03:00
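The generalization described in #4973 above (a generic `retry` with `download_retry` as a thin wrapper) might look roughly like the sketch below. The real helper lives in the utils crate and is async; this version is synchronous, has no backoff, and its signature, the `is_permanent` predicate, and the attempt limit of 3 are all assumptions made for illustration.

```rust
/// Hypothetical synchronous retry helper: calls `op` until it succeeds,
/// fails with an error that `is_permanent` classifies as not retryable,
/// or `max_attempts` is reached.
fn retry<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    is_permanent: impl Fn(&E) -> bool,
    max_attempts: u32,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if is_permanent(&e) || attempt >= max_attempts => return Err(e),
            // A real implementation would sleep with backoff here.
            Err(_) => attempt += 1,
        }
    }
}

/// Thin wrapper in the spirit of `download_retry`: treats every error
/// as retryable and uses a fixed (assumed) attempt limit.
fn download_retry<T, E>(op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    retry(op, |_| false, 3)
}
```

Keeping the classification of errors in a caller-supplied predicate is what lets the same loop serve both downloads and deletions, where a transient error such as S3 `SlowDown` should be retried and others should not.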
Arseny Sher	49c57c0b13	Add neon_local to docker image. People sometimes ask about this. https://community.neon.tech/t/is-the-neon-local-binary-in-any-of-the-official-docker-images/360/2	2023-08-14 14:08:51 +03:00
John Spray	d3a97fdf88	pageserver: avoid incrementing access time when reading layers for compaction (#4971 ) ## Problem Currently, image generation reads delta layers before writing out subsequent image layers, which updates the access time of the delta layers and effectively puts them at the back of the queue for eviction. This is the opposite of what we want, because after a delta layer is covered by a later image layer, it's likely that subsequent reads of latest data will hit the image rather than the delta layer, so the delta layer should be quite a good candidate for eviction. ## Summary of changes `RequestContext` gets a new `ATimeBehavior` field, and a `RequestContextBuilder` helper so that we can optionally add the new field without growing `RequestContext::new` every time we add something like this. Request context is passed into the `record_access` function, and the access time is not updated if `ATimeBehavior::Skip` is set. The compaction background task constructs its request context with this skip policy. Closes: https://github.com/neondatabase/neon/issues/4969	2023-08-14 10:18:22 +01:00
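The access-time policy described in #4971 above can be sketched in a few lines. The names `RequestContext`, `RequestContextBuilder`, `ATimeBehavior`, `ATimeBehavior::Skip`, and `record_access` come from the commit message; the `Update` variant, the `Layer` struct, the builder methods, and the use of a plain `u64` timestamp are assumptions for the sake of a self-contained example.

```rust
/// `Skip` is named in the commit message; `Update` is an assumed name
/// for the default behavior.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ATimeBehavior {
    Update,
    Skip,
}

struct RequestContext {
    atime_behavior: ATimeBehavior,
}

/// Builder so that new optional fields do not grow a constructor signature.
struct RequestContextBuilder {
    inner: RequestContext,
}

impl RequestContextBuilder {
    fn new() -> Self {
        Self { inner: RequestContext { atime_behavior: ATimeBehavior::Update } }
    }

    fn atime_behavior(mut self, behavior: ATimeBehavior) -> Self {
        self.inner.atime_behavior = behavior;
        self
    }

    fn build(self) -> RequestContext {
        self.inner
    }
}

/// Hypothetical layer that only tracks its last access timestamp.
struct Layer {
    last_access: Option<u64>,
}

impl Layer {
    /// Updates the access time unless the context asks to skip it.
    fn record_access(&mut self, now: u64, ctx: &RequestContext) {
        if ctx.atime_behavior == ATimeBehavior::Skip {
            return;
        }
        self.last_access = Some(now);
    }
}
```

A background compaction task would build its context with `ATimeBehavior::Skip`, so reading a delta layer for image generation leaves its eviction priority untouched.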
Arthur Petukhovsky	763f5c0641	Remove dead code from walproposer_utils.c (#4525 ) This code was mostly copied from walsender.c and the idea was to keep it similar to walsender.c, so that we can easily copy-paste future upstream changes to walsender.c to walproposer_utils.c, too. But right now I see that deleting it doesn't break anything, so it's better to remove unused parts.	2023-08-14 09:49:51 +01:00
Arseny Sher	8173813584	Add term=n option to safekeeper START_REPLICATION command. It allows term leader to ensure he pulls data from the correct term. Absence of it wasn't very problematic due to CRC checks, but let's be strict. walproposer still doesn't use it as we're going to remove recovery completely from it.	2023-08-12 12:20:13 +03:00
Felix Prasanna	cc2d00fea4	bump vm-builder version to v0.15.4 (#4980 ) Patches a bug in vm-builder where it did not include enough parameters in the query string. These parameters are `host=localhost port=5432`. These parameters were not necessary for the monitor because the `pq` go postgres driver included them by default.	2023-08-11 14:26:53 -04:00
Arpad Müller	9ffccb55f1	InMemoryLayer: move end_lsn out of the lock (#4963 ) ## Problem In some places, the lock on `InMemoryLayerInner` is only created to obtain `end_lsn`. This is not needed however, if we move `end_lsn` to `InMemoryLayer` instead. ## Summary of changes Make `end_lsn` a member of `InMemoryLayer`, and do less locking of `InMemoryLayerInner`. `end_lsn` is changed from `Option<Lsn>` into an `OnceLock<Lsn>`. Thanks to this change, we don't need to lock any more in three functions. Part of #4743 . Suggested in https://github.com/neondatabase/neon/pull/4905#issuecomment-1666458428 .	2023-08-11 18:01:02 +02:00
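The change in #4963 above, moving `end_lsn` out of the lock and into a `OnceLock`, can be illustrated with a simplified sketch. The struct and field names follow the commit message; the `Lsn` alias for `u64`, the `freeze` and `end_lsn` methods, the `values` field, and the use of `std::sync::RwLock` are assumptions.

```rust
use std::sync::{OnceLock, RwLock};

/// Assumed stand-in for the real `Lsn` type.
type Lsn = u64;

struct InMemoryLayerInner {
    values: Vec<u8>,
}

struct InMemoryLayer {
    /// Set exactly once when the layer is frozen; readable without locking `inner`.
    end_lsn: OnceLock<Lsn>,
    inner: RwLock<InMemoryLayerInner>,
}

impl InMemoryLayer {
    fn new() -> Self {
        Self {
            end_lsn: OnceLock::new(),
            inner: RwLock::new(InMemoryLayerInner { values: Vec::new() }),
        }
    }

    /// Records the end LSN; returns false if it was already set.
    fn freeze(&self, end_lsn: Lsn) -> bool {
        self.end_lsn.set(end_lsn).is_ok()
    }

    /// Returns the end LSN, if the layer has been frozen. No lock is taken.
    fn end_lsn(&self) -> Option<Lsn> {
        self.end_lsn.get().copied()
    }
}
```

`OnceLock` fits here because the value goes from unset to set at most once, which is the same state space as the previous `Option<Lsn>` but without needing the surrounding lock for reads.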
Arthur Petukhovsky	3a6b99f03c	proxy: improve http logs (#4976 ) Fix multiline logs on websocket errors and always print sql-over-http errors sent to the user.	2023-08-11 18:18:07 +03:00
Dmitry Rodionov	d39fd66773	tests: remove redundant wait_while (#4952 ) Remove redundant `wait_while` in tests. It had only one usage. Use `wait_tenant_status404`. Related: https://github.com/neondatabase/neon/pull/4855#discussion_r1289610641	2023-08-11 10:18:13 +03:00
Arthur Petukhovsky	73d7a9bc6e	proxy: propagate ws span (#4966 ) Found this log on staging: ``` 2023-08-10T17:42:58.573790Z INFO handling interactive connection from client protocol="ws" ``` We seem to be losing websocket span in spawn, this patch fixes it.	2023-08-10 23:38:22 +03:00
Sasha Krassovsky	3a71cf38c1	Grant BypassRLS to new neon_superuser roles (#4935 )	2023-08-10 21:04:45 +02:00
Conrad Ludgate	25c66dc635	proxy: http logging to 11 (#4950 ) ## Problem Mysterious network issues ## Summary of changes Log a lot more about HTTP/DNS in hopes of detecting more of the network errors	2023-08-10 17:49:24 +01:00
George MacKerron	538373019a	Increase max sql-over-http response size from 1MB to 10MB (#4961 ) ## Problem 1MB response limit is very small. ## Summary of changes This data is not yet tracked, so we shouldn't raise the limit too high yet. But as discussed with @kelvich and @conradludgate, this PR lifts it to 10MB, and also adds details of the limit to the error response.	2023-08-10 17:21:52 +01:00
Dmitry Rodionov	c58b22bacb	Delete tenant's data from s3 (#4855 ) ## Summary of changes For context see https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md Create Flow to delete tenant's data from pageserver. The approach heavily mimics previously implemented timeline deletion implemented mostly in https://github.com/neondatabase/neon/pull/4384 and followed up in https://github.com/neondatabase/neon/pull/4552 For remaining deletion related issues consult with deletion project here: https://github.com/orgs/neondatabase/projects/33 resolves #4250 resolves https://github.com/neondatabase/neon/issues/3889 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-08-10 18:53:16 +03:00
Alek Westover	17aea78aa7	delete already present files from library index (#4955 )	2023-08-10 16:51:16 +03:00
Joonas Koivunen	71f9d9e5a3	test: allow slow shutdown warning (#4953 ) Introduced in #4886, did not consider that tests with real_s3 could sometimes go over the limit. Do not fail tests because of that.	2023-08-10 15:55:41 +03:00
Alek Westover	119b86480f	test: make pg_regress less flaky, hopefully (#4903 ) `pg_regress` is flaky: https://github.com/neondatabase/neon/issues/559 Consolidated `CHECKPOINT` to `check_restored_datadir_content`, add a wait for `wait_for_last_flush_lsn`. Some recently introduced flakiness was fixed with #4948. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-08-10 15:24:43 +03:00
Arpad Müller	fa1f87b268	Make the implementation of DiskBtreeReader::visit non-recursive (#4884 ) ## Problem The `DiskBtreeReader::visit` function calls `read_blk` internally, and while #4863 converted the API of `visit` to async, the internal function is still recursive. So, analogously to #4838, we turn the recursive function into an iterative one. ## Summary of changes First, we prepare the change by moving the for loop outside of the case switch, so that we only have one loop that calls recursion. Then, we switch from using recursion to an approach where we store the search path inside the tree on a stack on the heap. The caller of the `visit` function can control when the search over the B-Tree ends, by returning `false` from the closure. This is often used to either only find one specific entry (by always returning `false`), but it is also used to iterate over all entries of the B-tree (by always returning `true`), or to look for ranges (mostly in tests, but `get_value_reconstruct_data` also has such a use). Each stack entry contains two things: the block number (aka the block's offset), and a children iterator. The children iterator is constructed depending on the search direction, and with the results of a binary search over node's children list. It is the only thing that survives a spilling/push to the stack, everything else is reconstructed. In other words, each stack spill, will, if the search is still ongoing, cause an entire re-parsing of the node. Theoretically, this would be a linear overhead in the number of leaves the search visits. However, one needs to note: * the workloads to look for a specific entry are just visiting one leaf, ever, so this is mostly about workloads that visit larger ranges, including ones that visit the entire B-tree. 
* the requests first hit the page cache, so often the cost is just in terms of node deserialization * for nodes that only have leaf nodes as children, no spilling to the stack-on-heap happens (outside of the initial request where the iterator is `None`). In other words, for balanced trees, the spilling overhead is $\Theta\left(\frac{n}{b^2}\right)$, where `b` is the branching factor and `n` is the number of nodes in the tree. The B-Trees in the current implementation have a branching factor of roughly `PAGE_SZ/L` where `PAGE_SZ` is 8192, and `L` is `DELTA_KEY_SIZE = 26` or `KEY_SIZE = 18` in production code, so this gives us an estimate that we'd be re-loading an inner node for every 99000 leaves in the B-tree in the worst case. Due to these points above, I'd say that not fully caching the inner nodes with inner children is reasonable, especially as we also want to be fast for the "find one specific entry" workloads, where the stack content is never accessed: any action to make the spilling computationally more complex would contribute to wasted cycles here, even if these workloads "only" spill one node for each depth level of the b-tree (which is practically always a low single-digit number, Kleppmann points out on page 81 that for branching factor 500, a four level B-tree with 4 KB pages can store 250 TB of data). But disclaimer, this is all stuff I thought about in my head, I have not confirmed it with any benchmarks or data. Builds on top of #4863, part of #4743	2023-08-10 13:43:13 +02:00
Joonas Koivunen	db48f7e40d	test: mark test_download_extensions.py skipped for now (#4948 ) The test mutates a shared directory which does not work with multiple concurrent tests. It is being fixed, so this should be a very temporary band-aid. Cc: #4949.	2023-08-10 11:05:27 +00:00
Alek Westover	e157b16c24	if control file already exists ignore the remote version of the extension (#4945 )	2023-08-09 18:56:09 +00:00
bojanserafimov	94ad9204bb	Measure compute-pageserver latency (#4901 ) Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-08-09 13:20:30 -04:00
Joonas Koivunen	c8aed107c5	refactor: make {Delta,Image}LayerInners usable without {Delta,Image}Layer (#4937 ) On the quest of #4745, these are more related to the task at hand, but still small. In addition to $subject, allow `ValueRef<ResidentDeltaLayer>`.	2023-08-09 19:18:44 +03:00
Anastasia Lubennikova	da128a509a	fix pkglibdir path for remote extensions	2023-08-09 19:13:11 +03:00
Alexander Bayandin	5993b2bedc	test_runner: remove excessive timeouts (#4659 ) ## Problem For some tests, we override the default timeout (300s / 5m) with larger values like 600s / 10m or even 1800s / 30m, even if it's not required. I've collected some statistics (for the last 60 days) for test durations: \| test \| max (s) \| p99 (s) \| p50 (s) \| count \| \|-----------------------------------\|---------\|---------\|---------\|-------\| \| test_hot_standby \| 9 \| 2 \| 2 \| 5319 \| \| test_import_from_vanilla \| 16 \| 9 \| 6 \| 5692 \| \| test_import_from_pageserver_small \| 37 \| 7 \| 5 \| 5719 \| \| test_pg_regress \| 101 \| 73 \| 44 \| 5642 \| \| test_isolation \| 65 \| 56 \| 39 \| 5692 \| A couple of tests that I left with a custom 600s / 10m timeout: \| test \| max (s) \| p99 (s) \| p50 (s) \| count \| \|-----------------------------------\|---------\|---------\|---------\|-------\| \| test_gc_cutoff \| 456 \| 224 \| 109 \| 5694 \| \| test_pageserver_chaos \| 528 \| 267 \| 121 \| 5712 \| ## Summary of changes - Remove `@pytest.mark.timeout` annotation from several tests	2023-08-09 16:27:53 +01:00
Anastasia Lubennikova	4ce7aa9ffe	Fix extensions download error handling (#4941 ) Don't panic if library or extension is not found in remote extension storage or download has failed. Instead, log the error and proceed - if file is not present locally as well, postgres will fail with postgres error. If it is a shared_preload_library, it won't start, because of bad config. Otherwise, it will just fail to run the SQL function/ command that needs the library. Also, don't try to download extensions if remote storage is not configured.	2023-08-09 15:37:51 +03:00
Joonas Koivunen	cbd04f5140	remove_remote_layer: uninteresting refactorings (#4936 ) In the quest to solve #4745 by moving the download/evictedness to be internally mutable factor of a Layer and get rid of `trait PersistentLayer` at least for prod usage, `layer_removal_cs`, we present some misc cleanups. --------- Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2023-08-09 14:35:56 +03:00
Arpad Müller	1037a8ddd9	Explain why VirtualFile stores tenant_id and timeline_id as strings (#4930 ) ## Problem One might wonder why the code here doesn't use `TimelineId` or `TenantId`. I originally had a refactor to use them, but then discarded it, because converting to strings each time there is a read or write is wasteful. ## Summary of changes We add some docs explaining why no `TimelineId` or `TenantId` is used here.	2023-08-08 23:41:09 +02:00
Felix Prasanna	6661f4fd44	bump vm-builder version to v0.15.0-alpha1 (#4934 )	2023-08-08 15:22:10 -05:00
Alexander Bayandin	b9f84b9609	Improve test results format (#4549 ) ## Problem The current test history format is a bit inconvenient: - It stores all test results in one row, so all queries should include subqueries which expand the test - It includes duplicated test results if the rerun is triggered manually for one of the test jobs (for example, if we rerun `debug-pg14`, then the report will include duplicates for other build types/postgres versions) - It doesn't have a reference to run_id, which we use to create a link to allure report Here's the proposed new format: ``` id BIGSERIAL PRIMARY KEY, parent_suite TEXT NOT NULL, suite TEXT NOT NULL, name TEXT NOT NULL, status TEXT NOT NULL, started_at TIMESTAMPTZ NOT NULL, stopped_at TIMESTAMPTZ NOT NULL, duration INT NOT NULL, flaky BOOLEAN NOT NULL, build_type TEXT NOT NULL, pg_version INT NOT NULL, run_id BIGINT NOT NULL, run_attempt INT NOT NULL, reference TEXT NOT NULL, revision CHAR(40) NOT NULL, raw JSONB COMPRESSION lz4 NOT NULL, ``` ## Summary of changes - Misc allure changes: - Update allure to 2.23.1 - Delete files from previous runs in HTML report (by using `sync --delete` instead of `mv`) - Use `test-cases/*.json` instead of `suites.json`, using this directory allows us to catch all reruns. - Until we migrated `scripts/flaky_tests.py` and `scripts/benchmark_durations.py` store test results in 2 formats (in 2 different databases).	2023-08-08 20:09:38 +01:00
Felix Prasanna	459253879e	Revert "bump vm-builder to v0.15.0-alpha1 (#4895 )" (#4931 ) This reverts commit `682dfb3a31`.	2023-08-08 20:21:39 +03:00
Conrad Ludgate	0fa85aa08e	proxy: delay auth on retry (#4929 ) ## Problem When an endpoint is shutting down, it can take a few seconds. Currently when starting a new compute, this causes an "endpoint is in transition" error. We need to add delays before retrying to ensure that we allow time for the endpoint to shut down properly. ## Summary of changes Adds a delay before retrying in auth. connect_to_compute already has this delay.	2023-08-08 17:19:24 +03:00
Cuong Nguyen	039017cb4b	Add new flag for advertising pg address (#4898 ) ## Problem The safekeeper advertises the same address specified in `--listen-pg`, which is problematic when the listening address is different from the address that the pageserver can use to connect to the safekeeper. ## Summary of changes Add a new optional flag called `--advertise-pg` for the address to be advertised. If this flag is not specified, the behavior is the same as before.	2023-08-08 14:26:38 +03:00
John Spray	4dc644612b	pageserver: expose prometheus metrics for startup time (#4893 ) ## Problem Currently to know how long pageserver startup took requires inspecting logs. ## Summary of changes `pageserver_startup_duration_ms` metric is added, with label `phase` for different phases of startup. These are broken down by phase, where the phases correspond to the existing wait points in the code: - Start of doing I/O - When tenant load is done - When initial size calculation is done - When background jobs start - Then "complete" when everything is done. `pageserver_startup_is_loading` is a 0/1 gauge that indicates whether we are in the initial load of tenants. `pageserver_tenant_activation_seconds` is a histogram of time in seconds taken to activate a tenant. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-08-08 12:41:37 +03:00
Anastasia Lubennikova	6d17d6c775	Use WebIdentityTokenCredentialsProvider to access remote extensions (#4921 ) Fixes access to s3 buckets that use IAM roles for service accounts access control method --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-08-08 12:37:22 +03:00
John Spray	4892a5c5b7	pageserver: avoid logging the "ERROR" part of DbErrors that are successes (#4902 ) ## Problem The pageserver<->safekeeper protocol uses error messages to indicate end of stream. pageserver already logs these at INFO level, but the inner error message includes the word "ERROR", which interferes with log searching. Example: ``` walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8 ``` The inner DbError has a severity of ERROR so DbError's Display implementation includes that ERROR, even though we are actually logging the error at INFO level. ## Summary of changes Introduce an explicit WalReceiverError type, and in its From<> for postgres errors, apply the logic from ExpectedError, for expected errors, and a new condition for successes. The new output looks like: ``` walreceiver connection handling ended: Successful completion: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes ```	2023-08-08 12:35:24 +03:00
John Spray	33cb1e9c0c	tests: enable higher concurrency and adjust tests with outlier runtime (#4904 ) ## Problem I spent a few minutes seeing how fast I could get our regression test suite to run on my workstation, for when I want to run a "did I break anything?" smoke test before pushing to CI. - Test runtime was dominated by a couple of tests that run for longer than all the others take together - Test concurrency was limited to <16 by the ports-per-worker setting There's no "right answer" for how long a test should be, but as a rule of thumb, no one test should run for much longer than the time it takes to run all the other tests together. ## Summary of changes - Make the ports per worker setting dynamic depending on worker count - Modify the longest running tests to run for a shorter time (`test_duplicate_layers` which uses a pgbench runtime) or fewer iterations (`test_restarts_frequent_checkpoints`).	2023-08-08 09:16:21 +01:00
Arpad Müller	9559ef6f3b	Sort by (key, lsn), not just key (#4918 ) ## Problem PR #4839 didn't output the keys/values in lsn order, but for a given key, the lsns were kept in incoming file order. I think the ordering by lsn is expected. ## Summary of changes We now also sort by `(key, lsn)`, like we did before #4839.	2023-08-07 18:14:15 +03:00
John Spray	64a4fb35c9	pagectl: skip `metadata` file in `pagectl draw-timeline` (#4872 ) ## Problem Running `pagectl draw-timeline` on a pageserver directory wasn't working out of the box because it trips up on the `metadata` file. ## Summary of changes Just ignore the `metadata` file in the list of input files passed to `draw-timeline`.	2023-08-07 08:24:50 +01:00
MMeent	95ec42f2b8	Change log levels on various operations (#4914 ) Cache changes are now DEBUG2 Logs that indicate disabled caches now explicitly call out that the file cache is disabled on WARNING level instead of LOG/INFO	2023-08-06 20:37:09 +02:00
Joonas Koivunen	ba9df27e78	fix: silence not found error when removing ephemeral (#4900 ) We currently cannot drop tenant before removing its directory, or use Tenant::drop for this. This creates unnecessary or inactionable warnings during detach at least. Silence the most typical, file not found. Log remaining at `error!`. Cc: #2442	2023-08-04 21:03:17 +03:00
Joonas Koivunen	ea3e1b51ec	Remote storage metrics (#4892 ) We don't know how our s3 remote_storage is performing, or if it's blocking the shutdown. Well, for sampling reasons, we will not really know even after this PR. Add metrics: - align remote_storage metrics towards #4813 goals - histogram `remote_storage_s3_request_seconds{request_type=(get_object\|put_object\|delete_object\|list_objects), result=(ok\|err\|cancelled)}` - histogram `remote_storage_s3_wait_seconds{request_type=(same kinds)}` - counter `remote_storage_s3_cancelled_waits_total{request_type=(same kinds)}` Follow-up work: - After release, remove the old metrics, migrate dashboards Histogram buckets are rough guesses, need to be tuned. In pageserver we have a download timeout of 120s, so I think the 100s bucket is quite nice.	2023-08-04 21:01:29 +03:00
John Spray	e3e739ee71	pageserver: remove no-op attempt to report fail/failpoint feature (#4879 ) ## Problem The current output from a prod binary at startup is: ``` git-env:765455bca22700e49c053d47f44f58a6df7c321f failpoints: true, features: [] launch_timestamp: 2023-08-02 10:30:35.545217477 UTC ``` It's confusing to read that line, then read the code and think "if failpoints is true, but not in the features list, what does that mean?". As far as I can tell, the check of `fail/failpoints` is just always false because cargo doesn't expose features across crates like this: the `fail/failpoints` syntax works in the cargo CLI but not from a macro in some crate other than `fail`. ## Summary of changes Remove the lines that try to check `fail/failpoints` from the pageserver entrypoint module. This has no functional impact but makes the code slightly easier to understand when trying to make sense of the line printed on startup.	2023-08-04 17:56:31 +01:00
Conrad Ludgate	606caa0c5d	proxy: update logs and span data to be consistent and have more info (#4878 ) ## Problem Pre-requisites for #4852 and #4853 ## Summary of changes 1. Includes the client's IP address (which we already log) with the span info so we can have it on all associated logs. This makes making dashboards based on IP addresses easier. 2. Switch to a consistent error/warning log for errors during connection. This includes error, num_retries, retriable=true/false and a consistent log message that we can grep for.	2023-08-04 12:37:18 +03:00
Arpad Müller	6a906c68c9	Make {DeltaLayer,ImageLayer}::{load,load_inner} async (#4883 ) ## Problem The functions `DeltaLayer::load_inner` and `ImageLayer::load_inner` are calling `read_blk` internally, which we would like to turn into an async fn. ## Summary of changes We switch from `once_cell`'s `OnceCell` implementation to the one in `tokio` in order to be able to call an async `get_or_try_init` function. Builds on top of #4839, part of #4743	2023-08-04 12:35:45 +03:00
Felix Prasanna	682dfb3a31	bump vm-builder to v0.15.0-alpha1 (#4895 )	2023-08-03 14:26:14 -04:00
Joonas Koivunen	5263b39e2c	fix: shutdown logging again (#4886 ) During deploys of 2023-08-03 we logged too much on shutdown. Fix the logging by timing each top level shutdown step, and possibly warn on it taking more than a rough threshold, based on how long I think it possibly should be taking. Also remove all shutdown logging from background tasks since there is already "shutdown is taking a long time" logging. Co-authored-by: John Spray <john@neon.tech>	2023-08-03 20:34:05 +03:00
Arpad Müller	a241c8b2a4	Make DiskBtreeReader::{visit, get} async (#4863 ) ## Problem `DiskBtreeReader::get` and `DiskBtreeReader::visit` both call `read_blk` internally, which we would like to make async in the future. This PR focuses on making the interface of these two functions `async`. There is further work to be done to make `visit` non-recursive, similar to #4838. For that, see https://github.com/neondatabase/neon/pull/4884. Builds on top of https://github.com/neondatabase/neon/pull/4839, part of https://github.com/neondatabase/neon/issues/4743 ## Summary of changes Make `DiskBtreeReader::get` and `DiskBtreeReader::visit` async functions and `await` in the places that call these functions.	2023-08-03 17:36:46 +02:00
John Spray	e71d8095b9	README: make it a bit clearer how to get regression tests running (#4885 ) ## Problem When setting up for the first time I hit a couple of nits running tests: - It wasn't obvious that `openssl` and `poetry` were needed (poetry is mentioned kind of obliquely via "dependency installation notes" rather than being in the list of rpm/deb packages to install). - It wasn't obvious how to get the tests to run for just particular parameters (e.g. just release mode) ## Summary of changes Add openssl and poetry to the package lists. Add an example of how to run pytest for just a particular build type and postgres version.	2023-08-03 15:23:23 +01:00
Dmitry Rodionov	1497a42296	tests: split neon_fixtures.py (#4871 ) ## Problem neon_fixtures.py has grown to an unmanageable size. It attracts conflicts. When adding specific utils under, for example, `fixtures/pageserver`, things sometimes need to import stuff from `neon_fixtures.py`, which creates a circular import. This is usually only needed for type annotations, so the `typing.TYPE_CHECKING` flag can mask the issue. Nevertheless I believe that splitting neon_fixtures.py into smaller parts is a better approach. Currently the PR contains small things, but I plan to continue and move NeonEnv to its own `fixtures.env` module. To keep the diff small I think this PR can already be merged to cause fewer conflicts. UPD: it looks like currently it's not really possible to fully avoid usage of `typing.TYPE_CHECKING`, because some components directly depend on each other, i.e. the Env -> Cli -> Env cycle. But it's still worth it to avoid it in as many places as possible. And decreasing neon_fixture's size still makes sense.	2023-08-03 17:20:24 +03:00
Alexander Bayandin	cd33089a66	test_runner: set AWS credentials for endpoints (#4887 ) ## Problem If AWS credentials are not set locally (via AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars) `test_remote_library[release-pg15-mock_s3]` test fails with the following error: ``` ERROR could not start the compute node: Failed to download a remote file: Failed to download S3 object: failed to construct request ``` ## Summary of changes - set AWS credentials for endpoints programmatically	2023-08-03 16:44:48 +03:00
Arpad Müller	416c14b353	Compaction: sort on slices directly instead of kmerge (#4839 ) ## Problem The k-merge in pageserver compaction currently relies on iterators over the keys and also over the values. This approach does not support async code because we are using iterators and those don't support async in general. Also, the k-merge implementation we use doesn't support async either. Instead, as we already load all the keys into memory, just do sorting in-memory. ## Summary of changes The PR can be read commit-by-commit, but most importantly, it: * Stops using kmerge in compaction, using slice sorting instead. * Makes `load_keys` and `load_val_refs` async, using `Handle::block_on` in the compaction code as we don't want to turn the compaction function, called inside `spawn_blocking`, into an async fn. Builds on top of #4836, part of https://github.com/neondatabase/neon/issues/4743	2023-08-03 15:30:41 +02:00
John Spray	df49a9b7aa	pagekeeper: suppress error logs in shutdown/detach (#4876 ) ## Problem Error messages like this coming up during normal operations: ``` Compaction failed, retrying in 2s: timeline is Stopping Compaction failed, retrying in 2s: Cannot run compaction iteration on inactive tenant ``` ## Summary of changes Add explicit handling for the shutdown case in these locations, to suppress error logs.	2023-08-02 19:31:09 +01:00
bojanserafimov	4ad0c8f960	compute_ctl: Prewarm before starting http server (#4867 )	2023-08-02 14:19:06 -04:00
Joonas Koivunen	e0b05ecafb	build: ca-certificates need to be present (#4880 ) as needed since #4715 or this will happen: ``` ERROR panic{thread=main location=.../hyper-rustls-0.23.2/src/config.rs:48:9}: no CA certificates found ```	2023-08-02 20:34:21 +03:00
Vadim Kharitonov	ca4d71a954	Upgrade pg_embedding to 0.3.5 (#4873 )	2023-08-02 18:18:33 +03:00
Alexander Bayandin	381f41e685	Bump cryptography from 41.0.2 to 41.0.3 (#4870 )	2023-08-02 14:10:36 +03:00
Alek Westover	d005c77ea3	Tar Remote Extensions (#4715 ) Add infrastructure to dynamically load postgres extensions and shared libraries from remote extension storage. Before postgres starts, 'compute_ctl' downloads the list of available remote extensions and libraries, and also downloads 'shared_preload_libraries'. After postgres is running, 'compute_ctl' listens for HTTP requests to load files. Postgres has a new GUC 'extension_server_port' to specify the port on which 'compute_ctl' listens for requests. When PostgreSQL requests a file, 'compute_ctl' downloads it. See more details about feature design and remote extension storage layout in docs/rfcs/024-extension-loading.md --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alek Westover <alek.westover@gmail.com>	2023-08-02 12:38:12 +03:00
Joonas Koivunen	04776ade6c	fix(consumption): rename _size_ => _data_ (#4866 ) I failed at renaming the metric middle part while managing to do a great job with the suffix. Fix the middle part as well.	2023-08-01 19:18:25 +03:00
Dmitry Rodionov	c3fe335eaf	wait for tenant to be active before polling for timeline absence (#4856 ) ## Problem https://neon-github-public-dev.s3.amazonaws.com/reports/main/5692829577/index.html#suites/f588e0a787c49e67b29490359c589fae/4c50937643d68a66 ## Summary of changes wait for tenant to be active after restart before polling for timeline absence	2023-08-01 18:28:18 +03:00
Joonas Koivunen	3a00a5deb2	refactor: tidy consumption metrics (#4860 ) Tidying up I've been wanting to do for some time. Follow-up to #4857.	2023-08-01 18:14:16 +03:00
Joonas Koivunen	78fa2b13e5	test: written_size_bytes_delta (#4857 ) Two stabs at this, by mocking a http receiver and the globals out (now reverted) and then by separating the timeline dependency and just testing what kind of events certain timelines produce. I think this pattern could work for some of our problems. Follow-up to #4822.	2023-08-01 15:30:36 +03:00
John Spray	7c076edeea	pageserver: tweak period of imitate_layer_accesses (#4859 ) ## Problem When the eviction threshold is an integer multiple of the eviction period, it is unreliable to skip imitating accesses based on whether the last imitation was more recent than the threshold. This is because a finite time passes between the time used for the periodic execution and the 'now' time used for updating last_layer_access_imitation. When this is just a few milliseconds, and everything else is on-time, then a 5 second threshold with a 1 second period will end up entering its 5th iteration slightly _less than_ 5 seconds since last_layer_access_imitation, and thereby skipping instead of running the imitation. If a few milliseconds then pass before we check the access time of a file that _should_ have been bumped by the imitation pass, then we end up evicting something we shouldn't have evicted. ## Summary of changes We can make this race far less likely by using the threshold minus one interval as the period for re-executing the imitate_layer_accesses: that way we're not vulnerable to racing by just a few millis, and there would have to be a delay of the order `period` to cause us to wrongly evict a layer. This is not a complete solution: it would be good to revisit this and use a non-walltime mechanism for pinning these layers into local storage, rather than relying on bumping access times.	2023-08-01 13:17:49 +01:00
Arpad Müller	69528b7c30	Prepare k-merge in compaction for async I/O (#4836 ) ## Problem The k-merge in pageserver compaction currently relies on iterators over the keys and also over the values. This approach does not support async code because we are using iterators and those don't support async in general. Also, the k-merge implementation we use doesn't support async either. Instead, as we already load all the keys into memory, the plan is to just do the sorting in-memory for now, switch to async, and then once we want to support workloads that don't have all keys stored in memory, we can look into switching to a k-merge implementation that supports async instead. ## Summary of changes The core of this PR is the move from functions on the `PersistentLayer` trait to return custom iterator types to inherent functions on `DeltaLayer` that return buffers with all keys or value references. Value references are a type we created in this PR, containing a `BlobRef` as well as an `Arc` pointer to the `DeltaLayerInner`, so that we can lazily load the values during compaction. This preserves the property of the current code. This PR does not switch us to doing the k-merge via sort on slices, but with this PR, doing such a switch is relatively easy and only requires changes of the compaction code itself. Part of https://github.com/neondatabase/neon/issues/4743	2023-08-01 13:38:35 +02:00
Konstantin Knizhnik	a98a80abc2	Define NEON_SMGR to make it possible for extensions to use Neon SMG API (#4840 ) ## Problem See https://neondb.slack.com/archives/C036U0GRMRB/p1689148023067319 ## Summary of changes Define NEON_SMGR in smgr.h ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-08-01 10:04:45 +03:00
Alex Chi Z	7b6c849456	support isolation level + read only for http batch sql (#4830 ) We will retrieve `neon-batch-isolation-level` and `neon-batch-read-only` from the http header, which sets the txn properties. https://github.com/neondatabase/serverless/pull/38#issuecomment-1653130981 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-08-01 02:59:11 +03:00
Joonas Koivunen	326189d950	consumption_metrics: send timeline_written_size_delta (#4822 ) We want to have timeline_written_size_delta which is defined as difference to the previously sent `timeline_written_size` from the current `timeline_written_size`. Solution is to send it. On the first round `disk_consistent_lsn` is used which is captured during `load` time. After that an incremental "event" is sent on every collection. Incremental "events" are not part of deduplication. I've added some infrastructure to allow somewhat typesafe `EventType::Absolute` and `EventType::Incremental` factories per metrics, now that we have our first `EventType::Incremental` usage.	2023-07-31 22:10:19 +03:00
bojanserafimov	ddbe170454	Prewarm compute nodes (#4828 )	2023-07-31 14:13:32 -04:00
Alexander Bayandin	39e458f049	test_compatibility: fix pg_tenant_only_port port collision (#4850 ) ## Problem Compatibility tests fail from time to time due to `pg_tenant_only_port` port collision (added in https://github.com/neondatabase/neon/pull/4731) ## Summary of changes - replace `pg_tenant_only_port` value in config with new port - remove old logic that we don't need anymore - unify config overrides	2023-07-31 20:49:46 +03:00
Vadim Kharitonov	e1424647a0	Update pg_embedding to 0.3.1 version (#4811 )	2023-07-31 20:23:18 +03:00
Yinnan Yao	705ae2dce9	Fix error message for listen_pg_addr_tenant_only binding (#4787 ) ## Problem Wrong use of `conf.listen_pg_addr` in `error!()`. ## Summary of changes Use `listen_pg_addr_tenant_only` instead of `conf.listen_pg_addr`. Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2023-07-31 14:40:52 +01:00
Conrad Ludgate	eb78603121	proxy: div by zero (#4845 ) ## Problem 1. In the CacheInvalid state loop, we weren't checking the `num_retries`. If this managed to get up to `32`, the retry_after procedure would compute 2^32 which would overflow to 0 and trigger a div by zero 2. When fixing the above, I started working on a flow diagram for the state machine logic and realised it was more complex than it had to be: a. We start in a `Cached` state b. `Cached`: call `connect_once`. After the first connect_once error, we always move to the `CacheInvalid` state, otherwise, we return the connection. c. `CacheInvalid`: we attempt to `wake_compute` and we either switch to Cached or we retry this step (or we error). d. `Cached`: call `connect_once`. We either retry this step or we have a connection (or we error) - After num_retries > 1 we never switch back to `CacheInvalid`. ## Summary of changes 1. Insert a `num_retries` check in the `handle_try_wake` procedure. Also using floats in the retry_after procedure to prevent the overflow entirely 2. Refactor connect_to_compute to be more linear in design.	2023-07-31 09:30:24 -04:00
John Spray	f0ad603693	pageserver: add unit test for deleted_at in IndexPart (#4844 ) ## Problem Existing IndexPart unit tests only exercised the version 1 format (i.e. without deleted_at set). ## Summary of changes Add a test that sets version to 2, and sets a value for deleted_at. Closes https://github.com/neondatabase/neon/issues/4162	2023-07-31 12:51:18 +01:00
Arpad Müller	e5183f85dc	Make DiskBtreeReader::dump async (#4838 ) ## Problem `DiskBtreeReader::dump` calls `read_blk` internally, which we want to make async in the future. As it is currently relying on recursion, and async doesn't like recursion, we want to find an alternative to that and instead traverse the tree using a loop and a manual stack. ## Summary of changes * Make `DiskBtreeReader::dump` and all the places calling it async * Make `DiskBtreeReader::dump` non-recursive internally and use a stack instead. It now deparses the node in each iteration, which isn't optimal, but on the other hand it's hard to store the node as it is referencing the buffer. Self referential data are hard in Rust. For a dumping function, speed isn't a priority so we deparse the node multiple times now (up to branching factor many times). Part of https://github.com/neondatabase/neon/issues/4743 I have verified that output is unchanged by comparing the output of this command both before and after this patch: ``` cargo test -p pageserver -- particular_data --nocapture ```	2023-07-31 12:52:29 +02:00
Joonas Koivunen	89ee8f2028	fix: demote warnings, fix flakiness (#4837 ) `WARN ... found future (image\|delta) layer` are not actionable log lines. They don't need to be warnings. `info!` is enough. This also fixes some known but not tracked flakiness in [`test_remote_timeline_client_calls_started_metric`][evidence]. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4829/5683495367/index.html#/testresult/34fe79e24729618b Closes #3369. Closes #4473.	2023-07-31 07:43:12 +00:00
Alex Chi Z	a8f3540f3d	proxy: add unit test for wake_compute (#4819 ) ## Problem ref https://github.com/neondatabase/neon/pull/4721, ref https://github.com/neondatabase/neon/issues/4709 ## Summary of changes This PR adds unit tests for wake_compute. The patch adds a new variant `Test` to auth backends. When `wake_compute` is called, we will verify if it is the exact operation sequence we are expecting. The operation sequence now contains 3 more operations: `Wake`, `WakeRetry`, and `WakeFail`. The unit tests for proxy connects are now complete and I'll continue work on WebSocket e2e test in future PRs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-28 19:10:55 -04:00
Konstantin Knizhnik	4338eed8c4	Make it possible to grant self permissions to self created roles (#4821 ) ## Problem See: https://neondb.slack.com/archives/C04USJQNLD6/p1689973957908869 ## Summary of changes Bump Postgres version ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-07-28 22:06:03 +03:00
Joonas Koivunen	2fbdf26094	test: raise timeout to avoid flakiness (#4832 ) 2s timeout was too tight for our CI, [evidence](https://neon-github-public-dev.s3.amazonaws.com/reports/main/5669956577/index.html#/testresult/6388e31182cc2d6e). 15s might be better. Also clean up code no longer needed after #4204.	2023-07-28 14:32:01 -04:00
Alexander Bayandin	7374634845	test_runner: clean up test_compatibility (#4770 ) ## Problem We have some amount of outdated logic in test_compatibility, that we don't need anymore. ## Summary of changes - Remove `PR4425_ALLOWED_DIFF` and tune `dump_differs` method to accept allowed diffs in the future (a cleanup after https://github.com/neondatabase/neon/pull/4425) - Remove etcd-related code (a cleanup after https://github.com/neondatabase/neon/pull/2733) - Don't set `preserve_database_files`	2023-07-28 16:15:31 +01:00
Alexander Bayandin	9fdd3a4a1e	test_runner: add amcheck to test_compatibility (#4772 ) Run `pg_amcheck` in forward and backward compatibility tests to catch some data corruption. ## Summary of changes - Add amcheck compiling to Makefile - Add `pg_amcheck` to test_compatibility	2023-07-28 16:00:55 +01:00
Alek Westover	3681fc39fd	modify `relative_path_to_s3_object` logic for `prefix=None` (#4795 ) see added unit tests for more description	2023-07-28 10:03:18 -04:00
Joonas Koivunen	67d2fa6dec	test: fix `test_neon_cli_basics` flakiness without making it better for future (#4827 ) The test was starting two endpoints on the same branch as discovered by @petuhovskiy. The fix is to allow passing branch-name from the python side over to neon_local, which already accepted it. Split from #4824, which will handle making this more misuse resistant.	2023-07-27 19:13:58 +03:00
Dmitry Rodionov	cafbe8237e	Move tenant/delete.rs to tenant/timeline/delete.rs (#4825 ) move tenant/delete.rs to tenant/timeline/delete.rs to prepare for appearance of tenant deletion routines in tenant/delete.rs	2023-07-27 15:52:36 +03:00
Joonas Koivunen	3e425c40c0	fix(compute_ctl): remove stray variable in error message (#4823 ) error is not needed because anyhow will have the cause chain reported anyways. related to test_neon_cli_basics being flaky, but doesn't actually fix any flakiness, just the obvious stray `{e}`.	2023-07-27 15:40:53 +03:00
Joonas Koivunen	395bd9174e	test: allow future image layer warning (#4818 ) https://neon-github-public-dev.s3.amazonaws.com/reports/main/5670795960/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/5a73fa4a69399123/retries Allow it because we are doing immediate stop.	2023-07-27 10:22:44 +03:00
Alek Westover	b9a7a661d0	add list of public extensions and lookup table for libraries (#4807 )	2023-07-26 15:55:55 -04:00
Joonas Koivunen	48ce95533c	test: allow normal warnings in test_threshold_based_eviction (#4801 ) See: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5654328815/index.html#suites/3fc871d9ee8127d8501d607e03205abb/3482458eba88c021	2023-07-26 20:20:12 +03:00
Dmitry Rodionov	874c31976e	dedup cleanup fs traces (#4778 ) This is a follow up for discussion: https://github.com/neondatabase/neon/pull/4552#discussion_r1253417777 see context there	2023-07-26 18:39:32 +03:00
Conrad Ludgate	231d7a7616	proxy: retry compute wake in auth (#4817 ) ## Problem wake_compute can fail sometimes but is eligible for retries. We retry during the main connect, but not during auth. ## Summary of changes retry wake_compute during auth flow if there was an error talking to control plane, or if there was a temporary error in waking the compute node	2023-07-26 16:34:46 +01:00
arpad-m	5705413d90	Use OnceLock instead of manually implementing it (#4805 ) ## Problem In https://github.com/neondatabase/neon/issues/4743 , I'm trying to make more of the pageserver async, but in order for that to happen, I need to be able to persist the result of `ImageLayer::load` across await points. For that to happen, the return value needs to be `Send`. ## Summary of changes Use `OnceLock` in the image layer instead of manually implementing it with booleans, locks and `Option`. Part of #4743	2023-07-26 17:20:09 +02:00
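As a rough illustration of the `OnceLock` pattern named in 5705413d90, the sketch below uses a hypothetical `LazyLayer` type with a placeholder payload; it is not the actual `ImageLayer` code. The `loads` counter exists only to make the once-only initialization observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a layer whose contents are loaded lazily.
pub struct LazyLayer {
    // Replaces a hand-rolled combination of a flag, a lock, and an `Option`.
    inner: OnceLock<Vec<u8>>,
    // Counts how often the initializer ran (illustration only).
    loads: AtomicUsize,
}

impl LazyLayer {
    pub fn new() -> Self {
        Self {
            inner: OnceLock::new(),
            loads: AtomicUsize::new(0),
        }
    }

    /// Load on first use; later calls return the cached value.
    pub fn load(&self) -> &Vec<u8> {
        self.inner.get_or_init(|| {
            self.loads.fetch_add(1, Ordering::Relaxed);
            // Placeholder for the expensive load.
            vec![1, 2, 3]
        })
    }

    pub fn load_count(&self) -> usize {
        self.loads.load(Ordering::Relaxed)
    }
}

/// Compiles only if `T` is `Send`, which is what holding a value across
/// an await point on a multi-threaded runtime requires.
pub fn assert_send<T: Send>() {}
```

`std::sync::OnceLock` has been stable since Rust 1.70. Calling `assert_send::<LazyLayer>()` is a compile-time check that the type can be moved between threads.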
Conrad Ludgate	35370f967f	proxy: add some connection init logs (#4812 ) ## Problem The first session event we emit is after we receive the first startup packet from the client. This means we can't detect any issues between TCP open and handling of the first PG packet ## Summary of changes Add some new logs for websocket upgrade and connection handling	2023-07-26 15:03:51 +00:00
Alexander Bayandin	b98419ee56	Fix allure report overwriting for different Postgres versions (#4806 ) ## Problem We've got an example of Allure reports from 2 different runners for the same build that started to upload at the exact second, making one overwrite another ## Summary of changes - Use the Postgres version to distinguish artifacts (along with the build type)	2023-07-26 15:19:18 +01:00
Alexander Bayandin	86a61b318b	Bump certifi from 2022.12.7 to 2023.7.22 (#4815 )	2023-07-26 16:32:56 +03:00
Alek Westover	5f8fd640bf	Upload Test Remote Extensions (#4792 ) We need some real extensions in S3 to accurately test the code for handling remote extensions. In this PR we just upload three extensions (anon, kq_imcx and postgis), which is enough for testing purposes for now. In addition to creating and uploading the extension archives, we must generate a file `ext_index.json` which specifies important metadata about the extensions. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-26 15:24:03 +03:00
bojanserafimov	916a5871a6	compute_ctl: Parse sk connstring (#4809 )	2023-07-26 08:10:49 -04:00
Dmitry Rodionov	700d929529	Init Timeline in Stopping state in create_timeline_struct when Cause::Delete (#4780 ) See https://github.com/neondatabase/neon/pull/4552#discussion_r1258368127 for context. TLDR: use CreateTimelineCause to infer desired state instead of using .set_stopping after initialization	2023-07-26 14:05:18 +03:00
bojanserafimov	520046f5bd	cold starts: Add sync-safekeepers fast path (#4804 )	2023-07-25 19:44:18 -04:00
Conrad Ludgate	2ebd2ce2b6	proxy: record connection type (#4802 ) ## Problem We want to measure how many users are using TCP/WS connections. We also want to measure how long it takes to establish a connection with the compute node. I plan to also add a separate counter for HTTP requests, but because of pooling this needs to be disambiguated against new HTTP compute connections ## Summary of changes * record connection type (ws/tcp) in the connection counters. * record connection latency including retry latency	2023-07-25 18:57:42 +03:00
Alex Chi Z	bcc2aee704	proxy: add tests for batch http sql (#4793 ) This PR adds an integration test case for batch HTTP SQL endpoint. https://github.com/neondatabase/neon/pull/4654/ should be merged first before we land this PR. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-25 15:08:24 +00:00
Dmitry Rodionov	6d023484ed	Use mark file to allow for deletion operations to continue through restarts (#4552 ) ## Problem Currently we delete local files first, so if pageserver restarts after local files deletion then remote deletion is not continued. This can be solved with inversion of these steps. But even if these steps are inverted when index_part.json is deleted there is no way to distinguish between "this timeline is good, we just didn't upload it to remote" and "this timeline is deleted we should continue with removal of local state". So to solve it we use another mark file. After index part is deleted presence of this mark file identifies that it was a deletion intention. Alternative approach that was discussed was to delete all except metadata first, and then delete metadata and index part. In this case we still do not support local only configs making them rather unsafe (deletion in them is already unsafe, but this direction solidifies this direction instead of fixing it). Another downside is that if we crash after local metadata gets removed we may leave dangling index part on the remote which in theory shouldn't be a big deal because the file is small. It is not a big change to choose another approach at this point. ## Summary of changes Timeline deletion sequence: 1. Set deleted_at in remote index part. 2. Create local mark file. 3. Delete local files except metadata (it is simpler this way, to be able to reuse timeline initialization code that expects metadata) 4. Delete remote layers 5. Delete index part 6. Delete meta, timeline directory. 7. Delete mark file. This works for local only configuration without remote storage. Sequence is resumable from any point. 
resolves #4453 resolves https://github.com/neondatabase/neon/pull/4552 (the issue was created with async cancellation in mind, but we can still have issues with retries if metadata is deleted among the first by remove_dir_all (which doesn't have any ordering guarantees)) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-25 16:25:27 +03:00
Nick Randall	062159ac17	support non-interactive transactions in sql-over-http (#4654 ) This PR adds support for non-interactive transaction query endpoint. It accepts an array of queries and parameters and returns an array of query results. The queries will be run in a single transaction one after another on the proxy side.	2023-07-25 13:03:55 +01:00
cui fliter	f2e2b8a7f4	fix some typos (#4662 ) Typos fix Signed-off-by: cui fliter <imcusg@gmail.com>	2023-07-25 14:39:29 +03:00
Joonas Koivunen	f9214771b4	fix: count broken tenant more correct (#4800 ) count only once; on startup create the counter right away because we will not observe any changes later. small, probably never reachable from outside fix for #4796.	2023-07-25 12:31:24 +03:00
Joonas Koivunen	77a68326c5	Thin out TenantState metric, keep set of broken tenants (#4796 ) We currently have a timeseries for each of the tenants in different states. We only want this for Broken. Other states could be counters. Fix this by making the `pageserver_tenant_states_count` a counter without a `tenant_id` and add a `pageserver_broken_tenants_count` which has a `tenant_id` label, each broken tenant being 1.	2023-07-25 11:15:54 +03:00
Joonas Koivunen	a25504deae	Limit concurrent compactions (#4777 ) Compactions can create a lot of concurrent work right now with #4265. Limit compactions to use at most 6/8 background runtime threads.	2023-07-25 10:19:04 +03:00
Joonas Koivunen	294b8a8fde	Convert per timeline metrics to global (#4769 ) Cut down the per-(tenant, timeline) histograms by making them global: - `pageserver_getpage_get_reconstruct_data_seconds` - `pageserver_read_num_fs_layers` - `pageserver_remote_operation_seconds` - `pageserver_remote_timeline_client_calls_started` - `pageserver_wait_lsn_seconds` - `pageserver_io_operations_seconds` --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-07-25 00:43:27 +03:00
Alex Chi Z	407a20ceae	add proxy unit tests for retry connections (#4721 ) Given now we've refactored `connect_to_compute` as a generic, we can test it with mock backends. In this PR, we mock the error API and connect_once API to test the retry behavior of `connect_to_compute`. In the next PR, I'll add mock for credentials so that we can also test behavior with `wake_compute`. ref https://github.com/neondatabase/neon/issues/4709 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-24 20:41:42 +03:00
arpad-m	e5b7ddfeee	Preparatory pageserver async conversions (#4773 ) In #4743, we'd like to convert the read path to use `async` rust. In preparation of that, this PR switches some functions that are calling lower level functions like `BlockReader::read_blk`, `BlockCursor::read_blob`, etc into `async`. The PR does not switch all functions however, and only focuses on the ones which are easy to switch. This leaves around some async functions that are (currently) unnecessarily `async`, but on the other hand it makes future changes smaller in diff. Part of #4743 (but does not completely address it).	2023-07-24 14:01:54 +02:00
Alek Westover	7feb0d1a80	`unwrap` instead of passing `anyhow::Error` on failure to spawn a thread (#4779 )	2023-07-21 15:17:16 -04:00
Konstantin Knizhnik	457e3a3ebc	Mx offset bug (#4775 ) Fix mx_offset_to_flags_offset() function Fixes issue #4774 The Postgres `MXOffsetToFlagsOffset` macro was not correctly converted to Rust because the cast to u16 was done before the modulo operation. That is valid only if the divisor is a power of two. Add a small rust unit test to check that the function produces same results as the PostgreSQL macro, and extend the existing python test to cover this bug. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-07-21 21:20:53 +03:00
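The class of bug fixed in 457e3a3ebc can be shown with a small sketch. The function names and the divisor 409 are illustrative assumptions (409 is simply an arbitrary value that is not a power of two); this is not the actual `mx_offset_to_flags_offset` code.

```rust
/// Buggy order: truncating to u16 first discards the high bits of `x`,
/// which changes the residue unless `divisor` is a power of two.
pub fn modulo_after_cast(x: u32, divisor: u16) -> u16 {
    (x as u16) % divisor
}

/// Correct order: reduce in the wide type, then narrow. The result is
/// always below `divisor`, so the final cast cannot lose information.
pub fn modulo_before_cast(x: u32, divisor: u16) -> u16 {
    (x % (divisor as u32)) as u16
}
```

Truncation to `u16` is itself a reduction modulo 65536. Since 65536 is a multiple of every power of two up to 65536, the two orders agree for such divisors and can disagree for any other.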
Joonas Koivunen	25d2f4b669	metrics: chunked responses (#4768 ) Metrics can get really large, on the order of hundreds of megabytes, which we used to buffer completely (after a few rounds of growing the buffer).	2023-07-21 15:10:55 +00:00
Alex Chi Z	1685593f38	stable merge and sort in compaction (#4573 ) Per discussion at https://github.com/neondatabase/neon/pull/4537#discussion_r1242086217, it looks like a better idea to use `<` instead of `<=` for all these comparisons. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-21 10:15:44 -04:00
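One way a strict `<` comparison yields a stable merge is sketched below. It assumes two already-sorted slices of hypothetical `(key, value)` pairs; whether this matches the comparisons touched in the compaction code is an assumption, since the commit message does not show them.

```rust
/// Merge two slices sorted by key, keeping `left` entries ahead of
/// `right` entries whenever their keys are equal.
pub fn stable_merge<T: Clone>(left: &[(u32, T)], right: &[(u32, T)]) -> Vec<(u32, T)> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        // Strict `<`: on equal keys the left element wins, so the
        // relative order of equal elements is preserved.
        if right[j].0 < left[i].0 {
            out.push(right[j].clone());
            j += 1;
        } else {
            out.push(left[i].clone());
            i += 1;
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}
```

With `<=` in place of `<`, an element from `right` would jump ahead of an equal-keyed element from `left`, and the merge would no longer be stable.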
dependabot[bot]	8d0f4a7857	Bump aiohttp from 3.7.4 to 3.8.5 (#4762 )	2023-07-20 22:33:50 +03:00
Alex Chi Z	3fc3666df7	make flush frozen layer an atomic operation (#4720 ) ## Problem close https://github.com/neondatabase/neon/issues/4712 ## Summary of changes Previously, when flushing frozen layers, it was split into two operations: add delta layer to disk + remove frozen layer from memory. This would cause a short period of time where we will have the same data both in frozen and delta layer. In this PR, we merge them into one atomic operation in layer map manager, therefore simplifying the code. Note that if we decide to create image layers for L0 flush, it will still be split into two operations on layer map. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-20 13:39:19 -04:00
Joonas Koivunen	89746a48c6	chore: fix copypaste caused flakiness (#4763 ) I introduced a copypaste error leading to flaky [test failure][report] in #4737. Solution is to use correct/unique test name. I also looked into providing a proper fn name via macro but ... Yeah, it's probably not a great idea. [report]: https://github.com/neondatabase/neon/actions/runs/5612473297/job/15206293430#step:15:197	2023-07-20 19:55:40 +03:00
Joonas Koivunen	8d27a9c54e	Less verbose eviction failures (#4737 ) As seen in staging logs with some massive compactions (create_image_layer), in addition to racing with compaction or gc or even between two invocations to `evict_layer_batch`. Cc: #4745 Fixes: #3851 (organic tech debt reduction) Solution is not to log the Not Found in such cases; it is perfectly natural to happen. Route to this is quite long, but implemented two cases of "race between two eviction processes" which are like our disk usage based eviction and eviction_task, both have the separate "lets figure out what to evict" and "lets evict" phases.	2023-07-20 17:45:10 +03:00
arpad-m	d98cb39978	pageserver: use tokio::time::timeout where possible (#4756 ) Removes a bunch of cases which used `tokio::select` to emulate the `tokio::time::timeout` function. I've done an additional review on the cancellation safety of these futures, all of them seem to be cancellation safe (not that `select!` allows non-cancellation-safe futures, but as we touch them, such a review makes sense). Furthermore, I correct a few mentions of a non-existent `tokio::timeout!` macro in the docs to the `tokio::time::timeout` function.	2023-07-20 16:19:38 +02:00
Alexander Bayandin	27c73c8740	Bump pg_embedding extension (#4758 ) ``` % git log --pretty=oneline 2465f831ea1f8d49c1d74f8959adb7fc277d70cd..eeb3ba7c3a60c95b2604dd543c64b2f1bb4a3703 eeb3ba7c3a60c95b2604dd543c64b2f1bb4a3703 (HEAD -> main, origin/main) Fixc in-mmeory index rebuild after TRUNCATE 1d7cfcfe3d58e2cf4566900437c609725448d14b Correctly handle truncate forin-0memory HNSW index 8fd2a4a191f67858498d876ec378b58e76b5874a :Fix empty index search issue 30e9ef4064cff40c60ff2f78afeac6c296722757 Fix extensiomn name in makefile 23bb5d504aa21b1663719739f6eedfdcb139d948 Fix getting memory size at Mac OS/X 39193a38d6ad8badd2a8d1dce2dd999e1b86885d Update a comment for the extension bf3b0d62a7df56a5e4db9d9e62dc535794c425bc Merge branch 'main' of https://github.com/neondatabase/pg_embedding c2142d514280e14322d1026f0c811876ccf7a91f Update README.md 53b641880f786d2b69a75941c49e569018e8e97e Create LICENSE 093aaa36d5af183831bf370c97b563c12d15f23a Update README.md 91f0bb84d14cb26fd8b452bf9e1ecea026ac5cbc Update README.md 7f7efa38015f24ee9a09beca3009b8d0497a40b4 Update README.md 71defdd4143ecf35489d93289f6cdfa2545fbd36 Merge pull request #4 from neondatabase/danieltprice-patch-1 e06c228b99c6b7c47ebce3bb7c97dbd494088b0a Update README.md d7e52b576b47d9023743b124bdd0360a9fc98f59 Update README.md 70ab399c861330b50a9aff9ab9edc7044942a65b Merge pull request #5 from neondatabase/oom_error_reporting 0aee1d937997198fa2d2b2ed7a0886d1075fa790 Fix OOM error reporting and support vectprization for ARM 18d80079ce60b2aa81d58cefdf42fc09d2621fc1 Update README.md ```	2023-07-20 12:32:57 +01:00
Joonas Koivunen	9e871318a0	Wait detaches or ignores on pageserver shutdown (#4678 ) Adds in a barrier for the duration of the `Tenant::shutdown`. `pageserver_shutdown` will join this await, `detach`es and `ignore`s will not. Fixes #4429. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-20 13:14:13 +03:00
bojanserafimov	e1061879aa	Improve startup python test (#4757 )	2023-07-19 23:46:16 -04:00
Daniel	f09e82270e	Update comment for hnsw extension (#4755 ) Updated the description that appears for hnsw when you query extensions: ``` neondb=> SELECT * FROM pg_available_extensions WHERE name = 'hnsw'; name \| default_version \| installed_version \| comment ----------------------+-----------------+-------------------+-------------------------------------------------- hnsw \| 0.1.0 \| \| Deprecated Please use pg_embedding instead (1 row) ``` --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-19 19:08:25 +01:00
Alexander Bayandin	d4a5fd5258	Disable extension uploading to S3 (#4751 ) ## Problem We're going to reset S3 buckets for extensions (https://github.com/neondatabase/aws/pull/413), and as soon as we're going to change the format we store extensions on S3. Let's stop uploading extensions in the old format. ## Summary of changes - Disable `aws s3 cp` step for extensions	2023-07-19 15:44:14 +01:00
Arseny Sher	921bb86909	Use safekeeper tenant only port in all tests and actually test it. Compute now uses special safekeeper WAL service port allowing auth tokens with only tenant scope. Adds understanding of this port to neon_local and fixtures, as well as test of both ports behaviour with different tokens. ref https://github.com/neondatabase/neon/issues/4730	2023-07-19 06:03:51 +04:00
Arseny Sher	1e7db5458f	Add one more WAL service port allowing only tenant scoped auth tokens. It will make it easier to limit access at network level, with e.g. k8s network policies. ref https://github.com/neondatabase/neon/issues/4730	2023-07-19 06:03:51 +04:00
Alexander Bayandin	b4d36f572d	Use sharded-slab from crates (#4729 ) ## Problem We use a patched version of `sharded-slab` with increased MAX_THREADS [1]. It is not required anymore because safekeepers are async now. A valid comment from the original PR tho [1]: > Note that patch can affect other rust services, not only the safekeeper binary. - [1] https://github.com/neondatabase/neon/pull/4122 ## Summary of changes - Remove patch for `sharded-slab`	2023-07-18 13:50:44 +01:00
Joonas Koivunen	762a8a7bb5	python: more linting (#4734 ) Ruff has "B" class of lints, including B018 which will nag on useless expressions, related to #4719. Enable such lints and fix the existing issues. Most notably: - https://beta.ruff.rs/docs/rules/mutable-argument-default/ - https://beta.ruff.rs/docs/rules/assert-false/ --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-18 12:56:40 +03:00
Conrad Ludgate	2e8a3afab1	proxy: merge handle_client (#4740 ) ## Problem Second half of #4699. we were maintaining 2 implementations of handle_client. ## Summary of changes Merge the handle_client code, but abstract some of the details. ## Checklist before requesting a review - [X] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2023-07-17 22:20:23 +01:00
Alexander Bayandin	4580f5085a	test_runner: run benchmarks in parallel (#4683 ) ## Problem Benchmarks run takes about an hour on main branch (in a single job), which delays pipeline results. And it takes another hour if we want to restart the job due to some failures. ## Summary of changes - Use `pytest-split` plugin to run benchmarks on separate CI runners in 4 parallel jobs - Add `scripts/benchmark_durations.py` for getting benchmark durations from the database to help `pytest-split` schedule tests more evenly. It uses p99 for the last 10 days' results (durations). The current distribution could be better; each worker's durations vary from 9m to 35m, but this could be improved in subsequent PRs.	2023-07-17 20:09:45 +01:00
Conrad Ludgate	e074ccf170	reduce proxy timeouts (#4708 ) ## Problem 10 retries * 10 second timeouts makes for a very long retry window. ## Summary of changes Adds a 2s timeout to sql_over_http connections, and also reduces the 10s timeout in TCP.	2023-07-17 20:05:26 +01:00
George MacKerron	196943c78f	CORS preflight OPTIONS support for /sql (http fetch) endpoint (#4706 ) ## Problem HTTP fetch can't be used from browsers because proxy doesn't support [CORS 'preflight' `OPTIONS` requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests). ## Summary of changes Added a simple `OPTIONS` endpoint for `/sql`.	2023-07-17 20:01:25 +01:00
bojanserafimov	149dd36b6b	Update pg: add startup logs (#4736 )	2023-07-17 14:47:08 -04:00
Kirill Bulatov	be271e3edf	Use upstream version of tokio-tar (#4722 ) tokio-tar 0.3.1 got released, including all changes from the fork currently used, switch over to that one.	2023-07-17 17:18:33 +01:00
Conrad Ludgate	7c85c7ea91	proxy: merge connect compute (#4713 ) ## Problem Half of #4699. TCP/WS have one implementation of `connect_to_compute`, HTTP has another implementation of `connect_to_compute`. Having both is annoying to deal with. ## Summary of changes Creates a set of traits `ConnectMechanism` and `ShouldError` that allows the `connect_to_compute` to be generic over raw TCP stream or tokio_postgres based connections. I'm not super happy with this. I think it would be nice to remove tokio_postgres entirely but that will need a lot more thought to be put into it. I have also slightly refactored the caching to use fewer references. Instead using ownership to ensure the state of retrying is encoded in the type system.	2023-07-17 15:53:01 +01:00
Alex Chi Z	1066bca5e3	compaction: allow duplicated layers and skip in replacement (#4696 ) ## Problem Compactions might generate files of exactly the same name as before compaction due to our naming of layer files. This could have already caused some mess in the system, and is known to cause some issues like https://github.com/neondatabase/neon/issues/4088. Therefore, we now consider duplicated layers in the post-compaction process to avoid violating the layer map duplicate checks. related previous works: close https://github.com/neondatabase/neon/pull/4094 error reported in: https://github.com/neondatabase/neon/issues/4690, https://github.com/neondatabase/neon/issues/4088 ## Summary of changes If a file already exists in the layer map before the compaction, do not modify the layer map and do not delete the file. The file on disk at that time should be the new one overwritten by the compaction process. This PR also adds a test case with a fail point that produces exactly the same set of files. This bypassing behavior is safe because the produced layer files have the same content / are the same representation of the original file. An alternative might be directly removing the duplicate check in the layer map, but I feel it would be good if we can prevent that in the first place. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-17 17:26:29 +03:00
bojanserafimov	1aad8918e1	Document recommended ccls setup (#4723 )	2023-07-17 09:21:42 -04:00
Christian Schwarz	966213f429	basebackup query metric: use same buckets as control plane (#4732 ) The `CRITICAL_OPS_BUCKETS` is not useful for getting an accurate picture of basebackup latency because all the observations that negatively affect our SLI fall into one bucket, i.e., 100ms-1s. Use the same buckets as control plane instead.	2023-07-17 13:46:13 +02:00
arpad-m	35e73759f5	Reword comment and add comment on race condition (#4725 ) The race condition that caused #4526 is still not fixed, so point it out in a comment. Also, reword a comment in upload.rs. Follow-up of #4694	2023-07-17 12:49:58 +02:00
Vadim Kharitonov	48936d44f8	Update postgres version (#4727 )	2023-07-16 13:40:59 +03:00
Em Sharnoff	2eae0a1fe5	Update vm-builder v0.12.1 -> v0.13.1 (#4728 ) This should only affect the version of the vm-informant used. The only change to the vm-informant from v0.12.1 to v0.13.1 was: * https://github.com/neondatabase/autoscaling/pull/407 Just a typo fix; worth getting in anyways.	2023-07-15 15:38:15 -07:00
dependabot[bot]	53470ad12a	Bump cryptography from 41.0.0 to 41.0.2 (#4724 )	2023-07-15 14:36:13 +03:00
Alexander Bayandin	edccef4514	Make CI more friendly for external contributors (#4663 ) ## Problem CI doesn't work for external contributors (for PRs from forks), see #2222 for more information. I'm proposing the following: - External PR is created - PR is reviewed so that it doesn't contain any malicious code - Label `approved-for-ci-run` is added to that PR (by the reviewer) - A new workflow picks up this label and creates an internal branch from that PR (the branch name is `ci-run/pr-`) - CI is run on the branch, but the results are also propagated to the PRs check - We can merge a PR itself if it's green; if not — repeat. ## Summary of changes - Create `approved-for-ci-run.yml` workflow which handles `approved-for-ci-run` label - Trigger `build_and_test.yml` and `neon_extra_builds.yml` workflows on `ci-run/pr-` branches	2023-07-15 11:58:15 +01:00
arpad-m	982fce1e72	Fix rustdoc warnings and test cargo doc in CI (#4711 ) ## Problem `cargo +nightly doc` is giving a lot of warnings: broken links, naked URLs, etc. ## Summary of changes * update the `proc-macro2` dependency so that it can compile on latest Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and https://github.com/dtolnay/proc-macro2/issues/398 * allow the `private_intra_doc_links` lint, as linking to something that's private is always more useful than just mentioning it without a link: if the link breaks in the future, at least there is a warning due to that. Also, one might enable [`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options) in the future and make these links work in general. * fix all the remaining warnings given by `cargo +nightly doc` * make it possible to run `cargo doc` on stable Rust by updating `opentelemetry` and associated crates to version 0.19, pulling in a fix that previously broke `cargo doc` on stable: https://github.com/open-telemetry/opentelemetry-rust/pull/904 * Add `cargo doc` to CI to ensure that it won't get broken in the future. Fixes #2557 ## Future work * Potentially, it might make sense, for development purposes, to publish the generated rustdocs somewhere, like for example [how the rust compiler does it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html). I will file an issue for discussion.	2023-07-15 05:11:25 +03:00
Vadim Kharitonov	e767ced8d0	Update rust to 1.71.0 (#4718 ) Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-14 18:34:01 +02:00
Alex Chi Z	1309571f5d	proxy: switch to structopt for clap parsing (#4714 ) Using `#[clap]` for parsing cli opts, which is easier to maintain. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-14 19:11:01 +03:00
Joonas Koivunen	9a69b6cb94	Demote deletion warning, list files (#4688 ) Handle test failures like: ``` AssertionError: assert not ['$ts WARN delete_timeline{tenant_id=X timeline_id=Y}: About to remove 1 files\n'] ``` Instead of logging: ``` WARN delete_timeline{tenant_id=X timeline_id=Y}: Found 1 files not bound to index_file.json, proceeding with their deletion WARN delete_timeline{tenant_id=X timeline_id=Y}: About to remove 1 files ``` For each one operation of timeline deletion, list all unref files with `info!`, and then continue to delete them with the added spice of logging the rare/never happening non-utf8 name with `warn!`. Rationale for `info!` instead of `warn!`: this is a normal operation; like we had mentioned in `test_import.py` -- basically whenever we delete a timeline which is not idle. Rationale for N * (`info!`\|`warn!`): symmetry for the layer deletions; if we could ever need those, we could also need these for layer files which are not yet mentioned in `index_part.json`. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-14 18:59:16 +03:00
Joonas Koivunen	cc82cd1b07	spanchecks: Support testing without tracing (#4682 ) Tests cannot be run without configuring tracing. Split from #4678. Does not nag about the span checks when there is no subscriber configured, because then the spans will have no links and nothing can be checked. Sadly the `SpanTrace::status()` cannot be used for this. `tracing` is always configured in regress testing (running with `pageserver` binary), which should be enough. Additionally cleans up the test code in span checks to be in the test code. Fixes a `#[should_panic]` test which was flaky before these changes, but the `#[should_panic]` hid the flakiness. Rationale for need: Unit tests might not be testing only the public or `feature="testing"` APIs which are only testable within `regress` tests so not all spans might be configured.	2023-07-14 17:45:25 +03:00
Alex Chi Z	c76b74c50d	semantic layer map operations (#4618 ) ## Problem ref https://github.com/neondatabase/neon/issues/4373 ## Summary of changes A step towards immutable layer map. I decided to finish the refactor with this new approach and apply https://github.com/neondatabase/neon/pull/4455 on this patch later. In this PR, we moved all modifications of the layer map to one place with semantic operations like `initialize_local_layers`, `finish_compact_l0`, `finish_gc_timeline`, etc, which is now part of `LayerManager`. This makes it easier to build new features upon this PR: * For immutable storage state refactor, we can simply replace the layer map with `ArcSwap<LayerMap>` and remove the `layers` lock. Moving towards it requires us to put all layer map changes in a single place as in https://github.com/neondatabase/neon/pull/4455. * For manifest, we can write to manifest in each of the semantic functions. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-13 17:35:27 +03:00
Alexey Kondratov	ed938885ff	[compute_ctl] Fix deletion of template databases (#4661 ) If database was created with `is_template true` Postgres doesn't allow dropping it right away and throws error ``` ERROR: cannot drop a template database ``` so we have to unset `is_template` first. Fixing it, I noticed that our `escape_literal` isn't exactly correct and following the same logic as in `quote_literal_internal`, we need to prepend string with `E`. Otherwise, it's not possible to filter `pg_database` using `escape_literal()` result if name contains `\`, for example. Also use `FORCE` to drop database even if there are active connections. We run this from `cloud_admin`, so it should have enough privileges. NB: there could be other db states, which prevent us from dropping the database. For example, if db is used by any active subscription or logical replication slot. TODO: deal with it once we allow logical replication. Proper fix should involve returning an error code to the control plane, so it could figure out that this is a non-retryable error, return it to the user and mark operation as permanently failed. Related to neondatabase/cloud#4258	2023-07-13 13:18:35 +02:00
Conrad Ludgate	db4d094afa	proxy: add more error cases to retry connect (#4707 ) ## Problem In the logs, I noticed we still weren't retrying in some cases. Seemed to be timeouts but we explicitly wanted to handle those ## Summary of changes Retry on io::ErrorKind::TimedOut errors. Handle IO errors in tokio_postgres::Error.	2023-07-13 11:47:27 +01:00
Conrad Ludgate	0626e0bfd3	proxy: refactor some error handling and shutdowns (#4684 ) ## Problem It took me a while to understand the purpose of all the tasks spawned in the main functions. ## Summary of changes Utilising the type system and fewer macros, plus many more comments, document the shutdown procedure of each task in detail	2023-07-13 11:03:37 +01:00
Stas Kelvich	444d6e337f	add rfcs/022-user-mgmt.md (#3838 ) Co-authored-by: Vadim Kharitonov <vadim@neon.tech>	2023-07-12 19:58:55 +02:00
Arthur Petukhovsky	3a1be9b246	Broadcast before exiting sync safekeepers (#4700 ) Recently we started doing sync-safekeepers before exiting compute_ctl, expecting that it will make next sync faster by skipping recovery. But recovery is still running in some cases (https://github.com/neondatabase/neon/pull/4574#issuecomment-1629256166) because of the lagging truncateLsn. This PR should help with updating truncateLsn.	2023-07-12 17:48:20 +01:00
arpad-m	664d32eb7f	Don't propagate but log file not found error in layer uploading (#4694 ) This addresses the issue in #4526 by adding a test that reproduces the race condition that gave rise to the bug (or at least a race condition that gave rise to the same error message), and then implementing a fix that just prints a message to the log if a file could not be found for uploading. Even though the underlying race condition is not fixed yet, this will un-block the upload queue in that situation, greatly reducing the impact of such a (rare) race. Fixes #4526.	2023-07-12 18:10:49 +02:00
Alexander Bayandin	ed845b644b	Prevent unintentional Postgres submodule update (#4692 ) ## Problem Postgres submodule can be changed unintentionally, and these changes are easy to miss during the review. Adding a check that should prevent this from happening, the check fails `build-neon` job with the following message: ``` Expected postgres-v14 rev to be at '1414141414141414141414141414141414141414', but it is at '1144aee1661c79eec65e784a8dad8bd450d9df79' Expected postgres-v15 rev to be at '1515151515151515151515151515151515151515', but it is at '1984832c740a7fa0e468bb720f40c525b652835d' Please update vendors/revisions.json if these changes are intentional. ``` This is an alternative approach to https://github.com/neondatabase/neon/pull/4603 ## Summary of changes - Add `vendor/revisions.json` file with expected revisions - Add build-time check (to `build-neon` job) that Postgres submodules match revisions from `vendor/revisions.json` - A couple of small improvements for logs from https://github.com/neondatabase/neon/pull/4603 - Fixed GitHub autocomment for the case where no tests were run --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-12 15:12:37 +01:00
Joonas Koivunen	87dd37a2f2	pageserver: Align tenant, timeline id names in spans (#4687 ) Uses `(tenant\|timeline)_id`. Not a statement about endorsing this naming style but it is better to be aligned.	2023-07-12 16:58:40 +03:00
arpad-m	1355bd0ac5	layer deletion: Improve a comment and fix TOCTOU (#4673 ) The comment referenced an issue that was already closed. Remove that reference and replace it with an explanation why we already don't print an error. See discussion in https://github.com/neondatabase/neon/issues/2934#issuecomment-1626505916 For the TOCTOU fixes, the two calls after the `.exists()` both didn't handle the situation well where the file was deleted after the initial `.exists()`: one would assume that the path wasn't a file, giving a bad error, the second would give an accurate error but that's not wanted either. We remove both racy `exists` and `is_file` checks, and instead just look for errors about files not being found.	2023-07-12 15:52:14 +02:00
Conrad Ludgate	a1d6b1a4af	proxy wake_compute loop (#4675 ) ## Problem If we fail to wake up the compute node, a subsequent connect attempt will definitely fail. However, kubernetes won't fail the connection immediately, instead it hangs until we timeout (10s). ## Summary of changes Refactor the loop to allow fast retries of compute_wake and to skip a connect attempt.	2023-07-12 11:38:36 +01:00
bojanserafimov	92aee7e07f	cold starts: basebackup compression (#4482 ) Co-authored-by: Alex Chi Z <iskyzh@gmail.com>	2023-07-11 13:11:23 -04:00
Em Sharnoff	5e2f29491f	Update vm-builder v0.11.1 -> v0.12.1 (#4680 ) This should only affect the version of the vm-informant used. The only PR changing the informant since v0.11.1 was: * https://github.com/neondatabase/autoscaling/pull/389 The bug that autoscaling#389 fixed impacts all pooled VMs, so the updated images from this PR must be released before https://github.com/neondatabase/cloud/pull/5721.	2023-07-11 12:45:25 +02:00
bojanserafimov	618d36ee6d	compute_ctl: log a structured event on successful start (#4679 )	2023-07-10 15:34:26 -04:00
Alexander Bayandin	33c2d94ba6	Fix git-env version for PRs (#4641 ) ## Problem Binaries created from PRs (both in docker images and for tests) have wrong git-env versions, they point to phantom merge commits. ## Summary of changes - Prefer GIT_VERSION env variable even if git information was accessible - Use `${{ github.event.pull_request.head.sha \|\| github.sha }}` instead of `${{ github.sha }}` for `GIT_VERSION` in workflows So the builds will still happen from this phantom commit, but we will report the PR commit. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-10 20:01:01 +01:00
Alex Chi Z	08bfe1c826	remove `LayerDescriptor` and use `LayerObject` for tests (#4637 ) ## Problem part of https://github.com/neondatabase/neon/pull/4340 ## Summary of changes Remove LayerDescriptor and remove `todo!`. At the same time, this PR adds `AsLayerDesc` trait for all persistent layers and changed `LayerFileManager` to have a generic type. For tests, we are now using `LayerObject`, which is a wrapper around `PersistentLayerDesc`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-10 19:40:37 +03:00
Christian Schwarz	65ff256bb8	page_service: add peer_addr span field, and set tenant_id / timeline_id fields earlier (#4638 ) Before this PR, during shutdown, we'd find naked logs like this one for every active page service connection: ``` 2023-07-05T14:13:50.791992Z INFO shutdown request received in run_message_loop ``` This PR 1. adds a peer_addr span field to distinguish the connections in logs 2. sets the tenant_id / timeline_id fields earlier It would be nice to have `tenant_id` and `timeline_id` directly on the `page_service_conn_main` span (empty, initially), then set them at the top of `process_query`. The problem is that the debug asserts for `tenant_id` and `timeline_id` presence in the tracing span doesn't support detecting empty values [1]. So, I'm a bit hesitant about over-using `Span::record`. [1] https://github.com/neondatabase/neon/issues/4676	2023-07-10 15:23:40 +02:00
Alex Chi Z	5177c1e4b1	pagectl: separate xy margin for draw timeline (#4669 ) We were computing margin by lsn range, but this will cause problems for layer maps with large overlapping LSN range. Now we compute x, y margin separately to avoid this issue. ## Summary of changes before: <img width="1651" alt="image" src="https://github.com/neondatabase/neon/assets/4198311/3bfb50cb-960b-4d8f-9bbe-a55c89d82a28"> we have a lot of rectangles of negative width, and they disappear in the layer map. after: <img width="1320" alt="image" src="https://github.com/neondatabase/neon/assets/4198311/550f0f96-849f-4bdc-a852-b977499f04f4"> Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-10 09:22:06 -04:00
Christian Schwarz	49efcc3773	walredo: add tenant_id to span of NoLeakChild::drop (#4640 ) We see the following log lines occasionally in prod: ``` kill_and_wait_impl{pid=1983042}: wait successful exit_status=signal: 9 (SIGKILL) ``` This PR makes it easier to find the tenant for the pid, by including the tenant id as a field in the span.	2023-07-10 12:49:22 +03:00
Dmitry Rodionov	76b1cdc17e	Order tenant_id argument before timeline_id, use references (#4671 ) It started from a few config methods that have various orderings and sometimes use references, sometimes not. So I unified path manipulation methods to always order tenant_id before timeline_id and use references because we don't need owned values. Similar changes happened to call-sites of config methods. I'd say it's a good idea to always order tenant_id before timeline_id so it is consistent across the whole codebase.	2023-07-10 10:23:37 +02:00
Alexander Bayandin	1f151d03d8	Dockerfile.compute-node: support arm64 (#4660 ) ## Problem `docker build ... -f Dockerfile.compute-node ...` fails on ARM (I'm checking on macOS). ## Summary of changes - Download the arm version of cmake on arm	2023-07-07 18:21:15 +01:00
Conrad Ludgate	ac758e4f51	allow repeated IO errors from compute node (#4624 ) ## Problem #4598 compute nodes are not accessible some time after wake up due to kubernetes DNS not being fully propagated. ## Summary of changes Update connect retry mechanism to support handling IO errors and sleeping for 100ms ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-07-07 19:50:50 +03:00
arpad-m	4f280c2953	Small pageserver cleanups (#4657 ) ## Problem I was reading the code of the page server today and found these minor things that I thought could be cleaned up. ## Summary of changes * remove a redundant indentation layer and continue in the flushing loop * use the builtin `PartialEq` check instead of hand-rolling a `range_eq` function * Add a missing `>` to a prominent doc comment	2023-07-07 16:53:14 +02:00
Dmitry Rodionov	20137d9588	Polish tracing helpers (#4651 ) Context: comments here: https://github.com/neondatabase/neon/pull/4645	2023-07-06 19:49:14 +03:00
Arseny Sher	634be4f4e0	Fix async write in safekeepers. General Rust Write trait semantics (as well as its async brother) is that write definitely happens only after Write::flush(). This wasn't needed in sync where rust write calls the syscall directly, but is required in async. Also fix setting initial end_pos in walsender, sometimes it was from the future. fixes https://github.com/neondatabase/neon/issues/4518	2023-07-06 19:56:28 +04:00
Alex Chi Z	d340cf3721	dump more info in layer map (#4567 ) A simple commit extracted from https://github.com/neondatabase/neon/pull/4539 This PR adds more info for layer dumps (is_delta, is_incremental, size). --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-06 18:21:45 +03:00
Konstantin Knizhnik	1741edf933	Pageserver reconnect v2 (#4519 ) ## Problem Compute is not always able to reconnect to the pageserver. The main cause is that the pageserver can take a long time to restart, so the number of reconnect attempts is increased from 5 (hardcoded) to 60 (GUC). Also, to improve performance, we do not flush after each command (this is especially critical for prefetch). Unfortunately, such a pending flush makes it impossible to transparently reconnect to a restarted pageserver. What we can do is try to minimize the probability of that happening. A broken connection will most likely be detected by the first send command after an idle period. This is why the max_flush_delay parameter is added: it forces a flush after the first request following an idle period. See #4497 ## Summary of changes Add the neon.max_reconnect_attempts and neon.max_flush_delay GUCs, which control when a flush has to be done and when it is possible to try to reconnect to the pageserver	2023-07-06 12:47:43 +03:00
Joonas Koivunen	269e20aeab	fix: filter out zero synthetic sizes (#4639 ) Apparently sending the synthetic_size == 0 is causing problems, so filter out sending zeros. Slack discussion: https://neondb.slack.com/archives/C03F5SM1N02/p1688574285718989?thread_ts=1687874910.681049&cid=C03F5SM1N02	2023-07-06 12:32:34 +03:00
Tomoka Hayashi	91435006bd	Fix docker-compose file and document (#4621 ) ## Problem - Running the command according to docker.md gives warning and error. - Warning `permissions should be u=rw (0600) or less` is output when executing `psql -h localhost -p 55433 -U cloud_admin`. - `FATAL: password authentication failed for user "root"` is output in compute logs. ## Summary of changes - Add `$ chmod 600 ~/.pgpass` in docker.md to avoid warning. - Add username (cloud_admin) to pg_isready command in docker-compose.yml to avoid error. --------- Co-authored-by: Tomoka Hayashi <tomoka.hayashi@ntt.com>	2023-07-06 10:11:24 +01:00
Dmitry Rodionov	b263510866	move some logical size bits to separate logical_size.rs	2023-07-06 11:58:41 +03:00
Dmitry Rodionov	e418fc6dc3	move some tracing related assertions to separate module for tenant and timeline	2023-07-06 11:58:41 +03:00
Dmitry Rodionov	434eaadbe3	Move uninitialized timeline from tenant.rs to timeline/uninit.rs	2023-07-06 11:58:41 +03:00
Alexander Bayandin	6fb7edf494	Compile `pg_embedding` extension (#4634 ) ``` CREATE EXTENSION embedding; CREATE TABLE t (val real[]); INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL); CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3); INSERT INTO t (val) VALUES (array[1,2,4]); SELECT * FROM t ORDER BY val <-> array[3,3,3]; val --------- {1,2,3} {1,2,4} {1,1,1} {0,0,0} (5 rows) ```	2023-07-05 18:40:25 +01:00
Christian Schwarz	505aa242ac	page cache: add size metrics (#4629 ) Make them a member of `struct PageCache` to prepare for a future where there's no global state.	2023-07-05 15:36:42 +03:00
arpad-m	1c516906e7	Impl Display for LayerFileName and require it for Layer (#4630 ) Does three things: * add a `Display` impl for `LayerFileName` equal to the `short_id` * based on that, replace the `Layer::short_id` function by a requirement for a `Display` impl * use that `Display` impl in the places where the `short_id` and `file_name()` functions were used instead Fixes #4145	2023-07-05 14:27:50 +02:00
Christian Schwarz	7d7cd8375c	callers of task_mgr::spawn: some top-level async blocks were missing tenant/timeline id (#4283 ) Looking at logs from staging and prod, I found there are a bunch of log lines without tenant / timeline context. Manually walk through all task_mgr::spawn lines and fix that using the least amount of work required. While doing it, remove some redundant `shutting down` messages. refs https://github.com/neondatabase/neon/issues/4222	2023-07-05 14:04:05 +02:00
Vadim Kharitonov	c92b7543b5	Update `pgvector` to 0.4.4 (#4632 ) After announcing `hnsw`, there is a hypothesis that the community will start comparing it with `pgvector` by themselves. Therefore, let's have an actual version of `pgvector` in Neon.	2023-07-05 13:39:51 +03:00
Stas Kelvich	dbf88cf2d7	Minimalistic pool for http endpoint compute connections (under opt-in flag) Cache up to 20 connections per endpoint. Once all pooled connections are used, the current implementation can open an extra connection, so the maximum number of simultaneous connections is not enforced. There are more things to do here, especially with background clean-up of closed connections, and checks for transaction state. But the current implementation allows checking for the lower connection latencies that this cache should bring.	2023-07-05 12:00:03 +03:00
Konstantin Knizhnik	f1db87ac36	Check if there is enough memory for HNSW index (#4602 ) ## Problem HNSW index is created in memory. Try to prevent OOM by checking available RAM. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-07-05 11:40:38 +03:00
Christian Schwarz	3f9defbfb4	page cache: add access & hit rate metrics (#4628 ) Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2023-07-05 10:38:32 +02:00
bojanserafimov	c7143dbde6	compute_ctl: Fix misleading metric (#4608 )	2023-07-04 19:07:36 -04:00
Stas Kelvich	cbf9a40889	Set a shorter timeout for the initial connection attempt in proxy. In case we try to connect to an outdated address that is no longer valid, the default behavior of Kubernetes is to drop the packets, causing us to wait for the entire timeout period. We want to fail fast in such cases. A specific case to consider is when we have cached compute node information with a 5-minute TTL (Time To Live), but the user has executed a `/suspend` API call, resulting in the nonexistence of the compute node.	2023-07-04 20:34:22 +03:00
Joonas Koivunen	10aba174c9	metrics: Remove comments regarding upgradeable rwlocks (#4622 ) Closes #4001 by removing the comments alluding towards upgradeable/downgradeable RwLocks.	2023-07-04 17:40:51 +03:00
Conrad Ludgate	ab2ea8cfa5	use pbkdf2 crate (#4626 ) ## Problem While pbkdf2 is a simple algorithm, we should probably use a well tested implementation ## Summary of changes * Use pbkdf2 crate * Use arrays like the hmac comment says ## Checklist before requesting a review - [X] I have performed a self-review of my code. - [X] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-07-04 14:54:59 +01:00
arpad-m	9c8c55e819	Add _cached and _bytes to pageserver_tenant_synthetic_size metric name (#4616 ) This renames the `pageserver_tenant_synthetic_size` metric to `pageserver_tenant_synthetic_cached_size_bytes`, as was requested on slack (link in the linked issue). * `_cached` to hint that it is not incrementally calculated * `_bytes` to indicate the unit the size is measured in Fixes #3748	2023-07-03 19:34:07 +02:00
Conrad Ludgate	10110bee69	fix setup instructions (#4615 ) ## Problem 1. The local endpoints provision 2 ports (postgres and HTTP) which means the migration_check endpoint has a different port than what the setup implies 2. psycopg2-binary 2.9.3 has a deprecated poetry config and doesn't install. ## Summary of changes Update psycopg2-binary and update the endpoint ports in the readme --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-03 18:10:44 +03:00
Joonas Koivunen	cff7ae0b0d	fix: no more ansi colored logs (#4613 ) Allure does not support ansi colored logs, yet `compute_ctl` has them. Upgrade criterion to get rid of atty dependency, disable ansi colors, remove atty dependency and disable ansi feature of tracing-subscriber. This is a heavy-handed approach. I am not aware of a workflow where you'd want to connect a terminal directly to for example `compute_ctl`, usually you find the logs in a file. If someone had been using colors, they will now need to: - turn the `tracing-subscriber.default-features` to `true` - edit their wanted project to have colors I decided to explicitly disable ansi colors in case we would have in future a dependency accidentally enabling the feature on `tracing-subscriber`, which would be quite surprising but not unimaginable. By getting rid of `atty` from dependencies we get rid of <https://github.com/advisories/GHSA-g98v-hv3f-hcfr>.	2023-07-03 16:37:02 +03:00
Alexander Bayandin	78a7f68902	Make pg_version and build_type regular parameters (#4311 ) ## Problem All tests have already been parametrised by Postgres version and build type (to have them distinguishable in the Allure report), but despite that, the DEFAULT_PG_VERSION and BUILD_TYPE env vars still have to be set to the corresponding values. For example, to run the `test_timeline_deletion_with_files_stuck_in_upload_queue[release-pg14-local_fs]` test, it's required to set `DEFAULT_PG_VERSION=14` and `BUILD_TYPE=release`. This PR makes the test framework pick up parameters from the test name itself. ## Summary of changes - Postgres version and build type related fixtures are now function-scoped (instead of session-scoped as before) - Deprecate `--pg-version` argument in favour of DEFAULT_PG_VERSION env variable (it's easier to parse) - GitHub autocomment now includes only one command with all the failed tests + runs them in parallel	2023-07-03 13:51:40 +01:00
Christian Schwarz	24eaa3b7ca	timeline creation: reflect failures due to ancestor LSN issues in status code (#4600 ) Before, it was a `500` and control plane would retry, whereas it actually should have stopped retrying. (Stacked on top of https://github.com/neondatabase/neon/pull/4597 ) fixes https://github.com/neondatabase/neon/issues/4595 part of https://github.com/neondatabase/cloud/issues/5626 --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-07-03 15:21:10 +03:00
Shany Pozin	26828560a8	Add timeouts and retries to consumption metrics reporting client (#4563 ) ## Problem #4528 ## Summary of changes Add a 60 seconds default timeout to the reqwest client Add retries for up to 3 times to call into the metric consumption endpoint --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-03 15:20:49 +03:00
Alek Westover	86604b3b7d	Delete Unnecessary files in Extension Bucket (#4606 ) Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-03 07:37:30 -04:00
Joonas Koivunen	4957bb2d48	fix(proxy): stray span enter globbers up logs (#4612 ) Prod logs have deep accidental span nesting. Introduced in #3759 and has been untouched since, maybe no one watches proxy logs? :) I found it by accident when looking to see if proxy logs have ansi colors with `{neon_service="proxy"}`. The solution is to mostly stop using `Span::enter` or `Span::entered` in async code. Kept on `Span::entered` in cancel on shutdown related path.	2023-07-03 11:53:57 +01:00
Sasha Krassovsky	ff1a1aea86	Make control plane connector send encrypted password (#4607 ) Control plane needs the encrypted hash that postgres itself generates	2023-06-30 14:17:44 -07:00
Em Sharnoff	c9f05d418d	Bump vm-builder v0.11.0 -> v0.11.1 (#4605 ) This applies the fix from https://github.com/neondatabase/autoscaling/pull/367, which should resolve the "leaking cloud_admin connections" issue that has been observed for some customers.	2023-06-30 23:49:06 +03:00
bojanserafimov	9de1a6fb14	cold starts: Run sync_safekeepers on compute_ctl shutdown (#4588 )	2023-06-30 16:29:47 -04:00
Konstantin Knizhnik	fbd37740c5	Make file_cache logging less verbose (#4601 ) ## Problem Message "set local file cache limit to..." pollutes compute logs. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-06-30 21:55:56 +03:00
Anastasia Lubennikova	3e55d9dec6	Bump vendor/postgres Point to REL_14_STABLE_neon and REL_15_STABLE_neon. This change updates PostgreSQL versions to 14.8 and 15.3	2023-06-30 16:29:10 +03:00
Christian Schwarz	f558f88a08	refactor: distinguished error type for timeline creation failure (#4597 ) refs https://github.com/neondatabase/neon/issues/4595	2023-06-30 14:53:21 +02:00
Dmitry Rodionov	b990200496	tests: use shortcut everywhere to get timeline path (#4586 )	2023-06-30 15:01:06 +03:00
Alex Chi Z	7e20b49da4	refactor: use LayerDesc in LayerMap (part 2) (#4437 ) ## Problem part of https://github.com/neondatabase/neon/issues/4392, continuation of https://github.com/neondatabase/neon/pull/4408 ## Summary of changes This PR removes all layer objects from LayerMap and moves it to the timeline struct. In timeline struct, LayerFileManager maps a layer descriptor to a layer object, and it is stored in the same RwLock as LayerMap to avoid behavior difference. Key changes: * LayerMap now does not have generic, and only stores descriptors. * In Timeline, we add a new struct called layer mapping. * Currently, layer mapping is stored in the same lock with layer map. Every time we retrieve data from the layer map, we will need to map the descriptor to the actual object. * Replace_historic is moved to layer mapping's replace, and the return value behavior is different from before. I'm a little bit unsure about this part and it would be good to have some comments on that. * Some test cases are rewritten to adapt to the new interface, and we can decide whether to remove it in the future because it does not make much sense now. * LayerDescriptor is moved to `tests` module and should only be intended for unit testing / benchmarks. * Because we now have a usage pattern like "take the guard of lock, then get the reference of two fields", we want to avoid dropping the incorrect object when we intend to unlock the lock guard. Therefore, a new set of helper function `drop_r/wlock` is added. This can be removed in the future when we finish the refactor. TODOs after this PR: fully remove RemoteLayer, and move LayerMapping to a separate LayerCache. all refactor PRs: ``` #4437 --- #4479 ------------ #4510 (refactor done at this point) \-- #4455 -- #4502 --/ ``` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-06-29 15:06:07 -04:00
Alek Westover	032b603011	Fix: Wrong Enum Variant (#4589 )	2023-06-29 10:55:02 -04:00
Alex Chi Z	ca0e0781c8	use const instead of magic number for repartition threshold (#4286 ) There is a magic number about how often we repartition and therefore affecting how often we compact. This PR makes this number `10` a global constant and add docs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-06-29 16:56:17 +03:00
Alexander Bayandin	b2a5e91a88	Upload custom extensions to S3 (#4585 ) ## Problem We want to have a number of custom extensions that will not be available by default, as an example of such is [Postgres Anonymizer](https://postgresql-anonymizer.readthedocs.io/en/stable/) (please note that the extension should be added to `shared_preload_libraries`). To distinguish them, custom extensions should be added to a different S3 path: ``` s3://<bucket>/<release version>/<postgres_version>/<ext_name>/share/extensions/ s3://<bucket>/<release version>/<postgres_version>/<ext_name>/lib where <ext_name> is an extension name ``` Resolves https://github.com/neondatabase/neon/issues/4582 ## Summary of changes - Add Postgres Anonymizer extension to Dockerfile (it's included only to postgres-extensions target) - Build extensions image from postgres-extensions target in a workflow - Upload custom extensions to S3 (different directory)	2023-06-29 16:33:26 +03:00
Joonas Koivunen	44e7d5132f	fix: hide token from logs (#4584 ) fixes #4583 and also changes all needlessly arg listing places to use `skip_all`.	2023-06-29 15:53:16 +03:00
Alex Chi Z	c19681bc12	neon_local: support force init (#4363 ) When we use local SSD for bench and create `.neon` directory before we do `cargo neon init`, the initialization process fails because the directory already exists. This PR adds a flag `--force` that removes everything inside the directory if `.neon` already exists. --------- Signed-off-by: Alex Chi Z. <chi@neon.tech>	2023-06-28 11:39:07 -04:00
Shany Pozin	ec9b585837	Add Activating as a possible state for attaching a tenant in test_tenant_relocation validation (#4581 ) Fix flakiness of test_tenant_relocation	2023-06-28 12:35:52 +03:00
Joonas Koivunen	02ef246db6	refactor: to pattern of await after timeout (#4432 ) Refactor the `!completed` to be about `Option<_>` instead, side-stepping any boolean true/false or false/true. As discussed on https://github.com/neondatabase/neon/pull/4399#discussion_r1219321848	2023-06-28 06:18:45 +00:00
Konstantin Knizhnik	195d4932c6	Set LwLSN after WAL records when redo is performed or skipped (#4579 ) ## Problem See #4516 Inspecting the log, one can see that if lwlsn is set to the beginning of the applied WAL record, an incorrect version of the page is loaded: ``` 2023-06-27 18:36:51.930 GMT [3273945] CONTEXT: WAL redo at 0/14AF6F0 for Heap/INSERT: off 2 flags 0x01; blkref #0: rel 1663/5/1259, blk 0 FPW 2023-06-27 18:36:51.930 GMT [3273945] LOG: Do REDO block 0 of rel 1663/5/1259 fork 0 at LSN 0/014AF6F0..0/014AFA60 2023-06-27 18:37:02.173 GMT [3273963] LOG: Read blk 0 in rel 1663/5/1259 fork 0 (request LSN 0/014AF6F0): lsn=0/0143C7F8 at character 22 2023-06-27 18:37:47.780 GMT [3273945] LOG: apply WAL record at 0/1BB8F38 xl_tot_len=188, xl_prev=0/1BB8EF8 2023-06-27 18:37:47.780 GMT [3273945] CONTEXT: WAL redo at 0/1BB8F38 for Heap/INPLACE: off 2; blkref #0: rel 1663/5/1259, blk 0 2023-06-27 18:37:47.780 GMT [3273945] LOG: Do REDO block 0 of rel 1663/5/1259 fork 0 at LSN 0/01BB8F38..0/01BB8FF8 2023-06-27 18:37:47.780 GMT [3273945] CONTEXT: WAL redo at 0/1BB8F38 for Heap/INPLACE: off 2; blkref #0: rel 1663/5/1259, blk 0 2023-06-27 18:37:47.780 GMT [3273945] PANIC: invalid lp ``` ## Summary of changes 1. Use end record LSN for both cases 2. Update lwlsn for relation metadata	2023-06-28 09:04:13 +03:00
Alexander Bayandin	7fe0a4bf1a	Fix promote-images job (#4577 ) ## Problem ``` + crane tag neondatabase/extensions:3337 latest Error: fetching "neondatabase/extensions:3337": GET https://index.docker.io/v2/neondatabase/extensions/manifests/3337: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3337 ``` We don't build `neondatabase/extensions` image yet (broken in https://github.com/neondatabase/neon/pull/4505) ## Summary of changes - Do not try to interact with `neondatabase/extensions`	2023-06-27 20:05:10 +03:00
bojanserafimov	ef2b9ffbcb	Basebackup forward compatibility (#4572 )	2023-06-27 12:05:27 -04:00
Alexander Bayandin	250a27fb85	Upload Postgres Extensions to S3 (#4505 ) ## Problem We want to store Postgres Extensions in S3 (resolves https://github.com/neondatabase/neon/issues/4493). Proposed solution: - Create a separate docker image (from scratch) that contains only extensions - Do not include extensions into compute-node (except for neon extensions)* - For release and main builds, extract extensions from the image and upload them to S3 (`s3://<bucket>/<release version>/<postgres_version>/...`)** *) We're not doing it until the feature is fully implemented **) This differs from the initial proposal in https://github.com/neondatabase/neon/issues/4493 of putting extensions straight into `s3://<bucket>/<postgres_version>/...`, because we can't upload a directory atomically. A drawback of this is that we end up with unnecessary copies of files, ~2.1 GB per release (i.e. +2.1 GB for each commit in main also). We don't really need to update extensions for each release if there are no relevant changes, but this requires extra work. ## Summary of changes - Created a separate stage in Dockerfile.compute-node `postgres-extensions` that contains only extensions - Added a separate step in a workflow that builds `postgres-extensions` image (because of a bug in kaniko this step is commented out because it takes way too long to get built) - Extract extensions from the image and upload files to S3 for release and main builds - Upload extensions only for staging (for now)	2023-06-27 16:23:22 +01:00
Shany Pozin	d748615c1f	RemoteTimelineClient::delete_all() to use s3::delete_objects (#4461 ) ## Problem [#4325](https://github.com/neondatabase/neon/issues/4325) ## Summary of changes Use delete_objects() method	2023-06-27 15:01:32 +03:00
Dmitry Rodionov	681c6910c2	Straighten the spec for timeline delete (#4538 ) ## Problem Let's keep 500 for unusual stuff that is not considered normal. Came up during one of the discussions around console logs now seeing these 500s. ## Summary of changes - Return 409 Conflict instead of 500 - Remove 200 ok status because it is not used anymore	2023-06-27 13:56:32 +03:00
Vadim Kharitonov	148f0f9b21	Compile `pg_roaringbitmap` extension (#4564 ) ## Problem ``` postgres=# create extension roaringbitmap; CREATE EXTENSION postgres=# select roaringbitmap('{1,100,10}'); roaringbitmap ------------------------------------------------ \x3a30000001000000000002001000000001000a006400 (1 row) ```	2023-06-27 10:55:03 +01:00
Shany Pozin	a7f3f5f356	Revert "run `Layer::get_value_reconstruct_data` in `spawn_blocking`#4498" (#4569 ) This reverts commit `1faf69a698`.	2023-06-27 10:57:28 +03:00
Felix Prasanna	00d1cfa503	bump VM_BUILDER_VERSION to 0.11.0 (#4566 ) Routine bump of autoscaling version `0.8.0` -> `0.11.0`	2023-06-26 14:10:27 -04:00
Christian Schwarz	1faf69a698	run `Layer::get_value_reconstruct_data` in `spawn_blocking` (#4498 ) This PR concludes the "async `Layer::get_value_reconstruct_data`" project. The problem we're solving is that, before this patch, we'd execute `Layer::get_value_reconstruct_data` on the tokio executor threads. This function is IO- and/or CPU-intensive. The IO is using VirtualFile / std::fs; hence it's blocking. This results in unfairness towards other tokio tasks, especially under (disk) load. Some context can be found at https://github.com/neondatabase/neon/issues/4154 where I suspect (but can't prove) load spikes of logical size calculation to cause heavy eviction skew. Sadly we don't have tokio runtime/scheduler metrics to quantify the unfairness. But generally, we know blocking the executor threads on std::fs IO is bad. So, let's have this change and watch out for severe perf regressions in staging & during rollout. ## Changes * rename `Layer::get_value_reconstruct_data` to `Layer::get_value_reconstruct_data_blocking` * add a new blanket impl'd `Layer::get_value_reconstruct_data` `async_trait` method that runs `get_value_reconstruct_data_blocking` inside `spawn_blocking`. * The `spawn_blocking` requires `'static` lifetime of the captured variables; hence I had to change the data flow to _move_ the `ValueReconstructState` into and back out of get_value_reconstruct_data instead of passing a reference. It's a small struct, so I don't expect a big performance penalty. 
## Performance Fundamentally, the code changes cause the following performance-relevant changes: * Latency & allocations: each `get_value_reconstruct_data` call now makes a short-lived allocation because `async_trait` is just sugar for boxed futures under the hood * Latency: `spawn_blocking` adds some latency because it needs to move the work to a thread pool * using `spawn_blocking` plus the existing synchronous code inside is probably more efficient than switching all the synchronous code to tokio::fs because _each_ tokio::fs call does `spawn_blocking` under the hood. * Throughput: the `spawn_blocking` thread pool is much larger than the async executor thread pool. Hence, as long as the disks can keep up, which they should according to AWS specs, we will be able to deliver higher `get_value_reconstruct_data` throughput. * Disk IOPS utilization: we will see higher disk utilization if we get more throughput. Not a problem because the disks in prod are currently under-utilized, according to node_exporter metrics & the AWS specs. * CPU utilization: at higher throughput, CPU utilization will be higher. Slightly higher latency under regular load is acceptable given the throughput gains and expected better fairness during disk load peaks, such as logical size calculation peaks uncovered in #4154. ## Full Stack Of Preliminary PRs This PR builds on top of the following preliminary PRs 1. Clean-ups * https://github.com/neondatabase/neon/pull/4316 * https://github.com/neondatabase/neon/pull/4317 * https://github.com/neondatabase/neon/pull/4318 * https://github.com/neondatabase/neon/pull/4319 * https://github.com/neondatabase/neon/pull/4321 * Note: these were mostly to find an alternative to #4291, which I thought we'd need in my original plan where we would need to convert `Tenant::timelines` into an async locking primitive (#4333). In reviews, we walked away from that, but these cleanups were still quite useful. 2. https://github.com/neondatabase/neon/pull/4364 3. 
https://github.com/neondatabase/neon/pull/4472 4. https://github.com/neondatabase/neon/pull/4476 5. https://github.com/neondatabase/neon/pull/4477 6. https://github.com/neondatabase/neon/pull/4485 7. https://github.com/neondatabase/neon/pull/4441	2023-06-26 11:43:11 +02:00
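The "move the state in and back out" data flow described in this commit can be illustrated with a minimal std-only sketch. It assumes `std::thread::spawn` as a stand-in for tokio's `spawn_blocking` (both require a `'static` closure); `ReconstructStateSketch` and both function bodies are hypothetical and are not the pageserver's actual code.

```rust
use std::thread;

/// Hypothetical stand-in for the small state struct mentioned in the commit.
#[derive(Debug, Default)]
struct ReconstructStateSketch {
    records: Vec<String>,
}

/// Synchronous worker: takes the state by value and hands it back when done.
fn get_value_reconstruct_data_blocking(
    mut state: ReconstructStateSketch,
    key: u32,
) -> ReconstructStateSketch {
    state.records.push(format!("record for key {}", key));
    state
}

/// Wrapper that runs the blocking worker on another thread.
/// A `&mut` borrow of the caller's state could not be captured here, because
/// the closure must be `'static`; moving the value in and out avoids that.
fn get_value_reconstruct_data(state: ReconstructStateSketch, key: u32) -> ReconstructStateSketch {
    thread::spawn(move || get_value_reconstruct_data_blocking(state, key))
        .join()
        .expect("worker thread panicked")
}
```

A caller would write `state = get_value_reconstruct_data(state, key)`, reassigning the returned value instead of passing a reference.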
Christian Schwarz	44a441080d	bring back spawn_blocking for `compact_level0_phase1` (#4537 ) The stats for `compact_level0_phase1` that I added in #4527 show the following breakdown (24h data from prod, only looking at compactions with > 1 L1 produced): * 10%ish of wall-clock time spent between the two read locks * I learned that the `DeltaLayer::iter()` and `DeltaLayer::key_iter()` calls actually do IO, even before we call `.next()`. I suspect that is why they take so much time between the locks. * 80+% of wall-clock time spent writing layer files * Lock acquisition time is irrelevant (low double-digit microseconds at most) * The generation of the holes holds the read lock for a relatively long time and it's proportional to the amount of keys / IO required to iterate over them (max: 110ms in prod; staging (nightly benchmarks): multiple seconds). Find below screenshots from my ad-hoc spreadsheet + some graphs. <img width="1182" alt="image" src="https://github.com/neondatabase/neon/assets/956573/81398b3f-6fa1-40dd-9887-46a4715d9194"> <img width="901" alt="image" src="https://github.com/neondatabase/neon/assets/956573/e4ac0393-f2c1-4187-a5e5-39a8b0c394c9"> <img width="210" alt="image" src="https://github.com/neondatabase/neon/assets/956573/7977ade7-6aa5-4773-a0a2-f9729aecee0d"> ## Changes In This PR This PR makes the following changes: * rearrange the `compact_level0_phase1` code such that we build the `all_keys_iter` and `all_values_iter` later than before * only grab the `Timeline::layers` lock once, and hold it until we've computed the holes * run compact_level0_phase1 in spawn_blocking, pre-grabbing the `Timeline::layers` lock in the async code and passing it in as an `OwnedRwLockReadGuard`. * the code inside spawn_blocking drops this guard after computing the holes * the `OwnedRwLockReadGuard` requires the `Timeline::layers` to be wrapped in an `Arc`. I think that's Ok, the locking for the RwLock is more heavy-weight than an additional pointer indirection. 
## Alternatives Considered The naive alternative is to throw the entire function into `spawn_blocking`, and use `blocking_read` for `Timeline::layers` access. What I've done in this PR is better because, with this alternative, 1. while we `blocking_read()`, we'd waste one slot in the spawn_blocking pool 2. there's deadlock risk because the spawn_blocking pool is a finite resource ![image](https://github.com/neondatabase/neon/assets/956573/46c419f1-6695-467e-b315-9d1fc0949058) ## Metadata Fixes https://github.com/neondatabase/neon/issues/4492	2023-06-26 11:42:17 +02:00
Sasha Krassovsky	c215389f1c	quote_ident identifiers when creating neon_superuser (#4562 )	2023-06-24 10:34:15 +03:00
Sasha Krassovsky	b1477b4448	Create neon_superuser role, grant it to roles created from control plane (#4425 ) ## Problem Currently, if a user creates a role, it won't by default have any grants applied to it. If the compute restarts, the grants get applied. This gives a very strange UX of being able to drop roles/not have any access to anything at first, and then once something triggers a config application, suddenly grants are applied. This removes these grants.	2023-06-24 01:38:27 +03:00
Christian Schwarz	a500bb06fb	use preinitialize_metrics to initialize page cache metrics (#4557 ) This is follow-up to ``` commit `2252c5c282` Author: Alex Chi Z <iskyzh@gmail.com> Date: Wed Jun 14 17:12:34 2023 -0400 metrics: convert some metrics to pageserver-level (#4490) ```	2023-06-23 16:40:50 -04:00
Christian Schwarz	15456625c2	don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker (#4560 ) The consumption metrics synthetic size worker does logical size calculation. Logical size calculation currently does synchronous disk IO. This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving other futures. While there's work on the way to move the synchronous disk IO into spawn_blocking, the quickfix here is to use the BACKGROUND_RUNTIME instead of MGMT_REQUEST_RUNTIME. Actually it's not just a quickfix. We simply shouldn't be blocking MGMT_REQUEST_RUNTIME executor threads on CPU or sync disk IO. That work isn't done yet, as many of the mgmt tasks still _do_ disk IO. But it's not as intensive as the logical size calculations that we're fixing here. While we're at it, fix disk-usage-based eviction in a similar way. It wasn't the culprit here, according to prod logs, but it can theoretically be a little CPU-intensive. More context, including graphs from Prod: https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949	2023-06-23 15:40:36 -04:00
Vadim Kharitonov	a3f0dd2d30	Compile `pg_uuidv7` (#4558 ) Doc says that it should be added into `shared_preload_libraries`, but, practically, it's not required. ``` postgres=# create extension pg_uuidv7; CREATE EXTENSION postgres=# SELECT uuid_generate_v7(); uuid_generate_v7 -------------------------------------- 0188e823-3f8f-796c-a92c-833b0b2d1746 (1 row) ```	2023-06-23 15:56:49 +01:00
Christian Schwarz	76718472be	add pageserver-global histogram for basebackup latency (#4559 ) The histogram distinguishes by ok/err. I took the liberty to create a small abstraction for such use cases. It helps keep the label values inside `metrics.rs`, right next to the place where the metric and its labels are declared.	2023-06-23 16:42:12 +02:00
Alexander Bayandin	c07b6ffbdc	Fix git tag name for release (#4545 ) ## Problem A git tag for a release has an extra `release-` prefix (it looks like `release-release-3439`). ## Summary of changes - Do not add `release-` prefix when create git tag	2023-06-23 12:52:17 +01:00
Alexander Bayandin	6c3605fc24	Nightly Benchmarks: Increase timeout for pgbench-compare job (#4551 ) ## Problem In the test environment vacuum duration fluctuates from ~1h to ~5h, along with another two 1h benchmarks (`select-only` and `simple-update`) it could be up to 7h which is longer than 6h timeout. ## Summary of changes - Increase timeout for pgbench-compare job to 8h - Remove 6h timeouts from Nightly Benchmarks (this is a default value)	2023-06-23 12:47:37 +01:00
Vadim Kharitonov	d96d51a3b7	Update rust to 1.70.0 (#4550 )	2023-06-23 13:09:04 +02:00
Alex Chi Z	a010b2108a	pgserver: better template config file (#4554 ) * `compaction_threshold` should be an integer, not a string. * uncomment `[section]` so that if a user needs to modify the config, they can simply uncomment the corresponding line. Otherwise it's easy for us to forget uncommenting the `[section]` when uncommenting the config item we want to configure. Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-06-23 10:18:06 +03:00
Anastasia Lubennikova	2f618f46be	Use BUILD_TAG in compute_ctl binary. (#4541 ) Pass BUILD_TAG to compute_ctl binary. We need it to access versioned extension storage.	2023-06-22 17:06:16 +03:00
Alexander Bayandin	d3aa8a48ea	Update client libs for test_runner/pg_clients to their latest versions (#4547 ) Resolves https://github.com/neondatabase/neon/security/dependabot/27	2023-06-21 16:20:35 +01:00
Christian Schwarz	e4da76f021	update_gc_info: fix typo in timeline_id tracing field (#4546 ) Commit ``` commit `472cc17b7a` Author: Dmitry Rodionov <dmitry@neon.tech> Date: Thu Jun 15 17:30:12 2023 +0300 propagate lock guard to background deletion task (#4495) ``` did a drive-by fix, but, the drive-by had a typo. ``` gc_loop{tenant_id=2e2f2bff091b258ac22a4c4dd39bd25d}:update_gc_info{timline_id=837c688fd37c903639b9aa0a6dd3f1f1}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000024DA0D1-000000000443FB51}:panic{thread=background op worker location=pageserver/src/tenant/timeline.rs:4843:25}: missing extractors: ["TimelineId"] Stack backtrace: 0: utils::logging::tracing_panic_hook at /libs/utils/src/logging.rs:166:21 1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9 2: std::panicking::rust_panic_with_hook at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13 3: std::panicking::begin_panic_handler::{{closure}} at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:579:13 4: std::sys_common::backtrace::__rust_end_short_backtrace at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18 5: rust_begin_unwind at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5 6: core::panicking::panic_fmt at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14 7: pageserver::tenant::timeline::debug_assert_current_span_has_tenant_and_timeline_id at /pageserver/src/tenant/timeline.rs:4843:25 8: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0} at /pageserver/src/tenant/timeline.rs:4368:9 9: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}> as 
core::future::future::Future>::poll at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9 10: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0} at /pageserver/src/tenant/timeline.rs:4363:5 11: <pageserver::tenant::timeline::Timeline>::get_reconstruct_data::{closure#0} at /pageserver/src/tenant/timeline.rs:2618:69 12: <pageserver::tenant::timeline::Timeline>::get::{closure#0} at /pageserver/src/tenant/timeline.rs:565:13 13: <pageserver::tenant::timeline::Timeline>::list_slru_segments::{closure#0} at /pageserver/src/pgdatadir_mapping.rs:427:42 14: <pageserver::tenant::timeline::Timeline>::is_latest_commit_timestamp_ge_than::{closure#0} at /pageserver/src/pgdatadir_mapping.rs:390:13 15: <pageserver::tenant::timeline::Timeline>::find_lsn_for_timestamp::{closure#0} at /pageserver/src/pgdatadir_mapping.rs:338:17 16: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0} at /pageserver/src/tenant/timeline.rs:3967:71 17: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}> as core::future::future::Future>::poll at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9 18: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0} at /pageserver/src/tenant/timeline.rs:3948:5 19: <pageserver::tenant::Tenant>::refresh_gc_info_internal::{closure#0} at /pageserver/src/tenant.rs:2687:21 20: <pageserver::tenant::Tenant>::gc_iteration_internal::{closure#0} at /pageserver/src/tenant.rs:2551:13 21: <pageserver::tenant::Tenant>::gc_iteration::{closure#0} at /pageserver/src/tenant.rs:1490:13 22: pageserver::tenant::tasks::gc_loop::{closure#0}::{closure#0} at /pageserver/src/tenant/tasks.rs:187:21 23: pageserver::tenant::tasks::gc_loop::{closure#0} at /pageserver/src/tenant/tasks.rs:208:5 ```	2023-06-21 18:00:14 +03:00
Christian Schwarz	870740c949	cargo update -p openssl (#4542 ) To unblock release https://github.com/neondatabase/neon/pull/4536#issuecomment-1600678054 Context: https://rustsec.org/advisories/RUSTSEC-2023-0044	2023-06-21 15:50:52 +03:00
Dmitry Rodionov	75d583c04a	Tenant::load: fix uninit timeline marker processing (#4458 ) ## Problem During timeline creation we create a special mark file whose presence indicates that initialization didn't complete successfully. In case of a crash restart we can remove such a half-initialized timeline, and the following retry from the control plane side should perform another attempt. So in case of a possible crash restart during initial loading we have the following picture: ``` timelines | - <timeline_id>___uninit | - <timeline_id> | - | <timeline files> ``` We call `std::fs::read_dir` to walk files in the `timelines` directory one by one. If we see an uninit file we proceed with deletion of both the timeline directory and the uninit file. If we see a timeline we check if an uninit file exists and do the same cleanup. But in fact it's possible for both branches to be true at the same time. The result of readdir doesn't reflect subsequent modifications of the directory state. So you can still get a "valid" entry on the next iteration of the loop despite the fact that it was deleted in one of the previous iterations of the loop. 
To see that you can apply the following patch (it disables uninit mark cleanup on successful timeline creation): ```diff diff --git a/pageserver/src/tenant.rs b/pageserver/src/tenant.rs index 4beb2664..b3cdad8f 100644 --- a/pageserver/src/tenant.rs +++ b/pageserver/src/tenant.rs @@ -224,11 +224,6 @@ impl UninitializedTimeline<'_> { ) })?; } - uninit_mark.remove_uninit_mark().with_context(|| { - format!( - "Failed to remove uninit mark file for timeline {tenant_id}/{timeline_id}" - ) - })?; v.insert(Arc::clone(&new_timeline)); new_timeline.maybe_spawn_flush_loop(); ``` And perform the following steps: ```bash neon_local init neon_local start neon_local tenant create neon_local stop neon_local start ``` The error is: ```log INFO load{tenant_id=X}:blocking: Found an uninit mark file .neon/tenants/X/timelines/Y.___uninit, removing the timeline and its uninit mark 2023-06-09T18:43:41.664247Z ERROR load{tenant_id=X}: load failed, setting tenant state to Broken: failed to load metadata Caused by: 0: Failed to read metadata bytes from path .neon/tenants/X/timelines/Y/metadata 1: No such file or directory (os error 2) ``` So the uninit mark got deleted together with the timeline directory, but we still got a directory entry for it and tried to load it. The bug prevented the tenant from being successfully loaded. ## Summary of changes Ideally I think we shouldn't place uninit marks in the same directory as timeline directories but move them to a separate directory and gather them as an input to the actual listing, but that would be sort of an on-disk format change, so just check whether entries are still valid before operating on them.	2023-06-21 14:25:58 +03:00
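The fix described above, checking that a directory entry is still valid before acting on it, can be sketched roughly as follows. This is a minimal illustration and not the actual `Tenant::load` code: the helper name `cleanup_and_list_timelines` is hypothetical, and the `.___uninit` suffix is an assumption taken from the log line in the commit message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed marker suffix, copied from the log line in the commit message.
const UNINIT_SUFFIX: &str = ".___uninit";

/// Hypothetical helper: removes half-initialized timelines and returns the
/// timeline directories that remain.
fn cleanup_and_list_timelines(timelines_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut survivors = Vec::new();
    for entry in fs::read_dir(timelines_dir)? {
        let path = entry?.path();
        // read_dir may still yield an entry that an earlier iteration of this
        // loop deleted, so re-check that it exists before touching it.
        if !path.exists() {
            continue;
        }
        let name = match path.file_name().and_then(|n| n.to_str()) {
            Some(n) => n.to_owned(),
            None => continue,
        };
        if let Some(timeline_name) = name.strip_suffix(UNINIT_SUFFIX) {
            // Entry is an uninit mark: drop the timeline directory and the mark.
            let timeline_dir = timelines_dir.join(timeline_name);
            if timeline_dir.exists() {
                fs::remove_dir_all(&timeline_dir)?;
            }
            fs::remove_file(&path)?;
            continue;
        }
        // Entry is a timeline directory: check whether a mark exists for it.
        let mark = timelines_dir.join(format!("{}{}", name, UNINIT_SUFFIX));
        if mark.exists() {
            fs::remove_dir_all(&path)?;
            fs::remove_file(&mark)?;
            continue;
        }
        survivors.push(path);
    }
    Ok(survivors)
}
```

Whichever of the two entries the iteration yields first, the second one is skipped by the existence check instead of being loaded as a timeline.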
Alek Westover	b4c5beff9f	`list_files` function in `remote_storage` (#4522 )	2023-06-20 15:36:28 -04:00
bojanserafimov	90e1f629e8	Add test for `skip_pg_catalog_updates` (#4530 )	2023-06-20 11:38:59 -04:00
Alek Westover	2023e22ed3	Add `RelationError` error type to pageserver rather than string parsing error messages (#4508 )	2023-06-19 13:14:20 -04:00
Christian Schwarz	036fda392f	log timings for compact_level0_phase1 (#4527 ) The data will help decide whether it's ok to keep holding Timeline::layers in shared mode until after we've calculated the holes. Other timings are to understand the general breakdown of timings in that function. Context: https://github.com/neondatabase/neon/issues/4492	2023-06-19 17:25:57 +03:00
Arseny Sher	557abc18f3	Fix test_s3_wal_replay assertion flakiness. Supposedly fixes https://github.com/neondatabase/neon/issues/4277	2023-06-19 16:08:20 +04:00
Arseny Sher	3b06a5bc54	Raise pageserver walreceiver timeouts. I observe sporadic reconnections with ~10k idle computes. It looks like a separate issue, probably walreceiver runtime gets blocked somewhere, but in any case 2-3 seconds is too small.	2023-06-19 15:59:38 +04:00
Alexander Bayandin	1b947fc8af	test_runner: workaround rerunfailures and timeout incompatibility (#4469 ) ## Problem `pytest-timeout` and `pytest-rerunfailures` are incompatible (or rather not fully compatible). Timeouts aren't set for reruns. Ref https://github.com/pytest-dev/pytest-rerunfailures/issues/99 ## Summary of changes - Dynamically make timeouts `func_only` for tests that we're going to retry. It applies timeouts for reruns as well.	2023-06-16 18:08:11 +01:00
Christian Schwarz	78082d0b9f	create_delta_layer: avoid needless `stat` (#4489 ) We already do it inside `frozen_layer.write_to_disk()`. Context: https://github.com/neondatabase/neon/pull/4441#discussion_r1228083959	2023-06-16 16:54:41 +02:00
Alexander Bayandin	190c3ba610	Add tags for releases (#4524 ) ## Problem It's not a trivial task to find corresponding changes for a particular release (for example, for 3371 — 🤷) Ref: https://neondb.slack.com/archives/C04BLQ4LW7K/p1686761537607649?thread_ts=1686736854.174559&cid=C04BLQ4LW7K ## Summary of changes - Tag releases - Add a manual trigger for the release workflow	2023-06-16 14:17:37 +01:00
Christian Schwarz	14d495ae14	create_delta_layer: improve misleading TODO comment (#4488 ) Context: https://github.com/neondatabase/neon/pull/4441#discussion_r1228086608	2023-06-16 14:23:55 +03:00
Dmitry Rodionov	472cc17b7a	propagate lock guard to background deletion task (#4495 ) ## Problem 1. During the rollout we got a panic: "timeline that we were deleting was concurrently removed from 'timelines' map" that was caused by the lock guard not being propagated to the background part of the deletion. The existing test didn't catch it because the failpoint that was used for verification was placed earlier, prior to background task spawning. 2. When looking at the surrounding code one more bug was detected. We removed the timeline from the map before deletion is finished, which breaks client retry logic, because it will indicate 404 before actual deletion is completed, which can lead to the client stopping its retry poll earlier. ## Summary of changes 1. Carry the lock guard over to background deletion. Ensure the existing test case fails without the applied patch (second deletion becomes stuck without it, which eventually leads to a test failure). 2. Move the delete_all call earlier so that removing the timeline from the map is the last thing done during deletion. Additionally I've added timeline_id to the `update_gc_info` span, because `debug_assert_current_span_has_tenant_and_timeline_id` in `download_remote_layer` was firing when `update_gc_info` led to on-demand downloads via `find_lsn_for_timestamp` (caught by @problame). This is not directly related to the PR but fixes possible flakiness. Another smaller set of changes involves the deletion wrapper used in python tests. Now there is a simpler wrapper that waits for deletions to complete, `timeline_delete_wait_completed`. Most of the test_delete_timeline.py tests make negative tests, i.e., "does ps_http.timeline_delete() fail in this and that scenario". These can be left alone. In other places, where we actually do the deletions, we need to use the helper that polls for completion. 
Discussion https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639 resolves #4496 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-06-15 17:30:12 +03:00
Arthur Petukhovsky	76413a0fb8	Revert reconnect_timeout to improve performance (#4512 ) Default value for `wal_acceptor_reconnect_timeout` was changed in https://github.com/neondatabase/neon/pull/4428 and it affected performance up to 20% in some cases. Revert the value back.	2023-06-15 15:26:59 +03:00
Alexander Bayandin	e60b70b475	Fix data ingestion scripts (#4515 ) ## Problem When I switched `psycopg2.connect` from context manager to a regular function call in https://github.com/neondatabase/neon/pull/4382 I embarrassingly forgot about commit, so it doesn't really put data into DB 😞 ## Summary of changes - Enable autocommit for data ingestion scripts	2023-06-15 15:01:06 +03:00
Alex Chi Z	2252c5c282	metrics: convert some metrics to pageserver-level (#4490 ) ## Problem Some metrics are better to be observed at page-server level. Otherwise, as we have a lot of tenants in production, we cannot do a sum b/c Prometheus has limit on how many time series we can aggregate. This also helps reduce metrics scraping size. ## Summary of changes Some integration tests are likely not to pass as it will check the existence of some metrics. Waiting for CI complete and fix them. Metrics downgraded: page cache hit (where we are likely to have a page-server level page cache in the future instead of per-tenant), and reconstruct time (this would better be tenant-level, as we have one pg replayer for each tenant, but now we make it page-server level as we do not need that fine-grained data). --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-06-14 17:12:34 -04:00
Alexander Bayandin	94f315d490	Remove neon-image-depot job (#4506 ) ## Problem `neon-image-depot` is an experimental job we use to compare with the main `neon-image` job. But it's not stable and right now we don't have the capacity to properly fix and evaluate it. We can come back to this later. ## Summary of changes Remove `neon-image-depot` job	2023-06-14 19:03:09 +01:00
Christian Schwarz	cd3faa8c0c	test_basic_eviction: avoid some sources of flakiness (#4504 ) We've seen the download_layer() call return 304 in prod because of a spurious on-demand download caused by a GetPage request from compute. Avoid these and some other sources of on-demand downloads by shutting down compute, SKs, and by disabling background loops. CF https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4498/5258914461/index.html#suites/2599693fa27db8427603ba822bcf2a20/357808fd552fede3	2023-06-14 19:04:22 +02:00
Arthur Petukhovsky	a7a0c3cd27	Invalidate proxy cache in http-over-sql (#4500 ) HTTP queries failed with errors `error connecting to server: failed to lookup address information: Name or service not known\n\nCaused by:\n failed to lookup address information: Name or service not known` The fix reused cache invalidation logic in proxy from usual postgres connections and added it to HTTP-over-SQL queries. Also removed a timeout for HTTP requests, because it almost never worked on staging (50s+ time just to start the compute), and we can have a similar case in production. Should be ok, since we have limits for the requests and responses.	2023-06-14 19:24:46 +03:00
Dmitry Rodionov	ee9a5bae43	Filter only active timelines for compaction (#4487 ) Previously we may've included Stopping/Broken timelines here, which leads to errors in logs -> causes tests to sporadically fail resolves #4467	2023-06-14 19:07:42 +03:00
Alexander Bayandin	9484b96d7c	GitHub Autocomment: do not fail the job (#4478 ) ## Problem If the script fails to generate a test summary, the step also fails the job/workflow (despite this could be a non-fatal problem). ## Summary of changes - Separate JSON parsing and summarisation into separate functions - Wrap functions calling into try..catch block, add an error message to GitHub comment and do not fail the step - Make `scripts/comment-test-report.js` a CLI script that can be run locally (mock GitHub calls) to make it easier to debug issues locally	2023-06-14 15:07:30 +01:00
Shany Pozin	ebee8247b5	Move s3 delete_objects to use chunks of 1000 OIDs (#4463 ) See https://github.com/neondatabase/neon/pull/4461#pullrequestreview-1474240712	2023-06-14 15:38:01 +03:00
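Batching deletes in groups of at most 1000 keys can be sketched with slice chunking. This is an illustration under assumptions, not the `remote_storage` implementation: `delete_in_chunks` and the `delete_batch` callback are hypothetical names, and the callback stands in for whatever issues the real batched delete request.

```rust
/// Per-request key limit, taken from the commit title.
const MAX_KEYS_PER_DELETE: usize = 1000;

/// Hypothetical helper: calls `delete_batch` once per chunk of at most
/// MAX_KEYS_PER_DELETE keys and stops at the first failing batch.
fn delete_in_chunks<F>(keys: &[String], mut delete_batch: F) -> Result<(), String>
where
    F: FnMut(&[String]) -> Result<(), String>,
{
    for chunk in keys.chunks(MAX_KEYS_PER_DELETE) {
        delete_batch(chunk)?;
    }
    Ok(())
}
```

With 2500 keys, the callback is invoked three times, with 1000, 1000 and 500 keys.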
bojanserafimov	3164ad7052	compute_ctl: Spec parser forward compatibility test (#4494 )	2023-06-13 21:48:09 -04:00
Alexander Bayandin	a0b3990411	Retry data ingestion scripts on connection errors (#4382 ) ## Problem From time to time, we're catching a race condition when trying to upload perf or regression test results. Ref: - https://neondb.slack.com/archives/C03H1K0PGKH/p1685462717870759 - https://github.com/neondatabase/cloud/issues/3686 ## Summary of changes Wrap `psycopg2.connect` method with `@backoff.on_exception` contextmanager	2023-06-13 22:33:42 +01:00
Stas Kelvich	4385e0c291	Return more RowDescription fields via proxy json endpoint As we aim to align client-side behavior with node-postgres, it's necessary for us to return these fields, because node-postgres does so as well.	2023-06-13 22:31:18 +03:00
Christian Schwarz	3693d1f431	turn Timeline::layers into tokio::sync::RwLock (#4441 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). # Full Stack Of Preliminary PRs Thanks to the countless preliminary PRs, this conversion is relatively straight-forward. 1. Clean-ups * https://github.com/neondatabase/neon/pull/4316 * https://github.com/neondatabase/neon/pull/4317 * https://github.com/neondatabase/neon/pull/4318 * https://github.com/neondatabase/neon/pull/4319 * https://github.com/neondatabase/neon/pull/4321 * Note: these were mostly to find an alternative to #4291, which I thought we'd need in my original plan where we would need to convert `Tenant::timelines` into an async locking primitive (#4333). In reviews, we walked away from that, but these cleanups were still quite useful. 2. https://github.com/neondatabase/neon/pull/4364 3. https://github.com/neondatabase/neon/pull/4472 4. https://github.com/neondatabase/neon/pull/4476 5. https://github.com/neondatabase/neon/pull/4477 6. https://github.com/neondatabase/neon/pull/4485 # Significant Changes In This PR ## `compact_level0_phase1` & `create_delta_layer` This commit partially reverts "pgserver: spawn_blocking in compaction (#4265)" `4e359db4c7`. Specifically, it reverts the `spawn_blocking`-ificiation of `compact_level0_phase1`. If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()` inside `compact_level0_phase1`. That would use up a thread in the `spawn_blocking` thread pool, which is hard-capped. I considered wrapping the code that follows the second `layers.read().await` into `spawn_blocking`, but there are lifetime issues with `deltas_to_compact`. Also, this PR switches the `create_delta_layer` _function_ back to async, and uses `spawn_blocking` inside to run the code that does sync IO, while keeping the code that needs to lock `Timeline::layers` async. 
## `LayerIter` and `LayerKeyIter` `Send` bounds I had to add a `Send` bound on the `dyn` type that `LayerIter` and `LayerKeyIter` wrap. Why? Because we now have the second `layers.read().await` inside `compact_level0_phase`, and these iterator instances are held across that await-point. More background: https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960 ## `DatadirModification::flush` Needed to replace the `HashMap::retain` with a hand-rolled variant because `TimelineWriter::put` is now async.	2023-06-13 18:38:41 +02:00
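The `DatadirModification::flush` note above says `HashMap::retain` had to be replaced with a hand-rolled variant, because a `retain` closure cannot contain an `.await`. A minimal sketch of one way to do that follows. It is not the code from the PR: `put`, `flush_matching` and the `u32`/`String` types are hypothetical stand-ins, and `put` is kept synchronous so the example runs without an async runtime.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the async `TimelineWriter::put`; synchronous
/// here so the sketch needs no async runtime.
fn put(sink: &mut Vec<(u32, String)>, key: u32, value: &str) {
    sink.push((key, value.to_string()));
}

/// Flushes every entry whose key satisfies `should_flush` and removes it from
/// `pending`; all other entries stay in the map, as with `HashMap::retain`.
fn flush_matching(
    pending: &mut HashMap<u32, String>,
    should_flush: impl Fn(u32) -> bool,
    sink: &mut Vec<(u32, String)>,
) {
    // Decide up front which keys to flush, so the map is not borrowed by an
    // iterator while the per-entry work runs.
    let mut keys: Vec<u32> = pending
        .keys()
        .copied()
        .filter(|k| should_flush(*k))
        .collect();
    keys.sort_unstable(); // deterministic order, only for the example

    for key in keys {
        if let Some(value) = pending.remove(&key) {
            // In async code this is where `put(...).await` would go.
            put(sink, key, &value);
        }
    }
}
```

Because the loop body is ordinary code rather than a closure passed to `retain`, it can await between entries once `put` becomes async.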
Christian Schwarz	fdf7a67ed2	init_empty_layer_map: use `try_write` (#4485 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). Or more specifically, #4441, where we turn Timeline::layers into a tokio::sync::RwLock. By using try_write() here, we can avoid turning init_empty_layer_map async, which is nice because much of its transitive call(er) graph isn't async.	2023-06-13 13:49:40 +02:00
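The `try_write` idea can be sketched as follows. This is an illustration only: it uses `std::sync::RwLock` (which also has `try_write`) instead of `tokio::sync::RwLock` so that it runs without tokio, and the `LayerMap` and `Timeline` types here are hypothetical, heavily simplified stand-ins.

```rust
use std::sync::RwLock;

/// Hypothetical, heavily simplified layer map.
#[derive(Default)]
struct LayerMap {
    next_open_layer_at: Option<u64>,
}

/// Hypothetical, heavily simplified timeline.
struct Timeline {
    layers: RwLock<LayerMap>,
}

impl Timeline {
    /// Stays a plain (non-async) function: `try_write` either gets the lock
    /// immediately or fails, so there is nothing to await.
    fn init_empty_layer_map(&self, start_lsn: u64) {
        let mut layers = self
            .layers
            .try_write()
            .expect("nobody else should hold the lock while the timeline is being created");
        layers.next_open_layer_at = Some(start_lsn);
    }
}
```

The assumption that makes this safe is that no other task can hold the lock at initialization time; if that were violated, the `expect` would panic instead of waiting.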
Alexey Kondratov	1299df87d2	[compute_ctl] Fix logging if catalog updates are skipped (#4480 ) Otherwise, it wasn't clear from the log when Postgres started up completely if catalog updates were skipped. Follow-up for `4936ab6`	2023-06-13 13:34:56 +02:00
Christian Schwarz	754ceaefac	make TimelineWriter `Send` by using `tokio::sync Mutex` internally (#4477 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). There, we want to switch `Timeline::layers` to be a `tokio::sync::RwLock`. That will require the `TimelineWriter` to become async, because at times its functions need to lock `Timeline::layers` in order to freeze the open layer. While doing that, rustc complains that we're now holding `Timeline::write_lock` across await points (lock order is that `write_lock` must be acquired before `Timelines::layers`). So, we need to switch it over to an async primitive.	2023-06-13 10:15:25 +02:00
Arseny Sher	143fa0da42	Remove timeout on test_close_on_connections_exit We have 300s timeout on all tests, and doubling logic in popen.wait sometimes exceeds 5s, making the test flaky. ref https://github.com/neondatabase/neon/issues/4211	2023-06-13 06:26:03 +04:00
bojanserafimov	4936ab6842	compute_ctl: add flag to avoid config step (#4457 ) Add backwards-compatible flag that cplane can use to speed up startup time	2023-06-12 13:57:02 -04:00
Christian Schwarz	939593d0d3	refactor check_checkpoint_distance to prepare for async Timeline::layers (#4476 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). There, we want to switch `Timeline::layers` to be a `tokio::sync::RwLock`. That will require the `TimelineWriter` to become async. That will require `freeze_inmem_layer` to become async. So, inside check_checkpoint_distance, we will have `freeze_inmem_layer().await`. But current rustc isn't smart enough to understand that we `drop(layers)` earlier, and hence, will complain about the `!Send` `layers` being held across the `freeze_inmem_layer().await`-point. This patch puts the guard into a scope, so rustc will shut up in the next patch where we make the transition for `TimelineWriter`. obsoletes https://github.com/neondatabase/neon/pull/4474	2023-06-12 17:45:56 +01:00
Christian Schwarz	2011cc05cd	make Delta{Value,Key}Iter Send (#4472 ) ... by switching the internal RwLock to a OnceCell. This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). See https://github.com/neondatabase/neon/pull/4462#issuecomment-1587398883 for more context. fixes https://github.com/neondatabase/neon/issues/4471	2023-06-12 17:45:56 +01:00
Arthur Petukhovsky	b0286e3c46	Always truncate WAL after restart (#4464 ) `c058e1cec2` skipped `truncate_wal()` if `write_lsn` is equal to the truncation position, but didn't take into account that `write_lsn` is reset on restart. Fixes a regression that looks like: ``` ERROR WAL acceptor{cid=22 ...}:panic{thread=WAL acceptor 19b6c1743666ec02991a7633c57178db/b07db8c88f4c76ea5ed0954c04cc1e74 location=safekeeper/src/wal_storage.rs:230:13}: unexpected write into non-partial segment file ``` This fix prevents skipping WAL truncation when we are running for the first time after restart.	2023-06-12 13:42:28 +00:00
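The interaction between the quick exit and a restart can be sketched like this. It is a toy model under stated assumptions, not the safekeeper code: the `WalStorage` struct, its fields and the `truncations` counter are hypothetical names chosen for illustration.

```rust
/// Hypothetical, simplified WAL storage state.
struct WalStorage {
    /// Reset on restart, so right after startup it may equal the truncation
    /// position by coincidence.
    write_lsn: u64,
    /// Hypothetical flag: false until the first truncation after a restart.
    truncated_after_restart: bool,
    /// Counts real truncations, only so the example can observe them.
    truncations: u32,
}

impl WalStorage {
    fn truncate_wal(&mut self, end_pos: u64) {
        // The quick exit is only valid once we have truncated at least once
        // since the restart.
        if self.truncated_after_restart && self.write_lsn == end_pos {
            return;
        }
        self.truncations += 1;
        self.write_lsn = end_pos;
        self.truncated_after_restart = true;
    }
}
```

With this guard the first call after a restart always truncates, and only later calls with an unchanged position take the quick exit.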
Heikki Linnakangas	e4f05ce0a2	Enable sanity check that disk_consistent_lsn is valid on created timeline. Commit `create_test_timeline: always put@initdb_lsn the minimum required keys` already switched us over to using valid initdb_lsns. All that's left to do is to actually flush the minimum keys so that we move from disk_consistent_lsn=Lsn(0) to disk_consistent_lsn=initdb_lsn. Co-authored-by: Christian Schwarz <christian@neon.tech> Part of https://github.com/neondatabase/neon/pull/4364	2023-06-12 11:56:49 +01:00
Heikki Linnakangas	8d106708d7	Clean up timeline initialization code. Clarify who's responsible for initializing the layer map. There were previously two different ways to do it: - create_empty_timeline and bootstrap_timeline let prepare_timeline() initialize an empty layer map. - branch_timeline passed a flag to initialize_with_lock() to tell initialize_with_lock to call load_layer_map(). Because it was a newly created timeline, load_layer_map() never found any layer files, so it just initialized an empty layer map. With this commit, prepare_new_timeline() always does it. The LSN to initialize it with is passed as argument. Other changes per function: prepare_timeline: - rename to 'prepare_new_timeline' to make it clear that it's only used when creating a new timeline, not when loading an existing timeline - always initialize an empty layer map. The caller can pass the LSN to initialize it with. (Previously, prepare_timeline would optionally load the layer map at 'initdb_lsn'. Some caller used that, while others let initialize_with_lock do it initialize_with_lock: - As mentioned above, remove the option to load the layer map - Acquire the 'timelines' lock in the function itself. None of the callers did any other work while holding the lock. - Rename it to finish_creation() to make its intent more clear. It's only used when creating a new timeline now. create_timeline_data: - Rename to create_timeline_struct() for clarity. It just initializes the Timeline struct, not any other "data" create_timeline_files: - use create_dir rather than create_dir_all, to be a little more strict. We know that the parent directory should already exist, and the timeline directory should not exist. - Move the call to create_timeline_struct() to the caller. It was just being "passed through" Part of https://github.com/neondatabase/neon/pull/4364	2023-06-12 11:56:49 +01:00
Christian Schwarz	f450369b20	timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map This patch inlines `initialize_with_lock` and then reorganizes the code such that we can `load_layer_map` without holding the `Tenant::timelines` lock. As a nice aside, we can get rid of the dummy() uninit mark, which has always been a terrible hack. Part of https://github.com/neondatabase/neon/pull/4364	2023-06-12 11:56:49 +01:00
Christian Schwarz	aad918fb56	create_test_timeline: tests for put@initdb_lsn optimization code	2023-06-12 11:04:49 +01:00
Christian Schwarz	86dd8c96d3	add infrastructure to expect use of initdb_lsn flush optimization	2023-06-12 11:04:49 +01:00
Christian Schwarz	6a65c4a4fe	create_test_timeline: always put@initdb_lsn the minimum required keys (#4451 ) See the added comment on `create_empty_timeline`. The various test cases now need to set a valid `Lsn` instead of `Lsn(0)`. Rough context: https://github.com/neondatabase/neon/pull/4364#discussion_r1221995691	2023-06-12 09:28:34 +00:00
Vadim Kharitonov	e9072ee178	Compile rdkit (#4442 ) `rdkit` extension ``` postgres=# create extension rdkit; CREATE EXTENSION postgres=# select 'c1[o,s]ncn1'::qmol; qmol ------------- c1[o,s]ncn1 (1 row) ```	2023-06-12 11:13:33 +02:00
Joonas Koivunen	7e17979d7a	feat: http request logging on safekeepers. With RequestSpan, successful GETs are not logged, but all others, errors and warns on cancellations are.	2023-06-11 22:53:08 +04:00
Arseny Sher	227271ccad	Switch safekeepers to async. This is a full switch, fs io operations are also tokio ones, working through thread pool. Similar to pageserver, we have multiple runtimes for easier `top` usage and isolation. Notable points: - Now that guts of safekeeper.rs are full of .await's, we need to be very careful not to drop task at random point, leaving timeline in unclear state. Currently the only writer is walreceiver and we don't have top level cancellation there, so we are good. But to be safe probably we should add a fuse panicking if task is being dropped while operation on a timeline is in progress. - Timeline lock is Tokio one now, as we do disk IO under it. - Collecting metrics got a crutch: since prometheus Collector is synchronous, it spawns a thread with current thread runtime collecting data. - Anything involving closures becomes significantly more complicated, as async fns are already kinda closures + 'async closures are unstable'. - Main thread now tracks other main tasks, which got much easier. - The only sync place left is initial data loading, as otherwise clippy complains on timeline map lock being held across await points -- which is not bad here as it happens only in single threaded runtime of main thread. But having it sync doesn't hurt either. I'm concerned about performance of thread pool io offloading, async traits and many await points; but we can try and see how it goes. fixes https://github.com/neondatabase/neon/issues/3036 fixes https://github.com/neondatabase/neon/issues/3966	2023-06-11 22:53:08 +04:00
dependabot[bot]	fbf0367e27	build(deps): bump cryptography from 39.0.1 to 41.0.0 (#4409 )	2023-06-11 19:14:30 +01:00
Arthur Petukhovsky	a21b55fe0b	Use connect_timeout for broker::connect (#4452 ) Use `storage_broker::connect` everywhere. Add a default 5 seconds timeout for opening new connection.	2023-06-09 17:38:53 +03:00
Shany Pozin	add51e1372	Add delete_objects to storage api (#4449 ) ## Summary of changes Add missing delete_objects API to support bulk deletes	2023-06-09 13:23:12 +03:00
Alex Chi Z	cdce04d721	pgserver: add local manifest for atomic operation (#4422 ) ## Problem Part of https://github.com/neondatabase/neon/issues/4418 ## Summary of changes This PR implements the local manifest interfaces. After the refactor of timeline is done, we can integrate this with the current storage. The reader will stop at the first corrupted record. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-06-08 19:34:25 -04:00
bojanserafimov	6bac770811	Add cold start test (#4436 )	2023-06-08 18:11:33 -04:00
Stas Kelvich	c82d19d8d6	Fix NULLs handling in proxy json endpoint There were a few problems with null handling: * query_raw_txt() accepted a vector of strings, so it always (erroneously) treated "null" as a string instead of null. Change the rust pg client to accept a vector of Option<String> instead of just Strings. Adapt the code here to pass nulls as None. * pg_text_to_json() had a check that always interpreted the "NULL" string as null. That is wrong, and nulls were already handled by the match on None. This bug appeared as a bad attempt to parse arrays containing NULL elements. Fix the code by checking for the presence of quotes while parsing an array (no quotes -> null, quoted -> "null" string). The array parser fix also slightly changes behavior by always clearing the current entry when pushing to the resulting vector. This seems to be an omission in the previous code, but it looks like it was harmless, as the entry was not cleared only at the end of a nested or top-level array.	2023-06-08 16:00:18 +03:00
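The quoting rule described above (no quotes -> null, quoted -> "null" string) can be illustrated with a small parser. This is a sketch, not the proxy's `pg_text_to_json`: `parse_flat_pg_array` and `push_entry` are hypothetical names, only flat (non-nested) arrays are handled, and backslash escapes are only interpreted inside quotes.

```rust
/// Parses a flat Postgres text array such as `{a,NULL,"NULL"}`.
/// Returns `None` if the input is not wrapped in braces or a quote is left open.
fn parse_flat_pg_array(text: &str) -> Option<Vec<Option<String>>> {
    let inner = text.strip_prefix('{')?.strip_suffix('}')?;
    let mut out = Vec::new();
    if inner.is_empty() {
        return Some(out);
    }

    let mut entry = String::new();
    let mut quoted = false; // the current entry contained quotes
    let mut in_quotes = false; // we are currently between two quotes
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        match c {
            // Inside quotes, a backslash escapes the next character.
            '\\' if in_quotes => entry.push(chars.next()?),
            '"' => {
                in_quotes = !in_quotes;
                quoted = true;
            }
            ',' if !in_quotes => push_entry(&mut out, &mut entry, &mut quoted),
            _ => entry.push(c),
        }
    }
    if in_quotes {
        return None;
    }
    push_entry(&mut out, &mut entry, &mut quoted);
    Some(out)
}

/// Pushes the current entry and always clears it, together with the quote flag.
fn push_entry(out: &mut Vec<Option<String>>, entry: &mut String, quoted: &mut bool) {
    let value = std::mem::take(entry);
    if !*quoted && value.eq_ignore_ascii_case("null") {
        out.push(None); // unquoted NULL is SQL null
    } else {
        out.push(Some(value)); // quoted "NULL" stays a string
    }
    *quoted = false;
}
```

For example, `{a,NULL,"NULL",b}` yields the string `a`, a null, the string `NULL` and the string `b`.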
Stas Kelvich	d73639646e	Add more output options to proxy json endpoint With this commit the client can pass the following optional headers: `Neon-Raw-Text-Output: true`. Return postgres values as text, without parsing them. So numbers, objects, booleans, nulls and arrays will be returned as text. That can be useful in cases when client code wants to implement its own parsing or reuse parsing libraries from e.g. node-postgres. `Neon-Array-Mode: true`. Return postgres rows as arrays instead of objects. That is a more compact representation and also helps in some edge cases where it is hard to use rows represented as objects (e.g. when several fields have the same name).	2023-06-08 16:00:18 +03:00
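The duplicate-field-name edge case mentioned for `Neon-Array-Mode` can be shown with a toy model. This is not the proxy code: `row_as_object` and `row_as_array` are hypothetical helpers, and values are plain strings in standard collections rather than JSON.

```rust
use std::collections::BTreeMap;

/// Object mode: one entry per column name, so a repeated name overwrites
/// the earlier value.
fn row_as_object(columns: &[&str], values: &[&str]) -> BTreeMap<String, String> {
    let mut row = BTreeMap::new();
    for (column, value) in columns.iter().zip(values.iter()) {
        row.insert(column.to_string(), value.to_string());
    }
    row
}

/// Array mode: values by position, so nothing is lost.
fn row_as_array(values: &[&str]) -> Vec<String> {
    values.iter().map(|v| v.to_string()).collect()
}
```

For a query such as `select 1 as id, 2 as id`, the object form keeps a single `id` entry while the array form keeps both values.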
Dmitry Rodionov	d53f9ab3eb	delete timelines from s3 (#4384 ) Delete data from s3 when timeline deletion is requested ## Summary of changes UploadQueue is altered to support scheduling of delete operations in stopped state. This looks weird, and I'm thinking whether there are better options/refactorings for upload client to make it look better. Probably can be part of https://github.com/neondatabase/neon/issues/4378 Deletion is implemented directly in existing endpoint because changes are not that significant. If we want more safety we can separate those or create feature flag for new behavior. resolves [#4193](https://github.com/neondatabase/neon/issues/4193) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-06-08 15:01:22 +03:00
Dmitry Rodionov	8560a98d68	fix openapi spec to pass swagger editor validation (#4445 ) There shouldn't be a dash before `type: object`. Also added description.	2023-06-08 13:25:30 +03:00
Alex Chi Z	2e687bca5b	refactor: use LayerDesc in layer map (part 1) (#4408 ) ## Problem part of https://github.com/neondatabase/neon/issues/4392 ## Summary of changes This PR adds a new HashMap that maps persistent layer desc to the layer object inside LayerMap. Originally I directly went towards adding such layer cache in Timeline, but the changes are too many and cannot be reviewed as a reasonably-sized PR. Therefore, we take this intermediate step to change part of the codebase to use persistent layer desc, and come up with other PRs to move this hash map of layer desc to the timeline struct. Also, file_size is now part of the layer desc. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-06-07 18:28:18 +03:00
Dmitry Rodionov	1a1019990a	map TenantState::Broken to TenantAttachmentStatus::Failed (#4371 ) ## Problem Attach failures are not reported in the public part of the api (in the `attachment_status` field of TenantInfo). ## Summary of changes Expose TenantState::Broken as TenantAttachmentStatus::Failed. The way it's written, Failed status will be reported even if no attachment happened (i.e. if the tenant becomes broken on startup). This is in line with other members, i.e. Active will be resolved to Attached even if no actual attach took place. This can be tweaked if needed. At the current stage it would be overengineering without clear motivation. resolves #4344	2023-06-07 18:25:30 +03:00
Alex Chi Z	1c200bd15f	fix: break dev dependencies between wal_craft and pg_ffi (#4424 ) ## Problem close https://github.com/neondatabase/neon/issues/4266 ## Summary of changes With this PR, rust-analyzer should be able to give lints and auto complete in `mod tests`, and this makes writing tests easier. Previously, rust-analyzer could not do auto completion. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-06-07 17:51:13 +03:00
Arseny Sher	37bf2cac4f	Persist safekeeper control file once in a while. It should make remote_consistent_lsn commonly up-to-date on non actively writing projects, which removes the spike of pageserver -> safekeeper reconnections on storage nodes restart.	2023-06-07 17:23:37 +04:00
Joonas Koivunen	5761190e0d	feat: three phased startup order (#4399 ) Initial logical size calculation could still hinder our fast startup efforts in #4397. See #4183. In the deployment of 2023-06-06, about 200 initial logical sizes were calculated on the hosts which took the longest to complete initial load (12s). Implements the three step/tier initialization ordering described in #4397: 1. load local tenants 2. do initial logical sizes per walreceivers for 10s 3. background tasks Ordering is controlled by: - waiting on `utils::completion::Barrier`s on background tasks - having one attempt for each Timeline to do initial logical size calculation - `pageserver/src/bin/pageserver.rs` releasing background jobs after timeout or completion of initial logical size calculation The timeout is there just to safeguard in case a legitimate non-broken timeline initial logical size calculation goes long. The timeout is configurable, by default 10s, which I think would be fine for production systems. In the test cases I've been looking at, it seems that these steps are completed as fast as possible. Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-06-07 14:29:23 +03:00
Vadim Kharitonov	88f0cfc575	Fix `pgx_ulid` extension (#4431 ) The issue was in the wrong `control` file name	2023-06-07 11:41:53 +02:00
Arseny Sher	6b3c020cd9	Don't warn on system id = 0 in walproposer greeting. sync-safekeepers doesn't know it and sends 0.	2023-06-07 12:39:20 +04:00
Arseny Sher	c058e1cec2	Quick exit in truncate_wal if nothing to do. ref https://github.com/neondatabase/neon/issues/4414	2023-06-07 12:39:20 +04:00
Arseny Sher	dc6a382873	Increase timeouts on compute -> sk connections. context: https://github.com/neondatabase/neon/issues/4414 And improve messages/comments here and there.	2023-06-07 12:39:20 +04:00
Heikki Linnakangas	df3bae2ce3	Use `compute_ctl` to manage Postgres in tests. (#3886 ) This adds test coverage for 'compute_ctl', as it is now used by all the python tests. There are a few differences in how 'compute_ctl' is called in the tests, compared to the real web console: - In the tests, the postgresql.conf file is included as one large string in the spec file, and it is written out as it is to the data directory. I added a new field for that to the spec file. The real web console, however, sets all the necessary settings in the 'settings' field, and 'compute_ctl' creates the postgresql.conf from those settings. - In the tests, the information needed to connect to the storage, i.e. tenant_id, timeline_id, connection strings to pageserver and safekeepers, are now passed as new fields in the spec file. The real web console includes them as the GUCs in the 'settings' field. (Both of these are different from what the test control plane used to do: It used to write the GUCs directly in the postgresql.conf file). The plan is to change the control plane to use the new method, and remove the old method, but for now, support both. Some tests that were sensitive to the amount of WAL generated needed small changes, to accommodate that compute_ctl runs the background health monitor which makes a few small updates. Also some tests shut down the pageserver, and now that the background health check can run some queries while the pageserver is down, that can produce a few extra errors in the logs, which needed to be allowlisted. Other changes: - remove obsolete comments about PostgresNode; - create standby.signal file for Static compute node; - log output of `compute_ctl` and `postgres` is merged into `endpoints/compute.log`. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-06-06 14:59:36 +01:00
Joonas Koivunen	0cef7e977d	refactor: just one way to shutdown a tenant (#4407 ) We have 2 ways of tenant shutdown, we should have just one. Changes are mostly mechanical simple refactorings. Added `warn!` on the "shutdown all remaining tasks" should trigger test failures in the between time of not having solved the "tenant/timeline owns all spawned tasks" issue. Cc: #4327.	2023-06-06 15:30:55 +03:00
Joonas Koivunen	18a9d47f8e	test: restore NotConnected being allowed globally (#4426 ) Flakiness introduced by #4402, evidence [^1]. I had assumed NotConnected would have been an expected io error, but it's not. Restore the global `allowed_error`. [^1]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4407/5185897757/index.html#suites/82004ab4e3720b47bf78f312dabe7c55/14f636d0ecd3939d/	2023-06-06 13:51:39 +03:00
Sasha Krassovsky	ac11e7c32d	Remove arch-specific stuff from HNSW extension (#4423 )	2023-06-05 22:04:15 -08:00
Konstantin Knizhnik	8e1b5e1224	Remove -ftree-vectorizer-verbose=0 option not recognized by MacOS/X c… (#4412 ) …ompiler	2023-06-05 20:10:19 +03:00
Joonas Koivunen	e0bd81ce1f	test: fix flaky warning on attach (#4415 ) added the `allowed_error` to the `positive_env` so any tests completing the attach are allowed have this print out. they are allowed to do so, because the `random_init_delay` can produce close to zero and thus the first run will be near attach. Though... Unsure if we ever really need the eviction task to run before it can evict something, as in after 20min or 24h. in the failed test case however period is 20s so interesting that we didn't run into this sooner. evidence of flaky: https://github.com/neondatabase/neon/actions/runs/5175677035/jobs/9323705929?pr=4399#step:4:38536	2023-06-05 18:12:58 +03:00
Joonas Koivunen	77598f5d0a	Better walreceiver logging (#4402 ) walreceiver logs are a bit hard to understand because of partial span usage, extra messages, ignored errors popping up as huge stacktraces. Fixes #3330 (by spans, also demote info -> debug). - arrange walreceivers spans into a hierarchy: - `wal_connection_manager{tenant_id, timeline_id}` -> `connection{node_id}` -> `poller` - unifies the error reporting inside `wal_receiver`: - All ok errors are now `walreceiver connection handling ended: {e:#}` - All unknown errors are still stacktraceful task_mgr reported errors with context `walreceiver connection handling failure` - Remove `connect` special casing, was: `DB connection stream finished` for ok errors - Remove `done replicating` special casing, was `Replication stream finished` for ok errors - lowered log levels for (non-exhaustive list): - `WAL receiver manager started, connecting to broker` (at startup) - `WAL receiver shutdown requested, shutting down` (at shutdown) - `Connection manager loop ended, shutting down` (at shutdown) - `sender is dropped while join handle is still alive` (at lucky shutdown, see #2885) - `timeline entered terminal state {:?}, stopping wal connection manager loop` (at shutdown) - `connected!` (at startup) - `Walreceiver db connection closed` (at disconnects?, was without span) - `Connection cancelled` (at shutdown, was without span) - `observed timeline state change, new state is {new_state:?}` (never after Timeline::activate was made infallible) - changed: - `Timeline dropped state updates sender, stopping wal connection manager loop` - was out of date; sender is not dropped but `Broken \| Stopping` state transition - also made `debug!` - `Timeline dropped state updates sender before becoming active, stopping wal connection manager loop` - was out of date: sender is again not dropped but `Broken \| Stopping` state transition - also made `debug!` - log fixes: - stop double reporting panics via JoinError	2023-06-05 17:35:23 +03:00
Joonas Koivunen	8142edda01	test: Less flaky gc (#4416 ) Solves a flaky test error in the wild[^1] by: - Make the gc shutdown signal reading an `allowed_error` - Note the gc shutdown signal readings as being in `allowed_error`s - Allow passing tenant conf to init_start to avoid unnecessary tenants [^1]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4399/5176432780/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2cd76021ea011f93	2023-06-05 15:43:52 +03:00
Vadim Kharitonov	b9871158ba	Compile PGX ULID extension (#4413 ) Create pgx_ulid extension ``` postgres=# create extension ulid; CREATE EXTENSION postgres=# CREATE TABLE users ( id ulid NOT NULL DEFAULT gen_ulid() PRIMARY KEY, name text NOT NULL ); CREATE TABLE postgres=# insert into users (name) values ('vadim'); INSERT 0 1 postgres=# select * from users; id \| name ----------------------------+------- 01H25DDG3KYMYZTNR41X38E256 \| vadim ```	2023-06-05 12:52:13 +03:00
Joonas Koivunen	8caef2c0c5	fix: delay `eviction_task` as well (#4397 ) As seen on deployment of 2023-06-01 release, times were improving but there were some outliers caused by: - timelines `eviction_task` starting while activating and running imitation - timelines `initial logical size` calculation This PR fixes it so that `eviction_task` is delayed like other background tasks fixing an oversight from earlier #4372. After this PR activation will be two phases: 1. load and activate tenants AND calculate some initial logical sizes 2. rest of initial logical sizes AND background tasks - compaction, gc, disk usage based eviction, timelines `eviction_task`, consumption metrics	2023-06-05 09:37:53 +03:00
Konstantin Knizhnik	04542826be	Add HNSW extension (#4227 ) ## Describe your changes Port HNSW implementation for ANN search top Postgres ## Issue ticket number and link https://www.pinecone.io/learn/hnsw	2023-06-04 11:41:38 +03:00
bojanserafimov	4ba950a35a	Add libcurl as dependency to readme (#4405 )	2023-06-02 18:07:45 -04:00
Joonas Koivunen	a55c663848	chore: comment marker fixes (#4406 ) Upgrading to rust 1.70 will require these.	2023-06-02 21:03:12 +03:00
Heikki Linnakangas	9787227c35	Shield HTTP request handlers from async cancellations. (#4314 ) We now spawn a new task for every HTTP request, and wait on the JoinHandle. If Hyper drops the Future, the spawned task will keep running. This protects the rest of the pageserver code from unexpected async cancellations. This creates a CancellationToken for each request and passes it to the handler function. If the HTTP request is dropped by the client, the CancellationToken is signaled. None of the handler functions make use of the CancellationToken currently, but now they could. The CancellationToken arguments also work like documentation. When you're looking at a function signature and you see that it takes a CancellationToken as argument, it's a nice hint that the function might run for a long time, and won't be async cancelled. The default assumption in the pageserver is now that async functions are not cancellation-safe anyway, unless explicitly marked as such, but this is a nice extra reminder. Spawning a task for each request is OK from a performance point of view because spawning is very cheap in Tokio, and none of our HTTP requests are very performance critical anyway. Fixes issue #3478	2023-06-02 08:28:13 -04:00
Joonas Koivunen	ef80a902c8	pg_sni_router: add session_id to more messages (#4403 ) See superseded #4390. - capture log in test - expand the span to cover init and error reporting - remove obvious logging by logging only unexpected	2023-06-02 14:59:10 +03:00
Alex Chi Z	66cdba990a	refactor: use PersistentLayerDesc for persistent layers (#4398 ) ## Problem Part of https://github.com/neondatabase/neon/issues/4373 ## Summary of changes This PR adds `PersistentLayerDesc`, which will be used in LayerMap mapping and probably layer cache. After this PR and after we change LayerMap to map to layer desc, we can safely drop RemoteLayerDesc. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-06-01 22:06:28 +03:00
Alex Chi Z	82484e8241	pgserver: add more metrics for better observability (#4323 ) ## Problem This PR includes doc changes to the current metrics as well as adding new metrics. With the new set of metrics, we can quantitatively analyze the read amp., write amp. and space amp. in the system, when used together with https://github.com/neondatabase/neonbench close https://github.com/neondatabase/neon/issues/4312 ref https://github.com/neondatabase/neon/issues/4347 compaction metrics TBD, a novel idea is to print L0 file number and number of layers in the system, and we can do this in the future when we start working on compaction. ## Summary of changes * Add `READ_NUM_FS_LAYERS` for computing read amp. * Add `MATERIALIZED_PAGE_CACHE_HIT_UPON_REQUEST`. * Add `GET_RECONSTRUCT_DATA_TIME`. GET_RECONSTRUCT_DATA_TIME + RECONSTRUCT_TIME + WAIT_LSN_TIME should be approximately total time of reads. * Add `5.0` and `10.0` to `STORAGE_IO_TIME_BUCKETS` given some fsync runs slow (i.e., > 1s) in some cases. * Some `WAL_REDO` metrics are only used when Postgres is involved in the redo process. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-06-01 21:46:04 +03:00
Joonas Koivunen	36fee50f4d	compute_ctl: enable tracing panic hook (#4375 ) compute_ctl can panic, but `tracing` is used for logging. panic stderr output can interleave with messages from normal logging. The fix is to use the established way (pageserver, safekeeper, storage_broker) of using `tracing` to report panics.	2023-06-01 20:12:07 +03:00
bojanserafimov	330083638f	Fix stale and misleading comment in LayerMap (#4297 )	2023-06-01 05:04:46 +03:00
Konstantin Knizhnik	952d6e43a2	Add pageserver parameter forced_image_creation_limit which can be used… (#4353 ) This parameter can be used to restrict the number of image layers generated because of a GC request (wanted image layers). Setting it to zero completely eliminates creation of such image layers, which avoids extra storage consumption after merging #3673. ## Problem PR #3673 forces generation of missed image layers. In the short term this can increase the size of storage (in the worst case up to two times). It was intended (by me) that the GC period is comparable with the PiTR interval. But it looks like that is not the case now: GC is performed much more frequently. This may cause space exhaustion: GC forces new image creation while a large PiTR interval still prevents GC from collecting old layers. ## Summary of changes Add a new pageserver parameter `forced_image_creation_limit` which restricts the number of created image layers that are requested by GC.	2023-05-31 21:37:20 +03:00
bojanserafimov	b6447462dc	Fix layer map correctness bug (#4342 )	2023-05-31 12:23:00 -04:00
Dmitry Rodionov	b190c3e6c3	reduce flakiness by allowing Compaction failed, retrying in X queue is in state Stopped. (#4379 ) resolves https://github.com/neondatabase/neon/issues/4374 by adding the error to allowed_errors	2023-05-30 20:11:44 +03:00
Joonas Koivunen	f4db85de40	Continued startup speedup (#4372 ) Startup continues to be slow, work towards alleviating it. Summary of changes: - pretty the functional improvements from #4366 into `utils::completion::{Completion, Barrier}` - extend "initial load completion" usage up to tenant background tasks - previously only global background tasks - spawn_blocking the tenant load directory traversal - demote some logging - remove some unwraps - propagate some spans to `spawn_blocking` Runtime effects should be a major speedup to loading, but after that, the `BACKGROUND_RUNTIME` will be blocked for a long time (minutes). Possible follow-ups: - complete initial tenant sizes before allowing background tasks to block the `BACKGROUND_RUNTIME`	2023-05-30 16:25:07 +03:00
Arthur Petukhovsky	210be6b6ab	Replace broker duration logs with metrics (#4370 ) I've added logs for broker push duration after every iteration in https://github.com/neondatabase/neon/pull/4142. This log has not found any real issues, so we can replace it with metrics, to slightly reduce log volume. LogQL query found that pushes longer than 500ms happened only 90 times for the last month. https://neonprod.grafana.net/goto/KTNj9UwVg?orgId=1 `{unit="safekeeper.service"} \|= "timeline updates to broker in" \| regexp "to broker in (?P<duration>.*)" \| duration > 500ms`	2023-05-30 16:08:02 +03:00
Alexander Bayandin	daa79b150f	Code Coverage: store lcov report (#4358 ) ## Problem In the future, we want to compare code coverage on a PR with coverage on the main branch. Currently, we store only code coverage HTML reports, I suggest we start storing reports in "lcov info" format that we can use/parse in the future. Currently, the file size is ~7Mb (it's a text-based format and could be compressed into a ~400Kb archive) - More about "lcov info" format: https://manpages.ubuntu.com/manpages/jammy/man1/geninfo.1.html#files - Part of https://github.com/neondatabase/neon/issues/3543 ## Summary of changes - Change `scripts/coverage` to output lcov coverage to `report/lcov.info` file instead of stdout (we already upload the whole `report/` directory to S3)	2023-05-30 14:05:41 +01:00
Joonas Koivunen	db14355367	revert: static global init logical size limiter (#4368 ) added in #4366. revert for testing without it; it may have unintended side-effects, and it's very difficult to understand the results from the 10k load testing environments. earlier results: https://github.com/neondatabase/neon/pull/4366#issuecomment-1567491064	2023-05-30 10:40:37 +03:00
Joonas Koivunen	cb83495744	try: startup speedup (#4366 ) Startup can take a long time. We suspect it's the initial logical size calculations. Long term solution is to not block the tokio executors but do most of I/O in spawn_blocking. See: #4025, #4183 Short-term solution to above: - Delay global background tasks until initial tenant loads complete - Just limit how many init logical size calculations can we have at the same time to `cores / 2` This PR is for trying in staging.	2023-05-29 21:48:38 +03:00
Christian Schwarz	f4f300732a	refactor TenantState transitions (#4321 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). The motivation is to avoid locking `Tenant::timelines` in places that can't be `async`, because in #4333 we want to convert Tenant::timelines from `std::sync::Mutex` to `tokio::sync::Mutex`. But, the changes here are useful in general because they clean up & document tenant state transitions. That also paves the way for #4350, which is an alternative to #4333 that refactors the pageserver code so that we can keep the `Tenant::timelines` mutex sync. This patch consists of the following core insights and changes: * spawn_load and spawn_attach own the tenant state until they're done * once load()/attach() calls are done ... * if they failed, transition them to Broken directly (we know that there's no background activity because we didn't call activate yet) * if they succeed, call activate. We can make it infallible. How? Later. * set_broken() and set_stopping() are changed to wait for spawn_load() / spawn_attach() to finish. * This sounds scary because it might hinder detach or shutdown, but actually, concurrent attach+detach, or attach+shutdown, or load+shutdown, or attach+shutdown were just racy before this PR. So, with this change, they're not anymore. In the future, we can add a `CancellationToken` stored in Tenant to cancel `load` and `attach` faster, i.e., make `spawn_load` / `spawn_attach` transition them to Broken state sooner. See the doc comments on TenantState for the state transitions that are now possible. It might seem scary, but actually, this patch reduces the possible state transitions. We introduce a new state `TenantState::Activating` to avoid grabbing the `Tenant::timelines` lock inside the `send_modify` closure. 
These were the humble beginnings of this PR (see Motivation section), and I think it's still the right thing to have this `Activating` state, even if we decide against async `Tenant::timelines` mutex. The reason is that `send_modify` locks internally, and by moving locking of Tenant::timelines out of the closure, the internal locking of `send_modify` becomes a leaf of the lock graph, and so, we eliminate deadlock risk. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-29 17:52:41 +03:00
Em Sharnoff	ccf653c1f4	re-enable file cache integration for VM compute node (#4338 ) #4155 inadvertently switched to a version of the VM builder that leaves the file cache integration disabled by default. This re-enables the vm-informant's file cache integration. (as a refresher: The vm-informant is the autoscaling component that sits inside the VM and manages postgres / compute_ctl) See also: https://github.com/neondatabase/autoscaling/pull/265	2023-05-28 10:22:45 -07:00
Heikki Linnakangas	2d6a022bb8	Don't allow two timeline_delete operations to run concurrently. (#4313 ) If the timeline is already being deleted, return an error. We used to notice the duplicate request and error out in persist_index_part_with_deleted_flag(), but it's better to detect it earlier. Add an explicit lock for the deletion. Note: This doesn't do anything about the async cancellation problem (github issue #3478): if the original HTTP request dropped, because the client disconnected, the timeline deletion stops half-way through the operation. That needs to be fixed, too, but that's a separate story. (This is a simpler replacement for PR #4194. I'm also working on the cancellation shielding, see PR #4314.)	2023-05-27 15:55:43 +03:00
Heikki Linnakangas	2cdf07f12c	Refactor RequestSpan into a function. Previously, you used it like this: \|r\| RequestSpan(my_handler).handle(r) But I don't see the point of the RequestSpan struct. It's just a wrapper around the handler function. With this commit, the call becomes: \|r\| request_span(r, my_handler) Which seems a little simpler. At first I thought that the RequestSpan struct would allow "chaining" other kinds of decorators like RequestSpan, so that you could do something like this: \|r\| CheckPermissions(RequestSpan(my_handler)).handle(r) But it doesn't work like that. If each of those structs wrap a handler function, it would actually look like this: \|r\| CheckPermissions(\|r\| RequestSpan(my_handler).handle(r))).handle(r) This commit doesn't make that kind of chaining any easier, but seems a little more straightforward anyway.	2023-05-27 11:47:22 +03:00
Heikki Linnakangas	200a520e6c	Minor refactoring in RequestSpan Require the error type to be ApiError. It implicitly required that anyway, because the function used error::handler, which downcasted the error to an ApiError. If the error was in fact anything other than ApiError, it would just panic. Better to check it at compilation time. Also make the last-resort error handler more forgiving, so that it returns a 500 Internal Server Error response, instead of panicking, if a request handler returns some other error than an ApiError.	2023-05-27 11:47:22 +03:00
Alex Chi Z	4e359db4c7	pgserver: spawn_blocking in compaction (#4265 ) Compaction is usually a compute-heavy process and might affect other futures running on the thread of the compaction. Therefore, we add `block_in_place` as a temporary solution to avoid blocking other futures on the same thread as compaction in the runtime. As we are migrating towards a fully-async-style pageserver, we can revert this change when everything is async and when we move compaction to a separate runtime. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-26 17:15:47 -04:00
Joonas Koivunen	be177f82dc	Revert "Allow for higher s3 concurrency (#4292 )" (#4356 ) This reverts commit `024109fbeb` because it failed to speed anything up and instead ran into more errors. See: #3698.	2023-05-26 18:37:17 +03:00
Alexander Bayandin	339a3e3146	GitHub Autocomment: comment commits for branches (#4335 ) ## Problem GitHub Autocomment script posts a comment only for PRs. It's harder to debug failed tests on main or release branches. ## Summary of changes - Change the GitHub Autocomment script to be able to post a comment to either a PR or a commit of a branch	2023-05-26 14:49:42 +01:00
Heikki Linnakangas	a560b28829	Make new tenant/timeline IDs mandatory in create APIs. (#4304 ) We used to generate the ID, if the caller didn't specify it. That's bad practice, however, because network is never fully reliable, so it's possible we create a new tenant but the caller doesn't know about it, and because it doesn't know the tenant ID, it has no way of retrying or checking if it succeeded. To discourage that, make it mandatory. The web control plane has not relied on the auto-generation for a long time.	2023-05-26 16:19:36 +03:00
Joonas Koivunen	024109fbeb	Allow for higher s3 concurrency (#4292 ) We currently have a semaphore based rate limiter which we hope will keep us under S3 limits. However, the semaphore does not consider time, so I've been hesitant to raise the concurrency limit of 100. See #3698. The PR introduces a leaky-bucket based rate limiter instead of the `tokio::sync::Semaphore` which will allow us to raise the limit later on. The configuration changes are not contained here.	2023-05-26 13:35:50 +03:00
Alexander Bayandin	2b25f0dfa0	Fix flakiness of test_metric_collection (#4346 ) ## Problem Test `test_metric_collection` became flaky: ``` AssertionError: assert not ['2023-05-25T14:03:41.644042Z ERROR metrics_collection: failed to send metrics: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(18022), path: "/billing/api/v1/usage_events", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" })) }', ...] ``` I suspect it is caused by having 2 places where we define the `httpserver_listen_address` fixture (which is internally used by the `pytest-httpserver` plugin) ## Summary of changes - Remove the definition of `httpserver_listen_address` from `test_runner/regress/test_ddl_forwarding.py` and keep one in `test_runner/fixtures/neon_fixtures.py` - Also remove unused `httpserver_listen_address` parameter from `test_proxy_metric_collection`	2023-05-26 00:05:11 +03:00
Christian Schwarz	057cceb559	refactor: make timeline activation infallible (#4319 ) Timeline::activate() was only fallible because `launch_wal_receiver` was. `launch_wal_receiver` was fallible only because of some preliminary checks in `WalReceiver::start`. Turns out these checks can be shifted to the type system by delaying creation of the `WalReceiver` struct to the point where we activate the timeline. The changes in this PR were enabled by my previous refactoring that funneled the broker_client from pageserver startup to the activate() call sites. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 20:26:43 +02:00
sharnoff	ae805b985d	Bump vm-builder v0.7.3-alpha3 -> v0.8.0 (#4339 ) Routine `vm-builder` version bump, from autoscaling repo release. You can find the release notes here: https://github.com/neondatabase/autoscaling/releases/tag/v0.8.0 The changes are from v0.7.2; most of them were already included in v0.7.3-alpha3. Of particular note: This (finally) fixes the cgroup issues, so we should now be able to scale up when we're about to run out of memory. NB: This has the effect of limiting the DB's memory usage in a way it wasn't limited before. We may run into issues because of that. There is currently no way to disable that behavior, other than switching the endpoint back to the k8s-pod provisioner.	2023-05-25 09:33:18 -07:00
Joonas Koivunen	85e76090ea	test: fix ancestor is stopping flakiness (#4234 ) Flakiness most likely introduced in #4170, detected in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4232/4980691289/index.html#suites/542b1248464b42cc5a4560f408115965/18e623585e47af33. Opted to allow it globally because it can happen in other tests as well, basically whenever compaction is enabled and we stop pageserver gracefully.	2023-05-25 16:22:58 +00:00
Alexander Bayandin	08e7d2407b	Storage: use Postgres 15 as default (#2809 )	2023-05-25 15:55:46 +01:00
Alex Chi Z	ab2757f64a	bump dependencies version (#4336 ) Following https://github.com/neondatabase/neon/pull/4237, this PR bumps AWS dependencies along with all other dependencies to the latest compatible semver. Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-25 10:21:15 -04:00
Christian Schwarz	e5617021a7	refactor: eliminate global storage_broker client state (#4318 ) (This is prep work to make `Timeline::activate` infallible.) This patch removes the global storage_broker client instance from the pageserver codebase. Instead, pageserver startup instantiates it and passes it down to the `Timeline::activate` function, which in turn passes it to the WalReceiver, which is the entity that actually uses it. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 16:47:42 +03:00
Christian Schwarz	83ba02b431	tenant_status: don't InternalServerError if tenant not found (#4337 ) Note this also changes the status code to the (correct) 404. Not sure if that's relevant to Console. Context: https://neondb.slack.com/archives/C04PSBP2SAF/p1684746238831449?thread_ts=1684742106.169859&cid=C04PSBP2SAF Atop #4300 because it cleans up the mgr::get_tenant() error type and I want eyes on that PR.	2023-05-25 11:38:04 +02:00
Christian Schwarz	37ecebe45b	mgr::get_tenant: distinguished error type (#4300 ) Before this patch, it would use error type `TenantStateError` which has many more error variants than can actually happen with `mgr::get_tenant`. Along the way, I also introduced `SetNewTenantConfigError` because it uses `mgr::get_tenant` and also can only fail in much fewer ways than `TenantStateError` suggests. The new `page_service.rs`'s `GetActiveTimelineError` and `GetActiveTenantError` types were necessary to avoid an `Other` variant on the `GetTenantError`. This patch is a by-product of reading code that subscribes to `Tenant::state` changes. Can't really connect it to any given project.	2023-05-25 11:37:12 +02:00
Sasha Krassovsky	6052ecee07	Add connector extension to send Role/Database updates to console (#3891 )	2023-05-25 12:36:57 +03:00
Christian Schwarz	e11ba24ec5	tenant loops: operate on the Arc<Tenant> directly (#4298 ) (Instead of going through mgr every iteration.) The `wait_for_active_tenant` function's `wait` argument could be removed because it was only used for the loop that waits for the tenant to show up in the tenants map. Since we're passing the tenant in, we no longer need to get it from the tenants map. NB that there's no guarantee that the tenant object is in the tenants map at the time the background loop function starts running. But the tenant mgr guarantees that it will be quite soon. See `tenant_map_insert` way upwards in the call hierarchy for details. This is prep work to eliminate `subscribe_for_state_updates` (PR #4299 ) Fixes: #3501	2023-05-25 10:49:09 +02:00
Alex Chi Z	f276f21636	ci: use eu-central-1 bucket (#4315 ) Probably increases the CI success rate. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-25 00:00:21 +03:00
Alex Chi Z	7126197000	pagectl: refactor ctl and support dump kv in delta (#4268 ) This PR refactors the original page_binutils with a single tool pagectl, use clap derive for better command line parsing, and adds the dump kv tool to extract information from delta file. This helps me better understand what's inside the page server. We can add support for other types of file and more functionalities in the future. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-24 19:36:07 +03:00
Christian Schwarz	afc48e2cd9	refactor responsibility for tenant/timeline activation (#4317 ) (This is prep work to make `Timeline::activate()` infallible.) The current possibility for failure in `Timeline::activate()` is the broker client's presence / absence. It should be an assert, but we're careful with these. So, I'm planning to pass in the broker client to activate(), thereby eliminating the possibility of its absence. In the unit tests, we don't have a broker client. So, I thought I'd be in trouble because the unit tests also called `activate()` before this PR. However, closer inspection reveals a long-standing FIXME about this, which is addressed by this patch. It turns out that the unit tests don't actually need the background loops to be running. They just need the state value to be `Active`. So, for the tests, we just set it to that value but don't spawn the background loops. We'll need to revisit this if we ever do more Rust unit tests in the future. But right now, this refactoring improves the code, so, let's revisit when we get there. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-24 16:54:11 +02:00
Christian Schwarz	df52587bef	attach-time tenant config (#4255 ) This PR adds support for supplying the tenant config upon /attach. Before this change, when relocating a tenant using `/detach` and `/attach`, the tenant config after `/attach` would be the default config from `pageserver.toml`. That is undesirable for settings such as the PITR-interval: if the tenant's config on the source was `30 days` and the default config on the attach-side is `7 days`, then the first GC run would eradicate 23 days worth of PITR capability. The API change is backwards-compatible: if the body is empty, we continue to use the default config. We'll remove that capability as soon as the cloud.git code is updated to use attach-time tenant config (https://github.com/neondatabase/neon/issues/4282 keeps track of this). unblocks https://github.com/neondatabase/cloud/issues/5092 fixes https://github.com/neondatabase/neon/issues/1555 part of https://github.com/neondatabase/neon/issues/886 (Tenant Relocation) Implementation ============== The preliminary PRs for this work were (most-recent to least-recent) * https://github.com/neondatabase/neon/pull/4279 * https://github.com/neondatabase/neon/pull/4267 * https://github.com/neondatabase/neon/pull/4252 * https://github.com/neondatabase/neon/pull/4235	2023-05-24 17:46:30 +03:00
Alexander Bayandin	35bb10757d	scripts/ingest_perf_test_result.py: increase connection timeout (#4329 ) ## Problem Sometimes the default connection timeout is not enough to connect to the DB with perf test results, [an example](https://github.com/neondatabase/neon/actions/runs/5064263522/jobs/9091692868#step:10:332). Similar changes were made for similar scripts: - For `scripts/flaky_tests.py` in https://github.com/neondatabase/neon/pull/4096 - For `scripts/ingest_regress_test_result.py` in https://github.com/neondatabase/neon/pull/2367 (from the very beginning) ## Summary of changes - Connection timeout increased to 30s for `scripts/ingest_perf_test_result.py`	2023-05-24 10:11:24 -04:00
Alexander Bayandin	2a3f54002c	test_runner: update dependencies (#4328 ) ## Problem `pytest` 6 truncates error messages and this is not configured. It's fixed in `pytest` 7, it prints the whole message (truncating limit is higher) if `--verbose` is set (it's set on CI). ## Summary of changes - `pytest` and `pytest` plugins are updated to their latest versions - linters (`black` and `ruff`) are updated to their latest versions - `mypy` and types are updated to their latest versions, new warnings are fixed - while we're here, allure is updated to its latest version as well	2023-05-24 12:47:01 +01:00
Joonas Koivunen	f3769d45ae	chore: upgrade tokio to 1.28.1 (#4294 ) no major changes, but this is the most recent LTS release and will be required by #4292.	2023-05-24 08:15:39 +03:00
Arseny Sher	c200ebc096	proxy: log endpoint name everywhere. Checking out proxy logs for the endpoint is a frequent (often first) operation during user issues investigation; let's remove the annoying extra step of mapping endpoint id -> session id here.	2023-05-24 09:11:23 +04:00
Konstantin Knizhnik	417f37b2e8	Pass set of wanted image layers from GC to compaction (#3673 ) ## Describe your changes Right now the only criterion for image layer generation is the number of delta layers since the last image layer. If we have a "stairs" layout of delta layers (see link below), then there can be a lot of old delta layers which cannot be reclaimed by GC because they are not fully covered by image layers. This PR constructs a list of "wanted" image layers in GC (the image layers needed to be able to remove old layers) and passes this list to the compaction task, which performs generation of image layers. So now, in addition to the delta count criterion, we also take into account the "wishes" of GC. ## Issue ticket number and link See https://neondb.slack.com/archives/C033RQ5SPDH/p1676914249982519 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-24 08:01:41 +03:00
sharnoff	7f1973f8ac	bump vm-builder, use Neon-specific version (#4155 ) In the v0.6.0 release, vm-builder was changed to be Neon-specific, so it's handling all the stuff that Dockerfile.vm-compute-node used to do. This commit bumps vm-builder to v0.7.3-alpha3.	2023-05-23 15:20:20 -07:00
Christian Schwarz	00f7fc324d	tenant_map_insert: don't expose the vacant entry to the closure (#4316 ) This tightens up the API a little. Byproduct of some refactoring work that I'm doing right now.	2023-05-23 15:16:12 -04:00
Stas Kelvich	dad3519351	Add SQL-over-HTTP endpoint to Proxy This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON response structure resembling that of the node-postgres driver. This method, using HTTP POST, achieves smaller amortized latencies in edge setups due to fewer round trips and an enhanced open connection reuse by the v8 engine. This update involves several intricacies: 1. SQL injection protection: We employed the extended query protocol, modifying the rust-postgres driver to send queries in one roundtrip using a text protocol rather than binary, bypassing potential issues like those identified in https://github.com/sfackler/rust-postgres/issues/1030. 2. Postgres type compatibility: As not all postgres types have binary representations (e.g., acl's in pg_class), we adjusted rust-postgres to respond with text protocol, simplifying serialization and fixing queries with text-only types in response. 3. Data type conversion: Considering JSON supports fewer data types than Postgres, we perform conversions where possible, passing all other types as strings. Key conversions include: - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain text) - postgres bool, null, text -> json bool, null, string - postgres array -> json array - postgres json and jsonb -> json object 4. Alignment with node-postgres: To facilitate integration with js libraries, we've matched the response structure of node-postgres, returning command tags and column oids. Command tag capturing was added to the rust-postgres functionality as part of this change.	2023-05-23 20:01:40 +03:00
dependabot[bot]	d75b4e0f16	Bump requests from 2.28.1 to 2.31.0 (#4305 )	2023-05-23 14:54:51 +01:00
Christian Schwarz	4d41b2d379	fix: `max_lsn_wal_lag` broken in tenant conf (#4279 ) This patch fixes parsing of the `max_lsn_wal_lag` tenant config item. We were incorrectly expecting a string before, but the type is a NonZeroU64. So, when setting it in the config, the (updated) test case would fail with ``` E psycopg2.errors.InternalError_: Tenant a1fa9cc383e32ddafb73ff920de5f2e6 will not become active. Current state: Broken due to: Failed to parse config from file '.../repo/tenants/a1fa9cc383e32ddafb73ff920de5f2e6/config' as pageserver config: configure option max_lsn_wal_lag is not a string. Backtrace: ``` So, not even the assertions added are necessary. The test coverage for tenant config is rather thin in general. For example, the `test_tenant_conf.py` test doesn't cover all the options. I'll add a new regression test as part of attach-time-tenant-conf PR https://github.com/neondatabase/neon/pull/4255	2023-05-23 16:29:59 +03:00
Shany Pozin	d6cf347670	Add an option to set "latest gc cutoff lsn" in pageserver binutils (#4290 ) ## Problem [#2539](https://github.com/neondatabase/neon/issues/2539) ## Summary of changes Add support for latest_gc_cutoff_lsn update in pageserver_binutils	2023-05-23 15:48:43 +03:00
Joonas Koivunen	6388454375	test: allow benign warning in relation to startup ordering (#4262 ) Allow the warning which happens because the disk usage based eviction runs before tenants are loaded. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5001582237/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/a711f5baf8f8bd8d/	2023-05-22 11:59:54 +03:00
Alexander Bayandin	3837fca7a2	compute-node-image: fix postgis download (#4280 ) ## Problem `osgeo.org` is experiencing some problems with DNS resolving which breaks `compute-node-image` (because it can't download postgis) ## Summary of changes - Add `140.211.15.30 download.osgeo.org` to /etc/hosts by passing it via the container option	2023-05-19 15:34:22 +01:00
Dmitry Rodionov	7529ee2ec7	rfc: the state of pageserver tenant relocation (#3868 ) Summarize current state of tenant relocation related activities and implementation ideas	2023-05-19 14:35:33 +03:00
Christian Schwarz	b391c94440	tenant create / update-config: reject unknown fields (#4267 ) This PR enforces that the tenant create / update-config APIs reject requests with unknown fields. This is a desirable property because some tenant config settings control the lifetime of user data (e.g., GC horizon or PITR interval). Suppose we inadvertently rename the `pitr_interval` field in the Rust code. Then, right now, a client that still uses the old name will send a tenant config request to configure a new PITR interval. Before this PR, we would accept such a request, ignore the old name field, and use the pageserver.toml default value for what the new PITR interval is. With this PR, we will instead reject such a request. One might argue that the client could simply check whether the config it sent has been applied, using the `/v1/tenant/.../config` endpoint. That is correct for tenant create and update-config. But, attach will soon [^1] grow the ability to have attach-time config as well. If we ignore unknown fields and fall back to global defaults in that case, we risk data loss. Example: 1. Default PITR in pageservers is 7 days. 2. Create a tenant and set its PITR to 30 days. 3. For 30 days, fill the tenant continuously with data. 4. Detach the tenant. 5. Attach tenant. Attach must use the 30-day PITR setting in this scenario. If it were to fall back to the 7-day default value, we would lose 23 days of PITR capability for the tenant. So, the PR that adds attach-time tenant config will build on the (clunky) infrastructure added in this PR [^1]: https://github.com/neondatabase/neon/pull/4255 Implementation Notes ==================== This could have been a simple `#[serde(deny_unknown_fields)]` but sadly, that is documented- but silent-at-compile-time-incompatible with `#[serde(flatten)]`. But we are still using this by adding an outer struct and using unit tests to ensure it is correct. 
`neon_local tenant config` now uses the `.remove()` pattern + bail if there are leftover config args. That's in line with what `neon_local tenant create` does. We should dedupe that logic in a future PR. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: Alex Chi <iskyzh@gmail.com>	2023-05-18 21:16:09 -04:00
Alexander Bayandin	5abc4514b7	Un-xfail fixed tests on Postgres 15 (#4275 ) - https://github.com/neondatabase/neon/pull/4182 - https://github.com/neondatabase/neon/pull/4213	2023-05-18 22:38:33 +01:00
Alexander Bayandin	1b2ece3715	Re-enable compatibility tests on Postgres 15 (#4274 ) - Enable compatibility tests for Postgres 15 - Also add `PgVersion::v_prefixed` property to return the version number with, _guess what,_ v-prefix!	2023-05-18 19:56:09 +01:00
Anastasia Lubennikova	8ebae74c6f	Fix handling of XLOG_XACT_COMMIT/ABORT: Previously we didn't handle XACT_XINFO_HAS_INVALS and XACT_XINFO_HAS_DROPPED_STAT correctly, which led to getting an incorrect value of twophase_xid for records with XACT_XINFO_HAS_TWOPHASE. This caused 'twophase file for xid {} does not exist' errors in test_isolation	2023-05-18 14:36:45 +01:00
Vadim Kharitonov	fc886dc8c0	Compile pg_cron extension	2023-05-17 17:43:50 +02:00
Heikki Linnakangas	72346e102d	Document that our code is mostly not async cancellation-safe. We had a hot debate on whether we should try to make our code cancellation-safe, or just accept that it's not, and make sure that our Futures are driven to completion. The decision is that we drive Futures to completion. This documents the decision, and summarizes the reasoning for that. Discussion that sparked this: https://github.com/neondatabase/neon/pull/4198#discussion_r1190209316	2023-05-17 17:29:54 +03:00
Joonas Koivunen	918cd25453	ondemand_download_large_rel: solve flakiness (#3697 ) Disable background tasks to not get compaction downloading all layers but also stop safekeepers before checkpointing, use a readonly endpoint. Fixes: #3666 Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-05-17 16:19:02 +02:00
Alex Chi Z	9767432cff	add `cargo neon` shortcut for neon_local (#4240 ) Add `cargo neon` as a shortcut for compiling and running `neon_local`. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-17 16:48:00 +03:00
Anastasia Lubennikova	0c4dc55a39	Disable recovery_prefetch for Neon hot standby. Prefetching of blocks referenced in WAL doesn't make sense for us, because Neon hot standby anyway ignores pages that are not in the shared_buffers.	2023-05-17 13:35:56 +01:00
Alexander Bayandin	7b9e8be6e4	GitHub Autocomment: add a command to run all failed tests (#4200 ) - Group tests by Postgres version - Merge different build types - Add a command to GitHub comment on how to rerun all failed tests (different command for different Postgres versions) - Restore a link to a test report in the build summary	2023-05-17 11:38:41 +01:00
Christian Schwarz	89307822b0	mgmt api: share a single tenant config model struct in Rust and OpenAPI (#4252 ) This is prep for https://github.com/neondatabase/neon/pull/4255 [1/X] OpenAPI: share a single definition of TenantConfig DRYs up the pageserver OpenAPI YAML's representation of tenant config. All the fields of tenant config are now located in a model schema called TenantConfig. The tenant create & config-change endpoints have separate schemas, TenantCreateInfo and TenantConfigureArg, respectively. These schemas inherit from TenantConfig, using allOf 1. The tenant config-GET handler's response was previously named TenantConfig. It's now named TenantConfigResponse. None of these changes affect how the request looks on the wire. The generated Go code will change for Console because the OpenAPI code generator maps `allOf` to a Go struct embedding. Luckily, usage of tenant config in Console is still very lightweight, but that will change in the near future. So, this is a good chance to set things straight. The console changes are tracked in https://github.com/neondatabase/cloud/pull/5046 [2/x]: extract the tenant config parts of create & config requests [3/x]: code movement: move TenantConfigRequestConfig next to TenantCreateRequestConfig [4/x] type-alias TenantConfigRequestConfig = TenantCreateRequestConfig; They are exactly the same. 
[5/x] switch to qualified use for tenant create/config request api models [6/x] rename models::TenantConfig{RequestConfig,} and remove the alias [7/x] OpenAPI: sync tenant create & configure body names from Rust code [8/x]: dedupe the two TryFrom<...> for TenantConfOpt impls The only difference is that the TenantConfigRequest impl does ``` tenant_conf.max_lsn_wal_lag = request_data.max_lsn_wal_lag; tenant_conf.trace_read_requests = request_data.trace_read_requests; ``` and the TenantCreateRequest impl does ``` if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag { tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag); } if let Some(trace_read_requests) = request_data.trace_read_requests { tenant_conf.trace_read_requests = Some(trace_read_requests); } ``` As far as I can tell, these are identical.	2023-05-17 12:31:17 +02:00
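The equivalence claimed in [8/x] can be checked in isolation. The following is a minimal sketch, assuming the destination field starts out as `None`; `ConfOpt` and both helper functions are hypothetical stand-ins, not the real pageserver types.

```rust
/// Hypothetical stand-in for the config being built; not the real `TenantConfOpt`.
#[derive(Debug, Default, PartialEq)]
struct ConfOpt {
    max_lsn_wal_lag: Option<u64>,
}

/// Style of the TenantConfigRequest impl: plain assignment of the optional value.
fn assign_direct(request_value: Option<u64>) -> ConfOpt {
    let mut conf = ConfOpt::default();
    conf.max_lsn_wal_lag = request_value;
    conf
}

/// Style of the TenantCreateRequest impl: assign only when the request carries a value.
fn assign_if_some(request_value: Option<u64>) -> ConfOpt {
    let mut conf = ConfOpt::default();
    if let Some(v) = request_value {
        conf.max_lsn_wal_lag = Some(v);
    }
    conf
}
```

Both functions return the same value for every input only because the field is `None` before the assignment; if the field could already hold a value, the second form would preserve it while the first would overwrite it.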
Alexander Bayandin	30fe310602	Code Coverage: upload reports to S3 (#4256 ) ## Problem `neondatabase/zenith-coverage-data` is too big: - It takes ~6 minutes to clone and push the repo - GitHub fails to publish an HTML report to github.io Part of https://github.com/neondatabase/neon/issues/3543 ## Summary of changes Replace pushing code coverage report to `neondatabase/zenith-coverage-data` with uploading it to S3	2023-05-17 11:30:07 +01:00
0x29a	ef41b63db7	docs: add links to the doc for better read experience (#4258 ) add links to the doc and refine links for better read experience	2023-05-17 12:25:01 +03:00
Christian Schwarz	1bceceac5a	add helper to debug_assert that current span has a TenantId (#4248 ) We already have `debug_assert_current_span_has_tenant_and_timeline_id`. Have the same for just TenantId.	2023-05-17 11:03:46 +02:00
Christian Schwarz	4431779e32	refactor: attach: use create_tenant_files + schedule_local_tenant_processing (#4235 ) With this patch, the attach handler now follows the same pattern as tenant create with regards to instantiation of the new tenant: 1. Prepare on-disk state using `create_tenant_files`. 2. Use the same code path as pageserver startup to load it into memory and start background loops (`schedule_local_tenant_processing`). It's a bit sad we can't use the `PageServerConfig::tenant_attaching_mark_file_path` method inside `create_tenant_files` because it operates in a temporary directory. However, it's a small price to pay for the gained simplicity. During implementation, I noticed that we don't handle failures post `create_tenant_files` well. I left TODO comments in the code linking to the issue that I created for this [^1]. Also, I'll dedupe the spawn_load and spawn_attach code in a future commit. refs https://github.com/neondatabase/neon/issues/1555 part of https://github.com/neondatabase/neon/issues/886 (Tenant Relocation) [^1]: https://github.com/neondatabase/neon/issues/4233	2023-05-16 12:53:17 -04:00
Alexander Bayandin	131343ed45	Fix regress-tests job for Postgres 15 on release branch (#4253 ) ## Problem Compatibility tests don't support Postgres 15 yet, but we're still trying to upload compatibility snapshot (which we do not collect). Ref https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129 ## Summary of changes Add `pg_version` parameter to `run-python-test-set` actions and do not upload compatibility snapshot for Postgres 15	2023-05-16 17:18:56 +01:00
Joseph Koshakow	511b0945c3	Replace usages of wait_for_active_timeline (#4243 ) This commit replaces all usages of connection_manager.rs: wait_for_active_timeline with Timeline::wait_to_become_active. wait_to_become_active is better and in the right module. close https://github.com/neondatabase/neon/issues/4189 Co-authored-by: Shany Pozin <shany@neon.tech>	2023-05-16 10:38:39 -04:00
Dmitry Rodionov	b7db62411b	Make storage time operations an enum instead of an array (#4238 ) Use an enum instead of an array. Before this change, nothing connected the definition of a metric to the point where it was used, aside from matching string literals. Now it's possible to use IDE features to check for references. This also avoids a mismatch between the set of metrics that was defined and the set that was actually used. Interestingly, the `init logical size` case is not used. I think `LogicalSize` is a duplicate of `InitLogicalSize`, so I removed the latter.	2023-05-16 16:54:29 +03:00
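The idea of tying metric labels to an enum rather than to matching string literals can be sketched as follows. The variant names and label strings are illustrative assumptions, not the actual pageserver definitions.

```rust
/// Hypothetical operation kinds; variant names and labels are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StorageTimeOperation {
    LayerFlush,
    Compact,
    LogicalSize,
}

impl StorageTimeOperation {
    /// Single list of all variants, so registration and use cannot drift apart.
    const ALL: [StorageTimeOperation; 3] = [
        StorageTimeOperation::LayerFlush,
        StorageTimeOperation::Compact,
        StorageTimeOperation::LogicalSize,
    ];

    /// The metric label; the exhaustive match forces every variant to have one.
    fn as_label(self) -> &'static str {
        match self {
            StorageTimeOperation::LayerFlush => "layer flush",
            StorageTimeOperation::Compact => "compact",
            StorageTimeOperation::LogicalSize => "logical size",
        }
    }
}
```

Because call sites name a variant instead of repeating a string, "find references" in an IDE shows every use, and adding a variant without a label fails to compile.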
MMeent	efe9e131a7	Update vendored PostgreSQL to latest patch releases (#4208 ) Conflicts: - Changes in PG15's xlogrecovery.c resulted in non-substantial conflicts between ecb01e6ebb5a67f3fc00840695682a8b1ba40461 and aee72b7be903e52d9bdc6449aa4c17fb852d8708 Fixes #4207	2023-05-16 15:23:50 +02:00
Alex Chi Z	4a67f60a3b	bump aws dep version (#4237 ) This PR is simply the patch from https://github.com/neondatabase/neon/issues/4008 except we enabled `force_path_style` for custom endpoints. This is because at some version, the s3 sdk by default uses the virtual-host style access, which is not supported by MinIO in the default configuration. By enforcing path style access for custom endpoints, we can pass all e2e test cases. SDK 0.55 is not the latest version and we can bump it further later when all flaky tests in this PR are resolved. This PR also (hopefully) fixes flaky test `test_ondemand_download_timetravel`. close https://github.com/neondatabase/neon/issues/4008 Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-16 09:09:50 -04:00
Alexander Bayandin	a65e0774a5	Increase shared memory size for regression test run (#4232 ) Should fix flakiness caused by the error ``` FATAL: could not resize shared memory segment "/PostgreSQL.3944613150" to 1048576 bytes: No space left on device ```	2023-05-16 14:06:47 +01:00
Dmitry Rodionov	a0b34e8c49	add create tenant metric to storage operations (#4231 ) Add a metric to track time spent in create tenant requests Originated from https://github.com/neondatabase/neon/pull/4204	2023-05-16 15:15:29 +03:00
bojanserafimov	fdc1c12fb0	Simplify github PR template (#4241 )	2023-05-16 08:13:54 -04:00
Alexander Bayandin	0322e2720f	Nightly Benchmarks: add neonvm to pgbench-compare (#4225 )	2023-05-16 12:46:28 +01:00
Vadim Kharitonov	4f64be4a98	Add endpoint to connection string	2023-05-15 23:45:04 +02:00
Tristan Partin	e7514cc15e	Wrap naked PQerrorMessage calls in libpagestore with pchomp (#4242 )	2023-05-15 15:36:53 -05:00
Tristan Partin	6415dc791c	Fix use-after-free issue in libpagestore (#4239 ) ## Describe your changes `pageserver_disconnect()` calls `PQfinish()` which deallocates resources on the connection structure. `PQerrorMessage()` hands back a pointer to an allocated resource. Duplicate the error message prior to calling `pageserver_disconnect()`. ## Issue ticket number and link Fixes https://github.com/neondatabase/neon/issues/4214	2023-05-15 13:38:18 -05:00
Alexander Bayandin	a5615bd8ea	Fix Allure reports for different benchmark jobs (#4229 ) - Fix Allure report generation failure for Nightly Benchmarks - Fix GitHub Autocomment for `run-benchmarks` label (`build_and_test.yml::benchmarks` job)	2023-05-15 13:04:03 +01:00
Joonas Koivunen	4a76f2b8d6	upload new timeline index part json before 201 or on retry (#4204 ) Await for upload to complete before returning 201 Created on `branch_timeline` or when `bootstrap_timeline` happens. Should either of those waits fail, then on the retried request await for uploads again. This should work as expected assuming control-plane does not start to use timeline creation as a wait_for_upload mechanism. Fixes #3865, started from https://github.com/neondatabase/neon/pull/3857/files#r1144468177 Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-15 14:16:43 +03:00
Shany Pozin	9cd6f2ceeb	Remove duplicated logic in creating TenantConfOpt (#4230 ) ## Describe your changes Remove duplicated logic in creating TenantConfOpt in both TryFrom of TenantConfigRequest and TenantCreateRequest	2023-05-15 10:08:44 +03:00
Heikki Linnakangas	2855c73990	Fix race condition after attaching tenant with branches. (#4170 ) After tenant attach, there is a window where the child timeline is loaded and accepts GetPage requests, but its parent is not. If a GetPage request needs to traverse to the parent, it needs to wait for the parent timeline to become active, or it might miss some records on the parent timeline. It's also possible that the parent timeline is active, but it hasn't yet received all the WAL up to the branch point from the safekeeper. This happens if a pageserver crashes soon after creating a timeline, so that the WAL leading to the branch point has not yet been uploaded to remote storage. After restart, the WAL will be re-streamed and ingested from the safekeeper, but that takes a while. Because of that, it's not enough to check that the parent timeline is active, we also need to wait for the WAL to arrive on the parent timeline, just like at the beginning of GetPage handling. We probably should change the behavior at create_timeline so that a timeline can only be created after all the WAL up to the branch point has been uploaded to remote storage, but that's not currently the case and out of scope for this PR (see github issue #4218). @NanoBjorn encountered this while working on tenant migration. After migrating a tenant with a parent and child branch, connecting to the child branch failed with an error like: ``` FATAL: "base/16385" is not a valid data directory DETAIL: File "base/16385/PG_VERSION" is missing. ``` This commit adds two tests that reproduce the bug, with slightly different symptoms.	2023-05-13 10:44:11 +03:00
Christian Schwarz	edcf4d61a4	distinguish imitated from real size::gather_input calls in metrics (#4224 ) Before this PR, the gather_inputs() calls made to imitate synthetic size calculation accesses were accounted towards the real logical size calculation metric. This PR forces all callers to declare the cause for making logical size calculations, making the decision which cause counts towards which metric explicit. This is follow-up to ``` commit `1d266a6365` Author: Christian Schwarz <christian@neon.tech> Date: Thu May 11 16:09:29 2023 +0200 logical size calculation metrics: differentiate regular vs imitated (#4197) ``` After merging this patch, I hope to be able to explain why we have ca 30x more "logical size" ops in prod than "imitate logical size" for any given observation interval. refs https://github.com/neondatabase/neon/issues/4154	2023-05-12 17:57:33 +00:00
Christian Schwarz	a2a9c598be	add counter metric that increases whenever a background loop overruns its period (#4223 ) We already have the warn!() log line for this condition. This PR adds a corresponding metric on which we can have a dedicated alert. This is cheaper and more reliable than alerting on the logs, because we run into log rate limits from time to time these days. refs https://github.com/neondatabase/neon/issues/4222	2023-05-12 19:00:06 +03:00
Alexander Bayandin	bb06d281ea	Run regressions tests on both Postgres 14 and 15 (#4192 ) This PR adds test runs on Postgres 15 and creates a unified Allure report with results for all tests. - Split `.github/actions/allure-report` into `.github/actions/allure-report-store` and `.github/actions/allure-report-generate` - Add debug or release pytest parameter for all tests (depending on `BUILD_TYPE` env variable) - Add Postgres version as a pytest parameter for all tests (depending on `DEFAULT_PG_VERSION` env variable) - Fix `test_wal_restore` and `restore_from_wal.sh` to support path with `[`/`]` in it (fixed by applying spellcheck to the script and fixing all warnings), `restore_from_wal_archive.sh` is deleted as unused. - All known failures on Postgres 15 are marked with xfail	2023-05-12 15:28:51 +01:00
Christian Schwarz	5869234290	logical size calculation: spawn with in_current_span (#4196 ) While investigating https://github.com/neondatabase/neon/issues/4154 I found that the `Calculating logical size for timeline` tracing events created from within the logical size computation code are not always attributable to the background task that caused it. My goal is to be able to distinguish in the logs whether a `Calculating logical size for timeline` was logged as part of a real synthetic size calculation VS an imitation by the eviction task. I want this distinction so I can prove my assumption that the disk IO peaks which we see every 24h on prod are due to eviction's imitate synthetic size calculations. The alternative here, which I would have preferred, but is more work: link RequestContext's into a child->parent list and dump this list when we log `Calculating logical size for timeline`. I would have preferred that over what we have in this PR because, technically, the ondemand logical size computation can outlive the caller that spawned it. This is against the idea of correctly nested spans. I guess in OpenTelemetry land, the correct modelling would be a link between the caller's span and the task_mgr task's span. Anyways, I think the case where we hang up on the spawned ondemand logical size calculation is quite rare. So, I'm willing to tolerate incorrectly nested spans for these edge-cases. refs https://github.com/neondatabase/neon/issues/4154	2023-05-12 15:36:30 +02:00
Rahul Modpur	ecfe4757d3	fix bogus at character context in log messages Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-05-11 23:31:42 +01:00
Christian Schwarz	845e296562	eviction: add global histogram for iteration durations (#4212 ) I would like to know whether and by how much the eviction iterations spike in the $period-sized window that happens every $threshold , when all the timelines do the imitate accesses. refs https://github.com/neondatabase/neon/issues/4154	2023-05-11 18:02:19 +03:00
Heikki Linnakangas	1988cc5527	Fix `failpoint_sleep_millis_async` without `use std::time::Duration` (#4195 ) I tried to use failpoint_sleep_millis_async(...) in a source file that didn't do `use std::time::Duration`, and got a compiler error: ``` error[E0433]: failed to resolve: use of undeclared type `Duration` --> pageserver/src/walingest.rs:316:17 \| 316 \| utils::failpoint_sleep_millis_async!("wal-ingest-logical-message-sleep"); \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in this scope \| = note: this error originates in the macro `utils::failpoint_sleep_millis_async` (in Nightly builds, run with -Z macro-backtrace for more info) help: consider importing one of these items \| 24 \| use chrono::Duration; \| 24 \| use core::time::Duration; \| 24 \| use humantime::Duration; \| 24 \| use serde_with::__private__::Duration; \| and 2 other candidates ```	2023-05-11 17:53:42 +03:00
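One common way to make a macro independent of the caller's imports is to spell out fully qualified paths inside it. The sketch below uses a hypothetical macro, `millis_to_duration!`, to illustrate the technique; it is not the real `failpoint_sleep_millis_async`, and the commit message does not say this is exactly how the fix was made.

```rust
/// Hypothetical macro, shown only to illustrate path resolution in `macro_rules!`.
macro_rules! millis_to_duration {
    ($millis:expr) => {
        // Fully qualified path: resolves even if the caller never imported `Duration`.
        ::std::time::Duration::from_millis($millis)
    };
}

/// Caller without any `use std::time::Duration` in scope.
fn sleep_budget_millis() -> u128 {
    millis_to_duration!(250).as_millis()
}
```

Had the macro body used a bare `Duration::from_millis(...)`, the name would be looked up at the call site and fail with E0433 wherever `Duration` is not imported, which matches the compiler error quoted in the commit message.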
Christian Schwarz	1d266a6365	logical size calculation metrics: differentiate regular vs imitated (#4197 ) I want this distinction so I can prove my assumption that the disk IO peaks which we see every 24h on prod are due to eviction's imitate synthetic size calculations. refs https://github.com/neondatabase/neon/issues/4154	2023-05-11 17:09:29 +03:00
Christian Schwarz	80522a1b9d	replace has_in_progress_downloads with new attachment_status field (#4168 ) Control Plane currently [^1] polls for `has_in_progress_downloads == false` after /attach to determine that an attach operation succeeded. As pointed out in the OpenAPI spec as of neon#4151, polling for `has_in_progress_downloads` is incorrect. This patch changes the situation by - removing `has_in_progress_downloads` - adding a new field `attachment_status` - changing instructions for `/attach` to poll for `attachment_status == attached`. This makes the instructions in `/attach` actionable for Control Plane. NB that we don't expose the TenantState in the OpenAPI docs, even though we expose it in the endpoint. That is with good reason because we don't want to commit to a fixed set of tenant states forever. Hence, the separate `attachment_status` field that exposes the bare minimum required to make /attach + subsequent polling 100% safe wrt split brain. It would have been nice to report failures explicitly, but the problem is that we lose that state when we restart. So, we return `attached` upon attach failure. The tenant is Broken in that case, so Control Plane's subsequent health check will fail. Control Plane can roll back the relocation operation then. NB: the reliance on the subsequent health check is no change to what we had before this patch! NB: we can always add additional TenantAttachmentStatus'es in the future to communicate failure. This PR also moves the attach-marker file's creation to the API handler's synchronous part. That was done to avoid the need to distinguish * `Attaching but marker not yet written => AttachmentStatus::Maybe` from * `Attaching, marker written, but attach failed for other reason => AttachmentStatus::Attached` Coincidentally, this also adds more transactionality to the /attach API because we only return 202 once we've written the marker file. 
But, in the end, it doesn't affect how the control plane interacts with us or how it needs to do retries. So, we don't mention any of this in the API docs. [^1]: The one-click tenant relocation PR cloud#4740, currently WIP, is the first real user.	2023-05-11 16:53:46 +03:00
Joonas Koivunen	ecced13d90	try: higher page_service timeouts to isolate an issue (#4206 ) See #4205.	2023-05-11 16:14:42 +03:00
Alexander Bayandin	59510f6449	scripts/flaky_tests.py: use retriesStatusChange from Allure	2023-05-10 16:59:03 +01:00
Alexander Bayandin	7fc778d251	GitHub Autocomment: fix flaky test notifications	2023-05-10 16:59:03 +01:00
Alexander Bayandin	1d490b2311	Make benchmark_fixture less noisy	2023-05-10 16:59:03 +01:00
Dmitry Rodionov	eb3a8be933	keep track of timeline deletion status in IndexPart to prevent timeline resurrection (#3919 ) Before this patch, the following sequence would lead to the resurrection of a deleted timeline: - create timeline - wait for its index part to reach s3 - delete timeline - wait an arbitrary amount of time, including 0 seconds - detach tenant - attach tenant - the timeline is there and Active again This happens because we only kept track of the deletion in the tenant dir (by deleting the timeline dir) but not in S3. The solution is to turn the deleted timeline's IndexPart into a tombstone. The deletion status of the timeline is expressed in the `deleted_at: Option<NativeDateTime>` field of IndexPart. It's `None` while the timeline is alive and `Some(deletion time stamp)` if it is deleted. We change the timeline deletion handler to upload this tombstoned IndexPart. The handler does not return success if the upload fails. Coincidentally, this fixes the long-standing TODO about `std::fs::remove_dir_all` not being atomic. It need not be atomic anymore because we set the `deleted_at=Some()` before starting the `remove_dir_all`. The tombstone is in the IndexPart only, not in the `metadata`. So, we only have the tombstone and the `remove_dir_all` benefits mentioned above if remote storage is configured. This was a conscious trade-off because there's no good format evolution story for the current metadata file format. The introduction of this additional step into `delete_timeline` was painful because delete_timeline needs to be 1. cancel-safe 2. idempotent 3. safe to call concurrently These are mostly self-inflicted limitations that can be avoided by using request-coalescing. PR https://github.com/neondatabase/neon/pull/4159 will do that. 
fixes https://github.com/neondatabase/neon/issues/3560 refs https://github.com/neondatabase/neon/issues/3889 (part of tenant relocation) Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-05-10 10:27:12 +02:00
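The tombstone idea described above can be sketched in a few lines. `IndexPartSketch` is a hypothetical, heavily simplified stand-in for the real IndexPart, and `std::time::SystemTime` is used here only to keep the example dependency-free; the actual timestamp type differs.

```rust
use std::time::SystemTime;

/// Hypothetical, simplified stand-in for the real `IndexPart`.
#[derive(Debug, Default)]
struct IndexPartSketch {
    /// `None` while the timeline is alive, `Some(timestamp)` once it is deleted.
    deleted_at: Option<SystemTime>,
}

impl IndexPartSketch {
    /// Idempotent: a repeated delete keeps the first deletion timestamp.
    fn mark_deleted(&mut self, now: SystemTime) {
        if self.deleted_at.is_none() {
            self.deleted_at = Some(now);
        }
    }

    fn is_tombstone(&self) -> bool {
        self.deleted_at.is_some()
    }
}

/// On attach, a timeline whose remote index part is a tombstone is not loaded.
fn should_load_on_attach(index_part: &IndexPartSketch) -> bool {
    !index_part.is_tombstone()
}
```

In this sketch the tombstone is set before any local directory removal would start, so a crash in the middle of the removal still leaves a durable record that the timeline was deleted.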
Christian Schwarz	3ec52088dd	eviction_task: tracing::instrument the imitate-access calls (#4180 ) Currently, if we unexpectedly download from the eviction task, the log lines look like what we have in https://github.com/neondatabase/neon/issues/4154 ``` 2023-05-04T14:42:57.586772Z WARN eviction_task{tenant_id=$TENANT timeline_id=$TIMELINE}:eviction_iteration{policy_kind="LayerAccessThreshold"}: unexpectedly on-demand downloading remote layer remote $TIMELINE/000000067F000032AC0000400C00FFFFFFFF-000000067F000032AC000040140000000008__0000000001696070-0000000003DC76E9 for task kind Eviction ``` We know these are caused by the imitate accesses. But we don't know which one (my bet is on update_gc_info). I didn't want to pollute the other tasks' logs with the additional spans, so this uses `.instrument()` only when we call non-eviction-task code. refs https://github.com/neondatabase/neon/issues/4154	2023-05-09 18:16:22 +02:00
Heikki Linnakangas	66b06e416a	Pass tracing context in env variables instead of the spec file. (#4174 ) If compute_ctl is launched without a spec file, it fetches it from the control plane with an HTTP request. We cannot get the startup tracing context from the compute spec in that case, because we don't have it available on start. We could still read the tracing context from the compute spec after we have fetched it, but that would leave the fetch itself out of the context. Pass the tracing context in environment variables instead.	2023-05-09 17:08:02 +03:00
Arthur Petukhovsky	d62315327a	Allow parallel backup in safekeepers (#4177 ) Add `wal_backup_parallel_jobs` cmdline argument to specify the max count of parallel segments upload. New default value is 5, meaning that safekeepers will try to upload 5 segments concurrently if they are available. Setting this value to 1 will be equivalent to the sequential upload that we had before. Part of the https://github.com/neondatabase/neon/issues/3957	2023-05-09 12:20:35 +03:00
Anastasia Lubennikova	4bd7b1daf2	Bump vendor/postgres: Fix entering hot standby mode for Neon postgres v15	2023-05-08 21:25:47 +01:00
Sergey Melnikov	0d3d022eb1	Remove deploy workflows (#4157 ) ## Describe your changes Removing deploy workflows (moving to aws repo)	2023-05-08 17:30:16 +02:00
Raouf Chebri	e85cbddd2e	Update neondatabase banner in README.md (#4176 )	2023-05-08 17:12:42 +02:00
Anton Chaporgin	51ff9f9359	pg-sni-router nlb is internal (#4164 )	2023-05-08 18:03:50 +03:00
Vadim Kharitonov	0f8b2d8f0a	Compile kq_imcx extension (#3568 ) ## Describe your changes Compiles kq_imcx extension At this moment, there are some issues with the extension: 1. I'm cloning it directly from the master branch. It's better to fetch tag/archive 2. PG14: ``` postgres=# CREATE EXTENSION IF NOT EXISTS kq_imcx; postgres=# select * from kq_calendar_cache_info(); 2023-02-08 13:55:22.853 UTC [412] ERROR: relation "ketteq.slice_type" does not exist at character 34 2023-02-08 13:55:22.853 UTC [412] QUERY: select min(s.id), max(s.id) from ketteq.slice_type s 2023-02-08 13:55:22.853 UTC [412] STATEMENT: select * from kq_calendar_cache_info(); ERROR: relation "ketteq.slice_type" does not exist LINE 1: select min(s.id), max(s.id) from ketteq.slice_type s ``` 3. PG15: `cannot request additional shared memory outside shmem_request_hook` Note: I don't think we need to publish info about this extension in the docs. ## Issue ticket number and link neondatabase/cloud#3387	2023-05-08 15:56:08 +02:00
Gleb Novikov	9860d59aa2	Public docker image repository by default	2023-05-08 15:51:54 +04:00
Christian Schwarz	411c71b486	document current tenant attach API semantics (#4151 ) We currently return 202 as soon as the tenant is allocated in memory before we've written out the marker file. So, the /attach API currently does not have a transactional character. For example, it can happen that we respond with a 202 and then crash before writing out the marker file. In such a case, it is important that the client 1. observes the lost attach (by polling tenant status and observing 404) 2. and consequently retries the attach. It has to do it in this loop until it observes the tenant as "Active" in the tenant status. If the client doesn't follow this protocol and instead goes to another pageserver to attach the tenant, we risk a split-brain situation where both the first and second pageserver write to the tenant's S3 state. The improved description highlights the consequences of this behavior for clients that use the /attach endpoint. The tenant relocation that is currently being implemented in cloud#4740 implements retries of Attach and it does poll afterwards, but, it polls `has_in_progress_downloads`. That is incorrect, as described in the patch body. The motivation for this write-up is that, in a future PR, we'll extend the /attach endpoint with an option to provide the tenant config. If we decide to leave the non-transactional behavior of /attach unmodified, we will be able to avoid persisting the tenant config. Conversely, if we decide that the /attach API should become transactional, we'll need to persist the tenant config in the attach-marker-file before acknowledging receipt of the /attach operation. refs https://github.com/neondatabase/cloud/pull/4740 refs https://github.com/neondatabase/neon/issues/2238 refs https://github.com/neondatabase/neon/issues/1555	2023-05-05 19:32:41 +03:00
Alexey Kondratov	dd4fd89dc6	[compute_ctl] Do not initialize `last_active` on start (#4137 ) Our scale-to-zero logic was optimized for short auto-suspend intervals, e.g. minutes or hours. In that case, if compute was restarted by k8s for some reason (OOM, k8s node went down, pod relocation, etc.), `last_active` got bumped and we started counting the auto-suspend timeout again. It's not a big deal, i.e. we suspend completely idle compute not after 5 minutes, but after 10 minutes or so. Yet, some clients may want days or even weeks. The chance that compute is restarted during such an interval is pretty high, and in that case we might not be able to suspend some computes for weeks. After this commit, we won't initialize `last_active` on start, so `/status` could return an unset attribute. This means that there was no user activity since start. Control-plane should deal with it by taking `max()` out of all available activity timestamps: `started_at`, `last_active`, etc. compute_ctl part of neondatabase/cloud#4853	2023-05-05 11:45:37 +02:00
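The `max()` rule suggested for the control plane can be sketched as below. The function name and the use of plain epoch seconds are assumptions for illustration; the real `/status` payload and control-plane code are not shown in this log.

```rust
/// Hypothetical helper: timestamps are seconds since the Unix epoch.
/// `last_active` is `None` when there was no user activity since start.
fn effective_last_activity(started_at: u64, last_active: Option<u64>) -> u64 {
    // Take the latest of all available activity timestamps.
    last_active.map_or(started_at, |t| t.max(started_at))
}
```

With an unset `last_active`, the idle timer counts from `started_at`, so a restart with no user activity in between can be told apart from real activity.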
Alexander Bayandin	653e633c59	test_runner: add --pg-version pytest argument (#4037 ) - allows setting Postgres version for testing using --pg-version argument - fixes tests for the non-default Postgres version.	2023-05-05 02:57:47 +03:00
Alexander Bayandin	291b4f0d41	Update client libs for test_runner/pg_clients to their latest versions (#4092 ) Also, use Workaround D for `swift/PostgresClientKitExample`, which supports neither SNI nor connections options	2023-05-04 18:22:04 +01:00
Christian Schwarz	88f39c11d4	refactor: the code that builds TenantConfOpt from mgmt API requests (#4152 ) - extract code that builds TenantConfOpt from requests into a From<> impl - move map_err(ApiError::BadRequest) into callers	2023-05-04 18:10:40 +03:00
Heikki Linnakangas	b627fa71e4	Make read-only replicas explicit in compute spec (#4136 ) This builds on top of PR #4058, and supersedes #4018	2023-05-04 17:41:42 +03:00
Christian Schwarz	7dd9553bbb	eviction: regression test + distinguish layer write from map insert (#4005 ) This patch adds a regression test for the threshold-based layer eviction. The test asserts the basic invariant that, if left alone, the residence statuses will stabilize, with some layers resident and some layers evicted. Thereby, we cover both the aspect of last-access-time-threshold-based eviction, and the "imitate access" hacks that we put in recently. The aggressive `period` and `threshold` values revealed a subtle bug which is also fixed in this patch. The symptom was that, without the Rust changes of this patch, there would be occasional test failures due to `WARN... unexpectedly downloading` log messages. These log messages were caused by the "imitate access" calls of the eviction task. But, the whole point of the "imitate access" hack was to prevent eviction of the layers that we access there. After some digging, I found the root cause, which is the following race condition: 1. Compact: Write out an L1 layer from several L0 layers. This records residence event `LayerCreate` with the current timestamp. 2. Eviction: imitate access logical size calculation. This accesses the L0 layers because the L1 layer is not yet in the layer map. 3. Compact: Grab layer map lock, add the new L1 to layer map and remove the L0s, release layer map lock. 4. Eviction: observes the new L1 layer whose only activity timestamp is the `LayerCreate` event. The L1 layer had no chance of being accessed until after (3). So, if enough time passes between (1) and (3), then (4) will observe a layer with `now-last_activity > threshold` and evict it The fix is to require the first `record_residence_event` to happen while we already hold the layer map lock. The API requires a ref to a `BatchedUpdates` as a witness that we are inside a layer map lock. That is not fool-proof, e.g., new call sites for `insert_historic` could just completely forget to record the residence event. 
It would be nice to prevent this at the type level. In the meantime, we have a rate-limited log message to warn us if such an implementation error sneaks in in the future. fixes https://github.com/neondatabase/neon/issues/3593 fixes https://github.com/neondatabase/neon/issues/3942 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-04 16:16:48 +02:00
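The "witness" pattern mentioned in this commit message, where a reference to `BatchedUpdates` proves that the layer map lock is held, can be sketched as follows. All types here are hypothetical simplifications that only borrow names from the message; the real pageserver API differs.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical layer map: just a locked list of layer names.
struct LayerMap {
    layers: Mutex<Vec<String>>,
}

/// Witness type: a value of this type exists only while the lock is held.
struct BatchedUpdates<'a> {
    guard: MutexGuard<'a, Vec<String>>,
}

impl LayerMap {
    fn batch_update(&self) -> BatchedUpdates<'_> {
        BatchedUpdates { guard: self.layers.lock().unwrap() }
    }
}

impl BatchedUpdates<'_> {
    fn insert_historic(&mut self, layer: &Layer) {
        self.guard.push(layer.name.clone());
    }
}

/// Hypothetical layer that remembers its residence events.
struct Layer {
    name: String,
    residence_events: Vec<&'static str>,
}

impl Layer {
    /// Requiring the witness means this can only be called under the lock.
    fn record_residence_event(&mut self, _witness: &BatchedUpdates<'_>, event: &'static str) {
        self.residence_events.push(event);
    }
}
```

As the commit message notes, this is not fool-proof: nothing in the types forces a caller of `insert_historic` to also call `record_residence_event`.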
Heikki Linnakangas	b5d64a1e32	Rename field, to match field name in XLogData struct and in rust-postgres (#4149 ) The field means the same thing as the `wal_end` field in the XLogData struct. And in the postgres-protocol crate's corresponding PrimaryKeepAlive struct, it's also called `wal_end`. Let's be consistent. As noted by Arthur at https://github.com/neondatabase/neon/pull/4144#pullrequestreview-1411031881	2023-05-04 14:41:15 +03:00
Christian Schwarz	f9839a0dd9	import_basebackup_from_tar: don't load local layers twice (#4111 ) PR #4104 removed these bits as part of a revert of a larger change. follow-up to https://github.com/neondatabase/neon/pull/4104#discussion_r1180444952 --- Let's not merge this before the release.	2023-05-04 09:23:49 +02:00
Arthur Petukhovsky	ce1bbc9fa7	Always send the latest commit_lsn in send_wal (#4150 ) When a new connection is established to the safekeeper, the 'end_pos' field is initially set to Lsn::INVALID (i.e. 0/0). If there is no WAL to send to the client, we send KeepAlive messages with Lsn::INVALID. That confuses the pageserver: it thinks that the safekeeper is lagging far behind the tip of the branch, and will reconnect to a different safekeeper. Then the same thing happens with the new safekeeper, until some WAL is streamed which sets 'end_pos' to a valid value. This fix always sets `end_pos` to the most recent `commit_lsn` value. This is useful to send the latest `commit_lsn` to the receiver, so it knows how advanced this safekeeper is compared to the others. Fixes https://github.com/neondatabase/neon/issues/3972 Supersedes https://github.com/neondatabase/neon/pull/4144	2023-05-04 00:07:45 +03:00
Alexander Bayandin	b114ef26c2	GitHub Autocomment: add a note if no tests were run (#4109 ) - Always (if not cancelled) add a comment to a PR - Mention in the comment if no tests were run / reports were not generated.	2023-05-03 15:38:49 +01:00
Arthur Petukhovsky	3ceef7b17a	Add more safekeeper and walreceiver metrics (#4142 ) Add essential safekeeper and pageserver::walreceiver metrics. Mostly counters, such as the number of received queries, broker messages, removed WAL segments, or connection switch events in walreceiver. Also logs broker push loop duration.	2023-05-03 17:07:41 +03:00
Kirill Bulatov	586e6e55f8	Print WalReceiver context on WAL waiting timeout (#4090 ) Closes https://github.com/neondatabase/neon/issues/2106 Before: ``` Extracting base backup to create postgres instance: path=/Users/someonetoignore/work/neon/neon_main/test_output/test_pageserver_lsn_wait_error_safekeeper_stop/repo/endpoints/ep-2/pgdata port=15017 stderr: command failed: page server 'basebackup' command failed Caused by: 0: db error: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50 1: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50 Stack backtrace: ``` After: ``` Extracting base backup to create postgres instance: path=/Users/someonetoignore/work/neon/neon/test_output/test_pageserver_lsn_wait_error_safekeeper_stop/repo/endpoints/ep-2/pgdata port=15011 stderr: command failed: page server 'basebackup' command failed Caused by: 0: db error: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50, WalReceiver status (update 2023-04-26 14:20:39): streaming WAL from node 12346, commit\|streaming Lsn: 0/A2C3F58\|0/A2C3F58, safekeeper candidates (id\|update_time\|commit_lsn): [(12348\|14:20:40\|0/A2C3F58), (12346\|14:20:40\|0/A2C3F58), (12347\|14:20:40\|0/A2C3F58)] 1: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50, WalReceiver status (update 2023-04-26 14:20:39): streaming WAL from node 12346, commit\|streaming Lsn: 0/A2C3F58\|0/A2C3F58, safekeeper candidates (id\|update_time\|commit_lsn): [(12348\|14:20:40\|0/A2C3F58), (12346\|14:20:40\|0/A2C3F58), (12347\|14:20:40\|0/A2C3F58)] Stack backtrace: ``` As the issue requests, the PR adds the context in logs only, but I think we should expose the context via HTTP management API similar way — it should be simple 
with the new API, but better be done in a separate PR. Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2023-05-03 16:25:19 +03:00
Anton Chaporgin	db81242f4a	add debug to pg-sni-router install (#4143 )	2023-05-03 16:14:16 +03:00
dependabot[bot]	39ca7c7c09	Bump flask from 2.1.3 to 2.2.5 (#4138 )	2023-05-03 10:40:35 +01:00
Heikki Linnakangas	ecc0cf8cd6	Treat EPIPE as an expected error. (#4141 ) If the other end of a TCP connection closes its read end of the socket, you get an EPIPE when you try to send. I saw that happen in the CI once: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4136/release/4869464644/index.html#suites/c19bc2126511ef8cb145cca25c438215/7ec87b016c0b4b50/ ``` 2023-05-03T07:53:22.394152Z ERROR Task 'serving compute connection task' tenant_id: Some(c204447079e02e7ba8f593cb8bc57e76), timeline_id: Some(b666f26600e6deaa9f43e1aeee5bacb7) exited with error: Postgres connection error Caused by: Broken pipe (os error 32) Stack backtrace: 0: pageserver::page_service::page_service_conn_main::{{closure}} at /__w/neon/neon/pageserver/src/page_service.rs:282:17 <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panic/unwind_safe.rs:296:9 <futures_util::future::future::catch_unwind::CatchUnwind<Fut> as core::future::future::Future>::poll::{{closure}} at /__w/neon/neon/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.28/src/future/future/catch_unwind.rs:36:42 <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panic/unwind_safe.rs:271:9 ... ``` In passing, add a comment to explain what the "expected" in the `is_expected_io_error` function means.	2023-05-03 12:03:13 +03:00
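The idea in ecc0cf8cd6 above, classifying some I/O errors as "expected" so they are not logged as ERROR, can be sketched roughly as follows. This is a minimal illustration, not the pageserver's actual code: the function name is taken from the commit message, but the exact set of error kinds it matches is an assumption.

```rust
use std::io;

/// Returns true for I/O errors that only mean the peer went away, so the
/// caller can log them quietly instead of as ERROR.
/// Hypothetical sketch: the list of kinds below is an assumption.
pub fn is_expected_io_error(e: &io::Error) -> bool {
    matches!(
        e.kind(),
        // EPIPE: the peer closed its read end and we tried to send.
        io::ErrorKind::BrokenPipe
            | io::ErrorKind::ConnectionReset
            | io::ErrorKind::ConnectionAborted
    )
}
```

A caller would check `is_expected_io_error(&err)` before choosing the log level for a failed connection task.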
Joonas Koivunen	faebe3177b	wal_craft: cleanup to keep editable (#4121 ) wal_craft had accumulated some trouble by using `use anyhow::*;`. Fixes that, removes redundant conversions (never need to convert a Path to OsStr), especially at the `Process` args. Originally in #4100 but we merged a later PR instead for the fixes. I dropped the `postmaster.pid` polling in favor of just having a longer connect timeout.	2023-05-03 11:11:55 +03:00
Joonas Koivunen	474f69c1c0	fix: omit cancellation logging when panicking (#4125 ) noticed while describing `RequestSpan`, this fix will omit the otherwise logged message about request being cancelled when panicking in the request handler. this was missed on #4064.	2023-05-03 10:56:49 +03:00
Konstantin Knizhnik	47521693ed	Fix file_cache build warnings (#4118 )	2023-05-02 22:28:21 +03:00
Heikki Linnakangas	4d55d61807	Store basic endpoint info in endpoint.json file. (#4058 ) It's more convenient than parsing the postgresql.conf file. Extracted from PR #3886. I started working on another patch (to make it safe to run two "neon_local endpoint create" commands concurrently), and realized that this change will make that simpler too.	2023-05-02 20:36:11 +03:00
Sergey Melnikov	093fafd6bd	Deploy pg-sni-router (#4132 )	2023-05-01 17:18:45 +02:00
Shany Pozin	e3ae2661ee	Add 2 new sets of safekeepers to us-west2 (#4130 ) ## Describe your changes TF output: module.safekeeper-us-west-2.aws_instance.this["3"]: Creation complete after 13s [id=i-089f6b9ef426dff76] module.safekeeper-us-west-2.aws_instance.this["4"]: Creation complete after 13s [id=i-0fe6bf912c4710c82] module.safekeeper-us-west-2.aws_instance.this["5"]: Creation complete after 13s [id=i-0a83c1c46d2b4e409] module.safekeeper-us-west-2.aws_instance.this["6"]: Creation complete after 13s [id=i-0fef5317b8fdc9f8d] module.safekeeper-us-west-2.aws_instance.this["7"]: Creation complete after 13s [id=i-0be739190d4289bf9] module.safekeeper-us-west-2.aws_instance.this["8"]: Creation complete after 13s [id=i-00e851803669e5cfe]	2023-05-01 14:22:59 +03:00
Anton Chaporgin	7e368f3edf	build pg-sni-router binary (#4129 ) ## Describe your changes This adds pg-sni-router binary to the build pipeline and neon image. ## Issue ticket number and link https://github.com/neondatabase/cloud/issues/1461	2023-05-01 12:14:31 +02:00
Joonas Koivunen	138bc028ed	fix: quick and dirty panic avoidance on drop path (#4128 ) Sentry caught a panic on load testing server related to metric removals: https://neondatabase.sentry.io/issues/4142396994 Turn the `expect` into logging, but also add logging for each removal, so we could identify in which cases we do double-remove. The double-removal (or never adding) cause is not obvious or expected. Originally added in #3837.	2023-05-01 11:54:09 +03:00
Stas Kelvich	d53f81b449	Add one more pageserver to staging	2023-04-30 22:39:34 +03:00
Joonas Koivunen	6f472df0d0	fix: restore not logging ignored io errors as errors (#4120 ) the fix is rather indirect due to the accidental applying of too much `anyhow`: if handle_pagerequests returns a `QueryError` it will now be bubbled up as-is `QueryError`. `QueryError` allows the inner `std::io::Error` to be inspected and thus we can filter certain error kinds which are perfectly normal without a huge log message. for a very long time (`b2f5102`) the errors were converted to `anyhow` by mistake which made this difficult or impossible, even though from the types it would appear that we propagate wrapped `std::io::Error`s and can filter them. Fixes #4113, most likely filters some other errors as well.	2023-04-30 14:34:55 +03:00
Rahul Patil	21eb944b5e	Staging: Add safekeeper nodes [3-8] to eu-west-1 (#4123 )	2023-04-29 15:25:57 +03:00
Arthur Petukhovsky	95244912c5	Override sharded-slab to increase MAX_THREADS (#4122 ) Add patch directive to Cargo.toml to use patched version of sharded-slab: `98d16753ab` Patch changes the MAX_THREADS limit from 4096 to 32768. This is a temporary workaround for using tracing from many threads in safekeepers code, until async safekeepers patch is merged to the main. Note that patch can affect other rust services, not only the safekeeper binary.	2023-04-29 13:31:04 +03:00
Shany Pozin	2617e70008	Add 4 new Pageservers for retool launch (#4115 ) ## Describe your changes Adding 4 new pageservers to us-west TF apply output: module.pageserver-us-west-2.aws_instance.this["7"]: Creation complete after 21s [id=i-02eec9b40617db5bc] module.pageserver-us-west-2.aws_instance.this["5"]: Creation complete after 21s [id=i-00ca6417c7bf96820] module.pageserver-us-west-2.aws_instance.this["4"]: Creation complete after 21s [id=i-013263dd1c239adcc] module.pageserver-us-west-2.aws_instance.this["6"]: Creation complete after 22s [id=i-01cdf7d2bc1433b6a]	2023-04-29 11:42:52 +02:00
Arthur Petukhovsky	8543485e92	Pull clone timeline from peer safekeepers (#4089 ) Add HTTP endpoint to initialize safekeeper timeline from peer safekeepers. This is useful for initializing new safekeeper to replace failed safekeeper. Not fully "correct" in all cases, but should work in most. This code is not suitable for production workloads but can be tested on staging to get started. New endpoint is separated from usual cases and should not affect anything if no one explicitly uses a new endpoint. We can rollback this commit in case of issues.	2023-04-28 14:20:46 +00:00
Joonas Koivunen	ec53c5ca2e	revert: "Add check for duplicates of generated image layers" (#4104 ) This reverts commit `732acc5`. Reverted PR: #3869 As noted in PR #4094, we do in fact try to insert duplicates to the layer map, if L0->L1 compaction is interrupted. We do not have a proper fix for that right now, and we are in a hurry to make a release to production, so revert the changes related to this to the state that we have in production currently. We know that we have a bug here, but better to live with the bug that we've had in production for a long time, than rush a fix to production without testing it in staging first. Cc: #4094, #4088	2023-04-28 17:20:18 +03:00
Stas Kelvich	94d612195a	bump rust-postgres version, after merging PR in rust-postgres	2023-04-28 17:15:43 +03:00
Stas Kelvich	b1329db495	fix sigterm handling	2023-04-28 17:15:43 +03:00
Stas Kelvich	5bb971d64e	fix more python tests	2023-04-28 17:15:43 +03:00
Stas Kelvich	0364f77b9a	fix python styling	2023-04-28 17:15:43 +03:00
Stas Kelvich	4ac6a9f089	add backward compatibility to proxy	2023-04-28 17:15:43 +03:00
Stas Kelvich	9486d76b2a	Add tests for link auth to compute connection	2023-04-28 17:15:43 +03:00
Stas Kelvich	040f736909	remove changes in main proxy that are now not needed	2023-04-28 17:15:43 +03:00
Stas Kelvich	645e4f6ab9	use TLS in link proxy	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	e947cc119b	Add a small test case for pg_sni_router	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	53e5d18da5	Start passthrough earlier As soon as we have received the SSLRequest packet, and have figured out the hostname to connect to from the SNI, we can start passing through data. We don't need to parse the StartupPacket that the client will send next.	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	3813c703c9	Add an option for destination port. Makes it easier to test locally.	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	b15204fa8c	Fix --help, and required args	2023-04-28 17:15:43 +03:00
Alexey Kondratov	81c75586ab	Take port from SNI, formatting, make clippy happy	2023-04-28 17:15:43 +03:00
Anton Chaporgin	556fb1642a	fixed the way hostname is parsed	2023-04-28 17:15:43 +03:00
Stas Kelvich	23aca81943	Add SNI-based proxy router In order not to create NodePorts for each compute, we can set up services that accept connections on wildcard domains and then use information from the domain name to route the connection to some internal service. There are ready solutions for HTTPS and TLS connections, but the postgresql protocol uses opportunistic TLS and we haven't found any ready solutions. This patch introduces `pg_sni_router` which routes connections to `aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`. In the long run we can avoid console -> compute psql communications, but for now this router seems to be the easier way forward.	2023-04-28 17:15:43 +03:00
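The hostname mapping described in 23aca81943 above (`aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`) can be illustrated with a small sketch. The function name `route_sni` and the `internal_suffix` parameter are hypothetical and chosen for illustration only. The sketch shows just the string rewrite from the original commit, not the TLS handling or the later change that takes the port from the SNI.

```rust
/// Maps an SNI hostname such as "aaa--bbb--123.external.domain" to an
/// internal hostname such as "aaa.bbb.123.internal.domain".
/// Hypothetical helper: name and signature are assumptions, not the
/// actual `pg_sni_router` code.
pub fn route_sni(sni: &str, internal_suffix: &str) -> Option<String> {
    // The first DNS label carries the encoded destination; the rest is
    // the external wildcard domain, which is discarded.
    let (label, _external_domain) = sni.split_once('.')?;

    // "--" separates the components of the internal hostname.
    let parts: Vec<&str> = label.split("--").collect();
    if parts.iter().any(|p| p.is_empty()) {
        return None; // reject empty components such as "aaa----123"
    }

    Some(format!("{}.{}", parts.join("."), internal_suffix))
}
```

For example, `route_sni("aaa--bbb--123.external.domain", "internal.domain")` yields `Some("aaa.bbb.123.internal.domain")`, while a hostname without any dot yields `None`.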
Arseny Sher	42798e6adc	Increase connection_timeout to PG in find end of WAL test. And log postgres to stdout. Probably fixes https://github.com/neondatabase/neon/issues/3778	2023-04-28 16:17:23 +04:00
Arthur Petukhovsky	b03143dfc8	Use serde_as DisplayFromStr everywhere (#4103 ) We used `display_serialize` previously, but it works only for Serialize. `DisplayFromStr` does the same, but also works for Deserialize.	2023-04-28 13:55:07 +03:00
Arseny Sher	fdacfaabfd	Move PageserverFeedback to utils. It allows to replace u64 with proper Lsn and pretty print PageserverFeedback with serde(_json). Now walsenders on safekeepers queried with debug_dump look like "walsenders": [ { "ttid": "fafe0cf39a99c608c872706149de9d2a/b4fb3be6f576935e7f0fcb84bdb909a1", "addr": "127.0.0.1:48774", "conn_id": 3, "appname": "pageserver", "feedback": { "Pageserver": { "current_timeline_size": 32096256, "last_received_lsn": "0/2415298", "disk_consistent_lsn": "0/1696628", "remote_consistent_lsn": "0/0", "replytime": "2023-04-12T13:54:53.958856+00:00" } } } ],	2023-04-28 06:22:13 +04:00
Arseny Sher	b2a3981ead	Move tracking of walsenders out of Timeline. Refactors walsenders out of timeline.rs, to make it less convoluted, into a separate WalSenders with its own lock, but otherwise having the same structure. Tracking of in-memory remote_consistent_lsn is also moved there as it is mainly received from pageserver. State of walsender (feedback) is also restructured to be cleaner; now it is either PageserverFeedback or StandbyFeedback(StandbyReply, HotStandbyFeedback), but not both.	2023-04-28 06:22:13 +04:00
Joonas Koivunen	fe0b616299	feat(page_service): read timeouts (#4093 ) Introduce read timeouts to our `page_service` connections. Without read timeouts, we essentially leak connections. This is a port of #3995. Split the refactorings to the other PR: #4097. Fixes #4028.	2023-04-27 17:55:35 +00:00
Alexander Bayandin	c4e1cafb63	scripts/flaky_tests.py: handle connection error (#4096 ) - Increase `connect_timeout` to 30s, which should be enough for most of the cases - If the script cannot connect to the DB (or any other `psycopg2.OperationalError` occur) — do not fail the script, log the error and proceed. Problems with fetching flaky tests shouldn't block the PR	2023-04-27 17:08:00 +01:00
Joonas Koivunen	fdf5e4db5e	refactor: Cleanup page service (#4097 ) Refactoring part of #4093. Numerous `Send + Sync` bounds were a distraction and were not needed at all. The proper `Bytes` usage and one `"error_message".to_string()` are just drive-by fixes. Not using the `PostgresBackendTCP` allows us to start setting read timeouts (and more). `PostgresBackendTCP` is still used from proxy, so it cannot be removed.	2023-04-27 18:51:57 +03:00
Heikki Linnakangas	d1e86d65dc	Run rustfmt to fix whitespace. Commit `e6ec2400fc` introduced some trivial whitespace issues.	2023-04-27 18:45:22 +03:00
Arseny Sher	f5b4697c90	Log session_id when proxy per client task errors out.	2023-04-27 19:08:22 +04:00
Christian Schwarz	3be81dd36b	fix `clippy --release` failure introduced in #4030 (#4095 ) PR `build: run clippy for powerset of features (#4077)` brought us a `clippy --release` pass. It was merged after #4030, which fails under `clippy --release` with ``` error: static `TENANT_ID_EXTRACTOR` is never used --> pageserver/src/tenant/timeline.rs:4270:16 \| 4270 \| pub static TENANT_ID_EXTRACTOR: once_cell::sync::Lazy< \| ^^^^^^^^^^^^^^^^^^^ \| = note: `-D dead-code` implied by `-D warnings` error: static `TIMELINE_ID_EXTRACTOR` is never used --> pageserver/src/tenant/timeline.rs:4276:16 \| 4276 \| pub static TIMELINE_ID_EXTRACTOR: once_cell::sync::Lazy< \| ^^^^^^^^^^^^^^^^^^^^^ ``` A merge queue would have prevented this.	2023-04-27 17:07:25 +03:00
MMeent	e6ec2400fc	Enable hot standby PostgreSQL replicas. Notes: - This still needs UI support from the Console - I've not tuned any GUCs for PostgreSQL to make this work better - Safekeeper has gotten a tweak in which WAL is sent and how: It now sends zero-ed WAL data from the start of the timeline's first segment up to the first byte of the timeline to be compatible with normal PostgreSQL WAL streaming. - This includes the commits of #3714 Fixes one part of https://github.com/neondatabase/neon/issues/769 Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-04-27 15:26:44 +02:00
Christian Schwarz	5b911e1f9f	build: run clippy for powerset of features (#4077 ) This will catch compiler & clippy warnings in all feature combinations. We should probably use cargo hack for build and test as well, but, that's quite expensive and would add to overall CI wait times. obsoletes https://github.com/neondatabase/neon/pull/4073 refs https://github.com/neondatabase/neon/pull/4070	2023-04-27 15:01:27 +03:00
Christian Schwarz	9ea7b5dd38	clean up logging around on-demand downloads (#4030 ) - Remove repeated tenant & timeline from span - Demote logging of the path to debug level - Log completion at info level, in the same function where we log errors - distinguish between layer file download success & on-demand download succeeding as a whole in the log message wording - Assert that the span contains a tenant id and a timeline id fixes https://github.com/neondatabase/neon/issues/3945 Before: ``` INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: download complete: /storage/pageserver/data/tenants/$TENANT_ID/timelines/$TIMELINE_ID/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91 INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates. ``` After: ``` INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: layer file download finished INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates. 
INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: on-demand download successful ```	2023-04-27 11:54:48 +02:00
Arseny Sher	0112a602e1	Add timeout on proxy -> compute connection establishment. Otherwise we sit up to default tcp_syn_retries (about 2+ min) before getting os error 110 if compute has been migrated to another pod.	2023-04-27 09:50:52 +04:00
Anastasia Lubennikova	92214578af	Fix proxy_io_bytes_per_client metric: use branch_id identifier properly. (#4084 ) It fixes the miscalculation of the metric for projects that use multiple branches for the same endpoint. We were underbilling users with such projects, so we need to communicate the change in Release Notes.	2023-04-26 17:47:54 +03:00
Christian Schwarz	6861259be7	add global metric for unexpected on-demand downloads (#4069 ) Until we have toned down the prod logs to zero WARN and ERROR, we want a dedicated metric for which we can have a dedicated alert. fixes https://github.com/neondatabase/neon/issues/3924	2023-04-26 15:18:26 +02:00
Sergey Melnikov	11df2ee5d7	Add safekeeper-3.us-east-2.aws.neon.build (#4085 )	2023-04-26 14:40:36 +03:00
Arseny Sher	31a3910fd9	Remove wait_for_sk_commit_lsn_to_reach_remote_storage. It had a couple of inherent races: 1) Even if compute is killed before the call, some more data might still arrive at the safekeepers after commit_lsn on them is polled, advancing it. Then the checkpoint on the pageserver might not include this tail, and so the upload of the expected LSN won't happen until one more checkpoint. 2) commit_lsn is updated asynchronously -- compute can commit a transaction before communicating commit_lsn to even a single safekeeper (sync-safekeepers can be used to force the advancement). This makes the semantics of wait_for_sk_commit_lsn_to_reach_remote_storage quite complicated. Replace it with last_flush_lsn_upload which 1) Learns the last flush LSN on compute; 2) Waits for it to arrive at the pageserver; 3) Checkpoints it; 4) Waits for the upload. In some tests this keeps compute alive longer than before, but this doesn't seem to be important. There is a chance this fixes https://github.com/neondatabase/neon/issues/3209	2023-04-26 13:46:33 +04:00
Joonas Koivunen	381c8fca4f	feat: log how long tenant activation takes (#4080 ) Adds just a counter counting up from the creation of the tenant, logged after activation. Might help guide us with the investigation of #4025.	2023-04-26 12:39:17 +03:00
Joonas Koivunen	4625da3164	build: remove busted sk-1.us-east-2 from staging hosts (#4082 ) this should give us complete deployments while a new one is being brought up.	2023-04-26 09:07:45 +00:00
Joonas Koivunen	850f6b1cb9	refactor: drop pageserver_ondisk_layers (#4071 ) I didn't get through #3775 fast enough so we wanted to remove this metric. Fixes #3705.	2023-04-26 11:49:29 +03:00
Sergey Melnikov	f19b70b379	Configure extra domain for us-east-1 (#4078 )	2023-04-26 09:36:26 +02:00
Sergey Melnikov	9d0cf08d5f	Fix new storage-broker deploy for eu-central-1 (#4079 )	2023-04-26 10:29:44 +03:00
Alexander Bayandin	2d6fd72177	GitHub Workflows: Fix crane for several registries (#4076 ) Follow-up fix after https://github.com/neondatabase/neon/pull/4067 ``` + crane tag neondatabase/vm-compute-node-v14:3064 latest Error: fetching "neondatabase/vm-compute-node-v14:3064": GET https://index.docker.io/v2/neondatabase/vm-compute-node-v14/manifests/3064: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3064 ``` I reverted to the previous approach for promoting images (login to one registry, save images to local fs, logout and login to another registry, and push images from local fs). It turns out that what works for one Google project (kaniko) doesn't work for another (crane) [sigh]	2023-04-25 23:58:59 +01:00
Heikki Linnakangas	8945fbdb31	Enable OpenTelemetry tracing in proxy in staging. (#4065 ) Depends on https://github.com/neondatabase/helm-charts/pull/32 Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>	2023-04-25 20:45:36 +03:00
Alexander Bayandin	05ac0e2493	Login to ECR and Docker Hub at once (#4067 ) - Update kaniko to 1.9.2 (from 1.7.0), problem with reproducible build is fixed - Login to ECR and Docker Hub at once, so we can push to several registries, it makes job `push-docker-hub` unneeded - `push-docker-hub` replaced with `promote-images` in `needs:` clause, Pushing images to production ECR moved to `promote-images` job	2023-04-25 17:54:10 +01:00
Joonas Koivunen	bfd45dd671	test_tenant_config: allow ERROR from eviction task (#4074 )	2023-04-25 18:41:09 +03:00
Joonas Koivunen	7f80230fd2	fix: stop dead_code rustc lint (#4070 ) only happens without `--all-features` which is what `./run_clippy.sh` uses.	2023-04-25 17:07:04 +02:00
Sergey Melnikov	78bbbccadb	Deploy proxies for preview environments (#4052 ) ## Describe your changes Deploy `main` proxies to the preview environments We don't deploy storage there yet, as it's tricky. ## Issue ticket number and link https://github.com/neondatabase/cloud/issues/4737	2023-04-25 16:46:52 +02:00
Christian Schwarz	dbbe032c39	neon_local: fix `tenant create -c eviction_policy:...` (#4004 ) And add corresponding unit test. The fix is to use `.remove()` instead of `.get()` when processing the arguments hash map. The code uses emptiness of the hash map to determine whether all arguments have been processed. This was likely a copy-paste error. refs https://github.com/neondatabase/neon/issues/3942	2023-04-25 15:33:30 +02:00
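The `.remove()` versus `.get()` distinction in dbbe032c39 above matters because the parser treats a non-empty map as "unrecognized arguments remain". A minimal sketch of that pattern follows. The `TenantConfig` struct, its two fields, and `parse_tenant_config` are hypothetical stand-ins, not the real `neon_local` types.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily reduced config type for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct TenantConfig {
    pub eviction_policy: Option<String>,
    pub gc_period: Option<String>,
}

/// Consumes known keys from `args`; anything left over is an error.
pub fn parse_tenant_config(mut args: HashMap<&str, &str>) -> Result<TenantConfig, String> {
    let config = TenantConfig {
        // `.remove()` takes the entry out of the map. With `.get()` the key
        // would stay behind and trip the emptiness check below.
        eviction_policy: args.remove("eviction_policy").map(str::to_string),
        gc_period: args.remove("gc_period").map(str::to_string),
    };

    // Emptiness of the map signals that every argument was processed.
    if !args.is_empty() {
        let mut unknown: Vec<&str> = args.keys().copied().collect();
        unknown.sort();
        return Err(format!("unrecognized arguments: {}", unknown.join(", ")));
    }
    Ok(config)
}
```

With `.get()` in place of `.remove()` for `eviction_policy`, a call that passes only that key would be rejected as containing an unrecognized argument, which matches the failure the commit describes.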
Joonas Koivunen	cb9473928d	feat: add rough timings for basebackup (#4062 ) just record the time needed for waiting the lsn and then the basebackup in a log message in millis. this is related to ongoing investigations to cold start performance. this could also be a counter. it cannot be added next to smgr histograms, because we don't want another histogram per timeline. the aim is to allow drilling deeper into which timelines were slow, and to understand why some need two basebackups.	2023-04-25 13:22:16 +00:00
Christian Schwarz	fa20e37574	add gauge for in-flight layer uploads (#3951 ) For the "worst-case /storage usage panel", we need to compute ``` remote size + local-only size ``` We currently don't have a metric for local-only layers. The number of in-flight layers in the upload queue is just that, so, let Prometheus scrape it. The metric is two counters (started and finished). The delta is the amount of in-flight uploads in the queue. The metrics are incremented in the respective `call_unfinished_metric_*` functions. These track ongoing operations by file_kind and op_kind. We only need this metric for layer uploads, so, there's the new RemoteTimelineClientMetricsCallTrackSize type that forces all call sites to decide whether they want the size tracked or not. If we find that other file_kinds or op_kinds (metadata uploads, layer downloads, layer deletes) are interesting, we can just enable them, and they'll be just another label combination within the metrics that this PR adds. fixes https://github.com/neondatabase/neon/issues/3922	2023-04-25 14:22:48 +02:00
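The "two counters, take the delta" scheme in fa20e37574 above can be sketched without any metrics library. The type `InFlightUploads` and its methods are hypothetical; the real code uses Prometheus counters labelled by file_kind and op_kind and can also track sizes, which this sketch leaves out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Two monotonically increasing counters; in-flight = started - finished.
/// Hypothetical stand-in for a pair of Prometheus counters.
#[derive(Default)]
pub struct InFlightUploads {
    started: AtomicU64,
    finished: AtomicU64,
}

impl InFlightUploads {
    /// Called when an upload is put in flight.
    pub fn start(&self) {
        self.started.fetch_add(1, Ordering::Relaxed);
    }

    /// Called when an upload completes (successfully or not).
    pub fn finish(&self) {
        self.finished.fetch_add(1, Ordering::Relaxed);
    }

    /// The delta that a dashboard would compute from the two counters.
    pub fn in_flight(&self) -> u64 {
        let finished = self.finished.load(Ordering::Relaxed);
        let started = self.started.load(Ordering::Relaxed);
        // Saturate so a racy read can never underflow.
        started.saturating_sub(finished)
    }
}
```

Exporting two counters instead of one gauge keeps each exported value monotonic; the subtraction can then happen at query time.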
Joonas Koivunen	4911d7ce6f	feat: warn when requests get cancelled (#4064 ) Add a simple disarmable dropguard to log if a request is cancelled before it is completed. We currently don't have this, which makes it difficult to know when the request was dropped.	2023-04-25 15:22:23 +03:00
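A "disarmable dropguard" as mentioned in 4911d7ce6f above is a value whose `Drop` impl reports something unless it was explicitly disarmed first. The sketch below is an illustration under assumptions: the type name `CancellationGuard` and the shared counter are hypothetical (the counter exists only to make the behaviour observable), and the real code logs through `tracing` rather than `eprintln!`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Warns on drop unless `disarm()` was called first.
/// Hypothetical sketch, not the actual request-handling code.
pub struct CancellationGuard {
    armed: bool,
    cancelled: Arc<AtomicUsize>,
}

impl CancellationGuard {
    pub fn new(cancelled: Arc<AtomicUsize>) -> Self {
        Self { armed: true, cancelled }
    }

    /// Call once the request has completed normally.
    pub fn disarm(mut self) {
        self.armed = false;
        // `self` is dropped here; `drop` sees `armed == false` and stays quiet.
    }
}

impl Drop for CancellationGuard {
    fn drop(&mut self) {
        if self.armed {
            self.cancelled.fetch_add(1, Ordering::Relaxed);
            eprintln!("request was dropped before it completed");
        }
    }
}
```

In an async handler the guard would be created at the top and disarmed after the response is produced. If the future is dropped in between, the guard is dropped while still armed and the warning fires.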
Christian Schwarz	e83684b868	add libmetric metric for each logged log message (#4055 ) This patch extends the libmetrics logging setup functionality with a `tracing` layer that increments a Prometheus counter each time we log a log message. We have the counter per tracing event level. This allows for monitoring WARN and ERR log volume without parsing the log. Also, it would allow cross-checking whether logs got dropped on the way into Loki. It would be nicer if we could hook deeper into the tracing logging layer, to avoid evaluating the filter twice. But I don't know how to do it.	2023-04-25 14:10:18 +02:00
Eduard Dyckman	afbbc61036	Adding synthetic size to pageserver swagger (#4049 ) ## Describe your changes I added synthetic size response to the console swagger. Now I am syncing it back to neon	2023-04-24 16:19:25 +03:00
Alexey Kondratov	7ba5c286b7	[compute_ctl] Improve 'empty' compute startup sequence (#4034 ) Do several attempts to get spec from the control-plane and retry network errors and all reasonable HTTP response codes. Do not hang waiting for spec without confirmation from the control-plane that compute is known and is in the `Empty` state. Adjust the way we track `total_startup_ms` metric, it should be calculated since the moment we received spec, not from the moment `compute_ctl` started. Also introduce a new `wait_for_spec_ms` metric to track the time spent sleeping and waiting for spec to be delivered from control-plane. Part of neondatabase/cloud#3533	2023-04-21 11:10:48 +02:00
sharnoff	02b28ae0b1	fix vm-informant dbname: "neondb" -> "postgres" (#4046 ) Changes the vm-informant's postgres connection string's dbname from "neondb" (which sometimes doesn't exist) to "postgres" (which _hopefully_ should exist more often?). Currently there are a handful of VMs in prod that aren't working with autoscaling because they don't have the "neondb" database. The vm-informant doesn't require any database in particular; it's just connecting as `cloud_admin` to be able to adjust the file cache settings.	2023-04-18 18:54:32 +03:00
Cihan Demirci	0bfbae2d73	Add storage broker deployment to us-east-1 (#4048 )	2023-04-18 18:41:09 +03:00
fcdm	f1b7dc4064	Update pageserver instances in us-east-1	2023-04-18 14:08:12 +01:00
Alexander Bayandin	e2a5177e89	Bump h2 from 0.3.17 to 0.3.18 (#4045 )	2023-04-18 16:04:10 +03:00
Cihan Demirci	0c083564ce	Add us-east-1 hosts file and update regions (#4042 )	2023-04-17 15:25:27 +03:00
fcdm	d8dd60dc81	Add helm values for us-east-1	2023-04-17 11:59:38 +01:00
Arthur Petukhovsky	73f34eaa5e	Send AppendResponse keepalive once per second (#4036 ) Walproposer sends AppendRequest at least once per second. This patch adds a response to these requests once per second. Fixes https://github.com/neondatabase/neon/issues/4017	2023-04-17 11:24:57 +03:00
Matt Nappo	c2496c7ef2	Added black_box in layer_map benches (fix #3396 )	2023-04-16 16:33:37 +03:00
Kirill Bulatov	ebea298415	Update most of the dependencies to their latest versions (#4026 ) See https://github.com/neondatabase/neon/pull/3991 Brings the changes back with the right way to use new `toml_edit` to deserialize values and a unit test for this. All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated. 3 sets of crates remain unupdated: * `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13). * `opentelemetry` and `opentelemetry-` crates ``` error[E0308]: mismatched types --> libs/tracing-utils/src/http.rs:65:21 \| 65 \| span.set_parent(parent_ctx); \| ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context` \| \| \| arguments to this method are incorrect \| = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `opentelemetry_api` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8 \| 43 \| fn set_parent(&self, cx: Context); \| ^^^^^^^^^^ For more information about this error, try `rustc --explain E0308`. 
error: could not compile `tracing-utils` due to previous error warning: build failed, waiting for other jobs to finish... error: could not compile `tracing-utils` due to previous error ``` `tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need. similarly, `rustls`, `tokio-rustls`, `rustls-` and `tls-listener` crates have similar issue: ``` error[E0308]: mismatched types --> libs/postgres_backend/tests/simple_select.rs:112:78 \| 112 \| let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg); \| --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig` \| \| \| arguments to this function are incorrect \| = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types note: struct `ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1 \| 125 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1 \| 91 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `rustls` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12 \| 23 \| pub fn new(config: ClientConfig) -> Self { \| ^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `postgres_backend` due to previous error warning: build failed, waiting for other jobs to finish... 
``` aws crates: I could not make the new API work with bucket endpoint overload, and console e2e tests failed. Our other tests passed; further investigation is worth doing in https://github.com/neondatabase/neon/issues/4008	2023-04-14 18:28:54 +03:00
Vadim Kharitonov	5ffa20dd82	[proxy] adjust proxy sleep timeout	2023-04-14 15:08:07 +03:00
Vadim Kharitonov	75ea8106ec	Add `procps` into compute containers	2023-04-14 15:02:26 +03:00
Vadim Kharitonov	017d3a390d	Compile postgres with lz4 and zstd support	2023-04-14 15:02:26 +03:00
Alexey Kondratov	589cf1ed21	[compute_ctl] Do not create availability checker data on each start (#4019 ) Initially, the idea was to ensure that when we come and check data availability, the special service table already contains one row. So if we lose it for some reason, we will error out. Yet, to do the availability check we start compute first anyway! So it doesn't really add any value, but it affects each compute start, as we update at least one row in the database. Also, this writes some WAL, so if the timeline is close to `neon.max_cluster_size` it could prevent compute from starting up. That said, do CREATE TABLE IF NOT EXISTS + UPSERT right in the `/check_writability` handler.	2023-04-14 13:05:07 +02:00
Alexander Bayandin	0c82ff3d98	test_runner: add Timeline Inspector to Grafana links (#4021 )	2023-04-14 11:46:47 +01:00
Christian Schwarz	8895f28dae	make evictions_low_residence_duration_metric_threshold per-tenant (#3949 ) Before this patch, if a tenant would override its eviction_policy setting to use a lower LayerAccessThreshold::threshold than the `evictions_low_residence_duration_metric_threshold`, the evictions done for that tenant would count towards the `evictions_with_low_residence_duration` metric. That metric is used to identify premature evictions, commonly triggered by disk-usage-based eviction under disk pressure. We don't want that to happen for the legitimate evictions of the tenant that overrides its eviction_policy. So, this patch - moves the setting into TenantConf - adds test coverage - updates the staging & prod yamls Forward Compatibility: Software before this patch will ignore the new tenant conf field and use the global one instead. So we can roll back safely. Backward Compatibility: Parsing old configs with software as of this patch will fail in `PageServerConf::parse_and_validate` with error `unrecognized pageserver option 'evictions_low_residence_duration_metric_threshold'` if the option is still present in the global section. We deal with this by updating the configs in Ansible. fixes https://github.com/neondatabase/neon/issues/3940	2023-04-14 13:25:45 +03:00
dependabot[bot]	b6c7c3290f	Bump h2 from 0.3.15 to 0.3.17 (#4020 )	2023-04-13 20:03:24 +01:00
Sasha Krassovsky	fd31fafeee	Make proxy shutdown when all connections are closed (#3764 ) ## Describe your changes Makes Proxy start draining connections on SIGTERM. ## Issue ticket number and link #3333	2023-04-13 19:31:30 +03:00
Alexey Kondratov	db8dd6f380	[compute_ctl] Implement live reconfiguration (#3980 ) With this commit one can request compute reconfiguration from the running `compute_ctl` with compute in `Running` state by sending a new spec: ```shell curl -d "{\"spec\": $(cat ./compute-spec-new.json)}" http://localhost:3080/configure ``` Internally, we start a separate configurator thread that is waiting on `Condvar` for `ConfigurationPending` compute state in a loop. Then it does reconfiguration, sets compute back to `Running` state and notifies other waiters. It will need some follow-ups, e.g. for retry logic for control-plane requests, but should be useful for testing in the current state. This shouldn't affect any existing environment, since computes are configured in a different way there. Resolves neondatabase/cloud#4433	2023-04-13 18:07:29 +02:00
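A minimal sketch of the configurator-thread pattern that db8dd6f380 describes: a thread waits on a `Condvar` for the `ConfigurationPending` state, reconfigures, sets the state back to `Running`, and notifies other waiters. The `Compute` struct, `request_reconfigure`, `spawn_configurator` and the `rounds` limit are hypothetical names introduced for illustration, not the actual `compute_ctl` code.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical subset of compute states; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComputeStatus {
    Running,
    ConfigurationPending,
}

/// Shared state: the status behind a mutex, plus a condvar to signal changes.
pub struct Compute {
    pub status: Mutex<ComputeStatus>,
    pub changed: Condvar,
}

impl Compute {
    pub fn new() -> Arc<Self> {
        Arc::new(Compute {
            status: Mutex::new(ComputeStatus::Running),
            changed: Condvar::new(),
        })
    }

    /// What an HTTP handler might do: mark the state as pending, wake the
    /// configurator, then block until the state is `Running` again.
    pub fn request_reconfigure(&self) {
        let mut status = self.status.lock().unwrap();
        *status = ComputeStatus::ConfigurationPending;
        self.changed.notify_all();
        while *status != ComputeStatus::Running {
            status = self.changed.wait(status).unwrap();
        }
    }
}

/// Spawns the configurator thread. `apply` stands in for the actual
/// reconfiguration work. This sketch serves `rounds` requests and exits
/// (an assumption so the example terminates); it also keeps the lock held
/// while `apply` runs, which a real implementation may want to avoid.
pub fn spawn_configurator<F>(
    compute: Arc<Compute>,
    rounds: usize,
    mut apply: F,
) -> thread::JoinHandle<()>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        for _ in 0..rounds {
            let mut status = compute.status.lock().unwrap();
            // Loop to tolerate spurious wakeups.
            while *status != ComputeStatus::ConfigurationPending {
                status = compute.changed.wait(status).unwrap();
            }
            apply();
            *status = ComputeStatus::Running;
            compute.changed.notify_all();
        }
    })
}
```

A caller would create the state with `Compute::new()`, start the thread with `spawn_configurator`, and invoke `request_reconfigure()` from the request handler.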
Alexander Bayandin	36c20946b4	Verify extensions checksums (#4014 ) To not be taken by surprise by upstream git re-tag or by malicious activity, let's verify the checksum for extensions we download. Also, unify the installation of `pg_graphql` and `pg_tiktoken` with other extensions.	2023-04-13 15:25:09 +01:00
Heikki Linnakangas	89b5589b1b	Tenant size should never be zero. Simplify test. Looking at the git history of this test, I think "size == 0" used to have a special meaning earlier, but now it should never happen.	2023-04-13 16:57:31 +03:00
Heikki Linnakangas	53f438a8a8	Rename "Postgres nodes" in control_plane to endpoints. We use the term "endpoint" for compute Postgres nodes in the web UI and user-facing documentation now. Adjust the nomenclature in the code. This changes the name of the "neon_local pg" command to "neon_local endpoint". Also adjust names of classes, variables etc. in the python tests accordingly. This also changes the directory structure so that endpoints are now stored in: .neon/endpoints/<endpoint id> instead of: .neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name> The tenant ID is no longer part of the path. That means that you cannot have two endpoints with the same name/ID in two different tenants anymore. That's consistent with how we treat endpoints in the real control plane and proxy: the endpoint ID must be globally unique.	2023-04-13 14:34:29 +03:00
Vadim Kharitonov	356439aa33	Add note about `manual_release_instructions` label (#4015 ) ## Describe your changes Do not forget to process required manual stuff after release --------- Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2023-04-13 13:13:24 +03:00
Vadim Kharitonov	c237a2f5fb	Compile `pg_hint_plan extension`	2023-04-13 12:59:46 +03:00
Dmitry Rodionov	15d1f85552	Add reason to TenantState::Broken (#3954 ) Reason and backtrace are added to the Broken state. Backtrace is automatically collected when tenant entered the broken state. The format for API, CLI and metrics is changed and unified to return tenant state name in camel case. Previously snake case was used for metrics and camel case was used for everything else. Now tenant state field in TenantInfo swagger spec is changed to contain state name in "slug" field and other fields (currently only reason and backtrace for Broken variant in "data" field). To allow for this breaking change state was removed from TenantInfo swagger spec because it was not used anywhere. Please note that the tenant's broken reason is not persisted on disk so the reason is lost when pageserver is restarted. Requires changes to grafana dashboard that monitors tenant states. Closes #3001 --------- Co-authored-by: theirix <theirix@gmail.com>	2023-04-13 12:11:43 +03:00
Konstantin Knizhnik	732acc54c1	Add check for duplicates of generated image layers (#3869 ) ## Issue ticket number and link #3673 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-04-13 10:19:34 +03:00
Stas Kelvich	5d0ecadf7c	Add support for non-SNI case in multi-cert proxy When no SNI is provided use the default certificate, otherwise we can't get to the options parameter which can be used to set endpoint name too. That means that non-SNI flow will not work for CNAME domains in verify-full mode.	2023-04-12 18:16:49 +03:00
Kirill Bulatov	f7995b3c70	Revert "Update most of the dependencies to their latest versions (#3991 )" (#4013 ) This reverts commit `a64044a7a9`. See https://neondb.slack.com/archives/C03H1K0PGKH/p1681306682795559	2023-04-12 14:51:59 +00:00
Alexander Bayandin	13e53e5dc8	GitHub Workflows: use '!cancelled' instead of 'success or failure'	2023-04-12 15:22:18 +01:00
Alexander Bayandin	c94b8998be	GitHub Workflows: print error messages to stderr	2023-04-12 15:22:18 +01:00
Alexander Bayandin	218062ceba	GitHub Workflows: use ref_name instead of ref	2023-04-12 15:22:18 +01:00
Sam Gaw	8d295780cb	Add support for ip4r extension	2023-04-12 16:40:02 +03:00
Kirill Bulatov	a64044a7a9	Update most of the dependencies to their latest versions (#3991 ) All non-trivial updates extracted into separate commits, also `cargo hakari` data and its manifest format were updated. 3 sets of crates remain unupdated: * `base64` - touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13). * `opentelemetry` and `opentelemetry-` crates ``` error[E0308]: mismatched types --> libs/tracing-utils/src/http.rs:65:21 \| 65 \| span.set_parent(parent_ctx); \| ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context` \| \| \| arguments to this method are incorrect \| = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `opentelemetry_api` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8 \| 43 \| fn set_parent(&self, cx: Context); \| ^^^^^^^^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `tracing-utils` due to previous error warning: build failed, waiting for other jobs to finish...
error: could not compile `tracing-utils` due to previous error ``` `tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need. similarly, `rustls`, `tokio-rustls`, `rustls-` and `tls-listener` crates have similar issue: ``` error[E0308]: mismatched types --> libs/postgres_backend/tests/simple_select.rs:112:78 \| 112 \| let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg); \| --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig` \| \| \| arguments to this function are incorrect \| = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types note: struct `ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1 \| 125 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1 \| 91 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `rustls` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12 \| 23 \| pub fn new(config: ClientConfig) -> Self { \| ^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `postgres_backend` due to previous error warning: build failed, waiting for other jobs to finish... ``` aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed. 
Our other tests passed; further investigation is worth doing in https://github.com/neondatabase/neon/issues/4008	2023-04-12 15:32:38 +03:00
Kirill Bulatov	d8939d4162	Move walreceiver start and stop behind a struct (#3973 ) The PR replaces the module function-based walreceiver interface with a `WalReceiver` struct that now exposes a few public methods: `new`, `start` and `stop`. Later, the same struct is planned to be used for getting walreceiver stats (and, maybe, other extra data) to display during missing wal errors for https://github.com/neondatabase/neon/issues/2106 For now, though, the change required extra logic changes: * due to the `WalReceiver` struct added, it became easier to pass `ctx` and later do a `detached_child` instead of `bfee412701/pageserver/src/tenant/timeline.rs (L1379-L1381)` * `WalReceiver::start`, which is now the public API to start the walreceiver, could return an `Err` which now may turn a tenant into `Broken`, same as the timeline that it tries to load during startup. * `WalReceiverConf` was added to group walreceiver parameters from pageserver's tenant config	2023-04-12 12:39:02 +03:00
Heikki Linnakangas	06ce83c912	Tolerate missing 'operation_uuid' field in spec file. 'compute_ctl' doesn't use the operation_uuid for anything, it just prints it to the log.	2023-04-12 12:11:22 +03:00
Heikki Linnakangas	8ace7a7515	Remove unused 'timestamp' field from ComputeSpec struct.	2023-04-12 12:11:22 +03:00
Heikki Linnakangas	ef68321b31	Use Lsn, TenantId, TimelineId types in compute_ctl. Stronger types are generally nicer.	2023-04-12 12:11:22 +03:00
Heikki Linnakangas	6064a26963	Refactor 'spec' in ComputeState. Sometimes, it contained real values, sometimes just defaults if the spec was not received yet. Make the state more clear by making it an Option instead. One consequence is that if some of the required settings like neon.tenant_id are missing from the spec file sent to the /configure endpoint, it is spotted earlier and you get an immediate HTTP error response. Not that it matters very much, but it's nicer nevertheless.	2023-04-12 01:55:40 +03:00
Stas Kelvich	3c9f42a2e2	Support aarch64 in walredo seccomp code (#3996 ) Aarch64 doesn't implement some old syscalls like open and select. Use openat instead of open to check if seccomp is supported. Leave both select and pselect6 in the allowlist since we don't call select syscall directly and may hope that libc will call pselect6 on aarch64. To check whether some syscall is supported it is possible to use `scmp_sys_resolver` from the seccomp package: ``` > apt install seccomp > scmp_sys_resolver -a x86_64 select 23 > scmp_sys_resolver -a aarch64 select -10101 > scmp_sys_resolver -a aarch64 pselect6 72 ``` A negative value means that the syscall is not supported. Another cross-check is to look up the actual syscall table in `unistd.h`. To resolve all the macros one can use `gcc -E` as it is done in `dump_sys_aarch64()` function in libseccomp/src/arch-syscall-validate. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-04-11 19:28:18 +00:00
Alexey Kondratov	40a68e9077	[compute_ctl] Add timeout for `tracing_utils::shutdown_tracing()` (#3982 ) Shutting down OTEL tracing provider may hang for quite some time, see, for example: - https://github.com/open-telemetry/opentelemetry-rust/issues/868 - and our problems with staging https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636 Yet, we want computes to shut down fast enough, as we may need a new one for the same timeline ASAP. So wait no longer than 2s for the shutdown to complete, then just error out and exit the main thread. Related to neondatabase/cloud#3707	2023-04-11 15:05:35 +02:00
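The bounded wait described in 40a68e9077 can be sketched with a helper thread and a channel. This is an illustration under stated assumptions, not the `compute_ctl` implementation: `shutdown_with_timeout` is a hypothetical name, and the closure stands in for `tracing_utils::shutdown_tracing()`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `shutdown` on a helper thread and waits at most `timeout` for it.
/// Returns `true` if the shutdown finished in time. On timeout (or if the
/// closure panics) it returns `false`; the helper thread is left detached,
/// so the caller is expected to exit the process soon afterwards.
pub fn shutdown_with_timeout<F>(shutdown: F, timeout: Duration) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        shutdown();
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(());
    });
    rx.recv_timeout(timeout).is_ok()
}
```

With the 2s limit from the commit message, a caller would pass `Duration::from_secs(2)` and report an error and exit when the function returns `false`.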
Stas Kelvich	de99ee2c0d	Add more proxy cnames	2023-04-11 14:54:09 +03:00
Alexander Bayandin	c79d5a947c	Nightly Benchmarks: run third-party benchmarks once a week (#3987 )	2023-04-11 10:58:04 +01:00
Arseny Sher	7ad5a5e847	Enable timeout on reading from socket in safekeeper WAL service. TCP_KEEPALIVE is not enabled by default, so this prevents hung connections in case of abrupt client termination. Add 'closed' flag to PostgresBackendReader and pass it during handles join to prevent attempts to read from socket if we errored out previously -- now with timeouts this is a common situation. It looks like 2023-04-10T18:08:37.493448Z INFO {cid=68}:WAL receiver{ttid=59f91ad4e821ab374f9ccdf918da3a85/16438f99d61572c72f0c7b0ed772785d}: terminated: timed out Presumably fixes https://github.com/neondatabase/neon/issues/3971	2023-04-11 11:45:43 +04:00
Stas Kelvich	22c890b71c	Add more cnames to proxies	2023-04-11 01:55:25 +03:00
Stas Kelvich	83549a8d40	Revert "Support aarch64 in walredo seccomp code" This reverts commit `98df7db094`.	2023-04-11 00:08:01 +03:00
Stas Kelvich	98df7db094	Support aarch64 in walredo seccomp code Aarch64 doesn't implement some old syscalls like open and select. Use openat instead of open to check if seccomp is supported. Leave both select and pselect6 in the allowlist since we don't call select syscall directly and may hope that libc will call pselect6 on aarch64. To check whether some syscall is supported it is possible to use `scmp_sys_resolver` from the seccomp package: ``` > apt install seccomp > scmp_sys_resolver -a x86_64 select 23 > scmp_sys_resolver -a aarch64 select -10101 > scmp_sys_resolver -a aarch64 pselect6 72 ``` A negative value means that the syscall is not supported. Another cross-check is to look up the actual syscall table in `unistd.h`. To resolve all the macros one can use `gcc -E` as it is done in `dump_sys_aarch64()` function in libseccomp/src/arch-syscall-validate.	2023-04-10 23:54:16 +03:00
Heikki Linnakangas	f0b2e076d9	Move compute_ctl structs used in HTTP API and spec file to separate crate. This is in preparation of using compute_ctl to launch postgres nodes in the neon_local control plane. And seems like a good idea to separate the public interfaces anyway. One non-mechanical change here is that the 'metrics' field is moved under the Mutex, instead of using atomics. We were not using atomics for performance but for convenience here, and it seems more clear to not use atomics in the model for the HTTP response type.	2023-04-09 21:52:28 +03:00
Alexander Bayandin	818e341af0	Nightly Benchmarks: replace neon-captest-prefetch with -new/-reuse (#3970 ) We have enabled prefetch by default, let's use this in Nightly Benchmarks: - effective_io_concurrency=100 by default (instead of 32) - maintenance_io_concurrency=100 by default (instead of 32) Rename `neon-captest-prefetch` to `neon-captest-new` (for pgbench with initialisation) and `neon-captest-reuse` (for OLAP scenarios)	2023-04-09 12:52:49 +01:00
Kirill Bulatov	dec58092e8	Replace Box<dyn> with impl in RemoteStorage upload (#3984 ) Replaces `Box<(dyn io::AsyncRead + Unpin + Send + Sync + 'static)>` with `impl io::AsyncRead + Unpin + Send + Sync + 'static` usages in the `RemoteStorage` interface, to make it closer to [`#![feature(async_fn_in_trait)]`](https://blog.rust-lang.org/inside-rust/2022/11/17/async-fn-in-trait-nightly.html) For `GenericRemoteStorage`, replaces `type Target = dyn RemoteStorage` with another impl with `RemoteStorage` methods inside it. We could reuse the trait, but that would require importing the trait in every file where it's used and would move us farther from the unstable feature. After this PR, I've managed to create a patch with the changes: https://github.com/neondatabase/neon/compare/kb/less-dyn-storage...kb/nightly-async-trait?expand=1 Current rust implementation does not like recursive async trait calls, so `UnreliableWrapper` was removed: it contained a `GenericRemoteStorage` that implemented the `RemoteStorage` trait, and itself implemented the trait, which nightly rustc did not like and proposed to box the future. Similarly, `GenericRemoteStorage` cannot implement `RemoteStorage` for nightly rustc to work, since it calls various remote storages' methods from inside. I've compiled current `main` and the nightly branch both with `time env RUSTC_WRAPPER="" cargo +nightly build --all --timings` command, and got ``` Finished dev [optimized + debuginfo] target(s) in 2m 04s env RUSTC_WRAPPER="" cargo +nightly build --all --timings 1283.19s user 50.40s system 1074% cpu 2:04.15 total for the new feature tried and Finished dev [optimized + debuginfo] target(s) in 2m 40s env RUSTC_WRAPPER="" cargo +nightly build --all --timings 1288.59s user 52.06s system 834% cpu 2:40.71 total for the old async_trait approach. ``` On my machine, the `remote_storage` lib compilation takes ~10 less time with the nightly feature (left) than the regular main (right).
![image](https://user-images.githubusercontent.com/2690773/230620797-163d8b89-dac8-4366-bcf6-cd1cdddcd22c.png) Full cargo reports are available at [timings.zip](https://github.com/neondatabase/neon/files/11179369/timings.zip)	2023-04-07 21:39:49 +03:00
Stas Kelvich	0bf70e113f	Add extra cnames to staging proxy	2023-04-07 19:18:19 +03:00
Vadim Kharitonov	31f2cdeb1e	Update Dockerfile.compute-node Co-authored-by: MMeent <matthias@neon.tech>	2023-04-07 15:26:22 +02:00
Vadim Kharitonov	979fa8b1ba	Compile timescaledb	2023-04-07 15:26:22 +02:00
Konstantin Knizhnik	bfee412701	Trigger tests for index scan implementation (#3968 )	2023-04-07 14:26:21 +03:00
Dmitry Rodionov	bfeb428d1b	tests: make neon_fixtures a bit thinner by splitting out some pageserver related helpers (#3977 ) neon_fixture is quite big and messy, lets clean it up a bit.	2023-04-07 13:47:28 +03:00
Stas Kelvich	b1c2a6384a	Set non-wildcard common names in link auth proxy Old coding here ignored non-wildcard common names and passed None instead. With my recent changes I started throwing an error in that case. Old logic doesn't seem to be a great choice, so instead of passing None I actually set non-wildcard common names too. That way it is possible to avoid handling cases with None in downstream code.	2023-04-07 01:24:27 +03:00
Anastasia Lubennikova	6d01d835a8	[proxy] Report error if proxy_io_bytes_per_client metric has decreased	2023-04-06 23:14:07 +03:00
Alexey Kondratov	e42982fb1e	[compute_ctl] Empty computes and /configure API (#3963 ) This commit adds an option to start compute without spec and then pass it a valid spec via `POST /configure` API endpoint. This is a main prerequisite for maintaining the pool of compute nodes in the control-plane. For example: 1. Start compute with ```shell cargo run --bin compute_ctl -- -i no-compute \ -p http://localhost:9095 \ -D compute_pgdata \ -C "postgresql://cloud_admin@127.0.0.1:5434/postgres" \ -b ./pg_install/v15/bin/postgres ``` 2. Configure it with ```shell curl -d "{\"spec\": $(cat ./compute-spec.json)}" http://localhost:3080/configure ``` Internally, it's implemented using a `Condvar` + `Mutex`. Compute spec is moved under Mutex, as it can now be updated in the HTTP handler. Also `RwLock` was replaced with `Mutex` because the latter works well with `Condvar`. First part of the neondatabase/cloud#4433	2023-04-06 21:21:58 +02:00
Dmitry Rodionov	b45c92e533	tests: exclude compatibility tests by default (#3975 ) This allows skipping compatibility tests based on the `CHECK_ONDISK_DATA_COMPATIBILITY` environment variable. When the variable is missing (default), compatibility tests won't be run.	2023-04-06 21:21:39 +03:00
Arthur Petukhovsky	ba4a96fdb1	Eagerly update wal_backup_lsn after each segment offload (#3976 ) Otherwise it can lag a lot, preventing WAL segments cleanup. Also take the max of the old and new wal_backup_lsn on update; pulling it down is pointless. Should help with https://github.com/neondatabase/neon/issues/3957, but will not fix it completely.	2023-04-06 20:57:06 +03:00
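A small sketch of the update rule in ba4a96fdb1: advance `wal_backup_lsn` after every offloaded segment and never move it backwards. The `Lsn` newtype, `BackupState` and `on_segment_offloaded` are hypothetical stand-ins for illustration, not the safekeeper's actual types.

```rust
/// Hypothetical stand-in for the real `Lsn` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

/// Hypothetical holder of the backup position.
pub struct BackupState {
    pub wal_backup_lsn: Lsn,
}

impl BackupState {
    /// Called after each segment offload with the end LSN of that segment.
    /// Takes the max of the old and new values, so a stale or out-of-order
    /// update cannot pull the position down. Returns `true` if it advanced.
    pub fn on_segment_offloaded(&mut self, segment_end: Lsn) -> bool {
        let new = self.wal_backup_lsn.max(segment_end);
        let advanced = new > self.wal_backup_lsn;
        self.wal_backup_lsn = new;
        advanced
    }
}
```

Updating per segment rather than per batch is what keeps the value from lagging; the `max` only guards against regressions.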
Alexander Bayandin	4d64edf8a5	Nightly Benchmarks: Add free tier sized compute (#3969 ) - Add support for VMs and CU - Add free tier limited benchmark (0.25 CU) - Ensure we use 1 CU by default for pgbench workload	2023-04-06 19:18:24 +03:00
Kirill Bulatov	102746bc8f	Apply clippy rule exclusion locally instead of a global approach (#3974 )	2023-04-06 18:57:48 +03:00
Alexander Bayandin	887cee64e2	test_runner: add links to grafana for remote tests (#3961 ) Add Grafana links to allure reports to make it easier to debug perf test failures	2023-04-06 13:52:41 +01:00
Vadim Kharitonov	2ce973c72f	Allow installation of `pg_stat_statements`	2023-04-06 13:26:40 +02:00
Gleb Novikov	9db70f6232	Added disk_size and instance_type to payload (#3918 ) ## Describe your changes In https://github.com/neondatabase/cloud/issues/4354 we are making scheduling of projects based on available disk space and overcommit, so we need to know disk size and just in case instance type of the pageserver ## Issue ticket number and link https://github.com/neondatabase/cloud/issues/4354 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] ~If it is a core feature, I have added thorough tests.~ - [ ] ~Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?~ - [ ] ~If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.~	2023-04-06 14:02:56 +04:00
Joonas Koivunen	b17c24fa38	fix: settle down to configured percent (#3947 ) in real env testing we noted that the disk-usage based eviction sails 1 percentage point above the configured value, which might be a source of confusion, so it might be better to get rid of that confusion now. confusion: "I configured 85% but pageserver sails at 86%". Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-04-06 12:47:21 +03:00
Alexander Bayandin	9310949b44	GitHub Autocomment: Retry on server errors (#3958 ) Retry posting/updating a comment in case of 5XX errors from GitHub API	2023-04-05 22:08:06 +03:00
Stas Kelvich	d8df5237fa	Align extra certificate name with default cert-manager names	2023-04-05 21:29:21 +03:00
Stas Kelvich	c3ca48c62b	Support extra domain names for proxy. Make it possible to specify a directory where proxy will look for extra certificates. Proxy will iterate through subdirs of that directory and load `key.pem` and `cert.pem` files from each subdir. Certs directory structure may look like this: certs \|--example.com \| \|--key.pem \| \|--cert.pem \|--foo.bar \|--key.pem \|--cert.pem Actual domain names are taken from certs and key, subdir names are ignored.	2023-04-05 20:06:48 +03:00
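The directory walk described in c3ca48c62b can be sketched with the standard library alone. `find_cert_pairs` is a hypothetical helper; skipping subdirectories that lack one of the two files is an assumption of this sketch (the commit message does not say how the proxy treats them), and parsing the certificates to extract domain names is out of scope here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns `(key.pem, cert.pem)` path pairs, one per subdirectory of
/// `certs_dir` that contains both files. Subdirectory names are not
/// interpreted; per the commit message, domain names come from the
/// certificates themselves.
pub fn find_cert_pairs(certs_dir: &Path) -> io::Result<Vec<(PathBuf, PathBuf)>> {
    let mut pairs = Vec::new();
    for entry in fs::read_dir(certs_dir)? {
        let dir = entry?.path();
        if !dir.is_dir() {
            continue;
        }
        let key = dir.join("key.pem");
        let cert = dir.join("cert.pem");
        // Assumption: silently skip incomplete subdirectories.
        if key.is_file() && cert.is_file() {
            pairs.push((key, cert));
        }
    }
    // `read_dir` order is unspecified; sort for a deterministic result.
    pairs.sort();
    Ok(pairs)
}
```

Each returned pair would then be handed to whatever TLS library loads the key and certificate chain.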
Alexander Bayandin	957acb51b5	GitHub Autocomment: Fix the link to the latest commit (#3952 )	2023-04-04 19:06:10 +03:00
Alexander Bayandin	1d23b5d1de	Comment PR with test results (#3907 ) This PR adds posting a comment with test results. Each workflow run updates the comment with new results. The layout and the information that we post can be changed to our needs; right now it contains failed tests and tests whose status changes after a rerun (i.e. flaky tests)	2023-04-04 12:22:47 +01:00
Alexander Bayandin	105b8bb9d3	test_runner: automatically rerun flaky tests (#3880 ) This PR adds a plugin that automatically reruns (up to 3 times) flaky tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and `pytest-rerunfailures` plugin. As the first approximation we consider the test flaky if it has failed on the main branch in the last 10 days. Flaky tests are fetched by `scripts/flaky_tests.py` script (it's possible to use it in a standalone mode to learn which tests are flaky), stored to a JSON file, and then the file is passed to the pytest plugin.	2023-04-04 12:21:54 +01:00
Kirill Bulatov	846532112c	Remove unused S3 list operation (#3936 ) In S3, pageserver only lists tenants (prefixes), no other keys. Remove the list operation from the API, since the S3 impl does not seem to work properly and is not used anyway.	2023-04-03 23:44:38 +03:00
Dmitry Ivanov	f85a61ceac	[proxy] Fix regression in logging For some reason, `tracing::instrument` proc_macro doesn't always print elements specified via `fields()` or even show that it's impossible (e.g. there's no Display impl). Work around this using the `?foo` notation. Before: 2023-04-03T14:48:06.017504Z INFO handle_client:handshake: received SslRequest After: 2023-04-03T14:51:24.424176Z INFO handle_client{session_id=7bd07be8-3462-404e-8ccc-0a5332bf3ace}:handshake: received SslRequest	2023-04-03 18:49:30 +03:00
Christian Schwarz	45bf76eb05	enable layer eviction by default in prod (#3933 ) Leave disk_usage_based_eviction above the current max usage in prod (82%ish), so that deploying this commit won't trigger disk_usage_based_eviction. As indicated in the TODO, we'll decrease the value to 80% later. Also update the staging YAMLs to use the anchor syntax for `evictions_low_residence_duration_metric_threshold` like we do in the prod YAMLs as of this patch.	2023-04-03 14:57:36 +02:00
Joonas Koivunen	a415670bc3	feat: log evictions (#3930 ) this will help log analysis with the counterpart of already logging all remote download needs and downloads. ended up with an easily regexable output in the final round.	2023-04-03 14:15:41 +03:00
Joonas Koivunen	cf5cfe6d71	fix: metric used for alerting threshold on staging (#3932 ) This should remove the too eager alerts from staging.	2023-04-03 13:26:45 +03:00
Arseny Sher	d733bc54b8	Rename ReplicationFeedback and its fields. This is the feedback originating from pageserver, so change previous confusing names to s/ReplicationFeedback/PageserverFeedback s/ps_writelsn/last_receive_lsn s/ps_flushlsn/disk_consistent_lsn s/ps_apply_lsn/remote_consistent_lsn I haven't changed the on-the-wire format to keep compatibility. However, understanding of new field names is added to compute, so once all computes receive this patch we can change the wire names as well. Safekeepers/pageservers are deployed roughly at the same time and it is ok to live without feedbacks during the short period, so this is not a problem there.	2023-04-03 01:52:41 +04:00
Arthur Petukhovsky	814abd9f84	Switch to safekeeper in the same AZ (#3883 ) Add a condition to switch walreceiver connection to safekeeper that is located in the same availability zone. Switch happens when commit_lsn of a candidate is not less than commit_lsn from the active connection. This condition is expected not to trigger instantly, because commit_lsn of a current connection is usually greater than commit_lsn of updates from the broker. That means that if WAL is written continuously, switch can take a lot of time, but it should happen eventually. Now protoc 3.15+ is required for building neon. Fixes https://github.com/neondatabase/neon/issues/3200	2023-04-02 11:32:27 +03:00
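The switch condition in the commit above (814abd9f84) can be sketched as a small predicate. This is only an illustration of the rule as the message states it, not the walreceiver code: the `Safekeeper` struct, its fields, and `should_switch_to_same_az` are hypothetical names, and LSNs are simplified to plain integers.

```rust
/// Hypothetical view of a safekeeper as seen by the connection manager.
pub struct Safekeeper {
    /// Availability zone, if known.
    pub az: Option<&'static str>,
    /// Last commit_lsn reported for this safekeeper (simplified to u64).
    pub commit_lsn: u64,
}

/// Returns true if we should move from `active` to `candidate` because the
/// candidate is in our own availability zone and is not behind the active
/// connection.
pub fn should_switch_to_same_az(
    local_az: Option<&str>,
    active: &Safekeeper,
    candidate: &Safekeeper,
) -> bool {
    // Without a known local AZ there is nothing to prefer.
    let local = match local_az {
        Some(az) => az,
        None => return false,
    };
    candidate.az == Some(local)
        && active.az != Some(local)
        // "not less than": the candidate must have caught up.
        && candidate.commit_lsn >= active.commit_lsn
}
```

Because the active connection's commit_lsn usually runs ahead of what the broker reports for other safekeepers, the last comparison tends to stay false while WAL is being written continuously, which matches the delay the message describes.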
Alexander Bayandin	75ffe34b17	check-macos-build: fix cache key (#3926 ) We don't have `${{ matrix.build_type }}` there, so it gets resolved to an empty substring and looks like this [`v1-macOS--pg-f8a650e49b06d39ad131b860117504044b01f312-dcccd010ff851b9f72bb451f28243fa3a341f07028034bbb46ea802413b36d80`](https://github.com/neondatabase/neon/actions/runs/4575422427/jobs/8078231907#step:26:2)	2023-03-31 21:45:59 +03:00
Christian Schwarz	d2aa31f0ce	fix pageserver_evictions_with_low_residence_duration metric (#3925 ) It was doing the comparison in the wrong way.	2023-03-31 19:25:53 +03:00
Dmitry Rodionov	22f9ea5fe2	Remind people to clean up merge commit message in PR template (#3920 )	2023-03-31 16:11:34 +03:00
Joonas Koivunen	d0711d0896	build: fix git perms for deploy job (#3921 ) copy pasted from `build-neon` job. it is interesting that this is only needed by `build-neon` and `deploy`. Fixes: https://github.com/neondatabase/neon/actions/runs/4568077915/jobs/8070960178 which seems to have been going for a while.	2023-03-31 16:05:15 +03:00
Arseny Sher	271f6a6e99	Always sync-safekeepers in neon_local on compute start. Instead of checking neon.safekeepers GUC value in existing pg node data dir, just always run sync-safekeepers when safekeepers are configured. Without this change, creation of new compute didn't run it. That's ok for new timeline/branch (it doesn't return anything useful anyway, and LSN is known by pageserver), but restart of compute for existing timeline bore the risk of getting basebackup not on the latest LSN, i.e. basically broken -- it might not have prev_lsn, and even if it had, walproposer would complain anyway. fixes https://github.com/neondatabase/neon/issues/2963	2023-03-31 16:15:06 +04:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. 
We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now_. Hence, its arguments are documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
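The two thresholds and the LRU-with-reservation policy described in the commit above (a64dd3ecb5) can be sketched as follows. This is a simplified model under stated assumptions, not the pageserver implementation: `Usage`, `Thresholds`, `Layer`, `has_pressure` and `plan_eviction` are hypothetical names, the usage numbers stand in for what `statvfs` would report, and a single `tenant_min_resident_size` is used for all tenants (no per-tenant override).

```rust
use std::collections::HashMap;

/// Filesystem usage, as one might derive it from `statvfs` (hypothetical fields).
#[derive(Debug, Clone, Copy)]
pub struct Usage {
    pub total_bytes: u64,
    pub avail_bytes: u64,
}

/// The two thresholds named in the commit message.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub max_usage_pct: u64,
    pub min_avail_bytes: u64,
}

/// True if at least one threshold is exceeded.
pub fn has_pressure(u: Usage, t: Thresholds) -> bool {
    let used = u.total_bytes.saturating_sub(u.avail_bytes);
    // Integer percent, rounded down; u128 avoids overflow on large disks.
    let usage_pct = (used as u128 * 100 / u.total_bytes.max(1) as u128) as u64;
    usage_pct > t.max_usage_pct || u.avail_bytes < t.min_avail_bytes
}

/// A resident layer file (hypothetical model).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Layer {
    pub tenant: u32,
    pub name: &'static str,
    pub size: u64,
    /// Larger value means more recently accessed.
    pub last_access: u64,
}

/// Picks layers to evict: each tenant keeps up to `tenant_min_resident_size`
/// bytes of its most recent layers; everything else is evicted
/// least-recently-used first until `bytes_to_free` is reached.
pub fn plan_eviction(
    mut layers: Vec<Layer>,
    tenant_min_resident_size: u64,
    bytes_to_free: u64,
) -> Vec<Layer> {
    // Most recently used first, so the reservation keeps the newest layers.
    layers.sort_by(|a, b| b.last_access.cmp(&a.last_access));
    let mut seen_per_tenant: HashMap<u32, u64> = HashMap::new();
    let mut candidates = Vec::new();
    for layer in layers {
        let seen = seen_per_tenant.entry(layer.tenant).or_insert(0);
        *seen += layer.size;
        if *seen > tenant_min_resident_size {
            // Outside the per-tenant reservation.
            candidates.push(layer);
        }
    }
    // Least recently used first.
    candidates.sort_by_key(|l| l.last_access);
    let mut freed: u64 = 0;
    let mut plan = Vec::new();
    for layer in candidates {
        if freed >= bytes_to_free {
            break;
        }
        freed += layer.size;
        plan.push(layer);
    }
    plan
}
```

In this model a loop iteration would call `has_pressure` on a fresh usage reading, run `plan_eviction` if it returns true, and check `has_pressure` again afterwards, mirroring the second `statvfs` cross-check in the message.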
Konstantin Knizhnik	bf46237fc2	Fix prefetch for parallel bitmap scan (#3875 ) ## Describe your changes Fix prefetch for parallel bitmap scan ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-30 22:07:19 +03:00
Lassi Pölönen	41d364a8f1	Add more detailed logging to compute_ctl's shutdown (#3915 ) Currently we can't see from the logs whether shutting down tracing takes a long time or not. We do see that shutting down computes gets delayed for some reason and hits the grace period limit. Move the shutdown message slightly later, to the point where nothing is left to do but exit. ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-30 22:02:39 +03:00
Christian Schwarz	fa54a57ca2	random_init_delay: remove the minimum of 10 seconds (#3914 ) Before this patch, the range from which the random delay is picked is at minimum 10 seconds. With this patch, the delay is bounded by whatever the given `period` is, and is zero if the period is Duration::ZERO. Motivation for this: the disk usage eviction tests that we'll add in https://github.com/neondatabase/neon/pull/3905 need to wait for the disk usage eviction background loop to do its job. They set a period of 1s. It seems wasteful to wait 10 seconds in the tests. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-30 18:38:45 +02:00
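The bounded delay described in the commit above (fa54a57ca2) can be sketched as below. This is a sketch under assumptions, not the patched function: `init_delay` is a hypothetical name, the random value is passed in as a plain `u64` so the example needs no external crate, the granularity is milliseconds, and the upper bound is treated as exclusive (the message does not say whether it is inclusive).

```rust
use std::time::Duration;

/// Maps a caller-supplied random number to a delay in `[0, period)`.
/// A zero period always yields a zero delay.
pub fn init_delay(period: Duration, random: u64) -> Duration {
    if period == Duration::ZERO {
        return Duration::ZERO;
    }
    // Work in milliseconds; treat sub-millisecond periods as 1 ms so the
    // modulo below never divides by zero.
    let period_ms = period.as_millis().max(1);
    let delay_ms = (random as u128 % period_ms) as u64;
    Duration::from_millis(delay_ms)
}
```

With a period of 1s, as the eviction tests use, the delay stays under one second instead of being drawn from a range of at least 10 seconds.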
Lassi Pölönen	1c1bb904ed	Rename zenith_* labels to neon_* (#3911 ) ## Describe your changes Get rid of the legacy labeling. Also `neon_region_slug` with the same value as `neon_region` doesn't make much sense, so just drop it. This allows us to drop the relabeling from zenith to neon in the log collector.	2023-03-30 16:24:47 +03:00
Gleb Novikov	b26c837ed6	Fixed pageserver openapi spec properties reference (#3904 ) ## Describe your changes In [this linter run](https://github.com/neondatabase/cloud/actions/runs/4553032319/jobs/8029101300?pr=4391) accidentally found out that spec is invalid. Reference other schemas in properties should be done the way I changed. Could not find documentation specifically for schemas embedding in `components.schemas`, but it seems like the approach is inherited from json schema: https://json-schema.org/understanding-json-schema/structuring.html#ref ## Issue ticket number and link - ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] ~If it is a core feature, I have added thorough tests.~ - [ ] ~Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?~ - [ ] ~If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.~	2023-03-29 19:18:44 +04:00
Kirill Bulatov	ac9c7e8c4a	Replace pin! from tokio to the std one (#3903 ) With fresh rustc brought by https://github.com/neondatabase/neon/pull/3902, we can use `std::pin::pin!` macro instead of the tokio one. One place did not need the macro at all, other places were adjusted.	2023-03-29 14:14:56 +03:00
Vadim Kharitonov	f1b174dc6a	Update rust version to 1.68.2	2023-03-29 12:50:04 +04:00
Kirill Bulatov	9d714a8413	Split $CARGO_FLAGS and $CARGO_FEATURES to make e2e tests work	2023-03-29 00:08:30 +03:00
Kirill Bulatov	6c84cbbb58	Run new Rust IT test in CI	2023-03-29 00:08:30 +03:00
Kirill Bulatov	1300dc9239	Replace Python IT test with the Rust one	2023-03-29 00:08:30 +03:00
Kirill Bulatov	018c8b0e2b	Use proper tokens and delimiters when listing S3	2023-03-29 00:08:30 +03:00
Arseny Sher	b52389f228	Cleanly exit on any shutdown signal in storage_broker. neon_local sends SIGQUIT, which otherwise dumps core by default. Also, remove obsolete install_shutdown_handlers; in all binaries it was overridden by ShutdownSignals::handle later. ref https://github.com/neondatabase/neon/issues/3847	2023-03-28 22:29:42 +04:00
Heikki Linnakangas	5a123b56e5	Remove obsolete hack to rename neon-specific GUCs. I checked the console database, we don't have any of these left in production.	2023-03-28 17:57:22 +03:00
Arthur Petukhovsky	7456e5b71c	Add script to collect state from safekeepers (#3835 ) Add an ansible script to collect https://github.com/neondatabase/neon/pull/3710 state JSON from all safekeeper nodes and upload them to a postgres table.	2023-03-28 17:04:02 +03:00
Konstantin Knizhnik	9798737ec6	Update pgxn/neon/file_cache.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-28 14:43:34 +04:00
Konstantin Knizhnik	35ecb139dc	Use statvfs instead of statfs to fix macOS build	2023-03-28 14:43:34 +04:00
Arseny Sher	278d0f117d	Rename neon_local sk logs s/safekeeper 1.log/safekeeper-1.log. I don't like spaces in file names.	2023-03-28 14:28:56 +04:00
Arseny Sher	c30b9e6eb1	Show full path to pg_ctl invocation when it fails.	2023-03-28 12:06:06 +04:00
Konstantin Knizhnik	82a4777046	Add local free space monitor (#3832 ) ## Describe your changes Monitor free space in the local file system and shrink the local file cache size if it is under the watermark. Neon uses local storage for temp files (temp tables + intermediate results), unlogged relations and the local file cache. Ideally all space not used for temporary files should be used for the local file cache. Temporary files and even unlogged relations are intended to have a short lifetime (because they can be lost at any moment in case of compute restart). So the policy is to overcommit the local cache size and shrink it if there is not enough free space. Since temporary files are expected to be needed only for a short time, there is no need to permanently shrink the local file cache size. Instead, we just throw away the least recently accessed elements from the local file cache, releasing some space on the local disk. ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: sharnoff <sharnoff@neon.tech>	2023-03-28 08:27:50 +03:00
Dmitry Rodionov	6efea43449	Use precondition failed code in delete_timeline when tenant is missing (#3884 ) This allows client to differentiate between missing tenant and missing timeline cases	2023-03-27 21:01:46 +03:00
Joonas Koivunen	f14895b48e	eviction: avoid post-restart download by synthetic_size (#3871 ) As of #3867, we do artificial layer accesses to layers that will be needed after the next restart, but not until then because of caches. With this patch, we also do that for the accesses that the synthetic size calculation worker does if consumption metrics are enabled. The actual size calculation is not of importance, but we need to calculate all of the sizes, so we only call tenant::size::gather_inputs. Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-03-27 19:20:23 +02:00
Christian Schwarz	fe15624570	eviction_task: only refresh layer accesses once per p.threshold (#3877 ) Without this, we run it every p.period, which can be quite low. For example, the running experiment with 3000 tenants in prod uses a period of 1 minute. Doing it once per p.threshold is enough to prevent eviction.	2023-03-27 14:33:40 +03:00
Christian Schwarz	ff51e96fbd	fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn (#3874 ) fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn Assume a single-timeline project. If the gc_horizon covers all WAL (last_record_lsn < gc_horizon) but we have written more data than just initdb, the synthetic size calculation worker needs to calculate the logical size at LSN initdb_lsn (Segment BranchStart). Before this patch, that calculation would incorrectly return the initial logical size calculation result that we cache in the Timeline::initial_logical_size. Presumably, because there was confusion around initdb_lsn vs. initial size calculation. The fix is to only hand out the initialized_size() only if the LSN matches. The distinction in the metrics between "init logical size" and "logical size" was also incorrect because of the above. So, remove it. There was a special case for `size != 0`. This was to cover the case of LogicalSize::empty_initial(), but `initial_part_end` is `None` in that case, so the new `LogicalSize::initialized_size()` will return None in that case as well. Lastly, to prevent confusion like this in the future, rename all occurrences of `init_lsn` to either just `lsn` or a more specific name. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-27 12:45:10 +02:00
Vadim Kharitonov	e3cbcc2ea7	Revert "Add `neondatabase/release` team as a default reviewers for storage" This reverts commit `daeaa767c4`.	2023-03-27 14:10:18 +04:00
Heikki Linnakangas	8d78329991	Remove some dead code. whoami() was never called, 'is_test' was never set. 'restart()' might be useful, but it wasn't hooked up the CLI so it was dead code. It's not clear what kind of a restart it should perform, anyway: just restart Postgres, or re-initialize the data directory from a fresh basebackup like "stop"+"start" does.	2023-03-27 12:24:35 +03:00
Dmitry Rodionov	4d8c765485	remove redundant dyn (#3878 ) remove redundant dyn	2023-03-27 12:04:48 +03:00
dependabot[bot]	4071ff8c7b	Bump openssl from 0.10.45 to 0.10.48 in /test_runner/pg_clients/rust/tokio-postgres (#3879 )	2023-03-25 12:33:39 +00:00
Dmitry Rodionov	870ba43a1f	return proper http codes in timeline delete endpoint (#3876 ) return proper http codes in timeline delete endpoint + fix openapi spec for detach to include 404 responses	2023-03-24 19:25:39 +02:00
Joonas Koivunen	f5ca897292	fix: less logging at shutdown (#3866 ) Log less during shutdown; don't log anything for quickly (less than 1s) exiting tasks.	2023-03-23 12:00:52 +02:00
Kirill Bulatov	8bd565e09e	Ensure branches with no layers have their remote storage counterpart created eventually (#3857 ) Discovered during writing a test for https://github.com/neondatabase/neon/pull/3843	2023-03-22 17:42:31 +02:00
Joonas Koivunen	6033dfdf4a	Re-access layers before threshold eviction (#3867 ) To avoid re-downloading evicted files on restart, re-compute logical size and partitioning before each threshold based eviction run. Cc: #3802 Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-03-22 16:26:27 +02:00
mikecaat	14a40c9ca6	Fix minor things for the docker-compose file (#3862 ) * Add the REPOSITORY env to build args to avoid the following error when executing without the credentials for the repository. ``` ERROR: Service 'compute' failed to build: Head "https://369495373322.dkr.ecr.eu-central-1.amazonaws.com/v2/compute-node-v15/manifests/2221": no basic auth credentials ``` * update the tag version in the documentation to support storage broker	2023-03-22 08:10:53 +00:00
Shany Pozin	0f7de84785	Allow calling detach on ignored tenant (#3834 ) ## Describe your changes Added a query param to the detach API. Allows removing the local state of a tenant even if it's not in memory (following the ignore API) ## Issue ticket number and link #3828 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2023-03-22 07:17:00 +00:00
Kirill Bulatov	dd22c87100	Remove older layer metadata format support code (#3854 ) The PR enforces current newest `index_part.json` format in the type system (version `1`), not allowing any previous forms of it, that were used in the past. Similarly, the code to mitigate the https://github.com/neondatabase/neon/issues/3024 issue is now also removed. Current code does not produce old formats and extra files in the index_part.json, in the future we will be able to use https://github.com/neondatabase/aversion or other approach to make version transitions more explicit. See https://neondb.slack.com/archives/C033RQ5SPDH/p1679134185248119 for the justification on the breaking changes.	2023-03-21 23:33:28 +02:00
Heikki Linnakangas	6fdd9c10d1	Read storage auth token from spec file. We read the pageserver connection string from the spec file, so let's read the auth token from the same place. We've been talking about pre-launching compute nodes that are not associated with any particular tenant at startup, so that the spec file is delivered to the compute node later. We cannot change the env variables after the process has been launched. We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN env variable, but compute_ctl is now responsible for setting it.	2023-03-21 20:12:09 +02:00
Dmitry Rodionov	4158e24e60	rfc: delete pageserver data from s3 (#3792 ) [Rendered](https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-21 20:03:27 +02:00
Shany Pozin	809acb5fa9	Move neon-image-depot to a larger runner (#3860 ) ## Describe your changes https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059 ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-21 19:32:36 +02:00
Heikki Linnakangas	299db9d028	Simplify and clean up the $NEON_AUTH_TOKEN stuff in compute - Remove the neon.safekeeper_token_env GUC. It was used to set the name of an environment variable, which was then used in pageserver and safekeeper connection strings in place of the password. Instead, always look up the environment variable called NEON_AUTH_TOKEN. That's what neon.safekeeper_token_env was always set to in practice, and I don't see the need for the extra level of indirection or configurability. - Instead of substituting $NEON_AUTH_TOKEN in the connection strings, pass $NEON_AUTH_TOKEN "out-of-band" as the password, when we connect to the pageserver or safekeepers. That's simpler. - Also use the password from $NEON_AUTH_TOKEN in compute_ctl, when it connects to the pageserver to get the "base backup".	2023-03-21 00:15:04 +02:00
Heikki Linnakangas	5a786fab4f	Remove duplicated global variables in neon extension. Walproposer used to live in the backend, while pagestore_smgr was an extension. But now that both are part of the neon extension, walproposer can access the same 'neon_tenant' and 'neon_timeline' variables as the pageserver_smgr code.	2023-03-21 00:15:04 +02:00
Arseny Sher	699f200811	Send error context chain to the client when Copy stream errors.	2023-03-21 01:22:02 +04:00
Christian Schwarz	881356c417	add metrics to detect eviction-induced thrashing (#3837 ) This patch adds two metrics that will enable us to detect thrashing of layers, i.e., repetitions of `eviction, on-demand-download, eviction, ... ` for a given layer. The first metric counts all layer evictions per timeline. It requires no further explanation. The second metric counts the layer evictions where the layer was resident for less than a given threshold. We can alert on increments to the second metric. The first metric will serve as a baseline, and further, it's generally interesting, outside of thrashing. The second metric's threshold is configurable in PageServerConf and defaults to 24h. The threshold value is reproduced as a label in the metric because the counter's value is semantically tied to that threshold. Since changes to the config and hence the label value are infrequent, this will have low storage overhead in the metrics storage. The data source to determine the time that the layer was resident is the file's `mtime`. Using `mtime` is more of a crutch. It would be better if Pageserver did its own persistent bookkeeping of residence change events instead of relying on the filesystem. We had some discussion about this: https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900 My position is that `mtime` is good enough for now. It can theoretically jump forward if someone copies files without resetting `mtime`. But that shouldn't happen in practice. Note that moving files back and forth doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`, which is typically used for filesystem-level backup / restore, correctly syncs `mtime`. I've added a label that identifies the data source to keep options open for a future, better data source than `mtime`. Since this value will stay the same for the time being, it's not a problem for metrics storage. refs https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:11:36 +01:00
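The two counters from the commit above (881356c417) can be sketched like this. It is a minimal model, not the pageserver's metrics code: `EvictionCounters` and `observe` are hypothetical names, the counters are plain integers rather than labelled Prometheus metrics, and treating an `mtime` in the future as "not low residence" is an assumption of this sketch.

```rust
use std::time::{Duration, SystemTime};

/// Per-timeline eviction counters (hypothetical, unlabelled).
#[derive(Debug, Default)]
pub struct EvictionCounters {
    /// Counts every eviction; serves as the baseline.
    pub evictions: u64,
    /// Counts evictions of layers that were resident for less than the threshold.
    pub evictions_with_low_residence: u64,
}

impl EvictionCounters {
    /// Records one eviction. `mtime` is the layer file's modification time,
    /// used as the moment the layer became resident.
    pub fn observe(&mut self, mtime: SystemTime, now: SystemTime, threshold: Duration) {
        self.evictions += 1;
        // `duration_since` fails if `mtime` is later than `now`; this sketch
        // simply does not count such evictions as low-residence.
        if let Ok(resident_for) = now.duration_since(mtime) {
            if resident_for < threshold {
                self.evictions_with_low_residence += 1;
            }
        }
    }
}
```

A rising second counter relative to the first suggests layers are being evicted shortly after they were downloaded, which is the thrashing pattern the commit wants to alert on.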
Heikki Linnakangas	fea4b5f551	Switch to EdDSA algorithm for the storage JWT authentication tokens. The control plane currently only supports EdDSA. We need to either teach the storage to use EdDSA, or the control plane to use RSA. EdDSA is more modern, so let's use that. We could support both, but it would require a little more code and tests, and we don't really need the flexibility since we control both sides.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	77107607f3	Allow JWT key generation to fail if authentication is not enabled. This allows you to run without the 'openssl' binary as long as you don't enable authentication. This becomes more important with the next commit, which switches the JWT algorithm to EdDSA. LibreSSL does not support EdDSA, and LibreSSL comes with macOS, so the next commit makes it much more likely for the key generation to fail for macOS users. To allow running without a keypair, don't generate the authentication token in the 'neon_local init' step. Instead, generate a new token on every request that needs one, using the private key.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	1da963b2f9	Remove some unused code in control plane.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	1ddb9249aa	Reduce the # of histogram buckets in metrics. (#3850 ) Shrinks the total number of metrics collected for each timeline by about 50%. See https://github.com/neondatabase/neon/issues/2848. This doesn't fully solve the problem, we still collect a lot of metrics even with this, but this gives us a lot of headroom.	2023-03-20 15:49:16 +02:00
Joonas Koivunen	0c1228c37a	feat: store initial timeline in env fixture (#3839 ) minor change, but will allow more use in future for the default tenants. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-20 11:57:27 +02:00
Christian Schwarz	3c15874c48	allow specifying eviction_policy in TenantCreateRequest This was an oversight from `175a577ad4`. Nothing uses this AFAIK, but, let's fix it anyways. Noticed while working on https://github.com/neondatabase/neon/issues/3728	2023-03-20 10:43:53 +01:00
Shany Pozin	93f3f4ab5f	Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818 ) ## Describe your changes Add Error enum for tenant state response to allow better error handling in mgmt api ## Issue ticket number and link #2238 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-19 10:44:42 +02:00
sharnoff	f6e2e0042d	Fix + re-enable VM cgroup creation + running (#3820 ) Re-enable cgroup shenanigans in VMs, with some special care taken to make sure that our version of cgroup-tools supports cgroup v2 (debian bullseye does not, and probably won't because it requires a breaking change in libcgroup). This involves manually building libcgroup / cgroup-tools from source, then copying the output into the final build stage. We originally considered pulling the package from debian's testing repo (which is up-to-date), but decided against it. Refer to the PR for more details. Prior work, for reference: * `2153d2e0` - Run compute_ctl in a cgroup in VMs * `1360361f` - Fix missing VM cgconfig.conf * `8dae8799` - Disable VM cgroup shenanigans	2023-03-16 17:09:45 -07:00
Christian Schwarz	b917270c67	remove unused TenantConfig::update function	2023-03-16 16:25:35 +01:00
Arthur Petukhovsky	b067378d0d	Measure cross-AZ traffic in safekeepers (#3806 ) Create `safekeeper_pg_io_bytes_total` metric to track total amount of bytes written/read in a postgres connections to safekeepers. This metric has the following labels: - `client_az` – availability zone of the connection initiator, or `"unknown"` - `sk_az` – availability zone of the safekeeper, or `"unknown"` - `app_name` – `application_name` of the postgres client - `dir` – data direction, either `"read"` or `"write"` - `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from `client_az` and `sk_az`, exists purely for convenience. This is implemented by passing availability zone in the connection string, like this: `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`. Update ansible deployment scripts to add availability_zone argument to safekeeper and pageserver in systemd service files.	2023-03-16 17:24:01 +03:00
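The label derivation described in the commit above (b067378d0d) can be sketched as two small helpers. These are illustrative only: `client_az_from_options` and `same_az_label` are hypothetical names, and the parsing assumes the options string is whitespace-separated `key=value` pairs as in the example from the message.

```rust
/// Extracts the client's availability zone from a connection options string
/// such as `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`.
/// Falls back to "unknown" when the option is absent.
pub fn client_az_from_options(options: &str) -> &str {
    options
        .split_whitespace()
        .find_map(|kv| kv.strip_prefix("availability-zone="))
        .unwrap_or("unknown")
}

/// Derives the convenience `same_az` label from the two AZ labels.
pub fn same_az_label(client_az: &str, sk_az: &str) -> &'static str {
    if client_az == "unknown" || sk_az == "unknown" {
        "unknown"
    } else if client_az == sk_az {
        "true"
    } else {
        "false"
    }
}
```

Keeping `same_az` as its own label duplicates information, but it lets a dashboard filter on cross-AZ traffic without comparing two label values in the query.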
Joonas Koivunen	768c8d9972	test: allow gc to get unlucky (#3826 ) this failure case was probably introduced by `b220ba6`, because earlier the gc would always have run fast enough for restart every 1s. however, test got added later, so we have just been lucky. fixes #3824 by allowing this error to happen.	2023-03-16 11:03:21 +02:00
Rahul Patil	f1d960d2c2	Add new pageserver to eu-central-1 (#3829 ) ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-15 16:58:28 +01:00
Arthur Petukhovsky	cd17802b1f	Add pg_clients test for serverless driver (#3827 ) Fixes #3819	2023-03-15 16:18:38 +03:00
Heikki Linnakangas	10a5d36af8	Separate mgmt and libpq authentication configs in pageserver. (#3773 ) This makes it possible to enable authentication only for the mgmt HTTP API or the compute API. The HTTP API doesn't need to be directly accessible from compute nodes, and it can be secured through network policies. This also allows rolling out authentication in a piecemeal fashion.	2023-03-15 13:52:29 +02:00
Arseny Sher	a7ab53c80c	Forward framed read buf contents to compute before proxy pass. Otherwise they get lost. Normally buffer is empty before proxy pass, but this is not the case with pipeline mode of our npm driver; fixes connection hangup introduced by `b80fe41af3` for it. fixes https://github.com/neondatabase/neon/issues/3822	2023-03-15 14:32:41 +03:00
Heikki Linnakangas	2672fd09d8	Make test independent of the order of config lines.	2023-03-14 20:10:34 +02:00
Heikki Linnakangas	4a92799f24	Fix check for trailing garbage in basebackup import. There was a warning for trailing garbage after end-of-tar archive, but it didn't always work. The reason is that we created a StreamReader over the original copyin-stream, but performed the check for garbage on the copyin-stream. There could be some garbage bytes buffered in the StreamReader, which were not caught by the warning. I considered turning the warning into a fatal error, aborting the import, but I wasn't sure if we handle aborting the import properly. Do we clean up the timeline directory on error? If we don't, we should make that more robust, but that's a different story. Also, normally a valid tar archive ends with two 512-byte blocks of zeros. The tokio_tar crate stops at the first all-zeros block. Read and check the second all-zeros block, and error out if it's not there, or contains something unexpected.	2023-03-14 19:45:33 +02:00
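The end-of-archive check described in the commit above (4a92799f24) can be sketched on an in-memory buffer. This is not the import code, which works on an async stream: `check_tar_trailer` is a hypothetical name, and `rest` is assumed to hold every byte that follows the first all-zeros block, including anything the reader had already buffered.

```rust
/// Size of a tar block in bytes.
const BLOCK: usize = 512;

/// Validates what follows the first all-zeros block of a tar archive.
/// Returns `Ok(n)` with the number of trailing garbage bytes (worth a
/// warning if non-zero), or `Err` if the second end-of-archive block is
/// missing or not all zeros.
pub fn check_tar_trailer(rest: &[u8]) -> Result<usize, String> {
    if rest.len() < BLOCK {
        return Err(format!(
            "second end-of-archive block is missing ({} of {} bytes)",
            rest.len(),
            BLOCK
        ));
    }
    let (second, garbage) = rest.split_at(BLOCK);
    if second.iter().any(|&b| b != 0) {
        return Err("second end-of-archive block is not all zeros".to_string());
    }
    Ok(garbage.len())
}
```

Checking the bytes that come out of the same reader that parsed the archive avoids the original bug, where garbage sitting in an intermediate buffer was never inspected.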
Konstantin Knizhnik	5396273541	Avoid holes between generated image layers (#3771 ) ## Describe your changes When we partition the whole key space, we take into account the actual ranges of the relations present in the database. So if we have a relation with relid=1 and size 100 and a relation with relid=2 and size 200, then the result of KeySpace::partition may contain partitions <100000000..100000099> and <200000000..200000199>. Generated image layers will contain the same boundaries. But when GC checks image coverage to find out whether an old layer is fully covered by newer image layers and so can be deleted, it takes into account only the full key range. I.e. if there is a delta layer <100000000..300000000> then it will never be garbage collected because image layers <100000000..100000099> and <200000000..200000199> do not completely cover it. This is how it looks in practice: 000000067F000032AC00000A300000000000-000000067F000032AC00000A330000000000__000000000F761828 000000067F000032AC00000A31000000001F-000000067F000032AC00000A620000000005__0000000001696070-000000000442A551 000000067F000032AC00000A3300FFFFFFFF-000000067F000032AC00000A650100000000__000000000F761828 So there are two image layers covering the delta layer but ... there is a hole: A330000000000...A3300FFFFFFFF and as a result the delta layer is not collected. ## Issue ticket number and link This PR is closely related to #3673 because it addresses the same problem: old layers are not utilized by GC. The test test_gc_old_layers.py in #3673 can be used to see the effect of this patch. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. 
--------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-14 16:15:07 +02:00
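The coverage problem described in the commit above (5396273541) can be illustrated with a simplified check. This sketch only models why a hole between image layers keeps a delta layer alive; it is not the GC code or the fix itself. `is_fully_covered` is a hypothetical name, and keys are reduced to plain integers with half-open ranges.

```rust
use std::ops::Range;

/// Returns true if the union of `images` covers every key in `target`.
pub fn is_fully_covered(target: &Range<u64>, images: &[Range<u64>]) -> bool {
    let mut sorted: Vec<&Range<u64>> = images.iter().collect();
    sorted.sort_by_key(|r| r.start);
    // Everything in [target.start, covered_up_to) is known to be covered.
    let mut covered_up_to = target.start;
    for r in sorted {
        if r.start > covered_up_to {
            // A hole: no image layer starts at or before this point.
            return false;
        }
        covered_up_to = covered_up_to.max(r.end);
        if covered_up_to >= target.end {
            return true;
        }
    }
    covered_up_to >= target.end
}
```

With a delta layer spanning 100..300 and image layers 100..200 and 250..300, the check fails because of the hole at 200..250, so the delta layer would never be collected. If the second image layer instead starts where the first one ends, the same delta layer is fully covered.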
Joonas Koivunen	c23c8946a3	chore: clippies introduced with rust 1.68 (#3781 ) - handle automatically fixable future clippies - tune run-clippy.sh to remove macos specifics which we no longer have Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-14 15:29:02 +02:00
Joonas Koivunen	15b692ccc9	test: more strict finding of WARN, ERROR lines (#3798) This prevents flakiness when `WARN\|ERROR` appears in some other part of the line, for example in a random filename.	2023-03-14 15:27:52 +02:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Lassi Pölönen	68ae020b37	Use RollingUpdate strategy also for legacy proxy (#3814 ) ## Describe your changes We have previously changed the neon-proxy to use RollingUpdate. This should be enabled in legacy proxy too in order to avoid breaking connections for the clients and allow for example backups to run even during deployment. (https://github.com/neondatabase/neon/pull/3683) ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-03-14 13:23:46 +00:00
Joonas Koivunen	d6bb8caad4	refactor: correct return value for not found L0's on LayerMap::replace (#3805) In the previous implementation, an `ok_or_else(...)?` is used to cause a "precondition error" on LayerMap::replace; however, we only see this particular error if an L0 for which replace fails is not in the layermap because it is not in `l0_delta_layers`. This changes (or fixes) the result to be Replacement::NotFound instead, making it clearer that an error is only raised for actual precondition violations, like trying to replace a layer with a completely unrelated layer.	2023-03-14 13:18:26 +02:00
Alexander Bayandin	319402fc74	postgres_ffi: restore POSTGRES_INSTALL_DIR support (#3811 ) Fix path construction to `pg_config`: `pg_install_dir_versioned` already includes `pg_version`	2023-03-14 10:25:48 +00:00
Dmitry Ivanov	2e4bf7cee4	[proxy] Immediately log all compute node connection errors.	2023-03-14 01:45:57 +03:00
Dmitry Ivanov	15ed6af5f2	Add descriptions to proxy's python tests.	2023-03-14 01:32:37 +03:00
Dmitry Rodionov	50476a7cc7	test: update to match current interfaces	2023-03-13 17:50:10 +02:00
andres	d7ab69f303	add test for getting branchpoints from an inactive timeline	2023-03-13 17:50:10 +02:00
sharnoff	582620274a	Enable file cache handling by vm-informant (#3794 ) Enables the VM informant's file cache integration. See also: https://github.com/neondatabase/autoscaling/pull/47	2023-03-13 07:16:39 -07:00
Vadim Kharitonov	daeaa767c4	Add `neondatabase/release` team as default reviewers for storage releases	2023-03-13 13:40:15 +01:00
Konstantin Knizhnik	f0573f5991	Remove block cursor cache (#3740) ## Describe your changes Do not pin the current block in BlockCursor ## Issue ticket number and link See #3712 There are places in our code (see get_reconstruct_data) where a thread holds the read lock on layers and then tries to read a file, and so locks a page cache slot. So we have the edge layers->page cache slot in the dependency graph. At the same time (as Christian noticed) we can lock a page cache slot in BlockCursor and then try to obtain the shared lock on layers. So there is a backward edge page cache slot->layers in the dependency graph, which forms a loop and may cause a deadlock. There are three possible fixes for the problem: 1. Perform compaction under the `layers` shared lock. See PR #3732. It fixes the problem but makes it impossible to append any data to the pageserver until compaction is completed. 2. Do not hold the `layers` lock while accessing layers (not sure if this is possible, because it definitely introduces some new race conditions). 3. Do not pin current pages in BlockCursor (this PR). My experiments show that this cache in BlockCursor is not very useful: the number of hits/misses for the cursor cache on a pgbench workload (-i -s 10/-c 10 -T 100/-c 10 -S -T 100): ``` hits: 163011 misses: 1023602 ``` So the number of cache misses is roughly 6x larger. And the results for read-only pgbench are mostly the same: ``` with cache: 14581 w/out cache: 14429 ``` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-13 14:07:35 +02:00
Nikita Kalyanov	07dcf679de	set content type explicitly (#3799 ) I moved management API v2 to ogen and the generated code seems to be more strict about content type. Let's set it properly as it is json after all ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-13 14:00:01 +02:00
Heikki Linnakangas	e0ee138a8b	Add a test for tokio-postgres client to the driver test suite. It is fully supported. To enable TLS, though, it requires some extra glue code, and a dependency to a TLS library.	2023-03-13 12:14:37 +02:00
Arthur Petukhovsky	d9a1329834	Make postgres_backend use generic IO type (#3789 ) - Support measuring inbound and outbound traffic in MeasuredStream - Start using MeasuredStream in safekeepers code	2023-03-13 12:18:10 +03:00
Joonas Koivunen	8699342249	Ondemand rx bytes and layer count (#3777 ) Adds two new global metrics: - pageserver_remote_ondemand_downloaded_layers_total - pageserver_remote_ondemand_downloaded_bytes_total An existing test is repurposed once more to check that we do get some reasonable counts. These are to replace guessing from the nic RX bytes metric how much was on-demand downloaded. First part of #3745: This does not add the "(un)?avoidable" metric, which I plan to add as a new metric, which will be a subset of the counts of the metrics added here.	2023-03-13 09:26:49 +02:00
Joonas Koivunen	ce8fbbd910	Fix allowed error again (#3790 ) Fixes #3360 again, this time checking all other "Error processing HTTP request" messages and aligning the regex with the two others.	2023-03-10 17:44:12 +00:00
Vadim Kharitonov	1401021b21	Be able to get number of CPUs (#3774) After enabling autoscaling, we faced the issue that customers are not able to get the number of CPUs they are using at the moment. Therefore I've added these two options: 1. A PostgreSQL function that customers can call whenever they want 2. A `compute_ctl` endpoint to show this number in the console	2023-03-10 19:00:20 +02:00
Stas Kelvich	252b3685a2	Use `unsafe-postgres` feature to build pgx extension Recently added `unsafe-postgres` feature allows to build pgx extensions against postgres forks that decided to change their ABI name (like us). With that we can build extensions without forking them and using stock pgx. As this feature is new few manual version bumps were required.	2023-03-10 17:40:45 +02:00
Heikki Linnakangas	34d3385b2e	Add unit tests for JWT encoding and decoding.	2023-03-10 16:09:32 +02:00
Heikki Linnakangas	b00530df2a	Add section in internal docs on the JWT payload. Just copied from the code comments. Could be improved, but this is a start.	2023-03-10 16:09:32 +02:00
Heikki Linnakangas	bebf76c461	Accept RS384 and RS512 JWT tokens. Previously, we only accepted RS256. Seems like a pointless limitation, and when I was testing it with RS512 tokens, it took me a while to understand why it wasn't working.	2023-03-10 16:09:32 +02:00
Vadim Kharitonov	2ceef91da1	Compile `pg_tiktoken` extension	2023-03-10 14:59:26 +01:00
Anastasia Lubennikova	b7fddfa70d	Add branch_id field to proxy_io_bytes_per_client metric. Since we allow switching endpoints between different branches, it is important to use composite key. Otherwise, we may try to calculate delta between metric values for two different branches.	2023-03-10 15:00:48 +02:00
Heikki Linnakangas	d1537a49fa	Fix escaping in postgresql.conf that we generate at compute startup If there are any config options that contain single quotes or backslashes, they need to be escaped	2023-03-10 14:59:21 +02:00
Heikki Linnakangas	856d01ff68	Add newline at end of postgresql.conf	2023-03-10 14:59:21 +02:00
Heikki Linnakangas	42ec79fb0d	Make expected test output nicer to read. By using Rust raw string literal.	2023-03-10 14:59:21 +02:00
Rory de Zoete	3c4f5af1b9	Try depot.dev for image building (#3768 ) To see if it is faster. Run side-by-side for a while so we can gather enough data.	2023-03-10 11:11:39 +01:00
Arseny Sher	290884ea3b	Fix too many arguments in read_network clippy complaint.	2023-03-10 10:50:03 +03:00
Arseny Sher	965837df53	Log connection ids in safekeeper instead of thread ids. Fixes build on macOS (which doesn't have nix gettid) after `0d8ced8534`.	2023-03-10 10:50:03 +03:00
Arseny Sher	d1a0f2f0eb	Fix example why manual-range-contains is disabled.	2023-03-10 00:03:17 +03:00
Konstantin Knizhnik	a34e78d084	Retry attempt to connect to pageserver in order to make pageserver restart transparent for clients (#3700) ## Describe your changes Try to reestablish the connection with the pageserver if a send fails, in order to make pageserver restarts transparent for clients ## Issue ticket number and link https://github.com/neondatabase/neon/issues/1138 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-09 22:15:46 +02:00
Arseny Sher	b80fe41af3	Refactor postgres protocol parsing. 1) Remove allocation and data copy during each message read. Instead, parsing functions now accept BytesMut from which data they form messages, with pointers (e.g. in CopyData) pointing directly into BytesMut buffer. Accordingly, move ConnectionError containing IO error subtype into framed.rs providing this and leave in pq_proto only ProtocolError. 2) Remove anyhow from pq_proto. 3) Move FeStartupPacket out of FeMessage. Now FeStartupPacket::parse returns it directly, eliminating dead code where user wants startup packet but has to match for others. proxy stream.rs is adapted to framed.rs with minimal changes. It also benefits from framed.rs improvements described above.	2023-03-09 20:45:56 +03:00
Arseny Sher	0d8ced8534	Remove sync postgres_backend, tidy up its split usage. - Add support for splitting async postgres_backend into read and write halves. Safekeeper needs this for bidirectional streams. To this end, encapsulate reading-writing postgres messages to framed.rs with split support without any additional changes (relying on BufRead for reading and BytesMut out buffer for writing). - Use async postgres_backend throughout safekeeper (and in proxy auth link part). - In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling. - Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL -- join split parts back catching errors from them before returning. Initially I hoped to do that read-write without split at all, through polling IO: https://github.com/neondatabase/neon/pull/3522 However that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anon Future types. 1) required Rc<RefCell<...>> which is Send construct just to satisfy the checker; 2) can be worked around with transmute. But this is so messy that I decided to leave split.	2023-03-09 20:45:56 +03:00
Arseny Sher	7627d85345	Move async postgres_backend to its own crate. To untie the cyclic dependency between the sync and async versions of postgres_backend, copy QueryError and some logging/error routines to postgres_backend.rs. This is temporary glue to make commits smaller; the sync version will be dropped completely by the upcoming commit.	2023-03-09 20:45:56 +03:00
Arseny Sher	3f11a647c0	Rename write_message to write_message_noflush in postgres_backend_async.rs To make it uniform across the project; proxy stream.rs and the older postgres_backend use write_message_noflush.	2023-03-09 20:45:56 +03:00
Alexey Kondratov	e43c413a3f	[compute_tools] Add /insights endpoint to compute_ctl (#3704 ) This commit adds a basic HTTP API endpoint that allows scraping the `pg_stat_statements` data and getting a list of slow queries. New insights like cache hit rate and so on could be added later. Extension `pg_stat_statements` is checked / created only if compute tries to load the corresponding shared library. The latter is configured by control-plane and currently covered with feature flag. Co-authored by Eduard Dyckman (bird.duskpoet@gmail.com)	2023-03-09 14:21:10 +01:00
Heikki Linnakangas	8459e0265e	Add performance test for compaction and image layer creation	2023-03-09 14:30:12 +02:00
Kirill Bulatov	03a2ce9d13	Add tracing spans with request_id into pageserver management API handlers (#3755 ) Adds a newtype that creates a span with request_id from https://github.com/neondatabase/neon/pull/3708 for every HTTP request served. Moves request logging and error handlers under the new wrapper, so every request-related event now is logged under the request span. For compatibility reasons, error handler is left on the general router, since not every service uses the new handler wrappers yet.	2023-03-09 09:24:01 +02:00
Heikki Linnakangas	ccf92df4da	Remove deprecated support to handle ZENITH_AUTH_TOKEN. It's not used anywhere anymore.	2023-03-09 00:53:13 +02:00
Vadim Kharitonov	37bc6d9be4	Compile `plpgsql_check` extension	2023-03-08 23:20:24 +01:00
Vadim Kharitonov	177f986795	Compile `hll` extension	2023-03-08 12:03:09 +01:00
Heikki Linnakangas	fb1581d0b9	Fix setting "image_creation_threshold" setting in tenant config. (#3762) We have a few tests that try to set image_creation_threshold, but it didn't actually have any effect because we were missing some critical code to load the setting from config file into memory. The two modified tests in `test_remote_storage.py` perform compaction and GC, and assert that GC removes some layers. That only happens if new image layers are created by the compaction. The tests explicitly disabled image layer creation by setting image_creation_threshold to a high value, but it didn't take effect because reading image_creation_threshold from config file was broken, which is why the test worked. Fix the test to set image_creation_threshold low, instead, so that GC has work to do. Change 'test_tenant_conf.py' so that it exercises the added code. This might explain why we're apparently missing test coverage for GC (issue #3415), although I didn't try to address that here, nor did I check if this improves it.	2023-03-08 11:39:30 +02:00
Sasha Krassovsky	02b8e0e5af	Add OpenAPI spec for do_gc (#3756 ) ## Describe your changes Adds a field to the OpenAPI spec for the page server which describes the `do_gc` command. ## Issue ticket number and link #3669 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-07 09:08:46 -08:00
Vadim Kharitonov	1b16de0d0f	Compile `prefix` extension	2023-03-07 17:42:53 +01:00
Stas Kelvich	069b5b0a06	Make `postgres --wal-redo` more embeddable. * Stop allocating and maintaining 128MB hash table for last written LSN cache as it is not needed in wal-redo. * Do not require access to the initialized data directory. That saves few dozens megabytes of empty but initialized data directory. Currently such directories do occupy about 10% of the disk space on the pageservers as most of tenants are empty. * Move shmem-initialization code to the extension instead of postgres	2023-03-07 15:01:14 +02:00
Joonas Koivunen	b05e94e4ff	fix: allow ERROR log to appear per allowed failure (#3696) The test already allows the background thread trying to checkpoint to fail; however, the resulting log message is currently not allowed, thus causing flakiness.	2023-03-07 12:44:04 +00:00
Arseny Sher	0acf9ace9a	Return 404 if timeline is not found in safekeeper HTTP API.	2023-03-07 16:34:20 +04:00
Arseny Sher	ca85646df4	Max peer_horizon_lsn before adopting it. Before this patch, persistent peer_horizon_lsn was always sent to walproposer, making it initially calculate it equal to max of persistent values and in turn pulling back the in memory value. Send instead in memory value and take max when safekeeper sets it. closes https://github.com/neondatabase/neon/issues/3752	2023-03-07 10:16:54 +04:00
Shany Pozin	7b9057ad01	Add timeout to download copy (#3675 ) ## Describe your changes Adding a timeout handling for the remote download of layers of 120 seconds for each operation Note that these downloads are being retried for N times ## Issue ticket number and link Fixes: #3672 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-06 18:52:59 +02:00
Konstantin Knizhnik	96f65fad68	Handle crash of walredo process and retry applying wal records (#3739) ## Describe your changes Restart the walredo process and retry applying walredo records in case of abnormal walredo process termination ## Issue ticket number and link See #1700 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-06 10:10:58 +02:00
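
The coverage problem described in commit 5396273541 ("Avoid holes between generated image layers") can be illustrated with a small sketch. This is not the pageserver's code: `first_hole` is a hypothetical helper, keys are plain integers, and ranges are treated as half-open `[start, end)` for simplicity. It only shows why two image layers with a gap between them fail a "fully covered" check for a wider delta layer, and why contiguous image boundaries would pass it.

```python
from typing import List, Optional, Tuple

KeyRange = Tuple[int, int]  # half-open [start, end); an assumption of this sketch


def first_hole(target: KeyRange, images: List[KeyRange]) -> Optional[KeyRange]:
    """Return the first part of `target` not covered by `images`, or None."""
    cursor, end = target
    for start, stop in sorted(images):
        if cursor >= end:
            break
        if start > cursor:
            # Nothing covers [cursor, start): that is the hole.
            return (cursor, min(start, end))
        cursor = max(cursor, stop)
    return None if cursor >= end else (cursor, end)


delta = (100000000, 300000000)

# Image layers that follow the relation sizes leave a hole after the first one.
by_relation = [(100000000, 100000100), (200000000, 200000200)]
print(first_hole(delta, by_relation))

# Contiguous image layers cover the delta layer completely.
contiguous = [(100000000, 200000000), (200000000, 300000000)]
print(first_hole(delta, contiguous))
```

With the first set of image layers the check reports a hole starting right after the first relation's last key, so a GC that requires full coverage would keep the delta layer; with contiguous boundaries no hole is found.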

999 changed files with 209664 additions and 47499 deletions

									
										18

.cargo/config.toml
									
												
				@@ -1,16 +1,8 @@

				# The binaries are really slow, if you compile them in 'dev' mode with the defaults.

				# Enable some optimizations even in 'dev' mode, to make tests faster. The basic

				# optimizations enabled by "opt-level=1" don't affect debuggability too much.

				#

				# See https://www.reddit.com/r/rust/comments/gvrgca/this_is_a_neat_trick_for_getting_good_runtime/

				#

				[profile.dev.package."*"]

				# Set the default for dependencies in Development mode.

				opt-level = 3

				[profile.dev]

				# Turn on a small amount of optimization in Development mode.

				opt-level = 1

				[build]

				# This is only present for local builds, as it will be overridden

				# by the RUSTDOCFLAGS env var in CI.

				rustdocflags = ["-Arustdoc::private_intra_doc_links"]

				[alias]

				build_testing = ["build", "--features", "testing"]

				neon = ["run", "--bin", "neon_local"]

									
										8

.config/hakari.toml
									
												
				@@ -4,7 +4,7 @@

				hakari-package = "workspace_hack"

				# Format for `workspace-hack = ...` lines in other Cargo.tomls. Requires cargo-hakari 0.9.8 or above.

				-dep-format-version = "3"

				+dep-format-version = "4"

				# Setting workspace.resolver = "2" in the root Cargo.toml is HIGHLY recommended.

				# Hakari works much better with the new feature resolver.

				@@ -22,5 +22,11 @@ platforms = [

				    # "x86_64-pc-windows-msvc",

				]

				[final-excludes]

				# vm_monitor benefits from the same Cargo.lock as the rest of our artifacts, but

				# it is built primarly in separate repo neondatabase/autoscaling and thus is excluded

				# from depending on workspace-hack because most of the dependencies are not used.

				workspace-members = ["vm_monitor"]

				# Write out exact versions rather than a semver range. (Defaults to false.)

				# exact-versions = true

									
										2

.config/nextest.toml
									
										Normal file
									
												
				@@ -0,0 +1,2 @@

				[profile.default]

				slow-timeout = { period = "60s", terminate-after = 3 }
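
As a rough reading of the `slow-timeout` setting above: a test is reported as slow once it has run for one `period`, and is terminated after `terminate-after` periods, i.e. after about 180 seconds with these values. The sketch below only models that arithmetic; `classify` is a hypothetical helper, and the exact boundary behaviour of nextest is an assumption here.

```python
def classify(elapsed_s: float, period_s: float = 60.0, terminate_after: int = 3) -> str:
    """Classify a test by elapsed time under a slow-timeout style policy."""
    if elapsed_s >= period_s * terminate_after:
        return "terminated"  # ran for `terminate_after` full periods
    if elapsed_s >= period_s:
        return "slow"        # exceeded one period, still running
    return "ok"


for elapsed in (10, 61, 200):
    print(elapsed, classify(elapsed))
```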

42

.dockerignore


@@ -1,24 +1,30 @@
 *
 !rust-toolchain.toml
 !Cargo.toml
 # Files
 !Cargo.lock
 !Cargo.toml
 !Makefile
 !.cargo/
 !.config/
 !control_plane/
 !compute_tools/
 !libs/
 !pageserver/
 !pgxn/
 !proxy/
 !safekeeper/
 !storage_broker/
 !trace/
 !vendor/postgres-v14/
 !vendor/postgres-v15/
 !workspace_hack/
 !neon_local/
 !rust-toolchain.toml
 !scripts/combine_control_files.py
 !scripts/ninstall.sh
 !vm-cgconfig.conf
 !docker-compose/run-tests.sh
 # Directories
 !.cargo/
 !.config/
 !compute_tools/
 !control_plane/
 !libs/
 !neon_local/
 !pageserver/
 !patches/
 !pgxn/
 !proxy/
 !storage_scrubber/
 !safekeeper/
 !storage_broker/
 !storage_controller/
 !trace/
 !vendor/postgres-*/
 !workspace_hack/

									
										5

.github/ISSUE_TEMPLATE/epic-template.md
									
										vendored
									
												
				@@ -16,9 +16,10 @@ assignees: ''

				## Implementation ideas

				## Tasks

				- [ ]

				```[tasklist]

				- [ ] Example Task

				```

				## Other related tasks and Epics

									
										3

.github/PULL_REQUEST_TEMPLATE/release-pr.md
									
										vendored
									
												
				@@ -3,13 +3,14 @@

				**NB: this PR must be merged only by 'Create a merge commit'!**

				### Checklist when preparing for release

				-- [ ] Read or refresh [the release flow guide](https://github.com/neondatabase/cloud/wiki/Release:-general-flow)

				+- [ ] Read or refresh [the release flow guide](https://www.notion.so/neondatabase/Release-general-flow-61f2e39fd45d4d14a70c7749604bd70b)

				- [ ] Ask in the [cloud Slack channel](https://neondb.slack.com/archives/C033A2WE6BZ) that you are going to rollout the release. Any blockers?

				- [ ] Does this release contain any db migrations? Destructive ones? What is the rollback plan?

				<!-- List everything that should be done **before** release, any issues / setting changes / etc -->

				### Checklist after release

				- [ ] Make sure instructions from PRs included in this release and labeled `manual_release_instructions` are executed (either by you or by people who wrote them).

				- [ ] Based on the merged commits write release notes and open a PR into `website` repo ([example](https://github.com/neondatabase/website/pull/219/files))

				- [ ] Check [#dev-production-stream](https://neondb.slack.com/archives/C03F5SM1N02) Slack channel

				- [ ] Check [stuck projects page](https://console.neon.tech/admin/projects?sort=last_active&order=desc&stuck=true)

									
										13

.github/actionlint.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,13 @@

				self-hosted-runner:

				  labels:

				    - arm64

				    - gen3

				    - large

				    - large-arm64

				    - small

				    - small-arm64

				    - us-east-2

				config-variables:

				  - REMOTE_STORAGE_AZURE_CONTAINER

				  - REMOTE_STORAGE_AZURE_REGION

				  - SLACK_UPCOMING_RELEASE_CHANNEL_ID

									
										234

.github/actions/allure-report-generate/action.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,234 @@

				name: 'Create Allure report'

				description: 'Generate Allure report from uploaded by actions/allure-report-store tests results'

				inputs:

				  store-test-results-into-db:

				    description: 'Whether to store test results into the database. TEST_RESULT_CONNSTR/TEST_RESULT_CONNSTR_NEW should be set'

				    type: boolean

				    required: false

				    default: false

				outputs:

				  base-url:

				    description: 'Base URL for Allure report'

				    value: ${{ steps.generate-report.outputs.base-url }}

				  base-s3-url:

				    description: 'Base S3 URL for Allure report'

				    value: ${{ steps.generate-report.outputs.base-s3-url }}

				  report-url:

				    description: 'Allure report URL'

				    value: ${{ steps.generate-report.outputs.report-url }}

				  report-json-url:

				    description: 'Allure report JSON URL'

				    value: ${{ steps.generate-report.outputs.report-json-url }}

				runs:

				  using: "composite"

				  steps:

				    # We're using some of env variables quite offen, so let's set them once.

				    #

				    # It would be nice to have them set in common runs.env[0] section, but it doesn't work[1]

				    #

				    # - [0] https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runsenv

				    # - [1] https://github.com/neondatabase/neon/pull/3907#discussion_r1154703456

				    #

				    - name: Set variables

				      shell: bash -euxo pipefail {0}

				      run: |

				        PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)

				        if [ "${PR_NUMBER}" != "null" ]; then

				          BRANCH_OR_PR=pr-${PR_NUMBER}

				        elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then

				          # Shortcut for special branches

				          BRANCH_OR_PR=${GITHUB_REF_NAME}

				        else

				          BRANCH_OR_PR=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")

				        fi

				        LOCK_FILE=reports/${BRANCH_OR_PR}/lock.txt

				        WORKDIR=/tmp/${BRANCH_OR_PR}-$(date +%s)

				        mkdir -p ${WORKDIR}

				        echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV

				        echo "LOCK_FILE=${LOCK_FILE}"       >> $GITHUB_ENV

				        echo "WORKDIR=${WORKDIR}"           >> $GITHUB_ENV

				        echo "BUCKET=${BUCKET}"             >> $GITHUB_ENV

				      env:

				        BUCKET: neon-github-public-dev

				    # TODO: We can replace with a special docker image with Java and Allure pre-installed

				    - uses: actions/setup-java@v4

				      with:

				        distribution: 'temurin'

				        java-version: '17'

				    - name: Install Allure

				      shell: bash -euxo pipefail {0}

				      run: |

				        if ! which allure; then

				          ALLURE_ZIP=allure-${ALLURE_VERSION}.zip

				          wget -q https://github.com/allure-framework/allure2/releases/download/${ALLURE_VERSION}/${ALLURE_ZIP}

				          echo "${ALLURE_ZIP_SHA256} ${ALLURE_ZIP}" | sha256sum --check

				          unzip -q ${ALLURE_ZIP}

				          echo "$(pwd)/allure-${ALLURE_VERSION}/bin" >> $GITHUB_PATH

				          rm -f ${ALLURE_ZIP}

				        fi

				      env:

				        ALLURE_VERSION: 2.27.0

				        ALLURE_ZIP_SHA256: b071858fb2fa542c65d8f152c5c40d26267b2dfb74df1f1608a589ecca38e777

				    # Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this

				    - name: Acquire lock

				      shell: bash -euxo pipefail {0}

				      run: |

				        LOCK_TIMEOUT=300 # seconds

				        LOCK_CONTENT="${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"

				        echo ${LOCK_CONTENT} > ${WORKDIR}/lock.txt

				        # Do it up to 5 times to avoid race condition

				        for _ in $(seq 1 5); do

				          for i in $(seq 1 ${LOCK_TIMEOUT}); do

				            LOCK_ACQUIRED=$(aws s3api head-object --bucket neon-github-public-dev --key ${LOCK_FILE} | jq --raw-output '.LastModified' || true)

				            # `date --date="..."` is supported only by gnu date (i.e. it doesn't work on BSD/macOS)

				            if [ -z "${LOCK_ACQUIRED}" ] || [ "$(( $(date +%s) - $(date --date="${LOCK_ACQUIRED}" +%s) ))" -gt "${LOCK_TIMEOUT}" ]; then

				              break

				            fi

				            sleep 1

				          done

				          aws s3 mv --only-show-errors ${WORKDIR}/lock.txt "s3://${BUCKET}/${LOCK_FILE}"

				          # Double-check that exactly THIS run has acquired the lock

				          aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt

				          if [ "$(cat lock.txt)" = "${LOCK_CONTENT}" ]; then

				            break

				          fi

				        done

				    - name: Generate and publish final Allure report

				      id: generate-report

				      shell: bash -euxo pipefail {0}

				      run: |

				        REPORT_PREFIX=reports/${BRANCH_OR_PR}

				        RAW_PREFIX=reports-raw/${BRANCH_OR_PR}/${GITHUB_RUN_ID}

				        BASE_URL=https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}

				        BASE_S3_URL=s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}

				        REPORT_URL=${BASE_URL}/index.html

				        REPORT_JSON_URL=${BASE_URL}/data/suites.json

				        # Get previously uploaded data for this run

				        ZSTD_NBTHREADS=0

				        S3_FILEPATHS=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${RAW_PREFIX}/ | jq --raw-output '.Contents[]?.Key')

				        if [ -z "$S3_FILEPATHS" ]; then

				          # There's no previously uploaded data for this $GITHUB_RUN_ID

				          exit 0

				        fi

				        time aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${RAW_PREFIX}/" "${WORKDIR}/"

				        for archive in $(find ${WORKDIR} -name "*.tar.zst"); do

				          mkdir -p ${archive%.tar.zst}

				          time tar -xf ${archive} -C ${archive%.tar.zst}

				          rm -f ${archive}

				        done

				        # Get history trend

				        time aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${REPORT_PREFIX}/latest/history" "${WORKDIR}/latest/history" || true

				        # Generate report

				        time allure generate --clean --output ${WORKDIR}/report ${WORKDIR}/*

				        # Replace a logo link with a redirect to the latest version of the report

				        sed -i 's|<a href="." class=|<a href="https://'${BUCKET}'.s3.amazonaws.com/'${REPORT_PREFIX}'/latest/index.html?nocache='"'+Date.now()+'"'" class=|g' ${WORKDIR}/report/app.js

				        # Upload a history and the final report (in this particular order to not to have duplicated history in 2 places)

				        time aws s3 mv --recursive --only-show-errors "${WORKDIR}/report/history" "s3://${BUCKET}/${REPORT_PREFIX}/latest/history"

				        # Use aws s3 cp (instead of aws s3 sync) to keep files from previous runs to make old URLs work,

				        # and to keep files on the host to upload them to the database

				        time s5cmd --log error cp "${WORKDIR}/report/*" "s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}/"

				        # Generate redirect

				        cat <<EOF > ${WORKDIR}/index.html

				          <!DOCTYPE html>

				          <meta charset="utf-8">

				          <title>Redirecting to ${REPORT_URL}</title>

				          <meta http-equiv="refresh" content="0; URL=${REPORT_URL}">

				        EOF

				        time aws s3 cp --only-show-errors ${WORKDIR}/index.html "s3://${BUCKET}/${REPORT_PREFIX}/latest/index.html"

				        echo "base-url=${BASE_URL}"               >> $GITHUB_OUTPUT

				        echo "base-s3-url=${BASE_S3_URL}"         >> $GITHUB_OUTPUT

				        echo "report-url=${REPORT_URL}"           >> $GITHUB_OUTPUT

				        echo "report-json-url=${REPORT_JSON_URL}" >> $GITHUB_OUTPUT

				        echo "[Allure Report](${REPORT_URL})" >> ${GITHUB_STEP_SUMMARY}

				    - name: Release lock

				      if: always()

				      shell: bash -euxo pipefail {0}

				      run: |

				        aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt || exit 0

				        if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}" ]; then

				          aws s3 rm "s3://${BUCKET}/${LOCK_FILE}"

				        fi

				    - name: Cache poetry deps

				      uses: actions/cache@v4

				      with:

				        path: ~/.cache/pypoetry/virtualenvs

				        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}

				    - name: Store Allure test stat in the DB (new)

				      if: ${{ !cancelled() && inputs.store-test-results-into-db == 'true' }}

				      shell: bash -euxo pipefail {0}

				      env:

				        COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				        BASE_S3_URL: ${{ steps.generate-report.outputs.base-s3-url }}

				      run: |

				        if [ ! -d "${WORKDIR}/report/data/test-cases" ]; then

				          exit 0

				        fi

				        export DATABASE_URL=${REGRESS_TEST_RESULT_CONNSTR_NEW}

				        ./scripts/pysync

				        poetry run python3 scripts/ingest_regress_test_result-new-format.py \

				          --reference ${GITHUB_REF} \

				          --revision ${COMMIT_SHA} \

				          --run-id ${GITHUB_RUN_ID} \

				          --run-attempt ${GITHUB_RUN_ATTEMPT} \

				          --test-cases-dir ${WORKDIR}/report/data/test-cases

				    - name: Cleanup

				      if: always()

				      shell: bash -euxo pipefail {0}

				      run: |

				        if [ -d "${WORKDIR}" ]; then

				          rm -rf ${WORKDIR}

				        fi

				    - uses: actions/github-script@v7

				      if: always()

				      env:

				        REPORT_URL: ${{ steps.generate-report.outputs.report-url }}

				        COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				      with:

				        script: |

				          const { REPORT_URL, COMMIT_SHA } = process.env

				          await github.rest.repos.createCommitStatus({

				            owner: context.repo.owner,

				            repo: context.repo.repo,

				            sha: `${COMMIT_SHA}`,

				            state: 'success',

				            target_url: `${REPORT_URL}`,

				            context: 'Allure report',

				          })
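Note on the "Generate redirect" step above: the `latest/index.html` object is a small HTML page whose only job is to forward the browser to the report of the current run. A minimal sketch of that heredoc as a reusable function follows; `make_redirect` and the example URL are hypothetical and not part of the action.

```shell
# Hypothetical helper: prints a redirect page for the given report URL.
# $1: absolute URL of the report to redirect to
make_redirect() {
  local url=$1
  cat <<EOF
<!DOCTYPE html>
<meta charset="utf-8">
<title>Redirecting to ${url}</title>
<meta http-equiv="refresh" content="0; URL=${url}">
EOF
}

# Example with a made-up URL
make_redirect "https://example.com/reports/main/1/index.html"
```

In the action itself the output is written to `${WORKDIR}/index.html` and then copied to the `latest/` prefix.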

									
										72

.github/actions/allure-report-store/action.yml
									
										Normal file
									
				@@ -0,0 +1,72 @@

				name: 'Store Allure results'

				description: 'Upload test results to be used by actions/allure-report-generate'

				inputs:

				  report-dir:

				    description: 'directory with test results generated by tests'

				    required: true

				  unique-key:

				    description: 'string to distinguish different results in the same run'

				    required: true

				runs:

				  using: "composite"

				  steps:

				    - name: Set variables

				      shell: bash -euxo pipefail {0}

				      run: |

				        PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)

				        if [ "${PR_NUMBER}" != "null" ]; then

				          BRANCH_OR_PR=pr-${PR_NUMBER}

				        elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then

				          # Shortcut for special branches

				          BRANCH_OR_PR=${GITHUB_REF_NAME}

				        else

				          BRANCH_OR_PR=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")

				        fi

				        echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV

				        echo "REPORT_DIR=${REPORT_DIR}"     >> $GITHUB_ENV

				      env:

				        REPORT_DIR: ${{ inputs.report-dir }}

				    - name: Upload test results

				      shell: bash -euxo pipefail {0}

				      run: |

				        REPORT_PREFIX=reports/${BRANCH_OR_PR}

				        RAW_PREFIX=reports-raw/${BRANCH_OR_PR}/${GITHUB_RUN_ID}

				        # Add metadata

				        cat <<EOF > ${REPORT_DIR}/executor.json

				          {

				            "name": "GitHub Actions",

				            "type": "github",

				            "url": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/latest/index.html",

				            "buildOrder": ${GITHUB_RUN_ID},

				            "buildName": "GitHub Actions Run #${GITHUB_RUN_NUMBER}/${GITHUB_RUN_ATTEMPT}",

				            "buildUrl": "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT}",

				            "reportUrl": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html",

				            "reportName": "Allure Report"

				          }

				        EOF

				        cat <<EOF > ${REPORT_DIR}/environment.properties

				          COMMIT_SHA=${COMMIT_SHA}

				        EOF

				        ARCHIVE="${UNIQUE_KEY}-${GITHUB_RUN_ATTEMPT}-$(date +%s).tar.zst"

				        ZSTD_NBTHREADS=0

				        time tar -C ${REPORT_DIR} -cf ${ARCHIVE} --zstd .

				        time aws s3 mv --only-show-errors ${ARCHIVE} "s3://${BUCKET}/${RAW_PREFIX}/${ARCHIVE}"

				      env:

				        UNIQUE_KEY: ${{ inputs.unique-key }}

				        COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				        BUCKET: neon-github-public-dev

				    - name: Cleanup

				      if: always()

				      shell: bash -euxo pipefail {0}

				      run: |

				        rm -rf ${REPORT_DIR}
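Note on the "Set variables" step above: the S3 prefix is derived from either the PR number or a sanitized branch name. The sketch below restates that logic as a function so it can be tried locally; `branch_or_pr` is a hypothetical name, and it uses `printf '%s'` rather than passing the ref name as the format string, which is an assumption about what is wanted when a ref name contains `%`.

```shell
# Hypothetical helper mirroring the "Set variables" step.
# $1: PR number as printed by jq ("null" when the event is not a pull request)
# $2: ref name, e.g. the value of GITHUB_REF_NAME
branch_or_pr() {
  local pr_number=$1 ref_name=$2
  if [ "${pr_number}" != "null" ]; then
    echo "pr-${pr_number}"
  elif [ "${ref_name}" = "main" ] || [ "${ref_name}" = "release" ] || [ "${ref_name}" = "release-proxy" ]; then
    # Special branches keep their name as the key
    echo "${ref_name}"
  else
    # Every character outside [:alnum:], '.', '_' and '-' becomes '-'
    echo "branch-$(printf '%s' "${ref_name}" | tr -c '[:alnum:]._-' '-')"
  fi
}

branch_or_pr 1234 "feature/x"          # pr-1234
branch_or_pr null main                 # main
branch_or_pr null "feature/new slice"  # branch-feature-new-slice
```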

									
										232

.github/actions/allure-report/action.yml
									
				@@ -1,232 +0,0 @@

				name: 'Create Allure report'

				description: 'Create and publish Allure report'

				inputs:

				  action:

				    desctiption: 'generate or store'

				    required: true

				  build_type:

				    description: '`build_type` from run-python-test-set action'

				    required: true

				  test_selection:

				    description: '`test_selector` from run-python-test-set action'

				    required: false

				outputs:

				  report-url:

				    description: 'Allure report URL'

				    value: ${{ steps.generate-report.outputs.report-url }}

				runs:

				  using: "composite"

				  steps:

				    - name: Validate input parameters

				      shell: bash -euxo pipefail {0}

				      run: |

				        if [ "${{ inputs.action }}" != "store" ] && [ "${{ inputs.action }}" != "generate" ]; then

				          echo 2>&1 "Unknown inputs.action type '${{ inputs.action }}'; allowed 'generate' or 'store' only"

				          exit 1

				        fi

				        if [ -z "${{ inputs.test_selection }}" ] && [ "${{ inputs.action }}" == "store" ]; then

				          echo 2>&1 "inputs.test_selection must be set for 'store' action"

				          exit 2

				        fi

				    - name: Calculate variables

				      id: calculate-vars

				      shell: bash -euxo pipefail {0}

				      run: |

				        # TODO: for manually triggered workflows (via workflow_dispatch) we need to have a separate key

				        pr_number=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)

				        if [ "${pr_number}" != "null" ]; then

				          key=pr-${pr_number}

				        elif [ "${GITHUB_REF_NAME}" = "main" ]; then

				          # Shortcut for a special branch

				          key=main

				        elif [ "${GITHUB_REF_NAME}" = "release" ]; then

				          # Shortcut for a special branch

				          key=release

				        else

				          key=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")

				        fi

				        echo "KEY=${key}" >> $GITHUB_OUTPUT

				        # Sanitize test selection to remove `/` and any other special characters

				        # Use printf instead of echo to avoid having `\n` at the end of the string

				        test_selection=$(printf "${{ inputs.test_selection }}" | tr -c "[:alnum:]._-" "-" )

				        echo "TEST_SELECTION=${test_selection}" >> $GITHUB_OUTPUT

				    - uses: actions/setup-java@v3

				      if: ${{ inputs.action == 'generate' }}

				      with:

				        distribution: 'temurin'

				        java-version: '17'

				    - name: Install Allure

				      if: ${{ inputs.action == 'generate' }}

				      shell: bash -euxo pipefail {0}

				      run: |

				        if ! which allure; then

				          ALLURE_ZIP=allure-${ALLURE_VERSION}.zip

				          wget -q https://github.com/allure-framework/allure2/releases/download/${ALLURE_VERSION}/${ALLURE_ZIP}

				          echo "${ALLURE_ZIP_MD5}  ${ALLURE_ZIP}" | md5sum -c

				          unzip -q ${ALLURE_ZIP}

				          echo "$(pwd)/allure-${ALLURE_VERSION}/bin" >> $GITHUB_PATH

				          rm -f ${ALLURE_ZIP}

				        fi

				      env:

				        ALLURE_VERSION: 2.19.0

				        ALLURE_ZIP_MD5: ced21401a1a8b9dfb68cee9e4c210464

				    - name: Upload Allure results

				      if: ${{ inputs.action == 'store' }}

				      env:

				        REPORT_PREFIX: reports/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}

				        RAW_PREFIX: reports-raw/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}

				        TEST_OUTPUT: /tmp/test_output

				        BUCKET: neon-github-public-dev

				        TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}

				      shell: bash -euxo pipefail {0}

				      run: |

				        # Add metadata

				        cat <<EOF > $TEST_OUTPUT/allure/results/executor.json

				          {

				            "name": "GitHub Actions",

				            "type": "github",

				            "url": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/latest/index.html",

				            "buildOrder": ${GITHUB_RUN_ID},

				            "buildName": "GitHub Actions Run #${{ github.run_number }}/${GITHUB_RUN_ATTEMPT}",

				            "buildUrl": "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT}",

				            "reportUrl": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html",

				            "reportName": "Allure Report"

				          }

				        EOF

				        cat <<EOF > $TEST_OUTPUT/allure/results/environment.properties

				          TEST_SELECTION=${{ inputs.test_selection }}

				          BUILD_TYPE=${{ inputs.build_type }}

				        EOF

				        ARCHIVE="${GITHUB_RUN_ID}-${TEST_SELECTION}-${GITHUB_RUN_ATTEMPT}-$(date +%s).tar.zst"

				        ZSTD_NBTHREADS=0

				        tar -C ${TEST_OUTPUT}/allure/results -cf ${ARCHIVE} --zstd .

				        aws s3 mv --only-show-errors ${ARCHIVE} "s3://${BUCKET}/${RAW_PREFIX}/${ARCHIVE}"

				    # Potentially we could have several running build for the same key (for example for the main branch),  so we use improvised lock for this

				    - name: Acquire Allure lock

				      if: ${{ inputs.action == 'generate' }}

				      shell: bash -euxo pipefail {0}

				      env:

				        LOCK_FILE: reports/${{ steps.calculate-vars.outputs.KEY }}/lock.txt

				        BUCKET: neon-github-public-dev

				        TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}

				      run: |

				        LOCK_TIMEOUT=300 # seconds

				        for _ in $(seq 1 5); do

				          for i in $(seq 1 ${LOCK_TIMEOUT}); do

				            LOCK_ADDED=$(aws s3api head-object --bucket neon-github-public-dev --key ${LOCK_FILE} | jq --raw-output '.LastModified' || true)

				            # `date --date="..."` is supported only by gnu date (i.e. it doesn't work on BSD/macOS)

				            if [ -z "${LOCK_ADDED}" ] || [ "$(( $(date +%s) - $(date --date="${LOCK_ADDED}" +%s) ))" -gt "${LOCK_TIMEOUT}" ]; then

				              break

				            fi

				            sleep 1

				          done

				          echo "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" > lock.txt

				          aws s3 mv --only-show-errors lock.txt "s3://${BUCKET}/${LOCK_FILE}"

				          # A double-check that exactly WE have acquired the lock

				          aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt

				          if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" ]; then

				            break

				          fi

				        done

				    - name: Generate and publish final Allure report

				      if: ${{ inputs.action == 'generate' }}

				      id: generate-report

				      env:

				        REPORT_PREFIX: reports/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}

				        RAW_PREFIX: reports-raw/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}

				        TEST_OUTPUT: /tmp/test_output

				        BUCKET: neon-github-public-dev

				      shell: bash -euxo pipefail {0}

				      run: |

				        # Get previously uploaded data for this run

				        ZSTD_NBTHREADS=0

				        s3_filepaths=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${RAW_PREFIX}/${GITHUB_RUN_ID}- | jq --raw-output  '.Contents[].Key')

				        if [ -z "$s3_filepaths" ]; then

				          # There's no previously uploaded data for this run

				          exit 0

				        fi

				        for s3_filepath in ${s3_filepaths}; do

				          aws s3 cp --only-show-errors "s3://${BUCKET}/${s3_filepath}" "${TEST_OUTPUT}/allure/"

				          archive=${TEST_OUTPUT}/allure/$(basename $s3_filepath)

				          mkdir -p ${archive%.tar.zst}

				          tar -xf ${archive} -C ${archive%.tar.zst}

				          rm -f ${archive}

				        done

				        # Get history trend

				        aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${REPORT_PREFIX}/latest/history" "${TEST_OUTPUT}/allure/latest/history" || true

				        # Generate report

				        allure generate --clean --output $TEST_OUTPUT/allure/report $TEST_OUTPUT/allure/*

				        # Replace a logo link with a redirect to the latest version of the report

				        sed -i 's|<a href="." class=|<a href="https://'${BUCKET}'.s3.amazonaws.com/'${REPORT_PREFIX}'/latest/index.html" class=|g' $TEST_OUTPUT/allure/report/app.js

				        # Upload a history and the final report (in this particular order to not to have duplicated history in 2 places)

				        aws s3 mv --recursive --only-show-errors "${TEST_OUTPUT}/allure/report/history" "s3://${BUCKET}/${REPORT_PREFIX}/latest/history"

				        aws s3 mv --recursive --only-show-errors "${TEST_OUTPUT}/allure/report" "s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}"

				        REPORT_URL=https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html

				        # Generate redirect

				        cat <<EOF > ./index.html

				          <!DOCTYPE html>

				          <meta charset="utf-8">

				          <title>Redirecting to ${REPORT_URL}</title>

				          <meta http-equiv="refresh" content="0; URL=${REPORT_URL}">

				        EOF

				        aws s3 cp --only-show-errors ./index.html "s3://${BUCKET}/${REPORT_PREFIX}/latest/index.html"

				        echo "[Allure Report](${REPORT_URL})" >> ${GITHUB_STEP_SUMMARY}

				        echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT

				    - name: Release Allure lock

				      if: ${{ inputs.action == 'generate' && always() }}

				      shell: bash -euxo pipefail {0}

				      env:

				        LOCK_FILE: reports/${{ steps.calculate-vars.outputs.KEY }}/lock.txt

				        BUCKET: neon-github-public-dev

				        TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}

				      run: |

				        aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt || exit 0

				        if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" ]; then

				          aws s3 rm "s3://${BUCKET}/${LOCK_FILE}"

				        fi

				    - uses: actions/github-script@v6

				      if: ${{ inputs.action == 'generate' && always() }}

				      env:

				        REPORT_URL: ${{ steps.generate-report.outputs.report-url }}

				        BUILD_TYPE: ${{ inputs.build_type }}

				        SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				      with:

				        script: |

				          const { REPORT_URL, BUILD_TYPE, SHA } = process.env

				          await github.rest.repos.createCommitStatus({

				            owner: context.repo.owner,

				            repo: context.repo.repo,

				            sha: `${SHA}`,

				            state: 'success',

				            target_url: `${REPORT_URL}`,

				            context: `Allure report / ${BUILD_TYPE}`,

				          })
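Note on the "Acquire Allure lock" step in the removed action above: the inner loop waits until either no lock object exists or the existing one is older than `LOCK_TIMEOUT` seconds. A minimal sketch of that predicate, with the timestamps passed in as epoch seconds, is shown below; `lock_is_free` is a hypothetical name, and the conversion of `LastModified` to epoch seconds (done with GNU `date --date` in the action) is left out.

```shell
# Hypothetical helper: succeeds when the lock may be taken.
# $1: lock creation time in epoch seconds (empty when no lock object exists)
# $2: current time in epoch seconds
# $3: timeout in seconds after which a lock counts as stale
lock_is_free() {
  local lock_added=$1 now=$2 timeout=$3
  [ -z "${lock_added}" ] || [ "$(( now - lock_added ))" -gt "${timeout}" ]
}

if lock_is_free "" "$(date +%s)" 300; then
  echo "no lock present"
fi
```

The predicate alone does not make acquisition atomic, which is why the action writes its own identifier to the lock file and reads it back to confirm ownership.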

									
										6

.github/actions/download/action.yml
									
				@@ -26,18 +26,18 @@ runs:

				        TARGET: ${{ inputs.path }}

				        ARCHIVE: /tmp/downloads/${{ inputs.name }}.tar.zst

				        SKIP_IF_DOES_NOT_EXIST: ${{ inputs.skip-if-does-not-exist }}

				        PREFIX: artifacts/${{ inputs.prefix || format('{0}/{1}', github.run_id, github.run_attempt) }}

				        PREFIX: artifacts/${{ inputs.prefix || format('{0}/{1}/{2}', github.event.pull_request.head.sha || github.sha, github.run_id, github.run_attempt) }}

				      run: |

				        BUCKET=neon-github-public-dev

				        FILENAME=$(basename $ARCHIVE)

				        S3_KEY=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${PREFIX%$GITHUB_RUN_ATTEMPT} | jq -r '.Contents[].Key' | grep ${FILENAME} | sort --version-sort | tail -1 || true)

				        S3_KEY=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${PREFIX%$GITHUB_RUN_ATTEMPT} | jq -r '.Contents[]?.Key' | grep ${FILENAME} | sort --version-sort | tail -1 || true)

				        if [ -z "${S3_KEY}" ]; then

				          if [ "${SKIP_IF_DOES_NOT_EXIST}" = "true" ]; then

				            echo 'SKIPPED=true' >> $GITHUB_OUTPUT

				            exit 0

				          else

				            echo 2>&1 "Neither s3://${BUCKET}/${PREFIX}/${FILENAME} nor its version from previous attempts exist"

				            echo >&2 "Neither s3://${BUCKET}/${PREFIX}/${FILENAME} nor its version from previous attempts exist"

				            exit 1

				          fi

				        fi
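Note on the `S3_KEY` lookup above: the run attempt is stripped from the end of `PREFIX`, all keys under the remaining prefix are listed, and `sort --version-sort | tail -1` picks the artifact from the highest attempt. The sketch below reproduces only the selection part on a made-up list of keys; `latest_key` is a hypothetical name, and it uses `grep -F` (fixed-string match) where the action uses a plain `grep`.

```shell
# Hypothetical helper: reads S3 keys on stdin and prints the key of the
# highest attempt that contains the given file name (nothing if none match).
# Requires GNU sort for --version-sort.
latest_key() {
  local filename=$1
  grep -F -- "${filename}" | sort --version-sort | tail -1 || true
}

# Made-up keys: attempt 10 must win over attempt 2
printf '%s\n' \
  "artifacts/abc/42/2/neon.tar.zst" \
  "artifacts/abc/42/10/neon.tar.zst" \
  "artifacts/abc/42/1/other.tar.zst" | latest_key neon.tar.zst
# prints artifacts/abc/42/10/neon.tar.zst
```

A plain lexicographic sort would rank attempt `10` before attempt `2`, which is the case version sort avoids.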

									
										12

.github/actions/neon-branch-create/action.yml
									
				@@ -3,14 +3,14 @@ description: 'Create Branch using API'

				inputs:

				  api_key:

				    desctiption: 'Neon API key'

				    description: 'Neon API key'

				    required: true

				  project_id:

				    desctiption: 'ID of the Project to create Branch in'

				    description: 'ID of the Project to create Branch in'

				    required: true

				  api_host:

				    desctiption: 'Neon API host'

				    default: console.stage.neon.tech

				    description: 'Neon API host'

				    default: console-stage.neon.build

				outputs:

				  dsn:

				    description: 'Created Branch DSN (for main database)'

				@@ -58,7 +58,7 @@ runs:

				        done

				        if [ -z "${branch_id}" ] || [ "${branch_id}" == "null" ]; then

				          echo 2>&1 "Failed to create branch after 10 attempts, the latest response was: ${branch}"

				          echo >&2 "Failed to create branch after 10 attempts, the latest response was: ${branch}"

				          exit 1

				        fi

				@@ -122,7 +122,7 @@ runs:

				        done

				        if [ -z "${password}" ] || [ "${password}" == "null" ]; then

				          echo 2>&1 "Failed to reset password after 10 attempts, the latest response was: ${reset_password}"

				          echo >&2 "Failed to reset password after 10 attempts, the latest response was: ${reset_password}"

				          exit 1

				        fi

									
										12

.github/actions/neon-branch-delete/action.yml
									
				@@ -3,17 +3,17 @@ description: 'Delete Branch using API'

				inputs:

				  api_key:

				    desctiption: 'Neon API key'

				    description: 'Neon API key'

				    required: true

				  project_id:

				    desctiption: 'ID of the Project which should be deleted'

				    description: 'ID of the Project which should be deleted'

				    required: true

				  branch_id:

				    desctiption: 'ID of the branch to delete'

				    description: 'ID of the branch to delete'

				    required: true

				  api_host:

				    desctiption: 'Neon API host'

				    default: console.stage.neon.tech

				    description: 'Neon API host'

				    default: console-stage.neon.build

				runs:

				  using: "composite"

				@@ -48,7 +48,7 @@ runs:

				        done

				        if [ -z "${branch_id}" ] || [ "${branch_id}" == "null" ]; then

				          echo 2>&1 "Failed to delete branch after 10 attempts, the latest response was: ${deleted_branch}"

				          echo >&2 "Failed to delete branch after 10 attempts, the latest response was: ${deleted_branch}"

				          exit 1

				        fi

				      env:

									
										28

.github/actions/neon-project-create/action.yml
									
				@@ -3,17 +3,23 @@ description: 'Create Neon Project using API'

				inputs:

				  api_key:

				    desctiption: 'Neon API key'

				    description: 'Neon API key'

				    required: true

				  region_id:

				    desctiption: 'Region ID, if not set the project will be created in the default region'

				    description: 'Region ID, if not set the project will be created in the default region'

				    default: aws-us-east-2

				  postgres_version:

				    desctiption: 'Postgres version; default is 15'

				    default: 15

				    description: 'Postgres version; default is 15'

				    default: '15'

				  api_host:

				    desctiption: 'Neon API host'

				    default: console.stage.neon.tech

				    description: 'Neon API host'

				    default: console-stage.neon.build

				  provisioner:

				    description: 'k8s-pod or k8s-neonvm'

				    default: 'k8s-pod'

				  compute_units:

				    description: '[Min, Max] compute units; Min and Max are used for k8s-neonvm with autoscaling, for k8s-pod values Min and Max should be equal'

				    default: '[1, 1]'

				outputs:

				  dsn:

				@@ -31,6 +37,10 @@ runs:

				      # A shell without `set -x` to not to expose password/dsn in logs

				      shell: bash -euo pipefail {0}

				      run: |

				        if [ "${PROVISIONER}" == "k8s-pod" ] && [ "${MIN_CU}" != "${MAX_CU}" ]; then

				          echo >&2 "For k8s-pod provisioner MIN_CU should be equal to MAX_CU"

				        fi

				        project=$(curl \

				          "https://${API_HOST}/api/v2/projects" \

				          --fail \

				@@ -42,6 +52,9 @@ runs:

				              \"name\": \"Created by actions/neon-project-create; GITHUB_RUN_ID=${GITHUB_RUN_ID}\",

				              \"pg_version\": ${POSTGRES_VERSION},

				              \"region_id\": \"${REGION_ID}\",

				              \"provisioner\": \"${PROVISIONER}\",

				              \"autoscaling_limit_min_cu\": ${MIN_CU},

				              \"autoscaling_limit_max_cu\": ${MAX_CU},

				              \"settings\": { }

				            }

				          }")

				@@ -62,3 +75,6 @@ runs:

				        API_KEY: ${{ inputs.api_key }}

				        REGION_ID: ${{ inputs.region_id }}

				        POSTGRES_VERSION: ${{ inputs.postgres_version }}

				        PROVISIONER: ${{ inputs.provisioner }}

				        MIN_CU: ${{ fromJSON(inputs.compute_units)[0] }}

				        MAX_CU: ${{ fromJSON(inputs.compute_units)[1] }}
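Note on the `k8s-pod` check added above: as written in the diff, the check prints a message to stderr and then continues with the API call. If the intent is to stop on a mismatch (an assumption, the diff does not say so), the check could return a non-zero status, as in this sketch; `validate_compute_units` is a hypothetical name.

```shell
# Hypothetical helper: fails when a k8s-pod project is given a CU range.
# $1: provisioner ("k8s-pod" or "k8s-neonvm")
# $2: minimum compute units
# $3: maximum compute units
validate_compute_units() {
  local provisioner=$1 min_cu=$2 max_cu=$3
  if [ "${provisioner}" = "k8s-pod" ] && [ "${min_cu}" != "${max_cu}" ]; then
    echo >&2 "For k8s-pod provisioner MIN_CU should be equal to MAX_CU"
    return 1
  fi
}

validate_compute_units k8s-pod 1 1 && echo "ok"
validate_compute_units k8s-neonvm 0.25 2 && echo "ok"
```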

									
										8

.github/actions/neon-project-delete/action.yml
									
				@@ -3,14 +3,14 @@ description: 'Delete Neon Project using API'

				inputs:

				  api_key:

				    desctiption: 'Neon API key'

				    description: 'Neon API key'

				    required: true

				  project_id:

				    desctiption: 'ID of the Project to delete'

				    description: 'ID of the Project to delete'

				    required: true

				  api_host:

				    desctiption: 'Neon API host'

				    default: console.stage.neon.tech

				    description: 'Neon API host'

				    default: console-stage.neon.build

				runs:

				  using: "composite"

									
										86

.github/actions/run-python-test-set/action.yml
									
				@@ -36,14 +36,18 @@ inputs:

				    description: 'Region name for real s3 tests'

				    required: false

				    default: ''

				  real_s3_access_key_id:

				    description: 'Access key id'

				  rerun_flaky:

				    description: 'Whether to rerun flaky tests'

				    required: false

				    default: ''

				  real_s3_secret_access_key:

				    description: 'Secret access key'

				    default: 'false'

				  pg_version:

				    description: 'Postgres version to use for tests'

				    required: false

				    default: ''

				    default: 'v14'

				  benchmark_durations:

				    description: 'benchmark durations JSON'

				    required: false

				    default: '{}'

				runs:

				  using: "composite"

				@@ -52,38 +56,40 @@ runs:

				      if: inputs.build_type != 'remote'

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ inputs.build_type }}-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact

				        path: /tmp/neon

				    - name: Download Neon binaries for the previous release

				      if: inputs.build_type != 'remote'

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ inputs.build_type }}-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact

				        path: /tmp/neon-previous

				        prefix: latest

				    - name: Download compatibility snapshot for Postgres 14

				    - name: Download compatibility snapshot

				      if: inputs.build_type != 'remote'

				      uses: ./.github/actions/download

				      with:

				        name: compatibility-snapshot-${{ inputs.build_type }}-pg14

				        path: /tmp/compatibility_snapshot_pg14

				        name: compatibility-snapshot-${{ inputs.build_type }}-pg${{ inputs.pg_version }}

				        path: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}

				        prefix: latest

				        # The lack of compatibility snapshot (for example, for the new Postgres version)

				        # shouldn't fail the whole job. Only relevant test should fail.

				        skip-if-does-not-exist: true

				    - name: Checkout

				      if: inputs.needs_postgres_source == 'true'

				      uses: actions/checkout@v3

				      uses: actions/checkout@v4

				      with:

				        submodules: true

				        fetch-depth: 1

				    - name: Cache poetry deps

				      id: cache_poetry

				      uses: actions/cache@v3

				      uses: actions/cache@v4

				      with:

				        path: ~/.cache/pypoetry/virtualenvs

				        key: v1-${{ runner.os }}-python-deps-${{ hashFiles('poetry.lock') }}

				        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}

				    - name: Install Python deps

				      shell: bash -euxo pipefail {0}

				@@ -96,18 +102,18 @@ runs:

				        COMPATIBILITY_POSTGRES_DISTRIB_DIR: /tmp/neon-previous/pg_install

				        TEST_OUTPUT: /tmp/test_output

				        BUILD_TYPE: ${{ inputs.build_type }}

				        AWS_ACCESS_KEY_ID: ${{ inputs.real_s3_access_key_id }}

				        AWS_SECRET_ACCESS_KEY: ${{ inputs.real_s3_secret_access_key }}

				        COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg14

				        COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}

				        ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'backward compatibility breakage')

				        ALLOW_FORWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'forward compatibility breakage')

				        RERUN_FLAKY: ${{ inputs.rerun_flaky }}

				        PG_VERSION: ${{ inputs.pg_version }}

				      shell: bash -euxo pipefail {0}

				      run: |

				        # PLATFORM will be embedded in the perf test report

				        # and it is needed to distinguish different environments

				        export PLATFORM=${PLATFORM:-github-actions-selfhosted}

				        export POSTGRES_DISTRIB_DIR=${POSTGRES_DISTRIB_DIR:-/tmp/neon/pg_install}

				        export DEFAULT_PG_VERSION=${DEFAULT_PG_VERSION:-14}

				        export DEFAULT_PG_VERSION=${PG_VERSION#v}

				        if [ "${BUILD_TYPE}" = "remote" ]; then

				          export REMOTE_ENV=1

				@@ -143,6 +149,25 @@ runs:

				          EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"

				        fi

				        if [ "${RERUN_FLAKY}" == "true" ]; then

				          mkdir -p $TEST_OUTPUT

				          poetry run ./scripts/flaky_tests.py "${TEST_RESULT_CONNSTR}" \

				                                              --days 7 \

				                                              --output "$TEST_OUTPUT/flaky.json" \

				                                              --pg-version "${DEFAULT_PG_VERSION}" \

				                                              --build-type "${BUILD_TYPE}"

				          EXTRA_PARAMS="--flaky-tests-json $TEST_OUTPUT/flaky.json $EXTRA_PARAMS"

				        fi

				        # We use pytest-split plugin to run benchmarks in parallel on different CI runners

				        if [ "${TEST_SELECTION}" = "test_runner/performance" ] && [ "${{ inputs.build_type }}" != "remote" ]; then

				          mkdir -p $TEST_OUTPUT

				          echo '${{ inputs.benchmark_durations || '{}' }}' > $TEST_OUTPUT/benchmark_durations.json

				          EXTRA_PARAMS="--durations-path $TEST_OUTPUT/benchmark_durations.json $EXTRA_PARAMS"

				        fi

				        if [[ "${{ inputs.build_type }}" == "debug" ]]; then

				          cov_prefix=(scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage run)

				        elif [[ "${{ inputs.build_type }}" == "release" ]]; then

				@@ -158,8 +183,7 @@ runs:

				        # Run the tests.

				        #

				        # The junit.xml file allows CI tools to display more fine-grained test information

				        # in its "Tests" tab in the results page.

				        # --alluredir saves test results in Allure format (in a specified directory)

				        # --verbose prints name of each test (helpful when there are

				        # multiple tests in one file)

				        # -rA prints summary in the end

				@@ -168,7 +192,6 @@ runs:

				        #

				        mkdir -p $TEST_OUTPUT/allure/results

				        "${cov_prefix[@]}" ./scripts/pytest \

				          --junitxml=$TEST_OUTPUT/junit.xml \

				          --alluredir=$TEST_OUTPUT/allure/results \

				          --tb=short \

				          --verbose \

				@@ -180,19 +203,18 @@ runs:

				          scripts/generate_and_push_perf_report.sh

				        fi

				    - name: Upload compatibility snapshot for Postgres 14

				    - name: Upload compatibility snapshot

				      if: github.ref_name == 'release'

				      uses: ./.github/actions/upload

				      with:

				        name: compatibility-snapshot-${{ inputs.build_type }}-pg14-${{ github.run_id }}

				        # The path includes a test name (test_create_snapshot) and directory that the test creates (compatibility_snapshot_pg14), keep the path in sync with the test

				        path: /tmp/test_output/test_create_snapshot/compatibility_snapshot_pg14/

				        name: compatibility-snapshot-${{ inputs.build_type }}-pg${{ inputs.pg_version }}-${{ github.run_id }}

				        # Directory is created by test_compatibility.py::test_create_snapshot, keep the path in sync with the test

				        path: /tmp/test_output/compatibility_snapshot_pg${{ inputs.pg_version }}/

				        prefix: latest

				    - name: Create Allure report

				      if: success() || failure()

				      uses: ./.github/actions/allure-report

				    - name: Upload test results

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-store

				      with:

				        action: store

				        build_type: ${{ inputs.build_type }}

				        test_selection: ${{ inputs.test_selection }}

				        report-dir: /tmp/test_output/allure/results

				        unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}

									
										10

.github/actions/upload/action.yml
									
										vendored
									
												
				@@ -8,7 +8,7 @@ inputs:

				    description: "A directory or file to upload"

				    required: true

				  prefix:

				    description: "S3 prefix. Default is '${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"

				    description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"

				    required: false

				runs:

				@@ -23,7 +23,7 @@ runs:

				        mkdir -p $(dirname $ARCHIVE)

				        if [ -f ${ARCHIVE} ]; then

				          echo 2>&1 "File ${ARCHIVE} already exist. Something went wrong before"

				          echo >&2 "File ${ARCHIVE} already exist. Something went wrong before"

				          exit 1

				        fi

				@@ -33,10 +33,10 @@ runs:

				        elif [ -f ${SOURCE} ]; then

				          time tar -cf ${ARCHIVE} --zstd ${SOURCE}

				        elif ! ls ${SOURCE} > /dev/null 2>&1; then

				          echo 2>&1 "${SOURCE} does not exist"

				          echo >&2 "${SOURCE} does not exist"

				          exit 2

				        else

				          echo 2>&1 "${SOURCE} is neither a directory nor a file, do not know how to handle it"

				          echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"

				          exit 3

				        fi

				@@ -45,7 +45,7 @@ runs:

				      env:

				        SOURCE: ${{ inputs.path }}

				        ARCHIVE: /tmp/uploads/${{ inputs.name }}.tar.zst

				        PREFIX: artifacts/${{ inputs.prefix || format('{0}/{1}', github.run_id, github.run_attempt) }}

				        PREFIX: artifacts/${{ inputs.prefix || format('{0}/{1}/{2}', github.event.pull_request.head.sha || github.sha, github.run_id , github.run_attempt) }}

				      run: |

				        BUCKET=neon-github-public-dev

				        FILENAME=$(basename $ARCHIVE)
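
The `echo 2>&1` to `echo >&2` change in the hunks above matters because `2>&1` only points stderr at stdout, so the message itself still lands on stdout. A minimal sketch of the difference, assuming bash; the helper names `warn_old` and `warn_new` are hypothetical and do not exist in the action:

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; neither exists in the action.

# Old form: "2>&1" duplicates stderr onto stdout, but echo still writes to stdout.
warn_old() { echo 2>&1 "$1"; }

# New form: ">&2" redirects echo's stdout to stderr, which is the intent.
warn_new() { echo >&2 "$1"; }

# Capture only what each helper writes to stdout (stderr is discarded).
old_stdout=$(warn_old "msg" 2>/dev/null)
new_stdout=$(warn_new "msg" 2>/dev/null)

echo "old form on stdout: '${old_stdout}'"
echo "new form on stdout: '${new_stdout}'"
```

With the old form the error text is mixed into stdout; with the new form stdout stays empty and the text goes to stderr.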

5

.github/ansible/.gitignore vendored


@@ -1,5 +0,0 @@
 neon_install.tar.gz
 .neon_current_version
 collections/*
 !collections/.keep

									
										12

.github/ansible/ansible.cfg
									
										vendored
									
											
				@@ -1,12 +0,0 @@

				[defaults]

				localhost_warning = False

				host_key_checking = False

				timeout = 30

				[ssh_connection]

				ssh_args   = -F ./ansible.ssh.cfg

				# teleport doesn't support sftp yet https://github.com/gravitational/teleport/issues/7127

				# and scp neither worked for me

				transfer_method = piped

				pipelining = True

									
										15

.github/ansible/ansible.ssh.cfg
									
										vendored
									
											
				@@ -1,15 +0,0 @@

				# Remove this once https://github.com/gravitational/teleport/issues/10918 is fixed

				# (use pre 8.5 option name to cope with old ssh in CI)

				PubkeyAcceptedKeyTypes +ssh-rsa-cert-v01@openssh.com

				Host tele.zenith.tech

				    User admin

				    Port 3023

				    StrictHostKeyChecking no

				    UserKnownHostsFile /dev/null

				Host * !tele.zenith.tech

				    User admin

				    StrictHostKeyChecking no

				    UserKnownHostsFile /dev/null

				    ProxyJump tele.zenith.tech

									
										193

.github/ansible/deploy.yaml
									
										vendored
									
											
				@@ -1,193 +0,0 @@

				- name: Upload Neon binaries

				  hosts: storage

				  gather_facts: False

				  remote_user: "{{ remote_user }}"

				  tasks:

				    - name: get latest version of Neon binaries

				      register: current_version_file

				      set_fact:

				        current_version: "{{ lookup('file', '.neon_current_version') | trim }}"

				      tags:

				      - pageserver

				      - safekeeper

				    - name: inform about versions

				      debug:

				        msg: "Version to deploy - {{ current_version }}"

				      tags:

				      - pageserver

				      - safekeeper

				    - name: upload and extract Neon binaries to /usr/local

				      ansible.builtin.unarchive:

				        owner: root

				        group: root

				        src: neon_install.tar.gz

				        dest: /usr/local

				      become: true

				      tags:

				      - pageserver

				      - safekeeper

				      - binaries

				      - putbinaries

				- name: Deploy pageserver

				  hosts: pageservers

				  gather_facts: False

				  remote_user: "{{ remote_user }}"

				  tasks:

				    - name: upload init script

				      when: console_mgmt_base_url is defined

				      ansible.builtin.template:

				        src: scripts/init_pageserver.sh

				        dest: /tmp/init_pageserver.sh

				        owner: root

				        group: root

				        mode: '0755'

				      become: true

				      tags:

				      - pageserver

				    - name: init pageserver

				      shell:

				        cmd: /tmp/init_pageserver.sh

				      args:

				        creates: "/storage/pageserver/data/tenants"

				      environment:

				        NEON_REPO_DIR: "/storage/pageserver/data"

				        LD_LIBRARY_PATH: "/usr/local/v14/lib"

				      become: true

				      tags:

				      - pageserver

				    - name: read the existing remote pageserver config

				      ansible.builtin.slurp:

				        src: /storage/pageserver/data/pageserver.toml

				      register: _remote_ps_config

				      tags:

				      - pageserver

				    - name: parse the existing pageserver configuration

				      ansible.builtin.set_fact:

				        _existing_ps_config: "{{ _remote_ps_config['content'] | b64decode | sivel.toiletwater.from_toml }}"

				      tags:

				      - pageserver

				    - name: construct the final pageserver configuration dict

				      ansible.builtin.set_fact:

				        pageserver_config: "{{ pageserver_config_stub | combine({'id': _existing_ps_config.id }) }}"

				      tags:

				      - pageserver

				    - name: template the pageserver config

				      template:

				        src: templates/pageserver.toml.j2

				        dest: /storage/pageserver/data/pageserver.toml

				      become: true

				      tags:

				      - pageserver

				    - name: upload systemd service definition

				      ansible.builtin.template:

				        src: systemd/pageserver.service

				        dest: /etc/systemd/system/pageserver.service

				        owner: root

				        group: root

				        mode: '0644'

				      become: true

				      tags:

				      - pageserver

				    - name: start systemd service

				      ansible.builtin.systemd:

				        daemon_reload: yes

				        name: pageserver

				        enabled: yes

				        state: restarted

				      become: true

				      tags:

				      - pageserver

				    - name: post version to console

				      when: console_mgmt_base_url is defined

				      shell:

				        cmd: |

				          INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

				          curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version

				          curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/pageservers

				      tags:

				      - pageserver

				- name: Deploy safekeeper

				  hosts: safekeepers

				  gather_facts: False

				  remote_user: "{{ remote_user }}"

				  tasks:

				    - name: upload init script

				      when: console_mgmt_base_url is defined

				      ansible.builtin.template:

				        src: scripts/init_safekeeper.sh

				        dest: /tmp/init_safekeeper.sh

				        owner: root

				        group: root

				        mode: '0755'

				      become: true

				      tags:

				      - safekeeper

				    - name: init safekeeper

				      shell:

				        cmd: /tmp/init_safekeeper.sh

				      args:

				        creates: "/storage/safekeeper/data/safekeeper.id"

				      environment:

				        NEON_REPO_DIR: "/storage/safekeeper/data"

				        LD_LIBRARY_PATH: "/usr/local/v14/lib"

				      become: true

				      tags:

				      - safekeeper

				    # in the future safekeepers should discover pageservers byself

				    # but currently use first pageserver that was discovered

				    - name: set first pageserver var for safekeepers

				      set_fact:

				        first_pageserver: "{{ hostvars[groups['pageservers'][0]]['inventory_hostname'] }}"

				      tags:

				      - safekeeper

				    - name: upload systemd service definition

				      ansible.builtin.template:

				        src: systemd/safekeeper.service

				        dest: /etc/systemd/system/safekeeper.service

				        owner: root

				        group: root

				        mode: '0644'

				      become: true

				      tags:

				      - safekeeper

				    - name: start systemd service

				      ansible.builtin.systemd:

				        daemon_reload: yes

				        name: safekeeper

				        enabled: yes

				        state: restarted

				      become: true

				      tags:

				      - safekeeper

				    - name: post version to console

				      when: console_mgmt_base_url is defined

				      shell:

				        cmd: |

				          INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

				          curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version

				          curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/safekeepers

				      tags:

				      - safekeeper

									
										42

.github/ansible/get_binaries.sh
									
										vendored
									
											
				@@ -1,42 +0,0 @@

				#!/bin/bash

				set -e

				if [ -n "${DOCKER_TAG}" ]; then

				  # Verson is DOCKER_TAG but without prefix

				  VERSION=$(echo $DOCKER_TAG | sed 's/^.*-//g')

				else

				  echo "Please set DOCKER_TAG environment variable"

				  exit 1

				fi

				# do initial cleanup

				rm -rf neon_install postgres_install.tar.gz neon_install.tar.gz .neon_current_version

				mkdir neon_install

				# retrieve binaries from docker image

				echo "getting binaries from docker image"

				docker pull --quiet neondatabase/neon:${DOCKER_TAG}

				ID=$(docker create neondatabase/neon:${DOCKER_TAG})

				docker cp ${ID}:/data/postgres_install.tar.gz .

				tar -xzf postgres_install.tar.gz -C neon_install

				mkdir neon_install/bin/

				docker cp ${ID}:/usr/local/bin/pageserver neon_install/bin/

				docker cp ${ID}:/usr/local/bin/pageserver_binutils neon_install/bin/

				docker cp ${ID}:/usr/local/bin/safekeeper neon_install/bin/

				docker cp ${ID}:/usr/local/bin/storage_broker neon_install/bin/

				docker cp ${ID}:/usr/local/bin/proxy neon_install/bin/

				docker cp ${ID}:/usr/local/v14/bin/ neon_install/v14/bin/

				docker cp ${ID}:/usr/local/v15/bin/ neon_install/v15/bin/

				docker cp ${ID}:/usr/local/v14/lib/ neon_install/v14/lib/

				docker cp ${ID}:/usr/local/v15/lib/ neon_install/v15/lib/

				docker rm -vf ${ID}

				# store version to file (for ansible playbooks) and create binaries tarball

				echo ${VERSION} > neon_install/.neon_current_version

				echo ${VERSION} > .neon_current_version

				tar -czf neon_install.tar.gz -C neon_install .

				# do final cleaup

				rm -rf neon_install postgres_install.tar.gz
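
For reference, the `sed 's/^.*-//g'` call in the removed script keeps only the part of `DOCKER_TAG` after the last hyphen, because `.*` is greedy. A small sketch of that behavior next to an equivalent parameter expansion; the tag value `release-1234` is made up for illustration and is not taken from the repository:

```shell
#!/bin/bash
# The tag value is an assumption for illustration only.
DOCKER_TAG="release-1234"

# Form used in the script: greedy ".*-" removes everything through the last hyphen.
version_sed=$(echo "$DOCKER_TAG" | sed 's/^.*-//g')

# Equivalent parameter expansion: "##*-" strips the longest prefix ending in "-".
version_expansion=${DOCKER_TAG##*-}

echo "$version_sed"
echo "$version_expansion"
```

Both forms yield the same suffix; the expansion avoids spawning a subprocess.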

									
										38

.github/ansible/prod.ap-southeast-1.hosts.yaml
									
										vendored
									
											
				@@ -1,38 +0,0 @@

				storage:

				  vars:

				    bucket_name: neon-prod-storage-ap-southeast-1

				    bucket_region: ap-southeast-1

				    console_mgmt_base_url: http://neon-internal-api.aws.neon.tech

				    broker_endpoint: http://storage-broker-lb.epsilon.ap-southeast-1.internal.aws.neon.tech:50051

				    pageserver_config_stub:

				      pg_distrib_dir: /usr/local

				      metric_collection_endpoint: http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events

				      metric_collection_interval: 10min

				      remote_storage:

				        bucket_name: "{{ bucket_name }}"

				        bucket_region: "{{ bucket_region }}"

				        prefix_in_bucket: "pageserver/v1"

				    safekeeper_s3_prefix: safekeeper/v1/wal

				    hostname_suffix: ""

				    remote_user: ssm-user

				    ansible_aws_ssm_region: ap-southeast-1

				    ansible_aws_ssm_bucket_name: neon-prod-storage-ap-southeast-1

				    console_region_id: aws-ap-southeast-1

				    sentry_environment: production

				  children:

				    pageservers:

				      hosts:

				        pageserver-0.ap-southeast-1.aws.neon.tech:

				          ansible_host:  i-064de8ea28bdb495b

				        pageserver-1.ap-southeast-1.aws.neon.tech:

				          ansible_host:  i-0b180defcaeeb6b93

				    safekeepers:

				      hosts:

				        safekeeper-0.ap-southeast-1.aws.neon.tech:

				          ansible_host:  i-0d6f1dc5161eef894

				        safekeeper-2.ap-southeast-1.aws.neon.tech:

				          ansible_host:  i-04fb63634e4679eb9

				        safekeeper-3.ap-southeast-1.aws.neon.tech:

				          ansible_host:  i-05481f3bc88cfc2d4

									
										38

.github/ansible/prod.eu-central-1.hosts.yaml
									
										vendored
									
											
				@@ -1,38 +0,0 @@

				storage:

				  vars:

				    bucket_name: neon-prod-storage-eu-central-1

				    bucket_region: eu-central-1

				    console_mgmt_base_url: http://neon-internal-api.aws.neon.tech

				    broker_endpoint: http://storage-broker-lb.gamma.eu-central-1.internal.aws.neon.tech:50051

				    pageserver_config_stub:

				      pg_distrib_dir: /usr/local

				      metric_collection_endpoint: http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events

				      metric_collection_interval: 10min

				      remote_storage:

				        bucket_name: "{{ bucket_name }}"

				        bucket_region: "{{ bucket_region }}"

				        prefix_in_bucket: "pageserver/v1"

				    safekeeper_s3_prefix: safekeeper/v1/wal

				    hostname_suffix: ""

				    remote_user: ssm-user

				    ansible_aws_ssm_region: eu-central-1

				    ansible_aws_ssm_bucket_name: neon-prod-storage-eu-central-1

				    console_region_id: aws-eu-central-1

				    sentry_environment: production

				  children:

				    pageservers:

				      hosts:

				        pageserver-0.eu-central-1.aws.neon.tech:

				          ansible_host:  i-0cd8d316ecbb715be

				        pageserver-1.eu-central-1.aws.neon.tech:

				          ansible_host:  i-090044ed3d383fef0

				    safekeepers:

				      hosts:

				        safekeeper-0.eu-central-1.aws.neon.tech:

				          ansible_host:  i-0b238612d2318a050

				        safekeeper-1.eu-central-1.aws.neon.tech:

				          ansible_host:  i-07b9c45e5c2637cd4

				        safekeeper-2.eu-central-1.aws.neon.tech:

				          ansible_host:  i-020257302c3c93d88

									
										41

.github/ansible/prod.us-east-2.hosts.yaml
									
										vendored
									
											
				@@ -1,41 +0,0 @@

				storage:

				  vars:

				    bucket_name: neon-prod-storage-us-east-2

				    bucket_region: us-east-2

				    console_mgmt_base_url: http://neon-internal-api.aws.neon.tech

				    broker_endpoint: http://storage-broker-lb.delta.us-east-2.internal.aws.neon.tech:50051

				    pageserver_config_stub:

				      pg_distrib_dir: /usr/local

				      metric_collection_endpoint: http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events

				      metric_collection_interval: 10min

				      remote_storage:

				        bucket_name: "{{ bucket_name }}"

				        bucket_region: "{{ bucket_region }}"

				        prefix_in_bucket: "pageserver/v1"

				    safekeeper_s3_prefix: safekeeper/v1/wal

				    hostname_suffix: ""

				    remote_user: ssm-user

				    ansible_aws_ssm_region: us-east-2

				    ansible_aws_ssm_bucket_name: neon-prod-storage-us-east-2

				    console_region_id: aws-us-east-2

				    sentry_environment: production

				  children:

				    pageservers:

				      hosts:

				        pageserver-0.us-east-2.aws.neon.tech:

				          ansible_host:  i-062227ba7f119eb8c

				        pageserver-1.us-east-2.aws.neon.tech:

				          ansible_host:  i-0b3ec0afab5968938

				        pageserver-2.us-east-2.aws.neon.tech:

				          ansible_host:  i-0d7a1c4325e71421d

				    safekeepers:

				      hosts:

				        safekeeper-0.us-east-2.aws.neon.tech:

				          ansible_host:  i-0e94224750c57d346

				        safekeeper-1.us-east-2.aws.neon.tech:

				          ansible_host:  i-06d113fb73bfddeb0

				        safekeeper-2.us-east-2.aws.neon.tech:

				          ansible_host:  i-09f66c8e04afff2e8

									
										43

.github/ansible/prod.us-west-2.hosts.yaml
									
										vendored
									
											
				@@ -1,43 +0,0 @@

				storage:

				  vars:

				    bucket_name: neon-prod-storage-us-west-2

				    bucket_region: us-west-2

				    console_mgmt_base_url: http://neon-internal-api.aws.neon.tech

				    broker_endpoint: http://storage-broker-lb.eta.us-west-2.internal.aws.neon.tech:50051

				    pageserver_config_stub:

				      pg_distrib_dir: /usr/local

				      metric_collection_endpoint: http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events

				      metric_collection_interval: 10min

				      remote_storage:

				        bucket_name: "{{ bucket_name }}"

				        bucket_region: "{{ bucket_region }}"

				        prefix_in_bucket: "pageserver/v1"

				    safekeeper_s3_prefix: safekeeper/v1/wal

				    hostname_suffix: ""

				    remote_user: ssm-user

				    ansible_aws_ssm_region: us-west-2

				    ansible_aws_ssm_bucket_name: neon-prod-storage-us-west-2

				    console_region_id: aws-us-west-2-new

				    sentry_environment: production

				  children:

				    pageservers:

				      hosts:

				        pageserver-0.us-west-2.aws.neon.tech:

				          ansible_host: i-0d9f6dfae0e1c780d 

				        pageserver-1.us-west-2.aws.neon.tech:

				          ansible_host: i-0c834be1dddba8b3f

				        pageserver-2.us-west-2.aws.neon.tech:

				          ansible_host: i-051642d372c0a4f32

				        pageserver-3.us-west-2.aws.neon.tech:

				          ansible_host: i-00c3844beb9ad1c6b

				    safekeepers:

				      hosts:

				        safekeeper-0.us-west-2.aws.neon.tech:

				          ansible_host: i-00719d8a74986fda6

				        safekeeper-1.us-west-2.aws.neon.tech:

				          ansible_host: i-074682f9d3c712e7c

				        safekeeper-2.us-west-2.aws.neon.tech:

				          ansible_host: i-042b7efb1729d7966

									
										33

.github/ansible/scripts/init_pageserver.sh
									
										vendored
									
											
				@@ -1,33 +0,0 @@

				#!/bin/sh

				# fetch params from meta-data service

				INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

				AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)

				# store fqdn hostname in var

				HOST=$(hostname -f)

				cat <<EOF | tee /tmp/payload

				{

				  "version": 1,

				  "host": "${HOST}",

				  "port": 6400,

				  "region_id": "{{ console_region_id }}",

				  "instance_id": "${INSTANCE_ID}",

				  "http_host": "${HOST}",

				  "http_port": 9898,

				  "active": false,

				  "availability_zone_id": "${AZ_ID}"

				}

				EOF

				# check if pageserver already registered or not

				if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/${INSTANCE_ID} -o /dev/null; then

				    # not registered, so register it now

				    ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers -d@/tmp/payload | jq -r '.id')

				    # init pageserver

				    sudo -u pageserver /usr/local/bin/pageserver -c "id=${ID}" -c "pg_distrib_dir='/usr/local'" --init -D /storage/pageserver/data

				fi

									
										31

.github/ansible/scripts/init_safekeeper.sh
									
										vendored
									
											
				@@ -1,31 +0,0 @@

				#!/bin/sh

				# fetch params from meta-data service

				INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

				AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)

				# store fqdn hostname in var

				HOST=$(hostname -f)

				cat <<EOF | tee /tmp/payload

				{

				  "version": 1,

				  "host": "${HOST}",

				  "port": 6500,

				  "http_port": 7676,

				  "region_id": "{{ console_region_id }}",

				  "instance_id": "${INSTANCE_ID}",

				  "availability_zone_id": "${AZ_ID}",

				  "active": false

				}

				EOF

				# check if safekeeper already registered or not

				if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/${INSTANCE_ID} -o /dev/null; then

				    # not registered, so register it now

				    ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers -d@/tmp/payload | jq -r '.id')

				    # init safekeeper

				    sudo -u safekeeper /usr/local/bin/safekeeper --id ${ID} --init -D /storage/safekeeper/data

				fi

2

.github/ansible/ssm_config vendored


@@ -1,2 +0,0 @@
 ansible_connection: aws_ssm
 ansible_python_interpreter: /usr/bin/python3

									
										41

.github/ansible/staging.eu-west-1.hosts.yaml
									
										vendored
									
											
				@@ -1,41 +0,0 @@

				storage:

				  vars:

				    bucket_name: neon-dev-storage-eu-west-1

				    bucket_region: eu-west-1

				    console_mgmt_base_url: http://neon-internal-api.aws.neon.build

				    broker_endpoint: http://storage-broker-lb.zeta.eu-west-1.internal.aws.neon.build:50051

				    pageserver_config_stub:

				      pg_distrib_dir: /usr/local

				      metric_collection_endpoint: http://neon-internal-api.aws.neon.build/billing/api/v1/usage_events

				      metric_collection_interval: 10min

				      tenant_config:

				        eviction_policy:

				          kind: "LayerAccessThreshold"

				          period: "20m"

				          threshold: "20m"

				      remote_storage:

				        bucket_name: "{{ bucket_name }}"

				        bucket_region: "{{ bucket_region }}"

				        prefix_in_bucket: "pageserver/v1"

				    safekeeper_s3_prefix: safekeeper/v1/wal

				    hostname_suffix: ""

				    remote_user: ssm-user

				    ansible_aws_ssm_region: eu-west-1

				    ansible_aws_ssm_bucket_name: neon-dev-storage-eu-west-1

				    console_region_id: aws-eu-west-1

				    sentry_environment: staging

				  children:

				    pageservers:

				      hosts:

				        pageserver-0.eu-west-1.aws.neon.build:

				          ansible_host: i-01d496c5041c7f34c

				    safekeepers:

				      hosts:

				        safekeeper-0.eu-west-1.aws.neon.build:

				          ansible_host: i-05226ef85722831bf

				        safekeeper-1.eu-west-1.aws.neon.build:

				          ansible_host: i-06969ee1bf2958bfc

				        safekeeper-2.eu-west-1.aws.neon.build:

				          ansible_host: i-087892e9625984a0b

									
										51

.github/ansible/staging.us-east-2.hosts.yaml
									
										vendored
									
											
				@@ -1,51 +0,0 @@

				storage:

				  vars:

				    bucket_name: neon-staging-storage-us-east-2

				    bucket_region: us-east-2

				    console_mgmt_base_url: http://neon-internal-api.aws.neon.build

				    broker_endpoint: http://storage-broker-lb.beta.us-east-2.internal.aws.neon.build:50051

				    pageserver_config_stub:

				      pg_distrib_dir: /usr/local

				      metric_collection_endpoint: http://neon-internal-api.aws.neon.build/billing/api/v1/usage_events

				      metric_collection_interval: 10min

				      tenant_config:

				        eviction_policy:

				          kind: "LayerAccessThreshold"

				          period: "20m"

				          threshold: "20m"

				      remote_storage:

				        bucket_name: "{{ bucket_name }}"

				        bucket_region: "{{ bucket_region }}"

				        prefix_in_bucket: "pageserver/v1"

				    safekeeper_s3_prefix: safekeeper/v1/wal

				    hostname_suffix: ""

				    remote_user: ssm-user

				    ansible_aws_ssm_region: us-east-2

				    ansible_aws_ssm_bucket_name: neon-staging-storage-us-east-2

				    console_region_id: aws-us-east-2

				    sentry_environment: staging

				  children:

				    pageservers:

				      hosts:

				        pageserver-0.us-east-2.aws.neon.build:

				          ansible_host: i-0c3e70929edb5d691

				        pageserver-1.us-east-2.aws.neon.build:

				          ansible_host: i-0565a8b4008aa3f40

				        pageserver-2.us-east-2.aws.neon.build:

				          ansible_host: i-01e31cdf7e970586a

				        pageserver-3.us-east-2.aws.neon.build:

				          ansible_host: i-0602a0291365ef7cc

				        pageserver-99.us-east-2.aws.neon.build:

				          ansible_host: i-0c39491109bb88824

				    safekeepers:

				      hosts:

				        safekeeper-0.us-east-2.aws.neon.build:

				          ansible_host: i-027662bd552bf5db0

				        safekeeper-1.us-east-2.aws.neon.build:

				          ansible_host: i-0171efc3604a7b907

				        safekeeper-2.us-east-2.aws.neon.build:

				          ansible_host: i-0de0b03a51676a6ce

				        safekeeper-99.us-east-2.aws.neon.build:

				          ansible_host: i-0d61b6a2ea32028d5

									
										18

.github/ansible/systemd/pageserver.service
									
										vendored
									
											
				@@ -1,18 +0,0 @@

				[Unit]

				Description=Neon pageserver

				After=network.target auditd.service

				[Service]

				Type=simple

				User=pageserver

				Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/pageserver LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_PAGESERVER }} SENTRY_ENVIRONMENT={{ sentry_environment }}

				ExecStart=/usr/local/bin/pageserver -c "pg_distrib_dir='/usr/local'" -c "listen_pg_addr='0.0.0.0:6400'" -c "listen_http_addr='0.0.0.0:9898'" -c "broker_endpoint='{{ broker_endpoint }}'" -D /storage/pageserver/data

				ExecReload=/bin/kill -HUP $MAINPID

				KillMode=mixed

				KillSignal=SIGINT

				Restart=on-failure

				TimeoutSec=10

				LimitNOFILE=30000000

				[Install]

				WantedBy=multi-user.target

									
										18

.github/ansible/systemd/safekeeper.service
									
										vendored
									
											
				@@ -1,18 +0,0 @@

				[Unit]

				Description=Neon safekeeper

				After=network.target auditd.service

				[Service]

				Type=simple

				User=safekeeper

				Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/safekeeper/data LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_SAFEKEEPER }} SENTRY_ENVIRONMENT={{ sentry_environment }}

				ExecStart=/usr/local/bin/safekeeper -l {{ inventory_hostname }}{{ hostname_suffix }}:6500 --listen-http {{ inventory_hostname }}{{ hostname_suffix }}:7676 -D /storage/safekeeper/data --broker-endpoint={{ broker_endpoint }} --remote-storage='{bucket_name="{{bucket_name}}", bucket_region="{{bucket_region}}", prefix_in_bucket="{{ safekeeper_s3_prefix }}"}'

				ExecReload=/bin/kill -HUP $MAINPID

				KillMode=mixed

				KillSignal=SIGINT

				Restart=on-failure

				TimeoutSec=10

				LimitNOFILE=30000000

				[Install]

				WantedBy=multi-user.target

1

.github/ansible/templates/pageserver.toml.j2 vendored


				@@ -1 +0,0 @@

				{{ pageserver_config | sivel.toiletwater.to_toml }}

									
										76

.github/helm-values/dev-eu-west-1-zeta.neon-proxy-scram.yaml
									
										vendored
									
											
				@@ -1,76 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				deploymentStrategy:

				  type: RollingUpdate

				  rollingUpdate:

				    maxSurge: 100%

				    maxUnavailable: 50%

				# Delay the kill signal by 7 days (7 * 24 * 60 * 60)

				# The pod(s) will stay in Terminating, keeps the existing connections

				# but doesn't receive new ones

				containerLifecycle:

				  preStop:

				    exec:

				      command: ["/bin/sh", "-c", "sleep 604800"]

				terminationGracePeriodSeconds: 604800

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.build/management/api/v2"

				  domain: "*.eu-west-1.aws.neon.build"

				  sentryEnvironment: "staging"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.build/billing/api/v1/usage_events"

				  metricCollectionInterval: "1min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: dev

				  zenith_region: eu-west-1

				  zenith_region_slug: eu-west-1

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: eu-west-1.aws.neon.build

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										52

.github/helm-values/dev-eu-west-1-zeta.neon-storage-broker.yaml
									
										vendored
									
											
				@@ -1,52 +0,0 @@

				# Helm chart values for neon-storage-broker

				podLabels:

				  neon_env: staging

				  neon_service: storage-broker

				# Use L4 LB

				service:

				  # service.annotations -- Annotations to add to the service

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external  # use newer AWS Load Balancer Controller

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal  # deploy LB to private subnet

				    # assign service to this name at external-dns

				    external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.zeta.eu-west-1.internal.aws.neon.build

				  # service.type -- Service type

				  type: LoadBalancer

				  # service.port -- broker listen port

				  port: 50051

				ingress:

				  enabled: false

				metrics:

				  enabled: false

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-storage-broker.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-storage-broker

				        app.kubernetes.io/instance: neon-storage-broker

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-storage-broker"

				      endpoints:

				        - port: broker

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

				settings:

				  sentryEnvironment: "staging"

									
										68

.github/helm-values/dev-us-east-2-beta.neon-proxy-link.yaml
									
				@@ -1,68 +0,0 @@

				# Helm chart values for neon-proxy-link.

				# This is a YAML-formatted file.

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "link"

				  authEndpoint: "https://console.stage.neon.tech/authenticate_proxy_request/"

				  uri: "https://console.stage.neon.tech/psql_session/"

				  domain: "pg.neon.build"

				  sentryEnvironment: "staging"

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.build/billing/api/v1/usage_events"

				  metricCollectionInterval: "1min"

				# -- Additional labels for neon-proxy-link pods

				podLabels:

				  zenith_service: proxy

				  zenith_env: dev

				  zenith_region: us-east-2

				  zenith_region_slug: us-east-2

				service:

				  type: LoadBalancer

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal

				    external-dns.alpha.kubernetes.io/hostname: neon-proxy-link-mgmt.beta.us-east-2.aws.neon.build

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: neon-proxy-link.beta.us-east-2.aws.neon.build

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										61

.github/helm-values/dev-us-east-2-beta.neon-proxy-scram-legacy.yaml
									
				@@ -1,61 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.build/management/api/v2"

				  domain: "*.cloud.stage.neon.tech"

				  sentryEnvironment: "staging"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.build/billing/api/v1/usage_events"

				  metricCollectionInterval: "1min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram-legacy

				  zenith_env: dev

				  zenith_region: us-east-2

				  zenith_region_slug: us-east-2

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: neon-proxy-scram-legacy.beta.us-east-2.aws.neon.build

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										76

.github/helm-values/dev-us-east-2-beta.neon-proxy-scram.yaml
									
				@@ -1,76 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				deploymentStrategy:

				  type: RollingUpdate

				  rollingUpdate:

				    maxSurge: 100%

				    maxUnavailable: 50%

				# Delay the kill signal by 7 days (7 * 24 * 60 * 60)

				# The pod(s) will stay in Terminating, keeps the existing connections

				# but doesn't receive new ones

				containerLifecycle:

				  preStop:

				    exec:

				      command: ["/bin/sh", "-c", "sleep 604800"]

				terminationGracePeriodSeconds: 604800

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.build/management/api/v2"

				  domain: "*.us-east-2.aws.neon.build"

				  sentryEnvironment: "staging"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.build/billing/api/v1/usage_events"

				  metricCollectionInterval: "1min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: dev

				  zenith_region: us-east-2

				  zenith_region_slug: us-east-2

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: us-east-2.aws.neon.build

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										52

.github/helm-values/dev-us-east-2-beta.neon-storage-broker.yaml
									
				@@ -1,52 +0,0 @@

				# Helm chart values for neon-storage-broker

				podLabels:

				  neon_env: staging

				  neon_service: storage-broker

				# Use L4 LB

				service:

				  # service.annotations -- Annotations to add to the service

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external  # use newer AWS Load Balancer Controller

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal  # deploy LB to private subnet

				    # assign service to this name at external-dns

				    external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.beta.us-east-2.internal.aws.neon.build

				  # service.type -- Service type

				  type: LoadBalancer

				  # service.port -- broker listen port

				  port: 50051

				ingress:

				  enabled: false

				metrics:

				  enabled: false

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-storage-broker.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-storage-broker

				        app.kubernetes.io/instance: neon-storage-broker

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-storage-broker"

				      endpoints:

				        - port: broker

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

				settings:

				  sentryEnvironment: "staging"

									
										77

.github/helm-values/prod-ap-southeast-1-epsilon.neon-proxy-scram.yaml
									
				@@ -1,77 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				deploymentStrategy:

				  type: RollingUpdate

				  rollingUpdate:

				    maxSurge: 100%

				    maxUnavailable: 50%

				# Delay the kill signal by 7 days (7 * 24 * 60 * 60)

				# The pod(s) will stay in Terminating, keeps the existing connections

				# but doesn't receive new ones

				containerLifecycle:

				  preStop:

				    exec:

				      command: ["/bin/sh", "-c", "sleep 604800"]

				terminationGracePeriodSeconds: 604800

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.tech/management/api/v2"

				  domain: "*.ap-southeast-1.aws.neon.tech"

				  sentryEnvironment: "production"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events"

				  metricCollectionInterval: "10min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: prod

				  zenith_region: ap-southeast-1

				  zenith_region_slug: ap-southeast-1

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: ap-southeast-1.aws.neon.tech

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										52

.github/helm-values/prod-ap-southeast-1-epsilon.neon-storage-broker.yaml
									
				@@ -1,52 +0,0 @@

				# Helm chart values for neon-storage-broker

				podLabels:

				  neon_env: production

				  neon_service: storage-broker

				# Use L4 LB

				service:

				  # service.annotations -- Annotations to add to the service

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external  # use newer AWS Load Balancer Controller

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal  # deploy LB to private subnet

				    # assign service to this name at external-dns

				    external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.epsilon.ap-southeast-1.internal.aws.neon.tech

				  # service.type -- Service type

				  type: LoadBalancer

				  # service.port -- broker listen port

				  port: 50051

				ingress:

				  enabled: false

				metrics:

				  enabled: false

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-storage-broker.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-storage-broker

				        app.kubernetes.io/instance: neon-storage-broker

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-storage-broker"

				      endpoints:

				        - port: broker

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

				settings:

				  sentryEnvironment: "production"

									
										77

.github/helm-values/prod-eu-central-1-gamma.neon-proxy-scram.yaml
									
				@@ -1,77 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				deploymentStrategy:

				  type: RollingUpdate

				  rollingUpdate:

				    maxSurge: 100%

				    maxUnavailable: 50%

				# Delay the kill signal by 7 days (7 * 24 * 60 * 60)

				# The pod(s) will stay in Terminating, keeps the existing connections

				# but doesn't receive new ones

				containerLifecycle:

				  preStop:

				    exec:

				      command: ["/bin/sh", "-c", "sleep 604800"]

				terminationGracePeriodSeconds: 604800

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.tech/management/api/v2"

				  domain: "*.eu-central-1.aws.neon.tech"

				  sentryEnvironment: "production"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events"

				  metricCollectionInterval: "10min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: prod

				  zenith_region: eu-central-1

				  zenith_region_slug: eu-central-1

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: eu-central-1.aws.neon.tech

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										52

.github/helm-values/prod-eu-central-1-gamma.neon-storage-broker.yaml
									
				@@ -1,52 +0,0 @@

				# Helm chart values for neon-storage-broker

				podLabels:

				  neon_env: production

				  neon_service: storage-broker

				# Use L4 LB

				service:

				  # service.annotations -- Annotations to add to the service

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external  # use newer AWS Load Balancer Controller

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal  # deploy LB to private subnet

				    # assign service to this name at external-dns

				    external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.gamma.eu-central-1.internal.aws.neon.tech

				  # service.type -- Service type

				  type: LoadBalancer

				  # service.port -- broker listen port

				  port: 50051

				ingress:

				  enabled: false

				metrics:

				  enabled: false

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-storage-broker.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-storage-broker

				        app.kubernetes.io/instance: neon-storage-broker

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-storage-broker"

				      endpoints:

				        - port: broker

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

				settings:

				  sentryEnvironment: "production"

									
										59

.github/helm-values/prod-us-east-2-delta.neon-proxy-link.yaml
									
				@@ -1,59 +0,0 @@

				# Helm chart values for neon-proxy-link.

				# This is a YAML-formatted file.

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "link"

				  authEndpoint: "https://console.neon.tech/authenticate_proxy_request/"

				  uri: "https://console.neon.tech/psql_session/"

				  domain: "pg.neon.tech"

				  sentryEnvironment: "production"

				# -- Additional labels for zenith-proxy pods

				podLabels:

				  zenith_service: proxy

				  zenith_env: production

				  zenith_region: us-east-2

				  zenith_region_slug: us-east-2

				service:

				  type: LoadBalancer

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal

				    external-dns.alpha.kubernetes.io/hostname: neon-proxy-link-mgmt.delta.us-east-2.aws.neon.tech

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: neon-proxy-link.delta.us-east-2.aws.neon.tech

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										77

.github/helm-values/prod-us-east-2-delta.neon-proxy-scram.yaml
									
				@@ -1,77 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				deploymentStrategy:

				  type: RollingUpdate

				  rollingUpdate:

				    maxSurge: 100%

				    maxUnavailable: 50%

				# Delay the kill signal by 7 days (7 * 24 * 60 * 60)

				# The pod(s) will stay in Terminating, keeps the existing connections

				# but doesn't receive new ones

				containerLifecycle:

				  preStop:

				    exec:

				      command: ["/bin/sh", "-c", "sleep 604800"]

				terminationGracePeriodSeconds: 604800

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.tech/management/api/v2"

				  domain: "*.us-east-2.aws.neon.tech"

				  sentryEnvironment: "production"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events"

				  metricCollectionInterval: "10min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: prod

				  zenith_region: us-east-2

				  zenith_region_slug: us-east-2

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: us-east-2.aws.neon.tech

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										52

.github/helm-values/prod-us-east-2-delta.neon-storage-broker.yaml
									
				@@ -1,52 +0,0 @@

				# Helm chart values for neon-storage-broker

				podLabels:

				  neon_env: production

				  neon_service: storage-broker

				# Use L4 LB

				service:

				  # service.annotations -- Annotations to add to the service

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external  # use newer AWS Load Balancer Controller

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal  # deploy LB to private subnet

				    # assign service to this name at external-dns

				    external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.delta.us-east-2.internal.aws.neon.tech

				  # service.type -- Service type

				  type: LoadBalancer

				  # service.port -- broker listen port

				  port: 50051

				ingress:

				  enabled: false

				metrics:

				  enabled: false

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-storage-broker.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-storage-broker

				        app.kubernetes.io/instance: neon-storage-broker

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-storage-broker"

				      endpoints:

				        - port: broker

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

				settings:

				  sentryEnvironment: "production"

									
										61

.github/helm-values/prod-us-west-2-eta.neon-proxy-scram-legacy.yaml
									
				@@ -1,61 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.tech/management/api/v2"

				  domain: "*.cloud.neon.tech"

				  sentryEnvironment: "production"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events"

				  metricCollectionInterval: "10min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: prod

				  zenith_region: us-west-2

				  zenith_region_slug: us-west-2

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: neon-proxy-scram-legacy.eta.us-west-2.aws.neon.tech

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
										77

.github/helm-values/prod-us-west-2-eta.neon-proxy-scram.yaml
									
				@@ -1,77 +0,0 @@

				# Helm chart values for neon-proxy-scram.

				# This is a YAML-formatted file.

				deploymentStrategy:

				  type: RollingUpdate

				  rollingUpdate:

				    maxSurge: 100%

				    maxUnavailable: 50%

				# Delay the kill signal by 7 days (7 * 24 * 60 * 60)

				# The pod(s) will stay in Terminating, keeps the existing connections

				# but doesn't receive new ones

				containerLifecycle:

				  preStop:

				    exec:

				      command: ["/bin/sh", "-c", "sleep 604800"]

				terminationGracePeriodSeconds: 604800

				image:

				  repository: neondatabase/neon

				settings:

				  authBackend: "console"

				  authEndpoint: "http://neon-internal-api.aws.neon.tech/management/api/v2"

				  domain: "*.us-west-2.aws.neon.tech"

				  sentryEnvironment: "production"

				  wssPort: 8443

				  metricCollectionEndpoint: "http://neon-internal-api.aws.neon.tech/billing/api/v1/usage_events"

				  metricCollectionInterval: "10min"

				# -- Additional labels for neon-proxy pods

				podLabels:

				  zenith_service: proxy-scram

				  zenith_env: prod

				  zenith_region: us-west-2

				  zenith_region_slug: us-west-2

				exposedService:

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

				    external-dns.alpha.kubernetes.io/hostname: us-west-2.aws.neon.tech

				  httpsPort: 443

				#metrics:

				#  enabled: true

				#  serviceMonitor:

				#    enabled: true

				#    selector:

				#      release: kube-prometheus-stack

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-proxy.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-proxy-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-proxy

				        app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-proxy"

				      endpoints:

				        - port: http

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

									
.github/helm-values/prod-us-west-2-eta.neon-storage-broker.yaml
											
				@@ -1,52 +0,0 @@

				# Helm chart values for neon-storage-broker

				podLabels:

				  neon_env: production

				  neon_service: storage-broker

				# Use L4 LB

				service:

				  # service.annotations -- Annotations to add to the service

				  annotations:

				    service.beta.kubernetes.io/aws-load-balancer-type: external  # use newer AWS Load Balancer Controller

				    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

				    service.beta.kubernetes.io/aws-load-balancer-scheme: internal  # deploy LB to private subnet

				    # assign service to this name at external-dns

				    external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.eta.us-west-2.internal.aws.neon.tech

				  # service.type -- Service type

				  type: LoadBalancer

				  # service.port -- broker listen port

				  port: 50051

				ingress:

				  enabled: false

				metrics:

				  enabled: false

				extraManifests:

				  - apiVersion: operator.victoriametrics.com/v1beta1

				    kind: VMServiceScrape

				    metadata:

				      name: "{{ include \"neon-storage-broker.fullname\" . }}"

				      labels:

				        helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}

				        app.kubernetes.io/name: neon-storage-broker

				        app.kubernetes.io/instance: neon-storage-broker

				        app.kubernetes.io/version: "{{ .Chart.AppVersion }}"

				        app.kubernetes.io/managed-by: Helm

				      namespace: "{{ .Release.Namespace }}"

				    spec:

				      selector:

				        matchLabels:

				          app.kubernetes.io/name: "neon-storage-broker"

				      endpoints:

				        - port: broker

				          path: /metrics

				          interval: 10s

				          scrapeTimeout: 10s

				      namespaceSelector:

				        matchNames:

				          - "{{ .Release.Namespace }}"

				settings:

				  sentryEnvironment: "production"

									
.github/pull_request_template.md
												
				@@ -1,10 +1,14 @@

				## Describe your changes

				## Problem

				## Issue ticket number and link

				## Summary of changes

				## Checklist before requesting a review

				- [ ] I have performed a self-review of my code.

				- [ ] If it is a core feature, I have added thorough tests.

				- [ ] Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?

				- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

				## Checklist before merging

				- [ ] Do not forget to reformat commit message to not include the above checklist

									
.github/workflows/actionlint.yml
												
				@@ -0,0 +1,51 @@

				name: Lint GitHub Workflows

				on:

				  push:

				    branches:

				      - main

				      - release

				    paths:

				      - '.github/workflows/*.ya?ml'

				  pull_request:

				    paths:

				      - '.github/workflows/*.ya?ml'

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

				jobs:

				  check-permissions:

				    if: ${{ !contains(github.event.pull_request.labels.*.name, 'run-no-ci') }}

				    uses: ./.github/workflows/check-permissions.yml

				    with:

				      github-event-name: ${{ github.event_name}}

				  actionlint:

				    needs: [ check-permissions ]

				    runs-on: ubuntu-22.04

				    steps:

				      - uses: actions/checkout@v4

				      - uses: reviewdog/action-actionlint@v1

				        env:

				          # SC2046 - Quote this to prevent word splitting. - https://www.shellcheck.net/wiki/SC2046

				          # SC2086 - Double quote to prevent globbing and word splitting. - https://www.shellcheck.net/wiki/SC2086

				          SHELLCHECK_OPTS: --exclude=SC2046,SC2086

				        with:

				          fail_on_error: true

				          filter_mode: nofilter

				          level: error

				      - name: Disallow 'ubuntu-latest' runners

				        run: |

				          PAT='^\s*runs-on:.*-latest'

				          if grep -ERq $PAT .github/workflows; then

				            grep -ERl $PAT .github/workflows |\

				            while read -r f

				            do

				              l=$(grep -nE $PAT "$f" | awk -F: '{print $1}' | head -1)

				              echo "::error file=$f,line=$l::Please use 'ubuntu-22.04' instead of 'ubuntu-latest'"

				            done

				            exit 1

				          fi
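
The `Disallow 'ubuntu-latest' runners` step above scans the workflow files for `runs-on: ...-latest` and emits one error annotation per offending file. Below is a minimal Python sketch of the same check, assuming each annotation should point at the first matching line of the file being reported. The names `find_latest_runners` and `format_annotation`, and the dict-of-contents input, are hypothetical and used only for illustration; they are not part of the workflow.

```python
import re

# Same pattern the workflow step passes to `grep -E`.
LATEST_RUNNER = re.compile(r"^\s*runs-on:.*-latest")


def find_latest_runners(files):
    """Return (filename, line_number) for the first '-latest' runner in each file.

    `files` maps a workflow filename to its text content. This input shape is
    an assumption for the sketch; the real step reads .github/workflows from disk.
    """
    hits = []
    for name in sorted(files):
        for lineno, line in enumerate(files[name].splitlines(), start=1):
            if LATEST_RUNNER.match(line):
                hits.append((name, lineno))
                break  # like `head -1`: report only the first match per file
    return hits


def format_annotation(name, lineno):
    """Build an error annotation in the style the step echoes."""
    return (
        f"::error file={name},line={lineno}::"
        "Please use 'ubuntu-22.04' instead of 'ubuntu-latest'"
    )
```

Keeping the scan per file means the reported line number always belongs to the file named in the annotation.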

									
.github/workflows/approved-for-ci-run.yml
												
				@@ -0,0 +1,163 @@

				name: Handle `approved-for-ci-run` label

				# This workflow helps to run CI pipeline for PRs made by external contributors (from forks).

				on:

				  pull_request_target:

				    branches:

				      - main

				    types:

				      # Default types that trigger a workflow ([1]):

				      # - [1] https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request

				      - opened

				      - synchronize

				      - reopened

				      # Types that we want to handle in addition to keep labels tidy:

				      - closed

				      # Actual magic happens here:

				      - labeled

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}

				  cancel-in-progress: false

				env:

				  GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				  PR_NUMBER: ${{ github.event.pull_request.number }}

				  BRANCH: "ci-run/pr-${{ github.event.pull_request.number }}"

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				jobs:

				  remove-label:

				    # Remove `approved-for-ci-run` label if the workflow is triggered by changes in a PR.

				    # The PR should be reviewed and labelled manually again.

				    permissions:

				      pull-requests: write # For `gh pr edit`

				    if: |

				      contains(fromJSON('["opened", "synchronize", "reopened", "closed"]'), github.event.action) &&

				      contains(github.event.pull_request.labels.*.name, 'approved-for-ci-run')

				    runs-on: ubuntu-22.04

				    steps:

				      - run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"

				  create-or-update-pr-for-ci-run:

				    # Create local PR for an `approved-for-ci-run` labelled PR to run CI pipeline in it.

				    permissions:

				      pull-requests: write # for `gh pr edit`

				      # For `git push` and `gh pr create` we use CI_ACCESS_TOKEN

				    if: |

				      github.event.action == 'labeled' &&

				      contains(github.event.pull_request.labels.*.name, 'approved-for-ci-run')

				    runs-on: ubuntu-22.04

				    steps:

				      - run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"

				      - uses: actions/checkout@v4

				        with:

				          ref: main

				          token: ${{ secrets.CI_ACCESS_TOKEN }}

				      - name: Look for existing PR

				        id: get-pr

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          ALREADY_CREATED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --base main --json number --jq '.[].number')"

				          echo "ALREADY_CREATED=${ALREADY_CREATED}" >> ${GITHUB_OUTPUT}

				      - name: Get changed labels

				        id: get-labels

				        if: steps.get-pr.outputs.ALREADY_CREATED != ''

				        env:

				          ALREADY_CREATED: ${{ steps.get-pr.outputs.ALREADY_CREATED }}

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          LABELS_TO_REMOVE=$(comm -23 <(gh pr --repo ${GITHUB_REPOSITORY} view ${ALREADY_CREATED} --json labels --jq '.labels.[].name'| ( grep -E '^run' || true ) | sort) \

				          <(gh pr --repo ${GITHUB_REPOSITORY} view ${PR_NUMBER} --json labels --jq '.labels.[].name' | ( grep -E '^run' || true ) | sort ) |\

				          ( grep -v run-e2e-tests-in-draft || true ) | paste -sd , -)

				          LABELS_TO_ADD=$(comm -13 <(gh pr --repo ${GITHUB_REPOSITORY} view ${ALREADY_CREATED} --json labels --jq '.labels.[].name'| ( grep -E '^run' || true ) |sort) \

				          <(gh pr --repo ${GITHUB_REPOSITORY} view ${PR_NUMBER} --json labels --jq '.labels.[].name' |  ( grep -E '^run' || true ) | sort ) |\

				          paste -sd , -)

				          echo "LABELS_TO_ADD=${LABELS_TO_ADD}" >> ${GITHUB_OUTPUT}

				          echo "LABELS_TO_REMOVE=${LABELS_TO_REMOVE}" >> ${GITHUB_OUTPUT}

				      - run: gh pr checkout "${PR_NUMBER}"

				      - run: git checkout -b "${BRANCH}"

				      - run: git push --force origin "${BRANCH}"

				        if: steps.get-pr.outputs.ALREADY_CREATED == ''

				      - name: Create a Pull Request for CI run (if required)

				        if: steps.get-pr.outputs.ALREADY_CREATED == ''

				        env: 

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          cat << EOF > body.md

				            This Pull Request is created automatically to run the CI pipeline for #${PR_NUMBER}

				            Please do not alter or merge/close it.

				            Feel free to review/comment/discuss the original PR #${PR_NUMBER}.

				          EOF

				          LABELS=$( (gh pr --repo "${GITHUB_REPOSITORY}" view ${PR_NUMBER}  --json labels --jq '.labels.[].name'; echo run-e2e-tests-in-draft  )| \

				          grep -E '^run' | paste -sd , -)

				          gh pr --repo "${GITHUB_REPOSITORY}" create --title "CI run for PR #${PR_NUMBER}" \

				                                                       --body-file "body.md" \

				                                                       --head "${BRANCH}" \

				                                                       --base "main" \

				                                                       --label ${LABELS} \

				                                                       --draft

				      - name: Modify the existing pull request (if required)

				        if: steps.get-pr.outputs.ALREADY_CREATED != ''

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          LABELS_TO_ADD: ${{ steps.get-labels.outputs.LABELS_TO_ADD }}

				          LABELS_TO_REMOVE: ${{ steps.get-labels.outputs.LABELS_TO_REMOVE }}

				          ALREADY_CREATED: ${{ steps.get-pr.outputs.ALREADY_CREATED }}

				        run: |

				          ADD_CMD=

				          REMOVE_CMD=

				          [ -z "${LABELS_TO_ADD}" ] || ADD_CMD="--add-label ${LABELS_TO_ADD}"

				          [ -z "${LABELS_TO_REMOVE}" ] || REMOVE_CMD="--remove-label ${LABELS_TO_REMOVE}"

				          if [ -n "${ADD_CMD}" ] || [ -n "${REMOVE_CMD}" ]; then

				            gh pr --repo "${GITHUB_REPOSITORY}" edit ${ALREADY_CREATED} ${ADD_CMD} ${REMOVE_CMD}

				          fi

				      - run: git push --force origin "${BRANCH}"

				        if: steps.get-pr.outputs.ALREADY_CREATED != ''

				  cleanup:

				    # Close PRs and delete branches if the original PR is closed.

				    permissions:

				      contents: write # for `--delete-branch` flag in `gh pr close`

				      pull-requests: write # for `gh pr close`

				    if: |

				      github.event.action == 'closed' &&

				      github.event.pull_request.head.repo.full_name != github.repository

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Close PR and delete `ci-run/pr-${{ env.PR_NUMBER }}` branch

				        run: |

				          CLOSED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --json 'closed' --jq '.[].closed')"

				          if [ "${CLOSED}" == "false" ]; then

				            gh pr --repo "${GITHUB_REPOSITORY}" close "${BRANCH}" --delete-branch

				          fi
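
The `Get changed labels` step above uses `comm -23` and `comm -13` on two sorted label lists to work out which `run*` labels to remove from, and add to, the mirrored CI PR. The following is a minimal Python sketch of that set logic, not the workflow's implementation. The function name `diff_run_labels` and its parameters are hypothetical. As a simplification, the protected label is matched exactly, whereas the shell step drops any line containing `run-e2e-tests-in-draft`.

```python
def diff_run_labels(ci_pr_labels, source_pr_labels,
                    protected=("run-e2e-tests-in-draft",)):
    """Return (labels_to_add, labels_to_remove) for the mirrored CI PR.

    Only labels starting with "run" are considered, mirroring `grep -E '^run'`.
    Labels listed in `protected` are never scheduled for removal.
    """
    ci = {label for label in ci_pr_labels if label.startswith("run")}
    src = {label for label in source_pr_labels if label.startswith("run")}

    # `comm -13`: present only on the source PR, so add them to the CI PR.
    to_add = sorted(src - ci)
    # `comm -23`: present only on the CI PR, so remove them (except protected).
    to_remove = sorted(ci - src - set(protected))
    return to_add, to_remove


# Example with made-up label names:
to_add, to_remove = diff_run_labels(
    ["run-e2e-tests-in-draft", "run-benchmarks", "bug"],
    ["run-benchmarks", "run-extra-build-macos", "docs"],
)
print(",".join(to_add), ",".join(to_remove))
```

Joining each list with commas gives the same shape that `paste -sd , -` produces for the `--add-label` and `--remove-label` arguments.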

									
.github/workflows/benchmarking.yml
												
				@@ -11,18 +11,38 @@ on:

				    #          │ │ ┌───────────── day of the month (1 - 31)

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:  '0 3 * * *' # run once a day, timezone is utc

				    - cron:   '0 3 * * *' # run once a day, timezone is utc

				  workflow_dispatch: # adds ability to run this manually

				    inputs:

				      region_id:

				        description: 'Use a particular region. If not set the default region will be used'

				        description: 'Project region id. If not set, the default region will be used'

				        required: false

				        default: 'aws-us-east-2'

				      save_perf_report:

				        type: boolean

				        description: 'Publish perf report or not. If not set, the report is published only for the main branch'

				        description: 'Publish perf report. If not set, the report will be published only for the main branch'

				        required: false

				      collect_olap_explain:

				        type: boolean

				        description: 'Collect EXPLAIN ANALYZE for OLAP queries. If not set, EXPLAIN ANALYZE will not be collected'

				        required: false

				        default: false

				      collect_pg_stat_statements:

				        type: boolean

				        description: 'Collect pg_stat_statements for OLAP queries. If not set, pg_stat_statements will not be collected'

				        required: false

				        default: false

				      run_AWS_RDS_AND_AURORA:

				        type: boolean

				        description: 'AWS-RDS and AWS-AURORA normally only run on Saturday. Set this to true to run them on every workflow_dispatch'

				        required: false

				        default: false

				      run_only_pgvector_tests:

				        type: boolean

				        description: 'Run pgvector tests but no other tests. If not set, all tests including pgvector tests will be run'

				        required: false

				        default: false

				defaults:

				  run:

				@@ -30,11 +50,12 @@ defaults:

				concurrency:

				  # Allow only one workflow per any non-`main` branch.

				  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}

				  group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}

				  cancel-in-progress: true

				jobs:

				  bench:

				    if: ${{ github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null }}

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "300"

				      TEST_PG_BENCH_SCALES_MATRIX: "10,100"

				@@ -42,21 +63,21 @@ jobs:

				      DEFAULT_PG_VERSION: 14

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: "neon-staging"

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      options: --init

				    steps:

				    - uses: actions/checkout@v3

				    - uses: actions/checkout@v4

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-release-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				@@ -78,7 +99,7 @@ jobs:

				        # Set --sparse-ordering option of pytest-order plugin

				        # to ensure tests run in the order they appear in the file.

				        # It's important for test_perf_pgbench.py::test_pgbench_remote_* tests

				        extra_params: -m remote_cluster --sparse-ordering --timeout 5400 --ignore test_runner/performance/test_perf_olap.py

				        extra_params: -m remote_cluster --sparse-ordering --timeout 5400 --ignore test_runner/performance/test_perf_olap.py --ignore test_runner/performance/test_perf_pgvector_queries.py

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				@@ -92,11 +113,8 @@ jobs:

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Create Allure report

				      if: success() || failure()

				      uses: ./.github/actions/allure-report

				      with:

				        action: generate

				        build_type: ${{ env.BUILD_TYPE }}

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				@@ -107,25 +125,91 @@ jobs:

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				  generate-matrices:

				    if: ${{ github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null }}

				    # Create matrices for the benchmarking jobs, so we run benchmarks on rds only once a week (on Saturday)

				    #

				    # Available platforms:

				    # - neon-captest-new: Freshly created project (1 CU)

				    # - neon-captest-freetier: Use freetier-sized compute (0.25 CU)

				    # - neon-captest-reuse: Reusing existing project

				    # - rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs

				    # - rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage

				    env:

				      RUN_AWS_RDS_AND_AURORA: ${{ github.event.inputs.run_AWS_RDS_AND_AURORA || 'false' }}

				    runs-on: ubuntu-22.04

				    outputs:

				      pgbench-compare-matrix: ${{ steps.pgbench-compare-matrix.outputs.matrix }}

				      olap-compare-matrix: ${{ steps.olap-compare-matrix.outputs.matrix }}

				      tpch-compare-matrix: ${{ steps.tpch-compare-matrix.outputs.matrix }}

				    steps:

				    - name: Generate matrix for pgbench benchmark

				      id: pgbench-compare-matrix

				      run: |

				        matrix='{

				          "platform": [

				            "neon-captest-new",

				            "neon-captest-reuse",

				            "neonvm-captest-new"

				          ],

				          "db_size": [ "10gb" ],

				          "include": [{ "platform": "neon-captest-freetier",         "db_size": "3gb"  },

				                      { "platform": "neon-captest-new",              "db_size": "50gb" },

				                      { "platform": "neonvm-captest-freetier",       "db_size": "3gb"  },

				                      { "platform": "neonvm-captest-new",            "db_size": "50gb" },

				                      { "platform": "neonvm-captest-sharding-reuse", "db_size": "50gb" }]

				        }'

				        if [ "$(date +%A)" = "Saturday" ]; then

				          matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres", "db_size": "10gb"},

				                                                     { "platform": "rds-aurora",   "db_size": "50gb"}]')

				        fi

				        echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT

				    - name: Generate matrix for OLAP benchmarks

				      id: olap-compare-matrix

				      run: |

				        matrix='{

				          "platform": [

				            "neon-captest-reuse"

				          ]

				        }'

				        if [ "$(date +%A)" = "Saturday" ] || [ ${RUN_AWS_RDS_AND_AURORA} = "true" ]; then

				          matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres" },

				                                                     { "platform": "rds-aurora"   }]')

				        fi

				        echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT

				    - name: Generate matrix for TPC-H benchmarks

				      id: tpch-compare-matrix

				      run: |

				        matrix='{

				          "platform": [

				            "neon-captest-reuse"

				          ],

				          "scale": [

				            "10"

				          ]

				        }'

				        if [ "$(date +%A)" = "Saturday" ] || [ ${RUN_AWS_RDS_AND_AURORA} = "true" ]; then

				          matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres", "scale": "10" },

				                                                     { "platform": "rds-aurora",   "scale": "10" }]')

				        fi

				        echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT

				  pgbench-compare:

				    if: ${{ github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null }}

				    needs: [ generate-matrices ]

				    strategy:

				      fail-fast: false

				      matrix:

				        # neon-captest-new: Run pgbench in a freshly created project

				        # neon-captest-reuse: Same, but reusing existing project

				        # neon-captest-prefetch: Same, with prefetching enabled (new project)

				        # rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs

				        # rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage

				        platform: [ neon-captest-reuse, neon-captest-prefetch, rds-postgres ]

				        db_size: [ 10gb ]

				        runner: [ us-east-2 ]

				        include:

				          - platform: neon-captest-prefetch

				            db_size: 50gb

				            runner: us-east-2

				          - platform: rds-aurora

				            db_size: 50gb

				            runner: us-east-2

				      matrix: ${{fromJson(needs.generate-matrices.outputs.pgbench-compare-matrix)}}

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "60m"

				@@ -134,23 +218,24 @@ jobs:

				      DEFAULT_PG_VERSION: 14

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: ${{ matrix.platform }}

				    runs-on: [ self-hosted, "${{ matrix.runner }}", x64 ]

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      options: --init

				    timeout-minutes: 360 # 6h

				    # Increase timeout to 8h, default timeout is 6h

				    timeout-minutes: 480

				    steps:

				    - uses: actions/checkout@v3

				    - uses: actions/checkout@v4

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-release-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				@@ -160,13 +245,15 @@ jobs:

				        echo "${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin" >> $GITHUB_PATH

				    - name: Create Neon Project

				      if: contains(fromJson('["neon-captest-new", "neon-captest-prefetch"]'), matrix.platform)

				      if: contains(fromJson('["neon-captest-new", "neon-captest-freetier", "neonvm-captest-new", "neonvm-captest-freetier"]'), matrix.platform)

				      id: create-neon-project

				      uses: ./.github/actions/neon-project-create

				      with:

				        region_id: ${{ github.event.inputs.region_id || 'aws-us-east-2' }}

				        postgres_version: ${{ env.DEFAULT_PG_VERSION }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				        compute_units: ${{ (matrix.platform == 'neon-captest-freetier' && '[0.25, 0.25]') || '[1, 1]' }}

				        provisioner: ${{ (contains(matrix.platform, 'neonvm-') && 'k8s-neonvm') || 'k8s-pod' }}

				    - name: Set up Connection String

				      id: set-up-connstr

				@@ -175,7 +262,10 @@ jobs:

				          neon-captest-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}

				            ;;

				          neon-captest-new | neon-captest-prefetch)

				          neonvm-captest-sharding-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_SHARDING_CONNSTR }}

				            ;;

				          neon-captest-new | neon-captest-freetier | neonvm-captest-new | neonvm-captest-freetier)

				            CONNSTR=${{ steps.create-neon-project.outputs.dsn }}

				            ;;

				          rds-aurora)

				@@ -185,25 +275,22 @@ jobs:

				            CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }}

				            ;;

				          *)

				            echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'neon-captest-new', 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"

				            echo >&2 "Unknown PLATFORM=${PLATFORM}"

				            exit 1

				            ;;

				        esac

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				        psql ${CONNSTR} -c "SELECT version();"

				        QUERIES=("SELECT version()")

				        if [[ "${PLATFORM}" = "neon"* ]]; then

				          QUERIES+=("SHOW neon.tenant_id")

				          QUERIES+=("SHOW neon.timeline_id")

				        fi

				    - name: Set database options

				      if: matrix.platform == 'neon-captest-prefetch'

				      run: |

				        DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        for q in "${QUERIES[@]}"; do

				          psql ${CONNSTR} -c "${q}"

				        done

				    - name: Benchmark init

				      uses: ./.github/actions/run-python-test-set

				@@ -252,11 +339,8 @@ jobs:

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Create Allure report

				      if: success() || failure()

				      uses: ./.github/actions/allure-report

				      with:

				        action: generate

				        build_type: ${{ env.BUILD_TYPE }}

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				@@ -267,6 +351,92 @@ jobs:

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				  pgbench-pgvector:

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "15m"

				      TEST_PG_BENCH_SCALES_MATRIX: "1"

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 16

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: "neon-captest-pgvector"

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      options: --init

				    steps:

				    - uses: actions/checkout@v4

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				    - name: Add Postgres binaries to PATH

				      run: |

				        ${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin/pgbench --version

				        echo "${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin" >> $GITHUB_PATH

				    - name: Set up Connection String

				      id: set-up-connstr

				      run: |

				        CONNSTR=${{ secrets.BENCHMARK_PGVECTOR_CONNSTR }}

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				        QUERIES=("SELECT version()")

				        QUERIES+=("SHOW neon.tenant_id")

				        QUERIES+=("SHOW neon.timeline_id")

				        for q in "${QUERIES[@]}"; do

				          psql ${CONNSTR} -c "${q}"

				        done

				    - name: Benchmark pgvector hnsw indexing

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: ${{ env.BUILD_TYPE }}

				        test_selection: performance/test_perf_olap.py

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_pgvector_indexing

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				    - name: Benchmark pgvector queries

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: ${{ env.BUILD_TYPE }}

				        test_selection: performance/test_perf_pgvector_queries.py

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				    - name: Create Allure report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        slack-message: "Periodic perf testing neon-captest-pgvector: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				  clickbench-compare:

				    # ClickBench DB for rds-aurora and rds-Postgres deployed to the same clusters

				    # we use for performance testing in pgbench-compare.

				@@ -275,39 +445,35 @@ jobs:

				    #

				    # *_CLICKBENCH_CONNSTR: Genuine ClickBench DB with ~100M rows

				    # *_CLICKBENCH_10M_CONNSTR: DB with the first 10M rows of ClickBench DB

				    if: success() || failure()

				    needs: [ pgbench-compare ]

				    if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    needs: [ generate-matrices, pgbench-compare ]

				    strategy:

				      fail-fast: false

				      matrix:

				        # neon-captest-prefetch: We have pre-created projects with prefetch enabled

				        # rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs

				        # rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage

				        platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]

				      matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 14

				      TEST_OUTPUT: /tmp/test_output

				      TEST_OLAP_COLLECT_EXPLAIN: ${{ github.event.inputs.collect_olap_explain }}

				      TEST_OLAP_COLLECT_PG_STAT_STATEMENTS: ${{ github.event.inputs.collect_pg_stat_statements }}

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: ${{ matrix.platform }}

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      options: --init

				    timeout-minutes: 360 # 6h

				    steps:

				    - uses: actions/checkout@v3

				    - uses: actions/checkout@v4

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-release-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				@@ -320,7 +486,7 @@ jobs:

				      id: set-up-connstr

				      run: |

				        case "${PLATFORM}" in

				          neon-captest-prefetch)

				          neon-captest-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CLICKBENCH_10M_CONNSTR }}

				            ;;

				          rds-aurora)

				@@ -330,25 +496,22 @@ jobs:

				            CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CLICKBENCH_10M_CONNSTR }}

				            ;;

				          *)

				            echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"

				            echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"

				            exit 1

				            ;;

				        esac

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				        psql ${CONNSTR} -c "SELECT version();"

				        QUERIES=("SELECT version()")

				        if [[ "${PLATFORM}" = "neon"* ]]; then

				          QUERIES+=("SHOW neon.tenant_id")

				          QUERIES+=("SHOW neon.timeline_id")

				        fi

				    - name: Set database options

				      if: matrix.platform == 'neon-captest-prefetch'

				      run: |

				        DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        for q in "${QUERIES[@]}"; do

				          psql ${CONNSTR} -c "${q}"

				        done

				    - name: ClickBench benchmark

				      uses: ./.github/actions/run-python-test-set

				@@ -361,14 +524,14 @@ jobs:

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				        TEST_OLAP_COLLECT_EXPLAIN: ${{ github.event.inputs.collect_olap_explain || 'false' }}

				        TEST_OLAP_COLLECT_PG_STAT_STATEMENTS: ${{ github.event.inputs.collect_pg_stat_statements || 'false' }}

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        TEST_OLAP_SCALE: 10

				    - name: Create Allure report

				      if: success() || failure()

				      uses: ./.github/actions/allure-report

				      with:

				        action: generate

				        build_type: ${{ env.BUILD_TYPE }}

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				@@ -386,39 +549,34 @@ jobs:

				    # We might change it after https://github.com/neondatabase/neon/issues/2900.

				    #

				    # *_TPCH_S10_CONNSTR: DB generated with scale factor 10 (~10 GB)

				    if: success() || failure()

				    needs: [ clickbench-compare ]

				    if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    needs: [ generate-matrices, clickbench-compare ]

				    strategy:

				      fail-fast: false

				      matrix:

				        # neon-captest-prefetch: We have pre-created projects with prefetch enabled

				        # rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs

				        # rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage

				        platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]

				      matrix: ${{ fromJson(needs.generate-matrices.outputs.tpch-compare-matrix) }}

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 14

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: ${{ matrix.platform }}

				      TEST_OLAP_SCALE: ${{ matrix.scale }}

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      options: --init

				    timeout-minutes: 360 # 6h

				    steps:

				    - uses: actions/checkout@v3

				    - uses: actions/checkout@v4

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-release-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				@@ -427,39 +585,43 @@ jobs:

				        ${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin/pgbench --version

				        echo "${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin" >> $GITHUB_PATH

				    - name: Set up Connection String

				      id: set-up-connstr

				    - name: Get Connstring Secret Name

				      run: |

				        case "${PLATFORM}" in

				          neon-captest-prefetch)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_TPCH_S10_CONNSTR }}

				          neon-captest-reuse)

				            ENV_PLATFORM=CAPTEST_TPCH

				            ;;

				          rds-aurora)

				            CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_TPCH_S10_CONNSTR }}

				            ENV_PLATFORM=RDS_AURORA_TPCH

				            ;;

				          rds-postgres)

				            CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_TPCH_S10_CONNSTR }}

				            ENV_PLATFORM=RDS_AURORA_TPCH

				            ;;

				          *)

				            echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"

				            echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"

				            exit 1

				            ;;

				        esac

				        CONNSTR_SECRET_NAME="BENCHMARK_${ENV_PLATFORM}_S${TEST_OLAP_SCALE}_CONNSTR"

				        echo "CONNSTR_SECRET_NAME=${CONNSTR_SECRET_NAME}" >> $GITHUB_ENV

				    - name: Set up Connection String

				      id: set-up-connstr

				      run: |

				        CONNSTR=${{ secrets[env.CONNSTR_SECRET_NAME] }}

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				        psql ${CONNSTR} -c "SELECT version();"

				        QUERIES=("SELECT version()")

				        if [[ "${PLATFORM}" = "neon"* ]]; then

				          QUERIES+=("SHOW neon.tenant_id")

				          QUERIES+=("SHOW neon.timeline_id")

				        fi

				    - name: Set database options

				      if: matrix.platform == 'neon-captest-prefetch'

				      run: |

				        DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        for q in "${QUERIES[@]}"; do

				          psql ${CONNSTR} -c "${q}"

				        done

				    - name: Run TPC-H benchmark

				      uses: ./.github/actions/run-python-test-set

				@@ -473,13 +635,11 @@ jobs:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        TEST_OLAP_SCALE: ${{ matrix.scale }}

				    - name: Create Allure report

				      if: success() || failure()

				      uses: ./.github/actions/allure-report

				      with:

				        action: generate

				        build_type: ${{ env.BUILD_TYPE }}

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				@@ -491,39 +651,33 @@ jobs:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				  user-examples-compare:

				    if: success() || failure()

				    needs: [ tpch-compare ]

				    if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    needs: [ generate-matrices, tpch-compare ]

				    strategy:

				      fail-fast: false

				      matrix:

				        # neon-captest-prefetch: We have pre-created projects with prefetch enabled

				        # rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs

				        # rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage

				        platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]

				      matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 14

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: ${{ matrix.platform }}

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      options: --init

				    timeout-minutes: 360 # 6h

				    steps:

				    - uses: actions/checkout@v3

				    - uses: actions/checkout@v4

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-release-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				@@ -536,7 +690,7 @@ jobs:

				      id: set-up-connstr

				      run: |

				        case "${PLATFORM}" in

				          neon-captest-prefetch)

				          neon-captest-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_CAPTEST_CONNSTR }}

				            ;;

				          rds-aurora)

				@@ -546,25 +700,22 @@ jobs:

				            CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_RDS_POSTGRES_CONNSTR }}

				            ;;

				          *)

				            echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"

				            echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"

				            exit 1

				            ;;

				        esac

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				        psql ${CONNSTR} -c "SELECT version();"

				        QUERIES=("SELECT version()")

				        if [[ "${PLATFORM}" = "neon"* ]]; then

				          QUERIES+=("SHOW neon.tenant_id")

				          QUERIES+=("SHOW neon.timeline_id")

				        fi

				    - name: Set database options

				      if: matrix.platform == 'neon-captest-prefetch'

				      run: |

				        DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"

				        psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        for q in "${QUERIES[@]}"; do

				          psql ${CONNSTR} -c "${q}"

				        done

				    - name: Run user examples

				      uses: ./.github/actions/run-python-test-set

				@@ -580,17 +731,14 @@ jobs:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				    - name: Create Allure report

				      if: success() || failure()

				      uses: ./.github/actions/allure-report

				      with:

				        action: generate

				        build_type: ${{ env.BUILD_TYPE }}

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        slack-message: "Periodic TPC-H perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

				        slack-message: "Periodic User example perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
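Review note: the "Get Connstring Secret Name" step in the tpch-compare job above no longer hard-codes one secret per platform. It composes the secret name from a platform prefix and `TEST_OLAP_SCALE`, then looks it up with `secrets[env.CONNSTR_SECRET_NAME]`. Below is a minimal Python sketch of that composition. The function and dictionary names are hypothetical, and the mapping copies the `case` statement exactly as it appears in the diff, including `rds-postgres` resolving to the same prefix as `rds-aurora`. Whether that is intended is not stated in the diff.

```python
# Hypothetical helper mirroring the shell `case` statement in the diff.
# Prefixes are copied verbatim from the new lines of the workflow.
ENV_PLATFORM_BY_PLATFORM = {
    "neon-captest-reuse": "CAPTEST_TPCH",
    "rds-aurora": "RDS_AURORA_TPCH",
    "rds-postgres": "RDS_AURORA_TPCH",  # as written in the diff
}


def connstr_secret_name(platform: str, scale: int) -> str:
    """Build the name of the secret that holds the connection string."""
    if platform not in ENV_PLATFORM_BY_PLATFORM:
        allowed = ", ".join(repr(p) for p in ENV_PLATFORM_BY_PLATFORM)
        raise ValueError(f"Unknown PLATFORM={platform}. Allowed only {allowed}")
    env_platform = ENV_PLATFORM_BY_PLATFORM[platform]
    return f"BENCHMARK_{env_platform}_S{scale}_CONNSTR"


if __name__ == "__main__":
    for name in ENV_PLATFORM_BY_PLATFORM:
        print(name, "->", connstr_secret_name(name, 10))
```

With scale 10, the `neon-captest-reuse` entry resolves to `BENCHMARK_CAPTEST_TPCH_S10_CONNSTR`, which matches the secret name the removed lines referenced directly.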

									
										105

.github/workflows/build-build-tools-image.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,105 @@

				name: Build build-tools image

				on:

				  workflow_call:

				    inputs:

				      image-tag:

				        description: "build-tools image tag"

				        required: true

				        type: string

				    outputs:

				      image-tag:

				        description: "build-tools tag"

				        value: ${{ inputs.image-tag }}

				      image:

				        description: "build-tools image"

				        value: neondatabase/build-tools:${{ inputs.image-tag }}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				concurrency:

				  group: build-build-tools-image-${{ inputs.image-tag }}

				  cancel-in-progress: false

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  check-image:

				    uses: ./.github/workflows/check-build-tools-image.yml

				  build-image:

				    needs: [ check-image ]

				    if: needs.check-image.outputs.found == 'false'

				    strategy:

				      matrix:

				        arch: [ x64, arm64 ]

				    runs-on: ${{ fromJson(format('["self-hosted", "gen3", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    env:

				      IMAGE_TAG: ${{ inputs.image-tag }}

				    steps:

				      - name: Check `input.tag` is correct

				        env:

				          INPUTS_IMAGE_TAG: ${{ inputs.image-tag }}

				          CHECK_IMAGE_TAG : ${{ needs.check-image.outputs.image-tag }}

				        run: |

				          if [ "${INPUTS_IMAGE_TAG}" != "${CHECK_IMAGE_TAG}" ]; then

				            echo "'inputs.image-tag' (${INPUTS_IMAGE_TAG}) does not match the tag of the latest build-tools image 'inputs.image-tag' (${CHECK_IMAGE_TAG})"

				            exit 1

				          fi

				      - uses: actions/checkout@v4

				      # Use custom DOCKER_CONFIG directory to avoid conflicts with default settings

				      # The default value is ~/.docker

				      - name: Set custom docker config directory

				        run: |

				          mkdir -p /tmp/.docker-custom

				          echo DOCKER_CONFIG=/tmp/.docker-custom >> $GITHUB_ENV

				      - uses: docker/setup-buildx-action@v2

				      - uses: docker/login-action@v2

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - uses: docker/build-push-action@v4

				        with:

				          context: .

				          provenance: false

				          push: true

				          pull: true

				          file: Dockerfile.build-tools

				          cache-from: type=registry,ref=neondatabase/build-tools:cache-${{ matrix.arch }}

				          cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=neondatabase/build-tools:cache-{0},mode=max', matrix.arch) || '' }}

				          tags: neondatabase/build-tools:${{ inputs.image-tag }}-${{ matrix.arch }}

				      - name: Remove custom docker config directory

				        run: |

				          rm -rf /tmp/.docker-custom

				  merge-images:

				    needs: [ build-image ]

				    runs-on: ubuntu-22.04

				    env:

				      IMAGE_TAG: ${{ inputs.image-tag }}

				    steps:

				      - uses: docker/login-action@v3

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - name: Create multi-arch image

				        run: |

				          docker buildx imagetools create -t neondatabase/build-tools:${IMAGE_TAG} \

				                                             neondatabase/build-tools:${IMAGE_TAG}-x64 \

				                                             neondatabase/build-tools:${IMAGE_TAG}-arm64

1253

.github/workflows/build_and_test.yml vendored

File diff suppressed because it is too large

									
										51

.github/workflows/check-build-tools-image.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,51 @@

				name: Check build-tools image

				on:

				  workflow_call:

				    outputs:

				      image-tag:

				        description: "build-tools image tag"

				        value: ${{ jobs.check-image.outputs.tag }}

				      found:

				        description: "Whether the image is found in the registry"

				        value: ${{ jobs.check-image.outputs.found }}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  check-image:

				    runs-on: ubuntu-22.04

				    outputs:

				      tag: ${{ steps.get-build-tools-tag.outputs.image-tag }}

				      found: ${{ steps.check-image.outputs.found }}

				    steps:

				      - uses: actions/checkout@v4

				      - name: Get build-tools image tag for the current commit

				        id: get-build-tools-tag

				        env:

				          IMAGE_TAG: |

				            ${{ hashFiles('Dockerfile.build-tools',

				                          '.github/workflows/check-build-tools-image.yml',

				                          '.github/workflows/build-build-tools-image.yml') }}

				        run: |

				          echo "image-tag=${IMAGE_TAG}" | tee -a $GITHUB_OUTPUT

				      - name: Check if such tag found in the registry

				        id: check-image

				        env:

				          IMAGE_TAG: ${{ steps.get-build-tools-tag.outputs.image-tag }}

				        run: |

				          if docker manifest inspect neondatabase/build-tools:${IMAGE_TAG}; then

				            found=true

				          else

				            found=false

				          fi

				          echo "found=${found}" | tee -a $GITHUB_OUTPUT
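Review note: the workflow above derives the image tag from the contents of three files, so the tag changes only when one of them changes, and a build is needed only when no image with that tag is in the registry yet. The Python sketch below illustrates the idea under stated assumptions. `content_tag` and `needs_build` are hypothetical names, the registry lookup is passed in as a callable rather than calling `docker manifest inspect`, and the hash is not claimed to equal the value GitHub's `hashFiles` would produce.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


def content_tag(paths: Iterable[Path]) -> str:
    """Derive a tag from file contents: hash each file, then hash the hashes."""
    outer = hashlib.sha256()
    for path in paths:
        outer.update(hashlib.sha256(Path(path).read_bytes()).digest())
    return outer.hexdigest()


def needs_build(tag: str, exists_in_registry: Callable[[str], bool]) -> bool:
    """A build is required only when the registry has no image for this tag."""
    return not exists_in_registry(tag)


if __name__ == "__main__":
    tag = content_tag([Path(__file__)])
    # Pretend the registry is empty, so a build would be triggered.
    print(tag, needs_build(tag, lambda _tag: False))
```

Because the two workflow files are themselves part of the hashed set in the diff, editing the build logic also produces a new tag and forces a rebuild.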

									
										36

.github/workflows/check-permissions.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,36 @@

				name: Check Permissions

				on:

				  workflow_call:

				    inputs:

				      github-event-name:

				        required: true

				        type: string

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  check-permissions:

				    runs-on: ubuntu-22.04

				    steps:

				    - name: Disallow CI runs on PRs from forks

				      if: |

				        inputs.github-event-name  == 'pull_request' &&

				        github.event.pull_request.head.repo.full_name != github.repository

				      run: |

				        if [ "${{ contains(fromJSON('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.pull_request.author_association) }}" = "true" ]; then

				          MESSAGE="Please create a PR from a branch of ${GITHUB_REPOSITORY} instead of a fork"

				        else

				          MESSAGE="The PR should be reviewed and labelled with 'approved-for-ci-run' to trigger a CI run"

				        fi

				        # TODO: use actions/github-script to post this message as a PR comment

				        echo >&2 "We don't run CI for PRs from forks"

				        echo >&2 "${MESSAGE}"

				        exit 1
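Review note: the check above fails the job for any pull request whose head repository differs from the base repository, and only the message depends on the author's association. A small Python sketch of that decision follows. The function name and its parameters are hypothetical, and the message strings are copied from the workflow.

```python
from typing import Optional

# Associations the workflow treats as able to push a branch to the repository.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def fork_pr_rejection(
    event_name: str,
    head_repo: str,
    base_repo: str,
    author_association: str,
) -> Optional[str]:
    """Return the rejection message for a fork PR, or None when CI may run."""
    if event_name != "pull_request" or head_repo == base_repo:
        return None
    if author_association in TRUSTED_ASSOCIATIONS:
        return f"Please create a PR from a branch of {base_repo} instead of a fork"
    return "The PR should be reviewed and labelled with 'approved-for-ci-run' to trigger a CI run"


if __name__ == "__main__":
    print(fork_pr_rejection("pull_request", "someone/neon", "neondatabase/neon", "NONE"))
```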

									
										32

.github/workflows/cleanup-caches-by-a-branch.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,32 @@

				# A workflow from

				# https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries

				name: cleanup caches by a branch

				on:

				  pull_request:

				    types:

				      - closed

				jobs:

				  cleanup:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Cleanup

				        run: |

				          gh extension install actions/gh-actions-cache

				          echo "Fetching list of cache key"

				          cacheKeysForPR=$(gh actions-cache list -R $REPO -B $BRANCH -L 100 | cut -f 1 )

				          ## Setting this to not fail the workflow while deleting cache keys.

				          set +e

				          echo "Deleting caches..."

				          for cacheKey in $cacheKeysForPR

				          do

				              gh actions-cache delete $cacheKey -R $REPO -B $BRANCH --confirm

				          done

				          echo "Done"

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          REPO: ${{ github.repository }}

				          BRANCH: refs/pull/${{ github.event.pull_request.number }}/merge

									
										179

.github/workflows/deploy-dev.yml
									
										vendored
									
											
				@@ -1,179 +0,0 @@

				name: Neon Deploy dev

				on:

				  workflow_dispatch:

				    inputs:

				      dockerTag:

				        description: 'Docker tag to deploy'

				        required: true

				        type: string

				      branch:

				        description: 'Branch or commit used for deploy scripts and configs'

				        required: true

				        type: string

				        default: 'main'

				      deployStorage:

				        description: 'Deploy storage'

				        required: true

				        type: boolean

				        default: true

				      deployProxy:

				        description: 'Deploy proxy'

				        required: true

				        type: boolean

				        default: true

				      deployStorageBroker:

				        description: 'Deploy storage-broker'

				        required: true

				        type: boolean

				        default: true

				env:

				  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				concurrency:

				  group: deploy-dev

				  cancel-in-progress: false

				jobs:

				  deploy-storage-new:

				    runs-on: [ self-hosted, gen3, small ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned

				      options: --user root --privileged

				    if: inputs.deployStorage

				    defaults:

				      run:

				        shell: bash

				    strategy:

				      matrix:

				        target_region: [ eu-west-1, us-east-2 ]

				    environment:

				      name: dev-${{ matrix.target_region }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        with:

				          submodules: true

				          fetch-depth: 0

				          ref: ${{ inputs.branch }}

				      - name: Redeploy

				        run: |

				          export DOCKER_TAG=${{ inputs.dockerTag }}

				          cd "$(pwd)/.github/ansible"

				          ./get_binaries.sh

				          ansible-galaxy collection install sivel.toiletwater

				          ansible-playbook -v deploy.yaml -i staging.${{ matrix.target_region }}.hosts.yaml -e @ssm_config -e CONSOLE_API_TOKEN=${{ secrets.NEON_STAGING_API_KEY }} -e SENTRY_URL_PAGESERVER=${{ secrets.SENTRY_URL_PAGESERVER }} -e SENTRY_URL_SAFEKEEPER=${{ secrets.SENTRY_URL_SAFEKEEPER }}

				          rm -f neon_install.tar.gz .neon_current_version

				      - name: Cleanup ansible folder

				        run: rm -rf ~/.ansible

				  deploy-proxy-new:

				    runs-on: [ self-hosted, gen3, small ]

				    container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned

				    if: inputs.deployProxy

				    defaults:

				      run:

				        shell: bash

				    strategy:

				      matrix:

				        include:

				          - target_region:  us-east-2

				            target_cluster: dev-us-east-2-beta

				            deploy_link_proxy: true

				            deploy_legacy_scram_proxy: true

				          - target_region:  eu-west-1

				            target_cluster: dev-eu-west-1-zeta

				            deploy_link_proxy: false

				            deploy_legacy_scram_proxy: false

				    environment:

				      name: dev-${{ matrix.target_region }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        with:

				          submodules: true

				          fetch-depth: 0

				          ref: ${{ inputs.branch }}

				      - name: Configure AWS Credentials

				        uses: aws-actions/configure-aws-credentials@v1-node16

				        with:

				          role-to-assume: arn:aws:iam::369495373322:role/github-runner

				          aws-region: eu-central-1

				          role-skip-session-tagging: true

				          role-duration-seconds: 1800

				      - name: Configure environment

				        run: |

				          helm repo add neondatabase https://neondatabase.github.io/helm-charts

				          aws --region ${{ matrix.target_region }} eks update-kubeconfig --name  ${{ matrix.target_cluster }}

				      - name: Re-deploy scram proxy

				        run: |

				          DOCKER_TAG=${{ inputs.dockerTag }}

				          helm upgrade neon-proxy-scram neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s

				      - name: Re-deploy link proxy

				        if: matrix.deploy_link_proxy

				        run: |

				          DOCKER_TAG=${{ inputs.dockerTag }}

				          helm upgrade neon-proxy-link neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-link.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s

				      - name: Re-deploy legacy scram proxy

				        if: matrix.deploy_legacy_scram_proxy

				        run: |

				          DOCKER_TAG=${{ inputs.dockerTag }}

				          helm upgrade neon-proxy-scram-legacy neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram-legacy.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s

				      - name: Cleanup helm folder

				        run: rm -rf ~/.cache

				  deploy-storage-broker-new:

				    runs-on: [ self-hosted, gen3, small ]

				    container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned

				    if: inputs.deployStorageBroker

				    defaults:

				      run:

				        shell: bash

				    strategy:

				      matrix:

				        include:

				          - target_region:  us-east-2

				            target_cluster: dev-us-east-2-beta

				          - target_region:  eu-west-1

				            target_cluster: dev-eu-west-1-zeta

				    environment:

				      name: dev-${{ matrix.target_region }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        with:

				          submodules: true

				          fetch-depth: 0

				          ref: ${{ inputs.branch }}

				      - name: Configure AWS Credentials

				        uses: aws-actions/configure-aws-credentials@v1-node16

				        with:

				          role-to-assume: arn:aws:iam::369495373322:role/github-runner

				          aws-region: eu-central-1

				          role-skip-session-tagging: true

				          role-duration-seconds: 1800

				      - name: Configure environment

				        run: |

				          helm repo add neondatabase https://neondatabase.github.io/helm-charts

				          aws --region ${{ matrix.target_region }} eks update-kubeconfig --name  ${{ matrix.target_cluster }}

				      - name: Deploy storage-broker

				        run:

				          helm upgrade neon-storage-broker-lb neondatabase/neon-storage-broker --namespace neon-storage-broker-lb --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-storage-broker.yaml --set image.tag=${{ inputs.dockerTag }} --set settings.sentryUrl=${{ secrets.SENTRY_URL_BROKER }} --wait --timeout 5m0s

				      - name: Cleanup helm folder

				        run: rm -rf ~/.cache

									
.github/workflows/deploy-prod.yml (167 lines changed, vendored)
											
				@@ -1,167 +0,0 @@

				name: Neon Deploy prod

				on:

				  workflow_dispatch:

				    inputs:

				      dockerTag:

				        description: 'Docker tag to deploy'

				        required: true

				        type: string

				      branch:

				        description: 'Branch or commit used for deploy scripts and configs'

				        required: true

				        type: string

				        default: 'release'

				      deployStorage:

				        description: 'Deploy storage'

				        required: true

				        type: boolean

				        default: true

				      deployProxy:

				        description: 'Deploy proxy'

				        required: true

				        type: boolean

				        default: true

				      deployStorageBroker:

				        description: 'Deploy storage-broker'

				        required: true

				        type: boolean

				        default: true

				      disclamerAcknowledged:

				        description: 'I confirm that there is an emergency and I can not use regular release workflow'

				        required: true

				        type: boolean

				        default: false

				concurrency:

				  group: deploy-prod

				  cancel-in-progress: false

				jobs:

				  deploy-prod-new:

				    runs-on: prod

				    container:

				      image: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest

				      options: --user root --privileged

				    if: inputs.deployStorage && inputs.disclamerAcknowledged

				    defaults:

				      run:

				        shell: bash

				    strategy:

				      matrix:

				        target_region: [ us-east-2, us-west-2, eu-central-1, ap-southeast-1 ]

				    environment:

				      name: prod-${{ matrix.target_region }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        with:

				          submodules: true

				          fetch-depth: 0

				          ref: ${{ inputs.branch }}

				      - name: Redeploy

				        run: |

				          export DOCKER_TAG=${{ inputs.dockerTag }}

				          cd "$(pwd)/.github/ansible"

				          ./get_binaries.sh

				          ansible-galaxy collection install sivel.toiletwater

				          ansible-playbook -v deploy.yaml -i prod.${{ matrix.target_region }}.hosts.yaml -e @ssm_config -e CONSOLE_API_TOKEN=${{ secrets.NEON_PRODUCTION_API_KEY }} -e SENTRY_URL_PAGESERVER=${{ secrets.SENTRY_URL_PAGESERVER }} -e SENTRY_URL_SAFEKEEPER=${{ secrets.SENTRY_URL_SAFEKEEPER }}

				          rm -f neon_install.tar.gz .neon_current_version

				  deploy-proxy-prod-new:

				    runs-on: prod

				    container: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest

				    if: inputs.deployProxy && inputs.disclamerAcknowledged

				    defaults:

				      run:

				        shell: bash

				    strategy:

				      matrix:

				        include:

				          - target_region:  us-east-2

				            target_cluster: prod-us-east-2-delta

				            deploy_link_proxy: true

				            deploy_legacy_scram_proxy: false

				          - target_region:  us-west-2

				            target_cluster: prod-us-west-2-eta

				            deploy_link_proxy: false

				            deploy_legacy_scram_proxy: true

				          - target_region: eu-central-1

				            target_cluster: prod-eu-central-1-gamma

				            deploy_link_proxy: false

				            deploy_legacy_scram_proxy: false

				          - target_region: ap-southeast-1

				            target_cluster: prod-ap-southeast-1-epsilon

				            deploy_link_proxy: false

				            deploy_legacy_scram_proxy: false

				    environment:

				      name: prod-${{ matrix.target_region }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        with:

				          submodules: true

				          fetch-depth: 0

				          ref: ${{ inputs.branch }}

				      - name: Configure environment

				        run: |

				          helm repo add neondatabase https://neondatabase.github.io/helm-charts

				          aws --region ${{ matrix.target_region }} eks update-kubeconfig --name  ${{ matrix.target_cluster }}

				      - name: Re-deploy scram proxy

				        run: |

				          DOCKER_TAG=${{ inputs.dockerTag }}

				          helm upgrade neon-proxy-scram neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s

				      - name: Re-deploy link proxy

				        if: matrix.deploy_link_proxy

				        run: |

				          DOCKER_TAG=${{ inputs.dockerTag }}

				          helm upgrade neon-proxy-link neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-link.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s

				      - name: Re-deploy legacy scram proxy

				        if: matrix.deploy_legacy_scram_proxy

				        run: |

				          DOCKER_TAG=${{ inputs.dockerTag }}

				          helm upgrade neon-proxy-scram-legacy neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram-legacy.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s

				  deploy-storage-broker-prod-new:

				    runs-on: prod

				    container: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest

				    if: inputs.deployStorageBroker && inputs.disclamerAcknowledged

				    defaults:

				      run:

				        shell: bash

				    strategy:

				      matrix:

				        include:

				          - target_region:  us-east-2

				            target_cluster: prod-us-east-2-delta

				          - target_region:  us-west-2

				            target_cluster: prod-us-west-2-eta

				          - target_region: eu-central-1

				            target_cluster: prod-eu-central-1-gamma

				          - target_region: ap-southeast-1

				            target_cluster: prod-ap-southeast-1-epsilon

				    environment:

				      name: prod-${{ matrix.target_region }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        with:

				          submodules: true

				          fetch-depth: 0

				          ref: ${{ inputs.branch }}

				      - name: Configure environment

				        run: |

				          helm repo add neondatabase https://neondatabase.github.io/helm-charts

				          aws --region ${{ matrix.target_region }} eks update-kubeconfig --name  ${{ matrix.target_cluster }}

				      - name: Deploy storage-broker

				        run:

				          helm upgrade neon-storage-broker-lb neondatabase/neon-storage-broker --namespace neon-storage-broker-lb --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-storage-broker.yaml --set image.tag=${{ inputs.dockerTag }} --set settings.sentryUrl=${{ secrets.SENTRY_URL_BROKER }} --wait --timeout 5m0s

									
.github/workflows/neon_extra_builds.yml (303 lines changed, vendored)
												
				@@ -3,7 +3,7 @@ name: Check neon with extra platform builds

				on:

				  push:

				    branches:

				    - main

				      - main

				  pull_request:

				defaults:

				@@ -12,7 +12,7 @@ defaults:

				concurrency:

				  # Allow only one workflow per any non-`main` branch.

				  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}

				  group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}

				  cancel-in-progress: true

				env:

				@@ -20,10 +20,31 @@ env:

				  COPT: '-Werror'

				jobs:

				  check-permissions:

				    if: ${{ !contains(github.event.pull_request.labels.*.name, 'run-no-ci') }}

				    uses: ./.github/workflows/check-permissions.yml

				    with:

				      github-event-name: ${{ github.event_name}}

				  check-build-tools-image:

				    needs: [ check-permissions ]

				    uses: ./.github/workflows/check-build-tools-image.yml

				  build-build-tools-image:

				    needs: [ check-build-tools-image ]

				    uses: ./.github/workflows/build-build-tools-image.yml

				    with:

				      image-tag: ${{ needs.check-build-tools-image.outputs.image-tag }}

				    secrets: inherit

				  check-macos-build:

				    if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')

				    needs: [ check-permissions ]

				    if: |

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				      github.ref_name == 'main'

				    timeout-minutes: 90

				    runs-on: macos-latest

				    runs-on: macos-14

				    env:

				      # Use release build only, to have less debug info around

				@@ -32,13 +53,13 @@ jobs:

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        uses: actions/checkout@v4

				        with:

				          submodules: true

				          fetch-depth: 1

				      - name: Install macOS postgres dependencies

				        run: brew install flex bison openssl protobuf

				        run: brew install flex bison openssl protobuf icu4c pkg-config

				      - name: Set pg 14 revision for caching

				        id: pg_v14_rev

				@@ -48,19 +69,30 @@ jobs:

				        id: pg_v15_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT

				      - name: Set pg 16 revision for caching

				        id: pg_v16_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT

				      - name: Cache postgres v14 build

				        id: cache_pg_14

				        uses: actions/cache@v3

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v14

				          key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v15 build

				        id: cache_pg_15

				        uses: actions/cache@v3

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v15

				          key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v16 build

				        id: cache_pg_16

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v16

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Set extra env for macOS

				        run: |

				@@ -68,37 +100,259 @@ jobs:

				          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

				      - name: Cache cargo deps

				        uses: actions/cache@v3

				        uses: actions/cache@v4

				        with:

				          path: |

				            ~/.cargo/registry

				            !~/.cargo/registry/src

				            ~/.cargo/git

				            target

				          key: v1-${{ runner.os }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust

				          key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust

				      - name: Build postgres v14

				        if: steps.cache_pg_14.outputs.cache-hit != 'true'

				        run: make postgres-v14 -j$(nproc)

				        run: make postgres-v14 -j$(sysctl -n hw.ncpu)

				      - name: Build postgres v15

				        if: steps.cache_pg_15.outputs.cache-hit != 'true'

				        run: make postgres-v15 -j$(nproc)

				        run: make postgres-v15 -j$(sysctl -n hw.ncpu)

				      - name: Build postgres v16

				        if: steps.cache_pg_16.outputs.cache-hit != 'true'

				        run: make postgres-v16 -j$(sysctl -n hw.ncpu)

				      - name: Build neon extensions

				        run: make neon-pg-ext -j$(nproc)

				        run: make neon-pg-ext -j$(sysctl -n hw.ncpu)

				      - name: Build walproposer-lib

				        run: make walproposer-lib -j$(sysctl -n hw.ncpu)

				      - name: Run cargo build

				        run: cargo build --all --release

				        run: PQ_LIB_DIR=$(pwd)/pg_install/v16/lib cargo build --all --release

				      - name: Check that no warnings are produced

				        run: ./run_clippy.sh

				  gather-rust-build-stats:

				    if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats')

				    runs-on: [ self-hosted, gen3, large ]

				  check-linux-arm-build:

				    needs: [ check-permissions, build-build-tools-image ]

				    timeout-minutes: 90

				    runs-on: [ self-hosted, small-arm64 ]

				    env:

				      # Use release build only, to have less debug info around

				      # Hence keeping target/ (and general cache size) smaller

				      BUILD_TYPE: release

				      CARGO_FEATURES: --features testing

				      CARGO_FLAGS: --release

				      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned

				      image: ${{ needs.build-build-tools-image.outputs.image }}

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      options: --init

				    steps:

				      - name: Fix git ownership

				        run: |

				          # Workaround for `fatal: detected dubious ownership in repository at ...`

				          #

				          # Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers

				          #   Ref https://github.com/actions/checkout/issues/785

				          #

				          git config --global --add safe.directory ${{ github.workspace }}

				          git config --global --add safe.directory ${GITHUB_WORKSPACE}

				          for r in 14 15 16; do

				            git config --global --add safe.directory "${{ github.workspace }}/vendor/postgres-v$r"

				            git config --global --add safe.directory "${GITHUB_WORKSPACE}/vendor/postgres-v$r"

				          done

				      - name: Checkout

				        uses: actions/checkout@v4

				        with:

				          submodules: true

				          fetch-depth: 1

				      - name: Set pg 14 revision for caching

				        id: pg_v14_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) >> $GITHUB_OUTPUT

				      - name: Set pg 15 revision for caching

				        id: pg_v15_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT

				      - name: Set pg 16 revision for caching

				        id: pg_v16_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT

				      - name: Set env variables

				        run: |

				          echo "CARGO_HOME=${GITHUB_WORKSPACE}/.cargo" >> $GITHUB_ENV

				      - name: Cache postgres v14 build

				        id: cache_pg_14

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v14

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v15 build

				        id: cache_pg_15

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v15

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v16 build

				        id: cache_pg_16

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v16

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Build postgres v14

				        if: steps.cache_pg_14.outputs.cache-hit != 'true'

				        run: mold -run make postgres-v14 -j$(nproc)

				      - name: Build postgres v15

				        if: steps.cache_pg_15.outputs.cache-hit != 'true'

				        run: mold -run make postgres-v15 -j$(nproc)

				      - name: Build postgres v16

				        if: steps.cache_pg_16.outputs.cache-hit != 'true'

				        run: mold -run make postgres-v16 -j$(nproc)

				      - name: Build neon extensions

				        run: mold -run make neon-pg-ext -j$(nproc)

				      - name: Build walproposer-lib

				        run: mold -run make walproposer-lib -j$(nproc)

				      - name: Run cargo build

				        run: |

				          mold -run cargo build --locked $CARGO_FLAGS $CARGO_FEATURES --bins --tests -j$(nproc)

				      - name: Run cargo test

				        env:

				          NEXTEST_RETRIES: 3

				        run: |

				          cargo nextest run $CARGO_FEATURES -j$(nproc)

				          # Run separate tests for real S3

				          export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty

				          export REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests

				          export REMOTE_STORAGE_S3_REGION=eu-central-1

				          # Avoid `$CARGO_FEATURES` since there's no `testing` feature in the e2e tests now

				          cargo nextest run --package remote_storage --test test_real_s3 -j$(nproc)

				          # Run separate tests for real Azure Blob Storage

				          # XXX: replace region with `eu-central-1`-like region

				          export ENABLE_REAL_AZURE_REMOTE_STORAGE=y

				          export AZURE_STORAGE_ACCOUNT="${{ secrets.AZURE_STORAGE_ACCOUNT_DEV }}"

				          export AZURE_STORAGE_ACCESS_KEY="${{ secrets.AZURE_STORAGE_ACCESS_KEY_DEV }}"

				          export REMOTE_STORAGE_AZURE_CONTAINER="${{ vars.REMOTE_STORAGE_AZURE_CONTAINER }}"

				          export REMOTE_STORAGE_AZURE_REGION="${{ vars.REMOTE_STORAGE_AZURE_REGION }}"

				          # Avoid `$CARGO_FEATURES` since there's no `testing` feature in the e2e tests now

				          cargo nextest run --package remote_storage --test test_real_azure -j$(nproc)

				  check-codestyle-rust-arm:

				    needs: [ check-permissions, build-build-tools-image ]

				    timeout-minutes: 90

				    runs-on: [ self-hosted, small-arm64 ]

				    container:

				      image: ${{ needs.build-build-tools-image.outputs.image }}

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      options: --init

				    strategy:

				      fail-fast: false

				      matrix:

				        build_type: [ debug, release ]

				    steps:

				      - name: Fix git ownership

				        run: |

				          # Workaround for `fatal: detected dubious ownership in repository at ...`

				          #

				          # Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers

				          #   Ref https://github.com/actions/checkout/issues/785

				          #

				          git config --global --add safe.directory ${{ github.workspace }}

				          git config --global --add safe.directory ${GITHUB_WORKSPACE}

				          for r in 14 15 16; do

				            git config --global --add safe.directory "${{ github.workspace }}/vendor/postgres-v$r"

				            git config --global --add safe.directory "${GITHUB_WORKSPACE}/vendor/postgres-v$r"

				          done

				      - name: Checkout

				        uses: actions/checkout@v4

				        with:

				          submodules: true

				          fetch-depth: 1

				      # Some of our rust modules use FFI and need those to be checked

				      - name: Get postgres headers

				        run: make postgres-headers -j$(nproc)

				      # cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.

				      # This will catch compiler & clippy warnings in all feature combinations.

				      # TODO: use cargo hack for build and test as well, but, that's quite expensive.

				      # NB: keep clippy args in sync with ./run_clippy.sh

				      - run: |

				          CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"

				          if [ "$CLIPPY_COMMON_ARGS" = "" ]; then

				            echo "No clippy args found in .neon_clippy_args"

				            exit 1

				          fi

				          echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV

				      - name: Run cargo clippy (debug)

				        if: matrix.build_type == 'debug'

				        run: cargo hack --feature-powerset clippy $CLIPPY_COMMON_ARGS

				      - name: Run cargo clippy (release)

				        if: matrix.build_type == 'release'

				        run: cargo hack --feature-powerset clippy --release $CLIPPY_COMMON_ARGS

				      - name: Check documentation generation

				        if: matrix.build_type == 'release'

				        run: cargo doc --workspace --no-deps --document-private-items -j$(nproc)

				        env:

				            RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"

				      # Use `${{ !cancelled() }}` to run quck tests after the longer clippy run

				      - name: Check formatting

				        if: ${{ !cancelled() && matrix.build_type == 'release' }}

				        run: cargo fmt --all -- --check

				      # https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci

				      - name: Check rust dependencies

				        if: ${{ !cancelled() && matrix.build_type == 'release' }}

				        run: |

				          cargo hakari generate --diff  # workspace-hack Cargo.toml is up-to-date

				          cargo hakari manage-deps --dry-run  # all workspace crates depend on workspace-hack

				      # https://github.com/EmbarkStudios/cargo-deny

				      - name: Check rust licenses/bans/advisories/sources

				        if: ${{ !cancelled() && matrix.build_type == 'release' }}

				        run: cargo deny check

				  gather-rust-build-stats:

				    needs: [ check-permissions, build-build-tools-image ]

				    if: |

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				      github.ref_name == 'main'

				    runs-on: [ self-hosted, large ]

				    container:

				      image: ${{ needs.build-build-tools-image.outputs.image }}

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      options: --init

				    env:

				@@ -111,7 +365,7 @@ jobs:

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v3

				        uses: actions/checkout@v4

				        with:

				          submodules: true

				          fetch-depth: 1

				@@ -120,8 +374,11 @@ jobs:

				      - name: Get postgres headers

				        run: make postgres-headers -j$(nproc)

				      - name: Build walproposer-lib

				        run: make walproposer-lib -j$(nproc)

				      - name: Produce the build stats

				        run: cargo build --all --release --timings

				        run: cargo build --all --release --timings -j$(nproc)

				      - name: Upload the build stats

				        id: upload-stats

				@@ -136,7 +393,7 @@ jobs:

				          echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT

				      - name: Publish build stats report

				        uses: actions/github-script@v6

				        uses: actions/github-script@v7

				        env:

				          REPORT_URL: ${{ steps.upload-stats.outputs.report-url }}

				          SHA: ${{ github.event.pull_request.head.sha || github.sha }}
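Note on the `concurrency.group` change in this file: the new expression `github.ref_name == 'main' && github.sha || 'anysha'` yields a per-commit group on `main` and one shared group per branch elsewhere, so with `cancel-in-progress: true` only non-`main` runs cancel each other. A minimal shell sketch of how that expression evaluates follows; the function name `concurrency_group` and the sample values are hypothetical and are not part of the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the workflow expression:
#   ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
concurrency_group() {
  local workflow="$1" ref_name="$2" sha="$3"
  local suffix="anysha"
  # `a && b || c` falls through to c when a is false or b is empty
  if [ "$ref_name" = "main" ] && [ -n "$sha" ]; then
    suffix="$sha"
  fi
  echo "${workflow}-${ref_name}-${suffix}"
}

concurrency_group "ci" "main" "abc123"       # prints ci-main-abc123
concurrency_group "ci" "feature-x" "abc123"  # prints ci-feature-x-anysha
```

Two pushes to `feature-x` map to the same group name and the older run is cancelled, while two commits on `main` get distinct group names and both run to completion.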

									
.github/workflows/pg_clients.yml (15 lines changed, vendored)
												
				@@ -14,13 +14,13 @@ on:

				concurrency:

				  # Allow only one workflow per any non-`main` branch.

				  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}

				  group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}

				  cancel-in-progress: true

				jobs:

				  test-postgres-client-libs:

				    # TODO: switch to gen2 runner, requires docker

				    runs-on: [ ubuntu-latest ]

				    runs-on: ubuntu-22.04

				    env:

				      DEFAULT_PG_VERSION: 14

				@@ -28,7 +28,7 @@ jobs:

				    steps:

				    - name: Checkout

				      uses: actions/checkout@v3

				      uses: actions/checkout@v4

				    - uses: actions/setup-python@v4

				      with:

				@@ -38,11 +38,10 @@ jobs:

				      uses: snok/install-poetry@v1

				    - name: Cache poetry deps

				      id: cache_poetry

				      uses: actions/cache@v3

				      uses: actions/cache@v4

				      with:

				        path: ~/.cache/pypoetry/virtualenvs

				        key: v1-${{ runner.os }}-python-deps-${{ hashFiles('poetry.lock') }}

				        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-ubunutu-latest-${{ hashFiles('poetry.lock') }}

				    - name: Install Python deps

				      shell: bash -euxo pipefail {0}

				@@ -83,10 +82,10 @@ jobs:

				    # It will be fixed after switching to gen2 runner

				    - name: Upload python test logs

				      if: always()

				      uses: actions/upload-artifact@v3

				      uses: actions/upload-artifact@v4

				      with:

				        retention-days: 7

				        name: python-test-pg_clients-${{ runner.os }}-stage-logs

				        name: python-test-pg_clients-${{ runner.os }}-${{ runner.arch }}-stage-logs

				        path: ${{ env.TEST_OUTPUT }}

				    - name: Post to a Slack channel

									
.github/workflows/pin-build-tools-image.yml (73 lines changed, vendored, Normal file)
												
				@@ -0,0 +1,73 @@

				name: 'Pin build-tools image'

				on:

				  workflow_dispatch:

				    inputs:

				      from-tag:

				        description: 'Source tag'

				        required: true

				        type: string

				  workflow_call:

				    inputs:

				      from-tag:

				        description: 'Source tag'

				        required: true

				        type: string

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				concurrency:

				  group: pin-build-tools-image-${{ inputs.from-tag }}

				  cancel-in-progress: false

				permissions: {}

				jobs:

				  tag-image:

				    runs-on: ubuntu-22.04

				    env:

				      FROM_TAG: ${{ inputs.from-tag }}

				      TO_TAG: pinned

				    steps:

				      - name: Check if we really need to pin the image

				        id: check-manifests

				        run: |

				          docker manifest inspect neondatabase/build-tools:${FROM_TAG} > ${FROM_TAG}.json

				          docker manifest inspect neondatabase/build-tools:${TO_TAG}   > ${TO_TAG}.json

				          if diff ${FROM_TAG}.json ${TO_TAG}.json; then

				            skip=true

				          else

				            skip=false

				          fi

				          echo "skip=${skip}" | tee -a $GITHUB_OUTPUT

				      - uses: docker/login-action@v3

				        if: steps.check-manifests.outputs.skip == 'false'

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - name: Tag build-tools with `${{ env.TO_TAG }}` in Docker Hub

				        if: steps.check-manifests.outputs.skip == 'false'

				        run: |

				          docker buildx imagetools create -t neondatabase/build-tools:${TO_TAG} \

				                                             neondatabase/build-tools:${FROM_TAG}

				      - uses: docker/login-action@v3

				        if: steps.check-manifests.outputs.skip == 'false'

				        with:

				          registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com

				          username: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				          password: ${{ secrets.AWS_SECRET_KEY_DEV }}

				      - name: Tag build-tools with `${{ env.TO_TAG }}` in ECR

				        if: steps.check-manifests.outputs.skip == 'false'

				        run: |

				          docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${TO_TAG} \

				                                             neondatabase/build-tools:${FROM_TAG}
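Note on the `check-manifests` step above: it decides whether to re-tag by diffing the two manifest dumps and writing `skip=true` or `skip=false` to `$GITHUB_OUTPUT`. The sketch below reproduces only that decision locally, using stand-in JSON files instead of `docker manifest inspect` output; the helper name `should_skip` and the file contents are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints skip=true when both manifest files are identical,
# which is the same decision the workflow step makes with `if diff ...`.
should_skip() {
  local from_manifest="$1" to_manifest="$2"
  if diff -q "$from_manifest" "$to_manifest" > /dev/null; then
    echo "skip=true"
  else
    echo "skip=false"
  fi
}

# Stand-in manifests (made-up content, not real registry output)
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
printf '{"digest":"sha256:aaa"}\n' > "$workdir/from.json"
printf '{"digest":"sha256:aaa"}\n' > "$workdir/pinned.json"
printf '{"digest":"sha256:bbb"}\n' > "$workdir/other.json"

should_skip "$workdir/from.json" "$workdir/pinned.json"   # prints skip=true
should_skip "$workdir/other.json" "$workdir/pinned.json"  # prints skip=false
```

Because the `if` consumes the exit status of `diff`, the non-zero status for differing files does not abort the script under `set -e`, which is also why the workflow step is safe with `bash -euo pipefail`.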

									
.github/workflows/release-notify.yml (29 lines changed, vendored, Normal file)
												
				@@ -0,0 +1,29 @@

				name: Notify Slack channel about upcoming release

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.number }}

				  cancel-in-progress: true

				on:

				  pull_request:

				    branches:

				      - release

				    types:

				      # Default types that triggers a workflow:

				      # - https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request

				      - opened

				      - synchronize

				      - reopened

				      # Additional types that we want to handle:

				      - closed

				jobs:

				  notify:

				    runs-on: ubuntu-22.04

				    steps:

				      - uses: neondatabase/dev-actions/release-pr-notify@main

				        with:

				          slack-token: ${{ secrets.SLACK_BOT_TOKEN }}

				          slack-channel-id: ${{ vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID || 'C05QQ9J1BRC' }} # if not set, then `#test-release-notifications`

				          github-token: ${{ secrets.GITHUB_TOKEN }}

									
.github/workflows/release.yml (102 lines changed, vendored)
												
				@@ -2,32 +2,106 @@ name: Create Release Branch

				on:

				  schedule:

				    - cron: '0 10 * * 2'

				    # It should be kept in sync with if-condition in jobs

				    - cron: '0 6 * * MON' # Storage release

				    - cron: '0 6 * * THU' # Proxy release

				  workflow_dispatch:

				    inputs:

				      create-storage-release-branch:

				        type: boolean

				        description: 'Create Storage release PR'

				        required: false

				      create-proxy-release-branch:

				        type: boolean

				        description: 'Create Proxy release PR'

				        required: false

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				jobs:

				  create_release_branch:

				    runs-on: [ubuntu-latest]

				  create-storage-release-branch:

				    if: ${{ github.event.schedule == '0 6 * * MON' || format('{0}', inputs.create-storage-release-branch) == 'true' }}

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # for `git push`

				    steps:

				    - name: Check out code

				      uses: actions/checkout@v3

				      uses: actions/checkout@v4

				      with:

				        ref: main

				    - name: Get current date

				      id: date

				      run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT

				    - name: Set environment variables

				      run: |

				        echo "RELEASE_DATE=$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				        echo "RELEASE_BRANCH=rc/$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				    - name: Create release branch

				      run: git checkout -b releases/${{ steps.date.outputs.date }}

				      run: git checkout -b $RELEASE_BRANCH

				    - name: Push new branch

				      run: git push origin releases/${{ steps.date.outputs.date }}

				      run: git push origin $RELEASE_BRANCH

				    - name: Create pull request into release

				      uses: thomaseizinger/create-pull-request@e3972219c86a56550fb70708d96800d8e24ba862 # 1.3.0

				      env:

				        GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				      run: |

				        TITLE="Storage & Compute release ${RELEASE_DATE}"

				        cat << EOF > body.md

				          ## ${TITLE}

				          **Please merge this Pull Request using 'Create a merge commit' button**

				        EOF

				        gh pr create --title "${TITLE}" \

				                     --body-file "body.md" \

				                     --head "${RELEASE_BRANCH}" \

				                     --base "release"

				  create-proxy-release-branch:

				    if: ${{ github.event.schedule == '0 6 * * THU' || format('{0}', inputs.create-proxy-release-branch) == 'true' }}

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # for `git push`

				    steps:

				    - name: Check out code

				      uses: actions/checkout@v4

				      with:

				        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        head: releases/${{ steps.date.outputs.date }}

				        base: release

				        title: Release ${{ steps.date.outputs.date }}

				        ref: main

				    - name: Set environment variables

				      run: |

				        echo "RELEASE_DATE=$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				        echo "RELEASE_BRANCH=rc/proxy/$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				    - name: Create release branch

				      run: git checkout -b $RELEASE_BRANCH

				    - name: Push new branch

				      run: git push origin $RELEASE_BRANCH

				    - name: Create pull request into release

				      env:

				        GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				      run: |

				        TITLE="Proxy release ${RELEASE_DATE}"

				        cat << EOF > body.md

				          ## ${TITLE}

				          **Please merge this Pull Request using 'Create a merge commit' button**

				        EOF

				        gh pr create --title "${TITLE}" \

				                     --body-file "body.md" \

				                     --head "${RELEASE_BRANCH}" \

				                     --base "release-proxy"
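As a reading aid for the workflow above, here is a minimal Python sketch (not part of the repository) of how the two release jobs appear to be selected and how their branch names are derived. The helper name `plan_releases` and the dictionary keys are hypothetical; the cron strings, branch prefixes, base branches and PR titles are copied from the diff.

```python
from datetime import date

# Cron expressions copied from the workflow; each one selects exactly one job.
STORAGE_CRON = "0 6 * * MON"
PROXY_CRON = "0 6 * * THU"


def plan_releases(schedule=None, create_storage=False, create_proxy=False, today=None):
    """Return the release PRs that one trigger of the workflow would open.

    `schedule` stands in for `github.event.schedule`; the two booleans stand in
    for the `workflow_dispatch` inputs. Hypothetical helper, illustration only.
    """
    day = (today or date.today()).strftime("%Y-%m-%d")
    plans = []
    # Mirrors the `if:` condition of the create-storage-release-branch job.
    if schedule == STORAGE_CRON or create_storage:
        plans.append({
            "branch": f"rc/{day}",
            "base": "release",
            "title": f"Storage & Compute release {day}",
        })
    # Mirrors the `if:` condition of the create-proxy-release-branch job.
    if schedule == PROXY_CRON or create_proxy:
        plans.append({
            "branch": f"rc/proxy/{day}",
            "base": "release-proxy",
            "title": f"Proxy release {day}",
        })
    return plans
```

Because each job compares `github.event.schedule` against a literal cron string, the sketch also shows why the in-file comment asks for the `cron` entries and the `if` conditions to be kept in sync: a schedule string that matches neither constant produces no release at all.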

									
										133

.github/workflows/trigger-e2e-tests.yml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,133 @@

				name: Trigger E2E Tests

				on:

				  pull_request:

				    types:

				      - ready_for_review

				  workflow_call:

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				env:

				  # A concurrency group that we use for e2e-tests runs, matches `concurrency.group` above with `github.repository` as a prefix

				  E2E_CONCURRENCY_GROUP: ${{ github.repository }}-e2e-tests-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}

				  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				jobs:

				  cancel-previous-e2e-tests:

				    if: github.event_name == 'pull_request'

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Cancel previous e2e-tests runs for this PR

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          gh workflow --repo neondatabase/cloud \

				            run cancel-previous-in-concurrency-group.yml \

				              --field concurrency_group="${{ env.E2E_CONCURRENCY_GROUP }}"

				  tag:

				    runs-on: ubuntu-22.04

				    outputs:

				      build-tag: ${{ steps.build-tag.outputs.tag }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v4

				        with:

				          fetch-depth: 0

				      - name: Get build tag

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				          CURRENT_BRANCH: ${{ github.head_ref || github.ref_name }}

				          CURRENT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				        run: |

				          if [[ "$GITHUB_REF_NAME" == "main" ]]; then

				            echo "tag=$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then

				            echo "tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then

				            echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT

				          else

				            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"

				            BUILD_AND_TEST_RUN_ID=$(gh run list -b $CURRENT_BRANCH -c $CURRENT_SHA -w 'Build and Test' -L 1 --json databaseId --jq '.[].databaseId')

				            echo "tag=$BUILD_AND_TEST_RUN_ID" | tee -a $GITHUB_OUTPUT

				          fi

				        id: build-tag

				  trigger-e2e-tests:

				    needs: [ tag ]

				    runs-on: ubuntu-22.04

				    env:

				      TAG: ${{ needs.tag.outputs.build-tag }}

				    steps:

				      - name: check if ecr image are present

				        env:

				          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				        run: |

				          for REPO in neon compute-tools compute-node-v14 vm-compute-node-v14 compute-node-v15 vm-compute-node-v15 compute-node-v16 vm-compute-node-v16; do

				            OUTPUT=$(aws ecr describe-images --repository-name ${REPO} --region eu-central-1 --query "imageDetails[?imageTags[?contains(@, '${TAG}')]]" --output text)

				            if [ "$OUTPUT" == "" ]; then

				              echo "$REPO with image tag $TAG not found" >> $GITHUB_OUTPUT

				              exit 1

				            fi

				          done

				      - name: Set e2e-platforms

				        id: e2e-platforms

				        env:

				          PR_NUMBER: ${{ github.event.pull_request.number }}

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: |

				          # Default set of platforms to run e2e tests on

				          platforms='["docker", "k8s"]'

				          # If the PR changes vendor/, pgxn/ or libs/vm_monitor/ directories, or Dockerfile.compute-node, add k8s-neonvm to the list of platforms.

				          # If the workflow run is not a pull request, add k8s-neonvm to the list.

				          if [ "$GITHUB_EVENT_NAME" == "pull_request" ]; then

				            for f in $(gh api "/repos/${GITHUB_REPOSITORY}/pulls/${PR_NUMBER}/files" --paginate --jq '.[].filename'); do

				              case "$f" in

				                vendor/*|pgxn/*|libs/vm_monitor/*|Dockerfile.compute-node)

				                  platforms=$(echo "${platforms}" | jq --compact-output '. += ["k8s-neonvm"] | unique')

				                  ;;

				                *)

				                  # no-op

				                  ;;

				              esac

				            done

				          else

				            platforms=$(echo "${platforms}" | jq --compact-output '. += ["k8s-neonvm"] | unique')

				          fi

				          echo "e2e-platforms=${platforms}" | tee -a $GITHUB_OUTPUT

				      - name: Set PR's status to pending and request a remote CI test

				        env:

				          E2E_PLATFORMS: ${{ steps.e2e-platforms.outputs.e2e-platforms }}

				          COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          REMOTE_REPO="${GITHUB_REPOSITORY_OWNER}/cloud"

				          gh api "/repos/${GITHUB_REPOSITORY}/statuses/${COMMIT_SHA}" \

				            --method POST \

				            --raw-field "state=pending" \

				            --raw-field "description=[$REMOTE_REPO] Remote CI job is about to start" \

				            --raw-field "context=neon-cloud-e2e"

				          gh workflow --repo ${REMOTE_REPO} \

				            run testing.yml \

				              --ref "main" \

				              --raw-field "ci_job_name=neon-cloud-e2e" \

				              --raw-field "commit_hash=$COMMIT_SHA" \

				              --raw-field "remote_repo=${GITHUB_REPOSITORY}" \

				              --raw-field "storage_image_tag=${TAG}" \

				              --raw-field "compute_image_tag=${TAG}" \

				              --raw-field "concurrency_group=${E2E_CONCURRENCY_GROUP}" \

				              --raw-field "e2e-platforms=${E2E_PLATFORMS}"
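The platform selection in the "Set e2e-platforms" step can be restated outside of bash and `jq`. The following is a minimal Python sketch under that reading, not code from the repository; the function name `select_platforms` and the constant names are hypothetical, while the path patterns and platform names come from the diff. The result is sorted because `jq`'s `unique` returns a sorted, de-duplicated array.

```python
# Path patterns taken from the `case` statement in the workflow step.
NEONVM_PREFIXES = ("vendor/", "pgxn/", "libs/vm_monitor/")
NEONVM_FILES = ("Dockerfile.compute-node",)


def select_platforms(event_name, changed_files=()):
    """Return the sorted list of e2e platforms for one workflow run.

    `event_name` stands in for $GITHUB_EVENT_NAME and `changed_files` for the
    filenames returned by the PR files API. Hypothetical helper.
    """
    platforms = {"docker", "k8s"}  # default set of platforms
    if event_name != "pull_request":
        # Non-PR runs always include the NeonVM platform.
        platforms.add("k8s-neonvm")
    elif any(f.startswith(NEONVM_PREFIXES) or f in NEONVM_FILES for f in changed_files):
        # PR runs include it only when a relevant path is touched.
        platforms.add("k8s-neonvm")
    return sorted(platforms)
```

For example, `select_platforms("pull_request", ["pgxn/neon/file.c"])` includes `k8s-neonvm`, while a pull request touching only `README.md` keeps the default two platforms.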

5

.gitignore vendored

@@ -6,8 +6,10 @@ __pycache__/
 test_output/
 .vscode
 .idea
 neon.iml
 /.neon
 /integration_tests/.neon
 compaction-suite-results.*
 # Coverage
 *.profraw
@@ -18,3 +20,6 @@ test_output/
 *.o
 *.so
 *.Po
 # pgindent typedef lists
 *.list

4

.gitmodules vendored

@@ -6,3 +6,7 @@
 	path = vendor/postgres-v15
 	url = https://github.com/neondatabase/postgres.git
 	branch = REL_15_STABLE_neon
 [submodule "vendor/postgres-v16"]
 	path = vendor/postgres-v16
 	url = https://github.com/neondatabase/postgres.git
 	branch = REL_16_STABLE_neon

5

.neon_clippy_args Normal file

@@ -0,0 +1,5 @@
 # * `-A unknown_lints` – do not warn about unknown lint suppressions
 #                        that people with newer toolchains might use
 # * `-D warnings`      - fail on any warnings (`cargo` returns non-zero exit status)
 # * `-D clippy::todo`  - don't let `todo!()` slip into `main`
 export CLIPPY_COMMON_ARGS="--locked --workspace --all-targets -- -A unknown_lints -D warnings -D clippy::todo"

18

CODEOWNERS

@@ -1,11 +1,13 @@
 /compute_tools/ @neondatabase/control-plane
 /control_plane/ @neondatabase/compute @neondatabase/storage
 /libs/pageserver_api/ @neondatabase/compute @neondatabase/storage
 /libs/postgres_ffi/ @neondatabase/compute
 /libs/remote_storage/ @neondatabase/storage
 /libs/safekeeper_api/ @neondatabase/safekeepers
 /pageserver/ @neondatabase/compute @neondatabase/storage
 /compute_tools/ @neondatabase/control-plane @neondatabase/compute
 /storage_controller @neondatabase/storage
 /libs/pageserver_api/ @neondatabase/storage
 /libs/postgres_ffi/ @neondatabase/compute @neondatabase/safekeepers
 /libs/remote_storage/ @neondatabase/storage
 /libs/safekeeper_api/ @neondatabase/safekeepers
 /libs/vm_monitor/ @neondatabase/autoscaling
 /pageserver/ @neondatabase/storage
 /pgxn/ @neondatabase/compute
 /proxy/ @neondatabase/control-plane
 /pgxn/neon/ @neondatabase/compute @neondatabase/safekeepers
 /proxy/ @neondatabase/proxy
 /safekeeper/ @neondatabase/safekeepers
 /vendor/ @neondatabase/compute

									
										57

CONTRIBUTING.md
									
				@@ -2,13 +2,31 @@

				Howdy! Usual good software engineering practices apply. Write

				tests. Write comments. Follow standard Rust coding practices where

				possible. Use 'cargo fmt' and 'clippy' to tidy up formatting.

				possible. Use `cargo fmt` and `cargo clippy` to tidy up formatting.

				There are soft spots in the code, which could use cleanup,

				refactoring, additional comments, and so forth. Let's try to raise the

				bar, and clean things up as we go. Try to leave code in a better shape

				than it was before.

				## Pre-commit hook

				We have a sample pre-commit hook in `pre-commit.py`.

				To set it up, run:

				```bash

				ln -s ../../pre-commit.py .git/hooks/pre-commit

				```

				This will run the following checks on staged files before each commit:

				- `rustfmt`

				- checks for Python files, see [obligatory checks](/docs/sourcetree.md#obligatory-checks).

				There is also a separate script `./run_clippy.sh` that runs `cargo clippy` on the whole project

				and `./scripts/reformat` that runs all formatting tools to ensure the project is up to date.

				If you want to skip the hook, run `git commit` with the `--no-verify` option.

				## Submitting changes

				1. Get at least one +1 on your PR before you push.

				@@ -27,3 +45,40 @@ your patch's fault. Help to fix the root cause if something else has

				broken the CI, before pushing.

				*Happy Hacking!*

				# How to run a CI pipeline on Pull Requests from external contributors

				_An instruction for maintainers_

				## TL;DR:

				- Review the PR

				- If and only if it looks **safe** (i.e. it doesn't contain any malicious code which could expose secrets or harm the CI), then:

				    - Press the "Approve and run" button in GitHub UI

				    - Add the `approved-for-ci-run` label to the PR

				    - Currently, draft PRs skip e2e tests (internal contributors only). Once the PR is marked 'Ready for review', CI triggers the e2e tests

				      - Add the `run-e2e-tests-in-draft` label to run e2e tests in a draft PR (this overrides the behaviour above)

				      - The `approved-for-ci-run` workflow adds `run-e2e-tests-in-draft` automatically, so e2e tests run for external contributors

				Repeat all steps after any change to the PR.

				- When the changes are ready to get merged — merge the original PR (not the internal one)

				## Longer version:

				GitHub Actions triggered by the `pull_request` event don't share repository secrets with the forks (for security reasons).

				So, passing the CI pipeline on Pull Requests from external contributors is impossible.

				We're using the following approach to make it work:

				- After the review, assign the `approved-for-ci-run` label to the PR if changes look safe

				- A GitHub Action will create an internal branch and a new PR with the same changes (for example, for a PR `#1234`, it'll be a branch `ci-run/pr-1234`)

				- Because the PR is created from the internal branch, it is able to access repository secrets (that's why it's crucial to make sure that the PR doesn't contain any malicious code that could expose our secrets or intentionally harm the CI)

				- The label gets removed automatically, so to run CI again with new changes, the label should be added again (after the review)

				For details see [`approved-for-ci-run.yml`](.github/workflows/approved-for-ci-run.yml)

				## How do I make build-tools image "pinned"

				It's possible to update the `pinned` tag of the `build-tools` image using the `pin-build-tools-image.yml` workflow.

				```bash

				gh workflow -R neondatabase/neon run pin-build-tools-image.yml \

				            -f from-tag=cc98d9b00d670f182c507ae3783342bd7e64c31e

				```
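The rules described in the CONTRIBUTING.md text above can be summarised in a few lines. This is a minimal Python sketch of one reading of that text, not the actual workflow logic; the function names `ci_run_branch` and `should_run_e2e` are hypothetical, while the branch pattern and the label name are taken from the document.

```python
E2E_IN_DRAFT_LABEL = "run-e2e-tests-in-draft"


def ci_run_branch(pr_number):
    """Internal branch created for an approved external PR, e.g. #1234."""
    return f"ci-run/pr-{pr_number}"


def should_run_e2e(is_draft, labels):
    """Assumed rule: e2e tests run unless the PR is a draft without the override label."""
    return (not is_draft) or (E2E_IN_DRAFT_LABEL in labels)
```

Under this reading, a draft PR from an internal contributor skips e2e tests until it is marked ready for review or given the label, and PRs mirrored into `ci-run/pr-…` branches get the label automatically.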

4975

Cargo.lock generated

File diff suppressed because it is too large

									
										196

Cargo.toml
									
				@@ -1,14 +1,38 @@

				[workspace]

				resolver = "2"

				members = [

				    "compute_tools",

				    "control_plane",

				    "control_plane/storcon_cli",

				    "pageserver",

				    "pageserver/compaction",

				    "pageserver/ctl",

				    "pageserver/client",

				    "pageserver/pagebench",

				    "proxy",

				    "safekeeper",

				    "storage_broker",

				    "storage_controller",

				    "storage_scrubber",

				    "workspace_hack",

				    "trace",

				    "libs/*",

				    "libs/compute_api",

				    "libs/pageserver_api",

				    "libs/postgres_ffi",

				    "libs/safekeeper_api",

				    "libs/desim",

				    "libs/utils",

				    "libs/consumption_metrics",

				    "libs/postgres_backend",

				    "libs/pq_proto",

				    "libs/tenant_size_model",

				    "libs/metrics",

				    "libs/postgres_connection",

				    "libs/remote_storage",

				    "libs/tracing-utils",

				    "libs/postgres_ffi/wal_craft",

				    "libs/vm_monitor",

				    "libs/walproposer",

				]

				[workspace.package]

				@@ -17,146 +41,210 @@ license = "Apache-2.0"

				## All dependency versions, used in the project

				[workspace.dependencies]

				ahash = "0.8"

				anyhow = { version = "1.0", features = ["backtrace"] }

				arc-swap = "1.6"

				async-compression = { version = "0.4.0", features = ["tokio", "gzip", "zstd"] }

				atomic-take = "1.1.0"

				azure_core = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }

				azure_identity = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }

				azure_storage = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }

				azure_storage_blobs = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }

				flate2 = "1.0.26"

				async-stream = "0.3"

				async-trait = "0.1"

				atty = "0.2.14"

				aws-config = { version = "0.51.0", default-features = false, features=["rustls"] }

				aws-sdk-s3 = "0.21.0"

				aws-smithy-http = "0.51.0"

				aws-types = "0.51.0"

				aws-config = { version = "1.3", default-features = false, features=["rustls"] }

				aws-sdk-s3 = "1.26"

				aws-sdk-iam = "1.15.0"

				aws-smithy-async = { version = "1.2.1", default-features = false, features=["rt-tokio"] }

				aws-smithy-types = "1.1.9"

				aws-credential-types = "1.2.0"

				aws-sigv4 = { version = "1.2.1", features = ["sign-http"] }

				aws-types = "1.2.0"

				axum = { version = "0.6.20", features = ["ws"] }

				base64 = "0.13.0"

				bincode = "1.3"

				bindgen = "0.61"

				bindgen = "0.65"

				bstr = "1.0"

				byteorder = "1.4"

				bytes = "1.0"

				camino = "1.1.6"

				cfg-if = "1.0.0"

				chrono = { version = "0.4", default-features = false, features = ["clock"] }

				clap = { version = "4.0", features = ["derive"] }

				close_fds = "0.3.2"

				comfy-table = "6.1"

				const_format = "0.2"

				crc32c = "0.6"

				crossbeam-deque = "0.8.5"

				crossbeam-utils = "0.8.5"

				dashmap = { version = "5.5.0", features = ["raw-api"] }

				either = "1.8"

				enum-map = "2.4.2"

				enumset = "1.0.12"

				fail = "0.5.0"

				fallible-iterator = "0.2"

				framed-websockets = { version = "0.1.0", git = "https://github.com/neondatabase/framed-websockets" }

				fs2 = "0.4.3"

				futures = "0.3"

				futures-core = "0.3"

				futures-util = "0.3"

				git-version = "0.3"

				hashbrown = "0.13"

				hashlink = "0.8.1"

				hashbrown = "0.14"

				hashlink = "0.9.1"

				hdrhistogram = "7.5.2"

				hex = "0.4"

				hex-literal = "0.3"

				hex-literal = "0.4"

				hmac = "0.12.1"

				hostname = "0.3.1"

				http = {version = "1.1.0", features = ["std"]}

				http-types = { version = "2", default-features = false }

				humantime = "2.1"

				humantime-serde = "1.1.1"

				hyper = "0.14"

				hyper-tungstenite = "0.9"

				tokio-tungstenite = "0.20.0"

				indexmap = "2"

				inotify = "0.10.2"

				ipnet = "2.9.0"

				itertools = "0.10"

				jsonwebtoken = "8"

				jsonwebtoken = "9"

				lasso = "0.7"

				leaky-bucket = "1.0.1"

				libc = "0.2"

				md5 = "0.7.0"

				measured = { version = "0.0.21", features=["lasso"] }

				measured-process = { version = "0.0.21" }

				memoffset = "0.8"

				nix = "0.26"

				notify = "5.0.0"

				nix = { version = "0.27", features = ["fs", "process", "socket", "signal", "poll"] }

				notify = "6.0.0"

				num_cpus = "1.15"

				num-traits = "0.2.15"

				once_cell = "1.13"

				opentelemetry = "0.18.0"

				opentelemetry-otlp = { version = "0.11.0", default_features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }

				opentelemetry-semantic-conventions = "0.10.0"

				opentelemetry = "0.20.0"

				opentelemetry-otlp = { version = "0.13.0", default-features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }

				opentelemetry-semantic-conventions = "0.12.0"

				parking_lot = "0.12"

				parquet = { version = "51.0.0", default-features = false, features = ["zstd"] }

				parquet_derive = "51.0.0"

				pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }

				pin-project-lite = "0.2"

				prometheus = {version = "0.13", default_features=false, features = ["process"]} # removes protobuf dependency

				procfs = "0.14"

				prometheus = {version = "0.13", default-features=false, features = ["process"]} # removes protobuf dependency

				prost = "0.11"

				rand = "0.8"

				regex = "1.4"

				reqwest = { version = "0.11", default-features = false, features = ["rustls-tls"] }

				reqwest-tracing = { version = "0.4.0", features = ["opentelemetry_0_18"] }

				reqwest-middleware = "0.2.0"

				redis = { version = "0.25.2", features = ["tokio-rustls-comp", "keep-alive"] }

				regex = "1.10.2"

				reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }

				reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_20"] }

				reqwest-middleware = "0.3.0"

				reqwest-retry = "0.5"

				routerify = "3"

				rpds = "0.12.0"

				rustls = "0.20"

				rustls-pemfile = "1"

				rpds = "0.13"

				rustc-hash = "1.1.0"

				rustls = "0.22"

				rustls-pemfile = "2"

				rustls-split = "0.3"

				scopeguard = "1.1"

				sentry = { version = "0.29", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }

				sysinfo = "0.29.2"

				sd-notify = "0.4.1"

				sentry = { version = "0.32", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }

				serde = { version = "1.0", features = ["derive"] }

				serde_json = "1"

				serde_path_to_error = "0.1"

				serde_with = "2.0"

				serde_assert = "0.5.0"

				sha2 = "0.10.2"

				signal-hook = "0.3"

				socket2 = "0.4.4"

				smallvec = "1.11"

				smol_str = { version = "0.2.0", features = ["serde"] }

				socket2 = "0.5"

				strum = "0.24"

				strum_macros = "0.24"

				svg_fmt = "0.4.1"

				"subtle"  = "2.5.0"

				# Our PR https://github.com/nical/rust_debug/pull/4 has been merged but no new version released yet

				svg_fmt = { git = "https://github.com/nical/rust_debug", rev = "28a7d96eecff2f28e75b1ea09f2d499a60d0e3b4" }

				sync_wrapper = "0.1.2"

				tar = "0.4"

				task-local-extensions = "0.1.4"

				test-context = "0.3"

				thiserror = "1.0"

				tls-listener = { version = "0.6", features = ["rustls", "hyper-h1"] }

				tikv-jemallocator = "0.5"

				tikv-jemalloc-ctl = "0.5"

				tokio = { version = "1.17", features = ["macros"] }

				tokio-postgres-rustls = "0.9.0"

				tokio-rustls = "0.23"

				tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }

				tokio-io-timeout = "1.2.0"

				tokio-postgres-rustls = "0.11.0"

				tokio-rustls = "0.25"

				tokio-stream = "0.1"

				tokio-util = { version = "0.7", features = ["io"] }

				toml = "0.5"

				toml_edit = { version = "0.17", features = ["easy"] }

				tonic = {version = "0.8", features = ["tls", "tls-roots"]}

				tokio-tar = "0.3"

				tokio-util = { version = "0.7.10", features = ["io", "rt"] }

				toml = "0.7"

				toml_edit = "0.19"

				tonic = {version = "0.9", features = ["tls", "tls-roots"]}

				tower-service = "0.3.2"

				tracing = "0.1"

				tracing-opentelemetry = "0.18.0"

				tracing-subscriber = { version = "0.3", features = ["env-filter"] }

				tracing-error = "0.2.0"

				tracing-opentelemetry = "0.21.0"

				tracing-subscriber = { version = "0.3", default-features = false, features = ["smallvec", "fmt", "tracing-log", "std", "env-filter", "json", "ansi"] }

				twox-hash = { version = "1.6.3", default-features = false }

				url = "2.2"

				uuid = { version = "1.2", features = ["v4", "serde"] }

				urlencoding = "2.1"

				uuid = { version = "1.6.1", features = ["v4", "v7", "serde"] }

				walkdir = "2.3.2"

				webpki-roots = "0.22.5"

				x509-parser = "0.14"

				rustls-native-certs = "0.7"

				x509-parser = "0.15"

				## TODO replace this with tracing

				env_logger = "0.10"

				log = "0.4"

				## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed

				postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }

				postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }

				postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }

				tokio-tar = { git = "https://github.com/neondatabase/tokio-tar.git", rev="404df61437de0feef49ba2ccdbdd94eb8ad6e142" }

				postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }

				postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }

				postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }

				## Other git libraries

				heapless = { default-features=false, features=[], git = "https://github.com/japaric/heapless.git", rev = "644653bf3b831c6bb4963be2de24804acf5e5001" } # upstream release pending

				## Local libraries

				compute_api = { version = "0.1", path = "./libs/compute_api/" }

				consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }

				metrics = { version = "0.1", path = "./libs/metrics/" }

				pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }

				pageserver_client = { path = "./pageserver/client" }

				pageserver_compaction = { version = "0.1", path = "./pageserver/compaction/" }

				postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }

				postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }

				postgres_ffi = { version = "0.1", path = "./libs/postgres_ffi/" }

				pq_proto = { version = "0.1", path = "./libs/pq_proto/" }

				remote_storage = { version = "0.1", path = "./libs/remote_storage/" }

				safekeeper_api = { version = "0.1", path = "./libs/safekeeper_api" }

				desim = { version = "0.1", path = "./libs/desim" }

				storage_broker = { version = "0.1", path = "./storage_broker/" } # Note: main broker code is inside the binary crate, so linking with the library shouldn't be heavy.

				tenant_size_model = { version = "0.1", path = "./libs/tenant_size_model/" }

				tracing-utils = { version = "0.1", path = "./libs/tracing-utils/" }

				utils = { version = "0.1", path = "./libs/utils/" }

				vm_monitor = { version = "0.1", path = "./libs/vm_monitor/" }

				walproposer = { version = "0.1", path = "./libs/walproposer/" }

				## Common library dependency

				workspace_hack = { version = "0.1", path = "./workspace_hack/" }

				## Build dependencies

				criterion = "0.4"

				rcgen = "0.10"

				rstest = "0.16"

				tempfile = "3.4"

				tonic-build = "0.8"

				criterion = "0.5.1"

				rcgen = "0.12"

				rstest = "0.18"

				camino-tempfile = "1.0.2"

				tonic-build = "0.9"

				# This is only needed for proxy's tests.

				# TODO: we should probably fork `tokio-postgres-rustls` instead.

				[patch.crates-io]

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }

				# Needed to get `tokio-postgres-rustls` to depend on our fork.

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }

				# bug fixes for UUID

				parquet = { git = "https://github.com/apache/arrow-rs", branch = "master" }

				parquet_derive = { git = "https://github.com/apache/arrow-rs", branch = "master" }

				################# Binary contents sections

									
										35

Dockerfile
									
				@@ -2,8 +2,8 @@

				### The image itself is mainly used as a container for the binaries and for starting e2e tests with custom parameters.

				### By default, the binaries inside the image have some mock parameters and can start, but are not intended to be used

				### inside this image in the real deployments.

				ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com

				ARG IMAGE=rust

				ARG REPOSITORY=neondatabase

				ARG IMAGE=build-tools

				ARG TAG=pinned

				# Build Postgres

				@@ -12,6 +12,7 @@ WORKDIR /home/nonroot

				COPY --chown=nonroot vendor/postgres-v14 vendor/postgres-v14

				COPY --chown=nonroot vendor/postgres-v15 vendor/postgres-v15

				COPY --chown=nonroot vendor/postgres-v16 vendor/postgres-v16

				COPY --chown=nonroot pgxn pgxn

				COPY --chown=nonroot Makefile Makefile

				COPY --chown=nonroot scripts/ninstall.sh scripts/ninstall.sh

				@@ -26,6 +27,7 @@ RUN set -e \

				FROM $REPOSITORY/$IMAGE:$TAG AS build

				WORKDIR /home/nonroot

				ARG GIT_VERSION=local

				ARG BUILD_TAG

				# Enable https://github.com/paritytech/cachepot to cache Rust crates' compilation results in Docker builds.

				# Set up cachepot to use an AWS S3 bucket for cache results, to reuse it between `docker build` invocations.

				@@ -39,12 +41,22 @@ ARG CACHEPOT_BUCKET=neon-github-dev

				COPY --from=pg-build /home/nonroot/pg_install/v14/include/postgresql/server pg_install/v14/include/postgresql/server

				COPY --from=pg-build /home/nonroot/pg_install/v15/include/postgresql/server pg_install/v15/include/postgresql/server

				COPY . .

				COPY --from=pg-build /home/nonroot/pg_install/v16/include/postgresql/server pg_install/v16/include/postgresql/server

				COPY --chown=nonroot . .

				# Show build caching stats to check if it was used in the end.

				# Has to be the part of the same RUN since cachepot daemon is killed in the end of this RUN, losing the compilation stats.

				RUN set -e \

				&& mold -run cargo build --bin pageserver --bin pageserver_binutils --bin draw_timeline_dir --bin safekeeper --bin storage_broker --bin proxy --locked --release \

				    && RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment" cargo build  \

				      --bin pg_sni_router  \

				      --bin pageserver  \

				      --bin pagectl  \

				      --bin safekeeper  \

				      --bin storage_broker  \

				      --bin storage_controller  \

				      --bin proxy  \

				      --bin neon_local \

				      --locked --release \

				    && cachepot -s

				# Build final image

				@@ -57,21 +69,23 @@ RUN set -e \

				    && apt install -y \

				        libreadline-dev \

				        libseccomp-dev \

				        openssl \

				        ca-certificates \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \

				    && useradd -d /data neon \

				    && chown -R neon:neon /data

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/pg_sni_router       /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver          /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver_binutils /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/draw_timeline_dir   /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/pagectl             /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/safekeeper          /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_broker         /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_broker      /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_controller  /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/proxy               /usr/local/bin

				COPY --from=build --chown=neon:neon /home/nonroot/target/release/neon_local          /usr/local/bin

				COPY --from=pg-build /home/nonroot/pg_install/v14 /usr/local/v14/

				COPY --from=pg-build /home/nonroot/pg_install/v15 /usr/local/v15/

				COPY --from=pg-build /home/nonroot/pg_install/v16 /usr/local/v16/

				COPY --from=pg-build /home/nonroot/postgres_install.tar.gz /data/

				# By default, pageserver uses `.neon/` working directory in WORKDIR, so create one and fill it with the dummy config.

				@@ -84,6 +98,11 @@ RUN mkdir -p /data/.neon/ && chown -R neon:neon /data/.neon/ \

				       -c "listen_pg_addr='0.0.0.0:6400'" \

				       -c "listen_http_addr='0.0.0.0:9898'"

				# When running a binary that links with libpq, default to using our most recent postgres version.  Binaries

				# that want a particular postgres version will select it explicitly: this is just a default.

				ENV LD_LIBRARY_PATH /usr/local/v16/lib

				VOLUME ["/data"]

				USER neon

				EXPOSE 6400

									

Dockerfile.build-tools
									
										Normal file
									
												
				@@ -0,0 +1,207 @@

				FROM debian:bullseye-slim

				# Add nonroot user

				RUN useradd -ms /bin/bash nonroot -b /home

				SHELL ["/bin/bash", "-c"]

				# System deps

				RUN set -e \

				    && apt update \

				    && apt install -y \

				        autoconf \

				        automake \

				        bison \

				        build-essential \

				        ca-certificates \

				        cmake \

				        curl \

				        flex \

				        git \

				        gnupg \

				        gzip \

				        jq \

				        libcurl4-openssl-dev \

				        libbz2-dev \

				        libffi-dev \

				        liblzma-dev \

				        libncurses5-dev \

				        libncursesw5-dev \

				        libpq-dev \

				        libreadline-dev \

				        libseccomp-dev \

				        libsqlite3-dev \

				        libssl-dev \

				        libstdc++-10-dev \

				        libtool \

				        libxml2-dev \

				        libxmlsec1-dev \

				        libxxhash-dev \

				        lsof \

				        make \

				        netcat \

				        net-tools \

				        openssh-client \

				        parallel \

				        pkg-config \

				        unzip \

				        wget \

				        xz-utils \

				        zlib1g-dev \

				        zstd \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# protobuf-compiler (protoc)

				ENV PROTOC_VERSION 25.1

				RUN curl -fsSL "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-$(uname -m | sed 's/aarch64/aarch_64/g').zip" -o "protoc.zip" \

				    && unzip -q protoc.zip -d protoc \

				    && mv protoc/bin/protoc /usr/local/bin/protoc \

				    && mv protoc/include/google /usr/local/include/google \

				    && rm -rf protoc.zip protoc

				# s5cmd

				ENV S5CMD_VERSION=2.2.2

				RUN curl -sL "https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_Linux-$(uname -m | sed 's/x86_64/64bit/g' | sed 's/aarch64/arm64/g').tar.gz" | tar zxvf - s5cmd \

				    && chmod +x s5cmd \

				    && mv s5cmd /usr/local/bin/s5cmd

				# LLVM

				ENV LLVM_VERSION=18

				RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \

				    && echo "deb http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \

				    && apt update \

				    && apt install -y clang-${LLVM_VERSION} llvm-${LLVM_VERSION} \

				    && bash -c 'for f in /usr/bin/clang*-${LLVM_VERSION} /usr/bin/llvm*-${LLVM_VERSION}; do ln -s "${f}" "${f%-${LLVM_VERSION}}"; done' \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# AWS CLI

				RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "awscliv2.zip" \

				    && unzip -q awscliv2.zip \

				    && ./aws/install \

				    && rm awscliv2.zip

				# Mold: A Modern Linker

				ENV MOLD_VERSION v2.31.0

				RUN set -e \

				    && git clone https://github.com/rui314/mold.git \

				    && mkdir mold/build \

				    && cd mold/build \

				    && git checkout ${MOLD_VERSION} \

				    && cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ .. \

				    && cmake --build . -j $(nproc) \

				    && cmake --install . \

				    && cd .. \

				    && rm -rf mold

				# LCOV

				# Build lcov from a fork:

				# It includes several bug fixes on top of the v2.0 release (https://github.com/linux-test-project/lcov/compare/v2.0...master)

				# And patches from us:

				# - Generates json file with code coverage summary (https://github.com/neondatabase/lcov/commit/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz)

				RUN for package in Capture::Tiny DateTime Devel::Cover Digest::MD5 File::Spec JSON::XS Memory::Process Time::HiRes JSON; do yes | perl -MCPAN -e "CPAN::Shell->notest('install', '$package')"; done \

				    && wget https://github.com/neondatabase/lcov/archive/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz -O lcov.tar.gz \

				    && echo "61a22a62e20908b8b9e27d890bd0ea31f567a7b9668065589266371dcbca0992  lcov.tar.gz" | sha256sum --check \

				    && mkdir -p lcov && tar -xzf lcov.tar.gz -C lcov --strip-components=1 \

				    && cd lcov \

				    && make install \

				    && rm -rf ../lcov.tar.gz

				# Compile and install the static OpenSSL library

				ENV OPENSSL_VERSION=1.1.1w

				ENV OPENSSL_PREFIX=/usr/local/openssl

				RUN wget -O /tmp/openssl-${OPENSSL_VERSION}.tar.gz https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz && \

				    echo "cf3098950cb4d853ad95c0841f1f9c6d3dc102dccfcacd521d93925208b76ac8 /tmp/openssl-${OPENSSL_VERSION}.tar.gz" | sha256sum --check && \

				    cd /tmp && \

				    tar xzvf /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \

				    rm /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \

				    cd /tmp/openssl-${OPENSSL_VERSION} && \

				    ./config --prefix=${OPENSSL_PREFIX}  -static --static no-shared -fPIC && \

				    make -j "$(nproc)" && \

				    make install && \

				    cd /tmp && \

				    rm -rf /tmp/openssl-${OPENSSL_VERSION}

				# Use the same version of libicu as the compute nodes so that

				# clusters created using initdb on pageserver can be used by computes.

				#

				# TODO: at this time, Dockerfile.compute-node uses the debian bullseye libicu

				# package, which is 67.1. We're duplicating that knowledge here, and also, technically,

				# Debian has a few patches on top of 67.1 that we're not adding here.

				ENV ICU_VERSION=67.1

				ENV ICU_PREFIX=/usr/local/icu

				# Download and build static ICU

				RUN wget -O /tmp/libicu-${ICU_VERSION}.tgz https://github.com/unicode-org/icu/releases/download/release-${ICU_VERSION//./-}/icu4c-${ICU_VERSION//./_}-src.tgz && \

				    echo "94a80cd6f251a53bd2a997f6f1b5ac6653fe791dfab66e1eb0227740fb86d5dc /tmp/libicu-${ICU_VERSION}.tgz" | sha256sum --check && \

				    mkdir /tmp/icu && \

				    pushd /tmp/icu && \

				    tar -xzf /tmp/libicu-${ICU_VERSION}.tgz && \

				    pushd icu/source && \

				    ./configure --prefix=${ICU_PREFIX}  --enable-static --enable-shared=no CXXFLAGS="-fPIC" CFLAGS="-fPIC" && \

				    make -j "$(nproc)" && \

				    make install && \

				    popd && \

				    rm -rf icu && \

				    rm -f /tmp/libicu-${ICU_VERSION}.tgz && \

				    popd

				# Switch to nonroot user

				USER nonroot:nonroot

				WORKDIR /home/nonroot

				# Python

				ENV PYTHON_VERSION=3.9.18 \

				    PYENV_ROOT=/home/nonroot/.pyenv \

				    PATH=/home/nonroot/.pyenv/shims:/home/nonroot/.pyenv/bin:/home/nonroot/.poetry/bin:$PATH

				RUN set -e \

				    && cd $HOME \

				    && curl -sSO https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer \

				    && chmod +x pyenv-installer \

				    && ./pyenv-installer \

				    && export PYENV_ROOT=/home/nonroot/.pyenv \

				    && export PATH="$PYENV_ROOT/bin:$PATH" \

				    && export PATH="$PYENV_ROOT/shims:$PATH" \

				    && pyenv install ${PYTHON_VERSION} \

				    && pyenv global ${PYTHON_VERSION} \

				    && python --version \

				    && pip install --upgrade pip \

				    && pip --version \

				    && pip install pipenv wheel poetry

				# Switch to nonroot user (again)

				USER nonroot:nonroot

				WORKDIR /home/nonroot

				# Rust

				# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)

				ENV RUSTC_VERSION=1.79.0

				ENV RUSTUP_HOME="/home/nonroot/.rustup"

				ENV PATH="/home/nonroot/.cargo/bin:${PATH}"

				RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && whoami && \

					chmod +x rustup-init && \

					./rustup-init -y --default-toolchain ${RUSTC_VERSION} && \

					rm rustup-init && \

				    export PATH="$HOME/.cargo/bin:$PATH" && \

				    . "$HOME/.cargo/env" && \

				    cargo --version && rustup --version && \

				    rustup component add llvm-tools-preview rustfmt clippy && \

				    cargo install --git https://github.com/paritytech/cachepot && \

				    cargo install rustfilt && \

				    cargo install cargo-hakari && \

				    cargo install cargo-deny --locked && \

				    cargo install cargo-hack && \

				    cargo install cargo-nextest && \

				    rm -rf /home/nonroot/.cargo/registry && \

				    rm -rf /home/nonroot/.cargo/git

				ENV RUSTC_WRAPPER=cachepot

				# Show versions

				RUN whoami \

				    && python --version \

				    && pip --version \

				    && cargo --version --verbose \

				    && rustup --version --verbose \

				    && rustc --version --verbose \

				    && clang --version

				# Set the following flag so the Makefile can check whether it is running in Docker

				RUN touch /home/nonroot/.docker_build

									

Dockerfile.compute-node
									
												
				@@ -1,7 +1,8 @@

				ARG PG_VERSION

				ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com

				ARG IMAGE=rust

				ARG REPOSITORY=neondatabase

				ARG IMAGE=build-tools

				ARG TAG=pinned

				ARG BUILD_TAG

				#########################################################################################

				#

				@@ -12,7 +13,7 @@ FROM debian:bullseye-slim AS build-deps

				RUN apt update &&  \

				    apt install -y git autoconf automake libtool build-essential bison flex libreadline-dev \

				    zlib1g-dev libxml2-dev libcurl4-openssl-dev libossp-uuid-dev wget pkg-config libssl-dev \

				    libicu-dev libxslt1-dev

				    libicu-dev libxslt1-dev liblz4-dev libzstd-dev zstd

				#########################################################################################

				#

				@@ -24,8 +25,13 @@ FROM build-deps AS pg-build

				ARG PG_VERSION

				COPY vendor/postgres-${PG_VERSION} postgres

				RUN cd postgres && \

				    ./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp --with-icu \

				    --with-libxml --with-libxslt && \

				    export CONFIGURE_CMD="./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp \

				    --with-icu --with-libxml --with-libxslt --with-lz4" && \

				    if [ "${PG_VERSION}" != "v14" ]; then \

				        # zstd is available only from PG15

				        export CONFIGURE_CMD="${CONFIGURE_CMD} --with-zstd"; \

				    fi && \

				    eval $CONFIGURE_CMD && \

				    make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s install && \

				    make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C contrib/ install && \

				    # Install headers

				@@ -38,10 +44,33 @@ RUN cd postgres && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/insert_username.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/intagg.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/moddatetime.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_stat_statements.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrowlocks.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgstattuple.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/refint.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/xml2.control

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/xml2.control && \

				    # We need to grant EXECUTE on pg_stat_statements_reset() to neon_superuser.

				    # In vanilla postgres this function is limited to Postgres role superuser.

				    # In neon we have neon_superuser role that is not a superuser but replaces superuser in some cases.

				    # We could add the additional grant statements to the postgres repository but it would be hard to maintain,

				    # whenever we need to pick up a new postgres version and we want to limit the changes in our postgres fork,

				    # so we do it here.

				    old_list="pg_stat_statements--1.0--1.1.sql pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.4.sql pg_stat_statements--1.5--1.6.sql"; \

				    # the first loop is for pg_stat_statements extension versions <= 1.6

				    for file in /usr/local/pgsql/share/extension/pg_stat_statements--*.sql; do \

				        filename=$(basename "$file"); \

				        if echo "$old_list" | grep -q -F "$filename"; then \

				            echo 'GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO neon_superuser;' >> $file; \

				        fi; \

				    done; \

				    # the second loop is for pg_stat_statements extension versions >= 1.7,

				    # where pg_stat_statements_reset() got 3 additional arguments

				    for file in /usr/local/pgsql/share/extension/pg_stat_statements--*.sql; do \

				        filename=$(basename "$file"); \

				        if ! echo "$old_list" | grep -q -F "$filename"; then \

				            echo 'GRANT EXECUTE ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint) TO neon_superuser;' >> $file; \

				        fi; \

				    done

				#########################################################################################

				#

				@@ -59,15 +88,18 @@ RUN apt update && \

				# SFCGAL > 1.3 requires CGAL > 5.2, Bullseye's libcgal-dev is 5.2

				RUN wget https://gitlab.com/Oslandia/SFCGAL/-/archive/v1.3.10/SFCGAL-v1.3.10.tar.gz -O SFCGAL.tar.gz && \

				    mkdir sfcgal-src && cd sfcgal-src && tar xvzf ../SFCGAL.tar.gz --strip-components=1 -C . && \

				    cmake . && make -j $(getconf _NPROCESSORS_ONLN) && \

				    echo "4e39b3b2adada6254a7bdba6d297bb28e1a9835a9f879b74f37e2dab70203232 SFCGAL.tar.gz" | sha256sum --check && \

				    mkdir sfcgal-src && cd sfcgal-src && tar xzf ../SFCGAL.tar.gz --strip-components=1 -C . && \

				    cmake -DCMAKE_BUILD_TYPE=Release . && make -j $(getconf _NPROCESSORS_ONLN) && \

				    DESTDIR=/sfcgal make install -j $(getconf _NPROCESSORS_ONLN) && \

				    make clean && cp -R /sfcgal/* /

				ENV PATH "/usr/local/pgsql/bin:$PATH"

				RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.2.tar.gz -O postgis.tar.gz && \

				    mkdir postgis-src && cd postgis-src && tar xvzf ../postgis.tar.gz --strip-components=1 -C . && \

				RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.3.tar.gz -O postgis.tar.gz && \

				    echo "74eb356e3f85f14233791013360881b6748f78081cc688ff9d6f0f673a762d13 postgis.tar.gz" | sha256sum --check && \

				    mkdir postgis-src && cd postgis-src && tar xzf ../postgis.tar.gz --strip-components=1 -C . && \

				    find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /before.txt &&\

				    ./autogen.sh && \

				    ./configure --with-sfcgal=/usr/local/bin/sfcgal-config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				@@ -80,16 +112,28 @@ RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.2.tar.gz -O postg

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_topology.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control && \

				    mkdir -p /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/postgis.control /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/postgis_raster.control /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/postgis_sfcgal.control /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/postgis_topology.control /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/address_standardizer.control /extensions/postgis && \

				    cp /usr/local/pgsql/share/extension/address_standardizer_data_us.control /extensions/postgis

				RUN wget https://github.com/pgRouting/pgrouting/archive/v3.4.2.tar.gz -O pgrouting.tar.gz && \

				    mkdir pgrouting-src && cd pgrouting-src && tar xvzf ../pgrouting.tar.gz --strip-components=1 -C . && \

				    mkdir build && \

				    cd build && \

				    cmake .. && \

				    echo "cac297c07d34460887c4f3b522b35c470138760fe358e351ad1db4edb6ee306e pgrouting.tar.gz" | sha256sum --check && \

				    mkdir pgrouting-src && cd pgrouting-src && tar xzf ../pgrouting.tar.gz --strip-components=1 -C . && \

				    mkdir build && cd build && \

				    cmake -DCMAKE_BUILD_TYPE=Release .. && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrouting.control

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrouting.control && \

				    find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /after.txt &&\

				    cp /usr/local/pgsql/share/extension/pgrouting.control /extensions/postgis && \

				    sort -o /before.txt /before.txt && sort -o /after.txt /after.txt && \

				    comm -13 /before.txt /after.txt | tar --directory=/usr/local/pgsql --zstd -cf /extensions/postgis.tar.zst -T -

				#########################################################################################

				#

				@@ -99,15 +143,24 @@ RUN wget https://github.com/pgRouting/pgrouting/archive/v3.4.2.tar.gz -O pgrouti

				#########################################################################################

				FROM build-deps AS plv8-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN apt update && \

				    apt install -y ninja-build python3-dev libncurses5 binutils clang

				RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.5.tar.gz -O plv8.tar.gz && \

				    mkdir plv8-src && cd plv8-src && tar xvzf ../plv8.tar.gz --strip-components=1 -C . && \

				RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.10.tar.gz -O plv8.tar.gz && \

				    echo "7096c3290928561f0d4901b7a52794295dc47f6303102fae3f8e42dd575ad97d plv8.tar.gz" | sha256sum --check && \

				    mkdir plv8-src && cd plv8-src && tar xzf ../plv8.tar.gz --strip-components=1 -C . && \

				    # generate and copy upgrade scripts

				    mkdir -p upgrade && ./generate_upgrade.sh 3.1.10 && \

				    cp upgrade/* /usr/local/pgsql/share/extension/ && \

				    export PATH="/usr/local/pgsql/bin:$PATH" && \

				    make DOCKER=1 -j $(getconf _NPROCESSORS_ONLN) install && \

				    rm -rf /plv8-* && \

				    find /usr/local/pgsql/ -name "plv8-*.so" | xargs strip && \

				    # don't break computes with installed old version of plv8

				    cd /usr/local/pgsql/lib/ && \

				    ln -s plv8-3.1.10.so plv8-3.1.5.so && \

				    ln -s plv8-3.1.10.so plv8-3.1.8.so && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/plcoffee.control && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/plls.control

				@@ -121,15 +174,27 @@ RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.5.tar.gz -O plv8.ta

				FROM build-deps AS h3-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				# packaged cmake is too old

				RUN wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-x86_64.sh \

				RUN case "$(uname -m)" in \

				      "x86_64") \

				        export CMAKE_CHECKSUM=739d372726cb23129d57a539ce1432453448816e345e1545f6127296926b6754 \

				        ;; \

				      "aarch64") \

				        export CMAKE_CHECKSUM=281b42627c9a1beed03e29706574d04c6c53fae4994472e90985ef018dd29c02 \

				        ;; \

				      *) \

				        echo "Unsupported architecture '$(uname -m)'. Supported are x86_64 and aarch64" && exit 1 \

				        ;; \

				    esac && \

				    wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-$(uname -m).sh \

				      -q -O /tmp/cmake-install.sh \

				      && echo "${CMAKE_CHECKSUM} /tmp/cmake-install.sh" | sha256sum --check \

				      && chmod u+x /tmp/cmake-install.sh \

				      && /tmp/cmake-install.sh --skip-license --prefix=/usr/local/ \

				      && rm /tmp/cmake-install.sh

				RUN wget https://github.com/uber/h3/archive/refs/tags/v4.1.0.tar.gz -O h3.tar.gz && \

				    mkdir h3-src && cd h3-src && tar xvzf ../h3.tar.gz --strip-components=1 -C . && \

				    echo "ec99f1f5974846bde64f4513cf8d2ea1b8d172d2218ab41803bf6a63532272bc h3.tar.gz" | sha256sum --check && \

				    mkdir h3-src && cd h3-src && tar xzf ../h3.tar.gz --strip-components=1 -C . && \

				    mkdir build && cd build && \

				    cmake .. -DCMAKE_BUILD_TYPE=Release && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				@@ -137,8 +202,9 @@ RUN wget https://github.com/uber/h3/archive/refs/tags/v4.1.0.tar.gz -O h3.tar.gz

				    cp -R /h3/usr / && \

				    rm -rf build

				RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.1.2.tar.gz -O h3-pg.tar.gz && \

				    mkdir h3-pg-src && cd h3-pg-src && tar xvzf ../h3-pg.tar.gz --strip-components=1 -C . && \

				RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.1.3.tar.gz -O h3-pg.tar.gz && \

				    echo "5c17f09a820859ffe949f847bebf1be98511fb8f1bd86f94932512c00479e324 h3-pg.tar.gz" | sha256sum --check && \

				    mkdir h3-pg-src && cd h3-pg-src && tar xzf ../h3-pg.tar.gz --strip-components=1 -C . && \

				    export PATH="/usr/local/pgsql/bin:$PATH" && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				@@ -155,7 +221,8 @@ FROM build-deps AS unit-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz -O postgresql-unit.tar.gz && \

				    mkdir postgresql-unit-src && cd postgresql-unit-src && tar xvzf ../postgresql-unit.tar.gz --strip-components=1 -C . && \

				    echo "411d05beeb97e5a4abf17572bfcfbb5a68d98d1018918feff995f6ee3bb03e79 postgresql-unit.tar.gz" | sha256sum --check && \

				    mkdir postgresql-unit-src && cd postgresql-unit-src && tar xzf ../postgresql-unit.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    # unit extension's "create extension" script relies on absolute install path to fill some reference tables.

				@@ -174,10 +241,17 @@ RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz -

				FROM build-deps AS vector-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.4.0.tar.gz -O pgvector.tar.gz && \

				    mkdir pgvector-src && cd pgvector-src && tar xvzf ../pgvector.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				COPY patches/pgvector.patch /pgvector.patch

				# By default, pgvector Makefile uses `-march=native`. We don't want that,

				# because we build the images on different machines than where we run them.

				# Pass OPTFLAGS="" to remove it.

				RUN wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.7.2.tar.gz -O pgvector.tar.gz && \

				    echo "617fba855c9bcb41a2a9bc78a78567fd2e147c72afd5bf9d37b31b9591632b30 pgvector.tar.gz" | sha256sum --check && \

				    mkdir pgvector-src && cd pgvector-src && tar xzf ../pgvector.tar.gz --strip-components=1 -C . && \

				    patch -p1 < /pgvector.patch && \

				    make -j $(getconf _NPROCESSORS_ONLN) OPTFLAGS="" PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) OPTFLAGS="" install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/vector.control

				#########################################################################################

				@@ -191,7 +265,8 @@ COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				# 9742dab1b2f297ad3811120db7b21451bca2d3c9 made on 13/11/2021

				RUN wget https://github.com/michelp/pgjwt/archive/9742dab1b2f297ad3811120db7b21451bca2d3c9.tar.gz -O pgjwt.tar.gz && \

				    mkdir pgjwt-src && cd pgjwt-src && tar xvzf ../pgjwt.tar.gz --strip-components=1 -C . && \

				    echo "cfdefb15007286f67d3d45510f04a6a7a495004be5b3aecb12cda667e774203f pgjwt.tar.gz" | sha256sum --check && \

				    mkdir pgjwt-src && cd pgjwt-src && tar xzf ../pgjwt.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgjwt.control

				@@ -204,8 +279,9 @@ RUN wget https://github.com/michelp/pgjwt/archive/9742dab1b2f297ad3811120db7b214

				FROM build-deps AS hypopg-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/HypoPG/hypopg/archive/refs/tags/1.3.1.tar.gz -O hypopg.tar.gz && \

				    mkdir hypopg-src && cd hypopg-src && tar xvzf ../hypopg.tar.gz --strip-components=1 -C . && \

				RUN wget https://github.com/HypoPG/hypopg/archive/refs/tags/1.4.0.tar.gz -O hypopg.tar.gz && \

				    echo "0821011743083226fc9b813c1f2ef5897a91901b57b6bea85a78e466187c6819 hypopg.tar.gz" | sha256sum --check && \

				    mkdir hypopg-src && cd hypopg-src && tar xzf ../hypopg.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/hypopg.control

				@@ -220,7 +296,8 @@ FROM build-deps AS pg-hashids-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/iCyberon/pg_hashids/archive/refs/tags/v1.2.1.tar.gz -O pg_hashids.tar.gz && \

				    mkdir pg_hashids-src && cd pg_hashids-src && tar xvzf ../pg_hashids.tar.gz --strip-components=1 -C . && \

				    echo "74576b992d9277c92196dd8d816baa2cc2d8046fe102f3dcd7f3c3febed6822a pg_hashids.tar.gz" | sha256sum --check && \

				    mkdir pg_hashids-src && cd pg_hashids-src && tar xzf ../pg_hashids.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_hashids.control

				@@ -235,7 +312,8 @@ FROM build-deps AS rum-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/postgrespro/rum/archive/refs/tags/1.3.13.tar.gz -O rum.tar.gz && \

				    mkdir rum-src && cd rum-src && tar xvzf ../rum.tar.gz --strip-components=1 -C . && \

				    echo "6ab370532c965568df6210bd844ac6ba649f53055e48243525b0b7e5c4d69a7d rum.tar.gz" | sha256sum --check && \

				    mkdir rum-src && cd rum-src && tar xzf ../rum.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/rum.control

				@@ -250,15 +328,313 @@ FROM build-deps AS pgtap-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/theory/pgtap/archive/refs/tags/v1.2.0.tar.gz -O pgtap.tar.gz && \

				    mkdir pgtap-src && cd pgtap-src && tar xvzf ../pgtap.tar.gz --strip-components=1 -C . && \

				    echo "9c7c3de67ea41638e14f06da5da57bac6f5bd03fea05c165a0ec862205a5c052 pgtap.tar.gz" | sha256sum --check && \

				    mkdir pgtap-src && cd pgtap-src && tar xzf ../pgtap.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgtap.control

				#########################################################################################

				# 

				#

				# Layer "ip4r-pg-build"

				# compile ip4r extension

				#

				#########################################################################################

				FROM build-deps AS ip4r-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/RhodiumToad/ip4r/archive/refs/tags/2.4.2.tar.gz -O ip4r.tar.gz && \

				    echo "0f7b1f159974f49a47842a8ab6751aecca1ed1142b6d5e38d81b064b2ead1b4b ip4r.tar.gz" | sha256sum --check && \

				    mkdir ip4r-src && cd ip4r-src && tar xzf ../ip4r.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/ip4r.control

				#########################################################################################

				#

				# Layer "prefix-pg-build"

				# compile Prefix extension

				#

				#########################################################################################

				FROM build-deps AS prefix-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/dimitri/prefix/archive/refs/tags/v1.2.10.tar.gz -O prefix.tar.gz && \

				    echo "4342f251432a5f6fb05b8597139d3ccde8dcf87e8ca1498e7ee931ca057a8575 prefix.tar.gz" | sha256sum --check && \

				    mkdir prefix-src && cd prefix-src && tar xzf ../prefix.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/prefix.control

				#########################################################################################

				#

				# Layer "hll-pg-build"

				# compile hll extension

				#

				#########################################################################################

				FROM build-deps AS hll-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/citusdata/postgresql-hll/archive/refs/tags/v2.18.tar.gz -O hll.tar.gz && \

				    echo "e2f55a6f4c4ab95ee4f1b4a2b73280258c5136b161fe9d059559556079694f0e hll.tar.gz" | sha256sum --check && \

				    mkdir hll-src && cd hll-src && tar xzf ../hll.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/hll.control

				#########################################################################################

				#

				# Layer "plpgsql-check-pg-build"

				# compile plpgsql_check extension

				#

				#########################################################################################

				FROM build-deps AS plpgsql-check-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN wget https://github.com/okbob/plpgsql_check/archive/refs/tags/v2.5.3.tar.gz -O plpgsql_check.tar.gz && \

				    echo "6631ec3e7fb3769eaaf56e3dfedb829aa761abf163d13dba354b4c218508e1c0 plpgsql_check.tar.gz" | sha256sum --check && \

				    mkdir plpgsql_check-src && cd plpgsql_check-src && tar xzf ../plpgsql_check.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/plpgsql_check.control

				#########################################################################################

				#

				# Layer "timescaledb-pg-build"

				# compile timescaledb extension

				#

				#########################################################################################

				FROM build-deps AS timescaledb-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ARG PG_VERSION

				ENV PATH "/usr/local/pgsql/bin:$PATH"

				RUN case "${PG_VERSION}" in \

				      "v14" | "v15") \

				        export TIMESCALEDB_VERSION=2.10.1 \

				        export TIMESCALEDB_CHECKSUM=6fca72a6ed0f6d32d2b3523951ede73dc5f9b0077b38450a029a5f411fdb8c73 \

				        ;; \

				      *) \

				        export TIMESCALEDB_VERSION=2.13.0 \

				        export TIMESCALEDB_CHECKSUM=584a351c7775f0e067eaa0e7277ea88cab9077cc4c455cbbf09a5d9723dce95d \

				        ;; \

				    esac && \

				    apt-get update && \

				    apt-get install -y cmake && \

				    wget https://github.com/timescale/timescaledb/archive/refs/tags/${TIMESCALEDB_VERSION}.tar.gz -O timescaledb.tar.gz && \

				    echo "${TIMESCALEDB_CHECKSUM} timescaledb.tar.gz" | sha256sum --check && \

				    mkdir timescaledb-src && cd timescaledb-src && tar xzf ../timescaledb.tar.gz --strip-components=1 -C . && \

				    ./bootstrap -DSEND_TELEMETRY_DEFAULT:BOOL=OFF -DUSE_TELEMETRY:BOOL=OFF -DAPACHE_ONLY:BOOL=ON -DCMAKE_BUILD_TYPE=Release && \

				    cd build && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make install -j $(getconf _NPROCESSORS_ONLN) && \

				    echo "trusted = true" >> /usr/local/pgsql/share/extension/timescaledb.control

				#########################################################################################

				#

				# Layer "pg-hint-plan-pg-build"

				# compile pg_hint_plan extension

				#

				#########################################################################################

				FROM build-deps AS pg-hint-plan-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ARG PG_VERSION

				ENV PATH "/usr/local/pgsql/bin:$PATH"

				RUN case "${PG_VERSION}" in \

				      "v14") \

				        export PG_HINT_PLAN_VERSION=14_1_4_1 \

				        export PG_HINT_PLAN_CHECKSUM=c3501becf70ead27f70626bce80ea401ceac6a77e2083ee5f3ff1f1444ec1ad1 \

				        ;; \

				      "v15") \

				        export PG_HINT_PLAN_VERSION=15_1_5_0 \

				        export PG_HINT_PLAN_CHECKSUM=564cbbf4820973ffece63fbf76e3c0af62c4ab23543142c7caaa682bc48918be \

				        ;; \

				      "v16") \

				        export PG_HINT_PLAN_VERSION=16_1_6_0 \

				        export PG_HINT_PLAN_CHECKSUM=fc85a9212e7d2819d4ae4ac75817481101833c3cfa9f0fe1f980984e12347d00 \

				        ;; \

				      *) \

				        echo "Export the valid PG_HINT_PLAN_VERSION variable" && exit 1 \

				        ;; \

				    esac && \

				    wget https://github.com/ossc-db/pg_hint_plan/archive/refs/tags/REL${PG_HINT_PLAN_VERSION}.tar.gz -O pg_hint_plan.tar.gz && \

				    echo "${PG_HINT_PLAN_CHECKSUM} pg_hint_plan.tar.gz" | sha256sum --check && \

				    mkdir pg_hint_plan-src && cd pg_hint_plan-src && tar xzf ../pg_hint_plan.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make install -j $(getconf _NPROCESSORS_ONLN) && \

				    echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_hint_plan.control

				#########################################################################################

				#

				# Layer "pg-cron-pg-build"

				# compile pg_cron extension

				#

				#########################################################################################

				FROM build-deps AS pg-cron-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/citusdata/pg_cron/archive/refs/tags/v1.6.0.tar.gz -O pg_cron.tar.gz && \

				    echo "383a627867d730222c272bfd25cd5e151c578d73f696d32910c7db8c665cc7db pg_cron.tar.gz" | sha256sum --check && \

				    mkdir pg_cron-src && cd pg_cron-src && tar xzf ../pg_cron.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_cron.control

				#########################################################################################

				#

				# Layer "rdkit-pg-build"

				# compile rdkit extension

				#

				#########################################################################################

				FROM build-deps AS rdkit-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				RUN apt-get update && \

				    apt-get install -y \

				        cmake \

				        libboost-iostreams1.74-dev \

				        libboost-regex1.74-dev \

				        libboost-serialization1.74-dev \

				        libboost-system1.74-dev \

				        libeigen3-dev

				ENV PATH "/usr/local/pgsql/bin/:/usr/local/pgsql/:$PATH"

				RUN wget https://github.com/rdkit/rdkit/archive/refs/tags/Release_2023_03_3.tar.gz -O rdkit.tar.gz && \

				    echo "bdbf9a2e6988526bfeb8c56ce3cdfe2998d60ac289078e2215374288185e8c8d rdkit.tar.gz" | sha256sum --check && \

				    mkdir rdkit-src && cd rdkit-src && tar xzf ../rdkit.tar.gz --strip-components=1 -C . && \

				    cmake \

				        -D RDK_BUILD_CAIRO_SUPPORT=OFF \

				        -D RDK_BUILD_INCHI_SUPPORT=ON \

				        -D RDK_BUILD_AVALON_SUPPORT=ON \

				        -D RDK_BUILD_PYTHON_WRAPPERS=OFF \

				        -D RDK_BUILD_DESCRIPTORS3D=OFF \

				        -D RDK_BUILD_FREESASA_SUPPORT=OFF \

				        -D RDK_BUILD_COORDGEN_SUPPORT=ON \

				        -D RDK_BUILD_MOLINTERCHANGE_SUPPORT=OFF \

				        -D RDK_BUILD_YAEHMOP_SUPPORT=OFF \

				        -D RDK_BUILD_STRUCTCHECKER_SUPPORT=OFF \

				        -D RDK_USE_URF=OFF \

				        -D RDK_BUILD_PGSQL=ON \

				        -D RDK_PGSQL_STATIC=ON \

				        -D PostgreSQL_CONFIG=pg_config \

				        -D PostgreSQL_INCLUDE_DIR=`pg_config --includedir` \

				        -D PostgreSQL_TYPE_INCLUDE_DIR=`pg_config --includedir-server` \

				        -D PostgreSQL_LIBRARY_DIR=`pg_config --libdir` \

				        -D RDK_INSTALL_INTREE=OFF \

				        -D RDK_INSTALL_COMIC_FONTS=OFF \

				        -D RDK_BUILD_FREETYPE_SUPPORT=OFF \

				        -D CMAKE_BUILD_TYPE=Release \

				        . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/rdkit.control

				#########################################################################################

				#

				# Layer "pg-uuidv7-pg-build"

				# compile pg_uuidv7 extension

				#

				#########################################################################################

				FROM build-deps AS pg-uuidv7-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/fboulnois/pg_uuidv7/archive/refs/tags/v1.0.1.tar.gz -O pg_uuidv7.tar.gz && \

				    echo "0d0759ab01b7fb23851ecffb0bce27822e1868a4a5819bfd276101c716637a7a pg_uuidv7.tar.gz" | sha256sum --check && \

				    mkdir pg_uuidv7-src && cd pg_uuidv7-src && tar xzf ../pg_uuidv7.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_uuidv7.control

				#########################################################################################

				#

				# Layer "pg-roaringbitmap-pg-build"

				# compile pg_roaringbitmap extension

				#

				#########################################################################################

				FROM build-deps AS pg-roaringbitmap-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/ChenHuajun/pg_roaringbitmap/archive/refs/tags/v0.5.4.tar.gz -O pg_roaringbitmap.tar.gz && \

				    echo "b75201efcb1c2d1b014ec4ae6a22769cc7a224e6e406a587f5784a37b6b5a2aa pg_roaringbitmap.tar.gz" | sha256sum --check && \

				    mkdir pg_roaringbitmap-src && cd pg_roaringbitmap-src && tar xzf ../pg_roaringbitmap.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/roaringbitmap.control

				#########################################################################################

				#

				# Layer "pg-semver-pg-build"

				# compile pg_semver extension

				#

				#########################################################################################

				FROM build-deps AS pg-semver-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/theory/pg-semver/archive/refs/tags/v0.32.1.tar.gz -O pg_semver.tar.gz && \

				    echo "fbdaf7512026d62eec03fad8687c15ed509b6ba395bff140acd63d2e4fbe25d7 pg_semver.tar.gz" | sha256sum --check && \

				    mkdir pg_semver-src && cd pg_semver-src && tar xzf ../pg_semver.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/semver.control

				#########################################################################################

				#

				# Layer "pg-embedding-pg-build"

				# compile pg_embedding extension

				#

				#########################################################################################

				FROM build-deps AS pg-embedding-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ARG PG_VERSION

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN case "${PG_VERSION}" in \

				      "v14" | "v15") \

				        export PG_EMBEDDING_VERSION=0.3.5 \

				        export PG_EMBEDDING_CHECKSUM=0e95b27b8b6196e2cf0a0c9ec143fe2219b82e54c5bb4ee064e76398cbe69ae9 \

				        ;; \

				      *) \

				        echo "pg_embedding not supported on this PostgreSQL version. Use pgvector instead." && exit 0;; \

				    esac && \

				    wget https://github.com/neondatabase/pg_embedding/archive/refs/tags/${PG_EMBEDDING_VERSION}.tar.gz -O pg_embedding.tar.gz && \

				    echo "${PG_EMBEDDING_CHECKSUM} pg_embedding.tar.gz" | sha256sum --check && \

				    mkdir pg_embedding-src && cd pg_embedding-src && tar xzf ../pg_embedding.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install

				#########################################################################################

				#

				# Layer "pg-anon-pg-build"

				# compile anon extension

				#

				#########################################################################################

				FROM build-deps AS pg-anon-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget  https://github.com/neondatabase/postgresql_anonymizer/archive/refs/tags/neon_1.1.1.tar.gz -O pg_anon.tar.gz && \

				    echo "321ea8d5c1648880aafde850a2c576e4a9e7b9933a34ce272efc839328999fa9  pg_anon.tar.gz" | sha256sum --check && \

				    mkdir pg_anon-src && cd pg_anon-src && tar xzf ../pg_anon.tar.gz --strip-components=1 -C . && \

				    find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /before.txt &&\

				    make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/anon.control && \

				    find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /after.txt &&\

				    mkdir -p /extensions/anon && cp /usr/local/pgsql/share/extension/anon.control /extensions/anon && \

				    sort -o /before.txt /before.txt && sort -o /after.txt /after.txt && \

				    comm -13 /before.txt /after.txt | tar --directory=/usr/local/pgsql --zstd -cf /extensions/anon.tar.zst -T -

				#########################################################################################

				#

				# Layer "rust extensions"

				# This layer is used to build `pgx` deps

				# This layer is used to build `pgrx` deps

				#

				#########################################################################################

				FROM build-deps AS rust-extensions-build

				@@ -278,43 +654,136 @@ RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux

				    chmod +x rustup-init && \

				    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain stable && \

				    rm rustup-init && \

				    cargo install --git https://github.com/vadim2404/pgx --branch neon_abi_v0.6.1 --locked cargo-pgx && \

				    /bin/bash -c 'cargo pgx init --pg${PG_VERSION:1}=/usr/local/pgsql/bin/pg_config'

				    cargo install --locked --version 0.10.2 cargo-pgrx && \

				    /bin/bash -c 'cargo pgrx init --pg${PG_VERSION:1}=/usr/local/pgsql/bin/pg_config'

				USER root

				#########################################################################################

				# 

				#

				# Layer "pg-jsonschema-pg-build"

				# Compile "pg_jsonschema" extension

				#

				#########################################################################################

				FROM rust-extensions-build AS pg-jsonschema-pg-build

				ARG PG_VERSION

				RUN git clone --depth=1 --single-branch --branch neon_abi_v0.1.4 https://github.com/vadim2404/pg_jsonschema/ && \

				    cd pg_jsonschema && \

				    cargo pgx install --release && \

				    # it's needed to enable extension because it uses untrusted C language

				    sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_jsonschema.control && \

				RUN wget https://github.com/supabase/pg_jsonschema/archive/refs/tags/v0.2.0.tar.gz -O pg_jsonschema.tar.gz && \

				    echo "9118fc508a6e231e7a39acaa6f066fcd79af17a5db757b47d2eefbe14f7794f0 pg_jsonschema.tar.gz" | sha256sum --check && \

				    mkdir pg_jsonschema-src && cd pg_jsonschema-src && tar xzf ../pg_jsonschema.tar.gz --strip-components=1 -C . && \

				    sed -i 's/pgrx = "0.10.2"/pgrx = { version = "0.10.2", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \

				    cargo pgrx install --release && \

				    echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_jsonschema.control

				#########################################################################################

				# 

				#

				# Layer "pg-graphql-pg-build"

				# Compile "pg_graphql" extension

				#

				#########################################################################################

				FROM rust-extensions-build AS pg-graphql-pg-build

				ARG PG_VERSION

				RUN git clone --depth=1 --single-branch --branch neon_abi_v1.1.0 https://github.com/vadim2404/pg_graphql && \

				    cd pg_graphql && \  

				    cargo pgx install --release && \

				RUN wget https://github.com/supabase/pg_graphql/archive/refs/tags/v1.4.0.tar.gz -O pg_graphql.tar.gz && \

				    echo "bd8dc7230282b3efa9ae5baf053a54151ed0e66881c7c53750e2d0c765776edc pg_graphql.tar.gz" | sha256sum --check && \

				    mkdir pg_graphql-src && cd pg_graphql-src && tar xzf ../pg_graphql.tar.gz --strip-components=1 -C . && \

				    sed -i 's/pgrx = "=0.10.2"/pgrx = { version = "0.10.2", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \

				    cargo pgrx install --release && \

				    # it's needed to enable extension because it uses untrusted C language

				    sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_graphql.control && \

				    echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_graphql.control

				#########################################################################################

				#

				# Layer "pg-tiktoken-build"

				# Compile "pg_tiktoken" extension

				#

				#########################################################################################

				FROM rust-extensions-build AS pg-tiktoken-pg-build

				ARG PG_VERSION

				# 26806147b17b60763039c6a6878884c41a262318 made on 26/09/2023

				RUN wget https://github.com/kelvich/pg_tiktoken/archive/26806147b17b60763039c6a6878884c41a262318.tar.gz -O pg_tiktoken.tar.gz && \

				    echo "e64e55aaa38c259512d3e27c572da22c4637418cf124caba904cd50944e5004e pg_tiktoken.tar.gz" | sha256sum --check && \

				    mkdir pg_tiktoken-src && cd pg_tiktoken-src && tar xzf ../pg_tiktoken.tar.gz --strip-components=1 -C . && \

				    cargo pgrx install --release && \

				    echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_tiktoken.control

				#########################################################################################

				#

				# Layer "pg-pgx-ulid-build"

				# Compile "pgx_ulid" extension

				#

				#########################################################################################

				FROM rust-extensions-build AS pg-pgx-ulid-build

				ARG PG_VERSION

				RUN wget https://github.com/pksunkara/pgx_ulid/archive/refs/tags/v0.1.3.tar.gz -O pgx_ulid.tar.gz && \

				    echo "ee5db82945d2d9f2d15597a80cf32de9dca67b897f605beb830561705f12683c pgx_ulid.tar.gz" | sha256sum --check && \

				    mkdir pgx_ulid-src && cd pgx_ulid-src && tar xzf ../pgx_ulid.tar.gz --strip-components=1 -C . && \

				    echo "******************* Apply a patch for Postgres 16 support; delete in the next release ******************" && \

				    wget https://github.com/pksunkara/pgx_ulid/commit/f84954cf63fc8c80d964ac970d9eceed3c791196.patch && \

				    patch -p1 < f84954cf63fc8c80d964ac970d9eceed3c791196.patch && \

				    echo "********************************************************************************************************" && \

				    sed -i 's/pgrx       = "=0.10.2"/pgrx = { version = "=0.10.2", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \

				    cargo pgrx install --release && \

				    echo "trusted = true" >> /usr/local/pgsql/share/extension/ulid.control

				#########################################################################################

				#

				# Layer "wal2json-build"

				# Compile "wal2json" extension

				#

				#########################################################################################

				FROM build-deps AS wal2json-pg-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/eulerto/wal2json/archive/refs/tags/wal2json_2_5.tar.gz && \

				    echo "b516653575541cf221b99cf3f8be9b6821f6dbcfc125675c85f35090f824f00e wal2json_2_5.tar.gz" | sha256sum --check && \

				    mkdir wal2json-src && cd wal2json-src && tar xzf ../wal2json_2_5.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install

				#########################################################################################

				#

				# Layer "pg_ivm"

				# compile pg_ivm extension

				#

				#########################################################################################

				FROM build-deps AS pg-ivm-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/sraoss/pg_ivm/archive/refs/tags/v1.7.tar.gz -O pg_ivm.tar.gz && \

				    echo "ebfde04f99203c7be4b0e873f91104090e2e83e5429c32ac242d00f334224d5e pg_ivm.tar.gz" | sha256sum --check && \

				    mkdir pg_ivm-src && cd pg_ivm-src && tar xzf ../pg_ivm.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_ivm.control

				#########################################################################################

				#

				# Layer "pg_partman"

				# compile pg_partman extension

				#

				#########################################################################################

				FROM build-deps AS pg-partman-build

				COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/

				ENV PATH "/usr/local/pgsql/bin/:$PATH"

				RUN wget https://github.com/pgpartman/pg_partman/archive/refs/tags/v5.0.1.tar.gz -O pg_partman.tar.gz && \

				    echo "75b541733a9659a6c90dbd40fccb904a630a32880a6e3044d0c4c5f4c8a65525 pg_partman.tar.gz" | sha256sum --check && \

				    mkdir pg_partman-src && cd pg_partman-src && tar xzf ../pg_partman.tar.gz --strip-components=1 -C . && \

				    make -j $(getconf _NPROCESSORS_ONLN) && \

				    make -j $(getconf _NPROCESSORS_ONLN) install && \

				    echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_partman.control

				#########################################################################################

				#

				# Layer "neon-pg-ext-build"

				@@ -322,6 +791,9 @@ RUN git clone --depth=1 --single-branch --branch neon_abi_v1.1.0 https://github.

				#

				#########################################################################################

				FROM build-deps AS neon-pg-ext-build

				ARG PG_VERSION

				# Public extensions

				COPY --from=postgis-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=postgis-build /sfcgal/* /

				COPY --from=plv8-build /usr/local/pgsql/ /usr/local/pgsql/

				@@ -332,15 +804,59 @@ COPY --from=vector-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pgjwt-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-jsonschema-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-graphql-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-tiktoken-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=hypopg-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-hashids-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=rum-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pgtap-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=ip4r-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=prefix-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=hll-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=plpgsql-check-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=timescaledb-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-hint-plan-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-cron-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-pgx-ulid-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=rdkit-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-uuidv7-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-roaringbitmap-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-semver-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-embedding-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=wal2json-pg-build /usr/local/pgsql /usr/local/pgsql

				COPY --from=pg-anon-pg-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-ivm-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY --from=pg-partman-build /usr/local/pgsql/ /usr/local/pgsql/

				COPY pgxn/ pgxn/

				RUN make -j $(getconf _NPROCESSORS_ONLN) \

				        PG_CONFIG=/usr/local/pgsql/bin/pg_config \

				        -C pgxn/neon \

				        -s install && \

				    make -j $(getconf _NPROCESSORS_ONLN) \

				        PG_CONFIG=/usr/local/pgsql/bin/pg_config \

				        -C pgxn/neon_utils \

				        -s install && \

				    make -j $(getconf _NPROCESSORS_ONLN) \

				        PG_CONFIG=/usr/local/pgsql/bin/pg_config \

				        -C pgxn/neon_test_utils \

				        -s install && \

				    make -j $(getconf _NPROCESSORS_ONLN) \

				        PG_CONFIG=/usr/local/pgsql/bin/pg_config \

				        -C pgxn/neon_rmgr \

				        -s install && \

				    case "${PG_VERSION}" in \

				        "v14" | "v15") \

				        ;; \

				        "v16") \

				            echo "Skipping HNSW for PostgreSQL 16" && exit 0 \

				        ;; \

				        *) \

				            echo "unexpected PostgreSQL version" && exit 1 \

				        ;; \

				        esac && \

				    make -j $(getconf _NPROCESSORS_ONLN) \

				        PG_CONFIG=/usr/local/pgsql/bin/pg_config \

				        -C pgxn/hnsw \

				        -s install

				#########################################################################################

				@@ -349,10 +865,23 @@ RUN make -j $(getconf _NPROCESSORS_ONLN) \

				#

				#########################################################################################

				FROM $REPOSITORY/$IMAGE:$TAG AS compute-tools

				ARG BUILD_TAG

				ENV BUILD_TAG=$BUILD_TAG

				USER nonroot

				# Copy entire project to get Cargo.* files with proper dependencies for the whole project

				COPY --chown=nonroot . .

				RUN cd compute_tools && cargo build --locked --profile release-line-debug-size-lto

				RUN cd compute_tools && mold -run cargo build --locked --profile release-line-debug-size-lto

				#########################################################################################

				#

				# Final compute-tools image

				#

				#########################################################################################

				FROM debian:bullseye-slim AS compute-tools-image

				COPY --from=compute-tools /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl

				#########################################################################################

				#

				@@ -373,6 +902,68 @@ RUN rm -r /usr/local/pgsql/include

				# if they were to be used by other libraries.

				RUN rm /usr/local/pgsql/lib/lib*.a

				#########################################################################################

				#

				# Layer neon-pg-ext-test

				#

				#########################################################################################

				FROM neon-pg-ext-build AS neon-pg-ext-test

				ARG PG_VERSION

				RUN mkdir /ext-src

				#COPY --from=postgis-build /postgis.tar.gz /ext-src/

				#COPY --from=postgis-build /sfcgal/* /usr

				COPY --from=plv8-build /plv8.tar.gz /ext-src/

				COPY --from=h3-pg-build /h3-pg.tar.gz /ext-src/

				COPY --from=unit-pg-build /postgresql-unit.tar.gz /ext-src/

				COPY --from=vector-pg-build /pgvector.tar.gz /ext-src/

				COPY --from=vector-pg-build /pgvector.patch /ext-src/

				COPY --from=pgjwt-pg-build /pgjwt.tar.gz /ext-src

				#COPY --from=pg-jsonschema-pg-build /home/nonroot/pg_jsonschema.tar.gz /ext-src

				#COPY --from=pg-graphql-pg-build /home/nonroot/pg_graphql.tar.gz /ext-src

				#COPY --from=pg-tiktoken-pg-build /home/nonroot/pg_tiktoken.tar.gz /ext-src

				COPY --from=hypopg-pg-build /hypopg.tar.gz /ext-src

				COPY --from=pg-hashids-pg-build /pg_hashids.tar.gz /ext-src

				#COPY --from=rum-pg-build /rum.tar.gz /ext-src

				#COPY --from=pgtap-pg-build /pgtap.tar.gz /ext-src

				COPY --from=ip4r-pg-build /ip4r.tar.gz /ext-src

				COPY --from=prefix-pg-build /prefix.tar.gz /ext-src

				COPY --from=hll-pg-build /hll.tar.gz /ext-src

				COPY --from=plpgsql-check-pg-build /plpgsql_check.tar.gz /ext-src

				#COPY --from=timescaledb-pg-build /timescaledb.tar.gz /ext-src

				COPY --from=pg-hint-plan-pg-build /pg_hint_plan.tar.gz /ext-src

				COPY patches/pg_hintplan.patch /ext-src

				COPY --from=pg-cron-pg-build /pg_cron.tar.gz /ext-src

				COPY patches/pg_cron.patch /ext-src

				#COPY --from=pg-pgx-ulid-build /home/nonroot/pgx_ulid.tar.gz /ext-src

				COPY --from=rdkit-pg-build /rdkit.tar.gz /ext-src

				COPY --from=pg-uuidv7-pg-build /pg_uuidv7.tar.gz /ext-src

				COPY --from=pg-roaringbitmap-pg-build /pg_roaringbitmap.tar.gz /ext-src

				COPY --from=pg-semver-pg-build /pg_semver.tar.gz /ext-src

				#COPY --from=pg-embedding-pg-build /home/nonroot/pg_embedding-src/ /ext-src

				#COPY --from=wal2json-pg-build /wal2json_2_5.tar.gz /ext-src

				COPY --from=pg-anon-pg-build /pg_anon.tar.gz /ext-src

				COPY patches/pg_anon.patch /ext-src

				COPY --from=pg-ivm-build /pg_ivm.tar.gz /ext-src

				COPY --from=pg-partman-build /pg_partman.tar.gz /ext-src

				RUN cd /ext-src/ && for f in *.tar.gz; \

				    do echo $f; dname=$(echo $f | sed 's/\.tar.*//')-src; \

				    rm -rf $dname; mkdir $dname; tar xzf $f --strip-components=1 -C $dname \

				    || exit 1; rm -f $f; done

				RUN cd /ext-src/pgvector-src && patch -p1 <../pgvector.patch

				# cmake is required for the h3 test

				RUN apt-get update && apt-get install -y cmake

				RUN patch -p1 < /ext-src/pg_hintplan.patch

				COPY --chmod=755 docker-compose/run-tests.sh /run-tests.sh

				RUN patch -p1 </ext-src/pg_anon.patch

				RUN patch -p1 </ext-src/pg_cron.patch

				ENV PATH=/usr/local/pgsql/bin:$PATH

				ENV PGHOST=compute

				ENV PGPORT=55433

				ENV PGUSER=cloud_admin

				ENV PGDATABASE=postgres

				#########################################################################################

				#

				# Final layer

				@@ -384,8 +975,10 @@ FROM debian:bullseye-slim

				RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \

				    echo "postgres:test_console_pass" | chpasswd && \

				    mkdir /var/db/postgres/compute && mkdir /var/db/postgres/specs && \

				    mkdir /var/db/postgres/pgbouncer && \

				    chown -R postgres:postgres /var/db/postgres && \

				    chmod 0750 /var/db/postgres/compute && \

				    chmod 0750 /var/db/postgres/pgbouncer && \

				    echo '/usr/local/lib' >> /etc/ld.so.conf && /sbin/ldconfig && \

				    # create folder for file cache

				    mkdir -p -m 777 /neon/cache

				@@ -393,17 +986,29 @@ RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \

				COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local

				COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl

				# Create remote extension download directory

				RUN mkdir /usr/local/download_extensions && chown -R postgres:postgres /usr/local/download_extensions

				# Install:

				# libreadline8 for psql

				# libicu67, locales for collations (including ICU)

				# libicu67, locales for collations (including ICU and plpgsql_check)

				# liblz4-1 for lz4

				# libossp-uuid16 for extension ossp-uuid

				# libgeos, libgdal, libsfcgal1, libproj and libprotobuf-c1 for PostGIS

				# libxml2, libxslt1.1 for xml2

				# libzstd1 for zstd

				# libboost* for rdkit

				# ca-certificates for communicating with s3 by compute_ctl

				RUN apt update &&  \

				    apt install --no-install-recommends -y \

				        locales \

				        gdb \

				        libicu67 \

				        liblz4-1 \

				        libreadline8 \

				        libboost-iostreams1.74.0 \

				        libboost-regex1.74.0 \

				        libboost-serialization1.74.0 \

				        libboost-system1.74.0 \

				        libossp-uuid16 \

				        libgeos-c1v5 \

				        libgdal28 \

				@@ -412,7 +1017,11 @@ RUN apt update &&  \

				        libsfcgal1 \

				        libxml2 \

				        libxslt1.1 \

				        gdb && \

				        libzstd1 \

				        libcurl4-openssl-dev \

				        locales \

				        procps \

				        ca-certificates && \

				    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \

				    localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
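
Note on the extraction loop in this Dockerfile: each tarball in `/ext-src` is unpacked into a directory named by cutting the file name at `.tar` and appending `-src`. A minimal sketch of just that naming rule as a standalone shell function; the name `ext_src_dir` is hypothetical and not part of the repository:

```sh
# Hypothetical helper, not part of the Dockerfile: mirrors the
# `sed 's/\.tar.*//'` plus `-src` naming used by the extraction loop.
ext_src_dir() {
    # Cut everything from the first ".tar" onward, then append "-src".
    printf '%s-src\n' "$(printf '%s\n' "$1" | sed 's/\.tar.*//')"
}

ext_src_dir pg_anon.tar.gz
ext_src_dir pg_partman.tar.gz
```

The same rule explains why the later `patch` step changes into `/ext-src/pgvector-src`.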

									
										29

Dockerfile.compute-tools
									
											
				@@ -1,29 +0,0 @@

				# First transient image to build compute_tools binaries

				# NB: keep in sync with rust image version in .github/workflows/build_and_test.yml

				ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com

				ARG IMAGE=rust

				ARG TAG=pinned

				FROM $REPOSITORY/$IMAGE:$TAG AS rust-build

				WORKDIR /home/nonroot

				# Enable https://github.com/paritytech/cachepot to cache Rust crates' compilation results in Docker builds.

				# Set up cachepot to use an AWS S3 bucket for cache results, to reuse it between `docker build` invocations.

				# cachepot falls back to local filesystem if S3 is misconfigured, not failing the build.

				ARG RUSTC_WRAPPER=cachepot

				ENV AWS_REGION=eu-central-1

				ENV CACHEPOT_S3_KEY_PREFIX=cachepot

				ARG CACHEPOT_BUCKET=neon-github-dev

				#ARG AWS_ACCESS_KEY_ID

				#ARG AWS_SECRET_ACCESS_KEY

				COPY . .

				RUN set -e \

				    && mold -run cargo build -p compute_tools --locked --release \

				    && cachepot -s

				# Final image that only has one binary

				FROM debian:bullseye-slim

				COPY --from=rust-build /home/nonroot/target/release/compute_ctl /usr/local/bin/compute_ctl

									
										25

Dockerfile.vm-compute-node
									
											
				@@ -1,25 +0,0 @@

				# Note: this file *mostly* just builds on Dockerfile.compute-node

				ARG SRC_IMAGE

				ARG VM_INFORMANT_VERSION=v0.1.6

				# Pull VM informant and set up inittab

				FROM neondatabase/vm-informant:$VM_INFORMANT_VERSION as informant

				RUN set -e \

					&& rm -f /etc/inittab \

					&& touch /etc/inittab

				RUN set -e \

					&& echo "::respawn:su vm-informant -c '/usr/local/bin/vm-informant --auto-restart'" >> /etc/inittab

				# Combine, starting from non-VM compute node image.

				FROM $SRC_IMAGE as base

				# Temporarily set user back to root so we can run adduser

				USER root

				RUN adduser vm-informant --disabled-password --no-create-home

				USER postgres

				COPY --from=informant /etc/inittab /etc/inittab

				COPY --from=informant /usr/bin/vm-informant /usr/local/bin/vm-informant

									
										179

Makefile
									
												
				@@ -3,6 +3,9 @@ ROOT_PROJECT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))

				# Where to install Postgres, default is ./pg_install, maybe useful for package managers

				POSTGRES_INSTALL_DIR ?= $(ROOT_PROJECT_DIR)/pg_install/

				OPENSSL_PREFIX_DIR := /usr/local/openssl

				ICU_PREFIX_DIR := /usr/local/icu

				#

				# We differentiate between release / debug build types using the BUILD_TYPE

				# environment variable.

				@@ -20,18 +23,31 @@ else

					$(error Bad build type '$(BUILD_TYPE)', see Makefile for options)

				endif

				ifeq ($(shell test -e /home/nonroot/.docker_build && echo -n yes),yes)

					# Exclude static build openssl, icu for local build (MacOS, Linux)

					# Only keep for build type release and debug

					PG_CFLAGS += -I$(OPENSSL_PREFIX_DIR)/include

					PG_CONFIGURE_OPTS += --with-icu

					PG_CONFIGURE_OPTS += ICU_CFLAGS='-I/$(ICU_PREFIX_DIR)/include -DU_STATIC_IMPLEMENTATION'

					PG_CONFIGURE_OPTS += ICU_LIBS='-L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -licui18n -licuuc -licudata -lstdc++ -Wl,-Bdynamic -lm'

					PG_CONFIGURE_OPTS += LDFLAGS='-L$(OPENSSL_PREFIX_DIR)/lib -L$(OPENSSL_PREFIX_DIR)/lib64 -L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -Wl,-Bstatic -lssl -lcrypto -Wl,-Bdynamic -lrt -lm -ldl -lpthread'

				endif

				UNAME_S := $(shell uname -s)

				ifeq ($(UNAME_S),Linux)

					# Seccomp BPF is only available for Linux

					PG_CONFIGURE_OPTS += --with-libseccomp

				else ifeq ($(UNAME_S),Darwin)

					# macOS with brew-installed openssl requires explicit paths

					# It can be configured with OPENSSL_PREFIX variable

					OPENSSL_PREFIX ?= $(shell brew --prefix openssl@3)

					PG_CONFIGURE_OPTS += --with-includes=$(OPENSSL_PREFIX)/include --with-libraries=$(OPENSSL_PREFIX)/lib

					# macOS already has bison and flex in the system, but they are old and result in postgres-v14 target failure

					# brew formulae are keg-only and not symlinked into HOMEBREW_PREFIX, force their usage

					EXTRA_PATH_OVERRIDES += $(shell brew --prefix bison)/bin/:$(shell brew --prefix flex)/bin/:

					ifndef DISABLE_HOMEBREW

						# macOS with brew-installed openssl requires explicit paths

						# It can be configured with OPENSSL_PREFIX variable

						OPENSSL_PREFIX := $(shell brew --prefix openssl@3)

						PG_CONFIGURE_OPTS += --with-includes=$(OPENSSL_PREFIX)/include --with-libraries=$(OPENSSL_PREFIX)/lib

						PG_CONFIGURE_OPTS += PKG_CONFIG_PATH=$(shell brew --prefix icu4c)/lib/pkgconfig

						# macOS already has bison and flex in the system, but they are old and result in postgres-v14 target failure

						# brew formulae are keg-only and not symlinked into HOMEBREW_PREFIX, force their usage

						EXTRA_PATH_OVERRIDES += $(shell brew --prefix bison)/bin/:$(shell brew --prefix flex)/bin/:

					endif

				endif

				# Use -C option so that when PostgreSQL "make install" installs the

				@@ -50,6 +66,8 @@ CARGO_BUILD_FLAGS += $(filter -j1,$(MAKEFLAGS))

				CARGO_CMD_PREFIX += $(if $(filter n,$(MAKEFLAGS)),,+)

				# Force cargo not to print progress bar

				CARGO_CMD_PREFIX += CARGO_TERM_PROGRESS_WHEN=never CI=1

				# Set PQ_LIB_DIR to make sure `storage_controller` get linked with bundled libpq (through diesel)

				CARGO_CMD_PREFIX += PQ_LIB_DIR=$(POSTGRES_INSTALL_DIR)/v16/lib

				#

				# Top level Makefile to build Neon and PostgreSQL

				@@ -61,7 +79,7 @@ all: neon postgres neon-pg-ext

				#

				# The 'postgres_ffi' depends on the Postgres headers.

				.PHONY: neon

				neon: postgres-headers

				neon: postgres-headers walproposer-lib

					+@echo "Compiling Neon"

					$(CARGO_CMD_PREFIX) cargo build $(CARGO_BUILD_FLAGS)

				@@ -71,18 +89,27 @@ neon: postgres-headers

				#

				$(POSTGRES_INSTALL_DIR)/build/%/config.status:

					+@echo "Configuring Postgres $* build"

					@test -s $(ROOT_PROJECT_DIR)/vendor/postgres-$*/configure || { \

						echo "\nPostgres submodule not found in $(ROOT_PROJECT_DIR)/vendor/postgres-$*/, execute "; \

						echo "'git submodule update --init --recursive --depth 2 --progress .' in project root.\n"; \

						exit 1; }

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/$*

					(cd $(POSTGRES_INSTALL_DIR)/build/$* && \

					env PATH="$(EXTRA_PATH_OVERRIDES):$$PATH" $(ROOT_PROJECT_DIR)/vendor/postgres-$*/configure \

					VERSION=$*; \

					EXTRA_VERSION=$$(cd $(ROOT_PROJECT_DIR)/vendor/postgres-$$VERSION && git rev-parse HEAD); \

					(cd $(POSTGRES_INSTALL_DIR)/build/$$VERSION && \

					env PATH="$(EXTRA_PATH_OVERRIDES):$$PATH" $(ROOT_PROJECT_DIR)/vendor/postgres-$$VERSION/configure \

						CFLAGS='$(PG_CFLAGS)' \

						$(PG_CONFIGURE_OPTS) \

						--prefix=$(abspath $(POSTGRES_INSTALL_DIR))/$* > configure.log)

						$(PG_CONFIGURE_OPTS) --with-extra-version=" ($$EXTRA_VERSION)" \

						--prefix=$(abspath $(POSTGRES_INSTALL_DIR))/$$VERSION > configure.log)

				# nicer alias to run 'configure'

				# Note: I've been unable to use templates for this part of our configuration.

				# I'm not sure why it wouldn't work, but this is the only place (apart from

				# the "build-all-versions" entry points) where direct mention of PostgreSQL

				# versions is used.

				.PHONY: postgres-configure-v16

				postgres-configure-v16: $(POSTGRES_INSTALL_DIR)/build/v16/config.status

				.PHONY: postgres-configure-v15

				postgres-configure-v15: $(POSTGRES_INSTALL_DIR)/build/v15/config.status

				.PHONY: postgres-configure-v14

				@@ -108,6 +135,10 @@ postgres-%: postgres-configure-% \

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_buffercache install

					+@echo "Compiling pageinspect $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect install

					+@echo "Compiling amcheck $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/amcheck install

					+@echo "Compiling test_decoding $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/test_decoding install

				.PHONY: postgres-clean-%

				postgres-clean-%:

				@@ -116,6 +147,10 @@ postgres-clean-%:

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect clean

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/interfaces/libpq clean

				.PHONY: postgres-check-%

				postgres-check-%: postgres-%

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$* MAKELEVEL=0 check

				.PHONY: neon-pg-ext-%

				neon-pg-ext-%: postgres-%

					+@echo "Compiling neon $*"

				@@ -128,14 +163,24 @@ neon-pg-ext-%: postgres-%

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile install

					+@echo "Compiling neon_rmgr $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_rmgr/Makefile install

					+@echo "Compiling neon_test_utils $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile install

					+@echo "Compiling neon_utils $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-utils-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile install

				.PHONY: neon-pg-ext-clean-%

				neon-pg-ext-clean-%:

				.PHONY: neon-pg-clean-ext-%

				neon-pg-clean-ext-%:

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \

					-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \

					-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean

				@@ -145,36 +190,86 @@ neon-pg-ext-clean-%:

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \

					-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \

					-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \

					-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \

					-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile clean

				# Build walproposer as a static library. walproposer source code is located

				# in the pgxn/neon directory.

				#

				# We also need to include libpgport.a and libpgcommon.a, because walproposer

				# uses some functions from those libraries.

				#

				# Some object files are removed from libpgport.a and libpgcommon.a because

				# they depend on openssl and other libraries that are not included in our

				# Rust build.

				.PHONY: walproposer-lib

				walproposer-lib: neon-pg-ext-v16

					+@echo "Compiling walproposer-lib"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile walproposer-lib

					cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgport.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

					cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgcommon.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

				ifeq ($(UNAME_S),Linux)

					$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgport.a \

						pg_strong_random.o

					$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgcommon.a \

						pg_crc32c.o \

						hmac_openssl.o \

						cryptohash_openssl.o \

						scram-common.o \

						md5_common.o \

						checksum_helper.o

				endif

				.PHONY: walproposer-lib-clean

				walproposer-lib-clean:

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config \

						-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean

				.PHONY: neon-pg-ext

				neon-pg-ext: \

					neon-pg-ext-v14 \

					neon-pg-ext-v15

					neon-pg-ext-v15 \

					neon-pg-ext-v16

				.PHONY: neon-pg-ext-clean

				neon-pg-ext-clean: \

					neon-pg-ext-clean-v14 \

					neon-pg-ext-clean-v15

				.PHONY: neon-pg-clean-ext

				neon-pg-clean-ext: \

					neon-pg-clean-ext-v14 \

					neon-pg-clean-ext-v15 \

					neon-pg-clean-ext-v16

				# shorthand to build all Postgres versions

				.PHONY: postgres

				postgres: \

					postgres-v14 \

					postgres-v15

					postgres-v15 \

					postgres-v16

				.PHONY: postgres-headers

				postgres-headers: \

					postgres-headers-v14 \

					postgres-headers-v15

					postgres-headers-v15 \

					postgres-headers-v16

				.PHONY: postgres-clean

				postgres-clean: \

					postgres-clean-v14 \

					postgres-clean-v15

					postgres-clean-v15 \

					postgres-clean-v16

				.PHONY: postgres-check

				postgres-check: \

					postgres-check-v14 \

					postgres-check-v15 \

					postgres-check-v16

				# This doesn't remove the effects of 'configure'.

				.PHONY: clean

				clean: postgres-clean neon-pg-ext-clean

				clean: postgres-clean neon-pg-clean-ext

					$(CARGO_CMD_PREFIX) cargo clean

				# This removes everything

				@@ -187,6 +282,44 @@ distclean:

				fmt:

					./pre-commit.py --fix-inplace

				postgres-%-pg-bsd-indent: postgres-%

					+@echo "Compiling pg_bsd_indent"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/tools/pg_bsd_indent/

				# Create typedef list for the core. Note that generally it should be combined with

				# buildfarm one to cover platform specific stuff.

				# https://wiki.postgresql.org/wiki/Running_pgindent_on_non-core_code_or_development_code

				postgres-%-typedefs.list: postgres-%

					$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/find_typedef $(POSTGRES_INSTALL_DIR)/$*/bin > $@

				# Indent postgres. See src/tools/pgindent/README for details.

				.PHONY: postgres-%-pgindent

				postgres-%-pgindent: postgres-%-pg-bsd-indent postgres-%-typedefs.list

					+@echo merge with buildfarm typedef to cover all platforms

					+@echo note: I first tried to download from pgbuildfarm.org, but for unclear reason e.g. \

						REL_16_STABLE list misses PGSemaphoreData

					# wget -q -O - "http://www.pgbuildfarm.org/cgi-bin/typedefs.pl?branch=REL_16_STABLE" |\

					# cat - postgres-$*-typedefs.list | sort | uniq > postgres-$*-typedefs-full.list

					cat $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/typedefs.list |\

						cat - postgres-$*-typedefs.list | sort | uniq > postgres-$*-typedefs-full.list

					+@echo note: you might want to run it on selected files/dirs instead.

					INDENT=$(POSTGRES_INSTALL_DIR)/build/$*/src/tools/pg_bsd_indent/pg_bsd_indent \

						$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/pgindent --typedefs postgres-$*-typedefs-full.list \

						$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/ \

						--excludes $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/exclude_file_patterns

					rm -f pg*.BAK

				# Indent pxgn/neon.

				.PHONY: pgindent

				neon-pgindent: postgres-v16-pg-bsd-indent neon-pg-ext-v16

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						FIND_TYPEDEF=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/find_typedef \

						INDENT=$(POSTGRES_INSTALL_DIR)/build/v16/src/tools/pg_bsd_indent/pg_bsd_indent \

						PGINDENT_SCRIPT=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/pgindent/pgindent \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-v16 \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile pgindent

				.PHONY: setup-pre-commit-hook

				setup-pre-commit-hook:

					ln -s -f $(ROOT_PROJECT_DIR)/pre-commit.py .git/hooks/pre-commit
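
Note on the `postgres-%-pgindent` recipe: it merges the vendored `typedefs.list` with the locally generated list using `cat ... | sort | uniq`. A minimal sketch of the same merge, assuming only POSIX `sort`; the function name `merge_typedefs` and the sample file names are hypothetical:

```sh
# Hypothetical helper: merge two typedef lists into one sorted,
# de-duplicated list, equivalent to `cat a b | sort | uniq`.
merge_typedefs() {
    sort -u "$1" "$2"
}

# Small demonstration with throwaway files.
tmpdir=$(mktemp -d)
printf 'PGSemaphoreData\nXLogRecPtr\n' > "$tmpdir/core.list"
printf 'XLogRecPtr\nBufferTag\n' > "$tmpdir/local.list"
merge_typedefs "$tmpdir/core.list" "$tmpdir/local.list"
rm -rf "$tmpdir"
```

A name that appears in both inputs, such as `XLogRecPtr` here, ends up in the merged list only once.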

6

NOTICE


@@ -1,5 +1,5 @@
 Neon
 Copyright 2022 Neon Inc.
 Copyright 2022 - 2024 Neon Inc.
 The PostgreSQL submodules in vendor/postgres-v14 and vendor/postgres-v15 are licensed under the
 PostgreSQL license. See vendor/postgres-v14/COPYRIGHT and vendor/postgres-v15/COPYRIGHT.
 The PostgreSQL submodules in vendor/ are licensed under the PostgreSQL license.
 See vendor/postgres-vX/COPYRIGHT for details.

									
										138

README.md
									
												
				@@ -1,9 +1,13 @@

				[![Neon](https://github.com/neondatabase/neon/assets/11527560/f15a17f0-836e-40c5-b35d-030606a6b660)](https://neon.tech)

				# Neon

				Neon is a serverless open-source alternative to AWS Aurora Postgres. It separates storage and compute and substitutes the PostgreSQL storage layer by redistributing data across a cluster of nodes.

				## Quick start

				Try the [Neon Free Tier](https://neon.tech/docs/introduction/technical-preview-free-tier/) to create a serverless Postgres instance. Then connect to it with your preferred Postgres client (psql, dbeaver, etc) or use the online [SQL Editor](https://neon.tech/docs/get-started-with-neon/query-with-neon-sql-editor/). See [Connect from any application](https://neon.tech/docs/connect/connect-from-any-app/) for connection instructions.

				Try the [Neon Free Tier](https://neon.tech/github) to create a serverless Postgres instance. Then connect to it with your preferred Postgres client (psql, dbeaver, etc) or use the online [SQL Editor](https://neon.tech/docs/get-started-with-neon/query-with-neon-sql-editor/). See [Connect from any application](https://neon.tech/docs/connect/connect-from-any-app/) for connection instructions.

				Alternatively, compile and run the project [locally](#running-local-installation).

				@@ -12,10 +16,10 @@ Alternatively, compile and run the project [locally](#running-local-installation

				A Neon installation consists of compute nodes and the Neon storage engine. Compute nodes are stateless PostgreSQL nodes backed by the Neon storage engine.

				The Neon storage engine consists of two major components:

				- Pageserver. Scalable storage backend for the compute nodes.

				- Safekeepers. The safekeepers form a redundant WAL service that received WAL from the compute node, and stores it durably until it has been processed by the pageserver and uploaded to cloud storage.

				- Pageserver: Scalable storage backend for the compute nodes.

				- Safekeepers: The safekeepers form a redundant WAL service that received WAL from the compute node, and stores it durably until it has been processed by the pageserver and uploaded to cloud storage.

				See developer documentation in [/docs/SUMMARY.md](/docs/SUMMARY.md) for more information.

				See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more information.

				## Running local installation

				@@ -26,31 +30,38 @@ See developer documentation in [/docs/SUMMARY.md](/docs/SUMMARY.md) for more inf

				* On Ubuntu or Debian, this set of packages should be sufficient to build the code:

				```bash

				apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \

				libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler

				libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler \

				libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev

				```

				* On Fedora, these packages are needed:

				```bash

				dnf install flex bison readline-devel zlib-devel openssl-devel \

				  libseccomp-devel perl clang cmake postgresql postgresql-contrib protobuf-compiler \

				  protobuf-devel

				  protobuf-devel libcurl-devel openssl poetry lsof libicu-devel libpq-devel python3-devel \

				  libffi-devel

				```

				* On Arch based systems, these packages are needed:

				```bash

				pacman -S base-devel readline zlib libseccomp openssl clang \

				postgresql-libs cmake postgresql protobuf

				postgresql-libs cmake postgresql protobuf curl lsof

				```

				Building Neon requires 3.15+ version of `protoc` (protobuf-compiler). If your distribution provides an older version, you can install a newer version from [here](https://github.com/protocolbuffers/protobuf/releases).

				2. [Install Rust](https://www.rust-lang.org/tools/install)

				```

				# recommended approach from https://www.rust-lang.org/tools/install

				curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

				```

				#### Installing dependencies on OSX (12.3.1)

				#### Installing dependencies on macOS (12.3.1)

				1. Install XCode and dependencies

				```

				xcode-select --install

				brew install protobuf openssl flex bison

				brew install protobuf openssl flex bison icu4c pkg-config

				# add openssl to PATH, required for ed25519 keys generation in neon_local

				echo 'export PATH="$(brew --prefix openssl)/bin:$PATH"' >> ~/.zshrc

				```

				2. [Install Rust](https://www.rust-lang.org/tools/install)

				@@ -72,9 +83,9 @@ The project uses [rust toolchain file](./rust-toolchain.toml) to define the vers

				This file is automatically picked up by [`rustup`](https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file) that installs (if absent) and uses the toolchain version pinned in the file.

				rustup users who want to build with another toolchain can use [`rustup override`](https://rust-lang.github.io/rustup/overrides.html#directory-overrides) command to set a specific toolchain for the project's directory.

				rustup users who want to build with another toolchain can use the [`rustup override`](https://rust-lang.github.io/rustup/overrides.html#directory-overrides) command to set a specific toolchain for the project's directory.

				non-rustup users most probably are not getting the same toolchain automatically from the file, so are responsible to manually verify their toolchain matches the version in the file.

				non-rustup users most probably are not getting the same toolchain automatically from the file, so are responsible to manually verify that their toolchain matches the version in the file.

				Newer rustc versions most probably will work fine, yet older ones might not be supported due to some new features used by the project or the crates.

				#### Building on Linux

				@@ -115,7 +126,7 @@ make -j`sysctl -n hw.logicalcpu` -s

				To run the `psql` client, install the `postgresql-client` package or modify `PATH` and `LD_LIBRARY_PATH` to include `pg_install/bin` and `pg_install/lib`, respectively.

				To run the integration tests or Python scripts (not required to use the code), install

				Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (requires [poetry>=1.3](https://python-poetry.org/)) in the project directory.

				Python (3.9 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.3](https://python-poetry.org/)) in the project directory.

				#### Running neon database

				@@ -123,39 +134,41 @@ Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (r

				```sh

				# Create repository in .neon with proper paths to binaries and data

				# Later that would be responsibility of a package install script

				> ./target/debug/neon_local init

				Starting pageserver at '127.0.0.1:64000' in '.neon'.

				> cargo neon init

				Initializing pageserver node 1 at '127.0.0.1:64000' in ".neon"

				# start pageserver, safekeeper, and broker for their intercommunication

				> ./target/debug/neon_local start

				Starting neon broker at 127.0.0.1:50051

				> cargo neon start

				Starting neon broker at 127.0.0.1:50051.

				storage_broker started, pid: 2918372

				Starting pageserver at '127.0.0.1:64000' in '.neon'.

				Starting pageserver node 1 at '127.0.0.1:64000' in ".neon".

				pageserver started, pid: 2918386

				Starting safekeeper at '127.0.0.1:5454' in '.neon/safekeepers/sk1'.

				safekeeper 1 started, pid: 2918437

				# create initial tenant and use it as a default for every future neon_local invocation

				> ./target/debug/neon_local tenant create --set-default

				> cargo neon tenant create --set-default

				tenant 9ef87a5bf0d92544f6fafeeb3239695c successfully created on the pageserver

				Created an initial timeline 'de200bd42b49cc1814412c7e592dd6e9' at Lsn 0/16B5A50 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c

				Setting tenant 9ef87a5bf0d92544f6fafeeb3239695c as a default one

				# create postgres compute node

				> cargo neon endpoint create main

				# start postgres compute node

				> ./target/debug/neon_local pg start main

				Starting new postgres (v14) main on timeline de200bd42b49cc1814412c7e592dd6e9 ...

				Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/main port=55432

				Starting postgres node at 'host=127.0.0.1 port=55432 user=cloud_admin dbname=postgres'

				> cargo neon endpoint start main

				Starting new endpoint main (PostgreSQL v14) on timeline de200bd42b49cc1814412c7e592dd6e9 ...

				Starting postgres at 'postgresql://cloud_admin@127.0.0.1:55432/postgres'

				# check list of running postgres instances

				> ./target/debug/neon_local pg list

				 NODE  ADDRESS          TIMELINE                          BRANCH NAME  LSN        STATUS

				 main  127.0.0.1:55432  de200bd42b49cc1814412c7e592dd6e9  main         0/16B5BA8  running

				> cargo neon endpoint list

				 ENDPOINT  ADDRESS          TIMELINE                          BRANCH NAME  LSN        STATUS

				 main      127.0.0.1:55432  de200bd42b49cc1814412c7e592dd6e9  main         0/16B5BA8  running

				```

				2. Now, it is possible to connect to postgres and run some queries:

				```text

				> psql -p55432 -h 127.0.0.1 -U cloud_admin postgres

				> psql -p 55432 -h 127.0.0.1 -U cloud_admin postgres

				postgres=# CREATE TABLE t(key int primary key, value text);

				CREATE TABLE

				postgres=# insert into t values(1,1);

				@@ -170,29 +183,31 @@ postgres=# select * from t;

				3. And create branches and run postgres on them:

				```sh

				# create branch named migration_check

				> ./target/debug/neon_local timeline branch --branch-name migration_check

				> cargo neon timeline branch --branch-name migration_check

				Created timeline 'b3b863fa45fa9e57e615f9f2d944e601' at Lsn 0/16F9A00 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c. Ancestor timeline: 'main'

				# check branches tree

				> ./target/debug/neon_local timeline list

				> cargo neon timeline list

				(L) main [de200bd42b49cc1814412c7e592dd6e9]

				(L) ┗━ @0/16F9A00: migration_check [b3b863fa45fa9e57e615f9f2d944e601]

				# create postgres on that branch

				> cargo neon endpoint create migration_check --branch-name migration_check

				# start postgres on that branch

				> ./target/debug/neon_local pg start migration_check --branch-name migration_check

				Starting new postgres migration_check on timeline b3b863fa45fa9e57e615f9f2d944e601 ...

				Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/migration_check port=55433

				Starting postgres node at 'host=127.0.0.1 port=55433 user=cloud_admin dbname=postgres'

				> cargo neon endpoint start migration_check

				Starting new endpoint migration_check (PostgreSQL v14) on timeline b3b863fa45fa9e57e615f9f2d944e601 ...

				Starting postgres at 'postgresql://cloud_admin@127.0.0.1:55434/postgres'

				# check the new list of running postgres instances

				> ./target/debug/neon_local pg list

				 NODE             ADDRESS          TIMELINE                          BRANCH NAME      LSN        STATUS

				> cargo neon endpoint list

				 ENDPOINT         ADDRESS          TIMELINE                          BRANCH NAME      LSN        STATUS

				 main             127.0.0.1:55432  de200bd42b49cc1814412c7e592dd6e9  main             0/16F9A38  running

				 migration_check  127.0.0.1:55433  b3b863fa45fa9e57e615f9f2d944e601  migration_check  0/16F9A70  running

				 migration_check  127.0.0.1:55434  b3b863fa45fa9e57e615f9f2d944e601  migration_check  0/16F9A70  running

				# this new postgres instance will have all the data from 'main' postgres,

				# but all modifications would not affect data in original postgres

				> psql -p55433 -h 127.0.0.1 -U cloud_admin postgres

				> psql -p 55434 -h 127.0.0.1 -U cloud_admin postgres

				postgres=# select * from t;

				 key | value

				-----+-------

				@@ -203,7 +218,7 @@ postgres=# insert into t values(2,2);

				INSERT 0 1

				# check that the new change doesn't affect the 'main' postgres

				> psql -p55432 -h 127.0.0.1 -U cloud_admin postgres

				> psql -p 55432 -h 127.0.0.1 -U cloud_admin postgres

				postgres=# select * from t;

				 key | value

				-----+-------

				@@ -211,14 +226,28 @@ postgres=# select * from t;

				(1 row)

				```

				4. If you want to run tests afterward (see below), you must stop all the running of the pageserver, safekeeper, and postgres instances

				4. If you want to run tests afterwards (see below), you must stop all the running pageserver, safekeeper, and postgres instances

				   you have just started. You can terminate them all with one command:

				```sh

				> ./target/debug/neon_local stop

				> cargo neon stop

				```

				More advanced usages can be found at [Control Plane and Neon Local](./control_plane/README.md).

				#### Handling build failures

				If you encounter errors during setting up the initial tenant, it's best to stop everything (`cargo neon stop`) and remove the `.neon` directory. Then fix the problems, and start the setup again.

				## Running tests

				### Rust unit tests

				We are using [`cargo-nextest`](https://nexte.st/) to run the tests in Github Workflows.

				Some crates do not support running plain `cargo test` anymore, prefer `cargo nextest run` instead.

				You can install `cargo-nextest` with `cargo install cargo-nextest`.

				### Integration tests

				Ensure your dependencies are installed as described [here](https://github.com/neondatabase/neon#dependency-installation-notes).

				```sh

				@@ -229,11 +258,34 @@ CARGO_BUILD_FLAGS="--features=testing" make

				./scripts/pytest

				```

				By default, this runs both debug and release modes, and all supported postgres versions. When

				testing locally, it is convenient to run just one set of permutations, like this:

				```sh

				DEFAULT_PG_VERSION=15 BUILD_TYPE=release ./scripts/pytest

				```

				## Flamegraphs

				You may find yourself in need of flamegraphs for software in this repository.

				You can use [`flamegraph-rs`](https://github.com/flamegraph-rs/flamegraph) or the original [`flamegraph.pl`](https://github.com/brendangregg/FlameGraph). Your choice!

				>[!IMPORTANT]

				> If you're using `lld` or `mold`, you need the `--no-rosegment` linker argument.

				> It's a [general thing with Rust / lld / mold](https://crbug.com/919499#c16), not specific to this repository.

				> See [this PR for further instructions](https://github.com/neondatabase/neon/pull/6764).

				## Cleanup

				For cleaning up the source tree from build artifacts, run `make clean` in the source directory.

				For removing every artifact from build and configure steps, run `make distclean`, and also consider removing the cargo binaries in the `target` directory, as well as the database in the `.neon` directory. Note that removing the `.neon` directory will remove your database, with all data in it. You have been warned!

				## Documentation

				[/docs/](/docs/) Contains a top-level overview of all available markdown documentation.

				[docs](/docs) Contains a top-level overview of all available markdown documentation.

				- [/docs/sourcetree.md](/docs/sourcetree.md) contains overview of source tree layout.

				- [sourcetree.md](/docs/sourcetree.md) contains overview of source tree layout.

				To view your `rustdoc` documentation in a browser, try running `cargo doc --no-deps --open`

				@@ -258,6 +310,6 @@ To get more familiar with this aspect, refer to:

				## Join the development

				- Read `CONTRIBUTING.md` to learn about project code style and practices.

				- To get familiar with a source tree layout, use [/docs/sourcetree.md](/docs/sourcetree.md).

				- Read [CONTRIBUTING.md](/CONTRIBUTING.md) to learn about project code style and practices.

				- To get familiar with a source tree layout, use [sourcetree.md](/docs/sourcetree.md).

				- To learn more about PostgreSQL internals, check http://www.interdb.jp/pg/index.html

									
										14

clippy.toml
									
										Normal file
									
				@@ -0,0 +1,14 @@

				disallowed-methods = [

				    "tokio::task::block_in_place",

				    # Allow this for now, to deny it later once we stop using Handle::block_on completely

				    # "tokio::runtime::Handle::block_on",

				    # use tokio_epoll_uring_ext instead

				    "tokio_epoll_uring::thread_local_system",

				]

				disallowed-macros = [

				    # use std::pin::pin

				    "futures::pin_mut",

				    # cannot disallow this, because clippy finds uses from tokio macros

				    #"tokio::pin",

				]
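The `disallowed-macros` entry above points at `std::pin::pin` as the replacement for `futures::pin_mut`. A minimal sketch of that swap follows, using only the standard library; `NoopWaker` and `poll_once` are hypothetical helpers made up for this illustration and are not part of the repository.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing; it is enough to poll a future once.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: polls `fut` a single time and returns its output
/// if it completed on that first poll.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    // `pin!` pins the future on the stack. With the `futures` crate this
    // would have been `futures::pin_mut!(fut);`.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

Unlike `pin_mut!`, which rebinds an existing variable, `pin!` is an expression, so the pinned future can be bound with an ordinary `let`.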

									
										17

compute_tools/Cargo.toml
									
				@@ -6,24 +6,41 @@ license.workspace = true

				[dependencies]

				anyhow.workspace = true

				async-compression.workspace = true

				chrono.workspace = true

				cfg-if.workspace = true

				clap.workspace = true

				flate2.workspace = true

				futures.workspace = true

				hyper = { workspace = true, features = ["full"] }

				nix.workspace = true

				notify.workspace = true

				num_cpus.workspace = true

				opentelemetry.workspace = true

				postgres.workspace = true

				regex.workspace = true

				serde.workspace = true

				serde_json.workspace = true

				signal-hook.workspace = true

				tar.workspace = true

				reqwest = { workspace = true, features = ["json"] }

				tokio = { workspace = true, features = ["rt", "rt-multi-thread"] }

				tokio-postgres.workspace = true

				tokio-util.workspace = true

				tokio-stream.workspace = true

				tracing.workspace = true

				tracing-opentelemetry.workspace = true

				tracing-subscriber.workspace = true

				tracing-utils.workspace = true

				thiserror.workspace = true

				url.workspace = true

				compute_api.workspace = true

				utils.workspace = true

				workspace_hack.workspace = true

				toml_edit.workspace = true

				remote_storage = { version = "0.1", path = "../libs/remote_storage/" }

				vm_monitor = { version = "0.1", path = "../libs/vm_monitor/" }

				zstd = "0.13"

				bytes = "1.0"

				rust-ini = "0.20.0"

									
										30

compute_tools/README.md
									
				@@ -19,9 +19,10 @@ Also `compute_ctl` spawns two separate service threads:

				- `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the

				  last activity requests.

				If the `vm-informant` binary is present at `/bin/vm-informant`, it will also be started. For VM

				compute nodes, `vm-informant` communicates with the VM autoscaling system. It coordinates

				downscaling and (eventually) will request immediate upscaling under resource pressure.

				If `AUTOSCALING` environment variable is set, `compute_ctl` will start the

				`vm-monitor` located in [`neon/libs/vm_monitor`]. For VM compute nodes,

				`vm-monitor` communicates with the VM autoscaling system. It coordinates

				downscaling and requests immediate upscaling under resource pressure.

				Usage example:

				```sh

				@@ -31,6 +32,29 @@ compute_ctl -D /var/db/postgres/compute \

				            -b /usr/local/bin/postgres

				```

				## State Diagram

				Computes can be in various states. Below is a diagram that details how a

				compute moves between states.

				```mermaid

				%% https://mermaid.js.org/syntax/stateDiagram.html

				stateDiagram-v2

				  [*] --> Empty : Compute spawned

				  Empty --> ConfigurationPending : Waiting for compute spec

				  ConfigurationPending --> Configuration : Received compute spec

				  Configuration --> Failed : Failed to configure the compute

				  Configuration --> Running : Compute has been configured

				  Empty --> Init : Compute spec is immediately available

				  Empty --> TerminationPending : Requested termination

				  Init --> Failed : Failed to start Postgres

				  Init --> Running : Started Postgres

				  Running --> TerminationPending : Requested termination

				  TerminationPending --> Terminated : Terminated compute

				  Failed --> [*] : Compute exited

				  Terminated --> [*] : Compute exited

				```
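The edges of the diagram above can also be written down as a transition check. This is only a sketch: the `State` enum and `is_allowed` function are hypothetical names for illustration, and the real status type (`compute_api::responses::ComputeStatus`) may have different variants and does not necessarily enforce transitions this way.

```rust
/// Hypothetical mirror of the states named in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Empty,
    ConfigurationPending,
    Configuration,
    Init,
    Running,
    Failed,
    TerminationPending,
    Terminated,
}

/// Returns true if the diagram contains an edge `from --> to`.
fn is_allowed(from: State, to: State) -> bool {
    use State::*;
    matches!(
        (from, to),
        (Empty, ConfigurationPending)
            | (ConfigurationPending, Configuration)
            | (Configuration, Failed)
            | (Configuration, Running)
            | (Empty, Init)
            | (Empty, TerminationPending)
            | (Init, Failed)
            | (Init, Running)
            | (Running, TerminationPending)
            | (TerminationPending, Terminated)
    )
}
```

For example, `is_allowed(State::Empty, State::Init)` holds because a compute whose spec is immediately available skips the configuration states, while `is_allowed(State::Running, State::Empty)` does not, since the diagram has no edge back to `Empty`.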

				## Tests

				Cargo formatter:

									
										716

compute_tools/src/bin/compute_ctl.rs
									
				@@ -5,6 +5,8 @@

				//! - `compute_ctl` accepts cluster (compute node) specification as a JSON file.

				//! - Every start is a fresh start, so the data directory is removed and

				//!   initialized again on each run.

				//! - If remote_extension_config is provided, it will be used to fetch extensions list

				//!  and download `shared_preload_libraries` from the remote storage.

				//! - Next it will put configuration files into the `PGDATA` directory.

				//! - Sync safekeepers and get commit LSN.

				//! - Get `basebackup` from pageserver using the LSN returned by the previous step.

				@@ -18,196 +20,635 @@

				//! - `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the

				//!   last activity requests.

				//!

				//! If the `vm-informant` binary is present at `/bin/vm-informant`, it will also be started. For VM

				//! compute nodes, `vm-informant` communicates with the VM autoscaling system. It coordinates

				//! downscaling and (eventually) will request immediate upscaling under resource pressure.

				//! If `AUTOSCALING` environment variable is set, `compute_ctl` will start the

				//! `vm-monitor` located in [`neon/libs/vm_monitor`]. For VM compute nodes,

				//! `vm-monitor` communicates with the VM autoscaling system. It coordinates

				//! downscaling and requests immediate upscaling under resource pressure.

				//!

				//! Usage example:

				//! ```sh

				//! compute_ctl -D /var/db/postgres/compute \

				//!             -C 'postgresql://cloud_admin@localhost/postgres' \

				//!             -S /var/db/postgres/specs/current.json \

				//!             -b /usr/local/bin/postgres

				//!             -b /usr/local/bin/postgres \

				//!             -r http://pg-ext-s3-gateway \

				//! ```

				//!

				use std::collections::HashMap;

				use std::fs::File;

				use std::panic;

				use std::path::Path;

				use std::process::exit;

				use std::sync::{Arc, RwLock};

				use std::sync::atomic::Ordering;

				use std::sync::{mpsc, Arc, Condvar, Mutex, RwLock};

				use std::{thread, time::Duration};

				use anyhow::{Context, Result};

				use chrono::Utc;

				use clap::Arg;

				use tracing::{error, info};

				use signal_hook::consts::{SIGQUIT, SIGTERM};

				use signal_hook::{consts::SIGINT, iterator::Signals};

				use tracing::{error, info, warn};

				use url::Url;

				use compute_tools::compute::{ComputeMetrics, ComputeNode, ComputeState, ComputeStatus};

				use compute_api::responses::ComputeStatus;

				use compute_api::spec::ComputeSpec;

				use compute_tools::compute::{

				    forward_termination_signal, ComputeNode, ComputeState, ParsedSpec, PG_PID,

				};

				use compute_tools::configurator::launch_configurator;

				use compute_tools::extension_server::get_pg_version;

				use compute_tools::http::api::launch_http_server;

				use compute_tools::logger::*;

				use compute_tools::monitor::launch_monitor;

				use compute_tools::params::*;

				use compute_tools::pg_helpers::*;

				use compute_tools::spec::*;

				use url::Url;

				use compute_tools::swap::resize_swap;

				// this is an arbitrary build tag. Fine as a default / for testing purposes

				// in-case of not-set environment var

				const BUILD_TAG_DEFAULT: &str = "latest";

				fn main() -> Result<()> {

				    let (build_tag, clap_args) = init()?;

				    let (pg_handle, start_pg_result) = {

				        // Enter startup tracing context

				        let _startup_context_guard = startup_context_from_env();

				        let cli_args = process_cli(&clap_args)?;

				        let cli_spec = try_spec_from_cli(&clap_args, &cli_args)?;

				        let wait_spec_result = wait_spec(build_tag, cli_args, cli_spec)?;

				        start_postgres(&clap_args, wait_spec_result)?

				        // Startup is finished, exit the startup tracing span

				    };

				    // PostgreSQL is now running, if startup was successful. Wait until it exits.

				    let wait_pg_result = wait_postgres(pg_handle)?;

				    let delay_exit = cleanup_after_postgres_exit(start_pg_result)?;

				    maybe_delay_exit(delay_exit);

				    deinit_and_exit(wait_pg_result);

				}

				fn init() -> Result<(String, clap::ArgMatches)> {

				    init_tracing_and_logging(DEFAULT_LOG_LEVEL)?;

				    let matches = cli().get_matches();

				    let mut signals = Signals::new([SIGINT, SIGTERM, SIGQUIT])?;

				    thread::spawn(move || {

				        for sig in signals.forever() {

				            handle_exit_signal(sig);

				        }

				    });

				    let build_tag = option_env!("BUILD_TAG")

				        .unwrap_or(BUILD_TAG_DEFAULT)

				        .to_string();

				    info!("build_tag: {build_tag}");

				    Ok((build_tag, cli().get_matches()))

				}

				fn process_cli(matches: &clap::ArgMatches) -> Result<ProcessCliResult> {

				    let pgbin_default = "postgres";

				    let pgbin = matches

				        .get_one::<String>("pgbin")

				        .map(|s| s.as_str())

				        .unwrap_or(pgbin_default);

				    let ext_remote_storage = matches

				        .get_one::<String>("remote-ext-config")

				        // Compatibility hack: if the control plane specified any remote-ext-config

				        // use the default value for extension storage proxy gateway.

				        // Remove this once the control plane is updated to pass the gateway URL

				        .map(|conf| {

				            if conf.starts_with("http") {

				                conf.trim_end_matches('/')

				            } else {

				                "http://pg-ext-s3-gateway"

				            }

				        });

				    let http_port = *matches

				        .get_one::<u16>("http-port")

				        .expect("http-port is required");

				    let pgdata = matches

				        .get_one::<String>("pgdata")

				        .expect("PGDATA path is required");

				    let connstr = matches

				        .get_one::<String>("connstr")

				        .expect("Postgres connection string is required");

				    let spec = matches.get_one::<String>("spec");

				    let spec_json = matches.get_one::<String>("spec");

				    let spec_path = matches.get_one::<String>("spec-path");

				    let resize_swap_on_bind = matches.get_flag("resize-swap-on-bind");

				    let compute_id = matches.get_one::<String>("compute-id");

				    let control_plane_uri = matches.get_one::<String>("control-plane-uri");

				    Ok(ProcessCliResult {

				        connstr,

				        pgdata,

				        pgbin,

				        ext_remote_storage,

				        http_port,

				        spec_json,

				        spec_path,

				        resize_swap_on_bind,

				    })

				}

				    // Try to use just 'postgres' if no path is provided

				    let pgbin = matches.get_one::<String>("pgbin").unwrap();

				struct ProcessCliResult<'clap> {

				    connstr: &'clap str,

				    pgdata: &'clap str,

				    pgbin: &'clap str,

				    ext_remote_storage: Option<&'clap str>,

				    http_port: u16,

				    spec_json: Option<&'clap String>,

				    spec_path: Option<&'clap String>,

				    resize_swap_on_bind: bool,

				}

				    let spec: ComputeSpec = match spec {

				        // First, try to get cluster spec from the cli argument

				        Some(json) => serde_json::from_str(json)?,

				        None => {

				            // Second, try to read it from the file if path is provided

				            if let Some(sp) = spec_path {

				                let path = Path::new(sp);

				                let file = File::open(path)?;

				                serde_json::from_reader(file)?

				            } else if let Some(id) = compute_id {

				                if let Some(cp_base) = control_plane_uri {

				                    let cp_uri = format!("{cp_base}/management/api/v1/{id}/spec");

				                    let jwt: String = match std::env::var("NEON_CONSOLE_JWT") {

				                        Ok(v) => v,

				                        Err(_) => "".to_string(),

				                    };

				                    reqwest::blocking::Client::new()

				                        .get(cp_uri)

				                        .header("Authorization", jwt)

				                        .send()?

				                        .json()?

				                } else {

				                    panic!(

				                        "must specify --control-plane-uri \"{:#?}\" and --compute-id \"{:#?}\"",

				                        control_plane_uri, compute_id

				                    );

				                }

				            } else {

				                panic!("compute spec should be provided via --spec or --spec-path argument");

				            }

				        }

				    };

				    // Extract OpenTelemetry context for the startup actions from the spec, and

				    // attach it to the current tracing context.

				fn startup_context_from_env() -> Option<opentelemetry::ContextGuard> {

				    // Extract OpenTelemetry context for the startup actions from the

				    // TRACEPARENT and TRACESTATE env variables, and attach it to the current

				    // tracing context.

				    //

				    // This is used to propagate the context for the 'start_compute' operation

				    // from the neon control plane. This allows linking together the wider

				    // 'start_compute' operation that creates the compute container, with the

				    // startup actions here within the container.

				    //

				    // There is no standard for passing context in env variables, but a lot of

				    // tools use TRACEPARENT/TRACESTATE, so we use that convention too. See

				    // https://github.com/open-telemetry/opentelemetry-specification/issues/740

				    //

				    // Switch to the startup context here, and exit it once the startup has

				    // completed and Postgres is up and running.

				    //

				    // If this pod is pre-created without binding it to any particular endpoint

				    // yet, this isn't the right place to enter the startup context. In that

				    // case, the control plane should pass the tracing context as part of the

				    // /configure API call.

				    //

				    // NOTE: This is supposed to only cover the *startup* actions. Once

				    // postgres is configured and up-and-running, we exit this span. Any other

				    // actions that are performed on incoming HTTP requests, for example, are

				    // performed in separate spans.

				    let startup_context_guard = if let Some(ref carrier) = spec.startup_tracing_context {

				    //

				    // XXX: If the pod is restarted, we perform the startup actions in the same

				    // context as the original startup actions, which probably doesn't make

				    // sense.

				    let mut startup_tracing_carrier: HashMap<String, String> = HashMap::new();

				    if let Ok(val) = std::env::var("TRACEPARENT") {

				        startup_tracing_carrier.insert("traceparent".to_string(), val);

				    }

				    if let Ok(val) = std::env::var("TRACESTATE") {

				        startup_tracing_carrier.insert("tracestate".to_string(), val);

				    }

				    if !startup_tracing_carrier.is_empty() {

				        use opentelemetry::propagation::TextMapPropagator;

				        use opentelemetry::sdk::propagation::TraceContextPropagator;

				        Some(TraceContextPropagator::new().extract(carrier).attach())

				        let guard = TraceContextPropagator::new()

				            .extract(&startup_tracing_carrier)

				            .attach();

				        info!("startup tracing context attached");

				        Some(guard)

				    } else {

				        None

				    };

				    }

				}

				    let pageserver_connstr = spec

				        .cluster

				        .settings

				        .find("neon.pageserver_connstring")

				        .expect("pageserver connstr should be provided");

				    let tenant = spec

				        .cluster

				        .settings

				        .find("neon.tenant_id")

				        .expect("tenant id should be provided");

				    let timeline = spec

				        .cluster

				        .settings

				        .find("neon.timeline_id")

				        .expect("tenant id should be provided");

				fn try_spec_from_cli(

				    matches: &clap::ArgMatches,

				    ProcessCliResult {

				        spec_json,

				        spec_path,

				        ..

				    }: &ProcessCliResult,

				) -> Result<CliSpecParams> {

				    let compute_id = matches.get_one::<String>("compute-id");

				    let control_plane_uri = matches.get_one::<String>("control-plane-uri");

				    let compute_state = ComputeNode {

				        start_time: Utc::now(),

				        connstr: Url::parse(connstr).context("cannot parse connstr as a URL")?,

				        pgdata: pgdata.to_string(),

				        pgbin: pgbin.to_string(),

				        spec,

				        tenant,

				        timeline,

				        pageserver_connstr,

				        metrics: ComputeMetrics::default(),

				        state: RwLock::new(ComputeState::new()),

				    };

				    let compute = Arc::new(compute_state);

				    // Launch service threads first, so we were able to serve availability

				    // requests, while configuration is still in progress.

				    let _http_handle = launch_http_server(&compute).expect("cannot launch http endpoint thread");

				    let _monitor_handle = launch_monitor(&compute).expect("cannot launch compute monitor thread");

				    // Start Postgres

				    let mut delay_exit = false;

				    let mut exit_code = None;

				    let pg = match compute.start_compute() {

				        Ok(pg) => Some(pg),

				        Err(err) => {

				            error!("could not start the compute node: {:?}", err);

				            let mut state = compute.state.write().unwrap();

				            state.error = Some(format!("{:?}", err));

				            state.status = ComputeStatus::Failed;

				            drop(state);

				            delay_exit = true;

				            None

				    let spec;

				    let mut live_config_allowed = false;

				    match spec_json {

				        // First, try to get cluster spec from the cli argument

				        Some(json) => {

				            info!("got spec from cli argument {}", json);

				            spec = Some(serde_json::from_str(json)?);

				        }

				        None => {

				            // Second, try to read it from the file if path is provided

				            if let Some(sp) = spec_path {

				                let path = Path::new(sp);

				                let file = File::open(path)?;

				                spec = Some(serde_json::from_reader(file)?);

				                live_config_allowed = true;

				            } else if let Some(id) = compute_id {

				                if let Some(cp_base) = control_plane_uri {

				                    live_config_allowed = true;

				                    spec = match get_spec_from_control_plane(cp_base, id) {

				                        Ok(s) => s,

				                        Err(e) => {

				                            error!("cannot get response from control plane: {}", e);

				                            panic!("neither spec nor confirmation that compute is in the Empty state was received");

				                        }

				                    };

				                } else {

				                    panic!("must specify both --control-plane-uri and --compute-id or none");

				                }

				            } else {

				                panic!(

				                    "compute spec should be provided by one of the following ways: \

				                    --spec OR --spec-path OR --control-plane-uri and --compute-id"

				                );

				            }

				        }

				    };

				    Ok(CliSpecParams {

				        spec,

				        live_config_allowed,

				    })

				}

				struct CliSpecParams {

				    /// If a spec was provided via CLI or file, the [`ComputeSpec`]

				    spec: Option<ComputeSpec>,

				    live_config_allowed: bool,

				}

				fn wait_spec(

				    build_tag: String,

				    ProcessCliResult {

				        connstr,

				        pgdata,

				        pgbin,

				        ext_remote_storage,

				        resize_swap_on_bind,

				        http_port,

				        ..

				    }: ProcessCliResult,

				    CliSpecParams {

				        spec,

				        live_config_allowed,

				    }: CliSpecParams,

				) -> Result<WaitSpecResult> {

				    let mut new_state = ComputeState::new();

				    let spec_set;

				    if let Some(spec) = spec {

				        let pspec = ParsedSpec::try_from(spec).map_err(|msg| anyhow::anyhow!(msg))?;

				        info!("new pspec.spec: {:?}", pspec.spec);

				        new_state.pspec = Some(pspec);

				        spec_set = true;

				    } else {

				        spec_set = false;

				    }

				    let compute_node = ComputeNode {

				        connstr: Url::parse(connstr).context("cannot parse connstr as a URL")?,

				        pgdata: pgdata.to_string(),

				        pgbin: pgbin.to_string(),

				        pgversion: get_pg_version(pgbin),

				        live_config_allowed,

				        state: Mutex::new(new_state),

				        state_changed: Condvar::new(),

				        ext_remote_storage: ext_remote_storage.map(|s| s.to_string()),

				        ext_download_progress: RwLock::new(HashMap::new()),

				        build_tag,

				    };

				    let compute = Arc::new(compute_node);

				    // If this is a pooled VM, prewarm before starting HTTP server and becoming

				    // available for binding. Prewarming helps Postgres start quicker later,

				    // because QEMU will already have its memory allocated from the host, and

				    // the necessary binaries will already be cached.

				    if !spec_set {

				        compute.prewarm_postgres()?;

				    }

				    // Launch http service first, so that we can serve control-plane requests

				    // while configuration is still in progress.

				    let _http_handle =

				        launch_http_server(http_port, &compute).expect("cannot launch http endpoint thread");

				    if !spec_set {

				        // No spec provided, hang waiting for it.

				        info!("no compute spec provided, waiting");

				        let mut state = compute.state.lock().unwrap();

				        while state.status != ComputeStatus::ConfigurationPending {

				            state = compute.state_changed.wait(state).unwrap();

				            if state.status == ComputeStatus::ConfigurationPending {

				                info!("got spec, continue configuration");

				                // Spec is already set by the http server handler.

				                break;

				            }

				        }

				        // Record for how long we slept waiting for the spec.

				        let now = Utc::now();

				        state.metrics.wait_for_spec_ms = now

				            .signed_duration_since(state.start_time)

				            .to_std()

				            .unwrap()

				            .as_millis() as u64;

				        // Reset start time, so that the total startup time that is calculated later will

				        // not include the time that we waited for the spec.

				        state.start_time = now;

				    }

				    Ok(WaitSpecResult {

				        compute,

				        http_port,

				        resize_swap_on_bind,

				    })

				}

				struct WaitSpecResult {

				    compute: Arc<ComputeNode>,

				    // passed through from ProcessCliResult

				    http_port: u16,

				    resize_swap_on_bind: bool,

				}

				fn start_postgres(

				    // need to allow unused because `matches` is only used if target_os = "linux"

				    #[allow(unused_variables)] matches: &clap::ArgMatches,

				    WaitSpecResult {

				        compute,

				        http_port,

				        resize_swap_on_bind,

				    }: WaitSpecResult,

				) -> Result<(Option<PostgresHandle>, StartPostgresResult)> {

				    // We got all we need, update the state.

				    let mut state = compute.state.lock().unwrap();

				    state.status = ComputeStatus::Init;

				    compute.state_changed.notify_all();

				    info!(

				        "running compute with features: {:?}",

				        state.pspec.as_ref().unwrap().spec.features

				    );

				    // before we release the mutex, fetch the swap size (if any) for later.

				    let swap_size_bytes = state.pspec.as_ref().unwrap().spec.swap_size_bytes;

				    drop(state);

				    // Launch remaining service threads

				    let _monitor_handle = launch_monitor(&compute);

				    let _configurator_handle = launch_configurator(&compute);

				    let mut prestartup_failed = false;

				    let mut delay_exit = false;

				    // Resize swap to the desired size if the compute spec says so

				    if let (Some(size_bytes), true) = (swap_size_bytes, resize_swap_on_bind) {

				        // To avoid 'swapoff' hitting postgres startup, we need to run resize-swap to completion

				        // *before* starting postgres.

				        //

				        // In theory, we could do this asynchronously if SkipSwapon was enabled for VMs, but this

				        // carries a risk of introducing hard-to-debug issues - e.g. if postgres sometimes gets

				        // OOM-killed during startup because swap wasn't available yet.

				        match resize_swap(size_bytes) {

				            Ok(()) => {

				                let size_mib = size_bytes as f32 / (1 << 20) as f32; // just for more coherent display.

				                info!(%size_bytes, %size_mib, "resized swap");

				            }

				            Err(err) => {

				                let err = err.context("failed to resize swap");

				                error!("{err:#}");

				                // Mark compute startup as failed; don't try to start postgres, and report this

				                // error to the control plane when it next asks.

				                prestartup_failed = true;

				                let mut state = compute.state.lock().unwrap();

				                state.error = Some(format!("{err:?}"));

				                state.status = ComputeStatus::Failed;

				                compute.state_changed.notify_all();

				                delay_exit = true;

				            }

				        }

				    }

				    let extension_server_port: u16 = http_port;

				    // Start Postgres

				    let mut pg = None;

				    if !prestartup_failed {

				        pg = match compute.start_compute(extension_server_port) {

				            Ok(pg) => Some(pg),

				            Err(err) => {

				                error!("could not start the compute node: {:#}", err);

				                let mut state = compute.state.lock().unwrap();

				                state.error = Some(format!("{:?}", err));

				                state.status = ComputeStatus::Failed;

				                // Notify others that Postgres failed to start. In case of configuring the

				                // empty compute, it's likely that API handler is still waiting for compute

				                // state change. With this we will notify it that compute is in Failed state,

				                // so control plane will know about it earlier and record proper error instead

				                // of timeout.

				                compute.state_changed.notify_all();

				                drop(state); // unlock

				                delay_exit = true;

				                None

				            }

				        };

				    } else {

				        warn!("skipping postgres startup because pre-startup step failed");

				    }

				    // Start the vm-monitor if directed to. The vm-monitor only runs on linux

				    // because it requires cgroups.

				    cfg_if::cfg_if! {

				        if #[cfg(target_os = "linux")] {

				            use std::env;

				            use tokio_util::sync::CancellationToken;

				            let vm_monitor_addr = matches

				                .get_one::<String>("vm-monitor-addr")

				                .expect("--vm-monitor-addr should always be set because it has a default arg");

				            let file_cache_connstr = matches.get_one::<String>("filecache-connstr");

				            let cgroup = matches.get_one::<String>("cgroup");

				            // Only make a runtime if we need to.

				            // Note: it seems like you can make a runtime in an inner scope and

				            // if you start a task in it it won't be dropped. However, make it

				            // in the outermost scope just to be safe.

				            let rt = if env::var_os("AUTOSCALING").is_some() {

				                Some(

				                    tokio::runtime::Builder::new_multi_thread()

				                        .worker_threads(4)

				                        .enable_all()

				                        .build()

				                        .expect("failed to create tokio runtime for monitor")

				                )

				            } else {

				                None

				            };

				            // This token is used internally by the monitor to clean up all threads

				            let token = CancellationToken::new();

				            let vm_monitor = rt.as_ref().map(|rt| {

				                rt.spawn(vm_monitor::start(

				                    Box::leak(Box::new(vm_monitor::Args {

				                        cgroup: cgroup.cloned(),

				                        pgconnstr: file_cache_connstr.cloned(),

				                        addr: vm_monitor_addr.clone(),

				                    })),

				                    token.clone(),

				                ))

				            });

				        }

				    }

				    Ok((

				        pg,

				        StartPostgresResult {

				            delay_exit,

				            compute,

				            #[cfg(target_os = "linux")]

				            rt,

				            #[cfg(target_os = "linux")]

				            token,

				            #[cfg(target_os = "linux")]

				            vm_monitor,

				        },

				    ))

				}

				type PostgresHandle = (std::process::Child, std::thread::JoinHandle<()>);

				struct StartPostgresResult {

				    delay_exit: bool,

				    // passed through from WaitSpecResult

				    compute: Arc<ComputeNode>,

				    #[cfg(target_os = "linux")]

				    rt: Option<tokio::runtime::Runtime>,

				    #[cfg(target_os = "linux")]

				    token: tokio_util::sync::CancellationToken,

				    #[cfg(target_os = "linux")]

				    vm_monitor: Option<tokio::task::JoinHandle<Result<()>>>,

				}

				fn wait_postgres(pg: Option<PostgresHandle>) -> Result<WaitPostgresResult> {

				    // Wait for the child Postgres process forever. In this state Ctrl+C will

				    // propagate to Postgres and it will be shut down as well.

				    if let Some(mut pg) = pg {

				        // Startup is finished, exit the startup tracing span

				        drop(startup_context_guard);

				    let mut exit_code = None;

				    if let Some((mut pg, logs_handle)) = pg {

				        let ecode = pg

				            .wait()

				            .expect("failed to start waiting on Postgres process");

				        PG_PID.store(0, Ordering::SeqCst);

				        // Process has exited, so we can join the logs thread.

				        let _ = logs_handle

				            .join()

				            .map_err(|e| tracing::error!("log thread panicked: {:?}", e));

				        info!("Postgres exited with code {}, shutting down", ecode);

				        exit_code = ecode.code()

				    }

				    Ok(WaitPostgresResult { exit_code })

				}

				struct WaitPostgresResult {

				    exit_code: Option<i32>,

				}

				fn cleanup_after_postgres_exit(

				    StartPostgresResult {

				        mut delay_exit,

				        compute,

				        #[cfg(target_os = "linux")]

				        vm_monitor,

				        #[cfg(target_os = "linux")]

				        token,

				        #[cfg(target_os = "linux")]

				        rt,

				    }: StartPostgresResult,

				) -> Result<bool> {

				    // Terminate the vm_monitor so it releases the file watcher on

				    // /sys/fs/cgroup/neon-postgres.

				    // Note: the vm-monitor only runs on linux because it requires cgroups.

				    cfg_if::cfg_if! {

				        if #[cfg(target_os = "linux")] {

				            if let Some(handle) = vm_monitor {

				                // Kills all threads spawned by the monitor

				                token.cancel();

				                // Kills the actual task running the monitor

				                handle.abort();

				                // If handle is some, rt must have been used to produce it, and

				                // hence is also some

				                rt.unwrap().shutdown_timeout(Duration::from_secs(2));

				            }

				        }

				    }

				    // Maybe sync safekeepers again, to speed up next startup

				    let compute_state = compute.state.lock().unwrap().clone();

				    let pspec = compute_state.pspec.as_ref().expect("spec must be set");

				    if matches!(pspec.spec.mode, compute_api::spec::ComputeMode::Primary) {

				        info!("syncing safekeepers on shutdown");

				        let storage_auth_token = pspec.storage_auth_token.clone();

				        let lsn = compute.sync_safekeepers(storage_auth_token)?;

				        info!("synced safekeepers at lsn {lsn}");

				    }

				    let mut state = compute.state.lock().unwrap();

				    if state.status == ComputeStatus::TerminationPending {

				        state.status = ComputeStatus::Terminated;

				        compute.state_changed.notify_all();

				        // we were asked to terminate gracefully, don't exit to avoid restart

				        delay_exit = true

				    }

				    drop(state);

				    if let Err(err) = compute.check_for_core_dumps() {

				        error!("error while checking for core dumps: {err:?}");

				    }

				    Ok(delay_exit)

				}

				fn maybe_delay_exit(delay_exit: bool) {

				    // If launch failed, keep serving HTTP requests for a while, so the cloud

				    // control plane can get the actual error.

				    if delay_exit {

				        info!("giving control plane 30s to collect the error before shutdown");

				        thread::sleep(Duration::from_secs(30));

				        info!("shutting down");

				    }

				}

				fn deinit_and_exit(WaitPostgresResult { exit_code }: WaitPostgresResult) -> ! {

				    // Shutdown trace pipeline gracefully, so that it has a chance to send any

				    // pending traces before we exit. Shutting down OTEL tracing provider may

				    // hang for quite some time, see, for example:

				    // - https://github.com/open-telemetry/opentelemetry-rust/issues/868

				    // - and our problems with staging https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636

				    //

				    // Yet, we want computes to shut down fast enough, as we may need a new one

				    // for the same timeline ASAP. So wait no longer than 2s for the shutdown to

				    // complete, then just error out and exit the main thread.

				    info!("shutting down tracing");

				    let (sender, receiver) = mpsc::channel();

				    let _ = thread::spawn(move || {

				        tracing_utils::shutdown_tracing();

				        sender.send(()).ok()

				    });

				    let shutdown_res = receiver.recv_timeout(Duration::from_millis(2000));

				    if shutdown_res.is_err() {

				        error!("timed out while shutting down tracing, exiting anyway");

				    }

				    // Shutdown trace pipeline gracefully, so that it has a chance to send any

				    // pending traces before we exit.

				    tracing_utils::shutdown_tracing();

				    info!("shutting down");

				    exit(exit_code.unwrap_or(1))

				}
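The bounded wait used for tracing shutdown above can be sketched on its own with only the standard library. This is a minimal illustration, not the code from the diff: `run_with_timeout` is a hypothetical helper name, and the closure stands in for `tracing_utils::shutdown_tracing()`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a helper thread and waits at most `timeout` for it.
/// Returns true if the work finished in time, false otherwise.
/// `run_with_timeout` is a hypothetical name used only for this sketch.
pub fn run_with_timeout<F>(work: F, timeout: Duration) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    // The helper thread is detached: if `work` hangs, the thread is abandoned
    // and the caller moves on once the timeout expires.
    thread::spawn(move || {
        work();
        // After a timeout the receiver may already be gone; ignore that error.
        let _ = sender.send(());
    });
    // A timeout and a panicking worker (disconnected channel) both yield false.
    receiver.recv_timeout(timeout).is_ok()
}
```

A caller would log an error when the function returns false and then exit anyway, which matches the intent stated in the comment: never block process exit on a slow tracing provider.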

				@@ -216,6 +657,14 @@ fn cli() -> clap::Command {

				    let version = option_env!("CARGO_PKG_VERSION").unwrap_or("unknown");

				    clap::Command::new("compute_ctl")

				        .version(version)

				        .arg(

				            Arg::new("http-port")

				                .long("http-port")

				                .value_name("HTTP_PORT")

				                .default_value("3080")

				                .value_parser(clap::value_parser!(u16))

				                .required(false),

				        )

				        .arg(

				            Arg::new("connstr")

				                .short('C')

				@@ -259,8 +708,51 @@ fn cli() -> clap::Command {

				            Arg::new("control-plane-uri")

				                .short('p')

				                .long("control-plane-uri")

				                .value_name("CONTROL_PLANE"),

				                .value_name("CONTROL_PLANE_API_BASE_URI"),

				        )

				        .arg(

				            Arg::new("remote-ext-config")

				                .short('r')

				                .long("remote-ext-config")

				                .value_name("REMOTE_EXT_CONFIG"),

				        )

				        // TODO(fprasx): we currently have default arguments because the cloud PR

				        // to pass them in hasn't been merged yet. We should get rid of them once

				        // the PR is merged.

				        .arg(

				            Arg::new("vm-monitor-addr")

				                .long("vm-monitor-addr")

				                .default_value("0.0.0.0:10301")

				                .value_name("VM_MONITOR_ADDR"),

				        )

				        .arg(

				            Arg::new("cgroup")

				                .long("cgroup")

				                .default_value("neon-postgres")

				                .value_name("CGROUP"),

				        )

				        .arg(

				            Arg::new("filecache-connstr")

				                .long("filecache-connstr")

				                .default_value(

				                    "host=localhost port=5432 dbname=postgres user=cloud_admin sslmode=disable application_name=vm-monitor",

				                )

				                .value_name("FILECACHE_CONNSTR"),

				        )

				        .arg(

				            Arg::new("resize-swap-on-bind")

				                .long("resize-swap-on-bind")

				                .action(clap::ArgAction::SetTrue),

				        )

				}

				/// When compute_ctl is killed, also send a termination signal to sync-safekeepers

				/// to prevent leakage. TODO: it would be better to convert compute_ctl to async and

				/// wait for termination, which would then be easy.

				fn handle_exit_signal(sig: i32) {

				    info!("received {sig} termination signal");

				    forward_termination_signal();

				    exit(1);

				}

				#[test]

									
										116

compute_tools/src/catalog.rs
									
										Normal file
									
												
				@@ -0,0 +1,116 @@

				use compute_api::{

				    responses::CatalogObjects,

				    spec::{Database, Role},

				};

				use futures::Stream;

				use postgres::{Client, NoTls};

				use std::{path::Path, process::Stdio, result::Result, sync::Arc};

				use tokio::{

				    io::{AsyncBufReadExt, BufReader},

				    process::Command,

				    task,

				};

				use tokio_stream::{self as stream, StreamExt};

				use tokio_util::codec::{BytesCodec, FramedRead};

				use tracing::warn;

				use crate::{

				    compute::ComputeNode,

				    pg_helpers::{get_existing_dbs, get_existing_roles},

				};

				pub async fn get_dbs_and_roles(compute: &Arc<ComputeNode>) -> anyhow::Result<CatalogObjects> {

				    let connstr = compute.connstr.clone();

				    task::spawn_blocking(move || {

				        let mut client = Client::connect(connstr.as_str(), NoTls)?;

				        let roles: Vec<Role>;

				        {

				            let mut xact = client.transaction()?;

				            roles = get_existing_roles(&mut xact)?;

				        }

				        let databases: Vec<Database> = get_existing_dbs(&mut client)?.values().cloned().collect();

				        Ok(CatalogObjects { roles, databases })

				    })

				    .await?

				}

				#[derive(Debug, thiserror::Error)]

				pub enum SchemaDumpError {

				    #[error("Database does not exist.")]

				    DatabaseDoesNotExist,

				    #[error("Failed to execute pg_dump.")]

				    IO(#[from] std::io::Error),

				}

				// Dumps the schema of the specified database using the pg_dump utility.

				// The output is streamed back to the caller, which is expected to forward it over HTTP.

				//

				// Before returning the stream, it checks that pg_dump produced any output.

				// If not, it reads stderr to determine whether the database does not exist,

				// in which case a dedicated error is returned.

				//

				// To make sure the process is killed when the caller drops the stream, we use tokio's kill_on_drop feature.

				pub async fn get_database_schema(

				    compute: &Arc<ComputeNode>,

				    dbname: &str,

				) -> Result<impl Stream<Item = Result<bytes::Bytes, std::io::Error>>, SchemaDumpError> {

				    let pgbin = &compute.pgbin;

				    let basepath = Path::new(pgbin).parent().unwrap();

				    let pgdump = basepath.join("pg_dump");

				    let mut connstr = compute.connstr.clone();

				    connstr.set_path(dbname);

				    let mut cmd = Command::new(pgdump)

				        .arg("--schema-only")

				        .arg(connstr.as_str())

				        .stdout(Stdio::piped())

				        .stderr(Stdio::piped())

				        .kill_on_drop(true)

				        .spawn()?;

				    let stdout = cmd.stdout.take().ok_or_else(|| {

				        std::io::Error::new(std::io::ErrorKind::Other, "Failed to capture stdout.")

				    })?;

				    let stderr = cmd.stderr.take().ok_or_else(|| {

				        std::io::Error::new(std::io::ErrorKind::Other, "Failed to capture stderr.")

				    })?;

				    let mut stdout_reader = FramedRead::new(stdout, BytesCodec::new());

				    let stderr_reader = BufReader::new(stderr);

				    let first_chunk = match stdout_reader.next().await {

				        Some(Ok(bytes)) if !bytes.is_empty() => bytes,

				        Some(Err(e)) => {

				            return Err(SchemaDumpError::IO(e));

				        }

				        _ => {

				            let mut lines = stderr_reader.lines();

				            if let Some(line) = lines.next_line().await? {

				                if line.contains(&format!("FATAL:  database \"{}\" does not exist", dbname)) {

				                    return Err(SchemaDumpError::DatabaseDoesNotExist);

				                }

				                warn!("pg_dump stderr: {}", line)

				            }

				            tokio::spawn(async move {

				                while let Ok(Some(line)) = lines.next_line().await {

				                    warn!("pg_dump stderr: {}", line)

				                }

				            });

				            return Err(SchemaDumpError::IO(std::io::Error::new(

				                std::io::ErrorKind::Other,

				                "failed to start pg_dump",

				            )));

				        }

				    };

				    let initial_stream = stream::once(Ok(first_chunk.freeze()));

				    // Consume stderr and log warnings

				    tokio::spawn(async move {

				        let mut lines = stderr_reader.lines();

				        while let Ok(Some(line)) = lines.next_line().await {

				            warn!("pg_dump stderr: {}", line)

				        }

				    });

				    Ok(initial_stream.chain(stdout_reader.map(|res| res.map(|b| b.freeze()))))

				}
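The "read the first chunk before returning the stream" idea in `get_database_schema` can be illustrated with a synchronous analogue built on plain iterators. This is a sketch under stated assumptions: `peek_then_chain` is a hypothetical name, `Vec<u8>` stands in for `bytes::Bytes`, and an `Iterator` stands in for the async `Stream`.

```rust
use std::io;
use std::iter;

/// Pulls the first chunk eagerly so that an empty or failed producer is
/// reported as an error before any stream is handed to the caller.
/// `peek_then_chain` is a hypothetical name used only for this sketch.
pub fn peek_then_chain<I>(
    mut chunks: I,
) -> io::Result<impl Iterator<Item = io::Result<Vec<u8>>>>
where
    I: Iterator<Item = io::Result<Vec<u8>>>,
{
    let first = match chunks.next() {
        Some(Ok(bytes)) if !bytes.is_empty() => bytes,
        Some(Err(e)) => return Err(e),
        // No output at all, or an empty first chunk: treat as a failed start.
        _ => {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "producer yielded no output",
            ))
        }
    };
    // Put the consumed chunk back in front of the remaining ones.
    Ok(iter::once(Ok(first)).chain(chunks))
}
```

The design choice is that a caller can still turn the failure into a proper error response, because nothing has been sent downstream yet when the first chunk is inspected.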

									
										74

compute_tools/src/checker.rs
									
												
				@@ -1,45 +1,79 @@

				use anyhow::{anyhow, Result};

				use anyhow::{anyhow, Ok, Result};

				use postgres::Client;

				use tokio_postgres::NoTls;

				use tracing::{error, instrument};

				use tracing::{error, instrument, warn};

				use crate::compute::ComputeNode;

				#[instrument(skip_all)]

				pub fn create_writability_check_data(client: &mut Client) -> Result<()> {

				/// Create a special service table for availability checks

				/// only if it does not exist already.

				pub fn create_availability_check_data(client: &mut Client) -> Result<()> {

				    let query = "

				    CREATE TABLE IF NOT EXISTS health_check (

				        id serial primary key,

				        updated_at timestamptz default now()

				    );

				    INSERT INTO health_check VALUES (1, now())

				        ON CONFLICT (id) DO UPDATE

				         SET updated_at = now();";

				    let result = client.simple_query(query)?;

				    if result.len() < 2 {

				        return Err(anyhow::format_err!("executed  {} queries", result.len()));

				    }

				        DO $$

				        BEGIN

				            IF NOT EXISTS(

				                SELECT 1

				                FROM pg_catalog.pg_tables

				                WHERE tablename = 'health_check'

				            )

				            THEN

				            CREATE TABLE health_check (

				                id serial primary key,

				                updated_at timestamptz default now()

				            );

				            INSERT INTO health_check VALUES (1, now())

				                ON CONFLICT (id) DO UPDATE

				                 SET updated_at = now();

				            END IF;

				        END

				        $$;";

				    client.execute(query, &[])?;

				    Ok(())

				}

				/// Update timestamp in a row in a special service table to check

				/// that we can actually write some data in this particular timeline.

				#[instrument(skip_all)]

				pub async fn check_writability(compute: &ComputeNode) -> Result<()> {

				    // Connect to the database.

				    let (client, connection) = tokio_postgres::connect(compute.connstr.as_str(), NoTls).await?;

				    if client.is_closed() {

				        return Err(anyhow!("connection to postgres closed"));

				    }

				    // The connection object performs the actual communication with the database,

				    // so spawn it off to run on its own.

				    tokio::spawn(async move {

				        if let Err(e) = connection.await {

				            error!("connection error: {}", e);

				        }

				    });

				    let result = client

				        .simple_query("UPDATE health_check SET updated_at = now() WHERE id = 1;")

				        .await?;

				    let query = "

				    INSERT INTO health_check VALUES (1, now())

				        ON CONFLICT (id) DO UPDATE

				         SET updated_at = now();";

				    if result.len() != 1 {

				        return Err(anyhow!("statement can't be executed"));

				    match client.simple_query(query).await {

				        Result::Ok(result) => {

				            if result.len() != 1 {

				                return Err(anyhow::anyhow!(

				                    "expected 1 query result, but got {}",

				                    result.len()

				                ));

				            }

				        }

				        Err(err) => {

				            if let Some(state) = err.code() {

				                if state == &tokio_postgres::error::SqlState::DISK_FULL {

				                    warn!("Tenant disk is full");

				                    return Ok(());

				                }

				            }

				            return Err(err.into());

				        }

				    }

				    Ok(())

				}
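The decision logic in the new `check_writability` (one result expected, and a full disk is logged rather than treated as a failed check) can be sketched without a database. This is a simplified model, not the code from the diff: `QueryError` and `classify_probe` are hypothetical names, and the error is reduced to an optional SQLSTATE string. `53100` is the PostgreSQL SQLSTATE code for `disk_full`.

```rust
/// SQLSTATE code that PostgreSQL uses for `disk_full`.
const SQLSTATE_DISK_FULL: &str = "53100";

/// Simplified stand-in for a query error: just an optional SQLSTATE code.
/// Hypothetical type for this sketch; the real code uses tokio_postgres errors.
#[derive(Debug, PartialEq)]
pub struct QueryError {
    pub sqlstate: Option<String>,
}

/// Decides whether a writability probe should count as a failure.
/// `result` carries the number of query results on success.
pub fn classify_probe(result: Result<usize, QueryError>) -> Result<(), String> {
    match result {
        Ok(1) => Ok(()),
        Ok(n) => Err(format!("expected 1 query result, but got {n}")),
        // A full disk is a tenant-side condition, not a broken compute node.
        Err(e) if e.sqlstate.as_deref() == Some(SQLSTATE_DISK_FULL) => Ok(()),
        Err(e) => Err(format!("query failed: {e:?}")),
    }
}
```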

1300

compute_tools/src/compute.rs


File diff suppressed because it is too large.

									
										112

compute_tools/src/config.rs
									
												
				@@ -5,8 +5,9 @@ use std::path::Path;

				use anyhow::Result;

				use crate::pg_helpers::PgOptionsSerialize;

				use crate::spec::ComputeSpec;

				use crate::pg_helpers::escape_conf_value;

				use crate::pg_helpers::{GenericOptionExt, PgOptionsSerialize};

				use compute_api::spec::{ComputeMode, ComputeSpec, GenericOption};

				/// Check that `line` is inside a text file and put it there if it is not.

				/// Create file if it doesn't exist.

				@@ -16,6 +17,7 @@ pub fn line_in_file(path: &Path, line: &str) -> Result<bool> {

				        .write(true)

				        .create(true)

				        .append(false)

				        .truncate(false)

				        .open(path)?;

				    let buf = io::BufReader::new(&file);

				    let mut count: usize = 0;

				@@ -32,20 +34,108 @@ pub fn line_in_file(path: &Path, line: &str) -> Result<bool> {

				}

				/// Create or completely rewrite configuration file specified by `path`

				pub fn write_postgres_conf(path: &Path, spec: &ComputeSpec) -> Result<()> {

				pub fn write_postgres_conf(

				    path: &Path,

				    spec: &ComputeSpec,

				    extension_server_port: Option<u16>,

				) -> Result<()> {

				    // File::create() destroys the file content if it exists.

				    let mut postgres_conf = File::create(path)?;

				    let mut file = File::create(path)?;

				    write_auto_managed_block(&mut postgres_conf, &spec.cluster.settings.as_pg_settings())?;

				    // Write the postgresql.conf content from the spec file as is.

				    if let Some(conf) = &spec.cluster.postgresql_conf {

				        writeln!(file, "{}", conf)?;

				    }

				    // Add options for connecting to storage

				    writeln!(file, "# Neon storage settings")?;

				    if let Some(s) = &spec.pageserver_connstring {

				        writeln!(file, "neon.pageserver_connstring={}", escape_conf_value(s))?;

				    }

				    if let Some(stripe_size) = spec.shard_stripe_size {

				        writeln!(file, "neon.stripe_size={stripe_size}")?;

				    }

				    if !spec.safekeeper_connstrings.is_empty() {

				        writeln!(

				            file,

				            "neon.safekeepers={}",

				            escape_conf_value(&spec.safekeeper_connstrings.join(","))

				        )?;

				    }

				    if let Some(s) = &spec.tenant_id {

				        writeln!(file, "neon.tenant_id={}", escape_conf_value(&s.to_string()))?;

				    }

				    if let Some(s) = &spec.timeline_id {

				        writeln!(

				            file,

				            "neon.timeline_id={}",

				            escape_conf_value(&s.to_string())

				        )?;

				    }

				    match spec.mode {

				        ComputeMode::Primary => {}

				        ComputeMode::Static(lsn) => {

				            // hot_standby is 'on' by default, but let's be explicit

				            writeln!(file, "hot_standby=on")?;

				            writeln!(file, "recovery_target_lsn='{lsn}'")?;

				        }

				        ComputeMode::Replica => {

				            // hot_standby is 'on' by default, but let's be explicit

				            writeln!(file, "hot_standby=on")?;

				        }

				    }

				    if cfg!(target_os = "linux") {

				        // Check /proc/sys/vm/overcommit_memory -- if it equals 2 (i.e. linux memory overcommit is

				        // disabled), then the control plane has enabled swap and we should set

				        // dynamic_shared_memory_type = 'mmap'.

				        //

				        // This is (maybe?) temporary - for more, see https://github.com/neondatabase/cloud/issues/12047.

				        let overcommit_memory_contents = std::fs::read_to_string("/proc/sys/vm/overcommit_memory")

				            // ignore any errors - they may be expected to occur under certain situations (e.g. when

				            // not running in Linux).

				            .unwrap_or_else(|_| String::new());

				        if overcommit_memory_contents.trim() == "2" {

				            let opt = GenericOption {

				                name: "dynamic_shared_memory_type".to_owned(),

				                value: Some("mmap".to_owned()),

				                vartype: "enum".to_owned(),

				            };

				            write!(file, "{}", opt.to_pg_setting())?;

				        }

				    }

				    // If there are any extra options in the 'settings' field, append those

				    if spec.cluster.settings.is_some() {

				        writeln!(file, "# Managed by compute_ctl: begin")?;

				        write!(file, "{}", spec.cluster.settings.as_pg_settings())?;

				        writeln!(file, "# Managed by compute_ctl: end")?;

				    }

				    if let Some(port) = extension_server_port {

				        writeln!(file, "neon.extension_server_port={}", port)?;

				    }

				    // It is essential to keep this line at the end of the file,

				    // because it is intended to override any settings above.

				    writeln!(file, "include_if_exists = 'compute_ctl_temp_override.conf'")?;

				    Ok(())

				}

				// Write Postgres config block wrapped with generated comment section

				fn write_auto_managed_block(file: &mut File, buf: &str) -> Result<()> {

				    writeln!(file, "# Managed by compute_ctl: begin")?;

				    writeln!(file, "{}", buf)?;

				    writeln!(file, "# Managed by compute_ctl: end")?;

				pub fn with_compute_ctl_tmp_override<F>(pgdata_path: &Path, options: &str, exec: F) -> Result<()>

				where

				    F: FnOnce() -> Result<()>,

				{

				    let path = pgdata_path.join("compute_ctl_temp_override.conf");

				    let mut file = File::create(path)?;

				    write!(file, "{}", options)?;

				    Ok(())

				    let res = exec();

				    file.set_len(0)?;

				    res

				}
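The temporary-override pattern in `with_compute_ctl_tmp_override` (write options, run a closure, truncate the file, return the closure's result) can be sketched with the standard library alone. This is an illustration under assumptions: `with_tmp_override` is a hypothetical name, it takes the full file path rather than a data directory, and it uses `io::Result` instead of `anyhow::Result`.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes `options` to `path`, runs `exec`, then truncates the file again.
/// The result of `exec` is returned after the truncation has been attempted.
/// `with_tmp_override` is a hypothetical name used only for this sketch.
pub fn with_tmp_override<F, T>(path: &Path, options: &str, exec: F) -> io::Result<T>
where
    F: FnOnce() -> io::Result<T>,
{
    // File::create truncates any previous content.
    let mut file = File::create(path)?;
    write!(file, "{}", options)?;

    let res = exec();

    // Empty the override file whether or not `exec` succeeded, so stale
    // settings never leak into a later start.
    file.set_len(0)?;
    res
}
```

Because the main configuration ends with an `include_if_exists` line for the override file, an empty file leaves the regular settings in effect.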

									
										54

compute_tools/src/configurator.rs
									
										Normal file
									
												
				@@ -0,0 +1,54 @@

				use std::sync::Arc;

				use std::thread;

				use tracing::{error, info, instrument};

				use compute_api::responses::ComputeStatus;

				use crate::compute::ComputeNode;

				#[instrument(skip_all)]

				fn configurator_main_loop(compute: &Arc<ComputeNode>) {

				    info!("waiting for reconfiguration requests");

				    loop {

				        let state = compute.state.lock().unwrap();

				        let mut state = compute.state_changed.wait(state).unwrap();

				        if state.status == ComputeStatus::ConfigurationPending {

				            info!("got configuration request");

				            state.status = ComputeStatus::Configuration;

				            compute.state_changed.notify_all();

				            drop(state);

				            let mut new_status = ComputeStatus::Failed;

				            if let Err(e) = compute.reconfigure() {

				                error!("could not configure compute node: {}", e);

				            } else {

				                new_status = ComputeStatus::Running;

				                info!("compute node configured");

				            }

				            // XXX: used to test that API is blocking

				            // std::thread::sleep(std::time::Duration::from_millis(10000));

				            compute.set_status(new_status);

				        } else if state.status == ComputeStatus::Failed {

				            info!("compute node is now in Failed state, exiting");

				            break;

				        } else {

				            info!("woken up for compute status: {:?}, sleeping", state.status);

				        }

				    }

				}

				pub fn launch_configurator(compute: &Arc<ComputeNode>) -> thread::JoinHandle<()> {

				    let compute = Arc::clone(compute);

				    thread::Builder::new()

				        .name("compute-configurator".into())

				        .spawn(move || {

				            configurator_main_loop(&compute);

				            info!("configurator thread has exited");

				        })

				        .expect("cannot launch configurator thread")

				}
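The condition-variable loop in `configurator_main_loop` can be modelled with a reduced state machine. This is a sketch, not the code from the diff: `Status`, `Shared` and `configurator_loop` are hypothetical stand-ins for `ComputeStatus`, `ComputeNode` and the real loop, and the sketch uses `Condvar::wait_while` so that the predicate is checked before sleeping, which is a deliberate difference from the plain `wait` call shown above.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Reduced stand-in for ComputeStatus (hypothetical, for this sketch only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Status {
    Running,
    ConfigurationPending,
    Configuration,
    Failed,
}

/// Reduced stand-in for the shared compute state.
pub struct Shared {
    pub status: Mutex<Status>,
    pub changed: Condvar,
}

/// Waits for ConfigurationPending, runs `reconfigure`, publishes the outcome.
/// Returns once the status becomes Failed.
pub fn configurator_loop<F>(shared: &Arc<Shared>, mut reconfigure: F)
where
    F: FnMut() -> Result<(), String>,
{
    loop {
        let guard = shared.status.lock().unwrap();
        // wait_while checks the predicate before sleeping and after every
        // wakeup, so spurious wakeups and early notifications are handled.
        let mut guard = shared
            .changed
            .wait_while(guard, |s| {
                *s != Status::ConfigurationPending && *s != Status::Failed
            })
            .unwrap();
        if *guard == Status::Failed {
            return;
        }
        *guard = Status::Configuration;
        shared.changed.notify_all();
        // Release the lock while the possibly slow reconfiguration runs.
        drop(guard);

        let next = if reconfigure().is_ok() {
            Status::Running
        } else {
            Status::Failed
        };
        *shared.status.lock().unwrap() = next;
        shared.changed.notify_all();
    }
}
```

A requester sets the status to `ConfigurationPending`, calls `notify_all`, and then waits on the same condition variable until the status is `Running` or `Failed`.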

									
										296

compute_tools/src/extension_server.rs
									
										Normal file
									
												
				@@ -0,0 +1,296 @@

				// Download extension files from the extension store

				// and put them in the right place in the postgres directory (share / lib)

				/*

				The layout of the S3 bucket is as follows:

				5615610098 // this is an extension build number

				├── v14

				│   ├── extensions

				│   │   ├── anon.tar.zst

				│   │   └── embedding.tar.zst

				│   └── ext_index.json

				└── v15

				    ├── extensions

				    │   ├── anon.tar.zst

				    │   └── embedding.tar.zst

				    └── ext_index.json

				5615261079

				├── v14

				│   ├── extensions

				│   │   └── anon.tar.zst

				│   └── ext_index.json

				└── v15

				    ├── extensions

				    │   └── anon.tar.zst

				    └── ext_index.json

				5623261088

				├── v14

				│   ├── extensions

				│   │   └── embedding.tar.zst

				│   └── ext_index.json

				└── v15

				    ├── extensions

				    │   └── embedding.tar.zst

				    └── ext_index.json

				Note that build number cannot be part of prefix because we might need extensions

				from other build numbers.

				ext_index.json stores the control files and location of extension archives

				It also stores a list of public extensions and a library_index

				We don't need to duplicate extension.tar.zst files.

				We only need to upload a new one if it is updated.

				(Although currently we just upload every time anyways, hopefully will change

				this sometime)

				*access* is controlled by spec

				More specifically, here is an example ext_index.json

				{

				    "public_extensions": [

				        "anon",

				        "pg_buffercache"

				    ],

				    "library_index": {

				        "anon": "anon",

				        "pg_buffercache": "pg_buffercache"

				    },

				    "extension_data": {

				        "pg_buffercache": {

				            "control_data": {

				                "pg_buffercache.control": "# pg_buffercache extension \ncomment = 'examine the shared buffer cache' \ndefault_version = '1.3' \nmodule_pathname = '$libdir/pg_buffercache' \nrelocatable = true \ntrusted=true"

				            },

				            "archive_path": "5670669815/v14/extensions/pg_buffercache.tar.zst"

				        },

				        "anon": {

				            "control_data": {

				                "anon.control": "# PostgreSQL Anonymizer (anon) extension \ncomment = 'Data anonymization tools' \ndefault_version = '1.1.0' \ndirectory='extension/anon' \nrelocatable = false \nrequires = 'pgcrypto' \nsuperuser = false \nmodule_pathname = '$libdir/anon' \ntrusted = true \n"

				            },

				            "archive_path": "5670669815/v14/extensions/anon.tar.zst"

				        }

				    }

				}

				*/

				use anyhow::Result;

				use anyhow::{bail, Context};

				use bytes::Bytes;

				use compute_api::spec::RemoteExtSpec;

				use regex::Regex;

				use remote_storage::*;

				use reqwest::StatusCode;

				use std::path::Path;

				use std::str;

				use tar::Archive;

				use tracing::info;

				use tracing::log::warn;

				use zstd::stream::read::Decoder;

				fn get_pg_config(argument: &str, pgbin: &str) -> String {

				    // gives the result of `pg_config [argument]`

				    // where argument is a flag like `--version` or `--sharedir`

				    let pgconfig = pgbin

				        .strip_suffix("postgres")

				        .expect("bad pgbin")

				        .to_owned()

				        + "/pg_config";

				    let config_output = std::process::Command::new(pgconfig)

				        .arg(argument)

				        .output()

				        .expect("pg_config error");

				    std::str::from_utf8(&config_output.stdout)

				        .expect("pg_config error")

				        .trim()

				        .to_string()

				}

				pub fn get_pg_version(pgbin: &str) -> String {

				    // pg_config --version returns a (platform specific) human readable string

				    // such as "PostgreSQL 15.4". We parse this to v14/v15/v16 etc.

				    let human_version = get_pg_config("--version", pgbin);

				    return parse_pg_version(&human_version).to_string();

				}

				fn parse_pg_version(human_version: &str) -> &str {

				    // Normal releases have version strings like "PostgreSQL 15.4". But there

				    // are also pre-release versions like "PostgreSQL 17devel" or "PostgreSQL

				    // 16beta2" or "PostgreSQL 17rc1". And with the --with-extra-version

				    // configure option, you can tack any string to the version number,

				    // e.g. "PostgreSQL 15.4foobar".

				    match Regex::new(r"^PostgreSQL (?<major>\d+).+")

				        .unwrap()

				        .captures(human_version)

				    {

				        Some(captures) if captures.len() == 2 => match &captures["major"] {

				            "14" => return "v14",

				            "15" => return "v15",

				            "16" => return "v16",

				            _ => {}

				        },

				        _ => {}

				    }

				    panic!("Unsupported postgres version {human_version}");

				}
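For comparison, the major-version extraction in `parse_pg_version` can be sketched without the `regex` crate. This is an alternative illustration, not the implementation from the diff: `major_version_dir` is a hypothetical name, it returns `None` instead of panicking, and unlike the regex above it also accepts a bare major number such as "PostgreSQL 15".

```rust
/// Maps a `pg_config --version` string to a directory-style major version.
/// Returns None for unrecognized input instead of panicking.
/// `major_version_dir` is a hypothetical name used only for this sketch.
pub fn major_version_dir(human_version: &str) -> Option<&'static str> {
    let rest = human_version.strip_prefix("PostgreSQL ")?;
    // Take the leading run of ASCII digits: "15.4" -> "15", "16beta2" -> "16".
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    match &rest[..end] {
        "14" => Some("v14"),
        "15" => Some("v15"),
        "16" => Some("v16"),
        _ => None,
    }
}
```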

				// download the archive for a given extension,

				// unzip it, and place files in the appropriate locations (share/lib)

				pub async fn download_extension(

				    ext_name: &str,

				    ext_path: &RemotePath,

				    ext_remote_storage: &str,

				    pgbin: &str,

				) -> Result<u64> {

				    info!("Download extension {:?} from {:?}", ext_name, ext_path);

				    // TODO add retry logic

				    let download_buffer =

				        match download_extension_tar(ext_remote_storage, &ext_path.to_string()).await {

				            Ok(buffer) => buffer,

				            Err(error_message) => {

				                return Err(anyhow::anyhow!(

				                    "error downloading extension {:?}: {:?}",

				                    ext_name,

				                    error_message

				                ));

				            }

				        };

				    let download_size = download_buffer.len() as u64;

				    info!("Download size {:?}", download_size);

				    // it's unclear whether it is more performant to decompress into memory or not

				    // TODO: decompressing into memory can be avoided

				    let decoder = Decoder::new(download_buffer.as_ref())?;

				    let mut archive = Archive::new(decoder);

				    let unzip_dest = pgbin

				        .strip_suffix("/bin/postgres")

				        .expect("bad pgbin")

				        .to_string()

				        + "/download_extensions";

				    archive.unpack(&unzip_dest)?;

				    info!("Download + unzip {:?} completed successfully", &ext_path);

				    let sharedir_paths = (

				        unzip_dest.to_string() + "/share/extension",

				        Path::new(&get_pg_config("--sharedir", pgbin)).join("extension"),

				    );

				    let libdir_paths = (

				        unzip_dest.to_string() + "/lib",

				        Path::new(&get_pg_config("--pkglibdir", pgbin)).to_path_buf(),

				    );

				    // move contents of the libdir / sharedir in unzipped archive to the correct local paths

				    for paths in [sharedir_paths, libdir_paths] {

				        let (zip_dir, real_dir) = paths;

				        info!("mv {zip_dir:?}/*  {real_dir:?}");

				        for file in std::fs::read_dir(zip_dir)? {

				            let old_file = file?.path();

				            let new_file =

				                Path::new(&real_dir).join(old_file.file_name().context("error parsing file")?);

				            info!("moving {old_file:?} to {new_file:?}");

				            // extension download failed: Directory not empty (os error 39)

				            match std::fs::rename(old_file, new_file) {

				                Ok(()) => info!("move succeeded"),

				                Err(e) => {

				                    warn!("move failed, probably because the extension already exists: {e}")

				                }

				            }

				        }

				    }

				    info!("done moving extension {ext_name}");

				    Ok(download_size)

				}
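The unpack destination in `download_extension` is derived by stripping `/bin/postgres` from `pgbin`, and it panics with "bad pgbin" when the suffix is missing. A minimal sketch of the same derivation that returns `None` instead of panicking; the helper name `unzip_dest_for` and the paths used with it are hypothetical and not part of this change:

```rust
/// Hypothetical helper (not in the diff): derive the directory the archive is
/// unpacked into from the path of the postgres binary.
fn unzip_dest_for(pgbin: &str) -> Option<String> {
    // Same suffix the diff strips; `?` yields None where the original panics.
    let prefix = pgbin.strip_suffix("/bin/postgres")?;
    Some(format!("{prefix}/download_extensions"))
}
```

Whether a recoverable error is preferable to a panic here depends on how much the caller can do about a malformed `pgbin`; the sketch only shows the alternative shape.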

				// Create extension control files from spec

				pub fn create_control_files(remote_extensions: &RemoteExtSpec, pgbin: &str) {

				    let local_sharedir = Path::new(&get_pg_config("--sharedir", pgbin)).join("extension");

				    for (ext_name, ext_data) in remote_extensions.extension_data.iter() {

				        // Check if extension is present in public or custom.

				        // If not, then it is not allowed to be used by this compute.

				        if let Some(public_extensions) = &remote_extensions.public_extensions {

				            if !public_extensions.contains(ext_name) {

				                if let Some(custom_extensions) = &remote_extensions.custom_extensions {

				                    if !custom_extensions.contains(ext_name) {

				                        continue; // skip this extension, it is not allowed

				                    }

				                }

				            }

				        }

				        for (control_name, control_content) in &ext_data.control_data {

				            let control_path = local_sharedir.join(control_name);

				            if !control_path.exists() {

				                info!("writing file {:?}{:?}", control_path, control_content);

				                std::fs::write(control_path, control_content).unwrap();

				            } else {

				                warn!("control file {:?} exists both locally and remotely. ignoring the remote version.", control_path);

				            }

				        }

				    }

				}
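The nested `if let` checks in `create_control_files` can be read as a single predicate. The sketch below mirrors the conditions as written in the diff; the function name `is_skipped` and the slice-based parameters are assumptions made for illustration, since the real `RemoteExtSpec` field types are not shown in this excerpt:

```rust
/// Hypothetical restatement of the allow-list check: returns true when the
/// loop in the diff would `continue` past this extension.
fn is_skipped(ext_name: &str, public: Option<&[String]>, custom: Option<&[String]>) -> bool {
    // A list only excludes an extension when it is present and lacks the name.
    let missing_from = |list: Option<&[String]>| match list {
        Some(names) => !names.iter().any(|n| n == ext_name),
        None => false,
    };
    // As in the diff, both lists must be present and both must lack the name.
    missing_from(public) && missing_from(custom)
}
```

One consequence of the conditions as written: if `custom_extensions` is absent, an extension missing from `public_extensions` is not skipped. Whether that is intended cannot be determined from this excerpt.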

				// Do request to extension storage proxy, i.e.

				// curl http://pg-ext-s3-gateway/latest/v15/extensions/anon.tar.zst

				// using HTTP GET

				// and return the response body as bytes

				//

				async fn download_extension_tar(ext_remote_storage: &str, ext_path: &str) -> Result<Bytes> {

				    let uri = format!("{}/{}", ext_remote_storage, ext_path);

				    info!("Download extension {:?} from uri {:?}", ext_path, uri);

				    let resp = reqwest::get(uri).await?;

				    match resp.status() {

				        StatusCode::OK => match resp.bytes().await {

				            Ok(resp) => {

				                info!("Download extension {:?} completed successfully", ext_path);

				                Ok(resp)

				            }

				            Err(e) => bail!("could not deserialize remote extension response: {}", e),

				        },

				        StatusCode::SERVICE_UNAVAILABLE => bail!("remote extension is temporarily unavailable"),

				        _ => bail!(

				            "unexpected remote extension response status code: {}",

				            resp.status()

				        ),

				    }

				}

				#[cfg(test)]

				mod tests {

				    use super::parse_pg_version;

				    #[test]

				    fn test_parse_pg_version() {

				        assert_eq!(parse_pg_version("PostgreSQL 15.4"), "v15");

				        assert_eq!(parse_pg_version("PostgreSQL 15.14"), "v15");

				        assert_eq!(

				            parse_pg_version("PostgreSQL 15.4 (Ubuntu 15.4-0ubuntu0.23.04.1)"),

				            "v15"

				        );

				        assert_eq!(parse_pg_version("PostgreSQL 14.15"), "v14");

				        assert_eq!(parse_pg_version("PostgreSQL 14.0"), "v14");

				        assert_eq!(

				            parse_pg_version("PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1"),

				            "v14"

				        );

				        assert_eq!(parse_pg_version("PostgreSQL 16devel"), "v16");

				        assert_eq!(parse_pg_version("PostgreSQL 16beta1"), "v16");

				        assert_eq!(parse_pg_version("PostgreSQL 16rc2"), "v16");

				        assert_eq!(parse_pg_version("PostgreSQL 16extra"), "v16");

				    }

				    #[test]

				    #[should_panic]

				    fn test_parse_pg_unsupported_version() {

				        parse_pg_version("PostgreSQL 13.14");

				    }

				    #[test]

				    #[should_panic]

				    fn test_parse_pg_incorrect_version_format() {

				        parse_pg_version("PostgreSQL 14");

				    }

				}
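The tests above pin down the expected behavior of `parse_pg_version`: a major version of 14, 15 or 16 followed by a minor version or suffix is accepted, and everything else panics. The function appears to rely on a regex with a named `major` capture, but the pattern itself is outside this excerpt. A regex-free sketch that satisfies the same cases, returning `None` instead of panicking; the name `major_version_dir` and the assumption that the string starts with `PostgreSQL ` are mine:

```rust
/// Hypothetical std-only variant of the version parser exercised by the tests.
fn major_version_dir(human_version: &str) -> Option<&'static str> {
    // Assumption: the string looks like "PostgreSQL 15.4 (...)".
    let token = human_version
        .strip_prefix("PostgreSQL ")?
        .split_whitespace()
        .next()?;
    // Split the leading digits (the major version) from whatever follows.
    let digits_end = token
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(token.len());
    let (major, rest) = token.split_at(digits_end);
    // A bare major such as "14" is rejected, matching the should_panic test.
    if rest.is_empty() {
        return None;
    }
    match major {
        "14" => Some("v14"),
        "15" => Some("v15"),
        "16" => Some("v16"),
        _ => None,
    }
}
```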

									
										372

compute_tools/src/http/api.rs
									
												
				@@ -1,15 +1,42 @@

				use std::convert::Infallible;

				use std::net::IpAddr;

				use std::net::Ipv6Addr;

				use std::net::SocketAddr;

				use std::sync::Arc;

				use std::thread;

				use crate::compute::ComputeNode;

				use crate::catalog::SchemaDumpError;

				use crate::catalog::{get_database_schema, get_dbs_and_roles};

				use crate::compute::forward_termination_signal;

				use crate::compute::{ComputeNode, ComputeState, ParsedSpec};

				use compute_api::requests::ConfigurationRequest;

				use compute_api::responses::{ComputeStatus, ComputeStatusResponse, GenericAPIError};

				use anyhow::Result;

				use hyper::header::CONTENT_TYPE;

				use hyper::service::{make_service_fn, service_fn};

				use hyper::{Body, Method, Request, Response, Server, StatusCode};

				use serde_json;

				use tracing::{error, info};

				use tokio::task;

				use tracing::{debug, error, info, warn};

				use tracing_utils::http::OtelName;

				use utils::http::request::must_get_query_param;

				fn status_response_from_state(state: &ComputeState) -> ComputeStatusResponse {

				    ComputeStatusResponse {

				        start_time: state.start_time,

				        tenant: state

				            .pspec

				            .as_ref()

				            .map(|pspec| pspec.tenant_id.to_string()),

				        timeline: state

				            .pspec

				            .as_ref()

				            .map(|pspec| pspec.timeline_id.to_string()),

				        status: state.status,

				        last_active: state.last_active,

				        error: state.error.clone(),

				    }

				}

				// Service function to handle all available routes.

				async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body> {

				@@ -21,24 +48,197 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body

				    match (req.method(), req.uri().path()) {

				        // Serialized compute state.

				        (&Method::GET, "/status") => {

				            info!("serving /status GET request");

				            let state = compute.state.read().unwrap();

				            Response::new(Body::from(serde_json::to_string(&*state).unwrap()))

				            debug!("serving /status GET request");

				            let state = compute.state.lock().unwrap();

				            let status_response = status_response_from_state(&state);

				            Response::new(Body::from(serde_json::to_string(&status_response).unwrap()))

				        }

				        // Startup metrics in JSON format. Keep /metrics reserved for a possible

				        // future use for Prometheus metrics format.

				        (&Method::GET, "/metrics.json") => {

				            info!("serving /metrics.json GET request");

				            Response::new(Body::from(serde_json::to_string(&compute.metrics).unwrap()))

				            let metrics = compute.state.lock().unwrap().metrics.clone();

				            Response::new(Body::from(serde_json::to_string(&metrics).unwrap()))

				        }

				        // Collect Postgres current usage insights

				        (&Method::GET, "/insights") => {

				            info!("serving /insights GET request");

				            let status = compute.get_status();

				            if status != ComputeStatus::Running {

				                let msg = format!("compute is not running, current status: {:?}", status);

				                error!(msg);

				                return Response::new(Body::from(msg));

				            }

				            let insights = compute.collect_insights().await;

				            Response::new(Body::from(insights))

				        }

				        (&Method::POST, "/check_writability") => {

				            info!("serving /check_writability POST request");

				            let status = compute.get_status();

				            if status != ComputeStatus::Running {

				                let msg = format!(

				                    "invalid compute status for check_writability request: {:?}",

				                    status

				                );

				                error!(msg);

				                return Response::new(Body::from(msg));

				            }

				            let res = crate::checker::check_writability(compute).await;

				            match res {

				                Ok(_) => Response::new(Body::from("true")),

				                Err(e) => Response::new(Body::from(e.to_string())),

				                Err(e) => {

				                    error!("check_writability failed: {}", e);

				                    Response::new(Body::from(e.to_string()))

				                }

				            }

				        }

				        (&Method::GET, "/info") => {

				            let num_cpus = num_cpus::get_physical();

				            info!("serving /info GET request. num_cpus: {}", num_cpus);

				            Response::new(Body::from(

				                serde_json::json!({

				                    "num_cpus": num_cpus,

				                })

				                .to_string(),

				            ))

				        }

				        // Accept spec in JSON format and request compute configuration. If

				        // anything goes wrong after we set the compute status to `ConfigurationPending`

				        // and update compute state with new spec, we basically leave compute

				        // in the potentially wrong state. That said, it's control-plane's

				        // responsibility to watch compute state after reconfiguration request

				        // and to clean restart in case of errors.

				        (&Method::POST, "/configure") => {

				            info!("serving /configure POST request");

				            match handle_configure_request(req, compute).await {

				                Ok(msg) => Response::new(Body::from(msg)),

				                Err((msg, code)) => {

				                    error!("error handling /configure request: {msg}");

				                    render_json_error(&msg, code)

				                }

				            }

				        }

				        (&Method::POST, "/terminate") => {

				            info!("serving /terminate POST request");

				            match handle_terminate_request(compute).await {

				                Ok(()) => Response::new(Body::empty()),

				                Err((msg, code)) => {

				                    error!("error handling /terminate request: {msg}");

				                    render_json_error(&msg, code)

				                }

				            }

				        }

				        (&Method::GET, "/dbs_and_roles") => {

				            info!("serving /dbs_and_roles GET request",);

				            match get_dbs_and_roles(compute).await {

				                Ok(res) => render_json(Body::from(serde_json::to_string(&res).unwrap())),

				                Err(_) => {

				                    render_json_error("can't get dbs and roles", StatusCode::INTERNAL_SERVER_ERROR)

				                }

				            }

				        }

				        (&Method::GET, "/database_schema") => {

				            let database = match must_get_query_param(&req, "database") {

				                Err(e) => return e.into_response(),

				                Ok(database) => database,

				            };

				            info!("serving /database_schema GET request with database: {database}",);

				            match get_database_schema(compute, &database).await {

				                Ok(res) => render_plain(Body::wrap_stream(res)),

				                Err(SchemaDumpError::DatabaseDoesNotExist) => {

				                    render_json_error("database does not exist", StatusCode::NOT_FOUND)

				                }

				                Err(e) => {

				                    error!("can't get schema dump: {}", e);

				                    render_json_error("can't get schema dump", StatusCode::INTERNAL_SERVER_ERROR)

				                }

				            }

				        }

				        // download extension files from remote extension storage on demand

				        (&Method::POST, route) if route.starts_with("/extension_server/") => {

				            info!("serving {:?} POST request", route);

				            info!("req.uri {:?}", req.uri());

				            // don't even try to download extensions

				            // if no remote storage is configured

				            if compute.ext_remote_storage.is_none() {

				                info!("no extensions remote storage configured");

				                let mut resp = Response::new(Body::from("no remote storage configured"));

				                *resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;

				                return resp;

				            }

				            let mut is_library = false;

				            if let Some(params) = req.uri().query() {

				                info!("serving {:?} POST request with params: {}", route, params);

				                if params == "is_library=true" {

				                    is_library = true;

				                } else {

				                    let mut resp = Response::new(Body::from("Wrong request parameters"));

				                    *resp.status_mut() = StatusCode::BAD_REQUEST;

				                    return resp;

				                }

				            }

				            let filename = route.split('/').last().unwrap().to_string();

				            info!("serving /extension_server POST request, filename: {filename:?} is_library: {is_library}");

				            // get ext_name and path from spec

				            // don't lock compute_state for too long

				            let ext = {

				                let compute_state = compute.state.lock().unwrap();

				                let pspec = compute_state.pspec.as_ref().expect("spec must be set");

				                let spec = &pspec.spec;

				                // debug only

				                info!("spec: {:?}", spec);

				                let remote_extensions = match spec.remote_extensions.as_ref() {

				                    Some(r) => r,

				                    None => {

				                        info!("no remote extensions spec was provided");

				                        let mut resp = Response::new(Body::from("no remote storage configured"));

				                        *resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;

				                        return resp;

				                    }

				                };

				                remote_extensions.get_ext(

				                    &filename,

				                    is_library,

				                    &compute.build_tag,

				                    &compute.pgversion,

				                )

				            };

				            match ext {

				                Ok((ext_name, ext_path)) => {

				                    match compute.download_extension(ext_name, ext_path).await {

				                        Ok(_) => Response::new(Body::from("OK")),

				                        Err(e) => {

				                            error!("extension download failed: {}", e);

				                            let mut resp = Response::new(Body::from(e.to_string()));

				                            *resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;

				                            resp

				                        }

				                    }

				                }

				                Err(e) => {

				                    warn!("extension download failed to find extension: {}", e);

				                    let mut resp = Response::new(Body::from("failed to find file"));

				                    *resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;

				                    resp

				                }

				            }

				        }

				@@ -51,10 +251,158 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body

				    }

				}

				async fn handle_configure_request(

				    req: Request<Body>,

				    compute: &Arc<ComputeNode>,

				) -> Result<String, (String, StatusCode)> {

				    if !compute.live_config_allowed {

				        return Err((

				            "live configuration is not allowed for this compute node".to_string(),

				            StatusCode::PRECONDITION_FAILED,

				        ));

				    }

				    let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap();

				    let spec_raw = String::from_utf8(body_bytes.to_vec()).unwrap();

				    if let Ok(request) = serde_json::from_str::<ConfigurationRequest>(&spec_raw) {

				        let spec = request.spec;

				        let parsed_spec = match ParsedSpec::try_from(spec) {

				            Ok(ps) => ps,

				            Err(msg) => return Err((msg, StatusCode::BAD_REQUEST)),

				        };

				        // XXX: wrap state update under lock in code blocks. Otherwise,

				        // we will try to `Send` `mut state` into the spawned thread

				        // below, which will cause error:

				        // ```

				        // error: future cannot be sent between threads safely

				        // ```

				        {

				            let mut state = compute.state.lock().unwrap();

				            if state.status != ComputeStatus::Empty && state.status != ComputeStatus::Running {

				                let msg = format!(

				                    "invalid compute status for configuration request: {:?}",

				                    state.status.clone()

				                );

				                return Err((msg, StatusCode::PRECONDITION_FAILED));

				            }

				            state.pspec = Some(parsed_spec);

				            state.status = ComputeStatus::ConfigurationPending;

				            compute.state_changed.notify_all();

				            drop(state);

				            info!("set new spec and notified waiters");

				        }

				        // Spawn a blocking thread to wait for compute to become Running.

				        // This is needed so that we do not block the main pool of workers and

				        // can serve other requests while some particular request

				        // is waiting for compute to finish configuration.

				        let c = compute.clone();

				        task::spawn_blocking(move || {

				            let mut state = c.state.lock().unwrap();

				            while state.status != ComputeStatus::Running {

				                state = c.state_changed.wait(state).unwrap();

				                info!(

				                    "waiting for compute to become Running, current status: {:?}",

				                    state.status

				                );

				                if state.status == ComputeStatus::Failed {

				                    let err = state.error.as_ref().map_or("unknown error", |x| x);

				                    let msg = format!("compute configuration failed: {:?}", err);

				                    return Err((msg, StatusCode::INTERNAL_SERVER_ERROR));

				                }

				            }

				            Ok(())

				        })

				        .await

				        .unwrap()?;

				        // Return current compute state if everything went well.

				        let state = compute.state.lock().unwrap().clone();

				        let status_response = status_response_from_state(&state);

				        Ok(serde_json::to_string(&status_response).unwrap())

				    } else {

				        Err(("invalid spec".to_string(), StatusCode::BAD_REQUEST))

				    }

				}
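`handle_configure_request` waits for a status change with a mutex and condition variable: the waiter holds the lock, re-checks the status in a loop, and sleeps in `wait` until another thread updates the status and calls `notify_all`. A self-contained sketch of that pattern using only `std`; `Status`, `Node`, `wait_until_running`, `set_status` and `run_demo` are simplified stand-ins invented for this example, not the types from `compute_tools`:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Simplified stand-in for the compute status enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    ConfigurationPending,
    Running,
    Failed,
}

// Simplified stand-in for the node: state behind a mutex plus a condvar.
struct Node {
    state: Mutex<Status>,
    state_changed: Condvar,
}

/// Block the calling thread until the node is Running, or report a failure.
fn wait_until_running(node: &Node) -> Result<(), String> {
    let mut state = node.state.lock().unwrap();
    while *state != Status::Running {
        if *state == Status::Failed {
            return Err("compute configuration failed".to_string());
        }
        // `wait` releases the lock while sleeping and re-acquires it on wake-up.
        state = node.state_changed.wait(state).unwrap();
    }
    Ok(())
}

/// Update the status under the lock, then wake every waiter.
fn set_status(node: &Node, status: Status) {
    *node.state.lock().unwrap() = status;
    node.state_changed.notify_all();
}

/// Spawn a waiter thread, flip the status, and return what the waiter saw.
fn run_demo(final_status: Status) -> Result<(), String> {
    let node = Arc::new(Node {
        state: Mutex::new(Status::ConfigurationPending),
        state_changed: Condvar::new(),
    });
    let waiter = {
        let node = Arc::clone(&node);
        thread::spawn(move || wait_until_running(&node))
    };
    set_status(&node, final_status);
    waiter.join().unwrap()
}
```

Unlike the handler in the diff, this sketch checks for `Failed` before the first `wait`, so a failure recorded before the waiter takes the lock is still observed. The handler runs the wait inside `task::spawn_blocking` because a condvar wait blocks an OS thread; the sketch uses a plain thread for the same reason.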

				fn render_json_error(e: &str, status: StatusCode) -> Response<Body> {

				    let error = GenericAPIError {

				        error: e.to_string(),

				    };

				    Response::builder()

				        .status(status)

				        .header(CONTENT_TYPE, "application/json")

				        .body(Body::from(serde_json::to_string(&error).unwrap()))

				        .unwrap()

				}

				fn render_json(body: Body) -> Response<Body> {

				    Response::builder()

				        .header(CONTENT_TYPE, "application/json")

				        .body(body)

				        .unwrap()

				}

				fn render_plain(body: Body) -> Response<Body> {

				    Response::builder()

				        .header(CONTENT_TYPE, "text/plain")

				        .body(body)

				        .unwrap()

				}

				async fn handle_terminate_request(compute: &Arc<ComputeNode>) -> Result<(), (String, StatusCode)> {

				    {

				        let mut state = compute.state.lock().unwrap();

				        if state.status == ComputeStatus::Terminated {

				            return Ok(());

				        }

				        if state.status != ComputeStatus::Empty && state.status != ComputeStatus::Running {

				            let msg = format!(

				                "invalid compute status for termination request: {:?}",

				                state.status.clone()

				            );

				            return Err((msg, StatusCode::PRECONDITION_FAILED));

				        }

				        state.status = ComputeStatus::TerminationPending;

				        compute.state_changed.notify_all();

				        drop(state);

				    }

				    forward_termination_signal();

				    info!("sent signal and notified waiters");

				    // Spawn a blocking thread to wait for compute to become Terminated.

				    // This is needed so that we do not block the main pool of workers and

				    // can serve other requests while some particular request

				    // is waiting for compute to finish termination.

				    let c = compute.clone();

				    task::spawn_blocking(move || {

				        let mut state = c.state.lock().unwrap();

				        while state.status != ComputeStatus::Terminated {

				            state = c.state_changed.wait(state).unwrap();

				            info!(

				                "waiting for compute to become Terminated, current status: {:?}",

				                state.status

				            );

				        }

				        Ok(())

				    })

				    .await

				    .unwrap()?;

				    info!("terminated Postgres");

				    Ok(())

				}

				// Main Hyper HTTP server function that runs it and blocks waiting on it forever.

				#[tokio::main]

				async fn serve(state: Arc<ComputeNode>) {

				    let addr = SocketAddr::from(([0, 0, 0, 0], 3080));

				async fn serve(port: u16, state: Arc<ComputeNode>) {

				    // this usually binds to both IPv4 and IPv6 on linux

				    // see e.g. https://github.com/rust-lang/rust/pull/34440

				    let addr = SocketAddr::new(IpAddr::from(Ipv6Addr::UNSPECIFIED), port);

				    let make_service = make_service_fn(move |_conn| {

				        let state = state.clone();

				@@ -89,10 +437,10 @@ async fn serve(state: Arc<ComputeNode>) {

				}

				/// Launch a separate Hyper HTTP API server thread and return its `JoinHandle`.

				pub fn launch_http_server(state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {

				pub fn launch_http_server(port: u16, state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {

				    let state = Arc::clone(state);

				    Ok(thread::Builder::new()

				        .name("http-endpoint".into())

				        .spawn(move || serve(state))?)

				        .spawn(move || serve(port, state))?)

				}
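The new `serve` builds its listen address from the IPv6 unspecified address and a caller-supplied port instead of the hard-coded `0.0.0.0:3080`. A small sketch of just that address construction; the helper name `listen_addr` is hypothetical:

```rust
use std::net::{IpAddr, Ipv6Addr, SocketAddr};

/// Hypothetical helper: the wildcard IPv6 address on the given port.
fn listen_addr(port: u16) -> SocketAddr {
    SocketAddr::new(IpAddr::from(Ipv6Addr::UNSPECIFIED), port)
}
```

As the comment in the diff notes, whether a socket bound to this address also accepts IPv4 connections depends on the operating system configuration, so dual-stack behavior should be treated as usual on Linux rather than guaranteed.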

									
										332

compute_tools/src/http/openapi_spec.yaml
									
												
				@@ -10,12 +10,12 @@ paths:

				  /status:

				    get:

				      tags:

				      - "info"

				      summary: Get compute node internal status

				      - Info

				      summary: Get compute node internal status.

				      description: ""

				      operationId: getComputeStatus

				      responses:

				        "200":

				        200:

				          description: ComputeState

				          content:

				            application/json:

				@@ -25,35 +25,217 @@ paths:

				  /metrics.json:

				    get:

				      tags:

				      - "info"

				      summary: Get compute node startup metrics in JSON format

				      - Info

				      summary: Get compute node startup metrics in JSON format.

				      description: ""

				      operationId: getComputeMetricsJSON

				      responses:

				        "200":

				        200:

				          description: ComputeMetrics

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/ComputeMetrics"

				  /insights:

				    get:

				      tags:

				      - Info

				      summary: Get current compute insights in JSON format.

				      description: |

				        Note that this doesn't include any historical data.

				      operationId: getComputeInsights

				      responses:

				        200:

				          description: Compute insights

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/ComputeInsights"

				  /info:

				    get:

				      tags:

				      - Info

				      summary: Get info about the compute pod / VM.

				      description: ""

				      operationId: getInfo

				      responses:

				        200:

				          description: Info

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/Info"

				  /dbs_and_roles:

				    get:

				      tags:

				        - Info

				      summary: Get databases and roles in the catalog.

				      description: ""

				      operationId: getDbsAndRoles

				      responses:

				        200:

				          description: Compute schema objects

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/DbsAndRoles"

				  /database_schema:

				    get:

				      tags:

				        - Info

				      summary: Get schema dump

				      parameters:

				        - name: database

				          in: query

				          description: Database name to dump.

				          required: true

				          schema:

				            type: string

				          example: "postgres"

				      description: Get schema dump in SQL format.

				      operationId: getDatabaseSchema

				      responses:

				        200:

				          description: Schema dump

				          content:

				            text/plain:

				              schema:

				                type: string

				                description: Schema dump in SQL format.

				        404:

				          description: Non existing database.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				  /check_writability:

				    post:

				      tags:

				      - "check"

				      summary: Check that we can write new data on this compute

				      - Check

				      summary: Check that we can write new data on this compute.

				      description: ""

				      operationId: checkComputeWritability

				      responses:

				        "200":

				        200:

				          description: Check result

				          content:

				            text/plain:

				              schema:

				                type: string

				                description: Error text or 'true' if check passed

				                description: Error text or 'true' if check passed.

				                example: "true"

				  /configure:

				    post:

				      tags:

				      - Configure

				      summary: Perform compute node configuration.

				      description: |

				        This is a blocking API endpoint, i.e. it blocks waiting until

				        compute has finished configuration and is in the `Running` state.

				        Optional non-blocking mode could be added later.

				      operationId: configureCompute

				      requestBody:

				        description: Configuration request.

				        required: true

				        content:

				          application/json:

				            schema:

				              type: object

				              required:

				                - spec

				              properties:

				                spec:

				                  # XXX: I don't want to explain current spec in the OpenAPI format,

				                  # as it could be changed really soon. Consider doing it later.

				                  type: object

				      responses:

				        200:

				          description: Compute configuration finished.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/ComputeState"

				        400:

				          description: Provided spec is invalid.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				        412:

				          description: |

				            It's not possible to do live-configuration of the compute.

				            It's either in the wrong state, or compute doesn't use pull

				            mode of configuration.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				        500:

				          description: |

				            Compute configuration request was processed, but an error

				            occurred. Compute will likely shut down soon.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				  /extension_server:

				    post:

				      tags:

				      - Extension

				      summary: Download extension from S3 to local folder.

				      description: ""

				      operationId: downloadExtension

				      responses:

				        200:

				          description: Extension downloaded

				          content:

				            text/plain:

				              schema:

				                type: string

				                description: Error text or 'OK' if download succeeded.

				                example: "OK"

				        400:

				          description: Request is invalid.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				        500:

				          description: Extension download request failed.

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				  /terminate:

				    post:

				      tags:

				      - Terminate

				      summary: Terminate Postgres and wait for it to exit

				      description: ""

				      operationId: terminate

				      responses:

				        200:

				          description: Result

				        412:

				          description: "wrong state"

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				        500:

				          description: "Unexpected error"

				          content:

				            application/json:

				              schema:

				                $ref: "#/components/schemas/GenericError"

				components:

				  securitySchemes:

				    JWT:

				@@ -64,13 +246,16 @@ components:

				  schemas:

				    ComputeMetrics:

				      type: object

				      description: Compute startup metrics

				      description: Compute startup metrics.

				      required:

				        - wait_for_spec_ms

				        - sync_safekeepers_ms

				        - basebackup_ms

				        - config_ms

				        - total_startup_ms

				      properties:

				        wait_for_spec_ms:

				          type: integer

				        sync_safekeepers_ms:

				          type: integer

				        basebackup_ms:

				@@ -80,28 +265,147 @@ components:

				        total_startup_ms:

				          type: integer

				    Info:

				      type: object

				      description: Information about VM/Pod.

				      required:

				        - num_cpus

				      properties:

				        num_cpus:

				          type: integer

				    DbsAndRoles:

				      type: object

				      description: Databases and Roles

				      required:

				        - roles

				        - databases

				      properties:

				        roles:

				          type: array

				          items:

				            $ref: "#/components/schemas/Role"

				        databases:

				          type: array

				          items:

				            $ref: "#/components/schemas/Database"

				    Database:

				      type: object

				      description: Database

				      required:

				        - name

				        - owner

				        - restrict_conn

				        - invalid

				      properties:

				        name:

				          type: string

				        owner:

				          type: string

				        options:

				          type: array

				          items:

				            $ref: "#/components/schemas/GenericOption"

				        restrict_conn:

				          type: boolean

				        invalid:

				          type: boolean

				    Role:

				      type: object

				      description: Role

				      required:

				        - name

				      properties:

				        name:

				          type: string

				        encrypted_password:

				          type: string

				        options:

				          type: array

				          items:

				            $ref: "#/components/schemas/GenericOption"

				    GenericOption:

				      type: object

				      description: Schema Generic option

				      required:

				        - name

				        - vartype

				      properties:

				        name:

				          type: string

				        value:

				          type: string

				        vartype:

				          type: string

				    ComputeState:

				      type: object

				      required:

				        - start_time

				        - status

				        - last_active

				      properties:

				        start_time:

				          type: string

				          description: |

				            Time when compute was started. If initially compute was started in the `empty`

				            state and then provided with valid spec, `start_time` will be reset to the

				            moment, when spec was received.

				          example: "2022-10-12T07:20:50.52Z"

				        status:

				          $ref: '#/components/schemas/ComputeStatus'

				        last_active:

				          type: string

				          description: The last detected compute activity timestamp in UTC and RFC3339 format

				          description: |

				            The last detected compute activity timestamp in UTC and RFC3339 format.

				            It could be empty if compute was never used by user since start.

				          example: "2022-10-12T07:20:50.52Z"

				        error:

				          type: string

				          description: Text of the error during compute startup, if any

				          description: Text of the error during compute startup or reconfiguration, if any.

				          example: ""

				        tenant:

				          type: string

				          description: Identifier of the current tenant served by compute node, if any.

				          example: c9269c359e9a199fad1ea0981246a78f

				        timeline:

				          type: string

				          description: Identifier of the current timeline served by compute node, if any.

				          example: ece7de74d4b8cbe5433a68ce4d1b97b4

				    ComputeInsights:

				      type: object

				      properties:

				        pg_stat_statements:

				          description: Contains raw output from pg_stat_statements in JSON format.

				          type: array

				          items:

				            type: object

				    ComputeStatus:

				      type: string

				      enum:

				        - empty

				        - init

				        - failed

				        - running

				        - configuration_pending

				        - configuration

				      example: running

				    #

				    # Errors

				    #

				    GenericError:

				      type: object

				      required:

				        - error

				      properties:

				        error:

				          type: string

				security:

				  - JWT: []
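
For readers consuming this API from a client, the `ComputeStatus` values in the schema above map naturally onto a Rust enum. The following is a minimal std-only sketch, not the type used in the repository: the enum name comes from the schema, while the variant names and the `String` error type are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Mirrors the `ComputeStatus` enum values listed in the OpenAPI schema.
/// Variant names are illustrative; only the string values come from the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComputeStatus {
    Empty,
    Init,
    Failed,
    Running,
    ConfigurationPending,
    Configuration,
}

impl FromStr for ComputeStatus {
    // Assumption: a plain String error keeps the sketch dependency-free.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "empty" => Ok(Self::Empty),
            "init" => Ok(Self::Init),
            "failed" => Ok(Self::Failed),
            "running" => Ok(Self::Running),
            "configuration_pending" => Ok(Self::ConfigurationPending),
            "configuration" => Ok(Self::Configuration),
            other => Err(format!("unknown compute status: {other}")),
        }
    }
}
```

With this in place, `"running".parse::<ComputeStatus>()` yields `Ok(ComputeStatus::Running)`, and any value outside the schema's enum list is rejected with an error instead of being silently accepted.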

									
										9

compute_tools/src/lib.rs
									
				@@ -1,14 +1,19 @@

				//!

				//! Various tools and helpers to handle cluster / compute node (Postgres)

				//! configuration.

				//!

				#![deny(unsafe_code)]

				#![deny(clippy::undocumented_unsafe_blocks)]

				pub mod checker;

				pub mod config;

				pub mod configurator;

				pub mod http;

				#[macro_use]

				pub mod logger;

				pub mod catalog;

				pub mod compute;

				pub mod extension_server;

				pub mod monitor;

				pub mod params;

				pub mod pg_helpers;

				pub mod spec;

				pub mod swap;

				pub mod sync_sk;

									
										9

compute_tools/src/logger.rs
									
				@@ -18,6 +18,7 @@ pub fn init_tracing_and_logging(default_log_level: &str) -> anyhow::Result<()> {

				        .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new(default_log_level));

				    let fmt_layer = tracing_subscriber::fmt::layer()

				        .with_ansi(false)

				        .with_target(false)

				        .with_writer(std::io::stderr);

				@@ -33,5 +34,13 @@ pub fn init_tracing_and_logging(default_log_level: &str) -> anyhow::Result<()> {

				        .init();

				    tracing::info!("logging and tracing started");

				    utils::logging::replace_panic_hook_with_tracing_panic_hook().forget();

				    Ok(())

				}

				/// Replace all newline characters with a special character to make it

				/// easier to grep for log messages.

				pub fn inlinify(s: &str) -> String {

				    s.replace('\n', "\u{200B}")

				}
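
To make the effect of `inlinify` concrete, here is a small sketch. The `inlinify` body is copied from the hunk above so the block compiles on its own; `uninlinify` is a hypothetical helper that is not part of the diff and is shown only to illustrate that the substitution can be undone when reading logs.

```rust
/// Same body as the `inlinify` helper added in the hunk above:
/// every newline becomes U+200B (zero-width space), so a multi-line
/// message ends up on a single log line.
pub fn inlinify(s: &str) -> String {
    s.replace('\n', "\u{200B}")
}

/// Hypothetical inverse, not present in the diff: turns U+200B back into
/// newlines. Only lossless if the original text contained no U+200B itself.
pub fn uninlinify(s: &str) -> String {
    s.replace('\u{200B}', "\n")
}
```

For example, `inlinify("ERROR: oops\nDETAIL: more")` returns a string with no `'\n'` in it, so a line-oriented tool such as `grep` sees the whole message as one record.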

									
										1

compute_tools/src/migrations/0000-neon_superuser_bypass_rls.sql
									
										Normal file
									
				@@ -0,0 +1 @@

				ALTER ROLE neon_superuser BYPASSRLS;

									
										18

compute_tools/src/migrations/0001-alter_roles.sql
									
										Normal file
									
				@@ -0,0 +1,18 @@

				DO $$

				DECLARE

				    role_name text;

				BEGIN

				    FOR role_name IN SELECT rolname FROM pg_roles WHERE pg_has_role(rolname, 'neon_superuser', 'member')

				    LOOP

				        RAISE NOTICE 'EXECUTING ALTER ROLE % INHERIT', quote_ident(role_name);

				        EXECUTE 'ALTER ROLE ' || quote_ident(role_name) || ' INHERIT';

				    END LOOP;

				    FOR role_name IN SELECT rolname FROM pg_roles

				        WHERE

				            NOT pg_has_role(rolname, 'neon_superuser', 'member') AND NOT starts_with(rolname, 'pg_')

				    LOOP

				        RAISE NOTICE 'EXECUTING ALTER ROLE % NOBYPASSRLS', quote_ident(role_name);

				        EXECUTE 'ALTER ROLE ' || quote_ident(role_name) || ' NOBYPASSRLS';

				    END LOOP;

				END $$;

									
										6

compute_tools/src/migrations/0002-grant_pg_create_subscription_to_neon_superuser.sql
									
										Normal file
									
				@@ -0,0 +1,6 @@

				DO $$

				BEGIN

				    IF (SELECT setting::numeric >= 160000 FROM pg_settings WHERE name = 'server_version_num') THEN

				        EXECUTE 'GRANT pg_create_subscription TO neon_superuser';

				    END IF;

				END $$;
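
The `160000` threshold in the migration above relies on how PostgreSQL encodes `server_version_num`: for PostgreSQL 10 and later it is the major version times 10000 plus the minor version, so `160000` corresponds to 16.0. A minimal sketch of the same check, with function names that are purely illustrative and do not appear in the repository:

```rust
/// Extracts the major version from `server_version_num`.
/// Assumption: the server is PostgreSQL 10 or later, where the value is
/// major * 10000 + minor.
pub fn major_version(server_version_num: u32) -> u32 {
    server_version_num / 10_000
}

/// Same condition as the migration's `setting::numeric >= 160000` guard.
/// Hypothetical helper name, used here only for illustration.
pub fn can_grant_pg_create_subscription(server_version_num: u32) -> bool {
    server_version_num >= 160_000
}
```

Under this encoding, a PostgreSQL 15.5 server reports `150005` and the grant is skipped, while 16.2 reports `160002` and the grant runs.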

									
										1

compute_tools/src/migrations/0003-grant_pg_monitor_to_neon_superuser.sql
									
										Normal file
									
				@@ -0,0 +1 @@

				GRANT pg_monitor TO neon_superuser WITH ADMIN OPTION;


Compare commits

2675 Commits arthur/sim ... arpad/slic

18

.cargo/config.toml

View File

8

.config/hakari.toml

View File

2

.config/nextest.toml Normal file

View File

42

.dockerignore

View File

5

.github/ISSUE_TEMPLATE/epic-template.md vendored

View File

3

.github/PULL_REQUEST_TEMPLATE/release-pr.md vendored

View File

13

.github/actionlint.yml vendored Normal file

View File

234

.github/actions/allure-report-generate/action.yml vendored Normal file

View File

72

.github/actions/allure-report-store/action.yml vendored Normal file

View File

232

.github/actions/allure-report/action.yml vendored

View File

6

.github/actions/download/action.yml vendored

View File

12

.github/actions/neon-branch-create/action.yml vendored

View File

12

.github/actions/neon-branch-delete/action.yml vendored

View File

28

.github/actions/neon-project-create/action.yml vendored

View File

8

.github/actions/neon-project-delete/action.yml vendored

View File

86

.github/actions/run-python-test-set/action.yml vendored

View File

10

.github/actions/upload/action.yml vendored

View File

5

.github/ansible/.gitignore vendored

View File

12

.github/ansible/ansible.cfg vendored

View File

15

.github/ansible/ansible.ssh.cfg vendored

View File

193

.github/ansible/deploy.yaml vendored

View File

42

.github/ansible/get_binaries.sh vendored

View File

38

.github/ansible/prod.ap-southeast-1.hosts.yaml vendored

View File

38

.github/ansible/prod.eu-central-1.hosts.yaml vendored

View File

41

.github/ansible/prod.us-east-2.hosts.yaml vendored

View File

43

.github/ansible/prod.us-west-2.hosts.yaml vendored

View File

33

.github/ansible/scripts/init_pageserver.sh vendored

View File

31

.github/ansible/scripts/init_safekeeper.sh vendored

View File

2

.github/ansible/ssm_config vendored

View File

41

.github/ansible/staging.eu-west-1.hosts.yaml vendored

View File

51

.github/ansible/staging.us-east-2.hosts.yaml vendored

View File

18

.github/ansible/systemd/pageserver.service vendored

View File

18

.github/ansible/systemd/safekeeper.service vendored

View File

1

.github/ansible/templates/pageserver.toml.j2 vendored

View File

76

.github/helm-values/dev-eu-west-1-zeta.neon-proxy-scram.yaml vendored

View File

52

.github/helm-values/dev-eu-west-1-zeta.neon-storage-broker.yaml vendored

View File

68

.github/helm-values/dev-us-east-2-beta.neon-proxy-link.yaml vendored

View File

61

.github/helm-values/dev-us-east-2-beta.neon-proxy-scram-legacy.yaml vendored

View File

76

.github/helm-values/dev-us-east-2-beta.neon-proxy-scram.yaml vendored

View File

52

.github/helm-values/dev-us-east-2-beta.neon-storage-broker.yaml vendored

View File

77

.github/helm-values/prod-ap-southeast-1-epsilon.neon-proxy-scram.yaml vendored

View File

52

.github/helm-values/prod-ap-southeast-1-epsilon.neon-storage-broker.yaml vendored

View File

77

.github/helm-values/prod-eu-central-1-gamma.neon-proxy-scram.yaml vendored

View File

52

.github/helm-values/prod-eu-central-1-gamma.neon-storage-broker.yaml vendored

View File

59

.github/helm-values/prod-us-east-2-delta.neon-proxy-link.yaml vendored

View File

77

.github/helm-values/prod-us-east-2-delta.neon-proxy-scram.yaml vendored

View File

52

.github/helm-values/prod-us-east-2-delta.neon-storage-broker.yaml vendored

View File

61

.github/helm-values/prod-us-west-2-eta.neon-proxy-scram-legacy.yaml vendored

View File

77

.github/helm-values/prod-us-west-2-eta.neon-proxy-scram.yaml vendored

View File

52

.github/helm-values/prod-us-west-2-eta.neon-storage-broker.yaml vendored

View File

8

.github/pull_request_template.md vendored

View File

51

.github/workflows/actionlint.yml vendored Normal file

View File

163

.github/workflows/approved-for-ci-run.yml vendored Normal file

View File

456

.github/workflows/benchmarking.yml vendored

View File

105

.github/workflows/build-build-tools-image.yml vendored Normal file

View File

1253

.github/workflows/build_and_test.yml vendored

View File

51

.github/workflows/check-build-tools-image.yml vendored Normal file

View File

36

.github/workflows/check-permissions.yml vendored Normal file

View File

32

.github/workflows/cleanup-caches-by-a-branch.yml vendored Normal file

View File

179

.github/workflows/deploy-dev.yml vendored

View File

167

.github/workflows/deploy-prod.yml vendored

View File

303

.github/workflows/neon_extra_builds.yml vendored

View File

15

.github/workflows/pg_clients.yml vendored

View File

73

.github/workflows/pin-build-tools-image.yml vendored Normal file

View File

29

.github/workflows/release-notify.yml vendored Normal file

View File

102

.github/workflows/release.yml vendored

View File

133

.github/workflows/trigger-e2e-tests.yml vendored Normal file

View File

5

.gitignore vendored

View File

4

.gitmodules vendored

View File

5

.neon_clippy_args Normal file

View File

18

CODEOWNERS

View File

57

CONTRIBUTING.md

View File

4975

Cargo.lock generated

View File

196

Cargo.toml

View File

35

Dockerfile

View File

207

Dockerfile.build-tools Normal file

View File

713

Dockerfile.compute-node

View File

29

Dockerfile.compute-tools

View File

25

Dockerfile.vm-compute-node

View File