rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-03 20:20:38 +00:00

Author	SHA1	Message	Date
BodoBolero	cf4cdd6cd5	temporarily build only on x64 to test out x64 optflags	2024-06-15 11:31:43 +02:00
BodoBolero	b8940f1685	compare native with x86-64	2024-06-15 11:17:09 +02:00
BodoBolero	c5bc73fff0	test performance difference between generic binaries and optimized binaries	2024-06-15 10:18:42 +02:00
Peter Bendel	46210035c5	add halfvec indexing and queries to periodic pgvector performance tests (#8057 ) ## Problem halfvec data type was introduced in pgvector 0.7.0 and is popular because it allows smaller vectors, smaller indexes and potentially better performance. So far we have not tested halfvec in our periodic performance tests. This PR adds halfvec indexing and halfvec queries to the test.	2024-06-14 18:36:50 +02:00
Alex Chi Z	81892199f6	chore(pageserver): vectored get target_keyspace directly accums (#8055 ) follow up on https://github.com/neondatabase/neon/pull/7904 avoid a layer of indirection introduced by `Vec<Range<Key>>` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-14 11:57:58 -04:00
Alexander Bayandin	83eb02b07a	CI: downgrade docker/setup-buildx-action (#8062 ) ## Problem I've bumped `docker/setup-buildx-action` in #8042 because I wasn't able to reproduce the issue from #7445. But now the issue appears again in https://github.com/neondatabase/neon/actions/runs/9514373620/job/26226626923?pr=8059 The steps to reproduce aren't clear; it probably requires `docker/setup-buildx-action@v3` and rebuilding the image without cache. ## Summary of changes - Downgrade `docker/setup-buildx-action@v3` to `docker/setup-buildx-action@v2`	2024-06-14 11:43:51 +00:00
Arseny Sher	a71f58e69c	Fix test_segment_init_failure. Graceful shutdown broke it.	2024-06-14 14:24:15 +03:00
Conrad Ludgate	e6eb0020a1	update rust to 1.79.0 (#8048 ) ## Problem Rust 1.79 enables new lints by default. ## Summary of changes * update to rust 1.79 * `s/default_features/default-features/` * fix proxy dead code. * fix pageserver dead code.	2024-06-14 13:23:52 +02:00
John Spray	eb0ca9b648	pageserver: improved synthetic size & find_gc_cutoff error handling (#8051 ) ## Problem This PR refactors some error handling to avoid log spam on tenant/timeline shutdown. - "ignoring failure to find gc cutoffs: timeline shutting down." logs (https://github.com/neondatabase/neon/issues/8012) - "synthetic_size_worker: failed to calculate synthetic size for tenant ...: Failed to refresh gc_info before gathering inputs: tenant shutting down", for example here: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9502988669/index.html#suites/3fc871d9ee8127d8501d607e03205abb/1a074a66548bbcea Closes: https://github.com/neondatabase/neon/issues/8012 ## Summary of changes - Refactor: Add a PageReconstructError variant to GcError: this is the only kind of error that find_gc_cutoffs can emit. - Functional change: only ignore shutdown PageReconstructError variant: for other variants, treat it as a real error - Refactor: add a structured CalculateSyntheticSizeError type and use it instead of anyhow::Error in synthetic size calculations - Functional change: while iterating through timelines gathering logical sizes, only drop out if the whole tenant is cancelled: individual timeline cancellations indicate deletion in progress and we can just ignore those.	2024-06-14 11:08:11 +01:00
John Spray	6843fd8f89	storage controller: always wait for tenant detach before delete (#8049 ) ## Problem This test could fail with a timeout waiting for tenant deletions. Tenant deletions could get tripped up on nodes transitioning from offline to online at the moment of the deletion. In a previous reconciliation, the reconciler would skip detaching a particular location because the node was offline, but then when we do the delete the node is marked as online and can be picked as the node to use for issuing a deletion request. This hits the "Unexpectedly still attached" path, which would still work if the caller kept calling DELETE, but if a caller does a DELETE,GET,GET,GET poll, then it doesn't work because the GET calls fail after we've marked the tenant as detached. ## Summary of changes Fix the undesirable storage controller behavior highlighted by this test failure: - Change tenant deletion flow to _always_ wait for reconciliation to succeed: it was unsound to proceed and return 202 if something was still attached, because after the 202 callers can no longer GET the tenant. Stabilize the test: - Add a reconcile_until_idle to the test, so that it will not have reconciliations running in the background while we mark a node online. This test is not meant to be a chaos test: we should test that kind of complexity elsewhere. - This reconcile_until_idle also fixes another failure mode where the test might see a None for a tenant location because a reconcile was mutating it (https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7288/9500177581/index.html#suites/8fc5d1648d2225380766afde7c428d81/4acece42ae00c442/) It remains the case that a motivated tester could produce a situation where a DELETE gives a 500, when precisely the wrong node transitions from offline to available at the precise moment of a deletion (but the 500 is better than returning 202 and then failing all subsequent GETs). Note that nodes don't go through the offline state during normal restarts, so this is super rare. We should eventually fix this by making DELETE to the pageserver implicitly detach the tenant if it's attached, but that should wait until nobody is using the legacy-style deletes (the ones that use 202 + polling)	2024-06-14 10:37:30 +01:00
Alexander Bayandin	edc900028e	CI: Update outdated GitHub Actions (#8042 ) ## Problem We have a number of outdated actions in the CI pipeline, and GitHub complains about some of them. ## Summary of changes - Update `actions/checkout@1` (a really old one) in `vm-compute-node-image` - Update `actions/checkout@3` in `build-build-tools-image` - Update `docker/setup-buildx-action` in all workflows / jobs; it was downgraded in https://github.com/neondatabase/neon/pull/7445, but it seems to work fine now	2024-06-14 10:24:13 +01:00
Heikki Linnakangas	789196572e	Fix test_replica_query_race flakiness (#8038 ) This failed once with `relation "test" does not exist` when trying to run the query on the standby. It's possible that the standby is started before the CREATE TABLE is processed in the pageserver, and the standby opens up for queries before it has received the CREATE TABLE transaction from the primary. To fix, wait for the standby to catch up to the primary before starting to run the queries. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8025/9483658488/index.html	2024-06-14 11:51:12 +03:00
John Spray	425eed24e8	pageserver: refine shutdown handling in secondary download (#8052 ) ## Problem Some code paths during secondary mode download are returning Ok() rather than UpdateError::Cancelled. This is functionally okay, but it means that the end of TenantDownloader::download has a sanity check that the progress is 100% on success, and prints a "Correcting drift..." warning if not. This warning can be emitted in a test, e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9503642976/index.html#/testresult/fff1624ba6adae9e. ## Summary of changes - In secondary download cancellation paths, use Err(UpdateError::Cancelled) rather than Ok(), so that we drop out of the download function and do not reach the progress sanity check.	2024-06-14 09:39:31 +01:00
James Broadhead	f67010109f	extensions: pgvector-0.7.2 (#8037 ) Update pgvector to 0.7.2 Purely mechanical update to pgvector.patch, just as a place to start from	2024-06-14 10:17:43 +02:00
Tristan Partin	0c3e3a8667	Set application_name for internal connections to computes This will help when analyzing the origins of connections to a compute like in [0]. [0]: https://github.com/neondatabase/cloud/issues/14247	2024-06-13 12:06:10 -07:00
Christian Schwarz	82719542c6	fix: vectored get returns incorrect result on inexact materialized page cache hit (#8050 ) # Problem Suppose our vectored get starts with an inexact materialized page cache hit ("cached lsn") that is shadowed by a newer image layer. Like so: ``` <inmemory layers> +-+ < delta layer \| \| -\|-\|----- < image layer \| \| \| \| -\|-\|----- < cached lsn for requested key +_+ ``` The correct visitation order is 1. inmemory layers 2. delta layer records in LSN range `[image_layer.lsn, oldest_inmemory_layer.lsn_range.start)` 3. image layer However, when the vectored get code visits the delta layer, it (incorrectly!) returns with state `Complete`. The reason why it returns is that it calls `on_lsn_advanced` with `self.lsn_range.start`, i.e., the layer's LSN range. Instead, it should use `lsn_range.start`, i.e., the LSN range from the correct visitation order listed above. # Solution Use `lsn_range.start` instead of `self.lsn_range.start`. # Refs discovered by & fixes https://github.com/neondatabase/neon/issues/6967 Co-authored-by: Vlad Lazar <vlad@neon.tech>	2024-06-13 18:20:47 +00:00
Alex Chi Z	d25f7e3dd5	test(pageserver): add test wal record for unit testing (#8015 ) https://github.com/neondatabase/neon/issues/8002 We need mock WAL record to make it easier to write unit tests. This pull request adds such a record. It has `clear` flag and `append` field. The tests for legacy-enhanced compaction are not modified yet and will be part of the next pull request. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-13 09:44:37 -04:00
Anna Khanova	fbccd1e676	Proxy process updated errors (#8026 ) ## Problem Respect errors classification from cplane	2024-06-13 14:42:26 +02:00
Heikki Linnakangas	dc2ab4407f	Fix on-demand SLRU download on standby starting at WAL segment boundary (#8031 ) If a standby is started right after switching to a new WAL segment, the request in the SLRU download request would point to the beginning of the segment (e.g. 0/5000000), while the not-modified-since LSN would point to just after the page header (e.g. 0/5000028). It's effectively the same position, as there cannot be any WAL records in between, but the pageserver rightly errors out on any request where the request LSN < not-modified since LSN. To fix, round down the not-modified since LSN to the beginning of the page like the request LSN. Fixes issue #8030	2024-06-13 00:31:31 +03:00
MMeent	ad0ab3b81b	Fix query error in vm-image-spec.yaml (#8028 ) This query causes metrics exporter to complain about missing data because it can't find the correct column. Issue was introduced with https://github.com/neondatabase/neon/pull/7761	2024-06-12 11:25:04 -07:00
Alex Chi Z	836d1f4af7	test(pageserver): add test keyspace into collect_keyspace (#8016 ) Some test cases add random keys into the timeline, but these keys are not part of `collect_keyspace`, which causes compaction to remove them. The pull request adds a field to supply extra keyspaces during unit tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-12 17:42:43 +00:00
a-masterov	9dda13ecce	Add the image version to the neon-test-extensions image (#8032 ) ## Problem The version was missing in the image name causing the error during the workflow ## Summary of changes Added the version to the image name	2024-06-12 18:15:20 +02:00
Peter Bendel	9ba9f32dfe	Reactivate page bench test in CI after ignoring CopyFail error in pageserver (#8023 ) ## Problem Testcase page bench test_pageserver_max_throughput_getpage_at_latest_lsn had been deactivated because it was flaky. We now ignore CopyFail error messages, as in `270d3be507/test_runner/regress/test_pageserver_getpage_throttle.py (L17-L20)`, and want to reactivate it to see if it is still flaky. ## Summary of changes - reactivate the test in CI - ignore CopyFail error message during page bench test cases	2024-06-12 16:10:57 +02:00
Vlad Lazar	3099e1a787	storcon_cli: do not drain to undesirable nodes (#8027 ) ## Problem The previous code would attempt to drain to unavailable or unschedulable nodes. ## Summary of Changes Remove such nodes from the list of nodes to fill.	2024-06-12 12:33:54 +01:00
a-masterov	f749437cec	Resolve the docker compose problem caused by the extensions tests (#8024 ) ## Problem Merging #7818 broke the docker-compose file: running docker compose is now impossible because the neon-test-extensions:latest image is unavailable. ## Summary of changes Add the latest tag to the neon-test-extensions image and use the profiles feature of the docker-compose file to avoid loading the neon-test-extensions container when it is not needed.	2024-06-12 12:25:13 +02:00
Heikki Linnakangas	0a256148b0	Update documentation on running locally with Docker (#8020 ) - Fix the dockerhub URLs - `neondatabase/compute-node` image has been replaced with Postgres version specific images like `neondatabase/compute-node-v16` - Use TAG=latest in the example, rather than some old tag. That's a sensible default for people to copy-paste - For convenience, use a Postgres connection URL in the `psql` example that also includes the password. That way, there's no need to set up .pgpass - Update the image names in `docker ps` example to match what you get when you follow the example	2024-06-12 07:06:00 +00:00
Heikki Linnakangas	69aa1aca35	Update default Postgres version in docker-compose.yml (#8019 ) Let's be modern.	2024-06-12 09:19:24 +03:00
Heikki Linnakangas	9983ae291b	Another attempt at making test_vm_bits less flaky (#7989 ) - Split the first and second parts of the test to two separate tests - In the first test, disable the aggressive GC, compaction, and autovacuum. They are only needed by the second test. I'd like to get the first test to a point that the VM page is never all-zeros. Disabling autovacuum in the first test is hopefully enough to accomplish that. - Compare the full page images, don't skip page header. After fixing the previous point, there should be no discrepancy. LSN still won't match, though, because of commit `387a36874c`. Fixes issue https://github.com/neondatabase/neon/issues/7984	2024-06-12 09:18:52 +03:00
Sasha Krassovsky	b7a0c2b614	Add On-demand WAL Download to logicalfuncs (#7960 ) We implemented on-demand WAL download for walsender, but other things that may want to read the WAL from safekeepers don't do that yet. This PR makes it do that by adding the same set of hooks to logicalfuncs. Addresses https://github.com/neondatabase/neon/issues/7959 Also relies on: https://github.com/neondatabase/postgres/pull/438 https://github.com/neondatabase/postgres/pull/437 https://github.com/neondatabase/postgres/pull/436	2024-06-11 17:59:32 -07:00
Arpad Müller	27518676d7	Rename S3 scrubber to storage scrubber (#8013 ) The S3 scrubber contains "S3" in its name, but we want to make it generic in terms of which storage is used (#7547). Therefore, rename it to "storage scrubber", following the naming scheme of already existing components "storage broker" and "storage controller". Part of #7547	2024-06-11 22:45:22 +00:00
Heikki Linnakangas	78a59b94f5	Copy editor config for the neon extension from PostgreSQL (#8009 ) This makes IDEs and github diff format the code the same way as PostgreSQL sources, which is the style we try to maintain.	2024-06-11 23:19:18 +03:00
Vlad Lazar	7121db3669	storcon_cli: add 'drain' command (#8007 ) ## Problem We need the ability to prepare a subset of storage controller managed pageservers for decommissioning. The storage controller cannot currently express this in terms of scheduling constraints (it's a pretty special case, so I'm not sure it even should). ## Summary of Changes A new `drain` command is added to `storcon_cli`. It takes a set of nodes to drain and migrates primary attachments outside of said set. Simple round-robin assignment is used under the assumption that nodes outside of the draining set are evenly balanced. Note that secondary locations are not migrated. This is fine for staging, but the migration API will have to be extended for prod in order to allow migration of secondaries as well. I've tested this out against a neon local cluster. The immediate use for this command will be to migrate staging to ARM (AArch64) pageservers. Related https://github.com/neondatabase/cloud/issues/14029	2024-06-11 16:39:38 +00:00
Vlad Lazar	126bcc3794	storcon: track number of attached shards for each node (#8011 ) ## Problem The storage controller does not track the number of shards attached to a given pageserver. This is a requirement for various scheduling operations (e.g. draining and filling will use this to figure out if the cluster is balanced) ## Summary of Changes Track the number of shards attached to each node. Related https://github.com/neondatabase/neon/issues/7387	2024-06-11 16:03:25 +01:00
Alex Chi Z	4c2100794b	feat(pageserver): initial code sketch & test case for combined gc+compaction at gc_horizon (#7948 ) A demo for a building block for compaction. The GC-compaction operation iterates over all layers below or intersecting the GC horizon and does a full layer rewrite of all of them. The end result will be an image layer covering the full keyspace at the GC horizon, and a bunch of delta layers above the GC horizon. This helps us collect the garbage of the test_gc_feedback test case to reduce space amplification. This operation can be manually triggered using an HTTP API or be triggered based on some metrics. Actual method TBD. The test is very basic and it's very likely that most of the algorithm will be rewritten. I would like to get this merged so that I can have a basic skeleton for the algorithm and then make incremental changes. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-11 14:14:51 +00:00
Joonas Koivunen	d3b892e9ad	test: fix duplicated harness name (#8010 ) We need unique tenant harness names in case you want to inspect the results of the last failing run. We are not using any proc macros to get the test name as there is no stable way of doing that, and there will not be one in the future, so we need to fix these duplicates. Also, clean up the duplicated tests to not mix `?` and `unwrap/assert`.	2024-06-11 10:10:05 -04:00
Joonas Koivunen	7515d0f368	fix: stop storing TimelineMetadata in index_part.json as bytes (#7699 ) We've stored metadata as bytes within the `index_part.json` for reasons that were fixed long ago. #7693 added support for reading out normal json serialization of the `TimelineMetadata`. Change the serialization to only write `TimelineMetadata` as json going forward, keeping backward compatibility with reading the metadata as bytes. Because of failure to include `alias = "metadata"` in #7693, one more follow-up is required to make the switch from the old name to `"metadata": <json>`, but that affects only the field name in serialized format. In documentation and naming, an effort is made to add enough warning signs around TimelineMetadata so that it will receive no changes in the future. We can add those fields to `IndexPart` directly instead. Additionally, the path to cleaning up `metadata.rs` is documented in the `metadata.rs` module comment. If we must extend `TimelineMetadata` before that, the duplication suggested in [review comment] is the way to go. [review comment]: https://github.com/neondatabase/neon/pull/7699#pullrequestreview-2107081558	2024-06-11 15:38:54 +03:00
a-masterov	e27ce38619	Add testing for extensions (#7818 ) ## Problem We need automated tests of extensions shipped with Neon to detect possible problems. ## Summary of changes A new image neon-test-extensions is added. Workflow changes to test the shipped extensions are added as well. Currently, the regression tests shipped with the extensions are used. Some extensions, namely rum, timescaledb, rdkit, postgis, pgx_ulid, pgtap, pg_tiktoken, pg_jsonschema, pg_graphql, kq_imcx, wal2json_2_5, are excluded due to problems or absence of internal tests. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-06-11 13:07:51 +02:00
Joonas Koivunen	e46692788e	refactor: Timeline layer flushing (#7993 ) The new features have deteriorated layer flushing, most recently with #7927. Changes: - inline `Timeline::freeze_inmem_layer` to the only caller - carry the TimelineWriterState guard to the actual point of freezing the layer - this allows us to `#[cfg(feature = "testing")]` the assertion added in #7927 - remove duplicate `flush_frozen_layer` in favor of splitting the `flush_frozen_layers_and_wait` - this requires starting the flush loop earlier for `checkpoint_distance < initdb size` tests	2024-06-10 19:34:34 +03:00
Alex Chi Z	a8ca7a1a1d	docs: highlight neon env comes with an initial timeline (#7995 ) Quite a few existing test cases create their own timelines instead of using the default one. This pull request highlights that and hopefully people can write simpler tests in the future. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>	2024-06-10 12:08:16 -04:00
Joonas Koivunen	b52e31c1a4	fix: allow layer flushes more often (#7927 ) As seen with the pgvector 0.7.0 index builds, we can receive large batches of images, leading to very large L0 layers in the range of 1GB. These large layers are produced because we are only able to roll the layer after we have witnessed two different Lsns in a single `DataDirModification::commit`. As the single Lsn batches of images can span over multiple `DataDirModification` lifespans, we will rarely get to write two different Lsns in a single `put_batch` currently. The solution is to remember the TimelineWriterState instead of eagerly forgetting it until we really open the next layer or someone else flushes (while holding the write_guard). Additional changes are test fixes to avoid "initdb image layer optimization" or ignoring initdb layers for assertion. Cc: #7197 because small `checkpoint_distance` will now trigger the "initdb image layer optimization"	2024-06-10 13:50:17 +00:00
Heikki Linnakangas	5a7e285c2c	Simplify scanning compute logs in tests (#7997 ) Implement LogUtils in the Endpoint fixture class, so that the "log_contains" function can be used on compute logs too. Per discussion at: https://github.com/neondatabase/neon/pull/7288#discussion_r1623633803	2024-06-10 12:52:49 +00:00
Christian Schwarz	ae5badd375	Revert "Include openssl and ICU statically linked" (#8003 ) Reverts neondatabase/neon#7956 Rationale: compute incompatibilities Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH Relevant quotes from @hlinnaka > If we go through with the current release candidate, but the compute is pinned, people who create new projects will get that warning, which is silly. To them, it looks like the ICU version was downgraded, because initdb was run with newer version. > We should upgrade the ICU version eventually. And when we do that, users with old projects that use ICU will start to see that warning. I think that's acceptable, as long as we do homework, notify users, and communicate that properly. > When do that, we should to try to upgrade the storage and compute versions at roughly the same time.	2024-06-10 13:20:20 +02:00
Alex Chi Z	3e63d0f9e0	test(pageserver): quantify compaction outcome (#7867 ) A simple API to collect some statistics after compaction to easily understand the result. The tool reads the layer map and analyzes it range by range instead of doing single-key operations, which is more efficient than doing a benchmark to collect the result. It currently computes two key metrics: * Latest data access efficiency, which finds how many delta layers / image layers the system needs to iterate before returning any key in a key range. * (Approximate) PiTR efficiency, as in https://github.com/neondatabase/neon/issues/7770, which is simply the number of delta files in the range. The reasoning is that, assuming no image layer is created, PiTR efficiency is simply the cost of collecting records from the delta layers, plus the replay time. Number of delta files (or in the future, estimated size of reads) is a simple yet efficient way of estimating how much effort the page server needs to reconstruct a page. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-10 10:42:13 +02:00
Rahul Patil	3b647cd55d	Include openssl and ICU statically linked (#7956 ) ## Problem Due to the upcoming End of Life (EOL) for Debian 11, we need to upgrade the base OS for Pageservers from Debian 11 to Debian 12 for security reasons. When deploying a new Pageserver on Debian 12 with the same binary built on Debian 11, we encountered the following errors: ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v16/bin/initdb: error while loading shared libraries: libicuuc.so.67: cannot open shared object file: No such file or directory ``` and ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v14/bin/initdb: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory ``` These issues occur when creating new projects. ## Summary of changes - To address these issues, we configured the PostgreSQL build to use statically linked OpenSSL and ICU libraries. - This resolves the missing shared library errors when running the binaries on Debian 12. Closes: https://github.com/neondatabase/cloud/issues/12648	2024-06-07 17:28:10 +00:00
Tristan Partin	26c68f91f3	Move SQL migrations out of line It makes them much easier to reason about, and allows other SQL tooling to operate on them like language servers, formatters, etc. I also brought back the removed migrations such that we can more easily understand what they were. I included a "-- SKIP" comment describing why those migrations are now skipped. We no longer skip migrations by checking if it is empty, but instead check to see if the migration starts with "-- SKIP".	2024-06-07 08:35:55 -07:00
a-masterov	2078dc827b	CI: copy run-* labels from external contributors' PRs (#7915 ) ## Problem We don't carry run-* labels from external contributors' PRs to ci-run/pr-* PRs. This is inconvenient; we need to sync labels in the approved-for-ci-run workflow. ## Summary of changes Added a step that copies labels from the original PR. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-07 10:04:59 +02:00
Joonas Koivunen	8ee191c271	test_local_only_layers_after_crash: various fixes (#7986 ) In #7927 I needed to fix this test case, but the fixes should be possible to land irrespective of the layer ingestion code change. The most important fix is the behavior if an image layer is found: the assertion message formatting raises a runtime error, which obscures the fact that we found an image layer.	2024-06-07 10:18:05 +03:00
Anastasia Lubennikova	66c6b270f1	Downgrade No response from reading prefetch entry WARNING to LOG	2024-06-06 20:56:19 +01:00
Arthur Petukhovsky	e4e444f59f	Remove random sleep in partial backup (#7982 ) We had a random sleep in the beginning of partial backup task, which was needed for the first partial backup deploy. It helped with gradual upload of segments without causing network overload. Now partial backup is deployed everywhere, so we don't need this random sleep anymore. We also had an issue related to this, in which manager task was not shut down for a long time. The cause of the issue is this random sleep that didn't take timeline cancellation into account, meanwhile manager task waited for partial backup to complete. Fixes https://github.com/neondatabase/neon/issues/7967	2024-06-06 17:54:44 +00:00
Joonas Koivunen	d46d19456d	raise the warning for oversized L0 to 2*target (#7985 ) currently we warn even when going over by a single byte. even that will be hit much more rarely once #7927 lands, but get this in earlier. rationale for 2*checkpoint_distance: anything smaller is not really worth a warn. we have a global allowed_error for this warning, which still cannot be removed, not even with #7927, because of many tests with very small `checkpoint_distance`.	2024-06-06 20:18:39 +03:00
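
The fix described in commit dc2ab4407f (#8031) above rounds the not-modified-since LSN down to the start of its page so that it no longer exceeds the request LSN. A minimal Rust sketch of that arithmetic follows. It assumes a WAL page size of 8192 bytes (PostgreSQL's default `XLOG_BLCKSZ`) and treats LSNs as plain `u64` values; the constant and both function names are hypothetical and are not taken from the neon source.

```rust
/// Assumed WAL page size in bytes (PostgreSQL's default XLOG_BLCKSZ).
const WAL_PAGE_SIZE: u64 = 8192;

/// Round an LSN down to the start of the WAL page that contains it.
/// Hypothetical helper, for illustration only.
fn round_down_to_wal_page(lsn: u64) -> u64 {
    lsn - (lsn % WAL_PAGE_SIZE)
}

/// Hypothetical check mirroring the rule quoted in the commit message:
/// a request is rejected when request LSN < not-modified-since LSN.
fn is_valid_request(request_lsn: u64, not_modified_since: u64) -> bool {
    request_lsn >= not_modified_since
}
```

With the values from the commit message, a request at `0x5000000` with a not-modified-since LSN of `0x5000028` fails this check, while rounding the latter down with `round_down_to_wal_page` makes the two positions equal and the check pass.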

5428 Commits
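
Commit 26c68f91f3 in the list above changes how skipped migrations are detected: instead of checking for an empty migration, it checks whether the migration starts with "-- SKIP". A minimal Rust sketch of such a check is shown here; the function names and the representation of migrations as string slices are assumptions made for illustration and do not reflect the actual code in the repository.

```rust
/// Hypothetical helper: a migration counts as skipped when its SQL text
/// starts with the "-- SKIP" marker comment.
fn is_skipped(migration_sql: &str) -> bool {
    migration_sql.starts_with("-- SKIP")
}

/// Return only the migrations that should actually run, in their original order.
fn runnable_migrations<'a>(migrations: &[&'a str]) -> Vec<&'a str> {
    migrations
        .iter()
        .copied()
        .filter(|sql| !is_skipped(sql))
        .collect()
}
```

Keeping a skipped migration in the list with a marker comment, rather than emptying it, preserves both its position and its original text, which matches the commit's stated goal of making old migrations easier to understand.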