rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-07 06:00:38 +00:00

Author	SHA1	Message	Date
Christian Schwarz	c2f8fdccd7	ingest: rate-limited warning if WAL commit timestamps lags for > wait_lsn_timeout (#8839 ) refs https://github.com/neondatabase/cloud/issues/13750 The logging in this commit will make it easier to detect lagging ingest. We're trusting compute timestamps --- ideally we'd use SK timestmaps instead. But trusting the compute timestamp is ok for now.	2024-08-29 12:06:00 +01:00
Konstantin Knizhnik	cfa45ff5ee	Undo walloging replorgin file on checkpoint (#8794 ) ## Problem See #8620 ## Summary of changes Remove walloping of replorigin file because it is reconstructed by PS ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-08-29 07:45:33 +03:00
Andrew Rudenko	acc075071d	feat(compute_ctl): add periodic `lease lsn` requests for static computes (#7994 ) Part of #7497 ## Problem Static computes pinned at some fix LSN could be created initially within PITR interval but eventually go out it. To make sure that Static computes are not affected by GC, we need to start using the LSN lease API (introduced in #8084) in compute_ctl. ## Summary of changes compute_ctl - Spawn a thread for when a static compute starts to periodically ping pageserver(s) to make LSN lease requests. - Add `test_readonly_node_gc` to test if static compute can read all pages without error. - (test will fail on main without the code change here) page_service - `wait_or_get_last_lsn` will now allow `request_lsn` less than `latest_gc_cutoff_lsn` to proceed if there is a lease on `request_lsn`. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2024-08-28 19:09:26 +00:00
Christian Schwarz	9627747d35	bypass `PageCache` for `InMemoryLayer` + avoid `Value::deser` on L0 flush (#8537 ) Part of [Epic: Bypass PageCache for user data blocks](https://github.com/neondatabase/neon/issues/7386). # Problem `InMemoryLayer` still uses the `PageCache` for all data stored in the `VirtualFile` that underlies the `EphemeralFile`. # Background Before this PR, `EphemeralFile` is a fancy and (code-bloated) buffered writer around a `VirtualFile` that supports `blob_io`. The `InMemoryLayerInner::index` stores offsets into the `EphemeralFile`. At those offset, we find a varint length followed by the serialized `Value`. Vectored reads (`get_values_reconstruct_data`) are not in fact vectored - each `Value` that needs to be read is read sequentially. The `will_init` bit of information which we use to early-exit the `get_values_reconstruct_data` for a given key is stored in the serialized `Value`, meaning we have to read & deserialize the `Value` from the `EphemeralFile`. The L0 flushing also needs to re-determine the `will_init` bit of information, by deserializing each value during L0 flush. # Changes 1. Store the value length and `will_init` information in the `InMemoryLayer::index`. The `EphemeralFile` thus only needs to store the values. 2. For `get_values_reconstruct_data`: - Use the in-memory `index` figures out which values need to be read. Having the `will_init` stored in the index enables us to do that. - View the EphemeralFile as a byte array of "DIO chunks", each 512 bytes in size (adjustable constant). A "DIO chunk" is the minimal unit that we can read under direct IO. - Figure out which chunks need to be read to retrieve the serialized bytes for thes values we need to read. - Coalesce chunk reads such that each DIO chunk is only read once to serve all value reads that need data from that chunk. - Merge adjacent chunk reads into larger `EphemeralFile::read_exact_at_eof_ok` of up to 128k (adjustable constant). 3. The new `EphemeralFile::read_exact_at_eof_ok` fills the IO buffer from the underlying VirtualFile and/or its in-memory buffer. 4. The L0 flush code is changed to use the `index` directly, `blob_io` 5. We can remove the `ephemeral_file::page_caching` construct now. The `get_values_reconstruct_data` changes seem like a bit overkill but they are necessary so we issue the equivalent amount of read system calls compared to before this PR where it was highly likely that even if the first PageCache access was a miss, remaining reads within the same `get_values_reconstruct_data` call from the same `EphemeralFile` page were a hit. The "DIO chunk" stuff is truly unnecessary for page cache bypass, but, since we're working on [direct IO](https://github.com/neondatabase/neon/issues/8130) and https://github.com/neondatabase/neon/issues/8719 specifically, we need to do _something_ like this anyways in the near future. # Alternative Design The original plan was to use the `vectored_blob_io` code it relies on the invariant of Delta&Image layers that `index order == values order`. Further, `vectored_blob_io` code's strategy for merging IOs is limited to adjacent reads. However, with direct IO, there is another level of merging that should be done, specifically, if multiple reads map to the same "DIO chunk" (=alignment-requirement-sized and -aligned region of the file), then it's "free" to read the chunk into an IO buffer and serve the two reads from that buffer. => https://github.com/neondatabase/neon/issues/8719 # Testing / Performance Correctness of the IO merging code is ensured by unit tests. Additionally, minimal tests are added for the `EphemeralFile` implementation and the bit-packed `InMemoryLayerIndexValue`. Performance testing results are presented below. All pref testing done on my M2 MacBook Pro, running a Linux VM. It's a release build without `--features testing`. We see definitive improvement in ingest performance microbenchmark and an ad-hoc microbenchmark for getpage against InMemoryLayer. ``` baseline: commit `7c74112b2a` origin/main HEAD: `ef1c55c52e` ``` <details> ``` cargo bench --bench bench_ingest -- 'ingest 128MB/100b seq, no delta' baseline ingest-small-values/ingest 128MB/100b seq, no delta time: [483.50 ms 498.73 ms 522.53 ms] thrpt: [244.96 MiB/s 256.65 MiB/s 264.73 MiB/s] HEAD ingest-small-values/ingest 128MB/100b seq, no delta time: [479.22 ms 482.92 ms 487.35 ms] thrpt: [262.64 MiB/s 265.06 MiB/s 267.10 MiB/s] ``` </details> We don't have a micro-benchmark for InMemoryLayer and it's quite cumbersome to add one. So, I did manual testing in `neon_local`. <details> ``` ./target/release/neon_local stop rm -rf .neon ./target/release/neon_local init ./target/release/neon_local start ./target/release/neon_local tenant create --set-default ./target/release/neon_local endpoint create foo ./target/release/neon_local endpoint start foo psql 'postgresql://cloud_admin@127.0.0.1:55432/postgres' psql (13.16 (Debian 13.16-0+deb11u1), server 15.7) CREATE TABLE wal_test ( id SERIAL PRIMARY KEY, data TEXT ); DO $$ DECLARE i INTEGER := 1; BEGIN WHILE i <= 500000 LOOP INSERT INTO wal_test (data) VALUES ('data'); i := i + 1; END LOOP; END $$; -- => result is one L0 from initdb and one 137M-sized ephemeral-2 DO $$ DECLARE i INTEGER := 1; random_id INTEGER; random_record wal_test%ROWTYPE; start_time TIMESTAMP := clock_timestamp(); selects_completed INTEGER := 0; min_id INTEGER := 1; -- Minimum ID value max_id INTEGER := 100000; -- Maximum ID value, based on your insert range iters INTEGER := 100000000; -- Number of iterations to run BEGIN WHILE i <= iters LOOP -- Generate a random ID within the known range random_id := min_id + floor(random() * (max_id - min_id + 1))::int; -- Select the row with the generated random ID SELECT * INTO random_record FROM wal_test WHERE id = random_id; -- Increment the select counter selects_completed := selects_completed + 1; -- Check if a second has passed IF EXTRACT(EPOCH FROM clock_timestamp() - start_time) >= 1 THEN -- Print the number of selects completed in the last second RAISE NOTICE 'Selects completed in last second: %', selects_completed; -- Reset counters for the next second selects_completed := 0; start_time := clock_timestamp(); END IF; -- Increment the loop counter i := i + 1; END LOOP; END $$; ./target/release/neon_local stop baseline: commit `7c74112b2a` origin/main NOTICE: Selects completed in last second: 1864 NOTICE: Selects completed in last second: 1850 NOTICE: Selects completed in last second: 1851 NOTICE: Selects completed in last second: 1918 NOTICE: Selects completed in last second: 1911 NOTICE: Selects completed in last second: 1879 NOTICE: Selects completed in last second: 1858 NOTICE: Selects completed in last second: 1827 NOTICE: Selects completed in last second: 1933 ours NOTICE: Selects completed in last second: 1915 NOTICE: Selects completed in last second: 1928 NOTICE: Selects completed in last second: 1913 NOTICE: Selects completed in last second: 1932 NOTICE: Selects completed in last second: 1846 NOTICE: Selects completed in last second: 1955 NOTICE: Selects completed in last second: 1991 NOTICE: Selects completed in last second: 1973 ``` NB: the ephemeral file sizes differ by ca 1MiB, ours being 1MiB smaller. </details> # Rollout This PR changes the code in-place and is not gated by a feature flag.	2024-08-28 18:31:41 +00:00
Alex Chi Z.	63a0d0d039	fix(storage-scrubber): make retry error into warnings (#8851 ) We get many HTTP connect timeout errors from scrubber logs, and it turned out that the scrubber is retrying, and this is not an actual error. In the future, we should revisit all places where we log errors in the storage scrubber, and only error when necessary (i.e., errors that might need manual fixing) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-28 13:39:21 -04:00
Vlad Lazar	793b5061ec	storcon: track pageserver availability zone (#8852 ) ## Problem In order to build AZ aware scheduling, the storage controller needs to know what AZ pageservers are in. Related https://github.com/neondatabase/neon/issues/8848 ## Summary of changes This patch set adds a new nullable column to the `nodes` table: `availability_zone_id`. The node registration request is extended to include the AZ id (pageservers already have this in their `metadata.json` file). If the node is already registered, then we update the persistent and in-memory state with the provided AZ. Otherwise, we add the node with the AZ to begin with. A couple assumptions are made here: 1. Pageserver AZ ids are stable 2. AZ ids do not change over time Once all pageservers have a configured AZ, we can remove the optionals in the code and make the database column not nullable.	2024-08-28 18:23:55 +01:00
Yuchen Liang	a889a49e06	pageserver: do vectored read on each dio-aligned section once (#8763 ) Part of #8130, closes #8719. ## Problem Currently, vectored blob io only coalesce blocks if they are immediately adjacent to each other. When we switch to Direct IO, we need a way to coalesce blobs that are within the dio-aligned boundary but has gap between them. ## Summary of changes - Introduces a `VectoredReadCoalesceMode` for `VectoredReadPlanner` and `StreamingVectoredReadPlanner` which has two modes: - `AdjacentOnly` (current implementation) - `Chunked(<alignment requirement>)` - New `ChunkedVectorBuilder` that considers batching `dio-align`-sized read, the start and end of the vectored read will respect `stx_dio_offset_align` / `stx_dio_mem_align` (`vectored_read.start` and `vectored_read.blobs_at.first().start_offset` will be two different value). - Since we break the assumption that blobs within single `VectoredRead` are next to each other (implicit end offset), we start to store blob end offsets in the `VectoredRead`. - Adapted existing tests to run in both `VectoredReadCoalesceMode`. - The io alignment can also be live configured at runtime. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-28 15:54:42 +01:00
Vlad Lazar	5eb7322d08	docs: rolling storage controller restarts RFC (#8310 ) ## Problem Storage controller upgrades (restarts, more generally) can cause multi-second availability gaps. While the storage controller does not sit on the main data path, it's generally not acceptable to block management requests for extended periods of time (e.g. https://github.com/neondatabase/neon/issues/8034). ## Summary of changes This RFC describes the issues around the current storage controller restart procedure and describes an implementation which reduces downtime to a few milliseconds on the happy path. Related https://github.com/neondatabase/neon/issues/7797	2024-08-28 13:56:14 +00:00
Joonas Koivunen	c0ba18a112	bench: flush before shutting down (#8844 ) while driving by: - remove the extra tenant - remove the extra timelines implement this by turning the pg_compare to a yielding fixture. evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10571779162/index.html#suites/9681106e61a1222669b9d22ab136d07b/3bbe9f007b3ffae1/	2024-08-28 10:20:43 +01:00
John Spray	992a951b5e	.github: direct feature requests to the feedback form (#8849 ) ## Problem When folks open github issues for feature requests, they don't have a clear recipient: engineers usually see them during bug triage, but that doesn't necessarily get the work prioritized. ## Summary of changes Give end users a clearer path to submitting feedback to Neon	2024-08-28 09:22:19 +01:00
Heikki Linnakangas	c5ef779801	tests: Remove unnecessary entries from list of allowed errors (#8199 ) The "manual_gc" context was removed in commit `be0c73f8e7`. The code that generated the other error was removed in commit `9a6c0be823`.	2024-08-27 17:47:05 +01:00
Heikki Linnakangas	2d10306f7a	Remove support for pageserver <-> compute protocol version 1 (#8774 ) Protocol version 2 has been the default for a while now, and we no longer have any computes running in production that used protocol version 1. This completes the migration by removing support for v1 in both the pageserver and the compute. See issue #6211.	2024-08-27 18:36:33 +03:00
Alexey Kondratov	9b9f90c562	fix(walproposer): Do not restart on safekeepers reordering (#8840 ) ## Problem Currently, we compare `neon.safekeepers` values as is, so we unnecessarily restart walproposer even if safekeepers set didn't change. This leads to errors like: ```log FATAL: [WP] restarting walproposer to change safekeeper list from safekeeper-8.us-east-2.aws.neon.tech:6401,safekeeper-11.us-east-2.aws.neon.tech:6401,safekeeper-10.us-east-2.aws.neon.tech:6401 to safekeeper-11.us-east-2.aws.neon.tech:6401,safekeeper-8.us-east-2.aws.neon.tech:6401,safekeeper-10.us-east-2.aws.neon.tech:6401 ``` ## Summary of changes Split the GUC into the list of individual safekeepers and properly compare. We could've done that somewhere on the upper level, e.g., control plane, but I think it's still better when the actual config consumer is smarter and doesn't rely on upper levels.	2024-08-27 15:49:47 +02:00
Folke Behrens	52cb33770b	proxy: Rename backend types and variants as prep for refactor (#8845 ) * AuthBackend enum to AuthBackendType * BackendType enum to Backend * Link variants to Web * Adjust messages, comments, etc.	2024-08-27 14:12:42 +02:00
Conrad Ludgate	12850dd5e9	proxy: remove dead code (#8847 ) By marking everything possible as pub(crate), we find a few dead code candidates.	2024-08-27 12:00:35 +01:00
a-masterov	5d527133a3	Fix the pg_hintplan flakyness (#8834 ) ## Problem pg_hintplan test seems to be flaky, sometimes it fails, while usually it passes ## Summary of changes The regression test is changed to filter out the Neon service queries. The expected file is changed as well. ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-08-27 12:39:42 +02:00
Arseny Sher	09362b6363	safekeeper: reorder routes and their handlers. Routes and their handlers were in a bit different order in 1) routes list 2) their implementation 3) python client 4) openapi spec, making addition of new ones intimidating. Make it the same everywhere, roughly lexicographically but preserving some of existing logic. No functional changes.	2024-08-27 07:37:55 +03:00
Alexey Kondratov	7820c572e7	fix(sql-exporter): Remove tenant_id from compute_logical_snapshot_files It appeared to be that it's already auto-added to all metrics [1] [1]: `3a907c317c/apps/base/ext-vmagent/vmagent.yaml (L43)`	2024-08-27 00:51:23 +02:00
Alexey Kondratov	bf03713fa1	fix(sql-exporter): Fix typo in gauge In `f4b3c317f` there was a typo and I missed that on review	2024-08-27 00:51:23 +02:00
Alex Chi Z.	0f65684263	feat(pageserver): use split layer writer in gc-compaction (#8608 ) Part of #8002, the final big PR in the batch. ## Summary of changes This pull request uses the new split layer writer in the gc-compaction. * It changes how layers are split. Previously, we split layers based on the original split point, but this creates too many layers (test_gc_feedback has one key per layer). * Therefore, we first verify if the layer map can be processed by the current algorithm (See https://github.com/neondatabase/neon/pull/8191, it's basically the same check) * On that, we proceed with the compaction. This way, it creates a large enough layer close to the target layer size. * Added a new set of functions `with_discard` in the split layer writer. This helps us skip layers if we are going to produce the same persistent key. * The delta writer will keep the updates of the same key in a single file. This might create a super large layer, but we can optimize it later. * The split layer writer is used in the gc-compaction algorithm, and it will split layers based on size. * Fix the image layer summary block encoded the wrong key range. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-26 14:19:47 -04:00
Christian Schwarz	97241776aa	pageserver: startup: ensure local disk state is durable (#8835 ) refs https://github.com/neondatabase/neon/issues/6989 Problem ------- After unclean shutdown, we get restarted, start reading the local filesystem, and make decisions based on those reads. However, some of the data might have not yet been fsynced when the unclean shutdown completed. Durability matters even though Pageservers are conceptually just a cache of state in S3. For example: - the cloud control plane is no control loop => pageserver responses to tenant attachmentm, etc, needs to be durable. - the storage controller does not rely on this (as much?) - we don't have layer file checksumming, so, downloaded+renamed but not fsynced layer files are technically not to be trusted - https://github.com/neondatabase/neon/issues/2683 Solution -------- `syncfs` the tenants directory during startup, before we start reading from it. This is a bit overkill because we do remove some temp files (InMemoryLayer!) later during startup. Further, these temp files are particularly likely to be dirty in the kernel page cache. However, we don't want to refactor that cleanup code right now, and the dirty data on pageservers is generally not that high. Last, with [direct IO](https://github.com/neondatabase/neon/issues/8130) we're going to have near-zero kernel page cache anyway quite soon.	2024-08-26 18:07:55 +02:00
Arpad Müller	2dd53e7ae0	Timeline archival test (#8824 ) This PR: * Implements the rule that archived timelines require all of their children to be archived as well, as specified in the RFC. There is no fancy locking mechanism though, so the precondition can still be broken. As a TODO for later, we still allow unarchiving timelines with archived parents. * Adds an `is_archived` flag to `TimelineInfo` * Adds timeline_archival_config to `PageserverHttpClient` * Adds a new `test_timeline_archive` test, loosely based on `test_timeline_delete` Part of #8088	2024-08-26 17:30:19 +02:00
Folke Behrens	d6eede515a	proxy: clippy lints: handle some low hanging fruit (#8829 ) Should be mostly uncontroversial ones.	2024-08-26 15:16:54 +02:00
Alexey Kondratov	d48229f50f	feat(compute): Introduce new compute_subscriptions_count metric (#8796 ) ## Problem We need some metric to sneak peek into how many people use inbound logical replication (Neon is a subscriber). ## Summary of changes This commit adds a new metric `compute_subscriptions_count`, which is number of subscriptions grouped by enabled/disabled state. Resolves: neondatabase/cloud#16146	2024-08-26 14:34:18 +02:00
Jakub Kołodziejczak	cdfdcd3e5d	chore: improve markdown formatting (#8825 ) fixes: ![Screenshot_2024-08-25_16-25-30](https://github.com/user-attachments/assets/c993309b-6c2d-4938-9fd0-ce0953fc63ff) fixes: ![Screenshot_2024-08-25_16-26-29](https://github.com/user-attachments/assets/cf497f4a-d9e3-45a6-a1a5-7e215d96d022)	2024-08-25 16:33:45 +01:00
Conrad Ludgate	06795c6b9a	proxy: new local-proxy application (#8736 ) Add binary for local-proxy that uses the local auth backend. Runs only the http serverless driver support and offers config reload based on a config file and SIGHUP	2024-08-23 22:32:10 +01:00
Conrad Ludgate	701cb61b57	proxy: local auth backend (#8806 ) Adds a Local authentication backend. Updates http to extract JWT bearer tokens and passes them to the local backend to validate.	2024-08-23 18:48:06 +00:00
John Spray	0aa1450936	storage controller: enable timeline CRUD operations to run concurrently with reconciliation & make them safer (#8783 ) ## Problem - If a reconciler was waiting to be able to notify computes about a change, but the control plane was waiting for the controller to finish a timeline creation/deletion, the overall system can deadlock. - If a tenant shard was migrated concurrently with a timeline creation/deletion, there was a risk that the timeline operation could be applied to a non-latest-generation location, and thereby not really be persistent. This has never happened in practice, but would eventually happen at scale. Closes: #8743 ## Summary of changes - Introduce `Service::tenant_remote_mutation` helper, which looks up shards & generations and passes them into an inner function that may do remote I/O to pageservers. Before returning success, this helper checks that generations haven't incremented, to guarantee that changes are persistent. - Convert tenant_timeline_create, tenant_timeline_delete, and tenant_timeline_detach_ancestor to use this helper. - These functions no longer block on ensure_attached unless the tenant was never attached at all, so they should make progress even if we can't complete compute notifications. This increases the database load from timeline/create operations, but only with cheap read transactions.	2024-08-23 18:56:05 +01:00
John Spray	b65a95f12e	controller: use PageserverUtilization for scheduling (#8711 ) ## Problem Previously, the controller only used the shard counts for scheduling. This works well when hosting only many-sharded tenants, but works much less well when hosting single-sharded tenants that have a greater deviation in size-per-shard. Closes: https://github.com/neondatabase/neon/issues/7798 ## Summary of changes - Instead of UtilizationScore, carry the full PageserverUtilization through into the Scheduler. - Use the PageserverUtilization::score() instead of shard count when ordering nodes in scheduling. Q: Why did test_sharding_split_smoke need updating in this PR? A: There's an interesting side effect during shard splits: because we do not decrement the shard count in the utilization when we de-schedule the shards from before the split, the controller will now prefer to pick _different_ nodes for shards compared with which ones held secondaries before the split. We could use our knowledge of splitting to fix up the utilizations more actively in this situation, but I'm leaning toward leaving the code simpler, as in practical systems the impact of one shard on the utilization of a node should be fairly low (single digit %).	2024-08-23 18:32:56 +01:00
Conrad Ludgate	c1cb7a0fa0	proxy: flesh out JWT verification code (#8805 ) This change adds in the necessary verification steps for the JWT payload, and adds per-role querying of JWKs as needed for #8736	2024-08-23 18:01:02 +01:00
Alex Chi Z.	f4cac1f30f	impr(pageserver): error if keys are unordered in merge iter (#8818 ) In case of corrupted delta layers, we can detect the corruption and bail out the compaction. ## Summary of changes * Detect wrong delta desc of key range * Detect unordered deltas Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-23 16:38:42 +00:00
Conrad Ludgate	612b643315	update diesel (#8816 ) https://rustsec.org/advisories/RUSTSEC-2024-0365	2024-08-23 15:28:22 +00:00
Vlad Lazar	bcc68a7866	storcon_cli: add support for drain and fill operations (#8791 ) ## Problem We have been naughty and curl-ed storcon to fix-up drains and fills. ## Summary of changes Add support for starting/cancelling drain/fill operations via `storcon_cli`.	2024-08-23 14:48:06 +01:00
Joonas Koivunen	73286e6b9f	test: copy dict to avoid error on retry (#8811 ) there is no "const" in python, so when we modify the global dict, it will remain that way on the retry. fix to not have it influence other tests which might be run on the same runner. evidence: <https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8625/10513146742/index.html#/testresult/453c4ce05ada7496>	2024-08-23 14:43:08 +01:00
Alex Chi Z.	bc8cfe1b55	fix(pageserver): l0 check criteria (#8797 ) close https://github.com/neondatabase/neon/issues/8579 ## Summary of changes The `is_l0` check now takes both layer key range and the layer type. This allows us to have image layers covering the full key range in btm-most compaction (upcoming PR). However, we still don't allow delta layers to cover the full key range, and I will make btm-most compaction to generate delta layers with the key range of the keys existing in the layer instead of `Key::MIN..Key::HACK_MAX` (upcoming PR). Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-23 09:42:45 -04:00
Alex Chi Z.	6a74bcadec	feat(pageserver): remove features=testing restriction for compact (#8815 ) A small PR to make it possible to run force compaction in staging for btm-gc compaction testing. Part of https://github.com/neondatabase/neon/issues/8002 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-23 14:32:00 +01:00
Alexander Bayandin	e62cd9e121	CI(autocomment): add arch to build type (#8809 ) ## Problem Failed / flaky tests for different arches don't have any difference in GitHub Autocomment ## Summary of changes - Add arch to build type for GitHub autocomment	2024-08-23 14:29:11 +01:00
Arpad Müller	e80ab8fd6a	Update serde_json to 1.0.125 (#8813 ) Updates `serde_json` to `1.0.125`, rolling out speedups added by a serde_json contributor. Release [link](https://github.com/serde-rs/json/releases/tag/1.0.125). Blog post [link](https://purplesyringa.moe/blog/i-sped-up-serde-json-strings-by-20-percent/).	2024-08-23 12:14:14 +01:00
MMeent	d8ca495eae	Require poetry >=1.8 (#8812 ) This was already a requirement for installing the python packages after https://github.com/neondatabase/neon/pull/8609 got merged, so this updates the documentation to reflect that.	2024-08-23 11:48:26 +01:00
Heikki Linnakangas	dbdb8a1187	Document how to use "git merge" for PostgreSQL minor version upgrades. (#8692 ) Our new policy is to use the "rebase" method and slice all the Neon commits into a nice patch set when doing a new major version, and use "merge" method on minor version upgrades on the release branches. "git merge" preserves the git history of Neon commits on the Postgres branches. While it's nice to rebase all the Neon changes to a logical patch set against upstream, having to do it between every minor release is a fair amount work, and it loses the history, and is more error-prone.	2024-08-23 09:15:55 +03:00
Tristan Partin	f7ab3ffcb7	Check that TERM != dumb before using colors in pre-commit.py Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-08-22 18:03:45 -05:00
Tristan Partin	2f8d548a12	Update Postgres 16 to 16.4 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-08-22 18:03:45 -05:00
Tristan Partin	66db381dc9	Update Postgres 15 to 15.8 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-08-22 18:03:45 -05:00
Tristan Partin	6744ed19d8	Update Postgres 14 to 14.13 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-08-22 18:03:45 -05:00
Tristan Partin	ae63ac7488	Write messages field by field instead of bytes sheet in test_simple_sync_safekeepers Co-authored-by: Arseny Sher <ars@neon.tech>	2024-08-22 18:03:45 -05:00
Alex Chi Z.	6eb638f4b3	feat(pageserver): warn on aux v1 tenants + default to v2 (#8625 ) part of https://github.com/neondatabase/neon/issues/8623 We want to discover potential aux v1 customers that we might have missed from the migrations. ## Summary of changes Log warnings on basebackup, load timeline, and the first put_file. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-22 22:31:38 +01:00
Konstantin Knizhnik	7a485b599b	Fix race condition in LRU list update in get_cached_relsize (#8807 ) ## Problem See https://neondb.slack.com/archives/C07J14D8GTX/p1724347552023709 Manipulations with LRU list in relation size cache are performed under shared lock ## Summary of changes Take exclusive lock ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-08-22 23:53:37 +03:00
Joonas Koivunen	b1c457898b	test_compatibility: flush in the end (#8804 ) `test_forward_compatibility` is still often failing at graceful shutdown. Fix this by explicit flush before shutdown. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10506613738/index.html#testresult/5e7111907f7ecfb2/ Cc: #8655 and #8708 Previous attempt: #8787	2024-08-22 16:38:03 +01:00
Folke Behrens	1a9d559be8	proxy: Enable stricter/pedantic clippy checks (#8775 ) Create a list of currently allowed exceptions that should be reduced over time.	2024-08-22 13:29:05 +02:00
Alexey Kondratov	0e6c0d47a5	Revert "Use sycnhronous commit for logical replicaiton worker (#8645 )" (#8792 ) This reverts commit `cbe8c77997`. This change was originally made to test a hypothesis, but after that, the proper fix #8669 was merged, so now it's not needed. Moreover, the test is still flaky, so probably this bug was not a reason of the flakiness. Related to #8097	2024-08-22 12:52:36 +02:00

1 2 3 4 5 ...

5965 Commits