rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-03 20:20:38 +00:00

Author	SHA1	Message	Date
Alexander Bayandin	fa52cd575e	Remove old tests results and old coverage collection (#6376 ) ## Problem We have switched to new test results and new coverage results, so no need to collect these data in old formats. ## Summary of changes - Remove "Upload coverage report" for old coverage report - Remove "Store Allure test stat in the DB" for old test results format	2024-02-01 13:36:55 +00:00
Vlad Lazar	d2c410c748	pageserver_api: remove overlaps from KeySpace (#6544 ) This commit adds a function to `KeySpace` which updates a key space by removing all overlaps with a second key space. This can involve splitting or removing existing ranges. The implementation is not particularly efficient: O(M * N * log(N)) where N is the number of ranges in the current key space and M is the number of ranges in the key space we are checking against. In practice, this shouldn't matter much since, in the short term, the only caller of this function will be the vectored read path and the number of key spaces involved will be small. This follows from the upper bound placed on the number of keys accepted by the vectored read path. A couple of other small utility functions are added. They'll be used by the vectored search path as well.	2024-02-01 13:14:35 +00:00
Vlad Lazar	221531c9db	pageserver: lift ancestor timeline logic from read path (#6543 ) When the read path needs to follow a key into the ancestor timeline, it needs to wait for said ancestor to become active and aware of its branching lsn. The logic is lifted into a separate function with its own new error type. This is done because the vectored read path needs the same logic. It's also the reason for the newly introduced error type. When we switch the read path to proxy into `get_vectored`, we can remove the duplicated variants from `PageReconstructError`.	2024-02-01 10:35:18 +00:00
Christian Schwarz	4c173456dc	pagebench: fix percentiles reporting (#6547 ) Before this patch, pagebench was always showing the same value. refs https://github.com/neondatabase/neon/issues/6509	2024-01-31 23:29:48 +00:00
Christian Schwarz	e82625b77d	refactor(pageserver main): signal handling (#6554 ) This refactoring makes it easier to experimentally replace BACKGROUND_RUNTIME with a single-threaded runtime. Found this useful [during benchmarking](https://github.com/neondatabase/neon/pull/6555).	2024-01-31 23:25:57 +00:00
Christian Schwarz	0ac1e71524	update tokio-epoll-uring (#6558 ) to pull in fixes for https://github.com/neondatabase/tokio-epoll-uring/issues/37	2024-01-31 22:54:54 +00:00
Anna Khanova	271133d960	Proxy: reduce number of get role secret calls (#6557 ) ## Problem Right now, if the get_role_secret response wasn't cached (e.g. the cache already reached max size), it will send a second, identical request. ## Summary of changes Avoid the needless request.	2024-01-31 22:16:56 +00:00
Joonas Koivunen	3d5fab127a	rewrite Gate impl for better observability (#6542 ) changes: - two messages instead of message every second when gate was closing - replace the gate name string by using a pointer - slow GateGuards are likely to log who they were (see example) example found in regress tests: <https://github.com/neondatabase/neon/pull/6542#issuecomment-1919009256>	2024-01-31 22:15:58 +00:00
Joonas Koivunen	66719d7eaf	logging: fix span usage (#6549 ) Fixes some duplication due to extra or misconfigured `#[instrument]`, while filling in the `timeline_id` to delete timeline flow calls.	2024-01-31 20:52:00 +00:00
Konstantin Knizhnik	9a9d9beaee	Download SLRU segments on demand (#6151 ) ## Problem See https://github.com/neondatabase/cloud/issues/8673 ## Summary of changes Download missed SLRU segments from page server --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-31 21:39:18 +02:00
John Spray	2bfc831c60	control_plane/attachment_service: make --path optional (#6545 ) ## Problem The `--path` argument is only used in testing, for compat tests that use a JSON snapshot of state rather than the postgres database. In regular deployments, it should be omitted (currently one has to specify `--path ""`) ## Summary of changes Make `--path` optional.	2024-01-31 17:02:41 +00:00
Joonas Koivunen	799db161d3	tests: support for running on single pg version, use in one place (#6525 ) Some tests which are unit test alike do not need to run on different pg versions. Logging test is one of them which I found for unrelated reasons. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-01-31 17:37:25 +02:00
Arpad Müller	47380be12d	Remove version param from get_lsn_by_timestamp (#6551 ) This removes the last remnants of the version param added by #5608 , concluding the transition plan laid out in https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911 . It follows PR https://github.com/neondatabase/cloud/pull/9202, which we now assume has been deployed to all environments. Full history: * https://github.com/neondatabase/neon/pull/5608 * https://github.com/neondatabase/cloud/pull/7553 * https://github.com/neondatabase/neon/pull/6178 * https://github.com/neondatabase/cloud/pull/9202	2024-01-31 15:30:19 +01:00
Conrad Ludgate	c7b02ce8ec	proxy: use jemalloc (#6531 ) ## Summary of changes Experiment with jemalloc in proxy	2024-01-31 14:51:11 +01:00
John Spray	4010adf653	control_plane/attachment_service: complete APIs (#6394 ) Depends on: https://github.com/neondatabase/neon/pull/6468 ## Problem The sharding service will be used as a "virtual pageserver" by the control plane -- so it needs the set of pageserver APIs that the control plane uses, and to present them under identical URLs, including prefix (/v1). ## Summary of changes - Add missing APIs: - Tenant deletion - Timeline deletion - Node list (used in test now, later in tools) - `/location_config` API (for migrating tenants into the sharding service) - Rework attachment service URLs: - `/v1` prefix is used for pageserver-compatible APIs - `/upcall/v1` prefix is used for APIs that are called by the pageserver (re-attach and validate) - `/debug/v1` prefix is used for endpoints that are for testing - `/control/v1` prefix is used for new sharding service APIs that do not mimic a pageserver API, such as registering and configuring nodes. - Add test_sharding_service. The sharding service already had some collateral coverage from its use in general tests, but this is the first dedicated testing for it.	2024-01-31 12:23:06 +00:00
Konstantin Knizhnik	e10a7ee391	Prevent too frequent reconnects in case of race condition errors returned by PS (tenant not found) (#6522 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1706531433057289 ## Summary of changes 1. Do not decrease the reconnect timeout until the maximal interval value (1 second) is reached 2. Compute the reconnect time after the connection attempt is made, to exclude the connect time itself from the interval measurement. So now a backend should not perform more than 4 reconnect attempts per second. Note, however, that backoff is performed locally in each backend, so if there are many active backends, the connection (and hence error) rate may be much higher. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-31 09:17:32 +02:00
Sasha Krassovsky	e8c9a51273	Allow creating subscriptions as neon_superuser (#6484 ) ## Problem We currently can't create subscriptions in PG14 and PG15 because only superusers can, and PG16 requires adding roles to pg_create_subscription. ## Summary of changes I added changes to PG14 and PG15 that allow neon_superuser to bypass the superuser requirement. For PG16, I didn't do that but added a migration that adds neon_superuser to pg_create_subscription. Also added a test to make sure it works.	2024-01-30 22:32:33 -08:00
Alexander Bayandin	3c3ee8f3e8	Compute: add compatibility patch for pgvector (#6527 ) ## Problem `pgvector` requires a patch to work well with Neon (a patch created by @hlinnaka) ## Summary of changes - Apply the patch to `pgvector`	2024-01-30 17:33:24 +00:00
Arpad Müller	6928a34f59	S3 DR: Large prefix improvements (#6515 ) ## Problem PR #6500 has removed the limiting by number of versions/deletions for time travel calls. We never get informed about how many versions there are, and thus the call would just hang without any indication of progress. ## Summary of changes We improve the pageserver's behaviour with large prefixes, i.e. those with many keys, removed or currently still available. * Add a hard limit of 100k versions/deletions. For the reasoning see https://github.com/neondatabase/cloud/issues/8233#issuecomment-1915021625 , but TLDR it will roughly support tenants of 2 TiB size, of course depending on general write activity and duration of the s3 retention window. The goal is to have a limit at all so that the process doesn't accumulate increasing numbers of versions until an eventual crash. * Lower the RAM footprint for the `VerOrDelete` datastructure. This means we now don't cache a lot of redundant metadata in RAM like the owner ID. The top level datastructure's footprint goes down from 264 bytes to 80 (but it contains strings that are not counted in there). Follow-up of #6500, part of https://github.com/neondatabase/cloud/issues/8233 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-01-30 15:57:27 +00:00
Arseny Sher	bc684e9d3b	Make WAL segment init atomic. Since fdatasync is used for flushing WAL, changing file size is unsafe. Make segment creation atomic by using tmp file + rename to avoid using partially initialized segments. fixes https://github.com/neondatabase/neon/issues/6402	2024-01-30 18:05:22 +04:00
Arseny Sher	08532231ee	Fix find_end_of_wal busy loop. It hung if the file size was less than that of a normal segment. Normally that doesn't happen, but it might in case of a crash during segment init. We're going to fix that half-initialized segment by durably renaming it after cooking, so this fix won't be needed, but it is better to avoid the busy loop anyway. fixes https://github.com/neondatabase/neon/issues/6401	2024-01-30 18:05:22 +04:00
Christian Schwarz	79137a089f	fix(#6366 ): pageserver: incorrect log level for Tenant not found during basebackup (#6400 ) Before this patch, when requesting basebackup for a not-found tenant or timeline, we'd emit an ERROR-level log entry with a huge stack trace. See #6366 "Details" section for an example With this patch, we log at INFO level and only a single line. Example: ``` 2024-01-19T14:16:11.479800Z INFO page_service_conn_main{peer_addr=127.0.0.1:43448}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4b 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Tenant d69a536d529a68fcf85bc070030cdf4b not found 2024-01-19T14:19:35.807819Z INFO page_service_conn_main{peer_addr=127.0.0.1:48862}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4a 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Timeline d69a536d529a68fcf85bc070030cdf4a/035484e9c28d8d0138a492caadd03ffd was not found ``` fixes https://github.com/neondatabase/neon/issues/6366 Changes ------- - Change `handle_basebackup_request` to return a `QueryError` - The new `impl From<WaitLsnError> for QueryError` is needed so the `?` at `wait_lsn()` call in `handle_basebackup_request` works again. It's duplicating `impl From<WaitLsnError> for PageStreamError`. - Remove hard-to-spot conversion of `handle_basebackup_request` return value to anyhow::Result (the place where I replaced `anyhow::Ok` with `Result::<(), QueryError>::Ok(())` - Add forgotten distinguished handling for "Tenant not found" case in `impl From<GetActiveTenantError> for QueryError` This was not at all pleasant, and I find it very hard to follow the various error conversions. It took me a while to spot the hard-to-spot `anyhow::Ok` thing above. It would have been caught by the compiler if we weren't auto-converting `anyhow::Error` into `QueryError::Other`. 
We should move away from that, in my opinion, instead forcing each `.context()` site to become `.context().map_err(QueryError::Other)`. But that's for a future PR.	2024-01-30 13:10:48 +00:00
Joonas Koivunen	e3cb715e8a	fix: capture initdb stderr, discard others (#6524 ) When using spawn + wait_with_output instead of std::process::Command::output or tokio::process::Command::output we must configure the redirection. Fixes: #6523 by discarding the stdout completely, we only care about stderr if any.	2024-01-30 14:07:58 +01:00
dependabot[bot]	c70bf9150f	build(deps): bump aiohttp from 3.9.0 to 3.9.2 (#6518 )	2024-01-30 10:46:49 +00:00
Alexander Bayandin	8e4da52069	Compute: pgvector 0.6.0 (#6517 ) Update pgvector extension from 0.5.1 to 0.6.0	2024-01-30 09:29:45 +00:00
Arthur Petukhovsky	2ff1a5cecd	Patch safekeeper control file on HTTP request (#6455 ) Closes #6397	2024-01-29 18:20:57 +00:00
Conrad Ludgate	ec8dcc2231	flatten proxy flow (#6447 ) ## Problem Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and doing a bit less radical changes. smaller commits. Proxy flow was quite deeply nested, which makes adding more interesting error handling quite tricky. ## Summary of changes I recommend reviewing commit by commit. 1. move handshake logic into a separate file 2. move passthrough logic into a separate file 3. no longer accept a closure in CancelMap session logic 4. Remove connect_to_db, copy logic into handle_client 5. flatten auth_and_wake_compute in authenticate 6. record info for link auth	2024-01-29 17:38:03 +00:00
Arpad Müller	b844c6f0c7	Do pagination in list_object_versions call (#6500 ) ## Problem The tenants we want to recover might have tens of thousands of keys, or more. At that point, the AWS API returns a paginated response. ## Summary of changes Support paginated responses for `list_object_versions` requests. Follow-up of #6155, part of https://github.com/neondatabase/cloud/issues/8233	2024-01-29 17:59:26 +01:00
Alexander Bayandin	6a85a06e1b	Compute: build rdkit without freetype support (#6495 ) ## Problem `rdkit` extension is built with `RDK_BUILD_FREETYPE_SUPPORT=ON` (by default), which requires a bunch of additional dependencies, but the support of freetype fonts isn't required for Postgres. With `RDK_BUILD_FREETYPE_SUPPORT=ON`: ``` ldd /usr/local/pgsql/lib/rdkit.so linux-vdso.so.1 (0x0000ffff82ea8000) libfreetype.so.6 => /usr/lib/aarch64-linux-gnu/libfreetype.so.6 (0x0000ffff825e5000) libboost_serialization.so.1.74.0 => /usr/lib/aarch64-linux-gnu/libboost_serialization.so.1.74.0 (0x0000ffff82590000) libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff8255f000) libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffff82387000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff822dc000) libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffff822b8000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff82144000) libpng16.so.16 => /usr/lib/aarch64-linux-gnu/libpng16.so.16 (0x0000ffff820fd000) libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000ffff820d3000) libbrotlidec.so.1 => /usr/lib/aarch64-linux-gnu/libbrotlidec.so.1 (0x0000ffff820b8000) /lib/ld-linux-aarch64.so.1 (0x0000ffff82e78000) libbrotlicommon.so.1 => /usr/lib/aarch64-linux-gnu/libbrotlicommon.so.1 (0x0000ffff82087000) ``` With `RDK_BUILD_FREETYPE_SUPPORT=OFF`: ``` ldd /usr/local/pgsql/lib/rdkit.so linux-vdso.so.1 (0x0000ffffbba75000) libboost_serialization.so.1.74.0 => /usr/lib/aarch64-linux-gnu/libboost_serialization.so.1.74.0 (0x0000ffffbb259000) libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffffbb228000) libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffffbb050000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffbafa5000) libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffffbaf81000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbae0d000) /lib/ld-linux-aarch64.so.1 (0x0000ffffbba45000) 
``` ## Summary of changes - Build `rdkit` with `RDK_BUILD_FREETYPE_SUPPORT=OFF` - Remove extra dependencies from the Compute image	2024-01-29 16:16:37 +00:00
John Spray	b04a6acd6c	docker: add attachment_service binary (#6506 ) ## Problem Creating sharded tenants will require an instance of the sharding service -- the initial goal is to deploy one of these in a staging region (https://github.com/neondatabase/cloud/issues/9718). It will run as a kubernetes container, similar to the storage broker, so needs to be built into the container image. ## Summary of changes Add `attachment_service` binary to container image	2024-01-29 13:31:56 +00:00
Vlad Lazar	0c7b89235c	pageserver: add range layer map search implementation (#6469 ) ## Problem There's no efficient way of querying the layer map for a range. ## Summary of changes Introduce a range query for the layer map (`LayerMap::range_search`). There are two broad steps to it: 1. Find all coverage changes for layers that intersect the queried range (see `LayerCoverage::range_overlaps`). The slightly tricky part is dealing with the start of the range. We can either be aligned with a layer or not and we need to treat these cases differently. 2. Iterate over the coverage changes and collect the result. For this we use a two pointer approach: the trailing pointer tracks the start of the current range (current location in the key space) and the forward pointer tracks the next coverage change. Plugging the range search into the read path is deferred to a future PR. ## Performance I adapted the layer map benchmarks on a local branch. Range searches are between 2x and 2.5x slower than point searches. That's in line with what I expected since we query the layer map twice. Since `Timeline::get` will proxy to `Timeline::get_vectored` we can special case the one element layer map range search at that point.	2024-01-29 09:47:12 +00:00
Joonas Koivunen	1e9a50bca8	disk_usage_eviction_task: cleanup summaries (#6490 ) This is the "partial revert" of #6384. The summaries turned out to be expensive due to naive vec usage, but also inconclusive because of the additional context required. In addition to removing summary traces, small refactoring is done.	2024-01-29 10:38:40 +02:00
Conrad Ludgate	511e730cc0	hll experiment (#6312 ) ## Problem Measuring cardinality using logs is expensive and slow. ## Summary of changes Implement a pre-aggregated HyperLogLog-based cardinality estimate. HyperLogLog estimates the cardinality of a set using the fact that the probability that the uniform hash of a value ends in a run of `n` 0s is `1/2^n`; therefore, having observed a run of `n` 0s suggests we have measured `2^n` distinct values. By using multiple shards, we can use the harmonic mean to get a more accurate estimate. We record this into a Prometheus time-series. HyperLogLog counts can be merged by taking the `max` of each shard. We can apply a `max_over_time` in order to find the estimate of cardinality of distinct values over time	2024-01-29 07:26:20 +00:00
Konstantin Knizhnik	c1148dc9ac	Fix calculation of maximal multixact in ingest_multixact_create_record (#6502 ) ## Problem See https://neondb.slack.com/archives/C06F5UJH601/p1706373716661439 ## Summary of changes Use None instead of 0 as initial accumulator value for calculating maximal multixact XID. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-29 07:39:16 +02:00
Anna Khanova	8253cf1931	proxy: Relax endpoint check (#6503 ) ## Problem http-over-sql allows the host to be in the format api.aws.... however, that's not the case for the websocket flow. ## Summary of changes Relax endpoint check for the ws serverless connections.	2024-01-28 21:27:14 +00:00
Christian Schwarz	3a82430432	fixup(#6492 ): also switch the benchmarks that run on merge-to-main back to std-fs (#6501 )	2024-01-28 00:15:11 +01:00
Arpad Müller	734755eaca	Enable nextest retries for the arm build (#6496 ) Also make the NEXTEST_RETRIES declaration more local. Requested in https://github.com/neondatabase/neon/pull/6493#issuecomment-1912110202	2024-01-27 05:16:11 +01:00
Christian Schwarz	e34166a28f	CI: switch back to std-fs io engine for soak time before next release (#6492 ) PR #5824 introduced the concept of io engines in pageserver and implemented `tokio-epoll-uring` in addition to our current method, `std-fs`. We used `tokio-epoll-uring` in CI for a day to get more exposure to the code. Now it's time to switch CI back so that we test with `std-fs` as well, because that's what we're (still) using in production.	2024-01-26 22:48:34 +01:00
Christian Schwarz	3a36a0a227	fix(test suite): some tests leak child processes (#6497 )	2024-01-26 18:23:53 +00:00
John Spray	58f6cb649e	control_plane: database persistence for attachment_service (#6468 ) ## Problem Spun off from https://github.com/neondatabase/neon/pull/6394 -- this PR is just the persistence parts and the changes that enable it to work nicely ## Summary of changes - Revert #6444 and #6450 - In neon_local, start a vanilla postgres instance for the attachment service to use. - Adopt `diesel` crate for database access in attachment service. This uses raw SQL migrations as the source of truth for the schema, so it's a soft dependency: we can switch libraries pretty easily. - Rewrite persistence.rs to use postgres (via diesel) instead of JSON. - Preserve JSON read+write at startup and shutdown: this enables using the JSON format in compatibility tests, so that we don't have to commit to our DB schema yet. - In neon_local, run database creation + migrations before starting attachment service - Run the initial reconciliation in Service::spawn in the background, so that the pageserver + attachment service don't get stuck waiting for each other to start, when restarting both together in a test.	2024-01-26 17:20:44 +00:00
Arpad Müller	dcc7610ad6	Do backoff::retry in s3 timetravel test (#6493 ) The top level retries weren't enough, probably because we do so many network requests. Fine grained retries ensure that there is higher potential for the entire test to succeed. To demonstrate this, consider the following example: let's assume that each request has 5% chance of failing and we do 10 requests. Then chances of success without any retries is 0.95^10 = 0.6. With 3 top level retries it is 1-0.4^3 = 0.936. With 3 fine grained retries it is (1-0.05^3)^10 = 0.9988 (roundings implicit). So chances of failure are 6.4% for the top level retry vs 0.12% for the fine grained retry. Follow-up of #6155	2024-01-26 16:43:56 +00:00
Alexander Bayandin	4c245b0f5a	update_build_tools_image.yml: Push build-tools image to Docker Hub (#6481 ) ## Problem - `docker.io/neondatabase/build-tools:pinned` image is frequently outdated on Docker Hub because there's no automated way to update it. - `update_build_tools_image.yml` workflow contains legacy roll-back logic, which is not required anymore because it updates only a single image. ## Summary of changes - Make `update_build_tools_image.yml` workflow push images to both ECR and Docker Hub - Remove unneeded roll-back logic	2024-01-26 16:12:49 +00:00
John Spray	55b7cde665	tests: add basic coverage for sharding (#6380 ) ## Problem The support for sharding in the pageserver was written before https://github.com/neondatabase/neon/pull/6205 landed, so when it landed we couldn't directly test sharding. ## Summary of changes - Add `test_sharding_smoke` which tests the basics of creating a sharding tenant, creating a timeline within it, checking that data within it is distributed. - Add modes to pg_regress tests for running with 4 shards as well as with 1.	2024-01-26 14:40:47 +00:00
Vlad Lazar	5b34d5f561	pageserver: add vectored get latency histogram (#6461 ) This patch introduces a new set of grafana metrics for a histogram: pageserver_get_vectored_seconds_bucket{task_kind="Compaction\|PageRequestHandler"}. While it has a `task_kind` label, only compaction and SLRU fetches are tracked. This reduces the increase in cardinality to 24. The metric should allow us to isolate performance regressions while the vectorized get is being implemented. Once the implementation is complete, it'll also allow us to quantify the improvements.	2024-01-26 13:40:03 +00:00
Alexander Bayandin	26c55b0255	Compute: fix rdkit extension build (#6488 ) ## Problem `rdkit` extension build started to fail because of the changed checksum of the Comic Neue font: ``` Downloading https://fonts.google.com/download?family=Comic%20Neue... CMake Error at Code/cmake/Modules/RDKitUtils.cmake:257 (MESSAGE): The md5 checksum for /rdkit-src/Code/GraphMol/MolDraw2D/Comic_Neue.zip is incorrect; expected: 850b0df852f1cda4970887b540f8f333, found: b7fd0df73ad4637504432d72a0accb8f ``` https://github.com/neondatabase/neon/actions/runs/7666530536/job/20895534826 Ref https://neondb.slack.com/archives/C059ZC138NR/p1706265392422469 ## Summary of changes - Disable comic fonts for `rdkit` extension	2024-01-26 12:39:20 +00:00
Vadim Kharitonov	12e9b2a909	Update plv8 (#6465 )	2024-01-26 09:56:11 +00:00
Christian Schwarz	918b03b3b0	integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824 )	2024-01-26 09:25:07 +01:00
Alexander Bayandin	d36623ad74	CI: cancel old e2e-tests on new commits (#6463 ) ## Problem Triggered `e2e-tests` job is not cancelled along with other jobs in a PR if the PR gets new commits. We can improve the situation by setting `concurrency_group` for the remote workflow (https://github.com/neondatabase/cloud/pull/9622 adds `concurrency_group` group input to the remote workflow). Ref https://neondb.slack.com/archives/C059ZC138NR/p1706087124297569 Cloud's part added in https://github.com/neondatabase/cloud/pull/9622 ## Summary of changes - Set `concurrency_group` parameter when triggering `e2e-tests` - At the beginning of a CI pipeline, trigger Cloud's `cancel-previous-in-concurrency-group.yml` workflow which cancels previously triggered e2e-tests	2024-01-25 19:25:29 +00:00
Christian Schwarz	689ad72e92	fix(neon_local): leaks child process if it fails to start & pass checks (#6474 ) refs https://github.com/neondatabase/neon/issues/6473 Before this PR, if process_started() didn't return Ok(true) until we ran out of retries, we'd return an error but leave the process running. Try it by adding a 20s sleep to the pageserver `main()`, e.g., right before we claim the pidfile. Without this PR, output looks like so: ``` (.venv) cs@devvm-mbp:[~/src/neon-work-2]: ./target/debug/neon_local start Starting neon broker at 127.0.0.1:50051. storage_broker started, pid: 2710939 . attachment_service started, pid: 2710949 Starting pageserver node 1 at '127.0.0.1:64000' in ".neon/pageserver_1"..... pageserver has not started yet, continuing to wait..... pageserver 1 start failed: pageserver did not start in 10 seconds No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/pageserver_1/pageserver.pid" No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/safekeepers/sk1/safekeeper.pid" Stopping storage_broker with pid 2710939 immediately....... storage_broker has not stopped yet, continuing to wait..... neon broker stop failed: storage_broker with pid 2710939 did not stop in 10 seconds Stopping attachment_service with pid 2710949 immediately....... attachment_service has not stopped yet, continuing to wait..... 
attachment service stop failed: attachment_service with pid 2710949 did not stop in 10 seconds ``` and we leak the pageserver process ``` (.venv) cs@devvm-mbp:[~/src/neon-work-2]: ps aux \| grep pageserver cs 2710959 0.0 0.2 2377960 47616 pts/4 Sl 14:36 0:00 /home/cs/src/neon-work-2/target/debug/pageserver -D .neon/pageserver_1 -c id=1 -c pg_distrib_dir='/home/cs/src/neon-work-2/pg_install' -c http_auth_type='Trust' -c pg_auth_type='Trust' -c listen_http_addr='127.0.0.1:9898' -c listen_pg_addr='127.0.0.1:64000' -c broker_endpoint='http://127.0.0.1:50051/' -c control_plane_api='http://127.0.0.1:1234/' -c remote_storage={local_path='../local_fs_remote_storage/pageserver'} ``` After this PR, there is no leaked process.	2024-01-25 19:20:02 +01:00
Christian Schwarz	fd4cce9417	test_pageserver_max_throughput_getpage_at_latest_lsn: remove n_tenants=100 combination (#6477 ) Need to fix the neon_local timeouts first (https://github.com/neondatabase/neon/issues/6473) and also not run them on every merge, but only nightly: https://github.com/neondatabase/neon/issues/6476	2024-01-25 18:17:53 +00:00
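
Note on d2c410c748 (#6544): the commit message describes removing from one key space every overlap with a second key space, which can split or drop existing ranges. The sketch below illustrates that idea only. It is not the `KeySpace` code from the repository: the function name `remove_overlaps`, the use of plain `u64` keys, and the simple nested-loop strategy are all assumptions made for brevity, and it does not reproduce the O(M * N * log(N)) algorithm mentioned in the message.

```rust
use std::ops::Range;

/// Hypothetical helper: returns `current` with every part that overlaps a
/// range in `other` removed. Ranges are half-open (`start..end`) and
/// `current` is assumed to be sorted and non-overlapping.
fn remove_overlaps(current: &[Range<u64>], other: &[Range<u64>]) -> Vec<Range<u64>> {
    let mut result: Vec<Range<u64>> = current.to_vec();
    for cut in other {
        // An empty cut cannot overlap anything.
        if cut.is_empty() {
            continue;
        }
        let mut next = Vec::with_capacity(result.len() + 1);
        for r in &result {
            // No intersection: keep the range untouched.
            if cut.end <= r.start || cut.start >= r.end {
                next.push(r.clone());
                continue;
            }
            // Keep the part of `r` to the left of the cut, if any.
            if r.start < cut.start {
                next.push(r.start..cut.start);
            }
            // Keep the part of `r` to the right of the cut, if any.
            if cut.end < r.end {
                next.push(cut.end..r.end);
            }
            // If neither branch fired, `r` was fully covered and is dropped.
        }
        result = next;
    }
    result
}
```

For example, cutting `5..8` and `25..40` out of `[0..10, 20..30]` splits the first range in two and trims the second, giving `[0..5, 8..10, 20..25]`.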

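Note on bc684e9d3b (Make WAL segment init atomic): the message describes creating a segment through a temporary file followed by a rename, so that a partially initialized segment is never visible under its final name. A minimal sketch of that general technique follows. The function name `init_segment_atomically`, the `.partial.tmp` suffix, and the zero-fill are assumptions for illustration and are not taken from the safekeeper code. Syncing the parent directory by opening it as a file is a Unix-specific step.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: creates `final_path` as a zero-filled file of `size`
/// bytes so that readers never see a half-written file under the final name.
fn init_segment_atomically(final_path: &Path, size: usize) -> io::Result<()> {
    // Assumed naming scheme for the temporary file (same directory, so the
    // rename stays on one filesystem).
    let tmp_path = final_path.with_extension("partial.tmp");

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(&vec![0u8; size])?;
    // Flush data and metadata (including the file size) before publishing.
    tmp.sync_all()?;
    drop(tmp);

    // On POSIX filesystems rename atomically replaces the destination.
    fs::rename(&tmp_path, final_path)?;

    // Make the rename itself durable by syncing the containing directory.
    let dir = match final_path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(dir)?.sync_all()?;
    Ok(())
}
```

A caller would pass the final segment path and the segment size; if the process crashes before the rename, only the temporary file is left behind and the final name either does not exist or refers to a complete file.
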
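
Note on 511e730cc0 (#6312): the message outlines HyperLogLog: hash each value, track the longest run of trailing 0s per shard, combine shards with a harmonic mean, and merge counters by taking the per-shard `max`. The sketch below is a small textbook-style illustration of those steps, not the proxy's implementation. The type name `Hll`, the choice of 64 shards, and the use of the standard library's `DefaultHasher` are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const SHARD_BITS: u32 = 6;
const SHARDS: usize = 1 << SHARD_BITS; // 64 shards (assumed size)

#[derive(Clone, PartialEq, Debug)]
struct Hll {
    /// Per shard: (longest run of trailing zero bits seen) + 1, or 0 if empty.
    shards: [u8; SHARDS],
}

impl Hll {
    fn new() -> Self {
        Hll { shards: [0; SHARDS] }
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();
        // The top bits pick the shard; the low bits provide the zero run.
        let shard = (hash >> (64 - SHARD_BITS)) as usize;
        // Setting a sentinel bit caps the run at the low 58 bits.
        let capped = hash | (1u64 << (64 - SHARD_BITS));
        let rank = capped.trailing_zeros() as u8 + 1;
        if rank > self.shards[shard] {
            self.shards[shard] = rank;
        }
    }

    /// Merging two counters is the element-wise max of their shards.
    fn merge(&mut self, other: &Hll) {
        for (mine, theirs) in self.shards.iter_mut().zip(other.shards.iter()) {
            *mine = (*mine).max(*theirs);
        }
    }

    /// Harmonic-mean estimate with the usual small-range correction.
    fn estimate(&self) -> f64 {
        let m = SHARDS as f64;
        let alpha = 0.709; // standard bias constant for 64 shards
        let sum: f64 = self.shards.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let empty = self.shards.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && empty > 0 {
            // Linear counting works better while many shards are still empty.
            m * (m / empty as f64).ln()
        } else {
            raw
        }
    }
}
```

Because merging is a per-shard `max`, two counters built from disjoint halves of a data set merge into exactly the counter that would have been built from the whole set, which is what makes a `max_over_time` style aggregation meaningful. With 64 shards the estimate is coarse (the typical relative error is on the order of 10%).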

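Note on dcc7610ad6 (#6493): the message compares top-level retries with fine-grained retries for a test that issues 10 requests, each failing with probability 5%. The arithmetic can be written out as three small functions. The function names are hypothetical, and the model assumes independent failures, as the message does.

```rust
/// Probability that `n` independent requests all succeed when each one
/// fails with probability `p`.
fn success_no_retry(p: f64, n: i32) -> f64 {
    (1.0 - p).powi(n)
}

/// The whole sequence of `n` requests is attempted up to `attempts` times.
fn success_top_level(p: f64, n: i32, attempts: i32) -> f64 {
    1.0 - (1.0 - success_no_retry(p, n)).powi(attempts)
}

/// Each individual request is attempted up to `attempts` times.
fn success_fine_grained(p: f64, n: i32, attempts: i32) -> f64 {
    (1.0 - p.powi(attempts)).powi(n)
}
```

With `p = 0.05`, `n = 10` and 3 attempts these give roughly 0.60, 0.935 and 0.9988. The message's 0.936 comes from rounding the intermediate failure probability to 0.4 before cubing it. The conclusion is the same either way: about 6.5% failure for top-level retries versus about 0.12% for fine-grained retries.
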
4491 Commits