Commit Graph

2227 Commits

Author SHA1 Message Date
Dmitry Rodionov
4c1cb890db try to toggle background activity without a race condition 2022-10-20 22:30:42 +03:00
Arthur Petukhovsky
f5ab9f761b Remove flaky checks in test_delete_force (#2567) 2022-10-20 17:14:32 +04:00
Kirill Bulatov
306a47c4fa Use uninit mark files during timeline init for atomic creation (#2489)
Part of https://github.com/neondatabase/neon/pull/2239

Regular, from scratch, timeline creation involves initdb to be run in a separate directory, data from this directory to be imported into pageserver and, finally, timeline-related background tasks to start.

This PR ensures we don't leave behind any directories that are not marked as temporary and that pageserver removes such directories on restart, allowing timeline creation to be retried with the same IDs, if needed.

It would be good to later rewrite the logic to use a temporary directory, similar what tenant creation does.
Yet currently it's harder than this change, so not done.
2022-10-20 14:19:17 +03:00
Kirill Bulatov
84c5f681b0 Fix test feature detection (#2659)
Follow-up of #2636 and #2654 , fixing the test detection feature.

Pageserver currently outputs features as

```
/target/debug/pageserver --version
Neon page server git:7734929a8202c8cc41596a861ffbe0b51b5f3cb9 failpoints: true, features: ["testing", "profiling"]
```
2022-10-20 13:44:03 +03:00
Kirill Bulatov
50297bef9f RFC about Tenant / Timeline guard objects (#2660)
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2022-10-20 12:49:54 +03:00
Andrés
9211923bef Pageserver Python tests should not fail if the server is built with no testing feature (#2636)
Co-authored-by: andres <andres.rodriguez@outlook.es>
2022-10-20 10:46:57 +03:00
bojanserafimov
7734929a82 Remove stale todos (#2630) 2022-10-19 22:59:22 +00:00
Heikki Linnakangas
bc5ec43056 Fix flaky physical-size tests in test_timeline_size.py.
These two tests, test_timeline_physical_size_post_compaction and
test_timeline_physical_size_post_gc, assumed that after you have
waited for the WAL from a bulk insertion to arrive, and you run a
cycle of checkpoint and compaction, no new layer files are created.
Because if a new layer file is created while we are calculating the
incremental and non-incremental physical sizes, they might differ.

However, the tests used a very small checkpoint_distance, so even a
small amount of WAL generated in PostgreSQL could cause a new layer
file to be created. Autovacuum can kick in at any time, and do that.
That caused occasional failues in the test. I was able to reproduce it
reliably by adding a long delay between the incremental and
non-incremental size calculations:

```
--- a/pageserver/src/http/routes.rs
+++ b/pageserver/src/http/routes.rs
@@ -129,6 +129,9 @@ async fn build_timeline_info(
         }
     };
     let current_physical_size = Some(timeline.get_physical_size());
+    if include_non_incremental_physical_size {
+        std:🧵:sleep(std::time::Duration::from_millis(60000));
+    }

     let info = TimelineInfo {
         tenant_id: timeline.tenant_id,
```

To fix, disable autovacuum for the table. Autovacuum could still kick
in for other tables, e.g. catalog tables, but that seems less likely
to generate enough WAL to causea new layer file to be flushed.

If this continues to be a problem in the future, we could simply retry
the physical size call a few times, if there's a mismatch. A mismatch
could happen every once in a while, but it's very unlikely to happen
more than once or twice in a row.

Fixes https://github.com/neondatabase/neon/issues/2212
2022-10-19 23:50:21 +03:00
MMeent
b237feedab Add more redo metrics: (#2645)
- Measure size of redo WAL (new histogram), with bounds between 24B-32kB
- Add 2 more buckets at the upper end of the redo time histogram
  We often (>0.1% of several hours each day) take more than 250ms to do the
  redo round-trip to the postgres process. We need to measure these redo
  times more precisely.
2022-10-19 22:47:11 +02:00
Alexey Kondratov
4d1e48f3b9 [compute_ctl] Use postgres::config to properly escape database names (#2652)
We've got at least one user in production that cannot create a
database with a trailing space in the name.

This happens because we use `url` crate for manipulating the
DATABASE_URL, but it follows a standard that doesn't fit really
well with Postgres. For example, it trims all trailing spaces
from the path:

  > Remove any leading and trailing C0 control or space from input.
  > See: https://url.spec.whatwg.org/#url-parsing

But we used `set_path()` to set database name and it's totally valid
to have trailing spaces in the database name in Postgres.

Thus, use `postgres::config::Config` to modify database name in the
connection details.
2022-10-19 19:20:06 +02:00
Anastasia Lubennikova
7576b18b14 [compute_tools] fix GRANT CREATE ON SCHEMA public -
run the grant query in each database
2022-10-19 18:37:52 +03:00
Konstantin Knizhnik
6b49b370fc Fix build after applying PR #2558 2022-10-19 13:55:30 +03:00
Konstantin Knizhnik
91411c415a Persists latest_gc_cutoff_lsn before performing GC (#2558)
* Persists latest_gc_cutoff_lsn before performing GC

* Peform some refactoring and code deduplication

refer #2539

* Add test for persisting GC cutoff

* Fix python test style warnings

* Bump postgres version

* Reduce number of iterations in test_gc_cutoff test

* Bump postgres version

* Undo bumping postgres version
2022-10-19 12:32:03 +03:00
Kirill Bulatov
c67cf34040 Update GH Action version (#2646) 2022-10-19 11:16:36 +03:00
bojanserafimov
8fbe437768 Improve pageserver IO metrics (#2629) 2022-10-18 11:53:28 -04:00
Heikki Linnakangas
989d78aac8 Buffer the TCP incoming stream on libpq connections.
Reduces the number of syscalls needed to read the commands from the
compute.

Here's a snippet of strace output from the pageserver, when performing
a sequential scan on a table, with prefetch:

    3084934 recvfrom(47, "d", 1, 0, NULL, NULL) = 1
    3084934 recvfrom(47, "\0\0\0\37", 4, 0, NULL, NULL) = 4
    3084934 recvfrom(47, "\2\1\0\0\0\0\362\302\360\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\0\3", 27, 0, NULL, NULL) = 27
    3084934 pread64(28, "\0\0\0\1\0\0\0\0\0\0\0\253                    "..., 8192, 25190400) = 8192
    3084934 write(45, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\3A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010
    3084934 poll([{fd=46, events=POLLIN}, {fd=48, events=POLLIN}], 2, 60000) = 1 ([{fd=46, revents=POLLIN}])
    3084934 read(46, "\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192
    3084934 sendto(47, "d\0\0 \5f\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198
    3084934 recvfrom(47, "d", 1, 0, NULL, NULL) = 1
    3084934 recvfrom(47, "\0\0\0\37", 4, 0, NULL, NULL) = 4
    3084934 recvfrom(47, "\2\1\0\0\0\0\362\302\360\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\0\4", 27, 0, NULL, NULL) = 27
    3084934 pread64(28, "    \0=\0L\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0;;\0\0\0\4\4\0"..., 8192, 25198592) = 8192
    3084934 write(45, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\4A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010
    3084934 poll([{fd=46, events=POLLIN}, {fd=48, events=POLLIN}], 2, 60000) = 1 ([{fd=46, revents=POLLIN}])
    3084934 read(46, "\0\0\0\0\260\344q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192
    3084934 sendto(47, "d\0\0 \5f\0\0\0\0\260\344q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198
    3084934 recvfrom(47, "d", 1, 0, NULL, NULL) = 1
    3084934 recvfrom(47, "\0\0\0\37", 4, 0, NULL, NULL) = 4
    3084934 recvfrom(47, "\2\1\0\0\0\0\362\302\360\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\0\5", 27, 0, NULL, NULL) = 27
    3084934 write(45, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\5A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010
    3084934 poll([{fd=46, events=POLLIN}, {fd=48, events=POLLIN}], 2, 60000) = 1 ([{fd=46, revents=POLLIN}])
    3084934 read(46, "\0\0\0\0\330\377q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192
    3084934 sendto(47, "d\0\0 \5f\0\0\0\0\330\377q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198

This shows the interaction for three get_page_at_lsn requests. For
each request, the pageserver performs three recvfrom syscalls to read
the incoming request from the socket. After this patch, those recvfrom
calls are gone:

    3086123 read(47, "\0\0\0\0\360\222q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192
    3086123 sendto(45, "d\0\0 \5f\0\0\0\0\360\222q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198
    3086123 pread64(29, "                                "..., 8192, 25182208) = 8192
    3086123 write(46, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\2A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010
    3086123 poll([{fd=47, events=POLLIN}, {fd=49, events=POLLIN}], 2, 60000) = 1 ([{fd=47, revents=POLLIN}])
    3086123 read(47, "\0\0\0\0000\256q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192
    3086123 sendto(45, "d\0\0 \5f\0\0\0\0000\256q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198
    3086123 pread64(29, "\0\0\0\1\0\0\0\0\0\0\0\253                    "..., 8192, 25190400) = 8192
    3086123 write(46, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\3A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010
    3086123 poll([{fd=47, events=POLLIN}, {fd=49, events=POLLIN}], 2, 60000) = 1 ([{fd=47, revents=POLLIN}])
    3086123 read(47, "\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192
    3086123 sendto(45, "d\0\0 \5f\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198
    3086123 pread64(29, "    \0=\0L\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0;;\0\0\0\4\4\0"..., 8192, 25198592) = 8192
    3086123 write(46, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\4A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010
    3086123 poll([{fd=47, events=POLLIN}, {fd=49, events=POLLIN}], 2, 60000) = 1 ([{fd=47, revents=POLLIN}])

In this test, the compute sends a batch of prefetch requests, and they
are read from the socket in one syscall. That syscall was not captured
by the strace snippet above, but there are much fewer of them than
before.
2022-10-18 18:46:07 +03:00
Stas Kelvich
7ca72578f9 Enable plv8 again
Now with quickfix for https://github.com/plv8/plv8/issues/503
2022-10-18 18:34:27 +03:00
Heikki Linnakangas
41550ec8bf Remove unnecessary indirections of libpqwalproposer functions
In the Postgres backend, we cannot link directly with libpq (check the
pgsql-hackers arhive for all kinds of fun that ensued when we tried to
do that). Therefore, the libpq functions are used through the thin
wrapper functions in libpqwalreceiver.so, and libpqwalreceiver.so is
loaded dynamically. To hide the dynamic loading and make the calls
look like regular functions, we use macros to hide the function
pointers.

We had inherited the same indirections in libpqwalproposer, but it's
not needed since the neon extension is already a shared library that's
loaded dynamically. There's no problem calling the functions directly
there. Remove the indirections.
2022-10-18 18:25:30 +03:00
Sergey Melnikov
0cd2d91b9d Fix deploy-new job by installing sivel.toiletwater (#2641) 2022-10-18 14:44:19 +00:00
Sergey Melnikov
546e9bdbec Deploy storage into new account and migrate to management API v2 (#2619)
Deploy storage into new account
Migrate safekeeper and pageserver initialisation to management api v2
2022-10-18 15:52:15 +03:00
Heikki Linnakangas
59bc7e67e0 Use an optimized version of amplify_num.
Speeds up layer_map::search somewhat. I also opened a PR in the upstream
rust-amplify repository with these changes,
see https://github.com/rust-amplify/rust-amplify/pull/148. We can switch
back to upstream version when that's merged.
2022-10-18 15:00:10 +03:00
Heikki Linnakangas
2418e72649 Speed up layer_map::search, by remembering the "envelope" for each layer.
Lookups in the R-tree call the "envelope" function for every comparison,
and our envelope function isn't very cheap, so that overhead adds up.
Create the envelope once, when the layer is inserted into the tree, and
store it along with the layer. That uses some more memory per layer, but
that's not very significant.

Speeds up the search operation 2x
2022-10-18 15:00:10 +03:00
Heikki Linnakangas
80746b1c7a Add micro-benchmark for layer map search function
The test data was extracted from our pgbench benchmark project on the
captest environment, the one we use for the 'neon-captest-reuse' test.
2022-10-18 15:00:10 +03:00
Dmitry Rodionov
129f7c82b7 remove redundant expect_tenant_to_download_timeline 2022-10-18 11:21:48 +03:00
Anastasia Lubennikova
0ec5ddea0b GRANT CREATE ON SCHEMA public TO web_access 2022-10-17 22:42:51 +03:00
Kirill Bulatov
c4ee62d427 Bump clap and other minor dependencies (#2623) 2022-10-17 12:58:40 +03:00
Joonas Koivunen
c709354579 Add layer sizes to index_part.json (#2582)
This is the first step in verifying layer files. Next up on the road is
hashing the files and verifying the hashes.

The metadata additions do not require any migration. The idea is that
the change is backward and forward-compatible with regard to
`index_part.json` due to the softness of JSON schema and the
deserialization options in use.

New types added:

- LayerFileMetadata for tracking the file metadata
    - starting with only the file size
    - in future hopefully a sha256 as well
- IndexLayerMetadata, the serialized counterpart of LayerFileMetadata

LayerFileMetadata needing to have all fields Option is a problem but
that is not possible to handle without conflicting a lot more with other
ongoing work.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2022-10-17 12:21:04 +03:00
Lassi Pölönen
5d6553d41d Fix pageserver configuration generation bug (#2584)
* We had an issue with `lineinfile` usage for pageserver configuration
file: if the S3 bucket related values were changed, it would have
resulted in duplicate keys, resulting in invalid toml.

So to fix the issue, we should keep the configuration in structured
format (yaml in this case) so we can always generate syntactically
correct toml.

Inventories are converted to yaml just so that it's easier to maintain
the configuration there. Another alternative would have been a separate
variable files.

* Keep the ansible collections dir, but locally installed collections
should not be tracked.
2022-10-16 11:37:10 +00:00
Kirill Bulatov
f03b7c3458 Bump regular dependencies (#2618)
* etcd-client is not updated, since we plan to replace it with another client and the new version errors with some missing prost library error
* clap had released another major update that requires changing every CLI declaration again, deserves a separate PR
2022-10-15 01:55:31 +03:00
Heikki Linnakangas
9c24de254f Add description and license fields to OpenAPI spec.
These were added earlier to the control plane's copy of this file.
This is the master version of this file, so let's keep it in sync.
2022-10-14 18:37:58 +03:00
Heikki Linnakangas
538876650a Merge 'local' and 'remote' parts of TimelineInfo into one struct.
The 'local' part was always filled in, so that was easy to merge into
into the TimelineInfo itself. 'remote' only contained two fields,
'remote_consistent_lsn' and 'awaits_download'. I made
'remote_consistent_lsn' an optional field, and 'awaits_download' is now
false if the timeline is not present remotely.

However, I kept stub versions of the 'local' and 'remote' structs for
backwards-compatibility, with a few fields that are actively used by
the control plane. They just duplicate the fields from TimelineInfo
now. They can be removed later, once the control plane has been
updated to use the new fields.
2022-10-14 18:37:14 +03:00
Heikki Linnakangas
500239176c Make TimelineInfo.local field mandatory.
It was only None when you queried the status of a timeline with
'timeline_detail' mgmt API call, and it was still being downloaded. You
can check for that status with the 'tenant_status' API call instead,
checking for has_in_progress_downloads field.

Anothere case was if an error happened while trying to get the current
logical size, in a 'timeline_detail' request. It might make sense to
tolerate such errors, and leave the fields we cannot fill in as empty,
None, 0 or similar, but it doesn't make sense to me to leave the whole
'local' struct empty in tht case.
2022-10-14 18:37:14 +03:00
Anastasia Lubennikova
ee64a6b80b Fix CI: push versioned compute images to production ECR 2022-10-14 18:12:50 +03:00
Anastasia Lubennikova
a13b486943 Bump vendor/postgres-v15. Rebase to 15.0 2022-10-14 18:12:50 +03:00
Arseny Sher
9fe4548e13 Reimplement explicit timeline creation on safekeepers.
With the ability to pass commit_lsn. This allows to perform project WAL recovery
through different (from the original) set of safekeepers (or under different
ttid) by
1) moving WAL files to s3 under proper ttid;
2) explicitly creating timeline on safekeepers, setting commit_lsn to the
latest point;
3) putting the lastest .parital file to the timeline directory on safekeepers, if
desired.

Extend test_s3_wal_replay to exersise this behaviour.

Also extends timeline_status endpoint to return postgres information.
2022-10-13 21:43:10 +04:00
Heikki Linnakangas
14c623b254 Make it possible to build with old cargo version.
I'm using the Rust compiler and cargo versions from Debian packages,
but the latest available cargo Debian package is quite old, version
1.57.  The 'named-profiles' features was not stabilized at that
version yet, so ever since commit a463749f5, I've had to manually add
this line to the Cargo.toml file to compile. I've been wishing that
someone would update the cargo Debian package, but it doesn't seem to
be happening any time soon.

This doesn't seem to bother anyone else but me, but it shouldn't hurt
anyone else either. If there was a good reason, I could install a
newer cargo version with 'rustup', but if all we need is this one line
in Cargo.toml, I'd prefer to continue using the Debian packages.
2022-10-13 15:17:00 +03:00
Alexander Bayandin
ebf54b0de0 Nightly Benchmarks: Add 50 GB projects (#2612) 2022-10-13 10:00:29 +01:00
Andrés
09dda35dac Return broken tenants due to non existing timelines dir (#2552) (#2575)
Co-authored-by: andres <andres.rodriguez@outlook.es>
2022-10-12 22:28:39 +03:00
Dmitry Ivanov
6ace79345d [proxy] Add more context to console requests logging (#2583) 2022-10-12 21:00:44 +03:00
danieltprice
771e61425e Update release-pr.md (#2600)
Update the Release Notes PR example that is referenced from the checklist. The Release Notes file structure changed recently.
2022-10-12 08:38:28 -03:00
Alexander Bayandin
93775f6ca7 GitHub Actions: replace deprecated set-output with GITHUB_OUTPUT (#2608) 2022-10-12 10:22:24 +01:00
Arseny Sher
6d0dacc4ce Recreate timeline on pageserver in s3_wal_replay test.
That's closer to real usage than switching to brand new pageserver.
2022-10-12 11:46:21 +04:00
Heikki Linnakangas
e5e40a31f4 Clean up terms "delete timeline" and "detach tenant".
You cannot attach/detach an individual timeline, attach/detach always
applies to the whole tenant. However, you can *delete* a single timeline
from a tenant. Fix some comments and error messages that confused these
two operations.
2022-10-11 17:47:41 +03:00
Heikki Linnakangas
676c63c329 Improve comments. 2022-10-11 17:47:41 +03:00
Heikki Linnakangas
47366522a8 Make the return type of 'list_timelines' simpler.
It's enough to return just the Timeline references. You can get the
timeline's ID easily from Timeline.
2022-10-11 17:47:41 +03:00
Heikki Linnakangas
db26bc49cc Remove obsolete FIXME comment.
Commit c634cb1d36 removed the trait and changed the function to return
a &TimelineWriter, as the FIXME said we should do, but forgot to remove
the FIXME.
2022-10-11 17:47:41 +03:00
Lassi Pölönen
e520293090 Add build info metric to pageserver, safekeeper and proxy (#2596)
* Test that we emit build info metric for pageserver, safekeeper and proxy with some non-zero length revision label

* Emit libmetrics_build_info on startup of pageserver, safekeeper and
proxy with label "revision" which tells the git revision.
2022-10-11 09:54:32 +03:00
Sergey Melnikov
241e549757 Switch neon-stress etcd to dedicatd instance (#2602) 2022-10-10 22:07:19 +00:00
Sergey Melnikov
34bea270f0 Fix POSTGRES_DISTRIB_DIR for benchmarks on ec2 runner (#2594) 2022-10-10 09:12:50 +00:00
Kirill Bulatov
13f0e7a5b4 Deploy pageserver_binutils to the envs 2022-10-09 08:21:11 +03:00