Commit Graph

1979 Commits

Author SHA1 Message Date
MMeent
04a018a5b1 Extract neon and neon_test_utils from postgres repo (#2325)
* Extract neon and neon_test_utils from postgres repo
* Remove neon from vendored postgres repo, and fix build_and_test.yml
* Move EmitWarningsOnPlaceholders to end of _PG_init in neon.c (from libpagestore.c)
* Fix Makefile location comments
* remove Makefile EXTRA_INSTALL flag
* Update Dockerfile.compute-node to build and include the neon extension
2022-08-25 18:48:09 +02:00
MMeent
bc588f3a53 Update WAL redo histograms (#2323)
Previously, it could only distinguish REDO task durations down to 5ms, which
equates to approx. 200pages/sec or 1.6MB/sec getpage@LSN traffic. 
This patch improves to 200'000 pages/sec or 1.6GB/sec, allowing for
much more precise performance measurement of the redo process.
2022-08-25 17:17:32 +03:00
Egor Suvorov
c952f022bb waldecoder: fix comment 2022-08-25 15:03:22 +02:00
Rory de Zoete
f67d109e6e Copy binaries to /usr/local (#2335)
* Add extra symlink

* Take other approach

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2022-08-25 14:35:01 +02:00
Rory de Zoete
344db0b4aa Re-add temporary symlink (#2331)
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2022-08-25 11:17:09 +02:00
Rory de Zoete
0c8ee6bd1d Add postgis & plv8 extensions (#2298)
* Add postgis & plv8 extensions

* Update Dockerfile & Fix typo's

* Update dockerfile

* Update Dockerfile

* Update dockerfile

* Use plv8 step

* Reduce giga layer

* Reduce layer size further

* Prepare for rollout

* Fix dependency

* Pass on correct build tag

* No longer dependent on building tools

* Use version from vendor

* Revert "Use version from vendor"

This reverts commit 7c6670c477.

* Revert and push correct set

* Add configure step for new approach

* Re-add configure flags

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2022-08-25 09:46:52 +02:00
Dmitry Ivanov
8e1d6dd848 Minor cleanup in pq_proto (#2322) 2022-08-23 18:00:02 +03:00
Heikki Linnakangas
4013290508 Fix module doc comment.
`///` is used for comments on the *next* code that follows, so the comment
actually applied to the `use std::collections::BTreeMap;` line that follows.

rustfmt complained about that:

    error: an inner attribute is not permitted following an outer doc comment
     --> /home/heikki/git-sandbox/neon/libs/utils/src/seqwait_async.rs:7:1
      |
    5 | ///
      | --- previous doc comment
    6 |
    7 | #![warn(missing_docs)]
      | ^^^^^^^^^^^^^^^^^^^^^^ not permitted following an outer attribute
    8 |
    9 | use std::collections::BTreeMap;
      | ------------------------------- the inner attribute doesn't annotate this `use` import
      |
      = note: inner attributes, like `#![no_std]`, annotate the item enclosing them, and are usually found at the beginning of source files
help: to annotate the `use` import, change the attribute from inner to outer style
      |
    7 - #![warn(missing_docs)]
    7 + #[warn(missing_docs)]
      |

`//!` is the correct syntax for comments that apply to the whole file.
2022-08-23 12:58:54 +03:00
Heikki Linnakangas
5f0c95182d Minor cleanup, to pass by reference where possible. 2022-08-23 12:58:54 +03:00
Heikki Linnakangas
63b9dfb2f2 Remove unnecessary 'pub' from test module, and remove dead constant.
After making the test module private, the compiler noticed and warned
that the constant is unused.
2022-08-23 12:58:54 +03:00
Heikki Linnakangas
1a666a01d6 Improve comments a little. 2022-08-23 12:58:54 +03:00
Heikki Linnakangas
d110d2c2fd Reorder permission checks in HTTP API call handlers.
Every handler function now follows the same pattern:

1. extract parameters from the call
2. check permissions
3. execute command.

Previously, we extracted some parameters before permission check and
some after. Let's be consistent.
2022-08-23 12:14:06 +03:00
KlimentSerafimov
b98fa5d6b0 Added a new test for making sure the proxy displays a session_id when using link auth. (#2039)
Added pytest to check correctness of the link authentication pipeline.

Context: this PR is the first step towards refactoring the link authentication pipeline to use https (instead of psql) to send the db info to the proxy. There was a test missing for this pipeline in this repo, so this PR adds that test as preparation for the actual change of psql -> https.
Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Stas Kelvic <stas@neon.tech>
Co-authored-by: Dimitrii Ivanov <dima@neon.tech>
2022-08-22 20:02:45 -04:00
Dmitry Rodionov
9dd19ec397 Remove interferring proc check
We do not need it anymore because ports_distributor checks
whether the port can be used before giving it to service
2022-08-22 20:59:32 +03:00
Alexander Bayandin
832e60c2b4 Add .git-blame-ignore-revs file (#2318) 2022-08-22 16:38:31 +01:00
Alexey Kondratov
6dc56a9be1 Add GitHub templates for epics, bugs and release PRs (neondatabase/cloud#2079)
After merging this we will be able to:
- Pick Epic or Bug template in the GitHub UI, when creating an issue
- Use this link to open a release PR formatted in a unified way and
  containing a checklist with useful links: https://github.com/neondatabase/neon/compare/release...main?template=release-pr.md&title=Release%20202Y-MM-DD
2022-08-22 16:29:42 +02:00
Alexander Bayandin
39a3bcac36 test_runner: fix flake8 warnings 2022-08-22 14:57:09 +01:00
Alexander Bayandin
ae3227509c test_runner: revive flake8 2022-08-22 14:57:09 +01:00
Alexander Bayandin
4c2bb43775 Reformat all python files by black & isort 2022-08-22 14:57:09 +01:00
Alexander Bayandin
6b2e1d9065 test_runner: replace yapf with black and isort 2022-08-22 14:57:09 +01:00
Alexander Bayandin
277f2d6d3d Report test results to Allure (#2229) 2022-08-22 11:21:50 +01:00
Kirill Bulatov
7779308985 Ensure timeline logical size is initialized once 2022-08-22 11:51:37 +03:00
Kirill Bulatov
32be8739b9 Move walreceiver timeline registration into layered_repository 2022-08-22 11:51:37 +03:00
Kirill Bulatov
631cbf5b1b Use single map to manage timeline data 2022-08-22 11:51:37 +03:00
Heikki Linnakangas
5522fbab25 Move all unit tests related to Repository/Timeline to layered_repository.rs
There was a nominal split between the tests in layered_repository.rs and
repository.rs, such that tests specific to the layered implementation were
supposed to be in layered_repository.rs, and tests that should work with
any implementation of the traits were supposed to be in repository.rs.
In practice, the line was quite muddled. With minor tweaks, many of the
tests in layered_repository.rs should work with other implementations too,
and vice versa. And in practice we only have one implementation, so it's
more straightforward to gather all unit tests in one place.
2022-08-20 01:21:18 +03:00
Heikki Linnakangas
d48177d0d8 Expose timeline logical size as a prometheus metric.
Physical size was already exposed, and it'd be nice to show both
logical and physical size side by side in our graphana dashboards.
2022-08-19 22:21:33 +03:00
Heikki Linnakangas
84cd40b416 rustfmt fixes.
Not sure why these don't show up as CI failures, but on my laptop,
rustfmt insists.
2022-08-19 22:21:15 +03:00
Heikki Linnakangas
daba4c7405 Add a section in glossary to explain what "logical size" means. (#2306) 2022-08-19 21:57:00 +03:00
MMeent
8ac5a285a1 Update vendor/postgres to one that is rebased onto REL_14_5 (#2312)
This was previously based on REL_14_4
Protected tag of main before rebase is at main-before-rebase-REL_14_5
2022-08-19 20:02:36 +02:00
Heikki Linnakangas
aaa60c92ca Use u64/i64 for logical size, comment on why to use signed i64.
usize/isize type corresponds to the CPU architecture's pointer width,
i.e. 64 bits on a 64-bit platform and 32 bits on a 32-bit platform.
The logical size of a database has nothing to do with the that, so
u64/i64 is more appropriate.

It doesn't make any difference in practice as long as you're on a
64-bit platform, and it's hard to imagine anyone wanting to run the
pageserver on a 32-bit platform, but let's be tidy.

Also add a comment on why we use signed i64 for the logical size
variable, even though size should never be negative. I'm not sure the
reasons are very good, but at least this documents them, and hints at
some possible better solutions.
2022-08-19 16:44:16 +03:00
Kirill Bulatov
187a760409 Reset codestyle cargo cache 2022-08-19 16:40:37 +03:00
Kirill Bulatov
c634cb1d36 Remove TimelineWriter trait, rename LayeredTimelineWriter struct into TimelineWriter 2022-08-19 16:40:37 +03:00
Kirill Bulatov
c19b4a65f9 Remove Repository trait, rename LayeredRepository struct into Repository 2022-08-19 16:40:37 +03:00
Kirill Bulatov
8043612334 Remove Timeline trait, rename LayeredTimeline struct into Timeline 2022-08-19 16:40:37 +03:00
Rory de Zoete
12e87f0df3 Update workflow to fix dependency issue (#2309)
* Update workflow to fix dependency issue

* Update workflow

* Update workflow and dockerfile

* Specify tag

* Update main dockerfile as well

* Mirror rust image to docker hub

* Update submodule ref

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2022-08-19 12:07:46 +02:00
Kirill Bulatov
6b9cef02a1 Use better defaults for pageserver Docker image 2022-08-19 12:41:00 +03:00
MMeent
37d90dc3b3 Fix dependencies issue between compute-tools and compute node docker images (#2304)
Compute node docker image requires compute-tools to build, but this
dependency (and the argument for which image to pick) weren't described in the
workflow file. This lead to out-of-date binaries in latest builds, which
subsequently broke these images.
2022-08-18 21:51:33 +02:00
Kirill Bulatov
a185821d6f Explicitly error on cache issues during I/O (#2303) 2022-08-18 22:37:20 +03:00
MMeent
f99ccb5041 Extract WalProposer into the neon extension (#2217)
Including, but not limited to:

* Fixes to neon management code to support walproposer-as-an-extension

* Fix issue in expected output of pg settings serialization.

* Show the logs of a failed --sync-safekeepers process in CI

* Add compat layer for renamed GUCs in postgres.conf

* Update vendor/postgres to the latest origin/main
2022-08-18 17:12:28 +02:00
Rory de Zoete
2db675a2f2 Re-enable test dependency for deploy (#2300)
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2022-08-18 15:18:59 +02:00
Anton Galitsyn
77a2bdf3d7 on safekeeper registration pass availability zone param (#2292) 2022-08-18 15:05:40 +03:00
Arthur Petukhovsky
976576ae59 Fix walreceiver and safekeeper bugs (#2295)
- There was an issue with zero commit_lsn `reason: LaggingWal { current_commit_lsn: 0/0, new_commit_lsn: 1/6FD90D38, threshold: 10485760 } }`. The problem was in `send_wal.rs`, where we initialized `end_pos = Lsn(0)` and in some cases sent it to the pageserver.
- IDENTIFY_SYSTEM previously returned `flush_lsn` as a physical end of WAL. Now it returns `flush_lsn` (as it was) to walproposer and `commit_lsn` to everyone else including pageserver.
- There was an issue with backoff where connection was cancelled right after initialization: `connected!` -> `safekeeper_handle_db: Connection cancelled` -> `Backoff: waiting 3 seconds`. The problem was in sleeping before establishing the connection. This is fixed by reworking retry logic.
- There was an issue with getting `NoKeepAlives` reason in a loop. The issue is probably the same as the previous.
- There was an issue with filtering safekeepers based on retry attempts, which could filter some safekeepers indefinetely. This is fixed by using retry cooldown duration instead of retry attempts.
- Some `send_wal.rs` connections failed with errors without context. This is fixed by adding a timeline to safekeepers errors.

New retry logic works like this:
- Every candidate has a `next_retry_at` timestamp and is not considered for connection until that moment
- When walreceiver connection is closed, we update `next_retry_at` using exponential backoff, increasing the cooldown on every disconnect.
- When `last_record_lsn` was advanced using the WAL from the safekeeper, we reset the retry cooldown and exponential backoff, allowing walreceiver to reconnect to the same safekeeper instantly.
2022-08-18 13:38:23 +03:00
Anastasia Lubennikova
1a07ddae5f fix cargo test 2022-08-18 13:25:00 +03:00
Heikki Linnakangas
9bc12f7444 Move auto-generated 'bindings' to a separate inner module.
Re-export only things that are used by other modules.

In the future, I'm imagining that we run bindgen twice, for Postgres
v14 and v15. The two sets of bindings would go into separate
'bindings_v14' and 'bindings_v15' modules.

Rearrange postgres_ffi modules.

Move function, to avoid Postgres version dependency in timelines.rs
Move function to generate a logical-message WAL record to postgres_ffi.
2022-08-18 13:25:00 +03:00
Rory de Zoete
92bdf04758 Fix: Always build images (#2296)
* Always build images

* Remove unused

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2022-08-18 09:41:24 +02:00
Kirill Bulatov
67e091c906 Rework init in pageserver CLI (#2272)
* Do not create initial tenant and timeline (adjust Python tests for that)
* Rework config handling during init, add --update-config to manage local config updates
2022-08-17 23:24:47 +03:00
Alexander Bayandin
dc102197df workflows/benchmarking: increase timeout (#2294) 2022-08-17 17:16:26 +01:00
Rory de Zoete
262cdf8344 Update cachepot endpoint (#2290)
* Update cachepot endpoint

* Update dockerfile & remove env

* Update image building process

* Cannot use metadata endpoint for this

* Update workflow

* Conform to kaniko syntax

* Update syntax

* Update approach

* Update dockerfiles

* Force update

* Update dockerfiles

* Update dockerfile

* Cleanup dockerfiles

* Update s3 test location

* Revert s3 experiment

* Add more debug

* Specify aws region

* Remove debug, add prefix

* Remove one more debug

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2022-08-17 18:02:03 +02:00
Kirill Bulatov
3b819ee159 Remove extra type aliases (#2280) 2022-08-17 17:51:53 +03:00
bojanserafimov
e9a3499e87 Fix flaky pageserver restarts in tests (#2261) 2022-08-17 08:17:35 -04:00