Commit Graph

2429 Commits

Author SHA1 Message Date
Egor Suvorov
8261455019 persistent_range_query: add layer_map_test 2022-11-24 04:47:19 +02:00
Egor Suvorov
aad88d6c39 persistent_range_query: add stress test 2022-11-24 03:50:18 +02:00
Egor Suvorov
6188315b51 persistent_range_query: more refs 2022-11-24 03:45:02 +02:00
Egor Suvorov
3a4b932d8a Draft generic persistent segment tree 2022-11-24 02:31:48 +02:00
Egor Suvorov
cc2b3c986c Simplify code: fewer lifetimes, auto-impl VecFrozenVersion 2022-11-24 02:11:06 +02:00
Egor Suvorov
c250c2664b Always require Clone for RangeQueryResult 2022-11-24 01:42:14 +02:00
Egor Suvorov
e5550a01b0 VecVersion: make it read-only, only the latest version can be modified 2022-11-24 01:28:13 +02:00
Egor Suvorov
45617ceaef RangeModification: add is_no_op/is_reinitialization 2022-11-24 00:12:05 +02:00
Egor Suvorov
29b39301fe RangeQueryResult: do not require Clone and do not provide default combine/add implementations 2022-11-24 00:11:52 +02:00
Egor Suvorov
b01a93be60 persistent_range_query: second draft
Move same generic parameters to associated types.
Work around private fields of SumResult<T>
2022-11-23 23:09:25 +02:00
Dmitry Ivanov
05db6458df [proxy] Fix project (endpoint) -related error messages 2022-11-23 23:03:29 +03:00
Egor Suvorov
4c68d019e3 persistent_range_query: first draft 2022-11-23 20:52:34 +02:00
Arseny Sher
2d42f84389 Add storage_broker binary.
Which ought to replace etcd. This patch only adds the binary and adjusts
Dockerfile to include it; subsequent ones will add deploy of helm chart and the
actual replacement.

It is a simple and fast pub-sub message bus. In this patch only safekeeper
message is supported, but others can be easily added.

Compilation now requires protoc to be installed. Installing protobuf-compiler
package is fine for Debian/Ubuntu.

ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394
2022-11-23 22:05:59 +04:00
Sergey Melnikov
aee3eb6d19 Deploy link proxy to us-east-2 (#2905) 2022-11-23 18:11:44 +01:00
Konstantin Knizhnik
a6e4a3c3ef Implement corrent truncation of FSM/VM forks on arbitrary position (#2609)
refer #2601

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2022-11-23 18:46:07 +02:00
Konstantin Knizhnik
21ec28d9bc Add bulk update test (#2902) 2022-11-23 17:51:35 +02:00
Heikki Linnakangas
de8f24583f Remove obsolete 'zenith_ctl' alias from compute images 2022-11-23 16:58:31 +02:00
Sergey Melnikov
85f0975c5a Setup eu-west-1 as region for PR testing (#2757) 2022-11-23 10:54:39 +01:00
Konstantin Knizhnik
1af087449a Reduce max_replication_write_lag to 10Mb (#1793) 2022-11-23 08:41:22 +02:00
Heikki Linnakangas
37625c4433 Remove obsolete design doc.
I considered archiving this under docs/rfcs, but looking at the contents,
I don't think it's relevant at all anymore. So let's just remove it.
2022-11-23 00:40:17 +02:00
Heikki Linnakangas
e9f4ca5972 Remove references to obsolete files in .gitignore 2022-11-23 00:40:17 +02:00
Alexey Kondratov
4bf3087aed [pageserver] list latest_gc_cutoff_lsn in the OpenAPI spec (#2894)
It seems that it's present in the API response for quite a while. It's
just not listed in the spec, fix it.
2022-11-22 21:10:49 +01:00
Dmitry Ivanov
9470bc9fe0 [proxy] Implement per-tenant traffic metrics 2022-11-22 18:50:57 +03:00
Heikki Linnakangas
86e483f87b Fix tenant size modeling code to include WAL at end of branch
Imagine that you have a tenant with a single branch like this:

---------------==========>
               ^
	    gc horizon
where:

----  is the portion of the branch that is older than retention period
====  is the portion of the branch that is newer than retention period.

Before this commit, the sizing model included the logical size at the
GC horizon, but not the WAL after that. In particular, that meant that
on a newly created tenant with just one timeline, where the retention
period covered the whole history of the timeline, i.e. gc_cutoff was 0,
the calculated tenant size was always zero.

We now include the WAL after the GC horizon in the size. So in the
above example, the calculated tenant size would be the logical size
of the database the GC horizon, plus all the WAL after it (marked with
===).

This adds a new `insert_point` function to the sizing model, alongside
`modify_branch`, and changes the code in size.rs to use the new
function. The new function takes an absolute lsn and logical size as
argument, so we no longer need to calculate the difference to the
previous point. Also, the end-size is now optional, because we now
need to add a point to represent the end of each branch to the model,
but we don't want to or need to calculate the logical size at that
point.
2022-11-22 17:11:27 +02:00
Christian Schwarz
f50d0ec0c9 test_runner: ignore 'sender is dropped while join handle is still alive' warnings
The need for a proper solution to this is tracked in
https://github.com/neondatabase/neon/issues/2885
2022-11-22 11:30:34 +01:00
Sergey Melnikov
74ec36a1bf Add pageserver-1.us-east-2.aws.neon.build (#2881) 2022-11-22 10:55:02 +01:00
Anastasia Lubennikova
a63ebb6446 Update vendor postgres to 14.6 and 15.1 2022-11-22 10:46:21 +02:00
Alexander Stanovoy
a5b898a31c Fix the order of checks in LSN (#2882)
We should check if LSN is in the lower range because it's constant and
only after wait for LSN to arrive if needed.
2022-11-22 02:28:41 +02:00
bojanserafimov
c6f095a821 Fix remote seqscan test (#2878) 2022-11-21 17:21:47 -05:00
Alexander Bayandin
6b2bc7f775 Nightly Benchmarks: Add RDS Postgres (#2859)
Add RDS Postgres `db.m5.large` instance to Nightly Benchmarks
2022-11-21 15:25:09 +00:00
Heikki Linnakangas
6c97fc941a Enable passing FAILPOINTS at startup.
- Pass through FAILPOINTS environment variable to the pageserver in
  "neon_local pageserver start" command

- On startup, list any failpoints that were set with FAILPOINTS to the log

- Add optional "extra_env_vars" argument to the NeonPageserver.start()
  function in the python fixture, so that you can pass FAILPOINTS

None of the tests use this functionality yet; that comes in a separate
commit.

closes https://github.com/neondatabase/neon/pull/2865
2022-11-21 16:24:19 +01:00
Alexander Bayandin
cb9b26776e Fix test_seqscans on remote cluster (#2869)
A remote project is reused between tests, so we need to ensure that we
don't have a table with the same name already created.
2022-11-19 23:39:42 +00:00
Heikki Linnakangas
684329d4d2 Another attempt at silencing test_gc_cutoff failures.
Increse the pgbench runtimes even further. The theory is that when
there are many other tests running at the same time, one pgbench run
could take a long time until it generates enough layers for GC to kick
in.
2022-11-19 19:28:56 +02:00
Heikki Linnakangas
ed40a045c0 Add more logging to track down test_gc_cutoff failure.
see https://github.com/neondatabase/neon/issues/2856
2022-11-19 14:12:21 +02:00
Heikki Linnakangas
3f39327622 Silence a few compiler warnings
I saw these from the build of the compute docker image in the CI
(compute-node-image-v15):

    pagestore_smgr.c: In function 'neon_prefetch':
    pagestore_smgr.c:1654:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     1654 |  BufferTag tag = (BufferTag) {
          |  ^~~~~~~~~
    walproposer.c:197:1: warning: no previous prototype for 'WalProposerSync' [-Wmissing-prototypes]
      197 | WalProposerSync(int argc, char *argv[])
          | ^~~~~~~~~~~~~~~
    libpagestore.c: In function 'pageserver_connect':
    libpagestore.c💯9: warning: variable 'wc' set but not used [-Wunused-but-set-variable]
      100 |   int   wc;
          |         ^~
    libpagestore.c: In function 'call_PQgetCopyData':
    libpagestore.c:144:9: warning: variable 'wc' set but not used [-Wunused-but-set-variable]
      144 |   int   wc;
          |         ^~

Harmless warnings, but let's be tidy.

In the passing, I added some "extern" to a few function declarations
that were missing them, and marked WalProposerSync as "static". Those
changes are also purely cosmetic.
2022-11-19 14:11:04 +02:00
Heikki Linnakangas
a50a7e8ac0 Try to silence test_gc_cutoff flakiness.
Commit d013a2b227 changed the test, so that it fails if pgbench runs
to completion without triggering the failpoint. That has now happened
several times in the CI. That's not expected, so this needs some
investigation, but as a quick fix just make the pgbench runs longer so
that we're closer to the situation before commit d013a2b227.

See https://github.com/neondatabase/neon/issues/2856
2022-11-19 01:19:09 +02:00
Egor Suvorov
e28eda7939 sourcetree/docs: mention hakari generate (#2864) 2022-11-18 22:30:41 +00:00
Christian Schwarz
f564dff0e3 make test_tenant_detach_smoke fail reproducibly
Add failpoint that triggers the race condition.
Skip test until we'll land the fix from
https://github.com/neondatabase/neon/pull/2851
with
https://github.com/neondatabase/neon/pull/2785
2022-11-18 17:15:34 +01:00
Christian Schwarz
d783889a1f timeline: explicit tracking of flush loop state: NotStarted, Running, Exited
This allows us to error out in the case where we request flush but the
flush loop is not running.
Before, we would only track whether it was started, but not when it
exited.
Better to use an enum with 3 states than a 2-state bool because then
the error message can answer the question whether we ever started
the flush loop or not.
2022-11-18 17:15:34 +01:00
bojanserafimov
2655bdbb2e Add remote seqscans test (#2840) 2022-11-18 09:05:13 -05:00
Konstantin Knizhnik
b9152f1ef4 Correctly terminate prefetch in case of pageserver restart (#2850)
refer #2819

This patch requires deep knowledge of prefetch internals.
So @MMeent  please review it or suggest better solution.
2022-11-18 15:04:58 +02:00
Heikki Linnakangas
328ec1ce24 Print a more full error message, with stack trace, on GC failure.
In a CI run, I got a test failure because of this error in the log,
from the test_get_tenant_size_with_multiple_branches test:

    ERROR gc_loop{tenant_id=f1630516d4b526139836ced93be0c878}: Gc failed, retrying in 2s: No such file or directory (os error 2)

There are known race conditions between GC and timeline deletion,
which surely caused that error. But if we didn't know the cause, it
would be pretty hard to debug without a stack trace.
2022-11-18 11:44:00 +02:00
Heikki Linnakangas
dcb79ef08f Silence yet another test failure from race condition between GC and delete.
Another similar case to commit 9ae4da4f31.
2022-11-18 10:18:15 +02:00
Konstantin Knizhnik
fd99e0fbc4 Build pg_prewrm extension (#2794) 2022-11-18 09:10:32 +02:00
Kirill Bulatov
60ac227196 Use modern flex and bison in macOS compilations (#2847) 2022-11-17 14:48:21 +00:00
MMeent
4a60051b0d Add codeowners section for /vendor/ (#2849)
After this, consent of @neondatabase/compute is required to update the
vendored PostgreSQL versions.
2022-11-17 14:31:34 +00:00
Heikki Linnakangas
24d3ed0952 Ignore another ERROR that's expected in test.
Got a test failure in CI because of this.
2022-11-17 12:42:56 +02:00
Alexander Bayandin
0a87d71294 test_runner: make proxy mgmt port mandatory (#2839)
Make `mgmt` port mandatory argument for `NeonProxy` (and set it for
`static_proxy`) to avoid port collision when tests run in parallel.
2022-11-16 17:57:48 +00:00
Heikki Linnakangas
150bddb929 Clean up process start/stop handling
* Poll more frequently when waiting for process start/stop. This
  speeds up startup and shutdown in tests. We did this already in
  commit 52ce1c9d53, which reduced the interval to 100 ms, but it was
  inadvertently increased back to 500 ms in commit d42700280f. Reduce
  it to 100 ms again, for both start and stop operations.

* Harmonize the start and stop loops, printing the dots and notices
  the same way in both. I considered extracting the logic to a
  separate retry-function that takes a closure as argument that does
  the polling, but as long as we only have two copies, the code
  duplication isn't that bad.

* Remove newline after "Starting pageserver" and "Starting etcd"
  messages, so that the progress-indicator dots that are printed once
  a second are printed on the same line. Before:

    Starting pageserver at '127.0.0.1:64000' in '.neon'
    ...
    pageserver started, pid: 2538937

  After:

    Starting pageserver at '127.0.0.1:64000' in '.neon'...
    pageserver started, pid: 2538937

  The "Starting safekeeper" message already got this right.

* Update example output in README.md to match
2022-11-16 19:51:37 +02:00
Alexander Bayandin
2b728bc69e test_forward_compatibility: fix path to pg_distrib_dir (#2826)
Set correct `pg_distrib_dir` in `pageserver.toml` and in neon_local
`config`.

`test_forward_compatibility` shows flakiness during `neon_local pg
start`, so hopefully, the patch will help.

```
2022-11-15 16:07:34.091 GMT [13338] LOG:  starting with zenith basebackup at LSN 0/A6A9310, prev 0/0
2022-11-15 16:07:34.091 GMT [13338] FATAL:  cannot start in read-write mode from this base backup
2022-11-15 16:07:34.091 GMT [13337] LOG:  startup process (PID 13338) exited with exit code 1
```
2022-11-16 15:14:36 +00:00