Commit Graph

126 Commits

Author SHA1 Message Date
anastasia
5b34afe893 Bump vendor/postgres to use local relation cache for smgr_exists 2022-03-10 17:36:09 +04:00
anastasia
c1b3836df1 Bump vendor/postgres 2022-02-24 12:52:12 +03:00
Heikki Linnakangas
e4670a5f1e Remove the PageVersions abstraction.
Since commit fdd987c3ad, it was only used in InMemoryLayers. Let's
just "inline" the code into InMemoryLayer itself.

I originally did this as part of a bigger PR (#1267). With that PR,
one in-memory layer, and one ephemeral file, would hold page versions
belonging to multiple segments. Currently, PageVersions can only hold
versions for a single segment, so that would need to be changed.
Rather than modify PageVersions to support that, just remove it
altogether.
2022-02-23 21:04:39 +02:00
Stas Kelvich
058123f7ef Bump postgres to fix zenith_test_utils linkage on macOS. 2022-02-23 20:33:47 +03:00
anastasia
74a0942a77 Fix zenith feedback processing at compute node.
Add test for backpressure
2022-02-22 13:56:21 +03:00
Arseny Sher
328e3b4189 bump vendor/postgres to fix compiler warnings 2022-02-15 06:51:16 +03:00
Arseny Sher
d4d26f619d bump vendor/postgres to fix compilation warning 2022-02-14 21:00:11 +03:00
Arseny Sher
36481f3374 bump vendor/postgres to init pgxactoff in walproposer
ref #1244
2022-02-14 15:57:38 +03:00
Heikki Linnakangas
c45ee13b4e Bump vendor/postgres, to fix memory leak.
See https://github.com/zenithdb/postgres/pull/129
2022-02-10 11:29:38 +02:00
anastasia
f1e7db9d0d Bump vendor/postgres rebased to 14.2 2022-02-10 11:19:10 +03:00
Heikki Linnakangas
a504cc87ab Bump vendor/postgres for "Make getpage requests interruptible"
See https://github.com/zenithdb/zenith/issues/1224
2022-02-09 16:13:46 +02:00
Heikki Linnakangas
5268bbc840 Bump vendor/postgres for fixes to cluster size limit.
See https://github.com/zenithdb/postgres/pull/126
2022-02-09 15:52:21 +02:00
Arseny Sher
e1d770939b Bump vendor/postgres to fix recent CI failure.
See zenithdb/postgres#127
2022-02-09 08:50:45 -05:00
anastasia
642797b69e Implement cluster size quota for zenith compute node.
Use GUC zenith.max_cluster_size to set the limit.

If limit is reached, extend requests will throw out-of-space error.
When current size is too close to the limit - throw a warning.

Add new test: test_timeline_size_quota.
2022-02-06 02:16:36 +03:00
Arseny Sher
32c7859659 bump vendor/postgres 2022-02-05 01:27:31 +03:00
Arthur Petukhovsky
cedde559b8 Add test for replacement of the failed safekeeper (#1179)
* Add test to replace failed safekeeper

* Restart safekeepers in test_replace_safekeeper

* Update vendor/postgres
2022-01-27 17:26:55 +03:00
anastasia
c44695f34b bump vendor/postgres 2022-01-27 11:20:45 +03:00
anastasia
5abe2129c6 Extend replication protocol with ZentihFeedback message
to pass current_timeline_size to compute node

Put standby_status_update fields into ZenithFeedback and send them as one message.
Pass values sizes together with keys in ZenithFeedback message.
2022-01-27 11:20:45 +03:00
Heikki Linnakangas
17b7caddcb Update vendor/postgres: silence excessive logging from walproposer. 2022-01-14 20:51:02 +02:00
Arthur Petukhovsky
0a34a592d5 Bump vendor/postgres (#1120) 2022-01-13 20:28:37 +03:00
Arthur Petukhovsky
5a6405848d Bump vendor/postgres (#1086) 2022-01-05 14:27:51 +03:00
Alexey Kondratov
f64074c609 Move compute_tools from console repo (zenithdb/console#383)
Currently it's included with minimal changes and lives aside of the main
workspace. Later we may re-use and combine common parts with zenith
control_plane.

This change is mostly needed to unify cloud deployment pipeline:
1.1. build compute-tools image
1.2. build compute-node image based on the freshly built compute-tools
2. build zenith image

So we can roll new compute image and new storage required by it to
operate properly. Also it becomes easier to test console against some
specific version of compute-node/-tools.
2021-12-28 20:17:29 +03:00
anastasia
980f5f8440 Propagate remote_consistent_lsn to safekeepers.
Change meaning of lsns in HOT_STANDBY_FEEDBACK:
flush_lsn = disk_consistent_lsn,
apply_lsn = remote_consistent_lsn
Update compute node backpressure configuration respectively.

Update compute node configuration:
set 'synchronous_commit=remote_write' in setup without safekeepers.
This way compute node doesn't have to wait for data checkpoint on pageserver.
This doesn't guarantee data durability, but we only use this setup for tests, so it's fine.
2021-12-24 15:32:54 +03:00
Arthur Petukhovsky
f56db3da68 Bump vendor/postgres (#996) 2021-12-21 16:53:08 +03:00
anastasia
bb5aba42eb bump vendor/postgres to use correct backpressure commit 2021-12-08 18:57:18 +03:00
Dmitry Rodionov
557e3024cd Forward pageserver connection string from compute to safekeeper
This is needed for implementation of tenant rebalancing. With this
change safekeeper becomes aware of which pageserver is supposed to be
used for replication from this particular compute.
2021-12-06 21:28:49 +03:00
anastasia
b7685eb6ba Enable backpressure 2021-12-06 12:49:42 +03:00
anastasia
c7f3b4e62c Clarify the meaning of StandbyReply LSNs:
write_lsn - The last LSN received and processed by pageserver's walreceiver.
flush_lsn - same as write_lsn. At pageserver it doesn't guarantees data persistence, but it's fine. We rely on safekeepers.
apply_lsn - The LSN at which pageserver guaranteed persistence of all received data (disk_consistent_lsn).
2021-12-06 12:49:42 +03:00
Arseny Sher
cba4da3f4d Add term history to safekeepers.
Persist full history of term switches on safekeepers instead of storing only the
single term of the highest entry (called epoch). This allows easily and
correctly find the divergence point of two logs and truncate the obsolete part
before overwriting it with entries of the newer proposer(s).

Full history of the proposer is transferred in separate message before proposer
starts streaming; it is immediately persisted by safekeeper, though he might not
yet have entries for some older terms there. That's because we can't atomically
append to WAL and update the control file anyway, so locally available WAL must
be taken into account when looking at the history.

We should sometimes purge term history entries beyond truncate_lsn; this is not
done here.

Per https://github.com/zenithdb/rfcs/pull/12

Closes #296.

Bumps vendor/postgres.
2021-12-03 12:43:57 +03:00
Konstantin Knizhnik
f72d4814b1 Extract page images from FPI WAL records (#949)
* Extract page images from FPI WAL records

* Fix issues reported in review
2021-11-30 12:57:26 +03:00
Alexey Kondratov
0d6bf14ecb Use vendor/postgres rebased on top of REL_14_1 2021-11-12 16:27:43 +03:00
Arthur Petukhovsky
9aaa02bc9a Fix high CPU usage in walproposer (#860)
* Bump vendor/postgres

* Update time limits for test_restarts_under_load
2021-11-10 17:18:07 +03:00
Arseny Sher
229dc7704f Bump vendor/postgres. 2021-11-08 17:32:13 +03:00
Heikki Linnakangas
7ed39655dc Bump vendor/postgres 2021-11-04 10:35:50 +02:00
Arseny Sher
1877bbc7cb bump vendor/postgres to fix reconnection busy loop 2021-10-26 15:43:19 +03:00
Alexey Kondratov
9070a4dc02 Turn off back pressure by default 2021-10-22 01:40:43 +03:00
Konstantin Knizhnik
c310932121 Implement backpressure for compute node to avoid WAL overflow
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2021-10-21 18:15:50 +03:00
Heikki Linnakangas
feae7f39c1 Support read-only nodes
Change 'zenith.signal' file to a human-readable format, similar to
backup_label. It can contain a "PREV LSN: %X/%X" line, or a special
value to indicate that it's OK to start with invalid LSN ('none'), or
that it's a read-only node and generating WAL is forbidden
('invalid').

The 'zenith pg create' and 'zenith pg start' commands now take a node
name parameter, separate from the branch name. If the node name is not
given, it defaults to the branch name, so this doesn't break existing
scripts.

If you pass "foo@<lsn>" as the branch name, a read-only node anchored
at that LSN is created. The anchoring is performed by setting the
'recovery_target_lsn' option in the postgresql.conf file, and putting
the server into standby mode with 'standby.signal'.

We no longer store the synthetic checkpoint record in the WAL segment.
The postgres startup code has been changed to use the copy of the
checkpoint record in the pg_control file, when starting in zenith
mode.
2021-10-19 09:48:12 +03:00
Heikki Linnakangas
e3945d94fd Store unlogged tables locally, and replace PD_WAL_LOGGED.
All the changes are in the vendor/postgres side. However, because we now
generate fewer Full Page Writes, the 'branch_behind' test needs to be
modified so that it still generates enough WAL to consume a few WAL
segments.
2021-10-06 10:58:15 +03:00
Heikki Linnakangas
86e14f2f1a Bump vendor/postgres 2021-09-30 20:36:57 +03:00
Heikki Linnakangas
c846a824de Bump vendor/postgres, to use buffered I/O in WAL redo process.
Greatly reduces the CPU overhead in the WAL redo process.
2021-09-24 21:48:30 +03:00
Max Sharnoff
139936197a bump vendor/postgres: Catch walkeeper ErrorResponse (#650)
Postgres commit message:

PQgetCopyData can sometimes indicate that the copy is done if the
backend returns an error response. So while we still expect that the
walkeeper never sends CopyDone, we can't expect it to never produce
errors.
2021-09-23 14:55:38 -07:00
Arseny Sher
97c4cd4434 bump vendor/postgres 2021-09-23 12:22:53 +03:00
Heikki Linnakangas
296586b7ce bump vendor/postgres 2021-09-20 18:52:55 +03:00
Heikki Linnakangas
ad5f16f724 Improve the protocol between Postgres and page server.
- Use different message formats for different kinds of response messages.

- Add an Error message, for passing errors from page server to Postgres.
  Previously, we would respond to 'exists' request with 'false', and
  to 'nblocks' request with 0, if an error happened. Fix those to return
  an error message to the client. GetPage requests had a mechanism to
  return an error, but it was just a flag with no error message.

- Add a flag to requests, to indicate that we actually want the latest
  page version on the timeline, and the LSN is just a hint that we know
  that there haven't been any modifications since that LSN. The flag isn't
  used for anything yet, but I'm planning to use it to fix
  https://github.com/zenithdb/zenith/issues/567
2021-09-17 16:38:14 +03:00
Arseny Sher
87bc18972f bump vendor/postgres 2021-09-16 11:41:29 +03:00
Arseny Sher
6dc66eefb6 bump vendor/postgres 2021-09-11 06:10:10 +03:00
Konstantin Knizhnik
f83108002b Revert "Bump postgres version"
This reverts commit 511873aaed.
2021-09-07 15:06:43 +03:00
Konstantin Knizhnik
511873aaed Bump postgres version 2021-09-07 15:05:08 +03:00
Arseny Sher
51d36b9930 bump vendor/postgres 2021-09-06 13:06:20 +03:00