Commit Graph

58 Commits

Author SHA1 Message Date
Arseny Sher
7ad5a5e847 Enable timeout on reading from socket in safekeeper WAL service.
TCP_KEEPALIVE is not enabled by default, so this prevents hanged up connections
in case of abrupt client termination. Add 'closed' flag to PostgresBackendReader
and pass it during handles join to prevent attempts to read from socket if we
errored out previously -- now with timeouts this is a common situation.

It looks like
2023-04-10T18:08:37.493448Z  INFO {cid=68}:WAL
receiver{ttid=59f91ad4e821ab374f9ccdf918da3a85/16438f99d61572c72f0c7b0ed772785d}:
terminated: timed out

Presumably fixes https://github.com/neondatabase/neon/issues/3971
2023-04-11 11:45:43 +04:00
Heikki Linnakangas
f0b2e076d9 Move compute_ctl structs used in HTTP API and spec file to separate crate.
This is in preparation of using compute_ctl to launch postgres nodes
in the neon_local control plane. And seems like a good idea to
separate the public interfaces anyway.

One non-mechanical change here is that the 'metrics' field is moved
under the Mutex, instead of using atomics. We were not using atomics
for performance but for convenience here, and it seems more clear to
not use atomics in the model for the HTTP response type.
2023-04-09 21:52:28 +03:00
Kirill Bulatov
1300dc9239 Replace Python IT test with the Rust one 2023-03-29 00:08:30 +03:00
Vadim Kharitonov
1401021b21 Be able to get number of CPUs (#3774)
After enabling autoscaling, we faced the issue that customers are not
able to get the number of CPUs they use at this moment. Therefore I've
added these two options:

1. Postgresql function to allow customers to call it whenever they want
2. `compute_ctl` endpoint to show these number in console
2023-03-10 19:00:20 +02:00
Arseny Sher
7627d85345 Move async postgres_backend to its own crate.
To untie cyclic dependency between sync and async versions of postgres_backend,
copy QueryError and some logging/error routines to postgres_backend.rs. This is
temporal glue to make commits smaller, sync version will be dropped by the
upcoming commit completely.
2023-03-09 20:45:56 +03:00
Alexander Bayandin
000eb1b069 Bump tempfile from 3.3.0 to 3.4.0 (#3709)
Update `tempfile` crate to get rid of `remove_dir_all` dependency
Ref https://github.com/neondatabase/neon/security/dependabot/15
2023-02-27 12:44:08 +00:00
Heikki Linnakangas
6f9af0aa8c [proxy] Enable OpenTelemetry tracing.
This commit sets up OpenTelemetry tracing and exporter, so that they
can be exported as OpenTelemetry traces as well.

All outgoing HTTP requests will be traced. A separate (child)
span is created for each outgoing HTTP request, and the tracing
context is also propagated to the server in the HTTP headers.

If tracing is enabled in the control plane and compute node too, you
can now get an end-to-end distributed trace of what happens when a new
connection is established, starting from the handshake with the
client, creating the 'start_compute' operation in the control plane,
starting the compute node, all the way to down to fetching the base
backup and the availability checks in compute_ctl.

Co-authored-by: Dmitry Ivanov <dima@neon.tech>
2023-02-17 15:32:14 +03:00
Dmitry Ivanov
9657459d80 [proxy] Fix possible unsoundness in the websocket machinery (#3569)
This PR replaces the ill-advised `unsafe Sync` impl with a de-facto
standard way to solve the underlying problem.

TLDR:
- tokio::task::spawn requires future to be Send
- ∀t. (t : Sync) <=> (&t : Send)
- ∀t. (t : Send + !Sync) => (&t : !Send)
2023-02-10 12:45:38 +03:00
Christian Schwarz
175a577ad4 automatic layer eviction
This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.

Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold

The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated periodically at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.

This feature has no explicit killswitch since it's off by default.
2023-02-09 13:33:55 +01:00
Christian Schwarz
58fa4f0eb7 maintain access stats for historic layers
This patch adds basic access statistics for historic layers
and exposes them in the management API's `LayerMapInfo`.

We record the accesses in the `{Delta,Image}Layer::load()` function
because it's the common path of
* page_service (`Timline::get_reconstruct_data()`)
* Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`)

The stats survive residence status changes, and record these as well.

When scraping the layer map endpoint to record its evolution over time,
one must account for stat resets because they are in-memory only and
will reset on pageserver restart.
Use the launch timestamp header added by (#3527) to identify pageserver restarts.

This is PR https://github.com/neondatabase/neon/pull/3496
2023-02-06 17:01:38 +01:00
bojanserafimov
ada933eb42 Pageserver read trace utils (#2795)
List, dump, and analyze read traces.
2023-02-02 15:33:40 -05:00
Dmitry Ivanov
ea0278cf27 [proxy] Implement compute node info cache (#3331)
This patch adds a timed LRU cache implementation and a compute node info cache on top of that.
Cache entries might expire on their own (default ttl=5mins) or become invalid due to real-world events,
e.g. compute node scale-to-zero event, so we add a connection retry loop with a wake-up call.

Solved problems:
- [x] Find a decent LRU implementation.
- [x] Implement timed LRU on top of that.
- [x] Cache results of `proxy_wake_compute` API call.
- [x] Don't invalidate newer cache entries for the same key.
- [x] Add cmdline configuration knobs (requires some refactoring).
- [x] Add failed connection estab metric.
- [x] Refactor auth backends to make things simpler (retries, cache
placement, etc).
- [x] Address review comments (add code comments + cleanup).
- [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute
(e.g. stalled cache entry).
- [x] Add high-level description for `TimedLru`.

TODOs (will be addressed later):
- [ ] Add cache metrics (hit, spurious hit, miss).
- [ ] Synchronize http requests across concurrent per-client tasks
(https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069).
- [ ] Cache results of `proxy_get_role_secret` API call.
2023-02-01 17:11:41 +03:00
Heikki Linnakangas
006ee5f94a Configure 'compute_ctl' to use OpenTelemetry exporter.
This allows tracing the startup actions e.g. with Jaeger
(https://www.jaegertracing.io/). We use the "tracing-opentelemetry"
crate, which turns tracing spans into OpenTelemetry spans, so you can
use the usual "#[instrument]" directives to add tracing.

I put the tracing initialization code to a separate crate,
`tracing-utils`, so that we can reuse it in other programs. We
probably want to set up tracing in the same way in all our programs.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-01-26 15:20:03 +02:00
bojanserafimov
a3d7ad2d52 Implement layer map using immutable BST (#2998) 2023-01-20 16:10:12 -05:00
Anastasia Lubennikova
2cbe84b78f Proxy metrics (#3290)
Implement proxy metrics collection.
Only collect metric for outbound traffic.

Add proxy CLI parameters:
- metric-collection-endpoint
- metric-collection-interval.

Add test_proxy_metric_collection test.

Move shared consumption metrics code to libs/consumption_metrics.
Refactor the code.
2023-01-16 15:17:28 +00:00
Kirill Bulatov
bce4233d3a Rework Cargo.toml dependencies (#3322)
* Use workspace variables from cargo, coming with rustc
[1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22)

See
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table
and
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table
sections.

Now, all dependencies in all non-root `Cargo.toml` files are defined as 
```
clap.workspace = true
```

sometimes, when extra features are needed, as 
```
bytes = {workspace = true, features = ['serde'] }
```

With the actual declarations (with shared features and version
numbers/file paths/etc.) in the root Cargo.toml.
Features are additive:

https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace

* Uses the mechanism above to set common, 2021, edition and license across the
workspace

* Mechanically bumps a few dependencies

* Updates hakari format, as it suggested:
```
work/neon/neon kb/cargo-templated ❯ cargo hakari generate
info: no changes detected
info: new hakari format version available: 3 (current: 2)
(add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`)
```
2023-01-13 18:13:34 +02:00
Vadim Kharitonov
29a2465276 Update rust version in toolchain 2023-01-12 15:16:52 +01:00
Dmitry Ivanov
61194ab2f4 Update rust-postgres everywhere
I've rebased[1] Neon's fork of rust-postgres to incorporate
latest upstream changes (including dependabot's fixes),
so we need to advance revs here as well.

[1] https://github.com/neondatabase/rust-postgres/commits/neon
2022-12-17 00:26:10 +03:00
Dmitry Ivanov
83baf49487 [proxy] Forward compute connection params to client
This fixes all kinds of problems related to missing params,
like broken timestamps (due to `integer_datetimes`).

This solution is not ideal, but it will help. Meanwhile,
I'm going to dedicate some time to improving connection machinery.

Note that this **does not** fix problems with passing certain parameters
in a reverse direction, i.e. **from client to compute**. This is a
separate matter and will be dealt with in an upcoming PR.
2022-12-16 21:37:50 +03:00
Arseny Sher
2d42f84389 Add storage_broker binary.
Which ought to replace etcd. This patch only adds the binary and adjusts
Dockerfile to include it; subsequent ones will add deploy of helm chart and the
actual replacement.

It is a simple and fast pub-sub message bus. In this patch only safekeeper
message is supported, but others can be easily added.

Compilation now requires protoc to be installed. Installing protobuf-compiler
package is fine for Debian/Ubuntu.

ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394
2022-11-23 22:05:59 +04:00
Kirill Bulatov
f87017c04d Omit dependencies' debug info (#2803)
Based on https://neondb.slack.com/archives/C0277TKAJCA/p1668079753506749

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2022-11-14 12:44:41 +00:00
Heikki Linnakangas
14c623b254 Make it possible to build with old cargo version.
I'm using the Rust compiler and cargo versions from Debian packages,
but the latest available cargo Debian package is quite old, version
1.57.  The 'named-profiles' features was not stabilized at that
version yet, so ever since commit a463749f5, I've had to manually add
this line to the Cargo.toml file to compile. I've been wishing that
someone would update the cargo Debian package, but it doesn't seem to
be happening any time soon.

This doesn't seem to bother anyone else but me, but it shouldn't hurt
anyone else either. If there was a good reason, I could install a
newer cargo version with 'rustup', but if all we need is this one line
in Cargo.toml, I'd prefer to continue using the Debian packages.
2022-10-13 15:17:00 +03:00
Kirill Bulatov
b8eb908a3d Rename old project name references 2022-09-14 08:14:05 +03:00
Heikki Linnakangas
a4e79db348 Move neon_local to control_plane.
Seems a bit silly to have a separate crate just for the executable. It
relies on the control plane for everything it does, and it's the only
user of the control plane.
2022-09-02 16:34:33 +03:00
MMeent
a463749f59 Slim down compute-node images (#2346)
Slim down compute-node images:

- Optimize compute_ctl build for size, not performance & debug-ability
- Don't run unused stages. Saves time in not building the PLV8 extension.
- Do not include static libraries in clean postgres
- Do the installation and finishing touches in the final layer in one job
    This allows docker (and kaniko) to only register one change to the files,
    removing potentially duplicate changed files.
- The runtime library for libreadline-dev is libreadline8, changing the dependency saves 45 MB
- libprotobuf-c-dev -> libprotobuf-c1, saving 100 kB
- libossp-uuid-dev -> libossp-uuid16, saving 150 kB
- gdal-bin + libgdal-dev -> libgeos-c1v5 + libgdal28 + libproj19, saving 747MB
- binutils @ testing -> libc6 @ testing, saving 32 MB
2022-09-02 14:34:40 +02:00
bojanserafimov
ef40e404cf Rename zenith crate to neon_local (#1625) 2022-05-05 19:06:53 -04:00
Dmitry Ivanov
d3f356e7a8 Update rust-postgres project-wide (#1525)
* Update `rust-postgres` project-wide

This commit points to https://github.com/neondatabase/rust-postgres/commits/neon
in order to test our patches on top of the latest version of this crate.

* [proxy] Update `hmac` and `sha2`
2022-04-22 17:31:58 +03:00
Kirill Bulatov
81cad6277a Move and library crates into a dedicated directory and rename them 2022-04-21 13:30:33 +03:00
Kirill Bulatov
629688fd6c Drop redundant resolver setting for 2021 edition 2022-04-21 13:30:33 +03:00
Kirill Bulatov
81417788c8 walkeeper -> safekeeper 2022-04-18 12:52:31 +03:00
Dmitry Ivanov
ab20f2c491 Use the same version of rust-postgres everywhere. (#1516)
Turns out we still had a stale dep in `compute_tools`.
2022-04-15 18:36:11 +03:00
Dmitry Rodionov
eee0f51e0c use cargo-hakari to manage workspace_hack crate
workspace_hack is needed to avoid recompilation when different crates
inside the workspace depend on the same packages but with different
features being enabled. Problem occurs when you build crates separately
one by one. So this is irrelevant to our CI setup because there we build
all binaries at once, but it may be relevant for local development.

this also changes cargo's resolver version to 2
2022-03-29 10:42:04 +03:00
Dmitry Ivanov
a47dade622 [proxy] Migrate to async
This change makes most parts of the code asynchronous, except
for the `mgmt` subsystem (we're going to drop it anyway).

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2022-02-17 11:54:27 +03:00
Alexey Kondratov
06c28174c2 Integrate compute_tools into zenith workspace and improve logging (zenithdb/console#487) 2022-01-18 18:47:31 +03:00
Dmitry Ivanov
cb1b4a12a6 Add some prometheus metrics to pageserver
The metrics are served by an http endpoint, which
is meant to be spawned in a new thread.

In the future the endpoint will provide more APIs,
but for the time being, we won't bother with proper routing.
2021-08-03 21:42:24 +03:00
Stas Kelvich
bf45bef284 md5 auth for postgres_backend.rs 2021-07-19 14:52:41 +03:00
Dmitry Ivanov
eb7388e3e8 Add debug info to release builds
This is useful for profiling and, to some extent, debug.
Besides, debug info should not affect the performance.
2021-06-28 14:21:30 +03:00
Arseny Sher
37b0236e9a Move wal acceptor tests to python.
Includes fixtures for wal acceptors and associated setup.

Nothing really new here, but surprisingly this caught some issues in
walproposer.

ref #182
2021-06-15 15:14:27 +03:00
Eric Seppanen
df5a55c445 add workspace_hack crate
Our builds can be a little inconsistent, because Cargo doesn't deal well
with workspaces where there are multiple crates which have different
dependencies that select different features. As a workaround, copy what
other big rust projects do: add a workspace_hack crate.

This crate just pins down a set of dependencies and features that
satisfies all of the workspace crates.

The benefits are:
- running `cargo build` from one of the workspace subdirectories now
  works without rebuilding anything.
- running `cargo install` works (without rebuilding anything).
- making small dependency changes is much less likely to trigger large
  dependency rebuilds.
2021-05-07 13:08:31 -07:00
Eric Seppanen
f387769203 add zenith_utils crate
This is a place for code that's shared between other crates in this
repository.
2021-04-20 11:11:29 -07:00
Heikki Linnakangas
3600b33f1c Implement "timelines" in page server
This replaces the page server's "datadir" concept. The Page Server now
always works with a "Zenith Repository". When you initialize a new
repository with "zenith init", it runs initdb and loads an initial
basebackup of the freshly-created cluster into the repository, on "main"
branch. Repository can hold multiple "timelines", which can be given
human-friendly names, making them "branches". One page server simultaneously
serves all timelines stored in the repository, and you can have multiple
Postgres compute nodes connected to the page server, as long they all
operate on a different timeline.

There is a new command "zenith branch", which can be used to fork off
new branches from existing branches.

The repository uses the directory layout desribed as Repository format
v1 in https://github.com/zenithdb/rfcs/pull/5. It it *highly* inefficient:
- we never create new snapshots. So in practice, it's really just a base
  backup of the initial empty cluster, and everything else is reconstructed
  by redoing all WAL

- when you create a new timeline, the base snapshot and *all* WAL is copied
  from the new timeline to the new one. There is no smarts about
  referencing the old snapshots/wal from the ancestor timeline.

To support all this, this commit includes a bunch of other changes:

- Implement "basebackup" funtionality in page server. When you initialize
  a new compute node with "zenith pg create", it connects to the page
  server, and requests a base backup of the Postgres data directory on
  that timeline. (the base backup excludes user tables, so it's not
  as bad as it sounds).

- Have page server's WAL receiver write the WAL into timeline dir. This
  allows running a Page Server and Compute Nodes without a WAL safekeeper,
  until we get around to integrate that properly into the system. (Even
  after we integrate WAL safekeeper, this is perhaps how this will operate
  when you want to run the system on your laptop.)

- restore_datadir.rs was renamed to restore_local_repo.rs, and heavily
  modified to use the new format. It now also restores all WAL.

- Page server no longer scans and restores everything into memory at startup.
  Instead, when the first request is made for a timeline, the timeline is
  slurped into memory at that point.

- The responsibility for telling page server to "callmemaybe" was moved
  into Postgres libpqpagestore code. Also, WAL producer connstring cannot
  be specified in the pageserver's command line anymore.

- Having multiple "system identifiers" in the same page server is no
  longer supported. I repurposed much of that code to support multiple
  timelines, instead.

- Implemented very basic, incomplete, support for PostgreSQL's Extended
  Query Protocol in page_service.rs. Turns out that rust-postgres'
  copy_out() function always uses the extended query protocol to send
  out the command, and I'm using that to stream the base backup from the
  page server.

TODO: I haven't fixed the WAL safekeeper for this scheme, so all the
integration tests involving safekeepers are failing. My plan is to modify
the safekeeper to know about Zenith timelines, too, and modify it to work
with the same Zenith repository format. It only needs to care about the
'.zenith/timelines/<timeline>/wal' directories.
2021-04-20 19:11:27 +03:00
Stas Kelvich
59163cf3b3 Rework controle_plane code to reuse it in CLI.
Move all paths from control_plane to local_env which can set them
for testing environment or for local installation.
2021-04-10 12:09:20 +03:00
Heikki Linnakangas
1367332447 Separate walkeeper and pageserver sources into different directories.
The integration tests, which depend on both walkeeper and pageserver,
are moved into yet another directory, 'integration_tests'.
2021-04-06 13:15:26 +03:00
Konstantin Knizhnik
13f507f0b4 Calculate records CRC in wal decoder 2021-04-02 10:30:56 +03:00
Konstantin Knizhnik
02ca245081 Port wal_acceptor to rust 2021-04-02 10:30:56 +03:00
anastasia
853c130ff0 [issue #7] CLI parse subcommands 2021-03-29 15:59:28 +03:00
Stas Kelvich
8a80d055b9 daemon mode for pageserver 2021-03-29 15:59:28 +03:00
Heikki Linnakangas
4c0be32bf5 Implement Text User Interface to show log streams in multiple "windows"
Switch to 'slog' crate for logging, it gives us the flexibility that we
need for the widget to scroll logs on TUI
2021-03-29 15:59:28 +03:00
Stas Kelvich
9e89c1e2cd add CLI options to pageserver 2021-03-29 15:59:28 +03:00
Heikki Linnakangas
303a546aba Refactor locking in page cache, and use async I/O for WAL redo
Story on why:

The apply_wal_records() function spawned the special postgres process
to perform WAL redo. That was done in a blocking fashion: it launches
the process, then it writes the command to its stdin, then it reads
the result from its stdout.  I wanted to also read the child process's
stderr, and forward it to the page server's log (which is just the
page server's stderr ATM). That has classic potential for deadlock:
the child process might block trying to write to stderr/stdout, if the
parent isn't reading it. So the parent needs to perform the read/write
with the child's stdin/stdout/stderr in an async fashion.  So I
refactored the code in walredo.c into async style.  But it started to
hang. It took me a while to figure it out; async makes for really ugly
stacktraces, it's hard to figure out what's going on. The call path
goes like this: Page service -> get_page_at_lsn() in page cache ->
apply_wal_records() the page service is written in async style. And I
refactored apply_wal_recorsds() to also be async. BUT,
get_page_at_lsn() acquires a lock, in a blocking fashion.

The lock-up happened like this:

- a GetPage@LSN request arrives. The asynch handler thread calls
  get_page_at_lsn(), which acquires a lock. While holding the lock,
  it calls apply_wal_records().
- apply_wal_records() launches the child process, and waits on it
  using async functions
- more GetPage@LSN requests arrive. They also call get_page_at_lsn().
  But because the lock is already held, they all block

The subsequent GetPage@LSN calls that block waiting on the lock use up
all the async handler threads. All the threads are locked up, so there
is no one left to make progress on the apply_wal_records() call, so it
never releases the lock. Deadlock So my lesson here is that mixing
async and blocking styles is painful. Googling around, this is a well
known problem, there are long philosophical discussions on "what color
is your function".  My plan to fix that is to move the WAL redo into a
separate thread or thread pool, and have the GetPage@LSN handlers
communicate with it using channels.  Having a separate thread pool for
it makes sense anyway in the long run. We'll want to keep the postgres
process around, rather than launch it separately every time we need to
reconstruct a page. Also, when we're not busy reconstructing pages
that are needed right now by GetPage@LSN calls, we want to proactively
apply incoming WAL records from a "backlog".

Solution:

Launch a dedicated thread for WAL redo at startup. It has an event loop,
where it listens on a channel for requests to apply WAL. When a page
server thread needs some WAL to be applied, it sends the request on
the channel, and waits for response. After it's done the WAL redo process
puts the new page image in the page cache, and wakes up the requesting
thread using a condition variable.

This also needed locking changes in the page cache. Each cache entry now
has a reference counter and a dedicated Mutex to protect just the entry.
2021-03-29 15:59:28 +03:00