Commit Graph

7841 Commits

Author SHA1 Message Date
Heikki Linnakangas
e2bad5d9e9 Add debugging HTTP endpoint for dumping the cache tree 2025-05-12 22:54:03 +03:00
Heikki Linnakangas
5623e4665b bunch of fixes 2025-05-12 18:40:54 +03:00
Heikki Linnakangas
8abb4dab6d implement shrinking nodes 2025-05-12 03:57:10 +03:00
Heikki Linnakangas
731667ac37 better metrics of the art tree 2025-05-12 02:08:51 +03:00
Heikki Linnakangas
6a1374d106 Pack tree node structs more tightly, avoiding alignment padding 2025-05-12 01:01:58 +03:00
Heikki Linnakangas
f7c908f2f0 more metrics 2025-05-12 01:01:50 +03:00
Heikki Linnakangas
86671e3a0b Add a bunch of metric counters 2025-05-11 20:11:13 +03:00
Heikki Linnakangas
319cd74f73 Fix eviction 2025-05-11 19:34:50 +03:00
Heikki Linnakangas
0efefbf77c Add a few metrics, fix page eviction 2025-05-10 03:13:28 +03:00
Heikki Linnakangas
e6a4171fa1 fix concurrency issues with the LFC
- Add another locking hash table to track which cached pages are currently being
  modified, by smgrwrite() or smgrread() or by prefetch.

- Use single-value Leaf pages in the art tree. That seems simpler after all,
  and it eliminates some corner cases where a Value needed to be cloned, which
  made it tricky to use atomics or other interior mutability on the Values
2025-05-10 02:36:48 +03:00
Heikki Linnakangas
0c25ea9e31 reduce LOG noise 2025-05-09 18:27:36 +03:00
Heikki Linnakangas
6692321026 Remove dependency on io_uring, use plain std::fs ops instead
io_uring is a great idea in the long term, but for now, let's make it
easier to develop locally on macos, where io_uring is not available.
2025-05-06 17:46:21 +03:00
Heikki Linnakangas
791df28755 Linked list fix and add unit test 2025-05-06 16:46:54 +03:00
Heikki Linnakangas
d20da994f4 git add missing file 2025-05-06 15:36:48 +03:00
Heikki Linnakangas
6dbbdaae73 run 'cargo fmt' 2025-05-06 15:35:56 +03:00
Heikki Linnakangas
977bc09d2a Bunch of fixes, smarter iterator, metrics exporter 2025-05-06 15:28:50 +03:00
Heikki Linnakangas
44269fcd5e Implement simple eviction and free block tracking 2025-05-06 15:28:15 +03:00
Heikki Linnakangas
44cc648dc8 Implement iterator over keys
the implementation is not very optimized, but probably good enough for an MVP
2025-05-06 15:27:38 +03:00
Heikki Linnakangas
884e028a4a implement deletion in art tree 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
42df3e5453 debugging stats 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
fc743e284f more work on allocators 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
d02f9a2139 Collect garbage, handle OOMs 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
083118e98e Implement epoch system 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
54cd2272f1 more memory allocation stuff 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
e40193e3c8 simple block-based allocator 2025-05-06 15:27:38 +03:00
Heikki Linnakangas
ce9f7bacc1 Fix communicator client for recent changes in protocol and client code 2025-05-06 15:26:51 +03:00
Heikki Linnakangas
b7891f8fe8 Include 'neon-shard-id' header in client requests 2025-05-06 15:23:30 +03:00
Elizabeth Murray
5f2adaa9ad Remove some additional debug info messages. 2025-05-02 10:50:53 -07:00
Elizabeth Murray
3e5e396c8d Remove some debug info messages. 2025-05-02 10:24:18 -07:00
Elizabeth Murray
9d781c6fda Add a connection pool module to the grpc client. 2025-05-02 10:22:33 -07:00
Erik Grinaker
cf5d038472 service documentation 2025-05-02 15:20:12 +02:00
Erik Grinaker
d785100c02 page_api: add GetPageRequest::class 2025-05-02 10:48:32 +02:00
Erik Grinaker
2c0d930e3d page_api: add GetPageResponse::status 2025-04-30 16:48:45 +02:00
Erik Grinaker
66171a117b page_api: add GetPageRequestBatch 2025-04-30 15:31:11 +02:00
Erik Grinaker
df2806e7a0 page_api: add GetPageRequest::id 2025-04-30 15:00:16 +02:00
Erik Grinaker
07631692db page_api: protobuf comments 2025-04-30 12:36:11 +02:00
Erik Grinaker
4c77397943 Add neon-shard-id header 2025-04-30 11:18:06 +02:00
Erik Grinaker
7bb58be546 Use authorization header instead of neon-auth-token 2025-04-30 10:38:44 +02:00
Erik Grinaker
b5373de208 page_api: add get_slru_segment() 2025-04-29 17:59:27 +02:00
Erik Grinaker
b86c610f42 page_api: tweaks 2025-04-29 17:23:51 +02:00
Erik Grinaker
0f520d79ab pageserver: rename data_api to page_api 2025-04-29 15:58:52 +02:00
Heikki Linnakangas
93eb7bb6b8 include lots of changes that went missing by accident 2025-04-29 15:32:27 +03:00
Heikki Linnakangas
e58d0fece1 New communicator, with "integrated" cache accessible from all processes 2025-04-29 11:52:44 +03:00
Alex Chi Z.
11f6044338 fix(pageserver): report synthetic size = 1 if all tls offloaded (2) (#11731)
## Problem

https://github.com/neondatabase/neon/pull/11648 did this for resident
size instead of synthetic size.

## Summary of changes

Report synthetic_size == 1 if all timelines are offloaded.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-28 13:45:45 +00:00
Konstantin Knizhnik
692c0f3fb8 Prepare to prewarm support (#11740)
## Problem

See
(original prewarm implementation)
https://github.com/neondatabase/neon/pull/9197
(functions for storing/restoring LFC state)
https://github.com/neondatabase/neon/pull/9587
(store prefetch results in LFC)
https://github.com/neondatabase/neon/pull/10442

## Summary of changes

Preparation for prewarm implementation.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-28 13:24:18 +00:00
Alexander Bayandin
2b1d2a55d6 CI: fix typo oicd -> oidc (#11747)
## Problem

It's OIDC (OpenID Connect), not OICD

## Summary of changes
- Rename actions input `aws-oicd-role-arn` -> `aws-oidc-role-arn`
2025-04-28 12:44:28 +00:00
Konstantin Knizhnik
60b9fb1baf Ignore unlogged LSNs in set last written LSN (#11743)
## Problem

See https://github.com/neondatabase/neon/issues/11718
and https://neondb.slack.com/archives/C033RQ5SPDH/p1745122797538509

GIST other indexes performing "unlogged build" are using so called fake
LSNs - not a real LSN, but something like 0/1. Been stored in lwlsn
cache they cause incorrect lookup at PS.

## Summary of changes

Do not store fake LSNs in LwLSN hash.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-28 12:16:29 +00:00
Erik Grinaker
606f14034e pageserver: improve pageserver_smgr_query_seconds buckets (#11680)
## Problem

The `pageserver_smgr_query_seconds` buckets are too coarse, using powers
of 10: 1 µs, 10 µs, 100 µs, 1 ms, 10 ms, 100 ms, 1 s, 10 s, 100 s. This
is one of our most crucial latency metrics, and needs better resolution.

Touches #11594.

## Summary of changes

This patch uses buckets with better resolution around 1 ms (the typical
latency):

* 0.6 ms
* 1 ms
* 3 ms
* 6 ms
* 10 ms
* 30 ms
* 100 ms
* 1 s
* 3 s

These will be the same as the compute's `compute_getpage_wait_seconds`,
to make them comparable across the compute and Pageserver:
https://github.com/neondatabase/flux-fleet/pull/579. We sacrifice
buckets above 3 s, since these can already be considered "too slow".

This does not change the previously used `CRITICAL_OP_BUCKETS`, which is
also used for other operations on different timescales (e.g. LSN waits).
We should consider replacing this with more appropriate buckets for
specific operations, since it covers a large span with low resolution.
2025-04-28 11:52:44 +00:00
Conrad Ludgate
32393b4393 pg-sni-router: support compute TLS on different port (#11732)
## Problem

pg-sni-router isn't aware of compute TLS

## Summary of changes

If connections come in on port 4433, we require TLS to compute from
pg-sni-router
2025-04-28 11:29:44 +00:00
Alexander Bayandin
1a29f5672a CI(check-macos-build): trigger workflow automatically for PRs (#11706)
## Problem

- if-conditions for the `check-macos-build` workflow don't trigger it on
PRs with relevant changes (in Rust code or Postgres submodules).
- Jobs in the workflow depend on the presence of a cache, which is not
guaranteed.

## Summary of changes
- Fix if-conditions
- Use artifacts on top of cache whenever the workflow depends on it —
the cache might not be available
2025-04-28 09:03:10 +00:00