6 Commits

Author SHA1 Message Date
Peter Bendel
4bc6dbdd5f use a prod-like shared_buffers size for some perf unit tests (#11373)
## Problem

In Neon DBaaS we adjust the shared_buffers to the size of the compute,
or better described we adjust the max number of connections to the
compute size and we adjust the shared_buffers size to the number of max
connections according to about the following sizes
`2 CU: 225mb; 4 CU: 450mb; 8 CU: 900mb`

[see](877e33b428/goapp/controlplane/internal/pkg/compute/computespec/pg_settings.go (L405))

## Summary of changes

We should run perf unit tests with settings that is realistic for a
paying customer and select 8 CU as the reference for those tests.
2025-04-02 10:43:05 +00:00
Konstantin Knizhnik
1783501eaa Increase max connection for replica to prevent test flukyness (#10306)
## Problem

See https://github.com/neondatabase/neon/issues/10167
Too small number of `max_connections` (2) can cause failures of
test_physical_replication_config_mismatch_too_many_known_xids test

## Summary of changes

Increase `max_connections` to 5

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-13 20:01:03 +00:00
Konstantin Knizhnik
97a9abd181 Add GUC controlling whether to pause recovery if some critical GUCs at replica have smaller value than on primary (#9057)
## Problem

See https://github.com/neondatabase/neon/issues/9023

## Summary of changes

Ass GUC `recovery_pause_on_misconfig` allowing not to pause in case of
replica and primary configuration mismatch

See https://github.com/neondatabase/postgres/pull/501
See https://github.com/neondatabase/postgres/pull/502
See https://github.com/neondatabase/postgres/pull/503
See https://github.com/neondatabase/postgres/pull/504


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-12-01 12:23:10 +00:00
Tristan Partin
2a3410d1c3 Hide import behind TYPE_CHECKING
No need to import it if we aren't type checking anything.
2024-07-08 07:30:22 -07:00
Christian Schwarz
79401638df remove materialized page cache (#8105)
part of Epic https://github.com/neondatabase/neon/issues/7386

# Motivation

The materialized page cache adds complexity to the code base, which
increases the maintenance burden and risk for subtle and hard to
reproduce bugs such as #8050.

Further, the best hit rate that we currently achieve in production is ca
1% of materialized page cache lookups for
`task_kind=PageRequestHandler`. Other task kinds have hit rates <0.2%.

Last, caching page images in Pageserver rewards under-sized caches in
Computes because reading from Pageserver's materialized page cache over
the network is often sufficiently fast (low hundreds of microseconds).
Such Computes should upscale their local caches to fit their working
set, rather than repeatedly requesting the same page from Pageserver.

Some more discussion and context in internal thread
https://neondb.slack.com/archives/C033RQ5SPDH/p1718714037708459

# Changes

This PR removes the materialized page cache code & metrics.

The infrastructure for different key kinds in `PageCache` is left in
place, even though the "Immutable" key kind is the only remaining one.
This can be further simplified in a future commit.

Some tests started failing because their total runtime was dependent on
high materialized page cache hit rates. This test makes them
fixed-runtime or raises pytest timeouts:
* test_local_file_cache_unlink
* test_physical_replication
* test_pg_regress

# Performance

I focussed on ensuring that this PR will not result in a performance
regression in prod.

* **getpage** requests: our production metrics have shown the
materialized page cache to be irrelevant (low hit rate). Also,
Pageserver is the wrong place to cache page images, it should happen in
compute.
* **ingest** (`task_kind=WalReceiverConnectionHandler`): prod metrics
show 0 percent hit rate, so, removing will not be a regression.
* **get_lsn_by_timestamp**: important API for branch creation, used by
control pane. The clog pages that this code uses are not
materialize-page-cached because they're not 8k. No risk of introducing a
regression here.

We will watch the various nightly benchmarks closely for more results
before shipping to prod.
2024-06-20 11:56:14 +02:00
Konstantin Knizhnik
7fab731f65 Track size of FSM fork while applying records at replica (#5901)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1700560921471619

## Summary of changes

Update relation size cache for FSM fork in WAL records filter

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-12-05 18:49:24 +02:00