Commit Graph

4417 Commits

Author SHA1 Message Date
Abhijeet Patil
f19a8cecf0 removing the nextest filter 2024-01-19 11:32:26 +00:00
Abhijeet Patil
c83de86038 changing debug build to gcc 2024-01-19 11:32:26 +00:00
Abhijeet Patil
55f549a404 changing debug build to gcc 2024-01-19 11:32:26 +00:00
Abhijeet Patil
d698a7d1b1 reverting back the debug compiler to clang 2024-01-19 11:32:26 +00:00
Abhijeet Patil
60202936fe testing if debug build will work with gcc 2024-01-19 11:32:26 +00:00
Abhijeet Patil
27b47c65f8 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
f8a8ff8184 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
8a67dc396d fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
dbf4fe6c65 enabled sanitizers only in debug build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
526366b950 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
d22ccd2392 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
af9c10c319 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
3ea0e13401 adding back the clang compiler for postgres headers 2024-01-19 11:32:26 +00:00
Abhijeet Patil
8fbeb7dc6f removing clang cc from headers 2024-01-19 11:32:26 +00:00
Abhijeet Patil
53f4b347a1 setting compiler to clang for debug and to gcc for release 2024-01-19 11:32:26 +00:00
Abhijeet Patil
c4dba3577f setting compiler to default i.e. gcc 2024-01-19 11:32:26 +00:00
Abhijeet Patil
9c911dbaf2 printing env 2024-01-19 11:32:26 +00:00
Abhijeet Patil
5f2f1a7e6e printing env 2024-01-19 11:32:26 +00:00
Abhijeet Patil
59cf9cf799 also building release builds 2024-01-19 11:32:26 +00:00
Abhijeet Patil
3c143976bb testing the regression test for debug and release branch 2024-01-19 11:32:26 +00:00
Abhijeet Patil
3e62479382 addressed review comments 2024-01-19 11:32:26 +00:00
Abhijeet Patil
2a857765e5 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
0c3e41e430 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
97b20eee40 fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
e067cb251d testing debug changes 2024-01-19 11:32:26 +00:00
Abhijeet Patil
ecffa25feb added debug info 2024-01-19 11:32:26 +00:00
Abhijeet Patil
2739ca00cd setting sanitizer flag based on the debug build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
a46960a855 using extend method instead of append
refactored the code to use a method to combine two lines together
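Assuming this refers to Python's list methods (the commit does not name the language or show the code), the distinction is that `append` adds its argument as a single element, while `extend` adds each item of an iterable. A minimal illustration with made-up flag lists:

```python
# Hypothetical compiler flags, used only to illustrate append vs extend.
base = ["-O0", "-g"]
sanitizer_flags = ["-fsanitize=address", "-fsanitize=undefined"]

appended = list(base)
appended.append(sanitizer_flags)  # adds the whole list as ONE nested element

extended = list(base)
extended.extend(sanitizer_flags)  # adds each flag as its own element

print(appended)  # ['-O0', '-g', ['-fsanitize=address', '-fsanitize=undefined']]
print(extended)  # ['-O0', '-g', '-fsanitize=address', '-fsanitize=undefined']
```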

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-01-19 11:32:26 +00:00
Abhijeet Patil
c3439466e5 re-enabled libseccomp 2024-01-19 11:32:26 +00:00
Abhijeet Patil
4a24620ed2 fix build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
11570f7706 fix format of file 2024-01-19 11:32:26 +00:00
Abhijeet Patil
5c938ee98c added page server warning message exclusion 2024-01-19 11:32:26 +00:00
Abhijeet Patil
b37244ab0c added error to allowed list 2024-01-19 11:32:26 +00:00
Abhijeet Patil
e584ff5630 removed release build to test regress test 2024-01-19 11:32:26 +00:00
Abhijeet Patil
28715b64df fixing build 2024-01-19 11:32:26 +00:00
Abhijeet Patil
003624b817 updated to latest from postgres 16 2024-01-19 11:32:26 +00:00
Abhijeet Patil
acfe048bb7 fixing build 2024-01-19 11:32:26 +00:00
abhijeet
330d9a8b02 testing if sanitiser works
moved sanitizers into their own workflow

merged all jobs into one

cleaned up failing job

cleaned up failing job

running just tests

fixing build

reverting changes

fixing linter error and build error

cleaning up job

added wal and extension builds

fixing build

fixing build

fixing build

added use sanitizer patch

testing if sanitiser works in main workflow

fixed format issue

fixing format issue

fixing format issue

added flags

disabled flags

enabling flags

enabling flags

added more options to flag

fixing build

fixing build

testing the regression run

added asan and ubsan flags for regression test

commented unit test and release build

fixing build

fix neon for sanitizers

enabled unit test

updated branch to test the fix

updated branch to test the fix

updated the commit id

fixing build

restoring the submodules to main

updated git modules and revision of commit

updated postgres 16 vendor dir

removed test
2024-01-19 11:32:26 +00:00
Alexander Bayandin
c65ac37a6d zenbenchmark: attach perf results to allure report (#6395)
## Problem

For PRs with the `run-benchmarks` label, we don't upload results to the db,
which makes such tests harder to debug. The only way to see the numbers
is to examine the GitHub Actions output, which is really inconvenient.
This PR adds zenbenchmark metrics to Allure reports.

## Summary of changes
- Create a JSON file with zenbenchmark results and attach it to the Allure
report
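A minimal sketch of the idea, assuming a pytest-style helper: metrics are serialized to JSON and handed to an attach callback. The function name `attach_perf_results`, the JSON layout and the attachment name are hypothetical. In a real run the callback could wrap allure-pytest's `allure.attach(body, name=..., attachment_type=allure.attachment_type.JSON)`; it is injected here so the sketch has no third-party dependency.

```python
import json
from typing import Callable, Dict, List


def attach_perf_results(
    results: List[Dict[str, object]],
    attach: Callable[[str, str], None],
) -> str:
    """Serialize benchmark metrics to JSON and pass them to an attach callback.

    `attach(body, name)` is a stand-in for a report attachment function
    (hypothetical wiring; e.g. a thin wrapper around allure.attach).
    """
    body = json.dumps({"metrics": results}, indent=2, sort_keys=True)
    attach(body, "zenbenchmark-results.json")
    return body


# Usage with a stand-in attach function that just records what it received.
attached: Dict[str, str] = {}
body = attach_perf_results(
    [{"name": "pgbench_tps", "value": 1234.5, "unit": "tps"}],  # made-up sample metric
    lambda text, name: attached.__setitem__(name, text),
)
```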
2024-01-18 20:59:43 +00:00
Arthur Petukhovsky
a092127b17 Fix truncateLsn initialization (#6396)
In 7f828890cf we changed the logic for persisting control_files. Previously
the control file was updated whenever `peer_horizon_lsn` jumped by more than
one segment, which caused `peer_horizon_lsn` to be initialized on disk as
soon as the safekeeper received its first `AppendRequest`.

That change caused an issue with `truncateLsn`, which could sometimes be
zero. This PR fixes it: `truncateLsn`/`peer_horizon_lsn` can now never be
zero once `timeline_start_lsn` is known.
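The invariant described above can be sketched as follows. This is a hypothetical, simplified model of the safekeeper state in Python, not the actual fix; in particular, seeding zero LSNs from `timeline_start_lsn` is an assumption about how the invariant could be enforced.

```python
from dataclasses import dataclass


@dataclass
class SafekeeperState:
    """Hypothetical, simplified view of safekeeper control-file fields (LSNs as ints)."""
    timeline_start_lsn: int = 0
    peer_horizon_lsn: int = 0
    truncate_lsn: int = 0


def initialize_horizons(state: SafekeeperState) -> SafekeeperState:
    """Ensure truncate/peer-horizon LSNs are non-zero once timeline_start_lsn is known.

    Assumption for illustration: a zero value is replaced by timeline_start_lsn,
    while already-initialized (non-zero) values are left untouched.
    """
    if state.timeline_start_lsn != 0:
        if state.truncate_lsn == 0:
            state.truncate_lsn = state.timeline_start_lsn
        if state.peer_horizon_lsn == 0:
            state.peer_horizon_lsn = state.timeline_start_lsn
    return state


state = initialize_horizons(SafekeeperState(timeline_start_lsn=0x1000))
```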

Closes https://github.com/neondatabase/neon/issues/6248
2024-01-18 18:55:24 +00:00
Christian Schwarz
e8f773387d pagebench: avoid noise about CopyFail in PS logs (#6392)
Before this patch, pagebench get-page-latest-lsn would sometimes cause
noisy errors about the `CopyFail` protocol message in the pageserver log.

refs https://github.com/neondatabase/neon/issues/6390
2024-01-18 18:50:42 +00:00
Christian Schwarz
00936d19e1 pagebench: use tracing panic hook (#6393) 2024-01-18 18:39:38 +00:00
Joonas Koivunen
57155ada77 temp: human readable summaries for relative access time compared to absolute (#6384)
When testing the new eviction order, the problem is that disk-usage-based
evictions are (currently) rare and each one is unique; this PR adds a
human-readable summary of what the absolute order would have done and what
the relative order does. The assumption is that this logging will make the
few eviction runs in staging more useful.

Cc: #5304 for allowing testing in the staging
2024-01-18 17:21:08 +02:00
Konstantin Knizhnik
02b916d3c9 Use [NEON_SMGR] tag for all messages in neon extension (#6313)
## Problem

Use [NEON_SMGR] for all log messages produced by the neon extension.

## Summary of changes

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-01-18 17:08:34 +02:00
Anastasia Lubennikova
e6e013b3b7 Fix pgbouncer settings update:
- Start pgbouncer in the VM as the postgres user, to allow connections to
the pgbouncer admin console.
- Remove unused compute_ctl options --pgbouncer-connstr
and --pgbouncer-ini-path.
- Fix and clean up the code that connects to pgbouncer, and add retries
because pgbouncer may not be ready immediately when compute_ctl starts.
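The retry logic in the last bullet can be sketched as a bounded retry loop with exponential backoff. compute_ctl itself is written in Rust; this Python version only illustrates the pattern, and the attempt count, delays and the `connect` callable are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the delay between attempts.

    Re-raises the last ConnectionError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay)  # give pgbouncer time to finish starting up
            delay *= 2
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a fake connection that fails twice before pgbouncer is "ready".
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("pgbouncer not ready yet")
    return "connected"


result = connect_with_retries(flaky_connect, sleep=lambda seconds: None)
```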
2024-01-18 11:27:12 +00:00
John Spray
bd19290d9f pageserver: add shard_id to metric labels (#6308)
## Problem

tenant_id/timeline_id is no longer a full identifier for metrics from a
`Tenant` or `Timeline` object.

Closes: https://github.com/neondatabase/neon/issues/5953

## Summary of changes

Include `shard_id` label everywhere we have `tenant_id`/`timeline_id`
label.
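A small sketch of why the extra label is needed: a metric series is identified by its full label set, so shards of one tenant that share `tenant_id` and `timeline_id` would collide on a single series. The IDs and shard identifiers below are made up.

```python
# Made-up identifiers: four shards of one tenant serving the same timeline.
tenant_id = "1f359dd625e519a1a4e8d7509690f6fc"
timeline_id = "de200bd42b49cc1814412c7e592dd6e9"
shard_ids = ["0004", "0104", "0204", "0304"]  # hypothetical shard identifiers


def series_key(labels: dict) -> frozenset:
    """A metric series is identified by its full set of label name/value pairs."""
    return frozenset(labels.items())


old_series = {
    series_key({"tenant_id": tenant_id, "timeline_id": timeline_id})
    for _ in shard_ids
}
new_series = {
    series_key({"tenant_id": tenant_id, "shard_id": s, "timeline_id": timeline_id})
    for s in shard_ids
}

print(len(old_series))  # 1: all four shards collide on the same series
print(len(new_series))  # 4: one distinct series per shard
```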
2024-01-18 10:52:18 +00:00
Joonas Koivunen
a584e300d1 test: figure out the relative eviction order assertions (#6375)
I just failed to see this earlier in #6136. Layer counts are used as an
abstraction, and each of the two tenants loses proportionally about the
same number of layers. Sadly, there is no difference between
`relative_spare` and `relative_equal`, as both end up evicting exactly the
same number of layers, but I'll try to add another test for those later.
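The proportionality check described above could look roughly like this. The helper names, the tolerance and the layer counts are hypothetical, not taken from the actual test.

```python
from typing import Dict


def evicted_fractions(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each tenant's resident layers that were evicted (tenants must have layers)."""
    return {t: (before[t] - after[t]) / before[t] for t in before}


def assert_proportional(
    before: Dict[str, int], after: Dict[str, int], tolerance: float = 0.05
) -> Dict[str, float]:
    """Raise if tenants lost noticeably different fractions of their layers."""
    fractions = evicted_fractions(before, after)
    spread = max(fractions.values()) - min(fractions.values())
    if spread > tolerance:
        raise AssertionError(f"eviction not proportional: {fractions}")
    return fractions


# Made-up layer counts: tenant "big" has 4x the layers of tenant "small".
fractions = assert_proportional({"big": 400, "small": 100}, {"big": 300, "small": 76})
```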

Cc: #5304
2024-01-18 12:39:45 +02:00
Joonas Koivunen
e247ddbddc build: update h2 (#6383)
Notes: https://github.com/hyperium/h2/releases/tag/v0.3.24

Related: https://rustsec.org/advisories/RUSTSEC-2024-0003
2024-01-18 09:54:15 +00:00
Konstantin Knizhnik
0dc4c9b0b8 Relsize hash lru eviction (#6353)
## Problem

Currently the relation size hash is limited by the `neon.relsize_hash_size`
GUC, with a default value of 64k entries.
64k relations is not a small number, but creating 376 databases is enough
to exhaust it.

## Summary of changes

Use an LRU replacement algorithm to prevent the hash from overflowing.
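The LRU policy itself can be sketched in Python with an `OrderedDict`. The real change lives in the neon extension's C code on a Postgres shared hash table; this class, its name and its methods are hypothetical and only show the replacement policy, with `capacity` standing in for `neon.relsize_hash_size`.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LruRelSizeCache:
    """Bounded map from relation key to size in blocks; evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Hashable, int]" = OrderedDict()

    def get(self, rel: Hashable) -> Optional[int]:
        if rel not in self._entries:
            return None
        self._entries.move_to_end(rel)  # mark as most recently used
        return self._entries[rel]

    def set(self, rel: Hashable, nblocks: int) -> None:
        if rel in self._entries:
            self._entries.move_to_end(rel)
        self._entries[rel] = nblocks
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._entries)


cache = LruRelSizeCache(capacity=2)
cache.set("rel_a", 10)
cache.set("rel_b", 20)
cache.get("rel_a")      # touching rel_a makes rel_b the least recently used
cache.set("rel_c", 30)  # over capacity: rel_b is evicted instead of overflowing
```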

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-01-17 20:34:30 +02:00
John Spray
b6ec11ad78 control_plane: generalize attachment_service to handle sharding (#6251)
## Problem

To test sharding, we need something to control it. We could write Python
code to do this from the test runner, but it wouldn't be usable when
running neon_local directly, and for tests with large numbers of
shards/tenants, Rust is a better fit for efficiently handling all the
required state.

This service enables automated tests to easily get a system with
sharding/HA without the test itself having to set this all up by hand:
existing tests can be run against sharded tenants just by setting a
shard count when creating the tenant.

## Summary of changes

Attachment service was previously a map of TenantId->TenantState, where
the principal state stored for each tenant was the generation and the
last attached pageserver. This enabled it to serve the re-attach and
validate requests that the pageserver requires.

In this PR, the scope of the service is extended substantially to do
overall management of tenants in the pageserver, including
tenant/timeline creation, live migration, evacuation of offline
pageservers etc. This is done using synchronous code to make declarative
changes to the tenant's intended state (`TenantState.policy` and
`TenantState.intent`), which are then translated into calls into the
pageserver by the `Reconciler`.
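The declarative model (record the intended state, then let a reconciler compute the pageserver calls needed to reach it) can be sketched as follows. The real `Reconciler` is Rust code in `control_plane/attachment_service`; the classes, fields and action strings here are hypothetical simplifications of `TenantState.intent`, and attaching before detaching is a choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class IntentState:
    """Hypothetical desired placement for one tenant shard."""
    attached: Optional[str] = None  # pageserver that should hold the attachment
    secondary: Set[str] = field(default_factory=set)


@dataclass
class ObservedState:
    """Hypothetical placement as currently reported by the pageservers."""
    attached: Optional[str] = None
    secondary: Set[str] = field(default_factory=set)


def reconcile(intent: IntentState, observed: ObservedState) -> List[str]:
    """Compute the actions needed to move `observed` towards `intent`.

    Returns human-readable actions; a real reconciler would call pageserver APIs.
    """
    actions: List[str] = []
    if intent.attached != observed.attached:
        if intent.attached is not None:
            actions.append(f"attach on {intent.attached}")
        if observed.attached is not None:
            actions.append(f"detach from {observed.attached}")
    for ps in sorted(intent.secondary - observed.secondary):
        actions.append(f"create secondary on {ps}")
    for ps in sorted(observed.secondary - intent.secondary):
        actions.append(f"remove secondary from {ps}")
    return actions


# Migrate a shard from ps-1 to ps-2 and add a secondary location on ps-3.
plan = reconcile(IntentState("ps-2", {"ps-3"}), ObservedState("ps-1"))
```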

Top level summary of modules within
`control_plane/attachment_service/src`:
- `tenant_state`: structure that represents one tenant shard.
- `service`: implements the main high-level operations such as tenant/timeline
creation, marking a node offline, etc.
- `scheduler`: for operations that need to pick a pageserver for a
tenant, construct a scheduler and call into it.
- `compute_hook`: receive notifications when a tenant shard is attached
somewhere new. Once we have locations for all the shards in a tenant,
emit an update to postgres configuration via the neon_local `LocalEnv`.
- `http`: HTTP stubs. These mostly map to methods on `Service`, but are
separated for readability and so that it'll be easier to adapt if/when
we switch to another RPC layer.
- `node`: structure that describes a pageserver node. The most important
attribute of a node is its availability: marking a node offline causes
tenant shards to reschedule away from it.
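The `compute_hook` behaviour of waiting until every shard has a location before notifying can be sketched like this. The class and callback are hypothetical; the real hook is Rust and updates postgres through the neon_local `LocalEnv`.

```python
from typing import Callable, Dict, List


class ComputeHookSketch:
    """Collect shard locations per tenant; notify only once every shard has one."""

    def __init__(self, notify: Callable[[str, List[str]], None]) -> None:
        self._notify = notify
        self._locations: Dict[str, Dict[int, str]] = {}
        self._shard_counts: Dict[str, int] = {}

    def register_tenant(self, tenant_id: str, shard_count: int) -> None:
        self._shard_counts[tenant_id] = shard_count
        self._locations[tenant_id] = {}

    def on_attached(self, tenant_id: str, shard_number: int, pageserver: str) -> None:
        count = self._shard_counts[tenant_id]
        if not 0 <= shard_number < count:
            raise ValueError(f"shard {shard_number} out of range for {count} shards")
        shards = self._locations[tenant_id]
        shards[shard_number] = pageserver
        if len(shards) == count:
            # Every shard has a location: emit pageservers ordered by shard number.
            self._notify(tenant_id, [shards[i] for i in range(count)])


notifications = []
hook = ComputeHookSketch(lambda tenant, pageservers: notifications.append((tenant, pageservers)))
hook.register_tenant("t1", 2)
hook.on_attached("t1", 1, "ps-b")  # shard 0 still unknown: no notification
hook.on_attached("t1", 0, "ps-a")  # all shards placed: notify
hook.on_attached("t1", 1, "ps-c")  # shard 1 attached somewhere new: notify again
```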

This PR is a precursor to implementing the full sharding service for
prod (#6342). What's the difference between this and a production-ready
controller for pageservers?
- JSON file persistence to be replaced with a database
- Limited observability.
- No concurrency limits. Marking a pageserver offline will try to
migrate every tenant to a new pageserver concurrently, even if there are
thousands.
- Very simple scheduler that only knows to pick the pageserver with
fewest tenants, and place secondary locations on a different pageserver
than attached locations: it does not try to place shards for the same
tenant on different pageservers. This matters little in tests, because
picking the least-used pageserver usually results in round-robin
placement.
- Scheduler state is rebuilt exhaustively for each operation that
requires a scheduler.
- Relies on neon_local mechanisms for updating postgres: in production
this would be something that flows through the real control plane.
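The simple scheduling rule described above (least-loaded pageserver for the attachment, a different pageserver for the secondary) can be sketched as follows. The function name, the shard-count map and tie-breaking by node id are assumptions for illustration; counting secondaries toward a node's load is also a choice of this sketch.

```python
from typing import Dict, List, Optional, Tuple


def schedule_shard(
    shard_counts: Dict[str, int],
    available: List[str],
    want_secondary: bool = True,
) -> Tuple[str, Optional[str]]:
    """Pick an attached and an optional secondary pageserver for one tenant shard.

    Least-loaded node first; the secondary must differ from the attached node.
    Mutates `shard_counts` so repeated calls spread shards out.
    """
    if not available:
        raise RuntimeError("no available pageservers")

    def least_loaded(candidates: List[str]) -> str:
        # Ties are broken by node id so results are deterministic.
        return min(candidates, key=lambda n: (shard_counts.get(n, 0), n))

    attached = least_loaded(available)
    shard_counts[attached] = shard_counts.get(attached, 0) + 1
    secondary = None
    if want_secondary:
        others = [n for n in available if n != attached]
        if others:
            secondary = least_loaded(others)
            shard_counts[secondary] = shard_counts.get(secondary, 0) + 1
    return attached, secondary


counts: Dict[str, int] = {}
nodes = ["ps-1", "ps-2", "ps-3"]
placements = [schedule_shard(counts, nodes) for _ in range(3)]
```

As the commit notes, picking the least-used pageserver tends to produce round-robin placement; here the three attachments land on three different nodes.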

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-01-17 18:01:08 +00:00