Anastasia Lubennikova
506086a3e2
Fix metric_collection_endpoint for prod.
...
It was incorrectly set to the staging URL.
2023-01-18 16:35:43 +02:00
Heikki Linnakangas
3b58c61b33
If an error happens while checking for core dumps, don't panic.
...
If we panic, we skip the 30s wait in 'main', and don't give the
console a chance to observe the error. Which is not nice.
Spotted by @ololobus at
https://github.com/neondatabase/neon/pull/3352#discussion_r1072806981
2023-01-18 11:25:47 +02:00
Kirill Bulatov
c6b56d2967
Add more io::Error context when failing to operate on a path ( #3254 )
...
I have a test failure that shows
```
Caused by:
0: Failed to reconstruct a page image:
1: Directory not empty (os error 39)
```
but does not really show where exactly that happens.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d
The PR aims to add more context in debugging that issue.
2023-01-17 22:07:38 +02:00
Anastasia Lubennikova
9d3992ef48
Increase metric_collection_interval for proxy on prod
...
to not overwhelm the service
2023-01-17 15:50:19 +02:00
Anastasia Lubennikova
7624963e13
Enable metric_collection_endpoint for proxy on prod
...
in all regions
2023-01-17 13:43:50 +02:00
Anastasia Lubennikova
63e3b815a2
Enable metric_collection_endpoint for pageserver on prod
...
in all regions
2023-01-17 13:43:50 +02:00
Kirill Bulatov
1ebd145c29
Actualize the comment ( #3362 )
...
Follow-up of
https://github.com/neondatabase/neon/pull/3326#issuecomment-1384265759
2023-01-17 13:30:42 +02:00
sharnoff
f8e887830a
build: Use curl -f on vm-informant download ( #3363 )
...
Without this, we can silently fail
2023-01-17 10:38:33 +01:00
Christian Schwarz
48dd9565ac
TaskHandle: tone down sender is dropped while join handle is still alive
...
Rationale: see comments added as part of this commit.
fixes https://github.com/neondatabase/neon/issues/3339
2023-01-17 09:42:22 +01:00
Anastasia Lubennikova
e067cd2947
Enable metric collection for proxy on staging
2023-01-16 21:15:42 +02:00
Christian Schwarz
58c8c1076c
download_all_remote_layers API: require client to specify max_concurrent_downloads
...
Before this patch, we would start all layer downloads simultaneously.
There is at most one download_all_remote_layers task per timeline.
Hence, the specified limit is per timeline.
There is still no global concurrency limit for layer downloads.
We'll have to revisit that at some point and also prioritize on-demand
initiated downloads over download_all_remote_layers downloads.
But that's for another day.
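A minimal sketch of a per-timeline concurrency limit, not the pageserver implementation: a fixed number of worker threads drain a shared queue, so at most `max_concurrent_downloads` downloads are in flight. The `download_all` function, the layer names, and the `download` callback are hypothetical; only the parameter name comes from this commit.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: calls `download` once per layer name, with at most
/// `max_concurrent_downloads` calls running at the same time.
/// Returns the highest number of simultaneous downloads that was observed.
fn download_all<F>(layers: Vec<String>, max_concurrent_downloads: usize, download: F) -> usize
where
    F: Fn(&str) + Sync,
{
    let queue = Mutex::new(VecDeque::from(layers));
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    // One worker per allowed slot; the scope joins all workers before returning.
    thread::scope(|s| {
        for _ in 0..max_concurrent_downloads.max(1) {
            s.spawn(|| loop {
                // The queue lock is released at the end of this statement,
                // so it is not held while the download runs.
                let next = queue.lock().unwrap().pop_front();
                let Some(layer) = next else { break };

                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                download(layer.as_str());
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });

    peak.load(Ordering::SeqCst)
}
```

Because the limit is simply the number of workers, it applies to one call only; a global limit across timelines would need a shared mechanism, which this sketch does not attempt.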
2023-01-16 19:29:06 +01:00
Alexander Bayandin
4c6b507472
Update Postgres clients we test ( #3359 )
...
Update client libraries and runtimes for Postgres libraries we test.
- `pg8000` works with Neon now 🎉
- `PostgresClientKit` still doesn't support SNI
2023-01-16 17:22:17 +00:00
Stas Kelvich
431e464c1e
Consumption metering RFC
2023-01-16 19:15:59 +02:00
danieltprice
424fd0bd63
Update auth.rs ( #3349 )
...
Update SNI error message. Users now specify the endpoint ID when making
a connection to Neon. This should be reflected in the error message.
2023-01-16 12:32:00 -04:00
Joonas Koivunen
a8a9bee602
walredo: simple tests and bench updates ( #3045 )
...
Separated from #2875.
The microbenchmark has been validated to show a difference similar to
that of the larger-scale OLTP benchmark.
2023-01-16 18:24:45 +02:00
Vadim Kharitonov
6ac5656be5
Enable earthdistance extension
2023-01-16 17:04:51 +01:00
Anastasia Lubennikova
3c571ecde8
Update docs/consumption_metrics.md
2023-01-16 17:24:13 +02:00
Anastasia Lubennikova
5f1bd0e8a3
Add documentation for consumption metrics
2023-01-16 17:24:13 +02:00
Anastasia Lubennikova
2cbe84b78f
Proxy metrics ( #3290 )
...
Implement proxy metrics collection.
Only collect metric for outbound traffic.
Add proxy CLI parameters:
- metric-collection-endpoint
- metric-collection-interval.
Add test_proxy_metric_collection test.
Move shared consumption metrics code to libs/consumption_metrics.
Refactor the code.
2023-01-16 15:17:28 +00:00
sharnoff
5c6a7a17cb
Add VM informant to vm-compute-node ( #3324 )
...
The general idea is that the VM informant binary is added to the
vm-compute-node images only. `compute_tools` then will run whatever's at
`/bin/vm-informant`, if the path exists.
2023-01-16 07:05:29 -08:00
Arseny Sher
84ffdc8b4f
Don't keep FDs open on cancelled timelines in safekeepers.
...
Since PR #3300 we don't remove timelines completely until next restart, so this
prevents leakage.
fixes https://github.com/neondatabase/neon/issues/3336
2023-01-16 19:03:38 +04:00
Kirill Bulatov
bce4233d3a
Rework Cargo.toml dependencies ( #3322 )
...
* Use workspace variables from cargo, coming with rustc
[1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22 )
See
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table
and
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table
sections.
Now, all dependencies in all non-root `Cargo.toml` files are defined as
```
clap.workspace = true
```
sometimes, when extra features are needed, as
```
bytes = {workspace = true, features = ['serde'] }
```
With the actual declarations (with shared features and version
numbers/file paths/etc.) in the root Cargo.toml.
Features are additive:
https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace
* Uses the mechanism above to set a common edition (2021) and license across the
workspace
* Mechanically bumps a few dependencies
* Updates hakari format, as it suggested:
```
work/neon/neon kb/cargo-templated ❯ cargo hakari generate
info: no changes detected
info: new hakari format version available: 3 (current: 2)
(add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`)
```
2023-01-13 18:13:34 +02:00
Vadim Kharitonov
16baa91b2b
Add more information about cargo deny
2023-01-13 13:24:34 +01:00
Kirill Bulatov
99808558de
Avoid duplicate timeline insert ( #3326 )
...
`initialize_with_lock` inserts `Arc<Timeline>` before returning it:
c1731bc4f0/pageserver/src/tenant.rs (L222)
but `setup_timeline` function did another insert, which got removed in this PR:
c1731bc4f0/pageserver/src/tenant.rs (L486)
On top, a better comment and function renames are added.
2023-01-13 12:05:54 +00:00
Anastasia Lubennikova
c6d383e239
code cleanup
2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
5e3e0fbf6f
remove unneeded Cargo.lock changes
2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
26f39c03f2
review code cleanup:
...
- handle errors in calculate_synthetic_size_worker. Don't exit the bgworker if one tenant failed.
- add cached_synthetic_tenant_size to cache values calculated by the bgworker
- code cleanup: remove unneeded info! messages, clean comments
- handle collect_metrics_task() error. Don't exit collect_metrics worker if one task failed.
- add unit test to cover case when we have multiple branches at the same lsn
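The error-handling pattern in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `run_iteration`, the `u32` tenant ids, the `String` error type, and the `calculate` callback are all hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// Hypothetical single pass of a background worker. A failure for one
/// tenant is logged and counted; the loop then continues with the next
/// tenant instead of returning early. Successful results are cached.
fn run_iteration<F>(tenants: &[u32], cache: &mut HashMap<u32, u64>, calculate: F) -> usize
where
    F: Fn(u32) -> Result<u64, String>,
{
    let mut failures = 0;
    for &tenant_id in tenants {
        match calculate(tenant_id) {
            Ok(size) => {
                // Keep the latest successfully calculated value.
                cache.insert(tenant_id, size);
            }
            Err(err) => {
                // Report the error, but do not abort the whole iteration.
                eprintln!("size calculation failed for tenant {tenant_id}: {err}");
                failures += 1;
            }
        }
    }
    failures
}
```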
2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
148e020fb9
Fix logical size calculation:
...
sort updates in topological order so that the parent timeline always precedes its children.
fixes #3179
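A minimal sketch of a parents-first ordering, assuming the updates form a forest (no cycles). The `Update` struct and `sort_parents_first` are hypothetical and much simpler than the real tenant size model.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical, simplified stand-in for a timeline update.
#[derive(Debug, Clone, PartialEq)]
struct Update {
    timeline_id: u32,
    parent_id: Option<u32>,
}

/// Orders `updates` so that every parent appears before its children.
/// An update whose parent is absent from the input is treated as a root.
fn sort_parents_first(updates: Vec<Update>) -> Vec<Update> {
    let known: HashSet<u32> = updates.iter().map(|u| u.timeline_id).collect();

    // Group children under their parent; roots go straight into the queue.
    let mut children: HashMap<u32, Vec<Update>> = HashMap::new();
    let mut queue: VecDeque<Update> = VecDeque::new();
    for update in updates {
        match update.parent_id {
            Some(parent) if known.contains(&parent) => {
                children.entry(parent).or_default().push(update)
            }
            _ => queue.push_back(update),
        }
    }

    // Breadth-first walk: a child is queued only after its parent is emitted.
    let mut sorted = Vec::new();
    while let Some(update) = queue.pop_front() {
        if let Some(kids) = children.remove(&update.timeline_id) {
            queue.extend(kids);
        }
        sorted.push(update);
    }
    sorted
}
```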
2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
0675859bb0
Add background worker that periodically spawns
...
synthetic size calculation.
Add new pageserver config param calculate_synthetic_size_interval
2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
ba0190e3e8
Handle errors in tenant_size_model code
2023-01-13 11:51:28 +02:00
Konstantin Knizhnik
9ce5ada89e
Do not report position in SMGR message ( #3307 )
...
Refers to #3277
2023-01-13 10:23:35 +02:00
Alexander Bayandin
c28bfd4c63
Nightly Benchmarks: add user provided example ( #3308 )
2023-01-12 23:03:21 +00:00
Vadim Kharitonov
dec875fee1
Disable postgis_sfcgal
2023-01-12 21:51:49 +01:00
Kirill Bulatov
fe8cef3427
Use ready! rustc 1.64 macro ( #3315 )
...
rustc
[1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22 )
introduced the `ready!` macro:
https://doc.rust-lang.org/stable/std/task/macro.ready.html
Use it to shorten the code slightly.
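For illustration, a small sketch of how `std::task::ready!` shortens a hand-written `poll`. The `Doubled` wrapper, `NoopWake`, and `poll_once` are hypothetical helpers made up for this example and are not taken from the PR.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{ready, Context, Poll, Wake, Waker};

/// Hypothetical wrapper future that doubles the output of an inner future.
struct Doubled<F>(F);

impl<F: Future<Output = u32> + Unpin> Future for Doubled<F> {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // Without `ready!`, this would be written as:
        //   let value = match Pin::new(&mut self.0).poll(cx) {
        //       Poll::Ready(v) => v,
        //       Poll::Pending => return Poll::Pending,
        //   };
        let value = ready!(Pin::new(&mut self.0).poll(cx));
        Poll::Ready(value * 2)
    }
}

/// Waker that does nothing; enough to poll a future once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` a single time with a no-op waker.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

The macro returns `Poll::Pending` from the enclosing function when the inner poll is pending, and otherwise evaluates to the ready value.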
2023-01-12 21:27:34 +02:00
MMeent
bb406b21a8
Fix issue in compaction code ( #3246 )
...
If we ran `compact_prefetch_buffers` with exactly one hole in the
buffers, the code would fail to remove the last, now unused, entry from
the array.
This is now fixed.
Also, add and adjust some comments in the compaction code so that the
algorithm used is a bit more clear.
Fixes #3192
2023-01-12 19:23:59 +01:00
Heikki Linnakangas
57a6e931ea
Comment, formatting and other cosmetic cleanup.
2023-01-12 19:05:13 +02:00
Heikki Linnakangas
0cceb14e48
Add a FIXME on ugly error message parsing.
2023-01-12 19:05:13 +02:00
Konstantin Knizhnik
1983c4d4ad
Explain prefetch ( #3002 )
...
Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com>
2023-01-12 18:12:40 +02:00
Heikki Linnakangas
d7c41cbbee
Replace tokio::watch with CancellationToken.
...
PR #3228 starts to use CancellationTokens more widely, this is a small
part extracted from that.
2023-01-12 17:37:15 +02:00
Vadim Kharitonov
29a2465276
Update rust version in toolchain
2023-01-12 15:16:52 +01:00
Arthur Petukhovsky
f49e923d87
Keep deleted timelines in memory of safekeeper ( #3300 )
...
A temporary fix for https://github.com/neondatabase/neon/issues/3146,
until we come up with a reliable way to create and delete timelines in
all safekeepers.
2023-01-12 15:33:07 +03:00
Vadim Kharitonov
a0ee306c74
Enable safe contrib extensions
2023-01-12 12:41:53 +01:00
Heikki Linnakangas
c1731bc4f0
Push on-demand download into Timeline::get() function itself.
...
This makes Timeline::get() async, along with all functions that call it
directly or indirectly. The with_ondemand_download() mechanism
is gone, Timeline::get() now always downloads files, whether you want
it or not. That is what all the current callers want, so even though
this loses the capability to get a page only if it's already in the
pageserver, without downloading, we were not using that capability.
There were some places that used 'no_ondemand_download' in the WAL
ingestion code that would error out if a layer file was not found
locally, but those were dubious. We do actually want to on-demand
download in all of those places.
Per discussion at
https://github.com/neondatabase/neon/pull/3233#issuecomment-1368032358
2023-01-12 11:53:10 +02:00
Sergey Melnikov
95bf19b85a
Add --atomic to all helm upgrade operations ( #3299 )
...
When the number of GitHub Actions workers changes, some jobs get killed.
When helm is killed during the upgrade, the release gets stuck in the
pending-upgrade state. --atomic should initiate an automatic rollback in this case.
2023-01-10 10:05:27 +00:00
Vadim Kharitonov
80d4afab0c
Update tokio version (RUSTSEC-2023-0001)
2023-01-10 09:02:00 +01:00
Arthur Petukhovsky
0807522a64
Enable wss proxy in all regions ( #3292 )
...
Follow-up to https://github.com/neondatabase/helm-charts/pull/24 and
#3247
2023-01-09 19:56:12 +00:00
Christian Schwarz
8eebd5f039
run on-demand compaction in a task_mgr task
...
With this patch, tenant_detach and timeline_delete's
task_mgr::shutdown_tasks() call will wait for on-demand
compaction to finish.
Before this patch, the on-demand compaction would grab the
layer_removal_cs after tenant_detach / timeline_delete had
removed the timeline directory.
This resulted in error
No such file or directory (os error 2)
NB: I already implemented this pattern for ondemand GC a while back.
fixes https://github.com/neondatabase/neon/issues/3136
2023-01-09 19:08:22 +01:00
Heikki Linnakangas
8c07ef413d
Minor cleanup of test_ondemand_download_timetravel test.
...
- Fix and improve comments
- Rename 'physical_size' local variable to 'resident_size' for clarity.
- Remove one unnecessary 'wait_for_upload' call. The
'wait_for_sk_commit_lsn_to_reach_remote_storage' call after shutting
down compute is sufficient.
2023-01-09 18:56:50 +02:00
Sergey Melnikov
14df37c108
Use GHA environments for gradual prod rollout ( #3295 )
...
Each release will wait for manual approval for each region
2023-01-09 20:18:16 +04:00
Christian Schwarz
d4d0aa6ed6
gc_iteration_internal: better log message & debug log level if nothing to do
...
fixes https://github.com/neondatabase/neon/issues/3107
2023-01-09 13:53:59 +01:00