Commit Graph

8352 Commits

Author SHA1 Message Date
Alexey Masterov
6763925a4d Run all the operations 2025-07-18 16:32:07 +02:00
Alexey Masterov
3bcdbe30f1 Avoid to manipulate restored snapshots 2025-07-18 16:09:46 +02:00
Alexey Masterov
22975426b7 10x more wait 2025-07-18 15:17:29 +02:00
Alexey Masterov
31c6f66a49 Wait before delete 2025-07-18 15:08:45 +02:00
Alexey Masterov
287e01fdf9 retry more 2025-07-18 15:06:52 +02:00
Alexey Masterov
91c81cc5e5 refactor 2025-07-18 14:52:39 +02:00
Alexey Masterov
a8354b0aa3 Delete projects 2025-07-18 14:44:26 +02:00
Alexey Masterov
1102e2aff0 Add connect_env 2025-07-18 14:42:28 +02:00
Alexey Masterov
f6a61c9492 Add commit 2025-07-18 14:08:15 +02:00
Alexey Masterov
cbf8e248fc Do not delete project after failure (debug only, do not merge!) 2025-07-18 09:39:04 +02:00
Alexey Masterov
f0f30076cc Do not delete project after failure (debug only, do not merge!) 2025-07-18 09:35:26 +02:00
Alexey Masterov
42544cf145 Add debug 2025-07-17 19:57:20 +02:00
Alexey Masterov
28b25092ad An attempt 5 2025-07-17 19:49:49 +02:00
Alexey Masterov
b77a1fae04 An attempt 4 2025-07-17 18:58:00 +02:00
Alexey Masterov
73ed7ade70 An attempt 3 2025-07-17 18:53:09 +02:00
Alexey Masterov
74626b94a8 An attempt 2 2025-07-17 18:48:44 +02:00
Alexey Masterov
4ca6d8cecf An attempt 2025-07-17 18:32:57 +02:00
Alexey Masterov
bf0be50df9 Add debug 2025-07-17 15:06:53 +02:00
Alexey Masterov
1adc95758e add the database 2025-07-17 14:52:27 +02:00
Alexey Masterov
03e994f9c7 Connection parameters 2025-07-17 14:40:25 +02:00
Alexey Masterov
f0671c996e Debug 2025-07-17 14:27:09 +02:00
Alexey Masterov
829cb5fe59 use connection parameters instead of connect URI 2025-07-17 14:25:53 +02:00
Alexey Masterov
561083524d finalize restore by default 2025-07-17 13:58:09 +02:00
Alexey Masterov
009303e31f Connect to the target branch, not the main one 2025-07-17 13:44:19 +02:00
Alexey Masterov
0e42cac589 Add debug 2025-07-17 12:48:08 +02:00
Alexey Masterov
f5cebcaf6a Wait for the snapshot to complete 2025-07-17 12:34:43 +02:00
Alexey Masterov
5861d0f9b2 Add the environment 2025-07-17 12:01:50 +02:00
Alexey Masterov
dbedf11191 Add check for snapshot sanity 2025-07-17 11:50:30 +02:00
Alexey Masterov
1e20c4f2b2 format 2025-07-16 13:26:02 +02:00
Alexey Masterov
018f95115a Retry on 423 error "snapshot is in transition" 2025-07-16 13:21:32 +02:00
Alexey Masterov
f222256225 Added a documentation for the new methods 2025-07-15 17:07:07 +02:00
Alexey Masterov
17b5f5e090 Merge remote-tracking branch 'origin/amasterov/random-ops-add' into amasterov/random-ops-add 2025-07-15 16:35:48 +02:00
Alexey Masterov
9bf5d69c01 Cleanup 2025-07-15 16:35:16 +02:00
a-masterov
f816b3d90e Merge branch 'main' into amasterov/random-ops-add 2025-07-15 16:20:14 +02:00
Alexey Masterov
1ec1a82d3d Start benchmark 2025-07-15 15:16:45 +02:00
Alexey Masterov
e97c1d2684 Fix the parameter error 2025-07-15 14:49:27 +02:00
Alexander Bayandin
921a4f2009 CI(run-python-test-set): don't collect code coverage (#12601)
## Problem

We don't use code coverage produced by `regress-tests`
(neondatabase/neon#6798), so there's no need to collect it. Potentially,
disabling it should reduce the load on disks and improve the stability
of debug builds.

## Summary of changes
- Disable code coverage collection for regression tests
2025-07-15 11:16:29 +00:00
dependabot[bot]
eb93c3e3c6 build(deps): bump aiohttp from 3.10.11 to 3.12.14 in the pip group across 1 directory (#12600)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-15 11:06:58 +00:00
Alexander Bayandin
7a7ab2a1d1 Move build-tools.Dockerfile -> build-tools/Dockerfile (#12590)
## Problem

This is a prerequisite for neondatabase/neon#12575 to keep all things
relevant to the `build-tools` image in a single directory

## Summary of changes
- Rename `build_tools/` to `build-tools/`
- Move `build-tools.Dockerfile` to `build-tools/Dockerfile`
2025-07-15 10:45:49 +00:00
a-masterov
94cfd3f22e Merge branch 'main' into amasterov/random-ops-add 2025-07-15 12:09:32 +02:00
Alexey Masterov
f45ea8fe6b Add snapshots 2025-07-15 12:08:38 +02:00
Alexey Masterov
1443ba65d3 Add reset_to_parent 2025-07-15 12:08:38 +02:00
Krzysztof Szafrański
ff526a1051 [proxy] Recognize more cplane errors, use retry_delay_ms as TTL (#12543)
## Problem

Not all cplane errors are properly recognized and cached/retried.

## Summary of changes

Add more cplane error reasons. Also, use retry_delay_ms as cache TTL if
present.

Related to https://github.com/neondatabase/cloud/issues/19353
2025-07-15 07:42:48 +00:00
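The idea in ff526a1051 of using `retry_delay_ms` as the cache TTL can be sketched as follows. This is a minimal illustration, not the proxy's code: `ErrorCache`, `CachedError`, `DEFAULT_TTL_MS`, and the injectable `clock` are all hypothetical names, and the 30-second fallback is an assumed value.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

DEFAULT_TTL_MS = 30_000  # assumed fallback TTL; not taken from the commit


@dataclass
class CachedError:
    reason: str
    expires_at: float  # seconds, on the clock passed to ErrorCache


class ErrorCache:
    """Hypothetical cache of control-plane errors keyed by endpoint."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, CachedError] = {}

    def put(self, key: str, reason: str, retry_delay_ms: Optional[int] = None) -> None:
        # Prefer the server-provided retry delay as the TTL when it is present.
        ttl_ms = retry_delay_ms if retry_delay_ms is not None else DEFAULT_TTL_MS
        self._entries[key] = CachedError(reason, self._clock() + ttl_ms / 1000.0)

    def get(self, key: str) -> Optional[CachedError]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # Expired: drop the entry so the next request goes upstream again.
            del self._entries[key]
            return None
        return entry
```

Tying the TTL to the advertised delay means a cached error is not served for longer than the upstream asked callers to wait before retrying.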
Heikki Linnakangas
9a2456bea5 Reduce noise from get_installed_extensions during e.g shut down (#12479)
All errors that can occur during get_installed_extensions() come from
tokio-postgres functions, e.g. if the database is being shut down
("FATAL: terminating connection due to administrator command"). I'm
seeing a lot of such errors in the logs with the regression tests, with
very verbose stack traces. The compute_ctl stack trace is pretty useless
for errors originating from the Postgres connection, because the error
message has all the information, so stop printing the stack trace.

I changed the result type of the functions to return the originating
tokio_postgres Error rather than anyhow::Error, so that if we introduce
other error sources to the functions where the stack trace might be
useful, we'll be forced to revisit this, probably by introducing a new
Error type that separates postgres errors from other errors. But this
will do for now.
2025-07-14 18:42:36 +00:00
Mikhail
a456e818af LFC prewarm perftest: increase timeout for initialization job (#12594)
Tests on
https://github.com/neondatabase/neon/actions/runs/16268609007/job/45930162686
time out due to pgbench init job taking more than 30 minutes to run.
Increase test timeout duration to 2 hours.
2025-07-14 17:37:47 +00:00
Matthias van de Meent
3e6fdb0aa6 Add and use [U]INT64_[HEX_]FORMAT for various [u]int64 needs (#12592)
We didn't consistently apply these, and it wasn't consistently solved.
With this patch we should have a more consistent approach to this, and
have fewer issues porting changes to newer versions.

This also removes some potentially buggy casts to `long` from `uint64` -
they could've truncated the value on systems where `long` only has 32
bits.
2025-07-14 16:47:07 +00:00
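The truncation risk described in 3e6fdb0aa6 can be illustrated outside C. The sketch below uses Python's `ctypes` fixed-width integers to mimic what happens when a 64-bit value is narrowed to a 32-bit `long`; the helper names are hypothetical, and the commit's actual macros are not reproduced here.

```python
import ctypes


def as_32bit_long(value: int) -> int:
    # ctypes integer types do no overflow checking, so only the low
    # 32 bits survive, as with a cast to a 32-bit signed `long`.
    return ctypes.c_int32(value).value


def as_uint64(value: int) -> int:
    # A 64-bit unsigned type keeps the full value.
    return ctypes.c_uint64(value).value


big = 2**32 + 5  # does not fit in 32 bits
narrowed = as_32bit_long(big)
preserved = as_uint64(big)
```

Here `narrowed` keeps only the low 32 bits of `big`, while `preserved` equals the original value, which is why formatting through a type that is always 64 bits wide is the safer choice.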
Vlad Lazar
f8d3f86f58 pageserver: include records in get page debug handler (#12578)
Include records and image in the debug get page handler.
This endpoint does not update the metrics and does not support tracing.

Note that this now returns individual bytes which need to be encoded
properly for debugging.

Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>
2025-07-14 16:37:28 +00:00
HaoyuHuang
f67a8a173e A few SK changes (#12577)
# TLDR 
This PR is a no-op. 

## Problem
When a SK loses a disk, it must recover all WALs from the very
beginning. This may take days/weeks to catch up to the latest WALs for
all timelines it owns.

## Summary of changes
When SK starts up and finds that it has 0 timelines:
- It asks SC for the timeline it owns.
- Then it pulls the timeline from its peer safekeepers to restore the WAL
redundancy right away.

After the timeline pull is complete, it becomes active and accepts
new WALs.

The current impl is a prototype. We can optimize the impl further, e.g.,
by pulling timelines in parallel.

---------

Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>
2025-07-14 16:37:04 +00:00
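The startup flow described in f67a8a173e can be sketched roughly as below. This is an illustration under stated assumptions, not the safekeeper's implementation: `recover_on_startup`, `list_owned_timelines`, and `pull_timeline` are hypothetical names standing in for the SC query and the peer pull.

```python
from typing import Callable, Iterable, List, Set


def recover_on_startup(
    local_timelines: Set[str],
    list_owned_timelines: Callable[[], Iterable[str]],
    pull_timeline: Callable[[str], None],
) -> List[str]:
    """Pull owned timelines from peers only when nothing exists locally."""
    if local_timelines:
        # Disk contents survived; normal startup, nothing to recover.
        return []

    pulled: List[str] = []
    for timeline_id in list_owned_timelines():  # hypothetical SC query
        pull_timeline(timeline_id)              # hypothetical peer pull
        local_timelines.add(timeline_id)
        pulled.append(timeline_id)
    return pulled
```

The loop is sequential on purpose, matching the prototype status mentioned in the commit; pulling in parallel is the optimization it leaves open.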
Mikhail
2288efae66 Performance test for LFC prewarm (#12524)
https://github.com/neondatabase/cloud/issues/19011

Measure relative performance for prewarmed and non-prewarmed endpoints.
Add test that runs on every commit, and one performance test with a
remote cluster.
2025-07-14 13:41:31 +00:00
a-masterov
4fedcbc0ac Leverage the existing mechanism to retry 404 errors instead of implementing new code. (#12567)
## Problem
In https://github.com/neondatabase/neon/pull/12513, new code was
implemented to retry 404 errors caused by replication lag. However,
this added new logic and made the script more complicated, even though
a retry mechanism already exists in `neon_api.py`.
## Summary of changes
The existing mechanism is used to retry 404 errors.

---------

Co-authored-by: Alexey Masterov <alexey.masterov@databricks.com>
2025-07-14 13:25:25 +00:00
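A shared retry mechanism of the kind 4fedcbc0ac refers to might look like the sketch below. It does not reproduce the code in `neon_api.py`: `HttpError`, `retry_on_status`, and all parameters are hypothetical, chosen only to show how one helper can cover several retryable status codes.

```python
import time
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def retry_on_status(
    call: Callable[[], T],
    retry_statuses: FrozenSet[int] = frozenset({404}),
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying while it fails with one of `retry_statuses`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except HttpError as err:
            # Give up on non-retryable statuses or on the last attempt.
            if err.status not in retry_statuses or attempt == attempts:
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

Passing `retry_statuses=frozenset({404, 423})` would let the same helper also cover a 423 "snapshot is in transition" response, so callers do not need their own retry loops.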