Commit Graph

2901 Commits

Author SHA1 Message Date
Arthur Petukhovsky
2fd351fd63 Hide debug logs 2023-08-29 10:05:18 +00:00
Arthur Petukhovsky
13e94bf687 Fix truncateLsn bug 2023-08-29 09:03:22 +00:00
Arthur Petukhovsky
41b9750e81 Run many schedules 2023-08-24 23:42:11 +00:00
Arthur Petukhovsky
f8729f046d Fix excessive logs 2023-08-24 17:25:44 +00:00
Arthur Petukhovsky
420d3bc18f Add simulation schedule 2023-08-24 15:24:38 +00:00
Arthur Petukhovsky
33f7877d1b Show simulation time in logs 2023-08-23 10:10:11 +00:00
Arthur Petukhovsky
7de94c959a Support walproposer recovery 2023-08-22 23:15:46 +00:00
Arthur Petukhovsky
731ed3bb64 Support virtual disk in tests 2023-08-17 13:09:55 +00:00
Arthur Petukhovsky
413ce2cfe8 Crash safekeepers 2023-08-17 10:36:23 +00:00
Arthur Petukhovsky
7f36028fab Generate WAL in tests 2023-08-03 16:58:41 +00:00
Arthur Petukhovsky
cb6a8d3fe3 Fix some warnings 2023-07-28 21:37:16 +00:00
Arthur Petukhovsky
095747afc0 Fix walproposer main loop 2023-07-28 21:18:08 +00:00
Arthur Petukhovsky
89bd7ab8a3 Fix read/write in walproposer 2023-07-28 15:14:24 +00:00
Arthur Petukhovsky
5034a8cca0 WIP 2023-07-26 22:51:19 +02:00
Arthur Petukhovsky
55e40d090e Run sync several times 2023-07-25 11:16:47 +00:00
Arthur Petukhovsky
d87e822169 Return LSN from sync safekeepers 2023-07-24 21:15:35 +00:00
Arthur Petukhovsky
296a0cbac2 Add -DSIMLIB 2023-07-21 15:40:47 +00:00
Arthur Petukhovsky
aed14f52d5 Test sync safekeepers 2023-06-03 19:11:28 +00:00
Arthur Petukhovsky
909d7fadb8 Implement simlib sk server 2023-06-02 14:49:55 +00:00
Arthur Petukhovsky
3840d6b18b Clean up C API 2023-06-01 09:38:07 +00:00
Arthur Petukhovsky
65f92232e6 Compile walproposer 2023-05-31 21:06:47 +00:00
Arthur Petukhovsky
0d4f987fc8 Implement full simlib C API 2023-05-31 20:25:25 +00:00
Arthur Petukhovsky
aa0763d49d Run simulator on C code 2023-05-31 16:55:16 +00:00
Arthur Petukhovsky
7b5123edda Fix elog 2023-05-31 15:06:26 +00:00
Arthur Petukhovsky
b6a80bc269 Link postgres to rust statically 2023-05-31 13:19:41 +00:00
Arthur Petukhovsky
ac82b34c64 Create more involved example 2023-05-30 16:43:33 +00:00
Arthur Petukhovsky
a77fc2c5ff Test Rust -> C -> Rust codepath 2023-05-30 16:38:32 +00:00
Arthur Petukhovsky
9ccbec0e14 Spend some time 2023-05-26 14:45:25 +03:00
Arthur Petukhovsky
b55005d2c4 Build simple C func example 2023-05-26 14:44:48 +03:00
Arthur Petukhovsky
6436432a77 Showcase network failures 2023-05-25 12:53:20 +03:00
Arthur Petukhovsky
1b8918e665 Add accept, close and delays to the network 2023-05-25 12:26:57 +03:00
Arthur Petukhovsky
87c9edac7c Add basic support for network delays 2023-05-24 20:28:53 +03:00
Arthur Petukhovsky
5e0550a620 Add os.sleep and os.random 2023-05-24 15:51:30 +03:00
Arthur Petukhovsky
06f493f525 Extract simlib 2023-05-24 13:06:42 +03:00
Arthur Petukhovsky
f6b540ebfe Add initial support for virtual time 2023-05-22 15:00:56 +03:00
Arthur Petukhovsky
83f87af02b Remove sync debug 2023-03-10 00:10:09 +02:00
Arthur Petukhovsky
79823c38cd It looks deterministic now 2023-03-10 00:03:35 +02:00
Arthur Petukhovsky
072fb3d7e9 WIP 2023-03-09 14:59:03 +02:00
Arthur Petukhovsky
f2fb9f6be9 WIP 2023-03-09 14:51:29 +02:00
Arthur Petukhovsky
dd4c8fb568 WIP 2023-03-09 00:51:14 +02:00
Arthur Petukhovsky
9116c01614 WIP 2023-03-08 18:45:13 +02:00
Arthur Petukhovsky
17cd96e022 WIP 2023-03-03 20:33:55 +00:00
Christian Schwarz
9cada8b59d fix benchmarks, broken by PR #3737
Benchmarks only run on the `main` branch, so the pre-commit tests didn't
catch these.
2023-03-03 18:47:57 +01:00
Christian Schwarz
66a5159511 fix: compaction: no index upload scheduled if no on-demand downloads
Commit 0cf7fd0fb8, "Compaction with on-demand download (#3598)",
introduced a subtle bug: if we don't have to do on-demand downloads,
we only take one ROUND in fn compact() and exit early.
Thereby, we miss scheduling the index part upload for any layers
created by fn compact_inner().

Before that commit, we didn't have this problem.
So, this patch fixes it.
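The control-flow bug can be sketched as a small Python model. This is not the pageserver's Rust code: `Timeline`, `compact_inner`, `download_missing`, and `schedule_index_upload` below are hypothetical stand-ins that only reproduce the "early exit skips the upload" shape described above.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    # Number of layers compaction needs that exist only in remote storage.
    missing_layers: int = 0
    # Layers created locally by compaction.
    new_layers: list = field(default_factory=list)
    # Layers covered by a scheduled index part upload.
    scheduled_uploads: list = field(default_factory=list)

    def compact_inner(self) -> int:
        """Return how many layers must be downloaded first; otherwise create one layer."""
        if self.missing_layers > 0:
            return self.missing_layers
        self.new_layers.append(f"layer-{len(self.new_layers)}")
        return 0

    def download_missing(self) -> None:
        self.missing_layers = 0

    def schedule_index_upload(self) -> None:
        self.scheduled_uploads = list(self.new_layers)


def compact_buggy(tl: Timeline) -> None:
    if tl.compact_inner() == 0:
        return  # BUG: no downloads needed, so we exit before scheduling the upload
    tl.download_missing()
    tl.compact_inner()  # second round, after on-demand downloads
    tl.schedule_index_upload()


def compact_fixed(tl: Timeline) -> None:
    if tl.compact_inner() > 0:
        tl.download_missing()
        tl.compact_inner()
    # Schedule the upload no matter how many rounds were needed.
    tl.schedule_index_upload()
```

In the model, only the path that needed on-demand downloads ever reaches `schedule_index_upload()`, which matches the commit's observation that the single-round case was the one that lost uploads.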

Since no regression test caught this, I went ahead and extended the
timeline size tests to assert that, if remote storage is configured,
1. pageserver_remote_physical_size matches the other physical sizes
2. file sizes reported by the layer map info endpoint match the other
   physical size metrics
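The two checks above can be sketched as a single helper. This is an illustration only: the function name, its parameters, and the way the sizes are obtained are assumptions, not the actual neon test code.

```python
def assert_physical_sizes_match(
    resident_size: int,
    remote_size: int,
    layer_file_sizes: list,
    remote_storage_configured: bool,
) -> None:
    """Hypothetical consistency check between physical size sources."""
    layer_total = sum(layer_file_sizes)
    # Check 2: file sizes from the layer map info must add up to the metric.
    assert layer_total == resident_size, f"{layer_total} != {resident_size}"
    if remote_storage_configured:
        # Check 1: every resident layer should also have been uploaded.
        assert resident_size == remote_size, f"{resident_size} != {remote_size}"
```

With the numbers reported in this commit, the local layers sum to 50790400 while the remote side only accounts for 18399232, so the remote-size assertion fails; the 32391168-byte difference is the image layer that was never uploaded.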

Without the pageserver code fix, the regression test would
fail at the physical size assertion, complaining that
the resident physical size != the remote physical size metric:
50790400.0 != 18399232.0
I figured out what the problem was by comparing the remote storage
and local directories like so, and noticed that the image layer
in the local directory wasn't present on the remote side.
Its size was exactly the difference:
    50790400.0 - 18399232.0 = 32391168.0

fixes https://github.com/neondatabase/neon/issues/3738
2023-03-03 16:11:54 +01:00
Christian Schwarz
d1a0a907ff tests: use parse_metrics everywhere (#3737)
- use parse_metrics() in all places where we parse Prometheus metrics
- query_all: make `filter` argument optional
- encourage using properly parsed, typed metrics by changing get_metrics()
  to return already-parsed metrics. The new get_metric_str() method,
  like in the Safekeeper type, returns the raw text response.
2023-03-03 14:53:27 +01:00
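The "parse once, then query typed samples" approach from this commit can be sketched as follows. The `Sample` and `Metrics` types, the regexes, and the exact `query_all(name, filter)` signature are assumptions for illustration; neon's real `parse_metrics()` helper may be implemented differently and cover more of the Prometheus text format (this sketch ignores escapes inside label values and only tolerates an optional integer timestamp).

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# name{labels} value [timestamp]: a simplified subset of the text exposition format
_SAMPLE_RE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$"
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


@dataclass
class Sample:
    name: str
    labels: Dict[str, str]
    value: float


class Metrics:
    def __init__(self) -> None:
        self.samples: Dict[str, List[Sample]] = {}

    def query_all(self, name: str, filter: Optional[Dict[str, str]] = None) -> List[Sample]:
        # `filter` is optional: without it, every sample of `name` is returned.
        wanted = filter or {}
        return [
            s for s in self.samples.get(name, [])
            if all(s.labels.get(k) == v for k, v in wanted.items())
        ]


def parse_metrics(text: str) -> Metrics:
    metrics = Metrics()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable metric line: {line!r}")
        name, raw_labels, value = m.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        metrics.samples.setdefault(name, []).append(Sample(name, labels, float(value)))
    return metrics
```

Returning parsed samples by default, as the commit does for `get_metrics()`, means callers compare floats and label dicts instead of grepping raw text; the raw body stays available for the rare cases that need it.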
Christian Schwarz
1b780fa752 timeline_checkpoint_handler: add span with tenant and timeline id
Before this patch, the logs written by freeze_and_flush() and compact()
didn't have any span, which made the test logs annoying to read.
2023-03-03 12:10:24 +01:00
Christian Schwarz
38022ff11c gc: only decrement resident size if GC'd layer is resident
Before this patch, GC would call PersistentLayer::delete()
on every GC'ed layer.
RemoteLayer::delete() returned Ok(()) unconditionally.
GC would then proceed by decrementing the resident size metric,
even though the layer is a RemoteLayer.

This patch makes the following changes:
- Rename PersistentLayer::delete() to delete_resident_layer_file().
  That name is unambiguous.
- Make RemoteLayer::delete_resident_layer_file return an Err().
  We would have uncovered this bug if we had done that from the start.
- Change GC / Timeline::delete_historic_layer to check whether
  the layer is remote or not, and only call delete_resident_layer_file()
  if it's not remote. This brings us in line with how eviction does it.
- Add a regression test.

fixes https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
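The fixed GC behavior can be modeled in a few lines of Python. The class and method names mirror those in the commit message, but this is a hypothetical sketch of the logic, not the Rust implementation: residency is checked before deletion, and only resident layers decrement the metric.

```python
from dataclasses import dataclass


@dataclass
class ResidentLayer:
    name: str
    size: int
    file_deleted: bool = False

    def is_remote(self) -> bool:
        return False

    def delete_resident_layer_file(self) -> None:
        # Stand-in for removing the layer file from local disk.
        self.file_deleted = True


@dataclass
class RemoteLayer:
    name: str
    size: int

    def is_remote(self) -> bool:
        return True

    def delete_resident_layer_file(self) -> None:
        # After the fix this is an error instead of a silent Ok(()).
        raise RuntimeError(f"{self.name} is remote; it has no resident file")


class Timeline:
    def __init__(self, layers):
        self.layers = list(layers)
        # Only layers present on local disk count toward the resident size.
        self.resident_physical_size = sum(
            layer.size for layer in self.layers if not layer.is_remote()
        )

    def delete_historic_layer(self, layer) -> None:
        # Check residency first, like eviction does, so the metric is only
        # decremented for layers that actually had a local file.
        if not layer.is_remote():
            layer.delete_resident_layer_file()
            self.resident_physical_size -= layer.size
        self.layers.remove(layer)
```

Making the remote variant raise instead of succeeding is what turns a silent metric drift into a visible failure, which is the point of the second bullet above.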
Christian Schwarz
1b9b9d60d4 eviction: add comment explaining resident size decrement on error
https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
68141a924d eviction: remove needless if-let around resident size decrement
The branch was always taken at runtime, so this should not
constitute a behavioral change.

refs https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
764d27f696 fix checkpoint_timeout serialization in TenantConf
Without this change, when actually setting this conf opt, the tenant
would become Broken next time we load it.
Why?
The serde_toml representation that persist_tenant_conf would write out
would be a TOML inline table of `secs` and `nsecs`.
But our hand-rolled TenantConf parser expects a TOML string.

I checked that all other `Duration` values in TenantConfOpt
use the humantime serialization.

Issues like this would likely be systematically prevented by
https://github.com/neondatabase/neon/issues/3682
2023-03-03 12:10:24 +01:00
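The round-trip mismatch can be modeled in Python: a value written as a `{ secs, nsecs }` table cannot be read back by a parser that only accepts strings, whereas a humantime-style string can. `format_duration` and `parse_duration` here are hypothetical helpers that accept only a small subset of humantime's syntax (`ms`, `s`, `m`, `h` suffixes); the actual fix is in Rust and uses the humantime serialization mentioned above.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def format_duration(seconds: int) -> str:
    """Serialize whole seconds as a humantime-style string, e.g. 600 -> '10m'."""
    for unit, factor in (("h", 3600), ("m", 60)):
        if seconds and seconds % factor == 0:
            return f"{seconds // factor}{unit}"
    return f"{seconds}s"


def parse_duration(value) -> float:
    """Accept only the string form, like a string-expecting config parser."""
    if not isinstance(value, str):
        raise TypeError(f"expected a duration string, got {type(value).__name__}: {value!r}")
    m = re.fullmatch(r"(\d+)(ms|s|m|h)", value.strip())
    if m is None:
        raise ValueError(f"invalid duration: {value!r}")
    return int(m.group(1)) * _UNIT_SECONDS[m.group(2)]
```

Writing with `format_duration` round-trips cleanly, while feeding the parser the table form (modeled as `{"secs": 600, "nsecs": 0}`) fails, which corresponds to the tenant becoming Broken on the next load.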