rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 13:40:37 +00:00

Author	SHA1	Message	Date
Arthur Petukhovsky	6a00ad3aab	Validate logs	2023-09-18 17:12:40 +00:00
Arthur Petukhovsky	61e6b24cb2	Fix fd leak	2023-09-18 09:54:19 +00:00
Arthur Petukhovsky	44c7d96ed0	Cleanup resources better	2023-09-16 23:10:53 +00:00
Arthur Petukhovsky	eb2886b401	Add test for 1000 WAL messages	2023-09-16 14:58:26 +00:00
Arthur Petukhovsky	0dc262a84a	Fix bug in walproposer voting	2023-08-29 14:11:04 +00:00
Arthur Petukhovsky	13e94bf687	Fix truncateLsn bug	2023-08-29 09:03:22 +00:00
Arthur Petukhovsky	41b9750e81	Run many schedules	2023-08-24 23:42:11 +00:00
Arthur Petukhovsky	7de94c959a	Support walproposer recovery	2023-08-22 23:15:46 +00:00
Arthur Petukhovsky	413ce2cfe8	Crash safekeepers	2023-08-17 10:36:23 +00:00
Arthur Petukhovsky	7f36028fab	Generate WAL in tests	2023-08-03 16:58:41 +00:00
Arthur Petukhovsky	cb6a8d3fe3	Fix some warnings	2023-07-28 21:37:16 +00:00
Arthur Petukhovsky	095747afc0	Fix walproposer main loop	2023-07-28 21:18:08 +00:00
Arthur Petukhovsky	89bd7ab8a3	Fix read/write in walproposer	2023-07-28 15:14:24 +00:00
Arthur Petukhovsky	5034a8cca0	WIP	2023-07-26 22:51:19 +02:00
Arthur Petukhovsky	55e40d090e	Run sync several times	2023-07-25 11:16:47 +00:00
Arthur Petukhovsky	d87e822169	Return LSN from sync safekeepers	2023-07-24 21:15:35 +00:00
Arthur Petukhovsky	296a0cbac2	Add -DSIMLIB	2023-07-21 15:40:47 +00:00
Arthur Petukhovsky	aed14f52d5	Test sync safekeepers	2023-06-03 19:11:28 +00:00
Arthur Petukhovsky	909d7fadb8	Implement simlib sk server	2023-06-02 14:49:55 +00:00
Arthur Petukhovsky	65f92232e6	Compile walproposer	2023-05-31 21:06:47 +00:00
Arthur Petukhovsky	b6a80bc269	Link postgres to rust statically	2023-05-31 13:19:41 +00:00
Arthur Petukhovsky	b55005d2c4	Build simple C func example	2023-05-26 14:44:48 +03:00
Konstantin Knizhnik	f71b1b174d	Check correctness of file_cache_size_limit (#3530)	2023-02-03 08:01:26 +02:00
Konstantin Knizhnik	9ce5ada89e	Do not report position in SMGR message (#3307) refer #3277	2023-01-13 10:23:35 +02:00
MMeent	bb406b21a8	Fix issue in compaction code (#3246) If we ran `compact_prefetch_buffers` with exactly one hole in the buffers, the code would fail to remove the last, now unused, entry from the array. This is now fixed. Also, add and adjust some comments in the compaction code so that the algorithm used is a bit more clear. Fixes #3192	2023-01-12 19:23:59 +01:00
Konstantin Knizhnik	1983c4d4ad	Explain prefetch (#3002) Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com>	2023-01-12 18:12:40 +02:00
Heikki Linnakangas	8b710b9753	Fix segfault if pageserver connection is lost during backend startup. It's not OK to return early from within a PG_TRY-CATCH block. The PG_TRY macro sets the global PG_exception_stack variable, and PG_END_TRY restores it. If we jump out in between with "return NULL", the PG_exception_stack is left to point to garbage. (I'm surprised the comments in PG_TRY_CATCH don't warn about this.) Add test that re-attaches tenant in pageserver while Postgres is running. If the tenant is detached while compute is connected and busy running queries, those queries will fail if they try to fetch any pages. But when the tenant is re-attached, things should start working again, without disconnecting the client <-> postgres connections. Without this fix, this reproduced the segfault. Fixes issue #3231	2023-01-05 18:51:47 +02:00
Heikki Linnakangas	c187de1101	Copy error message before it's freed. pageserver_disconnect() call invalidates 'pageserver_conn', including the error message pointer we got from PQerrorMessage(pageserver_conn). Copy the message to a temporary variable before disconnecting, like we do in a few other places. In the passing, clear 'pageserver_conn_wes' variable in a few places where it was free'd. I didn't see any live bug from this, but since pageserver_disconnect() checks if it's NULL, let's not leave it dangling to already-free'd memory.	2023-01-05 18:51:47 +02:00
Egor Suvorov	42c6ddef8e	Rename ZENITH_AUTH_TOKEN to NEON_AUTH_TOKEN Changes are: * Pageserver: start reading from NEON_AUTH_TOKEN by default. Warn if ZENITH_AUTH_TOKEN is used instead. * Compute, Docs: fix the default token name. * Control plane: change name of the token in configs and start sequences. Compatibility: * Control plane in tests: works, no compatibility expected. * Control plane for local installations: never officially supported auth anyways. If someone did enable it, `pageserver.toml` should be updated with the new `neon.pageserver_connstring` and `neon.safekeeper_token_env`. * Pageserver is backward compatible: you can run new Pageserver with old commands and environment configurations, but not vice-versa. The culprit is the hard-coded `NEON_AUTH_TOKEN`. * Compute has no code changes. As long as you update its configuration file with `pageserver_connstring` in sync with the start up scripts, you are good to go. * Safekeeper has no code changes and has never used `ZENITH_AUTH_TOKEN` in the first place.	2022-12-29 03:33:43 +03:00
Konstantin Knizhnik	140c0edac8	Yet another port of local file system cache (#2622)	2022-12-27 14:42:51 +02:00
MMeent	3514e6e89a	Use neon_nblocks instead of get_cached_relsize (#3132) This prevents us from overwriting all blocks of a relation when we extend the relation without first caching the size - get_cached_relsize does not guarantee a correct result when it returns `false`.	2022-12-16 21:14:57 +01:00
MMeent	3321eea679	Fix for #3043 (#3048)	2022-12-09 14:26:05 +01:00
MMeent	0d04cd0b99	Run compaction on the buffer holding received buffers when useful (#3028) This cleans up unused entries and reduces the chance of prefetch buffer thrashing.	2022-12-08 09:49:43 +01:00
MMeent	145e7e4b96	Prefetch cleanup: (#2876) - Enable `enable_seqscan_prefetch` by default - Drop use of `seqscan_prefetch_buffers` in favor of `[maintenance,effective]_io_concurrency` This includes adding some fields to the HeapScan execution node, and vacuum state. - Cleanup some conditionals in vacuumlazy.c - Clarify enable_seqscan_prefetch GUC description - Fix issues in heap SeqScan prefetching where synchronize_seqscan machinery wasn't handled properly.	2022-12-02 13:35:01 +01:00
Konstantin Knizhnik	c21104465e	Fix copying relation in WAL-logged create database in PG15 (#2986) refer #2904	2022-12-01 22:27:18 +02:00
Konstantin Knizhnik	d9ab42013f	Resend prefetch request in case of pageserver restart (#2974) refer #2819 Co-authored-by: MMeent <matthias@neon.tech>	2022-12-01 12:16:15 +02:00
MMeent	0c1195c30d	Fix #2937 (#2940) Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2022-11-28 15:34:07 +01:00
Egor Suvorov	ae53dc3326	Add authentication between Safekeeper and Pageserver/Compute * Fix https://github.com/neondatabase/neon/issues/1854 * Never log Safekeeper::conninfo in walproposer as it now contains a secret token * control_panel, test_runner: generate and pass JWT tokens for Safekeeper to compute and pageserver * Compute: load JWT token for Safekeeper from the environment variable. Do not reuse the token from pageserver_connstring because it's embedded in there weirdly. * Pageserver: load JWT token for Safekeeper from the environment variable. * Rewrite docs/authentication.md	2022-11-25 04:17:42 +03:00
Egor Suvorov	10d554fcbb	walproposer: refactor safekeeper::conninfo initialization It is used both in WalProposerInit and ResetConnection. In the future the logic will become more complicated due to authentication with Safekeeper.	2022-11-25 04:17:42 +03:00
Heikki Linnakangas	3f39327622	Silence a few compiler warnings I saw these from the build of the compute docker image in the CI (compute-node-image-v15): pagestore_smgr.c: In function 'neon_prefetch': pagestore_smgr.c:1654:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] 1654 | BufferTag tag = (BufferTag) { | ^~~~~~~~~ walproposer.c:197:1: warning: no previous prototype for 'WalProposerSync' [-Wmissing-prototypes] 197 | WalProposerSync(int argc, char *argv[]) | ^~~~~~~~~~~~~~~ libpagestore.c: In function 'pageserver_connect': libpagestore.c:100:9: warning: variable 'wc' set but not used [-Wunused-but-set-variable] 100 | int wc; | ^~ libpagestore.c: In function 'call_PQgetCopyData': libpagestore.c:144:9: warning: variable 'wc' set but not used [-Wunused-but-set-variable] 144 | int wc; | ^~ Harmless warnings, but let's be tidy. In the passing, I added some "extern" to a few function declarations that were missing them, and marked WalProposerSync as "static". Those changes are also purely cosmetic.	2022-11-19 14:11:04 +02:00
Konstantin Knizhnik	b9152f1ef4	Correctly terminate prefetch in case of pageserver restart (#2850) refer #2819 This patch requires deep knowledge of prefetch internals. So @MMeent please review it or suggest better solution.	2022-11-18 15:04:58 +02:00
MMeent	01778e37cc	Address issues in the pagestore prefetch mechanism: (#2790) - Update vendored PostgreSQL to address prefetch issues - Make flushed state explicit in PrefetchState - Move flush logic into prefetch_wait_for, where possible - Clean up some prefetch state handling code in the various code elements handling state transitions. - Fix a race condition in neon_read_at_lsn where a hash entry pointer was used after the hash table was updated. This could result in incorrect state transitions and assertion failures after disconnects during prefetch_wait_for in that neon_read_at_lsn. Fixes #2780	2022-11-15 15:12:38 +01:00
Heikki Linnakangas	e999f66b01	Use a cached WaitEventSet instead of WaitLatchOrSocket. When we repeatedly wait for the same events, it's faster to create the event set once and reuse it. While testing with a sequential scan test case, I saw WaitLatchOrSocket consuming a lot of CPU: > - 40.52% 0.14% postgres postgres [.] WaitLatchOrSocket > - 40.38% WaitLatchOrSocket > + 17.83% AddWaitEventToSet > + 9.47% close@plt > + 8.29% CreateWaitEventSet > + 4.57% WaitEventSetWait This eliminates most of that overhead.	2022-11-08 19:45:14 +02:00
andres	1cf257bc4a	feedback	2022-11-08 20:15:54 +04:00
andres	40164bd589	Use latestMsgReceivedAt in walproposer	2022-11-08 20:15:54 +04:00
MMeent	d5b6471fa9	Update prefetch mechanism: (#2687) Prefetch requests and responses are stored in a ringbuffer instead of a queue, which means we can utilize prefetches of many relations concurrently -- page reads of un-prefetched relations now don't imply dropping prefetches. In a future iteration, this may detect sequential scans based on the read behavior of sequential scans, and will dynamically prefetch buffers for such relations as needed. Right now, it still depends on explicit prefetch requests from PostgreSQL. The main improvement here is that we now have a buffer for prefetched pages of 128 entries with random access. Before, we had a similarly sized cache, but this cache did not allow for random access, which resulted in dropped entries when multiple systems used the prefetching subsystem concurrently. See also: #2544	2022-11-07 18:13:24 +02:00
Heikki Linnakangas	22cc8760b9	Move walredo process code under pgxn in the main 'neon' repository. - Refactor the way the WalProposerMain function is called when started with --sync-safekeepers. The postgres binary now explicitly loads the 'neon.so' library and calls the WalProposerMain in it. This is simpler than the global function callback "hook" we previously used. - Move the WAL redo process code to a new library, neon_walredo.so, and use the same mechanism as for --sync-safekeepers to call the WalRedoMain function, when launched with --walredo argument. - Also move the seccomp code to neon_walredo.so library. I kept the configure check in the postgres side for now, though.	2022-10-31 01:11:50 +01:00
Heikki Linnakangas	41550ec8bf	Remove unnecessary indirections of libpqwalproposer functions In the Postgres backend, we cannot link directly with libpq (check the pgsql-hackers archive for all kinds of fun that ensued when we tried to do that). Therefore, the libpq functions are used through the thin wrapper functions in libpqwalreceiver.so, and libpqwalreceiver.so is loaded dynamically. To hide the dynamic loading and make the calls look like regular functions, we use macros to hide the function pointers. We had inherited the same indirections in libpqwalproposer, but it's not needed since the neon extension is already a shared library that's loaded dynamically. There's no problem calling the functions directly there. Remove the indirections.	2022-10-18 18:25:30 +03:00
Arseny Sher	9fe4548e13	Reimplement explicit timeline creation on safekeepers. With the ability to pass commit_lsn. This allows performing project WAL recovery through a different (from the original) set of safekeepers (or under a different ttid) by 1) moving WAL files to s3 under the proper ttid; 2) explicitly creating the timeline on safekeepers, setting commit_lsn to the latest point; 3) putting the latest .partial file to the timeline directory on safekeepers, if desired. Extend test_s3_wal_replay to exercise this behaviour. Also extends timeline_status endpoint to return postgres information.	2022-10-13 21:43:10 +04:00
sharnoff	4b25b9652a	Rename more zid-like idents (#2480) Follow-up to PR #2433 (`b8eb908a`). There's still a few more unresolved locations that have been left as-is for the same compatibility reasons in the original PR.	2022-09-20 11:06:31 -07:00

59 Commits