rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-04 12:02:55 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	b774ab54d4	Remove obsolete ones - Relation size cache was moved to extension - the changes in visibilitymap.c and freespace.c became unnecessary with v16, thanks to changes in upstream code - WALProposer was moved to extension - The hack in ReadBuffer_common to not throw an error on unexpected data beyond EOF was removed in v16 rebase. We haven't seen such errors, so I guess that was some early issue that was fixed a long time ago. - The ginfast.c diff was made unnecessary by upstream commit 56b662523f	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	33a09946fc	Prefetching has been implemented	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	0396ed67f7	Update comments on various items To update things that have changed since this was written, and to reflect discussions at offsite meeting.	2024-06-18 20:01:32 +03:00
Heikki Linnakangas	8ee6724167	Update overview section to reflect current code organization	2024-06-18 20:01:32 +03:00
Arpad Müller	68a2298973	Add support for specifying storage account in AzureConfig (#8090 ) We want to be able to specify the storage account via the toml configuration, so that we can connect to multiple storage accounts in the same process. https://neondb.slack.com/archives/C06SJG60FRB/p1718702144270139	2024-06-18 16:03:23 +02:00
Heikki Linnakangas	0a256148b0	Update documentation on running locally with Docker (#8020 ) - Fix the dockerhub URLs - `neondatabase/compute-node` image has been replaced with Postgres version specific images like `neondatabase/compute-node-v16` - Use TAG=latest in the example, rather than some old tag. That's a sensible default for people to copy-paste - For convenience, use a Postgres connection URL in the `psql` example that also includes the password. That way, there's no need to set up .pgpass - Update the image names in `docker ps` example to match what you get when you follow the example	2024-06-12 07:06:00 +00:00
Heikki Linnakangas	69aa1aca35	Update default Postgres version in docker-compose.yml (#8019 ) Let's be modern.	2024-06-12 09:19:24 +03:00
Cihan Demirci	84b6b95783	docs: fix unintentional file link (#7506 ) Not sure if this should actually be a link pointing to the `persistence.rs` file but following the conventions of the rest of the file, change `persistence.rs` reference to simply be a file name mention.	2024-04-30 14:17:01 +01:00
John Spray	0d8e68003a	Add a docs page for storage controller (#7392 ) ## Problem External contributors need information on how to use the storage controller. ## Summary of changes - Background content on what the storage controller is. - Deployment information on how to use it. This is not super-detailed, but should be enough for a well motivated third party to get started, with an occasional peek at the code.	2024-04-18 13:45:25 +00:00
John Spray	66fc465484	Clean up 'attachment service' names to storage controller (#7326 ) The binary etc were renamed some time ago, but the path in the source tree remained "attachment_service" to avoid disruption to ongoing PRs. There aren't any big PRs out right now, so it's a good time to cut over. - Rename `attachment_service` to `storage_controller` - Move it to the top level for symmetry with `storage_broker` & to avoid mixing the non-prod neon_local stuff (`control_plane/`) with the storage controller which is a production component.	2024-04-05 16:18:00 +01:00
John Spray	67522ce83d	docs: shard splitting RFC (#6358 ) Extend the previous sharding RFC with functionality for dynamically splitting shards to increase the total shard count on existing tenants.	2024-03-15 16:00:04 +00:00
John Spray	23416cc358	docs: sharding phase 1 RFC (#5432 ) We need to shard our Tenants to support larger databases without those large databases dominating our pageservers and/or requiring dedicated pageservers. This RFC aims to define an initial capability that will permit creating large-capacity databases using a static configuration defined at time of Tenant creation. Online re-sharding is deferred as future work, as is offloading layers for historical reads. However, both of these capabilities would be implementable without further changes to the control plane or compute: this RFC aims to define the cross-component work needed to bootstrap sharding end-to-end.	2024-03-15 11:14:25 +00:00
John Spray	89cf714890	tests/neon_local: rename "attachment service" -> "storage controller" (#7087 ) Not a user-facing change, but can break any existing `.neon` directories created by neon_local, as the name of the database used by the storage controller changes. This PR changes all the locations apart from the path of `control_plane/attachment_service` (waiting for an opportune moment to do that one, because it's the most conflict-ish wrt ongoing PRs like #6676 )	2024-03-12 11:36:27 +00:00
Andreas Scherbaum	5c6d78d469	Rename "zenith" to "neon" (#6957 ) Usually RFC documents are not modified, but the vast mentions of "zenith" in early RFC documents make it desirable to update the product name to today's name, to avoid confusion. ## Problem Early RFC documents use the old "zenith" product name a lot, which is not something everyone is aware of after the product was renamed. ## Summary of changes Replace occurrences of "zenith" with "neon". Images are excluded. --------- Co-authored-by: Andreas Scherbaum <andreas@neon.tech>	2024-03-04 13:02:18 +01:00
Vlad Lazar	5accf6e24a	attachment_service: JWT auth enforcement (#6897 ) ## Problem Attachment service does not do auth based on JWT scopes. ## Summary of changes Do JWT based permission checking for requests coming into the attachment service. Requests into the attachment service must use different tokens based on the endpoint: * `/control` and `/debug` require `admin` scope * `/upcall` requires `generations_api` scope * `/v1/...` requires `pageserverapi` scope Requests into the pageserver from the attachment service must use `pageserverapi` scope.	2024-02-26 18:17:06 +00:00
Clarence	09519c1773	chore: update wording in docs to improve readability (#6607 ) ## Problem Found typos while reading the docs ## Summary of changes Fixed the typos found	2024-02-04 19:33:38 +00:00
Clarence	3d1b08496a	Update words in docs for better readability (#6600 ) ## Problem Found typos while reading the docs ## Summary of changes Fixed the typos found	2024-02-03 00:59:39 +00:00
Arthur Petukhovsky	f2aa96f003	Console split RFC (#1997 ) [Rendered](https://github.com/neondatabase/neon/blob/rfc-console-split/docs/rfcs/017-console-split.md) Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>	2024-02-02 23:41:55 +02:00
Christian Schwarz	66c52a629a	RFC: vectored `Timeline::get` (#6250 )	2024-01-08 15:00:01 +00:00
Alexander Bayandin	7de829e475	test_runner: replace black with ruff format (#6268 ) ## Problem `black` is slow sometimes, we can replace it with `ruff format` (a new feature in 0.1.2 [0]), which produces pretty similar to black style [1]. On my local machine (MacBook M1 Pro 16GB): ``` # `black` on main $ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ." Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black . Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s] Range (min … max): 3.047 s … 3.354 s 10 runs ``` ``` # `ruff format` on the current PR $ hyperfine "RUFF_NO_CACHE=true poetry run ruff format" Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms] Range (min … max): 267.5 ms … 420.2 ms 10 runs ``` ## Summary of changes - Replace `black` with `ruff format` everywhere - [0] https://docs.astral.sh/ruff/formatter/ - [1] https://docs.astral.sh/ruff/formatter/#black-compatibility	2024-01-05 15:35:07 +00:00
Christian Schwarz	c272c68e5c	RFC: Per-Tenant GetPage@LSN Throttling (#5648 ) Implementation epic: https://github.com/neondatabase/neon/issues/5899	2023-12-19 11:20:56 +01:00
Arpad Müller	3842773546	Correct RFC number for Pageserver WAL DR RFC (#5997 ) When I opened #5248, 27 was an unused RFC number. Since then, two RFCs have been merged, so now 27 is taken. 29 is free though, so move it there.	2023-11-30 21:01:25 +00:00
Arpad Müller	8ec6033ed8	Pageserver disaster recovery RFC (#5248 ) Enable the pageserver to recover from data corruption events by implementing a feature to re-apply historic WAL records in parallel to the already occurring WAL replay. The feature is outside of the user-visible backup and history story, and only serves as a second-level backup for the case that there is a bug in the pageservers that corrupted the served pages. The RFC proposes the addition of two new features: * recover a broken branch from WAL (downtime is allowed) * a test recovery system to recover random branches to make sure recovery works	2023-11-30 14:30:17 +01:00
Arpad Müller	31a54d663c	Migrate links from wiki to notion (#5862 ) See the slack discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1696429688621489?thread_ts=1695647103.117499	2023-11-14 15:36:47 +00:00
Tristan Partin	726c8e6730	Add docs for updating Postgres for new minor versions	2023-10-31 12:31:14 -05:00
Christian Schwarz	4a50483861	docs: error handling: document preferred anyhow context & logging style (#5178 ) We already had strong support for this many months ago on Slack: https://neondb.slack.com/archives/C0277TKAJCA/p1673453329770429	2023-10-17 15:41:47 +01:00
Arpad Müller	e09d5ada6a	Azure blob storage support (#5546 ) Adds prototype-level support for [Azure blob storage](https://azure.microsoft.com/en-us/products/storage/blobs). Some corners were cut, see the TODOs and the followup issue #5567 for details. Steps to try it out: * Create a storage account with block blobs (this is a per-storage account setting). * Create a container inside that storage account. * Set the appropriate env vars: `AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_ACCESS_KEY, REMOTE_STORAGE_AZURE_CONTAINER, REMOTE_STORAGE_AZURE_REGION` * Set the env var `ENABLE_REAL_AZURE_REMOTE_STORAGE=y` and run `cargo test -p remote_storage azure` Fixes #5562	2023-10-16 17:37:09 +02:00
John Spray	6b4bb91d0a	docs/rfcs: add RFC for fast tenant migration/failover (#5029 ) ## Problem Currently we don't have a way to migrate tenants from one pageserver to another without a risk of gap in availability. ## Summary of changes This follows on from https://github.com/neondatabase/neon/pull/4919 Migrating tenants between pageservers is essential to operating a service at scale, in several contexts: 1. Responding to a pageserver node failure by migrating tenants to other pageservers 2. Balancing load and capacity across pageservers, for example when a user expands their database and they need to migrate to a pageserver with more capacity. 3. Restarting pageservers for upgrades and maintenance Currently, a tenant may be migrated by attaching to a new node, re-configuring endpoints to use the new node, and then later detaching from the old node. This is safe once [generation numbers](025-generation-numbers.md) are implemented, but does not meet our seamless/fast/efficient goals: Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-28 10:07:11 +01:00
Christian Schwarz	5edae96a83	rfc: Crash-Consistent Layer Map Updates By Leveraging index_part.json (#5086 ) This RFC describes a simple scheme to make layer map updates crash consistent by leveraging the index_part.json in remote storage. Without such a mechanism, crashes can induce certain edge cases in which broadly held assumptions about system invariants don't hold.	2023-09-01 15:24:58 +02:00
John Spray	382473d9a5	docs: add RFC for remote storage generation numbers (#4919 ) ## Summary A scheme of logical "generation numbers" for pageservers and their attachments is proposed, along with changes to the remote storage format to include these generation numbers in S3 keys. Using the control plane as the issuer of these generation numbers enables strong anti-split-brain properties in the pageserver cluster without implementing a consensus mechanism directly in the pageservers. ## Motivation Currently, the pageserver's remote storage format does not provide a mechanism for addressing split brain conditions that may happen when replacing a node during failover or when migrating a tenant from one pageserver to another. From a remote storage perspective, a split brain condition occurs whenever two nodes both think they have the same tenant attached, and both can write to S3. This can happen in the case of a network partition, pathologically long delays (e.g. suspended VM), or software bugs. This blocks robust implementation of failover from unresponsive pageservers, due to the risk that the unresponsive pageserver is still writing to S3. --------- Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-08-30 09:49:55 +01:00
Christian Schwarz	ed5bce7cba	rfcs: archive my MVCC S3 Notion Proposal (#5040 ) This is a copy from the [original Notion page](https://www.notion.so/neondatabase/Proposal-Pageserver-MVCC-S3-Storage-8a424c0c7ec5459e89d3e3f00e87657c?pvs=4), taken on 2023-08-16. This is for archival mostly. The RFC that we're likely to go with is https://github.com/neondatabase/neon/pull/4919.	2023-08-18 19:34:29 +02:00
Alek Westover	d005c77ea3	Tar Remote Extensions (#4715 ) Add infrastructure to dynamically load postgres extensions and shared libraries from remote extension storage. Before postgres starts, the list of available remote extensions and libraries is downloaded, along with 'shared_preload_libraries'. After postgres is running, 'compute_ctl' listens for HTTP requests to load files. Postgres has a new GUC 'extension_server_port' to specify the port on which 'compute_ctl' listens for requests. When PostgreSQL requests a file, 'compute_ctl' downloads it. See more details about feature design and remote extension storage layout in docs/rfcs/024-extension-loading.md --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alek Westover <alek.westover@gmail.com>	2023-08-02 12:38:12 +03:00
arpad-m	d98cb39978	pageserver: use tokio::time::timeout where possible (#4756 ) Removes a bunch of cases which used `tokio::select` to emulate the `tokio::time::timeout` function. I've done an additional review on the cancellation safety of these futures, all of them seem to be cancellation safe (not that `select!` allows non-cancellation-safe futures, but as we touch them, such a review makes sense). Furthermore, I correct a few mentions of a non-existent `tokio::timeout!` macro in the docs to the `tokio::time::timeout` function.	2023-07-20 16:19:38 +02:00
bojanserafimov	1aad8918e1	Document recommended ccls setup (#4723 )	2023-07-17 09:21:42 -04:00
Stas Kelvich	444d6e337f	add rfcs/022-user-mgmt.md (#3838 ) Co-authored-by: Vadim Kharitonov <vadim@neon.tech>	2023-07-12 19:58:55 +02:00
Tomoka Hayashi	91435006bd	Fix docker-compose file and document (#4621 ) ## Problem - Running the command according to docker.md gives a warning and an error. - Warning `permissions should be u=rw (0600) or less` is output when executing `psql -h localhost -p 55433 -U cloud_admin`. - `FATAL: password authentication failed for user "root"` is output in compute logs. ## Summary of changes - Add `$ chmod 600 ~/.pgpass` in docker.md to avoid warning. - Add username (cloud_admin) to pg_isready command in docker-compose.yml to avoid error. --------- Co-authored-by: Tomoka Hayashi <tomoka.hayashi@ntt.com>	2023-07-06 10:11:24 +01:00
Heikki Linnakangas	9787227c35	Shield HTTP request handlers from async cancellations. (#4314 ) We now spawn a new task for every HTTP request, and wait on the JoinHandle. If Hyper drops the Future, the spawned task will keep running. This protects the rest of the pageserver code from unexpected async cancellations. This creates a CancellationToken for each request and passes it to the handler function. If the HTTP request is dropped by the client, the CancellationToken is signaled. None of the handler functions make use of the CancellationToken currently, but now they could. The CancellationToken arguments also work like documentation. When you're looking at a function signature and you see that it takes a CancellationToken as argument, it's a nice hint that the function might run for a long time, and won't be async cancelled. The default assumption in the pageserver is now that async functions are not cancellation-safe anyway, unless explicitly marked as such, but this is a nice extra reminder. Spawning a task for each request is OK from a performance point of view because spawning is very cheap in Tokio, and none of our HTTP requests are very performance critical anyway. Fixes issue #3478	2023-06-02 08:28:13 -04:00
Dmitry Rodionov	7529ee2ec7	rfc: the state of pageserver tenant relocation (#3868 ) Summarize current state of tenant relocation related activities and implementation ideas	2023-05-19 14:35:33 +03:00
Heikki Linnakangas	72346e102d	Document that our code is mostly not async cancellation-safe. We had a hot debate on whether we should try to make our code cancellation-safe, or just accept that it's not, and make sure that our Futures are driven to completion. The decision is that we drive Futures to completion. This documents the decision, and summarizes the reasoning for that. Discussion that sparked this: https://github.com/neondatabase/neon/pull/4198#discussion_r1190209316	2023-05-17 17:29:54 +03:00
mikecaat	14a40c9ca6	Fix minor things for the docker-compose file (#3862 ) * Add the REPOSITORY env to build args to avoid the following error when executing without the credentials for the repository. ``` ERROR: Service 'compute' failed to build: Head "https://369495373322.dkr.ecr.eu-central-1.amazonaws.com/v2/compute-node-v15/manifests/2221": no basic auth credentials ``` * update the tag version in the documentation to support storage broker	2023-03-22 08:10:53 +00:00
Dmitry Rodionov	4158e24e60	rfc: delete pageserver data from s3 (#3792 ) [Rendered](https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-21 20:03:27 +02:00
Heikki Linnakangas	299db9d028	Simplify and clean up the $NEON_AUTH_TOKEN stuff in compute - Remove the neon.safekeeper_token_env GUC. It was used to set the name of an environment variable, which was then used in pageserver and safekeeper connection strings in place of the password. Instead, always look up the environment variable called NEON_AUTH_TOKEN. That's what neon.safekeeper_token_env was always set to in practice, and I don't see the need for the extra level of indirection or configurability. - Instead of substituting $NEON_AUTH_TOKEN in the connection strings, pass $NEON_AUTH_TOKEN "out-of-band" as the password, when we connect to the pageserver or safekeepers. That's simpler. - Also use the password from $NEON_AUTH_TOKEN in compute_ctl, when it connects to the pageserver to get the "base backup".	2023-03-21 00:15:04 +02:00
Heikki Linnakangas	fea4b5f551	Switch to EdDSA algorithm for the storage JWT authentication tokens. The control plane currently only supports EdDSA. We need to either teach the storage to use EdDSA, or the control plane to use RSA. EdDSA is more modern, so let's use that. We could support both, but it would require a little more code and tests, and we don't really need the flexibility since we control both sides.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	10a5d36af8	Separate mgmt and libpq authentication configs in pageserver. (#3773 ) This makes it possible to enable authentication only for the mgmt HTTP API or the compute API. The HTTP API doesn't need to be directly accessible from compute nodes, and it can be secured through network policies. This also allows rolling out authentication in a piecemeal fashion.	2023-03-15 13:52:29 +02:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Heikki Linnakangas	b00530df2a	Add section in internal docs on the JWT payload. Just copied from the code comments. Could be improved, but this is a start.	2023-03-10 16:09:32 +02:00
Shany Pozin	7b182e2605	Update settings.md with latest PITR and gc period values (#3618 ) ## Describe your changes Updates PITR and GC_PERIOD default value doc ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-02-16 10:33:04 +02:00
Anna Stepanyan	a974602f9f	fix the logical size term definition (#3609 ) a size of a database cannot be a sum of the sizes of all databases indicating that a logical size is calculated for a branch ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] i checked the suggested changes - [x] this is not a core feature - [x] this is just a docs update, does not require analytics - [x] this PR does not require a public announcement	2023-02-15 15:11:06 +01:00
Heikki Linnakangas	2040db98ef	Add docs for synthetic size calculation (#3328 ) --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-02-09 11:20:10 +02:00
Stas Kelvich	431e464c1e	Consumption metering RFC	2023-01-16 19:15:59 +02:00

184 Commits
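
The endpoint-to-scope rules listed in commit 5accf6e24a (#6897) can be illustrated with a small sketch. This is not the storage controller's actual code: the `Scope` enum, `required_scope`, `has_prefix`, `is_authorized`, and the segment-wise prefix matching are hypothetical names and assumptions made for illustration. Only the path prefixes and the scope names (`admin`, `generations_api`, `pageserverapi`) come from the commit message.

```rust
/// Scopes named in the commit message (the enum itself is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scope {
    Admin,          // `admin`
    GenerationsApi, // `generations_api`
    PageServerApi,  // `pageserverapi`
}

/// True if `path` equals `prefix` or continues it with a new path segment.
/// Assumption: matching is done on whole path segments.
fn has_prefix(path: &str, prefix: &str) -> bool {
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}

/// Return the scope a request path requires, or None if no rule applies.
fn required_scope(path: &str) -> Option<Scope> {
    // Rules as listed in the commit message, checked in order.
    const RULES: [(&str, Scope); 4] = [
        ("/control", Scope::Admin),
        ("/debug", Scope::Admin),
        ("/upcall", Scope::GenerationsApi),
        ("/v1", Scope::PageServerApi),
    ];
    for (prefix, scope) in RULES.iter() {
        if has_prefix(path, prefix) {
            return Some(*scope);
        }
    }
    None
}

/// A request is allowed only if the token carries exactly the required scope.
/// Paths with no matching rule are rejected (an assumption of this sketch).
fn is_authorized(path: &str, token_scope: Scope) -> bool {
    required_scope(path) == Some(token_scope)
}

fn main() {
    // Example paths are made up for illustration.
    for path in ["/control/v1/node", "/upcall/v1/re-attach", "/v1/tenant", "/other"] {
        println!("{path} -> {:?}", required_scope(path));
    }
    println!("{}", is_authorized("/debug/v1/tenant", Scope::PageServerApi));
}
```

Keeping the rules in a single table makes the mapping easy to audit against the commit message. Denying unmatched paths by default is a design choice of this sketch, not something the commit states.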