rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-13 16:32:56 +00:00

Author	SHA1	Message	Date
John Spray	aec92bfc34	pageserver: decrease utilization MAX_SHARDS (#10489) ## Problem The intent of this parameter is to have pageservers consider themselves "full" if they've got lots of shards, even if they have plenty of capacity. It works, but because we typically successfully oversubscribe capacity up to 200%, the MAX_SHARDS limit is effectively doubled, so this 20,000 value ends up meaning 40,000, whereas the original intent was to limit nodes to ~10000 shards. ## Summary of changes - Change MAX_SHARDS to 5000, so that a node with 5000 shards will get 100% utilization, which is equivalent in practice to being considered "half full" by the storage controller in capacity terms. This is all a bit subtle and indirect. Originally the limit was baked into the pageserver with the idea that the pageserver knows better what its own resources tolerate than the storage controller does, but in practice it would probably be easier to understand all this if we just did it controller-side. So there's scope to refactor here in future.	2025-01-27 17:03:32 +00:00
John Spray	b65a95f12e	controller: use PageserverUtilization for scheduling (#8711) ## Problem Previously, the controller only used the shard counts for scheduling. This works well when hosting only many-sharded tenants, but works much less well when hosting single-sharded tenants that have a greater deviation in size-per-shard. Closes: https://github.com/neondatabase/neon/issues/7798 ## Summary of changes - Instead of UtilizationScore, carry the full PageserverUtilization through into the Scheduler. - Use the PageserverUtilization::score() instead of shard count when ordering nodes in scheduling. Q: Why did test_sharding_split_smoke need updating in this PR? A: There's an interesting side effect during shard splits: because we do not decrement the shard count in the utilization when we de-schedule the shards from before the split, the controller will now prefer to pick _different_ nodes for shards compared with which ones held secondaries before the split. We could use our knowledge of splitting to fix up the utilizations more actively in this situation, but I'm leaning toward leaving the code simpler, as in practical systems the impact of one shard on the utilization of a node should be fairly low (single digit %).	2024-08-23 18:32:56 +01:00
John Spray	ecb01834d6	pageserver: implement utilization score (#8703) ## Problem When the utilization API was added, it was just a stub with disk space information. Disk space information isn't a very good metric for assigning tenants to pageservers, because pageservers making full use of their disks would always just have 85% utilization, irrespective of how much pressure they had for disk space. ## Summary of changes - Use the new layer visibility metric to calculate a "wanted size" per tenant, and sum these to get a total local disk space wanted per pageserver. This acts as the primary signal for utilization. - Also use the shard count to calculate a utilization score, and take the max of this and the disk-driven utilization. The shard count limit is currently set as a constant 20,000, which matches contemporary operational practices when loading pageservers. The shard count limit means that for tiny/empty tenants, on a machine with 3.84TB disk, each tiny tenant influences the utilization score as if it had size 160MB.	2024-08-13 15:15:55 +01:00
Kevin Mingtarja	a306d0a54b	implement Serialize/Deserialize for SystemTime with RFC3339 format (#7203) ## Problem We have two places that use a helper (`ser_rfc3339_millis`) to get serde to stringify SystemTimes into the desired format. ## Summary of changes Created a new module `utils::serde_system_time` and inside it a wrapper type `SystemTime` for `std::time::SystemTime` that serializes/deserializes to the RFC3339 format. This new type is then used in the two places that were previously using the helper for serialization, thereby eliminating the need to decorate structs. Closes #7151.	2024-04-08 15:53:07 +01:00
Joonas Koivunen	21b3e1d13b	fix(utilization): return used as does df (#7337) We can currently underflow `pageserver_resident_physical_size_global`, so the used disk bytes would show `u63::MAX` by mistake. The assumption of the API (and the documented behavior) was to give the layer files disk usage. Switch to reporting numbers that match `df` output. Fixes: #7336	2024-04-08 09:01:38 +03:00
Joonas Koivunen	bc7a82caf2	feat: bare-bones /v1/utilization (#6831) PR adds a simple informational API for querying pageserver utilization, refreshed at most at 1Hz. In this first phase, no actual background calculation is performed. Instead, the worst possible score is always returned. The returned bytes information is however correct. Cc: #6835 Cc: #5331	2024-02-22 13:58:59 +02:00

6 Commits
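
The scoring arithmetic described in the commit messages above (#8703 and #10489) can be sketched roughly as follows. This is a minimal illustration, not the pageserver's actual code: the `Utilization` struct, its field names, and the helper functions are hypothetical, and the percentage scale (100 meaning "full") is an assumption made for readability. The 85% usable-disk fraction is taken from the wording of #8703 and is only assumed to be what reconciles 3.84TB with the quoted 160MB.

```rust
/// Hypothetical stand-in for the inputs to a utilization score.
/// Field names are illustrative and not taken from the neon source.
struct Utilization {
    disk_wanted_bytes: u64,
    disk_usable_bytes: u64,
    shard_count: u32,
    max_shards: u32,
}

impl Utilization {
    /// Score as a percentage, where 100 means "full" by either signal.
    /// Takes the max of the disk-driven and shard-count-driven values,
    /// as the #8703 commit message describes.
    fn score(&self) -> u64 {
        let disk_pct =
            self.disk_wanted_bytes.saturating_mul(100) / self.disk_usable_bytes.max(1);
        let shard_pct =
            u64::from(self.shard_count) * 100 / u64::from(self.max_shards.max(1));
        disk_pct.max(shard_pct)
    }
}

/// Number of shards a node holds when its shard-driven score reaches
/// `score_pct`. With 200% oversubscription this doubles MAX_SHARDS.
fn shards_at_score(max_shards: u64, score_pct: u64) -> u64 {
    max_shards * score_pct / 100
}

/// Disk size that one empty shard is "worth" in score terms, assuming
/// (hypothetically) that `usable_pct` percent of the disk counts as capacity.
fn bytes_per_shard_equivalent(disk_bytes: u64, usable_pct: u64, max_shards: u64) -> u64 {
    disk_bytes / 100 * usable_pct / max_shards
}
```

Under these assumptions, `shards_at_score(20_000, 200)` gives 40,000 and `shards_at_score(5_000, 200)` gives 10,000, matching the figures in #10489. `bytes_per_shard_equivalent(3_840_000_000_000, 85, 20_000)` gives about 163MB, close to the 160MB quoted in #8703.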