From 7c076edeea8741e2f6e2c4108041170a122eed1c Mon Sep 17 00:00:00 2001 From: John Spray Date: Tue, 1 Aug 2023 13:17:49 +0100 Subject: [PATCH] pageserver: tweak period of imitate_layer_accesses (#4859) ## Problem When the eviction threshold is an integer multiple of the eviction period, it is unreliable to skip imitating accesses based on whether the last imitation was more recent than the threshold. This is because as finite time passes between the time used for the periodic execution, and the 'now' time used for updating last_layer_access_imitation. When this is just a few milliseconds, and everything else is on-time, then a 5 second threshold with a 1 second period will end up entering its 5th iteration slightly _less than_ 5 second since last_layer_access_imitation, and thereby skipping instead of running the imitation. If a few milliseconds then pass before we check the access time of a file that _should_ have been bumped by the imitation pass, then we end up evicting something we shouldn't have evicted. ## Summary of changes We can make this race far less likely by using the threshold minus one interval as the period for re-executing the imitate_layer_accesses: that way we're not vulnerable to racing by just a few millis, and there would have to be a delay of the order `period` to cause us to wrongly evict a layer. This is not a complete solution: it would be good to revisit this and use a non-walltime mechanism for pinning these layers into local storage, rather than relying on bumping access times. --- pageserver/src/tenant/timeline/eviction_task.rs | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/pageserver/src/tenant/timeline/eviction_task.rs b/pageserver/src/tenant/timeline/eviction_task.rs index 5485cc42b4..354f971e11 100644 --- a/pageserver/src/tenant/timeline/eviction_task.rs +++ b/pageserver/src/tenant/timeline/eviction_task.rs @@ -308,8 +308,13 @@ impl Timeline { ctx: &RequestContext, ) -> ControlFlow<()> { let mut state = self.eviction_task_timeline_state.lock().await; + + // Only do the imitate_layer accesses approximately as often as the threshold. A little + // more frequently, to avoid this period racing with the threshold/period-th eviction iteration. + let inter_imitate_period = p.threshold.checked_sub(p.period).unwrap_or(p.threshold); + match state.last_layer_access_imitation { - Some(ts) if ts.elapsed() < p.threshold => { /* no need to run */ } + Some(ts) if ts.elapsed() < inter_imitate_period => { /* no need to run */ } _ => { self.imitate_timeline_cached_layer_accesses(cancel, ctx) .await; @@ -332,7 +337,7 @@ impl Timeline { }; let mut state = tenant.eviction_task_tenant_state.lock().await; match state.last_layer_access_imitation { - Some(ts) if ts.elapsed() < p.threshold => { /* no need to run */ } + Some(ts) if ts.elapsed() < inter_imitate_period => { /* no need to run */ } _ => { self.imitate_synthetic_size_calculation_worker(&tenant, ctx, cancel) .await;