Came here to say this is spot on with regard to how these solutions usually evolve. Depending on your use case, you can easily see how this pattern extends to a per-endpoint rate limit. At extremely high authentication loads, however, you can accidentally rate limit clients if their reserved (and not yet cancelled) requests exceed the limit. Taking it a step further, on successful auth you can add the IP to a temporary whitelist inside the reservation implementation, so later requests from that IP skip the reservation state entirely for some period of time (5 minutes usually works in our high-load cases).
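For anyone who wants to see the shape of it, here is a rough sketch of the reserve/cancel idea with the temporary whitelist bolted on. It is not our production code: `ReservationLimiter`, `reserve`, `on_auth_success`, and the default limit and window values are all names and numbers made up for illustration, and it keeps state in process memory, whereas a real deployment would more likely use a shared store.

```python
import time
from collections import defaultdict, deque


class ReservationLimiter:
    """Hypothetical sketch: reserve a slot per auth attempt, cancel it on success."""

    def __init__(self, limit=5, window=60.0, allow_ttl=300.0, clock=time.monotonic):
        self.limit = limit          # max open reservations per IP (assumed value)
        self.window = window        # seconds before an uncancelled reservation expires
        self.allow_ttl = allow_ttl  # whitelist lifetime; 300 s = the 5 minutes above
        self.clock = clock          # injectable so the behaviour can be tested
        self.reservations = defaultdict(deque)  # ip -> expiry times of open reservations
        self.allowed_until = {}                 # ip -> whitelist expiry time

    def _whitelisted(self, ip, now):
        expiry = self.allowed_until.get(ip)
        if expiry is not None and now < expiry:
            return True
        self.allowed_until.pop(ip, None)  # drop a stale entry, if any
        return False

    def reserve(self, ip):
        """Return True if the request may proceed to the auth check."""
        now = self.clock()
        if self._whitelisted(ip, now):
            return True  # recently authenticated: skip reservation state entirely
        slots = self.reservations[ip]
        while slots and slots[0] <= now:
            slots.popleft()  # discard reservations whose window has passed
        if len(slots) >= self.limit:
            return False  # too many open (not yet cancelled) reservations
        slots.append(now + self.window)
        return True

    def on_auth_success(self, ip):
        """Cancel one reservation and whitelist the IP for allow_ttl seconds."""
        slots = self.reservations.get(ip)
        if slots:
            slots.popleft()
        self.allowed_until[ip] = self.clock() + self.allow_ttl
```

Failed attempts never call `on_auth_success`, so their reservations stay open until the window expires, which is what throttles a brute-force client. The catch is the usual one: anything behind a shared NAT inherits the whitelist too, so keep the TTL short.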
We’ve also gone ahead and used CF to block truly abusive traffic patterns. It gets a bunch of hate, for whatever reasons individuals may have, but it gets the job done and saves my teams tons of time they would otherwise spend solving the same problems they’ve already mastered.
(Edit - typo)