A neat observation, though I’d like to understand why. I’m not very strong on how modern CPUs work, so things like branch prediction are a bit of a mystery to me in practice.
Branch predictors unfortunately are a black box. They’re an implementation detail subject to change. AMD went as far as calling theirs a “neural net” (which is probably a perceptron).
My guess is that the problem overall can be treated similarly to cache pressure: if you add an easily-cacheable memory access, even when it is cheap by itself, it will push something else out of the cache and make some other access more expensive.
Branch predictors track branches in a finite-size table (roughly a fixed-capacity hashmap of branch locations), so if you add more branches, the predictor will have to stop tracking some other branches. Another possibility is that the predictor evaluates the behavior of multiple branches together as a group, and adding an unusual branch to that group confuses it.
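To make the first guess a bit more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: real predictors are undocumented and far more elaborate, and everything here (`ToyPredictor`, `misses_for_first_branch`, the 16-slot table, the 2-bit counters, indexing by address modulo table size) is made up for illustration. It shows how a perfectly predictable branch can start mispredicting once a second branch lands in the same table slot.

```python
class ToyPredictor:
    """Hypothetical predictor: a small table of 2-bit saturating counters."""

    def __init__(self, slots):
        self.slots = slots
        # Counter values 0-1 mean "predict not taken", 2-3 mean "predict taken".
        self.counters = [1] * slots

    def predict_and_update(self, address, taken):
        """Predict the branch at `address`, then train on the real outcome.

        Returns True if the prediction was correct.
        """
        index = address % self.slots  # assumed indexing scheme; collisions are possible
        predicted_taken = self.counters[index] >= 2
        if taken:
            self.counters[index] = min(3, self.counters[index] + 1)
        else:
            self.counters[index] = max(0, self.counters[index] - 1)
        return predicted_taken == taken


def misses_for_first_branch(branches, slots=16, iterations=1000):
    """Run a loop body made of `branches` and count misses on the first one.

    `branches` is a list of (address, taken) pairs; each branch always goes
    the same way, so in isolation each is trivially predictable.
    """
    predictor = ToyPredictor(slots)
    misses = 0
    for _ in range(iterations):
        for position, (address, taken) in enumerate(branches):
            correct = predictor.predict_and_update(address, taken)
            if position == 0 and not correct:
                misses += 1
    return misses


always_taken = (0x10, True)      # the branch we care about
far_neighbour = (0x21, False)    # maps to a different slot (0x21 % 16 == 1)
colliding = (0x20, False)        # maps to the same slot (0x20 % 16 == 0)

print(misses_for_first_branch([always_taken]))                 # 1
print(misses_for_first_branch([always_taken, far_neighbour]))  # 1
print(misses_for_first_branch([always_taken, colliding]))      # 1000
```

In this toy, the extra branch costs nothing when it gets its own slot, but when it shares a slot the two branches keep undoing each other's training and the original branch mispredicts on every iteration. Whether real hardware behaves anything like this for a given piece of code is exactly the part that is a black box.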