1. 2

    Conclusion: “Due to really high noise in the data, it’s often hard to know if the model is doing anything.”

    1. 2

      Haha. Thanks for noticing it.

      However, I view this statement a bit differently: I don’t see it as a conclusion. Instead, it’s the belief I start with. Given that the data contains a lot of noise, I want to construct models that consistently spit out actionable predictions (i.e., predictions that can be used in an automated execution/trading system). For linear models, that means preserving the standard deviation of the predicted output; for neural networks and ensemble models, it means generalizing well to out-of-sample examples.
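
      As a quick illustration (my sketch, not from the original comment): one sanity check for the “preserving standard deviation” point is to compare the spread of a linear model’s out-of-sample predictions against the target’s spread as regularization increases. The synthetic data and alpha values below are placeholders.

      ```python
      import numpy as np
      from sklearn.linear_model import Ridge

      rng = np.random.default_rng(0)

      # Placeholder data: a weak linear signal buried in heavy noise,
      # standing in for real feature/return series.
      X = rng.normal(size=(5000, 20))
      beta = rng.normal(scale=0.05, size=20)
      y = X @ beta + rng.normal(scale=1.0, size=5000)

      X_train, X_test = X[:4000], X[4000:]
      y_train, y_test = y[:4000], y[4000:]

      for alpha in [0.1, 100.0, 10000.0]:
          model = Ridge(alpha=alpha).fit(X_train, y_train)
          preds = model.predict(X_test)
          # Over-regularized models shrink predictions toward zero:
          # low output variance leaves little to act on downstream.
          print(f"alpha={alpha:>8}: pred std={preds.std():.4f}, "
                f"target std={y_test.std():.4f}")
      ```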

      1. 1

        Isn’t the assumption that the past does not reflect what the future holds? How can any of these models say anything actionable?

        I’m genuinely curious.

        1. 3

          Yup, you’re right. The future distribution of price/return data is usually very different from the past’s. That is what makes the whole process of constructing models difficult, but not impossible.
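
          One common way to probe this (my addition; the comment doesn’t spell out a method) is walk-forward evaluation: fit only on data up to some point in time, score on the window that follows, then roll forward, so every score comes from a period the model never saw. A minimal sketch, with placeholder window sizes:

          ```python
          import numpy as np
          from sklearn.linear_model import LinearRegression

          def walk_forward_scores(X, y, train_size=1000, test_size=250):
              """Roll a train/test split forward through time so each score
              reflects data the model was not fitted on -- a basic guard
              against judging a model on its own training regime."""
              scores = []
              start = 0
              while start + train_size + test_size <= len(X):
                  tr = slice(start, start + train_size)
                  te = slice(start + train_size, start + train_size + test_size)
                  model = LinearRegression().fit(X[tr], y[tr])
                  preds = model.predict(X[te])
                  # Correlation between prediction and realized value; what
                  # matters is stability across windows, not any single number.
                  scores.append(np.corrcoef(preds, y[te])[0, 1])
                  start += test_size
              return scores
          ```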

          Unfortunately, I didn’t really discuss the actionable part in the post. Typically, a high-frequency trading firm or a trade-execution firm runs execution logic on top of these models: an algorithm that takes in the current prices, sizes, positions, PnL, and model signals (HFT models are generally linear to preserve speed). Its job is to use that information to decide whether to send a new order (and if so, at what price) or to cancel/modify an existing one. So simply optimizing the model for MSE or MAE doesn’t necessarily translate into better PnL, and the mapping from model error to PnL is not obvious. The goal, therefore, is to construct actionable models (models that are helpful in trading) rather than to simply optimize for error. I realize it’s hard to grasp this aspect from one paragraph, but I hope I was able to highlight something.
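
          A deliberately toy sketch of such an execution layer (every name and threshold here is hypothetical; real execution logic is far more involved):

          ```python
          from dataclasses import dataclass

          @dataclass
          class MarketState:
              bid: float        # current best bid
              ask: float        # current best ask
              position: int     # signed inventory
              pnl: float        # running profit and loss

          def decide(state: MarketState, signal: float,
                     entry_threshold: float = 0.5,
                     max_position: int = 10) -> str:
              """Map a model signal plus market/account state to an order
              action. The same signal can produce different actions
              depending on position and risk, which is one reason lower
              prediction error does not automatically mean higher PnL."""
              if abs(state.position) >= max_position:
                  return "cancel_open_orders"   # risk limit hit; stop adding exposure
              if signal > entry_threshold:
                  return f"buy @ {state.bid}"   # expect an up-move; join the bid
              if signal < -entry_threshold:
                  return f"sell @ {state.ask}"  # expect a down-move; join the offer
              return "no_action"                # signal too weak to trade on
          ```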

          Here’s an example that’s kinda similar: YouTube (and others) trains recommendation models that try to minimize RMSE on watch time (or click rate). However, the model with the lowest error wouldn’t necessarily be the best one in their A/B testing. By the way, there is an interesting parallel between recommendation systems and this problem: both have very low accuracy, both have a lot of noise in the data, both have seasonality, and both have to deal with unseen ‘events/items’.
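
          A toy numeric illustration of that gap (my construction, with made-up data): a constant predictor can beat a noisy-but-informative one on RMSE while being useless for picking the top items.

          ```python
          import numpy as np

          rng = np.random.default_rng(1)
          y = rng.normal(size=10_000)        # "true" outcomes (toy data)

          preds_constant = np.zeros_like(y)  # always predicts the mean
          preds_noisy = y + rng.normal(scale=1.1, size=y.size)  # informative, noisy

          def rmse(p):
              return np.sqrt(np.mean((p - y) ** 2))

          def top_decile_hit_rate(p):
              # Of the items a model ranks in its top 10%, how many are
              # truly in the top 10%? Random guessing lands around 0.10.
              k = y.size // 10
              picked = np.argsort(p)[-k:]
              truly_top = np.argsort(y)[-k:]
              return np.intersect1d(picked, truly_top).size / k

          for name, p in [("constant", preds_constant), ("noisy", preds_noisy)]:
              print(f"{name:>8}: RMSE={rmse(p):.3f}, "
                    f"top-decile hit rate={top_decile_hit_rate(p):.3f}")
          # The constant model wins on RMSE (~1.0 vs ~1.1) yet ranks items
          # no better than chance; the noisy model loses on RMSE but is
          # the one you could actually act on.
          ```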