It’s not all that black and white.
IMO, there are times when unit tests help, there are times when integration tests help and there are times when it’s best not to write any tests.
Based on my experience working with huge codebases as well as starting new ones from scratch, and working on teams as well as alone, I have found that the optimal amount of testing varies a lot. However, in times of doubt, it’s probably a good idea to err on the side of writing tests.
(Disclaimer: I hate writing tests.)
These are my rules of thumb at the moment (they are still evolving):
It also matters how a test is written. I have seen some really good tests (i.e. readable, concise and serving a clear purpose) as well as some very bad ones. Good tests can also serve as an unofficial guide to using the interface. Bad tests, on the other hand, confuse people about the purpose of the interface and unnecessarily obstruct further changes.
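To make that contrast concrete, here is a hypothetical example of the kind of test I’d call good - the interface (a tiny rate limiter, invented for illustration) and the test names are all made up, but notice how the test doubles as a short usage guide for the class:

```python
import unittest

# Hypothetical interface under test: a tiny call-count rate limiter.
class RateLimiter:
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def allow(self) -> bool:
        """Return True while under the limit, False afterwards."""
        if self.calls < self.max_calls:
            self.calls += 1
            return True
        return False

class TestRateLimiter(unittest.TestCase):
    # A reader who has never seen RateLimiter learns from this one
    # test how to construct it and what behavior to expect.
    def test_allows_up_to_the_limit_then_rejects(self):
        limiter = RateLimiter(max_calls=2)
        self.assertTrue(limiter.allow())
        self.assertTrue(limiter.allow())
        self.assertFalse(limiter.allow())

if __name__ == "__main__":
    unittest.main()
```

A bad version of the same test might mock out internals, assert on private attributes, or bundle five unrelated scenarios into one method - which is exactly what makes later changes painful.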
Curious that jsonrpc wasn’t evaluated. I would think that it was more popular than RPyC, at any rate.
Thanks. I will check it out - didn’t know about it. Someone mentioned they had used RPyC when I was checking out gRPC.
Thrift supports the container types list, set and map. It also supports constants. These are not supported by Protocol Buffers.
I think you missed something. Repeated fields have been in protos forever, map types are in proto3 (required by gRPC) and enums are built-in constants.
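For reference, here is a minimal proto3 sketch of the features mentioned above (repeated fields, map types and enums); the message and field names are made up for illustration:

```proto
syntax = "proto3";

// Enum values act as named constants.
enum Side {
  SIDE_UNSPECIFIED = 0;
  BUY = 1;
  SELL = 2;
}

message Order {
  Side side = 1;
  repeated double prices = 2;            // list-like container
  map<string, int64> sizes_by_venue = 3; // map container (proto3)
}
```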
Oh, wow. I didn’t know about this. Thanks for pointing this out.
Conclusion: “Due to really high noise in the data, it’s often hard to know if the model is doing anything.”
Haha. Thanks for noticing it.
However, I view this statement a bit differently - I don’t see it as a conclusion. Instead, it’s the belief I start with. Given that the data contains a lot of noise, I want to construct models that can consistently spit out actionable predictions (i.e. predictions that can be used in an automated execution/trading system). In the case of linear models, it’s about preserving the standard deviation of the predicted output. For neural networks and ensemble models, it’s about generalization to out-of-sample examples.
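A quick sketch of why the standard deviation of the predictions matters, using made-up numbers: with a weak linear signal buried in heavy noise, a least-squares fit produces predictions that are far “flatter” than the target, because the model can only capture the small predictable component. All coefficients and noise levels below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: 3 features with tiny true coefficients, heavy noise.
x = rng.standard_normal((n, 3))
true_beta = np.array([0.02, -0.01, 0.015])
y = x @ true_beta + 0.5 * rng.standard_normal(n)  # low signal-to-noise

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
pred = x @ beta_hat

# Predictions have much smaller spread than the target: the model
# only recovers the small predictable component of y.
ratio = pred.std() / y.std()
print(f"std(pred) / std(y) = {ratio:.3f}")
```

If that ratio collapses toward zero, the model’s outputs are too flat to act on, which is one way a model can be “accurate” on paper yet useless for trading.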
Isn’t the assumption that the past does not reflect what the future holds? How can any of these models say anything actionable?
I’m genuinely curious.
Yup, you’re right. Usually, the future distribution of prices/returns is very different from the past’s - this is what makes the whole process of constructing models difficult, but not impossible.
Unfortunately, I didn’t really discuss the actionable part in the post. Typically, a high-frequency trading firm or a trade execution company would have execution logic on top of these models.
Execution logic refers to an algorithm that takes in the current prices, sizes, positions, PNL and model signals (HFT models are generally linear in order to preserve speed). The job of this algorithm is to use that information to decide whether to send a new order (and if so, at what price) or to cancel/modify an existing one. So, simply optimizing the model for MSE or MAE doesn’t necessarily result in better PNL - the mapping from model error to PNL is not obvious. Therefore, the goal is to construct actionable models (models that are helpful in trading) rather than models that simply minimize error. I realize it’s hard to understand this aspect from a single paragraph - but I hope I was able to highlight something.
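To give a rough flavor of what I mean by execution logic, here is a heavily simplified sketch. Everything here - the thresholds, the quoting choice, the risk limit - is made up for illustration; real execution logic would also account for queue position, fees, latency and much more:

```python
from dataclasses import dataclass

@dataclass
class MarketState:
    bid: float
    ask: float
    position: int      # current signed inventory
    max_position: int  # hypothetical risk limit

def decide(state: MarketState, signal: float, threshold: float = 0.5):
    """Map a model signal to an order action.

    signal > threshold  -> expect price to rise, try to buy
    signal < -threshold -> expect price to fall, try to sell
    otherwise           -> cancel resting orders and wait
    """
    if signal > threshold and state.position < state.max_position:
        return ("buy", state.bid)   # join the bid rather than cross the spread
    if signal < -threshold and state.position > -state.max_position:
        return ("sell", state.ask)  # join the ask
    return ("cancel", None)

state = MarketState(bid=99.0, ask=101.0, position=0, max_position=10)
print(decide(state, 0.8))   # strong positive signal -> quote a buy at the bid
```

Note how the same model error could lead to very different PNL depending on where the threshold sits and how orders are priced - which is why optimizing MSE alone isn’t enough.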
Here’s an example that’s kinda similar: YouTube (and others) build recommendation models that try to minimize RMSE on watch time (or click rate). However, a model optimized only on this error wouldn’t necessarily be the best one in their A/B testing.
By the way, there is an interesting parallel between recommendation systems and this problem - both have very low accuracies, both have a lot of noise in the data, both have seasonality and both have to deal with unseen ‘events/items’.