I like to take notes on these kinds of posts since they’re very informative. Here are the notes I found to be relevant to my interests:
Key Use Cases
- “node spin-up time on GCP was so fast that we found race conditions in our cluster management software”
- the distributed load-balancer intake methodologies GCP employs also mean clients often hop on the GCP network very close to their point of presence
- “GCP is a natural choice for things which scale up and down regularly relating to real-time data.”
- AWS is very flexible with expected throughput + the ability to burst capacity for short times
- This means you can have highly specialized disk settings for your exact application needs
- Our experience with these two particular disk types (gp2, General Purpose SSD; st1, Throughput Optimized HDD) is that the performance expectations can be inconsistent and can suffer from noisy neighbors and multi-VM brownouts or blackouts.
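For reference, gp2's advertised baseline is a simple function of volume size. A minimal sketch of the documented formula (3 IOPS per GiB, floored at 100 and capped at 16,000, with burst to 3,000 IOPS for smaller volumes); the noisy-neighbor point above is that real volumes can fall short of these numbers:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS per GiB,
    floored at 100 and capped at 16,000 (per AWS EBS docs)."""
    return min(max(100, 3 * size_gib), 16_000)

def gp2_can_burst(size_gib: int) -> bool:
    """Volumes whose baseline is below 3,000 IOPS can burst
    to 3,000 IOPS by spending an I/O credit bucket."""
    return gp2_baseline_iops(size_gib) < 3_000

# A 500 GiB volume has a 1,500 IOPS baseline but can burst to 3,000:
print(gp2_baseline_iops(500), gp2_can_burst(500))  # → 1500 True
```

The burst behavior is part of what makes benchmarking gp2 tricky: a short load test may see burst performance that a sustained workload never will.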
- Options are more limited in GCP, but network-attached disk performs EXACTLY as advertised
- For AWS, networking expectations are one of the hardest things to figure out.
- … 10 Gbps or 20 Gbps instances, … only sees those throughputs if … using placement groups, which can be subject to freeze-outs where you cannot get capacity.
- For GCP, … achievable network capacity is based on the quantity of CPUs
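A rough sketch of that scaling rule, assuming the commonly cited figure of about 2 Gbps of egress per vCPU up to a per-VM ceiling (the 16 Gbps cap used here is an assumption; the actual ceiling varies by machine type and has changed over time):

```python
def gcp_egress_cap_gbps(vcpus: int, per_vm_cap_gbps: int = 16) -> int:
    """Approximate per-VM network egress cap on GCP:
    roughly 2 Gbps per vCPU, up to a per-VM ceiling.
    The default ceiling (16 Gbps) is an assumption for
    illustration; check current docs for your machine type."""
    return min(2 * vcpus, per_vm_cap_gbps)

# 4 vCPUs → ~8 Gbps; 16 vCPUs hits the assumed per-VM cap:
print(gcp_egress_cap_gbps(4), gcp_egress_cap_gbps(16))  # → 8 16
```

The practical upshot is that on GCP you may size up vCPUs purely to buy network bandwidth, whereas on AWS you pick instance families and placement groups.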
- Significantly more CPU skew in GCP vs AWS; AWS was more consistent from VM to VM
- VMs that run at nearly 100% CPU make capacity planning more challenging
- “on AWS you have the option of getting dedicated machines which you can use to guarantee no two machines of yours run on the same underlying motherboard”
- In GCP there is no comparable offering. What this means is that for most use cases, you have to assume that, in any particular zone, all of your instances are running on the same underlying machine.
- GCP has been a lot more forthcoming about the issues its services are experiencing
- AWS often does not report issues, or does not acknowledge them at all
- A unique feature of GCP is the ability to migrate your VMs to new hardware transparently. This live migration is something we were very hesitant about at first, but in practice, when migrating a chunk of Kafka brokers in the middle of a broker replacement, none of our metrics could even detect that a migration had occurred!