1. 21

  2. 15

    The easiest way to solve the problem is to increase the size of your gp2 volume.

    While this is true, there’s another way that can give you even more IOPS for less (or zero!) additional cost. Many EC2 instances come with a local SSD. For example, the i3.large instance type - which is fairly small, just 2 cores and 16 GB of RAM - includes a 475 GB NVMe SSD. You can easily perform tens of thousands of IOPS on this disk.

    Obviously, since this SSD is local, its contents are lost if your instance is stopped for any reason, such as hardware failure.
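
    As a sketch, putting the instance-store NVMe to use typically looks like the following (the device name `/dev/nvme0n1` and the mount point are assumptions - check `lsblk` on your own instance):

    ```shell
    # Format and mount the local instance-store NVMe as scratch space.
    # Its contents do not survive a stop or hardware failure, so only
    # put rebuildable data (caches, build workspaces) on it.
    sudo mkfs.ext4 /dev/nvme0n1          # device name varies; check lsblk first
    sudo mkdir -p /mnt/scratch
    sudo mount /dev/nvme0n1 /mnt/scratch
    ```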

    1. 3

      Also worth noting there are more options like this since the introduction of the new-generation instance types with the “d” designator, like c5d and m5d, which have local NVMe storage and might be a good balance: general-purpose compute while still having local storage. The i-type hosts are “I/O optimised”, which solves the storage problem but might leave you without much for the actual build tasks.

      1. 2

        Thanks for the idea, noted in the article.

    2. 3

      This is good advice for troubleshooting EC2 performance in general, as disk IOPS issues don’t always jump out the same way CPU, RAM, and network I/O issues might.

      1. 1

        You can explicitly add more CPU and RAM, but only implicitly add IOPS.

        1. 9

          io1 volume types offer provisioned IOPS with a guarantee of “Amazon EBS delivers within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.” io1 volumes do have a price premium over the more commonly-used gp2 volume types, but there is no bucket-and-credit model.

          (disclosure: I work for AWS)

      2. 2

        The solution to this problem is proper staging of fast and slow moving parts of the pipeline. Bake all the slow moving parts into an EBS snapshot and then have a delta update mechanism to bring things up to date. If you’re using docker then the delta ends up being the top most layer. Along with reducing I/O this also speeds up the startup time and makes for faster feedback loops.
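
        As a sketch of that staging idea (the volume ID and image name below are hypothetical placeholders):

        ```shell
        # 1. Bake the slow-moving parts (toolchain, dependency cache) into an
        #    EBS volume once, then snapshot it for reuse across instances:
        aws ec2 create-snapshot \
            --volume-id vol-0123456789abcdef0 \
            --description "CI base: toolchain and dependency cache"

        # 2. At launch, attach a volume restored from that snapshot and apply
        #    only the delta. With Docker, the delta is the topmost image
        #    layer, so a pull only downloads the layers that changed:
        docker pull registry.example.com/ci-image:latest
        ```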

        1. 2

          I think there’s a miscalculation in this paragraph:

          > The standard IOPS, 3 IOPS per GiB of storage, with a minimum of 100 regardless of volume size. If you have a 100GiB EBS volume it will do 100 IOPS; a 500GiB volume will do 1500 IOPS.

          The 100 GiB volume will have 300 IOPS.
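
          For illustration, the gp2 baseline formula (3 IOPS per GiB, with a floor of 100; current volumes also have an upper cap, assumed here to be 16,000) can be sketched in Python:

          ```python
          def gp2_baseline_iops(size_gib: int) -> int:
              """Baseline IOPS for a gp2 EBS volume: 3 IOPS per GiB,
              floored at 100 and capped at 16,000 (the cap was lower
              in earlier years)."""
              return min(max(3 * size_gib, 100), 16_000)

          print(gp2_baseline_iops(100))  # 300, as the correction above notes
          print(gp2_baseline_iops(500))  # 1500
          print(gp2_baseline_iops(20))   # 100 (the floor applies)
          ```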

          1. 2

            Fixed, thank you.

          2. 1

            If this is all transient data that you won’t care about losing if the VM crashes, maybe you could get a little more by turning off all the data-consistency options on the filesystem mount, to let the OS coalesce (or even elide) some writes? :)

            But if that’s the case, then you probably want a local device rather than a networked EBS volume anyway.
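
            As a sketch, an `/etc/fstab` entry for throwaway scratch space might relax those options like this (the device path and mount point are hypothetical; `data=writeback` and a long `commit` interval trade crash consistency for fewer forced writes on ext4):

            ```
            /dev/nvme0n1  /mnt/scratch  ext4  noatime,data=writeback,commit=300  0  0
            ```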

            1. 1

              I have another post queued up about that, in a slightly different context :)