1. 26

For most of my apps we use both Redis and Memcached. I'm thinking about completely removing Memcached from the stack, since everything Memcached does works just as well with Redis. What do you think? Is it time to remove Memcached from the stack and use only Redis?


  1. 21

    I’ve dealt with both at relatively high scale (low millions of queries per second, persistent and non-persistent workloads, interesting outages related to pushing to high scale, heavy sharding) and may be able to help you zero in on an answer. Before anyone can give you a useful answer, though, please answer these questions:

    1. Do you trust the people writing code to consider the computational complexity of their queries? Redis can be really painful in multitenant situations due to its single-threaded nature: a single command may occupy the thread long enough to cause other requests to time out. Grabbing huge values, running KEYS over too many keys, etc. will cause other clients to time out in some situations (see the SCAN sketch after this list). The simplicity of memcached tends to steer developers away from unintuitively expensive usage, and because memcached is multithreaded, it will handle the rare expensive MGET much more gracefully than Redis, where you should expect blasts of timeouts that you have to track down with tools like redis-faina.
    2. Are you storing non-recomputable data in these systems, or do they act purely as caches? Redis is a poor choice for a persistent store, and something like MySQL will serve most workloads much better: it has far better tooling around it, serves multiple requests concurrently, allows larger-than-memory datasets, and so on.
    3. Are you storing more than 4 billion keys in a single instance? memcached will have trouble due to its 32-bit hash function and may start spitting back incorrect values.
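
    To make the KEYS point concrete: KEYS walks the entire keyspace on Redis's single thread before replying, while SCAN does the same work in small slices that other clients' commands can interleave with. A minimal sketch, assuming the redis-py client and a local instance:

    ```python
    import redis

    r = redis.Redis()  # assumes a local instance on the default port

    # Bad in multitenant production: blocks the single Redis thread until
    # every key has been examined, starving every other client meanwhile.
    # all_sessions = r.keys("session:*")

    # Better: iterate in bounded chunks so other commands can interleave.
    for key in r.scan_iter(match="session:*", count=1000):
        print(key)
    ```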

    Generally I think of Redis as a nice prototyping tool, but I haven’t been very impressed with it at high scale, and it has a lot of operational edges that make it more human-time expensive than something like memcached for caching or MySQL for persistence.

    1. 8

      Concur. I found that most of my semi-permanent, semi-ephemeral data could be recategorized as either permanent or ephemeral.

      1. 8

        I’ve used Redis at high scale (x00,000 operations per second per instance, across several instances, for years) and it’s worked like a charm. Redis by itself never crashes unless you exceed some operational parameter, or you have people who don’t understand Redis writing code (e.g. using KEYS at all, for any reason). I’ve also never lost data from a Redis instance in a way that wasn’t directly attributable to user error or Amazon.

        That said, Redis is actually a database construction kit rather than a database, so if you show up thinking you’re going to use it without understanding it, the way you sort of can with MySQL, then you’re screwed. It also has sharp operational edges (e.g. you should leave about twice, and certainly no less than once again, the size of your maximum Redis memory load as headroom, because Redis forks in order to save state, and rapid writes during the save may dirty all the copy-on-write pages and force at least one full copy of your entire state; a background save triggered by the persistence layer can do the same). And it has zero HA story (‘Redis Cluster’ is a bandaid, not a solution). If you want ‘high’ availability, have two or more slave readers subscribed to the master, and if the master ever goes away, manually promote a slave to master, reconfigure the topology, and continue.
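
        A minimal sketch of that manual promotion, assuming the redis-py client and hypothetical hosts replica1/replica2 (the master has just died):

        ```python
        import redis

        new_master = redis.Redis(host="replica1", port=6379)
        other_replica = redis.Redis(host="replica2", port=6379)

        # Promote one replica: with no arguments, redis-py sends SLAVEOF NO ONE.
        new_master.slaveof()

        # Point the remaining replicas at the new master.
        other_replica.slaveof("replica1", 6379)

        # Finally, reconfigure clients/proxies to write to replica1. Continue.
        ```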

        Redis is probably not my first choice for a persistent store, but it’s not that bad. It has a decent enough filesystem log story, actually not too dissimilar from the way, e.g., a MySQL instance handles it. It can stream writes to slaves, and it can persist to disk in the background without blocking the main thread. Pragmatically, all it really lacks is a real clustering story, but most of the time you can shard your database somehow, and Redis is so fast that it offers a lot of headroom until you get to that point, unless your data is very large. Now that you can get machines with 2 terabytes of memory, “very large” is pretty darn large.

        1. 3

          > Redis is actually a database construction kit rather than a database

          I really like this description! In the infrastructure I got involved with, Redis had been thrown up as a way to easily persist state at high throughput without requiring users to go through the time-intensive capacity-planning conversations that had to happen in that organization before using MySQL. As you might predict, the org learned many lessons about the importance of capacity planning! Well after non-recomputable data was being stored in it through an in-house sharding proxy (with auto-promotion built in for handling [read: creating] failures), we realized that it was impossible to safely fail over a host and begin re-slaving 10 instances off a single rotational disk.

          Lessons:

          1. be careful when running multiple instances with persistence on a single rotational disk. When you add a slave, the master forks and dumps the entire dataset into a file, then pushes that file to the slave, followed by the diff accumulated since the fork. When 10 instances do this at once while serving traffic, you thrash the disk while read-only pages start faulting like crazy and being copied (as you described). This caused the boxes to run out of memory, and ssh didn’t even work, because logging in caused a disk access which was deeply, deeply buried in a starved queue. Solutions:
          2. automation that is careful to avoid downing the box by over-slaving (see the sketch after this list)
          3. SSDs
          4. practice failover while capacity planning
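
          What that automation might look like, as a minimal sketch: re-slave one instance at a time and wait for each full sync to finish before starting the next, so only one RDB dump hits the disk at a time (assumes redis-py; hosts and ports are hypothetical):

          ```python
          import time

          import redis

          NEW_MASTER = ("db1.internal", 6379)   # hypothetical replacement master
          REPLICA_PORTS = range(6380, 6390)     # the 10 instances on this box

          for port in REPLICA_PORTS:
              replica = redis.Redis(host="db2.internal", port=port)
              replica.slaveof(*NEW_MASTER)
              # Block until this instance reports a live replication link
              # before re-slaving the next one.
              while replica.info("replication").get("master_link_status") != "up":
                  time.sleep(1)
          ```
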
          1. 1

            Yeah. If you underprovision most database servers, they will grind to a halt in a moderately safe manner. If you underprovision Redis, you can get into unexpected trouble real fast.

            That said: properly provisioned Redis, with enough memory to fit the use case and a fast SSD, is insanely fast, has a great collection of features, is about as safe log-wise as any other database, and scales surprisingly far.

      2. 13

        I’m somewhat of an expert in this area, having developed both the main memcached client for Ruby (Dalli) and a major user of Redis (Sidekiq). Short answer: I love them both.

        memcached is fantastic because it requires almost no configuration: set the amount of memory to use and you’re done. It’s threaded, so it will use as many cores as necessary and scale to the moon.
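
        That near-zero configuration in practice: start the daemon with nothing but a memory cap and talk to it. A minimal sketch, assuming the pymemcache client:

        ```python
        # Start the daemon with just a memory limit:  memcached -d -m 1024
        from pymemcache.client.base import Client

        cache = Client(("localhost", 11211))
        cache.set("greeting", "hello", expire=60)  # entries expire; that's the contract
        print(cache.get("greeting"))               # b'hello'
        ```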

        Redis requires more configuration, to tune persistence properly for your needs. If you are using Redis for both background jobs and caching, you ideally should run two different Redis instances with different persistence configurations. It’s also single-threaded, but this shouldn’t be a problem for anything but the largest of scales; typical business apps will be fine.
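
        What those two differently configured instances might look like, sketched with redis-py (the ports and values are assumptions, not recommendations):

        ```python
        import redis

        # Cache instance: no persistence, evict old keys under memory pressure.
        cache = redis.Redis(port=6379)
        cache.config_set("maxmemory", "2gb")
        cache.config_set("maxmemory-policy", "allkeys-lru")
        cache.config_set("save", "")            # no RDB snapshots
        cache.config_set("appendonly", "no")    # no AOF

        # Jobs instance: durable enough that queued work survives a restart.
        jobs = redis.Redis(port=6380)
        jobs.config_set("appendonly", "yes")
        jobs.config_set("appendfsync", "everysec")
        ```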


        1. 6

          I think it’s worth drilling in on a few points you made and expanding a bit:

          Unlike memcached, Redis is persistent, but as you said, that does require configuration. Given its persistence options, Redis can lend itself well to becoming a critical piece of infrastructure storing more than just cache data. In practice, relying on Redis works out 99.9% of the time (or more), but as soon as it’s not just a cache anymore you need to think about HA and disaster recovery. That’s where having two instances with two different configurations makes a lot of sense – you don’t need hard durability for things you’re caching.

          Memcached is sharded out of the box: add another machine and you’re good to go. After all, it’s just a cache, right? Redis requires a little more effort and care to shard. twemproxy is a great way to do that if you need to, and it works for memcached too (though the benefits are less obvious unless you’re running a huge cache farm).
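
          To make “more effort and care” concrete, here is a toy client-side consistent-hash ring of the kind twemproxy implements for you out of process, assuming redis-py (hostnames are hypothetical):

          ```python
          import bisect
          import hashlib

          import redis

          class ShardedRedis:
              """Toy consistent-hash ring; twemproxy does this (and more) for real."""

              def __init__(self, nodes, vnodes=64):
                  self._clients = [redis.Redis(host=h, port=p) for h, p in nodes]
                  # Each node gets `vnodes` points on the ring for smoother balance.
                  self._ring = sorted(
                      (self._hash(f"{h}:{p}:{v}"), i)
                      for i, (h, p) in enumerate(nodes)
                      for v in range(vnodes)
                  )

              @staticmethod
              def _hash(key):
                  return int(hashlib.md5(key.encode()).hexdigest(), 16)

              def _client_for(self, key):
                  # The first ring point clockwise from the key's hash owns the key.
                  idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
                  return self._clients[self._ring[idx][1]]

              def get(self, key):
                  return self._client_for(key).get(key)

              def set(self, key, value, **kwargs):
                  return self._client_for(key).set(key, value, **kwargs)

          shards = ShardedRedis([("redis-a", 6379), ("redis-b", 6379)])
          ```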

          Reading the above, you might think I’d vote in favor of keeping memcached. Honestly, it depends on how big your deployment is and the kind of work you’re bringing to Redis. If your cache usage is fairly light (100–1,000 requests/s), Redis might well be a good way to consolidate infrastructure. Of all the workloads you could heap onto Redis, caching is among the best, since it does not require durability (though it is helped considerably by it; cold caches hurt).

          1. 4

            Using Redis for most things besides caches and truly ephemeral data is a pretty bad idea. While it can be made persistent, it has no real HA story (Sentinel is not ready for production, writes are not guaranteed to make it to each slave before a failover, and in testing I found it really easy to confuse Sentinel and get it into a state where it could not elect a new master).

            1. 3

              Hello, just a few comments about Sentinel and HA in Redis, in order to present a point of view different from yours.

              1. Sentinel, starting with Redis 3.0, is considered production ready.
              2. The fact that it has a weak (and well-documented) consistency model (best-effort consistency with asynchronous replication, with different failure modes that can lead to data loss, but with attempts to avoid losing data when possible) does not mean you can’t use it; it means you need to apply it to use cases where this consistency model makes sense for the application.
              3. The new Redis Sentinel (v2) acts on a very small set of fixed rules. For example, every failover generates a guaranteed-unique configuration number, and newer configurations eventually win over old ones when partitions heal; it’s all documented. If there are behaviors that don’t conform to what Sentinel is supposed to do, please report them; we test Sentinel and can’t find issues like the ones you describe (however, make sure to test the latest 3.0 for maximum stability – the latest 3.2 can also be an option, but it has a lot of new code).

              Also note that newer versions of Redis tend to be much more verbose and are able to report why a promotion is not possible, and you can configure Sentinel with different tradeoffs. For example, slaves that do not appear to have an up-to-date state are normally not considered candidates to replace the old master, so with random reboots it is very easy to get into a state where no promotion is possible, because none of the slaves can provide evidence that they have a reasonably fresh data set.
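
              For completeness, this is roughly how a client consumes Sentinel for failover, sketched with redis-py (the hostnames and the master name “mymaster” are assumptions):

              ```python
              from redis.sentinel import Sentinel

              sentinel = Sentinel(
                  [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)],
                  socket_timeout=0.5,
              )
              master = sentinel.master_for("mymaster")   # re-resolved after failover
              replica = sentinel.slave_for("mymaster")   # reads can go to slaves

              master.set("key", "value")
              print(replica.get("key"))
              ```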

              Disclaimer: I’m the primary author of Redis and Redis Sentinel.

            2. 1

              Thank you for sharing! So basically at high volume you wouldn’t recommend replacing Memcached with Redis for the simple key/value cache use case?

              1. 2

                It totally depends on your use case. Just don’t use either as your primary data store. Redis is a better choice for pure KV in certain situations, but it has more operational complexity. That operational complexity probably isn’t worth it, but if you need to maximize throughput at the cost of multitenancy and increased automation and debugging time, then you can pin Redis instances to specific CPU cores and pin network-related IRQ handling to another. Disable all disk activity, and make your users pick the maxmemory-policy for their clusters, so that it becomes psychologically real for them that their data will be deleted over time, or that their cluster will stop accepting new data and their write path will block.
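
                A sketch of that last knob, assuming redis-py (the values are illustrative):

                ```python
                import redis

                r = redis.Redis()
                r.config_set("maxmemory", "4gb")

                # Cache semantics: silently evict old data to make room for new.
                r.config_set("maxmemory-policy", "allkeys-lru")

                # Alternative: refuse writes at the limit instead of evicting, so
                # producers see errors rather than silent data loss.
                # r.config_set("maxmemory-policy", "noeviction")
                ```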