1. 37

  2. 16

    Checklist for using mmap on files:

    • No portability (Linux/BSD only). On the surface it’s portable, but the failures aren’t.
    • Local files on friendly file systems only. Reading the (fs) source is the only way to know the failure modes; the mmap manpage is useless.
    • File size known, pre-allocated and static for the duration of the mmap. Size can be increased, but beware of mremap (see later point).
    • Known and controlled bounds on the number of mmap invocations. mmap is a limited resource like files, see sysctl vm.max_map_count.
    • Lifetime management is of utmost priority. Calls to munmap are carefully tracked. Make sure that all pointers into the unmapped regions are reclaimed/invalidated. This is probably the most difficult part. If your pointers are not wrapped with a handle on the mmap resource, you likely have a use-after-free bug lying around somewhere.
    • Always assume mremap will give you a new address, or fail under MREMAP_FIXED. This is especially relevant to the previous point.
    • Using a signal handler to recover from mmap failures is just pure madness and a daunting task to make re-entrant. Rule of thumb is that the only thing you can do in a non-crashing signal handler is to flip an atomic.
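
    The lifetime point above could be sketched roughly like this. It is a minimal illustration, not a complete design: `struct mapped_file`, `mapped_file_open`, `mapped_file_at` and `mapped_file_close` are hypothetical names, the file is assumed to be a non-empty regular file whose size does not change while mapped, and callers are assumed never to cache the pointers they get back.

    ```c
    #include <assert.h>
    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical handle: the only place the base pointer and length live. */
    struct mapped_file {
        void  *addr;  /* NULL once unmapped */
        size_t len;
    };

    /* Map an existing, non-empty file read-only. Returns 0 on success, -1 on failure. */
    static int mapped_file_open(struct mapped_file *m, const char *path)
    {
        assert(m != NULL && path != NULL);
        m->addr = NULL;
        m->len = 0;

        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size <= 0) {
            close(fd);
            return -1;
        }

        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  /* the mapping keeps its own reference to the file */
        if (p == MAP_FAILED)
            return -1;

        m->addr = p;
        m->len = (size_t)st.st_size;
        return 0;
    }

    /* Bounds-checked access: NULL instead of a pointer outside (or after) the mapping. */
    static const unsigned char *mapped_file_at(const struct mapped_file *m,
                                               size_t off, size_t n)
    {
        if (m->addr == NULL || off > m->len || n > m->len - off)
            return NULL;
        return (const unsigned char *)m->addr + off;
    }

    /* Unmap and invalidate the handle so later lookups fail instead of dangling. */
    static int mapped_file_close(struct mapped_file *m)
    {
        if (m->addr == NULL)
            return 0;
        int rc = munmap(m->addr, m->len);
        m->addr = NULL;
        m->len = 0;
        return rc;
    }
    ```

    Every access goes through `mapped_file_at`, so after `mapped_file_close` a lookup returns NULL rather than a stale address. It does nothing for pointers that were copied out earlier, which is exactly the hard part.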

    If you meet all the previous points, go ahead and use mmap. Otherwise trust the kernel and use pread/pwrite. mmap is amazing, but it’s a double-edged nuclear foot-gun.
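
    For comparison, the pread route might look like the sketch below. `pread_full` is a hypothetical helper name, and treating a short file as an error is an assumption of this example rather than anything pread itself requires.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Read exactly n bytes at offset off, retrying on EINTR and short reads.
     * Returns 0 on success, -1 on I/O error or if the file ends early. */
    static int pread_full(int fd, void *buf, size_t n, off_t off)
    {
        assert(buf != NULL);
        unsigned char *p = buf;

        while (n > 0) {
            ssize_t r = pread(fd, p, n, off);
            if (r < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted before any data: retry */
                return -1;      /* real I/O error, reported as a return value */
            }
            if (r == 0)
                return -1;      /* end of file before n bytes were read */
            p += r;
            n -= (size_t)r;
            off += r;
        }
        return 0;
    }
    ```

    The point of the comparison is that failures arrive as return values with errno set, instead of as a signal in the middle of a memory access.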

    1. 3

      mmap() is POSIX so any Unix system should support it (for instance, Solaris does). I agree, but would also add:

      • Make sure the file DOES NOT CHANGE from outside your program. So don’t go overwriting the file with a script while you have a program that has mmap()ed the file. Bad things happen.
      • It can be used to map memory without a file, using MAP_ANONYMOUS. I would wager this is probably the most common use of mmap().
      • Once you mmap() a file, you can close the file descriptor. At least, this works on Linux, Mac OS X and Solaris in my experience (of mmap()ing a read-only file).
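
      A minimal sketch of the MAP_ANONYMOUS case, assuming a Linux or BSD style system where the flag is available. `anon_alloc` and `anon_free` are hypothetical names for illustration only.

      ```c
      #define _DEFAULT_SOURCE  /* glibc: expose MAP_ANONYMOUS under strict -std= modes */
      #include <assert.h>
      #include <stddef.h>
      #include <sys/mman.h>

      /* Ask the kernel for private, zero-filled memory with no file behind it.
       * Returns NULL on failure instead of MAP_FAILED. */
      static void *anon_alloc(size_t len)
      {
          assert(len > 0);
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          return p == MAP_FAILED ? NULL : p;
      }

      /* Release a region obtained from anon_alloc; len must match the original. */
      static int anon_free(void *p, size_t len)
      {
          return munmap(p, len);
      }
      ```

      The fd argument is -1 and the offset is 0 because there is no file involved, which is also why most of the file-related failure modes do not apply here.
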
      1. 5

        Note that MAP_ANONYMOUS is not in POSIX. Actually the spec is quite lengthy, and if you read it carefully, there are a lot of caveats and wiggle room left for implementations to provide only a minimal version.


        1. 2

          That’s surprising (but it shouldn’t be—I mean, the mem*() functions from Standard C were only marked async safe a few years ago in POSIX). On the plus side, it appears that Linux, BSD and Solaris all support MAP_ANONYMOUS.

        2. 5

          I meant that the ways things go wrong with mmap are not portable. If you’re building a library or a long-running daemon, this is critical. Other points I forgot in my list:

          • mmap/munmap are costly syscalls. A corollary of this and the third point in the original list is that you should not mmap many small files. A few large blocks is the optimal case.
          • You can only sync on page boundaries, so if you write, it needs to be in aligned blocks of 4k, otherwise you’ll trigger some serious write amplification on SSDs.
          • msync is tricky. MS_ASYNC is a no-op on Linux and doesn’t report failures, so you might never know if a write really succeeded. You also have to know about the vm.dirty* sysctls, which directly control async writeback.
          • Huge pages don’t work with file-backed mmap (and wouldn’t make sense, since there’s only a single dirty bit for the whole page!)
          • If you have many virtual memory mappings, the chance of a TLB miss increases and your performance can decrease depending on your access patterns.

          I’m starting to realize that the mmap man page should be written with all said pitfalls :)
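
          The page-boundary and msync points might be sketched as below. This is a rough illustration under assumptions: `page_span` and `flush_range` are hypothetical helpers, `base` is assumed to be the page-aligned address returned by mmap for a MAP_SHARED file mapping, and the page size is queried at run time rather than hard-coded to 4k.

          ```c
          #include <assert.h>
          #include <stddef.h>
          #include <sys/mman.h>
          #include <unistd.h>

          /* Round the byte range [off, off + len) outward to whole pages. */
          static void page_span(size_t off, size_t len, size_t page,
                                size_t *start, size_t *span)
          {
              assert(page != 0 && (page & (page - 1)) == 0);  /* power of two */
              size_t end = (off + len + page - 1) & ~(page - 1);
              *start = off & ~(page - 1);
              *span = end - *start;
          }

          /* Flush the pages covering base[off .. off + len) and wait for the result.
           * MS_SYNC is used so that a writeback error has a chance to be reported. */
          static int flush_range(void *base, size_t off, size_t len)
          {
              size_t page = (size_t)sysconf(_SC_PAGESIZE);
              size_t start, span;

              page_span(off, len, page, &start, &span);
              return msync((char *)base + start, span, MS_SYNC);
          }
          ```

          Even a 200-byte write that straddles a page boundary ends up flushing two full pages here, which is the write amplification mentioned above.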

          1. 2

            At work, we have a near-perfect use case for mmap(): we map a large file shared read-only (several instances of the program share the same data) on the local filesystem (not over NFS) that contains data to be searched through, and it’s updated [1] periodically. That’s the only case I’ve seen mmap() used [3].

            [1] By deleting the underlying file [2], then moving in the new version. Our program will then pick up the change (I don’t think the system supports file notification events so it periodically polls the timestamp of the file, which is fine for our use case), mmap() the updated file, and when that succeeds, munmap() the old one.

            [2] Unix file semantics to the rescue! Since there’s still a reference to the actual data, the file is still around, but the new copy won’t overwrite the old copy.

            [3] Although it’s possible the underlying runtime system uses it for memory allocation.
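
            A swap like the one in [1] could look roughly like this. It is only a sketch under assumptions, not the actual program: `struct dataset` and `dataset_refresh` are hypothetical names, the inode number is compared along with the timestamp, and no other thread is assumed to hold pointers into the old mapping when it is unmapped.

            ```c
            #include <assert.h>
            #include <fcntl.h>
            #include <stddef.h>
            #include <sys/mman.h>
            #include <sys/stat.h>
            #include <sys/types.h>
            #include <time.h>
            #include <unistd.h>

            /* Hypothetical record of the currently mapped version of the data file. */
            struct dataset {
                const void *addr;   /* NULL until the first successful refresh */
                size_t      len;
                ino_t       ino;
                time_t      mtime;
            };

            /* Poll the file; if it was replaced, map the new version first and only
             * then drop the old one. Returns 1 if a new version was mapped, 0 if
             * nothing changed, -1 on error (the old mapping stays in place). */
            static int dataset_refresh(struct dataset *d, const char *path)
            {
                assert(d != NULL && path != NULL);

                int fd = open(path, O_RDONLY);
                if (fd < 0)
                    return -1;

                struct stat st;
                if (fstat(fd, &st) < 0 || st.st_size <= 0) {
                    close(fd);
                    return -1;
                }
                if (d->addr != NULL && st.st_ino == d->ino && st.st_mtime == d->mtime) {
                    close(fd);
                    return 0;
                }

                void *fresh = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
                close(fd);
                if (fresh == MAP_FAILED)
                    return -1;

                if (d->addr != NULL)
                    munmap((void *)d->addr, d->len);  /* last reference to the old data */

                d->addr = fresh;
                d->len = (size_t)st.st_size;
                d->ino = st.st_ino;
                d->mtime = st.st_mtime;
                return 1;
            }
            ```

            Because the old file is deleted rather than overwritten, the old mapping stays valid until the munmap call, as described in [2].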

            1. 1

              When working with IO devices where the bandwidth is equal to or greater than the memory bandwidth (on my desktop, I’m capped at 10-12G/sec on a single core, or 48-50G/sec for all cores), you’re eliding one copy of the data, effectively reaching the upper bound (instead of half).

      2. 6

        Trying to use version control systems on network drives is an activity well known to be a rich source of sorrow. Everybody on the planet who wished to maintain sanity ceased doing that when CVS arrived.

        They are trying to use git. They should sniff that the repo is on a network drive and stop right there with a “Don’t Do That”.

        Given that they are duelling with git garbage collection, I can’t help but feel that they are trying to do something very weird and outside the supported API of git.

        i.e. if you’re fighting the tool… maybe don’t do that.

        All this said, I have used mmap many times, hugely successfully.

        1. 7

          I got to the part about longjmping out of a signal handler before breaking down crying. No matter what you’re doing, that has to be a warning that you’re doing it wrong.

          1. 4

            The post points out later on that this is unsafe, and switches to siglongjmp.

            1. 4

              Can’t tell if you’re being facetious or not :)

              1. 2

                I’m not being facetious

          2. 4

            Just removed “try using mmap to read files” from my project TODO list. Thanks, @notriddle.

            1. 3

              FWIW mmap can be incredibly useful for working with files. It effectively transforms a function-based api into a memory-based one, and C has a ton of native support for manipulating memory. You no longer have to manage a separate buffer (or buffers), and you can just assign to a memory address instead of seeking. I’d call it the “scripting language” of file IO: fast, dirty, and simple.
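
              A small sketch of what “assign to a memory address instead of seeking” can mean in practice. `patch_byte` is a hypothetical helper, the file is assumed to be local with a fixed size, and error handling is kept to the bare minimum.

              ```c
              #include <assert.h>
              #include <fcntl.h>
              #include <stddef.h>
              #include <sys/mman.h>
              #include <sys/stat.h>
              #include <unistd.h>

              /* Overwrite one byte of a file through a shared mapping.
               * Returns 0 on success, -1 on failure or if off is past the end. */
              static int patch_byte(const char *path, size_t off, unsigned char value)
              {
                  assert(path != NULL);

                  int fd = open(path, O_RDWR);
                  if (fd < 0)
                      return -1;

                  struct stat st;
                  if (fstat(fd, &st) < 0 || st.st_size <= 0 || off >= (size_t)st.st_size) {
                      close(fd);
                      return -1;
                  }
                  size_t len = (size_t)st.st_size;

                  unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                          MAP_SHARED, fd, 0);
                  close(fd);
                  if (p == MAP_FAILED)
                      return -1;

                  p[off] = value;                   /* a plain store, no lseek + write */

                  int rc = msync(p, len, MS_SYNC);  /* ask for writeback and wait */
                  if (munmap(p, len) < 0)
                      rc = -1;
                  return rc;
              }
              ```

              The store itself is a single line; everything around it is setup and teardown, which is where the trade-off against a plain pwrite call shows up.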

              1. 2

                I agree, but due to the complexity of error handling in the cases described by the original article, you need to consider whether mmap is a net simplification or a net complication for the program you are writing.

                In my case, I only need read-only access to the files I am processing. I get the same simplicity of coding regardless of whether I mmap the files, or allocate a buffer and read the entire file into the buffer. The extra complexity of mmap error handling isn’t worth it. If I was modifying these files, the situation might be different. Right now my files are small. If I start working with huge files in the future, I might need to reconsider the use of mmap.

                1. 1

                  Yeah, I’ve found the best use cases are with read-only files (where the errors don’t matter) and with files you create that have a fixed size (where you can just start over/fail noisily on error).

            2. 3

              We are in the early stages of using LMDB as a file cache, actually. Works well. It is a well-optimized key-value store with library bindings for many languages: https://symas.com/performance-tradeoffs-in-lmdb/

              I had used it in another project, where it was residing on a remote SAN drive. The write performance was noticeably slower than on a local drive, but there were no corruption issues.

              Myself, I would be reasonably comfortable using mmap for reading / verifying / computing some aggregate values – basically a read-only use case. But if writing to mmap, I would look for a library that does it well (and in a portable way), and the use case needs to have really well understood performance advantages.

              1. 3

                The combination of remote filesystems and mmap is just a minefield. Thumbnail generation in RawTherapee over NFS is very “fun”. For people on Linux it appears as just very long pauses; for me on FreeBSD it appears as full CPU usage on all cores :D