1. 18
    1. 4

      I actually ran into this recently, but mostly use radare2 on the command line.

      You can use https://github.com/opensource-apple/dyld to extract them. Just check out the above repo, edit launch-cache/dsc_extractor.cpp, and remove the #if 0 around the test program portion.

      Then compile it with:

      clang++ launch-cache/dsc_extractor.cpp launch-cache/dsc_iterator.cpp -o dsc_extractor
      

      Then go to: /System/Cryptexes/OS/System/Library/dyld/

      And run dsc_extractor on the file for the arch you want to get the binary for:

      ~/git-projects/dyld/dsc_extractor dyld_shared_cache_arm64e /tmp/arm64/
      
      1. 3

        For those following along, you can also use

        int dyld_shared_cache_extract_dylibs_progress(const char* shared_cache_file_path, const char* extraction_root_path, progress_block progress)
        

        which is exported from the

        /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/usr/lib/dsc_extractor.bundle
        

        library if you don’t want to compile dyld yourself (assuming you have Xcode installed).

        Also, the repo you linked is no longer updated. The new Apple OSS distributions are under the apple-oss-distributions org. You’ll want https://github.com/apple-oss-distributions/dyld for the latest source.


        Minimal example:

        // clang -fblocks -o dsc_extract main.c
        
        #include <dlfcn.h>
        #include <stdio.h>
        #include <stdlib.h>
        
        #define LIB_PATH   "/Applications/Xcode-beta.app/Contents/Developer/Platforms/iPhoneOS.platform/usr/lib/dsc_extractor.bundle"
        #define EXTRACT_FN "dyld_shared_cache_extract_dylibs_progress"
        
        typedef void (^dsc_progress_block_t)(unsigned current, unsigned total);
        typedef int (*dsc_extract_fn_t)(const char *dsc_path, const char *out_path, dsc_progress_block_t callback);
        
        int main(int argc, char const **argv)
        {
            if (argc < 2) {
                fprintf(stderr, "Usage: %s <dsc>\n", argv[0]);
                return EXIT_FAILURE;
            }
        
            void *lib = dlopen(LIB_PATH, RTLD_NOW);
            if (!lib) {
                fprintf(stderr, "Error: Failed to open 'dsc_extractor' library!\n");
                return EXIT_FAILURE;
            }
        
            dsc_extract_fn_t extract = (dsc_extract_fn_t)dlsym(lib, EXTRACT_FN);
            if (!extract) {
                fprintf(stderr, "Error: Failed to get extractor function!\n");
                return EXIT_FAILURE;
            }
        
            extract(argv[1], ".", ^(unsigned current, unsigned total) {
                // Comment out to reduce output noise...
                fprintf(stderr, "Extracting... (%u/%u)\n", current + 1, total);
            });
        }
        
      2. 1

        Reading through others’ thoughts on this, it seems to have some kind of security/performance implications, but I am always very suspicious of when those two things get waved around randomly.

        It means there’s a single blob to check the validity of, you don’t need to check for subcomponents being changed, you don’t need to verify those subcomponents when they are changed, and you don’t have to have two copies of the system library (the original and the cache).

        This knee jerk “ok I don’t have complete causal knowledge of the universe therefore I am suspicious” is so tiresome, when it takes literally no effort to understand what the security and performance costs of duplicate copies of the system libraries are.

        I’m not even sure what the possible threat is that this muppet is “suspicious” of.

        1. 9

          As you might imagine I also objected to that throwaway comment, but you got to it first ;-)

          To be explicit about this, it is utterly trivial to see how it has numerous performance benefits as well; there is no need to be skeptical. On my system an execution of /bin/ls pulls in 46 mach-o images, 45 of which are in the shared cache. Each one of those images has multiple segments:

          • an executable segment with code
          • a data segment with globals as well as pointers necessary for runtime functionality
          • a read only section

          (we actually have more, but those are for somewhat specialized optimizations and I am going to ignore them for now; adding them in would just amplify the impact).

          Anyway, because the shared cache is pre-mapped into the shared region, we do not have to search via filesystem APIs for the dylibs (at a minimum a stat), open the files, mmap the segments, and close the files. Adding that up, we can trivially see that saves us:

          • 45 stat syscalls
          • 45 open syscalls
          • 135 mmap syscalls
          • 45 close syscalls
          • 45 codesignature registrations

          Not to mention all the time we would spend parsing the mach-o files to figure out that we need to do all of those operations in the first place.

          At this point people generally comment that we would not have to do all of those if the applications were statically linked, or are aghast that /bin/ls links (indirectly) to 45 libraries… but that is a Linux-centric mindset, where the only stable cross-distribution interfaces are syscalls, dynamic libraries are relatively slow, and (especially in systems running containerized workloads) relatively few processes use any particular set of libraries. The reason we have so many dylibs is explicitly because dynamic linking on Darwin (especially in the shared cache) is very fast, and because they are stable and everything uses them we get a ton of page sharing. It gives most of the perf of static linking with all of the memory and development benefits of dynamic linking.

          That also results in us doing things that don’t make sense to people coming from Linux. For example, as I alluded to above, we actually have more than 3 segments per dylib, which would normally increase the number of mmaps and waste more memory due to rounding each segment up to a page. That is because there are other optimizations we can apply through careful layout and grouping of image contents that require more segments with distinct memory permissions. When having additional segments carries memory and runtime overhead from loading individual libraries off disk, they may not be worth it; but because we have removed those obstacles, we have been able to pursue those sorts of layout optimizations.

          That does not even get into all of the complex optimizations we do thanks to having a holistic view of all the dylibs in the system. And lest you think those are exclusively for system libraries, we also support similar technologies for applications through Mergeable Libraries and page-in linking.

          1. 2

            Thanks for the thorough explanation (and for not calling me a muppet), I definitely learned a bunch from this comment!

            I added a footnote in the blog linking to this so others can as well.
