Understanding Dyld: The Dynamic Loader Lifecycle Explained

Chapter 1: The Journey Begins

This article marks the final entry in our series dedicated to debugging Dyld-1122 and examining its source code. We will delve into how Dyld loads dependent dynamic libraries, binds them together, invokes the main() function, and ultimately terminates.

It's important to mention that this analysis might contain inaccuracies, as I am still in the learning phase and have worked on this alone without external verification. Please feel free to provide feedback in the comments or reach out via social media if you notice any errors. Let’s get started!

Working Map

To kick off, we’ll decompile Dyld using Hopper with the command:

hopper -e '/usr/lib/dyld'

We will start by examining the Memory Manager. In a previous article, I introduced some pseudo-code, which is referenced below:

This led us to create an allocator that serves as a memory pool for establishing the Global state of the process, comprising two categories of states: fixed (ProcessConfig) and dynamic (RuntimeState). The ProcessConfig can be accessed via RuntimeState as a property.

The RuntimeState class's state object, developed in the previous episode, acts as an API for querying data related to the process (including threads or loaded Mach-Os). In the repository, this is even referred to as APIs, which inherit from RuntimeState.

In our last discussion, we explored the ExternallyViewableState that retains details about the loaded images. Initially, it only holds information about Dyld and the executable image. Now, we will execute the prepare function to load the remaining dependent images (dylibs):

Dyld GitHub repository:
Start: prepare in dyld-1122.1, dyldMain.cpp#1252
End: exit in dyld-1122.1, dyldMain.cpp#1272

LLDB Breakpoints:

# Start - dyld`start+1828
settings set target.env-vars DYLD_IN_CACHE=0
br set -n start -s dyld -R 1828
# Just before calling main - dyld`start+2356
br set -n start -s dyld -R 2356
# Just before calling exit - dyld`start+2432
br set -n start -s dyld -R 2432

This concludes our current article; however, I might add more insights about aspects I overlooked in this series later on.

Starting Preparation

Before we execute the prepare function, we initialize some register values: The x1 register holds a pointer to the start of the Dyld image (our second argument — MachOAnalyzer), while the first argument APIs is stored in the x0 register. We can verify this by examining the instructions.

Upon stepping into the prepare function, we notice it contains considerably more code (4080 instructions 😰) than the dyld::start function:

To summarize, here’s a line from the code repository we are examining: dyldMain.cpp

The source code repository features relevant code between lines 482–944. Based on the comments, this represents the concluding section of our Dyld review. We can skip the code between lines 484–516, as it is compiled solely in the context of EnclaveKit initialization. We should begin our analysis at line 517: dyldMain.cpp.

We can validate our assumption in the debugger, where the first instructions we encounter are kdebug_trace_dyld_enabled:

Here, we won’t delve deeply into the kdebug system; I plan to revisit this in a future series focused on XNU debugging. Generally, it remains disabled, so we will jump to line 524, where the simulator check is executed: dyldMain.cpp.

In LLDB, after returning from kdebug_trace_dyld_enabled with a value of 0 in the x0 register, we encounter a CBZ instruction and jump to +132.

Subsequently, another jump takes us to line 204, where the isSimulatorPlatform function checks if our executable is running within any of the simulator platforms listed below: MachOFile.cpp.

Since we are not running in a simulator context, we’ll bypass most of the code and jump to line 563. This jump also verifies if the executed program is configured for the simulator in line 538 and whether logging of environment variables is enabled in line 554: dyldMain.cpp.

If we were running in a simulator context, this would confirm the program is appropriately set up to operate on a simulator platform with the correct DYLD_ROOT_PATH.

The state.initializeClosureMode() function is called in RuntimeState to manage PrebuiltLoaders from the dyld cache. The logic of this function is thoroughly elucidated in PrebuiltLoaderSet_Policy.md.

PrebuiltLoaders offer optimized representations of dynamic libraries used by Dyld to enhance application launch times. If the application is initiated for the first time, JustInTimeLoader will also come into play.

For further information, the aforementioned document about PrebuiltLoaderSet Policy offers an in-depth explanation of its workings alongside dyld3 and dyld2 versions: dyld/doc/PrebuiltLoaderSet_Policy.md.

However, with the current version of dyld4, we now primarily rely on two options: JustInTimeLoaders and PrebuiltLoaders: dyld/doc/PrebuiltLoaderSet_Policy.md.

The PrebuiltLoaderSet for dyld4 functions similarly to Dyld Closure in dyld3. The policy point of dyld4 summarizes scenarios where Dyld Closures are not employed and clarifies how DYLD_USE_CLOSURES operates in this version: dyld/doc/PrebuiltLoaderSet_Policy.md.

Additionally, there is a constraint concerning PrebuiltLoaderSet: dyld/doc/dyld4.md. Regarding DYLD_USE_CLOSURES, there is a remark in the code: DyldRuntimeState.cpp.

A tool named dyld_closure_util exists for creating Dyld Closures, with its source code available in the repository. However, compiling it in a non-internal Apple environment is quite complex, and I eventually abandoned the attempt: dyld_closure_util.cpp.

The initializeClosureMode function is invoked from the state object since RuntimeState includes a Loader object that monitors each loaded Mach-O: DyldRuntimeState.h.

The PrebuiltLoader and JustInTimeLoader classes are subclasses of Loader: dyld/doc/dyld4.md. Their code resides in the repository between lines 90–355: Loader.h.

For additional information about Loaders, consult another section in the documentation: dyld/doc/dyld4.md.

Further details about PrebuiltLoader can be found in dyld/doc/dyld4.md, and about JustInTimeLoader in dyld/doc/dyld4.md.

The core functionality resides within the code spanning lines 2670–2842: DyldRuntimeState.cpp. It begins with the initialization of various variables and continues with validating the header of the PrebuiltLoaderSet from the Dyld cache, specifically at line 2677: DyldRuntimeState.cpp.

The source code for the validHeader logic is illustrated below. In our scenario, it returns a true value: PrebuiltLoader.cpp.

The hasValidMagic function checks if PrebuiltLoaderSet->magic matches kMagic: PrebuiltLoader.cpp. We can locate kMagic in the source code repository or by reviewing the decompiled code while debugging in LLDB (0x9a66106073703464).
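To illustrate the idea, here is a minimal C++ sketch of such a magic check. The struct and function names (LoaderSetHeaderSketch, hasValidMagicSketch) are hypothetical and only the magic field is modeled; the constant is the value seen in LLDB above. It is not the code from PrebuiltLoader.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for the header of a PrebuiltLoaderSet.
// Only the magic field matters for this sketch.
struct LoaderSetHeaderSketch {
    uint64_t magic;
};

// Value observed in the decompiled code while debugging.
constexpr uint64_t kMagic = 0x9a66106073703464ULL;

// Mirrors the idea of hasValidMagic: a null pointer or a wrong magic is rejected.
bool hasValidMagicSketch(const LoaderSetHeaderSketch* set) {
    return set != nullptr && set->magic == kMagic;
}
```

A header whose first field holds any other value is treated as invalid, which is why a stale or corrupted set would never be used.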

Following the magic validation, we execute dontUsePrebuiltForApp: DyldRuntimeState.cpp. This function evaluates whether prebuilt loaders should be disabled based on Dyld Environment Variables and executable load commands: DyldProcessConfig.cpp.

After this verification, we enter another else if block where we search the cache for PrebuiltLoader for the program using findLaunchLoaderSet: DyldSharedCache.cpp.

If cachePBLS is not located and the main executable path begins with /System/, it tries to find a PrebuiltLoaderSet using the cd-hash: DyldRuntimeState.cpp.

Since we are not executing the program from the /System/ directory, we bypass the code between lines 2707–2716 and move ahead to line 2717: DyldRuntimeState.cpp.

The hasLaunchLoaderSetWithCDHash function serves as a simple wrapper that calls findLaunchLoaderSetWithCDHash and checks for a non-null pointer: DyldSharedCache.cpp.

The findLaunchLoaderSetWithCDHash function constructs a path using the supplied cdHashString, ensuring it is neither null nor excessively lengthy to prevent buffer overflow, and then attempts to locate a prebuilt loader set corresponding to this path using findLaunchLoaderSet: DyldSharedCache.cpp.

Example path after executing DyldSharedCache::hasLaunchLoaderSetWithCDHash: /cdhash/3302ae16a5eda1cf7daab75ce63b94274674ec8b
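The path construction described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from DyldSharedCache.cpp: the function name buildCDHashPathSketch and the 128-byte buffer size are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Builds "/cdhash/<hex>" into a fixed-size buffer.
// Returns false when the input is null or would not fit (the overflow guard).
// The 128-byte buffer size is an assumption for this sketch.
bool buildCDHashPathSketch(const char* cdHashString, char (&out)[128]) {
    static const char prefix[] = "/cdhash/";
    if (cdHashString == nullptr)
        return false;
    const size_t prefixLen = sizeof(prefix) - 1;
    const size_t hashLen = std::strlen(cdHashString);
    if (prefixLen + hashLen + 1 > sizeof(out))
        return false; // too long: refuse instead of overflowing the buffer
    std::memcpy(out, prefix, prefixLen);
    std::memcpy(out + prefixLen, cdHashString, hashLen + 1); // copies the trailing NUL too
    return true;
}
```

With the cd-hash from the example above, the buffer ends up holding /cdhash/3302ae16a5eda1cf7daab75ce63b94274674ec8b, which is then used as the lookup key.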

If a PrebuiltLoaderSet is discovered, isOsProgram is set to True, and we execute allowOsProgramsToSaveUpdatedClosures. Otherwise, we handle it as a third-party app and execute allowNonOsProgramsToSaveUpdatedClosures: DyldRuntimeState.cpp.

The allowOsProgramsToSaveUpdatedClosures block restricts local closure files from overriding closures in the dyld cache: DyldRuntimeState.cpp.

The allowNonOsProgramsToSaveUpdatedClosures function prevents third-party apps from saving closures based on several criteria: DyldRuntimeState.cpp.

Saving is prohibited on macOS, for iPad apps running on Apple Silicon macOS, and when the executable lacks a CDHash (unsigned). However, saving is permitted on iOS, tvOS, and watchOS platforms.

In our instance, a closure will not be saved since it is a third-party app on macOS.

Next, we enter a code block pertaining to DYLD_USE_CLOSURES logic: DyldRuntimeState.cpp.

Subsequently, there’s code related to loading closures from disk, but on macOS, this is only applicable to system applications. I will refrain from analyzing it here.

To summarize, the initializeClosureMode function ensures that dyld uses prebuilt closures for dynamic libraries when they are available and valid, which optimizes application startup, and otherwise falls back to just-in-time loading, which builds the loaders during the current launch. In the case of third-party apps on macOS, this code guarantees that the closure will not be saved to disk.

Just-in-Time Loading

After returning from initializeClosureMode, lines 564–568 process a set of prebuilt loaders if they were initialized and retrieve the primary loader (at index 0). We then pre-allocate memory for all images.

For us, there is no mainSet. This code will not execute for third-party apps on macOS: dyldMain.cpp.

Because there is no mainSet, there is no mainLoader either, so the mainLoader == nullptr condition is true and the following block is executed: dyldMain.cpp.

The reserve function originates from the Linker Standard Library. The argument for reserve specifies the number of elements rather than bytes, preparing space for 512 elements of state.loaded type: Vector.h.

The lsl::bit_ceil(newCapacity) function finds the smallest power of two that is greater than or equal to the specified newCapacity: BitUtils.h.

The state.loaded container comprises pointers to Loader objects, each 8 bytes wide. Therefore, this allocates 512*8 == 4096 bytes using reserveExact: Vector.h.
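The arithmetic can be checked with a small sketch. The helpers below (bitCeilSketch, bytesForLoaderPointers) are my own stand-ins written for this example, not the lsl implementations from BitUtils.h or Vector.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Smallest power of two that is >= value (value == 0 yields 1 in this sketch).
// Assumes value <= 2^63 so the shift cannot overflow.
uint64_t bitCeilSketch(uint64_t value) {
    uint64_t result = 1;
    while (result < value)
        result <<= 1;
    return result;
}

// Bytes needed to hold `count` pointers to Loader objects.
size_t bytesForLoaderPointers(size_t count) {
    return count * sizeof(void*);
}
```

Since 512 is already a power of two, rounding leaves it unchanged, and on a 64-bit system 512 pointers occupy 512 * 8 = 4096 bytes.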

After this allocation, we create Diagnostics buildDiag (line 573): dyldMain.cpp. This operation appears to zero out the memory we just allocated at x0+0x270.

Following these preparations, we create a JIT Loader. The function computes the slice offset, checks if the binary file exists, creates a loader instance based on the parameters provided, and returns a pointer to it.

A slice is a single architecture Mach-O from a Fat binary mapped to memory within the Loader::getOnDiskBinarySliceOffset function: JustInTimeLoader.cpp.

The core functionality is encapsulated in JustInTimeLoader::make, which is too lengthy to insert here. Below are some essential points regarding what the function accomplishes:
- Calculates the size necessary for the JustInTimeLoader object
- Allocates memory using state.persistentAllocator.malloc
- Constructs a new JustInTimeLoader object using placement new
- Adds the created loader to the runtime state
- Returns the pointer to the newly created JustInTimeLoader object
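The steps above can be sketched in a few lines of C++. Everything here is hypothetical: LoaderSketch and StateSketch are toy types, plain malloc stands in for state.persistentAllocator.malloc, and storing the path right behind the object is an assumption made to show why the size has to be computed first.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical loader whose path string is stored directly behind the object.
struct LoaderSketch {
    unsigned index; // position in the loaded list
    const char* path() const {
        return reinterpret_cast<const char*>(this) + sizeof(LoaderSketch);
    }
};

// Hypothetical runtime state that tracks every loader.
struct StateSketch {
    std::vector<LoaderSketch*> loaded;
};

LoaderSketch* makeLoaderSketch(StateSketch& state, const char* path) {
    const size_t pathLen = std::strlen(path) + 1;
    const size_t size = sizeof(LoaderSketch) + pathLen;   // 1. compute the size
    void* storage = std::malloc(size);                    // 2. allocate the memory
    if (storage == nullptr)
        return nullptr;
    LoaderSketch* loader = new (storage)                  // 3. placement new
        LoaderSketch{static_cast<unsigned>(state.loaded.size())};
    std::memcpy(static_cast<char*>(storage) + sizeof(LoaderSketch), path, pathLen);
    state.loaded.push_back(loader);                       // 4. add to the state
    return loader;                                        // 5. return the pointer
}
```

Placement new constructs the object inside memory that was obtained separately, which lets the allocator decide where loaders live while the constructor still runs normally.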

After initializing the JIT Loader, we set it within the RuntimeState and notify the debugger about it: dyldMain.cpp.

The setMainLoader function primarily updates the mainExecutableLoader field in the RuntimeState object with the loader pointer provided. It also logs information related to the main executable, including loaded libraries and segment mappings, if logging is activated: DyldRuntimeState.cpp.

In summary, we have initialized the JIT Loader and incorporated it into RuntimeState. It will later be employed for loading dependent libraries and applying fixups.

Image Loading

The STACK_ALLOC_OVERFLOW_SAFE_ARRAY macro comes first in the images (dylibs) loading process. It allocates a stack array to store pointers to Loader objects, initially set to a capacity of 16. This array will track all top-level images: dyldMain.cpp.

In line 591, we add the mainLoader to the topLevelLoaders array. From lines 592 to 630, we first load the inserted libraries: dyldMain.cpp.

Next, we establish certain properties and begin recursively loading everything necessary for the main executable and the inserted dylibs (lines 640–680). The core functionality lies within the loadDependents function: dyldMain.cpp.

We can also observe how the notifyDebuggerLoad operates in LLDB by inspecting the image list before and after the function execution.

There is also a notifyDtrace function. Dylibs may contain DOF (DTrace Object Format) sections that provide information about static user probes for dtrace, and this function identifies and registers such sections: dyldMain.cpp.

Finally, we encounter code that identifies and registers non-cached dylib loaders to a permanent state list using addPermanentRanges: dyldMain.cpp.

Utilizing a stack-allocated array (STACK_ALLOC_ARRAY) is efficient for memory allocation and deallocation, as it avoids heap allocation. By identifying loaders not included in the dyld cache and adding them to permanent ranges, the system ensures they remain in memory.

Overall, we have successfully loaded all images essential for executing the application during this step.

Fixups

Prior to performing fixups, we set up a weakDefMap for a runtime state, a mechanism designed to manage and resolve weak symbols in dynamically loaded libraries (dylibs) before binding occurs: dyldMain.cpp.

Before addressing fixups, the buildInterposingTables function establishes tables for interposing functions in non-cached dylibs. Interposing enables a program to override existing functions in shared libraries with custom implementations, though this can be blocked by AMFI: dyldMain.cpp.
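Conceptually, an interposing table is a list of (replacement, replacee) pairs that is consulted whenever a symbol address is about to be bound. The sketch below models that lookup with plain integers as addresses; the type and function names are hypothetical and this is not the data structure dyld builds internally.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One interposing entry: calls bound to `replacee` should go to `replacement`.
struct InterposeTupleSketch {
    uintptr_t replacement;
    uintptr_t replacee;
};

// Returns the address that should actually be bound for `target`.
// If no tuple matches, the original target is kept.
uintptr_t applyInterposingSketch(const std::vector<InterposeTupleSketch>& table,
                                 uintptr_t target) {
    for (const InterposeTupleSketch& tuple : table) {
        if (tuple.replacee == target)
            return tuple.replacement;
    }
    return target;
}
```

Building the table once before fixups means each bind only needs a cheap lookup instead of rescanning every dylib.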

Next, applying fixups begins. The code responsible for this initiates a ScopedTimer to gauge the time taken for applying fixups and acquires a DyldCacheDataConstLazyScopedWriter for patching dyld cache data: dyldMain.cpp.

We then manage strong overrides of weak definitions via the handleStrongWeakDefOverrides function, which identifies dylibs with weak definitions, locates strong overrides in those dylibs, and applies the necessary fixups: dyldMain.cpp.

A strong symbol is simply a symbol defined without any additional attribute, or one that only uses the default visibility attribute. Either of the following definitions is strong:

int strong_symbol = 42;
int strong_symbol __attribute__((visibility("default"))) = 42;

Conversely, a weak symbol can be defined like this:

int weak_symbol __attribute__((weak)) = 42;

After addressing strong overrides of weak symbols, we iterate through each loaded loader and apply fixups using applyFixups, which contains the core logic. If any errors occur during fixups, execution halts, and the fixup error is reported.

Additionally, there exists an applyCachePatches function to handle any patches in the dyld cache (only if a dylib overrides something there): dyldMain.cpp.

Singleton patching within the Dyld Shared Cache is executed by a function called doSingletonPatching: dyldMain.cpp. It seems to apply only to Obj-C code. Here’s the structure: DyldRuntimeState.cpp.

Ultimately, we applyInterposingToDyldCache if required: dyldMain.cpp. However, this does not factor into the timing of applying fixups. Therefore, we can conclude that singleton patching is the final stage in the fixup process.

After completing all these fixups, we can affirm that our executable’s dependent libraries have been loaded, and symbols have been resolved and relocated, making it ready for execution.

Libdyld.dylib

The lines between 734–761 are not pertinent to us, as they apply to PrebuiltLoaders, while we are utilizing JustInTimeLoader: dyldMain.cpp.

Similarly, lines 763–796 pertain to kdebug, which is disabled. If enabled, it notifies kdebug on each image load: dyldMain.cpp.

The first action we actually take is checking if libdyld.dylib exists, as set in JustInTimeLoader::applyFixups: dyldMain.cpp.

Following this, we wire up libdyld.dylib to dyld. The code first retrieves the load address of libdyld by invoking loadAddress on the libdyldLoader (801).

Next, we search for the __dyld4 section within the __DATA segment of libdyld.dylib (803). If it is not located in the __DATA segment, it searches the __AUTH segment (806).

If it cannot be found, loading is halted: dyldMain.cpp.
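The lookup order can be modeled as below. This is only a sketch with hypothetical types (SectionSketch) and helper names; the real code works on the mapped Mach-O rather than on a vector of strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flattened view of a Mach-O section.
struct SectionSketch {
    std::string segment;
    std::string section;
    uint64_t offset;
};

// Linear search for a segment/section pair; returns nullptr when absent.
const SectionSketch* findSectionSketch(const std::vector<SectionSketch>& sections,
                                       const std::string& segment,
                                       const std::string& section) {
    for (const SectionSketch& s : sections) {
        if (s.segment == segment && s.section == section)
            return &s;
    }
    return nullptr;
}

// Tries __DATA,__dyld4 first and falls back to __AUTH,__dyld4.
// A nullptr result corresponds to the case where loading is halted.
const SectionSketch* findDyld4SectionSketch(const std::vector<SectionSketch>& sections) {
    if (const SectionSketch* s = findSectionSketch(sections, "__DATA", "__dyld4"))
        return s;
    return findSectionSketch(sections, "__AUTH", "__dyld4");
}
```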

We then establish a connection between libdyld.dylib and the program's runtime state by granting access to global APIs through the libdyld4Section: dyldMain.cpp.

We also permit external code and components to access information about all loaded images in the process by providing a pointer to the allImageInfos field from libdyld4Section using storeProcessInfoPointer: dyldMain.cpp.

Next, we initialize program variables (vars) in the runtime state (state) based on the information retrieved from libdyld.dylib: dyldMain.cpp.

One aspect I find puzzling is that while debugging, I couldn't locate the C code in the repository that corresponds to the below instructions:

__chkstk_darwin

After setting state.vars, we observe a blraa x16, x17 instruction, which branches to the subsequent code, ultimately leading to __chkstk_darwin:

Continuing, we branch to __chkstk_darwin_probe:

The disassembled __chkstk_darwin_probe code is illustrated below. During debugging, this executes instructions +0, +4, +8 and then jumps to +32:

+0: Compares the value in register x9 (the requested stack size?) with the immediate 0x1 shifted left by 12 bits, which is 0x1000 (4096). This check likely determines whether the requested size is below 4096 bytes.
+4: Moves the stack pointer (sp) value into register x10.
+8: Branches (b.lo) to instruction +32 if the comparison at +0 indicates that the value in x9 is lower than 4096.
+32: Subtracts the value in x9 from the value in x10.
+36: Loads a byte from memory at the new address pointed to by x10.

This probe appears to check if we can access the stack at [x10], which contains this value:

This value holds state.vars = &libdyld4Section->defaultVars, suggesting it checks if the variables are readable.
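To make the control flow easier to follow, here is a small C++ model of the probe that records which addresses would be touched instead of actually reading the stack. The path for sizes below 0x1000 follows the instructions described above. The loop for larger sizes (instructions +12 to +28) was not executed during debugging, so its behavior here is an assumption based on how such stack probes usually work. The function name probeAddressesSketch is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models the probe: x9 is the requested size, x10 starts as the stack pointer.
// Returns the addresses a one-byte load would touch, in order.
std::vector<uint64_t> probeAddressesSketch(uint64_t sp, uint64_t size) {
    const uint64_t kStep = 0x1000; // 0x1 << 12
    std::vector<uint64_t> probes;
    uint64_t x9 = size;
    uint64_t x10 = sp;             // +4: mov x10, sp
    if (x9 >= kStep) {             // +0/+8: skip the loop when x9 < 0x1000
        // Assumed loop body (+12..+28): touch the stack in 0x1000-byte steps.
        do {
            x10 -= kStep;
            probes.push_back(x10);
            x9 -= kStep;
        } while (x9 > kStep);
    }
    x10 -= x9;                     // +32: sub x10, x10, x9
    probes.push_back(x10);         // +36: load one byte from [x10]
    return probes;
}
```

For a small request the model performs a single read at sp minus the size, which matches the +0, +4, +8, +32, +36 sequence seen in the debugger.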

Partition Delay Loads

Moreover, after executing __chkstk_darwin, we encounter the following function. Unlike __chkstk_darwin, I could find its definition in the Dyld source code repository, but I could not determine where it is invoked in dyldMain.cpp:

The partitionDelayLoads function can be found in DyldRuntimeState.cpp between lines 525–566. Its primary purpose is to retrieve Loaders marked as delay-init, which can now be initiated.

If a loader in delayLoaded is no longer delayed, it is transferred to loaded. Conversely, if a loader in loaded is now delayed, it is moved to delayLoaded. The undelayedLoaders vector is filled with the loaders that were originally delayed but no longer are.

This function guarantees that the dylibs are initialized in the correct sequence.
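The shuffling between the two lists can be sketched as follows. This is a simplified model under a strong assumption: whether a loader is delayed is represented by a plain boolean flag, while the real function in DyldRuntimeState.cpp has to work that out itself. All names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loader with a precomputed "still delayed" flag.
struct DelayLoaderSketch {
    int id;
    bool delayed;
};

using LoaderList = std::vector<DelayLoaderSketch*>;

// Re-partitions loaders between `loaded` and `delayLoaded` and collects
// the ones that moved out of the delayed list into `undelayed`.
void partitionDelayLoadsSketch(LoaderList& loaded, LoaderList& delayLoaded,
                               LoaderList& undelayed) {
    LoaderList newLoaded;
    LoaderList newDelayed;
    for (DelayLoaderSketch* l : loaded)
        (l->delayed ? newDelayed : newLoaded).push_back(l);
    for (DelayLoaderSketch* l : delayLoaded) {
        if (l->delayed) {
            newDelayed.push_back(l);
        } else {
            newLoaded.push_back(l);
            undelayed.push_back(l); // was delayed before, is not anymore
        }
    }
    loaded.swap(newLoaded);
    delayLoaded.swap(newDelayed);
}
```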

DYLD_JUST_BUILD_CLOSURE

Before proceeding, a block of code not executed under normal circumstances on macOS for third-party apps is displayed below:

This section manages the creation and serialization of prebuilt loader sets: dyldMain.cpp.

Following that, there is a check for the DYLD_JUST_BUILD_CLOSURE variable used for prewarming. If this variable is set, execution will be halted here: dyldMain.cpp.

I must revisit this part of the code, as it is quite intriguing due to its serialization and closure saving mechanisms.

I have skipped some code here that is executed but changes nothing relevant for us: notifyMonitorNeeded and kdebug_trace_dyld_enabled.

Prepare main

The final task in the prepare function is to set up the program's entry point. The logic involves determining whether to utilize LC_MAIN or LC_UNIXTHREAD: dyldMain.cpp.

The getEntry function is displayed below. It merely iterates over Load Commands to verify if it is using LC_MAIN or LC_UNIXTHREAD and returns the offset: MachOFile.cpp.
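As a rough illustration of that loop, here is a self-contained sketch over a simplified list of load commands. The structs are hypothetical and collapse each command to a single value, whereas the real getEntry in MachOFile.cpp parses the actual Mach-O structures. The two command constants are the values from mach-o/loader.h.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kLC_UNIXTHREAD = 0x5;
constexpr uint32_t kLC_MAIN = 0x80000028; // 0x28 | LC_REQ_DYLD

// Simplified load command: `entry` is the entry offset for LC_MAIN, or the
// already extracted start offset for LC_UNIXTHREAD in this sketch.
struct LoadCommandSketch {
    uint32_t cmd;
    uint64_t entry;
};

struct EntrySketch {
    bool found;
    bool usesCRT;    // true for the legacy LC_UNIXTHREAD style entry
    uint64_t offset;
};

// Walks the load commands and reports which kind of entry point is used.
EntrySketch getEntrySketch(const std::vector<LoadCommandSketch>& commands) {
    for (const LoadCommandSketch& lc : commands) {
        if (lc.cmd == kLC_MAIN)
            return {true, false, lc.entry};
        if (lc.cmd == kLC_UNIXTHREAD)
            return {true, true, lc.entry};
    }
    return {false, false, 0};
}
```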

Line 928 converts the main executable's base address to an integer, adds an offset to it, and then converts the result back to a pointer.

For example, if mainExecutable points to the base address 0x100000000 and entryOffset is 0x2000, the operation would be:
- Cast 0x100000000 to uintptr_t.
- Add 0x2000, resulting in 0x100002000.
- Cast the result back to void*.

Thus, the result would point to 0x100002000.
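The same calculation written as C++ looks roughly like this. The function name entryPointSketch is hypothetical, and the example assumes a 64-bit target so that the addresses fit into uintptr_t. The pointer is only computed, never dereferenced.

```cpp
#include <cassert>
#include <cstdint>

// Converts the base address to an integer, adds the entry offset,
// and converts the sum back to a pointer.
void* entryPointSketch(const void* mainExecutable, uint64_t entryOffset) {
    const uintptr_t base = reinterpret_cast<uintptr_t>(mainExecutable);
    return reinterpret_cast<void*>(base + static_cast<uintptr_t>(entryOffset));
}
```

Going through uintptr_t keeps the arithmetic in plain integers, which avoids pointer arithmetic on a type whose size has nothing to do with byte offsets.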

Final Cleanup

The last action here is to clean up to prevent resource leaks.

After returning from prepare, we also free EphemeralAllocator:

This code guarantees that any allocated resources are appropriately released before invoking main().

The program is now initiated!

Ultimately, we have completed the prepare function, representing the last step in the work() code block executed within the Memory Manager.

The next phase involves invoking appMain(), which returns the exit code after concluding the main() of our executable and storing it in the result variable: dyldMain.cpp.

However, before we proceed, we recall that we were executing the work() block and must first call memoryManager->restorePreviousState: Allocator.h.

The restorePreviousState function reinstates the previous write protection state while ensuring data integrity through pointer authentication (PAC): Allocator.h.

Thread protection is also in place using thread protection restrictions (TPRO): Allocator.h.

Finally, at dyld`start +2356, we call blraaz x20, which corresponds to our main():

When we step in, we can observe the decompiled code of our program:

The +84 ret instruction will return to dyld`start with the exit code in the x0 register.

The program has concluded its execution!

We remain within the context of Dyld when our executable completes. The exit code is transferred from x0 to x20 register:

At the end, there is a check to see if we are running within the context of a simulator platform using isSimulatorPlatform: dyldMain.cpp.

If that is the case, the _exit function is invoked. Otherwise, as is typically the case (and applies to our situation), we call exit from libSystemHelpers:

This subsequently calls libsystem_c.dylib`exit, which in turn invokes __exit from libsystem_kernel.dylib, and that concludes our journey with Dyld ^^.

Final Thoughts

This wraps up the current discussion on Dyld. I plan to revisit some briefly mentioned areas in this series, but I do not intend to elaborate on them extensively.

After evaluating all the code discussed here, I can confidently state that I have acquired significant knowledge about Dyld and macOS as a whole.

I hope that anyone following along with this series benefits similarly. However, I would be misleading you if I claimed to understand all Dyld mechanics at an atomic level.

Links to all articles with tags can be found in the Snake&Apple repository.
