Development – Michael Kazakov's quiet corner

January 18, 2025

Action Shortcuts in Nimble Commander

Over the New Year 2025 holiday period, I spent quite a few days implementing a feature in Nimble Commander that has been occasionally requested for many years. Customization of action shortcuts has been supported in the app since its early days, but this only allowed for a one-to-one relationship: an action could have a single shortcut.

What some users wanted was twofold:

The ability to set more than one shortcut per action.
The option to choose shortcuts that are normally not supported by macOS UX.

Adding this feature involved an interesting refactoring and touched many nitty-gritty details, so I thought it would be worth writing a brief reflection.

To start, let’s lay out the basics of how Nimble Commander handled shortcuts before.

Shortcuts can be assigned to three distinct types of actions, even though on the surface they appear similar:

Actions available through the standard fixed set of menu items.
Processing the keyboard events for these shortcuts is done by macOS frameworks. This approach is standard and expected by users, plus it’s somewhat self-documenting — the menu items display their key equivalents next to the labels. The downside, often surprising to users, is that the menu system enforces strict limitations on which keys or key combinations can work as shortcuts. For instance, assigning a single letter without modifiers as a shortcut isn’t possible.
Context-based custom actions.
An example of these might be actions in the file panels. For instance, when the Down key is pressed, a particular view in the responder chain recognizes the keypress as corresponding to the panel.move_down action and triggers the appropriate event. This logic is custom and implemented in methods like performKeyEquivalent: or keyDown:. Nimble Commander has full control over these events, with no restrictions on key assignments.
External tools.
Shortcuts can also be set for external tools. In this case, the macOS menu system handles the heavy lifting — the corresponding dynamic menu item is assigned the shortcut selected for the external tool. The difference from standard application menu actions is that the list of external tools can be modified while the application is running. However, the same restrictions as menu-based actions apply here.

There are two main components in Nimble Commander that provide the machinery for shortcut customization:

ActionShortcut.
This class describes a single keystroke. In practical terms, it encapsulates a character from the Unicode Basic Multilingual Plane (BMP, 2 bytes) and a bitmask of key modifiers (1 byte). The class is very compact, occupying just 4 bytes once 1 byte of padding is included.
ActionsShortcutsManager.
This class is responsible for tracking which actions are associated with which shortcuts, including both default settings and custom overrides. It also provides a mechanism to automatically update external instances of ActionShortcut whenever the shortcut assigned to a corresponding action changes.
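To make the layout concrete, here is a minimal sketch of such a compact value type. The name CompactShortcut, the modifier constants and their bit assignments are assumptions made purely for illustration; the actual ActionShortcut class in Nimble Commander may be laid out differently.

```cpp
#include <cstdint>

// Hypothetical modifier bits; the real bit assignments are not given in this post.
constexpr std::uint8_t kShift = 1 << 0;
constexpr std::uint8_t kControl = 1 << 1;
constexpr std::uint8_t kOption = 1 << 2;
constexpr std::uint8_t kCommand = 1 << 3;

// A keystroke: one UTF-16 code unit (enough for any BMP character) plus a modifier bitmask.
struct CompactShortcut {
    char16_t unicode = 0;       // 2 bytes
    std::uint8_t modifiers = 0; // 1 byte, followed by 1 byte of padding

    constexpr bool operator==(const CompactShortcut &rhs) const noexcept
    {
        return unicode == rhs.unicode && modifiers == rhs.modifiers;
    }
};

// Holds on mainstream ABIs, where char16_t is 2 bytes with 2-byte alignment.
static_assert(sizeof(CompactShortcut) == 4, "expected 2 + 1 + 1 (padding) bytes");
```

Comparing two such values touches only the three meaningful bytes, which is what makes the type cheap to pass around and to store in bulk.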

At first glance, the feature seems to simply involve allowing multiple shortcuts per action instead of just one. However, a naive implementation would be too inefficient. The issue lies with how context-based shortcut processing is implemented.

Previously, the processing worked as follows: each view in the hierarchy (e.g., file panel views, split views, tabbed holders, main window states, etc.) maintained a set of ActionShortcuts corresponding to the actions it handled. Whenever a keyDown: event occurred, the view hierarchy was traversed, with each view being asked, “Are you interested in this keystroke?” until one of the views responded affirmatively. Views supporting customizable shortcuts would then iterate through their set of shortcuts, asking each one, “Are you this keypress?”

The runtime cost of this implementation scaled linearly with the number of customizable shortcuts. This was already somewhat inefficient, and converting each of these shortcuts into a dynamic container would have been completely impractical.

After staring at the code for a while, I realized the entire problem could be re-framed.

Instead of asking each shortcut, “Are you this keystroke?”, a new shortcut could be created directly from the keystroke itself. Once an incoming keystroke is expressed as a shortcut, it becomes possible to compare it directly with other shortcuts (a comparison of exactly 3 raw bytes).

Moreover, since shortcuts are just three bytes, they are trivially hashable, allowing all used shortcuts to be stored in a flat hash map. With such a map in place, it’s possible to perform an O(1) lookup for the incoming keystroke and answer the reverse question: “Which actions are triggered by this keystroke?”

This approach eliminates the need to maintain up-to-date, context-based shortcuts scattered throughout the UI code. Instead, the UI code can query the ActionsShortcutsManager to determine if the incoming keystroke corresponds to any specific action.
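The reverse lookup can be sketched roughly as follows. This is not the actual ActionsShortcutsManager code: the names Shortcut, ShortcutHash, ShortcutIndex, Bind and FirstActionFor are hypothetical, integer action tags are an assumption, and std::unordered_map stands in for the flat hash map mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>
#include <vector>

struct Shortcut {
    char16_t unicode = 0;
    std::uint8_t modifiers = 0;

    bool operator==(const Shortcut &rhs) const noexcept
    {
        return unicode == rhs.unicode && modifiers == rhs.modifiers;
    }
};

struct ShortcutHash {
    std::size_t operator()(const Shortcut &sc) const noexcept
    {
        // Pack the three meaningful bytes into one integer and hash that.
        const std::uint32_t packed = (static_cast<std::uint32_t>(sc.unicode) << 8) | sc.modifiers;
        return std::hash<std::uint32_t>{}(packed);
    }
};

class ShortcutIndex {
public:
    // Binds one more shortcut to an action; an action may own several shortcuts.
    void Bind(int action_tag, Shortcut sc)
    {
        assert(action_tag >= 0); // tags are assumed non-negative in this sketch
        m_actions[sc].push_back(action_tag);
    }

    // Answers "which action is triggered by this keystroke?" with one hash lookup.
    std::optional<int> FirstActionFor(Shortcut keystroke) const
    {
        const auto it = m_actions.find(keystroke);
        if( it == m_actions.end() )
            return std::nullopt;
        return it->second.front();
    }

private:
    // Shortcut -> actions that use it, in the order they were bound.
    std::unordered_map<Shortcut, std::vector<int>, ShortcutHash> m_actions;
};
```

With an index like this, a view no longer needs its own copies of shortcuts: it builds a Shortcut from the incoming event, asks the index which action it maps to, and checks whether that action is one it handles.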

In practical terms, this functionality expansion involved:

Allowing the creation of an ActionShortcut from an NSEventTypeKeyDown NSEvent by adding a new constructor.
Modifying the ActionsShortcutsManager’s API to store and provide a dynamic array of shortcuts for each action.
Building and maintaining a mapping between shortcuts and the list of actions that use each shortcut.
Adding a new API to query for an action associated with a shortcut with ~O(1) complexity and zero mallocs.
Performing a deep refactor to make ActionsShortcutsManager properly unit-testable and actually writing unit tests.

Once this infrastructure was in place, adding support for multiple shortcuts for menu items was relatively straightforward:

Override the NSWindow’s performKeyEquivalent: method.
Find the first action that uses the incoming keypress as its shortcut, if it exists.
Check the index of the shortcut among the action’s shortcuts. If it’s the first one, ignore it and let NSMenu process the event.
Locate the menu item with the corresponding action tag and find its responder.
Validate the menu item if the responder provides the validation interface. Beep and return if validation fails.
Finally, ask the parent menu to perform the menu item’s action, including the visual blink that’s normally expected.
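The routing decision at the heart of these steps can be sketched in isolation, without any AppKit code. The names Shortcut, Route and RouteKeystroke are hypothetical. The sketch only models the rule described above: the first shortcut of an action is left to NSMenu, while any additional shortcut is handled manually.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Shortcut {
    char16_t unicode = 0;
    std::uint8_t modifiers = 0;

    bool operator==(const Shortcut &rhs) const noexcept
    {
        return unicode == rhs.unicode && modifiers == rhs.modifiers;
    }
};

enum class Route {
    NotBound,        // the keystroke is not among the action's shortcuts
    LeaveToMenu,     // first shortcut: ignore it and let NSMenu process the event
    PerformManually, // any further shortcut: find the menu item, validate it, perform it
};

// `shortcuts` is the ordered list of shortcuts of the action found for the keystroke.
Route RouteKeystroke(const std::vector<Shortcut> &shortcuts, const Shortcut &pressed)
{
    const auto it = std::find(shortcuts.begin(), shortcuts.end(), pressed);
    if( it == shortcuts.end() )
        return Route::NotBound;
    return it == shortcuts.begin() ? Route::LeaveToMenu : Route::PerformManually;
}
```

Only the PerformManually branch needs the remaining steps (locating the menu item, validation, and the visual blink); the other two branches simply fall through to the default event handling.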

External tools still support only a single shortcut for now, though. Perhaps this is something to improve in future releases.

October 5, 2024

Enabling Swift in Nimble Commander

I’m quite interested in introducing Swift in Nimble Commander’s codebase and gradually replacing its UI components with code written in this language. Integrating Swift into this codebase is not straightforward, as there is almost no pure Objective-C code. Instead, all UI-level code is compiled as Objective-C++, which gives transparent access to components written in C++ and its standard library. Frankly, this is often much more efficient and pleasant to use than [Core]Foundation. The challenge before was that interoperability between C++ and Swift was essentially non-existent, and the only solution was to manually write bridges in plain C, which was a showstopper for me. Last year, with Xcode 15, some reasonable C++ <-> Swift interoperability finally became available, but it was missing crucial parts to be meaningfully used in an established codebase. However, with Xcode 16, it seems that the interop is now mature enough to be taken seriously. This week, I converted a small Objective-C component to Swift and submitted the change to Nimble Commander’s repository. It was a rather bumpy ride and took quite a few hours to iron out the problems, so I decided to write down my notes to help someone else spare a few brain cells while going through a similar journey.

The start was promising: enable ObjC++ interop (SWIFT_OBJC_INTEROP_MODE=objcxx), add a Swift file, and Xcode automatically enables Clang modules and creates a dummy bridging header. I had to tweak some C++ code with SWIFT_UNSAFE_REFERENCE to allow the Swift compiler to import the required type, but after that, the setup worked like a charm – the Objective-C++ side created a view now implemented in Swift, and the Swift side seamlessly accessed the component written in [Objective]C++. All of this was fully supported by Xcode: navigation, auto-completion – it all worked! Well, until it didn’t. Trivial functions, like printing “Hello, World!”, worked fine, but the actual UI component rewritten in Swift greeted me with a crash:

Nimble Commander`@objc PanelListViewTableHeaderCell.init(textCell:):
    0x10128ed74 <+0>:   sub    sp, sp, #0x50
    0x10128ed78 <+4>:   stp    x20, x19, [sp, #0x30]
    0x10128ed7c <+8>:   stp    x29, x30, [sp, #0x40]
    0x10128ed80 <+12>:  add    x29, sp, #0x40
    0x10128ed84 <+16>:  str    x0, [sp, #0x10]
    0x10128ed88 <+20>:  str    x2, [sp, #0x18]
    0x10128ed8c <+24>:  mov    x0, #0x0                  ; =0 
    0x10128ed90 <+28>:  bl     0x10149cf84               ; symbol stub for: type metadata accessor for Swift.MainActor
->  0x10128ed94 <+32>:  mov    x20, x0
    0x10128ed98 <+36>:  str    x20, [sp, #0x8]
    ...

This left me quite puzzled—the Swift runtime was clearly loaded, as I could write a function using its standard library, and it was executed correctly when called from the C++ side. Yet the UI code simply refused to work, with parts of it clearly not being loaded—the pointers to the functions were NULL. Normally, I’d expect a runtime to either work correctly or fail entirely with a load-time error, but this was something new. As I don’t have much (or frankly, any) reasonable understanding of Swift’s runtime machinery under the hood, I tried searching online for any answers related to these symptoms and found essentially none. It’s not the kind of experience one would expect from a user-friendly language.

While searching for what other projects do, I stumbled upon a suspicious libswift_Concurrency.dylib located in the Frameworks directory, which gave me a hint: the actor model is related to concurrency, and the presence of this specific library couldn’t be a coincidence. So, out of desperation and curiosity, I blindly copied this library into Nimble Commander’s own Frameworks directory, and lo and behold, it finally worked! There is an option to make Xcode copy this library automatically: ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES. Another piece of the puzzle was that my @rpath only contained @executable_path/../Frameworks when it should have also included /usr/lib/swift. With these changes, Nimble Commander can now run correctly on macOS 10.15 through macOS 15.

With that done and the actual application built, it was time to tackle the tooling around the project. While Xcode’s toolchain is used to compile Nimble Commander, a separate LLVM installation from Homebrew is used for linting. That’s because Xcode doesn’t include either clang-format or clang-tidy (seriously, weird!). Since Apple ships a modified toolchain, consuming the same source code with an external toolchain can be rather problematic. I had to make the following changes to get clang-tidy to pass again after integrating the Swift source:

Disable the explicit-specialization-storage-class diagnostic, as the automatically generated bridging code from the Swift compiler seems to contain incorrect declarations.
Disable Clang modules by manually removing the -fmodules and -fmodules-cache-path= flags from the response files.
Remove framework imports (e.g., @import AppKit;) from the automatically generated bridging headers.
Add the paths to the Swift toolchain to the project’s search paths, i.e., $(TOOLCHAIN_DIR)/usr/lib/swift and $(TOOLCHAIN_DIR)/usr/include.
Explicitly include the Swift interop machinery before including the bridging header, i.e., #include <swiftToCxx/_SwiftCxxInteroperability.h>.

Such a number of hacks is unfortunate, and I hope to gradually make it easier to maintain.

This concludes the bumpy road to making Swift usable in Nimble Commander, though I’m sure more active use of the language will reveal other rough edges—no interop is seamless.

March 25, 2019

Compilation time: Boost vs Std

This is a small experiment to compare the compilation times of some Boost facilities against their standardised counterparts. The goal was to assess the time penalty that comes with supporting a vast range of compilers and platforms.
Each measured file contains a minimal code snippet that employs one facility. The difference between the Boost and Std versions boils down to including a different header and picking the right namespace. Both preprocessing/parsing and instantiation contribute to the timing, so technically speaking, it’s not purely compilation time. Each source file was compiled a hundred times to minimise measurement errors.
The Boost version is 1.69.
The Std implementation is libc++ shipped with Xcode 10.1.
Both were compiled with clang-1000.11.45.5 in C++17 mode at the -O2 optimisation level.
The snippets themselves and the driver script are available here.

February 12, 2019

Measuring templates bloat

Recently I’ve been investigating the slow compilation of source files that used one particular library. The library was written in-house and has its roots in the pre-C++11 era, including abundant usage of Boost. The library itself provides a sophisticated reflection mechanism and operates on type-erased objects. Obviously, it relies heavily on the C++ type system and has a lot of template code.
Boost became my primary suspect almost immediately – it’s notorious for compiler torture and I personally try to avoid it wherever possible. So the most prominent red flags like Boost.MPL were almost completely removed and other pieces were converted to their C++11 standardised counterparts. Results, however, weren’t inspiring – compilation time moved a bit, but only marginally. The bottleneck was somewhere else.

Looking at MSVC’s time report (“/d2cgsummary”) didn’t provide anything meaningful – it basically stated that each file contained dozens of functions with “Anomalistic Compile Times”™️. No details why though.
GCC’s time report (“-ftime-report”), on the other hand, was much more helpful. It clearly showed that the lion’s share of time was spent on “phase opt and generate”, which, to my understanding, is the actual generation of instructions. That was somewhat surprising given that the majority of these source files were neither large nor doing any rocket surgery.

It should be mentioned that almost all reflection code in that library was written in templates, which makes sense. And, apparently, the compiler was spending time generating instructions for these methods for each instantiation type, over and over again in each translation unit, only for them to be thrown away later by the linker. It’s hard to estimate the number of instantiation types in the final product itself, but 50-100 can serve as a ballpark estimate. So I decided to run an experiment and tried offloading some portions of the templated code into a private non-templated “base” class. It immediately became evident that removing even tiny pieces of code, like the formatting of an exception message, reduced the overall size of the object files (*.obj) by literally megabytes.

In this post, I roughly model the situation with synthetic code generation. Imagine the following pattern (let’s call it Pattern 1):

#include <stdexcept>
#include <string>
#include <typeinfo>

using namespace std::string_literals; // enables the "..."s literal used below

struct Base {
    virtual ~Base() = default;
    virtual void Method(int v) = 0;
};

template <typename T>
struct Impl : Base {
    void Method( int v ) override {
        if( v == 0 ) // some error checking
            throw std::logic_error
            ( "you're so unlucky with Method() for \'"s + typeid(T).name() + "\'!" );
        // some useful stuff
    }
};

struct Type{};

Base *Spawn_Type() { return new Impl<Type>; }

It’s quite easy to generate such code for a given number of class methods and instantiation types. Each additional method adds an entry in a virtual methods table and a templated implementation in Impl<T> by analogy with Method(). Each additional instantiation type introduces a new type and a new spawning function by analogy with Type / Spawn_Type().
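As an illustration of how little such a generator needs, here is a minimal sketch that emits Pattern 1 source text for a given number of methods and instantiation types. It is not the generator used for the measurements; the function name GeneratePattern1 and the MethodN / TypeN / Spawn_TypeN naming scheme are assumptions made for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Emits Pattern 1 source code with `methods` virtual methods and `types` instantiation types.
std::string GeneratePattern1(int methods, int types)
{
    assert(methods >= 0 && types >= 0);
    std::ostringstream out;
    out << "#include <stdexcept>\n#include <string>\n#include <typeinfo>\n"
           "using namespace std::string_literals;\n\n";

    // One pure virtual method per requested method, i.e. one vtable entry each.
    out << "struct Base {\n    virtual ~Base() = default;\n";
    for( int m = 0; m < methods; ++m )
        out << "    virtual void Method" << m << "(int v) = 0;\n";
    out << "};\n\n";

    // One templated implementation per method, each composing its own error message.
    out << "template <typename T>\nstruct Impl : Base {\n";
    for( int m = 0; m < methods; ++m )
        out << "    void Method" << m << "(int v) override {\n"
            << "        if( v == 0 )\n"
            << "            throw std::logic_error(\"you're so unlucky with Method" << m
            << "() for '\"s + typeid(T).name() + \"'!\");\n"
            << "    }\n";
    out << "};\n\n";

    // One empty type and one spawning function per instantiation type.
    for( int t = 0; t < types; ++t )
        out << "struct Type" << t << "{};\n"
            << "Base *Spawn_Type" << t << "() { return new Impl<Type" << t << ">; }\n";
    return out.str();
}
```

Writing the result of, say, GeneratePattern1(20, 20) to a file and compiling it with -c gives one data point of the kind measured in this post.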
And, for comparison, below is a slightly altered version (Pattern 2). ImplBase provides non-templated functionality, and Impl<T> does just the same as before but redirects composing and throwing the exception to the ImplBase class. The performance hit introduced by the additional function call can be neglected in 99.9% of cases.

// [...]
struct ImplBase {
    static void ThrowLogicErrorAtMethod( const std::type_info& typeid_t );
};

template <typename T>
struct Impl : Base, private ImplBase {
    void Method( int v ) override {
        if( v == 0 ) // some error checking
            ThrowLogicErrorAtMethod( typeid(T) );
        // some useful stuff
    }
};
// [...]
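For completeness, here is one way the elided parts of Pattern 2 might be filled in so that the snippet compiles on its own. The body of ThrowLogicErrorAtMethod is an assumption modelled on the message from Pattern 1; in a real project this definition would presumably live in a single .cpp file so that it is compiled exactly once.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual void Method(int v) = 0;
};

struct ImplBase {
    static void ThrowLogicErrorAtMethod(const std::type_info &typeid_t);
};

// Non-templated: the message formatting is emitted once, not once per Impl<T>.
void ImplBase::ThrowLogicErrorAtMethod(const std::type_info &typeid_t)
{
    throw std::logic_error(std::string("you're so unlucky with Method() for '") +
                           typeid_t.name() + "'!");
}

template <typename T>
struct Impl : Base, private ImplBase {
    void Method(int v) override
    {
        if( v == 0 ) // some error checking
            ThrowLogicErrorAtMethod(typeid(T));
        // some useful stuff
    }
};

struct Type {};

Base *Spawn_Type() { return new Impl<Type>; }
```

Each Impl<T>::Method now contains only a comparison and a call, while the string concatenation and exception construction are shared by all instantiations.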

This repo contains generators and measurement scripts for both patterns. The scripts execute these generators for each combination of [1..20] methods by [1..20] instantiations and measure the compilation of the produced source code. The measurements shown below were made with “Apple LLVM version 10.0.0 (clang-1000.11.45.5)” on an i7-3520M with the “-std=c++17 -O2 -c” flags.

These are the compilation times and the object file sizes for the first pattern. Both compilation time and file size scale roughly proportionally to both the number of methods and the number of types. In the worst-case scenario (20 methods x 20 types), it takes almost 3 seconds to compile code that does basically nothing apart from error reporting. If there were any actual code instead of “// some useful stuff”, the graph would look much scarier.

The graphs below show the scaling of the second pattern. The worst-case scenario takes 0.75s to compile instead of 2.73s with the first pattern. The object file is 4 times smaller in that case.

Of course, both patterns generate completely synthetic code, which is far too simple for the concrete absolute numbers to mean much. Adding any reasonable logic to these methods would radically shift the results. But I guess it’s safe to assume that the delta between these two patterns will not go anywhere – a compiler will still have to generate these instructions regardless of other complexity. So it should be fair to look at the delta numbers:

These delta numbers show something interesting. For instance, for the case of 10 methods and 10 instantiation types (which doesn’t seem too extreme), the difference is about half a second of compilation time per file. Or, to rephrase, there is a choice of two approaches:
a) Pattern 1: clearer code – easier to maintain, but it costs 0.4 seconds of wait time per compilation per file;
b) Pattern 2: slightly more obfuscated code – harder to maintain, but it doesn’t introduce additional cost in terms of compilation time.
This choice, as usual in engineering, doesn’t provide a “right” option – it’s always a tradeoff. Oftentimes, however, such choices are made unconsciously, just because something is considered to be the “default” way by the C++ community.

The “zero-cost abstractions” are sometimes presented as the main C++ feature, but there are many hidden costs – the graphs above show just one aspect of such penalties. The recent debate on Modern C++ vs. GameDev touched on this problem, and the ascetic approach of “C with classes” definitely has many valid points. At least such code compiles fast.