Skip to content

Much faster navigator index creation for mixed Swift and Objective-C projects #917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

d-ronnqvist
Copy link
Contributor

@d-ronnqvist d-ronnqvist commented May 14, 2024

Bug/issue #, if applicable: rdar://127759734

Summary

This fixes a major inefficiency in the navigator index creation for large (10k+ symbols) mixed Swift and Objective-C projects.

With these changes impacted projects will build about 5 times faster total docc convert time. Even larger projects may see even faster builds (the biggest improvement that I've seen was ~30 times faster in one project).

Benchmarks

Below are two benchmarks from two large mixed Swift and Objective-C projects that show a 6 times improvement and a 5 time improvement compared to current main:

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main                 │ current              │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'bundle-registration'       │ no change¹      │ 8.353 sec            │ 8.271 sec            │
│ Duration for 'convert-total-time'        │ -82.9%²         │ 89.014 sec           │ 15.223 sec           │
│ Duration for 'documentation-processing'  │ -93.41%³        │ 78.942 sec           │ 5.202 sec            │
│ Duration for 'finalize-navigation-index' │ no change⁴      │ 1.031 sec            │ 1.036 sec            │
│ Peak memory footprint                    │ no change⁵      │ 1.03 GB              │ 1.02 GB              │
│ Data subdirectory size                   │ no change⁶      │ 399.7 MB             │ 400.4 MB             │
│ Index subdirectory size                  │ no change⁷      │ 24.5 MB              │ 24.4 MB              │
│ Total DocC archive size                  │ no change⁸      │ 456.8 MB             │ 458.5 MB             │
│ Topic Anchor Checksum                    │ no change       │ 7da1d0d3ff251d8c2c0f │ 7da1d0d3ff251d8c2c0f │
│ Topic Graph Checksum                     │ change          │ 80f03eee58720209c93f │ 542c6795a3c9d1b555c3 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main                 │ current              │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'bundle-registration'       │ no change¹,²    │ 3.052 sec            │ 2.954 sec            │
│ Duration for 'convert-total-time'        │ -80.11%³,⁴      │ 35.358 sec           │ 7.034 sec            │
│ Duration for 'documentation-processing'  │ -88.23%⁵,⁶      │ 31.986 sec           │ 3.765 sec            │
│ Duration for 'finalize-navigation-index' │ no change⁷,⁸    │ 0.169 sec            │ 0.163 sec            │
│ Peak memory footprint                    │ no change⁹      │ 577.3 MB             │ 575.6 MB             │
│ Data subdirectory size                   │ no change¹⁰     │ 108.6 MB             │ 108.6 MB             │
│ Index subdirectory size                  │ no change¹¹     │ 3.9 MB               │ 3.9 MB               │
│ Total DocC archive size                  │ no change¹²     │ 138.6 MB             │ 138.6 MB             │
│ Topic Anchor Checksum                    │ no change       │ d41d8cd98f00b204e980 │ d41d8cd98f00b204e980 │
│ Topic Graph Checksum                     │ change          │ d4e127bf2e5d126c1cb7 │ 3d1b0f2fcacafee063b6 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

If you look at these closely you may see that the output size and topic graph checksum aren't the same. At first I thought that I caused a regression but it turns out that the output isn't fully stable for mixed Swift and Objective-C projects. I tested this on "main", "release/6.0", and "release/5.10" and they all show similar differences when comparing the same commit of DocC to itself. For example, these 3 measurements are all gathered from the current main (664acfd7) and show similar changes in output size and topic graph checksum:

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main-1               │ main-2               │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'bundle-registration'       │ no change¹,²    │ 8.242 sec            │ 8.41 sec             │
│ Duration for 'convert-total-time'        │ no change³,⁴    │ 96.103 sec           │ 96.014 sec           │
│ Duration for 'documentation-processing'  │ no change⁵,⁶    │ 84.831 sec           │ 84.838 sec           │
│ Duration for 'finalize-navigation-index' │ no change⁷,⁸    │ 1.104 sec            │ 1.09 sec             │
│ Peak memory footprint                    │ no change⁹      │ 1.03 GB              │ 1.05 GB              │
│ Data subdirectory size                   │ no change¹⁰     │ 403.9 MB             │ 399.9 MB             │
│ Index subdirectory size                  │ no change¹¹     │ 24.5 MB              │ 24.4 MB              │
│ Total DocC archive size                  │ no change¹²     │ 462 MB               │ 458 MB               │
│ Topic Anchor Checksum                    │ no change       │ 15bea17f1e7688a3fc2d │ 15bea17f1e7688a3fc2d │
│ Topic Graph Checksum                     │ change          │ 5b0bf590353b64a56db7 │ 477075a5259bee77ca9c │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main-2               │ main-3               │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'bundle-registration'       │ no change¹,²    │ 8.41 sec             │ 8.324 sec            │
│ Duration for 'convert-total-time'        │ no change³,⁴    │ 96.014 sec           │ 96.647 sec           │
│ Duration for 'documentation-processing'  │ no change⁵,⁶    │ 84.838 sec           │ 85.55 sec            │
│ Duration for 'finalize-navigation-index' │ no change⁷      │ 1.09 sec             │ 1.101 sec            │
│ Peak memory footprint                    │ no change⁸,⁹    │ 1.05 GB              │ 1.05 GB              │
│ Data subdirectory size                   │ no change¹⁰     │ 399.9 MB             │ 409 MB               │
│ Index subdirectory size                  │ no change¹¹     │ 24.4 MB              │ 24.4 MB              │
│ Total DocC archive size                  │ no change¹²     │ 458 MB               │ 467.1 MB             │
│ Topic Anchor Checksum                    │ no change       │ 15bea17f1e7688a3fc2d │ 15bea17f1e7688a3fc2d │
│ Topic Graph Checksum                     │ change          │ 477075a5259bee77ca9c │ eb532ce04afad3fdfcfd │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

I've gone over my changes many times and since many previous releases have the same behavior, I'm reasonably certain that I didn't cause them now.

Why this is faster

The added comment in NavigatorIndex.swift explain the specific changes in more detail, but long story short:

Using RenderNodeVariantOverridesApplier() to get the Objective-C variant of a RenderNode is very slow and involves repeatedly encoding and decoding the JSON into different formats to transform it.

By instead peeking into the different variant collections to give a "view" into the render node's Objective-C data we can avoid all that encoding and decoding.

Unfortunately the docc process-archive index command still needs to use the slow path. I filed rdar://128050800 to track fixing that as a separate PR.

Dependencies

None

Testing

Build documentation for a large mixed Swift and Objective-C project with main and with this branch and measure the total build time. If the project is large enough the difference should be very noticeable. If you want rough estimate numbers to compare you can use time for each call, for example:

time .build/arm64-apple-macosx/release/docc convert /path/to/SomeCatalog.docc  --additional-symbol-graph-dir /path/to/some-symbol-graph-directory 

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary Not applicable

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

Oh, the CI fails because we're not allowed to nest protocols like that yet. I'll move them out to be top-level tomorrow.

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

Copy link
Contributor

@mayaepps mayaepps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great time improvement! I've left a few questions.

import Foundation

/// A language specific representation of a render node value for building a navigator index.
protocol NavigatorIndexableRenderNodeRepresentation<Metadata> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this protocol and NavigatorIndexableRenderMetadataRepresentation only are conformed to once. Why do we need this protocol as opposed to just using RenderNodeVariantView directly, for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both protocols are conformed to twice; once for the main type and once for the view:

Render Node representation

extension RenderNode: NavigatorIndexableRenderNodeRepresentation {}
struct RenderNodeVariantView: NavigatorIndexableRenderNodeRepresentation {
   ...
}

Metadata representation

extension RenderMetadata: NavigatorIndexableRenderMetadataRepresentation {}
struct RenderMetadataVariantView: NavigatorIndexableRenderMetadataRepresentation {
   ...
}

We need these protocols so that the same index implementation can be used for both Swift and Objective-C representations of pages.

private func index(_ renderNode: any NavigatorIndexableRenderNodeRepresentation, traits: [RenderNode.Variant.Trait]?) throws -> InterfaceLanguage? { ... }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I'd missed that. Thanks for the explanation!

}
}

private let typesThatShouldNotUseNavigatorTitle: Set<NavigatorIndex.PageType> = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want these types to not use the navigator title? I assume this is not new behavior--is it because they're symbols?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. This isn't new code. I just moved it from here because I needed to make the code usable for NavigatorIndexableRenderNodeRepresentation and not just RenderNode.

RenderRelationshipsGroup(
name: chapter.name,
abstract: nil,
references: chapter.tutorials.compactMap { self.references[$0.identifier] as? TopicRenderReference }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any scenarios where the volume's chapters wouldn't be TopicRenderReferences? It looks like previously this logic force-unwrapped the result of the cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of any scenarios today but maybe there will be some in the future. I saw this code and felt that it was unnecessary to force-cast when the old code was already doing a compactMap to access the references anyway:

+ tutorials.compactMap { self.references[$0.identifier] as? TopicRenderReference }
- tutorials.compactMap { self.references[$0] } as! [TopicRenderReference]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense. I was initially thinking it might be useful to have a warning or something here so children weren't silently dropped from the navigator, but it sounds like that's probably not a situation we have to worry about right now.

/// - originalValue: The original value to transform.
/// - traits: The traits that determine what variant's patches to apply to the original value.
/// - Returns: The transformed value, or the original value if no variants match the given traits.
func applied(to originalValue: Value, for traits: [RenderNode.Variant.Trait]) -> Value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more flexible, nice! Was there something else in this PR that necessitated this change?

Copy link
Contributor Author

@d-ronnqvist d-ronnqvist May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this abstraction because the previous one wasn't all that useful.

There are several properties in the navigator builder that need to determine what the value is for a given trait, some for the render node:

var topicSections: [TaskGroupRenderSection] {
    wrapped.topicSectionsVariants.value(for: traits)
}
var defaultImplementationsSections: [TaskGroupRenderSection] {
    wrapped.defaultImplementationsSectionsVariants.value(for: traits)
}

but most for the render node metadata:

var title: String? {
    wrapped.titleVariants.value(for: traits)
}
var navigatorTitle: [DeclarationRenderSection.Token]? {
    wrapped.navigatorTitleVariants.value(for: traits)
}
var fragments: [DeclarationRenderSection.Token]? {
    wrapped.fragmentsVariants.value(for: traits)
}
var externalID: String? {
    wrapped.externalIDVariants.value(for: traits)
}
var roleHeading: String? {
    wrapped.roleHeadingVariants.value(for: traits)
}
var symbolKind: String? {
    wrapped.symbolKindVariants.value(for: traits)
}
var platforms: [AvailabilityRenderItem]? {
    wrapped.platformsVariants.value(for: traits)
}

The previous abstraction operated on the individual VariantCollection.Variant (as opposed to operating on the full VariantCollection). This API wasn't particularly useful for two reasons:

  • it was the callers responsibility to iterate over the variants for a certain trait
  • the API was only available when the value was a (range replaceable) collection

The first drawback meant that to determine the Objective-C value of a render node, each caller needs to access the default value, filter and iterate over the variants, and incrementally apply the patch:

var topicSections: [TaskGroupRenderSection] {
    var patchedValue = wrapped.topicSectionsVariants.defaultValue
    for variant in wrapped.topicSectionsVariants.variants where variant.traits == traits {
        patchedValue = variant.applyingPatchTo(patchedValue)
    }
    return patchedValue
}

Most of this is boilerplate compared to the new API

var topicSections: [TaskGroupRenderSection] {
    wrapped.topicSectionsVariants.value(for: traits)
}

The second drawback meant that to determine the Objective-C value of render node's metadata's platforms, the caller couldn't use the current API because the Value of VariantCollection<[AvailabilityRenderItem]?> is an optional value and not a collection. Thus, the caller not only needs to access the default value, filter and iterate over the variants, and incrementally apply the patch but also needs to switch over there patch and incrementally apply it depending on the type of Value:

var platforms: [AvailabilityRenderItem]? {
    var patchedValue = wrapped.platformsVariants.defaultValue
    for variant in wrapped.platformsVariants.variants where variant.traits == traits {
        for patch in variant.patch {
            switch patch {
            case .replace(let value):
                patchedValue = value
            case .add(nil):
                break // Adding nil does nothing
            case .add(let value?):
                if patchedValue == nil {
                    patchedValue = value
                } else {
                    patchedValue!.append(contentOf: value)
                }
            case .remove:
                patchedValue = nil
            }
        }
    }
    return patchedValue
}

The two steps to address all this boilerplate is:

  • Create an interface for types that a patch can be applied to
  • Move the API to VariantCollection (instead of VariantCollection.Value)

The new API did both these things, thus obsoleting the old API.
Further, most of the time the caller doesn't want to apply the patch to an arbitrary value. It makes the most sense to apply the patch to the default value for that variant collection. Because of this, I wrapped the applied(to:for:) API with value(for:).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks for the detailed explanation. The new API is definitely easier to read.

Copy link
Contributor

@mayaepps mayaepps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit b1c29d3 into swiftlang:main May 23, 2024
@d-ronnqvist d-ronnqvist deleted the objc-navigator-index-performance branch May 23, 2024 07:09
d-ronnqvist added a commit to d-ronnqvist/swift-docc that referenced this pull request May 23, 2024
…projects (swiftlang#917)

* Fix a navigator index performance for mixed Swift/Objective-C projects

rdar://127759734

* Use new apply-patch function in other places

* Deprecate unused `RenderNode/childrenRelationship(for:)`

* Move inner function in test to avoid warning about captured state

* Fix unrelated issue where the language in disambiguated references wasn't stable

* Move new indexable-render-node-representation protocol top-level

* Remove extra word in code comment

Co-authored-by: Maya Epps <[email protected]>

---------

Co-authored-by: Maya Epps <[email protected]>
Comment on lines +625 to +627
guard language == .swift else {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this means we can't index objective-c only frameworks? is even possible to have an objective-c frameworks or the top level node will always have be a swift node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, any single language framework would already be indexed 3 lines above this

let language = try index(renderNode, traits: nil)

mutating func remove()
}

extension Optional: VariantCollectionPatchable where Wrapped: VariantCollectionPatchable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯


extension Optional: VariantCollectionPatchable where Wrapped: VariantCollectionPatchable {
mutating func add(_ other: Wrapped?) {
guard var wrapped, let other else { return }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is wrapped referring to the type of the optional? In this case does it has to be unwrapped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapped is the optional's value (which is of the type Wrapped)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants