ISSUE 729 - Streaming Imports from ODA #739
Merged
This fixes #729 - streaming imports from ODA.
Overview
The main goal of this PR is to change how files are imported, using a pattern where nodes are offloaded to the database as soon as possible to keep the memory consumption down.
The PR makes a number of changes to achieve this for ODA, and prepares the ground for swapping the other importers too. In addition, a number of performance optimisations have been made, along with updates to support the gcc-11 & C++20 toolchain.
Importer Re-write
The biggest change is that the ODA import process has been "inverted".
Previously all importers would set up layers in GeometryCollector. These layers were containers that would receive primitives from ODA as a view was vectorized. At the end the collection of primitives would be turned into a scene graph.
By building the graph post-hoc, the GeometryCollector could do things such as filter empty transforms, and apply the world offset with a global view of the scene - but with the streaming import, this global view no longer exists.
The way the ODA importers work now is that each takes explicit control over the container that ODA will write primitives into (called the 'draw context' object).
As ODA vectorizes a scene, it makes recursive calls to the `draw()`/`doDraw()` overrides. Within each implementation, the importer decides whether a new layer/transform should be created for that geometry, and if so updates the draw context. The draw context itself exists on that call's stack frame. When the stack unwinds back to that frame, the method checks whether the context object has received any geometry, and only then does it create the transformation & metadata nodes.
Global properties such as the world offsets are set beforehand in the File Processors, using APIs specific to each format to get scene bounds from whatever information is available in the file headers.
Each Data Processor builds the tree in a different way, so how the necessary information is stored with the context depends on the file format. Typically though, as the trees are quite simple, they are mostly just local variables in the frame that owns the context.
For performance, metadata is only collected when a node has geometry.
This behaviour is specific to RVT, DGN, and DWG, as these are the only importers that use the vectorization approach. NWD uses a more traditional tree traversal, where we can add our own parameters to the recursive method, and get geometry from a direct function call.
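To make the shape of this concrete, here is a minimal, self-contained sketch of the pattern. All the types and names here are hypothetical stand-ins, not the actual ODA or bouncer classes; in the real importers the recursion is driven by ODA's `draw()`/`doDraw()` overrides and primitives arrive via ODA callbacks:

```cpp
#include <iostream>
#include <stack>
#include <string>
#include <vector>

// Stand-in for the container that receives primitives while current.
struct DrawContext {
    std::vector<std::string> meshes;
};

// Stand-in for an ODA drawable.
struct Entity {
    std::string name;
    bool isLayerBoundary; // the importer's per-format decision
    bool hasGeometry;
    std::vector<Entity> children;
};

std::stack<DrawContext*> contextStack; // the 'current' draw context

void doDraw(const Entity& entity)
{
    // If this entity warrants its own layer, a fresh draw context is
    // created on this call's stack frame; otherwise its geometry falls
    // through into whichever context is already current.
    DrawContext context;
    if (entity.isLayerBoundary)
        contextStack.push(&context);

    if (entity.hasGeometry)
        contextStack.top()->meshes.push_back(entity.name + ":mesh");

    for (const auto& child : entity.children)
        doDraw(child); // ODA would recurse into child drawables for us

    if (entity.isLayerBoundary) {
        contextStack.pop();
        // Only on unwind, and only if the context actually received
        // geometry, are the transformation & metadata nodes created.
        if (!context.meshes.empty())
            std::cout << "create nodes for " << entity.name << "\n";
    }
}

int main()
{
    Entity root{"root", true, false, {
        {"emptyGroup", true, false, {}}, // produces no nodes at all
        {"slab", true, true, {}},        // gets its own layer
        {"detail", false, true, {}},     // contributes to root's layer
    }};
    doDraw(root);
}
```

The key point is that the context's lifetime is tied to the stack frame that decided a layer was warranted, so empty transforms are never created in the first place, without needing a post-hoc global view.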
The inheritance of the ODA importers has been reworked to better reflect how logic is actually shared between these types.
Optimisations and Upgrades
While these changes were being made, bouncer was tested with a number of recent files that were failing on production. Guided by the Visual Studio Profiler while importing these files, a number of changes were made to things like caching behaviour, the ODA APIs used, and control flow, in order to reduce import times and memory use.
Changes
This PR makes the following specific changes:
- `RepoSceneBuilder` has been added to the `modelutility` namespace. RepoSceneBuilder accepts Repo Nodes and graph operations (`addParent`), and uses them to populate a collection asynchronously via a worker thread. RepoSceneBuilder is intended to be used in place of the RepoScene constructor for streaming-enabled importers.
- `RepoMeshBuilder` has been added to the `odaHelper` namespace. RepoMeshBuilder receives faces (from an ODA tessellation object) and outputs MeshNodes. This is the mesh-building part of the previous `GeometryCollector`.
- `GeometryCollector` has been rewritten. Instead of the nested dictionaries, the new type uses a stack of Context objects that is managed by the callers. Transformation and Metadata node management is separated from the context management, with the owner being responsible for connecting the two. Transformation and Metadata nodes are created immediately using RepoSceneBuilder.
- A new `RepoQuery`, `AddParent`, has been introduced, which, when run as an update, adds a set of UUIDs to a document's `parents` array.[1]
- A new type, `BulkWriteContext`, has been added to the `database` namespace. This new type allows owners to make multiple insert and update calls, and have them automatically dispatched in bulk. The Mongo database handler has an implementation of this object, and the abstract database handler has a new method to return such an object.
- `repo_face_t` has been replaced with an actual type that mimics the API of the old one (a `std::vector`), but which does not perform any of its own allocations. This is because the profiler was showing heap allocations to be a significant part of the hot path. Corresponding face types have been added for `GeometryCollector` and `RepoSceneBuilder`. (A sketch of this non-allocating pattern follows the list.)
- `repo_color3d_t` has been introduced. `repo_material_t` now uses `repo_color3d_t` instead of float vectors to store colours. This reduces heap allocations and also fixes the size of the colours in a `repo_material_t`, disambiguating how transparency is handled. Native support for this new type has been added to `RepoBSONBuilder`.
- `repo_material_t::checksum()` has been updated to use a `std::hash` of built-in primitives, instead of computing a CRC of the string representation, as profiling was showing the string operations to be a significant part of the hot path. (See the hashing sketch after the list.)
- `VertexMap` no longer performs vertex indexing, but simply maintains arrays. Vertex indexing can now be performed by `MeshNode` instances on themselves (`removeDuplicateVertices()`). This removes indexing from the hot path, and also makes it available to all importers. `RepoSceneBuilder` calls `removeDuplicateVertices()` on all Mesh Nodes in its worker thread.
- `TransformReductionOptimizer` has been decommissioned. `IFCUtilsParser` has been updated to absorb Transformation Nodes where possible, on import.
- The `RepoQuery` implementation has been updated to be easier to follow and to enable proper abstractions through the use of visitors. Variants are used to declare the potential types for the visitors, as per the standard library. The use of variants to define specific types allows making different sets of Repo Queries that are supported by different database methods. The use of the visitor pattern puts the implementation in the database handler module, where it should be. (A minimal sketch of this arrangement follows the list.)
- The structs in `repo_structs.h` have been moved into the `repo` namespace. This is because anonymous namespaces are unique between translation units (such as libraries), but the structs should be fungible (if a type is not used across translation units, it should probably not be in repo_structs.h...).
- The `toString` methods have been removed from repo_structs.h.
- `RepoBSON::replaceBinaryWithReference()` is now true to its name and deletes the `BinMapping` once the BSON has been updated.
- `RepoNode` has a new virtual member, `getSize()`, that is intended to return the total allocated memory owned by that node. This has overrides for `TransformationNode` and `MeshNode`.
- Parameters are now read via the `getParameters` method, instead of `getParamsList`, so only populated parameters are imported. This skips unpopulated parameters, such as shared project parameters, which would be ignored by the previous logic anyway.
- … `ImportFromFile`, as some callers assume they are set.
- `SceneUtils` has been added, for querying the scene graph as-imported for the purposes of the unit tests.
- `ut_repo_model_import_oda.cpp` has been added to contain any ODA-specific regression tests.
- The `projectHasMetaNodesWithPaths` & `projectHasGeometryWithMetadata` test functions have been removed (in favour of SceneUtils), and the tree validation performed for ODA types in the system tests has been moved into ut_repo_model_import_oda.cpp.
- `UploadTestNWDProtected` no longer tests whether the project exists, because RepoSceneBuilder will make the project beforehand (it would be up to the user, via io, to destroy the collection if they didn't want to attempt an upload again).
- `RepoLog` has been moved into its own shared library, so the singleton can be shared between bouncer's various static and dynamic dependencies. The `RepoLog` API and convenience preprocessor defines have been updated to expose a standard `ostringstream`, and to hide the boost implementation inside the repo_log.cpp module. This is because the version of boost on Ubuntu 20.04 will not compile under C++20, and C++20 is required for the new threading behaviour.
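To illustrate the non-allocating face type mentioned above (the name, capacity, and members here are hypothetical, not the actual bouncer definition), the idea is a fixed-capacity type that keeps the familiar `std::vector`-style surface while storing its indices inline:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical sketch: mimics the parts of the std::vector API callers
// use (size(), operator[], iteration), but never touches the heap.
class FixedFace {
public:
    static constexpr std::size_t MaxIndices = 4; // tris & quads

    FixedFace() = default;
    FixedFace(std::initializer_list<std::uint32_t> indices) {
        assert(indices.size() <= MaxIndices);
        for (auto i : indices) data_[count_++] = i;
    }

    std::size_t size() const { return count_; }
    std::uint32_t operator[](std::size_t i) const { return data_[i]; }
    const std::uint32_t* begin() const { return data_.data(); }
    const std::uint32_t* end() const { return data_.data() + count_; }

private:
    std::array<std::uint32_t, MaxIndices> data_{}; // inline storage
    std::size_t count_ = 0;
};

int main() {
    FixedFace tri{0, 1, 2};          // no heap allocation here
    std::uint32_t sum = 0;
    for (auto idx : tri) sum += idx; // range-for works like std::vector
    assert(tri.size() == 3 && sum == 3);
}
```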
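Likewise, a minimal sketch of a checksum built from `std::hash` over built-in primitives, assuming a boost-style hash-combine step and simplified material fields (the real `repo_material_t` has more members and may mix its fields differently):

```cpp
#include <cstddef>
#include <functional>
#include <iostream>

// Combine a field into a running seed; the constants are the well-known
// boost::hash_combine mixing values.
inline void hashCombine(std::size_t& seed, float value) {
    seed ^= std::hash<float>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct Color { float r = 0, g = 0, b = 0; };

struct Material {
    Color diffuse, specular;
    float opacity = 1.0f;

    // Hashes the primitives directly; no string representation, no CRC.
    std::size_t checksum() const {
        std::size_t seed = 0;
        for (float f : {diffuse.r, diffuse.g, diffuse.b,
                        specular.r, specular.g, specular.b, opacity})
            hashCombine(seed, f);
        return seed;
    }
};

int main() {
    Material a, b;
    b.diffuse.r = 1.0f;
    std::cout << (a.checksum() != b.checksum()) << "\n"; // fields differ
}
```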
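And a minimal sketch of the variant-plus-visitor arrangement for queries, with hypothetical query types standing in for the real Repo Query set:

```cpp
#include <iostream>
#include <string>
#include <variant>
#include <vector>

// The query module declares a closed set of query types...
struct AddParent { std::string id; std::vector<std::string> parents; };
struct DeleteById { std::string id; };

// ...and a variant enumerating which of them a database method supports.
using RepoUpdate = std::variant<AddParent, DeleteById>;

// The database handler supplies the visitor, so the implementation
// lives in the handler module rather than in the query types.
struct MongoUpdateVisitor {
    void operator()(const AddParent& q) const {
        std::cout << "update " << q.id << ": add "
                  << q.parents.size() << " parents\n";
    }
    void operator()(const DeleteById& q) const {
        std::cout << "delete " << q.id << "\n";
    }
};

void runUpdate(const RepoUpdate& update) {
    std::visit(MongoUpdateVisitor{}, update);
}

int main() {
    runUpdate(AddParent{"node1", {"uuidA", "uuidB"}});
    runUpdate(DeleteById{"node2"});
}
```

Because the variant enumerates a closed set of types, each database method can accept only the variant it supports, and adding a new query type produces a compile error wherever a visitor fails to handle it.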
Dependencies
This PR adds one third-party dependency. It is BSD licensed and is included as source (header only).
Footnotes
[1] This is a minor abstraction leak, in the sense that Database operations should not know what the `parents` array is. However, the alternative is that every instance stores the same string, and given that we expect this operation to be used a lot, and that the whole point of this ticket is memory performance, the leak was considered an acceptable trade-off.
Comments/Future Work
I have had issues with generic server errors. So far it seems these are genuinely Mongo errors and there is nothing to be done client-side, but we should keep an eye on it.
For this ticket, we should bear in mind that the survey and base point contribute to the bounds if visible, and can undermine the revision world offset.
Mongo's bulk_write performance can be improved with unordered writes, but we'd need to ensure ourselves that all inserts took place before updates. So far it seems the performance is good enough without it.
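For reference, a sketch of how unordered writes could keep the insert-before-update guarantee: flush them as two separate batches. This is illustrative only, written against the mongocxx driver; bouncer would dispatch these through the database handler's BulkWriteContext rather than calling mongocxx directly like this.

```cpp
#include <bsoncxx/builder/basic/document.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/model/insert_one.hpp>
#include <mongocxx/model/update_one.hpp>
#include <mongocxx/options/bulk_write.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance instance{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto collection = client["test"]["scene"];

    // Unordered bulk writes let the server parallelise, but give up the
    // guarantee that operations run in submission order, so inserts and
    // updates go in two batches: all inserts are executed first.
    mongocxx::options::bulk_write opts;
    opts.ordered(false);

    auto inserts = collection.create_bulk_write(opts);
    inserts.append(mongocxx::model::insert_one{
        make_document(kvp("_id", "node1"))});
    inserts.execute(); // every document exists before any update runs

    auto updates = collection.create_bulk_write(opts);
    updates.append(mongocxx::model::update_one{
        make_document(kvp("_id", "node1")),
        make_document(kvp("$addToSet",
            make_document(kvp("parents", "uuidA"))))});
    updates.execute();
}
```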
Regarding upgrading Revit files in-place, the following snippet has been tested and is successful. However, the implications are more nuanced than first thought. The act of saving the file can consume a lot of memory (2x as much), so saving and reloading within a single process does not reduce the high-water mark; it would have to run on a much bigger machine as a separate processing stage. We'd need to understand the gains of loading upgraded files better before deciding this is worth it.
Two other opportunities for performance improvements, for which there is not enough evidence to justify the cost for now, include:
a. Asynchronous file writing from BlobFilesHandler
b. A third thread to perform the serialisation to BSONs.
It is possible to turn on multithreaded rendering for Revit, in theory, but this doesn't have any effect in practice: https://forum.opendesign.com/showthread.php?23889-Inquiry-Regarding-Reading-Performance-Optimization-for-ODBM
Our imports differ from Navis in a number of ways that we know of (by design), but which may be picked up by the cBIM team. These are:
a. Entities take the metadata of their parents, which is most noticeable for Element Ids. This is usually intuitive, but we can get a situation where the Element Ids for instanced groups are overridden, which is not what cBIM users would expect.
b. Navis views are always imported shaded, but the file can specify realistic.
c. The Navis importer ignores the Hidden state of objects.
Link dump
https://forum.opendesign.com/showthread.php?19803-Why-memory-is-so-different-in-REVIT
https://forum.opendesign.com/showthread.php?19004-Optimizing-load
https://forum.opendesign.com/showthread.php?19668-Performance-with-large-nwd-files
https://docs.opendesign.com/tbim/bimrv_unload.html
https://www.mongodb.com/docs/manual/core/aggregation-pipeline-limits/