
ISSUE 729 - Streaming Imports from ODA #739

Merged
merged 61 commits into from
Mar 18, 2025

Conversation

@sebjf (Contributor) commented Feb 12, 2025

This fixes #729 - streaming imports from ODA.

Overview

The main goal of this PR is to change how files are imported, using a pattern where nodes are offloaded to the database as soon as possible to keep the memory consumption down.

The PR makes a number of changes to achieve this for ODA, and prepares the other importers for the same switch. In addition, a number of performance optimisations have been made, along with updates to support the gcc-11 & C++20 toolchain.

Importer Re-write

The biggest change is that the ODA import process has been "inverted".

Previously all importers would set up layers in GeometryCollector. These layers were containers that would receive primitives from ODA as a view was vectorized. At the end the collection of primitives would be turned into a scene graph.

By building the graph post-hoc, the GeometryCollector could do things such as filter empty transforms, and apply the world offset with a global view of the scene - but with the streaming import, this global view no longer exists.

The way the ODA importers work now is that each takes explicit control over the container that ODA will write primitives into (called the 'draw context' object).

As ODA vectorizes a scene, it makes recursive calls to the draw()/doDraw() overrides. Within each implementation, the importer decides whether a new layer/transform should be created for that geometry, and if so updates the draw context. The draw context itself exists on that call's stack frame. When the stack unwinds back to that frame, the method checks if the context object has any geometry, and if so only then does it create the transformation & metadata nodes.

Global properties such as the world offsets are set beforehand in the File Processors, using APIs specific to each format to get scene bounds from whatever information is available in the file headers.

Each Data Processor builds the tree in a different way, so how the necessary information is stored with the context depends on the file format. Typically though, as the trees are quite simple, they are mostly just local variables in the frame that owns the context.

For performance, metadata is only collected when a node has geometry.

This behaviour is specific to RVT, DGN, and DWG, as these are the only importers that use the vectorization approach. NWD uses a more traditional tree traversal, where we can add our own parameters to the recursive method, and get geometry from a direct function call.

The inheritance of the ODA importers now looks like the following:

NWD -> RepoSceneBuilder
RVT -> GeometryCollector -> RepoSceneBuilder
DWG -> DataProcessor -> GeometryCollector -> RepoSceneBuilder
DGN -> DataProcessor -> GeometryCollector -> RepoSceneBuilder

This hierarchy better reflects how logic is actually shared between these types.

Optimisations and Upgrades

While these changes were being made, bouncer was tested with a number of recent files that were failing on production. Guided by the Visual Studio Profiler while importing these files, a number of changes were made to caching behaviour, the ODA APIs used, and control flow, in order to reduce import times and memory consumption.

Changes

This PR makes the following specific changes:

  1. Add a new type RepoSceneBuilder to the modelutility namespace. RepoSceneBuilder accepts Repo Nodes and graph operations (addParent), and uses them to populate a collection asynchronously via a worker thread. RepoSceneBuilder is intended to be used in place of the RepoScene constructor for streaming-enabled importers.
  2. Add a new type RepoMeshBuilder to the odaHelper namespace. RepoMeshBuilder receives faces (from an ODA tessellation object) and outputs MeshNodes. This is the mesh building part of the previous GeometryCollector.
  3. Completely re-write GeometryCollector. Instead of the nested dictionaries, the new type uses a stack of Context objects that is managed by the callers. Transformation and Metadata node management is separated from the context management, with the owner being responsible for connecting the two. Transformation and Metadata nodes are created immediately using RepoSceneBuilder.
  4. Scene offsets are now computed in the File Processors of each format. This is because ODA may initialise Data Processors multiple times for a given vectorisation, and computing bounds ahead of time is not assumed to be a cheap operation.
  5. The Revit importer can now use the ODA file unload feature. This is turned off by default though because it did not demonstrate any real memory benefits.
  6. A new RepoQuery, AddParent, has been introduced, which when run as an update adds a set of UUIDs to a document's parents array.[1]
  7. A new type, BulkWriteContext, has been added to the database namespace. This new type allows owners to make multiple insert and update calls, and have them automatically dispatched in bulk. The Mongo database handler has an implementation of this object. The abstract database handler has a new method to return such an object.
  8. The type alias repo_face_t has been replaced with an actual type, which mimics the API of the old one (a std::vector) but does not perform any of its own allocations. This is because the profiler was showing heap allocations to be a significant part of the hot path. Corresponding face types have been added for GeometryCollector and RepoSceneBuilder.
  9. The Revit importer now caches materials based on the ODA object handle, saving rebuilding the same material multiple times.
  10. A new colour type, repo_color3d_t, has been introduced. repo_material_t now uses repo_color3d_t instead of float vectors to store colours. This reduces heap allocations and also fixes the size of the colours in a repo_material_t, disambiguating how transparency is handled. Native support for this new type has been added to RepoBSONBuilder.
  11. repo_material_t::checksum() has been updated to use a std::hash of built-in primitives, instead of computing a CRC of the string representation, as profiling was showing the string operations to be a significant part of the hot path.
  12. VertexMap no longer performs vertex indexing, but simply maintains arrays. Vertex indexing can now be performed by MeshNode instances on themselves (removeDuplicateVertices()). This removes indexing from the hot path, and also makes it available to all importers. RepoSceneBuilder calls removeDuplicateVertices() on all Mesh Nodes in its worker thread.
  13. TransformReductionOptimizer has been decommissioned.
  14. IFCUtilsParser has been updated to absorb Transformation Nodes where possible, on import.
  15. The RepoQuery implementation has been updated to be easier to follow & enable proper abstractions through the use of visitors. Variants are used to declare the potential types for the visitors, as per the standard library. The use of variants to define specific types allows making different sets of Repo Queries that are supported by different database methods. The use of the visitor pattern puts the implementation in the database handler module, where it should be.
  16. All the structs in repo_structs.h have been moved into the repo namespace. This is because anonymous namespaces are unique between translation units (such as libraries), but the structs should be fungible (if a type is not used across translation units, it should probably not be in repo_structs.h...)
  17. Unused static toString methods have been removed from repo_structs.h.
  18. NWD sometimes missed tree entries. This is now fixed.
  19. This bug for the DGN processor has been fixed by ODA, so this snippet has been reinstated: https://account.opendesign.com/support/issue-tracking/DGN-2274
  20. RepoScene can now take a project and database name in a constructor, in order to represent just a pointer to a revisioned scene that already exists as a collection. This is the way it is intended to work with RepoSceneBuilder.
  21. RepoBSON::replaceBinaryWithReference() is now true to its name and deletes the BinMapping once the BSON has been updated.
  22. RepoNode has a new virtual member, getSize(), that is intended to return the total allocated memory owned by that node. This has overrides for TransformationNode and MeshNode.
  23. Revit importer now gets all user parameters using the getParameters method, instead of getParamsList, so only populated parameters are imported. This skips unpopulated parameters, such as shared project parameters, which would be ignored by the previous logic anyway.
  24. Scene project and database name are now set in ImportFromFile as some callers assume they are set.
  25. The tests have been given a new type, SceneUtils, for querying the scene graph as-imported for the purposes of the unit tests.
  26. A new unit test file, ut_repo_model_import_oda.cpp has been added to contain any ODA specific regression tests.
  27. The projectHasMetaNodesWithPaths & projectHasGeometryWithMetadata test functions have been removed (in favour of SceneUtils), and the tree validation performed for ODA types in the system tests have been moved into ut_repo_model_import_oda.cpp.
  28. The SRC exporter has been decommissioned.
  29. Any simple template declarators have been removed from constructor declarations, as per a change in the standard.
  30. UploadTestNWDProtected no longer tests whether the project exists, because RepoSceneBuilder will create the project beforehand (it would be up to the user, via io, to destroy the collection if they didn't want to attempt an upload again).
  31. RepoScene unit tests have been updated as RepoScene constructor no longer allows two root nodes.
  32. RepoLog has been moved into its own shared library, so the singleton can be shared between bouncers various static and dynamic dependencies. The RepoLog API and convenience preprocessor defines have been updated to expose a standard ostringstream, and hide the boost implementation inside the repo_log.cpp module. This is because the version of boost on Ubuntu 20.04 will not compile under C++20, and C++20 is required for the new threading behaviour.

Dependencies

This PR adds one third-party dependency:

  1. https://github.com/cameron314/readerwriterqueue

This is BSD licensed and is included as source (header-only).

Footnotes

[1] This is a minor abstraction leak, in the sense that database operations should not know what the parents array is. However, the alternative is that every instance stores the same string; given that we expect this operation to be used a lot, and the whole point of this ticket is memory performance, the leak was considered an acceptable trade-off.

Comments/Future Work

  1. I have had issues with generic server errors. So far these appear to be genuine Mongo errors, and there is nothing to be done client side; however, we should keep an eye on it.

  2. For this ticket, we should bear in mind that the survey and base point contribute to the bounds if visible, and can undermine the revision world offset.

  3. Mongo's bulk_write performance can be improved with unordered writes, but we'd need to ensure ourselves that all inserts took place before updates. So far it seems the performance is good enough without it.

  4. Regarding upgrading Revit files in-place, the following snippet has been tested and is successful. However, the implications are more nuanced than thought. The act of saving the file can consume a lot of memory (2x as much), so actually saving and reloading within a single process does not reduce the highwater mark, and would have to be run on a much bigger machine as a separate processing stage. We'd need to understand better the gains of loading upgraded files before deciding this is worth it.

if (pDb->latestFileVersion() > pDb->getOriginalFileVersion())
{
	auto filename = "D:/3drepo/3drepobouncer_ISSUE729/temp/" + svcs.getTempFileName() + ".rvt ...";
	repoInfo << "Saving upgraded Revit file to " << convertToStdString(filename);
	svcs.writeFile(filename, pDb, false);
	pDb = nullptr; // Set this to null first, to prompt the cleanup of the database, before we read in the new file.
	pDb = svcs.readFile(filename);
	repoInfo << "...done.";
}
  5. Two other opportunities for performance improvements, for which there is not enough evidence to justify the cost for now, include:
    a. Asynchronous file writing from BlobFilesHandler
    b. A third thread to perform the serialisation to BSONs.

  6. It is possible to turn on multithreaded rendering for Revit, in theory, but this doesn't have any effect in practice: https://forum.opendesign.com/showthread.php?23889-Inquiry-Regarding-Reading-Performance-Optimization-for-ODBM

  7. Our imports differ from Navis in a number of ways which we know of (by design), but may be picked up by the cBIM team. These are:
    a. Entities take the metadata of their parents, which is most noticeable for Element Ids. This is usually intuitive, but we can get a situation where the Element Ids for instanced groups are overridden, which is not what cBIM users would expect.
    b. Navis views are always imported shaded, but the file can specify realistic.
    c. The Navis importer ignores the Hidden state of objects.

Link dump

https://forum.opendesign.com/showthread.php?19803-Why-memory-is-so-different-in-REVIT
https://forum.opendesign.com/showthread.php?19004-Optimizing-load
https://forum.opendesign.com/showthread.php?19668-Performance-with-large-nwd-files
https://docs.opendesign.com/tbim/bimrv_unload.html
https://www.mongodb.com/docs/manual/core/aggregation-pipeline-limits/

sebjf added 30 commits January 7, 2025 14:59
@sebjf sebjf requested a review from carmenfan February 18, 2025 09:05
@carmenfan carmenfan self-assigned this Feb 18, 2025
@carmenfan carmenfan changed the base branch from master to staging February 18, 2025 14:36
@carmenfan carmenfan merged commit 64ab771 into staging Mar 18, 2025
7 of 8 checks passed
@carmenfan carmenfan deleted the ISSUE_729 branch March 18, 2025 13:46
@carmenfan carmenfan removed their assignment Mar 18, 2025
Successfully merging this pull request may close these issues.

ODA importer should stream into database instead of holding scene in memory