When trying to understand why a shard does not seem to be merging well, it is surprisingly difficult to gain visibility into what is actually happening, e.g. in cases like #14163 and #13226.
At Amazon Product Search, we are also trying to understand how our service behaves under update storms (many sudden real-time catalog updates), and its impact on merging / NRT segment replication.
`IndexWriter` has an `InfoStream` that reports in amazing detail on everything that is happening, but its output is too voluminous.
I think we could make a small improvement to `InfoStream`. Today, it writes under different components, e.g. `SM` for segment merging. I'd like to add a new component, `ST` (for "segment tracing"), which provides a small amount of output about:

- each flush: start and end, size, deletes
- each merge: start and end, which segments, how many deletes at the start, how many carryover deletes (deletes that happened while the merge was running), when deletes are applied/written, and the time to merge each index section (doc values, postings, KNN, etc.)
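To make the per-merge fields above concrete, here is a minimal Java sketch of what one compact `ST` line for a finished merge might carry. The `MergeTrace` class, its field names, and the `key=value` line format are hypothetical illustrations, not existing Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical summary of one finished merge, rendered as a single ST line.
 * Names and format are illustrative assumptions, not Lucene API.
 */
final class MergeTrace {
  final String outputSegment;        // name of the newly merged segment, e.g. "_7"
  final List<String> inputSegments;  // segments being merged away
  final int deletesAtStart;          // deleted docs in the inputs when the merge began
  final int carryoverDeletes;        // deletes that arrived while the merge was running
  // Per-section merge time in millis; LinkedHashMap keeps the order sections were added.
  final Map<String, Long> sectionMillis = new LinkedHashMap<>();

  MergeTrace(String outputSegment, List<String> inputSegments,
             int deletesAtStart, int carryoverDeletes) {
    this.outputSegment = outputSegment;
    this.inputSegments = List.copyOf(inputSegments);
    this.deletesAtStart = deletesAtStart;
    this.carryoverDeletes = carryoverDeletes;
  }

  /** Records how long one index section (postings, doc values, KNN, ...) took to merge. */
  MergeTrace section(String name, long millis) {
    sectionMillis.put(name, millis);
    return this;
  }

  /** One short, greppable line, e.g. to pass as the message for the ST component. */
  String toMessage() {
    StringBuilder sb = new StringBuilder("merge end seg=").append(outputSegment)
        .append(" from=").append(String.join(",", inputSegments))
        .append(" delsAtStart=").append(deletesAtStart)
        .append(" carryoverDels=").append(carryoverDeletes);
    for (Map.Entry<String, Long> e : sectionMillis.entrySet()) {
      sb.append(' ').append(e.getKey()).append('=').append(e.getValue()).append("ms");
    }
    return sb.toString();
  }
}
```

Keeping each event on one line of `key=value` pairs is a design choice that makes the output cheap to emit and trivial to parse later.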
`IW`/`SM` already write much of this to `InfoStream`, but it is too scattered and diffuse. I'm hoping a new `ST` component can be lighter weight and carry the important debugging details that would help us understand issues like the ones linked/described above. An application could then set an `InfoStream` that captures just the `ST` messages.
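A minimal sketch of such a capturing sink, assuming the proposed `ST` component exists. Lucene's `InfoStream` exposes `isEnabled(String component)` and `message(String component, String message)` (plus `close()`), and is installed via `IndexWriterConfig.setInfoStream`. To stay self-contained, the hypothetical `SegmentTraceSink` below only mirrors those two methods rather than extending the real class; the injected clock and the line format are also assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

/**
 * Hypothetical sink that keeps only "ST" (segment tracing) messages.
 * Mirrors the isEnabled/message shape of Lucene's InfoStream; a real version
 * would extend org.apache.lucene.util.InfoStream.
 */
final class SegmentTraceSink {
  static final String COMPONENT = "ST"; // proposed component, not in Lucene today
  private final LongSupplier clock;     // injected so timestamps are testable
  private final List<String> lines = new ArrayList<>();

  SegmentTraceSink(LongSupplier clock) {
    this.clock = clock;
  }

  /** Returning false for IW, SM, etc. lets the writer skip building those messages. */
  boolean isEnabled(String component) {
    return COMPONENT.equals(component);
  }

  /** Synchronized because indexing and merge threads may log concurrently. */
  synchronized void message(String component, String message) {
    if (isEnabled(component)) { // defensive: callers normally check isEnabled first
      lines.add(clock.getAsLong() + " " + component + ": " + message);
    }
  }

  synchronized List<String> lines() {
    return Collections.unmodifiableList(new ArrayList<>(lines));
  }
}
```

Prefixing each line with a timestamp matters here: segment lifetimes can only be reconstructed later if every event carries its time.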
Once we have this, the second part of this effort is a simple tool that can digest the output of the `ST` `InfoStream` and visualize it, e.g. producing videos like this one, and maybe a 2D interactive canvas/chart that lays out a graphical rendition of all segments and their lifetimes.
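The core of such a tool is turning the event stream into per-segment lifetimes, which a chart can then draw as one horizontal bar per segment. The Java sketch below assumes a hypothetical line format, `<millis> ST: <event> seg=<name> [from=<a,b,...>]`, where `flush end` creates a segment and `merge end` creates the merged segment and retires its inputs; the real `ST` format would still need to be defined.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical digester: ST lines in, segment birth/death times out. */
final class SegmentLifetimes {
  /** Lifetime of one segment in millis; died == -1 means still live at end of log. */
  static final class Span {
    final long born;
    long died = -1;
    Span(long born) { this.born = born; }
  }

  static Map<String, Span> parse(List<String> lines) {
    Map<String, Span> spans = new LinkedHashMap<>(); // keeps creation order for plotting
    for (String line : lines) {
      String[] tokens = line.trim().split("\\s+");
      if (tokens.length < 4 || !tokens[1].equals("ST:")) {
        continue; // not a segment-tracing line
      }
      long t = Long.parseLong(tokens[0]);
      String event = tokens[2] + " " + tokens[3];
      String seg = null;
      String[] from = new String[0];
      for (int i = 4; i < tokens.length; i++) {
        if (tokens[i].startsWith("seg=")) {
          seg = tokens[i].substring(4);
        } else if (tokens[i].startsWith("from=")) {
          from = tokens[i].substring(5).split(",");
        }
      }
      if (seg == null) {
        continue;
      }
      if (event.equals("flush end") || event.equals("merge end")) {
        spans.put(seg, new Span(t)); // a new segment becomes visible
      }
      if (event.equals("merge end")) {
        for (String input : from) {    // merged-away inputs die when the merge commits
          Span s = spans.get(input);
          if (s != null) {
            s.died = t;
          }
        }
      }
    }
    return spans;
  }
}
```

For example, two flushed segments `_0` and `_1` merged into `_2` yield three spans, with `_0` and `_1` ending at the merge's timestamp and `_2` still live.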