-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MFS Support to Reprovider Strategy #10704
Conversation
Signed-off-by: Abhinav Prakash <[email protected]>
Signed-off-by: Abhinav Prakash <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thank you very much. Upon looking closely at this PR, I think this could be approached better.
- The
boxo/provider
module implements the different providers and the MFS provider should be implemented there in line with the others. - The MFS provider could then be very similar to the PinnedProvider which already traverses each recursive-pin dag from its root. It probably should receive the
offlineIPLDFetcher
that Kubo can already provide (just likeIPLDFetcher
is passed to the PinnedProvider.). - The MFS provider can hopefully traverse blocks recursively very similar to how it is done for the recursive pins, using the
fetcherHelpers.BlockAll
(https://github.com/ipfs/boxo/blob/4645270f16bb6c0943381065596c18c667440e09/provider/provider.go#L112-L117). In principle we don't need another dag-traversal implementation. - If it is easier, we can make a general "DAGProvider" which receives a root CID and does the rest. That way we don't need to figure out the MFS root from Boxo and we can pass that from Kubo. It would be more flexible too.
- In Kubo then we make a "prioritized provider" that includes the MFS and the Pinned provider, rather than mixing the MFS part with pins.
That is how I would try this first, if you have questions or something is not clear let me know! This will require a PR in boxo first, and then a follow up in Kubo.
Hi @hsanjuan |
Signed-off-by: Abhinav Prakash <[email protected]>
I have one question.... |
You need to add a changelog entry describing the change, similar to other entries in there. You can leave that for now until things are wrapped. |
|
||
if onlyPinned { | ||
return provider.NewPinnedProvider(false, in.Pinner, in.IPLDFetcher) | ||
mfsProvider := provider.NewDAGProvider(mfsRoot.Cid(), in.IPLDFetcher) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PsychoPunkSage As noted by Hector above, this PR needs to include an end-to-end test in test/cli
that ensures MFS provider is not prefetching any data that was not already in the local datastore.
Context:
- MFS is different from local blockstore and local pins in that it is lazy-loaded.
- I can do
ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui/ /Tsize-test
and my node won't fetch 4EiB of data unless i start walking the dag. - People can
ipfs files cp $(ipfs resolve -r /ipns/en.wikipedia-on-ipfs.org/) /wikipedia
and use this to protect articles they visited from gc, without fetching entire thing. - We don't want providing to trigger fetch of any CIDs that are not already in local block-store.
- I can do
- Perhaps we should explicitly rename/replace
in.IPLDFetcher
here within.OfflineIPLDFetcher
? We already initialize both in./core/node/core.go
and store them separately to be extra sure offline one is used when it matters. - Separate concern: what happens once DAG walk errors on reference to a CID that is not in local store? We don't want to abort entire walk, but we should stop going deeper in this specific branch, and continue walking the DAG in other directions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triage notes:
- The above concerns seem to be addressed by @hsanjuan in feat: NewDAGProvider to walk partial DAGs in offline mode boxo#847
- what needs to happen here is to ensure we use offline version of
in.IPLDFetcher
- once it returns "not found", the
provider.NewDAGProvider
here will just skip branch and continue walk elsewhere.
- once it returns "not found", the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triage notes:
- This and feat: NewDAGProvider to walk partial DAGs in offline mode boxo#847 need both be rebased due to buffered provider added in provider: add a buffered KeyChanFunc. boxo#870
- Walking the entire MFS should occur only for
all
,pinned
flat
will announce everything in blockstore, so no need to do any extra work (its main purpose is to be memory-inexpensive option)roots
should only return root CIDs of files and dirs, and no internal blocks- if not possible, we should only announce MFS root CID and avoid doinf the walk
- this is important because some users may have a lot of data in MFS, and explicitly chose
roots
to announce bare minimum
Fixes #10386. Overseeds #10704. This augments the current provider KeyChanFuncs with the CIDs coming from the MFS DAG. MFS CIDs are listed using the offline fetcher, which has been updated with SkipNotFound so that no errors happen during traversal when nodes are missing. Strategies have been updated: - For "roots" and "all" strategy we announce the MFS root only, after the pins. - For the "pinned" strategy, we announce the MFS Cids after the pin cids. - A new "mfs" strategy is added, only for MFS cids. The code around strategy selection has been slightly change for the better (I hope).
Following up on #10754 |
closes #10386