Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MFS Support to Reprovider Strategy #10704

Closed
wants to merge 4 commits into from

Conversation

PsychoPunkSage
Copy link
Contributor

@PsychoPunkSage PsychoPunkSage commented Feb 7, 2025

Solution

  • Modified NewPinnedProvider to walk MFS content
  • Respects onlyRoots flag when announcing MFS content
  • Only walks local content (offline mode)
  • Announces directory root CIDs and file root CIDs per IPIP-402

closes #10386

Copy link
Contributor

@hsanjuan hsanjuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you very much. Upon looking closely at this PR, I think this could be approached better.

  • The boxo/provider module implements the different providers and the MFS provider should be implemented there in line with the others.
  • The MFS provider could then be very similar to the PinnedProvider which already traverses each recursive-pin dag from its root. It probably should receive the offlineIPLDFetcher that Kubo can already provide (just like IPLDFetcher is passed to the PinnedProvider.).
  • The MFS provider can hopefully traverse blocks recursively very similar to how it is done for the recursive pins, using the fetcherHelpers.BlockAll (https://github.com/ipfs/boxo/blob/4645270f16bb6c0943381065596c18c667440e09/provider/provider.go#L112-L117). In principle we don't need another dag-traversal implementation.
  • If it is easier, we can make a general "DAGProvider" which receives a root CID and does the rest. That way we don't need to figure out the MFS root from Boxo and we can pass that from Kubo. It would be more flexible too.
  • In Kubo then we make a "prioritized provider" that includes the MFS and the Pinned provider, rather than mixing the MFS part with pins.

That is how I would try this first, if you have questions or something is not clear let me know! This will require a PR in boxo first, and then a follow up in Kubo.

@PsychoPunkSage
Copy link
Contributor Author

Hi @hsanjuan
I tried to implement what you mentioned.... please have a look PR:boxo#847 + Made some changes in this PR too... (It will fail to compile as NewDAGProvider isn't available in boxo/providers rn)

Signed-off-by: Abhinav Prakash <[email protected]>
@PsychoPunkSage
Copy link
Contributor Author

I have one question....
What I have to do in CHANGELOG.md?? Can you please help me in that?? I saw a comment + a workflow for the same...

@hsanjuan hsanjuan changed the title chore: Add MFS Support to Reprovider Strategy Add MFS Support to Reprovider Strategy Feb 17, 2025
@hsanjuan
Copy link
Contributor

I have one question.... What I have to do in CHANGELOG.md?? Can you please help me in that?? I saw a comment + a workflow for the same...

You need to add a changelog entry describing the change, similar to other entries in there. You can leave that for now until things are wrapped.


if onlyPinned {
return provider.NewPinnedProvider(false, in.Pinner, in.IPLDFetcher)
mfsProvider := provider.NewDAGProvider(mfsRoot.Cid(), in.IPLDFetcher)
Copy link
Member

@lidel lidel Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PsychoPunkSage As noted by Hector above, this PR needs to include an end-to-end test in test/cli that ensures MFS provider is not prefetching any data that was not already in the local datastore.

Context:

  • MFS is different from local blockstore and local pins in that it is lazy-loaded.
    • I can do ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui/ /Tsize-test and my node won't fetch 4EiB of data unless i start walking the dag.
    • People can ipfs files cp $(ipfs resolve -r /ipns/en.wikipedia-on-ipfs.org/) /wikipedia and use this to protect articles they visited from gc, without fetching entire thing.
    • We don't want providing to trigger fetch of any CIDs that are not already in local block-store.
  • Perhaps we should explicitly rename/replace in.IPLDFetcher here with in.OfflineIPLDFetcher? We already initialize both in ./core/node/core.go and store them separately to be extra sure offline one is used when it matters.
  • Separate concern: what happens once DAG walk errors on reference to a CID that is not in local store? We don't want to abort entire walk, but we should stop going deeper in this specific branch, and continue walking the DAG in other directions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triage notes:

@lidel lidel mentioned this pull request Mar 11, 2025
56 tasks
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triage notes:

  • This and feat: NewDAGProvider to walk partial DAGs in offline mode boxo#847 need both be rebased due to buffered provider added in provider: add a buffered KeyChanFunc. boxo#870
  • Walking the entire MFS should occur only for all, pinned
    • flat will announce everything in blockstore, so no need to do any extra work (its main purpose is to be memory-inexpensive option)
    • roots should only return root CIDs of files and dirs, and no internal blocks
      • if not possible, we should only announce MFS root CID and avoid doinf the walk
      • this is important because some users may have a lot of data in MFS, and explicitly chose roots to announce bare minimum

hsanjuan added a commit that referenced this pull request Mar 13, 2025
Fixes #10386. Overseeds #10704.

This augments the current provider KeyChanFuncs with the CIDs coming from the MFS DAG. MFS CIDs
are listed using the offline fetcher, which has been updated with SkipNotFound so that no
errors happen during traversal when nodes are missing.

Strategies have been updated:

- For "roots" and "all" strategy we announce the MFS root only, after the
pins.
- For the "pinned" strategy, we announce the MFS Cids after the pin cids.
- A new "mfs" strategy is added, only for MFS cids.

The code around strategy selection has been slightly change for the better (I
hope).
@hsanjuan
Copy link
Contributor

Following up on #10754

@hsanjuan hsanjuan closed this Mar 13, 2025
@lidel lidel mentioned this pull request Mar 20, 2025
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include local MFS entity roots in Reprovider.Strategy=roots|pinned+mfs|mfs
3 participants