Add boilerplate to the dashboard template about accessing hub data on S3 #4
Related issue: hubverse-org/hubTemplate#22
Fun fact: I'm not sure that either of the tools in the dashboard is able to use S3 (yet). That said, supporting S3 might really help operations, because then I wouldn't have to assume that a hub is on GitHub and fetch it from there.
Interesting: when we first discussed dashboards at the hub retreat in June (just before you started), we floated the idea of telling hubs they had to host their data in the cloud before using these tools. Now that y'all have rolled up your sleeves, it would be interesting to get a sense of how much time and complexity working with the data on GitHub adds. (All this aside, I assume the play here is to have a template markdown page that admins of cloud-hosted hubs can use when publishing their dashboards.)
LOL no one told me 😅
I mean, that's what we have right now... and it means that we have to first clone the hub before doing anything with it.
Oh, are you talking about publishing a dashboard on S3? Because the play for incorporating S3 hubs would be to conditionally fetch the data from S3, or download the hub if we can't.
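A minimal sketch of the "fetch from S3, otherwise clone" decision, assuming the hub's `admin.json` carries an optional `cloud` block with `enabled`, `host.storage_service`, and `host.storage_location` (the bucket name) fields. The function name `choose_data_source` and the example hub names are hypothetical; this only picks a source and does not perform the download.

```python
from typing import Any


def choose_data_source(admin_config: dict[str, Any], repo_url: str) -> tuple[str, str]:
    """Pick where a dashboard build should read hub data from (hypothetical helper).

    Assumes an admin.json "cloud" block shaped like:
    {"enabled": true, "host": {"storage_service": "s3", "storage_location": "<bucket>"}}
    """
    cloud = admin_config.get("cloud", {})
    host = cloud.get("host", {})
    bucket = host.get("storage_location")
    if cloud.get("enabled") and host.get("storage_service") == "s3" and bucket:
        # Cloud-enabled hub: read directly from its S3 bucket.
        return ("s3", f"s3://{bucket}")
    # Fallback: clone the GitHub repository, as the dashboard does today.
    return ("git", repo_url)


cloud_hub = {"cloud": {"enabled": True,
                       "host": {"storage_service": "s3", "storage_location": "example-hub"}}}
print(choose_data_source(cloud_hub, "https://github.com/example-org/example-hub"))
print(choose_data_source({}, "https://github.com/example-org/example-hub"))
```

Keeping the decision separate from the fetch means the existing clone path stays untouched for hubs that are not cloud-enabled.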
And for context, the dashboards are built as static sites where all the data have been pre-computed and are called via
FWIW, we did discuss this; it was just in the era when notes were recorded in Google Docs, so it was easy for ideas to get lost. In this case, we were using this doc (see the notes from Oct 9 and the summary at the bottom). The discussion is limited, though. More or less: "the current working solution requires clones of repos, which may be a problem if repo sizes get large. Consider S3 in the future." If we think the time has come to reconsider supporting cloud-hosted hubs, there is a question of what it would take to move to pulling data from S3. I think the answers may differ for hubPredEvalsData, which computes scores for the evals tool, and hub-dashboard-predtimechart, which creates the JSON data ingested by the viz tool:
[Edit: I added this comment after reading the discussion thread but not the actual issue topic, and now feel that I've sent the conversation even farther away from the topic here. If we want to discuss this further, maybe we should file a separate issue or start an RFC about it?]
Right, I'm muddying the waters by using the word "template" instead of "boilerplate". I heard in our Monday meeting that we wanted some text in hub-dashboard-template that admins of cloud-hosted hubs could use as a starting point for describing how people can access the S3 data (since the dashboard essentially serves as a hub's website). Maybe as a new "data" or "data access" tab? Same idea as the boilerplate in the README PR: hubverse-org/hubTemplate#23 (which I agree is not very interesting, but is a step in the right direction; ultimately, I'd love to see an actual template-driven process for creating a hub). Does that seem reasonable?
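One way such boilerplate could be filled in per hub is sketched below, assuming the bucket is publicly readable so the AWS CLI's `--no-sign-request` flag works without credentials. The `ACCESS_TEMPLATES` table, the `render_data_access` helper, and the page wording are all hypothetical; only S3 is filled in, and other storage services would be new keys.

```python
from string import Template

# Hypothetical "Data access" boilerplate, keyed by storage service.
ACCESS_TEMPLATES = {
    "s3": Template(
        "## Data access\n"
        "\n"
        "This hub's data are mirrored to the public AWS S3 bucket `$bucket`.\n"
        "No AWS account is needed. To list and download the files with the AWS CLI:\n"
        "\n"
        "    aws s3 ls s3://$bucket/ --no-sign-request\n"
        "    aws s3 sync s3://$bucket/ ./$bucket --no-sign-request\n"
    ),
}


def render_data_access(storage_service: str, bucket: str) -> str:
    """Return markdown describing how to access a hub's cloud data."""
    try:
        template = ACCESS_TEMPLATES[storage_service]
    except KeyError:
        raise ValueError(f"no data-access template for {storage_service!r}") from None
    return template.substitute(bucket=bucket)


print(render_data_access("s3", "example-hub"))
```

A hub admin would then only need to supply the bucket name; the surrounding text stays consistent across hubs.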
Thanks for adding that context, @elray1. If we didn't actually decide to rely on S3 for the hub dashboard, then I certainly don't want to derail all the work that's already been done and cause thrashing in the project! In the long run, continuing to tightly couple hub data access to git is something we'd want to address, but if S3 was never truly a requirement for this first dashboard iteration, I'm definitely not suggesting that we rejig the POC work y'all have been doing.
A couple of things:
This doesn't even have to be limited to S3; we could actually template this similar to how we template the data generation by doing the following:
Background
@nickreich noted on Slack that Hubverse hubs that sync their data to AWS:
The Hubverse should do a better job of advertising data available on S3.
Definition of done