GSoC 2024 brainstorming #3524

Closed
terriko opened this issue Nov 15, 2023 · 15 comments
Labels
gsoc Tasks related to our participation in Google Summer of Code

Comments


terriko commented Nov 15, 2023

GSoC 2024 has been officially announced and the schedule is up here:
https://developers.google.com/open-source/gsoc/timeline

We'll want to have some viable ideas nailed down around the end of January for when Python puts in an org application. But the first step to this is brainstorming on all ideas (including ones that may not work out for various reasons) so feel free to just throw ideas around here and we'll narrow it down later.

Some wishlist items off the top of my head to get the discussion started:

  1. Improved PURL/SBOM support and other input data quality tools
    • helping people annotate SBOMs with PURL data or otherwise improve SBOM quality
    • improved PURL support for our language parsers (some of this may happen before gsoc but I suspect there will still be work to do by then)
    • we'd previously discussed using additional metadata (e.g. from language package repositories) to improve scan quality but didn't get a taker for that gsoc project, so it might get rolled into a new one
  2. Improved Triage tooling:
    • warning when triage goes "out of date"
    • improved support for using multiple triage files
    • improved tooling and guidance (documentation) on how to triage, how to share triage, how to use shared triage. I suspect that in writing documentation people will find a few more gaps.
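One way to picture the "out of date" warning is a check that compares saved triage decisions against the findings of the latest scan. The sketch below is only an illustration under assumptions: the `TriageEntry` fields, the `find_stale_triage` helper, and the sample CVE IDs are all hypothetical and do not reflect cve-bin-tool's actual triage file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageEntry:
    """One triage decision. Field names are hypothetical, not cve-bin-tool's."""

    product: str
    version: str
    cve_id: str
    status: str  # e.g. "Mitigated" or "Ignored"


def find_stale_triage(triage, scan_findings):
    """Return triage entries that no longer match any current finding.

    scan_findings is a set of (product, version, cve_id) tuples taken
    from the most recent scan.
    """
    return [
        entry
        for entry in triage
        if (entry.product, entry.version, entry.cve_id) not in scan_findings
    ]


# Placeholder data: the CVE IDs below are made up for illustration.
triage = [
    TriageEntry("libfoo", "1.0.0", "CVE-2099-0001", "Mitigated"),
    TriageEntry("libbar", "2.3.1", "CVE-2099-0002", "Ignored"),
]
# The latest scan no longer reports libbar 2.3.1, so its triage entry is stale.
scan_findings = {("libfoo", "1.0.0", "CVE-2099-0001")}

stale = find_stale_triage(triage, scan_findings)
for entry in stale:
    print(
        f"warning: triage for {entry.cve_id} "
        f"({entry.product} {entry.version}) is out of date"
    )
```

A real implementation would also need to decide what "stale" means when a component is upgraded but the CVE still applies, which is one of the design questions this idea raises.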
terriko added the gsoc label Nov 15, 2023

terriko commented Nov 29, 2023

  • Look at more data sources, there may be an issue with gitlab (and also redhat?)


terriko commented Nov 29, 2023

  • UI streamlining/improvements?
    • multiple command line sets?
    • moving some things to config files only?


terriko commented Nov 29, 2023

  • cve-bin-tool as a service: what scaling issues might be encountered?
    • even if we don't run it, what problems would others find? Updates? pipelines?
  • Is there work to do to improve the experience of folk using github actions and other CI systems? Clouds? Containers?


terriko commented Nov 29, 2023

  • database design: any places we need to make improvements? Moving to a "real" database?
    • are we throwing away too much data? Can we make good use of ecosystem data or other things we currently discard?


terriko commented Nov 29, 2023

  • metrics, usage, gathering more data about how cve-bin-tool is used?


terriko commented Nov 29, 2023

  • cvss4


terriko commented Nov 29, 2023

  • better support for api2 and whatever comes next -- converting api2 to json?


terriko commented Nov 30, 2023

Two ideas from gsoc 2023 that didn't get chosen/completed:

We'll likely rework those a bit before using them, but they could potentially be viable options in 2024.

And one even older one to show that we've been thinking about triage for a while...


JCoonradt commented Jan 28, 2024

Hello, my name is Jensen! I am new to open-source development, but I have extensive programming experience and a passion for cybersecurity. I would love to get involved with the CVE Binary Tool Project, and I hope to apply to Google Summer of Code. An interesting project could be creating a Ghidra-based backend for the CVE Binary Tool scanner. By incorporating Ghidra’s binary pattern analysis, it could be possible to improve the vulnerability scanning. This project would probably fall under the consideration of improved SBOM generation tools. Also, if you have any advice on getting involved with the CVE Binary Tool Project, that would be amazing!


terriko commented Jan 29, 2024

@JCoonradt that sounds intriguing, but I don't think we have any mentors familiar with Ghidra to run that this year, and we've got a few more urgent needs that are much more likely to get selected. The biggest issue in my brain right now is anything that can improve the matching for the language parsers and sboms to avoid false positives based on common names, but we'll be discussing ideas and prioritizing during our monthly meeting on Wednesday so I'll have some better described project ideas after that.


anthonyharrison commented Jan 29, 2024

@JCoonradt Thanks for the idea, but Ghidra is a very complicated Java project. Getting Ghidra to produce information that we could use in the tool is non-trivial and certainly not within the capabilities of a typical GSoC student. I have looked briefly at Ghidra and I don't think it is suitable for generating the binary checkers we need for cve-bin-tool. However, if you have a prototype to show how the binary analysis capabilities of Ghidra could be used, then it would be interesting to look at. (Note that there are already a number of binary analysis tools in Python which might be worth considering, as they may offer a better integration route for cve-bin-tool.)

I agree with @terriko that improving the language parsers to remove false positives, and improving component matching and the SBOM triage process (there is a lot of work to do in this area now), are more pressing at this stage.


terriko commented Jan 31, 2024

Notes from Jan monthly meeting:

  • @anthonyharrison is interested in mentoring a triage-related project: 3 formats of primary interest right now:
    • CSAF - being used in medical field, not a very nice format to work with and will need a library
    • OpenVEX (from ChainGuard) very limited and simple
    • CycloneDX - released, stable, definitely from people who really understand this space
    • Unsure if triage will be per-cve, per-component, per-sbom. Definitely some design decisions here.
    • People are still saying "I want 0 vulnerabilities", so triage can help make it easier for people to handle scans
    • may need automation/tooling around this to make it easier
  • Other ideas: CVSS4 is coming, we should be prepared to support it
  • False positives / improved data matching
    • possible tidying to avoid reporting unknown vendors
    • Make our language scanners generate PURL & integrate PURL2CPE
    • (May need a PURL2NotCPE to avoid false positives)
    • PURL support in OSV may be better
    • We're throwing a lot of data away that we should maybe consider using (e.g. CWE)
    • Note: PURL->CPE is not a 1-1 mapping at all (and likely never will be)
  • Data sources may need revisiting: GitLab not working consistently, others may need review
  • database: over 2 million records,
    • what does that mean for performance?
    • Should we consider a "real" database?
    • Should we make the database configurable?
    • Use SQLAlchemy
    • Are records being overwritten incorrectly?
  • SBOM generation from the binary: could be improved.
  • cve-bin-tool is really 3 tools: binary scanner, package/language scanner, sbom scanner. Do we need better separation/refactoring?
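The note above that PURL->CPE is not a 1-1 mapping can be illustrated with a small sketch. Everything here is an assumption for illustration: `parse_purl` handles only the simple `pkg:type/namespace/name@version` form (it is not a full PURL parser), and the `PURL_TO_CPE` table, the `candidate_cpes` helper, and the vendor and product names in it are made up rather than taken from PURL2CPE or cve-bin-tool.

```python
from urllib.parse import unquote


def parse_purl(purl):
    """Split a simple PURL of the form pkg:type/namespace/name@version.

    Qualifiers (?...) and subpaths (#...) are dropped. This is a sketch,
    not a complete implementation of the PURL specification.
    """
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None
    parts = [unquote(p) for p in path.strip("/").split("/")]
    if len(parts) < 2:
        raise ValueError(f"PURL needs a type and a name: {purl!r}")
    return {
        "type": parts[0].lower(),
        "namespace": "/".join(parts[1:-1]) or None,
        "name": parts[-1],
        "version": unquote(version) if version else None,
    }


# Hypothetical lookup table: (purl type, name) -> candidate (vendor, product)
# pairs. One package can map to several CPE candidates, or to none at all.
PURL_TO_CPE = {
    ("npm", "example-lib"): [
        ("example_vendor", "example-lib"),
        ("other_vendor", "example_lib"),
    ],
}


def candidate_cpes(purl):
    """Return every CPE 2.3 string that might correspond to the PURL."""
    parsed = parse_purl(purl)
    version = parsed["version"] or "*"
    pairs = PURL_TO_CPE.get((parsed["type"], parsed["name"]), [])
    return [
        f"cpe:2.3:a:{vendor}:{product}:{version}" + ":*" * 7
        for vendor, product in pairs
    ]


for cpe in candidate_cpes("pkg:npm/example-lib@1.2.3"):
    print(cpe)
```

Because the lookup returns a list, the caller has to handle zero, one, or several candidates, which is where a "PURL2NotCPE" style exclusion list could help reduce false positives.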

Idea not for gsoc:

  • could we get a training course for cve-bin-tool?


terriko commented Jan 31, 2024

Additional notes from Terri:

  • cve-bin-tool will likely get 1-2 slots, so I'm going to focus on writing up our 2 most urgent ideas (triage and false positives) first
  • Some of the others should probably be written up as feature requests that could be taken outside of gsoc projects
  • I'm very happy to have more mentors if anyone's interested! Let me know here or privately if you're interested in mentoring.


terriko commented Apr 17, 2024

Brainstorming is long over for this year (we're into selection now) so I'll close up this thread.

terriko closed this as completed Apr 17, 2024

3 participants