Asankhaya Sharma's picture

Asankhaya Sharma

codelion

AI & ML interests

AI/ML, Dev Tools and Application Security

Recent Activity

Organizations

meraGPT's profile picture Lambda Security's profile picture National University of Singapore's profile picture Patched's profile picture Arcee AI's profile picture ZeroGPU Explorers's profile picture MLX Community's profile picture Social Post Explorers's profile picture Arcee Training Org's profile picture

Posts 11

view post
Post
1964
We recently worked with OpenAI to fine-tune gpt-4o and built the SOTA model for the patched-codes/static-analysis-eval benchmark. All the code and data patched-codes/synth-vuln-fixes on how we did it is available on their GitHub - https://github.com/openai/build-hours/tree/main/5-4o_fine_tuning.

Here are some tips based on our experience:

ā†’ Establish baseline with "conditioning" / prompting

ā†’ Task-specific datasets are ideal for PEFT; hard to beat gpt-4o on "broad" tasks

ā†’ Add your best system prompt to each example

ā†’ Ensure training data distribution is similar to inference data

ā†’ Shorten instructions with concise prompts; may require more examples.

ā†’ Define clear evaluation metrics (seriously, please eval!)

You can see more details on the benchmark and process here - https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities
view post
Post
2525
A new paper titled "STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis" shows the benefits of integrating static analysis with LLMs. (https://arxiv.org/abs/2406.10018)

Authors evaluate 4 key questions:

- How does each static analysis integration strategy perform in LLM-based repository-level code completion?
> They found that integrating static analysis in the prompting phase (especially with file-level dependencies) can achieve the substantially larger improvements than other phases.

- How do different combinations of integration strategies affect LLM-based repository-level code completion?
> Languages that are easier to analyze like Java show more improvements compared to dynamic languages like Python.

- How do static analysis integration strategies perform when compared or combined with RAG in LLM-based repository-level code completion?
> Static analysis and RAG are complementary and boost the overall accuracy.

- What are the online costs of different integration strategies in LLM-based repository-level code completion?
> Combining prompting-phase static analysis and RAG is the best option for cost-effectiveness.

In my @owasp App Sec keynote last year, I had described how one can do static analysis augmented generation (SaAG) to boost the accuracy of LLM based patches for vulnerability remediation. (you can see the talk here - https://www.youtube.com/watch?v=Cw4-ZnUNVLs)