@MRtrix3/mrtrix3-devs
Currently contemplating the prospect of having to overcome my clankophobia (how is this not a term yet?) and trying to make use of AI coding tools to make a dent in the list of bugs / feature requests.
But before I even make an attempt, I want to run some thoughts past the team.
My biggest concern with the utilisation of these tools specifically in the context of research software is the prospect of erroneous code written by an AI having a deleterious effect on external scientific research.
We can quote:
Covered Software is provided under this License on an "as is" basis, without ... warranties that the Covered Software is free of defects ... (or) fit for a particular purpose ...
all we want, but I still feel a large burden of responsibility for ensuring quality and robustness in what is provided here. When some issue is identified, I put a lot of effort into backtracking through git blame and GitHub to establish what went wrong and when, including who wrote & reviewed the code, and establishing bidirectional links for anyone to follow along.
AI throws a spanner into the works here. It is going to become increasingly easy for both core developers and external researchers to generate and contribute functional code through prompting rather than hard labour; indeed one day C++ changes could be proposed by individuals with no knowledge of the language. We can be diligent about the addition of tests to verify additions or changes (though I'm conscious of AI being used to generate tests also, which could lead to a garbage-in, garbage-out death spiral if one is not careful), but to me that's not enough.
Prior to myself or anyone else making changes to MRtrix through such tools, I'd like to establish clear policies around documenting their use, integrated into the contribution instructions.
Here's what I think needs forethought:
Tracking
We need the capability to retrospectively identify:
- Whether the use of specific tools consistently leads to unexpected problems.
- How much of the code base is AI- versus human-generated.
This to me means documenting within individual commit messages that such tools have been used.
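To make this concrete, a commit message could carry a trailer naming the tool used; a minimal sketch, where the trailer name and placeholder version string are purely illustrative rather than a proposed convention:

```
Fix premature buffer deallocation in image handler

The defect was localised and the fix drafted with the assistance
of an AI coding agent; the diff was reviewed by hand prior to commit.

AI-assisted: <tool name and model version>
```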
Accountability
We need to be able to infer retrospectively whether a given problem was introduced through an error in code / communication written by a human, versus through imperfect review of code that was written by an AI.
I've publicly self-flagellated on the importance of accountability for research software errors, so I don't like the thought of lazy AI use having deleterious effects on external scientific projects for which responsibility is deferred to silicon.
Attribution
I don't want the statistics on contributions to the project to be undermined by this novel capacity to generate voluminous content for little effort.
We already have the @MRtrixBot account for attributing auto-generated content, to mitigate distortion of author line counts. Generative AI shares certain features with this: yes, it's not just running a shell script, since it involves prompt engineering and adequate problem definition; but there is nevertheless a mismatch between the effort invested and the reported / perceived proportion of one's contribution to the software.
The first thought that came to mind here was the way in which I often utilise the @MRtrixBot account for auto-generated content, where I generate three commits:
- Authored exclusively by myself, adding or modifying the code responsible for the auto-generation
- Authored exclusively by @MRtrixBot, contributing the generated content
- Authored exclusively by myself, performing any requisite review / cleanup / revision
A similar approach to generative AI use, where one commit contains verbatim what was produced by the AI agent and then subsequent commits contain the human revisions to such, would to me be the most faithful encoding. It would however be a fair bit of overhead, which others might not be willing to take on. And it only really works for a singular prompt-response process and not for more dynamic pair-programming. I also have some recollection that in the early LLM boom popular opinion was that AI should not be considered an "author" of code, though I'm not finding any such arguments at the current time and don't recall the exact reasoning.
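For concreteness, here is a sketch of how that three-commit pattern might translate to AI-generated content using stock git; the commit subjects and the bot email address are placeholders:

```
# Commit 1: authored exclusively by the human: problem / prompt definition
git commit -m "Add specification for revised interface documentation"

# Commit 2: verbatim AI output, attributed to the bot account
git commit --author="MRtrixBot <bot@example.org>" \
    -m "Add AI-generated draft of revised interface documentation"

# Commit 3: authored exclusively by the human: review / cleanup / revision
git commit -m "Revise AI-generated interface documentation"
```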
The other related technique is the use of git trailers, specifically "Co-authored-by". This is added automatically when accepting explicit code suggestions from PR code review within GitHub (personally I also use this if I manually make a change based on a suggestion from another dev that wasn't formatted as such), and co-contributor icons will show up in the GitHub interface.
Putting all of this together, I think the most sensible choice is to expand the use of git trailers.
My initial searching on the topic led me to this post (which is now also captured in the AI response to web search):
- Assisted-by means I wrote the code and AI helped, either through prompts or inline completions, up to roughly 33% generated code.
- Co-authored-by is the 50/50-ish bucket, ranging from 34-66% generated code.
- Generated-by means the majority of this code came from AI, roughly 67-100%.
- Commit-generated-by means AI summarized a conventional commit message for me (or similar trivial contribution), but none of the code was AI-modified in any meaningful way.
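Under that taxonomy, a commit whose code was mostly machine-produced might look something like this (the subject line and tool name are invented for illustration):

```
Add per-shell gradient table summary to command output

Implementation generated by an AI agent from a written
specification, then reviewed and lightly edited by hand.

Generated-by: <tool name and model version>
```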
Other posts suggest even richer metadata, such as the precise model and prompt used. If this were to be embedded as part of a standardised, integrated, automated workflow I'd be down; but in the absence of such I think that's too intrusive. Tools like Git AI and GitButler seek to provide accountability on a per-line basis within commits, but I don't think we should be demanding such stringent requirements from anyone looking to contribute to the project. Git trailers, by contrast, are not a massive request, and could be added retrospectively through rebasing if need be.
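On that last point: since Git 2.32, trailers can be appended without retyping the commit message, so retrospective annotation during a rebase is cheap (trailer name again illustrative):

```
# Append a trailer to the most recent commit, leaving the message otherwise intact
git commit --amend --no-edit --trailer "Assisted-by: <tool name>"

# For older commits, mark them as "edit" in an interactive rebase...
git rebase -i <base-branch>
# ...then at each stop, amend and continue:
git commit --amend --no-edit --trailer "Assisted-by: <tool name>"
git rebase --continue
```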
Completely open to opinions and alternative suggestions. I have near-zero experience with these tools, so am not speaking from a position of expertise. I am however well-versed in technical debt and protection against perverse incentives, so am invested in ensuring this does not turn out in the future to be a complete disaster.