Dutch and Iranian security researchers have created an automated genAI tool that can scan huge open source repositories and patch vulnerable code that could compromise applications.
Tested by scanning GitHub for a particular path traversal vulnerability in Node.js projects that’s been around since 2010, the tool identified 1,756 vulnerable projects, some described as “very influential,” and led to 63 projects being patched so far.
The tool opens the possibility for genAI platforms like ChatGPT to automatically create and distribute patches in code repositories, dramatically increasing the security of open source applications.
But the research, described in a recently published paper, also points to a serious limitation in the use of AI that will need to be fixed for this solution to be effective. While automated patching by a large language model (LLM) dramatically improves scalability, the patches it generates might also introduce new bugs.
And it might be difficult to fully eradicate the particular vulnerability they worked on because, after 15 years of exposure, some popular LLMs seem to have been poisoned with it.
Why? Because LLMs are trained on open source codebases, where that bug is buried.
In fact, the researchers found that if an LLM is contaminated with a vulnerable source code pattern, it will generate that code even when instructed to synthesize secure code. So, the researchers say, one lesson is that popular vulnerable code patterns need to be eradicated not only from open-source projects and developers’ resources, but also from LLMs, “which can be a very challenging task.”
Hackers have been planting bad code for years
Threat actors have been planting vulnerabilities in open source repositories for years, hoping that, before the bugs are discovered, they can be used to infiltrate organizations adopting open source applications. The problem: Developers unknowingly copy and paste vulnerable code from code-sharing platforms such as Stack Overflow, which then gets into GitHub projects.
Attackers need to know only one vulnerable code pattern to be able to successfully attack many projects and their downstream dependencies, the researchers note.
The solution created by the researchers could allow the discovery and elimination of open source holes at scale, not just in one project at a time as is the case now.
However, the tool isn’t “scan for this once, correct all,” because developers often fork repositories without contributing to the original projects. That means for a vulnerability to be truly erased, all repositories with a vulnerable piece of code would have to be scanned and corrected.
In addition, the vulnerable code pattern studied in this research used the path name part of the URL directly, without any sanitization, creating an easy-to-exploit flaw. That’s the pattern the tool focuses on; other placements of the bad code aren’t detected.
The researchers will release the tool in August at a security conference in Vietnam. They plan to improve and extend it in several directions, particularly by integrating other vulnerable code patterns and improving patch generation.
Skeptical expert
However, Robert Beggs, head of Canadian incident response firm DigitalDefence, is skeptical of the value of the tool in its present state.
The idea of an automated tool to scan for and patch malicious code has been around for a while, he pointed out, and he credits the authors for trying to address many of the possible problems already raised.
But, he added, the research still doesn’t deal with questions like who’s responsible if a faulty patch damages a public project, or whether a repository manager can recognize that an AI tool is trying to insert what may be a vulnerability into an application.
When it was suggested that management would have to approve the use of such a tool, Beggs wondered how managers would know the tool is trustworthy and – again – who would be responsible if the patch is bad.
It’s also not clear how much, if any, post-remediation testing the tool will do to make sure the patch doesn’t do more damage. The paper says ultimately the responsibility for making sure the patch is correct lies with the project maintainers. The AI part of the tool creates a patch, calculates a CVSS score and submits a report to the project maintainers.
The researchers “have an excellent process and I give them full credit for a tool that has a lot of capability. However, I personally wouldn’t touch the tool because it deals with altering source code,” Beggs said, adding, “I don’t feel artificial intelligence is at the level to let it manage source code for a large number of applications.”
However, he admitted, academic papers are usually just the first pass at a problem.
Open source developers can be part of the problem
Along the way, the researchers also discovered a disturbing fact: Open source app developers sometimes ignore warnings that certain code snippets are radioactive.
The vulnerable code the researchers wanted to fix in as many GitHub projects as possible dated back to 2010, and is found in GitHub Gist, a service for sharing code snippets. The code creates a static HTTP file server for Node.js web applications. “[Yet] despite its simplicity and popularity, many developers appear unaware that this code pattern is vulnerable to the path traversal attack,” the researchers write.
Even those who recognized the problem faced disagreement from other developers, who repeatedly squashed the notion that the code was bad. In 2012, a developer commented that the code was vulnerable. Two years later, another developer raised the same concern about the vulnerability, but yet another developer said that the code was safe, after testing it. In 2018, somebody commented about the vulnerability again, and another developer insisted that that person did not understand the issue and that the code was safe.
Separately, the code snippet was seen in a hard copy of a document created by the community of Mozilla developers in 2015 – and fixed seven years later. However, the vulnerable version also migrated to Stack Overflow in late 2015. Although the snippet received several updates, the vulnerability was not fixed. In fact, the code snippet there was still vulnerable as of the publication of the current research.
The same thing happened in 2016, the researchers note, with another Stack Overflow question (with over 88,000 views) in which a developer suspected the code held a vulnerability. However, that person was not able to verify the issue, so the code was again assumed safe.
The researchers suspect the misunderstanding about the seriousness of the vulnerability is because, when developers test the code, they usually use a web browser or Linux’s curl command. These would have masked the problem. Attackers, the researchers note, are not bound to use standard clients.
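That masking effect is easy to demonstrate: standards-compliant clients resolve the `..` segments before the request ever leaves the machine, so the server under test never sees the traversal sequence. A small Node.js sketch (the hostname is a placeholder):

```javascript
// A browser-style (WHATWG) URL parser removes "../" segments above
// the root before the request is sent, so a server tested with a
// browser, or with curl's default settings, appears safe.
const attack = 'http://example.com/../../../etc/passwd';
const sentPath = new URL(attack).pathname;

console.log(sentPath); // '/etc/passwd': the ".." segments are already gone,
                       // so the server looks for ROOT/etc/passwd and returns 404.

// An attacker is not bound to a standard client: a raw socket
// (e.g. piping "GET /../../../etc/passwd HTTP/1.0" through netcat)
// or curl's --path-as-is flag delivers the path verbatim, and the
// vulnerable server joins it onto its web root unchanged.
```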
Disturbingly, the researchers add, “we have also found several Node.js courses that used this vulnerable code snippet for teaching purposes.”