
Open-Source AI Is Increasingly Popular But Not Risk-Free


Open-source AI projects are exploding in popularity and are contributing to PwC’s estimated $15.7 trillion impact AI will have on the global economy by 2030. However, some enterprises have hesitated to fully embrace AI.  

In 2023, VentureBeat found that while more than 70% of companies were experimenting with AI, only 20% were willing and able to invest more.  

Open-source tooling offers enterprises cost-effective, accessible AI use with benefits including customization, transparency and platform independence. But it also carries potentially hefty costs for the unprepared. As enterprises expand their AI experimentation, managing these risks becomes critical.  

Risk #1: Training data  

Many AI tools rely on vast stores of training data to develop models and generate outputs. For example, OpenAI’s GPT-3.5 was reportedly trained on 570 gigabytes of online text data, approximately 300 billion words.

More advanced models require even larger and often less transparent datasets. Some open-source AI tools are released without dataset disclosures or with overwhelming disclosures, limiting useful model evaluations and posing potential risks. For example, a code generation AI tool could be trained on proprietary, licensed datasets without permission, leading to unlicensed output and potential liability.


Open-source AI tools using open datasets still face challenges, such as evaluating data quality to ensure a dataset hasn’t been corrupted, is regularly maintained, and includes data suited for the tool’s intended purpose.  

Regardless of the data’s origins, enterprises should carefully review training data sources and tailor future datasets to the use case, where possible.    
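
Where a tool’s datasets are published on a hub, parts of that review can be automated. Below is a minimal sketch, assuming the Hugging Face datasets library; the dataset ID and the approved-license list are illustrative assumptions, not recommendations:

```python
# Pull a hub-hosted dataset's metadata for provenance review (a minimal
# sketch; the dataset ID and license allowlist are illustrative assumptions).
from datasets import load_dataset_builder

APPROVED_LICENSES = {"cc-by-4.0", "apache-2.0", "mit"}  # example allowlist

def review_dataset(dataset_id: str) -> None:
    builder = load_dataset_builder(dataset_id)  # fetches metadata, not the data
    info = builder.info
    print(f"Description: {(info.description or '')[:200]}")
    print(f"License:     {info.license}")
    print(f"Homepage:    {info.homepage}")
    if not info.license or info.license.lower() not in APPROVED_LICENSES:
        print("FLAG: license missing or not on the approved list; escalate for review.")

review_dataset("example-org/example-dataset")  # hypothetical dataset ID
```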

Risk #2: Licensing  

Proper data, model, and output licensing presents complicated issues for AI proliferation. The open-source community has been discussing the suitability of traditional open-source software licenses for AI models.   

Current licenses range from fully open to partially use-restricted, but unclear criteria for what qualifies as “open source” can lead to licensing confusion. The licensing question can also trickle downstream: If a model produces output from a source with a viral (copyleft) license, you may need to adhere to that license’s requirements.

With models and datasets evolving constantly, evaluate every AI tool’s licensing against your chosen use case. Legal teams should help you understand limitations, restrictions and other requirements, like attribution or a flow-down of terms.  
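
Model licenses can be screened the same way before the legal conversation. Below is a minimal sketch using the huggingface_hub client; the model ID and allowlist are illustrative, and a pass here is a starting point for counsel, not a clearance:

```python
# Screen a hub-hosted model's declared license tag (a minimal sketch; the
# model ID and allowlist are illustrative assumptions, not legal advice).
from huggingface_hub import model_info

ALLOWED = {"apache-2.0", "mit", "bsd-3-clause"}  # example allowlist

def screen_model_license(model_id: str) -> str | None:
    info = model_info(model_id)
    # Hub metadata exposes a declared license as a "license:<id>" tag.
    licenses = [t.split(":", 1)[1] for t in (info.tags or []) if t.startswith("license:")]
    declared = licenses[0] if licenses else None
    if declared not in ALLOWED:
        print(f"{model_id}: license {declared!r} needs legal review")
    return declared

screen_model_license("example-org/example-model")  # hypothetical model ID
```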

Risk #3: Privacy  


As global AI regulations emerge and discussions swirl around the misuse of open-source models, companies should assess regulatory and privacy concerns for AI tech stacks.  

At this stage, be comprehensive in your risk assessments. Ask AI vendors targeted questions, such as:  

  • Does the tool use de-identification to remove personally identifiable information (PII), especially from training datasets and outputs? (A minimal de-identification sketch follows this list.)

  • Where are training and fine-tuning data stored, copied, and processed?

  • How does the vendor review and test accuracy and bias, and on what cadence?  

  • Is there a way to opt in or out of data collection?  
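
De-identification itself can start small. Below is a minimal sketch of regex-based scrubbing for two common PII patterns; production de-identification typically requires dedicated tooling (e.g., named-entity recognition) well beyond this:

```python
# Regex-based scrubbing of two common PII patterns (a minimal sketch;
# real de-identification needs far broader coverage than shown here).
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Contact jane.doe@example.com, SSN 123-45-6789."))
# -> Contact [EMAIL], SSN [SSN].
```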

Where possible, implement explainability and human review processes for AI. Build trust and demonstrate the AI’s business value by understanding the model and datasets well enough to explain why the AI returned a given output.
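
One lightweight form of explainability: for linear models, per-feature contributions show which inputs drove a prediction. Below is a minimal sketch with scikit-learn on toy data; the feature names are illustrative assumptions:

```python
# Attribute a linear model's prediction to its input features (a minimal
# explainability sketch; the toy data and feature names are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2, 1.0], [0.9, 0.1], [0.8, 0.3], [0.1, 0.9]])  # toy inputs
y = np.array([0, 1, 1, 0])
model = LogisticRegression().fit(X, y)

sample = np.array([[0.85, 0.2]])
contributions = model.coef_[0] * sample[0]  # per-feature decision-score contribution
print(f"Prediction: {model.predict(sample)[0]}")
for name, c in zip(["feature_a", "feature_b"], contributions):
    print(f"{name}: {c:+.3f}")
```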

Risk #4: Security   

Open-source software’s security benefits come with corresponding risks. Many open-source models can be deployed in your own environment, giving you the benefit of your existing security controls. However, open-source models can also expose the unsuspecting to new threats, including output manipulation and harmful content introduced by bad actors.

AI tech startups offering tools built on open-source AI can lack adequate cybersecurity, security teams, or secure development and maintenance practices. Organizations evaluating these vendors should ask targeted questions, such as:


  • Does the open project address cybersecurity issues?   

  • Are the developers involved in the project demonstrating secure practices like those outlined by OWASP?   

  • Have vulnerabilities and bugs been promptly remediated by the community?  

Enterprises experimenting with AI tooling should continue following internal policies, processes, standards, and legal requirements. Consider security best practices like:

  • Keep the tool’s source code subject to vulnerability scanning.

  • Enable branch protection for AI integrations.

  • Encrypt interconnections in transit and databases at rest (see the sketch after this list).

  • Establish boundary protection for the architecture and use cases.
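
For the encryption-in-transit item, a connection’s TLS posture can be spot-checked programmatically. Below is a minimal sketch using Python’s standard ssl module; the hostname is a hypothetical stand-in for an internal endpoint such as a vector database:

```python
# Spot-check that an interconnection negotiates TLS (a minimal sketch;
# the hostname and port are hypothetical stand-ins for an internal endpoint).
import socket
import ssl

def check_tls(host: str, port: int = 443) -> None:
    context = ssl.create_default_context()  # verifies certificates by default
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            print(f"{host}:{port} negotiated {tls.version()}")
            print(f"cipher suite: {tls.cipher()[0]}")

check_tls("vector-db.internal.example.com")  # hypothetical endpoint
```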

A strong security posture will serve enterprises well in their AI explorations.  

Risk #5: Integration and performance   

Integration and performance of AI tooling matters for both internal and external use cases at an organization.   

Integration can affect many internal elements, like data pipelines, other models and analytics tools, increasing risk exposure and hampering product performance. Tools can also introduce dependencies upon integration, such as open-source vector databases supporting model functionality. Consider how those elements affect your tool integration and use cases, and determine what additional adjustments are needed.

After integration, monitor AI’s impact on system performance. AI vendors may not offer a performance warranty, leaving your organization to handle further development if open-source AI does not meet expectations. The costs of maintaining and scaling AI functions, including data cleaning and subject-matter experts’ time, climb quickly.
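
Monitoring can start with simple latency instrumentation around inference calls. Below is a minimal sketch; run_model is a hypothetical stand-in for whatever inference function the tool exposes:

```python
# Track inference latency and report a p95 (a minimal sketch; run_model
# is a hypothetical stand-in for the tool's inference function).
import statistics
import time

latencies: list[float] = []

def timed_inference(run_model, prompt: str):
    start = time.perf_counter()
    result = run_model(prompt)
    latencies.append(time.perf_counter() - start)
    return result

def report_p95() -> None:
    if len(latencies) >= 20:  # collect a reasonable sample before quantiles
        p95 = statistics.quantiles(latencies, n=20)[-1]
        print(f"n={len(latencies)}  p95={p95 * 1000:.1f} ms")
```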

Know Before You Go Open Source  

Open-source AI tooling offers enterprises an accessible, affordable way to accelerate innovation. Still, successful implementation requires scrutiny and a proactive compliance and security posture. Intentionally evaluating the hidden costs and considerations of open-source AI will help ensure ethical, intelligent use.


