Google, Elon Musk and Mark Zuckerberg say their AI is open source, but a new definition may challenge that claim.
Generative artificial intelligence (AI) models such as Meta's Llama and Elon Musk's Grok are billed as open source, but there is wide disagreement about what open source AI actually means.
That could change with a new working definition of what the term means for AI, recently released by the Open Source Initiative (OSI), the self-appointed custodian of the term.
Open source generally means that software's source code is publicly available for anyone to use, modify, and distribute.
The OSI's Open Source Definition also requires that a license meet 10 criteria, including that the means of obtaining the source code be well-publicized at reasonable or no cost, that the license not discriminate against any person or field of use, and that it not place restrictions on other software.
However, because AI systems are harder to evaluate against those 10 criteria, the OSI has drawn up a new definition specifically for AI.
What is the definition of open source AI?
Under the new definition, an open source AI system can be used for any purpose without seeking a company's permission, and researchers should be free to inspect how the system works.
It also states that the system can be modified for any purpose, including to change its output, and that it can be shared with others for any reason, with or without modifications.
The definition further requires AI companies to be transparent about the data used to train their systems, the source code used to train and run them, and the weights, the numerical parameters that influence an AI model's output.
Herein lies the problem: despite its name, OpenAI is closed source, in that its algorithms, models, and datasets are kept secret.
Meanwhile, the models from Meta, Google, and Elon Musk's xAI (maker of Grok) that claim to be open source do not actually qualify under the OSI definition, as the companies have not been transparent about the data used to train the weights. That opacity raises potential copyright issues and ethical questions about whether the data is biased.
The OSI recognizes that sharing complete training datasets can be difficult and is not a black-and-white matter, so withholding them does not by itself prevent an AI system from being considered open source.
“Openwashing”
This definition has been in the works for several years and will likely need to be updated as AI advances.
The OSI developed the working definition in consultation with a 70-person group of researchers, lawyers, policymakers, activists, and representatives from major technology companies, including Microsoft, Meta, and Google.
“This definition will be a valuable resource in fighting the widespread practice of 'openwashing,'” Mozilla representatives Ayah Bdeir, Imo Udom, and Nik Marda said in a statement to Euronews Next.
Openwashing, they explained, is when non-open models (or open-ish models such as Meta's Llama 3) are promoted as leading “open source” options without contributing back to the commons.
Researchers have shown that “the consequences of openwashing are substantial” and that it affects innovation, research, and public understanding of AI, they added.
No authority to enforce the definition
“We are the custodians and maintainers of the definition, but we don't have strong powers to enforce it,” Stefano Maffulli, executive director of the OSI, told Euronews Next in an interview in March.
He noted, however, that judges and courts around the world are beginning to recognize the importance of the open source definition, particularly in merger cases but also in regulation.
As countries around the world finalize how to regulate AI, open source software has become a contentious topic.
“The open source definition acts as a barrier against false advertising,” Maffulli said.
“If a company claims to be open source, it has to uphold the values of the open source definition, or it's just going to create confusion.”