Authors: Ayah Budayle, Imo Udom, Nick Maluda
TL;DR: Mozilla is excited about today's new definition of open source AI and supports it as an important step forward.
The past year has seen more and more people recognize the societal benefits of open source AI. In October, a large coalition signed our statement highlighting that openness and transparency are key elements of AI safety and security. In February, Mozilla and the Columbia Institute of Global Politics brought together AI experts to highlight how openness in AI can help advance important societal goals. Policymakers are also embracing open source AI. The U.S. National Telecommunications and Information Administration (NTIA) recently released a groundbreaking report in support of openness in AI. Even companies like Google, Microsoft, Apple, and Meta have begun to open up certain aspects of their AI systems.
As attention turns to open source AI, it is increasingly important to establish a common understanding of what open source AI is. A definition must clarify what needs to be shared and under what conditions. Without this clarity, we risk a fragmented approach: companies label products as “open source” that are not in fact open source, civil society lacks access to the AI components needed for testing and accountability, and policymakers create regulations that fail to address the complexity of the issue.
The Open Source Initiative (OSI) recently published a new draft definition of open source AI, marking a significant milestone in the evolution of the Internet. This moment comes after two years of dialogue, debate, work, and late-night discussions across the technical and open source communities. It will not only redefine what “open source” means in the context of AI, but also shape the future of the technology and its impact on society.
The original Open Source Definition, introduced by OSI in 1998, was more than just a set of guidelines; it was a manifesto for a new way to build software. It laid the foundation for the open systems that have become the backbone of the modern Internet. From Linux to Apache, open source projects have fostered innovation, collaboration, and competition, allowing the Internet to grow into a diverse, dynamic ecosystem. By making software free to use, modify, and share, the original open source movement expanded access to technology and broke down barriers to entry. It also fostered a culture of innovation and transparency that made software more secure and less vulnerable to cyber attacks.
The new OSI definition is an important step toward bringing clarity and rigor to the open source AI discussion. It introduces a binary definition of “open source,” similar to existing definitions: a system either qualifies as open source or it does not. While this is only one of several approaches to defining open source AI, it gives precision to developers, advocates, and regulators who benefit from a clear definition in a variety of work contexts. Specifically, it states that open source AI rests on the ability to freely use, study, modify, and share AI systems. It also stresses the importance of access to the key components needed to recreate substantially equivalent AI systems: information about the data used for training, the source code for AI development, and the AI models themselves.
This definition also marks the first attempt to address the complex question of whether and how training data for AI models should be shared as part of open source AI. The definition acknowledges that sharing entire training datasets is difficult in practice, and thus avoids disqualifying the vast majority of otherwise open source AI development from being considered “open source.” We are working to reduce that difficulty by making open datasets a more common part of the AI ecosystem. Mozilla and EleutherAI recently convened experts to outline best practices for open datasets to support AI training, and we will soon publish a paper to advance norms that make AI training data more widely available.
Some will disagree with various aspects of the OSI definition, such as how training data is treated, and the definition will likely need to be refined over time. However, we believe that the OSI's community-driven process, which has involved stakeholder engagement for over a year, has established an important reference point for the discussion of open source AI. For example, the definition will be a valuable resource for combating the fairly widespread practice of “openwashing,” in which non-open models (or open-ish models like Meta's Llama 3) are promoted as leading “open source” options without contributing to the commons. Researchers have shown that the “impact of openwashing is profound,” affecting innovation, research, and public understanding of AI.
This effort embodies what the open source community is all about: engaging in open discussion, addressing differences, acknowledging shortcomings, and refining the definition together to build something better. It also embraces many of the key aspects of openness that the open source community has worked on, such as looking beyond the openness of model weights alone to include broader model components, documentation, and licensing approaches, as outlined at the Columbia convening. In contrast, closed source ecosystems operate in secrecy, with limited access and behind-the-scenes deals in which big tech companies trade computing power and talent. We prefer a consistent, if sometimes imperfect, approach any day.
We and many others look forward to working with OSI and the broader open source community to bring greater clarity to the discussion around open source AI and continue to unleash the potential of open source AI for the benefit of society.