Open Source AI Has No Moat

Why the artificial intelligence game is rigged against open source.

Feb 24, 2024

In early 2023, a memo from an anonymous researcher at Google was leaked. It stated that neither Google nor OpenAI has a distinct advantage in the development of artificial intelligence, and that the open source community has a much better chance of advancing the technology's frontiers. Although the memo is meant as a call to action for Google to revise its research strategy, it makes several bold statements about open source that unfortunately don’t always add up.

SemiAnalysis

Google "We Have No Moat, And Neither Does OpenAI"

The text below is a very recent leaked document, which was shared by an anonymous individual on a public Discord server who has granted permission for its republication. It originates from a researcher within Google. We have verified its authenticity. The only modifications are formatting and removing links to internal web pages. The document is only the opinion of a Google employee, not the entire firm. We do not agree with what is written below, nor do other researchers we asked, but we will publish our opinions on this in a separate piece for subscribers. We simply are a vessel to share this document which raises some very interesting points…

Dylan Patel and Afzal Ahmad

The core argument of the Google memo is that the worldwide community of open source developers is much more capable than the research labs of large tech companies, owing to its speed of execution, adoption of the latest ideas, and sharing of resources. As a result, models developed by the open source community are often faster, more customizable, and, pound-for-pound, better than those from closed-source companies. While individual hobbyists, academic researchers, and underpaid graduate students have successfully addressed some of the biggest challenges in artificial intelligence, the truth is that they, too, have no moat. To see where open source would need a defensible advantage, we should consider three different aspects of artificial intelligence: development, deployment, and distribution.

Development of new AI models, tools, datasets and benchmarks.

Specialized knowledge, the latest research papers, and training datasets are often freely available, and the best ideas can come from anywhere. Tapping into the global talent pool, often at no cost, has proven to be an excellent strategy for innovation, as seen in the memo's timeline of progress. However, there's a dark side: open-source projects frequently find themselves locked into platforms owned or sponsored by major tech companies, which ultimately stand to benefit the most, whether directly or indirectly. Some notable examples include Chromium, Android, React, PyTorch, and Kubernetes. In open-source AI specifically, many models today are built atop the leaked version of Meta’s Llama. As of this writing, there are over 8.6k forks of the Llama GitHub repository, and this extensive development gives Meta's Facebook Research group the opportunity to explore all the possibilities of its models without having to do all the work itself. What should have been a moat for open source has become a moat for the company that owns the platform.

Deployment of AI models as services utilizing expensive compute resources, data centers, and extensive cloud management.

This is the most capital-intensive aspect of producing artificial intelligence, and it is beyond the reach of not just open source communities but also most companies. Only a few players (Microsoft Azure, Google Cloud Platform, and Amazon Web Services) have a strong foothold in this market, in part due to the ongoing GPU shortage. Even the most popular open-source platform for AI development, Hugging Face, operates largely because corporations like Google, Amazon, and NVIDIA pay premiums for a private version of the platform, which subsidizes usage for open-source developers. When it comes to deployment, open source not only lacks a competitive edge but also faces the cost of compute itself as a barrier to entry for training new AI models from scratch.

Distribution of artificial intelligence in products and services that are used by consumers.

The most exciting aspect of artificial intelligence lies in its ability to perform tasks that were previously considered nearly impossible to do at scale. The real value proposition of artificial intelligence is that it will fundamentally alter how we interact with the technology around us. Arguably, this represents the most significant force multiplier in the recent history of humankind, and it is easy to understand why there is explosive growth: ChatGPT reached 1 million users in just 5 days. However, with growth and opportunity comes competition. Historically, many open-source projects have been under-resourced, lacking both funds and community contributors. In such a competitive landscape, open-source projects struggle to organize development, design quality products, and market to customers for widespread adoption and distribution.

There is no doubt that open source projects generally do not achieve the same level of commercial success as closed-source companies. This observation shouldn't surprise anyone: Red Hat, the poster child for successful open-source enterprises, pales in comparison to Microsoft. Although it is an oversimplification, both companies initially created operating systems. In 2018, IBM agreed to acquire Red Hat for $34 billion, while Microsoft was worth $780 billion. As of this writing, Microsoft is worth a staggering $3 trillion.

This discussion holds even greater significance today as artificial intelligence increasingly automates and substitutes for human reasoning and judgment. Abundant research highlights the inherent risks and biases in models, including misinformation and racial and gender discrimination, among others. Theoretically, open-source AI should increase the likelihood that the wisdom of the crowd will identify and correct biases, uncover issues, and suggest enhancements. In reality, however, the incentives are not aligned for open source to succeed or compete with large tech companies. Without organized control over the development, deployment, or distribution of artificial intelligence, the open source artificial intelligence community is going to be at the mercy of Big Tech.

Almost a year after the memo leaked, Google has released its family of small language models, Gemma, based on the same work as its flagship Gemini models. With the weights made publicly available, Google is hoping to engage with the open source community for research and development. Only time will tell whether this release will help build a moat for open source or deepen the moat for Google.

Thank you for reading Models and Metrics. Send this post to someone who wants to learn about artificial intelligence.
