How Google Missed the LLM Bus
Nevertheless, the race is still far from over, as nimble startups grab mindshare, resources, talent, and early revenues built on Google's own innovations.
In the world of innovation, Kodak, a.k.a. Big Yellow, is often cited by management students as a cautionary tale: the company overlooked its own milestone innovation, the world's first digital camera, shelved the project, and eventually faced bankruptcy as film cameras became Jurassic-age technology. Google very nearly became the Kodak of the information technology space when it missed the Large Language Model-powered Generative Artificial Intelligence bus, the technology OpenAI used to launch ChatGPT and democratise artificial intelligence for the entire world. According to some estimates, Google may have missed an economic opportunity of anything between $10 billion and $20 billion.
It's a fascinating story. Around 2016-17, OpenAI was struggling and had failed in its attempts to come up with a workable LLM. Its researchers were tossing algorithmic spaghetti toward the ceiling to see what stuck. They delved into systems that solved video games and spent considerable effort on robotics. "We knew what we wanted to do," Altman said in a recent interview with Wired. "We knew why we wanted to do it. But we had no idea how."
OpenAI's aha moment – Elvis writing a superhit for the Beatles
OpenAI's aha moment came, surprisingly, from Google, which was researching along the same lines and was fairly advanced. In the summer of 2017, eight Google scientists, led by Ashish Vaswani of the Google Brain team, published a paper titled "Attention Is All You Need." The paper proposed a startlingly simple approach to sequence transduction – the task of converting one sequence of data into another, the basic principle underlying generative AI. It was as if Elvis had written the lyrics for a Beatles superhit, paving their way to superstardom!
The dominant models in this field were recurrent or convolutional neural networks. Imagine you have the task of translating a book from English to German. Traditionally, you might go through each sentence, understand it, and then rewrite it in German. This is roughly how the older models translated languages: through many sequential steps – understanding each word, then each sentence – which took a long time.
The attention mechanism
Now, imagine a new method, like having a magic magnifying glass: when you look at an English sentence through this glass, it instantly shows you the German translation. That is what the Transformer model is like – a new way for computers to translate languages. The Transformer relies on a mechanism called "self-attention," which allows each word to be understood in relation to all the other words in a sentence, rather than just the adjacent ones. This gives a better, more holistic sense of context, unlike the old way, where the computer looked at each word one by one and could miss the bigger picture.
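The self-attention idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the real architecture: it omits the learned query/key/value projections and multiple attention heads a production Transformer uses, and the two-dimensional "embeddings" are made up for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns raw scores into weights summing to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention over a list of word vectors.

    Each word attends to every word in the sentence (including itself),
    so its output vector becomes a context-weighted blend of all of them.
    """
    d = len(embeddings[0])
    outputs = []
    for q in embeddings:
        # Similarity of this word to every word, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights = softmax(scores)
        # Blend all word vectors according to the attention weights.
        outputs.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                        for i in range(d)])
    return outputs

# Toy 2-d "embeddings" for a three-word sentence (hypothetical values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualised = self_attention(sentence)
```

Each output vector is a weighted mixture of every word in the sentence, which is precisely the "holistic sense of context" the paper's mechanism provides.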
The exciting part? The new Transformer model worked remarkably well, and blazingly fast. In tests translating from English to German and English to French, it outperformed all previous methods, setting new records in translation quality as measured by a score called BLEU. And it did all this much faster than older models, taking less time and using less computing power.
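BLEU works by counting how much of the machine's output overlaps with human reference translations. A minimal sketch of its simplest ingredient, modified unigram precision, is below; the full metric also combines higher-order n-grams and a brevity penalty, and the example sentences are invented for illustration.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Modified unigram precision, the simplest building block of BLEU.

    Counts how many candidate words appear in the reference, clipping
    each word's count so repeating a word cannot inflate the score.
    """
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    return overlap / sum(cand.values())

# Five of the six candidate words appear in the reference.
score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")
```

Here `score` comes out to 5/6, since only "sat" fails to match the reference; real BLEU aggregates such precisions over a whole test set.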
It was versatile & efficient
Finally, the Transformer isn't just good at translating. It can also be used for other tasks, such as understanding the structure of English sentences. Versatile and efficient, it was a big step forward in computer language processing.
Overall, this paper condensed key innovations about attention and self-supervision at an opportune time just as compute resources for massive models were expanding. It provided both an architectural blueprint and empirical evidence showing the directions to pursue for impactful Natural Language Processing (NLP) advances based on scaled-up LLMs. The Transformer fundamentally changed the trajectory of language AI.
Why Google overlooked its innovation
There are several reasons why Google did not directly parlay that work into developing its own large language models (LLMs):
- Business priorities: In 2017, Google’s AI efforts were still more focused on computer vision and machine learning across its products rather than longer-term natural language research. Developing LLMs requires massive investment.
- Talent transitions: By the time the implications were clear, several of the key researchers behind the Transformer paper had already left Google. Google had acquired DeepMind back in 2014, but initially didn't focus its resources on leveraging Transformer advances.
- Open research culture: The researchers open-sourced their work rather than keeping the IP restricted to Google, which benefited the whole community. OpenAI and others capitalised rapidly.
- Data advantages: Groups like OpenAI crafted novel pretraining schemes using broad data from sources like Reddit to unlock beneficial abilities. Google had less of a natural advantage here.
So, while Google had the foundational architecture and talent to potentially pursue LLMs, circumstances and business decisions steered their focus elsewhere initially while startups and academics pushed new paradigms. Eventually, Google pivoted resources into large language models as broader capabilities were demonstrated in areas aligned with their products. But they missed the pioneering opportunity despite the seminal work occurring under their roof.
OpenAI takes a leap forward
Meanwhile, using Google's Transformer paper, OpenAI made more progress in two weeks than it had over the previous two years. It came to understand that the key to getting the most out of the new model was scale – training it on fantastically large data sets. The idea was dubbed "Big Transformer." This approach required a change of culture at OpenAI and a focus it had previously lacked.
It is very difficult to precisely quantify the economic value Google missed out on by not capitalising directly on the Transformer architecture to develop its own large language models. However, we can make some intelligent guesses at the order of magnitude:
- OpenAI recently raised funds at a $29 billion valuation based predominantly on their GPT language models. Google likely could have been in a similar position of leadership.
- Stability AI recently raised funding at a $1 billion valuation after unveiling its image generation model, Stable Diffusion. Google's early Transformer research could potentially have put it ahead in this space.
- Microsoft invested $1 billion along with extensive engineering resources into their own LLM efforts with models like Turing. Google had a multi-year head start to make similar bets.
- There are also second-order growth opportunities from emerging AI verticals that leverage LLMs as a backbone. Google Cloud could already be monetising capabilities similar to what startups like Anthropic are offering.
Given these points, one could perhaps conservatively estimate the total lost revenue potential to be somewhere in the range of $10 billion to $20 billion if we project across the past 5 years. And the missed future opportunity is likely larger still.
The race is far from over
Nevertheless, the race is still far from over. This is where the tale differs from Kodak's. While Google still produced hugely impactful research, it took time for leadership to realign priorities and resources towards LLMs. This delay allowed nimble startups to grab mindshare, resources, talent, and early revenues built on Google's own innovations. The pioneers reap the most rewards!
Google can still catch up and make up for lost time when it comes to leveraging its early Transformer research into large language models. In AI specifically, and R&D broadly, Google remains a leading destination for top researchers, and it has the financial strength to invest billions into major LLM projects.
The search giant’s global scale Cloud infrastructure and engineering know-how allows them to train and deploy massive AI models that most startups can’t support. Their tools and reach confer advantages. From Google Search to YouTube to their Cloud offering, Google has many high-volume applications where LLMs can be integrated to provide value to billions of users.
Relative to smaller entities, Google has robust organisational structures around responsible AI development that provide confidence for large-scale LLM deployment, like what is needed for healthcare. The company’s leadership has clearly acknowledged the importance of generative AI. Investor pressures and employee demands will also push toward LLM-related growth.
While Google has stumbled before by not acting quickly based on their own R&D, LLMs remain such a crucial wave of innovation that Google dedicating focus and resources here can absolutely regain lost ground and also leverage LLMs responsibly at scale. Their strengths in allied areas are still formidable for long-term impact.
OpenAI’s weaknesses
While OpenAI has done ground-breaking research, it still has some weaknesses. Relative to giants like Google and Microsoft, OpenAI has far fewer researchers, which limits how many model architectures, datasets, and applications it can explore in parallel. Early investor pressure for profit can skew incentives away from an open research culture towards closed IP and platform lock-in if not managed carefully.
Though it is growing through partnerships, OpenAI still likely does not have access to the internet-scale datasets held by Google, Meta, and Amazon, which caps model power over time. OpenAI also relies heavily on partners (Microsoft/AWS) for most of its training and deployment support; the lack of a fully customised software and hardware stack optimised to its models inhibits rapid innovation and flexibility.
OpenAI’s thought leadership around safety and ethics is often questioned as being disproportionate PR cover for rushing risky technology to market to appease investors. More transparent processes could help assuage this scepticism. The company, however, deserves great credit for its vision and capabilities.
Nevertheless, its smaller, narrowly focused operation, dependent on external support, does impose constraints relative to an integrated technology stack. As its models grow more powerful, mitigating these structural weaknesses will be key to ensuring the responsible, beneficial development of AI. The company's leadership, especially Sam Altman, is fully aware of these shortcomings, and perhaps this is why he insists on greater government guardrails around future developments – guardrails that could also prevent competitors from scaling up their research, while turning him into an ally for the administration drafting the new regulations.