It will get weirder with the rise of large language models

Last month, DeepMind published a paper, “Training Compute-Optimal Large Language Models”, which argues that OpenAI, DeepMind, Microsoft and others have trained large language models with a deeply suboptimal use of compute. DeepMind has also proposed new scaling laws for optimal compute use and has trained a new 70-billion-parameter model that outperforms much larger language models, including GPT-3 (175 billion parameters) and Gopher (280 billion parameters).
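As a rough illustration of what “compute-optimal” means, the sketch below relies on two commonly cited approximations that this article does not state and that should be read as assumptions: training compute is roughly 6 × parameters × tokens (in FLOPs), and the compute-optimal token count is roughly 20 tokens per parameter. The function name, the constants and the budget are illustrative only, not values taken from the paper.

```python
import math
from typing import Tuple

# Assumed rules of thumb (approximations, not exact figures from the paper).
FLOPS_PER_PARAM_TOKEN = 6.0  # training FLOPs ~ 6 * params * tokens
TOKENS_PER_PARAM = 20.0      # compute-optimal tokens ~ 20 * params


def compute_optimal_split(flops_budget: float) -> Tuple[float, float]:
    """Return (params, tokens) that use up flops_budget under the assumed rules."""
    # C = 6 * N * D and D = 20 * N, so N = sqrt(C / 120).
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    budget = 5.88e23  # hypothetical FLOPs budget, chosen for illustration
    n, d = compute_optimal_split(budget)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Under these assumptions, a budget of about 5.88e23 FLOPs works out to roughly 70 billion parameters trained on about 1.4 trillion tokens, which suggests how a 70-billion-parameter model can be a better use of the same compute than a much larger model trained on less data.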

Reacting to recent developments in large language models, Russell Kaplan, Head of Nucleus at Scale AI, wrote a series of tweets about the “second-order effects” of the rise of large language models. Let’s break the thread down.

Pay tax to companies creating large language models

Kaplan said in his Twitter thread that companies making products might have to embed intelligence from massive language models into those products, such as adding Copilot to VS Code, DALL·E 2 to Photoshop or GPT-3 to Google Docs. These companies will either have to build their own large language models or pay a tax to use those of OpenAI, Google and others.

The paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”, co-authored by AI researcher Timnit Gebru, offers context for this tweet: it discusses the variety of costs and risks associated with ever-larger language models.

Poor compute will depend on rich compute

Kaplan said that the development of large language models could create a divide among tech companies, where compute-rich companies become platform gatekeepers and compute-poor companies depend on them for ML models. Just as Epic Games, Zynga and others were pushed off big tech platforms, smaller companies’ products may stop functioning if they lose access to language models.

Power over core resources

Further, Kaplan’s thread talked about how compute-rich companies may aggressively secure their supply chains, such as silicon chips, citing Elon Musk’s intention to buy lithium mining rights to ensure a continuous supply of raw material for his electric vehicle batteries. Lithium is the key raw material for lithium-ion electric vehicle batteries, and Tesla, the largest electric vehicle manufacturer in the world, is facing a shortage of it. Tesla’s CEO Elon Musk expressed concern over the rising price of lithium in a recent tweet and suggested that Tesla might get into the mining business to help solve this shortage.

On securing the supply chain, Kaplan also mentioned that AI companies are designing their own training chips rather than buying from NVIDIA, which earns a gross margin of 65 per cent. In fact, NVIDIA has stated in its investment thesis report that its gross margin is 65 per cent and that its chips are a perfect fit for processing the large amounts of data required in AI applications.

Link to national security

Kaplan’s thread said that governments would soon invest in computational infrastructure to train the largest language models, which will become essential for national security. There is also a chance of a new Manhattan Project for AI supercomputing.

The original Manhattan Project was the US government’s Second World War programme to build the first nuclear weapons. A present-day parallel is the US government asking supercomputer manufacturer Cray to build an exascale computer that can run complex simulations to replicate nuclear weapons tests without detonating one.

Facebook’s AI RSC

Kaplan further tweeted figures to put the spending in context: Facebook’s AI Research SuperCluster (RSC) was developed at a capital expenditure of $1 billion, the original Manhattan Project cost approximately $30 billion and the space race projects about $250 billion.

Large language models to herald new search engines

Then, Kaplan spoke about how generative language models will replace search engines. In the future, users need not search for anything on Google; instead, the information will be embedded in the product they are using. Giving the example of Copilot from GitHub, he said this trend could have many implications. For context, an MIT Technology Review article, “Language models like GPT-3 could herald a new type of search engine”, describes users putting questions to a trained language model and getting direct answers instead of searching for information across web pages.

Royalty instead of licensing

Kaplan further tweeted that in the future, web properties with user-generated content will demand royalties, rather than license their data outright, when that data is used for training AI models. The idea of royalties also appears in a research article published by The Royal Society Publishing, “Algorithms that remember: model inversion attacks and data protection law”. According to the article, many large firms already offer trained models for various tasks, and two main business models underpin this practice. The first is licensing application programming interfaces (APIs) through ‘App Store’-like platforms; the second is earning royalties when the models are deployed.

Maximizing log-likelihood instead of SEO

Kaplan discussed the implications for search engine optimization, saying that instead of doing SEO, marketers will try to maximize the log-likelihood of their content being generated by an ML model, which could invite data-poisoning attacks. He also expects to see Sponsored Outputs for language models, where advertisers pay to condition model outputs on their products. There will be more research on a “v2 AdWords”, in which advertisers pay for the likelihood that their ad is generated rather than for search placement. Kaplan concluded that all these developments would only make things weirder.
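To make “maximizing log-likelihood” concrete, here is a minimal sketch that uses a toy word-frequency (unigram) model with add-one smoothing as a stand-in for a language model. Real large language models work very differently; the corpus, the “acme” brand, the function names and the vocabulary size are all hypothetical and chosen only for illustration. The sketch shows how seeding training data with brand-heavy text, a crude form of data poisoning, raises the log-likelihood the model assigns to the brand’s message.

```python
import math
from collections import Counter
from typing import List


def train_unigram(corpus: List[str]) -> Counter:
    """Count word frequencies across the corpus (a toy stand-in for training)."""
    counts: Counter = Counter()
    for doc in corpus:
        counts.update(doc.lower().split())
    return counts


def log_likelihood(text: str, counts: Counter, vocab_size: int) -> float:
    """Sum of log-probabilities of each word, with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in text.lower().split()
    )


VOCAB_SIZE = 1000  # hypothetical vocabulary size
CLEAN_CORPUS = ["best laptop for coding", "cheap laptop deals"]
# Hypothetical poisoning: the marketer floods the training data with brand text.
POISONED_CORPUS = CLEAN_CORPUS + ["acme laptop is the best laptop"] * 5
TARGET = "acme laptop is the best"

if __name__ == "__main__":
    before = log_likelihood(TARGET, train_unigram(CLEAN_CORPUS), VOCAB_SIZE)
    after = log_likelihood(TARGET, train_unigram(POISONED_CORPUS), VOCAB_SIZE)
    print(f"log-likelihood before poisoning: {before:.2f}")
    print(f"log-likelihood after poisoning:  {after:.2f}")
```

In this toy setting the target sentence scores a higher (less negative) log-likelihood after the corpus is poisoned, which is the kind of outcome a marketer would be optimizing for in place of a search ranking.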

Twitterati divided over Kaplan’s tweets

Reacting to Kaplan’s Twitter thread, Boston-based investor Dan MacDade agreed with some of the tweets and disagreed with others. He said that large language models would not be owned only by big tech and that open-source software (OSS) alternatives would replace them.

However, Igor Brigadir, CTO at recsyslabs, tweeted in support, saying that Kaplan had nailed it and that whatever he tweeted would come true.

Many who read the thread called it an eye-opener on large language models. Though reactions were mixed, many liked the approach of collecting such examples and laying them out, and some even suggested creating a forum to discuss such topics.
