Why Small Large Language Models May be Better: AI Language Models Need to Shrink

In short, the transformer unlocked the full processing power of GPUs and catalyzed rapid increases in the scale of language models. Leading LLMs grew from hundreds of millions of parameters in 2018 to hundreds of billions of parameters by 2020. Classic RNN-based models could not have grown that large because their sequential, token-by-token architecture prevented them from being trained efficiently on a GPU.

  • Because SLMs are smaller, they’re faster, use less energy, can run on small devices, and may not require a public cloud connection.
  • Over time, Moore’s Law enabled Nvidia to make GPUs with tens, hundreds, and eventually thousands of computing cores.
  • Using a closed-source LLM through an API may risk exposing sensitive information.
  • In 1999, Nvidia started selling graphics processing units (GPUs) to speed up the rendering of three-dimensional games like Quake III Arena.
  • He stated, “You can build a model for a particular use case… with just 10 hours of recording.”
  • Let’s look at a few strategies for optimizing SLMs so they can be deployed successfully on edge devices (see the sketch just below).
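
One widely used strategy is post-training quantization, which stores a model’s weights in low-precision integers so it fits and runs faster on constrained hardware. The sketch below is illustrative only (the article does not say which techniques any particular team used); it applies PyTorch dynamic quantization to a stand-in model rather than a real SLM checkpoint.

```python
# Illustrative sketch: post-training dynamic quantization with PyTorch.
# The tiny Sequential model is a stand-in; a real SLM would be loaded
# from a checkpoint before quantizing.
import torch
from torch import nn

model = nn.Sequential(
    nn.Embedding(32_000, 256),   # toy vocabulary and hidden size
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 32_000),
)

# Convert Linear weights to int8; activations are quantized on the fly at runtime.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # Linear layers now appear as DynamicQuantizedLinear
```

On CPUs this alone typically cuts memory use and latency; more aggressive options such as 4-bit weights, pruning, or distillation trade a little accuracy for further savings.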

SLMs are more streamlined versions of LLMs, with fewer parameters and simpler designs. They require less data and training time—think minutes or a few hours, as opposed to days for LLMs. This makes SLMs more efficient and straightforward to implement on-site or on smaller devices.

Another significant issue with LLMs is their propensity for hallucinations – generating outputs that seem plausible but are not actually true or factual. This stems from the way LLMs are trained to predict the next most likely word based on patterns in the training data, rather than having a true understanding of the information. As a result, LLMs can confidently produce false statements, make up facts or combine unrelated concepts in nonsensical ways.
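
The “next most likely word” objective described above is easy to inspect directly. A minimal sketch, assuming the Hugging Face transformers library; distilgpt2 is used only because it is conveniently small, not because it appears in this article.

```python
# Minimal next-token prediction demo with a small causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "distilgpt2"  # illustrative choice of a small model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Small language models are", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [batch, seq_len, vocab_size]

# The model only ranks which token is most likely to come next;
# nothing here checks whether the continuation is factually true.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}  p={p.item():.3f}")
```

The objective is the same at any scale, which is why hallucination is a risk for small and large models alike; smaller models are simply cheaper to run and easier to constrain to a narrow domain.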

The Key Challenges In Deploying SLMs On Edge Devices

Jamba 1.5 Large has seven times more Mamba layers than attention layers. As a result, Jamba 1.5 Large requires far less memory than comparable models from Meta and others. For example, AI21 estimates that Llama 3.1 70B needs 80GB of memory to keep track of 256,000 tokens of context. Jamba 1.5 Large only needs 9GB, allowing the model to run on much less powerful hardware.
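
The memory difference comes largely from the KV cache, which grows with every attention layer and every token of context. The arithmetic below is a back-of-envelope sketch: the layer counts, KV-head counts, head dimension, and 16-bit values are assumptions chosen to approximate common configurations, not published specifications from AI21 or Meta.

```python
# Rough KV-cache size estimate for a decoder-only transformer.
def kv_cache_gb(context_tokens, attn_layers, kv_heads, head_dim, bytes_per_value=2):
    """Memory for cached keys and values across all attention layers, in GB."""
    per_token = attn_layers * kv_heads * head_dim * 2 * bytes_per_value  # K and V
    return context_tokens * per_token / 1e9

# A pure transformer where every layer pays the KV-cache cost.
print(kv_cache_gb(256_000, attn_layers=80, kv_heads=8, head_dim=128))  # ~84 GB

# A hybrid that keeps only a small fraction of layers as attention
# (the rest are Mamba-style layers with constant-size state).
print(kv_cache_gb(256_000, attn_layers=8, kv_heads=8, head_dim=128))   # ~8 GB
```

With these assumed numbers the totals land near the 80GB and 9GB figures quoted above, which is the intuition behind replacing most attention layers with constant-state Mamba layers.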

Why Small Language Models For Edge Computing

Instead of compressing everything it has read into a fixed-size summary, Google’s model used an attention mechanism to scan previous words for relevant context. In 2012, three University of Toronto computer scientists—Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton—used a pair of Nvidia GTX 580 GPUs to train a neural network for recognizing images. The massive computing power of those GPUs, which had 512 cores each, allowed them to train a network with a then-impressive 60 million parameters. They entered ImageNet, an academic competition to classify images into one of 1,000 categories, and set a new record for accuracy in image recognition. Many long-context tasks could easily require more than 2 million tokens of context.
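
For readers who want to see what “scanning previous words for relevant context” means mechanically, here is a toy single-head causal attention function in NumPy. The shapes and random inputs are arbitrary and stand in for learned query, key, and value projections.

```python
# Toy causal (masked) scaled dot-product attention for a single head.
import numpy as np

def causal_attention(q, k, v):
    """q, k, v: [seq_len, dim] arrays for one attention head."""
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)                    # relevance of every token pair
    mask = np.triu(np.ones((seq_len, seq_len)), k=1)   # hide future positions
    scores = np.where(mask == 1, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over earlier tokens
    return weights @ v                                 # weighted mix of prior context

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
print(causal_attention(x, x, x).shape)  # (5, 16)
```

Because every token attends to every earlier token, the work and memory grow with the square of the sequence length, which is why very long contexts become expensive for standard transformers.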

Why Do Researchers Care About Small Language Models?

A company that needs AI for a set of specialised tasks doesn’t require a large AI model. Training small models requires less time, less compute, and smaller training datasets. Natural language processing (NLP) has been around for years, and although GPT models have hit the headlines, the sophistication of smaller NLP models is improving all the time.

  • The AI model, able to reply to any question and write any text, was quickly denounced as a cheating-enabling tool — and one that undermined learning.
  • Before long, researchers were applying similar techniques to a wide variety of domains, including natural language.
  • Transformers are good at information recall because they “remember” every token of their context—this is also why they become less efficient as the context grows.
  • An SLM running locally could provide those results without the lag and potential privacy concerns that often come with sending data from a mobile device to the cloud.
  • A report from The Times of India detailed how NoBroker, a real estate platform, developed SLMs to improve customer service interactions.
  • Conversely, small language models might have a more limited capacity to process information and generate text compared to LLMs, but they’re more efficient in meeting specific needs.

Another example is CodeGemma, a specialized version of Gemma focused on coding and mathematical reasoning. CodeGemma offers three different models tailored for various coding-related activities, making advanced coding tools more accessible and efficient for developers. You’d assume that large language models backed by enormous computational power, such as OPT-175B, would be able to process the same information faster and to a higher quality. But a general-purpose model doesn’t understand the structure of a research paper, doesn’t know what information is important, and doesn’t understand chemical formulas. It’s not the model’s fault — it simply hasn’t been trained on this information. ChatGPT relies on a subset of machine learning, called large language models, which have already been shown to be both immensely useful and potentially dangerous.

What’s more, the computational cost of generating that news article summary is less than $0.23. With such capabilities, creating a more powerful model would be significantly easier, as one could collect data that reflects current use of language while providing incredible source diversity. As a result, we believe that web scraping provides immense value to the development of any LLM by making data gathering significantly easier. There are of course better new models for almost every conceivable NLP task. However, we have also seen an emergence of derivative applications outside the field of NLP, such as OpenAI’s DALL·E, which uses a version of their GPT-3 LLM trained to generate images from text. This opens a whole new wave of potential applications we haven’t even dreamed of.
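
As a deliberately simplified illustration of that web-scraping step, here is a sketch using the requests and BeautifulSoup libraries. The URL is a placeholder, and a real data-gathering pipeline would also need robots.txt checks, rate limiting, deduplication, and text cleanup.

```python
# Minimal page-to-text scraper: fetch a page and keep its paragraph text.
import requests
from bs4 import BeautifulSoup

def scrape_text(url: str) -> str:
    """Return the visible paragraph text of a single web page."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))

# Placeholder URL for illustration:
# print(scrape_text("https://example.com/article")[:500])
```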

But a lot of valuable information is proprietary, time-sensitive, or otherwise not available for training. In my day job as President of Product and Technology at Marpai Inc., a third-party administrator, our core aim is to help self-insured companies administer their claims and to help our members find the best provider for their needs.

This meant that if you fed an early LLM more than about 15 pages of text, it would “forget” information from the beginning of its context. AI adoption may be steadily rising, but a closer examination shows that most enterprise companies may not be quite ready for the big time when it comes to artificial intelligence. Smaller models make AI accessible to businesses of all sizes, democratizing the power of this transformative technology. “We are seeing the concern from enterprises about using a model like GPT, or PaLM because they’re very large and have to be hosted by the model providers. In a sense your data does go through those providers,” said Arvind Jain, CEO of Glean, a provider of an AI-assisted enterprise search engine. There are also other techniques being tested, including one that involves training smaller sub-models for specific jobs as part of a larger model ecosystem.
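
The “smaller sub-models for specific jobs” idea can be pictured as a thin router in front of several task-specific SLMs. The sketch below is purely illustrative: the keyword routing rule and the stub models are assumptions, not a description of any vendor’s actual system.

```python
# Toy router that dispatches requests to task-specific "sub-models".
from typing import Callable, Dict

def summarizer(text: str) -> str:          # stand-in for a summarization SLM
    return f"[summary of {len(text.split())} words]"

def code_helper(text: str) -> str:         # stand-in for a code-focused SLM
    return "[code suggestion]"

SUB_MODELS: Dict[str, Callable[[str], str]] = {
    "summarize": summarizer,
    "code": code_helper,
}

def route(request: str) -> str:
    """Send the request to the first sub-model whose keyword matches."""
    for keyword, model in SUB_MODELS.items():
        if keyword in request.lower():
            return model(request)
    return summarizer(request)             # simple fallback

print(route("Please summarize this quarterly report ..."))
```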

This leads to high costs, making it difficult for smaller organizations or individuals to engage in core LLM development. At an MIT event last year, OpenAI CEO Sam Altman stated the cost of training GPT-4 was at least $100M. As the performance gap continues to close and more models demonstrate competitive results, it raises the question of whether LLMs are indeed starting to plateau.
