A Game Plan for LLMs
The motivation for this article is to highlight how organizations can now effectively integrate AI into their processes and get tangible results.
The Shiny Car
Over a decade has passed since Machine Learning (ML) began gaining prominence, promising valuable insights and business differentiation. This promise has indeed materialized for companies with top technical talent and deep resources, the likes of Google, Facebook, and Microsoft. However, achieving the same has seldom been feasible for smaller companies.
In fact, I personally experienced this challenge when I was involved in establishing the Decision Platform department at a previous organization. Our objective was to set up an internal Machine Learning department focused on swiftly implementing feasible use cases in key business functions, e.g., predicting customers’ propensity to buy, next best action, and so on.
To achieve this, we brought in a leading external ML consulting partner and hired a few top performers from other organizations. However, the process of feature engineering, acquiring training data, and tuning the models proved to be intricate, cumbersome, and time-consuming. Before we could derive meaningful results, we found ourselves disrupted by significant advancements in the Deep Learning space. Deep Learning gained popularity not only for its ability to provide profound insights but also for its relative ease of use, eliminating the need for extensive feature engineering.
Without delving into specifics, we were not successful in the Deep Learning space either. It required a lot of computing resources, was complex to build with, and was evolving at a very fast pace, but none of these was as crippling as the talent attrition.
We found ourselves recruiting and losing skilled individuals before they could become productive contributors. After more than five years and millions of dollars, there was not a single piece of work we could point to with pride.
My takeaway from that experience is that experimentation is essential, and at times, the anticipated results may not materialize, but valuable discoveries might emerge. As an organization, we shifted our approach to utilizing market-ready ML solutions. Leveraging our acquired expertise in the field, we became adept at questioning and evaluating these solutions, leading to the creation of a few differentiating solutions in the marketplace.
A Way Forward
Although the underlying architecture gained prominence with Vaswani et al.’s 2017 paper “Attention Is All You Need” (my previous article covers the topic in depth), it wasn’t until the release of OpenAI’s ChatGPT that LLMs truly skyrocketed in popularity, capturing the imagination of the masses.
Today almost every corporate board in America has tasked its team with finding ways to employ these models (specifically LLMs) to gain a competitive edge in the marketplace.
In that spirit, I chose to document some of my thoughts on the path ahead.
Andrej Karpathy’s recent talk at Microsoft Build is highly informative and a must-watch.
There are three techniques you can use to adapt a language model to suit your use case.
- Build and Train your own custom model
- Use a Pre-Trained Model or Fine-Tune an Existing Model
- RAG (Retrieval Augmented Generation Technique)
Let’s dive a little deeper into each of these approaches to understand their suitability.
Build and Train your own custom model
This is how ML models are classically built. Typically, an organization or a research institution publishes a research paper citing their approach, the techniques they employed, the data they used, and the results they achieved. Every rock-star model in the market today has grown or evolved this classical way. This approach obviously requires highly skilled ML scientists, lots of relevant data, and compute power, which is why most of the powerful models are built by large tech companies, usually in collaboration with strong research universities.
This approach is the most difficult. The downside is high, but this is where the next big innovation will come from. If you are not a large tech, research, or government organization, you probably want to stay away from this approach.
Use a Pre-Trained Model or Fine-Tune an Existing Model
In the pre-trained model approach, you take an existing model and serve it your runtime data to get the output. Obviously, if the model is trained on generic data, which most LLMs are, the output won’t be accurate or relevant to your context. To make these models more contextual, you have the option to fine-tune them. Fine-tuning typically involves updating the weights of the last few layers of the model by training it on your proprietary, domain-specific data so that it generates outputs relevant to your data and context.
Fine-tuning a model, although not as complex as building a new one, is an involved and tedious process. Typically, it needs skilled ML scientists and engineers and a good amount of computational spend.
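For a sense of what “update only the last few layers” looks like in practice, here is a minimal sketch in Python using the open-source Hugging Face Transformers library (more on Hugging Face below): the pre-trained base is frozen and only the classification head is trained. The checkpoint name, labels, and the tiny in-memory dataset are illustrative placeholders, not a production recipe.

```python
# Minimal sketch: fine-tune only the head of a pre-trained model on domain data.
# The checkpoint, labels, and tiny dataset below are illustrative placeholders.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import Dataset

model_name = "distilbert-base-uncased"  # any suitable pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze the pre-trained base so only the newly added classification head is updated.
for param in model.base_model.parameters():
    param.requires_grad = False

# Your proprietary, domain-specific labeled examples (placeholder data).
train_data = Dataset.from_dict({
    "text": ["invoice overdue by 30 days", "customer praised the new feature"],
    "label": [0, 1],
})
train_data = train_data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3, per_device_train_batch_size=8),
    train_dataset=train_data,
)
trainer.train()
```

In a real project the dataset would be far larger, and you might unfreeze a few of the top transformer layers as well; the sketch only illustrates the shape of the workflow.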
Until recently this approach was hard and available only to those with ML prowess, but I am excited by the developments happening at Hugging Face. Hugging Face has become a go-to open-source platform where state-of-the-art pre-trained models (in the relevant domains) are available on the hub: you can download a model, configure it for your task or fine-tune it on your dataset if needed, and apply it.
This is exciting because domain-specific pre-trained models are now becoming available, which you can download, and hopefully you will find one that fits your data and can be applied to your use case immediately with little to no modification.
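As an illustration, here is a minimal sketch of pulling a published checkpoint from the Hugging Face hub and applying it as-is, with no fine-tuning. The checkpoint (a financial-sentiment model) and the input text are just examples; you would substitute a model that matches your own domain and task.

```python
# Minimal sketch: apply a pre-trained model from the Hugging Face hub as-is.
# The checkpoint and example text are illustrative; pick one that fits your domain.
from transformers import pipeline

# FinBERT is a financial-sentiment checkpoint published on the hub.
classifier = pipeline("text-classification", model="ProsusAI/finbert")

print(classifier("The company's quarterly revenue beat analyst expectations."))
# e.g. [{'label': 'positive', 'score': 0.95}]
```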
The other reason I am excited: for generic LLMs like ChatGPT you must provide data, both the input and the grounding data (your data), which the provider will retain and likely use in the future. Also, every request received from your users will likely translate into multiple LLM calls, for which you will incur charges that quickly add up. With Hugging Face, you get a state-of-the-art model that is yours to keep, you don’t have to share data, and you have good control over costs.
This approach of fine-tuning an existing model is good for companies that want to build a differentiating solution in the marketplace. It suits disruptors and start-ups, organizations that handle sensitive data (financial institutions, etc.), and those working on mission-critical use cases.
RAG (Retrieval Augmented Generation Technique)
RAG is a technique used to improve the accuracy of an LLM by providing additional relevant data, at the right time, for it to reason over. RAG is necessary because one of the major challenges with LLMs is hallucination: the tendency to generate confident, plausible, but factually incorrect responses.
RAG addresses hallucination by augmenting LLMs with external knowledge (Mialon et al., 2023). These methods pair an LLM with a retrieval system that seeks to utilize external knowledge to guide the generation process. Instead of relying solely on the LLM’s training knowledge, they fetch relevant information from external knowledge sources such as websites, document stores, and databases, and provide it to the LLM to reason over.
Depending on your data and how it is stored, you may use SQL (databases), SPARQL (knowledge graphs), embeddings (unstructured data), or APIs for quick retrieval. Embeddings make unstructured data easily searchable using natural language. This is done by taking the unstructured data, generating embeddings, storing them in a vector database (such as Pinecone) or via a framework like LlamaIndex, and, when a query occurs, searching those embeddings for the most relevant context and providing it to the LLM.
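To make that flow concrete, here is a minimal sketch of the embeddings path, assuming the sentence-transformers library for embeddings and a plain in-memory index in place of a hosted vector database. The documents, query, and prompt template are placeholders, and the final LLM call is left as a hypothetical.

```python
# Minimal RAG sketch: embed documents, retrieve the closest ones for a query,
# and build an augmented prompt for the LLM. Documents and query are placeholders;
# a real system would use a vector database instead of an in-memory array.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Index time: embed your unstructured documents.
documents = [
    "Our premium plan includes 24/7 phone support.",
    "Refunds are processed within 5 business days.",
    "The mobile app supports offline mode since version 3.2.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

# 2. Query time: embed the question and retrieve the most similar documents.
query = "How long do refunds take?"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]
scores = doc_vectors @ query_vector          # cosine similarity (vectors are normalized)
top_k = np.argsort(scores)[::-1][:2]
context = "\n".join(documents[i] for i in top_k)

# 3. Generation time: hand the retrieved context to the LLM along with the question.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\n"
)
# response = your_llm_client.complete(prompt)  # hypothetical LLM call
print(prompt)
```

The same three steps (index, retrieve, generate) apply whether the store is a numpy array, Pinecone, or a knowledge graph queried with SPARQL; only the retrieval layer changes.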
Providing additional relevant data at the right time helps the model make the most of its limited context window when reasoning, thereby increasing its accuracy (Mialon et al., 2023).
Amongst the three, this approach is the simplest, and the best part is that it does not require skilled ML scientists or ML engineers. Solutions can be built by solo developers (read: non-ML developers) in a matter of weeks, if not days.
This space is rapidly evolving and has really sparked the imagination on possible use cases that can be built with this technology. A few recent promising developments, such as Chain-of-Thought prompting (Wei et al., 2023) and Verify-and-Edit (Zhao et al., 2023), seem to further enhance performance in generation and reasoning tasks. Context windows are also growing: OpenAI has extended to 16K tokens and Anthropic to 100K.
I recommend this approach to everyone, from solo developers to large organizations. It would not necessarily create differentiation in the marketplace, since the moat is small, but it will allow an organization to unlock potential, improve productivity, and become more efficient in its operations.