When designing the architecture of an LLM application, one key focus area is LLM deployment. Part of deployment is defining an LLM hosting strategy: the available hosting options need to be evaluated against criteria such as cost, and the most appropriate one selected. In this blog, we will learn about the hosting options for different kinds of LLMs and the related strategies.
The cost of LLM hosting depends on the type of LLM we need for our application.
If we use a proprietary model such as GPT-4 or Claude 2, our LLM hosting cost is primarily the API cost. We do not need to host such models ourselves; they are hosted on the servers of the LLM API providers. Proprietary models expose a REST API that the application can call. The API cost depends on the number of tokens processed per request, and providers typically price input and output tokens at different rates. Recall that the number of tokens includes both the input tokens (the text you send to the model) and the output tokens (the text the model generates in response). For example, if you send a request with a prompt that contains 100 tokens and the model generates a response with 150 tokens, the total number of tokens processed would be 250 tokens.
LLM Hosting Cost = f(API Cost)
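To make the token arithmetic concrete, here is a minimal sketch of a per-request cost calculation. The per-1,000-token prices are placeholder assumptions for illustration, not any provider's actual rates; check the current pricing page of the API you use.

```python
def api_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Return the cost in dollars of a single API request.

    Input and output tokens are billed separately, each per 1,000 tokens.
    """
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Assumed placeholder prices in dollars per 1,000 tokens (not real quotes).
ASSUMED_INPUT_PRICE_PER_1K = 0.03
ASSUMED_OUTPUT_PRICE_PER_1K = 0.06

# The example from the text: a 100-token prompt and a 150-token response.
total_tokens = 100 + 150
request_cost = api_cost(100, 150, ASSUMED_INPUT_PRICE_PER_1K, ASSUMED_OUTPUT_PRICE_PER_1K)

print(f"Tokens processed: {total_tokens}")
print(f"Estimated request cost: ${request_cost:.4f}")
```

Multiplying the per-request cost by the expected number of requests per month gives a first estimate of the monthly API bill.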
There are several reasons to choose a proprietary, API-hosted model over an open-source model. Some of the important ones:
If we want to use open-source LLMs such as Llama, the models need to be downloaded and hosted on our own infrastructure. The cost depends on the size of the model.
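Model size drives cost mainly through GPU memory. As a rough sketch, the memory needed just to hold the weights is the parameter count multiplied by the bytes per parameter. The helper name and the precision table below are illustrative choices, and the estimate covers weights only; inference also needs memory for activations and the key-value cache.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) needed for the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# A 13-billion-parameter model at different precisions.
for dtype in ("fp32", "fp16", "int8"):
    print(f"13B parameters in {dtype}: {weight_memory_gb(13e9, dtype):.0f} GB")
```

Under these assumptions, a 13B model in 16-bit precision needs about 26 GB for weights, which already exceeds the memory of many single consumer GPUs.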
Let’s understand this with the help of the Llama-13B model. The Llama-13B model can be downloaded from the Hugging Face website. Here is the Hugging Face Llama page. Downloading the model means downloading the model weights as a set of files. Here are some of the steps that need to be taken to download the model.
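One possible way to script the download is sketched below, using `snapshot_download` from the `huggingface_hub` package. The repository id, the file patterns, and the local directory are assumptions for illustration; substitute the id shown on the model page you actually use. Llama repositories are typically gated, so an access token from an account that has accepted the license is usually required.

```python
# Assumed repository id; replace it with the id from the model page you use.
REPO_ID = "meta-llama/Llama-2-13b-hf"

# Assumed file patterns: configuration, tokenizer, and safetensors weight files.
ALLOW_PATTERNS = ["*.json", "*.safetensors", "tokenizer.model"]


def download_model(repo_id=REPO_ID, local_dir="./llama-13b", token=None):
    """Download the model files for `repo_id` into `local_dir` and return the path.

    Requires `pip install huggingface_hub` and network access. The import is
    done lazily so this module can be loaded without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
        token=token,
    )


# Example call (not executed here because it downloads tens of gigabytes):
# path = download_model(token="hf_...")  # hypothetical token value
```

Restricting the download with `allow_patterns` avoids fetching duplicate weight formats when a repository ships more than one.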
The hosting cost would comprise the following:
Based on the rough estimate above, the total monthly cost of hosting a Llama-13B model on a cloud server comes to roughly the following:
Using the Llama-13B model involves downloading and hosting it on the company’s infrastructure, at a monthly cost of around $25,921.20. This cost primarily covers GPU compute, storage, and operational overheads.
In comparison, using a proprietary model like GPT-4 hosted by OpenAI would eliminate the need for local infrastructure and maintenance. Instead, the company would pay based on the number of tokens processed, which could be more cost-effective and simpler to manage depending on usage patterns and specific requirements.
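Whether the API route is more cost-effective depends on volume. The sketch below finds the monthly token volume at which a pay-per-token API costs the same as the fixed self-hosting estimate of $25,921.20. The blended price per 1,000 tokens is a placeholder assumption, and the function names are illustrative.

```python
# Fixed monthly self-hosting cost, taken from the estimate in this post.
SELF_HOSTED_MONTHLY_COST = 25_921.20

# Assumed blended API price in dollars per 1,000 tokens (placeholder value).
ASSUMED_BLENDED_PRICE_PER_1K = 0.04


def breakeven_tokens_per_month(fixed_monthly_cost, price_per_1k_tokens):
    """Monthly token volume at which API spend equals the fixed hosting cost."""
    return fixed_monthly_cost / price_per_1k_tokens * 1000


def cheaper_option(monthly_tokens, fixed_monthly_cost, price_per_1k_tokens):
    """Return 'api' or 'self-hosted' for the given monthly token volume."""
    api_spend = monthly_tokens / 1000 * price_per_1k_tokens
    return "api" if api_spend < fixed_monthly_cost else "self-hosted"


breakeven = breakeven_tokens_per_month(SELF_HOSTED_MONTHLY_COST, ASSUMED_BLENDED_PRICE_PER_1K)
print(f"Break-even volume: {breakeven:,.0f} tokens per month")
print(cheaper_option(50_000_000, SELF_HOSTED_MONTHLY_COST, ASSUMED_BLENDED_PRICE_PER_1K))
```

Below the break-even volume the API is cheaper; above it, the fixed cost of self-hosting is spread over enough tokens to win, assuming a single deployment can actually serve that load.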
When we train an LLM in-house, the libraries we use differ somewhat from the open-source case, where the Hugging Face Transformers library can be used to load pretrained models.
We will need powerful computing infrastructure with large GPUs such as NVIDIA A100s, and a distributed training environment, given the size of the model and dataset. Frameworks such as TensorFlow or PyTorch can be used to train the LLM. Once the LLM is trained, we need to export the model in a format suitable for deployment. Common formats include ONNX, TensorFlow SavedModel, and PyTorch model files.
Now we are ready for the LLM deployment. We can choose to host the LLM on on-premises servers or on cloud-based solutions (e.g., AWS, GCP, Azure). We can use serving frameworks such as TensorFlow Serving and TorchServe to serve the model and expose it to applications.
Suppose we trained an LLM from scratch with 5B parameters. For hosting such LLMs, we would typically use instances with powerful GPUs. AWS offers various GPU instances, but a common choice for deep learning tasks is the p4d.24xlarge or p4de.24xlarge instance, each of which comes with eight NVIDIA A100 GPUs. Here’s a cost estimate based on the p4d instance.
Monthly Cost Calculation:
You will also need storage for your model and data. Let’s assume you need around 100 GB of storage for the model weights, data, and other necessary files.
Monthly Storage Cost:
Data transfer costs depend on the amount of data being transferred in and out of AWS. For simplicity, let’s assume a modest data transfer of 1 TB per month.
Monthly Data Transfer Cost:
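The three components above (compute, storage, and data transfer) can be combined in a small calculator. This is a sketch under stated assumptions: the hourly instance rate, the per-GB storage and transfer rates, and the 730-hour month are all placeholder values for illustration and are not the figures from the original estimate. AWS pricing varies by region and changes over time, so take real rates from the AWS pricing pages.

```python
HOURS_PER_MONTH = 730  # assumed average number of hours in a month

# Assumed placeholder rates in dollars (not authoritative AWS prices).
ASSUMED_INSTANCE_RATE_PER_HOUR = 32.77   # on-demand p4d.24xlarge, assumed
ASSUMED_STORAGE_RATE_PER_GB = 0.08       # block storage per GB-month, assumed
ASSUMED_TRANSFER_RATE_PER_GB = 0.09      # outbound data transfer per GB, assumed


def monthly_hosting_cost(instance_rate, storage_gb, transfer_gb,
                         storage_rate=ASSUMED_STORAGE_RATE_PER_GB,
                         transfer_rate=ASSUMED_TRANSFER_RATE_PER_GB,
                         hours=HOURS_PER_MONTH):
    """Return a breakdown of monthly compute, storage, and transfer costs."""
    compute = instance_rate * hours
    storage = storage_gb * storage_rate
    transfer = transfer_gb * transfer_rate
    return {
        "compute": compute,
        "storage": storage,
        "transfer": transfer,
        "total": compute + storage + transfer,
    }


# 100 GB of storage and 1 TB (taken as 1,000 GB) of transfer, as in the text.
costs = monthly_hosting_cost(ASSUMED_INSTANCE_RATE_PER_HOUR, storage_gb=100, transfer_gb=1000)
for item, amount in costs.items():
    print(f"{item:>8}: ${amount:,.2f}")
```

With rates in this range, compute dominates the bill by a wide margin, so instance choice and utilization matter far more than storage or transfer.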