
LLM on a Dime

Exploring large language models without draining your wallet is easier than ever. Thanks to platforms like Runpod.io, you can spin up a GPU server and run powerful LLMs for just a few bucks an hour. Here, we’ll share how to do it so you can dive into cutting-edge AI without breaking the bank. Let’s get started!

Requirements

  • Runpod.io account
  • Basic LLM and Linux knowledge
  • $10 (approx. 10 hours of usage)

1. Navigate to the Dashboard

  1. Sign In
    After logging in, you will land on the Runpod dashboard. This is where you can view any existing servers (pods) you have.

  2. Create a Pod
    Go to Pods and click the Deploy button to begin setting up a new instance.


2. Select GPU / CPU Hardware Type

  1. Choose Instance Type
    • Runpod displays different GPU options (e.g., NVIDIA RTX 3090, A10, A100) with varying prices and performance.
    • For this demo I will be running the DeepSeek-R1 70B-parameter model, which requires at least one NVIDIA A40 (48 GB of VRAM); see the sizing note below.
  2. Check Pricing
    • Pricing is displayed per hour. Verify the cost fits your budget and that the hardware is suitable for your project.
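
As a rough sizing check (assuming the 4-bit quantization Ollama uses by default): 70B parameters × ~0.5 bytes per parameter ≈ 35 GB for the weights alone, plus several more gigabytes for the KV cache and runtime overhead. That is why a 48 GB card like the A40 fits comfortably while a 24 GB card does not.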

3. Select Your Use Case / Template

Runpod often provides quick templates for various setups. You have two main approaches:

  1. Use a Community Template
    • Runpod offers community images such as “Stable Diffusion,” “PyTorch,” “TensorFlow,” etc.
    • If one meets your needs, select it to simplify your setup.
  2. Use a Custom Image
    • I’ll be using a plain Ubuntu server image.
    • When running a custom image, make sure to include your public SSH key and increase the container disk size to match the model’s requirements. In this scenario I’ll need at least 65 GB on the container disk to run DeepSeek-R1:70B. If you don’t have an SSH key yet, see the sketch after this list.
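
If you don’t yet have a key pair, here’s a minimal sketch for creating one (ed25519 is just a sensible default; the comment string is arbitrary):

     ssh-keygen -t ed25519 -C "runpod"   # generate a new key pair
     cat ~/.ssh/id_ed25519.pub           # paste this public key into Runpod's pod settings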

  3. Deploy
    • Choose the template that best suits your requirements, then click Deploy On-Demand.
    • Wait a short time (seconds to minutes) for the pod to start.

4. Connect to Your New Server

Once your pod is running, you can connect in several ways:

Option A: Built-in Runpod Web Shell / Web GUI

  • For many templates (e.g., Jupyter Notebook), you’ll see a Connect or Open in Browser button. Click it to open the environment in a new browser tab.
  • You can often access a Web Terminal or SSH directly from the Runpod dashboard.

Option B: SSH from Your Local Machine

  1. Retrieve SSH Info
    • In the pod’s details, you’ll see an SSH command, something like:
      
      ssh runpod-username@ip-address -p PORT
      
  2. Use the Command
    • Open a terminal on your local machine, paste the SSH command, and press Enter.
    • If you added your SSH key, it will authenticate automatically. Otherwise, you may be prompted for a password.
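
Once you’re in, it’s worth confirming that the GPU is visible before installing anything (nvidia-smi ships with the NVIDIA driver, which Runpod’s GPU images include):

     nvidia-smi   # should list the A40 with 48 GB of VRAM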

5. Install Ollama

  • Once connected to the pod, install Ollama:
    
     curl -fsSL https://ollama.com/install.sh | sh
    
  • Install the CUDA drivers if they are not already present: https://developer.nvidia.com/cuda-downloads
  • Next, we are going to use screen to run the Ollama server in the background:
    
       screen -S Ollama
    
  • This creates a new window, where we run ollama serve.
  • Once the Ollama server is running, detach from the screen session with Ctrl+A, then D.
  • Download the desired model, e.g., DeepSeek-R1:70B; this step may take several minutes, as the weights are a large download. (See the command recap below.)
  • Exit Ollama when you’re done; if you’re at an interactive ollama run prompt, /bye exits it.

We now have Ollama and the model installed and we are able to run it from the CLI.
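
For reference, here’s a minimal recap of the commands above, assuming the deepseek-r1:70b tag from the Ollama model library:

     screen -S Ollama              # start a named screen session
     ollama serve                  # run the server; detach with Ctrl+A, then D
     ollama pull deepseek-r1:70b   # download the model weights
     ollama run deepseek-r1:70b    # chat from the CLI; exit with /bye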

6. Install Open WebUI

Time to give your model a sleek web interface with Open WebUI! Here’s how to set it up:

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve
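
Open WebUI should detect the local Ollama server automatically; if it doesn’t, you can point it there explicitly. OLLAMA_BASE_URL is Open WebUI’s setting for this, and http://localhost:11434 is Ollama’s default address:

OLLAMA_BASE_URL=http://localhost:11434 DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve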

7. Access Open WebUI in Your Browser

Since everything is running inside the pod’s container, accessing it directly via http://<pod-ip>:8080 may not be possible or secure. Instead, an SSH tunnel (local port forwarding) lets you reach the web interface securely from your local machine without exposing it to the internet.

  • Establish the SSH Tunnel
    
       ssh -N -L 8080:localhost:8080 root@your-server-ip -p port
    

With the SSH tunnel in place, you’re all set to interact with your DeepSeek-R1 70B model through a user-friendly web interface!
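
If you want a quick sanity check before opening the browser, you can probe the forwarded port from another local terminal (any HTTP response means the UI is reachable):

     curl -I http://localhost:8080   # then browse to http://localhost:8080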

Now you can run a massive 70B model without an exorbitant upfront investment in hardware, such as a €7,000 graphics card. Instead, leverage cloud platforms like Runpod to access the GPU resources you need for around a dollar an hour! Hope you enjoyed it 😁

This post is licensed under CC BY 4.0 by the author.