Last Week
Development derailed. I got sucked into this AI thing and wound up setting up a local, private AI stack. Below is an example of me asking the AI to write code for me, with its response captured in real time on local hardware with anonymized internet search. Since it is shown in real time, it is a bit slower than what we might be used to from services like the public ChatGPT.
Tools used:
- GPU: NVIDIA RTX 3090 ($650 used)
- Operating System: Fedora Server (free)
- Container Service: Docker (free)
- AI Engine: Ollama (free)
- AI Engine UI: Open WebUI (free)
- Reasoning Model: DeepSeek-R1 32B (free)
- Search Aggregator: SearXNG (free)
- Search Aggregator Database: valkey (free)
- VPN Container Service: gluetun (free)
What does this mean in English?
- An AI model is just a bunch of numbers stored in a very large file. An AI engine is needed to make use of AI models. This is not dissimilar to how we need a video player to play video files.
- Running an AI model requires a lot of computations happening at the same time. A GPU is much better at doing this than a CPU.
- An AI model, once trained, does not incorporate new information into itself. However, it is possible to have it search the internet and fold the search results into the responses it gives us.
- Running the model, the engine, and the UI requires lots of dependencies (i.e., supporting tools). Docker makes it easy to assemble everything in one place for each application to run.
- A search aggregator queries many search engines, combines their results, strips away the ads, and returns them to the user. When routed through a VPN service, it becomes nigh impossible to track the user.
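To make the model/engine split concrete, here is roughly what driving Ollama from the command line looks like once it is installed (the model tag below is my assumption; any tag from the Ollama library works the same way):

# Download the model weights: a multi-gigabyte file of numbers.
ollama pull deepseek-r1:32b
# The engine loads those weights and performs inference on a prompt.
ollama run deepseek-r1:32b "Summarize what a search aggregator does."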
Nerdy Details
How did I set up my private local LLM that sends no data to big tech?
First, we need a computer with a fairly powerful GPU. In my case, I purchased a used RTX 3090 with 24GB of VRAM for $650. The CPU matters far less; most modern processors will do just fine.
Then we install an operating system on it. I chose Fedora Server because I like hats.
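With the OS in place (and assuming the proprietary NVIDIA driver has been installed), it is worth confirming the card and its VRAM are visible before going further:

# Prints the GPU model and total VRAM, e.g. "NVIDIA GeForce RTX 3090, 24576 MiB".
nvidia-smi --query-gpu=name,memory.total --format=csv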
Once the OS is installed, we can get to work setting up the LLM stack. We will need:
- Open WebUI, which is a web app that simplifies the interaction with your local LLMs.
- Ollama, which is an app that manages and runs local LLMs.
- Large language models, which contain the actual data (weights) needed to perform inference.
To kill three birds with one stone, we will spin up a docker container that has both Open WebUI and Ollama pre-configured. Then we will go into the Open WebUI interface to download models to use.
Prerequisites:
- Make sure Docker is installed on your system.
- If you are using an NVIDIA GPU with CUDA support, ensure that the NVIDIA Container Toolkit is installed. A quick sanity check is sketched below.
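A minimal verification sketch for both prerequisites (the CUDA image tag here is just an example; any recent tag should work):

docker --version
# If the toolkit is wired up correctly, this prints the same GPU table
# inside a container that nvidia-smi prints on the host.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi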
Make a docker-compose.yml file and include the following:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    restart: unless-stopped
    ports:
      - 3000:8080
    volumes:
      - ./ollama:/root/.ollama
      - ./open-webui:/app/backend/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
Run the container.
docker compose up -d
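If the UI does not come up, the container status and logs are the first things to check:

docker ps --filter name=open-webui
# Follow the logs; model downloads and GPU initialization show up here.
docker logs -f open-webui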
Now we can access Open WebUI at 127.0.0.1:3000. But can we go deeper? Below we will set up a locally-hosted search aggregation service (for use by Open WebUI) and route its outgoing traffic through a VPN.
First, we set up a docker container that provides the VPN tunnel, using the following docker compose file. Make sure to customize it with your own VPN provider's details.
Importantly, we need to give the container a fixed name with container_name so that it is reachable from the other containers we will create shortly. We also expose port 8080 for searxng: once the searxng container is connected to the gluetun network, we can still access searxng through gluetun at this port.
services:
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
    cap_add:
      - NET_ADMIN
    restart: unless-stopped
    devices:
      - /dev/net/tun:/dev/net/tun
    ports:
      - 8888:8888/tcp # HTTP proxy
      - 8388:8388/tcp # Shadowsocks
      - 8388:8388/udp # Shadowsocks
      - 8080:8080/tcp # searxng
    volumes:
      - ./gluetun:/gluetun
    environment:
      - VPN_SERVICE_PROVIDER=
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=
      - WIREGUARD_ADDRESSES=
      - WIREGUARD_PRESHARED_KEY=
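Once gluetun is up, one way to confirm traffic actually leaves through the tunnel is to run a throwaway container inside gluetun's network namespace and ask an IP-echo service (ifconfig.me is just one example) what address it sees:

docker compose up -d
# Should print the VPN endpoint's public IP, not your ISP-assigned one.
docker run --rm --network=container:gluetun alpine wget -qO- https://ifconfig.me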
After the gluetun tunnel is up, we spin up a searxng container and attach it to the gluetun network, using the following docker compose file. Importantly, note that network_mode is set to container:gluetun for both containers. This means that instead of Docker managing each container's own network, redis and searxng share gluetun's network stack, with gluetun acting as their gateway to the outside world. This ensures that any outgoing traffic from searxng goes through gluetun's VPN tunnel.
services:
  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    network_mode: "container:gluetun"
    volumes:
      - ./valkey-data2:/data
    cap_add:
      - SETGID
      - SETUID
      - DAC_OVERRIDE
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"
  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    network_mode: "container:gluetun"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://0.0.0.0/
      - UWSGI_WORKERS=4
      - UWSGI_THREADS=4
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"
This docker compose file spins up a Valkey (Redis-compatible) database for searxng to use, and then searxng itself. If everything goes as expected, we can visit searxng at 127.0.0.1:8080, the port gluetun exposes for it. Then, inside Open WebUI's interface, we can configure it to search the internet using the searxng service we just set up.
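A quick sanity check that searxng is answering through gluetun's exposed port; Open WebUI's web-search settings then just need a searxng query URL along the lines of http://<host>:8080/search?q=<query> (the exact settings path varies between Open WebUI versions):

# Expect an HTTP 200; if searxng's built-in bot detection rejects the
# request, check the container logs with: docker logs searxng
curl -s -o /dev/null -w "%{http_code}\n" "http://127.0.0.1:8080/search?q=test"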
Next Week
- Actually flesh out the UI on the host landing page, ideally using AI help.