Last Week

Development derailed. I got sucked into this AI thing and wound up setting up a local, private AI stack. Below is an example of me asking the AI to write code for me, with its response shown in real time on local hardware, using anonymized internet search. Since it is shown in real time, it is a bit slower than what we might be used to from services like the public ChatGPT.

Example of using AI for coding

Tools used: Docker, Ollama, Open WebUI, SearXNG, and Gluetun.

What does it mean in English?

  • An AI model is just a bunch of numbers stored in a very large file. An AI engine is needed to make use of AI models. This is not dissimilar to how we need a video player to play video files.
  • Running an AI model requires a lot of computations happening at the same time. A GPU is much better at doing this than a CPU.
  • An AI model, once trained, does not incorporate new information in the model itself. However, it is possible to make it search the internet and incorporate the response from search engines in the response it gives us.
  • Running everything required for the model, the engine, and the UI requires lots of dependencies (i.e., tools). Docker makes it easy to assemble everything in one place for the application to run.
  • A search aggregator queries many search engines, combines their results, strips away the ads, and returns them to the user. When routed through a VPN service, it makes it nigh impossible to track the user.

Nerdy Details

How did I set up my private local LLM that sends no data to big tech?

First, we need a computer with a fairly powerful GPU. In my case, I purchased a used RTX 3090 with 24GB of VRAM for $650. The CPU matters far less; most modern processors would do just fine.

Then we install an operating system on it. I chose Fedora Server because I like hats.

Once the OS is installed, we can get to work setting up the LLM stack. We will need:

  1. Open WebUI, which is a web app that simplifies the interaction with your local LLMs.
  2. Ollama, which is an app that manages and runs local LLMs.
  3. Large Language Models, which contain the actual data (weights) used to perform inference.

To kill three birds with one stone, we will spin up a docker container that has both Open WebUI and Ollama pre-configured. Then we will go into the Open WebUI interface to download models to use.

Prerequisites:

  1. Make sure Docker is installed on your system.
  2. If you are using an NVIDIA GPU with CUDA support, ensure that the NVIDIA Container Toolkit is installed (a quick check is shown below).
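
To verify both prerequisites in one shot, we can run a throwaway container with GPU access. This is a minimal sanity check, assuming an NVIDIA setup like mine; if the toolkit is configured correctly, nvidia-smi should list your GPU.

docker --version
docker run --rm --gpus all ubuntu nvidia-smi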

Make a docker-compose.yml file and include the following:

services:
    open-webui:
        image: ghcr.io/open-webui/open-webui:ollama
        container_name: open-webui
        restart: unless-stopped
        ports:
            - 3000:8080 # host port 3000 -> Open WebUI's internal port 8080
        deploy:
            resources:
                reservations:
                    devices:
                        # hand the NVIDIA GPU(s) through to the container
                        - driver: nvidia
                          count: all
                          capabilities:
                              - gpu
        volumes:
            - ./ollama:/root/.ollama # downloaded models persist here
            - ./open-webui:/app/backend/data # Open WebUI settings and chats

Run the container.

docker compose up -d
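
Models can be downloaded through the Open WebUI interface, but they can also be pulled from the command line via the bundled Ollama. Here is a quick sketch, assuming the container is named open-webui as in the compose file above; the model tag is just an example, so pick one that fits your VRAM.

docker exec -it open-webui ollama pull llama3.1:8b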

Now we can access Open WebUI at 127.0.0.1:3000. But can we go deeper? Below we will set up a locally-hosted search aggregation service (for use by Open WebUI) and route its outgoing traffic through a VPN.

First, we set up a Docker container that provides the VPN service, using the following docker compose file. Make sure to customize it with your own VPN provider's details.

Importantly, we need to expose the container name by specifying container_name, so that it becomes available to the other containers we will create shortly. Additionally, we also publish port 8080 for searxng. This means that once the searxng container is connected to the gluetun network, we can still access searxng through gluetun at this port.

services:
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
    cap_add:
      - NET_ADMIN
    restart: unless-stopped
    devices:
      - /dev/net/tun:/dev/net/tun
    ports:
      - 8888:8888/tcp # HTTP proxy
      - 8388:8388/tcp # Shadowsocks
      - 8388:8388/udp # Shadowsocks
      - 8080:8080/tcp # searxng
    volumes:
      - ./gluetun:/gluetun
    environment:
      - VPN_SERVICE_PROVIDER= # fill in your provider, as named in the gluetun wiki
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY= # fill in from your provider's wireguard config
      - WIREGUARD_ADDRESSES= # fill in from your provider's wireguard config
      - WIREGUARD_PRESHARED_KEY= # fill in if your provider uses one
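
Once the tunnel is up, it is worth confirming that traffic actually leaves through the VPN. A minimal check, assuming an IP echo service like ifconfig.me, is to run a throwaway container inside gluetun's network namespace; the address it prints should belong to your VPN provider, not your ISP.

docker run --rm --network=container:gluetun alpine wget -qO- https://ifconfig.me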

After the gluetun tunnel is up, we spin up a searxng container and attach it to the gluetun network, using the following docker compose file. Importantly, note that network_mode is set to container:gluetun for both containers. This means that instead of Docker managing each container's own network, redis and searxng join gluetun's network namespace and share its network stack directly. This ensures that any outgoing traffic from searxng goes through gluetun's VPN tunnel.

services:
  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    network_mode: "container:gluetun"
    volumes:
      - ./valkey-data2:/data
    cap_add:
      - SETGID
      - SETUID
      - DAC_OVERRIDE
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    network_mode: "container:gluetun"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080/ # set to the address you will use to reach searxng
      - UWSGI_WORKERS=4
      - UWSGI_THREADS=4
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

This docker compose file spins up a redis-compatible database server (valkey) for searxng to use, and then searxng itself. If everything goes as expected, we can visit searxng at 127.0.0.1:8080, the port we published through gluetun earlier. Then, inside Open WebUI's interface, we can configure it to search the internet using the searxng service we just set up.
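
One wrinkle worth noting: Open WebUI talks to searxng using its JSON output, which searxng disables by default. A minimal tweak, assuming the settings file lives in the ./searxng volume we mounted above, is to enable the json format in settings.yml:

search:
  formats:
    - html
    - json

Then, in Open WebUI's admin settings under Web Search, select searxng as the engine and point the query URL at the port we published through gluetun, e.g. http://127.0.0.1:8080/search?q=<query>.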

Next Week

  • Actually flesh out the UI on the host landing page, ideally using AI help.