If someone checks out the ollama repo without the CUDA library installed, this ensures they can still build a CPU-only version.
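One common way to gate optional GPU support in Go is build tags, so a checkout without CUDA still compiles cleanly. This is only a sketch of that pattern; the file names, package name, and `cuda` tag are assumptions, not the repo's actual layout.

```go
// gpu_cuda.go — compiled only when built with: go build -tags cuda
//go:build cuda

package llm

// initBackend would link against the CUDA library here.
func initBackend() string { return "cuda" }
```

```go
// gpu_cpu.go — the default when the cuda tag is absent,
// so a checkout without CUDA libraries still builds.
//go:build !cuda

package llm

// initBackend falls back to the CPU-only path.
func initBackend() string { return "cpu" }
```

With this layout, `go build` produces a CPU-only binary by default, and `go build -tags cuda` opts into the GPU path.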
Run server.cpp directly inside the Go process via cgo while retaining the LLM abstractions in Go.