The default thread count logic was broken: on CPUs with hyperthreading it produced twice as many threads as it should, causing thrashing and poor performance.
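
As a rough illustration of the intended behavior (not the actual fix), the sketch below derives a thread count from the logical CPU count. `defaultThreads` and its `threadsPerCore` parameter are hypothetical names, and the value 2 passed in `main` is an assumption about the SMT width rather than something detected from the hardware.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultThreads is a hypothetical helper: it divides the logical CPU count
// by an assumed number of hardware threads per core, so that hyperthreaded
// siblings are not each given their own worker thread.
func defaultThreads(logicalCPUs, threadsPerCore int) int {
	if threadsPerCore < 1 {
		threadsPerCore = 1 // treat invalid input as "no SMT"
	}
	n := logicalCPUs / threadsPerCore
	if n < 1 {
		n = 1 // always run at least one thread
	}
	return n
}

func main() {
	// runtime.NumCPU reports logical CPUs, which includes hyperthreads.
	logical := runtime.NumCPU()

	// Assumption: 2 hardware threads per physical core.
	fmt.Println("threads:", defaultThreads(logical, 2))
}
```

Using the logical count directly is what doubles the thread count on a hyperthreaded machine; dividing by the SMT width brings it back to one thread per physical core.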
Run server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.