* subprocess improvements
- increase start-up timeout
- when runner fails to start fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages
* Update llama.go
* Update llama.go
* simplify by using glob