ollama/envconfig
Daniel Hiltgen 17b7186cd7 Enable concurrency by default
This adjusts our default settings to enable multiple models and parallel
requests to a single model.  Users can still override these by the same
env var settings as before.  Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on small VRAM GPUs
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s).  As before, multiple models will only load
concurrently if they fully fit in VRAM.
2024-06-21 15:45:05 -07:00
..
config.go Enable concurrency by default 2024-06-21 15:45:05 -07:00
config_test.go move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00