ollama

Daniel Hiltgen 17b7186cd7 Enable concurrency by default	2024-06-21 15:45:05 -07:00

This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these by the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small VRAM GPUs so this change also refines the algorithm so that when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.
config.go	Enable concurrency by default	2024-06-21 15:45:05 -07:00
config_test.go	move OLLAMA_HOST to envconfig (#5009)	2024-06-12 18:48:16 -04:00
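
The commit message says that when parallel is not explicitly set, a default is chosen that still fits the model on the GPU, because the parallel count scales the effective num_ctx. The Go sketch below illustrates that idea only; it is not the code from config.go. The names `pickParallel`, `parallelFromEnv`, `defaultParallel`, and the `estimate` callback are hypothetical, the halving search is a simplification chosen for illustration, and treating an unset or zero `OLLAMA_NUM_PARALLEL` as "choose automatically" is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultParallel is a hypothetical upper bound to try when the user
// has not asked for a specific value.
const defaultParallel = 4

// parallelFromEnv reads OLLAMA_NUM_PARALLEL. It returns 0 when the
// variable is unset, malformed, or negative; 0 is treated here as
// "not explicitly set" (an assumption of this sketch).
func parallelFromEnv() int {
	n, err := strconv.Atoi(os.Getenv("OLLAMA_NUM_PARALLEL"))
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// pickParallel returns the number of parallel request slots to use.
// An explicit value (> 0) always wins. Otherwise candidates are tried
// from defaultParallel downward, halving each time, and the first one
// whose estimated memory use fits in freeVRAM is returned. The total
// context passed to estimate is numCtx multiplied by the candidate,
// which is how parallelism drives memory use in this sketch.
func pickParallel(explicit, numCtx int, freeVRAM uint64, estimate func(totalCtx int) uint64) int {
	if explicit > 0 {
		return explicit
	}
	for p := defaultParallel; p > 1; p /= 2 {
		if estimate(numCtx*p) <= freeVRAM {
			return p
		}
	}
	return 1 // fall back to a single slot
}

func main() {
	// Hypothetical cost model: fixed weights plus a per-token cache cost.
	estimate := func(totalCtx int) uint64 {
		const weights = uint64(4096) << 20 // 4 GiB of model weights
		const perToken = uint64(128) << 10 // 128 KiB per context token
		return weights + uint64(totalCtx)*perToken
	}
	freeVRAM := uint64(4864) << 20 // 4.75 GiB free
	fmt.Println("parallel slots:", pickParallel(parallelFromEnv(), 2048, freeVRAM, estimate))
}
```

Keeping the memory estimate behind a callback means the selection logic can be exercised without a GPU; a real scheduler would plug in its own estimate.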
