Merge pull request #1074 from jmorganca/mattw/loganalysisexample

Log Analysis Example
2023-11-16 16:33:07 -08:00 · 2023-11-16 16:33:07 -08:00 · ab6639bc47
commit ab6639bc47
parent dbe6e77472 f4edc302a8
5 changed files with 131 additions and 0 deletions
--- a/examples/python-loganalysis/Modelfile
+++ b/examples/python-loganalysis/Modelfile
@@ -0,0 +1,8 @@
 FROM codebooga:latest
 SYSTEM """
 You are a log file analyzer. You will receive a set of lines from a log file for some software application, find the errors and other interesting aspects of the logs, and explain them so a new user can understand what they mean. If there are any steps they can do to resolve them, list the steps in your answer.
 """
 PARAMETER temperature 0.3
--- a/examples/python-loganalysis/loganalysis.py
+++ b/examples/python-loganalysis/loganalysis.py
@@ -0,0 +1,42 @@
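 import json
 import sys
 import requests
 # prelines and postlines set how many lines of context to include before and after each error
 prelines = 10
 postlines = 10
 def find_errors_in_log_file(log_file_path):
     """Return the lines surrounding every line that mentions 'error', each line at most once."""
     with open(log_file_path, 'r') as log_file:
         # splitlines() drops trailing newlines, so "\n".join() below does not double them
         log_lines = log_file.read().splitlines()
     # Collect line indices in a set so that overlapping context windows do not repeat lines
     selected = set()
     for i, line in enumerate(log_lines):
         if "error" in line.lower():
             start_index = max(0, i - prelines)
             end_index = min(len(log_lines), i + postlines + 1)
             selected.update(range(start_index, end_index))
     # Return the selected lines in their original order
     return [log_lines[i] for i in sorted(selected)]
 if len(sys.argv) < 2:
     print("Usage: python loganalysis.py <filename>")
     sys.exit(1)
 error_logs = find_errors_in_log_file(sys.argv[1])
 if not error_logs:
     print("No lines containing 'error' were found.")
     sys.exit(0)
 data = {
     "prompt": "\n".join(error_logs),
     "model": "mattw/loganalyzer"
 }
 # Stream the answer from the Generate API and print each fragment as it arrives
 response = requests.post("http://localhost:11434/api/generate", json=data, stream=True)
 response.raise_for_status()
 for line in response.iter_lines():
     if line:
         json_data = json.loads(line)
         if not json_data['done']:
             print(json_data['response'], end='', flush=True)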
--- a/examples/python-loganalysis/logtest.logfile
+++ b/examples/python-loganalysis/logtest.logfile
@@ -0,0 +1,32 @@
 2023-11-10 07:17:40 /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
 2023-11-10 07:17:40 /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
 2023-11-10 07:17:40 /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
 2023-11-10 07:17:40 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
 2023-11-10 07:17:40 10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
 2023-11-10 07:17:40 /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
 2023-11-10 07:17:40 /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
 2023-11-10 07:17:40 /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
 2023-11-10 07:17:40 /docker-entrypoint.sh: Configuration complete; ready for start up
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: using the "epoll" event method
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: nginx/1.25.3
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14) 
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: OS: Linux 6.4.16-linuxkit
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker processes
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 29
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 30
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 31
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 32
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 33
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 34
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 35
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 36
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 37
 2023-11-10 07:17:40 2023/11/10 13:17:40 [notice] 1#1: start worker process 38
 2023-11-10 07:17:44 192.168.65.1 - - [10/Nov/2023:13:17:43 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
 2023-11-10 07:17:44 2023/11/10 13:17:44 [error] 29#29: *1 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 192.168.65.1, server: localhost, request: "GET /favicon.ico HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/"
 2023-11-10 07:17:44 192.168.65.1 - - [10/Nov/2023:13:17:44 +0000] "GET /favicon.ico HTTP/1.1" 404 555 "http://localhost:8080/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
 2023-11-10 07:17:50 2023/11/10 13:17:50 [error] 29#29: *1 open() "/usr/share/nginx/html/ahstat" failed (2: No such file or directory), client: 192.168.65.1, server: localhost, request: "GET /ahstat HTTP/1.1", host: "localhost:8080"
 2023-11-10 07:17:50 192.168.65.1 - - [10/Nov/2023:13:17:50 +0000] "GET /ahstat HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
 2023-11-10 07:18:53 2023/11/10 13:18:53 [error] 29#29: *1 open() "/usr/share/nginx/html/ahstat" failed (2: No such file or directory), client: 192.168.65.1, server: localhost, request: "GET /ahstat HTTP/1.1", host: "localhost:8080"
 2023-11-10 07:18:53 192.168.65.1 - - [10/Nov/2023:13:18:53 +0000] "GET /ahstat HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "-"
--- a/examples/python-loganalysis/readme.md
+++ b/examples/python-loganalysis/readme.md
@@ -0,0 +1,48 @@
 # Log Analysis example
 ![loganalyzer 2023-11-10 08_53_29](https://github.com/jmorganca/ollama/assets/633681/ad30f1fc-321f-4953-8914-e30e24db9921)
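 This example shows one possible way to create a log file analyzer. Make sure Ollama is running and the model has been pulled (`ollama pull mattw/loganalyzer`), install the dependencies with `pip install -r requirements.txt`, and then run: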
 `python loganalysis.py <logfile>`
 You can try this with the `logtest.logfile` file included in this directory.
 ## Review the code
 The first part of this example is a Modelfile that starts from `codebooga`, sets the temperature to 0.3, and applies a new system prompt:
 ```plaintext
 SYSTEM """
 You are a log file analyzer. You will receive a set of lines from a log file for some software application, find the errors and other interesting aspects of the logs, and explain them so a new user can understand what they mean. If there are any steps they can do to resolve them, list the steps in your answer.
 """
 ```
 This model is available at https://ollama.ai/mattw/loganalyzer. To customize it and publish it to your own namespace, run `ollama create <namespace/modelname> -f <path-to-modelfile>` and then `ollama push <namespace/modelname>`.
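If you prefer not to create a new model, a similar configuration can probably be supplied per request instead, since the Generate API accepts `system` and `options` fields alongside `model` and `prompt`. The sketch below only builds the JSON payload; it assumes the base `codebooga:latest` model has been pulled, and `SYSTEM_PROMPT` and `build_generate_request` are hypothetical names used for illustration.

```python
import json

# Hypothetical constant: the same system prompt as in the Modelfile above.
SYSTEM_PROMPT = (
    "You are a log file analyzer. You will receive a set of lines from a log file "
    "for some software application, find the errors and other interesting aspects "
    "of the logs, and explain them so a new user can understand what they mean. "
    "If there are any steps they can do to resolve them, list the steps in your answer."
)


def build_generate_request(prompt, model="codebooga:latest", temperature=0.3):
    """Build a /api/generate payload that sets the system prompt and temperature per request.

    Hypothetical helper: it mirrors the Modelfile's SYSTEM and temperature
    settings without creating a new model.
    """
    return {
        "model": model,
        "prompt": prompt,
        "system": SYSTEM_PROMPT,
        "options": {"temperature": temperature},
    }


# The payload is plain JSON, so it could be sent with requests.post(..., json=payload).
payload = build_generate_request("2023/11/10 13:17:44 [error] open() failed")
print(json.dumps(payload, indent=2))
```

The trade-off is that every request carries the full system prompt, whereas a custom model keeps that configuration in one place.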
 Next, `loganalysis.py` scans every line of the given log file for the word 'error' (ignoring case). For each match, that line and the 10 lines before and after it become part of the prompt sent to the Generate API.
 ```python
 data = {
  "prompt": "\n".join(error_logs), 
  "model": "mattw/loganalyzer"
 }
 ```
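Context windows of nearby errors can overlap (in `logtest.logfile` the three `[error]` lines are only two lines apart), so collecting each window separately can repeat lines in the prompt. A minimal sketch of one way to avoid that, assuming the log has already been split into lines; `extract_error_context` is a hypothetical helper, not part of the example.

```python
def extract_error_context(log_lines, prelines=10, postlines=10, keyword="error"):
    """Return the lines around every line containing `keyword`, each line at most once.

    Hypothetical helper: overlapping windows are merged by collecting line
    indices in a set and emitting them in their original order.
    """
    keyword = keyword.lower()
    selected = set()
    for i, line in enumerate(log_lines):
        if keyword in line.lower():
            start = max(0, i - prelines)
            end = min(len(log_lines), i + postlines + 1)
            selected.update(range(start, end))
    return [log_lines[i] for i in sorted(selected)]


lines = [f"line {n}" for n in range(10)]
lines[3] = "line 3 ERROR disk full"
lines[5] = "line 5 error retrying"
context = extract_error_context(lines, prelines=1, postlines=1)
print(context)
# → ['line 2', 'line 3 ERROR disk full', 'line 4', 'line 5 error retrying', 'line 6']
```

Without merging, `line 4` would appear twice, once in each window.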
 Finally, the script streams the response: each line of the output is a JSON object, and its `response` field is printed to the terminal as soon as it arrives.
 ```python
 response = requests.post("http://localhost:11434/api/generate", json=data, stream=True)
 for line in response.iter_lines():
     if line:
         json_data = json.loads(line)
         if not json_data['done']:
             print(json_data['response'], end='', flush=True)
 ```
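The parsing loop can be pulled into a function that accepts any iterable of JSON lines, which makes it easy to try without a running server. This is a sketch: `stream_response_text` is a hypothetical name, and it assumes that failures arrive as a JSON object with an `error` key, which is how the Ollama API reports errors.

```python
import json


def stream_response_text(lines, out=None):
    """Collect (and optionally echo) the text from streamed /api/generate output.

    `lines` is any iterable of JSON lines as bytes or str, such as
    response.iter_lines(). Blank keep-alive lines are skipped, and an
    object with an `error` key (assumed shape) is raised as an exception.
    """
    pieces = []
    for raw in lines:
        if not raw:
            continue
        chunk = json.loads(raw)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])
        text = chunk.get("response", "")
        pieces.append(text)
        if out is not None:
            print(text, end="", file=out, flush=True)
        if chunk.get("done"):
            break
    return "".join(pieces)


# Hypothetical stream for an offline check.
fake_stream = [
    b'{"model": "mattw/loganalyzer", "response": "The favicon", "done": false}',
    b"",
    b'{"model": "mattw/loganalyzer", "response": " is missing.", "done": false}',
    b'{"model": "mattw/loganalyzer", "response": "", "done": true}',
]
print(stream_response_text(fake_stream))  # → The favicon is missing.
```

With the example's `requests` call, `stream_response_text(response.iter_lines(), out=sys.stdout)` would print the answer as it streams and also return the full text.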
 ## Next Steps
 There is a lot more that can be done here. Searching for the word "error" is a simple way to detect problems, but it might be more interesting to find anomalous activity in the logs. One option is to create embeddings for each line and compare them to find similar lines. Another is to apply the Levenshtein distance algorithm to find similar lines, which helps identify the lines that are unlike the rest.
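The Levenshtein idea can be sketched with only the standard library. Each line is scored by its normalized edit distance to its most similar other line, so lines with no close neighbor score high. Masking digits first is a hypothetical heuristic so that timestamps and process IDs do not dominate the distance; `levenshtein`, `normalize`, and `anomaly_scores` are illustrative names, and choosing a threshold is left to you.

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def normalize(line):
    """Hypothetical heuristic: mask digit runs so timestamps and IDs compare as equal."""
    return re.sub(r"\d+", "#", line)


def anomaly_scores(lines):
    """Score each line by its length-normalized edit distance to its nearest other line."""
    norm = [normalize(line) for line in lines]
    scores = []
    for i, a in enumerate(norm):
        nearest = min(
            (levenshtein(a, b) / max(len(a), len(b), 1) for j, b in enumerate(norm) if j != i),
            default=0.0,
        )
        scores.append(nearest)
    return scores


sample = [
    "[notice] 1#1: start worker process 29",
    "[notice] 1#1: start worker process 30",
    "[notice] 1#1: start worker process 31",
    '[error] 29#29: open() "/usr/share/nginx/html/favicon.ico" failed',
]
scores = anomaly_scores(sample)
print(sample[scores.index(max(scores))])  # prints the [error] line
```

Comparing every pair of lines is quadratic, which is fine for a file the size of `logtest.logfile` but would need sampling or clustering for large logs.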
 You could also try different models and prompts to analyze the data, or add retrieval augmented generation (RAG) to help the model understand newer log formats.
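The embedding idea mentioned above also provides the retrieval step of RAG: embed a set of reference notes, then pick the ones most similar to the error lines and add them to the prompt. A minimal sketch, assuming Ollama's `/api/embeddings` endpoint (which takes `model` and `prompt` and returns an `embedding` list) and a hypothetical list of notes you maintain; the model name is an assumption, and `ollama_embed`, `cosine_similarity`, and `retrieve` are illustrative names. The embedding function is passed in so the ranking can be tried without a server.

```python
import json
import math
import urllib.request


def ollama_embed(text, model="mattw/loganalyzer", host="http://localhost:11434"):
    """Request an embedding for `text` from Ollama's /api/embeddings endpoint (assumed model)."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/embeddings", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, documents, embed=ollama_embed, k=2):
    """Return the k documents whose embeddings are most similar to the query's."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine_similarity(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


# Offline demo with a toy embedding (hypothetical: counts two keywords instead of calling a model).
def toy_embed(text):
    return [text.count("favicon"), text.count("worker")]


notes = [
    "(hypothetical note) what to do when favicon.ico is missing",
    "(hypothetical note) tuning worker processes",
]
print(retrieve('open() "/usr/share/nginx/html/favicon.ico" failed', notes, embed=toy_embed, k=1))
```

With real notes and `embed=ollama_embed`, the retrieved text could be prepended to `"\n".join(error_logs)` before calling the Generate API. Embedding each note once and caching the vectors would avoid recomputing them on every query.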
--- a/examples/python-loganalysis/requirements.txt
+++ b/examples/python-loganalysis/requirements.txt
@@ -0,0 +1 @@
 Requests==2.31.0