Our servers generated multiple compressed logs every day. Some days there was a single log file, other days five or six, depending on activity. Over time this grew into thousands of compressed .gz log files covering months or years.
Whenever something suspicious happened (a bug, an exploit attempt, and so on), the only way to investigate it was to search through those logs.


Originally, the process looked like this:
- Download tens, or even hundreds, of compressed log files from the server
- Decompress them locally
- Run grep searches across the extracted logs
- Manually scan the results and piece together timelines

This worked, but it was slow and repetitive.
The simple tool I made to automate this 
The tool connects to the server via multiple simultaneous SFTP connections and scans logs across a configurable time range.
Instead of downloading and searching files sequentially, the script:
- lists the server’s log directory
- filters files based on a time threshold (for example, you’d input “30”, “90”, or “365” days)
- downloads log files concurrently
- decompresses .gz files automatically
- scans each line for a target string
- aggregates the results into a structured output
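The time-threshold filter in the second step can be sketched as below. This is a minimal illustration, not the tool's actual code: `filter_recent` and the `(filename, mtime)` pairs are stand-ins for an SFTP directory listing, and the `.gz`-only filter is an assumption based on the log format described above.

```python
import time

def filter_recent(entries, max_age_days, now=None):
    """Keep .gz logs modified within the last max_age_days.

    entries: iterable of (filename, mtime_epoch_seconds) pairs, as a
    directory listing over SFTP would provide.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # days -> seconds
    return [name for name, mtime in entries
            if name.endswith('.gz') and mtime >= cutoff]

# Illustrative listing with a fixed "now" so the example is deterministic:
now = 1_700_000_000
entries = [
    ('app-old.log.gz', now - 100 * 86400),  # 100 days old: dropped at "30"
    ('app-new.log.gz', now - 5 * 86400),    # 5 days old: kept
    ('notes.txt', now),                     # not a compressed log: dropped
]
print(filter_recent(entries, 30, now=now))  # ['app-new.log.gz']
```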
The tool can process hundreds of log files very quickly because the downloading, decompression, and searching all run in parallel. An investigation that would once have taken multiple hours now takes minutes.
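The decompress-and-scan step can be sketched like this. It is a simplified stand-in for the real pipeline: the demo file, its contents, and the "ERROR" search string are all illustrative.

```python
import gzip
import os
import tempfile

def scan_gz(path, needle):
    """Stream a .gz log line by line and collect lines containing needle."""
    matches = []
    # 'rt' mode decompresses and decodes on the fly, so large logs are
    # never held fully in memory.
    with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as fh:
        for line in fh:
            if needle in line:
                matches.append(line.rstrip('\n'))
    return matches

# Tiny demo with a throwaway .gz file:
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.log.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as fh:
    fh.write('12:00 login ok\n12:01 ERROR exploit attempt\n12:02 logout\n')
print(scan_gz(demo_path, 'ERROR'))  # ['12:01 ERROR exploit attempt']
```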
How the log processing works 
The Python script processes large numbers of log files in parallel by using multiple worker threads, which sharply reduces the time required to search large volumes of logs.
```python
import threading

num_workers = 10
threads = []
for _ in range(num_workers):
    t = threading.Thread(target=worker)  # worker() pulls filenames from file_queue
    t.start()
    threads.append(t)

file_queue.join()  # block until every queued file has been processed
for t in threads:
    t.join()
```
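For context, the snippet above assumes a `worker` function and a `file_queue` defined elsewhere in the script. A self-contained sketch of that worker-pool pattern might look like this; `search_file` is a hypothetical stand-in for the real download/decompress/search step:

```python
import queue
import threading

file_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def search_file(filename):
    # Stand-in for the real work: download over SFTP, gunzip, scan lines.
    return f'searched {filename}'

def worker():
    while True:
        try:
            filename = file_queue.get(timeout=1)  # exit once the queue drains
        except queue.Empty:
            return
        try:
            outcome = search_file(filename)
            with results_lock:  # list.append is thread-safe, lock kept for clarity
                results.append(outcome)
        finally:
            file_queue.task_done()  # lets file_queue.join() unblock

for name in ('a.log.gz', 'b.log.gz', 'c.log.gz'):
    file_queue.put(name)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
file_queue.join()
for t in threads:
    t.join()

print(sorted(results))
```

Each call to `task_done()` marks one queued file as finished, which is what allows `file_queue.join()` in the original snippet to block until all files are processed.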
Quick note: simultaneous SFTP connections are also used by file-transfer clients such as FileZilla.
Visualizing the Search Results
Instead of printing results in the terminal, the script generates a local HTML page that displays matches grouped by log file.
The Python script injects the results as JSON into an HTML template.
```python
import json
import os
import webbrowser

results_json = json.dumps(sorted_results, indent=4, ensure_ascii=False)

with open(template_path, 'r', encoding='utf-8') as file:
    html_template_content = file.read()

# Swap the placeholder inside the template's <textarea> for the real JSON
html_template_content = html_template_content.replace('>PLACEHOLDER_DATA',
                                                      f'>{results_json}')

with open(output_path, 'w', encoding='utf-8') as file:
    file.write(html_template_content)

webbrowser.open('file://' + os.path.realpath(output_path))
```
Example of the Visual Output Template
The HTML visualizer renders the results and allows quick inspection of matches.
```html
<h1>RESULTS</h1>
<div id="utc-time"></div>
<div id="input-section" class="input-section">
    <textarea id="input">PLACEHOLDER_DATA</textarea>
    <button onclick="visualizeLogs()">Visualize</button>
</div>
<div id="results"></div>
```
The results are rendered dynamically with JavaScript:
```javascript
Object.entries(data).forEach(([filename, lines]) => {
    const fileDiv = document.createElement('div')
    fileDiv.className = 'file-result'

    const filenameDiv = document.createElement('div')
    filenameDiv.className = 'file-name'
    filenameDiv.textContent = filename

    const linesDiv = document.createElement('div')
    linesDiv.className = 'log-lines'
    lines.forEach(line => {
        const lineDiv = document.createElement('div')
        lineDiv.className = 'log-line'
        lineDiv.textContent = line
        linesDiv.appendChild(lineDiv)
    })

    fileDiv.appendChild(filenameDiv)
    fileDiv.appendChild(linesDiv)
    resultsDiv.appendChild(fileDiv)
})
```
This produced a structured interface where log matches were grouped by file and displayed in chronological order.
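The grouping itself can be sketched as follows. This is an illustrative guess at how a `sorted_results` structure might be built, not the tool's actual code; `build_sorted_results` and the sample filenames are hypothetical, and the chronological ordering relies on the common convention of date-stamped log filenames sorting in date order.

```python
def build_sorted_results(matches):
    """Group (filename, line) pairs by file, ordered by filename.

    Assumes filenames embed dates (e.g. 'server-2024-01-05.log.gz'),
    so lexicographic order is chronological order.
    """
    grouped = {}
    for filename, line in matches:
        grouped.setdefault(filename, []).append(line)
    return dict(sorted(grouped.items()))

matches = [
    ('server-2024-02-01.log.gz', '09:14 ERROR first hit'),
    ('server-2024-01-05.log.gz', '17:02 ERROR earlier hit'),
    ('server-2024-02-01.log.gz', '09:15 ERROR second hit'),
]
print(list(build_sorted_results(matches)))
# ['server-2024-01-05.log.gz', 'server-2024-02-01.log.gz']
```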
Conclusion
I originally built the tool for my own debugging and investigation work.
Over time I shared it with trusted members of the volunteer team managing the servers. Instead of manually downloading and searching logs, they could simply run the script, specify a time range and search string, and immediately see all relevant log entries.
This made it much easier to investigate incidents, trace user activity, and debug unexpected server behaviour.


