Our servers generated multiple compressed logs every day. Some days there was a single log file, other days five or six, depending on activity. Over time this grew into thousands of compressed .gz log files covering months or years.
Whenever something suspicious happened (a bug, an exploit attempt, and so on), the only way to investigate it was to search through those logs.


Originally, the process looked like this:
- Download tens, or even hundreds, of compressed log files from the server
- Decompress them locally
- Run grep searches across the extracted logs
- Manually scan the results and piece together timelines

This worked, but it was slow and repetitive.
The simple tool I made to automate this 
The tool connects to the server via multiple simultaneous SFTP connections and scans logs across a configurable time range.
Instead of downloading and searching files sequentially, the script:
- lists the server’s log directory
- filters files based on a time threshold (for example, you’d input “30”, “90”, or “365” days)
- downloads log files concurrently
- decompresses .gz files automatically
- scans each line for a target string
- aggregates the results into a structured output
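The time-threshold filter in the second step can be sketched as below. This is a minimal illustration, not the tool's actual code: `filter_recent` and the `(filename, mtime)` pairs are stand-ins for an SFTP directory listing, and the `.gz`-only filter is an assumption based on the log format described above.

```python
import time

def filter_recent(entries, max_age_days, now=None):
    """Keep .gz logs modified within the last max_age_days.

    entries: iterable of (filename, mtime_epoch_seconds) pairs, as a
    directory listing over SFTP would provide.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # days -> seconds
    return [name for name, mtime in entries
            if name.endswith('.gz') and mtime >= cutoff]

# Illustrative listing with a fixed "now" so the example is deterministic:
now = 1_700_000_000
entries = [
    ('app-old.log.gz', now - 100 * 86400),  # 100 days old: dropped at "30"
    ('app-new.log.gz', now - 5 * 86400),    # 5 days old: kept
    ('notes.txt', now),                     # not a compressed log: dropped
]
print(filter_recent(entries, 30, now=now))  # ['app-new.log.gz']
```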
The tool can process hundreds of log files very quickly because the downloading, decompression, and searching all run in parallel. An investigation that would once have taken multiple hours now takes minutes.
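The decompress-and-scan step can be sketched like this. It is a simplified stand-in for the real pipeline: the demo file, its contents, and the "ERROR" search string are all illustrative.

```python
import gzip
import os
import tempfile

def scan_gz(path, needle):
    """Stream a .gz log line by line and collect lines containing needle."""
    matches = []
    # 'rt' mode decompresses and decodes on the fly, so large logs are
    # never held fully in memory.
    with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as fh:
        for line in fh:
            if needle in line:
                matches.append(line.rstrip('\n'))
    return matches

# Tiny demo with a throwaway .gz file:
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.log.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as fh:
    fh.write('12:00 login ok\n12:01 ERROR exploit attempt\n12:02 logout\n')
print(scan_gz(demo_path, 'ERROR'))  # ['12:01 ERROR exploit attempt']
```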
How the log processing works 
The Python script processes large numbers of log files in parallel by using multiple worker threads, which sharply reduces the time required to search large volumes of logs.
```python
import threading

num_workers = 10
threads = []
for _ in range(num_workers):
    t = threading.Thread(target=worker)  # worker() pulls filenames from file_queue
    t.start()
    threads.append(t)

file_queue.join()  # block until every queued file has been processed
for t in threads:
    t.join()
```
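For context, the snippet above assumes a `worker` function and a `file_queue` defined elsewhere in the script. A self-contained sketch of that worker-pool pattern might look like this; `search_file` is a hypothetical stand-in for the real download/decompress/search step:

```python
import queue
import threading

file_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def search_file(filename):
    # Stand-in for the real work: download over SFTP, gunzip, scan lines.
    return f'searched {filename}'

def worker():
    while True:
        try:
            filename = file_queue.get(timeout=1)  # exit once the queue drains
        except queue.Empty:
            return
        try:
            outcome = search_file(filename)
            with results_lock:  # list.append is thread-safe, lock kept for clarity
                results.append(outcome)
        finally:
            file_queue.task_done()  # lets file_queue.join() unblock

for name in ('a.log.gz', 'b.log.gz', 'c.log.gz'):
    file_queue.put(name)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
file_queue.join()
for t in threads:
    t.join()

print(sorted(results))
```

Each call to `task_done()` marks one queued file as finished, which is what allows `file_queue.join()` in the original snippet to block until all files are processed.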
Quick note: simultaneous SFTP connections are also used by file-transfer clients such as FileZilla.
Visualizing the Search Results
Instead of printing results in the terminal, the script generates a local HTML page that displays matches grouped by log file.
The Python script injects the results as JSON into an HTML template.
```python
import json
import os
import webbrowser

results_json = json.dumps(sorted_results, indent=4, ensure_ascii=False)

with open(template_path, 'r', encoding='utf-8') as file:
    html_template_content = file.read()

# Swap the placeholder inside the template's <textarea> for the real JSON
html_template_content = html_template_content.replace('>PLACEHOLDER_DATA',
                                                      f'>{results_json}')

with open(output_path, 'w', encoding='utf-8') as file:
    file.write(html_template_content)

webbrowser.open('file://' + os.path.realpath(output_path))
```
Example of the Visual Output Template
The HTML visualizer renders the results and allows quick inspection of matches.
```html
<h1>RESULTS</h1>
<div id="utc-time"></div>
<div id="input-section" class="input-section">
    <textarea id="input">PLACEHOLDER_DATA</textarea>
    <button onclick="visualizeLogs()">Visualize</button>
</div>
<div id="results"></div>
```
The results are rendered dynamically with JavaScript:
```javascript
Object.entries(data).forEach(([filename, lines]) => {
    const fileDiv = document.createElement('div')
    fileDiv.className = 'file-result'

    const filenameDiv = document.createElement('div')
    filenameDiv.className = 'file-name'
    filenameDiv.textContent = filename

    const linesDiv = document.createElement('div')
    linesDiv.className = 'log-lines'
    lines.forEach(line => {
        const lineDiv = document.createElement('div')
        lineDiv.className = 'log-line'
        lineDiv.textContent = line
        linesDiv.appendChild(lineDiv)
    })

    fileDiv.appendChild(filenameDiv)
    fileDiv.appendChild(linesDiv)
    resultsDiv.appendChild(fileDiv)
})
```
This produced a structured interface where log matches were grouped by file and displayed in chronological order.
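The grouping itself can be sketched as follows. This is an illustrative guess at how a `sorted_results` structure might be built, not the tool's actual code; `build_sorted_results` and the sample filenames are hypothetical, and the chronological ordering relies on the common convention of date-stamped log filenames sorting in date order.

```python
def build_sorted_results(matches):
    """Group (filename, line) pairs by file, ordered by filename.

    Assumes filenames embed dates (e.g. 'server-2024-01-05.log.gz'),
    so lexicographic order is chronological order.
    """
    grouped = {}
    for filename, line in matches:
        grouped.setdefault(filename, []).append(line)
    return dict(sorted(grouped.items()))

matches = [
    ('server-2024-02-01.log.gz', '09:14 ERROR first hit'),
    ('server-2024-01-05.log.gz', '17:02 ERROR earlier hit'),
    ('server-2024-02-01.log.gz', '09:15 ERROR second hit'),
]
print(list(build_sorted_results(matches)))
# ['server-2024-01-05.log.gz', 'server-2024-02-01.log.gz']
```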
Conclusion
I originally built the tool for my own debugging and investigation work.
Over time I shared it with trusted members of the volunteer team managing the servers. Instead of manually downloading and searching logs, they could simply run the script, specify a time range and search string, and immediately see all relevant log entries.
This made it much easier to investigate incidents, trace user activity, and debug unexpected server behaviour.


