pkgpulse: Container Image Package Analyzer
Sep 1, 2025

A single-binary Go CLI that analyzes and compares container image packages. Answer your own question: distroless, Wolfi, or scratch?

This project started with a very normal container question: what exactly is inside these runtime images, and which one should I use?

scratch, distroless, Wolfi. They all sound sensible until you want to compare them properly. Then you realise you’re mostly choosing from vibes, docs, and whatever blog post looked convincing at 11:40pm.

pkgpulse gives me a more useful answer. It inspects container images, shows the packages inside, and makes it easy to compare images side by side.

Tech Stack Talk

This is also a very AI-native project. I built it with Codex, treated myself more like a picky product manager than a careful line-by-line engineer, and kept steering based on output quality and speed. Slightly chaotic. Surprisingly effective.

The interesting bits:

  • Single-file Go CLI (~1700 lines in main.go) with zero mandatory runtime dependencies
  • Parallel image analysis with caching
  • Live progress updates across multiple images, useful on a slow internet connection

The Journey from a Wrapper to Being Self-Contained

pkgpulse started life as a wrapper around Docker and Syft. Version 0.1.0 was basically:

docker pull <image>
syft <image> -o syft-json
# parse JSON, display table

It worked, but it came with luggage. Docker had to be running. Syft had to be installed. And Syft is built for a much bigger job than the one I actually had. I didn’t need a full SBOM platform. I just wanted to know which APK, DEB, or RPM packages were in the image and how large they were.

0.9.0 was the first proper shift: native package parsing. Instead of leaning on Syft, the tool reads image layers as tar archives and parses package databases directly.
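To give a feel for what "parsing package databases directly" involves, here is a minimal sketch for one format: the APK installed database (lib/apk/db/installed), where records are separated by blank lines and each line is a one-letter key, a colon, and a value. The Package struct and the parseAPKInstalled name are made up for illustration, the sample data is invented, and pkgpulse's real parser may well look different.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Package is a hypothetical record type; pkgpulse's own struct may differ.
type Package struct {
	Name    string
	Version string
	Size    int64 // installed size in bytes
}

// parseAPKInstalled parses the text of an APK installed database.
// It reads only the P (name), V (version) and I (installed size) keys
// and skips every other key.
func parseAPKInstalled(db string) []Package {
	var pkgs []Package
	var cur Package
	flush := func() {
		if cur.Name != "" {
			pkgs = append(pkgs, cur)
		}
		cur = Package{}
	}
	sc := bufio.NewScanner(strings.NewReader(db))
	for sc.Scan() {
		line := sc.Text()
		if line == "" { // a blank line ends the current record
			flush()
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		switch key {
		case "P":
			cur.Name = val
		case "V":
			cur.Version = val
		case "I":
			// Ignore a malformed size rather than failing the whole parse.
			if n, err := strconv.ParseInt(val, 10, 64); err == nil {
				cur.Size = n
			}
		}
	}
	flush() // the last record may not be followed by a blank line
	return pkgs
}

func main() {
	// Invented sample data in the shape of a real database.
	db := "P:musl\nV:1.2.5-r0\nI:667648\n\nP:busybox\nV:1.36.1-r29\nI:954368\n"
	for _, p := range parseAPKInstalled(db) {
		fmt.Printf("%s %s %d bytes\n", p.Name, p.Version, p.Size)
	}
}
```

In a real run the input would come from the matching entry inside a layer tarball (read with archive/tar) rather than from a string literal.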

0.10.0 was the second shift: no Docker daemon. Using go-containerregistry, I could talk to registries over HTTP, fetch manifests and layers directly, and add a tarball cache at the same time. Images are saved locally and reload much faster on later runs.
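The cache half of that change can be sketched with the standard library alone. Everything here is an assumption for illustration: the cachePath and loadOrFetch names, the hashed file naming scheme, and the fetch callback, which stands in for the registry download (with go-containerregistry, something like remote.Image followed by tarball.WriteToFile would be a natural candidate for that step).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// cachePath maps an image reference to a tarball path inside dir.
// Hashing the reference avoids slashes and colons in file names.
// This naming scheme is an assumption, not pkgpulse's actual layout.
func cachePath(dir, ref string) string {
	sum := sha256.Sum256([]byte(ref))
	return filepath.Join(dir, hex.EncodeToString(sum[:8])+".tar")
}

// loadOrFetch returns the cached tarball path for ref and whether it was
// a cache hit. On a miss it calls fetch, a placeholder for the registry
// download, to create the file.
func loadOrFetch(dir, ref string, fetch func(dst string) error) (string, bool, error) {
	path := cachePath(dir, ref)
	if _, err := os.Stat(path); err == nil {
		return path, true, nil
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", false, err
	}
	// Write to a temporary name first so an interrupted download never
	// leaves a half-written tarball that looks like a valid cache entry.
	tmp := path + ".partial"
	if err := fetch(tmp); err != nil {
		os.Remove(tmp)
		return "", false, err
	}
	if err := os.Rename(tmp, path); err != nil {
		return "", false, err
	}
	return path, false, nil
}

func main() {
	dir, err := os.MkdirTemp("", "imgcache")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// fakeFetch pretends to download an image tarball.
	fakeFetch := func(dst string) error {
		return os.WriteFile(dst, []byte("layers"), 0o644)
	}
	for i := 0; i < 2; i++ {
		_, hit, err := loadOrFetch(dir, "alpine:3.20", fakeFetch)
		fmt.Println("cache hit:", hit, "err:", err)
	}
}
```

Keying on the reference string alone means a moved tag keeps serving the old tarball until the cache is cleared by hand.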

So the tool went from "please install half the kitchen first" to "just run the binary". Much better.

0.11.0 added CSV export, mostly because comparing more than a few images in a terminal starts to look like a spreadsheet trying to escape.

0.12.0 came from working in a hotel with dreadful Wi-Fi. Downloads were so slow the app looked dead, so I added live multi-line progress updates. Sometimes good product decisions come from being mildly annoyed in a different postcode.
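One common way to get multi-line progress in a terminal is to move the cursor back up and repaint the block with ANSI escape sequences. The sketch below shows that technique in isolation; frame and redraw are hypothetical names, the status strings are invented, and it assumes a terminal that understands ANSI codes. It is not necessarily how pkgpulse renders its output.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// frame builds the bytes needed to repaint a block of status lines.
// prev is the number of lines painted by the previous frame (0 at first).
func frame(prev int, lines []string) string {
	var b strings.Builder
	if prev > 0 {
		fmt.Fprintf(&b, "\x1b[%dA", prev) // CSI n A: move cursor up n lines
	}
	for _, l := range lines {
		b.WriteString("\r\x1b[2K") // carriage return, then erase the whole line
		b.WriteString(l)
		b.WriteString("\n")
	}
	return b.String()
}

// redraw writes one frame and returns the line count to pass in next time.
func redraw(w io.Writer, prev int, lines []string) int {
	io.WriteString(w, frame(prev, lines))
	return len(lines)
}

func main() {
	prev := 0
	// Invented progress states for two images.
	steps := [][]string{
		{"alpine:  fetching manifest", "debian:  queued"},
		{"alpine:  downloading layer 1/1", "debian:  fetching manifest"},
		{"alpine:  done", "debian:  downloading layer 1/1"},
	}
	for _, s := range steps {
		prev = redraw(os.Stdout, prev, s)
	}
}
```

This version assumes the number of lines never shrinks between frames; a shorter frame would leave stale lines below it.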

The tool also learned to handle awkward cases like fragmented dpkg state and odd overlay filesystem behaviour. Some of that came from AI-generated implementation details I did not inspect with saint-like discipline. The repo is honest about that.
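The overlay behaviour matters because image layers can delete files from earlier layers. In the OCI layer format, an entry named .wh.<name> removes <name> from lower layers, and .wh..wh..opq hides everything a directory inherited from below. A package database that exists in layer one may therefore be gone by layer three. Here is a minimal sketch of merging layer listings under those rules; applyLayer and mergeLayers are illustrative names and the paths are invented.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

// removeUnder deletes every path that starts with prefix.
func removeUnder(files map[string]bool, prefix string) {
	for f := range files {
		if strings.HasPrefix(f, prefix) {
			delete(files, f)
		}
	}
}

// applyLayer merges one layer's entry paths into files (the merged view
// of all lower layers), honouring whiteout markers.
func applyLayer(files map[string]bool, entries []string) {
	// Pass 1: apply deletions, which only affect lower layers.
	for _, e := range entries {
		dir, base := path.Split(e)
		switch {
		case base == opaqueMarker:
			removeUnder(files, dir)
		case strings.HasPrefix(base, whiteoutPrefix):
			target := dir + strings.TrimPrefix(base, whiteoutPrefix)
			delete(files, target)
			removeUnder(files, target+"/") // target may be a directory
		}
	}
	// Pass 2: add this layer's real entries.
	for _, e := range entries {
		if !strings.HasPrefix(path.Base(e), whiteoutPrefix) {
			files[e] = true
		}
	}
}

// mergeLayers applies layers bottom to top and returns the sorted result.
func mergeLayers(layers [][]string) []string {
	files := map[string]bool{}
	for _, l := range layers {
		applyLayer(files, l)
	}
	out := make([]string, 0, len(files))
	for f := range files {
		out = append(out, f)
	}
	sort.Strings(out)
	return out
}

func main() {
	layers := [][]string{
		{"etc/os-release", "var/lib/dpkg/status", "var/cache/a", "var/cache/b"},
		{"var/lib/dpkg/.wh.status", "var/cache/.wh..wh..opq", "var/cache/c"},
	}
	fmt.Println(mergeLayers(layers))
}
```

The sketch tracks paths only. A real implementation would also need to keep file contents, or at least remember which layer holds the surviving copy of each package database.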

The Concurrency Model

When comparing multiple images, each one needs to be fetched, cached, and parsed independently. The concurrency model is a standard bounded semaphore:

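var wg sync.WaitGroup
sem := make(chan struct{}, 5) // buffered channel used as a semaphore: max 5 concurrent

// results is pre-allocated with one slot per image before this loop
for i, image := range images {
    wg.Add(1)
    go func(idx int, img string) {
        defer wg.Done()
        sem <- struct{}{}        // acquire slot (blocks while 5 are busy)
        defer func() { <-sem }() // release slot
        results[idx] = analyzeImage(...) // each goroutine writes only its own index
    }(i, image)
}
wg.Wait() // block until every image has been analyzed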

All goroutines start immediately, but only five can actively work at once. Results are written back into a pre-allocated slice by index, so there’s no mutex drama.

Why five? Registry rate limits. Docker Hub is not a fan of enthusiastic parallelism. Five turned out to be a practical number: fast enough to help, low enough not to get slapped.
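For anyone who wants to poke at the pattern, here is a self-contained version that also records the peak number of workers in flight, so the limit can be checked rather than trusted. analyzeAll, the work callback, and the image names are stand-ins for illustration, not pkgpulse's own code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// analyzeAll runs work on every image with at most limit calls in flight.
// It returns the results in input order and the peak concurrency observed.
func analyzeAll(images []string, limit int, work func(string) string) ([]string, int) {
	results := make([]string, len(images)) // one slot per image, no mutex needed
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var active, peak int32

	for i, image := range images {
		wg.Add(1)
		go func(idx int, img string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire slot
			defer func() { <-sem }() // release slot

			// Record the highest number of simultaneously active workers.
			n := atomic.AddInt32(&active, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			results[idx] = work(img)
			atomic.AddInt32(&active, -1)
		}(i, image)
	}
	wg.Wait()
	return results, int(atomic.LoadInt32(&peak))
}

func main() {
	images := []string{"alpine", "debian", "wolfi-base", "distroless", "busybox", "ubuntu", "fedora"}
	results, peak := analyzeAll(images, 5, func(img string) string {
		time.Sleep(50 * time.Millisecond) // stand-in for fetch + parse
		return "analyzed " + img
	})
	for _, r := range results {
		fmt.Println(r)
	}
	fmt.Println("peak concurrency:", peak)
}
```

Acquiring the slot inside the goroutine is what lets every goroutine start immediately. Acquiring it before the go statement would instead cap the number of goroutines that exist at once, which is a reasonable alternative for very long image lists.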

Key Takeaways

Does pkgpulse solve my original question? Yes. It helps me see what’s inside an image, compare tradeoffs, and occasionally answer weird practical questions like "why is vips in here?".

It’s less useful once you already have a runtime image that works well enough. At that point, container optimisation can turn into a hobby with a clipboard. But for choosing, comparing, and satisfying curiosity, it’s handy.

Technical:

  • Parsing other people's data formats means handling every quirk (dpkg fragments, three RPM database formats, whiteout files)
  • Removing external dependencies is worth the effort. Zero deps is the best install story
  • Single-file Go CLIs are a great format for tools like this. Easy to understand, easy to contribute to, one go install and you're done

Product:

  • The most useful insights came from comparing images. Single image analysis is nice, but comparison is where decisions get made
  • Auto-CSV export at the right threshold removed friction I didn't know existed
  • Edge cases (distroless, scratch+binary, BusyBox) are where the tool's value is actually proven

Future Work

Features:

  • Automatic cache invalidation via digest checking (cache invalidation is currently manual-only)
  • Vulnerability cross-reference. Not full scanning, but flagging known CVEs for the packages detected
  • Image layer breakdown. Show which layer contributes what packages and size
  • --json output for CI/CD pipeline integration

Architecture:

  • Consider splitting the ~1700-line main.go if it grows further, though for now single-file is still a feature, not a bug
  • Explore using go-containerregistry's streaming layer reads instead of full tarball downloads for a faster first analysis

Distribution:

  • More package managers (Nix, Wolfi's newer apk variants)
  • Interactive TUI mode for exploring package trees