Building Sorty: local-first AI document sorting

I’ve been paperless for a long time. Part organization, part paranoia. I sleep better knowing the important stuff is scanned, searchable, and stored somewhere sane.

My weekly ritual is simple: scan, review, file, done.

Then we moved.

After an international relocation, my ā€œinboxā€ wasn’t a few documents. It was 200+ scans with names like ā€œScan Jul 17, 2025 at 12.17 PMā€ and ā€œ1 Screenshot 2.ā€ The job was obvious. The motivation was not.

Around the same time, I kept seeing demos of Claude Cowork: a local agent browsing folders, renaming files, sorting everything automatically. The capability was clearly there.

But I had one non-negotiable constraint: for this category of documents, I wanted a local-first design with explicit human review built into the flow.

So I built my own.

I called it Sorty.

[Screenshot: the Sorty interface, with options for setting source folders and filing destinations.]

The constraint: do it with AI, but keep it on my machine

The product idea was straightforward: extract meaning from documents locally, propose a filename and destination folder, let me approve the changes, then execute them in a predictable, reviewable way.

No cloud OCR. No sending document text to an external LLM. Local-first by design.

I started where I start most things now: fast, ugly, and useful.

A Python CLI prototype I could run against a folder, iterate on, and delete without mourning. Claude Code and Cursor made that part almost unfairly easy. Within hours I had something that felt real.
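The shape of that first prototype was roughly this (a hypothetical sketch, not Sorty's actual code; the `Proposal` structure and `Unsorted` fallback are my illustration): walk a folder, propose a name and destination for each file, and print the plan without touching disk.

```python
# Hypothetical sketch of a first-pass CLI: scan a folder, propose a new
# name and destination per file, and dry-run the plan without moving anything.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Proposal:
    source: Path
    new_name: str
    destination: str

def propose(path: Path) -> Proposal:
    # Stand-in for the real step: locally extracted text (OCR) and on-device
    # intelligence would drive the name and folder choice here.
    stem = path.stem.lower().replace(" ", "-")
    return Proposal(source=path, new_name=f"{stem}{path.suffix}", destination="Unsorted")

def dry_run(folder: Path) -> list[Proposal]:
    # Dry run: print the proposed changes, change nothing on disk.
    plan = [propose(p) for p in sorted(folder.iterdir()) if p.is_file()]
    for prop in plan:
        print(f"{prop.source.name} -> {prop.destination}/{prop.new_name}")
    return plan
```

The dry-run default matters even in a throwaway script: it lets you iterate on the proposal logic against real scans with zero risk.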

And then I hit the part of AI-assisted building that still surprises people.

The output looked right. The system was wrong.

I’ve seen the same failure mode inside product orgs. When output gets cheap, it becomes easy to confuse ā€œplausibleā€ with ā€œcorrect.ā€ The only real defense is ownership: insist on foundational context, build a verification loop, and make the system’s behavior legible to the human who has to live with the consequences.

The failure mode: confidently building the wrong solution

Sorty was improving with each iteration, but it wasn’t improving in the way I intended.

When I dug in, I realized what was happening.

Claude was hard-coding a sorting schema based on my examples. I found a growing mapping table that was basically my prompt examples turned into code.

Instead of building a pipeline that interpreted each document using the primitives I was asking for (local OCR, on-device intelligence, user review), it was quietly turning my examples into a ruleset. That can look fine early on. It also becomes fragile the moment your folder stops resembling your prompt.
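To make the failure mode concrete, here's a toy contrast of my own construction (not Sorty's actual code): a mapping table distilled from prompt examples classifies those examples perfectly and degrades to a shrug on everything else.

```python
# Toy illustration of the failure mode: a "classifier" that is really just
# the prompt examples turned into a hard-coded ruleset.
HARD_CODED = {
    "lease": "Housing",
    "payslip": "Finance",
}

def classify_by_table(filename: str) -> str:
    # Works exactly as far as the examples reach...
    for keyword, folder in HARD_CODED.items():
        if keyword in filename.lower():
            return folder
    # ...and everything the examples didn't cover lands here.
    return "Unsorted"
```

A real inbox is full of files like "Scan Jul 17, 2025 at 12.17 PM.pdf", which match no keyword at all — the table looks accurate in testing and collapses on the actual workload.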

So I asked it directly, in plain language: ā€œAre you actually using Apple’s on-device intelligence and vision frameworks here?ā€

Its response was essentially: those capabilities aren’t available.

That mismatch is the dangerous part. Not malicious. Just stale.

Apple’s on-device capabilities were real, and I knew it. Post-WWDC 2025, the latest OS releases made local-first intelligence a practical foundation for this kind of workflow. Sorty existed because I trusted that direction enough to build around it.

My coding assistant didn’t share that mental model. So it optimized for what it could ā€œproveā€ from my prompt instead of what I intended from the platform.

The fix: foundational context, then rebuild around trust

Once I treated this like any other ambiguity, the path was obvious:

  1. bring context (official docs, current APIs, real references)
  2. re-anchor the design around what actually exists today
  3. rebuild the pipeline so it cannot quietly fall back into a hard-coded guesser

I also installed a community-made plugin that pulls in current Apple developer documentation, because this is the reality of building with coding assistants: if their knowledge is stale, they will fill the gaps with confident fiction.

Once the model had the right foundational context, everything improved, including the decisions.

At some point the CLI stopped being a script and started being a product. So I used the CLI as a spec and began shaping a SwiftUI app.

[Screenshot: a file processing progress bar at 59%, with an activity log below.]

Three rules I would not compromise on

This workflow touches high-stakes documents. That changes the standard. ā€œWorksā€ is not enough. The UI has to show what is happening and why, in a way that survives real use.

Three rules became non-negotiable:

That last point shaped Sorty more than I expected. I added a Preview mode because I don’t want to infer what the system did after the fact. I want to see the exact set of proposed changes, make edits, and approve it before anything touches the filesystem.
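The core of a Preview mode is small enough to sketch (hypothetically, in the prototype's Python rather than the app's Swift): render the exact set of proposed moves for review, and make the executor a no-op without explicit approval.

```python
# Sketch of a Preview step: nothing touches the filesystem until the exact
# set of proposed changes has been shown to, and approved by, the user.
from pathlib import Path

def preview(plan: list[tuple[Path, Path]]) -> str:
    # Render every proposed move for human review before execution.
    return "\n".join(f"{src.name} -> {dst}" for src, dst in plan)

def apply_plan(plan: list[tuple[Path, Path]], approved: bool) -> int:
    # Without explicit approval, the plan is inert.
    if not approved:
        return 0
    moved = 0
    for src, dst in plan:
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.rename(dst)  # a move, never a delete
        moved += 1
    return moved
```

The design choice is that approval is a parameter of execution, not a UI afterthought: there is no code path that applies changes without it.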

What Sorty does

Sorty takes a folder of messy scans and helps you turn it into an organized archive.

One more constraint I wrote down early and kept: Sorty will not auto-delete, auto-archive, or silently move anything. If a file changes location, you see it and approve it.

The real lesson: AI collapses build time, not ownership

Lately, the loudest change has been speed. You can get from idea to shippable prototype in hours.

But building Sorty reinforced something I’ve started to treat as a rule: AI collapses the cost of output. It does not replace ownership.

If anything, it raises the bar on the human parts.

The difference, for me, is whether I’m driving or delegating.

Sorty only became real once I stopped treating the model like an oracle and started treating it like a capable teammate who still needs context, verification, and accountability.

What I’m taking forward

[Screenshot: a completed preview of sorted files, showing 251 processed files, with options for further actions at the bottom.]

Speed changes what is easy to produce. It does not change what is safe to ship.

The most valuable work moved upstream: constraints, foundational context, verification, and making the system’s behavior obvious to the person accountable for it.

The model can accelerate the build. Ownership still decides whether the system is real.