Most agent setups are overbuilt.
You can feel the framework wanting to happen before the problem is even clear. There is a planner, a runtime, a memory layer, a tool registry, a workflow engine, and a long explanation for why all of that is necessary. Then you look at the actual job, and it is usually much smaller. You need a few commands the agent can trust, a few rules about how it should behave, and a repo where those choices live in plain text.
That is the shape I ended up with for a Gmail cleanup agent: mise task files for execution, and skill files for behavior.
What mattered was a small surface that both a human and a model could use without ceremony.
mise Tasks as the Execution Surface
In this repo, the Gmail task surface ended up small on purpose:
- all-threads-list
- threads-list
- threads-get
- labels-list
- query-count
- query-preview
- query-export
- query-trash
- query-archive
That is enough.
Those tasks give the agent a stable command layer. They are visible. They are inspectable. They are narrow. A human can run them directly. A model can compose them into longer workflows without having to invent new shell commands every time.
The task files stay thin:
#!/usr/bin/env bash
set -euo pipefail
#MISE description="Archive threads for a Gmail query"
source mise-tasks/gmail/.lib
gmail-archive-query-auto "$@"
That is the right level of indirection. The runnable file says what the task is for. Shared shell logic lives in .lib. The repo keeps a visible command surface. Nothing is hidden behind a custom runtime.
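To make that split concrete, here is a minimal sketch of what a shared library file like .lib might hold. It is not the repo's actual code: the function bodies, the gmail_backend placeholder, and the GMAIL_BACKEND override are all assumptions, and the helper is deliberately named gmail_archive_query rather than pretending to reproduce gmail-archive-query-auto.

```shell
# Hypothetical sketch of a shared helper file such as mise-tasks/gmail/.lib.
# Every name and body below is an assumption made for illustration.

# Stand-in for whatever Gmail CLI or API client the repo really calls.
# GMAIL_BACKEND is a made-up override; by default the call is only echoed,
# so the sketch never touches a real mailbox.
gmail_backend() {
  "${GMAIL_BACKEND:-echo}" "$@"
}

# Refuse an empty query so a bare invocation cannot match every thread.
require_query() {
  if [ -z "${1:-}" ]; then
    echo "usage: pass a Gmail search query as the first argument" >&2
    return 2
  fi
}

# Illustrative counterpart to the helper a thin task file would call.
gmail_archive_query() {
  require_query "${1:-}" || return
  gmail_backend archive --query "$1"
}
```

With a file like this in place, the task file stays at its five lines: it sources the library and forwards "$@" to one helper. The guard and the backend call live in one place instead of being copied into every task.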
I like mise for this because it gives the agent something concrete to target. mise run gmail:query-export -- 'in:inbox older_than:30d' is a real command. It is easy to inspect, easy to repeat, and easy to trust. If I stop using the agent tomorrow, the tasks are still useful.
That matters more than people admit. Most agent abstractions decay into private infrastructure. A small task surface survives.
Skill Files as the Behavioral Layer
The task files are only half the story. The other half is teaching the agent how to use them.
That is where the skill files come in.
In this repo there are two repo-local skills doing most of the work:
- a mise skill that pushes the agent toward the task surface first
- a mise-task-author skill that tells the agent how to add or change tasks without turning the repo into sludge
Those skills do not execute anything. They do not add another runtime. They shape the agent’s habits.
The mise skill says, in effect: use the repo’s command surface if it exists, and treat sandbox friction as a clue that the surface may be missing something. The mise-task-author skill says: keep task files thin, keep shared logic in .lib, preserve argument passthrough, and resist the urge to build a little shell framework.
That split has been more valuable than any “agent architecture” diagram I have seen. Tasks define what can be done cleanly. Skills define how the agent should approach the repo and when it should improve the surface.
One file exposes operations. Another file teaches discipline.
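The mise-task-author rules are written for a model to read, but most of them are also mechanical enough to check. As a rough sketch, and not something the repo is known to contain, a small lint function could flag task files that drift away from the thin shape. The function name check_thin_task, the MAX_BODY_LINES variable, and the default limit of five lines are assumptions.

```shell
# Hypothetical lint for the "thin task file" rules described above.
# check_thin_task and MAX_BODY_LINES are illustrative names, not repo code.
check_thin_task() {
  local file="$1"
  local status=0
  local body

  # The first line should be a shebang so the file can be run directly.
  head -n 1 "$file" | grep -q '^#!' \
    || { echo "$file: missing shebang" >&2; status=1; }

  # The task should describe itself in a #MISE header.
  grep -q '^#MISE description=' "$file" \
    || { echo "$file: missing #MISE description" >&2; status=1; }

  # Arguments should be passed through untouched.
  grep -q '"\$@"' "$file" \
    || { echo "$file: does not pass \"\$@\" through" >&2; status=1; }

  # Count lines that are neither blank nor comments; more than a handful
  # suggests logic that belongs in the shared library file instead.
  body=$(grep -cv -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$file" || true)
  [ "$body" -le "${MAX_BODY_LINES:-5}" ] \
    || { echo "$file: $body body lines, move logic into .lib" >&2; status=1; }

  return "$status"
}
```

A check like this does not replace the skill file. It only catches the cases where the written discipline was ignored, which is the point where a reviewer would otherwise have to notice by eye.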
The Narrow Workflow That Ended Up Useful
The Gmail agent got better when the surface got smaller and more generic.
The useful loop now looks like this:
mise run gmail:query-export -- 'in:inbox'
mise run gmail:threads-get -- --params '{"userId":"me","id":"THREAD_ID","format":"metadata","metadataHeaders":["From","Subject","Date"]}'
mise run gmail:query-archive -- 'in:inbox older_than:14d'
mise run gmail:query-trash -- 'label:Newsletter'
That is enough to support a real workflow:
- export a manifest of matching threads with sender, subject, date, and snippet
- inspect it locally
- hydrate specific threads when more context is needed
- archive or trash through a visible task
The agent can do a lot with that. It can build manifests, classify mail locally, suggest buckets, and queue actions for approval. None of that requires a giant tool registry or a bespoke agent framework. It requires a handful of commands that make sense.
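The "inspect it locally" step can be as plain as a pipeline over the exported manifest. The sketch below assumes the manifest is a tab-separated file whose first column is the sender, followed by subject, date, and snippet. The article only names those fields, so the file format and the top_senders function are assumptions.

```shell
# Hypothetical local inspection of an exported manifest.
# Assumes tab-separated columns: sender, subject, date, snippet.
top_senders() {
  local manifest="$1"
  local tab
  tab=$(printf '\t')

  # Count threads per sender, then sort by count (descending) and
  # by sender name to keep the output stable.
  awk -F '\t' '{ n[$1]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' "$manifest" \
    | sort -t "$tab" -k1,1nr -k2,2
}
```

A summary like this is the sort of evidence the agent can use to suggest a query, for example a from: filter for the noisiest sender, which then goes through query-archive or query-trash rather than a new one-off task.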
The repo got worse whenever I let it fill up with one-off cleanup tasks. Every time the surface drifted toward promotions-trash, stale-marketing-trash, or some narrow sender bucket, it became harder to reason about. Those commands captured yesterday’s conclusions, not today’s capabilities.
The generic tasks lasted. The specific ones were scaffolding.
Why This Pattern Holds Up
I like this approach because the artifacts stay valuable even outside the agent.
The task files expose a clean operational surface. The skill files keep the model from making the repo worse.
That buys a few things:
- low lock-in
- explicit commands
- easy review
- easy pruning
- human and agent sharing the same interface
More importantly, it keeps the repo honest.
When an agent needs a new capability, the question is simple: should this become a task? If yes, make it small and visible. If no, keep the reasoning local and temporary. That habit prevents a lot of fake abstraction.
People talk about agents as if the hard part is making them more autonomous. A lot of the time the hard part is giving them a decent place to stand.
mise tasks gave this agent a floor. Skill files gave it posture.
That turned out to be enough.
A Good Default for Narrow Agents
I would reach for this pattern again for any focused operational agent: email triage, personal finance cleanup, log review, doc maintenance, small internal ops.
Start with a repo. Add a few tasks that expose the actions cleanly. Add a few skills that teach the agent how to use and extend that surface. Keep the tasks generic. Keep the behavior files opinionated. Delete the specialized cruft when the stable primitives become obvious.
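For a new repo, the starting point can be sketched as a small scaffold function that writes a task file in the same thin shape as the one shown earlier. This is an illustration under assumptions: new_task, its argument order, and the placeholder comment written into .lib are invented here, and the helper named in the third argument still has to be written by hand.

```shell
# Hypothetical scaffold for a thin mise task file; new_task is not repo code.
# Usage: new_task <namespace> <task-name> <helper-function> <description>
new_task() {
  local ns="$1"
  local name="$2"
  local helper="$3"
  local desc="$4"
  local dir="mise-tasks/$ns"

  mkdir -p "$dir"

  # Create the shared library file once; real helpers are added by hand.
  [ -f "$dir/.lib" ] || printf '# shared helpers for %s tasks\n' "$ns" > "$dir/.lib"

  # Write the task: shebang, strict mode, description, source, passthrough.
  cat > "$dir/$name" <<EOF
#!/usr/bin/env bash
set -euo pipefail
#MISE description="$desc"
source $dir/.lib
$helper "\$@"
EOF

  # File tasks need the executable bit to be runnable.
  chmod +x "$dir/$name"
}
```

Running new_task gmail query-archive gmail-archive-query-auto "Archive threads for a Gmail query" from the repo root would produce a file matching the example task above, which keeps new tasks uniform without adding a framework.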
You do not need much more than that.
The best part is that the repo still makes sense without the model. That should be a higher bar than it is.