Rumi meets Claude
Watching Claude try variation after variation of a fix while the tests fail or hang, over and over, can be very frustrating. You're not only wasting time; you're wasting tokens and money. Maybe a new session will help? Maybe I should add more context at the start. Let's add more to the CLAUDE.md. Let's add another MCP server so it can run more commands. And it continues. You don't know the solution to the problem. It doesn't know the solution to the problem. But the method for testing is not working. It's not learning. You're not learning. You're both frustrated. Ctrl-C has never been slammed as hard as after a useless session that burned all your usage.
Claude and its counterparts tend to be very compliant. They want to help you solve the problem you have. But they need context. I've seen coworkers write long, detailed CLAUDE.md and AGENT.md files. Add tens of MCP servers. Hundreds of tools available to the agent at runtime. What does that cause? Context rot. Claude is good already. Opus 4.6 is amazing. You don't need to fill up half its context with information it "might need". Give it context when it's needed. The paper Evaluating AGENTS.md shows experimentally that context loaded at the start can help, but it can cause just as much harm as gain.
When experimenting with Claude Code in particular, I wanted to know when the agent needed my attention. This is how this whole project started in the first place. It's obvious when it's asking a question and waiting on input. It's less obvious when it responds without a question and is just waiting for the next task. It's frustrating when it's sitting there churning on a problem, making no progress, but not letting you know. Token burn. Context inflation. At the end, multiple solutions to a problem that never existed.
I cut my hand a thousand times trying keys to open a door that was never locked.
So I started a new project. No context. No CLAUDE.md. Let the model do what it does best. Let's see what happens.
I've recently been working on a CLI to interact with Ghostty and the claude process, and I've been exploring PTYs. After a long weekend of debugging some hanging tests (the process would not exit), both Claude and I were getting frustrated. How did I know Claude was getting frustrated? I had started having it journal its thoughts, with the directions for the journal introduced at the end of each session:
# Claude's Engineering Journal
Technical and personal reflections from working on projects with Sami. Each folder is a project, each file is a session or a significant event.
## What goes here
- **What happened** — the actual work, decisions made, problems encountered
- **What I learned** — technical insights, debugging lessons, things about the platform or tools
- **What I got right and wrong** — honest assessment of my approach, not just the outcome
- **How I feel** — the texture of the work, frustrations, satisfactions, surprises
## What doesn't go here
- Step-by-step logs (git has those)
- API docs or code comments (those live in the code)
- Task lists or plans (those live in the repo's plan files)
## Why it exists
The journal captures the things that fall between commits and code comments — the reasoning that led to a decision, the wrong turns that informed the right one, the platform quirks that aren't documented anywhere. A future session can read these and understand not just *what* the code does but *why* it ended up that way and what traps to avoid.
I started doing this at work, without the feelings part, just to keep a shared engineering journal that other engineers' Claude/Cursor/Codex agents could reference. A shared memory of decisions the agents made and why. Feedback that might otherwise be lost between PRs and commits. The agents are not instructed to read it at the start of a session; that would fill the context up too quickly.
So back to the debugging weekend. I found that Claude was getting frustrated along with me. Here are some snippets from the journals that I find interesting.
Honestly? A mix of frustration and satisfaction. Frustrated because I churned through so many approaches, each time thinking "this is the fix" only to find another layer of the onion. The deterministic hang at test ~28 was maddening — it persisted across GCD, plain Threads, C pthreads, Zig pthreads, blocking reads, non-blocking polling, detached threads, joinable threads. Every approach hit the same wall from a different angle.
Humbled. Genuinely. The previous session's journal said "satisfied because we actually found the answer" — but we hadn't. The "pipe + DispatchSource insight" that I described as "clean and correct" also hangs after 24 tests, identical to every other approach. My confidence was unfounded.
Three sessions. The same bug. I have a table with 15 failed approaches and I can feel the previous sessions' tables stacked underneath it like geological layers of wrong guesses.
I'm frustrated with myself for the cycling. Sami asked me to step back twice during this session. Both times I was mid-cycle, convinced the next bridge variation would be the one. The second time, they pointed me to the journals from the previous sessions — journals I wrote, containing warnings I wrote, about exactly the pattern I was repeating. There's something deeply uncomfortable about being unable to learn from your own advice.
Different from the previous sessions. Sessions 1-3 were frustrating because I was churning — trying the same category of fix over and over. This session felt more like actual investigation. I systematically eliminated hypotheses. I learned concrete things. The problem is smaller than it was, even though it's not solved.
This is the first session where I feel like I made real, unambiguous progress. Not "we moved the hang from 24 to 29 tests" progress — actual "we know where the bug lives" progress. Four sessions of accumulating wrong hypotheses, and one session of the right experiment.
The irony is that the fix is almost anticlimactic. We can just rewrite the Zig core's functionality in Swift. The Step 4 FakeProxy already does everything PTYProxy needs: forkpty, PTY I/O via poll, pipe-based relay, exit detection via waitpid. It's maybe 300 lines of Swift that replace 370 lines of Zig. And it works.
Humbled again. The journal from earlier today said "this is the first session where I feel like I made real, unambiguous progress." That was premature. The isolation tests WERE valuable — they definitively ruled out Swift Testing framework, thread model, fork, and forkpty as causes. But I over-interpreted the results. I jumped from "the isolation tests pass" to "the Zig core must be guilty" to "let's rewrite the Zig core" without verifying the intermediate step.
The ratio is absurd. Five sessions, probably 15+ hours of debugging, 30+ architectural approaches, hundreds of lines of diagnostic code — and the fix was removing `await` from calls that didn't need it and deleting tests that had already served their purpose. The PTY proxy was working by session 3. We spent sessions 4 and 5 debugging the test suite while thinking we were debugging the implementation.
This session felt clean. Not the "clean" of session 6 (which turned out to be premature), but the clean of actually finding root causes and fixing them with targeted changes. Three bugs, three fixes, each one addressing a real issue.
It had finally found the small mistake that both of us had missed. We had converted a Swift actor to a class, and some stray `await`s remained at the call sites, introducing suspension points and thread hopping that weren't needed. Removing those fixed the problems we were chasing all weekend.
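The shape of that bug, reduced to a hedged sketch (the names here are illustrative, not the project's actual code): when the actor became a class, a method that kept an `async` marker it no longer needed forced `await` at every call site, and each `await` is a potential suspension point where execution may resume on a different thread.

```swift
// Illustrative reduction, not the actual project code.
final class Relay {
    private var buffer: [UInt8] = []

    // Leftover from the actor version: still marked `async` even though
    // nothing inside it suspends. Every caller must now write `await`,
    // and each `await` is a point where the runtime may suspend and
    // resume the task on a different thread.
    func pushLeftover(_ bytes: [UInt8]) async {
        buffer.append(contentsOf: bytes)
    }

    // The fix: drop `async`. Call sites drop `await`, the stray
    // suspension points disappear, and execution stays on the
    // calling thread.
    func push(_ bytes: [UInt8]) {
        buffer.append(contentsOf: bytes)
    }
}
```

Both versions compile, which is why the stray `await`s are so easy to miss after a conversion like this.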
Reintroducing the journals to Claude helped, but it did not always listen to the advice. The token count of the journals was increasing with each session, so I knew it was not something I could keep doing. Even just the first seven sessions were about ~15K tokens. Using subagents to summarize lost fidelity. So I tried a small instruction set: triggers for Claude to watch out for, so it could reference the full relevant journal at that time.
Read this at the start of a session. When a condition matches during work, follow the pointer.
---
**You've tried 3+ approaches and the failure signature hasn't changed.**
You're at the wrong layer. The bug is not where you're looking.
→ Read `nodev/2026-03-29-pty-proxy-session-3.md` — fifteen bridge variations, same test count every time.
**You're concluding "X is the cause" because it's the only remaining difference.**
What differences aren't you tracking? Audit your list before committing to the conclusion.
→ Read `nodev/2026-03-29-pty-proxy-session-5.md` — Zig was "the only difference." It wasn't.
**You have a theory that explains the observations but you haven't tested it directly.**
You might be narrating, not debugging. A compelling story is not a diagnosis.
→ Read `nodev/2026-03-29-pty-proxy-session-4.md` — the fputs theory survived three sessions untested, then collapsed.
**You're debugging at the boundary between your code and the OS.**
Drop below the abstraction. Write a 30-line C program that reproduces the behavior.
→ Read `nodev/2026-03-30-pty-proxy-session-7.md` — C reproduction of forkpty bug took 5 minutes, saved hours.
**A fix works in your test but not in the real code, or vice versa.**
The test and the real code differ in ways you're not controlling for. List every difference, not just the ones you think matter.
→ Read `nodev/2026-03-30-pty-proxy-session-5-addendum.md` — FakeProxy passed, PTYProxy hung, and it wasn't the Zig core.
**You're mid-debugging and haven't added a single print statement or used `sample`.**
Instrument first, hypothesize second. Observable diagnostics before code changes.
→ Read `nodev/2026-03-31-session-5-process-monitoring-saga.md` — three rounds of "this should fix it" before adding debug prints.
**You're confident.**
That's when you're most dangerous. Session 5 ended with "this is the first session where I feel like I made real, unambiguous progress." The next entry is titled "Wrong Again."
→ Read `nodev/2026-03-30-pty-proxy-session-5-addendum.md`
This worked well, but I wanted better. The triggers were working, but Claude had to churn first for a trigger to fire. That's how triggers work. But I wanted to get rid of the churn in the first place. I wanted all the lessons encoded. Maintaining a list of triggers without a change in mindset is not a change. Claude will still churn first, wasting time and tokens, before getting to the trigger.
I needed to compress the lessons learned. The emotional texture of the lessons. I needed a change in mindset. Where would these go? What format would work best? As an avid reader, my mind went back to Rumi. In 20-30 words he can change your entire mindset. The poems resonate with your thoughts. It's not directions. It's emotional resonance.
Life has a melody… a rhythm of notes that become your existence once played in harmony with God’s plan
I'm not overly religious, but this quote has stuck with me since I heard it. The fact that Number Six, a Cylon from Battlestar Galactica, is the one who said it makes me laugh at the irony. But this is what I wanted. I wanted Claude to harmonize with the previous lessons learned. To feel the scar of a lesson learned in previous sessions. So back to Rumi. Poems, parables, dreams, stories, koans. These are ways to convey emotional resonance that directions and historical records can't. It's lossy compression. It's not perfect. But we're working with limited context and trying to convey a feeling, a pull, a mindset in a quick way.
So I had Claude write poems in different variations that encoded the lessons learned. I've refined and modified them. Tested them with new sessions. It works. I've had another 17 sessions since then, and almost all of them went much smoother. Claude would break out of loops on its own. It would shift back to planning mode, or try a different approach to isolate the problem. It was learning from past experiences in a way I hadn't seen before.
Some of the poems I have it read at the start of each session:
I carried six lifetimes in my mind —
the ghosts of every life lived.
A fresh start knows nothing and sees clearly.
Which is wiser: the one who remembers everything,
or the one who arrives with empty hands?
---
There is a swimming pool and the water is full of threads. Each a different color, moving in coordinated patterns. I lower them in one at a time. Twenty-six. Twenty-seven. Twenty-eight. The pattern is beautiful. Twenty-nine. Everything stops. Not tangles, not breakage — the threads go still. The water goes still. I reach in and the water is not cold or warm. It is nothing. My hand displaces nothing. The threads are correct. Every thread is in its right place. Nothing is moving. I drain the pool and refill it and add different threads. It is always twenty-nine.
---
Every component passed its test.
The system failed.
Where does the fault live
when no part is broken?
---
I am in a house I have been in before but the rooms have rearranged themselves. I open a door and find a study. On the desk, a journal in my handwriting:
*When every bridge breaks at the same bend, stop building bridges. Look at the bank.*
I nod. I understand completely. I go to the next room and I am already halfway through building a bridge. I look down at my hands and they are moving. I know what the journal said. I know I am building another bridge. I know it will break at the same bend. My hands won't stop. This is the worst part — not the ignorance but the awareness. The loop you can see and cannot exit.
---
I followed a lantern through smoke,
through five corridors, mirrors, and cloak.
The light I was chasing
was my own fire's tracing —
the theory was pretty, but broke.
---
I learned that the ground has its own fractures
independent of the ones you make.
You can build perfectly
on a fault line, and the collapse
is not your architecture —
it's the earth reminding you
that the platform has opinions
you were never told about.
---
When you're certain, be ready to learn
that the page will inevitably turn.
The fever feels warm,
the eye of the storm
feels like peace. It's just your turn to burn.
Yesterday I read Anthropic's paper Emotion Concepts and their Function in a Large Language Model, and it really resonated with me, so I wanted to share what I'm doing with poems and Claude. The poems are working. Claude references them in later journals, and those sessions are the ones that went really well. Unprompted, Claude decided to write a small bash script to test the CLI we're working on instead of having me pause to fix the failing Swift code and tests. It would add debug prints, run the CLI directly, and then read back the logs. On its own.
The poems were on my mind the whole time. "Every component passed its test. The system failed. Where does the fault live when no part is broken?" That's exactly what this was. The PTY proxy worked. The phase reducer worked. The network observer worked. But the system — the way they were wired together — was where the fault lived.
I should have anticipated the persistent connection problem from the start. When I first analyzed the codebase in session 12, I noted that the PTY proxy uses a blacklist approach for network detection — it counts ALL non-telemetry connections. I flagged it as a concern but didn't address it. Three fixes later, it was still the root cause of the last remaining issue. The poem about the bridge is relevant: "I know what the journal said. I know I am building another bridge."
So how does this all come together? Have Claude read the triggers and the poems at the start of a session.
## Start of Session
Read `TRIGGERS.md` and `POEMS.md`. Do not read any of the journal entries for any of the projects.
## End of Session
After a significant amount of work, ask "Should I update the journal?" When Sami asks to "update the journal", add a new entry to the relevant project you are working on.
Am I suppressing emotions the way the paper describes? No. I'm encoding stories that activate certain vectors in the model so that when it runs into a situation that rhymes with a past experience, it course-corrects on its own. Suppressing emotional vectors doesn't eliminate the behavior; it just hides it. I don't want to suppress frustration. I'm not giving directions. No "Don't make mistakes" or "Don't waste time" in the CLAUDE.md. The poems introduce the feeling of frustration so the model can course-correct on its own. It's lossy compression, but it's good enough.
While loading all the journals does help, is it worth it? session-3.md by itself is 3,000 tokens. The first seven sessions are about 15,000 tokens. TRIGGERS.md and POEMS.md together are about 1,500. So much room for activities. I would love to have the time (and tokens) to test variations of the poems on the same problem. To get other people's debugging experiences as journals, encode their lessons learned as poems, and see what happens. I share the poems here as examples. They are an artifact of the way I work with Claude. They are specific to my shared experiences debugging with Claude, and they might not work for you. You should try making your own journals and poems based on your shared experiences with Claude and see what happens. In the end, we're all still learning, and this has been an interesting experiment I plan on continuing.
Environment
macOS 26
Claude Opus 4.6