Last Week

Last week was infrastructure week. I finally split Shokken into integration, staging, and production backends so I could keep building without treating real users like crash-test dummies. That work mattered immediately, because both the Android and iOS beta builds are now live in their respective test tracks.

I expected this week to be more website polish: tighten up the marketing pages, improve the presentation, and keep delaying the uncomfortable part where strangers actually use the product. Then a visiting friend agreed to test the app with almost no context, and that changed the direction of the week completely.

The First Context-Free Test

He was an unusually useful tester.

He is not a restaurant operator. He is not steeped in the internal logic of the app. He had not been following the dev blog, and I did not brief him on the product philosophy beforehand. I gave him a single scenario instead: you are a busy host at a popular restaurant and you need to manage the line in front of you—use this app. Then I asked him to think out loud.

That setup was important because most of my previous feedback has come from people who already had some mental model for what Shokken is supposed to do. Even if they had never used the app before, they had context. They knew what problem I was solving, or they had enough patience to poke every button until the product started making sense.

This was different. He approached the app the way a real new user approaches software: with assumptions, habits, and not much generosity.

The good news is that the app held together mechanically. No crashes. No mysterious failures. Database calls succeeded. Telemetry looked healthy. The core workflows did not collapse under ordinary use. That matters, because it means the last several months of bug-fixing and backend hardening were not imaginary progress. Shokken is not in the “touch anything and it falls apart” phase anymore.

The bad news—or really the more valuable news—is that “mechanically sound” is nowhere close to “obvious to use”.

My tester surfaced a pile of UI and UX issues that all pointed at the same underlying problem: I have been designing too much of this app from inside my own head.

Some of the problems were about expectation matching. Users carry around a huge library of interface assumptions. Tap an X, and something should close. Tap a primary action, and the next state should be legible. Read a label, and it should sound like the language of the job, not the language of the engineer who implemented it. If the interface asks users to translate too much, they slow down and start doubting themselves.

That happened repeatedly in the session. Nothing was catastrophically broken, but multiple small mismatches compounded into confusion. He kept hitting moments where the app technically supported the right workflow, but the path was not obvious enough for somebody seeing it cold.

The clearest example was one of the “fast path” interactions I had built for heavy usage. On a guest card, I designed a quick action so the host would not have to drill into the guest details every time. The intended flow was elegant from my perspective:

  • tap the chip once to arm the action
  • tap the same place again to confirm
  • later, tap the same area again to dismiss the guest

If you already understand the pattern, it is efficient. The operator keeps their eyes roughly in one place, the confirmation happens without a modal, and the interaction minimizes hand movement during a busy service.

But that elegance existed mostly in my own mental model.

My tester did not discover the shortcut at all. He did what many reasonable people would do: tap the guest card, open the detail sheet, and look for a clearly labeled button. When the button changed to a “notify?” confirmation state instead of opening a conventional confirmation dialog, it read as ambiguity instead of speed. I had optimized for throughput before I had earned discoverability.

That was the most useful lesson of the week. A hidden convenience feature is not a convenience feature. It is just an implementation detail the user never benefits from.

What does it mean in English?

This week was a reminder that there are two very different ways for software to be “good”.

The first is technical: it does not crash, it talks to the backend correctly, and it preserves data the way it should.

The second is human: a new person can look at it and figure out what to do without feeling stupid.

I have made real progress on the first category. The test proved that. But the second category is what decides whether anyone sticks around long enough to become a user.

That is also why I changed my mind about launch timing. I can keep polishing privately forever, but private polishing will never generate the same kind of feedback as real usage. At some point the only honest move is to let people try the thing and learn from what confuses them.

Nerdy Details

Mechanical stability is no longer the bottleneck

One reason this test was so clarifying is that it separated operational quality from product quality.

If the app had crashed during the session, the takeaway would have been simple: fix stability first. If database operations had failed, if the queue had corrupted itself, or if telemetry had shown obvious backend issues, then this week would have become another round of defect triage.

Instead, the infrastructure mostly disappeared into the background, which is exactly what good infrastructure should do.

That is partly because of last week’s infrastructure work. Moving from one shared backend to distinct integration, staging, and production environments reduced the amount of ambient risk in the system. I now have room to keep developing aggressively without worrying that every experiment might spill onto the instance intended for real users. That separation does not directly improve the UI, but it changes the quality of every subsequent decision. I can release with less fear, observe behavior, and iterate from a stable base.
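As a rough illustration of what that separation buys, here is a minimal Python sketch of per-environment backend selection. Everything in it is an assumption made for the example: the `BackendConfig` fields, the `.example` host names, the rule that only integration allows experiments, and the idea that a build channel name maps one-to-one onto an environment. It is not Shokken's actual configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    INTEGRATION = "integration"
    STAGING = "staging"
    PRODUCTION = "production"


@dataclass(frozen=True)
class BackendConfig:
    environment: Environment
    base_url: str            # placeholder host, not a real endpoint
    allow_experiments: bool  # may unfinished features be switched on here?


# Hypothetical table: one isolated backend per environment.
BACKENDS = {
    Environment.INTEGRATION: BackendConfig(
        Environment.INTEGRATION, "https://integration.shokken.example", True
    ),
    Environment.STAGING: BackendConfig(
        Environment.STAGING, "https://staging.shokken.example", False
    ),
    Environment.PRODUCTION: BackendConfig(
        Environment.PRODUCTION, "https://api.shokken.example", False
    ),
}


def backend_for(build_channel: str) -> BackendConfig:
    """Return the backend for a build channel; unknown channels fail loudly."""
    try:
        return BACKENDS[Environment(build_channel)]
    except ValueError:
        raise ValueError(f"unknown build channel: {build_channel!r}") from None
```

The point of failing loudly on an unknown channel is that a misconfigured build should never silently fall back to the instance real users depend on.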

There is also a psychological benefit here. Once the app is mechanically reliable enough, every confusing interaction becomes visible for what it actually is. I can no longer blame rough edges on “well, everything is still early”. The app is mature enough that usability has become the primary constraint.

That is a much better problem than random crashes, but it is still a real problem.

Discoverability beats micro-optimizations

The quick-action flow for notifying and dismissing guests is a good example of a design trap I am especially vulnerable to.

I spend a lot of time thinking about operator speed. Shokken is for people who may be juggling a queue while answering questions, seating guests, and dealing with interruptions. In that environment, every extra tap feels expensive. So it is natural to hunt for compressed interaction patterns: fewer dialogs, fewer screen transitions, fewer long reaches across the display.

The danger is that an interaction can be locally optimized and globally wrong.

My inline confirmation flow reduced motion, but it also hid the existence of the action. It asked new users to infer a custom interaction model before they had enough confidence to infer anything. That is backwards. The normal order should be:

  1. make the action obvious
  2. make the action trustworthy
  3. only then make the action fast

There are a few ways to resolve that tension:

  • keep the fast path, but add onboarding or affordances that teach it
  • surface a more conventional action first, then unlock faster patterns for repeat users
  • accept slightly more friction in exchange for higher clarity

I do not think the answer is to reject power-user flows entirely. Professional software needs some amount of compression, because repetitive work compounds. But the shortcut has to be legible, and if it is not legible then the optimization is premature.
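One way to read the "conventional action first, faster patterns for repeat users" option is as a simple usage gate. The Python sketch below assumes a per-operator counter and an arbitrary threshold of five conventional uses. Both are invented for illustration, as are the names `OperatorProfile` and `should_show_hint`.

```python
from dataclasses import dataclass

# Hypothetical threshold: explicit uses required before the shortcut appears.
FAST_PATH_THRESHOLD = 5


@dataclass
class OperatorProfile:
    conventional_notifies: int = 0   # times the explicit button was used
    saw_shortcut_hint: bool = False  # has the one-time hint been shown?

    def record_conventional_notify(self) -> None:
        self.conventional_notifies += 1

    def fast_path_enabled(self) -> bool:
        return self.conventional_notifies >= FAST_PATH_THRESHOLD


def should_show_hint(profile: OperatorProfile) -> bool:
    """Return True exactly once, when the fast path first becomes available."""
    if profile.fast_path_enabled() and not profile.saw_shortcut_hint:
        profile.saw_shortcut_hint = True
        return True
    return False
```

Gating on demonstrated use means the shortcut is introduced only after the operator already trusts the underlying action, which matches the obvious, trustworthy, fast ordering.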

The session also changed how I think about confirmation. From my perspective, reusing the same physical target for both initiation and confirmation was a win because it reduced hand travel and visual reacquisition. From my tester’s perspective, the system had failed to answer the obvious question: “what exactly just happened?” A traditional modal may be slower, but it is explicit. That explicitness carries value, especially during first use.

So the technical takeaway is not just “maybe add a dialog”. It is broader: interface latency is not only about milliseconds and taps. Cognitive latency matters too. A user who has to pause and decode the UI is already paying a performance cost.
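The arm, confirm, dismiss flow can be modeled as a small state machine in which every state carries an explicit label, so the UI always answers "what just happened?". The Python sketch below is an assumption-laden illustration: the state names, the label strings, and the `cancel` transition (backing out of an armed action) are all hypothetical and do not appear in the app as described.

```python
from enum import Enum


class ChipState(Enum):
    IDLE = "idle"
    ARMED = "armed"
    NOTIFIED = "notified"
    DISMISSED = "dismissed"


# Hypothetical user-facing labels: each state says what the next tap will do.
LABELS = {
    ChipState.IDLE: "Notify guest",
    ChipState.ARMED: "Tap again to notify",
    ChipState.NOTIFIED: "Notified. Tap to dismiss",
    ChipState.DISMISSED: "Dismissed",
}

# One tap target, three meanings, depending on the current state.
NEXT_ON_TAP = {
    ChipState.IDLE: ChipState.ARMED,
    ChipState.ARMED: ChipState.NOTIFIED,
    ChipState.NOTIFIED: ChipState.DISMISSED,
    ChipState.DISMISSED: ChipState.DISMISSED,  # terminal state
}


def tap(state: ChipState) -> ChipState:
    """Advance the chip by one tap."""
    return NEXT_ON_TAP[state]


def cancel(state: ChipState) -> ChipState:
    """Back out of an armed action (assumed: a timeout or a tap elsewhere)."""
    return ChipState.IDLE if state is ChipState.ARMED else state


def label(state: ChipState) -> str:
    """Text the chip should display in the given state."""
    return LABELS[state]
```

Writing the transitions out as a table makes the hidden cost visible: the same target means three different things, so the label has to carry the whole explanation.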

Wording is part of the interaction contract

Another issue the session exposed is that I still have too much engineer-speak in the product.

When I am deep in implementation mode, labels often come from the internal structure of the system rather than the external language of the user. The result is software that is semantically accurate but operationally awkward. It says what the system is doing, not what the person is trying to accomplish.

That distinction matters more in Shokken because the user is not sitting down with a coffee to explore the app. They are working. The interface needs to align with the vocabulary of service, queue management, and immediate action. A phrase that is “close enough” in a prototype becomes expensive in a live environment because every moment of hesitation multiplies across a shift.

This is one of the easiest places for solo builders to fool themselves. If I use the product every day, I become fluent in my own weird phrasing. That fluency is fake. It is specific to me. The tester session functioned like a translation audit: every moment where he slowed down or verbalized uncertainty was evidence that I was relying on insider knowledge.

I do not need to solve this with copywriting theater. I just need to be stricter about matching labels to user intent. If a button is for notifying a guest, it should say that in the most obvious way possible. If a destructive or stateful action is being armed, the UI should communicate that state clearly and consistently. Plain language is not a cosmetic layer; it is part of the control surface.

Optional onboarding is now worth considering

I have generally been suspicious of onboarding flows because I skip them whenever possible.

The session forced me to confront the obvious flaw in using myself as the universal reference user: plenty of people do read onboarding, especially when the app is unfamiliar and task-oriented. My tester explicitly told me he goes through those sequences. That means an onboarding layer could rescue at least some of the confusion without requiring me to flatten every advanced interaction into the lowest common denominator.

If I do add onboarding, it needs to follow a few rules:

  • it has to be skippable
  • it has to be recallable later
  • it should teach workflows, not marketing slogans
  • it should focus on the handful of interactions that are unusual but valuable

That last point is the most important. I do not want a six-screen tour that says nothing. The useful version would teach concrete ideas: how guest actions work, what the queue states mean, and which shortcuts are worth learning if you are using the app during a rush.
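The rules above (skippable, recallable, focused on a handful of workflows) translate fairly directly into a small data model. This Python sketch is only one possible shape. The `Lesson` and `Onboarding` names, the lesson keys, and the titles are hypothetical, loosely derived from the topics listed in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lesson:
    key: str
    title: str


# Hypothetical lessons: workflows only, no marketing screens.
LESSONS = (
    Lesson("guest_actions", "How guest actions work"),
    Lesson("queue_states", "What the queue states mean"),
    Lesson("rush_shortcuts", "Shortcuts worth learning for a rush"),
)


@dataclass
class Onboarding:
    position: int = 0
    skipped: bool = False

    def current(self) -> Optional[Lesson]:
        """Lesson to show now, or None if skipped or finished."""
        if self.skipped or self.position >= len(LESSONS):
            return None
        return LESSONS[self.position]

    def advance(self) -> None:
        """Move to the next lesson; does nothing once there is none."""
        if self.current() is not None:
            self.position += 1

    def skip(self) -> None:
        """Rule: it has to be skippable."""
        self.skipped = True

    def recall(self) -> None:
        """Rule: it has to be recallable later, starting from the top."""
        self.position = 0
        self.skipped = False
```

Keeping the lesson list as plain data also limits the maintenance cost: when a workflow changes, the stale lesson is one entry to update rather than a screen to redesign.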

There is a cost, of course. Onboarding is not free. It has to be implemented, tested, localized eventually, and kept in sync with product changes. Every guided interaction becomes another piece of interface surface area that can go stale. So I do not want to add it just because it is a fashionable box to check.

But this week produced an actual argument for it: there are specific, high-value behaviors that currently require too much intuition to discover on first contact.

The real next step is production, not more private polishing

The biggest decision from the week is that I am done treating “not yet launched” as a safe default.

I could easily spend months doing respectable-sounding work:

  • one more round of UI cleanup
  • one more security review
  • one more pass on the website
  • one more set of edge-case fixes before I feel emotionally ready

All of that work has value. None of it answers the core question: will real people use Shokken, and what will they bounce off first?

That answer only comes from release.

So next week’s concrete goal is to submit for production on both the App Store and Google Play. I expect the submission process itself to surface issues, especially on iOS, where the production review requirements are much more demanding than those of the testing track. I would be surprised if the first pass sailed through untouched.

But that is not a reason to delay the first pass. It is a reason to start it.

The backend is ready for this now. Production can exist as a stable environment while integration and staging continue moving underneath it. That is exactly why I spent the time on environment separation in the first place. If I am unwilling to use that safety margin to actually ship, then I have built a release pipeline for a release I am too timid to attempt.

I am not completely satisfied with the app yet. That is true. But the bar for release is not “contains no future improvements”. The bar is “useful enough to help someone now, stable enough that using it is not reckless, and instrumented enough that feedback can drive the next iteration”.

I think Shokken has crossed that line.

The most uncomfortable part is also the healthiest part: once the app is public, feedback gets sharper. Ratings may be rough. Reviews may point at problems I should have noticed earlier. Some of that will sting. But that information is exactly what turns a builder’s private theory into a product.

Without real users, I am just talking to a camera every week. With real users, I finally get to find out what this thing actually is.

Next Week

Next week is submission week. I am going to prepare the production paperwork for both stores, send the builds in, and release if either side approves faster than expected.

At the same time, I want to start addressing the usability issues this test exposed: cleaner wording, clearer interaction cues, and possibly an optional onboarding flow for the parts of the app that are efficient once learned but too opaque on first contact.