Journal · Design · 7 min read · August 2025

Quiet AI: why the most useful wellness models will be the ones you barely notice.

Most wellbeing apps optimise for sessions. We argue this is a category error, and walk through the design principles behind models that earn long-term trust by demanding less of you, not more.

Abstract: A reward function trained on daily engagement reliably produces habit-engineering apparatus, regardless of the surface category: language learning, exercise, meditation. We outline why the same outcome is inevitable for wellbeing AI trained on the same reward, and describe an alternative architecture: composite weekly rewards, written summaries instead of streaks, models that withhold themselves on recovery weeks, and explicit user-facing trade-off statements. We then report retention observations from our first fourteen months that are consistent with this architecture's design intent.

The optimisation trap

If you reward your model for daily sessions, it will become a habit-engineering apparatus. This is not a moral failing on the part of any team; it is the mechanical consequence of choosing a particular metric. The model, whether a classical recommender or a modern reinforcement-fine-tuned LLM, will discover and exploit features of human psychology that increase the chance you open the app today: variable rewards, social-proof framing, mild guilt cues, streak preservation.

For some categories, this is acceptable. For wellbeing, it is the precise opposite of what users came for. People disengage from session-optimising wellbeing apps within nine weeks on average — and the disengagement carries shame, which is the most counterproductive emotional residue we could possibly leave behind.

What restraint looks like

Concretely, what would a restrained AI for wellbeing do that a session-optimising one would not?

It would politely sit out the days you wake up already grounded. It would withhold a nudge when your body data suggests you need recovery, not stimulation. It would propose practice lengths, not enforce them. It would send a written paragraph at week's end, not a daily push. It would, occasionally, propose taking a week off entirely — and not record that decision as a "lapse".

None of these are technically novel. What is unusual is choosing them as the default behaviour, rather than the loss-mitigation behaviour that kicks in after a churn alarm.
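To make the defaults above concrete, here is a minimal sketch of a restraint-first decision rule. It is an illustration, not our production logic: the `DayState` fields, the `Action` names, and the `week_off_indicated` flag are all hypothetical stand-ins for whatever signals a real system would derive.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY_SILENT = "stay_silent"
    SUGGEST_PRACTICE = "suggest_practice"   # a proposal; length is never enforced
    SUGGEST_WEEK_OFF = "suggest_week_off"


@dataclass(frozen=True)
class DayState:
    # All three fields are hypothetical signals, assumed for illustration.
    feels_grounded: bool       # e.g. the user woke up already settled
    needs_recovery: bool       # e.g. body data points to rest, not stimulation
    week_off_indicated: bool   # e.g. a longer break looks more useful than practice


def choose_action(state: DayState) -> Action:
    """Silence is the default branch, not a fallback after a churn alarm."""
    if state.week_off_indicated:
        return Action.SUGGEST_WEEK_OFF
    if state.feels_grounded or state.needs_recovery:
        return Action.STAY_SILENT
    return Action.SUGGEST_PRACTICE


def record_week_off(history: list[str]) -> list[str]:
    """Log an accepted week off as a neutral 'pause'; no 'lapse' is ever written."""
    return history + ["pause"]
```

The point of the sketch is the ordering: the quiet outcomes are checked first, and a nudge is what remains when none of them applies.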

Building the model

We train against weekly wellbeing markers, not daily engagement. The reward is composite: subjective wellbeing (a weekly self-rating on a 1–7 scale), sleep quality (actigraphy and self-report), and self-reported sense of agency. The model never sees a "session opened" event as a reward; that data is logged only as a diagnostic.
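As a rough sketch of what such a composite reward could look like: the weights, the normalisation to [0, 1], and every name below are assumptions made for illustration, not the values or code we actually train with. What the sketch is meant to show is structural: session counts sit in the record but never enter the reward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyMarkers:
    """One user's markers for a single week (hypothetical schema)."""
    subjective_wellbeing: int   # weekly self-rating on the 1-7 scale
    sleep_actigraphy: float     # assumed already normalised to [0, 1]
    sleep_self_report: float    # assumed already normalised to [0, 1]
    sense_of_agency: float      # assumed already normalised to [0, 1]
    sessions_opened: int        # diagnostic only; never read by the reward


# Hypothetical weights, chosen only so that they sum to 1.
WEIGHTS = {"wellbeing": 0.4, "sleep": 0.3, "agency": 0.3}


def weekly_reward(m: WeeklyMarkers) -> float:
    """Combine the three weekly markers into one scalar in [0, 1]."""
    if not 1 <= m.subjective_wellbeing <= 7:
        raise ValueError("subjective_wellbeing must be on the 1-7 scale")
    wellbeing = (m.subjective_wellbeing - 1) / 6           # map 1-7 onto [0, 1]
    sleep = (m.sleep_actigraphy + m.sleep_self_report) / 2  # average both sources
    return (
        WEIGHTS["wellbeing"] * wellbeing
        + WEIGHTS["sleep"] * sleep
        + WEIGHTS["agency"] * m.sense_of_agency
    )
```

Two users with identical markers receive the same reward whether they opened the app once or thirty times that week, which is the property the paragraph above describes.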

The trade is real. Our app opens less often than competitors'. Our seven-day active rate would, on a conventional B2C dashboard, look modest. Our fourteen-week retention is excellent, and our six-month "users who report they need the app less" rate — our internal north star — is, as far as we can tell, novel as a public metric.

Privacy as a product feature

Reflections are the most intimate data a user generates. We run them through on-device models. The cloud never sees a single journal entry. This is not a feature we mention prominently in the app; it is the default, and we do not believe users should have to think about it.

For the practices that benefit from cross-user signal — sleep, broadly — we use weekly anonymised summaries, opt-in, with the methodology published. For the practices that do not benefit from cross-user signal — journaling, reflection — we use no signal at all. This division turns out to map cleanly onto the moral geometry of the data, which is a coincidence we choose to take as encouragement.
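The division described above can be written down as a small policy table. This is a sketch under assumptions: the practice names, the `Sharing` enum, and `may_leave_device` are hypothetical, and the anonymisation step itself is out of scope here.

```python
from enum import Enum


class Sharing(Enum):
    NEVER = "never"                                  # data stays on the device
    WEEKLY_ANONYMISED_OPT_IN = "weekly_anonymised_opt_in"


# Hypothetical policy table mirroring the split described in the text.
POLICY = {
    "journaling": Sharing.NEVER,
    "reflection": Sharing.NEVER,
    "sleep": Sharing.WEEKLY_ANONYMISED_OPT_IN,
}


def may_leave_device(practice: str, user_opted_in: bool) -> bool:
    """True only for opt-in practices where the user has actually opted in."""
    # Practices missing from the table default to the most private setting.
    sharing = POLICY.get(practice, Sharing.NEVER)
    return sharing is Sharing.WEEKLY_ANONYMISED_OPT_IN and user_opted_in
```

Defaulting unknown practices to `Sharing.NEVER` keeps the private path as the one a new feature lands on unless someone deliberately changes it.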

What we have learned

Users who use the app twice a week, attentively, stay with it three times longer than those who use it daily and burn out. We optimise for the former. The interface choices that follow (quiet notifications, written weekly summaries, explicit pauses, prompts that decline to give advice) are not flourishes. They are what falls out of the metric.

The conclusion we have drawn, fourteen months in, is that "quiet AI" is not a stylistic preference. It is the engineering consequence of choosing the right reward function. If the goal really is human flourishing — not surface engagement, not retention curves — then the model that achieves it will, almost by definition, be one you barely notice.