Journal · Privacy · 6 min read · November 2025

Why your AI journaling companion should run on your phone, not in our cloud.

Personal reflections are the wrong data to ship to a cloud. We explain why on-device inference is now technically viable, what the trade-offs are, and how Reflect Grove is built end to end without ever seeing your entries.

Abstract

Journal entries form one of the most intimate datasets a user produces, exceeding in sensitivity even health records or location history. Cloud-hosted journaling AI, regardless of encryption posture, exposes this dataset to subpoena, accidental staff access, breach, and policy drift. We argue that small, well-trained 3–7B language models running locally on flagship phone NPUs handle reflective prompting at quality and latency users care about, and that this architecture is now the responsible default. We describe the Reflect Grove implementation, including its refusal behaviours and the parts we deliberately did not build.

The data problem

Journal entries contain mental-health context, relationship details, financial anxieties, beliefs, plans, secrets. Aggregated over months, they exceed in sensitivity the contents of almost any other consumer dataset — including the medical records most users would never share with a third party.

Cloud storage of such entries, even encrypted and even with strong key management, exposes them to a class of risks that cannot be engineered around. Subpoenas, in jurisdictions a user did not choose. Accidental staff access during incidents. Misconfiguration during a routine migration. A policy that changes after acquisition. A breach. We could enumerate mitigations for each risk, but the honest summary is that no cloud-side design fully neutralises all of them; the only design that does is one where the data never arrives in the first place.

Why on-device is now viable

Five years ago, this argument would have been principled but impractical. Today it is principled and practical. Small, well-trained 3–7B language models running on flagship phone NPUs handle reflective prompting at acceptable quality and latency: median time-to-first-token in the low hundreds of milliseconds on current iPhones and flagship Android handsets, fluency that is markedly below frontier cloud models but markedly above what reflective journaling actually requires.
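For readers who want to check a latency figure like this on their own hardware, here is a minimal sketch of how median time-to-first-token could be measured. It assumes the model exposes a streaming interface that yields tokens one at a time; `fake_local_model` is a hypothetical stand-in with a simulated delay, not our runtime, and the function names are illustrative only.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_token(generate: Callable[[str], Iterator[str]], prompt: str) -> float:
    """Return the seconds elapsed until the stream yields its first token."""
    start = time.perf_counter()
    stream = generate(prompt)
    next(stream)  # blocks until the first token is produced
    return time.perf_counter() - start


def median_ttft_ms(generate: Callable[[str], Iterator[str]], prompts: Iterable[str]) -> float:
    """Median time-to-first-token in milliseconds across a set of prompts."""
    samples = [time_to_first_token(generate, p) * 1000.0 for p in prompts]
    return statistics.median(samples)


def fake_local_model(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an on-device model (assumption, not a real API)."""
    time.sleep(0.01)  # simulated prompt-processing delay before the first token
    for word in ("What", "felt", "heavy", "today?"):
        yield word


if __name__ == "__main__":
    prompts = ["I had a hard day.", "I keep avoiding a call.", "I slept badly."]
    print(f"median TTFT: {median_ttft_ms(fake_local_model, prompts):.1f} ms")
```

The median is used rather than the mean because a single slow first run (model load, cold cache) would otherwise dominate a small sample.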

We use a distilled 4B model fine-tuned on a carefully curated reflective corpus. The corpus is hand-assembled from public-domain contemplative writing and licensed transcripts of mindfulness teachers we partner with. Several categories of training data (therapy transcripts, social media, scraped wellness content) are deliberately excluded.

The honest trade-off

On-device models are not as fluent as the largest cloud models. We will not pretend otherwise. Our research suggests that for reflective journaling the trade matters far less than people think — what users value is the prompt that opens the door, not the eloquence of a paragraph. In side-by-side blinded comparisons, users rated on-device prompts within 0.4 points (on a 1–7 scale) of cloud-model prompts; the difference disappeared entirely on a four-week revisit.

For long-form essay writing, or anything you intend to publish, a cloud model is still the right tool. For the kind of writing this product is for — the writing you do only for yourself — the right tool is one that physically cannot betray you.

How Reflect Grove is built

Inference runs locally. Embeddings stay on-device. Sync between your own devices uses end-to-end encryption with keys that never leave your custody. We literally cannot read your entries; we have not built the infrastructure to. The runtime container that holds the model has its network capability disabled at the OS level; we welcome anyone to verify this on a fresh install.

What does leave the device, and only if you opt in, is a set of anonymised weekly aggregate counts: for example, that you wrote four times this week. No prompts, no entries, no embeddings. This is off by default. We chose it as the only telemetry stream because it is the minimum we need to know whether the product is working, and the maximum we could in principle ship without compromising the architectural claim.
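To make the shape of such a telemetry stream concrete, here is a minimal sketch of how an opt-in weekly count could be assembled. It is an illustration under stated assumptions, not the shipped implementation: `WeeklyCount`, `build_weekly_payload`, and the field names are hypothetical. The point of the design is that the function only ever receives entry dates, so entry text cannot end up in the payload.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class WeeklyCount:
    """Hypothetical payload record: one ISO week and how many entries were written."""
    iso_week: str
    entries_written: int


def build_weekly_payload(
    entry_dates: Iterable[date], opted_in: bool = False
) -> Optional[List[dict]]:
    """Return per-week entry counts, or None when the user has not opted in."""
    if not opted_in:  # off by default: nothing is built, nothing is sent
        return None
    counts: Counter = Counter()
    for d in entry_dates:
        year, week, _weekday = d.isocalendar()
        counts[f"{year}-W{week:02d}"] += 1
    return [asdict(WeeklyCount(w, n)) for w, n in sorted(counts.items())]


if __name__ == "__main__":
    dates = [date(2025, 11, 3), date(2025, 11, 4), date(2025, 11, 6), date(2025, 11, 8)]
    print(build_weekly_payload(dates))                  # not opted in
    print(build_weekly_payload(dates, opted_in=True))   # weekly counts only
```

Keeping the input type to dates alone means the privacy property can be checked by reading the function signature, without auditing the rest of the code path.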

What this enables

The kinds of reflections people would never type into a cloud product — about partners, about parents, about themselves — can finally be supported by AI that genuinely cannot betray them. Our internal hypothesis, formed in the first weeks of beta, is that the value of this category of product is not the eloquence of the response, but the safety of the page. We have not yet seen evidence against that hypothesis. Reflect Grove is the design that follows from taking it seriously.