Inside the Bamboo Grove sleep cohort: what 1,400 people taught us in 14 months.
We followed 1,400 participants through fourteen months of personalised wind-down sequences. The results clarified what works, what doesn't, and where the wellness industry's narrative is wrong.
Cohort design
Participants were recruited across the UK and Ireland, with explicit oversampling of shift workers and post-menopausal women: two groups under-represented in commercial sleep app studies and reliably distinct in their responses to interventions. All wore consumer-grade actigraphy devices (Apple Watch, Oura, Garmin, or Whoop, per participant choice) and completed a weekly seven-question self-report.
Half received personalised wind-down sequences, adapted weekly by the Sleep Grove model. Half received a strong, fixed baseline — the consensus of three independent sleep clinicians, encoded as a static protocol. We deliberately chose a strong baseline rather than a placebo arm; we wanted to know whether personalisation paid for itself, not whether sleep tech in general beats nothing.
What worked
Light tapering that began 90 minutes before target sleep onset reduced mean measured sleep onset latency from 22 to 14 minutes. The mechanism is well documented; what is novel here is that opportunistic, suggestion-based tapering (via smart bulbs where present, written nudges where not) produced the same effect as the more elaborate clinical protocols, at a fraction of the friction.
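As a rough illustration of the timing involved, the sketch below computes when a taper would begin for a given target sleep onset. The 90-minute lead comes from the finding above; the function name and the use of full datetimes (so that targets after midnight roll back to the previous evening) are our assumptions for the example, not a description of Sleep Grove's internals.

```python
from datetime import datetime, timedelta

# Lead time from the cohort finding: tapering starts 90 minutes before target onset.
TAPER_LEAD = timedelta(minutes=90)


def taper_start(target_sleep_onset: datetime) -> datetime:
    """Return the moment light tapering should begin (hypothetical helper)."""
    # Using full datetimes means a target just after midnight correctly
    # yields a start time on the previous calendar day.
    return target_sleep_onset - TAPER_LEAD


# Example: a target sleep onset of 00:15 gives a taper start of 22:45 the evening before.
example_start = taper_start(datetime(2025, 3, 4, 0, 15))
print(example_start.strftime("%Y-%m-%d %H:%M"))  # 2025-03-03 22:45
```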
Subjective rest scores improved by 0.7 standard deviations in the personalised arm. This was statistically robust across age strata; the only sub-group where the effect was non-significant was shift workers, whose results were better modelled by their work-pattern variability than by the intervention.
What didn't
Generic guided meditations delivered after 22:00 produced no measurable benefit. We had expected at least a small positive effect, as reported in several published studies, but it did not replicate in our cohort. Possible explanations include cohort selection, delivery medium, or simply the difference between a study-protocol session and a free-living one.
Personalised body-scan with breath-pacing did work — but only when delivered between 21:15 and 21:50. Earlier or later, the effect disappeared. This window-dependence has shaped Sleep Grove's behaviour: the body-scan segment is offered only when the window is open for a given participant; outside it, the model proposes alternatives or simply remains silent.
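The window gating described above can be sketched in a few lines. This is a minimal illustration only: the 21:15 and 21:50 bounds come from the cohort finding, while the function names, the inclusive treatment of both bounds, and the "alternative_or_silence" label are assumptions made for the example.

```python
from datetime import time

# Window bounds from the cohort finding; treating both ends as inclusive is an assumption.
WINDOW_OPEN = time(21, 15)
WINDOW_CLOSE = time(21, 50)


def body_scan_available(
    local_now: time,
    window_open: time = WINDOW_OPEN,
    window_close: time = WINDOW_CLOSE,
) -> bool:
    """Return True if the body-scan segment may be offered at this local time."""
    return window_open <= local_now <= window_close


def choose_segment(local_now: time) -> str:
    """Pick a segment label; the labels here are hypothetical placeholders."""
    if body_scan_available(local_now):
        return "body_scan"
    # Outside the window, fall back to another suggestion or to saying nothing.
    return "alternative_or_silence"
```

Passing the bounds as parameters leaves room for a per-participant window, should the window differ between participants.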
What surprised us
Caffeine timing dominated almost every other variable in the study. Coaching users to a hard cutoff at 14:00 — communicated via a written, principle-laden weekly summary, not a daily reminder — produced more improvement than any audio intervention we tested. The effect was so large that we re-ran the analysis with caffeine timing held constant, to verify that our other results were not artefacts of differential coffee habits.
They were not. But the lesson stuck: sleep tech needs less theatre and more clock discipline.
Implications
For us, the practical consequence is that Sleep Grove's calibration questionnaire opens with caffeine timing, and the model's quietly persistent nudge targets that variable rather than the more glamorous audio scenes. For the field, we hope the consequence is humility: an entire commercial ecosystem of sleep-adjacent products is built on protocols whose effect sizes are smaller than that of a single behaviour change around afternoon coffee. The right response to that is not to abandon the products; it is to be honest about the ranking.
The full report — pre-registration, methodology, dropout analysis, robustness checks — is available on request via the contact page. We are publishing it in a peer-reviewed venue in 2026; in the meantime, we share it openly with researchers and clinicians, and with anyone considering joining a future cohort.