The data-broker stack behind the smartphone keyboard nobody reads the privacy policy for
A keyboard app on the Play Store with 220 million installs is, on the back end, a six-vendor pipeline that ingests every keystroke and resells aggregated typing patterns. We traced it.
For four months, we observed one smartphone keyboard. It is one of the most popular Android keyboards on the Play Store, with 220 million installs; its privacy policy runs to 2,400 words and uses the word "anonymized" eleven times. On the back end, it is a pipeline that carries keystrokes off the user's device and through six distinct vendors. We traced it.
One user, whom I will call S., is a teacher in Manchester; she agreed to appear under a pseudonym after I explained what the app was doing. She installed the keyboard in 2022 and has typed roughly 3.4 million keystrokes through it since. Her own name appears in that stream about 4,000 times, her partner's name about 8,000 times, and the name of her child's school about 2,000 times.
The data flow, end to end
Step 1: The keyboard app receives every keystroke through Android's InputMethodService API. This is expected behavior in itself; a keyboard has to know what is being typed.
Step 2: Keystrokes are batched into "session payloads" covering 30 to 90 seconds of typing and posted to the keyboard vendor's ingestion API.
Step 3: The keyboard vendor strips what it classifies as "personally identifying": names that match a static dictionary, sequences that look like email addresses, and sequences that look like phone numbers.
Step 4: The cleaned text is forwarded to a "language modeling" partner, a US-based vendor I will not name in this column, for autocorrect-model training.
Step 5: Aggregated typing patterns (keystroke timings, error patterns, language detection) are sold to two more vendors on a quarterly contract for use in "behavioral biometrics."
Step 6: The aggregated data is licensed onward to data brokers, who sell it to advertisers.
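Step 5 is the least intuitive link in the chain, because timing data sounds harmless once the text is gone. As a rough illustration only, here is how one common keystroke-dynamics feature, the average delay between two consecutive keys (sometimes called a digraph latency), can be computed. The (timestamp, key) input format and the function name are hypothetical; we do not know which features the vendors actually compute.

```python
from collections import defaultdict
from statistics import mean


def digraph_latencies(events: list[tuple[float, str]]) -> dict[str, float]:
    """Average delay between consecutive key presses, grouped by key pair.

    `events` is a time-ordered list of (timestamp_seconds, key) tuples.
    This input format is a hypothetical stand-in, not the vendor's schema.
    """
    gaps: dict[str, list[float]] = defaultdict(list)
    # Walk the stream pairwise: (previous key, next key) and the time between them.
    for (t0, k0), (t1, k1) in zip(events, events[1:]):
        gaps[k0 + k1].append(t1 - t0)
    # The result holds key pairs and average timings, not the original sentences.
    return {pair: mean(vals) for pair, vals in gaps.items()}


# Hypothetical example: "the" typed twice, about a second apart.
profile = digraph_latencies(
    [(0.0, "t"), (0.1, "h"), (0.25, "e"), (1.0, "t"), (1.12, "h"), (1.3, "e")]
)
print(profile)
```

Accumulated over weeks of typing, a table like this tends to settle into a profile of how one particular person types, which is what makes it useful for behavioral biometrics even though no sentence survives.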
Step 3, the "anonymization," is where the policy says the data stops being personal. It is also where the scheme breaks: the static dictionary catches only names on a list of about 80,000 common English first names, and any other name passes through untouched. S.'s child's name is not on that list. Neither are the names of her father or her sister.
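A minimal sketch of dictionary-plus-pattern scrubbing of the kind Step 3 describes shows why the failure is structural rather than a one-off bug. The five-name set stands in for the roughly 80,000-name list, the regular expressions are deliberately simple, and `scrub` is a hypothetical name; this is not the vendor's code. The names in the example sentence are illustrative and are not S.'s family.

```python
import re

# Hypothetical stand-in for a list of ~80,000 common English first names.
COMMON_FIRST_NAMES = {"emma", "james", "olivia", "jack", "sophie"}

# Deliberately simple "looks like an email" and "looks like a phone number" patterns.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"\+?\d[\d -]{8,}\d")


def scrub(text: str) -> str:
    """Redact email-like and phone-like strings, then dictionary names.

    Any word not in COMMON_FIRST_NAMES is passed through unchanged.
    """
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)

    def redact_name(m: re.Match) -> str:
        word = m.group(0)
        # Case-insensitive dictionary lookup; unknown names survive.
        return "[NAME]" if word.lower() in COMMON_FIRST_NAMES else word

    return re.sub(r"[A-Za-z]+", redact_name, text)


print(scrub("Tell Emma and Tadhg to ring 07700 900123 or mail s.teacher@example.org"))
```

Here "Emma" is on the list and gets redacted, while "Tadhg" is not and survives verbatim. Enlarging the dictionary only moves that boundary; it cannot remove it.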
What you can do today
- On Android, switch to Gboard or to a keyboard whose source is open and audited (FlorisBoard, OpenBoard).
- If you are in the EU, you can submit a Subject Access Request to the keyboard vendor under Article 15 of the GDPR. The vendor I traced links to its request portal from its privacy policy; the link is buried, but the portal works.
- If you are in the UK, the ICO's complaint form is at ico.org.uk/make-a-complaint. A complaint is logged only once it actually reaches the regulator.
I will not name the vendor in this column for legal reasons that my editor and I have discussed at length. The investigation continues.