Venmo is a US-based mobile social payments platform. Each Venmo transaction requires a “payment note”, a brief memo. By default, these notes are visible to all other Venmo users. Using three data sets of Venmo transactions, which span 8 years and a total of 389M transactions from over 22.5M unique users, we quantify the extent of private data leaked through public transaction notes. To do so, we develop a classification framework, SENMO, that uses BERT and regular expressions to classify public transaction notes as sensitive or non-sensitive. We find that 41M notes (10.5%) leak some sensitive information, such as health conditions, political orientation and drug/alcohol consumption, involving 8.5M users (37.8%). We further find that users seek privacy by making their notes private, inconspicuous or cryptic. However, the large increase in Venmo's user base means that the number of users whose privacy is publicly exposed has grown substantially. Finally, the privacy of a user who transacts with a group on Venmo can be reduced or eliminated through the actions of other users. We find that this happens to around half of the members of Alcoholics Anonymous, gambling and biker gang groups. Our findings strongly suggest that public-by-default payment information puts many users at risk of unintended privacy leaks.
Identifying Sensitive Notes: Using BERT together with regular expressions, we develop a classification framework, SENMO (SENsitive content on venMO), which assigns each transaction note to one or more sensitive categories. A note is classified as NON (non-sensitive) if it does not contain any sensitive information.
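SENMO's internals are not given at this level of detail, so the following is only a minimal sketch of how a multi-label decision with a NON fallback could work. It assumes a classifier (for example, a fine-tuned BERT model with one output per category) that produces one logit per category. The category names, the phone-number regex, the 0.5 threshold and the function `classify_note` are all hypothetical.

```python
import math
import re

# Hypothetical label set; SENMO's actual categories are not enumerated here.
MODEL_CATEGORIES = ["HEALTH", "POLITICS", "DRUG_ALCOHOL"]

# Hypothetical regex-based category (a stand-in for patterns such as phone numbers).
REGEX_CATEGORIES = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify_note(note: str, logits: list[float], threshold: float = 0.5) -> set[str]:
    """Combine per-category model logits with regex hits; fall back to NON.

    `logits` is assumed to come from a multi-label classifier (e.g. a
    fine-tuned BERT model) with one output per entry in MODEL_CATEGORIES.
    """
    if len(logits) != len(MODEL_CATEGORIES):
        raise ValueError("expected one logit per model category")
    # Multi-label: keep every category whose probability clears the threshold.
    labels = {c for c, z in zip(MODEL_CATEGORIES, logits) if sigmoid(z) >= threshold}
    # Add categories detected by regular expressions.
    labels |= {c for c, pattern in REGEX_CATEGORIES.items() if pattern.search(note)}
    # A note with no sensitive category is labeled NON.
    return labels or {"NON"}
```

A note receives every category whose probability clears the threshold plus any regex hit, and is labeled NON only when that set is empty.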
Measuring User Recourse: A user who does not know how to make their transactions non-public may instead post cryptic notes. We define a note as cryptic if it contains only emojis; contains only random numbers that do not match our regex patterns for sensitive content such as phone numbers and addresses; contains only English interjections and greetings (e.g. “Hi”, “Hey”, “Aww”); contains only English stop words (e.g. “a”, “the”, “too”); uses English letters but contains no vowel; or is longer than 30 words.
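The cryptic-note rules can be sketched as a single predicate that returns true as soon as any rule fires. This is a minimal illustration, not the authors' implementation: the word lists are small hypothetical samples, the emoji ranges are an approximation of Unicode emoji, and one phone-number pattern stands in for the full set of sensitive-content regexes.

```python
import re
import string

# Illustrative subsets only; the full interjection and stop-word lists are not given.
INTERJECTIONS = {"hi", "hey", "aww", "hello", "yo", "oh", "wow"}
STOP_WORDS = {"a", "an", "the", "too", "to", "of", "and", "for"}

# Approximate emoji code-point ranges (not a complete Unicode emoji definition).
EMOJI_RE = re.compile(
    "[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"
)
# Hypothetical stand-in for the sensitive-number regexes (e.g. phone numbers).
SENSITIVE_NUMBER_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
MAX_WORDS = 30


def is_cryptic(note: str) -> bool:
    text = note.strip()
    if not text:
        return False
    words = text.split()
    # Longer than 30 words.
    if len(words) > MAX_WORDS:
        return True
    # Only emojis (whitespace ignored).
    if EMOJI_RE.sub("", "".join(words)) == "":
        return True
    # Only numbers that do not match a sensitive-number pattern.
    if re.fullmatch(r"[\d\s.,#-]+", text) and re.search(r"\d", text):
        if not SENSITIVE_NUMBER_RE.search(text):
            return True
    # Only interjections/greetings, or only stop words (punctuation stripped).
    tokens = [w.strip(string.punctuation).lower() for w in words]
    tokens = [t for t in tokens if t]
    if tokens and all(t in INTERJECTIONS for t in tokens):
        return True
    if tokens and all(t in STOP_WORDS for t in tokens):
        return True
    # Uses English letters but contains no vowel.
    if re.search(r"[A-Za-z]", text) and not re.search(r"[AEIOUaeiou]", text):
        return True
    return False
```

Applied literally, the no-vowel rule also flags short real words such as "gym"; the actual definition may treat such cases differently.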
Measuring Risks from Group Transactions: Membership in some groups may be considered sensitive, since publicly known membership may pose a privacy risk to a user. We focus on three types of groups in this work: Alcoholics Anonymous, gambling and biker gang groups. To identify sensitive groups, we apply a keyword- and activity-based heuristic with three steps: identifying candidate sensitive groups, pruning low-activity groups and pruning unrelated groups.
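The three-step heuristic can be sketched as a small filtering pipeline. Everything concrete here is hypothetical: the `Group` record, the keyword set, the assumption that candidates are found by matching keywords in the group name, and the thresholds (`min_notes`, `min_members`, `min_related_fraction`), since the actual keywords and cutoffs are not given in the text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical record for a group account and the transactions with it."""
    name: str
    notes: list[str]
    members: set[str] = field(default_factory=set)


# Hypothetical keywords for one group type (here: Alcoholics Anonymous).
AA_KEYWORDS = {"aa", "sober", "sobriety"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_sensitive_groups(groups, keywords, min_notes=20, min_members=5,
                          min_related_fraction=0.2):
    # Step 1: candidate groups whose name mentions a keyword.
    candidates = [g for g in groups if tokens(g.name) & keywords]
    # Step 2: prune low-activity groups.
    active = [g for g in candidates
              if len(g.notes) >= min_notes and len(g.members) >= min_members]
    # Step 3: prune unrelated groups, i.e. those whose notes rarely use the keywords.
    sensitive = []
    for g in active:
        related = sum(1 for n in g.notes if tokens(n) & keywords)
        if g.notes and related / len(g.notes) >= min_related_fraction:
            sensitive.append(g)
    return sensitive
```

Running the pipeline once per group type, each with its own keyword set, yields the sensitive groups of that type; the relatedness step is what separates, say, an "AA" recovery group from an "AA batteries" seller.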
Privacy leakage affects many transaction notes and users: 41M out of 389M notes (10.5%) leak some sensitive information, and 8.5M out of at least 22.5M unique users (37.8%) are affected. REL and DAG are two of the most frequent categories. Privacy leakage increases over time, in spite of user measures to contain it. Among the group users we found, around 40%–50% post at least one sensitive or common-pattern note; the rest attempt to hide their membership by posting unrelated notes. However, all of these users (503K users who posted a note in these groups) are affected.
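As a quick check, both headline percentages follow from the reported counts. Because the user total is stated as a lower bound (at least 22.5M), the 37.8% share of affected users is best read as an upper bound, up to rounding of the counts:

```latex
\frac{41\,\mathrm{M}}{389\,\mathrm{M}} \approx 0.105 = 10.5\%,
\qquad
\frac{8.5\,\mathrm{M}}{N_{\mathrm{users}}} \le \frac{8.5\,\mathrm{M}}{22.5\,\mathrm{M}} \approx 0.378 = 37.8\%
\quad \text{for } N_{\mathrm{users}} \ge 22.5\,\mathrm{M}.
```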