Studying how people create passwords (SemTrAC)
Members
- Jelena Mirkovic
- Chris Kanich
- Ameya Hanamsagar
- Simon Woo
Overview
The study analyzes password re-use, password patterns, password complexity, and how passwords resemble and vary
    from one another, both across users and across the accounts of a single user, in relation to website
    category and importance. The target population is the general population of users of any website.
We plan to analyze how people approach creating the passwords they use in real life, how they re-use those
    passwords, and how password complexity and memorability relate to the importance of a website (which may be
    subjective).
Related Work and Problem Statement
A large-scale study conducted by Microsoft researchers (Florencio and Herley, 2007) analyzed password re-use across
    websites and the number of online accounts users maintain. The study was done in 2006 and found that the average
    user has 25 online accounts. That number has likely increased since then, and our study will contribute a more
    current estimate.
A recent study (Veras et al., 2014) analyzed semantic patterns of passwords drawn from leaked password sets. It
    found many similarities among users' passwords, whereas we want to study similarities among the passwords that
    a single user keeps across many accounts.
Another study, conducted at Carnegie Mellon University (Ur et al., 2015), analyzed participants' opinions and
    approaches when creating new passwords on fictitious websites of different categories (banking website, social
    networking website, email provider, forum, blog, etc.) in a controlled environment. The study asked participants
    to create passwords for three fictitious websites and to narrate their choices. Because the sites were
    fictitious, users' motivations when creating these passwords were likely not the same as when they create real
    passwords. First, a user's perception of risk is smaller for a fictitious bank site than for a real one. Second,
    a user has no need to remember a password for a fictitious site, while that need may be large for a real site.
Much prior research has also found that passwords are often insecure (they are easily cracked) and that many people
    forget their passwords. There is a trade-off between memorability and security of passwords (more complex
    passwords are more secure but less memorable). We believe that the best way to study how people create passwords,
    and which factors influence this security/memorability trade-off, is to study the real passwords users have at
    multiple sites. We want to study how people reason about the security and memorability of passwords, and how
    their perception of risk and of the importance of a given site interact with this reasoning.
Study Procedure
- In the study, you will visit a location on the USC campus and use our machine. All your input (except for the information we store in our database) will be deleted at the end of the study.
- For the study, you will be asked to log in to our secure portal using Gmail's OAuth authentication. Once you agree, an automated system accesses your mailbox, analyzes each email, and compiles a list of websites where you may have an online account. It also extracts the number of password reset attempts for each website.
- Once the list is compiled, it is shown on the portal. You may remove from the list any websites that you want to hide from our researchers.
- You will select 12 websites from the list and attempt to log in to them. Our software will record each login attempt you make (successful or unsuccessful), transform the username and password on our machine, and store the transformed versions in our database. The transformed username and password preserve the semantic structure of the originals (e.g., if your original password were "Sam1201", we would derive its noun+number grammar and then generate a random password with the same structure, e.g., "Jake3322"). This enables us to study password reuse. No one, including our research staff, can retrieve your original username or password from the transformed versions.
- We will also ask you some questions about your general password usage before the user study, and some specific questions about your experience and credential usage after it.
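The structure-preserving transformation described in the procedure can be illustrated with a minimal Python sketch. This is not the study's actual software: the tiny `NAMES` list, the tag names (`name`, `alpha`, `digit`, `symbol`), and all function names are hypothetical, and a real system would use a much larger dictionary and a finer-grained grammar. The sketch splits a password into runs of letters, digits, and other characters, records the type of each run, and then replaces each run with a random one of the same type.

```python
import re
import secrets
import string

# Hypothetical stand-in for a real dictionary of first names.
NAMES = ["anna", "bob", "jake", "john", "maria", "sam"]

# Split a password into maximal runs of letters, digits, or other characters.
SEGMENT_RE = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")


def tag(segment):
    """Return a structural tag such as 'name', 'alpha3', 'digit4' or 'symbol1'."""
    if segment[0] in string.ascii_letters:
        if segment.lower() in NAMES:
            return "name"
        return f"alpha{len(segment)}"
    if segment[0] in string.digits:
        return f"digit{len(segment)}"
    return f"symbol{len(segment)}"


def grammar(password):
    """Return the list of structural tags that describes a password."""
    return [tag(s) for s in SEGMENT_RE.findall(password)]


def match_case(template, word):
    """Copy a simple capitalization pattern (UPPER, Capitalized, lower)."""
    if template.isupper():
        return word.upper()
    if template[0].isupper():
        return word.capitalize()
    return word.lower()


def random_run(alphabet, length):
    """Draw `length` characters from `alphabet` using a secure random source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def transform_segment(segment):
    """Replace one segment with a random segment that has the same tag."""
    kind = tag(segment)
    if kind == "name":
        others = [n for n in NAMES if n != segment.lower()]
        return match_case(segment, secrets.choice(others))
    if kind.startswith("alpha"):
        while True:
            candidate = random_run(string.ascii_lowercase, len(segment))
            if candidate not in NAMES:  # keep the tag 'alpha', not 'name'
                return match_case(segment, candidate)
    if kind.startswith("digit"):
        return random_run(string.digits, len(segment))
    return random_run(string.punctuation, len(segment))


def transform(password):
    """Generate a random password with the same grammar as the input."""
    return "".join(transform_segment(s) for s in SEGMENT_RE.findall(password))
```

With this sketch, `grammar("Sam1201")` yields `["name", "digit4"]`, and `transform("Sam1201")` returns a random string with the same grammar, for instance a value such as "Jake3322". Only the grammar and the random replacement would need to be stored, not the original password.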
Confidentiality
The University of Southern California's Human Subjects Protection Program (HSPP) reviews and monitors research studies to protect the rights and welfare of research subjects. We will protect your privacy in the following ways:
- We will not ask you for any identifying information, such as name, email, etc. The system will assign you a random identifier for the study.
- Your credentials will be semantically transformed and stored on a secure server along with their grammar, so none of your actual credentials are stored, visible, or accessible to any person. For example, if your password is "John@4r" (without quotes), it may be transformed into "Bob#1a". Here "John" is transformed (through a one-way mapping) into another name, "Bob".
- The system neither stores nor modifies any email messages. No private information is stored on our servers after your emails are accessed.
- The system scans your emails with an automated script, without any human intervention.
- The system will store only the semantically transformed credentials (along with any capitalization or mangling patterns) and the domain names of the corresponding websites. We will also store the number of password resets and the time when you first created an account on each site.
- All data will be associated with your random identifier in our database.
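To make concrete what "only domain names and reset counts" could look like, here is a minimal Python sketch of an email scan that keeps nothing else. It is an illustration under stated assumptions, not the study's script: it works on `(From header, subject)` pairs that have already been fetched, the two keyword patterns are hypothetical, and the account creation time mentioned above is omitted for brevity.

```python
import re
from collections import Counter
from email.utils import parseaddr

# Hypothetical keyword patterns; a real scanner would need broader rules.
RESET_RE = re.compile(r"password\s+reset|reset\s+your\s+password", re.IGNORECASE)
SIGNUP_RE = re.compile(r"welcome|verify your|confirm your", re.IGNORECASE)


def sender_domain(from_header):
    """Return the lower-cased domain of the sender, or '' if none is found."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def summarize(headers):
    """Map each candidate website domain to its number of password resets.

    `headers` is an iterable of (from_header, subject) pairs. Message bodies
    are never read, and nothing but domains and counts is returned.
    """
    sites = set()
    resets = Counter()
    for from_header, subject in headers:
        domain = sender_domain(from_header)
        if not domain:
            continue
        if RESET_RE.search(subject):
            sites.add(domain)
            resets[domain] += 1
        elif SIGNUP_RE.search(subject):
            sites.add(domain)
    return {domain: resets[domain] for domain in sorted(sites)}
```

Given a welcome message and two password reset messages from addresses at `shop.example`, plus an unrelated personal email, `summarize` returns a single entry for `shop.example` with a reset count of 2; the personal email contributes nothing to the result.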
References
- Florencio, Dinei, and Cormac Herley. "A large-scale study of web password habits." Proceedings of the 16th International Conference on World Wide Web. ACM, 2007.
- Veras, Rafael, Christopher Collins, and Julie Thorpe. "On Semantic Patterns of Passwords and their Security Impact." NDSS. 2014.
- Ur, Blase, et al. ""I Added '!' at the End to Make It Secure": Observing Password Creation in the Lab." Eleventh Symposium On Usable Privacy and Security (SOUPS 2015). 2015.
- Egelman, Serge, and Eyal Peer. "Scaling the security wall: Developing a security behavior intentions scale (SeBIS)." Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2015.
- Creese, Sadie, et al. "Relationships between password choices, perceptions of risk and security expertise." International Conference on Human Aspects of Information Security, Privacy, and Trust. Springer Berlin Heidelberg, 2013.