

Abstract
Personality and demographics are important variables in social sciences and computational sociolinguistics. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first dataset of Reddit comments of 10k users partially labeled with three personality models and demographics (age, gender, and location), including 1.6k users labeled with the well-established Big 5 personality model. We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.

Anthology ID: 2021.socialnlp-1.12
Volume: Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media
Month: June
Year: 2021
Address: Online
Venues: NAACL | SocialNLP
Publisher: Association for Computational Linguistics
Pages: 138–152
DOI: 10.18653/v1/2021.socialnlp-1.12
Bibkey: gjurkovic-etal-2021-pandora
