Ethical Considerations in QAnon Authorship Attribution Research

7 Dec 2024

Authors:

(1) Florian Cafiero (ORCID 0000-0002-1951-6942), Sciences Po, Medialab;

(2) Jean-Baptiste Camps (ORCID 0000-0003-0385-7037), École nationale des chartes, Université Paris Sciences & Lettres.

Abstract and Introduction

Why work on QAnon? Specificities and social impact

Who is Q? The theories put to test

Authorship attribution

Results

Discussion

Corpus constitution

Quotes of authors outside of the corpus have been

Definition of two subcorpora: dealing with generic difference and an imbalanced dataset

The genre of “Q drops”: a methodological challenge

Detecting style changes: rolling stylometry

Ethical statement, Acknowledgements, and References

Ethical statement

Even though it was written independently, this study tried to abide as closely as possible by the principles of the “Pratiquer une recherche intègre et responsable” (“Practicing research with integrity and responsibility”) guide issued by the ethics board of the Centre National de la Recherche Scientifique (Comité d’éthique du CNRS, 2017). This article does not reveal the identity of any individual who was not already widely known. All candidate authors were either public figures or individuals whose identity had already been reported by major media outlets (NBC, HBO, etc.). The study only uses information that the candidate authors knowingly made public and that was accessible through standard internet searches at the time of data collection.

For ethical reasons, and to respect the privacy policies of the platforms studied, we do not publicly release any of the content analyzed here. To uphold data-sharing and open-data principles, we nevertheless detail our data collection method, which should suffice to ensure reproducibility in most cases. Some content may no longer be available when someone attempts to reproduce our computations. In that case, the missing materials can be provided to research teams on request.

Table 1: Results of the leave-one-out cross-evaluation for the large corpus (left) and the controlled corpus (right)

Table 2: Confusion matrix for the leave-one-out evaluation on the larger corpus

We chose to refer to the candidates we study only by their first name and initial, so as not to affect internet searches for their names.

Finally, we stress that this paper in no way asserts that people other than those studied here could not have written the Q drops.

Acknowledgements

The authors would like to thank David D. Kirkpatrick from the New York Times and Frederick Brennan for their help during this investigation. Errors remain our own. Funding: this research received no funding. Data and materials availability: code is available on Zenodo (doi: 10.5281/zenodo.6164620); data are available on request.

References

Max Aliapoulios, Antonis Papasavva, Cameron Ballard, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, and Jeremy Blackburn. The gospel according to Q: Understanding the QAnon conspiracy from the perspective of canonical information. arXiv preprint, arXiv:2101.08750, 2021.

Shlomo Argamon, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. Automatically Profiling the Author of an Anonymous Text. Communications of the ACM, 52(2):119–123, February 2009. ISSN 0001-0782. doi: 10.1145/1461928.1461959.

Stephane J Baele, Lewys Brace, and Travis G Coan. Variations on a theme? Comparing 4chan, 8kun, and other chans’ far-right “/pol” boards. Perspectives on Terrorism, 15(1):65–80, 2021.

Jean-Paul Benzécri. L’Analyse des Données. Volume II. L’Analyse des Correspondances. Paris, 1973.

Johanna Björklund and Niklas Zechner. Syntactic methods for topic-independent authorship attribution. Natural Language Engineering, 23(5):789–806, 2017.

Florian Cafiero and Jean-Baptiste Camps. Why Molière most likely did write his plays. Science Advances, 5(11):eaax5489, 2019.

Florian Cafiero and Jean-Baptiste Camps. ‘Psyché’ as a Rosetta stone? Assessing collaborative authorship in the French 17th century theatre. In Proceedings of the Conference on Computational Humanities Research 2021, Amsterdam, the Netherlands, November 17–19, 2021, pages 377–391, 2021. URL https://ceur-ws.org/Vol-2989/long_paper51.pdf.

Florian Cafiero and Jean-Baptiste Camps. Affaires de style. Le Robert, 2022.

Jean-Baptiste Camps, Thibault Clérice, and Florian Cafiero. Supervised Stylometry: SuperStyl. École des chartes, Paris, 2021. URL https://github.com/SupervisedStylometry/SuperStyl/.

Carole E Chaski. Who’s at the keyboard? Authorship attribution in digital evidence investigations. International Journal of Digital Evidence, 4(1):1–13, 2005.

Comité d’éthique du CNRS. Pratiquer une recherche intègre et responsable: Guide. CNRS, Paris, 2017. URL https://comite-ethique.cnrs.fr/wp-content/uploads/2019/10/GUIDE-2017-FR.pdf.

Malcolm Coulthard. Author identification, idiolect, and linguistic uniqueness. Applied Linguistics, 25(4):431–447, 2004.

Olivier De Vel, Alison Anderson, Malcolm Corney, and George Mohay. Multi-topic e-mail authorship attribution forensics. In Proceedings ACM Conference on Computer Security-Workshop on Data Mining for Security Applications, 2001.

Joachim Diederich, Jörg Kindermann, Edda Leopold, and Gerhard Paass. Authorship attribution with support vector machines. Applied Intelligence, 19(1):109–123, 2003.

Maciej Eder. Does size matter? Authorship attribution, small samples, big problem. Literary and Linguistic Computing, 30(2):167–182, June 2015. ISSN 0268-1145. doi: 10.1093/llc/fqt066. URL https://academic.oup.com/dsh/article/30/2/167/390738.

Maciej Eder. Rolling stylometry. Digital Scholarship in the Humanities, 31(3):457–469, 2016.

Maciej Eder. Short Samples in Authorship Attribution: A New Approach. In DH, Montréal, 2017. ADHO. URL https://dh2017.adho.org/abstracts/341/341.pdf.

Maciej Eder. Elena Ferrante: A virtual author. In Drawing Elena Ferrante’s Profile, pages 31–46. Padova University Press, Padova, 2018.

Gavin Fox (pseudo). Where in the world is Q? Clues from image metadata, 2021. URL https://catdogmix.com/2021/05/11/where-in-the-world-is-q-clues-from-image-metadata/.

Amanda Garry, Samantha Walther, Rukaya Rukaya, and Ayan Mohammed. QAnon conspiracy theory: Examining its evolution and mechanisms of radicalization. Journal for Deradicalization, 26:152–216, 2021.

Benyamin Ghojogh and Mark Crowley. The theory behind overfitting, cross validation, regularization, bagging, and boosting: tutorial. arXiv preprint arXiv:1905.12787, 2019.

David Gilbert. How QAnon is tearing families apart. VICE, 26, 2021. URL https://www.vice.com/en/article/dy8ayx/how-qanon-is-tearing-families-apart.

Alexander AG Gladwin, Matthew J Lavin, and Daniel M Look. Stylometry and collaborative authorship: Eddy, Lovecraft, and ‘The Loved Dead’. Digital Scholarship in the Humanities, 32(1):123–140, 2017.

Mohamad Hoseini, Philipe Melo, Fabricio Benevenuto, Anja Feldmann, and Savvas Zannettou. On the globalization of the QAnon conspiracy theory through Telegram. arXiv preprint, arXiv:2105.13020, 2021. URL http://arxiv.org/abs/2105.13020.

Cullen Hoback. Q: Into the Storm, 2021. HBO documentary series, 6 episodes.

Mingzhe Jin and Minghu Jiang. Text clustering on authorship attribution based on the features of punctuations usage. In 2012 IEEE 11th International Conference on Signal Processing, volume 3, pages 2175–2178. IEEE, 2012.

Patrick Juola. Authorship attribution, volume 3. Now Publishers Inc, 2008.

Patrick Juola. Stylometry and immigration: A case study. Journal of Law and Policy, 21:287, 2012.

Patrick Juola. The Rowling case: a proposed standard analytic protocol for authorship questions. Digital Scholarship in the Humanities, 30(suppl 1):i100–i113, 2015.

Jeffrey Kaplan. A conspiracy of dunces: Good Americans vs. a cabal of satanic pedophiles? Terrorism and Political Violence, 33(5):917–921, 2021. doi: 10.1080/09546553.2021.1932342.

Mike Kestemont. Function words in authorship attribution. From black magic to theory? In Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL), pages 59–66, Stroudsburg PA, 2014. Association for Computational Linguistics.

Mike Kestemont, Sara Moens, and Jeroen Deploige. Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux. Digital Scholarship in the Humanities, 30(2):199–224, 2015.

Mike Kestemont, Justin Stover, Moshe Koppel, Folgert Karsdorp, and Walter Daelemans. Authenticating the writings of Julius Caesar. Expert Systems with Applications, 63:86–96, 2016.

Mike Kestemont, Efstathios Stamatatos, Enrique Manjavacas, Walter Daelemans, Martin Potthast, and Benno Stein. Overview of the cross-domain authorship attribution task at PAN 2019. In CLEF (Working Notes), 2019.

Adrienne LaFrance. The Prophecies of Q. The Atlantic, May 2020. URL https://www.theatlantic.com/magazine/archive/2020/06/qanon-nothing-can-stop-what-is-coming/610567/. Section: Ideas.

Sébastien Lê, Julie Josse, and François Husson. FactoMineR: A package for multivariate analysis. Journal of Statistical Software, 25(1):1–18, 2008. doi: 10.18637/jss.v025.i01.

Wincenty Lutoslawski. Principes de stylométrie appliqués à la chronologie des oeuvres de Platon. Revue des études grecques, 11(41):61–81, 1898. doi: 10.3406/reg.1898.5847. URL https://www.persee.fr/doc/reg_0035-2039_1898_num_11_41_5847.

Rangsipan Marukatat, Robroo Somkiadcharoen, Ratthanan Nalintasnai, and Tappasarn Aramboonpong. Authorship attribution analysis of Thai online messages. In 2014 International Conference on Information Science & Applications (ICISA), pages 1–4. IEEE, 2014.

George K Mikros. Authorship attribution and gender identification in Greek blogs. Methods and Applications of Quantitative Linguistics, 21:21–32, 2012.

George K Mikros. Blended authorship attribution: Unmasking Elena Ferrante combining different author profiling methods. UPPADO, page 85, 2017.

Frederick Mosteller and David L Wallace. Inference in an authorship problem: A comparative study of discrimination methods applied to the authorship of the disputed federalist papers. Journal of the American Statistical Association, 58(302):275–309, 1963.

Lincoln Mullen. textreuse: Detect Text Reuse and Document Similarity, 2020. URL https://CRAN.R-project.org/package=textreuse. R package version 0.1.5.

Andrew Y Ng et al. Preventing “overfitting” of cross-validation data. In ICML, volume 97, pages 245–253. Citeseer, 1997.

Orphanalytics. Style analysis by machine learning reveals that two authors likely shared the writing of QAnon’s messages at two different periods in time. Technical report, Vevey, December 2020. URL https://www.orphanalytics.com/en/news/whitepaper202012.

Siham Ouamour and Halim Sayoud. Authorship attribution of ancient texts written by ten Arabic travelers using a SMO-SVM classifier. In 2012 International Conference on Communications and Information Technology (ICCIT), pages 44–47. IEEE, 2012.

Antonis Papasavva, Jeremy Blackburn, Gianluca Stringhini, Savvas Zannettou, and Emiliano De Cristofaro. ‘Is it a Qoincidence?’: An exploratory study of QAnon on Voat. In Proceedings of the Web Conference 2021, pages 460–471, New York, 2021. Association for Computing Machinery. doi: 10.1145/3442381.3450036.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

James W. Pennebaker. The Secret Life of Pronouns: What Our Words Say About Us. Bloomsbury Publishing, New York, reprint edition, January 2013. ISBN 978-1-60819-496-4.

Petr Plecháč. Relative contributions of Shakespeare and Fletcher in Henry VIII: An analysis based on most frequent words and most frequent rhythmic patterns. Digital Scholarship in the Humanities, fqaa032, 2020. doi: 10.1093/llc/fqaa032. URL https://arxiv.org/pdf/1911.05652.pdf.

Jan Rybicki. Partners in life, partners in crime. In Drawing Elena Ferrante’s Profile, pages 111–122. Padova University Press, 2018.

Jan Rybicki, David Hoover, and Mike Kestemont. Collaborative authorship: Conrad, Ford and rolling delta. Literary and Linguistic Computing, 29(3):422–431, 2014. Publisher: Oxford University Press.

Upendra Sapkota, Steven Bethard, Manuel Montes, and Thamar Solorio. Not all character n-grams are created equal: A study in authorship attribution. In Proceedings of the 2015 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies, pages 93–102, Stroudsburg PA, 2015. Association for Computational Linguistics.

Santiago Segarra, Mark Eisen, and Alejandro Ribeiro. Authorship attribution through function word adjacency networks. IEEE Transactions on Signal Processing, 63(20):5464–5478, 2015.

Efstathios Stamatatos, Francisco Rangel, Michael Tschuggnall, Benno Stein, Mike Kestemont, Paolo Rosso, and Martin Potthast. Overview of PAN 2018. In International Conference of the Cross-Language Evaluation Forum for European Languages, pages 267–285. Springer, 2018.

Karina van Dalen-Oskam and Joris Van Zundert. Delta for middle Dutch author and copyist distinction in Walewein. Literary and Linguistic Computing, 22(3):345–362, 2007.

Jana Winter. FBI document warns conspiracy theories are a new domestic terrorism threat. Yahoo News, 1, 2019. URL https://news.yahoo.com/fbi-documents-conspiracy-theories-terrorism-160000507.html.

Brandy Zadrozny and Ben Collins. Who is behind the QAnon conspiracy? We’ve traced it to three people. NBC News, August 2018. URL https://www.nbcnews.com/tech/tech-news/how-three-conspiracy-theorists-took-q-sparked-qanon-n900531.

This paper is available on arXiv under the CC BY 4.0 DEED license.