Methodology

pari is a machine learning model developed to detect hate speech. It was trained on tweets from the X platform together with the Hrant Dink Foundation’s extensive news archive, which compiles 10 years of print media coverage. To ensure a high-quality dataset, an interdisciplinary team devised detailed annotation guidelines aimed at eliminating ambiguity and promoting inclusiveness. Based on these guidelines, hate speech was classified into four categories, and severity was rated on a scale from 0 to 10 (a representational sketch follows the category definitions below).

Exaggeration/Generalization/Attribution/Distortion: Discourses that draw broader conclusions and inferences from an event, situation, or action, manipulate real data by distorting it, or attribute isolated incidents to an identity as a whole.

Swearing/Insult/Defamation/Dehumanization: Discourses that include direct insults, slurs, or demeaning remarks towards a community, or describe its members with actions or attributes typically associated with non-human entities.

Threat of Enmity/War/Attack/Murder/Harm: Discourses that contain hostile statements, invoke war-like language, or express a desire to harm the specific identity in question.

Symbolization: Discourses in which an element of the identity itself is used as a vehicle for insult, hatred, or humiliation, so that the identity is reduced to a symbol.

In addition to the hate speech categories, tweets were also annotated for discriminatory discourse.

Discriminatory Discourse: Discourses in which a community is portrayed negatively as different from the dominant group in areas such as access to rights and freedoms or inclusion in society.
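
To make the labelling scheme concrete, the following is a minimal sketch of how a single annotated tweet could be represented in code, assuming the four hate speech categories, the 0 to 10 severity scale, and the separate discriminatory discourse flag described above. The class and field names are illustrative assumptions, not pari's actual data schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class HateSpeechCategory(Enum):
    # The four categories defined in the annotation guidelines.
    EXAGGERATION = "exaggeration/generalization/attribution/distortion"
    INSULT = "swearing/insult/defamation/dehumanization"
    THREAT = "threat of enmity/war/attack/murder/harm"
    SYMBOLIZATION = "symbolization"

@dataclass
class TweetAnnotation:
    """One annotator's judgment of a single tweet (illustrative schema)."""
    tweet_id: str
    annotator_id: str
    category: Optional[HateSpeechCategory]  # None when no hate speech is present
    severity: int                           # 0-10 scale from the guidelines
    discriminatory_discourse: bool          # annotated separately from the categories

    def __post_init__(self) -> None:
        if not 0 <= self.severity <= 10:
            raise ValueError("severity must be on the 0-10 scale")
```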

As part of the data annotation process, annotators from different fields, most of them university students, received extensive training before labelling the tweets. To ensure accuracy, each tweet was annotated by three different annotators.
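
The source does not specify how the three annotations per tweet are reconciled; one common approach is majority voting, sketched below. The `majority_label` function and the 2-of-3 threshold are assumptions for illustration, not pari's documented procedure.

```python
from collections import Counter
from typing import Optional, Sequence

def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label at least two of the three annotators agree on.

    Majority voting is an assumed reconciliation strategy; the guide may
    prescribe something different, such as expert adjudication.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None

# Example: two annotators agree, so the agreed label wins.
assert majority_label(["symbolization", "symbolization", "no hate speech"]) == "symbolization"
# A three-way disagreement yields None and could be escalated to an expert.
assert majority_label(["symbolization", "insult", "threat"]) is None
```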

The dynamic and evolving nature of hate speech, cultural references, lack of context, and linguistic ambiguity (e.g. irony) are among the ongoing challenges in automatic hate speech classification. Adapting the model to these changes while maintaining accuracy requires regular data updates and retraining; it is therefore our aim to improve pari by regularly feeding it more data. pari maintains 85% accuracy in Turkish and 80% accuracy in Arabic. In Turkish, pari produced 17% false positive (FP) and 15% false negative (FN) results on 2,189 test samples; in Arabic, it yielded 6% false positive (FP) and 27% false negative (FN) results on 499 test samples. (For further information on accuracy rates, please see “Utilizing AI Against Hate Speech: A guide to annotation, classification and detection”.)
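
Note that FP and FN rates are conventionally computed over different subsets of the test set (the true negatives and the true positives, respectively), so they need not sum to the complement of accuracy. The sketch below shows these standard definitions; that pari's reported rates use these conventional denominators is our assumption, as the source does not spell them out.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification rates from confusion-matrix counts.

    Assumes conventional denominators: the FP rate is taken over all true
    negatives and the FN rate over all true positives; the cited guide may
    define the reported rates differently.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }
```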