
Challenges and frontiers in abusive content detection

Last updated Dec 12, 2022

Note

The paper summarized here introduces how the abusive content detection task has developed and what difficulties it currently faces. Since it is a task similar to hate speech detection (though not entirely the same), it shares many of the same difficulties: the difficulty of annotation, the precision of concept definitions, and so on. Personally, the most memorable part is the section on how performance changes when a system is tested on a domain different from its training data.


Developing robust systems to detect abuse is a crucial part of online content moderation and plays a fundamental role in creating an open, safe and accessible Internet.

Advances in machine learning and NLP have led to marked improvements in abusive content detection systems’ performance (Fortuna & Nunes, 2018; Schmidt & Wiegand, 2017). For instance, in 2018 Pitsilis et al. trained a classification system on Waseem and Hovy’s 16,000 tweet dataset and achieved an F-Score of 0.932, compared against Waseem and Hovy’s original 0.739; a 20-point increase (Pitsilis, Ramampiaro, & Langseth, 2018; Waseem & Hovy, 2016).
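For concreteness, here is a minimal sketch of what a binary abuse classifier and its F-score evaluation look like. The texts and labels are placeholders rather than the Waseem and Hovy data, and the TF-IDF plus logistic regression baseline is deliberately far simpler than the neural systems cited above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for a labelled tweet corpus (1 = abusive, 0 = not).
texts = ["you are pathetic", "have a nice day", "nobody likes you",
         "see you tomorrow", "get lost idiot", "thanks for the help",
         "you people disgust me", "what a great talk"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Character n-gram TF-IDF + logistic regression: a simple, common baseline
# for noisy social media text.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Macro F1 averages the per-class F-scores, so the (often rarer) abusive
# class counts as much as the majority class.
print(f1_score(y_test, model.predict(X_test), average="macro"))
```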

Researchers have also addressed numerous tasks beyond binary abusive content classification, including identifying the target of abuse and its strength as well as automatically moderating content (Burnap & Williams, 2016; Davidson, Warmsley, Macy, & Weber, 2017; Santos, Melnyk, & Padhi, 2018).

(…) what type of abusive content it is identified as. This is a social and theoretical task: there is no objectively ‘correct’ definition or single set of pre-established criteria which can be applied.

Detecting abusive content generically is an important aspiration for the field. However, it is very difficult because abusive content is so varied. Research which purports to address the generic task of detecting abuse is typically actually addressing something much more specific. This can often be discerned from the datasets, which may contain systematic biases towards certain types and targets of abuse. For instance, the dataset by Davidson et al. is used widely for tasks described generically as abusive content detection yet it is highly skewed towards racism and sexism (Davidson et al., 2017).
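One way to surface this kind of skew is simply to audit the label and target distributions of a dataset before treating it as generic. The sketch below uses a placeholder DataFrame; the column names are assumptions for illustration, not the actual layout of the Davidson et al. release.

```python
import pandas as pd

# Placeholder frame standing in for a loaded abuse dataset; the columns
# 'class' and 'target' are assumed for illustration only.
df = pd.DataFrame({
    "class":  ["racism", "racism", "sexism", "racism", "none", "racism"],
    "target": ["group", "group", "individual", "group", "none", "group"],
    "tweet":  ["..."] * 6,
})

# Relative frequency of each label: a heavily skewed distribution is a first
# hint that a 'generic' abuse dataset is really about a narrower phenomenon.
print(df["class"].value_counts(normalize=True))

# The same check on target annotations reveals skew towards particular groups.
print(df["target"].value_counts(normalize=True))
```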

Waseem et al. suggest that one of the main differences between subtasks is whether content is ‘directed towards a specific entity or is directed towards a generalized group’ (Waseem et al., 2017).

A key distinction is whether abuse is explicit or implicit (Waseem et al., 2017; Zampieri et al., 2019).

Some of the main problems are (1) researchers use terms which are not well-defined, (2) different concepts and terms are used across the field for similar work, and (3) the terms which are used are theoretically problematic.

Annotation. Annotation is a notoriously difficult task, reflected in the low levels of inter-annotator agreement reported by most publications, particularly on more complex multi-class tasks (Sanguinetti, Poletto, Bosco, Patti, & Stranisci, 2018). Noticeably, van Aken et al. suggest that Davidson et al.’s widely used hate and offensive language dataset has up to 10% of its data mislabeled (van Aken et al., 2018).
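As a concrete reference point, inter-annotator agreement is usually reported with a chance-corrected statistic such as Cohen's kappa. A minimal sketch, using made-up labels for two annotators:

```python
from sklearn.metrics import cohen_kappa_score

# Example multi-class labels: 0 = none, 1 = abusive, 2 = hateful (made up).
annotator_a = [0, 1, 2, 1, 0, 2, 1, 0]
annotator_b = [0, 1, 1, 1, 0, 2, 2, 0]

# Kappa corrects raw percent agreement for agreement expected by chance;
# it tends to drop as the number of classes grows, which is one reason
# multi-class abuse taxonomies report low agreement.
print(cohen_kappa_score(annotator_a, annotator_b))
```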

Few publications provide details of their annotation process or annotation guidelines. Providing such information is the norm in social scientific research and is viewed as an integral part of verifying others’ findings and robustness (Bucy & Holbert, 2013). In line with the recommendations of Sabou et al., we advocate that annotation guidelines and processes are shared where possible (Sabou, Bontcheva, Derczynski, & Scharl, 2014) and that the field also works to develop best practices.

Ensuring that abusive content detection systems can be applied across different domains is one of the most difficult but also important frontiers in existing research. Thus far, efforts to address this have been unsuccessful. Burnap and Williams train systems on one type of hate speech (e.g. racism) and apply them to another (e.g. sexism), and find that performance drops considerably (Burnap & Williams, 2016).

Karan and Šnajder use a simple methodology to show the huge differences in performance when classifiers are applied to different datasets without domain-specific tuning (Karan & Šnajder, 2018). Noticeably, in the EVALITA hate speech detection shared task, participants were asked to (1) train and test a system on Twitter data, (2) train and test on Facebook data, and (3) train on Twitter and test on Facebook (and vice versa). Even the best performing teams reported that their systems scored around 10 to 15 F1 points lower on the cross-domain task.
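The cross-domain setup itself is straightforward to reproduce in miniature: train on one corpus, evaluate on another, and compare against an in-domain split. The sketch below uses tiny placeholder lists standing in for two separately collected corpora; no specific dataset or format is assumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Placeholder corpora (1 = abusive, 0 = not) standing in for Twitter/Facebook data.
twitter_texts = ["ugh you idiot", "lovely weather", "go away loser",
                 "great match today", "you people are trash", "nice photo"]
twitter_labels = [1, 0, 1, 0, 1, 0]
facebook_texts = ["what a moron", "happy birthday!", "nobody wants you here",
                  "congrats on the new job", "these people ruin everything",
                  "see you at dinner"]
facebook_labels = [1, 0, 1, 0, 1, 0]


def macro_f1(train_x, train_y, test_x, test_y):
    # Same simple baseline as in the earlier sketch.
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_x, train_y)
    return f1_score(test_y, model.predict(test_x), average="macro")


# In-domain: train and test within Twitter (last two items held out).
print("in-domain:   ", macro_f1(twitter_texts[:-2], twitter_labels[:-2],
                                twitter_texts[-2:], twitter_labels[-2:]))
# Cross-domain: train on Twitter, test on Facebook (typically much lower).
print("cross-domain:", macro_f1(twitter_texts, twitter_labels,
                                facebook_texts, facebook_labels))
```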


Reference

Bertie Vidgen, Alex Harris, Dong Nguyen, Rebekah Tromble, Scott Hale, and Helen Margetts. 2019. Challenges and frontiers in abusive content detection. In Proceedings of the Third Workshop on Abusive Language Online, pages 80–93, Florence, Italy. Association for Computational Linguistics.