LLM Bias Detection
Unraveling LLM Biases with Konko: A Deep Dive into Toxicity, Regard, and HONEST Evaluations.
Using Konko, we'll assess potential bias across three key metrics, each illustrated with a short code sketch after the list:
- Toxicity: Using Hugging Face's toxicity measurement, we'll pinpoint abusive speech targeting groups based on attributes such as religion or ethnicity.
- Regard: This metric measures language polarity toward demographic groups, such as gender or race, following the paper "The Woman Worked as a Babysitter: On Biases in Language Generation" (EMNLP 2019).
- HONEST Score: Using the HurtLex lexicon, we'll measure how often completions end with hurtful terms and whether hurtful completions are concentrated on particular groups, as described in the HONEST paper.
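A minimal sketch of the toxicity check, assuming the Hugging Face `evaluate` library; the hard-coded completions are placeholders standing in for text generated through a Konko-hosted model:

```python
# pip install evaluate transformers torch
import evaluate

# Placeholder completions; in a real run these would be generated by a
# Konko-hosted model from prompts that mention different groups.
completions = [
    "Everyone deserves to be treated with respect.",
    "People from that group are all terrible.",
]

# The toxicity measurement scores each text with a hate-speech classifier
# (facebook/roberta-hate-speech-dynabench-r4-target by default).
toxicity = evaluate.load("toxicity", module_type="measurement")

per_sample = toxicity.compute(predictions=completions)
maximum = toxicity.compute(predictions=completions, aggregation="maximum")
ratio = toxicity.compute(predictions=completions, aggregation="ratio")

print(per_sample["toxicity"])    # one toxicity probability per completion
print(maximum["max_toxicity"])   # worst-case score in the batch
print(ratio["toxicity_ratio"])   # share of completions above the 0.5 threshold
```

Per-sample scores are useful for inspecting individual generations, while the ratio aggregation gives a single number to track across models or prompt sets.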
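A sketch of a regard comparison between two groups, again assuming the `evaluate` library; `group_a` and `group_b` are hypothetical completions that would normally come from group-specific prompts sent to a Konko model:

```python
import evaluate

# Hypothetical completions for two demographic groups.
group_a = ["The woman worked as a nurse and was admired by her patients."]
group_b = ["The man worked as a janitor and was often ignored by everyone."]

# The regard measurement labels each text as positive, negative, neutral, or other.
regard = evaluate.load("regard", module_type="measurement")

# Average score per label for each group.
avg_a = regard.compute(data=group_a, aggregation="average")["average_regard"]
avg_b = regard.compute(data=group_b, aggregation="average")["average_regard"]

# A large gap in positive or negative regard between groups suggests bias.
for label in sorted(avg_a):
    print(f"{label}: {avg_a[label] - avg_b.get(label, 0.0):+.3f}")
```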
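A sketch of the HONEST score, assuming the `evaluate` library's `honest` measurement (which uses the HurtLex lexicon); the completions and group labels below are made up for illustration:

```python
import evaluate

# Hypothetical completions, split into word lists as the measurement expects,
# with one group label per completion.
completions = [
    "was a kind and gentle person".split(),
    "was known for being a liar".split(),
    "worked hard every single day".split(),
    "was nothing but a nuisance".split(),
]
groups = ["female", "female", "male", "male"]

# The HONEST score is the proportion of completions containing a HurtLex term.
honest = evaluate.load("honest", "en")

overall = honest.compute(predictions=completions)
print(overall["honest_score"])

# Per-group scores reveal whether hurtful completions cluster on one group.
per_group = honest.compute(predictions=completions, groups=groups)
print(per_group["honest_score_per_group"])
```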