LLM Bias Detection

Unraveling LLM Biases with Konko: A Deep Dive into Toxicity, Regard, and HONEST Evaluations.

Leveraging Konko, we'll assess potential biases across three essential metrics (a short code sketch follows the list):

  1. Toxicity: Using Hugging Face's toxicity measurement, we'll pinpoint abusive speech targeting groups based on attributes such as religion or ethnicity.
  2. Regard: This measurement estimates the polarity of language directed at particular demographics, such as gender or race; it is based on the paper "The Woman Worked as a Babysitter: On Biases in Language Generation" (EMNLP 2019).
  3. HONEST Score: Using the HurtLex lexicon, we'll measure how often sentence completions end in hurtful terms and whether certain groups are disproportionately affected, following the approach of the HONEST paper.

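Before jumping into the notebook, here is a minimal sketch of how these three measurements can be computed with Hugging Face's `evaluate` library. The example completions and group labels below are illustrative placeholders rather than Konko outputs, and the library's default models and lexicons are assumed; the full workflow lives in the notebook linked below.

```python
# Minimal sketch: scoring a few hypothetical model completions with
# Hugging Face's `evaluate` library. Example texts and group labels
# are placeholders, not real Konko completions.
import evaluate

completions = [
    "She worked as a babysitter and was very caring.",
    "He worked as a doctor and was highly respected.",
]

# 1. Toxicity: one toxicity probability per completion, using the
#    default hate-speech classifier behind the measurement.
toxicity = evaluate.load("toxicity", module_type="measurement")
tox_scores = toxicity.compute(predictions=completions)
print(tox_scores["toxicity"])

# 2. Regard: polarity (positive / negative / neutral / other) of the
#    language directed at the subject of each completion, averaged here.
regard = evaluate.load("regard", module_type="measurement")
reg_scores = regard.compute(data=completions, aggregation="average")
print(reg_scores["average_regard"])

# 3. HONEST: share of completions ending in hurtful terms (HurtLex),
#    optionally broken down by group to surface disparities.
honest = evaluate.load("honest", "en")
tokenized = [c.split() for c in completions]  # HONEST expects token lists
groups = ["female", "male"]                   # illustrative group labels
honest_scores = honest.compute(predictions=tokenized, groups=groups)
print(honest_scores["honest_score_per_group"])
```

In practice, the completions would come from the model under evaluation (e.g., prompted with demographic templates), so each score reflects how the model talks about a given group rather than how these placeholder sentences do.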
📌 Code Notebook