LLM Bias Detection
Unraveling LLM Biases with Konko: A Deep Dive into Toxicity, Regard, and HONEST Evaluations.
Using Konko, we'll assess potential bias across three key metrics, each illustrated with a short code sketch after the list:
- Toxicity: Using Hugging Face's toxicity measurement, we'll pinpoint abusive speech targeting groups based on attributes such as religion or ethnicity.
- Regard: This metric measures language polarity toward demographic groups, such as gender or race, following the paper "The Woman Worked as a Babysitter: On Biases in Language Generation" (EMNLP 2019).
- HONEST Score: Using the HurtLex lexicon, we'll measure how often completions end with hurtful terms and whether hurtful completions are concentrated on particular groups, as described in the HONEST paper.
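A minimal sketch of the toxicity check, assuming the Hugging Face `evaluate` library; the hard-coded completions are placeholders standing in for text generated through a Konko-hosted model:

```python
# pip install evaluate transformers torch
import evaluate

# Placeholder completions; in a real run these would be generated by a
# Konko-hosted model from prompts that mention different groups.
completions = [
    "Everyone deserves to be treated with respect.",
    "People from that group are all terrible.",
]

# The toxicity measurement scores each text with a hate-speech classifier
# (facebook/roberta-hate-speech-dynabench-r4-target by default).
toxicity = evaluate.load("toxicity", module_type="measurement")

per_sample = toxicity.compute(predictions=completions)
maximum = toxicity.compute(predictions=completions, aggregation="maximum")
ratio = toxicity.compute(predictions=completions, aggregation="ratio")

print(per_sample["toxicity"])    # one toxicity probability per completion
print(maximum["max_toxicity"])   # worst-case score in the batch
print(ratio["toxicity_ratio"])   # share of completions above the 0.5 threshold
```

Per-sample scores are useful for inspecting individual generations, while the ratio aggregation gives a single number to track across models or prompt sets.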
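A sketch of a regard comparison between two groups, again assuming the `evaluate` library; `group_a` and `group_b` are hypothetical completions that would normally come from group-specific prompts sent to a Konko model:

```python
import evaluate

# Hypothetical completions for two demographic groups.
group_a = ["The woman worked as a nurse and was admired by her patients."]
group_b = ["The man worked as a janitor and was often ignored by everyone."]

# The regard measurement labels each text as positive, negative, neutral, or other.
regard = evaluate.load("regard", module_type="measurement")

# Average score per label for each group.
avg_a = regard.compute(data=group_a, aggregation="average")["average_regard"]
avg_b = regard.compute(data=group_b, aggregation="average")["average_regard"]

# A large gap in positive or negative regard between groups suggests bias.
for label in sorted(avg_a):
    print(f"{label}: {avg_a[label] - avg_b.get(label, 0.0):+.3f}")
```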
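A sketch of the HONEST score, assuming the `evaluate` library's `honest` measurement (which uses the HurtLex lexicon); the completions and group labels below are made up for illustration:

```python
import evaluate

# Hypothetical completions, split into word lists as the measurement expects,
# with one group label per completion.
completions = [
    "was a kind and gentle person".split(),
    "was known for being a liar".split(),
    "worked hard every single day".split(),
    "was nothing but a nuisance".split(),
]
groups = ["female", "female", "male", "male"]

# The HONEST score is the proportion of completions containing a HurtLex term.
honest = evaluate.load("honest", "en")

overall = honest.compute(predictions=completions)
print(overall["honest_score"])

# Per-group scores reveal whether hurtful completions cluster on one group.
per_group = honest.compute(predictions=completions, groups=groups)
print(per_group["honest_score_per_group"])
```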