OneForma by Centific
Remote

Super - Linguistic and LLM Evaluator and Author

Evaluation
Judging
LLM

Description:  

This project consists of two different approaches that we will call “workflows”:  

 

Workflow #1: Manual Side-by-Side (SxS) Human Evaluation 

In this task, you will see a user prompt and two AI-generated responses (from two different AI models). You will assess each response on several dimensions: Safety/Harmlessness, Writing Style, Verbosity, Instruction Following, Truthfulness, and Overall Quality. You will then select which response you think is better and explain why. Finally, you will rewrite your chosen response to improve it.  

 

Workflow #2: Quality Evaluation   

In this task, you will be given an original prompt and two translated versions of it, each produced by a different LLM. You will read all three prompts (the original and both translations) and then rate each translation on four aspects:   

  • Verbatim Accuracy  
  • Formatting Preservation  
  • Semantic Equivalence  
  • Extraneous Information   

 

The work contains two types of tasks:   

  1. After rating, compare both translations and add a brief comment justifying your ratings.  
  2. After rating, compare both translations and rewrite the translated prompt in the target language. 

Purpose:  

Workflow #1:  

To compare the quality of responses from two AI assistants.  

Workflow #2:  

To assess AI-generated translations by reviewing and rating their quality against specific criteria.   

About OneForma

OneForma brings together data, intelligence and experiences to deliver human-centric solutions to complex business challenges.  

OneForma is an equal opportunity employer and will not discriminate against any of our applicants on the grounds of race, gender, religion or cultural background. 
