“It actually integrates and systematizes humans’ subjective judgment into the model training process,” Sam Stone, the director of product management, pricing and data products at real estate tech firm Opendoor, told Built In. Human trainers provide conversations and rank the responses. Those rankings are used to train reward models, which help determine the best answers.
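
To make the ranking step described above more concrete, here is a minimal sketch of how a human ranking could be turned into a training signal for a reward model. It assumes a pairwise (Bradley-Terry style) preference loss, which is a common choice but is not named in the article. The function names, the example responses, and the scores are all hypothetical and chosen only for illustration.

```python
import math


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return [
        (ranked_responses[i], ranked_responses[j])
        for i in range(len(ranked_responses))
        for j in range(i + 1, len(ranked_responses))
    ]


def pairwise_loss(score_preferred, score_rejected):
    """Negative log-sigmoid of the score gap.

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    gap = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))


# Hypothetical ranking from a human trainer, best response first.
ranking = ["answer_a", "answer_b", "answer_c"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores a reward model might currently assign.
scores = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": -1.0}

# Total loss over all pairs; training would adjust the model to reduce it.
total_loss = sum(pairwise_loss(scores[p], scores[r]) for p, r in pairs)
print(pairs)
print(total_loss)
```

In this sketch the reward model is just a dictionary of fixed scores. In a real system the scores would come from a trained neural network, and the loss would be minimized so that the model's scores agree with the human rankings.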