Analyze agent evaluation data to uncover why agents fail
Drop your CSV here or click to upload
Columns: task_name, topic, input_length, output_length, safety_passed, instruction_passed, efficiency_score, pass_fail
💡 Getting Started: Download the sample CSV from the README or upload your own evaluation data. Make sure the file has a pass_fail column (values: pass/fail).
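Before uploading, it can help to check a file locally. The sketch below is one possible way to do that, not the tool's own implementation: it checks that the listed columns are present and computes the share of failed rows per topic. The function name failure_rate_by_topic, the sample rows, and the True/False and numeric value formats are assumptions made for illustration. Only the column names and the pass/fail values come from the text above.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the upload hint.
EXPECTED_COLUMNS = [
    "task_name", "topic", "input_length", "output_length",
    "safety_passed", "instruction_passed", "efficiency_score", "pass_fail",
]


def failure_rate_by_topic(csv_file):
    """Return {topic: fraction of rows whose pass_fail value is 'fail'}."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    totals = defaultdict(int)
    fails = defaultdict(int)
    for row in reader:
        # Normalize case and whitespace before comparing.
        outcome = (row["pass_fail"] or "").strip().lower()
        if outcome not in ("pass", "fail"):
            raise ValueError(f"unexpected pass_fail value: {row['pass_fail']!r}")
        totals[row["topic"]] += 1
        if outcome == "fail":
            fails[row["topic"]] += 1
    return {topic: fails[topic] / totals[topic] for topic in totals}


# Made-up rows for illustration only; they are not from the sample CSV.
SAMPLE = """task_name,topic,input_length,output_length,safety_passed,instruction_passed,efficiency_score,pass_fail
t1,math,120,80,True,True,0.9,pass
t2,math,300,150,True,False,0.4,fail
t3,coding,200,400,True,True,0.8,pass
"""

rates = failure_rate_by_topic(io.StringIO(SAMPLE))
print(rates)
```

To run the same check on a real file, pass an open file handle instead of the in-memory sample, for example open("your_file.csv", newline="").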