Run the same prompt multiple times to measure how stable the output and score are.
Testing
Run a test to see consistency metrics
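
As a rough illustration of what such consistency metrics might look like, the sketch below runs one prompt several times and summarizes how much the output and score vary. It is not this tool's implementation: `consistency_metrics`, the `run_once` callback (assumed to return an `(output, score)` pair), and the specific metrics chosen are all hypothetical.

```python
import statistics
from collections import Counter
from typing import Callable, Dict, Tuple


def consistency_metrics(
    run_once: Callable[[str], Tuple[str, float]],  # hypothetical: returns (output, score)
    prompt: str,
    runs: int = 5,
) -> Dict[str, float]:
    """Run the same prompt `runs` times and summarize output and score stability."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    results = [run_once(prompt) for _ in range(runs)]
    outputs = [output for output, _ in results]
    scores = [score for _, score in results]

    # Share of runs that produced the single most common output (1.0 = fully stable).
    most_common_count = Counter(outputs).most_common(1)[0][1]

    return {
        "runs": runs,
        "exact_match_rate": most_common_count / runs,
        "score_mean": statistics.fmean(scores),
        "score_stdev": statistics.pstdev(scores),  # population std. dev. of the scores
        "score_range": max(scores) - min(scores),
    }
```

To use it, pass any function that sends the prompt to the model and returns the output together with its score; a higher `exact_match_rate` and a lower `score_stdev` would indicate a more stable prompt. Exact string matching is a strict notion of output stability, so a similarity measure may suit free-form text better.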