ToolSift
Intermediate · Research

Compare AI answers with source verification

A research workflow for comparing two or more AI answers without rewarding confidence, fluency, or length. It relies on source mapping, claim extraction, and contradiction checks.

Who it is for

  • Editors checking AI-generated research.
  • Students comparing assistant answers.
  • Teams choosing between model outputs for a decision.

Who should skip it

  • Users who do not have source material.
  • People looking for a quick, opinion-based ranking.
  • Anyone making a high-stakes decision without expert review.

Workflow

Step 1

Normalize the question

Use the exact same question, constraints, and source set for each AI tool. If the inputs differ, you are comparing prompts rather than answers.

Example input

Answer using only these two sources. Mark unsupported claims.

Expected output

Comparable outputs produced from the same conditions.

Common failure

One answer wins because it received better context.

Human check

Save the exact prompt and source package for each run.
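
One way to make this check mechanical is to fingerprint the prompt and source package for every run. The sketch below is a minimal illustration, not part of any specific tool: the function names, the record fields, and the tool labels are all assumptions made for this example.

```python
import hashlib
import json


def fingerprint(prompt: str, sources: list[str]) -> str:
    """Hash the exact prompt and source texts so runs can be compared."""
    payload = json.dumps({"prompt": prompt, "sources": sources}, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_run_record(tool: str, prompt: str, sources: list[str]) -> dict:
    """Bundle what is needed to reproduce one run (the tool label is hypothetical)."""
    return {
        "tool": tool,
        "prompt": prompt,
        "sources": sources,
        "fingerprint": fingerprint(prompt, sources),
    }


def same_conditions(records: list[dict]) -> bool:
    """True only if every run used an identical prompt and source package."""
    return len({record["fingerprint"] for record in records}) <= 1
```

If `same_conditions` returns False, the runs received different inputs and the comparison should be repeated before any answer is scored.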

Step 2

Extract claims from each answer

Run a separate pass that lists each claim, its supporting evidence, and its confidence. Comparing prose directly is hard; comparing claim rows is easier.

Example input

Extract every factual claim and map it to a source sentence.

Expected output

A claim matrix for each answer.

Common failure

Fluent paragraphs hide unsupported claims.

Human check

Spot-check a sample of high-confidence and low-confidence rows.
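
A claim matrix can be held in a very small data structure. The sketch below assumes each row carries a claim, the source sentence it maps to (or nothing), and a confidence between 0 and 1; the class name, the thresholds, and the sample size are illustrative choices, not fixed rules.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimRow:
    claim: str
    source_sentence: Optional[str]  # None means no supporting sentence was found
    confidence: float               # 0.0 to 1.0, as reported by the extraction pass


def unsupported(rows: list[ClaimRow]) -> list[ClaimRow]:
    """Rows that have no source sentence attached."""
    return [row for row in rows if not row.source_sentence]


def spot_check_sample(rows: list[ClaimRow], per_band: int = 2,
                      high: float = 0.8, low: float = 0.4,
                      seed: int = 0) -> list[ClaimRow]:
    """Pick up to per_band rows from the high band, then from the low band."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    high_rows = [row for row in rows if row.confidence >= high]
    low_rows = [row for row in rows if row.confidence <= low]
    return (rng.sample(high_rows, min(per_band, len(high_rows)))
            + rng.sample(low_rows, min(per_band, len(low_rows))))
```

Sampling both bands matters because a high-confidence row with no source sentence is exactly the kind of claim a fluent paragraph hides.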

Step 3

Check contradiction and omission

Look for claims where the answers disagree, claims one answer includes and another omits, and claims that no source supports. Omissions are easy to miss because nothing in the answer signals what was left out, so they often matter more than errors.

Example input

Compare the claim matrices and list contradictions, omissions, and unsupported points.

Expected output

A disagreement report.

Common failure

The longer answer seems better simply because it says more, whether or not the sources support it.

Human check

Compare against the source, not against your preference.
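
The comparison itself can be sketched as set operations over two claim matrices. This example assumes each matrix is a dictionary keyed by a normalized claim topic, with a stated value and a flag saying whether a source sentence supports it. The key names and the sample data are made up for illustration.

```python
def compare_matrices(a: dict, b: dict) -> dict:
    """Build a disagreement report from two claim matrices."""
    shared = a.keys() & b.keys()
    return {
        # Same topic, different stated value
        "contradictions": sorted(k for k in shared if a[k]["value"] != b[k]["value"]),
        # Topics one answer covers and the other leaves out
        "omitted_by_a": sorted(b.keys() - a.keys()),
        "omitted_by_b": sorted(a.keys() - b.keys()),
        # Claims with no supporting source sentence
        "unsupported_in_a": sorted(k for k, row in a.items() if not row["supported"]),
        "unsupported_in_b": sorted(k for k, row in b.items() if not row["supported"]),
    }


# Hypothetical matrices for two answers
answer_a = {
    "launch_year": {"value": "2019", "supported": True},
    "team_size": {"value": "12", "supported": False},
}
answer_b = {
    "launch_year": {"value": "2020", "supported": True},
    "funding": {"value": "seed round", "supported": True},
}
report = compare_matrices(answer_a, answer_b)
```

The report only flags where to look. Deciding which side of a contradiction is right still means reading the source.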

Step 4

Score evidence, not style

Create criteria for source coverage, accuracy, useful caveats, and decision value. Keep writing style as a separate secondary score.

Example input

Score each answer from 1-5 for source coverage, accuracy, caveats, and decision value.

Expected output

A transparent score table.

Common failure

The most confident or polished answer wins incorrectly.

Human check

Require a short note justifying every score other than 4.
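
A score table along these lines can be sketched as follows. The example assumes the four evidence criteria are averaged with equal weight and that style is used only to break ties. Both are choices made for this sketch, and the set of scores that need no note is left as a parameter because teams may set that rule differently.

```python
EVIDENCE_CRITERIA = ("source_coverage", "accuracy", "caveats", "decision_value")


def evidence_score(scores: dict) -> float:
    """Average the four evidence criteria; style is deliberately excluded."""
    for name in EVIDENCE_CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(scores[name] for name in EVIDENCE_CRITERIA) / len(EVIDENCE_CRITERIA)


def missing_notes(scores: dict, notes: dict,
                  no_note_needed: frozenset = frozenset({4})) -> list[str]:
    """Criteria whose score requires a note but has none."""
    return [name for name, value in scores.items()
            if value not in no_note_needed and not notes.get(name, "").strip()]


def rank(table: dict) -> list[str]:
    """Order answers by evidence score; style only breaks ties."""
    return sorted(
        table,
        key=lambda name: (evidence_score(table[name]), table[name].get("style", 0)),
        reverse=True,
    )
```

Keeping style out of `evidence_score` means a polished answer cannot outrank a better-sourced one on presentation alone.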

Step 5

Write a final decision note

Summarize which answer is safer to use, which parts need manual correction, and what sources must be checked before publishing.

Example input

Write a final decision note with usable parts, rejected parts, and source checks.

Expected output

A decision note that can be shared with an editor or teammate.

Common failure

The comparison ends with a winner but no review trail.

Human check

Make sure a teammate can understand why the winner won.
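
The decision note can be generated from the earlier steps so that the review trail is never skipped. The sketch below assumes a plain-text note with three lists; the function name, field names, and layout are illustrative only.

```python
def decision_note(chosen: str, reason: str, usable: list[str],
                  rejected: list[str], source_checks: list[str]) -> str:
    """Render a plain-text decision note that a teammate can read on its own."""

    def section(title: str, items: list[str]) -> list[str]:
        lines = [title]
        # An explicit "(none)" shows the section was considered, not forgotten
        lines += [f"  - {item}" for item in items] or ["  - (none)"]
        return lines

    lines = [f"Chosen answer: {chosen}", f"Why: {reason}", ""]
    lines += section("Usable parts", usable) + [""]
    lines += section("Rejected parts", rejected) + [""]
    lines += section("Sources to check before publishing", source_checks)
    return "\n".join(lines)
```

Requiring a `reason` argument means the note cannot name a winner without also saying why.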

Human review checklist

  • Check whether the AI output addresses the original source-based comparison task instead of drifting into a generic answer.
  • Verify all factual claims, dates, names, numbers, links, and quoted material against the original source or a trusted reference.
  • Remove unsupported claims, filler language, repetitive transitions, and confident statements that do not have evidence.
  • Check the output against the intended reader, channel, and format before publishing it or sending it to another person.
  • Keep a short note of the prompt, tool, input material, manual edits, and final decision so the workflow can be repeated.
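
The note described in the last checklist item can be kept as one JSON line per review, which makes it easy to append to a log file. This is a minimal sketch: the class and field names simply mirror the items listed above and are not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReviewLogEntry:
    prompt: str
    tool: str
    input_material: list[str]
    manual_edits: list[str] = field(default_factory=list)
    final_decision: str = ""


def to_jsonl_line(entry: ReviewLogEntry) -> str:
    """Serialize one review as a single JSON line."""
    return json.dumps(asdict(entry), ensure_ascii=False)


def from_jsonl_line(line: str) -> ReviewLogEntry:
    """Rebuild a log entry from a line written by to_jsonl_line."""
    return ReviewLogEntry(**json.loads(line))
```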

Mistakes to avoid

  • Starting the comparison with a vague prompt and no acceptance criteria.
  • Asking the model for a final answer before giving it source material, constraints, examples, or review rules.
  • Treating a fluent answer as correct without checking source coverage, missing assumptions, and edge cases.
  • Using the same prompt for research, writing, review, and final editing even though those are different jobs.
  • Skipping the human review step because the first output looks polished.
