Andrew Ng Agentic Course: Section 2

Section 2 of Professor Andrew Ng's agentic course

Reflection Design Pattern

Reflection to Improve Outputs of a Task

Reflection lets an LLM improve its own outputs, just as humans reflect and revise their drafts.

Example:

  • Write an email → produce version 1 (v1)
  • Reflect → find unclear phrasing, typos, missing signature
  • Revise → produce version 2 (v2)

LLMs can follow the same loop:

  1. Generate v1 (email, code, etc.)
  2. Pass v1 into another prompt
  3. Reflect and improve to create v2

Different models can handle each step:

  • Generation → creative or direct model
  • Reflection → reasoning or analytical model
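
A minimal sketch of the generate → reflect → revise loop, with a separate model for each step. The `LLM` callable type, the function names, and the stub models are assumptions made for illustration; they are not from the course, and a real client call would replace the stubs.

```python
from typing import Callable, List

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def reflect_and_revise(task: str, generate: LLM, reflect: LLM, rounds: int = 1) -> List[str]:
    """Return every draft produced, in order: [v1, v2, ...]."""
    drafts = [generate(task)]  # step 1: generate v1
    for _ in range(rounds):
        # Steps 2-3: pass the latest draft into a second prompt that asks
        # the reflection model to critique it and return a better version.
        prompt = (
            f"Task: {task}\n\n"
            f"Draft:\n{drafts[-1]}\n\n"
            "Review the draft for unclear phrasing, typos, and missing parts, "
            "then return only the improved version."
        )
        drafts.append(reflect(prompt))
    return drafts


# Stub models so the sketch runs without any API access.
def stub_generator(prompt: str) -> str:
    return "Hi Bob, lets meet tomorow."


def stub_reflector(prompt: str) -> str:
    return "Hi Bob, let's meet tomorrow.\n\nBest,\nAlice"
```

Because `generate` and `reflect` are passed in separately, the generation step and the reflection step can be backed by different models.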

External feedback increases effectiveness:

  • Example: run code → capture errors → feed results back → fix bugs
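
The run code → capture errors → feed back example could be sketched as follows. This is an illustration under assumptions: `fix` stands in for a hypothetical LLM call, and the code runs in-process with `exec` only to keep the example short. A real system should run generated code in a sandbox.

```python
import contextlib
import io
import traceback
from typing import Callable, Tuple


def run_python(code: str) -> Tuple[bool, str]:
    """Run code in-process; return (succeeded, printed output or traceback)."""
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            # WARNING: exec runs arbitrary code. Use a sandbox in practice.
            exec(compile(code, "<generated>", "exec"), {})
    except Exception:
        return False, traceback.format_exc()
    return True, captured.getvalue()


def fix_with_feedback(code: str, fix: Callable[[str], str], max_rounds: int = 3) -> str:
    """Ask `fix` (a hypothetical LLM call) to repair code until it runs."""
    for _ in range(max_rounds):
        ok, feedback = run_python(code)
        if ok:
            break
        # The captured traceback is the external feedback for reflection.
        prompt = (
            f"This code failed:\n{code}\n\n"
            f"Error output:\n{feedback}\n\n"
            "Return only the corrected code."
        )
        code = fix(prompt)
    # If max_rounds is exhausted, the last fix is returned without a re-run.
    return code
```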

Key points

  • Reflection adds modest but consistent performance gains
  • External information strengthens the reflection process
  • Reflection ≠ perfection, but it improves clarity, accuracy, and completeness

Why Not Just Direct Generation?

Zero-shot prompting = one-step generation with no examples.

Example:

  • “Write an essay about black holes.”
  • “Write a Python function for compound interest.”

Reflection vs direct generation:

  • Reflection often outperforms zero-shot on diverse tasks.
  • Research cited in the course shows reflection improving results for models such as GPT-3.5 and GPT-4.

When reflection helps

  • Generating structured data (HTML, JSON)
  • Multi-step instructions (e.g., how to brew tea)
  • Creative generation (domain names, brand names)

Reflection prompt examples

  • Check tone, facts, and clarity in emails
  • Evaluate if domain names are easy to pronounce and have no negative meanings

Prompt writing tips

  • Use verbs like “review,” “reflect,” “check,” “verify”
  • Specify clear criteria (tone, factual accuracy, structure)
  • Study prompts from good open-source implementations
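
One way to apply these tips is to build the reflection prompt from an explicit list of criteria. The function name, the default role, and the wording of the prompt are illustrative assumptions, not a template from the course.

```python
from typing import Sequence


def build_reflection_prompt(draft: str, criteria: Sequence[str], role: str = "careful editor") -> str:
    """Compose a reflection prompt that names a role and explicit criteria."""
    checklist = "\n".join(f"- {criterion}" for criterion in criteria)
    return (
        f"Role: {role}\n"
        "Review the draft below. Check and verify each criterion:\n"
        f"{checklist}\n\n"
        f"Draft:\n{draft}\n\n"
        "List the problems you find, then write an improved version."
    )


email_prompt = build_reflection_prompt(
    "Hi team, the launch is probly next week.",
    ["tone", "factual accuracy", "clarity"],
)
```

Keeping the criteria in a list makes it easy to reuse the same builder for emails, domain names, or structured data by swapping the checklist.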

Chart Generation Workflow

Reflection also improves visual outputs.

Example workflow:

  1. Generate chart code (Python) → produce stacked bar plot (v1)
  2. Feed both code + generated image to a multimodal LLM
  3. Reflect visually → suggest better plot (e.g., grouped bars)
  4. Generate improved visualization (v2)

Effective reflection prompts

  • Assign clear roles (“expert data analyst”)
  • Include context: code, data, and image
  • Define criteria: readability, clarity, completeness
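
A sketch of packaging the role, code, image, and criteria for a multimodal reflection step. The dictionary keys are hypothetical and provider-agnostic; they would need to be mapped onto whatever message format the chosen multimodal model actually expects.

```python
import base64
from typing import Dict


def build_chart_reflection_request(
    code: str, image_png: bytes, role: str = "expert data analyst"
) -> Dict[str, str]:
    """Bundle role, chart code, and the rendered image for a reflection call."""
    text = (
        f"Role: {role}\n"
        "The attached image was produced by the code below.\n"
        "Critique the chart for readability, clarity, and completeness, "
        "then return improved plotting code.\n\n"
        f"Code:\n{code}"
    )
    return {
        "text": text,
        # Images are commonly sent as base64 text; the key names here are
        # assumptions, not a specific provider's schema.
        "image_base64": base64.b64encode(image_png).decode("ascii"),
        "image_media_type": "image/png",
    }
```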

Model choice

  • Generation: GPT-4o, GPT-5, etc.
  • Reflection: reasoning model or visual model

Evaluating the Impact of Reflection

Reflection adds latency but often improves accuracy. Always test whether it is worth keeping.

Example: Database query evaluation

  1. LLM writes SQL → run without reflection
  2. LLM writes SQL → reflect, rewrite → run again
  3. Compare the accuracy of both runs against ground-truth answers

Result example:

  • No reflection → 87% correct
  • With reflection → 95% correct
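
The comparison can be sketched as a small evaluation harness. Everything here is a toy assumption: the `sales` table, the two questions, and the stub "LLM" functions exist only to make the example runnable, and the resulting numbers are unrelated to the 87% and 95% figures above.

```python
import sqlite3
from typing import Callable, List, Tuple

Case = Tuple[str, object]  # (question, ground-truth answer)


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)", [("tea", 3), ("tea", 2), ("mug", 1)]
    )
    return conn


def accuracy(write_sql: Callable[[str], str], cases: List[Case], conn: sqlite3.Connection) -> float:
    """Fraction of questions whose query result matches the ground truth."""
    correct = 0
    for question, expected in cases:
        try:
            row = conn.execute(write_sql(question)).fetchone()
            answer = row[0] if row else None
        except sqlite3.Error:
            answer = None  # invalid SQL counts as a wrong answer
        correct += int(answer == expected)
    return correct / len(cases)


CASES: List[Case] = [("How many teas were sold?", 5), ("How many distinct items?", 2)]


# Stubs standing in for the two pipelines under test.
def sql_without_reflection(question: str) -> str:
    if "teas" in question:
        return "SELECT COUNT(*) FROM sales WHERE item = 'tea'"  # counts rows, not qty
    return "SELECT COUNT(DISTINCT item) FROM sales"


def sql_with_reflection(question: str) -> str:
    if "teas" in question:
        return "SELECT SUM(qty) FROM sales WHERE item = 'tea'"
    return "SELECT COUNT(DISTINCT item) FROM sales"
```

Running `accuracy` once per pipeline on the same cases gives two numbers that can be compared directly, which is what makes the decision to keep or drop reflection data-driven.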

Evaluation methods

  • Objective tasks: use automated tests (e.g., SQL results)
  • Subjective tasks: use LLM-as-judge with rubrics

Rubric-based evaluation

  • Avoid pairwise comparisons, where LLM judges tend to show position bias (for example, favoring the first option shown)
  • Use binary (0/1) scoring for consistent evaluation
  • Criteria: clear title, proper labels, correct chart type, etc.
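
Binary rubric scoring with an LLM judge might look like the sketch below. The `judge` callable is a hypothetical stand-in for a model call, and the rubric wording is an assumption based on the criteria listed above.

```python
from typing import Callable, Dict, Sequence

RUBRIC = [
    "Has a clear title",
    "Axes are properly labeled",
    "Chart type fits the data",
]


def score_with_rubric(
    output_description: str, judge: Callable[[str], str], rubric: Sequence[str] = RUBRIC
) -> Dict[str, int]:
    """Ask the judge one yes/no question per criterion; record 1 or 0."""
    scores = {}
    for criterion in rubric:
        prompt = (
            f"Output under review:\n{output_description}\n\n"
            f"Criterion: {criterion}\n"
            "Answer with a single character: 1 if the criterion is met, 0 if not."
        )
        reply = judge(prompt).strip()
        # Anything other than a leading "1" is treated as a failure.
        scores[criterion] = 1 if reply.startswith("1") else 0
    return scores


# Stub judge for demonstration: fails only the labeling criterion.
def stub_judge(prompt: str) -> str:
    return "0" if "labeled" in prompt else "1"
```

Summing the per-criterion scores gives a total that can be compared across outputs without asking the judge to rank two candidates against each other.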

Why this matters

  • Enables prompt optimization
  • Provides reproducible benchmarks
  • Keeps improvements data-driven

Using External Feedback

Reflection with new information outperforms reflection using only prior context.

Performance curve:

  • Zero-shot → plateaus fast
  • Reflection → moderate gain
  • Reflection + external feedback → strong improvement

Examples of external feedback

  • Run code → capture runtime errors → feed back
  • Detect banned terms → feed detection results
  • Web search for fact-checking → supply verified info
  • Word count validation → enforce length limits

Design pattern

  1. Generate → produce output
  2. Collect external feedback → new data
  3. Reflect with feedback → improved output
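
The three steps above can be sketched with two of the simpler feedback sources, word count validation and banned-term detection. The function names and the `generate` / `revise` callables are assumptions for illustration; the checks are plain Python and need no model.

```python
import re
from typing import Callable, List, Sequence

Check = Callable[[str], List[str]]  # returns feedback notes; empty means OK


def word_count_feedback(text: str, limit: int) -> List[str]:
    count = len(text.split())
    return [] if count <= limit else [f"Text has {count} words; the limit is {limit}."]


def banned_terms_feedback(text: str, banned: Sequence[str]) -> List[str]:
    found = [
        term for term in banned
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
    ]
    return [f"Remove banned term: {term}" for term in found]


def reflect_with_feedback(
    task: str,
    generate: Callable[[str], str],
    revise: Callable[[str], str],
    checks: Sequence[Check],
) -> str:
    draft = generate(task)                                  # 1. generate
    notes = [n for check in checks for n in check(draft)]   # 2. external feedback
    if not notes:
        return draft  # nothing to fix, so skip the extra model call
    prompt = (                                              # 3. reflect with feedback
        f"Task: {task}\n\nDraft:\n{draft}\n\n"
        "Problems found by automated checks:\n"
        + "\n".join(f"- {n}" for n in notes)
        + "\n\nReturn only the revised text."
    )
    return revise(prompt)
```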

Reflection is a system-level loop:

  • LLM revises its output using real-world signals
  • Developer measures and tunes
  • System continuously improves

Summary

Reflection design pattern:

  • Mimics human revision
  • Works across text, code, and images
  • Improves quality when guided by clear criteria or external feedback

Core principle: add structured feedback loops to any generation system.

© 2026 林悦己