Andrew Ng's Agentic Course: Section 2

Section 2 of Professor Andrew Ng's agentic course

📅 2025-10-26 📂 AI Courses

Reflection Design Pattern

Reflection to Improve Outputs of a Task

Reflection lets an LLM improve its own outputs, just as humans reflect and revise their drafts.

Example:

Write an email → produce version 1 (v1)
Reflect → find unclear phrasing, typos, missing signature
Revise → produce version 2 (v2)

LLMs can follow the same loop:

Generate v1 (email, code, etc.)
Pass v1 into another prompt
Reflect and improve to create v2

Different models can handle each step:

Generation → creative or direct model
Reflection → reasoning or analytical model
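
A minimal sketch of the generate-then-reflect loop with a separate model per step. It assumes each model is wrapped in a plain Python callable that takes a prompt string and returns text. The names `reflect_once`, `generator`, and `reflector` are hypothetical and not taken from the course, and the reflection prompt wording is only one possible phrasing.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt to a model response.
LLM = Callable[[str], str]


def reflect_once(task: str, generator: LLM, reflector: LLM) -> tuple[str, str]:
    """Generate v1, then ask a second model to critique and rewrite it as v2."""
    v1 = generator(task)
    reflection_prompt = (
        f"Task: {task}\n\n"
        f"Draft:\n{v1}\n\n"
        "Review the draft for unclear phrasing, typos, and missing parts, "
        "then return an improved version."
    )
    v2 = reflector(reflection_prompt)
    return v1, v2
```

Passing the same callable for both arguments gives single-model reflection. Passing two different ones matches the generation/reflection split described above.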

External feedback increases effectiveness:

Example: run code → capture errors → feed results back → fix bugs
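
One way to capture that kind of feedback for Python code is to run the draft in a subprocess and keep whatever it writes to stderr. This is a sketch under assumptions: the helper name `run_and_capture` and the wording of the feedback message are illustrative, not from the course.

```python
import subprocess
import sys


def run_and_capture(code: str, timeout: float = 10.0) -> str:
    """Run Python source in a subprocess and return feedback text for the next prompt."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"The code did not finish within {timeout} seconds."
    if result.returncode == 0:
        return ""  # empty string means there is no error to report
    return f"The code failed with this error:\n{result.stderr.strip()}"
```

The returned string can be appended to the reflection prompt so the model sees the actual traceback. Model-generated code is untrusted, so a real system would run it inside a sandbox.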

Key points

Reflection adds modest but consistent performance gains
External information strengthens the reflection process
Reflection does not guarantee a perfect result, but it improves clarity, accuracy, and completeness

Why Not Just Direct Generation?

Zero-shot prompting means asking for the output in a single step, with no examples in the prompt.

Example:

“Write an essay about black holes.”
“Write a Python function for compound interest.”

Reflection vs direct generation:

Reflection often outperforms zero-shot prompting across a wide range of tasks.
Published research reports that reflection raises success rates on models such as GPT-3.5 and GPT-4.

When reflection helps

Generating structured data (HTML, JSON)
Multi-step instructions (e.g., how to brew tea)
Creative generation (domain names, brand names)

Reflection prompt examples

Check tone, facts, and clarity in emails
Evaluate if domain names are easy to pronounce and have no negative meanings

Prompt writing tips

Use verbs like “review,” “reflect,” “check,” “verify”
Specify clear criteria (tone, factual accuracy, structure)
Study prompts from good open-source implementations
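
The first two tips can be turned into a small prompt template. The sketch below is illustrative only: `build_reflection_prompt` is a hypothetical helper, and the exact wording is an assumption rather than a prompt from the course.

```python
def build_reflection_prompt(draft: str, criteria: list[str], verb: str = "Review") -> str:
    """Compose a reflection prompt that names an action verb and explicit criteria."""
    checklist = "\n".join(f"- {c}" for c in criteria)
    return (
        f"{verb} the draft below against each criterion:\n"
        f"{checklist}\n\n"
        f"Draft:\n{draft}\n\n"
        "List any problems you find, then write an improved version."
    )


prompt = build_reflection_prompt(
    "Hi team, meeting is tomorow.",
    ["tone", "factual accuracy", "structure"],
)
print(prompt)
```

Keeping the criteria in a list makes it easy to reuse the same template for emails, domain names, or any other task by changing only the checklist.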

Chart Generation Workflow

Reflection also improves visual outputs.

Example workflow:

Generate chart code (Python) → produce stacked bar plot (v1)
Feed both code + generated image to a multimodal LLM
Reflect visually → suggest better plot (e.g., grouped bars)
Generate improved visualization (v2)

Effective reflection prompts

Assign clear roles (“expert data analyst”)
Include context: code, data, and image
Define criteria: readability, clarity, completeness

Model choice

Generation: GPT-4o, GPT-5, etc.
Reflection: a reasoning model or a multimodal (vision-capable) model

Evaluating the Impact of Reflection

Reflection adds latency but often improves accuracy. Always test whether it is worth keeping.

Example: Database query evaluation

LLM writes SQL → run without reflection
LLM writes SQL → reflect, rewrite → run again
Compare the accuracy of both runs against ground-truth answers

Result example:

No reflection → 87% correct
With reflection → 95% correct
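
A comparison like this can be automated with a small harness. The sketch below uses Python's built-in `sqlite3` module and treats each pipeline as a function from question to SQL. The table, the test case, and the two stub pipelines are made up for illustration; they are not the course's dataset and do not reproduce the percentages above.

```python
import sqlite3
from typing import Callable, Optional

Rows = list[tuple]


def run_sql(conn: sqlite3.Connection, sql: str) -> Optional[Rows]:
    """Return the query's rows, or None if the SQL fails to execute."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def accuracy(
    write_sql: Callable[[str], str],
    cases: list[tuple[str, Rows]],
    conn: sqlite3.Connection,
) -> float:
    """Fraction of questions whose generated SQL returns the ground-truth rows."""
    correct = sum(
        run_sql(conn, write_sql(question)) == expected
        for question, expected in cases
    )
    return correct / len(cases)


# Toy database and a single made-up test case, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("tea", 3), ("mug", 5)])
cases = [("How many units were sold in total?", [(8,)])]


# Stand-ins for the two pipelines; real ones would call an LLM.
def without_reflection(question: str) -> str:
    return "SELECT SUM(qty) FROM sale"  # wrong table name, so the query fails


def with_reflection(question: str) -> str:
    return "SELECT SUM(qty) FROM sales"


print("no reflection:  ", accuracy(without_reflection, cases, conn))
print("with reflection:", accuracy(with_reflection, cases, conn))
```

With these stubs the first score is 0.0 and the second is 1.0. The numbers only show the mechanics; a meaningful comparison needs a realistic set of questions with verified answers.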

Evaluation methods

Objective tasks: use automated tests (e.g., SQL results)
Subjective tasks: use LLM-as-judge with rubrics

Rubric-based evaluation

Avoid pairwise comparisons: LLM judges are prone to position bias, often favoring whichever option is shown first
Use binary (0/1) scoring for consistent evaluation
Criteria: clear title, proper labels, correct chart type, etc.
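
Binary rubric scoring can be sketched as a list of yes/no questions, each answered by a judge and summed into a total. The `Judge` callable below is a hypothetical stand-in for an LLM-as-judge call, and the question wording is an assumption based on the criteria listed above.

```python
from typing import Callable

# Hypothetical judge: answers one yes/no rubric question about one output.
Judge = Callable[[str, str], bool]

CHART_RUBRIC = [
    "Does the chart have a clear title?",
    "Are the axes properly labeled?",
    "Is the chart type appropriate for the data?",
]


def score_with_rubric(output: str, rubric: list[str], judge: Judge) -> dict[str, int]:
    """Score each criterion as 0 or 1 and store the sum under 'total'."""
    scores = {question: int(judge(question, output)) for question in rubric}
    scores["total"] = sum(scores.values())
    return scores
```

Because every criterion is scored independently as 0 or 1, totals from different runs or prompt versions can be compared directly.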

Why this matters

Enables prompt optimization
Provides reproducible benchmarks
Keeps improvements data-driven

Using External Feedback

Reflection with new information outperforms reflection using only prior context.

Performance curve:

Zero-shot → plateaus fast
Reflection → moderate gain
Reflection + external feedback → strong improvement

Examples of external feedback

Run code → capture runtime errors → feed back
Detect banned terms → feed detection results
Web search for fact-checking → supply verified info
Word count validation → enforce length limits
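
Two of these checks, banned terms and word count, need no model at all. The sketch below shows one way to turn them into feedback messages; the function names and message wording are hypothetical.

```python
import re


def find_banned_terms(text: str, banned: list[str]) -> list[str]:
    """Return the banned terms that appear in the text as whole words (case-insensitive)."""
    return [
        term
        for term in banned
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
    ]


def collect_feedback(text: str, banned: list[str], max_words: int) -> list[str]:
    """Build feedback messages that can be pasted into a reflection prompt."""
    feedback = []
    hits = find_banned_terms(text, banned)
    if hits:
        feedback.append(f"Remove these banned terms: {', '.join(hits)}.")
    word_count = len(text.split())
    if word_count > max_words:
        feedback.append(f"The text has {word_count} words; the limit is {max_words}.")
    return feedback
```

An empty list means the output passed both checks, so no further reflection round is needed on those grounds.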

Design pattern

Generate → produce output
Collect external feedback → new data
Reflect with feedback → improved output
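
The three steps above can be sketched as a loop that stops once the external checks report nothing. Both callables are hypothetical placeholders: `llm` stands for any prompt-to-text model call, and `get_feedback` for any external check such as running code or validating length. The `max_rounds` cap is an added assumption to keep the loop bounded.

```python
from typing import Callable

LLM = Callable[[str], str]               # hypothetical: prompt in, text out
FeedbackFn = Callable[[str], list[str]]  # hypothetical: output in, issues out


def generate_with_feedback(
    task: str, llm: LLM, get_feedback: FeedbackFn, max_rounds: int = 3
) -> str:
    """Generate, gather external feedback, and revise until no issues remain."""
    output = llm(task)
    for _ in range(max_rounds):
        issues = get_feedback(output)
        if not issues:
            break  # external checks pass, so stop revising
        revision_prompt = (
            f"Task: {task}\n\nPrevious output:\n{output}\n\n"
            "External feedback:\n"
            + "\n".join(f"- {issue}" for issue in issues)
            + "\n\nRevise the output to address every point."
        )
        output = llm(revision_prompt)
    return output
```

The key design choice is that the revision prompt carries information the model did not have when it wrote the first draft, which is what separates this from reflection on prior context alone.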

Reflection is a system-level loop:

The LLM revises its output using feedback grounded in real results
Developer measures and tunes
System continuously improves

Summary

Reflection design pattern:

Mimics human revision
Works across text, code, and images
Improves quality when guided by clear criteria or external feedback

Core principle: add structured feedback loops to any generation system.