Introduction
Pre-genAI, vertical product teams handed insights to a separate R&D group, which shipped a new model two quarters later. That handoff is now a bug. Agentic systems are built from dozens of model calls, judges, tools, and harness decisions, and every one of those is a hyperparameter. The product surface and the research surface are the same surface, and the team that ignores that ships slower than the team that doesn't.
Why this matters
- Most product-relevant improvements come from changes a product engineer can make, not from new model weights.
- Centralised "AI teams" become bottlenecks fast; the product team that owns the agent ships it.
- Iteration cadence matters more than model novelty for most domains.
- Hiring needs to change: applied scientists alongside product engineers, with shared OKRs.
Core concepts
Hyperparameters everywhere
In agentic systems, the "model" is one variable among many: prompt, tool inventory, retrieval index, judge, retry policy. Treat all of them as tunable; instrument all of them.
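A minimal sketch of what "treat everything as tunable; instrument everything" can look like in practice. All names here are hypothetical, not a prescribed schema: the point is that every knob lives in one versioned config, and every run logs a fingerprint of that config so eval results stay attributable to an exact combination of settings.

```python
# Hypothetical agent config: every knob is an explicit, versioned field.
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class AgentConfig:
    model: str = "some-model-v1"          # the model is one variable among many
    system_prompt_id: str = "support_v3"  # prompts are versioned, not inlined
    tool_inventory: tuple = ("search", "calculator")
    retrieval_index: str = "docs-2024-06"
    judge_prompt_id: str = "judge_v2"
    max_retries: int = 2

    def fingerprint(self) -> str:
        """Stable hash so traces and eval scores can be joined back to this exact config."""
        payload = json.dumps(asdict(self), sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


config = AgentConfig()
print(config.fingerprint())  # log this with every trace and every eval run
```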
Embed, don't hand off
Applied research lives in the product team that owns the surface. Shared central teams advise; they don't own.
Where this thesis breaks
Most domains aren't Cursor. If your product needs a domain-specific model (medical, legal, manufacturing), pure application-layer iteration won't close the gap.
Velocity vs. rigour
Applied research inside product needs lightweight rigour: enough experiment hygiene to learn, not so much that you stall.
Practical patterns
Experiment registry
Every prompt/tool/retrieval change is an experiment with an ID, hypothesis, eval result, and disposition.
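A sketch of one registry record, assuming a simple flat schema (the field names are illustrative, not a standard): each change carries an ID, the hypothesis it was testing, the eval evidence, and an explicit disposition so "we tried that once" is queryable rather than tribal knowledge.

```python
# Hypothetical experiment-registry record for a single change.
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    SHIPPED = "shipped"
    REVERTED = "reverted"
    PARKED = "parked"  # inconclusive; revisit once the eval improves


@dataclass
class Experiment:
    experiment_id: str      # e.g. "EXP-0142"
    change: str             # what was actually modified
    hypothesis: str         # what you expected to move, and why
    eval_suite: str         # which eval produced the evidence
    baseline_score: float
    result_score: float
    disposition: Disposition


exp = Experiment(
    experiment_id="EXP-0142",
    change="Swap retrieval index from docs-2024-01 to docs-2024-06",
    hypothesis="Fresher index reduces hallucinated citations",
    eval_suite="citation_accuracy_v2",
    baseline_score=0.71,
    result_score=0.78,
    disposition=Disposition.SHIPPED,
)
```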
Applied-scientist-in-product
Embed at least one researcher per agent surface; they own the eval suite and improvement backlog.
Two-track planning
Product roadmap (features users see) and quality roadmap (eval scores you're moving). Both ship; both are funded.
Central platform, embedded practitioners
Central team owns shared infra (eval harness, observability, gateway); product teams own the application of it.
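A rough sketch of where that ownership boundary can sit in code, under assumed interfaces (nothing here is a real platform API): the central team owns a product-agnostic harness that runs cases and aggregates scores; the product team owns its cases, its grading logic, and the agent under test, and simply plugs them in.

```python
# Hypothetical split between central eval harness and product-owned suite.
from typing import Callable, List


# --- owned by the central platform team: generic, product-agnostic ---
def run_eval_suite(cases: List[dict],
                   agent: Callable[[str], str],
                   grade: Callable[[dict, str], float]) -> float:
    """Run every case through the agent, grade each answer, return the mean score."""
    scores = [grade(case, agent(case["input"])) for case in cases]
    return sum(scores) / len(scores)


# --- owned by the product team: cases, grading, and the agent under test ---
BILLING_CASES = [
    {"input": "Why was I charged twice?", "must_mention": "refund"},
]


def grade_billing(case: dict, answer: str) -> float:
    # The product team encodes what "good" means for its own surface.
    return 1.0 if case["must_mention"] in answer.lower() else 0.0


def billing_agent(prompt: str) -> str:
    return "We'll issue a refund for the duplicate charge."  # stub agent for illustration


print(run_eval_suite(BILLING_CASES, billing_agent, grade_billing))
```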
Pitfalls to avoid
- Recreating the old R&D handoff with a new name.
- Treating the applied-research budget as whatever is left after feature work; that guarantees no improvement.
- Confusing "we're building agents" with "we're doing research"; most days you're tuning, not researching.
- Ignoring the long tail of domains where pure prompting won't close the gap.
Key takeaways
1. Collapse the product/research split for agentic systems.
2. Fund the eval and quality work explicitly; it doesn't happen for free.
3. Hire researchers into product teams, not adjacent to them.
4. Be honest about when you've hit the limit of application-layer iteration.