A Practical Glossary of AI Terms (Bookmark‑worthy)

AI Glossary for Product Teams

Clear definitions and practical examples to align product, design, and engineering teams. Use this glossary to reduce confusion and speed decisions.

This glossary translates core and applied AI terms into practical language for product teams. It helps non-experts make informed decisions, communicate requirements, and evaluate trade-offs when specifying AI features.

  • Concise, product-focused definitions of core and applied AI terms.
  • Concrete examples showing when to choose techniques like fine-tuning vs prompting.
  • Practical data, measurement, and governance guidance to avoid common pitfalls.

Purpose and how to use this glossary

This glossary is designed for product managers, designers, and engineers who need a shared language for scoping AI features, writing requirements, and reviewing vendor proposals. Use it during discovery, spec reviews, and vendor evaluations to reduce ambiguity.

Suggested workflow: reference the glossary when drafting acceptance criteria, attach relevant definitions to tickets, and keep a single linked copy in your team handbook so everyone uses the same terms.

Quick answer — one-paragraph summary

Core AI concepts: machine learning (ML) finds patterns in data, deep learning (DL) uses neural networks for complex tasks, large language models (LLMs) generate and reason about text, and reinforcement learning (RL) optimizes behavior via reward. Applied techniques—fine-tuning, prompting, and inference—determine how you adapt models to product needs. Choose by trade-offs: accuracy, latency, cost, interpretability, and data governance.

Define core AI terms (ML, DL, LLM, RL)

Machine learning (ML) — Algorithms that learn patterns from labeled or unlabeled data to make predictions or classifications. Example: a classifier that tags support tickets by category.
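The ticket-tagging example above can be sketched end to end. This is a minimal, from-scratch multinomial Naive Bayes classifier (stdlib only, not a production library); the tickets and category labels are invented for illustration.

```python
# Minimal sketch: a Naive Bayes ticket classifier trained on labeled examples.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label) pairs. Returns per-label word counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # log prior + log likelihood with add-one smoothing
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

tickets = [
    ("refund not received for my order", "billing"),
    ("charged twice on my card", "billing"),
    ("app crashes when I open settings", "bug"),
    ("login button does nothing on android", "bug"),
]
model = train(tickets)
print(predict("I was charged twice", *model))  # -> billing
```

A real product would use an established library and far more data, but the shape is the same: learn patterns from labeled examples, then predict labels for new inputs.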

Deep learning (DL) — A subset of ML built on layered neural networks (e.g., CNNs, RNNs, Transformers). DL excels at image, audio, and language tasks. Example: a convolutional model that detects defects in product photos.

Large language model (LLM) — A transformer-based DL model trained on vast text corpora to generate and understand natural language. Example: using an LLM to draft email replies or summarize meeting notes.

Reinforcement learning (RL) — Training an agent to take actions in an environment to maximize cumulative reward. Example: an RL approach to optimize recommendation sequences that maximize long-term retention.

Clarify applied AI terms with examples (fine-tuning, prompting, inference)

Fine-tuning — Continued training of a base model on domain-specific labeled data so it performs better on your task. Example: fine-tuning an LLM on legal contracts to improve clause extraction accuracy.

Prompting — Designing inputs (prompts) to guide an LLM’s behavior without changing model weights. Examples: system messages, few-shot examples, or structured templates. Prompting is fast to iterate and requires no training data, but may be less reliable than fine-tuning.
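A few-shot prompt is often just structured string assembly. The sketch below builds one; the task, labels, and example emails are invented, and the model call itself is omitted since it depends on your provider.

```python
# Sketch: building a few-shot classification prompt for an LLM.
# The labels and example emails here are hypothetical.
FEW_SHOT = [
    ("Meeting moved to 3pm Thursday.", "schedule_change"),
    ("Please approve the Q3 budget.", "approval_request"),
]

def build_prompt(message: str) -> str:
    lines = [
        "Classify the email into one of: schedule_change, approval_request, other.",
        "",
    ]
    for text, label in FEW_SHOT:
        lines.append(f"Email: {text}\nLabel: {label}\n")
    # Leave the final label blank for the model to complete.
    lines.append(f"Email: {message}\nLabel:")
    return "\n".join(lines)

print(build_prompt("Can you sign off on the new vendor contract?"))
```

Because the prompt is plain text, you can version it, review it in PRs, and iterate without any training run.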

Inference — The act of running a trained model to produce predictions or outputs. Inference choices affect latency, cost, and scalability. Example: on-device inference for AR filters vs. cloud inference for large LLM responses.
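Since inference choices are driven by latency, it helps to measure it at the call site. A minimal sketch, assuming any predict function; `model_fn` below is a placeholder, not a specific library API.

```python
# Sketch: timing an inference call to compare deployment options.
import time

def timed_inference(model_fn, inputs):
    """Run model_fn on inputs and return (output, latency in ms)."""
    start = time.perf_counter()
    output = model_fn(inputs)
    latency_ms = (time.perf_counter() - start) * 1000
    return output, latency_ms

# Toy stand-in for a model: string lengths instead of real predictions.
out, ms = timed_inference(lambda batch: [len(s) for s in batch],
                          ["hello", "world!"])
print(out, f"{ms:.2f} ms")
```

Wrapping every production model call this way gives you the raw numbers for the P95 latency metric discussed later.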

Applied technique trade-offs

  • Fine-tuning — Strength: high task accuracy. Weakness: requires labeled data and retraining. When to use: domain-specific tasks that need reliability.
  • Prompting — Strength: fast iteration, no retraining. Weakness: variable outputs; requires prompt engineering. When to use: prototypes, creative tasks, low-cost iteration.
  • Inference — Strength: operationalized predictions. Weakness: latency and cost considerations. When to use: any production use; choose infrastructure based on SLA.

Compare model types, metrics, and trade-offs

Common model types: small models (edge-friendly), medium models (balanced), and LLMs (high-capacity). Choosing a model depends on constraints: accuracy, latency, interpretability, cost, and data sensitivity.

  • Accuracy vs latency: larger models often improve accuracy but increase inference time.
  • Cost vs performance: hosted LLM APIs reduce infra overhead but raise per-request cost.
  • Interpretability: classical models and smaller DL models are easier to inspect than massive transformers.

Metrics to monitor

  • Accuracy / F1 — Why it matters: task correctness. Example threshold: product-specific (e.g., F1 > 0.8).
  • Latency (P95) — Why it matters: user experience. Example threshold: under 300 ms for interactive UIs.
  • Token usage / cost — Why it matters: operational budget. Example threshold: monthly spend tracked against budget.
  • Failure rate — Why it matters: reliability. Example threshold: under 1% for critical flows.
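Two of these metrics, P95 latency and failure rate, fall out of request logs directly. A minimal sketch, assuming log records with `latency_ms` and `ok` fields (the field names are invented, not from a specific system):

```python
# Sketch: computing P95 latency and failure rate from request logs.
import math

def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]

logs = [
    {"latency_ms": 120, "ok": True},
    {"latency_ms": 180, "ok": True},
    {"latency_ms": 250, "ok": False},
    {"latency_ms": 90,  "ok": True},
    {"latency_ms": 400, "ok": True},
]
latency_p95 = p95([r["latency_ms"] for r in logs])
failure_rate = sum(not r["ok"] for r in logs) / len(logs)
print(latency_p95, failure_rate)  # -> 400 0.2
```

In production you would compute these over a sliding window and alert when they cross the thresholds in your acceptance criteria.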

Manage data: labeling, bias, governance

Data is the foundation. Prioritize clear labeling guidelines, audit datasets for demographic and sampling bias, and maintain lineage for governance.

  • Labeling: create a rubric, train annotators, measure inter-annotator agreement (Cohen’s kappa).
  • Bias audits: sample outputs across cohorts, run fairness metrics, and document edge cases.
  • Governance: track data sources, consent, retention policies, and approval workflows for model updates.
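Inter-annotator agreement via Cohen's kappa can be computed directly from two annotators' labels. A stdlib-only sketch; the ticket labels below are invented for illustration.

```python
# Sketch: Cohen's kappa for two annotators labeling the same items.
from collections import Counter

def cohens_kappa(a, b):
    """a, b: equal-length lists of labels from two annotators."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["bug", "billing", "bug", "other", "billing", "bug"]
ann2 = ["bug", "billing", "other", "other", "billing", "bug"]
print(round(cohens_kappa(ann1, ann2), 2))  # -> 0.75
```

Kappa near 1 means strong agreement beyond chance; values below roughly 0.6 usually signal that the labeling rubric needs tightening before you scale up annotation.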

Data governance checklist

  • Data source record — Store provenance and the last refresh date.
  • Consent & PII — Flag and remove personally identifiable information.
  • Versioning — Tag datasets and model snapshots.

Common pitfalls and how to avoid them

  • Ambiguous requirements — Remedy: define success metrics, example inputs/outputs, and edge cases in acceptance criteria.
  • Underestimating data needs — Remedy: run a data sufficiency check and pilot label ~1k examples before full build.
  • Overreliance on prompts for critical flows — Remedy: use fine-tuning or retrieval-augmented models for reliability.
  • Neglecting monitoring — Remedy: implement real-time telemetry for latency, error rate, and drift detection.
  • Ignoring governance — Remedy: document lineage, approvals, and privacy checks before production deployment.

Standardize team vocabulary and documentation

Create a living glossary page in your documentation portal and link definitions in PRs and tickets. Use templates to standardize specs: objective, success metrics, data sources, privacy constraints, and rollout plan.

  • Spec template fields: intent, training/inference method, datasets, rollout criteria, monitoring plan.
  • Communication: use the glossary terms verbatim in cross-functional meetings to reduce misunderstandings.
  • Onboarding: include a short quiz or checklist for new hires to confirm understanding of core terms.

Implementation checklist

  • Attach glossary-backed definitions to the feature spec.
  • Choose technique (prompting vs fine-tuning) with explicit trade-offs documented.
  • Prepare labeling rubric and pilot ~1k examples.
  • Set measurable success metrics (accuracy, latency, failure rate).
  • Establish monitoring, drift detection, and governance sign-offs.

FAQ

Q: When should we fine-tune instead of prompt?
A: Fine-tune when you need consistent, high-accuracy behavior on domain-specific tasks and have labeled data and time for retraining; prompt for fast iteration or low-risk, creative tasks.
Q: How much labeled data is enough?
A: It depends on task complexity; start with a pilot (≈1k high-quality examples) to estimate learning curves and error modes.
Q: How do we measure model drift?
A: Compare recent predictions to ground truth samples, monitor performance metrics over time, and set alerts for significant drops or distribution shifts.
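One simple way to flag the distribution shifts mentioned above is the two-sample Kolmogorov-Smirnov statistic: the maximum gap between the empirical CDFs of a baseline sample and a recent one. A stdlib-only sketch with invented score samples; a real pipeline would also compute a p-value or use a dedicated drift library.

```python
# Sketch: two-sample KS statistic as a drift signal on model scores.
import bisect

def ks_statistic(sample_a, sample_b):
    """Max gap between the empirical CDFs of two numeric samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

baseline = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5]   # scores at launch
recent   = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]   # scores this week
print(round(ks_statistic(baseline, recent), 3))
```

Alerting when the statistic exceeds a threshold you calibrate on historical data gives an early warning before ground-truth metrics degrade.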
Q: What governance checks are essential before production?
A: Verify data provenance, privacy compliance, fairness audit results, and stakeholder sign-off on rollout & rollback plans.