AI Paradigms Overview
Quick Look Summary
| Concept | What it is (1‑sentence) | Core tech / algorithmic family | Typical “sweet‑spot” tasks | Main strengths | Typical limits |
|---|---|---|---|---|---|
| Classical AI | Rule‑based systems that manipulate explicit symbols, logic and search. | Expert systems, production rules, planning/search (A*, SAT solvers), knowledge graphs. | Diagnostic reasoning, theorem proving, constraint solving, high‑level robotics planning. | Fully explainable; works with very little data. | Brittle to noise; hard to scale to perception‑heavy domains. |
| Machine Learning | Algorithms that learn statistical patterns from data without being hand‑programmed. | Linear/logistic regression, decision trees, SVMs, clustering, reinforcement learning, shallow NNs. | Spam detection, churn prediction, recommendation, simple classification/regression. | Good with moderate data; fast to train; relatively interpretable (tree‑based). | Feature engineering still required; performance caps on raw, high‑dim data. |
| Neural Networks | Computation graphs composed of interconnected “neurons” that approximate functions. | Perceptron, multilayer perceptron (MLP), convolutional layers, recurrent layers, attention heads. | Image/video classification, speech recognition, small‑scale sequence/language modeling. | Learns features automatically; smooth function approximation. | Shallow nets struggle with very complex hierarchies; often need lots of data. |
| Deep Learning | Deep (many‑layer) neural networks that can learn hierarchical representations. | CNNs, RNNs/LSTMs, Transformers, Graph Neural Networks, Diffusion models. | Vision (object detection, segmentation), NLP (translation, chat), speech‑to‑text, game‑playing. | State‑of‑the‑art accuracy on perception tasks; scales with compute & data. | Data‑hungry, opaque (hard to explain), expensive to train/infer. |
| Generative AI | Models that create new data (text, images, code, music, etc.) rather than just label it. | Autoregressive Transformers (GPT‑x), diffusion models (Stable Diffusion, DALL·E), VAEs, GANs. | Content creation, data augmentation, code synthesis, design, simulation. | Produces novel, high‑fidelity outputs; can be finetuned for many domains. | Hallucinations, lack of factual grounding, bias, copyright concerns. |
| Agentic AI | An autonomous “agent” that decides, plans, and acts in an environment to achieve goals (often using DL/LLM + tool use). | Reinforcement‑learning agents, LLM‑driven agents (ChatGPT‑plugins, ReAct, AutoGPT), multi‑agent coordination frameworks. | Browsing the web, executing code, orchestrating APIs, robotic control, game‑playing, automated workflows. | Can “reason” over tools and data, perform multi‑step tasks without human micromanagement. | Safety/alignment challenges, unpredictable behavior, high compute cost, requires reliable external tools. |
In-Depth Look at Each Paradigm
Classical AI
Idea: Intelligence can be captured by manipulating symbols and logical rules.
- Key ingredients: Knowledge bases, ontologies, rule engines, logical inference, search algorithms (A*, Dijkstra), planning (STRIPS, PDDL).
- Classic example: MYCIN (1970s medical expert system) that used if‑then rules to diagnose infections.
- Strengths: Fully transparent; works with very little data; easy to audit.
- Weaknesses: Brittle when faced with noisy or unseen situations; hard to scale to perception‑heavy tasks (vision, speech).
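The if‑then style of MYCIN can be sketched as a tiny forward‑chaining rule engine. The rules and fact names below are hypothetical, purely for illustration:

```python
# A minimal forward-chaining rule engine, MYCIN-style: when every
# antecedent fact of a rule is known, its consequent is added as a fact.
# Rules and fact names are made up for illustration.
RULES = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis", "gram_negative"}, "suggest_antibiotic_x"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose antecedents are all known facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

derived = forward_chain({"fever", "stiff_neck", "gram_negative"}, RULES)
```

Note how chaining works: the first rule's conclusion becomes an antecedent of the second, which is exactly the transparency (and the brittleness) described above — every conclusion can be traced back to explicit rules, but an unanticipated fact simply fires nothing.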
Machine Learning
Idea: Let a computer learn a mapping from inputs → outputs by optimizing a loss function on data.
- Categories: Supervised, unsupervised, semi‑supervised, reinforcement learning.
- Typical algorithms: Logistic regression, decision trees, random forests, SVMs, k‑means, Q‑learning.
- When it shines: Structured/tabular data, moderate data volumes, problems where interpretability matters.
- Limitations: Requires hand‑crafted features; performance caps on raw high‑dim data (images, raw text).
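As a sketch of loss‑driven learning, here is logistic regression trained by plain gradient descent on a hypothetical 1‑D toy dataset — no library, just the math:

```python
import math

# Minimal logistic regression trained by gradient descent on a toy
# 1-D dataset (illustrative numbers only). Labels flip around x ≈ 2.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(w * x + b)))   # sigmoid prediction
        gw += (p - y) * x                       # gradient of the log-loss
        gb += (p - y)
    w -= lr * gw / len(xs)
    b -= lr * gb / len(xs)

def predict(x):
    """Classify x using the learned weights."""
    return 1 / (1 + math.exp(-(w * x + b))) > 0.5
```

The loop is the whole paradigm in miniature: a parametric model, a differentiable loss, and repeated gradient steps. Everything from SVMs to deep networks changes the model and loss, not this basic recipe.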
Neural Networks
Idea: A network of simple computational units (neurons) arranged in layers can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units (Universal Approximation Theorem).
- Core parts: Input layer → hidden layers (weights + non‑linearities) → output layer.
- Common variants: Fully‑connected (MLP), convolutional layers (CNNs) for spatial locality, recurrent layers (RNN/LSTM/GRU) for sequences, attention heads.
- Strengths: Learns features automatically; flexible function approximator.
- Typical failure mode: Shallow nets struggle with complex hierarchical patterns; they still need a decent amount of data.
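A classic illustration of why hidden layers matter: XOR is not linearly separable, so a single perceptron cannot compute it, but one hidden layer of two units can. The weights below are hand‑set for clarity rather than learned:

```python
def step(z):
    """Threshold non-linearity, the original perceptron activation."""
    return 1 if z > 0 else 0

def mlp_xor(x1, x2):
    """XOR via two hidden units (an OR detector and an AND detector)
    feeding one output unit. Weights are hand-set for illustration;
    in practice they would be learned by backpropagation."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR but not AND = XOR
```

The hidden layer re-represents the inputs (OR-ness, AND-ness) so that the output unit faces a linearly separable problem — a two-line demonstration of why composing layers with non-linearities adds expressive power.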
Deep Learning
Idea: Use many‑layer neural networks to learn hierarchical feature representations automatically.
- Why “deep” matters: Early layers capture low‑level patterns (edges, n‑grams); deeper layers capture high‑level concepts (objects, syntax).
- Landmark breakthroughs: AlexNet (2012, ImageNet), Transformers (Vaswani et al., 2017), BERT/GPT families (NLP), Diffusion models (image generation).
- Typical tasks: Image classification, object detection, speech‑to‑text, language understanding, game‑playing.
- Pros: State‑of‑the‑art accuracy on perception tasks; scales with compute & data.
- Cons: Data‑hungry, opaque, expensive to train/infer.
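The “early layers capture low‑level patterns” idea can be illustrated with the basic building block of a CNN: a 1‑D convolution acting as an edge detector. The signal and kernel are toy values, and the kernel is hand‑picked rather than learned:

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in CNN layers):
    slide the kernel over the signal and take dot products."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A finite-difference kernel fires exactly at the step in the signal,
# i.e. it detects an "edge" - the kind of low-level feature early
# CNN layers learn on their own.
signal = [0, 0, 0, 1, 1, 1]
edges = conv1d(signal, [-1, 1])
```

Stacking such layers — edges into textures into parts into objects — is what “hierarchical representation” means; the depth lets later layers build on features detected earlier.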
Generative AI
Idea: Instead of just labeling, the model learns to sample from the data distribution, producing new, plausible examples.
- Two broad families:
- Autoregressive (e.g., GPT‑3, Codex) – predict next token conditioned on previous tokens.
- Latent/denoising (VAEs, diffusion models) – learn a latent space and decode/denoise to synthesize data.
- Key applications: Chatbots, code assistants, image generation (Stable Diffusion, DALL·E), music, synthetic data for simulation.
- Strengths: Produces novel, high‑fidelity outputs; can be fine‑tuned for many domains.
- Risks / limits: Hallucinations, lack of factual grounding, bias, copyright / IP concerns.
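Autoregressive generation in miniature: a bigram model counts next‑token frequencies in a toy corpus, then samples one token at a time conditioned on the previous one. Real models replace the count table with a Transformer, but the sampling loop has the same shape. The corpus here is illustrative only:

```python
import random
from collections import Counter, defaultdict

# Toy bigram "language model": count next-token frequencies, then sample
# autoregressively - each new token is conditioned on the one before it.
corpus = "the cat sat on the mat the cat ran".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, n, rng):
    """Sample up to n tokens, one at a time, from the bigram counts."""
    tokens = [start]
    for _ in range(n):
        options = counts[tokens[-1]]
        if not options:          # token never seen with a successor
            break
        words, freqs = zip(*options.items())
        tokens.append(rng.choices(words, weights=freqs)[0])
    return tokens

sample = generate("the", 5, random.Random(0))
```

Even this toy shows the paradigm's signature behavior: output is fluent locally (every bigram was seen in training) but has no grounding beyond the statistics — the seed of what we call hallucination at scale.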
Agentic AI
Idea: An autonomous “agent” that decides, plans, and acts in an environment to achieve a goal – often built on top of large language models (LLMs) with tool‑use capabilities.
- Typical architecture:
- Goal formulation (LLM or planner).
- Planning / reasoning (ReAct chain‑of‑thought, explicit planner modules).
- Tool use (web browsing, code execution, database queries, robot actuators).
- Feedback loop (observe outcome, adjust plan).
- Research threads: Reinforcement‑learning agents (AlphaGo, OpenAI Five), LLM‑driven agents (AutoGPT, BabyAGI, LangChain agents), multi‑agent systems (cooperative/competitive societies).
- When to use: Scenarios requiring multi‑step, self‑service workflows – e.g., “plan a trip, book flights, generate an itinerary”, autonomous research assistants, robotic control.
- Challenges: Safety & alignment, unpredictable behavior, high compute cost, reliance on stable external tools, need for robust monitoring.
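The plan → act → observe loop above can be sketched with stub “tools” standing in for real APIs. The tool names and the fixed plan are hypothetical; a real agent would let an LLM choose each next step from its accumulated observations:

```python
# Toy agent loop: execute a plan step by step via "tools", collecting
# observations. Tool names and the fixed plan are hypothetical stand-ins
# for an LLM planner plus real APIs (web search, code execution, ...).
TOOLS = {
    "search": lambda query: f"results for {query!r}",
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(plan, max_steps=10):
    """Act on each (tool, argument) step and record what was observed."""
    observations = []
    for tool, arg in plan[:max_steps]:   # cap steps to avoid runaway loops
        observations.append(TOOLS[tool](arg))
    return observations

obs = run_agent([("search", "flight price"), ("calculate", "120+80")])
```

The `max_steps` cap and the explicit tool registry hint at the monitoring and safety machinery real deployments need: an agent that can call arbitrary tools in a loop is exactly where unpredictable behavior and unsafe tool usage arise.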
When to Use Which Paradigm?
| Dimension | Classical AI | Shallow ML / Traditional ML | Deep Learning | Generative AI | Agentic AI |
|---|---|---|---|---|---|
| Primary output | Decision, plan, logical conclusion | Predicted label/value | Predicted label/value (or latent vector) | New data (text, image, code, sound…) | Sequence of actions (including tool calls) |
| Data requirement | Very little (knowledge engineered) | Moderate (hundreds‑thousands examples) | Large (≥ 10⁴–10⁶ examples) | Large (same as DL) | Large for underlying model + optional environment data |
| Explainability | High (rules explicit) | Medium (feature importances, tree paths) | Low‑Medium (layerwise visualizations) | Low (latent space opaque) | Low (LLM reasoning is stochastic) |
| Compute cost (training) | Minimal | Low‑moderate | High (GPU/TPU clusters) | High (large LLMs or diffusion pipelines) | Very high (LLM + tool‑execution environment) |
| Best use‑cases | Formal reasoning, compliance, low‑data domains | Tabular business analytics, quick prototypes | Vision, speech, raw‑text understanding | Creative content, data augmentation, design | Autonomous assistants, automated workflows, robotic control |
| Typical failure mode | Inflexibility to unseen situations | Bad feature engineering, over‑fitting | Bias, hallucination, distribution shift | Nonsense generation, unsafe output | Goal‑drift, unsafe tool usage, unpredictable loops |