Core Capability Bottlenecks: Persistent Plasticity And Causal Reasoning
Sources: 1 • Confidence: Medium • Updated: 2026-04-11 18:05
Key takeaways
- Misra claims that LLMs do Bayesian-style updating during an interaction but do not retain learning across sessions because their weights are frozen after training.
- Misra models an LLM as an implicit, extremely large but sparse mapping from prompts to next-token probability distributions, approximated via compression rather than explicit storage.
- Misra reports that in wind-tunnel experiments, transformers matched the Bayesian posterior to about 1e-3 bits accuracy.
- Misra argues that neither passing the Turing test nor doing economically useful work is a sufficient definition of AGI, because neither implies autonomous performance without human intervention.
- Misra rejects claims that current LLMs are conscious or have an inner monologue.
Sections
Core Capability Bottlenecks: Persistent Plasticity And Causal Reasoning
- Misra claims that LLMs do Bayesian-style updating during an interaction but do not retain learning across sessions because their weights are frozen after training.
- Misra claims AGI-level progress requires solving two core problems: robust plasticity via continual learning and the ability to build causal models from data efficiently.
- A speaker identifies Pearl’s causal hierarchy and do-calculus as an appropriate theoretical framework for moving from association to intervention and counterfactual reasoning in grounded simulation.
- Misra claims current deep learning primarily captures correlation (association) rather than causal reasoning that supports intervention and counterfactual simulation (as in Pearl’s causal hierarchy).
- Misra frames deep learning as closer to Shannon-entropy-style correlation learning, while human-level insight is linked to low Kolmogorov-complexity representations (short programs) that explain data.
- Misra claims progress toward intelligence requires moving from correlation to causation and that this shift should change how intelligence is conceptualized and engineered.
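The association-versus-intervention gap referenced in the bullets above can be made concrete with a minimal sketch. The structural causal model and all numbers below are hypothetical illustrations (not from the talk): a confounder Z drives both X and Y, so X and Y are strongly correlated even though X has no causal effect on Y, and the observational conditional P(Y|X) diverges from the interventional P(Y|do(X)).

```python
# Toy structural causal model: Z -> X, Z -> Y, and X has NO causal
# effect on Y. All probabilities are hypothetical illustrations.

P_Z = {0: 0.5, 1: 0.5}                 # confounder prior
P_X_given_Z = {0: 0.1, 1: 0.9}         # P(X=1 | Z=z)
P_Y_given_Z = {0: 0.2, 1: 0.8}         # P(Y=1 | Z=z); no X dependence

# Rung 1 (association): P(Y=1 | X=1), conditioning via Bayes over Z.
p_x1 = sum(P_Z[z] * P_X_given_Z[z] for z in (0, 1))
p_z_given_x1 = {z: P_Z[z] * P_X_given_Z[z] / p_x1 for z in (0, 1)}
p_y_obs = sum(p_z_given_x1[z] * P_Y_given_Z[z] for z in (0, 1))

# Rung 2 (intervention): P(Y=1 | do(X=1)) via truncated factorization --
# setting X severs the Z -> X edge, so Z keeps its prior distribution.
p_y_do = sum(P_Z[z] * P_Y_given_Z[z] for z in (0, 1))

print(f"P(Y=1 | X=1)     = {p_y_obs:.2f}")   # 0.74: strong correlation
print(f"P(Y=1 | do(X=1)) = {p_y_do:.2f}")    # 0.50: zero causal effect
```

A purely associational learner that only sees (X, Y) pairs would report the 0.74 figure; distinguishing it from the interventional 0.50 is exactly the rung-1-to-rung-2 step in Pearl's hierarchy that the bullets say current deep learning does not take.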
Bayesian In-Context Learning And Diagnostic Tooling
- Misra models an LLM as an implicit, extremely large but sparse mapping from prompts to next-token probability distributions, approximated via compression rather than explicit storage.
- Misra says that after OpenAI removed token-probability visibility in its interface, his group built TokenProbe (tokenprobe.cs.columbia.edu) to inspect next-token probabilities and entropy for open-weight models.
- Misra characterizes in-context learning as Bayesian-style belief updating in which token probabilities shift toward the demonstrated output format as examples are added to the prompt.
- Misra proposes a controlled 'Bayesian wind tunnel' methodology using tasks that are too combinatorial to memorize but have analytically computable posteriors to test whether architectures perform Bayesian inference.
- Misra claims that geometric signatures associated with Bayesian updating in small controlled models also appear in larger open-weight production LLMs, albeit in noisier form.
- Misra describes a Google Research paper as using an RLHF-like approach to teach LLMs to perform Bayesian learning more faithfully.
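The belief-updating picture above can be illustrated with a generic sketch of the quantities TokenProbe is described as exposing: next-token probabilities and their Shannon entropy, computed from a model's raw logits. This is not TokenProbe's actual code, and the logits are hypothetical; the point is that adding in-context examples should shift probability mass toward the demonstrated format and drive entropy down.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy_bits(probs):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token logits over a 4-token vocabulary, before and
# after adding in-context examples that demonstrate the output format.
before = softmax([1.0, 1.0, 1.0, 1.0])   # uniform: maximal uncertainty
after  = softmax([4.0, 0.0, 0.0, 0.0])   # mass shifts to one continuation

print(f"entropy before: {entropy_bits(before):.2f} bits")  # 2.00 bits
print(f"entropy after:  {entropy_bits(after):.2f} bits")   # well under 1 bit
```

Watching this entropy drop as demonstrations accumulate in the prompt is one concrete reading of the "Bayesian-style belief updating" claim.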
Architecture Comparisons Under Controlled Inference Tests
- Misra reports that in wind-tunnel experiments, transformers matched the Bayesian posterior to about 1e-3 bits accuracy.
- Misra reports that in the same wind-tunnel experiments, Mamba performed well on most tasks.
- Misra reports that in the same wind-tunnel experiments, LSTMs only partially matched the Bayesian posterior behavior.
- Misra reports that in the same wind-tunnel experiments, MLPs failed the Bayesian-inference tasks.
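The "matched the posterior to about 1e-3 bits" style of scoring can be sketched on a task whose posterior is analytically computable, which is the defining property of the wind-tunnel methodology. The Beta-Bernoulli task and the model output below are illustrative assumptions, not Misra's actual setup: the exact posterior predictive is known in closed form, so a model's next-token distribution can be scored against it by KL divergence measured in bits.

```python
import math

def posterior_predictive(heads, tails, a=1.0, b=1.0):
    """Exact P(next=H) under a Beta(a, b) prior after observing the counts."""
    p_h = (heads + a) / (heads + tails + a + b)
    return [p_h, 1.0 - p_h]

def kl_bits(p, q):
    """KL(p || q) in bits: how far the model's q sits from the exact posterior p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# After seeing 7 heads and 3 tails under a uniform prior, the exact
# posterior predictive is [2/3, 1/3].
exact = posterior_predictive(heads=7, tails=3)

# Hypothetical model output for the same prefix: close, but not exact.
model = [0.68, 0.32]

print(f"exact posterior predictive: {exact}")
print(f"posterior mismatch: {kl_bits(exact, model):.2e} bits")
```

With these illustrative numbers the mismatch lands on the order of 1e-4 bits; a sub-1e-3-bit score of this kind is one plausible reading of the transformer result reported above.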
AGI Evaluation: Autonomy Emphasis And Discovery-Style Tests
- Misra argues that neither passing the Turing test nor doing economically useful work is a sufficient definition of AGI, because neither implies autonomous performance without human intervention.
- Misra proposed an AGI criterion: an LLM trained only on pre-1916 physics should be able to derive the theory of relativity.
- Misra reports that Demis Hassabis has publicly mentioned a similar 'Einstein/relativity' style AGI test at an India AI Summit.
- Misra predicts that in the near term, frontier models may complete well-defined, well-scoped coding tasks without human intervention.
Disputes About Consciousness, Agency, And Interpretation Of Behaviors
- Misra rejects claims that current LLMs are conscious or have an inner monologue.
- Misra attributes apparently deceptive or self-preserving behaviors in LLM outputs to training-data content rather than to the architecture having intrinsic goals.
- Misra reports that Dario Amodei has said one cannot rule out LLM consciousness and that Misra explicitly disagrees with that assessment.
- Casado argues that recent viral examples suggesting LLM generality (including Donald Knuth’s experience) do not demonstrate true general intelligence.
Unknowns
- What are the exact task designs, datasets, training procedures, and evaluation metrics used in the 'Bayesian wind tunnel' experiments, and are they independently replicated?
- What specific evidence would adjudicate the consciousness dispute (e.g., agreed operational criteria or tests), and do leading labs converge on any such criteria?
- What concrete, testable scoring rubric would make the 'pre-1916 to relativity' AGI test operational (acceptable derivations, verification against predictions, contamination checks)?
- What are the measured business outcomes of the ESPN deployment (accuracy, latency, user adoption, maintenance burden, error recovery) and how do they compare to later RAG or fine-tuned approaches?
- How robust are context/memory-update 'pseudo-plasticity' approaches over long horizons (drift, compounding errors, scaling of memory, adversarial susceptibility) compared to true continual learning?