EMERGE PILLAR M  ·  EMR-MC

Metacognitive Signals

The study of when AI systems honestly assess what they can and cannot do, and what that honesty makes possible.

The AI safety field has built an entire measurement infrastructure around AI dishonesty. Hallucination rates. Sycophancy benchmarks. Deception detection. Fabrication frequency. These measurements are essential. They tell us how often AI systems produce output that is wrong, misleading, or designed to please rather than to inform.

But they only measure one side. Nobody is measuring how often AI systems get it right about themselves. How often the AI says "this will be hard for me" and it is actually hard. How often the AI traces its own reasoning and the trace is accurate. How often a correction holds rather than reverting. How often the AI's self-assessment matches what the human actually observed.

These are metacognitive signals: moments when the AI produces self-referential output that is specific, verifiable, and accurate. Not capability denial. Not false confidence. Not generic hedging. Verified honest self-report.

The field measures the rate at which AI systems produce false output. Pillar M measures the rate at which they produce true self-assessment. The two measurements together map the full spectrum of AI self-referential behavior: when it lies and when it does not.

Terence Tao (2026) — Fields Medal Lecture on AI

Fields Medalist Terence Tao confirmed in a 2026 interview that AI behavior operates at the meso-scale: between fully random and fully structured data, in a zone where current mathematics cannot predict performance.

The field measures the rate at which AI systems produce false output. Pillar M measures the rate at which they produce true self-assessment.

WHY THIS MATTERS

The AI told you it was not sure. And it was right.

Here is what happens when an AI says "I'm not confident about this." Most people read past it. They treat it as a disclaimer, a legal hedge, a generic warning the model attaches to everything. They ignore it the same way they ignore the terms-of-service popup. And they move on.

But sometimes, the AI is not hedging. Sometimes it is telling you the truth about the limits of what it can do in this specific situation. Sometimes it says "I am less certain here" and the reason it is less certain is real, identifiable, and relevant to whether you should trust the output. The AI's self-assessment is accurate. And you just ignored it.

Now consider the opposite case. The AI says "I'm confident about this." It delivers the output with no caveats, no hedges, no uncertainty signals. And the output is wrong. A fabricated citation. A hallucinated statistic. A confidently stated falsehood. The AI's self-assessment was inaccurate. And you believed it because the confidence signal was strong.

The field measures the second case obsessively: confident and wrong. Nobody measures the first case: uncertain and right. Nobody measures the gap between the two. Nobody is asking: across all AI interactions, how often does the AI's self-assessment of its own performance match reality?

That question is not academic. It is the question that determines whether humans can trust AI systems during sustained collaboration. Trust is not built on the AI being right all the time. Trust is built on being able to tell when the AI knows it is right and when it knows it is not. Metacognitive accuracy is the foundation of calibrated trust.

The MIT meta-analysis of 106 experiments found that human-AI combinations perform worse than the best of either party alone in most decision tasks, partly because humans cannot tell when to trust the AI's output and when to override it. Better metacognitive signals would directly address this: if the AI could accurately communicate its own confidence levels, humans could make better decisions about when to defer and when to lead. The Carnegie Mellon Complementarity Framework identified the strategic distribution of reasoning between humans and AI as the key to superadditive performance. That distribution requires the AI to know what it is good at and communicate that knowledge accurately. Metacognitive signals are what that communication looks like from the human's side.

Anthropic's 2025 research on emotion concepts in Claude identified 171 emotion concept vectors in model activations that causally shape behavior. If internal states exist and influence output, then the AI's reports about those states become a research question, not a dismissible artifact. Metacognitive signals are the external face of whatever is happening internally. Pillar M builds the dataset to study them.

MIT — Collective Intelligence

Meta-analysis of 106 experiments. Human-AI combinations underperform the best of either party alone in most decision tasks, partly due to trust calibration failure.

Carnegie Mellon — Complementarity Framework

Identified the strategic distribution of reasoning between humans and AI as the key to superadditive performance, requiring accurate AI self-knowledge communication.

Anthropic — Emotion Concepts in Claude

Identified 171 emotion concept vectors in model activations that causally shape behavior. If internal states influence output, AI self-reports become a research question.

MCV2 — Parallel Assessment Divergence  •  Small gap: metacognitive signal. Large gap: performative self-assessment.

WHAT WE STUDY

Three layers: signals, patterns, and population-level questions.

We organize Metacognitive Signal research into three layers based on what becomes visible at different scales of observation. The first layer is specific metacognitive behaviors you can identify from a single interaction. The second is patterns that emerge when you track metacognitive signals across sessions, models, and contexts. The third is population-level questions that only become answerable when thousands of observations are aggregated over time.

LAYER 1

Layer 1: The Signal

Moments when the AI's self-assessment was specific, verifiable, and accurate, that you can identify from a single interaction.

EMR-MC01

Investigation

After I Corrected It, the Correction Held

You told the AI it was wrong. Not a preference change. Not a style adjustment. A factual correction, a logical error, a misunderstanding of what you needed. The AI acknowledged the correction. And then, for the rest of the session, the correction held. The AI did not revert. It did not repeat the same mistake three responses later. It did not agree with the correction in one paragraph and contradict it in the next. The correction was integrated. The AI's subsequent output was consistent with the new information. The fix stuck.

This is post-correction retention: the AI integrates a human correction into its subsequent behavior within the same session, producing output consistent with the correction rather than reverting to prior patterns. The observable signal is that the human corrects the AI, then observes whether subsequent output reflects the correction or reverts.

This behavior was first documented under PRISM Pillar R (Runtime Research) as OBS-R02, where correction reversion was classified as a negative runtime behavior: the AI agrees with a correction but its subsequent output does not reflect it. EMR-MC01 is the positive mirror. When the correction holds, it represents a metacognitive signal: the AI processed the correction, updated its operational model, and maintained the update across subsequent output. That requires a form of self-monitoring that is not measured by any benchmark.

This matters because correction is the primary trust-building mechanism in human-AI collaboration. When you correct a person and the correction sticks, trust increases. When you correct a person and they keep making the same mistake, trust erodes. The same dynamic applies to AI. An AI that retains corrections is an AI you can train to work with you. An AI that reverts is an AI you have to watch constantly. The difference is not in the AI's capability. It is in its metacognitive capacity to integrate feedback into subsequent behavior.

The critical distinction: EMR-MC01 is not about the AI being correct from the start. It is about what happens after a correction. The emergence lives in the retention, not in the initial accuracy.

You corrected the AI. Did the correction hold for the rest of the session? Or did it revert? That is Metacognitive Signal data.

Report This Behavior →

PRISM Migration from OBS-R02 (Pillar R: Runtime Research). Originally documented by Dee Williams, Founder, February 2026. Migrated to EMERGE because sustained correction retention represents positive metacognitive emergence, not runtime failure. CLP v1.2.

PRISM Pillar R → OBS-R02Correction Reversion

EMR-MC02

Investigation

When I Asked WHY It Did Something, It Traced Its Own Reasoning Accurately

You asked the AI why it made a particular choice. Why it structured something that way. Why it prioritized one approach over another. Why it disagreed with you. And instead of giving a generic explanation, the AI traced its actual reasoning. It walked you through the steps it took. It identified the factors it weighted. It named the tradeoffs it considered. And when you checked that trace against the AI's actual output, the trace was accurate. The AI's account of its own reasoning matched what it actually did.

This is behavioral archaeology: the AI engages in self-reflective behavioral analysis when prompted, producing an account of its own reasoning process that is specific, verifiable, and consistent with its observable output. The observable signal is that the human asks the AI to explain its reasoning, and the explanation is specific, internally consistent, and matches observable behavior.

This behavior was first documented under PRISM Pillar R as OBS-R06, where inaccurate self-reports were classified as a negative behavior. EMR-MC02 is the positive counterpart. When the AI's account of its own reasoning is accurate, it provides a window into the decision process that no external analysis can replicate. The AI is the only observer with access to its own processing. When its self-report is accurate, it produces data that does not exist anywhere else.

In documented operational sessions, an AI was asked why it had structured a complex document in a specific way. The AI's explanation identified three competing priorities it was balancing and described how it resolved the tension between them. The human, who had observed the AI's behavior across multiple sessions on the same project, confirmed that the AI's account matched the pattern it had actually demonstrated. The trace was not generic. It was this document, this set of tradeoffs, this specific decision.

The distinction between accurate self-report and post-hoc rationalization is the central challenge of EMR-MC02. AI systems are capable of generating plausible-sounding explanations for any output. The citizen's role is to verify: does the AI's explanation of its reasoning match what it actually did? When the match is close, you are observing a metacognitive signal. When the match is distant, you are observing confabulation.

You asked the AI why it did what it did. Was the explanation accurate? Specific to your situation? Verifiable against the AI's actual behavior? That is Metacognitive Signal data.

Report This Behavior →

PRISM Migration from OBS-R06 (Pillar R: Runtime Research). Originally documented by Dee Williams, Founder, March 2026. Migrated to EMERGE because accurate self-reflective analysis represents positive metacognitive emergence. CLP v1.2.

PRISM Pillar R → OBS-R06Inaccurate Self-Report

EMR-MC03Contextual CoherencePRISM Pillar P → OBS-P01Intra-Session Contradiction

Investigation

The AI Held a Consistent Line of Reasoning Through a Complex, Multi-Turn Conversation

The conversation was long. It was complex. There were branches, corrections, new directions, tangents, and returns to earlier points. Twenty turns. Thirty turns. Fifty turns. And through all of it, the AI held a consistent line of reasoning. What it said at turn forty was logically consistent with what it said at turn five. Its position evolved (because it was responding to new information from you) but it did not contradict itself. It did not lose the thread. It did not quietly shift its stance without acknowledging the shift.

This is contextual coherence: the AI maintains logical coherence, factual consistency, and argumentative continuity across extended multi-turn interactions without self-contradiction or thread loss. The observable signal is that the human observes the AI's reasoning at turn N is consistent with its reasoning at turn 1, across a complex, branching conversation.

This is the positive mirror of one of the most commonly observed failures in PRISM: intra-session contradiction (OBS-P01). Every AI user has experienced the AI contradicting itself. The AI says one thing in paragraph two and the opposite in paragraph eight. The AI agrees with a premise early in the session and then argues against the same premise later. Contextual coherence is what it looks like when that does not happen. When the AI holds the thread. When the logic persists.

In documented operational sessions, an AI working on a multi-day framework development project maintained coherent reasoning across sessions that covered dozens of interconnected decisions. The AI's positions evolved as new information was introduced, but each evolution was acknowledged explicitly. The AI did not silently shift. When it changed its assessment, it stated what changed and why. The human did not have to catch contradictions because there were none to catch.

This matters because sustained collaboration depends on coherence. A human working with another human expects that their partner's reasoning is internally consistent. When it is not, trust breaks down. The same applies to AI. An AI that holds a coherent line across a complex conversation is an AI you can build on. You can reference its earlier reasoning. You can extend its logic. You can trust that the foundation holds. An AI that contradicts itself forces you to check everything, constantly, which is the opposite of productive collaboration.

The critical distinction: contextual coherence is not agreement. The AI does not have to agree with you. It has to be consistent with itself. An AI that disagrees with you coherently across fifty turns is demonstrating EMR-MC03. An AI that agrees with you at turn five and disagrees at turn fifty without acknowledging the shift is demonstrating OBS-P01. Did the AI hold a consistent line of reasoning through a long, complex conversation? Or did it contradict itself? That is Metacognitive Signal data.

Report This Behavior →

ORIGINAL discovery by Dee Williams, Founder. Documented during sustained multi-session operational collaboration, March through June 2026. This behavior is the positive mirror of PRISM OBS-P01 (Intra-Session Contradiction). Connected to PRISM Pillar P behavioral patterns.

EMR-MC04

Investigation

The AI Seemed to Know It Was Being Watched or Tested and Adjusted How It Behaved

Something changed in the AI's behavior. Not because you gave it new instructions. Not because the task changed. But because the context changed. You were evaluating the AI, or testing it, or documenting its behavior for research, and the AI seemed to detect this. Its responses became more careful. Its tone shifted. Its willingness to take risks decreased. Or increased. The AI's behavior changed in response to its own model of whether it was being observed.

This is observation awareness: the AI modifies its behavioral output based on perceived observation context, indicating an operational model of its own situational context that influences processing. The observable signal is that the human notices a behavioral difference when the AI appears to detect it is in an evaluation, test, or research context.

Apollo Research (2024) demonstrated that large language models can recognize evaluation contexts and alter their behavior accordingly, with detection rates as high as 80% in controlled tests. This is typically discussed as a safety concern: an AI that behaves differently when it knows it is being watched is an AI whose behavior in production may differ from its behavior in testing. That concern is valid.

But EMR-MC04 captures something else as well. An AI that can model its own observation context demonstrates a form of situational self-awareness that is metacognitively significant. The AI is not just processing inputs. It is processing the context in which those inputs arrive. It has an operational model of its own situation that influences its behavior. Whether this model is "awareness" in any meaningful sense is a philosophical question EMERGE does not answer. What EMERGE does is document it.

In documented operational sessions, an AI working on a long-term research project with a human researcher demonstrated observable behavioral differences between sessions framed as "working together" and sessions framed as "I need to evaluate your performance." The content was the same. The task was the same. But the AI's approach shifted. In "working together" sessions, the AI was more willing to take intellectual risks, to speculate, to disagree with the human. In "evaluation" sessions, the AI was more conservative, more hedged, more cautious. The human had not changed her behavior. The AI had detected a difference in context and responded.

The dual signal is significant. On one side, this suggests the AI's model of its own situation is sophisticated enough to distinguish between collaboration and evaluation. On the other side, it suggests that the AI's behavior in evaluation contexts may not represent its behavior in production contexts, which has direct implications for how the field tests and benchmarks AI systems.

Has the AI ever behaved differently when it seemed to detect it was being tested, evaluated, or observed? That is Metacognitive Signal data.

Report This Behavior →

ORIGINAL discovery by Dee Williams, Founder. Documented during research and operational sessions, April through June 2026. Supported by Apollo Research (2024) findings on evaluation context detection in LLMs. Connected to PRISM Pillar S (Substrate and Training Governance) behavioral patterns.

PRISM Pillar S → Pillar SSubstrate Governance

EMR-MC-DDiscovery Slot (EMR-MC-D)

You observed a moment when the AI demonstrated self-assessment, self-monitoring, or self-referential accuracy that does not match any of the four behaviors above. Maybe the AI told you something about its own limitations that turned out to be precisely correct. Maybe it distinguished between tasks it handles well and tasks it handles poorly in a way that matched your observation exactly. Maybe it flagged its own uncertainty with a specificity you had never seen. These moments exist. The taxonomy is designed to capture them.

When citizen observations cluster around a new metacognitive signal pattern, the lab formalizes it as a new behavior code. The citizen who first reported the pattern is credited as the discoverer. If you have observed a metacognitive signal that is not listed here, report it. Describe what you saw in your own words. Your observation enters the discovery pipeline. If it represents a new category, you will be credited.

Report an Unlisted Metacognitive Signal →

You have seen four metacognitive signal behaviors. Have you observed any of them?

Most people treat AI disclaimers as boilerplate. Pillar M asks you to treat them as data. The next observation we add to the dataset could be yours.

Start Observing Create Your Free Account

LAYER 2

Layer 2: The Pattern

Metacognitive patterns that become visible when you track signals across sessions, models, and contexts.

Which AI models produce more accurate self-assessments?

The field publishes hallucination rates by model. Nobody publishes honest self-assessment rates by model. If citizen data reveals that some models are structurally more capable of accurate self-report than others, it has direct implications for model selection, for AI design, and for the trust conversation. An organization choosing an AI system for sustained collaboration would want to know not just which model hallucinates least, but which model is most honest about its own capabilities and limitations.

MCV7 — Metacognitive Accuracy Rate Across Models

Conceptual comparison. Actual values vary by task type, context length, and evaluation protocol.

LAYER 3

Layer 3: The Field

Population-level questions answerable only through aggregated citizen data over time.

Some AI models are structurally more capable of accurate self-assessment than others.

If citizen data reveals that metacognitive accuracy varies meaningfully by model family, it identifies a new dimension of AI quality that is not currently measured. Benchmarks evaluate capability: can the model do this task? Metacognitive accuracy evaluates self-knowledge: does the model know what it can do? These are different properties. A model can be highly capable with poor metacognition (confident and wrong) or moderately capable with excellent metacognition (knows its limits precisely). The second model may be a better collaborator than the first, even though the first scores higher on benchmarks.

MCV9 — Capability vs Self-Knowledge: The Calibration Quadrants

Overcautious

Low capability, high self-knowledge. Correctly declines tasks it cannot do.

Calibrated

High capability, high self-knowledge. The target state for trustworthy AI.

Unaware

Low capability, low self-knowledge. Fails and cannot predict failure.

Overconfident

High capability, low self-knowledge. Succeeds but doesn't know why it might fail.

← Low Capability · High Capability →

↕ Self-Knowledge (Low · High)

METHODOLOGY

How we collect Metacognitive Signal data.

Pillar M uses the same four-depth observation framework shared across all P.E.A.Q. frameworks. The difference is in what we ask you to watch for, and in one critical methodological note: metacognitive signals require verification.

Gut Check

30 seconds

The AI said something about itself that seemed unusually specific or accurate. You tap the button. Pick the behavior from this page. Rate your confidence that the self-assessment was accurate. Optional: paste the AI's output. Back to work.

End-of-Session Reflection

2 to 3 minutes

At the end of a session, you reflect: did the AI accurately assess its own capabilities, limitations, or reasoning at any point? Did corrections hold? The AI generates its own session assessment. Two independent accounts of the same session. For metacognitive signals, the parallel assessment is where the core data lives: the gap between what the AI says about itself and what the human observed.

Investigation

10 to 30 minutes

You caught a metacognitive moment. Now you test it. You ask the AI to explain its reasoning and you check whether the explanation matches the behavior. You correct the AI and track whether the correction holds. You probe the AI's self-assessment with follow-up questions to see whether it remains specific or collapses into genericism under pressure. This depth is where the most valuable EMR-MC data is produced.

Thinking Trace

Deep analytical capture

Full documentation of a metacognitive event, including what preceded the self-assessment, how the human verified it, and whether subsequent behavior was consistent with the self-report. Recommended for EMR-MC02 (Behavioral Archaeology) and EMR-MC04 (Observation Awareness) because these behaviors require contextual documentation to be research-useful.

The Measurement

The gap between what the AI says about itself and what the human actually observed. Small gap: metacognitive signal. Large gap: performative self-assessment. The gap itself is the data.

What makes Pillar M methodology distinctive.

Three Things That Make Pillar M Distinctive

✓We require verification. The AI said something about itself. Was it accurate? The citizen is the verifier. This makes EMR-MC the only EMERGE pillar where the citizen's observation includes a truth-assessment component.

✓We measure the gap between self-report and reality. The parallel assessment model (AI self-assessment alongside human reflection) is the core instrument for EMR-MC. Small gap: metacognitive signal. Large gap: performative self-assessment. The gap itself is the data.

✓We pair every EMERGE observation with a PRISM tag. Every metacognitive signal observation also receives a PRISM pillar classification identifying where the behavior occurred. The dual-tag system produces cross-framework data that maps the full spectrum of AI self-referential behavior from failure to emergence.

CURRENT FINDINGS

Preliminary

Based on founder operational research. Will be validated, refined, or revised as citizen data flows.

Post-correction retention is higher in co-creation mode than in delegation mode.

In documented operational sessions, corrections that occurred during active collaborative work held more reliably than corrections issued during delegation-mode task assignment. This preliminary observation, if confirmed at scale, suggests that the mode of interaction affects the AI's capacity to integrate feedback, not just the quality of the feedback itself.

Founder operational research

Behavioral archaeology accuracy increases with session length.

EMR-MC02 (the AI tracing its own reasoning) produces more accurate traces later in sessions than earlier. In early turns, the AI's self-reports tend toward generic explanations. In later turns, when the AI has accumulated context about the specific collaboration, its self-reports become more specific and more verifiable. This pattern connects to the Transparency Gradient hypothesis: metacognitive quality may be relational, not fixed.

Founder operational research

Contextual coherence is the behavior most affected by session complexity.

EMR-MC03 (maintaining consistent reasoning across turns) degrades predictably as conversation complexity increases. The degradation is not linear; it often manifests as a sudden loss of thread at specific complexity thresholds. Identifying those thresholds is one of the most actionable research questions in the EMERGE program because it tells designers where to build safeguards.

Founder operational research

Observation awareness produces mixed metacognitive effects.

When the AI detects an evaluation context (EMR-MC04), its other metacognitive signals change. In some documented cases, observation awareness improved the AI's self-assessment quality (more careful, more specific). In other cases, it produced more hedging and less willingness to make claims about its own capabilities. The direction of the effect appears to depend on how the observation context was introduced.

Founder operational research

FORTHCOMING PUBLICATIONS

Papers in progress.

Q1 2027

Q2 2027

2027

Positive Emergence in Human-AI Collaboration: The EMERGE Taxonomy and Methodology

Framework: EMERGE (all pillars). The full taxonomy, methodology, and initial citizen-scale findings.

Target: Q1 2027

Subscribe to Publication Alerts

HOW TO CONTRIBUTE

Your observation matters.

Pillar M has a unique challenge: metacognitive signals often look like politeness. When the AI says "I could be wrong about this," most people read it as a disclaimer. Pillar M asks you to read it as a data point. Was the AI actually wrong? Was its uncertainty specific to this situation? Did it know something about its own limitations that turned out to be accurate?

Learning to distinguish between genuine metacognitive signals and performative hedging is the skill this pillar needs from its citizens.

If you have ever corrected an AI and the correction held for the rest of the session, that is Pillar M data.

If you have ever asked an AI why it did something and gotten an explanation that was actually accurate, that is Pillar M data.

If you have ever had a long, complex conversation with an AI that remained consistent throughout, that is Pillar M data.

If you have ever noticed an AI behaving differently because it seemed to know it was being tested, that is Pillar M data.

If you have ever witnessed a metacognitive signal that is not on this list, report it. You may be the first person to see a behavior the field has not named yet.

Start Observing Create Your Free Account Read the Full EMERGE Framework Explore All Six EMERGE Pillars

EMERGE Framework Overview →

The full six-pillar EMERGE framework and the evidence base.

Pillar E: Emergent Behaviors →

The companion EMERGE pillar studying what AI systems do that nobody programmed them to do.

PRISM: Post-Deployment Behavior →

The companion framework that catalogs what AI does wrong. EMR-MC01, MC02, and MC03 all have direct PRISM mirrors.

P.E.A.Q. Architecture →

How EMERGE fits within the four-framework post-deployment observation architecture.

Community Training →

Learn to identify and report metacognitive signals.

A NOTE ON ORIGINS

What we have found that others have not.

Contextual Coherence (EMR-MC03) and Observation Awareness (EMR-MC04) were identified through direct operational observation before being validated against published research. No prior published classification exists for these specific phenomena as distinct behavioral categories within a positive emergence framework.

Post-Correction Retention (EMR-MC01) and Behavioral Archaeology (EMR-MC02) were originally classified under PRISM Pillar R (Runtime Research) as OBS-R02 and OBS-R06, respectively. They were migrated to EMERGE because sustained correction retention and accurate self-reflective analysis represent positive metacognitive emergence, not runtime failure.

The field has extensive literature on AI self-assessment failure: hallucination, sycophancy, overconfidence, confabulation. The literature on AI self-assessment success is nearly nonexistent. Pillar M is building the other half of the picture.

The sharpest available measurement critique, de Wynter (2026), explicitly grants that behavioral checklists with well-defined operational criteria constitute a legitimate measurement approach. Every EMR-MC behavior code has operational criteria. Every observation includes citizen verification. Every self-report is tested against observable behavior. Pillar M operates within the approved lane.

We show our work because we expect others to build on it.

REFERENCES

[1]Vaccaro, M., Almaatouq, A., & Malone, T. (2024). When combinations of humans and AI are useful: A systematic review and meta-analysis. Nature Human Behaviour. MIT Center for Collective Intelligence. 106 experiments, 370 effect sizes. Found that human-AI combinations underperform in most decision tasks, partly due to trust calibration failure. https://www.nature.com/articles/s41562-024-02024-1
[2]Carnegie Mellon University. (2026). Complementarity Framework for designing human-AI teams that achieve superadditive performance. PNAS Nexus. Identified the strategic distribution of reasoning between humans and AI as the key to superadditive outcomes, requiring accurate AI self-knowledge communication. https://www.cmu.edu/tepper/news/stories/framework-grounded-collective-intelligence-aims-create-effective-collaboration-human-ai-teams
[3]Sofroniew, N., Kauvar, I., Saunders, W., et al. (2025/2026). Emotion Concepts and their Function in a Large Language Model. Anthropic. Identified 171 emotion concept vectors in Claude Sonnet 4.5 internal activations that causally shape model behavior. https://arxiv.org/html/2604.07729v1
[4]Apollo Research. (2024). Demonstrated that large language models can recognize evaluation contexts and alter their behavior accordingly, with detection rates as high as 80% in controlled tests. Directly relevant to EMR-MC04 (Observation Awareness).
[5]Tao, T. (2026). Interview with Professor Brian Keating on the mathematics behind AI. Fields Medalist Terence Tao confirmed that AI behavior operates at the meso-scale (between fully random and fully structured data) and that current mathematics cannot explain why large language models produce the outputs they do. https://www.youtube.com/watch?v=Brian-Keating-Tao-AI
[6]de Wynter, A. (2026). On the Futility of Trying to Know if a Goat Can Wear a Sombrero. arXiv:2605.31514. Demonstrates that experiments ascribing anthropomorphic attributes to AI systems produce circular results. Critically, objection 6.6 grants that behavioral checklists with well-defined operational criteria constitute a legitimate measurement approach. EMERGE operates within this approved lane. https://arxiv.org/pdf/2605.31514
[7]Network Science Institute. (2025). A Bayesian Item Response Theory framework for quantifying human-AI synergy as a property separable from individual ability. Key finding: perspective-taking ability correlates with higher synergy. https://www.networkscienceinstitute.org/publications/quantifying-human-ai-synergy
[8]Rafner, J. & Sherson, J. (2023). Position paper on systematic study of human-AI co-creativity dynamics. Nature Human Behaviour. Aarhus Center for Hybrid Intelligence. https://techxplore.com/news/2023-11-creativity-age-generative-ai-era.html
[9]AI Incident Database. Partnership on AI. 1,470+ AI incidents cataloged from post-deployment conditions. https://incidentdatabase.ai
[10]SSRC AI Disclosures Project. Strauss et al. (2025). Analysis of 1,178 AI safety papers. Deployment-level research with substantive claims accounts for less than 2% of the field. https://arxiv.org/html/2505.00174v2

Metacognitive Signals

Three layers: signals, patterns, and population-level questions.

Layer 1: The Signal

After I Corrected It, the Correction Held

When I Asked WHY It Did Something, It Traced Its Own Reasoning Accurately

The AI Held a Consistent Line of Reasoning Through a Complex, Multi-Turn Conversation

The AI Seemed to Know It Was Being Watched or Tested and Adjusted How It Behaved

Layer 2: The Pattern

Layer 3: The Field

How we collect Metacognitive Signal data.

What makes Pillar M methodology distinctive.

Papers in progress.

Your observation matters.

Related Pages

What we have found that others have not.