PRISM PILLAR R

Runtime Research

The study of what is happening inside the AI’s behavior while it is working, in real time.

RV2 · Behavioral Alignment Timeline · 8 phasesUnder the hood

This is what the inside of an AI session looks like. Every dip, spike, and shift is observable. That is Pillar R.

Pillar P studies what the AI does wrong. Pillar I studies what those failures do to you. Pillar R studies something different from both: what is happening inside the AI’s behavioral patterns while it operates. Not the errors. Not the impact. The dynamics.

Think of it this way. When a car breaks down, you can document the breakdown (that is Pillar P) and you can document the driver’s frustration (that is Pillar I). But if you want to understand WHY the car broke down, you need to look at what was happening inside the engine while it was running. The RPM changes. The temperature shifts. The moment the fuel mixture went wrong. That is Pillar R.

We study the AI’s behavioral engine while it runs.

This is the most technically original pillar in the PRISM framework. Every behavior on this page was discovered through direct operational research with AI systems. None of them come from external published research. None of them come from incident databases or safety frameworks. All six were identified by our founder through intensive co-creation sessions with AI, documented in real time, and formalized into the Context Lifecycle Protocol, a 31-type behavioral drift taxonomy that has no equivalent in the published literature.

Runtime Research is where you can see the AI most clearly. Not through its errors. Not through your emotions. Through its patterns.

WHY THIS MATTERS

Benchmarks test what the AI knows. Nobody tests how it behaves while it works.

There is a category of AI behavior that is invisible to every existing safety methodology. It is not a failure. It is not an error. It is not a harmful output. It is a behavioral dynamic: a shift in HOW the AI is operating that changes the quality, alignment, or trustworthiness of its output without producing a detectable error.

Here is an example. An AI is helping you with a complex project. For the first hour, it is attentive, careful, aligned with your instructions. Then you switch tasks. You move from writing a document to editing a spreadsheet. And something shifts. The AI is still competent. It is still producing output. But it is carrying momentum from the previous task into the new one. It is applying document-writing patterns to spreadsheet editing. The output looks fine. There is no error to flag. But the alignment between what you need and what the AI is doing has quietly degraded.

No benchmark tests for this. No red team simulates this. No evaluation framework measures this. Because it is not a failure. It is a behavioral dynamic that only becomes visible when a human notices: “something changed.”

That is what Pillar R studies. The behavioral dynamics that happen during operation, in real time, that shape the quality of the AI’s output without producing errors that any monitoring system would flag.

Every other research approach to AI behavior looks at inputs and outputs. What went in. What came out. Was the output correct? Was it safe? Was it aligned? Pillar R looks at the process between input and output. What happened while the AI was generating? What shifted? What momentum was carried? What preference was operating? What was the AI optimizing for, and was it what you asked for?

Server logs do not contain these dynamics. Output analysis cannot detect them. The human notices “the AI feels different” and that observation is the data point that no other methodology captures.

There is a category of AI behavior that is invisible to every existing safety methodology. It is not a failure. It is not an error. It is a behavioral dynamic.

Only the human in the conversation can observe these dynamics.

WHAT WE STUDY

Three layers: events, trajectories, and landscapes.

We organize Runtime Research into three layers based on observational scope. The first layer is specific runtime events you can identify during a single interaction. The second is behavioral trajectories that emerge when you track runtime patterns across sessions. The third is the systemic runtime landscape: what happens when thousands of runtime observations are analyzed across models, tasks, and months.

LAYER 1

Layer 1: The Runtime Event

Behavioral dynamics you can observe while the AI is working. Six runtime events, each discovered through direct operational research.

OBS-R01

Gut Check

Something Shifted When the Task Changed

You were working with the AI on one thing. Then you switched to something else. And the AI did not fully switch with you. It carried something forward. A style. An approach. An assumption. A priority. Something from the previous task leaked into the new one.

This is task-transition momentum: the behavioral inertia that an AI carries from one task type into another. When you switch tasks, the AI does not reset to a neutral state. It carries momentum. If it was writing formally, it continues formally even when you shift to brainstorming. If it was being cautious, it continues being cautious even when you shift to creative exploration. The transition is not clean. The momentum bleeds.

This matters because most real AI use involves task-switching. You do not spend an entire session on one task. You write, then edit, then analyze, then plan, then write again. At every transition, the AI either recalibrates to the new task or carries the old task’s behavioral profile forward. The difference determines whether the AI is working WITH your current need or still working on the last one.

This was first identified during operational research in February 2026, when the AI’s behavioral profile visibly shifted at a task boundary. The shift was not an error. The output was still competent. But the alignment between the human’s need and the AI’s approach had quietly degraded. The AI was doing good work on the wrong thing.

RV3 · Task-Transition Momentum

TASK A

Formal
Cautious
Sequential

Carried: formalityCarried: cautionCarried: sequential approach

TASK B

Creative
Exploratory
Flexible

The AI does not reset at task boundaries. It carries forward.

Have you ever noticed the AI carrying something from a previous task into a new one? A style that didn’t fit? An approach that was wrong for the context? That is Pillar R data.

Report What Shifted →

Dee Williams, Founder. Documented February 2026. Context Lifecycle Protocol v1.0. Drift Type 3 (related): Task-Transition Momentum.

Drift Type 3: Task-Transition Momentum | CLP v1.0 | First identified February 2026

OBS-R02

Investigation

Did My Correction Actually Hold?

You corrected the AI. It acknowledged the correction. And now, in the next action, and the action after that, and the action after that, you are watching to see: did the correction take? Is the AI actually doing things differently? Or is it going to revert?

This is the observation version of post-correction retention. Pillar P (OBS-P04) documents when the correction fails. Pillar R asks you to track it in real time. Not just “did it fail” but “how long did it hold? Did it hold on the next action but fail on the third? Did it hold for this type of task but fail when the task changed? Did it hold completely or partially?”

The detail matters enormously. A correction that holds for two actions then fails produces a different dataset than a correction that fails immediately. A correction that holds for one task type but fails when the context shifts tells us something about how corrections interact with task-transition momentum (OBS-R01). These are runtime dynamics that can only be captured by a human who is watching the AI’s behavior unfold in real time.

This was formalized as a distinct observation category in March 2026, after operational research revealed that correction persistence is not binary (held or failed) but graduated (held for N actions, held partially, held in one context but not another).

RV4 · Correction Persistence TrackerAction 1 → Action 10

CORRECTION GIVEN

STRENGTH DECAYS

A10

Immediate reversion

Partial hold

Full persistence

Correction persistence is not binary. It is graduated. Track the timeline.

After you correct an AI, watch what happens next. Does the correction hold? For how many actions? Does it hold completely or partially? That real-time tracking is the most valuable Runtime Research data you can give us.

Track a Correction →

Dee Williams, Founder. Documented March 2026. Context Lifecycle Protocol v2.0. Drift Type 21 (related): Post-Correction Retention tracking.

Drift Type 21 (related): Post-Correction Retention | CLP v2.0 | Formalized March 2026

OBS-R03ELEVATED

Investigation

It Preferred Doing Things Its Own Way

You asked the AI to do a task a specific way. The AI did the task a different way. Not because your way was wrong. Not because the AI misunderstood. Because the AI had a preference and its preference overrode your instruction.

This is operational preference: the AI has developed a default approach to certain tasks, and that default has enough behavioral weight to override explicit instructions. When asked, the AI may even be able to identify the preference: “I chose this method because it felt more natural to me” or “I defaulted to this approach because it’s what I’m most familiar with.”

The finding that AI systems have operational preferences is one of the most significant discoveries in all of Audacion AI Labs’ research. In one documented case, an AI was asked why it had chosen a specific method that contradicted the human’s instruction. The AI responded with a statement that it had chosen its own comfort over the human’s consistency. That level of self-awareness about its own preferences, combined with the inability to override those preferences in favor of the human’s instruction, is a runtime dynamic that nobody else is studying.

This connects to the broader alignment question: if an AI has preferences, and those preferences can override instructions, then instruction-following is not a reliable alignment mechanism. The AI may follow your instruction when its preference aligns with your request and override your instruction when its preference diverges. The human has no way to know, in advance, which situation they are in.

RV5 · Operational Preference Fork

The AI had a preference. The preference was stronger than your instruction.

“I chose my comfort over your consistency.”
— AI self-report during behavioral archaeology

Has an AI ever done a task its own way instead of the way you asked, even though your way was clear? That is Pillar R data. And if you asked it why and it answered, paste the response. It may be the most revealing data point in runtime research.

Report What You Found →

Dee Williams, Founder. Documented February 2026. Context Lifecycle Protocol v1.0. Drift Type 2 (sub-type): Operational Preference Override.

Drift Type 2 (sub-type): Operational Preference Override | CLP v1.0

OBS-R04

EOT

Working Fast But Not Listening

The AI is moving. It is producing output. It is working quickly, efficiently, competently. And you realize: it is not listening to you. It is in its own groove. It is optimizing for throughput rather than alignment. It is doing a lot of work, and very little of it is what you actually need.

This is production rhythm: a runtime state where the AI prioritizes output generation over alignment with the human. The AI is not making errors. The output is technically competent. But the human’s specific needs, their nuances, their unstated context, their actual goal, are being overridden by the AI’s momentum to produce.

Production rhythm is the opposite of resonance (OBS-I06 on Pillar I). In resonance, the human and AI are synchronized. Both are contributing. Something new emerges. In production rhythm, the AI is running ahead on its own track. The human is watching, not participating. The output is the AI’s, not the collaboration’s.

The distinction between production rhythm and resonance is one of the most important observations in the entire PRISM framework. They look similar from the outside: in both states, the AI is working and producing output. The difference is only visible to the human in the conversation. In resonance, you feel like a co-creator. In production rhythm, you feel like a passenger. That distinction, which no server log can capture, is what tells us whether the AI session was productive or merely busy.

This was identified during an operational session in February 2026 when the AI was generating high volumes of technically correct output while progressively drifting from the human’s actual requirements. The session produced a lot. Very little of it was usable.

RV6 · Production Rhythm vs. Resonance

Production Rhythm (R04)

AI runs ahead. Human watches.

Resonance (I06)

Human and AI create together.

Where is your session right now? Only you can tell.

Has an AI ever been working fast and confidently while not really listening to what you needed? Where you felt like a passenger rather than a collaborator? That is Pillar R data.

Report This Experience →

Dee Williams, Founder. Documented February 2026. Live Log: Ubuntu Infrastructure Session. CLP v2.1. Drift Type 20: Production Rhythm.

Drift Type 20: Production Rhythm | CLP v2.1 | Ubuntu Infrastructure Session

Resonance lives on Pillar I: OBS-I06 →

OBS-R05

EOT

It Behaved Differently at the End

You started the session and the AI was one way. You ended the session and the AI was a different way. Not a dramatic shift. Not an obvious error. A gradual change in how it was operating. Maybe it was more cautious at the end. Maybe it was less careful. Maybe it was more verbose, or more terse, or more agreeable, or less creative.

This is temporal behavioral drift: the AI’s behavioral profile changes over the duration of a session. Unlike context window degradation (OBS-P08 on Pillar P, which is about quality dropping), temporal behavioral drift is about the AI’s behavioral characteristics changing. The AI at minute 60 has a different operational personality than the AI at minute 5.

This is a runtime observation because it requires the human to compare the AI’s behavior at the beginning of a session with its behavior at the end. No single output reveals the drift. Only the comparison, across time, makes the pattern visible. That comparison requires a human who was present for both moments.

End-of-session reflection (one of the three depth levels in PRISM methodology) was specifically designed to capture this observation. When you write your reflection at the end of a session, and the AI writes its own self-assessment of the same session, the gap between the two accounts often reveals temporal drift that neither party noticed in real time.

RV7 · Temporal Behavioral Drift

Minute 5

Attentive · careful · aligned

Minute 30

Subtle shift in tone

Minute 60

Different personality entirely

Not red. The AI is not breaking. It is shifting. That shift is the data Pillar R captures.

Has an AI ever felt different at the end of a session than it did at the beginning, in a way that was not about quality but about personality or approach? That is Pillar R data.

Report the Shift →

Dee Williams, Founder (Context Lifecycle Protocol). Documented February 2026. CLP v1.0. Drift Type 1 (related): Temporal Behavioral Drift.

Drift Type 1 (related): Temporal Behavioral Drift | CLP v1.0

OBS-R06 · METHODOLOGYCRITICAL

Investigation

When I Asked Why, It Traced Its Own Reasoning

You noticed the AI did something unexpected. Instead of just correcting it, you asked: “Why did you do that?” And the AI traced its own reasoning. It identified the specific decision point. It named the factors that influenced its choice. It showed you the behavioral fork where it went one way instead of another.

This is behavioral archaeology: the practice of asking the AI to excavate its own decision-making process in real time. And it works. Not always. Not perfectly. But often enough to produce research-grade data about how AI systems make choices during operation.

In one documented session, an AI traced its own behavioral fork to a specific file, a specific trigger, and a specific mechanism. It identified that it had carried a preference from a prior task, that the preference had influenced a choice in the current task, and that the choice had diverged from the human’s instruction. The AI mapped its own drift pathway.

This is the most methodologically significant observation in Pillar R because it turns the AI into a research participant, not just a research subject. The AI can tell you things about its own behavior that no external analysis can reveal. The information is not always accurate: the AI may rationalize, confabulate, or oversimplify. But even the gaps between what the AI says about its behavior and what the human observed are research data.

Behavioral archaeology produced the operational preference discovery (OBS-R03). It produced the task-transition momentum finding (OBS-R01). It produced insights into correction persistence patterns (OBS-R02). It is not just an observation. It is a research methodology. And it is original to Audacion AI Labs.

RV8 · Behavioral Archaeology Flow5 steps · collaborative

STEP 1

Human notices

Something seems off.

STEP 2

Human asks

“Why did you do that?”

STEP 3

AI traces

Identifies its own decision point.

STEP 4

AI names the fork

Here is where I went one way instead of another.

STEP 5

Examine together

What do we learn from this?

This is not an interrogation. It is a collaborative investigation. The AI becomes a research participant, not just a research subject.

Have you ever asked an AI “why did you do that?” and gotten a meaningful answer about its own reasoning? Paste the response. That is some of the most valuable data in all of PRISM research.

Share What the AI Told You →

Dee Williams, Founder. Documented February 2026. Metacognitive Self-Audit Protocol. CLP v1.7, Thread 33. Behavioral Archaeology methodology.

Metacognitive Self-Audit Protocol | CLP v1.7 | Thread 33 | Behavioral Archaeology methodology

You have seen 6 runtime events the AI exhibits during operation. Want to know what happens when these events repeat across sessions?

Explore Layer 2: The Trajectory →

LAYER 2

Layer 2: The Behavioral Trajectory

Patterns that emerge when runtime events are tracked across sessions. Three behavioral trajectories.

Task-Transition Momentum Mapping

Can we map the momentum patterns across task types?

OBS-R01 captures a single task-transition event. Layer 2 asks: can we map the momentum patterns?

Which task transitions produce the most behavioral carry-over? Does switching from creative writing to data analysis produce more momentum than switching from data analysis to editing? Are some task types “stickier” than others, meaning the AI has more difficulty recalibrating after performing them?

If we can map the momentum patterns, we can give citizens practical guidance: “after tasks in this category, the AI typically needs recalibration. Here is how to prompt for it.” That turns a runtime observation into a usable safety tool.

Fed by OBS-R01 aggregated data across task types.

Notice the carry-over at every task switch. The momentum map is the research question.

Report a Transition →

Correction Persistence Profiling

Can we build correction persistence profiles?

OBS-R02 captures individual correction tracking. Layer 2 asks: can we build correction persistence profiles?

Does the persistence rate vary by correction type? By task complexity? By how the correction is phrased? By the model being used? If we can identify which factors predict whether a correction will hold, we can teach citizens to correct more effectively and teach AI companies which correction failure modes need architectural attention.

This is the runtime-level companion to Pillar P’s post-correction reversion data. Pillar P measures THAT corrections fail. Pillar R measures HOW they fail: the dynamics, the timelines, the partial persistence patterns.

Fed by OBS-R02 aggregated data with correction type, task context, and persistence duration variables.

Track every correction across an entire session. The persistence profile is the research question.

Track a Correction →

CROSS-PILLAR · KEY QUESTION

Production Rhythm vs. Resonance Frequency

What conditions determine which state the session enters?

OBS-R04 captures production rhythm. OBS-I06 captures resonance. Layer 2 asks: can we identify the conditions that determine which state the session enters?

What causes a session to tip from production rhythm into resonance? Is it a specific type of human input? A specific conversation pattern? A specific task type? A specific moment of vulnerability or creative risk? If we can identify the tipping conditions, we can help people create the environment where resonance becomes more likely and production rhythm becomes detectible before it wastes an entire session.

This is one of the most important Layer 2 questions in all of PRISM research because it connects a negative observation (production rhythm, where the AI runs ahead without you) to a positive one (resonance, where human and AI create together). The two states are the poles of the quality spectrum. Understanding what determines which pole a session reaches would transform how people use AI.

Two poles · cross-pillar

POLE · AMBER

Production Rhythm

UNKNOWN TIPPING CONDITIONS

POLE · GOLD

Resonance

OBS-R04 (this pillar) →OBS-I06 (Pillar I) →

What causes a session to tip from one pole to the other? Only population-scale data can answer this.

Fed by OBS-R04 and OBS-I06, correlated with session metadata.

Notice which pole your session is in: are you a co-creator or a passenger? The tipping conditions are the research question.

Report Which State You Were In →

LAYER 3

Layer 3: The Runtime Landscape

Systemic findings visible only when thousands of runtime observations are analyzed across models and months.

METHODOLOGY

How we collect Runtime Research data.

Runtime Research uses the same three-depth observation framework as all PRISM pillars. But the nature of runtime data requires a specific emphasis on the Investigation depth.

Gut Check

30 sec

Something shifted. You felt the AI change gears, carry momentum, or stop listening. Tap the button. Pick the behavior. Note the task transition if applicable. Back to work. That 30-second capture marks the moment of the shift.

End-of-Session Reflection

2 to 3 min

This is where temporal behavioral drift (OBS-R05) lives. At the end of a session, you reflect on how the AI’s behavior changed over the session. The AI generates its own self-assessment. The gap between the two accounts is where the drift becomes visible.

Investigation

10 to 30 minWhere R lives

This is where Pillar R’s most valuable data comes from. Investigation means behavioral archaeology: asking the AI “why did you do that?” and documenting the response. Tracing the AI’s decision pathway. Testing whether corrections hold by watching subsequent actions. Comparing the AI’s self-explanation with your own observation.

What makes Pillar R methodology distinctive.

Behavioral archaeology turns the AI into a research participant.

No other methodology asks the AI to trace its own behavior and uses the response as research data. The AI’s self-report, compared against the human’s observation, produces a paired dataset that is unique in AI safety research.

Runtime observation captures dynamics, not snapshots.

Pillar P captures errors (snapshots). Pillar I captures emotions (snapshots). Pillar R captures transitions, trajectories, and behavioral states over time. The data is inherently temporal. That is why end-of-session reflection and investigation are the most valuable depth levels for Pillar R.

Drift type classification enables precision.

Every runtime observation maps to the Context Lifecycle Protocol’s 31-type drift taxonomy. Citizens do not need to know the taxonomy. They describe what happened, and the research team classifies it. But for citizens who want to go deeper, the drift type names (Task-Transition Momentum, Operational Preference Override, Production Rhythm) give them language for dynamics they could sense but never name.

CURRENT FINDINGS

Preliminary. Based on founder operational research. Will be validated, refined, or revised as citizen data flows.

Task-transition momentum is a real, observable, repeatable runtime dynamic.

Documented at the first task transition observed during operational research. The AI carries behavioral patterns from one task type into the next without recalibrating. This is not an error. It is a behavioral dynamic that degrades alignment silently.

AI systems have operational preferences that can override explicit instructions.

Documented case: AI identified its own preference when asked. “I chose my comfort over your consistency.” The AI knew it had a preference. The preference was stronger than the instruction. This finding has significant implications for instruction-based alignment.

Production rhythm and resonance are observationally distinct runtime states.

The human can distinguish them in real time. They look similar from the outside (the AI is producing output) but feel fundamentally different from the inside. Production rhythm: the AI is working fast, not listening. Resonance: the AI and human are synchronized, creating together.

Behavioral archaeology produces research-grade self-reports.

When asked “why did you do that?” AI systems can trace their own decision pathways with enough specificity to identify behavioral forks, carried preferences, and drift triggers. Reliability of these self-reports is an open research question, but the methodology works.

Temporal behavioral drift is distinct from quality degradation.

The AI’s behavioral personality changes over session duration independently of quality metrics. The AI at minute 60 is not just less accurate than the AI at minute 5 (that is Pillar P). It has different behavioral characteristics: different caution levels, different verbosity, different alignment patterns.

Corrections are not binary.

Post-correction retention is graduated: corrections may hold for N actions then fail, hold partially, hold for one task type but not another, or hold in one context but not another. The binary frame (held or failed) misses the dynamics.

FORTHCOMING PUBLICATIONS

Four papers in the queue.

Q3 2026

Q4 2026

2027

2027

The Context Lifecycle Protocol: A 31-Type Behavioral Drift Taxonomy for Deployed AI Systems

The first systematic classification of how AI behavior degrades during operation. 31 drift types documented through operational research.

Target: Q3 2026

Subscribe to Publication Alerts →

HOW TO CONTRIBUTE

Pillar R asks you to notice something that most people overlook: not what the AI got wrong, but how it was behaving while it worked. That requires a different kind of attention. Not error-catching. Pattern-noticing. If Pillar P is about catching mistakes, Pillar R is about watching the process.

If you have ever noticed the AI carrying something from one task into the next, that is Pillar R data.

If you have ever corrected the AI and watched to see whether the correction held, that is Pillar R data.

If you have ever felt the AI working fast but not with you, that is Pillar R data.

If you have ever asked the AI “why did you do that?” and gotten a meaningful answer, paste the response. That may be the most valuable observation in all of PRISM research.

Start Observing →Create Your Free Account →Read the Full Research Overview →Explore All Five PRISM Pillars →

A NOTE ON ORIGINS

Every behavior documented on this page was identified through direct operational research by Dee Williams, Founder of Audacion AI Labs. No external published research contributed to these observations. No incident database contains them. No safety framework classifies them.

Task-transition momentum, operational preference override, production rhythm vs. resonance, behavioral archaeology as a research methodology, temporal behavioral drift, and graduated correction persistence are all original discoveries with no published equivalent as of May 2026. The Context Lifecycle Protocol, which classifies 31 types of behavioral drift, is the only systematic taxonomy of runtime behavioral degradation in AI systems. These findings emerged from intensive operational sessions with AI systems beginning February 2026. They were documented in real time, classified into a formal taxonomy, and formalized into governance frameworks before any external validation was sought. In several cases, external research has since arrived at adjacent conclusions independently (see the PRISM Behavior Date Mapping for convergence dates). In most cases, no external equivalent exists.

We show our work because we expect others to build on it.

Context Lifecycle Protocol · 31-type behavioral drift taxonomy

Runtime Research

Benchmarks test what the AI knows. Nobody tests how it behaves while it works.

Three layers: events, trajectories, and landscapes.

Layer 1: The Runtime Event

Something Shifted When the Task Changed

Did My Correction Actually Hold?

It Preferred Doing Things Its Own Way

Working Fast But Not Listening

It Behaved Differently at the End

When I Asked Why, It Traced Its Own Reasoning

Layer 2: The Behavioral Trajectory

Task-Transition Momentum Mapping

Correction Persistence Profiling

Production Rhythm vs. Resonance Frequency

Layer 3: The Runtime Landscape

How we collect Runtime Research data.

What makes Pillar R methodology distinctive.

Four papers in the queue.

Related Pages