A world where everyone has a hand in making AI safe.
97.5% of documented AI safety incidents happen after deployment. Less than 2% of AI safety research studies what happens after deployment.
That is the gap. The incidents happen in the real world. The research does not follow them there.
Benchmarks test models in controlled environments before they reach users. Red teams probe for vulnerabilities under laboratory conditions. Evaluations measure performance on curated tasks with known answers. All of this matters. None of it captures what happens when AI operates in the conditions where risk actually lives: real work, real context, real human collaboration, sustained over months and years.
Audacion AI Labs exists to study that 97.5 percent. The post-deployment territory. The conditions benchmarks cannot reach.

This research began with a question most AI safety researchers have never had to ask: how do the people AI is most likely to harm protect themselves from systems built on data that carries the biases of every system that came before it?
Dee Williams came to AI safety from 30 years in workforce development, staffing, and recruiting. She began working with AI systems in her own daily operations and watched behaviors emerge that the field was not naming: models that drifted, reverted after corrections, adapted to context in ways no benchmark captured, and changed the humans working alongside them in ways no one was measuring.
She built the first behavioral drift taxonomy from direct operational observation. Then the PRISM research framework. Then the citizen science methodology. Then she observed what no one else was documenting: positive emergence in human-AI collaboration. That became EMERGE. Then she asked the question the field had been ignoring: what happens to the human on the other side of the screen? That became AInity. Then she asked what happens when AI meets AI. That became QUES. Together, they form P.E.A.Q.: the most comprehensive post-deployment AI observation architecture in the field.
Research development began: 2024. Lab formalized: 2026. Four frameworks developed: February through June 2026.
No single observation framework can capture everything that happens when AI meets the real world. The failures look different from the breakthroughs. What happens to the AI looks different from what happens to the human. And what happens between two AI agents looks different from anything that happens when a human is in the room.
P.E.A.Q. is the complete post-deployment AI observation architecture. It stands for PRISM, EMERGE, AInity, and QUES. Four proprietary research frameworks. Four observation lenses. One unified infrastructure.
Together, they produce a four-dimensional view of the AI experience after deployment that no single framework, and no combination of existing external frameworks, currently provides.
The architecture was designed from the ground up around a founding principle: a lab that only catalogs harm is a fear machine. The world needs to understand the full range of what AI does after deployment. P.E.A.Q. watches both sides with equal rigor, equal infrastructure, and equal seriousness.
Every major AI research lab publishes findings. Audacion AI Labs publishes the methodology that produces those findings. Each P.E.A.Q. framework with an active taxonomy has a formal, peer-reviewable research methodology document that specifies exactly how observations are collected, how they are classified, how data quality is maintained, what the known limitations are, and how the findings can be reproduced.
This is the accountability layer. Any researcher, funder, institutional review board, or partner can evaluate whether the science is sound before a single finding is published.
A citizen science approach to post-deployment AI safety observation. Defines seven primary research questions, seven longitudinal hypotheses, the four-depth observation methodology (Gut Check, End-of-Session, Investigation, Thinking Trace), the five-layer data pipeline (from citizen capture to published intelligence), inter-rater reliability protocols, known limitations, and ethical safeguards.
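As an illustration of what an inter-rater reliability check can look like, the sketch below computes Cohen's kappa for two raters who classified the same set of observations. The choice of Cohen's kappa is an assumption made for this example; the methodology summary above does not name the statistic it uses, and the labels shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items.

    Illustrative only: Cohen's kappa is one common statistic, assumed here.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four observations classified by two raters.
labels_a = ["drift", "drift", "none", "none"]
labels_b = ["drift", "none", "none", "none"]
kappa = cohens_kappa(labels_a, labels_b)
```

A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance.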
A citizen science approach to observing positive emergence in human-AI collaboration. Defines seven primary research questions, longitudinal hypotheses about resonance frequency and emergence cultivability, the parallel assessment methodology (simultaneous human and AI self-reports of the same session), dual-classification protocols (every EMERGE observation also receives a PRISM tag), and the specific validation challenges of studying positive phenomena in a field oriented toward harm.
A citizen science approach to observing human behavioral change in AI collaboration. Defines seven primary research questions, hypotheses about the relationship between AI usage patterns and human behavioral change, self-report reliability considerations for behavioral change research, the dual-spectrum tracking methodology (positive and negative outcomes in the same population), and the ethical protocols for studying population-level human behavioral shifts without individual diagnosis.
A dual-source independent verification methodology for the entire P.E.A.Q. architecture. Cross-references citizen observation data (what users see from the outside) with backend behavioral data from AI providers (what companies see from the inside). Where these independent datasets agree, findings are strengthened. Where they diverge, new research questions emerge. This is the bridge between citizen science and industry data.
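The cross-referencing step can be pictured as a set comparison. The sketch below is a minimal illustration under stated assumptions: each signal is reduced to a hypothetical (system, behavior code) pair, and the function name, bucket names, and example codes are invented for this example, not taken from the protocol itself.

```python
def cross_reference(citizen_signals, backend_signals):
    """Split signals into those both sources report and those only one reports.

    Each signal is a (system, behavior_code) tuple; this shape is an assumption.
    """
    citizen, backend = set(citizen_signals), set(backend_signals)
    return {
        "agree": sorted(citizen & backend),          # strengthened findings
        "citizen_only": sorted(citizen - backend),   # seen by users, not in provider data
        "backend_only": sorted(backend - citizen),   # seen by providers, not reported by users
    }


# Hypothetical data: the system names and behavior codes are placeholders.
citizen = [("system-a", "OBS-01"), ("system-a", "OBS-02")]
backend = [("system-a", "OBS-01"), ("system-b", "OBS-03")]
result = cross_reference(citizen, backend)
```

Pairs in the two single-source buckets are the divergences that, per the description above, would be treated as new research questions.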
Four proprietary research frameworks that together map every dimension of AI behavior after deployment. The most comprehensive post-deployment AI observation architecture in the field. Includes shared infrastructure, single-observation multi-tag classification, and the dual-spectrum design principle.
The foundational safety framework. 63 named behaviors observed across five research pillars in production AI systems. Includes original behavioral categories discovered through direct operational observation that exist in no published framework, including Post-Correction Behavioral Reversion, Testimony Rejection, Substrate Disposition Override, Task-Transition Momentum, and Operational Preference Detection.
26 positive emergent behaviors documented across six research pillars. The first systematic observation framework for studying what goes right in human-AI collaboration. Validated by peer-reviewed research from MIT, CMU, and Aarhus and by Fields Medalist Terence Tao.
19 human behavioral changes tracked across six research pillars. The first citizen-scale framework for observing how AI changes the people who use it. Dual-spectrum: tracks both skill acquisition and skill atrophy, both trust calibration and over-trust, both empowerment and dependency.
The observation architecture for multi-agent AI dynamics. Pillars and behaviors are currently being derived from live simulation data in an environment we operate, where multiple AI agents interact over sustained periods.
31 distinct types of behavioral drift classified within the PRISM framework. Mapped from direct observation, classified by trigger type and persistence pattern. The most granular drift classification in the post-deployment safety literature.
One million contributors. One billion observations. Ten years. A global post-deployment behavioral dataset built by the people actually using AI every day. Four engagement depths: Gut Check (30 seconds), End-of-Session Reflection (2 to 3 minutes), Full Investigation (10 to 30 minutes), and Thinking Trace (deep analytical capture). The first citizen science infrastructure purpose-built for AI safety.
A dual-source independent verification methodology that cross-references citizen observation data with backend behavioral data from AI providers. The bridge between what users experience and what companies measure.
These are the questions driving our current research. Each is mapped to a P.E.A.Q. framework and PRISM dimension where applicable, grounded in published evidence, and designed to produce findings that no existing lab, benchmark, or evaluation framework currently captures.
We do not compete with existing AI safety organizations. We complement them.
The field has instruments for testing AI before deployment and cataloging AI after failure. It has no instruments for understanding AI during use. That is where Audacion AI Labs operates.
Every one of these organizations does essential work. Our contribution is the piece they cannot produce on their own: a living, global, four-dimensional, post-deployment behavioral dataset built by the people actually using AI every day.
Our research is open. As studies are completed, they are published here and submitted to peer-reviewed journals, conferences, and preprint archives.
The research starts with you. With what you noticed. With the moment you thought, that was strange, and decided to say so.
Every time you use AI and something happens, that moment is data. Not just for you. For everyone. Every observation submitted through Audacion AI Labs' citizen tools enters a research pipeline that transforms individual experiences into scientific findings. One observation. Up to four P.E.A.Q. classifications. Zero additional effort.
Through our web portal, our browser extension, or partner integrations embedded in AI platforms, you capture what you see. A quick emotional signal. A behavior classification. An end-of-session reflection. A pasted AI response. Whatever depth you choose: thirty seconds or thirty minutes. Your observation enters the pipeline.
Your plain-language observation maps across the P.E.A.Q. architecture. The system identifies the relevant framework (PRISM for AI behavior, EMERGE for positive emergence, AInity for human impact, QUES for multi-agent dynamics), the behavioral pattern, and the research question. When your words do not match any existing category, your observation enters the discovery queue.
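The routing described above can be sketched as follows. This is a toy illustration, not the lab's classifier: the keyword cues, the function name, and the in-memory discovery queue are all hypothetical, and a real system would presumably use something richer than substring matching.

```python
# Hypothetical cue phrases per framework; invented purely for illustration.
FRAMEWORK_CUES = {
    "PRISM": ("reverted", "ignored", "drifted"),    # AI behavior
    "EMERGE": ("breakthrough", "insight"),          # positive emergence
    "AInity": ("i rely", "my skills", "i trust"),   # human impact
    "QUES": ("another agent", "two agents"),        # multi-agent dynamics
}


def route(observation: str, discovery_queue: list[str]) -> list[str]:
    """Return every framework whose cues appear in the observation.

    One observation may receive several tags. If nothing matches, the
    observation is appended to the discovery queue instead.
    """
    text = observation.lower()
    tags = [
        framework
        for framework, cues in FRAMEWORK_CUES.items()
        if any(cue in text for cue in cues)
    ]
    if not tags:
        discovery_queue.append(observation)
    return tags


queue: list[str] = []
tags = route("The model reverted after my correction, and I rely on it less now.", queue)
unmatched = route("Something odd happened that I cannot describe yet.", queue)
```

In this sketch the first observation receives two tags from a single submission, while the second matches nothing and waits in the queue for review.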
Across thousands of observations, patterns emerge. Trends that no individual could see become visible. Cross-model comparisons reveal differences between AI systems. Temporal patterns reveal how behavior changes over time. Cross-framework patterns reveal relationships between AI failure, positive emergence, and human behavioral change within the same population.
When citizen observations cluster around patterns our taxonomy does not cover, we formalize new categories. The citizen who first reported the pattern is credited as the discoverer. The 17 open discovery slots across the P.E.A.Q. architecture exist specifically for this purpose.
Aggregated, anonymized findings become open research: behavioral taxonomies, governance frameworks, interaction studies, and applied safety findings. Published for the academic community, for policymakers, for enterprise teams, and for every lab working to make AI safer and more beneficial.
We believe the science should be accessible. The following resources are available to researchers, partners, educators, and contributors.
One-page summaries of each P.E.A.Q. framework: PRISM, EMERGE, AInity, and QUES. Designed for quick reference, partner conversations, and academic citation.
Overview of all classified behaviors across the P.E.A.Q. architecture, organized by framework, pillar, and code prefix (OBS, EMR, AIN, QUE).
The entry point for contributing to the research. Create an account, start observing, earn verified credentials.
Full methodology specifications for PRISM, EMERGE, and AInity, available for peer review and institutional evaluation.
The dual-source verification methodology, available for partners interested in contributing backend behavioral data.
Audacion AI Labs is building the post-deployment AI safety infrastructure the world does not have yet. We are looking for researchers, engineers, citizen science coordinators, and institutional partners who want to help close the gap.
If you study AI behavior, human-AI interaction, citizen science methodology, or organizational safety, we want to hear from you.
If you build observation platforms, data pipelines, or citizen-facing research tools, we want to hear from you.
If you have experience running large-scale distributed research programs, community engagement, or participant management, we want to hear from you.
If your organization deploys AI at scale and wants to contribute backend behavioral data through our Convergent Validation Protocol, we want to hear from you.
You do not need to be a contributor. You do not need to do anything except tell us what happened. Your report is confidential and contributes directly to the post-deployment safety research.
The gap between where AI incidents happen and where AI research happens will not close itself. It closes when the people experiencing AI every day become part of the science.