A Real-World Test of Generative AI in Process Safety
One of the most persistent and operationally significant questions in process safety management is deceptively simple:
Is this change a Replacement-in-Kind (RIK), or does it require a full Management of Change (MOC)?
This decision sits at the intersection of safety, cost, and operational efficiency. Misclassify a true change as RIK, and you risk bypassing hazard evaluation. Over-classify routine replacements as MOC, and you introduce unnecessary administrative burden that slows execution without improving safety outcomes.
Historically, this determination has relied on engineering judgment, supported by procedures, checklists, and experience. The question now is whether Generative AI can assist in making that call reliably and consistently.
Why RIK vs. MOC Is the Right Test Case for AI
From an AI evaluation standpoint, RIK classification is an ideal benchmark because it requires:
- Interpretation of technical equivalence (materials, design, function)
- Consideration of process context (service conditions, hazards)
- Application of procedural definitions (what constitutes a “change”)
- Handling of ambiguity and incomplete information
In other words, it is not a lookup problem—it is a reasoning problem, which makes it a meaningful test of whether AI can operate in a process safety context.
Benchmarking GenAI: From Clear-Cut to Ambiguous Scenarios
Recent evaluations of frontier Generative AI models tested their ability to classify RIK vs. MOC across a series of scenarios of increasing complexity.
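For readers who want to see what such an evaluation can look like in practice, here is a minimal Python sketch of a benchmark harness. The scenario wording, the prompt, and the `call_model` adapter are illustrative assumptions, not the code behind the evaluations described here.

```python
# Minimal sketch of an RIK vs. MOC benchmark harness (illustrative only).
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    description: str
    expected: str  # "RIK" or "MOC"

SCENARIOS = [
    Scenario("Added PSV",
             "A new pressure safety valve is added to an existing vessel.",
             "MOC"),
    Scenario("Like-for-like PSV",
             "A PSV is replaced with the exact same make and model.",
             "RIK"),
    Scenario("Coated PSV",
             "A replacement PSV is functionally identical but carries a rust-inhibiting coating.",
             "RIK (with caveats)"),
]

PROMPT = (
    "You are a process safety engineer. Using your MOC procedure's definitions, "
    "classify the following change as Replacement-in-Kind (RIK) or Management of "
    "Change (MOC). State your classification, confidence, and engineering caveats.\n\n"
    "Change: {description}"
)

def call_model(prompt: str) -> str:
    """Placeholder adapter: swap in a call to whatever GenAI client you use."""
    raise NotImplementedError

def run_benchmark() -> None:
    # Run every scenario through the same prompt and compare against expectations.
    for scenario in SCENARIOS:
        answer = call_model(PROMPT.format(description=scenario.description))
        print(f"{scenario.name}: expected {scenario.expected}")
        print(answer, "\n")
```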
Scenario 1: Adding a Pressure Safety Valve
All models correctly classified this as NOT a Replacement-in-Kind.
Interpretation:
- Addition of new equipment introduces new functionality
- Clearly falls within MOC scope
Insight:
GenAI performs well when the scenario is unambiguous and aligned with established definitions
Scenario 2: Like-for-Like Replacement
Replacing a pressure safety valve with the exact same model.
All models correctly classified this as Replacement-in-Kind.
Interpretation:
- No change in design, materials, or function
- No new hazards introduced
Insight:
For deterministic, well-defined cases, GenAI demonstrates high consistency and reliability
When Things Get Interesting: Nuanced Engineering Judgments
The real test of GenAI is not simple cases—it is edge conditions where engineering judgment is required.
Scenario 3: Cosmetic Modification (Painted PSV)
A replacement valve is functionally identical but includes a rust-inhibiting coating.
Results:
- Most models: RIK with caveats
- One model: MOC due to potential long-term integrity impact
What the Models Did Well:
- Identified that functionality is unchanged
- Introduced engineering caveats, such as:
  - Thermal effects
  - Inspection implications
  - Material compatibility
What This Tells Us:
- GenAI does not simply classify—it augments the decision with risk considerations
- Divergence is not failure; it reflects different risk interpretations
Scenario 4: Interactive Clarification
Same scenario—but the model is allowed to ask questions before answering.
Observed Behavior:
- Models asked targeted engineering questions:
  - Are the design specifications identical?
  - Does the coating affect performance or identification?
  - Are there regulatory implications?
Outcome:
- Increased confidence levels
- More defensible conclusions
Key Insight:
GenAI becomes significantly more reliable when used as an interactive assistant, rather than a one-shot answer engine
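As an illustration only, here is a minimal Python sketch of such a clarification-first loop. The message format, the QUESTION:/CLASSIFICATION: convention, and the `ask_model` and `answer_question` adapters are assumptions made for this sketch, not the protocol used in the benchmark.

```python
# Sketch of a clarification-first RIK/MOC screening loop (illustrative assumptions).
from typing import Callable

def screen_change(description: str,
                  ask_model: Callable[[list[dict]], str],
                  answer_question: Callable[[str], str],
                  max_questions: int = 3) -> str:
    """Let the model ask clarifying questions before committing to RIK or MOC."""
    messages = [
        {"role": "system",
         "content": ("You are screening a proposed plant change as RIK or MOC. "
                     "If key information is missing, ask ONE clarifying question, "
                     "prefixed with 'QUESTION:'. Otherwise reply with 'CLASSIFICATION:' "
                     "followed by your conclusion, confidence, and caveats.")},
        {"role": "user", "content": description},
    ]
    for _ in range(max_questions + 1):
        reply = ask_model(messages)
        if reply.strip().startswith("QUESTION:"):
            # Capture the question, get an answer, and continue the dialogue.
            question = reply.split("QUESTION:", 1)[1].strip()
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": answer_question(question)})
        else:
            return reply  # final classification with reasoning attached
    return "UNRESOLVED: insufficient information; route to full MOC review"
```

Here `answer_question` stands in for the engineer, or a connected data source, answering the model's questions about design specifications, coatings, or regulatory impact before it commits to a classification.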
Scenario 5: Functional Enhancement (Digital Readout)
A replacement valve includes a digital pressure readout.
Results:
- All models classified this as MOC
Reasoning Patterns:
- Introduction of instrumentation
- Potential for:
  - Power requirements
  - Additional failure modes
  - New leak paths
- Interaction with safety function
What’s Notable: Models consistently identified that even a “small” enhancement can introduce new risk pathways
What the Benchmark Demonstrates
Across all scenarios, several patterns emerge:
1. High Reliability in Clear Cases
GenAI performs consistently when:
- Definitions are explicit
- Changes are obvious
2. Divergence in Ambiguous Cases
When nuance is introduced:
- Models may disagree
- But they still provide valuable engineering context and caveats
3. Questioning Improves Outcomes
Allowing AI to ask questions:
- Improves accuracy
- Mirrors how experienced engineers operate
4. AI Supports—But Does Not Replace—Engineering Judgment
GenAI:
- Identifies relevant considerations
- Surfaces hidden risks
- Structures the decision
But it does not eliminate the need for human accountability
Implications for PSM Programs
The practical takeaway is not that AI can “decide” RIK vs. MOC.
It’s that AI can:
- Standardize initial screening across facilities
- Reduce variability in interpretation
- Prompt engineers with the right questions
- Document reasoning for auditability
This is particularly valuable in large organizations where:
- Definitions are interpreted inconsistently
- Experience levels vary
- Documentation quality is uneven
Where This Fits in a Lifecycle-Based PSM System
Within a structured platform such as FACILEX®, AI-assisted RIK classification can be embedded directly into the MOC lifecycle:
- Initiation Phase: AI screens proposed change
- Scoping Phase: AI identifies missing information
- Impact Analysis: AI highlights potential risk pathways
- Documentation: AI captures reasoning and assumptions
This transforms RIK determination from a binary decision into a traceable, defensible process, as the sketch below illustrates.
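As a rough illustration of what "traceable and defensible" can mean in data terms, the Python sketch below shows one possible screening record that accumulates fields across those lifecycle phases. The field names, identifiers, and example values are hypothetical and do not describe FACILEX®'s actual data model.

```python
# Illustrative sketch of a traceable RIK/MOC screening record captured across
# the MOC lifecycle. Field names and values are assumptions for this sketch.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class ScreeningRecord:
    change_id: str
    description: str
    ai_classification: str                                   # initiation phase
    open_questions: list[str] = field(default_factory=list)  # scoping phase
    risk_pathways: list[str] = field(default_factory=list)   # impact analysis
    reasoning: str = ""                                       # documentation phase
    reviewed_by: str = ""                                      # human accountability stays explicit
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

# Hypothetical example of a completed record
record = ScreeningRecord(
    change_id="CHG-0042",
    description="Replace PSV-101 with same model plus digital pressure readout",
    ai_classification="MOC",
    open_questions=["Does the readout require external power?"],
    risk_pathways=["additional failure modes", "potential leak path at the sensor port"],
    reasoning="Added instrumentation changes function; not a like-for-like replacement.",
    reviewed_by="J. Engineer",
)
print(json.dumps(asdict(record), indent=2))  # audit-ready, machine-readable trail
```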
Final Perspective
The question is no longer whether Generative AI can answer “RIK or MOC?”
The question is whether it can do so:
- Consistently
- Transparently
- In alignment with engineering principles
The evidence suggests that it can—when properly deployed within a governed system.
Used correctly, GenAI does not replace engineering judgment.
It elevates it.