The decision framework for identifying which processes in your organization are strong dark factory candidates, and which will produce expensive failures regardless of how well you build the pipeline.
The dark factory pattern does not work everywhere. Its practitioners are the first to say so. StrongDM’s own manifesto acknowledges that building from scratch with clear specifications is very different from taking over existing systems, and that the latter is harder in ways that the pattern does not solve. MindStudio’s analysis of dark factory agents states it plainly: dark factory automation works best for well-defined, repetitive tasks with testable acceptance criteria, and is poorly suited for ambiguous, novel, or high-stakes problems requiring real judgment.
This post gives you the framework for making that assessment in your own organization, identifying the right first deployments, and understanding why the wrong first deployment will set back internal confidence in the pattern for years.
The Fundamental Eligibility Test
Before any other analysis, every candidate process should pass three primary tests. Failing any one of them is a disqualifier, not a yellow flag.
Specifiability. Can the desired output be described completely and unambiguously in a document, without assuming the reader will fill gaps with judgment? If the answer requires “and someone with domain knowledge would know what we mean by X,” the process is not yet specifiable for autonomous execution. This is not a permanent verdict. It is a signal that specification work must precede pipeline work.
Verifiability. Can correctness be determined from the outside, against observable behavior, without a human reading the code? If verifying the output requires human judgment about whether the approach is sensible, the architecture is right, or the code is maintainable, the process is not a dark factory candidate at this stage. It may be a strong agentic coding candidate with human review, but that is a different and less autonomous operating model.
Consequentiality. What happens when the pipeline makes a mistake? If errors in output can propagate undetected into production with serious consequences before any human sees them, the governance architecture must be airtight before dark factory operation is appropriate. This does not disqualify high-stakes processes permanently. It sets the bar for the governance layer that must be in place before autonomous deployment.
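The three tests work as a hard gate rather than a weighted score, and a minimal sketch makes that explicit. Everything here is illustrative: the `Candidate` fields, the `eligibility` function, and the two example processes are hypothetical names chosen for this post, and the boolean answers are assumed to come from a human assessment rather than from anything automated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of a human assessment of one process."""
    name: str
    specifiable: bool       # output can be described without relying on reader judgment
    verifiable: bool        # correctness can be checked against observable behavior
    governance_ready: bool  # governance matches the cost of an undetected error


def eligibility(candidate: Candidate) -> tuple[bool, list[str]]:
    """Return (eligible, failed_tests). A single failed test disqualifies."""
    checks = {
        "specifiability": candidate.specifiable,
        "verifiability": candidate.verifiable,
        "consequentiality": candidate.governance_ready,
    }
    failed = [test for test, passed in checks.items() if not passed]
    return (not failed, failed)


# Illustrative assessments only; the values are assumptions, not measurements.
admin_ui = Candidate("internal admin UI", True, True, True)
legacy_ext = Candidate("legacy billing extension", False, True, False)

print(eligibility(admin_ui))    # (True, [])
print(eligibility(legacy_ext))  # (False, ['specifiability', 'consequentiality'])
```

Returning the list of failed tests, rather than a bare yes or no, keeps the verdict actionable: a failed specifiability test points to specification work, and a failed consequentiality test points to governance work.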
Strong Candidates: What to Look For
The profile of a strong dark factory candidate shares predictable characteristics across organizations and industries. Volume matters: high-volume, repetitive work generates the ROI needed to justify the architectural investment. Consistency matters: work that follows well-understood patterns is specifiable; work that requires judgment on each instance is not. Domain stability matters: work in a domain that changes slowly produces durable specifications; work in a rapidly evolving domain requires constant specification maintenance that may eliminate the efficiency gain.
| Work Category | Candidate Strength | Primary Reason |
|---|---|---|
| API integration boilerplate | Strong | Fully specifiable from API docs; behaviorally verifiable against digital twins |
| Database migration scripts | Strong | Clear input/output, testable against schema snapshots, high volume in legacy modernization |
| Test suite generation | Strong | Well-defined acceptance criteria; accelerates overall pipeline quality |
| Internal tooling and admin UIs | Strong | Tolerant of iteration; low external consequence of early errors |
| Documentation and changelog generation | Strong | Human-reviewable output; low consequence errors; high volume |
| CRUD service generation | Moderate | Specifiable but requires robust data model specification upfront |
| Novel algorithm design | Weak | Requires judgment about correctness; specification of the unknown is paradoxical |
| Security-critical code | Weak (without human review) | Best LLMs produce secure code only 56–69% of the time in benchmarks |
| Existing legacy system extension | Weak initially | Requires spec reconstruction of undocumented systems before agents can operate |
The Legacy Problem: Why You Cannot Dark-Factory Your Way Through Existing Systems
This deserves extended treatment because it is where the most expensive misalignments of expectations occur.
Most production codebases carry a decade or more of undocumented decisions, workarounds, and institutional knowledge held only by the people who wrote them. There is no complete specification because the system itself is the specification. The only full record of what it does is what it does. Before AI agents can extend or maintain those systems, someone must reconstruct what exists and why. That work is human, and it is harder than building something new.
The implication is that the path to dark factory operation on existing systems runs through a documentation and specification reconstruction phase that has no shortcut. Some of this can be accelerated with AI tools doing code analysis and documentation generation, but the validation that the reconstructed specification actually captures the system’s behavior requires human engineers who understand the domain. This phase typically takes months, not weeks, for systems of any meaningful complexity.
Common Failure Mode
Skipping Spec Reconstruction on Legacy Systems
Organizations that deploy dark factory pipelines against existing systems without a completed specification reconstruction phase find that agents produce code that passes their tests but subtly breaks existing behavior. These failures are often silent: the new code works correctly in isolation, but the undocumented integration contract with legacy components is violated in ways that only surface in production. This is the most expensive failure mode in dark factory implementations, because it manifests late and propagates broadly.
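One common way to make an undocumented contract visible before agents touch it is a characterization test: record what the legacy code actually does, then compare generated code against that record. The sketch below is a minimal illustration of the idea, not a description of any particular team's tooling. `legacy_fee`, `generated_fee`, and the clamping rule are invented for the example.

```python
import json
from typing import Callable


def record_baseline(legacy_fn: Callable[[dict], object], inputs: list[dict]) -> dict[str, object]:
    """Capture what the legacy code actually returns for each sample input."""
    return {json.dumps(i, sort_keys=True): legacy_fn(i) for i in inputs}


def find_regressions(candidate_fn: Callable[[dict], object], baseline: dict[str, object]) -> list[tuple]:
    """Return (input, legacy_output, new_output) for every behavioral difference."""
    diffs = []
    for key, expected in baseline.items():
        actual = candidate_fn(json.loads(key))
        if actual != expected:
            diffs.append((key, expected, actual))
    return diffs


# Hypothetical legacy function with an undocumented rule:
# negative amounts (refunds) are clamped to zero before the fee is computed.
def legacy_fee(order: dict) -> int:
    amount = max(order["amount_cents"], 0)
    return amount * 3 // 100


# Hypothetical generated replacement, written from a spec that never mentioned refunds.
def generated_fee(order: dict) -> int:
    return order["amount_cents"] * 3 // 100


samples = [{"amount_cents": 1000}, {"amount_cents": -500}]
baseline = record_baseline(legacy_fee, samples)
regressions = find_regressions(generated_fee, baseline)
print(regressions)  # [('{"amount_cents": -500}', 0, -15)]
```

The generated function is correct against the spec it was given and still breaks the refund case. A baseline like this only catches what the sample inputs exercise, which is why engineers who know the domain still have to choose those inputs and validate the result.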
The Right First Deployment: Principles for Choosing It
The first dark factory deployment in an organization does more than produce software. It builds or destroys internal credibility for the pattern. A first deployment that underdelivers, regardless of whether the failure is architectural or scope-related, creates organizational resistance that can persist for years. Getting the first one right matters disproportionately.
Four principles for selecting it:
Choose a project where failure is contained and recoverable. Internal tooling is ideal. Customer-facing systems under SLA commitments are not appropriate as first deployments. The first pipeline will have rough edges that need correction. Those corrections must not happen under production pressure.
Choose a project where the specification already exists or is straightforward to write. If the first deployment requires significant specification work as a prerequisite, that work becomes part of the pilot and extends the timeline unpredictably. Find a project where the requirements are already clear and documented.
Choose a project with observable success criteria. The first deployment needs a clear definition of done that is visible to stakeholders: the internal tool works, the API integration passes its integration tests, or the migration script runs cleanly. Avoid projects where “done” involves subjective quality judgments.
Choose a project with meaningful volume. For work a developer could finish in fifty lines of code, a dark factory pipeline is pure architectural overhead. The pattern pays off at volume. The first deployment should be large enough that the productivity delta is visible, even if the absolute savings are not dramatic.
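The four principles can be applied as a filter over a backlog of candidate projects, with volume used to rank whatever survives. This is a rough sketch under stated assumptions: the `Project` fields, the `monthly_work_items` proxy for volume, and the `min_volume` threshold of 20 are all hypothetical and would need to be replaced with whatever your organization actually tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical summary of a candidate first deployment."""
    name: str
    failure_contained: bool   # errors are recoverable and not under an SLA
    spec_ready: bool          # requirements are already clear and documented
    observable_done: bool     # "done" is objective and visible to stakeholders
    monthly_work_items: int   # assumed proxy for volume


def shortlist(projects: list[Project], min_volume: int = 20) -> list[Project]:
    """Keep projects that meet all four principles, highest volume first."""
    eligible = [
        p for p in projects
        if p.failure_contained
        and p.spec_ready
        and p.observable_done
        and p.monthly_work_items >= min_volume
    ]
    return sorted(eligible, key=lambda p: p.monthly_work_items, reverse=True)


# Illustrative backlog; every value is an assumption.
backlog = [
    Project("customer checkout API", False, True, True, 120),
    Project("internal report exporter", True, True, True, 45),
    Project("one-off data fix script", True, True, True, 2),
    Project("partner API connectors", True, True, True, 80),
]

print([p.name for p in shortlist(backlog)])
# ['partner API connectors', 'internal report exporter']
```

The customer-facing project is excluded despite having the highest volume, which matches the intent of the first principle: containment comes before payoff.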
Thinking About the Org Chart Impact
As AI handles more implementation, coordination roles built around human-paced development lose their purpose. Sprint planning, code review cycles, and release management workflows were designed for a world where humans are doing the building and need synchronization. In a dark factory, the pipeline executes without needing those synchronization mechanisms.
This does not mean engineering management disappears. It means the job description changes. As BCG Platinion puts it: the operating model shifts from managing people who write code to orchestrating agents that deliver outcomes. The bottleneck moves from coding speed to clarity of organizational intent. The people who become most valuable are those who can define what should be built with the precision that autonomous execution requires, and who can evaluate whether what was built actually serves the intended purpose.
For technology leaders planning dark factory deployments, this organizational transition is as important to plan for as the technical architecture. Part V addresses the full organizational readiness question. Part IV addresses what must be in place to govern what you cannot watch.