Modelfile

Short definition

A Modelfile is an Ollama-specific configuration file that defines how a local language model variant behaves. It sets the base model, system prompt, and inference parameters in a single declarative file. It functions like a Dockerfile, but for model variants instead of container images.

Extended definition

When teams run large language models locally with Ollama, they often need to produce variants of a base model tailored to a specific task. A Modelfile makes that possible without retraining or fine-tuning. It is a plain-text configuration file that Ollama reads to create a named, reusable model variant using the ollama create command.

The file follows a directive-based syntax. The FROM directive names the base model, such as llama3.1:8b. The SYSTEM directive provides a persistent system prompt that shapes the model’s persona, constraints, or task focus. PARAMETER directives control inference behavior: temperature adjusts output randomness, top_p controls nucleus sampling, and num_ctx sets the context window size in tokens.
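
As an illustration of the directive syntax described above, the following minimal sketch renders a Modelfile from a few values. The helper name render_modelfile and the example system prompt are hypothetical; only the FROM, SYSTEM, and PARAMETER directives come from the text.

```python
def render_modelfile(base_model, system_prompt, parameters):
    """Return Modelfile text with FROM, SYSTEM, and PARAMETER directives."""
    lines = [f"FROM {base_model}"]
    # Triple quotes let the SYSTEM prompt span multiple lines.
    lines.append(f'SYSTEM """{system_prompt}"""')
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical example values, chosen only to show the file layout.
modelfile_text = render_modelfile(
    "llama3.1:8b",
    "You are a concise assistant that summarizes internal documents.",
    {"temperature": 0.3, "top_p": 0.9, "num_ctx": 4096},
)
print(modelfile_text)
```

Writing the printed text to a file named Modelfile would give ollama create something to build from.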

This approach gives engineering teams a reproducible, version-controllable way to configure model behavior. The Modelfile can be checked into a Git repository alongside application code, making model configuration a first-class artifact in the development workflow. Teams can diff changes, review them in pull requests, and roll back when a parameter change degrades output quality.

Modelfiles are particularly useful in production-adjacent environments where models serve narrow, defined functions: a code review assistant, a document summarizer, or a classification engine. By fixing the system prompt and inference parameters in the file, teams reduce output variance and make behavior predictable across deployments.

Deep technical explanation

Directive structure

Ollama reads a Modelfile as a list of directives. FROM is the only required directive. It is conventionally placed first for readability, although Ollama’s documentation states that instructions may appear in any order. It references either a model available in the Ollama library or a local GGUF file path. All other directives are optional but can significantly affect model behavior.

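The SYSTEM directive accepts a single-line string, or a multi-line string wrapped in triple quotes. Ollama inserts this string as the system message, through the model’s prompt template, for conversations with the model variant. This is distinct from a user-level prompt: it applies by default to every request to that variant, and a user message does not replace it. It is a default rather than a hard constraint, however: a client can supply its own system message through the API for a given request, and the model itself may still be steered by user input unless the prompt is written defensively.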
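
A request-level system message is a separate mechanism from a user message. The sketch below builds request bodies for Ollama's /api/generate endpoint, whose documented system field replaces the Modelfile's SYSTEM text for that one request. The helper name and prompts are hypothetical, and the field's behavior is worth confirming against the installed Ollama version.

```python
import json


def build_generate_request(model, prompt, system=None):
    """Build a JSON-serializable body for POST /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system is not None:
        # When present, this replaces the SYSTEM text baked into the variant.
        payload["system"] = system
    return payload


# Uses the variant's own SYSTEM directive.
default_request = build_generate_request("my-variant", "Summarize this diff.")

# Overrides the SYSTEM directive for this request only.
override_request = build_generate_request(
    "my-variant", "Summarize this diff.", system="Answer in one sentence."
)
print(json.dumps(override_request, indent=2))
```

Services that must keep the Modelfile's prompt in force would typically not expose the system field to end users.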

Key PARAMETER directives

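PARAMETER temperature rescales the probability distribution over the model’s next token before sampling. A value of 0.0 makes decoding effectively greedy, always selecting the highest-probability token, so output is close to deterministic for a given prompt. Higher values flatten the distribution and increase randomness, and values above 1.0 make unlikely tokens noticeably more frequent. For structured outputs such as JSON or code, values between 0.0 and 0.4 are common.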
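
The effect of temperature can be sketched with a temperature-scaled softmax. This is a generic illustration of the technique, not Ollama's internal sampler, and the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is treated by samplers as greedy argmax, not a division.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)  # low temperature
flat = softmax_with_temperature(logits, 1.5)   # high temperature
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At the lower temperature nearly all probability mass sits on the top-scoring token, which is why low values suit structured output.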

PARAMETER num_ctx defines the context window in tokens. Increasing this value allows the model to retain more conversation history or process longer input documents, but it also increases memory consumption. For a model like llama3.1:8b, setting num_ctx to 4096 is a balanced default; pushing it to 16384 or higher requires significantly more VRAM or system RAM.
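
To give a rough sense of why a larger num_ctx costs memory, the sketch below estimates the key-value cache size alone. It assumes the published Llama 3.1 8B architecture (32 layers, 8 key-value heads, head dimension 128) and a 16-bit cache. Actual usage also includes model weights and runtime overhead, and depends on Ollama's cache settings.

```python
def kv_cache_bytes(num_ctx, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Estimate KV cache size in bytes for a full context window.

    Defaults are assumptions matching Llama 3.1 8B with a 16-bit cache.
    """
    # Factor of 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * num_ctx


for num_ctx in (2048, 4096, 16384):
    gib = kv_cache_bytes(num_ctx) / 2**30
    print(f"num_ctx={num_ctx}: about {gib:.2f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with num_ctx, so quadrupling the window from 4096 to 16384 quadruples this part of the memory budget.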

PARAMETER top_p applies nucleus sampling. The model considers only the smallest set of tokens whose cumulative probability reaches the top_p threshold. This prevents low-probability tokens from appearing while preserving natural variation. A top_p of 0.9 is a widely used starting point.
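
The nucleus selection described above can be sketched as a filter over a token distribution. This is a generic illustration with made-up probabilities, not Ollama's sampler code.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # threshold reached; drop the remaining tail
    total = sum(kept.values())
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / total for token, p in kept.items()}


probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}  # hypothetical
filtered = nucleus_filter(probs, 0.9)
print(sorted(filtered))
```

With a threshold of 0.9, the three most likely tokens are kept and the low-probability tail is dropped before sampling.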

Build and distribution

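Running ollama create my-variant -f Modelfile builds the named variant locally. Ollama stores the result in its local model store. Teams can then use ollama push to publish the variant to a registry they have push access to, making it available to other machines or CI runners. The variant generally needs to be named with the target namespace (for example, a name of the form namespace/my-variant) before it can be pushed.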
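
In an automated build, the create step might be wrapped as in the sketch below. The helper names are hypothetical; the only CLI invocation used is the ollama create command and -f flag quoted in the text.

```python
import shutil
import subprocess


def create_command(variant_name, modelfile_path):
    """Return the argument list for `ollama create <name> -f <path>`."""
    return ["ollama", "create", variant_name, "-f", modelfile_path]


def build_variant(variant_name, modelfile_path):
    """Run the build, failing loudly if the CLI is missing or the build fails."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama CLI not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(create_command(variant_name, modelfile_path), check=True)


print(" ".join(create_command("my-variant", "Modelfile")))
```

Passing the arguments as a list avoids shell quoting problems when variant names or paths come from pipeline variables.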

Failure modes and edge cases

A missing or mistyped FROM directive causes the build to fail immediately. If the referenced base model is not already pulled, Ollama will attempt to download it, which can cause unexpected delays in automated pipelines. Setting num_ctx higher than the base model’s trained maximum does not extend capability; it may degrade output quality or cause the runtime to truncate silently. Teams should validate context window limits against the base model’s documentation before committing a Modelfile.
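
Some of these failures can be caught before a build with a small pre-commit or CI check. The sketch below is a deliberately simplified line scanner, not Ollama's parser: it ignores multi-line SYSTEM blocks, and the context limit is supplied by the caller from the base model's documentation.

```python
def validate_modelfile(text, max_ctx):
    """Return a list of problems found in Modelfile text (empty if none)."""
    problems = []
    from_model = None
    num_ctx = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        keyword = parts[0].upper()  # directive names are not case sensitive
        if keyword == "FROM" and len(parts) >= 2:
            from_model = parts[1]
        elif (keyword == "PARAMETER" and len(parts) >= 3
              and parts[1].lower() == "num_ctx"):
            try:
                num_ctx = int(parts[2])
            except ValueError:
                problems.append(f"num_ctx is not an integer: {parts[2]}")
    if from_model is None:
        problems.append("missing FROM directive")
    if num_ctx is not None and num_ctx > max_ctx:
        problems.append(f"num_ctx {num_ctx} exceeds assumed limit {max_ctx}")
    return problems


# 131072 is an assumed limit for the example; confirm it for the real model.
good = "FROM llama3.1:8b\nPARAMETER num_ctx 8192\n"
typo = "FORM llama3.1:8b\nPARAMETER num_ctx 8192\n"
print(validate_modelfile(good, max_ctx=131072))
print(validate_modelfile(typo, max_ctx=131072))
```

A check like this runs without Ollama installed, so it can fail a pull request before any model download is attempted.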

Practical examples

Scenario 1: Task-specific code assistant

A backend team needed a local code review assistant restricted to Python and security feedback. They wrote a Modelfile with FROM llama3.1:8b, a SYSTEM directive defining the assistant’s scope, PARAMETER temperature 0.2 for consistent output, and PARAMETER num_ctx 8192 to handle large file diffs. The Modelfile was checked into the repository, and the built variant was shared across the team via an internal Ollama registry.

Scenario 2: Document classification engine

A data engineering team built a document triage tool for legal filings. They created a Modelfile with a SYSTEM directive that instructed the model to return only a JSON object with a category field. Setting temperature to 0.0 removed sampling randomness. The variant produced consistently structured output, so downstream code could parse the response as JSON without extra cleanup logic.

Scenario 3: Per-environment model configuration

A product team maintained two Modelfiles for the same base model: one for development with a higher temperature for creative brainstorming, and one for production with tighter parameters for customer-facing responses. Both files lived in version control, and the correct variant was built and deployed by the CI pipeline based on the target environment.
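
The selection step in such a pipeline could look like the sketch below. The file paths and environment names are hypothetical and stand in for whatever layout a repository actually uses.

```python
# Hypothetical repository layout: one Modelfile per target environment.
MODELFILES = {
    "development": "modelfiles/Modelfile.dev",
    "production": "modelfiles/Modelfile.prod",
}


def select_modelfile(environment):
    """Return the Modelfile path for an environment, rejecting unknown names."""
    try:
        return MODELFILES[environment]
    except KeyError:
        known = ", ".join(sorted(MODELFILES))
        raise ValueError(
            f"unknown environment {environment!r}; expected one of: {known}"
        ) from None


print(select_modelfile("production"))
```

Failing on an unknown environment name, rather than falling back to a default, keeps a misconfigured pipeline from silently deploying the development variant.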

Scenario 4: Reducing context window cost in constrained environments

A team running inference on a CPU-only server set PARAMETER num_ctx 2048 in their Modelfile to reduce memory pressure. This allowed the service to handle concurrent requests without swapping, accepting a trade-off in maximum input length that was acceptable for their short-document use case.

Why it matters

Modelfiles make model configuration reproducible: as long as the FROM reference resolves to the same base weights, the same file produces the same variant every time it is built, removing environment-specific drift.
They enable version control for model behavior: teams can review, diff, and roll back system prompts and parameter changes exactly as they do with application code.
They separate model configuration from application code: inference parameters live in the Modelfile, not scattered across service configuration or environment variables.
They reduce output variance in production by fixing temperature and sampling parameters to values validated during development.
They allow teams to maintain multiple task-specific variants of one base model without maintaining separate fine-tuned weights.
They integrate naturally into CI/CD pipelines: ollama create can be called in a build step, and the resulting variant can be pushed to a registry and promoted through environments.