Configuration Reference

This comprehensive guide covers all configuration options available in OllamaFlow, including the main settings file (ollamaflow.json) and the structure of key objects like frontends and backends.

Main Configuration File

OllamaFlow uses a JSON configuration file (ollamaflow.json) that defines server settings, logging, and authentication. On first startup, a default configuration is created if none exists.

Default Configuration Location

  • Docker: /app/ollamaflow.json
  • Bare Metal: Same directory as executable

Complete Configuration Example

{
  "Logging": {
    "Servers": [
      {
        "Hostname": "127.0.0.1",
        "Port": 514,
        "RandomizePorts": false,
        "MinimumPort": 65000,
        "MaximumPort": 65535
      }
    ],
    "LogDirectory": "./logs/",
    "LogFilename": "ollamaflow.log",
    "ConsoleLogging": true,
    "EnableColors": true,
    "MinimumSeverity": 1
  },
  "Webserver": {
    "Hostname": "*",
    "Port": 43411,
    "IO": {
      "StreamBufferSize": 65536,
      "MaxRequests": 1024,
      "ReadTimeoutMs": 10000,
      "MaxIncomingHeadersSize": 65536,
      "EnableKeepAlive": false
    },
    "Ssl": {
      "Enable": false,
      "MutuallyAuthenticate": false,
      "AcceptInvalidCertificates": true,
      "CertificateFile": "",
      "CertificatePassword": ""
    },
    "Headers": {
      "IncludeContentLength": true,
      "DefaultHeaders": {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "OPTIONS, HEAD, GET, PUT, POST, DELETE, PATCH",
        "Access-Control-Allow-Headers": "*",
        "Access-Control-Expose-Headers": "",
        "Accept": "*/*",
        "Accept-Language": "en-US, en",
        "Accept-Charset": "ISO-8859-1, utf-8",
        "Cache-Control": "no-cache",
        "Connection": "close",
        "Host": "localhost:43411"
      }
    },
    "AccessControl": {
      "DenyList": {},
      "PermitList": {},
      "Mode": "DefaultPermit"
    },
    "Debug": {
      "AccessControl": false,
      "Routing": false,
      "Requests": false,
      "Responses": false
    }
  },
  "DatabaseFilename": "ollamaflow.db",
  "AdminBearerTokens": [
    "your-secure-admin-token"
  ]
}

Configuration Sections

Logging Settings

Controls how OllamaFlow logs information and errors.

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `LogDirectory` | string | `"./logs/"` | Directory for log files |
| `LogFilename` | string | `"ollamaflow.log"` | Base filename for logs |
| `ConsoleLogging` | boolean | `true` | Enable console output |
| `EnableColors` | boolean | `true` | Enable colored console output |
| `MinimumSeverity` | integer | `1` | Minimum log level (0=Debug, 1=Info, 2=Warn, 3=Error) |

Syslog Servers

Optional remote logging configuration:

{
  "Servers": [
    {
      "Hostname": "syslog.company.com",
      "Port": 514,
      "RandomizePorts": false,
      "MinimumPort": 65000,
      "MaximumPort": 65535
    }
  ]
}

Webserver Settings

Configures the HTTP server that handles all requests.

Basic Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `Hostname` | string | `"*"` | Bind hostname (`*` for all interfaces) |
| `Port` | integer | `43411` | TCP port to listen on |

IO Settings

Controls request handling and performance:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `StreamBufferSize` | integer | `65536` | Buffer size for streaming responses |
| `MaxRequests` | integer | `1024` | Maximum concurrent requests |
| `ReadTimeoutMs` | integer | `10000` | Request read timeout in milliseconds |
| `MaxIncomingHeadersSize` | integer | `65536` | Maximum size of request headers |
| `EnableKeepAlive` | boolean | `false` | Enable HTTP keep-alive connections |

SSL Settings

HTTPS configuration for secure connections:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `Enable` | boolean | `false` | Enable HTTPS |
| `MutuallyAuthenticate` | boolean | `false` | Require client certificates |
| `AcceptInvalidCertificates` | boolean | `true` | Accept self-signed certificates |
| `CertificateFile` | string | `""` | Path to SSL certificate file |
| `CertificatePassword` | string | `""` | Certificate password if required |

Headers Settings

Default HTTP headers and CORS configuration:

{
  "IncludeContentLength": true,
  "DefaultHeaders": {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "OPTIONS, HEAD, GET, PUT, POST, DELETE, PATCH",
    "Access-Control-Allow-Headers": "*"
  }
}

Access Control

IP-based access control (optional):

{
  "DenyList": {
    "192.168.1.100": "Blocked IP",
    "10.0.0.0/8": "Blocked network"
  },
  "PermitList": {
    "192.168.1.0/24": "Allowed network"
  },
  "Mode": "DefaultPermit"
}

Modes:

  • DefaultPermit: Allow all except denied IPs
  • DefaultDeny: Deny all except permitted IPs
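
The two modes can be sketched with Python's standard `ipaddress` module. This is an illustrative model of the behavior described above, not OllamaFlow's actual implementation:

```python
import ipaddress

def is_permitted(client_ip, deny_list, permit_list, mode):
    """Model of IP-based access control.

    deny_list / permit_list map an IP or CIDR string to a reason,
    mirroring the DenyList / PermitList objects shown above.
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # A bare IP key (e.g. "192.168.1.100") parses as a /32 network.
        return any(ip in ipaddress.ip_network(net, strict=False) for net in entries)

    if mode == "DefaultPermit":
        # Allow everything except explicitly denied addresses/networks.
        return not matches(deny_list)
    if mode == "DefaultDeny":
        # Deny everything except explicitly permitted addresses/networks.
        return matches(permit_list)
    raise ValueError(f"unknown mode: {mode}")
```

A single-IP key such as `"192.168.1.100"` is treated as a /32 network here.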

Database Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `DatabaseFilename` | string | `"ollamaflow.db"` | SQLite database file path |

Authentication Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `AdminBearerTokens` | array | `["ollamaflowadmin"]` | Valid bearer tokens for admin APIs |

Frontend Configuration

Frontends are virtual Ollama endpoints that clients connect to. They are stored in the database and managed via the API.

Frontend Object Structure

{
  "Identifier": "production-frontend",
  "Name": "Production AI Inference",
  "Hostname": "ai.company.com",
  "TimeoutMs": 90000,
  "LoadBalancing": "RoundRobin",
  "BlockHttp10": true,
  "MaxRequestBodySize": 1073741824,
  "Backends": ["gpu-1", "gpu-2", "gpu-3"],
  "RequiredModels": ["llama3:8b", "mistral:7b"],
  "AllowEmbeddings": true,
  "AllowCompletions": true,
  "PinnedEmbeddingsProperties": {
    "model": "nomic-embed-text",
    "options": {
      "temperature": 0.1
    }
  },
  "PinnedCompletionsProperties": {
    "options": {
      "temperature": 0.7,
      "num_ctx": 2048
    }
  },
  "LogRequestFull": false,
  "LogRequestBody": false,
  "LogResponseBody": false,
  "UseStickySessions": false,
  "StickySessionExpirationMs": 1800000,
  "Active": true
}

Frontend Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `Identifier` | string | Required | Unique identifier for this frontend |
| `Name` | string | Required | Human-readable name |
| `Hostname` | string | `"*"` | Hostname pattern (`*` for catch-all) |
| `TimeoutMs` | integer | `60000` | Request timeout in milliseconds |
| `LoadBalancing` | enum | `"RoundRobin"` | Load balancing algorithm |
| `BlockHttp10` | boolean | `true` | Reject HTTP/1.0 requests |
| `MaxRequestBodySize` | integer | `536870912` | Max request size in bytes (512MB) |
| `Backends` | array | `[]` | List of backend identifiers |
| `RequiredModels` | array | `[]` | Models that must be available |
| `AllowEmbeddings` | boolean | `true` | Allow embeddings API requests |
| `AllowCompletions` | boolean | `true` | Allow completions API requests |
| `PinnedEmbeddingsProperties` | object | `{}` | Key-value pairs merged into embeddings requests |
| `PinnedCompletionsProperties` | object | `{}` | Key-value pairs merged into completions requests |
| `UseStickySessions` | boolean | `false` | Enable session stickiness |
| `StickySessionExpirationMs` | integer | `1800000` | Session timeout (30 minutes, min: 10s, max: 24h) |
| `LogRequestFull` | boolean | `false` | Log complete requests |
| `LogRequestBody` | boolean | `false` | Log request bodies |
| `LogResponseBody` | boolean | `false` | Log response bodies |
| `Active` | boolean | `true` | Whether frontend is active |

Load Balancing Options

  • "RoundRobin": Cycle through backends sequentially
  • "Random": Randomly select from healthy backends
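
A minimal sketch of both strategies (illustrative Python; in OllamaFlow the selector also restricts the pool to healthy backends):

```python
import itertools
import random

class RoundRobinBalancer:
    """"RoundRobin": cycle through the backend list sequentially."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next(self):
        return next(self._cycle)

def pick_random(healthy_backends):
    """"Random": pick uniformly from the healthy backends."""
    return random.choice(healthy_backends)
```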

Hostname Patterns

  • "*": Match all hostnames (catch-all)
  • "api.company.com": Exact hostname match
  • Multiple frontends can exist with different hostname patterns
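
Conceptually, frontend selection by `Host` header looks like the sketch below (illustrative Python; the assumption that an exact hostname match takes precedence over the `"*"` catch-all is mine, not stated in the source):

```python
def resolve_frontend(host_header, frontends):
    """Pick a frontend for a request: exact hostname match first,
    then fall back to a "*" catch-all frontend if one exists."""
    for f in frontends:
        if f["Hostname"] == host_header:
            return f
    for f in frontends:
        if f["Hostname"] == "*":
            return f
    return None
```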

Security Controls

Frontend security controls enable fine-grained access control and request parameter enforcement:

Request Type Controls

  • AllowEmbeddings: Controls whether embeddings API endpoints are accessible through this frontend
    • Ollama API: /api/embed
    • OpenAI API: /v1/embeddings
  • AllowCompletions: Controls whether completion API endpoints are accessible through this frontend
    • Ollama API: /api/generate, /api/chat
    • OpenAI API: /v1/completions, /v1/chat/completions

For a request to succeed, both the frontend and at least one assigned backend must allow the request type.

Pinned Properties

Pinned properties allow administrators to enforce specific parameters in requests, providing security compliance and standardization:

  • PinnedEmbeddingsProperties: Key-value pairs automatically merged into all embeddings requests
  • PinnedCompletionsProperties: Key-value pairs automatically merged into all completion requests

Common use cases:

  • Enforce maximum context size: {"options": {"num_ctx": 2048}}
  • Standardize temperature settings: {"options": {"temperature": 0.7}}
  • Override model selection: {"model": "approved-model:latest"}
  • Set organizational defaults: {"options": {"top_p": 0.9, "top_k": 40}}

Properties are merged with client requests, with pinned properties taking precedence over client-specified values.

Backend Configuration

Backends represent physical Ollama instances in your infrastructure.

Backend Object Structure

{
  "Identifier": "gpu-server-1",
  "Name": "Primary GPU Server",
  "Hostname": "192.168.1.100",
  "Port": 11434,
  "Ssl": false,
  "UnhealthyThreshold": 3,
  "HealthyThreshold": 2,
  "HealthCheckMethod": "GET",
  "HealthCheckUrl": "/",
  "MaxParallelRequests": 8,
  "RateLimitRequestsThreshold": 20,
  "AllowEmbeddings": true,
  "AllowCompletions": true,
  "BearerToken": null,
  "Querystring": "",
  "Headers": {},
  "PinnedEmbeddingsProperties": {
    "options": {
      "num_ctx": 512
    }
  },
  "PinnedCompletionsProperties": {
    "options": {
      "num_ctx": 4096,
      "temperature": 0.8
    }
  },
  "LogRequestFull": false,
  "LogRequestBody": false,
  "LogResponseBody": false,
  "Active": true
}

Backend Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `Identifier` | string | Required | Unique identifier for this backend |
| `Name` | string | Required | Human-readable name |
| `Hostname` | string | Required | Backend server hostname/IP |
| `Port` | integer | `11434` | Backend server port |
| `Ssl` | boolean | `false` | Use HTTPS for backend communication |
| `UnhealthyThreshold` | integer | `2` | Failed checks before marking unhealthy |
| `HealthyThreshold` | integer | `2` | Successful checks before marking healthy |
| `HealthCheckMethod` | string | `"GET"` | HTTP method for health checks (GET or HEAD) |
| `HealthCheckUrl` | string | `"/"` | URL path for health checks |
| `MaxParallelRequests` | integer | `4` | Maximum concurrent requests |
| `RateLimitRequestsThreshold` | integer | `10` | Rate limiting threshold |
| `AllowEmbeddings` | boolean | `true` | Allow embeddings API requests |
| `AllowCompletions` | boolean | `true` | Allow completions API requests |
| `PinnedEmbeddingsProperties` | object | `{}` | Key-value pairs merged into embeddings requests |
| `PinnedCompletionsProperties` | object | `{}` | Key-value pairs merged into completions requests |
| `BearerToken` | string | `null` | Bearer token attached to each backend request |
| `Querystring` | string | `null` | Querystring appended to each backend request (no leading `?`; separate pairs with `&`) |
| `Headers` | object | `{}` | Custom headers attached to each backend request |
| `LogRequestFull` | boolean | `false` | Log complete requests |
| `LogRequestBody` | boolean | `false` | Log request bodies |
| `LogResponseBody` | boolean | `false` | Log response bodies |
| `Active` | boolean | `true` | Whether backend is active |

Health Check Configuration

Health checks validate backend availability:

  • Method: HTTP method (GET, HEAD)
  • URL: Path to check (e.g., /, /api/version, /health)
  • Thresholds: Number of consecutive successes/failures to change state

Common health check endpoints:

  • HEAD /: Basic connectivity check for Ollama
  • GET /health: Basic connectivity check for vLLM
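
The threshold behavior amounts to a small state machine over consecutive results, sketched here in illustrative Python (not the actual implementation):

```python
class HealthState:
    """Track backend health: the state flips only after
    `unhealthy_threshold` consecutive failures or
    `healthy_threshold` consecutive successes."""
    def __init__(self, unhealthy_threshold=2, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def record(self, check_succeeded):
        if check_succeeded:
            self._successes += 1
            self._fails = 0  # a success resets the failure streak
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._successes = 0  # a failure resets the success streak
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```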

Rate Limiting

Backends can enforce rate limits:

  • Requests exceeding RateLimitRequestsThreshold receive HTTP 429
  • Rate limiting is per backend, not global
  • Helps protect individual Ollama instances from overload
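
As a rough model (illustrative Python; the exact semantics of `RateLimitRequestsThreshold` are not specified here, so this assumes it caps concurrently queued/active requests per backend):

```python
class BackendLimiter:
    """Per-backend limiter: reject with 429 once the number of
    in-flight requests reaches the configured threshold."""
    def __init__(self, threshold=10):
        self.threshold = threshold
        self.active = 0

    def try_acquire(self):
        if self.active >= self.threshold:
            return 429  # Too Many Requests
        self.active += 1
        return 200

    def release(self):
        self.active -= 1
```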

Security Controls

Backend security controls provide additional layers of request filtering and parameter enforcement:

Request Type Controls

  • AllowEmbeddings: Controls whether this backend can process embeddings requests
  • AllowCompletions: Controls whether this backend can process completion requests

Requests are only routed to backends that allow the specific request type. This enables:

  • Dedicated embeddings servers that only handle embeddings requests:
    • Ollama API: /api/embed
    • OpenAI API: /v1/embeddings
  • Completion-only servers that only handle completion requests:
    • Ollama API: /api/generate, /api/chat
    • OpenAI API: /v1/completions, /v1/chat/completions
  • Multi-tenant isolation by request type

Pinned Properties

Backend pinned properties provide server-level parameter enforcement:

  • PinnedEmbeddingsProperties: Applied to all embeddings requests routed to this backend
  • PinnedCompletionsProperties: Applied to all completion requests routed to this backend

Backend pinned properties are merged after frontend pinned properties, allowing for:

  • Server-specific resource limits: {"options": {"num_ctx": 1024}}
  • Hardware-optimized settings: {"options": {"num_gpu": 2}}
  • Backend-specific model overrides: {"model": "server-optimized-model"}

The merge order is: Client Request → Frontend Pinned Properties → Backend Pinned Properties, with later values taking precedence.
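
The three-stage merge can be sketched as a recursive dictionary merge (illustrative Python; whether the real merge recurses into nested objects like `options` is inferred from the examples above):

```python
def deep_merge(base, override):
    """Recursively merge two dicts; values from `override` win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

def apply_pinned(client_request, frontend_pinned, backend_pinned):
    """Client Request -> Frontend Pinned -> Backend Pinned; later wins."""
    return deep_merge(deep_merge(client_request, frontend_pinned), backend_pinned)
```

For example, a client's `temperature` is overridden by the frontend, and its `num_ctx` by the backend, while untouched fields pass through.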


Environment Variables

Settings such as the listening port and admin token can also be supplied as environment variables when running under Docker:

docker run -d \
  -e OLLAMAFLOW_PORT=8080 \
  -e OLLAMAFLOW_ADMIN_TOKEN=my-secure-token \
  -e ASPNETCORE_ENVIRONMENT=Production \
  -p 8080:8080 \
  jchristn/ollamaflow



Request Customization

Backends support additional request customization options for communicating with upstream services that require authentication or special parameters:

  • BearerToken: If set, OllamaFlow automatically adds an Authorization: Bearer {token} header to all requests sent to this backend. Useful for:

    • OpenAI-compatible APIs requiring API keys
    • Azure OpenAI Service authentication
    • Custom inference endpoints with bearer authentication
  • Querystring: If set, the specified querystring is appended to all URLs when communicating with this backend. Do not include the leading ? character. Separate multiple key-value pairs with ampersands (e.g., foo=bar&key=val). Useful for:

    • API versioning: api-version=2024-01
    • Deployment targeting: deployment=gpt-4
    • Custom routing parameters
  • Headers: A dictionary of custom headers added to all requests sent to this backend. Useful for:

    • Organization identification: X-Organization-Id: org-123
    • Custom routing headers: X-Custom-Region: us-east
    • Compliance headers: X-Audit-Id: audit-456

Example configuration for Azure OpenAI:

{
  "Identifier": "azure-openai",
  "Name": "Azure OpenAI Service",
  "Hostname": "my-resource.openai.azure.com",
  "Port": 443,
  "Ssl": true,
  "BearerToken": "your-azure-api-key",
  "Querystring": "api-version=2024-02-15-preview",
  "Headers": {
    "X-MS-Region": "eastus"
  }
}
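
How these three fields shape the upstream request can be sketched as follows (illustrative Python; `build_backend_request` is a hypothetical helper, not an OllamaFlow API):

```python
def build_backend_request(backend, path, client_headers=None):
    """Assemble the upstream URL and headers from a backend object:
    Ssl picks the scheme, Querystring is appended to the URL,
    Headers are merged in, and BearerToken becomes an Authorization header."""
    scheme = "https" if backend.get("Ssl") else "http"
    url = f"{scheme}://{backend['Hostname']}:{backend['Port']}{path}"
    qs = backend.get("Querystring")
    if qs:
        url += ("&" if "?" in path else "?") + qs
    headers = dict(client_headers or {})
    headers.update(backend.get("Headers") or {})
    if backend.get("BearerToken"):
        headers["Authorization"] = f"Bearer {backend['BearerToken']}"
    return url, headers
```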

Configuration Examples

Basic Single Backend

Minimal configuration for testing:

{
  "Webserver": {
    "Port": 43411
  },
  "AdminBearerTokens": ["test-token"]
}

Frontend/Backend via API:

# Create backend
curl -X PUT -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "local", "Hostname": "localhost", "Port": 11434}' \
  http://localhost:43411/v1.0/backends

# Create frontend
curl -X PUT -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "main", "Backends": ["local"]}' \
  http://localhost:43411/v1.0/frontends

Production Multi-Backend

Production configuration with multiple GPU servers:

{
  "Webserver": {
    "Hostname": "*",
    "Port": 43411,
    "Ssl": {
      "Enable": true,
      "CertificateFile": "/etc/ssl/ollamaflow.crt",
      "CertificatePassword": "cert-password"
    }
  },
  "Logging": {
    "LogDirectory": "/var/log/ollamaflow/",
    "ConsoleLogging": false,
    "MinimumSeverity": 1
  },
  "DatabaseFilename": "/var/lib/ollamaflow/ollamaflow.db",
  "AdminBearerTokens": [
    "secure-production-token-1",
    "secure-production-token-2"
  ]
}

Backends configuration:

# GPU servers
for i in {1..4}; do
  curl -X PUT -H "Authorization: Bearer secure-production-token-1" \
    -H "Content-Type: application/json" \
    -d "{
      \"Identifier\": \"gpu-$i\",
      \"Name\": \"GPU Server $i\",
      \"Hostname\": \"gpu$i.company.internal\",
      \"Port\": 11434,
      \"MaxParallelRequests\": 8,
      \"HealthCheckUrl\": \"/api/version\"
    }" \
    http://localhost:43411/v1.0/backends
done

# Production frontend
curl -X PUT -H "Authorization: Bearer secure-production-token-1" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "production",
    "Name": "Production AI Inference",
    "Hostname": "ai.company.com",
    "LoadBalancing": "RoundRobin",
    "Backends": ["gpu-1", "gpu-2", "gpu-3", "gpu-4"],
    "RequiredModels": ["llama3:8b", "mistral:7b", "codellama:13b"],
    "TimeoutMs": 120000
  }' \
  http://localhost:43411/v1.0/frontends

Development Environment

Development setup with debugging enabled:

{
  "Webserver": {
    "Port": 43411,
    "Debug": {
      "Routing": true,
      "Requests": true,
      "Responses": false
    }
  },
  "Logging": {
    "ConsoleLogging": true,
    "EnableColors": true,
    "MinimumSeverity": 0
  },
  "AdminBearerTokens": ["dev-token"]
}

Configuration Validation

OllamaFlow validates configuration on startup:

Common Validation Errors

  1. Invalid Port Range: Ports must be 1-65535
  2. Missing Required Fields: Identifier, Hostname required for backends
  3. Duplicate Identifiers: Frontend/Backend IDs must be unique
  4. Invalid Load Balancing: Must be "RoundRobin" or "Random"
  5. Invalid Hostnames: Must be valid hostname or "*"
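
A few of these checks can be expressed compactly; the sketch below is illustrative Python, not OllamaFlow's validator:

```python
def validate_backend(backend, existing_ids=()):
    """Collect validation errors for a backend object:
    required fields, port range, and identifier uniqueness."""
    errors = []
    for field in ("Identifier", "Hostname"):
        if not backend.get(field):
            errors.append(f"missing required field: {field}")
    port = backend.get("Port", 11434)
    if not 1 <= port <= 65535:
        errors.append(f"invalid port: {port}")
    if backend.get("Identifier") in existing_ids:
        errors.append(f"duplicate identifier: {backend['Identifier']}")
    return errors
```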

Configuration Test

Validate configuration without starting the server:

# Test configuration file
dotnet OllamaFlow.Server.dll --validate-config

# Test with specific config
OLLAMAFLOW_CONFIG=/path/to/config.json dotnet OllamaFlow.Server.dll --validate-config

Migration and Backup

Database Backup

Backup the SQLite database regularly:

# Simple copy (stop OllamaFlow first)
cp ollamaflow.db ollamaflow.db.backup

# Online backup (while running)
sqlite3 ollamaflow.db ".backup /path/to/backup.db"

Configuration Migration

When upgrading OllamaFlow:

  1. Backup Configuration: Save current ollamaflow.json
  2. Backup Database: Save current ollamaflow.db
  3. Review Changes: Check for new configuration options
  4. Test Upgrade: Test in non-production environment first

Export/Import Configuration

Export current configuration for replication:

# Export all frontends
curl -H "Authorization: Bearer token" \
  http://localhost:43411/v1.0/frontends > frontends.json

# Export all backends
curl -H "Authorization: Bearer token" \
  http://localhost:43411/v1.0/backends > backends.json

Import configuration to new instance:

# Import backends first (jq -c emits one compact JSON object per line)
jq -c '.[]' backends.json | while read -r backend; do
  curl -X PUT -H "Authorization: Bearer token" \
    -H "Content-Type: application/json" \
    -d "$backend" \
    http://new-host:43411/v1.0/backends
done

# Then import frontends
jq -c '.[]' frontends.json | while read -r frontend; do
  curl -X PUT -H "Authorization: Bearer token" \
    -H "Content-Type: application/json" \
    -d "$frontend" \
    http://new-host:43411/v1.0/frontends
done

Next Steps