Configuration Reference

This comprehensive guide covers all configuration options available in OllamaFlow, including the main settings file (ollamaflow.json) and the structure of key objects like frontends and backends.

Main Configuration File

OllamaFlow uses a JSON configuration file (ollamaflow.json) that defines server settings, logging, and authentication. On first startup, a default configuration is created if none exists.

Default Configuration Location

  • Docker: /app/ollamaflow.json
  • Bare Metal: Same directory as executable

Complete Configuration Example

{
  "Logging": {
    "Servers": [
      {
        "Hostname": "127.0.0.1",
        "Port": 514,
        "RandomizePorts": false,
        "MinimumPort": 65000,
        "MaximumPort": 65535
      }
    ],
    "LogDirectory": "./logs/",
    "LogFilename": "ollamaflow.log",
    "ConsoleLogging": true,
    "EnableColors": true,
    "MinimumSeverity": 1
  },
  "Webserver": {
    "Hostname": "*",
    "Port": 43411,
    "IO": {
      "StreamBufferSize": 65536,
      "MaxRequests": 1024,
      "ReadTimeoutMs": 10000,
      "MaxIncomingHeadersSize": 65536,
      "EnableKeepAlive": false
    },
    "Ssl": {
      "Enable": false,
      "MutuallyAuthenticate": false,
      "AcceptInvalidCertificates": true,
      "CertificateFile": "",
      "CertificatePassword": ""
    },
    "Headers": {
      "IncludeContentLength": true,
      "DefaultHeaders": {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "OPTIONS, HEAD, GET, PUT, POST, DELETE, PATCH",
        "Access-Control-Allow-Headers": "*",
        "Access-Control-Expose-Headers": "",
        "Accept": "*/*",
        "Accept-Language": "en-US, en",
        "Accept-Charset": "ISO-8859-1, utf-8",
        "Cache-Control": "no-cache",
        "Connection": "close",
        "Host": "localhost:43411"
      }
    },
    "AccessControl": {
      "DenyList": {},
      "PermitList": {},
      "Mode": "DefaultPermit"
    },
    "Debug": {
      "AccessControl": false,
      "Routing": false,
      "Requests": false,
      "Responses": false
    }
  },
  "DatabaseFilename": "ollamaflow.db",
  "AdminBearerTokens": [
    "your-secure-admin-token"
  ]
}

Configuration Sections

Logging Settings

Controls how OllamaFlow logs information and errors.

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `LogDirectory` | string | `"./logs/"` | Directory for log files |
| `LogFilename` | string | `"ollamaflow.log"` | Base filename for logs |
| `ConsoleLogging` | boolean | `true` | Enable console output |
| `EnableColors` | boolean | `true` | Enable colored console output |
| `MinimumSeverity` | integer | `1` | Minimum log level (0=Debug, 1=Info, 2=Warn, 3=Error) |

Syslog Servers

Optional remote logging configuration:

{
  "Servers": [
    {
      "Hostname": "syslog.company.com",
      "Port": 514,
      "RandomizePorts": false,
      "MinimumPort": 65000,
      "MaximumPort": 65535
    }
  ]
}

Webserver Settings

Configures the HTTP server that handles all requests.

Basic Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `Hostname` | string | `"*"` | Bind hostname (`*` for all interfaces) |
| `Port` | integer | `43411` | TCP port to listen on |

IO Settings

Controls request handling and performance:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `StreamBufferSize` | integer | `65536` | Buffer size for streaming responses |
| `MaxRequests` | integer | `1024` | Maximum concurrent requests |
| `ReadTimeoutMs` | integer | `10000` | Request read timeout in milliseconds |
| `MaxIncomingHeadersSize` | integer | `65536` | Maximum size of request headers |
| `EnableKeepAlive` | boolean | `false` | Enable HTTP keep-alive connections |

SSL Settings

HTTPS configuration for secure connections:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `Enable` | boolean | `false` | Enable HTTPS |
| `MutuallyAuthenticate` | boolean | `false` | Require client certificates |
| `AcceptInvalidCertificates` | boolean | `true` | Accept self-signed certificates |
| `CertificateFile` | string | `""` | Path to SSL certificate file |
| `CertificatePassword` | string | `""` | Certificate password if required |

Headers Settings

Default HTTP headers and CORS configuration:

{
  "IncludeContentLength": true,
  "DefaultHeaders": {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "OPTIONS, HEAD, GET, PUT, POST, DELETE, PATCH",
    "Access-Control-Allow-Headers": "*"
  }
}

Access Control

IP-based access control (optional):

{
  "DenyList": {
    "192.168.1.100": "Blocked IP",
    "10.0.0.0/8": "Blocked network"
  },
  "PermitList": {
    "192.168.1.0/24": "Allowed network"
  },
  "Mode": "DefaultPermit"
}

Modes:

  • DefaultPermit: Allow all except denied IPs
  • DefaultDeny: Deny all except permitted IPs
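
The two modes can be sketched with Python's standard `ipaddress` module. This is an illustrative model of the behavior described above, not OllamaFlow's actual implementation:

```python
import ipaddress

def is_permitted(client_ip, deny_list, permit_list, mode):
    """Model of IP-based access control.

    deny_list / permit_list map an IP or CIDR string to a reason,
    mirroring the DenyList / PermitList objects shown above.
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # A bare IP key (e.g. "192.168.1.100") parses as a /32 network.
        return any(ip in ipaddress.ip_network(net, strict=False) for net in entries)

    if mode == "DefaultPermit":
        # Allow everything except explicitly denied addresses/networks.
        return not matches(deny_list)
    if mode == "DefaultDeny":
        # Deny everything except explicitly permitted addresses/networks.
        return matches(permit_list)
    raise ValueError(f"unknown mode: {mode}")
```

A single-IP key such as `"192.168.1.100"` is treated as a /32 network here.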

Database Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `DatabaseFilename` | string | `"ollamaflow.db"` | SQLite database file path |

Authentication Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `AdminBearerTokens` | array | `["ollamaflowadmin"]` | Valid bearer tokens for admin APIs |

Frontend Configuration

Frontends are virtual Ollama endpoints that clients connect to. They are stored in the database and managed via the API.

Frontend Object Structure

{
  "Identifier": "production-frontend",
  "Name": "Production AI Inference",
  "Hostname": "ai.company.com",
  "TimeoutMs": 90000,
  "LoadBalancing": "RoundRobin",
  "BlockHttp10": true,
  "MaxRequestBodySize": 1073741824,
  "Backends": ["gpu-1", "gpu-2", "gpu-3"],
  "RequiredModels": ["llama3:8b", "mistral:7b"],
  "AllowEmbeddings": true,
  "AllowCompletions": true,
  "PinnedEmbeddingsProperties": {
    "model": "nomic-embed-text",
    "options": {
      "temperature": 0.1
    }
  },
  "PinnedCompletionsProperties": {
    "options": {
      "temperature": 0.7,
      "num_ctx": 2048
    }
  },
  "LogRequestFull": false,
  "LogRequestBody": false,
  "LogResponseBody": false,
  "UseStickySessions": false,
  "StickySessionExpirationMs": 1800000,
  "Active": true
}

Frontend Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `Identifier` | string | Required | Unique identifier for this frontend |
| `Name` | string | Required | Human-readable name |
| `Hostname` | string | `"*"` | Hostname pattern (`*` for catch-all) |
| `TimeoutMs` | integer | `60000` | Request timeout in milliseconds |
| `LoadBalancing` | enum | `"RoundRobin"` | Load balancing algorithm |
| `BlockHttp10` | boolean | `true` | Reject HTTP/1.0 requests |
| `MaxRequestBodySize` | integer | `536870912` | Max request size in bytes (512MB) |
| `Backends` | array | `[]` | List of backend identifiers |
| `RequiredModels` | array | `[]` | Models that must be available |
| `AllowEmbeddings` | boolean | `true` | Allow embeddings API requests |
| `AllowCompletions` | boolean | `true` | Allow completions API requests |
| `PinnedEmbeddingsProperties` | object | `{}` | Key-value pairs merged into embeddings requests |
| `PinnedCompletionsProperties` | object | `{}` | Key-value pairs merged into completions requests |
| `UseStickySessions` | boolean | `false` | Enable session stickiness |
| `StickySessionExpirationMs` | integer | `1800000` | Session timeout (30 minutes, min: 10s, max: 24h) |
| `LogRequestFull` | boolean | `false` | Log complete requests |
| `LogRequestBody` | boolean | `false` | Log request bodies |
| `LogResponseBody` | boolean | `false` | Log response bodies |
| `Active` | boolean | `true` | Whether frontend is active |

Load Balancing Options

  • "RoundRobin": Cycle through backends sequentially
  • "Random": Randomly select from healthy backends
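
A minimal sketch of both strategies (illustrative Python; in OllamaFlow the selector also restricts the pool to healthy backends):

```python
import itertools
import random

class RoundRobinBalancer:
    """"RoundRobin": cycle through the backend list sequentially."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next(self):
        return next(self._cycle)

def pick_random(healthy_backends):
    """"Random": pick uniformly from the healthy backends."""
    return random.choice(healthy_backends)
```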

Hostname Patterns

  • "*": Match all hostnames (catch-all)
  • "api.company.com": Exact hostname match
  • Multiple frontends can exist with different hostname patterns
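
Conceptually, frontend selection by `Host` header looks like the sketch below (illustrative Python; the assumption that an exact hostname match takes precedence over the `"*"` catch-all is mine, not stated in the source):

```python
def resolve_frontend(host_header, frontends):
    """Pick a frontend for a request: exact hostname match first,
    then fall back to a "*" catch-all frontend if one exists."""
    for f in frontends:
        if f["Hostname"] == host_header:
            return f
    for f in frontends:
        if f["Hostname"] == "*":
            return f
    return None
```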

Security Controls

Frontend security controls enable fine-grained access control and request parameter enforcement:

Request Type Controls

  • AllowEmbeddings: Controls whether embeddings API endpoints are accessible through this frontend
    • Ollama API: /api/embed
    • OpenAI API: /v1/embeddings
  • AllowCompletions: Controls whether completion API endpoints are accessible through this frontend
    • Ollama API: /api/generate, /api/chat
    • OpenAI API: /v1/completions, /v1/chat/completions

For a request to succeed, both the frontend and at least one assigned backend must allow the request type.

Pinned Properties

Pinned properties allow administrators to enforce specific parameters in requests, providing security compliance and standardization:

  • PinnedEmbeddingsProperties: Key-value pairs automatically merged into all embeddings requests
  • PinnedCompletionsProperties: Key-value pairs automatically merged into all completion requests

Common use cases:

  • Enforce maximum context size: {"options": {"num_ctx": 2048}}
  • Standardize temperature settings: {"options": {"temperature": 0.7}}
  • Override model selection: {"model": "approved-model:latest"}
  • Set organizational defaults: {"options": {"top_p": 0.9, "top_k": 40}}

Properties are merged with client requests, with pinned properties taking precedence over client-specified values.

Backend Configuration

Backends represent physical Ollama instances in your infrastructure.

Backend Object Structure

{
  "Identifier": "gpu-server-1",
  "Name": "Primary GPU Server",
  "Hostname": "192.168.1.100",
  "Port": 11434,
  "Ssl": false,
  "UnhealthyThreshold": 3,
  "HealthyThreshold": 2,
  "HealthCheckMethod": "GET",
  "HealthCheckUrl": "/",
  "MaxParallelRequests": 8,
  "RateLimitRequestsThreshold": 20,
  "AllowEmbeddings": true,
  "AllowCompletions": true,
  "BearerToken": null,
  "Querystring": "",
  "Headers": {},
  "PinnedEmbeddingsProperties": {
    "options": {
      "num_ctx": 512
    }
  },
  "PinnedCompletionsProperties": {
    "options": {
      "num_ctx": 4096,
      "temperature": 0.8
    }
  },
  "LogRequestFull": false,
  "LogRequestBody": false,
  "LogResponseBody": false,
  "Active": true
}

Backend Properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `Identifier` | string | Required | Unique identifier for this backend |
| `Name` | string | Required | Human-readable name |
| `Hostname` | string | Required | Backend server hostname/IP |
| `Port` | integer | `11434` | Backend server port |
| `Ssl` | boolean | `false` | Use HTTPS for backend communication |
| `UnhealthyThreshold` | integer | `2` | Failed checks before marking unhealthy |
| `HealthyThreshold` | integer | `2` | Successful checks before marking healthy |
| `HealthCheckMethod` | string | `"GET"` | HTTP method for health checks (GET or HEAD) |
| `HealthCheckUrl` | string | `"/"` | URL path for health checks |
| `MaxParallelRequests` | integer | `4` | Maximum concurrent requests |
| `RateLimitRequestsThreshold` | integer | `10` | Rate limiting threshold |
| `AllowEmbeddings` | boolean | `true` | Allow embeddings API requests |
| `AllowCompletions` | boolean | `true` | Allow completions API requests |
| `PinnedEmbeddingsProperties` | object | `{}` | Key-value pairs merged into embeddings requests |
| `PinnedCompletionsProperties` | object | `{}` | Key-value pairs merged into completions requests |
| `BearerToken` | string | `null` | Bearer token attached to each backend request |
| `Querystring` | string | `null` | Querystring appended to each backend request (no leading `?`; separate pairs with `&`) |
| `Headers` | object | `{}` | Custom headers attached to each backend request |
| `LogRequestFull` | boolean | `false` | Log complete requests |
| `LogRequestBody` | boolean | `false` | Log request bodies |
| `LogResponseBody` | boolean | `false` | Log response bodies |
| `Active` | boolean | `true` | Whether backend is active |

Health Check Configuration

Health checks validate backend availability:

  • Method: HTTP method (GET, HEAD)
  • URL: Path to check (e.g., /, /api/version, /health)
  • Thresholds: Number of consecutive successes/failures to change state

Common health check endpoints:

  • HEAD /: Basic connectivity check for Ollama
  • GET /health: Basic connectivity check for vLLM
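
The threshold behavior amounts to a small state machine over consecutive results, sketched here in illustrative Python (not the actual implementation):

```python
class HealthState:
    """Track backend health: the state flips only after
    `unhealthy_threshold` consecutive failures or
    `healthy_threshold` consecutive successes."""
    def __init__(self, unhealthy_threshold=2, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def record(self, check_succeeded):
        if check_succeeded:
            self._successes += 1
            self._fails = 0  # a success resets the failure streak
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._successes = 0  # a failure resets the success streak
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```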

Rate Limiting

Backends can enforce rate limits:

  • Requests exceeding RateLimitRequestsThreshold receive HTTP 429
  • Rate limiting is per backend, not global
  • Helps protect individual Ollama instances from overload
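
As a rough model (illustrative Python; the exact semantics of `RateLimitRequestsThreshold` are not specified here, so this assumes it caps concurrently queued/active requests per backend):

```python
class BackendLimiter:
    """Per-backend limiter: reject with 429 once the number of
    in-flight requests reaches the configured threshold."""
    def __init__(self, threshold=10):
        self.threshold = threshold
        self.active = 0

    def try_acquire(self):
        if self.active >= self.threshold:
            return 429  # Too Many Requests
        self.active += 1
        return 200

    def release(self):
        self.active -= 1
```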

Security Controls

Backend security controls provide additional layers of request filtering and parameter enforcement:

Request Type Controls

  • AllowEmbeddings: Controls whether this backend can process embeddings requests
  • AllowCompletions: Controls whether this backend can process completion requests

Requests are only routed to backends that allow the specific request type. This enables:

  • Dedicated embeddings servers that only handle embeddings requests:
    • Ollama API: /api/embed
    • OpenAI API: /v1/embeddings
  • Completion-only servers that only handle completion requests:
    • Ollama API: /api/generate, /api/chat
    • OpenAI API: /v1/completions, /v1/chat/completions
  • Multi-tenant isolation by request type

Pinned Properties

Backend pinned properties provide server-level parameter enforcement:

  • PinnedEmbeddingsProperties: Applied to all embeddings requests routed to this backend
  • PinnedCompletionsProperties: Applied to all completion requests routed to this backend

Backend pinned properties are merged after frontend pinned properties, allowing for:

  • Server-specific resource limits: {"options": {"num_ctx": 1024}}
  • Hardware-optimized settings: {"options": {"num_gpu": 2}}
  • Backend-specific model overrides: {"model": "server-optimized-model"}

The merge order is: Client Request → Frontend Pinned Properties → Backend Pinned Properties, with later values taking precedence.
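
The three-stage merge can be sketched as a recursive dictionary merge (illustrative Python; whether the real merge recurses into nested objects like `options` is inferred from the examples above):

```python
def deep_merge(base, override):
    """Recursively merge two dicts; values from `override` win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

def apply_pinned(client_request, frontend_pinned, backend_pinned):
    """Client Request -> Frontend Pinned -> Backend Pinned; later wins."""
    return deep_merge(deep_merge(client_request, frontend_pinned), backend_pinned)
```

For example, a client's `temperature` is overridden by the frontend, and its `num_ctx` by the backend, while untouched fields pass through.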


Environment Variables

Settings such as the listening port and admin token can also be supplied as environment variables when running under Docker:

docker run -d \
  -e OLLAMAFLOW_PORT=8080 \
  -e OLLAMAFLOW_ADMIN_TOKEN=my-secure-token \
  -e ASPNETCORE_ENVIRONMENT=Production \
  -p 8080:8080 \
  jchristn/ollamaflow



Request Customization

Backends support additional request customization options for communicating with upstream services that require authentication or special parameters:

  • BearerToken: If set, OllamaFlow automatically adds an Authorization: Bearer {token} header to all requests sent to this backend. Useful for:

    • OpenAI-compatible APIs requiring API keys
    • Azure OpenAI Service authentication
    • Custom inference endpoints with bearer authentication
  • Querystring: If set, the specified querystring is appended to all URLs when communicating with this backend. Do not include the leading ? character. Separate multiple key-value pairs with ampersands (e.g., foo=bar&key=val). Useful for:

    • API versioning: api-version=2024-01
    • Deployment targeting: deployment=gpt-4
    • Custom routing parameters
  • Headers: A dictionary of custom headers added to all requests sent to this backend. Useful for:

    • Organization identification: X-Organization-Id: org-123
    • Custom routing headers: X-Custom-Region: us-east
    • Compliance headers: X-Audit-Id: audit-456

Example configuration for Azure OpenAI:

{
  "Identifier": "azure-openai",
  "Name": "Azure OpenAI Service",
  "Hostname": "my-resource.openai.azure.com",
  "Port": 443,
  "Ssl": true,
  "BearerToken": "your-azure-api-key",
  "Querystring": "api-version=2024-02-15-preview",
  "Headers": {
    "X-MS-Region": "eastus"
  }
}
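
How these three fields shape the upstream request can be sketched as follows (illustrative Python; `build_backend_request` is a hypothetical helper, not an OllamaFlow API):

```python
def build_backend_request(backend, path, client_headers=None):
    """Assemble the upstream URL and headers from a backend object:
    Ssl picks the scheme, Querystring is appended to the URL,
    Headers are merged in, and BearerToken becomes an Authorization header."""
    scheme = "https" if backend.get("Ssl") else "http"
    url = f"{scheme}://{backend['Hostname']}:{backend['Port']}{path}"
    qs = backend.get("Querystring")
    if qs:
        url += ("&" if "?" in path else "?") + qs
    headers = dict(client_headers or {})
    headers.update(backend.get("Headers") or {})
    if backend.get("BearerToken"):
        headers["Authorization"] = f"Bearer {backend['BearerToken']}"
    return url, headers
```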

Configuration Examples

Basic Single Backend

Minimal configuration for testing:

{
  "Webserver": {
    "Port": 43411
  },
  "AdminBearerTokens": ["test-token"]
}

Frontend/Backend via API:

# Create backend
curl -X PUT -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "local", "Hostname": "localhost", "Port": 11434}' \
  http://localhost:43411/v1.0/backends

# Create frontend
curl -X PUT -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "main", "Backends": ["local"]}' \
  http://localhost:43411/v1.0/frontends

Production Multi-Backend

Production configuration with multiple GPU servers:

{
  "Webserver": {
    "Hostname": "*",
    "Port": 43411,
    "Ssl": {
      "Enable": true,
      "CertificateFile": "/etc/ssl/ollamaflow.crt",
      "CertificatePassword": "cert-password"
    }
  },
  "Logging": {
    "LogDirectory": "/var/log/ollamaflow/",
    "ConsoleLogging": false,
    "MinimumSeverity": 1
  },
  "DatabaseFilename": "/var/lib/ollamaflow/ollamaflow.db",
  "AdminBearerTokens": [
    "secure-production-token-1",
    "secure-production-token-2"
  ]
}

Backends configuration:

# GPU servers
for i in {1..4}; do
  curl -X PUT -H "Authorization: Bearer secure-production-token-1" \
    -H "Content-Type: application/json" \
    -d "{
      \"Identifier\": \"gpu-$i\",
      \"Name\": \"GPU Server $i\",
      \"Hostname\": \"gpu$i.company.internal\",
      \"Port\": 11434,
      \"MaxParallelRequests\": 8,
      \"HealthCheckUrl\": \"/api/version\"
    }" \
    http://localhost:43411/v1.0/backends
done

# Production frontend
curl -X PUT -H "Authorization: Bearer secure-production-token-1" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "production",
    "Name": "Production AI Inference",
    "Hostname": "ai.company.com",
    "LoadBalancing": "RoundRobin",
    "Backends": ["gpu-1", "gpu-2", "gpu-3", "gpu-4"],
    "RequiredModels": ["llama3:8b", "mistral:7b", "codellama:13b"],
    "TimeoutMs": 120000
  }' \
  http://localhost:43411/v1.0/frontends

Development Environment

Development setup with debugging enabled:

{
  "Webserver": {
    "Port": 43411,
    "Debug": {
      "Routing": true,
      "Requests": true,
      "Responses": false
    }
  },
  "Logging": {
    "ConsoleLogging": true,
    "EnableColors": true,
    "MinimumSeverity": 0
  },
  "AdminBearerTokens": ["dev-token"]
}

Configuration Validation

OllamaFlow validates configuration on startup:

Common Validation Errors

  1. Invalid Port Range: Ports must be 1-65535
  2. Missing Required Fields: Identifier, Hostname required for backends
  3. Duplicate Identifiers: Frontend/Backend IDs must be unique
  4. Invalid Load Balancing: Must be "RoundRobin" or "Random"
  5. Invalid Hostnames: Must be valid hostname or "*"
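
A few of these checks can be expressed compactly; the sketch below is illustrative Python, not OllamaFlow's validator:

```python
def validate_backend(backend, existing_ids=()):
    """Collect validation errors for a backend object:
    required fields, port range, and identifier uniqueness."""
    errors = []
    for field in ("Identifier", "Hostname"):
        if not backend.get(field):
            errors.append(f"missing required field: {field}")
    port = backend.get("Port", 11434)
    if not 1 <= port <= 65535:
        errors.append(f"invalid port: {port}")
    if backend.get("Identifier") in existing_ids:
        errors.append(f"duplicate identifier: {backend['Identifier']}")
    return errors
```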

Configuration Test

Validate configuration without starting the server:

# Test configuration file
dotnet OllamaFlow.Server.dll --validate-config

# Test with specific config
OLLAMAFLOW_CONFIG=/path/to/config.json dotnet OllamaFlow.Server.dll --validate-config

Migration and Backup

Database Backup

Backup the SQLite database regularly:

# Simple copy (stop OllamaFlow first)
cp ollamaflow.db ollamaflow.db.backup

# Online backup (while running)
sqlite3 ollamaflow.db ".backup /path/to/backup.db"

Configuration Migration

When upgrading OllamaFlow:

  1. Backup Configuration: Save current ollamaflow.json
  2. Backup Database: Save current ollamaflow.db
  3. Review Changes: Check for new configuration options
  4. Test Upgrade: Test in non-production environment first

Export/Import Configuration

Export current configuration for replication:

# Export all frontends
curl -H "Authorization: Bearer token" \
  http://localhost:43411/v1.0/frontends > frontends.json

# Export all backends
curl -H "Authorization: Bearer token" \
  http://localhost:43411/v1.0/backends > backends.json

Import configuration to new instance:

# Import backends first (jq -c emits one compact JSON object per line)
jq -c '.[]' backends.json | while read -r backend; do
  curl -X PUT -H "Authorization: Bearer token" \
    -H "Content-Type: application/json" \
    -d "$backend" \
    http://new-host:43411/v1.0/backends
done

# Then import frontends
jq -c '.[]' frontends.json | while read -r frontend; do
  curl -X PUT -H "Authorization: Bearer token" \
    -H "Content-Type: application/json" \
    -d "$frontend" \
    http://new-host:43411/v1.0/frontends
done

Next Steps