# API Reference

OllamaFlow provides three sets of APIs: **Ollama-compatible APIs** and **OpenAI-compatible APIs** for AI inference, plus **Administrative APIs** for cluster management. All APIs use JSON request and response bodies and maintain full compatibility with existing Ollama and OpenAI clients.

## Base URL and Authentication

* **Base URL**: `http://your-ollamaflow-host:43411`
* **Admin Authentication**: Bearer token required for administrative endpoints
* **Ollama APIs**: No authentication required (proxied to backends)
* **OpenAI APIs**: No authentication required (proxied to backends)

### Authentication Header

```bash
# For administrative APIs
curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/backends
```

## API Compatibility

OllamaFlow supports both Ollama and OpenAI-compatible API formats, allowing clients to use either API style without modification.

## Ollama-Compatible APIs

These endpoints maintain full compatibility with the Ollama API, allowing existing clients to work without modification.

### Generate Completion

Generate text completions using a specified model.

**POST** `/api/generate`

#### Request Body

```json
{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "stream": true,
  "options": {
    "temperature": 0.8,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9
  }
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "prompt": "Explain quantum computing in simple terms",
    "stream": false,
    "options": {
      "temperature": 0.7,
      "num_predict": 200
    }
  }' \
  http://localhost:43411/api/generate
```

#### Response

```json
{
  "model": "llama3:8b",
  "created_at": "2024-01-15T10:30:00.123456Z",
  "response": "Quantum computing is a revolutionary technology...",
  "done": true,
  "context": [1, 2, 3, 4, 5],
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 234567890,
  "eval_count": 25,
  "eval_duration": 876543210
}
```
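With `"stream": true`, the Ollama-style endpoints return one JSON object per line, each carrying the next token in its `response` field. A minimal sketch for consuming that stream from the shell, assuming `jq` is installed:

```bash
# Stream tokens as they arrive; each line of output is a JSON object whose
# "response" field holds the next token. jq -j joins them without newlines.
curl -sN http://localhost:43411/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b", "prompt": "Why is the sky blue?", "stream": true}' \
  | jq -j '.response'
echo
```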
### Chat Completion

Generate chat-style completions with conversation context.

**POST** `/api/chat`

#### Request Body

```json
{
  "model": "llama3:8b",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant."
    },
    {
      "role": "user",
      "content": "What is machine learning?"
    }
  ],
  "stream": true,
  "options": {
    "temperature": 0.8,
    "num_ctx": 2048
  }
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "stream": false,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant specializing in technology."
      },
      {
        "role": "user",
        "content": "Explain the difference between AI and ML"
      }
    ]
  }' \
  http://localhost:43411/api/chat
```

### Pull Model

Download a model to the backend instances.

**POST** `/api/pull`

#### Request Body

```json
{
  "model": "llama3:8b",
  "insecure": false,
  "stream": true
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{ "model": "mistral:7b" }' \
  http://localhost:43411/api/pull
```

### Show Model Information

Get detailed information about a specific model.

**POST** `/api/show`

#### Request Body

```json
{
  "name": "llama3:8b",
  "verbose": true
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{ "name": "llama3:8b" }' \
  http://localhost:43411/api/show
```

### List Models

Get a list of available models across all backends.

**GET** `/api/tags`

#### cURL Example

```bash
curl http://localhost:43411/api/tags
```

#### Response

```json
{
  "models": [
    {
      "name": "llama3:8b",
      "model": "llama3:8b",
      "modified_at": "2024-01-15T10:30:00.123456Z",
      "size": 4661224576,
      "digest": "sha256:8934d96d3f08...",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": ["llama"],
        "parameter_size": "8B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
```

### List Running Models

Get information about currently running models.

**GET** `/api/ps`

#### cURL Example

```bash
curl http://localhost:43411/api/ps
```

### Generate Embeddings

Generate embeddings for text input.

**POST** `/api/embed`

#### Request Body

```json
{
  "model": "nomic-embed-text",
  "input": "The quick brown fox jumps over the lazy dog"
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["Hello world", "How are you?"]
  }' \
  http://localhost:43411/api/embed
```

### Delete Model

Remove a model from backend instances.

**DELETE** `/api/delete`

#### Request Body

```json
{
  "name": "llama3:8b"
}
```

#### cURL Example

```bash
curl -X DELETE \
  -H "Content-Type: application/json" \
  -d '{ "name": "old-model:7b" }' \
  http://localhost:43411/api/delete
```

## OpenAI-Compatible APIs

OllamaFlow also supports OpenAI-compatible API endpoints, allowing existing OpenAI clients and tools to work seamlessly.

### Generate Completion

Generate text completions using OpenAI-compatible format.

**POST** `/v1/completions`

#### Request Body

```json
{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "max_tokens": 100,
  "temperature": 0.8,
  "top_p": 0.9,
  "stream": false
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "prompt": "Explain quantum computing in simple terms",
    "max_tokens": 200,
    "temperature": 0.7,
    "stream": false
  }' \
  http://localhost:43411/v1/completions
```

### Chat Completion

Generate chat-style completions using OpenAI-compatible format.

**POST** `/v1/chat/completions`

#### Request Body

```json
{
  "model": "llama3:8b",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant."
    },
    {
      "role": "user",
      "content": "What is machine learning?"
    }
  ],
  "max_tokens": 150,
  "temperature": 0.8,
  "stream": false
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant specializing in technology."
      },
      {
        "role": "user",
        "content": "Explain the difference between AI and ML"
      }
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }' \
  http://localhost:43411/v1/chat/completions
```

### Generate Embeddings

Generate embeddings using OpenAI-compatible format.

**POST** `/v1/embeddings`

#### Request Body

```json
{
  "model": "nomic-embed-text",
  "input": "The quick brown fox jumps over the lazy dog"
}
```

#### cURL Example

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["Hello world", "How are you?"]
  }' \
  http://localhost:43411/v1/embeddings
```

#### Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.1, 0.2, 0.3, ...],
      "index": 0
    }
  ],
  "model": "nomic-embed-text",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
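As a quick sanity check that embeddings requests round-trip through the OpenAI-compatible endpoint, the sketch below prints the index and dimensionality of each returned vector (assumes `jq` is installed; the model name is illustrative):

```bash
# Request embeddings for two inputs and print each vector's index and length.
curl -s http://localhost:43411/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": ["Hello world", "How are you?"]}' \
  | jq '.data[] | {index: .index, dims: (.embedding | length)}'
```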
### List Models

Get available models using OpenAI-compatible format.

**GET** `/v1/models`

#### cURL Example

```bash
curl http://localhost:43411/v1/models
```

#### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3:8b",
      "object": "model",
      "created": 1704067200,
      "owned_by": "ollama"
    }
  ]
}
```

## Administrative APIs

These endpoints provide cluster management capabilities and require bearer token authentication.

### Frontend Management

#### List All Frontends

**GET** `/v1.0/frontends`

```bash
curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/frontends
```

#### Get Frontend

**GET** `/v1.0/frontends/{identifier}`

```bash
curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/frontends/frontend1
```

#### Create Frontend

**PUT** `/v1.0/frontends`

```bash
curl -X PUT \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "production-frontend",
    "Name": "Production AI Inference",
    "Hostname": "ai.company.com",
    "LoadBalancing": "RoundRobin",
    "TimeoutMs": 90000,
    "Backends": ["gpu-1", "gpu-2", "gpu-3"],
    "RequiredModels": ["llama3:8b", "mistral:7b"],
    "MaxRequestBodySize": 1073741824,
    "UseStickySessions": true,
    "StickySessionExpirationMs": 3600000
  }' \
  http://localhost:43411/v1.0/frontends
```

#### Update Frontend

**PUT** `/v1.0/frontends/{identifier}`

```bash
curl -X PUT \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "production-frontend",
    "Name": "Updated Production Frontend",
    "Hostname": "*",
    "LoadBalancing": "Random",
    "Backends": ["gpu-1", "gpu-2", "gpu-3", "gpu-4"],
    "RequiredModels": ["llama3:8b", "mistral:7b", "codellama:13b"]
  }' \
  http://localhost:43411/v1.0/frontends/production-frontend
```

#### Delete Frontend

**DELETE** `/v1.0/frontends/{identifier}`

```bash
curl -X DELETE \
  -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/frontends/old-frontend
```

### Backend Management

#### List All Backends

**GET** `/v1.0/backends`

```bash
curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/backends
```

#### Get Backend

**GET** `/v1.0/backends/{identifier}`

```bash
curl -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/backends/gpu-1
```

#### Create Backend

**PUT** `/v1.0/backends`

```bash
curl -X PUT \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "gpu-server-4",
    "Name": "GPU Server 4",
    "Hostname": "192.168.1.104",
    "Port": 11434,
    "Ssl": false,
    "HealthCheckUrl": "/api/version",
    "HealthCheckMethod": "GET",
    "UnhealthyThreshold": 3,
    "HealthyThreshold": 2,
    "MaxParallelRequests": 8,
    "RateLimitRequestsThreshold": 20,
    "LogRequestBody": false,
    "LogResponseBody": false
  }' \
  http://localhost:43411/v1.0/backends
```

#### Update Backend

**PUT** `/v1.0/backends/{identifier}`

```bash
curl -X PUT \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "gpu-server-1",
    "Name": "Updated GPU Server 1",
    "Hostname": "192.168.1.101",
    "Port": 11434,
    "MaxParallelRequests": 12,
    "UnhealthyThreshold": 2
  }' \
  http://localhost:43411/v1.0/backends/gpu-server-1
```

#### Delete Backend

**DELETE** `/v1.0/backends/{identifier}`

```bash
curl -X DELETE \
  -H "Authorization: Bearer your-admin-token" \
  http://localhost:43411/v1.0/backends/old-backend
```
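Since each backend is created with a single `PUT`, registering a rack of similar nodes scripts naturally. A minimal sketch, where the identifiers, addresses, and admin token are all placeholders to adjust for your environment:

```bash
# Register three similar Ollama backends in a loop.
# Identifier, Hostname, and the token below are placeholder values.
TOKEN="your-admin-token"
for i in 1 2 3; do
  curl -s -X PUT http://localhost:43411/v1.0/backends \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{
      \"Identifier\": \"gpu-server-$i\",
      \"Name\": \"GPU Server $i\",
      \"Hostname\": \"192.168.1.10$i\",
      \"Port\": 11434,
      \"MaxParallelRequests\": 8
    }"
done
```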
{ "Identifier": "backend1", "Name": "My localhost Ollama instance", "Hostname": "localhost", "Port": 11434, "Ssl": false, "UnhealthyThreshold": 2, "HealthyThreshold": 2, "HealthCheckMethod": { "Method": "GET" }, "HealthCheckUrl": "/", "MaxParallelRequests": 4, "RateLimitRequestsThreshold": 10, "LogRequestFull": false, "LogRequestBody": false, "LogResponseBody": false, "ApiFormat": "Ollama", "PinnedEmbeddingsProperties": {}, "PinnedCompletionsProperties": { "model": "qwen2.5:3b", "options": { "temperature": 0.1, "howdy": "doody" } }, "AllowEmbeddings": true, "AllowCompletions": true, "Active": true, "CreatedUtc": "2025-09-29T23:15:45.659639Z", "LastUpdateUtc": "2025-09-29T23:19:07.346900Z", "HealthySinceUtc": "2025-09-30T01:53:21.026058Z", "Uptime": "00:25:52.4859452", "ActiveRequests": 0, "IsSticky": false } ] ``` #### Get Single Backend Health **GET** `/v1.0/backends/{identifier}/health` ```bash curl -H "Authorization: Bearer your-admin-token" \ http://localhost:43411/v1.0/backends/backend1/health ``` #### Response ```json { "Identifier": "backend1", "Name": "My localhost Ollama instance", "Hostname": "localhost", "Port": 11434, "Ssl": false, "UnhealthyThreshold": 2, "HealthyThreshold": 2, "HealthCheckMethod": { "Method": "GET" }, "HealthCheckUrl": "/", "MaxParallelRequests": 4, "RateLimitRequestsThreshold": 10, "LogRequestFull": false, "LogRequestBody": false, "LogResponseBody": false, "ApiFormat": "Ollama", "PinnedEmbeddingsProperties": {}, "PinnedCompletionsProperties": { "model": "qwen2.5:3b", "options": { "temperature": 0.1, "howdy": "doody" } }, "AllowEmbeddings": true, "AllowCompletions": true, "Active": true, "CreatedUtc": "2025-09-29T23:15:45.659639Z", "LastUpdateUtc": "2025-09-29T23:19:07.346900Z", "HealthySinceUtc": "2025-09-30T01:53:21.026058Z", "Uptime": "00:26:32.4690556", "ActiveRequests": 0, "IsSticky": false } ``` ## Error Responses All APIs return standard HTTP status codes and JSON error responses. 
## Error Responses

All APIs return standard HTTP status codes and JSON error responses.

### Error Response Format

```json
{
  "error": "BadRequest",
  "message": "Invalid request format",
  "details": "Missing required field: model",
  "timestamp": "2024-01-15T10:30:00.123456Z",
  "requestId": "12345678-1234-1234-1234-123456789012"
}
```

### Common Error Codes

| Status | Error Type         | Description                           |
| ------ | ------------------ | ------------------------------------- |
| 400    | BadRequest         | Invalid request format or parameters  |
| 401    | Unauthorized       | Missing or invalid bearer token       |
| 404    | NotFound           | Resource not found                    |
| 409    | Conflict           | Resource already exists or conflict   |
| 429    | TooManyRequests    | Rate limit exceeded                   |
| 500    | InternalError      | Server error                          |
| 502    | BadGateway         | Backend unavailable                   |
| 503    | ServiceUnavailable | No healthy backends available         |

## Rate Limiting

OllamaFlow implements rate limiting at the backend level:

* Each backend has a configurable `RateLimitRequestsThreshold`
* Requests exceeding the threshold receive `429 Too Many Requests`
* Rate limiting is applied per backend, not globally

## Streaming Responses

Both Ollama APIs and admin APIs support streaming where applicable:

* **Text Generation**: Set `"stream": true` for real-time token streaming
* **Model Downloads**: Progress updates during model pulls
* **Health Monitoring**: Server-sent events for real-time status updates

### Streaming Example

```bash
# Stream text generation
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "prompt": "Write a story about space exploration",
    "stream": true
  }' \
  http://localhost:43411/api/generate
```

## Request Headers

### Standard Headers

* `Content-Type: application/json` - Required for POST/PUT requests
* `Accept: application/json` - Recommended for consistent responses
* `User-Agent: your-client/1.0` - Optional client identification

### Custom Headers

* `X-Request-ID: uuid` - Optional request tracking
* `X-Frontend-Hint: frontend-id` - Optional frontend selection hint

### Response Headers

* `X-Request-ID: uuid` - Request tracking identifier
* `X-Backend-Used: backend-id` - Which backend processed the request
* `X-Model-Synchronized: true/false` - Whether model sync was required

## Postman Collection

A complete Postman collection with all API endpoints and examples is available in the OllamaFlow repository:

**Download**: [OllamaFlow.postman\_collection.json](https://github.com/jchristn/ollamaflow/blob/main/OllamaFlow.postman_collection.json)

The collection includes:

* All Ollama-compatible endpoints with sample requests
* Complete admin API coverage with authentication
* Environment variables for easy configuration
* Response examples and test scripts

## Security and Access Control

OllamaFlow provides comprehensive security controls through Frontend and Backend configuration:

### Request Type Controls

* **AllowEmbeddings**: Controls access to embeddings endpoints
  * Ollama API: `/api/embed`
  * OpenAI API: `/v1/embeddings`
* **AllowCompletions**: Controls access to completion endpoints
  * Ollama API: `/api/generate`, `/api/chat`
  * OpenAI API: `/v1/completions`, `/v1/chat/completions`

For a request to succeed, both the frontend and at least one assigned backend must allow the request type.
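One way to verify the effective policy is to probe a disallowed endpoint and inspect the status code. A sketch (the exact status returned for a blocked request depends on your configuration; the model name is illustrative):

```bash
# Probe the embeddings endpoint and print only the HTTP status code.
# On a frontend with AllowEmbeddings disabled, this request should be rejected.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:43411/api/embed \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "probe"}'
```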
### Pinned Properties

Administrators can enforce specific parameters in requests through pinned properties:

* **PinnedEmbeddingsProperties**: Key-value pairs merged into all embeddings requests
* **PinnedCompletionsProperties**: Key-value pairs merged into all completion requests

Pinned properties take precedence over client-specified values, enabling organizational compliance and standardization.

### Example Security Configuration

```bash
# Create a frontend that only allows completions with enforced temperature
curl -X PUT \
  -H "Authorization: Bearer your-admin-token" \
  -H "Content-Type: application/json" \
  -d '{
    "Identifier": "secure-frontend",
    "Name": "Secure Completions Only",
    "AllowEmbeddings": false,
    "AllowCompletions": true,
    "PinnedCompletionsProperties": {
      "options": {
        "temperature": 0.7,
        "num_ctx": 2048
      }
    },
    "Backends": ["secure-backend"]
  }' \
  http://localhost:43411/v1.0/frontends
```

With this configuration:

* ✅ **Allowed**: Completion requests to both API formats
  * `POST /api/generate` (Ollama)
  * `POST /api/chat` (Ollama)
  * `POST /v1/completions` (OpenAI)
  * `POST /v1/chat/completions` (OpenAI)
* ❌ **Blocked**: Embeddings requests to both API formats
  * `POST /api/embed` (Ollama)
  * `POST /v1/embeddings` (OpenAI)

## API Explorer

OllamaFlow includes a companion web-based API Explorer for testing and validation:

* **Repository**: [https://github.com/ollamaflow/apiexplorer](https://github.com/ollamaflow/apiexplorer)
* **Purpose**: Test and evaluate APIs in scaled inference architectures
* **Features**: Real-time API testing, JSON validation, response inspection
* **Formats**: Supports both Ollama and OpenAI API formats

The API Explorer provides an intuitive interface for development debugging, load testing, and integration validation.

## SDK and Client Libraries

OllamaFlow supports both Ollama and OpenAI client libraries:

### Ollama-Compatible Libraries

* **Python**: `ollama-python`
* **JavaScript**: `ollama-js`
* **Go**: `ollama-go`
* **Rust**: `ollama-rs`
* **Java**: `ollama-java`

### OpenAI-Compatible Libraries

* **Python**: `openai` (official OpenAI Python library)
* **JavaScript**: `openai` (official OpenAI Node.js library)
* **Go**: `go-openai`
* **Rust**: `async-openai`
* **Java**: `openai-java`

Simply point these libraries to your OllamaFlow endpoint instead of a direct Ollama or OpenAI instance. For OpenAI libraries, use the base URL `http://your-ollamaflow-host:43411/v1`.

## Next Steps

* Explore [Configuration Examples](configuration-examples.md) for common scenarios
* Review [REST API Basics](rest-api-basics.md) for API fundamentals
* Check [Monitoring and Observability](monitoring.md) for production insights