AI Chat: API Usage

Our chat frontend at https://ki-chat.uni-mainz.de not only supports interactive chat, but also serves OpenAI-compatible API endpoints for seamless integration into your own applications. This page provides basic instructions for making use of that feature.

Getting Started

To use the API, you'll need to obtain an API key from the KI-Chat@JGU interface:

  1. Log in to the KI-Chat@JGU platform at https://ki-chat.uni-mainz.de
  2. Click your avatar image in the upper right corner and select Settings.
  3. In the window that opens, navigate to Account.
  4. Under API keys, click Create New Key.
  5. Use the copy button to copy the generated key and store it securely.

Important: Please make sure to keep your API key confidential.

API Endpoints

The API endpoints under https://ki-chat.uni-mainz.de/api are compatible with the OpenAI-API standard as described in the official reference (https://platform.openai.com/docs/api-reference), but not all endpoints and features are necessarily supported.

Currently, we offer the endpoints /models, /chat/completions, and /embeddings. All endpoints require authentication using your personal API key as a Bearer token in the Authorization header (Authorization: Bearer API_KEY).
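
Because the endpoints follow the OpenAI API standard, existing OpenAI-compatible client libraries can usually be pointed directly at this base URL. As a minimal sketch, assuming the official openai Python package is installed (the model name is taken from the examples below):

from openai import OpenAI

# Configure the client against the KI-Chat@JGU API.
# Replace API_KEY with your personal key from the Settings page.
client = OpenAI(
    base_url="https://ki-chat.uni-mainz.de/api",
    api_key="API_KEY",
)

# Simple connectivity check: request a short completion.
response = client.chat.completions.create(
    model="GPT OSS 120B",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)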

In the following, you'll find basic instructions for each endpoint, with example requests using the command-line tool curl.

/models

Retrieve a list of all available models.

Request Example:

curl -H "Authorization: Bearer API_KEY" https://ki-chat.uni-mainz.de/api/models

OpenAI API reference: https://platform.openai.com/docs/api-reference/models/list
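
The same request can also be made from Python with the requests library; in the standard OpenAI response format the model identifiers are expected under the "data" key (a sketch, not an exhaustive description of the response):

import requests

# Retrieve the model list and print the model identifiers.
response = requests.get(
    "https://ki-chat.uni-mainz.de/api/models",
    headers={"Authorization": "Bearer API_KEY"},
)
response.raise_for_status()

for model in response.json()["data"]:
    print(model["id"])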

/chat/completions

Generate a chat message, given the current message history.

Request Example:

curl -X POST https://ki-chat.uni-mainz.de/api/chat/completions \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GPT OSS 120B",
    "messages": [
      {"role": "user", "content": "Why is the sky blue?"}
    ]
  }'

Note: Use the "stream" parameter to control whether you receive the complete answer in a single response or as a stream of delta chunks ("stream": true), as in the sketch below.
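
A minimal streaming sketch in Python, assuming the official openai package and the standard OpenAI chunk format:

from openai import OpenAI

client = OpenAI(
    base_url="https://ki-chat.uni-mainz.de/api",
    api_key="API_KEY",
)

# Request a streamed completion and print the delta chunks as they arrive.
stream = client.chat.completions.create(
    model="GPT OSS 120B",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()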

OpenAI API reference: https://platform.openai.com/docs/api-reference/chat/create

/embeddings

Generate embeddings for text passages using the embedding model bge-m3.

Request Example:

curl -X POST https://ki-chat.uni-mainz.de/api/embeddings \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": ["Open WebUI is great!", "Let's generate embeddings."]
  }'

Note: For optimal performance, batch multiple chunks into a single request, as shown in the sketch below.
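
A minimal sketch of such a batched request in Python, assuming the official openai package (the vectors are expected under data[].embedding in the standard OpenAI format):

from openai import OpenAI

client = OpenAI(
    base_url="https://ki-chat.uni-mainz.de/api",
    api_key="API_KEY",
)

# Embed several text chunks in a single request instead of one request per chunk.
chunks = ["Open WebUI is great!", "Let's generate embeddings."]
response = client.embeddings.create(model="bge-m3", input=chunks)

for chunk, item in zip(chunks, response.data):
    print(chunk, "->", len(item.embedding), "dimensions")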

OpenAI API reference: https://platform.openai.com/docs/api-reference/embeddings

Usage Constraints

To ensure fair usage of our system, the following constraints are currently enforced:

Context Size Limits 

  • 65536 input tokens maximum context window
  • 8192 output tokens maximum for non-reasoning models
  • 16384 output tokens maximum for reasoning models

Rate Limits 

  1. Max. 2 parallel /chat/completions requests
  2. Max. 1 parallel /embeddings request (please batch chunks together)
  3. Max. 1 API request per second on average over prolonged durations
  4. Per 5 minutes: Max. 200,000 weighted tokens, calculated as 4 × output tokens + 1 × input tokens (cache miss) + 0.1 × input tokens (cache hit)

Staying within these limits should prevent HTTP 429 (Too Many Requests) responses.
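
If a request does hit a limit anyway, retrying with exponential backoff is usually sufficient. A sketch in Python using the requests library (retry counts and wait times are arbitrary example values):

import time
import requests

def post_with_retry(url, payload, api_key, max_retries=5):
    """POST a JSON payload, retrying with exponential backoff on HTTP 429."""
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Back off before retrying: 1 s, 2 s, 4 s, ...
        time.sleep(2 ** attempt)
    raise RuntimeError("Still rate-limited after several retries")

result = post_with_retry(
    "https://ki-chat.uni-mainz.de/api/chat/completions",
    {"model": "GPT OSS 120B", "messages": [{"role": "user", "content": "Hello!"}]},
    "API_KEY",
)
print(result["choices"][0]["message"]["content"])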

Specific Features

GPT-OSS Reasoning Effort

The GPT-OSS model supports the reasoning_effort parameter to control the number of reasoning tokens generated:

{
  "model": "GPT OSS 120B",
  "messages": [{"role": "user", "content": "Explain quantum physics simply"}],
  "reasoning_effort": "medium"
}

Available values:

"low" Minimal reasoning tokens (faster response)
"medium" Balanced reasoning (default)
"high" Extensive reasoning (more thorough but slower)

Function Calling

All of our models except Gemma3 27B support native function calling via the /chat/completions API.

Learn more: https://platform.openai.com/docs/guides/function-calling

Request Example:

curl -X POST https://ki-chat.uni-mainz.de/api/chat/completions \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GPT OSS 120B",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

However, implementing the full flow, including feeding tool results back to the model, is beyond the scope of these instructions.
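
As a partial illustration, the following Python sketch (using the requests library; the response shape assumed here is the standard OpenAI format with tool calls under choices[0].message.tool_calls) sends a shortened version of the weather example and prints the tool call requested by the model:

import json
import requests

payload = {
    "model": "GPT OSS 120B",
    "messages": [{"role": "user", "content": "What is the weather like in Boston today?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
    "tool_choice": "auto",
}

response = requests.post(
    "https://ki-chat.uni-mainz.de/api/chat/completions",
    headers={"Authorization": "Bearer API_KEY"},
    json=payload,
)
response.raise_for_status()

# If the model decided to call the function, the call appears under "tool_calls".
message = response.json()["choices"][0]["message"]
for tool_call in message.get("tool_calls", []):
    print(tool_call["function"]["name"])
    print(json.loads(tool_call["function"]["arguments"]))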

Multimodal Image Processing

To send images to multimodal models like Gemma3 27B, encode the image in base64 and include it in the message content. The following Python code serves as an example:


import requests
import base64

# API Config
api_url = "https://ki-chat.uni-mainz.de/api"
api_key = "API_KEY"

# Path of image to send
image_path = "test.jpg"

# Base64-encode the image
with open(image_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

# API-Request Payload
payload = {
    "model": "Gemma3 27B",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What do you see?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{encoded_image}"
                }
            }
        ]}
    ]
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(
    f"{api_url}/chat/completions",
    headers=headers,
    json=payload
)

print(response.json())

Fill In the Middle (FIM)

Qwen3 Coder 30B is the only model that supports classic “fill in the middle,” i.e., autocompletion based on surrounding text (usually code). To use this feature, the prefix and suffix text must be surrounded by special tokens in the query:

`<|fim_prefix|>PREFIX<|fim_suffix|>SUFFIX<|fim_middle|>`

This format must be used exactly as shown and sent as the content of a user message.

Example Request:


```bash
curl -X POST https://ki-chat.uni-mainz.de/api/chat/completions \
-H "Authorization: Bearer API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen3 Coder 30B",
"messages": [
{"role": "user", "content": "<|fim_prefix|>def approximate_pi()<|fim_suffix|>    return pi<|fim_middle|>"}
],
"max_tokens": 512
}'
```
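
For programmatic use, the FIM prompt can be assembled from prefix and suffix strings. A sketch in Python using the requests library (fill_in_the_middle is a hypothetical helper, not part of the API):

import requests

def fill_in_the_middle(prefix, suffix, api_key):
    """Ask Qwen3 Coder 30B to complete the text between prefix and suffix."""
    prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
    response = requests.post(
        "https://ki-chat.uni-mainz.de/api/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "Qwen3 Coder 30B",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(fill_in_the_middle("def approximate_pi()", "    return pi", "API_KEY"))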