API Documentation
Complete reference for integrating with alAPI's LLM, OCR, and Retrieval services
Base URL: `https://alapi.deep.sa/v1`
OpenAI Compatible Use the official OpenAI SDK with our base URL. Drop-in replacement for existing applications.
LLM API
OpenAI-compatible API for chat completions and embeddings. Use your favorite models through a unified interface.
Authentication
All API requests require authentication using a Bearer token in the Authorization header.
Request Header:

```
Authorization: Bearer YOUR_API_KEY
```
API Key: Generate API keys from your Dashboard.
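If you are not using the OpenAI SDK, the same header can be attached by hand. A minimal sketch using only the Python standard library — the `auth_headers` and `list_models` names are illustrative helpers, not part of the API:

```python
import json
import urllib.request

BASE_URL = "https://alapi.deep.sa/v1"

def auth_headers(api_key: str) -> dict:
    # Every request carries the key as a Bearer token in the Authorization header.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

def list_models(api_key: str) -> dict:
    # Example call: GET /v1/models with the header attached.
    # A 401 response usually means a bad or missing key.
    req = urllib.request.Request(f"{BASE_URL}/models", headers=auth_headers(api_key))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```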
SDK Setup Example:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://alapi.deep.sa/v1",
)
```
Chat Completions
Create Chat Completion
Creates a model response for the given chat conversation
Endpoint:
`POST https://alapi.deep.sa/v1/chat/completions`
Request Body:
```json
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | ID of the model to use |
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `temperature` | number | No | Sampling temperature (0-2). Default: 1 |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `stream` | boolean | No | If true, returns a stream of events |
| `top_p` | number | No | Nucleus sampling parameter. Default: 1 |
Response (200 OK):
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706745000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
```
Code Examples:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://alapi.deep.sa/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Saudi Arabia?"},
    ],
    temperature=0.7,
    max_tokens=1024,
)

print(response.choices[0].message.content)
```
Streaming Responses
Server-Sent Events (SSE)
Stream responses token by token for real-time output
How it works: Set `"stream": true` in your request. The response is sent as Server-Sent Events, with each chunk carrying a delta of the response content.
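On the wire, each event is a `data: `-prefixed JSON chunk, and the stream ends with a `data: [DONE]` sentinel — the usual OpenAI-compatible SSE framing, which the SDK parses for you. A hypothetical parser for the raw lines, to show what the SDK is doing:

```python
import json

def iter_sse_deltas(lines):
    # Yield the content deltas from raw SSE lines ("data: {...}").
    # Assumes the OpenAI-style framing: JSON chunks, then "data: [DONE]".
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):  # first chunk often carries only the role
            yield delta["content"]
```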
Code Examples:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://alapi.deep.sa/v1",
)

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Embeddings
Create Embeddings
Creates an embedding vector representing the input text
Endpoint:
`POST https://alapi.deep.sa/v1/embeddings`
Request Body:
```json
{
  "model": "deep-sa/alEmbedding",
  "input": "The quick brown fox jumps over the lazy dog"
}
```
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | ID of the embedding model to use |
| `input` | string or array | Yes | Text to embed. Can be a string or an array of strings |
| `encoding_format` | string | No | Format for the embeddings: `float` or `base64`. Default: `float` |
| `dimensions` | integer | No | Number of dimensions for the output embeddings (model-dependent) |
Response (200 OK):
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "deep-sa/alEmbedding",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
```
Code Examples:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://alapi.deep.sa/v1",
)

response = client.embeddings.create(
    model="deep-sa/alEmbedding",
    input="The quick brown fox jumps over the lazy dog",
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
```
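Because `input` also accepts an array of strings, several texts can be embedded in one call and then compared. A sketch under those assumptions — `embed_batch` and `cosine` are illustrative helpers, and the sort by `index` simply preserves input order:

```python
import math

def cosine(a, b) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed_batch(client, texts):
    # One vector comes back per input string; match them up via `index`.
    resp = client.embeddings.create(model="deep-sa/alEmbedding", input=texts)
    return [item.embedding for item in sorted(resp.data, key=lambda d: d.index)]
```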
Available Models
List Models
Returns the list of currently available models
Endpoint:
`GET https://alapi.deep.sa/v1/models`
Required Scope:
This endpoint requires an API key with the
models
Response (200 OK):
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama-3.3-70b-versatile",
      "object": "model",
      "created": 1706745000,
      "owned_by": "groq",
      "type": "llm"
    },
    {
      "id": "deep-sa/alEmbedding",
      "object": "model",
      "created": 1706745000,
      "owned_by": "openai",
      "type": "embedding"
    }
  ]
}
```
Code Examples:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://alapi.deep.sa/v1",
)

models = client.models.list()
for model in models.data:
    print(f"{model.id} ({model.type})")
```
The following models are currently available through alAPI. Use the model name in your API requests.
| Model Name | Type | Provider | Avg Latency |
|---|---|---|---|
| `deep-sa/alEmbedding` | embedding | deepcloud | ~1717ms |
| `deep-sa/alLLM` | llm | deepcloud | ~2796ms |
| `google/gemini-3-flash` | llm | google_gemini | ~4841ms |
| `google/gemini-3-pro` | llm | google_gemini | ~18360ms |
| `gpt-oss-120b` | llm | groq | ~4264ms |
| `gpt-oss-20b` | llm | groq | ~536ms |
| `llama-3.3-70b` | llm | groq | ~194ms |
| `llama-4-maverick-17b` | llm | groq | ~791ms |
| `opanai/gpt-5-mini` | llm | openai | ~2187ms |
| `qwen3-32b` | llm | groq | ~2559ms |
Error Handling
The API uses standard HTTP status codes to indicate success or failure of requests.
Error Response Format:
```json
{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```
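That envelope can be unpacked with a small helper before deciding how to react (retry, re-authenticate, surface to the user). A sketch, with field names taken from the format above — `parse_api_error` is a hypothetical helper, not part of any SDK:

```python
import json

def parse_api_error(body: str) -> tuple[str, str, str]:
    # Extract (type, code, message) from the error envelope shown above.
    # Missing fields fall back to empty strings rather than raising.
    err = json.loads(body).get("error", {})
    return (
        err.get("type", ""),
        err.get("code", ""),
        err.get("message", ""),
    )
```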
Ready to Get Started?
Generate an API key from your dashboard and start building.