Request Types
The Resultity API supports multiple inference request types to accommodate a wide range of use cases — from fast single-turn queries to heavy distributed tasks and multimodal jobs.
Currently Available
Synchronous
- Standard request-response cycle.
- The client sends a request and waits for the result in the same HTTP connection.
- Ideal for fast, low-latency jobs (e.g., short completions).
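A minimal synchronous call could look like the sketch below. The base URL, API key, and the OpenAI-style `/v1/chat/completions` path are assumptions for illustration; the blocking `requests.post` call is what makes this the synchronous pattern.

```python
import requests

# Hypothetical base URL and key; substitute real Resultity credentials.
BASE_URL = "https://api.resultity.example/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Synchronous request: the HTTP call blocks until the completion is ready.
resp = requests.post(
    f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
    headers=HEADERS,
    json={
        "model": "mistral",  # placeholder model name
        "messages": [{"role": "user", "content": "Summarize RAID levels in one line."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```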
Polling
- The client submits a job and immediately receives a `request_id`.
- The result must be fetched later using a status endpoint.
- Designed for longer or queued tasks.
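A polling client could be sketched as follows. The submission path, the `request_id` field name, the `mode` flag, and the `/v1/jobs/{request_id}` status endpoint are assumptions; the docs only state that a `request_id` is returned and that a status endpoint exists.

```python
import time
import requests

BASE_URL = "https://api.resultity.example/v1"   # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Submit the job; the response is assumed to carry a request_id.
job = requests.post(
    f"{BASE_URL}/chat/completions",              # assumed submission path
    headers=HEADERS,
    json={
        "model": "mistral",                      # placeholder model name
        "messages": [{"role": "user", "content": "Write a haiku about queues."}],
        "mode": "polling",                       # hypothetical flag selecting the polling flow
    },
    timeout=30,
).json()
request_id = job["request_id"]

# Poll the status endpoint until the job finishes.
while True:
    status = requests.get(f"{BASE_URL}/jobs/{request_id}", headers=HEADERS, timeout=30).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)  # back off between polls

print(status)
```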
Planned
Batch
- Submit multiple requests in a single call.
- Suitable for preloading embeddings, generating multiple completions, etc.
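Since batch requests are still planned, the payload shape below is purely illustrative: a single call carrying a list of inputs (here, documents to embed), with the `/v1/batch` path and field names invented for the sketch.

```python
import requests

BASE_URL = "https://api.resultity.example/v1"   # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# One call, many items: e.g., precomputing embeddings for a document set.
batch_payload = {
    "model": "gemma",                            # placeholder model
    "requests": [                                # hypothetical field name
        {"input": "First document to embed."},
        {"input": "Second document to embed."},
        {"input": "Third document to embed."},
    ],
}

resp = requests.post(f"{BASE_URL}/batch", headers=HEADERS, json=batch_payload, timeout=120)
resp.raise_for_status()
for item in resp.json().get("results", []):      # hypothetical response shape
    print(item)
```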
Streaming
- Sends tokens to the client as they are generated.
- Used for chatbots, streaming UIs, and voice assistants.
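Token streaming in OpenAI-compatible APIs is typically delivered as server-sent events; the sketch below assumes that convention (`"stream": true` plus `data:`-prefixed chunks), which is not yet confirmed for Resultity.

```python
import json
import requests

BASE_URL = "https://api.resultity.example/v1"   # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# stream=True keeps the HTTP connection open so chunks arrive as they are generated.
with requests.post(
    f"{BASE_URL}/chat/completions",
    headers=HEADERS,
    json={
        "model": "zephyr",                       # placeholder model name
        "stream": True,
        "messages": [{"role": "user", "content": "Tell a short story."}],
    },
    stream=True,
    timeout=300,
) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":                   # conventional SSE terminator
            break
        delta = json.loads(chunk)["choices"][0]["delta"].get("content", "")
        print(delta, end="", flush=True)
```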
Multimodel
- Executes the same prompt on multiple models in parallel.
- Can be used for ensemble approaches or fallback logic.
- Examples: `mistral` + `gemma`, or `openchat` + `zephyr`.
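One plausible client-side shape for a multimodel request is sketched below; the `models` list and the per-model result keys are assumptions, since the feature is only planned.

```python
import requests

BASE_URL = "https://api.resultity.example/v1"   # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Same prompt fanned out to several models; useful for ensembles or fallbacks.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=HEADERS,
    json={
        "models": ["mistral", "gemma"],          # hypothetical multi-model field
        "messages": [{"role": "user", "content": "Name three sorting algorithms."}],
    },
    timeout=120,
)
resp.raise_for_status()
# Hypothetical response: one entry per model, keyed by model name.
for model, result in resp.json().get("results", {}).items():
    print(model, "->", result)
```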
Multinode
- Executes a single heavy model (e.g., LLaMA2 70B) across several nodes using sharding.
- Enables distributed inference with large context windows.
Media-Oriented Extensions
Planned support for non-text jobs using compatible APIs:
Vision
- `/v1/images/generations` — generate images from prompts (DALL·E-style);
- `/v1/images/variations` — modify or enhance existing images;
- `/v1/images/description` — describe or caption images;
- Based on models such as Kandinsky, Stable Diffusion, Playground v2.
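An image-generation call against the listed `/v1/images/generations` path might look like the sketch below; the parameter names mirror the DALL·E-style convention referenced above and are otherwise assumptions.

```python
import base64
import requests

BASE_URL = "https://api.resultity.example/v1"   # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

resp = requests.post(
    f"{BASE_URL}/images/generations",
    headers=HEADERS,
    json={
        "model": "stable-diffusion",             # placeholder model id
        "prompt": "a watercolor lighthouse at dawn",
        "n": 1,
        "size": "1024x1024",
        "response_format": "b64_json",           # assumed, following the DALL·E-style API
    },
    timeout=300,
)
resp.raise_for_status()
image_b64 = resp.json()["data"][0]["b64_json"]
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```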
Audio
- `/v1/audio/transcriptions` — convert speech to text (e.g., Whisper, SeamlessM4T);
- `/v1/audio/speech` — text-to-speech (e.g., Bark, xtts, tts-zero);
- `/v1/audio/translation` — audio-based translation.
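A transcription request to `/v1/audio/transcriptions` could be sketched as a multipart upload, following the Whisper-style convention; field names beyond the path are assumptions.

```python
import requests

BASE_URL = "https://api.resultity.example/v1"   # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Speech-to-text: upload an audio file and read back the transcript.
with open("meeting.wav", "rb") as audio:
    resp = requests.post(
        f"{BASE_URL}/audio/transcriptions",
        headers=HEADERS,
        files={"file": ("meeting.wav", audio, "audio/wav")},
        data={"model": "whisper"},               # placeholder model id
        timeout=300,
    )
resp.raise_for_status()
print(resp.json().get("text"))
```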
Video (experimental)
- Future support for generation and description (e.g., SVD, AnimateDiff, Zer0Scope);
- Initially available via Space or custom Subclouds, not the main Cloud API.
This modular structure allows Resultity to evolve from OpenAI-compatible endpoints to full multimodal capability, leveraging existing open-source and hosted models while maintaining API consistency.