POST /v1/audio/speech
curl --request POST \
  --url https://baze-api.powerbuyin.top/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "model": "speech-2.8-hd",
  "input": "Summarize today's system status in a natural tone.",
  "voice": "Chinese (Mandarin)_Lyrical_Voice",
  "response_format": "mp3",
  "metadata": {
    "output_format": "url"
  }
}
EOF
"<string>"

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
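
As an illustration of the header format above, here is a minimal Python sketch that builds (but does not send) the authenticated request using only the standard library. The URL is taken from the curl example on this page; the function name build_speech_request is hypothetical and not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
SPEECH_URL = "https://baze-api.powerbuyin.top/v1/audio/speech"


def build_speech_request(token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request without sending it.

    Hypothetical helper: it only attaches the Bearer token and JSON body.
    """
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; the sketch stops short of that so it can be inspected without network access.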

Body

application/json
model
enum<string>
required
Available options: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo
input
string
required

Input text to synthesize.

voice
string
required

Voice name. Mapped to upstream voice_setting.voice_id.

speed
number

Speaking rate. Mapped to upstream voice_setting.speed.

response_format
string

Target audio encoding format, mapped to upstream audio_setting.format. Common values include mp3, wav, and pcm.

stream_format
string

Streaming trigger. Any non-empty value sets stream=true on the upstream request; the gateway currently checks only whether the field is non-empty and ignores its actual value.
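
The field mappings described for voice, speed, response_format, and stream_format can be sketched roughly as follows. This is not the gateway's actual code: the function name is hypothetical, and it covers only the mappings stated on this page. Where model and input land in the upstream body is not documented here, so they are left out.

```python
def map_documented_fields(body: dict) -> dict:
    """Map gateway fields to the upstream names given on this page.

    Illustrative only; the real gateway may do more than this.
    """
    # voice is required and maps to voice_setting.voice_id.
    upstream: dict = {"voice_setting": {"voice_id": body["voice"]}}

    # speed is optional and maps to voice_setting.speed.
    if "speed" in body:
        upstream["voice_setting"]["speed"] = body["speed"]

    # response_format is optional and maps to audio_setting.format.
    if "response_format" in body:
        upstream["audio_setting"] = {"format": body["response_format"]}

    # Any non-empty stream_format enables streaming; its value is ignored.
    if body.get("stream_format"):
        upstream["stream"] = True

    return upstream
```

Under this reading, an empty string for stream_format behaves the same as omitting the field.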

metadata
object

MiniMax extension container. The only stable field currently documented here is output_format.
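
To tie the body fields together, here is a small sketch of a client-side payload builder. The helper name build_payload and its keyword arguments are hypothetical. The model list is copied from the options above, and the validation is a convenience for catching typos early, not something the API is documented to require of clients.

```python
from typing import Optional

# Model names copied from the "Available options" list on this page.
ALLOWED_MODELS = {
    "speech-2.8-hd", "speech-2.8-turbo",
    "speech-2.6-hd", "speech-2.6-turbo",
    "speech-02-hd", "speech-02-turbo",
}


def build_payload(model: str, text: str, voice: str, *,
                  speed: Optional[float] = None,
                  response_format: Optional[str] = None,
                  stream_format: Optional[str] = None,
                  output_format: Optional[str] = None) -> dict:
    """Assemble a request body, omitting optional fields that are unset."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not text or not voice:
        raise ValueError("input and voice are required")

    payload: dict = {"model": model, "input": text, "voice": voice}
    optional = {
        "speed": speed,
        "response_format": response_format,
        "stream_format": stream_format,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})

    # output_format is the only metadata field documented on this page.
    if output_format is not None:
        payload["metadata"] = {"output_format": output_format}
    return payload
```

Calling build_payload("speech-2.8-hd", "...", "Chinese (Mandarin)_Lyrical_Voice", response_format="mp3", output_format="url") produces a body with the same shape as the curl example.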

Response

Streaming success. Returns MiniMax SSE payloads. For internal billing settlement, the gateway reads extra_info.usage_characters and treats it as the unified count of input characters synthesized into audio.

SSE payload. Each data: event can be parsed as a MiniMax response chunk containing data.audio, trace_id, and base_resp.
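
A rough sketch of consuming such a stream on the client side follows. It assumes each event arrives as a single data: line holding one JSON object, and that extra_info sits at the top level of a chunk; neither detail is spelled out on this page. The audio values are collected as-is because their encoding, and how chunks should be assembled, are not documented here. Both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per `data:` line of an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        body = line[len("data:"):].strip()
        if body:
            yield json.loads(body)


def summarize(chunks: Iterable[dict]) -> dict:
    """Collect audio fragments, the last trace_id, and the usage count."""
    audio_parts = []
    trace_id = None
    usage_characters = None
    for chunk in chunks:
        audio = (chunk.get("data") or {}).get("audio")
        if audio:
            audio_parts.append(audio)  # kept raw; encoding is not assumed
        trace_id = chunk.get("trace_id", trace_id)
        # Assumption: extra_info is a top-level key of the chunk.
        extra = chunk.get("extra_info") or {}
        if "usage_characters" in extra:
            usage_characters = extra["usage_characters"]
    return {
        "audio_parts": audio_parts,
        "trace_id": trace_id,
        "usage_characters": usage_characters,
    }
```

In practice the lines would come from the decoded HTTP response body, read line by line; checking base_resp on each chunk for errors would be a natural addition.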