fal.ai Media Generation
Drift-prone skill. fal.ai model IDs, pricing, inputs, and MCP tool names change quickly. Search or fetch the current model metadata before promising a specific model, parameter, output format, or cost.

---
name: fal-ai-media
description: "Unified media generation via fal.ai MCP: image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI."
origin: ECC
---

Generate images, videos, and audio using fal.ai models via MCP.
The fal.ai MCP server must be configured before any of the tools below are available. Add this entry to `~/.claude.json`:

```json
"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}
```

Get an API key at [fal.ai](https://fal.ai).
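
If you prefer to script the configuration step, the following is a minimal sketch rather than an official installer. It assumes that MCP servers are listed under a top-level `mcpServers` key in the config file; that key name and both helper functions (`add_fal_server`, `update_config_file`) are assumptions for illustration and do not come from this document.

```python
import json
from pathlib import Path


def add_fal_server(config: dict, fal_key: str) -> dict:
    """Return a copy of `config` with the fal-ai MCP server entry added.

    Assumption: MCP servers live under a top-level "mcpServers" key.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["fal-ai"] = {
        "command": "npx",
        "args": ["-y", "fal-ai-mcp-server"],
        "env": {"FAL_KEY": fal_key},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, fal_key: str) -> None:
    """Read the JSON config at `path` (if present), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_fal_server(config, fal_key), indent=2) + "\n")
```

A call such as `update_config_file(Path.home() / ".claude.json", "YOUR_FAL_KEY_HERE")` would apply it to the real file. Existing server entries are preserved because only the `fal-ai` key is written.
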
The examples in this skill use these fal.ai MCP tools: `search`, `find`, `models`, `generate`, `upload`, and `estimate_cost`.
---
Nano Banana 2 is best for quick iterations, drafts, text-to-image, and image editing:

```
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)
```

Nano Banana Pro is best for production images, realism, typography, and detailed prompts:
```
generate(
  app_id: "fal-ai/nano-banana-pro",
  input_data: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
```

| Param | Type | Options | Notes |
|-------|------|---------|-------|
| `prompt` | string | required | Describe what you want |
| `image_size` | string | `square`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`, `landscape_4_3` | Aspect ratio |
| `num_images` | number | 1-4 | How many to generate |
| `seed` | number | any integer | Reproducibility |
| `guidance_scale` | number | 1-20 | How closely to follow the prompt (higher = more literal) |

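
To catch obvious mistakes before spending credits, the parameter table above can be mirrored in a small local pre-check. This is a sketch under the assumption that the table is still current; it is not fal.ai's schema, and `validate_image_input` is a hypothetical helper name. Given how quickly model inputs drift, the fetched model metadata remains the authority.

```python
# Allowed values copied from the parameter table; they may be out of date.
IMAGE_SIZES = {
    "square",
    "portrait_4_3",
    "landscape_16_9",
    "portrait_16_9",
    "landscape_4_3",
}


def validate_image_input(input_data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []

    prompt = input_data.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")

    size = input_data.get("image_size")
    if size is not None and size not in IMAGE_SIZES:
        errors.append(f"image_size must be one of {sorted(IMAGE_SIZES)}")

    count = input_data.get("num_images")
    if count is not None and not (isinstance(count, int) and 1 <= count <= 4):
        errors.append("num_images must be an integer from 1 to 4")

    seed = input_data.get("seed")
    if seed is not None and not isinstance(seed, int):
        errors.append("seed must be an integer")

    scale = input_data.get("guidance_scale")
    if scale is not None and not (isinstance(scale, (int, float)) and 1 <= scale <= 20):
        errors.append("guidance_scale must be a number from 1 to 20")

    return errors
```

Returning a list rather than raising on the first problem lets the caller report every issue in one pass before calling `generate`.
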
Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
```
# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)
```

---
Seedance 1.0 Pro is best for text-to-video and image-to-video with high motion quality:

```
generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)
```

Kling Video v3 Pro is best for text/image-to-video with native audio generation:
```
generate(
  app_id: "fal-ai/kling-video/v3/pro",
  input_data: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)
```

Veo 3 is best for video with generated sound and high visual quality:
```
generate(
  app_id: "fal-ai/veo-3",
  input_data: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)
```

Start from an existing image:
```
generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)
```

| Param | Type | Options | Notes |
|-------|------|---------|-------|
| `prompt` | string | required | Describe the video |
| `duration` | string | `"5s"`, `"10s"` | Video length |
| `aspect_ratio` | string | `"16:9"`, `"9:16"`, `"1:1"` | Frame ratio |
| `seed` | number | any integer | Reproducibility |
| `image_url` | string | URL | Source image for image-to-video |

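
The video examples differ mainly in which optional fields they send: the Veo 3 call omits `duration`, and the image-to-video call adds `image_url`. One way to keep that consistent is a small builder that includes only the fields that were set. This is an illustrative sketch: `build_video_input` is a hypothetical helper, and the allowed values are copied from the table above, so they may not hold for every model.

```python
from typing import Optional

# Allowed values copied from the parameter table; individual models may differ.
DURATIONS = {"5s", "10s"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def build_video_input(
    prompt: str,
    duration: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    seed: Optional[int] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Build an `input_data` dict, including only the fields that were provided."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration is not None and duration not in DURATIONS:
        raise ValueError(f"duration must be one of {sorted(DURATIONS)}")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")

    payload: dict = {"prompt": prompt}
    if duration is not None:
        payload["duration"] = duration
    if aspect_ratio is not None:
        payload["aspect_ratio"] = aspect_ratio
    if seed is not None:
        payload["seed"] = seed
    if image_url is not None:
        # Presence of image_url is what turns the request into image-to-video.
        payload["image_url"] = image_url
    return payload
```

The resulting dict can be passed as `input_data` to `generate` for whichever video model the current metadata confirms.
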
---
CSM-1B provides text-to-speech with natural, conversational quality:

```
generate(
  app_id: "fal-ai/csm-1b",
  input_data: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)
```

ThinkSound generates matching audio from video content:
```
generate(
  app_id: "fal-ai/thinksound",
  input_data: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)
```

For professional voice synthesis, use ElevenLabs directly:
```python
import os

import requests

# <voice_id> is a placeholder: substitute a real ElevenLabs voice ID.
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=60,
)
resp.raise_for_status()  # fail loudly instead of saving an error body as audio

# The response body is the encoded audio.
with open("output.mp3", "wb") as f:
    f.write(resp.content)
```

If VideoDB is configured, use its generative audio:
```python
# `coll` is assumed to be an existing VideoDB collection object.

# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")
```

---
Before generating, check estimated cost:
```
estimate_cost(
  estimate_type: "unit_price",
  endpoints: {
    "fal-ai/nano-banana-pro": {
      "unit_quantity": 1
    }
  }
)
```

Find models for specific tasks:
```
search(query: "text to video")
find(endpoint_ids: ["fal-ai/seedance-1-0-pro"])
models()
```
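
When a job involves several models, the `estimate_cost` arguments shown earlier can be assembled from a plain mapping of endpoint ID to quantity. The sketch below only builds the argument structure in the same shape as that example. `build_unit_price_estimate` is a hypothetical helper, and whether the tool accepts more than one endpoint per call is an assumption to verify against the current tool description.

```python
def build_unit_price_estimate(quantities: dict[str, int]) -> dict:
    """Build `estimate_cost` arguments from {endpoint_id: unit_quantity}."""
    if not quantities:
        raise ValueError("at least one endpoint is required")
    for endpoint, quantity in quantities.items():
        if not isinstance(quantity, int) or quantity < 1:
            raise ValueError(f"unit_quantity for {endpoint} must be a positive integer")
    return {
        "estimate_type": "unit_price",
        "endpoints": {
            endpoint: {"unit_quantity": quantity}
            for endpoint, quantity in quantities.items()
        },
    }
```

For example, `build_unit_price_estimate({"fal-ai/nano-banana-pro": 1})` reproduces the arguments of the single-endpoint call above.
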