Markdown

--- name: fal-ai-media description: Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI. origin: ECC ---

fal.ai Media Generation

**Drift-prone skill.** fal.ai model IDs, pricing, inputs, and MCP tool names change quickly. Search or fetch the current model metadata before promising a specific model, parameter, output format, or cost.

Generate images, videos, and audio using fal.ai models via MCP.

When to Activate

User wants to generate images from text prompts
Creating videos from text or images
Generating speech, music, or sound effects
Any media generation task
User says "generate image", "create video", "text to speech", "make a thumbnail", or similar

MCP Requirement

fal.ai MCP server must be configured. Add to `~/.claude.json`:

"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}

Get an API key at [fal.ai](https://fal.ai).

MCP Tools

The fal.ai MCP provides these tools:

`search` — Find available models by keyword
`find` — Get model details and parameters
`generate` — Run a model with parameters
`result` — Check async generation status
`status` — Check job status
`cancel` — Cancel a running job
`estimate_cost` — Estimate generation cost
`models` — List popular models
`upload` — Upload files for use as inputs

---

Image Generation

Nano Banana 2 (Fast)

Best for: quick iterations, drafts, text-to-image, image editing.

generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)

Nano Banana Pro (High Fidelity)

Best for: production images, realism, typography, detailed prompts.

generate(
  app_id: "fal-ai/nano-banana-pro",
  input_data: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)

Common Image Parameters

| Param | Type | Options | Notes | |-------|------|---------|-------| | `prompt` | string | required | Describe what you want | | `image_size` | string | `square`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`, `landscape_4_3` | Aspect ratio | | `num_images` | number | 1-4 | How many to generate | | `seed` | number | any integer | Reproducibility | | `guidance_scale` | number | 1-20 | How closely to follow the prompt (higher = more literal) |

Image Editing

Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:

# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)

---

Video Generation

Seedance 1.0 Pro (ByteDance)

Best for: text-to-video, image-to-video with high motion quality.

generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)

Kling Video v3 Pro

Best for: text/image-to-video with native audio generation.

generate(
  app_id: "fal-ai/kling-video/v3/pro",
  input_data: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)

Veo 3 (Google DeepMind)

Best for: video with generated sound, high visual quality.

generate(
  app_id: "fal-ai/veo-3",
  input_data: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)

Image-to-Video

Start from an existing image:

generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)

Video Parameters

| Param | Type | Options | Notes | |-------|------|---------|-------| | `prompt` | string | required | Describe the video | | `duration` | string | `"5s"`, `"10s"` | Video length | | `aspect_ratio` | string | `"16:9"`, `"9:16"`, `"1:1"` | Frame ratio | | `seed` | number | any integer | Reproducibility | | `image_url` | string | URL | Source image for image-to-video |

---

Audio Generation

CSM-1B (Conversational Speech)

Text-to-speech with natural, conversational quality.

generate(
  app_id: "fal-ai/csm-1b",
  input_data: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)

ThinkSound (Video-to-Audio)

Generate matching audio from video content.

generate(
  app_id: "fal-ai/thinksound",
  input_data: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)

ElevenLabs (via API, no MCP)

For professional voice synthesis, use ElevenLabs directly:

import os
import requests

resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    }
)
with open("output.mp3", "wb") as f:
    f.write(resp.content)

VideoDB Generative Audio

If VideoDB is configured, use its generative audio:

# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")

---

Cost Estimation

Before generating, check estimated cost:

estimate_cost(
  estimate_type: "unit_price",
  endpoints: {
    "fal-ai/nano-banana-pro": {
      "unit_quantity": 1
    }
  }
)

Model Discovery

Find models for specific tasks:

search(query: "text to video")
find(endpoint_ids: ["fal-ai/seedance-1-0-pro"])
models()

Tips

Use `seed` for reproducible results when iterating on prompts
Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
For video, keep prompts descriptive but concise — focus on motion and scene
Image-to-video produces more controlled results than pure text-to-video
Check `estimate_cost` before running expensive video generations

Related Skills

`videodb` — Video processing, editing, and streaming
`video-editing` — AI-powered video editing workflows
`content-engine` — Content creation for social platforms

fal.ai Media Generation

Markdown

fal.ai Media Generation

When to Activate

MCP Requirement

MCP Tools

Image Generation

Nano Banana 2 (Fast)

Nano Banana Pro (High Fidelity)

Common Image Parameters

Image Editing

Video Generation

Seedance 1.0 Pro (ByteDance)

Kling Video v3 Pro

Veo 3 (Google DeepMind)

Image-to-Video

Video Parameters

Audio Generation

CSM-1B (Conversational Speech)

ThinkSound (Video-to-Audio)

ElevenLabs (via API, no MCP)

VideoDB Generative Audio

Cost Estimation

Model Discovery

Tips

Related Skills

Conteúdos relacionados

Relacionados