How to Serve Custom AI Models at Scale Using Modal
Ship custom ML and generative models without wrestling with DevOps. In this hands-on workshop, you’ll use Modal — a serverless compute platform for AI and data workloads — to define infrastructure in Python, run on-demand CPU/GPU hardware, and scale from one request to hundreds of workers in seconds. You’ll learn Modal’s core concepts, compare CPU vs. GPU execution, explore parallelism patterns, and deploy a production-style Stable Diffusion inference pipeline.
FREE
Upcoming cohort: Nov 26, 2025
Meeting time: Wednesday, 12:00 PM ET
What will I learn?
- Explain why model serving is mostly an infrastructure problem and how Modal abstracts it
- Define infrastructure as code with modal.App and modal.Image
- Choose and attach the right hardware and reason about costs & burst scaling
- Persist and share artifacts using Modal Volumes, and mount external storage
- Securely manage secrets and environment configuration
- Implement map-based and function-level parallelism for speedups on CPU workloads
- Run PyTorch inference on GPU and measure CPU vs. GPU performance differences
- Build a hybrid workflow that orchestrates locally and executes heavy steps remotely
- Deploy a Stable Diffusion text-to-image service on Modal with cached model weights for fast cold starts
Curriculum
Why Infra is Hard for AI
Modal Fundamentals
Data & State
Parallelism Patterns
Hands-On Deployments
Hardware & Scaling
Why Edlitera?
Build the coding, data, and AI skills you need, online, on your own schedule, from learning to code as a beginner to mastering cutting-edge data science, machine learning, and AI techniques.

Learning for the real world
Our courses are made with the input and feedback of top teams at Fortune 500 companies in Silicon Valley and on Wall Street.
No-fluff learning
Each minute of each course is packed full of insight, best practices and real-world experience from our expert instructors.
Learn by doing
Start writing code on your computer from Day One. Practice on hundreds of exercises. Apply your skills in mini-projects. Get instant feedback from video solutions.
Complete learning tracks
With over 150 hours of video lectures and hundreds of practice exercises and projects, our learning tracks will help you level up your skills whether you are a novice or an advanced learner.
What people are saying
"I walked into the bootcamp with some basic Python syntax and walked out with a much stronger, contextualized grasp of Python, an understanding of common mistakes, the ability to solve basic coding problems, and confidence in my ability to learn more."
Randi S., Edlitera Student
"I wanted to learn Python and be able to process data without being tied and limited by Excel and macros. These classes gave me all the tools to do so and beyond. The materials provided, the engagement of the class by the tutors and their availability to help us were excellent."
Gaston G., Edlitera Student
Course Syllabus
1. The Problem Space: Serving Models at Scale
- Provisioning cloud VMs, resolving dependency conflicts, networking & security
- Kubernetes complexity vs. developer velocity
2. What Is Modal & How It Works
- Serverless compute for AI/data; infra defined in Python
- App groups functions; Image defines the container environment
- Fast, secure sandboxes; launch times often sub-second (gVisor isolation)
- On-demand hardware
- How billing works
3. Getting Started
- Environment setup, modal token new, sign in with Google/GitHub
- Default free credits; using the CLI and project scaffolding
- Best practice: local orchestration + remote execution
4. Core Concepts by Example
- App & Image: build a lightweight Image (e.g., requests, beautifulsoup4)
- Remote function: fetch a page title on Modal; compare local vs. remote execution
- How to handle secrets
- Volumes: cache model weights/datasets to avoid repeated downloads; mounting S3
5. Choosing the Right Hardware
- When to use CPUs vs. GPUs
- Cost/perf tradeoffs; matching task profile to instance type
- Measuring gains: timing utilities, warmups, and GPU synchronization gotchas
6. Parallelism Patterns in Modal
- Map-based parallelism
- Function-level parallelism
- Understanding the Python GIL and when parallel workers beat bigger machines
7. Hands-On: CPU & GPU Demos
- CPU scaling demo
- PyTorch CPU vs. GPU
8. Deploying a Generative Model: Stable Diffusion on Modal
- Create a Volume for SD weights; one-off upload to cache artifacts
- Add a Hugging Face token as a Secret; construct a diffusers pipeline
- Attach hardware to the function; return a PIL image
- Latency tips: model initialization strategies, keeping workers warm, batching & concurrency
9. Operating Considerations
- Structuring repos for Modal apps; environments and reproducibility
- Cost controls: right-sizing CPU/GPU, pay-per-second awareness, job timeouts
- Safety & reliability: timeouts, retries, input validation, resource scoping
- Extending to APIs: wrapping Modal functions behind HTTP endpoints; versioning and rollbacks
Have a question?
Contact us any time, we'd love to hear from you!