"It works on my machine" - câu nói kinh điển của developers khi code chạy tốt trên laptop nhưng fail trên server production. Containerization giải quyết vấn đề này bằng cách đóng gói ứng dụng cùng toàn bộ dependencies vào một package nhất quán.
Trong thời đại AI/ML, containerization còn quan trọng hơn - models cần specific Python versions, CUDA versions, library versions. Một mismatch nhỏ có thể làm model fail hoàn toàn.
┌──────────────────────────────────────┐
│          Physical Hardware           │
├──────────────────────────────────────┤
│          Hypervisor (ESXi)           │
├────────────┬────────────┬────────────┤
│    VM 1    │    VM 2    │    VM 3    │
│ ┌────────┐ │ ┌────────┐ │ ┌────────┐ │
│ │   OS   │ │ │   OS   │ │ │   OS   │ │
│ │ Ubuntu │ │ │ CentOS │ │ │Windows │ │
│ └────────┘ │ └────────┘ │ └────────┘ │
│ ┌────────┐ │ ┌────────┐ │ ┌────────┐ │
│ │  App   │ │ │  App   │ │ │  App   │ │
│ └────────┘ │ └────────┘ │ └────────┘ │
└────────────┴────────────┴────────────┘
Each VM: Full OS + Apps (GB per VM)
Boot time: Minutes
Overhead: High (multiple OS kernels)
┌──────────────────────────────────────┐
│          Physical Hardware           │
├──────────────────────────────────────┤
│           Host OS (Linux)            │
├──────────────────────────────────────┤
│      Container Engine (Docker)       │
├────────────┬────────────┬────────────┤
│ Container1 │ Container2 │ Container3 │
│ ┌────────┐ │ ┌────────┐ │ ┌────────┐ │
│ │  App   │ │ │  App   │ │ │  App   │ │
│ │ +Libs  │ │ │ +Libs  │ │ │ +Libs  │ │
│ └────────┘ │ └────────┘ │ └────────┘ │
└────────────┴────────────┴────────────┘
Each Container: Just Apps + Libs (MB)
Boot time: Seconds
Overhead: Low (shared OS kernel)
Key Differences:
| Aspect | VM | Container |
|---|---|---|
| Size | GBs | MBs |
| Startup | Minutes | Seconds |
| Isolation | Full (OS-level) | Process-level |
| Performance | Lower | Near-native |
| Density | 10s per host | 100s per host |
When to use VMs:
- You need full OS-level isolation or a different kernel/OS per workload (e.g. Windows next to Linux)
- Strong security boundaries between untrusted tenants
- Legacy applications that expect a whole machine
When to use Containers:
- Microservices and APIs that need fast startup and high density
- Reproducible ML environments (pinned Python/CUDA/library versions)
- CI/CD pipelines and horizontal scaling
Docker is the most popular container platform.
┌─────────────────────────────────────┐
│            Docker Client            │
│        (docker CLI commands)        │
└──────────────┬──────────────────────┘
               │ REST API
┌──────────────▼──────────────────────┐
│            Docker Daemon            │
│        (dockerd - manages:)         │
│  ┌───────────────────────────────┐  │
│  │            Images             │  │
│  │     (Read-only templates)     │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │          Containers           │  │
│  │      (Running instances)      │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │           Networks            │  │
│  │    (Container networking)     │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │            Volumes            │  │
│  │     (Persistent storage)      │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
1. Docker Image: a read-only template containing the application, its dependencies, and filesystem layers. Built from a Dockerfile.
2. Docker Container: a running instance of an image, with its own isolated process space, network, and writable layer.
3. Docker Registry: a service that stores and distributes images (e.g. Docker Hub or a private registry). `docker pull` and `docker push` move images between your machine and a registry.
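The relationship between the three in practice: a registry stores images, `docker pull` copies an image locally, and `docker run` creates a container from it. A quick demo using an official image:
# Registry -> image: download from Docker Hub
docker pull python:3.10-slim
# Image -> container: start a throwaway container from the image
docker run --rm python:3.10-slim python -c "print('hello from a container')"
# Inspect local images and the containers created from them
docker images
docker ps -a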
A Dockerfile is the blueprint for building an image.
# Base image - start from Python 3.10
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy requirements first (for caching)
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Command to run
CMD ["python", "app.py"]
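For completeness, here is a minimal hypothetical `app.py` this Dockerfile could run (the real application isn't shown in this post) - a stdlib-only HTTP server listening on the exposed port:
# app.py - hypothetical minimal app for the Dockerfile above (stdlib only)
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello from inside a container!\n")

if __name__ == "__main__":
    # Bind to 0.0.0.0 so Docker's port mapping (-p 8000:8000) can reach it
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()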
FROM python:3.10-slim
WORKDIR /app
# System dependencies for ML libraries
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy model files
COPY models/ /app/models/
COPY src/ /app/src/
# Environment variables
ENV MODEL_PATH=/app/models/model.pkl
ENV PORT=8000
EXPOSE 8000
# Run FastAPI server
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
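A sketch of the `src/main.py` this Dockerfile expects (hypothetical - the actual code isn't shown): it reads the MODEL_PATH variable set above and serves predictions with FastAPI:
# src/main.py - hypothetical FastAPI app matching the Dockerfile above
import os
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

# MODEL_PATH comes from the ENV instruction in the Dockerfile
MODEL_PATH = os.environ.get("MODEL_PATH", "/app/models/model.pkl")

with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)  # assumes a pickled model exposing predict()

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(req: PredictRequest):
    return {"prediction": str(model.predict([req.text])[0])}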
FROM: Base image
FROM python:3.10-slim # Official Python slim
FROM nvidia/cuda:11.8.0-base-ubuntu22.04 # CUDA support
FROM ubuntu:22.04 # Ubuntu base
WORKDIR: Set working directory
WORKDIR /app
# All subsequent commands run in /app
COPY: Copy files from host to container
COPY requirements.txt . # Copy file
COPY src/ /app/src/ # Copy directory
COPY . . # Copy everything
RUN: Execute commands during build
RUN pip install torch # Install package
# Multi-line: end each continued line with \
RUN apt-get update && \
    apt-get install -y curl
ENV: Set environment variables
ENV PYTHONUNBUFFERED=1
ENV MODEL_PATH=/app/models
EXPOSE: Document which ports are used
EXPOSE 8000
# Doesn't actually publish port, just documentation
CMD: Default command when container starts
CMD ["python", "app.py"] # Exec form (preferred)
CMD python app.py # Shell form
ENTRYPOINT: Configure container as executable
ENTRYPOINT ["python"]
CMD ["app.py"]
# Result: python app.py
# Can override CMD: docker run image script.py → python script.py
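To see how the two interact at run time, assuming an image built with exactly the ENTRYPOINT/CMD pair above (the image name `myimage` is illustrative):
docker run myimage                              # -> python app.py (ENTRYPOINT + default CMD)
docker run myimage script.py                    # -> python script.py (CMD replaced)
docker run --rm -it --entrypoint bash myimage   # replace ENTRYPOINT itself, e.g. for debugging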
Reduce final image size by using multiple stages.
# Stage 1: Build
FROM python:3.10 as builder
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Copy source
COPY src/ /app/src/
# Stage 2: Runtime
FROM python:3.10-slim
WORKDIR /app
# Copy only necessary files from builder
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app/src /app/src
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "src/main.py"]
Benefits:
- Smaller final image: compilers and build tools stay in the builder stage
- Smaller attack surface: the runtime image ships only what it needs
- Faster pulls and deploys thanks to the reduced size
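To verify the effect, build both variants and compare sizes (the tag names and a hypothetical single-stage `Dockerfile.single` are assumptions for illustration):
docker build -t ml-api:multistage .
docker build -t ml-api:single -f Dockerfile.single .
docker images ml-api   # the multistage tag should be hundreds of MB smaller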
Docker caches each layer. Order matters for efficiency!
# ❌ BAD - App code changes invalidate all layers
FROM python:3.10
COPY . . # Everything copied
RUN pip install -r requirements.txt # Re-runs every time code changes
# ✅ GOOD - Dependencies cached separately
FROM python:3.10
COPY requirements.txt . # Copy requirements first
RUN pip install -r requirements.txt # Cached unless requirements change
COPY . . # Copy code last
Principle: order instructions from least to most frequently changed - base image first, then dependencies, then application code - so a code edit only invalidates the final COPY layer.
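You can watch the cache at work: rebuild after touching only application code, and the dependency layer is reused (the file name is illustrative):
docker build -t myapp:v1 .     # first build: every layer executes
echo "# comment" >> app.py     # change application code only
docker build -t myapp:v1 .     # pip install layer shows as CACHED; only COPY . . reruns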
# Build image
docker build -t myapp:v1 .
docker build -t myapp:latest --no-cache . # Force rebuild
# List images
docker images
# Remove image
docker rmi myapp:v1
# Pull from registry
docker pull python:3.10
# Push to registry
docker tag myapp:v1 username/myapp:v1
docker push username/myapp:v1
# Inspect image
docker inspect myapp:v1
docker history myapp:v1 # Show layers
# Run container
docker run myapp:v1 # Run in foreground (exits when the app exits)
docker run -d myapp:v1 # Detached (background)
docker run -p 8000:8000 myapp:v1 # Port mapping
docker run -v /data:/app/data myapp:v1 # Volume mount
docker run --name mycontainer myapp:v1 # Named container
docker run --rm myapp:v1 # Auto-remove after exit
# List containers
docker ps # Running containers
docker ps -a # All containers
# Start/stop containers
docker start container_id
docker stop container_id
docker restart container_id
# Execute command in running container
docker exec -it container_id bash # Interactive shell
docker exec container_id ls /app # Run command
# View logs
docker logs container_id
docker logs -f container_id # Follow logs
# Remove container
docker rm container_id
docker rm -f container_id # Force remove running container
# Container stats
docker stats container_id
# Build ML model serving image
docker build -t ml-api:v1 .
# Run with GPU support (NVIDIA)
docker run --gpus all \
-p 8000:8000 \
-v $(pwd)/models:/app/models \
-e MODEL_NAME=bert-base \
--name ml-server \
ml-api:v1
# Check logs
docker logs -f ml-server
# Test API
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'
# Stop and remove
docker stop ml-server
docker rm ml-server
Docker Compose manages multiple containers together as a single application stack.
version: '3.8'

services:
  # FastAPI backend
  api:
    build: ./api
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache
    volumes:
      - ./api:/app
      - models:/app/models
    restart: unless-stopped

  # PostgreSQL database
  db:
    image: postgres:15
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # Redis cache
  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data

  # Nginx reverse proxy
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:
  models:
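The compose file mounts `./nginx.conf`, which isn't shown in this post; a minimal sketch that proxies port 80 to the api service might look like this (an assumption, not the actual config):
# nginx.conf - hypothetical minimal reverse proxy for the stack above
events {}

http {
    server {
        listen 80;

        location / {
            # "api" resolves via Docker's embedded DNS to the compose service
            proxy_pass http://api:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}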
# Start all services
docker-compose up
docker-compose up -d # Detached
# Stop all services
docker-compose down
docker-compose down -v # Also remove volumes
# View logs
docker-compose logs
docker-compose logs -f api # Follow specific service
# Scale services
docker-compose up -d --scale api=3 # Run 3 API instances (drop the fixed "8000:8000" host port first - replicas can't share it)
# Rebuild and restart
docker-compose up -d --build
# Execute in service
docker-compose exec api bash
# ❌ Generic
FROM python:3
# ✅ Specific version
FROM python:3.10.12-slim
# ✅ ML-optimized
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Use slim/alpine variants
FROM python:3.10-slim # ~120MB vs 900MB for full
# Clean up in same layer
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Use .dockerignore
# .dockerignore file:
__pycache__
*.pyc
.git
.venv
*.log
# Don't run as root
FROM python:3.10-slim
RUN useradd -m -u 1000 appuser
USER appuser
WORKDIR /home/appuser/app
COPY --chown=appuser:appuser . .
# Scan for vulnerabilities (docker scan is deprecated; use Docker Scout or Trivy)
# docker scout cves myimage:v1
# trivy image myimage:v1
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
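Note that curl must be installed in the image for this check to work. Docker then tracks the result, which you can query (the container name is illustrative):
docker inspect --format '{{.State.Health.Status}}' mycontainer   # starting | healthy | unhealthy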
# Option 1: Bake into image (small models)
COPY models/model.pkl /app/models/
# Option 2: Mount volume (large models)
# docker run -v /path/to/models:/app/models myapp
# Option 3: Download at runtime
RUN pip install huggingface-hub
CMD ["python", "-c", "from huggingface_hub import snapshot_download; \
snapshot_download('bert-base-uncased', cache_dir='/app/models')"]
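In practice, Option 3 is usually an entrypoint script that downloads only when the model is missing, then starts the real server. A hedged sketch (`entrypoint.sh` is an assumption, wired in with ENTRYPOINT ["/app/entrypoint.sh"]):
#!/bin/sh
# entrypoint.sh - hypothetical: fetch the model on first start, then serve
set -e

# Download only if the models directory is missing or empty
if [ ! -d /app/models ] || [ -z "$(ls -A /app/models)" ]; then
    python -c "from huggingface_hub import snapshot_download; \
snapshot_download('bert-base-uncased', cache_dir='/app/models')"
fi

# exec replaces the shell so signals (SIGTERM on docker stop) reach uvicorn
exec uvicorn src.main:app --host 0.0.0.0 --port "${PORT:-8000}"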
# Use build args
ARG ENV=production
ENV APP_ENV=${ENV}
# Build for different envs
# docker build --build-arg ENV=development -t myapp:dev .
# docker build --build-arg ENV=production -t myapp:prod .
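A quick way to confirm what was baked in (tags from the build commands above):
docker run --rm myapp:dev printenv APP_ENV    # -> development
docker run --rm myapp:prod printenv APP_ENV   # -> production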
Containers alone aren't enough for production - you also need orchestration:
Kubernetes (next topic):
- Schedules containers across a cluster of machines
- Restarts failed containers automatically (self-healing)
- Scales replicas up and down based on load
- Provides service discovery, load balancing, and rolling updates
Example scenario:
Single Container:
- Manual start/stop
- Manual scaling
- No automatic recovery
- Manual load balancing
Kubernetes Cluster:
- Auto-start on failure
- Auto-scale based on load
- Built-in load balancer
- Zero-downtime deployments
In the next post, we'll explore Model Serving Architecture - batch vs. online inference, model formats (ONNX, TensorRT), and deployment strategies.
This post is part of the series "From Zero to AI Engineer" - Module 9: Deployment Strategy.