Introduction
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. Deploying Airflow using Docker simplifies the setup process and ensures a consistent environment. This guide provides a detailed step-by-step process to set up Apache Airflow on an Ubuntu EC2 instance using Docker and Docker Compose.
Prerequisites
AWS Account with permission to launch EC2 instances
Basic knowledge of Linux commands and Docker
SSH access to the EC2 instance
Step 1: Launch an EC2 Instance
Log in to AWS Console
Navigate to EC2 and click Launch Instance
Select Ubuntu Server (20.04 or later)
Choose t4g.medium or a larger instance type (recommended, since the Airflow Docker Compose setup needs at least 4 GB of RAM)
Configure Security Group:
Allow SSH (Port 22) from your IP
Allow HTTP (Port 80)
Allow Custom TCP (Port 8080) for Airflow Web UI
Launch the instance and connect via SSH:
ssh -i <your-key-pair.pem> ubuntu@<public-ip>
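For example, with a key file named airflow-key.pem and a public IP of 203.0.113.10 (both placeholders for your own values), the connection looks like this; the chmod step is only needed once so SSH accepts the key's permissions:
chmod 400 airflow-key.pem
ssh -i airflow-key.pem ubuntu@203.0.113.10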
Step 2: Install Docker and Docker Compose
2.1 Update the System
sudo apt-get update && sudo apt-get upgrade -y
2.2 Install Required Packages
sudo apt-get install -y ca-certificates curl gnupg lsb-release
2.3 Add Docker Repository
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
2.4 Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
2.5 Verify Docker Installation
docker --version
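Optionally, add your user to the docker group so the docker and docker-compose commands later in this guide can run without sudo, and run a quick smoke test. This is an extra convenience step, not part of the official install procedure; log out and back in (or use newgrp) for the group change to take effect:
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world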
2.6 Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
2.7 Verify Docker Compose Installation
docker-compose --version
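Note that the docker-compose-plugin installed in step 2.4 already provides the same functionality through docker compose (with a space), so the standalone binary above is optional; the plugin can be verified with:
docker compose version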
Step 3: Setup Apache Airflow
3.1 Create Airflow Directory
mkdir -p ~/airflow && cd ~/airflow
3.2 Download Docker-Compose YAML for Airflow
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.1/docker-compose.yaml'
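The URL above pins the compose file published for Airflow 2.5.1. If you intend to run a different version (for example the 2.10.4 image used in the customization section below), you can fetch the matching file by changing the version segment of the path; this assumes the Airflow docs keep publishing the file under the same pattern:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.10.4/docker-compose.yaml'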
3.3 Create Required Directories
mkdir -p ./dags ./logs ./plugins
3.4 Configure Environment
echo -e "AIRFLOW_UID=$(id -u)" > .env
# Alternatively, you can manually set:
echo -e "AIRFLOW_UID=50000" > .env
3.5 Initialize Airflow Database
docker-compose up airflow-init
3.6 Launch Airflow
docker-compose up -d
3.7 Check Airflow Logs (Optional)
docker-compose logs
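To check that everything came up healthy, list the containers and, if needed, follow the logs of a single service; after a minute or two the webserver and scheduler should report a healthy state:
docker-compose ps
docker-compose logs -f airflow-webserver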
Step 4: Access Airflow Web UI
Ensure the Security Group allows inbound traffic on port 8080 (already configured in Step 1).
Access Airflow UI via browser:
http://<public-ip>:8080/
Login Credentials:
Username: airflow (default)
Password: airflow (default)
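For anything beyond a quick test, replace the default credentials. One way to do this (a sketch using the standard Airflow CLI; the username, names, email, and password below are placeholders) is to create a new admin user inside the webserver container and then remove the default one:
docker-compose exec airflow-webserver airflow users create \
  --username admin --firstname Admin --lastname User \
  --role Admin --email admin@example.com --password 'choose-a-strong-password'
docker-compose exec airflow-webserver airflow users delete --username airflow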
Customizing the Docker Compose File to Your Requirements
Step 1: Create a .env File for Environment Variables
touch .env
Add the following content to .env:
This is an example. Please update the .env file with your own credentials and key values.
AIRFLOW_IMAGE_NAME=apache/airflow:2.10.4
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
AIRFLOW_UID=50000
DBT_TYPE=postgres
DBT_HOST=localhost
DBT_USER=airflow
DBT_PASSWORD=airflow
DBT_PORT=5432
DBT_NAME=airflow
DBT_SCHEMA=public
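Docker Compose reads this .env file automatically and substitutes the values into docker-compose.yaml wherever ${VARIABLE} syntax appears (for example ${AIRFLOW_IMAGE_NAME} in the image line). To see the fully resolved configuration before starting anything, render it with:
docker-compose config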
Step 2: Configuring the Docker Compose File
Below is the Docker Compose configuration file used to set up Apache Airflow and its components.
docker-compose.yaml
version: '3'
x-airflow-common: &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.4}
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./dados:/opt/airflow/dados
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
  depends_on: &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy
Airflow Scheduler
Responsible for scheduling tasks in the DAGs. This service, and every service that follows, sits under the top-level services: key and inherits the shared configuration above through the <<: *airflow-common YAML merge.
services:
  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
Airflow Webserver
Hosts the Airflow web UI, accessible at localhost:8080.
  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
Airflow Worker
Executes tasks.
  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test: ["CMD-SHELL", 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      DUMB_INIT_SETSID: "0"
    restart: always
Airflow Initialisation Service
Initialises the Airflow database and creates a default user.
  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    user: "0:${AIRFLOW_GID:-0}"
Flower
Provides monitoring for Celery workers, accessible at localhost:5555.
  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
Redis
Redis is used as the message broker.
  redis:
    image: redis:latest
    expose:
      - 6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always
Postgres
Postgres serves as the metadata database.
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always
The named volume postgres-db-volume used above must also be declared at the top level of the file, otherwise Compose reports an undefined volume:
volumes:
  postgres-db-volume:
Step 3: Start the Airflow Environment
docker-compose up --build
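If you would rather keep the stack running in the background, start it detached and manage it with ps and down; note that adding -v to down also deletes the postgres-db-volume and therefore wipes the metadata database, so use it only when you want a clean slate:
docker-compose up --build -d
docker-compose ps
docker-compose down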
Thank you for reading! ❤️