Deploying Apache Airflow with Docker on Ubuntu EC2 Instance

Introduction

Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. Deploying Airflow using Docker simplifies the setup process and ensures a consistent environment. This guide provides a detailed step-by-step process to set up Apache Airflow on an Ubuntu EC2 instance using Docker and Docker Compose.

Prerequisites

  • AWS Account with permission to launch EC2 instances

  • Basic knowledge of Linux commands and Docker

  • SSH access to the EC2 instance

Step 1: Launch an EC2 Instance

  1. Log in to AWS Console

  2. Navigate to EC2 and click Launch Instance

  3. Select Ubuntu Server (20.04 or later)

  4. Choose t4g.medium or larger (at least 4 GB of RAM is recommended for running Airflow with Docker Compose; t4g instances are ARM-based, which recent Airflow images support)

  5. Configure Security Group:

    • Allow SSH (Port 22) from your IP

    • Allow HTTP (Port 80)

    • Allow Custom TCP (Port 8080) for Airflow Web UI

  6. Launch the instance and connect via SSH:

     ssh -i <your-key-pair.pem> ubuntu@<public-ip>
    
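SSH will refuse a private key whose permissions are too open, so if the connection is rejected with an "UNPROTECTED PRIVATE KEY FILE" warning, tighten the key's permissions first (your key file name will differ):

chmod 400 <your-key-pair.pem>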

Step 2: Install Docker and Docker Compose

2.1 Update the System

sudo apt-get update && sudo apt-get upgrade -y

2.2 Install Required Packages

sudo apt-get install -y ca-certificates curl gnupg lsb-release

2.3 Add Docker Repository

sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

2.4 Install Docker Engine

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

2.5 Verify Docker Installation

docker --version
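
Optionally, add the ubuntu user to the docker group so that docker and docker-compose can be run without sudo (log out and back in, or run newgrp docker, for the change to take effect):

sudo usermod -aG docker ubuntu
newgrp docker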

2.6 Install Docker Compose (Standalone Binary)

The docker-compose-plugin installed above already provides the newer "docker compose" subcommand; the standalone binary below provides the legacy docker-compose command used in the rest of this guide.

sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

2.7 Verify Docker Compose Installation

docker-compose --version

Step 3: Setup Apache Airflow

3.1 Create Airflow Directory

mkdir -p ~/airflow && cd ~/airflow

3.2 Download Docker-Compose YAML for Airflow

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.1/docker-compose.yaml'

3.3 Create Required Directories

mkdir -p ./dags ./logs ./plugins
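
The dags folder is where Airflow discovers DAG definitions once the stack is running. As a quick smoke test you can drop in a minimal DAG; the file name and its contents below are only an illustration, not part of the official setup:

cat > ./dags/hello_world.py <<'EOF'
# Illustrative example DAG; any .py file placed in ./dags is picked up by the scheduler.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_world",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # no schedule; trigger manually from the web UI
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'Hello from Airflow'")
EOF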

3.4 Configure Environment

echo -e "AIRFLOW_UID=$(id -u)" > .env
# Alternatively, you can manually set:
echo -e "AIRFLOW_UID=50000" > .env

3.5 Initialize Airflow Database

docker-compose up airflow-init

3.6 Launch Airflow

docker-compose up -d
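
Give the containers a minute to start, then confirm that the webserver, scheduler, worker, and supporting services report a healthy state:

docker-compose ps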

3.7 Check Airflow Logs (Optional)

docker-compose logs
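
To follow the logs of a single service instead of all of them, pass the service name from docker-compose.yaml, for example:

docker-compose logs -f airflow-webserver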

Step 4: Access Airflow Web UI

  1. Ensure the Security Group allows inbound traffic on Port 8080 (configured in Step 1).

  2. Access Airflow UI via browser:

     http://<public-ip>:8080/
    
  3. Login Credentials:

    • Username: airflow (default)

    • Password: airflow (default)
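
The default airflow/airflow account is created by the airflow-init service and is publicly known, so it is worth creating your own admin user and removing the default one. A minimal sketch using the Airflow CLI inside the webserver container (the username, password, and email below are placeholders):

docker-compose exec airflow-webserver airflow users create \
  --username admin \
  --password 'choose-a-strong-password' \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com

docker-compose exec airflow-webserver airflow users delete --username airflow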

Customizing the Docker Compose File to Your Requirements

Step 1: Create a .env File for Environment Variables

touch .env

Add the following content to .env. This is an example; replace the credentials and key values with your own. Note that the Postgres credentials also appear in the connection strings inside docker-compose.yaml (AIRFLOW__CORE__SQL_ALCHEMY_CONN and AIRFLOW__CELERY__RESULT_BACKEND), so if you change them here, update those values as well.

AIRFLOW_IMAGE_NAME=apache/airflow:2.10.4
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
AIRFLOW_UID=50000
DBT_TYPE=postgres
DBT_HOST=localhost
DBT_USER=airflow
DBT_PASSWORD=airflow
DBT_PORT=5432
DBT_NAME=airflow
DBT_SCHEMA=public

Step 2: Configuring the Docker Compose File

Below is the Docker Compose configuration file used to set up Apache Airflow and its components.

docker-compose.yaml

version: '3'
x-airflow-common: &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.4}
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./dados:/opt/airflow/dados   # extra data directory; create ./dados on the host before starting
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
  depends_on: &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

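The shared configuration above leaves AIRFLOW__CORE__FERNET_KEY empty, which means connection passwords stored in the metadata database are not encrypted. If you want encryption, you can generate a key (one way, using only the Python standard library, is shown below) and paste it into the FERNET_KEY value or wire it through .env:

python3 -c "import base64, os; print(base64.urlsafe_b64encode(os.urandom(32)).decode())"
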
Airflow Scheduler

Responsible for scheduling tasks in the DAGs.

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

Airflow Webserver

Hosts the Airflow web UI on port 8080 (http://<public-ip>:8080 when running on EC2).

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

Airflow Worker

Executes tasks.

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test: ["CMD-SHELL", 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      DUMB_INIT_SETSID: "0"
    restart: always

Airflow Initialisation Service

Initialises the Airflow database and creates a default user.

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      # Merge the shared environment so the init job talks to Postgres,
      # not the image's default SQLite database.
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    user: "0:${AIRFLOW_GID:-0}"

Flower

Provides monitoring for Celery workers on port 5555 (open this port in the Security Group if you want to reach it from outside the instance).

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

Redis

Redis is used as the message broker.

  redis:
    image: redis:latest
    expose:
      - 6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

Postgres

Postgres serves as the metadata database.

  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

# Top-level named volume referenced by the postgres service; required so that
# docker-compose creates a persistent volume for the metadata database.
volumes:
  postgres-db-volume:

Step 3: Start the Airflow Environment

docker-compose up --build
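
Add the -d flag to run the stack in the background, and as with the stock file you may want to run docker-compose up airflow-init once before starting everything. When you want to stop the environment, bring it down with Docker Compose; adding --volumes also deletes the Postgres data volume, so omit it if you want to keep the metadata database:

docker-compose down
docker-compose down --volumes --remove-orphans   # full reset, removes the database volume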

Thank you for reading! ❤️