Step-by-Step Setup Guide
1. Install Prerequisites
Make sure these are installed:
- Docker Desktop (Windows/Mac) OR Docker Engine (Linux)
- Docker Compose (already included in Docker Desktop)
- Git (optional, but helpful)
Verify installation:
docker compose version
2. Create Project Folder
mkdir data-engineering-pipeline
cd data-engineering-pipeline
3. Add Files
docker-compose.yml
Create docker-compose.yml inside data-engineering-pipeline/ with the content I provided earlier.
PostgreSQL Init Script
Create a postgres/ folder and add an init.sql file inside it:
mkdir -p postgres
nano postgres/init.sql
Paste:
CREATE TABLE weather_data (
    id SERIAL PRIMARY KEY,
    city VARCHAR(50),
    temperature FLOAT,
    humidity FLOAT,
    timestamp BIGINT
);
Airflow DAGs Folder
Create the airflow/dags/ folder:
mkdir -p airflow/dags
Inside it, create three files:
- weather_pipeline.py (Airflow DAG)
- fetch_weather.py (data ingestion)
- process_weather.py (processing + insert to PostgreSQL)
Paste the scripts I gave you earlier into these files.
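To give a sense of how the three files fit together, here is a minimal sketch of weather_pipeline.py. It assumes an Airflow 2.x image and that fetch_weather.py and process_weather.py each expose a main() function; the DAG ID, schedule, and imports are illustrative, so adapt them to the scripts you actually pasted.

```python
"""weather_pipeline.py -- illustrative Airflow DAG skeleton (not the full script)."""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Airflow adds the dags/ folder to sys.path, so sibling modules import directly.
import fetch_weather
import process_weather

with DAG(
    dag_id="weather_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",  # illustrative; pick whatever cadence you need
    catchup=False,
) as dag:
    fetch_task = PythonOperator(
        task_id="fetch_weather",
        python_callable=fetch_weather.main,      # pulls from OpenWeather, pushes to Kafka
    )
    process_task = PythonOperator(
        task_id="process_weather",
        python_callable=process_weather.main,    # consumes Kafka, inserts into PostgreSQL
    )

    fetch_task >> process_task  # run ingestion before processing
```

The `>>` on the last line simply tells Airflow to run the ingestion task before the processing task.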
4. Start Services
Run:
docker compose up -d
This will start:
- PostgreSQL (5432)
- Kafka (9092) + ZooKeeper
- Airflow (8080)
- Superset (8088)
Check running containers:
docker ps
5. Access UIs
Airflow → http://localhost:8080
- Default login: airflow / airflow (or create your own user with airflow users create)
Superset → http://localhost:8088
- Login: admin / admin (as defined in docker-compose.yml)
6. Trigger the Pipeline
- Go to the Airflow UI → enable the weather_pipeline DAG.
- It will:
  - Run fetch_weather.py → pull data from the OpenWeather API → send it to Kafka.
  - Run process_weather.py → consume the Kafka data → insert it into PostgreSQL (a minimal sketch follows below).
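For reference, here is a hedged sketch of what the Kafka-to-PostgreSQL side (process_weather.py) can look like, using kafka-python and psycopg2. The topic name "weather", the kafka:9092 bootstrap address, and the main() entry point are assumptions rather than values taken from the compose file; the table and connection details match the ones used elsewhere in this guide.

```python
"""process_weather.py -- illustrative consumer/loader sketch (not the full script)."""
import json

import psycopg2
from kafka import KafkaConsumer


def main() -> None:
    consumer = KafkaConsumer(
        "weather",                       # topic name (assumed)
        bootstrap_servers="kafka:9092",  # Kafka service name from docker-compose (assumed)
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
        consumer_timeout_ms=10_000,      # stop iterating when no new messages arrive
    )

    conn = psycopg2.connect(
        host="postgres", dbname="weatherdb",
        user="postgres", password="postgres",
    )
    with conn, conn.cursor() as cur:
        for message in consumer:
            record = message.value
            cur.execute(
                "INSERT INTO weather_data (city, temperature, humidity, timestamp) "
                "VALUES (%s, %s, %s, %s)",
                (
                    record["city"],
                    record["temperature"],
                    record["humidity"],
                    record["timestamp"],
                ),
            )
    conn.close()


if __name__ == "__main__":
    main()
```

`consumer_timeout_ms` makes the loop exit once no new messages arrive, so the Airflow task finishes instead of blocking forever.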
7. Build Dashboard in Superset
- Log in at http://localhost:8088.
- Connect PostgreSQL:
  - Host: postgres
  - Port: 5432
  - Database: weatherdb
  - User: postgres
  - Password: postgres
  (or use the single SQLAlchemy URI shown after this list)
- Create a new dataset from the weather_data table.
- Build charts:
  - Line chart: temperature trend over time.
  - Bar chart: average humidity per city.
  - Alerts: filter by thresholds.
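If your Superset version asks for a single SQLAlchemy URI instead of separate connection fields, the values above combine into the line below (assuming the defaults used in this guide; adjust if your docker-compose.yml differs):
postgresql://postgres:postgres@postgres:5432/weatherdb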
8. Stopping the Pipeline
To stop services:
docker compose down
To stop & remove all data volumes (start fresh):
docker compose down -v
Now your local machine runs a production-style data engineering pipeline with open-source tools.