data-engineering

Name: data-engineering
Author: pluginagentmarketplace

ETL pipelines, Apache Spark, data warehousing, and big data processing. Use for building data pipelines, processing large datasets, or data infrastructure.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: pluginagentmarketplace
Category: DevOps

How to use

Install the required libraries: PySpark for distributed processing and Apache Airflow for pipeline orchestration. Make sure you have access to a Spark cluster and a storage system (S3, HDFS, or similar).
Initialize a Spark session in your script, configuring parameters such as executor memory and the application name. Use SparkSession.builder to create the connection to the cluster.
Load data from an external source (e.g., Parquet files on S3) with spark.read, specifying the format and the path to the source data.
Apply data transformations: filter rows, group by columns, and compute aggregates (sum, average, count). Spark evaluates these operations lazily: they only build a query plan, which Spark optimizes and executes when an action (such as a write) is triggered.
Write the processed data to the data warehouse, choosing a write mode (overwrite or append) and partitioning the output by date or another key, so that queries filtering on that key can skip irrelevant partitions.
To automate recurring runs, define a DAG in Apache Airflow: write extract, transform, and load functions, chain them into a sequence of tasks, set a schedule (e.g., daily), and configure failure notifications.
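The load, transform, and write steps above can be illustrated without a cluster. The sketch below is a pure-Python stand-in (standard library only, not PySpark) that assumes a hypothetical CSV input with date, region, and amount columns. Generators keep the filter lazy until the aggregation pulls rows through it, and the output is laid out in Hive-style date=<value> directories, the layout Spark produces for partitionBy("date"). In PySpark the corresponding calls would be spark.read.parquet(path), df.filter(...), df.groupBy("date", "region").agg(...), and df.write.mode("overwrite").partitionBy("date").parquet(out_path).

```python
import csv
import os
import shutil
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical row shape used for illustration: {"date", "region", "amount"}.
Row = Dict[str, Any]


def extract(path: str) -> Iterator[Row]:
    """Lazily stream rows from a CSV file (a local stand-in for spark.read)."""
    with open(path, newline="") as f:
        for rec in csv.DictReader(f):
            yield {"date": rec["date"], "region": rec["region"], "amount": float(rec["amount"])}


def transform(rows: Iterable[Row]) -> List[Row]:
    """Filter, group by (date, region), and compute sum, average, and count."""
    # Lazy filter: no row is read until the loop below pulls from it.
    positive = (r for r in rows if r["amount"] > 0)
    acc: Dict[tuple, list] = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for r in positive:
        bucket = acc[(r["date"], r["region"])]
        bucket[0] += r["amount"]
        bucket[1] += 1
    return [
        {"date": d, "region": reg, "total": s, "avg": s / n, "count": n}
        for (d, reg), (s, n) in sorted(acc.items())
    ]


def load(rows: List[Row], out_dir: str, mode: str = "overwrite") -> None:
    """Write one CSV part file per date under out_dir/date=<value>/."""
    if mode not in ("overwrite", "append"):
        raise ValueError(f"unsupported mode: {mode}")
    if mode == "overwrite" and os.path.isdir(out_dir):
        # Replace all previous output, roughly like Spark's default
        # (static) overwrite of a partitioned path.
        shutil.rmtree(out_dir)
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for r in rows:
        partitions[r["date"]].append(r)
    for date, part in partitions.items():
        part_dir = os.path.join(out_dir, f"date={date}")
        os.makedirs(part_dir, exist_ok=True)
        # Append mode adds a new part file next to the existing ones.
        name = f"part-{len(os.listdir(part_dir)):05d}.csv"
        with open(os.path.join(part_dir, name), "w", newline="") as f:
            # The partition column is encoded in the directory name, not the file.
            writer = csv.DictWriter(
                f, fieldnames=["region", "total", "avg", "count"], extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(part)
```

A run is simply load(transform(extract("input.csv")), "out"). Partitioning by date means a reader that only needs one day can open a single directory instead of scanning the whole output, which is the same pruning benefit Spark gets from partitioned writes.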
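The orchestration step can be sketched the same way. In Airflow itself you would wrap the extract, transform, and load functions in tasks, chain them with extract >> transform >> load, set schedule="@daily" on the DAG (schedule_interval in Airflow versions before 2.4), and pass retries and an on_failure_callback through default_args. The hypothetical run_pipeline helper below is not Airflow's API; it is a minimal standard-library stand-in that only illustrates the run-in-order, retry, and alert-on-failure behaviour, and it leaves scheduling out.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A task is a (name, function) pair; the function receives the previous task's result.
Task = Tuple[str, Callable[[Any], Any]]


def run_pipeline(
    tasks: Sequence[Task],
    retries: int = 1,
    on_failure: Optional[Callable[[str, Exception], None]] = None,
) -> Any:
    """Run tasks strictly in order, feeding each one the previous result.

    Each task gets 1 + retries attempts. If the last attempt fails, the
    on_failure hook (a stand-in for an alert) is called and the error is
    re-raised, so downstream tasks never run.
    """
    result: Any = None
    for name, fn in tasks:
        for attempt in range(retries + 1):
            try:
                result = fn(result)
                break  # success: move on to the next task
            except Exception as exc:
                if attempt == retries:
                    if on_failure is not None:
                        on_failure(name, exc)
                    raise
    return result
```

Passing return values directly between tasks keeps the sketch short; in Airflow, small values typically travel between tasks via XCom, while large datasets are written to storage and read back by the next task.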

Related skills

task-master

by sfc-gh-dflippo

AI-powered task management for structured, specification-driven development. Use this skill when you need to manage complex projects with PRDs, break down tasks into subtasks, track dependencies, and maintain organized development workflows across features and branches.

DevOps

14126

3d-games

by davila7

3D game development principles. Rendering, shaders, physics, cameras.

DevOps

1355

senior-computer-vision

by davila7

World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems. Expertise in PyTorch, OpenCV, YOLO, SAM, diffusion models, and vision transformers. Includes 3D vision, video analysis, real-time processing, and production

DevOps

1044

drawio-diagrams-enhanced

by jgtolentino

Create professional draw.io (diagrams.net) diagrams in XML format (.drawio files) with integrated PMP/PMBOK methodologies, extensive visual asset libraries, and industry-standard professional templates. Use this skill when users ask to create flowcharts, swimlane diagrams,

DevOps

918943

grafana-dashboards

by wshobson

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

DevOps

92262

draw-io

by davila7

draw.io diagram creation, editing, and review. Use for .drawio XML editing, PNG conversion, layout adjustment, and AWS icon usage.

DevOps

1693