data-cleaning-pipeline

Name: data-cleaning-pipeline
Author: aj-geddes

Build robust processes for data cleaning, missing-value imputation, outlier handling, and data transformation. Use it for data preprocessing, data quality work, and data pipeline automation.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: aj-geddes
Category: DevOps
Views: 102

How to use

Install the required libraries: pandas, numpy, and scikit-learn (SimpleImputer, KNNImputer, StandardScaler, MinMaxScaler).
Load your raw data with pandas.read_csv() or from another data source.
Identify missing values with df.isnull().sum() and choose a handling strategy: drop rows for critical columns (dropna), impute numeric values with the median (SimpleImputer), apply KNN imputation for related features, or fill categorical columns with the mode.
Handle anomalies and duplicates: identify outliers and duplicate rows, then remove or transform them according to project requirements.
Standardize data types and value ranges: make sure columns have the correct types (numeric, categorical, text), then normalize ranges with StandardScaler or MinMaxScaler.
Validate data cleanliness by checking integrity rules and confirming that the data is ready for analysis or modeling.
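
The steps above can be sketched end to end as follows. This is a minimal illustration, not the skill's own implementation: the function name clean, the sample columns (id, age, city), the in-memory sample data used in place of a CSV file, and the 1.5 x IQR clipping rule for outliers are all assumptions made for the example.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def clean(df, critical, numeric, categorical):
    """Hypothetical helper that walks through the steps on one DataFrame."""
    out = df.copy()

    # Missing values, critical columns: drop rows that lack them.
    out = out.dropna(subset=critical)

    # Duplicates: remove exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)

    # Data types: coerce numeric columns; unparseable entries become NaN.
    for col in numeric:
        out[col] = pd.to_numeric(out[col], errors="coerce").astype(float)

    # Missing values, numeric columns: impute with the median.
    out[numeric] = SimpleImputer(strategy="median").fit_transform(out[numeric])

    # Missing values, categorical columns: fill with the mode.
    for col in categorical:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Outliers: clip to the 1.5 * IQR fences (an assumed rule; the source
    # only says to remove or transform outliers per project requirements).
    for col in numeric:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Value ranges: standardize to zero mean and unit variance.
    out[numeric] = StandardScaler().fit_transform(out[numeric])

    # Validation: simple integrity rules before analysis or modeling.
    if out.isnull().any().any():
        raise ValueError("missing values remain after cleaning")
    return out


# Made-up sample data standing in for pandas.read_csv(...).
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, None, 6],
    "age": [25, 31, 31, None, 40, 22, 250],
    "city": ["Oslo", "Rome", "Rome", None, "Oslo", "Rome", "Oslo"],
})

print(raw.isnull().sum())  # missing values per column before cleaning
cleaned = clean(raw, critical=["id"], numeric=["age"], categorical=["city"])
print(cleaned)
```

KNNImputer or MinMaxScaler could be swapped in for SimpleImputer or StandardScaler at the corresponding steps. Which one fits better depends on how the features relate to each other and on what the downstream model expects.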

Related skills

task-master

by sfc-gh-dflippo

AI-powered task management for structured, specification-driven development. Use this skill when you need to manage complex projects with PRDs, break down tasks into subtasks, track dependencies, and maintain organized development workflows across features and branches.

DevOps

14126

resolve-conflicts

by antinomyhq

Use this skill immediately when the user mentions merge conflicts that need to be resolved. Do not attempt to resolve conflicts directly - invoke this skill first. This skill specializes in providing a structured framework for merging imports, tests, lock files (regeneration),

DevOps

48163

pmbok-project-management

by jgtolentino

Comprehensive PMP/PMBOK project management methodologies and best practices. Use this skill when users need guidance on project management processes, templates, knowledge areas, process groups, tools, techniques, or certification preparation. Covers all 10 PMBOK Knowledge Areas

DevOps

21133

file-organizer

by ComposioHQ

Intelligently organizes your files and folders across your computer by understanding context, finding duplicates, suggesting better structures, and automating cleanup tasks. Reduces cognitive load and keeps your digital workspace tidy without manual effort.

DevOps

1399

aws-solution-architect

by alirezarezvani

Design AWS architectures for startups using serverless patterns and IaC templates. Use when asked to design serverless architecture, create CloudFormation templates, optimize AWS costs, set up CI/CD pipelines, or migrate to AWS. Covers Lambda, API Gateway, DynamoDB, ECS, Aurora,

DevOps

1231

context7

by mikha08-rgb

Search GitHub issues, pull requests, and discussions across any repository. Activates when researching external dependencies (whisper.cpp, NAudio), looking for similar bugs, or finding implementation examples.

DevOps

51166