pytorch-fsdp2

Name: pytorch-fsdp2
Author: Orchestra-Research

Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Data Science

How to use

Make sure your installed PyTorch includes FSDP2 (torch.distributed.fsdp.fully_shard) and that your project declares torch as a dependency.
Identify the modules in your training script that you want to shard. These are usually the model's large layers, whose combined size exceeds the memory of a single GPU.
Initialize the distributed training environment with torch.distributed, setting the communication backend (e.g. nccl for GPUs) and the process ranks.
Apply torch.distributed.fsdp.fully_shard to the selected modules, configuring the sharding, mixed precision, and memory offloading options to match the available GPU resources.
Set up Distributed Checkpoint (torch.distributed.checkpoint) in place of the standard save/load so that state sharded across multiple processes is saved and loaded correctly.
Run the training script on multiple GPUs/nodes. FSDP2 all-gathers parameters as needed and reduce-scatters gradients during the backward pass, so each rank's optimizer step updates only its own parameter shards.
