distributed-llm-pretraining-torchtitan

Name: distributed-llm-pretraining-torchtitan
Author: davila7

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Data Science

About this skill

How to use

Install TorchTitan with pip install torchtitan, or by cloning the torchtitan repository from PyTorch and installing the dependencies from requirements.txt. PyTorch 2.6.0 or later is required.
Download the tokenizer for the model you want to train. Go to https://huggingface.co/settings/tokens, generate an access token, then run python scripts/download_hf_assets.py --repo_id meta-llama/Llama-3.1-8B --assets tokenizer --hf_token=YOUR_TOKEN (replace YOUR_TOKEN with your HuggingFace token).
Prepare a training configuration file in TOML format. You can use the existing template at ./torchtitan/models/llama3/train_configs/llama3_8b.toml or write your own, defining parameters such as the output folder, model size, and parallelism settings.
Launch training on the available GPUs by running ./run_train.sh with the configuration file specified. On a single node with 8 GPUs, use CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh.
Monitor training progress and manage checkpoints. TorchTitan automatically saves checkpoints under the dump_folder defined in the configuration, so an interrupted run can be resumed.
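
The install, tokenizer download, and launch commands above can be collected into one script. The following is a minimal sketch, not an official torchtitan script: the run helper and the DRY_RUN and HF_TOKEN variables are hypothetical names introduced here for illustration, and the script assumes it is started from the root of a torchtitan checkout so that scripts/download_hf_assets.py and ./run_train.sh resolve. With DRY_RUN=1 (the default in this sketch) it only prints each command, so the sequence can be reviewed before anything is installed or launched.

```shell
#!/bin/sh
# Sketch: the install, tokenizer, and launch steps in order.
# Assumption: run from the root of a torchtitan checkout.
set -eu

# Hypothetical knobs for this sketch (not torchtitan options).
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only, 0 = execute them
HF_TOKEN="${HF_TOKEN:-YOUR_TOKEN}"   # placeholder; supply your own HuggingFace token

# Config path taken from the steps above; override it to use your own TOML file.
CONFIG_FILE="${CONFIG_FILE:-./torchtitan/models/llama3/train_configs/llama3_8b.toml}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install TorchTitan (requires PyTorch 2.6.0 or later).
run pip install torchtitan

# Download the tokenizer for the model to be trained.
run python scripts/download_hf_assets.py \
  --repo_id meta-llama/Llama-3.1-8B \
  --assets tokenizer \
  --hf_token="$HF_TOKEN"

# Launch training with the chosen configuration file.
run env CONFIG_FILE="$CONFIG_FILE" ./run_train.sh
```

Setting DRY_RUN=0 and exporting a real HF_TOKEN before running the script executes the same three commands instead of printing them. Passing the token through an environment variable keeps it out of the script file itself.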
