openrlhf-training

Name: openrlhf-training
Author: davila7

High-performance RLHF framework accelerated by Ray and vLLM. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B to 70B+). Built on Ray, vLLM, and DeepSpeed ZeRO-3. Reported to be 2× faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

How to use

Prepare a Docker environment with NVIDIA support by running the PyTorch 25.02 image with GPU access. Mount your working directory as a volume so the training files are accessible inside the container.
Install OpenRLHF and its dependencies (Ray, vLLM, PyTorch, Transformers, DeepSpeed) with pip install openrlhf[vllm]. Before installing, uninstall the conflicting packages: xgboost, transformer_engine, flash_attn, and pynvml.
Start a Ray cluster on the head machine, specifying the number of available GPUs (for example, 8), with ray start --head --node-ip-address 0.0.0.0 --num-gpus 8.
Configure PPO training by defining the number of nodes and GPUs for each component (reference model, reward model, critic, actor) and the vLLM parameters, such as the number of inference engines and the batch size.
Submit the training job to the Ray cluster with ray job submit, passing the path to the train_ppo_ray script, the pretrained model (for example, Llama-3-8b-sft-mixture), and the optimization parameters (learning rate, number of epochs, and max_len for prompts and generation).
Monitor training progress and save the trained model to the specified output directory (for example, ./output/llama3-8b-rlhf).
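
The cluster start, configuration, and submission steps above can be sketched as a small Python helper that assembles the two command lines without running them. The ray start arguments are the ones quoted in the steps. Everything else is an assumption that this guide does not state and that may differ between OpenRLHF versions: the module path openrlhf.cli.train_ppo_ray, the OpenRLHF/ prefix on the model name, the dashboard address, every --flag name, and all numeric values. Check them against python3 -m openrlhf.cli.train_ppo_ray --help before use.

```python
import shlex
from typing import Dict, List, Tuple


def ray_start_cmd(num_gpus: int) -> List[str]:
    """Head-node command as quoted in the steps above."""
    return ["ray", "start", "--head",
            "--node-ip-address", "0.0.0.0",
            "--num-gpus", str(num_gpus)]


def ppo_job_cmd(roles: Dict[str, Tuple[int, int]],
                pretrain: str,
                save_path: str,
                options: Dict[str, str]) -> List[str]:
    """Assemble a `ray job submit` argv for a PPO run.

    ASSUMPTION: the module path and all `--flag` names are taken from
    upstream OpenRLHF examples, not from this guide; verify with --help.
    """
    cmd = ["ray", "job", "submit",
           "--address", "http://127.0.0.1:8265",  # assumed local Ray dashboard
           "--", "python3", "-m", "openrlhf.cli.train_ppo_ray"]
    # One (nodes, GPUs per node) pair for each PPO component.
    for role, (nodes, gpus_per_node) in roles.items():
        cmd += [f"--{role}_num_nodes", str(nodes),
                f"--{role}_num_gpus_per_node", str(gpus_per_node)]
    cmd += ["--pretrain", pretrain, "--save_path", save_path]
    # Remaining vLLM and optimization settings, passed through as-is.
    for flag, value in options.items():
        cmd += [f"--{flag}", value]
    return cmd


# Illustrative allocation: 2 GPUs on 1 node for each component.
roles = {"ref": (1, 2), "reward": (1, 2), "critic": (1, 2), "actor": (1, 2)}

# Illustrative values only; none of these numbers come from this guide.
options = {
    "vllm_num_engines": "2",
    "rollout_batch_size": "1024",
    "actor_learning_rate": "5e-7",
    "max_epochs": "1",
    "prompt_max_len": "1024",
    "generate_max_len": "1024",
}

job = ppo_job_cmd(roles,
                  pretrain="OpenRLHF/Llama-3-8b-sft-mixture",  # org prefix assumed
                  save_path="./output/llama3-8b-rlhf",
                  options=options)

# Print shell-ready command lines for review before running them by hand.
print(shlex.join(ray_start_cmd(8)))
print(shlex.join(job))
```

Building the argv as a list keeps each flag and its value paired and lets shlex.join handle quoting. A real PPO run would likely need further arguments, such as a reward model and a prompt dataset, which this guide does not specify.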
