verl-rl-training

by davila7

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: davila7
Category: Security

How to use

  1. Install verl with pip, choosing a backend: pip install verl[vllm] for vLLM or pip install verl[sglang] for SGLang. Alternatively, for production deployments, use the Docker image: docker pull verlai/verl:vllm011.latest.

  2. Prepare your base model (e.g. Qwen-3, Llama-3.1, DeepSeek, Gemma-2) and training data containing prompts and the responses to be scored.

  3. Choose the reinforcement learning algorithm that fits your goal: PPO for classic RLHF, GRPO for faster training, RLOO or REINFORCE++ for other variants, DAPO or SPIN for specialized use cases.

  4. Configure the training backend (FSDP for distributed training, Megatron-LM for large models) and the rollout engine (vLLM or SGLang, which generate responses during training).

  5. Launch training, tuning parameters such as the number of steps, batch size, and learning rate. verl supports sequence parallelism and expert parallelism for models above 100B parameters.

  6. Monitor training progress and validate the model on benchmarks. If you need multi-turn interactions with tools, enable agentic workflow support in the rollout configuration.
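
Step 3 lists GRPO as the faster option. The usual reason is that GRPO needs no learned value model: it samples several responses per prompt and scores each one relative to the others in its group. The sketch below illustrates that idea in plain Python. It is not verl's implementation; the function name, the epsilon value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of responses sampled for one prompt.

    Hypothetical helper: each reward is centered on the group mean and
    scaled by the group standard deviation, so no critic is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; libraries may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored 1.0 (correct) or 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average receive a positive advantage and the rest a negative one. When every response in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.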

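Steps 4 and 5 can be sketched as a small helper that assembles a launch command. verl's example scripts start training through the verl.trainer.main_ppo module with Hydra-style key=value overrides; the override names below follow those examples as best understood here and may differ between versions, so check them against the installed release. The helper name, the default values, the model ID, and the file paths are hypothetical placeholders, and training-backend selection (FSDP or Megatron-LM) is left out.

```python
import shlex


def build_verl_command(model_path, train_file, val_file, *, algorithm="grpo",
                       rollout_engine="vllm", batch_size=256, lr=1e-6,
                       n_gpus=8, nnodes=1, epochs=1):
    """Assemble a verl launch command as an argument list (hypothetical helper)."""
    if rollout_engine not in ("vllm", "sglang"):
        raise ValueError(f"unsupported rollout engine: {rollout_engine}")
    # Assumed override names; verify against the installed verl version.
    overrides = {
        "algorithm.adv_estimator": algorithm,
        "data.train_files": train_file,
        "data.val_files": val_file,
        "data.train_batch_size": batch_size,
        "actor_rollout_ref.model.path": model_path,
        "actor_rollout_ref.actor.optim.lr": lr,
        "actor_rollout_ref.rollout.name": rollout_engine,
        "trainer.n_gpus_per_node": n_gpus,
        "trainer.nnodes": nnodes,
        "trainer.total_epochs": epochs,
    }
    return ["python3", "-m", "verl.trainer.main_ppo"] + [
        f"{key}={value}" for key, value in overrides.items()
    ]


# Placeholder model ID and data paths; substitute your own.
cmd = build_verl_command(
    "Qwen/Qwen2.5-0.5B-Instruct",
    "/data/gsm8k/train.parquet",
    "/data/gsm8k/test.parquet",
)
print(shlex.join(cmd))
```

Printing the command instead of executing it lets you review every override before starting a run. Passing the list to subprocess.run would launch the job.
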
Related skills

backend-security-coder

by sickn33

Expert in secure backend coding practices specializing in input validation, authentication, and API security. Use PROACTIVELY for backend security implementations or security code reviews.

Security
1133

software-security

by project-codeguard

A software security skill that integrates with Project CodeGuard to help AI coding agents write secure code and prevent common vulnerabilities. Use this skill when writing, reviewing, or modifying code to ensure secure-by-default practices are followed.

Security
1678

obsidian

by gapmiss

Comprehensive guidelines for Obsidian.md plugin development including all 27 ESLint rules, TypeScript best practices, memory management, API usage (requestUrl vs fetch), UI/UX standards, and submission requirements. Use when working with Obsidian plugins, main.ts files,

Security
14111

openapi-spec-generation

by wshobson

Generate and maintain OpenAPI 3.1 specifications from code, design-first specs, and validation patterns. Use when creating API documentation, generating SDKs, or ensuring API contract compliance.

Security
18109

accessibility-compliance

by wshobson

Implement WCAG 2.2 compliant interfaces with mobile accessibility, inclusive design patterns, and assistive technology support. Use when auditing accessibility, implementing ARIA patterns, building for screen readers, or ensuring inclusive user experiences.

Security
2173

content-creator

by alirezarezvani

Create SEO-optimized marketing content with consistent brand voice. Includes brand voice analyzer, SEO optimizer, content frameworks, and social media templates. Use when writing blog posts, creating social media content, analyzing brand voice, optimizing SEO, planning content

Security
25124