slime-rl-training

Name: slime-rl-training
Author: davila7

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

About this skill

How to use

Install the required dependencies: sglang-router version 0.2.3 or higher, ray, torch version 2.0.0 or higher, and transformers version 4.40.0 or higher. Make sure you have access to the davila7 repository on GitHub.
Clone or download the skill from the post-training-slime folder of the claude-code-templates repository. Place it in the directory structure that follows the ai-research skills convention.
Prepare your training data and configure the data buffer. Slime offers flexible prompt management and sample storage; define a custom data generation workflow that fits your model's needs.
Configure the training parameters for the chosen model (GLM-4.x, Qwen3, DeepSeek V3, or Llama 3). Specify the type of parallelism: tensor parallelism (TP), pipeline parallelism (PP), data parallelism (DP), or sequence parallelism (SP).
Run training with Megatron-LM, using the SGLang integration to generate rollouts. Slime automatically coordinates training with high-throughput generation through the SGLang router.
Monitor the training run and adjust the data buffer parameters as needed. Once training finishes, the model is ready for evaluation and deployment.
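
Two of the steps above lend themselves to a quick pre-flight check before launching a run: the dependency minimums and the choice of parallelism sizes. The following is only a minimal sketch, not part of slime itself. The helper names (`check_dependencies`, `data_parallel_size`) are hypothetical, the PyPI distribution names are assumed to match the package names listed above, and the data-parallel arithmetic assumes the common Megatron-style layout where the GPU count equals TP × PP × DP with no context or expert parallelism.

```python
import re
from importlib import metadata

# Minimum versions as listed in the steps above; None means no minimum was given.
# Assumption: the PyPI distribution names match the names used in the text.
REQUIRED = {
    "sglang-router": "0.2.3",
    "ray": None,
    "torch": "2.0.0",
    "transformers": "4.40.0",
}


def parse_version(text):
    """Turn '2.3.0+cu121' into (2, 3, 0), keeping only the leading numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if installed >= minimum, comparing numeric components only."""
    if minimum is None:
        return True
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '2.1' compares like '2.1.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(required=REQUIRED):
    """Return {package: problem} for every missing or too-old package."""
    problems = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


def data_parallel_size(num_gpus, tensor_parallel, pipeline_parallel):
    """Hypothetical helper: derive DP assuming num_gpus == TP * PP * DP."""
    model_parallel = tensor_parallel * pipeline_parallel
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs cannot be split evenly into TP={tensor_parallel} "
            f"x PP={pipeline_parallel} groups"
        )
    return num_gpus // model_parallel


if __name__ == "__main__":
    print(check_dependencies() or "all listed dependencies satisfied")
    print("DP size for 16 GPUs, TP=4, PP=2:", data_parallel_size(16, 4, 2))
```

The version comparison deliberately ignores local and pre-release suffixes, so a release candidate is treated like the final release; a stricter check would need a full version parser.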
