fine-tuning-with-trl

by davila7

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: davila7
Category: Security
Views: 1

About this skill

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use it when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

How to use

  1. Install the required packages: pip install trl transformers datasets peft accelerate.
  2. Prepare your training data: SFT needs prompt-completion pairs, DPO needs chosen/rejected pairs.
  3. Load a base model, e.g. Qwen/Qwen2.5-0.5B, using AutoModelForCausalLM from the transformers library.
  4. For supervised fine-tuning (SFT), create an SFTTrainer, pass it the model and the dataset, and run trainer.train().
  5. To align the model with preferences, use DPOTrainer with a DPOConfig, supply a preference dataset of chosen/rejected pairs, and train.
  6. After fine-tuning, evaluate the model on test data to check how well it aligns with human preferences.
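
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own code: the helper names (sft_record, dpo_record, train_sft, train_dpo) and the output directories are hypothetical, and the trainer arguments assume a recent TRL release in which the tokenizer is passed as processing_class (older releases used tokenizer instead). The heavy imports sit inside the training functions, so the data helpers can be tried without TRL installed.

```python
from typing import Dict, List

BASE_MODEL = "Qwen/Qwen2.5-0.5B"  # example base model named in step 3


def sft_record(prompt: str, completion: str) -> Dict[str, str]:
    """Build one prompt-completion pair for SFT (step 2)."""
    return {"prompt": prompt, "completion": completion}


def dpo_record(prompt: str, chosen: str, rejected: str) -> Dict[str, str]:
    """Build one chosen/rejected preference pair for DPO (step 2)."""
    if chosen == rejected:
        raise ValueError("chosen and rejected responses must differ")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def train_sft(records: List[Dict[str, str]], output_dir: str = "sft-out"):
    """Supervised fine-tuning (steps 3 and 4). Downloads the base model."""
    from datasets import Dataset
    from transformers import AutoModelForCausalLM
    from trl import SFTConfig, SFTTrainer

    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    trainer = SFTTrainer(
        model=model,
        args=SFTConfig(output_dir=output_dir),
        train_dataset=Dataset.from_list(records),
    )
    trainer.train()
    return trainer


def train_dpo(records: List[Dict[str, str]], output_dir: str = "dpo-out"):
    """Preference alignment with DPO (step 5). Downloads the base model."""
    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir=output_dir),
        train_dataset=Dataset.from_list(records),
        processing_class=tokenizer,  # assumption: older TRL uses `tokenizer=`
    )
    trainer.train()
    return trainer


# Tiny illustrative datasets; real training needs far more examples.
sft_data = [sft_record("What is 2 + 2?", "2 + 2 equals 4.")]
dpo_data = [dpo_record("What is 2 + 2?", "2 + 2 equals 4.", "It is 5.")]
```

Calling train_sft(sft_data) or train_dpo(dpo_data) would start an actual training run, so it needs the packages from step 1, network access to fetch the model, and enough memory for it. Evaluation on held-out data (step 6) is left out of this sketch.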
