simpo-training

Name: simpo-training
Author: davila7

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (up to +6.4 points on AlpacaEval 2.0). It needs no reference model, so it is more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.
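The description above does not spell out the objective. This is a minimal, dependency-free sketch that follows the formulation in the SimPO paper. Each response is scored by its length-normalized log-likelihood (mean per-token log-probability under the policy) scaled by beta. The loss then asks the chosen response to beat the rejected one by a target margin gamma, written here as gamma_beta_ratio × beta. No reference-model log-probabilities appear anywhere, which is what makes the method reference-free. The sketch assumes per-token log-probabilities are already available. The function names and default values are hypothetical and are not the SimPO repository's API; a batched implementation would compute the same quantity over tensors.

```python
import math
from typing import Sequence


def avg_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: mean of the per-token log-probs."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for large positive or negative x."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(chosen: Sequence[float], rejected: Sequence[float],
               beta: float = 2.0, gamma_beta_ratio: float = 0.5) -> float:
    """SimPO loss for one preference pair (hypothetical helper).

    reward(y) = beta * avg_logprob(y)
    gamma     = gamma_beta_ratio * beta          (target reward margin)
    loss      = -log sigmoid(reward(chosen) - reward(rejected) - gamma)
    Only the policy's own log-probs are used; there is no reference model.
    """
    gamma = gamma_beta_ratio * beta
    logits = beta * (avg_logprob(chosen) - avg_logprob(rejected)) - gamma
    return -log_sigmoid(logits)


if __name__ == "__main__":
    print(simpo_loss([-0.5, -0.7, -0.6], [-1.2, -1.0]))
```

In the example call, the chosen mean is -0.6 and the rejected mean is -1.1. With beta = 2.0 the reward gap is 2.0 × 0.5 = 1.0, which exactly equals gamma = 1.0. The logit is therefore 0 and the loss is log 2 ≈ 0.693. A wider gap pushes the loss toward 0.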

Installation

Pick a client and clone the repository into its skills directory.


Quick info

Category: Security


About this skill

How to use

Prepare the environment: create a new Conda environment with Python 3.10, activate it, then install PyTorch 2.2.2 following pytorch.org/get-started/locally/ and selecting your hardware configuration.
Clone the alignment-handbook repository from GitHub, change into its directory, and install the package with pip install.
Install Flash Attention 2, which speeds up training: run pip install flash-attn --no-build-isolation.
Prepare a training configuration file (e.g. mistral-7b-base-simpo.yaml). It specifies the base model (e.g. Mistral 7B), the dataset (e.g. HuggingFaceH4/ultrafeedback_binarized), and the SimPO hyperparameters, such as beta (2.0–10.0, scales the rewards) and gamma_beta_ratio (0–1, sets the target margin).
Launch training with accelerate launch, using the deepspeed_zero3.yaml configuration file and the run_simpo.py script, and pass the path to your training configuration file.
Monitor training progress through the accelerate logs. The model is optimized on the preferences in the dataset without a separate reference model.
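The first three setup steps can be verified from inside the new environment with a short check. This is a hedged sketch. It assumes the packages are installed under the distribution names torch, flash-attn and alignment-handbook; the last name is an assumption about what the cloned repository installs as. It only reports what it finds and does not install anything.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Distribution name -> required version (None means "any version").
# "alignment-handbook" is an assumed distribution name for the cloned repo.
REQUIRED = {
    "torch": "2.2.2",
    "flash-attn": None,
    "alignment-handbook": None,
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_environment() -> Dict[str, bool]:
    """Map each requirement to True/False without raising."""
    report = {"python==3.10": sys.version_info[:2] == (3, 10)}
    for dist, pinned in REQUIRED.items():
        found = installed_version(dist)
        if pinned is None:
            report[dist] = found is not None
        else:
            # PyTorch wheels may carry a local suffix such as "2.2.2+cu121".
            report[dist] = found is not None and found.split("+")[0] == pinned
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok' if ok else 'MISSING':8}{name}")
```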
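The configuration step above can be made concrete with a small helper. It checks beta and gamma_beta_ratio against the ranges quoted there, derives the absolute margin gamma = gamma_beta_ratio × beta, and renders a minimal YAML config. This is a sketch under assumptions. The key names (model_name_or_path, dataset_mixer, beta, gamma_beta_ratio) are assumed to follow alignment-handbook-style recipes, and the model ID mistralai/Mistral-7B-v0.1 is only an example. A real recipe file needs many more training arguments, such as learning rate, batch size, sequence lengths and output directory.

```python
from dataclasses import dataclass


@dataclass
class SimPOHyperparams:
    beta: float = 2.0              # reward scale; suggested range 2.0-10.0
    gamma_beta_ratio: float = 0.5  # gamma / beta; suggested range 0-1

    def validate(self) -> None:
        """Reject values outside the ranges suggested in the steps above."""
        if not 2.0 <= self.beta <= 10.0:
            raise ValueError(f"beta={self.beta} is outside 2.0-10.0")
        if not 0.0 <= self.gamma_beta_ratio <= 1.0:
            raise ValueError(
                f"gamma_beta_ratio={self.gamma_beta_ratio} is outside 0-1")

    @property
    def gamma(self) -> float:
        """Absolute target reward margin implied by the ratio."""
        return self.gamma_beta_ratio * self.beta


def render_config(model: str, dataset: str, hp: SimPOHyperparams) -> str:
    """Render a minimal YAML config (key names are assumptions)."""
    hp.validate()
    lines = [
        f"model_name_or_path: {model}",
        "dataset_mixer:",
        f"  {dataset}: 1.0",
        f"beta: {hp.beta}",
        f"gamma_beta_ratio: {hp.gamma_beta_ratio}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    hp = SimPOHyperparams(beta=2.5, gamma_beta_ratio=0.55)
    print(f"gamma = {hp.gamma}")
    print(render_config("mistralai/Mistral-7B-v0.1",
                        "HuggingFaceH4/ultrafeedback_binarized", hp))
```

Expressing the margin as a ratio keeps gamma proportional to beta, so changing the reward scale does not silently change how demanding the margin is.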
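The launch and monitoring steps above can also be scripted from Python. This sketch assembles the accelerate launch command and only runs it when run_training is called. The paths accelerate_configs/deepspeed_zero3.yaml, scripts/run_simpo.py and training_configs/mistral-7b-base-simpo.yaml are assumptions about the repository layout, so adjust them to your checkout. Setting ACCELERATE_LOG_LEVEL=info is intended to make accelerate's logger emit info-level progress messages for monitoring.

```python
import os
import shlex
import subprocess
from typing import Dict, List

# Assumed repository-relative paths; adjust to your checkout.
ACCELERATE_CONFIG = "accelerate_configs/deepspeed_zero3.yaml"
TRAIN_SCRIPT = "scripts/run_simpo.py"


def simpo_launch_command(training_config: str,
                         accelerate_config: str = ACCELERATE_CONFIG,
                         script: str = TRAIN_SCRIPT) -> List[str]:
    """Build the argv for `accelerate launch` without running it."""
    return ["accelerate", "launch", "--config_file", accelerate_config,
            script, training_config]


def launch_env(log_level: str = "info") -> Dict[str, str]:
    """Copy the current environment and set accelerate's log level."""
    env = dict(os.environ)
    env["ACCELERATE_LOG_LEVEL"] = log_level
    return env


def run_training(training_config: str) -> None:
    """Start training; requires accelerate, DeepSpeed and GPUs."""
    subprocess.run(simpo_launch_command(training_config),
                   env=launch_env(), check=True)


if __name__ == "__main__":
    # Print the command for inspection; call run_training(...) to execute it.
    print(shlex.join(
        simpo_launch_command("training_configs/mistral-7b-base-simpo.yaml")))
```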

Related skills

ui-audit

by openclaw

AI skill for automated UI audits. Evaluate interfaces against proven UX principles for visual hierarchy, accessibility, cognitive load, navigation, and more. Based on Making UX Decisions by Tommy Geoco.

Security

1223

youtube-watcher

by openclaw

Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.

Security

2231

brand-voice

by anthropics

Apply and enforce brand voice, style guide, and messaging pillars across content. Use when reviewing content for brand consistency, documenting a brand voice, adapting tone for different audiences, or checking terminology and style guide compliance.

Security

48158

reviewing-code

by CaptainCrouton89

Systematically evaluate code changes for security, correctness, performance, and spec alignment. Use when reviewing PRs, assessing code quality, or verifying implementation against requirements.

Security

1493

software-security

by project-codeguard

A software security skill that integrates with Project CodeGuard to help AI coding agents write secure code and prevent common vulnerabilities. Use this skill when writing, reviewing, or modifying code to ensure secure-by-default practices are followed.

Security

1678

backend-security-coder

by sickn33

Expert in secure backend coding practices specializing in input validation, authentication, and API security. Use PROACTIVELY for backend security implementations or security code reviews.

Security

1133