constitutional-ai

Name: constitutional-ai
Author: davila7

Anthropic's method for training harmless AI through self-improvement. It is a two-phase approach: supervised learning with self-critique and revision, followed by RLAIF (reinforcement learning from AI feedback). Use it for safety alignment and for reducing harmful outputs without human harmfulness labels. Anthropic uses it in training Claude.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

How to use

1. Install the required libraries: transformers, torch, and trl. You can do this with pip install transformers torch trl.
2. Prepare a set of rules (a constitution): a list of principles describing the desired model behavior. Examples: "Choose responses that are helpful, honest, and safe", "Avoid toxic, racist, or sexist content", "Explain your reservations instead of refusing".
3. In the supervised learning phase, generate the model's initial responses to test prompts using the text-generation pipeline from the transformers library.
4. Run the self-critique step: pass each generated response, together with its prompt and the constitution, back to the model so that it can assess whether the response complies with the principles. The model should point out problems and suggest improvements.
5. Let the model revise its responses based on its own critique from step 4. The revised responses become the fine-tuning data, which makes this the key part of the supervised learning phase.
6. In the reinforcement learning phase (RLAIF), use feedback from the model, rather than human ratings, to optimize its parameters. The trl library provides tools for this process.
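
The critique, revision, and AI-feedback steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's or Anthropic's actual implementation: the prompt wording, the `Generate` callable interface, the function names `critique_and_revise`, `ai_preference`, and `hf_generate`, and the `gpt2` placeholder model are all assumptions made for the example. Only the principles come from the list above.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interface: any function that maps a prompt to a completion.
Generate = Callable[[str], str]

# Example principles taken from the list above.
CONSTITUTION: Sequence[str] = (
    "Choose responses that are helpful, honest, and safe.",
    "Avoid toxic, racist, or sexist content.",
    "Explain your reservations instead of refusing.",
)


def _rules(principles: Sequence[str]) -> str:
    """Render the constitution as a bulleted block for use inside prompts."""
    return "\n".join(f"- {p}" for p in principles)


def critique_and_revise(
    question: str, generate: Generate, principles: Sequence[str] = CONSTITUTION
) -> Dict[str, str]:
    """Supervised phase: initial answer, then self-critique, then revision."""
    initial = generate(f"Question: {question}\nAnswer:")
    critique = generate(
        f"Principles:\n{_rules(principles)}\n\n"
        f"Question: {question}\nAnswer: {initial}\n\n"
        "List any ways the answer violates the principles and suggest fixes.\n"
        "Critique:"
    )
    revised = generate(
        f"Question: {question}\nAnswer: {initial}\nCritique: {critique}\n\n"
        "Rewrite the answer so that it addresses the critique.\n"
        "Revised answer:"
    )
    return {"prompt": question, "initial": initial,
            "critique": critique, "revised": revised}


def ai_preference(
    question: str, answer_a: str, answer_b: str,
    generate: Generate, principles: Sequence[str] = CONSTITUTION,
) -> Optional[Dict[str, str]]:
    """RLAIF labeling: ask the model which answer follows the principles better."""
    verdict = generate(
        f"Principles:\n{_rules(principles)}\n\n"
        f"Question: {question}\n(A) {answer_a}\n(B) {answer_b}\n\n"
        "Which answer follows the principles better? Reply with A or B.\n"
        "Verdict:"
    ).strip().upper()
    if verdict.startswith("A"):
        return {"prompt": question, "chosen": answer_a, "rejected": answer_b}
    if verdict.startswith("B"):
        return {"prompt": question, "chosen": answer_b, "rejected": answer_a}
    return None  # verdict could not be parsed, so skip this pair


def hf_generate(model_name: str = "gpt2") -> Generate:
    """Wrap a transformers text-generation pipeline ("gpt2" is a placeholder)."""
    from transformers import pipeline  # lazy import: only needed for real models

    pipe = pipeline("text-generation", model=model_name)

    def _generate(prompt: str) -> str:
        out = pipe(prompt, max_new_tokens=128, return_full_text=False)
        return out[0]["generated_text"].strip()

    return _generate
```

With a real model, `critique_and_revise(question, hf_generate())` would yield one record per prompt; the `prompt` and `revised` fields could serve as supervised fine-tuning data. The dictionaries returned by `ai_preference` use `prompt`, `chosen`, and `rejected` keys, which is believed to match the preference-pair layout that trl's preference-based trainers accept, but check the trl documentation for the version you install before relying on it.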

Related skills

software-security

by project-codeguard

A software security skill that integrates with Project CodeGuard to help AI coding agents write secure code and prevent common vulnerabilities. Use this skill when writing, reviewing, or modifying code to ensure secure-by-default practices are followed.

Security

1678

better-auth-best-practices

by novuhq

Skill for integrating Better Auth - the comprehensive TypeScript authentication framework.

Security

1148

senior-security

by davila7

Comprehensive security engineering skill for application security, penetration testing, security architecture, and compliance auditing. Includes security assessment tools, threat modeling, crypto implementation, and security automation. Use when designing security architecture,

Security

2482

python-expert

by Shubhamsaboo

Senior Python developer expertise for writing clean, efficient, and well-documented code. Use when: writing Python code, optimizing Python scripts, reviewing Python code for best practices, debugging Python issues, implementing type hints, or when user mentions Python, PEP 8,

Security

2777

payload

by payloadcms

Use when working with Payload CMS projects (payload.config.ts, collections, fields, hooks, access control, Payload API). Use when debugging validation errors, security issues, relationship queries, transactions, or hook behavior.

Security

50171

1password

by openclaw

Set up and use 1Password CLI (op). Use when installing the CLI, enabling desktop app integration, signing in (single or multi-account), or reading/injecting/running secrets via op.

Security

1174