sparse-autoencoder-training

Name: sparse-autoencoder-training
Author: davila7

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

How to use

Install the required dependencies: SAELens (version 6.0.0+), TransformerLens (2.0.0+), and PyTorch (2.0.0+). The skill needs these libraries to run.
Load the language model whose activations you want to analyze. SAELens works with models supported by TransformerLens, such as popular open models.
Prepare the training data: choose texts representative of the behaviors you want to study. The SAE learns to decompose activations from this data.
Configure and train the sparse autoencoder, setting parameters such as the number of features, the sparsity coefficient, and the learning rate. Training decomposes dense activations into sparse, interpretable components.
Analyze the discovered features: examine which learned features activate on specific concepts, how superposition affects the representations, and which safety-relevant patterns the model has learned.
Optionally, perform feature steering or ablation: use the discovered features to modify the model's behavior or to test causal effects on its output.
