moe-training

Name: moe-training
Author: davila7

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

About this skill

Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.

How to use

Install DeepSpeed with MoE support: pip install deepspeed==0.6.0. Optionally, clone Megatron-DeepSpeed from Microsoft's repository for large-scale training, or use HuggingFace Transformers with accelerate: pip install transformers accelerate.
Define the MoE layer in your model by creating a MoELayer class with these parameters: hidden size (hidden_size), number of experts (num_experts, default 8), and top_k (how many experts to activate at once, typically 2). Each expert is an independent neural network that specializes in different patterns.
Configure the router, which decides which experts to activate for each input token. The router learns which experts suit a given input best, and because only the selected experts run, the number of active parameters per token drops.
Add a load-balancing mechanism so that all experts are used evenly during training; this avoids the situation where one expert receives too many examples.
Train the model with DeepSpeed or the HuggingFace Trainer, passing in the MoE configuration. Monitor how many parameters are active at each step: in Mixtral 8x7B, only about 13 billion of the 47 billion parameters are active.
After training, optimize inference by enabling sparse activation; the model will be faster and less resource-intensive than a dense network of the same size.
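
The routing step can be illustrated with a small, dependency-free sketch. It is not the DeepSpeed or HuggingFace implementation; the helper names (softmax, route_token, moe_forward) and the toy experts are hypothetical, and the renormalization of the selected weights is one common choice rather than something this skill prescribes.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits: Sequence[float], top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts of one token.

    Assumption: weights are the softmax probabilities of the chosen
    experts, renormalized so that they sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(
    token: Sequence[float],
    router_logits: Sequence[float],
    experts: Sequence[Callable[[Sequence[float]], List[float]]],
    top_k: int = 2,
) -> List[float]:
    """Run only the selected experts and sum their outputs, weighted by the router."""
    output = [0.0] * len(token)
    for idx, weight in route_token(router_logits, top_k):
        expert_out = experts[idx](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


# Hypothetical toy experts: expert i scales the token by (i + 1).
# In a real model each expert is a small feed-forward network.
experts = [lambda t, s=i + 1: [s * x for x in t] for i in range(8)]

logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 1.0]
selected = route_token(logits, top_k=2)
mixed = moe_forward([1.0, 2.0], logits, experts, top_k=2)
```

With num_experts = 8 and top_k = 2, only two of the eight experts are evaluated for this token, which is where the reduction in active parameters comes from.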
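
The steps above do not say which load-balancing mechanism to use. One widely used option is the auxiliary loss from the Switch Transformer, which multiplies the fraction of tokens sent to each expert by the mean router probability for that expert. The sketch below assumes top-1 dispatch for simplicity, and the function name load_balancing_loss is hypothetical.

```python
from typing import List, Sequence


def load_balancing_loss(router_probs: Sequence[Sequence[float]], assignments: Sequence[int]) -> float:
    """Switch-Transformer-style auxiliary loss (assumed formulation, top-1 dispatch).

    router_probs[t][e] is the router probability of expert e for token t.
    assignments[t] is the index of the expert that token t was sent to.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    # f[e]: fraction of tokens dispatched to expert e.
    f: List[float] = [
        sum(1 for a in assignments if a == e) / num_tokens for e in range(num_experts)
    ]
    # p[e]: mean router probability given to expert e across tokens.
    p: List[float] = [
        sum(row[e] for row in router_probs) / num_tokens for e in range(num_experts)
    ]
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))


# Perfectly balanced routing: each of 4 experts gets one of 4 tokens.
balanced = load_balancing_loss([[0.25] * 4 for _ in range(4)], [0, 1, 2, 3])

# Collapsed routing: every token goes to expert 0 with probability 1.
collapsed = load_balancing_loss([[1.0, 0.0, 0.0, 0.0] for _ in range(4)], [0, 0, 0, 0])
```

The loss is smallest when tokens are spread evenly and grows as routing collapses onto a few experts. In practice it is typically scaled by a small coefficient and added to the main training loss.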
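
The Mixtral 8x7B figures quoted in the steps (about 13 billion active out of 47 billion total) can be checked with simple arithmetic. The sketch below assumes a simplified model in which parameters split cleanly into a shared part (attention, embeddings) and identical per-expert parts; the function names and that split are illustrative assumptions, not published numbers.

```python
from typing import Tuple


def moe_param_counts(shared: float, per_expert: float, num_experts: int, top_k: int) -> Tuple[float, float]:
    """Return (total, active) parameter counts under the simplified split."""
    total = shared + num_experts * per_expert
    active = shared + top_k * per_expert
    return total, active


def solve_split(total: float, active: float, num_experts: int, top_k: int) -> Tuple[float, float]:
    """Invert moe_param_counts: recover (shared, per_expert) from total and active."""
    per_expert = (total - active) / (num_experts - top_k)
    shared = active - top_k * per_expert
    return shared, per_expert


# Figures from the text, in billions of parameters: 47 total, 13 active,
# with 8 experts and top_k = 2.
shared_b, per_expert_b = solve_split(total=47.0, active=13.0, num_experts=8, top_k=2)
total_b, active_b = moe_param_counts(shared_b, per_expert_b, num_experts=8, top_k=2)
active_fraction = active_b / total_b
```

Under these assumptions, each token touches a little over a quarter of the model's parameters, even though all of them must still be stored.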
