optimizing-attention-flash

Name: optimizing-attention-flash
Author: davila7

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or needing faster inference. Supports PyTorch native SDPA,

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

How to use

Check your PyTorch version; it should be at least 2.2.0. Run python -c "import torch; print(torch.__version__)" in a terminal. If you have an older version, upgrade PyTorch with pip install --upgrade torch.
Choose an integration method. For the simplest approach, use PyTorch's native SDPA (available in version 2.2+), which automatically uses Flash Attention when it is available. Alternatively, install the flash-attn library with pip install flash-attn --no-build-isolation for more control and additional options.
Implement attention in your model. With PyTorch SDPA, import torch.nn.functional and call scaled_dot_product_attention(q, k, v) instead of computing attention by hand. With flash-attn, import flash_attn_func and pass tensors in the layout [batch, seqlen, nheads, headdim].
Prepare the input tensors: query, key, and value should be on a CUDA device and in float16 or bfloat16. Make sure the sequence is longer than 512 tokens to take full advantage of the Flash Attention optimizations.
Test performance with profiling. Compare run time and memory usage before and after enabling Flash Attention to confirm the speedup and memory savings.
Verify accuracy: run tests to make sure the model's outputs remain consistent with the baseline version without Flash Attention.
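
The version check, the SDPA call, and the accuracy comparison from the steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's own code: the helper names torch_version_at_least, manual_attention, and sdpa_attention are hypothetical, and the tensors use the [batch, nheads, seqlen, headdim] layout that scaled_dot_product_attention expects. It runs on CPU in float32 so it can be tried anywhere; whether a Flash Attention kernel is actually selected depends on the device, dtype, and PyTorch build.

```python
import math

import torch
import torch.nn.functional as F


def torch_version_at_least(major, minor):
    """Return True if the installed PyTorch is at least major.minor."""
    base = torch.__version__.split("+")[0]  # drop local build tags such as "+cu121"
    parts = base.split(".")[:2]
    return (int(parts[0]), int(parts[1])) >= (major, minor)


def manual_attention(q, k, v):
    """Baseline attention that materializes the full score matrix."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v


def sdpa_attention(q, k, v):
    """Same computation through PyTorch's fused entry point."""
    return F.scaled_dot_product_attention(q, k, v)


# Small CPU example; shapes are [batch, nheads, seqlen, headdim].
q = torch.randn(2, 4, 128, 32)
k = torch.randn(2, 4, 128, 32)
v = torch.randn(2, 4, 128, 32)

print("PyTorch >= 2.2:", torch_version_at_least(2, 2))
max_diff = (sdpa_attention(q, k, v) - manual_attention(q, k, v)).abs().max().item()
print("max abs difference vs. baseline:", max_diff)
```

When the same comparison is run in float16 or bfloat16 on a GPU, a looser tolerance is usually needed, since the fused kernel and the baseline round intermediate values differently.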
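
The two integration paths expect different tensor layouts: flash_attn_func takes [batch, seqlen, nheads, headdim], while scaled_dot_product_attention takes [batch, nheads, seqlen, headdim]. The sketch below shows one way to handle that, assuming flash-attn may or may not be installed. The names to_flash_inputs and attention_bshd are hypothetical helpers introduced for illustration; only flash_attn_func and scaled_dot_product_attention come from the libraries themselves.

```python
import torch
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func  # optional dependency
except ImportError:
    flash_attn_func = None


def to_flash_inputs(x):
    """Move a tensor to CUDA and cast to float16 when a GPU is present.

    Without a GPU the tensor is returned unchanged (assumption: CPU runs
    are only used for functional testing).
    """
    if torch.cuda.is_available():
        return x.to(device="cuda", dtype=torch.float16)
    return x


def attention_bshd(q, k, v, causal=False):
    """Attention over tensors shaped [batch, seqlen, nheads, headdim]."""
    use_flash = (
        flash_attn_func is not None
        and q.is_cuda
        and q.dtype in (torch.float16, torch.bfloat16)
    )
    if use_flash:
        return flash_attn_func(q, k, v, causal=causal)
    # Fallback: SDPA wants [batch, nheads, seqlen, headdim], so swap axes 1 and 2.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=causal
    )
    return out.transpose(1, 2)
```

A typical call would be attention_bshd(to_flash_inputs(q), to_flash_inputs(k), to_flash_inputs(v)). Falling back to SDPA keeps the same function usable on machines without flash-attn or without a supported GPU.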
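
For the profiling step, a simple timing and peak-memory comparison might look like the sketch below. The benchmark and naive_attention functions are hypothetical helpers, and the sequence length of 1024 is an arbitrary choice above the 512-token threshold mentioned earlier. Peak memory is reported only on CUDA; on CPU the helper returns None for it.

```python
import math
import time

import torch
import torch.nn.functional as F


def naive_attention(q, k, v):
    """Baseline attention that materializes the full score matrix."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v


def benchmark(fn, *args, warmup=2, iters=5):
    """Return (mean seconds per call, peak CUDA bytes or None on CPU)."""
    on_cuda = any(isinstance(a, torch.Tensor) and a.is_cuda for a in args)
    for _ in range(warmup):
        fn(*args)
    if on_cuda:
        torch.cuda.synchronize()  # finish warmup work before timing
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if on_cuda:
        torch.cuda.synchronize()  # wait for queued kernels to finish
    elapsed = (time.perf_counter() - start) / iters
    peak = torch.cuda.max_memory_allocated() if on_cuda else None
    return elapsed, peak


device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
# Shapes are [batch, nheads, seqlen, headdim].
q = torch.randn(1, 4, 1024, 64, device=device, dtype=dtype)
k = torch.randn(1, 4, 1024, 64, device=device, dtype=dtype)
v = torch.randn(1, 4, 1024, 64, device=device, dtype=dtype)

for name, fn in [("naive", naive_attention), ("sdpa", F.scaled_dot_product_attention)]:
    secs, peak = benchmark(fn, q, k, v)
    print(name, "seconds per call:", secs, "peak bytes:", peak)
```

The synchronize calls matter on GPU because kernels launch asynchronously; without them the timer would mostly measure launch overhead.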
