awq-quantization

Name: awq-quantization
Author: davila7

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

How to use

Install the autoawq library with pip install autoawq. For optimized CUDA kernels and Flash Attention, use pip install autoawq[kernels]. Make sure you have Python 3.8+, CUDA 11.8+, and a GPU with compute capability 7.5 or higher.
Download a pre-quantized model from a HuggingFace repository, for example TheBloke/Mistral-7B-Instruct-v0.2-AWQ, which is already packaged in AWQ format.
Load the model in Python by importing AutoAWQForCausalLM from the awq library and AutoTokenizer from transformers. Call from_quantized() with fuse_layers=True to fuse layers and improve throughput.
Prepare the tokenizer for the chosen model by loading it with AutoTokenizer.from_pretrained(), using the same model name.
Run inference by passing the input text through the tokenizer and then generating a response with the model. The quantized model runs faster and uses less GPU memory than the full-precision version.
If you serve with vLLM in production, make sure your GPU supports Marlin kernels for the best performance.
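
The loading and inference steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's own code: the helper names meets_requirements, load_awq_model, and generate are hypothetical, the max_new_tokens default and the "cuda" device string are assumptions, and the model ID is simply the example repository mentioned in the steps. The heavy imports live inside load_awq_model so the small helpers can be used without a GPU.

```python
MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example repo named in the steps


def meets_requirements(python_version, compute_capability):
    """Check the stated minimums: Python 3.8+ and GPU compute capability 7.5+.

    Both arguments are (major, minor, ...) tuples, e.g. sys.version_info and
    the result of torch.cuda.get_device_capability().
    """
    return (
        tuple(python_version[:2]) >= (3, 8)
        and tuple(compute_capability[:2]) >= (7, 5)
    )


def load_awq_model(model_id=MODEL_ID):
    """Load a pre-quantized AWQ checkpoint and its matching tokenizer."""
    # Imported lazily: these need autoawq and transformers installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    # fuse_layers=True asks AutoAWQ to fuse layers for faster inference.
    model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def generate(model, tokenizer, prompt, max_new_tokens=64, device="cuda"):
    """Tokenize the prompt, generate, and decode the first returned sequence."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

On a machine with a supported GPU, calling model, tokenizer = load_awq_model() followed by print(generate(model, tokenizer, "What is AWQ quantization?")) should download the checkpoint on first use and print a completion. The meets_requirements check can gate that call using sys.version_info and torch.cuda.get_device_capability().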
