quantizing-models-bitsandbytes

by davila7

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: davila7
Category: Security
Views: 2

About this skill

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, need to fit larger models, or want faster inference. Supports INT8, NF4, FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.
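
The 50-75% figure follows from bytes per parameter: 2 bytes in FP16, 1 byte in 8-bit, and 0.5 bytes in 4-bit. Below is a minimal sketch of that arithmetic in Python. It assumes a weights-only estimate with 1 GB = 1e9 bytes, and it ignores activations, the KV cache, and quantization metadata, so real usage will be somewhat higher. The function name `estimate_weight_memory_gb` is hypothetical and not part of bitsandbytes.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight memory in GB (1 GB = 1e9 bytes).

    Weights only: activations, KV cache, and quantization constants
    are not included in this estimate.
    """
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision!r}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def reduction_vs_fp16(precision: str) -> float:
    """Return the fractional memory saving relative to FP16."""
    return 1.0 - BYTES_PER_PARAM[precision] / BYTES_PER_PARAM["fp16"]


if __name__ == "__main__":
    for precision in ("fp16", "int8", "int4"):
        gb = estimate_weight_memory_gb(7e9, precision)
        saving = reduction_vs_fp16(precision)
        print(f"7B @ {precision}: ~{gb:.1f} GB ({saving:.0%} smaller than FP16)")
```

For a model with 7 billion parameters this gives roughly 14 GB, 7 GB, and 3.5 GB, which is a useful first check against the available GPU memory.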

How to use

  1. Install the required packages: pip install bitsandbytes transformers accelerate.
  2. Estimate your model's memory requirements: a 7B model needs about 14 GB in FP16, about 7 GB in INT8, and about 3.5 GB in INT4.
  3. Choose a quantization level: 8-bit for a 50% memory reduction or 4-bit for a 75% reduction.
  4. Configure quantization by importing BitsAndBytesConfig from transformers and setting load_in_8bit=True or load_in_4bit=True.
  5. Load the model with AutoModelForCausalLM.from_pretrained(), passing the quantization config and device_map="auto".
  6. Verify that the model loaded correctly and test inference; accuracy loss should stay below 1%.
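
The configuration, loading, and verification steps can be sketched as follows. This is an illustration under stated assumptions, not the skill's own code: the helper names `quantization_kwargs`, `load_quantized`, and `smoke_test` are hypothetical, the choice of NF4 for the 4-bit path is an assumption (the skill also lists FP4), and the model ID is whatever HuggingFace checkpoint you supply. Actually loading a model requires a CUDA GPU with bitsandbytes, transformers, and accelerate installed.

```python
def quantization_kwargs(bits: int) -> dict:
    """Build keyword arguments for transformers.BitsAndBytesConfig."""
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        # Assumption: NF4 is chosen here; "fp4" is the other supported value.
        return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
    raise ValueError("bits must be 4 or 8")


def load_quantized(model_id: str, bits: int = 4):
    """Load a causal LM with bitsandbytes quantization applied at load time."""
    # Imported lazily so quantization_kwargs works without the GPU stack.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    config = BitsAndBytesConfig(**quantization_kwargs(bits))
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=config,
        device_map="auto",  # let accelerate place layers on available devices
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def smoke_test(model_id: str, prompt: str = "Hello, my name is", bits: int = 4) -> str:
    """Load the model, report its memory footprint, and generate a few tokens."""
    model, tokenizer = load_quantized(model_id, bits)
    footprint_gb = model.get_memory_footprint() / 1e9
    print(f"Memory footprint: ~{footprint_gb:.2f} GB")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `smoke_test` with a checkpoint of your choice covers the last step: if the reported footprint is near the estimate for the chosen bit width and the generated text is coherent, the model loaded correctly. Measuring the actual accuracy loss still requires running your own evaluation set against the unquantized baseline.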
