Our review
This skill sets up an isolated ML environment with PyTorch, auto-detecting hardware (NVIDIA/AMD GPU or CPU) and installing appropriate builds.
Strengths
- Automatic detection and configuration of GPU or CPU hardware
- Support for NVIDIA (including Blackwell), AMD RDNA, and Strix Halo GPUs
- Reproducible installation via bash scripts and pinned PyTorch version
Limitations
- Does not handle manual GTT memory configuration for Strix Halo
- Pinned PyTorch version (2.10.0) may require updates for newer releases
- Depends on correctly installed GPU drivers
Use this skill to quickly bootstrap a new ML project or troubleshoot PyTorch environment issues related to GPU detection.
Avoid this skill if you need advanced custom configuration (other frameworks, specific dependencies) or if you work in an already configured environment with package managers like Anaconda.
Security analysis
Caution: The skill uses Bash to execute setup scripts that require elevated privileges (sudo usermod) and download/install packages, which could modify the system. No destructive or exfiltration commands are present, but the scripts' sources aren't transparent to the auditor, making the risk moderate.
- Running bash scripts from the user's skills directory without user review, which could perform arbitrary system modifications (installing packages, modifying user groups).
Examples
- Set up a new ML project called 'my-ml-project' with PyTorch in ~/projects. Auto-detect my hardware and install the right version of PyTorch.
- I'm trying to use PyTorch but it doesn't see my GPU. Can you detect my hardware and install the correct PyTorch build for my NVIDIA RTX 4090?
- Set up a PyTorch environment for my AMD Strix Halo (gfx1151) system. Use ROCm 7 and remember to add me to render/video groups.
name: ml-env
description: Set up ML environments with PyTorch and auto-detect hardware. Use this when creating new ML projects, setting up PyTorch, or troubleshooting GPU/environment issues. Guides you through creating isolated project environments with hardware-specific PyTorch builds (NVIDIA/AMD/CPU).
allowed-tools: Read, Bash, WebFetch
activation-precedence: high
ML Environment Setup & Troubleshooting
This skill helps you create and manage isolated ML environments with PyTorch. It auto-detects your hardware (NVIDIA GPU, AMD GPU, or CPU) and installs the appropriate PyTorch build.
Creating a New ML Project
I can help you set up a complete ML project with PyTorch in seconds. Here's what I'll do:
- Create your project directory
- Create a .gitignore for ML files
- Run hardware detection
- Install PyTorch with the right backend
- Install ML libraries
- Validate everything works
To get started, tell me:
- Project name/path where you want it created
- I'll handle the rest!
Interactive Setup Process
When you ask me to set up a new ML project, I will:
# 1. Create the project directory
mkdir -p ~/projects/my-ml-project
# 2. Create .gitignore
# (ignores ml-env/, data/, models/, logs/, etc.)
# 3. Run the setup script
bash ~/.claude/skills/ml-env/scripts/setup-universal.sh
# 4. Show you the results
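Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: `scaffold_project` and `IGNORED` are hypothetical names, and the ignore list contains only the entries named in the comment above.

```python
from pathlib import Path
import tempfile

# Entries named in the text above; the real setup may ignore more (assumption).
IGNORED = ["ml-env/", "data/", "models/", "logs/"]


def scaffold_project(root: Path) -> Path:
    """Create the project directory and a .gitignore, leaving an existing one untouched."""
    root.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(IGNORED) + "\n")
    return gitignore


# Demo in a throwaway directory so nothing in your home directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = scaffold_project(Path(tmp) / "my-ml-project")
    written = path.read_text().splitlines()
    print(written)
```

Skipping the write when a .gitignore already exists keeps the sketch safe to re-run on a project you have customized.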
The setup script will:
- Detect your GPU (NVIDIA with nvidia-smi, AMD with rocminfo, or fall back to CPU)
- Ask questions for special hardware (Blackwell GPUs, Strix Halo, etc.)
- Create a Python 3.13 virtual environment with uv
- Install PyTorch 2.10.0 with the correct backend
- Install ML libraries: numpy, pandas, scikit-learn, jupyter, accelerate, etc.
- Create the ml-env/ directory in your project
- Optionally initialize git
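The detection order in the first step (nvidia-smi, then rocminfo, then CPU) can be sketched like this. It is an illustration under the assumption that finding the vendor tool on PATH is enough of a signal; setup-universal.sh may check more than that, and `detect_backend` is a hypothetical name.

```python
import shutil
from typing import Callable, Optional


def detect_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "nvidia", "amd", or "cpu" based on which vendor tool is on PATH."""
    if which("nvidia-smi"):
        return "nvidia"
    if which("rocminfo"):
        return "amd"
    return "cpu"  # neither tool found: fall back to a CPU-only install


print(detect_backend())
```

The `which` parameter exists only so the lookup can be swapped out in tests. A tool being on PATH does not prove the driver works, so a real check would also run the tool and inspect its exit code.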
After Setup: Using Your Environment
Once created, activating is simple:
cd ~/projects/my-ml-project
source ml-env/bin/activate # Regular environments
# OR
source ml-env/activate-safe.sh # If you use conda (ignores conda settings)
Check it works:
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
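If the check above reports an unexpected PyTorch version, it may be that the wrong interpreter is running. The sketch below is one way to confirm which environment is active, using only the standard library; the function names are illustrative, and the `CONDA_PREFIX` test assumes conda's usual behavior of setting that variable on activation.

```python
import os
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def conda_active(environ=os.environ) -> bool:
    """True when a conda environment appears to be active in this shell."""
    return bool(environ.get("CONDA_PREFIX"))


print(f"interpreter: {sys.executable}")
print(f"virtual environment active: {in_virtualenv()}")
print(f"conda environment active: {conda_active()}")
```

If both lines print True, conda settings may be leaking into the project environment, which is the situation activate-safe.sh is described as handling.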
Hardware-Specific Guidance
NVIDIA GPUs
- Supported: RTX 3090, 4090, 5090, and most Ampere/Ada/Blackwell
- Installation: CUDA 12.8 (stable) or CUDA 13.0 (for RTX 5090)
- Driver requirement: 525+ for CUDA 12.8 (570+ on Blackwell GPUs), 580+ for CUDA 13.0
- WSL2 users: Use Windows NVIDIA driver only (do NOT install Linux driver)
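To compare an installed driver against a minimum, something like the following could work. It is a sketch: the helper names are hypothetical, the version strings in the usage lines are made-up examples, and `required_major` is whatever minimum applies to the CUDA build you are installing.

```python
import shutil
import subprocess
from typing import Optional


def driver_major(version: str) -> int:
    """Extract the major number from a driver string such as "570.86.15"."""
    return int(version.strip().split(".")[0])


def meets_minimum(version: str, required_major: int) -> bool:
    """True when the driver's major version is at least `required_major`."""
    return driver_major(version) >= required_major


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return result.stdout.strip().splitlines()[0]  # one line per GPU; the first is enough


version = installed_driver_version()
print("no NVIDIA driver found" if version is None else f"driver {version}")
```

For example, `meets_minimum("535.104.05", 550)` is False, while `meets_minimum("570.86.15", 550)` is True.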
AMD RDNA (RX 6000/7000 series)
- Installation: ROCm 6.2
- Requirements: User in render and video groups
- Setup: sudo usermod -aG render,video $USER && newgrp render
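Before running usermod, you may want to see whether the groups are already in place. Here is a minimal sketch using Python's Unix-only `grp` module; `missing_groups` and `current_group_names` are hypothetical helper names.

```python
import grp
import os

REQUIRED_GROUPS = ("render", "video")


def missing_groups(required, current):
    """Return the required group names absent from `current`, in order."""
    have = set(current)
    return [name for name in required if name not in have]


def current_group_names():
    """Names of the groups the running process belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:  # gid with no entry in the group database
            pass
    return names


missing = missing_groups(REQUIRED_GROUPS, current_group_names())
print("all required groups present" if not missing else f"missing: {', '.join(missing)}")
```

The process's group list is fixed at login, so a membership added with usermod only shows up here after a new login session or `newgrp`.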
AMD Strix Halo (gfx1151)
⚠️ This requires special handling - official PyTorch wheels do NOT work!
- GPU: Ryzen AI MAX+ 395 with gfx1151
- Critical issue: Official PyTorch wheels fail with "HIP error: invalid device function"
- Solution: Use AMD gfx1151-specific builds
  - ROCm 7 (Recommended): https://repo.amd.com/rocm/whl/gfx1151/ (~31 TFLOPS BF16)
  - ROCm 6.4.4+ (Fallback): https://rocm.nightlies.amd.com/v2/gfx1151/ (~12 TFLOPS BF16)
- Memory limits: Default ~33GB; configure GTT for larger models (30B+)
- Setup requires: User in render and video groups, Linux kernel 6.14+ (6.16.9+ recommended for automatic UMA/GTT behavior)
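The kernel requirement (6.14+ as the minimum, 6.16.9+ recommended) can be checked with a short sketch like the one below. The function names and the three status labels are illustrative choices, not part of the skill.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Parse "6.16.9-200.fc42.x86_64" into (6, 16, 9); a missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def kernel_status(release: str) -> str:
    """Classify a kernel release against the Strix Halo thresholds listed above."""
    version = kernel_version(release)
    if version >= (6, 16, 9):
        return "recommended"
    if version >= (6, 14, 0):
        return "supported"
    return "too old"


if platform.system() == "Linux":
    print(platform.release(), "->", kernel_status(platform.release()))
```

Comparing tuples rather than strings matters here: as strings, "6.9" would sort after "6.14".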
Reference project: See ~/Projects/amdtest for a working gfx1151 setup example.
See TROUBLESHOOTING.md for complete Strix Halo setup and GTT memory configuration.
CPU-Only Systems
- Works everywhere
- Good for development/testing
- Use for learning before scaling to GPU
Current Versions (2026)
- PyTorch: 2.10.0
- Python: 3.13 (or 3.12 if needed)
- CUDA: 12.8 (main), 13.0 (Blackwell experimental)
- ROCm: 6.2 (RDNA), 7.x preferred for Strix Halo (6.4.4+ as fallback)
- Key ML libs: numpy, pandas, matplotlib, scikit-learn, jupyter, accelerate, tensorboard
Validating an Existing Environment
If you already have a project and want to verify it works:
cd ~/your-ml-project
bash ~/.claude/skills/ml-env/scripts/validate.sh
This will check:
- Python and PyTorch versions
- GPU/CPU backend detection
- GPU memory and specifications
- Computation tests
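The checks listed above might look roughly like this in Python. This is a sketch of the idea, not the contents of validate.sh; `validate_environment` and the report keys are hypothetical.

```python
import platform


def validate_environment() -> dict:
    """Collect versions, backend, and a small compute check into a report dict."""
    report = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        return report  # PyTorch missing: the environment is probably not activated

    report["torch"] = torch.__version__
    # ROCm builds of PyTorch also report through the torch.cuda API.
    use_gpu = torch.cuda.is_available()
    report["device"] = torch.cuda.get_device_name(0) if use_gpu else "cpu"
    if use_gpu:
        total = torch.cuda.get_device_properties(0).total_memory
        report["gpu_memory_gb"] = round(total / 1e9, 2)

    # Computation test: a matrix times the identity should come back unchanged.
    device = torch.device("cuda" if use_gpu else "cpu")
    a = torch.rand(64, 64, device=device)
    result = a @ torch.eye(64, device=device)
    report["compute_ok"] = bool(torch.allclose(result, a, atol=1e-4))
    return report


for key, value in validate_environment().items():
    print(f"{key}: {value}")
```

A report whose `torch` entry is None usually means the script ran outside the project environment.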
Troubleshooting
GPU Not Detected
NVIDIA:
nvidia-smi # Check driver is installed
python -c "import torch; print(torch.cuda.is_available())"
AMD:
rocm-smi # Check ROCm installation
rocminfo | grep gfx # Check GPU architecture
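To act on the rocminfo output programmatically, a sketch along these lines pulls out the gfx architecture names. The sample text is illustrative only and was not captured from a real machine; the function names are hypothetical.

```python
import re


def gfx_targets(rocminfo_output: str) -> list:
    """Return the unique gfx architecture names in rocminfo output, in first-seen order."""
    seen = []
    for name in re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_output):
        if name not in seen:
            seen.append(name)
    return seen


def is_strix_halo(rocminfo_output: str) -> bool:
    """True when the output mentions the gfx1151 architecture."""
    return "gfx1151" in gfx_targets(rocminfo_output)


# Illustrative text only, shaped loosely like rocminfo agent entries.
sample = """
  Name:                    gfx1151
  Name:                    amdgcn-amd-amdhsa--gfx1151
"""
print(gfx_targets(sample), is_strix_halo(sample))
```

A True result from `is_strix_halo` is the cue to use the gfx1151-specific builds rather than the official wheels.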
CUDA Out of Memory
- Reduce batch size
- Enable mixed precision training
- Use torch.cuda.empty_cache() between batches
- Try gradient accumulation
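Gradient accumulation, the last item above, trades a large batch for several small ones whose gradients are summed before each optimizer step. The sketch below runs on CPU with synthetic data; the tiny linear model, the SGD settings, and the batch sizes are placeholders chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4  # micro-batches per optimizer step
# Eight synthetic micro-batches of 2 samples: effective batch size is 2 * 4 = 8.
micro_batches = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(micro_batches, start=1):
    loss = loss_fn(model(x), y) / accum_steps  # scale so the summed gradients average out
    loss.backward()                            # gradients add up across micro-batches
    if i % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1

print(f"{optimizer_steps} optimizer steps for {len(micro_batches)} micro-batches")
```

Only one micro-batch of activations is held in memory at a time, which is why this helps with out-of-memory errors.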
PyTorch Not Finding GPU After Install
- Activate the environment: source ml-env/bin/activate
- Check driver version
- Reinstall PyTorch with the correct index URL:
  uv pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu128
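If you switch between machines, the reinstall command can be assembled from a small lookup table. This is a sketch: the backend labels are hypothetical, and only the cu128 URL comes from the command above. The CPU index is PyTorch's standard CPU wheel index, and other backends would need their own entries.

```python
# Backend labels are illustrative; extend the table for your own hardware.
INDEX_URLS = {
    "cuda12.8": "https://download.pytorch.org/whl/cu128",  # from the command above
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def reinstall_command(backend: str) -> list:
    """Build the uv command that reinstalls torch from the index for `backend`."""
    try:
        url = INDEX_URLS[backend]
    except KeyError:
        known = ", ".join(sorted(INDEX_URLS))
        raise ValueError(f"unknown backend {backend!r}; known: {known}") from None
    return ["uv", "pip", "install", "--upgrade", "torch", "--index-url", url]


print(" ".join(reinstall_command("cuda12.8")))
```

Returning a list rather than a string means the result can be passed straight to `subprocess.run` without shell quoting concerns.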
Strix Halo Specific Issues
See TROUBLESHOOTING.md for detailed Strix Halo troubleshooting, GTT memory setup, and performance optimization.
Best Practices
- Always activate first: Before running any Python/ML code
- Use virtual environments: Never install to system Python
- Move models to device: Explicitly move tensors to GPU
- Monitor memory: Keep an eye on GPU memory usage
- Test on CPU first: Develop with small data on CPU, scale to GPU
- Save checkpoints: Don't train for hours without saving progress
Common Workflows
Training a Model
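import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = YourModel().to(device)  # YourModel is a placeholder for your nn.Module
optimizer = torch.optim.AdamW(model.parameters())  # any optimizer works here

for batch in dataloader:
    x, y = batch
    x, y = x.to(device), y.to(device)  # keep each batch on the same device as the model
    optimizer.zero_grad()              # clear gradients left over from the previous step
    loss = model(x, y)                 # assumes the model returns the loss directly
    loss.backward()
    optimizer.step()                   # apply the weight update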
Mixed Precision Training (Faster, Less Memory)
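import torch

# Assumes model, optimizer, device, and dataloader are already defined.
scaler = torch.amp.GradScaler("cuda")  # replaces the deprecated torch.cuda.amp.GradScaler
for batch in dataloader:
    x, y = batch
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    with torch.amp.autocast("cuda"):   # run the forward pass in mixed precision
        loss = model(x, y)
    scaler.scale(loss).backward()      # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)             # unscales gradients, then steps the optimizer
    scaler.update()                    # adjust the scale factor for the next iteration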
Checking GPU Memory
import torch

if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated()/1e9:.2f}GB")
    print(f"Reserved: {torch.cuda.memory_reserved()/1e9:.2f}GB")
    print(f"Total: {torch.cuda.get_device_properties(0).total_memory/1e9:.2f}GB")
Reference Documentation
- TROUBLESHOOTING.md - Common issues, hardware-specific setup (especially Strix Halo)
- UPDATE.md - Updating PyTorch and dependencies
Scripts in This Skill
All scripts are in ~/.claude/skills/ml-env/scripts/:
- setup-universal.sh - Hardware detection and PyTorch installation (used during initial setup)
- validate.sh - Validate an existing environment and test GPU/CPU
When to Use This Skill
Use me when you:
- Want to create a new ML project
- Need to set up PyTorch with GPU support
- Are troubleshooting GPU/CUDA/ROCm issues
- Want to update or maintain your ML environment
- Have hardware-specific questions (NVIDIA, AMD, Strix Halo)
- Need guidance on ML best practices
Questions?
Ask me anything about:
- Creating new ML projects
- Hardware setup and troubleshooting
- PyTorch installation
- GPU/CPU configuration
- ML best practices
- Updating packages