Important
This feature is in Beta.
This page provides notebook examples for fine-tuning large language models (LLMs) using Serverless GPU compute. These examples demonstrate several approaches to fine-tuning, including parameter-efficient methods such as Low-Rank Adaptation (LoRA) and full supervised fine-tuning.
Before running these notebooks, see the Best practices checklist.
Fine-tune Qwen2-0.5B model
The following notebook provides an example of how to efficiently fine-tune the Qwen2-0.5B model using:
- Transformer Reinforcement Learning (TRL) for supervised fine-tuning.
- Liger Kernel for memory-efficient training with optimized Triton kernels.
- LoRA for parameter-efficient fine-tuning.
Notebook
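As a rough sketch of how these pieces fit together (not the notebook itself), TRL's `SFTTrainer` can combine a LoRA adapter config with Liger kernels through a single training configuration. The dataset, hyperparameters, and output path below are illustrative assumptions:

```python
# Sketch: LoRA fine-tuning of Qwen2-0.5B with TRL + Liger kernels.
# All hyperparameters and the dataset choice are illustrative assumptions.

LORA_RANK = 16          # rank of the low-rank adapter matrices
LORA_ALPHA = 32         # LoRA scaling factor (commonly ~2x the rank)
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def main():
    # Heavy imports kept inside main() so the module imports cheaply.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    peft_config = LoraConfig(
        r=LORA_RANK,
        lora_alpha=LORA_ALPHA,
        lora_dropout=0.05,
        target_modules=TARGET_MODULES,
        task_type="CAUSAL_LM",
    )

    args = SFTConfig(
        output_dir="qwen2-0.5b-sft",
        use_liger_kernel=True,  # swap in Liger's optimized Triton kernels
        per_device_train_batch_size=4,
        num_train_epochs=1,
        bf16=True,
    )

    trainer = SFTTrainer(
        model="Qwen/Qwen2-0.5B",  # TRL loads the model from this name
        train_dataset=load_dataset("trl-lib/Capybara", split="train"),
        args=args,
        peft_config=peft_config,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

Because LoRA trains only the small adapter matrices, the trainable parameter count stays a fraction of the 0.5B base model, which pairs well with Liger's reduced activation memory.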
Fine-tune Llama-3.2-3B with Unsloth
This notebook demonstrates how to fine-tune Llama-3.2-3B using the Unsloth library.
Notebook
Video demo
This video walks through the notebook in detail (12 minutes).
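A minimal sketch of the Unsloth workflow, assuming the `unsloth/Llama-3.2-3B-Instruct` checkpoint and illustrative LoRA settings (the notebook's actual choices may differ):

```python
# Sketch: fine-tuning Llama-3.2-3B with Unsloth's FastLanguageModel.
# Model name, sequence length, and LoRA settings are illustrative assumptions.

MAX_SEQ_LENGTH = 2048
LORA_RANK = 16


def main():
    from unsloth import FastLanguageModel

    # Load the base model in 4-bit quantization to cut memory use.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-3B-Instruct",
        max_seq_length=MAX_SEQ_LENGTH,
        load_in_4bit=True,
    )

    # Attach LoRA adapters through Unsloth's patched PEFT integration.
    model = FastLanguageModel.get_peft_model(
        model,
        r=LORA_RANK,
        lora_alpha=LORA_RANK,
        target_modules=[
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
    )
    # From here, hand `model` and `tokenizer` to a TRL SFTTrainer as usual.
    return model, tokenizer
```

Unsloth's patched kernels and 4-bit loading are what let a 3B-parameter model fit comfortably on a single serverless GPU.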
Supervised fine-tuning using DeepSpeed and TRL
This notebook demonstrates how to use the Serverless GPU Python API to run supervised fine-tuning (SFT) with the Transformer Reinforcement Learning (TRL) library and DeepSpeed ZeRO Stage 3 optimization.
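ZeRO Stage 3 partitions optimizer state, gradients, and the model parameters themselves across GPUs. As a hedged sketch (not the notebook's code), a ZeRO-3 config can be passed directly to TRL's `SFTConfig`, which forwards it to the underlying Hugging Face `Trainer`; the model, dataset, and batch sizes are illustrative assumptions:

```python
# Sketch: SFT with TRL + DeepSpeed ZeRO Stage 3.
# ZeRO-3 shards params, grads, and optimizer state across all ranks.
DS_CONFIG = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        # Gather the full 16-bit weights on rank 0 when saving checkpoints.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    # "auto" defers these values to the Trainer's own arguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}


def run_sft():
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    args = SFTConfig(
        output_dir="sft-zero3",
        deepspeed=DS_CONFIG,  # hand the ZeRO-3 config to the HF Trainer
        bf16=True,
        per_device_train_batch_size=2,
    )
    trainer = SFTTrainer(
        model="Qwen/Qwen2-0.5B",  # placeholder base model
        train_dataset=load_dataset("trl-lib/Capybara", split="train"),
        args=args,
    )
    trainer.train()
```

Launched under a multi-GPU runner (as the Serverless GPU API does), each rank holds only its shard of the model, so models far larger than a single GPU's memory become trainable.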