tridao/zoo

We use the template from https://github.com/ashleve/lightning-hydra-template. Please read the instructions there to understand the repo structure.

GPT2 training

To train GPT2 on OpenWebText with 8 GPUs:

python run.py experiment=owt/gpt2s-flash trainer.devices=8
python run.py experiment=owt/gpt2m-flash trainer.devices=8
python run.py experiment=owt/gpt2l-flash trainer.devices=8

To train with bf16 instead of fp16, add trainer.precision=bf16.
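For example, combining the overrides above, the GPT2-small run in bf16 becomes:

python run.py experiment=owt/gpt2s-flash trainer.devices=8 trainer.precision=bf16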

Requirements

Python 3.8+, PyTorch 1.9+, torchvision, torchtext, pytorch-fast-transformers, munch, einops, timm, hydra-core, hydra-colorlog, python-dotenv, rich, pytorch-lightning, triton. We recommend CUDA 11.8 (e.g., using Nvidia's PyTorch Docker image from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).

We provide a Dockerfile that lists all the required packages.
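As a rough sketch (the image tag zoo and the --gpus flag are our choices here, not something the repo prescribes), you could build the image and start a container with:

docker build -t zoo .
docker run --gpus all -it zoo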

This repo includes the following CUDA extensions (a combined install sketch follows the list):

  1. Fused dropout + residual + LayerNorm, adapted from Apex's FastLayerNorm.
     cd csrc/layer_norm && pip install .
  2. Fused matmul + bias (forward and backward), and fused matmul + bias + gelu (forward and backward), adapted from Apex's FusedDense.
     cd csrc/fused_dense_lib && pip install .
  3. Optimized cross-entropy loss, adapted from Apex's Xentropy.
     cd csrc/xentropy && pip install .
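As a convenience, assuming you are at the repo root, the three extensions can be built in sequence (each cd runs in a subshell so your working directory is left unchanged):

(cd csrc/layer_norm && pip install .)
(cd csrc/fused_dense_lib && pip install .)
(cd csrc/xentropy && pip install .)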
