Diffusion Transformer Implementation
A PyTorch implementation of the Diffusion Transformer (DiT) model. With OpenAI’s Sora demonstrating the power of DiTs for multidimensional tasks, they represent a stable and efficient approach to any diffusion task (vision, audio, robotics, etc.). This implementation provides a clean, modular codebase to extend DiT for various generative applications. Code Repository Architecture Implementation details Firstlayer: A Python class that initializes the input processing. It includes a Patchify module to convert images into patches, a learnable positional embedding, and separate embedding layers for timesteps and class labels....
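As a rough sketch of what such an input layer could look like (the module names follow the description above, but the shapes, signatures, and the simple MLP timestep embedding are assumptions for illustration, not the repository's actual code; DiT proper uses sinusoidal timestep embeddings):

```python
import torch
import torch.nn as nn

class Patchify(nn.Module):
    """Split an image into non-overlapping patches and project them to the embed dim."""
    def __init__(self, in_channels=3, patch_size=8, embed_dim=384):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)    # (B, N, D)

class FirstLayer(nn.Module):
    """Input processing: patchify + learnable positional embedding + timestep/label embeddings."""
    def __init__(self, num_patches, num_classes, embed_dim=384):
        super().__init__()
        self.patchify = Patchify(embed_dim=embed_dim)
        # num_patches must match (H/P) * (W/P) for the chosen image and patch size
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        # simplified timestep embedding (an MLP on the scalar t) for the sketch
        self.t_emb = nn.Sequential(nn.Linear(1, embed_dim), nn.SiLU(), nn.Linear(embed_dim, embed_dim))
        self.y_emb = nn.Embedding(num_classes, embed_dim)

    def forward(self, x, t, y):
        tokens = self.patchify(x) + self.pos_emb                # (B, N, D)
        cond = self.t_emb(t[:, None].float()) + self.y_emb(y)  # (B, D) conditioning vector
        return tokens, cond
```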
Wav2Vec 2.0 Implementation
A Wav2Vec 2.0 implementation using PyTorch Lightning. This project aims to create a clean, modifiable building block for speech recognition research. It uses common tools for optimized training and effective monitoring. The implementation includes code for model training, dataset preparation, and evaluation. This page also details the results of pretraining on the LibriSpeech dataset. Code Repository Architecture My implementation closely follows the Wav2Vec 2.0 BASE model architecture:
- 768 embedding size
- 8 attention heads
- 12 transformer blocks
- 512 convolutional channels in the feature encoder
- 2 groups and 320 choices per group in the quantizer
This configuration results in approximately 95M parameters....
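For reference, these hyperparameters could be collected in a config object like the sketch below; the FFN width and the feature-encoder kernel/stride schedule follow the original Wav2Vec 2.0 paper, while the names themselves are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Wav2Vec2BaseConfig:
    # Transformer context network
    embed_dim: int = 768          # embedding size
    num_heads: int = 8            # attention heads
    num_layers: int = 12          # transformer blocks
    ffn_dim: int = 3072           # 4x embed_dim, as in the BASE model
    # Convolutional feature encoder: (kernel, stride) per layer, per the original paper
    conv_channels: int = 512
    conv_layers: tuple = ((10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2))
    # Product quantizer
    quantizer_groups: int = 2
    codebook_size: int = 320      # choices per group
```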
LoRA Instruction Tuning Implementation
Following my previous GPT-2 pretraining project, this is a clean pipeline for fine-tuning with a “from scratch” implementation of LoRA that can be reused for other projects. I chose GPT-2 because I’m renting a small GPU with limited memory for the experiment, but this can be reproduced with any other transformer model. Code Repository Dataset I used the Alpaca-GPT4 dataset from Stanford, which contains 52K instruction-following examples generated by GPT-4. Each row of the dataset contains an instruction, an optional input that provides context, and an output:...
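The core of a from-scratch LoRA is a frozen pretrained layer plus a trainable low-rank update. A minimal sketch (class name and hyperparameter defaults are illustrative, not necessarily what the repository uses):

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # freeze the pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

A fine-tuning pipeline would then swap the model's attention or MLP projections for such wrappers and train only the `lora_A`/`lora_B` parameters (note that Hugging Face's GPT-2 stores its projections as `Conv1D` modules rather than `nn.Linear`, so adapting them there requires handling the transposed weight layout).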
GPT-2 Pretraining
A nano-GPT implementation with PyTorch Lightning. The goal is to have a clean building block for other research projects: just enough manual implementation to be easily modifiable, combined with common tools for painless, optimized training and nice monitoring. It contains the code to train the model, prepare the dataset, and run evals. This page also details results I got training on HF’s FineWeb-Edu. Code Repository...
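A minimal sketch of the Lightning wrapper pattern (assuming the GPT forward returns logits of shape `(B, T, vocab_size)`; the names and hyperparameters are illustrative, not the repository's actual code):

```python
import torch
import torch.nn.functional as F
import lightning as L

class GPTLightningModule(L.LightningModule):
    """Thin Lightning wrapper: the GPT itself stays a plain nn.Module, easy to modify."""
    def __init__(self, model, lr=6e-4, weight_decay=0.1):
        super().__init__()
        self.model = model
        self.lr = lr
        self.weight_decay = weight_decay

    def training_step(self, batch, batch_idx):
        x, y = batch                                   # token ids: (B, T) inputs and shifted targets
        logits = self.model(x)                         # (B, T, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        self.log("train/loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr, weight_decay=self.weight_decay)
```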
U-Net for Segmentation
A simple PyTorch U-Net implementation. The goal is to have a clean building block that can be used in other, bigger projects (e.g. diffusion models). The model is tested on a segmentation task using the MIT scene-parse-150 dataset. Code Repository Architecture The network consists of a downsampling path, a bottleneck, and an upsampling path. In the downsampling path, a sequence of DoubleConv modules is applied....
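A typical DoubleConv is two 3x3 convolutions, each followed by normalization and ReLU; a sketch under that assumption (the repository's exact layer mix may differ):

```python
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two (3x3 conv -> BatchNorm -> ReLU) blocks, the basic unit of the U-Net paths."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```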
RL Policy for Legged Locomotion
Quadruped robots currently have difficulty traversing rough terrain; the goal of this project is to improve the agility and robustness of legged locomotion over complex terrain using reinforcement learning. The project consists of implementing the NVIDIA paper Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning and adapting it to the Unitree A1. Problem statement Traditionally, locomotion is achieved through optimization algorithms, especially Model Predictive Control (MPC)....
MPC Ball Balancing Robot
Control a ball on a plate using a robotic manipulator and an MPC controller. This project was carried out as part of the CS206B Robotic Manipulation and Interaction course at UC Berkeley. MPC Controller MPC is a type of feedback control algorithm in which a model of the system is used to make predictions about its future behavior. The control inputs are then computed based on these predictions, with the goal of achieving the desired system behavior....
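As an illustration of that receding-horizon idea on a hypothetical linearized ball-on-plate model (the dynamics, cost weights, and tilt limit below are assumptions made for the sketch, not the project's actual model), using cvxpy:

```python
import numpy as np
import cvxpy as cp

# Hypothetical discrete-time linear model: state x = [ball position, velocity],
# input u = plate tilt angle (small-angle approximation: tilt accelerates the ball).
dt, g = 0.02, 9.81
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [-g * dt]])

def mpc_step(x0, horizon=20, u_max=0.2):
    """Solve a finite-horizon quadratic program and return the first control input."""
    x = cp.Variable((2, horizon + 1))
    u = cp.Variable((1, horizon))
    cost, constraints = 0, [x[:, 0] == x0]
    for k in range(horizon):
        # penalize distance from the plate center (and velocity), plus control effort
        cost += cp.sum_squares(x[:, k + 1]) + 0.1 * cp.sum_squares(u[:, k])
        constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                        cp.abs(u[:, k]) <= u_max]         # respect tilt limits
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u.value[:, 0]  # receding horizon: apply only the first input, then re-solve

print(mpc_step(np.array([0.1, 0.0])))  # ball 10 cm off-center, at rest
```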