Projects

FORGE: Program Synthesis for Robotic Manipulation

Tech Stack: Python, LLMs, MuJoCo, UR5 Manipulator

Co-developed an LLM-based program synthesis framework for compliance-aware robotic manipulation without gradient updates.
Achieved 1.6 mm peg-in-hole accuracy with a UR5 arm via code-as-policy generation.
Presented at the Southwest Robotics Symposium (SWRS) 2025.

ProPS: Prompted Policy Search

Tech Stack: Python, PyTorch, LLMs, Reinforcement Learning

Built an LLM-based optimizer that injects linguistic reasoning into numerical and RL policy optimization.
Outperformed PPO and TRPO on 8 of 15 benchmarks.
Used random projections to reduce neural network dimensionality by 70% and speed up convergence.

Hypernetwork-based Optimizer (HyOpt)

Tech Stack: Python, PyTorch, Gemma LLM, LoRA, Flash Attention 2

Fine-tuned Gemma with distributed training on 4xA100 GPUs for semantically enhanced hypernetwork optimization.
Reached convergence across 10+ numerical and RL domains using SFT over a 1.6B token dataset.
Accelerated fine-tuning by 50% with quantization, LoRA, and reasoning distillation.

Shared Control UR5 for In-Space Welding

Tech Stack: Python, MuJoCo, OpenCV, ArUco Markers, Inverse Dynamics

Designed a human-in-the-loop simulation for orbital construction with 6-DOF SpaceMouse teleoperation.
Implemented marker-based localization for metallic workpiece pose estimation.
Built semi-autonomous Grab and Weld modes for stable power-grip alignment.

Humanoid Quadruped Motion Planning (Learning to Crawl)

Tech Stack: Python, MuJoCo, PCA, CMA-ES, Dynamic Movement Primitives

Reduced 29 joint DOF to a rank-2 latent space using PCA for faster trajectory search.
Generated stable crawl trajectories using DMPs optimized by CMA-ES.
Showed low-dimensional latent optimization can synthesize periodic crawling behavior effectively.

Autonomous Indoor Navigation

Tech Stack: Python, ROS 2, Gazebo, TurtleBot3, Q-Learning

Designed discrete state-action navigation from LiDAR sectors and spatial zones.
Implemented Q-learning with reward shaping for distance, direction, and collision behavior.
Trained for 400 episodes and improved target completion with fewer collisions.

Robotic Chess Player using Foundation Models

Tech Stack: Python, UR5, Pi0 VLA, Inverse Kinematics

Prototyped a robotic chess player combining vision-language-action models with contact-rich actuation.
Parsed PGN strings to translate high-level chess moves into low-level robotic motion plans.
Fine-tuned foundation models with IK-generated demonstrations for precise piece manipulation.

Self-Supervised Point Tracking in Turbulent Videos

Tech Stack: Python, PyTorch, DINOv2, RAFT, Restormer, QuickTurbSim

Enhanced DINO-Tracker with RAFT-based refinement for robust dense point tracking under atmospheric distortion.
Built a turbulence benchmark by augmenting TAP-Vid with QuickTurbSim.
Integrated Restormer-based stabilization and turbulence strength estimation from temporal displacement.

Cross-Geography Generalization for Flood Segmentation

Tech Stack: Python, Deep Learning, Computer Vision, UAV Imagery

Developed segmentation architectures for flooded region extraction in UAV aerial imagery.
Focused on domain generalization for accuracy across varying geographies and conditions.
Published findings for disaster response workflows.

Multiclass Classification and Verification of Online Signatures

Tech Stack: Python, Scikit-Learn, SVM, Signal Processing

Built a signature verification system on time-series dynamics for owner classification and forgery detection.
Applied the Ramer-Douglas-Peucker algorithm to reduce the number of trajectory points while preserving the overall signal shape.

Implementation of Genetic Algorithm for Path Traversal

Tech Stack: JavaScript, p5.js

Developed an interactive 2D obstacle-avoidance simulator using an evolutionary path-finding strategy.
Inspired by concepts from The Nature of Code.
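
The evolutionary strategy can be sketched roughly as follows. This is a minimal Python illustration rather than the project's p5.js code: the grid world, the move encoding, the `fitness` definition, and every parameter value are assumptions made up for the example.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # hypothetical move set on a grid


def simulate(genome, start, obstacles):
    """Follow the moves in a genome; stop early when an obstacle is hit."""
    x, y = start
    for gene in genome:
        dx, dy = MOVES[gene]
        nxt = (x + dx, y + dy)
        if nxt in obstacles:
            break
        x, y = nxt
    return x, y


def fitness(genome, start, target, obstacles):
    """Higher is better: negative Manhattan distance from end point to target."""
    x, y = simulate(genome, start, obstacles)
    return -(abs(x - target[0]) + abs(y - target[1]))


def evolve(start, target, obstacles, pop_size=40, genome_len=20,
           generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(4) for _ in range(genome_len)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, start, target, obstacles), reverse=True)
        history.append(fitness(pop[0], start, target, obstacles))
        parents = pop[: pop_size // 2]   # truncation selection
        children = [pop[0][:]]           # elitism: keep the best genome unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # single-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(4) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda g: fitness(g, start, target, obstacles))
    return best, history


OBSTACLES = {(3, 0), (3, 1), (3, 2)}
best, history = evolve(start=(0, 0), target=(6, 4), obstacles=OBSTACLES)
```

Because the best genome is copied into the next generation unchanged, the best fitness recorded in `history` never decreases from one generation to the next.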

Neural Feature Extraction and Semantic Video Retrieval

Tech Stack: Python, ResNet, OpenCV, Scikit-Learn, Streamlit

Built a video retrieval engine using ResNet, HOG, and color histograms with latent semantic reduction via SVD, PCA, and KMeans.
Implemented relevance feedback using Decision Trees and KNN to refine rankings from user input.
Used spectral clustering and MDS embeddings to visualize semantic groups of videos.

Robotics & Reinforcement Learning

Hypernetwork-based Optimizer

Tech Stack: Python, PyTorch, Gemma LLM, LoRA, Flash Attention 2

Developed semantically enhanced Hypernetwork-based Optimizers by fine-tuning the Gemma LLM with Distributed Data Parallelism on 4xA100 GPUs.
Achieved convergence on 10+ numerical and Reinforcement Learning domains using Supervised Fine-Tuning (SFT) on a 1.6B token dataset.
Accelerated LLM fine-tuning by 50% through Quantization, LoRA, and reasoning distillation, integrating Flash Attention 2 for a 2x faster evaluation pipeline.

FORGE: Program Synthesis for Robotic Manipulation

Tech Stack: Python, LLMs, MuJoCo, UR5 Manipulator

Co-developed an LLM-based program synthesis framework for compliance-aware robotic manipulation without requiring gradient updates.
Achieved 1.6 mm peg-in-hole accuracy with a UR5 robot arm by generating code-as-policy.
Presented at the Southwest Robotics Symposium (SWRS) 2025.

ProPS: Prompted Policy Search

Tech Stack: Python, PyTorch, LLMs, Reinforcement Learning

Co-developed an LLM-based optimizer that incorporates linguistic reasoning for numerical and RL policy optimization.
Achieved State-of-the-Art (SOTA) results by outperforming PPO and TRPO on 8 of 15 benchmarks.
Implemented random projection techniques to reduce neural network dimensionality by 70%, facilitating faster convergence for high-dimensional optimization tasks.
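
One common way to use random projections for this kind of search is sketched below. How ProPS applies them is not described here, so the setup is an assumption: a fixed Gaussian matrix maps a low-dimensional search vector back to the full parameter vector, and the sizes, `make_projection`, and `expand` are hypothetical names chosen for illustration.

```python
import random


def make_projection(full_dim, low_dim, seed=0):
    """Fixed Gaussian random matrix (full_dim x low_dim), scaled by 1/sqrt(low_dim)."""
    rng = random.Random(seed)
    scale = 1.0 / low_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(low_dim)]
            for _ in range(full_dim)]


def expand(z, projection, theta0):
    """Map a low-dimensional search vector z back to full policy parameters."""
    return [t + sum(p * zi for p, zi in zip(row, z))
            for row, t in zip(projection, theta0)]


full_dim, low_dim = 100, 30      # hypothetical sizes: 70% fewer search dimensions
projection = make_projection(full_dim, low_dim)
theta0 = [0.0] * full_dim        # hypothetical initial policy parameters
z = [0.1] * low_dim              # an optimizer would propose vectors like this
theta = expand(z, projection, theta0)
```

The optimizer only ever proposes the 30 values in `z`; the policy still receives all 100 parameters through `expand`.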

Shared Control UR5 for In-Space Welding

Tech Stack: Python, MuJoCo, OpenCV, ArUco Markers, Inverse Dynamics

Designed a human-in-the-loop simulation for orbital construction, featuring a 6-DOF SpaceMouse controller and force-based inverse dynamics for a UR5 manipulator.
Implemented a computer vision pipeline using ArUco markers for real-time localization and pose estimation of metallic workpieces.
Developed semi-autonomous “Grab” and “Weld” modes with power-grip stabilization, enabling precise end-effector alignment in a zero-gravity environment.

Humanoid Quadruped Motion Planning

Tech Stack: Python, MuJoCo, PCA, CMA-ES, Dynamic Movement Primitives (DMP)

Implemented a latent space planning approach for Unitree G1 robot locomotion, reducing 29 joint degrees-of-freedom to a rank-2 latent space using PCA.
Generated stable crawl trajectories using Dynamic Movement Primitives (DMPs) optimized via Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
Demonstrated that low-dimensional latent optimization could effectively synthesize complex, periodic crawling behaviors without full-body policy training.
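
The PCA reduction step can be illustrated with a short NumPy sketch. The pose data here is synthetic and constructed to lie on a 2-D subspace, which real robot data would not do exactly; `fit_pca`, `encode`, and `decode` are hypothetical helper names, not the project's code.

```python
import numpy as np


def fit_pca(poses, rank=2):
    """Return the mean pose and the top `rank` principal directions (as rows)."""
    mean = poses.mean(axis=0)
    _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, vt[:rank]


def encode(pose, mean, components):
    """Project a full joint configuration onto the latent space."""
    return (pose - mean) @ components.T


def decode(latent, mean, components):
    """Map a latent point back to a full joint configuration."""
    return latent @ components + mean


rng = np.random.default_rng(0)
# Hypothetical stand-in for recorded 29-DOF joint configurations: 200 poses on a
# 2-D subspace, so a rank-2 PCA reconstructs them almost exactly.
basis = rng.normal(size=(2, 29))
poses = rng.normal(size=(200, 2)) @ basis + rng.normal(size=29)

mean, components = fit_pca(poses)
latent = encode(poses[0], mean, components)       # shape (2,)
joint_angles = decode(latent, mean, components)   # shape (29,)
```

A trajectory optimizer can then search over the two latent coordinates and call `decode` to obtain joint targets for all 29 joints.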

Autonomous Indoor Navigation

Tech Stack: Python, ROS 2, Gazebo, TurtleBot3, Q-Learning

Designed a discrete state-action space navigation system by segmenting LiDAR data into spatial sectors and zones.
Implemented a Q-learning algorithm with a custom reward structure (optimizing for distance, direction, and collision avoidance), training the agent over 400 episodes.
Achieved successful target navigation with reduced collision rates by iteratively refining the policy based on 144 unique state representations.
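
The tabular Q-learning update behind this kind of agent can be sketched as follows. The 144 states come from the description above, but how sectors and zones combine into a state index is not given, so states are plain integers here; the action count and the `alpha`, `gamma`, and `epsilon` values are assumptions.

```python
import random

N_STATES = 144   # from the description; the state encoding itself is not shown
N_ACTIONS = 3    # hypothetical: e.g. turn left, go straight, turn right


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy selection over the Q-table row for this state."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q_table[state]
    return row.index(max(row))


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
rng = random.Random(0)
action = choose_action(q_table, state=10, epsilon=0.1, rng=rng)
q_update(q_table, state=10, action=action, reward=1.0, next_state=11)
```

In a full training loop the reward would combine the distance, direction, and collision terms, and the update would run once per simulation step across all episodes.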

Robotic Chess Player using Foundation Models

Tech Stack: Python, UR5, Pi0 VLA, Inverse Kinematics

Prototyped a robotic chess player using a UR5 manipulator and Vision-Language-Action (VLA) models, integrating perception with contact-rich actuation.
Implemented language-conditioned control by parsing PGN (Portable Game Notation) strings to align high-level chess moves with low-level robotic motion planning.
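
A simplified view of the move-to-motion mapping is sketched below. PGN records moves in Standard Algebraic Notation, which names only the destination square, so a complete system would also need the board state to find the piece's source square. The square size, the a1-corner origin, and both helper functions are assumptions for illustration.

```python
import re

SQUARE_SIZE_M = 0.05  # hypothetical square width in metres

# Destination square, optionally followed by a promotion and a check/mate marker.
_DEST = re.compile(r"([a-h][1-8])(?:=[QRBN])?[+#]?$")


def destination_square(san_move):
    """Return the destination square of a SAN move, or None for castling."""
    match = _DEST.search(san_move)
    return match.group(1) if match else None


def square_to_xy(square, square_size=SQUARE_SIZE_M):
    """Centre of a square in metres, measured from the outer corner of a1."""
    file_index = ord(square[0]) - ord("a")
    rank_index = int(square[1]) - 1
    return ((file_index + 0.5) * square_size, (rank_index + 0.5) * square_size)


for move in ["e4", "Nf3", "exd5", "e8=Q+", "O-O"]:
    square = destination_square(move)
    print(move, square, square_to_xy(square) if square else None)
```

The resulting board-frame coordinates would still have to be transformed into the robot's base frame before being passed to an inverse kinematics solver.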
Fine-tuned foundation models with inverse kinematics-generated demonstrations to handle precise piece manipulation.

Computer Vision & Multimedia

Self-Supervised Point Tracking in Turbulent Videos

Tech Stack: Python, PyTorch, DINOv2, RAFT, Restormer, QuickTurbSim

Engineered a dense point tracking system robust to atmospheric distortion by enhancing the DINO-Tracker architecture with RAFT-based optical flow refinement.
Created a novel benchmark for turbulent video tracking by augmenting the TAP-Vid dataset using QuickTurbSim.
Developed a pipeline to estimate turbulence strength from the temporal displacement of tracked points, and integrated Restormer for video stabilization.
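
One simple displacement-based proxy is sketched below. It is not the project's estimator: it assumes a static scene and camera, so that any motion of a tracked point around its mean position is attributed to turbulence, and `displacement_rms` and `turbulence_proxy` are hypothetical names.

```python
import math


def displacement_rms(track):
    """RMS distance of one track's positions from its temporal mean position."""
    n = len(track)
    mean_x = sum(x for x, _ in track) / n
    mean_y = sum(y for _, y in track) / n
    return math.sqrt(
        sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in track) / n
    )


def turbulence_proxy(tracks):
    """Average the per-track RMS jitter (in pixels) over all tracked points."""
    return sum(displacement_rms(t) for t in tracks) / len(tracks)


# Two toy tracks: one perfectly still, one oscillating by one pixel in x.
still = [(10.0, 10.0)] * 4
jitter = [(9.0, 10.0), (11.0, 10.0), (9.0, 10.0), (11.0, 10.0)]
score = turbulence_proxy([still, jitter])
```

A larger score indicates stronger apparent distortion; mapping it to a physical turbulence parameter would need calibration that this sketch does not attempt.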

Neural Feature Extraction and Semantic Video Retrieval

Tech Stack: Python, ResNet, OpenCV, Scikit-Learn, Streamlit

Designed a video retrieval engine using ResNet, HOG, and Color Histograms, reducing high-dimensional features into latent semantics via SVD, PCA, and KMeans.
Implemented relevance feedback mechanisms (Decision Trees and KNN) to iteratively refine search rankings based on user input.
Performed spectral clustering to visualize semantic groupings of videos, interpreting inter-label similarities using MDS embeddings.
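
The KNN side of the relevance feedback loop can be illustrated with a small standard-library sketch. The 2-D feature vectors, the video ids, and the scoring rule (fraction of relevant neighbours) are assumptions for the example, not the project's implementation.

```python
import math


def knn_relevance(candidate, labelled, k=3):
    """Fraction of the k nearest labelled videos that the user marked relevant."""
    nearest = sorted(labelled, key=lambda item: math.dist(candidate, item[0]))[:k]
    return sum(1 for _, relevant in nearest if relevant) / len(nearest)


def rerank(candidates, labelled, k=3):
    """Order candidate video ids by KNN relevance score, highest first."""
    scored = {vid: knn_relevance(vec, labelled, k) for vid, vec in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical 2-D latent features; real ones would come from the reduced descriptors.
labelled = [((0.0, 0.0), True), ((0.1, 0.2), True), ((0.2, 0.0), True),
            ((5.0, 5.0), False), ((5.1, 4.9), False), ((4.8, 5.2), False)]
candidates = {"video_a": (4.9, 5.0), "video_b": (0.1, 0.1)}
ranking = rerank(candidates, labelled)
```

Each round of user feedback adds entries to `labelled`, so the ranking can be recomputed after every interaction.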

Cross-Geography Generalization for Flood Segmentation

Tech Stack: Python, Deep Learning, Computer Vision, UAV Imagery

Developed neural network architectures for the segmentation of flooded regions in UAV-captured aerial images.
Focused on domain generalization to ensure the model performed accurately across different geographical terrains and conditions.

Multiclass Classification and Verification of Online Signatures

Tech Stack: Python, Scikit-Learn, SVM, Signal Processing

Developed a signature verification tool using Support Vector Machines (SVM) on time-series data to detect forgeries and classify owners.
Optimized model efficiency by implementing the Ramer-Douglas-Peucker algorithm to reduce feature dimensions with minimal loss of signal shape.
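
For reference, a compact version of the Ramer-Douglas-Peucker simplification is sketched below on a toy 2-D stroke. Points that lie within `epsilon` of the simplified curve are dropped; the stroke data and the `epsilon` value are made up for illustration.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def rdp(points, epsilon):
    """Simplify a polyline, keeping points that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    distances = [_point_line_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    index = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[index - 1] <= epsilon:
        return [points[0], points[-1]]
    left = rdp(points[: index + 1], epsilon)   # segment up to the farthest point
    right = rdp(points[index:], epsilon)       # segment from the farthest point
    return left[:-1] + right                   # avoid duplicating the split point


stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
simplified = rdp(stroke, epsilon=0.1)
# simplified == [(0, 0), (2, 0), (2, 2)]
```

The corner at (2, 0) survives because it is far from the straight line between the end points, while the collinear points on each leg are removed.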

Pratyush Kerhalkar
