In this work, we investigate a skill-based framework for humanoid box rearrangement that enables long-horizon execution by sequencing reusable skills at the task level.
In our architecture, all skills execute through a shared, task-agnostic whole-body controller (WBC), providing a consistent closed-loop interface for skill composition,
in contrast to non-shared designs that use separate low-level controllers per skill. Additionally, we introduce Humanoid Hanoi,
a Tower-of-Hanoi box rearrangement benchmark for evaluating long-horizon performance, and report results in simulation and on the Digit V3 humanoid robot, demonstrating fully autonomous
rearrangement over extended horizons and quantifying the benefits of the shared-WBC approach over non-shared baselines.
Under Review
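
To make the shared-controller design above concrete, the following minimal Python sketch shows reusable skills sequenced at the task level through a single, task-agnostic WBC interface. All names here (Skill, SharedWBC, WalkTo, PickBox, TaskSpaceTarget) are illustrative placeholders assumed for exposition, not the interfaces used in the actual system.

from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

@dataclass
class TaskSpaceTarget:
    """References handed to the shared WBC for one motion segment."""
    base_pose: Optional[Tuple[float, float, float]]  # (x, y, yaw) target for the floating base
    hand_pose: Optional[Tuple[float, float, float]]  # optional end-effector position target
    duration: float                                  # seconds allotted to reach the target

class Skill:
    """A reusable skill is just a generator of task-space targets."""
    def targets(self) -> Iterator[TaskSpaceTarget]:
        raise NotImplementedError

class WalkTo(Skill):
    def __init__(self, xy_yaw):
        self.xy_yaw = xy_yaw
    def targets(self):
        yield TaskSpaceTarget(base_pose=self.xy_yaw, hand_pose=None, duration=5.0)

class PickBox(Skill):
    def __init__(self, grasp_pos):
        self.grasp_pos = grasp_pos
    def targets(self):
        yield TaskSpaceTarget(base_pose=None, hand_pose=self.grasp_pos, duration=3.0)

class SharedWBC:
    """Task-agnostic low-level controller: every skill goes through track()."""
    def track(self, target: TaskSpaceTarget) -> bool:
        # A real WBC would solve a whole-body control problem in closed loop
        # here; this stub only reports the commanded target.
        print(f"WBC tracking {target}")
        return True

def execute_plan(skills, wbc):
    """Sequence skills at the task level; all of them share one WBC."""
    for skill in skills:
        for target in skill.targets():
            if not wbc.track(target):
                return False          # a tracking failure would trigger replanning
    return True

if __name__ == "__main__":
    plan = [WalkTo((1.0, 0.0, 0.0)), PickBox((1.2, 0.0, 0.8)), WalkTo((0.0, 1.0, 1.57))]
    execute_plan(plan, SharedWBC())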
We introduce SAGE, a unified multi-view transformer that bridges the gap between geometric scene reconstruction and semantic object understanding.
SAGE utilizes a cross-view attention mechanism to enforce 3D geometric consistency across multiple camera angles.
By integrating CLIP embeddings, SAGE enables open-vocabulary object
grounding from text queries without task-specific retraining. The architecture incorporates a differentiable Direct Linear
Transform (DLT) for camera estimation and a differentiable Non-Maximum Suppression (NMS) layer to resolve geometric symmetries.
On an NVIDIA GeForce RTX 3090 Ti, SAGE achieves up to 18 Hz for single-view and over 10 Hz for two-view 256×256 inputs,
providing an efficient and modular perception framework for robotic manipulation.
Under Review
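
As a rough illustration of the cross-view attention idea in SAGE, the PyTorch sketch below fuses patch tokens from multiple camera views in a single attention block so that features can be shared across viewpoints. The shapes, module names, and block structure are assumptions for exposition, not SAGE's actual architecture.

import torch
import torch.nn as nn

class CrossViewBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens):
        # tokens: (batch, views, tokens_per_view, dim). Flattening the view
        # axis lets every patch token attend to tokens from all other views.
        b, v, n, d = tokens.shape
        x = tokens.reshape(b, v * n, d)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x.reshape(b, v, n, d)

if __name__ == "__main__":
    views = torch.randn(1, 2, 196, 256)   # two views, 196 patch tokens each (illustrative)
    fused = CrossViewBlock()(views)
    print(fused.shape)                    # torch.Size([1, 2, 196, 256])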
We present ASM-6D, a framework for real-time category-level 6D pose, shape, and scale estimation from partial RGB-D observations.
We formulate a unified optimization problem over the SIM(3) manifold and introduce a high-throughput Matrix ADMM solver.
Our approach concurrently estimates an object’s pose and shape at frequencies exceeding 100 Hz, with sub-linear
scaling with respect to batch size and active shape model complexity. For learning-based keypoint detector frontends,
we provide a pipeline for generating geometrically consistent keypoints, enabling the training of robust detection models
without manual labeling. We demonstrate the system's reliability through global optimality certification via SDP duality
and show resilience to both stochastic noise and occlusions. ASM-6D provides a mathematically rigorous and computationally
efficient foundation for real-time, closed-loop robotic manipulation.
Under Review
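
For intuition about ADMM over the SIM(3) manifold, the NumPy sketch below solves a SIM(3)-constrained point registration problem with a generic consensus-style splitting: an unconstrained quadratic update, a projection onto scaled rotations, and a dual ascent step. It is a textbook-style illustration, not the Matrix ADMM solver or active-shape formulation used in ASM-6D.

import numpy as np

def project_sim3(M):
    """Project a 3x4 matrix [B | t] onto {[sR | t] : R in SO(3), s > 0}."""
    B, t = M[:, :3], M[:, 3]
    U, S, Vt = np.linalg.svd(B)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    s = max(np.trace(D @ np.diag(S)) / 3.0, 1e-9)    # Frobenius-optimal scale
    return np.hstack([s * R, t[:, None]])

def admm_sim3_registration(P, Q, rho=1.0, iters=100):
    """P, Q: (N, 3) source/target points. Returns a 3x4 transform [sR | t]."""
    Pt = np.hstack([P, np.ones((len(P), 1))]).T      # homogeneous source, 4 x N
    Qt = Q.T                                         # target, 3 x N
    A = 2.0 * Qt @ Pt.T                              # data term, 3 x 4
    H = 2.0 * Pt @ Pt.T + rho * np.eye(4)            # regularized normal matrix
    Z = np.hstack([np.eye(3), np.zeros((3, 1))])     # consensus variable on SIM(3)
    U = np.zeros((3, 4))                             # scaled dual variable
    for _ in range(iters):
        X = (A + rho * (Z - U)) @ np.linalg.inv(H)   # unconstrained quadratic update
        Z = project_sim3(X + U)                      # project onto scaled rotations
        U = U + X - Z                                # dual ascent on the consensus gap
    return Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(200, 3))
    R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R_true *= np.sign(np.linalg.det(R_true))         # ensure a proper rotation
    Q = 1.5 * P @ R_true.T + np.array([0.2, -0.1, 0.3])
    T = admm_sim3_registration(P, Q)
    print("recovered scale:", round(float(np.cbrt(np.linalg.det(T[:, :3]))), 3))  # ~1.5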
This research introduces a real-time framework for simultaneous 6D object pose tracking and shape estimation in dynamic, real-world environments.
The system combines Active Shape Models with ADMM-based point cloud registration to handle occlusions, deformations, and shape variability without requiring CAD models.
A novel Stein Variational Gradient Descent (SVGD)-based multi-hypothesis tracker over SIM(3) ensures robustness under symmetry and noise. Running at 100–200 Hz on a GPU,
this framework enables reliable object-centric perception and closed-loop manipulation in fast-changing, human-centric tasks.
Accepted to
Equivariant Systems: Theory and Applications in State Estimation, Artificial Intelligence and Control Workshop at RSS 2025
Accepted to
TC Virtual Poster Session and Networking Event 2025
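
The NumPy sketch below illustrates the SVGD update that underlies the multi-hypothesis tracker above: particles follow a kernel-weighted score with a repulsive term, so several pose hypotheses survive under ambiguity. The flat parameter space and the stand-in bimodal target are simplifications assumed for illustration; the actual tracker operates on SIM(3) with a registration-based likelihood.

import numpy as np

def rbf_kernel(X, h=None):
    """RBF kernel matrix K and its gradient with respect to the first argument."""
    diffs = X[:, None, :] - X[None, :, :]             # (n, n, d), diffs[j, i] = x_j - x_i
    sq = np.sum(diffs ** 2, axis=-1)                  # squared pairwise distances
    if h is None:                                     # median bandwidth heuristic
        h = np.median(sq) / np.log(len(X) + 1.0) + 1e-8
    K = np.exp(-sq / h)
    grad_K = (-2.0 / h) * diffs * K[:, :, None]       # d k(x_j, x_i) / d x_j
    return K, grad_K

def svgd_step(X, score_fn, step=0.1):
    """One SVGD update: kernel-weighted scores plus a repulsive term."""
    K, grad_K = rbf_kernel(X)
    scores = score_fn(X)                              # (n, d) gradients of log-density
    phi = (K @ scores + grad_K.sum(axis=0)) / len(X)
    return X + step * phi

if __name__ == "__main__":
    # Stand-in bimodal target: two unit Gaussians, mimicking a two-fold pose
    # ambiguity caused by object symmetry.
    mus = np.array([[-2.0, 0.0], [2.0, 0.0]])
    def score(X):
        d = X[:, None, :] - mus[None, :, :]           # (n, modes, d)
        w = np.exp(-0.5 * np.sum(d ** 2, axis=-1))
        w = w / w.sum(axis=1, keepdims=True)
        return -(w[:, :, None] * d).sum(axis=1)       # grad log of the mixture
    particles = np.random.default_rng(0).normal(size=(64, 2))
    for _ in range(300):
        particles = svgd_step(particles, score)
    left = int((particles[:, 0] < 0).sum())
    print(f"hypotheses per mode: {left} left, {len(particles) - left} right")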
This research presents a vision-based hierarchical control framework to enhance locomotion in unstructured terrains.
The framework integrates a reinforcement learning (RL) footstep planner, which generates adaptive footsteps using a robot-centric
elevation map and the state of the Angular Momentum Linear Inverted Pendulum (ALIP) model as policy inputs,
with a low-level Operational Space Controller (OSC) that tracks the planned trajectories.
We validate our approach on the underactuated bipedal robot Cassie and evaluate it across diverse terrain conditions
in both simulation and hardware experiments.
Accepted to
2025 IEEE-RAS 24th International Conference on Humanoid Robots [Oral Presentation]
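
To illustrate the hierarchical interface in this locomotion framework, the PyTorch sketch below shows a footstep policy that consumes a robot-centric elevation map together with an ALIP state vector and emits a footstep command handed to a low-level OSC stub. Network sizes, observation layout, and the OSC placeholder are assumptions for exposition, not the trained policy or controller from the paper.

import torch
import torch.nn as nn

class FootstepPolicy(nn.Module):
    def __init__(self, map_size=32, alip_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(                  # elevation-map encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        enc_dim = 32 * (map_size // 4) ** 2
        self.head = nn.Sequential(                     # fuse map features with ALIP state
            nn.Linear(enc_dim + alip_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),                         # (dx, dy, dyaw) footstep command
        )

    def forward(self, elevation_map, alip_state):
        feat = self.encoder(elevation_map)
        return self.head(torch.cat([feat, alip_state], dim=-1))

def osc_track(footstep, swing_time=0.4):
    """Placeholder for the low-level OSC that generates and tracks the
    swing-foot and center-of-mass trajectories for the commanded step."""
    print(f"OSC tracking footstep {footstep.tolist()} over {swing_time}s")

if __name__ == "__main__":
    policy = FootstepPolicy()
    elev = torch.randn(1, 1, 32, 32)                   # robot-centric height map
    alip = torch.randn(1, 4)                           # e.g., CoM position and angular momentum
    step_cmd = policy(elev, alip)[0]
    osc_track(step_cmd.detach())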