Mukul Khanna | publications

2024

GOAT-Bench

GOAT-Bench: A Benchmark for Multi-modal Lifelong Navigation 🤖

Mukul Khanna*, Ram Ramrakhya*, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Devendra Singh Chaplot, Zsolt Kira, Dhruv Batra, and Roozbeh Mottaghi

CVPR 2024

arXiv WEBSITE Code

HSSD

Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation 🏘️

Mukul Khanna*, Yongsen Mao*, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, and Manolis Savva

CVPR 2024

arXiv WEBSITE Code

GOAT

GOAT: GO to Any Thing 🤖🐐

Matthew Chang*, Theophile Gervet*, Mukul Khanna*, Sriram Yenamandra*, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, and Devendra Singh Chaplot

RSS 2024

arXiv WEBSITE

2023

OVMM

HomeRobot: Open Vocab Mobile Manipulation 🤖

Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alex William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, and Chris Paxton

Conference on Robot Learning (CoRL), NeurIPS Competition Track 2023

arXiv WEBSITE Code

2022

EMQA

Episodic Memory Question Answering 🤖 🎞️

Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, and Devi Parikh

CVPR 2022

WEBSITE PDF

DeepHS-HDRV

DeepHS-HDRVideo: Deep High Speed High Dynamic Range Video Reconstruction 📸

Zeeshan Khan, Parth Shettiwar, Mukul Khanna, and Shanmuganathan Raman

International Conference on Pattern Recognition (ICPR) 2022

arXiv

2021

BF2NormalNet

Building Facades to Normal Maps: Adversarial Learning from Single View Images 🏢

Mukul Khanna, Tanu Sharma, Ayyappa Swamy Thatavarthy, and K. Madhava Krishna

Conference on Robots and Vision (CRV) 2021

Abs WEBSITE Code

Surface normal estimation is an essential component of several computer and robot vision pipelines. While this problem has been extensively studied, most approaches are geared towards indoor scenes and often rely on multiple modalities (depth, multiple views) for accurate estimation of normal maps. Outdoor scenes pose a greater challenge: they exhibit significant lighting variation, often contain occluders, and include structures such as building facades that are riddled with windows and protrusions. Conventional supervised learning schemes excel in indoor scenes but do not exhibit competitive performance when trained and deployed in outdoor environments. Furthermore, they involve complex network architectures and require many more trainable parameters. To tackle these challenges, we present an adversarial learning scheme that regularizes the output normal maps from a neural network to appear more realistic, using a small number of precisely annotated examples. Our method uses a simpler, lightweight architecture while improving performance by at least 1.5x across most metrics. We evaluate our approach against the state of the art in normal map estimation on a synthetic and a real outdoor dataset, and observe significant performance improvements.

2019

FHDR

FHDR: HDR Image Reconstruction from a Single LDR Image using Feedback Network 📸

Zeeshan Khan, Mukul Khanna, and Shanmuganathan Raman

GlobalSIP 2019

Abs arXiv Code

High dynamic range (HDR) image generation from a single-exposure low dynamic range (LDR) image has been made possible by recent advances in deep learning. Various feed-forward Convolutional Neural Networks (CNNs) have been proposed for learning LDR-to-HDR representations. To better utilize the power of CNNs, we exploit the idea of feedback, where the initial low-level features are guided by the high-level features using the hidden state of a Recurrent Neural Network. Unlike the single forward pass of a conventional feed-forward network, the reconstruction from LDR to HDR in a feedback network is learned over multiple iterations. This enables us to create a coarse-to-fine representation, leading to an improved reconstruction at every iteration. Advantages over standard feed-forward networks include early reconstruction ability and better reconstruction quality with fewer network parameters. We design a dense feedback block and propose an end-to-end feedback network, FHDR, for HDR image generation from a single-exposure LDR image. Qualitative and quantitative evaluations show the superiority of our approach over state-of-the-art methods.

URSIM

Open Source Simulator for Unmanned Underwater Vehicles using ROS and Unity3D 🐟

Pushkal Katara, Mukul Khanna, Harshit Nagar, and A. Panaiyappan

Underwater Technology (UT) 2019

Abs WEBSITE Code

The paper presents URSim, an open-source 3D underwater simulation framework for Unmanned Underwater Vehicles (UUVs) developed using the Robot Operating System (ROS) and Unity3D, a real-time game engine. Simulation systems like this make it possible to implement, test, study, and analyze complex systems while minimizing cost and disruption to the environment. URSim gives the user an intuitive way to simulate underwater vehicles and robots. It is capable of simulating feedback control systems, dynamic models, underwater vision, and mission planning for underwater vehicles and robots. The simulation supports underwater sensor modules, underwater physics, and collision kinematics, and is highly configurable to simulate a realistic underwater environment. The software architecture is adaptable to algorithms for control systems, image processing, navigation, and manipulation.