publications
2024
-
GOAT-Bench
GOAT-Bench: A Benchmark for Multi-modal Lifelong Navigation 🤖
Mukul Khanna*, Ram Ramrakhya*, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Devendra Singh Chaplot, Zsolt Kira, Dhruv Batra, and Roozbeh Mottaghi
CVPR 2024
-
HSSD
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation 🏘️
Mukul Khanna*, Yongsen Mao*, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, and Manolis Savva
CVPR 2024
-
GOAT
GOAT: GO to Any Thing 🤖🐐
Matthew Chang*, Theophile Gervet*, Mukul Khanna*, Sriram Yenamandra*, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, and Devendra Singh Chaplot
RSS 2024
2023
-
OVMM
HomeRobot: Open Vocab Mobile Manipulation 🤖
Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alex William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, and Chris Paxton
Conference on Robot Learning (CoRL), NeurIPS Competition Track 2023
2022
-
EMQA
Episodic Memory Question Answering 🤖 🎞️
Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, and Devi Parikh
CVPR 2022
-
DeepHS-HDRV
DeepHS-HDRVideo: Deep High Speed High Dynamic Range Video Reconstruction 📸
Zeeshan Khan, Parth Shettiwar, Mukul Khanna, and Shanmuganathan Raman
International Conference on Pattern Recognition (ICPR) 2022
2021
-
BF2NormalNet
Building Facades to Normal Maps: Adversarial Learning from Single View Images 🏢
Mukul Khanna, Tanu Sharma, Ayyappa Swamy Thatavarthy, and K. Madhava Krishna
Conference on Robots and Vision (CRV) 2021
Surface normal estimation is an essential component of several computer and robot vision pipelines. While this problem has been extensively studied, most approaches are geared towards indoor scenes and often rely on multiple modalities (depth, multiple views) for accurate estimation of normal maps. Outdoor scenes pose a greater challenge: they exhibit significant lighting variation, often contain occluders, and include structures such as building facades that are riddled with windows and protrusions. Conventional supervised learning schemes excel in indoor scenes, but do not perform competitively when trained and deployed in outdoor environments. Furthermore, they involve complex network architectures and require many more trainable parameters. To tackle these challenges, we present an adversarial learning scheme that regularizes the output normal maps of a neural network to appear more realistic, using a small number of precisely annotated examples. Our method uses a simpler, lightweight architecture while improving performance by at least 1.5x across most metrics. We evaluate our approach against state-of-the-art methods for normal map estimation on a synthetic and a real outdoor dataset, and observe significant performance improvements.
2019
-
FHDR
FHDR: HDR Image Reconstruction from a Single LDR Image using Feedback Network 📸
Zeeshan Khan, Mukul Khanna, and Shanmuganathan Raman
GlobalSIP 2019
High dynamic range (HDR) image generation from a single-exposure low dynamic range (LDR) image has been made possible by recent advances in deep learning. Various feed-forward Convolutional Neural Networks (CNNs) have been proposed for learning LDR-to-HDR representations. To better utilize the power of CNNs, we exploit the idea of feedback, where the initial low-level features are guided by the high-level features using the hidden state of a Recurrent Neural Network. Unlike the single forward pass of a conventional feed-forward network, the reconstruction from LDR to HDR in a feedback network is learned over multiple iterations. This enables us to create a coarse-to-fine representation, leading to an improved reconstruction at every iteration. Advantages over standard feed-forward networks include early reconstruction ability and better reconstruction quality with fewer network parameters. We design a dense feedback block and propose an end-to-end feedback network, FHDR, for HDR image generation from a single-exposure LDR image. Qualitative and quantitative evaluations show the superiority of our approach over state-of-the-art methods.
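The feedback idea in this abstract can be illustrated with a toy NumPy sketch. It is not the FHDR architecture: the function name `feedback_reconstruct`, the dense weight matrices, the layer sizes, and the `tanh` nonlinearity are all assumptions chosen for brevity, and the weights are random rather than trained. It only shows the control flow, in which low-level features are computed once, a hidden state carries high-level features back into the next pass, and each iteration emits its own reconstruction.

```python
import numpy as np

def feedback_reconstruct(ldr, w_in, w_fb, w_out, num_iterations=4):
    """Run a toy feedback loop and return one output per iteration."""
    # Low-level features are extracted from the input a single time.
    low = np.tanh(ldr @ w_in)
    # The hidden state holds high-level features from the previous pass.
    hidden = np.zeros_like(low)
    outputs = []
    for _ in range(num_iterations):
        # The previous pass's high-level features guide the low-level ones.
        hidden = np.tanh(low + hidden @ w_fb)
        # Every iteration produces a reconstruction (early output is possible).
        outputs.append(hidden @ w_out)
    return outputs

# Hypothetical sizes: 8 pixels, 3 channels, 16 feature dimensions.
rng = np.random.default_rng(0)
ldr = rng.random((8, 3))
w_in = rng.normal(scale=0.5, size=(3, 16))
w_fb = rng.normal(scale=0.1, size=(16, 16))
w_out = rng.normal(scale=0.5, size=(16, 3))

outputs = feedback_reconstruct(ldr, w_in, w_fb, w_out, num_iterations=4)
print(len(outputs), outputs[0].shape)
```

In a trained network of this kind, a loss would typically be applied to the output of every iteration so that later passes refine earlier ones; that training step is omitted here.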
-
URSIM
Open Source Simulator for Unmanned Underwater Vehicles using ROS and Unity3D 🐟
Pushkal Katara, Mukul Khanna, Harshit Nagar, and A. Panaiyappan
Underwater Technology (UT) 2019
The paper presents URSim, an open source 3D underwater simulation framework for Unmanned Underwater Vehicles (UUVs) developed using the Robot Operating System (ROS) and the real-time game engine Unity3D. Simulation systems like this make it possible to implement, test, study, and analyze complex systems while minimizing cost and disruption to the environment. URSim gives the user an intuitive way to simulate underwater vehicles and robots. It is capable of simulating feedback control systems, dynamic models, underwater vision, and mission planning for underwater vehicles and robots. The simulation supports underwater sensor modules, underwater physics, and collision kinematics, and is highly configurable to simulate a realistic underwater environment. The software architecture can accommodate algorithms for control systems, image processing, navigation, and manipulation.