Shivanand Venkanna Sheshappanavar
I graduated with a Ph.D. from the Department of Computer and Information Sciences at the University of Delaware. I did my doctoral research at the VIMS Laboratory under the guidance of Dr. Chandra Kambhamettu. I completed my Master's in Computer Science at Syracuse University, New York (2018). Previously, I worked as an IT Consultant at Oracle India Private Limited (2012-2016). I also hold a Master's and a Bachelor's degree in Computer Science and Engineering from RVCE (2012) and MSRIT (2009), Bengaluru, respectively.
My primary research areas are 3D computer vision and large (vision) language models, with applications in grocery recognition for the visually impaired, controlled-environment agriculture, and digital humanities.
Recent News:
Email / CV / Teaching Philosophy / Google Scholar / Github / LinkedIn / PhDinUS (Facebook)
PhD Aspirants
PhD, CS, University of Delaware, 2018 - 2023
MS, CS, Syracuse University, 2016 - 2018
IT Consultant, Oracle, 2012 - 2016
Research Intern, Infineon, 2011 - 2012
Resources
Five nodes of 8xH100 GPUs (total 40 H100 GPUs).
Six nodes of 8xL40 GPUs (total 48 L40 GPUs).
Eight nodes of 8xA30 GPUs (total 64 A30 GPUs).
Two nodes of 4xA6000 GPUs (total 8 A6000 GPUs).
One node of 4xADA A6000 GPUs (total 4 ADA A6000 GPUs).
Research
My research interest is in developing deep learning algorithms for 3D computer vision problems and creating end-to-end solution pipelines. My long-term goal is to build a mobile-based assistant that helps the visually impaired navigate the real world.
EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation
Nischal Khanal, Shivanand Venkanna Sheshappanavar
23rd IEEE International Conference on Machine Learning and Applications (ICMLA 2024), December 2024, Miami, Florida, USA
[paper] [code]
Due to their text-to-image synthesis capability, diffusion models have recently seen increased use in visual perception tasks such as depth estimation. The lack of good-quality datasets makes it challenging for diffusion models to extract fine-grained semantic context, and semantic context with fewer details further degrades the text embeddings used as input to the diffusion model. This paper proposes EDADepth, a novel enhanced data augmentation method for monocular depth estimation that uses no extra training data. We use Swin2SR, a super-resolution model, to enhance the quality of input images. We employ the BEiT pre-trained semantic segmentation model to extract better text embeddings. Furthermore, we introduce the BLIP-2 tokenizer to generate tokens from these text embeddings. The novelty of our approach is the introduction of Swin2SR, the BEiT model, and the BLIP-2 tokenizer into the diffusion-based pipeline for monocular depth estimation. Our model achieves state-of-the-art (SOTA) results on the δ3 metric on both the NYUv2 and KITTI datasets. It also achieves results comparable to those of the SOTA models on the RMSE and REL metrics. Finally, we show improvements in the visualization of the estimated depth compared to SOTA diffusion-based monocular depth estimation models.
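As a rough illustration of the dataflow described above (my reading of the abstract, not the authors' implementation; the four callables are hypothetical stand-ins for pretrained Swin2SR, BEiT, BLIP-2, and diffusion-based depth models):

def enhanced_depth_pipeline(image, super_resolve, segment, tokenize, estimate_depth):
    # Each argument after `image` is a hypothetical callable standing in for a pretrained model.
    hi_res = super_resolve(image)          # Swin2SR-style super-resolution of the input image
    semantics = segment(hi_res)            # BEiT-style semantic context / text embeddings
    tokens = tokenize(semantics)           # BLIP-2-style tokens derived from those embeddings
    return estimate_depth(hi_res, tokens)  # diffusion model conditioned on the tokens predicts depth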
Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations
Tejas Anvekar, Shivanand Venkanna Sheshappanavar
4th Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI, WACV 2025, Tucson, AZ, USA
[paper] [code]
In this paper, we discuss Mahalanobis k-NN, a statistical lens designed to address the challenges of feature matching in learning-based point cloud registration when confronted with arbitrary point cloud densities in either the source or the target. We tackle this by adopting the Mahalanobis k-NN's inherent ability to capture the distribution of the local neighborhood and surficial geometry. Our method can be seamlessly integrated into any local-graph-based point cloud analysis method. This paper focuses on two distinct methodologies: Deep Closest Point (DCP) and Deep Universal Manifold Embedding (DeepUME). Our extensive benchmarking on the ModelNet40 and Faust datasets highlights the efficacy of the proposed method in point cloud registration tasks. Moreover, we establish for the first time that features acquired through point cloud registration can inherently possess discriminative capabilities, as evidenced by a substantial improvement of about 20% in average accuracy on the point cloud few-shot classification task benchmarked on ModelNet40 and ScanObjectNN.
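To make the statistical lens concrete, here is a minimal NumPy sketch I wrote from the description above (not the released code): the covariance of the local neighborhood replaces the Euclidean metric in the k-NN search.

import numpy as np

def mahalanobis_knn(query, points, k=16, eps=1e-6):
    # Covariance of the neighborhood captures its distribution and surface geometry.
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / max(len(points) - 1, 1)
    cov_inv = np.linalg.inv(cov + eps * np.eye(3))        # regularize near-flat patches
    diff = points - query
    d2 = np.einsum('ni,ij,nj->n', diff, cov_inv, diff)    # squared Mahalanobis distances
    return np.argsort(d2)[:k]                             # indices of the k closest points

cloud = np.random.rand(1024, 3)
neighbors = mahalanobis_knn(cloud[0], cloud, k=16)        # includes the query point itself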
3DGrocery100: A Benchmark Grocery Dataset of Real-World Point Clouds From Single View
Shivanand Venkanna Sheshappanavar, Tejas Anvekar, Shivanand Kundargi, Yufan Wang, Chandra Kambhamettu
2024 International Conference on 3D Vision (3DV), March 2024, Davos, Switzerland
[paper] [project page]
We introduce a large-scale grocery dataset called 3DGrocery100. It comprises 100 classes, 10,755 RGB-D images, and 87,898 3D point cloud objects. We benchmark our dataset on six recent state-of-the-art 3D object classification models. 3DGrocery100 is the largest real-world 3D point cloud grocery dataset.
Local Neighborhood Features for 3D Classification
Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu
22nd Scandinavian Conference on Image Analysis (SCIA), April 2023, Levi Ski Resort (Lapland), Finland.
[paper] [code]
With advances in deep learning training strategies, point cloud classification methods are improving significantly. For example, PointNeXt, which adopts prominent training techniques and InvResNet layers into PointNet++, achieves over 7% improvement on the real-world ScanObjectNN dataset. However, most of these models map the point coordinates of neighborhood points to a higher-dimensional space while ignoring neighborhood point features computed before they are fed to the network layers. In this paper, we revisit the PointNeXt model to study the usage and benefit of such neighborhood point features.
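For illustration, a small NumPy sketch (my own; the exact feature set in the paper may differ) of neighborhood point features, relative offsets and distances, computed before any network layer:

import numpy as np

def knn_neighborhood_features(points, k=16):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # pairwise distances
    idx = np.argsort(d, axis=1)[:, 1:k + 1]                               # k nearest neighbors, excluding self
    rel = points[idx] - points[:, None, :]                                # relative offsets within each neighborhood
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)                    # per-neighbor distances
    return np.concatenate([rel, dist], axis=-1)                           # (N, k, 4) neighborhood features

features = knn_neighborhood_features(np.random.rand(1024, 3))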
SimpleView++: Neighborhood Views for Point Cloud Classification
Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu
IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR) 2022
[paper] [code] [video]
We propose the use of neighbor projections along with object projections to learn finer local structural information. SimpleView++ concatenates features from orthogonal perspective projections at object and neighbor levels with encoded features from the point cloud.
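A toy sketch of the view-projection idea (a simplification I wrote for illustration; the model itself uses perspective projections at both object and neighbor levels with learned encoders):

import numpy as np

def orthographic_views(points, res=64):
    # Rasterize a point cloud into three axis-aligned depth images.
    pts = (points - points.min(0)) / (np.ptp(points, axis=0).max() + 1e-9)   # normalize to the unit cube
    views = []
    for drop in range(3):                                                    # drop one axis per view
        keep = [a for a in range(3) if a != drop]
        uv = np.clip((pts[:, keep] * (res - 1)).astype(int), 0, res - 1)
        depth = np.zeros((res, res))
        np.maximum.at(depth, (uv[:, 0], uv[:, 1]), pts[:, drop])             # keep the largest depth per pixel
        views.append(depth)
    return np.stack(views)                                                   # (3, res, res)

views = orthographic_views(np.random.rand(2048, 3))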
PatchAugment: Local Neighborhood Augmentation in Point Cloud Classification
Shivanand Venkanna Sheshappanavar, Vinit Veerendraveer Singh, Chandra Kambhamettu
IEEE/CVF International Conference on Computer Vision (ICCV) Workshops 2021
[paper] [code] [video]
Different local neighborhoods on the object surface hold different amounts of geometric complexity. Applying the same data augmentation techniques at the object level is less effective in augmenting local neighborhoods with complex structures. This paper presents PatchAugment, a data augmentation framework that applies different augmentation techniques to different local neighborhoods.
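A minimal sketch of the patch-level idea (illustrative only; the jitter and parameters below are my assumptions, not the PatchAugment configuration): each local neighborhood is perturbed independently rather than augmenting the whole object at once.

import numpy as np

def patch_jitter(points, num_patches=16, k=32, sigma=0.01, clip=0.05):
    out = points.copy()
    centroids = points[np.random.choice(len(points), num_patches, replace=False)]
    for c in centroids:
        idx = np.argsort(np.linalg.norm(points - c, axis=1))[:k]         # local neighborhood of the centroid
        out[idx] += np.clip(sigma * np.random.randn(k, 3), -clip, clip)  # patch-specific jitter
    return out

augmented = patch_jitter(np.random.rand(1024, 3))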
Dynamic Local Geometry Capture in 3D Point Cloud Classification
Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu
IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR) 2021
[paper] [code] [video]
The PointNet++ model uses ball querying for local geometry capture in its set abstraction layers. Several models based on the single-scale grouping of PointNet++ continue to use ball querying with a fixed-radius ball. However, a ball lacks orientation and is ineffective in capturing complex or varying geometry from different local neighborhoods on the object surface. We propose a novel technique that dynamically orients and scales an ellipsoid based on unique local information to better capture the local geometry. We also propose ReducedPointNet++, a single-set-abstraction, single-scale grouping model.
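A rough NumPy sketch of oriented-ellipsoid querying (my paraphrase of the idea above; the per-axis scales are illustrative assumptions):

import numpy as np

def ellipsoid_query(points, centroid, radius, axis_scales=(1.5, 1.0, 0.75)):
    ball = points[np.linalg.norm(points - centroid, axis=1) < radius]    # initial ball query
    if len(ball) < 3:
        return ball
    _, _, vt = np.linalg.svd(ball - ball.mean(0), full_matrices=False)   # principal axes of the neighborhood
    local = (points - centroid) @ vt.T                                    # coordinates in the local principal frame
    radii = radius * np.asarray(axis_scales)                              # stretch/shrink along each axis
    inside = np.sum((local / radii) ** 2, axis=1) <= 1.0                  # ellipsoid membership test
    return points[inside]

neighborhood = ellipsoid_query(np.random.rand(4096, 3), np.random.rand(3), radius=0.2)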
Mesh Classification with Dilated Mesh Convolutions
Vinit Veerendraveer Singh, Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu
IEEE International Conference on Image Processing (ICIP) 2021
[paper] [code] [video]
In this paper, inspired by dilated convolutions for images, we proffer dilated convolutions for meshes. Our Dilated Mesh Convolution (DMC) unit inflates the kernels' receptive field without increasing the number of learnable parameters. We also propose a Stacked Dilated Mesh Convolution (SDMC) block by stacking DMC units. We incorporate SDMC into MeshNet to classify 3D meshes.
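A toy sketch of how a dilated receptive field can be gathered on a face-adjacency graph (my illustration of the dilation idea, not the DMC kernel itself): gather faces exactly d hops away rather than immediate neighbors, so the receptive field grows without adding parameters.

from collections import deque

def dilated_face_neighbors(adj, face, d=2):
    # adj maps each face id to the list of faces sharing an edge with it.
    dist = {face: 0}
    queue = deque([face])
    while queue:
        f = queue.popleft()
        if dist[f] == d:
            continue
        for g in adj[f]:
            if g not in dist:
                dist[g] = dist[f] + 1
                queue.append(g)
    return [f for f, hops in dist.items() if hops == d]

ring = dilated_face_neighbors({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}, face=0, d=2)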
MeshNet++: A Network with a Face
Vinit Veerendraveer Singh, Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu
29th ACM International Conference on Multimedia (ACM MM Oral) 2021
[paper] [code] [video]
MeshNet was a pioneer in learning directly from mesh faces. In this paper, we propose a novel neural network that is substantially deeper than its MeshNet predecessor. This increase in depth is achieved through our specialized convolution and pooling blocks that operate on mesh faces. Our network, named MeshNet++, learns local structures at multiple scales and is also robust to the shortcomings of mesh decimation. We evaluated it on the shape classification task on various datasets and observed results significantly higher than the state of the art.
A Novel Local Geometry Capture in PointNet++ for 3D Classification
Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2020
[paper] [code] [video]
A few of the recent deep learning models for 3D point set classification depend on how well the model captures local geometric structures. The PointNet++ model extracts local region features by ball querying local neighborhoods. However, ball querying is less effective in capturing local neighborhoods on surfaces or regions of high curvature. In this paper, we demonstrate improved 3D classification results by using ellipsoid querying around centroids, capturing more points in the local neighborhood. We extend the ellipsoid querying technique by orienting the ellipsoid along the principal axes of the local neighborhood to better capture the local geometry.
LSTM based Soil Moisture Prediction
Shivanand Venkanna Sheshappanavar, Chilukuri K. Mohan, David G. Chandler
1st Northeast Regional Conference on Complex Systems (NERCCS) 2018
[paper] [code]
Soil moisture content is an important variable that has a considerable impact on agricultural processes and practical weather-related concerns such as flooding and drought. We address the problem of predicting soil moisture by applying recurrent neural networks that use Long Short-Term Memory (LSTM) models. The success of our approach is evaluated using a dataset obtained from ground-based sensor infrastructure networks. Feature reduction using a mutual information approach is shown to be more effective than feature extraction using principal component analysis.
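For context, a minimal PyTorch sketch of an LSTM regressor for sensor time series (the framework, layer sizes, and input shape are illustrative assumptions, not the paper's configuration):

import torch
import torch.nn as nn

class SoilMoistureLSTM(nn.Module):
    def __init__(self, num_features=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)          # predict soil moisture at the next time step

    def forward(self, x):                         # x: (batch, time, num_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])              # regress from the last hidden state

model = SoilMoistureLSTM()
prediction = model(torch.randn(4, 48, 8))         # e.g., a batch of 48-step sensor windows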
Teaching
I am preparing the following courses at the University of Wyoming:
- COSC 4010/5010: Introduction to Deep Learning [Spring 2025]
- EE 5885/COSC 5010: Advances in Deep Learning [Spring 2025]
- COSC 4010/5010: Introduction to Large Language Models [Fall 2025]
I have been the Instructor for the following courses at the University of Wyoming:
- EE 5885/COSC 5010: Advances in 3D Computer Vision [Spring 2024]
- EE/COSC 2150: Computer Organization [Fall 2024, Fall 2023]
I have been the Instructor for the following course at the University of Delaware:
- CISC210: Introduction to Systems Programming [Summer 2020]
I have been the Lead Teaching Assistant for the following course:
- CISC210: Introduction to Systems Programming at the University of Delaware [Fall 2022, Spring 2022, Spring 2021, Fall 2020, Spring 2020, Fall 2019, Spring 2019]
I have been the Teaching Assistant for the following courses:
- CISC220: Data Structures at the University of Delaware [Fall 2021]
- CISC101: Principles of Computing at the University of Delaware [Winter 2021]
- CISC662: Advanced Computer Architecture at the University of Delaware [Fall 2018]
[Web Cite]