Research Experience


My primary research focuses on Natural Language Processing (NLP), specifically transformer-based models, and on Automatic Speech Recognition (ASR). I am motivated to develop methods that improve how machines understand human language and speech.

I have worked extensively with transformer language models such as BERT and DistilBERT to detect harmful online content. In a recent project, I applied these models to identify online sexism, with promising accuracy. My current projects focus on improving ASR models using recurrent deep learning architectures such as LSTMs and GRUs, with the aim of transcribing human speech more accurately, especially in noisy environments.

During my research, I have also explored computer vision, including building models for Bengali license plate recognition. I used a CNN-GRU fusion architecture to improve performance in real-world conditions, together with explainable AI (SHAP) to interpret its predictions.

As I move forward, I aim to deepen my knowledge of both NLP and ASR and to work towards innovations that improve machine learning models and contribute to safer, smarter AI-driven systems.

Publications (4)

CNN-GRU Based Fusion Architecture For Bengali License Plate Recognition With Explainable AI

Protiva Das, Sowmen Mitra, Sovon Chakraborty, Md. Humaion Kabir Mehedi, Muhammed Yaseen Morshed Adib, Annajiat Alim Rasel
2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)
July 2023

Abstract: Because of recent improvements to Bangladesh’s roads and highways, Automatic License Plate Recognition (ALPR) has become a crucial component. Numerous crimes, including kidnapping, failure to pay tolls, and harassment of women, occur on both public and private transportation. The security forces will be able to locate offenders more quickly with the earlier and more accurate detection of license plates. The authors of this research propose a deep learning-based fusion model for ALPR that integrates CNN and GRU on the basis of these circumstances. A total of 4753 images from various Bangladeshi roads and highways have been collected for training, validation, and testing purposes. The dataset consists of three classes of data, namely Private cars, Public buses, and Trucks, where all the images are in RGB format. To get precise and reliable findings, a variety of preprocessing approaches have been applied. After passing the images to the proposed architecture, all the necessary parameters have been fine-tuned, resulting in fewer trainable parameters and higher accuracy. The research demonstrates that the suggested CNN-GRU based fusion architecture, with a 98.97% F1-score, outperforms the leading models. Both static photos and CCTV video material can be used to accomplish ALPR tasks with comparable efficiency. Later, the Explainable Artificial Intelligence (XAI) method SHAP has been used to interpret the result in terms of the contributing feature regions.


LSTM-ANN Based Price Hike Sentiment Analysis from Bangla Social Media Comments

Sovon Chakraborty, Muhammad Borahn Uddin Talukdar, Muhammed Yaseen Morshed Adib, Sowmen Mitra, Md. Golam Rabiul Alam
2022 25th International Conference on Computer and Information Technology (ICCIT)
December 2022

Abstract: Price hikes have always been a substantial concern for people all over the world. The crisis gets more conspicuous, and people find themselves more confounded when even the bare minimum of expenses still exceeds the amount they can earn. This tension tends to invite chaos in society as the number of people affected increases. Bangladesh is currently undergoing a formidable wave of price hikes. People have been expressing mixed reactions on social media regarding this issue. Hence, understanding the overall public sentiment can be crucial for policymaking and preventing chaos in society. This study utilizes social media comments for analyzing underlying sentiments. Data were collected from the Facebook pages of some popular Bangladeshi media for this purpose, and thereby a specialized dataset was constructed. The dataset contains 2000 public comments annotated with three polarity values: positive, negative, and neutral. A hybrid LSTM-ANN deep architecture has been exploited in this research. The model outperforms other state-of-the-art models, using fewer trainable parameters while achieving an F1-score of 88.47%.


Smartphone based Human Activity Recognition using CNNs and Autoencoder Features

Sowmen Mitra, Proma Kanungoe
2023 7th International Conference on Trends in Electronics and Informatics (ICOEI)
April 2023

Abstract: Recognition of human activities is essential for many applications, and the widespread availability of low-cost sensors on smartphones and wearables has enabled the development of mobile apps capable of tracking user activities “in the wild.” However, dealing with heterogeneous data from different devices and real-time scenarios presents significant challenges. In this study, a novel learning framework is proposed for Human Activity Recognition (HAR) that combines a Convolutional Neural Network (CNN) with an autoencoder for feature extraction. The study also investigates the importance of preprocessing techniques, including orientation-independent transformation, to mitigate heterogeneity when dealing with multiple types of smartphones. The results show that the proposed approach outperforms state-of-the-art methods in HAR, with an accuracy of 95.74% on the heterogeneous dataset used in this study. Furthermore, the study demonstrates that the proposed framework can be effectively deployed on smartphones with limited computational resources, making it suitable for real-world applications.


Detecting Public Hate Sentiment Using Transformers

Sowmen Mitra, Proma Kanungoe
2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)
July 2023

Abstract: This research explores the application of deep learning techniques, specifically BERT and DistilBERT models, for detecting public hate sentiment in text data. Through a systematic analysis of the literature and a thorough understanding of the subject matter, we developed a groundbreaking method that leverages the power of these advanced models. Extensive experimentation and evaluation were conducted using a carefully curated dataset, employing techniques such as tokenization, padding, and truncating for preprocessing. The results demonstrate the efficacy of our approach, achieving high accuracy and precision in identifying and classifying hate sentiment. This research contributes to the field of natural language processing and provides valuable insights for effectively addressing and mitigating hate speech in online platforms.
