I am a former Data Scientist now pursuing a Ph.D. in Computer Science, studying how greener technologies can improve public health. I employ methods from environmental epidemiology, statistics, and machine learning to assess the impact of electric vehicles on air quality and public health. My research aims to help policymakers create equitable environmental regulations that address climate change. At the University of Toronto, I am supervised by Dr. Steve Easterbrook.
I also study technology's rebound effects (its unintended consequences) and contribute to their mitigation in residential buildings, data centers, and machine learning infrastructure. I am a doctoral fellow with Climate Positive Energy (CPE) and the Data Science Institute (DSI) at the University of Toronto, where I contribute to research on sustainable energy solutions and data-driven environmental policy.
"Where there is much desire to learn, there of necessity will be much arguing, much writing, many opinions; for opinion in good men (good person) is but knowledge in the making." (John Milton, Areopagitica)
Updates
Received the DSI Doctoral Fellowship of $75,000 for three years (2024-2027) for my research on the impact of electric vehicles (EVs) on respiratory diseases. The fellowship recognizes ongoing contributions to data science, with a focus on innovative research that bridges the gap between theory and practical application.
Received the Climate Positive Energy's Climate Solutions Scholarship of $15,000 (2023-24), supporting scholars committed to developing solutions that promote an equitable energy transition.
Software engineering for cybersecurity: My paper "Design and Implementation of a Quantitative Network Health Monitoring and Recovery System" was recently published in Springer's Wireless Personal Communications journal.
President’s Advisory Committee on the Environment, Climate Change, and Sustainability (CECCS)
Research Assistant
Full-Time (Summer'23, 5 m)
Domain: Sustainability Education
Advisor(s)
Dr. Ayako Ariga
Responsibilities
The mandate of the CECCS is to advance coordination of the University of Toronto’s contributions and objectives on climate change and sustainability pertaining to research and innovation, academic programs, community engagement, and sustainability initiatives related to our operations.
As part of our research, we studied the issues software engineers face when implementing source-code logging in their applications. We then built an NLP-based topic model to analyze six websites of the Stack Exchange Network, including Stack Overflow. Our research paper was accepted in the Journal of Software: Evolution and Process (Wiley).
Nyalazone Solutions Pvt. Ltd.
Data Scientist
Full-Time (Jul'19 - Jul'20, 12 m)
Domain: Algorithms, Data Management, and Data Transformation.
Responsibilities
As a data scientist, my responsibilities included building the Leggero Dynamic Data Source (DDS) platform to transform and manage structured data. In addition, I trained and deployed a CNN to classify similar residential addresses by transforming them into matrices. I also visited the Department of Employment and Labour in South Africa to understand their data requirements and revamped DDS accordingly.
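Turning an address into a matrix for a CNN can be illustrated with a character-level one-hot encoding, a common input format for text CNNs. This is a minimal sketch under stated assumptions, not the DDS code: the alphabet, the fixed length `max_len`, and the function name `address_to_matrix` are hypothetical choices.

```python
import string

# Hypothetical alphabet: lowercase letters, digits, space, and a few punctuation marks.
ALPHABET = string.ascii_lowercase + string.digits + " ,-/#"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def address_to_matrix(address: str, max_len: int = 64) -> list[list[int]]:
    """One-hot encode an address into a max_len x len(ALPHABET) matrix.

    The address is lowercased and truncated to max_len characters; characters
    outside ALPHABET and padding positions stay as all-zero rows.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for row, ch in enumerate(address.lower()[:max_len]):
        col = INDEX.get(ch)
        if col is not None:
            matrix[row][col] = 1
    return matrix


m = address_to_matrix("221B Baker Street", max_len=20)
print(len(m), len(m[0]))  # 20 41
```

Each row is one character position, so a CNN can slide 1-D filters along the address; the choice of alphabet and length would need tuning to the real address data.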
Delhivery Pvt. Ltd.
Associate Data Scientist Intern
Full-Time (Jan'19 - Jul'19, 6 m)
Domain: Data science, Statistics, and Machine learning
Responsibilities
Delhivery is a third-party logistics service provider operating in over 1,200 cities in India. Delhivery became the first unicorn of 2019 with a valuation of $1.6 billion.
Because of largely unstructured geographical planning and high population density, residential address matching is crucial. I worked on improving the performance of the address similarity engine, which combines hierarchical structures derived from each residential address with phonetic, edit distance-based matching.
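Phonetic plus edit-distance token matching can be sketched as follows, assuming American Soundex as the phonetic code and Levenshtein distance as the edit distance. The engine's actual phonetic scheme and its hierarchical address parsing are not described here, so `tokens_match`, `address_similarity`, and the `max_edits=1` threshold are hypothetical. Soundex is tuned for English names and would likely need adapting for Indian addresses.

```python
# American Soundex digit groups; vowels, y, h, and w carry no code.
_SOUNDEX_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
_CODES = {ch: digit for letters, digit in _SOUNDEX_GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def tokens_match(a: str, b: str, max_edits: int = 1) -> bool:
    """Hypothetical rule: tokens match if they sound alike or differ by few edits."""
    a, b = a.lower(), b.lower()
    if a.isdigit() or b.isdigit():
        return a == b  # house/plot numbers must match exactly
    sa = soundex(a)
    return (sa != "" and sa == soundex(b)) or levenshtein(a, b) <= max_edits


def address_similarity(addr1: str, addr2: str) -> float:
    """Fraction of tokens in addr1 with a matching token in addr2 (a crude score)."""
    t1, t2 = addr1.split(), addr2.split()
    if not t1:
        return 0.0
    matched = sum(any(tokens_match(x, y) for y in t2) for x in t1)
    return matched / len(t1)


print(address_similarity("Flat 12 Shanti Nagar Gurgaon", "flat 12 shanthi nagr gurugram"))  # 0.8
```

Here "Shanti"/"shanthi" and "Nagar"/"nagr" match phonetically, while "Gurgaon"/"gurugram" fall outside both rules, which shows why a real engine would add locality-level (hierarchical) knowledge on top of string similarity.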
Koç University - Istanbul
Visiting Student
Full-Time (Jul - Aug '18, 2 m)
Domain: Data science, Machine learning, and Visualizations
Interdisciplinary: Computer Science and International Relations
I was selected to participate in the Summer Research Program at Koç University, Istanbul, Turkey. My project, Dangerous Companions, was an interdisciplinary initiative focusing on non-state violence, especially the role of non-state actors such as terrorists, insurgents, and revolutionaries. The project is supported by the NAGs dataset.
My task involved assessing the relationship between international support and the lifespan of these groups. I engineered several new features using domain knowledge coupled with network science. I then conducted the association analysis using ensemble learning and machine learning interpretability techniques.
CSIR-CDRI - India
Data science & Machine learning Intern
Remote (Aug '17 - Jul '18, 11 m)
Domain: Statistics and Machine learning
Interdisciplinary: Computer Science, Ecology, and Neuroscience.
As part of our research, I identified crucial gaps in the use of time-series analysis tools, particularly in their applications in the life sciences and neuroscience. We then quantified usage trends from the perspective of both software practitioners and life scientists by examining the vast databases of Stack Overflow, Cross Validated, and PubMed.
Nyalazone Solutions Pvt. Ltd.
Data Engineer Intern
Full-Time (May - Jul '17, 2 m)
Domain: Algorithms, Problem Solving and Web-Development
Advisor(s)
Mr. Saurabh Kumar (Founder & CEO) and Ms. Pooja Saxena (Data Engineer)
Editage
Language Editor
Remote (Jun '20 - Present, 8 m)
Domain: Language editing service for research papers.
Responsibilities
Editage aims to help scholars break through the confines of geography and language, bridge the gap between authors and peer-reviewed journals, and accelerate the process of publishing high-quality research.
As a Language Editor, my responsibility is to review and edit Computer Science research papers. The work keeps me connected with the current state of the art and also improves my writing skills.
HackerRank
Technical Content Writer
Remote (Aug'19 - Present, 16 m)
Domain: Data science and Machine learning
Broadly, I have published peer-reviewed contributions in computational modeling, HCI, software engineering, and data mining.
Computational Modeling
1. Harshit Gujral, Adwitiya Sinha (2021). Association between exposure to airborne pollutants and COVID-19 in Los Angeles, United States with ensemble-based dynamic emission model. Environmental Research, Elsevier (Peer-reviewed | SCI, SCIE indexed). Also included in the Elsevier Public Health Emergency Collection. [ScienceDirect Version] [PubMed Version] Domain: Network science, Ensemble learning, and Exposure modeling
2. Harshit Gujral, Belgin San-Akca, Sangeeta Mittal. Association of international support and violence with the longevity of armed groups. (Ongoing). Domain: International Relations, Counterterrorism, Network Science, and Machine Learning
3. Harshit Gujral, Somya Jain, Adwitiya Sinha. EPA or CARB? Evaluation of ground-level ozone pollution of the US using network science. (Ongoing). Domain: Environmental Policies Assessment, Network Science, and Emission Modeling
Ethics in AI
4. Eshta Bhardwaj, Harshit Gujral, Siyi Wu, Ciara Zogheib, Tegan Maharaj, Christoph Becker (2024). Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework. ACM Conference on Fairness, Accountability, and Transparency 2024 (ACM FAccT'24) [ACM FAccT Preprint] Domain: Data Curation in Machine Learning
Human–computer interaction (HCI)
5. Christina Bremer, Harshit Gujral, Christoph Becker, and Vlad Coroama (2023). How Viable are Energy Savings in Smart Homes? A Call to Embrace Rebound Effects in Sustainable HCI. ACM Journal on Computing and Sustainable Societies. Also presented at the ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies (COMPASS'23). [ACM Digital Library] [GitHub] Domain: Climate Change, Sustainability, and Empirical Analysis
Software Engineering
6. Harshit Gujral, Abhinav Sharma, Pulkit Jain, Shreya Juneja, Sangeeta Mittal (2022). Design and Implementation of a Quantitative Network Health Monitoring and Recovery System. Wireless Personal Communications, Springer (Semimonthly peer-reviewed | Scopus, SCIE indexed). [Springer Version] [GitHub] Domain: Software Engineering for Cybersecurity.
7. Harshit Gujral, Sangeeta Lal, Heng Li (2021). An exploratory semantic analysis of logging questions. Journal of Software: Evolution and Process, e2361 (Peer-reviewed | SCIE indexed). [Wiley Version] [Kudos Wiley] Domain: Topic Modeling and Empirical Analysis
8. Harshit Gujral, Abhinav Sharma, Sangeeta Lal, Lov Kumar (2019). A Three Dimensional Empirical Study of Logging Questions From Six Popular Q&A Websites. e-Informatica Software Engineering Journal, Polish Academy of Sciences and Wroclaw University of Science and Technology (Annually peer-reviewed | Scopus, ESCI indexed). [GitHub] [e-informatica Version] Domain: Data Science and Empirical Analysis
9. Harshit Gujral, Abhinav Sharma, Sangeeta Lal, Amanpreet Kaur, Lov Kumar, Ashish Sureka (2018). Empirical Analysis of the Logging Questions on the Stack Overflow Website. Conference On Software Engineering & Data Sciences (CoSEDS): CEUR WS (Scopus) in-press. Domain: Data Science and Empirical Analysis
10. Harshit Gujral, Abhinav Sharma, Sangeeta Mittal (2017). No-Escape Search: Design and Implementation of Cloud-Based Directory Content Search. IEEE 10th International Conference on Contemporary Computing (IC3) [IEEE Version] [Web-Page] Domain: Algorithm and Problem Solving
Data-mining
11. Harshit Gujral, Ajay Khushwaha, Sukant Khurana (2020). Utilization of Time Series Tools in Life-sciences and Neuroscience. Neuroscience Insights (formerly Journal of Experimental Neuroscience). Sage Publications (Peer-reviewed | Scopus, ESCI, PubMed indexed).[Sage Version][PubMed Version] Domain: Statistical Modeling and Time-series Analysis
12. Harshit Gujral, Sangeeta Mittal, Abhinav Sharma (2019). A Novel Data mining approach for Analysis and Pattern Recognition of Active Fingerprinting components. Wireless Personal Communications, Springer (Semimonthly peer-reviewed | Scopus, SCIE indexed). [Springer Version] [ACM Version] Domain: Network Communication and Cybersecurity.
13. Harshit Gujral, Abhinav Sharma, Parmeet Kaur (2018). Empirical Investigation of Trends in NoSQL-based Big-data Solutions in the Last Decade. IEEE 11th International Conference on Contemporary Computing (IC3) [Web-Page] [IEEE Version] Domain: NoSQL Database Management Systems and Empirical Analysis
14. Harshit Gujral, Abhinav Sharma, Sangeeta Lal (2018). Empirical analysis of Q&A websites and a sustainable solution to ensure water-security. IEEE 11th International Conference on Contemporary Computing (IC3) [Web-Page] [IEEE Version] Domain: Global Water-Security
Concurrent Computing
15. Harshit Gujral, Abhinav Sharma, Sangeeta Mittal (2018). Determination of Optimal Thread Pool for Cloud based Concurrent Enhanced No-Escape Search. IEEE 11th International Conference on Contemporary Computing (IC3) [GitHub] [IEEE Version] Domain: Parallel Processing Algorithms and High-Performance Computing
Peer-Review Experience
Air Quality, Atmosphere & Health (Springer Nature): Reviewed research on the intersection of Machine Learning and Air Quality (Twice).
PLOS One (PLOS): Reviewed research on the intersection of COVID-19 and network science (Once).
Journal of Hydrology - Elsevier: Reviewed research on the intersection of Social Network Analysis and Water Management (Thrice).
Wireless Personal Communication - Springer: Reviewed Cybersecurity research (5 times).
International Conference on Computational Intelligence and Data Science 2018 (ICCIDS2018): Reviewed data science research (Once).
Binarization, Segmentation and Recognition of generic shopping bills.
Project Details
Two-line Summary
This project performs OCR (Optical Character Recognition) on white paper receipts. The pipeline includes cropping (removing the background), quality improvement (if the image is blurred), binarization, segmentation, and prediction.
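The binarization step can be illustrated with Otsu's method, a standard global-threshold technique. The project's own binarization method is not stated, so this is a minimal sketch under that assumption; the image is assumed to be a grayscale list of rows with integer intensities from 0 to 255, and `binarize` is a hypothetical helper.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the threshold t (0-255) that maximizes Otsu's between-class variance.

    Pixels <= t form the dark (ink) class; pixels > t form the light (paper) class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    weight_dark, sum_dark = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, scaled by total**2 (the scale does not change the argmax).
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """Map ink pixels to 0 (black) and paper pixels to 255 (white)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[0 if p <= t else 255 for p in row] for row in image]


print(binarize([[20, 30, 200], [210, 25, 220]]))  # [[0, 0, 255], [255, 0, 255]]
```

A single global threshold suits evenly lit white receipts; unevenly lit photos usually call for an adaptive (local) threshold instead.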
System Vulnerabilities
Estimation of System Vulnerabilities from the Network layer using Deep-learning Techniques
Project Details
Domain
Cybersecurity, Network Communication, Research, Data Analytics
Two-line Summary
This model aims to estimate system vulnerabilities remotely, without requiring system privileges, in much less time.
Academic Profile
Academic Profile for Prof. Belgin San-Akca, Koç University, Istanbul.
Project Details
Domain
Web-development
Two-line Summary
This web-development project is the academic profile website of Professor Belgin San-Akca, Koç University, Istanbul, Turkey. I developed it during the 2018 Summer Research Program at Koç University.
Terror Network
Identification of precursors for thriving conditions in a Terror-Network.
Project Details
Domain
Social Causes, Research
Two-line Summary
This interdisciplinary project began during the 2018 Summer Research Program at Koç University. I later continued it as my major project at Jaypee Institute of Information Technology, Noida, India.
Hyperbola-based Thread-pool Analysis (HTA)
Determination of Optimal Thread Pool for Cloud based Concurrent Processes
Project Details
Domain
Research, Data Analytics, Algorithms
Two-line Summary
We aim to improve the performance of any cloud-based indexing process by keeping the number of CPU cores constant while making better use of existing resources.
The Hyperbola-based Thread-Pool Analysis (HTA) technique determines the optimal thread pool based on the number of keywords in the files to upload (workload) and the network upload speed (bandwidth).
Network Fingerprinting and Communication
Study on Active Fingerprinting Analysis of Hosts in an Institutional Network
Project Details
Domain
Network Communication, Cybersecurity, Research
Two-line Summary
A distinct correlation pattern is observed between the timers (RTT, SRTT, RTTVar, and RTO) and variations in IP-ID sequence classes, traceroute protocols, and network traffic intensity.
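SRTT, RTTVar, and RTO are conventionally derived from raw RTT samples with the smoothing rules of RFC 6298 (gains 1/8 and 1/4, variance multiplier 4). The sketch below follows those rules; the clock granularity `CLOCK_G` is an assumed value, and it is an assumption that the tooling used in this study computes its timers exactly this way.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8    # RFC 6298 gain for SRTT
BETA = 1 / 4     # RFC 6298 gain for RTTVAR
K = 4            # RFC 6298 variance multiplier
CLOCK_G = 0.1    # clock granularity in seconds (assumed value)
MIN_RTO = 1.0    # RFC 6298: an RTO below 1 second should be rounded up


@dataclass
class RtoEstimator:
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 1.0  # RFC 6298 initial RTO, before any RTT sample

    def update(self, r: float) -> float:
        """Fold one RTT sample r (in seconds) into SRTT and RTTVAR; return the new RTO."""
        if self.srtt is None or self.rttvar is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated with the old SRTT, so it must come first.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = max(MIN_RTO, self.srtt + max(CLOCK_G, K * self.rttvar))
        return self.rto


est = RtoEstimator()
print([est.update(r) for r in (2.0, 2.0, 4.0)])  # [6.0, 5.0, 6.5]
```

Because RTO adds four times the smoothed deviation to SRTT, congestion that makes RTT jittery raises RTO faster than it raises SRTT, which is one reason these timers can react differently to traffic intensity.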
Desalination
Desalination as a permanent solution to water-scarcity: A novel empirical study.
Project Details
Domain
Social Causes, Data Analytics, Research
Two-line Summary
This independent study aims to raise awareness of seawater desalination as a permanent solution to water scarcity.
Network Health and Information Security
A Distributed Network Health Monitoring and Recovery System
Project Details
Domain
System and Software, Cybersecurity, Network Communication, Research
Two-line Summary
Key features of this system include monitoring and filtering malicious ingress and egress traffic, identifying and scoring the most prominent vulnerabilities exhaustively, and exposing devices with anomalous fingerprints.
USA Air-Quality
Project Details
Domain
Social Causes, Data Analytics, Research
Two-line Summary
EEG
Electroencephalogram Data Analysis (Alcoholics and Control)
Project Details
Domain
Data Analytics, Research
Two-line Summary
Data analysis, visualization, and hypothesis (significance) testing on an EEG dataset.
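The significance-testing step can be illustrated with a two-sided permutation test on the difference in group means, an assumption-light option for comparing alcoholic and control subjects. The project's actual test is not stated, so this is a hypothetical sketch; the numbers in the example are toy values, not EEG measurements.

```python
import random
from statistics import mean


def permutation_test(a: list[float], b: list[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Toy values for illustration only.
control = [1, 2, 3, 4, 5]
alcoholic = [10, 11, 12, 13, 14]
print(permutation_test(control, alcoholic, n_perm=2000))
```

The test makes no normality assumption, which is convenient for per-channel EEG summaries; testing many channels at once would additionally need a multiple-comparison correction.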
AAM System
Automated Attendance Management System
Project Details
Domain
System and Software, Web-development
Two-line Summary
This Flask-based web application was implemented during my internship at Nyalazone Solutions Pvt. Ltd.
ERM System
Employee Resource Management System
Project Details
Domain
System and Software, Web-development
Two-line Summary
Designed a centralized MySQL database in Boyce-Codd Normal Form (BCNF). User privileges are divided into three categories: Superadmin, Admin, and Employee. Submodules include an automated attendance recorder (implemented by scanning the network with Nmap), an employee performance analyzer, a project monitoring system, a leave management system, and an internal chat system built with AJAX.
Indexing Algorithms
Disk-level indexing algorithms developed at Nyalazone Solutions Pvt Ltd.
Project Details
Domain
Algorithms, Research
Two-line Summary
This repository contains disk-level indexing algorithms developed during my summer internship at Nyalazone Solutions Pvt. Ltd.
Tools/Technologies Used:
Python, HDF5, zlib compression
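Disk-level keyword indexing with compression can be illustrated with a small inverted index that is compressed with zlib before being written to disk. This is a hypothetical sketch, not the repository's code: it serializes to JSON to stay within the standard library instead of HDF5, and `build_index`, `save_compressed`, and `load_compressed` are illustrative names.

```python
import json
import zlib
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each lowercase keyword to the sorted list of document paths containing it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in text.lower().split():
            index[word].add(path)
    return {word: sorted(paths) for word, paths in index.items()}


def save_compressed(index: dict[str, list[str]]) -> bytes:
    """Serialize the index to JSON and compress it with zlib."""
    return zlib.compress(json.dumps(index, sort_keys=True).encode("utf-8"))


def load_compressed(blob: bytes) -> dict[str, list[str]]:
    """Inverse of save_compressed."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


docs = {"a.txt": "Solar wind power", "b.txt": "wind turbine"}
idx = build_index(docs)
print(idx["wind"])  # ['a.txt', 'b.txt']
```

Writing the bytes returned by `save_compressed` to a file opened in binary mode persists the index; posting lists of repeated paths compress well because zlib exploits the repetition.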
Trip-Advisor Scraping
This project is part of the Flight module of the Review Pool project.
Project Details
Domain
Web Scraping
Two-line Summary
Flight names and IDs were extracted from TripAdvisor and then used to dynamically crawl further data from TripAdvisor (for learning purposes only).
No-Escape Search System
Design and Implementation of Cloud-based Directory Content Search
Project Details
Domain
System and Software, Algorithms, Research
Two-line Summary
No-Escape Search addresses three problems with Windows Search: first, the memory wasted by Windows indexing and its limited scope; second, slow data retrieval by unindexed Windows search; and third, its inability to provide the user with the location(s) of the searched input.
Manuscript at IEEE Xplore.
Review Pool
Centralized Review and Ratings
Project Details
Domain
System and Software, Web-development
Two-line Summary
It is an integrated platform for reviews and ratings of multiple services, namely hotels, food, flights, and jobs. To aggregate the information, we used the APIs of TripAdvisor, Zomato, Goibibo, and Glassdoor.
JIIT-Simplified Scraping
This project is part of my involvement with the JIIT-Simplified platform as a core team member.
Project Details
Domain
Web Scraping
Two-line Summary
The script scrapes static HTML pages from the old JIIT-Simplified website and stores them in a MySQL database; a new web interface for the placement forum on the JIIT-Simplified platform was then built on top of it.
SUNNY FLIGHT
Project Details
Domain
System and Software, Web-development
Two-line Summary
It determines which side of the plane (left or right) should be reserved to minimize exposure to sunlight, using calculations based on the latitudes and longitudes of the flight route and the position (elevation) of the sun.
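The left-or-right decision can be sketched as comparing the flight's heading with the sun's compass direction (azimuth). This is a simplified illustration under stated assumptions: the sun's azimuth is taken as an input from some solar-position routine (not computed here), the heading is the initial great-circle bearing between two points, and `shaded_side` and `initial_bearing` are hypothetical names.

```python
import math


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def shaded_side(heading_deg: float, sun_azimuth_deg: float) -> str:
    """Return the window side ('left' or 'right') facing away from the sun."""
    # Bearing of the sun relative to the nose of the aircraft, in [0, 360).
    relative = (sun_azimuth_deg - heading_deg) % 360
    if 0 < relative < 180:
        return "left"    # sun is on the right-hand side
    if 180 < relative < 360:
        return "right"   # sun is on the left-hand side
    return "either"      # sun directly ahead or behind


# Flying due east (bearing 90) with the sun to the south (azimuth 180): sit on the left.
heading = initial_bearing(0.0, 0.0, 0.0, 10.0)
print(shaded_side(heading, 180.0))  # left
```

The heading of a long great-circle route changes en route and the sun moves during the flight, so a fuller version would sample the route at intervals and weight each side by sun elevation.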
Yatrasoft
Client Management System - Java-JavaFX (JDK-8)
Project Details
Domain
System and Software
Two-line Summary
This GUI supports registering, updating, and deleting clients, along with exporting data to Excel using Apache POI.
Google Scraping
This scraper is part of our ongoing Mining Software Repository project.
Project Details
Domain
Web Scraping
Two-line Summary
The script stores Google Search results in a MySQL database.
Yellow-Pages
C++ Graphics Console Application
Project Details
Domain
Algorithms
Two-line Summary
It is a data structures project built primarily on binary search trees (BSTs) and C++ graphics. On the user's data, we implemented insertion, deletion, and search operations with interactive graphics. We also compared the execution time of insertion and traversal in a binary tree (using a queue), a BST, an AVL tree, and a heap.
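The original project is written in C++ with graphics; the Python sketch below only illustrates the core BST operations it describes (insertion, search, and deletion, where a node with two children is replaced by its in-order successor). Names such as `Node` and `inorder` are illustrative, and duplicate keys are simply ignored as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search(root: Optional[Node], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def delete(root: Optional[Node], key: int) -> Optional[Node]:
    """Remove key if present and return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the in-order successor's key, then delete the successor.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def inorder(root: Optional[Node]) -> list[int]:
    """In-order traversal, which yields keys in sorted order for a BST."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
root = delete(root, 50)
print(inorder(root))  # [20, 30, 40, 60, 70, 80]
```

An unbalanced BST degrades to linear time on sorted input, which is what makes the AVL tree comparison in the project informative.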
JIIT Scraping
This project involves scraping faculty information from the JIIT-Simplified platform, for learning purposes only.
Project Details
Domain
Web Scraping
Two-line Summary
Names and other information are extracted from the www.jiit.ac.in website.
Presentations
1. Presented at ACM COMPASS 2023 (SIGCAS/SIGCHI Conference on Computing and Sustainable Societies) in Cape Town, South Africa, a research paper titled 'How Viable Are Energy Savings in Smart Homes? A Call to Embrace Rebound Effects in Sustainable HCI'.
2. Presented at Emerging Mobility Scholars Conference 2023 in Toronto, Canada, a research paper titled 'Impact of EV sales on childhood asthma in the US: Can ZEV mandates help?'.
3. Presented at Climate Positive Energy Research Day 2023 in Toronto, Canada, a research paper titled 'Examining the public health impact of ZEV mandates in the US'.
4. Presented at National Seminar on Unboxing Today's Consumers in a Global And Digital Age in Noida, India, a review paper titled 'Desalination as permanent solution for water scarcity'.
Involvements
1. Graduated from the Young Urban Forest Leaders (YUFL) program by LEAF in Toronto, 2024, specializing in urban forestry and community leadership.
2. Volunteered at Second Mile Club at Kensington Health, a senior home, through the Community Action Program (CAP) at the University of Toronto, providing support and companionship to elderly residents for 2023-24.
3. Collaborated with Climate Justice Toronto in organizing the Global March for Climate Action at Queen's Park in September 2023, advocating for progressive environmental policies and community engagement.