About me

I am a former Data Scientist now pursuing a Ph.D. in Computer Science, where I study how greener technologies can improve public health. I employ methods from environmental epidemiology, statistics, and machine learning to assess the impact of electric vehicles on air quality and public health. My research aims to inform policymakers in creating equitable environmental regulations for addressing climate change. At the University of Toronto, I am supervised by Dr. Steve Easterbrook.

I also study technology's rebound effects (or unintended effects) and work on mitigating them in residential buildings, data centers, and machine learning infrastructure. I am a doctoral fellow at the University of Toronto's Climate Positive Energy (CPE) and Data Science Institute (DSI), where I contribute to research on sustainable energy solutions and data-driven environmental policy.

"Where there is much desire to learn, thereof necessity will be much arguing, much writing, many opinions; for opinion in good men (good person) is but knowledge in the making." — John Milton.

Updates

  • Received the DSI Doctoral Fellowship of $75,000 for three years (2024-2027) for my research on the health impact of electric vehicles (EVs) on respiratory diseases. Awarded for ongoing contributions in data science with a focus on innovative research that bridges the gap between theory and practical application.
  • Received the Climate Positive Energy's Climate Solutions Scholarship of $15,000 (2023-24), supporting scholars committed to developing solutions that promote an equitable energy transition.
  • Featured on Wiley Kudos: check it out!
  • Software engineering for cybersecurity: My paper, "Design and Implementation of a Quantitative Network Health Monitoring and Recovery System," was recently published in Springer's Wireless Personal Communications journal. Please check it here.
  • Machine learning for health contribution: My work is published in Volume 194 of Environmental Research. The article is also included in the Elsevier Public Health Emergency Collection and can be accessed here.



Please find my updated CV here.

Experience

University of Toronto

Research Assistant

Full-Time (Sep '21 - Present, 24 m)

Domain: Computational Modeling, Public Health, and Climate Change

Just Sustainability Design Lab

Research Assistant

Remote (Fall'23 - Present, 2 m)

Domain: Data Curation in Machine Learning

President’s Advisory Committee on the Environment, Climate Change, and Sustainability

Research Assistant

Full-Time (Summer'23, 5 m)

Domain: Sustainability Education

Advisor(s)

Dr. Ayako Ariga

Responsibilities


The mandate of the CECCS is to advance coordination of the University of Toronto’s contributions and objectives on climate change and sustainability pertaining to research and innovation, academic programs, community engagement, and sustainability initiatives related to our operations.

Keele University

Research Assistant

Remote (Aug '20 - Jun '21, 11 m)

Domain: Software Engineering

Responsibilities


As part of our research, we studied the issues software engineers face while implementing source-code logging in their applications. We then built an NLP-based topic model to analyze six websites of the Stack Exchange Network, including Stack Overflow. Our research paper was accepted in the Journal of Software: Evolution and Process, Wiley.

Nyalazone Solutions Pvt. Ltd.

Data Scientist

Full-Time (Jul'19 - Jul'20, 12 m)

Domain: Algorithms, Data Management, and Data Transformation.

Responsibilities


As a data scientist, my responsibilities included building the Leggero Dynamic Data Source (DDS) platform to transform and manage structured data. In addition, I trained and deployed a CNN to classify similar residential addresses by transforming them into matrices. I also visited the Department of Employment and Labour in South Africa to understand their data requirements and revamped DDS accordingly.

Delhivery Pvt. Ltd.

Associate Data Scientist Intern

Full-Time (Jan'19 - Jul'19, 6 m)

Domain: Data science, Statistics, and Machine learning

Responsibilities


Delhivery is a third-party logistics service provider operating in over 1,200 cities in India. Delhivery became the first unicorn of 2019, with a valuation of $1.6 billion.

Due to largely unstructured geographical planning and high population density, residential address matching is crucial. I worked on the performance enhancement of the address similarity engine that employs hierarchical structures derived from a residential address, along with edit distance-based matching using phonetics.
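To illustrate the general idea of edit-distance matching on phonetic forms, here is a minimal Python sketch. It is not the engine used at Delhivery: the `phonetic_key` rule and the `token_similarity` score are hypothetical simplifications chosen for illustration, and the hierarchical address structure is not modelled.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def phonetic_key(token: str) -> str:
    """Hypothetical, deliberately crude phonetic key: keep the first letter,
    drop later vowels, and skip a letter equal to the last kept letter."""
    letters = [c for c in token.lower() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for c in letters[1:]:
        if c in "aeiou" or c == key[-1]:
            continue
        key.append(c)
    return "".join(key)


def token_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus the normalized edit distance of the keys."""
    ka, kb = phonetic_key(a), phonetic_key(b)
    longest = max(len(ka), len(kb))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ka, kb) / longest
```

Under these assumptions, spelling variants such as "Nagar" and "Nagger" reduce to the same key and score 1.0, while unrelated tokens score lower.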


AI Credentials

Neural Networks and Deep Learning
deeplearning.ai (4 weeks)
Coursera certificate
Improving your statistical inferences
Eindhoven University of Technology, Netherlands (7 weeks)
Coursera certificate
Data Science Math Skills
Duke University, North Carolina, US (4 weeks)
Coursera certificate
Bayesian Statistics: Techniques and Models
University of California Santa Cruz, California, US (5 weeks)
Coursera certificate
Applied Social Network Analysis in Python
University of Michigan, Michigan, US (4 weeks)
Coursera certificate
Python for Data Science and AI
IBM (5 weeks)
Coursera certificate

Publications

Broadly, I have published peer-reviewed contributions in computational modeling, HCI, software engineering, and data mining.

Computational Modeling

1. Harshit Gujral, Adwitiya Sinha (2021). Association between exposure to airborne pollutants and COVID-19 in Los Angeles, United States with ensemble-based dynamic emission model. Environmental Research, Elsevier (Peer-reviewed | SCI, SCIE indexed). Also included in the Elsevier Public Health Emergency Collection. [ScienceDirect Version] [PubMed Version]
Domain: Network science, Ensemble learning, and Exposure modeling

2. Harshit Gujral, Belgin San-Akca, Sangeeta Mittal. Association of international support and violence with the longevity of armed groups. (Ongoing).
Domain: International Relations, Counterterrorism, Network Science, and Machine Learning

3. Harshit Gujral, Somya Jain, Adwitiya Sinha. EPA or CARB? Evaluation of ground-level ozone pollution of the US using network science. (Ongoing).
Domain: Environmental Policies Assessment, Network Science, and Emission Modeling

Ethics in AI

4. Eshta Bhardwaj, Harshit Gujral, Siyi Wu, Ciara Zogheib, Tegan Maharaj, Christoph Becker (2024). Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework. ACM Conference on Fairness, Accountability, and Transparency 2024 (ACM FAccT'24). [ACM FAccT Preprint]
Domain: Data Curation in Machine Learning

Human–computer interaction (HCI)

5. Christina Bremer, Harshit Gujral, Christoph Becker, Vlad Coroama (2023). How Viable are Energy Savings in Smart Homes? A Call to Embrace Rebound Effects in Sustainable HCI. ACM Journal on Computing and Sustainable Societies. Also in the ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies (COMPASS'23). [ACM Digital Library] [GitHub]
Domain: Climate Change, Sustainability, and Empirical Analysis

Software Engineering

6. Harshit Gujral, Abhinav Sharma, Pulkit Jain, Shreya Juneja, Sangeeta Mittal (2022). Design and Implementation of a Quantitative Network Health Monitoring and Recovery System. Wireless Personal Communications by Springer (Semimonthly peer-reviewed | Scopus, SCIE indexed). [Springer Version] [GitHub]
Domain: Software Engineering for Cybersecurity.

7. Harshit Gujral, Sangeeta Lal, Heng Li (2021). An exploratory semantic analysis of logging questions. Journal of Software: Evolution and Process, e2361 (Peer-reviewed | SCIE indexed). [Wiley Version] [Kudos Wiley]
Domain: Topic Modeling and Empirical Analysis

8. Harshit Gujral, Abhinav Sharma, Sangeeta Lal, Lov Kumar (2019). A Three Dimensional Empirical Study of Logging Questions From Six Popular Q&A Websites. e-Informatica Software Engineering Journal by Polish Academy of Sciences and Wroclaw University of Science and Technology (Annually peer-reviewed | Scopus, ESCI indexed). [GitHub] [e-informatica Version]
Domain: Data Science and Empirical Analysis

9. Harshit Gujral, Abhinav Sharma, Sangeeta Lal, Amanpreet Kaur, Lov Kumar, Ashish Sureka (2018). Empirical Analysis of the Logging Questions on the Stack Overflow Website. Conference On Software Engineering & Data Sciences (CoSEDS): CEUR WS (Scopus) in-press.
Domain: Data Science and Empirical Analysis

10. Harshit Gujral, Abhinav Sharma, Sangeeta Mittal (2017). No-Escape Search: Design and Implementation of Cloud-Based Directory Content Search. IEEE- 10th International Conference on Contemporary Computing (IC3) [IEEE Version] [Web-Page]
Domain: Algorithm and Problem Solving

Data-mining

11. Harshit Gujral, Ajay Khushwaha, Sukant Khurana (2020). Utilization of Time Series Tools in Life-sciences and Neuroscience. Neuroscience Insights (formerly Journal of Experimental Neuroscience). Sage Publications (Peer-reviewed | Scopus, ESCI, PubMed indexed). [Sage Version] [PubMed Version]
Domain: Statistical Modeling and Time-series Analysis

12. Harshit Gujral, Sangeeta Mittal, Abhinav Sharma (2019). A Novel Data mining approach for Analysis and Pattern Recognition of Active Fingerprinting components. Wireless Personal Communications by Springer (Semimonthly peer-reviewed | Scopus, SCIE indexed). [Springer Version] [ACM Version]
Domain: Network Communication and Cybersecurity.

13. Harshit Gujral, Abhinav Sharma, Parmeet Kaur (2018). Empirical Investigation of Trends in NoSQL-based Big-data Solutions in the Last Decade. IEEE: 11th International Conference on Contemporary Computing-IC3 [Web-Page] [IEEE Version]
Domain: NoSQL - Database Management Systems and Empirical Analyses

14. Harshit Gujral, Abhinav Sharma, Sangeeta Lal (2018). Empirical analysis of Q&A websites and a sustainable solution to ensure water-security. IEEE-11th International Conference on Contemporary Computing [Web-Page] [IEEE Version]
Domain: Global Water-Security

Concurrent Computing

15. Harshit Gujral, Abhinav Sharma, Sangeeta Mittal (2018). Determination of Optimal Thread Pool for Cloud based Concurrent Enhanced No-Escape Search. IEEE: 11th International Conference on Contemporary Computing-IC3 [GitHub] [IEEE Version]
Domain: Parallel Processing Algorithms and High-Performance Computing

Peer-Review Experience

Projects



OCR

Binarization, Segmentation and Recognition of generic shopping bills.

Project Details



Two-line Summary

This project performs OCR (Optical Character Recognition) on white paper receipts. The pipeline includes cropping (removing the background), quality improvement (if the image is blurred), binarization, segmentation, and prediction.

System Vulnerabilities

Estimation of System Vulnerabilities from the Network layer using Deep-learning Techniques

Project Details


Domain

Cybersecurity, Network Communication, Research, Data Analytics

Two-line Summary

This model aims to estimate system vulnerabilities remotely, without requiring system privileges and in much less time.

Academic Profile

Academic Profile for Prof. Belgin San-Akca, Koc University, Istanbul.

Project Details


Domain

Web-development

Two-line Summary

This web-development project is the academic profile of Professor Belgin San-Akca, Koc University, Istanbul, Turkey. I developed this website during the Summer Research Program 2018 at Koc University.

Hyperbola-based Thread-pool Analysis (HTA)

Determination of Optimal Thread Pool for Cloud based Concurrent Processes

Project Details


Domain

Research, Data Analytics, Algorithms

Two-line Summary

We aim to enhance the performance of any cloud-based indexing process by keeping the number of CPUs/cores constant while improving the utilization of existing resources. The Hyperbola-based Thread-Pool Analysis (HTA) technique was designed to determine the optimal thread pool depending on the number of keywords in the files to upload (workload) and the upload speed of the network (bandwidth).

Network Fingerprinting and Communication

Study on Active Fingerprinting Analysis of Hosts in an Institutional Network

Project Details


Domain

Network Communication, Cybersecurity, Research

Two-line Summary

A distinct correlation pattern is observed in timers (RTT, SRTT, RTTVar and RTO) with variation in IP-ID Sequence classes, traceroute protocols and network traffic intensity.

Network Health and Information Security

A Distributed Network Health Monitoring and Recovery System

Project Details


Domain

System and Software, Cybersecurity, Network Communication, Research

Two-line Summary

Eminent features of this system encompass monitoring and permeating malicious egress and ingress traffic, identification and scoring of most-prominent and exhaustive vulnerabilities along with exposing devices with anomalous fingerprints.

EEG

Electroencephalogram Data Analysis (Alcoholics and Control)

Project Details


Domain

Data Analytics, Research

Two-line Summary

Data analysis, visualization, and hypothesis (significance) testing of an EEG dataset.

AAM System

Automated Attendance Management System

Project Details


Domain

System and Software, Web-development

Two-line Summary

This Flask-based web application was implemented during my internship at Nyalazone Solutions Pvt Ltd.

ERM System

Employee Resource Management System

Project Details


Domain

System and Software, Web-development

Two-line Summary

Designed a centralized MySQL database normalized to BCNF. User privileges are divided into three categories: Superadmin, Admin, and Employee. The system has submodules such as an automated attendance recorder that scans the network with Nmap, an employee performance analyzer, a project monitoring system, a leave management system, and an internal chat system implemented using AJAX.

Indexing Algorithms

Disk-level indexing algorithms developed at Nyalazone Solutions Pvt Ltd.

Project Details


Domain

Algorithms, Research

Two-line Summary

This repository consists of disk-level indexing algorithms developed during my summer internship at Nyalazone Solutions Pvt Ltd. Tools/Technologies Used: Python, HDF-5, ZLIB Compression.

Trip-Advisor Scraping

This project is part of the Flight module of the Review Pool project.

Project Details


Domain

Web Scraping

Two-line Summary

Names and IDs of flights were extracted from Trip-Advisor and then used to dynamically crawl data from Trip-Advisor (for learning purposes only).

No-Escape Search System

Design and Implementation of Cloud-based Directory Content Search

Project Details


Domain

System and Software, Algorithms, Research

Two-line Summary

No-Escape Search addresses three problems of Windows Search: first, memory wastage by Windows indexing and its limited nature; second, slow data retrieval by unindexed Windows search; and third, the inability to show the user the location(s) of the input. Manuscript at IEEE Xplore.

Review Pool

Centralized Review and Ratings

Project Details


Domain

System and Software, Web-development

Two-line Summary

It is an integrated platform for reviews and ratings of multiple services, namely hotels, food, flights, and jobs. To aggregate the information, we used the APIs of TripAdvisor, Zomato, goibibo, and GlassDoor.

JIIT-Simplified Scraping

This project is part of my involvement with the JIIT-Simplified platform as a core-team member.

Project Details


Domain

Web Scraping

Two-line Summary

The script scrapes static HTML pages from the old JIIT-Simplified website and stores them in a MySQL database, followed by the creation of a new web interface for the placement forum on the JIIT-Simplified platform.

SUNNY FLIGHT

Project Details


Domain

System and Software, Web-development

Two-line Summary

It determines which ‘side of plane’ (left or right) should be reserved in order to get minimum exposure to sunlight using calculations of elevation of latitudes and longitudes of earth, flight and sunlight.

Yatrasoft

Client Management System - Java-JavaFX (JDK-8)

Project Details


Domain

System and Software

Two-line Summary

This GUI supports registering, updating, and deleting clients, along with a feature to migrate data to Excel using Apache POI.

Google Scraping

This scraper is part of our ongoing Mining Software Repository project.

Project Details


Domain

Web Scraping

Two-line Summary

The script stores Google Search results in a MySQL database.

Yellow-Pages

C++ Graphics Console Application

Project Details


Domain

Algorithms

Two-line Summary

This is a data structures project built with C++ graphics, based primarily on the binary search tree (BST). On the basis of the user's data, we implemented insertion, deletion, and search operations with interactive graphics. We also provide a comparison of the execution time of insertion and traversal in a binary tree (using a queue), a BST, an AVL tree, and a heap tree.
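For readers unfamiliar with the underlying structure, the following is a minimal Python sketch of BST insertion, search, and in-order traversal. It is not the project's C++ code; the `Node`, `insert`, `search`, and `inorder` names are hypothetical and chosen only for illustration, and deletion, balancing (AVL), and graphics are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the subtree rooted at root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search(root: Optional[Node], key: int) -> bool:
    """Walk down from the root, going left or right by comparison."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def inorder(root: Optional[Node]) -> List[int]:
    """In-order traversal visits the keys of a BST in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

Inserting the keys 50, 30, 70, 20, 40, 60, 80 in that order and then calling `inorder` yields them sorted, which is the property the search relies on.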

JIIT Scraping

This project involves scraping faculty information from the JIIT-Simplified platform for learning purposes only.

Project Details


Domain

Web Scraping

Two-line Summary

Names and other information are extracted from the www.jiit.ac.in website.

Presentations

1. Presented at ACM COMPASS 2023 (SIGCAS/SIGCHI Conference on Computing and Sustainable Societies) in Cape Town, South Africa, a research paper titled 'How Viable Are Energy Savings in Smart Homes? A Call to Embrace Rebound Effects in Sustainable HCI'.

2. Presented at Emerging Mobility Scholars Conference 2023 in Toronto, Canada, a research paper titled 'Impact of EV sales on childhood asthma in the US: Can ZEV mandates help?'.

3. Presented at Climate Positive Energy Research Day 2023 in Toronto, Canada, a research paper titled 'Examining the public health impact of ZEV mandates in the US'.

4. Presented at IEEE- International Conference on Contemporary Computing-2017 in Noida, India, a research paper titled 'No-Escape Search: Design and Implementation of Cloud Based Directory Content Search'.

5. Presented at National Seminar on Unboxing Today's Consumers in a Global And Digital Age in Noida, India, a review paper titled 'Desalination as permanent solution for water scarcity'.

Involvements

1. Graduated from the Young Urban Forest Leaders (YUFL) program by LEAF in Toronto, 2024, specializing in urban forestry and community leadership.

2. Volunteered at the Second Mile Club at Kensington Health, a senior home, through the Community Action Program (CAP) at the University of Toronto, providing support and companionship to elderly residents during 2023-24.

3. Collaborated with Climate Justice Toronto in organizing the Global March for Climate Action at Queen's Park in September 2023, advocating for progressive environmental policies and community engagement.

4. Attended the International Summer School on ICT for Sustainability (ICT4S 2021).

5. Attended a two-day Harvard US India Initiative Conference, Delhi-2018.

Documents



Contact

email: harshit.gujral@mail.utoronto.ca