
Michael Knight

Data Scientist/Machine Learning Engineer

"I have been working in Data Science since 2019 and obtained my Master’s Degree in Data Science in 2022. With a strong foundation in mathematics, I love using Machine Learning, AI, and Natural Language Processing techniques to solve complex puzzles."

01 PROFESSIONAL

MY DATA SCIENCE SKILL SET

MACHINE LEARNING

NLP

PYTHON

R

JAVA

SQL

JAVASCRIPT

NEURAL NETWORKS

DEEP LEARNING

AI

AWS

LLM


03 EXPERIENCE

2024-2024

VIDOORI

Data Scientist

  • Designed and built an AI-driven resume grader (written in Python and deployed on AWS Lambda) that scores candidates on a scale of 0-5 using 2 pass-fail disqualifying criteria, 3 weighted scoring criteria, and 3 bonus criteria, with 90% similarity to human grading

  • Designed 2 chatbots (HR Bot, Legal Bot) with distinct voices and exclusive access to their designated data, using Llama 2 via LlamaIndex

  • Developed cross-functional PowerPoint presentations to educate coworkers on the company’s state-of-the-art Transformer-based deep neural network, used to link people across two different surveys for the Census Bureau, as well as other AI/ML/NLP models and techniques

  • Precisely tracked and maintained cross-functional project timelines in Jira tickets using Agile methodology and Scrum practices
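
For illustration only, the grading rubric described in the first bullet above (2 pass-fail disqualifiers, 3 weighted criteria, 3 bonus criteria, 0-5 scale) could be sketched as follows. This is not the production grader: the weights, the bonus size, the rule for combining them, and every name in the code are assumptions made up for the example.

```python
from dataclasses import dataclass

# Every constant below is an illustrative assumption, not a value from the real grader.
WEIGHTS = (0.5, 0.3, 0.2)    # hypothetical weights for the 3 scored criteria (sum to 1)
BONUS_PER_CRITERION = 0.25   # hypothetical bump for each bonus criterion that is met
MAX_SCORE = 5.0              # the 0-5 scale stated in the bullet


@dataclass
class Candidate:
    disqualified: tuple  # 2 booleans; True means that pass-fail criterion was failed
    scores: tuple        # 3 floats, each already on the 0-5 scale
    bonuses: tuple       # 3 booleans; True means that bonus criterion was met


def grade(candidate: Candidate) -> float:
    """Return a 0-5 grade under the assumed combination rule."""
    # Failing any disqualifying criterion short-circuits to the minimum grade.
    if any(candidate.disqualified):
        return 0.0
    # Weighted average of the three scored criteria.
    base = sum(w * s for w, s in zip(WEIGHTS, candidate.scores))
    # Each met bonus criterion adds a fixed amount, capped at the top of the scale.
    bonus = BONUS_PER_CRITERION * sum(candidate.bonuses)
    return min(MAX_SCORE, base + bonus)


example = Candidate(
    disqualified=(False, False),
    scores=(4.0, 3.0, 5.0),
    bonuses=(True, False, False),
)
example_grade = grade(example)
```

Under these assumed weights the example candidate lands at roughly 4.15, and any candidate who fails a disqualifying criterion receives 0 regardless of the other inputs.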

2023-2023

CAREFORGE AI

Data Scientist

  • Built and maintained a secure, organized data repository using PostgreSQL 16, ensuring data integrity and accessibility throughout

  • Trained and optimized machine learning models enriched with key terms and search strings tailored to the platform's specific needs 

  • Worked with the advisory boards and used NLP techniques to extract insights, automate processes, and enhance user interactions on the platform, ensuring the most relevant and accurate responses to user input

  • Set up a local large language model (LLM) development pipeline

2023-2024

CHERRY STREET ENERGY

Data Scientist (Contractor)

  • Within a two-week deadline, created linear, ridge, and LASSO regression models that predict the monthly energy usage (kWh/mo) and intensity (kWh/sqft/mo) of a building in the U.S. Southeast with 90% accuracy, based on six inputs (square footage, stories, building profile, year constructed, weekly operating hours, and month)

  • Optimized the best-performing model (ridge) using GridSearchCV to tune hyperparameters

  • Incorporated the regression model into a SEED calculator, which, in tandem with a function that calculates Billing Demand, determines how much a client would save on energy cost by switching to Cherry Street Energy

  • Developed and designed data pipelines to support an end-to-end solution for accessing Georgia Power’s API to extract meaningful insights on commercial building data for the SEED calculator
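
As a rough sketch of the ridge tuning step mentioned above, the snippet below runs scikit-learn's GridSearchCV over the ridge penalty `alpha`. The data is synthetic and purely numeric (six made-up columns standing in for the six inputs), and the alpha grid, fold count, and scoring metric are assumptions rather than the settings used on the project.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 buildings x 6 numeric features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([3.0, 1.0, 0.5, -1.0, 2.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Scale features before ridge so the penalty treats every input comparably.
pipeline = make_pipeline(StandardScaler(), Ridge())

# Hypothetical grid of regularization strengths to try.
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validated search, scored by (negated) mean absolute error.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

best_alpha = search.best_params_["ridge__alpha"]
best_mae = -search.best_score_
```

In a real setting the categorical inputs (building profile, month) would need encoding before the ridge step; that is left out here to keep the sketch short.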

2020-2021

HUNGER FREE AMERICA

Data Analyst

  • Analyzed meal site data in Python to identify the largest increases and decreases in lunches and breakfasts served over 3 years across all 5 boroughs, and compared the results against census-tract poverty rates to find underserved neighborhoods

  • Mapped analytic discoveries for 1,312 meal sites over the span of 3 years using Tableau

  • Created complex, cross-object and cross-platform reports and dashboards in Tableau

  • Designed and implemented Python modules to conduct ETL and EDA processes on 3 sets of yearly data using NumPy and Pandas 

2021-2023

AMERICAN UNIVERSITY

Graduate Research Assistant (Machine Learning Engineer)

  • Developed human-assisted machine learning and natural language processing (NLP) approaches to infer information about chemical compounds from highly technical open literature sources

  • Enhanced existing machine learning techniques to predict 9 different properties of CNOHF chemical molecules from their molecular structures for 439 unique chemical compounds 

  • Designed and implemented nested K-fold cross-validation on a kernel ridge regression (KRR) model with a radial basis function (RBF) kernel to find the parameters giving the lowest Mean Absolute Error (MAE), using the 439 vectors of the 28-dimensional Sum Over Bonds featurization as the feature matrix (X) and the 9 chemical properties as the target vectors (y1-y9)

  • Designed and implemented convolutional neural networks with PyTorch and PyTorch Geometric to create neural fingerprints for these compounds based on their graph representations (as generated using RDKit)
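
The nested K-fold procedure described above can be sketched with scikit-learn roughly as follows. The feature matrix here is random noise shaped like the real one (439 x 28), only a single made-up target stands in for the 9 properties, and the parameter grid and fold counts are assumptions, not the values used in the research.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for the 439 x 28 feature matrix and one target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(439, 28))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=439)

# Hypothetical search space: ridge penalty (alpha) and RBF kernel width (gamma).
param_grid = {"alpha": [1e-3, 1e-2, 1e-1], "gamma": [1e-3, 1e-2, 1e-1]}

# Inner loop picks hyperparameters; outer loop estimates generalization error.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid,
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)

# Each outer fold refits the whole grid search on its own training split,
# so the held-out fold never influences hyperparameter selection.
outer_scores = cross_val_score(
    search, X, y, cv=outer_cv, scoring="neg_mean_absolute_error"
)
nested_mae = -outer_scores.mean()
```

Repeating the same loop once per target vector would cover all 9 properties; the outer-fold MAE is the number to report, since the inner-loop score is optimistically biased by the search itself.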


04 EDUCATION

2021-2022

AMERICAN UNIVERSITY

Washington, DC

Master of Science, Data Science

2019-2019

GENERAL ASSEMBLY

Data Science Immersive (full-time)

Computer science bootcamp (data science)

2016-2017

COMMUNITY COLLEGE OF PHILADELPHIA

Philadelphia, PA

Additional computer science courses (Java)

2009-2011

UNIVERSITY OF MARYLAND

College Park, Maryland

Additional mathematics and computer science courses (JavaScript)

2001-2006

BARD COLLEGE

Annandale-on-Hudson, NY

Bachelor of Arts, Mathematics

Some computer science courses

CONTACT

Thank you for visiting my website.  If you have any questions or would like to discuss any opportunities, you can reach me here.  I look forward to working with you.

​

mknight4714@gmail.com

Tel: 202-747-4509


  • LinkedIn
  • GitHub
  • Medium