Hello, and welcome to my page. I'm Matt Najarian. I'm a data scientist with expertise in machine learning, statistics, optimization, and end-to-end model deployment. Over the past few years, I’ve built and deployed predictive models using Python, SQL, Apache Spark, and cloud platforms like Azure Machine Learning, Snowflake, and Databricks. I love teaming up with cross-functional groups, from data engineers to business stakeholders, to turn complex challenges into impactful solutions.
In this repo, I share some of my experiences in the following areas:
Section 1: AI
- machine learning: applications to marketing (see the segmentation sketch after this list)
  - customer segmentation
  - customer lifetime value
  - media mix model
- LLM, RAG, Chatbot (powered by models running locally on Ollama and Hugging Face; see the chatbot sketch after this list)
- PyTorch, TensorFlow, and Keras
- Apache Spark (see the ALS sketch after this list)
  - recommendation system (ALS)
  - RDD, DataFrame, and MLlib
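To give a flavor of the marketing work, here is a minimal customer-segmentation sketch using k-means from scikit-learn. The RFM-style features and toy values are illustrative assumptions for this sketch, not data from the repo:

```python
# Hypothetical RFM-style customer segmentation with k-means.
# The column names and values below are made-up examples.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy data standing in for a real customer table.
customers = pd.DataFrame({
    "recency":   [5, 40, 3, 60, 10, 90],       # days since last purchase
    "frequency": [20, 2, 15, 1, 12, 1],        # purchases per year
    "monetary":  [500, 40, 420, 25, 300, 10],  # yearly spend
})

# Standardize so no single feature dominates the Euclidean distance.
X = StandardScaler().fit_transform(customers)

# k=3 is an arbitrary choice for the sketch; in practice, pick k
# with the elbow method or silhouette score.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(X)

# Profile each segment by its average behavior.
print(customers.groupby("segment").mean())
```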
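On the local-LLM side, a single chatbot turn against a locally running Ollama server can be one HTTP call. This sketch assumes Ollama is serving on its default port (11434) and that a model such as `llama3` has already been pulled; it is not the repo's chatbot code:

```python
# Minimal sketch of one chatbot turn against Ollama's local REST API.
import requests

def ask(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the local Ollama server and return the reply."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]

print(ask("Explain unit commitment in one sentence."))
```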
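And here is a minimal PySpark ALS recommender on toy (user, item, rating) triples; the data and hyperparameters are placeholders, not the repo's settings:

```python
# Sketch of a collaborative-filtering recommender with Spark MLlib's ALS.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-sketch").getOrCreate()

# Toy interaction data standing in for real logs.
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0),
     (1, 2, 5.0), (2, 0, 1.0), (2, 2, 4.0)],
    ["userId", "itemId", "rating"],
)

als = ALS(
    userCol="userId", itemCol="itemId", ratingCol="rating",
    rank=5, maxIter=10, regParam=0.1,
    coldStartStrategy="drop",  # skip users/items unseen at training time
)
model = als.fit(ratings)

# Top-2 item recommendations per user.
model.recommendForAllUsers(2).show(truncate=False)

spark.stop()
```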
Section 2: Optimization
This section contains my optimization model implementations in Python, Java, and C++. Here is a list of the projects you will find:
- Security-Constrained Unit Commitment: unit commitment is the process of turning on (committing) resources to meet load and other market requirements. Security-Constrained Unit Commitment (SCUC) commits units (electricity generators) while respecting the limitations of the transmission system and of each unit. I have coded it in Java and Python (a simplified sketch follows this list).
- Maximizing Infrastructure Resiliency Under Budgetary Constraints: when investing in resiliency, it is crucial to distribute the budget among different resources so that the overall effect is maximized. See my paper: https://www.sciencedirect.com/science/article/abs/pii/S0951832019308336
- Component Importance: during recovery from a disaster, some components play a more important role than others. This is ongoing research of mine to identify those components. The code includes visualizations (Cytoscape) and random graph generation scripts (a small sketch follows this list).
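To give a flavor of the unit-commitment formulation, here is a single-period sketch in Python with PuLP. It is deliberately simplified relative to the repo's SCUC code: transmission and security constraints are omitted, and all generator data are made-up numbers:

```python
# Single-period unit commitment sketch (not the full SCUC model).
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, value

units = ["g1", "g2", "g3"]
pmin = {"g1": 50, "g2": 20, "g3": 10}      # MW minimum output when committed
pmax = {"g1": 200, "g2": 100, "g3": 60}    # MW maximum output
cost = {"g1": 20, "g2": 35, "g3": 50}      # $/MWh marginal cost
fixed = {"g1": 500, "g2": 200, "g3": 100}  # $ commitment (no-load) cost
demand = 250                               # MW load to serve

prob = LpProblem("unit_commitment", LpMinimize)
u = {g: LpVariable(f"u_{g}", cat=LpBinary) for g in units}  # on/off decision
p = {g: LpVariable(f"p_{g}", lowBound=0) for g in units}    # MW output

# Minimize commitment cost plus generation cost.
prob += lpSum(fixed[g] * u[g] + cost[g] * p[g] for g in units)

# Serve the load exactly.
prob += lpSum(p[g] for g in units) == demand

# A unit may generate only within its limits, and only when committed.
for g in units:
    prob += p[g] >= pmin[g] * u[g]
    prob += p[g] <= pmax[g] * u[g]

prob.solve()
for g in units:
    print(g, "on" if value(u[g]) > 0.5 else "off", value(p[g]), "MW")
```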
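And here is a small sketch of the component-importance idea: generate a random graph and rank nodes by betweenness centrality as one possible proxy for how critical a component is during recovery. NetworkX stands in here for illustration; the repo's visualizations use Cytoscape:

```python
# Rank nodes of a random graph by betweenness centrality
# (one proxy, among others, for component importance).
import networkx as nx

# Random graph generation: Erdos-Renyi with 30 nodes, edge prob 0.15.
G = nx.erdos_renyi_graph(n=30, p=0.15, seed=7)

# Betweenness centrality: how often a node sits on shortest paths.
importance = nx.betweenness_centrality(G)

# The five most "important" components under this proxy.
top5 = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:5]
for node, score in top5:
    print(f"node {node}: {score:.3f}")
```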
Section 3: Databases and BI
- SQL (PostgreSQL 14.0 and pgAdmin 4)
- NoSQL (Neo4j)
- Business Intelligence (Apache Superset, Tableau)
One of my hobbies is managing my personal computing cluster. It consists of three standard PCs and one high-end PC, on which I have installed several applications to support my projects and experiments.
I built my high-end PC using parts sourced from MicroCenter and a used Nvidia RTX 3090 Ti Founders Edition (24 GB) that I purchased for $800. The system is powered by a Ryzen 7 CPU and equipped with 32 GB of RAM (F5-6000J3238F16G).
Here is what runs on this cluster:
- Hadoop and Apache Spark
- Ollama
- PostgreSQL 14.0 and PgAdmin 4
- Apache Airflow
Here is a list of useful links that Steve Nouri has shared on his Twitter account, plus two other links that I added. While I put them here for ease of access, please also read the original Twitter threads.