📚 ArXivLens AI - Advanced Analytics Suite

A comprehensive data science project for analyzing ArXiv AI/ML research papers (2025-2026) with an interactive, animated, glassmorphic Streamlit dashboard.

🎯 Project Overview

This project combines multiple machine learning and LLM/RAG techniques to analyze research papers:

LDA Topic Modeling: Discover 10 major research topics dynamically
Semantic Search: Vector similarity search across paper abstracts using sentence embeddings
Clustering: K-Means clustering (8 clusters) projected in interactive 3D PCA space
Network Analysis: Co-authorship collaboration and PageRank analysis
Trend Forecasting: Track and forecast research popularity using regression models
Advanced LLM/RAG: Question-answering, document summarization, and automated literature reviews
Interactive Dashboard: Modern responsive UI with animated particle canvas backgrounds

📦 Project Structure

Project-1/
├── model.ipynb                         # Main Jupyter notebook with all analysis
├── streamlit_app.py                    # Streamlit web application dashboard
├── requirements.txt                    # Python dependencies
├── README.md                           # This file
├── DEPLOYMENT_GUIDE.md                 # Deployment instructions and guidelines
├── run_api.py                          # REST API entry point
├── run_streamlit.bat                   # Streamlit launcher batch script (Windows)
├── run_streamlit.ps1                   # Streamlit launcher PowerShell script
├── setup.bat                           # Developer environment setup script
├── .env                                # Local environment API keys and configs
├── src/                                # Modular computational engines
│   ├── api.py                          # REST API endpoints & payload models
│   ├── db.py                           # SQLite database connection management
│   ├── citation_analysis.py            # PageRank and citation leaderboard computation
│   ├── predictive_analytics.py         # Topic forecasting and saturation regression models
│   ├── nlp_applications.py             # Summarization, QA, and Literature Review engines
│   ├── dl_models.py                    # Neural network classifiers and scientific tone engines
│   ├── knowledge_graph.py              # NetworkX graphs & semantic relationships
│   ├── report_generator.py             # ReportLab PDF executive brief builder
│   └── run_pipeline.py                 # Database initializer and model trainer
└── models/                             # Saved models & outputs (auto-generated)
    ├── lda_model.model                 # LDA topic model
    ├── kmeans_model.pkl                # K-Means clustering model
    ├── pca_model.pkl                   # PCA mapping model
    ├── dictionary.dict                 # Gensim topic dictionary
    ├── corpus.pkl                      # Preprocessed gensim corpus
    ├── vectorizer.pkl                  # TF-IDF vectorizer
    ├── network_graph.pkl               # Co-authorship NetworkX graph
    ├── embeddings.npy                  # High-dimensional sentence embeddings
    ├── embeddings_2d.npy               # 2D PCA projected embeddings
    ├── metadata.json                   # Run metadata
    ├── processed_data.csv              # Flattened paper catalog dataset
    ├── Research_Intelligence_Brief.pdf # Generated PDF brief
    ├── citation_predictor.pkl          # Citations regression model
    ├── pytorch_classifier.pkl          # PyTorch model checkpoint
    └── research_suite.db               # Primary SQLite database data store

🚀 Quick Start

1. Install Dependencies

cd Project-1
pip install -r requirements.txt

2. Initialize the Database and Train ML Models

# Creates the SQLite database (models/research_suite.db) and trains the LDA, PCA, K-Means, and network models
python -m src.run_pipeline

3. Launch Streamlit Dashboard

streamlit run streamlit_app.py --server.port 8501

The dashboard will open automatically in your browser at http://localhost:8501.

🤖 LLM & RAG Integration

The suite supports generative AI analysis using OpenAI or DeepSeek chat models.

🔑 Sidebar API Key Manager

In the sidebar, expand 🔑 LLM API Key Configuration to enter your personal DeepSeek or OpenAI API keys. Keys entered there are cached in st.session_state and take priority over keys loaded from .env for all queries.
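The precedence rule (sidebar key first, then the environment) can be sketched as a small pure function. This is an illustration, not the app's code: the session-state key names and environment variable names below are hypothetical, and it assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional

# Hypothetical names: the app's real session-state keys and environment variables may differ.
SESSION_KEYS = {"deepseek": "deepseek_api_key", "openai": "openai_api_key"}
ENV_VARS = {"deepseek": "DEEPSEEK_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_api_key(
    provider: str,
    session_state: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the key for `provider`, preferring a sidebar-entered key over the environment."""
    sidebar_key = (session_state.get(SESSION_KEYS[provider]) or "").strip()
    if sidebar_key:
        return sidebar_key
    # Blank or missing sidebar value: fall back to whatever .env put into the environment.
    env_key = (env.get(ENV_VARS[provider]) or "").strip()
    return env_key or None
```

Because `st.session_state` behaves like a mapping, inside Streamlit the call would look like `resolve_api_key("deepseek", st.session_state)`; passing plain dicts keeps the function testable without a running app.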

⚠️ Transparent Billing Error Handling

If a configured API key runs out of funds, the application detects the HTTP 402 (Insufficient Balance) API response, shows a styled warning with links to the provider's top-up page, and falls back gracefully to local resources.
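A minimal sketch of this provider chain, assuming each provider is wrapped in a plain callable and that client exceptions carry the HTTP status in a `status_code` attribute (as the OpenAI Python SDK's `APIStatusError` does). The function and type names are hypothetical. Note that OpenAI reports an exhausted quota as HTTP 429 rather than 402, so a real handler may need to treat both.

```python
from typing import Callable, List, Tuple

# DeepSeek documents HTTP 402 as "Insufficient Balance".
INSUFFICIENT_BALANCE = 402

Provider = Tuple[str, Callable[[str], str]]  # (display name, function that calls the LLM)


def generate_with_fallback(
    prompt: str,
    providers: List[Provider],
    local_fallback: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Try each LLM provider in order; use the local method if every provider fails.

    Returns the generated text plus warning messages for the UI to display.
    """
    warnings: List[str] = []
    for name, call in providers:
        try:
            return call(prompt), warnings
        except Exception as err:  # broad on purpose: any provider failure moves to the next option
            # Read the HTTP status defensively; not every exception type carries one.
            if getattr(err, "status_code", None) == INSUFFICIENT_BALANCE:
                warnings.append(f"{name}: insufficient balance (HTTP 402). Top up the account to re-enable it.")
            else:
                warnings.append(f"{name}: unavailable ({type(err).__name__}).")
    return local_fallback(prompt), warnings
```

Returning warnings instead of rendering them keeps the function UI-agnostic; the dashboard can show each one with `st.warning`.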

🔄 Offline Fallbacks

If all LLM APIs are offline or lack balance:

Summarization: Falls back to a local TF-IDF TextRank extractive summary.
Literature Reviews: Falls back to an extractive metadata synthesis template.
Document QA: Synthesizes structured insights directly from retrieved document abstracts.

📊 Dashboard Sections

🏠 Executive Dashboard

High-level KPIs (total papers, clusters, citation edges)
Glassmorphic card metrics with hover animation effects
Interactive Plotly distribution chart

🔍 Semantic Search & QA

Vector Search: Semantic query matching over sentence embeddings of paper abstracts (`all-MiniLM-L6-v2`)
Interactive QA: Ask questions and get answers synthesized from papers. Toggle the context between the Local Database and Global Live ArXiv.
Automated Literature Review: Generate cohesive paragraphs that survey the papers on a topic, using either the local database or live global search.
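Vector search and retrieval-augmented QA come down to two steps: rank abstracts by cosine similarity to the query embedding, then pack the best matches into a numbered prompt for the LLM. The sketch below works on plain float lists so it runs without dependencies; the function names and prompt wording are hypothetical, not the project's.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Indices and scores of the k abstracts most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_rag_prompt(question: str, abstracts: List[str], hits: List[Tuple[int, float]]) -> str:
    """Number the retrieved abstracts so the LLM can cite them in its answer."""
    context = "\n\n".join(f"[{rank}] {abstracts[i]}" for rank, (i, _score) in enumerate(hits, start=1))
    return (
        "Answer the question using only the numbered abstracts below, citing them as [n].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

With real data, `doc_vecs` would presumably be the rows of `models/embeddings.npy` and `query_vec` would come from `SentenceTransformer("all-MiniLM-L6-v2").encode(question)`. With NumPy arrays, normalizing the vectors once and taking a single matrix product is much faster than the per-pair loop shown here.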

📊 Topics & Trend Forecasting

LDA Topic list with TF-IDF keyword frequencies
12-month topic popularity projections
Growth velocity and saturation analytics
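The exact regression and saturation models are not described here, so the following is only a sketch under the simplest assumption: an ordinary least-squares line fitted to monthly paper counts for one topic. The slope serves as "growth velocity" (papers per month) and the line is extended 12 months ahead. Function names are hypothetical.

```python
from typing import List, Tuple


def fit_linear_trend(counts: List[float]) -> Tuple[float, float]:
    """OLS fit of counts[t] ~ intercept + slope * t for t = 0..n-1; returns (intercept, slope)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_t = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(counts))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return mean_y - slope * mean_t, slope


def forecast(counts: List[float], horizon: int = 12) -> List[float]:
    """Project the fitted line `horizon` months past the last observation."""
    intercept, slope = fit_linear_trend(counts)
    n = len(counts)
    # Clamp at zero: a monthly paper count cannot be negative.
    return [max(0.0, intercept + slope * (n + h)) for h in range(horizon)]
```

For counts `[10, 12, 14, 16]` the fit gives a velocity of 2 papers per month and a 12-month projection running from 18 to 40. A linear trend cannot capture saturation, which would need a curve that flattens (for example a logistic fit).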

🔗 Citation Network & PageRank

View collaboration networks, node degree distributions, and PageRank rankings
Export network insights
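A dependency-free sketch of the co-authorship PageRank, assuming every paper links each pair of its authors and edge weight counts the papers they share. In the project NetworkX would normally do the ranking (`nx.pagerank(G, alpha=0.85, weight="weight")`). The power iteration below makes the computation explicit, including NetworkX's default handling of isolated authors. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]


def coauthorship_graph(papers: List[List[str]]) -> Graph:
    """Undirected weighted graph: edge weight = number of papers two authors share."""
    graph: Graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, {})  # single-author papers still create a node
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def pagerank(graph: Graph, damping: float = 0.85, max_iter: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Weighted PageRank by power iteration (same damping default as NetworkX)."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = {v: sum(graph[v].values()) for v in nodes}  # total edge weight per author
    for _ in range(max_iter):
        # Authors with no co-authors are "dangling": spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] * w / strength[u] for u, w in graph[v].items())
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```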

👥 Researcher Analytics & Predictive Success

Metric calculations (citations, h-index) and projected citation-velocity trajectories
Search Scope Toggle: Switch between the Local Database Catalog and Global AI Search (LLM-driven) for instant global academic profiling.
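The h-index has a standard definition: the largest h such that the researcher has h papers with at least h citations each. A direct implementation follows (the function name is ours; the project's own code is not shown).

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break  # counts are descending, so no later paper can qualify
    return h
```

For citation counts `[10, 8, 5, 4, 3]` the h-index is 4: the fourth-ranked paper has 4 citations, but the fifth has only 3.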

📁 PCA Paper Clustering & Summaries

3D PCA Interactive Cluster Map: Project and rotate high-dimensional embeddings in a 3D Plotly canvas.
Document Summaries: Paste custom abstracts, select local papers, or use the Live ArXiv Search API to fetch and summarize any paper dynamically.
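A sketch of the clustering-plus-projection step with scikit-learn, using the 8 clusters from the overview. The seed and `n_init` values are assumptions. One design choice worth noting: clustering happens in the full embedding space, and PCA is applied only to produce plotting coordinates. The 3D view can therefore look more mixed than the clusters really are, and the retained variance ratio shows how much structure the plot keeps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_and_project(embeddings: np.ndarray, n_clusters: int = 8, seed: int = 42):
    """Cluster in the full embedding space, then reduce to 3D only for display."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embeddings)
    pca = PCA(n_components=3, random_state=seed)
    coords_3d = pca.fit_transform(embeddings)
    # Share of total variance kept by the three plotted axes.
    explained = float(pca.explained_variance_ratio_.sum())
    return labels, coords_3d, explained
```

The coordinates can then go into Plotly, for example `px.scatter_3d(x=coords[:, 0], y=coords[:, 1], z=coords[:, 2], color=labels.astype(str))`, which gives a rotatable map coloured by cluster.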

📈 Live ArXiv Monitor

Connects directly to the live ArXiv API feed to fetch the newest preprints and run the ML classifiers and citation predictor on them in real time.
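A standard-library sketch of fetching the newest preprints from the public arXiv API, which returns an Atom feed. The query parameters (`search_query`, `sortBy`, `sortOrder`, `start`, `max_results`) belong to the arXiv API itself. The default category and the function names are assumptions, and the dashboard's actual client may differ. Parsing is kept separate from the network call so it can be tested offline.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # the feed is Atom, so tags are namespaced


def newest_preprints_url(category: str = "cs.AI", max_results: int = 10) -> str:
    """Query URL for the most recently submitted papers in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml: Union[str, bytes]) -> List[Dict[str, str]]:
    """Flatten each Atom <entry> into a dict; whitespace in titles and abstracts is collapsed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", "").strip(),
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
            "published": entry.findtext(f"{ATOM}published", "").strip(),
            "authors": ", ".join(a.findtext(f"{ATOM}name", "") for a in entry.findall(f"{ATOM}author")),
        })
    return papers


def fetch_newest(category: str = "cs.AI", max_results: int = 10) -> List[Dict[str, str]]:
    """Network call; arXiv asks clients to wait about 3 seconds between API requests."""
    with urllib.request.urlopen(newest_preprints_url(category, max_results), timeout=30) as resp:
        return parse_feed(resp.read())
```

Each returned abstract can then be passed to the classifiers and the citation predictor like any locally stored paper.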

⚙️ System & Reports

Build a styled PDF research brief with ReportLab
Export processed paper datasets to CSV
View system path configurations

🛠️ Technical Stack

| Component | Technology |
|---|---|
| Data Processing | Pandas, NumPy, SQLite |
| ML & Clustering | Scikit-learn, Sentence-Transformers (`all-MiniLM-L6-v2`) |
| NLP & Topic Modeling | NLTK, Gensim, TF-IDF |
| Generative LLM / RAG | OpenAI API, DeepSeek API, Live ArXiv Client |
| Graph Analytics | NetworkX |
| Visualizations | Plotly Express, Plotly Graph Objects, HTML Canvas |
| PDF Generation | ReportLab PDF Library |
| Frontend UI | Streamlit, Glassmorphism, CSS Micro-animations |
