
Jonathan A. Rocha
Data Scientist · AI/ML Engineer · Full-Stack Developer
Building applied ML systems in industry. Pursuing doctoral research in topology-aware deep learning and time-series data mining.
jarocha@smu.edu · LinkedIn · GitHub · Austin, TX
Choose your path
For Industry Recruiters
20+ years of full-stack engineering experience across financial services (Wells Fargo), automotive retail, marketing technology, and AI/ML consulting. Currently shipping production ML systems — multimodal RAG, multi-agent pipelines, sentiment-driven market intelligence — while completing an M.S. in Data Science at SMU.
- Featured projects (finrag.io, CounselOS, HarliBot)
- 20+ years of full-stack engineering
- Stack and ML tooling
For Academic Programs
Pursuing a PhD in data science, computer science, or mathematics with research interests in topology-aware deep learning, time-series data mining, and NLP applied to financial markets. Author of the Formal educational platform series (130+ topics across 29 tracks) and the Applied NLP for Finance book manuscript.
- Formal series — 130+ topics across 29 tracks
- Research interests and PhD direction
- SMU MSDS Capstone (advised by Dr. Lin)
🎉 What's New
Pursuing a PhD in data science, computer science, or mathematics — research focus on topology-aware deep learning and time-series data mining
Writing "Applied NLP for Finance: Building Market Intelligence Systems with Language Models" — book manuscript in progress with companion GitHub repo
DataSalt.ai consultancy actively serving Texas businesses across retail, agriculture, legal, and healthcare — 10 case studies and 7 technical blog posts published
Formal Educational Platform Series
130+ topics · 29 curriculum tracks · Astro 5 · React 18 · MDX · D3.js · KaTeX
Formal ML
Mathematical foundations of modern machine learning. Geometric-first exposition with rigorous proofs and interactive D3 visualizations.
Formal Statistics
A pure statistics curriculum from probability foundations through high-dimensional inference.
Projects
Cross-Asset Sentiment Regime Detector
Industry · Academic
SMU Capstone (advised by Dr. Lin). A two-layer pipeline that combines GARCH(1,1) volatility modeling with a Statistical Jump Model to detect cross-asset sentiment regimes. The ensemble integrates BERT-family transformers with classical time-series methods. Includes a live dashboard with a hybrid RAG and live-context chatbot.
finrag.io — Multimodal Financial RAG
Industry · Academic
A multimodal financial document intelligence platform: Gemini Embeddings 2 and Qdrant for retrieval, Cloudflare R2 for storage, a FastAPI backend on Fly.io, Claude Sonnet for synthesis, and Gemini Flash for text-to-speech of earnings call transcripts.
CounselOS — Multi-Agent Legal Intake
Industry
A multi-agent AI system for legal matter intake, built on a five-agent pipeline coordinated by a custom state-machine orchestrator. Developed as a portfolio and technical-interview project to demonstrate multi-agent orchestration patterns.
HarliBot — Bilingual Municipal AI Chatbot
Industry
A production RAG-based AI chatbot for the City of Harlingen, TX, with full bilingual support (English/Spanish), 2,100+ vector-indexed chunks, and deployment on Vercel and AWS Lambda.
DataSalt.ai — Consultancy Platform
Industry
The DataSalt.ai consultancy site, featuring 10 case studies covering South Texas verticals (boat sales, beach resort, shrimping, citrus/agriculture, healthcare, construction, law firm, used-car dealership), the SaltyDog AI chatbot with a custom French Bulldog avatar (Amelie), a programmatic hero image generator, and 7 technical blog posts.
Persistent Homology for Financial Regime Detection
Academic
Final project for Machine Learning II. Applied topological data analysis methods (persistent homology, Vietoris–Rips complexes, persistence diagrams) to financial time-series classification. Deliverables: video script, Jupyter notebook, and presentation deck.
Statistics Visualization
Academic
A collection of Mermaid diagrams visualizing complex statistical concepts for data science education.
ENGL 5374 Final Project
Academic
A Vite + React website exploring Twitter platform governance, featuring interactive diagrams.
In Progress
Applied NLP for Finance
Manuscript in progress · 12–13 chapters
Building Market Intelligence Systems with Language Models
A book and companion code repository covering knowledge graphs, LLM workflows, and applied financial NLP. Bridges current research with practitioner-ready engineering.
Research Interests
Pursuing a PhD in data science, computer science, or mathematics. Research is anchored in topology-aware deep learning and time-series data mining, with applied threads in financial NLP and sentiment-based regime detection.
Education
Master of Science, Data Science
Southern Methodist University (SMU)
Expected Graduation: August 2026 · GPA: 3.63 · Advisor: Dr. Lin
- Capstone: Sentiment-based market regime detection using an ensemble of BERT-family transformer models combined with GARCH(1,1) volatility modeling and a Statistical Jump Model
- Coursework: Artificial Intelligence, Database Management Systems, Applied Statistics I & II, Machine Learning II
Master of Arts, English
Texas A&M University – Central Texas (TAMUCT)
Graduated December 2024
Bachelor of Arts, History
Texas A&M University (TAMU)
Graduated 2004
Experience
Founder & Chief Executive Officer
DataSalt.ai · Current
February 2025 – Present · Austin, TX · Hybrid
Founded a boutique AI/ML consultancy serving small and mid-sized Texas businesses. Lead all data science engagements: retrieve and analyze sensitive client data across retail, agriculture, legal, healthcare, and other verticals, and turn those inputs into actionable insights through clear data storytelling.
- Built finrag.io, a multimodal financial RAG system (Gemini Embeddings 2, Qdrant, Cloudflare R2, FastAPI on Fly.io, Claude Sonnet, Next.js on Vercel)
- Created the Formal educational platform series (formalml.com, formalstatistics.com, formalcalculus.com) with Astro 5, React 18, MDX, Tailwind CSS, D3.js, and KaTeX: 130+ published topics across 29 curriculum tracks
- Published a portfolio of 10 case studies and 7 technical blog posts at datasalt.ai; built SaltyDog, an AI chatbot with a custom avatar
Senior Web Developer & Full-Stack Engineer
Fullsteam / Fullsteam Marketing · Current
January 2015 – December 2025 · Austin, TX · Remote
11-year tenure spanning three roles: Web Developer (Jan 2015) → Full-Stack Engineer at Fullsteam Marketing (Jan 2016, concurrent) → promoted to Senior Web Developer (Jan 2019). Architected and maintained React-based web applications aligned with the company's digital strategy.
- Executed full-stack development with HTML, CSS, JavaScript, and Python; managed database systems and AWS cloud infrastructure
- Optimized site performance for SEO and user experience, improving the visibility and search rankings of Fullsteam digital properties
- Contributed to digital growth strategy through data-informed development decisions and cross-functional collaboration
Independent Web Development Consultant
Self-Employed · Current
2004 – Present · Austin, TX · Hybrid
Two decades delivering end-to-end custom web solutions, specializing in React, responsive design, and UX/UI, for clients ranging from startups to enterprises.
- Manage every stage of the project lifecycle: requirements gathering, technical architecture, full-stack development, API integration, performance optimization, accessibility, and ongoing maintenance
- Translate complex business needs into clear technical solutions; advise non-technical stakeholders on best practices
Senior Web Developer
Amaru Motors LP dba Charlie Clark Nissan
March 2009 – January 2015 · Harlingen, TX · Onsite
Developed, optimized, and maintained user-facing websites and web applications for one of South Texas’s largest dealer-group operations.
- Led mobile-first web design initiatives and contributed to the initial design and development of the company mobile app
- Optimized site performance for SEO and usability, improving search rankings and customer engagement across digital properties
Web Developer
Wells Fargo
March 2004 – March 2009 · San Antonio, TX · Onsite
Built and maintained secure, user-friendly web applications supporting Wells Fargo online banking (account management, transactions, customer self-service) under strict regulatory and security requirements.
- Worked across the stack with a back-end focus using HTML, CSS, JavaScript, and Python; built reliable, high-performing features under banking-grade security standards
- Collaborated with product, design, and security teams to ensure regulatory compliance while optimizing back-end services for response time, stability, and usability
Technical Skills
Development
TypeScript / JavaScript (ES6+)
React / Next.js
Astro / MDX
Python (FastAPI, Flask)
Node.js
R, SQL
Data Science & ML
Hugging Face Transformers / BERT
NLP & time-series modeling
RAG pipelines (Gemini, Claude, Qdrant)
GARCH & ensemble methods
scikit-learn, PyTorch
Statistical inference & A/B testing
Math & ML Theory
Topology & TDA (persistent homology, Mapper)
Differential geometry
Optimization
Probability & statistics
Information theory
Learning theory
Infrastructure & Data
Vercel · Fly.io · Railway
AWS · Cloudflare R2
Qdrant · MongoDB · NoSQL
D3.js · KaTeX · MDX
Docker · CI/CD
Full-stack architecture
Blog
Building an AI Market Regime Detector
How I built a cross-asset sentiment regime detector using ensemble transformer models for my SMU Capstone project.
HarliBot: Building a Bilingual AI Chatbot for Local Government
A case study on building and deploying a production RAG-based bilingual chatbot for the City of Harlingen, TX.
From History to Data Science
My academic journey from a BA in History to an MS in Data Science.
Combining an English MA with Technical Writing
How my academic background in English has influenced my technical writing.
Get in Touch

I'm open to AI/ML engineering and data science roles as well as pre-doctoral and PhD program conversations. Reach out — I'd be glad to talk.