Jonathan A. Rocha

Data Scientist · AI/ML Engineer · Full-Stack Developer

Building applied ML systems in industry. Pursuing doctoral research in topology-aware deep learning and time-series data mining.

jarocha@smu.edu · LinkedIn · GitHub · Austin, TX

Choose your path

For Industry Recruiters

20+ years of full-stack engineering experience across financial services (Wells Fargo), automotive retail, marketing technology, and AI/ML consulting. Currently shipping production ML systems — multimodal RAG, multi-agent pipelines, sentiment-driven market intelligence — while completing an M.S. in Data Science at SMU.

Download Resume

For Academic Programs

Pursuing a PhD in data science, computer science, or mathematics with research interests in topology-aware deep learning, time-series data mining, and NLP applied to financial markets. Author of the Formal educational platform series (130+ topics across 29 tracks) and the Applied NLP for Finance book manuscript.

Download CV

🎉 What's New

🎓

Pursuing a PhD in data science, computer science, or mathematics — research focus on topology-aware deep learning and time-series data mining

📘

Writing "Applied NLP for Finance: Building Market Intelligence Systems with Language Models" — book manuscript in progress with companion GitHub repo

💼

DataSalt.ai consultancy actively serving Texas businesses across retail, agriculture, legal, and healthcare — 10 case studies and 7 technical blog posts published

Formal Educational Platform Series

130+ topics · 29 curriculum tracks · Astro 5 · React 18 · MDX · D3.js · KaTeX

formalml.com · 65+ topics · 13 tracks

Formal ML

Mathematical foundations of modern machine learning. Geometric-first exposition with rigorous proofs and interactive D3 visualizations.

Topology & TDA · Linear Algebra · Probability & Statistics · Optimization · Differential Geometry · +8 more

formalstatistics.com · 32+ topics · 8 tracks

Formal Statistics

A pure statistics curriculum from probability foundations through high-dimensional inference.

Foundations of Probability · Core Distributions & Families · Convergence & Limit Theorems · Statistical Estimation · Hypothesis Testing & Confidence · +3 more

formalcalculus.com · 32+ topics · 8 tracks

Formal Calculus

Calculus and analysis curriculum spanning single-variable through functional analysis essentials.

Limits & Continuity · Single-Variable Calculus · Multivariable Differential Calculus · Multivariable Integral Calculus · Sequences & Series · +3 more

Projects

Cross-Asset Sentiment Regime Detector

Industry · Academic

SMU Capstone (advised by Dr. Lin). Two-layer pipeline combining GARCH(1,1) volatility modeling with a Statistical Jump Model for cross-asset sentiment regime detection. Ensemble approach integrates BERT-family transformers with classical time-series methods. Live dashboard with hybrid RAG + live-context chatbot.

Python · BERT · GARCH · Statistical Jump Model · RAG
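As a rough illustration of the volatility layer named above, the GARCH(1,1) conditional-variance recursion can be sketched in a few lines. This is a minimal sketch and not the capstone's code: the function name `garch11_variance` and the parameter values are hypothetical, the parameters are assumed to be already estimated (in practice they would be fitted by maximum likelihood), and the Statistical Jump Model layer is not shown.

```python
from typing import List, Sequence


def garch11_variance(
    returns: Sequence[float],
    omega: float,
    alpha: float,
    beta: float,
) -> List[float]:
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if len(returns) == 0:
        return []
    # Assumption: start the recursion at the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    # Each new variance uses the previous return and the previous variance.
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


if __name__ == "__main__":
    # Illustrative (made-up) returns and parameters, not fitted values.
    demo_returns = [0.01, -0.03, 0.02, 0.00, -0.01]
    print(garch11_variance(demo_returns, omega=1e-5, alpha=0.1, beta=0.85))
```

A large squared return raises the next period's variance through `alpha`, while `beta` controls how slowly that shock decays. That persistence is the kind of signal a downstream regime model can segment.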

finrag.io — Multimodal Financial RAG

Industry · Academic

A multimodal financial document intelligence platform. Gemini Embeddings 2 + Qdrant for retrieval, Cloudflare R2 for storage, FastAPI backend on Fly.io, Claude Sonnet for synthesis, Gemini Flash for TTS of earnings call transcripts.

Gemini · Qdrant · FastAPI · Claude Sonnet · Next.js

CounselOS — Multi-Agent Legal Intake

Industry

A multi-agent AI legal matter intake system featuring a five-agent pipeline with a custom state machine orchestrator. Built for portfolio and technical interview preparation; demonstrates multi-agent orchestration patterns.

FastAPI · Next.js · Multi-Agent · Railway · Vercel

HarliBot — Bilingual Municipal AI Chatbot

Industry

Production RAG-based AI chatbot for the City of Harlingen, TX. True bilingual support (EN/ES), 2,100+ vector-indexed chunks, full deployment on Vercel + AWS Lambda.

React · Next.js · RAG · AWS Lambda

DataSalt.ai — Consultancy Platform

Industry

The DataSalt.ai consultancy site: 10 case studies covering South Texas verticals (boat sales, beach resort, shrimping, citrus/agriculture, healthcare, construction, law firm, used-car dealership), the SaltyDog AI chatbot with a custom Amelie French Bulldog avatar, a programmatic hero image generator, and 7 technical blog posts.

Next.js 14 · Tailwind CSS · Vercel

Persistent Homology for Financial Regime Detection

Academic

ML 2 final project. Applied topological data analysis methods — persistent homology, Vietoris–Rips complexes, persistence diagrams — to financial time-series classification. Deliverables: video script, Jupyter notebook, presentation deck.

Python · GUDHI · Ripser · giotto-tda
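To make the persistent-homology idea above concrete, here is a minimal standard-library sketch that handles only the 0-dimensional case. It is not the project's code, which used the TDA libraries listed above. The helper names `delay_embed` and `h0_persistence` are hypothetical, the sliding-window embedding is one common way (assumed here) to turn a series into a point cloud, and an edge is taken to enter the Vietoris–Rips filtration at the distance between its endpoints.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def delay_embed(series: Sequence[float], dim: int = 2, lag: int = 1) -> List[Point]:
    """Sliding-window (delay) embedding of a 1-D series into R^dim."""
    n = len(series) - (dim - 1) * lag
    return [tuple(series[i + j * lag] for j in range(dim)) for i in range(n)]


def h0_persistence(points: Sequence[Point]) -> List[float]:
    """Finite death times of connected components (H0) in a Vietoris-Rips filtration.

    Every component is born at scale 0. A component dies at the length of the
    edge that first merges it into another one. One component never dies and
    is omitted from the result.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths: List[float] = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(length)
    return deaths


if __name__ == "__main__":
    # Made-up series for illustration only.
    cloud = delay_embed([0.0, 1.0, 0.0, -1.0, 0.0, 1.0], dim=2, lag=1)
    print(h0_persistence(cloud))
```

The resulting death times are the finite points of the H0 persistence diagram (all with birth 0). Higher-dimensional features such as loops require the full simplicial machinery that libraries like GUDHI or Ripser provide.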

Statistics Visualization

Academic

A collection of Mermaid diagrams visualizing complex statistical concepts for Data Science education.

Mermaid.js

ENGL 5374 Final Project

Academic

A Vite-React website exploring Twitter platform governance, featuring interactive diagrams.

React · Vite

In Progress

Applied NLP for Finance

Manuscript in progress · 12–13 chapters

Building Market Intelligence Systems with Language Models

A book and companion code repository covering knowledge graphs, LLM workflows, and applied financial NLP. Bridges current research with practitioner-ready engineering.

Companion repo · Dual MIT / CC BY-NC 4.0

Research Interests

Pursuing a PhD in data science, computer science, or mathematics. Research is anchored in topology-aware deep learning and time-series data mining, with applied threads in financial NLP and sentiment-based regime detection.

Topology-aware deep learning
Time-series data mining
NLP applied to financial markets
Sentiment-based market regime detection
Topological data analysis (persistent homology, Mapper)
Ensemble transformer models
Scalable ML systems
Distributed computing architectures
Computational text analysis

Education

Master of Science, Data Science

Southern Methodist University (SMU)

Expected Graduation: August 2026 · GPA: 3.63 · Advisor: Dr. Lin

  • Capstone: Sentiment-based market regime detection using an ensemble of BERT-family transformers with GARCH(1,1) volatility modeling and a Statistical Jump Model
  • Coursework: Artificial Intelligence, Database Management Systems, Applied Statistics I & II, Machine Learning II

Master of Arts, English

Texas A&M University – Central Texas (TAMUCT)

Graduated December 2024

Bachelor of Arts, History

Texas A&M University (TAMU)

Graduated 2004

Experience

    Founder & Chief Executive Officer

    Current

    DataSalt.ai

    February 2025 – Present · Austin, TX · Hybrid

    Founded a boutique AI/ML consultancy serving small and mid-sized Texas businesses. Lead all data science engagements: retrieve and analyze sensitive client data across retail, agriculture, legal, healthcare, and other verticals; transform inputs into actionable insights through intuitive data storytelling.

    • Built finrag.io, a multimodal financial RAG system (Gemini Embeddings 2, Qdrant, Cloudflare R2, FastAPI on Fly.io, Claude Sonnet, Next.js on Vercel)
    • Created the Formal educational platform series — formalml.com, formalstatistics.com, formalcalculus.com — Astro 5 / React 18 / MDX / Tailwind CSS / D3.js / KaTeX with 130+ published topics across 29 curriculum tracks
    • Published portfolio of 10 case studies and 7 technical blog posts at datasalt.ai; built SaltyDog, an AI chatbot with custom avatar

    Senior Web Developer & Full-Stack Engineer

    Current

    Fullsteam / Fullsteam Marketing

    January 2015 – December 2025 · Austin, TX · Remote

    11-year tenure across three roles: Web Developer (Jan 2015), Full-Stack Engineer at Fullsteam Marketing (Jan 2016, concurrent), and Senior Web Developer (promoted Jan 2019). Architected and maintained React-based web applications aligned with company digital strategy.

    • Executed full-stack development with HTML, CSS, JavaScript, and Python; managed database systems and AWS cloud infrastructure
    • Optimized site performance for SEO and user experience, significantly boosting visibility and search rankings of Fullsteam digital properties
    • Contributed to digital growth strategy through data-informed development decisions and cross-functional collaboration

    Independent Web Development Consultant

    Current

    Self-Employed

    2004 – Present · Austin, TX · Hybrid

    Two decades of end-to-end custom web solutions specializing in React, responsive design, and UX/UI for clients ranging from startups to enterprises.

    • Manage every stage of the project lifecycle: requirements gathering, technical architecture, full-stack development, API integration, performance optimization, accessibility, ongoing maintenance
    • Translate complex business needs into clear technical solutions; provide strategic guidance on best practices to non-technical stakeholders

    Senior Web Developer

    Amaru Motors LP dba Charlie Clark Nissan

    March 2009 – January 2015 · Harlingen, TX · Onsite

    Developed, optimized, and maintained user-facing websites and web applications for one of South Texas’s largest dealer-group operations.

    • Led mobile-first web design initiatives and contributed to the initial design and development of the company mobile app
    • Optimized site performance for SEO and usability, improving search rankings and customer engagement across digital properties

    Web Developer

    Wells Fargo

    March 2004 – March 2009 · San Antonio, TX · Onsite

    Built and maintained secure, user-friendly web applications supporting Wells Fargo online banking — account management, transactions, customer self-service — under strict regulatory and security requirements.

    • Worked across the stack with a back-end focus using HTML, CSS, JavaScript, and Python; built reliable, high-performing features under banking-grade security standards
    • Collaborated with product, design, and security teams to ensure regulatory compliance while optimizing backend services for response time, stability, and usability

Technical Skills

Development

TypeScript / JavaScript (ES6+)

React / Next.js

Astro / MDX

Python (FastAPI, Flask)

Node.js

R, SQL

Data Science & ML

Hugging Face Transformers / BERT

NLP & time-series modeling

RAG pipelines (Gemini, Claude, Qdrant)

GARCH & ensemble methods

Scikit-learn, PyTorch

Statistical inference & A/B testing

Math & ML Theory

Topology & TDA (persistent homology, Mapper)

Differential geometry

Optimization

Probability & statistics

Information theory

Learning theory

Infrastructure & Data

Vercel · Fly.io · Railway

AWS · Cloudflare R2

Qdrant · MongoDB · NoSQL

D3.js · KaTeX · MDX

Docker · CI/CD

Full-stack architecture

Get in Touch

Jonathan A. Rocha

I'm open to AI/ML engineering and data science roles as well as pre-doctoral and PhD program conversations. Reach out — I'd be glad to talk.