AI product development has changed dramatically in the last two years.
In 2023, most AI engineers defaulted to Python-heavy stacks — FastAPI for APIs, LangChain (Python) for orchestration, HuggingFace for embeddings and fine-tunes. That made sense when models were new and everything happened in notebooks.
But here’s the problem:
When you’re building a real product, 90% of the work is not in Python. It’s in UI/UX, authentication, payments, user state, analytics — and the faster you can ship those, the faster you can get feedback and pivot.
That’s why in 2025, I’ve shifted to a TypeScript-first stack for nearly everything, and I only bring Python into the picture when it’s truly needed.
The result? Faster iteration, cleaner integration, and an architecture that’s observable, swappable, and built for change.
# The Core Philosophy
- TS-First for speed — Next.js, Convex, and modern TypeScript tooling let me ship features in hours instead of days.
- Python only where it’s irreplaceable — fine-tuning, LoRA, or specialized CV/NLP pipelines.
- Everything observable — if I can’t see what the model retrieved or why it hallucinated, it’s not a product, it’s a demo.
- Swappable components — vector DBs, embeddings, even the LLM can change without rewriting everything.
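To make "swappable" concrete, here is a minimal sketch of the kind of thin contracts I mean (the interface names and shapes are illustrative, not from any particular library):

```ts
// Thin, provider-agnostic contracts. The product code depends on these,
// and each vendor (OpenAI, Voyage, Qdrant, Weaviate, ...) gets a small adapter.
interface Embedder {
  embed(texts: string[]): Promise<number[][]>;
}

interface VectorStore {
  upsert(items: { id: string; vector: number[]; metadata?: Record<string, unknown> }[]): Promise<void>;
  query(vector: number[], topK: number): Promise<{ id: string; score: number; text: string }[]>;
}

interface ChatModel {
  complete(prompt: string): Promise<string>;
}

// Swapping a provider means writing one new adapter, not rewriting product code.
export async function answer(
  question: string,
  deps: { embedder: Embedder; store: VectorStore; llm: ChatModel }
): Promise<string> {
  const [queryVector] = await deps.embedder.embed([question]);
  const hits = await deps.store.query(queryVector, 5);
  const context = hits.map((h) => h.text).join("\n---\n");
  return deps.llm.complete(`Context:\n${context}\n\nQuestion: ${question}`);
}
```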
# 1. Frontend & Application Layer
I keep the entire user-facing layer in TypeScript.
- Framework: Next.js (App Router) — lets me mix server and client components, stream responses, and keep all API contracts in the same repo.
- Database & State: Convex — real-time updates, serverless storage, and cron jobs without extra infra.
- Auth: Clerk — OAuth, email magic links, and SSO in minutes.
- UI/Styling: Tailwind CSS + shadcn/ui — fast, consistent, and themeable.
- File Handling: UploadThing or Vercel Blob.
💡 Example: Updating my RAG index nightly is just a Convex cron job calling the vector DB — no extra servers, no DevOps overhead.
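Roughly, that cron wiring looks like this in Convex (`internal.rag.refreshIndex` is a placeholder for whatever internal action re-embeds changed docs and upserts them into the vector DB):

```ts
// convex/crons.ts
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Runs every night at 02:00 UTC and kicks off the index refresh action.
crons.daily("refresh rag index", { hourUTC: 2, minuteUTC: 0 }, internal.rag.refreshIndex);

export default crons;
```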
# 2. LLM Orchestration & Retrieval
For most projects, I orchestrate entirely in LangChain.js or LangGraph.js:
- Retrieval: Weaviate or Qdrant for managed ops; pgvector when I want Postgres-native queries.
- Embeddings: OpenAI `text-embedding-3-large` for general use, Voyage AI for higher multilingual accuracy, or Cohere for budget-friendly scale.
- LLMs: Mix and match as needed, e.g. Groq (Llama 3) for low-latency responses and OpenAI for reasoning-heavy queries.
- UI Integration: Vercel AI SDK for streaming chat and completion.
💡 Why not Pinecone? I prefer Weaviate/Qdrant for more control over index params and cost structure.
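As a rough sketch of how these pieces fit together, a Next.js route handler can pull context from Qdrant via LangChain.js and stream the answer with the AI SDK. The `docs` collection, `QDRANT_URL` env var, and model choices are placeholders, and exact option names can shift between package versions:

```ts
// app/api/chat/route.ts
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { OpenAIEmbeddings } from "@langchain/openai";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function POST(req: Request) {
  const { question } = await req.json();

  // Pull the top chunks from an existing Qdrant collection.
  const store = await QdrantVectorStore.fromExistingCollection(
    new OpenAIEmbeddings({ model: "text-embedding-3-large" }),
    { url: process.env.QDRANT_URL, collectionName: "docs" }
  );
  const docs = await store.similaritySearch(question, 4);
  const context = docs.map((d) => d.pageContent).join("\n---\n");

  // Stream the grounded answer back to the client.
  const result = streamText({
    model: openai("gpt-4o"),
    system: `Answer using only this context:\n${context}`,
    prompt: question,
  });
  return result.toTextStreamResponse();
}
```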
# 3. The Python “Island”
I don’t run my whole backend in Python anymore — but I keep a Python microservice for when it matters:
- LoRA / Fine-Tuning — HuggingFace PEFT & Transformers.
- Custom CV/Audio/NLP Pipelines — OpenFace for video feature extraction, librosa for audio, spaCy for NLP.
- Self-Hosted Inference — GPU-heavy models running on Modal, Replicate, or Runpod.
This runs as a FastAPI service, deployed separately and called from the TS backend only when needed.
💡 Benefit: I can scale the Python service independently — if training spikes GPU usage, the rest of the app isn’t affected.
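The boundary between the two worlds stays a plain HTTP call from the TS side. A rough sketch (the endpoint path, env var, and payload shape are placeholders, not a fixed contract):

```ts
// Called from the TS backend only when a job genuinely needs the Python service.
type AudioJob = { audioUrl: string; language?: string };

export async function runPythonPipeline(job: AudioJob) {
  // PY_SERVICE_URL points at the separately deployed FastAPI service.
  const res = await fetch(`${process.env.PY_SERVICE_URL}/pipelines/audio`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  });
  if (!res.ok) {
    throw new Error(`Python service failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```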
# 4. Evaluation & Observability
If you can’t debug model behavior, you’re flying blind.
I track:
- Retrieval hits and metadata.
- Latency per pipeline stage.
- Hallucination scores (LLM-graded or heuristic).
- User feedback loops.
Tools I use:
- LangSmith for tracing, dataset runs, and A/B prompt testing.
- Sentry for app-level error tracking.
💡 Example: By logging retrieval hits with timestamps, I caught a timezone bug that was silently breaking a client’s nightly index refresh.
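In practice, "retrieval hits and metadata" is just one structured log entry per retrieval call. A minimal sketch (the field names are my own, not a LangSmith schema):

```ts
// One structured log entry per retrieval call: enough to answer
// "what did the model actually see, and when?" after the fact.
type RetrievalLog = {
  queryId: string;
  query: string;
  hitIds: string[];
  scores: number[];
  latencyMs: number;
  retrievedAt: string; // always ISO 8601 / UTC; local timestamps are how timezone bugs hide
};

export async function retrieveWithLogging(
  query: string,
  retrieve: (q: string) => Promise<{ id: string; score: number }[]>
) {
  const started = Date.now();
  const hits = await retrieve(query);
  const entry: RetrievalLog = {
    queryId: crypto.randomUUID(),
    query,
    hitIds: hits.map((h) => h.id),
    scores: hits.map((h) => h.score),
    latencyMs: Date.now() - started,
    retrievedAt: new Date().toISOString(),
  };
  console.log(JSON.stringify(entry)); // or ship it to a log sink / LangSmith metadata
  return hits;
}
```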
# 5. Why This Stack Works
- Speed: Building product features in TS is simply faster.
- Flexibility: I can change vector DBs or LLM providers in hours.
- Scalability: Python workloads don’t slow down the rest of the app.
- Observability: Every stage is logged, traceable, and measurable.
# When to Break the TS-First Rule
There are times when Python-first makes sense:
- Research-heavy prototypes where speed to model iteration > product speed.
- Internal tools where UI/UX isn’t a priority.
- Deep integration with Python-only libraries.
But for production-grade AI products — especially those with real users — TS-first wins for me every time.
# TL;DR
Ship fast.
Track everything.
Keep it modular.
In AI, pivots are inevitable. This stack is built so I can change anything — the LLM, the embeddings, the vector DB — without starting over.