Talking points and playthings
Fri, Apr 24, 2026
Today is a bit different from usual. While we’ll talk just a little about how some of the mathematics we’ve learned fits into this monster, I’ll also point you to some tools you might explore some day and wax slightly philosophic.
Artificial intelligence is a topic that’s been around for decades - since at least the 1940s. The current AI frenzy, though, is based largely on Large Language Models.
I guess they’re making a bit of an impact.
It seems to me that there are a lot of objectively good things coming out of the transformer architecture:
My advice on AI is to learn about it!
More generally, for fledgling programmers - broaden your interests!
Gary Marcus is a frequent critic of LLMs:
https://garymarcus.substack.com/
He’s also a major proponent of Symbolic AI.
I worked for Wolfram Alpha for a few years, by the way.
I was as surprised as most people in 2023, though, when suddenly computers could speak in complete sentences and carry on a semi-normal conversation.
The major steps in LLM development look like so:
Data collection is the process of forming a large corpus or body of text to model. This has been done largely by scraping publicly available text from the web.
The data does not need to be labeled. Since the objective is next-token prediction, the training process is an example of self-supervised learning.
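To make that concrete, here’s a minimal Python sketch of how a raw token stream supplies its own labels; the token ids are made up for illustration.

```python
# Self-supervised labeling: every window of consecutive tokens
# is "labeled" by the token that follows it in the data itself.

tokens = [464, 3290, 318, 257, 922, 3290]  # hypothetical token ids

context_size = 3
pairs = []
for i in range(len(tokens) - context_size):
    context = tokens[i : i + context_size]   # input sequence
    target = tokens[i + context_size]        # the label comes from the data
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)
```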
Tokenization is the process of translating the text into computer-consumable tokens. This typically uses a process called byte pair encoding, which iteratively merges the most frequent pairs of symbols in a corpus, starting from individual characters.
You can see the result at the tokenizer playground.
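For a feel for how the merge loop works, here’s a toy Python sketch of byte pair encoding; the tiny corpus and word frequencies are invented for illustration, while real tokenizers work on bytes over huge corpora.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# word -> frequency, each word initially split into characters
words = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
for step in range(5):
    pair = most_frequent_pair(words)
    print("merge", pair)
    words = merge_pair(words, pair)
```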
While tokens are numeric, they don’t really live within a structured space. Vector embedding is the process of placing them within a vector space with a structure that reflects meaning.
Points that are close together generally have similar meanings. The classic algebraic craziness is \[ \text{king} - \text{man} + \text{woman} \approx \text{queen}. \]
You can play with an example of a vector embedding at the vector projector.
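Here’s a toy numpy illustration of the king/queen arithmetic above. The four-dimensional vectors are made up; real embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

# Invented toy embeddings, chosen so the classic analogy works out.
emb = {
    "king":  np.array([0.8, 0.9, 0.1, 0.2]),
    "man":   np.array([0.7, 0.1, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9, 0.1]),
    "queen": np.array([0.8, 0.9, 0.9, 0.2]),
    "apple": np.array([0.1, 0.2, 0.1, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = emb["king"] - emb["man"] + emb["woman"]

# Rank the vocabulary by similarity to the target vector; "queen" wins.
for word, vec in sorted(emb.items(), key=lambda kv: -cosine(target, kv[1])):
    print(f"{word:6s} {cosine(target, vec):.3f}")
```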
The fundamental structure of a Large Language Model is called a transformer. It’s literally built to transform one sequence of tokens into another.
A schematic of the transformer (taken from the seminal 2017 paper “Attention is all you need”) is shown at the right. It comes in two parts: an encoder and a decoder.
Some tasks, like translation, use both the encoder and the decoder.
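The core operation inside both the encoder and the decoder is attention. Here’s a minimal numpy sketch of the paper’s scaled dot-product attention, \( \text{Attention}(Q, K, V) = \text{softmax}(QK^T/\sqrt{d_k})\,V \), run on random toy data.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how much each query attends to each key
    weights = softmax(scores)        # each row is a probability distribution
    return weights @ V               # weighted average of the values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                  # arbitrary toy shapes
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)      # (4, 8)
```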

Pretraining is the process of training the model to write natural text. This portion works a lot like the supervised learning algorithms we’ve already seen. We feed our corpus to the model; each sequence of consecutive tokens in the data is labeled by the subsequent token, and we apply the method of maximum likelihood to obtain a next-token predictor.
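As a sketch of that objective, here’s the negative log-likelihood (cross-entropy) of a single next-token prediction; the logits and vocabulary are invented.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0, 0.1, 1.2])  # model's scores for each token
true_next_token = 0                             # index of the token that actually followed

probs = softmax(logits)
loss = -np.log(probs[true_next_token])          # negative log-likelihood
print(f"p(correct token) = {probs[true_next_token]:.3f}, loss = {loss:.3f}")
```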
After pretraining, our model can converse naturally, but it doesn’t necessarily follow instructions, align with our preferences, or reliably produce accurate responses. The process of training the model further is called finetuning. This is often done with human feedback.
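One common recipe is reinforcement learning from human feedback (RLHF), whose first step trains a reward model on pairs of responses that humans have ranked. Here’s a hedged sketch of the pairwise preference loss; the reward scores are invented.

```python
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style loss: -log sigmoid(r_chosen - r_rejected).
    # The reward model should score the human-preferred response higher.
    return -np.log(1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(1.5, -0.3))  # small loss: model agrees with the human
print(preference_loss(-0.3, 1.5))  # large loss: model disagrees
```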
Evaluation and inference are very much as we’ve seen in other cases. Thus, the model is pretty much built and ready to use, generating text one token at a time.
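Here’s a toy sketch of that generation loop, repeatedly sampling the next token at a given temperature. The `model` function is a placeholder that returns random logits.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 50

def model(tokens):
    return rng.normal(size=vocab_size)   # stand-in for a real next-token predictor

def sample_next(logits, temperature=0.8):
    logits = logits / temperature        # lower temperature -> more deterministic
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

tokens = [1, 7, 42]                      # hypothetical prompt token ids
for _ in range(10):                      # generate ten more tokens
    tokens.append(sample_next(model(tokens)))
print(tokens)
```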
There are lots of ways to use LLMs other than interacting directly with ChatGPT or some other major chatbot. There are also lots of models that can be highly customized and used as a chatbot or via an API for various purposes.
That’s exactly how the Math Proofreader on our forum works.
OpenRouter provides unified API access to nearly 700 models. I’ve spent about 20 cents on all the API calls for the forum this semester; about 5 cents of that was my own usage of the chatbot.
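If you want to try it, OpenRouter speaks the OpenAI-compatible API, so a call looks roughly like the sketch below. The model id is one example of OpenRouter’s provider/model naming, and you’d supply your own key.

```python
from openai import OpenAI

# Point the standard OpenAI client at OpenRouter's endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # example model id; hundreds are available
    messages=[{"role": "user", "content": "Is this proof correct? ..."}],
)
print(response.choices[0].message.content)
```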
Things can go awry, though.
Hugging Face is a company based in NYC that provides tools, libraries, and a model hub for building, sharing, and deploying machine learning models. You can build and deploy your own models for testing right there, though you’ll probably want to move beyond Hugging Face for production.
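Here’s a minimal sketch of pulling a model from the hub with the `transformers` library; `gpt2` is just a small, freely available example model.

```python
from transformers import pipeline

# Download a small model from the Hugging Face hub and generate text.
generator = pipeline("text-generation", model="gpt2")
result = generator("The transformer architecture", max_new_tokens=30)
print(result[0]["generated_text"])
```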
vLLM is an open-source inference engine and server for large language models. You can rent GPU server space from a company like DigitalOcean or Lambda and run vLLM there.
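A sketch of vLLM’s offline Python API follows; it can also run as an OpenAI-compatible HTTP server. The model name is an example from vLLM’s own quickstart, small enough to try on modest hardware.

```python
from vllm import LLM, SamplingParams

# Load a small example model and generate a completion.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=50)

outputs = llm.generate(["The key idea behind attention is"], params)
print(outputs[0].outputs[0].text)
```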
My webpages are served by DigitalOcean.
Andrej Karpathy cofounded and worked at OpenAI, and also worked for a bit at Tesla. In early 2023, he began creating amazing free resources, which you can find listed on his website.
Of particular interest, I would point out
where he explains how to build GPT-2 from scratch for less than $100.