The process of breaking a sentence into smaller pieces, or tokens, is called tokenization. These tokens help the model understand and process the text. For a sentence like "IBM taught me tokenization," the tokens could be "IBM," "taught," "me," and "tokenization." Different AI models may use different types of tokens. The program that breaks text down into individual tokens is called a tokenizer. Tokenizers generate tokens primarily through three tokenization methods: word-based, character-based, and subword-based.
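To make the three methods concrete, here is a minimal sketch in Python, not any particular library's tokenizer. The word-based and character-based splits use simple built-in operations, and the subword split is hard-coded as a plausible example of what a learned subword tokenizer (such as one using BPE or WordPiece) might produce.

```python
# Illustrative sketch of the three tokenization methods on the example sentence.
sentence = "IBM taught me tokenization"

# Word-based: split on whitespace, one token per word.
word_tokens = sentence.split()
print(word_tokens)        # ['IBM', 'taught', 'me', 'tokenization']

# Character-based: every character becomes its own token.
char_tokens = list(sentence)
print(char_tokens[:10])   # ['I', 'B', 'M', ' ', 't', 'a', 'u', 'g', 'h', 't']

# Subword-based: real tokenizers learn subword units from data;
# this split is hard-coded here purely to show the idea.
subword_tokens = ["IBM", "taught", "me", "token", "##ization"]
print(subword_tokens)
```

In practice, subword tokenizers keep frequent words whole (like "IBM") while splitting rarer words into reusable pieces (like "token" and "##ization"), which balances vocabulary size against coverage.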