TokenTree vs LLama.cpp Tokenization

This page explains how to validate the performance of TokenTree against LLama.cpp. TokenTree is a tokenization algorithm that Gyula Rabai developed for his C# .NET AI inference engine. LLama.cpp is one of the most popular inference engines for local LLMs.

About the test system

The test system is a Visual Studio 2022 solution (.sln) that contains the source code of TokenTree. It uses LLamaSharp as a wrapper around the compiled llama.cpp binaries. The test system loads a text file or downloads a webpage, then tokenizes the text with both algorithms. It measures the time each algorithm takes and displays the results.
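
The measurement flow described above could be sketched as follows. This is a minimal Python illustration only, not the code of the test system (which is a C# project). The function names (load_text, time_tokenizer, compare) and the two placeholder tokenizers are assumptions made for this example; they stand in for TokenTree and the llama.cpp tokenizer, which are not reproduced here.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def load_text(source: str) -> str:
    """Return the contents of a local file, or the body of a web page if source is a URL."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return response.read().decode("utf-8", errors="replace")
    return Path(source).read_text(encoding="utf-8")


def time_tokenizer(tokenize, text, repeats=5):
    """Run tokenize(text) several times; return (tokens, best elapsed seconds)."""
    best = float("inf")
    tokens = []
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = tokenize(text)
        best = min(best, time.perf_counter() - start)
    return tokens, best


def compare(tokenizers, text, repeats=5):
    """Time every tokenizer on the same text; return one result row per tokenizer."""
    rows = []
    for name, tokenize in tokenizers.items():
        tokens, seconds = time_tokenizer(tokenize, text, repeats)
        rows.append({"name": name, "tokens": len(tokens), "seconds": seconds})
    return rows


# Placeholder tokenizers (hypothetical stand-ins, NOT TokenTree or llama.cpp).
def whitespace_tokenizer(text):
    return text.split()


def character_tokenizer(text):
    return list(text)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog. " * 1000
    results = compare(
        {"whitespace": whitespace_tokenizer, "character": character_tokenizer},
        sample,
    )
    for row in results:
        print(f"{row['name']:<12} {row['tokens']:>8} tokens  {row['seconds'] * 1000:.3f} ms")
```

Taking the best of several runs reduces the influence of one-off effects such as cold caches. Both tokenizers receive exactly the same input string, so the timings are directly comparable.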

Download the test system

Download (model file): Llama-3.2-1B-Instruct-f16.gguf (2.4 GB)
Save the model file to: C:\AIModels\Llama-3.2-1B-Instruct-f16.gguf

Download (Source + Exe): Token-LLama.zip
Extract to: C:\Work\Gyuszi\Token-LLama

Optional: download and recompile LLama.cpp

LLamaCpp GitHub: https://github.com/ggerganov/llama.cpp
LLamaCpp Source: llama.cpp-master.zip (2025-02-02)

Findings

The TokenTree algorithm proved to be the faster approach to tokenization: it offers a significant speed increase over the LLama.cpp method. Read the following paper to learn about the insights behind it:

Fast Inference-Time Tokenization through Approximating BPE by Gyula Rabai

Video

The following video shows you how to run the test system.