TokenTree vs LLama.cpp Tokenization
This page explains how to validate the performance of TokenTree against LLama.cpp. TokenTree is a tokenization algorithm that Gyula Rabai developed for his C#.Net AI inference engine. LLama.cpp is the most popular inference engine for running LLMs locally.
About the test system
The test system is a Visual Studio 2022 solution (.sln) that contains the source code of TokenTree. It uses LLamaSharp as a wrapper around compiled llama.cpp binaries. The test system loads a text file or downloads a webpage and tokenizes the text with both algorithms. It then measures the time each algorithm takes and displays the results.
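The following Python code is a minimal sketch of the measurement loop described above. It is not the C# test system itself. It assumes that each tokenizer can be wrapped as a function that maps a string to a list of token IDs. The names `load_text`, `benchmark`, `report` and `outputs_match` are hypothetical and do not come from the project. The two stand-in tokenizers are simple UTF-8 byte splitters used as placeholders, not TokenTree or llama.cpp.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical tokenizer signature (an assumption): text in, list of token IDs out.
Tokenizer = Callable[[str], List[int]]


def load_text(source: str) -> str:
    """Load input text from a local file path, or download it from an http(s) URL."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            return resp.read().decode("utf-8", errors="replace")
    return Path(source).read_text(encoding="utf-8")


def benchmark(tokenizers: Dict[str, Tokenizer], text: str, repeats: int = 5) -> Dict[str, dict]:
    """Tokenize `text` with each tokenizer and keep the best (lowest) wall-clock time."""
    results: Dict[str, dict] = {}
    for name, tokenize in tokenizers.items():
        best = float("inf")
        tokens: List[int] = []
        for _ in range(repeats):
            start = time.perf_counter()
            tokens = tokenize(text)
            best = min(best, time.perf_counter() - start)
        results[name] = {"seconds": best, "token_count": len(tokens), "tokens": tokens}
    return results


def outputs_match(results: Dict[str, dict]) -> bool:
    """A speed comparison is only fair if every tokenizer produced the same token IDs."""
    token_lists = [r["tokens"] for r in results.values()]
    return all(t == token_lists[0] for t in token_lists)


def report(results: Dict[str, dict]) -> None:
    """Print one line per tokenizer: token count and best time in milliseconds."""
    for name, r in results.items():
        print(f"{name:>12}: {r['token_count']} tokens in {r['seconds'] * 1000:.3f} ms")


if __name__ == "__main__":
    sample = "TokenTree vs LLama.cpp tokenization " * 1000
    # Placeholder tokenizers: in the real test these would call TokenTree and llama.cpp.
    stand_ins = {
        "stand-in A": lambda t: list(t.encode("utf-8")),
        "stand-in B": lambda t: [b for b in t.encode("utf-8")],
    }
    results = benchmark(stand_ins, sample)
    report(results)
    print("outputs match:", outputs_match(results))
```

This sketch takes the best of several runs so that one-off delays, such as a warm-up or a GC pause, do not distort the result. It also checks that the token sequences are identical, because a faster tokenizer that produces different tokens is not a like-for-like comparison.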
Download the test system
Download (model file): Llama-3.2-1B-Instruct-f16.gguf (2.4 GB)
Save the model file to: C:\AIModels\Llama-3.2-1B-Instruct-f16.gguf
Download (Source + Exe): Token-LLama.zip
Extract to: C:\Work\Gyuszi\Token-LLama
Optionally, download and recompile LLama.cpp
LLama.cpp GitHub: https://github.com/ggerganov/llama.cpp
LLama.cpp source: llama.cpp-master.zip (2025-02-02)
Findings
The TokenTree algorithm is a much better approach to tokenization and offers a significant speed increase over the LLama.cpp method. Read the following paper to learn about the insights behind it:
Fast Inference-Time Tokenization through Approximating BPE by Gyula Rabai
Video
The following video shows you how to run the test system.