# [Ilia Breitburg](https://breitburg.com)

Personal website for Ilia Breitburg

## Workbench

### [Bandnine](/workbench/bandnine/)

Practice your IELTS writing skills with a timer.

![Bandnine interface mockup on an iPad](/workbench/ielts-app-promo.png)

#### How

To simulate real exam conditions, I built a simple timer that counts down from 60 minutes. When the timer reaches zero, it automatically submits your essay for review.

#### Why

I created this app to help IELTS candidates improve their writing skills under timed conditions. By practicing with a timer, users can better prepare for the pressure of the actual exam.

### [Inference for Dart](/workbench/inference-for-dart/)

Run large language model inference from Dart and Flutter, using [`llama.cpp`](https://github.com/ggml-org/llama.cpp) as a backend. The API is designed to be human-friendly and to follow the [Dart design guidelines](https://dart.dev/effective-dart/design).

Installation
------------

1. Add the following to your `pubspec.yaml`:

   ```yaml
   dependencies:
     inference:
       git:
         url: https://github.com/breitburg/inference-for-dart
   ```

2. Run `flutter pub get`.

3. Compile the [`llama.cpp`](https://github.com/ggml-org/llama.cpp) backend for your target platform and link it to your native project. Use the official [build instructions](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md) for your platform. The library must be compiled for the same architecture as your Dart/Flutter app.

   _For iOS, you need to open the Xcode project and add `llama.xcframework` to the 'Frameworks, Libraries, and Embedded Content' section; for Linux, you need to compile `libllama.so` and link it to your project, etc._

Prerequisites
-------------

### Model Support

You can run inference from any `.gguf` model, downloaded at runtime or embedded within the app. Search for 'GGUF' on HuggingFace to find and download a model file. For up-to-date model availability, see [llama.cpp's README](https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#text-only).

### RAM Requirements

Running inference with large language models requires sufficient memory. The model must be fully loaded into RAM (or VRAM for GPU acceleration) before any inference can begin. For example, the [OLMo 2 (7B) Instruct model with `Q4_K_S` quantization](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct-GGUF/blob/main/olmo-2-1124-7B-instruct-Q4_K_S.gguf) is 4.25 GB.

To estimate the minimum RAM required for inference, use the following formula; a worked sketch follows below.

    Total RAM ≈ Model Weights + KV Cache + Inference Overhead

1. **Model Weights**: 4.25 GB × 1.1 ≈ 4.7 GB
   * Quantized models typically require 1.0-1.2× their file size in memory.
2. **KV Cache**: ~100-200 MB (with the default 1024-token context)
   * Calculated as: `2 × n_layers × context_length × (n_heads_kv × head_size) × data_type_size`
   * Scales linearly with context length.
3. **Inference Overhead**: ~100-200 MB
   * Includes temporary buffers and computational workspace.

Total estimated RAM: **~5.0-5.1 GB minimum**

_Insufficient RAM will result in an application crash during model initialization._
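To make the arithmetic concrete, here is a minimal Dart sketch of this estimate. The layer count, KV-head count, and head size below are illustrative values for a 7B-class model with grouped-query attention, not values read from an actual `.gguf` file:

```dart
// Rough RAM estimate for running a quantized GGUF model.
// All model dimensions are illustrative assumptions, not values
// parsed from a real model file.
double estimateRamGb({
  required double modelFileGb,
  int nLayers = 32,
  int contextLength = 1024,
  int nHeadsKv = 8, // assumes grouped-query attention
  int headSize = 128,
  int dataTypeSize = 2, // fp16 KV cache: 2 bytes per value
}) {
  // Quantized weights typically need 1.0-1.2x their file size in memory.
  final weightsGb = modelFileGb * 1.1;

  // KV cache: 2 (K and V) x layers x context x (kv heads x head size) x bytes.
  final kvCacheBytes =
      2 * nLayers * contextLength * (nHeadsKv * headSize) * dataTypeSize;
  final kvCacheGb = kvCacheBytes / (1024 * 1024 * 1024);

  // Temporary buffers and computational workspace.
  const overheadGb = 0.2;

  return weightsGb + kvCacheGb + overheadGb;
}

void main() {
  // 4.25 GB file (e.g. OLMo 2 7B Instruct, Q4_K_S)
  print('${estimateRamGb(modelFileGb: 4.25).toStringAsFixed(2)} GB');
}
```

With the 4.25 GB file above, this prints roughly 5.00 GB, in line with the estimate.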
Usage
-----

### Models and Metadata

Before running inference and loading the full model weights into memory, you can inspect the model's metadata to fetch its name, authors, and license, and to understand its capabilities and requirements.

```dart
// Create a model instance from a file path (not loaded yet)
final model = InferenceModel(path: 'path/to/model.gguf');

// Retrieve and display model metadata
final metadata = model.fetchMetadata();

print(
  "${metadata['general.name']} by ${metadata['general.organization']} under ${metadata['general.license']}",
);
```

### Initializing the Inference Engine

The `InferenceEngine` manages the model's lifecycle, including initialization and cleanup.

```dart
// Create an inference engine with the loaded model
final engine = InferenceEngine(model);

// Initialize the engine (loads the model into memory, prepares the context, etc.)
engine.initialize();

// Dispose of the engine when done to free resources
engine.dispose();
```

> **Tip:** Always dispose of resources (such as the inference engine) when they are no longer needed to avoid memory leaks.

### Chat Inference

Interact with the model using structured chat messages for conversational AI scenarios.

```dart
// Prepare a list of chat messages with roles
final messages = [
  ChatMessage.system('You are an AI running on ${Platform.operatingSystem}.'),
  ChatMessage.human('Why is the sky blue?'),
];

// Run inference and handle the output
engine.chat(
  messages,
  onResult: (result) => stdout.write(result.message?.content ?? ""),
);
```

### Text Embeddings

To compute an embedding for a given text, you must use an embedding model.

```dart
// Embed 'Hello' into the latent space
final vector = engine.embed('Hello');

print(vector); // [-0.0596, 0.0614, ...]
```

### Tokenizing and Detokenizing Text

You can convert text to tokens, and convert tokens back to text, as needed for your application and evaluations.

```dart
// Tokenize input text
final tokens = engine.tokenize("Hello, world!");
print("Tokens: $tokens"); // Example output: [1, 15043, 29892, 0]

// Detokenize tokens back to text
final text = engine.detokenize([1, 15043, 29892, 0]);
print("Text: $text"); // Output: "Hello, world!"
```

### Advanced Configuration

Customize library behavior, such as specifying a dynamic library path or handling logs.

```dart
// Set a custom dynamic library and log callback
lowLevelInference
  ..dynamicLibrary = DynamicLibrary.open("path/to/libllama.so") // Use .dylib for macOS, .dll for Windows
  ..logCallback = (String message) => print('[llama.cpp] $message');
```
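To see how the pieces fit together, here is a minimal end-to-end sketch assembled from the snippets above. The import path is assumed from the package name in `pubspec.yaml`, the model path is a placeholder, and the example assumes `chat` delivers its results before returning; treat it as a starting point rather than canonical usage.

```dart
import 'dart:io';

// Import path assumed from the package name in pubspec.yaml.
import 'package:inference/inference.dart';

void main() {
  // Point at any .gguf model on disk (placeholder path)
  final model = InferenceModel(path: 'path/to/model.gguf');

  // Load the model into memory and prepare the context
  final engine = InferenceEngine(model);
  engine.initialize();

  // Ask a single question and stream the answer to stdout
  final messages = [
    ChatMessage.system('You are a helpful assistant.'),
    ChatMessage.human('Why is the sky blue?'),
  ];

  engine.chat(
    messages,
    onResult: (result) => stdout.write(result.message?.content ?? ""),
  );

  // Free native resources when finished
  // (assumes chat has completed delivering results by this point)
  engine.dispose();
}
```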
### [Claude for Pebble](/workbench/claude-for-pebble/)

While I was exploring KaiOS, I created a simple client for Claude, the AI assistant by Anthropic.

## Primary

### [About](/about/)

#### Mindset

I'm a builder at heart, launching products since I was 14. I have a mix of technical and business background, with experience in engineering complex apps and backend architectures, training AI models and implementing custom tokenizers, as well as building strong teams and making sure that whatever we're building is so good it grows organically.

Speaking personally, I love building because seeing people actually use what I've made gives me purpose. I care about building something meaningful with people who feel the same. I'd rather work with someone as obsessed as me than chase trends with someone optimizing for optics.

#### Life

I'm currently a student at [KU Leuven](https://kuleuven.be/), pursuing a BSc in Business Administration.

#### Work

Currently doing AI/ML at [Klassif.ai](https://klassif.ai), a document AI platform serving Fortune 5000 companies, enabling up to 80% faster document processing. I work on the R&D team, building vision-language models (VLMs) for document AI.

### [Memos](/memos/)

[Tuesday, 29 July 2025](#2025-07-29)
------------------------------------

I've tried to make this website timeless. I purposefully decided to avoid skeuomorphism (or liquid glass, neumorphism, etc.) and modern design trends that may quickly become outdated. I am mostly inspired by classic Swiss design of the 20th century, Metro by Microsoft (which is largely inspired by airport navigation), [Vitsœ](https://www.vitsoe.com/), and the [New York City Subway](https://en.wikipedia.org/wiki/New_York_City_Subway).

I figured what works best is creating a sense of order and clarity with grids and typography, then adding some details that are not perfectly aligned to make everything feel more human and less sterile. This completes the overall aesthetic I'm aiming for.

[Friday, 09 May 2025](#2025-05-09)
----------------------------------

Even 15 years after the iPad's release, most note-taking apps are still skeuomorphic. I mean it in the fundamental sense: they use the paper metaphor of pages, ink, eraser, and highlighter. That metaphor worked to convince people the iPad could replace their notebooks, but over time it becomes increasingly pointless. In fact, it significantly hurts the experience and limits the possibilities. It's as if cars were designed to look exactly like horse-drawn carriages instead of being built for speed and efficiency.

For instance, why do we have issues like "I started writing a word but realized there's not enough space"? This problem is inherited from paper and has no reason to exist. Splitting writing across pages also causes problems. Even though highlighting definitions doesn't correlate with memory retention and serves purely for structuring, we still do it as if we were writing on paper.

What if you escaped the paper legacy in note-taking and built the experience from scratch, just as the typewriter supercharged handwriting and the computer supercharged the typewriter?

Speaking of new perspectives, most users of note-taking apps are students. They use the app as a utility to preserve, decompose, and structure information for the purpose of learning. Yet apps marketed as educational tools are still mostly designed to be the best recording tool, not the general-purpose learning tool they are actually used as. Some complementary factors:

* Since 2022, students have used AI as one of their core tools for learning
* They provide context to the AI, essentially the same context they write down in their notes
* AI models become more capable and efficient every year and can already run on a tablet
* Modern tablets have insane compute power, which sits mostly idle during note-taking

What if you built a note-taking app that is not just a recording tool, but a learning tool? Then the primary focus becomes retention of information, engagement, and evaluation of progress. I don't mean simply adding a chat sidebar, but rather building the app around the new technology and the new use case.

[Thursday, 27 March 2025](#2025-03-27)
--------------------------------------

Went to a [Raycast](https://raycast.com) meetup at the [amo](https://amo.co/) office in Paris. Well, "office" is an overstatement.

![The amo office in Paris](amo-office-2025.jpeg)