Reducing hallucinations in retrieval-augmented chatbots for customer support teams

When customer support teams adopt retrieval-augmented generation (RAG) to power chatbots, the promise is compelling: fast, contextually aware answers grounded in a company's own documentation. In practice, however, one problem keeps surfacing: hallucinations. These are fluent, plausible-sounding responses that confidently state incorrect facts or invent citations. I've worked with product and security teams who’ve felt that a seemingly small hallucination can erode trust faster than any...

AI

Choosing a self-hosted vector database for on-device LLM search: Milvus, pgvector or Chroma?

09/06/2026

When I started evaluating self-hosted vector databases for on-device LLM search, I expected a straightforward tradeoff: pick the fastest engine and...

Cybersecurity

Detecting malicious firmware implants on consumer routers using a Raspberry Pi and free tools

03/06/2026

I recently spent a week building a cheap, repeatable workflow to detect malicious firmware implants on consumer routers using nothing more than a...

Latest News from Roctoken Co

How to measure and cap cloud costs for real-time LLM inference in a startup using token-level autoscaling

I’ve spent the last year helping startups move from “it works on my laptop” to “it’s predictable and affordable in production” when deploying real-time LLM inference. One recurring headache is cloud costs that explode unpredictably because inference usage is measured in tokens, not requests—and tokens vary wildly. In this guide I’ll walk through how I measure token-level costs, build token-aware autoscaling, and put practical...

How to run a private multimodal assistant on a Mac mini M2 with sub-100ms image response times

I’ve been experimenting with local AI stacks for a while, and getting a truly private multimodal assistant running fast enough to be useful on a Mac mini M2 has become one of my favorite weekend projects. In this piece I’ll walk you through how I built a system that answers image+text queries locally and routinely returns image-aware responses with sub‑100ms image encoding latency on the M2’s GPU, while keeping the whole pipeline private...

How to choose a USB-C charger that won't brick your laptop firmware: a practical compatibility checklist

I learned the hard way that not all USB‑C chargers are created equal. A year ago I had a close call: a third‑party GaN brick supplied the wrong voltage during a power negotiation and my laptop rebooted into a firmware recovery loop. I managed to restore it, but the scare stuck with me — and since then I’ve built a checklist I use whenever I buy a replacement or travel with a spare charger. Below I share that checklist and the practical...

How to detect and remove covert data exfiltration in Android apps using only a cheap phone and free tools

I remember the first time I realized an app on my cheap Android phone was quietly siphoning data: battery would drain a little faster, my monthly data ticked down despite light use, and a couple of domains in my DNS logs looked unfamiliar. You don't need a lab full of expensive gear to detect and stop covert exfiltration. In this guide I’ll walk you through hands‑on steps I use with a cheap Android phone and only free tools — no root, no...

How to structure an AI startup's telemetry to keep user data private while retaining product metrics

I build product telemetry so teams can see what works without exposing the people who use our software. Over the years I’ve tested approaches from coarse server-side aggregation to sophisticated client-side...

Can you run a ChatGPT-style assistant on a MacBook Air M2 without cloud GPUs? A practical latency and cost checklist

I’ve been tinkering with running large language models locally on laptops for a while, and the MacBook Air M2 keeps coming up as the sweet spot people ask about: thin and light, surprisingly capable GPU, and excellent battery life. The question I keep getting from readers is simple: can you run a ChatGPT‑style assistant on an M2 without renting cloud GPUs? The short practical answer is yes—for many useful, chatty assistants—but with...

How to detect a stealthy firmware implant on consumer routers using only free tools and a spare Raspberry Pi

I once had a client bring me a home router that behaved like it had a secret life: occasional flurries of outbound traffic at 3 a.m., DNS responses that sometimes led to odd domains, and a slightly sluggish web UI. The vendor image looked normal and the firmware version matched what the vendor published. That’s the kind of situation where you start suspecting a stealthy firmware implant — code that survives reboots, hides from casual...

Which budget Android phones still get security updates and how to lock one down for private messaging

I get asked all the time: “Can I keep a cheap Android phone and still get security updates?” and “How do I turn that phone into something safe enough for private messaging?” I’ve tested budget handsets, refurbished Pixels and mid‑range A-series devices for Roctoken Co, and there are sensible, practical choices you can make without spending a fortune. Below I walk through which budget Android phones still receive updates, what to...

What to check in a smart home hub before connecting Ring or Google devices to avoid lateral network attacks

I recently set up a new smart home hub and, like many of you, I wanted to plug in my Ring cameras and a handful of Google Nest devices as quickly as possible. The excitement of a unified dashboard is real—but so is the risk. Lateral network attacks, where a compromised device hops across your local network to access other devices or sensitive data, are a very plausible threat in a mixed-vendor environment. Below I walk through what I check in...

Elevator shoes by Mario Bertulli: discreet 2- to 4-inch Italian lifts

I first noticed how much shoes can change not only posture but presence when I tried a pair of carefully engineered lifts. Since then I've followed the niche of height‑increasing footwear closely, and few names sit as comfortably at the intersection of discretion, design and craft as Mario Bertulli. If you're curious about elevator shoes — what they really do, how they feel, and whether they're a sensible addition to your wardrobe — I'll...