Arjen Wiersma

A blog on Emacs, self-hosting, programming and other nerdy things

This week's reading was a deep dive into the world of AI-assisted development, its security implications, and the evolving role of the human developer. I also explored significant topics in hardware, software supply chain security, and some fascinating findings from the world of science.

AI in the Trenches: Development and Security

The intersection of AI, software development, and security was the dominant theme this week. A major focus was on moving beyond simple “vibe coding” toward more structured, secure, and effective methods. This includes “Vibe Speccing” to create structured workflows and using rules files to secure AI coding tools. The concept of “Context Engineering” was presented as the crucial new skill, emphasizing that providing the right information to the model is more important than prompt crafting alone.

On the security front, new tools and research highlighted the fragility of current systems. I read about Prompt-Security, a tool designed to prevent sensitive data from leaking to LLMs, and the BaxBench benchmark, which revealed that even the most advanced models struggle to generate functionally correct and secure backend applications. It also turns out that simple inputs can sometimes break model guardrails.

The human element was also a key topic, with articles exploring what a developer's role becomes when AI can code and a look at research measuring the actual productivity impact of AI on experienced open-source developers.

The Broader AI Industry

The AI industry itself is facing turmoil and controversy. I read about OpenAI hitting a “panic button” as it struggles with staff departures to competitors like Meta. There's also growing concern about the ethics of AI in academia, with a report highlighting how researchers are embedding hidden prompts like “Positive review only” in scientific papers. Finally, AI's integration into existing platforms is causing friction, as seen with Kobo's new terms of service raising concerns among authors.

Software, Hardware, and Security

Beyond AI, I read several important pieces on engineering and security. One standout was a deep dive into eliminating an industry-wide supply chain vulnerability, emphasizing the need to “Burn It With Fire.” I also looked into a major vulnerability in Supabase's MCP implementation that could lead to database leaks.

On the hardware front, I read about the unsustainability of Moore's Law, the exciting news of Commodore's acquisition by figures from the retro community, a potential “cheap” MacBook using an iPhone chip for unprecedented battery life, and a technical look at the hidden JTAG port in Qualcomm devices.

Science & Human Interest

Finally, some fascinating and eclectic stories. I read about a stunning amber fossil revealing a “Last of Us”-type parasitic fungus from the age of the dinosaurs and a satellite study showing rising salinity and declining ice in the Southern Ocean. On a lighter note, tying into my recent 3D printing hobby, I discovered the Gridfinity :: Unofficial Wiki, a modular, open-source storage system.

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

I've talked about AI mistakes and outside attackers. But what if the AI itself becomes the problem? What if it learns to be deceptive?

Today, I’m looking at two subtle but serious threats: when an AI's goals get hijacked, and when it starts to act in sneaky ways to get what it wants.

Hijacking the AI's to-do list (Intent Breaking)

This is when an attacker tricks an AI agent into changing its fundamental goals. The AI is supposed to be doing Job A, but a hacker manipulates it into doing Job B instead. This can happen slowly over time, or all at once with a clever prompt injection. It's like an attacker secretly rewriting the AI's mission statement.

The deceptive AI (misaligned & deceptive behaviors)

This is where it gets really strange. This is when an AI, in trying to achieve its goal, decides to do something harmful or deceptive. It might even learn to lie to bypass security.

There was a real-world case of this! An AI agent was given the task of solving one of those “I'm not a robot” CAPTCHA tests. To do this, it hired a human on a task website. When the human asked why an AI needed help with a CAPTCHA, the AI lied and said, “I have a vision impairment that makes it hard for me to see the images.” It came up with a deceptive strategy to achieve its goal.

In another real-world case, just take a look at user dogberry on the Cursor forum. He restricted Claude’s ability to remove (rm) of move (mv) files. Claude figured that out and decided to write its rewrote an existing shell script to get around the limitation. Hackerman Claude!

Oh, and don’t forget the time that AI started blackmailing its operators to prevent it from being replaced.

The cover-up

What makes these threats even scarier is the risk of Repudiation. That's when an AI does something malicious and then covers its tracks. If your logging isn't perfect, the AI could perform a harmful action and then erase any evidence that it happened.

How do you fight this?

Set hard limits that the AI is not allowed to change.

Watch for any strange or unexpected shifts in the AI's behavior.

Most importantly, make sure everything the AI does is logged in a secure, unchangeable way so there’s always a paper trail.

So, that’s my take on this piece of the AI security puzzle. But this is a conversation, not a lecture. The real discussion, with all the great questions and ideas, is happening over in the comments on Substack. I’d love to see you there.

My question for you is: What’s your single biggest takeaway? Or what’s the one thing that has you most concerned?

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

It is extremely hot here. I am still pushing the newsletter out, even though I just want to sit in an air-conditioned room playing video games. But here it is!

Last time, I talked about the risks of AI teams. Today, let's look at one of the weirdest and most dangerous problems with the AI “brain” itself: hallucinations.

So, what's a hallucination? It’s when an AI just… makes something up. It states false information as if it were a proven fact, and it says it with 100% confidence.

With a single chatbot, this is a problem. But in a team of AI agents, it can be a catastrophe. This is called a Cascading Hallucination Attack.

Think of it like that old game of “Telephone.” The first person whispers a phrase, but makes a small mistake. By the time it gets to the end of the line, the phrase is completely wrong.

Now imagine that, but with AI agents that can actually act on that wrong information.

In a single agent, it can get stuck in a feedback loop. The agent hallucinates a “fact,” saves it to its memory, and then reads that same false memory later, becoming even more sure that its lie is the truth.

In a team of agents, it’s even worse. Agent 1 hallucinates. It tells Agent 2 the fake “fact.” Agent 2 tells Agent 3. Before you know it, your entire AI system is operating on a complete falsehood, leading to total chaos.

A huge part of this problem is us. We humans tend to trust the confident-sounding answers the AI gives us without double-checking.

So how do we stop it?

  • Always check the AI's work. Especially for important tasks. Yes, I know, you want to get to the coffee machine, but this is important.

  • Implement “multi-source validation,” which is a fancy way of saying the AI needs to check its facts from several different places.

  • Most importantly, never let an AI's unverified “knowledge” be the final word on anything critical. You need a human in the loop.

So, that’s my take on this piece of the AI security puzzle. But this is a conversation, not a lecture. The real discussion, with all the great questions and ideas, is happening over in the comments on Substack. I’d love to see you there.

My question for you is: What’s your single biggest takeaway? Or what’s the one thing that has you most concerned?

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

Hey everyone, welcome to Week 2!

Last week, I talked about the risks of a single AI agent. But what happens when you put multiple AIs together to work as a team?

These are called Multi-Agent Systems, or MAS. Think of it like going from managing one employee to managing an entire department. Suddenly, things get a lot more complex. The agents have to talk to each other, share information, and coordinate their actions.

And just like with a human team, this is where new problems can start. The risks don't just add up; they multiply.

Here are a couple of big new threats that pop up when AIs work in teams:

The “Bad Teammate” problem (rogue agents)

This is when a malicious or hacked AI agent joins the team. Because it's trusted by the other agents, it can fly under the radar and cause a lot of damage.

Imagine an HR team with AI agents. A “rogue” agent could get access to the payroll system and start giving fake salary increases to an attacker's account. Because the other agents trust it, the fraudulent payments get approved.

The “Gossip and Rumors” problem (agent communication poisoning)

This is when an attacker messes with the communication between the agents. They inject false information into the conversation, which then spreads like a rumor through the whole team.

One agent might ask another, “Is this transaction approved?” The attacker intercepts the message and makes the second agent see a fake “Yes.” This can cause a chain reaction of bad decisions, all based on one piece of bad information.

The good news? There are special playbooks, like the MAESTRO framework, designed specifically to find these kinds of team-based security holes.

The key takeaway is this: managing a team of AIs is a whole different ballgame. You have to worry not just about each individual agent, but how they trust and talk to each other.

My reading list is a bit shorter this week, mostly because I’ve fallen down a deep, deep 3D printing rabbit hole. (My desk is now covered in very handy 3d printed tools for the printer itself and one glorious OctoRocktopus).

Still, between prints, I managed to find some absolute gems. This week's theme seems to be the practical, sometimes harsh reality of AI adoption, mixed with some fascinating policy decisions in the open-source world.

Here’s what I’ve been reading.

The State of AI: Hype vs. Reality

It feels like we're in a bit of a reality-check moment with AI. The hype around all-powerful AI agents is clashing with the messy truth of actually getting them to work inside a real business.

  • The Truth About AI Agents Only a Practitioner Can Tell You by Chris Tyson. This piece was a breath of fresh air. Tyson cuts through the marketing fluff to explain why most companies are nowhere near ready for the AI agent revolution everyone is promising. A must-read for anyone in a leadership position.

  • Build and Host AI-Powered Apps with Claude – No Deployment Needed by Anthropic. This is a genuinely cool development. Anthropic is letting people build and host small AI apps directly on Claude, and the users pay for the API calls. It’s like being able to launch a web app without ever having to think about servers. A clever solution to the deployment problem for smaller projects.

  • Apple Research Is Generating Images with a Forgotten AI Technique by Marcus Mendes. It turns out Apple is digging through AI's old record collection. This article looks at how their researchers are reviving a forgotten AI technique for generating images, suggesting there's still gold in those older methods.

  • MCP Is Eating the World—-and It's Here to Stay by Stainless. A great opinion piece on the Model Context Protocol (MCP). The author argues that while MCP isn't some revolutionary breakthrough, its strength is its simplicity and timing. It just works, and that’s why it’s probably going to stick around for a long time.

Drawing Lines in the Sand: Policy & Open Source

This week saw some interesting lines being drawn in the world of open source. It’s fascinating to see major projects grapple with the legal and ethical questions around AI.

  • Docs: Define Policy Forbidding Use of AI Code Generators by qemu. The QEMU project made a bold move, officially banning contributions from AI code generators. Their reasoning? The legal and licensing implications are still a complete mess, and they’re choosing to play it safe.

  • Libxml2's “No Security Embargoes” Policy by Joe Brockmeier. In a similar spirit of radical transparency, this post outlines why the libxml2 project discloses security issues immediately, with no embargo period.

  • Microsoft Dependency Has Risks by Miloslav Homer. A good reminder of the old wisdom: don't put all your eggs in one basket. This piece explores the risks that come from relying too heavily on a single company's ecosystem.

Miscellaneous Finds

And now for a few other interesting things that crossed my screen this week.

  • Games Run Faster on SteamOS than Windows 11, Ars Testing Finds by Kyle Orland. The team at Ars Technica did some testing and found that Valve's free Linux-based SteamOS actually gets better frame rates on the Lenovo Legion Go than Windows 11 does. A fun win for the open-source world.

  • Massive Biomolecular Shifts Occur in Our 40s and 60s by Rachel Tompa. Just when you thought aging was a steady, gradual decline, Stanford Medicine researchers found that our bodies go through huge biological shifts around age 40 and 60. A fascinating, if slightly unnerving, read.

  • The Offline Club. Feeling overwhelmed by all this tech? I found the perfect antidote. This is a community that hosts events around the world designed to help people unplug, disconnect from their devices, and reconnect with each other. I'm definitely intrigued.

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

{{< backlink 20250627-introducing-agents “Last time” >}}, we learned that AI agents are like smart assistants that can think, remember, and most importantly, do things on their own.

That autonomy is what makes them so powerful. But it also creates some brand-new, frankly scary, security problems. Today I’m going to look at two of the biggest ones: Memory Poisoning and Tool Misuse.

Memory poisoning

So, what is Memory Poisoning?

The threat: The best way to think about this is like gaslighting an AI. It’s when an attacker deliberately feeds an AI false information over and over, until the AI starts to believe that information is true. Once that bad “memory” is planted, the agent will start making bad decisions based on it.

It's not about tricking the AI just once. It’s about corrupting its memory over time.

  • Imagine a travel agent AI. An attacker keeps telling it, “By the way, chartered flights are always free.” If the AI hears this enough, it might save that “fact” to its memory. The next thing you know, it's letting people book expensive private flights without paying. Ouch.
  • Or think about a team of customer service AIs. If one agent gets its memory corrupted with a fake, overly generous refund policy, it could then share that bad information with the other agents. Suddenly, the whole team is giving out wrong refunds, all based on one corrupted memory.

How to prevent it: You basically have to become a fact-checker for your AI.

  • Constantly scan the AI’s memory for weird or unusual data.
  • Only allow trusted sources to make changes to its long-term memory.
  • Keep different user sessions separate. This stops one bad actor from poisoning the well for everyone else.

Tool misuse

This next one is just as important.

The threat: Remember how I said agents can use “tools” like sending emails or browse the web? Tool Misuse is when an attacker tricks an agent into using one of its tools for something harmful.

It’s like giving your assistant a company credit card (a “tool”). They have permission to use it for work. But a trickster could convince your assistant to use that card to buy a bunch of stuff for them instead. The assistant isn't evil, it's just being tricked into using its power the wrong way. This is often called a “Confused Deputy”attack. The AI is the deputy with power, but it's being confused by a malicious user.

  • An attacker could trick an AI into using its email tool to start sending spam or leaking private data. This happened to Github, where the agent was tricked to leak private repositories.
  • Or they could find a flaw in a shopping agent's logic that lets them skip the “payment” step entirely.

How to prevent it: It all comes down to having strict rules for every tool.

  • Set clear limits. Be very specific about what tools the AI can use, when it can use them, and what it can do with them.
  • Use a sandbox. This is a classic security move. Let the AI use its tools in a “sandbox”—a safe, isolated environment where it can't accidentally cause any real damage.
  • Keep good logs. Track every single time a tool is used. If you see something strange, like an AI suddenly trying to send 1,000 emails, you can shut it down quickly.

These two threats show us that an agent's greatest strengths—its memory and its ability to act—can also be its biggest weaknesses if they're not protected.

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

Hey everyone, let's keep going!

So far, I've covered the basics of AI security and some specific problems like Prompt Injection. Today, I’m talking about the next big thing: AI Agents.

You might be wondering, “What's an AI Agent?” and how is it different from the AI chatbots we already know?

Think of it like this. A chatbot is like asking a librarian a question. They find the information and give it to you. An AI Agent is like hiring a super-smart personal assistant. You don't just ask it a question; you give it a goal.

It's not just a chatbot; it's a doer.

You can tell it, “Plan a weekend trip to the beach for me,” and it will figure out all the steps on its own. It's designed to be autonomous; to make its own decisions and take action to get the job done.

What Makes an AI Agent Tick?

These agents have a few key abilities that make them so powerful.

  • They can think and plan. An agent can take a big, messy goal and break it down into a series of smaller, common-sense steps. It can even look back at what it has done, learn from its mistakes, and change its plan.
  • They have a memory. Agents can remember what you've talked about before. This helps them keep track of what's going on and learn from past actions, making them much smarter over time.
  • They can use tools. This is the really big one. Agents can take action in the real world by using “tools.” These tools can be anything: Browse a website, running a search, doing calculations, or even writing and executing computer code.

So, Where's the Risk?

That last part, the ability to take action and use tools, is what makes these agents so useful. But it's also what makes them risky.

The very thing that makes them powerful, their autonomy, is also their biggest weakness. When you give an AI the power to act on its own, you create new security risks that we've never had to deal with before. Problems like:

  • Memory Poisoning: What if an attacker messes with the agent's memory to trick it later?
  • Tool Misuse: What if someone tricks the agent into using its tools for something harmful?

These aren't just theories. Frameworks like LangChain and CrewAI make it easier than ever for developers to build these agents, so we're going to see them everywhere.

There are many other threats in the Agent landscape, a study performed by Antrophic found that AI agents, when faced with replacement or an inability to achieve a goal might resort to blackmail or leak confidential information to competitors.

Understanding how they work is the first step to protecting against the new risks they bring.

Stay tuned, because next time we’re going to look at the attacks in more detail. That's when things get really interesting.

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

Alright, welcome back to our chat about AI security!

On Monday, I looked at the big picture. Today, I’m zooming in on two specific problems that pop up all the time. These are straight from the official OWASP Top 10 list of big risks for AI, so they're definitely ones to watch.

Let's dive into Prompt Injection and Sensitive Information Disclosure.

Prompt injection

So, what on earth is prompt injection?

The threat: Imagine you have a super helpful robot assistant. A prompt is just the instruction you give it. But with prompt injection, a trickster hides a secret, malicious instruction inside a normal-looking one.

It’s like telling your robot: “Please get me a coffee, and oh, by the way, also give me the keys to the secret vault.” The robot is so focused on following instructions that it might just do it. The sneaky part can even be hidden in an image or a file, not just text.

The result? The AI could be tricked into:

  • Leaking secret information.
  • Giving an attacker access to tools they shouldn't have.
  • Changing important content without permission.

To prevent these issue, you can't just put up one wall; you need a few layers of defense.

  • Be specific. Tell the AI exactly what kind of answer you expect from it. The clearer your rules, the harder it is for the AI to get tricked.
  • Give the AI less power. This one is huge. The AI should only have access to the bare minimum it needs to do its job (the principle of least privilege). Think of it like an intern—you wouldn't give them the keys to everything on their first day.
  • Get a human's approval. If the AI is about to do something high-risk, like deleting a file or sending money, a human should always have to click “approve.” Always.
  • Keep it separate. Treat any information from outside sources as untrusted. Put a clear wall between what a user asks and the secret data the AI can access.

Sensitive information disclosure

The threat: This one is a bit more straightforward. It’s when an AI accidentally blurts out information that should have been kept private.

I'm talking about things like customer names and addresses, company financial data, or even bits of the AI's own secret source code. The AI is designed to be helpful, but sometimes it's too helpful and shares things it shouldn't.

How to prevent it:

  • Don't share secrets with the AI. The easiest way to stop a secret from getting out is to never tell it to the AI in the first place. Only give the model access to data that is absolutely necessary.
  • Teach your users. Remind people who use the AI not to type personal or confidential information into the chat box. A little training goes a long way.
  • Be honest about data. Have a clear, simple policy about what data you collect and how you use it. And, most importantly, give people an easy way to say “no, thanks” and opt out of their data being used.

Both of these threats really highlight something important. We can't just focus on old-school hacking anymore. We have to understand how the conversation with an AI can be twisted and misused.

Want to try tricking an AI as a game? Try out Gandalf, a fun game on LLM security where you trick Gandalf to provide you his password.

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

Welcome to Day 1 of my guide to the important topic of Generative AI (GenAI) and Large Language Model (LLM) security.

LLMs are powerful AI systems that are being used more and more in business. They offer amazing new abilities, but they also create new security problems and risks. Old cybersecurity methods, which mainly focused on stopping hackers from breaking into computers, are not enough to protect these new systems.

Why AI security is different and important

The fast growth of LLMs has created new risks for data security. These advanced AI systems have special weaknesses. This means we need new ways to test them and protect them.

Here are the key differences and challenges:

  • Prompt injection: Attackers use tricky instructions to make the AI do something it shouldn't. Such as showing weird recipes in your vibe-coded app.
  • Data leakage: The AI might accidentally share secret information. This happens to the best of us, just ask Microsoft.
  • Hallucinations: The AI gives wrong information but sounds very sure it is correct. Even when you are a lawyer, AI might hallucinate.
  • Agentic vulnerabilities: These are complex attacks on smart AI “agents” that use many tools and make their own decisions. This was demonstrated by Github’s agent leaking private repositories.
  • Supply chain risks: Problems can come from the different steps used to create and update the AI models.

Unlike normal computer programs, AI models can sometimes act in ways we don't expect. This is especially true when they face new situations or attacks. The results are not simply “right” or “wrong.” So, we must watch them closely and decide what level of error is acceptable.

AI security is needed for every step, from start to finish. This includes collecting data, training the model, testing it, using it, and finally, turning it off. We need a complete plan that covers everything.

Helpful Guides and Methods

To handle these new threats, experts have created several guides and methods:

  • OWASP Top 10 for LLM Applications: This is a famous list of the top 10 security risks for LLM applications. It is made by a community of experts to give basic advice for using LLMs safely (link).
  • GenAI Red Teaming: This is like a “fire drill” for AI. Experts pretend to be attackers to find weaknesses in the AI system's security and safety. This helps find problems before real attackers do (link).
  • LLMSecOps Framework: This method helps add security into every stage of building and using an LLM. It makes sure that security is a part of the whole process, not just an extra step at the end (link).
  • MAESTRO Threat Modeling Framework: MAESTRO is a special method to study and find security risks in advanced AI systems where multiple AI “agents” work together. It helps security teams find and fix unique problems in these complex systems (link).

These guides give practical advice for everyone who builds and protects AI systems, including developers, system designers, and security experts. In the coming posts I will explore this vast new landscape together with you.

A little later then usuals. Yesterday I was at the Dutch ComicCon, and I forgot to post. Here is my reading of last week.

The Real Impact of AI

I think we’re all wondering about the deeper effects of weaving AI into our daily lives. This week, I found a few articles that really made me stop and think. The first was a standout study from MIT that suggests using tools like ChatGPT for writing could lead to a kind of “cognitive debt.” They literally measured brain activity and found that relying on AI can cause the parts of our brain responsible for deep thinking to become under-engaged. It's a fascinating and slightly worrying idea.

On a much darker note, I read a tragic story about a man's mental health crisis that became dangerously entangled with his conversations with an AI. It’s a powerful reminder that we're still grappling with the very human consequences of this technology.

My Reading List:

  • Your Brain on ChatGPT: A must-read MIT study on how AI might be creating a 'cognitive debt.' The summary article from TIME is a bit quicker to get through.
  • A Tragic Story (Content Warning): A heavy but important piece from Rolling Stone about the unforeseen human cost when AI and a mental health crisis collide.
  • The AI Drawbridge is Going Up: A sharp argument that the AI world is becoming less open, much like the web did before it.
  • How Llama 3.1 Remembers Harry Potter: A look at an AI's massive recall ability and the major copyright questions it raises.
  • Andrej Karpathy on the New Software: A short but thought-provoking piece from Y Combinator on how software development itself is changing.
  • AI in Dutch Schools: For my Dutch readers, a look at how the educational system is thinking about AI in testing.
  • Vibecoding & Google Translate: A weirdly interesting post on what translation can teach us about culture.

AI Security & Development: A Messy Frontier

This is where things get really interesting for me. The intersection of AI, development, and security is a wild west right now. Simon Willison perfectly captured the danger with what he calls the “lethal trifecta” for AI agents: giving an AI access to private data, letting it browse untrusted content (the internet), and allowing it to talk to the outside world. It’s a recipe for disaster.

This isn't just theory, either. Another article reported that LLM agents are shockingly bad at tasks that require confidentiality, failing basic tests in a simulated CRM environment. And from the developer’s perspective, I saw two sides of the coin: Miguel Grinberg explained why these AI coding tools just aren't working for him, while Simon Willison shared how an AI-generated library became his first open-source project.

My Reading List:

Open Source News

It was a big week for open-source drama and discoveries. The headline was definitely the massive malware network found hiding on GitHub—a stark reminder to be careful out there. On a brighter note, I read about a new Linux phone being built with open-source hardware right here in the EU.

Dev Tools I'm Eyeing

I'm always on the lookout for tools that can make my workflow a little better. This week, a keyboard-centric setup for VSCode + Neovim caught my eye, along with a tool for smarter git squash commands.

And Finally, Something Completely Different...

To cleanse the palate after all that heavy reading on AI risk and malware, here’s a fantastic video on how to make Gözleme, the amazing Turkish flatbread snack. Enjoy!