Tag: ai

  • Cutting Costs / Adding Value

    Cutting Costs / Adding Value

    What is price? It can be a way to “weigh value”.
    For example, someone on Twitter stated that $30 per PR shipped is a good price, and I tend to agree.

    The thing we have with AI now is that you can spin up a model and ask it to build something.

    Guess what? The model will agree and will start using tokens (spending tokens). There is no pushback. There is no questioning of any assumptions.

    You can build AI skills that manage some level of pushback, but overall, whenever you ask a model to do something, it will do it. Sometimes it’ll do it relentlessly.

    This brings everything to another level and raises another question that remains valuable today, a question that was there from the beginning: What’s the value of what I want?

    There will be a time when people will start saying we need to bring the costs down. At that time we will go back to square one because, as I said before, it was never about building. It was always about building the right things.

    This opportunity to directly see the price of an implementation, or at least have a rough idea, gives you an overall understanding that money is a finite resource. You want to spend it wisely on things that matter and things that will make a difference.

    But we can’t ignore the fact that we are now in the full bloom of AI. It’s normal to abuse tokens now. People are exploring a new world.

    But, at the same time, I expect that there will be a time when some people will either switch to local models for things where they are unsure of the value, or they will start questioning the product roadmap.

    For example, is it OK to spend $1,000 or $2,000 on an idea that might not bring value? These costs will add up over time, and they will bring maintenance costs that make running products even more difficult.

    What we ultimately want is a sustainable product that is easy to maintain and brings true value for customers. That’s the dream, right?

    It was never about the price. It was never about what to build or the time to build. It was always about building the right things that solve people’s problems and keep them engaged with your product.

    But this is actually a cool thing about AI because you can now have a rough understanding of how much investment is needed to do anything.

    Before, the situation was blurrier: estimates were difficult, and we didn’t have “human tokens”. Now, strangely, you are starting to see the cost clearly thanks to measurable token costs. Even though you can’t predict the exact amount of work (and therefore the spending), they can help a lot in this kind of planning.
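
    To make that “rough idea” concrete, here is a minimal sketch of the arithmetic. The prices and token counts are hypothetical, not taken from any provider, and the names `TokenPrice` and `estimate_cost` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price per one million tokens, in dollars (hypothetical figures)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Return the dollar cost of a run given its token counts."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


# Hypothetical prices, not taken from any real provider.
EXAMPLE_PRICE = TokenPrice(input_per_million=3.0, output_per_million=15.0)

# A hypothetical agent run that read 4M tokens and wrote 1M tokens.
cost = estimate_cost(4_000_000, 1_000_000, EXAMPLE_PRICE)
print(f"Estimated cost: ${cost:.2f}")
```

    With these made-up numbers the run lands just under the $30-per-PR figure mentioned earlier. The point is not the exact amount, but that the cost becomes something you can put next to the expected value.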

    It will be somewhat easier for entrepreneurs, stakeholders, or product people to think about whether they are spending money in the right direction.

    And, eventually, they will be accountable for how much they spend.

    This is not new: If you want to adopt a new product that costs a significant amount, you obviously need to figure out how to make a case for its usage in your company. This will be no different.

    Knowing that what you build is valuable or will yield an ROI seems obvious, but now you have a kind of metric to help you understand it.
    Tokens.

  • Two Bets on the Future of Software Engineering

    Two Bets on the Future of Software Engineering

    I feel that in the future all code will be written by agents.

    Now might not be the time for every company, but I think that’s where we’re heading. The interesting part is not whether that happens or not. The interesting part is how we get there.

    I think there are multiple ways to reach a point where we can successfully ship production-quality code written by agents. Not in terms of quantity, but in terms of quality. Quality that is comparable to what humans produce today.

    And to understand why that’s difficult, we first need to talk about context.

    Context Is More Than Code

    We all know that context is part of the limit, especially the context that is not part of the code.

    Whenever we create a project, we create it with an intention. We define it with taste and goals. Those things are often not written in the code.

    Sometimes they’re documented elsewhere. Sometimes they’re not documented at all.

    The code is only one part of the project. The intentions behind it, the trade-offs, the expectations, and the taste of the people building it are often stored somewhere else, or nowhere at all.

    That’s one of the biggest challenges if we imagine a future where agents write all the code.

    Getting There

    If the world eventually reaches a point where all code is written by agents, there will be multiple stages to get there.

    Right now, what we’re seeing is that some people and companies are using agents directly, while other companies are all-in on using AI to augment a single developer.

    Those are two different ways of approaching the problem.

    I think the first group is going to have more trouble in the beginning because, again, the agent doesn’t know everything. It doesn’t know the taste, the goals, or the requirements that exist outside the immediate scope of the task.

    That gets messy quickly.

    All-In on Agents

    This is similar to what OpenAI described in their Harness Engineering document about projects that are largely AI-generated: the agent does all the work, 100% of the code is written by it, and it’s async. It doesn’t need a human to start working.

    That poses some interesting questions.
    You want to review the code. You want to make sure it’s clean, that it doesn’t introduce duplications, and so on.

    At the same time, you want some kind of ownership of that code by the AI itself.
    So you need to build processes that revolve around making sure the code continues to evolve asynchronously with minimal human interaction.

    Humans are still involved, but they’re not guiding every step. They’re reviewing, controlling behavior, and making adjustments.

    The goal becomes enabling the system to continue operating without needing a human to constantly push it forward.

    AI as Augmentation

    The other group is probably moving slower, but (probably) at higher quality.

    They’re controlling the agents more. They’re augmenting their workload rather than delegating ownership completely.

    But at the same time, whenever we reach a point where we can safely trust agents to do their work, or at least part of it, this group won’t have learned how to guide agents in that way.

    Current harness engineering and context engineering revolve around addressing the limitations of today’s agents, but whenever a new model comes out, you’re suddenly facing a different set of problems.

    Some of the techniques you’re using become useless. For example, models already require much less prompt engineering than they did before.

    Trade-offs

    I believe some companies are spending time on things that won’t matter anymore in a few years or maybe months.

    One group is optimizing for quality and control.

    The other group is optimizing for amplification and learning how to operate with autonomous systems. It’s a trade-off one has to make.

    The Winner

    In the long term, I think the first group, the one adopting AI-written code more aggressively, is going to be faster, assuming they can sustain the errors long enough (or have a quick feedback loop for humans and AI agents) for the models to improve.

    It’s a bet, of course. I’m betting too as I can’t predict the future.

    My take is that models will reach a point where they can remove most of the issues they created. But to get there, you need to enable each model and agent to self-serve and self-direct.

    Even something as simple as checking that reporting is consistent could become a task handled by an agent.
    Or maybe it’s something that won’t matter anymore because the interface itself changes.

    Maybe software stops being primarily UI-based and becomes something else, who knows.

    Everyone Is Betting

    I think everyone is betting on something right now.

    The companies going all-in on agents are making one bet.

    The companies focusing on augmentation are making another.

    I think that learning how to give agents work and let them operate autonomously will be important for companies right now. The more that is possible, the better positioned your company will be if that future arrives.

    What will you be betting on?

  • Nobody knows the AI Future

    Nobody knows the AI Future

    Recently the CEO of ClickUp announced a layoff of twenty-two percent, and that was also bundled with a lot of ideas for the future, about restructuring, and about what it means for now.

    Here are my takeaways.

    1 – AI is shifting everything

      Nobody knows what the future will look like. Don’t trust anyone who says otherwise, because we’re still at a point where we know little about how AI will shape the world and how we will adapt to it.

      We’re currently doing our best, as usual, but the world five years from now will probably be quite different from what we have today.

      2 – Tokens are a limiting factor

        Tokens are a limiting factor, and inefficient tokens are a hurdle and a cost.

        This is not fully shared by the CEO. He mostly talked about letting people use unlimited tokens, and I agree with the direction. Token usage can absolutely be a limiting factor.

        Right now you want to leverage as much of the power AI can give you as possible, because you can expect some services to bump their prices up, and we’re already seeing this.

        3 – Critical thinking matters even more now

          This is my take, but you want to build even more critical thinking as you hand off more work to AI.

          If you didn’t have enough critical thinking before, now is the time.

          If you had it before, be aware that you’re trusting AI more than you would trust a human. You need to address that.

          4 – Bottlenecks will define the future

            Everything that slows AI adoption or AI distribution is currently a bottleneck.

            It’s not about “AI can code.”
            The question is: assume that everything is done by AI, what can’t AI do currently?

            Those are the bottlenecks you need to figure out.

            Again, this is mostly a vision for the future, so it might be wrong.

            5 – Deprecating old systems

              If you want to shift the mindset entirely, adapting old systems might take longer. You want to reinvent, because adapting them might require too many changes, and what you want is speed.

            1. The Motion/Movement AI Trap

              The Motion/Movement AI Trap

              I talked about focus before, and one of the things that comes to mind when I think about how we use AI, and the characteristics of AI itself, is the difference between movement and motion. Many of us, if not all, confuse these two things.

              What I mean by that is: sometimes you task AI to do something, and it spins and spins, creates code, creates an application that looks like it’s doing something. But from time to time it might break. It might not achieve what you wanted. It might have bugs.

              Yet Another App

              Maybe, for example, you’re one of the people who started creating your own iOS app. I for sure did. And as we now know, the category of apps that had the most influx of new applications is transcription apps, apps that use an AI model to transcribe what you say, speech to text.

              So maybe you started creating your own app, and that felt like doing something.
              But then you didn’t ship it. You didn’t market it. You just let AI spin and spin.
              You started configuring your OpenClaw setup, and maybe now you have a Telegram bot that does stuff for you, but that stuff might not translate to actual changes in your life.

              You spent time, and that time didn’t result in a change, in a practical improvement.

              Motion That Feels Like Progress

              That is the confusion between motion, this thing that keeps spinning, and real movement.

              It feels like we’re doing something. You get this burst of adrenaline, of interest and enthusiasm, because you’re building something, you’re starting to create something.

              For some people who were not coders, this was a new feeling, because they never went through the whole coding experience. But that did not translate into a meaningful change. At least for some people, it did not.

              Even skilled people get tricked from time to time into thinking that all this delegation, all this work, produces something when it doesn’t.

              6 Questions to Break the Illusion

              One of the things AI does well is spinning the wheel, this process of letting you think that something happens while nothing happens.

              So the more you use AI, the more you should reflect:

              1. Is it helping?
              2. What things are you letting go of while doing so?
              3. What things are you not aware of?
              4. How is it addressing your problems in a meaningful way?
              5. Are you learning more, or are you delegating to something you don’t understand?
              6. Is this movement tangible, or is it not?

              At the end of the day, AI is an incredible new tool that we are pleased to have.
              But as with all tools, and all the time we spend on tooling, what we want in the end is to make a meaningful change, either in the world, in our work, or in our life.

              Is that new tool you’re using achieving that goal, or is it giving you the illusion, the impression that it’s achieving it?

              And if it’s the latter, what should you change?

            2. People Choose You, Not the Model

              People Choose You, Not the Model

              How easy is it to determine whether an article has been written by an AI?

              I’m used to seeing the type of wording that comes out when you pass your content through an AI model, with titles that are clearly structured in a very specific way. It makes it stand out when something is written by an AI.

              I cannot lie, sometimes these titles are a blocker for me. I have less interest in reading the content even if the original content was interesting. The author may want to share a beautiful idea and use AI to amplify the concept, turning it into a longer form, but it won’t captivate me as a reader.

              I think this is perfectly fine (who am I, after all), but there is one thing we should personally keep in mind when doing this.

              Who are you talking to?

              AI models, if not trained, write with a very distinctive style. Each model will have its own flavor.

              The problem is that most of the time what you see out there is people using AI models to extend or rewrite their content entirely.
              That gives you a very precise type of output that doesn’t take into consideration your voice, your style, and the beauty of who you are (which, again, is perfectly fine to do).
              But that raises the question: is that type of content speaking to the right people?

              I’m not sure if formatting things through AI this way is effective because it targets a very generic audience. While you might attract a specific kind of reader, you will likely alienate others who are more aware of these scenarios and less susceptible to such titles.

              I believe it is crucial not to skip the editorial process. You must check whether the output is truly reaching the right audience.

              It’s all about the audience

              • Am I writing for the right audience?
              • Is this voice, this tone and style, going to be heard or read by the right audience, the one that I want to reach?
              • Is this talking to them the way I would like to talk to them or not?
              • Did I just skip a step in the process to make producing content faster?

              If any of these questions gave you pause, then I do think it is valuable to take some time and revisit that output. You should add your personal sauce, your style, and what makes you unique. This can be as easy as changing the titles, and sometimes it will become a more structured editorial review of your content.

              Take a moment to think about your articles. Even if they are AI-generated, take your time to review them as an editor would.
              That human element is what connects you to your specific reader. You are not writing for a generic reader, which is what you often get from a model, but for the reader you want to reach or the one who chose you to begin with.

              At the end of the day, people choose you because you have a specific voice. Don’t let them down.

            3. Choosing a service is always about value

              I remember back in the day when I was seriously considering switching off some online services. I had grown less and less interested in being tracked and having my activity aggregated and, potentially, misused. This journey was also part of how I learned about marketing and what opportunities exist in that space.

              One of the things I didn’t like then, and I’m still personally not a big fan of, is the remarketing approach. You know the one: you visit a site, and from that moment on, you’re presented with advertisements about that same site until you either buy something (so the ads stop) or start using ad blockers.

              While I’m actually a believer that good marketing doesn’t necessarily need this aggressive approach, I also realize that sometimes, to make a purchase happen, you need to be pushy with marketing. I don’t agree with those tactics, but that’s not really the point of this discussion.

              Enter AI (yes, I know, it’s everywhere)

              What was interesting to me is that with the advent of AI, I reconciled a lot of things.

              One of the reasons why AI can be so powerful is that the more context it has, the more it can learn, be helpful, and understand how you work and what could work for you. The benefits are absolutely incredible on some levels. That’s why some people suggest doing long-form chatting or talking with AI: when you ask detailed questions with lots of context, the replies you get back are much more relevant.

              This shows up in some amusing contradictions. Yesterday, I saw a LinkedIn post where someone asked: “I need to wash my car, and the car wash is only 50 meters away. How should I get there?”

              One AI said “by walking.” Another said “by car, because you need to wash the car after all.”

              The problem here is obvious: we expect AI to get the answer right, but the right answer is based on context.
              The real question was “how do I go there to wash the car, knowing I’ll need to take the car with me?” but that wasn’t explicitly asked.

              It’s all about assumptions. If you provide all the context, “I need to wash my car AND I need to decide how to get there so I can get it washed”, you’d clearly see how tricky the question was, and AI would probably give you a proper answer.

              Switching Away was not (only) about tracking

              So what’s the point of all this? It made me re-evaluate many of my concerns about sharing data.

              After thinking a lot about it, I realized that my problem was that I wasn’t getting enough value out of the services. Maybe they weren’t providing enough value for me. All that tracking just to give me ads, to track me more, to sell me more stuff… that kind of intelligence wasn’t giving me what I needed. It wasn’t providing value to me as a customer.

              Nowadays, things are shifting. The value you might get from AI is so vastly different, so powerful, that it might genuinely change some people’s lives.

              I believe that choosing a service or product has always been about the value it provides. There’s always a trade-off: how much am I getting versus what am I paying? Sometimes you’re getting something for free, but you’re paying with your personal information. The famous saying “if the service is free, you are the product” still stands.

              At the same time, it’s a trade-off you might revisit because you’re getting so much value out of it. But until you get there, while the value isn’t that significant, it’s all about understanding: is this giving me enough or not?

            4. The more you build, the more you’ll need to understand

              The more you build, the more you’ll need to understand

              I remember when I started studying computer science. At that time, I was very eager to try new stuff. I still am. I also remember there was a lot of discussion about which Linux distro to use. I tried many of them: I tried SuSE, I tried Mint and, at one point, I decided to try Gentoo.

              Gentoo, for those who don’t know about it, was a very unique concept because you would build every single thing for your machine. You compiled everything from the ground up.

              I remember it would take hours and days to get a minimal UI working, and this was without even thinking of building things like OpenOffice, which were giant pieces of software. Since everything was built from the ground up, you had to get your system built piece by piece, which involved a lot of manual command-line typing: setting up the system, configuring it, and so on.

              It was a learning experience, but it was also frustrating. From time to time you would get an error after hours of compilation where you would need to go back and try to figure out what was broken and why it didn’t work. At that time, the community was strong enough for you to be able to navigate these problems by yourself and that provided a fantastic learning experience.

              That’s where most of my terminal foundations came from, and I still rely on those skills. They have paid dividends in my life.
              I’ve used these skills to set up very high-performing WordPress sites on very cheap systems, and while that article is very old and certainly outdated, the concept behind it is still valid.

              These days, I happen to interact frequently with AI, and during these holidays I leveraged AI to build a couple of things for myself.
              I realized that the knowledge I do have is still helpful, because AI was making some errors here and there, and I was able to spot them, understand them, and figure out why something was broken.

              I’m now starting to believe that moving forward, leveraging AI will lead to greater performance and greater speed, but we’ll still need two critical things whenever we build something.

              The first, as developers, is understanding: having at least a partial understanding of the technologies involved.
              Not because it’s useful as applied knowledge (we might not even apply it directly, since AI will become even better in the future), but as directive knowledge:
              something that is critical for tying pieces together, as well as for seeing the flaws in AI’s code and proposals.
              To get that, we need to embrace the slow, somewhat painful work of practicing and learning the technologies from the ground up, which can feel boring compared to “just building it with AI”. Working step by step, failure by failure.
              This goes hand in hand with leveraging AI more and learning how to delegate more, and better, work.

              The second thing (I plan to write more about it) is knowing what to build and what not to build.
              That is another key component of every software-building cycle that will need more attention in the future.

              So if there’s anything I hope the next years will give people, including me, it’s the passion, the time, and the willingness to pursue learning things from the ground up, to understand systems and technologies and how they connect to each other, and to use that understanding as a foundational tool to better direct their work as AI takes an even more prominent place in our daily work life.

              PS: Related to this article is the concept of Mental Maps

            5. The end of waiting

              The end of waiting

              I think it was Instagram that first introduced the concept: removing the “waiting” part from our actions in the app.

              Before that, the standard pattern in UI design was simple: you clicked a button and waited until the action was completed. This pattern likely existed because it was reasonable to assume users wanted immediate confirmation that an action was finished, deleted, or errored out.

              But over time, we realized that many actions aren’t that critical. As we started moving faster, we wanted the application to just “take care of it”.
              We only really wanted to know if things went wrong. Since errors represent a tiny fraction of the workflow, the trade-off made sense.

              Optimistic UIs

              Optimistic UIs started to become more common. You’d upload an image, and the app would let you continue as if it were already done, processing the heavy lifting in the background.

              This was a paradigm shift. You could either force a user to wait seconds, or even minutes, depending on complexity, or you could let them move forward. By not blocking the user, the interface allowed for a continuous flow of action.
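
              As a minimal sketch of the idea (not how any specific app implements it), the pattern boils down to: apply the change right away, do the slow work in the background, and only interrupt the user if it fails. The names `OptimisticList` and `fake_upload` are hypothetical, and the “upload” is a stand-in for a real network call.

```python
import asyncio


class OptimisticList:
    """Toy in-memory 'UI state' that applies changes before the server confirms."""

    def __init__(self, upload):
        self.items = []          # what the user sees right away
        self.errors = []         # failures we surface afterwards
        self._upload = upload    # async callable doing the slow work (hypothetical)

    def add(self, item):
        """Show the item immediately and start the slow work in the background."""
        self.items.append(item)
        return asyncio.create_task(self._confirm(item))

    async def _confirm(self, item):
        try:
            await self._upload(item)
        except Exception as exc:
            # Roll back the optimistic change and report only the failure.
            self.items.remove(item)
            self.errors.append(f"{item}: {exc}")


async def fake_upload(item):
    """Stand-in for a slow network call; fails for one specific name."""
    await asyncio.sleep(0.01)
    if item == "broken.png":
        raise RuntimeError("upload failed")


async def demo():
    ui = OptimisticList(fake_upload)
    tasks = [ui.add("cat.png"), ui.add("broken.png")]
    visible_immediately = list(ui.items)  # both items appear before any upload ends
    await asyncio.gather(*tasks)
    return visible_immediately, ui.items, ui.errors


visible, final_items, errors = asyncio.run(demo())
```

              Both items are visible before any upload completes. Only the failed one is rolled back and reported, which matches the trade-off described earlier: we only really want to know when things go wrong.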

              The second wave of waiting

              I believe we are seeing a second wave of this pattern now with AI agents.

              Right now, interacting with AI often feels like the “pre-Instagram” era of the web.
              Every time you ask an AI something, you’re stuck waiting for the application to finish “thinking” and applying that thought. In software development specifically, this often means you can’t touch a repository while the AI is working on it.

              How are tools solving this? We are starting to see the emergence of parallel workspaces:

              • Cloud Agents: Tools like OpenAI Codex provide cloud-based agents to delegate actions without locking up your local environment. (Jules does the same for Gemini, although not as tightly integrated as Codex.)
              • Virtual Workspaces: Git applications like GitButler are exploring separate virtual “workspaces” (they call them Virtual Branches) where agents can act independently, allowing you to build multiple features simultaneously.
              • Visual Workflows: In image editing, apps like Freepik or “FloraFauna” offer a UI similar to Zapier/n8n to allow you to chain AI workflows together so they process multiple tasks at once.

              Intentional Multitasking

              The true power of AI is its ability to let people do more by relying on the one thing only humans can provide: valuable attention and review.

              Currently, most AI interfaces tie you to a single chat. If you want to multitask, you have to manually open new tabs, and even then, the context doesn’t always follow, making it harder to move quickly.

              The goal is intentional multitasking: a scenario where the user is not blocked by a loading spinner and can continue working without losing focus.

              Focus better, focus more

              I’ll close with a story. Many years ago, I worked on a project where every edit required a full rebuild. That rebuild took two to five minutes, during which the entire editor was blocked.

              In those five minutes, we would lose all context and focus. We became less efficient, less focused, and ultimately less creative.

              The ability to run multiple commands while maintaining focus will be one of the challenges for many products. Now that we can do more, we should leverage these new abilities without losing focus and, instead, enhance it so we can do more things, better.

            6. Backfilling a Project’s Hidden Knowledge Is Finally Possible

              Backfilling a Project’s Hidden Knowledge Is Finally Possible

              Sometimes you join a project and realize you don’t know the best practices there. It’s a new project, either new in general or new to you because you’ve joined a new team or started helping on a different codebase. This has happened to me quite a few times.

              When that happens, a few things can occur.
              First, the documentation might be up to date, which is great. But from time to time the documentation might not be aligned with the latest standards. This can happen for more mature projects, or when a project is in the very early stages and best practices are still evolving.

              In those moments I ask myself, “What are the best practices?”
              I either try to follow the framework’s recommended practices, or, if I have more experience with a specific framework, I mix the framework’s best practices with my own learnings, experience, and judgment.

              Recently I realized we can actually backfill that information, thanks to AI.
              You may have heard me mention AI before. That’s because I use it daily for product work and coding, and I think it offers many opportunities to change how we work, so it’s no surprise I’m bringing it up again here.

              Where do a project’s best practices live?

              Whenever we interact on a pull request and comment that something is not quite right or should be done differently, we are implicitly documenting an expectation, a behaviour.

              A PR comment might say, “You did it this way, but the way we prefer to approach this for this project or our goals is different.” That information lives in the pull request, but it is sometimes not translated into the formal documentation. For many reasons (bandwidth, speed, whatever), we might not move those comments into the docs.

              That information, though, can be used to document the latest best practices on a project and there are two main ways to do this.
              The first is through deep research. The second is through MCPs and CLI tooling.

              Deep research

              Deep research is a way for AI to think more deeply about a topic, search widely, and create a detailed document based on what you ask.

              If your project is open source or your repository is publicly available, this is especially powerful: you can ask the tool to review the repo, check comments from code owners in the last 50, 100, or 200 pull requests, and use that information as the source to gather the latest best practices for that codebase.

              This can produce documentation of the current recommended approach to development and the things to pay attention to (please don’t trust the output without checking it, but it’s a good starting point).
              Most tools call this “deep research”: Perplexity, ChatGPT, Claude, or Gemini should support this, and it’s all a matter of finding the right prompt.

              Deep research might take minutes (even 20 minutes sometimes), but it’s so powerful for this use case.

              MCPs and the CLI

              The other approach is to use MCPs and command-line tools to gather information about a private project.

              MCP stands for Model Context Protocol, and it’s a way for AI tooling to interact with external systems to retrieve information.

              For example, the MCP for GitHub would allow you to query pull requests and comments. You can do something similar if you’re running a coding agent locally, such as Claude Code or OpenAI Codex, by asking it to use the GitHub CLI to gather PR information and comments.

              In this case you prompt the tool to analyze the PRs, but you must craft the prompt slightly differently depending on whether you use an MCP integration or CLI commands to fetch the data.
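
              As a rough illustration of the CLI route, here is a minimal Python sketch that shells out to the GitHub CLI (`gh`, assumed to be installed and authenticated) and turns maintainer review comments into lines you can paste into a prompt. The function names are hypothetical, and filtering on the `author_association` field is only an assumption standing in for “comments from code owners”; it does not read a CODEOWNERS file.

```python
import json
import subprocess

# Assumption: these roles approximate "code owners"; adjust for your project.
MAINTAINER_ROLES = {"OWNER", "MEMBER", "COLLABORATOR"}


def pr_list_command(repo: str, limit: int = 50) -> list[str]:
    """Build the `gh pr list` call that returns recent merged PRs as JSON."""
    return [
        "gh", "pr", "list",
        "--repo", repo,
        "--state", "merged",
        "--limit", str(limit),
        "--json", "number,title",
    ]


def review_comments_command(repo: str, number: int) -> list[str]:
    """Build the `gh api` call for the inline review comments of one PR.

    Only the first 100 comments are requested; larger PRs would need paging.
    """
    return ["gh", "api", f"repos/{repo}/pulls/{number}/comments?per_page=100"]


def run_json(command: list[str]):
    """Run a command and parse its standard output as JSON."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def maintainer_notes(number: int, comments: list[dict]) -> list[str]:
    """Keep maintainer comments and format them as one line each."""
    notes = []
    for comment in comments:
        if comment.get("author_association") not in MAINTAINER_ROLES:
            continue
        author = (comment.get("user") or {}).get("login", "unknown")
        body = " ".join(comment.get("body", "").split())  # collapse whitespace
        notes.append(f"PR #{number} ({comment.get('path', '?')}) {author}: {body}")
    return notes


def collect_notes(repo: str, limit: int = 50) -> list[str]:
    """Gather maintainer review comments from the most recent merged PRs."""
    notes = []
    for pr in run_json(pr_list_command(repo, limit)):
        comments = run_json(review_comments_command(repo, pr["number"]))
        notes.extend(maintainer_notes(pr["number"], comments))
    return notes
```

              Calling `collect_notes("owner/repo", limit=50)` (with a real repository in place of the placeholder) returns a list of strings that can be appended to a prompt such as “summarize the conventions these reviewers keep asking for.” An agent with shell access can run the same `gh` commands on its own; the script just makes the data-gathering step explicit.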

              Once the research is finished, you have a baseline that tells you what’s currently used, what’s considered best, and how to approach implementation.

              I personally find this very helpful when approaching new codebases.
              Even if you have multiple contributors, you can use this method to identify different points of view and align everyone on the same page.

              A last note: AI tooling is changing quickly, so if you revisit this content later, check the current best practices and available tools.

            7. Can AI Count Puzzle Pieces? The September 2025 test

              Can AI Count Puzzle Pieces? The September 2025 test

              There are many ways of testing the usefulness and power of an AI.
              Currently, popular examples include asking a model to draw something, write a program, and so on.

              I would like AI models to get better at helping humans by improving how they detect visual information. This could make a big difference for people with temporary or permanent visual impairments, and it would also be a time saver.

              When I was on vacation a couple of weeks ago, we found an old puzzle, and my daughter and I tried to finish it. At first it was very hard, because I couldn’t find the pieces I needed.
              Yet the puzzle was simple, with only 200 pieces, so why couldn’t I find them?

              Turns out, some pieces were missing.

              With this in mind, I wondered: could AI help me? (Spoiler: it didn’t. Go to the end to see the rankings.)

              I decided that this could be a good test to run from time to time against the current state-of-the-art models to see how they rank, using the same image (of that puzzle) and the same prompt (kind of what Simon Willison does with pelicans :), but less fancy).

              The image

              The image is clearly difficult to count from, mostly because the background is noisy and busy.
              But that’s the whole point: we want the test to be difficult, not easy.

              Also, I’m intentionally not writing the number of pieces because, as you may know, AI can be trained on this information (feel free to count them, just don’t write the number in a comment :D).

              If I ever see that the models have been trained on the actual number, I will use a different puzzle.

              The prompt is the same for every tool

              Count the puzzle pieces in this photo
              

              The replies

              Gemini 2.5 Pro

              Based on a careful count of the image, there are approximately 251 puzzle pieces visible.


              Breakdown of the Count

              • Assembled Pieces: There are 41 pieces that have been assembled at the bottom of the image.
              • Loose Pieces: There are approximately 210 loose pieces scattered across the rest of the table.

              Counting the exact number of loose pieces is challenging due to them overlapping, but 251 is a very close estimate.

              Gemini 2.5 Pro reply

              Claude 4.1 Opus

              I’ll carefully count the puzzle pieces scattered across this patterned surface.

              Let me systematically go through the image, counting the visible puzzle pieces by color:

              Pink pieces: Scanning across the image, I can count approximately 45-50 pink puzzle pieces of various sizes scattered throughout.

              Light blue/gray pieces: There appear to be approximately 25-30 light blue or gray colored pieces.

              Cream/beige pieces: I can identify approximately 20-25 cream or light beige colored pieces.

              Yellow pieces: There are several distinct yellow pieces that appear to be from a different puzzle or section, approximately 8-10 pieces.

              Along the bottom: There’s a connected row of pieces forming a partial border, which contains approximately 15-20 pieces.

              Adding these together, I count approximately 120-135 individual puzzle pieces in total visible in this image.

              The pieces appear to be from at least one or possibly multiple puzzles, with the majority being pink, blue, and cream colored pieces scattered across what looks like a decorative tablecloth or surface with a vintage pattern featuring cherries and other designs.

              Claude 4.1 Opus Reply

              GPT 5 High Reasoning

              Approximately 240 puzzle pieces are visible in the photo (including the connected edge strip at the bottom). There may be a small ±5 margin due to overlaps and limited zoom.

              If you want an exact count, share a higher-zoom image or let me annotate/mark as I count.

              Results & Scoring

              Here are the results for each model:

              • Gemini 2.5 Pro: 210 pieces
              • Claude 4.1 Opus: 120-135 pieces
              • GPT-5 High Reasoning: 240 pieces

              And our question is:
              Can AI Count Puzzle Pieces? (September 2025)
              Well, it turns out they can’t. All models failed to count the pieces, or even to get close enough to the actual count to give something that could be considered a valid answer.

              Which model came closest to the result?
              Gemini 2.5 Pro was the model that came closest, although it should be noted that this is a 200-piece puzzle with some pieces missing, so its answer was still super-wrong.

              So, considering this, here are the rankings (no winner yet):

              1. Gemini 2.5 Pro
              2. Claude 4.1 Opus
              3. GPT-5 High Reasoning
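
              Since the test is meant to be repeated, the ranking step could be scripted. The sketch below is one possible way to do it, not necessarily how the ranking above was produced: it assumes models are ordered by absolute error, and that a range such as 120-135 is scored by its midpoint. The estimates are copied from the results list above, and the real count is passed on the command line so it never has to be written down in the post.

```python
import sys

# Estimates as listed in the results above; a tuple is a (low, high) range.
SEPTEMBER_2025 = {
    "Gemini 2.5 Pro": 210,
    "Claude 4.1 Opus": (120, 135),
    "GPT-5 High Reasoning": 240,
}


def midpoint(estimate):
    """Collapse a (low, high) range to its midpoint; pass single numbers through."""
    if isinstance(estimate, tuple):
        low, high = estimate
        return (low + high) / 2
    return float(estimate)


def rank_by_error(estimates, reference):
    """Return (model, absolute error) pairs, closest estimate first."""
    errors = {
        model: abs(midpoint(value) - reference)
        for model, value in estimates.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


def main(argv):
    if len(argv) < 2:
        print("usage: rank.py ACTUAL_PIECE_COUNT")
        return
    reference = int(argv[1])
    ranking = rank_by_error(SEPTEMBER_2025, reference)
    for place, (model, error) in enumerate(ranking, start=1):
        print(f"{place}. {model} (off by {error:g})")


if __name__ == "__main__":
    main(sys.argv)
```

              Future runs only need a new dictionary of estimates; the scoring rule stays the same, which keeps the rankings comparable over time.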

              We’ll see if things improve in the future.