On January 5, employees at Cursor returned from the holiday weekend to an all-hands meeting with a slide deck titled “War Time.” After becoming the hottest, fastest growing AI coding company, Cursor is confronting a new reality: developers may no longer need a code editor at all. Check out the full story: https:// forbes.com/sites/annatong /2026/03/05/cursor-goes-to-war-for-ai-coding-dominance/?utm_campaign=ForbesMainTwitter&utm_source=ForbesMainTwitter&utm_medium=social … ( : Kimberly White via Getty Images for Fortune Media) Relevant View quotes unpopular take but IDE-based AI tools were alw…
signüll @signulll signüll @signulll remarkable to see github copilot execution given they had almost all of the advantages including first mover. what happened?! Relevant View quotes They screwed over the guy who spearheaded the project on comp and he walked. This happened fairly early and it never recovered. That’s my recollection at least based on his posts. Honestly feel so bad for people who are only allowed to use copilot at work Every time I hear somebody be like, "Oh yeah, AI is actually not that good. I tried it out." Every fucking time, it's always co-pilot. This chart was already deb…
International models on ARC-AGI-2 Semi Private - Kimi K2.5 ( @Kimi_Moonshot ): 12%, $0.28 - Minimax M2.5 ( @MiniMax_AI ): 5%, $0.17 - GLM-5 ( @Zai_org ): 5%, $0.27 - Deepseek V3.2 ( @deepseek_ai ): 4%, $0.12 These models score below July 2025 frontier labs Relevant View quotes We only conduct Semi-Private testing with providers that have trusted data retention agreements. Qwen 3 Max Thinking is not included for this reason. I see the same thing on pencil puzzle bench (multi step reasoning benchmark), US closed models score well and above the open chinese models. interesting that Mistral is com…
Human DX optimizes for discoverability and forgiveness. Agent DX optimizes for predictability and defense-in-depth. These are different enough that retrofitting a human-first CLI for agents is a losing bet. I built a CLI for Google Workspace — agents first. Not “built a CLI, then noticed agents were using it.” From Day One, the design assumptions were shaped by the fact that AI agents would be the primary consumers of every command, every flag, and every byte of output. CLIs are increasingly the lowest-friction interface for AI agents to reach external systems. Agents don’t need GUIs. They nee…
💌 Hey there, it’s Elizabeth from SigNoz! This newsletter is a n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! & This piece took 6 days, 5 hours to be cooked, hope we served. 🌚 There are two popular prophecies floating around tech circles these days. The first says SRE is the future of all software engineering , that as AI writes more and more code, the humans who remain will be the ones keeping systems alive. The second says AI will devour every tech job alive, SREs included. Neither is particularly useful if you’re an SRE…
The orchestration layer around a language model that manages prompts, tool execution, policy checks, and loop control for autonomous agent behavior. Latent Patterns is a new platform that teaches AI concepts to developers — through screencasts, technical deep dives, interactive playgrounds, and hands-on courses. We haven't launched yet. Sign up below and we'll notify you when we open the doors. An agent harness is the orchestration layer around an agent : the runtime that constructs context, executes tool calls , enforces guardrails, and decides when each loop iteration should continue or stop…
definition: Agent Harness > The orchestration layer around a language model that manages prompts, tool execution, policy checks, and loop control for autonomous agent behavior. An agent harness is the orchestration layer around an agent: the runtime that constructs context, executes tool calls, enforces guardrails, and decides when each loop iteration should continue or stop. If the model is the “reasoning engine,” the harness is the operating system and control plane that makes the engine useful, safe, and repeatable in production. Agent Harness — Glossary — Latent Patterns From latentpattern…
The paper says the best way to manage AI context is to treat everything like a file system. Today, a model's knowledge sits in separate prompts, databases, tools, and logs, so context engineering pulls this into a coherent system. The paper proposes an agentic file system where every memory, tool, external source, and human note appears as a file in a shared space. A persistent context repository separates raw history, long term memory, and short lived scratchpads, so the model's prompt holds only the slice needed right now. Every access and transformation is logged with timestamps and provena…
Dedicated to all those who are sceptical about the significance of agentic coding, and to those who are not, and are wondering what it means for the future of their profession. The title is an homage to Zen of Python by Tim Peters. Unlike Tim, I am not a zen master. My only aim is to take stock of where we are and where we might be heading. I have been building with coding agents daily for the past year, and I also help teams adopt them without losing reliability or security. Software development is dead Code is cheap Refactoring easy So is repaying technical debt All bugs are shallow Create t…
","pad_token":"<|endoftext|>","unk_token":null},"chat_template_jinja":"{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set…
Agent Harness is the Real Product Everyone talks about models. Nobody talks about the scaffolding. The companies shipping the best AI agents today- Claude Code, Cursor, Manus, Devin, SWE-Agent all converge on the same architecture: a deliberately simple loop wraps the model, a handful of primitive tools give it hands, and the scaffolding decides what information reaches the model and when. The model is interchangeable. The harness is the product. Here is the evidence: Claude Opus 4.5 scores 42% on CORE-Bench with one scaffold and 78% with another. Cursor's lazy tool loading cuts token usage by…
","pad_token":"<|vision_pad|>","unk_token":null},"chat_template_jinja":"{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- se…
Credit: Transformer/ Rebecca Hendin “Somehow all of the interesting energy for discussions about the long-range future of humanity is concentrated on the right,” wrote Joshua Achiam, head of mission alignment at OpenAI, on X last year. “The left has completely abdicated their role in this discussion. A decade from now this will be understood on the left to have been a generational mistake.” It’s a provocative claim: that while many sectors of the world, from politics to business to labor, have begun engaging with what artificial intelligence might soon mean for humanity, the left has not. And…
The self-driving codebase: fleets, swarms and background agents Recently an article titled 'something big is happening' went viral. It was a wake-up call to those not in the tech industry about how AI has hit this inflection point, since December 2025. It does a great job of putting into words what those of us keeping up with the frontier of coding AI feel. An inflection point, and like things are 'going exponential'. My contributions on areyougoingexponential.rhys.dev/loujaybee I feel it and see it in my own GitHub contributions graph. The bottleneck of software development has shifted violen…
DAIR.AI @dair_ai DAIR.AI @dair_ai New research on agent memory. Agent memory is evaluated on chatbot-style dialogues. But real agents don't chat. They interact with databases, code executors, and web interfaces, generating machine-readable trajectories, not conversational text. The key to better memory is to preserve causal dependencies. Existing memory benchmarks don't actually measure what matters for agentic applications. This new research introduces AMA-Bench, the first benchmark built for evaluating long-horizon memory in real agentic tasks. It spans six domains including web, text-to-SQL…
A simple framework to build Agentic Systems that just works I've been building agentic systems for a couple of years now. For Youtube, for Open Source, for my SaaS, for my office. Today I want to write this short article sharing what I have learned and where my policies have converged. Many people claim that building agentic harnesses is more of an art than a science . I mostly agree with this, but I still think it is a bit dangerous to assume "its just art" . The art myself sets you up to think about agentic systems in a wrong way. If you convince yourself that all you are building is an art…
The third era of AI software development When we started building Cursor a few years ago, most code was written one keystroke at a time. Tab autocomplete changed that and opened the first era of AI-assisted coding. Then agents arrived, and developers shifted to directing agents through synchronous prompt-and-response loops. That was the second era. Now a third era is arriving. It is defined by agents that can tackle larger tasks independently, over longer timescales, with less human direction. As a result, Cursor is no longer primarily about writing code. It is about helping developers build t…
Not just did OpenAI defect and concede to this whole authoritarian maneuver, but Sam also went and just deceptively framed the whole thing to try to make it look like they had agreed to the same Anthropic redlines, which is not actually true. Quote Nathan Calvin @_NathanCalvin · Feb 28 From reading this and Sam's tweet, it really seems like OpenAI *did* agree to the compromise that Anthropic rejected - "all lawful use" but with additional explanation of what the DOW means by all lawful use. The concerns Dario raised in his response would still apply here x.com/UnderSecretary… Show more Relevan…
Introducing Desloppify v.0.8. Thanks to many workflow improvements + new agent planning tools, it can now run for days on end - autonomously finding, understanding, & fixing large and small code quality problems. There's no reason your slop code can't be beautiful! Relevant View quotes
Latent.Space @latentspacepod Latent.Space @latentspacepod From rewriting Google’s search stack in the early 2000s to reviving sparse trillion-parameter models and co-designing TPUs with frontier ML research, Jeff Dean has quietly shaped nearly every layer of the modern AI stack. As Chief AI Scientist at Google and a driving force behind Gemini, Jeff has lived through multiple scaling revolutions from CPUs and sharded indices to multimodal models that reason across text, video, and code. We sat down with Jeff to unpack what it really means to “own the Pareto frontier,” why distillation is the q…
This week I found myself writing code by hand again. Not a lot, maybe ten, twenty lines in total, which is far less than what I had Amp produce, but still: actual typing out of code. Miracle I didn’t get any blisters. At our Amp meetup in Singapore I mentioned this on stage and someone in the audience cheekily asked: “You just told us that these agents can now work well when you give them a longer leash and yet you wrote code by hand, how come?” The answer can probably be boiled down to something that sounds very trite: to build software means to learn. When you build a new piece of software,…
No Servers Yet a:hover]:text-primary [&>a]:underline [&>a]:underline-offset-4 ino:ZGF0YS1zbG90PWVtcHR5LWRlc2NyaXB0aW9u>Add a server to connect to remote machines via SSH
Lance Martin @RLanceMartin Sal DiStefano reposted Lance Martin @RLanceMartin Give Claude a computer TL;DR – Programmatic tool calling (PTC) is an interesting capability in Claude Opus/Sonnet 4.6. Instead of making tool calls that each round-trip through Claude's context, Claude writes code that can orchestrate tool calls directly inside a container. Intermediate tool results return to the code, not Claude’s context window. This reduces token usage and improves performance on multi-step tasks like search. Opus 4.6 with PTC recently scored #1 on LMArena’s search benchmark . See our docs to learn…
TL;DR: A good mental model is to treat AGENTS.md as a living list of codebase smells you haven’t fixed yet, not a permanent configuration. Auto-generated AGENTS.md files hurt agent performance and inflate costs by 20%+ because they duplicate what agents can already discover. Human-written files help only when they contain non-discoverable information - tooling gotchas, non-obvious conventions, landmines. Every other line is noise. There’s a ritual that’s become almost universal among developers adopting AI coding agents. You set up a new repo, run /init , watch the agent scan your codebase, an…
· Mod THESE ARE ALL ONE-SHOT SVGs!!! From a new anonymous model called "Arrow Preview" on Design Arena. This level of detail is unheard of from an LLM. It's using a different technique to create these than all previous LLMs. SVG benchmark is saturated Check comments Relevant View quotes
we're making @blocks smaller today. here's my note to the company. #### today we're making one of the hardest decisions in the history of our company: we're reducing our organization by nearly half, from over 10,000 people to just under 6,000. that means over 4,000 of you are being asked to leave or entering into consultation. i'll be straight about what's happening, why, and what it means for everyone. first off, if you're one of the people affected, you'll receive your salary for 20 weeks + 1 week per year of tenure, equity vested through the end of may, 6 months of health care, your corpora…
Thariq @trq212 pedram.md reposted Thariq @trq212 Lessons from Building Claude Code: Seeing like an Agent One of the hardest parts of building an agent harness is constructing its action space. Claude acts through Tool Calling, but there are a number of ways tools can be constructed in the Claude API with primitives like bash, skills and recently code execution (read more about programmatic tool calling on the Claude API in @RLanceMartin's new article ). Given all these options, how do you design the tools of your agent? Do you need just one tool like code execution or bash? What if you had 50…
Sakana AI @SakanaAILabs Séb Krier reposted Sakana AI @SakanaAILabs We’re excited to introduce Doc-to-LoRA and Text-to-LoRA , two related research exploring how to make LLM customization faster and more accessible. https:// pub.sakana.ai/doc-to-lora/ By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks. Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs…
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations. 2026_02_25-3d595112026_02_06-1ed29a0 runtimeOnly("com.skillsjars:anthropics__skills__algorithmic-art:2026_02_25-3d59511") Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when bran…
As a recap of Part 1 in this blog miniseries, minions are a homegrown unattended agentic coding flow at Stripe. Over 1,300 Stripe pull requests (up from 1,000 as of Part 1) merged each week are completely minion-produced, human-reviewed, but containing no human-written code. If you haven’t read Part 1, we recommend checking that out first to understand the developer experience of using minions. In this post, we’ll dive deeper into some more details of how they’re built, focusing on the Stripe-specific portions of the minion flow. Devboxes, hot and ready For maximum effectiveness, unattended ag…
Across the industry, agentic coding has gone from new and exciting to table stakes, and as underlying models continue to improve, unattended coding agents have gone from possibility to reality. Minions are Stripe’s homegrown coding agents. They’re fully unattended and built to one-shot tasks. Over a thousand pull requests merged each week at Stripe are completely minion-produced, and while they’re human-reviewed, they contain no human-written code. Our developers can still plan and collaborate with agents such as Claude and Cursor, but in a world where one of our most constrained resources is…
Ivan Fioravanti ᯅ @ivanfioravanti Ivan Fioravanti ᯅ @ivanfioravanti Qwen 3.5 Medium models benchmarks on M3 Ultra Alibaba Qwen released Qwen 3.5 Medium Model Series and on paper is powerful, faster and smaller than Qwen 3 Series. In this article we are gonna see: Qwen/Qwen3.5-122B-A10B vs Qwen/Qwen3.5-35B-A3B vs Qwen/Qwen3.5-27B in 4bit from pure speed and memory perspective. Quality Benchmarks are already available everywhere. We'll start with pure benchmarks and close with a sample of OpenCode running with Qwen3.5-122B-A10B 4bit to generate a snake game, with final results and prompt at the…
It’s Next.js Liberation Day. The #1 request we kept hearing: help us run Next fast and secure, without the lock-in and the costs. So we did it. We kept the amazing DX of @nextjs , without the bespoke tooling, built on @vite . We’re working with other providers to make deployment a first-class experience everywhere. Next.js belongs to everyone. How we rebuilt Next.js with AI in one week From blog.cloudflare.com Relevant View quotes
Over the last ~2 weeks I've rewritten the @ladybirdbrowser JavaScript compiler in Rust using AI agents. ~25k lines of safe Rust (20k if you exclude comments). No regressions on test262 or our own internal test suites. Extensively tested against the live web by browsing in lockstep mode where we run both the C++ and Rust pipelines, and then verify identical AST & bytecode. We're making a pragmatic decision and adopting Rust as a C++ successor language. What a time to be alive! Quote Ladybird @ladybirdbrowser · Feb 23 Ladybird adopts Rust, with help from AI https:// ladybird.org/posts/adopting -…
February 25, 2026 The world of software is undergoing a shift not seen since the advent of compilers in the 1970s. Compilers were the original vibe coding : they automatically generate complex machine code that human programmers had to manually write before. Over time, compilers became fully trusted, nobody has to look under the hood, most programmers won't understand a thing. Are AI coding agents the new compilers? Will we simply trust whatever code they generate? In this post I focus on two questions: In what language(s) are we going to express our intent? How will humans tell AI agents what…
Dillon Mulroy @dillon_mulroy Nico Bailon reposted Dillon Mulroy @dillon_mulroy · Feb 19 pi code gen is all you need total bash victory confirmed again The problem is that the tool call is no longer deterministic. And really the solution is just writing better tools instead of letting Claude write bespoke python code thousands or millions of times a day. Last week I had an agent loop burning 40k+ tokens just round-tripping tool results through the model. PTC skipping those intermediate inference passes is the obvious fix... surprised it took this long to ship. This is convergence toward code-as…
*This post was updated at 12:35 pm PT to fix a typo in the build time benchmarks. Last week, one engineer and an AI model rebuilt the most popular front-end framework from scratch. The result, vinext (pronounced "vee-next"), is a drop-in replacement for Next.js, built on Vite , that deploys to Cloudflare Workers with a single command. In early benchmarks, it builds production apps up to 4x faster and produces client bundles up to 57% smaller. And we already have customers running it in production. The whole thing cost about $1,100 in tokens. Next.js is the most popular React framework. Million…
The File System Is the New Database: How I Built a Personal OS for AI Agents Every AI conversation starts the same way. You explain who you are. You explain what you're working on. You paste in your style guide. You re-describe your goals. You give the same context you gave yesterday, and the day before, and the day before that. Then, 40 minutes in, the model forgets your voice and starts writing like a press release. I got tired of this. So I built a system to fix it. I call it Personal Brain OS. It's a file-based personal operating system that lives inside a Git repository. Clone it, open it…
Skill Graphs > SKILL.md people underestimate the power of structured knowledge. it enables entirely new kinds of applications right now people write skills that capture one aspect of something. a skill for summarizing, a skill for code review and so on. (often) one file with one capability thats fine for simple tasks but real depth requires something else imagine a therapy skill that provides relevant information about cognitive behavioral patterns, attachment theory, active listening techniques, emotional regulation frameworks and so on a single skill file cant hold that skill graphs a skill…
(All images: Gemini) After millennia of supremacy, we await our demotion. You can detect the trembling. It’s found in the anxious insistence that artificial intelligence isn’t truly intelligent . Or that using AI is a cheat , a perversity , a turf violation . The trembling intensifies with a disturbing thought: What if those flares behind your eyes—the bursts of wit and the worry, the storyboards of memory, so many yearnings—what if everything was just computation? Because our “computers” are yesterday’s model, no updates available. “I think about it practically all the time, every single day.…