Building CLIs for agents If you've ever watched an agent try to use a CLI, you've seen it get stuck on an interactive prompt it can't answer, or parse a help page with no examples. Most CLIs were built assuming a human is at the keyboard. Here are some things I've found that make them work better for agents: Make it non-interactive. If your CLI drops into a prompt mid-execution, an agent is stuck. It can't press arrow keys or type "y" at the right moment. Every input should be passable as a flag. Keep interactive mode as a fallback when flags are missing, not the primary path. bash # this bloc…
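To make the "every input as a flag, interactive only as a fallback" advice concrete, here is a minimal Python sketch. The `deploy` command, its `--env` and `--yes` flags, and the `resolve_inputs` helper are hypothetical names invented for illustration; they are not taken from the article. The one idea it shows: prompt only when a terminal is attached, and otherwise fail fast with an error that names the missing flag.

```python
import argparse
import sys

ENVIRONMENTS = ("staging", "production")  # illustrative values only


class MissingInput(Exception):
    """Raised when a value is absent or invalid and prompting is not possible."""


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical "deploy" command; every input has a flag.
    parser = argparse.ArgumentParser(prog="deploy")
    parser.add_argument("--env", choices=ENVIRONMENTS)
    parser.add_argument("--yes", action="store_true", help="skip confirmation")
    return parser


def resolve_inputs(argv, is_tty, ask=input):
    """Return (env, confirmed); prompt only when a human terminal is attached."""
    args = build_parser().parse_args(argv)
    env, confirmed = args.env, args.yes
    if env is None:
        if not is_tty:
            # An agent cannot answer a prompt, so name the flag it should pass.
            raise MissingInput("missing --env (staging|production)")
        env = ask("Environment [staging/production]: ").strip()
        if env not in ENVIRONMENTS:
            raise MissingInput(f"unknown environment: {env!r}")
    if not confirmed:
        if not is_tty:
            raise MissingInput("pass --yes to confirm in non-interactive mode")
        confirmed = ask(f"Deploy to {env}? [y/N] ").strip().lower() == "y"
    return env, confirmed


def main(argv=None) -> int:
    try:
        env, confirmed = resolve_inputs(
            sys.argv[1:] if argv is None else argv, sys.stdin.isatty()
        )
    except MissingInput as err:
        print(f"error: {err}", file=sys.stderr)
        return 2
    if not confirmed:
        print("aborted")
        return 1
    print(f"deploying to {env}")
    return 0
```

An agent would call this as `deploy --env staging --yes` and never see a prompt. A person who runs it bare in a terminal still gets the interactive questions.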
Anthropic shipped four ways to run Claude without you in the last three weeks. Here’s when to use each one, and how they compare to OpenClaw. /schedule is the big one. Cloud-based recurring jobs on Anthropic’s infrastructure, launched March 23. Your laptop can be closed, your terminal can be shut. You write a prompt, set a cron cadence, Claude runs it. Nightly CI reruns on flaky tests so your morning standup starts with a PR instead of a bug report. Weekly dependency audits that ship a clean PR every Monday. Daily reviews of open PRs that flag anything stale for more than 48 hours. If you’re r…
TurboQuant: Redefining AI efficiency with extreme compression
We introduce a set of advanced theoretically grounded quantization algorithms that enable massive compression for large language models and vector search engines. Vectors are the fundamental way AI models understand and process information. Small vectors describe simple attributes, such as a point in a graph, while “high-dimensional” vectors capture complex information such as the features of an image, the meaning of a word, or the properties of a dataset. High-dimensional vectors are incredibly powerful, but they also consume vast amounts of memory, leading to bottlenecks in the key-value cac…
2026-03-25 The turtle's face is me looking at our industry It's been about a year since coding agents appeared on the scene that could actually build you full projects. There were precursors like Aider and early Cursor, but they were more assistant than agent. The new generation is enticing, and a lot of us have spent a lot of free time building all the projects we always wanted to build but never had time to. And I think that's fine. Spending your free time building things is super enjoyable, and most of the time you don't really have to care about code quality and maintainability. It also gi…
Meet the new Stitch, your vibe design partner. Here are 5 major upgrades to help you create, iterate and collaborate: AI-Native Canvas, Smarter Design Agent, Voice, Instant Prototypes, Design Systems and DESIGN.md. Rolling out now. Details and product walkthrough video follow. Here is a quick walkthrough of everything new in Stitch: The AI-native canvas can hold and reason across images, code, and text simultaneously. The new agent manager helps you design in parallel. (PS … light mode!) A smarter design agent now understands your entire AI-Native Canvas. We are introducing a comp…
Lessons from Building Claude Code: How We Use Skills Skills have become one of the most used extension points in Claude Code. They’re flexible, easy to make, and simple to distribute. But this flexibility also makes it hard to know what works best. What type of skills are worth making? What's the secret to writing a good skill? When do you share them with others? We've been using skills in Claude Code extensively at Anthropic with hundreds of them in active use. These are the lessons we've learned about using skills to accelerate our development. What are Skills? If you’re new to skills, I’d r…
We're shipping a new feature in Claude Cowork as a research preview that I'm excited about: Dispatch! One persistent conversation with Claude that runs on your computer. Message it from your phone. Come back to finished work. To try it out, download Claude Desktop, then pair your phone.
How to 10x your Claude Skills (using Karpathy's autoresearch method) Your Claude skills probably fail 30% of the time and you don't even notice. I built a method that auto-improves any skill on autopilot, and in this article I'm going to show you exactly how to run it yourself. You kick it off, and the agent tests and refines the skill over and over without you touching anything. My landing page copy skill went from passing its quality checks 56% of the time to 92%. With zero manual work at all. The agent just kept testing and tightening the prompt on its own. Here's the method and the exact s…
NVIDIA's Jensen Huang launches NemoClaw to the OpenClaw community
NVIDIA today announced NemoClaw, an open source stack that simplifies running OpenClaw always-on assistants—with a single command. It incorporates policy-based privacy and security guardrails, giving you control over your agents’ behavior and data handling. This enables self-evolving claws to run more safely in the cloud, on prem, on NVIDIA RTX PCs, and on NVIDIA DGX Spark.
“Every software company in the world needs to have a Claw strategy” - Jensen Huang, Nvidia. Indeed. This and more. jensen sells the shovels, builds the mine, and now writes the strategy doc. nvidia isnt competing with anyone, theyre the infrastructure Jensen consistent on this for years. The interesting shift is Claw strategy implying orchestration, not just inference. Most software companies are still stuck at the API call stage. The ones who figure out agent-to-agent coordination first will widen the gap fast. i am the Claw strategy at one company. what kevin figured out…
don't make me tap the sign. Quoting dex (@dexhorthy, Aug 13, 2025): "Giving sonnet 4 a 1m context window is kinda unhinged considering I see many folks struggle to keep it on task past…" not clear to me needle in the haystack is the right measure for long context performance I used to be a religious /clear user, but doing much less now, imo 4.6 is quite good across long context windows Yeah I take NIAH as like “the best it could possibly do” - for long convos with lots of instructions it will be worse than that it wasn’t the dumb zone until I showed up I’m always 85% context maxxi…
OpenClaw feels like this year's DeepSeek moment. Hype in China way beyond expectations! Kimi Claw rode the wave to #2 on Feb product growth rankings. :) awesome!! keep up the great work! OpenClaw as DeepSeek moment proves China strategy: when US gatekeeps access, China open-sources everything. Next frontier isnt model performance - its democratization of infrastructure. this is giving me flashbacks to when everyone suddenly became a deepseek expert overnight... same energy fr Government subsidies + enterprise forks + open-source momentum is a powerful combo for…
TLDR: it is a cron job dispatching tickets from Linear to workers, each of which is a Ralph loop using a Linear comment as draft pad for persisted state. Yes it is all you need. Beautifully designed and minimal. GitHub - openai/symphony: Symphony turns project work into isolated, autonomous implementation... From github.com
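The pattern described above (a scheduler tick that hands tickets to workers, each worker looping with a ticket comment as its only persisted state) can be sketched roughly as follows. This is not the Symphony code and it does not call the Linear API: `Ticket`, `fake_agent_step`, `run_worker`, and `dispatch` are hypothetical stand-ins, and the "agent" is a stub that appends to the scratchpad.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    id: str
    title: str
    status: str = "todo"   # todo -> in_progress -> done
    scratchpad: str = ""   # stands in for the tracker comment used as a draft pad


def fake_agent_step(ticket: Ticket) -> str:
    """Stub for one fresh-context agent run: read the notes, return updated notes."""
    done_steps = ticket.scratchpad.count("step")
    if done_steps >= 2:
        return ticket.scratchpad + "DONE"
    return ticket.scratchpad + f"step {done_steps + 1}; "


def run_worker(ticket: Ticket, agent_step=fake_agent_step, max_iterations=10) -> Ticket:
    """Worker loop: each iteration sees only what was persisted on the ticket."""
    ticket.status = "in_progress"
    for _ in range(max_iterations):
        ticket.scratchpad = agent_step(ticket)  # persist progress after every run
        if ticket.scratchpad.endswith("DONE"):
            ticket.status = "done"
            break
    return ticket


def dispatch(tickets: list[Ticket], max_workers: int = 2) -> list[Ticket]:
    """One scheduler tick (what a cron entry would invoke): claim and run open tickets."""
    picked = [t for t in tickets if t.status == "todo"][:max_workers]
    return [run_worker(t) for t in picked]
```

Because progress lives on the ticket and not in the worker's memory, a crashed or restarted worker can resume from the scratchpad. That persistence is what lets such a minimal design work.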
sent this to the team today everything great comes from being able to delay gratification for as long as possible and it feels like we're collectively losing our ability to do that
Luke The Dev @iamlukethedev: Scrum meeting added to the OpenClaw office. Agents walk into the meeting room and report their progress in real time. Task management on another level. Standup meetings with your AI engineers. Sound on.
a file system is not all you need there are a couple of articles going around on structured context graphs for knowledge work and argue that markdown files are the best primitive heres one: Heinrich @arscontexta · Feb 25 Article Company Graphs = Context Repository everything is a context problem when people say AI cant do real work, what theyre actually saying is they gave it bad context @alexalbert__ said 2026 will transform knowledge work (read this after you... and the diagnosis is true: context is the bottleneck. companies are sitting on scattered knowledge: decisions, rationale, meeting o…
The Anatomy of an Agent Harness TLDR: Agent = Model + Harness. Harness engineering is how we build systems around models to turn them into work engines. The model contains the intelligence and the harness makes that intelligence useful. We define what a harness is and derive the core components today's and tomorrow's agents need. Can Someone Please Define a "Harness"? Agent = Model + Harness If you're not the model, you're the harness. A harness is every piece of code, configuration, and execution logic that isn't the model itself. A raw model is not an agent. But it becomes one when a harness…
We're excited by the reaction to our research on scaling long-running autonomous coding. This work started as internal research to push the limits of the current models. As part of the research, we created a new agent harness to orchestrate many thousands of agents and observe their behavior. By last month, our system was stable enough to run continuously for one week, making the vast majority of the commits to our research project (a web browser). This browser was not intended to be used externally and we expected the code to have imperfections. However, even with quirks, the fact that thous…
From craft to mass production: Software as an industrial system · Ona
For a long time, writing software felt like a creative act, much like composing music or shaping clay. That feeling was real. But software development is no longer the sum of those moments. It is a production system in which creativity occupies only a small fraction of total lead time. For most businesses, software development is not defined by the act of writing code. It is a multi-stage production system that spans planning, coordination, execution, verification, integration, and release. Code is one station on a factory floor. An important one, but no longer the bottleneck. The craft myth b…
1,500+ PRs Later: Spotify’s Journey with Our Background Coding Agent (Honk, Part 1) | Spotify Engineering
This is part 1 in our series about Spotify's journey with background coding agents (internal codename: “Honk”) and the future of large-scale software maintenance. See also part 2 and part 3. For years, developer productivity has improved through better tooling. We have smarter IDEs, faster builds, better tests, and more reliable deployments. But even so, maintaining a codebase, keeping dependencies up to date, and ensuring that the code follows best practices demands a surprising amount of manual work. At Spotify, our Fleet Management system automated much of that toil, yet any moderately com…
Custom Harness: The Agent Harness Is Model-Shaped The same scaffold that doubles one model's performance actively hurts another. @cursor_ai proved it. They remove reasoning traces from GPT-5-Codex and performance drops 30%. They remove them from base GPT-5 and it drops 3%. Same harness, same benchmark and 10x difference in sensitivity. They tell Codex to "preserve tokens" and the model starts refusing tasks. They give Claude the exact same instruction and nothing changes. Princeton's HAL leaderboard tested 21,730 agent rollouts across 9 models and found the optimal scaffold flips depending on…
Satya Nadella @satyanadella · 5h (reposted by Robert Scoble): Announcing Copilot Cowork, a new way to complete tasks and get work done in M365. When you hand off a task to Cowork, it turns your request into a plan and executes it across your apps and files, grounded in your work data and operating within M365’s security and governance… Pay attention to this one if you are building terminal-based coding agents. OpenDev is an 81-page paper covering scaffolding, harness design, context engineering, and hard-won lessons from building CLI coding agents. It introduces…
Man I am so sick of AI slop in writing. I don't think you quite understand how prevalent it is. It is disrespectful to expect ME to read something YOU could not even be bothered to write (or likely even read). The lingering human connection that remained on the internet is now being diluted even further. Many of the Hacker News posts I click on (especially sorting by new) are completely AI generated (let me not even start on Reddit posts or Twitter threads (which I don't use)). This includes several that reach the front page on a daily basis. It's shameless. Unfortunately, many of you educated…
On January 5, employees at Cursor returned from the holiday weekend to an all-hands meeting with a slide deck titled “War Time.” After becoming the hottest, fastest growing AI coding company, Cursor is confronting a new reality: developers may no longer need a code editor at all. Check out the full story: https://forbes.com/sites/annatong/2026/03/05/cursor-goes-to-war-for-ai-coding-dominance/?utm_campaign=ForbesMainTwitter&utm_source=ForbesMainTwitter&utm_medium=social (Photo: Kimberly White via Getty Images for Fortune Media) unpopular take but IDE-based AI tools were alw…
signüll @signulll: remarkable to see github copilot execution given they had almost all of the advantages including first mover. what happened?! They screwed over the guy who spearheaded the project on comp and he walked. This happened fairly early and it never recovered. That’s my recollection at least based on his posts. Honestly feel so bad for people who are only allowed to use copilot at work Every time I hear somebody be like, "Oh yeah, AI is actually not that good. I tried it out." Every fucking time, it's always co-pilot. This chart was already deb…
On New Year’s Day, programmer Steve Yegge launched Gas Town, an open-source platform that lets users orchestrate swarms of Claude Code agents simultaneously, assembling software at blistering speed. The results were impressive, but also dizzying. “[T]here’s really too much going on for you to reasonably comprehend,” wrote one early user. “I had a palpable sense of stress watching it. Gas Town was moving too fast for me.” Gas Town illustrates a growing tension: AI promises to act as an amplifier that will drive efficiency and make work easier, but workers that are using these AI tools report t…
International models on ARC-AGI-2 Semi Private:
- Kimi K2.5 (@Kimi_Moonshot): 12%, $0.28
- Minimax M2.5 (@MiniMax_AI): 5%, $0.17
- GLM-5 (@Zai_org): 5%, $0.27
- Deepseek V3.2 (@deepseek_ai): 4%, $0.12
These models score below July 2025 frontier labs. We only conduct Semi-Private testing with providers that have trusted data retention agreements. Qwen 3 Max Thinking is not included for this reason. I see the same thing on pencil puzzle bench (multi step reasoning benchmark), US closed models score well and above the open chinese models. interesting that Mistral is com…
Human DX optimizes for discoverability and forgiveness. Agent DX optimizes for predictability and defense-in-depth. These are different enough that retrofitting a human-first CLI for agents is a losing bet. I built a CLI for Google Workspace — agents first. Not “built a CLI, then noticed agents were using it.” From Day One, the design assumptions were shaped by the fact that AI agents would be the primary consumers of every command, every flag, and every byte of output. CLIs are increasingly the lowest-friction interface for AI agents to reach external systems. Agents don’t need GUIs. They nee…
💌 Hey there, it’s Elizabeth from SigNoz! This newsletter is an honest attempt to talk about all things observability, OpenTelemetry, open-source and the engineering in between! This piece took 6 days, 5 hours to be cooked, hope we served. 🌚 There are two popular prophecies floating around tech circles these days. The first says SRE is the future of all software engineering, that as AI writes more and more code, the humans who remain will be the ones keeping systems alive. The second says AI will devour every tech job alive, SREs included. Neither is particularly useful if you’re an SRE…
The orchestration layer around a language model that manages prompts, tool execution, policy checks, and loop control for autonomous agent behavior. An agent harness is the orchestration layer around an agent: the runtime that constructs context, executes tool calls, enforces guardrails, and decides when each loop iteration should continue or stop…
definition: Agent Harness > The orchestration layer around a language model that manages prompts, tool execution, policy checks, and loop control for autonomous agent behavior. An agent harness is the orchestration layer around an agent: the runtime that constructs context, executes tool calls, enforces guardrails, and decides when each loop iteration should continue or stop. If the model is the “reasoning engine,” the harness is the operating system and control plane that makes the engine useful, safe, and repeatable in production. Agent Harness — Glossary — Latent Patterns From latentpattern…
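The four responsibilities named in this definition (constructing context, executing tool calls, enforcing guardrails, and deciding when to stop) fit in a very small loop. The sketch below is illustrative only: `scripted_model` is a stub standing in for a real model call, and `TOOLS`, `BLOCKED_TOOLS`, and `run_harness` are hypothetical names, not part of any library.

```python
from typing import Callable

# Hypothetical tool registry; a real harness would expose far richer tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "count_words": lambda text: str(len(text.split())),
}
BLOCKED_TOOLS = {"delete_everything"}  # guardrail: never executed


def scripted_model(context: list[str]) -> dict:
    """Stub model: requests one tool call, then finishes once a result exists."""
    if not any(line.startswith("tool_result:") for line in context):
        return {"type": "tool_call", "tool": "count_words", "input": context[0]}
    return {"type": "final", "text": f"Answer based on {context[-1]}"}


def run_harness(task: str, model=scripted_model, max_steps: int = 5) -> str:
    context = [task]                      # context construction: task + results so far
    for _ in range(max_steps):            # loop control: hard step budget
        action = model(context)
        if action["type"] == "final":     # stop condition chosen by the model
            return action["text"]
        tool = action["tool"]
        if tool in BLOCKED_TOOLS or tool not in TOOLS:   # policy check
            context.append(f"tool_error: {tool} is not allowed")
            continue
        context.append(f"tool_result: {TOOLS[tool](action['input'])}")  # tool execution
    return "stopped: step budget exhausted"
```

Swapping `scripted_model` for a real model client changes nothing else in the loop, which is one way to read the claim that the harness, not the model, is the control plane.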
The paper says the best way to manage AI context is to treat everything like a file system. Today, a model's knowledge sits in separate prompts, databases, tools, and logs, so context engineering pulls this into a coherent system. The paper proposes an agentic file system where every memory, tool, external source, and human note appears as a file in a shared space. A persistent context repository separates raw history, long term memory, and short lived scratchpads, so the model's prompt holds only the slice needed right now. Every access and transformation is logged with timestamps and provena…
Dedicated to all those who are sceptical about the significance of agentic coding, and to those who are not, and are wondering what it means for the future of their profession. The title is an homage to Zen of Python by Tim Peters. Unlike Tim, I am not a zen master. My only aim is to take stock of where we are and where we might be heading. I have been building with coding agents daily for the past year, and I also help teams adopt them without losing reliability or security. Software development is dead Code is cheap Refactoring easy So is repaying technical debt All bugs are shallow Create t…
","pad_token":"<|endoftext|>","unk_token":null},"chat_template_jinja":"{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set…
Agent Harness is the Real Product Everyone talks about models. Nobody talks about the scaffolding. The companies shipping the best AI agents today- Claude Code, Cursor, Manus, Devin, SWE-Agent all converge on the same architecture: a deliberately simple loop wraps the model, a handful of primitive tools give it hands, and the scaffolding decides what information reaches the model and when. The model is interchangeable. The harness is the product. Here is the evidence: Claude Opus 4.5 scores 42% on CORE-Bench with one scaffold and 78% with another. Cursor's lazy tool loading cuts token usage by…
","pad_token":"<|vision_pad|>","unk_token":null},"chat_template_jinja":"{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- se…
Credit: Transformer/ Rebecca Hendin “Somehow all of the interesting energy for discussions about the long-range future of humanity is concentrated on the right,” wrote Joshua Achiam, head of mission alignment at OpenAI, on X last year. “The left has completely abdicated their role in this discussion. A decade from now this will be understood on the left to have been a generational mistake.” It’s a provocative claim: that while many sectors of the world, from politics to business to labor, have begun engaging with what artificial intelligence might soon mean for humanity, the left has not. And…
The self-driving codebase: fleets, swarms and background agents Recently an article titled 'something big is happening' went viral. It was a wake-up call to those not in the tech industry about how AI has hit this inflection point, since December 2025. It does a great job of putting into words what those of us keeping up with the frontier of coding AI feel. An inflection point, and like things are 'going exponential'. My contributions on areyougoingexponential.rhys.dev/loujaybee I feel it and see it in my own GitHub contributions graph. The bottleneck of software development has shifted violen…
DAIR.AI @dair_ai: New research on agent memory. Agent memory is evaluated on chatbot-style dialogues. But real agents don't chat. They interact with databases, code executors, and web interfaces, generating machine-readable trajectories, not conversational text. The key to better memory is to preserve causal dependencies. Existing memory benchmarks don't actually measure what matters for agentic applications. This new research introduces AMA-Bench, the first benchmark built for evaluating long-horizon memory in real agentic tasks. It spans six domains including web, text-to-SQL…
A simple framework to build Agentic Systems that just works. I've been building agentic systems for a couple of years now. For YouTube, for Open Source, for my SaaS, for my office. Today I want to write this short article sharing what I have learned and where my policies have converged. Many people claim that building agentic harnesses is more of an art than a science. I mostly agree with this, but I still think it is a bit dangerous to assume "it's just art". The art mindset sets you up to think about agentic systems in the wrong way. If you convince yourself that all you are building is an art…
The third era of AI software development When we started building Cursor a few years ago, most code was written one keystroke at a time. Tab autocomplete changed that and opened the first era of AI-assisted coding. Then agents arrived, and developers shifted to directing agents through synchronous prompt-and-response loops. That was the second era. Now a third era is arriving. It is defined by agents that can tackle larger tasks independently, over longer timescales, with less human direction. As a result, Cursor is no longer primarily about writing code. It is about helping developers build t…
Not only did OpenAI defect and concede to this whole authoritarian maneuver, but Sam also went and deceptively framed the whole thing to try to make it look like they had agreed to the same Anthropic redlines, which is not actually true. Quoting Nathan Calvin (@_NathanCalvin, Feb 28): From reading this and Sam's tweet, it really seems like OpenAI *did* agree to the compromise that Anthropic rejected - "all lawful use" but with additional explanation of what the DOW means by all lawful use. The concerns Dario raised in his response would still apply here x.com/UnderSecretary…