I would like to make it clear that the findings in this blog post are all my own personal observations. This is not an objective comparison with repeatable tests. I used these tools in my daily work and noticed their strengths and weaknesses. Someone else might experience a very different outcome. I work on Ubicloud, which is written in Ruby and follows a design pattern uncommon in the industry.
AI Dev Tools are useful now, especially for writing tests, prototyping, and repetitive tasks. They are not magic, but excellent helpers. For complex code or debugging, human input is still better. Providing better context, scoping the task well, and reusing past information are key to getting the most out of them. Context management, persistence, and large language models keep improving. Soon, they may be essential.
What tools do I use right now? For day-to-day tasks, Windsurf is a winner for me at the moment. For creating new, complex things from scratch, Claude Code works well.
Over the last 6–7 months, I tried various AI coding tools for daily tasks, side projects, and testing ideas. I was looking for the right balance of performance, cost, and flexibility between self-hosted and cloud options. After hearing about Qwen 2.5's strong programming skills, I tested self-hosted code assistants with Continue.dev on an RTX 3090. The setup worked well but still fell short of GitHub Copilot, and cloud APIs from Anthropic and OpenAI offered better price-performance. As a result, I switched to OpenRouter with Continue.dev to test models from OpenAI, Anthropic, and DeepSeek. After running into that setup's limitations, I expanded my toolkit to include Cursor and Windsurf.
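Part of OpenRouter's appeal is that it puts many providers behind one OpenAI-compatible endpoint, so comparing models is just a matter of changing a model string. The snippet below is a minimal sketch of that idea, assuming the standard `openai` Python client; the API key and model slugs are placeholders, not details from my actual setup.

```python
# Minimal sketch: one OpenAI-compatible endpoint, many models via OpenRouter.
# The API key and model slugs below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible API
    api_key="YOUR_OPENROUTER_KEY",
)

prompt = "Write an RSpec test for a method that parses ISO 8601 timestamps."

# Comparing providers is just a different model string on the same client.
for model in ["anthropic/claude-3.5-sonnet", "deepseek/deepseek-chat"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(model, "->", resp.choices[0].message.content[:80])
```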
The following table gives some basic information about the tools I used.
Editor/Tool | Cost | Open Source | BYOK | Default Models |
---|---|---|---|---|
Continue.dev | $ | Yes | Yes | Claude 4 Sonnet, Codestral |
Cursor | $20/mo | No | No | Claude 4 Sonnet, et al.* |
Windsurf | $15/mo | No | No | SWE-1 |
Cline | $$$ | Yes | Yes | Claude Sonnet 4 |
Claude Code | $$$+ | No | No | Claude Opus 4 |
The table only contains information you can find anywhere; I included these four columns because they mattered the most to me when picking a tool.
I compare the performance of these tools in three general areas: building new features, writing tests, and debugging. This is definitely less scientific and more subjective, because I didn't test each tool on the same task in the same time frame. Most often, I switched tools once I started to feel that another one could do better at a certain kind of task.
Editor-Based Tools: Continue.dev vs. Cursor vs. Windsurf
Continue.dev is a VS Code extension; it provides an autocomplete and chat interface to interact with an LLM to generate or edit code. I used it around Q4 2024 and Q1 2025, starting with self-hosted Qwen 2.5 Coder and later on with OpenRouter-based Claude, DeepSeek, and Gemini models.
PS: While writing this post, I tried using Continue.dev again. I was happy to see many recent updates that make the experience smoother in several areas. Continue.dev now offers deterministic diff apply, fast apply, and a better diff review experience. There are also several improvements in Agent mode.
Cursor is a closed-source VS Code fork. It adds AI features like autocomplete and an "agent" mode that can generate, edit, and run code, and fix errors. While you can use the editor itself without signing up, using any of the AI features requires an account. I started using Cursor in March 2025 after struggling with edits in Continue.dev.
Windsurf is available both as a VS Code plugin and a standalone closed-source VS Code fork. I decided to use the editor to get the full experience. It has a core feature set similar to Cursor, but it emphasises team workflows, context sharing, and reproducibility.
Agentic Tools: Claude Code and Cline
Although all the tools above have agentic workflows, some, like Claude Code and Cline, focus on being fully agent-driven. They need little to no intervention, even for long and complex tasks. These tools work like assistants that can use tools to:

- read and edit files in the repository,
- run shell commands, builds, and tests,
- and inspect the output to decide on the next step.
With these capabilities, these tools can run the entire development loop (Edit → Build → Test → Repeat). Paired with large context windows, they can autonomously manage a given task. Claude Code is a terminal-only CLI; Cline lives inside VS Code as a plugin.
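To make that loop concrete, here is a rough sketch of what "running the development loop" means in practice: run the tests, feed the failures back to a model, apply its proposed edits, and repeat. This is an illustration, not any tool's actual implementation; the test command and the `ask_model` callback are placeholders.

```python
# Illustrative sketch of the Edit -> Build -> Test -> Repeat loop that
# agentic tools automate. Not taken from Claude Code or Cline.
import subprocess

def run_tests() -> tuple[bool, str]:
    # Placeholder test command; use whatever builds and tests your project.
    proc = subprocess.run(["bundle", "exec", "rspec"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(ask_model, max_iterations: int = 5) -> bool:
    """ask_model is a placeholder LLM call: failing output in, file edits out."""
    for _ in range(max_iterations):
        passed, output = run_tests()
        if passed:
            return True                    # green build: the loop is done
        edits = ask_model(output)          # model proposes {path: new_content}
        for path, content in edits.items():
            with open(path, "w") as f:     # apply the proposed edits
                f.write(content)
    return False                           # gave up after max_iterations
```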
Example: I asked Claude Code to build a fuzz test framework for our managed PostgreSQL service using an OpenAPI spec. In 30 minutes and $3 in token costs, it got something running. It wasn't perfect, but it was good enough to build on.
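For context, a bare-bones OpenAPI-driven fuzzer can be surprisingly small. The sketch below is my own illustration of the general idea, not the code Claude Code produced; the spec path, base URL, and the choice to treat 5xx responses as candidate bugs are all assumptions, and it skips auth and path parameters entirely.

```python
# Bare-bones sketch of an OpenAPI-driven fuzzer (illustration only).
# Spec path and base URL are placeholders; auth and path parameters are skipped.
import random
import string

import requests
import yaml

spec = yaml.safe_load(open("openapi.yaml"))
BASE_URL = "http://localhost:9292"

def random_value(schema: dict):
    # Generate a crude random value for a parameter schema.
    t = schema.get("type", "string")
    if t == "integer":
        return random.randint(-2**31, 2**31 - 1)
    if t == "boolean":
        return random.choice([True, False])
    return "".join(random.choices(string.printable, k=random.randint(0, 64)))

for path, methods in spec["paths"].items():
    for method, op in methods.items():
        if method not in ("get", "post", "put", "delete", "patch"):
            continue  # skip path-level keys such as "parameters"
        params = {p["name"]: random_value(p.get("schema", {}))
                  for p in op.get("parameters", [])}
        resp = requests.request(method, BASE_URL + path, params=params)
        # Treat server errors as candidate bugs worth a closer look.
        if resp.status_code >= 500:
            print(f"possible bug: {method.upper()} {path} -> {resp.status_code}")
```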
Both tools are great for creating new features. They can handle long, moderately complex tasks with large contexts. In my experience, Claude Code edges out Cline in tasks that need deeper understanding or stronger reasoning.
Problems
Windsurf handles my everyday tasks. Claude Code helps with complex features and prototypes.
Self-hosting isn’t worth it right now unless you already have the infrastructure. Big players in the market are currently subsidising inference costs. For individual developers, investing heavily in LLM inference infrastructure doesn’t make sense. Also, open-source models have improved a lot lately, but the top coding models are still closed-source.
Improvements to Continue.dev highlight the rapid pace of development: open-source tools are closing the gap fast. I'm tempted to give Continue.dev another spin with the latest Qwen3 Coder models. I guess I'll need to go around the circle once again.