Core Concepts
The author examines the promotional videos and claims made by Cognition Labs about their AI software engineer, Devin, and finds evidence of cherry-picking, bait-and-switch tactics, and omission of key limitations, raising concerns about the accuracy of the portrayed capabilities.
Abstract
The author initially did not plan to cover the hype around Devin AI, as they felt they lacked the necessary expertise in software engineering workflows compared to other creators. However, a recent video by the YouTuber "Internet of Bugs" prompted the author to take a closer look at the communications and demos surrounding Devin.
The author analyzes the infamous Devin AI demo on Upwork and finds several issues:
The task was cherry-picked to showcase Devin's strengths, as it explicitly mentioned "road damage," which is not a typical software engineering task.
The video skips the client communication part and jumps straight to Devin's output, which does not actually meet the stated requirements of the task.
Devin creates its own bugs and then fixes them. The demo does not acknowledge this, which makes it seem as if Devin were fixing pre-existing issues.
The time taken to complete the task is much longer than a human software engineer would take.
The author then examines other Devin AI demos, such as "AI finds and fixes a bug that I didn't catch!" and "Our AI software engineer fixes a bug in Python algebra system," and finds similar patterns of cherry-picking, omission of limitations, and reliance on well-defined problems that do not showcase Devin's ability to handle ambiguity or make architectural decisions.
The author acknowledges that Devin may still be a useful tool, but the concern is with the one-sided, hype-driven communication from Cognition Labs, which the author believes can lead to negative consequences, such as hiding real issues with the technology, diverting attention from alternatives, and preying on the vulnerable and unaware.
The author concludes by emphasizing the importance of being more critical of information shared, especially in emerging and hype-heavy fields, to make better-informed decisions.
Stats
"In 2016, the average business saved and stored 347.56 terabytes of data, according to research from HubSpot. Keeping that amount of data stored would generate nearly 700 tons of carbon dioxide each year."
"Half of all publicly traded companies in America are unprofitable. Many of these are 'tech companies', hoping to reach the jannat of economies of scale, high profit margins, and the network effect."
Quotes
"Hype-based environments hide the real issues with a particular technology or solution. One need look no further than Crypto for a recently devastating example."
"Hype occasionally leads to upper management sanctioning projects that adversely impact their employees' careers. JP Morgan folk found out the hard way, when JPM released WADU, an AI Surveillance system that was meant to track employee productivity."
"Hype preys on the people who are most vulnerable/unaware about them. The 2008 crisis hit the financially illiterate who bought into the story that real estate never goes down (many people who pushed this agenda walked out rich)."