
Chaos engineering service Gremlin raises $18M, launches new resiliency tools

“Slack is down.” It’s a headline we have had blaring at TechCrunch on numerous occasions (mostly because we actually get work done when not distracted by a constant waterfall of GIFs). But Slack is not alone — issues with uptime and reliability plague modern web services, from Alexa to WhatsApp to Apple Maps.

As any software engineer can attest, web application development is extraordinarily complicated. Databases, storage services and business logic all need to work together perfectly so that users can buy their goods or watch their films.

But what happens when one piece of that application breaks down? A small outage in one AWS availability zone can cascade and knock an entire service offline, as we have seen repeatedly. Today’s developer tools are decent at spotting bugs and other logic errors, but they don’t systematically probe how applications respond to various crises.

That’s where Gremlin comes in. The service was founded by CEO Kolton Andrus, who designed Netflix’s failure injection service and worked with CTO Matthew Fornaciari while at Amazon. It is designed to throw a monkey wrench into any application, simulating faults like storage errors, database congestion and sudden spikes in latency. Its tagline is “break things on purpose” (something of a riff on Facebook’s “move fast and break things”).

Resiliency is clearly on investors’ minds: the startup announced this morning at its Chaos Conf in SF that it has raised an $18 million Series B round led by Redpoint partner Tomasz Tunguz. That’s a follow-up to a $7.5 million Series A led by Index Ventures partner Mike Volpi, which was announced less than a year ago.

In addition to announcing the funding today, the company unveiled its “Application Level Fault Injection” system. It’s a mouthful of a name, but the feature will help DevOps engineers test systems at the application level, including, most importantly, serverless environments.

Andrus said in a note to TechCrunch that “This past year has been a whirlwind. We spent a lot of time educating everyone from engineers to CIOs about chaos engineering and building up the community.” He said the new funding will be used to further build out Gremlin’s engineering team.

As I wrote about in depth a few months ago, Gremlin is pioneering a field of software development dubbed “chaos engineering.” Rather than using formal verification to test whether code is correct and performant, chaos engineers throw deliberate and systematic errors at an application in an attempt to simulate various types of failure and find brittle parts of software programs.

That sounds easy on the surface but is extremely complicated in practice: you want to simulate an outage without actually creating one on a mission-critical system. Netflix wants to test whether losing a database will cause video to stop playing, without physically pulling the plug on a database and checking whether your movie is still playing on the TV.

Gremlin’s platform provides something of a sandbox for engineers to slowly ramp up errors and then, more importantly, ramp them back down if a breakage is detected. So a DevOps engineer can add a few milliseconds of latency to a program, see how it responds, and then add a few more.
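To make the ramp-up/ramp-down idea concrete, here is a minimal Python sketch. It is not Gremlin’s API: `LatencyInjector`, `ramp_latency`, the `measure` probe and the 100 ms threshold are all hypothetical names and values chosen for illustration. The loop adds latency in small steps, stops escalating as soon as observed response time crosses the threshold, and always removes the injected delay when the experiment ends.

```python
from typing import Callable, Optional


class LatencyInjector:
    """Hypothetical stand-in for whatever adds delay to calls (e.g. a proxy or agent)."""

    def __init__(self) -> None:
        self.extra_ms = 0.0  # extra latency currently being injected, in milliseconds


def ramp_latency(
    injector: LatencyInjector,
    measure: Callable[[float], float],
    step_ms: float = 5.0,
    max_ms: float = 50.0,
    slo_ms: float = 100.0,
) -> Optional[float]:
    """Raise injected latency step by step; halt as soon as the threshold is breached.

    `measure` is a hypothetical probe: given the current extra latency, it returns
    the response time observed for a request. Returns the injected latency at which
    the breach occurred, or None if the service stayed under slo_ms up to max_ms.
    """
    try:
        while injector.extra_ms + step_ms <= max_ms:
            injector.extra_ms += step_ms  # ramp up by one small step
            if measure(injector.extra_ms) > slo_ms:
                return injector.extra_ms  # breakage detected: stop escalating
        return None
    finally:
        injector.extra_ms = 0.0  # always ramp back down, breach or not


injector = LatencyInjector()
# Simulated service: 80 ms baseline plus whatever latency is injected.
print(ramp_latency(injector, lambda extra: 80.0 + extra))  # 25.0
print(injector.extra_ms)  # 0.0
```

The `finally` clause is the important design choice here: the fault is removed even if the probe itself raises, which mirrors the article’s point that being able to ramp errors back down matters more than ramping them up.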

With the rise of serverless services like AWS Lambda, applications become even more complex. They no longer run on a single instance; individual functions can be scattered across multiple instances and potentially multiple data centers. That can save developer time and reduce costs, but it also sharply increases the risk of something going wrong and harming an application’s reliability.

Gremlin’s new ALFI feature is designed to allow more fine-grained tuning of attacks, so that DevOps engineers can target just particular parts of an application running in a serverless environment. It’s inspired by Andrus’ work on Failure Injection Testing at Netflix, which was a sort of successor to the company’s earlier Chaos Monkey tools.
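What “targeting particular parts of an application” can look like in code is sketched below in Python. This is a generic illustration of application-level fault injection, not ALFI’s actual interface: `ACTIVE_FAULTS`, `fault_point`, `InjectedFault` and the example functions are hypothetical. Functions opt in to fault injection with a decorator, and an experiment fails only the functions it names, so you can check whether the rest of the request path degrades gracefully.

```python
import functools
import random
from typing import Any, Callable, Dict, List

# Hypothetical experiment registry: function name -> probability that a call fails.
ACTIVE_FAULTS: Dict[str, float] = {}


class InjectedFault(Exception):
    """Raised instead of running a function that an experiment is targeting."""


def fault_point(func: Callable[..., Any]) -> Callable[..., Any]:
    """Mark a function as a place where faults may be injected, keyed by its name."""
    name = func.__qualname__

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        rate = ACTIVE_FAULTS.get(name, 0.0)
        # Only functions named in the registry are affected; everything else runs normally.
        if rate > 0.0 and random.random() < rate:
            raise InjectedFault(f"injected failure in {name}")
        return func(*args, **kwargs)

    return wrapper


@fault_point
def fetch_profile(user_id: int) -> Dict[str, int]:
    return {"id": user_id}


@fault_point
def fetch_recommendations(user_id: int) -> List[str]:
    return [f"title-{user_id}-a", f"title-{user_id}-b"]


def homepage(user_id: int) -> Dict[str, Any]:
    """The code under test: it should survive a recommendations failure."""
    profile = fetch_profile(user_id)
    try:
        recs = fetch_recommendations(user_id)
    except Exception:
        recs = []  # fall back to an empty shelf instead of failing the whole page
    return {"profile": profile, "recommendations": recs}


# Target only the recommendations call, failing it 100% of the time.
ACTIVE_FAULTS["fetch_recommendations"] = 1.0
print(homepage(7))  # {'profile': {'id': 7}, 'recommendations': []}
```

Because the fault fires inside a single named function rather than at the host or network level, the blast radius is limited to that one dependency, which is the kind of precision the article says serverless environments call for.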

Gremlin’s ALFI feature allows developers to simulate more fine-grained failures

It’s these sorts of features that partly intrigued Tunguz at Redpoint, who is well-known for his thoughts on SaaS. He said in a note to TechCrunch that “In the modern cloud era — where systems are distributed, containerized, and highly ephemeral — it’s become nearly impossible to have a complete understanding of system behavior without doing the kind of proactive testing Gremlin offers.”

Gremlin’s aim is not just to sell a service but to reshape how developers think about building and testing applications. Perhaps someday all of our web services will be reliable, and then how will we get work done?



from TechCrunch https://ift.tt/2zCXMj4
