Summary
Servers and services that have any exposure to the public internet are under a constant barrage of attacks. Network security engineers are tasked with discovering and addressing any potential breaches to their systems, which is a never-ending task as attackers continually evolve their tactics. In order to gain better visibility into complex exploits, Colin O’Brien built the Grapl platform, using graph database technology to more easily discover relationships between activities within and across servers. In this episode he shares his motivations for creating a new system to discover potential security breaches, how its design simplifies the work of identifying complex attacks without relying on brittle rules, and how you can start using it to monitor your own systems today.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This portion of Python Podcast is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at pythonpodcast.com/datadog. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Colin O’Brien about Grapl, an open source platform for detection and response of system security incidents
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Grapl is and the problem that you are trying to solve with it?
- What was your original motivation to create it?
- What were the existing options for security detection and response, and how is Grapl differentiated from them?
- Who is the target audience for the Grapl project?
- How is the Grapl system architected?
- How has the design of the system evolved since you first began working on it?
- How much effort would it be to separate the Grapl architecture from AWS to migrate it to other environments?
- What have you found to be the benefits of splitting the implementation of the system between Rust for the system and Python for the exploration?
- What challenges have you faced as a result of working across those languages?
- What data sources does Grapl use to build its graph of events within a system?
- Can you talk through the overall workflow for someone using Grapl?
- What are some examples of the types of exploits that you can identify with Grapl?
- What are some of the most interesting, unexpected, or innovative ways that you have seen Grapl used?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while building it?
- When is Grapl the wrong choice?
- What do you have planned for the future of Grapl?
Keep In Touch
- insanitybit on GitHub
- @InsanityBit on Twitter
Picks
- Tobias
- Artemis Fowl book series by Eoin Colfer
- Artemis Fowl Movie
- Colin
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Grapl
- Grapl Security
- SIEM == Security Information and Event Management
- Rapid7
- Metasploit
- Insight IDR
- Erlang
- DGraph
- Splunk
- Elasticsearch
- AWS Lambda
- Sysdig
- Sysmon
- AWS CloudTrail
- Guard Duty
- OpenFaaS
- AWS SQS
- DynamoDB
- PyO3
- Dropper Malware
- SSH Session Hijacking
- Vagrant
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today, that's L-I-N-O-D-E, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your own home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today. Your host as usual is Tobias Macey. And today, I'm interviewing Colin O'Brien about Grapl, an open source platform for detection and response of system security incidents. So, Colin, can you start by introducing yourself?
[00:01:27] Unknown:
Sure. Thanks, Tobias. My name is Colin O'Brien. I am the CEO and founder of Grapl, where we're building a next generation SIEM, or detection and response system, that leverages a graph based and Python based approach to help defenders
[00:01:43] Unknown:
catch attackers faster. Do you remember how you first got introduced to Python? Oh, definitely.
[00:01:48] Unknown:
Yeah. I started off my career at Rapid7, and my first role there was sort of assisting one of their data science teams. So I wasn't a data scientist by any means. I was just an engineer, but they needed some help sort of building out and scaling out their research and the machine learning models that they were building. And, of course, they had chosen Python as their programming language of choice. So I sort of just dove in and started building out services for them. I got into Jupyter Notebooks, which the data scientists had been using to share information with each other and to generate reports. I had previously just been using languages like C and C++, mostly.
So it was actually a really different kind of language, and I was sort of blown away by the power of it and what you could do with some of the really top notch quality libraries like pandas and NumPy and those. Really easy to just get started with. And, yeah, I've been using it on the side even when my main job has taken me to other languages like Java for a long time now.
[00:02:55] Unknown:
And it's interesting that Rapid7, being the home of Metasploit, which is a famous framework for penetration testing but written in Ruby, also has such a strong presence of Python. I'm curious if there were any sort of language wars between Python and Ruby, or any issues of trying to figure out compatibility or communication patterns between Ruby code bases and Python at Rapid7?
[00:03:21] Unknown:
Sure. Yeah. Yeah. I mean, I think always in good natured fun, certainly. But Rapid7 employed a lot of different technologies. So Metasploit was in Ruby. The product that I was working on, InsightIDR, was in Java. The data science team was leveraging primarily Python, but they definitely used some R as well in there, a couple of Go projects, and actually even an Erlang project in there at one point as well. So Rapid7 really wasn't shy about picking the right tool for the job so long as everyone sort of got together and agreed on it. In terms of Metasploit and the data science teams, there was actually a lot of overlap in team members. But, you know, Ruby and Python are both pretty dynamic, powerful languages. And so I don't think anyone was pushing for one over the other when it just made sense.
But we did certainly have a consistent discussion about what the right tool for the job was.
[00:04:23] Unknown:
Now with your work on Grapl, it's, as you said, a modern approach to being able to do detection and analysis. Can you describe a bit more about what Grapl is and the problem that you're trying to solve with it?
[00:04:41] Unknown:
Yeah. Absolutely. So Grapl is a system that's designed to ingest data, usually event or log data. So, you know, process executions, network connections, AWS API calls, those sorts of things get sent up to Grapl. Grapl will process that information and translate it from logs and events into a graph format. Right? So really try to expose all of those connections between those logs. Just as one example, a process execution event might have a parent PID and a child PID, but it doesn't have much information about, you know, what that parent process was. Right? The relationship's sort of hidden and implicit in the log. Grapl takes that data out, makes it really explicit, processes and cleans the data, and then stores it in a centralized graph data lake, which is built on top of Dgraph.
At this point, Grapl will execute your attack signatures, what it calls analyzers. These are Python snippets, essentially, and what they're going to do is search that graph looking for malicious or suspicious patterns. Right? So maybe we don't expect two processes to execute together in the same process tree, or we've never seen this process call out to an IP address in Russia or something like that. And so you search around for those connections, the analyzers will find them, and Grapl will join all of that data up. Right? Because it's all in a graph, so it's sort of just expanding out and painting the master graph with risk scores and that sort of thing. And then you use a Jupyter notebook to investigate the graph and interrogate it. So you have a Python environment, a library that we've built, and you can pivot off of the graph, expand it out, visually look at it, and inspect properties of nodes. And so it's a very powerful end to end system for understanding what's going on in your various networks and really being able to dig in, pivot off of information, and get to the heart of what behaviors are going on there.
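As a concrete illustration of the log-to-graph translation described above, here is a minimal sketch in Python. The event fields, node shapes, and function name are invented for illustration and are not Grapl's actual schema or API.

```python
# Sketch of turning a flat process-execution log event into explicit
# graph nodes plus a parent/child edge. All field names are illustrative,
# not Grapl's real schema.

def log_to_graph(event):
    """Turn one process-execution event into two nodes and a 'children' edge."""
    parent = {"node_type": "process", "pid": event["parent_pid"]}
    child = {
        "node_type": "process",
        "pid": event["pid"],
        "image_name": event["image_name"],
    }
    # The relationship, implicit in the log, becomes an explicit edge.
    edge = (("process", event["parent_pid"]), "children", ("process", event["pid"]))
    return [parent, child], [edge]

nodes, edges = log_to_graph({"parent_pid": 1, "pid": 42, "image_name": "bash"})
```

Once many such events are merged, the shared identities of the nodes are what stitch separate log lines into one connected graph.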
[00:06:50] Unknown:
And in terms of the motivation for the project, what was it that drove you to build this entirely new system for being able to perform these analyses and detections? And what was lacking in the existing ecosystem of tooling for this problem space that necessitated you building this entirely new system?
[00:07:11] Unknown:
I think most systems out there, as an example, Splunk or Elasticsearch, generally work directly on raw log data. Right? So you can write queries that are sort of almost like regexes over a log, and that's good for very simple attacks. I think, especially, you know, 10 or 15 years ago, that was fairly reasonable. But attacker behaviors at this point tend to really exist and span across multiple different events. It's very unlikely that given a single event or even a single event source that you would be able to detect an attacker effectively without lots of false positives and that sort of thing. And so what I kept running into, especially at my last job at Dropbox where we had one of these more traditional systems, was that I wanted to find attackers. I wanted to look for suspicious process execution, see where the binary file had come from, see if that was created by another process. Right? They were very join heavy workflows.
And these time series databases, like Elasticsearch, really don't make joins easy or efficient. So, I mean, you can go to the Elasticsearch or the Splunk documentation, and they'll both say the same thing, which is that the complexity of those join commands is just prohibitively slow. In the case of Elasticsearch, they actually don't even provide a SQL-like join. They have a much more specialized version, which is also not very performant. And Splunk's join has a number of caveats, like only returning a certain number of results before it gives up, timing out after 60 seconds, very, very significant performance issues. You know, from my perspective, these systems were slowest at the thing that I wanted to do the most. I wanted to stop writing detections on, you know, properties of an attack, things like a process name or a file name. Those are things that attackers can change really, really easily. So if we tie ourselves to those, we're really setting ourselves up for failure.
I wanted to track the structure of an attack. Right? What does the attack look like at a fundamental level? And so that's where the graph really started to come in. I didn't like the query languages. Every single vendor has their own query language. They're all fairly strange. They make a lot of simple things very easy. But once you wanna do anything a little bit out of their comfort zone, things can get pretty rough with hidden performance issues and really large queries. Python was an obvious choice for me. It's, you know, a natural choice for the data science community. A lot of security people like Python; it's a very, very expressive language. And so merging this concept of graphs and structural attacks with Python was really the initial motivation for me to start exploring this space.
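The join-heavy pivot Colin describes (a suspicious process, to the binary it ran, to the process that created that binary) becomes a simple chain of edge lookups once the data is stored as adjacency rather than flat rows. A toy sketch, with made-up node identifiers and edge names rather than any real Grapl or Dgraph API:

```python
# Toy adjacency-based graph. The pivot "suspicious process -> its binary
# -> the process that dropped that binary" is two dict lookups here, where
# a row-oriented log store would need two joins. Node shapes are made up.

graph = {
    "proc:evil": {"binary": "file:dropper.bin"},
    "file:dropper.bin": {"created_by": "proc:word"},
    "proc:word": {"image_name": "winword.exe"},
}

def pivot(node_id, *edge_names):
    """Follow a chain of named edges from a starting node."""
    for edge in edge_names:
        node_id = graph[node_id][edge]
    return node_id

# Who created the binary that the suspicious process executed?
creator = pivot("proc:evil", "binary", "created_by")
```

The point is structural: each hop is constant-time once relationships are materialized as edges, which is exactly the workload that time series stores make expensive.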
[00:10:15] Unknown:
And you mentioned that you first began working on this while you were employed at Dropbox. I'm curious what the overall process or initial thoughts were on the nature of it being open source and any challenges that you faced in building the system in the open while working somewhere like Dropbox?
[00:10:35] Unknown:
I had actually started it just before I began my role at Dropbox. In between Rapid7 and Dropbox, I had about 6 weeks off. I spent most of that time building up an initial prototype of Grapl. When I got to Dropbox, there were really no issues at all. Grapl in no way competes with the company, and I think that was their only concern. If it were a file storage system or a collaboration service, I think there may have been more pushback, but I consistently worked with legal, got the clear from them, got the clear from my managers, and let them know if, you know, I was planning to do anything new with the project. And they were very supportive, which I really appreciated because I know there are a lot of companies where colleagues of mine work, and they just are no longer able to contribute to open source at this point. So Dropbox was very supportive of it. I was pretty happy with that. And the team themselves, we actually did publish a blog post about some of the work that myself and a colleague, Mayank, had worked on, which was to bring more of Python's capabilities to Dropbox's security team. So we built out an automation system. We built out a Jupyter Notebook service that we could use to work with different data sources that we had at Dropbox.
So there were a lot of good lessons learned while I was there.
[00:11:58] Unknown:
In terms of the end users of the Grapl project, who is the overall target audience, and what does the overall workflow look like for somebody who's interacting with Grapl after you've already ingested data into the system?
[00:12:13] Unknown:
I think today, we're targeting detection and response teams specifically, and generally, maybe a little bit more modern of a detection and response team. It's not always typical that members of the team would have Python experience or, you know, any programming experience. We're kind of targeting teams that have started to realize that that's becoming necessary if you wanna keep up with attackers. And I think that we'll find that more and more companies end up realizing that as time goes on. The use case is powerful, but minimal, I would say. So, really, there's a couple of ways to interact with Grapl. The most obvious is just putting new signatures into it. So these are implemented as a Python class.
You inherit from our analyzer base class, and you implement two methods, one which builds and describes a query. So this is gonna be something like a process query with children, and then you would describe the children with another process query. And Grapl will take that and run it against the master graph if there are any new processes that update. And then another method, which is going to be the response to what happens if that signature matches. And this is where you can do follow-up context gathering. You can ensure that by the time you actually investigate that signature, all of the necessary context has been added there. And that's really nice and simple. You can pivot off of the data right there in that method, pulling in, you know, the entire process tree, network connections, files, anything like that.
After that, if you've built up enough of these analyzers and, unfortunately, there's something going on in your network, the user interface for Grapl will sort the different systems or properties in your environment, like your users' laptops or your AWS accounts, based on which ones are the riskiest; it's entirely configurable. You go into work, you sit down, you say, I've got time for, say, five investigations today. You pick the top five riskiest systems in your network, open up a Jupyter notebook, and pretty much just start pivoting off of the data. And one of the cool things that we've done is that as you're pivoting off of the data in the Jupyter Notebook, you'll have, in a separate window, a live updating visual representation of the graph that you're interrogating. So, you know, you call a method like get children on a process in the Jupyter Notebook, and instantly in the other browser window, you'll see that that process has expanded with all of its child process nodes attached to it. So it's very interactive. It's really designed for iterative, exploration driven workflows.
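The two-method analyzer shape described above might look roughly like the following sketch. The class name, method names, and query representation here are illustrative guesses, not Grapl's real analyzer API.

```python
# Hypothetical sketch of the analyzer shape described above: one method
# builds a query describing a suspicious pattern, another responds to
# matches. Names and structures are illustrative, not Grapl's real API.

class SuspiciousChildAnalyzer:
    """Flag a word processor that spawns a shell."""

    def build_query(self):
        # A process query with a child process query, as described above.
        return {
            "node_type": "process",
            "image_name": "winword.exe",
            "children": [{"node_type": "process", "image_name": "cmd.exe"}],
        }

    def on_match(self, matched):
        # Attach follow-up context and a risk score so everything is
        # already in place by the time an analyst investigates.
        matched["risk_score"] = 75
        return matched

analyzer = SuspiciousChildAnalyzer()
hit = analyzer.on_match({"pid": 1234})
```

The split matters operationally: the query method runs automatically against whatever updated in the graph, while the match handler is where an engine would pre-fetch the surrounding process tree, connections, and files.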
[00:15:00] Unknown:
In terms of the actual architecture of the system, can you describe how it's implemented and some of the ways that the design has evolved from that initial prototype to where you are now?
[00:15:11] Unknown:
So one of the initial important goals with Grapl was to keep maintenance costs as low as possible. I've seen many security teams spending a lot of time and a lot of money just keeping their existing system up and running. This could be multiple full time people just making sure the system is up and running. One of my initial goals was to use as much serverless technology, as much managed technology, as possible. So Grapl deploys to AWS. All of the compute is in AWS Lambdas. The Jupyter Notebook is managed through SageMaker. The event pipeline is S3, SQS, these sorts of things. So it really focuses on keeping maintenance costs low.
The only exception there would be the Dgraph cluster, which is running in a Docker Swarm on EC2. So that's not fully managed for you, but Docker does a lot of heavy lifting, makes it easy to upgrade the underlying system, that sort of thing. So it really does focus on low operational cost. The pipeline is pretty traditional in terms of, like, ETL. So events come in, and a Lambda will transform them in some way. So parsers will turn a log into a graph, and then they'll emit that event so that the next Lambda can process it. So in general, the way this works is a parser will generate a graph. That graph goes through our sort of data cleaning Lambda, what we call the node identifier. This is gonna do some really, really cool stuff. Like, it'll figure out based on the metadata of that node what it really is. Right? So to give some explanation there, you could think of a process on your system, which has a process ID, but that process ID isn't actually unique. If that process terminates, the process ID will show up again.
It's just a matter of time, really. And so Grapl takes metadata, like the process ID, the asset, and the time that the process started and stopped, and resolves that to a unique identifier. And this happens for every single node that it processes. Anything always has an identity in Grapl. So these identified nodes get merged into the master graph database. This is a really simple service. It basically just performs an upsert into Dgraph. And this is actually one of the reasons why I chose Dgraph: it has really strong support for write performance and horizontal scaling of writes. Grapl has to ingest a lot of data, so that was super important.
Every time these updates happen, your analyzers trigger automatically and scan whatever the latest in the graph is. So they're very efficient. They don't have to scan the entire graph every time. They only ever scan what has updated. And that means that Grapl scales really, really nicely. In the entire system, as far as I can tell off the top of my head, I don't think we even have a single linear algorithm. Everything is either logarithmic or, in most cases, actually constant time.
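The node identification step described above (a process ID is reused over time, so a stable identity must combine it with the asset and the process's lifetime) can be sketched like this. The hashing scheme and field choices are assumptions for illustration, not Grapl's actual algorithm.

```python
import hashlib

# Sketch of the node-identification idea described above: a pid alone is
# reused after a process exits, so a stable identity combines the pid with
# the asset and the process's start time. The hashing scheme here is an
# illustrative assumption, not Grapl's real implementation.

def canonical_process_id(asset_id: str, pid: int, started_at: int) -> str:
    """Resolve volatile process metadata to a stable, unique identifier."""
    key = f"{asset_id}|{pid}|{started_at}".encode()
    return hashlib.sha256(key).hexdigest()[:16]

# The same pid on the same host at two different times gets two identities,
# so later events can be attached to the right process node.
first = canonical_process_id("host-1", 4242, 1_600_000_000)
second = canonical_process_id("host-1", 4242, 1_600_100_000)
```

This is the property that makes the upsert into the master graph safe: every incoming node resolves deterministically to the same identity, so merging is idempotent.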
[00:18:29] Unknown:
And what are the data sources that you're working with? Is it mostly just things like the Linux audit log, or do you also support things like pulling information from Sysdig running as a sidecar in a Kubernetes cluster, or things like system events from S3 or other cloud services?
[00:18:46] Unknown:
So today, what we support is Sysmon logs, and we're working on some AWS support as well. So that'll be CloudTrail, AWS Config, and GuardDuty. Grapl has a plugin system. So if you, say, had Sysdig configured on, you know, maybe, like, a Kubernetes cluster or something, you could build a parser using our plugin system and then send that data up to Grapl, and it would just understand, past the parser level, how to work with that data. So, really, if you can get it into that graph format, we can do anything with it. We do intend to build out a nice large suite of parsers and plugins.
But at the moment, we've really just been focused on honing the system itself, so making sure that the plugin system is robust and that the system can scale. That said, the plugin system has really come along very well. It's been redesigned in the last month or so, so it's really easy to tell Grapl, you know, I want to express this thing that you don't know about today. Right? So with AWS, for example, that's entirely implemented in the plugin system, and Grapl can still work with things like S3 buckets or IAM users because the plugins tell it how to do so. I mean, it's pretty straightforward.
[00:20:07] Unknown:
As far as the specifics of the AWS implementation, how difficult will it be to replace some of those with either similar paradigms from other cloud providers or just the sort of nonspecific abstractions of what that functionality is, such as maybe OpenFaaS for a Lambda replacement or just a generic queuing system as a replacement for SQS, and things like that?
[00:20:35] Unknown:
I think it shouldn't be too hard. So, I mean, just as sort of some evidence of that, there is actually a version of Grapl that you can run locally on your laptop, and it just runs in Docker containers. There's no Lambda environment or anything set up for it. That's because the system is pretty abstracted away from those underlying details like SQS or S3, that sort of thing. There's a library we've built where pretty much you just tell it, perform this computation, and then you plug in event sources and where you'll emit the events, and you can do the rest basically really simply. I actually built a version that could just run on the file system using inotify and other such things. That said, there is a little bit of a hurdle. We do use DynamoDB.
This is probably the system that would be hardest to move away from. It's just a really great key value store, and I'm not familiar with the equivalent in GCP. And I think in Azure, it's Cosmos DB, but I don't have the experience today to port that over easily. But, really, other than that, there's actually nothing that would be too hard to port over. It's all very abstract in a way. Both Python and Rust made that pretty simple to do.
[00:21:57] Unknown:
And as you mentioned, there is a split in terms of the implementation where you have some of the core elements in Rust, and you have Python in there as well for being able to handle some of the processing as well as being the end user interface for being able to do this exploration of the data. And I'm curious what you have found to be some of the benefits of splitting the implementation between those languages and any challenges that you found working across those boundaries.
[00:22:24] Unknown:
It's been a really interesting thing to build the system in what I guess you could call a more polyglot way. So we've got JavaScript, TypeScript, Python, and Rust, where Python and Rust by far dominate the code base, almost, like, 50/50 each. I think it's really just about choosing the right tool for the job. So something like a parser. Right? It's pure compute, something you could very easily spread across multiple threads. You want correctness. You want all of the nice things that a language like Rust gives you. But data exploration, and really iterative data exploration where you're almost in a REPL, right, and a Jupyter notebook is just a really fancy REPL, that's just Python's bread and butter, and it has tons of libraries for supporting these sorts of things. It's also a really nice dynamic language. So it's made the plugin system, and this whole idea that users can add their own code to Grapl, a lot easier because of Python's dynamic nature. So we've gotten a lot of value out of both of those, and I think going forward, we'll really be able to do a ton. I just saw actually recently the PyO3 library, which allows for Python to Rust FFI, so calling Rust directly from Python.
It got an awesome update. It's something I've investigated in the past. So we can sort of bridge that gap more and more between them, so it kinda feels like we're just using one tool rather than two different languages. But in the meantime, there are certainly some challenges. We have some code duplication. We have libraries that only exist in Python but don't exist in Rust. So a new service that needs some capability, even though maybe Rust would have been a better fit for it, we end up writing in Python, or vice versa for that matter. Sometimes, Python would have been faster to get up and running with, but Rust happened to have the library that we needed because we had just built it for one language. So it can slow things down in cases like that. I'd say initially, certainly, when those first base libraries had to be built twice, that was a bit of a time sink, but it's always faster building it the second time. I just ported one of our libraries over to Rust. It was a couple of days of work, whereas building the library the first time was probably about two weeks. So, you know, there is a cost, but I think it's worth it. The benefits we get by choosing Rust where it makes sense and choosing Python where it makes sense have been well worth paying.
[00:24:58] Unknown:
And in terms of community engagement, how has that split played with people who are interested in using and contributing to the system?
[00:25:06] Unknown:
That's a good point and probably one of the harder trade offs, because our end user, ideally, doesn't have to know a whole bunch of programming languages. And, really, the number of people who know Python and Rust and are in security, it's not a big group of people by any means. So for that reason, we do understand that there's gonna be a larger barrier to entry. This is where I'm really hopeful for that PyO3 FFI approach, where we can build up a system that looks like it's entirely Python. Right? Like, all Python on the top. And then under the hood, we can do the work of maintaining the parts that open source contributors probably aren't going to need to get their hands into anyways.
We can take that and put it into Rust, get all of that nice performance and stability, but the interface will be really nice and high level. So longer term, I think that's gonna be a great solution for it. But today, it does mean that the barrier to contribute is higher than I would like, certainly.
[00:26:19] Unknown:
This episode of Podcast.__init__ is sponsored by Datadog, the premier monitoring solution for modern environments. Datadog's latest features help teams visualize granular application data for more effective troubleshooting and optimization. Datadog Continuous Profiler analyzes your production level code and collects different profile types, such as CPU, memory allocation, IO, and more, enabling you to search, analyze, and debug code level performance in real time. Correlate and pivot between profiles and distributed traces to find slow or resource intensive requests. In addition, Datadog's application performance monitoring live search lets you search across a real time stream of all ingested traces from your services.
For even more detail, filter individual traces by infrastructure, application, and custom tags. Datadog has a special offer for Podcast.__init__ listeners. Sign up for a free 14 day trial at pythonpodcast.com/datadog. Install the Datadog agent and receive one of Datadog's famously cozy t-shirts for free. On the open source side of things, how are you approaching governance and management of the road map and sustainability of the project, especially since you also have a business that you're building on top of it?
[00:27:38] Unknown:
I think this is one of those potential points of friction, where you take a project that was really just a passion project and a side project, where I just wanted to use Grapl, and no one was building it, so I had to build it. And then, really, to build it, you need a solid business that can actually bring other people in to work full time on it. What we've ended up with is something that I'm pretty hopeful for. So the core of Grapl is totally open source, and that's never gonna change. You know, we keep it, I believe, dual licensed Apache 2.0 and MIT, so you can choose whichever one to fall under.
And then our plan is really to try to make a viable business out of either a managed version of Grapl, which we've been hard at work on, or through plugins and support. So not every plugin, by any means, will be licensed separately, but something like a GuardDuty plugin, for example, GuardDuty being a detection system in AWS, that's already, like, a paid feature for AWS. It's something only an enterprise would ever really be using. And so it makes sense to kind of have that be source available, but maybe not Apache 2.0. Maybe something that you either have to contribute to if you're using it, or work with us on some kind of deal that everyone can walk away from happily. Governance is something I'm really curious to experience as we grow this project out more and more.
You know, we don't wanna be dictators by any means. We don't want to fight people who are trying to make the product better. I think this is something that happens a lot with purely managed systems that also have an open source version: the company is incentivized to make the product harder to manage, right? Because they want to be the only ones who can do that, and we really don't want to be that at all. We're trying to make Grapl as easy to use for as many people as possible. My hope here is that by just being open about our roadmap, you know, we use a GitHub issue tracker, and we're planning to move more and more of our Kanban boards and that sort of thing into the open.
And, hopefully, by just engaging with the community, like with our Slack channel, which is publicly open as well, we can come to a place where everyone's just trying to do the right thing, right? Where we don't have to worry that someone's gonna come in and try to reimplement maybe one of those plugins, like the GuardDuty plugin, just to cause us trouble. If they wanna do it because it's better the way they're doing it, that's great. I would never wanna get in the way of that by any means. So my hope is that we can come to a good understanding with our community, where everyone's just trying to help companies and help people stay safer.
[00:30:31] Unknown:
In terms of the specifics of the security compromises that you're able to detect and surface: I know that a lot of the initial generation of systems like this were very heuristics based and brittle, and subject to easy evasion by attackers who knew what was being used to detect particular threats. Given that Grapl is graph based and you're able to join all of these complex events together to get a good view of things, what are some of the useful questions to be asking in order to discover events, and what are some ways to surface them for any sort of known compromises that a system might be subject to?
[00:31:18] Unknown:
Yeah. So I think there's a lot of attacker behaviors that, for multiple reasons, are just more painful to express in other systems. And I think the two main reasons that I've seen are that the query languages and the underlying database make it really hard to combine multiple events together to make one, you know, larger contextualized event. So you're forced to just work with what's there, which is typically a lot more brittle. You're just lacking information. And that, coupled with the false positive issue that these other systems have, really pushes defenders into building worse detections. So you get detections that overfocus on properties that the attacker can control, and then basically try to filter out as much existing data as they can to try to hone in on the attacker.
But what this means is that the attacker has tons of room to just work within what you've already, you know, whitelisted out of your detection and that sort of thing. That's a really serious problem, and one I've seen just about every company using these systems run into quite a lot. Grapl takes a very, very different approach. So, of course, the graph based approach just means that we can much more easily understand the system. If I wanna join together a process, all of its children, its parent process, its grandparent process, you know, its binary file, those sorts of things, that's extremely efficient. So one example that I really like is the dropper behavior. Droppers are a malware technique where the initial payload that you download is very small. It's very benign looking, a really simple program, basically to make it harder to analyze for, like, an antivirus or something.
And the dropper reaches out to the attacker's command and control service, downloads the payload, and then executes that payload. Right? So we have multiple different events here. We've got the dropper execution, a network connection, file creation for the payload, and then the payload execution. Four different events, which could easily be across four different source types. A traditional SIEM, a traditional system like Elasticsearch or Splunk, will have an extremely difficult time expressing that efficiently. With Grapl, that's trivial. It's a couple of lines of Python.
Just very natural to work with something like that. And it'll execute really fast, so we'll catch it immediately. And there's no properties there for an attacker to really take advantage of. It's just the structure of what they're doing. It's not that we care about the process name or, you know, what these processes are or where that file is. Those are details that we don't wanna tie ourselves to, especially because you could have a fleet of Linux and macOS and Windows machines, and that structural pattern will apply to all of those in the same exact way. So we don't have to write three different searches and worry about the peculiarities of each operating system.
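To make that structural idea concrete: the sketch below is not Grapl's actual analyzer API; the event shapes and field names (`pid`, `ppid`, `image`, `path`) are invented for illustration. It shows how the dropper pattern Colin describes, connect out, write a payload, execute that payload, can be matched purely on structure, with no process names or file paths for an attacker to evade.

```python
# Hypothetical sketch of structural dropper detection. We match the shape
# of the behavior (network connect -> file create -> execute that file),
# never any name or path the attacker controls.

def find_droppers(events):
    """events: list of dicts, each with an 'event' key plus type-specific fields."""
    connected = {e["pid"] for e in events if e["event"] == "net_connect"}
    wrote = {(e["pid"], e["path"]) for e in events if e["event"] == "file_create"}
    spawned = [(e["ppid"], e["image"], e["pid"])
               for e in events if e["event"] == "process_create"]

    droppers = []
    for ppid, image, child in spawned:
        # Structural join: the parent connected out, wrote this exact
        # binary to disk, and then executed it as a child process.
        if ppid in connected and (ppid, image) in wrote:
            droppers.append((ppid, child))
    return droppers

events = [
    {"event": "net_connect", "pid": 100},
    {"event": "file_create", "pid": 100, "path": "/tmp/payload"},
    {"event": "process_create", "ppid": 100, "image": "/tmp/payload", "pid": 101},
    {"event": "process_create", "ppid": 1, "image": "/bin/bash", "pid": 102},
]
print(find_droppers(events))  # -> [(100, 101)]
```

The same shape applies unchanged whether the events came from Linux, macOS, or Windows telemetry, which is the point he makes about not writing three different searches.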
And then the other side of that is that Grapl just doesn't have false positives in the traditional sense. We don't have to rule out specific types of droppers or anything like that. What we do is we assign a risk score. We say dropper behavior is very risky, but maybe some other behavior, like a unique parent-child process execution pair, gets a really low risk score. Attackers will do that, right? They'll execute processes in ways that we don't expect, but that also kind of just happens day to day. We don't wanna just ignore it. We wanna figure out anything an attacker can do; that's what we should be expressing. But we'll lower the risk score, and then Grapl will combine those graphs together, see if they overlap, and raise that risk score up over time, to make sure that the incident responder, when they get in the next day, can look at that and see: okay, this is at the top of my list. Whereas all of that other stuff that was just anomalies, you know, totally benign, sinks to the bottom of the list. So Grapl really addresses two of those core issues that I see holding defenders back from building the right signatures.
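The risk-scoring model described above, weak signals that only surface when they overlap on the same entity, reduces to a simple accumulation; the signal names and scores in this sketch are made up for illustration:

```python
from collections import defaultdict

# Each analyzer emits (entity, score) pairs. Individually weak signals
# accumulate on the entity they share, so overlapping anomalies rise to
# the top of the responder's queue instead of paging anyone immediately.
signals = [
    ("host-7/pid-100", 75),  # dropper-like structure: very risky
    ("host-7/pid-100", 10),  # unusual parent/child pair: mildly risky
    ("host-3/pid-55", 10),   # the same weak signal, alone, elsewhere
]

risk = defaultdict(int)
for entity, score in signals:
    risk[entity] += score

ranked = sorted(risk.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # -> [('host-7/pid-100', 85), ('host-3/pid-55', 10)]
```

The benign one-off anomaly keeps its low score and sinks to the bottom of the list, while the entity where signals overlap floats to the top.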
[00:35:47] Unknown:
And in terms of somebody who's getting started with Grapl initially, do you have a prebuilt set of rules that somebody can get started with for being able to detect a common set of attacks? And how much security knowledge and understanding is necessary for somebody to be able to use Grapl effectively to identify potential security issues?
[00:36:10] Unknown:
So we do have an open source set of initial analyzers, and we actually have another set that we've been working on, with a couple dozen more that we'll be adding and open sourcing in the near future. So you should be able to just get started right away if you wanted to deploy it, and have pretty decent coverage over a number of attacker techniques. In terms of what knowledge is required, of course, security knowledge is certainly going to help. You'll know what's weird. You'll know what's not normal. You'll know what attackers do, and that can really inform your decision making about how to prioritize the work that you're gonna do and which signatures to build.
But if you know some Python and you've got the data going through the system, it's very easy to work with. You can run queries against the graph. You could export that data into a Pandas DataFrame, start working with it in an almost SQL-like way with Pandas, and just do basic statistical analysis. Right? I mean, you could easily say: show me which processes have executed that have a rare process name, maybe ones that fall into the bottom quartile of process name executions. And these are things that you could Google, and Stack Overflow will come up and give you the Python snippet in two seconds to do something like that. And that's really what I think more defenders should be looking for. I actually think they overfocus on attackers instead of really thinking about what's going on in their networks. And that's something that, with Grapl, anyone can really do. They don't have to be a security expert. They just have to work with the data. It's all there right in front of you. You can just sift through it and figure out what's going on.
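The rare-process-name query he mentions is indeed only a few lines. Colin suggests Pandas; sketched here with just the standard library so it runs anywhere, and with invented event data and field names:

```python
from collections import Counter

# Toy process-execution events; in practice these would come from your
# exported graph data rather than a hardcoded list.
process_events = [
    {"name": "sshd"}, {"name": "sshd"}, {"name": "sshd"},
    {"name": "cron"}, {"name": "cron"},
    {"name": "xmrig"},  # seen exactly once: worth a look
]

counts = Counter(e["name"] for e in process_events)
# Flag process names whose execution count falls in the bottom quartile.
threshold = sorted(counts.values())[len(counts) // 4]
rare = [name for name, n in counts.items() if n <= threshold]
print(rare)  # -> ['xmrig']
```

With Pandas the same analysis is a `value_counts()` call followed by a quantile filter; either way, it's basic frequency analysis rather than deep security expertise.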
[00:38:02] Unknown:
And it seems like that's also a prime candidate for being able to use some of these prebuilt libraries for doing anomaly detection as well, where you can have a base set of behaviors on a system that's pretty heavily locked down or isn't exposed to public use, just to get a general idea of what is the baseline of processes that I'm expecting to run when I have this system deployed. And then maybe in your production environment, where it's more likely to be exposed to potential compromise, you can run it to see: okay, what are the anomalies that are different from my QA baseline that I can dig deeper into? Do I need to blacklist these? Do I need to set these particular sets of processes to be disallowed on these systems, or remove potential targets from being accessible based on these attack vectors?
[00:38:50] Unknown:
Yeah. Absolutely. So I think that's actually one of my favorite things that I think more security people should do, in the same vein of not thinking so much about attackers but instead thinking about systems: you know, what are the policies and expected behaviors in your production environment? Right? You expect maybe certain ports to be open. The first thing you should be doing isn't searching for a specific attacker behavior. It should be validating that that policy is in place, and ensuring that if that policy changes, you can see that and react to it. So anomaly detection and baselining: when you don't have to worry about false positives in the same way, all of a sudden those techniques become much more viable.
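In its simplest form, that policy-first approach, assert what should be true and alert on drift, is just a set comparison. The baselines below are invented for illustration:

```python
# Minimal baseline check: compare what a locked-down QA environment
# established as normal against what production is actually running.
qa_baseline = {"sshd", "nginx", "cron", "systemd"}
production = {"sshd", "nginx", "cron", "systemd", "nc"}

drift = production - qa_baseline
if drift:
    # In the risk-scoring model discussed earlier, drift like this would
    # raise an entity's risk score rather than page anyone directly.
    print(f"unexpected processes: {sorted(drift)}")  # -> ['nc']
```

The same pattern applies to open ports, listening services, or outbound destinations: encode the policy once, then watch for deviations.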
Certainly, traditional alerting is pretty expensive. It takes up a lot of your time. You know, do you page on this one? Do you just email? Does it go to Jira? Is it high priority? Whatever. There's not a lot to make those decisions with. So I actually think that starting with just understanding your systems, understanding maybe aberrant behaviors, anomalous behaviors, is much more important. And then, when you start to layer those attacker behaviors on top of all of that, you end up with this really high fidelity detection system, where by the time you come in to actually look at an attack, which maybe has raised that risk score way up, you've got all of these other anomalies that are joined together and connected to it. The investigation is, you know, practically done for you.
It's an incredible thing. I've actually done that multiple times, where just using that context, and then having a single attack signature, having that actual attack signature crop up in my risks, I could see everything. I mean, virtually every action the attacker took that was relevant and interesting was already part of that graph that I was looking at. So I see that as just a huge capability.
[00:41:02] Unknown:
And in terms of uses of the Grapl system, what are some of the most interesting or unexpected or innovative ways that you've seen it employed, or any particularly notable attacks that you've been able to detect with it that were maybe novel or especially sophisticated, that you were intrigued by?
[00:41:21] Unknown:
Yeah. I think my favorite one was detecting SSH hijacking. So there's multiple types of SSH hijacking. This is where an attacker is able to leverage the local SSH agent to sign their SSH connections, so they can move into, say, your production environment if you don't have two factor authentication. If they're in your production environment already, they can call back into your system's SSH agent to sign new requests and start moving around laterally. And the key to this attack that makes it really difficult to detect is often just the fact that it takes place across multiple systems.
So you've got, you know, an SSH connection that's initiated on one system, which itself might actually involve, first, a connection to the local SSH agent, and then you've actually got the execution on another system. And you could have chains of these executions. Right? The attacker can continuously forward the agent across your production environment if you don't have the right isolation in place. This is just a devastating attack if an attacker is able to pull it off. I believe it was matrix.org that was attacked and fully compromised, and one of the root causes was the attacker being able to abuse SSH forwarding like this. And so someone was able to use Grapl, and in maybe a week, they were able to build up a suite of detections that could catch agent forwarding on the client side and on the server side, using different techniques like watching the connections and watching the interprocess communication from processes to the SSH agent, which actually involved multiple joins, not necessarily across nodes, but across the source types that formed those nodes.
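Joining those hops across machines is again a structural join. As a toy illustration only, with invented event fields, and assuming at most one outgoing hop per host for simplicity, following a chain of forwarded sessions might look like:

```python
def ssh_chain(start_host, sessions):
    """sessions: list of {'src': host, 'dst': host} SSH connection events.
    Follow forwarded sessions from start_host as far as they go."""
    hops = {s["src"]: s["dst"] for s in sessions}
    chain, host = [start_host], start_host
    while host in hops:
        host = hops[host]
        if host in chain:  # guard against cycles in the session graph
            break
        chain.append(host)
    return chain

sessions = [
    {"src": "laptop", "dst": "bastion"},
    {"src": "bastion", "dst": "web-1"},
    {"src": "web-1", "dst": "db-1"},
]
print(ssh_chain("laptop", sessions))  # -> ['laptop', 'bastion', 'web-1', 'db-1']
```

A real detection would also join in the agent-socket IPC events Colin mentions, since the telltale sign is a session on one host being signed by an agent on another; the chain-walking shape stays the same.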
I think, in total, it was something like eight or nine detections, and it just totally destroyed that attack in an environment that was actually vulnerable to it, if an attacker had pulled it off; the policies weren't in place to prevent it, for historic reasons. Being able to detect that in so many ways, no matter what the attacker did, was really cool. I was really impressed by that. That was actually early on, when I had just started the company, and it was just extremely motivating to see Grapl leveraged in such a powerful way.
[00:43:54] Unknown:
And in your experience of building the product and building the business around it, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?
[00:44:05] Unknown:
Gosh. Yeah. So many, really. I've never done anything like this before in any capacity. I mean, even just starting Grapl as a side project: all of my side projects had always just been, you know, I'll get it, like, 70% done, and then I'll move on to the next thing. And I sat down with Grapl, and I said: this is the project. You're just gonna work on this. And that was a completely new thing. Taking a project solo from idea to proof of concept to deployable to scalable, production ready, was so challenging and such a great learning experience in so many ways. I'm sure so many engineers can empathize with the idea that that last 10% of the project is always so much harder than the first 90%.
And with a long tail, large project like Grapl that does so many interesting things, you know, shifting technology choices along the way, learning about graph databases or even graph algorithms, and that sort of thing was just really interesting and challenging. And then, you know, we've hired six people now. So the team's grown quite a lot in the last couple of months. Still pretty small, but it's been very interesting to actually run a company, to not just be a developer. You know, I don't get to just sit down and write code every day. I'm doing a lot more support for my team, I'm figuring out how taxes work and, you know, filing in different states, and working with customers.
These are incredibly valuable skills, I think. I've been really happy to kind of be thrown into it in this way and be forced to learn all of this, but it's a huge learning curve, for sure. It's just so many things thrown at you at once, but I'm very lucky that I've managed to build an awesome team. We have some people with really unique combinations of skills, a lot of great engineering and security talent here. So that's made it a lot easier for me, because now I'm at the point where I can lean on my team a lot, and they can kind of take it and run, with me just sort of supporting them along the way.
[00:46:21] Unknown:
For somebody who is considering Grapl, what are the cases where it's the wrong choice?
[00:46:27] Unknown:
Yeah. So I think that Grapl can ask for different things from your security team. The traditional security operations center model, where you have tiers of analysts, isn't really what Grapl is built for. That's not to say it couldn't fit into an environment like that at all. But if your security team feels like they're fine using the tools that they already know, they're not looking to learn Python, they're not really programmers, and you don't have that investment from the team into getting the skills necessary to use a system like Grapl, then I think it would be the wrong tool. The team would probably struggle with it. It would be different enough in how they're thinking about things that they would be more effective with a tool that fits into their mindset. But, really, Grapl just asks that you think about things a little differently. Right? Instead of events, you think about graphs and behaviors.
Instead of maybe a SQL-like query syntax or a very specialized domain specific language, you learn the basics of Python. You can call a couple of methods, maybe create a class, write for loops, that sort of thing. So if that barrier is not something that you're prepared to get over, Grapl would probably not be the right tool.
[00:47:53] Unknown:
And as you continue to work on Grapl and continue to explore the space of security attacks and being able to remediate some of those, what do you have planned for the future of the product and the business?
[00:48:06] Unknown:
Yeah. Tons. We are really ambitious, but, you know, some of the obvious ones, I think, are just getting more data sources and more plugins built for Grapl, so we can address more use cases. I would love to see more endpoint instrumentation, like you'd mentioned, audit, which we have a very early proof of concept for, and osquery, and also more services like G Suite and GitHub or Office 365, which are often overlooked as entry points for attacks but can be really important capabilities if an attacker can get control over those. So just expanding the use cases there, and improving our query DSL built on top of Python to express more and more attacks even more effectively. That's always top of mind for us. We never want Grapl to be the limiting factor in the attacks that you're expressing.
And as I mentioned, also getting a managed service up and running, I think, would be great. It would mean people can bring their data to us, and we can do the work of managing Grapl. We can help understand what's going on in our customers' environments more directly, build signatures for them, work with them and collaborate, and just get a better idea of how people are actually using Grapl. And I think that would be really huge for us, just in terms of understanding how we wanna continue to build it. But, you know, we're very early. We're small. We're growing. And we're also still figuring out exactly what everybody's gonna want from it. So I think we've got a great foundation. We've got great capabilities built into it.
But I think it's gonna really grow to be so much more than it is today.
[00:49:53] Unknown:
Are there any other aspects of the Grapl project, or your experience building a business around it, or the overall space of security detection and response, that we didn't discuss that you'd like to cover before we close out the show?
[00:50:04] Unknown:
You know, I think really what a lot of Grapl gets right is that it's really about shifting how we think about approaching this problem. You know, security, especially detection and response, has been approached in much the same way for the last 10 to 20 years: this event based way, this, you know, regex-on-fields way. And Grapl really just was an experiment to say: what if we just threw out the book? What if we said we wanna solve these problems and really rethink every aspect from bottom to top? And I think we've built something really awesome just by taking that approach. And I think, actually, a lot of technologies have done that. They've just thrown out the book of what is typical and what most people are doing.
Docker comes to mind, for example. Leveraging containers has been massive, and that's a very different world from even 10 or 15 years ago, with Vagrant and, you know, all of these other VM based approaches. So I just hope that we see more and more technologies cropping up that aren't constrained by the way things are already done.
[00:51:16] Unknown:
For anybody who wants to get in touch with you or follow along with the work that you're doing or get involved with the Grapl project, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose the Artemis Fowl series of books, and the movie that they made from the first book. They're just a lot of fun, books that I've been reading with my kids and wife, with a lot of interesting story lines and some interesting blending of fantasy and modern technology.
It's an interesting adventure, so definitely worth checking out if you're looking to stay entertained. And so, with that, I'll pass it to you, Colin. Do you have any picks this week?
[00:51:52] Unknown:
I had mentioned the PyO3 library. I've been checking it out, melding Rust and Python together. I think it's one of the most impressive technical projects that I've seen, and really enabling for starting to merge capabilities across programming languages. So if you're interested in those two languages, or in how you could improve the performance of your Python code, I would highly recommend checking out PyO3.
[00:52:19] Unknown:
Well, thank you very much for taking the time today to join me and discuss the work that you've been doing with Grapl. It's definitely a very interesting project, helping to identify security issues in our systems so that we can resolve them more effectively. So definitely a very useful way to spend some time, and I appreciate all the effort you've put into that, and I hope you enjoy the rest of your day.
[00:52:45] Unknown:
Thank you so much, Tobias. I really enjoyed getting to talk with you today.
[00:52:50] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Episode Overview
Interview with Colin O'Brien: Introduction to Grapl
Colin's Journey with Python
Technology Choices at Rapid7
How Grapl Works
Motivation Behind Grapl
Target Audience and Workflow
System Architecture and Design Evolution
Supported Data Sources and Plugins
Portability Across Cloud Providers
Benefits and Challenges of Using Python and Rust
Community Engagement and Contribution
Open Source Governance and Business Model
Detecting Security Compromises
Getting Started with Grapl
Anomaly Detection and Baseline Analysis
Interesting Use Cases and Notable Attacks
Lessons Learned in Building Grapl
When Grapl is the Wrong Choice
Future Plans for Grapl
Final Thoughts and Closing Remarks