EP 3: Bush League: How Todd Bush Has Utilized MongoDB, Ruby on Rails, and R to Streamline Data Access

So, we're back, Energy Bites, episode two, Brad Dad is here, John Califian, got my co-host

Bobby Nealon, and today we've also got Todd Bush from DeCarbonFuse, as well as just a

man, I was trying to think about what year it was that I originally met you, and it's

been a while.

Yeah, it's been a long time, for sure.

I had to look back on what the company was, I knew we'd even used Energent, but I had

to look back up what it was called.

Yeah, I did too.

We used it in a number of places, the API and stuff, when Bobby and I were working together,

and then they got bought, and I also haven't worked at a company that needed it.

But no, we've got Todd Bush, how's it, thanks for joining us.

Yeah, thank you for the invitation, and definitely excited to talk about, we'll go back and forth

on the data and AI side, and so interested to hear what you guys are up to, as well as

sharing a little bit about what I'm doing.

Awesome, absolutely.

Go ahead and tell us a little bit about yourself, where are you from, how'd you end up here

today?

Yeah, so, let's see, so, born in Colorado, grew up mostly in Dallas, and was interested

in technology and kind of information all along, went to Texas A&M undergrad, focused

mostly on information and operations management, did a little bit of statistics, so that was

kind of the, kind of piqued my interest a little bit in that.

And then out of school, I actually worked for a small software company first, just really

focused more on the financial and banking side, and then met some people and got recruited

into a consulting firm that was working a lot with Chevron, and so I kind of found an

inroad into Chevron and was a, kind of like a project manager, program manager, and one

of the first projects there was a company-wide technical portal that included everything

from the initial kind of regulatory planning, all the way to drilling and production, to

what happens kind of on the work-over side afterwards, so that was kind of my first foray

into, you know, all the information, kind of all the different workflows that are across

different teams, and so from there, it's just been kind of expanded into a number of different

products and projects, and so, good stuff.

Todd's humble description of a successful software entrepreneur, I mean, that's how

we met, I was, it had to be, I think, sometime around 13 or 14, 2012, 13, 14-ish, where I

don't even, I think it was like a LinkedIn post or something, I came across Energent and

I was working for a frac company and we needed, you know, data about permits and completions

and FracFocus had just kind of started back then, you know, there were no requirements

or anything around it back then, it was just a database, but yeah, and you know, what,

you guys ended up selling that to?

Yeah, to Westwood, yeah, so started Energent with a friend, Boyd Skelton, and we basically

looked at all of the kind of FracFocus completion data, and at the time, the completion data

just wasn't very good, right, this is 2013, right, so we had a little mobile app out there

that was for kind of the field guys, and we just felt like the oilfield service side

was underserved, really, and ended up doing a lot of work for frac sand companies, logistics

companies, financial institutions that were kind of looking at the, looking at activity

and trying to understand what completion trends were happening, and so we ended up

going from just the typical regulatory side and bringing in all that information to taking

satellite data and trying to figure out, okay, could we actually observe when the rig is

on site, when it leaves, what happens when a frac crew is on site, can we detect kind

of all that, and we had some pretty good success with that and launched another kind of product

alongside Energent, where it was really focused more on just activity and derived kind of

information that we could send to a number of different people.

So when you got onto that, was it ever the goal to, say, compete with the IHSs or Enverus

of the world, or, I mean, because in a way it was, but then in other ways not, I mean...

Right, right, yeah, we definitely wanted to not directly compete with them. I think one

thing that we tried to do early on, and it was really trying to find the progressive

companies that were, oh, we need an API, we need to integrate this into all of our systems,

so we had what I thought was a very simple kind of well header API that didn't require

you to download an entire database, it was RESTful, it was something that you could get

by well name, by API number, by operator, kind of all the basic things that you would

expect from what we thought a typical kind of oil and gas information company should

provide. So we did that, and then that led to probably even more kind of visualization

tools, I would say, on integrating with Tableau, integrating with kind of Spotfire, Salesforce,

kind of all down that path, and there's so many different workflows that you can kind

of tap into, and we ended up working really more with the drilling contractors, the logistics

companies, and those firms that were really interested in kind of how they could get insights

into what operators were doing.
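
As a rough illustration of the kind of well header lookup described here (by well name, by API number, by operator), the filtering behind such an endpoint might look something like the sketch below. This is not Energent's actual API: the `WellHeader` fields, the `find_wells` function, and the sample records are all hypothetical and exist only to show the idea of querying individual wells instead of downloading an entire database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellHeader:
    """Hypothetical minimal well header; field names are assumptions."""
    api_number: str
    well_name: str
    operator: str
    state: str


# Made-up sample records, for illustration only.
WELLS = [
    WellHeader("42-000-00001", "Smith 1H", "Example Operating", "TX"),
    WellHeader("42-000-00002", "Smith 2H", "Example Operating", "TX"),
    WellHeader("33-000-00003", "Jones 1H", "Sample Energy", "ND"),
]


def find_wells(wells, api_number=None, well_name=None, operator=None):
    """Return the headers that match every filter that was supplied.

    api_number: exact match
    well_name:  case-insensitive substring match
    operator:   case-insensitive exact match
    """
    results = []
    for well in wells:
        if api_number is not None and well.api_number != api_number:
            continue
        if well_name is not None and well_name.lower() not in well.well_name.lower():
            continue
        if operator is not None and well.operator.lower() != operator.lower():
            continue
        results.append(well)
    return results


if __name__ == "__main__":
    for well in find_wells(WELLS, operator="Example Operating"):
        print(well.api_number, well.well_name)
```

In a real service each keyword argument would typically map to a query parameter on a single wells resource, which is what keeps the interface simple for a customer integrating it into their own systems.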

So Energent was your first startup, really? You were with Rigdata, I saw a little bit?

Oh, yeah, so I kind of skipped over that for a bit. So I joined, let's see, I left Chevron

because I kind of had like an entrepreneurial bug in just trying to do something. So I had,

when I left Chevron, started a couple little side projects kind of on the software side,

joined Rigdata, and really the emphasis at Rigdata was relaunching a mobile app and building

out kind of, I'll say, kind of an enterprise data kind of product that they didn't have

at the time, and so stayed there for a short period. I was hoping that I could actually

at one point kind of take over or buy the company from the existing owners, and that

didn't work out. They wanted to hold onto it, which I definitely understood, and that

really spawned kind of Energent and really forced me to make a decision of, okay, I can

stay here and work and help kind of build out the product landscape or kind of start

Energent and go off on that, and so that's what I did.

So with Energent then, so did you all start, so again, I think, as I understand it, Boyd

was probably more fingers on the keyboard, and then you were driving the kind of direction

of the product?

Yeah, yeah, and then I would say I was trying to create some of the reports and the kind

of visualizations that we were doing, so along the way we decided to pick up R as kind of

a language for some analysis, some models, and really more probably in the way that I

use it as really just a visualization tool, making it easy, and so we picked that up and

ran with it and really improved kind of our ways of producing basically reproducible research,

if you will, around oil and gas information, which at the time was probably, R wasn't very

well known, but it was kind of coming into the scene.

I think that's one area where we helped kind of bring, we'd bring analysts on, teach them

R or hopefully they even had some kind of R skills, and then go to town on what we can

do for completion metrics or kind of insights or anything around kind of the completion

and production side.

Yeah, so I guess I was thinking that, so I mean, was R kind of part of the Energent story

like pretty early on, because I know I remember talking to you when you were at Westwood and

you had some people working on, and you even had maybe an R library to help pull data out.

Yes, yeah, so we ended up, I think there was a point in time when we talked about releasing

kind of open sourcing some of the tools that we built around, because we had all kinds

of R packages that were built specifically for the kind of well header information, specifically

for some of the proppant data, even getting into some of the more detailed completion

side where we had kind of pressures and ISIPs and a few other kind of metrics around kind

of the completion data, and we didn't, but you know, I think there's, I'm sure those

still exist, still being used, because they're a great kind of foundation for us to take

the research and actually do something more with the information.

For sure, so I mean at that point, I guess we can, yeah, I didn't know where we were

going to go with some of the technologies, but I loved R. R was probably my first coding

language, and I still love using it when I get the chance. So at the time, was R Markdown

a thing?

It was just, yeah, R Markdown was a thing, and we tried to do some of the, if you remember

some of the PDFs and publish it with kind of Markdown. Some of that worked well, some

of it was just okay, but we ended up taking all the ggplot and dplyr and kind of all of

that.

Tidyverse now, right?

Yes, yeah, all the packages and using those in a way that we could kind of run through,

oh, here's the updated information, here's, whether that's, say, taking some of the proppant

data that we had and creating some of the kind of quarterly reports around that, we

could run through that pretty quickly and kept our, you know, the whole idea was, okay,

we can do more with less, and with R on the reproducible side and then our own data, then

we could do a lot.

Yeah, walk me through why y'all went with R over something else, just out of curiosity.

So we started off with kind of a Ruby on Rails application. One thing, one of the decisions

that we made early on was we wanted to do this with MongoDB and NoSQL instead of kind

of a typical Postgres or Microsoft database or something like that, right? And that decision

was a huge help in pushing us to be able to collect any type of information from any state

regulatory body, from any, you know, EPA or even some of the FracFocus information, we

would basically collect through MongoDB first and then kind of rationalize it, if you will,

a little bit and then pull that down into R for the analysis. And we started, I'm trying

to remember exactly where we started, we had a, at the time we started doing a lot of kind

of LinkedIn posts, articles, and having some key charts in there. One of the things that

we found with R is we could kind of differentiate the chart, make it look a little different,

kind of style it in a way that was unique to us. And so we continued doing that for

marketing purposes and then pulled that into some of the research reports.

So were you all just using it for just plotting or were you doing any kind of analysis and

stats and stuff on the backside?

On the backside, we did all the analysis in, I would say, some of the modeling. So we had

kind of our quarterly forecast for drilling completions. We had part of that in R. We

always wanted to move 100% into R, but there was just some things that were easy to do,

you know, in Excel. Like if you want to tweak some of the, you know, growth factors or we

believe, you know, the rig activity or drilling activities didn't go down in a particular

quarter or a particular area, it was just easier to tweak some of that in Excel. But

really the R gave us the front-end work of that to kind of really reduce how much effort

we had to put into it.

For sure.

So you can use R in production.

Proof.

Yes.

The other quick question I want to ask there, because you mentioned something that I think

all of us kind of generally understand, but I want to make sure that for kind of just

the average user that they understand, you're talking about your decision to go with an unstructured

database versus a structured database. What kind of went into that and why? And like thinking

about this from a generalist perspective of like, okay, I know that I need a database

because I'm going to be doing some analytics analysis, whatever, basic stuff with our data.

What do I need to think about and why do I need to be looking at unstructured versus

structured?

Yeah.

Oh, interesting.

That's a big question.

Yeah.

Let me rephrase it. What benefits did y'all find from going with an unstructured database

versus a more traditional SQL structured type database?

Yeah. Since we were thinking about a couple of different products, we had MongoDB on the

back end so that we could pull in state regulatory information from any place. And so that would

be everything from PDF documents to Excel spreadsheets to shapefiles. Going directly

into an unstructured database allowed us to say, okay, at this state level, we don't care

how you define your individual record. We're going to take all of that information and

that if well names repeated and lease names repeated and you have kind of all this duplicate

information, that's fine at one level. We'll pull that all into that kind of state, we'll

call it state repository on the MongoDB side. And then once we normalize it, then we can

actually get to a structure, kind of our standard well header.
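
To make that two-step idea concrete (land each state's records as-is, then normalize into one standard well header), here is a minimal sketch. It uses plain Python dictionaries in place of MongoDB documents so it runs on its own; the raw field names, the per-state normalizer functions, and the sample records are all assumptions made up for illustration, not the actual Energent schema.

```python
# Hypothetical raw records as they might land in a per-state staging
# collection. Each state uses its own field names, and duplicates are allowed.
RAW_STATE_RECORDS = [
    {"state": "TX", "API": "42-000-00001", "Lease Name": "SMITH",
     "Well No": "1H", "Operator": "Example Operating"},
    {"state": "TX", "API": "42-000-00001", "Lease Name": "SMITH",
     "Well No": "1H", "Operator": "Example Operating"},  # repeated filing
    {"state": "ND", "api_no": "33-000-00003", "well_name": "Jones 1H",
     "operator_name": "Sample Energy"},
]


def normalize_tx(record):
    """Map the made-up Texas layout onto the standard header fields."""
    return {
        "api_number": record["API"],
        "well_name": f'{record["Lease Name"].title()} {record["Well No"]}',
        "operator": record["Operator"],
        "state": "TX",
    }


def normalize_nd(record):
    """Map the made-up North Dakota layout onto the standard header fields."""
    return {
        "api_number": record["api_no"],
        "well_name": record["well_name"],
        "operator": record["operator_name"],
        "state": "ND",
    }


NORMALIZERS = {"TX": normalize_tx, "ND": normalize_nd}


def build_well_headers(raw_records):
    """Normalize raw records and keep one header per API number."""
    headers = {}
    for record in raw_records:
        header = NORMALIZERS[record["state"]](record)
        # A later record for the same API number replaces the earlier one.
        headers[header["api_number"]] = header
    return list(headers.values())


if __name__ == "__main__":
    for header in build_well_headers(RAW_STATE_RECORDS):
        print(header)
```

The point of the design is that the staging side never rejects a record for having the wrong shape; all of the schema decisions live in the normalizers, so adding a new state means adding one function, not changing a table definition.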

So it was almost like a staging layer, if you will, or almost like what a data lake

would be.

Yeah, it's kind of like our ETL layer, if you will. We just used that. So we had some

Ruby programs that were all the gatherers that were basically pulling information and

there were a couple of things where we connected directly to other sources, but essentially

pulling that in and then within MongoDB that allowed us to say, okay, if you wanted to

then show that information through an application, obviously we could do that or we could show

the normalized view. And so that allowed us to basically not have to do, in Ruby on Rails

it's called database migrations every time the relational database changes. So that allowed

us to kind of prevent that whole step and just go directly to how we want to store the

data and get access to it and then present it. And then from the R perspective, we could

then use that normalized information or what often happened is like, oh, I want to dig

into North Dakota. Show me what's happening in the Bakken or what's happening with some

new mines or whatever it might be that we were able to then go deep into that state

repository and have all that information at our fingertips.

Yeah. So y'all started Energent when? What year?

So officially, I think it was 2013. Did a little bit of consulting. I think we incorporated

in 2013 or 2014.

Okay. And then y'all exited when?

In 2017. So I mean, that's a pretty quick turnaround for, in my opinion, in oil and

gas. I'm sure in the trenches, especially when you've got almost two downturns, or you

got a downturn and a half.

And a downturn and a half. Yeah, and so I'll never forget, Boyd and I always laugh about

this, like some of our first customers, me driving through kind of the Eagle Ford and talking

to people that were using the mobile app because we had that out in the field and trying to

figure out, okay, what steps do you really want to take? And some of those were large

companies, but you know, had kind of their districts and others were really small companies.

And so it was always an interesting kind of, you know, get a different view of that once

you got in the field.

So from an infrastructure side, were y'all always on the cloud? Did you start off on

the cloud? Yeah.

Yeah, we started off with Heroku before they were acquired.

Okay, yeah.

Yep. And then MongoDB was an independent hosting.

MongoDB Atlas, like the one that they host?

So even before that, there was something called, oh shoot, I'm going to forget the name of

it, not Compose, but something that was acquired by IBM.

Okay.

And then we, like the MongoDB portion of that became the most expensive side of it.

And so I believe at one point we transferred over to, this was probably after we were acquired,

but transferred over to somebody else, probably MongoDB Atlas.

Yeah.

To kind of reduce some of that cost, but that was years down the road.

Right.

Yeah.

Who were you hosted on, or which cloud?

Well, it was Heroku originally.

Yeah, it was Heroku.

And Heroku got bought by Salesforce, I believe.

Yes.

Yeah.

Yeah.

But I mean, it was a great platform.

I mean, just from the, being like, yeah, you can, there's even some really great stuff

now, even if you've got an API in Python, you just push it up and it works.

Like, and again, really abstracting those layers of complexity.

They handle all the hosting and even some of the CI/CD for you, if you want.

Yes.

Yeah.

And so we would do things like, we had a little special project or something with

Heroku, you could spin up a, say a Postgres database and then connect to that.

We did some other things with AWS and connecting with some of the, once we got

into production data, basically having kind of a direct connect to a public kind

of production database on AWS, but primarily it was the core, I guess, was Heroku.

So then you had your exit at Energent.

What are you working on now?

Yeah.

So, so let's see the exit, stayed at Westwood for four or five years, four years, I

guess, and then left and I knew I wanted to kind of go back and look at kind of

some of the energy transition pieces.

So one of my early projects at Chevron was a CO2 flood, it was kind of like

water alternating gas.

WAG.

Yep.

WAG.

And so we had a reservoir engineer who was basically dealing with all of the CO2

buying and selling and trying to figure out how much water, how much CO2 they

needed and figuring out kind of that life cycle.

Logistics of that must be really fun.

Yeah.

And I'm like, if you have one guess, how do you think that was being managed?

Right?

It was, it was Excel and an Access database.

I was going to say Access, but yeah, it's the same difference.

So both of them.

And so-

I'm impressed that you had any kind of database involved.

I know.

Very true.

Yep.

Yeah, so-

Not Excel as a database type.

Yes.

Yeah.

So that was an interesting piece to me.

And then obviously the, some of the other projects I've worked on, I was like,

okay, there's got to be a little bit more on the CO2 side that's not very well

known and so started digging into some of the other information that I knew

about in different states for enhanced oil recovery and, and different CO2

floods and started really doing more research around carbon capture and so

started DeCarbonFuse. The basic question was:

How can I provide some insights into the information around carbon capture,

hydrogen and electrification because especially at Westwood did a lot with

emissions and electrifications for different frac fleets and companies that

were looking at, okay, how do we, how do we reduce emissions, clean, clean up

operations, if you will.

And so it took those kinds of three categories and really started digging

into the market, like who the players are, the projects, and now trying to figure out

if there's enough, enough kind of projects, enough spending to really

warrant like a full-blown kind of tool around, around the screening of kind of

carbon capture and hydrogen projects.

So what kind of data workflow use cases is, is your like target customer

looking for?

And so a lot of it right now is screening, like all these projects that have been,

been announced, most of them are in the screening and feasibility stage.

And so with screening, it's mostly all public information, right?

And you have all your emission data.

You have some, some well logs, if you want to get into kind of the sequestration

portion of it, with the kind of the offering that I have right now, it's

really, here, let me take a look at all the emission sources in an area, where are

the proposed pipelines, what does that look like?

And then what type of scenarios can you build off of that?

And so the, I think the, go ahead.

No, so I'm an operator.

I want to get into sequestration because I just announced this giant carbon

capture plant in the Permian per se.

I'm trying to figure out how to logistically make that work.

Once I capture the carbon, that's kind of essentially where you would come in as

far as well, at least on the front end, right?

When you're planning and doing all the logistics.

Okay.

And so is it, is it just, are people looking at, I'm, I'm fairly uneducated

on this in the details.

So I'm curious, is it people looking at using existing pipeline infrastructure?

Are they trying to look at where they can build new pipeline infrastructure?

What's the, all of the above?

I think it's going to be pretty much all of the above.

I think, let's see, Tallgrass in kind of outside of the DJ, they're going

to repurpose an existing one.

There's a couple of conversations happening in the Permian about repurposing

and connecting into some of the existing CO2 pipeline that's already there.

Cause there's thousands of miles already.

It makes so much sense in the Permian just because of the fact that they've

been doing it there forever and all the infrastructure is in place on top of the

fact that it's one of the most prolific fields.

So now you can increase your recoverable.

So I don't think people understand how like oil and gas companies basically

get to double dip on this, right?

We get penalized for the emissions, but then we get the tax credits for pulling

the emissions out of the air.

And then we also get the upside of taking our recoverable reserves from

whatever 20, 30% up to 40 to 50%.

And so it's a, I think, I mean, it's a great idea in my perspective, if you

can figure out the physical logistics, not just physically moving it around,

but also the physics that go into capturing it and powering it and all that stuff.

Something that's come to mind for me.

I mean, is it the same friction for like new pipelines if they're

allocated towards CO2 as if they're-

That's a good question.

Oh yes.

So yes, no pipelines.

Right.

It's yeah, no pipelines, period.

Yeah.

Well, I didn't know that was the thing because if you can say, well, this is

for CO2 and then like five years online, you know what, this business isn't

working for us, but you know, now we've got a good takeaway for natural gas.

Infrastructure for anything else?

Yeah.

So if you, there's a company called Summit Carbon Solutions, they're running

pipeline, they want to run pipeline through the Midwest all the way up to

North Dakota, where they already have, I think they've already been permitted

for the sequestration well up there, but they're fighting all types of

battles, counties, states, federal, everything to get all the approval.

I think they have 2,200 miles of pipeline that they want to build.

And it is, you know, it is a, a web of, um, infrastructure connecting ethanol

facilities all the way through, you know, back to the sequestration side.

So huge project and will be awesome if it's, they can get it all done, but they

have a, they have an uphill battle from the community.

Yeah.

Well, I mean, that's, that's something to me that is just so overlooked in

the energy space is like, Oh, well, we need more refineries.

Let's say it's going to take five years to do a study to figure out where, where

and how, and all the environmental, all the, you know, NIMBY effect and political

stuff, just to scope it out, right.

And then it's like, and then it's going to take another five years.

Once they do that, if they get the permits and approvals and all that to

build it, and it's going to cost a billion dollars.

And so it's like, people don't realize, you know, the time, effort and scope that

goes in and just like the amount of logistics, right?

Like getting local, state, and federal approval for 2,200 miles sounds like a complete

nightmare to me.

I'm so glad I don't do that.

Uh, completely, completely.

And so we'll see if that happens, but you know, back to the Permian, there's

this huge kind of wave of carbon capture projects that were announced around

ethanol facilities, mostly in the Midwest.

So that's all kind of moving forward.

I think the Permian and natural gas either I'll say kind of midstream

companies in general are kind of this next wave and there's, they kind of sit

in a perfect situation where they have a lot of gathering facilities.

Some might be, um, have sizable emissions and being able to capture that and then

produce, you know, cleaner natural gas or even just take the credit or, uh, even

better utilize it and go create kind of a sustainable aviation fuel or something

else.

There's some, there's some paths there that could be, uh, be really interesting.

Yeah.

No, I mean, if you could figure out the power source piece on the carbon capture

side, the, you know, you could theoretically have zero in theory.

I've never done the math on this and I won't because it's above my pay grade,

but, uh, you know, you burn it, you capture the CO2 goes into the pipeline,

goes back into the well, that raises the reservoir pressure and aids in recovery.

And it's just a closed loop system, right?

Like there's potential that, that could exist someday, which is kind of

wild to imagine.

So to kind of steer it back a little more towards like the data and tech side,

like, so, uh, at DeCarbonFuse, so are you aggregating data from all different kind

of, you know, regulatory bodies on like a lot of public data and everything?

Back to, uh, some of the, um, public information, that's all the EPA emissions

data, essentially, which is, you know, there's a couple of different sources

within the EPA, so we did a couple of things and publicly you can, most of

this is available just, um, on the site, right?

So you can see all the companies in energy and industrial in general, in

those sectors, um, pulling in a lot of the, say sustainability reports.

So what are they, you know, what are they doing from a scope one, scope two,

scope three emissions, and do they have any sustainability targets and then

going one layer down on the facility?

So what type of facilities do they have?

And kind of rated some of those facilities based on, oh, this is, uh, you

know, this, the air quality is good.

It's maybe high compared to others in the industry.

And then really then pulled in all the emissions data for the facility so that

you can basically say, okay, I only care about, um, natural gas facilities

or, uh, fertilizer plants or cement plants or whatever, and I want to see

what's available for CO2 infrastructure, what's available for, um, maybe other

partners, so that's where the screening part comes in, especially with companies

that are looking at, maybe they're only interested in Wyoming, right?

So they can see kind of what that looks like and, um, and basically provide

kind of a single source for kind of all that information.
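
A screening pass like the one described (only natural gas facilities, or only Wyoming, and so on) boils down to filtering facility-level emissions records and summarizing what is left. The sketch below shows that shape under stated assumptions: the facility names, sectors, and tonnage figures are invented sample values, not EPA data, and `screen` and `total_by_sector` are hypothetical helper names.

```python
from collections import defaultdict

# Invented sample facilities; real screening would start from public
# emissions data, which is not reproduced here.
FACILITIES = [
    {"name": "Plant A", "state": "WY", "sector": "natural_gas", "co2_tonnes": 120_000},
    {"name": "Plant B", "state": "WY", "sector": "cement", "co2_tonnes": 300_000},
    {"name": "Plant C", "state": "TX", "sector": "natural_gas", "co2_tonnes": 450_000},
    {"name": "Plant D", "state": "WY", "sector": "natural_gas", "co2_tonnes": 80_000},
]


def screen(facilities, state=None, sectors=None, min_tonnes=0):
    """Keep facilities matching the optional filters, largest emitters first."""
    matches = [
        f for f in facilities
        if (state is None or f["state"] == state)
        and (sectors is None or f["sector"] in sectors)
        and f["co2_tonnes"] >= min_tonnes
    ]
    return sorted(matches, key=lambda f: f["co2_tonnes"], reverse=True)


def total_by_sector(facilities):
    """Sum reported CO2 tonnes per sector for a list of facilities."""
    totals = defaultdict(int)
    for f in facilities:
        totals[f["sector"]] += f["co2_tonnes"]
    return dict(totals)


if __name__ == "__main__":
    wyoming_gas = screen(FACILITIES, state="WY", sectors={"natural_gas"})
    for f in wyoming_gas:
        print(f["name"], f["co2_tonnes"])
    print(total_by_sector(wyoming_gas))
```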

Okay.

So you, you're just, are you just kind of aggregating all the time or are you

doing it more project based for customers?

And then like what, maybe what kind of tools are you using to get that?

So, surprise, I'm still on Ruby.

So all Ruby on Rails on the kind of the, the website, uh, the scripts are,

there's a couple kind of Python scripts in there.

I'm trying to get like a little bit of Python experience.

Like that's been a little bit of a headache, but this time around, um, I'm

doing a little more of the kind of, I'll say development work myself, um, just to

get familiar with some new information.

And then, um, we're taking, let's see a handful of GIS tools and kind of

relearning some of the kind of Mapbox functionality.

So from the tool perspective, it is, uh, Ruby scripts, a couple of Python scripts

pulling down data, most of it's actually in shapefiles or CSVs, some kind

of like tab delimited data and then, um, aggregating that and this time around

I'm using, trying to leverage Mapbox a little more and, and clean up the data

and push it to Mapbox and then just visualize it on the front end.

And so that's been a little bit easier to do, um, versus, uh, where Mapbox was,

you know, when we started Energent.

So that's kind of that process.

And that goes, uh, we're basically doing that on a weekly basis.

Okay.

The most frequent, some of that is on a monthly basis.

And then there's some of the, um, some of the EPA data is only updated once a year.

So I kind of have to balance that out a little bit.
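
One simple way to handle that balancing act between weekly, monthly, and yearly sources is to record a cadence per source and check which ones are due on each run. The sketch below is only an illustration of that idea: the source names, the `SOURCES` table, and the fixed day counts for "monthly" and "yearly" are assumptions, not a description of how DeCarbonFuse schedules its jobs.

```python
from datetime import date, timedelta

# Approximate intervals; the exact day counts are an assumption.
REFRESH_INTERVALS = {
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}

# Hypothetical source names mapped to how often each one is refreshed.
SOURCES = {
    "pipeline_shapefiles": "weekly",
    "state_reports": "monthly",
    "epa_emissions": "yearly",
}


def sources_due(last_refreshed, today):
    """Return the names of sources whose refresh interval has elapsed.

    last_refreshed maps source name -> date of the last successful pull.
    A source with no recorded pull is always considered due.
    """
    due = []
    for name, cadence in SOURCES.items():
        last = last_refreshed.get(name)
        if last is None or today - last >= REFRESH_INTERVALS[cadence]:
            due.append(name)
    return due


if __name__ == "__main__":
    history = {
        "pipeline_shapefiles": date(2024, 5, 20),
        "state_reports": date(2024, 5, 15),
        "epa_emissions": date(2023, 10, 1),
    }
    print(sources_due(history, today=date(2024, 6, 1)))
```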

And like the GIS data, are you, are you running that in like a Mongo

or do you have like a QGIS?

So I was going to ask you what DB you're on the.

So this is where I've been getting some help on the QGIS side.

Okay.

If you kind of normalize some of it, um, can do some real simple kind of, um,

aggregating there as well, right.

Uh, before getting into getting into Mapbox.

Um, and then what else on the, I'm thinking more on the kind of the

data and the integration piece.

So we're basically doing all of that locally now, most of it locally.

There's some that's being hosted, uh, kind of just AWS S3.

So pulling it down, I have the raw data available, keeping it as is, and

then processing it, so kind of our ETL layer is a little bit local,

a little bit in the cloud.

Yeah.

Um, and then, uh, keeping kind of keeping all that together so I can move it into,

move it into Mapbox and then eventually we'll have some of the announced projects.

Um, so right now we're just focused on North America, but, uh, the announced

projects are then in a Postgres database, which is updated essentially on, uh,

uh, monitoring the news and announcements on a daily basis and then kind of confirm

everything on a weekly basis.

Okay.

Yeah.

So can you speak to that?

Like, cause I'm assuming, I know a lot of people in the energy space are

very familiar with Esri, like ArcGIS. Can you speak about QGIS and the benefits of it?

Uh, only, I mean, I am a novice user on the, on the GIS side, but I think one

thing that, that I can do very quickly is pull in shapefiles, do some editing,

do some, um, kind of rationalizing of the data.

So especially thinking of, you know, the, there's a lot of functionality.

If you open up Esri as a first time user or even kind of, you know, intermediate,

yeah, you're probably going into, you know, specific functions or specific

things you do kind of every time, or maybe use some of the workflow tools.

Um, but with, um, QGIS, basically I'm using it here for a little bit of the data

quality side, and then I also have another who I'd consider more kind of

intermediate to advanced person that I know once I'm getting into some of the

shapefiles, some of the different formats, that then, uh, I can get some

help on visualization within GIS.

And I passed that off.

Um, but for me, as kind of a, I'll say even beginner user, it's a, it's a simple

way to kind of get access to the information you need and get it on a map.

Yeah.

And it's open source, right?

Open source and free and, you know, runs on the Mac and like, I'm good.

Yeah, no, I think it's important for people to understand.

Cause I, again, you've, you just walk right into an operator your whole life

and you just, oh, Esri, but then you don't realize how much your company's

paying for that license.

And even having for us, you know, we have Esri, but we've actually got a

couple of people in engineering and stuff that, you know, know how to use QGIS and

they can pull it in and do what they need to do.

And we don't have to pay for another license of Esri, you know, to do

those simple kind of things.

Yes.

Oh, exactly.

And like, I still use some of the, um, Esri map kind of web services.

Um, so still plug into a couple of those, but no, on a, I'd say on a, on a

routine basis, it's all kind of, um, QGIS.

And have you all looked at it or done anything with how to, or needed to do

anything with say the, is Postgres just kind of a serving layer?

Cause I know Postgres has PostGIS too, which is pretty powerful.

Yeah.

And I haven't used any of the PostGIS stuff yet.

Um, I know, um, for a couple of people have recommended for building out kind

of the scenarios and heat maps and different things, you can do some really

kind of interesting things on the fly with, um, PostGIS, but I haven't done any

of that. Right now, kind of that visualization piece I've, I've pushed

into Mapbox and kind of recreated some of the analysis with new layers.

Yeah.

Effectively.

Yeah.

Um, I'm just being an R guy cause it's like, you know, we're here to nerd out

on data and have you used any of the GIS stuff within R, like sf and all that?

Yes.

Yeah.

Although I have a little bit of a hard time with, um, some of the more complex

maps, I mean, I shouldn't say anything over kind of three or four layers and

dealing with the scales within, um, within R, I feel like that breaks down a little

bit, but, um, but overall, like there's some basic maps that I think that have

been really good and like kind of who's doing what for primacy, for example, that

was kind of a nice, easy one to create in R and you know, simple there, um, being

able to get points on the map.

Yeah.

So I mean, as a, as someone who's used Mapbox in a number of places for a

number of reasons, I completely agree with you there.

It's like, there's so many, especially with like QGIS and ArcGIS, uh, just

like, Hey, there's this free thing over here or, you know, close to free if you're

beta prototyping stuff where it's like, Hey, you can go try this out very easily.

Copy and paste API key.

Now you have a Mapbox plot in your Power BI instance or in your website that you

can just embed, right?

It's a very, I think there's, there's, that's one of the reasons I wanted to do

this podcast is because there's all these tools out there that everyone uses

that it's like, Oh, I didn't know about that.

And that's really nice.

Right.

But what, uh, what kind of, let me back up.

What, I want to talk a little bit more about your satellite experience.

Cause I think that's fascinating.

Yeah, it's fascinating by itself, hard stop, but then also the industry is

starting to use it more and more for a lot of really interesting things.

I mean, there's a number of companies, Josh Adler's company.

They changed their name.

It was Sourcewater.

It was Sourcewater.

And I'm sorry that I'm forgetting the new name, but like, that's built off of that.

Right.

Like a lot of their stuff.

Yeah, there's a lot of stuff.

So talk a little bit more about that.

Like, how do you, what are the limitations of it?

How do you see it really making an impact on the industry now and

moving kind of in the future?

I think one of the, one thing that we saw that was incredible was the acceleration

of the number of satellites that were covering the globe in between, call

it 2017, when we were acquired, to, you know, 2021, and even now, right?

The number of satellites is crazy.

Yeah.

Walk people through just taking a step back, like, okay, what, what do we mean

by satellite data?

How is like, what does that look like?

Who are you getting it from?

How often, all of that kind of stuff.

So there's a couple, I'll say open source, satellite ready kind of analysis

ready, imagery companies out there.

And so you have kind of Landsat would be one that would be, you get certain

coverage, I think on a month, it's been a little while, but it's around a month

type of view of probably 10 to 30 meter resolution.

And this is where it's a, right.

When you start getting into the, the frequency and the resolution is what's

so important and that's where the cost is.

But we ended up using, because there's a couple of different providers, right?

Yeah.

Who are they?

So Sentinel is some of the data that you can acquire that has several

different layers to it, right?

So you could get the typical kind of RGB, which is your visualization or what

you would probably see in most applications.

Then they have some vegetation layers that are kind of pre-built almost.

And they also had some layers that were pre-built around

detecting moisture and water.

So those, we've got Sentinel-5P, which is the methane one, right?

Methane, yes.

So that's coming.

And I know there's a couple of companies that are trying to, to use that as well.

I think the, where we started was a concept of can we see, you know,

that was very futuristic five years ago, right?

Like that was like, can we see permits?

It's like, can I detect pads before the permit is there?

We knew, we knew for a fact there were companies that were developing and

clearing the pads before, before any notification or regulatory report was

out, right, knew that for a fact.

And we knew we could kind of try and identify those visually.

It's like, okay, this would be, that would be stellar.

And then you start going down that kind of value chain and it's like, oh, well,

if I can confirm the rig onsite, right?

If I can confirm the frac crew and looking at the frac activity and then even

afterwards, can I tell when the frac, you know, when the frac crew leaves and maybe

there's flow back, maybe you can tell something's happening, but when, when,

when we'll actually get to production.

And so that kind of value stream or kind of chain of events, I would say, we tried

to automate pieces of that and really be, begin detecting kind of the permit side.

So this is, this is a funny thing where I think there were a lot of companies

talking about this at the same time and doing different things.

And you should look at some of the like, and Boyd and I sat down one day and

started kind of making up stuff about, well, like what, what if we could detect

kind of the, the permit date beforehand and, you know, the drilling dates and

rig dates and frac dates and everything else, and started kind of putting a

process to it.

And so we set out, of course, started with the Permian and he was able to do

some pretty good, I'll say machine learning around detecting the imagery,

processing the imagery, looking at certain areas and being able to see at

least, okay, here's a pad, here's what it looks like.

And then built a whole kind of processing side to this that we would essentially

download the images, only the ones that we needed, and then do some automated

work upfront, but then it became a manual effort and we ended up using some of our

analysts to basically detect, detect kind of what the rig looks like, what the

frac crew looks like, and so forth.

Did you use them to train a model as you went?

Yep.

It's probably serving two purposes.

Like you're augmenting it, you know, for your customers, but then also then you

can take what they've kind of confirmed.

And when we were able to kind of scale that from the Permian to Eagle Ford to

Haynesville to Bakken, and I think the DJ, and I don't think, I'm trying to

think if we ever got to Oklahoma, and majority, we covered the majority of the

shale basins, but one of the things that we saw early on was it wasn't frequent

enough, right?

There was not enough imagery.

And then all of a sudden kind of year in, we moved from once per like 10 days to

once every two days, three days, and sometimes daily, which that was, that was

huge.

Yeah.

Well, but then that comes with a cost, right?

Like, so that's my next question for you is for people that are looking at or

interested in satellite data, kind of talk about, you know, the knobs that you

can turn from the different providers, right?

Cause if my understanding is it boils down to basically resolution and

frequency, right?

Exactly.

So if you, if you're looking for something like a pad being built that can be done

in a short period or a rig moving on location or whatever, you need much

higher frequency data than, you know, if you're monitoring the completion of a

refinery or something like that, it's going to take years.

Yes.

Yeah.

And so seeing the resolution, you know, there's obviously what's available

publicly, which is what 10 meter and then goes all the way down to what I've

seen, I think is three centimeter resolution, which you can basically see

the logo on the side of the truck.
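
To put rough numbers on the resolution and frequency trade-off described here, a minimal sketch follows. The resolutions (10 meter, 3 centimeter) and revisit intervals (10 days, 2 days, daily) are the ones quoted in the conversation; the pad width, truck width, and pad build time are hypothetical values picked only for illustration, and cloud cover is ignored.

```python
import math


def pixels_across(object_size_m: float, resolution_m: float) -> float:
    """Number of pixels spanning an object at a given ground resolution."""
    return object_size_m / resolution_m


def guaranteed_looks(event_days: float, revisit_days: float) -> int:
    """Whole revisit intervals that fit inside an event of the given length."""
    return math.floor(event_days / revisit_days)


# Hypothetical sizes and durations, chosen only for illustration.
PAD_WIDTH_M = 100.0
TRUCK_WIDTH_M = 2.5
PAD_BUILD_DAYS = 14

# Resolutions mentioned in the conversation: 10 m (public) and 3 cm (commercial).
for resolution_m in (10.0, 0.03):
    pad_px = pixels_across(PAD_WIDTH_M, resolution_m)
    truck_px = pixels_across(TRUCK_WIDTH_M, resolution_m)
    print(f"{resolution_m} m/pixel: pad ~{pad_px:.0f} px wide, truck ~{truck_px:.2f} px wide")

# Revisit intervals mentioned in the conversation: 10 days, 2 days, daily.
for revisit_days in (10, 2, 1):
    looks = guaranteed_looks(PAD_BUILD_DAYS, revisit_days)
    print(f"revisit every {revisit_days} d: at least {looks} look(s) in {PAD_BUILD_DAYS} d")
```

Under these assumptions a pad is only a handful of pixels wide at public resolution and a truck is smaller than a single pixel, which is why the higher-resolution, higher-frequency imagery matters for rig and frac crew detection.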

Yeah.

Like it's amazing.

Um, also super expensive, right?

And so we ended up, uh, talking to a handful of companies that were already

like in that space, I think what, what we saw is a lot of the imagery providers

wanted to become, wanted to deliver some of the analytics and wanted to be the

source for that, but didn't really understand the nuances between, you know,

a, well, that's actually a frac crew, maybe, but it's just trucks parked.

Yeah.

That's staging ground.

Like, so we, we had some, um, we're able to kind of combine some information that

others weren't or actually others weren't doing at the time, but just buying the

satellite imagery, we went and got quotes from everyone to do the Permian.

And it was ridiculous.

You know, this is obviously it's the pricing has come down, but at the time it

was, you know, several million just to do the Permian and then just consider

what's going to cost me to put a satellite in orbit.

Yeah.

Yeah, exactly.

Which is, I think what some of the, uh, you know, companies are doing now, but

it's like, wow, that's a, yeah, that's taken it to another level.

Um, and, um, so yeah, we saw the, we saw the value in it.

We could, we could produce some insights from it.

We could see kind of the, the pad detection and some of that kind of work

come in, um, but really the, I would say the true value of, of what we were

producing was then all the way down the value stream of saying, okay, here's

a frac, frac crew, we can say, here's a frac crew that's consuming, call it, you

know, what, 2,500 tons a month type of deal and then forecast out what the

activity, um, would be based on that frac crew and then even further down the road.

Um, can you, can you say that well is going to come online and be producing,

um, with, with better, more granular information kind of.

I was trying to get University Lands on that.

Yeah.

Yeah.

I mean, cause I mean, you know, just like you're saying from the people doing work

on the land before, you know, they're permitted to, I mean, would,

would have been a big deal.

Oh yeah.

Yeah.

And just also knowing, right.

Cause again, you know, you got the, say the Enverus, the rig data stuff, you

know, but the drilling does not mean production, you know, you know, you've

got DUCs now for a year or two years.

I mean, like, when someone's fracking, you know, like I should expect, you

know, um, you know, money coming in the door within, you know, a month or

so, you know, like give or take.

So even, even back then we were, we wanted to be able to buy based on

smaller kind of smaller areas.

Right.

And I think there's, I know there's a startup now out of, I think out of Austin

or maybe Denver, Albedo, that is doing smaller areas and they're like, they're

focused on the commercial side.

And I think that's like, I think that could be a winner because serve me the

data that I need also don't try and build the analytics on top of it because I

have a very custom, nuanced use case or application for it.

Yeah, just keep that data.

Now that's, that's interesting.

We'll somewhat roll your machine learning comments from that into the

next question, which is what, you know, how do you see ML, AI, GPTs of the

world playing into the, you know, the energy space today and in the future?

Like, where do you see that being kind of the most impactful?

We can, we can just pick, we can pick GPT or insert your favorite ML kind of

tool or platform just to narrow it down.

But yeah, no, uh, I mean, everyone seems to be talking about GPT so I'll start

with that and then we can work backwards.

But, um, there's so many interesting little use cases that are probably being

tested and piloted right now on, um, and with, uh, ChatGPT that I think you

could, if you can get back, if you can get by some of the privacy concerns, um,

I'll put that to the side, but just some of the use cases, um, like, like being

able to pull in, um, drilling docs and answer questions around kind of the

drilling process without having access to WellView or without having access to

whatever drilling tool you're using.

I think that is a unique, um, kind of, uh, kind of deal.

And then especially, um, talked to a company this week that has really developed a

lot of, I'll say kind of AI toolbox, if you will, uh, targeted at oil and gas, maybe

some healthcare, and they're just doing data discovery, right?

So where, what do I have?

Right.

What do I, you know, what do I need?

And then can I monetize any of that?

You know, can I go sell some old kind of well information or logs or, you

know, defects or whatever?

Do I even have it?

Yes.

Yeah.

And so seeing that kind of that data discovery angle, um, I think there's

been so much talk about kind of data as an asset within, within oil and gas, but

it is, um, it's hard to do and, and you have so many different source kind of

projects and different kinds of workflows that are needed.

Um, but the AI side, I think if there's an, what I like about, um, ChatGPT and

OpenAI is basically the API component to it.

So if I can use your kind of intelligence, combine it with my own

data and I can keep the data on my little vector database or whatever type

of database, then I think that could be fascinating.

Absolutely.

Yeah.

Do you all have any kind of discovery?

You have some kind of discoverability engine or something on your platform?

Yeah.

A little bit, but not, not something you could kind of plug into, uh, yeah, not,

not yet plugging into scanning, I would say kind of scanning, um, uh, sources

and, and crawling different things, but internally I could see somebody creating

something and then pulling that into their own little database to then say,

Oh, here, this is what I need.

I don't want to pay for this data again.

Um, or I want to sell some information.

Oh, that's perfect.

Yeah.

Well, I mean, like even just, I say it's a basic use case.

It's not a basic use case, but like discoverability, right?

Like whether it's within a company of your own documents or on a, on

Reddit or on a mess, like whatever, right?

Like discoverability historically has been a very hard problem that, you know,

all the social media companies have figured out ways to, to, I guess you

could say solve, but their real intention is just keeping your time,

not necessarily what you care about.

Yeah.

Um, but I mean, that's a big thing, right?

Like even with when you're talking about the evolution of the internet, right?

Like we went from forums or chat to forums and now we, I mean, we're still

using forums because Reddit is still huge, but even in that, in that, like

the discoverability part of it is such a key component, right?

Because you want to serve the person that is using it exactly what they want.

But if it's not the old way of having to tag, you know, put in tags or hashtags

is basically the modern version of that now, right?

But being able to do that with AI in the future, where it's just like, it just

knows, it knows everything that's in there so it can find all of the

missing pieces for you is pretty crazy.

Yeah.

I mean, I think I saw someone talking about just how it's really

going to stir up like the Elasticsearches and some of those, Solr too.

Like if they haven't bought into this, I mean, like it's going to wipe them out.

Cause now you can just run these LLMs like on, you know, internal things and

it's going to be so much more powerful and better than, you know, what people

were already using those tools for.

For sure.

We've got like five more minutes.

Just jump into the speed round.

You can expand on.

Let's do one.

What's one piece of advice you'd give people either getting into the energy

tech space or that are, are new to it.

That I'm new to it.

Um, yeah, don't do free pilots.

No, that's a free pilot.

Well, you say that, but I mean, I think it's also controversial.

I mean, like there's plenty of people, I think that see the other way too.

So, yes, yeah, says the guy, says the guy at the operator.

I just want to point that out.

No, no, I'm talking, I'm like, I know somebody's on the software side that

or on their side having decent luck with it, but at the same time, definitely

want to hear your thoughts on it.

Yeah.

Yeah.

So, um, I've just seen so many companies get stuck in this pilot mode.

Yeah.

And if you don't have a way to monetize that, then you're, you know, you're

running your.

Takes resources.

So you're just running your opportunity kind of into the ground.

So you're just burning that cash, right?

There's so many, um, there's so many, I would say smart and unique things that

you can do within, um, within oil and gas and especially oil field that will

pay for solutions.

So find that, find that willingness to pay and then move, move that direction.

So go, go work for the smaller operator that's willing to pay for the pilot

instead of going to work for the big, uh, NOC or IOC that, yeah, it's going to

take three years and lots of time and handholding.

Yes.

Yeah.

Those sales cycles are very slow.

That's, that's still one of the biggest misconceptions of that.

I see software companies coming from outside of the industry into the industry.

Like, well, if we can get Shell, we're a billion dollar company.

It's like, it's going to take you three years and a long time if you get in

lots of gatekeeping, lots of interviews, lots of hoop jumping, all of the

insurance, all of the documentation, all of the security, like, or you could go

talk to a smaller operator that doesn't have all that stuff that is also willing

to potentially pay for it because they're not going to be able to duplicate it

themselves, another big risk that you have.

But all right, Bobby, let's, let's do the speed round.

Yeah.

So, um, what's, uh, your favorite cloud that you've used?

Favorite cloud.

I think just some of the simplicity of AWS and S3, I still use it today.

I've been using it forever.

It's like, it's just easy.

Yeah.

I mean, S3 is probably the, one of the best cloud offerings that there's been

and it's stood the test of time.

I probably have three accounts.

I'm still getting billed, but it's like a dollar 25.

Oh yeah, no, I've got to, I don't even know what that is.

That could be something, I don't even know.

Yeah, that's not something I'm saying.

Um, let's go down.

What's your favorite managed service?

I mean, you mentioned Heroku previously.

Yeah, I mean, I've, uh, I've just been using Heroku for so long.

I would say that's probably still the one, um, there's several coming out

that I'm kind of watching kind of more on the, on the Rails side.

And then I still think there's this little gap with some of the, um, R and

RStudio stuff.

That's kind of fascinating for kind of deploying, um, you know, deploying

some dashboards and different things like that.

I just don't know if they're

Shiny Server or the, you know, like, yeah, some of that hasn't been great, but

yeah, that didn't work out so well, but I'm still kind of keeping an eye on

that to see what else, what else you can do there.

Freddie Drennan, I don't know if you follow him on LinkedIn or not, but he's got some

like indexer thing where it's supposed to really help, like, push your

R projects up to the cloud.

Yeah.

Um, about security.

Oh man.

Well, um, security tool.

This was written by someone who has never dealt with IT security

in his entire life.

So you don't have to answer that because I just threw that on there because

I was like, ah, maybe I don't even, I don't even know it like, uh, I guess

last, I don't know.

Yeah.

Yeah.

Like, um, yeah, I would be like some like Auth0 that

makes authentication easier, but let's do visualization tool.

Oh yeah.

I definitely go with R and then I still go back to, uh, some of the, I guess

JavaScript components like, um, D3 to do some kind of unique

things.

Um, I think that's, well, what libraries in R? ggplot, uh,

ggplot2, doing a little more with sf for sure.

And then now I'm gonna have to come up to speed on kind of the

tidyverse piece of that as well.

No, I mean, ggplot2's a pretty core component, but I mean, one that if

you're talking about D3 and you like ggplot is plotly and then,

uh, cause they have ggplotly.

So it's basically a wrapper.

So you just wrap that around your ggplot code and it makes it interactive.

Yeah.

That would be fantastic.

I haven't used plotly in a long, I remember seeing it when, um, we would do

some kind of visualizations, but I haven't, I've haven't personally used it.

Yeah.

But I mean, if you're using any R, like, I mean, literally it's just simply

just, you know, install the plotly library, but then like they have the

ggplotly call, just wrap your ggplot in ggplotly and it makes it

interactive and it brings in all the same formatting and yeah, the

formatting piece is like, it keeps it like your ggplot formatting, however

you've made it, but then it makes those pieces interactive and markable

and all that kind of stuff.

Cool.

This is why I love coding because it's like, Hey, you used to have to do all

these really hard, monotonous, boring things and another wrap.

Plotly's built on top of D3.

So, okay.

Awesome.

Um, one more.

Yeah.

One more.

Let's go.

What's the most interesting, like emerging bleeding edge tech

that you're excited about?

I'm trying to wrap my head around all the, I mentioned a little bit, all

the new vector databases and that's just fascinating to me that like

just in the last four months, there's probably five or six new ones that are

out there that, um, that, so the kind of use case that I'm trying to look at

is, all right, can I take, um, any type of policy documents or, or research

reports or anything like that, create embeddings within kind of Pinecone

or try, uh, what's the other one?

Chroma that I was kind of looking at.

Um, and then you use that with the OpenAI API and create something.

Okay, cool.

So that's going to be like, that's fascinating to me right now.

And getting it to work has been, um, a little bit of a challenge.

Like, imagine that even like give it a year and you are going to make

that so much easier, right?

Better.

Like easier, but doing it now, like while the barrier to

entry is higher, it provides more opportunity, right?

Right.

And just seeing, you know, just trying to understand the technology side of it.

Kind of my, I don't know, it blows my mind a little bit on the whole, like

how it's turning it into vector and what that means and how the interpretation

is happening and all that.
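
For anyone else wondering what "turning it into a vector" means in practice, here is a minimal sketch of the idea. It is a toy under stated assumptions: the `embed` function below is a simple word-count stand-in, not the OpenAI embeddings API, and `TinyVectorStore` is a hypothetical in-memory class, not Pinecone or Chroma. Real embedding models return dense lists of floats, but the retrieval step works the same way: embed the question, then rank stored documents by similarity. The sample document text is made up.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and ranks by similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def query(self, question: str, top_k: int = 1) -> list[str]:
        question_vector = embed(question)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(question_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


# Made-up policy and research snippets, for illustration only.
store = TinyVectorStore()
store.add("state primacy rules for class vi injection wells")
store.add("tax credit guidance for carbon capture projects")
store.add("satellite imagery pricing by resolution")

best_match = store.query("who has primacy for class vi wells")
print(best_match)
```

In the workflow described in the conversation, the retrieved text would then be passed along with the question to a language model, so the documents themselves stay in your own database.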

But no, I mean, I've got some homework.

Yeah.

I'll be Wikipedia-ing that after we're done.

It's the first time I've heard it.

So I'm excited.

That's, uh, yeah, no, I mean, at the end of the day, everything is basically

built on or around some kind of database.

Right.

And I don't feel like a lot of people respect that.

And y'all know Corey Quinn, he's a cloud economist, the guy

he's got some really good stuff on AWS.

He's pretty active on Twitter, but like, he's just points out like,

Oh, that's a database.

That's a bit, you know, Route 53 is a database, you know, everything, you

know, CloudFront is a database.

The internet is literally just a giant assortment of databases that have

relationships to each other with a GUI on top of it is really like, if you

really distill a lot of it down,

Well, we were at that UT blockchain thing.

And that one guy, Jimmy, something like he was like, it's a bleeping

database with rules.

It's like, yeah, it just turned inside out.

Yeah.

And it's just exposed to the, yeah, it's like, well, every, every company

should be a data business when it comes down to it.

Right.

Yeah, absolutely.

You know, depending on your, your wells, your real-time information coming from

that side, you know, what you can do with it, what you want to do with it.

There's so many opportunities right there.

And Excel is not a database, just for the record, just to be clear.

Yeah.

Well, awesome.

Well, Todd.

It's always a pleasure talking to you, man.

So really.

Thanks.

Appreciate it.

Enjoyed it.

Thanks, man.

Yeah, you bet.

While some may see them as the crazy ones, we see genius because the people who

are crazy enough to think they can change the world are the ones who do.

Goodbye.

Creators and Guests

Bobby Neelon (Host)
Husband, Father, Baseball, Upstream Oil and Gas, R, Python, JS, SQL, Cloud Computing
John Kalfayan (Host)
Raddad, energy tech, crypto, data, sports, cars