EP 58: Jon Brewton from data²

0:00 Welcome to another episode of Energy Bytes. I'm Bobby Neelon. Got my co-host John Kalfayan here with us. How's it going? How's it going? And super excited to have Jon Brewton on with us from, uh,

0:09 from data². The founder and CEO, correct? Yes, sir. Awesome. Really excited. And, um, you know, of the AI companies out there, y'all are one I've heard about from a

0:18 handful of people, like they were excited about what you're doing. We're trying. We're trying. No, um, you know, having seen a little bit, I'm definitely excited to dive in on what you guys are doing, but

0:29 also what you've seen throughout your career too, because you got a lot of experience. So yeah, for sure. Yeah, this will be a fun one. You've got a good, you've got your background at Chevron

0:37 and then everything since, which is, I think a lot of people are very interested in all the AI stuff, obviously, but specifically from people that understand the energy space. Nice little icing

0:50 on top. Yeah, it's not that person swooping in from Silicon Valley saying they're going to solve all our problems. Yeah.

0:56 Yeah, that is a problem, at least with, uh, with our industry. Um, I worked for a fair bit at Chevron, like he said, um, kind of worked everything from the upstream side of the business,

1:08 uh, all the way through midstream, but a part of my job, uh, whenever I was there was looking at new technologies and scouting ventures, and one of the critical sort of imperatives we would weigh

1:19 when we were looking at new people was whether or not they had any experience in the industry. It really is sort of a teaching effort to get people to understand what we're doing. If you don't

1:28 bring any background at all from the industry perspective, it's really difficult to get in and be successful. Yeah. I mean, you bring someone in and they think that when you talk about

1:38 natural gas that you're talking about gasoline, right? Yeah. It's a big problem. Yeah. No. You do have that. It's in a big pool underneath the earth. It's not going anywhere. That's

1:47 my favorite. That's my favorite Daniel Day-Lewis line. Yeah. Let's drill over here. Absolutely. No, I think that's something we've talked about a lot is just, you know, is it easier to teach the

1:59 technology domain or energy domain, if you're trying to bridge the two, right? Is it better to hire someone that knows the tech a lot and then try and teach them the oil field side or the energy

2:08 side? Or is it easier to take someone who has the domain in that space and then teach them some additional things on the data side? And I'd typically fall into the latter of those, in that the SME

2:20 is a lot easier to train up on the technology side, generally speaking, of course. But I would tend to agree, it is a lot easier to teach an oilfield professional, especially from an

2:32 engineering background. Today's engineers, they code, they do all kinds of things. So they understand the digital domain really well, but it's much easier to take an engineer with domain experience

2:44 and apply that experience to the technology, I think, personally. That's the thing. I've always loved coding because it's very logical, and engineers are logical to a fault a lot of

2:55 times. And so teaching us coding logic is very straightforward, especially now that you have more approachable languages. You know, I mean, I'd say probably R and Python and even SQL

3:06 are, you know, kind of in a lot of engineers' tool belts now. And they're much more approachable languages, whereas there probably weren't a ton of petroleum engineers jumping in

3:14 like, ooh, I'm going to write some Java or C. Well, I mean, VBA was probably the closest thing before Python took off. But I remember in college doing derivatives in

3:24 VBA in one of my classes and being like, this is really stupid. Why would we ever use this? But yeah, no, it's funny to see how that's kind of come full circle.

3:36 I'll jump in with our tech story for the day. There's an article talking about MIT's robot learning breakthrough. It says MIT researchers have developed a novel training method for robots

3:48 inspired by large language models, combining diverse data sources to enhance learning and adaptability, and some of the key applications they list include robot actions, so using language models

3:59 to interpret human commands and generate appropriate robot movements, employing vision models to enhance a robot's understanding of its environment, training general models to map human instructions,

4:09 and utilizing a generative design process to create more efficient and innovative robot structures. And so like this is really cool because this to me is where everything ultimately ends up, right?

4:20 Where it's like, I was thinking about it, and language models to me are effectively just the most efficient way that we have today for a human to communicate with a computer and for both sides to

4:33 actually, generally speaking, understand what's going on, right? Before you'd have to learn the code and the code was the language of the computer, you know, Python, whatever. But now you

4:42 don't even have to, we're getting to that point where you don't have to, and you don't have to go train a vision model, you know, that understands us, you can just pull them off the

4:51 shelf almost and start using these things. And so I think it's pretty awesome to see that we're using new technology, multiple new technologies, to enhance all of them. Yeah, you know,

5:03 kind of one plus one equals three almost in a way. Yeah, but I think also, I mean, points even to what we've talked about, like that these probably more niche models or smaller models are gonna

5:12 be what really, you know, moves things forward as well. Because I mean, you can't put, you know, OpenAI's full model on a microprocessor on a robot. It's actually something we're working on

5:24 with Dell right now to essentially define deployability for on-prem, just sort of infrastructure in general and what's the maximum sort of model you can deploy within that infrastructure and how

5:36 effective it can be. And so we're doing a little bit of work to define that for the federal government at the moment, which is pretty interesting. I agree with you completely. Yeah, 'cause even

5:45 like with something like this, whether it's a robot for a certain purpose, it doesn't need to know all of this stuff. Correct. Again, say in oil and gas, you know, I've seen, you

5:53 know, people using the Boston Dynamics dogs and they want to do tank inspections and stuff like that. You can just feed it the necessary information for that, and it's gonna be much more

6:02 effective too because it's not gonna have all that noise and, you know, just more things to hallucinate on. So yeah, no, I agree with you. I'm very glad we're not trying to boil the ocean like Perplexity

6:11 or ChatGPT, the foundational model companies. Because, for one, it's a lot easier to have better answers when you have a smaller data set that's focused on things, but it's also time consuming.

6:24 That's a really good lead-in for us, because we as a company figured out a way to completely navigate around that. We don't care if you have a 27-trillion-parameter trained model, it doesn't matter to

6:38 us, we can actually whittle through that because of some proprietary processes that we have. Like, we can essentially create hallucination-free outcomes regardless of how large a model is or how broad

6:49 it is. So we can take the ocean and we can distill it down to its constituent parts associated to the problem that we're trying to solve, and really narrow down the training corpus itself to the thing

6:59 that we're trying to solve for. It's kind of one of our pieces of the secret sauce puzzle that data² tried to solve. Sweet. No, I think that's a fascinating problem, because the two trends

7:11 in LLMs are either bigger, you know, bigger, more features. Sorry, yeah. Right. Or smaller: we're gonna run this on your local phone, laptop, yep, Raspberry Pi, I was gonna say. Who knows, we

7:26 might even get to being able to do this in the browser at some point; there's no reason you can't. Yeah, you could tap into the GPU, right? So I mean, yeah, no, it's gonna be

7:36 very interesting to see, but to your point, I do agree that

7:41 for the kind of enterprise business use cases, ultimately people are gonna look at trying to get the smallest, most efficient models, which lead to the best, most efficient answers,

7:55 because all this compute is not cheap, most of the time. No, it's not cheap, and really, from an infrastructure perspective, it doesn't scale very well whenever you start trying to

8:06 plan it out. Exactly. It's probably not necessary, yeah, 100 percent. Yeah, that's it, there's a lot of unnecessary components and costs associated to trying to do it, at least within the way that we

8:15 understand it today. It's one thing to store all the data in, like, some S3 buckets that are 23 bucks per terabyte. I mean, you can even argue about whether we need all that and

8:25 how you store it, but then the compute has always been the most expensive part, whether you're using Snowflake or whatever else. And with training LLMs and even serving them, it's no secret now

8:37 that that's probably among the most compute-heavy things we have in the world, if not the most, and you've got Google and Microsoft and everyone buying nuclear reactors, or

8:48 data centers offset by nuclear reactors, for that reason, 'cause they know it's not sustainable from an energy standpoint. Yeah, I'm waiting for the Bitcoin mining operations that

9:00 are tied into natural gas to shift to just sort of a data center approach: like, you know, where are our stranded gas assets in the fields? How can we hook up to them? And then how can we

9:09 power data centers associated to the fields? That is absolutely going to happen. And now with things like Starlink, I mean, you can do it more remotely and you can have, you know, I/O, you

9:17 know, from there. Absolutely. You don't need the communications infrastructure for it. Yeah, no, I definitely see that as another big, like, coming trend, or it's already happening, kind of

9:26 underway. But, you know, the Permian has all this associated gas and they're basically at negative prices, right, they're getting nothing for it. And so they're sitting there, and instead of moving

9:36 the energy to the source of the load, they're going to move the load to the source of the energy because it's the most efficient way to do that. Well, and now with the IRA, I mean, again,

9:45 because, you know, whatever you think of it, burning, you know, burning the natural gas is okay. Flaring it is not okay. You know, like, burning it for a purpose, yeah, is fine.

9:56 So, I mean, people are going to say, hey, how can I at least monetize this in some way? Or at least not have to pay a penalty, which isn't monetizing it. I did love, though, reading the news

10:06 a couple of weeks ago and seeing that Microsoft is buying Three Mile Island. So they can now set up their own data centers powered by nuclear power, which is crazy, but makes so much sense. Yeah,

10:19 it really does. I've said it too, I think I've talked to Chuck about it here, and I saw it again this morning: if big tech is what it takes to get more nuclear built, I'm here for

10:27 it. Yeah. I mean, like, again, you know, energy companies are

10:34 gonna become tech companies and vice versa. Yeah, there's a convergence for sure, it seems like. There has to be. I mean, compared to just sort of the way that we did business in the past, the fusion of

10:43 these two industries really starts to change the dynamics and cost-efficiency hurdles associated to sort of maintaining very large, diverse, disparate portfolios. The more you can do to

10:55 centralize how you think about these things and connect technology at scale, the better. That's a lot of what I sort of tried to do at Chevron in a variety of different places, and it's the key to growing,

11:05 I think, in a really fiscally sound and smart way. Yeah. Let's get into what you guys do. Okay. Yeah, I'm the AI guy here. Yeah, go for it. No worries. Tell us a little bit about what

11:19 you guys do. Cool. I'll give you just a quick background of the company. We're really young. Uh, so I founded the company when I was living in Sydney, Australia. I'll get quickly to why I was

11:29 living in Sydney, Australia. But I founded the company in Sydney in March of 2023, and we really didn't do anything with the company at first. I got with a bunch of my friends at that point, people that I

11:42 trusted, and just tried to figure out how we could make sense of the new technology that was hitting the market and what it meant for legacy workflows and the way that we used to do business. We

11:54 really spent probably a good five months working together just to figure out how we wanted to approach the market. Are we gonna be a consulting firm? Are we gonna be a software firm? Are we

12:04 gonna build products? Like, what are we gonna do? And it wasn't until July 2023 that we all flew to Vegas, got there, locked ourselves in a suite at the Mirage, and tried to figure it out. And

12:16 at that point, we had done enough testing with LLMs and some of the other things that were in place to define how well these things worked, how well they understood the domains. They scaled to a

12:27 sort of broad domain-ontology awareness level, and these things worked really well. But we figured out that they worked much better if you could feed them data in the form of context in a

12:41 really interesting way. So our CTO, Jeff Dalgleish, started a company called Maana after he left Chevron; we worked at Chevron for 10 years together. And they worked on knowledge graph data models.

12:55 So they wanted to build an oil and gas, like, sort of broad data model that could be applied for product development in a lot of different places. Really big ask. Young company, LLMs weren't there.

13:07 They raised $76 million. They did a ton of work. They worked with every company, but the way to operationalize it didn't exist yet. It took a lot of programming and a lot of intent. And so we sort

13:19 of took that sort of mindset and said, you know, what if we start feeding the outputs of the knowledge graph data models into the LLMs as context and see what happens, because you can

13:32 essentially create a transparency roadmap between how data is connected and why it's connected in the form of that data model. And we found out pretty quick that it worked really, really well. And

13:46 the efficacy rates on the returns that we were getting from systems that are broadly commercial and open now were significantly better, not just marginally better, significantly better, when we

13:56 pair these two things together. In July, we got together, we figured this out, and we said, all right, we're going to build a platform, and we're going to build products. Shortly after that,

14:06 we got into an accelerator program with the DoD, and we had to figure out how to personalize this to the DoD space. But this is probably a good stopping point. I'll circle back to that in a second, to get to who

14:18 we are as a company. So, Jon Brewton founded the company. My background: BP, Chevron,

14:25 very much upstream, and then at the tail end of my time there, midstream. I went into a bunch of different college programs, studied at NYU, the London School of Economics, MIT, Stanford, and Harvard. And

14:42 through that, ended up making some connections with people. And I left Chevron in 2021 to move over to Sydney, Australia and run a new company. It was a joint venture between a company called

14:54 Quantium, which is the largest AI and ML bespoke product development shop in the ANZAC region, Australia and New Zealand, and Telstra, which is the largest telecom, and they

15:06 wanted to stand up an industrial support business, really targeting sort of industrial construction, oil and gas, and different things. So I came over there to run that. And I was there for

15:18 about six months, and I thought, this is really silly. Like, I left Chevron to go work for somebody else, and now I'm building a company for somebody else. I should just start my own company. So I called

15:29 Jeff Dalgleish, who's our CTO, who was with Chevron for 18 years and ran upstream IT for Chevron. We had worked on a ton of projects together, so, hey, let's try to figure this out. We also

15:41 called my buddy Chris Roerbach, who's our chief operating officer. Chris is a retired Navy SEAL commander; he actually retired on March 1st of 2023, and he was our third sort of employee.

15:55 I guess at that point nobody was getting paid, so do you even call somebody an employee? Yeah, he's a founder. But Chris ran SEAL Teams 2, 6, and 10. And then a couple of my buddies from Harvard: Eric Costentini,

16:09 he's a Marine Corps vet, he's our chief business officer; Amanda Fatch, who works on all of our marketing stuff, she's got a deep, deep background in analytics; and another buddy of ours named RJ Deaton,

16:24 who is an ex-Army intelligence officer and NSA officer, and then he worked in sort of product delivery and consulting into the federal government. So we started to build out a company and do some very

16:36 interesting things together and I think at that point we all circle back to the military side of this thing. So we get into this DoD accelerator and we have to figure out what to do. We'd only

16:50 worked on abandonment use cases at this point. And the DOD doesn't necessarily care about abandoning wells. And so we had to figure out something that would personalize to the space. So we called

17:02 one of our buddies who is really close with a professor that does all the

17:12 NSA, CIA, FBI, analyst training programs before they roll into their roles and got his use case that he does as a capstone project. And this use case is really interesting because it's really

17:23 differentiated data. It's not one mode of data. It's large volumes of data. So you have signals intelligence, human intelligence reporting, financial intelligence, and then sort of on the ground,

17:34 just like intelligence reports. And this is both structured, semi-structured, unstructured, and transactional information. So the reason the story is really funny is because we built this platform,

17:48 sort of version one of it, and we tested it on an abandonment use case. It worked fantastically. But that's the only thing we had tested it on. And so we get this new cache of data, like 380 gigs of

17:59 reports and data and stuff that, you know, it's all, it's all synthetic data, but data we don't understand very well.

18:07 We actually ingest this stuff and we go back to the professor the next day and say, all right, so we ingested the information. We don't know if it's correct. We need you to, like, you've seen

18:15 this, can you take a run through it, ask it questions to see if it gives you some good results, and let us know what's wrong. He calls us back eight hours later and he's like, this, like,

18:25 can't be right. We're like, well, we did it 14 hours ago. We know it's not right. Like, you know, it's not right. So let's start with that. And he goes, no, no, no. Like, I have

18:37 asked every single question I can think to ask and it's coming up with a perfect answer. And like, within a day, this shouldn't be possible. But he said the other thing that's really interesting is, I

18:46 wrote all of this. Like, I wrote it all by hand and produced it in spreadsheets and it's all fake, but it defined 20 percent more connective tissue in the data model than I knew existed, and I wrote

19:01 it. And he goes like, this is like really, really interesting. So at that point, we thought to ourselves like, oh, well, like we might be on to something. Who knows, we're not sure. So

19:15 let's test it out on other stuff And so we started hitting all of our friends up saying, do you have any data that we can take and that you're familiar with that you could sort of pilot test for us?

19:26 So we did a housing market case in Victoria, British Columbia, because our CTO lives in Victoria, British Columbia, and he's friends with some folks. It's all public data, but just checking to see

19:38 how the real estate market had transformed over that time. And we ended up finding, like,

19:45 fraud and sort of collusion between a bunch of companies to like buy up large swaths of the property market. It never happens in real estate. Yeah, exactly. The funny part is, is you start to do

19:57 these things and you don't really know. It's the people that understand it the best that can really validate what you're getting into. So everything we tested against and the people we tested it

20:07 with kept coming back like this is insane, like you shouldn't be able to do this. And so that's when we knew we were kind of onto something, but it's a good thing to sort of circle back. Like

20:17 there's three real common themes in our company with the founder group and the employees that we have. We're really energy experts, just kind of get back to the topic at hand. But we're also a

20:28 bunch of military vets that served in really interesting capacities. You know, Eric was like an attaché for generals and has like an insanely high security clearance. Chris was a Navy SEAL team

20:40 commander, Teams 2, 6, and 10 over the years, so he's got an insane clearance and has seen a few things, and RJ, an Army intelligence officer and NSA officer, same thing. I was an honor guardsman in the military, and so we

20:54 all have these really weird clearances and experiences that sort of bond us around that domain. Jeff, myself, Eric, Amanda, we've all worked in energy, and we had really diverse and interesting

21:05 careers that didn't follow a typical career path. Even though I was a drilling engineer and a wells person, the majority of my career I worked on almost everything but that stuff. And Jeff, the

21:17 same sort of thing across upstream, and Eric brings a mining background. We have some utilities background as well. And then the other thing is that we just studied at some really interesting places.

21:28 So most of my school was sort of like sponsored by Chevron. They were like, go to this thing, go to that thing, go to this thing. But we have sort of ties, whether it's through NYU or the London

21:39 School of Economics, MIT, or Harvard, that

21:43 really bind us. So it's an interesting, really diverse group of people that are focused on building really interesting solutions that can interface well with large language models and create some

21:54 really interesting results. The thing that we hang our hat on is the high reliability stuff. Like we want to work in spaces where, you know, 95 percent isn't a good enough answer. Sure. Because that's the

22:04 thing that we built. Yeah.

22:07 So is this mostly, is this kind of like a RAG-type architecture as far as what you're using? Yeah. So I think of it almost... So there's RAG, there's GraphRAG. And what we do is sort of like Barry

22:23 Bonds-level, like human-growth-hormone RAG, you know, Barry Bonds, 73-homer season. Our stuff is a really advanced form of GraphRAG that is slightly different. And some of the things that we

22:40 have from a patent perspective are the reasons why it's slightly different. The efficacy rates on traditional RAG applications range anywhere from 30 to 50 percent. If you have a graph interfaced with it

22:51 from a GraphRAG perspective, you can get anywhere from 50 to 70 percent. Our stuff has a

23:01 zero-hallucination basis and is roughly 99 percent accurate. The only thing it really trips up on is function calls for mathematical operations, which is an interesting problem to solve from an engineering

23:13 perspective, but it's not an insurmountable problem. That's really about defining how you can partner with folks that can help you build these libraries out so that you can call them effectively. I think

23:23 that's the coolest part about where we're at in this stuff today, is that it's the worst it's going to be. You can clearly see, okay, this is a universal problem. Someone will figure this out.

23:29 They're working on a specific resource, or this will be in a foundational model soon enough. So it, oh yeah. So you've brought up graphs a lot.

23:43 And, let's see, can you maybe talk about that? Originally, like, you were talking about Maana and the knowledge graph. Can you talk about that? But then maybe also how

23:53 maybe that funnels into this. And I mean, correct me if I'm wrong, and yeah, you can say as much or as little as you want, but I think you guys are using Neo4j for some of this as well. And

24:01 we should talk about what a graph database is. I mean, we haven't really dove in, or dived, or whatever the hell the word is, into any of that. So I mean, I want to go into that topic. At a high

24:10 level, what is a graph? Look, it's really just a data model. At the end of the day, you have entities and evidence. They are called nodes and edges within the data model. And you have these

24:23 things that you can store information in and you can store the connection of any one of these pieces of information to another one. And that's represented in the edge connections in the data model.
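
To make the nodes-and-edges idea concrete, here is a minimal sketch in Python using the networkx library; the well and facility names are hypothetical, purely for illustration.

```python
# Minimal sketch of the nodes-and-edges idea; entity names are made up.
import networkx as nx

g = nx.DiGraph()

# Nodes (entities) carry properties.
g.add_node("Well_A12", type="well", status="producing")
g.add_node("Battery_7", type="tank_battery")

# Edges (connections) carry properties too, capturing how and why
# two pieces of information are related.
g.add_edge("Well_A12", "Battery_7", relation="produces_to", water_bbl_per_day=450)

for src, dst, props in g.edges(data=True):
    print(src, "->", dst, props)
```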

24:33 And that was really Maana's goal. They wanted to build out a very, very large and effective... now I'm putting words in Maana's mouth, and somebody will listen to this and be like, that's not exactly what

24:43 we were doing. But a really large and effective data model that incorporated almost a holistic domain ontology and just sort of the way things are connected holistically in the industry, you know?

24:57 Like, what does a production facility look like? How is it connected to different parts? How is it connected to external components? Like all of these things is really what they were working on.

25:07 And think of it from a data model perspective, like building an environment or a network. The way that I like to explain it to people is we try to build networks around problems that we'd like to

25:18 solve. And one of those problems I'll get into is produced water management. So produced water management, especially in places like the Permian, is a really large thing to manage. It can

25:31 water down your economics. It can cause you to have to shut in wells. There's all these LPO things that happen as part of this whole distributed network. But that problem is a network. It really

25:42 is definable. And so where Maana was trying to sort of boil the industry ocean, we're taking a completely different approach: let's define a problem space that we want to interact

25:52 with, and then define the network associated with that problem space. That's a lot more manageable for trying to solve problems. So with produced water management, if we can map all the inputs and the outputs

26:02 of the system, we can essentially, if we internalize the data correctly, tell how things are connected, why they're connected, and the value of those connections from a system perspective. And

26:14 if you can define that, you can start to model within that network. And that's really, really interesting, because it gives you an opportunity to start to lever different pieces of the network to

26:25 see what effect it has, both upstream and downstream of that network, and how that can affect your economics.
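
As a rough illustration of levering one piece of the network and seeing what sits upstream and downstream of it, a directed graph makes that a one-line traversal; the produced-water nodes below are hypothetical.

```python
# Rough sketch: what is upstream and downstream of a node in a
# hypothetical produced-water network, before you change it.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("Well_A12", "Battery_7"),
    ("Well_B03", "Battery_7"),
    ("Battery_7", "Pipeline_East"),
    ("Pipeline_East", "SWD_3"),
])

node = "Battery_7"
print("feeds into it:", sorted(nx.ancestors(g, node)))
print("depends on it:", sorted(nx.descendants(g, node)))
```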

26:35 So really, think of it from a data model perspective: it's not a relational database, it's not a typical SQL database or anything like that, but it's the exact same information represented slightly differently. It's a graphical construct. It's still a relationship diagram. It's

26:45 kind of like an ERD, right, to an extent. Yeah, I mean, I looked at, you know, Neo4j when John and I worked together at Reservoir Data Systems and we were working on the web

26:55 app, and I was looking at every different database, time series databases. And, you know, at that time, Redshift or, you know, even just TimescaleDB on Postgres and all that. But like,

27:05 that did come up, the graph database came up. It didn't seem like it worked necessarily for our purposes, you know, a very targeted time series approach. But it was interesting to look

27:15 at and explore for a few days, too, and just like, think about the possibilities. Oh, yeah. I mean, really, I think probably the easiest way for people to think about this is say, like, a

27:21 social network, right? It is, yeah. I mean, like, you know, I'm connected to John and you're connected to me, so now you guys are connected, and then, you know... how does the line go? Six degrees of

27:29 separation. Yeah, that's it, that's it completely. And it's really funny, like, Facebook runs a graph interface in the background so that they can tell these things, and LinkedIn does the same thing.

27:42 LinkedIn's got an amazing sort of graph database model that they apply to things. You'll see representations of that in natural language when you log on to LinkedIn and it says you're a third-order

27:50 connection to somebody. I was like, how do you define that? Well, the connective tissue that you just described, right? The fun part is how and why you're connected and being able to ascertain that.
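
That Nth-order connection is just a shortest-path length in the graph; a toy sketch, with made-up names:

```python
# Toy sketch: a "third-order connection" is a shortest path length of 3.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("Bobby", "John"), ("John", "Jon"), ("Jon", "Jeff")])

print(nx.shortest_path_length(g, "Bobby", "Jeff"))  # 3, i.e. a third-order connection
```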

28:00 And this is where large language models start to interface with these data models and create some really interesting results. Yeah, I can see that for sure, just 'cause when it has

28:09 that context connected, like formally defined, right? It doesn't have to decipher it or make it up, right? It can actually say, I know these five things are true. And then with that knowledge

28:21 in and of itself it can triangulate, probably, the same way. Yeah, and Neo4j is a great partner. They're really hooked up with the federal government, which is another place that we play. And

28:32 they're also, they've just executed a bunch of partnerships with all the hyperscalers, but they're a company that's on the rise and doing pretty interesting stuff, at least in terms of

28:43 introducing knowledge graphs to the industry. And they actually have, I mean, like, a UI associated with it that's pretty powerful too, right? Yeah, NeoDash is a cool front

28:51 end you can play with. You can actually start to mock up your own sort of product interfaces if you'd like through that. It's really agile and really easy to use. And one of the things that we're

29:03 trying to

29:05 do, whenever you code or interrogate in Neo4j, you can do it through a language called Cypher. And Cypher is intuitive, but not intuitive. It's sort of another thing that you have to learn. One

29:16 of the things that we really tried to solve for is being able to query the elements of the graph in natural language. And so we built sort of an interface where you can just type something in, in

29:26 natural language: I'd like to see this or that. And it will essentially write the Cypher statements, pull the information out, and apply a product to that. So we're trying to make things as easy as possible.
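
A rough sketch of that natural-language-to-Cypher flow, assuming a local Neo4j instance; the connection details, graph schema, and the generate_cypher stand-in for the LLM translation step are all hypothetical.

```python
# Sketch of natural-language querying over Neo4j. Credentials, schema,
# and generate_cypher() are placeholders; an LLM would do the translation.
from neo4j import GraphDatabase

def generate_cypher(question: str) -> str:
    # Stand-in for the LLM step that writes the Cypher for you.
    return (
        "MATCH (w:Well)-[:PRODUCES_TO]->(b:Battery {name: 'Battery_7'}) "
        "RETURN w.name AS well, w.status AS status"
    )

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
cypher = generate_cypher("Which wells feed Battery 7 and what is their status?")

with driver.session() as session:
    rows = [record.data() for record in session.run(cypher)]

# If the graph holds nothing that matches, say so instead of guessing.
print(rows if rows else "No matching records in the graph.")
driver.close()
```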

29:37 Yeah, so now you don't have to learn yet another query language to manage it. Exactly, yeah. That's it. Okay, so we've got the graph part, and then there's still an element of RAG to this, right? So

29:52 you're retrieving information from the provided source docs, whatever they may be. Yeah.

30:01 Are y'all doing any, I mean, are y'all fine-tuning models or using foundational models? What's the approach? We use everything and we don't fine-tune anything, kind of to a point that you raised earlier. I

30:11 mean, our biggest thing is we'd like to sort of hack the system and how you interface with large language models. A lot of that's done through just being able to represent a graph in language. And

30:23 doing that, to your point earlier, gives us an opportunity to find the connective tissue and the roadmap you need to crawl down if you want to answer any question.
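
A minimal sketch of what "representing a graph in language" can look like: walk the relevant slice of the graph, write each connection out as a plain-English fact, and embed those facts in the prompt. The graph contents and the ask_llm stand-in are hypothetical.

```python
# Minimal sketch: serialize a slice of the graph into plain-language facts
# and hand them to the model as embedded context. ask_llm() is a stand-in.
import networkx as nx

g = nx.DiGraph()
g.add_edge("Well_A12", "Battery_7", relation="produces water to")
g.add_edge("Battery_7", "SWD_3", relation="ships water to")

facts = "\n".join(
    f"- {src} {props['relation']} {dst}." for src, dst, props in g.edges(data=True)
)

question = "Where does the water produced at Well_A12 end up?"
prompt = (
    "Answer using only the facts below. If the facts do not contain the answer, "
    "say you don't know.\n\n"
    f"Facts:\n{facts}\n\nQuestion: {question}"
)
print(prompt)  # this is what would be sent to ask_llm(prompt)
```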

30:33 So a lot of large language models, the way they work now, they work in a way where they're trying to derive context from the way that you write a question. They're trying to narrow and point that context to the relevant

30:43 parts of their training corpus so that they can define exactly how to answer that question. There's a couple of things that sort of cause large language models to both hallucinate and generalize.

30:55 And it really is in how you ask questions. Language you use is really, really important. And if you ask a really simple question, you know, something like help me optimize my production. You

31:10 know, production can mean a lot of things to a, you know, a seven billion parameter model that's, you know, trained. Yeah, exactly, exactly. And if you ask a really detailed question, like I

31:23 want to optimize my production on this within this industry and look at these things, it actually doesn't really help these large language models to narrow in. You get a little bit more efficacy,

31:35 but you start to get broader answers. So you get some generalization on both sides of that equation. We're trying to attack the middle of it where we can ask really, really easy questions with

31:45 embedded context. It increases the likelihood of it being able to pinpoint from a training corpus perspective, exactly the information it needs to recall, associated to the thing that you've asked

31:55 it within the context of the information you've provided it. And that sort of loop really starts to cut down on hallucinations at scale, and it increases token level probability, it decreases the

32:08 overall entropy in the models, so just sort of the predictive uncertainty, associated to answers that are provided. It starts to allow you to target the parts of the training corpus that are

32:19 relevant to the thing that you're trying to ask without having to write a question in seven paragraphs. You can ask a very simple question and get a very, very good answer. So let's, I just realized this,

32:30 but we've talked about RAG before on the program, and I don't know that we've actually defined what RAG is, so you might be a good person to break that down for everybody. And then also talk about

32:41 the training piece, because this is something where I think a lot of people either are uninformed or don't understand, like with RAG specifically, why it's such a nice kind of hack to the system, so

32:53 to speak: you don't have to train anything, and you can have a RAG stood up pretty quickly. Yeah. And so I think there's a lot of confusion around that. So if you can kind of just dive into that a little bit.

33:05 Yeah, so Retrieval Augmented Generation. I'll try to do this without getting too nerdy and getting too much into the weeds. You're in good company. That's what this podcast is for. Yeah, no.

33:15 People can Google later if they don't. Yeah, 100 percent. I'll try to keep it simple. When you train, or when you interface with something from a Retrieval Augmented Generation perspective, you're

33:25 essentially trying to look at things like cosine similarity and understanding, from a contextual perspective, how things are connected. There's mathematical proxies for these things. They can

33:35 use numbers as a representation of distance or similarity; vectorization is the thing, you know, people talk about. And that creates a super simplified version of what a graph does sort of in a

33:52 visual construct, right? So it gives you a proxy for similarity of different things. And it helps you define context associated to the questions that you're asking through some of these methods.
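
Since cosine similarity comes up a lot, here is the actual arithmetic on two made-up embedding vectors:

```python
# The cosine-similarity math behind "vectorization", on made-up vectors.
import numpy as np

question_vec = np.array([0.2, 0.9, 0.1])
chunk_vec = np.array([0.25, 0.85, 0.05])

cosine = np.dot(question_vec, chunk_vec) / (
    np.linalg.norm(question_vec) * np.linalg.norm(chunk_vec)
)
print(round(float(cosine), 3))  # close to 1.0 means the two are "about the same thing"
```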

34:05 Vectorization is really good and it can get you pretty far. The efficacy rates on just sort of general vectorized models whenever you look at returns from a commercial perspective, they do really

34:16 well depending on question types, but they're best on simple questions, you know, simple facts that are unlikely to change over time, like your birthday or, you know, the author of a

34:28 book, and simple-with-conditions questions, stock prices on a certain date, or directors, what are their recent movies and what genres are they in. If you start to get into the space of multi-hop questions

34:43 where you're really looking at chaining multiple pieces of information together, this is where RAG falls off a table. This is where it starts to fail, because whenever you start to chain things,

34:54 you're looking at disparate or bifurcated pieces of information and trying to make logical assessments about how closely they are tied to the thing that you're trying to solve for. And that doesn't

35:06 scale across documents. Even if you start to chunk or embed different things, so if you're breaking documents down and

35:17 you're chunking them and you're using traditional RAG methods, you can vectorize sections of that and then you can start to raise your efficacy rates on interrogation. But if you're using three

35:22 documents at a time, that's not gonna work. You're not gonna be able to hop from this thing to that thing, because it's just really difficult for that whole process to take place in a really

35:33 efficient way and an effective way. This is where GraphRAG starts to change the efficacy curves on what you can get out of these systems, because GraphRAG represents this in a whole sort

35:43 of dimensional way. You can look at how things are connected across documents, across data types, and across environments, and you can define the actual strength of those connections very tangibly

35:54 between them. And so a lot of people talk today about hybrid RAG, and you'll see from time to time that we're doing hybrid RAG stuff. And that's great, because that's sort of an additional layer of

36:06 fidelity you can build into the way that you interrogate information. And that's combining sort of a graphical structure and vectorization from

36:16 a structured data perspective to sort of build a bridge between these two methods. And that can get you somewhere between 70 and 80 percent correct. So depending on the problem you're trying to

36:27 solve, yeah, it can be completely good enough. You know, like if you're trying to build some calendar invite system that sends something to everybody on their birthdays, across departments

36:37 or whatever. Oh great, low stakes there. But if we're trying to build a bridge, maybe we wanna look at something that's a little bit more effective and a little bit more sound from an integrity

36:51 perspective. But that's really it. The question you're trying to ask and the thing you're trying to solve for should define exactly what sort of method you approach sort

37:00 of data interrogation through and with. And traditional RAG is great. It's great if we're doing, like, say we wanted to question a novel, you know. HR is one of the ones that

37:12 we're seeing a lot of, right. It's big corporate policy, right? Just like all the HR docs, right? And they don't change very often. Versioning is very basic and simple. Like that's

37:23 such an easy one, right? And how many thousands of emails a year get sent to the HR department asking the same 10 questions, right? And so things like that are such low-hanging fruit that

37:35 a lot of, you know, we're seeing that a lot as well. And so it's, yeah, it's a, I think it's interesting because, you know, once you see those use cases, right? Where it's like, it's not a

37:44 minimal amount of documents, but it's not just hundreds of thousands and gigs of stuff. So it's a decent size and it's very kind of targeted and focused on one thing. You get a very good, you know,

37:56 kind of solution. And then you're like, okay, well, now I want to start chaining these things together, and now we start getting agents and all this other fun stuff. Yeah. But like the whole RAG

38:04 thing, I think people also don't understand that, you know, you submit your prompt. The prompt gets vectorized and it gets looked at against the index, you know, basically the cosine similarity,

38:17 in most cases, against what else is out there. And then it says, okay, these five chunks or these 10 chunks, whatever your K is, these are the ones that are most related to it, and it takes those and

38:27 gives them to whatever language model you're using. And then the language model is what does all the summarization and the generation piece to give you the answer, based off your system prompts

38:36 and stuff. But nowhere in that process is there any training, other than your system prompt and some of the tools you can use there.
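
Spelled out as a sketch, that retrieval loop looks roughly like this; embed and ask_llm are placeholders for whatever embedding model and LLM you use, and k is the top-K just mentioned.

```python
# Sketch of the loop described above: embed the prompt, score it against
# pre-embedded chunks, keep the top K, and hand those to the model.
# embed() and ask_llm() are placeholders, not a specific vendor's API.
import numpy as np

def embed(text: str) -> np.ndarray: ...
def ask_llm(prompt: str) -> str: ...

def answer(question: str, chunks: list[str], k: int = 5) -> str:
    q = embed(question)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        sim = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        scored.append((sim, chunk))
    top_k = [chunk for _, chunk in sorted(scored, reverse=True)[:k]]

    context = "\n\n".join(top_k)
    # No training happens anywhere in this loop; it is retrieval plus generation.
    return ask_llm(f"Answer from this context only:\n{context}\n\nQuestion: {question}")
```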

38:48 And so it's a really great POC solution for a lot of people if you've got an idea like, hey, could we make an HR bot really easily? Probably. Yeah, we've worked with a couple of companies doing POCs on essentially internalizing

39:00 their maintenance processes, not the maintenance records themselves, that's sort of another thing to add in later, but their manuals for how they operate things. Like

39:12 that's a perfectly acceptable thing for sort of single-document, but large-document, interrogation. If you're asking simple, simple-with-conditions, or set questions, something about comparison

39:28 within a document, you can start to get really targeted answers, because there's not a lot of differentiation in the information. If you try to do aggregation across different methods, that's

39:38 where things start to lose some effectiveness. And the areas where you would look at aggregation would be a question like,

39:47 how many MCPs did I work on as a person? So if you're trying to talk about HR, a lot of other stuff that we're hearing different people look at, and some of the conversations that we're having

39:58 people are asking, what if I wanna build the ideal team for the execution of a project? How can I internalize all my personnel records to define who the best people are for any given problem at any time?

40:10 And it's like, well, I mean, you could do that, but if you wanna do real aggregated interrogations sort of across these things, traditional RAG will only get you so far. You need to apply a

40:21 different sort of data method and

40:24 I'm not sure if you're interested in that. Personally, and as a company, we think graphs are going to be the thing that changes the efficacy rates on AI implementation over the next decade. But who

40:34 knows what happens, right? The ground is moving under our feet as we speak. I mean, we've got quantum computing being worked on, and as that comes through and those get mashed together, who the hell

40:43 knows what it looks like? Probably scary. Yeah. Yeah. I don't want to get too much into the national defense stuff. Yeah. Like, these are things that people are talking about, for sure. And scary

40:55 is within the context, all in the eye of the beholder, but it's scary at the wrong scale, or if this gets in the wrong hands. Yeah. The scale is it. With the deepfake stuff now, I mean.

41:06 I'd say one last thing, just sort of on the vectorization RAG sort of stuff. One of the real bad areas that these things don't perform well in, with sort of a traditional RAG approach, is false

41:20 premise.

41:23 A great example question is: what is the name of Taylor Swift's rap album?

41:29 We all know that she does not have one of those things. Even if we're not a fan, we know who Taylor Swift is, we know what genre of music she sings. And an LLM, when you interface with it through sort of a

41:40 traditional RAG approach, really starts to fail on that. It'll start to make things up and be like, well, here are other rap albums, and if she had one, what would it be called, and what would

41:50 it look like given the history of her music? And it starts to try to answer things that aren't there. And in general, people would define that as sort of hallucination.

42:00 But GraphRAG gives you an opportunity to really whittle through that. It runs through and is like, that does not even exist. It does not even exist, yeah. Now, that's something... I love some of

42:13 these low-code tools, I love some of these very simple low-code tools for spinning up POCs, just to see, like, oh yeah, what does this look like? But the jump from that POC to a legit

42:27 production version of that is so much bigger than I feel like most people understand. It's massive, yeah. And so that's where, like, I feel like the tech community just screws itself so

42:37 much, is that they make things look so magical almost, and then people are like, oh, well if I can do that, then let's go deploy it. And then you start and you're like, oh shit,

42:47 this is very different. That's exactly what happened with one of the companies that we did a POC with. They were like, we would like to internalize all of the operating manuals for this satellite

42:57 system. And it's like, cool, no problem. So we showed them a RAG approach and then we showed them a GraphRAG approach, and the GraphRAG approach was really representing this data universe around

43:08 this satellite system: here are all your maintenance logs, here are all your manuals for how you deploy this, here are your troubleshooting manuals for how you problem-solve. And the

43:18 traditional RAG approach will get you to a point where you can question really effectively over a given document, but being able to take all of those things together and question with any rate of

43:30 efficacy is a totally different thing.

43:36 And that's where the project starts to really expand in complexity. And it really does sort of change the way that you have to think about how you look at POCs versus production scale implementations,

43:49 because there's a whole other universe of work required to get from here to there. And then what you're promising to the end user. Yeah, exactly. Well, that's the other thing, too, right? Like

44:01 with RAG generally, or graph RAG, it doesn't matter, you have a lot more control over the system prompt, the temperature, your top K, like all the variables, right? And so even something as simple as

44:13 going into the ChatGPT interface and their playground and doing a RAG where you change the temperature, changing the temperature can significantly adjust the output. But one of the...

44:23 there was a paper a while ago called ChatGPT is Bullshit. I love this paper. It's such a great paper. The whole premise, very briefly, is that bullshitters aren't interested in grounded facts. They're

44:33 just interested in an answer. And the

44:37 whole paper goes on to explain that that's what most of these foundational models are trained to do: give an answer. They don't have an idea of whether that answer is true or not. And so that's the

44:48 interesting thing with the RAG side of things, or once you start customizing these things: you do have the ability to put a lot more barriers and guardrails in and turn the temperature all the

44:56 way down. You can do all these things to make it not hallucinate as much or give better answers. Ours says, if it doesn't return any kind of similarity within a certain percentage, I don't know.
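
That "I don't know" behavior can be as simple as a cutoff on the retrieval scores; a sketch, where the 0.75 threshold is an arbitrary illustration:

```python
# If nothing retrieved clears a similarity cutoff, abstain instead of guessing.
# The 0.75 threshold is arbitrary; it would be tuned per data set.
def answer_or_abstain(scored_chunks: list[tuple[float, str]], threshold: float = 0.75):
    best = max((score for score, _ in scored_chunks), default=0.0)
    if best < threshold:
        return "I don't know."
    # Otherwise keep only the chunks above the cutoff and pass them to the LLM
    # with the temperature turned down.
    return [chunk for score, chunk in scored_chunks if score >= threshold]

print(answer_or_abstain([(0.42, "weak match"), (0.38, "weaker match")]))  # I don't know.
```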

45:08 And so it's like the amount of feedback that we've gotten on our end where it's like, I love that it says I don't know, is kind of shocking to me because you almost think like, why aren't the

45:17 foundation model guys doing something like that? Yep, 'cause it's just pissing people off if they're gonna give a bad answer. It's a bad user experience. If you're gonna give me a bad answer,

45:26 just tell me you don't know. 'Cause as soon as they get a bad answer that they know is the wrong answer, and again, you're doing it for domain experts, they're like, well, this is bullshit. I can't

45:32 trust it to answer this question. I'm just gonna kick it to the curb, rather than, okay, it doesn't know that one, but let me see what the next thing is like. Correct, yeah. So, I

45:41 wanna make sure we hit on some of the use cases you talked about. Then I also wanna make sure we cover, like, how do you guys deploy? I mean, is it something that can be deployed on a customer

45:51 company's own infrastructure, whether it's on-prem or their own public cloud, or do you guys host for people? And then I think the last part, also, you've talked about it before, but like, how do you get

46:01 data into the models, right? So, I know there's a lot to chew on there. Yeah, where did you wanna start? Let's start with the use cases, I wanna make sure we cover that. Yeah, yeah.

46:08 We need to talk about abandonment, originally it'd be interesting to talk about where it all started, right? Yeah, abandonment, so

46:17 US, Lower 48, mostly, there is about a $500 billion abandonment liability that's sitting out there with idle assets that were producing at one point in time and are no longer producing today. And I

46:33 was a drilling engineer and I did a lot of abandonment work. I wrote abandonment legislation with the regulatory agencies for different places that we operated. And abandonments are really hard to

46:48 solve for a couple of reasons. One, these assets change hands over time. Our industry is inorganic in nature. So we essentially buy assets. We normalize the language around those assets to our

47:00 local language. And this is less of a problem today, but if you think about the 50s, 60s, 70s, 80s, 90s, even into the early 2000s, this was a persistent problem where people would just call things

47:12 different things. You'd lose records, you'd have all these fact-finding missions where you're deploying impression blocks into wells to understand what the hell's going on downhole. You know,

47:23 there's all these things that you have to solve for. So abandonment is a really hard thing to solve for. So that was the first problem we wanted to tackle because we wanted to see how they could

47:31 handle complexity within sort of temporal elements, right? So if we're looking at it over time, like, how can it reconcile history? And abandonment is a great place to start because you can get all

47:46 this really jaded history. The thing that led us to where we are was, we had to, at that point, figure out a way to internalize structured data, unstructured data, hand-written notes, like

48:00 images of things. So we needed to create something that was multimodal in nature, and we needed to interface with something that could understand all of these different things. And the only way

48:10 that we could figure out to build a bridge between these two universes was our graph data model, and that tended to work really well. So we started by looking at whether or not we could take an

48:21 asset that was 100 years old and reconcile where it was today, what was downhole, and what it meant over time. And we found that the system was really, really good at doing that. It could figure

48:34 this stuff out, taking in subject matter expertise. I'd say as a caveat for everything: you need data and you need subject matter expertise. If you have both of those things, you're in a really good

48:45 spot because you can start to play around. But we had that and so we were able to understand whether or not it could analyze key factors like well integrity over time, historical production rates,

48:56 what that meant for environmental implications, its regulatory status and different things like this. And it was great at that. It was really, really good. And so we started to widen the net a

49:06 little bit more just to understand what we could do And so the next thing we worked on was a lease and block acquisition advisor.

49:14 A lot of people are talking about this stuff today, but we were essentially working it from a portfolio strategy perspective. We were looking at what the portfolio imperatives were for a company and trying to

49:23 understand where they had idle or stranded assets that were no longer in operation but still had productive life, because maybe they were just filled with a whole bunch of vertical wells, right? Nobody

49:33 ever re-completed the horizontals, you know, things like this. And where we had neighboring facilities around these pieces of land that were just sitting there idle, we'd be able to say, you know,

49:45 there are a lot of things around this that look really good. So we can infer from a subsurface perspective that there's reservoir continuity from this asset to this asset and sort of across this

49:56 plane and start to look at the history of it and define whether or not somebody had operated it, operated it within sort of a conventional approach and whether or not there was strategic portfolio

50:06 fit from a reservoir perspective and a recovery perspective over time. And so that was the next thing that we looked at. And that was fun, because we got to internalize around 8,000 leases and

50:19 8,000 permits. We got to look at general areas of sort of inference all across the Permian and Delaware basins. And we really got to try some interesting things about how we could compare different

50:32 assets to one another and start to try to model, from an inference perspective, what the predictive life was of these new assets that we could acquire. That was the next thing that we worked on, and that led

50:43 to a fairly logical question around how we can use this stuff to look at lost production opportunities, or minimize them, while we're trying to do development within a field, and what that means for

50:54 our shut-in radius and what we're going to do for executing fracs and everything else. And so that started looking at average daily rates, volumes, fluids for these wells, what it meant for our

51:04 service facilities, what the old rates on these things were, how we can actually manage throughput to these assets in a more effective way, so that we could just operate the asset more effectively

51:16 and predictably. So that led very logically to lease operating expense advisory. And so we started trying to build agents that we could deploy for very specific aims: hey, we want to operate

51:30 this lease, primary goal being to significantly reduce the lease operating expense across these assets and try to maximize our recovery. Cutting costs, increasing predictability, and increasing recovery is

51:41 a byproduct of that. The next thing that we started to look into was well and completion design advisory. Just like the abandonment stuff is reconciling the history of the assets around the thing

51:54 that you want to develop and whether or not that's sort of fit for purpose for how you approach your completion's methodology in general. And that looks really good from a way to use a broad base of

52:10 information, relatively fit to how you would like to operate these assets and then optimize around different variables within the completions that have already been deployed. And what that means for

52:23 redesigning these things. One of the cool things somebody asked us is, you can take a deterministic approach to this: this has worked in asset A, and we're in asset B right next door and

52:33 there's reservoir continuity, so it should work in B. But what if we wanted to look at assets C, D, E, F, G, Z, Q, R and their relationship to these fields and what their completion methodology

52:48 was and then create a composite completions methodology associated to how we can minimize the overall

52:57 problems that we could have in completing these wells and maximize the recovery on these assets to come up with a completely new approach to completing wells So that's kind of a cool thing to look at

53:08 that we've done some POC work around. We've got some other stuff, but I think at scale, one of the things that we really try to focus people on is that there's a lot of things you can focus on. I

53:19 mean, you can do artificial lift advisory, you can do production optimization advisory, all the things that we talked about. You really have to have a problem space defined, but the way

53:29 that our platform works best is looking at things from a strategic lens. It's like, how can I take this universe, boil it down to its constituent parts, understand how everything is connected within

53:42 this universe, and then start to play with the variables in the system to optimize any sort of individual element. And I think if you start with that sort of top-down approach, you can always go

53:53 low, because you've already mapped everything associated to this universe. But if you start low, it's really hard to start stacking on layers of complexity. So when we work with the people that we

54:04 work with, we try to point them towards things that are strategic in nature, that are definable, and that we can start to build some fidelity within. And if we can do that, then we can start to get

54:14 into the lower level elements of that. Like if you wanted to figure out how to deploy a new pipeline within the water management solution space that we're working on now, well, that's a much easier

54:24 question to answer if you have the entire network mapped and you're approaching it from a strategic asset management perspective. That's a real sort of different way to approach things. Yeah, no doubt.
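As a rough illustration of that "map the universe first, then drill down" idea, here is a minimal sketch using a generic graph library. The entities and relationships are hypothetical examples, not data²'s actual schema or platform.

```python
# Toy sketch: model wells, facilities, and water infrastructure as a graph so
# that lower-level questions become graph queries. Names are hypothetical.
import networkx as nx

G = nx.Graph()

# Map the "universe": assets as nodes, physical/operational links as edges.
G.add_edge("Well_A1", "TankBattery_1", relation="produces_to")
G.add_edge("Well_A2", "TankBattery_1", relation="produces_to")
G.add_edge("TankBattery_1", "SWD_Facility_3", relation="water_hauled_to")
G.add_edge("TankBattery_1", "GasPlant_North", relation="gas_sold_to")
G.add_edge("Well_B7", "TankBattery_2", relation="produces_to")
G.add_edge("TankBattery_2", "SWD_Facility_3", relation="water_hauled_to")

# With the whole network mapped, a narrow question like "what sits within two
# hops of the disposal facility?" is just a traversal, not a new data project.
nearby = nx.single_source_shortest_path_length(G, "SWD_Facility_3", cutoff=2)
print(nearby)  # node -> hop count

# ...and routing-style questions, like where a new water line would matter,
# can start from simple path queries over the same map.
print(nx.shortest_path(G, "Well_A1", "SWD_Facility_3"))
```

The design choice the sketch is meant to echo is the one described above: build the connected picture once, then let each specific question reuse it.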

54:35 Yeah, no, I think that's one of the trickiest parts of the language model stuff: people just want to throw all the data at it and be like, Have it tell me everything.

54:46 And it's like, Well, that's not how this works. It doesn't work that way. And it really is a problem with like. No, the perception. The English language. And I mean, you know, there's these

54:56 great videos that go around like YouTube where people are trying to teach people English and, you know, they'll spell the word brown, they'll spell another word that has the same four letters after

55:07 the B, and it's like, Right, we pronounce that completely differently. Yeah, exactly, I see it. I've got a five and a seven year old, right, and they're trying to learn, and like my daughter

55:15 was teaching my son how to spell. It's all, here's the rule, except for these cases. Yeah, except for these, and these, and these, and this one instance, you know, we're gonna do something. He's

55:16 really good at the three letter words, and she threw out

55:24 gone, and he's like, G-O-N. And she's like, No, there's an E after the N. He's like, No, there's not. It doesn't - Exactly. They're like, Oh, yeah, English sucks. It's just like, bone is

55:33 B-O-N-E, yeah. That's the complexity that large language models run into. When they're trained on these 700-billion-parameter corpora, all of that knowledge gets appropriated by

55:42 different industries in different contexts, and it's really hard to sort through how words are used: words can be spelled the exact same way and mean completely different things, which makes it hard

55:42 to interrogate those models effectively. In our industry, that's a real

56:01 problem. It's not a problem that people are really cognizant of until you start playing. Right. 'Cause when you're like, Shit. Until you get into all the industry jargon going on today. Yeah,

56:09 exactly. Which does not stand for what you think it stands for. That's right. In the oil field. Yeah. Okay, so let's say that I work at operator Z and we, you know, wanted to do an engagement with

56:19 data squared, all right. You know, where does this get hosted? And then like, how do I get all this data? How do I get it in there? So it just depends, because we're working with the federal

56:26 government, one of the imperatives that we had to build our system around was air-gapped deployment. So deploying it into an environment that we can't see, touch, or interface with at all was

56:44 really the way that we started to approach building what we built. And from a platform perspective, we're a microservices architecture, we're hyperscaler agnostic. We don't care what your data

56:52 is. We don't care what format it's in. We don't care where it resides. It's insanely agile, just philosophically, because of the government coming in and saying, Well, we really want to use it for

57:05 CIA data, but like 12 people can't see that. Exactly, and you go like, okay, we'll try to figure this out. So our system architecture is really agile. It can be run in a serverless capacity, we

57:18 can deploy it behind your firewalls, into your on-prem facilities, we can deploy it into your own cloud, like it doesn't matter. In terms of how we get data, really, where we host it defines

57:30 how we get it. Like if you want to do a big cloud transfer, we can interface however you want to, really, at the end of the day. So it's hyperscalable to however you want to approach

57:39 things.
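One minimal sketch of what that deployment flexibility can look like in practice, assuming a simple object-storage abstraction: the same application code runs whether it is pointed at AWS, Azure, or a local path in an air-gapped, on-prem environment. The class names, the bucket/container names, and the helper function are hypothetical and are not data²'s implementation.

```python
# Sketch (not data2's code) of a storage abstraction so the same service can
# run against AWS, Azure, or a local filesystem in an air-gapped deployment.
from typing import Protocol


class ObjectStore(Protocol):
    def read(self, key: str) -> bytes: ...


class S3Store:
    def __init__(self, bucket: str):
        import boto3  # assumes boto3 is installed and AWS credentials are configured
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def read(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()


class AzureBlobStore:
    def __init__(self, connection_string: str, container: str):
        from azure.storage.blob import BlobServiceClient  # assumes azure-storage-blob
        service = BlobServiceClient.from_connection_string(connection_string)
        self._container = service.get_container_client(container)

    def read(self, key: str) -> bytes:
        return self._container.download_blob(key).readall()


class LocalStore:
    """On-prem or air-gapped deployments can point at a mounted path instead."""

    def __init__(self, root: str):
        self._root = root

    def read(self, key: str) -> bytes:
        with open(f"{self._root}/{key}", "rb") as f:
            return f.read()


def load_document(store: ObjectStore, key: str) -> bytes:
    # Application code only sees the interface, so where the data resides
    # becomes a deployment-time decision rather than a code change.
    return store.read(key)
```

The point is only the pattern: keeping the storage and hosting details behind an interface is one way a platform can stay cloud-agnostic and still drop behind a firewall.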

57:42 We had to build like two

57:45 interfaces that were really critical for us were AWS and Azure. Azure because of the oil and gas interface. Most people, I believe this is still true, are on Azure infrastructure from an oil and

57:58 gas perspective. And then in the government space, most people are on AWS. And so then we're like, all right, well, then we need to incorporate GCP. And we need to make sure that we can interface

58:10 with Databricks and Snowflake and all of these other things. And so it really did. Like I said, we were young. We're getting to the point, yeah, we're essentially a little over a year old. And I

58:22 think July,

58:24 2023, a little over a year old.

58:28 In that time, we spent the majority of our efforts trying to create this hyperscalable, hyper-agile platform that we can apply to any problem. It's industry-agnostic, like it doesn't matter. You can

58:40 apply it to any problem space. The real question is whether or not we have representative data, or we have subject matter expertise. And we need both of those things for it to work. So I'm

58:50 interfacing with company Z. The first question is, what problem are you trying to solve? Okay, do we have an understanding of that problem? If it's an oil and gas problem, we have an

58:59 understanding of it. So to that end, like, is that a requirement of someone on your team to have expertise, or just the business that you're working with? Yeah, I mean, luckily from an

59:09 oil and gas perspective, we have that. Our backgrounds are really, really diverse, and our networks are insanely diverse as well, as a byproduct of the work that we did. So we can bring people in

59:23 that have subject matter expertise on any oil and gas problem you can think of. It doesn't matter. When we start to look at the other industries that are sort of high reliability in nature, like

59:33 finance, health care, people, like if you're gonna get into the people space.

59:39 That's not our expertise. We know that our platform would deploy exceptionally well into those environments and create very, very good products, but that's not our expertise. The other expertise

59:50 we have is the DOD space, especially sort of in the secret cleared environment space. We have some domain understanding and knowledge that we can apply. That helps us be pretty quick. Um, it

1:00:03 really informs our, our strategy. Our vertical market approach strategy is really geared towards those two industries. We want to grow into these other spaces and we will, uh, but right now we're

1:00:14 really focused on oil and gas and defense. Yeah. Makes sense.

1:00:20 That was, I know it blows by every time, every time, man. Um, so normally at the end, we'll do speed round. I'm going to tailor it a little bit for, uh, cool. For the AI stuff. What's your

1:00:34 favorite foundational model, open source or closed? Claude, uh, Sonnet, um, uh, Haiku's great. Um, we've kind of used them all. Um, we've done a lot of advocacy. Yeah. No, it changes by the

1:00:50 week almost. Um, I would say for the industries that we're working in, the sort of pedantic nature of the outcomes that we can generate from the Claude models, or the Anthropic models really in

1:01:03 general are significantly better at the start. You can obviously tune these things up, at least in terms of the fidelity of the answers that you want. You can give prompt instructions associated to how

1:01:15 to answer questions and different things. But the baseline they start from is exceptional. Like, not knocking any of the other ones; it depends what you wanna do from a context window

1:01:26 perspective. If you need a huge one, use a Gemini model 'cause they have the largest context windows that you can interface with. OpenAI is good, their models are good, but we tend to see that

1:01:38 the Claude models are,

1:01:42 I think, sort of on the leading edge of effectiveness for the things that we want to apply them to. Yeah. What's your favorite book or podcast around AI? Great question. So I did this program at

1:01:56 Harvard; the professor that led the program runs this institute called the D3 Institute. His name is Karim Lakhani. And Karim wrote a book, essentially called The Competition of AI. I'll

1:02:10 screw the name up. But it was about the developing market and AI, how things are changing. And effectively, how you can interface from a business perspective in the market. I'll get the exact

1:02:21 name for you. But Karim is a great guy. They do a lot of really cool research. And we've met a lot of really cool people as a product of our work there. So

1:02:33 I think any of the books that they're producing. Ethan Mollick produced a book. And again, I'm just shit with titles. But Ethan Mollick's another professor that works at Penn. And he

1:02:44 worked with Karim at BCG. And they both started up their AI institutes at both of the schools, Harvard and Penn. And - Those are good schools. I hear that they're OK. I don't know. They let me in.

1:02:55 So I'm not 100% sure. But anyway. Anything that they produce, I'll read, because they're sitting on the leading edge of all the research that's happening in these spaces. MIT CSAIL, their

1:03:10 computer science and AI lab, they do a lot of really good work. MIT Technology Review, just in general, has a lot of good content. So that's where I go for content in general, and Ethan and Karim's books

1:03:22 are out right now. Competing in the Age of AI, that's one, and I'll figure out what Ethan's latest book is. They are kind of the go-to, I think, industry awareness books about how to play

1:03:37 in these spaces, how the ground is shifting, what research is telling us about the future imperatives of the industry and how it's gonna change. What's your favorite vector database? Milvus is

1:03:50 the one, so we've used Mongo. We've done testing in a couple of different places, Milvus seems to be the best in the DoD application. Actually, the overlap there is really important. There's a

1:04:02 lot of contractors in the DoD using Milvus instances. So that tends to be the thing that I think helps us from

1:04:13 just an operations perspective, and the scale you can apply it to. They have a hosted and an open source version? I believe so, yeah, I believe so. Yeah, it's always nice when one has both. Yeah. So

1:04:26 they're just one. I mean, our POC was on Pinecone, which is what it is, but we very quickly realized that was a terrible decision out of

1:04:38 the gate for anything remotely production scalable. So we've now migrated over to Qdrant, but I'm in the same boat. That's why I asked, I'm just like, everyone has to make that decision of like,

1:04:47 okay, which one do we use? And it's such, there's so much to it, and there's so many different tools out there, and there's damn near a new one every day, it seems like, so it's just - It's

1:04:57 always fun to be able to design as agnostically as possible so you can port. But even then, the APIs are probably different, and so it's like, yeah. They definitely are.
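One common way to keep that portability, sketched here purely as an assumption rather than anything either company actually runs, is a thin adapter interface in front of whichever vector database is in use. The interface, collection name, and toy vectors below are hypothetical; the calls follow the open-source qdrant-client library.

```python
# Sketch of a thin adapter so application code isn't welded to one vector
# database's API. Swapping to Milvus or Pinecone would mean writing another
# adapter, not rewriting the code that depends on VectorStore.
from typing import Protocol, Sequence


class VectorStore(Protocol):
    def upsert(self, ids: Sequence[int], vectors: Sequence[Sequence[float]]) -> None: ...
    def query(self, vector: Sequence[float], top_k: int) -> list[int]: ...


class QdrantStore:
    def __init__(self, collection: str, dim: int):
        from qdrant_client import QdrantClient
        from qdrant_client.models import Distance, VectorParams

        self._client = QdrantClient(":memory:")  # or a URL for a real deployment
        self._collection = collection
        self._client.create_collection(
            collection_name=collection,
            vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
        )

    def upsert(self, ids, vectors):
        from qdrant_client.models import PointStruct

        points = [PointStruct(id=i, vector=list(v)) for i, v in zip(ids, vectors)]
        self._client.upsert(collection_name=self._collection, points=points)

    def query(self, vector, top_k):
        hits = self._client.search(
            collection_name=self._collection,
            query_vector=list(vector),
            limit=top_k,
        )
        return [hit.id for hit in hits]


# Hypothetical usage with toy vectors, just to show the shape of the interface.
store: VectorStore = QdrantStore("document_chunks", dim=4)
store.upsert([1, 2], [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]])
print(store.query([0.1, 0.2, 0.3, 0.4], top_k=1))
```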

1:05:08 And managing the interface there is sort of a burden over time. That was the craziest part about the vector stuff early on: most of them didn't have a GUI. It was just, oh, here's, here's all

1:05:18 the documentation. Figure it out. Have fun. Yeah, I'm like, oh, I don't like that. My developers are here. But yeah, uh, no, that's, uh, it's like, yeah, one more. Yeah, it's the most

1:05:29 important question. So you're in the Pacific Northwest. When you come back to Houston, what's a restaurant you have to go to? Oh, that's a great question. What's one people should go to? OK,

1:05:40 I had lunch with a lot of my friends yesterday at a restaurant called Relish, which I like. It's sort of Southern comfort food, I guess, is the best way to say it. But they have a fairly diverse

1:05:49 menu. Relish is on Westheimer. Okay. So right near Lamar High School, it's right over here by a place called State. So it's right up the road from there. I love Relish. Le Jardinier, at the Museum

1:06:02 of Fine Arts. Okay. If you have not been, it is a fantastic restaurant. It's Michelin-level food at the art museum, and you can sit out and have just a fantastic meal. I always try to go to

1:06:20 Le Jardinier, wherever I go. But mostly, just full transparency, most of my stuff is like, find a Mexican food restaurant that is good and go to that immediately. And then go to sports games. So I'm

1:06:31 gonna go to the Rockets game tonight. That's awesome. I was a season ticket holder for the Rockets and the Astros when I lived here last. And just, I was born and raised here in Houston. So

1:06:42 there's a lot of different places, but I say definitely check out Relish. Definitely check out Le Jardinier at the Museum of Fine Arts. Both of those restaurants are really, really fantastic. That's

1:06:53 awesome. That's a hot take, I've never heard of the one at the Fine Arts Museum. That's a nice setting. Yeah, I know. A lot of the hotels are awesome with my wife, so they'll be nice and close.

1:07:03 It's very good. That's cool. Very good.

1:07:06 Yeah, well, no, I mean, just, I mean, thanks for joining us. And, you know, letting us know you're gonna be in town and, you know, getting over to join us in studio. It's always easier and,

1:07:13 you know, just, you know, we love doing the remote ones too, but this is, you know. Oh, this is great. Being able to sit down and chat with people. I really appreciate the invite and being

1:07:22 able to stop by and chat with you all. Where can people find you? You can find us on LinkedIn. So it's at Data2US, it's sort of the handle in search. That's us, DataSquared. So our website is data2.ai.

1:07:37 And you can find me on LinkedIn at Jon Brewton. The rest of our team will be connected to the homepage. We don't do much outside of that; those are our

1:07:47 traditional forms of interface, website and LinkedIn. Pretty standard, I think. They'll be able to find you if they want to. We're doing viral TikTok dances on - Not yet. We're trying to hire

1:07:56 somebody. It's not critical path. I don't know if the DoD would appreciate that as much as that. They certainly wouldn't. The TikTok,

1:08:03 they would not, they would not like that. Although I will say the Dunkin' Donuts Gen Z marketing team has been crushing it lately. I think the pro tip is to just go hire someone

1:08:16 straight out of college and let them run it like their personal account. But hard recommendation, since you said Dunkin' Donuts, is to watch the Saturday Night Live skit about Dunkin' Donuts. It was like

1:08:27 a year and a half ago. It's a perfect representation of going to a Dunkin' Donuts in Boston at any point in time. So, I've got family in Boston. Yeah. I've been severely disappointed by the

1:08:39 Dunkin's down here. No, they're not, they're not. Like, the quality and the customer service, it's not even close. Yeah, for sure. All right, well, awesome. Appreciate it, Jon.

1:08:49 Everybody, if you liked the episode, make sure you

1:08:52 give us a like, throw a question down in the comments for us, and make sure you guys follow and subscribe.

Creators and Guests

Bobby Neelon
Host
Bobby Neelon
Husband, Father, Baseball, Upstream Oil and Gas, R, Python, JS, SQL, Cloud Computing
John Kalfayan
Host
John Kalfayan
Raddad, energy tech, crypto, data, sports, cars
EP 58: Jon Brewton from data²