EP 53: Meet Ionian Labs' Game-Changing Tools

0:00 We are back with another Bobby-less episode of Energy Bytes. I feel lonely over here on my side, but he is busy transitioning with the Devon merger. So he's, I know he wishes he could be

0:15 here right now instead of having to deal with that. But we are here with the founders of Ionian Labs. Appreciate you guys coming on. Yeah, happy to be here. Can you just introduce

0:27 yourselves and kind of give us what you guys do with the company? Yeah. My name is Yazan Abushor. I'm a co-founder with Eugene, who is about to introduce himself. And we started

0:40 mainly a software company in oil and gas. And we'll probably end up talking more about that. But yeah, we both come from an engineering tech background, graduated from Texas A&M University in

0:52 biomedical engineering. I originally wanted to be premed, and then two years into my degree, I changed.

1:17 You didn't want to be in school for a decade? That's exactly right. I got an internship, or a job, halfway through my college career. It was a tech internship in IT, and I was like,

1:17 people are really successful here, and I think I could really do this. So that's when I made the pivot, and I didn't want to delay my graduation. So yeah, after that I started working at

1:21 some software companies after I graduated, and we ended up here. That's awesome. Yeah, so my name is Eugene Nadellia, co-founder with Yazan for Ionian Labs. Also went to Texas A&M, but I graduated in

1:35 electronic systems engineering.

1:38 While I was doing that, I fell in love with software and I had some of my friends who were software engineers teach me more in depth and give me stuff to learn about that, some projects on my own

1:50 ended up getting a job at Opportune, doing some software engineering for them. And yeah, I found, in the course of my work at Opportune, that there's really not a whole lot of options for software

2:03 in the energy space. And the options that do exist are very old. Specifically around the land solutions, is that what you're - Well, around land solutions, division order stuff, even accounting

2:15 software, really old, really dated. Yeah, data entry. Data entry, data entry. You're right in making that generalization. I shouldn't have even stated it like that, because it is a very clear line of,

2:27 well, was this built before 2010 or was it built after 2017? And they just were like - Exactly. And 2010 is probably also generous. It's probably 2000-ish. And then there was, yeah,

2:39 there's basically a 10-, 15-, 20-year gap in software development in the oil and gas industry, and it's very apparent when you - I mean, that's great for us. Right now. All those decade-old

2:49 things. That's literally why the company was founded, as Collide saw this kind of energy tech revolution happening and, you know, people our age working at these companies and being like, this

2:59 is a stupid problem, someone should solve it, and then starting companies and doing it, right? And so that's, no, that's exactly what we're here for. Before we get into any of this, the most

3:09 important question of the day is, how badly are the Razorbacks gonna get beat this weekend in football? As a Razorback fan, I've gotta ask. Probably pretty bad. No, it's not, it's gonna be

3:22 another shit show. We are not good. I think y'all are gonna beat the hell out of us. Yeah, unfortunately, we've been having the best

3:30 recruiting years for the past maybe four or five years, but it hasn't been panning out for us. The A&M-Arkansas game has always been a great one. It doesn't matter how good or bad either of us

3:42 are, we always play each other tight. And so it's always a fun one. But as an Arkansas fan, I have to talk some shit before the results are out, because more than likely I won't be able to after.

3:54 So, okay, Ionian Labs, tell us, just high level, kind of what y'all do, what problem you are solving, and then we'll work backwards as to how you guys came about starting the company and why.

4:08 So right now, what we do is mainly around title work and land acquisition, mineral acquisition. That's where Eugene and I kind of saw the issue: a lot of manual work and manual reading

4:24 and data entry and such. For the average person who hasn't worked in land and title, give them kind of what the old way of doing things was, and then how you guys supplement or complement that. Yeah.

4:35 Yeah. So let's say someone's looking to buy some mineral assets in the Permian Basin in West Texas, and they identify sort of the land, or maybe the patch or parcel they want to look at, and they

4:50 end up having to figure out who currently owns it. And a lot of times it's not just one person, because of how inheritance and probate work; you end up finding many owners with many interests and

5:02 percentages. A 64th. Exactly. A 1/64th,

5:05 a 1/32nd, and then even less than that. You have to go back; over here in this industry, landmen usually do this work, and they figure out who owns it, and you have to go back to the first title.

5:17 From patent. Yeah. From whenever that land started being tracked or issued by the county. And a lot of times in Texas, it's the Spanish and Mexican grants, where it's handwritten, and a lot

5:32 of them use varas, which is a Mexican way of measuring distance that's still being used today. So they have to go through all that to figure out who owns what before you can even, like, approach

5:46 the person that you want to buy from. Yeah. You're not even negotiating anything yet; you've got to go do all that first. My first roommate when I moved down here, shout out to Lee Ellsner.

5:57 He had just graduated, just finished law school and taken the bar, and was waiting on his results. And so he started as a landman. And the first day he got back, I

6:07 remember asking him, you know, how was it, whatever. He was like, man, I started at, whatever, early 1900s, late 1800s. And I was like, that's such a random time. Why? Then he

6:19 was like, oh, that's when they started using the typewriter. And in that moment I was like, oh man, full mind blown. You've got at least maybe 100

6:31 years, if not more, before that where it's handwritten, and in cursive, right? Yeah, it's not like just print, it's all cursive. And he would tell me, you know, like

6:43 people's names would get mixed up just by the person who interpreted the handwriting, right? Like, all this crazy stuff. So land has always been something for me that is just, like,

6:53 fascinating, because it's, I mean, a necessary evil of the whole thing, but it's also like, how do you solve that problem? Like, it's a hard problem to solve, but it's a

7:03 very big one, right? Like, it needs to be solved. So let's go into that part of how you guys came about, how Ionian came about. What was the main problem y'all

7:15 were trying to solve for, and how do you guys kind of approach that? Yeah, so the main problem we're solving for is taking out as much, you know, manual labor, as much eyes

7:25 looking at the paper. Less courthouse time. Less courthouse time, you know, less time spent with a flowchart and a magnifying glass and a couple of friends trying to figure out if this is a J or a

7:37 G or a

7:40 you know, so

7:43 yeah, so the way we saw it is that, fundamentally, this problem is a natural language problem. Right, right, this is a words problem, words interpretation.

7:54 And LLMs seem to be

7:57 the perfect tool to solve this problem, a combination of LLMs and good character recognition. So that was the opportunity that we saw, and we also saw that basically nobody was working on this.

8:09 I think there were a couple of companies periodically that tried doing something like this. They tried various - I was gonna say, probably OCR and some of the older - Yeah, some of the older OCR

8:20 methods. They didn't work quite as well. There also weren't LLMs at the time, 'cause this was like, I don't know, 2013, 2014. Yeah, it all just popped off, and no one even knew what

8:30 a transformer was in 2018. Yeah, yeah. I think at the time, when they came out, they were really only used for, like, chess. So, not a whole lot of

8:42 overlap there. Yeah, no, for sure. But yeah, that's

8:47 what we saw the opportunity in: using this new wave of technology to actually solve, like, a real problem, not just, you know, make a company and slap AI on top of it. No, I think - the more I talk

8:59 about the language model stuff, the more I feel like it's kind of, I don't know, it's like the universe's reaction to all of the data that we have, right? Like, it's like a physical property:

9:12 if all of these things happen, then we have to have something like this. Like, we have all this data now, and most of it is text. But we have so much that we can't actually make most of

9:22 it usable, because there's so much, and a human can only ingest so much information per second. And so these language models are like this organic evolution of, okay, well, we've

9:35 had this, you know, big data and analytics was what everybody was talking about two, three years ago. It's like, okay, we've got the data now. We finally figured out that, okay, we've got

9:43 enough data to make these useful. And now the language model is kind of that next evolutionary step. For text data, we have a shit ton of that as well, and it just continues to grow by the day.

9:55 And so how do we make it efficient to use, to get the value and knowledge out of that? To me, it's like, I tell people it's like the difference in me Google searching versus my

10:06 parents Google searching, right? Like I'm much more targeted and direct. I know how to Google search, you know, put things in quotes and all the fun, you know, hacky tricks. I very much so see

10:17 language models as kind of that, at least one of the big applications of them is that is like augmented search, right? It's just a much, much more efficient, better way of searching and getting

10:27 to the point, instead of trying to keyword hack, which is what Google is essentially, right? So, let's dive into it. I assume it's, okay, we've got all these documents, land

10:44 and title documents, and then is it a RAG model? Is it a custom fine-tuned model?

10:53 And then let's get into as much detail as you guys are willing about kind of what that stack looks like and what the flow of that is. Like, what are you using for databases? Are you using a RAG

11:03 architecture? Is it a custom model? All the fun stuff. So let's hear it. Yeah, so what we do is we

11:13 like to take all of the data for a given section at one time, and then we'll ingest it all at the same time. And by doing that, we don't really have to do RAG so much, 'cause we don't really need to

11:24 retrieve the data later; we're processing it right there on the spot. So we'll extract the key information that we need: who's selling, who's buying, what are they buying it for, if it's there,

11:36 where is the tract, down to your block, your section, whatever. And then we use fine-tuned models for that. And then we put all that in a custom database that we built.

11:53 After that, we run like a graph search algorithm to find the chain of title all the way back down, taking into account the effective dates of the documents and all that stuff. Are y'all using any

12:06 kind of, for the character recognition stuff? Are y'all using OCR, are you using multi-modal, are you using some combination, what are y'all kind of doing there? Yeah, so it turns out, if you

12:19 use a couple of different things and then take a consensus, you get higher accuracy rates. So we use a couple of different providers, and then we just like to poll them, and

12:30 then we

12:33 kick out things that don't poll well, right? So if they all say something different, then - Right, no confidence in that one. Yeah, no confidence. But you still end up saving 99%

12:45 of the work. Yeah, no, I think that's

12:48 the big thing with a lot of this, especially early on, or not even early on, I think, like, short term and mid term, right? Like, you know, all the news articles about everyone worried

13:00 about AI taking their job. And it's like, AI is going to make your job a lot easier and a lot more efficient before it takes your job, in most cases. Um, and so like, yeah, that's the other thing

13:10 too. Then you get into the academics or the highly technical people, and they're like, well, if it's not 99% accurate or 100% accurate, how am I supposed to trust it? And it's like, well, if it saves

13:18 you 80% of your time, then I promise you, you'll trust it really quickly. And so that's cool. Like I said, I'm responsible for all of our RAG data and all of our language

13:36 model data side of it, and it's not fun. So I can only imagine. And we're not doing, like, land and title documents, you know; we're doing textbooks and papers, so it's a little bit of

13:49 websites and such, so it's a lot more structured. But even then, it's still very difficult, right? Like, everyone thinks PDF is this standard format. It's not. It is just a dumpster fire

14:02 where anything can go, and there's no formatting to it, essentially. But what kind of challenges did you guys have around, like, specifically cursive, with the language or, you know, the

14:12 recognition stuff? Because that's not an easy thing to - Yeah, cursive is still a really big problem. We've gotten to, like, 80, 85, somewhere in the 80 percent accuracy on cursive, which is really good,

14:27 because, yeah, even reading it - I mean, as a human. Exactly. Yeah, it turns out cursive is different, right? So I can't imagine. It turns out the models are actually better at that than we are.

14:39 No, that completely makes sense, right? Like, I mean, everyone goes to a doctor, you look at your prescription, you're like, I can't read that. Honestly, they can't even read it. Yeah, no,

14:49 absolutely. I think there's a reason why they switched to digital prescriptions. Yeah.
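The polling-and-consensus idea described a moment ago is simple to sketch. This is a rough illustration, not Ionian's actual pipeline; real OCR outputs would first need an alignment step, and the provider readings below are made up:

```python
from collections import Counter

def consensus(readings, min_votes=2):
    """Token-level majority vote across several OCR providers.

    readings: one list of tokens per provider, assumed already aligned
    by position (real OCR output would need aligning first).
    Returns (tokens, flagged): positions where no two providers agree
    come back as None and get flagged for human review.
    """
    tokens, flagged = [], []
    for i, column in enumerate(zip(*readings)):
        token, votes = Counter(column).most_common(1)[0]
        if votes >= min_votes:
            tokens.append(token)
        else:
            tokens.append(None)  # the "no confidence" case
            flagged.append(i)
    return tokens, flagged

# Three hypothetical engines disagree on one cursive name (J vs G):
a = ["grantor:", "John", "Smith"]
b = ["grantor:", "Jahn", "Smith"]
c = ["grantor:", "Gohn", "Smith"]
tokens, flagged = consensus([a, b, c])
# tokens -> ["grantor:", None, "Smith"]; flagged -> [1]
```

Everything the vote settles is work nobody has to redo; only the flagged positions go back to a human with the magnifying glass.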

14:57 So that was one of the challenges. I guess another one that we can talk about, and actually something that we're working on right now, is depending on the survey system used for the land description:

15:09 we started off doing rectangular survey systems, which is what's used in the majority of the United States and West Texas. But you have, for example, the Eagle Ford shale, where a lot of the

15:20 counties there use a metes and bounds system, where they point out the corners of the tract, and they say walk from this old rusty rod to this other end of the creek.

15:35 It may not be there. It might be dried up; there's no water in the creek now. It's like you can't even tell it's a creek. Might have been dammed off. Yeah, so they use

15:43 that method of describing the land, and legally, that's what it is. So you have to kind of take it how it is and deal with it like that. So, having to map that into a systematic way

15:56 where we can do our relationships and figure out that this document 20 years later is actually talking about the same piece of land that was referenced in this document 20 years earlier, that's one of

16:07 the challenges that we're working on. I think we have some ideas of how we can solve that. We actually found a vendor. Yeah. I don't know if I... I think I told you about this a

16:17 little bit. You did, yeah. But yeah, we found a vendor. They're a company called Bunting Labs. They do a QGIS

16:26 plugin that reads your land documents or whatever and then maps that onto your QGIS map, your GIS map or whatever. So they can read the metes and bounds, and they can draw the polygon, and then they can

16:36 match it with, you know, a set of, um... You're talking about coordinates? Yeah, they use, like, that thing that you put as a reference - A georeference? Sorry, yeah, they use a system of references. Um, are they using - is anyone using satellite or, like, aerial image data to try and also kind of triangulate some of those? Yeah, that's

17:05 what they use as their references. So if you have a picture, and if you have, um, you know, whatever map coordinates you have, then they'll map that over. That's a beautiful

17:15 use of kind of combining technologies. Right. Absolutely. Before, you'd have to send the survey team out; they'd have to go manually do it by hand. So, like I said, the economics make perfect sense on the land side of it. There's a whole GIS element to your platform and stuff as well? Yes. So, Eugene referenced, like, polygons. So the entire

17:40 idea of why we started with the rectangular survey system is because it's an easy way to transfer that into, like, essentially math, right? So we would take a legal land description, which

17:53 is natural language, words, and we would map that into some sort of graphing system or math system. And then we use that to form the relationships between the rest of the documents. Nice.
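As a toy illustration of that "legal land description to essentially math" idea under a rectangular survey system: the key names and quarter-call table below are invented for the sketch, not Ionian's schema. The point is that two deeds worded differently normalize to the same tract key plus a fraction of the section, which is what lets the documents be related to each other:

```python
from fractions import Fraction

# Fraction of a section covered by a quarter call (illustrative only).
QUARTERS = {"NW/4": Fraction(1, 4), "NE/4": Fraction(1, 4),
            "SW/4": Fraction(1, 4), "SE/4": Fraction(1, 4)}

def normalize_tract(county, block, section, quarter=None):
    """Reduce a rectangular-survey call to (tract key, fraction of section).

    Documents written decades apart that describe the same land should
    normalize to the same key, so they can be linked in a chain of title.
    """
    key = (county.strip().upper(), f"BLK {block}", f"SEC {section}")
    fraction = QUARTERS.get(quarter, Fraction(1))
    return key, fraction

# Two deeds, 20 years apart, different wording, same quarter section:
older = normalize_tract("Howard", "33", "12", quarter="NW/4")
newer = normalize_tract(" howard ", "33", "12", quarter="NW/4")
# older == newer -> True, so the two documents join on the same tract
```

Metes and bounds calls, as discussed above, are the hard case precisely because they do not reduce to a key this cleanly; they need the polygon-drawing treatment instead.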

18:06 So that's really neat. We use that for production. It's called

18:13 the net acreage reporting, right? So when you wanna figure out, you know, this person owns land in this tract, right? The tract is a square mile or something like that. He owns like a tiny

18:24 piece, a 1/130th or something like that. We want to

18:29 know how many net acres that is, because we want to give that guy an offer, right? We can track that stuff. That's awesome. Yeah, I feel like there's a lot coming out, or a

18:39 lot of well-needed attention, on the land side, because there are so many dynamics to that, right? One of our earlier guests,

18:48 shout out to Pecantry. Pecantry Estimate, they're focused on basically the flip side of that, where you're a mineral owner and a random oil and gas company or broker makes you an offer on your minerals.

19:00 How do you know whether it's a good one? It's like Zillow; they say it's like Zillow for minerals, right? And then, so there's all these really interesting solutions out there, so it's always fun to

19:10 see people kind of approaching it, because it is not a trivial problem by any means, from either side, right?
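For what it's worth, the net acreage arithmetic behind the reporting mentioned above is straightforward once ownership is resolved: a standard section is one square mile, 640 acres, so a 1/130th undivided interest works out to roughly 4.92 net acres. A minimal sketch, using exact fractions so tiny inherited interests don't lose precision:

```python
from fractions import Fraction

ACRES_PER_SECTION = 640  # a standard section is one square mile

def net_acres(interest, tract_acres=ACRES_PER_SECTION):
    """Net acreage = the owner's undivided interest times the tract size."""
    return Fraction(interest) * tract_acres

# The "tiny piece, a 1/130th" example from the conversation:
offer_basis = net_acres("1/130")
# float(offer_basis) -> about 4.92 net acres

# Inherited slivers compose exactly: a 1/64th of a 1/32nd interest.
sliver = net_acres(Fraction(1, 64) * Fraction(1, 32))
```

Keeping the interest as an exact fraction rather than a float matters once probate has split an estate a few generations deep, where the denominators get very large.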

19:19 So let's get into your stack a little bit. What are y'all using? So you mentioned a graph search; what kind of database are y'all using? Are you using a vector database

19:30 or just a traditional kind? Well, we're using a vector database to store, you know, all the LLM outputs and the embeddings and stuff, but you can talk more about how we're storing the

19:42 actual data. Yeah. So the idea is we use the LLMs and AI to pull out these key attributes, exactly, extract the key attributes that we really need from the document. Not all of it, not most of it;

19:58 there's always these recurring things that you're looking for: the grantor, the grantee, the legal land description, the effective date. All these things are sort of recurring in these documents,

20:08 and we pull them out. And once we have them out, all we have to do is store them in a relational database. So that's what we use. We use Postgres for the database, and we started with that.

20:22 We're messing around with the best way to store a lot of these legal land descriptions. We're using Postgres because it has a lot of support for GIS, and it has this very nice plugin. So

20:35 that's pretty much it. And,

20:40 yeah, I think that's - Yeah,

20:43 you're definitely in the right lane here. I've been telling people for months how good certain language models are at extracting information from documents. It's not just generative. It's

20:56 being able to extract. Like, OCR will be replaced by multimodal LLMs, in my opinion, in the near future, because it is so much better. If you've ever done OCR, you understand what I'm talking

21:06 about. If you haven't, people just say, oh yeah, I just use OCR. And it's like, oh, well, that's not quite how it works, right? Like, it's not just a magic button. The language

21:16 models seem to be better. We're getting closer to that magic button of, here's the document, enter the three keywords that you're trying to extract from it, and

21:26 it will reliably do that. I did that

21:31 all the time. So, our second version of Collide GPT, our language model, I've been working on for the last three or four months, and one of the big focuses we're doing with this next iteration is making

21:42 sure we have all the metadata and we cite all of our sources. And now we're gonna be not only citing them, but referring you to the actual direct source with the link and all this stuff. So I was

21:52 using, I've had a lot of success with... actually, we ran a test with 4o and Gemini Flash 1.5, I think. Basically, I just took the first 10 pages of each book and fed it to both models and

22:11 said, give me the author, the title, the date, the ISBN, all the metadata that I want. Gemini returned three times as many ISBNs as OpenAI, which I thought was fascinating.
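A harness for that kind of comparison can stay very small. This is a hypothetical sketch, not the actual prompt or code used on the show; the model call itself is left out, and the testable part is building a strict-JSON prompt and validating whatever comes back, so two models' runs can be compared on how many fields they actually filled:

```python
import json

FIELDS = ["author", "title", "date", "isbn"]

def extraction_prompt(fields=FIELDS):
    """Ask for strict JSON so replies are machine-checkable.
    (Hypothetical wording; send this along with the first ten pages.)"""
    return ("From the document below, return a JSON object with exactly "
            "these keys: " + ", ".join(fields) + ". Use null for any "
            "field you cannot find. Return JSON only, no prose.")

def parse_reply(reply, fields=FIELDS):
    """Validate a model reply and report which fields came back null,
    e.g. to count how many ISBNs each model actually returned."""
    data = json.loads(reply)
    record = {k: data.get(k) for k in fields}
    missing = [k for k in fields if record[k] is None]
    return record, missing

# A made-up reply where the model found the author and title only:
record, missing = parse_reply(
    '{"author": "A. Author", "title": "Example Title", '
    '"date": null, "isbn": null}')
# missing -> ["date", "isbn"]
```

Summing `missing` across a batch gives exactly the kind of "model A returned three times as many ISBNs as model B" tally described above, and the leftover records with nulls are the ones that go to the manual pile.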

22:24 And then I did some more research. Somebody, I think, told me that Gemini was trained for categorization, and it makes a lot more sense that Gemini would be better at it. But it's like the

22:33 amount of time it would have taken me to do that on thousands of books versus it doing it. And I still had to do some manually, I did some manually this week, but

22:45 out of the 3,000 or 4,000 papers that I pulled for this last batch, I had to manually do like 100 of them. And so it's like - Bet that wasn't fun. No, it's not fun. But I'll take 100 over 4,000

22:54 any damn day, you know. Like, what is that, 10, 20 maybe? And so it's like, it's not perfect, but it's way better than manually having to do this shit. And so like, I think, again, I

23:07 think ultimately, I say all this to say that I think the multimodal stuff... I haven't even, Llama just came out with their new one this week, I haven't even played with it, but I think the

23:15 multimodal stuff is gonna be revolutionary around that exact thing of, hey, here's all these documents, go through each one of them. I mean, it happens across every oil and gas

23:26 operation. You have a report for every single thing that you do, which has all the important information in it. And it goes into a PDF, where data goes to die; it doesn't go into a database. And it's

23:36 like, now you've got all these companies that have set up all these, you know,

23:41 God, I forgot what Microsoft calls their connector flow stuff, but their flow apps where it's like, okay, if an email - A video or something? It's like that. But it's like - The Azure stuff?

23:52 Yeah, they set up this automation where an email comes in, or you get an email to this address with this in the subject, so then go take the PDF from that email, send it over here, and then run it

24:06 through this OCR. And it's like, or you just feed it into a model, and it extracts it and goes and does exactly what you want. It's gonna be a lot easier, I think, in the future for a lot of

24:16 this stuff. Yeah, we're definitely looking forward to that. I mean, just the thing about the OCR: we first started off, I think, using Google's standard OCR. And then slowly

24:30 we tried other OCR vendors. And then when 4o came out, like you said with the multimodal, we were like, okay, well, we know this can recognize, if you give it a document, this can recognize

24:40 text on it. How much better is it than traditional OCR? So that's where we went in. We did some testing. I think we ended up finding out it was like maybe a little better, maybe not

24:53 much better. It wasn't really too different, just because the OCR that we were using at the time was pretty good. And yeah, so it was really just about all these tools that we have, and then we end up

25:05 choosing the best one for our need. One really cool place where the multimodal model is better: if you OCR something,

25:15 typically it's really hard to include like the location data and translate the location data into some meaningful insight about the document. But if you're using the multimodal model, that stuff

25:24 kind of gets baked in. So being able to answer the question, is this piece of text a title, or is this piece of text a page number at the bottom? Or a caption? Or a caption, exactly. That's much easier

25:38 with the multimodal model. That's, I think, a thing a lot of people don't realize about RAG in general: you're essentially doing a lookup on a database,

25:51 which is the vector database, for your RAG model. But the tricky part is, if you want to do it well and you want to be able to display the results how they were originally created in the original

26:04 document, you have to understand how the document is constructed, understand what order, you know... I never, ever, ever thought that technical papers would be an issue, but then

26:15 you get in there and it's like, there's two columns. That's right. So when it gets to the bottom of one column, does it read all the way across, or does it read down to the bottom of the

26:22 column and then back up to the next one? And so there's all these nuances about that stuff that these things just keep getting better about. But yeah, you essentially have to figure

26:32 out how to deconstruct the original document, and then have all the information about how it's constructed, so you can reconstruct it when someone wants

26:42 that information or wants to search for it. So it's not trivial, but I completely agree with you. The multimodals are really, really good. A lot of the tools that they've built

26:52 around the multimodals are really, really good at, like, okay, here's the document, here's the title, here's what's in the header, here's the headings and subheadings of the document. And so

27:03 it's like, from there, you can pretty quickly figure out how to reconstruct or deconstruct a document into an index, if I know, like, okay, here's each header, and then the sub

27:13 headers within those. But then it's like, okay, there's also a random diagram and an equation on this page too. And so with some of these new tools, it's like, okay, well, that equation is

27:22 inside these coordinates. It gives you the coordinates on the document where it is. And so you can always return that with 100% confidence, instead of being like, oh, well, there's an image over

27:33 here, but we don't really have a way to tell you where, or always return it.

27:39 There's a lot, there's a lot to it. But

27:43 what are some of the

27:58 kind of lessons learned that you guys have had over this? Because what y'all are doing, I think, applies beyond just land, right? Like there's so many documents in every industry,

27:58 basically. But what are some of the things that you guys learned that you would do again or wouldn't do again, kind of moving forward? Also, where are you guys just kind of company-wise? Where

28:11 are you guys at? Are you all kind of still early-stage, or anything like that? Just give me an idea of how big you guys are, so I can understand where you're at in your journey. So we're very

28:22 early stage. We've been at it for a little less than a year, maybe coming up on a year pretty soon. Great. And no, I guess, proper - No VC funding. Yeah, no VC funding at all, mainly

28:38 bootstrapped, so friends and family. Yeah, that's the best way to do it. It's the hardest, but it's the best way to do it at the beginning, for sure. So, okay, given that, how are some of

28:53 your customers or pilots and stuff going? What are their main kind of takeaways or benefits that your customer feedback that you're getting?

29:04 Yeah, so I guess the main feedback is just the amount of time it saves. Right.

29:12 I mean, let's say you're handling, you know, if you're trying to buy new properties every week, right, you're basically starting a process every week that will last a month or something like that.

29:24 So you're stacking all these, you know, concurrent processes. But if we can shave two, three weeks off that process, then, you know, you can do the whole thing and, you know, find more deals

29:38 in the same time, right? I mean, that stuff is to a large degree being taken care of for you. So, you know, you can just do more deal flow. Yeah. No, that makes sense. I mean, like I said,

29:49 if you've never done it, and I haven't, but I've been close enough to people who do it to know how painful it is. Like, if you've never done land and title,

29:59 it's not easy. It's not fun. And that's, again, that's a perfect example, in my opinion, of how technology should replace the shit that humans don't like doing and aren't good at. Like,

30:09 that's where we should absolutely lean on technology. So it's like, yeah, reading thousands of pages of cursive is not my personal top-of-the-list thing I want to do tonight, so let's let a

30:21 language model do it for me. Actually, one thing, a challenge that we've noticed that's pretty recurring with most of our customers that we're talking to and working with is the fact that, yes,

30:36 we can automate a lot of these egregious tasks that they don't like doing.

30:43 But one big thing is the probates themselves. They're in a different court. With land documents, they're usually with the county clerk, but the actual probates are with the court clerk.

30:57 So it's a different part of the county. And that's where sometimes you will find the inheritance and who gets what. You could tell, once you're doing title work on a

31:08 tract, you could see that there was a lapse here. And that's where you have to go, as a landman, or pretty much us now, and figure out, were there any probates there? And a lot of times

31:18 the land documents themselves are digitized, but not the probates. The probates are in a completely different system, where you have to go call the county and request them, and sometimes it could take

31:28 weeks to get them back. So that's something that we're definitely working towards automating, but the way the county works and the way the government works is just making it

31:39 very hard for us. Yeah, no, I think that's another, again, another nuance that the average person doesn't realize. It's like, the state of Texas has its own state stuff. Every other state has

31:49 its own state stuff, and everybody has federal stuff, and yeah, everything is always different. I mean, that's another thing with the mineral side. And then there's even, like, the vertical,

32:01 like, depth. Oh, well, I own it from this depth up, and someone else owns it down, and you're like, oh, man. A whole other complexity thrown into that, right? Yeah. So, how do y'all -

32:14 are there, like, data sources that you guys can pull in for a lot of these land and title documents? Or is there still a lot of manual stuff? How does that kind of work on the platform? Well, it's

32:25 every different sort of thing. I mean, any way you can imagine data being presented to you, that's how we get it. So sometimes we're really lucky and we can just buy the data outright from the

32:38 county. Sometimes the county doesn't want to or can't because of some contract that they had with some other data provider. And then we have to get them individually. So at that point, we have to

32:49 navigate their system and pull the docs out ourselves, which adds a whole other dimension to the time. And then sometimes the documents are all paper. It's just paper, microfiche.
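The patchwork of sources described here (bulk purchases from the county, per-document scraping of the clerk's system, photographed paper and microfiche) is a classic fan-in ETL problem. As a rough sketch of the kind of normalization layer involved, with every name and field hypothetical rather than Ionian Labs' actual code:

```python
from dataclasses import dataclass

# One common shape for a recorded instrument, however it arrived.
@dataclass
class CountyDoc:
    county: str
    doc_id: str
    source: str   # "bulk", "scrape", or "scan"
    text: str     # extracted or OCR'd body text

def normalize(raw: dict) -> CountyDoc:
    """Map a raw record from any acquisition path into the common schema."""
    if "bulk_export_id" in raw:          # bought outright from the county
        return CountyDoc(raw["county"], raw["bulk_export_id"], "bulk", raw["body"])
    if "scraped_url" in raw:             # pulled one-by-one from the clerk's site
        doc_id = raw["scraped_url"].rsplit("/", 1)[-1]
        return CountyDoc(raw["county"], doc_id, "scrape", raw["html_text"])
    # paper / microfiche: photographed on site, then OCR'd
    return CountyDoc(raw["county"], raw["frame_id"], "scan", raw["ocr_text"])

docs = [normalize(r) for r in [
    {"county": "Howard", "bulk_export_id": "2021-00042", "body": "WARRANTY DEED ..."},
    {"county": "Howard", "scraped_url": "https://clerk.example/doc/2021-00043",
     "html_text": "OIL AND GAS LEASE ..."},
    {"county": "Martin", "frame_id": "fiche-17-b", "ocr_text": "PROBATE ORDER ..."},
]]
print([d.source for d in docs])  # ['bulk', 'scrape', 'scan']
```

Downstream title-chain logic then only ever sees `CountyDoc`, no matter how painful the acquisition path was.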

33:09 I don't even know what microfiche is. I'm just old enough to have used microfiche in the library, back when people went to the library. You used to gotta ask

33:17 nicely and then they'll let you take pictures.

33:23 I mean, it's just so funny to me because there's so much

33:28 of the data problems, regardless of what industry you're in. So many of the data problems stem from the fact that the data just isn't digitized or structured properly, right? And it's

33:38 like, it's crazy. And from their perspective, it makes sense. I mean, you're talking about a county that has 9,000 people in it. Well, that's what I'm saying, right? Like there's no way to

33:46 solve it because there's no incentive unless the state comes out and says, hey, everyone, you're gonna have to fucking digitize this because we want you to digitize it. And then it's like, okay,

33:54 well, great, now we have Texas. All right, well, now we got 49 more, you know? Like it's still, every state has their own stuff. So it's such a, again, it's a necessary evil, but I'm glad

34:06 there are people like you guys

34:09 Hey, let's make this way more efficient than it has been in the past. You end up having favorite counties a lot of times, too. I like working with Howard County because

34:20 everything's digitized there and it's very easy to access. Then like a county right next to it

34:25 is like the complete opposite. Right. And they never pick up the phone. Yeah, never pick up the phone, always leave a message. And then if you want anything, you're gonna have

34:34 to go there yourself. You're gonna have to drive from here, like 13, 14 hours, all the way to West Texas, just to look at a few documents. Right, yeah, they can't email them to you. I'm

34:44 done with it.

34:46 I'm done with it, right. Let's see, what else?

34:52 Are y'all all cloud-based, or do you self-host? How is your architecture? Yeah, we're cloud-based right now. So when Eugene and I first started, since

35:03 this is our first business, first startup, we knew there were going to be some mistakes that we'd make and learn from. And there were a few things that we knew, or at least from

35:13 hearing from our buddies who've started their own companies, that we knew we wanted to get right. And one of them was the way we host and serve our platform. We knew we needed to move fast and not

35:25 really worry about doing it perfectly or anything like that. So we host right now on Vercel, which is just a wrapper on AWS. I've got some Vercel stuff too. Very easy to ship.

35:38 I think I could probably speed-run standing something up within like 30 minutes, maybe even less. That's how quick it is. So we got that right. And we're very happy about it. And

35:48 then we use, like I said, Postgres; they do some offerings for their backends. But yeah, a full TypeScript stack, all on Vercel, and we just move fast, which is the most important thing right

36:01 now. That's one of the biggest mistakes I feel like most startups make. They're so worried about it being perfect that they're not shipping. And because they're not shipping,

36:10 they're not getting traction or customers. And because they're not getting traction or customers, they don't have any revenue. And then they close up shop. It's like one of the last bosses I had

36:20 at Hivesil, which is my only true stint on the tech side of things. He said, don't let perfection be the blocker of progress, or something

36:35 to that effect, right? And that really stuck with me, because here, Colin is very much a "if you're not embarrassed by it when you ship it, it's too late" person. Yes. It

36:45 should be pseudo-embarrassing, or like have some bugs. You're gonna know it better than anyone else that uses it. So you have to keep in the back of your head, how many

36:55 people are actually gonna go down this random chain of steps that I went down that resulted in this bug? Probably not most of them, right? And so, no, I think

37:07 y'all are spot on with that strategy, 'cause there's so many times where you have an idea, you don't know if it's gonna work or not, or if the market wants it,

37:17 but the only way to know is to put it out there. That's right. You gotta put it out there. Yes. And you have to be able to rapidly deploy and test that. You know, there's

37:26 low-code stuff and, you know, tools to go and prototype, like some of

37:28 these low-code kind of

37:33 LLM RAG things now, where it's like, I can run Llama on my laptop in a fully orchestrated RAG within 30 minutes on a new thing, just to see if it works. Just to see if my hypothesis was, like,

37:52 are there legs here or not? If there aren't, it's very quick and it's obvious. If there are, then it's like, oh, well, I didn't just go spend two months trying to figure this out, right? It's

38:01 just like, yes, we're gonna spend time figuring it out, but it's probably worth trying to, versus this doesn't look like it worked very well at all. So let's just not do that, right? And it's

38:10 very counter to intuition. The companies that I worked at were, I mean, billion-dollar SaaS companies. I was even on the SRE or DevOps team, where I wrote Terraform

38:21 and I wrote backend scripts and I did validation software to make sure that a stack stood up for a customer was working right. Right, so that's amazing. I could do all that,

38:32 it's just, I know how slow it is, I know how painful it is, and I could just do this. Maybe not the most efficient thing or the prettiest thing, but it gets the job done, and that's the

38:42 most important thing. Yeah, well, and that's the thing too on the startup side, right? It's frustrating when you are coding, 'cause you're like, I know in the next six

38:53 months I'm gonna have to rip and replace this at some point, but you just have to do it. I kind of coined this "startup masochist" term, and I truly believe that most

39:04 entrepreneurs and startup folks have some kind of masochistic tendency because it is like, especially on the development side, right? It's like, Oh, well, every three to six months, I'm redoing

39:14 the same thing that I did three to six months ago for a new, for this one random outlier thing that popped up or because we have to move to a different platform or what, you know, whatever. But

39:25 you have to be comfortable with the dynamics of that. It's always going to change. It's never going to be perfect. And that's good. You're going to shoot your darlings. Yes. Yeah, they

39:36 don't always have to have perfect hair and makeup before you present them. And you don't always have to leave them there if they're crappy; you can move them. But

39:45 so y'all are, it sounds like, on AWS. Yeah,

39:51 I was going to ask you a little bit more about the fine tuned models. Are you all using any specific open source or closed source models or anything like that? Or are you all doing them yourselves?

40:00 Yeah, we try out the open-source models.

40:05 They don't work that well, and they're a lot harder to fine-tune. You've got to stand them up and maintain them. It's a lot more work. And, you know, the price of the closed-source

40:16 models is just so good. Why would you do all of that work? I mean, granted, there are obviously some use cases, like, you know, if the data is very,

40:26 very sensitive. That's really the only argument, in my opinion. Yeah. But yeah, if you're not doing government work, why would you put

40:35 yourself through that? Well, "but our production data is our IP." Yeah. Come on. "I mean, it's proprietary. No one else is doing the same thing that we're doing at all." We're all doing the exact same

40:46 thing. We can do that for you. Yeah, I'll put it this way: we can do that for you, but it will be a charge on time, that's for sure. No, I just think it's so wild

40:57 how our industry has approached data for so long, and how behind we've been. We've learned a lot of hard lessons that we wouldn't have had to learn if people would have just talked about shit.

41:10 Like, frac interference is a perfect one, right? We've got all these crazy development plans to put these wine racks and all this stuff in West Texas. And then it's like, oh, it turns out you

41:21 can't just shove a bunch of wells in one lease, because they all pull from the same formation. There's only so much oil down there. Right.

41:35 We've got enterprise deployments and stuff, and it's very interesting to see. As you can imagine, the big companies are all very hyper-focused on data security, and they

41:46 want it all in-house, on their Azure environment, nothing open source, nothing closed source.

41:55 But to your point, it is a hell of a lot easier to use most of these closed-source models. They also have a lot more tools to go with them.
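One common way to serve both camps described here, the hosted closed-source default and the enterprise "everything in our Azure tenant" deployment, is a thin provider abstraction, so application code never cares which backend answers. A minimal sketch with stub clients standing in for real SDK calls; every class, method, and config key below is hypothetical:

```python
from typing import Protocol

class ChatModel(Protocol):
    """Anything that can answer a prompt."""
    def complete(self, prompt: str) -> str: ...

class HostedAPIModel:
    """Stand-in for a closed-source vendor API client."""
    def complete(self, prompt: str) -> str:
        return f"[hosted] answer to: {prompt}"

class InHouseModel:
    """Stand-in for a self-hosted model behind the customer's own endpoint."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def complete(self, prompt: str) -> str:
        return f"[in-house@{self.endpoint}] answer to: {prompt}"

def make_model(config: dict) -> ChatModel:
    # Enterprise deployments flip one config flag; app code is unchanged.
    if config.get("deployment") == "in_house":
        return InHouseModel(config["endpoint"])
    return HostedAPIModel()

model = make_model({"deployment": "in_house", "endpoint": "azure-vnet.internal"})
print(model.complete("Summarize this deed"))
```

The design choice here is just structural typing: swapping backends becomes a config change instead of a code change, which is what makes the enterprise conversation tractable.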

42:10 And it's just so much faster, right? Like I still use the OpenAI playground for all kinds of shit just to, again, test it out and see if there might be something there, and if there's not, not a

42:15 big deal. But

42:19 Are y'all fine-tuning through the model providers' tools, or are y'all using something different? I'm curious on the fine-tuning side 'cause we're potentially looking into that in the future. And there's a bunch

42:29 of platforms out there that are like, oh, we'll help you fine-tune your stuff, fine-tuning as a service, something like that. I'm just curious, from your experience,

42:38 you don't have to say which ones or anything, but how has that been? Do you have any confidence in these third parties, or are you doing it all yourself? Well, from my

42:47 experience, if you're using the closed-source models, you really don't have a choice. You have to use whatever tools the closed-source model provides. I mean, I don't really see how, for example,

42:57 for GPT-4o, a third party could fine-tune 4o for you. Right. Yeah, I mean, for the open-source models, I guess the third-party fine-tuning services could be fine if you're

43:10 trying to ship something out pretty quickly. But you're already putting so much work into it, you may as well just spin that part up yourself. I mean, that's not the hard part of the

43:21 whole process anyway. The hard part is really getting the data for the fine-tuning in the first place. Right, yes. Yeah, I mean, you don't want to,

43:31 you really don't want to put the wrong data in there, 'cause you're teaching it the wrong thing, and it's going to be really accurate in a very bad way. Yeah, then you'll be like Gemini with

43:39 all the Reddit troll answers. So I actually added that to, we're doing a hackathon this weekend, and I added a screenshot of those to one of the challenges. Because one of the challenges is, hey,

43:52 we're going to give you all the Collide post data and the likes and comments; come up with an algorithm to help us figure out how we determine whether this post and this content is, one, valid,

44:05 and two, worthy of adding to the RAG model, so that it's there in the future for the knowledge base, like the brain to get bigger. Yeah, it's just a problem. Well, it's a very interesting problem. And

44:17 then you look at, you know, the whole Gemini Reddit stuff. And as someone who's working on that stuff, I'm like, I know exactly what happened. One of two things happened.

44:26 Either someone's manager was like, I don't care, ship it, while the developers were saying it's not ready, it's not good, and they just didn't have the time. Or whoever was

44:38 responsible was just super lazy and they're like, hey, we're just gonna go based off of upvotes and we aren't gonna account for the fact that Reddit is full of trolls. And they probably upvote the

44:48 funniest comments as well as the accurate comments, and there's no really good way to distinguish between them. But yeah, it's a non-trivial thing, even with

44:59 fine-tuning, right, to your point. It's like, okay, I've got a shit ton of data; how do I

45:05 know that that data is worthy of adding to my fine-tuned model, or is it just data that I don't need to worry about? What has your kind of approach and strategy been on that side? Yeah, so the

45:17 thing that's worked for us is to just craft it by hand. Yeah. We just sit down with those documents and we painstakingly read them and transcribe everything into the slot it's supposed to go in.

45:27 Yeah. And then that's how it goes. Yeah, nothing else has worked. No, that's how it is with a lot of this stuff. I was telling Colin earlier, I was going through the manual metadata stuff and I was like,

45:37 yeah, I just finished. Our metadata is gonna be so fucking glorious. It's gonna be so good 'cause I did so much of it by hand. You know, it's a pain in the ass, and the only thing

45:48 you can do is grind through it. Like there's nothing else. There's no other way to do it.
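The hand-curation grind described above (read each document, transcribe every field into the slot it's supposed to go in) typically ends in a JSONL file of prompt/completion pairs for supervised fine-tuning. A toy sketch of what that produces; the field names and the deed example are illustrative, not their actual schema:

```python
import json

# Each example was read and transcribed by a human, slot by slot.
curated = [
    {
        "prompt": "Extract grantor, grantee, and county from: "
                  "'John Smith conveys to Jane Doe ... Howard County, Texas.'",
        "completion": json.dumps(
            {"grantor": "John Smith", "grantee": "Jane Doe", "county": "Howard"}
        ),
    },
]

def to_jsonl(examples: list) -> str:
    # One JSON object per line: the usual fine-tuning upload format.
    return "\n".join(json.dumps(ex) for ex in examples)

blob = to_jsonl(curated)
for line in blob.splitlines():        # sanity check: every line must parse
    record = json.loads(line)
    assert set(record) == {"prompt", "completion"}
print(len(blob.splitlines()))  # 1
```

The validation loop is the point: since wrong data teaches the model the wrong thing "really accurately," even a hand-built set gets mechanically checked before it's uploaded anywhere.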

45:53 What kind of traction do you have? Who are your typical kind of customers? Mineral buyers. Okay, yeah, because I know it could be argued that operators also look to buy mineral interests. But we've realized that the mineral buyers

46:21 themselves are much easier to work with. They're fine with moving fast. I was going to say, it's much faster and easier. You don't have a whole legal team dissecting your MSAs and doing privacy shit.

46:35 You're dealing with a lot fewer people. Yeah. These mineral-buying companies are usually just a few people working on it. It's a bunch of ex-land guys that

46:44 buy minerals now, right? Yeah. I can just email the founder, email the partner, the person who actually runs the mineral-buying company, and work with him directly. And so

46:54 that's where we find the most success. We're definitely looking down the road to see where we can go, but mineral buying is definitely our bread and butter right now.

47:06 And how does that work? Is it kind of a SaaS product? Yeah, it's a SaaS product. You know, some of these mineral buyers have odd requests here and there. Right. You

47:17 know, and we're perfectly capable of providing that as well. Yeah, nice. It usually ends up being a combination. Very cool.

47:30 What is y'all's kind of plan moving forward? Are you going to try and grow from cash flow, or are you going to try and get investors? Or do you know yet?

47:43 Not knowing is also perfectly acceptable, because it's your first year. It's just out of curiosity. Yeah, I guess it depends on

47:54 what the optimal path is at that given time, right? Like what's the deal? Are they offering good terms? Are we getting good cash flow? No, that's a good answer.

48:08 I mean, we're alive and we're kicking and we're not going anywhere anytime soon. So there's not really a huge impulse for us to go reach out to investors, stuff like that. What were some of the problems

48:21 that you faced prior that you wanted to work on or try and solve with a company that we haven't already talked about? Or is it mostly what we've already talked about? What do you mean? As far as

48:35 like, before you started the company, when you were at Opportune or anywhere else prior, were there any glaring big things where you're like, man, this is the worst? Like other ideas for

48:47 us to solve. Yeah, there was one idea from the Opportune days, looking at production accounting: production accounting from, you know, midstream and from the well

49:01 site, and seeing if those things match up, right? You know, whether someone's stealing from someone else or whatever.

49:12 I think we also had this one other thing. The CRM thing, yes, you could tell us more about that. Oh, no, it was a CMS: a change management system. So, I mean, this is not

49:24 really related to energy or oil and gas, but working at some companies, the bureaucracy was just so big that if you wanted to go change one

49:38 small thing on one server, you had to go through this entire process. And the company that I was working for actually had an in-house-built system for this entire process.

49:52 They had this entire system where you would essentially request it, and then someone who's allowed to approve would go approve it. And if it becomes a very common change, like we

50:04 find ourselves changing this environment variable or this setting on this EC2 instance a lot, we would go ahead and automate it. And then that way we would let automation take care of it. And

50:14 automation would go open up the request and close it, because it doesn't need approval; it's deemed a safe change. So we thought about maybe building something like that for

50:25 companies. Like a DevOps thing, almost. Exactly, yeah. And we would handle the entire thing, like any software that would go with it. So we thought about it, but after

50:39 talking to a lot of people in multiple industries, it was just such a glaring problem in land and oil and gas. So we were like, let's just look at this first. That's cool,

50:51 man. You end up wanting to solve the problem of the people who talk to you the most. Yeah, for sure. Or the one that has frustrated you personally, right? Like

51:02 that's always it. The fastest way to get me to dive deep on something is to throw something at me where I'm like, this is just stupid, I need to fix it. This is a dumb way of doing it. Especially

51:13 if most people talk about the same thing. Like, we were talking to people all across the land industry, and the same problem would just keep coming up, and we're like, this is definitely not a

51:22 coincidence; there's something here. Yeah, no, I've got a good buddy that I'll introduce you all to after this podcast;

51:31 he was a landman at Chesapeake and a bunch of different operators, but mostly on the exploration side, back when shale was blowing and going, a hundred-dollar oil, and everyone was buying up

51:43 acreage left and right. And so yeah, he has a

51:46 minerals business, and he is your target buyer, right? For this type of thing. But it's true, it's like those guys. I mean, you can know the process and what's a good value

51:58 and all that, but that doesn't mean you can always get these deals done, because you still have to go to the damn courthouse and do all this stuff. Yeah, so

52:08 having a platform where it's like he can work from wherever the hell he is at that moment in time and not have to spend a week in Midland or wherever just researching is inherently valuable out of the

52:21 gate. So

52:24 Where can... I'm sorry, I'm jumping ahead. This is the speed round. I can't believe we're already at an hour. It never ceases to amaze me how quickly these go.

52:37 Let's do the speed round. Favorite

52:41 video or board game?

52:46 Big fan of backgammon. Backgammon, nice. My grandfather taught me backgammon, and then I haven't played in twenty-something years. Not a lot of people play it. No, you're

53:00 right. I randomly was at a restaurant over here in City Center for lunch one day, and there were these two old guys playing backgammon. I was like, oh shit, that's awesome. I had like a

53:09 weird memory-flashback moment with my grandfather. Catan. Catan? Yeah.

53:15 That's a good one to do, yeah. Yeah. Absolutely. Favorite book or podcast? I originally got into Sean Ryan's show; discovered him like two weeks ago. He's very big, but I just recently

53:30 discovered him. It's so amazing to me; podcasts specifically are so crazy because there are so many. You had never even heard of it, and then you're

53:40 like, oh, well, this guy's huge, how have I not heard of him? And you're like, oh, 'cause there's lots of people, but there's only so much capacity that you have for that stuff. What about you?

53:50 I don't know if I'd call it my favorite, but the one that I just found recently and have gotten really into. Have you heard of Hardcore History at all? Yeah. There's a guy that does Hardcore Game of Thrones;

54:03 he mimics Dan Carlin's voice

54:06 and he does the whole thing.

54:09 This is also what I love about podcasts: it's like a creative outlet. I would have never thought of that, but I love it; that idea is great. You should check it out. I think it's also on

54:19 YouTube. What's it called? Hardcore Game of Thrones.

54:22 It's also SEO-optimized for search. I appreciate that too, right? Like not coming up with some random name that you wouldn't be able to tell anything from. Last one: favorite open-source library.

54:41 Uh, Swift. Okay. To be honest with you, I just check to see which one will solve the problem I'm working on, and then I'll just use that. Spoken like a good developer. Spot on, but no one

54:56 has a good answer for a favorite because there

55:00 are so many. My favorite is the one that solves the problem. Yeah, my favorite is the one where it's already been done, so that I don't have to build something from scratch. Yeah, and it's

55:06 easy to work with, for sure. Or the one I'm learning right now.

55:13 So that was, I guess something I forgot to ask you guys. What do y'all code in normally? Is it Python

55:21 or? TypeScript. TypeScript. Yeah, mainly TypeScript. We do some Python; Eugene usually does a lot of Python for the AI stuff. Yeah, with the ETL and backend stuff. Exactly, but TypeScript's

55:32 the main thing right now. Easier for the web app. Yeah, that makes a ton of sense. Where can people find you if they're interested in talking to you guys? They can email us at info@ionianlabs.com

55:46 or really anything@ionianlabs.com, 'cause we own the domain.

55:52 And the website is ionianlabs.com. Yes, ionianlabs.com, right. Or

55:56 LinkedIn.

55:58 Guys, I really appreciate it. Everybody go check these guys out. I don't know how many of y'all are listening to my podcast, but I know there are plenty of mineral

56:06 buyers out there in our community that will probably find this very interesting. So go check them out. Appreciate you guys. Give us a like and a subscribe if you get a chance, and we will see you

56:16 guys next time.

Creators and Guests

Bobby Neelon
Host
Husband, Father, Baseball, Upstream Oil and Gas, R, Python, JS, SQL, Cloud Computing

John Kalfayan
Host
Raddad, energy tech, crypto, data, sports, cars