Transcript
I've been a fan of OpenAI since they launched ChatGPT, and I've used it almost exclusively. It's only recently that I started to experiment with other models and actually take notice of Google, which was pretty slow out of the gates, and for good reason -- the outputs of their models were, let's just call it, subpar. But what I've noticed as they've come out with their newer models is that the performance is getting quite a bit better.
And more than that, if you're a heavy user of Google, which I suspect most of us are -- whether it's Google Search, Gmail, Google Docs, Google Slides, the list goes on -- I started to realize Google might actually be able to win this race given that they are so deeply embedded into our lives already. Combine their ability to deliver good models, which they're starting to do and which I anticipate will only improve, with the fact that they're able to natively integrate into the things that we already use every day, and that's pretty material. It's very meaningful.
And so what I wanted to do is actually go through an example of what got me thinking along these lines recently. Obviously they're already in Search, you see them in Gmail, you see them in Google Docs. But as I started to play with it more, and I'm seeing it in more places and seeing how deep they can actually go in terms of the entire tech stack, I started to really think they actually might come up from the rear of the pack.
What we're looking at here is Google's BigQuery product. Now if you're a user of data in any meaningful way -- think databases, tables, SQL, querying information -- it is an incredible product to have embedded into your business. Whether you're a power user or you're just a consumer of reports, it is a fundamental tool that should be a part of any meaningful business. BigQuery is their product in conjunction with Cloud Storage and a whole variety of other things, and what's really nice about it is that it's deeply integrated with all of the APIs that they offer.
Now just from the landing page here you can see they come out of the gates -- "a data warehouse to a unified AI-ready data platform." And so imagine being able to leverage these models to actually not only build out databases but actually interact and explore data through the lens of AI to get the insights faster. And so this is one of the use cases I was recently going through and as I started experiencing what they're able to do with the tech stack that they have, I started getting pretty excited about it.
So I'll scan down here a little bit more. You can see "Power your agents with Gemini and BigQuery," bring multiple engines into a single copy of data, okay fine. The whole point here is they have an entire tech stack in conjunction with their data and, oh by the way, you can get up to 10 gigs of storage and up to a terabyte of queries free per month. So for free you can essentially access all of these tools and integrate them into your business.
And so let's hop over into BigQuery. I'm not actually going to get into the tables and data, all that stuff. But what I did want to take a look at is that, beyond the tables and the data that they offer, they actually enable you to integrate Python using their product called Colab. So I'm going to go ahead and open up this demo notebook, and what this is actually going to do is take us through some of the features and functionality that they offer.
And what's nice is if you're not familiar with this, you don't know what pandas is or what data frames are or what Google is dubbing BigFrames, which is part of their BigQuery product -- all a data frame really is is just a table. But practically speaking you could go through this whole thing and not even necessarily need to know how all that works.
Let's just scan through this so they're giving us essentially a test data set here. They are going to generate customer clusters and marketing messages using this sample data. And so you could easily see how this might apply in your business and how you could leverage the same sort of feature set and functionality to do what most companies historically would need a data science team to accomplish. You essentially can do this on your own by spending a little bit of time understanding how this works and going through some of these demos.
So I'm going to scan through here, we won't read through all this, but essentially I'm expecting it's going to basically build a table, it's going to leverage their integrated AI services to do some level of analysis, some of their machine learning models to cluster the data, and it's going to generate some marketing messaging using likely their Gemini model.
So if I scan down here, okay yeah this is reference, great that's helpful. And so out of the gates it's already doing a lot of the setup for us. And this is really the same way you would actually create a project. And so you can call it whatever you like. They're going to add this data set, they're going to include this model. And in Python what's cool is you literally just run code and so you click on this little play icon and it's going to go ahead and process this for us.
Okay great so this piece of code ran for us. It doesn't look like we're expecting any outputs. Let's go ahead and run this next one. This will actually print something. Okay it created this data set for us and it's called "the look retail." And then it's going to start to walk us through how to actually explore the data. Here are some notes, we're not going to read through all this. I'll just trust what it's doing and it's going to initialize a BigQuery data frame, i.e. the BigFrames. Let's go ahead and initialize that. Okay that's initialized.
Then we're going to go to the next thing. It looks like it's doing some more setup and just setting some variables for us. And then we're going to scan on down here and it is going to then read that for us. So let's go ahead and run that. So here you can actually see the table that it's put into a BigFrame. It looks like there's some products with some IDs and some statuses and some different dates and when it's shipped, so on and so forth. You have sales prices and presumably it's going to walk us through the analysis that it wants to run.
It's going to restrict the columns -- so think of this just like filtering as you would in Google Sheets or something -- and it's only going to filter to these columns. Let's go ahead and run that. Great so now we have a revised data frame here and it's looking like it's just returning back a bunch of IDs, sales price, order dates, and statuses.
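To make the column-restriction step concrete, here is a minimal sketch in plain Python rather than BigFrames. It treats the table as a list of rows, which is all a data frame really is. The rows, the field names, and the `select_columns` helper are made up for illustration and are not taken from the demo notebook.

```python
# Hypothetical sample rows standing in for the order table shown in the demo.
rows = [
    {"id": 1, "user_id": 10, "status": "Complete", "created_at": "2023-01-05",
     "shipped_at": "2023-01-07", "sale_price": 25.0},
    {"id": 2, "user_id": 11, "status": "Returned", "created_at": "2023-02-11",
     "shipped_at": "2023-02-13", "sale_price": 60.0},
]

# The columns we want to keep (assumed names, chosen to match the narration).
KEEP = ["user_id", "status", "created_at", "sale_price"]


def select_columns(table, columns):
    """Return a new table containing only the named columns, in that order."""
    return [{col: row[col] for col in columns} for row in table]


slim = select_columns(rows, KEEP)
print(slim[0])
```

The original rows are left untouched, so you can always go back and pick a different set of columns.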
Okay so then we're going to go on here and it is going to filter on dates. So let's go ahead and run that. So it looks like that came through successfully. And now it's going to do some feature engineering -- this is some real data science stuff where it's going to be using some models to actually try to isolate some clusters of customers. So let's go ahead and see what this will do for us. Okay so that one ran. Now let's look at calculating average spend per customer. You can see it's just essentially calculating the mean of the sales price based on the user ID.
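As a rough illustration of the date filter and the average-spend calculation just described, here is a small plain-Python sketch. It is not the notebook's BigFrames code. The sample orders, the cutoff date, and the helper names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical orders; the real demo reads these from a BigQuery table.
orders = [
    {"user_id": 1, "created_at": date(2023, 3, 1), "sale_price": 20.0},
    {"user_id": 1, "created_at": date(2023, 6, 15), "sale_price": 40.0},
    {"user_id": 2, "created_at": date(2021, 1, 10), "sale_price": 99.0},
    {"user_id": 2, "created_at": date(2023, 2, 2), "sale_price": 10.0},
]


def filter_since(table, cutoff):
    """Keep only the rows created on or after the cutoff date."""
    return [row for row in table if row["created_at"] >= cutoff]


def average_spend_per_user(table):
    """Group sale prices by user ID and take the mean of each group."""
    prices = defaultdict(list)
    for row in table:
        prices[row["user_id"]].append(row["sale_price"])
    return {user: mean(values) for user, values in prices.items()}


recent = filter_since(orders, date(2022, 1, 1))  # assumed cutoff
avg_spend = average_spend_per_user(recent)
print(avg_spend)
```

Filtering first and aggregating second means the old 2021 order never influences the average, which mirrors the order of the steps in the walkthrough.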
And then if we scan down here it's going to calculate total number of returned orders per customer. We'll run that, see what that looks like. Now it looks like it wants to calculate a return ratio of customers -- I think my wife might go off the charts on this one. And then let's go on to compiling a conclusive data frame for the development of a machine learning model for customer segmentation.
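The returned-orders count and the return ratio can be sketched the same way in plain Python. This is only an illustration under assumptions: the status value "Returned", the sample rows, and the function name are hypothetical, and the notebook itself does this with BigFrames.

```python
from collections import Counter

# Hypothetical order rows with a status column.
orders = [
    {"user_id": 1, "status": "Complete"},
    {"user_id": 1, "status": "Returned"},
    {"user_id": 1, "status": "Complete"},
    {"user_id": 1, "status": "Returned"},
    {"user_id": 2, "status": "Complete"},
]


def return_ratio_per_user(table):
    """Return ratio = returned orders / total orders, computed per user."""
    total = Counter(row["user_id"] for row in table)
    returned = Counter(
        row["user_id"] for row in table if row["status"] == "Returned"
    )
    # Counter yields 0 for users with no returns, so the ratio is 0.0 for them.
    return {user: returned[user] / total[user] for user in total}


ratios = return_ratio_per_user(orders)
print(ratios)
```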
So again this is actually effectively building the model for you, leveraging some of their out-of-the-box features and functionality right here. Now obviously if you had to write this on your own that'd be much harder. The one thing I'm going to highlight here is that Gemini is right here. So you can actually chat with Gemini right here in the UI, ask it questions, and even have it give you the code that you would need to potentially insert in here. So you could theoretically just prompt it with what you're trying to accomplish and it might start to steer you in the right direction. And in a way it's like your pocket data manager, data scientist, etc. So this is really cool functionality.
So let me minimize that chat and we are going to go ahead and run this. Right, so now we have our data frame here. And now the next step is to create a K-means model to cluster e-commerce data. Gives you a little description about what K-means is, what the algorithm is all about. And here it's actually going to build that out for us. Go ahead and run that.
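For anyone curious what K-means is doing under the hood, here is a from-scratch sketch of the standard algorithm (Lloyd's algorithm) in plain Python. It is not the managed model the notebook builds in BigQuery. The sample customers, the choice of two clusters, and the fixed seed are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical customers as (order_count, average_spend) pairs.
customers = [
    (1, 20.0), (2, 25.0), (1, 22.0),      # occasional, low-spend buyers
    (9, 200.0), (10, 210.0), (8, 190.0),  # frequent, high-spend buyers
]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

One design note: this sketch uses raw values, so average spend dominates the distance. Real pipelines typically scale the features first so that order count and spend carry comparable weight.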
Okay now it's clustered the data for us. So now we're going to save the model to BigQuery and create essentially its own model in our UI here. So we will run that. That is done. We are now going to visualize the cluster, which is so cool that you can actually build charts and graphs again directly here in the UI on top of your data. Honestly I haven't looked or even thought about MATLAB probably since I don't know, one of my engineering classes back in the day. But let's go ahead and run this and see what its output looks like.
Okay so now it is plotting these and then we now have to actually generate the scatter plot. So let's go see what that looks like. There we go. So it looks like what it's doing is essentially we're looking at various buckets on the count of orders. We're seeing a distribution relative to their average spend, and it's actually created our clusters. If I didn't know anything about this business I would say okay, we have a couple of power users, power spenders with the business out here, and then you see a number of concentrations around just one order, two orders, maybe up to four, and then it starts to disperse after that.
Now let's go ahead and get some summary statistics on this. And we'll go ahead and run these next ones. All right and then let's go ahead and get the summary stats here. Let's run this guy and let's go ahead and run this guy.
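The kind of per-cluster summary statistics being run here can be sketched in plain Python as follows. The cluster labels, field names, and numbers are invented for the example, and the notebook's own summary may include different measures.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customers, each already tagged with a cluster label.
customers = [
    {"cluster": 0, "order_count": 1, "avg_spend": 20.0},
    {"cluster": 0, "order_count": 3, "avg_spend": 30.0},
    {"cluster": 1, "order_count": 9, "avg_spend": 200.0},
]


def summarize_clusters(rows):
    """Count the customers in each cluster and average their features."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    summary = {}
    for cluster_id, members in sorted(groups.items()):
        summary[cluster_id] = {
            "customers": len(members),
            "mean_orders": mean([m["order_count"] for m in members]),
            "mean_spend": mean([m["avg_spend"] for m in members]),
        }
    return summary


summary = summarize_clusters(customers)
print(summary)
```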
So now we get into the fun part where it's actually going to start leveraging some of the AI models that Google offers as a part of this particular workflow. And so here we see that it's going to be leveraging generative AI and it's essentially going to have it explain the customer segments. And it's going to use Vertex AI, which is Google Cloud's platform through which the Gemini models are available. And so we'll go ahead and -- looks like we have to do a couple setup pieces, so we'll go ahead and create the connection. And then we need to likely connect to our project here and we just got to run through these steps. And then we will also kick this off.
Okay so now we're actually going to get into prompting the model. And so we need to go through a couple steps to achieve that. So we're going to set up the model and connection here. And then in the next step here you can see it's actually giving it a prompt and this is no different than how you would interact with ChatGPT. And you can see it's passing this in and it's passing all the data and it's asking for something for each of those segments or the clusters that it's defined so far. So we're going to go ahead and run that. And then finally we'll run this last cell here.
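To show what "passing all the data" into a prompt can look like, here is a minimal sketch that turns per-cluster statistics into a single text prompt. It only builds the string and does not call any model. The prompt wording, the statistics, and the `build_segment_prompt` helper are hypothetical and not the notebook's actual prompt.

```python
# Hypothetical per-cluster statistics, as might come out of a summary step.
segments = {
    0: {"customers": 2, "mean_orders": 2, "mean_spend": 25.0},
    1: {"customers": 1, "mean_orders": 9, "mean_spend": 200.0},
}


def build_segment_prompt(summary):
    """Format the cluster statistics into one instruction for a text model."""
    header = (
        "You are a marketing analyst. For each customer cluster below, "
        "give it a short name, describe the persona, and suggest one "
        "marketing action.\n"
    )
    lines = [
        f"- cluster {cid}: {s['customers']} customers, "
        f"{s['mean_orders']:.1f} orders on average, "
        f"${s['mean_spend']:.2f} average spend"
        for cid, s in sorted(summary.items())
    ]
    return header + "\n".join(lines)


prompt = build_segment_prompt(segments)
print(prompt)
```

The resulting string is what you would hand to whichever model you connect, in the same way you would type a question into a chat window.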
All right so now we have the final outputs and it's broken this into roughly five clusters. And it is essentially creating avatars or market segments that you can then target for the purposes of this particular use case. So we have the Value Seekers, we have the Premium Purchasers, all the way down to the Occasional Buyers. And you can see it's actually defined a persona leveraging one of the models and it's actually identified some of the marketing steps. So this might be how you engage with these particular individuals based on their usage and behavior with a particular product, driven by the data that we just ran through from the data set that they provided.
So I know this was an out-of-context example that they just have in a demo notebook available in BigQuery, but I think it starts to illustrate what's possible and, again, just how deeply integrated Google's existing tech stack is, along with their ability to integrate that natively into each of these experiences. And as they improve the models over time, this is going to be incredibly powerful.
I could easily see starting to transition from being an OpenAI guy to a Gemini guy, or you know any of the other models. In all likelihood you'll pick the model that fits the use case that you're trying to solve for. In the context of exploring data, building models, and having this capability to build essentially a machine learning model and then layer in generative AI to articulate that in a meaningful way that you can then share with other people or actually make some decisions off of -- to me that is really powerful.
Now I'm not sure that AI is going to take over the world, but what I am certain of is these models are fundamentally going to change the way that we work, the way that we operate, the way that we interface with people in the workplace or customers, etc. It is meaningfully impactful. The power and the speed with which you can execute when you have these in your back pocket is remarkable, so I would encourage you to continue to play with these, explore, be curious, and stay hungry. Until next time.