Table of Contents
- Prelude
- The Problem
- Getting the page HTML
- Checking if it’s in stock.
- Notify me!
- So what happened?
Prelude
So this whole story started because my girlfriend asked me for a KitchenAid stand mixer for her birthday. We saw that Costco had a really nice one on sale for $90 off, which was enough to trigger my “we need to get that price no matter what it takes” instinct. Unfortunately, on the first day the item was available, it sold out in-store and online. I tried calling every day to see if they would get it back, but every day they said they had no more. One day I called, and the employee mentioned that he saw it available online, which surprised me since it hadn’t been available any of the times I had checked. This is when I realized that it must be going in and out of stock, and that it would be the perfect opportunity to write a script to automate the checking for me.
As with most web scraping projects, this one got pretty messy. Whenever I see technical blogs, I always think “wow, I could have never thought of that.” This blog post is designed not just to tell you what I did, but how I got to my final code, so that you can learn how you could come up with a similar solution to the problem. We’re gonna go through a journey of APIs, HTML parsing, and tons of Python.
The Problem
The first thing I like to do when writing code is to think abstractly about what I need to make happen. Surprisingly, this isn’t just something intro CS professors make you do; it’s a genuinely good way to solve problems. So what did I want exactly? I wanted code that would do the following:
- Load the Costco product page.
- Determine if the item was in stock.
- If it was in stock, notify me.
- If it was not in stock, try again in a minute.
The thought is once we have the HTML, there must be something in it which signifies whether it is in stock or not. So if I was going to write this out with unwritten functions, it would look something like this:
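Here is a minimal sketch of that outline, with the three helpers left as stubs. get_page_html and send_notification are names used later in this post; check_item_in_stock, check_inventory, and main are placeholder names I picked for illustration.

```python
import time


def get_page_html():
    """Stub: download the product page and return its HTML."""
    raise NotImplementedError


def check_item_in_stock(page_html):
    """Stub: return True if the HTML says the item can be bought."""
    raise NotImplementedError


def send_notification():
    """Stub: tell me the item is back."""
    raise NotImplementedError


def check_inventory():
    """Run one check; return True if a notification was sent."""
    page_html = get_page_html()
    if check_item_in_stock(page_html):
        send_notification()
        return True
    print("Still out of stock")
    return False


def main(poll_seconds=60):
    """Keep checking once a minute until the item shows up."""
    while not check_inventory():
        time.sleep(poll_seconds)
```

Nothing here runs yet: calling main() fails on the first stub, which is exactly the point.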
I know this looks very similar to the meme about using “coding and algorithms” to have drones not crash into each other…
But the difference is we’re gonna fill everything in! So how do we actually do it?
Getting the page HTML
So how do we load the data from the Costco listing? Normally for getting HTML,
my first instinct is to use the requests library. requests
is an extremely well-developed library, which makes it easy to make web requests. For example, if you wanted to get
the HTML for my website, you would just have to do the following (shown in REPL
format):
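As a rough sketch, written as a small script rather than a REPL session, the call looks something like this. The fetch helper and the example.com address are stand-ins of mine, and the timeout is my own addition so the call cannot hang forever.

```python
import requests


def fetch(url):
    """GET a URL and return (status_code, raw_html_bytes)."""
    page = requests.get(url, timeout=30)
    return page.status_code, page.content


# Example (needs network access):
# status, html = fetch("https://example.com")
# print(status)     # the HTTP status code of the response
# print(html[:60])  # the first few bytes of the page
```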
You can install the requests library by doing pip install requests
in your terminal.
So that is pretty easy. 200 is the HTTP status code which means “everything
went as expected,” so if we get back a 200, we know we can get the HTML from the
content
attribute of the response. So what happens if we try it on Costco’s page?
Well… on my computer I got this really fun error from the requests library after waiting about a minute:
So that’s when it hit me that this probably was not Costco’s first rodeo, and that people have probably tried to scrape their website before, so they must have some kind of protection against it. So what could it have been…
My first thought was to check the user agent, which is a string that browsers send across
with requests to let the server they’re hitting know what kind of browser they
are. Fortunately for us, these strings are very easy to spoof, so we can try to
make it the same user agent as our own browser. You can see your own user agent
in Chrome by opening up the developer tools, clicking the network
tab, refreshing the page, clicking
on the first network request to the stand mixer page, and then scrolling down to the request headers
and finding user-agent.
From this website, I found out how to modify our request to include this user agent. So does it work?
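A small sketch of the idea: requests accepts a headers dictionary, and you can build the request without sending it to see what would go out. The user agent string below is only an example (copy the one from your own browser), and the URL is a placeholder, not the real product page.

```python
import requests

# Example Chrome user agent; replace it with the string from your own browser.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
# Placeholder address, not the real product page.
URL = "https://www.example.com/stand-mixer"

headers = {"User-Agent": USER_AGENT}

# Build the request without sending it, to inspect what would go out.
prepared = requests.Request("GET", URL, headers=headers).prepare()
print(prepared.headers["User-Agent"])

# Sending it for real is one call (needs network access):
# page = requests.get(URL, headers=headers)
# print(page.status_code)
```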
Nice! We got back a 200 from the server, which means they accepted our request
and we can get the HTML by returning page.content.
So in all, the get_page_html function will look like this:
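A sketch of what such a function could look like, under a few assumptions: PRODUCT_URL is a placeholder, the user agent is an example string, and the timeout and raise_for_status() call are extras of mine so that a failed request is loud instead of silent.

```python
import requests

# Placeholder address, not the real product page.
PRODUCT_URL = "https://www.example.com/stand-mixer"
# Example user agent; use the one your own browser sends.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def get_page_html():
    """Download the product page and return its raw HTML as bytes."""
    page = requests.get(PRODUCT_URL, headers=HEADERS, timeout=30)
    page.raise_for_status()  # raises on 4xx/5xx responses
    return page.content
```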
Checking if it’s in stock.
So now that we have the page HTML, how can we tell if the item is in stock or not? Well, as you can see from the above screenshot, there’s a huge “out of stock” banner on the stand mixer image. So let’s see if we can use that. In Chrome, right click on the “out of stock” banner, and select “inspect” to look at where it is in the HTML.
Here’s what it looks like:
So it looks like the “out of stock” overlay is an image tag, with the CSS classes oos-overlay and img-responsive. The img-responsive one doesn’t seem too useful, but oos-overlay is extremely specific. This is really handy for us, since we can use this info to tell if the item is out of stock. There’s a really nice library for Python which is really good at parsing HTML, called BeautifulSoup. You can install it by entering the command pip install beautifulsoup4. One of the things that’s really easy to do with BeautifulSoup is find all of a specific kind of tag with certain attributes. Here is what I found when I googled how to do this. So let’s make a first pass at this:
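A first-pass sketch of that idea, assuming the function is called check_item_in_stock (a placeholder name of mine): treat the item as in stock only when no img tag with the oos-overlay class is found.

```python
from bs4 import BeautifulSoup


def check_item_in_stock(page_html):
    """First attempt: in stock means no out-of-stock overlay image at all."""
    soup = BeautifulSoup(page_html, "html.parser")
    # class_ matches tags that have this class among possibly several.
    overlays = soup.find_all("img", class_="oos-overlay")
    return len(overlays) == 0
```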
Let’s see if this works on the stand mixer page, in conjunction with our get_page_html() function.
Cool! Looks like our code can tell that the stand mixer is out of stock! Ideally we would also like to make sure that if it were in stock, the function would return True. But obviously we can’t do that, or else we wouldn’t be in this situation. But we can instead give it another product’s page and see what happens. Let’s make the URL for get_page_html a parameter so we can easily swap it.
Let’s try it!
Hmm, that’s really weird. Why is it saying that this in-stock item is actually out of stock? We only return True if there are 0 img tags with the class oos-overlay on the page, so let’s see what’s going on in the new product page. Let’s go back to the “inspect” tool on this new page, and do ctrl+f for “oos-overlay” to see where it could show up…
What gives! It looks like the overlay is still on the image in the code, but it isn’t actually showing… On further inspection, it looks like this oos-overlay image is slightly different.
Did you catch it? The overlay has an additional CSS class, hide, which probably has some CSS code telling it to hide the image. So the way Costco implemented this overlay is, if the item is in stock, to add a “hide” class to the overlay so the user does not see it. We can use this in our code to instead look for hidden oos-overlay img tags - if one exists, that means the product is in stock!
Notice the two small changes - now we’re looking for images with both the “oos-overlay” and “hide” classes, and if such an image exists, we return True.
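Sketched out, the revised check might look like this. The function name is again a placeholder, and the two HTML snippets are made up to imitate the two cases described above; the real pages contain far more markup.

```python
from bs4 import BeautifulSoup


def check_item_in_stock(page_html):
    """In stock means an oos-overlay image exists and also has "hide"."""
    soup = BeautifulSoup(page_html, "html.parser")
    for img in soup.find_all("img", class_="oos-overlay"):
        # The class attribute comes back as a list of class names.
        if "hide" in img.get("class", []):
            return True
    return False


# Made-up snippets imitating the two cases.
out_of_stock = '<img class="oos-overlay img-responsive" src="oos.png">'
in_stock = '<img class="oos-overlay hide img-responsive" src="oos.png">'
print(check_item_in_stock(out_of_stock))  # False
print(check_item_in_stock(in_stock))      # True
```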
Let’s try it on our two products!
Nice! We’ve successfully written the part which will tell us if the product is in stock or not! Now we need it to notify me when it’s in stock, so I can buy it.
Notify me!
I chose to do two things to notify me once the item goes back in stock. First off, I wanted to get a text once the item came back in stock. Twilio is an API that allows you to easily send text messages. Thankfully, they have a free trial period for the API, which allows you to send ~150 texts for free, given that the number you’re sending to has been verified as yours. This works well for me, as I just needed the notification to go to me.
After making a Twilio account and signing up for the free trial, I was able to
follow their Python quickstart and write the code to text myself. You can verify your number here. You also have to install the Python package by entering the command pip install twilio.
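A hedged sketch of the texting part. To keep it easy to test, this version takes an already-built Twilio client and the two phone numbers as arguments, which is my own arrangement; the message text and the names twilio_number and my_number are placeholders too.

```python
def send_notification(client, from_number, to_number):
    """Text myself through an already-constructed Twilio client."""
    return client.messages.create(
        to=to_number,
        from_=from_number,
        body="The stand mixer is back in stock!",
    )


# Wiring it up (needs the twilio package and a real account):
#   from twilio.rest import Client
#   from secrets import account_sid, auth_token, twilio_number, my_number
#   client = Client(account_sid, auth_token)
#   send_notification(client, twilio_number, my_number)
```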
You might have noticed that there’s a lot of imports from secrets. In general, it’s not good to put secret information in your source code; keeping it in a separate file means you can later put the code on GitHub without exposing that information. Specifically, the account_sid and auth_token can be used to send texts on your behalf, which can get very expensive if you’re not on the trial. So you can make a file called secrets.py in the same folder, and define all of the variables. The first three can be obtained from your project console, and the last one is just your phone number.
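For illustration, such a file could look like the sketch below. Every value is a placeholder, and twilio_number and my_number are names I made up for the last two variables. One thing worth keeping in mind: Python’s standard library also has a module called secrets, and a local secrets.py in the script’s folder takes precedence over it for imports.

```python
# secrets.py: keep this file out of version control.
# Every value below is a placeholder, not a real credential.
account_sid = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # from the Twilio console
auth_token = "your_auth_token"                      # from the Twilio console
twilio_number = "+15550000001"  # hypothetical name: the number Twilio gave you
my_number = "+15550000002"      # hypothetical name: your own verified phone
```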
To test this, you can run the function yourself and make sure you’ll get the text.
I wanted to do one other thing though to make sure I knew when it would be in stock. If the item came back, I wanted my program to start playing a loud alarm sound. That way there would be no way to miss the notification. So I found this Python library called playsound. After reading the tutorial and doing pip install playsound, I found an alarm sound online, named it “alarm.mp3”, saved it in the same folder as my Python script, and added this code to send_notification:
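A small sketch of the alarm part, written as its own helper (sound_alarm is a name I picked) that could be called at the end of send_notification. The play argument exists only so the helper can be tried without speakers or the library installed; by default it uses playsound.

```python
def sound_alarm(path="alarm.mp3", play=None):
    """Play the alarm file; `play` can be swapped out for testing."""
    if play is None:
        # Imported lazily so the rest of the script loads without playsound.
        from playsound import playsound as play
    play(path)
```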
So there you have it! All the parts of our code are fully written out, and able to be run together. We just have to chain everything together as follows:
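One way to sketch the chaining is a loop that takes the three pieces as arguments, so it does not care how each one is implemented. The name monitor and its parameters are my own; the sleep argument is only there so the loop can be exercised without actually waiting.

```python
import time


def monitor(get_html, is_in_stock, notify, poll_seconds=60, sleep=time.sleep):
    """Poll until the item is in stock, notify once, return the check count."""
    checks = 0
    while True:
        checks += 1
        if is_in_stock(get_html()):
            notify()
            return checks
        print("Still out of stock, checking again soon")
        sleep(poll_seconds)
```

In the real script, the three arguments would be the page fetcher, the stock check, and the notifier from the sections above.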
And that’s it! Just run this code on your computer, and eventually you’ll be startled by the alarm once the item is back in stock. I hope you appreciated the walkthrough of how I got to this solution, and I hope you learned a thing or two!
So what happened?
Well, I ran the script, and a day later I heard the loud alarm sound! I went on my computer, and got the stand mixer! I even told a friend who was also interested in it, so we both got to score on the deal :) - so I’m glad that in the end the effort ended up paying off. Hopefully it’s useful for y’all as well in the projects you try to create.