MikeJamesHamm




Guide: Using Google Chrome to Access Hidden APIs

on Dec. 29, 2016, 10:55 p.m.

One problem I always seem to run into when gathering data is finding an easily accessible API. Oftentimes there is no free API or dataset to access. However, many times the solution to your data needs is hidden in your browser's requests.

Take the Yahoo Finance website, for example. I understand that stock data is easily accessible, but let's imagine that there were no easy way to get price data. You could copy and paste data by looking up each price on Yahoo Finance, but that would take hours of meticulous work. Another solution would be to load the webpage with Python and parse the HTML response. The problem is that a request like that loads tons of unnecessary data when all you want is the price. After dozens of requests, that extra information could really begin to bog down a program.

Fortunately, Google Chrome has fantastic inspection tools. You can try it yourself by right-clicking any webpage and clicking Inspect. You should see the HTML of the webpage, which is cool, but now click the Network tab. The Network tab lets you view the network activity of a specific webpage. So, for example, if something like an API request were made, you could see the specific address used to load the data.

To see an example of an API request being made, load any chart you want in Yahoo Finance. The data is displayed quickly, but where did it come from? Open up your network inspection tools again and change the chart's time frame to something different, e.g. 1d, 5d, 1m, 3m...

When you change the time frame, you should see a request pop up in the Network tab. Here is an example of what I saw below.

If you load the request URL in your browser, you should see an overwhelming amount of JSON data.

The next step to our data solution is to parse the information into something easier for us to work with. Our goal is to get a timestamp, volume, open, high, low, and close prices for each tick of data represented on the Yahoo Finance graph.

If you have never worked with JSON data before, you can think of it as a bunch of nested dicts and lists that can contain any other type of data. Below I have outlined how our JSON response can be interpreted. Notice that some values in the JSON data, such as result and quote, are lists wrapping a single dict, which is why they are indexed with [0]. This means if we wanted to access the timestamp list, we could index the variable j like this...
j['chart']['result'][0]['timestamp']


j = {
    chart : {
        result : [{
            timestamp : [random-timestamp-data],
            indicators : {
                quote : [{
                    open : [random-open-data],
                    high : [random-high-data],
                    low : [random-low-data],
                    close : [random-close-data],
                    volume : [random-volume-data]
                }]
            }
        }]
    }
}
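To make the indexing concrete, here is a minimal sketch that parses a hand-written, trimmed-down sample in the same shape as the outline above. The numbers are rounded values picked for illustration, not a real response, and the variable names are my own.

```python
import json

# Hand-written sample in the same shape as the outline (illustrative values only).
sample_text = """
{
  "chart": {
    "result": [
      {
        "timestamp": [1451917800, 1452004200],
        "indicators": {
          "quote": [
            {
              "open": [102.61, 105.75],
              "high": [105.37, 105.85],
              "low": [102.0, 102.41],
              "close": [105.35, 102.71],
              "volume": [67649400, 55791000]
            }
          ]
        }
      }
    ]
  }
}
"""

sample = json.loads(sample_text)

# "result" and "quote" are lists, so [0] selects the single dict inside each.
first_result = sample["chart"]["result"][0]
sample_timestamps = first_result["timestamp"]
sample_closes = first_result["indicators"]["quote"][0]["close"]

print(sample_timestamps)  # [1451917800, 1452004200]
print(sample_closes)      # [105.35, 102.71]
```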

However, before we do that, we still have to set up our Python to interpret the JSON data. Luckily, a huge positive of the Python language is all of the accessible modules. In this example, I copied and pasted the JSON response into a txt file. You could just as easily load the URL directly and interact with the data from there. Let's start by creating a DataFrame class and reading our JSON data.

import datetime
import json

class DataFrame:
    """One tick of price data from the chart."""

    def __init__(self, timestamp, volume, open, high, low, close):
        self.timestamp = str(timestamp)
        # converts our unix timestamp into a readable format (local time)
        self.date = datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
        self.volume = int(volume)
        self.open = float(open)
        self.high = float(high)
        self.low = float(low)
        self.close = float(close)

# read the JSON response that was pasted into the text file
with open("sample_finance.txt", "r") as file_open:
    file_read = file_open.read()

# 'result' is a list, so [0] grabs the single dict inside it
j = json.loads(file_read)['chart']['result'][0]
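If you would rather load the URL directly instead of pasting the response into a txt file, something like the sketch below could work. The function name fetch_chart, the opener parameter, and the User-Agent header are my own assumptions, not part of the original code, and the URL is whatever request address you copied from the Network tab.

```python
import json
import urllib.request


def fetch_chart(url, opener=urllib.request.urlopen):
    """Download a chart response and return the first entry of chart.result."""
    # Assumption: some servers reject requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Same indexing as the file-based version: 'result' is a list of one dict.
    return payload["chart"]["result"][0]
```

Calling j = fetch_chart(request_url), where request_url holds the address you copied, would then give you the same j as the file-based version. The opener parameter only exists so the function can be tried without a network connection.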

Remember to refer to the example of the JSON outline above to keep track of the format of our data. Next, let's create quotes and timestamps variables to hold the price, volume, and timestamp lists.


quotes = j['indicators']['quote'][0]
timestamps = j['timestamp']

Now that we can easily access our lists of data, we need to pull the matching items out of each list and use them to build DataFrame objects. We will store each object in a list.


my_data = []
i = 0

while i < len(timestamps):
    my_data.append(DataFrame(timestamps[i], quotes['volume'][i], quotes['open'][i], quotes['high'][i], quotes['low'][i], quotes['close'][i]))
    i += 1

for each_frame in my_data:
    print(each_frame.timestamp, each_frame.date, each_frame.volume, each_frame.open, each_frame.high, each_frame.low, each_frame.close)

Our program prints this as its response...


1451917800 2016-01-04 07:30:00 67649400 102.61000061035156 105.37000274658203 102.0 105.3499984741211
1452004200 2016-01-05 07:30:00 55791000 105.75 105.8499984741211 102.41000366210938 102.70999908447266
1452090600 2016-01-06 07:30:00 68457400 100.55999755859375 102.37000274658203 99.87000274658203 100.69999694824219
...
1482849000 2016-12-27 07:30:00 18296900 116.5199966430664 117.80000305175781 116.48999786376953 117.26000213623047
1482935400 2016-12-28 07:30:00 20582000 117.5199966430664 118.0199966430664 116.19999694824219 116.76000213623047
1483043062 2016-12-29 13:24:22 11870151 116.44999694824219 117.1094970703125 116.41000366210938 116.69000244140625
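One thing to keep in mind about the dates: datetime.fromtimestamp without a timezone uses the local timezone of the machine running the script, which is why the daily rows above show 07:30:00. On a computer in a different timezone, the same timestamps would print a different hour. If you want output that does not depend on where the script runs, a small sketch like this pins the conversion to UTC (the helper name to_utc_string is just something I made up for the example).

```python
from datetime import datetime, timezone


def to_utc_string(unix_timestamp):
    """Format a unix timestamp as a UTC date string, independent of local settings."""
    moment = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M:%S')


# First timestamp from the output above, rendered in UTC instead of local time.
print(to_utc_string(1451917800))  # 2016-01-04 14:30:00
```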




The code listed above may or may not be the "best" solution. Please be advised that this is just the way I did it. There are normally thousands of different ways to solve the same programming problem. Find an error? Let me know in the contact form below.
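As one example of a different approach to the loop above, here is a sketch that walks the lists together with zip and skips any tick that has a missing value. The Tick record and the build_ticks helper are hypothetical names standing in for the DataFrame class, the sample data is made up, and the idea that a missing value would show up as None (JSON null) is an assumption on my part.

```python
from collections import namedtuple

# Hypothetical lightweight record type standing in for the DataFrame class.
Tick = namedtuple("Tick", "timestamp volume open high low close")


def build_ticks(timestamps, quotes):
    """Pair each timestamp with its quote values, skipping incomplete ticks."""
    columns = zip(timestamps, quotes["volume"], quotes["open"],
                  quotes["high"], quotes["low"], quotes["close"])
    ticks = []
    for row in columns:
        # Assumption: a missing value would show up as None (JSON null).
        if any(value is None for value in row):
            continue
        ticks.append(Tick(*row))
    return ticks


# Made-up sample data; the second tick has a missing close and is skipped.
example_quotes = {
    "volume": [100, 200, 300],
    "open": [1.0, 2.0, 3.0],
    "high": [1.5, 2.5, 3.5],
    "low": [0.5, 1.5, 2.5],
    "close": [1.2, None, 3.2],
}
ticks = build_ticks([10, 20, 30], example_quotes)
print(len(ticks))      # 2
print(ticks[1].close)  # 3.2
```

Using zip removes the manual counter, and it stops at the shortest list, so a mismatch in list lengths would not raise an IndexError the way the while loop could.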




Contact Me

Feel free to email me feedback, suggestions, notes, or to just say hello!