Skip to content

Broken options class for pandas.io.data - Did Yahoo just change their options site? #212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dscheste opened this issue Jul 13, 2016 · 37 comments

Comments

@dscheste
Copy link

dscheste commented Jul 13, 2016

Hello,
Yahoo seems to have changed their site design as I no longer can pull any options data with panda.

Does anyone else experience this?

Here's what's called:

#!/usr/bin/python
import sys, getopt
import pandas as pd
import numpy as np
from pandas.io.data import DataFrame
from pandas.io.data import Options
import h5py as h5
from pandas_datareader import data, wb
import datetime
import time
ticker = str(sys.argv[1])
new_data=pd.DataFrame()
option = Options(ticker, 'yahoo')
data = option.get_all_data()
new_data = new_data.append(data)
new_data.to_csv('cache/'+ticker)

And here's what I am getting:

Traceback (most recent call last):
File "db1.py", line 18, in 
data = option.get_all_data()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/data.py", line 1134, in get_all_data
expiry_dates = self.expiry_dates
File "/usr/local/lib/python2.7/dist-packages/pandas/io/data.py", line 1165, in expiry_dates
expiry_dates, _ = self._get_expiry_dates_and_links()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/data.py", line 1191, in _get_expiry_dates_and_links
raise RemoteDataError('Data not available')
pandas.io.data.RemoteDataError: Data not available

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: 0.2.0
@Bodisatva
Copy link

Bodisatva commented Jul 13, 2016

Same issue. Did yahoo proceeded to a complete overhaul of there site ? I hope not

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas_datareader/yahoo/options.py", line 619, in expiry_dates
    expiry_dates = self._expiry_dates
AttributeError: 'Options' object has no attribute '_expiry_dates'

OS X10.11.5
Python v3.5.0:374f501f4567, Sep 12 2015

@dscheste
Copy link
Author

Bump.

@nborggren
Copy link

ended up here as my codes have broken as well (see error below), I'll fork it and try to fix it, but I presume someone else can fix it better/faster...

from pandas.io.data import Options
Options('AAPL','yahoo')
<pandas.io.data.Options object at 0x1051c7c90>
opt.get_all_data()
Traceback (most recent call last):
File "", line 1, in
File "/Users/nborggren/anaconda/lib/python2.7/site-packages/pandas/io/data.py", line 1115, in get_all_data
expiry_dates = self.expiry_dates
File "/Users/nborggren/anaconda/lib/python2.7/site-packages/pandas/io/data.py", line 1146, in expiry_dates
expiry_dates, _ = self._get_expiry_dates_and_links()
File "/Users/nborggren/anaconda/lib/python2.7/site-packages/pandas/io/data.py", line 1172, in _get_expiry_dates_and_links
raise RemoteDataError('Data not available')
pandas.io.data.RemoteDataError: Data not available

@Slimdog588
Copy link

how does this get fixed? by whom? and in how long? Any info is appreciated.

ALL my shit is busted..lol

@emican86
Copy link

I was hoping to contribute to the fix, but yahoo is not returning a neatly formatted json object ( from what I can see) The http response is different and I have not made sense of the changes yet.

@nborggren
Copy link

I noticed the yahoo page is a mess too (from a scrapers perspective), I moved over to google for a quick fix. Below returns a rather clean json file that might be a good starting point, but I haven't put it into a pandas data frame yet. A lot of the symbols I was looking up in yahoo can't be found in google though.

import urllib2 as ul
dat = ul.urlopen('https://www.google.com/finance/option_chain?q=AAPL&output=json').read()

@Slimdog588
Copy link

@nborggren Do you have any code that will parse this? Thanks

@MattRijk
Copy link

Same issue. All my options data scripts return Data not available. They were all working when i last checked 7-6-2016

@nborggren
Copy link

nborggren commented Jul 14, 2016

I'm having a little trouble parsing this, but here is what I have so far. I needed to reference

https://github.com/femtotrader/pandas_datareaders_unofficial/blob/master/pandas_datareaders_unofficial/datareaders/google_finance_options.py

and

http://stackoverflow.com/questions/21104592/json-to-pandas-dataframe

I have a pandas dataframe (two actually one for calls and one for puts) but it will take some massaging to put it in the form of yahoos data before this catastrophe. Perhaps someone else can try too. Below is what I have so far:

from google_fix import *
calls, puts = Options('AAPL')
puts.head()

shows a dataframe that looks like the image.
screen shot 2016-07-14 at 7 23 49 pm

below is the google_fix code

import token, tokenize
from six.moves import cStringIO as StringIO
import json
from pandas.io.json import json_normalize
import urllib2 as ul

def fix_lazy_json(in_text):
    """
    Handle lazy JSON - to fix expecting property name
    this function fixes the json output from google
    http://stackoverflow.com/questions/4033633/handling-lazy-json-in-python-expecting-property-name
    """
    tokengen = tokenize.generate_tokens(StringIO(in_text).readline)
    result = []
    for tokid, tokval, _, _, _ in tokengen:
        # fix unquoted strings
        if (tokid == token.NAME):
            if tokval not in ['true', 'false', 'null', '-Infinity', 'Infinity', 'NaN']:
                tokid = token.STRING
                tokval = u'"%s"' % tokval

        # fix single-quoted strings
        elif (tokid == token.STRING):
            if tokval.startswith ("'"):
                tokval = u'"%s"' % tokval[1:-1].replace ('"', '\\"')

        # remove invalid commas
        elif (tokid == token.OP) and ((tokval == '}') or (tokval == ']')):
            if (len(result) > 0) and (result[-1][1] == ','):
                result.pop()

        # fix single-quoted strings
        elif (tokid == token.STRING):
            if tokval.startswith ("'"):
                tokval = u'"%s"' % tokval[1:-1].replace ('"', '\\"')

        result.append((tokid, tokval))

    return tokenize.untokenize(result)

def Options(symbol):
    dat = ul.urlopen('https://www.google.com/finance/option_chain?q='+symbol+'&output=json').read()
    dat = json.loads(fix_lazy_json(dat))
    puts = json_normalize(dat['puts'])
    calls = json_normalize(dat['calls'])
    return calls, puts

@Slimdog588
Copy link

@nborggren Does it work for you? I tried it and it gives me this:

line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

The only thing I added was Options('AAPL') at the end of the code in order to run it. I'm relatively new to python (and coding), if it works for you, any insight would be appreciated. Thanks

@nborggren
Copy link

@Slimdog588 It looks like I introduced a typo when github wrapped the urlopen line. Make sure in the file you copied that the ?q='+symbol+... part in the Options function appears right next to the option_chain part. See if that helps.

@Slimdog588
Copy link

@nborggren thank you sir!

@femtotrader
Copy link
Contributor

Thanks @nborggren for citing my code.

You might use requests http://docs.python-requests.org/ because it's currently use by pandas-datareader and so it will be easier to add your code here (and have Python 2 / 3 compatibility)

import requests
url = "https://www.google.com/finance/option_chain"
r = requests.get(url, params={"q": "AAPL", "output": "json"})
content_json = response.text
...

@Slimdog588
Copy link

@nborggren I may be wrong but it seems as though that the url (dat) only has option contracts for 1 expiry and nothing further out? Do you know of a link to get the entire option chain including all expiries?

Thanks!

@nborggren
Copy link

Hi @Slimdog588 thanks for pointing that out. It looks like we'll have to call requests (or urlopen) for each expiry which can be specified with a url like below:

https://www.google.com/finance/option_chain?q=AAPL&expd=19&expm=1&expy=2018

I'll work on automating this and I'll get back with updated code.

@nborggren
Copy link

nborggren commented Jul 16, 2016

I've made some progress massaging this into a DataFrame like the old Yahoo DataFrame and getting all expirations. Here is what it looks like now:

screen shot 2016-07-16 at 3 23 28 pm

You'll notice that there are a lot more NaNs now and I had to coerce a lot of the variables to get data types like the yahoo dataframe. IV and QuoteTime are missing now too. Some ints are now floats though and trying to convert to ints throws an error. I'm hoping @Slimdog588 , @emican86 , @dscheste , @femtotrader could have a look and offer additional suggestions before I start trying to push this as well as femtotrader's blessing to push his code along with this.

the code is below:

import token, tokenize
from six.moves import cStringIO as StringIO
import json
from pandas.io.json import json_normalize
import requests
import pandas as pd

def fix_lazy_json(in_text):
    """
    Handle lazy JSON - to fix expecting property name
    this function fixes the json output from google
    http://stackoverflow.com/questions/4033633/handling-lazy-json-in-python-expecting-property-name
    """
    tokengen = tokenize.generate_tokens(StringIO(in_text).readline)

    result = []
    for tokid, tokval, _, _, _ in tokengen:
        # fix unquoted strings
        if (tokid == token.NAME):
            if tokval not in ['true', 'false', 'null', '-Infinity', 'Infinity', 'NaN']:
                tokid = token.STRING
                tokval = u'"%s"' % tokval

        # fix single-quoted strings
        elif (tokid == token.STRING):
            if tokval.startswith ("'"):
                tokval = u'"%s"' % tokval[1:-1].replace ('"', '\\"')

        # remove invalid commas
        elif (tokid == token.OP) and ((tokval == '}') or (tokval == ']')):
            if (len(result) > 0) and (result[-1][1] == ','):
                result.pop()

        # fix single-quoted strings
        elif (tokid == token.STRING):
            if tokval.startswith ("'"):
                tokval = u'"%s"' % tokval[1:-1].replace ('"', '\\"')

        result.append((tokid, tokval))

    return tokenize.untokenize(result)

def Options(symbol):
    url = "https://www.google.com/finance/option_chain"
    r = requests.get(url, params={"q": symbol,"output": "json"})
    content_json = r.text
    dat = json.loads(fix_lazy_json(content_json))
    puts = json_normalize(dat['puts'])
    calls = json_normalize(dat['calls'])
    np=len(puts)
    nc=len(calls)

    for i in dat['expirations'][1:]:
        r = requests.get(url, params={"q": symbol,"expd":i['d'],"expm":i['m'],"expy":i['y'],"output": "json"})
        content_json = r.text
        idat = json.loads(fix_lazy_json(content_json))
        puts1 = json_normalize(idat['puts'])
        calls1 = json_normalize(idat['calls'])
        puts1.index = [np+i for i in puts1.index]
        calls1.index = [nc+i for i in calls1.index]
        np+=len(puts1)
        nc+=len(calls1)
        puts = puts.append(puts1)
        calls = calls.append(calls1)
    calls.columns = ['Ask','Bid','Chg','cid','PctChg','cs','IsNonstandard','Expiry','Underlying','Open_Int','Last','Symbol','Strike','Vol']
    puts.columns = ['Ask','Bid','Chg','cid','PctChg','cs','IsNonstandard','Expiry','Underlying','Open_Int','Last','Symbol','Strike','Vol']
    calls['Type'] = ['call' for i in range(len(calls))]
    puts['Type'] = ['put' for i in range(len(puts))]
    puts.index = [i+len(calls) for i in puts.index]
    opt=pd.concat([calls,puts])
    opt['Underlying']=[symbol for i in range(len(opt))]
    opt['Underlying_Price'] = [dat['underlying_price'] for i in range(len(opt))]
    opt['Root']=opt['Underlying']
    for j in ['Vol','Strike','Last','Bid','Ask','Chg']:
        opt[j] = pd.to_numeric(opt[j],errors='coerce')
    opt['IsNonstandard']=opt['IsNonstandard'].apply(lambda x:x!='OPRA')
    opt = opt.sort_values(by=['Strike','Type'])
    opt.index = range(len(opt))
    col = ['Strike', 'Expiry', 'Type', 'Symbol', 'Last', 'Bid', 'Ask', 'Chg', 'PctChg', 'Vol', 'Open_Int', 'Root', 'IsNonstandard', 'Underlying', 'Underlying_Price', 'cid','cs']
    opt = opt[col]

    return opt

@Slimdog588
Copy link

@nborggren , Great job! Thanks, it looks great. I'm trying to get it to print out option contracts that have volume of 500 or more, I can't seem to get it the way I did it last time, would you know how? Any help is greatly appreciated. Thanks

@Slimdog588
Copy link

@nborggren
I keep getting "ValueError: Length of values does not match length of index" when trying to access the data using this:

list.append(opt[opt.Vol > 500].index.get_level_values('Symbol'))

@nborggren
Copy link

nborggren commented Jul 16, 2016

@Slimdog588

I'm not sure how get_level_values works, I would just use:

df = opt[opt['Vol']>500]

this will give you a new dataframe of options with Vol over 500. Works for me.

@Slimdog588
Copy link

@nborggren , works like a charm. Many thanks! For those who want to implement this for multiple stocks, I tried doing this but kept getting a a value error(length of values does not match length of index). This is my last question, thank you for being so helpful!!!

def launch_program():

for symbol in input_file:
    s = str(symbol)
    return Options(s)

launch_program()

**input_file is a .csv that has 1 ticker in it

error:

Traceback (most recent call last):
File "popo.py", line 87, in
Options(symbol)
File "popo.py", line 67, in Options
calls['Type'] = ['call' for i in range(n)]
File "C:\Users\JDOG\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2299, in setitem
self._set_item(key, value)
File "C:\Users\JDOG\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2366, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\JDOG\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2524, in _sanitize_column
value = _sanitize_index(value, self.index, copy=False)
File "C:\Users\JDOG\Anaconda2\lib\site-packages\pandas\core\series.py", line 2741, in _sanitize_index
raise ValueError('Length of values does not match length of '
ValueError: Length of values does not match length of index

@nborggren
Copy link

@Slimdog588
Hi I had edited the line
calls['Type'] = ['call' for i in range(n)]
to
calls['Type'] = ['call' for i in range(len(calls))]

I originally didn't account for the fact that the number of call options might be different from the number of puts, try recopying it and trying again.

No problem, I'm glad to have this working again too.

@Slimdog588
Copy link

@nborggren
Your fix worked. I hate to ask again but I'm desperate at this point:

I'm scanning through 6500 tickers roughly and I'm getting a few errors for some tickers:

  1. ''puts'' (This shows up when the stock has no options)

  2. 'Length mismatch: Expected axis has 12 elements, new values have 14 elements' (only has a few options)

  3. ''ascii' codec can't encode character u'\u2019' in position 1440: ordinal not in range(128)' (no options)

  4. 'No JSON object could be decoded' (This one pops up continuously after a while, whether the ticker has options or not)

I'm trying to handle these errors like so:

except Exception, e:
    print("Something went poopoo for ticker '%s', : '%s'" % (symbol, str(e)))
    time.sleep(1)
    continue

My questions:

1)Is there a better way of handling these exceptions?

2)When I add while j < 13 and do the code below, it says TypeError: cannot concatenate 'str' and 'int' objects for line j+=1......any ideas?

    except Exception, e:
        print("Something went poopoo for ticker '%s', : '%s'" % (symbol, (str(e))))
        j += 1
        time.sleep(1)
        continue

-symbol in this case is a string
-e's type is "typeerror" (no idea why)

I could easily be wrong but I think the "No JSON object could be decoded" error could be the result of requesting too much data from google too quickly, could be wrong, any thoughts are welcome.

Apologies for yet another question, thanks a million for everything!

@nborggren
Copy link

@Slimdog588

I'm running a list of 2500 symbols or so and need to use the following to handle exceptions, as far as I can tell, caused from the symbols not having any options (I'm writing to a file so I know what tickers failed):

except ValueError:
    g.write(i+'\n')
    continue
except KeyError:
    g.write(i+'\n')
    continue

I haven't had the json error you speak of. If it is happening continuously it might be a loss of connectivity, or as you say google. If you wait and try again it should work in that case, but I didn't think we were violating any terms of use.

Not sure about your TypeError problem, though it sounds like j is getting redefined as a string somewhere.

I should probably add some logic for low liquidity cases where there might be puts and no calls or calls and no puts, which might be the source of the 'puts' error.

@MaxGally
Copy link

Hi All,

I just tried to retrieve options quote with the provided code. It seems to return only quote for monthly expiration dates. I cannot see the weekly ones.
Any suggestions.

Thank you

@nborggren
Copy link

Hi @MaxGally,
which ticker are you looking at? For AAPL I'm seeing weekly and monthly
opt = Options('AAPL')
set(opt['Expiry'])
{u'Apr 21, 2017',
u'Aug 12, 2016',
u'Aug 19, 2016',
u'Aug 26, 2016',
u'Aug 5, 2016',
u'Jan 19, 2018',
u'Jan 20, 2017',
u'Jul 22, 2016',
u'Jul 29, 2016',
u'Jun 16, 2017',
u'Mar 17, 2017',
u'Oct 21, 2016',
u'Sep 16, 2016'}

@MaxGally
Copy link

yes, it works. Thank you

@dscheste
Copy link
Author

I am not much of a coder to help fix this bug, but I would like to propose a potential, although likely short-term solution.

I have noticed, that while the main yahoo finance site has been changed beyond recognition and to some degree reason, the Canadian section remains intact and uses the same design the broken function is using.

Is it possible to adjust the function that uses yahoo to pull all options data for a ticker to use the Canadian yahoo finance site instead?

See for yourselves:
https://ca.finance.yahoo.com/q/op?s=AAPL
vs new
https://finance.yahoo.com/quote/AAPL/options

I hope this helps.

@stephanschulz
Copy link

@dscheste thanks for this tip.

i do not know how to edit the pandas source to have it use ca.finance.yahoo.com.
but thanks to your tip i can (for now) use this simple python 3 script to save it as html file.

https://gist.github.com/stephanschulz/98e1ea9d4d04d8d8fb4b943dcd8ea416

@femtotrader
Copy link
Contributor

Fork this repository using GitHub interface

Git clone your pandas-datareader repository

Create a branch for your fix

git checkout -b yahoo-options-fix

Try to modify https://github.com/pydata/pandas-datareader/blob/master/pandas_datareader/yahoo/options.py#L81

Run unit tests to see it doesn't break anything and if it fix issue.

nosetests -s -v name_of_test.py

Then commit and push

git commit pandas_datareader/yahoo/options.py

git push

In GitHub open a Pull Request

@dscheste
Copy link
Author

dscheste commented Aug 3, 2016

@femtotrader I think both options.py and data.py need to be touched.
Instead of forking a new branch for this test, wouldn't it work if I just changed those files under:

/usr/local/lib/python2.7/dist-packages/pandas/io/data.py

  _TABLE_LOC = {'calls': 1, 'puts': 2}
    _OPTIONS_BASE_URL = 'http://ca.finance.yahoo.com/q/op?s={sym}'
    _FINANCE_BASE_URL = 'http://ca.finance.yahoo.com'

and /usr/local/lib/python2.7/dist-packages/pandas_datareader/yahoo/options.py

    _TABLE_LOC = {'calls': 1, 'puts': 2}
    _OPTIONS_BASE_URL = 'http://ca.finance.yahoo.com/q/op?s={sym}'
    _FINANCE_BASE_URL = 'http://ca.finance.yahoo.com'

Even after I changed as per above in both files, nothing gets recognized.

@femtotrader
Copy link
Contributor

You don't need to modify pandas/io/data.py as it's now part of pandas-datareader and will be deprecated.

Sad to know that http://ca.finance.yahoo.com/ doesn't work with current code. PR are welcome!

@Slimdog588
Copy link

Slimdog588 commented Aug 3, 2016

@femtotrader if I just reinstall pandas using pip , will this work? New to github, don't know how to use the pull, fork, etc

@femtotrader
Copy link
Contributor

I don't think so... until someone fix it ;-)
The best thing to do will be to git clone this repository, create a branch and to work on this issue

@femtotrader
Copy link
Contributor

First step is to fork this repository and git clone yours
Maybe this could help:

@femtotrader femtotrader mentioned this issue Sep 8, 2016
2 tasks
beluga9 added a commit to beluga9/pandas-datareader that referenced this issue Sep 26, 2016
@nemi83
Copy link

nemi83 commented Jan 23, 2017

nborggren:
thank you for your code. It was working great for the time being, but I ran into a strange error with the app this afternoon. It said:
<type 'str'>: (<type 'exceptions.TypeError'>, TypeError('__str__ returned non-string (type SysCallError)',))

I was running a simple command: option = Options('AAPL') and the above was what I got.
Also, yahoo! finance is still producing the same .json for data and I will start working on the update of the datareader:
https://query2.finance.yahoo.com/v7/finance/options/AAPL?straddle=true

@nborggren
Copy link

Hi @nemi83,
I couldn't reproduce your error, I tried with mine and it is still working.

@nemi83
Copy link

nemi83 commented Jan 23, 2017

Hi @nborggren:
I restarted the iep interpreter and the script was working fine (gone trough almost all the tickers) and then it broke again with the similar error:
SSLError: <unprintable SSLError object>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants