Read and write spss data format #5768
Comments
The following package might be useful https://www.ibm.com/developerworks/community/files/app?lang=en#/person/270003ERMT/file/b77a0da0-2f47-454b-b505-5404b242d78c |
Does SPSS offer no export formats compatible with pandas? Or, is there relevant This may still make sense if, for example, users without access to SPSS frequently |
Actually, I do not use SPSS, but I have some SPSS data files that I want to explore. I could work with R, but the data is so large that this is not feasible. The only way I have found is too tedious: I have to use PSPP to prepare a subset and then import it into R, with all the back and forth needed to add variables and to master the SPSS syntax. Since pandas can deal with huge datasets, I think it should provide import from SPSS, and I am willing to test it. |
That was a good enough reason for stata support, yeah. Note that this particular package weighs in at 90-180MB, includes a large chunk of binary Basic usage doesn't require any explicit pandas cooperation:
is all it takes. There's something to be said for pandas accepting data rather than data formats. Definitely worth a FAQ entry though; I'm sure other users have this need. |
Thank you very much for looking through this problem. |
Sorry @y-p, this seems to be an already reported error. I am using 64-bit Python on a 64-bit machine, which is the problematic case according to this discussion: https://bitbucket.org/fomcl/savreaderwriter/issue/12/win64-error |
Hi All,
Hope this works for someone. Have a great day! |
Lots of data is made available in SPSS so this tool would be very useful, especially for social scientists and economists. If the solution of @ThinkOnData works, it seems it should be an easy improvement. I will try it out and may submit a PR. |
I second the usefulness of Using SPSS to manage the master survey data seems to be a common approach (cf., Gaskin, 2016). As the first step in introducing a computational approach to the collaboration, I am writing a script that preprocesses the survey data we have collected. Being able to easily read and write SPSS would certainly be helpful here. |
For those interested in this issue, I think contributing better pandas support (or suggesting it) to the savReaderWriter project (https://bitbucket.org/fomcl/savreaderwriter) would be a good first step. |
SPSS is the only file format exportable from common survey tools like Qualtrics, SurveyGizmo, and SurveyMonkey that allows you to preserve both the values and the labels for many variables. Survey data seems to always be represented as one row per response and one column per question or question choice. For single-select or open-ended questions, the values are usually coded as a single number per choice, so a column may have a 1, 2, or 3 for male, female, or 'prefer not to state'. If you can multi-select, most survey tools export columns like Q2_1, Q2_2, Q2_3... for each of the possible choices, and then each cell has a 1 for Selected and a null/SYSMIS/NaN value if it was not selected. Sometimes that missing data is also coded as -99 or another value. Finally, SPSS has two properties, VARIABLE LABELS and VALUE LABELS, that contain the strings that tend to correspond to question text and choice text.
If you export a survey file as CSV from any of those tools, you have to choose: either you take the strings for each question (e.g. "Strongly Agree" will be in each cell where that was selected), or you take the numeric codes (e.g. a '5'). The trouble is that for many types of analysis you want both. In pandas, I think it would be easy to treat all of these as category labels with pd.Categorical. The common use case is to see what the average scale rating is for a question, e.g. to get the mean response and standard deviation on a Disagree <---> Agree scale. But you may also want to produce a cross tab that shows the count/percent for each category column. SPSS can do this pretty well, but it is expensive, slow, and has a really lame internal syntax language.
If pandas had full support for SPSS files and could write them out, it would be very helpful for initial data exploration, question aggregation, data restructuring, and text analysis, and then for writing out the new file to leverage downstream tools like Wincross for reporting/analysis, which require SPSS files and are easy for business/non-technical people to use. As someone who deals with a lot of survey data, I'm happy to talk/chat/answer any questions I can about this, and to test anything out in SPSS if it would help anyone. Note that I'm a novice when it comes to python/pandas, but I've been using SPSS for a long time and am looking to move away from it completely. |
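To make the values-versus-labels point in the comment above concrete, here is a minimal sketch in plain pandas. The column names, the codes, the -99 missing marker, and the `value_labels` dictionary are all made up for illustration; they stand in for what a real SPSS file would carry in VALUE LABELS.

```python
import pandas as pd

# Hypothetical export: numeric codes, with -99 as a user-defined missing value.
raw = pd.DataFrame({
    "gender": [1, 2, 2, 3, 1],
    "q1": [5, 4, -99, 2, 5],
})

# Hypothetical VALUE LABELS, in the shape an SPSS file would carry them.
value_labels = {
    "gender": {1: "Male", 2: "Female", 3: "Prefer not to state"},
    "q1": {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral",
           4: "Agree", 5: "Strongly Agree"},
}

# Numeric view: turn the missing code into NaN so it does not skew statistics.
codes = raw.mask(raw == -99)
mean_q1 = codes["q1"].mean()
std_q1 = codes["q1"].std()

# Labelled view: codes without a label (here -99) map to NaN.
labelled = pd.DataFrame({
    col: pd.Categorical(raw[col].map(labels),
                        categories=list(labels.values()), ordered=True)
    for col, labels in value_labels.items()
})

# Cross tab of counts per category, built from the labelled view.
counts = pd.crosstab(labelled["gender"], labelled["q1"])
```

Keeping both views side by side is one possible design: the numeric frame serves means and standard deviations, while the ordered categoricals serve cross tabs and reporting.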
I have written a wrapper for the C library ReadStat, named pyreadstat, which reads SPSS sav, zsav and por files: github.com/Roche/pyreadstat |
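For anyone who wants to try it, a minimal sketch of reading a .sav file with pyreadstat could look like this. It assumes pyreadstat is installed; the helper name `read_sav_with_labels` and the file path are hypothetical and not part of the library.

```python
def read_sav_with_labels(path):
    """Return the data plus variable and value labels from a .sav file.

    Hypothetical helper for illustration; only pyreadstat.read_sav and the
    two metadata attributes used below come from pyreadstat itself.
    """
    import pyreadstat  # third-party, imported lazily so the module loads without it

    # read_sav returns a pandas DataFrame and a metadata object.
    df, meta = pyreadstat.read_sav(path)
    variable_labels = meta.column_names_to_labels  # column name -> question text
    value_labels = meta.variable_value_labels      # column name -> {code: label}
    return df, variable_labels, value_labels
```

Usage would be along the lines of `df, var_labels, val_labels = read_sav_with_labels("survey.sav")`, where `survey.sav` is a placeholder file name.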
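It would be great if this functionality was available directly from pandas, e.g. via read_spss. @TomAugspurger @jreback @jorisvandenbossche (sorry for the explicit mentions, I don't know how to @ the whole dev team) would this be an option (given that this requires a C lib)? @ofajardo would you be willing to merge your code? |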
It’s more likely that we would have an optional dependency on that package that a read_spss would use. Similar to what we do with pyarrow and parquet.
On May 22, 2019, at 04:20, Clemens Brunner wrote:
It would be great if this functionality was available directly from pandas, e.g. via read_spss. @TomAugspurger @jreback @jorisvandenbossche (sorry for the explicit mentions, I don't know how to @ the whole dev team) would this be an option (given that this requires a C lib)? @ofajardo would you be willing to merge your code?
|
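The optional-dependency idea from the reply above can be sketched as follows. This is an illustration of the general pattern only, not pandas' actual implementation; `load_optional` and `read_spss_sketch` are hypothetical names.

```python
import importlib


def load_optional(name, feature):
    """Import `name` on demand, with a helpful error if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Missing optional dependency '{name}'. "
            f"{feature} requires it; install it with pip or conda."
        ) from exc


def read_spss_sketch(path):
    """Hypothetical reader: the heavy dependency is only needed when called."""
    pyreadstat = load_optional("pyreadstat", "read_spss_sketch")
    df, _meta = pyreadstat.read_sav(path)
    return df
```

With this pattern, users who never touch SPSS files never need the C-backed package, and users who do get an actionable message instead of a bare import failure.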
@cbrnr @TomAugspurger I am willing to help and contribute |
Great! So this could be as simple as adding Also, |
I think ignore for now, but we can certainly revisit once we have spss taken care of. Once there's interest we can add an |
This issue shouldn't have been closed, since #26537 covers only the reading part. |
There is no write support anywhere, AFAIK. |
It would be nice to be able to import SPSS data with read_spss and export it using to_spss.
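As a rough sketch of the requested round trip: pandas provides `read_spss` (since version 0.25, backed by pyreadstat), but as far as I know it has no `to_spss`, so the write step below falls back on pyreadstat's `write_sav`. The function name `spss_round_trip`, its `transform` parameter, and the paths are hypothetical.

```python
def spss_round_trip(src_path, dst_path, transform=None):
    """Read a .sav file, optionally transform it, and write a new .sav file.

    Hypothetical helper. Variable and value labels are not carried over
    in this simple version.
    """
    import pandas as pd  # read_spss requires pyreadstat to be installed
    import pyreadstat

    # Keep the raw codes rather than converting them to categoricals.
    df = pd.read_spss(src_path, convert_categoricals=False)
    if transform is not None:
        df = transform(df)
    # The write step goes through pyreadstat, not pandas.
    pyreadstat.write_sav(df, dst_path)
    return df
```

A call such as `spss_round_trip("in.sav", "out.sav", transform=lambda d: d.dropna())` would then read, clean, and re-export the data; both file names are placeholders.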