
Commit 9201e79

Merge tag 'v0.10.1' into debian

Version 0.10.1

* tag 'v0.10.1': (195 commits)
  RLS: set released to true
  RLS: Version 0.10.1
  TST: skip problematic xlrd test
  Merging in MySQL support pandas-dev#2482
  Revert "Merging in MySQL support pandas-dev#2482"
  BUG: don't let np.prod overflow int64
  RLS: note changed return type in DatetimeIndex.unique
  RLS: more what's new for 0.10.1
  RLS: some what's new for 0.10.1
  API: restore inplace=True returns self, add FutureWarnings. re pandas-dev#1893
  Merging in MySQL support pandas-dev#2482
  BUG: fix python 3 dtype issue
  DOC: fix what's new 0.10 doc bug re pandas-dev#2651
  BUG: fix C parser thread safety. verify gil release close pandas-dev#2608
  BUG: usecols bug with implicit first index column. close pandas-dev#2654
  BUG: plotting bug when base is nonzero pandas-dev#2571
  BUG: period resampling bug when all values fall into a single bin. close pandas-dev#2070
  BUG: fix memory error in sortlevel when many multiindex levels. close pandas-dev#2684
  STY: CRLF
  BUG: perf_HEAD reports wrong vbench name when an exception is raised
  ...

2 parents 88119b2 + 31ecaa9

199 files changed: +10719 −5707 lines


.travis.yml

Lines changed: 6 additions & 2 deletions

@@ -3,7 +3,7 @@ language: python
 python:
   - 2.6
   - 2.7
-  - 3.1 # travis will soon EOL this
+  # - 3.1 # travis EOL
   - 3.2
   - 3.3

@@ -15,6 +15,8 @@ matrix:
   include:
     - python: 2.7
       env: VBENCH=true
+    - python: 2.7
+      env: LOCALE_OVERRIDE="zh_CN.GB18030" # simplified chinese
     - python: 2.7
       env: FULL_DEPS=true
     - python: 3.2
@@ -45,8 +47,10 @@ before_install:
 install:
   - echo "Waldo2"
   - ci/install.sh
-  - ci/print_versions.py # not including stats

 script:
   - echo "Waldo3"
   - ci/script.sh
+
+after_script:
+  - ci/print_versions.py

RELEASE.rst

Lines changed: 132 additions & 0 deletions

@@ -22,6 +22,138 @@ Where to get it
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
* Documentation: http://pandas.pydata.org

pandas 0.10.1
=============

**Release date:** 2013-01-22

**New features**

- Add data interface to World Bank WDI, ``pandas.io.wb`` (#2592); a usage
  sketch follows this list
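
A minimal sketch of the new World Bank interface; the indicator code and
keyword names here are assumptions for illustration, not taken from this
commit::

    from pandas.io import wb

    # Fetch one indicator for a few countries as a DataFrame indexed by
    # (country, year); NY.GDP.PCAP.KD is an illustrative WDI code.
    data = wb.download(indicator='NY.GDP.PCAP.KD',
                       country=['US', 'CA', 'MX'],
                       start=2005, end=2011)
    print(data.head())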

**API Changes**

- Restored inplace=True behavior returning self (same object) with
  deprecation warning until 0.11 (GH1893_); see the sketch after this list
- ``HDFStore``

  - refactored HDFStore to deal with non-table stores as objects, will
    allow future enhancements
  - removed keyword ``compression`` from ``put`` (replaced by keyword
    ``complib`` to be consistent across library)
  - warn ``PerformanceWarning`` if you are attempting to store types that
    will be pickled by PyTables
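
A short sketch of the restored ``inplace`` behavior (the method and data
are illustrative): through 0.10.x the call both mutates and returns the
same object, with a warning about the eventual change of return value::

    import pandas as pd

    df = pd.DataFrame({'b': [3, 1, 2]})
    # Sorts df in place; 0.10.1 again returns df itself (with a
    # FutureWarning) rather than None.
    result = df.sort_index(inplace=True)
    assert result is df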

**Improvements to existing features**

- ``HDFStore`` (a usage sketch follows this sub-list)

  - enables storing of multi-index dataframes (closes GH1277_)
  - support data column indexing and selection, via ``data_columns`` keyword
    in append
  - support write chunking to reduce memory footprint, via ``chunksize``
    keyword to append
  - support automagic indexing via ``index`` keyword to append
  - support ``expectedrows`` keyword in append to inform ``PyTables`` about
    the expected table size
  - support ``start`` and ``stop`` keywords in select to limit the row
    selection space
  - added ``get_store`` context manager to automatically import with pandas
  - added column filtering via ``columns`` keyword in select
  - added methods append_to_multiple/select_as_multiple/select_as_coordinates
    to do multiple-table append/selection
  - added support for datetime64 in columns
  - added method ``unique`` to select the unique values in an indexable or
    data column
  - added method ``copy`` to copy an existing store (and possibly upgrade)
  - show the shape of the data on disk for non-table stores when printing
    the store
  - added ability to read PyTables flavor tables (allows compatibility with
    other HDF5 systems)
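
A minimal sketch of the chunked-append and limited-select features above,
assuming the 0.10-era API (the file name and column names are
illustrative)::

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': np.random.randn(10000),
                       'B': np.random.randn(10000)})

    store = pd.HDFStore('store.h5')
    # Write in chunks to bound memory use, and index B as a data column.
    store.append('df', df, data_columns=['B'], chunksize=2500)
    # Read back only column B from the first 1000 rows.
    subset = store.select('df', columns=['B'], start=0, stop=1000)
    store.close()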

- Add ``logx`` option to DataFrame/Series.plot (GH2327_, #2565)
- Support reading gzipped data from file-like object
- ``pivot_table`` aggfunc can be anything used in GroupBy.aggregate
  (GH2643_); see the first sketch after this list
- Implement DataFrame merges in case where set cardinalities might overflow
  64-bit integer (GH2690_)
- Raise exception in C file parser if integer dtype specified and NA values
  are present (GH2631_)
- Attempt to parse ISO8601 format dates when parse_dates=True in read_csv,
  for a major performance boost in such cases (GH2698_); see the second
  sketch after this list
- Add methods ``neg`` and ``inv`` to Series
- Implement ``kind`` option in ``ExcelFile`` to indicate whether it's an XLS
  or XLSX file (GH2613_)
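
First, a sketch of passing an arbitrary reducing function to
``pivot_table``; the ``rows``/``cols`` keyword names assume the 0.10-era
signature, and the data is illustrative::

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'] * 2,
                       'B': ['one', 'two', 'one', 'two'] * 2,
                       'D': np.random.randn(8)})
    # Any function usable with GroupBy.aggregate works as aggfunc.
    table = pd.pivot_table(df, values='D', rows='A', cols='B',
                           aggfunc=np.std)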
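
Second, a sketch of the ISO8601 fast path in ``read_csv`` (inline data;
Python 2 ``StringIO``, as targeted by this release)::

    from StringIO import StringIO

    import pandas as pd

    data = "date,value\n2013-01-01,1\n2013-01-02,2\n"
    # ISO8601-formatted dates in the index now take the fast parse path.
    df = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)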

**Bug fixes**

- Fix read_csv/read_table multithreading issues (GH2608_)
- ``HDFStore``

  - correctly handle ``nan`` elements in string columns; serialize via the
    ``nan_rep`` keyword to append
  - raise correctly on non-implemented column types (unicode/date)
  - handle correctly ``Term`` passed types (e.g. ``index<1000``, when index
    is ``Int64``) (closes GH512_)
  - handle Timestamp correctly in data_columns (closes GH2637_)
  - contains correctly matches on non-natural names
  - correctly store ``float32`` dtypes in tables (if not other float types
    in the same table)

- Fix DataFrame.info bug with UTF8-encoded columns (GH2576_)
- Fix DatetimeIndex handling of FixedOffset tz (GH2604_)
- More robust detection of being in IPython session for wide DataFrame
  console formatting (GH2585_)
- Fix platform issues with ``file:///`` in unit test (#2564)
- Fix bug and possible segfault when grouping by hierarchical level that
  contains NA values (GH2616_)
- Ensure that MultiIndex tuples can be constructed with NAs (seen in #2616)
- Fix int64 overflow issue when unstacking MultiIndex with many levels
  (#2616)
- Exclude non-numeric data from DataFrame.quantile by default (GH2625_)
- Fix a Cython C int64 boxing issue causing read_csv to return incorrect
  results (GH2599_)
- Fix groupby summing performance issue on boolean data (GH2692_)
- Don't bork Series containing datetime64 values with to_datetime (GH2699_)
- Fix DataFrame.from_records corner case when passed columns, index column,
  but empty record list (GH2633_)
- Fix C parser-tokenizer bug with trailing fields (GH2668_)
- Don't exclude non-numeric data from GroupBy.max/min (GH2700_)
- Don't lose time zone when calling DatetimeIndex.drop (GH2621_)
- Fix setitem on a Series with a boolean key and a non-scalar as value
  (GH2686_)
- Box datetime64 values in Series.apply/map (GH2627_, GH2689_)
- Upconvert datetime + datetime64 values when concatenating frames (GH2624_)
- Raise a more helpful error message in merge operations when one DataFrame
  has duplicate columns (GH2649_)
- Fix partial date parsing issue occurring only when code is run at EOM
  (GH2618_)
- Prevent MemoryError when using counting sort in sortlevel with
  high-cardinality MultiIndex objects (GH2684_)
- Fix Period resampling bug when all values fall into a single bin (GH2070_)
- Fix buggy interaction with usecols argument in read_csv when there is an
  implicit first index column (GH2654_); see the sketch after this list
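
A sketch of the usecols case fixed above: when each data row carries one
more field than the header, read_csv treats the first field as an implicit
index; selecting columns by name here assumes the 0.10-era usecols accepts
names, and the data is illustrative::

    from StringIO import StringIO

    import pandas as pd

    # Header names three columns; rows have four fields, so the first
    # field becomes the index.
    data = "a,b,c\n4,apple,bat,5.7\n8,orange,cow,10\n"
    df = pd.read_csv(StringIO(data), usecols=['b', 'c'])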

.. _GH512: https://github.com/pydata/pandas/issues/512
.. _GH1277: https://github.com/pydata/pandas/issues/1277
.. _GH1893: https://github.com/pydata/pandas/issues/1893
.. _GH2070: https://github.com/pydata/pandas/issues/2070
.. _GH2327: https://github.com/pydata/pandas/issues/2327
.. _GH2576: https://github.com/pydata/pandas/issues/2576
.. _GH2585: https://github.com/pydata/pandas/issues/2585
.. _GH2599: https://github.com/pydata/pandas/issues/2599
.. _GH2604: https://github.com/pydata/pandas/issues/2604
.. _GH2608: https://github.com/pydata/pandas/issues/2608
.. _GH2613: https://github.com/pydata/pandas/issues/2613
.. _GH2616: https://github.com/pydata/pandas/issues/2616
.. _GH2618: https://github.com/pydata/pandas/issues/2618
.. _GH2621: https://github.com/pydata/pandas/issues/2621
.. _GH2624: https://github.com/pydata/pandas/issues/2624
.. _GH2625: https://github.com/pydata/pandas/issues/2625
.. _GH2627: https://github.com/pydata/pandas/issues/2627
.. _GH2631: https://github.com/pydata/pandas/issues/2631
.. _GH2633: https://github.com/pydata/pandas/issues/2633
.. _GH2637: https://github.com/pydata/pandas/issues/2637
.. _GH2643: https://github.com/pydata/pandas/issues/2643
.. _GH2649: https://github.com/pydata/pandas/issues/2649
.. _GH2654: https://github.com/pydata/pandas/issues/2654
.. _GH2668: https://github.com/pydata/pandas/issues/2668
.. _GH2684: https://github.com/pydata/pandas/issues/2684
.. _GH2686: https://github.com/pydata/pandas/issues/2686
.. _GH2689: https://github.com/pydata/pandas/issues/2689
.. _GH2690: https://github.com/pydata/pandas/issues/2690
.. _GH2692: https://github.com/pydata/pandas/issues/2692
.. _GH2694: https://github.com/pydata/pandas/issues/2694
.. _GH2698: https://github.com/pydata/pandas/issues/2698
.. _GH2699: https://github.com/pydata/pandas/issues/2699
.. _GH2700: https://github.com/pydata/pandas/issues/2700
pandas 0.10.0
=============

bench/bench_dense_to_sparse.py

Lines changed: 0 additions & 1 deletion

@@ -12,4 +12,3 @@
     this_rng = rng2[:-i]
     data[100:] = np.nan
     series[i] = SparseSeries(data, index=this_rng)
-

bench/bench_get_put_value.py

Lines changed: 7 additions & 0 deletions

@@ -4,39 +4,46 @@
 N = 1000
 K = 50

+
 def _random_index(howmany):
     return Index([rands(10) for _ in xrange(howmany)])

 df = DataFrame(np.random.randn(N, K), index=_random_index(N),
                columns=_random_index(K))

+
 def get1():
     for col in df.columns:
         for row in df.index:
             _ = df[col][row]

+
 def get2():
     for col in df.columns:
         for row in df.index:
             _ = df.get_value(row, col)

+
 def put1():
     for col in df.columns:
         for row in df.index:
             df[col][row] = 0

+
 def put2():
     for col in df.columns:
         for row in df.index:
             df.set_value(row, col, 0)

+
 def resize1():
     buf = DataFrame()
     for col in df.columns:
         for row in df.index:
             buf = buf.set_value(row, col, 5.)
     return buf

+
 def resize2():
     from collections import defaultdict

bench/bench_groupby.py

Lines changed: 6 additions & 3 deletions

@@ -12,16 +12,19 @@
 random.shuffle(foo)
 random.shuffle(foo2)

-df = DataFrame({'A' : foo,
-                'B' : foo2,
-                'C' : np.random.randn(n * k)})
+df = DataFrame({'A': foo,
+                'B': foo2,
+                'C': np.random.randn(n * k)})

 import pandas._sandbox as sbx

+
 def f():
     table = sbx.StringHashTable(len(df))
     ret = table.factorize(df['A'])
     return ret
+
+
 def g():
     table = sbx.PyObjectHashTable(len(df))
     ret = table.factorize(df['A'])

bench/bench_join_panel.py

Lines changed: 26 additions & 18 deletions

@@ -1,49 +1,55 @@
-# reasonably effecient
+# reasonably efficient
+

 def create_panels_append(cls, panels):
     """ return an append list of panels """
-    panels = [ a for a in panels if a is not None ]
+    panels = [a for a in panels if a is not None]
     # corner cases
     if len(panels) == 0:
         return None
     elif len(panels) == 1:
         return panels[0]
     elif len(panels) == 2 and panels[0] == panels[1]:
         return panels[0]
-    #import pdb; pdb.set_trace()
+    # import pdb; pdb.set_trace()
     # create a joint index for the axis
+
     def joint_index_for_axis(panels, axis):
         s = set()
         for p in panels:
-            s.update(list(getattr(p,axis)))
+            s.update(list(getattr(p, axis)))
         return sorted(list(s))
+
     def reindex_on_axis(panels, axis, axis_reindex):
         new_axis = joint_index_for_axis(panels, axis)
-        new_panels = [ p.reindex(**{ axis_reindex : new_axis, 'copy' : False}) for p in panels ]
+        new_panels = [p.reindex(**{axis_reindex: new_axis,
+                                   'copy': False}) for p in panels]
         return new_panels, new_axis
-    # create the joint major index, dont' reindex the sub-panels - we are appending
+    # create the joint major index, dont' reindex the sub-panels - we are
+    # appending
     major = joint_index_for_axis(panels, 'major_axis')
     # reindex on minor axis
     panels, minor = reindex_on_axis(panels, 'minor_axis', 'minor')
     # reindex on items
     panels, items = reindex_on_axis(panels, 'items', 'items')
     # concatenate values
     try:
-        values = np.concatenate([ p.values for p in panels ],axis=1)
+        values = np.concatenate([p.values for p in panels], axis=1)
     except (Exception), detail:
-        raise Exception("cannot append values that dont' match dimensions! -> [%s] %s" % (','.join([ "%s" % p for p in panels ]),str(detail)))
-    #pm('append - create_panel')
-    p = Panel(values, items = items, major_axis = major, minor_axis = minor )
-    #pm('append - done')
+        raise Exception("cannot append values that dont' match dimensions! -> [%s] %s"
+                        % (','.join(["%s" % p for p in panels]), str(detail)))
+    # pm('append - create_panel')
+    p = Panel(values, items=items, major_axis=major,
+              minor_axis=minor)
+    # pm('append - done')
     return p


-
-# does the job but inefficient (better to handle like you read a table in pytables...e.g create a LongPanel then convert to Wide)
-
+# does the job but inefficient (better to handle like you read a table in
+# pytables...e.g create a LongPanel then convert to Wide)
 def create_panels_join(cls, panels):
     """ given an array of panels's, create a single panel """
-    panels = [ a for a in panels if a is not None ]
+    panels = [a for a in panels if a is not None]
     # corner cases
     if len(panels) == 0:
         return None
@@ -62,16 +68,18 @@ def create_panels_join(cls, panels):
         for minor_i, minor_index in panel.minor_axis.indexMap.items():
             for major_i, major_index in panel.major_axis.indexMap.items():
                 try:
-                    d[(minor_i,major_i,item)] = values[item_index,major_index,minor_index]
+                    d[(minor_i, major_i, item)] = values[item_index, major_index, minor_index]
                 except:
                     pass
     # stack the values
     minor = sorted(list(minor))
     major = sorted(list(major))
     items = sorted(list(items))
     # create the 3d stack (items x columns x indicies)
-    data = np.dstack([ np.asarray([ np.asarray([ d.get((minor_i,major_i,item),np.nan) for item in items ]) for major_i in major ]).transpose() for minor_i in minor ])
+    data = np.dstack([np.asarray([np.asarray([d.get((minor_i, major_i, item), np.nan)
+                                              for item in items])
+                                  for major_i in major]).transpose()
+                      for minor_i in minor])
     # construct the panel
     return Panel(data, items, major, minor)
 add_class_method(Panel, create_panels_join, 'join_many')
-
