Commit fad7ddd

DOC: add section on Data Structures to Excel page
1 parent 5baca3d commit fad7ddd

3 files changed: 56 additions, 4 deletions

doc/source/getting_started/comparison/comparison_with_excel.rst

Lines changed: 46 additions & 0 deletions
@@ -15,6 +15,52 @@ Excel-compatible spreadsheet software.
1515

1616
.. include:: comparison_boilerplate.rst
1717

18+
Data structures
19+
---------------
20+
21+
General terminology translation
22+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
23+
24+
.. csv-table::
25+
:header: "pandas", "Excel"
26+
:widths: 20, 20
27+
28+
``DataFrame``, worksheet
29+
``Series``, column
30+
``Index``, row headings
31+
row, row
32+
``NaN``, empty cell
33+
34+
``DataFrame``
35+
~~~~~~~~~~~~~
36+
37+
A ``DataFrame`` in pandas is analogous to an Excel worksheet. While an Excel workbook can contain
38+
multiple worksheets, pandas ``DataFrame``\s exist independently.
39+
40+
``Series``
41+
~~~~~~~~~~
42+
43+
A ``Series`` is the data structure that represents one column of a ``DataFrame``. Working with a
44+
``Series`` is analogous to referencing a column of a spreadsheet.
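
As a minimal sketch of this analogy, selecting one column of a ``DataFrame`` returns a ``Series``. The ``populations`` frame and its values are made up for illustration and are not part of the original change.

```python
import pandas as pd

# Hypothetical data, for illustration only.
populations = pd.DataFrame(
    {"city": ["Chicago", "Houston"], "population": [2_700_000, 2_300_000]}
)

# Selecting a single column returns a Series, much like referring to
# one column of a spreadsheet.
column = populations["population"]
print(type(column).__name__)
```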
45+
46+
``Index``
47+
~~~~~~~~~
48+
49+
Every ``DataFrame`` and ``Series`` has an ``Index``, which holds the labels on the *rows* of the data. In
50+
pandas, if no index is specified, an integer index is used by default (first row = 0, second row =
51+
1, and so on), analogous to row headings/numbers in Excel.
52+
53+
In pandas, indexes can be set to one (or multiple) unique values, which is like having a column that
54+
you use as the row identifier in a worksheet. Unlike Excel, these ``Index`` values can actually be
55+
used to reference the rows. For example, in Excel, you would reference the first row as ``A1:Z1``,
56+
while in pandas you could use ``populations.loc['Chicago']``.
57+
58+
Index values are also persistent, so if you re-order the rows in a ``DataFrame``, the label for a
59+
particular row doesn't change.
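
The behavior described above could be sketched roughly as follows. The ``populations`` frame, its city names and its numbers are hypothetical values chosen for illustration; only ``set_index``, ``loc`` and ``sort_values`` are existing pandas methods.

```python
import pandas as pd

# Hypothetical data, for illustration only.
populations = pd.DataFrame(
    {
        "city": ["Chicago", "Houston", "Phoenix"],
        "population": [2_700_000, 2_300_000, 1_600_000],
    }
)

# With no index specified, pandas assigns integer labels starting at 0.
default_labels = list(populations.index)

# Use the city names as the row labels instead.
populations = populations.set_index("city")
chicago = populations.loc["Chicago", "population"]

# Re-ordering the rows moves each label along with its row.
reordered = populations.sort_values("population")
chicago_after_sort = reordered.loc["Chicago", "population"]
```

After sorting, ``"Chicago"`` is the last row of ``reordered`` rather than the first, yet ``reordered.loc["Chicago"]`` still returns the same data.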
60+
61+
See the :ref:`indexing documentation<indexing>` for much more on how to use an ``Index``
62+
effectively.
63+
1864
Commonly used Excel functionalities
1965
-----------------------------------
2066

doc/source/getting_started/comparison/comparison_with_sas.rst

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -39,14 +39,17 @@ General terminology translation
3939
``NaN``, ``.``
4040

4141

42-
``DataFrame`` / ``Series``
43-
~~~~~~~~~~~~~~~~~~~~~~~~~~
42+
``DataFrame``
43+
~~~~~~~~~~~~~
4444

4545
A ``DataFrame`` in pandas is analogous to a SAS data set - a two-dimensional
4646
data source with labeled columns that can be of different types. As will be
4747
shown in this document, almost any operation that can be applied to a data set
4848
using SAS's ``DATA`` step, can also be accomplished in pandas.
4949

50+
``Series``
51+
~~~~~~~~~~
52+
5053
A ``Series`` is the data structure that represents one column of a
5154
``DataFrame``. SAS doesn't have a separate data structure for a single column,
5255
but in general, working with a ``Series`` is analogous to referencing a column

doc/source/getting_started/comparison/comparison_with_stata.rst

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,14 +38,17 @@ General terminology translation
3838
``NaN``, ``.``
3939

4040

41-
``DataFrame`` / ``Series``
42-
~~~~~~~~~~~~~~~~~~~~~~~~~~
41+
``DataFrame``
42+
~~~~~~~~~~~~~
4343

4444
A ``DataFrame`` in pandas is analogous to a Stata data set -- a two-dimensional
4545
data source with labeled columns that can be of different types. As will be
4646
shown in this document, almost any operation that can be applied to a data set
4747
in Stata can also be accomplished in pandas.
4848

49+
``Series``
50+
~~~~~~~~~~
51+
4952
A ``Series`` is the data structure that represents one column of a
5053
``DataFrame``. Stata doesn't have a separate data structure for a single column,
5154
but in general, working with a ``Series`` is analogous to referencing a column
