
Velocity HOURLY #99


Merged
merged 40 commits into master from velocity_hourly on Apr 2, 2020

Conversation


@diodon diodon commented Feb 20, 2020

This code aggregates the flattened velocity variables, binning them into one-hour time intervals. It checks for the presence of seconds_to_middle_of_measurement and shifts the timestamp forward accordingly.
Here is a sample file aggregating 20 deployments from NRSDAR
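
For illustration, a minimal sketch of the binning idea (the function name, input DataFrame and optional offset argument are assumptions, not the actual aggregator code):

```python
import pandas as pd

def bin_hourly(df, seconds_to_middle=None):
    """Bin a flattened velocity DataFrame (indexed by TIME) into one-hour means."""
    df = df.copy()
    if seconds_to_middle is not None:
        # shift each timestamp forward to the middle of its measurement interval
        df.index = df.index + pd.Timedelta(seconds=seconds_to_middle)
    return df.resample('1H').mean()
```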

@diodon diodon requested a review from mhidas February 20, 2020 06:33
@diodon diodon self-assigned this Feb 20, 2020

codecov bot commented Feb 20, 2020

Codecov Report

Merging #99 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #99   +/-   ##
=======================================
  Coverage   54.78%   54.78%           
=======================================
  Files           9        9           
  Lines        1150     1150           
  Branches      167      167           
=======================================
  Hits          630      630           
  Misses        493      493           
  Partials       27       27           



diodon commented Mar 5, 2020

The code does not shift the TIME according to seconds_to_middle. However, the portions of the code that do this are commented out in case we want to go back to this approach.


@mhidas mhidas left a comment


I haven't quite got to the end of the code yet, but I think I've managed to get my head around the guts of it. There are a few things to think about.

:param is_WCUR: flag indicating if WCUR is present
:return: end index of the slice
"""
nc_cell = nc_cell.where(nc_cell.DEPTH_quality_control < 4, drop=True)
Contributor

Why limit to < 4 and not < 3 ? With the non-velocity product we included only flags 1 & 2, so let's be consistent here.
Also, this limit would ideally be specified as an argument to the function, or at least a constant within this module.
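
For illustration, that suggestion could look roughly like this (the constant name and threshold value are assumptions):

```python
# module-level constant: keep only "good" and "probably good" data (flags 1 & 2)
MAX_QC_FLAG = 2

def good_depth(nc_cell, max_flag=MAX_QC_FLAG):
    """Drop observations whose DEPTH failed QC (flag > max_flag)."""
    return nc_cell.where(nc_cell.DEPTH_quality_control <= max_flag, drop=True)
```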

Contributor

More importantly, we also need to mask out bad values of the velocity components!!
E.g. Replace UCUR values where UCUR_quality_control > 2 with nan before doing the resample. I think you can do this with each variable on the same DataFrame (without drop=True), then still just do the resample.
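
A rough sketch of that masking step (the variable list and flag threshold are assumed):

```python
def mask_bad_velocity(nc_cell, varlist=('UCUR', 'VCUR', 'WCUR'), max_flag=2):
    """Replace velocity values whose QC flag exceeds max_flag with NaN,
    keeping the timestamps so the subsequent resample still works."""
    for var in varlist:
        qc = var + '_quality_control'
        if var in nc_cell and qc in nc_cell:
            # .where() without drop=True keeps the shape and inserts NaN
            nc_cell[var] = nc_cell[var].where(nc_cell[qc] <= max_flag)
    return nc_cell
```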

## back the index 30min
nc_cell.index = nc_cell.index - pd.Timedelta(30, units='m')

nc_cell_1H = nc_cell.resample('1H')
Contributor

I think this might be the reason for some of the slowness.. we are repeating this resample for every cell, even though the timestamps are the same for each cell! Not sure if xarray allows you to get around that?
Actually, the same is true for excluding the "bad" DEPTH values - this should be done only once.

Contributor

I would not expect this to be the bottleneck. Anyway, without running locally, I don't know why there is the conversion to data_frame above and then resampling - xarray can use the pandas resampling method.
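
As a sketch, resampling directly in xarray would look something like this (the input file name is a placeholder; TIME is assumed to be the record dimension):

```python
import xarray as xr

ds = xr.open_dataset('velocity_deployment.nc')  # placeholder file name

# mask bad QC values first, then resample the whole dataset in one pass,
# rather than converting each cell to a DataFrame and resampling separately
ds_hourly = ds.resample(TIME='1H').mean(skipna=True)
```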

nc_cell['DEPTH'] = nc_cell['DEPTH'] - cell
slice_end = get_resampled_values(nc_cell, ds, slice_start, varlist, binning_fun,
                                 epoch, one_day, is_WCUR)
CELL_INDEX[slice_start:slice_end] = np.full(slice_end - slice_start, cell_idx, dtype=np.uint32)
Contributor

Note that this ordering of values along the OBSERVATION dimension is different from what we did in the aggregated product. In that case we cycled through all the cells at a timestamp before moving on to the next timestamp. I can see why this would be a bit more difficult here, but in any case it might be helpful to keep the two products consistent in this sense.
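
For illustration only, a toy sketch of the two orderings along the OBSERVATION dimension (not the product code):

```python
from itertools import product

timestamps = ['t0', 't1']
cells = ['c0', 'c1', 'c2']

# aggregated product: all cells at one timestamp, then move to the next timestamp
time_major = list(product(timestamps, cells))                 # (t0,c0) (t0,c1) (t0,c2) (t1,c0) ...

# this PR: all timestamps for one cell, then move to the next cell
cell_major = [(t, c) for c, t in product(cells, timestamps)]  # (t0,c0) (t1,c0) (t0,c1) ...
```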


@mhidas mhidas left a comment


A few more comments, but otherwise I think it's good for now. Let's see how the Tech Team review goes.

Main issues yet to be resolved:

  • I think we should be shifting the input timestamps according to the seconds_to_middle attribute, when present;
  • Ordering of time and depth/cell along the OBSERVATION dimension is different from the aggregated product.


## NOTE: There is a possibility of having NaNs in DEPTH after the binning
## this is the warning when calculating the min/max DEPTH
## maybe I should clean the dataset before close it
Contributor

Is this comment still relevant? Doesn't using np.nanmin() get around this problem?
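
For reference, np.nanmin ignores NaNs where np.min would propagate them:

```python
import numpy as np

depth = np.array([10.0, np.nan, 12.5])
print(np.min(depth))     # nan  (the NaN propagates)
print(np.nanmin(depth))  # 10.0 (NaN ignored; only an all-NaN input raises a RuntimeWarning)
```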

- All files to be aggregated are from the same site, and have the same `site_code` attribute;
- Variables to be aggregated have `TIME` and (optionally) `HEIGHT_ABOVE_SENSOR` as their only dimensions (or if `LATITUDE` and `LONGITUDE` are included as dimensions, they have size 1);
- The in-water data are bounded by the global attributes `time_deployment_start` and `time_deployment_end`;

Contributor

Might need to add this if it is so decided:
The TIME variable has an attribute seconds_to_middle_of_measurement to indicate the offset from each recorded timestamp to the centre of the averaging period.
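
If that behaviour were adopted, applying the offset on read could look roughly like this (the file name is a placeholder; only the attribute name comes from the comment above):

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset('deployment_file.nc')  # placeholder file name
offset = ds['TIME'].attrs.get('seconds_to_middle_of_measurement')
if offset is not None:
    # centre each timestamp on its averaging period before binning
    ds = ds.assign_coords(TIME=ds['TIME'].values + np.timedelta64(int(offset), 's'))
```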


@mhidas mhidas left a comment


Added a couple of suggestions in the global attributes, to warn users that the timestamps have not been centred before binning.

mhidas added 9 commits March 31, 2020 17:07
(also update description of CELL_INDEX and SECONDS_TO_MIDDLE)
* make cell_methods CF compliant
* more accurate description in long_name
* update author_email to AODN one
* reorder imports
* global variable for max QC flag to include
* minor doc string edits
* clearer variable renames
(add comments about not centering the timestamps before binning)
@mhidas mhidas requested a review from ocehugo April 1, 2020 02:00

mhidas commented Apr 1, 2020

@ocehugo If you have a bit of time today, please take a quick look at this & merge unless you find anything really broken. You only really need to look at the last 9 commits (starting with bd151bc), as the rest were done by @diodon and reviewed by me.


mhidas commented Apr 1, 2020

Note the Travis tests are failing due to https://github.com/aodn/issues/issues/692, but a fix is almost ready.

@@ -41,14 +39,14 @@ def cell_velocity_resample(df, binning_function, is_WCUR):
return UCUR, VCUR, WCUR, DEPTH


-def get_resampled_values(nc_cell, ds, slice_start, varlist, binning_fun, epoch, one_day, is_WCUR):
+def get_resampled_values(nc_cell, ds, slice_start, varlist, binning_function, epoch, one_day, is_WCUR):
"""
Contributor

too many arguments

Contributor

Agree. I have removed varlist and the last three.

@@ -73,7 +71,7 @@ def get_resampled_values(nc_cell, ds, slice_start, varlist, binning_fun, epoch,
ds['WCUR'][slice_start:slice_end], \
Contributor

mutation of input arguments in the function

Contributor

I would rename the function to highlight that I'm mutating ds.

Contributor

Renamed to append_resampled_values, and I've made this clear in the docstring too

@@ -54,14 +54,14 @@ def get_resampled_values(nc_cell, ds, slice_start, varlist, binning_function, ep
"""
df_cell = nc_cell[varlist].squeeze().to_dataframe()
## back the index 30min
-df_cell.index = df_cell.index - pd.Timedelta(minutes=30)
+df_cell.index = df_cell.index + pd.Timedelta(minutes=30)
# TODO: shift timestamps to centre of sampling interval
Contributor

Comment not updated with code - you are advancing the index instead.

Contributor

I imagine the code was wrong before, so the commit could highlight that

Contributor

It wasn't wrong, it just took two steps instead of one: it subtracted 30min, resampled, then added an hour.
So I simplified it to just adding 30min, which is exactly equivalent. I did a test comparison and all the variables in files generated with the two versions of the code are exactly the same.
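
A quick toy check of that equivalence (synthetic data, not the actual test comparison):

```python
import numpy as np
import pandas as pd

times = pd.date_range('2020-01-01 00:10', periods=12, freq='10min')
df = pd.DataFrame({'UCUR': np.arange(12.0)}, index=times)

# old approach: shift back 30 min, resample, then advance the bin labels one hour
old = df.copy()
old.index = old.index - pd.Timedelta(minutes=30)
old = old.resample('1H').mean()
old.index = old.index + pd.Timedelta(hours=1)

# new approach: shift forward 30 min, then resample
new = df.copy()
new.index = new.index + pd.Timedelta(minutes=30)
new = new.resample('1H').mean()

assert old.equals(new)  # both centre the hourly bins on the hour
```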

Contributor

comment changed to "shift the index forward 30min to centre the bins on the hour"


ocehugo commented Apr 1, 2020

@mhidas - I'm leaving the merge to you, with just one thing to confirm - there is a change in the time resampling (forward shifting instead of backward) that I assume was checked further?


mhidas commented Apr 2, 2020

Thanks @ocehugo. I will update that commit, fix the merge conflicts (due to version numbers changing after the other PR was merged) and merge.

@mhidas mhidas merged commit 63abc6d into master Apr 2, 2020
@mhidas mhidas deleted the velocity_hourly branch April 2, 2020 06:32