The execution report cannot handle big workflows #547

pditommaso · 2017-12-07T10:05:22Z

Currently all tasks metadata and metrics are embedded as a JSON object in the execution report HTML file.

For big workflow executions ie. spawning 50K tasks or more, the resulting become to big (>100MB) and browser are not able to handle it.

The goal of this feature is to implement a better strategy to handle a large number of tasks.

/cc @ewels we need to brainstorm a bit about this at some point.

ewels · 2017-12-07T10:48:27Z

Hah, this is a familiar problem.. :) Can perhaps drop the table and generate the plots at run time to embed in the HTML?

pditommaso · 2017-12-07T11:24:38Z

It could be an option, but it should still parse the big JSON payload to compute the values displayed in the chart, shouldn't it? Maybe we could compute those values server side. Does it make sense ?

ewels · 2017-12-07T11:37:30Z

Yes exactly, that's what I meant by run time. So in MultiQC I have a max number of samples and after that I generate flat image plots, which scale to infinity without increasing file size. The Plotly library that we're using for the report has built-in options for exporting to flat images like this, at least in the Python version: https://plot.ly/python/static-image-export/

Phil

pditommaso · 2017-12-07T12:45:23Z

OK, but wouldn't be even easier moving this code NF side and leave the rendering on the client?

      for (var i = 0; i < window.data_byprocess[pname].length; i ++) {
        rc[i] = xround(parseInt(window.data_byprocess[pname][i]['%cpu']));
        pc[i] = xround((parseInt(window.data_byprocess[pname][i]['%cpu']) / (parseInt(window.data_byprocess[pname][i]['cpus']) * 100)) * 100);
        rm[i] = xround(parseInt(window.data_byprocess[pname][i]['vmem']) / 1000000000);
        pm[i] = xround((parseInt(window.data_byprocess[pname][i]['vmem']) / parseInt(window.data_byprocess[pname][i]['memory'])) * 100)
        rt[i] = xround(moment.duration( parseInt(window.data_byprocess[pname][i]['realtime']) ).asMinutes());
        pt[i] = xround((parseInt(window.data_byprocess[pname][i]['realtime']) / parseInt(window.data_byprocess[pname][i]['time'])) * 100)
        rd[i] = xround((parseInt(window.data_byprocess[pname][i]['read_bytes']) + parseInt(window.data_byprocess[pname][i]['write_bytes'])) / 1000000000);
        if (parseInt(window.data_byprocess[pname][i]['time']) > 0){ time_alloc_hasdata = true; }
        if (rd[i] > 0){ readwrite_hasdata = true; }
      }

ewels · 2017-12-07T12:48:11Z

Yes, if the raw plot data isn't too big I guess. If we have 10s or 100s thousand of tasks with four numbers each that could still be quite a bit of data and browser processor load. But yes maybe let's try that first!

Phil

pditommaso · 2017-12-07T12:56:46Z

Wait, basically the above code creates a separate series for each process name, but it still need to hold in memory and parse all tasks.

I was thinking it was calculating the min,max,median,etc values.

Is it not possible to give Plotly just the final values to render?

ewels · 2017-12-07T13:05:46Z

Apparently not 😞 plotly/plotly.js#1059

pditommaso · 2017-12-07T13:09:08Z

Too bad. This only leave the NF side image rendering, but I can't find the Java client for Plotly. Maybe it's possible to use the JS one with the Java embedded JavaScript engine ..

ewels · 2017-12-07T13:13:34Z

😰

It might be worth trying with just the numbers first. I imagine that 99% of the filesize will be the other JSON stuff.

pditommaso · 2017-12-07T13:18:26Z

Not so sure. I will give a try to see if it's possible to render a chart with Nashorn.

pditommaso · 2017-12-07T13:26:47Z

Wait, it seems it's possible to render a box plot just providing the Y data. Look for an example here. I guess this would give us much more control, including a way to fix #504.

What do you think?

ewels · 2017-12-11T11:52:28Z

Brilliant! This is totally cheating but why not??

pditommaso · 2017-12-11T15:30:43Z

Good. Being so if I implement the computation of [min, q1, median, q3, max] for each process, would you be able to render it? does it sound a good plan?

ewels · 2017-12-11T15:38:47Z

Yes, sounds great 👍

I hesitated about what the whiskers show - the min / max or some other metric. Plotly has some documentation on this here: https://help.plot.ly/what-is-a-box-plot/#the-whiskers - I guess the current plots show the min / max? May be worth double checking though to be sure that we're doing the same thing as normal reports.

pditommaso · 2017-12-11T15:46:51Z

OK, I will check that.

pditommaso · 2018-01-11T18:06:25Z

So, i've double checked this and the whiskers show the min and the max. How would you need the json data structured to render these info?

ewels · 2018-01-14T13:47:59Z

Great! Doesn't matter too much what the JSON looks like, I can work around it. At the moment we have window.data['trace'] that contains all of the stats for each task. Could instead have window.data['process'] that has these summary stats for the processes? Then if the former isn't defined the javascript can use the latter instead.

But whatever is easiest for you really - as long as the task JSON isn't printed to the HTML (to avoid the filesize) and the new summaries are then we should be able to adapt the javascript accordingly.

Phil

pditommaso · 2018-01-14T14:17:32Z

Ok. Let's try to speedup this. Could you provide the javascript snippet to render that charts using a fake json structure? then I can generate it dynamically on NF side.

ewels · 2018-01-15T14:30:50Z

Is it not nicer to just print the data into the report as JSON and then use the embedded javascript file that we already have for the javascript code?

Nothing fancy required. From the top of my head:

{
    "process_summary": {
        "time": [
            [ "fastqc", 12, 24, 35, 46, 90 ],
            [ "bwa", 120, 230, 340, 450, 560 ],
        ],
        "cpu": [
            [ "fastqc", 20, 50, 76, 100, 140 ],
        ],
    }
}

Where the arrays are process name, min, lower quartile, median, upper quartile, max. Or whatever.

Once we have this in as JSON in the report (the above printed in the same way as the existing data) I can plug it into the existing plot code pretty easily.

Phil

pditommaso · 2018-01-23T22:55:52Z

What about a structure like this:

{
   "bwa-mem":{
      "cpu":{
         "min":1000.0,
         "minLabel":"bwa-mem-1",
         "max":11000.0,
         "maxLabel":"bwa-mem-3",
         "q1":3500.0,
         "q2":6000.0,
         "q3":8500.0,
         "mean":6000.0
      },
      "mem":{
         "min":2000.0,
         "minLabel":"bwa-mem-1",
         "max":12000.0,
         "maxLabel":"bwa-mem-3",
         "q1":4500.0,
         "q2":7000.0,
         "q3":9500.0,
         "mean":7000.0
      },
      "time":{
         "min":3000.0,
         "minLabel":"bwa-mem-1",
         "max":13000.0,
         "maxLabel":"bwa-mem-3",
         "q1":5500.0,
         "q2":8000.0,
         "q3":10500.0,
         "mean":8000.0
      },
      "reads":{
         "min":4000.0,
         "minLabel":"bwa-mem-1",
         "max":14000.0,
         "maxLabel":"bwa-mem-3",
         "q1":6500.0,
         "q2":9000.0,
         "q3":11500.0,
         "mean":9000.0
      },
      "writes":{
         "min":5000.0,
         "minLabel":"bwa-mem-1",
         "max":15000.0,
         "maxLabel":"bwa-mem-3",
         "q1":7500.0,
         "q2":10000.0,
         "q3":12500.0,
         "mean":10000.0
      }
   },
   "multiqc":{
      "cpu":{
         "min":16000.0,
         "minLabel":"multiqc-1",
         "max":21000.0,
         "maxLabel":"multiqc-2",
         "q1":17250.0,
         "q2":18500.0,
         "q3":19750.0,
         "mean":18500.0
      },
      "mem":{
         "min":17000.0,
         "minLabel":"multiqc-1",
         "max":22000.0,
         "maxLabel":"multiqc-2",
         "q1":18250.0,
         "q2":19500.0,
         "q3":20750.0,
         "mean":19500.0
      },
      "time":{
         "min":18000.0,
         "minLabel":"multiqc-1",
         "max":23000.0,
         "maxLabel":"multiqc-2",
         "q1":19250.0,
         "q2":20500.0,
         "q3":21750.0,
         "mean":20500.0
      },
      "reads":{
         "min":19000.0,
         "minLabel":"multiqc-1",
         "max":24000.0,
         "maxLabel":"multiqc-2",
         "q1":20250.0,
         "q2":21500.0,
         "q3":22750.0,
         "mean":21500.0
      },
      "writes":{
         "min":20000.0,
         "minLabel":"multiqc-1",
         "max":25000.0,
         "maxLabel":"multiqc-2",
         "q1":21250.0,
         "q2":22500.0,
         "q3":23750.0,
         "mean":22500.0
      }
   }
}

ewels · 2018-01-23T23:57:43Z

Perfect! Labels are maybe not really needed, but doesn't hurt.

pditommaso · 2018-01-24T08:21:48Z

Good. Let me add a few notes:

the label are meant to provide an useful information when passing the mouse over the chart Include process tag in the hoverover of HTML reports #504
I've added also the mean which is not reported in the current chart.
I've split the storage data into reads and writes with the idea to render them in two separate charts, because I think the current report is not informative enough on disk usage.

This commit adds a new `summary` data object in the JSON payload embedded in the HTML execution report. This needs to be used to visualise the execution metrics charts, instead to compute them on client side which can be overkill for workflow executing a large number of tasks. Issues: #547, #504

pditommaso · 2018-01-24T23:01:45Z

Just pushed this change. Now the JSON payload includes a new summary data object structured as shown in the example in the previous comment.

pditommaso · 2018-01-25T09:36:23Z

I've fixed a couple of issues and added the logging of the summary JSON in the nextflow log.

A note more, there a could corner case in which a series eg. cpu, mem, etc it's empty because the metric are missing. Currently it's reported as a null. For example:

 {
     "foo": {
         "cpu": null,
         "mem": null,
         "time": {
             "mean": 82.25,
             "min": 54.0,
             "q1": 79.5,
             "q2": 90.0,
             "q3": 92.75,
             "max": 95.0,
             "minLabel": "foo (4)",
             "maxLabel": "foo (1)",
             "q1Label": "foo (4)",
             "q2Label": "foo (3)",
             "q3Label": "foo (2)"
         },
         "reads": null,
         "writes": null
     }
 }

pditommaso · 2018-01-31T16:33:31Z

Any progress here?

ewels · 2018-01-31T16:36:21Z

No sorry, haven't touched a keyboard for coding for a couple of weeks now. Next two weeks look packed too, will hopefully have some time after that (or will treat it as an evening / airplane project).

See you Monday! Maybe we can have a mini-hackathon during a coffee break ;)

pditommaso · 2018-01-31T16:37:16Z

It sounds good!

pditommaso · 2018-02-16T09:52:49Z

@ewels just to prevent any duplicated work, with @emi80 we have almost completed this.

ewels · 2018-02-16T10:50:22Z

Ah, you guys are heroes! Sorry for being so slow with this - I had a stab the other day at the airport back from Barcelona but couldn't get the testing environment running again for some reason so moved on to some of the MultiQC backlog instead.

This commit render the execution report boxplots using the report summary data collected by nextflow at runtime. The trace data is omitted when the number of tasks is greater than 10’000 (default) and the related table is not shown

pditommaso · 2018-02-16T19:12:31Z

OK. We refactored a bit the report to allow the rendering of large number of tasks. Main changes are the following:

task table rendering has been optimised to improve performance
trace data is retuned only when it contains < 10'000 tasks (default), for a higher number the tasks table is not shown
boxplots are rendered using the summary data produced by NF
as side effect the distribution is not showed any more (that's really a pity, but i was unable to find a way to render it at least when the trace records are available.

Please review it at your earliest convenience.

pditommaso · 2018-02-16T19:19:28Z

Sorry one more thing, now we have the information of the task name for each metric ie. minLabel, q1Label, etc

However I was unable to show it over the boxplot as it was suggesting #504. Any idea if it's plotty allows that?

pditommaso added this to the v0.27.0 milestone Dec 7, 2017

pditommaso mentioned this issue Dec 11, 2017

Workflow report should warn if some task executions were ignored #534

Closed

pditommaso modified the milestones: v0.27.0, v0.28.0 Dec 19, 2017

pditommaso added the help wanted label Jan 10, 2018

pditommaso added a commit that referenced this issue Jan 25, 2018

Fixed summary stats when for empty series #547

952998f

pditommaso removed the help wanted label Feb 16, 2018

pditommaso closed this as completed Feb 22, 2018

The execution report cannot handle big workflows #547

The execution report cannot handle big workflows #547

Comments

pditommaso commented Dec 7, 2017

ewels commented Dec 7, 2017

Uh oh!

pditommaso commented Dec 7, 2017

Uh oh!

ewels commented Dec 7, 2017

Uh oh!

pditommaso commented Dec 7, 2017

Uh oh!

ewels commented Dec 7, 2017

Uh oh!

pditommaso commented Dec 7, 2017

Uh oh!

ewels commented Dec 7, 2017

Uh oh!

pditommaso commented Dec 7, 2017

Uh oh!

ewels commented Dec 7, 2017

Uh oh!

pditommaso commented Dec 7, 2017

Uh oh!

pditommaso commented Dec 7, 2017

Uh oh!

ewels commented Dec 11, 2017

Uh oh!

pditommaso commented Dec 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ewels commented Dec 11, 2017

Uh oh!

pditommaso commented Dec 11, 2017

Uh oh!

pditommaso commented Jan 11, 2018

Uh oh!

ewels commented Jan 14, 2018

Uh oh!

pditommaso commented Jan 14, 2018

Uh oh!

ewels commented Jan 15, 2018

Uh oh!

pditommaso commented Jan 23, 2018

Uh oh!

ewels commented Jan 23, 2018

Uh oh!

pditommaso commented Jan 24, 2018

Uh oh!

pditommaso commented Jan 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pditommaso commented Jan 25, 2018

Uh oh!

pditommaso commented Jan 31, 2018

Uh oh!

ewels commented Jan 31, 2018

Uh oh!

pditommaso commented Jan 31, 2018

Uh oh!

pditommaso commented Feb 16, 2018

Uh oh!

ewels commented Feb 16, 2018

Uh oh!

pditommaso commented Feb 16, 2018

Uh oh!

pditommaso commented Feb 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pditommaso commented Dec 11, 2017 •

edited

Loading

pditommaso commented Jan 24, 2018 •

edited

Loading

pditommaso commented Feb 16, 2018 •

edited

Loading