
[Web] WebGPU and WASM Backends Unavailable within Service Worker #20876


Open
ggaabe opened this issue May 30, 2024 · 67 comments · Fixed by #20898
Labels
ep:WebGPU ort-web webgpu provider platform:web issues related to ONNX Runtime web; typically submitted using template

Comments

@ggaabe

ggaabe commented May 30, 2024

Describe the issue

I'm running into issues trying to use the WebGPU or WASM backends inside a service worker (in a Chrome extension). More specifically, I'm attempting to use Phi-3 with transformers.js v3.

Every time I attempt this, I get the following error:

Uncaught (in promise) Error: no available backend found. ERR: [webgpu] 
TypeError: import() is disallowed on ServiceWorkerGlobalScope by the HTML specification. 
See https://github.com/w3c/ServiceWorker/issues/1356.

This is originating in the InferenceSession class in js/common/lib/inference-session-impl.ts.

More specifically, it's happening in this method:
const [backend, optionsWithValidatedEPs] = await resolveBackendAndExecutionProviders(options);
where the implementation is in js/common/lib/backend-impl.ts and the tryResolveAndInitializeBackend fails to initialize any of the execution providers.

WebGPU is now supported in service workers, though; it is a recent change, so this should be feasible. Here are the Chrome release notes.

Additionally, here is an example browser extension from the mlc-ai/web-llm framework that implements WebGPU usage in service workers successfully:
https://github.com/mlc-ai/web-llm/tree/main/examples/chrome-extension-webgpu-service-worker

Here is some further discussion on this new support from Google itself:
https://groups.google.com/a/chromium.org/g/chromium-extensions/c/ZEcSLsjCw84/m/WkQa5LAHAQAJ

So technically I think it should be possible for this to be supported now? Unless I'm doing something else glaringly wrong. Is it possible to add support for this?
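
For reference, a minimal probe of whether the WebGPU API is actually exposed in the extension's service worker scope; this is just a sketch mirroring the hasFp16() check from the repro below and does not touch ONNX Runtime at all:

// background.js (service worker) - probe WebGPU availability only
(async () => {
  if (!("gpu" in navigator)) {
    console.log("WebGPU is not exposed in this service worker scope");
    return;
  }
  const adapter = await navigator.gpu.requestAdapter();
  console.log("WebGPU adapter available:", adapter !== null);
  if (adapter) {
    console.log("shader-f16 supported:", adapter.features.has("shader-f16"));
  }
})();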

To reproduce

Download and set up the transformers.js extension example and put this into the background.js file:

// background.js - Handles requests from the UI, runs the model, then sends back a response

import {
  pipeline,
  env,
  AutoModelForCausalLM,
  AutoTokenizer,
  TextStreamer,
  StoppingCriteria,
} from "@xenova/transformers";

// Skip initial check for local models, since we are not loading any local models.
env.allowLocalModels = false;

// Due to a bug in onnxruntime-web, we must disable multithreading for now.
// See https://github.com/microsoft/onnxruntime/issues/14445 for more information.
env.backends.onnx.wasm.numThreads = 1;

class CallbackTextStreamer extends TextStreamer {
  constructor(tokenizer, cb) {
    super(tokenizer, {
      skip_prompt: true,
      skip_special_tokens: true,
    });
    this.cb = cb;
  }

  on_finalized_text(text) {
    this.cb(text);
  }
}

class InterruptableStoppingCriteria extends StoppingCriteria {
  constructor() {
    super();
    this.interrupted = false;
  }

  interrupt() {
    this.interrupted = true;
  }

  reset() {
    this.interrupted = false;
  }

  _call(input_ids, scores) {
    return new Array(input_ids.length).fill(this.interrupted);
  }
}

const stopping_criteria = new InterruptableStoppingCriteria();

async function hasFp16() {
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter.features.has("shader-f16");
  } catch (e) {
    return false;
  }
}

class PipelineSingleton {
  static task = "feature-extraction";
  static model_id = "Xenova/Phi-3-mini-4k-instruct_fp16";
  static model = null;
  static instance = null;

  static async getInstance(progress_callback = null) {
    this.model_id ??= (await hasFp16())
      ? "Xenova/Phi-3-mini-4k-instruct_fp16"
      : "Xenova/Phi-3-mini-4k-instruct";

    this.tokenizer ??= AutoTokenizer.from_pretrained(this.model_id, {
      legacy: true,
      progress_callback,
    });

    this.model ??= AutoModelForCausalLM.from_pretrained(this.model_id, {
      dtype: "q4",
      device: "webgpu",
      use_external_data_format: true,
      progress_callback,
    });

    return Promise.all([this.tokenizer, this.model]);
  }
}

// Create generic classify function, which will be reused for the different types of events.
const classify = async (text) => {
  // Get the pipeline instance. This will load and build the model when run for the first time.
  const [tokenizer, model] = await PipelineSingleton.getInstance((data) => {
    // You can track the progress of the pipeline creation here.
    // e.g., you can send `data` back to the UI to indicate a progress bar
    console.log("progress", data);
    // data logs as this:
    /**
     * 
     * {
    "status": "progress",
    "name": "Xenova/Phi-3-mini-4k-instruct_fp16",
    "file": "onnx/model_q4.onnx",
    "progress": 99.80381792394503,
    "loaded": 836435968,
    "total": 838080131
  }

  when complete, last status will be 'done'
     */
  });
  /////////////
  const inputs = tokenizer.apply_chat_template(text, {
    add_generation_prompt: true,
    return_dict: true,
  });

  let startTime;
  let numTokens = 0;
  const cb = (output) => {
    startTime ??= performance.now();

    let tps;
    if (numTokens++ > 0) {
      tps = (numTokens / (performance.now() - startTime)) * 1000;
    }
    self.postMessage({
      status: "update",
      output,
      tps,
      numTokens,
    });
  };

  const streamer = new CallbackTextStreamer(tokenizer, cb);

  // Tell the main thread we are starting
  self.postMessage({ status: "start" });

  const outputs = await model.generate({
    ...inputs,
    max_new_tokens: 512,
    streamer,
    stopping_criteria,
  });
  const outputText = tokenizer.batch_decode(outputs, {
    skip_special_tokens: false,
  });

  // Send the output back to the main thread
  self.postMessage({
    status: "complete",
    output: outputText,
  });
  ///////////////

  // Actually run the model on the input text
  // let result = await model(text);
  // return result;
};

////////////////////// 1. Context Menus //////////////////////
//
// Add a listener to create the initial context menu items,
// context menu items only need to be created at runtime.onInstalled
chrome.runtime.onInstalled.addListener(function () {
  // Register a context menu item that will only show up for selection text.
  chrome.contextMenus.create({
    id: "classify-selection",
    title: 'Classify "%s"',
    contexts: ["selection"],
  });
});

// Perform inference when the user clicks a context menu
chrome.contextMenus.onClicked.addListener(async (info, tab) => {
  // Ignore context menu clicks that are not for classifications (or when there is no input)
  if (info.menuItemId !== "classify-selection" || !info.selectionText) return;

  // Perform classification on the selected text
  let result = await classify(info.selectionText);

  // Do something with the result
  chrome.scripting.executeScript({
    target: { tabId: tab.id }, // Run in the tab that the user clicked in
    args: [result], // The arguments to pass to the function
    function: (result) => {
      // The function to run
      // NOTE: This function is run in the context of the web page, meaning that `document` is available.
      console.log("result", result);
      console.log("document", document);
    },
  });
});
//////////////////////////////////////////////////////////////

////////////////////// 2. Message Events /////////////////////
//
// Listen for messages from the UI, process it, and send the result back.
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  console.log("sender", sender);
  if (message.action !== "classify") return; // Ignore messages that are not meant for classification.

  // Run model prediction asynchronously
  (async function () {
    // Perform classification
    let result = await classify(message.text);

    // Send response back to UI
    sendResponse(result);
  })();

  // return true to indicate we will send a response asynchronously
  // see https://stackoverflow.com/a/46628145 for more information
  return true;
});

Urgency

this would help enable a new ecosystem to build up around locally intelligent browser extensions and tooling.

it's urgent for me because it would be fun to build and I want to build it and it would be fun to be building it rather than not be building it.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.0-dev.20240509-69cfcba38a

Execution Provider

'webgpu' (WebGPU)

@ggaabe ggaabe added the platform:web issues related to ONNX Runtime web; typically submitted using template label May 30, 2024
@fs-eire
Contributor

fs-eire commented May 31, 2024

Thank you for reporting this issue. I will try to figure out how to fix this problem.

@fs-eire fs-eire self-assigned this May 31, 2024
@fs-eire
Contributor

fs-eire commented Jun 2, 2024

So it turns out that dynamic import (i.e. import()) and top-level await are not supported in the current service worker environment. I was not expecting import() to be banned in service workers.

Currently, the WebAssembly factory (wasm-factory.ts) uses dynamic import to load the JS glue. This does not work in a service worker. A few potential alternatives are also unavailable:

  • Changing it to a static import statement: won't work, because the JS glue includes top-level await.
  • Using importScripts: won't work, because the JS glue is ESM.
  • Using eval: won't work, for the same reason as importScripts.

I am now trying to make a JS bundle that does not use dynamic import, specifically for service worker usage. Still working on it.

@ggaabe
Author

ggaabe commented Jun 2, 2024

Thanks, I appreciate your efforts around this. It does seem like some special-case bundle will need to be built after all; you might need iife or umd for the bundler output format

@fs-eire
Contributor

fs-eire commented Jun 2, 2024

Thanks, I appreciate your efforts around this. It does seem like some special-case bundle will need to be built after all; you might need iife or umd for the bundler output format

I have considered this option. However, Emscripten does not offer an option to output both UMD (IIFE+CJS) and ESM for the JS glue (emscripten-core/emscripten#21899); I have to choose one. I chose the ES6 output for the JS glue because of a couple of problems when importing UMD from ESM, and because import() is the standard way to import ESM from both ESM and UMD (until this issue showed me that it does not work in a service worker).

I found a way to make ORT Web work; yes, this needs the build script to do some special handling. And this will only work for ESM, because the JS glue is ESM and there seems to be no way to import ESM from UMD in a service worker.
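
For orientation, the usage this enables looks roughly like the sketch below: a single static ESM import of the service-worker-friendly bundle from the extension's background script. This is a sketch only; the bundle file name comes from the PR referenced below, the model path and input name are placeholders, and top-level await is avoided because it is not allowed in a service worker.

// background.js, bundled/served as an ES module in an MV3 extension
// Static import only - no import() anywhere in the dependency graph.
import * as ort from "./ort.webgpu.bundle.min.mjs"; // copied from onnxruntime-web/dist

ort.env.wasm.numThreads = 1; // keep onnxruntime-web from spawning Workers in the service worker

async function run() {
  const session = await ort.InferenceSession.create(
    chrome.runtime.getURL("model.onnx"), // placeholder model path
    { executionProviders: ["webgpu"] }
  );
  const input = new ort.Tensor("float32", new Float32Array([1, 2, 3, 4]), [1, 4]); // placeholder input
  const results = await session.run({ input }); // "input" must match the model's input name
  console.log(Object.keys(results));
}

run();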

fs-eire added a commit that referenced this issue Jun 3, 2024
### Description

This PR allows building ORT Web as `ort{.all|.webgpu}.bundle.min.mjs`, which does not have any dynamic import. This makes it possible to use ORT Web via static import in a service worker.

Fixes #20876
@fs-eire fs-eire reopened this Jun 3, 2024
@fs-eire
Contributor

fs-eire commented Jun 5, 2024

@ggaabe Could you please try import * as ort from "./ort.webgpu.bundle.min.js" with version 1.19.0-dev.20240604-3dd6fcc089?

@ggaabe
Author

ggaabe commented Jun 6, 2024

@fs-eire my project depends on transformers.js, which imports the onnxruntime WebGPU backend like this:

https://github.com/xenova/transformers.js/blob/v3/src/backends/onnx.js#L24

Is this the right usage? In my project I've added this to my package.json to resolve onnxruntime-web to this new version, though the issue is still occurring:

  "overrides": {
    "onnxruntime-web": "1.19.0-dev.20240604-3dd6fcc089"
  }

@ggaabe
Author

ggaabe commented Jun 6, 2024

Maybe also important: the same error is still occurring in the same spot in the inference session in the onnx package, not in transformers.js. Do I need to add an override for onnxruntime-common as well?

@fs-eire
Contributor

fs-eire commented Jun 10, 2024

#20991 makes the default ESM import use a non-dynamic import, and I hope this change will fix this problem. The PR is still in progress.

@ggaabe
Author

ggaabe commented Jun 12, 2024

Hi @fs-eire, is the newly-merged fix in a released build I can try?

@fs-eire
Contributor

fs-eire commented Jun 13, 2024

Please try 1.19.0-dev.20240612-94aa21c3dd

@sophies927 sophies927 added the ep:WebGPU ort-web webgpu provider label Jun 13, 2024
@ggaabe
Author

ggaabe commented Jun 13, 2024

@fs-eire EDIT: Never mind the comment I just deleted; that error was because I didn't set the webpack target to webworker.

However, I'm getting a new error now (progress!):

Error: no available backend found. ERR: [webgpu] RuntimeError: null function or function signature mismatch

@ggaabe
Author

ggaabe commented Jun 13, 2024

Update: I found the error is happening here:

if (!isInitializing) {
  backendInfo.initPromise = backendInfo.backend.init(backendName);
}
await backendInfo.initPromise;

For some reason the webgpu backend.init promise is rejecting due to the null function or function signature mismatch error. This is much further along than we were before, though.

@fs-eire
Contributor

fs-eire commented Jun 14, 2024

Update: Found the error is happening in here:

if (!isInitializing) {
backendInfo.initPromise = backendInfo.backend.init(backendName);
}
await backendInfo.initPromise;

For some reason the webgpu backend.init promise is rejecting due to the null function or function signature mismatch error. This is much further along than we were before though.

Could you share the steps to reproduce?

@ggaabe
Author

ggaabe commented Jun 14, 2024

@fs-eire You'll need to run the WebGPU setup in a Chrome extension.

  1. You can use my code I just published here: https://github.com/ggaabe/extension
  2. Run npm install.
  3. Run npm run build.
  4. Open the Chrome "Manage extensions" page.
  5. Click "Load unpacked".
  6. Select the build folder from the repo.
  7. Open the AI WebGPU Extension extension.
  8. Type some text in the text input. It will load Phi-3 mini, and after it finishes loading this error will occur.
  9. If you view the extension in the extension manager and select the "Inspect views: service worker" link before opening the extension, it will bring up an inspection window to view the errors as they occur. A little "errors" bubble link also shows up here after they occur.
  10. You will need to click the "Refresh" button on the extension in the extension manager to rerun the error, because it does not attempt to reload the model after the first attempt until another refresh.

@fs-eire
Contributor

fs-eire commented Jun 18, 2024

@ggaabe I did some debugging on my box and made some fixes:

  1. Changes to ONNX Runtime Web:

    [js/web] skip default locateFile() when dynamic import is disabled #21073 is created to make sure the WebAssembly file can be loaded correctly when env.wasm.wasmPaths is not specified (a sketch of setting env.wasm.wasmPaths explicitly follows this list).

  2. Changes to https://github.com/ggaabe/extension

    fix ORT wasm loading ggaabe/extension#1 needs to be made to the extension example to make it load the model correctly. Please note:

    • The onnxruntime-web version needs to be updated to consume the changes from (1) (after it gets merged and published to the dev channel).
    • There are still errors in background.js, which look like incorrect params passed to tokenizer.apply_chat_template(). However, the WebAssembly is initialized and the model loads successfully.
  3. Other issues:
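
To illustrate the env.wasm.wasmPaths setting mentioned in item 1, a minimal sketch; the CDN URL, model path, and version are examples rather than values confirmed for this fix, and the setting simply tells onnxruntime-web where to fetch its .wasm artifacts instead of relying on the default locateFile() resolution:

import * as ort from "onnxruntime-web";

async function createSession() {
  // Point onnxruntime-web at an explicit location for its .wasm files.
  // This could also be a directory packaged with the extension, e.g. chrome.runtime.getURL("wasm/").
  ort.env.wasm.wasmPaths = "https://cdn.jsdelivr.net/npm/[email protected]/dist/";
  ort.env.wasm.numThreads = 1;

  return ort.InferenceSession.create(chrome.runtime.getURL("model.onnx"), {
    executionProviders: ["webgpu"],
  });
}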

@ggaabe
Author

ggaabe commented Jun 18, 2024

Awesome, thank you for your thoroughness in explaining this and tackling this head on. Is there a dev channel version I can test out?

@fs-eire
Contributor

fs-eire commented Jun 18, 2024

Not yet. Will update here once it is ready.

@ggaabe
Author

ggaabe commented Jun 23, 2024

sorry to bug; is there any dev build number? wasn't sure how often a release runs

@fs-eire
Contributor

fs-eire commented Jun 23, 2024

sorry to bug; is there any dev build number? wasn't sure how often a release runs

Please try 1.19.0-dev.20240621-69d522f4e9

@ggaabe
Author

ggaabe commented Jun 23, 2024

@fs-eire I'm getting one new error:

ort.webgpu.bundle.min.mjs:6 Uncaught (in promise) Error: The data is not on CPU. Use `getData()` to download GPU data to CPU, or use `texture` or `gpuBuffer` property to access the GPU data directly.
    at get data (ort.webgpu.bundle.min.mjs:6:13062)
    at get data (tensor.js:62:1)

I pushed the code changes to my repo and fixed the call to the tokenizer. To reproduce, just type one letter in the Chrome extension's text input and wait.
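
For reference, this error means the result tensor is still in a GPU buffer, so its data property cannot be read synchronously; a minimal sketch of the difference, assuming the output name "output" (a placeholder) and a session whose outputs are kept on the GPU:

// Reading an output tensor that may live on the GPU.
async function readOutput(results) {
  const tensor = results.output; // "output" is a placeholder output name
  // const values = tensor.data;         // throws "The data is not on CPU" for GPU-resident tensors
  const values = await tensor.getData(); // downloads GPU data to the CPU when needed
  return values;
}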

@nickl1234567

Hey, I also need this. I am struggling with importing this version. So far I have been importing ONNX using
import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/esm/ort.webgpu.min.js".
However, when I change to import * as ort from "https://cdn.jsdelivr.net/npm/[email protected]/dist/esm/ort.webgpu.min.js", there no longer seems to be an .../esm/ folder. Do you know why that is, and how to import it now?

@fs-eire
Contributor

fs-eire commented Jun 24, 2024

Hey, I also need this. I am struggling with importing this version. So far I have been importing ONNX using import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/esm/ort.webgpu.min.js". However, when I change to import * as ort from "https://cdn.jsdelivr.net/npm/[email protected]/dist/esm/ort.webgpu.min.js" it seems not to have an .../esm/ folder. Do you know why that is and how to import it then?

Just replacing .../esm/ort.webgpu.min.js with .../ort.webgpu.min.mjs should work. If you are also using a service worker, use ort.webgpu.bundle.min.mjs instead of ort.webgpu.min.mjs.
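
Put together, a sketch of the CDN import described above, using the dev version mentioned earlier in the thread (the exact version is an example; the point is that the bundle variant contains no dynamic import() and can therefore be statically imported from a service worker):

// Regular page or dedicated worker:
import * as ort from "https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.mjs";

// Service worker: use the bundle variant instead, which contains no dynamic import():
// import * as ort from "https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.bundle.min.mjs";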

@fs-eire
Contributor

fs-eire commented Jul 28, 2024

I created #21534, which is a replacement of #21430:

  • Exposing instantiateWasm directly may not be a good idea, because it requires the user to understand the details of how WebAssembly works. I think this is unnecessary. To fulfill the requirement, allowing the user to set an ArrayBuffer with the .wasm file content should be good enough.
  • The import() is disallowed ... error is already fixed as of version 1.19.0-dev.20240621-69d522f4e9 (this only works for ESM; UMD will not work).
  • With this fix in ort-web, a few things are no longer needed:
    • hacks on package.json and the bundle
    • the top-level await is already taken care of by the build script of ort-web.

fs-eire added a commit that referenced this issue Jul 29, 2024
### Description

This PR adds a new option, `ort.env.wasm.wasmBinary`, which allows the user
to set a buffer containing the preloaded .wasm file content.

This PR should resolve the problem from the latest discussion in #20876.
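
A minimal sketch of what using this option could look like in an extension service worker; the .wasm file name and model path are placeholders (the WebGPU build uses the .jsep variant of the file), and a later comment in this thread uses the same pattern with a CDN URL:

import * as ort from "onnxruntime-web";

async function createSession() {
  // Fetch the .wasm file yourself (bundled with the extension here) and hand the bytes to ORT,
  // so no dynamic import or locateFile() lookup of the .wasm is needed.
  const wasmUrl = chrome.runtime.getURL("ort-wasm-simd-threaded.wasm"); // placeholder location/name
  const response = await fetch(wasmUrl);
  ort.env.wasm.wasmBinary = await response.arrayBuffer();
  ort.env.wasm.numThreads = 1;

  return ort.InferenceSession.create(chrome.runtime.getURL("model.onnx"), {
    executionProviders: ["wasm"],
  });
}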
@kyr0

kyr0 commented Aug 1, 2024

@fs-eire Sounds fair, and thank you for your work on this. Is there a new dev release that contains #21534 and that I could use to test, maybe? I'd surely only use ESM, so there shouldn't be an issue with that.

@fs-eire
Contributor

fs-eire commented Aug 6, 2024

1.19.0-dev.20240801-4b8f6dcbb6 includes the change.

@lucasgelfond

To clarify, is the best way to run transformers.js with WebGPU (with onnxruntime underneath) to monkey-patch the package so the necessary wasm pieces load in each service worker, a la @kyr0's easy-embeddings? (Having some issues with that workflow, see kyr0/easy-embeddings#1)

Has anyone had luck / have tips for just running the v3 branch of Transformers.js? Or, maybe more precisely — do we know how something like Segment Anything WebGPU, which Xenova has in an HF Space, is working? Seems like there's been some official solution here but I can't find it documented / implemented well.

@fs-eire
Contributor

fs-eire commented Aug 9, 2024

To clarify, is the best way to go about running transformers.js with WebGPU for the onnxruntime to monkeypatch the package to make the necessary wasm stuff load in each service worker, a la @kyr0's easy-embeddings? (Having some issues with that workflow, see kyr0/easy-embeddings#1)

Has anyone had luck / have tips for just running the v3 branch of Transformers.js? Or, maybe more precisely — do we know how something like Segment Anything WebGPU, which Xenova has in an HF Space, is working? Seems like there's been some official solution here but I can't find it documented / implemented well.

I am working with Transformers.js to make the v3 branch compatible with the latest module system. This is one of the merged changes: huggingface/transformers.js#864. You probably need to use some workaround for now, but (hopefully) eventually you should be able to use it out of the box.

@kyr0

kyr0 commented Aug 10, 2024

@lucasgelfond Now that the new updates from @fs-eire are in place, I'm probably able to streamline the workaround. I'll have a look soon, but as I'm on vacation right now, I cannot give an ETA, unfortunately.

@lucasgelfond

Thank you @fs-eire and @kyr0 ! No huge rush on my end, ended up getting inference working on WebGPU just on vanilla onnxruntime, will share results in a bit!

@lucasgelfond

lucasgelfond commented Aug 11, 2024

Has anyone tried getting these imports working in Vite/other bundlers? When I try the classic:

import * as ONNX_WEBGPU from 'onnxruntime/webgpu'

(which works in create-react-app), Vite says:

Error:   Failed to scan for dependencies from entries:
  /Users/focus/Projects/---/webgpu-sam2/frontend/src/routes/+page.svelte

  ✘ [ERROR] Missing "./webgpu/index.js" specifier in "onnxruntime-web" package [plugin vite:dep-scan]

Anyway, I tried importing from a URL, a la

import { InferenceSession, Tensor as ONNX_TENSOR } from 'https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.js';

which Vite also doesn't like

3:34:51 PM [vite] Error when evaluating SSR module /src/encoder.svelte: failed to import "https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.js"
|- Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in: file, data, and node are supported by the default ESM loader. 

I disabled SSR in Svelte but still seemingly no luck/change.

I tried manually downloading the files with curl, where I got an error about the missing source map, so I also downloaded .min.js.map. When I run it now, this works, but I get back to the original error in the thread about unavailable backends:

Error: no available backend found. ERR: [webgpu] TypeError: Failed to fetch dynamically imported module: http://localhost:5173/src/ort-wasm-simd-threaded.jsep.mjs

I figured it might work to just import directly, so I also tried:

import * as ONNX_WEBGPU from 'onnxruntime-web/dist/ort.webgpu.min.mjs';

but then I got
4:07:38 PM [vite] Internal server error: Missing "./dist/ort.webgpu.min.mjs" specifier in "onnxruntime-web" package

Anyone have ideas on how to handle this? Happy to add more verbose error output for any of the stuff above.

@fs-eire
Contributor

fs-eire commented Aug 12, 2024

Has anyone tried getting these imports working in Vite/other bundlers? When I try the classic:

import * as ONNX_WEBGPU from 'onnxruntime/webgpu

(which works in create-react-app), Vite says:

Error:   Failed to scan for dependencies from entries:
  /Users/focus/Projects/---/webgpu-sam2/frontend/src/routes/+page.svelte

  ✘ [ERROR] Missing "./webgpu/index.js" specifier in "onnxruntime-web" package [plugin vite:dep-scan]

Anyways, I tried importing from url, a la

import { InferenceSession, Tensor as ONNX_TENSOR } from 'https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.js';

which Vite also doesn't like

3:34:51 PM [vite] Error when evaluating SSR module /src/encoder.svelte: failed to import "https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.webgpu.min.js"
|- Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in: file, data, and node are supported by the default ESM loader. 

I disabled SSR in Svelte but still seemingly no luck/change.

I tried manually downloading the files with CURL, where I got an error about the lack of source map, so, I also downloaded .min.js.map. When I run it now, this works, but I get back to the original error in the thread about unavailable backends:

Error: no available backend found. ERR: [webgpu] TypeError: Failed to fetch dynamically imported module: http://localhost:5173/src/ort-wasm-simd-threaded.jsep.mjs

I figured it might work to just import directly, so I also tried:

import * as ONNX_WEBGPU from 'onnxruntime-web/dist/ort.webgpu.min.mjs';

but then I got 4:07:38 PM [vite] Internal server error: Missing "./dist/ort.webgpu.min.mjs" specifier in "onnxruntime-web" package

Anyone have ideas of how to handle? Happy to add more verbose error messages for any of the stuff above.

Could you share a repo where I can reproduce the issue? I will take a look.

@lucasgelfond

@fs-eire you are amazing! https://github.com/lucasgelfond/webgpu-sam2

I swapped over to Webpack (in the svelte-webpack directory) but the original Vite version is in there. No immediate rush because I solved temporarily with Webpack, but Webpack breaks some other imports so would be awesome to move back—thanks so much again!

@lucasgelfond

lucasgelfond commented Aug 13, 2024

(Non-working Vite stuff is now in previous commits because I shipped this!)

If you want to play with it, SAM-2 working totally in WebGPU! Here's a live demo link and here's the source, thanks all for the help!


@Eldow

Eldow commented Sep 9, 2024

👋 Thank you @fs-eire! I tried using 1.19.0-dev.20240801-4b8f6dcbb6 inside a Chrome MV3 extension and it worked right away with the webgpu backend; however, I'm more interested in using the wasm backend for running a simple decision forest, as it doesn't include JSEP and makes the overall bundle 10 MB lighter. I wondered if there are any plans to support it too in the near future?

@fs-eire
Contributor

fs-eire commented Sep 9, 2024

👋 Thank you @fs-eire ! I tried using 1.19.0-dev.20240801-4b8f6dcbb6 inside of a chrome mv3 extension and it worked right away with the webgpu backend, however I'm more interested in using the wasm backend for running a simple decision forest as it doesn't include jsep and makes the overall bundle 10MB lighter. I wondered if there were any plans to support it too in a near future ?

Doesn't it work if you just replace import "onnxruntime-web/webgpu" with import "onnxruntime-web"?

@Eldow

Eldow commented Sep 10, 2024

It works indeed 😵 I tried import "onnxruntime-web/wasm" at first, and that one still returned ERR: [wasm] TypeError: import() is disallowed on ServiceWorkerGlobalScope by the HTML specification.
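
For completeness, a sketch of the setup that ended up working in this exchange; model path, input name, and shape are placeholders, and the key points are importing the plain "onnxruntime-web" entry rather than "onnxruntime-web/wasm", and keeping numThreads at 1 inside a service worker:

import * as ort from "onnxruntime-web"; // not "onnxruntime-web/wasm" or "onnxruntime-web/webgpu"

async function run() {
  ort.env.wasm.numThreads = 1; // service workers cannot spawn the threading Worker

  const session = await ort.InferenceSession.create(
    chrome.runtime.getURL("decision_forest.onnx"), // placeholder model path
    { executionProviders: ["wasm"] }
  );

  const input = new ort.Tensor("float32", new Float32Array(4), [1, 4]); // placeholder input
  const results = await session.run({ input }); // "input" must match the model's input name
  console.log(Object.keys(results));
}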

@juliankolbe

juliankolbe commented Sep 11, 2024

So I'm still having this issue in 1.19.2.

This is in the context of a Chrome extension (MV3); webgpu works fine. However, cpu/wasm (as far as I understand they are aliases) does not. I've been through the code, and to me it looks like the dynamic import is always happening and can't be skipped.

This:

const [objectUrl, ortWasmFactory] = await importWasmModule(mjsPathOverride, wasmPrefixOverride, numThreads > 1);

Calling this:

return [needPreload ? url : undefined, await dynamicImportDefault<EmscriptenModuleFactory<OrtWasmModule>>(url)];

Seems to lead to this:

ERR: [cpu] TypeError: import() is disallowed on ServiceWorkerGlobalScope by the HTML specification

I realize the poster above me is running the same setup and has it working, but I'm really not sure what to do differently.

Using code from the test file, I've tried replicating it like so, but this doesn't seem to work:

import ort from "onnxruntime-web";

const binaryURL = ONNX_WASM_CDN_URL + "ort-wasm-simd-threaded.wasm";
const response = await fetch(binaryURL)
const binary = await response.arrayBuffer();
ort.env.wasm.wasmBinary = binary;
ort.env.wasm.numThreads = 1;

const session = await ort.InferenceSession.create(modelPath, {
  executionProviders: ["cpu"],
});

@fs-eire
Contributor

fs-eire commented Sep 11, 2024

So im still having this issue in 1.19.2.

This is in the context of a chrome extension, mv3, webgpu works fine. However cpu/wasm (as far as i understand they are aliases) does not. Ive been through the code and to me looks like the dynamic import is always happening and cant be skipped.

This:

const [objectUrl, ortWasmFactory] = await importWasmModule(mjsPathOverride, wasmPrefixOverride, numThreads > 1);

Calling this:

return [needPreload ? url : undefined, await dynamicImportDefault<EmscriptenModuleFactory<OrtWasmModule>>(url)];

Seems to lead to this:

ERR: [cpu] TypeError: import() is disallowed on ServiceWorkerGlobalScope by the HTML specification

I realize the poster above me is running the same setup and has it working, but Im really not sure what to do differently.

Using code from the test file, ive tried replicating it like so, but this doesnt seem to work:

import ort from "onnxruntime-web";

const binaryURL = ONNX_WASM_CDN_URL + "ort-wasm-simd-threaded.wasm";
const response = await fetch(binaryURL)
const binary = await response.arrayBuffer();
ort.env.wasm.wasmBinary = binary;
ort.env.wasm.numThreads = 1;

const session = await ort.InferenceSession.create(modelPath, {
  executionProviders: ["cpu"],
});

If you are using 1.19.2 and still run into this error, it is probably because your bundler imports onnxruntime-web as UMD. Please verify the following:

  • Check that the version is 1.19.2 in <your_project_root>/node_modules/onnxruntime-web/package.json.
  • Check whether your bundler actually loads the file <your_project_root>/node_modules/onnxruntime-web/dist/ort.bundle.min.mjs. You can back up this file, then put a few illegal characters inside and run your bundler to quickly check; if your bundler still builds, that means it is not picking up this file.

@juliankolbe

juliankolbe commented Sep 12, 2024

Thanks so much for your help! The bundler was indeed the issue. For anyone reading this: I was using Vite 4 and it was preferring the browser field in package.json, which led to the wrong file. Switching to Vite 5 solves that issue, as you can change the order of fields, even though it will by default already prefer the exports field.
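
A sketch of the kind of Vite adjustment described above, assuming the field order is what resolve.mainFields controls; this is an illustration rather than a config from this thread, and on Vite 5 the package "exports" map is honored by default, so an explicit override may not be needed at all:

// vite.config.js
import { defineConfig } from "vite";

export default defineConfig({
  resolve: {
    // Prefer the ESM "module" entry over the legacy "browser" field so the
    // resolver picks dist/ort.bundle.min.mjs instead of the UMD build.
    mainFields: ["module", "browser", "jsnext:main", "jsnext"],
  },
});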

I have another issue now, though: now that the correct file is being loaded, I am getting this error:

Error: no available backend found. ERR: [cpu] ReferenceError: Worker is not defined

Any ideas what that might be?

edit:
Is the possible reason that Web Workers are not available in the background script of a Chrome extension? (webgpu works here)
Maybe @Eldow can shed some light on this?

edit #2:
Issue solved, it seems:
content script, popup, background => webgpu
content script, popup => webgpu, cpu

Web Workers are not available in service workers, which is how the background script runs, hence cpu does not work there.

edit #3

Should multithreading be possible in a Chrome extension? I've got crossOriginIsolated blocking it. In a web app this is easy to solve with headers, but in a Chrome extension it is tricky.

So if anyone has successfully used cpu multithreading in a Chrome extension, no matter how, please let me know.

@fs-eire
Contributor

fs-eire commented Sep 13, 2024

In my understanding, Worker is unavailable in a service worker. I have no idea whether this is by design or not. However, if you don't use multi-threading (set ort.env.wasm.numThreads = 1;) and don't enable the proxy (do not set ort.env.wasm.proxy), onnxruntime-web should not use Worker.
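
Spelled out, that configuration might look like this inside an extension service worker (the model path is a placeholder; with numThreads set to 1 and the proxy left unset, onnxruntime-web never constructs a Worker):

import * as ort from "onnxruntime-web";

ort.env.wasm.numThreads = 1; // single-threaded: no threading Worker pool
// Leave ort.env.wasm.proxy unset so inference is not delegated to a proxy Worker.

async function createCpuSession() {
  return ort.InferenceSession.create(chrome.runtime.getURL("model.onnx"), {
    executionProviders: ["wasm"],
  });
}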

@Eldow

Eldow commented Jan 15, 2025

I have a follow-up issue that I struggle to understand when using onnxruntime-web in an MV3 extension service worker bundled with webpack.

The symptom is that the service worker crashes right away with this error:

Uncaught ReferenceError: document is not defined

...

/******/ 	/* webpack/runtime/jsonp chunk loading */
/******/ 	(() => {
/******/ 		__webpack_require__.b = document.baseURI || self.location.href;

Here is a small reproduction sandbox:
https://codesandbox.io/p/devbox/cocky-hamilton-5m398f

(Download the dist folder to try it out in a web browser where you can load unpacked extensions - I tested in Chrome & firefox)

There is a way to make it work using output.chunkLoading = false in the webpack configuration, but this is not ideal.
Could it be related to loading wasm with webpack: https://stackoverflow.com/a/71673305 ?

I am not sure if this requires a dedicated issue or if it's okay to follow up in this one; let me know @fs-eire.

@fs-eire
Contributor

fs-eire commented Jan 15, 2025

I have a follow-up issue that I struggle to understand: when using onnxruntime-web in a mv3 extension service-worker bundled with webpack.

The symptoms are that the service worker will crash right away with this error:

Uncaught ReferenceError: document is not defined

...

/******/ 	/* webpack/runtime/jsonp chunk loading */
/******/ 	(() => {
/******/ 		__webpack_require__.b = document.baseURI || self.location.href;

Here is a small reproduction sandbox: https://codesandbox.io/p/devbox/cocky-hamilton-5m398f

(Download the dist folder to try it out in a web browser where you can load unpacked extensions - I tested in Chrome & firefox)

There is a way to make it work using output.chunkLoading = false in the webpack configuration but this is not ideal. Could it be related to loading wasm with webpack: https://stackoverflow.com/a/71673305 ?

I am not sure if this requires a dedicated issue or if it's okay to follow up in this one, let me know @fs-eire

It looks like you are using JSONP for chunk loading in webpack. I don't think this will work in a service worker.

I think the solution is to add target: 'webworker', like this:

 module.exports = {
   mode: "development",
   devtool: false,
+  target: 'webworker',
   entry: {

Then I can see another error, which is expected (because the model is an invalid dummy file):

[Background] Error loading ONNX model or running inference: Can't create a session. ERROR_CODE: 7, ERROR_MESSAGE: Failed to load model because protobuf parsing failed
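
For context, a minimal webpack configuration along the lines suggested above (entry and output names are placeholders; target: 'webworker' is the relevant part, since it stops webpack from emitting the document-based JSONP chunk-loading runtime):

// webpack.config.js (sketch)
const path = require("path");

module.exports = {
  mode: "production",
  target: "webworker", // service workers have no `document`, so avoid the JSONP runtime
  entry: {
    background: "./src/background.js", // placeholder entry
  },
  output: {
    path: path.resolve(__dirname, "dist"),
    filename: "[name].js",
  },
};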

@Eldow

Eldow commented Jan 16, 2025

Thanks for looking into it !
I think this is what I will end up doing but I am wondering if webpack could have used the correct loader if emscripten generated a slightly different code to locate the wasm file without the case that relies on URL.

That could maybe be solved by generating a worker compatible only JS file through emscripten using the flag ENVIRONMENT="worker" ? (I am building from source using the doc so I have control over the args passed to emscripten at that stage)

I will give it a try and let you know !

edit: I tried adding --emscripten_settings ENVIRONMENT="worker" to the build wasm command and faced new issues in the esbuild compilation afterwards regarding Unexpected number of matches for minified "PThread.pthreads[thread].ref()" - I'm stopping trying to do it this way in favor of the webpack config way.

@qdrk

qdrk commented Jan 19, 2025

This is somehow broken again in the latest dev releases: for example, it fails in 1.21.0-dev.20250117-db8e10b0b9, whereas just changing back to 1.21.0-dev.20241205-d27fecd3d3 works.

@xcaptain

Same problem here. How should I solve the "no available backend" error in a Chrome extension background script?

@qdrk

qdrk commented Jan 24, 2025

More discussion here: #20991 (comment)

@fs-eire
Contributor

fs-eire commented Jan 29, 2025

Please refer to #20991 (comment), which contains the latest solution and explanation for using ORT Web in a service worker.

@fs-eire
Contributor

fs-eire commented Jan 29, 2025

Thanks for looking into it ! I think this is what I will end up doing but I am wondering if webpack could have used the correct loader if emscripten generated a slightly different code to locate the wasm file without the case that relies on URL.

That could maybe be solved by generating a worker compatible only JS file through emscripten using the flag ENVIRONMENT="worker" ? (I am building from source using the doc so I have control over the args passed to emscripten at that stage)

I will give it a try and let you know !

edit: I tried adding --emscripten_settings ENVIRONMENT="worker" to the build wasm command and faced new issues in the esbuild compilation afterwards regarding Unexpected number of matches for minified "PThread.pthreads[thread].ref()" - I'm stopping trying to do it this way in favor of the webpack config way.

I tried to use Emscripten to build multiple targets (emscripten-core/emscripten#21899), but it turned out to be difficult. Currently the onnxruntime-web build process has a few post-processing steps to make the outputs work for Web, WebWorker, and Node.

It may also be a problem of how to tell the runtime or bundler to import the correct file, given that the predefined "conditions" in the package.json "exports" property do not contain anything that distinguishes between web and webworker.

@Eldow

Eldow commented Jan 30, 2025

I think you're right: since Emscripten outputs a web-compatible file and webpack by default uses the web target, it ends up trying to use the document/JSONP-style loader. I'm not sure if there is a way to differentiate them so webpack can pick the right import strategy automatically without a target: 'webworker' hint, but maybe different exports for web and worker, backed by different Emscripten pipelines, could do the job. I'm also not entirely sure onnxruntime-web would work in the background script of an MV2 extension (service workers are MV3-only), but most browsers are compliant with MV3.

@juliankolbe

juliankolbe commented Mar 1, 2025

@fs-eire Actually I have a different question: I managed to get onnxruntime-web/webgpu with webgpu running very well. The only issue I have is that it actually uses a mix of GPU and CPU, using all CPU cores and not fully utilising the GPU.

The issue I'm having is that when I run a larger inference, it blocks all other system tasks, the browser UI, etc., as it is running at max performance on all cores.

My goal: have 1 or 2 reserved cores, so basically supplying numThreads to the webgpu provider such that it doesn't use all threads.
I tried to use the setting intraOpNumThreads, but it throws an error:

onnxruntime-web_webgpu.js?v=a67bba03:1636 Uncaught (in promise) BindingError: _emval_take_value has unknown type 0xcf9a7
    at kn (onnxruntime-web_webgpu.js?v=a67bba03:1636:60)
    at ep (onnxruntime-web_webgpu.js?v=a67bba03:1766:33)
    at ort-wasm-simd-threaded.jsep.wasm:0x7fc2be
    at I.<computed> (onnxruntime-web_webgpu.js?v=a67bba03:2023:22)
    at ar (onnxruntime-web_webgpu.js?v=a67bba03:1270:39)
    at Yi (onnxruntime-web_webgpu.js?v=a67bba03:2052:129)
    at s (onnxruntime-web_webgpu.js?v=a67bba03:2055:13)
    at zo (onnxruntime-web_webgpu.js?v=a67bba03:920:20)
    at onnxruntime-web_webgpu.js?v=a67bba03:2003:17

Having that work would allow me to let the browser UI and the rest of the processes breathe while the longer inference runs, which can take around a minute.

Edit: I am creating the session like this to get the error:

const session = await ort.InferenceSession.create(modelPath, {
    executionProviders: ["webgpu"],
    intraOpNumThreads: 14,
})

@fs-eire
Contributor

fs-eire commented Mar 1, 2025

@fs-eire Actually have a different question, so I managed to get onnxruntime-web/webgpu with wegpu running very well. The only issue I have is that it actually uses a mix of GPU and CPU, using all CPU cores and not fully utilising the GPU.

The issue im having is that when i run a larger inference, it will block all other system tasks and browser ui etc. As its running max performance on all cores.

My goal: Have 1 or 2 reservered cores, so basically supplying numThreads to the webgpu provider, such that it doesnt use all threads. I tried to use this setting intraOpNumThreads but it throws an error:

onnxruntime-web_webgpu.js?v=a67bba03:1636 Uncaught (in promise) BindingError: _emval_take_value has unknown type 0xcf9a7
    at kn (onnxruntime-web_webgpu.js?v=a67bba03:1636:60)
    at ep (onnxruntime-web_webgpu.js?v=a67bba03:1766:33)
    at ort-wasm-simd-threaded.jsep.wasm:0x7fc2be
    at I.<computed> (onnxruntime-web_webgpu.js?v=a67bba03:2023:22)
    at ar (onnxruntime-web_webgpu.js?v=a67bba03:1270:39)
    at Yi (onnxruntime-web_webgpu.js?v=a67bba03:2052:129)
    at s (onnxruntime-web_webgpu.js?v=a67bba03:2055:13)
    at zo (onnxruntime-web_webgpu.js?v=a67bba03:920:20)
    at onnxruntime-web_webgpu.js?v=a67bba03:2003:17

Having that work would allow me to let the browser ui/rest of processes breathe while the longer inference which can take like a min.

Edit: I am creating the session like this to get the error:

const session = await ort.InferenceSession.create(modelPath, {
    executionProviders: ["webgpu"],
    intraOpNumThreads: 14,
})

The whole execution process is driven by the CPU, and the entry point of the inference execution is the wasm exported function _OrtRun. This means that for heavy models onnxruntime-web will occupy a noticeable amount of CPU time and can cause UI freezes.

If you are working on a product that has a UI responsiveness requirement, the solution is to use onnxruntime-web in a Web Worker and write communication logic between the worker and the main thread.

Regarding the number-of-threads setting: for onnxruntime-web, intraOpNumThreads is not available (you can see the documentation saying that it is only for onnxruntime-node and onnxruntime-react-native). You should use ort.env.wasm.numThreads to set the number of threads instead. Also, this setting may be helpful for the CPU (wasm) EP, but if you are using WebGPU, multi-threading will usually not give you any performance benefit, because we expect all heavy operators to be calculated on the GPU, and the CPU is only used for some very simple work like calculating tensor shapes. And for the reasons explained above, this setting does not help to improve UI freezes. You need a Web Worker for that.
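
A minimal sketch of the Web Worker pattern described above (file names, the model path, input name, and shape are placeholders; the point is that InferenceSession.create and session.run execute inside the worker, so _OrtRun never blocks the UI thread):

// main.js - UI thread: only message passing, no onnxruntime-web here
const worker = new Worker(new URL("./ort-worker.js", import.meta.url), { type: "module" });

worker.onmessage = (event) => {
  if (event.data.status === "result") {
    console.log("inference finished, output keys:", event.data.outputKeys);
  }
};

worker.postMessage({ status: "run", input: Array.from(new Float32Array(4)) }); // placeholder input

// ort-worker.js - all onnxruntime-web work happens on this worker thread
import * as ort from "onnxruntime-web/webgpu";

const sessionPromise = ort.InferenceSession.create("model.onnx", { // placeholder model path
  executionProviders: ["webgpu"],
});

self.onmessage = async (event) => {
  if (event.data.status !== "run") return;
  const session = await sessionPromise;
  const input = new ort.Tensor("float32", Float32Array.from(event.data.input), [1, 4]); // placeholder shape
  const results = await session.run({ input }); // "input" must match the model's input name
  self.postMessage({ status: "result", outputKeys: Object.keys(results) });
};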
