Skip to content

Standardize configuration inheritance. #172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

demiankatz
Copy link
Contributor

This is a first stab at #171. I'm out of time for today and would like to test more before this gets merged, but I thought I'd open a PR now so you can take a glance and comment if you think I'm on the wrong (or right) track between now and the next time I have a chance to work on this.

@edsilv
Copy link
Member

edsilv commented Oct 7, 2015

Yep, looking good so far!

@demiankatz demiankatz force-pushed the improved-config-extends branch from 9f8f665 to cc20964 Compare October 8, 2015 14:07
@demiankatz
Copy link
Contributor Author

@edsilv, I think this is done. I've adjusted configuration inheritance so you can chain together an infinite number of files (previously it was limited to one level of inheritance). Each file will search its "extends" path relative to its own file location. I've also refactored all of the common strings out of the extensions into the shared base. I suspect there may be additional strings from the Seadragon extension that ought to move up, but for now I limited myself to refactoring this so that the output remains unchanged from before. It's also worth reviewing what's left behind in the media and PDF extensions, since there seem to be inconsistencies between the CY and EN files which may indicate obsolete stuff that can simply be removed entirely.

Let me know what you think, and whether I can be of any further assistance. If you're happy with this, feel free to merge to master, and then I'll update #168 to match.

@edsilv
Copy link
Member

edsilv commented Oct 8, 2015

Nice! One thing that occurs to me is that there is now no single l10n file for each locale to upload to transifex for translation... I'm actually starting to question the value of transifex. It's kinda difficult to work with, and currently only supports the openseadragon extension.

Maybe there's a better solution. Perhaps we could even supply the translation interface ourselves in the form of the config editor?

@edsilv
Copy link
Member

edsilv commented Oct 8, 2015

There's a problem with the config editor. It's only editing the config/content for the current extension. As we've discussed, data-config needs to be able to support different locales, probably by listing their file paths.

Maybe each localised config needs to contain everything for all extensions. These configs are split between src/l10n and src/config, and we do away with src/extensions/[extension]/l10n and src/extensions/[extension]/config.

An example src/l10n/en-GB.json might be:

{
  "localisation": {
    "label": "English (GB)"
  },
  "content": {
    "canvasIndexOutOfRange": "Canvas index out of range.",
    "degradedResourceMessage": "Please log in to view at full quality.",
    "degradedResourceLogin": "log in"
  },
  "extensions": {
    "uv-seadragon-extension": {
      "modules": {
         "seadragonCenterPanel": {
           "content": {
             "acknowledgements": "Acknowledgements",
             "goHome": "Go Home",
             "imageUnavailable": "Image Unavailable",
             "next": "Next",
             "previous": "Previous",
             "rotateRight": "Rotate Right",
             "zoomIn": "Zoom In",
             "zoomOut": "Zoom Out"
          }
        } etc...
      }
    }
  }

The same pattern would be employed for the src/config/en-GB.json except with options instead of content nodes.

These are then merged by the config editor. We add two checkboxes above the editor, "content" and "config". Checking "content" would allow you to edit the content only. The json can be copied and used to overwrite the src/l10n/[locale].json file. Alternatively it can be saved into myConfig.json and used to override the defaults using data-config.

The file loaded by data-config could be a list of links to each locale:


{
  "en-GB": "myConfig.en-GB.json",
  "cy-GB": "myConfig.cy-GB.json"
}

?

@demiankatz
Copy link
Contributor Author

I'm not familiar with Transifex, so I can't comment on its specific limitations. Are you familiar with TranslateWiki (https://translatewiki.net/)? It's been on my radar for some time to investigate using it for VuFind, but I haven't gotten around to it yet. Looks promising, though. Perhaps it would also be suitable for UV's needs.

@edsilv
Copy link
Member

edsilv commented Oct 8, 2015

I'll take a look. I think what I'm suggesting above could obviate third party translation services?

@edsilv
Copy link
Member

edsilv commented Oct 8, 2015

We can use the "extends" keyword to inherit things like licenses from the en-GB l10n file. I think this approach could solve a number of issues. It's straight-forward to filter what's included in the config editor using jmespath.

@demiankatz
Copy link
Contributor Author

I think one of the proposed benefits of TranslateWiki is that you may get translations into additional languages courtesy of the active volunteer population, which could be a benefit -- but it only works well if your language files match up with the sites conventions (and being unable to devote much time to cleaning up some messy things in VuFind is the main reason I haven't gone forward on that side).

If you're not interested in soliciting a broader base of translation volunteers, though, it certainly could make sense to have your translation staff/volunteers simply use the existing config editor.

The main challenge, which relates both to this and to your second comment, is that you have to figure out a way to easily map translations back into the source files. That's hard to do when there are multiple source files... but collapsing everything into a single file adds its own challenges, in terms of loss of modularity and potentially limitless growth. It also makes the possibilities for inheritance less flexible -- I'm gathering that each module's "common" section would get merged with the top-level "common" section... and maybe that's good enough... but it feels more restrictive than having the ability to chain files together more or less arbitrarily.

It seems to me that any approach is going to have some trade-offs. To some extent, it's also a question of whether to maintain this as a single file or simply to compile more things at build time -- perhaps it still makes sense to maintain files on an extension-by-extension basis, but to compile everything into a single-file format that's more compatible with the config editor. Of course, that doesn't address the problem of figuring out where to put newly-received translations in the source files. I suppose at this point I don't know enough about workflows to comment particularly intelligently!

@edsilv
Copy link
Member

edsilv commented Oct 8, 2015

I think if the config editor does a decent enough baseline translation job, it might be simpler to just go that route. My gut feeling is that UV translation will only ever be a slow trickle from implementing institutions, rather than something requiring crowdsourcing. If this turns out to be wrong, it should still be possible to add third party translation support.

One thing we'd lose from transifex is the "% translated" indicator, which is quite handy...

Here's one idea. We create a new "monolithic" config format that combines all content and options for all extensions, plus a shared section. These monolithic files would be named src/config/en-GB.json etc...

This format acts as a temporary file to be pasted into src/config. It also allows data-config to know about other extensions.

data-config can also optionally be a json array of references to locale-specific configs.

The locale is included in the monolithic file, so if it's referenced directly from data-config, it knows when to apply it. It's necessary to have the option of locale-specific data-configs, because you may want to enable/disable certain features per-locale. For example, the language toggle button used on the welsh locale.

After copying and pasting this monolithic file from the config editor, you run a build. During build, it's merged with the extension-specific l10n and config files (perhaps we should rename these to "content" and "options" to match the monolithic format node names? "config" being a combination of content and options). Everything but the shared "content" and "options" sections is removed. The extension-specific files are still able to inherit from each other, and when inheriting from the monolithic shared config file, only get the corresponding "content" or "options" nodes.

When launching the config editor, they are merged back into the monolithic config format again for the given locale. The editor has "content" and "options" checkboxes to enable/disable those sections and keep the file size down for data-config use. It also has checkboxes to enable/disable each extension section and the shared section.

The problem now is that this generated monolithic file will grow large as extensions are added. I think we can get away with this for the editing experience - it's acceptable for a tool like that to need to load heavy files. Consider Adobe Photoshop or Premier.

@demiankatz
Copy link
Contributor Author

Is there a way to give the config editor extra smarts so it can detect which strings are missing from one translation but present in another?

I imagine that % translated would actually be fairly easy to calculate at build time -- if we want to keep track of that statistic, we could probably do something in the Travis job to post the current value somewhere.

Your proposed solution sounds reasonable to me, though it would definitely be good to solicit some additional opinions given that I'm not yet so familiar with the project that I'm likely to think of every edge case!

A few questions/comments:

  1. My understanding of data-config was that it was for overrides that would be merged on top of the default UV settings. This way, you could filter it down to contain just the important things you want to change, and it wouldn't be difficult to maintain it over time as new features were added to UV. Perhaps I'm misunderstanding, but your description of running a build to process the output of the config editor sounds like a different approach to configuration customization -- and one that's probably less accessible to many users outside of Typescript/Node space. But maybe you're just describing a process for creating translations, and I'm reading it out of context.
  2. Related to the above: you are correct that it would be nice to disable unwanted extensions in order to keep the configuration file light in cases where parts are not needed... but this seems in direct opposition to the "data-config as delta" idea I described above. Maybe there need to be two configuration modes -- one for merging, one for overriding. But it's also possible that we're overthinking this. Even with many more extensions, I can't imagine that the configuration file will become so overwhelming that it will swamp modern browsers -- config loading is certainly not the most expensive thing going on in UV. So maybe this is a non-issue. I'd be inclined to work on the most user-friendly approach for now, and worry about performance optimization when we encounter a problem.
  3. I think it's sensible to try to align the directory names with the entries in the JSON files. I think "options" is probably a better directory name than "config" as it is more specific. However, I think I prefer "l10n" over "content" given that "content" could mean practically anything, but "l10n" carries more meaning.

@edsilv
Copy link
Member

edsilv commented Oct 12, 2015

  1. Yep, I'm imagining that the data-config remains as a "delta" merged with the base config at runtime. I'm just interested in making it work for all extensions and per locale by introducing a new "monolithic" format.
  2. I'm thinking that you'd copy the json from the config editor in two scenarios. First, having all extensions and all options/content enabled. This is for copying/pasting into the src/config/[locale].json. Second, for data-config. This is the "config as delta" where you only want to override certain bits whilst keeping the file size as minimal as possible.
  3. Agreed.

I'm thinking that it would be desirable to be able to perform some kind of type checking on the configs based on interface inheritance. Config currently feels like a bit of an unsafe loose end in that regard - pretty much everything else is statically typed.

That way we could ensure that each config has no missing fields and can therefore create a % translated metric around which fields are empty or somehow marked as inherited.

Runtime TS type checking doesn't appear to be one of their design goals: microsoft/TypeScript#1573

Not sure how to approach this yet. We could potentially maintain json schemas to validate against. Currently these are being generated on build for the config editor to work with. JSON schema as a format doesn't really lend itself to human read/writability though, and would be a maintenance headache. Would be ideal if we could somehow use typescript interfaces.

I scribbled a diagram to help explain these ideas to myself :-)

https://drive.google.com/file/d/0B64TWB9qdiOMNzVHQkNnRk9tV1E/view

@demiankatz
Copy link
Contributor Author

Okay, I think we're pretty close to being on the same page (though honestly I ought to spend more time absorbing this -- right now still catching up after two days of VuFind Summit).

How would you like to proceed? Is it worth merging this pull request as an incremental step (and so that we can also take care of #168) and then do more refactoring subsequently, or do you want to try to get more of this done here in the PR before proceeding? I suppose it depends on whether you envision the type of inheritance implemented here still being part of the process (i.e. assemble each chunk using inheritance, and then assemble into monolith) or whether you have a different mechanism in mind now that you are thinking more holistically about configuration.

@edsilv
Copy link
Member

edsilv commented Oct 15, 2015

I think this PR can be merged as an incremental step. Extension-specific configs can still inherit from each other as needed. E.g. the welsh config may only change one element of the English. The difference will be that when inheriting from the src/config/[locale].json, it will remove everything but the shared section during build and merge it into the specific extension's l10n and options files. The src/config/[locale].json acts as a convenient temporary place to copy/paste all config pre-build, and the source of all shared config post-build.

Just thinking, with the Welsh example, I think the new process would lead to the inherited config being copied in during build. So you'd open the config editor for the cy-GB locale, the monolithic format would be constructed using the inherited shared and en-GB config (everything extends shared config by default - no "extends" keyword necessary). You'd then edit it and copy and paste the changed config into src/config/cy-GB.json, run a build, everything but the shared config is moved to src/extensions/[extension]/l10n/cy-GB.json and src/extensions/[extension]/options/cy-GB.json. We'd need to ensure that the en-GB inherited bits are removed somehow.

@demiankatz
Copy link
Contributor Author

How would we control the specialized merging behavior? Could it be as simple as checking to see if the parent object has a shared section, and if so, behave differently?

@edsilv
Copy link
Member

edsilv commented Oct 16, 2015

http://jsbin.com/zojune/edit?js,console

In that jsbin, we'd want the merged config to exclude the "goHome" node. I think there must be a way to do it with _.merge using the customizer callback...

Also, as per the community call discussion, we need a way to set a data-config global override for all locales. This is to avoid a situation where we'd need to create duplicate overrides for every locale e.g:

{
    "shared": "myConfig.shared.json",
    "en-GB": "myConfig.en-GB.json",
    "cy-GB": "myConfig.cy-GB.json"
}

When bootstrapping, the UV checks for a data-config:

https://github.com/UniversalViewer/universalviewer/blob/master/src/Bootstrapper.ts#L194

We need to make this locale-aware so that it only applies config for the current locale. We also need to check for the existence of a "shared" entry, which would be merged on top of the locale-specific config.

The data-config contents as discussed can either be the monolithic file format with a "locale" node, or a json dictionary with paths to configs. In the case of receiving a monolithic file, shared is not an option. It's only looked for when using the dictionary. We could probably just check for the presence of a "locale" node to determine whether it's a dictionary or not.

@demiankatz
Copy link
Contributor Author

In the jsbin, why do we not want to allow the extended config to define strings that are not found in the base config? I agree that this could probably be prevented with the appropriate callback function, but I'm not sure I understand the reasoning. What is the harm of adding additional local strings?

I agree that looking for the "locale" node should be a sufficient means of differentiating a single config from a dictionary. I can't imagine there's ever going to be a locale named "locale," so I don't see a likely chance of collision. The alternative, though, would be to give the dictionary a distinctive top-level node like "config-dictionary" and then look for that. The advantage to this approach would be improved backward compatibility, since you could assume that a config file with no 'locale' node gets the default locale of 'shared', which would be closer to the current behavior.

@edsilv
Copy link
Member

edsilv commented Oct 16, 2015

I'm thinking that we want to avoid a situation where we define minimal extension configs using inheritance, then through the process of "monolithification" where everything is flattened into a single file then chunked up again, we fill these files up with their inherited elements, rather than keeping them minimal. When "chunking" we only want to overwrite what's already in these files, not bring along everything they've inherited from elsewhere.

Yep, agree regarding "config-dictionary". Maybe just "configs"?

{
    "configs": {
        "shared": ...
    }
}

@demiankatz
Copy link
Contributor Author

Okay, the chunking example definitely makes sense. I think this would have to be done recursively. This seems to do the trick (though it doesn't use _.merge at all):

function customizer(a, b) {
  var out = b;
  if (_.isObject(a)) {
    out = {};
    for (var i in a) {
      if (typeof b[i] !== 'undefined') {
        out[i] = customizer(a[i], b[i]);
      } else {
        out[i] = a[i];
      }
    }
  }
  return out;
}
var merged = customizer(config, extConfig);

I agree that "configs" would be fine in the dictionary.

@edsilv
Copy link
Member

edsilv commented Jan 12, 2016

Here's the "monolithification" diagram for reference: https://ipfs.pics/QmZWAB89yZutvSj6utMzdQRUfBmmtMwFHXzX2ZSGVQ4RBY

@edsilv
Copy link
Member

edsilv commented Apr 2, 2020

monolith

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants