Create/parse arcp (Archive and Package) URIs.
arcp provides functions for creating arcp URIs, which can be used for identifying or parsing hypermedia files packaged in an archive or package, like a ZIP file.
arcp URIs can be used to consume or reference hypermedia resources bundled inside a file archive or an application package, as well as to resolve URIs for archive resources within a programmatic framework.
This URI scheme provides mechanisms to generate a unique base URI to represent the root of the archive, so that relative URI references in a bundled resource can be resolved within the archive without having to extract the archive content on the local file system.
An arcp URI can be used for purposes of isolation (e.g. when consuming multiple archives), security constraints (avoiding “climb out” from the archive), or for externally identifying sub-resources referenced by hypermedia formats.
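The relative resolution and "climb out" protection described above can be sketched with the standard library alone. This is only an illustration and not this library's implementation: it assumes the arcp scheme is registered by hand in the urllib.parse lists uses_relative and uses_netloc, and the file names pics/flower.jpeg and etc/passwd are made up.

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

# Assumption: register the scheme by hand so that urljoin() treats arcp
# as a hierarchical scheme; references against unknown schemes are
# otherwise returned unresolved.
for registry in (uses_relative, uses_netloc):
    if "arcp" not in registry:
        registry.append("arcp")

# Base URI representing the root of one archive (UUID taken from the examples).
base = "arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/"
doc = urljoin(base, "doc.html")

# A relative link found inside doc.html resolves within the same archive.
image = urljoin(doc, "pics/flower.jpeg")

# Excess ".." segments are dropped, so the result stays below the archive root.
escaped = urljoin(doc, "../../etc/passwd")

print(doc)
print(image)
print(escaped)
```

Because the authority part identifies the archive, every resolved reference keeps the same uuid authority, and nothing has to be extracted to the local file system to follow the links.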
Examples:
- arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html
- arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/
- arcp://ni,sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/
- arcp://name,gallery.example.org/
The different forms of URI authority in arcp URIs can be used depending on which uniqueness constraints to apply when addressing an archive. See the arcp specification (draft-soilandreyes-arcp) for details.
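To make the uniqueness constraints concrete, here is a minimal standard-library sketch of how each authority form could be minted. The sketch_* helper names are hypothetical and are not part of this library. The use of a version 5 UUID in the URL namespace for location-based URIs is an assumption, so consult the specification before relying on it.

```python
import base64
import hashlib
import uuid


def sketch_uuid_random(path="/"):
    """Hypothetical helper: a fresh random (version 4) UUID per call."""
    return "arcp://uuid,%s%s" % (uuid.uuid4(), path)


def sketch_uuid_location(location, path="/"):
    """Hypothetical helper: the same location always gives the same UUID.

    Assumption: a name-based version 5 UUID in the URL namespace.
    """
    return "arcp://uuid,%s%s" % (uuid.uuid5(uuid.NAMESPACE_URL, location), path)


def sketch_hash(data, path="/"):
    """Hypothetical helper: the same bytes always give the same authority."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the trailing "=" padding removed.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "arcp://ni,sha-256;%s%s" % (encoded, path)


def sketch_name(name, path="/"):
    """Hypothetical helper: authority taken from an application or package name."""
    return "arcp://name,%s%s" % (name, path)


print(sketch_uuid_random())
print(sketch_uuid_location("http://example.com/data.zip", "/pics/"))
print(sketch_hash(b"Hello World!", "/folder/"))
print(sketch_name("gallery.example.org"))
```

The random form is unique for every call, the location and hash forms are reproducible from their input, and the name form is only as unique as the chosen name.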
Note that this library only provides mechanisms to generate and parse arcp URIs, and does not integrate with any particular archive or URL handling modules like zipfile or urllib.request.
© 2018-2020 Stian Soiland-Reyes <https://orcid.org/0000-0001-9842-9718>, The University of Manchester, UK
Licensed under the Apache License, version 2.0 <https://www.apache.org/licenses/LICENSE-2.0>, see the file LICENSE.txt for details.
Source code: <https://github.com/stain/arcp-py>
Feel free to raise a pull request at <https://github.com/stain/arcp-py/pulls> or an issue at <https://github.com/stain/arcp-py/issues>.
Submitted contributions are assumed to be covered by section 5 of the Apache License 2.0.
You will need Python 2.7, Python 3.4 or later (Recommended: 3.6).
If you have pip, the easiest option is normally to install from <https://pypi.org/project/arcp/> using:
pip install arcp
If you want to install manually from this code base, then try:
python setup.py install
For full documentation, see <https://arcp.readthedocs.io/> or use help(arcp)
This module provides functions for creating arcp URIs, which can be used for identifying or parsing hypermedia files packaged in an archive or package, like a ZIP file:
>>> from arcp import *
>>> arcp_random()
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/'
>>> arcp_random("/foaf.ttl", fragment="me")
'arcp://uuid,dcd6b1e8-b3a2-43c9-930b-0119cf0dc538/foaf.ttl#me'
>>> arcp_hash(b"Hello World!", "/folder/")
'arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/'
>>> arcp_location("http://example.com/data.zip", "/file.txt")
'arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt'
arcp URIs can be used with urllib.parse, for instance using urljoin to resolve relative references:
>>> import urllib.parse
>>> import arcp
>>> css = arcp.arcp_name("app.example.com", "css/style.css")
>>> urllib.parse.urljoin(css, "../fonts/foo.woff")
'arcp://name,app.example.com/fonts/foo.woff'
In addition, this module provides functions that can be used to parse arcp URIs into their constituent fields:
>>> is_arcp_uri("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
True
>>> is_arcp_uri("http://example.com/t")
False
>>> u = parse_arcp("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
>>> u
ARCPSplitResult(scheme='arcp',prefix='uuid',name='b7749d0b-0e47-5fc4-999d-f154abe68065',
  uuid='b7749d0b-0e47-5fc4-999d-f154abe68065',path='/file.txt',query='',fragment='')
>>> u.path
'/file.txt'
>>> u.prefix
'uuid'
>>> u.uuid
UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
>>> u.uuid.version
5
>>> parse_arcp("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/").hash
('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069')
The object returned from parse_arcp is similar to ParseResult from urlparse, but contains additional properties prefix, uuid, ni, hash and name, some of which will be None depending on the arcp prefix.
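To illustrate why some of these properties are None, the following standard-library sketch derives the same kind of fields from the authority part of the URI. sketch_parse is a hypothetical helper that returns a plain dict. It is not the library's parse_arcp and makes no claim about how the library is implemented.

```python
import base64
import uuid
from urllib.parse import urlsplit


def sketch_parse(uri):
    """Hypothetical helper: split an arcp URI into prefix-dependent fields."""
    parts = urlsplit(uri)
    if parts.scheme != "arcp":
        raise ValueError("not an arcp URI: %r" % uri)
    # The authority has the form "<prefix>,<name>".
    prefix, _, name = parts.netloc.partition(",")
    fields = {"prefix": prefix, "name": name, "path": parts.path,
              "uuid": None, "ni": None, "hash": None}
    if prefix == "uuid":
        fields["uuid"] = uuid.UUID(name)
    elif prefix == "ni":
        fields["ni"] = name
        algorithm, _, encoded = name.partition(";")
        # Restore the base64 padding that is stripped in the URI form.
        padded = encoded + "=" * (-len(encoded) % 4)
        fields["hash"] = (algorithm, base64.urlsafe_b64decode(padded).hex())
    return fields


by_uuid = sketch_parse("arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/file.txt")
by_hash = sketch_parse(
    "arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/")
print(by_uuid["uuid"].version, by_uuid["hash"])
print(by_hash["uuid"], by_hash["hash"])
```

Only the fields that match the prefix are filled in: a uuid authority carries no hash, and an ni authority carries no UUID.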
The function arcp.parse.urlparse can be imported as an alternative to urllib.parse.urlparse. If the scheme is arcp, then the extra arcp fields like prefix, uuid, hash and name are available as from parse_arcp; otherwise the output is the same as from regular urlparse:
>>> from arcp.parse import urlparse
>>> urlparse("arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/folder/soup;sads")
ARCPParseResult(scheme='arcp',prefix='ni',
  name='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
  ni='sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk',
  hash=('sha-256', '7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069'),
  path='/folder/soup;sads',query='',fragment='')
>>> urlparse("http://example.com/help?q=a")
ParseResult(scheme='http', netloc='example.com', path='/help', params='', query='q=a', fragment='')