Commit d1481d6

Compiler implementation for FluentBundle
1 parent 20f3e25 commit d1481d6

31 files changed

+4142
-95
lines changed

.gitignore

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,11 @@
11
.tox
22
*.pyc
33
.eggs/
4+
*.pot
5+
*.mo
6+
*.po
7+
.pytest_cache
48
*.egg-info/
59
_build
10+
.benchmarks
11+
.hypothesis

fluent.runtime/CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ fluent.runtime development version (unreleased)
88
terms.
99
* Refined error handling regarding function calls to be more tolerant of errors
1010
in FTL files, while silencing developer errors less.
11+
* Added ``CompilingFluentBundle`` implementation.
1112

1213
fluent.runtime 0.1 (January 21, 2019)
1314
-------------------------------------
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
FluentBundle Implementations
2+
============================
3+
4+
python-fluent comes with two implementations of ``FluentBundle``. The default is
5+
``fluent.runtime.InterpretingFluentBundle``, which is what you get under the
6+
alias ``fluent.runtime.FluentBundle``. It implements an interpreter for the FTL
7+
Abstract Syntax Tree.
8+
9+
The alternative is ``fluent.runtime.CompilingFluentBundle``, which works by
10+
compiling a set of FTL messages to a set of Python functions using Python `ast
11+
<https://docs.python.org/3/library/ast.html>`_. This results in very good
12+
performance (see below for more info).
13+
14+
While the two implementations have the same API, and return the same values
15+
under most situations, there are some differences, as follows:
16+
17+
* ``InterpretingFluentBundle`` has some protection against malicious FTL input
18+
which could attempt things like a `billion laughs attack
19+
<https://en.wikipedia.org/wiki/Billion_laughs_attack>`_ to consume a large
20+
amount of memory or CPU time. For the sake of performance,
21+
``CompilingFluentBundle`` does not have these protections.
22+
23+
It should be noted that both implementations are able to detect and stop
24+
infinite recursion errors (``CompilingFluentBundle`` does this at compile
25+
time), which is important to stop infinite loops and memory exhaustion which
26+
could otherwise occur due to accidental cyclic references in messages.
27+
28+
* While the error handling strategy for both implementations is the same, when
29+
errors occur (e.g. a missing value in the arguments dictionary, or a cyclic
30+
reference, or a string is passed to ``NUMBER()`` builtin), the exact errors
31+
returned by ``format`` may be different between the two implementations.
32+
33+
Also, when an error occurs, in some cases (such as a cyclic reference), the
34+
error string embedded into the returned formatted message may be different.
35+
For cases where there is no error, the output is identical (or should be).
36+
37+
Neither implementation guarantees that the exact errors returned will be the
38+
same between different versions of ``fluent.runtime``.
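
The compile-time detection of cyclic references mentioned above can be illustrated with a small depth-first search. This is only a sketch under stated assumptions: ``find_cycle`` and the ``references`` mapping (message ID to the IDs it refers to) are hypothetical names invented for this example, and the actual compiler may detect cycles differently.

```python
def find_cycle(references):
    """Return one reference cycle as a list of message IDs, or None.

    `references` is a hypothetical mapping from a message ID to the IDs
    of the messages/terms it refers to; it is not a fluent.runtime API.
    """
    finished = set()  # IDs whose references have been fully explored

    def visit(msg_id, path):
        if msg_id in path:
            # We came back to an ID that is still being explored: a cycle.
            return path[path.index(msg_id):] + [msg_id]
        if msg_id in finished:
            return None
        path.append(msg_id)
        for target in references.get(msg_id, ()):
            cycle = visit(target, path)
            if cycle is not None:
                return cycle
        path.pop()
        finished.add(msg_id)
        return None

    for msg_id in references:
        cycle = visit(msg_id, [])
        if cycle is not None:
            return cycle
    return None


# 'foo' and 'bar' refer to each other, so a cycle is reported.
print(find_cycle({"foo": ["bar"], "bar": ["foo"], "baz": ["foo"]}))
```

Because a check like this needs only the message definitions, it can run once before any message is formatted, which is consistent with the bullet above saying the check happens at compile time.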
39+
40+
Performance
41+
-----------
42+
43+
Due to the strategy of compiling to Python, ``CompilingFluentBundle`` has very
44+
good performance, especially for the simple common cases. The
45+
``tools/benchmark/gettext_comparisons.py`` script includes some benchmarks that
46+
compare speed to GNU gettext as a reference. Below is a rough summary:
47+
48+
For the simple but very common case of a message defining a static string,
49+
``CompilingFluentBundle.format`` is very close to GNU gettext, or much faster,
50+
depending on whether you are using Python 2 or 3, and your Python implementation
51+
(e.g. CPython or PyPy). (The worst case we found was 5% faster than gettext on
52+
CPython 2.7, and the best case was about 3.5 times faster for PyPy2 5.1.2). For
53+
cases of substituting a single string into a message,
54+
``CompilingFluentBundle.format`` is between 30% slower and 70% faster than an
55+
equivalent implementation using GNU gettext and Python ``%`` interpolation.
56+
57+
For messages where plural rules are involved, currently ``CompilingFluentBundle``
58+
can be significantly slower than using GNU gettext, partly because it uses
59+
plural rules from CLDR that can be much more complex (and correct) than the ones
60+
that gettext normally uses. Further work could be done to optimize some of these
61+
cases though.
62+
63+
For more complex operations (for example, using locale-aware date and number
64+
formatting), formatting messages can take a lot longer. Comparisons to GNU
65+
gettext fall down at this point, because it doesn't include a lot of this
66+
functionality. However, usually these types of messages make up a small fraction
67+
of the number of internationalized strings in an application.
68+
69+
``InterpretingFluentBundle`` is, as you would expect, much slower than
70+
``CompilingFluentBundle``, often by a factor of 10. In cases where there are a
71+
large number of messages, ``CompilingFluentBundle`` will be a lot slower to
72+
format the first message because it first compiles all the messages, whereas
73+
``InterpretingFluentBundle`` does not have this compilation step, and tries to
74+
reduce any up-front work to a minimum (sometimes at the cost of runtime
75+
performance).
76+
77+
78+
Security
79+
--------
80+
81+
You should not pass un-trusted FTL code to ``FluentBundle.add_messages``. This
82+
is because carefully constructed messages could potentially cause large resource
83+
usage (CPU time and memory). The ``InterpretingFluentBundle`` implementation
84+
does have some protection against these attacks, although it may not be
85+
foolproof, while ``CompilingFluentBundle`` does not have any protection against
86+
these attacks, either at compile time or run time.
87+
88+
``CompilingFluentBundle`` works by compiling FTL messages to Python `ast
89+
<https://docs.python.org/3/library/ast.html>`_, which is passed to `compile
90+
<https://docs.python.org/3/library/functions.html#compile>`_ and then `exec
91+
<https://docs.python.org/3/library/functions.html#exec>`_. The use of ``exec``
92+
like this is an established technique for high performance Python code, used in
93+
template engines like Mako, Jinja2 and Genshi.
94+
95+
However, there can understandably be some concerns around the use of ``exec``
96+
which can open up remote execution vulnerabilities. If this is of paramount
97+
concern to you, you should consider using ``InterpretingFluentBundle`` instead
98+
(which is the default).
99+
100+
To reduce the possibility of our use of ``exec`` harbouring security issues, the
101+
following things are in place:
102+
103+
1. We generate `ast <https://docs.python.org/3/library/ast.html>`_ objects and
104+
not strings. This greatly reduces the security problems, since there is no
105+
possibility of a vulnerability due to incorrect string interpolation.
106+
107+
2. We use ``exec`` only on AST derived from FTL files, never on "end user input"
108+
(such as the arguments passed into ``FluentBundle.format``). This reduces the
109+
attack vector to only the situation where the source of your FTL files is
110+
potentially malicious or compromised.
111+
112+
3. We employ defence-in-depth techniques in our code generation and compiler
113+
implementation to reduce the possibility of cleverly crafted FTL code
114+
producing security holes, and ensure these techniques have full test
115+
coverage.
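
The "ast objects, then ``compile``, then ``exec``" pipeline described in this section can be sketched with the standard library alone. The helper name ``compile_static_message`` is hypothetical and the sketch assumes Python 3.8 or later; it is not the code in ``fluent.runtime.compiler``. It shows why point 1 matters: the message text is placed in an ``ast.Constant`` node, so it is never spliced into source code.

```python
import ast


def compile_static_message(func_name, text):
    """Build `func_name = lambda args, errors: <text>` as an AST and exec it.

    Hypothetical helper for illustration only (assumes Python 3.8+).
    """
    params = ast.arguments(
        posonlyargs=[],
        args=[ast.arg(arg="args"), ast.arg(arg="errors")],
        kwonlyargs=[],
        kw_defaults=[],
        defaults=[],
    )
    # The text becomes a constant node; no quoting or escaping is involved.
    func = ast.Lambda(args=params, body=ast.Constant(value=text))
    assign = ast.Assign(
        targets=[ast.Name(id=func_name, ctx=ast.Store())],
        value=func,
    )
    module = ast.Module(body=[assign], type_ignores=[])
    ast.fix_missing_locations(module)  # fill in line/column info for compile()

    namespace = {}
    exec(compile(module, "<ftl-sketch>", "exec"), namespace)
    return namespace[func_name]


# Text that would break naive string-based code generation is returned intact.
tricky = 'He said "hi"\'); import os  #'
hello = compile_static_message("hello", tricky)
assert hello({}, []) == tricky
```

A generator that built Python source by string interpolation would need to escape ``tricky`` correctly; with AST nodes that class of mistake cannot occur.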

fluent.runtime/docs/usage.rst

Lines changed: 14 additions & 0 deletions
@@ -93,6 +93,19 @@ module or the start of your repl session:
9393
9494
from __future__ import unicode_literals
9595
96+
CompilingFluentBundle
97+
~~~~~~~~~~~~~~~~~~~~~
98+
99+
In addition to the default ``FluentBundle`` implementation, there is also a high
100+
performance implementation that compiles to Python AST. You can use it just the same:
101+
102+
.. code-block:: python
103+
104+
from fluent.runtime import CompilingFluentBundle as FluentBundle
105+
106+
Be sure to check the notes on :doc:`implementations`, especially the security
107+
section.
108+
96109
Numbers
97110
~~~~~~~
98111

@@ -225,5 +238,6 @@ Help with the above would be welcome!
225238
Other features and further information
226239
--------------------------------------
227240

241+
* :doc:`implementations`
228242
* :doc:`functions`
229243
* :doc:`errors`

fluent.runtime/fluent/runtime/__init__.py

Lines changed: 97 additions & 16 deletions
@@ -1,19 +1,23 @@
11
from __future__ import absolute_import, unicode_literals
22

3+
from collections import OrderedDict
4+
35
import babel
46
import babel.numbers
57
import babel.plural
68

79
from fluent.syntax import FluentParser
8-
from fluent.syntax.ast import Message, Term
10+
from fluent.syntax.ast import Junk, Message, Term
911

1012
from .builtins import BUILTINS
13+
from .compiler import compile_messages
14+
from .errors import FluentDuplicateMessageId, FluentJunkFound
1115
from .prepare import Compiler
12-
from .resolver import ResolverEnvironment, CurrentEnvironment
16+
from .resolver import CurrentEnvironment, ResolverEnvironment
1317
from .utils import ATTRIBUTE_SEPARATOR, TERM_SIGIL, ast_to_id, native_to_fluent
1418

1519

16-
class FluentBundle(object):
20+
class FluentBundleBase(object):
1721
"""
1822
Message contexts are single-language stores of translations. They are
1923
responsible for parsing translation resources in the Fluent syntax and can
@@ -33,27 +37,60 @@ def __init__(self, locales, functions=None, use_isolating=True):
3337
_functions.update(functions)
3438
self._functions = _functions
3539
self.use_isolating = use_isolating
36-
self._messages_and_terms = {}
37-
self._compiled = {}
38-
self._compiler = Compiler()
40+
self._messages_and_terms = OrderedDict()
41+
self._parsing_issues = []
3942
self._babel_locale = self._get_babel_locale()
4043
self._plural_form = babel.plural.to_python(self._babel_locale.plural_form)
4144

4245
def add_messages(self, source):
4346
parser = FluentParser()
4447
resource = parser.parse(source)
45-
# TODO - warn/error about duplicates
4648
for item in resource.body:
4749
if isinstance(item, (Message, Term)):
4850
full_id = ast_to_id(item)
49-
if full_id not in self._messages_and_terms:
51+
if full_id in self._messages_and_terms:
52+
self._parsing_issues.append((full_id, FluentDuplicateMessageId(
53+
"Additional definition for '{0}' discarded.".format(full_id))))
54+
else:
5055
self._messages_and_terms[full_id] = item
56+
elif isinstance(item, Junk):
57+
self._parsing_issues.append(
58+
(None, FluentJunkFound("Junk found: " +
59+
'; '.join(a.message for a in item.annotations),
60+
item.annotations)))
5161

5262
def has_message(self, message_id):
5363
if message_id.startswith(TERM_SIGIL) or ATTRIBUTE_SEPARATOR in message_id:
5464
return False
5565
return message_id in self._messages_and_terms
5666

67+
def _get_babel_locale(self):
68+
for l in self.locales:
69+
try:
70+
return babel.Locale.parse(l.replace('-', '_'))
71+
except babel.UnknownLocaleError:
72+
continue
73+
# TODO - log error
74+
return babel.Locale.default()
75+
76+
def format(self, message_id, args=None):
77+
raise NotImplementedError()
78+
79+
def check_messages(self):
80+
"""
81+
Check messages for errors and return them as a list of 2-tuples:
82+
(message ID or None, exception object)
83+
"""
84+
raise NotImplementedError()
85+
86+
87+
class InterpretingFluentBundle(FluentBundleBase):
88+
89+
def __init__(self, locales, functions=None, use_isolating=True):
90+
super(InterpretingFluentBundle, self).__init__(locales, functions=functions, use_isolating=use_isolating)
91+
self._compiled = {}
92+
self._compiler = Compiler()
93+
5794
def lookup(self, full_id):
5895
if full_id not in self._compiled:
5996
entry_id = full_id.split(ATTRIBUTE_SEPARATOR, 1)[0]
@@ -83,11 +120,55 @@ def format(self, message_id, args=None):
83120
errors=errors)
84121
return [resolve(env), errors]
85122

86-
def _get_babel_locale(self):
87-
for l in self.locales:
88-
try:
89-
return babel.Locale.parse(l.replace('-', '_'))
90-
except babel.UnknownLocaleError:
91-
continue
92-
# TODO - log error
93-
return babel.Locale.default()
123+
def check_messages(self):
124+
return self._parsing_issues[:]
125+
126+
127+
class CompilingFluentBundle(FluentBundleBase):
128+
def __init__(self, *args, **kwargs):
129+
super(CompilingFluentBundle, self).__init__(*args, **kwargs)
130+
self._mark_dirty()
131+
132+
def _mark_dirty(self):
133+
self._is_dirty = True
134+
# Clear out old compilation errors, they might not apply if we
135+
# re-compile:
136+
self._compilation_errors = []
137+
self.format = self._compile_and_format
138+
139+
def _mark_clean(self):
140+
self._is_dirty = False
141+
self.format = self._format
142+
143+
def add_messages(self, source):
144+
super(CompilingFluentBundle, self).add_messages(source)
145+
self._mark_dirty()
146+
147+
def _compile(self):
148+
self._compiled_messages, self._compilation_errors = compile_messages(
149+
self._messages_and_terms,
150+
self._babel_locale,
151+
use_isolating=self.use_isolating,
152+
functions=self._functions)
153+
self._mark_clean()
154+
155+
# 'format' is the hot path for many scenarios, so we try to optimize it. To
156+
# avoid having to check '_is_dirty' inside 'format', we switch 'format' from
157+
# '_compile_and_format' to '_format' when compilation is done. This gives us
158+
# about 10% improvement for the simplest (but most common) case of an
159+
# entirely static string.
160+
def _compile_and_format(self, message_id, args=None):
161+
self._compile()
162+
return self._format(message_id, args)
163+
164+
def _format(self, message_id, args=None):
165+
errors = []
166+
return self._compiled_messages[message_id](args, errors), errors
167+
168+
def check_messages(self):
169+
if self._is_dirty:
170+
self._compile()
171+
return self._parsing_issues + self._compilation_errors
172+
173+
174+
FluentBundle = InterpretingFluentBundle
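
The ``format`` rebinding used by ``CompilingFluentBundle`` (switching between ``_compile_and_format`` and ``_format`` instead of checking ``_is_dirty`` on every call) can be seen in isolation in the following sketch. ``LazyCompiled``, ``add`` and ``compile_count`` are hypothetical names for this example, and the "compilation" here is a trivial stand-in for ``compile_messages``.

```python
class LazyCompiled(object):
    """Toy class showing the method-swapping pattern; not fluent.runtime code."""

    def __init__(self):
        self._sources = {}
        self.compile_count = 0  # only here so the effect can be observed
        self._mark_dirty()

    def _mark_dirty(self):
        # The next call to format() will compile first.
        self.format = self._compile_and_format

    def _mark_clean(self):
        # Later calls go straight to the fast path, with no flag check.
        self.format = self._format

    def add(self, message_id, text):
        self._sources[message_id] = text
        self._mark_dirty()

    def _compile(self):
        self.compile_count += 1

        def make_static(text):
            return lambda args, errors: text

        # Stand-in for real compilation: one function per message.
        self._compiled = {k: make_static(v) for k, v in self._sources.items()}
        self._mark_clean()

    def _compile_and_format(self, message_id, args=None):
        self._compile()
        return self._format(message_id, args)

    def _format(self, message_id, args=None):
        errors = []
        return self._compiled[message_id](args, errors), errors


bundle = LazyCompiled()
bundle.add("hello", "Hello")
bundle.format("hello")
bundle.format("hello")
print(bundle.compile_count)  # compiled once, on the first format() call
```

The instance attribute ``format`` shadows any class-level method of the same name, which is why assigning it in ``_mark_dirty`` and ``_mark_clean`` is enough to switch behaviour.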
