
Commit e77d428

frenzymadness, hroncok and vstinner authored
bpo-40495: compileall option to hardlink duplicate pyc files (GH-19901)
compileall is now able to use hardlinks to prevent duplicates in a case when .pyc files for different optimization levels have the same content.

Co-authored-by: Miro Hrončok <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
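The behaviour described in the commit message can be observed with a short, hedged sketch. It assumes Python 3.9 or newer and a filesystem that supports hard links; `shared_pyc_flags` is a hypothetical helper name introduced only for this illustration and is not part of the commit.

```python
import compileall
import importlib.util
import os
import tempfile


def shared_pyc_flags(source_text):
    """Compile source_text at levels 0, 1 and 2 with hardlink_dupes=True.

    Returns three booleans: whether each level's .pyc is the same file
    (same inode) as the level-0 .pyc. Hypothetical helper for illustration.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w", encoding="utf-8") as f:
            f.write(source_text)
        compileall.compile_dir(tmp, quiet=1, optimize=[0, 1, 2],
                               hardlink_dupes=True)
        # "" selects the unoptimized .pyc name; 1 and 2 select opt-1/opt-2.
        pycs = [importlib.util.cache_from_source(src, optimization=opt)
                for opt in ("", 1, 2)]
        return [os.path.samefile(pycs[0], p) for p in pycs]


if __name__ == "__main__":
    # No docstrings or asserts: all three levels compile to the same bytes.
    print(shared_pyc_flags("x = 1\n"))
    # An assert makes level 0 differ from the optimized levels.
    print(shared_pyc_flags("x = 1\nassert x\n"))
```

Going by the check this commit adds to `compile_file`, passing `hardlink_dupes=True` with only one optimization level raises `ValueError`, so the sketch always requests several levels.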
1 parent 7443d42 commit e77d428

6 files changed: +285 -15 lines changed

Doc/library/compileall.rst

Lines changed: 16 additions & 5 deletions
@@ -113,6 +113,11 @@ compile Python sources.

    Ignore symlinks pointing outside the given directory.

+.. cmdoption:: --hardlink-dupes
+
+   If two ``.pyc`` files with different optimization level have
+   the same content, use hard links to consolidate duplicate files.
+
 .. versionchanged:: 3.2
    Added the ``-i``, ``-b`` and ``-h`` options.

@@ -125,7 +130,7 @@ compile Python sources.
    Added the ``--invalidation-mode`` option.

 .. versionchanged:: 3.9
-   Added the ``-s``, ``-p``, ``-e`` options.
+   Added the ``-s``, ``-p``, ``-e`` and ``--hardlink-dupes`` options.
    Raised the default recursion limit from 10 to
    :py:func:`sys.getrecursionlimit()`.
    Added the possibility to specify the ``-o`` option multiple times.
@@ -143,7 +148,7 @@ runtime.
 Public functions
 ----------------

-.. function:: compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None)
+.. function:: compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False)

    Recursively descend the directory tree named by *dir*, compiling all :file:`.py`
    files along the way. Return a true value if all the files compiled successfully,
@@ -193,6 +198,9 @@ Public functions
    the ``-s``, ``-p`` and ``-e`` options described above.
    They may be specified as ``str``, ``bytes`` or :py:class:`os.PathLike`.

+   If *hardlink_dupes* is true and two ``.pyc`` files with different optimization
+   level have the same content, use hard links to consolidate duplicate files.
+
    .. versionchanged:: 3.2
       Added the *legacy* and *optimize* parameter.

@@ -219,9 +227,9 @@ Public functions
       Setting *workers* to 0 now chooses the optimal number of cores.

    .. versionchanged:: 3.9
-      Added *stripdir*, *prependdir* and *limit_sl_dest* arguments.
+      Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments.

-.. function:: compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None)
+.. function:: compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False)

    Compile the file with path *fullname*. Return a true value if the file
    compiled successfully, and a false value otherwise.
@@ -257,6 +265,9 @@ Public functions
    the ``-s``, ``-p`` and ``-e`` options described above.
    They may be specified as ``str``, ``bytes`` or :py:class:`os.PathLike`.

+   If *hardlink_dupes* is true and two ``.pyc`` files with different optimization
+   level have the same content, use hard links to consolidate duplicate files.
+
    .. versionadded:: 3.2

    .. versionchanged:: 3.5
@@ -273,7 +284,7 @@ Public functions
       The *invalidation_mode* parameter's default value is updated to None.

    .. versionchanged:: 3.9
-      Added *stripdir*, *prependdir* and *limit_sl_dest* arguments.
+      Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments.

 .. function:: compile_path(skip_curdir=True, maxlevels=0, force=False, quiet=0, legacy=False, optimize=-1, invalidation_mode=None)

Doc/whatsnew/3.9.rst

Lines changed: 10 additions & 0 deletions
@@ -245,6 +245,16 @@ that schedules a shutdown for the default executor that waits on the
 Added :class:`asyncio.PidfdChildWatcher`, a Linux-specific child watcher
 implementation that polls process file descriptors. (:issue:`38692`)

+compileall
+----------
+
+Added new possibility to use hardlinks for duplicated ``.pyc`` files: *hardlink_dupes* parameter and --hardlink-dupes command line option.
+(Contributed by Lumír 'Frenzy' Balhar in :issue:`40495`.)
+
+Added new options for path manipulation in resulting ``.pyc`` files: *stripdir*, *prependdir*, *limit_sl_dest* parameters and -s, -p, -e command line options.
+Added the possibility to specify the option for an optimization level multiple times.
+(Contributed by Lumír 'Frenzy' Balhar in :issue:`38112`.)
+
 concurrent.futures
 ------------------

Lib/compileall.py

Lines changed: 35 additions & 7 deletions
@@ -15,6 +15,7 @@
 import importlib.util
 import py_compile
 import struct
+import filecmp

 from functools import partial
 from pathlib import Path
@@ -47,7 +48,7 @@ def _walk_dir(dir, maxlevels, quiet=0):
 def compile_dir(dir, maxlevels=None, ddir=None, force=False,
                 rx=None, quiet=0, legacy=False, optimize=-1, workers=1,
                 invalidation_mode=None, *, stripdir=None,
-                prependdir=None, limit_sl_dest=None):
+                prependdir=None, limit_sl_dest=None, hardlink_dupes=False):
     """Byte-compile all modules in the given directory tree.

     Arguments (only dir is required):
@@ -70,6 +71,7 @@ def compile_dir(dir, maxlevels=None, ddir=None, force=False,
                 after stripdir
     limit_sl_dest: ignore symlinks if they are pointing outside of
                    the defined path
+    hardlink_dupes: hardlink duplicated pyc files
     """
     ProcessPoolExecutor = None
     if ddir is not None and (stripdir is not None or prependdir is not None):
@@ -104,22 +106,24 @@ def compile_dir(dir, maxlevels=None, ddir=None, force=False,
                                            invalidation_mode=invalidation_mode,
                                            stripdir=stripdir,
                                            prependdir=prependdir,
-                                           limit_sl_dest=limit_sl_dest),
+                                           limit_sl_dest=limit_sl_dest,
+                                           hardlink_dupes=hardlink_dupes),
                                    files)
             success = min(results, default=True)
     else:
         for file in files:
             if not compile_file(file, ddir, force, rx, quiet,
                                 legacy, optimize, invalidation_mode,
                                 stripdir=stripdir, prependdir=prependdir,
-                                limit_sl_dest=limit_sl_dest):
+                                limit_sl_dest=limit_sl_dest,
+                                hardlink_dupes=hardlink_dupes):
                 success = False
     return success

 def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
                  legacy=False, optimize=-1,
                  invalidation_mode=None, *, stripdir=None, prependdir=None,
-                 limit_sl_dest=None):
+                 limit_sl_dest=None, hardlink_dupes=False):
     """Byte-compile one file.

     Arguments (only fullname is required):
@@ -140,6 +144,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
                 after stripdir
     limit_sl_dest: ignore symlinks if they are pointing outside of
                    the defined path.
+    hardlink_dupes: hardlink duplicated pyc files
     """

     if ddir is not None and (stripdir is not None or prependdir is not None):
@@ -176,6 +181,14 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
     if isinstance(optimize, int):
         optimize = [optimize]

+    # Use set() to remove duplicates.
+    # Use sorted() to create pyc files in a deterministic order.
+    optimize = sorted(set(optimize))
+
+    if hardlink_dupes and len(optimize) < 2:
+        raise ValueError("Hardlinking of duplicated bytecode makes sense "
+                          "only for more than one optimization level")
+
     if rx is not None:
         mo = rx.search(fullname)
         if mo:
@@ -220,10 +233,16 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
             if not quiet:
                 print('Compiling {!r}...'.format(fullname))
             try:
-                for opt_level, cfile in opt_cfiles.items():
+                for index, opt_level in enumerate(optimize):
+                    cfile = opt_cfiles[opt_level]
                     ok = py_compile.compile(fullname, cfile, dfile, True,
                                             optimize=opt_level,
                                             invalidation_mode=invalidation_mode)
+                    if index > 0 and hardlink_dupes:
+                        previous_cfile = opt_cfiles[optimize[index - 1]]
+                        if filecmp.cmp(cfile, previous_cfile, shallow=False):
+                            os.unlink(cfile)
+                            os.link(previous_cfile, cfile)
             except py_compile.PyCompileError as err:
                 success = False
                 if quiet >= 2:
@@ -352,6 +371,9 @@ def main():
                               'Python interpreter itself (specified by -O).'))
     parser.add_argument('-e', metavar='DIR', dest='limit_sl_dest',
                         help='Ignore symlinks pointing outsite of the DIR')
+    parser.add_argument('--hardlink-dupes', action='store_true',
+                        dest='hardlink_dupes',
+                        help='Hardlink duplicated pyc files')

     args = parser.parse_args()
     compile_dests = args.compile_dest
@@ -371,6 +393,10 @@ def main():
     if args.opt_levels is None:
         args.opt_levels = [-1]

+    if len(args.opt_levels) == 1 and args.hardlink_dupes:
+        parser.error(("Hardlinking of duplicated bytecode makes sense "
+                      "only for more than one optimization level."))
+
     if args.ddir is not None and (
         args.stripdir is not None or args.prependdir is not None
     ):
@@ -404,7 +430,8 @@ def main():
                                         stripdir=args.stripdir,
                                         prependdir=args.prependdir,
                                         optimize=args.opt_levels,
-                                        limit_sl_dest=args.limit_sl_dest):
+                                        limit_sl_dest=args.limit_sl_dest,
+                                        hardlink_dupes=args.hardlink_dupes):
                         success = False
                 else:
                     if not compile_dir(dest, maxlevels, args.ddir,
@@ -414,7 +441,8 @@ def main():
                                        stripdir=args.stripdir,
                                        prependdir=args.prependdir,
                                        optimize=args.opt_levels,
-                                       limit_sl_dest=args.limit_sl_dest):
+                                       limit_sl_dest=args.limit_sl_dest,
+                                       hardlink_dupes=args.hardlink_dupes):
                         success = False
             return success
         else:
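The compare-then-link step that this diff adds inside `compile_file` can be sketched in isolation. The helper below is a minimal illustration rather than the code from the commit: `link_if_identical` and its parameter names are hypothetical, and it assumes a filesystem that supports hard links.

```python
import filecmp
import os
import tempfile


def link_if_identical(previous_path, current_path):
    """Replace current_path with a hard link to previous_path when both
    files hold identical bytes. Returns True when a link was made.

    Hypothetical helper mirroring the filecmp/unlink/link sequence.
    """
    # shallow=False makes filecmp compare file contents instead of
    # accepting matching os.stat() signatures as proof of equality.
    if filecmp.cmp(current_path, previous_path, shallow=False):
        os.unlink(current_path)
        os.link(previous_path, current_path)
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.bin")
        second = os.path.join(tmp, "second.bin")
        for path in (first, second):
            with open(path, "wb") as f:
                f.write(b"same bytes")
        print(link_if_identical(first, second))
        print(os.path.samefile(first, second))
```

Unlinking and then linking is not atomic, so `current_path` is briefly absent between the two calls; that may matter if another process reads the file at the same moment.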
