Skip to content

lld looks for libraries given via relative paths at wrong place (relative to system paths) #67779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kalibera opened this issue Sep 29, 2023 · 17 comments · Fixed by #67857 or llvm/llvm-project-release-prs#719

Comments

@kalibera
Copy link

When libraries are passed to the linker as relative paths (e.g. "../../../../lib/libLLVMSupport.a"), they are sometimes looked for relative to the system installation path (e.g. "/something/usr/x86_64-pc-linux-gnu/lib/clang/17/lib/../../../../lib/libLLVMSupport.a"). Instead, they should be looked up always relative to the current working directory.

I've ran into this when cross-compiling a native LLVM compiler (to run on Windows/aarch64 and produce code for Windows/aarch64), using a cross-compiler running on Linux/x86_64. Linking failed due to errors like "ld.lld: error: unknown file type: CommandLine.cpp.o". The correct "libLLVMSupport.a" library has been available relatively to the current working directory in "../../../../lib/libLLVMSupport.a", but the linker instead used an incompatible version (for Linux/x86_64) which happened to be located in "/something/usr/x86_64-pc-linux-gnu/lib/clang/17/lib/../../../../lib/libLLVMSupport.a". When I deleted that incompatible library from that location, the linking passed as it should.

I'm attaching a narrowed-down example based on this problem. On my system, with that "incompatible" library in place,

cd test/sub/sub/sub/sub/
./test1

does

/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/aarch64-w64-mingw32.static.posix-clang++ \
TableGen.cpp.obj \
../../../../lib/libLLVMSupport.a

and outputs

ld.lld: error: unknown file type: CommandLine.cpp.o
ld.lld: error: unknown file type: ManagedStatic.cpp.o
ld.lld: error: unknown file type: PrettyStackTrace.cpp.o
ld.lld: error: unknown file type: raw_ostream.cpp.o
ld.lld: error: unknown file type: Signals.cpp.o
clang: error: linker command failed with exit code 1 (use -v to see invocation) 

With the "incompatible" library not in place, the output includes only complaints about undefined symbols, as it should, because I've narrowed down the example. The archive includes also outputs from strace, which make it clear that lld opens the "incompatible" library at the wrong place:

openat(AT_FDCWD, "/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/lib/clang/17/lib/../../../../lib/libLLVMSupport.a", O_RDONLY|O_CLOEXEC) = 4

The archive also includes the "incompatible" version of the library one could place in the corresponding place to reproduce the problem.

test.zip

@mstorsjo
@tru @rnk @zmodem

@github-actions github-actions bot added the lld label Sep 29, 2023
@tru
Copy link
Collaborator

tru commented Sep 29, 2023

Is this the COFF or ELF backend? are you producing code for Linux or Windows?

@kalibera
Copy link
Author

I am producing code for Windows. The "compatible" library that should be used is COFF. The "incompatible" that is used by accident is ELF.

Per offline communication @mstorsjo knows almost surely why this is happening, I will let him comment.

@tru
Copy link
Collaborator

tru commented Sep 29, 2023

I think I know as well considering it's the COFF backend. I can have a look unless Martin already have a patch.

@mstorsjo
Copy link
Member

Here's another even smaller testcase of the same issue. It's a bit more contrieved (one probably wouldn't be doing it like this), but it showcases the issue much clearer:

  • lib.c:
void myLibFunc(void) { }
  • main.c:
void myLibFunc(void);
void mainCRTStartup() {
	myLibFunc();
}

To run:

$ clang -target x86_64-linux-gnu -c lib.c -o lib-linux.o
$ clang -target aarch64-windows-msvc -c lib.c -o lib-win.o
$ clang -target aarch64-windows-msvc -c main.c -o main-win.o
$ mkdir -p lib-linux
$ rm -f lib-linux/mylib.a mylib.a
$ llvm-ar rcs lib-linux/mylib.a lib-linux.o
$ llvm-ar rcs mylib.a lib-win.o
$ lld-link main-win.o -libpath:lib-linux mylib.a -out:main-win.exe -subsystem:console
# Success; mylib.a resolves to the file in the local directory, not in -libpath
$ cp lib-linux/mylib.a $(dirname $(which lld-link))/../lib
$ lld-link main-win.o -libpath:lib-linux mylib.a -out:main-win.exe -subsystem:console
lld-link: error: unknown file type: lib-linux.o
# Failure; mylib.a is found in $(dirname $(which lld-link))/../lib

The issue can be showed with the new -print-search-paths option to lld. Unfortnately, in this case, it errors out due to the unknown file type before it gets a chance of printing anything. With a modification to print earlier in the process, and a modification to quote the outputs, to make empty entries clearer, I get the following:

$ lld-link main-win.o -libpath:lib-linux mylib.a -out:main-win.exe -subsystem:console -print-search-paths
Library search paths:
  "/home/martin/clang-nightly/lib/clang/17/lib/windows"
  "/home/martin/clang-nightly/lib/clang/17/lib"
  "/home/martin/clang-nightly/lib"
  ""
  "lib-linux"

lld-link: error: unknown file type: lib-linux.o

So the core of the issue, is that the toolchain directories, <toolchain>/lib and <toolchain>/lib/clang/<ver>/lib, are searched before the current directory. Generally, one could think that's reasonable, but:

  • If one does a full install of LLVM, i.e. LLVM_INSTALL_TOOLCHAIN_ONLY set to FALSE, then <toolchain>/lib will contain files like libLLVMSupport.a for the host architecture. Accidentally picking up and preferring this file instead of a different one when trying to cross compile is entirely fatal. With regular use of lld-link, the naming style usually doesn't coincide, but in mingw contexts, one can try to link a libLLVMSupport.a for a cross target, but end up including one for the host

  • In the case that @kalibera showed, a very unexpected combination happened; a input of ../../../../lib/libLLVMSupport.a happened to match the early search path entry <toolchain>/lib/clang/17/lib resolving into <toolchain>/lib/libLLVMSupport.a.

Suggested change 1

I think that the empty path, i.e. resolving files based on the current directory, always should be first in the search path, before the toolchain files. Before the recent changes, this was always first, before anything grabbed from the windows SDK or from the $LIB env variable. addClangLibSearchPaths uses searchPaths.insert(searchPaths.begin(), with a comment that this is to make sure things are added before whatever is taken from MSVC or the WinSDK. But by doing that, it's also added before the first empty path entry.

I think we could change addClangLibSearchPaths to just append to searchPaths, but restructure the calls to add things in the desired order.

Currently we have this:

  // Construct search path list.
  searchPaths.emplace_back("");
  for (auto *arg : args.filtered(OPT_libpath))
    searchPaths.push_back(arg->getValue());
  detectWinSysRoot(args);
  if (!args.hasArg(OPT_lldignoreenv) && !args.hasArg(OPT_winsysroot))
    addLibSearchPaths();
  addClangLibSearchPaths(argsArr[0]);

But if we really want to have this priority order, if we'd change addClangLibSearchPaths to plain append, we could have this instead:

  // Construct search path list.
  searchPaths.emplace_back("");
  addClangLibSearchPaths(argsArr[0]);
  for (auto *arg : args.filtered(OPT_libpath))
    searchPaths.push_back(arg->getValue());
  detectWinSysRoot(args);
  if (!args.hasArg(OPT_lldignoreenv) && !args.hasArg(OPT_winsysroot))
    addLibSearchPaths();

Suggested change 2

Secondly, b6c2f10 changed so that we do resolve relative paths relative to the libpaths too. This was done with the intent of allowing <triple>/libclang_rt.builtins.a to resolve relative to a libpath. But I think it might be reasonable to not allow this with a relative path with .., what do you think, and only ever resolve such paths relative to the current directory?

Suggested change 3

For mingw contexts, we probably shouldn't be doing any of this logic for auto-adding search paths. In mingw contexts, the linker is always invoked (by the compiler driver) with suitable -L parameters specifying the intended search order. So there, we strictly don't need this logic at all.

I made a patch to do this some months ago, https://reviews.llvm.org/D144084, but when applied, this struck an issue where some compiler-rt test expects to invoke LLD in an MSVC environment, but with -lldmingw to opt in to mingw style handling of weak symbols - this testcase relies on autofinding MSVC libraries while enabling mingw mode. So we probably should separate these concerns a bit, either add a separate flag for -lldmingw-style-weak-symbols, or make a new flag -lld-no-auto-libpaths, which the lld mingw frontend always passes in.

I just haven't had a real demand for going forward with this patch so I haven't quite had time, but I guess this is a good case for it.

@JiaT75
Copy link

JiaT75 commented Sep 29, 2023

Hello!

So the core of the issue, is that the toolchain directories, <toolchain>/lib and <toolchain>/lib/clang/<ver>/lib, are searched before the current directory. Generally, one could think that's reasonable, but:

I also ran into this bug a few days ago and have been investigated it a bit. I was planning on submitting a bug report today but you beat me to it. You did a better job than I would have writing it up so its good that you did :)
I believe this is serious problem since it can lead to unexpected results when linking.

I see in the release notes for 17.0.1:
"By default lld-link will now search for libraries in the toolchain directories. Specifically it will search: /lib, /lib/clang//lib and /lib/clang//lib/windows."

In my case, my CI toolchain caught the bug in 17.0.1. I am building liblzma.a and disabling various features and testing that the build and tests are successful. The problem is that on MSYS2 CLANG64 environment liblzma.a is installed in the system libraries (/clang64/lib/). So when building the xz.exe command line tool, it links with the system liblzma.a instead of the one in the local directory.

This causes an obvious problem if trying to build and distribute a new version of a program that depends on a library built locally.

So it makes sense why this issue did not appear until 17.0.1. The expected behavior should be to check the local directory first, then the toolchain directories. I believe this problem is only with the COFF backend since I could not reproduce it on my Ubuntu machine after upgrading to the 17.0.1 release. It possible I set up my test wrong so don't believe me 100%.

@tru
Copy link
Collaborator

tru commented Sep 29, 2023

Secondly, b6c2f10 changed so that we do resolve relative paths relative to the libpaths too. This was done with the intent of allowing /libclang_rt.builtins.a to resolve relative to a libpath. But I think it might be reasonable to not allow this with a relative path with .., what do you think, and only ever resolve such paths relative to the current directory?

I think that's very reasonable and I would think that change would be easy and good.

mstorsjo added a commit to mstorsjo/llvm-project that referenced this issue Sep 29, 2023
…e search paths

In b6c2f10, we allowed looking
up relative paths within the search directories. This was done with
the intention to allow resolving paths like <triple>/clang_rt.builtins.lib
within a base search directory. However we shouldn't try to look
up e.g. a path like ../../../../lib/libLLVMSupport.a in the search
paths. In some cases, this could actually lead to accidental
matches, for a file other than the one that was intended.

This should fix one aspect of llvm#67779.
@mstorsjo
Copy link
Member

See #67856, #67857 and #67858 for one set of potential fixes for this issue.

@mstorsjo
Copy link
Member

I forgot to mention here, that this issue actually reproduces quite easily - if building LLVM with the CMake Makefile generator, as opposed to Ninja. With Ninja, all the build commands are executed in the same directory, the base of the build directory. However, with the makefile generator, it uses recursive makefiles, thus it enters subdirectories and executes commands there.

That's why there are relative path references such as ../../../../lib/libLLVMSupport.a. And such a path, applied on <toolchain>/lib/clang/18/lib easily finds <toolchain>/lib/libLLVMSupport.a.

mstorsjo added a commit to mstorsjo/llvm-project that referenced this issue Oct 2, 2023
… path

Before af744f0, the first entry
among the search paths was the empty string, indicating searching
in (or starting from) the current directory. After
af744f0, the toolchain/clang
specific lib directories were added at the head of the search path.

This would cause lookups of literal file names or relative paths
to match paths in the toolchain, if there are coincidental files
with similar names there, even if they would be find in the current
directory as well.

Change addClangLibSearchPaths to append to the list like all other
operations on searchPaths - but move the invocation of the
function to the right place in the sequence.

This fixes llvm#67779.
mstorsjo added a commit that referenced this issue Oct 2, 2023
… path (#67857)

Before af744f0, the first entry
among the search paths was the empty string, indicating searching
in (or starting from) the current directory. After
af744f0, the toolchain/clang
specific lib directories were added at the head of the search path.

This would cause lookups of literal file names or relative paths
to match paths in the toolchain, if there are coincidental files
with similar names there, even if they would be find in the current
directory as well.

Change addClangLibSearchPaths to append to the list like all other
operations on searchPaths - but move the invocation of the
function to the right place in the sequence.

This fixes #67779.
@mstorsjo
Copy link
Member

mstorsjo commented Oct 2, 2023

Reopening for reusing the PR for backport. The main fix should have landed now (which should be enough for avoiding the issue), I'll request backporting that. We'll see about backporting the others once they're landed.

(Also, @tru, I know we're slated to release 17.0.2 tomorrow - if you feel this is too soon to be backported before that, feel free to defer it until after the release to let it cook in main for a few days. Although I believe these changes should be fairly risk free.)

@mstorsjo
Copy link
Member

mstorsjo commented Oct 2, 2023

/cherry-pick 7d7d9e4 f906fd5

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

Failed to cherry-pick: f906fd5

https://github.com/llvm/llvm-project/actions/runs/6379152727

Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:

/branch <user>/<repo>/<branch>

@mstorsjo
Copy link
Member

mstorsjo commented Oct 2, 2023

Ah, apparently a reformatting in a5e280b affects the cherrypicking. While the reformatting isn't very small and localized, I think it's easiest to cherrypick it to the release branch (it applies cleanly); that also reduces the risk of conflicts for other future backports.

/cherry-pick a5e280b 7d7d9e4 f906fd5

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

/branch llvm/llvm-project-release-prs/issue67779

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

/branch llvm/llvm-project-release-prs/issue67779

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

/pull-request llvm/llvm-project-release-prs#719

@tru tru moved this from Needs Triage to Needs Review in LLVM Release Status Oct 2, 2023
@tru tru moved this from Needs Review to Done in LLVM Release Status Oct 2, 2023
tru pushed a commit that referenced this issue Oct 2, 2023
… path (#67857)

Before af744f0, the first entry
among the search paths was the empty string, indicating searching
in (or starting from) the current directory. After
af744f0, the toolchain/clang
specific lib directories were added at the head of the search path.

This would cause lookups of literal file names or relative paths
to match paths in the toolchain, if there are coincidental files
with similar names there, even if they would be find in the current
directory as well.

Change addClangLibSearchPaths to append to the list like all other
operations on searchPaths - but move the invocation of the
function to the right place in the sequence.

This fixes #67779.

(cherry picked from commit f906fd5)
mstorsjo added a commit to mstorsjo/llvm-mingw that referenced this issue Oct 3, 2023
This contains a few bugfixes for mingw use cases:
- LLD in 17.0.1 and the earlier 17.x release candidates looked up
  libraries (including libraries named as a relative path) preferring
  to search in toolchain relative paths over first testing from the
  current directory. In some cases, this could cause cryptic build
  failures; see llvm/llvm-project#67779.
- Options that only are relevant for linking, like -mwindows, used
  to be accepted (but produce a warning) if used in a compilation-only
  job. In the earlier 17.x releases/RCs, such warnings caused an
  error instead; see llvm/llvm-project#64464
  and #371.
@mstorsjo
Copy link
Member

Suggested change 3

For mingw contexts, we probably shouldn't be doing any of this logic for auto-adding search paths. In mingw contexts, the linker is always invoked (by the compiler driver) with suitable -L parameters specifying the intended search order. So there, we strictly don't need this logic at all.

I made a patch to do this some months ago, https://reviews.llvm.org/D144084, but when applied, this struck an issue where some compiler-rt test expects to invoke LLD in an MSVC environment, but with -lldmingw to opt in to mingw style handling of weak symbols - this testcase relies on autofinding MSVC libraries while enabling mingw mode. So we probably should separate these concerns a bit, either add a separate flag for -lldmingw-style-weak-symbols, or make a new flag -lld-no-auto-libpaths, which the lld mingw frontend always passes in.

I just haven't had a real demand for going forward with this patch so I haven't quite had time, but I guess this is a good case for it.

FYI, this patch just landed in 241c290.

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2023

@llvm/issue-subscribers-lld-coff

Author: Tomas Kalibera (kalibera)

When libraries are passed to the linker as relative paths (e.g. "../../../../lib/libLLVMSupport.a"), they are sometimes looked for relative to the system installation path (e.g. "/something/usr/x86_64-pc-linux-gnu/lib/clang/17/lib/../../../../lib/libLLVMSupport.a"). Instead, they should be looked up always relative to the current working directory.

I've ran into this when cross-compiling a native LLVM compiler (to run on Windows/aarch64 and produce code for Windows/aarch64), using a cross-compiler running on Linux/x86_64. Linking failed due to errors like "ld.lld: error: unknown file type: CommandLine.cpp.o". The correct "libLLVMSupport.a" library has been available relatively to the current working directory in "../../../../lib/libLLVMSupport.a", but the linker instead used an incompatible version (for Linux/x86_64) which happened to be located in "/something/usr/x86_64-pc-linux-gnu/lib/clang/17/lib/../../../../lib/libLLVMSupport.a". When I deleted that incompatible library from that location, the linking passed as it should.

I'm attaching a narrowed-down example based on this problem. On my system, with that "incompatible" library in place,

cd test/sub/sub/sub/sub/
./test1

does

/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/bin/aarch64-w64-mingw32.static.posix-clang++ \
TableGen.cpp.obj \
../../../../lib/libLLVMSupport.a

and outputs

ld.lld: error: unknown file type: CommandLine.cpp.o
ld.lld: error: unknown file type: ManagedStatic.cpp.o
ld.lld: error: unknown file type: PrettyStackTrace.cpp.o
ld.lld: error: unknown file type: raw_ostream.cpp.o
ld.lld: error: unknown file type: Signals.cpp.o
clang: error: linker command failed with exit code 1 (use -v to see invocation) 

With the "incompatible" library not in place, the output includes only complaints about undefined symbols, as it should, because I've narrowed down the example. The archive includes also outputs from strace, which make it clear that lld opens the "incompatible" library at the wrong place:

openat(AT_FDCWD, "/home/tomas/toolchain/clang/toolchain_libs/mxe/usr/x86_64-pc-linux-gnu/lib/clang/17/lib/../../../../lib/libLLVMSupport.a", O_RDONLY|O_CLOEXEC) = 4

The archive also includes the "incompatible" version of the library one could place in the corresponding place to reproduce the problem.

test.zip

@mstorsjo
@tru @rnk @zmodem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment