Changed LaTeX $\bot$s to ⊥ #19998

Merged: 1 commit, Jan 6, 2015

th0114nd
Contributor

In the HTML version of the documentation, it isn't rendered so might as well use the unicode representation.

@th0114nd th0114nd changed the title from "Changed LaTex $\bot$s to ⊥" to "Changed LaTeX $\bot$s to ⊥" on Dec 18, 2014
@steveklabnik
Member

We've previously rejected this patch because, with it, the build doesn't actually succeed. Have you verified this works?

@th0114nd
Contributor Author

Sorry, I hadn't realized; I thought it was easy. So for successful inclusion, we would like an appropriately rendered $\bot$ or the UTF-8 character '⊥' (\u22A5) on the page, or the HTML entity for ⊥ in the source. From my preliminary experiments with pandoc --from=markdown --to=html5 --number-sections -o reference.html reference.md (which I'm not even confident is the appropriate build command), any of $\bot$, a literal ⊥, or the HTML entity in the markdown file is translated by pandoc to a literal ⊥ in the HTML file, and browsers (at least Chrome and Safari) then display that ⊥ as a garbled three-character sequence instead of the single character.
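
The garbling described above is what you get when UTF-8 bytes are decoded with a single-byte encoding. A minimal Python sketch, assuming the browser fell back to Windows-1252 (the thread does not say which encoding was actually used, and the helper name `mojibake` is made up for illustration):

```python
# Assumption: the page was decoded as Windows-1252 (cp1252) because no
# charset was declared. The thread does not confirm the fallback encoding.
def mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


up_tack = "\u22a5"  # the character ⊥
garbled = mojibake(up_tack)

print(up_tack.encode("utf-8").hex(" "))  # e2 8a a5: three bytes for one character
print(len(garbled), garbled)             # three separate characters after mis-decoding
```

Declaring the charset, or emitting an ASCII-only entity, avoids the mis-decoding in either case.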

One potentially overkill solution is a pandoc filter or a sed script that replaces each literal ⊥ in reference.html with its HTML entity, but I'm sure there's something simpler.
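
For illustration, the post-processing idea (rewriting the literal character to an entity in the generated HTML) could look roughly like this in Python rather than sed. The function name and the sample markup are made up; the numeric reference is chosen only because it can be computed from the code point, not because the thread settled on it:

```python
import html

UP_TACK = "\u22a5"  # the character ⊥


def escape_up_tack(page: str) -> str:
    """Replace every literal ⊥ with an ASCII-only numeric character reference."""
    return page.replace(UP_TACK, f"&#{ord(UP_TACK)};")


sample = "<p>bottom: \u22a5</p>"  # hypothetical snippet, not taken from reference.html
escaped = escape_up_tack(sample)
print(escaped)

# Round trip: a browser (or html.unescape) turns the reference back into ⊥.
assert html.unescape(escaped) == sample
```

Because the output is pure ASCII, it displays correctly whatever charset the browser guesses.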

@th0114nd
Contributor Author

Adding the --ascii flag to pandoc works, but it leaves a bad taste in your mouth because it seems inappropriate to ignore problems with UTF-8 characters by omitting them where possible.

@th0114nd
Contributor Author

Even better, adding <meta http-equiv="Content-Type" content="text/x-markdown; charset=utf-8"> to reference.md fixes it and allows general UTF-8 characters to be embedded in the markdown source.
With e639974, the generation of html should work but I'm not sure about pdf.

@steveklabnik
Member

> I thought it was easy.

Me too 😭

We don't use pandoc, we use xelatex, iirc. I think there's actually two or three of them.

make docs will build the docs, so give that a try.

@th0114nd
Contributor Author

So the three output types are html, epub, and latex/pdf, and I found the commands for them with

make docs --dry-run | grep reference

Making the html file with

$ DYLD_LIBRARY_PATH=/Users/tim/rust/x86_64-apple-darwin/stage2/lib rustdoc --html-before-content=doc/version_info.html --html-in-header=doc/favicon.inc --html-after-content=doc/footer.inc --markdown-playground-url='http://play.rust-lang.org/' --markdown-css rust.css src/doc/reference.md

does specify the charset as utf-8, so using ⊥ in the markdown file works. A $\bot$ in the markdown file, however, stays as the literal text $\bot$ in the html file. However, using

$ pandoc --from=markdown --to=html5 --number-sections -o reference.html reference.md

works with either ⊥ or $\bot$, as long as the charset is specified as utf-8.

Making the epub with

pandoc --standalone --toc --number-sections --to=epub src/doc/reference.md --output=doc/reference.epub

changes both ⊥ and $\bot$ in the markdown file to ⊥ in the epub.

Making the pdf with

pandoc --standalone --toc --number-sections  --from=markdown --to=latex src/doc/reference.md --output=doc/reference.tex
lualatex -interaction=batchmode -output-directory=doc doc/reference.tex

translates $\bot$ to \(\bot\) and leaves ⊥ as a literal ⊥ in the tex file. \(\bot\) is rendered appropriately, but the literal ⊥ does not show up in the pdf.

@th0114nd
Contributor Author

However, with \usepackage{unicode-math} (available in texlive-math-extra) $⊥$ is appropriately rendered in the pdf.

Even better: putting the following in reference.tex

\usepackage{newunicodechar}
\newunicodechar{⊥}{{$\bot$}}

enables the literal ⊥ in text mode to be rendered properly in the pdf. Now it's just a matter of figuring out how to include those lines in reference.tex.

@adrientetar
Contributor

> Now it's just a matter of figuring how to include those lines in reference.tex

Makefile, --include-in-header
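
A rough sketch of what that suggestion amounts to, written in Python so it stays self-contained: the two newunicodechar lines go into a small header fragment, and pandoc's `--include-in-header` option splices it into the generated preamble. The file name `unicode-header.tex` and the helper `pandoc_latex_command` are hypothetical, and the script only assembles the command rather than running pandoc; the actual Makefile change in this PR may look different.

```python
import tempfile
from pathlib import Path

# The two preamble lines proposed earlier in the thread.
HEADER_TEX = "\\usepackage{newunicodechar}\n\\newunicodechar{⊥}{{$\\bot$}}\n"


def pandoc_latex_command(header: Path) -> list[str]:
    """Build the markdown-to-LaTeX pandoc call, adding the header fragment."""
    return [
        "pandoc", "--standalone", "--toc", "--number-sections",
        "--from=markdown", "--to=latex",
        f"--include-in-header={header}",
        "src/doc/reference.md", "--output=doc/reference.tex",
    ]


with tempfile.TemporaryDirectory() as tmp:
    header = Path(tmp) / "unicode-header.tex"  # hypothetical file name
    header.write_text(HEADER_TEX, encoding="utf-8")
    # Printed only; pass the list to subprocess.run to actually invoke pandoc.
    print(" ".join(pandoc_latex_command(header)))
```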

@th0114nd
Contributor Author

After b83c0e5, the html, pdf, and epub files generate correctly with ⊥.

@steveklabnik
Member

This looks good to me! I'm not as good with makefile stuff though, so I'd like @alexcrichton or @brson to take a look.

In the meantime, would you mind squashing? And adding that this closes #15285 would be nice too :)

@adrientetar
Contributor

What we have here is mostly a workaround but we should report it to jgm/pandoc (assuming that the latest version of pandoc still has the bug – bots run an older version).

@th0114nd
Contributor Author

Hi @adrientetar: I'm not so sure that it is a bug, or if it is it's one that would be fairly difficult to correct.
For example, α or ポ would (and should) get transcribed literally to the same characters in a .tex file, and it is up to the chosen TeX font to render them correctly. In some contexts, α should be a math-mode \alpha, but if it is in Greek writing it should be left alone as the UTF-8 character.

In order to fix the bug, pandoc would have to recognize a particular character as being a math mode one, and detexify the char to find a macro that describes it.
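
To make the difficulty concrete, here is a toy sketch of such a lookup. The table `MATH_MACROS` and the function `texify` are invented for illustration and are not how pandoc works; the point is that α and ポ cannot safely go in the table, because whether they are math or prose depends on context.

```python
# Hypothetical, deliberately tiny table. A real converter would need far more
# entries and some way to tell prose (e.g. Greek text) apart from math.
MATH_MACROS = {
    "\u22a5": r"\bot",     # ⊥
    "\u2200": r"\forall",  # ∀
    "\u2208": r"\in",      # ∈
}


def texify(text: str) -> str:
    """Wrap each known math-only character in inline math; leave the rest alone."""
    return "".join(
        rf"\({MATH_MACROS[ch]}\)" if ch in MATH_MACROS else ch
        for ch in text
    )


print(texify("type ⊥, Greek α, kana ポ"))
```

Only the characters listed in the table are rewritten; everything else passes through unchanged and is left to the font.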

@adrientetar
Contributor

Afaict:

  1. pandoc usually does a fair amount of magic to keep formats compatible with each other. If you just look at the LaTeX it outputs, there are multiple compatibility packages and declarations. It isn't outside the scope of the project to provide this kind of functionality.
  2. With LaTeX engines using native unicode input and default font being Latin Modern, I don't think that this should be a problem.

> pandoc would have to recognize a particular character as being a math mode one, and detexify the char to find a macro that describes it

In that case I think it'll just use math mode since it's wrapped in dollar signs.

@th0114nd
Contributor Author

To be clear, ⊥ isn't wrapped in dollar signs in the latest commit. rustdoc doesn't (currently) interpret $ as starting math mode, so they don't really make much sense to include.

As to 2, I would think so but it doesn't seem to be the case: http://tex.stackexchange.com/questions/218746/is-it-possible-to-use-literals-like-%E2%8A%A5-outside-of-math-mode/218770?noredirect=1#comment513352_218770

In the HTML version of the documentation, it isn't rendered so might as well use the unicode representation.
Part of the problem was that putting a math unicode character wasn't
rendering properly in the pdf, so extra steps were needed to define
the unicode character ⊥ in reference.tex

closes rust-lang#15285
@emberian
Member

emberian commented Jan 5, 2015

@adrientetar @steveklabnik is this good to go?

@steveklabnik
Member

@alexcrichton says the makefile looks good, and so do I. Not rolling up because of makefile changes.

bors added a commit that referenced this pull request Jan 5, 2015
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Jan 6, 2015
@bors bors merged commit 4ee73a1 into rust-lang:master Jan 6, 2015