
homoglyphs translation to ASCII #348


Closed
rcbarnett-zz opened this issue Oct 17, 2013 · 10 comments

@rcbarnett-zz
Contributor

MODSEC-194: It would be useful to have a filter that converts all homoglyphs to their ASCII (or Latin?) equivalents.
This would be useful to stop SQL smuggling.
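To make the idea concrete, here is a minimal sketch in Python (an illustration, not ModSecurity code) of what such a filter might do. NFKC normalization already folds compatibility characters such as fullwidth Latin letters to ASCII, but visually confusable letters from other scripts survive it, so an explicit table is needed; the `CONFUSABLES` entries below are a hypothetical three-character subset.

```python
import unicodedata

# Hypothetical subset of a confusables table; a real filter would need
# a far larger mapping (cf. the Unicode confusables data).
CONFUSABLES = {"\u0455": "s", "\u0430": "a", "\u0435": "e"}  # Cyrillic s, a, e

def fold_homoglyphs(text: str) -> str:
    # NFKC folds compatibility characters (fullwidth forms, ligatures)
    # to their ASCII equivalents, e.g. U+FF53 FULLWIDTH LATIN SMALL S -> "s"
    nfkc = unicodedata.normalize("NFKC", text)
    # Visually confusable letters from other scripts are untouched by NFKC,
    # so they are folded with the explicit table.
    return "".join(CONFUSABLES.get(ch, ch) for ch in nfkc)

print(fold_homoglyphs("\uff53\uff45\uff4c\uff45\uff43\uff54"))  # fullwidth -> "select"
print(fold_homoglyphs("\u0455elect"))                            # Cyrillic s -> "select"
```

Both calls yield the plain ASCII keyword "select", which a SQL-injection rule can then match.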

@rcbarnett-zz
Contributor Author

Original reporter: marcstern

@rcbarnett-zz
Contributor Author

rbarnett: Agreed. Two comments -

  1. We are looking into implementing something similar to Snort's unicode.map file for conversions
    http://cvs.snort.org/viewcvs.cgi/*checkout*/snort/etc/unicode.map?rev=HEAD&content-type=text/plain

  2. In the meantime, the latest CRS v2.1.1 has the BETA advanced_filter_converter.lua script that is used to normalize many of the same issues. This file is the Lua port of the PHPIDS Converter.PHP logic, which combats many of these evasion attempts. The Lua script is used by the newly named modsecurity_crs_41_advanced_filters.conf file -
    http://mod-security.svn.sourceforge.net/viewvc/mod-security/crs/trunk/experimental_rules/modsecurity_crs_41_advanced_filters.conf

@rcbarnett-zz
Contributor Author

marcstern: Also, extended characters like %u2329 should be supported. Currently, the lowest byte is zeroed which inhibits the parsing of these characters.
Should I open a new bug?
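The behavior being asked for can be sketched in a few lines of Python (an illustration of the desired decoding, not the actual ModSecurity implementation): a %uXXXX decoder must keep the full 16-bit code point rather than dropping or zeroing the low byte.

```python
import re

def url_decode_uni(s: str) -> str:
    # Decode IIS-style %uXXXX escapes, keeping the full 16-bit code point.
    # An implementation that zeroes the lowest byte would turn %u2329
    # into U+2300 instead of U+2329, breaking later matching.
    return re.sub(r"%u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), s)

print(url_decode_uni("a%u2329b"))  # "a\u2329b" (U+2329 LEFT-POINTING ANGLE BRACKET)
```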

@rcbarnett-zz
Contributor Author

rbarnett: We might be able to extend t:urlDecodeUni to better handle this issue. For example, we could do different Unicode mappings using the data found here -

http://www.lookout.net/2010/12/20/list-of-characters-for-testing-unicode-transformations-and-best-fit-mapping-to-dangerous-ascii/
http://www.lookout.net/wp-content/uploads/2010/12/uni2asc.csv
http://www.lookout.net/wp-content/uploads/2010/12/bestfit.csv
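A best-fit table like the CSVs above could be loaded along these lines; the two-column layout (hex source code point, ASCII target) used in the sample is an assumption for illustration, not the actual format of those files.

```python
import csv
import io

# Hypothetical two-column layout: source code point (hex), ASCII target.
SAMPLE = """\
source,target
FF1C,<
FF1E,>
02B9,'
"""

def load_bestfit(fileobj):
    # Build a char -> ASCII replacement table from the CSV.
    table = {}
    for row in csv.DictReader(fileobj):
        table[chr(int(row["source"], 16))] = row["target"]
    return table

def apply_bestfit(text, table):
    return "".join(table.get(ch, ch) for ch in text)

table = load_bestfit(io.StringIO(SAMPLE))
print(apply_bestfit("\uff1cscript\uff1e", table))  # fullwidth brackets -> "<script>"
```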

@ghost ghost assigned zimmerle Oct 17, 2013
@csanders-git

@zimmerle why was this abandoned? It'd be cool to do homoglyph detection; perhaps we can do this in a CRS rule. @dune73, thoughts?

@dune73
Member

dune73 commented Jun 10, 2017

Sure, I think it would be great to do this, but it sounds very tricky. It's certainly more flexible if done within a rule, but maybe it is too expensive and should be covered by ModSec itself.

Also, I lack the know-how about much of this encoding and homoglyph stuff. So a couple of attack payload examples would help me, and probably some others, to look at this from a practical viewpoint.

@marcstern

I think I can help here.
There are several pre-requisites & limitations.

Pre-requisites:

  1. Let's assume that only UTF-8 is used and we block bad UTF-8 encoding (if you have to accept something else, I think it's game over)
  2. We map all Unicode characters to US-ASCII:
    SecUnicodeMapFile {...}/unicode.mapping 20127
  3. We use t:utf8toUnicode (+ t:urlDecodeUni if needed)
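To make the SecUnicodeMapFile step above concrete, here is a sketch of loading and applying a mapping table. The whitespace-separated "src:dst" hex-pair syntax is an assumption modeled loosely on ModSecurity's unicode.mapping file, not a parser for the real format.

```python
def load_unicode_map(text: str) -> dict:
    # Assumed syntax: whitespace-separated "src:dst" hex pairs,
    # e.g. "00e9:65 0455:73" maps e-acute -> e and Cyrillic s -> s.
    table = {}
    for pair in text.split():
        src, dst = pair.split(":")
        table[int(src, 16)] = chr(int(dst, 16))
    return table

def map_to_ascii(text: str, table: dict) -> str:
    # Replace each mapped code point; leave everything else untouched.
    return "".join(table.get(ord(ch), ch) for ch in text)

table = load_unicode_map("00e9:65 0455:73")
print(map_to_ascii("s\u00e9lect \u0455elect", table))  # "select select"
```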

Limitations:

  1. The current file "unicode.mapping" is highly incomplete.
    We have an extended version (more or less exhaustive) that I generated automatically and updated manually.
    This file is not public yet because I consider it potentially not 100% correct and I don't want to distribute this information that we use in highly sensitive environments to attackers.
    It needs to be reviewed by several people but, most of all, the mapping principle should be validated: which characters should be mapped? For accented letters, it's obvious, but what about Greek characters, for instance? Should they be mapped to a Latin letter? What about the characters 02C5 (MODIFIER LETTER DOWN ARROWHEAD) & 02C7 (CARON)? Should they be mapped to a V?
    In order to answer that, I think we need an exhaustive list of the back-end systems (app servers and DB) that perform this kind of mapping and to adapt the list consistently.
    Potentially, we need to create several entries, one for each back-end.
    If we can construct complete requirements, I'll complete it and share it with everybody.
  2. In case we have different code mappings dependent on the back-end, that means that we can only support one back-end per WAF, as SecUnicodeMapFile is a global setting.
  3. Even if all of the above points are solved, htmlEntityDecode does not support extended characters. We should extend it to have a complete solution: this should be automatic when using utf8toUnicode (like urlDecodeUni), or, potentially, we need a new transformation "htmlEntityDecodeUni"
  4. Unless there's an optimisation performed in htmlEntityDecode, we (maybe) need to use it twice:
    t:utf8toUnicode,t:urlDecodeUni,t:htmlEntityDecode,t:utf8toUnicode
    because a Unicode character could be encoded as an HTML entity, and vice versa - to be validated (as our parsing is maybe paranoid)
  5. The discussion in point 4 should also be validated for sqlHexDecode
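The layered-encoding concern in points 3 and 4 can be demonstrated with Python's standard-library decoders (a sketch of the decode ordering only, not of the ModSecurity transformations): an HTML entity can carry a full Unicode code point, and the entity itself may arrive URL-encoded, so each decoding pass can expose another layer.

```python
import html
import urllib.parse

raw = "%26%23x2329%3B"             # URL-encoded form of "&#x2329;"
step1 = urllib.parse.unquote(raw)  # first pass reveals the HTML entity
step2 = html.unescape(step1)       # second pass yields the extended character
print(repr(step1))  # "&#x2329;"
print(repr(step2))  # "\u2329"
```

Reversing the order of the two passes would leave the payload encoded, which is why the transformation chain (and possibly a repeated pass) matters.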

@csanders-git

Hmm, yeah, these are some good points... the transformation system as it exists is kinda not great, is it... just not sure of other options. Likewise, good points about updating the Unicode mapping file; I'm gonna link this issue in an open CRS bug we have on that matter.

@victorhora
Contributor

Maybe updating unicode.map could be eased with something like CLDR transforms, e.g. Cyrillic->Latin

The fact that SecUnicodeMapFile is a global setting is a limitation indeed, but I think something like this can work for some scenarios:

<Location "/mysite/english/home/">
SecUnicodeMapFile unicode.mapping 1215
</Location>

<Location "/mysite/russian/home/">
SecUnicodeMapFile unicode.mapping 20127
</Location>

@marcstern

I think the point is not to convert automatically (that's basically what I did) but to know

  1. where, in the back-end, it could be translated
  2. what translation is performed by each of these back-ends
