
homoglyphs translation to ASCII #348


Closed
rcbarnett-zz opened this issue Oct 17, 2013 · 10 comments

@rcbarnett-zz
Contributor

MODSEC-194: It would be useful to have a filter that converts all homoglyphs to their ASCII (or Latin?) equivalents.
This would be useful to stop SQL smuggling.
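To make the idea concrete, here is a minimal sketch in Python (an illustration, not ModSecurity code) of what such a filter might do. NFKC normalization already folds compatibility characters such as fullwidth Latin letters to ASCII, but visually confusable letters from other scripts survive it, so an explicit table is needed; the `CONFUSABLES` entries below are a hypothetical three-character subset.

```python
import unicodedata

# Hypothetical subset of a confusables table; a real filter would need
# a far larger mapping (cf. the Unicode confusables data).
CONFUSABLES = {"\u0455": "s", "\u0430": "a", "\u0435": "e"}  # Cyrillic s, a, e

def fold_homoglyphs(text: str) -> str:
    # NFKC folds compatibility characters (fullwidth forms, ligatures)
    # to their ASCII equivalents, e.g. U+FF53 FULLWIDTH LATIN SMALL S -> "s"
    nfkc = unicodedata.normalize("NFKC", text)
    # Visually confusable letters from other scripts are untouched by NFKC,
    # so they are folded with the explicit table.
    return "".join(CONFUSABLES.get(ch, ch) for ch in nfkc)

print(fold_homoglyphs("\uff53\uff45\uff4c\uff45\uff43\uff54"))  # fullwidth -> "select"
print(fold_homoglyphs("\u0455elect"))                            # Cyrillic s -> "select"
```

Both calls yield the plain ASCII keyword "select", which a SQL-injection rule can then match.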

@rcbarnett-zz
Contributor Author

Original reporter: marcstern

@rcbarnett-zz
Contributor Author

rbarnett: Agreed. Two comments -

  1. We are looking into implementing something similar to Snort's unicode.map file for conversions
    http://cvs.snort.org/viewcvs.cgi/*checkout*/snort/etc/unicode.map?rev=HEAD&content-type=text/plain

  2. In the meantime, the latest CRS v2.1.1 has the BETA advanced_filter_converter.lua script that is used to normalize many of the same issues. This file is the Lua port of the PHPIDS Converter.PHP logic, which combats many of these evasion attempts. The Lua script is used by the newly named modsecurity_crs_41_advanced_filters.conf file -
    http://mod-security.svn.sourceforge.net/viewvc/mod-security/crs/trunk/experimental_rules/modsecurity_crs_41_advanced_filters.conf

@rcbarnett-zz
Contributor Author

marcstern: Also, extended characters like %u2329 should be supported. Currently, the lowest byte is zeroed which inhibits the parsing of these characters.
Should I open a new bug?
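The behavior being asked for can be sketched in a few lines of Python (an illustration of the desired decoding, not the actual ModSecurity implementation): a %uXXXX decoder must keep the full 16-bit code point rather than dropping or zeroing the low byte.

```python
import re

def url_decode_uni(s: str) -> str:
    # Decode IIS-style %uXXXX escapes, keeping the full 16-bit code point.
    # An implementation that zeroes the lowest byte would turn %u2329
    # into U+2300 instead of U+2329, breaking later matching.
    return re.sub(r"%u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), s)

print(url_decode_uni("a%u2329b"))  # "a\u2329b" (U+2329 LEFT-POINTING ANGLE BRACKET)
```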

@rcbarnett-zz
Contributor Author

rbarnett: We might be able to extend t:urlDecodeUni to better handle this issue. For example, we could do different Unicode mappings using the data found here -

http://www.lookout.net/2010/12/20/list-of-characters-for-testing-unicode-transformations-and-best-fit-mapping-to-dangerous-ascii/
http://www.lookout.net/wp-content/uploads/2010/12/uni2asc.csv
http://www.lookout.net/wp-content/uploads/2010/12/bestfit.csv
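A best-fit table like the CSVs above could be loaded along these lines; the two-column layout (hex source code point, ASCII target) used in the sample is an assumption for illustration, not the actual format of those files.

```python
import csv
import io

# Hypothetical two-column layout: source code point (hex), ASCII target.
SAMPLE = """\
source,target
FF1C,<
FF1E,>
02B9,'
"""

def load_bestfit(fileobj):
    # Build a char -> ASCII replacement table from the CSV.
    table = {}
    for row in csv.DictReader(fileobj):
        table[chr(int(row["source"], 16))] = row["target"]
    return table

def apply_bestfit(text, table):
    return "".join(table.get(ch, ch) for ch in text)

table = load_bestfit(io.StringIO(SAMPLE))
print(apply_bestfit("\uff1cscript\uff1e", table))  # fullwidth brackets -> "<script>"
```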

@ghost ghost assigned zimmerle Oct 17, 2013
@csanders-git

@zimmerle why was this abandoned? It'd be cool to do homoglyph detection; perhaps we can do this in a CRS rule. @dune73, thoughts?

@dune73
Member

dune73 commented Jun 10, 2017

Sure, I think it would be great to do this, but it sounds very tricky. It's certainly more flexible if done within a rule, but maybe it is too expensive and should be covered by ModSec itself.

Also, I lack the know-how about much of this encoding and homoglyph stuff. So a couple of attack payload examples would help me, and probably some others, to look at this from a practical viewpoint.

@marcstern

I think I can help here.
There are several pre-requisites & limitations.

Pre-requisites:

  1. Let's assume that only UTF-8 is used and we block bad UTF-8 encoding (if you have to accept something else, I think it's game over)
  2. We map all Unicode characters to US-ASCII:
    SecUnicodeMapFile {...}/unicode.mapping 20127
  3. We use t:utf8toUnicode (+ t:urlDecodeUni if needed)
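To make the SecUnicodeMapFile step above concrete, here is a sketch of loading and applying a mapping table. The whitespace-separated "src:dst" hex-pair syntax is an assumption modeled loosely on ModSecurity's unicode.mapping file, not a parser for the real format.

```python
def load_unicode_map(text: str) -> dict:
    # Assumed syntax: whitespace-separated "src:dst" hex pairs,
    # e.g. "00e9:65 0455:73" maps e-acute -> e and Cyrillic s -> s.
    table = {}
    for pair in text.split():
        src, dst = pair.split(":")
        table[int(src, 16)] = chr(int(dst, 16))
    return table

def map_to_ascii(text: str, table: dict) -> str:
    # Replace each mapped code point; leave everything else untouched.
    return "".join(table.get(ord(ch), ch) for ch in text)

table = load_unicode_map("00e9:65 0455:73")
print(map_to_ascii("s\u00e9lect \u0455elect", table))  # "select select"
```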

Limitations:

  1. The current file "unicode.mapping" is highly incomplete.
    We have an extended version (more or less exhaustive) that I generated automatically and updated manually.
    This file is not public yet because I consider it potentially not 100% correct and I don't want to distribute this information that we use in highly sensitive environments to attackers.
    It needs to be reviewed by several people but, most of all, the mapping principle should be validated: which characters should be mapped? For accented letters, it's obvious, but what about Greek characters, for instance? Should they be mapped to a Latin letter? What about the characters 02C5 (MODIFIER LETTER DOWN ARROWHEAD) & 02C7 (CARON)? Should they be mapped to a V?
    In order to answer that, I think we need an exhaustive list of the back-end systems (app servers and DB) that perform this kind of mapping and to adapt the list consistently.
    Potentially, we need to create several entries, one for each back-end.
    If we can construct complete requirements, I'll complete it and share it with everybody.
  2. In case we have different code mappings dependent on the back-end, that means that we can only support one back-end per WAF, as SecUnicodeMapFile is a global setting.
  3. Even if all of the above points are solved, htmlEntityDecode does not support extended characters. We should extend it to have a complete solution: this should be automatic when using utf8toUnicode (like urlDecodeUni), or, potentially, we need a new transformation "htmlEntityDecodeUni"
  4. Unless there's an optimisation performed in htmlEntityDecode, we (maybe) need to use it twice:
    t:utf8toUnicode,t:urlDecodeUni,t:htmlEntityDecode,t:utf8toUnicode
    because a Unicode character could be encoded as an HTML entity, and vice versa - to be validated (as our parsing is maybe paranoid)
  5. The discussion in point 4 should also be validated for sqlHexDecode
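The layered-encoding concern in points 3 and 4 can be demonstrated with Python's standard-library decoders (a sketch of the decode ordering only, not of the ModSecurity transformations): an HTML entity can carry a full Unicode code point, and the entity itself may arrive URL-encoded, so each decoding pass can expose another layer.

```python
import html
import urllib.parse

raw = "%26%23x2329%3B"             # URL-encoded form of "&#x2329;"
step1 = urllib.parse.unquote(raw)  # first pass reveals the HTML entity
step2 = html.unescape(step1)       # second pass yields the extended character
print(repr(step1))  # "&#x2329;"
print(repr(step2))  # "\u2329"
```

Reversing the order of the two passes would leave the payload encoded, which is why the transformation chain (and possibly a repeated pass) matters.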

@csanders-git

Hmm, yeah, these are some good points... the transformation system as it exists is kinda not great, is it... just not sure of other options. Likewise, good points about updating the Unicode mapping file; I'm gonna link this issue in an open CRS bug we have on that matter.

@victorhora
Contributor

Maybe updating unicode.map could be eased with something like CLDR transforms, e.g. Cyrillic->Latin

The fact that SecUnicodeMapFile is a global setting is a limitation indeed, but I think something like this can work for some scenarios:

<Location "/mysite/english/home/">
SecUnicodeMapFile unicode.mapping 1215
</Location>

<Location "/mysite/russian/home/">
SecUnicodeMapFile unicode.mapping 20127
</Location>

@marcstern

I think the point is not to convert automatically (that's basically what I did) but to know

  1. where, in the back-end, it could be translated
  2. what translation is performed by each of these back-ends
