Commit 6cc4212

Updates for RFC: Locale-independent case conversion
* Add a detailed description of what is meant by ASCII case conversion to strtolower() and strtoupper().
* Replace locale-related text on ucfirst(), lcfirst() and ucwords() with text about ASCII case conversion.
* Remove entity note.locale-single-byte since it was only used on ucwords().
* Add changelog entries to all the functions mentioned in the RFC.
1 parent 48b0774 commit 6cc4212
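
The behaviour this commit documents can be illustrated with a short sketch. It is not part of the commit; it assumes PHP 8.2 or later and a source file saved as UTF-8, and the sample strings are made up for illustration. The `mb_strtolower()` call additionally assumes the mbstring extension is loaded.

```php
<?php
// Illustrative sketch only: assumes PHP 8.2+ and UTF-8 source encoding.

$s = "Äpfel und ÖL";

// Only bytes 0x41-0x5A ("A"-"Z") or 0x61-0x7A ("a"-"z") are changed.
// The two-byte UTF-8 sequences for "Ä" and "Ö" pass through untouched.
$lower = strtolower($s);    // "Äpfel und Öl"
$upper = strtoupper($s);    // "ÄPFEL UND ÖL"

// ucfirst() leaves a leading non-ASCII character alone, because its
// first byte (0xC3) is outside the range "a" to "z".
$first = ucfirst("élan");   // "élan"

// Byte-level view of the ASCII conversion: "A" (0x41) + 32 = "a" (0x61).
$byte = chr(ord("A") + 32); // "a"

// For multibyte characters, the mbstring functions are the documented
// alternative (assumption: the extension is available).
$mbLower = function_exists('mb_strtolower')
    ? mb_strtolower($s, 'UTF-8') // "äpfel und öl"
    : null;
```

Because the conversion never touches bytes above 0x7F, it cannot corrupt a multibyte UTF-8 sequence, which is why the updated text describes these functions as usable on UTF-8 strings.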

12 files changed, with 203 additions and 34 deletions

language-snippets.ent

Lines changed: 0 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -28,11 +28,6 @@ cryptographically secure value, consider using <function>random_int</function>,
2828
<!ENTITY note.bin-safe '<note xmlns="http://docbook.org/ns/docbook"><simpara>This function is
2929
binary-safe.</simpara></note>'>
3030

31-
<!ENTITY note.locale-single-byte '<note xmlns="http://docbook.org/ns/docbook"><simpara>This function is locale-aware
32-
and will handle input according to the currently set locale. However, it only works on single-byte character sets.
33-
If you need to use multibyte characters (most non-western-European languages) look at the
34-
<link linkend="book.mbstring">multibyte</link> or <link linkend="book.intl">intl</link> extensions instead.</simpara></note>'>
35-
3631
<!ENTITY note.clearstatcache '<note xmlns="http://docbook.org/ns/docbook"><simpara>The results of this
3732
function are cached. See <function>clearstatcache</function> for
3833
more details.</simpara></note>'>

reference/array/constants.xml

Lines changed: 8 additions & 6 deletions
@@ -15,7 +15,8 @@
1515
<constant>CASE_LOWER</constant> is used with
1616
<function>array_change_key_case</function> and is used to convert array
1717
keys to lower case. This is also the default case for
18-
<function>array_change_key_case</function>.
18+
<function>array_change_key_case</function>. Since PHP 8.2.0, only ASCII
19+
characters will be converted.
1920
</simpara>
2021
</listitem>
2122
</varlistentry>
@@ -28,7 +29,8 @@
2829
<simpara>
2930
<constant>CASE_UPPER</constant> is used with
3031
<function>array_change_key_case</function> and is used to convert array
31-
keys to upper case.
32+
keys to upper case. Since PHP 8.2.0, only ASCII characters will be
33+
converted.
3234
</simpara>
3335
</listitem>
3436
</varlistentry>
@@ -130,10 +132,10 @@
130132
</term>
131133
<listitem>
132134
<simpara>
133-
<constant>SORT_FLAG_CASE</constant> can be combined
134-
(bitwise OR) with
135-
<constant>SORT_STRING</constant> or
136-
<constant>SORT_NATURAL</constant> to sort strings case-insensitively.
135+
<constant>SORT_FLAG_CASE</constant> can be combined (bitwise OR) with
136+
<constant>SORT_STRING</constant> or <constant>SORT_NATURAL</constant> to
137+
sort strings case-insensitively. Since PHP 8.2.0, only ASCII case folding
138+
will be done.
137139
</simpara>
138140
</listitem>
139141
</varlistentry>

reference/strings/functions/lcfirst.xml

Lines changed: 25 additions & 6 deletions
@@ -15,12 +15,8 @@
1515
<para>
1616
Returns a string with the first character of
1717
<parameter>string</parameter> lowercased if that character is
18-
alphabetic.
19-
</para>
20-
<para>
21-
Note that 'alphabetic' is determined by the current locale. For
22-
instance, in the default "C" locale characters such as umlaut-a
23-
(ä) will not be converted.
18+
an ASCII character in the range <literal>"A"</literal> (0x41) to
19+
<literal>"Z"</literal> (0x5a).
2420
</para>
2521
</refsect1>
2622

@@ -47,6 +43,29 @@
4743
</para>
4844
</refsect1>
4945

46+
<refsect1 role="changelog">
47+
&reftitle.changelog;
48+
<informaltable>
49+
<tgroup cols="2">
50+
<thead>
51+
<row>
52+
<entry>&Version;</entry>
53+
<entry>&Description;</entry>
54+
</row>
55+
</thead>
56+
<tbody>
57+
<row>
58+
<entry>8.2.0</entry>
59+
<entry>
60+
Case conversion no longer depends on the locale set with
61+
<function>setlocale</function>. Only ASCII characters will be converted.
62+
</entry>
63+
</row>
64+
</tbody>
65+
</tgroup>
66+
</informaltable>
67+
</refsect1>
68+
5069
<refsect1 role="examples">
5170
&reftitle.examples;
5271
<para>

reference/strings/functions/setlocale.xml

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@
6262
<listitem>
6363
<simpara>
6464
<constant>LC_CTYPE</constant> for character classification and conversion, for
65-
example <function>strtoupper</function>
65+
example <function>ctype_alpha</function>
6666
</simpara>
6767
</listitem>
6868
<listitem>

reference/strings/functions/str-ireplace.xml

Lines changed: 24 additions & 0 deletions
@@ -97,6 +97,30 @@
9797
</para>
9898
</refsect1>
9999

100+
<refsect1 role="changelog">
101+
&reftitle.changelog;
102+
<informaltable>
103+
<tgroup cols="2">
104+
<thead>
105+
<row>
106+
<entry>&Version;</entry>
107+
<entry>&Description;</entry>
108+
</row>
109+
</thead>
110+
<tbody>
111+
<row>
112+
<entry>8.2.0</entry>
113+
<entry>
114+
Case folding no longer depends on the locale set with
115+
<function>setlocale</function>. Only ASCII case folding will be done.
116+
Non-ASCII bytes will be compared by their byte value.
117+
</entry>
118+
</row>
119+
</tbody>
120+
</tgroup>
121+
</informaltable>
122+
</refsect1>
123+
100124
<refsect1 role="examples">
101125
&reftitle.examples;
102126
<para>

reference/strings/functions/stripos.xml

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,14 @@
8484
</row>
8585
</thead>
8686
<tbody>
87+
<row>
88+
<entry>8.2.0</entry>
89+
<entry>
90+
Case folding no longer depends on the locale set with
91+
<function>setlocale</function>. Only ASCII case folding will be done.
92+
Non-ASCII bytes will be compared by their byte value.
93+
</entry>
94+
</row>
8795
<row>
8896
<entry>8.0.0</entry>
8997
<entry>

reference/strings/functions/stristr.xml

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,14 @@
7676
</row>
7777
</thead>
7878
<tbody>
79+
<row>
80+
<entry>8.2.0</entry>
81+
<entry>
82+
Case folding no longer depends on the locale set with
83+
<function>setlocale</function>. Only ASCII case folding will be done.
84+
Non-ASCII bytes will be compared by their byte value.
85+
</entry>
86+
</row>
7987
<row>
8088
<entry>8.0.0</entry>
8189
<entry>

reference/strings/functions/strripos.xml

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -98,6 +98,14 @@
9898
</row>
9999
</thead>
100100
<tbody>
101+
<row>
102+
<entry>8.2.0</entry>
103+
<entry>
104+
Case folding no longer depends on the locale set with
105+
<function>setlocale</function>. Only ASCII case folding will be done.
106+
Non-ASCII bytes will be compared by their byte value.
107+
</entry>
108+
</row>
101109
<row>
102110
<entry>8.0.0</entry>
103111
<entry>

reference/strings/functions/strtolower.xml

Lines changed: 32 additions & 4 deletions
@@ -13,13 +13,18 @@
1313
<methodparam><type>string</type><parameter>string</parameter></methodparam>
1414
</methodsynopsis>
1515
<para>
16-
Returns <parameter>string</parameter> with all alphabetic characters
16+
Returns <parameter>string</parameter> with all ASCII alphabetic characters
1717
converted to lowercase.
1818
</para>
1919
<para>
20-
Note that 'alphabetic' is determined by the current locale. This means
21-
that e.g. in the default "C" locale, characters such as umlaut-A
22-
(Ä) will not be converted.
20+
Bytes in the range <literal>"A"</literal> (0x41) to <literal>"Z"</literal>
21+
(0x5a) will be converted to the corresponding lowercase letter by adding 32
22+
to each byte value.
23+
</para>
24+
<para>
25+
This can be used to convert ASCII characters within strings encoded with
26+
UTF-8, since multibyte UTF-8 characters will be ignored. To convert multibyte
27+
non-ASCII characters, use <function>mb_strtolower</function>.
2328
</para>
2429
</refsect1>
2530

@@ -46,6 +51,29 @@
4651
</para>
4752
</refsect1>
4853

54+
<refsect1 role="changelog">
55+
&reftitle.changelog;
56+
<informaltable>
57+
<tgroup cols="2">
58+
<thead>
59+
<row>
60+
<entry>&Version;</entry>
61+
<entry>&Description;</entry>
62+
</row>
63+
</thead>
64+
<tbody>
65+
<row>
66+
<entry>8.2.0</entry>
67+
<entry>
68+
Case conversion no longer depends on the locale set with
69+
<function>setlocale</function>. Only ASCII characters will be converted.
70+
</entry>
71+
</row>
72+
</tbody>
73+
</tgroup>
74+
</informaltable>
75+
</refsect1>
76+
4977
<refsect1 role="examples">
5078
&reftitle.examples;
5179
<para>

reference/strings/functions/strtoupper.xml

Lines changed: 32 additions & 4 deletions
@@ -13,13 +13,18 @@
1313
<methodparam><type>string</type><parameter>string</parameter></methodparam>
1414
</methodsynopsis>
1515
<para>
16-
Returns <parameter>string</parameter> with all alphabetic characters
16+
Returns <parameter>string</parameter> with all ASCII alphabetic characters
1717
converted to uppercase.
1818
</para>
1919
<para>
20-
Note that 'alphabetic' is determined by the current locale. For instance,
21-
in the default "C" locale characters such as umlaut-a (ä) will not be
22-
converted.
20+
Bytes in the range <literal>"a"</literal> (0x61) to <literal>"z"</literal>
21+
(0x7a) will be converted to the corresponding uppercase letter by subtracting
22+
32 from each byte value.
23+
</para>
24+
<para>
25+
This can be used to convert ASCII characters within strings encoded with
26+
UTF-8, since multibyte UTF-8 characters will be ignored. To convert multibyte
27+
non-ASCII characters, use <function>mb_strtoupper</function>.
2328
</para>
2429
</refsect1>
2530

@@ -46,6 +51,29 @@
4651
</para>
4752
</refsect1>
4853

54+
<refsect1 role="changelog">
55+
&reftitle.changelog;
56+
<informaltable>
57+
<tgroup cols="2">
58+
<thead>
59+
<row>
60+
<entry>&Version;</entry>
61+
<entry>&Description;</entry>
62+
</row>
63+
</thead>
64+
<tbody>
65+
<row>
66+
<entry>8.2.0</entry>
67+
<entry>
68+
Case conversion no longer depends on the locale set with
69+
<function>setlocale</function>. Only ASCII characters will be converted.
70+
</entry>
71+
</row>
72+
</tbody>
73+
</tgroup>
74+
</informaltable>
75+
</refsect1>
76+
4977
<refsect1 role="examples">
5078
&reftitle.examples;
5179
<para>

reference/strings/functions/ucfirst.xml

Lines changed: 26 additions & 6 deletions
@@ -15,12 +15,8 @@
1515
<para>
1616
Returns a string with the first character of
1717
<parameter>string</parameter> capitalized, if that character is
18-
alphabetic.
19-
</para>
20-
<para>
21-
Note that 'alphabetic' is determined by the current locale. For
22-
instance, in the default "C" locale characters such as umlaut-a
23-
(ä) will not be converted.
18+
an ASCII character in the range from <literal>"a"</literal> (0x61) to
19+
<literal>"z"</literal> (0x7a).
2420
</para>
2521
</refsect1>
2622

@@ -47,6 +43,29 @@
4743
</para>
4844
</refsect1>
4945

46+
<refsect1 role="changelog">
47+
&reftitle.changelog;
48+
<informaltable>
49+
<tgroup cols="2">
50+
<thead>
51+
<row>
52+
<entry>&Version;</entry>
53+
<entry>&Description;</entry>
54+
</row>
55+
</thead>
56+
<tbody>
57+
<row>
58+
<entry>8.2.0</entry>
59+
<entry>
60+
Case conversion no longer depends on the locale set with
61+
<function>setlocale</function>. Only ASCII characters will be converted.
62+
</entry>
63+
</row>
64+
</tbody>
65+
</tgroup>
66+
</informaltable>
67+
</refsect1>
68+
5069
<refsect1 role="examples">
5170
&reftitle.examples;
5271
<para>
@@ -76,6 +95,7 @@ $bar = ucfirst(strtolower($bar)); // Hello world!
7695
<member><function>strtolower</function></member>
7796
<member><function>strtoupper</function></member>
7897
<member><function>ucwords</function></member>
98+
<member><function>mb_convert_case</function></member>
7999
</simplelist>
80100
</para>
81101
</refsect1>

reference/strings/functions/ucwords.xml

Lines changed: 31 additions & 2 deletions
@@ -15,13 +15,20 @@
1515
</methodsynopsis>
1616
<para>
1717
Returns a string with the first character of each word in
18-
<parameter>string</parameter> capitalized, if that character is alphabetic.
18+
<parameter>string</parameter> capitalized, if that character is an ASCII
19+
character between <literal>"a"</literal> (0x61) and <literal>"z"</literal>
20+
(0x7a).
1921
</para>
2022
<para>
2123
For this function, a word is a string of characters that are not listed in
2224
the <parameter>separators</parameter> parameter. By default, these are:
2325
space, horizontal tab, carriage return, newline, form-feed and vertical tab.
2426
</para>
27+
<para>
28+
To do a similar conversion on multibyte strings, use
29+
<function>mb_convert_case</function> with the <constant>MB_CASE_TITLE</constant>
30+
mode.
31+
</para>
2532
</refsect1>
2633

2734
<refsect1 role="parameters">
@@ -55,6 +62,29 @@
5562
</para>
5663
</refsect1>
5764

65+
<refsect1 role="changelog">
66+
&reftitle.changelog;
67+
<informaltable>
68+
<tgroup cols="2">
69+
<thead>
70+
<row>
71+
<entry>&Version;</entry>
72+
<entry>&Description;</entry>
73+
</row>
74+
</thead>
75+
<tbody>
76+
<row>
77+
<entry>8.2.0</entry>
78+
<entry>
79+
Case conversion no longer depends on the locale set with
80+
<function>setlocale</function>. Only ASCII characters will be converted.
81+
</entry>
82+
</row>
83+
</tbody>
84+
</tgroup>
85+
</informaltable>
86+
</refsect1>
87+
5888
<refsect1 role="examples">
5989
&reftitle.examples;
6090
<para>
@@ -110,7 +140,6 @@ $baz = ucwords($foo, " \t\r\n\f\v'"); // Mike O'Hara
110140

111141
<refsect1 role="notes">
112142
&reftitle.notes;
113-
&note.locale-single-byte;
114143
&note.bin-safe;
115144
</refsect1>
116145
