Commit 132fae9

bytes, strings: avoid unnecessary zero initialization

Add bytealg.MakeNoZero that specially allocates a []byte
without zeroing it. It assumes the caller will populate every byte.
From within the bytes and strings packages, we can use
bytealg.MakeNoZero in a way where our logic ensures that
the entire slice is overwritten such that uninitialized bytes
are never leaked to the end user.

We use bytealg.MakeNoZero from within the following functions:

* bytes.Join
* bytes.Repeat
* bytes.ToUpper
* bytes.ToLower
* strings.Builder.Grow

The optimization in strings.Builder transitively benefits the following:

* strings.Join
* strings.Map
* strings.Repeat
* strings.ToUpper
* strings.ToLower
* strings.ToValidUTF8
* strings.Replace
* any user logic that depends on strings.Builder

This optimization is especially notable on large buffers that
do not fit in the CPU cache, such that the cost of
runtime.memclr and runtime.memmove are non-trivial since they are
both limited by the relatively slow speed of physical RAM.

Performance:

name                         old time/op  new time/op  delta
RepeatLarge/256/1            66.0ns ± 3%  64.5ns ± 1%  ~  (p=0.095 n=5+5)
RepeatLarge/256/16           55.4ns ± 5%  53.1ns ± 3%  -4.17%  (p=0.016 n=5+5)
RepeatLarge/512/1            95.5ns ± 7%  87.1ns ± 2%  -8.78%  (p=0.008 n=5+5)
RepeatLarge/512/16           84.4ns ± 9%  76.2ns ± 5%  -9.73%  (p=0.016 n=5+5)
RepeatLarge/1024/1           161ns ± 4%  144ns ± 7%  -10.45%  (p=0.016 n=5+5)
RepeatLarge/1024/16          148ns ± 3%  141ns ± 5%  ~  (p=0.095 n=5+5)
RepeatLarge/2048/1           296ns ± 7%  288ns ± 5%  ~  (p=0.841 n=5+5)
RepeatLarge/2048/16          298ns ± 8%  281ns ± 5%  ~  (p=0.151 n=5+5)
RepeatLarge/4096/1           593ns ± 8%  539ns ± 8%  -8.99%  (p=0.032 n=5+5)
RepeatLarge/4096/16          568ns ±12%  526ns ± 7%  ~  (p=0.056 n=5+5)
RepeatLarge/8192/1           1.15µs ± 8%  1.08µs ±12%  ~  (p=0.095 n=5+5)
RepeatLarge/8192/16          1.12µs ± 4%  1.07µs ± 7%  ~  (p=0.310 n=5+5)
RepeatLarge/8192/4097        1.77ns ± 1%  1.76ns ± 2%  ~  (p=0.310 n=5+5)
RepeatLarge/16384/1          2.06µs ± 7%  1.94µs ± 5%  ~  (p=0.222 n=5+5)
RepeatLarge/16384/16         2.02µs ± 4%  1.92µs ± 6%  ~  (p=0.095 n=5+5)
RepeatLarge/16384/4097       1.50µs ±15%  1.44µs ±11%  ~  (p=0.802 n=5+5)
RepeatLarge/32768/1          3.90µs ± 8%  3.65µs ±11%  ~  (p=0.151 n=5+5)
RepeatLarge/32768/16         3.92µs ±14%  3.68µs ±12%  ~  (p=0.222 n=5+5)
RepeatLarge/32768/4097       3.71µs ± 5%  3.43µs ± 4%  -7.54%  (p=0.032 n=5+5)
RepeatLarge/65536/1          7.47µs ± 8%  6.88µs ± 9%  ~  (p=0.056 n=5+5)
RepeatLarge/65536/16         7.29µs ± 4%  6.74µs ± 6%  -7.60%  (p=0.016 n=5+5)
RepeatLarge/65536/4097       7.90µs ±11%  6.34µs ± 5%  -19.81%  (p=0.008 n=5+5)
RepeatLarge/131072/1         17.0µs ±18%  14.1µs ± 6%  -17.32%  (p=0.008 n=5+5)
RepeatLarge/131072/16        15.2µs ± 2%  16.2µs ±17%  ~  (p=0.151 n=5+5)
RepeatLarge/131072/4097      15.7µs ± 6%  14.8µs ±11%  ~  (p=0.095 n=5+5)
RepeatLarge/262144/1         30.4µs ± 5%  31.4µs ±13%  ~  (p=0.548 n=5+5)
RepeatLarge/262144/16        30.1µs ± 4%  30.7µs ±11%  ~  (p=1.000 n=5+5)
RepeatLarge/262144/4097      31.2µs ± 7%  32.7µs ±13%  ~  (p=0.310 n=5+5)
RepeatLarge/524288/1         67.5µs ± 9%  63.7µs ± 3%  ~  (p=0.095 n=5+5)
RepeatLarge/524288/16        67.2µs ± 5%  62.9µs ± 6%  ~  (p=0.151 n=5+5)
RepeatLarge/524288/4097      65.5µs ± 4%  65.2µs ±18%  ~  (p=0.548 n=5+5)
RepeatLarge/1048576/1        141µs ± 6%  137µs ±14%  ~  (p=0.421 n=5+5)
RepeatLarge/1048576/16       140µs ± 2%  134µs ±11%  ~  (p=0.222 n=5+5)
RepeatLarge/1048576/4097     141µs ± 3%  134µs ±10%  ~  (p=0.151 n=5+5)
RepeatLarge/2097152/1        258µs ± 2%  271µs ±10%  ~  (p=0.222 n=5+5)
RepeatLarge/2097152/16       263µs ± 6%  273µs ± 9%  ~  (p=0.151 n=5+5)
RepeatLarge/2097152/4097     270µs ± 2%  277µs ± 6%  ~  (p=0.690 n=5+5)
RepeatLarge/4194304/1        684µs ± 3%  467µs ± 6%  -31.69%  (p=0.008 n=5+5)
RepeatLarge/4194304/16       682µs ± 1%  471µs ± 7%  -30.91%  (p=0.008 n=5+5)
RepeatLarge/4194304/4097     685µs ± 2%  465µs ±20%  -32.12%  (p=0.008 n=5+5)
RepeatLarge/8388608/1        1.50ms ± 1%  1.16ms ± 8%  -22.63%  (p=0.008 n=5+5)
RepeatLarge/8388608/16       1.50ms ± 2%  1.22ms ±17%  -18.49%  (p=0.008 n=5+5)
RepeatLarge/8388608/4097     1.51ms ± 7%  1.33ms ±11%  -11.56%  (p=0.008 n=5+5)
RepeatLarge/16777216/1       3.48ms ± 4%  2.66ms ±13%  -23.76%  (p=0.008 n=5+5)
RepeatLarge/16777216/16      3.37ms ± 3%  2.57ms ±13%  -23.72%  (p=0.008 n=5+5)
RepeatLarge/16777216/4097    3.38ms ± 9%  2.50ms ±11%  -26.16%  (p=0.008 n=5+5)
RepeatLarge/33554432/1       7.74ms ± 1%  4.70ms ±19%  -39.31%  (p=0.016 n=4+5)
RepeatLarge/33554432/16      7.90ms ± 4%  4.78ms ± 9%  -39.50%  (p=0.008 n=5+5)
RepeatLarge/33554432/4097    7.80ms ± 2%  4.86ms ±11%  -37.60%  (p=0.008 n=5+5)
RepeatLarge/67108864/1       16.4ms ± 3%  9.7ms ±15%  -41.29%  (p=0.008 n=5+5)
RepeatLarge/67108864/16      16.5ms ± 1%  9.9ms ±15%  -39.83%  (p=0.008 n=5+5)
RepeatLarge/67108864/4097    16.5ms ± 1%  11.0ms ±18%  -32.95%  (p=0.008 n=5+5)
RepeatLarge/134217728/1      35.2ms ±12%  19.2ms ±10%  -45.58%  (p=0.008 n=5+5)
RepeatLarge/134217728/16     34.6ms ± 6%  19.3ms ± 7%  -44.07%  (p=0.008 n=5+5)
RepeatLarge/134217728/4097   33.2ms ± 2%  19.3ms ±14%  -41.79%  (p=0.008 n=5+5)
RepeatLarge/268435456/1      70.9ms ± 2%  36.2ms ± 5%  -48.87%  (p=0.008 n=5+5)
RepeatLarge/268435456/16     77.4ms ± 7%  36.1ms ± 8%  -53.33%  (p=0.008 n=5+5)
RepeatLarge/268435456/4097   75.8ms ± 4%  37.0ms ± 4%  -51.15%  (p=0.008 n=5+5)
RepeatLarge/536870912/1      163ms ±14%  77ms ± 9%  -52.94%  (p=0.008 n=5+5)
RepeatLarge/536870912/16     156ms ± 4%  76ms ± 6%  -51.42%  (p=0.008 n=5+5)
RepeatLarge/536870912/4097   151ms ± 2%  76ms ± 6%  -49.64%  (p=0.008 n=5+5)
RepeatLarge/1073741824/1     293ms ± 5%  149ms ± 8%  -49.18%  (p=0.008 n=5+5)
RepeatLarge/1073741824/16    308ms ± 9%  150ms ± 8%  -51.19%  (p=0.008 n=5+5)
RepeatLarge/1073741824/4097  299ms ± 5%  151ms ± 6%  -49.51%  (p=0.008 n=5+5)

Updates #57153

Change-Id: I024553b7e676d6da6408278109ac1fa8def0a802
Reviewed-on: https://go-review.googlesource.com/c/go/+/456336
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Joseph Tsai <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>

1 parent 9a199d4 commit 132fae9

5 files changed: +57 -22 lines changed

src/bytes/bytes.go

Lines changed: 22 additions & 12 deletions
@@ -533,12 +533,22 @@ func Join(s [][]byte, sep []byte) []byte {
 		// Just return a copy.
 		return append([]byte(nil), s[0]...)
 	}
-	n := len(sep) * (len(s) - 1)
+
+	var n int
+	if len(sep) > 0 {
+		if len(sep) >= maxInt/(len(s)-1) {
+			panic("bytes: Join output length overflow")
+		}
+		n += len(sep) * (len(s) - 1)
+	}
 	for _, v := range s {
+		if len(v) > maxInt-n {
+			panic("bytes: Join output length overflow")
+		}
 		n += len(v)
 	}
 
-	b := make([]byte, n)
+	b := bytealg.MakeNoZero(n)
 	bp := copy(b, s[0])
 	for _, v := range s[1:] {
 		bp += copy(b[bp:], sep)
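
The length computation in the Join hunk above can be isolated as a small function. This is only a sketch: `joinedLen` is a hypothetical helper (not part of the commit) that reports overflow through a boolean instead of panicking, and it adds a `len(partLens) > 1` guard because, unlike Join, it has no earlier early-return for the zero- and one-element cases.

```go
package main

import "fmt"

// maxInt is the largest value an int can hold on this platform,
// defined the same way as in the commit.
const maxInt = int(^uint(0) >> 1)

// joinedLen is a hypothetical helper mirroring the overflow checks in Join.
// It returns the total output length for parts of the given lengths joined
// by a separator of length sepLen, or ok=false if that length overflows int.
func joinedLen(partLens []int, sepLen int) (n int, ok bool) {
	if len(partLens) == 0 {
		return 0, true
	}
	if sepLen > 0 && len(partLens) > 1 {
		// Check before multiplying so the product itself cannot overflow.
		if sepLen >= maxInt/(len(partLens)-1) {
			return 0, false
		}
		n += sepLen * (len(partLens) - 1)
	}
	for _, l := range partLens {
		// Check before adding so the running sum cannot overflow.
		if l > maxInt-n {
			return 0, false
		}
		n += l
	}
	return n, true
}

func main() {
	fmt.Println(joinedLen([]int{3, 4, 5}, 2))   // 16 true
	fmt.Println(joinedLen([]int{maxInt, 1}, 0)) // 0 false
}
```

Checking before each multiplication and addition matters more once the buffer is not zeroed: a wrapped length would produce a buffer smaller than the data the copy loop expects to write.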
@@ -589,22 +599,22 @@ func Repeat(b []byte, count int) []byte {
 	if count == 0 {
 		return []byte{}
 	}
+
 	// Since we cannot return an error on overflow,
-	// we should panic if the repeat will generate
-	// an overflow.
+	// we should panic if the repeat will generate an overflow.
 	// See golang.org/issue/16237.
 	if count < 0 {
 		panic("bytes: negative Repeat count")
-	} else if len(b)*count/count != len(b) {
-		panic("bytes: Repeat count causes overflow")
 	}
+	if len(b) >= maxInt/count {
+		panic("bytes: Repeat output length overflow")
+	}
+	n := len(b) * count
 
 	if len(b) == 0 {
 		return []byte{}
 	}
 
-	n := len(b) * count
-
 	// Past a certain chunk size it is counterproductive to use
 	// larger chunks as the source of the write, as when the source
 	// is too large we are basically just thrashing the CPU D-cache.
@@ -623,9 +633,9 @@ func Repeat(b []byte, count int) []byte {
 			chunkMax = len(b)
 		}
 	}
-	nb := make([]byte, n)
+	nb := bytealg.MakeNoZero(n)
 	bp := copy(nb, b)
-	for bp < len(nb) {
+	for bp < n {
 		chunk := bp
 		if chunk > chunkMax {
 			chunk = chunkMax
@@ -653,7 +663,7 @@ func ToUpper(s []byte) []byte {
 			// Just return a copy.
 			return append([]byte(""), s...)
 		}
-		b := make([]byte, len(s))
+		b := bytealg.MakeNoZero(len(s))
 		for i := 0; i < len(s); i++ {
 			c := s[i]
 			if 'a' <= c && c <= 'z' {
@@ -683,7 +693,7 @@ func ToLower(s []byte) []byte {
 		if !hasUpper {
 			return append([]byte(""), s...)
 		}
-		b := make([]byte, len(s))
+		b := bytealg.MakeNoZero(len(s))
 		for i := 0; i < len(s); i++ {
 			c := s[i]
 			if 'A' <= c && c <= 'Z' {

src/internal/bytealg/bytealg.go

Lines changed: 5 additions & 0 deletions
@@ -148,3 +148,8 @@ func IndexRabinKarp(s, substr string) int {
 	}
 	return -1
 }
+
+// MakeNoZero makes a slice of length and capacity n without zeroing the bytes.
+// It is the caller's responsibility to ensure uninitialized bytes
+// do not leak to the end user.
+func MakeNoZero(n int) []byte

src/runtime/slice.go

Lines changed: 8 additions & 0 deletions
@@ -345,3 +345,11 @@ func slicecopy(toPtr unsafe.Pointer, toLen int, fromPtr unsafe.Pointer, fromLen
 	}
 	return n
 }
+
+//go:linkname bytealg_MakeNoZero internal/bytealg.MakeNoZero
+func bytealg_MakeNoZero(len int) []byte {
+	if uintptr(len) > maxAlloc {
+		panicmakeslicelen()
+	}
+	return unsafe.Slice((*byte)(mallocgc(uintptr(len), nil, false)), len)
+}
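
The runtime function above builds its result with `unsafe.Slice`, which turns a pointer and a length into a slice whose length and capacity are both that length. Since `mallocgc` is private to the runtime, the sketch below illustrates only the `unsafe.Slice` step, using an ordinary array as the backing memory. `viewBytes` is a hypothetical helper, not something from the commit.

```go
package main

import (
	"fmt"
	"unsafe"
)

// viewBytes is a hypothetical helper: it returns a []byte of length and
// capacity n backed by the memory starting at p. The caller must guarantee
// that p points to at least n valid bytes.
func viewBytes(p *byte, n int) []byte {
	return unsafe.Slice(p, n)
}

func main() {
	backing := [4]byte{1, 2, 3, 4}
	s := viewBytes(&backing[0], len(backing))
	s[2] = 9 // the slice aliases backing, so this writes through
	fmt.Println(len(s), cap(s), backing[2]) // 4 4 9
}
```

In the commit the pointer comes from `mallocgc` with its last argument set to `false`, so the memory behind the slice is not cleared; in this sketch the array is already initialized, so nothing uninitialized is ever visible.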

src/strings/builder.go

Lines changed: 2 additions & 1 deletion
@@ -5,6 +5,7 @@
 package strings
 
 import (
+	"internal/bytealg"
 	"unicode/utf8"
 	"unsafe"
 )
@@ -65,7 +66,7 @@ func (b *Builder) Reset() {
 // grow copies the buffer to a new, larger buffer so that there are at least n
 // bytes of capacity beyond len(b.buf).
 func (b *Builder) grow(n int) {
-	buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
+	buf := bytealg.MakeNoZero(2*cap(b.buf) + n)[:len(b.buf)]
 	copy(buf, b.buf)
 	b.buf = buf
 }
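
The new `grow` body allocates the full capacity as a slice and then re-slices it down to the current length. The sketch below shows that this shape yields the same length and capacity as the three-argument `make` it replaces. `growCopy` is a hypothetical helper, and plain `make` stands in for the internal `bytealg.MakeNoZero`.

```go
package main

import "fmt"

// growCopy is a hypothetical helper with the same shape as Builder.grow:
// allocate 2*cap(old)+n bytes, re-slice to len(old), then copy old in.
func growCopy(old []byte, n int) []byte {
	buf := make([]byte, 2*cap(old)+n)[:len(old)] // the commit uses bytealg.MakeNoZero here
	copy(buf, old)
	return buf
}

func main() {
	// Build a slice with a known capacity of 4 and length 3.
	old := append(make([]byte, 0, 4), "abc"...)
	buf := growCopy(old, 10)
	fmt.Println(string(buf), len(buf), cap(buf)) // abc 3 18
}
```

With the unzeroed allocation, the bytes between `len(buf)` and `cap(buf)` hold arbitrary data, but they only become reachable after a later write extends the length over them.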

src/strings/strings.go

Lines changed: 20 additions & 9 deletions
@@ -13,6 +13,8 @@ import (
 	"unicode/utf8"
 )
 
+const maxInt = int(^uint(0) >> 1)
+
 // explode splits s into a slice of UTF-8 strings,
 // one string per Unicode character up to a maximum of n (n < 0 means no limit).
 // Invalid UTF-8 bytes are sliced individually.
@@ -436,9 +438,19 @@ func Join(elems []string, sep string) string {
 	case 1:
 		return elems[0]
 	}
-	n := len(sep) * (len(elems) - 1)
-	for i := 0; i < len(elems); i++ {
-		n += len(elems[i])
+
+	var n int
+	if len(sep) > 0 {
+		if len(sep) >= maxInt/(len(elems)-1) {
+			panic("strings: Join output length overflow")
+		}
+		n += len(sep) * (len(elems) - 1)
+	}
+	for _, elem := range elems {
+		if len(elem) > maxInt-n {
+			panic("strings: Join output length overflow")
+		}
+		n += len(elem)
 	}
 
 	var b Builder
@@ -536,21 +548,20 @@ func Repeat(s string, count int) string {
 	}
 
 	// Since we cannot return an error on overflow,
-	// we should panic if the repeat will generate
-	// an overflow.
+	// we should panic if the repeat will generate an overflow.
 	// See golang.org/issue/16237.
 	if count < 0 {
 		panic("strings: negative Repeat count")
-	} else if len(s)*count/count != len(s) {
-		panic("strings: Repeat count causes overflow")
 	}
+	if len(s) >= maxInt/count {
+		panic("strings: Repeat output length overflow")
+	}
+	n := len(s) * count
 
 	if len(s) == 0 {
 		return ""
 	}
 
-	n := len(s) * count
-
 	// Past a certain chunk size it is counterproductive to use
 	// larger chunks as the source of the write, as when the source
 	// is too large we are basically just thrashing the CPU D-cache.
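
The Repeat hunk above replaces the multiply-then-divide check with a division performed before the multiplication. The sketch below isolates the new predicate. `repeatOverflows` is a hypothetical name, and it assumes `count > 0`, which Repeat guarantees through its earlier `count == 0` and `count < 0` branches.

```go
package main

import "fmt"

const maxInt = int(^uint(0) >> 1)

// repeatOverflows is a hypothetical helper that applies the diff's check:
// it reports whether Repeat would panic for an input of the given size.
// It assumes size >= 0 and count > 0.
func repeatOverflows(size, count int) bool {
	return size >= maxInt/count
}

func main() {
	fmt.Println(repeatOverflows(3, 4))          // false
	fmt.Println(repeatOverflows(maxInt/2+1, 2)) // true: the product exceeds maxInt
	fmt.Println(repeatOverflows(maxInt/2, 2))   // true, although the product is maxInt-1
}
```

The last case shows that the `>=` comparison is slightly conservative: it rejects a few products that would still fit in an `int`. That appears harmless in practice, since an allocation of that size could not succeed anyway.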
