This patch significantly improves the performance of memmem using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear.

Small needles up to size 2 use a dedicated linear search. Very long needles use the Two-Way algorithm (inlining is now disabled to avoid increasing stack size).

The performance gain is 6.6 times on English text on AArch64 using random needles with average size 8 (this is even faster than the recently improved strstr algorithm, so I'll update that in the near future).

The size-optimized memmem has also been rewritten from scratch to get a 2.7x performance gain.

Tested against the GLIBC testsuite and randomized tests.

Message-Id: <DB5PR08MB1030649D051FA8532A4512C883B20@DB5PR08MB1030.eurprd08.prod.outlook.com>

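
The pair-hashed shift table described in the commit message can be illustrated with a small sketch. This is not the code from the patch: the names `sketch_memmem` and `pair_hash` are hypothetical, the hash function is an assumption chosen only to map a byte pair into 256 buckets, and the self-adapting filter and Two-Way fallback are left out (needles longer than 256 bytes use a naive scan here instead).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hash the byte pair (p[-1], p[0]) into a table index in [0, 255].
   The exact hash is an assumption made for this sketch. */
static unsigned pair_hash(const unsigned char *p)
{
    return (unsigned char)(p[0] - (p[-1] << 3));
}

/* Hypothetical helper: Horspool-style search with a shift table
   indexed by hashed pairs of bytes. */
void *sketch_memmem(const void *haystack, size_t hlen,
                    const void *needle, size_t nlen)
{
    const unsigned char *hs = haystack;
    const unsigned char *ne = needle;

    if (nlen == 0)
        return (void *)hs;
    if (hlen < nlen)
        return NULL;
    if (nlen == 1)
        return memchr(hs, ne[0], hlen);
    if (nlen > 256) {
        /* Stand-in for the Two-Way algorithm: plain naive scan. */
        for (size_t i = 0; i <= hlen - nlen; i++)
            if (memcmp(hs + i, ne, nlen) == 0)
                return (void *)(hs + i);
        return NULL;
    }

    /* nlen <= 256, so every stored position fits in 8 bits. */
    uint8_t shift[256];
    size_t m1 = nlen - 1;

    /* shift[h] = rightmost needle index (before the last) whose pair
       hashes to h; 0 means no pair in the needle has that hash. */
    memset(shift, 0, sizeof shift);
    for (size_t i = 1; i < m1; i++)
        shift[pair_hash(ne + i)] = (uint8_t)i;

    /* Skip to use when the last pair's hash matches but the needle
       does not: align with the previous pair sharing that hash. */
    size_t shift1 = m1 - shift[pair_hash(ne + m1)];
    shift[pair_hash(ne + m1)] = (uint8_t)m1;

    size_t pos = 0;
    while (pos <= hlen - nlen) {
        /* Look up the pair at the end of the current window. */
        size_t t = shift[pair_hash(hs + pos + m1)];
        if (t == m1) {
            if (memcmp(hs + pos, ne, nlen) == 0)
                return (void *)(hs + pos);
            pos += shift1;
        } else {
            /* t == 0 skips m1 bytes; otherwise align the pair. */
            pos += m1 - t;
        }
    }
    return NULL;
}
```

Hash collisions only make a skip shorter or trigger an extra `memcmp`, so they cost time but never cause a missed match. Each comparison is bounded by 256 bytes, which is why the length limit keeps the worst case linear in the haystack size.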
| Name | | |
|---|---|---|
| argz | ||
| ctype | ||
| errno | ||
| iconv | ||
| include | ||
| locale | ||
| machine | ||
| misc | ||
| posix | ||
| reent | ||
| search | ||
| signal | ||
| ssp | ||
| stdio | ||
| stdio64 | ||
| stdlib | ||
| string | ||
| sys | ||
| syscalls | ||
| time | ||
| unix | ||
| xdr | ||
| Makefile.am | ||
| Makefile.in | ||
| aclocal.m4 | ||
| configure | ||
| configure.in | ||
| libc.in.xml | ||
| libc.texinfo | ||
| saber | ||
| sys.tex | ||