PHP Classes

Bin String: Manipulate text with mbstring extension functions

Last updated: 2024-01-09
Ratings: not enough user ratings
Unique user downloads: 216 total, 1 this week
Download rankings: 8,300 all time, 571 this week
Version: bin-string 0.11
License: GNU Lesser Genera...
PHP version: 5.0
Categories: PHP 5, Text processing
Description 

Author

This class can manipulate text with mbstring extension functions.

It can check whether the mbstring extension is enabled, and it provides wrapper functions that use that extension's functions if possible or fall back to the respective non-multibyte text functions.

When PHP is configured to overload the mbstring functions, it can still use the original versions of those functions.

Currently it provides wrapper functions for the functions: mail, strlen, strpos, strrpos, substr, strtolower, strtoupper, substr_count, ereg, eregi, ereg_replace, eregi_replace, and split.
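
As a rough illustration of the detection step, the sketch below reads the `mbstring.func_overload` setting and reports which groups of functions it replaces. The function name `overloadedFunctionGroups()` is hypothetical and not part of this package. The bit values (1 for mail, 2 for the str* functions, 4 for the ereg family) come from the PHP manual; the setting was removed in PHP 8.0, where `ini_get()` returns false for it.

```php
<?php
// Hypothetical helper for illustration; it is not part of the BinString class.
// Returns the names of the function groups replaced by mbstring.func_overload.
function overloadedFunctionGroups()
{
    $groups = array();
    if (!extension_loaded('mbstring')) {
        return $groups; // without mbstring, nothing can be overloaded
    }
    // ini_get() returns false when the setting does not exist (PHP 8.0+),
    // which the cast turns into 0.
    $mask = (int) ini_get('mbstring.func_overload');
    if ($mask & 1) {
        $groups[] = 'mail';
    }
    if ($mask & 2) {
        $groups[] = 'str';  // strlen, strpos, substr, strtolower, ...
    }
    if ($mask & 4) {
        $groups[] = 'ereg'; // ereg, eregi, ereg_replace, eregi_replace, split
    }
    return $groups;
}

echo implode(', ', overloadedFunctionGroups()), "\n";
```

A wrapper only needs to take the mbstring route for a function whose group appears in this list.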

Innovation Award
PHP Programming Innovation award nominee
July 2014
Number 6


Prize: One downloadable e-book of choice by O'Reilly
The PHP mbstring extension provides functions to manipulate text strings that use character set encodings that may require more than one byte per character.

This allows PHP to manipulate strings with characters in practically any alphabet. However, the mbstring extension functions are slower than similar functions for single-byte character strings.

This class is a wrapper for string manipulation functions that may or may not use the mbstring functions, depending on whether the extension is available in the current PHP environment and on whether the application uses a single-byte or a multi-byte character set encoding for its texts.

Manuel Lemos
Name: Asbjorn Grandt <contact>
Classes: 10 packages by
Country: Denmark
Age: 52
All time rank: 1711 in Denmark
Week rank: 103 in Denmark
Innovation award nominee: 4x

Example

<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>BinString test</title>
        <style>
            .pass {
                color: green;
            }
            .fail {
                color: red;
            }
            td, th {
                border: 1px black solid;
            }
            th {
                font-weight: bold;
            }
            tr.new {
                border-top: 3px black solid;
            }
            table {
                border-collapse: collapse;
            }
            .expectFail {
                background-color: pink;
            }
        </style>
    </head>
    <body>
        <h1>BinString test</h1>
        <?php
        use com\grandt\BinString;

        error_reporting(E_ALL | E_STRICT);
        ini_set('error_reporting', E_ALL | E_STRICT);
        ini_set('display_errors', 1);

        include 'BinString.php';
        $binstr = new BinString();

        // "test" followed by Danish letters, encoded as UTF-8 (two bytes per letter).
        $mbStr = "\x74\x65\x73\x74\xC3\x86\xC3\xB8\xC3\xA5";
        $mbStr_lower = "\x74\x65\x73\x74\xC3\xA6\xC3\xB8\xC3\xA5";
        $mbStr_upper = "\x54\x45\x53\x54\xC3\x86\xC3\x98\xC3\x85";

        // The same strings encoded as ISO-8859-1 (one byte per letter).
        $isoStr = "\x74\x65\x73\x74\xC6\xF8\xE5";
        $isoStr_lower = "\x74\x65\x73\x74\xE6\xF8\xE5";
        $isoStr_upper = "\x54\x45\x53\x54\xC6\xD8\xC5";

        // The letter "ø" in each encoding.
        $mbNeedle = "\xC3\xB8";
        $isoNeedle = "\xF8";

        // Compare a result with the expected value and return a PASS/FAIL cell body.
        function test($result, $expected) {
            if ($result == $expected) {
                if (is_bool($result)) {
                    $result = $result ? "<em>true</em>" : "<em>false</em>";
                }
                return "<span class=\"pass\">PASS<br /></span>received: $result";
            } else {
                if (is_bool($result)) {
                    $result = $result ? "<em>true</em>" : "<em>false</em>";
                }
                return "<span class=\"fail\">FAIL<br />received: $result<br />expected: $expected</span>";
            }
        }

        // Same as test(), but converts latin1 values to UTF-8 before printing them.
        function test_enc($result, $expected) {
            if ($result == $expected) {
                if (is_bool($result)) {
                    $result = $result ? "<em>true</em>" : "<em>false</em>";
                }
                return "<span class=\"pass\">PASS<br /></span>received: " . mb_convert_encoding($result, 'utf8', 'latin1');
            } else {
                if (is_bool($result)) {
                    $result = $result ? "<em>true</em>" : "<em>false</em>";
                }
                return "<span class=\"fail\">FAIL<br />received: " . mb_convert_encoding($result, 'utf8', 'latin1') . "<br />expected: " . mb_convert_encoding($expected, 'utf8', 'latin1') . "</span>";
            }
        }
        ?>
        <p>The idea is that the standard column should be identical to the BinString column on systems without mbstring.func_overload enabled.</p>
        <p class="expectFail">Cells with a pink background are expected to fail, either because they pass multi-byte strings to functions that are not multi-byte aware and cannot convert or handle the UTF-8 characters, or because the strings contain characters the function cannot handle under the current locale.</p>
        <p>The test strings contain the three Danish letters &aelig;, &oslash; and &aring;. Not all PHP string functions can handle those, for instance the case conversion functions.</p>
        <p>ereg functions are not tested, as they have been deprecated since PHP 5.3.0, and people should use the PCRE extension's preg_* functions instead.</p>

        <h3>Instantiated class</h3>
        <table>
            <tr><th>function</th><th>mb_input</th><th>standard</th><th>mb_*</th><th>BinString._*</th></tr>
            <tr class="new">
                <td>mail()</td>
                <td>no</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
            </tr>

            <tr class="new">
                <td rowspan="2">strlen()</td><td>yes</td>
                <td><?= test(strlen($mbStr), 10) ?></td>
                <td><?= test(mb_strlen($mbStr, 'utf8'), 7) ?></td>
                <td><?= test($binstr->_strlen($mbStr), 10) ?></td></tr>
            <tr>
                <td>no</td>
                <td><?= test(strlen($isoStr), 7) ?></td>
                <td><?= test(mb_strlen($isoStr, 'latin1'), 7) ?></td>
                <td><?= test($binstr->_strlen($isoStr), 7) ?></td></tr>

            <tr class="new">
                <td rowspan="2">strpos()</td><td>yes</td>
                <td><?= test(strpos($mbStr, $mbNeedle), 6) ?></td>
                <td><?= test(mb_strpos($mbStr, $mbNeedle, 0, 'utf8'), 5) ?></td>
                <td><?= test($binstr->_strpos($mbStr, $mbNeedle), 6) ?></td>
            </tr>
            <tr>
                <td>no</td>
                <td><?= test(strpos($isoStr, $isoNeedle), 5) ?></td>
                <td><?= test(mb_strpos($isoStr, $isoNeedle, 0, 'latin1'), 5) ?></td>
                <td><?= test($binstr->_strpos($isoStr, $isoNeedle), 5) ?></td>
            </tr>

            <tr class="new">
                <td rowspan="2">strrpos()</td><td>yes</td>
                <td><?= test(strrpos($mbStr, $mbNeedle), 6) ?></td>
                <td><?= test(mb_strrpos($mbStr, $mbNeedle, 0, 'utf8'), 5) ?></td>
                <td><?= test($binstr->_strrpos($mbStr, $mbNeedle), 6) ?></td>
            </tr>
            <tr>
                <td>no</td>
                <td><?= test(strrpos($isoStr, $isoNeedle), 5) ?></td>
                <td><?= test(mb_strrpos($isoStr, $isoNeedle, 0, 'latin1'), 5) ?></td>
                <td><?= test($binstr->_strrpos($isoStr, $isoNeedle), 5) ?></td>
            </tr>
            <?php if ($binstr->getPHPVersionId() >= 50200) { ?>
                <tr class="new">
                    <td rowspan="2">stripos()</td><td>yes</td>
                    <td><?= test(stripos($mbStr, $mbNeedle), 6) ?></td>
                    <td><?= test(mb_stripos($mbStr, $mbNeedle, 0, 'utf8'), 5) ?></td>
                    <td><?= test($binstr->_stripos($mbStr, $mbNeedle), 6) ?></td>
                </tr>
                <tr>
                    <td>no</td>
                    <td><?= test(stripos($isoStr, $isoNeedle), 5) ?></td>
                    <td><?= test(mb_stripos($isoStr, $isoNeedle, 0, 'latin1'), 5) ?></td>
                    <td><?= test($binstr->_stripos($isoStr, $isoNeedle), 5) ?></td>
                </tr>

                <tr class="new">
                    <td rowspan="2">strripos()</td><td>yes</td>
                    <td><?= test(strripos($mbStr, $mbNeedle), 6) ?></td>
                    <td><?= test(mb_strripos($mbStr, $mbNeedle, 0, 'utf8'), 5) ?></td>
                    <td><?= test($binstr->_strripos($mbStr, $mbNeedle), 6) ?></td>
                </tr>
                <tr>
                    <td>no</td>
                    <td class="expectFail"><?= test(strripos($isoStr, $isoNeedle), 5) ?></td>
                    <td><?= test(mb_strripos($isoStr, $isoNeedle, 0, 'latin1'), 5) ?></td>
                    <td class="expectFail"><?= test($binstr->_strripos($isoStr, $isoNeedle), 5) ?></td>
                </tr>

                <tr class="new">
                    <td rowspan="2">strstr()</td><td>yes</td>
                    <td><?= test(strstr($mbStr, "\xC3\xB8\xC3\xA5"), "\xC3\xB8\xC3\xA5") ?></td>
                    <td><?= test(mb_strstr($mbStr, "\xC3\xB8\xC3\xA5", null, 'utf8'), "\xC3\xB8\xC3\xA5") ?></td>
                    <td><?= test($binstr->_strstr($mbStr, "\xC3\xB8\xC3\xA5"), "\xC3\xB8\xC3\xA5") ?></td>
                </tr>
                <tr>
                    <td>no</td>
                    <td><?= test_enc(strstr($isoStr, "\xF8\xE5"), "\xF8\xE5") ?></td>
                    <td><?= test_enc(mb_strstr($isoStr, "\xF8\xE5", null, 'latin1'), "\xF8\xE5") ?></td>
                    <td><?= test_enc($binstr->_strstr($isoStr, "\xF8\xE5"), "\xF8\xE5") ?></td>
                </tr>

                <tr class="new">
                    <td rowspan="2">stristr()</td><td>yes</td>
                    <td><?= test(stristr($mbStr, "\xC3\xB8\xC3\xA5"), "\xC3\xB8\xC3\xA5") ?></td>
                    <td><?= test(mb_stristr($mbStr, "\xC3\xB8\xC3\xA5", null, 'utf8'), "\xC3\xB8\xC3\xA5") ?></td>
                    <td><?= test($binstr->_stristr($mbStr, "\xC3\xB8\xC3\xA5"), "\xC3\xB8\xC3\xA5") ?></td>
                </tr>
                <tr>
                    <td>no</td>
                    <td><?= test_enc(stristr($isoStr, "\xF8\xE5"), "\xF8\xE5") ?></td>
                    <td><?= test_enc(mb_stristr($isoStr, "\xF8\xE5", null, 'latin1'), "\xF8\xE5") ?></td>
                    <td><?= test_enc($binstr->_stristr($isoStr, "\xF8\xE5"), "\xF8\xE5") ?></td>
                </tr>

                <tr class="new">
                    <td rowspan="2">strrchr()</td><td>yes</td>
                    <td class="expectFail"><?= test(strrchr($mbStr, "\xC3\xB8"), "\xC3\xB8\xC3\xA5") ?></td>
                    <td><?= test(mb_strrchr($mbStr, "\xC3\xB8", null, 'utf8'), "\xC3\xB8\xC3\xA5") ?></td>
                    <td class="expectFail"><?= test($binstr->_strrchr($mbStr, "\xC3\xB8"), "\xC3\xB8\xC3\xA5") ?></td>
                </tr>
                <tr>
                    <td>no</td>
                    <td><?= test_enc(strrchr($isoStr, "\xF8"), "\xF8\xE5") ?></td>
                    <td><?= test_enc(mb_strrchr($isoStr, "\xF8", null, 'latin1'), "\xF8\xE5") ?></td>
                    <td><?= test_enc($binstr->_strrchr($isoStr, "\xF8"), "\xF8\xE5") ?></td>
                </tr>
            <?php } ?>
            <tr class="new">
                <td rowspan="2">substr()</td><td>yes</td>
                <td><?= test(substr($mbStr, 6), "\xC3\xB8\xC3\xA5") ?></td>
                <td><?= test(mb_substr($mbStr, 5, mb_strlen($mbStr, 'utf8'), 'utf8'), "\xC3\xB8\xC3\xA5") ?></td>
                <td><?= test($binstr->_substr($mbStr, 6), "\xC3\xB8\xC3\xA5") ?></td>
            </tr>
            <tr>
                <td>no</td>
                <td><?= test_enc(substr($isoStr, 5), "\xF8\xE5") ?></td>
                <td><?= test_enc(mb_substr($isoStr, 5, mb_strlen($isoStr, 'latin1'), 'latin1'), "\xF8\xE5") ?></td>
                <td><?= test_enc($binstr->_substr($isoStr, 5), "\xF8\xE5") ?></td>
            </tr>

            <tr class="new">
                <td rowspan="2">strtolower()</td><td>yes</td>
                <td class="expectFail"><?= test(strtolower($mbStr), $mbStr_lower) ?></td>
                <td><?= test(mb_strtolower($mbStr, 'utf8'), $mbStr_lower) ?></td>
                <td class="expectFail"><?= test($binstr->_strtolower($mbStr), $mbStr_lower) ?></td>
            </tr>
            <tr>
                <td>no</td>
                <td class="expectFail"><?= test_enc(strtolower($isoStr), $isoStr_lower) ?></td>
                <td><?= test_enc(mb_strtolower($isoStr, 'latin1'), $isoStr_lower) ?></td>
                <td class="expectFail"><?= test_enc($binstr->_strtolower($isoStr), $isoStr_lower) ?></td>
            </tr>

            <tr class="new">
                <td rowspan="2">strtoupper()</td><td>yes</td>
                <td class="expectFail"><?= test(strtoupper($mbStr), $mbStr_upper) ?></td>
                <td><?= test(mb_strtoupper($mbStr, 'utf8'), $mbStr_upper) ?></td>
                <td class="expectFail"><?= test($binstr->_strtoupper($mbStr), $mbStr_upper) ?></td>
            </tr>
            <tr>
                <td>no</td>
                <td class="expectFail"><?= test_enc(strtoupper($isoStr), $isoStr_upper) ?></td>
                <td><?= test_enc(mb_strtoupper($isoStr, 'latin1'), $isoStr_upper) ?></td>
                <td class="expectFail"><?= test_enc($binstr->_strtoupper($isoStr), $isoStr_upper) ?></td>
            </tr>

            <tr class="new">
                <td rowspan="2">substr_count()</td><td>yes</td>
                <td><?= test(substr_count($mbStr, $mbNeedle), 1) ?></td>
                <td><?= test(mb_substr_count($mbStr, $mbNeedle, 'utf8'), 1) ?></td>
                <td><?= test($binstr->_substr_count($mbStr, $mbNeedle), 1) ?></td>
            </tr>
            <tr>
                <td>no</td>
                <td><?= test_enc(substr_count($isoStr, $isoNeedle), 1) ?></td>
                <td><?= test_enc(mb_substr_count($isoStr, $isoNeedle, 'latin1'), 1) ?></td>
                <td><?= test_enc($binstr->_substr_count($isoStr, $isoNeedle), 1) ?></td>
            </tr>

            <tr class="new">
                <td>ereg()</td>
                <td>no</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
            </tr>

            <tr class="new">
                <td>eregi()</td>
                <td>no</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
            </tr>

            <tr class="new">
                <td>ereg_replace()</td>
                <td>no</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
            </tr>

            <tr class="new">
                <td>eregi_replace()</td>
                <td>no</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
            </tr>

            <tr class="new">
                <td>split()</td>
                <td>no</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
                <td>UNTESTED</td>
            </tr>

            <tr class="new">
                <td>startsWith()</td>
                <td>no</td>
                <td><?= test($binstr->startsWith("TestFileName.html", "test"), false) ?></td>
                <td><?= test($binstr->startsWith("TestFileName.html", ".html"), false) ?></td>
                <td><?= test($binstr->startsWith("TestFileName.html", "Test"), true) ?></td>
            </tr>

            <tr class="new">
                <td>endsWith()</td>
                <td>no</td>
                <td><?= test($binstr->endsWith("TestFileName.html", ".xhtml"), false) ?></td>
                <td><?= test($binstr->endsWith("TestFileName.html", ".Test"), false) ?></td>
                <td><?= test($binstr->endsWith("TestFileName.html", ".html"), true) ?></td>
            </tr>
        </table>
    </body>
</html>


Details

Binary Safe String functions

If you use PHP's mbstring.func_overload, or the server you are running on has it enabled, you are in trouble, especially if you rely on being able to parse binary data and protocols.

Introduction

To the question "Should I use multi-byte overloading (mbstring.func_overload)?", user 'gphilip' said it well in this StackOverflow post: http://stackoverflow.com/questions/222630/should-i-use-multi-byte-overloading-mbstring-func-overload

> My answer is: definitely not!
>
> The problem is that there is no easy way to "reset"
> str* functions once they are overloaded.
>
> For some time this can work well with your project,
> but almost surely you will run into an external library
> that uses string functions to, for example, implement a
> binary protocol, and they will fail. They will fail and
> you will spend hours trying to find out why they are
> failing.

Description

This class is a wrapper for string functions, for cases where the mbstring.func_overload tripe has been enabled. Be warned: use this class ONLY if you have to, as it will affect performance a bit. For some functions the impact is large, though that is due to problems in mbstring, not in this class. Function calls in PHP are fairly expensive on their own, and if func_overload is enabled, the class uses the mbstring functions exclusively in place of the built-in PHP string functions and has them parse the strings as 'latin1', which is also expensive, CPU-wise.
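
The wrapping idea described above could be sketched roughly as follows. The function names `strOverloadActive()`, `binStrlen()` and `binStrpos()` are hypothetical and chosen for illustration only; the real BinString methods may be implemented differently. The sketch assumes that asking mbstring to treat the string as 'latin1' (a single-byte encoding) yields byte-oriented results even when the str* functions are overloaded.

```php
<?php
// Hypothetical helpers for illustration; BinString's real implementation may differ.

// True when mbstring.func_overload replaces the str* functions (bit value 2).
function strOverloadActive()
{
    return extension_loaded('mbstring')
        && ((int) ini_get('mbstring.func_overload') & 2) !== 0;
}

// Byte length of $string, regardless of overloading.
function binStrlen($string)
{
    if (strOverloadActive()) {
        // 'latin1' is single-byte, so one "character" is always one byte.
        return mb_strlen($string, 'latin1');
    }
    return strlen($string);
}

// Byte offset of $needle in $haystack, or false when it is absent.
function binStrpos($haystack, $needle, $offset = 0)
{
    if (strOverloadActive()) {
        return mb_strpos($haystack, $needle, $offset, 'latin1');
    }
    return strpos($haystack, $needle, $offset);
}

$utf8 = "test\xC3\x86\xC3\xB8\xC3\xA5";  // "test" plus three Danish letters as UTF-8
echo binStrlen($utf8), "\n";             // 10
echo binStrpos($utf8, "\xC3\xB8"), "\n"; // 6
```

Each call pays for the extra function call and the overload check, which is the performance cost the paragraph above warns about.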

Why the potential performance impact?

PHP, like Java, has length-aware strings, meaning the string's header records how long the string is. PHP strings are binary safe and not null (0x00) terminated.

The mbstring functions ignore that and parse the entire string to figure out what is what. strlen(string) simply tells you how many bytes are in the string, whereas mb_strlen parses it to find multi-byte characters and tells you how many characters there are. That is great for handling multi-byte encoded strings such as UTF-8 correctly, but it is bad for binary data handling, as multi-byte sequences are bound to occur by random chance in any large enough binary data set.
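
A small sketch of the difference, assuming the mbstring extension is loaded and func_overload is not active. The length-prefixed frame layout is a made-up example of a binary protocol, not something taken from this package.

```php
<?php
// Assumes mbstring is loaded and mbstring.func_overload is NOT active.
$utf8 = "test\xC3\x86\xC3\xB8\xC3\xA5";   // "test" plus three Danish letters as UTF-8

$byteCount = strlen($utf8);               // 10 bytes
$charCount = mb_strlen($utf8, 'UTF-8');   // 7 characters

// Hypothetical frame: 2-byte big-endian length, then the payload.
// The payload is binary data that happens to contain valid UTF-8 pairs.
$payload = "\x01\xC3\xB8\xC3\xA5\x02";    // 6 bytes

$good = pack('n', strlen($payload)) . $payload;
// This is what an overloaded strlen() would effectively produce.
$bad = pack('n', mb_strlen($payload, 'UTF-8')) . $payload;

$goodHeader = unpack('nlength', $good);
$badHeader = unpack('nlength', $bad);

echo $goodHeader['length'], "\n";         // 6
echo $badHeader['length'], "\n";          // 4
```

A receiver that trusts the second header would read only four of the six payload bytes, which is the kind of silent failure the StackOverflow quote describes.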


Files
File | Role | Description
BinString.Example1.php | Example | Example and test file
BinString.Example2.php | Example | Example of using the new "Static" version of BinString.
BinString.php | Class | Main class
BinStringStatic.php | Class | Same as BinString.php, but using static functions, rather than having to be instantiated.
composer.json | Data | Composer file for this package.
readme.markdown | Doc. | Readme
