Hello,
I am processing some files using PHP. Basically I read every byte of
the file and perform a simple operation on it to compute a sum.
My initial implementation was in C, but now I am trying to re-do the
same thing in PHP. This is what my PHP code looks like:
function fn($fname) {
    $fd = fopen($fname, 'rb');
    if ($fd === false) return(0);
    $result = 0;
    while (!feof($fd)) {
        $buff = fread($fd, 1024 * 1024);
        foreach (str_split($buff) as $b) {
            $result += ord($b);
            $result &= 0xffff;
        }
    }
    fclose($fd);
    return($result);
}
It works, but it is really slow (approximately 100x slower than the
original C code). I know that I should not expect much performance
from interpreted PHP code, but still - is there any trick I could use to speed this up?
Your foreach(str_split) line is an obvious place to start.
str_split() creates an array from a string. In your case, the input
buffer is 1024*1024 bytes long, so you're splitting that megabyte
string and (re-)creating an array of more than a million elements for
_each iteration of the loop_. (Which will be a million+ times.)
Why? The very first thing you should consider is pulling that out
and doing it just once:
$buff = fread($fd, 1024 * 1024);
$whatever = str_split($buff);
foreach ($whatever as $b)
(There's nothing specific to PHP about that advice either. It's
equally applicable to C, although modern C compilers _may_ make that optimization for you.)
That's hardly applicable in C, since there is no "foreach" in C. There
is a "for", and its initialization argument is processed exactly once.
The foreach() initialization is clearly processed only once.
Any other ideas?
By *not* using arrays.
Just use the string as it is:
$len = strlen($buff);
$pos = 0;
while ($pos < $len) {
    $result += ord($buff[$pos]);
    $result &= 0xffff;
    $pos++;
}
However, this may still be quite slow on larger files. Since it seems
you want to create some kind of checksum based on the file content,
you may want to use something else like hash_file(), sha1_file() or
md5_file() and use the result of these calls instead of processing
the whole file content in a loop.
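For example, something like this (crc32b is only a placeholder here;
hash_algos() lists the algorithms your PHP build actually supports):

$digest = hash_file('crc32b', $fname);
if ($digest === false) {
    // the file could not be opened or read
}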
On 2022-01-20 11:37, Mateusz Viste wrote:
The foreach() initialization is clearly processed only once.
Any other ideas?
My next question is why create the array at all?
You can just use a simple for loop to iterate over the string, with
$buff[$i] to access it character by character. That would avoid the
overhead (of both memory use and computation time) that's involved in
creating the intermediate array.
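Roughly like this (an untested sketch, reusing the $buff and $result
variables from your inner loop):

for ($i = 0, $len = strlen($buff); $i < $len; $i++) {
    $result += ord($buff[$i]);  // $buff[$i] is a one-character string
    $result &= 0xffff;
}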
I have also tried to replace str_split() and ord() with unpack('C*'),
but it was even slower. Anything else I could try?
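In case it matters, the unpack() variant looked roughly like this:

foreach (unpack('C*', $buff) as $b) {
    $result += $b;  // unpack('C*') already yields integer byte values
    $result &= 0xffff;
}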
On 2022-01-20 11:23, Mateusz Viste wrote:
(There's nothing specific to PHP about that advice either. It's
equally applicable to C, although modern C compilers _may_ make
that optimization for you.)
That's hardly applicable in C, since there is no "foreach" in C.
There is a "for", and its initialization argument is processed
exactly once.
I was speaking more broadly about pulling invariant code out of any
and all loops regardless of where it appears in said loop, rather than relying on the interpreter or compiler to handle it for you.
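A classic PHP example of the general idea (nothing to do with your
checksum specifically): calling strlen() in the loop condition
re-evaluates it on every pass, while hoisting it into the initializer
evaluates it only once:

// strlen() is called on every iteration:
for ($i = 0; $i < strlen($buff); $i++) { /* ... */ }

// strlen() is called just once:
for ($i = 0, $len = strlen($buff); $i < $len; $i++) { /* ... */ }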
On Thu, 20 Jan 2022 19:23:49 +0100[...]
Arno Welzel <usenet@arnowelzel.de> wrote:
However, this may still be quite slow on larger files. Since it seems
you want to create some kind of checksum based on the file content,
you may want to use something else like hash_file(), sha1_file() or
md5_file() and use the result of these calls instead of processing
the whole file content in a loop.
Sadly, that won't work. The kind of checksum I am computing is not
supported by PHP, which is why I do it byte by byte myself. Another
option would be to call my C code from PHP via system(), but that's
really ugly. At this point I'd rather stick with a slow, 100% PHP
solution.