C0nw0nk
Joined: 07 Oct 2013 Posts: 241 Location: United Kingdom, London
Posted: Tue 23 Dec '14 19:44
Jan, can I request that you compile your builds with ssdeep?
http://pecl.php.net/package/ssdeep
It is an extension I want to test with on Windows. I am working on preventing duplicate files, and one site that currently uses this extension is www.virustotal.com
ssdeep:
384:1mDEF82t2udsptLuy3QCwiUDK68o7ZRH8vIDcSLNAafPdXpq1ym+6aFSonT6Ir0y:1mDE7tUtLPQC8K6t9KvDC9d3r0w3TMq
That would be a file identification key, just like the output of md5_file(); or sha1_file();
So if we used the ssdeep extension we could check files with ssdeep_fuzzy_hash_filename(); as in the sketch below.
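For illustration, a minimal sketch of how such a check could look with the PECL ssdeep extension loaded; the upload field name, the stored hash list and the 90% match threshold are only examples, not code from any real site.
Code: | <?php
// Sketch: reject an upload that fuzzy-matches an already stored file.
// Assumes the PECL ssdeep extension is loaded; names and the 90 threshold are examples.
$newHash = ssdeep_fuzzy_hash_filename($_FILES['upload']['tmp_name']);

// $knownHashes would normally be read from the database of previously stored files.
$knownHashes = array(
    '384:1mDEF82t2udsptLuy3QCwiUDK68o7ZRH8vIDcSLNAafPdXpq1ym+6aFSonT6Ir0y:1mDE7tUtLPQC8K6t9KvDC9d3r0w3TMq',
);

foreach ($knownHashes as $existing) {
    // ssdeep_fuzzy_compare() returns a match score between 0 and 100.
    if (ssdeep_fuzzy_compare($newHash, $existing) >= 90) {
        die('Duplicate (or near-duplicate) file rejected.');
    }
}
?>
|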
gijs
Joined: 27 Apr 2012 Posts: 189 Location: The Netherlands
Posted: Tue 23 Dec '14 23:35
Why don't you make use of deduplication in Windows Server 2012 and newer?
It works at the block level, so if a file is 99% identical only the 1% difference is stored.
C0nw0nk
Joined: 07 Oct 2013 Posts: 241 Location: United Kingdom, London
Posted: Tue 23 Dec '14 23:50
Because when you run a site with a MySQL database that points to files, it won't work, will it? You can remove the duplicate file, but the path will then return a 404 error from your web server while the database still contains the record. That would basically break the site and cause a lot more problems; you want a method that functions with your database and existing code.
You need to use a method that works with your code. That is my answer.
I currently use sha1_file(); but the longer the identifier string, the less likely you are to encounter another file with the same hash.
For example:
picture.jpg has a SHA-1 hash of "a1fb9d651cd328cb83b43e616189b7406ee6f889".
If I take picture.jpg, rename it to picture1.jpg and re-upload it, the hash is still the same because the contents of the file are the same. This prevents people from uploading the same picture more than once under different file names.
If you want to see this in action for yourself, go to www.virustotal.com, upload a file, then rename the file and upload it again: it recognizes the same file because the hashes (the basic blueprint) that identify a file are the same.
Code: | MD5 | 98968828c85467a12aece86392029900
SHA1 | a1fb9d651cd328cb83b43e616189b7406ee6f889
SHA256 | 7bb2970d8a3b30b7eb88fcc2c985a8e66ccc16cdfc80b0a4b4d5162e9a974916
ssdeep | 1536:O35juMEED0tj+mJLjpMXwoxto0O7wSBSQOAdGOUvmhU:S4lt3jDoxtofS9AdSmhU
|
As you can see, ssdeep has the longest identification output, making it the most secure and the least likely that you will encounter a file with different contents but the same hash.
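A rough sketch of that sha1_file() check against a database; the `files` table, its `hash` column and the connection details are assumptions, so adjust them to your own schema.
Code: | <?php
// Sketch: reject an upload whose content hash is already in the database.
// The `files` table, its `hash` column and the credentials are assumptions.
$hash = sha1_file($_FILES['upload']['tmp_name']);

$pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'password');
$stmt = $pdo->prepare('SELECT id FROM files WHERE hash = ? LIMIT 1');
$stmt->execute(array($hash));

if ($stmt->fetch()) {
    // The same content already exists, no matter what the file is called.
    die('This file has already been uploaded.');
}
// Otherwise move the upload into place and insert its hash, path and upload date.
?>
|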
gijs
Joined: 27 Apr 2012 Posts: 189 Location: The Netherlands
Posted: Wed 24 Dec '14 2:36
It doesn't change the path or anything like that; it works at the file-system level, so applications are not aware of it.
C0nw0nk
Joined: 07 Oct 2013 Posts: 241 Location: United Kingdom, London
Posted: Wed 24 Dec '14 5:54
gijs wrote: | It doesn't change the path or anything like that; it works at the file-system level, so applications are not aware of it. |
That's the point: it does not work with a MySQL database that links to files, so some files would start randomly returning 404 errors while others would stay at 200, making random records in my database obsolete. When you deal with almost a million files this is a serious problem.
My solution is a better fix for this, and I am sure anyone who has to work with MySQL or NoSQL would agree with me. This way I can take every single file, add its unique hash to the database and then go by upload date: the original (first uploaded) file is the one that stays, while all other duplicates can be deleted.
I do understand your point, but it's not like I am running an FTP or storage facility where files are not tracked. Every file is in my database; if I delete a file from the database, it is removed from the server too via PHP.
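A minimal sketch of that clean-up, keeping the earliest upload per hash and deleting the rest; the table name, column names and credentials are assumptions.
Code: | <?php
// Sketch: for every hash keep the first-uploaded file and delete the later duplicates.
// Table name, column names and credentials are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'password');

$dupes = $pdo->query(
    'SELECT f.id, f.path FROM files f
     JOIN (SELECT hash, MIN(uploaded) AS first_upload FROM files GROUP BY hash) o
       ON o.hash = f.hash AND f.uploaded > o.first_upload'
);

$delete = $pdo->prepare('DELETE FROM files WHERE id = ?');
foreach ($dupes as $row) {
    unlink($row['path']);                // remove the duplicate file from disk
    $delete->execute(array($row['id'])); // and its row from the database
}
?>
|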
Jan-E
Joined: 09 Mar 2012 Posts: 1265 Location: Amsterdam, NL, EU
Posted: Wed 24 Dec '14 19:18
Ok, this was a real challenge. It took me some 6 hours of work, but I got it running:
Code: | D:\phpdev\php55nts.x32>type ssdeep.php
<?php
echo ssdeep_fuzzy_hash_filename('./ext/php_ssdeep.dll');
?>
D:\phpdev\php55nts.x32>php ssdeep.php
192:MILfpy6qwqWmq1XxcUsW3oTS3XpqM0+d:VLxy6NqWm4xcIoT+MY
D:\phpdev\php55nts.x32> |
C0nw0nk
Joined: 07 Oct 2013 Posts: 241 Location: United Kingdom, London
Posted: Wed 24 Dec '14 21:25
Oh, now I am with you: it deletes the duplicate but basically leaves the file path redirected to the first or original file.
It seems a good idea, but I would rather prevent the upload and the waste of my resources in the first place; I'd rather not have my site full of pages displaying the same file.
Thanks Jan-E, looking forward to the release build. I will let you know if I encounter any bugs with it, though I find that unlikely since the extension only has three functions.
gijs
Joined: 27 Apr 2012 Posts: 189 Location: The Netherlands
Posted: Wed 24 Dec '14 22:20
Yup, but you can't prevent the upload unless you check the file checksum client-side, which isn't possible with PHP.
C0nw0nk
Joined: 07 Oct 2013 Posts: 241 Location: United Kingdom, London
Posted: Wed 24 Dec '14 23:24
Well, I use sha1_file(); to check uploads, and I deal with a lot of uploads every day. These are video uploads up to 2 GB in size, and speed and performance are not affected. I even use WinCache alongside this, and I process and convert all uploads into HD videos in both MP4 and WebM format, then run sha1_file(); on those as well and log their hashes to the database. That prevents users from downloading them and re-uploading them to be a pain or to try to bypass the system.
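A rough sketch of that workflow; convert_video() is a hypothetical stand-in for whatever encoder is actually used, and the video_hashes table and $videoId are assumed, not taken from any real code.
Code: | <?php
// Sketch: hash the original upload and each converted rendition, then log every hash.
// convert_video() is a hypothetical helper around the real encoder, and the
// video_hashes table plus $videoId come from an assumed upload record.
$original = $_FILES['upload']['tmp_name'];
$hashes   = array('original' => sha1_file($original));

foreach (array('mp4', 'webm') as $format) {
    $converted = convert_video($original, $format); // hypothetical encoder call
    $hashes[$format] = sha1_file($converted);
}

$pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO video_hashes (video_id, variant, hash) VALUES (?, ?, ?)');
foreach ($hashes as $variant => $hash) {
    $stmt->execute(array($videoId, $variant, $hash));
}
?>
|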
jimski
Joined: 18 Jan 2014 Posts: 196 Location: USSA
Posted: Thu 25 Dec '14 10:13
Jan-E, I have a question.
I haven't tested any of the latest builds that you just posted, but almost every build of PHP from any source has the same problem of failing an ApacheBench test when serving phpinfo(). This includes the official PHP builds from php.net.
Let me be more specific.
When doing an ab test on localhost against a PHP file that contains only one line, <?php phpinfo(); ?>:
ab -n 10000 -c 100 -k http://localhost/phpinfo.php
This produces a large number of failed requests on all versions of PHP except PHP 5.3.x.
Can you shed some light on this?
Jan-E
Joined: 09 Mar 2012 Posts: 1265 Location: Amsterdam, NL, EU
Posted: Thu 25 Dec '14 16:58
I could not reproduce it with PHP 5.5 NTS as mod_fcgid, but I can try to recompile without the modules that PHP 5.4+ has and PHP 5.3 does not: php_event, php_jsonc and php_strict.
php_strict might be a culprit, because it could not be loaded together with Suhosin: http://pecl.php.net/package/strict
Which versions do you want me to compile first?
Jan-E
Joined: 09 Mar 2012 Posts: 1265 Location: Amsterdam, NL, EU
Posted: Thu 25 Dec '14 18:34
I recompiled the PHP 5.5/5.6 x86 builds (and the PHP 5.3/5.4 x64 builds) without php_strict, php_jsonc/d and php_event. Could you try these? Otherwise try the previous versions: they are still online, just lower the number.
Starting the recompile of 5.4 x86 and 5.5/5.6 x64 right now.
jimski
Joined: 18 Jan 2014 Posts: 196 Location: USSA
Posted: Thu 25 Dec '14 21:56
Jan-E wrote: | I could not reproduce it with PHP 5.5 nts as mod_fcgid |
I should have mentioned that I was testing the TS version. Let me try your new compilations.
Jan-E
Joined: 09 Mar 2012 Posts: 1265 Location: Amsterdam, NL, EU
Posted: Sat 27 Dec '14 19:10
On a CentOS 6 server with PHP 5.3 I also had 0 failures. But with higher PHP versions I ran into failures as well (1015 out of 10000). The output below is from a CentOS 6 server with Apache 2.4.10 and PHP 5.6.4 as php-fpm, using Zend OPcache.
So this seems to be a non-OS-specific problem. It might even be that Nginx with the same PHP 5.6.4 as php-fpm runs into the same failures.
http://apaste.info/2du
Edit: On a CentOS 6 server with PHP 5.3.29 I had to run ApacheBench a couple of times before it came back error-free. Probably I had to fill the OPcache first:
http://apaste.info/YqC
Mod Note: please use apaste.info for long console outputs
jimski
Joined: 18 Jan 2014 Posts: 196 Location: USSA
Posted: Sun 28 Dec '14 3:35
I also tested it on CentOS 6.5 and got results similar to yours.
No problem with PHP 5.3.3 and PHP 5.3.28.
However, higher versions of PHP produced failures at the same rate on CentOS as on Windows.
Unless we find a reason for this problem, any benchmark test on higher versions of PHP will probably be meaningless if PHP can't reliably serve its own info function without errors.
ng4win
Joined: 25 May 2014 Posts: 78
Posted: Sun 28 Dec '14 17:45
Jan-E wrote: | Probably I had to fill the OPcache: |
Sounds like a cache issue, not a PHP issue.
Jan-E
Joined: 09 Mar 2012 Posts: 1265 Location: Amsterdam, NL, EU
Posted: Sun 28 Dec '14 19:10
It is a cache issue (edit: in PHP 5.3!). If I disable Zend OPcache it goes well the first time, with PHP 5.3.29 as PHP-FPM (and a concurrency level of 1020).
Could you try ApacheBench against an Nginx server?
ab -n 10000 -c 1020 -k http://127.0.0.1/phpinfo.php
As far as I know ApacheBench should be able to connect to Nginx as well.
http://apaste.info/D7m
Mod Note: please use apaste.info for long console outputs
Last edited by Jan-E on Sun 28 Dec '14 20:40; edited 2 times in total
Jan-E
Joined: 09 Mar 2012 Posts: 1265 Location: Amsterdam, NL, EU
Posted: Sun 28 Dec '14 19:38
The same on another CentOS 6 server (with a faster processor) with PHP 5.6.4 as FPM: about 9500 failures the first time and around 10% (1000 out of 10000) on every subsequent run. It does not matter whether I enable or disable Zend OPcache; the 10% failure ratio stays the same.