How to batch the download of very large files with a PHP cron
How would you handle the upload of very big files to a remote FTP server?
Or, put differently: if you are not the one sending the files, how would you tell that big files arriving in a remote FTP folder have been uploaded in full?
The code below might not work on a Windows-based machine. I only use it on Linux servers and have not tried it on a personal computer, so please don't ask me to solve problems between Windows and Apache/PHP; you won't find me...
Anyway, this kind of problem mostly concerns programmers in a professional environment, so I don't expect many readers. Still, I hope it can be of some help!
The situation is this: you have access to an FTP server from which you have to retrieve data from time to time. It could be a public server (see http://mambo.ucsc.edu/psl/sgiftp.html for a list of servers), or a server where a client drops files for you, for instance videos. How do you know that the data you are retrieving is complete, and how do you adapt your batch process so that it only downloads the files that appear to be complete?
There is a way to do that using FTP commands. The code I propose is a batch script in PHP: on each run it lists the remote folder and compares each file's size with the size recorded on the previous run. A file whose size has not changed between two runs is assumed to be complete.
// folder on the FTP server that is watched
$pathftp = "/home/www/html/wires/efe/import";
// folder on my Linux machine where the files will be downloaded to
define('TEMPFOLDER','usr/monrepertoire/');
// connection to the FTP server (logerror() and debugConsole() are my own logging helpers, not shown here)
$hostftpdistant = "FTPHOST" ;
$ftp_user_name = 'USER' ;
$ftp_user_pass = 'LOGIN';
$conn_distant = ftp_connect($hostftpdistant);
if(!$conn_distant || !($login_distant = @ftp_login($conn_distant, $ftp_user_name, $ftp_user_pass))){
    logerror('FTP connection impossible') ;
    print "Operation failed: login/connection problem to ".$hostftpdistant.", please try again";
    die();
}
ftp_chdir($conn_distant,$pathftp) ;
// Arrays: res for the current listing, prev for the previous one (generated by the batch the last time it ran);
// to_process is the array I use to store the names of the files that appear to be complete.
// The first two arrays have an identical structure so that they can be compared.
$res = array() ;
$prev = array();
$to_process = array();
// List all the files in the remote FTP folder.
$buff = ftp_rawlist($conn_distant, '.');
This builds an array of lines like this:
Array
(
[0] => -rw-r--r-- 1 507 507 7787422 Oct 06 15:04 Envio_1_20091006_1600.zip
[1] => -rw-r--r-- 1 507 507 11020588 Oct 06 15:34 Envio_1_20091006_1630.zip
[2] => -rw-r--r-- 1 507 507 11921288 Oct 06 16:05 Envio_1_20091006_1700.zip
)
My function parse_rawlist_into_array transforms this first array into an array of key/value entries, one per file:
$res = parse_rawlist_into_array($buff, $pathftp);
The final result is an array in which each element is keyed by the file name. I keep the full path to the file, its size (which is used to check whether the file is complete), and its date:
[Envio_1_20091007_0000.zip] => Array
(
[path] => /home/www/html/wires/efe/import
[size] => 7812960
[month] => Oct
[day] => 06
[name] => Envio_1_20091007_0000.zip
)
Now you can read the .txt file that lists all the files seen during the previous run of the cron and not yet processed.
// FOLDER is assumed to be a constant, defined elsewhere, holding the directory of the text file
$prev = parse_textfile_into_array(FOLDER.'fichiers_present.txt',$pathftp);
// Compare the sizes. If they are equal, I put the file in the array to_process.
// If they differ, I assume the upload is not finished yet; the file stays listed in fichiers_present.txt for the next run.
foreach($res as $filetocheck){
    $name = $filetocheck['name'];
    // isset() avoids a notice for files that were not there on the previous run
    if(isset($prev[$name]) && $filetocheck['size'] == $prev[$name]['size']){
        $to_process[] = $name ;
    }
}
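The comparison above can also be sketched as a standalone function, which makes the rule easy to try without an FTP server. This is only a minimal sketch: the function name split_complete_and_pending and the 'complete' / 'pending' keys are hypothetical and not part of the original script. It assumes both arrays are keyed by file name and carry a 'size' entry, as in the structure shown earlier.

```php
<?php
// Hypothetical helper (not part of the original script): splits the current
// listing into files whose size is unchanged since the previous run
// ("complete") and files that are new or still growing ("pending").
function split_complete_and_pending(array $current, array $previous)
{
    $result = array('complete' => array(), 'pending' => array());
    foreach ($current as $name => $info) {
        $seenBefore = isset($previous[$name]);
        // Compare as strings so that very large sizes still compare exactly.
        if ($seenBefore && (string) $info['size'] === (string) $previous[$name]['size']) {
            $result['complete'][] = $name;
        } else {
            $result['pending'][] = $name;
        }
    }
    return $result;
}

// Example with made-up sizes: a.zip is stable, b.zip grew, c.zip is new.
$previous = array(
    'a.zip' => array('size' => '100'),
    'b.zip' => array('size' => '200'),
);
$current = array(
    'a.zip' => array('size' => '100'),
    'b.zip' => array('size' => '250'),
    'c.zip' => array('size' => '50'),
);
$split = split_complete_and_pending($current, $previous);
print_r($split);
```

A file that shows up for the first time is always treated as pending, so every file waits at least one full cron interval before it is downloaded.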
foreach($to_process as $file){
    debugConsole('Retrieving ZIP file: '.$file) ;
    if(!$fp = fopen(TEMPFOLDER.$file,'w') ){
        die('permission problem on the folder '.TEMPFOLDER);
    }
    if(!ftp_fget($conn_distant,$fp, $file ,FTP_BINARY)){
        logerror('impossible to retrieve the file '.$file) ;
        // $db_ext and $db are database connections opened elsewhere in my script (not shown)
        $db_ext->disconnect();
        $db->disconnect();
        fclose($fp);
        die();
    }
    fclose($fp);
    // delete the file from the remote folder once it has been downloaded
    if (!ftp_delete($conn_distant, $file)) {
        logerror('impossible to delete the file on the remote server '.$file) ;
    }
}
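// Then I rewrite the text file with the files that were not retrieved.
// I list the folder again with ftp_rawlist, since the downloaded files have been deleted.
$buff = ftp_rawlist($conn_distant, '.');
$newbuff = parse_rawlist($buff, $pathftp);
if(!($handle = fopen(FOLDER.'fichiers_present.txt',"w")))
    die("Cannot open \"".FOLDER."fichiers_present.txt\".\n\n");
fwrite($handle, $newbuff);
fclose($handle);
ftp_close($conn_distant);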
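Saving the listing between two runs can be sketched on its own as a small write/read round trip. This is a hedged sketch, not the original script's code: save_snapshot and load_snapshot are hypothetical names, and the example writes to the system temporary directory instead of the real fichiers_present.txt.

```php
<?php
// Hypothetical helpers (not part of the original script).

// Writes the raw listing lines to $file, one per line. LOCK_EX guards
// against two overlapping cron runs writing at the same time.
function save_snapshot($file, array $lines)
{
    $data = $lines ? implode("\n", $lines) . "\n" : '';
    return file_put_contents($file, $data, LOCK_EX) !== false;
}

// Reads the listing lines back; a missing file means "first run".
function load_snapshot($file)
{
    if (!is_file($file)) {
        return array();
    }
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? array() : $lines;
}

// Round trip in the system temp directory, using two lines from the listing above.
$snapshotFile = tempnam(sys_get_temp_dir(), 'snap');
$lines = array(
    '-rw-r--r-- 1 507 507 7787422 Oct 06 15:04 Envio_1_20091006_1600.zip',
    '-rw-r--r-- 1 507 507 11020588 Oct 06 15:34 Envio_1_20091006_1630.zip',
);
$saved = save_snapshot($snapshotFile, $lines);
$loaded = load_snapshot($snapshotFile);
unlink($snapshotFile);
```

Reading with FILE_IGNORE_NEW_LINES means the lines come back without a trailing newline, so they can be parsed exactly like the lines returned by ftp_rawlist.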
And that's about it. Now you can set the frequency of your batch using crontab, for instance every 10 minutes (*/10 * * * * in crontab syntax).
Here are the three helper functions that I wrote for this script:
function parse_rawlist_into_array($array,$path)
{
    $structure = array();
    foreach($array as $curraw)
    {
        $struc = array();
        // a limit of 9 keeps a file name containing spaces whole in $current[8]
        $current = preg_split("/[\s]+/",$curraw,9);
        // only take regular files (lines starting with "-"), not directories, and skip empty files
        if(isset($current[8]) && substr($curraw,0,1) == '-' && $current[4]>0){
            $struc['path'] = $path;
            $struc['size'] = $current[4];
            $struc['month'] = $current[5];
            $struc['day'] = $current[6];
            $struc['name'] = $current[8];
            $structure[$struc['name']] = $struc;
        }
    }
    return $structure;
}
function parse_textfile_into_array($file,$path){
    // opening in append mode creates the file if it does not exist yet (first run)
    if(!($handle = fopen($file,"a")))
        die("Cannot open \"".$file."\".\n\n");
    fclose($handle);
    $structure = array();
    // one raw listing line per file, in the same format as ftp_rawlist
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach($lines as $Ligne){
        $current = preg_split("/[\s]+/",$Ligne,9);
        if(!isset($current[8])) continue; // skip malformed lines
        $struc = array();
        $struc['path'] = $path;
        $struc['size'] = $current[4];
        $struc['month'] = $current[5];
        $struc['day'] = $current[6];
        $struc['name'] = $current[8];
        $structure[$struc['name']] = $struc;
    }
    return $structure ;
}
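function parse_rawlist($array,$path)
{
    // start from an empty string so that an empty listing returns "" instead of an undefined variable
    $buffer = '';
    foreach($array as $Ligne){
        $current = preg_split("/[\s]+/",$Ligne,9);
        // keep only regular, non-empty files
        if(isset($current[4]) && substr($Ligne,0,1) == '-' && $current[4]>0){
            $buffer .= $Ligne."\n";
        }
    }
    return $buffer;
}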
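Splitting on whitespace depends on the exact number of columns in the listing. As an alternative, a single line can be parsed with one regular expression. This is only a sketch under stated assumptions: parse_rawlist_line is a hypothetical name, and the pattern assumes a Unix-style "ls -l" listing like the one shown earlier. Servers that answer in another format (a Windows-style listing, for instance) will simply not match and the function returns null.

```php
<?php
// Hypothetical alternative to parse_rawlist_into_array for a single line.
// Assumes a Unix-style "ls -l" listing; other formats return null.
function parse_rawlist_line($line)
{
    $pattern = '/^(?P<perms>[-dlbcps][-rwxsStT]{9})[+.@]?\s+\d+\s+\S+\s+\S+\s+'
             . '(?P<size>\d+)\s+(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+'
             . '(?P<time>\S+)\s+(?P<name>.+)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null; // e.g. a "total 12" line or an unknown format
    }
    if ($m['perms'][0] !== '-') {
        return null; // directory, symlink, device...
    }
    return array(
        'size'  => $m['size'],
        'month' => $m['month'],
        'day'   => $m['day'],
        // everything after the time column, so spaces in the name are kept
        'name'  => rtrim($m['name'], "\r\n"),
    );
}

$plain  = parse_rawlist_line('-rw-r--r-- 1 507 507 7787422 Oct 06 15:04 Envio_1_20091006_1600.zip');
$spaced = parse_rawlist_line('-rw-r--r-- 1 507 507 1024 Oct 06 15:04 my big video.mp4');
$dir    = parse_rawlist_line('drwxr-xr-x 2 507 507 4096 Oct 06 15:04 archive');
$total  = parse_rawlist_line('total 12');
```

The size is kept as a string on purpose: it is only ever compared with the size from the previous run, and a string comparison also works for files too large for a 32-bit integer.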
