
How to batch the download of very large files with a PHP cron

How would you handle very big files being uploaded to a remote FTP server?
Or, put differently: if you are not the one sending the files, how do you notice that big files have arrived in full in a remote FTP folder and are complete?
The code below may not work on a Windows machine. I only use it on Linux servers and never tried it on a personal computer, so don't ask me to solve problems between Windows and Apache/PHP or that kind of thing; you won't find me there...

Anyway, this kind of problem mostly concerns programmers in a professional environment, so I don't expect many readers; still, I hope it helps someone!
The situation is this: you have access to an FTP server from which you have to retrieve data from time to time. It could be a public server (see http://mambo.ucsc.edu/psl/sgiftp.html for a list of servers), or a server where a client drops files for you, for instance videos. How do you know that the data you are retrieving is complete, and how do you adapt your batch process so that it only starts downloading the files that appear to be complete?
There is a way to do that with FTP commands: on every run, the batch lists the remote folder, remembers each file's size, and downloads a file only once its size has not changed since the previous run. The code I propose is a batch written in PHP.




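<?php
// folder on the FTP server that is monitored
$pathftp = "/home/www/html/wires/efe/import";

// folder on my Linux machine where the files will be downloaded
define('TEMPFOLDER', '/usr/monrepertoire/');
// folder on my Linux machine that holds the state file fichiers_present.txt (placeholder path)
define('FOLDER', '/usr/monrepertoire/');

// minimal fallbacks for my own logging helpers, in case they are not defined elsewhere
if (!function_exists('logerror')) {
    function logerror($msg) { error_log($msg); }
}
if (!function_exists('debugConsole')) {
    function debugConsole($msg) { echo $msg, "\n"; }
}

// connection to the FTP server
$hostftpdistant = "FTPHOST";
$ftp_user_name = 'USER';
$ftp_user_pass = 'PASSWORD';
$conn_distant = ftp_connect($hostftpdistant);
if (!$conn_distant || !@ftp_login($conn_distant, $ftp_user_name, $ftp_user_pass)) {
    logerror('FTP connection impossible');
    die("Operation failed: login/connection problem to ".$hostftpdistant.", please try again\n");
}
if (!ftp_chdir($conn_distant, $pathftp)) {
    logerror('cannot change to the folder '.$pathftp);
    die();
}

// tables: $res for the current listing, $prev for the previous one (the one saved by the batch the last time it ran),
// $to_process stores the names of the files that appear to be complete.
// $res and $prev have the same structure so that they can be compared.
$res = array();
$prev = array();
$to_process = array();

// list all the files in the remote FTP folder
$buff = ftp_rawlist($conn_distant, '.');
if ($buff === false) {
    logerror('cannot list the folder '.$pathftp);
    die();
}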
 

This returns an array of lines like this:
Array
(
[0] => -rw-r--r-- 1 507 507 7787422 Oct 06 15:04 Envio_1_20091006_1600.zip
[1] => -rw-r--r-- 1 507 507 11020588 Oct 06 15:34 Envio_1_20091006_1630.zip
[2] => -rw-r--r-- 1 507 507 11921288 Oct 06 16:05 Envio_1_20091006_1700.zip

)

My function parse_rawlist_into_array transforms this first array into an array of key/value pairs for each element:

$res = parse_rawlist_into_array($buff, $pathftp);

The final result is an array in which each element is keyed by the file name. I keep the complete path to the file, its size (used to compare and see whether the file is complete), and its date:

[Envio_1_20091007_0000.zip] => Array
(
[path] => /home/www/html/wires/efe/import
[size] => 7812960
[month] => Oct
[day] => 06
[name] => Envio_1_20091007_0000.zip
)
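For readers who want to try the parsing step on its own, here is a minimal sketch of the same transformation. It is not the author's function: `parse_listing_lines()` is a hypothetical name, and it assumes the server answers LIST in the common Unix `ls -l` format (9 columns, the last one being the name). It keeps only entries whose permission string starts with `-`, so directories are skipped, and it splits each line into at most 9 fields so that a file name containing spaces stays whole.

```php
<?php
// Minimal sketch: turn Unix-style LIST lines (as returned by ftp_rawlist)
// into an array keyed by file name. Assumes the usual 9-column format:
// perms, links, owner, group, size, month, day, time-or-year, name.
function parse_listing_lines(array $lines, string $path): array
{
    $files = [];
    foreach ($lines as $line) {
        // Limit to 9 fields so that a name containing spaces stays in one piece.
        $fields = preg_split('/\s+/', trim($line), 9);
        // Keep regular files only: their permission string starts with '-'.
        if (count($fields) < 9 || $fields[0][0] !== '-') {
            continue;
        }
        $files[$fields[8]] = [
            'path'  => $path,
            'size'  => (int) $fields[4],
            'month' => $fields[5],
            'day'   => $fields[6],
            'name'  => $fields[8],
        ];
    }
    return $files;
}

$sample = [
    '-rw-r--r-- 1 507 507 7787422 Oct 06 15:04 Envio_1_20091006_1600.zip',
    'drwxr-xr-x 2 507 507 4096 Oct 06 15:10 archive',
    '-rw-r--r-- 1 507 507 120 Oct 06 16:05 read me.txt',
];
print_r(parse_listing_lines($sample, '/home/www/html/wires/efe/import'));
```

Testing the first character of the permissions is more reliable than testing the size, since a directory entry usually has a non-zero size (4096 in the sample).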

Now you can read the .txt file that lists all the files seen by the previous run of the cron and not processed yet.


$prev = parse_textfile_into_array(FOLDER.'fichiers_present.txt', $pathftp);
// compare the size of each file with its size at the previous run. If they are equal, I put the file in the to_process table.
// If they differ, or if the file was not there last time, I assume the upload is not finished yet and leave the file for the next run.
foreach ($res as $filetocheck) {
    $name = $filetocheck['name'];
    if (isset($prev[$name]) && $filetocheck['size'] == $prev[$name]['size']) {
        $to_process[] = $name;
    }
}
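This comparison is the heart of the method. Here it is as a standalone function, a sketch assuming both arrays are keyed by file name and carry a `size` entry (`select_stable_files()` is a hypothetical name). A file that appears for the first time is never downloaded on that run, since there is no previous size to compare with.

```php
<?php
// Minimal sketch: decide which files look complete by comparing the
// current listing with the one saved by the previous run.
// Both arrays are keyed by file name and hold at least a 'size' entry.
function select_stable_files(array $current, array $previous): array
{
    $stable = [];
    foreach ($current as $name => $info) {
        // A file seen for the first time cannot be judged yet: wait one run.
        if (!isset($previous[$name])) {
            continue;
        }
        // Same size on two consecutive runs: assume the upload has finished.
        if ((int) $info['size'] === (int) $previous[$name]['size']) {
            $stable[] = $name;
        }
    }
    return $stable;
}

$previous = [
    'a.zip' => ['size' => 1000],
    'b.zip' => ['size' => 500],
];
$current = [
    'a.zip' => ['size' => 1000], // unchanged: ready
    'b.zip' => ['size' => 800],  // still growing: wait
    'c.zip' => ['size' => 42],   // new: wait
];
print_r(select_stable_files($current, $previous));
```

The trade-off: an upload that stalls for a whole cron interval looks exactly like a finished one, so this check only works well if the interval is longer than the pauses you expect from the sender.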


foreach ($to_process as $file) {
    debugConsole('Retrieving zip: '.$file);
    if (!$fp = fopen(TEMPFOLDER.$file, 'w')) {
        die('permission problem on the folder '.TEMPFOLDER);
    }

    if (!ftp_fget($conn_distant, $fp, $file, FTP_BINARY)) {
        logerror('impossible to retrieve the file '.$file);
        fclose($fp);
        die();
    }
    fclose($fp);
    // delete the file from the remote folder once it's done
    if (!ftp_delete($conn_distant, $file)) {
        logerror('impossible to delete the file on the remote server '.$file);
    }
}
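The loop above writes straight into TEMPFOLDER, so another program watching that folder could see a half-downloaded zip, which is the same problem we are solving on the FTP side. One common remedy, not part of the original script, is to download into a temporary `.part` file and rename it only when the transfer succeeded. The sketch below assumes the local folder sits on a single filesystem (so `rename()` replaces the target in one step); `download_atomically()` is a hypothetical name, and the FTP call is passed in as a callable so that the function can be tried without a server.

```php
<?php
// Minimal sketch (not part of the original script): write a download to a
// temporary ".part" file and rename it only once the transfer succeeded,
// so that another process watching the local folder never sees a half file.
// $fetch is any callable that writes into the given handle and returns bool.
function download_atomically(string $dir, string $name, callable $fetch): bool
{
    $final = rtrim($dir, '/') . '/' . $name;
    $partial = $final . '.part';
    $fp = fopen($partial, 'w');
    if ($fp === false) {
        return false;
    }
    $ok = $fetch($fp);
    fclose($fp);
    if (!$ok) {
        unlink($partial); // do not leave a truncated file behind
        return false;
    }
    // On the same filesystem, rename() swaps the file into place in one step.
    return rename($partial, $final);
}

$dir = sys_get_temp_dir();
$ok = download_atomically($dir, 'demo.zip', function ($fp) {
    return fwrite($fp, 'payload') !== false;
});
var_dump($ok, file_get_contents($dir . '/demo.zip'));
```

In the batch, the callable would wrap the FTP transfer, for instance `function ($fp) use ($conn_distant, $file) { return ftp_fget($conn_distant, $fp, $file, FTP_BINARY); }`.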

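// then I rewrite the text file with the current listing (the lines from ftp_rawlist),
// so that the next run can compare the sizes against it
$newbuff = parse_rawlist($buff, $pathftp);
if (!($handle = fopen(FOLDER.'fichiers_present.txt', "w"))) {
    die("Cannot open \"".FOLDER."fichiers_present.txt\".\n");
}
fwrite($handle, (string) $newbuff);
fclose($handle);
ftp_close($conn_distant);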
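The state file does not have to contain raw listing lines. As an alternative to the author's format, the sketch below stores only name => size pairs as JSON between two runs; `save_snapshot()`, `load_snapshot()` and the file name `fichiers_present.json` are hypothetical. The loaded array has the same name => ['size' => ...] shape as the `$prev` table compared earlier, and a missing file simply means this is the first run.

```php
<?php
// Minimal sketch of an alternative state file (not the author's format):
// only name => size pairs are kept between two cron runs, as JSON.
function save_snapshot(string $stateFile, array $files): bool
{
    $sizes = [];
    foreach ($files as $name => $info) {
        $sizes[$name] = (int) $info['size'];
    }
    // JSON_FORCE_OBJECT keeps an object even if the names look like list indexes.
    $json = json_encode($sizes, JSON_FORCE_OBJECT);
    if ($json === false) {
        return false; // e.g. a file name that is not valid UTF-8
    }
    // LOCK_EX: hold an exclusive lock on the file while writing.
    return file_put_contents($stateFile, $json, LOCK_EX) !== false;
}

function load_snapshot(string $stateFile): array
{
    // No state file yet (first run): nothing can be judged complete.
    if (!is_file($stateFile)) {
        return [];
    }
    $sizes = json_decode((string) file_get_contents($stateFile), true);
    if (!is_array($sizes)) {
        return [];
    }
    $files = [];
    foreach ($sizes as $name => $size) {
        $files[$name] = ['name' => (string) $name, 'size' => (int) $size];
    }
    return $files;
}

$stateFile = sys_get_temp_dir() . '/fichiers_present.json';
save_snapshot($stateFile, ['Envio_1_20091006_1600.zip' => ['size' => 7787422]]);
print_r(load_snapshot($stateFile));
```

Keeping only what the comparison needs avoids re-parsing listing lines, at the cost of losing the dates the author stores.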

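And that's about it. Now you can set the frequency of your batch with crontab, for instance every 10 minutes. Since a file is downloaded only once its size has stayed the same between two runs, choose an interval longer than any pause you expect in the uploads.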
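With a run every 10 minutes, a single very large download can take longer than the interval, and cron will then start a second copy of the batch that sees the same files. A minimal guard, assuming Linux and not part of the original script, is an exclusive non-blocking `flock()` on a lock file taken at the top of the script; `acquire_run_lock()` and the lock file name are hypothetical.

```php
<?php
// Minimal sketch (not in the original script): make sure two cron runs
// never overlap, e.g. when a large download takes longer than 10 minutes.
// flock() with LOCK_NB returns false at once if another run holds the lock.
function acquire_run_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // create if missing, do not truncate
    if ($handle === false) {
        return false;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    return $handle; // keep this handle open until the script ends
}

$lock = acquire_run_lock(sys_get_temp_dir() . '/ftp_batch.lock');
if ($lock === false) {
    exit("Previous run still active, skipping this one.\n");
}
// ... the batch itself would run here ...
```

The matching crontab entry would look like `*/10 * * * * php /path/to/batch.php` (placeholder path). The lock is released automatically when the PHP process ends, so a run that crashes does not block the following ones.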

Now the three helper functions I wrote for this script:




// Turns the lines returned by ftp_rawlist into an array keyed by file name.
// Expects the usual Unix listing: perms, links, owner, group, size, month, day, time/year, name.
function parse_rawlist_into_array($array, $path)
{
    $structure = array();
    foreach ($array as $curraw) {
        // at most 9 fields, so that a file name containing spaces stays whole
        $current = preg_split("/\s+/", trim($curraw), 9);
        // keep only regular files (permissions starting with '-'), not directories
        if (count($current) < 9 || $current[0][0] !== '-') {
            continue;
        }
        $struc = array();
        $struc['path']  = $path;
        $struc['size']  = $current[4];
        $struc['month'] = $current[5];
        $struc['day']   = $current[6];
        $struc['name']  = $current[8];
        $structure[$struc['name']] = $struc;
    }
    return $structure;
}

// Reads the state file written by the previous run (same line format as ftp_rawlist).
function parse_textfile_into_array($file, $path)
{
    // first run: no state file yet, so nothing is considered complete
    if (!file_exists($file)) {
        return array();
    }
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        die("Cannot open \"".$file."\".\n");
    }
    return parse_rawlist_into_array($lines, $path);
}

// Keeps only the listing lines that describe regular files, one per line.
function parse_rawlist($array, $path)
{
    $buffer = '';
    foreach ($array as $Ligne) {
        $current = preg_split("/\s+/", trim($Ligne), 9);
        if (count($current) === 9 && $current[0][0] === '-') {
            $buffer .= $Ligne."\n";
        }
    }
    return $buffer;
}