Tony (Administrators), posted March 9, 2012

So this one has been getting to me for a while now. There is a site I'm trying to log in to. After a successful login I go to a reports page and download up to 3 reports that are linked on that page. The problem is that the reports are randomly named, so I can't get cURL to download them. I'll post the HTML file the site sent back to me. Can someone see how to download the 3 files on that page without hardcoding the file names? The 3 reports are named:

My Company Invoice Upload_516_146_2012391728360
My Company Invoice Upload_516_146_20123917221402
My Company Patient Upload_516_146_2012391723258

Attachment: My Scheduled Reports.htm
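Here is a minimal sketch of the log-in-then-fetch part. It is written in Python for illustration, although the thread itself is about PHP cURL. The key point is that the login cookie has to be kept and sent again on every later request. PHP cURL does that with CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE; here an http.cookiejar.CookieJar plays the same role. LOGIN_URL, REPORTS_URL and the form field names UserName and Password are hypothetical placeholders, not taken from the site.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical placeholders: read the real URLs and field names from the
# site's login form and reports page.
LOGIN_URL = "https://reports.example.com/Login.aspx"
REPORTS_URL = "https://reports.example.com/Home/frmMyReports.aspx"


def make_session():
    """Return an opener that stores cookies from responses and sends them back
    on later requests (the role CURLOPT_COOKIEJAR/CURLOPT_COOKIEFILE play in PHP cURL)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password):
    """Build the POST that submits the login form (field names are hypothetical)."""
    body = urllib.parse.urlencode({"UserName": username, "Password": password})
    return urllib.request.Request(LOGIN_URL, data=body.encode("ascii"), method="POST")


def fetch_reports_page(opener, username, password):
    """Log in, then fetch the reports page through the same cookie-carrying opener."""
    with opener.open(build_login_request(username, password)) as resp:
        resp.read()  # drain the body; HTTPCookieProcessor has already saved any Set-Cookie headers
    with opener.open(REPORTS_URL) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The report links point at .aspx pages, so the site is probably ASP.NET Web Forms. Login forms on such sites usually also expect hidden fields such as __VIEWSTATE to be posted back. If that is the case here, those fields would have to be read from the login page first and added to the POST body.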
Nathan (Administrators), posted March 9, 2012

Hopefully Brandon can help you with this one. He seems pretty knowledgeable about this.
Brandon, posted March 9, 2012

Suppose you are scraping these out with a regex:

/Home/frmMyReportOpen.aspx?FileName=018a257a-4fa1-42e3-af0d-f818e4dfda3c.csv&FileExtension=CSV&FilePath=192.168.9.95BTReports

If you then try to get the CSV file with a new cURL call, you first have to turn each &amp; in that link (that is how the & separators appear in the raw HTML) into a plain "&" with html_entity_decode: http://php.net/manual/en/function.html-entity-decode.php

You also have to prepend the site's base URL, because that link is a relative path.
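Brandon's two steps, plus the download itself, can be sketched like this. The sketch is again in Python: html.unescape does the job of PHP's html_entity_decode, and urllib.parse.urljoin does the prepending. It makes three assumptions. BASE_URL is a hypothetical host. The regex assumes every report link starts with /Home/frmMyReportOpen.aspx, as in the link above. Naming each saved file after its FileName query parameter is just one possible choice.

```python
import html
import re
import urllib.parse

# Hypothetical base URL; replace with the site's real scheme and host.
BASE_URL = "https://reports.example.com/"

# Matches the href of every report link, quoted with either ' or ".
# Group 2 is the raw attribute value, still HTML-escaped (&amp;).
REPORT_LINK_RE = re.compile(
    r"""href\s*=\s*(["'])(/Home/frmMyReportOpen\.aspx\?[^"']*)\1""",
    re.IGNORECASE,
)


def report_urls(page_html, base_url=BASE_URL):
    """Return absolute, decoded download URLs for every report link on the page."""
    urls = []
    for match in REPORT_LINK_RE.finditer(page_html):
        decoded = html.unescape(match.group(2))  # &amp; -> & (PHP: html_entity_decode)
        urls.append(urllib.parse.urljoin(base_url, decoded))  # relative -> absolute
    return urls


def report_file_name(url):
    """Use the FileName query parameter as the local file name."""
    query = urllib.parse.urlsplit(url).query
    return urllib.parse.parse_qs(query).get("FileName", ["report.csv"])[0]


def download_reports(opener, page_html, base_url=BASE_URL):
    """Fetch each report with the logged-in opener; return {file name: bytes}."""
    files = {}
    for url in report_urls(page_html, base_url):
        with opener.open(url) as resp:
            files[report_file_name(url)] = resp.read()
    return files
```

Skipping the unescape step is what breaks the download. Without it, the server receives a parameter named amp;FileExtension instead of FileExtension.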
Tony (Administrators, thread author), posted March 10, 2012

It's probably going to take me a minute to digest what you're saying, but thank you, and I'll see if I can figure it out. Do you know of a tutorial that shows how to scrape with regex? I'm going to Google it, but if you have a good one in mind I'd like to hear it. Thanks!
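In place of a tutorial link, here is a small commented regex that pulls every link and its text out of a page. It uses Python's verbose regex mode so that each piece is explained inline. It is a sketch that assumes fairly regular markup. The href values it returns are still HTML-escaped, so they still need the decoding step Brandon describes.

```python
import re

# General-purpose pattern for <a ... href="...">text</a>, one piece per line.
HREF_RE = re.compile(
    r"""
    <a\b            # start of an anchor tag; \b stops it matching <abbr>
    [^>]*?          # any other attributes, lazily, without leaving the tag
    \shref\s*=\s*   # whitespace, then href (so data-href is not matched)
    (["'])          # group 1: the opening quote, either double or single
    (.*?)           # group 2: the URL itself, as short as possible
    \1              # the same quote character that opened it
    [^>]*>          # the rest of the opening tag
    (.*?)           # group 3: the link text
    </a>            # the closing tag
    """,
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def links(page_html):
    """Return (raw href, link text) pairs for every anchor in the page."""
    return [(m.group(2), m.group(3).strip()) for m in HREF_RE.finditer(page_html)]
```

Regexes break easily on real-world HTML, for example on comments, unusual attribute quoting, or nested tags in the link text. Once the quick version works, a proper parser such as Python's html.parser or PHP's DOMDocument is the sturdier option.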