Showing results for tags 'scraping'.
Found 1 result

So this is my first little mini tutorial here. Hope someone will like it or find it useful. Basically what we are going to do is scrape some data from a remote website using PHP and cURL. cURL is a "client URL transfer library" for making all sorts of remote requests, and it is very useful for things like fetching data, logging in automatically, auto-filling forms, etc. Let's get cracking!

First and foremost we have to enable the cURL extension, as it is not enabled by default. On a Windows machine, edit your php.ini, uncomment the line ;extension=php_curl.dll (remove the leading semicolon) and restart your server. On Ubuntu, run sudo apt-get install php5-curl and restart the server. I use a WAMP server at home, and installing extensions on it is super easy: left-click the WAMPSERVER icon in the system tray (down in the right corner of your screen), go to PHP -> PHP extensions, click php_curl, and then restart the server. Voila! (There is also a one-liner at the end of this post to verify the extension actually loaded.)

Alright, now we are going to initialize cURL, make a request to another site, and display the HTML with an echo:

    <?php
    $url = "http://www.nytimes.com/";
    $ch = curl_init($url);
    // Return the response as a string instead of printing it directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    curl_close($ch);
    echo $result;
    ?>

Now that we have the HTML inside $result, we can extract the data we are after using regular expressions. In this case I took a regex from http://regexlib.com/ that extracts links and modded it just a little bit to make it work. You can comment out the previous echo $result; and paste these two lines in its place (a short loop for walking through $match follows below):

    preg_match_all("/<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\'].*?>([^<]+|.*?)?<\/a>/is", $result, $match, PREG_SET_ORDER);
    print_r($match);

And that is how this stuff works; pretty easy, basic stuff. You can read more about PHP's cURL support at http://php.net/manual/en/book.curl.php. I especially recommend the curl_setopt() page, where you can do all kinds of cool stuff like setting a user agent, a referrer, or a cookie to mimic your request coming from an actual user; a sketch of a few of those options is below as well. Any questions or suggestions, just fire away in the thread! More information on cURL: http://curl.haxx.se/
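As promised, here is a quick way to check whether cURL is actually enabled. extension_loaded() is a standard PHP function, so this is just a convenience check:

    <?php
    // Prints bool(true) if the cURL extension is loaded, bool(false) otherwise
    var_dump(extension_loaded('curl'));
    ?>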
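And here is the loop mentioned above for walking through $match. With PREG_SET_ORDER, each element of $match is one full match: index 0 holds the whole <a> tag, index 1 the captured href value, and index 2 the captured link text. The loop itself is just my own minimal sketch on top of the tutorial's regex:

    <?php
    // Assumes $result already holds the HTML fetched by the cURL snippet above
    preg_match_all("/<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\'].*?>([^<]+|.*?)?<\/a>/is", $result, $match, PREG_SET_ORDER);

    foreach ($match as $link) {
        $href = $link[1];
        // The text group is optional in the regex, so guard against it being absent
        $text = isset($link[2]) ? trim(strip_tags($link[2])) : "";
        echo $href . " => " . $text . "\n";
    }
    ?>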
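Finally, a taste of those curl_setopt() options. The option constants (CURLOPT_USERAGENT, CURLOPT_REFERER, CURLOPT_COOKIE, CURLOPT_FOLLOWLOCATION) are real PHP cURL options; the user agent string, referrer and cookie value are just made-up example values:

    <?php
    $ch = curl_init("http://www.nytimes.com/");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Pretend the request comes from a regular browser (example UA string)
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0");
    // Claim we arrived via a Google search
    curl_setopt($ch, CURLOPT_REFERER, "http://www.google.com/");
    // Send a cookie along with the request (name/value are placeholders)
    curl_setopt($ch, CURLOPT_COOKIE, "session_id=abc123");
    // Follow any redirects the server sends back
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $result = curl_exec($ch);
    curl_close($ch);
    echo $result;
    ?>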