Use jQuery and PHP to scrape page content


Posted on November 18, 2009

So we have content on another domain that we want to load via AJAX into a page. How can we do this? This was a question that was put to me the other day at work. More experienced web developers will know that the browser’s same-origin policy stops JavaScript from making cross-domain XMLHttpRequests, or AJAX (Asynchronous JavaScript and XML) requests. There is a ‘dirty’ way to get around this: use PHP and cURL to pull the HTML of the page you want the content from, so that as far as JavaScript is concerned it’s coming from your own domain. Let me just say, this isn’t an ideal solution, but it’s a useful technique when used in the right situation.
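To make the restriction concrete, here is a small sketch (not part of the original demo) of the comparison the browser makes. The hostnames are placeholders and `isSameOrigin` is a hypothetical helper name; it assumes an environment that provides the standard `URL` class.

```javascript
// Hypothetical helper: two URLs share an origin only when
// scheme, host and port all match.
function isSameOrigin(urlA, urlB) {
    return new URL(urlA).origin === new URL(urlB).origin;
}

// Placeholder hostnames, used purely for illustration.
// A page calling a PHP script on its own domain:
console.log(isSameOrigin("http://www.example.com/page.html", "http://www.example.com/curl.php")); // true
// The same page calling a different subdomain directly:
console.log(isSameOrigin("http://www.example.com/page.html", "http://news.example.com/")); // false
```

This is why routing the request through a script on your own domain works: the only URL the browser sees is a same-origin one.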

NOTE: You need PHP5 on your server with the cURL extension enabled in order to use this.

The PHP

In this example we’re taking the community news section from smashingmagazine.com. First, we use PHP and cURL to fetch the entire contents of the homepage. We can then use JavaScript to pick out a specific div, as explained below.

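<?php
// Fetch the remote page. RETURNTRANSFER makes curl_exec() return the HTML instead of printing it.
$ch = curl_init("http://www.smashingmagazine.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
echo $html;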

The JavaScript

This must be the simplest couple of lines of JavaScript ever. Inside the DOM-ready function we load the content of the div #noupesoc into #content; the part of the argument before the space is the URL of our PHP script and the part after it is the selector to extract. It’s as simple as that. You can specify any div or element on the page and grab it using this method.

    // Once the DOM is ready, request curl.php and insert only its #noupesoc element into #content
    $(document).ready(function() {
        $("#content").load("curl.php #noupesoc");
    });
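If the single string passed to .load() looks odd, the following simplified sketch shows roughly how it is read: everything before the first space is the URL to request, and the rest is a selector applied to the returned HTML. `splitLoadTarget` is a hypothetical name for illustration and is not jQuery’s own code.

```javascript
// Hypothetical helper mirroring how .load() reads its first argument:
// text before the first space is the URL, the remainder is a selector.
function splitLoadTarget(target) {
    var firstSpace = target.indexOf(" ");
    if (firstSpace === -1) {
        // No selector given: the whole response would be inserted.
        return { url: target, selector: "" };
    }
    return {
        url: target.slice(0, firstSpace),
        selector: target.slice(firstSpace + 1).trim()
    };
}

var parts = splitLoadTarget("curl.php #noupesoc");
console.log(parts.url);      // curl.php
console.log(parts.selector); // #noupesoc
```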

The HTML

<h1>Smashing Community News</h1>
<div id="content"><img src="ajax-loader.gif" alt="Loading..." /></div>
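One thing to be aware of: if the request fails, .load() leaves #content untouched, so the loading image would spin forever. A minimal sketch of handling that with the optional callback that .load() accepts follows. The helper name `loadFailureMessage` and the message wording are assumptions for illustration and are not part of the original demo.

```javascript
// Hypothetical helper: turns the status arguments jQuery passes to the
// .load() callback into a message, or null when nothing went wrong.
function loadFailureMessage(status, xhr) {
    if (status !== "error" && status !== "timeout") {
        return null; // the fragment was inserted, nothing to report
    }
    return "Sorry, the news could not be loaded (" + xhr.status + " " + xhr.statusText + ").";
}

// Only wire things up when jQuery is present (i.e. in the browser).
if (typeof jQuery !== "undefined") {
    jQuery(document).ready(function ($) {
        $("#content").load("curl.php #noupesoc", function (response, status, xhr) {
            var message = loadFailureMessage(status, xhr);
            if (message !== null) {
                $("#content").text(message); // replaces the loading image
            }
        });
    });
}
```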

Comments
23 discussions around Use jQuery and PHP to scrape page content
  1. rubbish says:

    executing someone else’s HTML within the context of your own page is asking for trouble and is the easiest way to get yourself XSS’d.

    • Ashley says:

      @rubbish, you’ve clearly not read the post. I’m not encouraging users to actively trawl the internet and scrape any content they wish; however, if you don’t want someone to use information from your site, why post it online?

      The problem I had at work was that we had two subdomains that we needed to make AJAX requests on, and both were owned by us.

      If you’d left your real name and email address I could have emailed you personally as you’ve clearly got the wrong end of the stick.

  2. very useful! thanks a lot

  3. Pingback: Really Useful Tutorials You Should Have Read in November 2009 Ajax Help W3C Tag

  4. Pingback: Destillat KW49-2009 | duetsch.info - GNU/Linux, Open Source, Softwareentwicklung, Selbstmanagement, Vim ...

  5. Hello, seems you have a problem in the code shown. Appears <h1> instead of !

  6. Hello, it looks like you have a problem in the code shown. <h1> appears instead of !

  7. Brenelz says:

    What do you think is the best way to deal with images / links, as they will be broken if moved over?

  8. Davinder says:

    Hello,
    Thanks for this tutorial! Would you be able to create a simple login script with error/warning messages using jQuery?

    For example:
    I log in with a wrong username/password and, without refreshing the page, an error pops up.

    Thank you

  9. Ben says:

    Hmm…wouldn’t it be better to use PHP all the way? I can’t really see a practical use for this? Nice tutorial though.

    • Ashley says:

      @Ben, the reason we use JavaScript is so we can easily inject the content into our page without stopping the rest of the content from loading properly. If there is a problem loading the content, we can easily detect that with jQuery. It’s also a lot easier to do this with jQuery than it would be to strip the content you want out of the page with PHP; you’d have to use some serious RegEx.

  10. Eire32 says:

    That’s a nice workaround, I like it. Handy for pulling news if they don’t have an RSS feed or the like.


About Me

I'm Ashley Ford, Co-founder and Technical Director at Harkable.com, London, UK. Previously I worked at InMobi, Spotify and MySpace. My interests include photography and making short videos, and I'm also an avid F1 fan. I'm always working on side projects. Here are a few: Easy Poll, We Deliver.


