
Thread: Getting Non-Syndicated Info from Another Site

  1. #1
    garrensilverwing
    Join Date
    Nov 2008
    Posts
    148

    Getting Non-Syndicated Info from Another Site

    I want to grab information from another site that is not available in an RSS feed. An example of the info I would like to grab can be seen at http://www.uschess.org/msa/MbrDtlMain.php?13923823. The information changes depending on which USCF profile I need to pull up, which is easy to select using the USCF ID. The info will always look like this:
    Code:
    Regular Rating
    </td>
    
    <td>
    <b><nobr>
    **Variable**&nbsp;&nbsp;
    The value comes in several formats: it can be 3-4 digits long, possibly with a short string at the end.
    I know there is probably a way to access the page, grab the information, and store it in a variable using
    regular expressions, but I'm pretty much a novice at regex, so I was hoping you could provide an answer.
    The code would probably look something like this, but I'm afraid it will be more complicated than that:

    Code:
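    // Open the USCF web page (the ID is the example from above).
    $html = @file_get_contents('http://www.uschess.org/msa/MbrDtlMain.php?13923823');
    if ($html === false) {
        // The web page doesn't exist or couldn't be fetched.
        $rating = "unknown";
    } else {
        // TODO: replace with the grabbed information (the part I need help with).
        $rating = "grabbed information";
    }
    echo $rating;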

  2. #2
    lemon-tree
    Join Date
    Nov 2007
    Posts
    1,420

    Re: Getting Non-Syndicated Info from Another Site

    I am wary of problems that may arise from using data that isn't meant to be extracted for custom use. Would it not be better to ask the website's owner either to grant data access or at least to give proper permission to use this method? They will very likely deny you direct data access, but they may be OK with you extracting the number from the page.
    Either email them or use their forums.
    Last edited by lemon-tree; 04-30-2010 at 05:20 PM.

  3. #3
    garrensilverwing

    Re: Getting Non-Syndicated Info from Another Site

    That's a good idea; I'll definitely try that first. Seeing as it is public information, though, I don't think extracting it from their website in the manner mentioned above will be a problem.

  4. #4
    lemon-tree

    Re: Getting Non-Syndicated Info from Another Site

    Bear in mind that just because something is published publicly on the web does not automatically mean it is OK to take it. It's very likely that they'll be just fine with letting you use the data, but it is always worth checking.

  5. #5
    misson
    Join Date
    Mar 2008
    Location
    Libertatia
    Posts
    2,506

    Re: Getting Non-Syndicated Info from Another Site

    If you get permission to use the data but can only access it from the web pages, you can use the regexp:
    Code:
    /Regular Rating\s*(?:<[^>]*>\s*)*([^<]+)/s
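    which matches the first text after "Regular Rating" (the ([^<]+)) and ignores intervening tags and whitespace (the (?:<[^>]*>\s*)*). This should keep working through some changes to the page structure, but it can match more than once if "Regular Rating" occurs more than once on the page. Note that the captured text includes any trailing whitespace and entities such as &nbsp;, so clean those off before using the value.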
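
    As a rough sketch of how that pattern could be applied in PHP: the helper name extract_rating() and the sample markup (including the value 1500P) are made up for illustration; only the pattern itself comes from the post above. The cleanup step is there because the capture still carries the trailing &nbsp; entities and line break.

```php
<?php
// Hypothetical helper: extract_rating() is not from the thread, only the pattern is.
function extract_rating(string $html): ?string
{
    // Skip any tags and whitespace after the label, then capture the first run of text.
    $pattern = '/Regular Rating\s*(?:<[^>]*>\s*)*([^<]+)/s';
    if (!preg_match($pattern, $html, $m)) {
        return null; // label not found on the page
    }
    // The capture still holds "&nbsp;" entities and surrounding whitespace.
    $text = trim(str_replace('&nbsp;', ' ', $m[1]));
    return $text === '' ? null : $text;
}

// Sample markup modelled on the snippet in the first post; "1500P" is a made-up value.
$sample = "Regular Rating\n</td>\n\n<td>\n<b><nobr>\n1500P&nbsp;&nbsp;\n</nobr></b></td>";
echo extract_rating($sample), "\n";
```

    preg_match() stops at the first match, so a second "Regular Rating" further down the page would simply be ignored here; preg_match_all() would be the call to reach for if every occurrence is wanted.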

    Alternatively, you can use a parsed version of the page (such as obtained with DOMDocument::loadHTML, or with simplexml_load_string if the page happens to be well-formed XML) and access the info with the xpath:
    Code:
    //td[text()="Regular Rating"]/following-sibling::*//text()
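
    A minimal sketch of the parsed-page route, assuming the page can be loaded with DOMDocument::loadHTML. The helper name rating_from_dom() and the sample markup are made up, and the query is a slightly more tolerant variant of the xpath above (an assumption on my part): normalize-space() is used because the label in the first post's snippet is surrounded by line breaks, which an exact text comparison would not accept.

```php
<?php
// Hypothetical helper; the tolerant XPath is an assumption, not the thread's exact expression.
function rating_from_dom(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Find the label cell, then take the first table cell that follows it.
    $query = '//td[normalize-space(text())="Regular Rating"]/following-sibling::td[1]';
    $cells = $xpath->query($query);
    if ($cells === false || $cells->length === 0) {
        return null;
    }
    // textContent has entities decoded, so &nbsp; arrives as U+00A0.
    $text = trim(str_replace("\u{00A0}", ' ', $cells->item(0)->textContent));
    return $text === '' ? null : $text;
}

// Sample markup modelled on the snippet in the first post; "1500P" is a made-up value.
$sample = "<table><tr><td>\nRegular Rating\n</td>\n\n<td>\n<b><nobr>\n1500P&nbsp;&nbsp;\n</nobr></b></td></tr></table>";
echo rating_from_dom($sample), "\n";
```
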
    If the data changes infrequently, make sure you cache it to reduce the load on the other server.
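
    One way the caching could look, as a sketch only: a small file cache keyed by USCF ID. The names cached_fetch(), $ttl and the stand-in fetcher are all hypothetical, and the one-hour lifetime is an arbitrary choice, not a recommendation from the site.

```php
<?php
// Hypothetical helper: cached_fetch() and its parameters are illustrative, not from the thread.
function cached_fetch(string $key, int $ttl, callable $fetch, ?string $dir = null): ?string
{
    $dir = $dir ?? sys_get_temp_dir();
    $file = $dir . '/rating_' . md5($key) . '.cache';

    // Serve the stored copy while it is younger than $ttl seconds.
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache miss or stale copy: ask the fetcher and store a successful result.
    $value = $fetch($key);
    if ($value !== null) {
        // LOCK_EX keeps two requests from interleaving their writes.
        file_put_contents($file, $value, LOCK_EX);
    }
    return $value;
}

// Usage sketch with a stand-in fetcher so the example runs without network access.
$calls = 0;
$fetch = function (string $id) use (&$calls): ?string {
    $calls++;
    return '1500P'; // made-up value; a real fetcher would download and parse the page
};
$dir = sys_get_temp_dir() . '/rating_cache_demo_' . uniqid();
mkdir($dir);
$first = cached_fetch('13923823', 3600, $fetch, $dir);
$second = cached_fetch('13923823', 3600, $fetch, $dir);
echo "$first $second (fetched $calls time(s))\n";
```

    Failed lookups (a null result) are deliberately not stored, so a temporary outage on the other server does not get cached as if it were the rating.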
    Last edited by misson; 04-30-2010 at 06:40 PM.
