Thread: PHP - XPATH - Scraping Data From A Page

  1. #1
    masterjake is offline x10Hosting Member masterjake is an unknown quantity at this point
    Join Date
    Oct 2007
    Posts
    73

    PHP - XPATH - Scraping Data From A Page

    My friend wants me to write a script that scrapes his Xbox gamertag and total XP from the page where they are stored and automatically updated. I barely know how XPath works.

    Can someone tell me how I would do this?

    There's a ton of tags but the actual data I need to retrieve on the page is located in a set like this:

    Code:
    <div class="user">
        <h3>
            NEED TO RIP THIS
        </h3>
        <div class="userpic">
        </div>
        <div class="userinfo">
            <dl>
                <dt></dt>
                <dd>NEED TO RIP THIS</dd><!-- TotalRank -->
            </dl>
            <dl>
                <dt></dt>
                <dd></dd>
            </dl>
            <dl>
                <dt></dt>
                <dd></dd>
            </dl>
            <dl>
                <dt></dt>
                <dd></dd>
            </dl>
            <dl>
                <dt/>
                <dd>
                </dd>
            </dl>
        </div>
    </div>
    I have deleted all the data from the tags and put "NEED TO RIP THIS" in the two spots I need to scrape. Can someone help me?
    Last edited by masterjake; 09-01-2009 at 12:21 AM.

  2. #2
    misson is offline x10 Spammer misson is a jewel in the rough
    Join Date
    Mar 2008
    Location
    Libertatia
    Posts
    2,506

    Re: PHP - XPATH - Scraping Data From A Page

    Have you read any tutorials on XPath? Which XML library do you want to use to parse the document (DOM, libxml, SimpleXML, XML Parser or XMLReader)?
    Be sure to read all pages linked in this post; they have further information that should prove useful. When asking for help, make sure you follow Eric Raymond's and Jon Skeet's guidelines for prompt, accurate responses. Please answer any questions I ask; they're not rhetorical (probably). Any posted code is intended as illustrative example, rather than a solution to your problem to be copied without alteration. Study it to learn how to write your own solution.
    Misson, not Mission.

  3. #3
    masterjake is offline x10Hosting Member masterjake is an unknown quantity at this point
    Join Date
    Oct 2007
    Posts
    73

    Re: PHP - XPATH - Scraping Data From A Page

    I've read a few, but they didn't explain very well why things happen the way they do, e.g. the tag order and all that.

    I want to use DOM.

  4. #4
    misson is offline x10 Spammer misson is a jewel in the rough
    Join Date
    Mar 2008
    Location
    Libertatia
    Posts
    2,506

    Re: PHP - XPATH - Scraping Data From A Page

    You understand filesystem paths, right? XPath is a little like filesystem paths given the 6-million-dollar man treatment, with a teleporter, targeting computer and a high-powered sensor array. A filesystem path selects a file or directory, whereas an XPath expression selects document nodes. Nodes include elements, attributes, comments, text and namespaces (note there is some overlap with DOM nodes).

    Every step in an XPath expression has an axis, a node test and zero or more predicates ("axis::test[predicate]"), and steps are separated by forward slashes; filesystem path steps have only a simple node test (no axis or predicate). In XPath, an axis says where to go from the previous node (parent, children, descendants, siblings and so on). Filesystem paths effectively offer only the "child::" axis, which is the default and need not be written out (in either XPath or filesystem paths). Predicates are filters; if filesystem paths had predicates, they would let you specify things like file size, owner, modification time or permissions in the path (e.g. "./*.db[size>1M]" to select all files in the current directory ending in ".db" with a size greater than 1 MiB).
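
    To make the steps and predicates concrete, here is a rough sketch using the DOM extension's DOMXPath class against markup shaped like the snippet in post #1. The values "ExampleTag" and "12345" are placeholders I made up, and a real page would be fetched first and may be structured differently, so treat this as illustrative rather than a drop-in solution.

```php
<?php
// Hypothetical sample markup modeled on the snippet in post #1;
// "ExampleTag" and "12345" are placeholder values, not real data.
$html = <<<'HTML'
<div class="user">
  <h3>
    ExampleTag
  </h3>
  <div class="userinfo">
    <dl><dt></dt><dd>12345</dd><!-- TotalRank --></dl>
    <dl><dt></dt><dd>ignored</dd></dl>
  </div>
</div>
HTML;

$doc = new DOMDocument();
// Real-world pages are rarely valid, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "//div" searches all descendants of the root; [@class="user"] is a
// predicate that keeps only divs whose class attribute equals "user".
// "/h3" then takes the (default) child:: axis with node test "h3".
// normalize-space() trims the whitespace around the text.
$gamertag = $xpath->evaluate('normalize-space(//div[@class="user"]/h3)');

// dl[1] is a positional predicate: the first <dl> child of the userinfo div.
$totalXp = $xpath->evaluate(
    'normalize-space(//div[@class="user"]/div[@class="userinfo"]/dl[1]/dd)'
);

echo $gamertag, "\n"; // ExampleTag
echo $totalXp, "\n";  // 12345
```

    Note that [@class="user"] only matches when the attribute is exactly "user"; if the real page puts several class names on the element, the predicate would need adjusting.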

    As for node order: within a step, nodes are numbered in document order or reverse document order, which matters for positional predicates such as "[1]". Which you get depends on whether the axis is a forward axis (e.g. "descendant::", "following-sibling::") or a reverse axis (e.g. "ancestor::", "preceding-sibling::").
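
    A small sketch of how axis direction affects positional predicates, using a made-up five-item document (the element names and values are hypothetical, chosen only for illustration):

```php
<?php
// Hypothetical document, made up purely for illustration.
$doc = new DOMDocument();
$doc->loadXML(
    '<list><item>a</item><item>b</item><item>c</item><item>d</item><item>e</item></list>'
);
$xpath = new DOMXPath($doc);

// Forward axis: positions count in document order, so [1] is the
// nearest sibling after "c".
$next = $xpath->evaluate('string(//item[.="c"]/following-sibling::item[1])');

// Reverse axis: positions count in reverse document order, so [1] is the
// nearest sibling before "c" and [2] is the one before that.
$prev1 = $xpath->evaluate('string(//item[.="c"]/preceding-sibling::item[1])');
$prev2 = $xpath->evaluate('string(//item[.="c"]/preceding-sibling::item[2])');

echo $next, "\n";  // d
echo $prev1, "\n"; // b
echo $prev2, "\n"; // a
```

    In other words, "[1]" on either axis means "closest to the context node", not "first in the document".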

    For more information, read the tutorials. If you have specific questions about features of XPath (such as your question about node order), read the XPath 1.0 standard or ask them here.

    Note that with the DOM extension, you can use DOMDocument::getElementById(), DOMDocument::getElementsByTagName() and DOMNode::$childNodes instead of using XPath, but the resulting PHP code will be more complex.
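
    For comparison, a rough sketch of the non-XPath route using DOMDocument::getElementsByTagName(). The markup and its values are hypothetical placeholders modeled on post #1, and the loop assumes the first div with class "user" is the one you want.

```php
<?php
// Hypothetical sample markup modeled on the snippet in post #1;
// "ExampleTag" and "12345" are placeholder values, not real data.
$html = <<<'HTML'
<div class="user">
  <h3> ExampleTag </h3>
  <div class="userinfo">
    <dl><dt></dt><dd>12345</dd></dl>
    <dl><dt></dt><dd>ignored</dd></dl>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$gamertag = null;
$totalXp = null;

// Without XPath, the filtering that a predicate would do has to be
// written out by hand.
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') !== 'user') {
        continue;
    }
    // item(0) returns null when there is no matching element.
    $h3 = $div->getElementsByTagName('h3')->item(0);
    $dd = $div->getElementsByTagName('dd')->item(0);
    if ($h3 !== null) {
        $gamertag = trim($h3->textContent);
    }
    if ($dd !== null) {
        $totalXp = trim($dd->textContent);
    }
    break; // assumption: only the first matching div is of interest
}

echo $gamertag, "\n"; // ExampleTag
echo $totalXp, "\n";  // 12345
```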
    Last edited by misson; 09-01-2009 at 06:36 PM.

  5. #5
    masterjake is offline x10Hosting Member masterjake is an unknown quantity at this point
    Join Date
    Oct 2007
    Posts
    73

    Re: PHP - XPATH - Scraping Data From A Page

    Thank you. I will get on that.
