Thread: Parsing html

  1. #1
    jensen is offline x10 Lieutenant jensen is an unknown quantity at this point
    Join Date
    Nov 2005
    Location
    At my desk
    Posts
    438

    Parsing html

    How do we parse a raw HTML page to retrieve data so that we can display it on our own website? I am trying to make an image for my website that shows the number of currently active members on this forum.
    "For I am not ashamed of the gospel of Christ: for it is the power of God unto salvation to every one that believeth" Romans 1:16

  2. #2
    The_Magistrate is offline x10 Elder The_Magistrate is an unknown quantity at this point
    Join Date
    May 2005
    Location
    PA
    Posts
    559

    Re: Parsing html

    Really, the easiest way to do it would be to ask if there were an XML file which might have that data. I don't know if one is available, but Corey or another admin might be able to create one.

    Otherwise, the way I've done it is with PHP. Below is a small example of code you could use to search a webpage for a specific line or piece of a line. It also lets you read information that spans more than one line of the HTML source.

    Code:
    <?php
    // Read the webpage into the PHP script and store it as an array of lines.
    // file() returns FALSE on failure; reading a URL also needs allow_url_fopen enabled.
    $theWebpage = file("##A VALID URL##");
    if ($theWebpage === FALSE)
        die("Could not read the page.");

    $start = NULL;
    $end = NULL;

    // Read each line of the array
    foreach ($theWebpage as $key => $value)
    {
        // If no start has been found yet and the string you want to find is in this line...
        if ($start === NULL && strpos($value, "##START OF THE STRING TO FIND##") !== FALSE)
            $start = $key;  // This is the line we want to start at

        // If the start was already found, and the string you want to end on is in this line
        if ($start !== NULL && strpos($value, "##END OF THE STRING TO FIND##") !== FALSE)
        {
            // The last line we want is the one before the end string
            $end = $key - 1;
            break;  // Stop the loop.
        }
    }

    // Read all the lines in the webpage between the beginning and the end and store them.
    $theExtractedInfo = "";
    if ($start !== NULL && $end !== NULL)
        for ($i = $start; $i <= $end; $i++)
            $theExtractedInfo .= $theWebpage[$i];
    ?>
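
A minimal sketch of the same idea applied to the page held as one string instead of an array of lines, so it also works when both markers sit on the same line. The function name `extract_between`, the sample markup, and the marker strings are made up for illustration; the real forum page will use different HTML.

```php
<?php
// Hypothetical helper: return the text between two markers,
// or null if either marker is missing.
function extract_between(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    $start += strlen($startMarker);            // skip past the start marker itself
    $end = strpos($html, $endMarker, $start);  // look for the end marker after it
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Made-up sample markup standing in for the downloaded page.
$samplePage = "<div id=\"stats\">\n<span class=\"online\">Active members: 42</span>\n</div>";

$raw = extract_between($samplePage, '<span class="online">', '</span>');

// Strip everything that is not a digit, then convert to an integer.
$activeMembers = ($raw === null) ? null : (int) preg_replace('/\D/', '', $raw);
```

In a real script, `$samplePage` would be replaced by the downloaded page, for example the result of `file_get_contents()` on the forum URL.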

  3. #3
    jensen is offline x10 Lieutenant jensen is an unknown quantity at this point
    Join Date
    Nov 2005
    Location
    At my desk
    Posts
    438

    Re: Parsing html

    That's a great help. You must be coding regularly. I will fit it in and learn. Thanks.

  4. #4
    Bryon is offline Administrator Bryon has disabled reputation
    Join Date
    Apr 2005
    Location
    Northfield, NH
    Posts
    7,608

    Re: Parsing html

    I have a suggestion with this.

    You should have a cron script run every few minutes that fetches the data and saves it (to a file or to MySQL). Loading a page over HTTP can take a few seconds, so if your image took its data directly from the remote site on every request, its load time would be a few seconds as well.
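
One way to sketch the file-based variant of this suggestion: the cron script writes the count to a small cache file, and the page that draws the image only reads that file, so no visitor waits on the remote request. The function names and the cache file location are assumptions for illustration only.

```php
<?php
// Hypothetical helpers; names and file location are illustrative.

// Called by the cron script after it has fetched and parsed the remote page.
function save_member_count(string $cacheFile, int $count): bool
{
    // LOCK_EX keeps two overlapping cron runs from writing at the same time.
    return file_put_contents($cacheFile, (string) $count, LOCK_EX) !== false;
}

// Called by the page that displays the count; it never touches the network.
function load_member_count(string $cacheFile): ?int
{
    if (!is_readable($cacheFile)) {
        return null;  // the cron script has not produced a file yet
    }
    $contents = file_get_contents($cacheFile);
    if ($contents === false) {
        return null;
    }
    $contents = trim($contents);
    // Accept the value only if it consists entirely of digits.
    return preg_match('/^\d+$/', $contents) === 1 ? (int) $contents : null;
}

$cacheFile = sys_get_temp_dir() . '/member_count.txt';  // assumed location
save_member_count($cacheFile, 42);
$count = load_member_count($cacheFile);
```

Returning `null` when the file is missing or malformed lets the display page fall back to a placeholder instead of showing a wrong number.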

  5. #5
    jensen is offline x10 Lieutenant jensen is an unknown quantity at this point
    Join Date
    Nov 2005
    Location
    At my desk
    Posts
    438

    Re: Parsing html

    That's an important consideration that I overlooked: the loading time.
    Won't the cron job also need to load the page over HTTP? Anyway, I am not sure how to set up and run a cron job. Also, given the concerns about cron using up server bandwidth, I'll take it step by step.

    But I am always ready to learn from the wise programmers at x10. You're the best.

  6. #6
    Origin is offline x10 Elder Origin is an unknown quantity at this point
    Join Date
    Mar 2005
    Location
    Silicon Valley, California
    Posts
    541

    Re: Parsing html

    Best one of those who have helped you.



  7. #7
    Bryon is offline Administrator Bryon has disabled reputation
    Join Date
    Apr 2005
    Location
    Northfield, NH
    Posts
    7,608

    Re: Parsing html

    Quote Originally Posted by jensen
    That's an important consideration that I overlooked: the loading time.
    Won't the cron job also need to load the page over HTTP? Anyway, I am not sure how to set up and run a cron job. Also, given the concerns about cron using up server bandwidth, I'll take it step by step.

    But I am always ready to learn from the wise programmers at x10. You're the best.
    The cron job wouldn't be hard to set up.

    Once you have the script that loads and parses the HTML, have it update a MySQL database table with whatever information you'd like to store (users online, for example). Then, for the cron job, you could use the command:

    Code:
    php -q /home/[username]/public_html/path/to/cron/script.php
    I would say having the cron job run every 5 minutes would be reasonable. The script itself wouldn't really impact the server much (in terms of CPU/MySQL usage).
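
For reference, a crontab entry for the five-minute schedule might look like the line below. This assumes the host lets you edit the crontab directly; many control panels offer a form with the same five time fields instead. The path is the placeholder from the command above and must be replaced with the real script location.

```
*/5 * * * * php -q /home/[username]/public_html/path/to/cron/script.php
```

The five fields are minute, hour, day of month, month, and day of week; `*/5` in the minute field means every fifth minute.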
