
Thread: Perl/Tkx incorrectly displayed characters

  1. #1
    bvforth

    Perl/Tkx incorrectly displayed characters

    Hi there!

    I'm using Perl/Tkx to grab webpages (LWP), then parse the HTML (HTML::TreeBuilder), then, once I find the right nodes, put the result of $element->as_text() into a text widget (new_tk__text).

    The problem I'm having is that some characters (those with umlauts, or accents, for instance) are incorrectly displayed in the Tkx widget, and sometimes incorrectly in my terminal window, too.
    I've tried to fix this by calling utf8::upgrade($content), but this doesn't seem to have any effect, even though the page says that it's encoded in utf8.

    Could anybody shed more light on this? I've looked through a bunch of Unicode resources, but so far, nothing I've tried in Perl (encode, decode, use encoding 'utf8', utf8::upgrade) seems to have even the slightest effect on my strings (based on the unvarying output of print). I really don't know if this is a Unicode/encoding/decoding issue. Also, I've found that sometimes there are character glyphs missing from Tkx, but I don't know whether this is the problem, or how to rectify that if it is.


    my code:



    #!/usr/bin/perl

    use strict;
    use warnings;
    use LWP;
    use Tkx;
    use HTML::TreeBuilder;
    use Encode;

    my $url = "http://dict.leo.org/ende?lp=ende&lang=de&searchLoc=0&cmpType=relaxed&sectHdr=on&spellToler=&search=for";

    my $lwp = LWP::UserAgent->new();
    my $response = $lwp->get($url);
    my $content = ${$response}{_content};
    #utf8::upgrade($content);#doesn't seem to do anything

    print "encoding: ".$response->content_charset()."\n";
    my $root = HTML::TreeBuilder->new();
    $root->parse($content);
    $root->eof();
    my @nodes = $root->look_down(_tag => 'div', 'id', 'singleword');

    my $mw = Tkx::widget->new(".");
    my $text = $mw->new_tk__text(-width => 100, -height => 30, -wrap => "word");
    my $scroll = $mw->new_ttk__scrollbar(-orient => 'vertical', -command => [$text, 'yview']);
    $text->configure(-yscrollcommand => [$scroll, 'set']);

    my $display_text = $nodes[0]->as_text();
    my $encoded_text = Encode::encode("iso-8859-1", $display_text);
    print "results: $encoded_text\n\noriginal: $display_text\n";

    $text->insert("end", $encoded_text);
    $text->g_grid(-column => 0);
    $scroll->g_grid(-column => 1, -row => 0, -sticky => 'ns');

    &Tkx::MainLoop();



    #####################


    Mac OS X 10.6.2
    Perl 5.10.1
    Tkx 1.08

  2. #2
    misson

    Re: Perl/Tkx incorrectly displayed characters

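    The incoming data needs to be decoded before you can use it. LWP hands you raw bytes; Encode::decode turns them into a Perl character string, using the charset that the response reports.

    Code:
    my $response = $lwp->get($url);
    # content() returns the raw bytes; prefer it to reaching into ${$response}{_content}
    my $content = Encode::decode($response->content_charset(), $response->content);

    You don't need to encode the text before displaying it.

    Code:
    $text->insert("end", $display_text); # rather than $encoded_text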
    Nicely done describing the problem, by the way.
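
    To see why utf8::upgrade had no visible effect while decoding does, here is a minimal sketch using only the core Encode module. The sample string is a made-up stand-in for page content (the UTF-8 bytes for "für"), not data from the actual site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes for "für", as they would arrive off the wire.
my $bytes = "f\xC3\xBCr";

# utf8::upgrade only changes how Perl stores the string internally.
# The string still holds four code points: f, 0xC3, 0xBC, r.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# decode() interprets the bytes as UTF-8 and yields three characters: f, ü, r.
my $chars = decode('UTF-8', $bytes);

printf "raw bytes: length %d\n", length $bytes;       # 4
printf "upgraded:  length %d\n", length $upgraded;    # 4
printf "decoded:   length %d\n", length $chars;       # 3

# Encode again only at the output boundary, here assuming a UTF-8 terminal.
binmode STDOUT, ':encoding(UTF-8)';
print "decoded text: $chars\n";
```

    The decoded string is what should go to the text widget. The binmode line is only for the terminal, and assumes the terminal expects UTF-8.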
    Last edited by misson; 06-09-2010 at 11:48 PM.
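
    As an alternative to calling Encode::decode by hand, HTTP::Response also provides decoded_content, which applies the charset from the response for text content. The sketch below builds a response object offline so it can run without a network connection; the header and body are invented for illustration, and it assumes the HTTP::Response module that ships alongside LWP is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response, built by hand instead of fetched with LWP::UserAgent.
# The body is the UTF-8 byte sequence for "<p>für</p>".
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "<p>f\xC3\xBCr</p>",
);

my $raw     = $response->content;          # undecoded bytes
my $decoded = $response->decoded_content;  # Perl character string

printf "raw length:     %d\n", length $raw;      # 11
printf "decoded length: %d\n", length $decoded;  # 10
```

    With a real request, $lwp->get($url)->decoded_content would take the place of the hand-built object, and the result can be handed to HTML::TreeBuilder as before.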
