Sending GET packets with Java

ParallelLogic

Member
My project: accessing this page (and pages by other users): http://spyed.deviantart.com/badges/. At the bottom you will notice a "Next Page" button that lists the next set of five users who gave this user a "badge". I wish to compile a list of all users that have given a badge to, or received one from, this user. To do that, it is my belief (and I may be mistaken) that it would be faster to send GET packets and get the data that way rather than individually loading the entire page at url?offset=5, url?offset=10, and so on.

I'm using Wireshark to view the network activity that happens when I press the "Next Page" button. I see a GET packet being sent, and some data comes back to me (the last part of that return packet has the data I'm looking for). My problem is that I don't know how to send the initial GET packet with Java. The examples I've come across deal with adding parameters to GET packets sent to established websites (like search engines), as opposed to appending them to the URL - not with replicating an existing packet exchange.

I wish to know how to make Java replicate the packet seen in Wireshark exactly - or at least closely enough that I can get the data I'm after.

Thank you very much
 

essellar

Community Advocate
Community Support
You're going to be calling the same URL the script is currently calling (and likely getting JSON in response). The response will only give you five entries at a go, plus an indication of whether or not there is another page to be called. You don't need to create a "packet"; just call the URL using the java.net package and feed it to an input stream to read it. Here's an example from the Sun^H^H^HJava.net site:

Code:
URI uri = new URI("http://java.sun.com/");
URL url = uri.toURL();
InputStream in = url.openStream();

You have the base URL already, or at least know how to create it. (There is no way I'm wading through all of that minified JavaScript to figure out how they're doing it on the deviantart site.) The page you're calling will only provide 5 entries per call, but at least the response will be (apparently) JSON, which is easy enough to parse and compact. It will mean a lot of calls, but that's the price you pay for using an undocumented, non-public API.
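
I can't tell you the exact data URL (that's what your sniffer is for), but the loop would look something along these lines - the endpoint string and the JSON field names ("entries", "hasMore") are placeholders you'd replace with whatever you actually see coming back:

Code:
import net.arnx.jsonic.JSON;

import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class BadgePager {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint - substitute the real GET address from your sniffer.
        String base = "http://spyed.deviantart.com/badges/?offset=";
        int offset = 0;
        boolean more = true;

        while (more) {
            URL url = new URI(base + offset).toURL();
            InputStream in = url.openStream();
            try {
                // JSONIC decodes the top-level JSON object into a plain Map.
                Map<String, Object> page = JSON.decode(in);
                // "entries" and "hasMore" are hypothetical field names.
                List<?> entries = (List<?>) page.get("entries");
                for (Object entry : entries) {
                    System.out.println(entry);
                }
                more = Boolean.TRUE.equals(page.get("hasMore"));
            } finally {
                in.close();
            }
            offset += 5; // the page shows five entries at a time
        }
    }
}

JSONIC will hand you plain Maps and Lists for objects and arrays, so you can print the whole decoded result once to see what the real structure looks like before you start pulling fields out of it.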
 

ParallelLogic

Member
There is no way I'm wading through all of that minified JavaScript to figure out how they're doing it on the deviantart site
I presume you are referring to the URL that appears in the browser's status bar when you roll over the "Next page" button? I just want to check that that's what you mean.

JSON, which is easy enough to parse and compact.
I found a JSON decoder http://jsonic.sourceforge.jp/ and am currently using that. I've never worked with JSON before, so if you know a better decoder, I'd appreciate knowing about it.

Code:
import net.arnx.jsonic.JSON;
import java.net.*;
import java.io.*;

public class JSONdecoder
{
    public static void main(String[] args)
    {
        try {
            URI uri = new URI("http://spyed.deviantart.com/badges/");
            URL url = uri.toURL();
            InputStream in = url.openStream();   // this is the call that reports the 403
            Object out = JSON.decode(in);
            System.out.println(out);
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I'm currently getting a 403 error. Are you saying the URL should be something from within the JavaScript calls? Or perhaps something I can extract from the packet sent to the deviantART server?

Your help is truly appreciated.
 

essellar

Community Advocate
Community Support
The latter: your packet sniffer will tell you what URL is being called (the GET address).

What you see in the status bar is the URL that will be called if JavaScript is disabled in the browser. That URL will respond with the entire web page. You only want the data (as JSON).
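
As for the 403: some servers refuse requests that don't look like they came from a browser. If the URL you pull out of your sniffer still comes back with a 403 through openStream(), try opening the connection yourself and copying the headers your browser sent (the User-Agent at minimum, plus anything else Wireshark shows, such as a Referer or X-Requested-With). Roughly, and with a made-up URL:

Code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BadgeRequest {
    public static void main(String[] args) throws Exception {
        // Placeholder URL - use the actual GET address from Wireshark here.
        URL url = new URL("http://spyed.deviantart.com/badges/?offset=5");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Mimic the request your browser made; copy any other headers the
        // sniffer shows (Referer, X-Requested-With, cookies) the same way.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");

        System.out.println("HTTP " + conn.getResponseCode());

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line);
        }
        reader.close();

        System.out.println(body);
    }
}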

The JavaScript on the page you linked to makes a GET request to a URL that carries only the data used to change the table (and the previous/next links). All we can know about that URL directly is that it uses the "offset=x" value; that offset number is likely fed to a common library function that both the HTML web page and the JSON page use to make a database query. It would be possible to examine the JavaScript on the web page to work out what the JSON URL ought to look like, but the JavaScript has been minified (made smaller for downloading), and minification is also a pretty good obfuscator (that is, it makes the code much harder to read and understand). For example, this function is easy enough to read (if redundant and useless):

Code:
function addToArray(newMember, existingArray) {
   if (!existingArray) {
      existingArray = window;
   }
   try {
      existingArray.push(newMember);
      return true;
   } catch (e) {
      return false;
   }
}

The same function, minified, would look something like this:

Code:
function aA(a,b){b=b?b:window;try{b.push(a);return true}catch(e){return false}}

There's no indication there what the variables represent, and the function itself is just a variable used elsewhere in a longer script. This is a small example of a very simple function; you can imagine what it's like to wade through a 50 kilobyte source, add the line breaks and indentation to figure out the structure, then make educated guesses as to what the variables represent and what the functions are doing. If you can get the GET address from your sniffer, it'll save a whole lot of work.
 