Friday, May 29, 2009

Thumbnails, Thumbnails, Thumbnails

    Today I decided to work from home, since all I'm really doing is research. I've been looking at a few Firefox extensions that involve web page thumbnails in one way or another, trying to determine how they generate their thumbnails.

    The first is WebReview. After jumping back and forth between every method that references "thumbnail", I think I've found the source, the following SQLite statement:

  "SELECT moz_historyvisits.id, title, domain, visit_count, screenshot, dayvisits, moz_places.url, visit_date " +
  "FROM moz_historyvisits " +
  "JOIN moz_places ON moz_historyvisits.place_id = moz_places.id " +
  "LEFT JOIN wr_places ON wr_places.url = moz_places.url " +
  "WHERE moz_historyvisits.id = ?1 " +
  "AND visit_type NOT IN (4,7);";

    All of the thumbnail references trace back to this SQLite statement. In particular, there is

  var thumbnail = createRootNodeStatement.getUTF8String(4);

     which reads column index 4 of the result row. The index is zero-based, so this is the fifth column in the SELECT list: "screenshot". I was hoping that screenshot would be a picture, but according to the database initialization the screenshot column is just text. What really confused me the first few times I read over the source code is that not once, in any of the JavaScript files in WebReview, does any data ever get inserted into the database. I know most of the tables dealt with here are built in to Firefox and its history management system, but there is still the table made by WebReview

webreviewDBConn.executeSimpleSQL("CREATE TABLE IF NOT EXISTS wr.wr_places ( url TEXT PRIMARY KEY, frequency REAL, dayvisits INTEGER, domain TEXT, subdomain TEXT, screenshot TEXT, daysession INTEGER );");

    which never receives any data directly from a call in the JavaScript files. My first guess was that in the long SQLite statement, where the screenshot is extracted, the line "LEFT JOIN wr_places ON wr_places.url = moz_places.url " somehow synchronizes the data in the moz_places table with wr_places to generate useful data for the screenshot column. However, a LEFT JOIN only combines rows from the two tables while the query is being read; it does not write to either table, so it cannot be what fills the screenshot column. I found this site which gives a rough idea of what each of the tables in the places.sqlite database contains, but none show any obvious screenshot-related information. I'd still like an SQL expert to look over that statement with me, in hopes of getting more clues as to where the screenshot data comes from.
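
    For reference, here is a minimal sketch of what a LEFT JOIN does, written in plain JavaScript rather than real SQLite. The sample rows and the leftJoin helper are made up for illustration and are not part of WebReview; the point is only that every moz_places row is kept, the screenshot value is null when wr_places has no matching url, and neither table is modified.

```javascript
// Hypothetical sample rows; the real tables live in places.sqlite.
const mozPlaces = [
  { id: 1, url: "http://example.com/", title: "Example" },
  { id: 2, url: "http://example.org/", title: "Other" },
];
const wrPlaces = [
  { url: "http://example.com/", screenshot: "data:image/png;base64,AAAA" },
];

// Plain-JS imitation of: left LEFT JOIN right ON left[key] = right[key].
// Every left row is kept; right-hand columns are null when nothing matches.
// Assumes at most one match per key, which holds here because url is the
// PRIMARY KEY of wr_places.
function leftJoin(left, right, key, rightColumns) {
  return left.map((l) => {
    const match = right.find((r) => r[key] === l[key]);
    const extra = {};
    for (const col of rightColumns) {
      extra[col] = match ? match[col] : null;
    }
    return { ...l, ...extra };
  });
}

const joined = leftJoin(mozPlaces, wrPlaces, "url", ["screenshot"]);
console.log(joined);
```

    So if wr_places were really empty, the query would still return history rows, just with nothing in the screenshot column.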

    Screengrab uses some Java methods to generate its screenshots, the key ones being

  var image = new java.awt.Robot().createScreenCapture(new java.awt.Rectangle(box.x, box.y, box.width, box.height));
  Packages.javax.imageio.ImageIO.write(image, "png", b64os);
  b64os.close();
  return "data:image/png;base64," + baos.toString();

    where box holds the screen dimensions. Screengrab uses its own custom class, Base64$OutputStream, which I may have to decompile and read. The program gets the screen capture through the Robot's createScreenCapture method and writes it to the custom Base64 OutputStream. What's returned is the Base64 string representation of the image, which can be applied to any HTML img element by setting img.src to the returned string. I like this way of getting the screenshot: nothing is actually saved to disk, and the raw data of the screenshot lives only in the image's src attribute. But would it be okay to decompile this custom-made Base64$OutputStream file, understand it, then use it for my own extension?
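
    The string format itself is easy to reproduce without Java. Below is a small sketch, not Screengrab's code, that turns raw bytes into the same kind of "data:image/png;base64," string. The toPngDataUrl name is my own, and the four sample bytes are only the start of the PNG signature, so the result shows the encoding but is not a displayable image.

```javascript
// Build a data: URL from raw image bytes (an array of numbers 0-255).
function toPngDataUrl(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  // btoa Base64-encodes a string whose characters are all in the 0-255 range.
  return "data:image/png;base64," + btoa(binary);
}

// The first four bytes of every PNG file.
const pngSignatureStart = [0x89, 0x50, 0x4e, 0x47];
const dataUrl = toPngDataUrl(pngSignatureStart);
console.log(dataUrl);

// In a page, a full image's string would be used like this:
//   document.createElement("img").src = dataUrl;
```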

    One extra piece of information I found while reading the source code is that Mozilla appears to have a built-in way to save files. nsIFilePicker creates an open/save dialog box, so it would be very useful for implementing a future save/load function for the graph. Also, the nsIFile XPCOM interface should help in creating temporary files if needed.

    Showcase produces a nice thumbnail view of all the pages in the tabs you currently have open. The source was pretty daunting to rummage through (one file had nearly 7,000 lines), but when I found how they actually make the thumbnails I was surprised at how easy it was. It uses the canvas drawWindow method, which renders the web content of a window, within the given dimensions, into a canvas object. I don't know why the previous two extensions didn't use this method instead of their elaborate workarounds. I suppose it is possible that, for Screengrab, drawing to a canvas does not help with actually saving the image. Using a canvas seems to be the most reasonable route for creating the thumbnails. However, JSViz uses the CSS style property backgroundImage to set the background image, and it takes a URL as its parameter. This means I won't be able to use the JSViz library to create the graphs if I draw the thumbnails on canvases.
    On closer inspection it appears that WebReview uses a canvas to draw its thumbnails as well, except it uses the drawImage method instead of the drawWindow method. This could be useful if I have problems with the drawWindow method. However, I would still need to know what information is contained in the screenshot column of the database table. Based on what I learned from the Screengrab extension, I think that since it is a string it is most likely the Base64 encoding of the image that gets stored in that column. How the information gets there with no table inserts is a mystery to me, though.
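
    Here is a rough sketch of the canvas approach as I understand it, not Showcase's actual code. The thumbnailSize and drawThumbnail names are mine. drawWindow is a Firefox-only method that is only available to privileged (chrome) code such as an extension, so drawThumbnail is defined but never called here; only the size calculation runs.

```javascript
// Scale a page down to fit inside maxWidth x maxHeight, keeping its aspect
// ratio. Pages already smaller than the limits are left at full size.
function thumbnailSize(pageWidth, pageHeight, maxWidth, maxHeight) {
  const scale = Math.min(maxWidth / pageWidth, maxHeight / pageHeight, 1);
  return {
    width: Math.round(pageWidth * scale),
    height: Math.round(pageHeight * scale),
    scale: scale,
  };
}

// Assumes it runs inside extension code, with contentWindow being the tab's
// window and canvas an HTML canvas element. Not called in this sketch.
function drawThumbnail(contentWindow, canvas, maxWidth, maxHeight) {
  const w = contentWindow.innerWidth;
  const h = contentWindow.innerHeight;
  const size = thumbnailSize(w, h, maxWidth, maxHeight);
  canvas.width = size.width;
  canvas.height = size.height;
  const ctx = canvas.getContext("2d");
  ctx.save();
  ctx.scale(size.scale, size.scale);
  // drawWindow(window, x, y, width, height, backgroundColor)
  ctx.drawWindow(contentWindow, contentWindow.scrollX, contentWindow.scrollY,
                 w, h, "rgb(255,255,255)");
  ctx.restore();
  // Same kind of "data:image/png;base64,..." string that Screengrab returns.
  return canvas.toDataURL("image/png");
}

console.log(thumbnailSize(1024, 768, 200, 200));
```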
  
    As it stands I've made one step forward (I think I can do thumbnails!) but one step back (no more JSViz). I'll have an interesting time on Monday trying to find a graphing application or library that is compatible with the JavaScript canvas object (Steve told me not to design my own algorithm, thank god!). There is one more option I found, though, which would work with JSViz. PageGlimpse offers a service that gives developers access to thumbnails of any web page. Sounds perfect, right? Well, almost. It is indeed free, but only if I use under 300 GB/month. That is a lot of bandwidth for some small thumbnails, but to access the site from an application I write I would have to send my developer key (obtained through a free sign-up), so the amount transferred would be linked to me. Also, since this is a Firefox extension, my hope is that many people will use it, but the more people use it, the more likely it is to go over the limit. Plus the site is still in beta; tying my extension to a site that I have no control over, and that could shut down or decide to charge a fee, does not sit well with me. Still, it is an interesting service to offer.
 
    The last thing I'd like to make a note of is this Python script that uses Mozilla to create a thumbnail of any URL. It's deceptively simple, a hundred lines or less, but if it works correctly (I haven't bothered testing it) it would be a good option to look into if I have to stop building my application as a Firefox extension and fall back to a stand-alone browser application.
 
    I'll spend the rest of the day cleaning up my test extension, which now almost doesn't require clicking to log every website.
