Thursday, June 25, 2009

Added Features and getting ready for Beta (Alpha?) Testing

  1) Delete Node
  It is now possible to delete a node in the graph, along with all of its child edges. The main reason I had been avoiding this was that I still hadn't fixed the problem of multiple root nodes causing the javascript to crash. Once I figured out how to allow for more than one root, the repulsive forces pushed the disconnected components all the way to the edges of the screen. Eventually I found a solution to that problem too.
  Allowing for the deletion of nodes took some reworking of how I build the graph, particularly the extraction from the SQLite database. My method relied on the precondition that each index from 1 through the number of rows in the database corresponded to an edge, but with nodes being deleted this no longer held. Luckily, this problem didn't take me long to solve. Now I have deletion working reliably.
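The fix above boils down to iterating over the rows that actually exist instead of assuming ids run 1 through the row count. Here is a minimal sketch of that idea; the row shape and function names are my own illustration, not the extension's real code:

```javascript
// After deletions, edge ids in the table are no longer contiguous, so the
// graph must be rebuilt from the rows that actually exist.
function buildEdgeList(rows) {
  // rows: [{id, source, target}] as returned by e.g. SELECT * FROM edges
  const edges = [];
  for (const row of rows) {
    edges.push({ id: row.id, source: row.source, target: row.target });
  }
  return edges;
}

function deleteNode(rows, nodeId) {
  // Removing a node drops every edge that touches it; remaining ids keep gaps.
  return rows.filter(r => r.source !== nodeId && r.target !== nodeId);
}

// Deleting node C leaves ids 1 and 4 - iterating 1..rowCount would miss id 4.
const rows = [
  { id: 1, source: 'A', target: 'B' },
  { id: 2, source: 'B', target: 'C' },
  { id: 3, source: 'C', target: 'D' },
  { id: 4, source: 'A', target: 'D' },
];
const remaining = deleteNode(rows, 'C');
```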
   
  2) SVG Edges, HTML Nodes
  I really wasn't fond of the dotted-line appearance that the edges had when made from HTML div tags, so I changed them to SVG. This took a while, though, because I was held up desperately searching for how to use SVG's image objects or add background images to SVG shapes. I ended up doing what I was initially trying to avoid - editing the JSViz module even more. I had to alter how the view object is created so it uses both HTML - for the nodes - and SVG - for the edges. Now each edge is one smooth object and viewing the graph is significantly quicker, since I no longer have to add multiple event listeners for each pixel of the edge.
  
  3) Directed Graph
  This took me two whole days - two days to put triangular markers on edges pointing in the direction of the relation. My first idea was to brute-force the calculations and add a triangular SVG object on top of the edge, with the tip always pointing at the destination of the edge. This involved using the X,Y coordinates of the source node and destination node as my only two references to find the equations for the three points, no matter what the position was. So I busted out my geometry, and was I in for a surprise. I hadn't done geometry in a long time. I couldn't even remember how to get the equation of the line perpendicular to a slope through a point - yes, my method required knowing this. Needless to say, after a lot of scribbling formulas and drawing mock graphs, what I ended up with was a dozen seizure-inducing, constantly moving polygons that weren't even remotely near where I hoped they would be. I abandoned that idea by mid-afternoon and set out to do research. It turned out that SVG polylines have marker objects which do exactly what I wanted, but they were in no way simple to figure out. I still can't figure out why a style property would be called marker-mid when it does not place a marker in the middle of the line, but instead places one at every odd vertex excluding the start and the end. I mean, marker-start and marker-end do what they sound like they do, so this baffled me. Eventually I discovered that an odd combination of setting the marker properties refY = "5" and refX = "-15", together with a path object with d = "M 10 10 0 5 10 0 z", put the arrow where I wanted it. Why does it work? I'm not really sure, except that M 10 10 moves the pen to (10, 10), 0 5 draws a line from (10, 10) to (0, 5), then 10 0 draws a line from (0, 5) to (10, 0), and z closes the object. What those coordinates are relative to, and what refY = "5" and refX = "-15" are relative to, is still a mystery to me.
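For reference, here is my guess at what that marker setup looks like assembled - this is an illustrative reconstruction, not the extension's actual markup (the orient="auto" attribute, which rotates the marker with the line, is my assumption):

```xml
<svg xmlns="http://www.w3.org/2000/svg">
  <defs>
    <marker id="arrow" refX="-15" refY="5"
            markerWidth="10" markerHeight="10" orient="auto">
      <!-- M 10 10 moves to (10,10); lines to (0,5) then (10,0); z closes the triangle -->
      <path d="M 10 10 0 5 10 0 z" />
    </marker>
  </defs>
  <polyline points="50,50 150,100" stroke="black"
            marker-end="url(#arrow)" />
</svg>
```

The path's coordinates are in the marker's own local coordinate system, and (refX, refY) names the point in that system which gets pinned to the path vertex. With the triangle drawn between x = 0 and x = 10, a refX of -15 pins a point well to the left of the shape, which offsets the whole arrow along the line - presumably why those values land it where it looks right.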
  
  4) Renaming 
  Renaming my extension from "url" to "BreadCrumbs" took a lot longer than I thought it would, since I had to edit files that I hadn't touched in ages, like the manifest file and install.rdf. I used this time to separate my methods into more Javascript files, to better group related methods together. This close inspection of my code let me spot some lines that were unneeded or just generally sloppy. In addition I added more comments, gave variables clearer names, and did more work on the CSS.
   
  5) Brainstorming / Implementing a way to easily convey to the user the order of sites visited
  During my demo on Tuesday, Jon Pipitone asked about a way to show the order that sites were visited in. Adding the directed edges was meant to address this, but I realize that if one node has multiple edges out of it, the direction of the arrows conveys nothing about the order those edges were traversed in. So after some brainstorming I decided to add a sidebar menu which shows an ordered list of the websites, based on their entry into the database. The first one in the list is always the root, and as you go down the list you get closer to the most recently logged website. The information displayed is only the website title (with the " - Mozilla Firefox" suffix no longer being recorded) and the date/time that the site was first visited while logging. To make this information easier to take in, I added an effect to the list so that when the user mouses over any node on the graph, the corresponding entry in the sidebar lights up.

   

  I will continue with this idea by putting the list in its own container with a scroll bar, so that you don't have to scroll all the way down and lose sight of the graph. Also, I would like to add an event so that when a node on the graph is clicked, the list automatically scrolls to center that entry in the list container.
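The click-to-center behaviour is really just a one-line offset calculation. A sketch, with hypothetical names and fixed row heights assumed:

```javascript
// Hypothetical sketch of the "center the clicked node's entry" math.
// entryIndex: position of the site in the ordered list (0-based);
// entryHeight/containerHeight: pixel sizes of one row and the scroll container.
function centerEntryScrollTop(entryIndex, entryHeight, containerHeight) {
  const entryTop = entryIndex * entryHeight;
  // Scroll so the entry's midpoint sits at the container's midpoint,
  // clamped so we never scroll past the top.
  return Math.max(0, entryTop - (containerHeight - entryHeight) / 2);
}

// e.g. a click on a node mapped to entry 10, with 20px rows in a 200px sidebar:
const sidebarScrollTop = centerEntryScrollTop(10, 20, 200);
// then something like: sidebar.scrollTop = sidebarScrollTop;
```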

  That's all in terms of added features, for now. Once I'm satisfied with point #5 I'll be writing up some release notes (I haven't done this before) to prepare for the first round of testing. My goal is to have everything ready for Wednesday, July 1st, then hopefully some grad students and/or students in the lab here, BA2270, would be willing to take some time to play around / find bugs / give suggestions / complain about it!

Thursday, June 18, 2009

Code Cleaning, Short Term Memory, and Bread Crumbs...What?

  Lately I haven't added many new features to my extension. Instead I've been refactoring the code, changing variable names to make it more understandable, and separating related methods into a few extra javascript files. This helps me get everything straight in my mind - just today I noticed there was a file called graph.js and could not remember what it was for. It turned out to be an old file from two weeks ago, from when I was trying to serialize the graph structure into an XML file. This "code cleaning" will hopefully help me be more productive.
  My latest addition has been a context menu, for right-clicking on nodes of the graph, that opens up a few extra options - Delete, Collapse, New Edge (?). I'm really at the stage where I have to come up with ideas on how the user may want to manipulate this graph (any suggestions will be much appreciated!). One feature that I'm interested in adding is one that Steve mentioned when Anita Sarma came to visit on Tuesday, which I would describe as giving the extension a short-term memory. The situation is as follows: sometimes you don't know, when you begin, that your browsing session will be important enough to record - you only realize it after visiting a number of sites and reaching a certain point. In that situation my extension cannot help; the user has to know from the start that their future browsing will be useful and relevant. Steve told a story about a group who were teaching students with mental handicaps and would continuously record the students, how they got along, and so forth. Even though they were constantly recording, only a certain time frame was ever SAVED, say one hour. So if something happened that warranted keeping the footage, one of the supervisors could press record or save, and the film would be written to persistent storage along with the hour previously recorded that had not yet been wiped. I could add the same feature, based not on time but on a user-specified number of URL clicks or site visits. The idea would be to continuously log sites - even when logging is off - but once a certain number of entries are in the database, the oldest ones get overwritten. Then, once the logging feature is activated, the recent history will also be there. This would have to be an option that can be turned on and off as desired, since often the, say, 30 links visited before logging was turned on may be completely irrelevant, and would just mean more work for the user to delete all those nodes. Also, I'd have to do some tests to find an optimal default number of sites to remember from the past. Any thoughts on the utility of this idea?
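The short-term memory idea is essentially a capped buffer that stops evicting once recording starts. A minimal sketch, assuming illustrative names (this is the concept, not the extension's code):

```javascript
// "Short term memory": always log, but cap the backlog at a user-chosen
// size until logging is switched on.
class ShortTermLog {
  constructor(capacity) {
    this.capacity = capacity;   // e.g. remember the past 3 site visits
    this.entries = [];
    this.recording = false;
  }
  visit(url) {
    this.entries.push(url);
    // While not recording, overwrite the oldest entries once full.
    if (!this.recording && this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }
  startRecording() {
    // Turning logging on keeps the surviving backlog plus everything after.
    this.recording = true;
  }
}

const log = new ShortTermLog(3);
['a', 'b', 'c', 'd', 'e'].forEach(u => log.visit(u));  // backlog capped at 3
log.startRecording();
log.visit('f');                                        // nothing dropped now
```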
  I've been trying to find more information on the work Gina Venolia has done with Microsoft Research related to semantic searching and the most effective ways to display information about a web page. However, I've yet to find the specific or related papers. The limitations of using only favicons are readily noticeable, so I would like to find a solution that can convey more information at a glance than just which website it is - and even that assumes the user is familiar with the site's favicon. Early next week I will be taking a closer look at thumbnails - a visual representation of the page layout is the most immediately identifiable way to convey all the needed information.
  I'm relatively far into development now, yet I still have no name. I tried asking a few non-computer-scientist friends, explaining the basic idea of the extension. The best suggestion I've received so far is Bread Crumbs. I also remember that during the initial discussion about this, Steve kept saying things along the lines of "It's like leaving a trail of bread crumbs for yourself." I'm still open to suggestions, but by the end of next week I'd like to have a name.

Monday, June 15, 2009

I'm never satisfied with how people do things.

  I spent at least two days trying to get edge annotations separate from node annotations. It's hard to modify the JSViz source code to do what I want; I've probably spent 5-6 hours reading through it, trying to understand exactly what the code does (aside from the Runge-Kutta integration algorithm - that's too much for me). It probably would have been easier to have two tables - nodes and edges - but that would make saving/loading twice as slow, not to mention take a lot more code. Doing it with only one table is more complex: if a node is the target of more than one edge it creates an ugly situation, since I can't directly reference edges. I had to draw a few pages of diagrams to trace through the code and get a feeling for what was going wrong. Eventually (after a full day of work) I figured out a solution using only the index of each edge in the database, plus a dictionary to store the annotations in. It's not too pretty, but it works, and the only modification I had to make to JSViz was changing half a dozen methods to take an additional parameter.
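The index-plus-dictionary trick can be sketched in a few lines. The names here are hypothetical; the point is that keying annotations by the edge's database index keeps edges into the same node distinguishable:

```javascript
// Single-table approach: edges live in one table, and annotations are kept
// in a dictionary keyed by the edge's database index, so a node targeted by
// several edges never causes ambiguity.
const edgeAnnotations = {};

function annotateEdge(edgeIndex, text) {
  edgeAnnotations[edgeIndex] = text;
}

function getAnnotation(edgeIndex) {
  return edgeAnnotations[edgeIndex] || '';
}

// Two edges into the same node stay distinguishable by index:
annotateEdge(3, 'found via search results');
annotateEdge(7, 'linked from the wiki');
```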
  Although my approach works, there is a limiting factor: it slows everything down. The edges that JSViz makes by default are dotted. This is because the edges are rendered in HTML using div tags, which can be translated but not rotated. By making the edge appear as a dotted line, the effect of an edge rotating when a node is dragged around can be achieved by translating each pixel of the edge appropriately. If the edge were one solid line this would be very hard to do in HTML. As a result, there is no one cohesive "edge" object between two nodes; the edge is actually an array of pixels - the individual HTML divs. This caused problems for me because I was adding event listeners for both mouse-over and right-click. For the mouse-over to work with the initial dotted edge of about 5 pixels, the mouse had to be perfectly on one of those 5 pixels, so it didn't work well. To solve this I increased the pixel count of each edge from 5 to 50. This made the edge more solid and thicker, allowing an easier and smoother mouse-over. But for it to work, each pixel had to have three event listeners - mouse over, mouse out, and right click. This meant that adding one edge to the graph added 150 event listeners. As you can imagine, this slowed the once-smooth graph building down to a choppy mess. My plan is to do a bit of testing, eventually, to find the optimal number of pixels, so that mouse-over isn't frustrating for the user while the graph-building process stays smooth. I think this is something to do towards the end, maybe by getting a few people to use it and seeing what they like. I'd most likely show them where in the source code the line for the number of pixels is, so they can edit it and play around to find their "optimal pixel count".
  The latest accomplishment I've made is updating the favicon extraction process so that it will always work on any site that has a favicon - even Facebook, which stores it at the obscure address http://static.ak.fbcdn.net/favicon.ico?8:132011. It didn't go smoothly either, though: for at least an hour and a half I was confused as to why my update caused one of my functions to suddenly no longer be a function. It turned out I had a local variable url, and the function was url.getPossibleFavicon() - on a different url object. That's the problem with working on the same piece of code for a whole day: I get focused on the small details and lose track of the bigger picture.
  This afternoon was spent preparing for the demo, centered around building a PC. The challenging part was trying to come up with a situation where the Back button's limitations really show. Also, finding a few different sites whose favicons are distinguishable took a while. But overall I think I have a decent little presentation. I'm just glad I have everything working!

Wednesday, June 10, 2009

Updates and Frustration

   First off, a great extension to generate or extract favicons:

    IdentFavIcon: The generation of the custom favicons involves a 32-bit cyclic redundancy table and a lot of random alterations to the pixel colours and rendering context to produce a visually unique icon. One idea I have for utilizing this addon is to suggest using it in parallel with my extension. IdentFavIcon stores the custom-generated icon in the moz_favicon table, so it can be easily accessed by my extension. The idea would be graceful degradation when trying to extract the favicon URL: first do my simple regular-expression query of the moz_favicon table; if that fails, use a simple but efficient function IdentFavIcon has to get the explicit URL; and finally, if that fails too, IdentFavIcon would have created a custom one, which I can extract by getting the last entry in the moz_favicon table. If I do this then I no longer need my "dummy icon" (a black box with a question mark) for sites that do not have favicons. However, I'd be requiring the user to install a second extension to work in tandem. I could possibly copy the IdentFavIcon Javascript (there is only one file) and integrate it into my extension, but that seems like the lazy way to do it - not to mention it would feel like I'm stealing someone's hard work. I'm still undecided as to what I should do.

 Now for some status updates:

    I spent all of Tuesday on the Save and Load functions. I'm glad to say they both work. It's rather simple: Save works by dumping the database to a text file, with '|||' separating each entry. Yes, '|||' is an odd separator to use, but since I'm saving the user-entered annotations I couldn't use something as simple as a comma. '|||' is really just an arbitrary choice that I suspect no one would enter into an annotation (unless they feel the need to make my extension crash and burn). The Load portion parses the text file and inserts each entry into the database. I've also set it up so that Loading opens a new tab with the page that was last visited in the loaded session, so it is exactly like picking up from where you left off. My only gripe is that it takes a long time to Load - loading a file that is a copy of a database with 30 entries, and opening the new tab, takes about 4 seconds from selecting Open to having the tab opened. I suppose my method for saving isn't the most efficient, but are there any alternatives?
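The format amounts to flattening each row's fields and joining everything with the '|||' sentinel. A sketch of the round trip, with hypothetical field names (url and annotation stand in for whatever columns the real table has):

```javascript
// Save/load format: each database row is flattened and joined with '|||'.
const SEP = '|||';

function saveSession(rows) {
  // rows: [{url, annotation}] - one '|||'-separated chunk per field.
  return rows.map(r => `${r.url}${SEP}${r.annotation}`).join(SEP);
}

function loadSession(text) {
  const fields = text.split(SEP);
  const rows = [];
  for (let i = 0; i + 1 < fields.length; i += 2) {
    rows.push({ url: fields[i], annotation: fields[i + 1] });
  }
  return rows;
}

const dump = saveSession([
  { url: 'http://a.example', annotation: 'start here' },
  { url: 'http://b.example', annotation: '' },
]);
const restored = loadSession(dump);
```

As the post notes, this breaks if an annotation ever contains the sentinel itself - escaping the separator (or length-prefixing fields) would be the sturdier alternative.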

    Today was spent cleaning up my code more - getting rid of many global variables, renaming methods, doing a few things more efficiently/elegantly, and so on. I did, however, end up wasting about two hours trying to upgrade the logging system to use page-load events instead of a constant timer. This turned out to be completely futile, as the page-load event always fires two or three times during the first load, for no apparent reason. I've looked at the Mozilla documentation plus three extensions that use the page-load event, but none of their methods worked for me - every time, the event would fire multiple times. This frustrated me to no end, so I just gave up and reverted to using timers.

    My current dilemma:

    Imagine this situation: you're running this extension and it is turned on, so it is logging the sites you visit. You visit site A, so it gets logged (its source/parent is irrelevant here). You then decide you don't want the application running anymore, so you click the icon and it stops. You click a mildly interesting link and go to site B, then click another and end up at site C. Site C is interesting to you and you want to log it, so you click the icon and turn site logging back on. Now you select "Open Graph" to see what your browsing graph looks like. Is there one graph with an edge Site A -> Site C, or are there multiple graphs, one of which has only two nodes and one edge, Site B -> Site C?

    Currently my extension will show Site A -> Site C in a situation like this. This makes the most sense to the user: that IS the path they followed, just without the not-so-important Site B. However, it's not the TRUE path they took, so the graph is not an unbiased "history of clicks". I'll most likely stick with the current model, but giving the user the option to choose between the two interpretations might be a good idea.
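The current model can be summed up as: skipped visits are dropped entirely, and the next logged site attaches to the last *logged* one. A sketch of that behaviour (names are illustrative, not the extension's API):

```javascript
// While logging is paused, visits are not recorded, and the next logged
// site is linked to the last logged one - producing A -> C, not B -> C.
function makeLogger() {
  const edges = [];
  let lastLogged = null;
  let enabled = true;
  return {
    setEnabled(on) { enabled = on; },
    visit(site) {
      if (!enabled) return;            // site B is simply skipped
      if (lastLogged) edges.push([lastLogged, site]);
      lastLogged = site;
    },
    edges() { return edges; },
  };
}

const logger = makeLogger();
logger.visit('A');
logger.setEnabled(false);
logger.visit('B');
logger.setEnabled(true);
logger.visit('C');   // edge recorded: A -> C
```

Supporting the other interpretation would mean resetting lastLogged to null when logging is switched off, so C would start a new disconnected component.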

Monday, June 8, 2009

 On Thursday morning I had a chat with Steve about some use cases. The focus was on scientists - I have to make this into a tool tailored for the scientific community. I think that may be why so many previous applications similar to this have failed: they lacked a specific audience. It's an interesting tool to develop but people need more of an incentive to use it and more of an idea of how and why it is useful.

 

 The most difficult features to implement will most likely be the graph-manipulation features related to clustering nodes. Since this is only the initial version I'll leave those for the end, if I think I have enough time to include them. Since I have the "searching the web for papers" section completed and working (minor bugs excepted), my next task is going to be Save/Restore. I can think of two different ways to do it:
  1) Save the data to a formatted XML file
  Pros: Can be viewed anywhere - it will just open in a browser and all the information will be displayed. 
  Cons: It would be static; since there is no database behind it, annotations could not be changed and the user wouldn't be able to continue from where they left off. This option really only produces a nice graphic - say, for a presentation. 
  
  2) Dump the database to a .csv file
  Pros: Fully editable and allows the user to continue where they left off.  
  Cons: Overwrites the old session. No immediate way to integrate into something such as a blog post. Can only be viewed in the extension itself.
 Which route I take is really determined by what is more important: portability or changeability. My personal opinion is that the latter is more desired and useful. However, now that I think about it there is nothing preventing me from eventually adding an option like "Export to XML", in addition to the general Save/Restore. 
 While I wanted to start on the Save/Restore process today, I instead chose to improve my code. I spent some time modifying the way URLs are logged and added to the database, in order to overcome the problem of multiple root nodes appearing when the user logs URLs in multiple windows. Also, I changed the way edges are created so that circuits are now possible and no longer make the graphing procedure crash. I also modified the insertion procedure so that if there is an edge A -> B, then an entry of the form B -> A does not get included.

Friday, June 5, 2009

Status Report

It's been a couple days since I've last posted, but I'm happy to say that I've finally made some real progress on my extension.

My extension now properly builds the graph of a browsing session, with favicons as the graph nodes for each page. However, they don't always resolve. Sometimes a website's favicon URL differs greatly from the URL visited; in these cases the node image defaults to a black box with a question mark. What annoys me most is that many sites don't have favicons at all, but I guess there isn't much I can do about that.


(Unfortunately the picture doesn't show very well.) I've also implemented a few features on the graph. Double-clicking a node opens a new tab in the browser and loads that node's URL. Hovering the mouse over a node makes a nice little tooltip open, displaying the URL by default, but its main purpose is to hold an annotation.

A single click on a node opens up a little dialog box to edit the annotation.

What I'd eventually like to do is alter the JSViz source code so that the edges between nodes are actual objects, then I could place the annotation as a mouse over event for the edges since that's really where it belongs.

Overall I'm very pleased with how it's all going! Now that I have the basics done I can start cleaning up my code, adding features, and altering the interface.

My plan for Monday is to work on a better regular expression to extract favicon URLs. Currently I just parse each URL until I reach .com or .ca or .org, then search the moz_favicon table for any URL with a matching initial section, up to the domain name. It's not the most effective query string. The first real feature I want to add is saving/loading, which I'll go into detail about in my next post, (hopefully) on Monday. Also on Monday I hope to make a short post about the meeting I have with Steve on Wednesday morning, concerning use cases and potential features.

Tuesday, June 2, 2009

What Doesn't Work

Today I've found out what doesn't work:
 
    Using a hidden iframe to open the URL and then get its windowContents property, transferring that onto a canvas, does not work - even though Mozilla says it should. Odd, isn't it?

    Once the URL is loaded, prior to its information being inserted into my database, saving the windowContents to a canvas and then converting that to a string - its base-64 representation - to insert into the database does not work either. I think the problem is that inserting a string of nearly 10,000 characters may be pushing SQLite too far. I'm not 100% sure this is the problem, though, since the only notification I get is a catastrophic error message whose only relevant information is the line number and NS_ERROR_COMPONENT_FAILURE, which, let's face it, doesn't tell me much.
 
    So currently thumbnails are a bit beyond my grasp, but during the meeting today I was given a great suggestion to use for now: instead of a thumbnail, use the favicon (I'm sorry, I forget the name of the girl who suggested it!). I predict that this will work well, simply because I can query the SQLite database that Mozilla uses for storing history-related information through the moz_favicon table. The table has references to the URLs of the favicons. There are two issues I need to solve:

    How do I tie each URL to the appropriate favicon URL?

    I think I'll try to link them by querying the moz_favicon table. When a URL is being added to my URL table, I'll query the moz_favicon table with a regular expression containing the website's URL, then add the resulting favicon URL as another column in my url table. However, it can't be the exact URL that I query the database with - that won't be in the moz_favicon table - so I have to instead extract the relevant portion. I could search from the beginning of the string to .com OR .ca OR .org OR...every possible domain. This isn't pretty, but it should work, in theory. Then I have to ensure that the favicon is indeed stored there. Unfortunately I have already run into a site which stores it at some obscure location: the Google favicon for some site I visited was stored at http://www.gstatic.com/news/img/favicon.ico (found through a SQLite database manager extension) - clearly the domain does not begin with google.com or google.ca. I think I'll try this route anyway, since it's the only reasonable idea I can come up with. My other option would be to parse each website's HTML and find a <link> tag with rel="icon" or rel="shortcut icon", the two most prevalent ways of linking a favicon. Once I found the proper <link> tag I'd just grab its href attribute. What I don't like about this approach is that I'm relying on the website's coder to be tidy - relying on them giving the tag an appropriate rel attribute (according to Wikipedia).
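The "match up to the domain" idea can be sketched as follows. The regex and the sample table rows are illustrative (only .com/.ca/.org are handled, as in the plan above), and the gstatic example shows exactly where the approach falls down:

```javascript
// Take the scheme+host prefix of the visited URL and look for a favicon
// URL (as moz_favicon would hold it) that starts with the same prefix.
function domainPrefix(url) {
  // Capture from the scheme through the first .com/.ca/.org.
  const m = url.match(/^(https?:\/\/[^\/]*?\.(?:com|ca|org))/);
  return m ? m[1] : null;
}

function findFavicon(url, faviconUrls) {
  const prefix = domainPrefix(url);
  if (!prefix) return null;
  return faviconUrls.find(f => f.startsWith(prefix)) || null;
}

const favicons = [
  'http://www.example.com/favicon.ico',
  'http://www.gstatic.com/news/img/favicon.ico',
];
const icon = findFavicon('http://www.example.com/page?id=1', favicons);
// A Google News page fails the lookup, since its icon lives on gstatic.com:
const missing = findFavicon('http://news.google.com/', favicons);
```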
  I'll take it one step at a time:
  Step 1 - Use a placeholder/dummy image but just try to build the graph of a session.
  Step 2 - Replace the placeholder image with the favicon of that site; this shouldn't require too many alterations if I can do Step 1.
  Step 3 - Generate thumbnails for the nodes. This step I might not get to, unfortunately, but I'd really REALLY like to. If I'm not able to implement this during this summer it makes for a good starting point for future enhancements to my extension.
   
    Does every site have a favicon image?

    There is some standardization of file formats and such for favicon images, but I'm not sure if they're mandatory. I'll fall back to a dummy image when my basic queries can't find the URL, until I research this question further and get a definite answer.
  
 Since I now know how I'll approach the thumbnail issue (instead of flailing around with canvas examples and stumbling through documentation), what I want to do next is choose a library to do the graphing of the data. Once I have this I can start doing simple tests to make sure my code works, and then really start making progress. The use of favicons makes the choice easier: since they're direct URLs to images I can use JSViz or, as Ainsley showed me, Graph Gear - both offer nice graphs and use URLs for the node images. But this got me thinking. If I go in this direction, the whole meat of this extension will be tied to using URLs for the graph nodes. This is good for my current use, favicons, but once it gets upgraded to use thumbnail images the code would have to be reworked quite a bit. Most extensions that have anything to do with thumbnails render them with canvases. This is doable, but it takes a lot of time to wrap my head around exactly how they're going about it - how they're saving the data without actually saving an image. Should I be concerned with this? Tomorrow I'll ask Steve. I know it would be possible to save them all to the user's local hard drive - but would people be comfortable with that? Once I have a working prototype I'll have to find some testers willing to critique it.