Thursday, August 20, 2009

Google Code Page

Check out my Google Code page at: http://code.google.com/p/breadcrumbsplugin/

In the Downloads tab there is a link to my BreadCrumbs poster, shown at the U of T Department of Computer Science's Undergraduate Poster Session, along with a downloadable screencast showing the basics of BreadCrumbs.

Monday, August 10, 2009

Updated Version and Future Plans

    I'm finally back to work after a week's vacation, and I've spent the day busily reorganizing/cleaning/documenting my code. I've tried to split the code into more files to make it more comprehensible, along with giving better explanations for each function and pointers to related functions/JavaScript files. Luckily, after moving everything around nothing ended up breaking, and it all seems to work exactly as it used to.

    Throughout my vacation I was still working a bit, from home, because there were a few things I wanted to add. First of all, I changed the layout of the buttons on the graph page. It was getting a bit too cluttered for me, so I added tabbed menus - Graph, Filter, and Options. I've moved the Save and Load options into the graph page, instead of leaving them in the context menu of the Idle/Recording button on the browser overlay. I think it looks nicer and functions better, and it wasn't too much of a problem to add: I just found a script for making tabs called JavaScript Tabifier. It has an MIT license and says I can "modify and use in commercial products", so I don't think I'll have any legal problems using it. That's something I have to check out soon though, not specifically for this tabbing script but for what kind of licensing information I have to provide with my extension.

    The last thing I did over my vacation was add another column to the database which stores the coordinates of each node. If you reload the graph for any reason, the database is updated just before reloading and saves the coordinates of each node. This was a feature that a few people thought would be useful; even I was getting annoyed with losing my bearings after I had moved the graph around to exactly how I wanted it. Since I already have this part done, I think one of my last features for the summer will be to allow users to label certain areas of the graph. I'm not yet sure how to do this, but now that I have the positions of each node saved it shouldn't be too hard: possibly add another table to the database that stores each label's contents and location, then just copy the contents into a div tag to attach to the graph. If I can finish this, the last thing I'd like to do would be to add a rating feature, as Nelle suggested, to signify the importance of the page. Originally I was going to do it by scaling the size of the corresponding node based on the length of time the page was viewed for, but that's a very error-prone solution. So I'll probably end up adding a 1 to 5 rating for each node for the user to manually enter.

    Finally, the updated version:

    BreadCrumbs v1.5

    Note: The database table has been changed, so if you have an older version you have to navigate to your Firefox profile directory and delete breadCrumbs.sqlite prior to installing this updated version. This website should tell you how to find your profile directory.

    There are also two buttons to add, the View Graph button and Quick Title Edit button. The View Graph button opens the graph page, effectively the same as right clicking the Idle/Record button and selecting View Graph. Its hotkey is alt+g. The Quick Title Edit button can only be clicked if BreadCrumbs is Recording. It opens the menu to annotate the title of the current website, and its hotkey is alt+q. To add these buttons to your toolbar select View -> Toolbars -> Customize, then drag and drop the buttons.

Thursday, July 23, 2009

Progress, Problems, and Another Release (soon)

    The problem with thumbnails not always extracting still hasn't been completely solved. I wanted to use a timer, but a timer doesn't work the same as, say, wait or sleep in C programming. I tried making it recurse on itself until the thumbnail had at least 1000 characters, but that exceeded the recursion depth too easily when websites load extremely slowly. Also, I can't simply put it in a loop, because Firefox will think my script is frozen while it loops until the web page sort of loads, and then Firefox freezes. The Mozilla Developer Center says here that it isn't a good idea to extract page thumbnails during the onload event, and that I would have to use the MozAfterPaint event. But that is only in Firefox 3.5. Right now I just have a default image if the thumbnail could not be extracted properly.
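For reference, here is a minimal sketch of what a timer-based retry could look like. It is not the extension's actual code: the names waitForThumbnail, getThumbnail, onDone and the option names are all hypothetical, and the 1000-character threshold is simply the figure mentioned above. Each attempt is scheduled with setTimeout, so it starts from a fresh call stack (no recursion depth problem) and the browser stays responsive between attempts (no busy loop).

```javascript
// Hypothetical helper: poll for a usable thumbnail without blocking the browser.
// getThumbnail() is assumed to return the current data URL string (or null).
// onDone(result) receives the thumbnail, or null if we gave up.
function waitForThumbnail(getThumbnail, onDone, options) {
  var opts = options || {};
  var minLength = opts.minLength || 1000;   // threshold taken from the post
  var maxAttempts = opts.maxAttempts || 20; // assumed cap, so we never poll forever
  var delayMs = opts.delayMs || 250;        // assumed delay between attempts
  var schedule = opts.schedule || setTimeout;
  var attempts = 0;

  function check() {
    attempts += 1;
    var dataUrl = getThumbnail();
    if (dataUrl && dataUrl.length >= minLength) {
      onDone(dataUrl);            // thumbnail looks complete
    } else if (attempts >= maxAttempts) {
      onDone(null);               // give up; caller can fall back to a default image
    } else {
      schedule(check, delayMs);   // try again later from a fresh stack
    }
  }

  check();
}
```

A caller would pass a function that re-extracts the thumbnail on each attempt, and use the default image whenever onDone receives null.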
    Yesterday Jon Pipitone was giving me lots of suggestions for improvement. One idea that I really like is to add a filtering function, to remove all pages of a certain website. I don't think this would be too hard to implement: a simple regular expression comparison while building the graph can easily filter out nodes of the chosen type.
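A rough sketch of that filtering idea, assuming each logged row is a plain object with the source and destination columns described elsewhere on this blog. The function name filterOutSite and the example pattern are hypothetical.

```javascript
// Hypothetical filter: drop every row (edge) that touches a matching site.
// sitePattern should be a RegExp WITHOUT the "g" flag, since test() on a
// global regex keeps state between calls.
function filterOutSite(rows, sitePattern) {
  return rows.filter(function (row) {
    return !sitePattern.test(row.source) && !sitePattern.test(row.destination);
  });
}

// Example: hide everything on wikipedia.org.
var exampleRows = [
  { source: "NULL", destination: "http://en.wikipedia.org/wiki/Graph" },
  { source: "http://en.wikipedia.org/wiki/Graph", destination: "http://example.com/a" },
  { source: "http://example.com/a", destination: "http://example.com/b" }
];
var wikipedia = /^https?:\/\/([^\/]*\.)?wikipedia\.org\//;
var visibleRows = filterOutSite(exampleRows, wikipedia);
```

Note that a node whose only parent was filtered out (example.com/a here) would still need to be drawn as a root; the sketch does not handle that.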
  The big issue to add is some sort of relevant ordering to the nodes. This is hard to do because I have to work around the limitations of JSViz. However, what I think I can do - for now - is make root nodes appear in a sensible manner. Currently they spawn randomly across the screen, which isn't helpful. I can alter it so that root nodes appear in chronological order, either lined up vertically or horizontally. This would only help when there are at least a few root nodes, but it is better than nothing. For now, I'll try this and see if people think it helps.
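One way the chronological root layout could be computed, as a sketch only. It assumes each root is an object with a url and a numeric time, and it ignores JSViz entirely; layoutRoots and its parameters are hypothetical names.

```javascript
// Hypothetical layout: place root nodes left to right in visit order,
// evenly spaced across the given width, all on the same horizontal line.
function layoutRoots(roots, width, y) {
  var sorted = roots.slice().sort(function (a, b) {
    return a.time - b.time;          // earliest visit first
  });
  var gap = width / (sorted.length + 1);
  return sorted.map(function (root, i) {
    return { url: root.url, x: gap * (i + 1), y: y };
  });
}
```

Swapping the roles of x and y would give the vertical variant.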
    Two recent features I've added: Quick Edit Title button and Live Updates. The Quick Edit Title button is a button on the browser toolbar that instantly opens the window for editing the title; it is basically a shortcut. Instead of getting to a site, thinking of a little note to leave, opening the graph, finding that node, right clicking and entering the new note, you can just click this button and do it fast. I've implemented Live Updates so that you no longer have to reload the graph any time you log a new site; it will update itself in real time.
    I haven't gotten much feedback from posting my extension a few weeks ago, but then again there were only 10 downloads. Soon I'll be putting it up on my CSLab page, with all the latest features. I think before I do that I want to have some more reliable way to organize nodes (or at least root nodes) along with testing everything even more. And I have to write up another set of release notes - that's never fun though.

    And finally, I found out that JSViz has a limit on the number of edges that can come from/to a single node: 18. 18 edges are fine, but adding a 19th edge causes everything to catastrophically fail. The only reason I discovered this was because Ainsley tested her Trac plug-in on a large repository and it failed spectacularly, so I tested this by going to Wikipedia and opening every link on one page in new tabs until it crashed. My theory is that the minimum distance set for the magnetic forces cannot be maintained once the 19th node is created, since there is not enough room for all of the nodes to evenly spread out around it. It could also be that the magnetic repulsive forces are so compacted and balanced that the 19th node throws the center node off center, causing it to rocket across the screen. It has to do with the forces because a root node can have more than 19 edges attached to it. Here are some example files:

    Latest Version of BreadCrumbs

    (Note: I'm providing it early with no release notes just to illustrate this edge/node limit per node, some features still are not completely ready [ Stop Animation still has bugs, along with some deleting issues]. If you have an older version of BreadCrumbs please go to your Firefox Profile directory and delete "breadcrumbs.sqlite" as that old database does not have a thumbnail column.)

    Session file that illustrates the breaking [Copy into a text document and save as .session]

    Session file that shows a root with over 19 nodes/edges [Copy into a text document and save as .session]
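Until the cause of the limit is understood, one could at least detect graphs that are about to hit it. The sketch below is a hypothetical guard, not part of JSViz or BreadCrumbs: it counts edges per node from rows with source and destination fields (a root being a row whose source is 'NULL') and reports non-root nodes above a limit. The constant 18 is the value observed above; everything else is an assumption.

```javascript
// Hypothetical guard: list non-root nodes with more edges than the limit.
var MAX_EDGES_PER_NODE = 18; // limit observed in the post, not a documented JSViz constant

function findOverloadedNodes(rows, limit) {
  var degree = Object.create(null); // url -> number of attached edges
  var isRoot = Object.create(null); // url -> true if logged as a root

  rows.forEach(function (row) {
    if (row.source === "NULL") {
      isRoot[row.destination] = true; // root rows are not edges
      return;
    }
    degree[row.source] = (degree[row.source] || 0) + 1;
    degree[row.destination] = (degree[row.destination] || 0) + 1;
  });

  // Roots are skipped because they were observed to tolerate more edges.
  return Object.keys(degree).filter(function (url) {
    return !isRoot[url] && degree[url] > limit;
  });
}
```

Calling findOverloadedNodes(rows, MAX_EDGES_PER_NODE) before drawing would give a list of nodes that might need to be collapsed or split.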

Friday, July 17, 2009

Thumbnails

  I'm revisiting the idea of having thumbnails - not as each node, since that would cause the graph to be far too large, but to appear upon mousing over a node. Now that I know more about JavaScript and Firefox it wasn't very hard to get a thumbnail of each page. However, even a thumbnail that's 20% the size of the real webpage will be huge when converted to a string using the Canvas method .toDataURL(). I was making some thumbnails that were, again, nearly 10,000 characters. So, there was no way I could reliably insert a string that long into the SQLite database for each node.
  After talking it over with Steve he suggested trying to use the cache to store the thumbnails in temporarily, and then when the user saves the result would be a directory structure. The idea is similar to how Firefox can save webpages, by creating the directory structure of the page and saving all of the images in that directory. I figured this would work, but I didn't like the idea of having to save a folder instead of just one file.
  I spent at least three hours searching around the documentation about Cache on the Mozilla Developer Center but there were no examples of how to use it, and the descriptions of all the Cache related functions were very vague. However, I used this site, Mozilla Cross Reference, that Blake Winton suggested and found this, which was exactly what I needed to understand how the functions can work together. Still, I wasn't sold on the idea of storing in the cache and having to save a directory, not just a simple file.
  Before trying to manipulate the cache I tried searching for more extensions that deal with thumbnails. I found that WebReview, one of the first extensions I looked at, had been updated. I looked a lot at the source code and it had changed a lot, so I hoped to get some more information about how it stored thumbnails. Unfortunately, when I installed it, it didn't work at all; the graph portion just kept raising exceptions whenever I tried to open it. And the part that I really wanted to know, what it inserted into the database, wouldn't execute either. So, no hope for finding out how WebReview stores thumbnails.
  The next extension I tried was Thumbstrip, and luckily this also has to save a lot of thumbnails. I played around with it for a bit, and saved a session. When I looked at the saved file it turned out that they did it exactly how I initially planned on doing it - the thumbnails were saved in their text form and they were very long. I tested it, going to about 25 sites, then saving it. The resulting save file was over 5 MB with nearly 40,000 lines and 6 million characters. But it worked, so I figured I might as well try my initial idea.

  I altered my database and tried inserting a few thumbnails - in text form - into it, and also changed each node's mouse over div tag to show the thumbnail instead of the title. The result?

  

  It does exactly what I wanted it to. Now I just have to flush out a few bugs, since sometimes the page isn't fully loaded and the thumbnail is incomplete, but that shouldn't be too hard to fix with a timer.

New Approach

I've been trying to change the way I program, and this is the first real attempt I've made.  It's a simple function - deleting an edge. However, instead of just diving in, coding, running, and seeing what works and what errors I get, I sat down and wrote out on paper what I'd have to do. I took into account as many possible boundary cases as I could think of and sketched exactly what I had to do. Here's the final version of what I wrote out:

deleteEdge(edge, edgeContainer)  

    1) Remove edge from edgeContainer (SVG object) - it can't be seen now.
    2) Extract row edge.idx from the database, getting its Source, Destination, and Title. [destination, source, and title are column names of my database]
    3) Attempt to extract similarEntry from the database, where destination = Destination (from step 2) and source != Source (from step 2). This checks whether there is another edge that leads to Destination.
  
    if (similarEntry is not NULL) {
        // Then the Destination of this edge to be deleted has at least one other edge leading to it, so after deletion it will NOT become a root node.
        if (similarEntry's row > edge.idx [its row number]) {
            // This means that the edge to be deleted is the first entry in the database where Destination (from step 2) is in the destination column. This is important because the user-altered title for each website is stored in that website's first occurrence in the database. So, we have to copy the title from this entry to the next reference of Destination - similarEntry (since we know it comes after it because its row number is greater).
            4) Update row similarEntry to have title = Title (from step 2).
        }
        5) Delete row edge.idx from the database.
    } else { 
        // There is NO other entry in the database where destination = Destination and source != Source, so the Destination of this edge we're deleting will become a root node.
       6) Update row edge.idx so that source='NULL', making it a root. 
    }
   
    7) Reload the graph_page.html, to reset the forces.
    8) Done.

    Once I had this all written up, writing the code was simple, and aside from a few typos it worked perfectly on the first run. This is a major improvement over my standard method of writing the first idea that comes into my head, then running it and fixing bugs over and over until it works.
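For illustration, the database steps written out above (2 through 6) could be sketched in plain JavaScript against an in-memory array instead of SQLite. This is not the extension's real code: the row shape { idx, source, destination, title } is assumed from the column names mentioned in the steps, and the SVG removal (step 1) and page reload (step 7) are left out.

```javascript
// Sketch of steps 2-6 on an array of rows; returns the updated table.
// Rows are modified in place when a title or source has to change.
function deleteEdge(table, edgeIdx) {
  // Step 2: find the row for the edge being deleted.
  var edge = table.filter(function (row) { return row.idx === edgeIdx; })[0];
  if (!edge) { return table; }

  // Step 3: earliest other row with the same destination but a different source.
  var similar = table
    .filter(function (row) {
      return row.destination === edge.destination && row.source !== edge.source;
    })
    .sort(function (a, b) { return a.idx - b.idx; })[0];

  if (similar) {
    if (similar.idx > edge.idx) {
      // Step 4: the deleted row held the user-edited title, so carry it over.
      similar.title = edge.title;
    }
    // Step 5: remove the row for this edge.
    return table.filter(function (row) { return row.idx !== edge.idx; });
  }

  // Step 6: no other edge leads to the destination, so it becomes a root.
  edge.source = "NULL";
  return table;
}
```

Keeping the database logic separate from the SVG and reload steps like this also makes each boundary case easy to check on its own.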

Friday, July 10, 2009

Features to Come

    So I've gotten the first draft of the release notes up. I hope they'll suffice. It took a lot longer to do than I originally had planned due to testing and documentation. I'm pretty picky on documentation and trying to rename variables/functions to make everything more understandable. 
    The testing process mostly found errors where I had variable names changed. The frustrating bugs came from boundary cases and specific series of events that had to occur which would lead to an error or a site not getting logged. One example is that having multiple tabs open prior to turning logging on causes problems since those sites didn't get logged (no Load event occurred to log the URL) and thus would create unbounded non-root nodes. In order to solve these I had to add multiple if-else statements to make sure that each property required to successfully log the URL was present. This is extremely hard to do in the case of the user opening pages that get loaded in background tabs, through "open in new tab". The problem arises from the fact that background loading does not trigger any progress or event listeners, so I have no reliable way to log the site. For now I have it set on a timer of 500 milliseconds - it just needs to load the URL so that it does not default to 'about:blank'. However, this does not always work. I could possibly change how the "open in new tab" function works, by automatically switching focus to the newly opened tab, but users may not like an extension altering default Firefox functionality. I at least have a backup safeguard so that the extension doesn't break. Whenever a tab is selected, if the URL has not already been logged it gets entered into the database, but as a root node. This does not accurately re-create the user's browsing session, but it keeps things stable.
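The backup safeguard could look roughly like the sketch below. It works on an in-memory array rather than the real database, the function name logOnTabSelect is hypothetical, and skipping 'about:blank' is an assumption on my part rather than confirmed behaviour.

```javascript
// Hypothetical safeguard: when a tab is selected and its URL was never
// logged, record it as a root node so the graph stays consistent.
// Returns true if a new root row was added.
function logOnTabSelect(rows, url, now) {
  if (!url || url === "about:blank") {
    return false;                       // nothing useful to log yet (assumed)
  }
  var alreadyLogged = rows.some(function (row) {
    return row.destination === url;     // every logged page appears as a destination
  });
  if (alreadyLogged) {
    return false;
  }
  rows.push({ source: "NULL", destination: url, time: now });
  return true;
}
```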
    Lately I've been reading some textbooks in my free time, the most recent one being "Artificial Intelligence and Software Engineering - Understanding the Promise of the Future". After reading the first couple chapters - a general overview of Software Engineering ideas - I've realized that I tend to follow a Run-Understand-Debug-Edit style of development. I find it is easier to keep a mental model of the program when I do incremental development of this sort. The result is that the program generally works (with the exception of a few unconsidered boundary cases) but is also messy. For example, the main procedure for logging websites currently has many bits and pieces and messy subroutines, since over time I have slowly widened the scope of what it can do. It works, but it's messy. I plan on taking a couple days to flush it out - to read over all the pieces and try to organize it to flow logically. My hope is that this will increase efficiency, reduce line-count, and prevent future bugs.
    I've finished with the basics, so now I have a few options on what to tackle next. Not all options are needed, and it really depends on what would be the most useful feature to have:
    1) Thumbnails
        I had some problems early on with attempting to use thumbnails, but now that I have more experience with JavaScript and Firefox I may be able to figure out some way to make the nodes thumbnails.
    2) Relevant ordering of nodes
        Right now the layout of the graph has no significance; it just uses magnetic and spring forces to spread the nodes out as evenly as possible. I could add a new layout type to list everything in chronological order, or a way to display only one specific website and all sites that came from it or linked to it. The graph could be filtered by site, so that a scientist could only see papers from one specific domain, for example.
    3) Graph manipulation
        More features for users to edit the graph: New Edge, Delete Edge, Collapse Node, anything. These functions would be designed to present a clearer and more fluid graph for not only personal reference but sharing with others as well. 
    4) Significance of a node
        It was suggested two weeks ago, when I did my demo to the grad students, that I add a feature to show how significant or important a website is. I could do this by logging the length of time that was spent on each site, and then alter the size of the node to relate to how long was spent viewing. This would be more useful for the casual user than for a scientist, because the length of time spent on a website for a scientist could, most likely, correlate to the length of the article or paper being read. However, it is still an interesting idea and provides more information about the browsing session to the user at a single glance.
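As a small sketch of option 4, node size could be derived from viewing time with a linear scale and a cap, so that one very long visit does not dwarf everything else. The function name nodeRadius and all of its parameters are hypothetical; nothing here reflects how JSViz actually sizes nodes.

```javascript
// Hypothetical scaling: map seconds viewed onto a radius between
// minRadius and maxRadius, saturating at capSeconds.
function nodeRadius(secondsViewed, minRadius, maxRadius, capSeconds) {
  var clamped = Math.min(Math.max(secondsViewed, 0), capSeconds);
  return minRadius + (maxRadius - minRadius) * (clamped / capSeconds);
}
```

With a radius range of 10 to 30 and a cap of 600 seconds, a page viewed for five minutes would land halfway between the two sizes.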

Thursday, July 9, 2009

Release Notes

DOWNLOAD:

   Download here from RapidShare. It can only be downloaded 10 times, so if the link is down please send me an e-mail so I can reupload it.

INSTALL:

    1) I suggest setting up a new profile - just to be safe - as well as keeping an eye on the Firefox Error Console. To set up a new Firefox profile see this short document: http://support.mozilla.com/en-US/kb/Managing+Profiles

    2) Drag BreadCrumbs.xpi into your Firefox browser ( >= 3.0)

    3) Click Install

    4) Done!

TO USE:

    In the bottom right hand corner of the Firefox status bar will be a red icon with URL on it; that is the main controlling icon for BreadCrumbs.

    To begin logging websites, simply left click the icon. It will turn green to signify that it is running. Browse away! It can be clicked again to turn off - essentially pausing the logging - and clicking once more will resume where you left off.

    Right clicking the icon will present a context menu.

        Save Session: This will save your current session to a file.

        Load Session: This will allow you to load a saved session file.

        New Session: This will erase any logged websites and start you over with a fresh graph.

        View Session: This will allow you to view the graph.

    Logged Browsing History (the graph)

        Show Session Trail: This shows a list of the links from site to site in your session, ordered from earliest (top entry) to most recent (bottom entry). Each entry shows the destination site and the exact date and time. Hovering your mouse over an entry causes it to be highlighted, along with the edge that it describes. The entry can be clicked to bring up a window to enter a new annotation for that link [Please do not include "|||" in the annotation; three pipes].

        Reload: [Self explanatory]

        Pause Animation: When the page is loaded the nodes of the graph will continually spread out so that all are visible. Once they are visible enough for you, you can click the Pause Animation button to stop them from spreading out more.  Note: Any clicking on the graph will cause the animation to resume.

The Graph itself:

    Nodes: Each node of the graph is a website. Most nodes can be dragged around the screen, but if a node is outlined in red then it is a root node and thus cannot be dragged. Hovering over a node causes a tooltip to appear with the title of the site. Right click on a node to open a context menu.

        Edit Title: This opens a window to allow you to rename the node to anything you want - please, without the character sequence "|||" [three pipes].

        Visit Site: This opens the website in a background tab in your browser.

        Collapse Node: [NOT IMPLEMENTED YET].

        New Edge: [NOT IMPLEMENTED YET].

        Delete Node: This will delete the node and all edges that connect to it. This may result in multiple disconnected graphs, which is fine. [Deleting a node causes an automatic reload to reset the magnetic/spring forces - I am working on a better solution].

        Close: This simply closes the context menu.

    Edges: There are two types of edges, solid or dotted. Solid edges are formed when a link is clicked (or selected to "open in new tab"), so they correspond to direct links. Dotted edges are any other type of link: a bookmark, clicking "Home", manually entering a URL, opening a new tab (ctrl+t) then entering a URL, etc. If logging is paused during a session then turned back on in the future, the resulting edge will be a dotted edge since there may have been many sites in between.

    Hovering over an edge causes it to be highlighted, along with the corresponding entry in the Session Trail panel (if it is not hidden), and also displays the link annotation. Right clicking on the edge will bring up a context menu.

    Edit Annotation: This will open a small window to enter any annotation you wish for the edge [Again, avoid "|||"]. Clicking on the corresponding entry in the Session Trail will also open the window.

    Delete Edge: This will delete the edge and reload the graph automatically.

    Close: This closes the menu.

KNOWN BUGS/ISSUES:

  • "open in new tab" sometimes causes the resulting webpage to not become logged, or when it is logged it is set as a root node.
  • New Edge and Collapse functions are not implemented.
  • Selecting an improper file in the Load Session option causes it to break.
  • Saving/Loading a very long session is slow.
  • Sometimes the forces between edges and nodes in the graph cause it to spread out widely, the temporary fix for this is the "Pause Animation" button.
  • Not allowing websites to load sometimes results in two copies of the same link appearing in the Session Trail panel.
  • Favicon extraction isn't the best, but it should work for most sites.  
  • Browsing with multiple windows has not been significantly tested.
  • The colours don't match in any sense.

CURIOUS?:

    To view the source code rename BreadCrumbs.xpi to BreadCrumbs.zip, and unzip. If you want to see where everything is being stored go to your Firefox profile directory and look for breadCrumbs.sqlite. I use the Firefox extension SQLite Manager to check the contents of breadCrumbs.sqlite and the edgeLog table to see what's what.