Wednesday, May 20, 2009

Previous Wiki Entries

May 19
Graphical Representation of Browser History 
      Today I went through a few tutorials on creating Firefox extensions - learning XUL, understanding JavaScript, getting the file structure and chrome options straightened out, etc. I found that making a script run continually while the browser is open is not all that hard. However, getting a function to run exactly once when a webpage is loaded is frustratingly hard, since JavaScript's load event listeners fire multiple times (for frames and embedded documents as well as the page itself). After over an hour of searching I have yet to find a solution. So far my workaround is a status bar button that logs the current URL when clicked, but as said before, that button would have to be clicked for every webpage the user wants to log into the graph. 
    Once I began searching, I immediately ran into WebReview. 
        WebReview is a Firefox extension that has a number of features, most relevant to me being its WebGraph feature 
        "WebReview Graph cannot only display the complete tree you browsed currently, but visualizes also all browsing sessions with every visited web page in your history in a graph. You can find out, how you found are very interesting page even if it was months before (if your history of Firefox goes back that far). In addition, you will be able to export an entire graph to a single file if you want to archive it or to share." 
        I decided to test this out, and it is exactly what Steve was looking for. It creates a complete graph of websites visited - what led to what - and it does this for the entire browser history. Sessions are all automatically saved. It even has thumbnails of each website for the graph nodes. Should I continue trying to create my own version of this? 
        A few changes I would make: 
            This extension lacks the ability to make annotations between two webpages in the graph. 
            Building multiple graphs based on Firefox's entire history seems very process-intensive; I would change it so that the user can start/stop the script whenever they like. Once stopped, the graph could be viewed, annotated (optionally), and saved. 
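The start/stop idea above only needs a tiny data structure underneath it. Here is a minimal sketch in Python; every name in it (SessionGraph, log_visit, and so on) is my own invention for illustration, not anything from WebReview:

```python
# Sketch of a start/stop browsing-session graph. Nodes are URLs;
# a directed edge records which page led to which. All names here
# are hypothetical - nothing is taken from WebReview's code.

class SessionGraph:
    def __init__(self):
        self.recording = False
        self.edges = []           # list of (from_url, to_url) pairs
        self.annotations = {}     # (from_url, to_url) -> note text

    def start(self):
        self.recording = True

    def stop(self):
        self.recording = False

    def log_visit(self, from_url, to_url):
        """Record a navigation, but only while recording is on."""
        if self.recording:
            self.edges.append((from_url, to_url))

    def annotate(self, from_url, to_url, note):
        """Attach a note to an edge - the feature WebReview lacks."""
        self.annotations[(from_url, to_url)] = note
```

The browser-side script would only need to call log_visit on each page load; viewing, annotating, and saving happen after the user stops recording, which avoids rebuilding graphs from the whole history.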

May 15
Graphical Representation of Browser History 
        This project seems like it could not be done efficiently in Django, since Django is tied to a specific set of web pages. To get the most use out of this I would need it to generalize to all web pages - a browser plug-in. So today I will be researching the development of Firefox browser extensions and learning the languages they use (XUL and JavaScript). However, this would limit the application's usability to Firefox alone, unless Firefox extensions can be ported to other browsers with ease. Should I continue down this route and forget about Django, or continue with Django and try to develop this as a web app to integrate into a site for finding relevant research articles? As of now I am still unsure, so for the time being I will research the difficulty of Firefox extension development and then examine the restrictions of presenting graphs with Django - to keep my options open. 

        So I spent most of the afternoon reading about JavaScript and XUL. A Firefox extension might not be so bad, but after about an hour of browsing I'm still unsure whether I can make a script run the entire time the browser is open. I initially thought I could build it as a Greasemonkey script, but Greasemonkey's limitations are just too much. So far my best option seems to be to build the extension and add a button to the toolbar which, when clicked, automatically extracts the current URL and stores it in some sort of abstract data structure. The problem is that I would need the script to run continually so that the data isn't lost, and I'm unsure whether that is even possible to do efficiently with JavaScript. If Django could be used to make browser extensions that would be great, since it seems like I need some sort of database to store the URLs. Also, implementing it in this fashion requires the user to click the button on every page they want logged. One button click isn't so much, but after a few dozen pages it may get annoying - having it automated would be much better. I may have to see how a client-side database could be created when a Firefox session is opened and manipulated by JavaScript, but this seems like an awkward workaround for the simple problem of saving the URL obtained from the script. Unfortunately my knowledge of JavaScript consists of the minimal amount I've learned today, and dynamically managing information in a data structure - or even the simpler problem of keeping the script running throughout the session - has escaped me so far. 

Social Network through Subversion 
    Idea: Divide each file into sections (divide the line number by a certain amount, say 10), then upon committing see which sections the altered lines fall within. Or divide by method/routine as opposed to fixed-size sections. These options seem the most challenging to develop, since they would require a lot of process-intensive code scanning - but is going by module/file a useful enough option for anyone? 
    Can see how they relate through: 
        Bug Tickets (Does Trac/DrProject list which files contain the bug?) 
        Bug Fixes (Possibly examine the date/time that the bug ticket was closed and who closed it, then find the files and changes committed to the repository at that time period by that user) 
        Time Frame (Simply based on files being committed at the same time by the same user) 
        Overall (Combining all of the above) 
    Structure: 
    We need a data structure which can easily be translated into a visual representation. This would require maintaining a database of all of the relations if we want the user to be able to sort and filter the ways that files relate - maybe the user doesn't care which files were committed when a bug ticket was issued. We could have two Django models, Relation and File. Each File would have to maintain a reference to all of the Relations it has, and the name of the file in the repository that it represents. 
    Each Relation could have a number of fields: the two Files it links (each Relation connects only two files, so in essence Relations constitute the edges of the graph), the user who caused the relation, the date/time it occurred, and the type (bug ticket, bug fix, commit, any more?). 
    With an implementation like this there could be many Relations between the same two Files. This could be represented in the graph as the weight of the edge, and thus we can filter what portion of the overall weight comes from which type of Relation. 
    This would allow us to run normal graph traversal routines, such as a depth-first search from a specific File in the repository, or finding whether there are any groups of Files (connected components) which are disjoint from other groups. 
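    Setting Django aside, the structure above can be sketched in plain Python to check that the weighting, filtering, and traversal all work. The tuple layout for a relation is my own assumption, chosen to mirror the fields listed above:

```python
from collections import defaultdict

# Sketch of the Files/Relations structure, without Django. Each
# relation links exactly two files, so the weight of an edge is just
# the count of relations between that pair, filterable by type.

def edge_weights(relations, types=None):
    """relations: list of (file_a, file_b, user, date, rel_type) tuples.
    Returns {(file_a, file_b): weight}; optionally keep only some types."""
    weights = defaultdict(int)
    for a, b, user, date, rel_type in relations:
        if types is None or rel_type in types:
            key = tuple(sorted((a, b)))   # undirected edge
            weights[key] += 1
    return dict(weights)

def connected_components(relations):
    """Group files into disjoint clusters via depth-first search."""
    adj = defaultdict(set)
    for a, b, *_ in relations:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components
```

    The same adjacency structure serves both questions raised above: weights answer "how strongly do these two files relate, and through what kinds of relations?", and the component search answers "which groups of files are disjoint from the rest?".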

May 12

A few things I was thinking about during our meeting today with the grad students.

Integrating Branches in Subversion 
Does the Subversion client have access to each branch for cross checking multiple branches against each other? 

This ties in heavily with the awareness concept; once we have an idea of which sections of code and which modules appear to have a strong connection then people with multiple branches of the main project can be informed when another person on another branch has altered the same section of code/portion that has a strong connection. 

When person A commits the changes they made on their branch, Subversion's logs could be used to automatically scan the other branches (assuming Subversion has access to each branch). If person A's changes involve code portions/modules strongly connected to another current branch that person B is working on, then automatically send person B a message informing them that person A should be contacted to clear up any possible errors or conflicts. (This could be an e-mail or a notification through a blog widget.) 
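The cross-branch check itself is small once the inputs exist. A sketch in Python, assuming we can already extract (from Subversion logs) the sections each branch currently touches and a map of strongly connected sections - both of those inputs are assumptions here:

```python
# Sketch of the cross-branch warning described above. In practice the
# inputs would be built by parsing Subversion logs; here they are
# plain dictionaries so the logic can be checked on its own.

def who_to_notify(committed_sections, branches, connections):
    """committed_sections: sections person A just changed on their branch.
    branches: {owner: set of sections currently modified on their branch}.
    connections: {section: set of strongly connected sections}.
    Returns {owner: overlapping sections} for everyone to warn."""
    # Expand A's change set with everything strongly connected to it.
    affected = set(committed_sections)
    for section in committed_sections:
        affected |= connections.get(section, set())
    notify = {}
    for owner, their_sections in branches.items():
        overlap = affected & their_sections
        if overlap:
            notify[owner] = overlap
    return notify
```

Whatever delivers the message (e-mail, blog widget) would just iterate over the returned dictionary.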

Issues: What counts as a "current branch"? How often are these branches re-integrated back into the main project? Would abandoned side projects be removed or left as dead branches forever? Many open branches could put unneeded stress on the system, since we would have to cross-check branches which no longer matter to the overall project. 

Code Awareness 

Related sections of code could be represented through graphs - the "strength" of the relation between two code sections/modules can be measured through their distance from one another in the graph, along edges. 
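Measuring that distance is a standard shortest-path problem. A small sketch, assuming the graph is available as an adjacency mapping (that representation is my assumption):

```python
from collections import deque

# Sketch: relation "strength" as graph distance - fewer hops along
# edges means a stronger connection between two code sections.

def distance(adjacency, start, goal):
    """Breadth-first search; returns hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == goal:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

Strength could then be defined as, say, the reciprocal of this distance, with unreachable pairs treated as unrelated.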

Sessions? A "play by play" of what person A did on date B with file C. 
Issues: Potentially lots of useless data to filter (typos, syntactic errors, etc). 
Either person A has to write a blurb about what they were doing or whoever is reading the session has to try to deduce what person A was fixing/altering/implementing. 

Indicate related code with line-by-line references, clicking as if to set a debugging break point? (Problem: This would have to be done through an IDE plugin) 

Use Subversion's logs to establish differences upon committing to see which lines of code have been altered or which files have been changed together in each submission. 
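Extracting which lines changed mostly means reading the `@@` hunk headers of the unified diffs that `svn diff` produces. A sketch of that step (the function name is mine):

```python
import re

# Sketch: pull the changed line ranges of the post-change file out of
# a unified diff, using its "@@ -a,b +c,d @@" hunk headers.

HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_ranges(diff_text):
    """Return [(start_line, line_count), ...] for the new file."""
    ranges = []
    for line in diff_text.splitlines():
        m = HUNK.match(line)
        if m:
            start = int(m.group(1))
            count = int(m.group(2)) if m.group(2) else 1
            ranges.append((start, count))
    return ranges
```

Running this over each committed file's diff gives exactly the "which lines were altered" data the graph needs.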

What would be the nodes of the graph? Individual lines? Functions/Routines? Do it by module/class? 

Possibly try two levels of interaction - module/file based (to give a broad idea of the interaction) and code-block based (to give a detailed view of the relation). But how can we efficiently identify a code block? 
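As a crude first answer, fixed-size blocks (the "divide by 10" idea from the May 15 notes) give the detailed level without any code scanning. A sketch; the block size of 10 is arbitrary:

```python
# Sketch of the detailed level of granularity: a changed line maps
# to a fixed-size block within its file. Block size 10 echoes the
# "divide the line number by a certain amount, say 10" idea.

BLOCK_SIZE = 10

def block_of(line_number, block_size=BLOCK_SIZE):
    """Fixed-size fallback answer to 'what code block is this line in?'"""
    return (line_number - 1) // block_size

def touched_blocks(changed_lines):
    """Map a commit's changed lines to the set of blocks they fall in."""
    return {block_of(n) for n in changed_lines}
```

Replacing block_of with a real parser that finds the enclosing function or method would upgrade this to the method/routine division without changing anything downstream.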

This seems like it would either require an extensive amount of meta data to build up a reliable graph of connected components or a large initial investment of time on the part of the people who know/wrote the code. 

Keep track of what sections of code/files each person on the project team has worked on. This would allow for searching by user. 

This would also help person A see the history of their own changes they made (easier than manually looking through Subversion logs) - enter their name, a specific file, and a date range to search within, then the output could be sorted by, say, most frequently edited code block or into chronological order or by file.
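Once the Subversion logs are indexed into records, that search is a filter plus a sort. A sketch in Python, with the record fields as my own assumption about what the index would hold:

```python
# Sketch of the per-user history search: filter indexed commit
# records by user, file, and date range, then sort them. Records
# are plain dicts here; in practice they would come from indexing
# Subversion logs.

def search_history(records, user, filename, start, end, sort_key="date"):
    """Return the user's changes to one file in [start, end], sorted."""
    hits = [r for r in records
            if r["user"] == user
            and r["file"] == filename
            and start <= r["date"] <= end]
    return sorted(hits, key=lambda r: r[sort_key])
```

Sorting by "date" gives the chronological view; sorting by a block field instead would give the most-frequently-edited-block view after a grouping pass.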

This could be implemented through indexing Subversion logs for each user on the system. 

Versions of Data Sets 

Tag crucial sections of the code that begin the initial processing of the data set. 

Crucial sections must be specified to begin with. 

Possibility: Any section that deals directly with processing raw data should be considered a "crucial section". 

Default: Load latest version of the data. 

Fallback: Upon an error, check whether it occurred within any of the crucial sections; if so, re-run the program using the next-latest version of the data. Continue this process until we have a successful run or there are no more versions (no more versions means it's a problem with their code, not a problem loading the proper data set). 
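The default-plus-fallback loop can be sketched directly; the exception class and function names here are hypothetical, and I assume errors inside crucial sections can be surfaced as a distinct exception type:

```python
# Sketch of the fallback strategy: try the newest data-set version;
# if the failure happens inside a tagged crucial section, retry with
# the next older version until one works or none remain.

class CrucialSectionError(Exception):
    """Raised when an error occurs inside a tagged crucial section."""

def run_with_fallback(versions, run):
    """versions: data-set versions ordered newest first.
    run: callable that processes one version, or raises
    CrucialSectionError when a crucial section fails on it."""
    for version in versions:
        try:
            return run(version)
        except CrucialSectionError:
            continue  # data-loading problem: fall back to an older version
    # Every version failed: the bug is in the code, not the data.
    raise RuntimeError("all data-set versions failed; check the code")
```

Errors raised outside crucial sections deliberately propagate unchanged, so only data-loading problems trigger the fallback.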

What changes in newer versions of the data sets - Simply more data? New syntax? Additional properties for each data entry?
