Graphical Representation of Browser History
Once I began searching, I immediately ran into WebReview.
WebReview is a Firefox extension with a number of features; the most relevant to me is its WebGraph feature:
"WebReview Graph cannot only display the complete tree you browsed currently, but visualizes also all browsing sessions with every visited web page in your history in a graph. You can find out, how you found are very interesting page even if it was months before (if your history of Firefox goes back that far). In addition, you will be able to export an entire graph to a single file if you want to archive it or to share."
I decided to test this out, and it is exactly what Steve was looking for. It creates a complete graph of the websites visited and what led to what, and it does this for the entire browser history. Sessions are all saved automatically. It even uses thumbnails of each website as the graph nodes. Should I continue trying to create my own version of this?
A few changes I would make:
This extension lacks the ability to make annotations between two webpages in the graph.
Building multiple graphs from Firefox's history seems very process-intensive, so I would change it so that the user can start/stop the script whenever they like. Once stopped, the graph could be viewed, annotated (optionally), and saved.
Idea: Divide each file into sections (divide the line number by a certain amount, say 10), then upon committing see which sections the altered lines fall within. Or divide by method/routine as opposed to fixed sections. These options seem the most challenging to develop, since they would require a lot of process-intensive code scanning; but is going by module/file a useful option for anyone?
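The fixed-size sectioning idea is cheap to compute. A minimal sketch (the section size of 10 is the example value from above, and the function name is my own):

```python
# Bucket a commit's altered lines into fixed-size sections by integer
# division, so no code scanning is needed at commit time.
SECTION_SIZE = 10  # assumption: 10 lines per section, as suggested above

def sections_touched(changed_lines, size=SECTION_SIZE):
    """Return the set of section indices that the altered lines fall within."""
    return {line // size for line in changed_lines}

# Lines 3 and 9 fall in section 0; line 25 falls in section 2.
print(sections_touched([3, 9, 25]))
```

Dividing by method/routine would replace the `line // size` step with a lookup into a parsed map of function boundaries, which is where the process-intensive scanning comes in.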
Can see how they relate through:
Bug Tickets (Does Trac/DrProject list which files contain the bug?)
Bug Fixes (Possibly examine the date/time that the bug ticket was closed and who closed it, then find the files and changes committed to the repository at that time period by that user)
Time Frame (Simply based on files being committed at the same time by the same user)
Overall (Combining all of the above)
We need a data structure that can easily be interpreted into a visual representation. This would require maintaining a database of all of the relations if we want the user to be able to sort and filter out certain ways that files relate; maybe the user doesn't care about which files were committed when a bug ticket was issued. We could have two Django models, Relation and File. Each File would have to maintain a reference to all of the Relations it has, as well as the file name in the repository that it references.
Each Relation could have a number of fields: the two File models it relates (Only relate two files, so that in essence Relations constitute the edges of the graph), the user who caused the relation, the date/time it occurred, and the type (bug ticket, bug fix, commit, any more?).
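To make the schema concrete, here is a plain-Python sketch of the two models (the real version would be Django models; the field names are illustrative assumptions):

```python
# Dataclass stand-ins for the proposed File and Relation Django models.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class File:
    repo_path: str          # file name/path in the repository

@dataclass(frozen=True)
class Relation:
    file_a: File            # a Relation links exactly two Files, so
    file_b: File            # Relations are the edges of the graph
    user: str               # the user who caused the relation
    occurred: datetime      # the date/time it occurred
    rel_type: str           # 'ticket', 'fix', or 'commit'
```

In Django, `file_a` and `file_b` would become `ForeignKey` fields on Relation, and each File's set of Relations would come for free via the reverse relation.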
With an implementation like this, there could be many Relations between the same two Files. This could be represented in the graph as the weight of the edge (the Relation), and thus we can filter what portion of the edge's overall weight comes from which type of Relation.
This allows us to run normal graph-traversal routines, such as a depth-first search from a specific File in the repository, or finding whether any groups of Files (connected components) are disjoint from other groups of Files.
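A minimal sketch of that traversal idea, assuming each Relation is reduced to a `(file_a, file_b, rel_type)` tuple: repeated Relations between the same pair raise the edge weight, and the type filter drops Relations the user doesn't care about.

```python
# Build a weighted, undirected graph from Relation tuples, then walk it
# depth-first to find the connected component containing a given file.
from collections import defaultdict

def build_graph(relations, rel_types=None):
    """relations: iterable of (file_a, file_b, rel_type) tuples.
    rel_types, if given, filters which relation types contribute."""
    weights = defaultdict(int)   # edge -> number of Relations behind it
    adj = defaultdict(set)       # file -> neighboring files
    for a, b, t in relations:
        if rel_types is not None and t not in rel_types:
            continue
        weights[frozenset((a, b))] += 1
        adj[a].add(b)
        adj[b].add(a)
    return adj, weights

def component(adj, start):
    """Iterative depth-first search from a specific file."""
    seen, stack = set(), [start]
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(adj[f] - seen)
    return seen
```

Running `component` from every unvisited file in turn yields the connected components, so disjoint groups of Files fall out of the same routine.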
A few things I was thinking about during our meeting today with the grad students.
Does the Subversion client have access to each branch for cross checking multiple branches against each other?
This ties in heavily with the awareness concept: once we have an idea of which sections of code and which modules appear to have a strong connection, people with multiple branches of the main project can be informed when someone on another branch has altered the same section of code, or a portion with a strong connection to it.
Issues: What counts as a "current branch"? How often are these branches re-integrated back into the main project? Would abandoned side projects be removed, or left as dead branches forever? Many open branches could put unneeded stress on the system, since it would have to cross-check many branches that no longer matter to the overall project.
Related sections of code could be represented through graphs - the "strength" of the relation between two code sections/modules can be measured through their distance from one another in the graph, along edges.
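One way to make that "strength" concrete, assuming an adjacency mapping from each code section to the sections it shares an edge with: measure the hop count along edges, where fewer hops means a stronger relation.

```python
# Breadth-first search gives the edge distance between two code sections;
# a return of None means the sections are in disjoint parts of the graph.
from collections import deque

def hop_distance(adj, src, dst):
    dist = {src: 0}
    q = deque([src])
    while q:
        node = q.popleft()
        if node == dst:
            return dist[node]
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                q.append(nbr)
    return None  # no path: no relation at all
```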
Issues: Potentially lots of useless data to filter (typos, syntactic errors, etc).
Either person A has to write a blurb about what they were doing, or whoever is reading the session has to try to deduce what person A was fixing/altering/implementing.
Indicate related code with line-by-line references, clicking as if to set a debugging breakpoint? (Problem: this would have to be done through an IDE plugin.)
What would be the nodes of the graph? Individual lines? Functions/Routines? Do it by module/class?
Possibly try two levels of interaction: module/file-based (to give a broad idea of the interaction) and code-block-based (to give a detailed view of the relation). But how can we efficiently identify a code block?
This seems like it would require either an extensive amount of metadata to build up a reliable graph of connected components, or a large initial investment of time on the part of the people who know/wrote the code.
This would also help person A see the history of their own changes (easier than manually looking through Subversion logs): enter their name, a specific file, and a date range to search within; the output could then be sorted by, say, most frequently edited code block, chronological order, or file.
This could be implemented through indexing Subversion logs for each user on the system.
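As a rough sketch of that indexing step: `svn log -v --xml` emits entries like the sample below, and filtering them by author, path, and date range gives the search described above. (The sample entry and the helper's name are my own; only the XML shape follows Subversion's output.)

```python
# Filter Subversion XML log output by author, path substring, and date range.
import xml.etree.ElementTree as ET
from datetime import datetime

SAMPLE = """<log>
  <logentry revision="42">
    <author>personA</author>
    <date>2008-07-01T12:00:00.000000Z</date>
    <paths><path action="M">/trunk/foo.py</path></paths>
    <msg>fix parser</msg>
  </logentry>
</log>"""

def search_log(xml_text, author, path_substr, start, end):
    hits = []
    for entry in ET.fromstring(xml_text).iter('logentry'):
        when = datetime.strptime(entry.findtext('date')[:19], '%Y-%m-%dT%H:%M:%S')
        if entry.findtext('author') != author or not (start <= when <= end):
            continue
        for p in entry.find('paths'):
            if path_substr in p.text:
                hits.append((int(entry.get('revision')), p.text))
    return hits
```

A per-user index would just be this pass run once over the full log, with the hits stored keyed by author.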
Tag crucial sections of the code that begin the initial processing of the data set.
Possibility: Any section that deals directly with processing raw data should be considered a "crucial section".
Fallback: Upon an error, check whether it occurred within any of the crucial sections, then re-run the program using the next-latest version of the data. Continue this process until there is a successful run or there are no more versions (no more versions means it's a problem with their code, not a problem with loading the proper data set).
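The fallback loop above can be sketched as follows, with hypothetical helpers: `run(version)` executes the program against one data version and raises on failure, and `is_crucial_error(exc)` decides whether the failure landed in a tagged crucial section.

```python
# Retry newest-first through data versions; only errors inside crucial
# sections justify falling back to an older data set.
def run_with_fallback(run, is_crucial_error, data_versions):
    """data_versions: list of data-set versions, newest first."""
    for version in data_versions:
        try:
            return run(version)
        except Exception as exc:
            if not is_crucial_error(exc):
                raise  # error outside crucial sections: not a data problem
    # every version failed in a crucial section: the code itself is at fault
    raise RuntimeError("all data versions failed in crucial sections")
```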