Wednesday, July 8, 2009

Abstraction and Separation of Concerns

    Tuesday's lunchtime meeting was about Abstraction and Separation of Concerns, with a lot of emphasis on open-collaborative science. I thought the discussion was very compelling, so I'd like to share (very informally) some thoughts on the issues.

    Issues for "Open Collaborative Science":

        1) Misuse and Misinterpretation

        Science is generally a messy process. The steps taken to reach a conclusion and deduce an explanation may not always be completely rigorous, in the full sense of the word. This is just the way science works. However, the average person does not need to know about the stumbles that scientists went through - and they shouldn't. Making this information visible to the public - all of the steps, tests, and experiments that scientists performed to reach their conclusion - adds the possibility for people to poke holes in the work based on, as they see it, "bad science". This goes beyond trying to do "good science" and give collaborative support or constructive criticism. It could come from people simply being afraid or wary of the results, or even from ideological principles. Anyone who dislikes a new scientific development because it contradicts their preset ideology could go to the online notebook and see all the potential flaws, without fully understanding the heuristics and assumptions that scientists work under, or the constraints they have to work around.

        To combat this flaw, I feel the system should have multiple levels of openness - you can only dig as deep as your qualifications permit. I call this Limitation by Qualification. To effectively track each user's level of qualification, the system would have to be maintained by one or more governing authorities from multiple scientific fields. Users would have their scientific qualifications for their field (and other fields) established through an application process. The process could be simple: Do they have articles in respected journals? Do they have advanced degree(s) in their claimed area of expertise? Questions of this nature would have to be asked. Once this is done, when scientists make entries or updates to their online notebooks, the data files, results, conditions, etc. can all be locked to be viewable only by someone with the proper qualifications. The idea would be to have the abstract and a general idea of what the scientist is doing available to the public, with a finer level of detail of their work available to those who understand it.
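        As a rough sketch, Limitation by Qualification might look something like the following. The tier names, the number of tiers, and the data layout are all invented for illustration; the real tiers and vetting process would come from the governing authorities:

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AccessLevel(IntEnum):
    """Openness tiers, from the public abstract down to raw lab data.
    Three levels is an assumption made for this sketch."""
    PUBLIC = 0      # abstract and a general idea of the work
    QUALIFIED = 1   # results and experimental conditions
    EXPERT = 2      # raw data files and full notebook entries


@dataclass
class User:
    name: str
    # Qualification level per field, granted through the application
    # process (journal articles, advanced degrees, and so on).
    qualifications: dict = field(default_factory=dict)


@dataclass
class NotebookEntry:
    field_name: str
    layers: dict  # AccessLevel -> the content locked at that level


def visible_layers(user: User, entry: NotebookEntry) -> dict:
    """Return only the layers the user is qualified to see; anyone
    unknown to the system defaults to the public tier."""
    level = user.qualifications.get(entry.field_name, AccessLevel.PUBLIC)
    return {lvl: text for lvl, text in entry.layers.items() if lvl <= level}
```

        Under this scheme, a member of the public browsing a climatology entry would see only its abstract layer, while a vetted climatologist would see everything.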

        2) Trust: How do I know this won't be used against me, or stolen?

        Ideologies and fear of change are not the only drivers for potential misuse of this data. Sometimes there are rivalries between scientists, and theft of work does occur. Limitation by Qualification would not work here - both scientists are equally qualified. To solve this, an idea similar to Limitation by Qualification would be needed, possibly something along the lines of a Limitation by Intentions. While it is easy for a human to figure out what I mean by Limitation by Intentions, this is not a simple concept to represent in a computer system. A rating system could possibly be used to "blacklist" unethical scientists by majority opinion. This has the potential for abuse, though, as scientists who merely disagree with the majority view could be "kept quiet" through blacklisting. Also, this relies on catching the scientist in the act, and thus does not prevent the initial theft. Another option would be to let the owner of each online lab book control who can view the bottom layer of their work. This would prevent theft, but defeats the very purpose of the system.
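        The majority-opinion blacklist could be prototyped as a simple vote tally - a toy sketch only, with the abuse problem noted above deliberately left unsolved:

```python
from collections import defaultdict


class ReputationRegistry:
    """Toy majority-vote blacklist for the hypothetical 'Limitation by
    Intentions' idea. Peers flag a scientist as unethical or vouch for
    them; a scientist is blacklisted only when flags form a strict
    majority of all votes cast about them."""

    def __init__(self):
        # subject -> {voter: True for a flag, False for a vouch}
        self.votes = defaultdict(dict)

    def flag(self, voter, subject):
        self.votes[subject][voter] = True

    def vouch(self, voter, subject):
        self.votes[subject][voter] = False

    def is_blacklisted(self, subject) -> bool:
        ballots = self.votes[subject]
        flags = sum(ballots.values())
        # The abuse risk discussed above lives right here: a
        # coordinated majority can silence a dissenting scientist
        # with this rule alone.
        return len(ballots) > 0 and flags * 2 > len(ballots)
```

        One voter, one vote per subject; re-voting overwrites the earlier ballot, so a voter can change their mind without double-counting.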

    If the above issues can be overcome, then the implementation of tools to facilitate open collaborative science needs a couple of things:

        1) Standardization for Accessibility by all Disciplines

        The main reason (that I see) for pursuing the idea of open collaborative science is to have scientists easily share information - to remove all the "red tape" that slows down advancements in technology and science. Scientists should be able to share, but how do we help them find the information they need in an area that has many foreign concepts and details? My idea would be to use an Interfacer. The Interfacer would be, at a basic level, an expert system. It would take the user's request, along with the data objects of the user's current experiment, and extract a list of relevant information and data sources for them. It would have to be highly modular, with two (possibly more?) modules required: one expert system for the input, tailored to understand the concepts and terminology of the user's area of expertise, and one expert system for the output, tailored to understand the concepts and terminology of the discipline of the desired information. How these two modules would interact is a difficult question to answer. One problem is that the issue is not well defined; there is no clear mapping of terms from one discipline into coherent terms of another, for example. The Interfacer would have to be able to infer, from what information the user thinks they need, what information they actually need. This issue was highlighted in the discussion by the example Steve gave about a scientist interested in plant growth (I think?) who wanted information from climate simulations about the state of the climate at a specific location 50 years from now. The scientist got the results and would then draw conclusions from them. Steve pointed out that conclusions drawn this way would not be scientifically sound, since the scientist does not fully understand the assumptions that the climate model was built upon.
For something of this nature to work, clear and concise assumptions would need to be stated for each experiment or observation - like disclaimers for use by others. For an Interfacer to do its job efficiently, each piece of data throughout the whole network of open notebooks would have to be extractable and carry information relevant to the extraction process, be it tags or otherwise. This could be maintained by storing semantic information for each data object in the system, to come up with an "intelligent" description.
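To make the idea concrete, here is a minimal sketch of an Interfacer built from two discipline modules that translate local terminology into a shared concept vocabulary. Every term, concept, and data object below is invented for illustration; a real system would need far richer semantics than a term-to-concept dictionary:

```python
class DisciplineModule:
    """Expert-system stand-in: maps one discipline's local terminology
    into a shared concept vocabulary."""

    def __init__(self, discipline, term_to_concept):
        self.discipline = discipline
        self.term_to_concept = term_to_concept

    def concepts(self, terms):
        return {self.term_to_concept[t] for t in terms if t in self.term_to_concept}


class Interfacer:
    def __init__(self, input_module, output_module, data_objects):
        self.input_module = input_module
        self.output_module = output_module
        # data_objects: (description, tags in the output discipline's
        # terminology, stated assumptions) triples.
        self.data_objects = data_objects

    def query(self, request_terms):
        """Match the user's request against data tagged in another
        discipline, via the shared concept vocabulary."""
        wanted = self.input_module.concepts(request_terms)
        results = []
        for description, tags, assumptions in self.data_objects:
            if wanted & self.output_module.concepts(tags):
                # Surface the stated assumptions with every match -
                # the "disclaimers for use by others".
                results.append((description, assumptions))
        return results
```

The point of returning the assumptions alongside each match is exactly Steve's objection: a plant-growth scientist who pulls a climate projection should see, up front, what that projection was built on.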

        2) Collaboration Tools

        I can't say much for this part, but in my view Google Wave is the right idea. Real-time alteration by multiple users is exactly the sort of collaboration people need. To fully utilize this, standards would have to be adopted so that tables of data, multiple file types, and any other experimental data could be easily maintained. Google Wave also suggests a solution for how each piece of data can be properly tagged on the fly. The Google Wave demo showed a real-time spellchecker which analyzes the content of a sentence as it is typed to fix typos and grammatical errors. The same approach could be applied to the content of the data or the abstract, to come up with a relevant "blurb" describing the piece of information in question (as mentioned in point #1). However, this turns the issue of referencing into developing a network of semantically linked objects, and thus it falls into the niche of a natural language processing problem.
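        As a crude illustration of the "blurb" idea - nowhere near real natural language processing, just word frequency over a made-up stopword list:

```python
import re
from collections import Counter

# A tiny, invented stopword list; a real system would need much more.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "that"}


def blurb(text, n=5):
    """Naive stand-in for on-the-fly tagging: describe a data object
    by its most frequent content words."""
    words = re.findall(r"[a-z]+", text.lower())
    content = [w for w in words if w not in STOPWORDS]
    top = [w for w, _ in Counter(content).most_common(n)]
    return "Keywords: " + ", ".join(top)
```

        Running this over an abstract yields a rough keyword blurb; building the semantically linked network on top of such blurbs is where the real NLP problem begins.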

    Now that I've gotten those thoughts out of the way I have to get back to writing release notes for my extension.


  1. Interesting thoughts, Brent, thanks for sharing!

  2. Brent, this is great! Keep the ideas flowing.