Intelligent-Data: July 2012

Friday, July 27, 2012

July 22nd - July 28th

Week's Plan:
Work on the "abstract" state representation, which will be some kind of clustering of our original states according to a grouping policy.

Week's Accomplishments:
Monday we finished up the work for Stephanie's poster. To do this we wrote an exporter that creates an edge table for a graph and separates the frequencies based on the students' Case_Group_Id. A strictly Excel-based approach had introduced errors into the data.
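A minimal Python sketch of such an exporter. It assumes each observed step is available as a (group_id, source_state, target_state) tuple, where group_id holds the Case_Group_Id value. The function names edge_table_by_group and to_rows are hypothetical, not the project's actual code:

```python
from collections import Counter, defaultdict


def edge_table_by_group(transitions):
    """Count edge frequencies separately for each group.

    transitions: iterable of (group_id, source_state, target_state) tuples,
    one per observed student step; group_id holds the Case_Group_Id value.
    Returns {(source, target): Counter({group_id: count, ...})}.
    """
    table = defaultdict(Counter)
    for group_id, source, target in transitions:
        table[(source, target)][group_id] += 1
    return table


def to_rows(table, groups):
    """Flatten the table into tab-separated lines: source, target, then one
    frequency column per group (0 where a group never used that edge)."""
    rows = ["\t".join(["source", "target", *map(str, groups)])]
    for source, target in sorted(table):
        counts = table[(source, target)]
        rows.append("\t".join([source, target, *(str(counts[g]) for g in groups)]))
    return rows
```

Counting in code rather than in a spreadsheet keeps each per-group frequency tied to its exact source/target pair. Groups that never used an edge get an explicit 0.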

Tuesday I created a visual editor for controlling some of the display properties. It will be useful for creating images and makes the interface more usable. I also got tripped up by the action sequences.

Wednesday/Thursday - I worked on constructing the "abstract" state. The abstract state is our typical states, clustered or grouped in some way. On the implementation side, there are a handful of considerations needed to maintain data integrity. The issues I addressed included the software design for handling these "abstract" nodes. The next significant issue is properly generating the needed data from the parsed data file we created from Deep Thought. Our original parser did not consider, or generate, some necessary pieces of data, in part because of how Deep Thought logs the data, so it was necessary to calculate some elements of the data. Excel was insufficient for handling these issues.

Friday - I confirmed that our data converter properly calculates the new elements of the data we are interested in. We now have access to pre- and post-conditions, the solution attempt (derived from start-overs), and goals.

Problems:
No real problems. My biggest concern is that simply reducing the number of nodes or edges is not a good metric for measuring the graph-reduction rate. By that definition, a single node with no edges is best. I would propose that the derived graph be generated strictly from the original Interaction Network. That means only those nodes and edges, and values derived from them, are available as input for creating our new abstract-state network. This will ensure that our new network is structurally sound. The spectrum approach for abstract states may be the most appropriate.
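A minimal Python sketch of that constraint. The abstract network is computed only from the original edges and their frequencies, using a grouping policy passed in as a function. The names abstract_graph and group_of are hypothetical, and so is the input shape (a dict of (source, target) to frequency). Edges between two states in the same cluster become self-loops on that cluster, and states that appear in no edge are not represented:

```python
from collections import Counter


def abstract_graph(edges, group_of):
    """Collapse an Interaction Network into an abstract-state network.

    edges: dict mapping (source_state, target_state) -> frequency in the
        original network; this is the only data the function reads.
    group_of: the grouping policy, a function mapping an original state to
        the cluster (abstract state) that contains it.
    Returns (cluster_nodes, cluster_edges):
        cluster_nodes: cluster -> set of original states it contains
        cluster_edges: (source_cluster, target_cluster) -> summed frequency
    Edges between two states in the same cluster become self-loops.
    """
    cluster_nodes = {}
    cluster_edges = Counter()
    for (source, target), freq in edges.items():
        src_cluster, dst_cluster = group_of(source), group_of(target)
        cluster_nodes.setdefault(src_cluster, set()).add(source)
        cluster_nodes.setdefault(dst_cluster, set()).add(target)
        cluster_edges[(src_cluster, dst_cluster)] += freq
    return cluster_nodes, dict(cluster_edges)
```

As a toy example, take group_of=lambda s: s[0], which clusters states by their first character. The edges {("A1","A2"): 5, ("A2","B1"): 3, ("A1","B1"): 2} then collapse into:

- cluster A = {A1, A2} and cluster B = {B1};
- a self-loop on A with frequency 5;
- an edge from A to B with frequency 5.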

Next Week's Plan:
1) I need to make a new Node-Table in the data-object class (see below) that will store our ClusterNodes and ClusterEdges, which will be the backbone of our new abstract states.

2) I need to make a DataObject class, which will be similar to the data-parser, but will be used for displaying derived graphs, specifically our more "abstract" state representations.

3) (Semi-distant) I should develop a Data-Properties class to more appropriately manage the different functions available depending on what type of data is read in. Basically, the object just stores a dozen or so flags, set during the data-import stage of the program based on which columns are read in.
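A minimal Python sketch of such a class. It assumes the flags are plain booleans computed once from the header row. Every column name except Case_Group_Id is a hypothetical placeholder, because the actual columns are not listed here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProperties:
    """Feature flags derived from which columns the import stage found.

    Column names other than Case_Group_Id are hypothetical placeholders.
    """
    has_time: bool
    has_group: bool
    has_hints: bool
    has_conditions: bool

    @classmethod
    def from_header(cls, columns):
        # Normalize header names so matching is case- and whitespace-insensitive.
        cols = {c.strip().lower() for c in columns}
        return cls(
            has_time="time" in cols,
            has_group="case_group_id" in cols,
            has_hints="hint" in cols,
            # Both condition columns must be present for this feature.
            has_conditions={"precondition", "postcondition"} <= cols,
        )
```

Making the object immutable means the flags cannot drift after import. The rest of the program only asks questions such as "is grouping data available?"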

Other Pieces of Work: (I just don't want to forget these ideas; I'm not sure how necessary or important they are)
We should export the frequency data for the stoichiometry data, load that into yEd, and see what we see.

We should write some type of data loader that lets us load in "hint-actions" so we can see where students request hints.

Hours Worked:
Sun - 0?
Mon - 8
Tues - 9
Wed - 6
Thurs - 14
Fri - 9
Sat - 5 (planned)
Total: 51

Monday, July 23, 2012

July 15th - July 20th

Week's Plan:
Work with Aaron and find out what sequences we are able to detect.

Write up the alternative research directions, i.e., the tweaked research questions, with methodologies, introductions and expected results.

Friday I'll meet with Dr. B. and discuss the proposed directions for my research.

Week's Accomplishments:
Sunday I worked on integrating Aaron's work into the project alongside the pieces I had done, so that InVis has all of its pieces connected, specifically the sequence-detecting components. I also wrote a data exporter for the InVis project so we can load the frequencies for the Stoichiometry data into yEd and look at that data.

Monday I met with Dr. Croy and finished my PhD paperwork. We tried a handful of different tweaks in order to detect "better" sequences. We had some success clearing out less meaningful discoveries, specifically subsets of longer sequences.
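The tweaks themselves are not recorded here. As one hedged illustration, the Python sketch below assumes "subset" means a contiguous run of actions inside a longer detected sequence, and it drops such sequences. Both function names are hypothetical:

```python
def is_contiguous_subsequence(short, long):
    """True if `short` appears as a consecutive run of items inside `long`."""
    n = len(short)
    return any(tuple(long[i:i + n]) == tuple(short) for i in range(len(long) - n + 1))


def drop_subsumed(sequences):
    """Keep only detected sequences not contained in a longer detected sequence.

    Input order is preserved. This is a quadratic pairwise check, which is
    fine for the modest number of sequences a detector typically reports.
    """
    kept = []
    for s in sequences:
        subsumed = any(
            len(other) > len(s) and is_contiguous_subsequence(s, other)
            for other in sequences
        )
        if not subsumed:
            kept.append(s)
    return kept
```

For example, given ("a","b"), ("a","b","c"), ("c","d") and ("b","c"), only ("a","b","c") and ("c","d") survive.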

I also met with Dr. Wartell to look at the concept of the Data / Ink Ratio, but it doesn't seem that this is the direction we should go. He did, however, provide a helpful conversation that led me to my current direction. The Data / Ink Ratio is not so much a testable metric to compare against, and measuring insight is not well defined in the field of info-vis. It is more of a comparison. Graph A shows some data, and Graph B shows the same data on a multi-colored background, which uses more ink but doesn't add to the value of the graph. Edward Tufte argued in his book, without evidence, that such non-data ink should be removed. Later research papers on the issue showed that "chart-junk" embellishments helped people remember the graphs, while their accuracy in reading the data was not affected.

Tuesday I made some necessary fixes to the data parser so that we would have cleaner data, both for viewing in the tool and for running our sequence detection. I also continued to work with Aaron to fine-tune our sequence-detection algorithm. It does find "things", but it's hard to say whether they are the most interesting or the least interesting things.

Wednesday I read Biswas' best paper from EDM-12 and was able to draw a decent idea out of it. I also updated my research questions towards questions that will be easier to test and are more clearly differentiated from each other. I've written a section in my proposal to reflect these proposed ideas.

Thursday & Friday I worked on reading in the hint and non-hint groups of data from the 2009 semester, so we can display the differences between the two groups, which is the focus of Stephanie's end-of-summer report. This required a handful of changes to the program. I also worked with Aaron and Stephanie to help get their reports in order.

Problems:
Making the tabbed pane able to show multiple types of sequences in conjunction with the ExplorerManager of the Netbeans platform is a giant pain in the butt. I'll likely skip the "proper" solution and just get something working for the time being.

Next Week's Plan:
Tuesday I will go to Evie's Dissertation Defense.

I need to add a simple way of incorporating grouping data into InVis. We often want to look at the difference between hint groups and non-hint groups. Another useful split would be students who solve the problems versus students who don't.

I definitely need to spend some time working on an overview of the states, in an abstract sense. Rather than a graphical overview, I want to rework the description of a state, so that a graph represents the more abstract states that people visit when solving a problem.

Other Pieces of Work:
We should export the frequency data for the stoichiometry data, load that into yEd, and see what we see.

We should write some type of data loader that lets us load in "hint-actions" so we can see where students request hints.

Hours Worked:
Sun - 10
Mon - 7
Tues - 12
Wed - 8
Thurs - 8
Fri - 6
Sat - 0
Total: 51

Friday, July 13, 2012

July 9th - July 13th

Week's Plan:
Look at the Stoichiometry data to see what might be common between it and data from our other tutors.

Read up on graph literature that might show me ways of establishing ground truths to compare against:
Graph Reduction
Graph Complexity
Graph Filtering

Consider new research questions to address the similarity between the two questions I have right now:
1) How do we discover important features of problem solutions in domain-independent, data-driven ways?
2) How can graph visualization be leveraged to identify useful aspects of student solutions?

Week's Accomplishments:
Loaded the Stoichiometry data into yEd to see if there were similarities in student behavior with data we have from other tutors. I was hoping some graph structures shared between the Deep Thought data and the Stoichiometry data would stick out.

I read a handful of papers on different graph work. One paper, by Conati, built graphs out of student solutions in the physics domain and then used domain knowledge to generate hints. It isn't domain independent, but I should be citing this work. Papers on mathematical graphs didn't seem to offer much help, mainly because we are more interested in graph interpretation, and in turn data interpretation. We just happen to represent the data as a graph.

I also spoke with Mike and he suggested looking at the Ink / Information ratio, a concept in Information Visualization, that measures how effectively one uses screen real-estate when presenting some amount of information.
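For reference, Tufte's own term for this is the data-ink ratio. In The Visual Display of Quantitative Information he defines it as the share of a graphic's ink devoted to the non-redundant display of data:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \left(\text{proportion of the graphic that can be erased without loss of data-information}\right)
```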

Problems:
After looking at the Stoichiometry data, nothing of particular interest "jumped" out at me. One issue is that Excel could not handle cells with strings longer than 255 characters, which meant I couldn't "excel" my way to incorporating frequencies into this data for yEd. Without the frequencies, it was significantly more difficult to identify interesting or important structures.

Looking at the two questions, it is trivial to convert one into the other. Change "identify" to "discover" and "problem solutions" to "student solutions", or vice versa, and it becomes clear they do not differ. This is even clearer when trying to design an experiment that answers one question but not the other. I feel that there is only one contribution here.

Not having yFiles will begin to impede our work very soon, because the evaluation period has almost expired. Aaron has implemented a select-nodes-and-create-group-cluster feature with yFiles, but our sequence detection is written using Jung. To combine these we need to convert to yFiles.

Next Week's Plan:
Monday I will meet with Dr. Croy to complete the necessary paperwork.

I absolutely must define the final questions for my dissertation. This is the biggest obstacle to my progress.

I want to read up on how people measure the ink / information ratio in the info-vis literature. Ideally I can perform the same activity on a graph-node / information situation, to determine a metric for when combining or collapsing nodes is a good thing to do. Simply counting nodes and edges is not a sufficient metric, because a single node with no edges would then be rated the best. The problem, of course, is that a single-node, no-edge graph contains no information.
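A small worked example of the problem in Python. The naive score below is a hypothetical definition: one minus the fraction of nodes plus edges that remain. It rates collapsing everything into a single node highest. The edge_entropy function is a hypothetical information measure added only for illustration. It is zero for a graph with no edges, which matches the point that such a graph carries no information:

```python
import math


def naive_reduction(orig_nodes, orig_edges, new_nodes, new_edges):
    """Fraction of nodes plus edges removed; closer to 1.0 means 'more reduced'."""
    return 1 - (new_nodes + new_edges) / (orig_nodes + orig_edges)


def edge_entropy(edge_freqs):
    """Shannon entropy, in bits, of the edge-frequency distribution.

    Hypothetical information measure: 0.0 when there are no edges (or a
    single edge), larger when traffic is spread across many edges.
    """
    total = sum(edge_freqs)
    if total == 0:
        return 0.0
    return -sum(f / total * math.log2(f / total) for f in edge_freqs if f > 0)


if __name__ == "__main__":
    # Original graph: 10 nodes, 15 edges.
    print(naive_reduction(10, 15, 1, 0))  # collapse to one node, no edges
    print(naive_reduction(10, 15, 4, 5))  # a moderate abstraction
    print(edge_entropy([]), edge_entropy([5, 5]))
```

With a 10-node, 15-edge original, collapsing to one node scores about 0.96, while a 4-node, 5-edge abstraction scores about 0.64. The naive score therefore prefers the empty graph. Any usable metric needs a second term, such as an information measure, that penalizes that collapse.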

I want to read the Stoichiometry data into the vis-tool and then export the states and edges with their frequencies. The output format will be a simple tab-delimited list of source, edge, target and frequency, where the frequency is the edge frequency. For some of the problems in the Deep Thought data, distinct strategies were visible, mainly two, one of which is discussed in our case-study work. With frequencies shown for the Stoichiometry data, my hope is that we could see similar strategy structures. That could warrant building a strategy detector for the Interaction Network. I estimate I can have this done in just a few hours, potentially even by Saturday.
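A minimal Python sketch of that export format. It assumes the tool can hand over each observed step as a (source_state, action, target_state) tuple, and export_edges is a hypothetical name. Writing the file directly, rather than assembling it in Excel, also sidesteps the cell-length problem described above:

```python
import csv
import io
from collections import Counter


def export_edges(steps, out):
    """Write one tab-delimited row per distinct edge: source, edge, target, frequency.

    steps: iterable of (source_state, action, target_state) tuples, one per
    observed student step (hypothetical input shape).
    out: any writable text stream, e.g. an open file or io.StringIO.
    Rows are written from most to least frequent edge.
    """
    counts = Counter(steps)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "edge", "target", "frequency"])
    for (source, action, target), freq in counts.most_common():
        writer.writerow([source, action, target, freq])


if __name__ == "__main__":
    buf = io.StringIO()
    export_edges([("s0", "add", "s1"), ("s0", "add", "s1"), ("s1", "mul", "s2")], buf)
    print(buf.getvalue(), end="")
```

For the three demo steps, this writes a header row followed by s0/add/s1 with frequency 2 and s1/mul/s2 with frequency 1.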

Additional Thoughts:
--- The strategy detector for the data from Deep Thought would be rather simple, because the very first action performed identifies the strategy. The two strategies present in that particular data set have the highest and second-highest frequencies. The issue here is that, for this Deep Thought data, a strategy is effectively defined as the actions with the highest frequencies. The argument would be that if a lot of people perform a similar set of actions, that identifies a strategy. If, on the other hand, an action or set of actions did not have a high frequency, could it still be identified as a strategy?
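The simple detector described above can be sketched in a few lines of Python. The sketch assumes each solution is a list of action labels and that "strategy" means one of the top_k most frequent first actions. The function name and the "other" label are hypothetical:

```python
from collections import Counter


def first_action_strategies(solutions, top_k=2):
    """Label each solution by its first action.

    Only the top_k most frequent first actions count as strategies;
    every other solution (including empty ones) is labeled "other".
    """
    firsts = Counter(sol[0] for sol in solutions if sol)
    strategies = {action for action, _ in firsts.most_common(top_k)}
    return [sol[0] if sol and sol[0] in strategies else "other" for sol in solutions]
```

The sketch also makes the weakness concrete. A solution whose first action is rare is always labeled "other", however coherent it is.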

--- The issue is that we would be implying there are no uncommon strategies, and that a solution must be common to be classified as a strategy, which isn't exactly brilliant, or even necessarily accurate. We want to create a strategy detector, not a common-set-of-actions detector.

--- In order to better detect strategies, even uncommon ones, we probably need more dimensions in our data, like time, perhaps hint usage, or "attempts". Strategies should not contain a student starting over in the middle of the strategy. Another theory is that a strategy wouldn't contain a lot of hint requests: if a student is thinking three steps ahead, they should have some idea of how they will get there. A transition from few hint requests to many might identify the end of a sequence or strategy.
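One hedged way to turn the start-over and hint ideas into a segmenter, sketched in Python. The action labels "restart" and "hint" are hypothetical placeholders for however the log marks start-overs and hint requests. A start-over always ends the current segment and is itself dropped. The first hint request after a run of ordinary actions also starts a new segment:

```python
def split_into_segments(actions, restart="restart", hint="hint"):
    """Split one student's action list into candidate strategy segments.

    - A start-over ends the current segment (the restart action is dropped).
    - A hint request that follows a non-hint action starts a new segment,
      approximating a shift from low to high hint usage.
    """
    segments, current = [], []
    for action in actions:
        if action == restart:
            if current:
                segments.append(current)
            current = []
            continue
        if action == hint and current and current[-1] != hint:
            segments.append(current)
            current = []
        current.append(action)
    if current:
        segments.append(current)
    return segments
```

For example, the log a, b, restart, a, c, hint, hint, d splits into [a, b], [a, c] and [hint, hint, d].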

--- Another potential method for detecting sequences would be to look at the time data of the actions. The theory is that the variance in the timing of actions would be small within a sequence. When a sequence ends, the variance would increase, reflecting more thinking time about how to proceed once the actions in the sequence are complete.
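A minimal Python sketch of the timing idea. It assumes each action has a numeric timestamp and treats a pause as a sequence boundary when it exceeds the mean plus k standard deviations of the pauses seen so far in the current sequence. The function name, k, and min_gaps are hypothetical parameters, not tuned values:

```python
import statistics


def split_on_pauses(timestamps, k=2.0, min_gaps=2):
    """Return the indices of actions that start a new sequence.

    A new sequence starts when the pause before an action exceeds
    mean + k * stdev of the pauses already in the current sequence.
    At least `min_gaps` pauses are needed before a boundary can fire.
    """
    if not timestamps:
        return []
    starts = [0]
    gaps = []  # pauses within the current sequence
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if len(gaps) >= min_gaps:
            limit = statistics.mean(gaps) + k * statistics.pstdev(gaps)
            if gap > limit:
                starts.append(i)
                gaps = []  # the long pause itself is not part of either sequence
                continue
        gaps.append(gap)
    return starts
```

With timestamps 0, 2, 4, 6, 30, 32, 34, 36, the 24-unit pause before the fifth action marks a boundary, so the function returns [0, 4].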

The Biggest Problems:
With a strategy detector, or a sequence detector, how do we determine if we have identified the correct sequences or correct strategies? What makes one strategy detector better or worse than another one? What makes a sequence detector better or worse than another one?

We must provide evidence that the sequences or strategies detected are legitimate. Expert review and scoring would be one potential method.

If one detector worked across more domains than another, that would support its strength over the other detector.

Hours Worked:
Mon - 10
Tues - 12
Wed - 12
Thurs - 5
Fri - 7
Sat - 8
Total: 54