Week's Plan:
Look at the Stoichiometry data to see what might be common between it and data from our other tutors.
Read up on work on graphs that might show me ways of determining ground truths to compare against:
Graph Reduction
Graph Complexity
Graph Filtering
Consider new research questions to address the similarity between the questions I have right now:
1) How do we discover important features of problem solutions in domain-independent, data-driven ways?
2) How can graph visualization be leveraged to identify useful aspects of student solutions?
Week's Accomplishments:
Loaded the Stoichiometry data into yEd to see whether student behaviors resembled those in the data we have from other tutors. I was hoping that graph structures shared between the Deep Thought data and the Stoichiometry data would stick out.
I read a handful of papers on graphs. One paper, by Conati, built graphs out of student solutions in the physics domain and then used domain knowledge to generate hints. It wasn't domain independent, but I should be citing this work. Papers on mathematical graphs didn't seem to offer much help, mainly because we are more interested in graph interpretation, and in turn really data interpretation; we just happen to represent the data as a graph.
I also spoke with Mike, who suggested looking at the ink / information ratio, a concept in Information Visualization that measures how effectively one uses screen real estate when presenting some amount of information.
Problems:
After looking at the Stoichiometry data, nothing of particular interest "jumped" out at me. One issue is that Excel could not handle cells with strings longer than 255 characters, which meant I couldn't "excel" my way to incorporating frequencies into this data for yEd. Not showing the frequencies made it significantly more difficult to identify interesting or important structures.
Looking at the two questions, it is trivial to convert one into the other. Change "identify" to "discover" and "problem solutions" to "student solutions", or vice versa, and it becomes clear they do not differ. This is even clearer when trying to design an experiment that answers one question but not the other. I feel that there is only one contribution here.
Not having yFiles will begin to impede our work very soon; the evaluation period has almost expired. Aaron has select-nodes and create-group-cluster features implemented with yFiles, but our sequence detection is written using Jung. To combine these we need to convert to yFiles.
Next Week's Plan:
On Monday I will meet Dr. Croy to complete the necessary paperwork.
I absolutely must define the final questions for my dissertation. This is the biggest obstacle to my progress.
I want to read up on how people measure the ink / information ratio in the info-vis literature. Ideally I can apply the same idea to a graph-node / information ratio, to get a metric for when combining or collapsing nodes is a good thing to do. Simply counting nodes and edges is not a sufficient metric, because a single node with no edges would then score as the best graph. The problem, of course, is that a graph with a single node and no edges contains no information.
I want to read the Stoichiometry data into the vis-tool and then export the states and edges with their frequencies. The output format will be pretty simple: a tab-delimited list of source, edge, target, frequency, where the frequency is the edge frequency. For some of the problems in the Deep Thought data, distinct strategies were visible, mainly two, one of which is discussed in our case-study work. With frequencies shown in the Stoichiometry data, my hope is that we could see similar strategy structures, which could warrant building a strategy detector for the Interaction-Network. I estimate I can have this work done in just a few hours, potentially even by Saturday.
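A minimal sketch of the export described above, in Python rather than inside the vis-tool. It assumes the student logs have already been reduced to (source, edge, target) transitions; the function names and the toy transitions are hypothetical and not taken from the Stoichiometry data.

```python
import csv
import io
from collections import Counter


def count_edges(transitions):
    """Count how often each (source, edge, target) triple occurs."""
    return Counter(transitions)


def write_edge_list(counts, out):
    """Write source, edge, target, frequency as tab-delimited rows."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "edge", "target", "frequency"])
    # Most frequent edges first; ties are broken alphabetically.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (source, edge, target), freq in rows:
        writer.writerow([source, edge, target, freq])


# Hypothetical toy transitions, for illustration only.
transitions = [
    ("start", "convert_units", "s1"),
    ("start", "convert_units", "s1"),
    ("start", "mole_ratio", "s2"),
    ("s1", "mole_ratio", "s3"),
]

counts = count_edges(transitions)
buf = io.StringIO()
write_edge_list(counts, buf)
tsv_text = buf.getvalue()
print(tsv_text)
```

Counting outside a spreadsheet also sidesteps the 255-character cell limit, since the state strings are only ever used as dictionary keys.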
Additional Thoughts:
--- The strategy detector for the Deep Thought data would be rather simple: the first action performed already identifies the strategy. The two strategies present in that particular data set have the highest and second-highest frequencies. The issue here is that this defines a strategy in the Deep Thought data as the actions with the highest frequencies. The argument would be that if a lot of people perform a similar set of actions, that set identifies a strategy. If, on the other hand, an action or set of actions did not have a high frequency, could it still be identified as a strategy?
--- The issue with this is that we are implying there are no uncommon strategies: to be classified as a strategy, a solution must be common, which isn't exactly brilliant, or even necessarily accurate. We want to create a strategy detector, not a common-set-of-actions detector.
--- To better detect strategies, even uncommon ones, we probably need more dimensions in our data, like time, perhaps hint usage, or "attempts". Strategies should not contain people starting over in the middle of the strategy. Another theory is that a strategy would not contain many hint requests: if a student is thinking three steps ahead, they should have some idea of how to get there. A boundary between low and high hint-request rates might identify the end of a sequence or strategy.
--- Another potential method for detecting sequences would be to look at the time data of the actions. The theory is that the variance of the action times would be small within a sequence. When a sequence ends, the variance would increase, reflecting more thinking time about how to proceed once the actions in the sequence were complete.
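One possible reading of the timing idea, as a rough sketch rather than a settled design: treat each action's time-to-act as a gap, and start a new sequence whenever a gap is much larger than the spread of gaps seen so far in the current sequence. The function name, the threshold `k`, the `floor` on the spread, and the sample gaps are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def split_on_pauses(gaps, k=3.0, floor=1.0):
    """Split action indices into sequences using timing.

    gaps[i] is the number of seconds the student took before action i.
    A new sequence starts when a gap exceeds the mean of the current
    sequence's gaps by more than k standard deviations. The gap that
    opened a sequence is treated as thinking time and is left out of
    that sequence's statistics. `floor` keeps the spread from
    collapsing to zero when the gaps are nearly identical.
    """
    segments = []
    for i, gap in enumerate(gaps):
        if not segments:
            segments.append([i])
            continue
        inside = [gaps[j] for j in segments[-1][1:]]
        if len(inside) >= 2:
            spread = max(pstdev(inside), floor)
            if gap > mean(inside) + k * spread:
                segments.append([i])
                continue
        segments[-1].append(i)
    return segments


# Hypothetical gaps in seconds: two long pauses separate three runs.
gaps = [5, 2, 3, 2, 30, 2, 3, 2, 40, 3]
segments = split_on_pauses(gaps)
print(segments)
```

This version needs at least two within-sequence gaps before it will split, so it cannot detect a pause that falls right after the start of a sequence. Whether the resulting boundaries correspond to real strategies is exactly the validation question raised under The Biggest Problems.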
The Biggest Problems:
With a strategy detector, or a sequence detector, how do we determine if we have identified the correct sequences or correct strategies? What makes one strategy detector better or worse than another one? What makes a sequence detector better or worse than another one?
We must provide evidence that the sequences or strategies detected are legitimate. Expert review and scoring would be one potential method.
If one detector worked across more domains than another, that would support its strength over the detector that did not generalize as well.
Hours Worked:
Mon - 10
Tues - 12
Wed - 12
Thurs - 5
Fri - 7
Sat - 8
Total: 54