Jul 25 2011 - 08:36 AM
PSLC Summer School - Day 1
First day at PSLC has started with a bang; it seems like a lot of interesting students and mentors are here. Looking forward to the developments, and I will keep you guys updated.

Classes of EDM Method (Baker & Yacef, 2009)
- Prediction (a lot of emphasis)
- Clustering
- Relationship Mining: whether students are
- Discovery with Models
- Distillation of Data for Human Judgment

Prediction: Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables)
- Does a student know a skill?
- Which students are off-task?
- Which students will fail the class?

KDD Cup:
- Top 3 data mining conferences
Premier Bayesian Knowledge tracing

Clustering: When we have unstructured data we use clustering to define sets of students or problems that can guide us to some knowledge about the data.
- Find points that naturally group together, splitting the full data set into a set of clusters

Relationship Mining: Discover relationships between variables in a dataset with many variables
- Association rule mining
- Correlation mining
- Sequential pattern mining
- Causal data mining

Discovery with Models:
- Pre-existing models (developed with EDM prediction methods or clustering or knowledge engineering)
- Applied to data and used as a component in another analysis

Distillation of Data for Human Judgment
- Making complex data understandable by humans to leverage their judgment
- Text replays are a simple example of this

Knowledge Engineering
- Creating a model by hand rather than automatically fitting a model
- In one comparison, leads to worse fit to gold-standard labels of the construct of interest than data mining (Roll et al, 2005) but similar qualitative performance

EDM Track schedule
Tuesday 10 am - Educational data mining with DataShop (Stamper)
Tuesday 11 am - Item Response Theory and learning factor analysis

EDM Tools:
1) DataShop - repository for educational data.
2) Excel (Add-ins)
- Data Analysis: ANOVA
- Equation Solver - fit a model (initial knowledge - Bayesian knowledge tracing)
- Scatterplots
3) Free data mining packages
- Weka
- RapidMiner
Weka vs RapidMiner
- Weka is easier to use than RapidMiner
- RapidMiner is significantly more powerful than Weka
In particular…
- It is impossible to do key types of model validation for EDM within Weka's GUI
- RapidMiner can be kludged into doing so (more on this in the hands-on session)
4) SPSS
- A statistical package, and therefore can do a wide variety of statistical tests
- It can also do some forms of data mining, like factor analysis (a relative of clustering)
Difference between statistical packages (like SPSS) and data mining packages like Weka -
5) R
- An open source competitor to SPSS
- More powerful and flexible than SPSS
- But much harder to use
- I find it easy to accidentally do very, very incorrect things in R

IRT Model - Associative Model

DataShop - Phil Patt uses data webservices of DataShop to get the data and analyze it

Matlab - Beck and Chang's Bayes Net Toolkit - Student modeling is built in Matlab

Pre-processing
1) Where does EDM data come from?
- Tutor or log files
- Surveys/Tests
- Recorded / conversational data
- From sensors or eye tracking or facial recognition (confused/angry), hand sensors, butt sensor (the best way to capture data)

Common Approach
- Flat data file (even if you store your data in databases, most data mining techniques require a flat file)

Some useful features to distill for educational software
- Type of interface widget
- "Pknow": the probability that the student knew the skill before answering (using Bayesian knowledge tracing or PFA or your favorite approach)
- Assessment of progress the student is making towards the correct answer (how many fewer constraints violated)
- Whether this action is the first time a student attempts a given problem step
- "Optoprac": How many
- "timeSD": time taken in terms of standard deviations above (+) or below (-) average for this skill across all actions and students
- "time2SD": sum of timeSD for the last 3 actions (or 5 or 4)
- Action type counts or percents
  - Total number of actions so far
  - No. of actions on this skill, divided by optoprac
  - No. of actions in last n actions

Logistic regression models

Code available
Ryan Baker has code available for EDM
- Distilling DataShop data
- Bayesian knowledge tracing
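
To make the "Pknow" and "timeSD" features above a bit more concrete, here is a minimal Python sketch of how they could be distilled from a flat file. This is my own illustration, not Ryan Baker's code: the function names, the row keys ("skill", "time") and the four parameter values (p_init, p_learn, p_guess, p_slip) are all assumptions made up for the example. In practice the parameters would be fitted per skill (for example with Excel's Equation Solver, as mentioned above).

```python
from collections import defaultdict
from statistics import mean, pstdev


def bkt_pknow(correct_seq, p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """Standard Bayesian knowledge tracing update for one student on one skill.

    Returns P(known) *before* each action, i.e. the "Pknow" feature.
    The default parameter values are placeholders, not fitted values.
    """
    p_know = p_init
    pknow_before = []
    for correct in correct_seq:
        pknow_before.append(p_know)
        if correct:
            # Correct answer: the student either knew it and did not slip, or guessed.
            evidence = p_know * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_know) * p_guess)
        else:
            # Wrong answer: the student either knew it and slipped, or did not guess.
            evidence = p_know * p_slip
            posterior = evidence / (evidence + (1 - p_know) * (1 - p_guess))
        # Chance of learning the skill at this opportunity.
        p_know = posterior + (1 - posterior) * p_learn
    return pknow_before


def add_time_sd(rows):
    """Add "timeSD" to each row of a flat file (a list of dicts).

    timeSD = (time - mean) / SD, where mean and SD are taken per skill
    across all actions and students. Rows need "skill" and "time" keys
    (hypothetical column names).
    """
    times_by_skill = defaultdict(list)
    for row in rows:
        times_by_skill[row["skill"]].append(row["time"])
    stats = {s: (mean(t), pstdev(t)) for s, t in times_by_skill.items()}
    for row in rows:
        mu, sd = stats[row["skill"]]
        # If every action on a skill took the same time, SD is 0; use 0.0.
        row["timeSD"] = (row["time"] - mu) / sd if sd > 0 else 0.0
    return rows


if __name__ == "__main__":
    print(bkt_pknow([True, False, True]))
    demo = [{"skill": "add", "time": t} for t in (2.0, 4.0, 6.0)]
    print(add_time_sd(demo))
```

A feature like "time2SD" would then just be a running sum of timeSD over the last few actions of the same student.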
By: Pranav Garg