
Learning Analytics: Tool Matrix

David Dornan

Each entry below gives the tool (and URL), a description, the opportunities it presents for learning analytics solutions, and any weaknesses/concerns/comments.

Data

One of the biggest hurdles in developing learning analytics tools is developing the data governance and privacy policies related to accessing student data. The two initiatives in this section offer frameworks for opening access to student attention/learning data. The first provides a start toward developing data collection standards, and the second provides inspiration on how and why it is not only feasible to deliver free open courses, but also makes sense in terms of providing a community-based research environment to explore, develop and test learning theories and learning feedback mechanisms/tools.

PSLC (Pittsburgh Science of Learning Center) DataShop

Description: The PSLC DataShop is a repository containing course data from a variety of math, science, and language courses.

Opportunities - Data Standards: Initiatives like PSLC will help the learning analytics community develop standards for collecting, anonymizing and sharing student-level course data.

Weaknesses/Concerns/Comments: Convincing individual institutions to contribute to this type of data repository may be difficult, given that many institutions do not have the data governance/sharing policies needed to share this type of information even internally.


Open Learning Initiative

Description: This is an exciting initiative taking place at Carnegie Mellon University. Students' interaction with free on-line course material/activities provides a virtual learning analytics laboratory for experimenting with algorithms and feedback mechanisms.

Opportunities - From Solo Sport to Community-Based Research Activity

Herbert Simon from Carnegie Mellon University states that,

“Improvement in Post Secondary Education will require converting teaching from a ‘solo sport’ to a community based research activity.”

There are often two concerns related to conducting experimentation using learning analytics:

1. Privacy concerns related to accessing student-related data.
2. Ethical concerns related to testing different feedback/instructional response mechanisms.

By offering free courses to students with full disclosure of how their interactions will be tracked and analyzed, these two issues are no longer roadblocks to conducting learning analytics research. As learning materials/objects become commodities, the development of learning analytics tools that help guide and direct students will become what is valued, and this requires that institutions build expertise in developing and sustaining the communities required to conduct community-based learning research.

Database Storage

The majority of current learning analytics initiatives are handled adequately using relational databases. However, as learning analytics programs begin to make use of the semantic web and social media tools, there will be a need to start exploring data storage technology that can handle large unstructured data sets. This section provides a brief description of the data storage options required for LA programs.

Relational Database

Description: For years we have used relational databases to structure the data required for our analyses. Data is stored in tables consisting of rows and columns; the columns are well-defined attributes pertaining to an object represented by a table. There are good open source relational databases such as Greenplum and MySQL. However, most universities have standard supported RDBMS offerings. At the University of Guelph we support both SQL Server and Oracle's RDBMS.

Oracle provides a secure repository for structured data. The recent release of 11g also provides integration with the R engine, permitting R to access data stored in the database.

NoSQL Database/Hadoop/MapReduce

Description: Hadoop is an Apache project inspired by Google's MapReduce and the Google File System. It has become a standard for distributing large unstructured data sets. It provides a framework that can distribute a large data set over a number of servers and can provide intermediate results as data flows through the framework's pipeline.

Opportunities: As learning analytics programs begin to make use of the semantic web and social media tools, there will be a need to start exploring data storage technology that can handle large unstructured data.

Weaknesses/Concerns/Comments: Universities have good relational database infrastructures, including expertise. As LA programs grow to include analysis of unstructured data, universities will need to develop the skill and capacity to offer Hadoop data storage and retrieval services.
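To make the map/reduce pattern concrete, here is a minimal single-machine sketch in R of the canonical word-count example. It only simulates the pattern (Hadoop would shard the map and reduce phases across many servers), and the sample forum posts are invented for illustration.

# Minimal single-machine simulation of the MapReduce word-count pattern.
# Hadoop would run the map and reduce phases on many servers in parallel;
# the sample forum posts below are purely illustrative.
posts <- c("students discuss the forum topic",
           "the forum helps students connect")

# Map phase: each document emits (word, 1) pairs.
map_words <- function(doc) {
  words <- unlist(strsplit(tolower(doc), "\\s+"))
  data.frame(key = words, value = 1, stringsAsFactors = FALSE)
}
mapped <- do.call(rbind, lapply(posts, map_words))

# Shuffle/reduce phase: group the pairs by key and sum the values.
reduced <- aggregate(value ~ key, data = mapped, FUN = sum)
print(reduced[order(-reduced$value), ])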

EC2

Description: There are a number of companies that lease access to processing via virtual servers. Amazon's EC2 is a common cloud server option available to host applications.

Opportunities: It is becoming common for organizations to look at moving applications to the cloud. For many of the traditional services, like the RDBMS, there is resistance to cloud-based deployments, primarily due to privacy concerns and resistance to change. As LA programs require access to new technologies such as Hadoop and require infrequent, massive analytical cycles, there may be an opportunity to introduce cloud-based offerings such as EC2.

Weaknesses/Concerns/Comments: The first assignment for this course (the development of a LA tool) provided me with an opportunity to deploy an application using EC2. EC2 is a great way to explore new technologies: if mistakes are made, one simply redeploys a new EC2 instance. There are many publicly available instances that save time in deploying complete environments. In developing my LA tool, I deployed an Oracle XE instance (which required virtually no effort) and another Red Hat instance where I installed RevoDeployR. Since RevoDeployR was a new tool for me, I had to start over several times before completing a successful installation. It is possible to create backup images in EC2; however, it was not as intuitive as creating a new instance.

Data Cleansing/Integration


Prior to conducting data analysis and presenting it through visualizations, data must be acquired (extracted), integrated, cleansed and stored in an appropriate data structure. The tools that perform these tasks are commonly referred to as ETL tools. Given the need for both structured and unstructured data (as described in the section above), the ideal ETL tools will be able to access and load data to and from sources including RSS feeds, API calls, RDBMSs and unstructured data stores such as Hadoop.
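As a small illustration of this kind of extract-and-load step, the sketch below pulls items from an RSS feed into a data frame and writes them to a relational staging table. It assumes the xml2, DBI and RSQLite packages and uses a placeholder feed URL; a full ETL tool such as PDI or Talend adds scheduling, error handling and far more connectors.

# Minimal extract-transform-load sketch: RSS feed -> data frame -> relational table.
# The feed URL is a placeholder; xml2, DBI and RSQLite are assumed installed.
library(xml2)
library(DBI)
library(RSQLite)

feed  <- read_xml("http://example.edu/course-news/rss.xml")   # extract (placeholder URL)
items <- xml_find_all(feed, ".//item")

posts <- data.frame(                                          # transform into a tidy table
  title   = xml_text(xml_find_first(items, "title")),
  link    = xml_text(xml_find_first(items, "link")),
  pubDate = xml_text(xml_find_first(items, "pubDate")),
  stringsAsFactors = FALSE
)

con <- dbConnect(SQLite(), "la_staging.sqlite")               # load into a staging database
dbWriteTable(con, "feed_items", posts, overwrite = TRUE)
dbDisconnect(con)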

Needlebase

Description: Needlebase is a web-based webscraping tool that provides an easy-to-use interface to acquire, integrate and cleanse web-based data. As a user navigates a website tagging page elements of interest, Needlebase detects the underlying database structure and web navigation and automates the collection of the underlying data into a table.

Opportunities: Needlebase is a great tool for accessing a website's underlying data when direct access to the data is not available. I have used Needlebase to create a lookup table for archived National Occupation Codes and to create a lookup table for our undergraduate course calendar.

Weaknesses/Concerns/Comments: There is no API access to the Needlebase scripts that are created. It seems best for one-off extracts or for applications where the entire dataset is acquired using Needlebase tools; it does not seem all that useful for an integrated solution. One other restriction that I ran across using this tool was that it did not support accessing websites requiring authentication.
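Where Needlebase is not an option, a comparable one-off scrape can be done directly in R with the rvest package. The sketch below pulls the first HTML table on a page into a data frame; the course-calendar URL is a placeholder.

# One-off scrape of an HTML table into a data frame using rvest.
# The course-calendar URL is a placeholder for illustration only.
library(rvest)

page    <- read_html("http://example.edu/undergraduate-calendar/courses.html")
tables  <- html_nodes(page, "table")               # grab all <table> elements on the page
courses <- html_table(tables[[1]], fill = TRUE)    # convert the first table to a data frame

head(courses)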

Pentaho Data Integration (PDI)

Description: Pentaho Data Integration (PDI) is a powerful, easy-to-learn open source ETL tool that supports acquiring data from a variety of data sources including flat files, relational databases, Hadoop databases, RSS feeds, and RESTful API calls. It can also be used to cleanse and output data to the same list of data sources.

Opportunities: PDI provides a versatile ETL tool that can grow with the evolution of an institution's learning analytics program. For example, a LA program may initially start with institutional data that is easily accessible via institutional relational databases. As the program grows to include text mining and recommendation systems that require extracting unstructured data outside the institution, the skills developed with PDI will accommodate the new sources of data collection and cleansing.

Weaknesses/Concerns/Comments: There are two concerns that I have with PDI:

1. Pentaho does not have built-in integration with R statistics; instead, Pentaho's data mining integration focuses on a WEKA module.

2. Pentaho is moving away from the open source model. Originally PDI was an open source ETL tool called Kettle, developed by Matt Casters. Since Pentaho acquired Kettle (and Matt Casters), it has become a central piece of their subscription-based BI Suite and the support costs are growing at a rapid pace. Twice I have budgeted for support on this product, only to find that the support costs have more than doubled year over year.

Talend

Description: Talend is another open source ETL tool that has many of the same features as PDI. The main differences between PDI and Talend are presented in the following blog post: http://churriwifi.wordpress.com/2010/06/01/comparing-talend-open-studio-and-pentaho-data-integration-kettle/

Opportunities: Talend has the same strengths as described above for PDI, with the additional benefit of having built-in integration with R.

Weaknesses/Concerns/Comments: The main difference from my perspective is that Talend is a code generator whereas PDI is not. I have also found PDI a much easier tool to learn and use.

Yahoo Pipes

Description: Yahoo provides this free web-based GUI tool that allows users to extract web-based data and create data streams that cleanse, filter or enhance the data prior to outputting it via an RSS feed.

Opportunities: Since PDI and Talend seem to provide the same abilities as Yahoo Pipes, I did not spend a great deal of time exploring it. However, it seems to me that Yahoo Pipes could provide the webscraping functionality that Needlebase provides, yet offer an RSS feed output that could be picked up by either Talend or Pentaho in order to schedule nightly loads. It might be a more efficient way to pass web-based data streams through various APIs prior to extraction using PDI.

Weaknesses/Concerns/Comments: The one concern that I have with Yahoo Pipes is that some of the unstructured data that will require analysis in a LA system will be posts by students. If a free public service like Yahoo Pipes is used to stream that data through various analytic APIs, we will potentially release personal student data.

Statistical Modeling

There are three major statistical software options: SAS, SPSS and R. All three are excellent for developing the analytic/predictive models that underpin learning analytics. This section focuses on R. The open source R project has numerous packages and commercial add-ons available that position it well to grow with any LA program. Given that many researchers are proficient in R, incorporating the R engine into a LA platform also offers an opportunity to engage faculty in the development of reusable models/algorithms.
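As a concrete illustration of the kind of reusable model faculty might contribute, here is a minimal R sketch that fits a logistic regression predicting whether a student passes a course from two hypothetical LMS activity measures; the data are simulated purely for demonstration.

# Toy logistic regression: predict pass/fail from (hypothetical) LMS activity measures.
set.seed(42)
lms <- data.frame(
  logins      = rpois(200, lambda = 20),   # invented weekly login counts
  forum_posts = rpois(200, lambda = 5)     # invented forum post counts
)
# Simulate an outcome loosely related to activity, for demonstration only.
p <- plogis(-3 + 0.10 * lms$logins + 0.30 * lms$forum_posts)
lms$passed <- rbinom(200, size = 1, prob = p)

fit <- glm(passed ~ logins + forum_posts, data = lms, family = binomial)
summary(fit)

# Score a new (hypothetical) student: predicted probability of passing.
predict(fit, newdata = data.frame(logins = 12, forum_posts = 1), type = "response")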

R

Description: R is an active open source project that has numerous packages available to perform any type of statistical modeling.

Opportunities: R's strength is the fact that it is widely used by the research community. Code for analysis is widely available and there are many packages available to help with any type of analysis and presentation that might be of interest. Some of these include:

1) Visualization:
   a) ggplot provides good charting functionality.
   b) googleVis provides an interface between R and the Google Visualization API.

2) Text Mining:
   a) tm provides functions for manipulating text, including stripping whitespace and stop words and removing suffixes (stemming).
   b) openNLP identifies words as nouns, verbs, adjectives or adverbs.
   c) wordnet provides access to the WordNet library. This is often used to replace similar words with a common word prior to text analysis.

Here are a few articles that show the power of using a few of these text mining packages:

1. Creating a wordle using tm and ggplot - http://www.r-bloggers.com/building-a-better-word-cloud/
2. An overview of conducting text analysis using R - http://www.jstatsoft.org/v25/i05/paper

Oracle has also integrated R into its 11g RDBMS, allowing R models direct access to RDBMS data.

Weaknesses/Concerns/Comments: Although I really like R, there are two issues that may be of concern to some universities:

1) Lack of support - only Revolution R provides support for the R product.
2) High level of expertise required to develop and maintain R - how does a university retain people that have the skills required to develop and maintain R/RevoDeployR? However, since many faculty and students are proficient with R, perhaps building a platform similar to Datameer (see below) would allow R code to be community sourced, allowing the majority of faculty and students to easily access and build their own learning dashboards.
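Here is a minimal sketch of the tm workflow described above (lower-casing, stop-word removal, whitespace stripping and stemming) applied to a couple of invented forum posts; it assumes the tm and SnowballC packages are installed.

# Basic text-mining pipeline with tm: clean, stem, and build a term matrix.
# The two forum posts are invented; tm and SnowballC are assumed installed.
library(tm)
library(SnowballC)   # provides the stemmer used by stemDocument()

posts <- c("Students were discussing the assigned readings in the forum.",
           "The forum discussion helped students understand the reading.")

corpus <- VCorpus(VectorSource(posts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, stemDocument)

dtm <- DocumentTermMatrix(corpus)   # rows = posts, columns = stemmed terms
inspect(dtm)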

Revolution R Offerings (including RevoDeployR, RevoConnectR, and integration with IBM Netezza)

Description: Revolution R provides support for the open source R engine and provides add-ons to enhance the integration and use of R within databases and websites. RevoDeployR is a server-based platform that provides access to the R engine via a RESTful API. RevoConnectR allows the R engine to use data stored in Hadoop. Revolution R also provides integration with IBM Netezza data warehouse appliances, providing a scalable infrastructure for analyzing very large datasets.

Opportunities: Revolution R is the only commercial support offering for R. It will be useful for institutions that have procurement or risk management policies that restrict the use of open source products.

Weaknesses/Concerns/Comments: Revolution R tools are free for research purposes, and their support contracts or licenses for institutional purposes (i.e. learning analytics and dashboards) are very reasonable; I was quoted $4,000/core for the RevoDeployR product. The support that I received using RevoDeployR was very slow; however, I am not a supported customer.

rApache

Description: rApache is an open source Apache module named mod_R that embeds the R statistical engine inside the web server.

Zementis ADAPA

Description: Zementis offers a PMML-based scoring engine which can be deployed on-site, within a Greenplum database, within an Excel spreadsheet, or consumed as a web service using Zementis' Amazon cloud-based service. By using the PMML (Predictive Model Markup Language) standard, ADAPA can easily leverage predictive models developed in the major statistical software packages including R, SAS and SPSS. It can quickly provide scoring based on any of the following modeling techniques: Support Vector Machines, Naive Bayes Classifiers, Ruleset Models, Clustering Models, Decision Trees, Regression Models, Scorecards, Association Rules, and Neural Networks.

Opportunities: ADAPA allows for easy consumption of predictive scores into a student or faculty web-based learning dashboard. The cloud-based service, starting at only $0.99/hr, requires an investment of roughly $2000/semester. I tried using the API to create a Purdue-like dashboard in the LA tool, but I did not have time to get it working properly.

Weaknesses/Concerns/Comments: Zementis has partnered with Revolution to build their web-based subscription service on RevoDeployR. So if RevoDeployR is part of your LA architecture, it could provide the same functionality using your in-house service.
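To show how a PMML engine like ADAPA fits into an R workflow, the sketch below trains a small decision tree and exports it as PMML, the kind of file a PMML scoring engine can consume. It assumes the rpart, pmml and XML packages, and the iris data set simply stands in for real student data.

# Train a small model in R and export it as PMML for a scoring engine to consume.
# rpart, pmml and XML are assumed installed; iris stands in for real student data.
library(rpart)
library(pmml)
library(XML)

tree <- rpart(Species ~ ., data = iris)        # a small decision-tree classifier
tree_pmml <- pmml(tree)                        # convert the fitted model to PMML
saveXML(tree_pmml, file = "tree_model.pmml")   # upload the .pmml file to the scoring engine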

Network Analysis

Network analysis focuses on the relationships between entities. Whether the entities are students, researchers, learning objects or ideas, network analysis attempts to understand how the entities are connected rather than the attributes of the entities. Measures include density, centrality, connectivity, betweenness and degree. This is an important area to explore as we take up Herbert Simon's challenge and nudge learning and teaching 'from a solo sport to a community-based research activity'. Network analysis can not only help us identify patterns that flag disconnected students or predict success based on network metrics; these tools can also help students develop the networking skills that will be required for successful lifelong learning and research.
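As a quick sketch of computing these measures in R, the snippet below uses the igraph package on a small invented edge list of forum replies (who replied to whom).

# Compute basic network measures with igraph on an invented forum-reply network.
library(igraph)

replies <- data.frame(from = c("ann", "bob", "bob", "cara", "dan"),
                      to   = c("bob", "ann", "cara", "ann", "ann"))

g <- graph_from_data_frame(replies, directed = FALSE)

edge_density(g)     # how connected the class discussion is overall
degree(g)           # number of connections per student
betweenness(g)      # students who bridge otherwise separate groups
is_connected(g)     # whether any students are completely cut off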

SNAPP

Description: Social Networks Adapting Pedagogical Practice (SNAPP) is a network visualization tool that is delivered as a 'bookmarklet'. Users can easily create network visualizations from LMS forums in real time.

Opportunities - Self-Assessment Tool for Students: SNAPP provides students with easy access to network visualizations of forum postings. These diagrams can help students understand their contribution to class discussions.

Opportunities - Identify At-Risk Students / Monitor the Impact of Learning Activities: Network analysis visualizations can help faculty identify students that may be isolated. They can also be used to see whether specific activities have impacted the class network.

NodeXL

Description: NodeXL is an Excel add-on that creates network visualizations from a worksheet containing lists of edges. The tool can calculate common network measures such as density, centrality, connectivity, betweenness and degree. Data can be exported in a format that can be imported into Gephi for further analysis or refined visualization.

Opportunities - Sophisticated Network Analysis: Both NodeXL and Gephi can be used to explore network patterns. These tools are useful for researchers; it would be interesting to explore the relationship between these network metrics (e.g. centrality and betweenness) and student success.

Gephi

Description: Gephi offers a standalone product for analyzing networks. It is the most advanced of the three network analysis tools described in this section.


Cohere

Description: Simon Buckingham Shum and Anna De Liddo have developed an enhanced Diigo-like tagging/bookmarking tool that allows users to link their contributions to other ideas and websites with descriptive adjectives.

Opportunities - Idea Creation: While this tool provides its creators with data that is useful for their discourse analysis research, it also gives people/researchers a tool that may help connect them to others with related interests and ideas, and may help stimulate new ideas and collaborations.

Other Tools for Analysis

ViralHeat

Description: ViralHeat provides a full-featured tool set and an API that helps monitor web content for specific mentions of people, products and services.

Opportunities - Monitor and Evaluate Course/Program Satisfaction: This relatively cheap analytics offering could help introduce the use of analytics by helping to evaluate a recruitment drive/strategy or a fundraising campaign.

WordNet

Description: Princeton University provides a lexical database that links English words (or sets of words) by their common meaning. It is essentially a database that helps identify synonyms.

Opportunities - Identify the Main Concepts Found in a Learning Object / Forum Post: This lexical database is used in text analysis to replace similar words with one common descriptor.
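A small sketch of looking up synonyms from R with the wordnet package, which could feed the 'replace similar words with one common descriptor' step. It assumes rJava and a local WordNet installation; the dictionary path shown is a placeholder.

# Look up synonyms via the wordnet package (requires rJava and a local WordNet install).
# The dictionary path below is a placeholder; point it at your WordNet 'dict' directory.
library(wordnet)

setDict("/usr/share/wordnet/dict")     # placeholder path to the WordNet dictionary files
synonyms("understand", "VERB")         # candidate replacements sharing a common meaning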

Leximancer

Description: Leximancer provides sophisticated text analysis and presentation of the concepts found in a learning object. The API can return interactive concept maps demonstrating how different ideas connect, and the tool provides the ability to drill from the concept map down to the text that spawned it.

Opportunities - Identify the Main Concepts Found in a Learning Object / Forum Post: Leximancer could be used to help consolidate the main ideas of a lecture or discussion group. It can also provide students with easy access to the detailed discussion and material related to a concept via a link from the concept map to the discussion forum posting.

Wolfram Alpha API

Description: The Wolfram Alpha API provides developers with the ability to submit free text/questions from a website to the Wolfram Alpha engine and have the results returned.

Opportunities - Dynamic Content Delivery: The Wolfram Alpha API could be used to provide supplemental material to an on-line discussion.
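A rough sketch of calling the Wolfram Alpha query API from R with httr; the AppID is a placeholder (you register with Wolfram to obtain one), and how the returned plaintext is woven into a discussion page is left open.

# Query the Wolfram Alpha API from R and pull back the result as XML.
# "YOUR_APPID" is a placeholder; register with Wolfram to obtain a real AppID.
library(httr)
library(xml2)

resp <- GET("http://api.wolframalpha.com/v2/query",
            query = list(input = "integrate x^2", appid = "YOUR_APPID"))

doc <- read_xml(content(resp, as = "text", encoding = "UTF-8"))
# Each <pod> holds one piece of the answer; its plaintext could be embedded in a discussion page.
xml_text(xml_find_all(doc, "//pod/subpod/plaintext"))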

Linked Data

If Tim Berners-Lee's vision of linked data (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html) succeeds in transforming the internet into a huge database, the value of delivering content via courses and programs will diminish and universities will need to find new ways of adding value to learning. Developing tools that facilitate access to relevant content using linked data could be one way that universities remain relevant in the higher learning sector.

Ontologies (e.g. DBpedia)

Description: Ontologies are essentially an agreed-upon concept map for a particular domain of knowledge.

OpenCalais

Description: Reuters offers this free API that takes text input and returns tags that link the concepts in the text to other linked data on the web.

Opportunities (for both) - Dynamically Deliver Relevant Content: Using OpenCalais along with well-defined ontologies provides a mechanism for dynamically delivering/suggesting related readings.
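As a rough illustration of pulling linked data into R, the sketch below sends a SPARQL query to the public DBpedia endpoint using httr and jsonlite; the example query (the categories attached to the Learning analytics resource) is arbitrary.

# Query the public DBpedia SPARQL endpoint from R (httr and jsonlite assumed installed).
# The example query just asks for the categories linked to the Learning_analytics resource.
library(httr)
library(jsonlite)

sparql <- "
  PREFIX dct: <http://purl.org/dc/terms/>
  SELECT ?category WHERE {
    <http://dbpedia.org/resource/Learning_analytics> dct:subject ?category
  }"

resp <- GET("http://dbpedia.org/sparql",
            query = list(query = sparql, format = "application/sparql-results+json"))

res <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
res$results$bindings$category$value   # URIs of related categories/readings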

Visualization

The presentation of the data after it has been extracted, cleansed and analyzed is critical to successfully engaging students in learning from, and acting on, the information that is presented.

Google Visualization APIs (http://code.google.com/apis/chart/)

Description: Google Visualization provides an API to Google's chart library, allowing for the creation of charts and other visualizations. Google has recently released an API that adds interactive controls to the charts.

Protovis (http://mbostock.github.com/protovis/) and D3 (http://mbostock.github.com/d3/)

Description: Protovis and D3 are JavaScript frameworks for creating web-based visualizations. Protovis is no longer an active open source project; it has been replaced by D3.

FusionCharts (http://www.fusioncharts.com/)

Description: FusionCharts provides a commercial JavaScript framework for creating dynamic visualizations.

Opportunities - Interactive Learning Dashboards: All of these tools are useful for creating visualizations for learning feedback systems such as dashboards. The Motion Chart (purchased from Gapminder) is one of my favourite interactive charts that Google provides access to via their API. All of these tools can present data as heat maps, network analysis diagrams and tree maps. Here is a link to an example dashboard created in D3, presenting university admission data: http://keminglabs.com/ukuni/

Weaknesses/Concerns/Comments: Learning how to use these tools/libraries requires a fair amount of effort, and developer retention is a risk for system maintenance and enhancement.
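A minimal sketch of driving the Google Visualization API from R via the googleVis package; the Fruits demo data set bundled with googleVis stands in for real student progress data, and plot() opens the interactive Motion Chart in a browser.

# Render a Google Motion Chart from R using the googleVis package.
# The bundled Fruits demo data stands in for real student progress data.
library(googleVis)

chart <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")
plot(chart)                  # opens the interactive chart in a browser
# cat(chart$html$chart)      # or embed the generated JavaScript in a dashboard page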

Reporting Suites

Description: Many universities have reporting tools available to create visualizations. Tools include Tableau, Cognos, Pentaho and JasperReports.

Opportunities: All of these vendors provide good tools to create reports and dashboards. My favourite is Tableau; however, JasperReports and Pentaho are much more affordable.

Full Analytics Offerings

LOCO

Description: Using LOCO (Learning Object Context Ontologies), student on-line activities are mapped to specific learning objectives. The tool set provides faculty with feedback on how well material has been understood, and it also provides network visualizations describing student interaction. The tool provides a framework for describing on-line learning environments.

Opportunities - Faculty Feedback Related to Learning Success.

Datameer (http://www.datameer.com/)

Description: Datameer provides a full set of tools allowing users to conduct advanced analytics on Hadoop-based data.

Opportunities - Engage Faculty in Learning Analytics: I like Datameer's wizard-based approach to user-controlled analytics. It provides some ideas on how one could give faculty the ability to contribute or reuse predictive models, quickly test historic data, deploy a learning analytics algorithm and present the results in a learning dashboard.

Weaknesses/Concerns/Comments: This approach may be too complicated for delivery to the masses, as I suspect that the majority of faculty will want something that requires less effort.
