Special section on dataset descriptions for learning analytics
George Siemens, LINK Research Lab, University of Texas, Arlington
Stefan Dietze, L3S Research Center, Germany
Hendrik Drachsler, The Open University, Netherlands
Davide Taibi, CNR, Italy
Although learning analytics (LA) is increasingly being applied in education, it remains an application area that lacks publicly available and interoperable datasets. Despite the volume of research being conducted on LA, the community lacks a sufficient body of open, reusable and publicly available datasets that would allow the reproduction and experimental evaluation of algorithms, methods and tools in the field. Given the data-centric nature of the LA domain, availability of data is seen as a key enabler for maturing the field. The LAK Dataset has provided an unprecedented, publicly available resource of structured data about learning analytics research (i.e. the research works themselves, as captured in the community's scholarly papers). However, the actual learning analytics data used by those research works is either not publicly available at all or is spread across distributed and disparate endpoints.
Here we seek to collect such datasets, i.e. data that arises from actual learning processes in any domain (kindergarten, K12, HE or workplace) and that is used within LA research and practice. Such datasets could originate from formal or informal online learning environments (for instance MOOCs, LMSs, digital games for learning, online inquiry tools or professional learning communities); they could also be gathered from face-to-face learning environments (for instance eye-tracking or motion capture traces). Relevant datasets could include data about cognitive development, social learning, discourse progression, network interactions, learning paths through courses, competency completion, help-seeking behaviour and distributed multi-spaced interactions. While such primary data about learning processes are of central importance, we are also interested in complementary data gathered through surveys, for instance about learner demographics, background knowledge, goals, perceptions, experiences and attitudes.
What kinds of shared datasets are useful for advancing learning analytics research and practice? This might involve datasets that help to further methodological development, or that are of direct utility for learning analytics. We welcome submissions that are relevant to this broad objective and datasets that are useful in learning analytics contexts. These might include, but are not limited to, datasets that:
Enrich learning analytics or educational data mining scenarios;
Help evaluate learning analytics, educational data mining or related tools and methods;
Provide a challenging test case for algorithmic or model development;
Provide a scenario for visualisation techniques;
Express data in an existing or emerging data standard, either as a collection of reference examples or as a test case for interoperability (i.e. to test whether sufficient meaning can be reconstructed without knowledge of the source system);
Combine specific datasets to test a particular theory or hypothesis, accompanied by an explanatory rationale and any research findings already derived, to aid replication studies and theoretical development.
For any dataset it is important to explain how privacy has been protected (simulated data may be admissible, but this should be justified).
Datasets need to comply with the following criteria:
All data needs to be made available under open licence terms (e.g. CC-BY, Open Data License) that permit reuse by third parties
The dataset provider needs to hold all rights to share the data publicly on the Web
Data needs to be accessible online, preferably as a dump or via a public HTTP-accessible API or SPARQL endpoint
Data needs to be accessible in standardised serialisations and formats, such as XML, CSV, JSON or RDF. Data should be complemented with a description of the fields, a schema file and/or a vocabulary description.
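As an illustration of the last criterion, the sketch below pairs a CSV dump with a machine-readable description of its fields and checks that the two agree before submission. It is a minimal sketch under stated assumptions, not a journal requirement: the JSON codebook layout, the column names, the sample rows and the helper `check_against_codebook` are all hypothetical, and only the Python standard library (`csv`, `io`, `json`) is used.

```python
import csv
import io
import json

# Hypothetical codebook: one entry per CSV column, giving its type and meaning.
CODEBOOK_JSON = """
{
  "fields": [
    {"name": "learner_id", "type": "string", "description": "Pseudonymised learner identifier"},
    {"name": "timestamp", "type": "string", "description": "ISO 8601 time of the event"},
    {"name": "event", "type": "string", "description": "Type of interaction, e.g. video_play"},
    {"name": "score", "type": "number", "description": "Quiz score between 0 and 1, empty if not applicable"}
  ]
}
"""

# Hypothetical CSV dump of learning interaction events (illustrative values only).
CSV_DUMP = """learner_id,timestamp,event,score
a91f,2015-03-02T10:15:00Z,video_play,
a91f,2015-03-02T10:32:00Z,quiz_submit,0.8
c07b,2015-03-03T09:01:00Z,forum_post,
"""


def check_against_codebook(csv_text, codebook):
    """Return a list of problems: undocumented columns, missing fields, non-numeric numbers."""
    problems = []
    fields = {f["name"]: f for f in codebook["fields"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []

    # Every column in the dump should be described in the codebook.
    for col in columns:
        if col not in fields:
            problems.append(f"column '{col}' has no description in the codebook")

    # Every described field should actually appear in the dump.
    for name in fields:
        if name not in columns:
            problems.append(f"codebook field '{name}' is missing from the data")

    # Non-empty values of numeric fields must parse as numbers (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        for name, spec in fields.items():
            value = row.get(name)
            if spec["type"] == "number" and value:
                try:
                    float(value)
                except ValueError:
                    problems.append(f"line {line_no}: '{name}'={value!r} is not a number")
    return problems


if __name__ == "__main__":
    codebook = json.loads(CODEBOOK_JSON)
    issues = check_against_codebook(CSV_DUMP, codebook)
    print(issues or "all columns documented and typed as described")
```

Keeping the field description in a separate, machine-readable file (rather than only in the paper's prose) lets third parties run the same kind of check on their own copy of the data; an established schema language could replace the ad hoc JSON layout assumed here.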
Submission system & author instructions are available at http://epress.lib.uts.edu.au/journals/index.php/JLA/about/submissions#onlineSubmissions
In the journal submission system, the manuscripts need to be submitted to “Special section: Dataset Descriptions for Learning Analytics”.
The normal journal submission template is available at
All submissions must strictly follow the formatting guidelines, and each submission must be formatted (including font styles and sizes, spacing, margins and other formatting issues) exactly as the papers published in the journal to date.
A typical learning analytics data paper will be around 1000-1500 words in length and should have the following sections:
Creator / Owner
Date / Version
Format, schema, vocabularies, codebook
Restrictions to use (if any)
Provenance, extraction, maintenance
Ethical and privacy considerations*
References (including references to the research papers with the data)
*The data paper itself states where, and under what conditions, the data can be accessed. It needs to include details of the ethical guidelines that were followed in collecting the data and that other researchers should follow.
Accepted dataset papers will be published in a special section of the Journal of Learning Analytics. In addition, in order to further disseminate and encourage reuse of datasets, we intend to set up a separate repository as part of the LAK Dataset, where all datasets will be cataloged and made available according to Linked Data principles.
31 May 2015 Submission deadline
30 June 2015 Review feedback
01 September 2015 Camera-ready version
Fall 2015 Special section publication