The Fetishization of Educational Data

As the term “Big Data” muscles into education, we’re starting to see what that collection of data means for learning outcomes, and it’s not pretty. Within the last few years “Big Data” has become one of the biggest trends in education. Everyone wants more data and we’ve got huge databases full of it; some people support it while others fear and oppose it. But when all is said and done, what is actually happening with that data?

Across the wide range of products I’ve seen, the term “educational data” can now refer to practically everything about a learner. Some of it is fairly obvious, but some leaves me scratching my head. For example, during student enrolment it is important for staff and teachers to know who a learner’s legal guardian is and what their living arrangements are: they might live with a distant relative, or might even be homeless and living in a hostel. All of this is critically important to their teachers. Some systems go further and suggest that this data can be leveraged with software solutions to improve learning outcomes, but we’ll come back to this shortly.

The promise of big data was that you could use it to inform your decisions. Because that promise was so general, its implementation in education has for the most part become “Collect everything, we’ll figure out what we can do with it later”. While that’s a bold ambition, the result is a lot of teachers manually entering and updating student data in many different systems with no clear vision of “why?”.

The range of data collected includes everything from the usual attendance, behaviour and individual class marks to standardised test results and biographical and geographical data. One absurd product even asked for estimated parental / guardian income… you know, to “boost outcomes”…

Some of this data has to be collected anyway: attendance is a legal requirement for the most part, and standardised test results are there whether you like it or not. But this isn’t just about a single instance of the data. The lack of interoperability between systems means that even after deciding to hand the data over, it’s being handled twice, three times or even more across many disjointed systems and solutions.

These scenarios don’t even begin to cover the data that becomes “orphaned” when teachers migrate systems, try new tools, or have their platform choices overridden by school procurement, which leaves a lot to be said about school procurement policies.

It’s fair to argue that the more digital products you use and the more students you have, the greater the data footprint. That footprint can sit behind the scenes or at the forefront of the learning experience. At a recent event in London hosted by the Assignment Report, there were some interesting discussions around teacher workflows and the workload that educational data adds for teachers. The most troubling part came when an Ed Tech executive argued that this data (strengths, weaknesses, attendance and preliminary summative test results) could be useful for their learning products, or that it was critical to their learning journey to begin with. If, in the age of machine learning, cognitive science and psychometrics, your digital product requires you to input strengths and weaknesses, attendance or any other arbitrary data points, you’re doing it wrong, and here’s why.

  • Manual data entry can be error-prone and takes dedicated chunks of time to perform.
  • This assumes that there must be some other diagnostic before learning can happen, and that the result of that diagnostic must be fed into the system to “warm it up”.
  • People lie, not always intentionally, but we’re all terrible at evaluating our own skills. Why would we trust ourselves or anyone else to input our personal strengths and weaknesses?
  • This takes the focus away from formative learning and back to summative.

On the surface those points don’t sound too bad, but they are once you consider them in the context of machine learning and observable metrics.

Observable metrics don’t lie, and they don’t require personal or teacher involvement to collect or curate them. Simply by having learners engage with your platform, you can observe them logging in to systems, completing actions, consuming content and answering machine-gradable tests, and you can even reflect that data back to evaluate how your content is performing. Here’s a sample of the data you can pull from observable interactions with learning systems:

  • Attendance: Capture log in / log out and session details.
  • Location: Use an IP lookup to see the rough location of the device when it connected. Is the learning happening mostly at school or elsewhere?
  • Engagement: If they’re meant to be watching an instructional video that’s 4:30 long and they’ve clicked through after 30 seconds, that’s just one data point. Imagine also taking into account reading levels, mouse movements, keystrokes and window focus (what window is actually open on their machine?).
  • Correct vs. Incorrect: Machine-gradable questions let you establish quick assessment baselines in both a formative and a summative flow.
  • Strengths through media: Providing varied instruction and assessment lets you aggregate the results to see individual learning style preferences (while still accounting for Strength of Content).
  • Strengths through learning objectives: By correctly mapping your content to the very learning objectives it aims to achieve, you not only ensure your lessons and content are appropriate, but you also create a network of rankings. This then allows you to see raw data against learning outcomes, not arbitrary rubrics.
  • Strength of Content: By reversing the approach, you start to see which content is performing and which isn’t, purely by the impact it’s had on the learning objectives.
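To make a few of these concrete, here is a minimal sketch of how attendance, engagement and correct vs. incorrect rates could be derived from a plain event log. Everything in it is an assumption for illustration: the `Event` record, its field names and the sample data are hypothetical and do not come from any particular learning platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """One observed interaction (hypothetical schema, for illustration only)."""
    student: str
    kind: str               # "login", "logout", "video" or "answer"
    timestamp: float        # seconds since an arbitrary start
    objective: str = ""     # learning objective the item is mapped to
    content_id: str = ""    # video or question identifier
    watched_s: float = 0.0  # seconds of video actually watched
    length_s: float = 0.0   # full length of the video in seconds
    correct: bool = False   # result of a machine-graded answer


def session_minutes(events):
    """Attendance: total logged-in minutes per student (login paired with next logout)."""
    open_login = {}
    totals = defaultdict(float)
    for e in sorted(events, key=lambda e: e.timestamp):
        if e.kind == "login":
            open_login[e.student] = e.timestamp
        elif e.kind == "logout" and e.student in open_login:
            totals[e.student] += (e.timestamp - open_login.pop(e.student)) / 60
    return dict(totals)


def video_engagement(events):
    """Engagement: fraction of each video watched, keyed by (student, content_id)."""
    return {
        (e.student, e.content_id): min(e.watched_s / e.length_s, 1.0)
        for e in events
        if e.kind == "video" and e.length_s > 0
    }


def accuracy_by(events, key):
    """Correct vs. incorrect: share of correct answers, grouped by key(event)."""
    right = defaultdict(int)
    total = defaultdict(int)
    for e in events:
        if e.kind == "answer":
            k = key(e)
            total[k] += 1
            right[k] += int(e.correct)
    return {k: right[k] / total[k] for k in total}


# Made-up sample data: one student, one 4:30 video skipped after 30 seconds.
events = [
    Event("amy", "login", 0),
    Event("amy", "video", 60, content_id="vid-1", watched_s=30, length_s=270),
    Event("amy", "answer", 400, objective="fractions", content_id="q1", correct=True),
    Event("amy", "answer", 500, objective="fractions", content_id="q2", correct=False),
    Event("amy", "logout", 1800),
]

print(session_minutes(events))
print(video_engagement(events))
print(accuracy_by(events, key=lambda e: e.objective))   # strengths by learning objective
print(accuracy_by(events, key=lambda e: e.content_id))  # strength of content
```

Grouping the same answers by objective or by content item is what “reversing the approach” amounts to here: one log, two views, and no manual entry by the teacher.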

But let’s take this up a notch. This is just the data you can observe through student interactions. What happens if we apply modern cognitive services and machine learning to it? Here are just a few possibilities:

  • Predict attendance and engagement issues before they arise
  • Measure the sentiment of student work to evaluate positive / negative writing patterns
  • Predict answer accuracy
  • Measure group and individual proficiency
  • Predict potential proficiency
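As a rough illustration of the first item, the sketch below flags learners whose recent activity has dropped well below their own earlier baseline. It is a simple baseline-comparison heuristic rather than a trained machine learning model, and the function name, the thresholds and the sample figures are all assumptions made up for this example.

```python
from statistics import mean


def at_risk(weekly_minutes, recent_weeks=2, drop_ratio=0.5):
    """Return students whose recent average falls below drop_ratio x their earlier average.

    weekly_minutes maps a student to a list of logged-in minutes per week,
    oldest first. Thresholds are illustrative assumptions, not recommendations.
    """
    flagged = []
    for student, weeks in weekly_minutes.items():
        if len(weeks) <= recent_weeks:
            continue  # not enough history to form a baseline
        baseline = mean(weeks[:-recent_weeks])
        recent = mean(weeks[-recent_weeks:])
        if baseline > 0 and recent < drop_ratio * baseline:
            flagged.append(student)
    return sorted(flagged)


# Made-up sample data: minutes of observed platform use per week.
history = {
    "amy": [120, 110, 130, 40, 30],  # sharp recent drop
    "ben": [90, 95, 85, 100, 92],    # steady
    "cal": [60, 20],                 # too little history to judge
}

print(at_risk(history))
```

A real system would likely replace the fixed ratio with a model fitted to historical outcomes, but the input stays the same: observed behaviour, not hand-entered judgements.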

Now compare what we can extract and infer with what we’re meant to be manually entering. Can anyone tell me why we’re entering this data, or managing it ourselves? The next shift in educational technology and data has to be a movement towards Evidence Based Data, where observations of patterns and behaviours show the real results, rather than trying to leverage data that’s been manually handled.

Some of these solutions aren’t cheap to build and implement. But when you think about the cost to the teacher, parent, school and learner of having to maintain this data:

  • The lost opportunity cost of teaching a student.
  • The lost opportunity cost of spending hours dedicated to data admin.
  • The lost opportunity cost of improving as an educator because the data is wrong.
  • and the lost opportunity cost of a burnt-out teacher.

The cost vs. benefit becomes a lot clearer.

The aim of data should always be to help improve learners’ and teachers’ abilities while also informing the school and guardians. If the data you’re handling, collecting or managing isn’t helping achieve those goals, why is it being collected at all?
