Predictive Analytics: Reading Resources



Many higher education institutions have widespread student success initiatives. Unfortunately, these often occur without an infrastructure to understand which of those interventions and innovations are working, and for whom. A growing number of leaders in higher education are turning to predictive analytics to bring together their disparate systems of information and data storage. By exploring the signals there, and using these insights to inform action, they are scientifically transforming educational outcomes for the better. To assist with their success, we have compiled a reading list of articles, journal entries, blog posts, websites and more to help inform your practice as you build your analytics adoption. These resources range from entry-level primers to advanced data science.

Articles About Predictive Analytics in Higher Education

Editor’s note March 2017: When this article was written there was a limited number of articles on this topic. In the last three years, the Civitas community has garnered tremendous national media interest. In addition to the stories and articles originally published below, please access our press room and read about the current work and the momentum of the analytics movement: Press Room

Personalized Learning is Sweeping College Campuses, Courtesy of Big Data
Ed Tech Magazine – February 2013
This article explores the roles of data mining and learning analytics in developing more personalized student learning environments. An infographic walks readers through the cycle for this process, from the collection of detailed data about the student’s experience in a course through the use of that data to make predictions that lead to interventions and learning materials appropriate to that student’s performance.

The Power of Predictive Analytics
Campus Technology – September 2013
In this story, Phil Ice explains that the American Public University System has long used predictive analytics, and as a result, their dropout rate has fallen by 17 percent through analysis of 187 data points that help pinpoint students who are likely to withdraw within the next five days. This story examines the use of predictive analytics in higher education and the importance of using that data on the front lines.

Welcome to the Era of Big Data & Predictive Analytics in Higher Education
These slides are from a session with Ellen Wagner (Sage Road Productions) and Joel Hartman (University of Central Florida) for the PAR Framework project. The content presents an introduction to the emerging and evolving topics of “Big Data” and predictive analytics, particularly as they apply to higher education and the use of data to improve student persistence and outcomes.

Using Predictive Analytics to Improve Student Success
Educause – November 2012
Video: EDUCAUSE Session description: “As provost of a regional university, Tristan Denley has pioneered a wide variety of initiatives to improve college completion and students’ academic success. These ideas stretch from institutional transformation and course redesign in a variety of disciplines, to the role of predictive analytics and data mining in higher education. His most recent work has created a course recommendation system that successfully pairs current students with the courses that best fit their talents and program of study for upcoming semesters.”

The Evolution of Big Data & Learning Analytics in American Higher Education
The Sloan Consortium – June 2012
This article from Anthony Picciano, City University of New York, “examines the evolving world of big data and analytics in American higher education. Specifically, it will look at the nature of these concepts, provide basic definitions, consider possible applications, and last but not least, identify concerns about their implementation and growth.”

Predictive Analytics Beyond the Academy

Predictive Analytics 101
In this lengthy post with useful informational graphics, author Ravi Kalakota explains: “Analytics is the discovery and communication of meaningful patterns in data. It’s not the data but the signals in the data. Insight, not hindsight is the essence of predictive analytics. How organizations instrument, capture, create and use data is fundamentally changing the dynamics of work, life and leisure.”

Predictive Analytics: Harnessing the Power of Big Data
Eric Siegel explains the learning process from data to machine learning to predictions and how these insights can help us make better choices in important decisions. From machine learning at Stanford University successfully diagnosing breast cancer better than human doctors by discovering an innovative method that considers a greater number of factors in a tissue sample, to LinkedIn predicting your job skills, Siegel explores the value proposition of predictive analytics in an interesting narrative that entertains. He is the founder of Predictive Analytics World and the author of “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.”

Predicting the Future, Part 1: What is Predictive Analytics
This first in a four-part series from IBM offers an introduction to analytics. In this article the focus is on “knowledge obtained from analytics, which we may classify as descriptive or predictive. While descriptive analytics lets us know what happened in the past, predictive analytics focuses on what will happen next.”

Books



  • Eric Siegel – Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
  • Nate Silver – The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t
  • Christopher Bishop – Pattern Recognition and Machine Learning
  • Trevor Hastie – The Elements of Statistical Learning
  • David MacKay – Information Theory, Inference, and Learning Algorithms
  • Mara Tableman – Survival Analysis Using S: Analysis of Time-to-Event Data
  • Gerald van Belle – Statistical Rules of Thumb



While by no means a complete review of the literature available, here are some recommended reads, along with a few additional comments from our data science team.
  • Chimka, J. R. and L. H. Lowe (2008). “Interaction and survival analysis of graduation data.” Educational Research and Review 3(1): 029-032.
  • Chimka, J. R. and Q. Wang (2009). “Accelerated failure-time models of graduation.” Educational Research and Review 4(5): 267-271.
  • Herzog, S. (2006). “Estimating Student Retention and Degree-Completion Time: Decision Trees and Neural Networks Vis-à-Vis Regression.” New Directions for Institutional Research, 131: 17-33.
  • Kovačić, Z. J. (2010). Early Prediction of Student Success: Mining Students Enrolment Data. Informing Science & IT Education Conference (InSITE) 2010.
  • Corinna Cortes & Vladimir Vapnik (1995). Support-Vector Networks. Considered to be the original paper on Support Vector Machines, heavily cited. Possibly required reading for those who want to leverage SVM.
  • L.G. Valiant (1984). A Theory of the Learnable. Ahead-of-its-time insight on computational learning theory.
  • R. Neal and G.E. Hinton (1998). A View of the EM Algorithm that Justifies Sparse, Incremental and Other Variants. This is an incredibly important paper that shed new light on the estimation of latent variable models, connected it to concepts in statistical physics, and justified a lot of methods previously considered hacks that people knew worked but didn’t know why.
  • M.I. Jordan, Z. Ghahramani, T.S. Jaakkola and L.K. Saul (1998). An Introduction to Variational Methods for Graphical Models. Introduces what has become a standard technique in the probabilistic inference toolbox for doing inference in models with intractable posterior distributions, and a convenient deterministic alternative to MCMC.
  • F.R. Kschischang, B.J. Frey and H.-A. Loeliger (2001). Factor Graphs and the Sum-Product Algorithm. Generalized the concept of directed and undirected graphical models into what is now one of the more preferred formalisms.
  • S. Roweis and Z. Ghahramani (1998). A Unifying Review of Linear-Gaussian Models. One of the best tutorial papers on machine learning ever written, in our opinion. Concerns unsupervised learning with linear parameterizations.
  • R. Neal (1993). Probabilistic Inference using Markov Chain Monte Carlo Methods. Widely cited review paper and still one of the more comprehensive treatments of sampling methods from a machine learning perspective. Written by Radford Neal as a tech report while he was a grad student, it surveyed the field as it existed and proposed a number of novel algorithms.
