Thursday, January 21, 2016

Moodle Outage Recap

On Friday 1/15 and Wednesday 1/20, the Moodle system (http://lms.manhattan.edu) experienced a major system issue.  As a result, the Moodle system was offline for troubleshooting from approximately 9AM to 4PM on both days.  The cause of the issue has since been identified and resolved.  As of 10AM on Thursday 1/21, the Moodle system is functioning normally and all class enrollments for Spring 2016 are online.

On Friday 1/15 around 9AM, ITS received a report of class enrollment issues for users accessing the Moodle system.  ITS promptly took the Moodle system offline for troubleshooting.  Troubleshooting lasted much of the day, and the Moodle system was restored from a backup taken at 2AM on Friday 1/15.  Please note that course work posted between 2AM and 9AM on Friday 1/15 may need to be reposted.  The restore process was completed by 4PM on Friday, and automatic class enrollments for new students were halted until Tuesday 1/19.

ITS closely monitored the Moodle system through Tuesday 1/19.  New class enrollments were re-enabled on 1/19 and no issues were experienced or reported.  New user creation was re-enabled on Tuesday evening.

On Wednesday 1/20, ITS became aware of issues with the Moodle system similar to those reported on Friday 1/15.  Again, the Moodle system was taken offline for troubleshooting.  The cause of the issue was identified and resolved without the need to restore from a backup.  The Moodle system was made available to the community by 4PM on Wednesday 1/20.

As of 10AM on Thursday 1/21, all automated processes to add users and course enrollments have been enabled and are functioning normally.

The cause of the issue in both cases was a configuration change applied to the SSO Authentication settings of Moodle that included a typographical error.  The configuration error caused unexpected behaviour in the automated processes that add new users and enrollments to the Moodle system.  The configuration error has been corrected, and ITS is working to implement tighter controls to prevent similar errors in the future.