Review of the recent Slicer package server failures

Jean-Christophe Fillion-Robin
Hi Slicers,

Starting Friday, March 3rd 2017, the nightly packages for both the Slicer application and its extensions downloaded from https://slicer.kitware.com and https://download.slicer.org were broken, with a file size of 0 bytes [1]. Additionally, developers reported that they couldn't manage the rights associated with collections [2].


I want to take the time to explain what happened. I recognize that this was a significant disruption to the productivity of everyone who relies on both the nightly builds and the server to organize data. We are implementing a number of changes to help prevent similar problems in the future.

On Friday afternoon, we quickly realized that the partition hosting the data asset store was full and planned to run the cleanup scripts used to remove all files associated with nightly packages older than the last stable Slicer release. We also conducted a pair debugging session to understand the second issue preventing collection administrators from managing access rights. Both issues were fixed on Monday.


[1] http://massmail.spl.harvard.edu/public-archives/slicer-devel/2017/041540.html
[2] https://github.com/midasplatform/Midas/issues/265


Midas and Slicer
============

In the coming weeks, we will iteratively define the transition plan to upgrade the platform powering https://slicer.kitware.com, which is used by the Slicer Extension Manager, the Slicer Data Store, Slicer's testing data server, and http://download.slicer.org. As already presented in the email “Slicer Data Management and Web Services: Contributions Welcomed!” sent in October 2016 [3], we will use the coming developer hangouts to formalize our plan to transition to more modern data management and web service technologies based on the Girder platform.

In the meantime, you can follow our plan on this wiki page:
   
    http://bit.ly/modernize-slicer-extensions-datastore-testingdata-packages-server


[3] http://slicer-devel.65872.n3.nabble.com/Slicer-Data-Management-and-Web-Services-Contributions-Welcomed-tt4037401.html



Issue #1: nightly packages of size 0
=========================

How did this happen ?
----------------------------

Our Nagios probes properly notified our sysadmins, once on Dec 28 2016 and again on Feb 2nd 2017. When we received the email from our sysadmins in early February, the asset store was 97.9% full, with 1444.08 of 1475.4 GB used. That said, since the execution of the cleanup scripts [4] described earlier wasn't automated, we forgot to run the cleanup. This explains the problem we observed.
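The fill level quoted above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names and the 90% threshold are assumptions made for illustration, not the actual Nagios configuration.

```python
def usage_percent(used_gb, total_gb):
    """Return how full a partition is, as a percentage."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return 100.0 * used_gb / total_gb


def needs_cleanup(used_gb, total_gb, threshold_percent=90.0):
    """True when usage reaches the (hypothetical) alert threshold."""
    return usage_percent(used_gb, total_gb) >= threshold_percent


# Figures reported for the asset store in early February 2017.
used_gb, total_gb = 1444.08, 1475.4
percent_full = usage_percent(used_gb, total_gb)
free_gb = round(total_gb - used_gb, 2)

print(f"{percent_full:.1f}% full, {free_gb} GB free")
print("cleanup needed:", needs_cleanup(used_gb, total_gb))
```

With the reported figures, this leaves roughly 31 GB free, which nightly uploads can plausibly exhaust within a few weeks.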


Remedy and Prevention ?
---------------------------------

After improving, validating, and running the cleanup scripts, a total of 890 GB was freed. This corresponds to ~200K extension nightly packages (257 GB) and 3159 Slicer nightly packages (633 GB).

Note that only nightly packages older than release 4.6.2 have been removed, and that the download stats and metadata associated with each package are still available.

To prevent similar issues, we updated the Slicer release checklist to make sure older packages will be removed [5]. We are also looking into automating the cleanup script execution.
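As a rough illustration of what an automated cleanup could select, here is a minimal Python sketch. It is not the script referenced in [4]: the Package record, the package names, the sizes, and the cutoff date are all made up for the example, and only the selection rule (nightly packages older than a stable release) comes from the text above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Package:
    """Hypothetical record for one nightly package file."""
    name: str
    build_date: date
    size_gb: float


def select_stale(packages, release_date):
    """Return the nightly packages built before the given release date."""
    return [p for p in packages if p.build_date < release_date]


def freed_gb(packages):
    """Total size that deleting these package files would free."""
    return sum(p.size_gb for p in packages)


# Made-up sample data; the cutoff is NOT the real 4.6.2 release date.
packages = [
    Package("nightly-a", date(2016, 9, 1), 0.2),
    Package("nightly-b", date(2016, 10, 15), 0.2),
    Package("nightly-c", date(2017, 2, 20), 0.2),
]
cutoff = date(2017, 1, 1)

stale = select_stale(packages, cutoff)
print([p.name for p in stale], round(freed_gb(stale), 1))
```

A scheduled job built around such a selection would only delete the package files; the download stats and metadata would stay in place, as they did in the manual cleanup.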


[4] https://gist.github.com/jcfr/ea9ef199bd5a3e071b8f
[5] https://www.slicer.org/wiki/Documentation/Nightly/Developers/ReleaseProcess#Midas



Issue #2: problem with managing collections
===================================

How did this happen ?
----------------------------

Earlier this year, our security team worked on addressing “cross site scripting” (XSS) issues associated with our different instances of the Midas platform. One of the patches added an extra parameter [5] to the getAll() function found in core/models/pdo/UserModel.php. Since the extra parameter, which has a default value, was added in the middle of the function's parameter list and not all calls to getAll() were updated, arguments were improperly passed and a UserDAO object ended up being passed in place of the offset argument. It is also worth noting that PHP functions only support positional arguments. See [6].

[5] https://github.com/Slicer/Midas/commit/9a2a3b81cb3abf00e1e7ab1212acee2f69ed103f#diff-a5af07e0955dcd177f2e4984f6cc4cff
[6] http://stackoverflow.com/questions/1342908/named-php-optional-arguments/1342933#1342933
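The failure mode described above can be shown with a small sketch. It is written in Python rather than PHP, and the signatures below are hypothetical stand-ins, not the real getAll() signature; only the mechanism (a defaulted parameter inserted mid-list while old positional calls stay unchanged) is taken from the description.

```python
class UserDAO:
    """Stand-in for the user object mentioned in the text."""

    def __init__(self, name):
        self.name = name


def get_all_before(limit=20, offset=0, current_user=None):
    """Hypothetical signature before the patch."""
    return {"limit": limit, "offset": offset, "current_user": current_user}


def get_all_after(limit=20, direction="asc", offset=0, current_user=None):
    """Hypothetical signature after the patch: 'direction' inserted mid-list."""
    return {
        "limit": limit,
        "direction": direction,
        "offset": offset,
        "current_user": current_user,
    }


admin = UserDAO("admin")

# A call site written against the old signature...
old_call = get_all_before(20, 0, admin)

# ...still runs against the new one, but every argument after 'limit' shifts:
# the offset lands in 'direction' and the UserDAO lands in 'offset'.
stale_call = get_all_after(20, 0, admin)

# The call site updated to match the new parameter order.
fixed_call = get_all_after(20, "asc", 0, admin)

print(type(stale_call["offset"]).__name__, stale_call["current_user"])
```

Because the new parameter has a default value, the stale call raises no error at the call site, which is why the regression went unnoticed. Appending new optional parameters at the end of the list, or updating every call site in the same patch, avoids the shift.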


Remedy and Prevention ?
---------------------------------

After identifying the regression, a patch was tested and deployed on midas3.kitware.com and slicer.kitware.com. See [7]. In the meantime, the patches addressing the XSS issues (originally applied locally) have also been moved to dedicated GitHub branches.

Moving forward, we will modernize the infrastructure by adopting the Girder platform, whose testing infrastructure has been carefully designed to run exhaustive test suites for both backend and frontend [8].


[7] https://github.com/Slicer/Midas/commit/fd959c06448ebc8e79914a037acb2075f9e9f846
[8] https://girder.readthedocs.io/en/latest/development.html



Thanks and have a good weekend,
Jc


_______________________________________________
slicer-devel mailing list
[hidden email]
http://massmail.spl.harvard.edu/mailman/listinfo/slicer-devel
To unsubscribe: send email to [hidden email] with unsubscribe as the subject
http://www.slicer.org/slicerWiki/index.php/Documentation/Nightly/Developers/FAQ

Re: Review of the recent Slicer package server failures

Steve Pieper-2
Thanks for the fix and the detailed explanation Jc.

Have a great weekend,
Steve

On Fri, Mar 10, 2017 at 5:37 PM, Jean-Christophe Fillion-Robin <[hidden email]> wrote:

