As some of you may know, as part of the move towards EPrints 3.2 we have entirely rewritten the way file storage works in EPrints (and after the rewrite we had fewer lines of source code!). We have implemented an abstract storage layer, the EPrints Storage Controller, which can use new “storage plug-ins” to store files in different places.
As a starting point, Tim Brody and I have just successfully tested an EPrints install that stores its content in Amazon S3/CloudFront! The capabilities these two services provide are outlined briefly below.
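To illustrate the idea of a storage layer that hands files to interchangeable plug-ins, here is a minimal Python sketch. It is not the EPrints code (EPrints itself is written in Perl). The class names `StoragePlugin`, `MemoryStorage` and `StorageController` and their `store`/`retrieve` methods are hypothetical, and the in-memory back end simply stands in for local disk, S3 or any other platform.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class StoragePlugin(ABC):
    """Hypothetical interface that every storage back end implements."""

    @abstractmethod
    def store(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, key: str) -> Optional[bytes]: ...


class MemoryStorage(StoragePlugin):
    """Toy back end that keeps files in a dict (a stand-in for disk, S3, ...)."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def retrieve(self, key: str) -> Optional[bytes]:
        return self._files.get(key)


class StorageController:
    """Hides which plug-ins hold a file behind a single store/retrieve API."""

    def __init__(self, plugins: Dict[str, StoragePlugin]) -> None:
        self.plugins = plugins

    def store(self, key: str, data: bytes, targets: List[str]) -> None:
        # Write the same file to every selected plug-in.
        for name in targets:
            self.plugins[name].store(key, data)

    def retrieve(self, key: str) -> Optional[bytes]:
        # Return the first copy found, trying plug-ins in registration order.
        for plugin in self.plugins.values():
            data = plugin.retrieve(key)
            if data is not None:
                return data
        return None
```

For example, `StorageController({"local": MemoryStorage(), "cloud": MemoryStorage()})` can write one PDF to both back ends with `targets=["local", "cloud"]`. The rest of the repository only ever talks to the controller, so adding a new back end does not change any calling code.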
Storage Layer features:
- Write your own plug-ins to marry EPrints with any chosen storage platform.
- Use multiple plug-ins simultaneously.
- Impose rules such that all volatile files (thumbnails, etc.) are stored locally, while the main file is stored in the cloud.
- Impose rules stating that any document submitted by a physicist on a Thursday is stored both locally and with two cloud storage providers. All rules are written in the existing EPC XML scripting language.
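The real rules are written in EPC XML, but the routing logic they express can be sketched in Python. In this hypothetical example each rule pairs a predicate with a list of plug-in names, and the first matching rule wins. The `StoredFile` fields, the plug-in names `local`, `cloud_a` and `cloud_b`, and the first-match semantics are all assumptions made for illustration, not the Storage Controller's actual behaviour.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Tuple


@dataclass
class StoredFile:
    """Hypothetical metadata that the rules inspect."""
    filename: str
    is_volatile: bool  # thumbnails, previews and other derived files
    depositor_dept: str
    deposit_date: date


# A rule pairs a predicate with the plug-ins that a matching file goes to.
Rule = Tuple[Callable[[StoredFile], bool], List[str]]

RULES: List[Rule] = [
    # Volatile derived files never leave the local disk.
    (lambda f: f.is_volatile, ["local"]),
    # A physicist's Thursday deposit (weekday() == 3): local plus two clouds.
    (lambda f: f.depositor_dept == "Physics" and f.deposit_date.weekday() == 3,
     ["local", "cloud_a", "cloud_b"]),
    # Everything else: the main file is stored in the cloud.
    (lambda f: True, ["cloud_a"]),
]


def select_plugins(f: StoredFile, rules: List[Rule] = RULES) -> List[str]:
    """Return the targets of the first rule whose predicate matches."""
    for predicate, targets in rules:
        if predicate(f):
            return targets
    return ["local"]  # fallback if no rule matches
```

Ordering matters with first-match rules: because the volatile-file rule comes first, a physicist's thumbnail deposited on a Thursday still stays local.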
What does cloud storage mean for your repository?
- Cloud storage means your own servers use less bandwidth to deliver resources to users.
- Cloud storage enables those who don’t have the money or space to host hundreds of terabytes to build and control a repository of that size.
- Services such as Amazon's CloudFront can handle the replication and worldwide distribution of your objects.
- Your users can download documents from a mirror on their own continent, an advanced feature enabled by the new Storage Controller.
- Using Amazon services also provides a way for users to pay for their own download bandwidth.
For the Amazon S3 plug-in we used a Storage Controller ruleset under which only full files were stored in Amazon S3, while all thumbnails and previews remained local. This reduces the cost of storing and retrieving your files.
Of course, the Amazon S3 plug-in is just one example of a storage plug-in; you could just as easily write your own.
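As a rough idea of what writing your own plug-in involves, here is a minimal Python sketch of a back end that keeps files under a directory on local disk. Real EPrints storage plug-ins are Perl modules with their own interface; the class name `LocalDiskStorage` and its `store`/`retrieve`/`delete` methods are hypothetical and only show the shape of the job: map a file key to a location, and write, read and remove bytes there.

```python
from pathlib import Path
from typing import Optional


class LocalDiskStorage:
    """Hypothetical plug-in that keeps each file under a root directory.

    The method names are illustrative, not the EPrints plug-in API.
    """

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        # Keys such as "eprint/42/paper.pdf" map onto nested directories;
        # reject keys (e.g. "../x") that would resolve outside the root.
        path = (self.root / key).resolve()
        if self.root.resolve() not in path.parents:
            raise ValueError(f"key escapes storage root: {key!r}")
        return path

    def store(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def retrieve(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        return path.read_bytes() if path.is_file() else None

    def delete(self, key: str) -> bool:
        # Return True if a file was removed, False if there was nothing to delete.
        path = self._path(key)
        if path.is_file():
            path.unlink()
            return True
        return False
```

A cloud plug-in would follow the same pattern, replacing the file-system calls with the provider's client library.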