With the immense growth in the volume of data being acquired and archived comes the challenge of processing it at scale. Many efforts are underway to meet this challenge by infusing cloud technologies into software infrastructure. In this session we cover the various approaches being taken to move toward scalable storage and auto-scaled processing. We will talk about porting applications to the cloud, container-based deployment models, and a hybrid science data processing system that uses both on-premise and remote compute resources to meet latency requirements while handling large volumes of data. Cloud-based infrastructure is being used for running data analytics stacks, automated workflows for reprocessing campaigns, forward keep-up processing, and much more. Several projects have invested in cloud technologies, including GRFN (Getting Ready for NISAR), PO.DAAC, and SWOT. We have explored running our software on several platforms, including Microsoft Azure, Google Cloud Platform, Amazon Web Services, high-performance computing clusters, and Kubernetes. We would like to shed some light on this work and the lessons learned in the process.
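As a rough illustration of the hybrid on-premise/cloud idea described above, the sketch below routes each job to on-premise or cloud compute based on its latency requirement and the current on-premise queue depth. All names and thresholds are hypothetical placeholders, not details of any project mentioned in this session.

```python
from dataclasses import dataclass

# Hypothetical job description: how quickly a product is needed and how big it is.
@dataclass
class Job:
    name: str
    max_latency_hours: float   # latency requirement for the product
    input_size_gb: float       # volume of input data to process

# Illustrative-only thresholds; a real system would tune these empirically.
ON_PREM_QUEUE_LIMIT = 100      # jobs the on-premise cluster can tolerate queued
URGENT_LATENCY_HOURS = 2.0     # below this, burst to auto-scaled cloud workers

def choose_target(job: Job, on_prem_queue_depth: int) -> str:
    """Pick a compute target for a job in a hybrid on-prem/cloud system."""
    if job.max_latency_hours < URGENT_LATENCY_HOURS:
        # Tight latency: the cloud can scale out immediately instead of queueing.
        return "cloud"
    if on_prem_queue_depth >= ON_PREM_QUEUE_LIMIT:
        # On-premise cluster saturated: overflow to the cloud to keep up.
        return "cloud"
    return "on-prem"

print(choose_target(Job("low-latency granule", 1.0, 5.0), 10))    # -> cloud
print(choose_target(Job("bulk reprocessing", 48.0, 200.0), 10))   # -> on-prem
```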
Presentations
- Cloud-based Data Processing and Workflow Systems – Namrata Malarout (Jet Propulsion Laboratory, California Institute of Technology)
- Steering the Ship: Making Sense of Multi Container Deployments with the Help of Kubernetes and AWS – Frank Greguska (Jet Propulsion Laboratory, California Institute of Technology)
Packaging applications into containers is an easy and effective mechanism for delivering software that is runnable, repeatable, and reliable. However, that is just the first step. Scientific applications that deal with big data tend to require parallelism and multiple focused applications, which makes it necessary to manage multi-container deployments spread across multiple machines. This talk explores one solution for deploying and managing such deployments in depth. The focus will be on Kubernetes, Amazon Web Services, and Apache SDAP as deployed for the NASA Sea Level Change Portal. You can look forward to a Kubernetes crash course followed by a detailed explanation of a production deployment of an Elastic Kubernetes Service (EKS) cluster.
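To give a flavor of what managing such a multi-container deployment looks like programmatically, here is a minimal sketch using the official Kubernetes Python client to create a small multi-replica Deployment. The image name, namespace, port, and replica count are hypothetical placeholders, not details of the actual SDAP or Sea Level Change Portal deployment.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g., one generated for an EKS cluster).
config.load_kube_config()

# Hypothetical web application: three replicas of a single-container pod.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="sdap-webapp"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "sdap-webapp"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "sdap-webapp"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="webapp",
                        image="example/sdap-webapp:latest",  # placeholder image
                        ports=[client.V1ContainerPort(container_port=8083)],
                    )
                ]
            ),
        ),
    ),
)

# Kubernetes schedules the replicas across the cluster's machines for us.
client.AppsV1Api().create_namespaced_deployment(namespace="sdap", body=deployment)
```

The point of the example is the division of labor: the manifest declares the desired state (three replicas of this container), and Kubernetes continuously reconciles the cluster toward it, which is what makes multi-machine, multi-container deployments tractable.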
- Cumulus Lessons Learned: Building, testing, and sharing a cloud archive – Patrick Quinn (NASA / EED-2 / Element 84)

Cumulus is a scalable, extensible, cloud-based archive system capable of ingesting, archiving, and distributing data from both existing on-premise sources and new cloud-native missions. As we have built and evolved the system with contributions from seven NASA EOSDIS organizations, we have learned several lessons about how to build a robust, broadly applicable, microservices-based cloud system for geospatial data, and we will share those lessons in this talk.
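As a hedged illustration of the kind of ingest step such a system performs, the sketch below streams a granule from an HTTP-accessible on-premise source into an S3 bucket with boto3. The URL, bucket, and key scheme are invented for illustration and do not reflect Cumulus's actual task interfaces.

```python
import urllib.request

import boto3

# Hypothetical locations: a granule served from an on-premise data source
# and a cloud archive bucket. Placeholders only, not real Cumulus configuration.
SOURCE_URL = "https://data.example.org/granules/GRANULE_001.h5"
ARCHIVE_BUCKET = "example-cloud-archive"
ARCHIVE_KEY = "granules/GRANULE_001.h5"

def ingest_granule(source_url: str, bucket: str, key: str) -> None:
    """Stream a granule from an on-premise HTTP source into S3 object storage."""
    s3 = boto3.client("s3")
    with urllib.request.urlopen(source_url) as response:
        # upload_fileobj streams the body in chunks, so large granules
        # never need to fit in memory on the ingest worker.
        s3.upload_fileobj(response, bucket, key)

if __name__ == "__main__":
    ingest_granule(SOURCE_URL, ARCHIVE_BUCKET, ARCHIVE_KEY)
```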
Session recording is here.