Data to Action: Increasing the Use and Value of Earth Science Data and Information
For 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public.

The ESIP Summer Meeting has already taken place, but check out the ESIP Summer Meeting Highlights Webinar: https://youtu.be/vbA8CuQz9Rk.
Thursday, July 18 • 10:30am - 12:00pm
Challenges and Opportunities in Adopting Cloud technologies for Data Intensive Science

The amount of data generated by public and private sector organizations has increased manyfold in the last decade. In recent years, consumers and providers of data have faced the growing challenge of managing the quantity and quality of the information produced. The advent of cloud technologies has been a boon for the big data era, offering a solution to information overload. While cloud technologies present an excellent opportunity, the challenges and opportunities of using them are still being explored. The complex business and infrastructure aspects of the cloud paradigm, together with rapid technical change, have made transitions complex and at times confusing. In this session, we hope to share case studies of migrating to and using cloud technologies for data-intensive science. We hope the challenges and opportunities revealed by those case studies will inform stakeholders, collaborators, and other interested parties, and that the lessons learned will guide future work and help expedite progress in the field of Earth Science informatics.

Developing Applications Using Earth Science Data in the AWS Cloud with PODPAC

Matt Ueckermann
Observational and modeled data products from NASA encompass petabytes of scientific data available for analysis, analytics, and exploitation. Unfortunately, these data sets are highly underutilized by the scientific community due to: (1) vast computational resource requirements; (2) disparate formats, projections, and resolutions that hinder data fusion and integrated analyses across different data sets; (3) complex and disjoint data access and retrieval protocols; and (4) task-specific and non-reusable code development processes that hinder algorithm sharing and collaboration. In response, NASA EOSDIS is actively investigating migration of its vast data archives to storage on commercial cloud services such as Amazon Web Services (AWS). However, to maximize the benefit of cloud-based data storage, cloud-based data analysis and analytics are needed to process data “close” to where it is stored. Recognizing that migrating workflows to the cloud requires a high degree of cloud computing expertise, we are developing the Pipeline for Observational Data Analysis and Collaboration (PODPAC). PODPAC is a Python library designed to automatically harmonize disparate data sources, seamlessly access NASA Earth science data, and analyze data in the AWS cloud. PODPAC is built around the tools of the Python data ecosystem (NumPy, SciPy, xarray) and aims to bridge the gap between data sources, analysis, and the cloud. In this talk, we will introduce PODPAC and demonstrate on-demand cloud computation of a value-added derived product using NASA data.
Opportunities for Accelerating Science in the Cloud
Christopher Lynnes
As the data holdings of the Earth Observing System Data and Information System (EOSDIS) expand over the next several years, the typical data analysis process of downloading data to local compute resources will become increasingly inefficient. However, cloud computing promises to mitigate this by allowing the user to process close to the data. These improvements will be obtained via a variety of mechanisms: (1) improving the ability of data transformation services to reduce the data prior to analysis; (2) providing cloud-native analysis capabilities for common analysis functions; and (3) providing the ability to work directly with data in Web Object Storage.

The role of data stewards in a cloud-based platform
Amanda Leon

Google Earth Engine has a growing user community as a cloud-based platform for analysis and visualization of geospatial data. This adoption is heavily driven by the ease of access Earth Engine’s Data Catalog provides to a wealth of satellite imagery and other geospatial data.  As stewards of NASA EOSDIS data, Distributed Active Archive Centers (DAACs) can play a key role in supporting and maximizing the utility of Earth Engine for the scientific community.  The NSIDC DAAC has been assessing various data stewardship topics to support the sustainment and expansion of NASA EOSDIS data in Google Earth Engine including: 1) data inclusion decisions based on science use cases; 2) optimized workflows for preparing 

Open Source Data-Intensive Platform for the Cloud
Thomas Huang
JPL has a long history of building innovative solutions for onboard instruments, ground operations and data systems, and archive and distribution for our missions. As the rate of data generated by our missions continues to increase and is expected to rise significantly in the near future, JPL is investing in reusable data-intensive technologies for mission operations and to enable science. This talk discusses the open source solutions we have developed for the cloud platform to address three challenges from our growing collections of scientific data: interactive analysis, in situ match-up, and search relevancy, and their applications.

Developing a roadmap for cloud services
Suresh Vannan

The Physical Oceanography Distributed Active Archive Center (PO.DAAC) will be the data repository for the Surface Water and Ocean Topography (SWOT) mission. SWOT brings new challenges and opportunities to PO.DAAC: a large data volume (20 TB/day) and a new community of users (hydrologists). This presentation will show how PO.DAAC plans to address them. PO.DAAC first assessed what tools and services current and new users will need to discover, access and utilize SWOT data. This analysis informed a roadmap that shows which services PO.DAAC (and ESDIS) will migrate and/or develop in a cloud-based environment for the user community.

Leveraging an interoperable scalable data platform to support Earth Observation Data
Sudhir Raj Shrestha (sshrestha@esri.com)
The ever-increasing wealth of scientific data produced from various sources and platforms, including Earth observations, models and forecasts, brings exciting and challenging opportunities to exploit such vast amounts of data to produce valuable information products. These data are widely used by government agencies like NOAA, NASA and USGS and by private industry for monitoring and analysis of measurements associated with physical, chemical and biological phenomena across Earth’s oceans, atmosphere and land masses. The volume, diversity, and complexity of multidimensional Earth science data have posed challenges in the past for how the data are shared with a diverse community, visualized intuitively, and integrated to answer scientific questions. With advances in geospatial science and technology, these data and analytics can now advantageously be hosted in the cloud. This will have a tremendous impact on how scientists, policy makers, and the public ingest, manage, analyze, visualize, and share complex scientific data. GIS software is evolving in step with the technology industry to help meet these challenges. In this presentation, I will briefly discuss how current technology trends are driving more scalable, interoperable and format-agnostic capabilities. We will share how the ArcGIS platform supports this “Open Science” and share use cases in place at NOAA and NASA. We will also share recent advancements in the cloud, spatial machine learning and geospatial data science that support various domains of science applications.

Session recording here.

Sudhir Raj Shrestha

Solution Engineer Researcher, Esri
Solution Engineer and Scientific Data enthusiast with keen interest in making data easily Discoverable and Interoperable. Passionate about geospatially driven Hydrological Modeling and Heuristic Soil Modeling and develop, implement new and innovative geospatial methods, techniques...
Amanda Leon

DAAC Manager, NASA National Snow and Ice Data Center DAAC
Thomas Huang

Technical Group Supervisor, JPL
Christopher Lynnes

Researcher, Self
Christopher Lynnes recently retired from NASA as System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He worked on EOSDIS for 30 years, over which time he worked on multiple generations of data archive systems, search engines and interfaces...
Suresh Vannan

Project Manager, NASA/Caltech Jet Propulsion Laboratory

Thursday July 18, 2019 10:30am - 12:00pm PDT
Ballrm BC
  Ballrm BC, Breakout