Data to Action: Increasing the Use and Value of Earth Science Data and Information

For 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public.

The ESIP Summer Meeting has already taken place, but check out the ESIP Summer Meeting Highlights Webinar: https://youtu.be/vbA8CuQz9Rk.
Ballrm D
Tuesday, July 16
 

10:15am PDT

Cloud 101: How Do I Get Started In Cloud Computing Workshop
This workshop is structured to give Earth scientists and practitioners hands-on experience with current cloud computing resources, related tools, and available machine learning services. Participants should bring their own computers and plan to work through a use case and complete some data analysis on the cloud.
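As a hedged sketch of the kind of task covered in the agenda below (not the workshop's actual exercise; the bucket and object names are hypothetical), here is how a participant might list and download data from a public S3 bucket with the AWS SDK for Python (boto3):

```python
# Minimal sketch: anonymous access to a public S3 bucket with boto3.
# Bucket and key names are placeholders for illustration only.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects under a prefix.
resp = s3.list_objects_v2(Bucket="example-earth-data", Prefix="landsat/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download one file locally for analysis.
s3.download_file("example-earth-data", "landsat/scene.tif", "scene.tif")
```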

10:15 Introduction
  • Why and when should we use the cloud? 
  • Who is / are AWS? 
  • How do we use the cloud?
10:35  Storing data in the cloud
  • What are the three primary ways of talking to the cloud? 
  • What are the main activities supported by cloud consoles?
11:05  Doing computations in the cloud
  • What can we use a cloud machine for?

View Session Recording on YouTube.

Session Take-Aways
  1. Cloud computing is very powerful, yet complicated to set up even for people who are familiar with the tools. The workshop provides insight into the world of cloud computing, with lots of useful concepts and vocabulary.
  2. We need to consider security issues when setting up VMs, particularly when dealing with controlled networks such as those at government institutions.
  3. Amazon Machine Images (AMIs) can be a useful tool for reproducible instances in AWS. An AMI captures a snapshot of an instance at a point in time so that its configuration can be replicated (see the sketch below).
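A hedged illustration of take-away 3 (not part of the workshop materials; the instance ID, image name, and instance type are placeholders): creating an AMI from a running EC2 instance with boto3 so that the exact analysis environment can be relaunched later.

```python
# Snapshot a running EC2 instance as an AMI, then relaunch from it.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

resp = ec2.create_image(
    InstanceId="i-0123456789abcdef0",            # placeholder instance
    Name="cloud101-analysis-env-2019-07-16",
    Description="Snapshot of the workshop analysis environment",
    NoReboot=True,                                # snapshot without stopping the VM
)
print("New AMI:", resp["ImageId"])

# Later, launch an identical instance from the saved image.
ec2.run_instances(ImageId=resp["ImageId"], InstanceType="t3.medium",
                  MinCount=1, MaxCount=1)
```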






Speakers

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Mike Little

CISTO, NASA
Computational Technology to support scientific investigations


Tuesday July 16, 2019 10:15am - 11:45am PDT
Ballrm D
  Ballrm D, Workshop

12:45pm PDT

Using Pangeo JupyterHubs to work with large public datasets
Bring your laptop to this hands-on workshop! Participants will learn about the open-source scientific Python ecosystem for analytic workflows with big data in Earth Science. Pangeo is first and foremost a community promoting open, reproducible, and scalable science (read more at https://pangeo.io). This community provides documentation, develops and maintains software, and deploys computing infrastructure to make scientific research and programming easier. The Pangeo software ecosystem involves open-source tools such as xarray, iris, dask, jupyter, and many other packages. In this brief workshop, participants will familiarize themselves with writing code in Jupyter Notebooks that can be run on scalable computing clusters running on the Cloud, bypassing a common bottleneck of downloading ever-increasing volumes of remote sensing or modeling data. We will introduce key Python tools and have participants write simple code to work with large public datasets hosted on Amazon Web Services and Google Cloud.
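As a hedged sketch of the workflow pattern this workshop teaches (the bucket, store path, and variable name are hypothetical, not the workshop's actual dataset), opening a public Zarr dataset on cloud object storage with xarray and dask looks roughly like this:

```python
# Lazily open a public Zarr store on S3 and compute a reduction with dask.
import xarray as xr
import s3fs

fs = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map(root="my-public-bucket/sea-surface-temp.zarr", s3=fs)  # placeholder path

ds = xr.open_zarr(store, consolidated=True)           # loads metadata only
monthly_mean = ds["sst"].resample(time="1M").mean()   # builds a dask graph, no data moved yet
result = monthly_mean.compute()                        # triggers parallel computation
```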

Agenda
12:45 - 12:55 Quick introduction to Pangeo (http://bit.ly/esip-slides)
12:55 - 1:25 Introductory notebooks for jupyter, xarray, dask on Google Binder
1:25 - 1:45 Landsat-8 demo on AWS Binder
1:45 - 2:15 Time for participant experimentation and questions

View Session Recording on YouTube.

Speakers

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Scott Henderson

Research Scientist, University of Washington



Tuesday July 16, 2019 12:45pm - 2:15pm PDT
Ballrm D
  Ballrm D, Workshop

2:45pm PDT

Hands on with Jetstream Atmosphere Part I
Hands on with the Atmosphere GUI on the Jetstream cloud

This tutorial will first give an overview of Jetstream, the National Science Foundation's first production research and education cloud, and various aspects of the system. Then we will take attendees through the basics of using Jetstream via the Atmosphere web interface. This will include a guided walk-through of the interface itself, the features provided, the image catalog, launching and using virtual machines on Jetstream, using volume-based storage, and best practices.

We are targeting users of every experience level: Atmosphere is well-suited to both HPC novices and advanced users. This tutorial is aimed primarily at those unfamiliar with cloud computing who generally do their computation on laptops or departmental server resources. While we will not cover advanced topics in this particular tutorial, we will touch on the available advanced capabilities during the initial overview.

Attendees will need to bring a laptop with a modern web browser (Firefox, Chrome, or Safari).
----

Jetstream is a user-friendly cloud computing environment for researchers based on Atmosphere and OpenStack. It is designed to provide configurable cyberinfrastructure that gives researchers access to interactive computing and data analysis resources on demand, whenever and wherever they want to analyze their data. For a more in-depth description, please see the System Overview: http://wiki.jetstream-cloud.org/System+Overview
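For context, here is a hedged sketch of the programmatic equivalent of what the Atmosphere web interface does behind the scenes (Jetstream is built on OpenStack). The tutorial itself uses the GUI, and the cloud name, image, flavor, network, and keypair below are placeholders:

```python
# Launch a VM on an OpenStack cloud with openstacksdk (illustrative only).
import openstack

conn = openstack.connect(cloud="jetstream")   # credentials read from clouds.yaml

image = conn.compute.find_image("Ubuntu 18.04 Devel and Docker")   # placeholder image name
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("my-project-net")

server = conn.compute.create_server(
    name="esip-tutorial-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
    key_name="my-keypair",
)
server = conn.compute.wait_for_server(server)   # block until the instance is ACTIVE
print(server.name, server.status)
```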

Session recording is here, but unfortunately, there is no audio in the recording. For best results, see the webinar recorded here, which provides much of the same information.

Speakers

Jeremy Fischer

Manager, Jetstream Cloud, Jetstream - Indiana University
Cloud computing for research and education!


Tuesday July 16, 2019 2:45pm - 4:15pm PDT
Ballrm D
  Ballrm D, Workshop

4:30pm PDT

Hands on with Jetstream Atmosphere Part II
Hands on with the Atmosphere GUI on the Jetstream cloud

This tutorial will first give an overview of Jetstream, the National Science Foundation's first production research and education cloud, and various aspects of the system. Then we will take attendees through the basics of using Jetstream via the Atmosphere web interface. This will include a guided walk-through of the interface itself, the features provided, the image catalog, launching and using virtual machines on Jetstream, using volume-based storage, and best practices.

We are targeting users of every experience level: Atmosphere is well-suited to both HPC novices and advanced users. This tutorial is aimed primarily at those unfamiliar with cloud computing who generally do their computation on laptops or departmental server resources. While we will not cover advanced topics in this particular tutorial, we will touch on the available advanced capabilities during the initial overview.

Attendees will need to bring a laptop with a modern web browser (Firefox, Chrome, or Safari).
----

Jetstream is a user-friendly cloud computing environment for researchers based on Atmosphere and OpenStack. It is designed to provide configurable cyberinfrastructure that gives researchers access to interactive computing and data analysis resources on demand, whenever and wherever they want to analyze their data. For a more in-depth description, please see the System Overview: http://wiki.jetstream-cloud.org/System+Overview

Session recording is here.

Speakers

Jeremy Fischer

Manager, Jetstream Cloud, Jetstream - Indiana University
Cloud computing for research and education!


Tuesday July 16, 2019 4:30pm - 6:00pm PDT
Ballrm D
  Ballrm D, Workshop
 
Wednesday, July 17
 

10:30am PDT

Getting your data into the cloud: How to deploy and use Cumulus
This session will be an interactive walkthrough of how to deploy the open-source Cumulus tool for getting your data into the cloud and a live demo of using Cumulus to ingest a new set of science data into the cloud.
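As a hedged sketch of interacting with a deployed Cumulus stack (not the session's demo; the API root URL and token are placeholders, and the endpoints shown follow the public Cumulus API documentation but should be verified against your own deployment):

```python
# Query a Cumulus archive API to see what has been ingested (illustrative only).
import requests

CUMULUS_API = "https://example.execute-api.us-east-1.amazonaws.com/dev"  # placeholder API root
TOKEN = "EDL_BEARER_TOKEN"  # placeholder token obtained via Earthdata Login

headers = {"Authorization": f"Bearer {TOKEN}"}

collections = requests.get(f"{CUMULUS_API}/collections", headers=headers).json()
granules = requests.get(f"{CUMULUS_API}/granules", headers=headers,
                        params={"limit": 5}).json()

for g in granules.get("results", []):
    print(g.get("granuleId"), g.get("status"))
```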

Presenter: Mark Boyd
Presentation Title: An Introduction to Cumulus
Slides: https://doi.org/10.6084/m9.figshare.8947106
Session recording is here.

Moderators

Mark Boyd

Engineer

Wednesday July 17, 2019 10:30am - 12:00pm PDT
Ballrm D
  Ballrm D, Workshop

1:30pm PDT

Cloud Engineering in Practice
With the immense increase in the volume of data acquisition and archival comes the challenge of intensive data processing that we are all trying to solve. There are many efforts underway to achieve this by infusing cloud technologies into software infrastructure. In this session we would like to cover the various approaches being taken to move towards scalable storage and auto-scaled processing. We will talk about porting applications to the cloud, container-based deployment models, and a hybrid science data processing system. This data system utilizes both on-premises and remote compute resources to meet latency requirements while handling large volumes of data. Cloud-based infrastructure is being used for running data analytic stacks, automated workflows for reprocessing campaigns, forward keep-up processing, and much more. Several projects, such as GRFN (Getting Ready for NISAR), PO.DAAC, and SWOT, have invested in cloud technologies. We have explored running our software on several platforms, including Azure, Google Cloud Platform, Amazon Web Services, high-performance computing systems, and Kubernetes. We would like to shed some light on this work and the lessons learned in the process.
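As a hedged sketch of the kind of multi-container operations discussed in this session (not the presenters' actual deployment; the namespace and deployment names are hypothetical), the official Kubernetes Python client can inspect and scale a deployment like this:

```python
# Inspect and scale deployments in a cluster with the official Kubernetes client.
from kubernetes import client, config

config.load_kube_config()          # reads ~/.kube/config (e.g. an EKS context)
apps = client.AppsV1Api()

# List the deployments that make up the data-processing stack (placeholder namespace).
for dep in apps.list_namespaced_deployment(namespace="sdap").items:
    ready = dep.status.ready_replicas or 0
    print(f"{dep.metadata.name}: {ready}/{dep.spec.replicas} replicas ready")

# Scale one component up, e.g. ahead of a reprocessing campaign (placeholder name).
apps.patch_namespaced_deployment_scale(
    name="analysis-worker", namespace="sdap",
    body={"spec": {"replicas": 10}},
)
```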

Presentations
  • Cloud-based Data Processing and Workflow Systems – Namrata Malarout (Jet Propulsion Laboratory, California Institute of Technology)
  • Steering the Ship: Making Sense of Multi Container Deployments with the Help of Kubernetes and AWS – Frank Greguska (Jet Propulsion Laboratory, California Institute of Technology)
    Packaging applications into containers is an easy and effective mechanism for delivering software that is runnable, repeatable, and reliable. However, that is just the first step. Scientific applications that deal with big data tend to require parallelism and multiple focused applications. In this case, it is necessary to manage multi-container deployments spread across multiple machines. In this talk, one solution for deploying and managing such multi-container systems will be explored in depth. The focus will be on Kubernetes, Amazon Web Services, and Apache SDAP as deployed for the NASA Sea Level Change Portal. You can look forward to a Kubernetes crash course followed by a detailed explanation of a production deployment of an Elastic Kubernetes Service (EKS) cluster.
  • Cumulus Lessons Learned: Building, testing, and sharing a cloud archive – Patrick Quinn (NASA / EED-2 / Element 84)
    Cumulus is a scalable, extensible cloud-based archive system which is capable of ingesting, archiving, and distributing data from both existing on-prem sources and new cloud-native missions. As we have built and evolved the system with contributions from seven NASA EOSDIS organizations, we have learned several lessons about how to build a robust, broadly-applicable, microservices-based cloud system for geospatial data which we will share in this talk.

Session recording is here.

Speakers

Patrick Quinn

Software Engineer, Element 84

Frank Greguska

Scientific Applications Software Engineer, Jet Propulsion Laboratory, California Institute of Technology

Namrata Malarout

Scientific Applications Software Engineer, NASA / JPL


Wednesday July 17, 2019 1:30pm - 3:00pm PDT
Ballrm D
  Ballrm D, Breakout

3:30pm PDT

Scalable, data-proximate cloud computing for Earth Science research
Data-intensive scientific workflows are at a pivotal time in which traditional local computing resources are no longer capable of meeting the storage or computing demands of scientists. In the Earth Sciences, we are facing an explosion of data volumes sourced from models, in-situ observations, and remote sensing platforms. Some agencies are starting to move data to commercial Cloud providers to facilitate access (e.g. NASA on Amazon Web Services). Fully leveraging these opportunities will require new approaches in the way the scientific community handles data access, processing, and analysis. In particular, we need to stop downloading data and start uploading algorithms to wherever large archives reside. This session is targeted at researchers who are pioneering such "data-proximate" computing on commercial Cloud infrastructure. We hope to hear current success stories, as well as failures, and identify ways to improve existing workflows.
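As a hedged sketch of the data-proximate pattern (the scheduler address, bucket path, and variable name are placeholders), the idea is to send the computation to a Dask cluster running next to the archive and bring back only the reduced result:

```python
# "Upload the algorithm, not download the data": run the heavy reduction on a
# Dask cluster deployed in the same cloud region as the archive.
import xarray as xr
import s3fs
from dask.distributed import Client

# Placeholder address of a scheduler already running next to the data.
client = Client("tcp://dask-scheduler.my-hub.example.org:8786")

fs = s3fs.S3FileSystem(anon=True)
ds = xr.open_zarr(s3fs.S3Map("archive-bucket/model-output.zarr", s3=fs))

# The reduction executes on the remote workers; only the small result returns.
global_mean = ds["tas"].mean(dim=("lat", "lon")).compute()
print(global_mean)
```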

Agenda
  • 3:30 - 3:35 Scott Henderson (eScience Institute) Introduction to the session - slides: http://bit.ly/2YhbWnr
  • 3:35 - 3:55 Aimee Barciauskas (Development Seed): The Multi-Mission Algorithm and Analysis Platform (MAAP)
    Slides: https://doi.org/10.6084/m9.figshare.8942108
  • 3:55 - 4:15 Aji John (University of Washington) - Analyzing satellite imagery on the Cloud to understand wildflower phenology at Mt Rainier
  • 4:15 - 4:35 Julien Chastang (UCAR/unidata) - Deploying a Unidata JupyterHub on the NSF Jetstream Cloud, Lessons Learned and Challenges Going Forward
    Slides: https://doi.org/10.6084/m9.figshare.8944964
  • 4:35 - 4:55 Rich Signell (USGS): Using the Pangeo ecosystem for model analysis and visualization
    Slides: https://doi.org/10.6084/m9.figshare.9115229
  • 4:55 - 5:00   Wrapup discussion 

Session recording is here.

Session Take-Aways
  1. A current challenge for cloud-based workflows is that datasets from different agencies are in different formats, in different regions, and often have similar but slightly different access APIs.
  2. Platforms such as MAAP and Pangeo are very promising and exciting. They bring the benefits of scalable computing to datasets stored in the cloud.
  3. The cost model for scalable cloud computing is unclear: how to support these platforms into the future and how to regulate user access to cluster resources remain open questions.


Speakers

Aji John

University of Washington

Rich Signell

Research Oceanographer, USGS

Julien Chastang

Software Engineer, UCAR - Unidata
Scientific software developer at UCAR-Unidata.

Scott Henderson

Research Scientist, University of Washington

Aimee Barciauskas

Tech Lead / Engineer, Development Seed


Wednesday July 17, 2019 3:30pm - 5:00pm PDT
Ballrm D
  Ballrm D, Breakout
 
Thursday, July 18
 

10:30am PDT

Current Approaches for Tracking and Exposing Research Object Usage Metrics
Many publishers and funders have implemented open data policies in efforts to make research more transparent and re-usable. These policies also aim to support data, software, and other research objects as valuable output of the research process. To begin to assess impact and give credit to researchers for sharing research objects, however, the community needs to take additional steps to promote standardized measurement of research object usage and proper citation. This means different things for different stakeholders: researchers need to be informed on how and why research object citations should be included in articles and other publications, publishers need to promote and index research object citations, repositories need to standardize and display research object usage information, and institutions need to value these metrics.

Several stakeholders have begun improving capabilities for tracking and exposing research object usage metrics. For example, Make Data Count highlights the value of research data by providing the infrastructure for repositories to display data usage and citation metrics. The project has worked with COUNTER to develop a Code of Practice to enable standardization and has also developed mechanisms for repositories to expose data usage metrics, including implementation examples from California Digital Library, the Arctic Data Center, and DataONE. In this session, we will hear 1) how repositories are currently tracking research object citations, and 2) how the Make Data Count project and other efforts can help these repositories standardize their reporting approach to support accurate representation of the value of research objects.
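As a hedged sketch of the kind of reporting this session discusses, a repository or researcher could pull usage and citation counts for a dataset DOI from the DataCite REST API, which Make Data Count feeds; the DOI below is a placeholder and the attribute names should be verified against the current API documentation:

```python
# Fetch usage/citation metrics for a dataset DOI from the DataCite REST API.
import requests

doi = "10.5061/dryad.example"   # placeholder dataset DOI
resp = requests.get(f"https://api.datacite.org/dois/{doi}")
resp.raise_for_status()

attrs = resp.json()["data"]["attributes"]
# Attribute names assumed from DataCite's Make Data Count integration.
for field in ("viewCount", "downloadCount", "citationCount"):
    print(field, attrs.get(field))
```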

Session recording here.

Moderators

Amber Budden

Director for Community Engagement and Outreach, DataONE

Robert R. Downs

Sr. Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia...

Matt Jones

Director of Informatics R&D, NCEAS / DataONE / UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Cyberinfrastructure

Madison Langseth

Science Data Manager, U.S. Geological Survey
Madison develops tools and workflows to make the USGS data release process more efficient for researchers and data managers. She also promotes data management best practices through the USGS’s Community for Data Integration Data Management Working Group and the USGS Data Management...

Dave Vieglais

Research Professor, University of Kansas

Speakers

Jessica Hausman

NASA HQ / ASRC Federal



Thursday July 18, 2019 10:30am - 12:00pm PDT
Ballrm D
  Ballrm D, Breakout

1:30pm PDT

How to build your data "groups" for optimal discovery?
How does your Earth Science community define a collection that is discoverable in catalogs AND yet can be simply understood by humans? Many areas need to be evaluated, such as definitions, elements, vocabularies and more...oh, my! Help us create a cheat sheet for the data management community while having some fun.

Whether you come to our session or not, please help us by filling in the blanks for the following statement in Slido.
I work with _______ datatype and it is aggregated by ___________. An example answer would be: sonar data; single cruise + instrument type.

https://app.sli.do/event/dttuqvzw
or
Go to https://www.slido.com and enter the code #C690

Session recording here.

Speakers

Anna Milan

Metadata Standards Lead, NOAA NCEI
~*~Metadata Adds Meaning~*~

Heather Brown

Archive Data Management Specialist, Riverside for NESDIS/NCEI



Thursday July 18, 2019 1:30pm - 3:00pm PDT
Ballrm D
  Ballrm D, Working Session
 
Friday, July 19
 

10:00am PDT

Current Status in Cloud Data Access
Cloud computing holds the promise of novel data analysis capabilities for geoscientists by providing affordable on-demand computing system resources. One of the major differences from traditional computing systems is web-based object storage, which requires new data access methods with a different set of performance parameters.

The aim of this session is to provide the ESIP community with an opportunity to learn about the current capabilities for accessing data in cloud object stores. The emphasis will be on the actual software, data servers or libraries, which are capable of accessing cloud object stores, performance issues and bottlenecks, and best practices that can be adopted when migrating data to the cloud. When considering end-user applications, this session is about how those tools access data from the novel data storage systems available with cloud computing and not about the algorithms, etc., associated with data visualization, analytics, or machine learning.
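As a hedged sketch of one access pattern in this space (the bucket, file, and variable names are hypothetical), reading an HDF5/netCDF file directly from object storage via ranged requests, rather than downloading the whole file first, looks roughly like this:

```python
# Read a netCDF/HDF5 granule straight from S3 through a file-like object.
# Each chunk read becomes a ranged GET against the object store, so chunk
# layout and request size dominate performance.
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)

with fs.open("example-archive/granule.nc", "rb") as f:   # placeholder object
    ds = xr.open_dataset(f, engine="h5netcdf")
    subset = ds["temperature"].isel(time=0).load()        # placeholder variable
    print(subset)
```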

Session recording here.

Moderators

James Gallagher

President, OPeNDAP

Aleksandar Jelenak

The HDF Group

Friday July 19, 2019 10:00am - 11:30am PDT
Ballrm D
  Ballrm D, Breakout

11:45am PDT

Beyond the cookbook: Connecting workflows, data and people for sustainable interdisciplinary Earth Sciences
This interactive workshop intends to add to the Throughput cookbook by having participants work through and annotate workflows. Additionally, participants will use the API to explore the networks already built and ascertain what additional tools are needed to make sense of it all.


Session recording here.

Friday July 19, 2019 11:45am - 1:15pm PDT
Ballrm D
  Ballrm D, Workshop
 

