Data to Action: Increasing the Use and Value of Earth Science Data and Information

For 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public.

The ESIP Summer Meeting has already taken place, but check out the ESIP Summer Meeting Highlights Webinar: https://youtu.be/vbA8CuQz9Rk.
Ballrm D
Tuesday, July 16
 

10:15am PDT

Cloud 101: How Do I Get Started In Cloud Computing Workshop
This workshop is structured to give Earth scientists and practitioners hands-on experience with current cloud computing resources, related tools, and available machine learning services. Participants should bring their own computers and plan to work through a use case and complete some data analysis on the cloud.
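As a hedged sketch of the kind of task covered in the agenda below (not the workshop's actual exercise; the bucket and object names are hypothetical), here is how a participant might list and download data from a public S3 bucket with the AWS SDK for Python (boto3):

```python
# Minimal sketch: anonymous access to a public S3 bucket with boto3.
# Bucket and key names are placeholders for illustration only.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects under a prefix.
resp = s3.list_objects_v2(Bucket="example-earth-data", Prefix="landsat/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download one file locally for analysis.
s3.download_file("example-earth-data", "landsat/scene.tif", "scene.tif")
```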

10:15 Introduction
  • Why and when should we use the cloud? 
  • Who is / are AWS? 
  • How do we use the cloud?
10:35  Storing data in the cloud
  • What are the three primary ways of talking to the cloud? 
  • What are the main activities supported by cloud consoles?
11:05  Doing computations in the cloud
  • What can we use a cloud machine for?

View Session Recording on YouTube.

Session Take-Aways
  1. Cloud computing is very powerful, yet complicated to set up even for people who are familiar with the tools. The workshop provides insight into the world of cloud computing, with lots of useful concepts and vocabulary.
  2. We need to consider security issues when setting up VMs, particularly when dealing with controlled networks such as those at government institutions.
  3. Amazon Machine Images (AMIs) can be a useful tool for reproducible instances in AWS. An AMI captures a snapshot of an instance at a point in time so that its configuration can be replicated (see the sketch below).
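A hedged illustration of take-away 3 (not part of the workshop materials; the instance ID, image name, and instance type are placeholders): creating an AMI from a running EC2 instance with boto3 so that the exact analysis environment can be relaunched later.

```python
# Snapshot a running EC2 instance as an AMI, then relaunch from it.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

resp = ec2.create_image(
    InstanceId="i-0123456789abcdef0",            # placeholder instance
    Name="cloud101-analysis-env-2019-07-16",
    Description="Snapshot of the workshop analysis environment",
    NoReboot=True,                                # snapshot without stopping the VM
)
print("New AMI:", resp["ImageId"])

# Later, launch an identical instance from the saved image.
ec2.run_instances(ImageId=resp["ImageId"], InstanceType="t3.medium",
                  MinCount=1, MaxCount=1)
```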






Speakers

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Mike Little

CISTO, NASA
Computational Technology to support scientific investigations


Tuesday July 16, 2019 10:15am - 11:45am PDT
Ballrm D
  Ballrm D, Workshop

12:45pm PDT

Using Pangeo JupyterHubs to work with large public datasets
Bring your laptop to this hands-on workshop! Participants will learn about the open-source scientific Python ecosystem for analytic workflows with big data in Earth Science. Pangeo is first and foremost a community promoting open, reproducible, and scalable science (read more at https://pangeo.io). This community provides documentation, develops and maintains software, and deploys computing infrastructure to make scientific research and programming easier. The Pangeo software ecosystem involves open-source tools such as xarray, iris, dask, jupyter, and many other packages. In this brief workshop, participants will familiarize themselves with writing code in Jupyter Notebooks that can be run on scalable computing clusters running on the Cloud, bypassing a common bottleneck of downloading ever-increasing volumes of remote sensing or modeling data. We will introduce key Python tools and have participants write simple code to work with large public datasets hosted on Amazon Web Services and Google Cloud.
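As a hedged sketch of the workflow pattern this workshop teaches (the bucket, store path, and variable name are hypothetical, not the workshop's actual dataset), opening a public Zarr dataset on cloud object storage with xarray and dask looks roughly like this:

```python
# Lazily open a public Zarr store on S3 and compute a reduction with dask.
import xarray as xr
import s3fs

fs = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map(root="my-public-bucket/sea-surface-temp.zarr", s3=fs)  # placeholder path

ds = xr.open_zarr(store, consolidated=True)           # loads metadata only
monthly_mean = ds["sst"].resample(time="1M").mean()   # builds a dask graph, no data moved yet
result = monthly_mean.compute()                        # triggers parallel computation
```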

Agenda
12:45 - 12:55 Quick introduction to Pangeo (http://bit.ly/esip-slides)
12:55 - 1:25 Introductory notebooks for jupyter, xarray, dask on Google Binder
1:25 - 1:45 Landsat-8 demo on AWS Binder
1:45 - 2:15 Time for participant experimentation and questions

View Session Recording on YouTube.

Speakers

Amanda Tan

Data Scientist, University of Washington
Cloud computing, distributed systems

Scott Henderson

Research Scientist, University of Washington



Tuesday July 16, 2019 12:45pm - 2:15pm PDT
Ballrm D
  Ballrm D, Workshop

2:45pm PDT

Hands on with Jetstream Atmosphere Part I
Hands on with the Atmosphere GUI on the Jetstream cloud

This tutorial will first give an overview of Jetstream, the National Science Foundation's first production research and education cloud, and various aspects of the system. Then we will take attendees through the basics of using Jetstream via the Atmosphere web interface. This will include a guided walk-through of the interface itself, the features provided, the image catalog, launching and using virtual machines on Jetstream, using volume-based storage, and best practices.

We are targeting users of every experience level: Atmosphere is well-suited to both HPC novices and advanced users. This tutorial is aimed primarily at those unfamiliar with cloud computing who generally do their computation on laptops or departmental server resources. While we will not cover advanced topics in this particular tutorial, we will touch on the available advanced capabilities during the initial overview.

Attendees will need to bring a laptop with a modern web browser (Firefox, Chrome, or Safari).
----

Jetstream is a user-friendly cloud computing environment for researchers based on Atmosphere and OpenStack. It is designed to provide configurable cyberinfrastructure that gives researchers access to interactive computing and data analysis resources on demand, whenever and wherever they want to analyze their data. For a more in-depth description, please see the System Overview: http://wiki.jetstream-cloud.org/System+Overview
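For context, here is a hedged sketch of the programmatic equivalent of what the Atmosphere web interface does behind the scenes (Jetstream is built on OpenStack). The tutorial itself uses the GUI, and the cloud name, image, flavor, network, and keypair below are placeholders:

```python
# Launch a VM on an OpenStack cloud with openstacksdk (illustrative only).
import openstack

conn = openstack.connect(cloud="jetstream")   # credentials read from clouds.yaml

image = conn.compute.find_image("Ubuntu 18.04 Devel and Docker")   # placeholder image name
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("my-project-net")

server = conn.compute.create_server(
    name="esip-tutorial-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
    key_name="my-keypair",
)
server = conn.compute.wait_for_server(server)   # block until the instance is ACTIVE
print(server.name, server.status)
```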

Session recording is here, but unfortunately, there is no audio in the recording. For best results, see the webinar recorded here, which provides much of the same information.

Speakers

Jeremy Fischer

Manager, Jetstream Cloud, Jetstream - Indiana University
Cloud computing for research and education!


Tuesday July 16, 2019 2:45pm - 4:15pm PDT
Ballrm D
  Ballrm D, Workshop

4:30pm PDT

Hands on with Jetstream Atmosphere Part II
Hands on with the Atmosphere GUI on the Jetstream cloud

This tutorial will first give an overview of Jetstream, the National Science Foundation's first production research and education cloud, and various aspects of the system. Then we will take attendees through the basics of using Jetstream via the Atmosphere web interface. This will include a guided walk-through of the interface itself, the features provided, the image catalog, launching and using virtual machines on Jetstream, using volume-based storage, and best practices.

We are targeting users of every experience level: Atmosphere is well-suited to both HPC novices and advanced users. This tutorial is aimed primarily at those unfamiliar with cloud computing who generally do their computation on laptops or departmental server resources. While we will not cover advanced topics in this particular tutorial, we will touch on the available advanced capabilities during the initial overview.

Attendees will need to bring a laptop with a modern web browser (Firefox, Chrome, or Safari).
----

Jetstream is a user-friendly cloud computing environment for researchers based on Atmosphere and OpenStack. It is designed to provide configurable cyberinfrastructure that gives researchers access to interactive computing and data analysis resources on demand, whenever and wherever they want to analyze their data. For a more in-depth description, please see the System Overview: http://wiki.jetstream-cloud.org/System+Overview

Session recording is here.

Speakers

Jeremy Fischer

Manager, Jetstream Cloud, Jetstream - Indiana University
Cloud computing for research and education!


Tuesday July 16, 2019 4:30pm - 6:00pm PDT
Ballrm D
  Ballrm D, Workshop
 
Wednesday, July 17
 

10:30am PDT

Getting your data into the cloud: How to deploy and use Cumulus
This session will be an interactive walkthrough of how to deploy the open-source Cumulus tool for getting your data into the cloud and a live demo of using Cumulus to ingest a new set of science data into the cloud.
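As a hedged sketch of interacting with a deployed Cumulus stack (not the session's demo; the API root URL and token are placeholders, and the endpoints shown follow the public Cumulus API documentation but should be verified against your own deployment):

```python
# Query a Cumulus archive API to see what has been ingested (illustrative only).
import requests

CUMULUS_API = "https://example.execute-api.us-east-1.amazonaws.com/dev"  # placeholder API root
TOKEN = "EDL_BEARER_TOKEN"  # placeholder token obtained via Earthdata Login

headers = {"Authorization": f"Bearer {TOKEN}"}

collections = requests.get(f"{CUMULUS_API}/collections", headers=headers).json()
granules = requests.get(f"{CUMULUS_API}/granules", headers=headers,
                        params={"limit": 5}).json()

for g in granules.get("results", []):
    print(g.get("granuleId"), g.get("status"))
```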

Presenter: Mark Boyd
Presentation Title: An Introduction to Cumulus
Slides: https://doi.org/10.6084/m9.figshare.8947106
Session recording is here.

Moderators

Mark Boyd

Engineer

Wednesday July 17, 2019 10:30am - 12:00pm PDT
Ballrm D
  Ballrm D, Workshop

1:30pm PDT

Cloud Engineering in Practice
With the immense increase in the volume of data acquisition and archival comes the challenge of intensive data processing that we are all trying to solve. There are many efforts underway to achieve this by infusing cloud technologies into software infrastructure. In this session we would like to cover the various approaches being taken to move towards scalable storage and auto-scaled processing. We will talk about porting applications to the cloud, container-based deployment models, and a hybrid science data processing system. This data system utilizes both on-premises and remote compute resources to meet latency requirements while handling large volumes of data. Cloud-based infrastructure is being used for running data analytic stacks, automated workflows for reprocessing campaigns, forward keep-up processing, and much more. Several projects, such as GRFN (Getting Ready for NISAR), PO.DAAC, and SWOT, have invested in cloud technologies. We have explored running our software on several platforms, including Azure, Google Cloud Platform, Amazon Web Services, high-performance computing systems, and Kubernetes. We would like to shed some light on this work and the lessons learned in the process.
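As a hedged sketch of the kind of multi-container operations discussed in this session (not the presenters' actual deployment; the namespace and deployment names are hypothetical), the official Kubernetes Python client can inspect and scale a deployment like this:

```python
# Inspect and scale deployments in a cluster with the official Kubernetes client.
from kubernetes import client, config

config.load_kube_config()          # reads ~/.kube/config (e.g. an EKS context)
apps = client.AppsV1Api()

# List the deployments that make up the data-processing stack (placeholder namespace).
for dep in apps.list_namespaced_deployment(namespace="sdap").items:
    ready = dep.status.ready_replicas or 0
    print(f"{dep.metadata.name}: {ready}/{dep.spec.replicas} replicas ready")

# Scale one component up, e.g. ahead of a reprocessing campaign (placeholder name).
apps.patch_namespaced_deployment_scale(
    name="analysis-worker", namespace="sdap",
    body={"spec": {"replicas": 10}},
)
```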

Presentations
  • Cloud-based Data Processing and Workflow Systems – Namrata Malarout (Jet Propulsion Laboratory, California Institute of Technology)
  • Steering the Ship: Making Sense of Multi Container Deployments with the Help of Kubernetes and AWS – Frank Greguska (Jet Propulsion Laboratory, California Institute of Technology)
    Packaging applications into containers is an easy and effective mechanism for delivering software that is runnable, repeatable, and reliable. However, that is just the first step. Scientific applications that deal with big data tend to require parallelism and multiple focused applications. In this case, it is necessary to manage multi-container deployments spread across multiple machines. In this talk, one solution for deploying and managing such multi-container systems will be explored in depth. The focus will be on Kubernetes, Amazon Web Services, and Apache SDAP as deployed for the NASA Sea Level Change Portal. You can look forward to a Kubernetes crash course followed by a detailed explanation of a production deployment of an Elastic Kubernetes Service (EKS) cluster.
  • Cumulus Lessons Learned: Building, testing, and sharing a cloud archive – Patrick Quinn (NASA / EED-2 / Element 84)
    Cumulus is a scalable, extensible cloud-based archive system which is capable of ingesting, archiving, and distributing data from both existing on-prem sources and new cloud-native missions. As we have built and evolved the system with contributions from seven NASA EOSDIS organizations, we have learned several lessons about how to build a robust, broadly-applicable, microservices-based cloud system for geospatial data which we will share in this talk.

Session recording is here.

Speakers

Patrick Quinn

Software Engineer, Element 84

Frank Greguska

Scientific Applications Software Engineer, Jet Propulsion Laboratory, California Institute of Technology

Namrata Malarout

Scientific Applications Software Engineer, NASA / JPL


Wednesday July 17, 2019 1:30pm - 3:00pm PDT
Ballrm D
  Ballrm D, Breakout

3:30pm PDT

Scalable, data-proximate cloud computing for Earth Science research
Data-intensive scientific workflows are at a pivotal time in which traditional local computing resources are no longer capable of meeting the storage or computing demands of scientists. In the Earth Sciences, we are facing an explosion of data volumes sourced from models, in-situ observations, and remote sensing platforms. Some agencies are starting to move data to commercial Cloud providers to facilitate access (e.g. NASA on Amazon Web Services). Fully leveraging these opportunities will require new approaches in the way the scientific community handles data access, processing, and analysis. In particular, we need to stop downloading data and start uploading algorithms to wherever large archives reside. This session is targeted at researchers who are pioneering such "data-proximate" computing on commercial Cloud infrastructure. We hope to hear current success stories, as well as failures, and identify ways to improve existing workflows.
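As a hedged sketch of the data-proximate pattern (the scheduler address, bucket path, and variable name are placeholders), the idea is to send the computation to a Dask cluster running next to the archive and bring back only the reduced result:

```python
# "Upload the algorithm, not download the data": run the heavy reduction on a
# Dask cluster deployed in the same cloud region as the archive.
import xarray as xr
import s3fs
from dask.distributed import Client

# Placeholder address of a scheduler already running next to the data.
client = Client("tcp://dask-scheduler.my-hub.example.org:8786")

fs = s3fs.S3FileSystem(anon=True)
ds = xr.open_zarr(s3fs.S3Map("archive-bucket/model-output.zarr", s3=fs))

# The reduction executes on the remote workers; only the small result returns.
global_mean = ds["tas"].mean(dim=("lat", "lon")).compute()
print(global_mean)
```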

Agenda
  • 3:30 - 3:35 Scott Henderson (eScience Institute) Introduction to the session - slides: http://bit.ly/2YhbWnr
  • 3:35 - 3:55 Aimee Barciauskas (Development Seed): The Multi-Mission Algorithm and Analysis Platform (MAAP)
    Slides: https://doi.org/10.6084/m9.figshare.8942108
  • 3:55 - 4:15 Aji John (University of Washington) - Analyzing satellite imagery on the Cloud to understand wildflower phenology at Mt Rainier
  • 4:15 - 4:35 Julien Chastang (UCAR/unidata) - Deploying a Unidata JupyterHub on the NSF Jetstream Cloud, Lessons Learned and Challenges Going Forward
    Slides: https://doi.org/10.6084/m9.figshare.8944964
  • 4:35 - 4:55 Rich Signell (USGS): Using the Pangeo ecosystem for model analysis and visualization
    Slides: https://doi.org/10.6084/m9.figshare.9115229
  • 4:55 - 5:00   Wrapup discussion 

Session recording is here.

Session Take-Aways
  1. A current challenge for cloud-based workflows is that datasets from different agencies are in different formats, in different regions, and often have similar but slightly different access APIs.
  2. Platforms such as MAAP and Pangeo are very promising and exciting. They bring the benefits of scalable computing to datasets stored in the cloud.
  3. The cost model for scalable cloud computing is unclear: how to support these platforms into the future and how to regulate user access to cluster resources remain open questions.


Speakers

Aji John

University of Washington

Rich Signell

Research Oceanographer, USGS

Julien Chastang

Software Engineer, UCAR - Unidata
Scientific software developer at UCAR-Unidata.

Scott Henderson

Research Scientist, University of Washington

Aimee Barciauskas

Tech Lead / Engineer, Development Seed


Wednesday July 17, 2019 3:30pm - 5:00pm PDT
Ballrm D
  Ballrm D, Breakout
 
Thursday, July 18
 

10:30am PDT

Current Approaches for Tracking and Exposing Research Object Usage Metrics
Many publishers and funders have implemented open data policies in efforts to make research more transparent and re-usable. These policies also aim to support data, software, and other research objects as valuable output of the research process. To begin to assess impact and give credit to researchers for sharing research objects, however, the community needs to take additional steps to promote standardized measurement of research object usage and proper citation. This means different things for different stakeholders: researchers need to be informed on how and why research object citations should be included in articles and other publications, publishers need to promote and index research object citations, repositories need to standardize and display research object usage information, and institutions need to value these metrics.

Several stakeholders have begun improving capabilities for tracking and exposing research object usage metrics. For example, Make Data Count highlights the value of research data by providing the infrastructure for repositories to display data usage and citation metrics. The project has worked with COUNTER to develop a Code of Practice to enable standardization and has also developed mechanisms for repositories to expose data usage metrics, including implementation examples from California Digital Library, the Arctic Data Center, and DataONE. In this session, we will hear 1) how repositories are currently tracking research object citations, and 2) how the Make Data Count project and other efforts can help these repositories standardize their reporting approach to support accurate representation of the value of research objects.
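As a hedged sketch of the kind of reporting this session discusses, a repository or researcher could pull usage and citation counts for a dataset DOI from the DataCite REST API, which Make Data Count feeds; the DOI below is a placeholder and the attribute names should be verified against the current API documentation:

```python
# Fetch usage/citation metrics for a dataset DOI from the DataCite REST API.
import requests

doi = "10.5061/dryad.example"   # placeholder dataset DOI
resp = requests.get(f"https://api.datacite.org/dois/{doi}")
resp.raise_for_status()

attrs = resp.json()["data"]["attributes"]
# Attribute names assumed from DataCite's Make Data Count integration.
for field in ("viewCount", "downloadCount", "citationCount"):
    print(field, attrs.get(field))
```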

Session recording here.

Moderators

Amber Budden

Director for Community Engagement and Outreach, DataONE

Robert R. Downs

Sr. Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia...

Matt Jones

Director of Informatics R&D, NCEAS / DataONE / UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Cyberinfrastructure

Madison Langseth

Science Data Manager, U.S. Geological Survey
Madison develops tools and workflows to make the USGS data release process more efficient for researchers and data managers. She also promotes data management best practices through the USGS’s Community for Data Integration Data Management Working Group and the USGS Data Management...

Dave Vieglais

Research Professor, University of Kansas

Speakers

Jessica Hausman

NASA HQ / ASRC Federal



Thursday July 18, 2019 10:30am - 12:00pm PDT
Ballrm D
  Ballrm D, Breakout

1:30pm PDT

How to build your data "groups" for optimal discovery?
How does your Earth Science community define a collection that is discoverable in catalogs AND yet can be simply understood by humans? Many areas need to be evaluated, such as definitions, elements, vocabularies and more...oh, my! Help us create a cheat sheet for the data management community while having some fun.

Whether you come to our session or not, please help us by filling in the blanks for the following statement in Slido.
I work with _______ datatype and it is aggregated by ___________. An example answer would be: sonar data; single cruise + instrument type.

https://app.sli.do/event/dttuqvzw
or
Go to https://www.slido.com and enter the code #C690

Session recording here.

Speakers

Anna Milan

Metadata Standards Lead, NOAA NCEI
~*~Metadata Adds Meaning~*~

Heather Brown

Archive Data Management Specialist, Riverside for NESDIS/NCEI



Thursday July 18, 2019 1:30pm - 3:00pm PDT
Ballrm D
  Ballrm D, Working Session
 
Friday, July 19
 

10:00am PDT

Current Status in Cloud Data Access
Cloud computing holds the promise of novel data analysis capabilities for geoscientists by providing affordable on-demand computing system resources. One of the major differences from traditional computing systems is web-based object storage, which requires new data access methods with a different set of performance parameters.

The aim of this session is to provide the ESIP community with an opportunity to learn about the current capabilities for accessing data in cloud object stores. The emphasis will be on the actual software, data servers or libraries, which are capable of accessing cloud object stores, performance issues and bottlenecks, and best practices that can be adopted when migrating data to the cloud. When considering end-user applications, this session is about how those tools access data from the novel data storage systems available with cloud computing and not about the algorithms, etc., associated with data visualization, analytics, or machine learning.
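As a hedged sketch of one access pattern in this space (the bucket, file, and variable names are hypothetical), reading an HDF5/netCDF file directly from object storage via ranged requests, rather than downloading the whole file first, looks roughly like this:

```python
# Read a netCDF/HDF5 granule straight from S3 through a file-like object.
# Each chunk read becomes a ranged GET against the object store, so chunk
# layout and request size dominate performance.
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)

with fs.open("example-archive/granule.nc", "rb") as f:   # placeholder object
    ds = xr.open_dataset(f, engine="h5netcdf")
    subset = ds["temperature"].isel(time=0).load()        # placeholder variable
    print(subset)
```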

Session recording here.

Moderators

James Gallagher

President, OPeNDAP

Aleksandar Jelenak

The HDF Group

Friday July 19, 2019 10:00am - 11:30am PDT
Ballrm D
  Ballrm D, Breakout

11:45am PDT

Beyond the cookbook: Connecting workflows, data and people for sustainable interdisciplinary Earth Sciences
This interactive workshop intends to add to the Throughput cookbook by having participants work through and annotate workflows. Additionally, participants will use the API to explore the networks already built and ascertain what additional tools are needed to make sense of it all.


Session recording here.

Friday July 19, 2019 11:45am - 1:15pm PDT
Ballrm D
  Ballrm D, Workshop
 

