Presented by Sam Bail at Airflow Summit 2021.
Data quality has become a much-discussed topic in the fields of data engineering and data science, and it has become clear that data validation is crucial to ensuring the reliability of any data products and insights produced by an organization’s data pipelines. This session will outline patterns for combining three popular open source tools in the data ecosystem – dbt, Airflow, and Great Expectations – and use them to build a robust data pipeline with data validation at each critical step.
0:00 Welcome
3:20 Quick review of dbt
6:32 Overview of Great Expectations
14:00 Integrating dbt and Airflow
27:50 Testing with dbt and Great Expectations
41:40 Wrap-up
It has been just over a year since the NSF-funded PREPARE (Pandemic Research for Preparedness and Resilience, https://prepare-vo.org/) virtual organization was created, and we’re intent on building a community focused on pandemic preparedness and resilience. As we work to maximize the collaborative synergies of the outstanding research completed through the NSF RAPID grant program, we were excited to offer this opportunity to present work, learn from colleagues, and seek collaborative opportunities at RP2: NSF PREPARE 2nd Annual RAPID PI Meeting.
Lightning Round Presenters:
Nigel Reuel, Iowa State University
Amit Barui, Purdue University
Michael Klaczko, University of Rochester
Pelagia Gouma, The Ohio State University
Xiaochen Xian, University of Florida
Li Xiong, Emory University
Sharad Sharma, Bowie State University
Rupali Batta, Harvard University
Praveen Rao, University of Missouri-Columbia
Yung-Hsiang Lu, Purdue University
Saikat Basu, South Dakota State University
https://www.ted.com/tedx/events/24878
http://www.bigdataexperience.org Dr. Tiranee Achalakul (ดร. ธีรณี อจลากุล) has worked in the fields of big data analytics, high performance computing, and software engineering since 2000. She has wide experience working with both the IT industry and in academia in the United States and Thailand in fields such as design and implementation of data methodologies, software systems and computing infrastructure; she has published two textbooks and multiple journal and conference papers.
During the past 14 years, Dr. Tiranee Achalakul has been participating in many data analytics and software development projects in the private and public sectors and has served on advisory boards for multiple agencies and on the committee of the National e-Science Infrastructure Consortium of Thailand. In addition to being Assistant President in Innovation and Partnership with the King Mongkut University of Technology Thonburi, she is Director of the Big Data Experience Center and the KMUTT student incubator (Hatch). This talk was given at a TEDx event using the TED conference format but independently organized by a local community. Learn more at https://www.ted.com/tedx
The complexities of managing and delivering value from high throughput multi-omics data far outpace traditional approaches to IT infrastructure. Thus, building a robust, centralized ecosystem that ingests, stores & pre-processes these data for downstream ML applications becomes critical. Join our panel of industry experts as they make a case for strategic investments in biomedical data management and shed light on the challenges of building a data infrastructure from the ground up.
Talk to us for a personalized walkthrough on how we help data-driven drug discovery teams get to faster actionable insights: https://elucidata.io/schedule-a-meeting/
Health Information Exchange (HIE) is one of the most complex data systems in health care. Most HIEs are working to meet the challenge of streamlining data to create insights and support various stakeholders in their community around population health initiatives. Join speakers Jaime Bland, CEO of NeHII (Nebraska Health Information Initiative), and Vineeth Yeddula, CEO of KPI Ninja. NeHII is one of the most advanced HIEs, with a bold vision of HIEs as the enabler of population health through aligning HIE data to value-based care, alternative payment model infrastructure, and population health analytics; KPI Ninja has collaborated with NeHII to operationalize this vision. During this webinar, you will learn how advanced analytics is revolutionizing this space by delivering insights that ease the burden on payors and providers and accelerate the improvement of outcomes. You’ll discover a unique approach to leveraging existing data sources, designing a population health analytics roadmap, and improving clinical outcomes that are aligned to value-based financial reimbursement.
Health data is traditionally held and processed in large and complex mazes of hospital information systems. The market is dominated by vendors offering monolithic and proprietary software due to the critical nature of the supported processes and – in some cases – due to legal requirements. The “digital transformation”, “big data” and “artificial intelligence” are some of the hyped trends that demand improved exchange of health care data in routine health care and medical research alike. Exchanging data at these scales requires open data formats and protocols, multi-stakeholder collaboration, and agile development. As an example, HL7, the de facto messaging standards organization in medicine, noticed a much more positive response from the medical research community to its openly available FHIR specification than to the members-only, XML-based HL7v3 messaging standard specification.
While some past (or rather: ongoing) projects on a national scale in the German health care system have tried centralized, top-down specification and development approaches, more recent infrastructure projects embrace the competitive collaboration of a decentralized, bottom-up strategy. As a result, importance and recognition of free software increase in the Medical Informatics research community.
In a series of rapid spotlights, we present tools and frameworks that serve as cornerstones for the envisioned health data exchange infrastructure, including: Organization and collaboration tools; data extraction from clinical source systems, data transformation and de-identification; data management systems and long-term archival using persistent globally-unique object identifiers; federated queries across multiple independently managed clinical data integration centers.
We aim to encourage participants to actively add tools and frameworks within the discussion and highlight their experiences and challenges with using open systems in Medical Informatics.
Speaker bio:
Marcel Parciak and Markus Suhr are research associates at the University Medical Center Göttingen (UMG), Department of Medical Informatics.
Marcel graduated from the Göttingen Medical Informatics Master program in 2018 and is currently a PhD student, investigating the challenges of data provenance in medical research. He is a system architect for the HiGHmed project, which facilitates innovative federated infrastructure for cross-organisational secondary use of health care data.
Markus started his professional career in 2014 as a system administrator and software developer at the UMG hospital data center. He joined the Department of Medical Informatics in 2017, becoming lead developer for a free software project and working on multiple biomedical research projects. Since 2019 he has been technical lead for the newly created Medical Data Integration Center. Markus is a supporter of the Free Software Foundation Europe.
Room: AW1.126
Scheduled start: 2020-02-01 11:00:00
Over the past decade, we’ve witnessed a digital transformation in healthcare, with organizations capturing huge volumes of patient information. But this data is often unstructured and difficult to extract, with information trapped in clinical notes, insurance claims, recorded conversations, and more. In this session, explore how the new Amazon HealthLake service removes the heavy lifting of organizing, indexing, and structuring patient information to provide a complete view of each patient’s health record in the FHIR standard format. Come learn how to use prebuilt machine learning models to analyze and understand relationships in the data, identify trends, and make predictions, ultimately delivering better care for patients.
Learn more about re:Invent 2020 at http://bit.ly/3c4NSdY
Subscribe:
More AWS videos http://bit.ly/2O3zS75
More AWS events videos http://bit.ly/316g9t4
#AWS #AWSEvents
The INSPIRE Directive (2007) mandates European Union countries to share environmentally related datasets so that they can be easily accessed by other public organisations within their own and neighbouring countries to inform policies or activities that may impact on the environment. Key to delivering INSPIRE is the establishment of Spatial Data Infrastructures (SDIs) providing frameworks for coordinating the policies, infrastructure and standards needed to acquire, process, distribute, use, maintain and preserve spatial data through discovery, view and download services by 2020.
Archaeological information is inherently spatial, yet despite the environmental focus of INSPIRE, guidance for archaeological datasets is limited and ambiguous, and consequently there is limited engagement from data curators. Although Protected Sites is an INSPIRE theme, does it cover only those sites formally designated through legislation, or does it include sites managed through legal or other effective means?
INSPIRE publishes data to help inform environmental policies and if data is unpublished there is a risk it will simply be ignored. Complex modelling of environmental change through Ecosystem Services remotely consuming web services is already happening but the lack of published reference datasets from the historic environment compromises consideration of the resource in decision making processes.
Development of SDIs for heritage can bring wider benefits for the profession. Too often fieldwork extents and results are confined to paper publications or reside in project archives. Consequently, we lack a spatial record of fieldwork activities. Although cultural heritage data often has a strong spatial component, the full potential of the geographies created through discovery, recording and analysis is far from being realised. Harmonisation and publication of spatial data to consistent standards through an SDI is an essential pre-requisite for mainstreaming the use of heritage data in the 21st century to get cultural heritage to work for Europe.
Peter McKeague (Historic Environment Scotland), Anthony Corns (The Discovery Programme), Axel Posluschny (University of Bamberg)
To attend one of our AWS Loft events, visit us at one of our many locations – https://amzn.to/2YKpdS9.
Having accessible data that tells you about your customers and how they’re using your product is critical to the long-term health and success of your startup. By building a cloud-based data stack you’ll build better products, reach more customers, and more easily raise money.
Benn Stancil, Chief Analyst and Co-founder of Mode, gives an introduction to the tools you need to set up your data infrastructure in one hour. He’ll arm you with the tools you need to move, store, and analyze data, and instructions on how to build a platform that can not only scale, but can also enable you to start answering product, marketing, and fundraising questions tonight.
The key to successful analytics projects is to implement a robust data infrastructure. Find out what that means here – both for traditional and Big Data sources.