YOW! Data returns in 2022
YOW! Data brings together leading industry practitioners and applied researchers working on data-driven technology and applications, providing in-depth coverage of current and emerging technologies in the areas of Big Data, Analytics and Machine Learning.
Meet the experts, network and sharpen your skills all at the same time.
YOW! Data 2022 is an online, 2-day conference featuring invited international and Australian speakers sharing their expertise in Data Science, Data Engineering, Machine Learning and AI.

This year’s YOW! Data will take place online.
Talks will be delivered live.
At Skills Matter, we’ve chosen to see the events of the past year as a challenge to make our content and community more inclusive and accessible to all. Beyond the COVID‑19 pandemic, we have a vision of a community where knowledge sharing and skills transfer are not limited by physical barriers.
We are excited about the opportunity to truly welcome our attendees to this year’s Data, no matter where you are in the world. We hope to see you there!
Day 1: June 1 AEST (UTC+10)
Main Track
08:45
SESSION OVERVIEWS AND INTRODUCTIONS
09:00
KEYNOTE
Data Product is an exciting new concept - which means that every software vendor in the data space has their version of what data products are and the best way to build them. Each type of data product makes sense in a context - it solves specific problems for specific organizations with specific pre-existing capabilities and requirements. This talk will explore a few of the popular types of data products in those contexts and show example architectures that I’ve seen built to deliver these types of data products. By the end of the conference, you will be able to have a sensible conversation with your team and vendors: you will be discussing requirements, constraints, complexity, and costs - not buzzwords and hype.
About the speaker: Gwen (Chen) Shapira
Gwen is currently a co-founder of a stealth startup and is building something awesome that she won’t talk about yet. Prior to starting a company, Gwen spent 6 years contributing to Apache Kafka as an engineer, product manager and engineering leader at Confluent. She is an author of Kafka: The Definitive Guide and a keynote speaker at international conferences. Follow Gwen on Twitter @gwenshap and LinkedIn at /gwenshapira.
09:45
Q&A WITH GWEN SHAPIRA & TEA BREAK
10:05
Unfortunately, the majority of data projects fail. Yet, they fail for the same reasons. Most management and data teams don’t know the reasons a project succeeds or fails. It just appears to be random, hard work, or luck. To help understand the reasons teams succeed, we will introduce the who, what, when, where, and how of data projects. By answering these questions, teams will understand what they’re trying to accomplish far better.
Who: Data teams all start with people. These need to be the right people, with the right skills and at the right ratios. You will need data scientists, data engineers, and operations all working together.
What: Just saying you want AI isn’t enough. You need to know what business value will be generated. There should be a clear and attainable path to value creation. You have to clearly state what you are going to do to create value.
When: Unattainable timelines aren’t feasible and neither are “when it’s ready” timeframes. Data projects need to deliver value on a sane timeline. This will include delivering in tranches so the team can gain velocity.
Where: Clusters need to be spun up somewhere. Data needs to be stored somewhere. The data needs to come from somewhere. Data teams need to have a clear plan and architecture of where each piece will be done.
How: Data teams need a clear plan that they are executing on. This plan needs a singular focus or the work will go in different directions. There need to be clear technical choices and specific technologies chosen.
About the speaker: Jesse Anderson
Jesse Anderson is a Data Engineer, Creative Engineer and Managing Director of Big Data Institute. He mentors companies all over the world, ranging from startups to Fortune 100 companies, on Big Data. This includes projects using cutting-edge technologies like Apache Kafka, Apache Hadoop, and Apache Spark. He is widely regarded as an expert in the field and is known for his novel teaching practices.
Jesse is published by Apress, O’Reilly, and Pragmatic Programmers. He has been covered in prestigious publications such as The Wall Street Journal, Harvard Business Review, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.
10:35
Q&A WITH JESSE ANDERSON & BREAK
10:55
Workflow orchestration has traditionally been closely coupled to the concept of Directed Acyclic Graphs (DAGs). Building data pipelines involved registering a static graph containing all the tasks and their respective dependencies. During workflow execution, this graph would be traversed and executed. The orchestration engine would then be responsible for determining which tasks to trigger based on the success and failure of upstream tasks. This system was sufficient for standard batch processing-oriented data engineering pipelines but proved to be constraining for some emerging common use cases. Data professionals would have to compromise their vision to get their workflow to fit in a DAG. For example: How do I re-run a part of my workflow based on a downstream condition? How do I execute a long-running workflow? How do I dynamically add tasks to the DAG during runtime?
This has led to the development of Prefect Orion (Prefect 2.0), a DAG-less workflow orchestration system that emphasizes runtime flexibility and an enhanced developer experience. By removing the DAG constraint, Orion offers an interface to workflow orchestration that feels more Pythonic than ever. Developers only need to wrap as little code as they want to get observability into a specific task of the workflows.
About the speaker: Kevin Kho
Kevin Kho is an Open Source Community Engineer at Prefect, an open-source workflow orchestration management system. Previously, he was a data scientist at Paylocity, where he worked on adding machine learning features to their Human Capital Management (HCM) Suite. Outside of work, he is a contributor to Fugue, an abstraction layer for Pandas, Spark, and Dask. He also organizes the Orlando Machine Learning and Data Science Meetup.
11:25
Q&A WITH KEVIN KHO & BREAK
11:45
LUNCH BREAK
12:45
Data pipelines are a challenging part of working in data science. Netflix Conductor is a battle-tested workflow orchestration tool invented at Netflix to automate millions of microservice workflows. Using a workflow tool like Conductor as a part of your data analysis pipeline can help streamline and improve the reproducibility of your results. In this presentation we will use Conductor to orchestrate a set of microservices into an automated data pipeline workflow covering data acquisition, cleaning and analysis.
About the speaker: Doug Sillars
Doug Sillars is a Senior Developer Relations Engineer at Orkes. He’s a Google Developer Expert and the author of O’Reilly’s “High Performance Android Apps”. Doug regularly speaks at conferences, and blogs at dougsillars.com when he gets a chance. Doug’s career has spanned mobile application development, web performance, AR/VR, machine learning, video streaming and live streaming. He’s now focusing on the ‘backend’, helping developers build orchestrated microservices with Conductor.
13:15
Q&A WITH DOUG SILLARS & BREAK
13:35
This talk looks at Spark + Koalas, Dask, and Modin + Ray, all of which attempt to provide the "holy grail" of big data – distributed pandas. We'll kick it old-school starting with "Sparkling Pandas", one of the OG distributed pandas (with the terrible performance to show for it). No talk like this would be complete without a conflict of interest disclosure, which for me includes being one of the two original co-authors for Sparkling Pandas (and some funny stories), being a Spark committer, but this balances out with my current work on co-writing books on Dask and Ray. At the end of this talk you will be questioning if you really want to scale pandas given all of the duct-tape involved, and have a good idea of how to choose which particular duct-taped-together solution is going to involve the least amount of rusty spoons in your eyeballs.
About the speaker: Holden Karau
Holden is a transgender Canadian Open Source Engineer at Netflix with a focus on improving OSS data tooling. She is the co-author of Kubeflow for Machine Learning (2020), High Performance Spark (2017) and Learning Spark (2015). She is a committer and PMC member on Apache Spark and a committer on the SystemML and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
14:05
Q&A WITH HOLDEN KARAU & TEA BREAK
14:25
We all need a little optimism when it comes to climate change, but what does it look like when a team optimizes for reduced emissions? There's a huge amount of potential for data teams to benefit in surprising ways.
Joining a cross-functional data team at Culture Amp last year opened my mind to possibilities around communicating, monitoring and acting on climate change at work. In this talk I’ll unpack my journey, sharing ideas, anecdotes and tools to bring carbon thinking onto the agenda.
About the speaker: Louis van Senden
Louis van Senden is a Data Engineer at Culture Amp. Louis found his passion for data while leading the development of an IoT vehicle traffic data app at MetroCount. Having traversed the path from Front End through to Full Stack, applying his curiosity to huge vehicle-by-vehicle data sets was a natural progression. Drawn to Culture Amp's progressive values and ambitious goals, Louis is now contributing to their growing data team, helping improve the world of work.
14:55
Q&A WITH LOUIS VAN SENDEN & BREAK
15:15
END OF DAY 1
Day 2: June 2 AEST (UTC+10)
Main Track
09:00
SESSION OVERVIEWS AND INTRODUCTIONS
09:15
KEYNOTE
Many scientific applications currently rely on the use of brute-force numerical methods performed on high-performance computing (HPC) infrastructure. However, these methods have their limits in many applications, e.g. climate prediction. Can artificial intelligence (AI) methods augment or even entirely replace these brute-force calculations to obtain significant speed-ups? Anima will present exciting recent advances that build new foundations in AI, viz. neural operators that are independent of the resolution or grid of training data and allow for zero-shot generalization to higher resolution evaluations. They have recently yielded 4-5 orders of magnitude speedups over standard numerical weather models.
About the speaker: Animashree Anandkumar
Animashree (Anima) Anandkumar holds dual positions in academia and industry. She is a Bren Professor in the Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generation AI algorithms. At Caltech, she is the co-director of Dolcit and co-leads the AI4science initiative, along with Yisong Yue. She has spearheaded the development of tensor algorithms, first proposed in her seminal paper. They are central to effectively processing multidimensional and multimodal data, and to achieving massive parallelism in large-scale AI applications. Follow Anima on Twitter, @animaanandkumar.
09:45
Q&A WITH ANIMASHREE ANANDKUMAR & TEA BREAK
10:05
Historically, data engineering has been relegated to batch processes running in the data warehouse or data lake - if the data didn't exist in the warehouse, it effectively didn't exist. In fact, operational systems - OLTP databases, messaging and streaming systems, caches, microservices, and so on - are not just primary sources of data, but destinations for processed data as well. In-app analytics, recommendation systems, online ML/AI infrastructure, automation, and a host of domain-specific services are just as much a part of a company's data platform as the data warehouse and BI tools. The common thread that ties these systems together is streaming and real-time data processing that handles both event-driven services and real-time data integration. While streaming systems have historically been the land of low-level code and complex distributed systems, modern real-time data platforms and their associated patterns are making it just as easy to build data pipelines in SQL, on the stream, to power both operational and analytical applications. In this session we'll discuss the unified operational/analytical data platform architecture, the most common patterns, and how data engineering works in real-time.
About the speaker: Eric Sammer
Eric has been in the tech industry for over 20 years as an engineer, CTO, and most recently, CEO. Prior to founding Decodable, he held a number of different roles as an early Cloudera employee, and later was the CTO and co-founder of Rocana, which was acquired by Splunk in 2017. At Splunk, Eric was a VP and Sr Distinguished Engineer responsible for cloud platform services. Eric is the author of Hadoop Operations (O'Reilly). Follow Eric on Twitter at @esammer and LinkedIn at /esammer.
10:35
Q&A WITH ERIC SAMMER & BREAK
10:55
The landscape of applied machine learning (ML) is becoming polarised. At one end are giants, complete with hundreds of people and GPUs (or TPUs) building impressive systems at scale. On the opposite end are startups, less burdened by the complexity of integrating with an existing product or the economics of turning a profit. But in the middle, established and growth companies using ML to create exceptional customer experiences walk a delicate tightrope. To succeed, they must engineer enough to match their existing products' scale and research algorithms that deliver a product experience not easily copied, all without breaking the bank.
This talk explores the unique challenges of delivering impactful ML products in companies that are neither giants nor startups. We'll examine the economic constraints on innovation, how engineering choices shape the success of an ML product, and how to approach researching applied ML in new domains.
About the speaker: Soon-Ee Cheah
Soon-Ee leads Applied Sciences (e.g. Machine Learning, Information Science) at Xero. He guides teams across the globe in building AI-powered products for Accountants, Bookkeepers, and Small Businesses that reduce toil and delight with insight. A desire to translate research into impact has led to a career traversing clinical trials, pharmacology, computational biology, and applied machine learning at SaaS companies. Soon-Ee's experience working on research programs with horizons ranging from weeks to years gives him a pragmatism and perspective about navigating the journey from lab to market. Follow Soon-Ee on LinkedIn.
11:25
Q&A WITH SOON-EE CHEAH & BREAK
11:45
LUNCH BREAK
12:45
Genomics identifies your risk of developing disease and detects infectious diseases. However, to extract these clinically actionable insights, unprecedented data volumes need to be processed, and processed within clinically relevant turn-around times. The transformational bioinformatics group at CSIRO has built Big-Data and Cloud-computing solutions using machine learning and automated distribution channels to build health solutions that support globally connected health networks. This talk outlines how we developed novel bioinformatics approaches to cater for new applications, such as genomic new-born screening and tracking emerging strains of COVID-19. The talk concludes by touching on how increasingly privacy-conscious patients inspire a re-think of data ownership and data transactions, where distributed dynamic consent borrows ideas from the crypto space, such as decentralized autonomous organizations and immutable tokens.
About the speaker: Dr. Denis Bauer
Dr Denis Bauer is an internationally recognised expert in bioinformatics and artificial intelligence, who is passionate about improving health by understanding the secrets in CSIRO's genome using cloud-computing technology. She is also a government research scientist, adjunct professor at Macquarie University and an AWS Hero. Her unique approach of joining cloud-computing with deep biological domain knowledge translates research into impactful products that have been used for disease gene detection in Motor Neuron Disease and the COVID-19 vaccine development. She contributes to open-source software projects and advisory committees, and keynotes international IT and medical conferences. She was recognized as one of the Brilliant Women in Digital Health 2021 and as a Finalist in Women in AI 2022. She has attracted more than $35M in funding to further life-science research and digital health.
13:15
Q&A WITH DR. DENIS BAUER & BREAK
13:35
Operationalising machine learning models, particularly scaling MLOps capability across teams within an organization, is a difficult feat. In this session, we show how easily you can accelerate your MLOps journey using Metaflow and Amazon SageMaker. You will learn how Carsales keeps up with an increased demand for building and productionising AI models, and their strategy to democratise AI across all of their development teams. This allows any developer to be a citizen Data Scientist and ML Engineer.
About the speaker: Agustinus Nalwan
Gus is passionate about technology innovation that makes people’s lives easier and has over 25 years of experience in software development across industries, from 3D/animation, games and mobile apps to computer vision and AI. Gus currently works at Carsales as the Head of AI, building cool AI tech and providing AI expertise to various teams across Carsales. His current mission is to scale up Carsales’ AI capabilities by democratising AI development to every software engineer and data scientist. Gus is an AWS Machine Learning Hero and has extensive experience in deep learning and in building and architecting large-scale, end-to-end AI/ML pipelines. He also loves speaking on AI and future tech topics at AI conferences to share his experience, to get more people to join the AI industry and to share the crazy tech he has built at home. Follow Gus on LinkedIn at @agustinus-nalwan.
14:05
Q&A WITH AGUSTINUS NALWAN & TEA BREAK
14:25
Bigger isn't always better and is definitely more likely to be irresponsible when it comes to datasets. But datasets are the unsung heroes of modern machine learning and AI systems, just as much as the algorithms, advanced hardware, and models that support them. In this talk, I share some tips and tricks from over 20 years of building and using datasets for search, natural language processing, and ML applications more generally. I discuss the importance of understanding your application task, gotchas with personalisation, the benefits of human diversity, a couple of patterns for dealing with too much or too little data, and last but not least some responsible AI considerations.
About the speaker: Peter Bailey
Dr Peter Bailey is the ML Lead for Search and Recommendations at Canva. He has worked in a range of organisations, from academia to industrial research labs, from startups and consulting to Microsoft. He is a co-author of numerous scientific papers and co-inventor of a number of patents. He is the co-creator of several widely used datasets, including WT10g, CERC, UQV100, and CRD3. While at Microsoft, in addition to his applied scientist and manager roles and responsibilities, he was a member of its AETHER Committee's Fairness and Inclusiveness working group from 2018 through 2021. He is a Senior Member of the Association for Computing Machinery, and contributes regularly to program committees for conferences in the information retrieval community. He has the most fun learning and working in small teams creating new systems at the intersection of language, data and search. Follow Peter on LinkedIn @peter-bailey-0b74aa or on Twitter at @peterrbailey.
14:55
Q&A WITH PETER BAILEY & BREAK
15:15
END OF DAY 2
-
Building better and more responsible datasets
Featuring Peter Bailey
Bigger isn't always better and is definitely more likely to be irresponsible when it comes to datasets. But datasets are the unsung heroes of modern machine learning and AI systems, just as much as the algorithms, advanced hardware, and models that support them. In this talk, I share some...
-
Why Most Data Projects Fail and How to Avoid It
Featuring Jesse Anderson
Unfortunately, the majority of data projects fail. Yet, they fail for the same reasons. Most management and data teams don’t know the reasons a project succeeds or fails. It just appears to be random, hard work, or luck.
-
Managing Data Pipelines with Conductor
Featuring Doug Sillars
Data pipelines are a challenging part of working in data science. Netflix Conductor is a battle-tested workflow orchestration tool invented at Netflix to automate millions of microservice workflows. Using a workflow tool like Conductor as a part of your data analysis pipeline can help streamline...
-
Lessons Learned from DAG-based Workflow Orchestration
Featuring Kevin Kho
Workflow orchestration has traditionally been closely coupled to the concept of Directed Acyclic Graphs (DAGs). Building data pipelines involved registering a static graph containing all the tasks and their respective dependencies. During workflow execution, this graph would be traversed and...
-
How Digital Has Improved The Health Care We Receive
Featuring Dr. Denis Bauer
Genomics identifies your risk of developing disease and detects infectious diseases. However, to extract these clinically actionable insights, unprecedented data volumes need to be processed, and processed within clinically relevant turn-around times.
-
YOW! Data 2021
Two days - Online Conference
YOW! Data is an opportunity for data professionals to share their challenges and experiences while our speakers share the latest in best practices, techniques, and tools.
The 2021 conference was online in a two-day event featuring invited international and Australian speakers sharing their...
-
YOW! Data 2020
Three days - Online Conference
We're delighted to present an online version of YOW! Data in 2020, featuring selected invited speakers from our face to face conference. YOW! Data is an opportunity for data professionals to share their challenges and experiences while our speakers share the latest in best practices, techniques,...
-
YOW! Data 2019
Two days in Sydney
YOW! Data is a two-day conference that provides in-depth coverage of current and emerging technologies in the areas of Big Data, Analytics and Machine Learning.
The number of data generators (drones, cars, devices, home appliances, gaming consoles, online services, medical and wearables) is...
-
YOW! Data 2018
Two days in Sydney
YOW! Data is a two-day conference that provides in-depth coverage of current and emerging technologies in the areas of Big Data, Analytics and Machine Learning.
The number of data generators (drones, cars, devices, home appliances, gaming consoles, online services, medical and wearables) is...
-
YOW! Data 2017
Two days in Sydney
YOW! Data is a two-day conference that provides in-depth coverage of current and emerging technologies in the areas of Big Data, Analytics and Machine Learning.
The number of data generators (drones, cars, devices, home appliances, gaming consoles, online services, medical and wearables) is...
-
YOW! Data 2016
Two days in Sydney
YOW! Data is a two-day conference that provides in-depth coverage of current and emerging technologies in the areas of Big Data, Analytics and Machine Learning.
The number of data generators (drones, cars, devices, home appliances, gaming consoles, online services, medical and wearables) is...