Independent Health Case Study: Data Engineering Team Training

By Claudia Virlanuta • Updated on Aug 24, 2022

Executive Summary

Named one of the best companies to work for in New York State for 14 consecutive years, Independent Health is known for staying on the cutting edge of technology. While preparing to migrate its data to the cloud, the Information Management leadership team at Independent Health needed to ensure that its entire team of data architects and data engineers could quickly level up their skills in Python programming and in the new cloud technologies they would soon be using daily. Edlitera was brought on to design and facilitate a custom training on Python programming for data engineering, covering PySpark (the Python API for Apache Spark, a widely used analytics engine for large-scale data processing) along with other data engineering services and libraries. As a result of the training, the data migration and integration team was able to hit the ground running with PySpark and other data engineering services in the cloud.

“I went from having little Python and no pyspark experience to having confidence to write Python and pyspark code while people are watching in our Cloud Build Lab in just a couple of weeks. I would like to thank Ciprian for the training and for his patience.”
– Venkat K., Data Architect at Independent Health

Situation

A newly formed group, the data migration and integration team is responsible for crucial data-enabled processes. The team had primarily used Oracle and ETL tools, but the organization was in the process of migrating targeted data assets to the cloud. Team members came from diverse backgrounds, with varying degrees of experience in SQL, ETL tools, and programming languages. The team was also looking for a single data analysis and scripting tool to use uniformly within the new cloud environment and for its most critical workflows. Time was of the essence: the team needed to become expert users of its new tools and environment as soon as possible.

Solution

Python was identified as the best tool for managing processes and workflows both in the cloud and in local environments, thanks to its large ecosystem and community and its mature solutions for data processing, data analysis, and data engineering. Edlitera was brought in to get the team to the needed baseline in Python programming, as well as in other data engineering services and tools.

Edlitera designed a custom training curriculum that combined theoretical concepts and hands-on practice, with a strong focus on using Python and other data engineering services and tools to design and build highly scalable data pipelines.

The focus of the training was threefold:

  1. Get everyone comfortable using Python as a general programming language;
  2. Learn how to leverage Python to build highly scalable data pipelines in the cloud;
  3. Learn best practices for using Python in the cloud.

Concepts and code examples presented in class were followed by in-class practice problems and assigned homework, which allowed participants to get comfortable with Python and other data engineering services within the cloud environment.

Outcome

By the end of the course, participants had learned how to use Python as a general programming language and how to design, create, and test custom data pipelines. They also got hands-on practice using Jupyter notebooks to author and run code for large-scale data processing. Finally, participants learned how to deploy and schedule data transformation jobs in the cloud, and how to leverage data stored in data lakes, databases, and warehouses. The training also emphasized security, identity, and access management policies, as well as Python and cloud best practices.

Customer Quotes

“The interactive nature of the training and the combination of Python scripting and cloud tools were both instrumental in the learning process. Great discussions and exercises.” 
– Beth P., Senior Data Architect at Independent Health
 
“Ciprian has been an excellent teacher. His training has equipped me with how to deal with real world Big Data scenarios, especially for ETL purposes. Ciprian really went above and beyond with his efforts throughout this training.”
– Sushma D., Data Architect and Engineer at Independent Health

Claudia Virlanuta

CEO | Data Scientist

Claudia is a data scientist, consultant, and trainer. She is the CEO of Edlitera, a data science and machine learning training and consulting company that helps teams and businesses future-proof themselves and turn their data into profits.

Before Edlitera, Claudia taught Computer Science at Harvard, and worked in biotech (Qiagen), marketing tech (ZoomInfo), and ecommerce (Wayfair). Claudia earned her degree in Economics from Yale, with a focus on Statistics and Computer Science.