Executive Summary
A leading health insurance company based in Buffalo, NY, Independent Health has a track record of excellent service rooted in efficiency and innovation. To make certain workflows more efficient and resilient, the analytics group at Independent Health partnered with Edlitera to design and deliver a training curriculum covering Python for data analysis and machine learning. By the end of the training, participants had gained hands-on experience using Python to analyze, visualize, filter and merge data, and to train, evaluate and optimize several types of machine learning models. After completing the training, the team was able to write Python code to automate several dashboards and reports that previously relied on manual processing using Microsoft Excel and SQL.
“Python has helped us to automate the data processing steps of our COVID dashboard. We receive updated files daily and use Python to append the new data to the existing dataset and perform a number of calculations that are persisted to a table that is used to populate a dashboard.”
- Nicole E., Manager, Informatics Analytics
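A workflow like the one described in the quote above could be sketched roughly as follows. This is only an illustration, not the team's actual code: the column names `report_date` and `cases` are hypothetical, rows are plain dictionaries as returned by `csv.DictReader`, and a real pipeline would likely use a data analysis library and write the results to a database table.

```python
from collections import defaultdict


def append_daily(existing, new_rows, key="report_date"):
    """Append new_rows to existing, skipping dates that were already loaded."""
    seen = {row[key] for row in existing}
    return existing + [row for row in new_rows if row[key] not in seen]


def totals_by_date(rows, key="report_date", value="cases"):
    """Sum the (hypothetical) value column per date, sorted by date."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row[value])
    return dict(sorted(totals.items()))
```

Skipping dates that are already present means that re-running the load on the same daily file does not duplicate rows in the combined dataset.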
Situation
The analytics group at Independent Health is responsible for developing clinical and bioinformatics analyses and applications with the main goal of improving patient care. Team members come from diverse backgrounds, including medical research, bioinformatics, statistics, predictive modeling and software development. Typical analysis workflows relied heavily on SQL, SAS and MS Excel spreadsheets. In light of the upcoming migration of some data assets to the cloud, the analytics team identified Python as the ideal tool to help simplify, automate and document a number of data processing, analysis and reporting processes.
Solution
Given participants’ varying levels of experience with Python programming, Edlitera was brought in to teach a 14-week program consisting of three courses:
- The basics of Python programming,
- Data processing using Python, and
- Machine learning using Python.
The training emphasized hands-on practice, best practices, data scalability, security and maintenance.
The focus of the training was threefold:
- To get everyone comfortable with Python and data processing using Python, as a way to replace or augment existing SAS, SQL and Excel processes.
- To learn how to break down complex tasks and write code to automate them.
- To learn how to use Python machine learning libraries and available data to implement solutions using classification, regression and clustering models.
In-class code examples were followed by in-class practice problems and assigned homework that allowed participants to get comfortable with data processing and machine learning in Python. Participants were encouraged to work on relevant projects, and the instructor shared detailed proposed solutions to each project after the end of each course.
Outcome
By the end of the training series, participants had learned how to use Python both as a general programming language and in the context of data processing and machine learning. Assigned projects with detailed solutions allowed the participants to practice data analysis and machine learning techniques starting with raw data. Specifically, participants learned to read data from a variety of data sources such as files and databases, and to merge, filter, sort, analyze, visualize, aggregate and manipulate data at scale using Python. The team also got hands-on practice training, evaluating and tuning machine learning models for regression, classification and clustering. As a result of the training, the team was able to use Python to automate previously manual and time-consuming data processing tasks.
“I have a couple reports where I have to export data from our vendor tool and import them into Oracle. The files are very large, and it can be frustrating loading them to the database using the current tools we have, as the load tends to fail. I wrote a Python script that basically looks for the latest files in a folder and then replaces existing tables or creates new tables in Oracle with the data from the most recent files. This saves me a lot of frustration and time trying to get the files loaded. It was also fun to build!”
- Becky W., Senior Research Analyst
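The kind of script described in the quote above might look something like the following minimal sketch. It is not the analyst's actual code: it assumes the exports are CSV files with a header row, and it uses Python's built-in `sqlite3` module in place of Oracle so that the example stays self-contained. The function names `latest_file` and `replace_table` are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def latest_file(folder, pattern="*.csv"):
    """Return the most recently modified file matching pattern, or None."""
    files = list(Path(folder).glob(pattern))
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def replace_table(conn, table, csv_path):
    """Drop and recreate `table` from the CSV file; return the rows loaded."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the column names
        rows = list(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commit on success, roll back on error
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)
```

With a real Oracle database, the connection would come from an Oracle driver instead, and the placeholder syntax and column types would need to be adapted. Very large files would also be better read and inserted in batches rather than loaded into memory at once.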
“While we haven't implemented any code as of yet, we plan to use Python to merge and output data to Excel for our comprehensive member stratification file. We also plan to try a modeling project with it in the near future.”
- Jennifer B., Manager, Informatics Analytics
“I have tried to build a few projects that I would have normally done in SQL in Python. I find that loading data using Python is a lot simpler and quicker than trying to load it to a table in Oracle where I always have to define data types and field sizes.”
- Laura A., Manager, Provider Analytics
“I've been working on automating our EPR (Employer Performance Reporting) process. Essentially I'm taking a lot of excel files and picking data from certain cells to provide analysts a quicker route of finding information rather than manually looking through these files to pull what they need.”
- Darren P., Intermediate Research Analyst
