Stay Ahead of Breaking Changes with the New Lightning Ecosystem CI

A new integration platform for Data Science and Deep Learning packages based on Lightning to boost compatibility, perform regular testing, and raise early warnings in case of possible collisions.

Jirka Borovec
PyTorch Lightning Developer Blog
5 min read · Dec 7, 2021


Illustration by Phoeby Naren.

Deep learning innovation moves at a relentless pace, making it challenging for frameworks to keep up with the latest techniques for achieving state-of-the-art (SOTA) results. Not every breakthrough idea is compatible with existing framework codebases and APIs, so dealing with breaking changes has become the new norm for deep learning projects.

Imagine you are about to announce a new version of a popular open-source AI project. All features are in place, all tests pass, and everything is ready for release.

When all of a sudden…

Oh, no!

Tests start to fail unexpectedly!

After a bit of freaking out, you start troubleshooting, only to discover that a dependency that worked an hour ago has released a new version with breaking changes.

Lightning has released the new Ecosystem CI project to make this pain a thing of the past…

Project logo.

Lightning is a lightweight wrapper for high-performance AI research that aims to abstract away Deep Learning boilerplate while providing you with complete control and flexibility over your code.

As new SOTA methods are released, the Lightning team works hard to integrate these developments into our framework, which requires refactoring internal structures/modules to accommodate new features. As the adoption of PyTorch Lightning continues to grow, so does the need for framework stability and code transparency. We developed the new Lightning Ecosystem CI to provide transparency and resolve these challenges.

We invite the whole AI community to take advantage of the best practices this project provides, from continuous integration to compatibility testing.

Photo by fauxels from Pexels

How does the Ecosystem CI work?

The Ecosystem CI is a lightweight repository that provides easy configuration of Continuous Integration running on CPU and GPU. Any user who wants to keep their project aligned with current and future Lightning releases can use the Ecosystem CI to configure their integrations. One of the main goals is to enable early discovery of issues through regular testing against both stable and development versions of Lightning.

Using the Ecosystem CI directly provides out-of-the-box nightly testing on CPU as well as multi-GPU compute. You can also fork the project and run it with your own custom environments and compute resources.

We designed the Ecosystem CI to provide a unified structure flexible enough to cover all practical integration needs. The integrations leverage GitHub Actions (CPU, across multiple OS versions) and Azure Pipelines (GPU, Linux only). We also provide native parallelization, so all projects are tested concurrently, using caching to speed up environment creation.

Under the hood, the platform runs two simple procedures:

  1. Prepare Environment and Install all Dependencies
  2. Copy and Execute all Linked Integration Tests

These steps can easily be extended to any other CI system, such as CircleCI, if you require testing on different types of compute.
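The two procedures above can be sketched as a simple driver script. Note that the shell commands below are illustrative placeholders only — the real Ecosystem CI implements these steps inside its GitHub Actions and Azure Pipelines jobs, and the paths shown are hypothetical:

```python
# Illustrative sketch of the two Ecosystem CI steps.
# The commands are printed rather than executed, so the flow is easy to follow.
steps = {
    "1. Prepare environment and install all dependencies": [
        "pip install -e ./project",
        "pip install -r project/requirements/test.txt",
    ],
    "2. Copy and execute all linked integration tests": [
        "cp -r project/tests integrations/",
        "python -m pytest integrations/ -v",
    ],
}

for name, commands in steps.items():
    print(name)
    for cmd in commands:
        print("  $", cmd)
```

The key property of the design is that the two steps are fully generic: any project that can be installed and whose tests can be copied into a common folder fits the same pipeline.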

Sample configuration file for adding TorchMetrics.
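As a rough sketch, such a config might look like the following. The key names here are illustrative, not the project's actual schema — check the existing files under configs/ in the Ecosystem CI repository for the authoritative format:

```yaml
# configs/PyTorchLightning/torchmetrics.yaml  (illustrative example only)
target_repository:
  HTTPS: https://github.com/PyTorchLightning/metrics.git
dependencies:            # extra packages installed before testing
  - name: pytorch-lightning
testing:
  dirs:                  # test folders copied and executed by the CI
    - tests/integrations
```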

How to Integrate a New Project

We would love to include your project in the PyTorch Lightning ecosystem and to provide integration/compatibility testing that ensures nothing your project relies on accidentally breaks…

To add a new project, follow these simple steps:

  1. Fork the Ecosystem CI repository so you can create a new Pull Request and work within a specific branch
  2. Create a new config file in the configs/<Organization-name> folder and call it <project-name>.yaml
  3. Define the runtime for CPU and link the config for GPU:
    * for CPU integrations, list the OS and Python version combinations to run with GitHub Actions
    * for GPU integrations, only add the path to the config (the OS/Linux and Python versions are fixed) to run with Azure Pipelines
  4. Add a contact to the .github/CODEOWNERS list for your organization folder or just a single project
  5. Create a Draft PR with all the requirements mentioned above
  6. (Optional) Join our Slack channel #alerts-ecosystem-ci to be notified if your project starts failing

Sample notification when a project starts to fail.

Note: If your project tests against multiple Lightning versions or branches, such as master and release, you must create a config file for each version/branch.
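For example, a project tracking both branches might keep two sibling configs that differ only in the Lightning reference they pin. Again, the key name below is illustrative, not the actual schema:

```yaml
# configs/MyOrg/my-project.yaml          — tests against Lightning master
# configs/MyOrg/my-project_release.yaml  — tests against the latest release
# Each file pins the Lightning reference it targets, e.g.:
target_versions:
  pytorch-lightning: master   # the sibling file would pin a release tag instead
```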

Photo by Christina Morillo from Pexels

Are you interested in more cool PyTorch Lightning integrations? Follow me and join our fantastic Slack community!

Next Steps

Let us introduce you to Grid.ai, built by the PyTorch Lightning creators. Our platform enables you to scale your model training without worrying about infrastructure, much as Lightning automates away the training boilerplate.

About the Author

Jirka Borovec has been working in machine learning and data science for several years at a few different IT companies. In particular, he enjoys exploring interesting real-world problems and solving them with state-of-the-art techniques. In addition, he has developed several open-source Python packages and actively participates in other well-known projects.
