Alternatives to Anaconda

Jan 19, 2022 4 min read

Table of Contents

Overview:
Advantages of using Anaconda Distribution:
Alternatives to Anaconda distribution:
Summary

Overview:

Anaconda is one of the most popular Python distributions. Until recently, its individual edition repository could be used freely. However, recent changes to its terms of service restrict free use of the Anaconda repository.

Free use requirements have the following clauses.

Even though this applies to the commercial edition, the source code branches for some of these libraries are shared between the individual edition and the commercial edition, so the individual edition is effectively no longer free either.

The rest of this post discusses the advantages of using Anaconda and, in its absence, the alternative options and their related configuration.

Advantages of using Anaconda Distribution:

Provides a package and environment manager
Excellent handling of package dependencies
Support for binary installers
Bundles the important and frequently used data science libraries
Keeps package versions and conda channels/repos up to date
User-friendly GUI support and tools such as Spyder
Multi-OS support

Alternatives to Anaconda distribution:

I looked at three alternatives we could use if we have to move away from Anaconda or Miniconda as Python distributions. In no specific order, they are:

Installing packages from Conda-Forge Channel:

conda-forge is an open source, community-driven channel supported by Anaconda. This channel has the latest versions of most packages. However, we need an installer to pull packages from this channel, and the usual installers, Anaconda and Miniconda, both fall under the new license terms going forward.

To get around this, we first install Miniforge, an open source installer available on GitHub, and then use it to install packages from conda-forge.

Steps to install miniforge:

Install Python from python.org
Install Miniforge from GitHub
Create a virtual environment and always install packages inside it so as not to break the base setup.
Use conda-forge to manage package and environment dependencies. For example, to install pandas from conda-forge, run `conda install -c conda-forge pandas`
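Once an environment is set up this way, it can be worth checking that nothing in it was pulled from a channel other than conda-forge. The sketch below is one way to do that, assuming `conda` is on the PATH and that each entry returned by `conda list --json` carries `name` and `channel` fields. The function names are made up for this illustration.

```python
import json
import subprocess


def non_conda_forge(packages):
    """Return (name, channel) pairs for packages not installed from conda-forge."""
    return [
        (pkg["name"], pkg.get("channel", "unknown"))
        for pkg in packages
        if pkg.get("channel") != "conda-forge"
    ]


def list_active_env():
    """Ask conda for the packages in the active environment as parsed JSON.

    Assumes the `conda` executable can be found on the PATH.
    """
    result = subprocess.run(
        ["conda", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling `non_conda_forge(list_active_env())` inside an activated environment should return an empty list when everything came from conda-forge. Packages installed with pip inside the environment are reported too, since they did not come from the conda-forge channel.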

Cons:

No support for binaries
No support for packages that are not in the conda-forge channel
The latest versions may not always be available

Using ActiveState distribution

I have not explored this distribution. It comes with its own installer, environment and packages. For anyone used to conda syntax and its CLI, there could be a minor learning curve in picking up the ActiveState commands.
The biggest limitation of this distribution is that only one runtime instance of Python can be run at any point in time. Depending on one's use case, if more than one instance of Python needs to be spun up within 24 hours, this distribution would require a license purchase.

Using python and pip

This is the simplest approach to creating a Python environment for data science work. All it requires is installing Python.

Once Python is installed, we have to install two or three other tools before we start working in our data science workspace.
A virtual environment manager. Python 3 ships with one (venv), but third-party environment managers such as pipenv, virtualenv and Poetry are also available if one prefers them.
pip-tools, to manage package dependencies
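As a rough illustration of the first step, the standard library's `venv` module can also create the environment from a script. This is a minimal sketch rather than a recommended setup; the `create_env` helper and the `.venv` directory name are assumptions made for the example.

```python
import os
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Calling `create_env(".venv")` does roughly what `python -m venv .venv` does on the command line. After activating the environment and installing pip-tools into it, top-level packages are listed in a `requirements.in` file, `pip-compile requirements.in` writes a fully pinned `requirements.txt`, and `pip-sync requirements.txt` makes the environment match it.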

Pros:

All the packages are free and open source
pip-tools greatly enhances dependency management between packages
Limited learning curve and very easy to use

Cons:

Once the project's requirements become a bit more complex, pip-tools will start running into issues
If we require packages from GitHub, pip-tools may not always manage the dependencies
Indirect dependency management could be an issue. Say we wanted a stats package that in turn depends on another obscure package. If we later decide we do not need the stats package, the obscure dependency may be left behind on the system.
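The leftover-dependency problem in the last point can be spotted with a small script. The following is a sketch under a few assumptions: it reads installed package metadata through `importlib.metadata`, parses requirement names with a simple regular expression instead of a full requirement parser, and ignores optional extras. The helper names and the package names in the usage note are hypothetical.

```python
from __future__ import annotations

import re
from importlib import metadata


def normalize(name: str) -> str:
    """Lowercase a project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_requirements() -> dict[str, set[str]]:
    """Map each installed distribution to the names it directly requires."""
    graph: dict[str, set[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue
        deps: set[str] = set()
        for req in dist.requires or []:
            # Skip optional extras; they are not installed by default.
            if re.search(r"extra\s*==", req):
                continue
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
            if match:
                deps.add(normalize(match.group(0)))
        graph[normalize(name)] = deps
    return graph


def find_orphans(graph: dict[str, set[str]], wanted: set[str]) -> set[str]:
    """Return installed packages not reachable from the packages we asked for."""
    keep: set[str] = set()
    stack = [normalize(w) for w in wanted]
    while stack:
        pkg = stack.pop()
        if pkg in keep or pkg not in graph:
            continue
        keep.add(pkg)
        stack.extend(graph[pkg])
    return set(graph) - keep
```

For example, `find_orphans(installed_requirements(), {"pandas"})` lists everything installed that pandas does not need, directly or indirectly. Tools such as pip itself will show up in that list unless they are added to the wanted set, so the result is a list of candidates to review, not a list of packages that are safe to uninstall.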

Summary

Anaconda seamlessly takes care of environment and package dependencies, though it may bloat our system with many unwanted packages. With the other tools, the user has to do some prep work to replicate a more or less similar environment. Depending on the user's time, prior knowledge and comfort, this could take minimal to moderate effort if the user actively works on data science projects involving multiple packages and libraries.

If there is a requirement to move development between local and cloud, or between local and a data lake environment, the installation and package management steps may need to be replicated in each environment.
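One way to reduce that replication effort is to record the exact package versions of a working environment and reinstall from that record elsewhere. The sketch below approximates what `pip freeze` prints using only the standard library; the `pinned_requirements` name is an assumption made for this example, and packages installed from local paths or GitHub would need extra handling.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)
```

Writing these lines to a `requirements.txt` file and running `pip install -r requirements.txt` in the target environment should reproduce the same package versions, provided the Python version and platform are compatible.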
