Getting Along with Python

Make installable Python packages

Learning to make installable Python packages isn’t as hard as you might think, and it’s very useful. It helps you share your work for others to use, if you want to. It also lets you modularize your own projects into separately installable components, so you can reuse more code across projects and manage large and complex projects more easily. It’s only a little work up front.

If you are at the point of thinking about how you are going to copy your files to locations on a user’s system, it’s about time you made a package and let Python’s normal packaging tools automate and standardize most of the process for you (including creation of archive files or whatever other form you want to distribute your code in).

Python software which insists on a homegrown, nonstandard installation process is generally a huge pain for users and it’s harder on the author. If you’re serious about using Python, please learn basic Python packaging.

Note

Most details not covered in this guide will be covered somewhere in the extensive Python docs on distutils. Ignore old docs which reference ‘distribute’ and be careful to verify what you read about ‘setuptools’. The distutils docs will tell you 99% of what you need to know to do Python packaging, and this doc tries to provide the last 1%.

Lower-frustration directory structure

Unless you have specific technical reasons to do otherwise, it should minimize your frustration to structure your project’s directory as follows:

foobar/
    setup.py
    LICENSE
    README.rst
    docs/
    foobar/
        __init__.py
        tests/
            __init__.py
    scripts/
        narf

That might seem a little complex at first, but it’s 95% of what you will ever need. I’ve personally tried many variations, fighting the prevailing advice on project layout for years, so please take my word that the prevailing advice reflects accumulated wisdom on how to live with Python. Start with this standard way that makes it easy for tools and other people to understand what’s going on, and easy for you to expand as you need to add things to your package. After you have actually done it once or twice, it probably won’t seem complex any more and you will not have to think about it very much. (You can also automate the setup of a project skeleton with any templating tool like mrbob - or roll your own if you really feel like it).
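If you do roll your own, the core of it can be very small. The sketch below uses a hypothetical helper named make_skeleton (not part of any library) that creates the top-level directory, empty setup.py, LICENSE and README.rst files, a docs/ directory, the importable package with its __init__.py, a tests/ subpackage, and a scripts/ directory. It does not write any file contents for you, and it leaves files that already exist alone.

```python
from pathlib import Path


def make_skeleton(root, name):
    """Create an empty project skeleton named `name` under directory `root`.

    Hypothetical helper for illustration; returns the top-level directory.
    """
    top = Path(root) / name
    # Directories: docs/, the importable package, its tests/ subpackage, scripts/.
    for sub in ("docs", name, name + "/tests", "scripts"):
        (top / sub).mkdir(parents=True, exist_ok=True)
    # Empty files. touch(exist_ok=True) never truncates an existing file;
    # it only updates its modification time.
    for rel in ("setup.py", "LICENSE", "README.rst",
                name + "/__init__.py", name + "/tests/__init__.py"):
        (top / rel).touch(exist_ok=True)
    return top
```

Calling make_skeleton(".", "foobar") would produce the foobar/ layout; you then fill in setup.py, LICENSE and README.rst by hand.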

The following sections look in more detail at what each piece does and why it is included in the default structure. (Of course, you will want to put all these files under version control to track changes to them.)

Top level directory

The top level directory name doesn’t matter. In the example, I named it foobar because my example package is named foobar. But whatever you call it, you do need a top level directory to contain things like setup.py.

Don’t confuse this with the directory containing your actual code, even though they will often have the same name.

setup.py

This required file must be in the top level directory, with this exact name. It is what makes the package installable and provides metadata about it, like the author name and whatnot.

Warning

A possible problem with setup.py is that it contains actual Python code which runs. While this may initially seem exciting, overusing it can make your package installation process brittle and likely to fail. With just a little restraint, this is not a big problem in practice. Just don’t put too much code in.

For instructional purposes, here is the smallest setup.py which could sort-of work (although technically, this is missing required fields, so please don’t actually use it as-is):

from distutils.core import setup

setup(
    packages=['foobar'],
)

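distutils is a module that comes with Python itself (in Python 2 and in Python 3 up through 3.11; it was deprecated by PEP 632 and removed from the standard library in Python 3.12), so on those versions everyone has it. It works.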

All this setup.py does is import a function named setup from distutils and run it, declaring the ‘foobar’ subdirectory as a top-level package to install. (A lone foobar.py module would instead be listed with py_modules=['foobar'].) That is all other tools need in order to do things like generate a .tar.gz of the package, or install it.

If you wanted to include a command-line script that users could run as ‘narf’ after installing foobar, you would write it in scripts/narf (NOT narf.py) and add a scripts= directive to setup.py (again, a real one should be longer):

from distutils.core import setup

setup(
    packages=['foobar'],
    scripts=['scripts/narf'],
)

This would then copy ‘narf’ into an appropriate place when the user installed the package foobar (e.g., on PATH).

For the sake of understanding, these examples have been much simpler than they should be. In real practice, you would probably want to add a little more information on the project, which can also be reflected on PyPI if you want to publish your package there. For example:

from distutils.core import setup

setup(
    name="foobar",
    version="0.0.0",
    author="Pat Exampleperson",
    author_email="pat@example.com",
    url="http://example.com/projects/foobar",
    description="Munge important files according to the foobar algorithm",
    license="MIT",
    packages=['foobar'],
    scripts=['scripts/narf'],
    classifiers=[
        "Development Status :: 2 - Pre-Alpha",
        "Intended Audience :: Developers",
        "Programming Language :: Python :: 3.3",
    ]
)

name, version, and url are required fields. For more on the fields, see the distutils doc on additional meta-data. If you need to know the kinds of things you can put in the classifiers list, check out the complete list of PyPI classifiers. You might also be interested in Python’s distutils doc on writing setup.py, which covers the same territory as this section in more detail.
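To make the “required fields” point concrete, here is a rough sketch of the kind of check that python setup.py check performs on metadata. It is not the distutils implementation: missing_metadata is a hypothetical helper, and it only looks at the three fields named above (the real command also checks for author or maintainer information).

```python
# Fields the text above names as required.
REQUIRED_FIELDS = ("name", "version", "url")


def missing_metadata(metadata):
    """Return the required fields that are absent or empty in `metadata`."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Keyword arguments you might pass to setup(), collected in a dict.
metadata = {
    "name": "foobar",
    "version": "0.0.0",
    "packages": ["foobar"],
}
print(missing_metadata(metadata))  # ['url']
```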

If you plan on releasing more than one version of your project, please consider using Semantic Versioning to make things easier for your users.

LICENSE

LICENSE should be a flat text file in the top level directory, clarifying the conditions under which you (assuming you hold copyright to the code) permit others to legally use, modify and redistribute the code in this package. Consider this a required file.

Without a clear license, your code is radioactive - legally risky to use in any way.

It’s probably a bad idea to make your own license; choose from standard licenses like GPL or the MIT license. If you want to release something as public domain, please research the legal implications of ‘public domain’ dedications, as these may be unclear and/or differ between countries. But you should still make clear note of your intent in the LICENSE file.

Everyone hates legal boilerplate and lawyers, but please don’t mess things up for people who want to use your code by choosing a vague license that doesn’t clearly tell them the conditions under which they are legally safe.

README

README should be some kind of text file explaining what your package is, with a little top-level summary documentation.

This is mostly for the benefit of humans who might use your package, according to time-honored tradition; but may also be used by sites where you might put your code, like PyPI or GitHub, to generate a summary page for your project.

Since this is Python, you might consider writing your README as reStructuredText and naming it README.rst (or symlinking it under that name) to tell some tools (like GitHub) to render it accordingly.

If you want to, you can have your setup.py include the content of your README.rst file as the package’s long_description (e.g. for display on PyPI).

from distutils.core import setup

setup(
    # ... name, version, packages and other arguments ...
    long_description=open('README.rst').read(),
    # ...
)

(By the way, this is not normally a good way to open and read a file, but it should work fine for setup.py. You could also write a function in setup.py to read the contents of the file more properly; it’s up to you.)
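One way to read it more properly, as a sketch: the hypothetical read_text helper below closes the file when done, decodes it explicitly as UTF-8 (an assumption about how your README is encoded), and resolves the filename relative to the directory containing setup.py rather than whatever directory setup.py happens to be run from. It uses io.open because that accepts encoding= on both Python 2 and Python 3.

```python
import io
import os


def read_text(filename, base_dir=None):
    """Return the contents of `filename` as text, closing the file afterwards.

    Relative names are resolved against `base_dir`, which defaults to the
    directory containing this setup.py.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    with io.open(os.path.join(base_dir, filename), encoding="utf-8") as f:
        return f.read()


# In setup.py, pass the result to setup():
#     long_description=read_text('README.rst'),
```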

docs/

docs/ is where you will put files used to build Python package documentation, or just some simple text docs. The name ‘docs’ is arbitrary, but it’s clear and short and conventional. If you don’t want to include any documentation, you can omit this directory, but any quality package has some documentation.

See also the section Document your project.

Code directory

This is a subdirectory underneath the top-level, containing the actual Python modules/subpackages you want to make installable. It contains an __init__.py file (which can be empty) to mark this directory as an importable package. Unlike the top-level directory, its name is critically important; it will be the top-level importable name of the package. Again, remember: the containing directory isn’t a package and isn’t imported. It is this subdirectory containing __init__.py which will be the package.

In this example, it is foobar/foobar: the first foobar before the slash is an arbitrary name for the top-level container, and the second foobar after the slash is the actual importable name of the package. (Remember that foobar isn’t a real package name, just an arbitrary example. In reality, you would use whatever you wanted the importable name of your top-level package to be in the places I have put the word ‘foobar’.)

The reason we do not name it something like src/ (which used to be common to do) is that we cannot so easily use nice tools like pip install -e with it. Just use the package name.

The reason we do not just put everything under the top-level directory (which I used to do, with difficulty) is that doing that will make it unnecessarily hard to write setup.py, which is happiest copying subdirectories underneath where setup.py is kept. And other reasons I forget. Just put it under the top-level directory.

Note

It is actually possible to make an installable package which is all contained in one .py file, with some adjustments to setup.py. But there is no real advantage to that, and making a subdirectory will simplify things if you want to upgrade to multiple modules in a package. It is also less new thinking to do for each new package.

Subpackages

To make a subpackage, just create a subdirectory under the package directory, with its own __init__.py. Example: if your package top level is foobar and you create a subdirectory under it named smorf, this will be accessible with ‘import foobar.smorf’ or ‘from foobar import smorf’. If the smorf/ subdirectory contains a module ‘handy.py’, that can be imported with ‘from foobar.smorf import handy’.
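One detail worth knowing: with distutils, packages= is not recursive, so foobar.smorf is only installed if it is listed too (packages=['foobar', 'foobar.smorf']). setuptools offers find_packages() to generate that list. The sketch below shows the same idea using only the standard library; list_packages is a hypothetical helper, and unlike find_packages it has no include/exclude options and ignores namespace packages.

```python
import os


def list_packages(where="."):
    """Return sorted dotted names of all packages found under `where`.

    A package is a directory containing __init__.py; `where` itself (the
    top-level project directory) is never treated as a package.
    """
    packages = []
    for dirpath, dirnames, filenames in os.walk(where):
        rel = os.path.relpath(dirpath, where)
        if rel == os.curdir:
            continue  # the top-level directory: look inside, but don't list it
        if "__init__.py" in filenames:
            packages.append(rel.replace(os.sep, "."))
        else:
            dirnames[:] = []  # not a package (e.g. docs/): don't descend into it
    return sorted(packages)
```

Run from the top-level directory, list_packages() would return ['foobar', 'foobar.smorf'] for the example above, plus 'foobar.tests' if that subpackage exists.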

tests/

foobar/tests should be the default place to put automated tests.

Use this conventional location and name because that helps tools like test runners and IDEs to find it, as well as humans who want to quickly find the tests.

Put the tests directory inside the package directory (foobar/foobar/ in this example) and put an __init__.py in it to make it a subpackage. Technically, you can put tests either underneath the top-level package directory (in the source tree) or underneath the very topmost directory (outside of the source tree, in parallel with it). But the first way can be nice because it lets people easily run your tests after they install the package, to check whether things are working.

Consistently putting the tests for foobar/somemodule.py in foobar/tests/somemodule.py makes it easy to find tests corresponding to tested things, and vice versa. (For example, you might want to bind a key in your editor to ‘switch to the test for this code’ or ‘switch to the code for this test’).

For an introduction to testing, see Write automated tests.

scripts/

scripts/ is where you will put any files you want to have installed as command-line scripts. You can technically use any name you want (some use bin/), but use something clear to humans. If you don’t need to install any scripts, then you can omit this directory altogether and that is very normal for libraries.

Assuming you do have a command-line script in a file called exactly scripts/narf, you can have it installed through setup.py by including 'scripts/narf' in the scripts= value passed to setup():

from distutils.core import setup

setup(
    # ... name, version, packages and other arguments ...
    scripts=['scripts/narf'],
    # ...
)

Scripts you include with a package should ideally do nothing except import a main function from the package and run it. Resist the temptation to put lots of code into the script file itself. This lets the same script file work with any version of the package. (Sometimes users end up with old/mismatched scripts, and it is nice to handle this gracefully.) It also lets the whole implementation be reused or tested from python code.
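As a sketch of that split, assuming a hypothetical module foobar/cli.py: all the real work lives in a main() function inside the package, which takes an optional argument list (so tests can call it directly) and returns an exit status. The installed script then only needs to import and call it, as the comment at the bottom shows. The command-line options are made up for illustration.

```python
# foobar/cli.py (hypothetical module name)
import argparse


def main(argv=None):
    """Run the narf command; return a process exit status (0 means success).

    When `argv` is None, argparse reads sys.argv[1:]; tests can pass a list.
    """
    parser = argparse.ArgumentParser(prog="narf")
    parser.add_argument("filenames", nargs="*", help="files to munge")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)
    for name in args.filenames:
        if args.verbose:
            print("munging", name)
    return 0


# scripts/narf then contains nothing but:
#
#     #!/usr/bin/env python
#     import sys
#     from foobar.cli import main
#     sys.exit(main())
```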

If you write command-line scripts for other people to use, there are a number of specific practices you should be using to make those scripts standard and robust; see scripts.

__main__

There’s an additional nice mechanism for including scripts with your packaged library. If you put a file named __main__.py in a package directory (suppose it is called foobar), and then run python -m foobar, then Python will run that __main__.py. No hashbang line is necessary, and setup.py should not include this as a script.

This is useful if your package is really primarily a library but includes a demo or useful mini-utility, something like that, or if you already happen to know that your package won’t be allowed to put its script on $PATH.
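To see the mechanism in action without installing anything, the sketch below builds a throwaway package named foobar in a temporary directory, gives it a __main__.py, and runs python -m foobar with the subprocess module. run_main_demo and the printed message are hypothetical; the part that matters is that __main__.py sits inside the package directory next to __init__.py.

```python
import os
import subprocess
import sys
import tempfile


def run_main_demo():
    """Create a throwaway foobar package with a __main__.py and run it via -m."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "foobar")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "__main__.py"), "w") as f:
            f.write('print("foobar demo running")\n')
        # Putting root on PYTHONPATH makes the throwaway package importable.
        env = dict(os.environ, PYTHONPATH=root)
        result = subprocess.run(
            [sys.executable, "-m", "foobar"],
            cwd=root, env=env, capture_output=True, text=True, check=True,
        )
    return result.stdout


print(run_main_demo().strip())  # foobar demo running
```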

Specifying Dependencies

Many packages depend on other packages, and it would be nice to have them installed when someone goes to install your package.

While you develop, you can just install everything mentioned in requirements.txt with pip install -r requirements.txt. And this is fine for many apps: you can tell your users to make a new virtualenv and run pip install -r requirements.txt in it to prepare their deployment. It is definitely enough for testing: just document in the README that users should install from test_requirements.txt (or whatever you name it), which can also be used from tox.ini if you use tox. (For more about virtualenv, see Manage Python dependencies with virtualenv.)

But the presence of a requirements.txt doesn’t mean that anyone who installs your package with pip install will get the things from requirements.txt installed. This is probably not what you want if you are redistributing your code to other people, expecting them to just run pip install mypackage.

distutils does let you declare requirements in setup.py using the requires= argument to setup(). But as far as I know, nothing uses this field, so that’s a dead end.

The only way I know of right now to automatically have your package’s dependencies install along with it is to use setuptools instead of distutils. Nowadays, most Python installations will come with setuptools, because they come with pip. However, there are some longer-term caveats I will discuss shortly.

Having your dependencies installed with setuptools requires two adjustments: the first is to import setuptools instead of distutils, and the second is to use install_requires instead of requires. In this example, we are specifying that we want version 1.0.0 of a package called something, and that we expect that to be automatically installed when our package is installed:

from setuptools import setup

setup(
    name="something_new",
    version="0.0.1",
    packages=['something_new'],
    install_requires=["something==1.0.0"],
)

These days, this is probably the single most convenient way to set up a Python package. However, I find the requirement to use setuptools for this purpose really unfortunate, for several reasons.

  • The situation is unnecessarily confusing. Instead of having one obvious way to make a Python package, there are two modules. Which to use? What’s worse, they work differently. For starters, instead of supporting distutils requires, setuptools created its own parameter called install_requires.
  • setuptools has a maze of strange old features. These attract the interest of users who then either find that they don’t work, that they are actually bad ideas, or that using them locks the project into setuptools. This is not a big problem, until setuptools starts doing something weird.
  • setuptools’ code is a bit of a mess, and it does questionable things like monkey-patching over distutils, so it isn’t uncommon to find it doing something weird.
  • setuptools has never been actually standardized as the Python packaging library, so there is no guarantee that it will always be available, or that any future standard will have backwards compatibility with its many quirks. Whenever setuptools is replaced with something official, the adjustment will be very painful to projects which depended on all the old features and idiosyncrasies attached only to setuptools.

So my general advice is to use setuptools, but to try to stick to the features that it shares with distutils, with the exception of install_requires.

And maybe someday Python will finally fix its packaging situation, like they were going to do with the standard “packaging” module before it was killed for some strange reason.

Making Distribution Files

Once you have your package set up, you should be able to use it to make .tar.gz or .zip files containing the package. To do that, go into your top-level directory containing setup.py and run:

python setup.py sdist

The archive will be placed under a subdirectory called dist/. The included files will be listed in a temporary file called MANIFEST. Check these to see what was included.
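Checking what went in can also be scripted. A sketch using the standard tarfile module: sdist_members (a hypothetical helper) returns the member names of one archive, and the commented loop at the end applies it to every .tar.gz under dist/.

```python
import tarfile


def sdist_members(archive_path):
    """Return the sorted member names of a .tar.gz source distribution."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# From the top-level directory, after running "python setup.py sdist":
#
#     import glob
#     for path in glob.glob("dist/*.tar.gz"):
#         print(path)
#         for name in sdist_members(path):
#             print("   ", name)
```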

You may want to read about what files are automatically included in an sdist archive.

Python packages should be listed in the packages=[...] line, scripts in scripts=[...] and data files needed by the packages in package_data=[...]. If for any reason those don’t cover your needs, anything else (maybe docs?) can be included by writing a file called MANIFEST.in which lives in the same place that your setup.py does. See Python’s docs on how to write MANIFEST.in. This could be a file which says “include docs/*.txt”, or whatever you want.

Another kind of distribution file you can make is a “wheel”. Wheels are a newer kind of package file with some advantages, like installing faster and allowing you to install binaries (for example, a pre-compiled C extension on Windows or OS X; though binary wheels have not properly supported Linux so far).

In order to generate a wheel from your package, make sure that you have the wheel package installed (pip install wheel), and then run:

python setup.py bdist_wheel

If you have code which supports both Python 2 and Python 3, and you are not using the 2to3 tool to achieve that, and you aren’t using C extensions, then you should create a file called setup.cfg at the project root containing:

[bdist_wheel]
universal = 1

This will make it possible to upload your “universal wheel” to PyPI and have it install correctly for Python 2 and Python 3 users across platforms.
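You can tell whether you actually got a universal wheel from its filename. Wheel filenames follow the pattern name-version(-build)-pythontag-abitag-platformtag.whl (PEP 427), and a universal wheel ends in py2.py3-none-any.whl. The helpers below are a hypothetical sketch that only inspects the name; they do not open the archive.

```python
def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # name-version-py-abi-plat, optionally with a build tag after the version
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    return tuple(parts[-3:])


def is_universal(filename):
    """True if the wheel claims to work on Python 2 and 3 on any platform."""
    python_tag, abi_tag, platform_tag = wheel_tags(filename)
    return ({"py2", "py3"} <= set(python_tag.split("."))
            and abi_tag == "none" and platform_tag == "any")
```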

I am omitting some details about wheels here. If you want to learn more, please see the documentation for the wheel module.

Making Releases

In order to make an initial release of your project, you will need to go through several steps.

  1. Make sure you have an account on PyPI.
  2. Check your setup.py for validity using python setup.py check. Solve any issues it finds.
  3. Double check that your package has a polite name and does not contain any nonsense like “Pat Exampleperson”.
  4. Use python setup.py register to upload the data about your package from your setup.py. This may prompt you for login information it can use and you can decide for yourself whether and how you want this information stored.

Now you are ready to upload some code. Before every release of your package, always check it over: make sure its tests are passing, run syntax/style checkers on it, and make sure it installs properly. Distributing broken code is embarrassing and frustrates your users.

Then you can use this command to build an archive file (as in python setup.py sdist) and upload it to PyPI so others can install it:

python setup.py sdist upload

If you want to upload a wheel instead, here is how:

python setup.py bdist_wheel upload

Assume these uploads are forever: once you’ve uploaded something you can always assume someone downloaded it, and it’s not polite to actually remove an old version when someone might need it for some old project. Look at it as an added incentive to thoroughly test code before pushing it up to PyPI.

When you want to release another version of your package, change the version in your setup.py and repeat this process.

Semantic Versioning

If you are going to release a package to the public, it’s worth thinking about how you will manage version numbers. Semantic Versioning formalizes a discipline in setting your version numbers, which communicates to your users when a release breaks backward compatibility and when it doesn’t.

The basics, according to http://semver.org:

Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes.
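Mechanically, these three rules amount to a small function. The full specification at semver.org also says that incrementing a part resets the parts to its right to zero, which the hypothetical bump helper below does; it ignores pre-release and build labels such as 1.0.0-alpha.

```python
def bump(version, part):
    """Return MAJOR.MINOR.PATCH `version` with `part` incremented.

    Parts to the right of the incremented one are reset to zero.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump("1.4.2", "minor") gives "1.5.0".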

The exception is your 0.x.x series, in which you do not have to increment the major version from 0 for every backward-incompatible change; just increment the minor version instead. Publishing a package as 0.x.x means that you are not ready to make ironclad long-term commitments not to break backward compatibility.
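The 0.x.x exception matters to anyone reading your version numbers. Here is a sketch of how a user (or a tool) might interpret an upgrade under the convention described here; may_break is a hypothetical name, and it only handles plain MAJOR.MINOR.PATCH strings.

```python
def may_break(old, new):
    """Return True if upgrading from `old` to `new` may break compatibility.

    A MAJOR change may break things; within the 0.x.x series, so may a
    MINOR change.
    """
    old_major, old_minor, _ = (int(part) for part in old.split("."))
    new_major, new_minor, _ = (int(part) for part in new.split("."))
    if old_major == 0 and new_major == 0:
        return new_minor != old_minor
    return new_major != old_major
```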