What’s __init__ for me?
Python modules
Modules in Python are simply Python files with a .py extension. The name of the module is the name of the file without the extension. A Python module can have a set of functions, classes or variables defined and implemented.
It is usually a good idea to split code into smaller modules for a couple of reasons. Primarily, modules can contain all of the code related to a particular coherent topic (e.g., all of the I/O functionality) without being cluttered by code related to something completely different (e.g., plotting). For this reason, it is common to see large classes get a dedicated module (e.g., geodataframe.py within geopandas). Secondarily, dividing code into appropriate logical units makes it easier to read and easier to understand.
Python module/package names should generally follow these conventions:
- All lowercase
- Unique on PyPI, even if you don’t want to make your package publicly available (you might want to specify it privately as a dependency later)
- Underscore-separated or no word separators at all (don’t use hyphens)
Module Initialization
The first time a module is imported into a running Python program, it is initialized by executing the code in the module once. If another module in your code imports the same module again, it is not loaded a second time. Module-level variables therefore act as a "singleton": they are initialized only once.
Python Packages
In our computer systems, we store our files in organized hierarchies. We don’t store them all in one location. Likewise, when our programs grow, we divide them into packages. Real-life projects are much larger, and a package lets us hold similar modules in one place. Just as a directory may contain subdirectories and files, a package may contain sub-packages and modules. But what distinguishes a package from a regular directory?
Well, a regular Python package has an __init__.py file in the directory. You may leave it empty, or you may store initialization code in it. If your directory does not have an __init__.py file, it isn’t a regular package; it is just a directory with a bunch of Python scripts (since Python 3.3 such a directory can still be imported as a "namespace package", but that is a separate mechanism). Leaving __init__.py empty is a perfectly good default.
Packages are namespaces that can themselves contain multiple packages and modules. They are simply directories, but with a twist.
Each package and subpackage in Python is a directory which MUST contain a special file called __init__.py. This file can be empty, and it indicates that the directory containing it is a Python package, so it can be imported the same way a module can be imported. The contents of this file define what gets brought into the package’s namespace by the import statement.
If we create a directory called foo, which marks the package name, we can then create a module inside that package called bar (the file bar.py). We also must not forget to add the __init__.py file inside the foo directory.
To use the module bar, we can import it in two ways:
import foo.bar
or
from foo import bar
In the first method, we must use the foo prefix whenever we access the module bar. In the second method, we don't, because we import the module to our module's namespace.
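The two import styles can be tried out without touching a real project. The following is only a minimal sketch: it builds the foo package in a temporary directory, and the function greet is a hypothetical placeholder that does not come from the text above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the layout described above in a throwaway directory:
#   foo/__init__.py   (empty)
#   foo/bar.py        (one hypothetical function, greet)
root = Path(tempfile.mkdtemp())
(root / "foo").mkdir()
(root / "foo" / "__init__.py").write_text("")
(root / "foo" / "bar.py").write_text(
    "def greet():\n"
    "    return 'hello from foo.bar'\n"
)

sys.path.insert(0, str(root))  # make the temporary directory importable
importlib.invalidate_caches()

# Method 1: the name bound in our namespace is `foo`, so the prefix is required.
import foo.bar
first = foo.bar.greet()

# Method 2: the name bound in our namespace is `bar`, so no prefix is needed.
from foo import bar
second = bar.greet()

# Both routes lead to the same module object; bar.py was executed only once.
same_module = foo.bar is bar
print(first, second, same_module)
```

Either style gives access to exactly the same module; the only difference is which name ends up in your own namespace.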
The __init__.py file can also declare which modules the package exports as its API, while keeping other modules internal, by defining the __all__ variable. __all__ lists the names that from foo import * brings in, like so:
# __init__.py
__all__ = ["bar"]
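To see the effect of __all__, here is a small sketch under assumed names: the package shop_pkg and its module internal are hypothetical and exist only for this illustration. It also shows that __all__ is a convention for wildcard imports, not an access control.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package with one exported module and one internal module.
root = Path(tempfile.mkdtemp())
pkg = root / "shop_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text('__all__ = ["bar"]\n')
(pkg / "bar.py").write_text("NAME = 'bar'\n")
(pkg / "internal.py").write_text("NAME = 'internal'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run the wildcard import in a scratch namespace so we can inspect what it adds.
namespace = {}
exec("from shop_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['bar']

# The internal module was not exported, but it can still be imported explicitly.
import shop_pkg.internal
internal_name = shop_pkg.internal.NAME
```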
Example of Python Packages Module Structure
Here, the root package is Game. It has sub-packages Sound, Image, and Level, and the file __init__.py. Sound further has modules load, play, and pause, apart from the file __init__.py. The Image package has modules open, change, and close, apart from __init__.py. Finally, the Level package has modules start, load, and over, apart from __init__.py.
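The layout just described can be reproduced with a few lines of Python. This is only an illustrative sketch that creates empty placeholder files in a temporary directory; the real Game package would of course contain code.

```python
import tempfile
from pathlib import Path

# Sub-packages and their modules, as listed in the paragraph above.
layout = {
    "Sound": ["load", "play", "pause"],
    "Image": ["open", "change", "close"],
    "Level": ["start", "load", "over"],
}

root = Path(tempfile.mkdtemp()) / "Game"
root.mkdir()
(root / "__init__.py").touch()              # marks Game as a package
for subpackage, modules in layout.items():
    sub_dir = root / subpackage
    sub_dir.mkdir()
    (sub_dir / "__init__.py").touch()       # marks each sub-package
    for module in modules:
        (sub_dir / f"{module}.py").touch()  # empty placeholder module

# Print every directory and file, relative to the temporary directory.
tree = sorted(p.relative_to(root.parent).as_posix() for p in root.rglob("*"))
print("\n".join(tree))
```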
However, a good module structure for the developer may or may not be a good module structure for the user. In some cases, the user might not need to know that there are various modules underlying the package. In other cases, there might be good reasons that the user should explicitly ask only for the modules they need. That is what I want to explore here: what are the different use cases and what approach do they call for from the package developer.
An example package
Python packages come in a variety of structures, but let’s create a simple demo one here that we can use in all the examples.
/src
    /example_pkg
        __init__.py
        foo.py
        bar.py
        baz.py
    setup.py
    README.md
    LICENSE
It is composed of three modules: foo.py, bar.py, and baz.py, each of which has a single function that prints the name of the module where the function resides.
foo.py:
def foo_func():
    print('this is a foo function')
bar.py:
def bar_func():
    print('this is a bar function')
baz.py:
def baz_func():
    print('this is a baz function')
Your code as a Grocery store
Now is a good time to acknowledge that talking about import statements and package structures can be pretty hard to follow, especially in text. To help make things a bit clearer, let’s think about a Python package as a grocery store and your users as the shoppers. As the developer, you are the store owner and manager. Your job is to figure out how to set up your store so that you serve your customers best. The structure of your __init__.py file will determine that setup. Below, I’ll walk through three alternative ways to set up that file: the general store, the convenience store, and the online store.
The General Store
In this scenario, the user gets access to everything right away on import example_pkg. In their code, they only need to type the package name and the class, function, or other object they want, regardless of what module of the source code it lives in.
This scenario is like an old-timey general store. Once the customer walks in the door, they can see all the goods placed with minimal fuss in bins and shelves around the store.
Behind the scenes
# __init__.py
from .foo import *
from .bar import *
from .baz import *
User implementation
import example_pkg
example_pkg.foo_func()
example_pkg.bar_func()
example_pkg.baz_func()
Advantages
- Users do not need to know module names or remember, for instance, which function is in which module. They only need the package name and the function name. In the general store, all the products are on display with minimal signage. The customer doesn’t need to know which aisle to go down.
- Users can access any functionality once the top-level package is imported. Everything is on display.
- Tab-completion gives you everything with just example_pkg.<TAB>. Tab-completion is like the general store grocer who knows exactly where everything is and is happy to help.
- When new features are added to modules, you do not need to update any import statements; they will automatically be included. In the general store, there is no fancy signage to change. Just put a new item on the shelf.
Disadvantages
- Requires that all functions and classes be uniquely named (i.e., there are no functions called save() in both the foo and bar modules). You don’t want to confuse your customers by putting apples in two different bins.
- If the package is large, it can add a lot to the namespace and (depending on a lot of factors) can slow things down. A general store can have a lot of little odds and ends that any individual customer might not want. That might be overwhelming for your customers.
- Requires a bit more effort and vigilance to keep some elements away from the user. For example, you might need to use underscores to keep functions from importing (e.g., _function_name()). Most general stores don’t have a big storage area where things like brooms and mops are kept; those items are visible to the customer. Even if it is unlikely that they would pick up a broom and start sweeping your floors, you might not want them to. In that case, you have to take extra steps to hide those supplies from view.
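Two of these disadvantages are easy to reproduce. The sketch below is an assumption-laden illustration, not part of the example package: the package name general_store_pkg, the duplicated save() functions, and the helper _stock_helper are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical general-store package: both modules define save(),
# and foo also has an underscore-prefixed helper.
root = Path(tempfile.mkdtemp())
pkg = root / "general_store_pkg"
pkg.mkdir()
(pkg / "foo.py").write_text(
    "def save():\n"
    "    return 'foo.save'\n"
    "\n"
    "def _stock_helper():\n"
    "    return 'internal'\n"
)
(pkg / "bar.py").write_text(
    "def save():\n"
    "    return 'bar.save'\n"
)
(pkg / "__init__.py").write_text(
    "from .foo import *\n"
    "from .bar import *\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
import general_store_pkg as store

# The later wildcard import silently replaced foo's save() with bar's.
winner = store.save()
# Names starting with an underscore are skipped by `import *`.
helper_exposed = hasattr(store, "_stock_helper")
print(winner, helper_exposed)  # bar.save False
```

Note that no error or warning is raised when one save() replaces the other, which is why unique names matter so much in this layout.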
Recommendations
- Use when it is hard to predict the workflow of a typical user (e.g., general packages like pandas or numpy). This is the “general” part of general store.
- Use when the user might frequently bounce around between different modules (e.g., the leiap package)
- Use when function and class names are very descriptive and easy to remember and specifying the module names will not improve readability. If your products are familiar things like fruits and vegetables, you don’t need a lot of signage; customers will figure things out quite easily.
- Use with just a few modules. If there are many modules, it can be more difficult for a new user to find the functionality they want in the docs. If your general store gets too big, customers won’t be able to find the things they want.
- Use when objects might be added or removed frequently. It’s easy to add and remove products in the general store without disrupting the customer.
Well-known examples
- pandas
- numpy (with additional complexity)
- seaborn
The Convenience Store
By far the easiest to read and understand is a variation on the general store scenario that I call the convenience store. Instead of from .module import *, you can specify exactly what to import with from .module import func within __init__.py.
The convenience store shares a lot of traits with the general store. It has a relatively limited selection of goods which can be replaced at any time with minimal hassle. The customer doesn’t need a lot of signage to find what they need because most of the goods are easily in view. The biggest difference is that a convenience store has a bit more order. The empty boxes, brooms, and mops are all kept out of view of the customer and only the products for sale are on the shelves.
Behind the scenes
# __init__.py
from .foo import foo_func
from .bar import bar_func
from .baz import baz_func
User implementation
import example_pkg
example_pkg.foo_func()
example_pkg.bar_func()
example_pkg.baz_func()
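As a rough sketch of how this keeps the "brooms and mops" off the shelves, the snippet below builds a one-module variant of the package in a temporary directory. The package name convenience_store_pkg and the function stockroom_helper are hypothetical names chosen for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical convenience-store package: only foo_func is put on the shelf.
root = Path(tempfile.mkdtemp())
pkg = root / "convenience_store_pkg"
pkg.mkdir()
(pkg / "foo.py").write_text(
    "def foo_func():\n"
    "    print('this is a foo function')\n"
    "\n"
    "def stockroom_helper():\n"
    "    print('not meant for customers')\n"
)
(pkg / "__init__.py").write_text("from .foo import foo_func\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
import convenience_store_pkg as store

store.foo_func()

on_shelf = hasattr(store, "foo_func")                 # explicitly imported
helper_on_shelf = hasattr(store, "stockroom_helper")  # never imported at the top level
# Out of view is not locked away: the helper is still reachable through its module.
helper_reachable = hasattr(store.foo, "stockroom_helper")
print(on_shelf, helper_on_shelf, helper_reachable)
```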
Advantages
Shares all of the advantages of the general store, and adds:
- Somewhat easier to control what objects are made available to the user
Disadvantages
- __init__.py can end up very cluttered if there are many modules with many functions. Like the general store, a convenience store that is too cluttered will be harder for customers to navigate.
- When new features are added to a module (i.e., new classes or functions), they have to be explicitly added to the __init__.py file too. Modern IDEs can help detect missed imports, but it is still easy to forget. Your convenience store has some minimal signage and price tags. You have to remember to update these when you change what is on the shelf.
Recommendations
I would add the following to the recommendations from the general store:
- Especially useful when your modules more or less consist of a single class (e.g., from geopandas.geodataframe import GeoDataFrame)
- Use when you have a small number of objects to import
- Use when your objects have clear names
- Use when you know exactly which objects your users will need and which they will not
- Use when you do not expect to frequently add a lot of new modules and objects that will need to be imported.
Well-known example
- geopandas
Online grocery shopping
Anyone who has bought groceries online knows that ordering the right product can take some effort on the part of the customer. You have to search for the product, choose a brand, choose the desired size, etc. All of these steps, however, allow you to buy exactly what you want from a nearly limitless stockroom.
For some Python packages, it might be more prudent to eschew the convenience of simply importing the entire package and instead force the user to be more explicit about which pieces are being imported. This allows you as the developer to include many more pieces in the package without overwhelming the user.
Behind the scenes
# __init__.py
import example_pkg.foo
import example_pkg.bar
import example_pkg.baz
User implementation
There are (at least) three different methods that a user could adopt in this case.
import example_pkg
example_pkg.foo.foo_func()
example_pkg.bar.bar_func()
example_pkg.baz.baz_func()
or
from example_pkg import foo, bar, baz
foo.foo_func()
bar.bar_func()
baz.baz_func()
or
import example_pkg.foo as ex_foo
import example_pkg.bar as ex_bar
import example_pkg.baz as ex_baz
ex_foo.foo_func()
ex_bar.bar_func()
ex_baz.baz_func()
Advantages
- Simplifies the __init__.py file. Only needs to be updated when a new module is added. Updating your online store is relatively painless. You only need to change a setting in your product database.
- It is flexible. It can be used to import only what the user needs or to import everything. The customers in your online store can search for only what they want or need. No need to bother looking through a “fruit” bin when all you need is an apple. But if they do want everything in the “fruit” bin, they can get that too.
- Aliasing can clean up long package.module specifications (e.g., import matplotlib.pyplot as plt). While online grocery shopping can be a big pain at first, if you save your shopping list for the future, your shopping can be done a lot quicker.
- Can have multiple objects with the same name (e.g., functions called save() in both the foo and bar modules)
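The last advantage is worth seeing in action. The following sketch assumes a hypothetical package named online_store_pkg in which both modules define a save() function; because every call goes through its module, the two never collide.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical online-store package: __init__.py imports the modules themselves,
# not their contents, so same-named functions stay in separate namespaces.
root = Path(tempfile.mkdtemp())
pkg = root / "online_store_pkg"
pkg.mkdir()
(pkg / "foo.py").write_text(
    "def save():\n"
    "    return 'saved by foo'\n"
)
(pkg / "bar.py").write_text(
    "def save():\n"
    "    return 'saved by bar'\n"
)
(pkg / "__init__.py").write_text(
    "import online_store_pkg.foo\n"
    "import online_store_pkg.bar\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import online_store_pkg
from online_store_pkg import bar as store_bar  # aliasing keeps call sites short

foo_result = online_store_pkg.foo.save()
bar_result = store_bar.save()
print(foo_result, "/", bar_result)  # saved by foo / saved by bar
```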
Disadvantages
- Some of the import methods can make code more complicated to read. For example, foo.foo_func() does not indicate which package foo comes from.
- The most readable method (import example_pkg, with no alias) can lead to long code chunks (e.g., example_pkg.foo.foo_func()) that clutter things up.
- Can be hard for users to track down all of the possible functionality. In your online grocery store, it would be hard for the shopper to see all of the possible goods.
Recommendations
- Use when you have a complex series of modules, most of which any one user will never need.
- Use when import example_pkg imports a LOT of objects and might be slow.
- Use when you can define pretty clear workflows for different kinds of users.
- Use when you can expect the user to be able to navigate your documentation well.
Examples
- matplotlib
- scikit-learn
- bokeh
- scipy
These packages actually use combinations of different approaches in their __init__.py files. I include them here because to users, they are generally used à la carte (e.g., import matplotlib.pyplot as plt or import scipy.stats.kde).
Conclusion
The three scenarios I outlined are certainly not the only possible structures for a Python package, but I hope they cover most of the cases that anyone learning this from a blog might be considering. In conclusion, I’ll return to a point I made earlier: a good module structure for the developer may or may not be a good module structure for the user. Whatever your decision, don’t forget to put yourself in the user’s shoes, especially because that user is most likely to be you.
Notes
Use 'import module' or 'from module import'?
Question: I've tried to find a comprehensive guide on whether it is best to use import module or from module import?
Answer:
The difference between import module and from module import foo is mainly subjective. Pick the one you like best and be consistent in your use of it. Here are some points to help you decide.
import module
- Pros:
    - Less maintenance of your import statements. You don't need to add any additional imports to start using another item from the module
- Cons:
    - Typing module.foo in your code can be tedious and redundant (tedium can be minimized by using import module as mo and then typing mo.foo)
from module import foo
- Pros:
    - Less typing to use foo
    - More control over which items of a module can be accessed
- Cons:
    - To use a new item from the module you have to update your import statement
    - You lose context about foo. For example, it's less clear what ceil() does compared to math.ceil()
Either method is acceptable, but don't use from module import *.
For any reasonably large set of code, if you import * you will likely be cementing it into the module, unable to be removed. This is because it is difficult to determine what items used in the code are coming from 'module', making it easy to get to the point where you think you don't use the import any more but it's extremely difficult to be sure.
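The trade-offs in this answer can be seen side by side using the standard library's math module, which the answer itself mentions. This is just a small illustration; the alias m is an arbitrary choice.

```python
# Style 1: import the module; every use carries its context.
import math
with_context = math.ceil(2.1)

# Style 2: import one name; shorter to type, but the origin of ceil()
# is only visible in the import line.
from math import ceil
without_context = ceil(2.1)

# Style 1 with an alias: keeps some context while reducing typing.
import math as m
with_alias = m.ceil(2.1)

# All three spellings refer to the same function object.
same_function = ceil is math.ceil is m.ceil
print(with_context, without_context, with_alias, same_function)  # 3 3 3 True
```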
Sources
- https://towardsdatascience.com/whats-init-for-me-d70a312da583
- https://www.learnpython.org/en/Modules_and_Packages
- How To Package Your Python Code
- http://xion.io/post/code/python-all-wild-imports.html
- https://data-flair.training/blogs/python-packages/
- https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import