Monday, June 30, 2025

Data Science: From School to Work, Part V


Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful! (Source)

Joe Armstrong (co-designer of the Erlang programming language)

This is the fifth article about Python in the series “Data Science: From School to Work.” Since the beginning, you have learned how to manage your Python project with UV, how to write clean code using PEP and SOLID principles, how to handle errors and use loguru to log your code, and how to write tests.

Now you are able to create working, production-ready code. But code is never perfect and can always be improved. A final (optional, but highly recommended) step in creating code is optimization.

To optimize your code, you need to be able to observe what is going on in it. To do so, we use tools called profilers. They generate profiles of your code, that is, sets of statistics describing how often and for how long the various parts of the program executed. Profilers make it possible to identify bottlenecks and parts of the code that consume too many resources. In other words, they show where your code should be optimized.

Today, there is such a proliferation of profilers in Python that the default profiler in PyCharm is named yappi, for “Yet Another Python Profiler”.

This article is therefore not an exhaustive list of all existing profilers. It presents one tool for each aspect of the code we want to profile: memory, time, and CPU/GPU consumption. Other packages will be mentioned with some references but will not be detailed.


I – Memory profilers

Memory profiling is the technique of monitoring and evaluating a program's memory usage while it runs. It helps developers find memory leaks, optimize memory utilization, and understand their programs' memory consumption patterns. Memory profiling is crucial to prevent applications from using more memory than necessary, causing slow performance or crashes.

1/ memory-profiler

memory_profiler is an easy-to-use Python module designed to profile the memory usage of a script. It depends on the psutil module. To install the package, simply type:

pip install memory_profiler # (in your virtual environment)
# or if you use uv (which I encourage)
uv add memory_profiler

Profiling an executable

One of the advantages of this package is that it is not limited to Python use. It installs the mprof command, which allows you to monitor the activity of any executable.

For instance, you can monitor the memory consumption of applications like ollama by running this command:

mprof run ollama run gemma3:4b
# or with uv
uv run mprof run ollama run gemma3:4b

To see the result, you have to install matplotlib first (pip install matplotlib or uv add matplotlib). Then, you can plot the recorded memory profile of your executable by running:

mprof plot
# or with uv
uv run mprof plot

The graph then looks like this:

Output of the command mprof plot after monitoring the executable ollama run gemma3:4b (from the author).

Profiling Python code

Let's get back to what brings us here: the profiling of Python code.

memory_profiler works in a line-by-line mode using a simple decorator, @profile. First, you decorate the function of interest, then you run the script. The output is written directly to the terminal. Consider the following monitoring.py script:

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a


if __name__ == '__main__':
    my_func()

It is important to note that it is not necessary to import the package with from memory_profiler import profile at the beginning of the script. In that case, you have to pass some specific arguments to the Python interpreter:

python -m memory_profiler monitoring.py # with a space between python and -m
# or
uv run -m memory_profiler monitoring.py

And you get the following output with line-by-line details:

Output of the command python -m memory_profiler monitoring.py (from the author).

The output is a table with five columns:

  • Line #: The line number of the profiled code.
  • Mem usage: The memory usage of the Python interpreter after executing that line.
  • Increment: The change in memory usage compared to the previous line.
  • Occurrences: The number of times that line was executed.
  • Line Contents: The actual source code.

This output is very detailed and allows very fine monitoring of a specific function.

Important: Unfortunately, this package is no longer actively maintained. The author is looking for a replacement.

2/ tracemalloc

tracemalloc is a built-in Python module that tracks memory allocations and deallocations. It provides an easy-to-use interface for capturing and analyzing memory usage snapshots, making it a valuable tool for any Python developer.

It offers the following details:

  • Shows where each object was allocated by providing a traceback.
  • Gives memory allocation statistics by file and line number, including the overall size, count, and average size of memory blocks.
  • Lets you compare two snapshots to identify potential memory leaks.

The tracemalloc package can thus be useful to identify memory leaks in your code, as in the minimal sketch below.
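As a quick illustration, here is a minimal sketch of the snapshot-comparison workflow (the leaky list is a deliberately wasteful stand-in for a real allocation):

import tracemalloc

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot()
leaky = [b"x" * 1024 for _ in range(10_000)]  # simulated leak
snapshot2 = tracemalloc.take_snapshot()

# The diff shows which lines allocated the most memory between the two snapshots
for stat in snapshot2.compare_to(snapshot1, "lineno")[:5]:
    print(stat)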

Personally, I find it less intuitive to set up than the other packages presented in this article. Here are some links to go further:


II – Time profilers

Time profiling is the process of measuring the time spent in different parts of a program. By identifying performance bottlenecks, you can focus your optimization efforts on the parts of the code that will have the most significant impact.

1/ line-profiler

The line-profiler package is quite similar to memory-profiler, but it serves a different purpose. It is designed to profile specific functions by measuring the execution time of each line inside them. To use LineProfiler effectively, you need to explicitly specify which functions you want it to profile, simply by adding the @profile decorator above them.

To install it, just type:

pip install line_profiler # (in your virtual environment)
# or
uv add line_profiler

Consider the following script named monitoring.py:

@profile
def create_list(lst_len: int):
    arr = []
    for i in range(0, lst_len):
        arr.append(i)


def print_statement(idx: int):
    if idx == 0:
        print("Starting array creation!")
    elif idx == 1:
        print("Array created successfully!")
    else:
        raise ValueError("Invalid index provided!")


@profile
def main():
    print_statement(0)
    create_list(400000)
    print_statement(1)


if __name__ == "__main__":
    main()

To measure the execution time of the functions main() and create_list(), we add the @profile decorator above them.

The easiest way to get a time profile of this script is to use the kernprof script:

kernprof -lv monitoring.py # (in your virtual environment)
# or
uv run kernprof -lv monitoring.py

It creates a binary file named your_script.py.lprof. The -v flag displays the output directly in the terminal. Otherwise, you can view the results later like so:

python -m line_profiler monitoring.py.lprof # (in your virtual environment)
# or
uv run python -m line_profiler monitoring.py.lprof

It provides the following information:

Output of the command kernprof -lv monitoring.py (from the author).

There are two tables, one per profiled function. Each table contains the following information:

  • Line #: The line number in the file.
  • Hits: The number of times that line was executed.
  • Time: The total amount of time spent executing the line, in the timer's units. In the header information before the tables, you will see a line “Timer unit:” giving the conversion factor to seconds. It may differ between systems.
  • Per Hit: The average amount of time spent executing the line once, in the timer's units.
  • % Time: The percentage of time spent on that line relative to the total amount of recorded time spent in the function.
  • Line Contents: The actual source code.

2/ cProfile

Python comes with two built-in profilers:

  • cProfile: A C extension with reasonable overhead that makes it suitable for profiling long-running programs. It is recommended for most users.
  • profile: A pure Python module whose interface is imitated by cProfile, but which adds significant overhead to profiled programs. It can be a valuable tool when you need to extend or customize the profiling functionality.

The basic syntax is cProfile.run(statement, filename=None, sort=-1). The filename argument can be passed to save the output. And the sort argument can be used to specify how the output should be printed. By default, it is set to -1 (no value).
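In practice, that gives calls like the following (the file name main.prof is just an example; when filename is provided, the stats are saved to the file instead of being printed):

cProfile.run("main()", sort="cumtime")  # print the table sorted by cumulative time
cProfile.run("main()", filename="main.prof")  # save raw stats for later analysis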

For instance, if you modify the monitoring script like this:

import cProfile


def create_list(lst_len: int):
    arr = []
    for i in range(0, lst_len):
        arr.append(i)


def print_statement(idx: int):
    if idx == 0:
        print("Starting array creation!")
    elif idx == 1:
        print("Array created successfully!")
    else:
        raise ValueError("Invalid index provided!")


def main():
    print_statement(0)
    create_list(400000)
    print_statement(1)


if __name__ == "__main__":
    cProfile.run("main()")

we get the following output:

First, we see the script's own output: print_statement(0) and print_statement(1).

Then, we have the profiler output: the first line shows the number of function calls and the time it took to run. The second line is a reminder of the sort parameter. Finally, the profiler provides a table with six columns:

  1. ncalls: Shows the number of calls made.
  2. tottime: Total time taken by the given function. Note that time spent in calls to sub-functions is excluded.
  3. percall: Total time divided by the number of calls (the remainder is left out).
  4. cumtime: Unlike tottime, this includes time spent in this and all sub-functions that the higher-level function calls. It is most useful and is accurate for recursive functions.
  5. percall: The percall following cumtime is calculated as the quotient of cumtime divided by primitive calls. Primitive calls include all the calls that were not induced via recursion.
  6. filename: The name of the function.

The first and last rows of the table come from cProfile itself. The other rows concern the script.

You can customize the output by using the Profile() class. First, you initialize an instance of the Profile class, then use the methods enable() and disable() to start and end the collection of profiling data, respectively. Then, the pstats module can be used to manipulate the results collected by the profiler object.

To sort the output by cumulative time instead of the standard name sort, the previous code can be rewritten like this:

import cProfile, pstats


# ...
# Same as before


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('cumtime')
    stats.print_stats()

And the output becomes:

As you can see, the table is now sorted by cumtime, and the two cProfile rows of the previous table are no longer in this one.

Visualize profiling with SnakeViz

The output is very easy to analyze. But it can become unreadable if the profiled code gets too big.

Another way to analyze the output is to visualize the data instead of reading it. To do so, we use the SnakeViz package. To install it, simply type:

pip install snakeviz # (in your virtual environment)
# or
uv add snakeviz

Then, replace stats.print_stats() with stats.dump_stats("profile.prof") to save the profiling data.
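The end of the script then looks like this (same code as before, only the last line changes):

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('cumtime')
    stats.dump_stats("profile.prof")  # save the stats instead of printing them

Now, you can get a visualization of your profiling by typing: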

snakeviz profile.prof

It launches a browser interface from which you can choose between two data visualizations: Icicle and Sunburst.

The Icicle visualization of the profiling of the regression script (from the author).
The Sunburst visualization of the profiling of the regression script (from the author).

It is easier to read than the print_stats() output because you can interact with each element by moving your mouse over it. For instance, you can get more details about the function create_list().

Details about the time consumption of the function evaluate_model() (from the author).

Create a call graph with gprof2dot

A call graph is a visual representation of the relationships between functions or methods in a program, showing which functions call others and how long each function or method takes. It can be seen as a map of your code. To install gprof2dot, simply type:

pip install gprof2dot # (in your virtual environment)
# or
uv add gprof2dot

Then execute your script by typing:

python -m cProfile -o monitoring.pstats monitoring.py # (in your virtual environment)
# or
uv run python -m cProfile -o monitoring.pstats monitoring.py

This creates a monitoring.pstats file that can be turned into a call graph using the following command:

gprof2dot -f pstats monitoring.pstats | dot -Tpng -o monitoring.png # (in your virtual environment)
# or
uv run gprof2dot -f pstats monitoring.pstats | dot -Tpng -o monitoring.png

The call graph is then saved in a PNG file named monitoring.png.

The call graph of the script monitoring.py (from the author).

3/ Other interesting packages

a/ PyCallGraph

PyCallGraph is a Python module that creates call graph visualizations. To use it, you must install it first.
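Assuming the original pycallgraph distribution on PyPI (a maintained fork also exists under the name pycallgraph2), the installation follows the same pattern as the other packages:

pip install pycallgraph # (in your virtual environment)
# or
uv add pycallgraph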

To create a call graph of your code, simply run it inside a PyCallGraph context like this:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

with PyCallGraph(output=GraphvizOutput()):
    main()  # the code you want to profile

Then you get a PNG of the call graph of your code, named pycallgraph.png by default.
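If you prefer another file name, the GraphvizOutput instance accepts an output_file argument; the name below is just an example:

with PyCallGraph(output=GraphvizOutput(output_file="monitoring_callgraph.png")):
    main()  # the graph is written to monitoring_callgraph.png instead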

I made the call graph of the previous example:

The call graph from PyCallGraph of the monitoring.py script.

In each box, you have the name of the function, the time spent in it, and the number of calls. Like with SnakeViz, the graph can become very complex if your code has many dependencies, but the colors indicate the bottlenecks. For complex code, it is very interesting to study it to see the dependencies and relationships.

b/ PyInstrument

PyInstrument is also a very easy-to-use Python profiler. You can add the profiler to your script by surrounding the code like this:

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

# code you want to profile

profiler.stop()
print(profiler.output_text(unicode=True, color=True))

The output looks like this:

It is less detailed than cProfile, but it is also more readable. Your functions are highlighted and sorted by time.

But the real interest of PyInstrument comes with its HTML output. To get it, simply type in the terminal:

pyinstrument --html monitoring.py
# or
uv run pyinstrument --html monitoring.py

It opens a browser interface from which you can choose between two data visualizations: Call stack and Timeline.

Call stack representation of the monitoring.py script (from the author).
Timeline representation of the monitoring.py script (from the author).

Here, the profile is more detailed and you have many options to filter.


III – CPU/GPU profilers

CPU and GPU profiling is the process of analyzing the utilization and performance of a program on the central processing unit (CPU) and the graphics processing unit (GPU). By measuring how many resources are spent on different parts of the code on these processing units, developers can identify performance bottlenecks, understand where their code is being executed, and optimize their application to achieve better performance and efficiency.

As far as I know, there is only one package that can profile GPU consumption.

1/ Scalene

Scalene is a high-performance CPU, GPU and memory profiler designed specifically for Python. It is an open-source package that provides detailed insights. It is designed to be fast, accurate, and easy to use, making it an excellent tool for developers looking to optimize their code.

  • CPU/GPU profiling: Scalene provides detailed information on CPU/GPU usage, including the time spent in different parts of your code. It can help you identify performance bottlenecks and optimize your code for better execution times.
  • Memory profiling: Scalene tracks memory allocation and deallocation, helping you understand how your code uses memory. This is particularly useful for identifying memory leaks or optimizing memory-intensive applications.
  • Line-by-line profiling: Scalene provides line-by-line profiling, which gives you a detailed breakdown of the time spent on each line of your code. This feature is invaluable for pinpointing performance issues.
  • Visualization: Scalene includes a graphical interface for visualizing profiling results, making it easier to understand and navigate the data.

To highlight all the advantages of Scalene, I have developed functions whose sole purpose is to consume memory (memory_waster()), CPU (cpu_waster()) and GPU (gpu_convolution()). All of them live in a script called scalene_tuto.py.

import random
import copy
import math
import cupy as cp
import numpy as np


def memory_waster():
    """Wastes memory, but in a controlled way"""
    memory_hogs = []

    # Create moderately sized redundant data structures
    for i in range(100):
        garbage_data = []
        for j in range(1000):
            waste = f"Useless string #{j} repeated " * 10
            garbage_data.append(waste)
            garbage_data.append(
                {
                    "id": j,
                    "data": waste,
                    "numbers": [random.random() for _ in range(50)],
                    "range_data": list(range(100)),
                }
            )
        memory_hogs.append(garbage_data)

    for iteration in range(4):
        print(f"Creating copy #{iteration}...")
        memory_copy = copy.deepcopy(memory_hogs)
        memory_hogs.extend(memory_copy)

    return memory_hogs


def cpu_waster():
    meaningless_result = 0

    for i in range(10000):
        for j in range(10000):
            temp = (i**2 + j**2) * random.random()
            temp = temp / (random.random() + 0.01)
            temp = abs(temp**0.5)
            meaningless_result += temp

            # Some trigonometric operations
            angle = random.random() * math.pi
            temp += math.sin(angle) * math.cos(angle)

        if i % 100 == 0:
            random_mess = [random.randint(1, 1000) for _ in range(1000)]  # Smaller list
            random_mess.sort()
            random_mess.reverse()
            random_mess.sort()

    return meaningless_result


def gpu_convolution():
    image_size = 128
    kernel_size = 64

    image = np.random.random((image_size, image_size)).astype(np.float32)
    kernel = np.random.random((kernel_size, kernel_size)).astype(np.float32)

    image_gpu = cp.asarray(image)
    kernel_gpu = cp.asarray(kernel)

    result = cp.zeros_like(image_gpu)

    for y in range(kernel_size // 2, image_size - kernel_size // 2):
        for x in range(kernel_size // 2, image_size - kernel_size // 2):
            pixel_value = 0
            for ky in range(kernel_size):
                for kx in range(kernel_size):
                    iy = y + ky - kernel_size // 2
                    ix = x + kx - kernel_size // 2
                    pixel_value += image_gpu[iy, ix] * kernel_gpu[ky, kx]
            result[y, x] = pixel_value

    result_cpu = cp.asnumpy(result)
    cp.cuda.Stream.null.synchronize()

    return result_cpu


def main():
    print("\n1/ Wasting some memory (controlled)...")
    _ = memory_waster()

    print("\n2/ Wasting CPU cycles (controlled)...")
    _ = cpu_waster()

    print("\n3/ Wasting GPU cycles (controlled)...")
    _ = gpu_convolution()


if __name__ == "__main__":
    main()

For the GPU function, you have to install cupy according to your CUDA version (run nvcc --version to get it):

pip install cupy-cuda12x # (in your virtual environment)
# or
uv add cupy-cuda12x

Further details on installing cupy can be found in the documentation.

To run Scalene, use the command:

scalene scalene_tuto.py
# or
uv run scalene scalene_tuto.py

It profiles CPU, GPU, and memory by default. If you only want one or some of them, use the flags --cpu, --gpu, and --memory, as in the example below.
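For instance, to skip GPU profiling and keep only the CPU and memory measurements:

scalene --cpu --memory scalene_tuto.py
# or
uv run scalene --cpu --memory scalene_tuto.py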

Scalene provides line-level and function-level profiling, and it has two interfaces: the command line interface (CLI) and the web interface.

Important: it is better to use Scalene on Ubuntu via WSL. Otherwise, the profiler does not retrieve memory consumption information.

a) Command Line Interface

By default, Scalene's output is the web interface. To get the CLI instead, add the flag --cli.

scalene scalene_tuto.py --cli
# or
uv run scalene scalene_tuto.py --cli

You get the following results:

Scalene output in the terminal (from the author).

By default, the code is displayed in dark mode. So if, like me, you work in light mode, the result is not very pretty.

The visualization is divided into three distinct colors, each representing a different profiling metric.

  • The blue section represents CPU profiling, which provides a breakdown of the time spent executing Python code, native code (such as C or C++), and system-related tasks (like I/O operations).
  • The green section is dedicated to memory profiling, showing the proportion of memory allocated by Python code, as well as the overall memory usage over time and its peak values.
  • The yellow section focuses on GPU profiling, showing the GPU's running time and the volume of data copied between the GPU and CPU, measured in MB/s. Note that GPU profiling is currently limited to NVIDIA GPUs.

b) The web interface

The web interface is divided into three parts:

The big picture of the profiling
The details, line by line
Scalene interface in the browser (from the author).

The color code is the same as in the command line interface, but some icons are added:

  • 💥: Optimizable code region (performance indication in the Function Profile section).
  • ⚡: Optimizable lines of code.

c) AI suggestions

One of the great advantages of Scalene is the ability to use AI to fix the slowness and/or overconsumption it has identified. It currently supports the OpenAI API, Amazon Bedrock, Azure OpenAI, and Ollama locally.

Scalene AI optimization options menu (from the author).

After selecting your tools, you just have to click on 💥 or ⚡ to optimize a section of the code or just a single line.

I tested it with codellama:7b-python from Ollama to optimize the gpu_convolution() function. Unfortunately, as mentioned in the interface:

Note that optimizations are AI-generated and may not be correct.

None of the suggested optimizations worked. But the codebase was not conducive to optimization, since it was artificially complicated: simply removing unnecessary lines saves time and memory. Also, I used a small model, which could be the reason.

Although my tests were inconclusive, I think this option can be interesting and will surely continue to improve.


Conclusion

Nowadays, we are less concerned about the resource consumption of our developments, and very quickly these optimization deficits can accumulate, making the code slow, too slow for production, and sometimes even requiring the purchase of more powerful hardware.
Code profiling tools are indispensable when it comes to identifying areas in need of optimization.

The combination of memory-profiler and line-profiler provides a good initial assessment: easy to set up, with easy-to-understand reports.

Tools such as cProfile and Scalene are more complete and offer graphical representations, but require more time to analyze. Finally, the AI optimization option offered by Scalene is a real asset, even if in my case the model used was not sufficient to provide anything relevant.


Curious about Python & Data Science?
Follow me for more tutorials and insights!
