Sunday, October 19, 2025

Accessing Data Commons with the New Python API Client


Image by Editor

 

Introduction

 
Data is at the core of any data professional’s work. Without useful and valid data sources, we cannot perform our duties. Moreover, poor-quality or irrelevant data will only cause our work to go to waste. That’s why having access to reliable datasets is a vital starting point for data professionals.

Data Commons is an open-source initiative by Google to organize the world’s available data and make it accessible for everyone to use. It’s free for anyone to query publicly available data. What sets Data Commons apart from other public dataset initiatives is that it already performs the schematic work, making data ready to use much more quickly.

Given the usefulness of Data Commons for our work, accessing it is becoming important for many data tasks. Fortunately, Data Commons provides a new Python API client to access these datasets.

 

Accessing Data Commons with Python

 
Data Commons works by organizing data into a queryable knowledge graph that unifies information from different sources. At its core, it uses the schema-based model from schema.org to standardize data representations.

Using this schema, Data Commons can connect data from various sources into a single graph where nodes represent entities (such as cities, regions, and people), events, and statistical variables. Edges depict the relationships between these nodes. Each node is unique and identifiable by a DCID (Data Commons ID), and many nodes include observations: measurements linked to a variable, an entity, and a period.

With the Python API, we can easily access the knowledge graph to acquire the required data. Let’s take a look at how we can do that.

First, we need to acquire a free API key to access Data Commons. Create a free account and copy the API key to a secure location. You can also use the trial API key, but access is more limited.

Next, install the Data Commons Python library. We will use the V2 API client, as it is the latest version. To do that, run the following command, which installs the Data Commons client with optional support for Pandas DataFrames as well.
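To make the graph model above more concrete, here is a minimal, self-contained sketch in plain Python. It is not how Data Commons is implemented; the class names (`Node`, `Observation`, `ToyGraph`) are hypothetical, the `containedInPlace` edge and the node types are used only for illustration, and the observation value is a placeholder rather than a real measurement.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    dcid: str  # unique identifier, e.g. "country/IDN"
    kind: str  # what the node represents (entity type, statistical variable, ...)
    edges: dict = field(default_factory=dict)  # property name -> list of target DCIDs


@dataclass(frozen=True)
class Observation:
    variable: str  # DCID of the statistical variable
    entity: str    # DCID of the entity being measured
    date: str      # period the measurement refers to
    value: float


class ToyGraph:
    """A toy stand-in for a knowledge graph keyed by DCID."""

    def __init__(self):
        self.nodes = {}
        self.observations = []

    def add_node(self, dcid, kind):
        # Every node must be unique and identifiable by its DCID.
        if dcid in self.nodes:
            raise ValueError(f"duplicate DCID: {dcid}")
        self.nodes[dcid] = Node(dcid, kind)

    def add_edge(self, source, prop, target):
        # Edges describe a relationship between two existing nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of an edge must be existing nodes")
        self.nodes[source].edges.setdefault(prop, []).append(target)

    def observe(self, variable, entity, date, value):
        # An observation links a variable, an entity, and a period.
        self.observations.append(Observation(variable, entity, date, value))

    def observations_for(self, variable, entity):
        return [
            o for o in self.observations
            if o.variable == variable and o.entity == entity
        ]


graph = ToyGraph()
graph.add_node("country/IDN", "Country")
graph.add_node("asia", "Continent")
graph.add_node("worldBank/GFDD_AI_25", "StatisticalVariable")
graph.add_edge("country/IDN", "containedInPlace", "asia")
# Placeholder value, not a real observation.
graph.observe("worldBank/GFDD_AI_25", "country/IDN", "2020", 1.0)
```

The point of the sketch is only the shape of the data: a lookup by DCID, named edges between nodes, and observations that tie a variable to an entity and a date.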

pip install "datacommons-client[Pandas]"

 

With the library installed, we’re ready to fetch data using the Data Commons Python client.

To create the client that will access the data from the cloud, run the following code.

from datacommons_client.client import DataCommonsClient

client = DataCommonsClient(api_key="YOUR-API-KEY")
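Since the API key should live in a secure location, one option is to read it from an environment variable instead of pasting it into the script. The sketch below assumes a variable named `DC_API_KEY`; that name and the `load_api_key` helper are hypothetical choices for this example, not something the library requires.

```python
import os


def load_api_key(env_var: str = "DC_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Data Commons API key."
        )
    return key


# Usage with the client shown in this article (commented out so the
# snippet runs without the library installed or a network connection):
# from datacommons_client.client import DataCommonsClient
# client = DataCommonsClient(api_key=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error returned later by the API.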

 

One of the most important concepts in Data Commons is the entity, which refers to a persistent, physical thing in the real world, such as a city or a country. It is an important part of fetching data, as most datasets require specifying the entity. You can visit the Data Commons Place page to learn about all available entities.

For most users, the data we want to acquire is more specific: the statistical variables stored in Data Commons. To select the data we want to retrieve, we need to know the DCID of the statistical variables, which you can find via the Statistical Variable Explorer.

 
[Image: Accessing Data Commons with the New Python API Client]
 

You can filter variables and select a dataset from the options above. For example, choose the World Bank dataset for “ATMs per 100,000 adults.” In this case, you can obtain the DCID by examining the information provided in the explorer.

 
[Image: Accessing Data Commons with the New Python API Client]
 

If you click on the DCID, you can see all the information related to the node, including how it connects to other information.

 
[Image: Accessing Data Commons with the New Python API Client]
 

In addition to the statistical variable DCID, we also need to specify the entity DCID for the geography. We can explore the Data Commons Place page mentioned above, or we can use the following code to see the available DCIDs for a certain place name.

# Look up DCIDs by place name (returns multiple candidates)
resp = client.resolve.fetch_dcids_by_name(names="Indonesia").to_dict()
dcid_list = [c["dcid"] for c in resp["entities"][0]["candidates"]]
print(dcid_list)

 

With output similar to the following:

['country/IDN', 'geoId/...' , '...']

 

Using the code above, we fetch the DCID candidates available for a specific place name. For example, among the candidates for “Indonesia,” we can select country/IDN as the country DCID.

All the information we need is now ready, and we only need to execute the following code:
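Picking the right candidate can be done with a small helper. The sketch below works on a plain dictionary shaped like the `resp` used above (an `entities` list whose items hold `candidates` with a `dcid` key). The `node` key holding the queried name is an assumption about the response layout, the helper names are hypothetical, and `example/otherPlace` is a made-up DCID standing in for the other candidates.

```python
def candidate_dcids(response: dict) -> dict:
    """Map each queried name to its list of candidate DCIDs."""
    result = {}
    for entity in response.get("entities", []):
        # "node" is assumed to hold the name that was looked up.
        name = entity.get("node")
        result[name] = [c["dcid"] for c in entity.get("candidates", [])]
    return result


def pick_by_prefix(dcids: list, prefix: str):
    """Return the first DCID that starts with the prefix, or None."""
    for dcid in dcids:
        if dcid.startswith(prefix):
            return dcid
    return None


# Hand-written sample in the assumed response shape; not real API output.
sample = {
    "entities": [
        {
            "node": "Indonesia",
            "candidates": [
                {"dcid": "example/otherPlace"},  # hypothetical placeholder
                {"dcid": "country/IDN"},
            ],
        }
    ]
}

by_name = candidate_dcids(sample)
country = pick_by_prefix(by_name["Indonesia"], "country/")
print(country)
```

Filtering on the `country/` prefix is just one simple heuristic; checking the candidates by eye, as described above, remains the safer choice when a name is ambiguous.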

variable = ["worldBank/GFDD_AI_25"]
entity = ["country/IDN"]

df = client.observations_dataframe(
    variable_dcids=variable,
    date="all",
    entity_dcids=entity
)

 

The result is shown in the dataset below.

 
[Image: Accessing Data Commons with the New Python API Client]
 

The current code returns all available observations for the selected variables and entities across the entire time frame. In the code above, you will also notice that we are using lists instead of single strings.

This is because we can pass multiple variables and entities simultaneously to acquire a combined dataset. For example, the code below fetches two distinct statistical variables for two entities at once.

variable = ["worldBank/GFDD_AI_25", "worldBank/SP_DYN_LE60_FE_IN"]
entity = ["country/IDN", "country/USA"]

df = client.observations_dataframe(
    variable_dcids=variable,
    date="all",
    entity_dcids=entity
)

 

With output like the next:

 
[Image: Accessing Data Commons with the New Python API Client]
 

You can see that the resulting DataFrame combines the variables and entities you set beforehand. With this method, you can acquire the data you need without executing separate queries for each combination.

That’s all you need to know about accessing Data Commons with the new Python API client. Use this library whenever you need reliable public data for your work.
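Because the combined result mixes several variables and entities, it is often useful to split it back into one series per combination. The sketch below does this on plain records, such as those produced by `df.to_dict("records")`. The column names `entity`, `variable`, `date`, and `value` are assumptions about the DataFrame layout, `split_series` is a hypothetical helper, and the numbers are placeholders rather than real observations.

```python
from collections import defaultdict


def split_series(rows):
    """Group long-format rows into one date-sorted series per (entity, variable)."""
    series = defaultdict(list)
    for row in rows:
        key = (row["entity"], row["variable"])
        series[key].append((row["date"], row["value"]))
    for points in series.values():
        points.sort()  # ISO-style date strings sort chronologically
    return dict(series)


# Placeholder rows in the assumed layout; the values are not real data.
rows = [
    {"entity": "country/IDN", "variable": "worldBank/GFDD_AI_25", "date": "2021", "value": 2.0},
    {"entity": "country/IDN", "variable": "worldBank/GFDD_AI_25", "date": "2020", "value": 1.0},
    {"entity": "country/USA", "variable": "worldBank/GFDD_AI_25", "date": "2020", "value": 3.0},
]

series = split_series(rows)
for (entity, variable), points in series.items():
    print(entity, variable, points)
```

With Pandas installed, a `groupby` on the same two columns would achieve the same split directly on the DataFrame.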

 

Wrapping Up

 
Data Commons is an open-source project by Google aimed at democratizing data access. The project is inherently different from many public data initiatives, as the datasets are built on top of a knowledge graph schema, which makes the data easier to unify.

In this article, we explored how to access datasets within the graph using Python, leveraging statistical variables and entities to retrieve observations.

I hope this has helped!
 
 

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
