Datalog

Before understanding PyDatalog, let's first understand what Datalog is.

Datalog is a declarative logic programming language used to query data. Syntactically, it is a subset of Prolog, a general-purpose logic programming language. Datalog is used to query databases, knowledge bases, and other structured data sources.

A Datalog program consists of facts, rules, and queries. Facts are statements about the data, rules define relationships between the data, and queries are used to retrieve information from the data.

Datalog plays a role similar to SQL, but its rules can refer to themselves, so recursive relationships (for example, "is part of, directly or indirectly") are easy to express.

Here's an example to understand Datalog:

CTech is a department under School of Computing.
BTech CSE core is a course offered by CTech.
Minor Degree Program in Computer Science is offered by CTech.
NWC is a department under School of Computing.
BTech CSE IT is a course offered by NWC.
BTech CSE IoT is a course offered by NWC.
BTech CSE Cyber Security is a course offered by NWC.
CIntel is a department under School of Computing.
BTech AI is a course offered by CIntel.
BTech CSE AIML is a course offered by CIntel.
School of Computing is a part of SRMIST.

Well, this is a list of facts. Now, let's define some rules:

A department is a part of a school.
A course is offered by a department.

Now, let's ask some questions:

What are the courses offered by CTech?
What are the departments under School of Computing?
What are the courses offered by NWC?
BTech CSE AIML is offered by which department?

These are some of the queries that can be asked using Datalog.

The first eleven statements (1-11) are facts, which define the data. The next two statements (12-13) are rules, which define the relationships between the data. The last four statements (14-17) are queries, which retrieve information from the data.

Together, the facts, rules, and queries demonstrate the fundamental structure of a Datalog program. The facts provide the base data, while the rules allow logical deductions to be made about that data. The queries let users extract meaningful information based on the established relationships.
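
To make the three parts concrete before moving to PyDatalog, here is a minimal plain-Python sketch (not Datalog itself). Facts are modelled as sets of tuples, a rule as a derivation of new tuples from them, and queries as filters. The helper name `query` is an illustrative choice, and only a few of the facts above are included.

```python
# Facts: each relation is a set of tuples (illustrative subset of the list above).
department = {
    ("CTech", "School of Computing"),
    ("NWC", "School of Computing"),
    ("CIntel", "School of Computing"),
}
course = {
    ("BTech CSE core", "CTech"),
    ("BTech CSE IT", "NWC"),
    ("BTech CSE AIML", "CIntel"),
}

# Rule: "a course is offered by a department" -> every course fact
# yields an offered_by fact with the same two values.
offered_by = {(c, d) for (c, d) in course}


def query(relation, first=None, second=None):
    """Return the tuples matching the given values; None acts as a variable."""
    return sorted(
        (a, b)
        for (a, b) in relation
        if (first is None or a == first) and (second is None or b == second)
    )


# Query: what are the courses offered by CTech?
ctech_courses = query(offered_by, second="CTech")
# Query: BTech CSE AIML is offered by which department?
aiml_department = query(offered_by, first="BTech CSE AIML")

print(ctech_courses)
print(aiml_department)
```

Passing `None` for a position plays the role of a Datalog variable: it matches any value, and the matches come back as the answer.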

Now that we have a basic understanding of Datalog, let's move on to PyDatalog.

PyDatalog

PyDatalog is a Python library that allows you to use Datalog in Python. It provides a way to write Datalog queries in Python code.

Installation

You can install PyDatalog using pip:

pip install pyDatalog

`create_terms`

create_terms is a function provided by PyDatalog that declares the names used in Datalog statements. It is used to define the logic variables and the predicates (relations) that will appear in the facts, rules, and queries.

In our example, for the four queries mentioned above, we can define the terms as follows:

from pyDatalog import pyDatalog
pyDatalog.create_terms('X, Y, Z, department, course, part_of, offered_by')

Here, X, Y, and Z are the variables that will be used in the queries (variable names begin with an uppercase letter). You will understand this better when we write the queries. department, course, part_of, and offered_by are the predicates that define the relationships between the data.

Defining the Facts

Now, let's define the facts using PyDatalog:

+ department('CTech', 'School of Computing')
+ course('BTech CSE core', 'CTech')
+ course('Minor Degree Program in Computer Science', 'CTech')
+ department('NWC', 'School of Computing')
+ course('BTech CSE IT', 'NWC')
+ course('BTech CSE IoT', 'NWC')
+ course('BTech CSE Cyber Security', 'NWC')
+ department('CIntel', 'School of Computing')
+ course('BTech AI', 'CIntel')
+ course('BTech CSE AIML', 'CIntel')
+ part_of('School of Computing', 'SRMIST')

Read the above code snippet in the following way:

CTech is a department under School of Computing.
BTech CSE core is a course offered by CTech.
...
CIntel is a department under School of Computing.
...
School of Computing is a part of SRMIST.

Here, we are defining the facts that we mentioned earlier. The + sign is used to define the facts.
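
The `+ fact` syntax is ordinary Python: a unary plus applied to an object, which a library can overload through the `__pos__` method. The toy sketch below illustrates that idea only. It is not PyDatalog's actual implementation, and the `Predicate` and `Literal` classes and the `facts` store are hypothetical names invented for this example.

```python
# Toy illustration only: NOT PyDatalog's real implementation.
facts = set()  # hypothetical global fact store


class Literal:
    """A predicate applied to arguments, e.g. department('CTech', ...)."""

    def __init__(self, name, args):
        self.name = name
        self.args = args

    def __pos__(self):
        # Unary plus: record this literal as a fact.
        facts.add((self.name, self.args))
        return self


class Predicate:
    """Calling a predicate builds a Literal without storing anything yet."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        return Literal(self.name, args)


department = Predicate("department")

department("NWC", "School of Computing")      # builds a literal, stores nothing
+ department("CTech", "School of Computing")  # unary plus stores the fact

print(facts)
```

Only the statement prefixed with `+` ends up in the store, which mirrors why every fact line above starts with a plus sign.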

Defining the Rules

Now, let's define the rules using PyDatalog:

part_of(X, Y) <= department(X, Y)
offered_by(X, Y) <= course(X, Y)

Read head <= body as "the head is true if the body is true". In Datalog, every variable in the head of a rule should also appear in its body, which is why both sides use the same X and Y.

First Rule

This rule states that if X is a department under Y, then we can conclude that X is part of Y.

Breaking it down

Variables:
- X: Represents a department. (e.g., CTech, NWC, CIntel)
- Y: Represents the entity the department belongs to. (e.g., School of Computing)
How it works:
- If we have the fact:
  - department(CTech, School of Computing) (CTech is a department under School of Computing)
- We can apply the rule:
  - Substitute X with CTech and Y with School of Computing.
  - Since the fact is true, we can conclude that part_of(CTech, School of Computing) is also true.
  - Hence, CTech is part of School of Computing.
- The fact part_of(School of Computing, SRMIST) was stated directly, so it holds as well. This rule does not chain the two together: concluding that CTech is part of SRMIST would need an additional recursive rule.

Second Rule

This rule states that if X is a course listed under department Y, then we can conclude that course X is offered by department Y.

Breaking it down

Variables:
- X: Represents a course. (e.g., BTech CSE core, Minor Degree Program in Computer Science)
- Y: Represents the offering department. (e.g., CTech, NWC, CIntel)
How it works:
- If we have the fact:
  - course(BTech CSE core, CTech) (BTech CSE core is a course under CTech)
- We can apply the rule:
  - Substitute X with BTech CSE core and Y with CTech.
  - Since the fact is true, we can conclude that offered_by(BTech CSE core, CTech) is also true.
  - Hence, BTech CSE core is offered by CTech.
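
Chaining facts, such as concluding that CTech is part of SRMIST because CTech is under School of Computing and School of Computing is part of SRMIST, takes a recursive rule of the form "if X is part of Z and Z is part of Y, then X is part of Y". The plain-Python sketch below shows the logic such a rule would express by applying it repeatedly until no new facts appear. It illustrates the idea under that assumed rule. It is not PyDatalog code and not a description of how PyDatalog evaluates rules internally, and the function name `transitive_closure` is made up for this example.

```python
def transitive_closure(pairs):
    """Apply 'part_of(X, Z) and part_of(Z, Y) => part_of(X, Y)' to a fixpoint."""
    closure = set(pairs)
    while True:
        # Join the relation with itself on the shared middle value Z.
        derived = {
            (x, y)
            for (x, z1) in closure
            for (z2, y) in closure
            if z1 == z2
        }
        if derived <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= derived


# Direct part_of facts, after the first rule has copied in the departments.
part_of = {
    ("CTech", "School of Computing"),
    ("NWC", "School of Computing"),
    ("CIntel", "School of Computing"),
    ("School of Computing", "SRMIST"),
}

all_part_of = transitive_closure(part_of)
print(("CTech", "SRMIST") in all_part_of)
```

The loop stops as soon as a pass derives nothing new, so the result contains the original facts plus every indirect relationship that follows from them.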

Defining the Queries

Now, let's define the queries using PyDatalog:

print(offered_by(X, 'CTech')) # What are the courses offered by CTech?
print(part_of(X, 'School of Computing')) # What are the departments under School of Computing?
print(offered_by(X, 'NWC')) # What are the courses offered by NWC?
print(offered_by('BTech CSE AIML', Y)) # BTech CSE AIML is offered by which department?

The above code snippet will output the answers to the queries mentioned.

I hope this part is self-explanatory. Else, feel free to ask questions in the comments.

Full Implementation

Here's the full implementation of the PyDatalog code:

from pyDatalog import pyDatalog

pyDatalog.create_terms('X, Y, Z, department, course, offered_by, part_of')

+ department('CTech', 'School of Computing')
+ course('BTech CSE core', 'CTech')
+ course('Minor Degree Program in Computer Science', 'CTech')
+ department('NWC', 'School of Computing')
+ course('BTech CSE IT', 'NWC')
+ course('BTech CSE IoT', 'NWC')
+ course('BTech CSE Cyber Security', 'NWC')
+ department('CIntel', 'School of Computing')
+ course('BTech AI', 'CIntel')
+ course('BTech CSE AIML', 'CIntel')
+ part_of('School of Computing', 'SRMIST')

offered_by(X, Y) <= course(X, Y)
part_of(X, Y) <= department(X, Y)

print(offered_by(X, 'CTech'))
print(part_of(X, 'School of Computing'))
print(offered_by(X, 'NWC'))
print(offered_by('BTech CSE AIML', Y))

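Output:

Each print call prints the values that satisfy its query (the row order may vary). The answers are:

- Courses offered by CTech: BTech CSE core, Minor Degree Program in Computer Science
- Departments under School of Computing: CTech, NWC, CIntel
- Courses offered by NWC: BTech CSE IT, BTech CSE IoT, BTech CSE Cyber Security
- Department offering BTech CSE AIML: CIntel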

This is a simple example to demonstrate how PyDatalog can be used to query data using Datalog in Python.

I hope this post was helpful in understanding Datalog and how it can be used to query data.