PyDatalog
- Author: Vishok Manikantan
Datalog
Before understanding PyDatalog, let's first understand what Datalog is.
Datalog is a declarative logic programming language used to query data. Syntactically, it is a subset of Prolog, another logic programming language. Datalog is used to query databases, knowledge bases, and other structured data sources.
A Datalog program consists of facts, rules, and queries. Facts are statements about the data, rules define relationships between the data, and queries are used to retrieve information from the data.
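Datalog is similar to SQL in that both are declarative, but Datalog rules make it easy to define derived and recursive relationships between data (for example, everything that is directly or indirectly part of an institution).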
Here's an example to understand Datalog:
- CTech is a department under School of Computing.
- BTech CSE core is a course offered by CTech.
- Minor Degree Program in Computer Science is offered by CTech.
- NWC is a department under School of Computing.
- BTech CSE IT is a course offered by NWC.
- BTech CSE IoT is a course offered by NWC.
- BTech CSE Cyber Security is a course offered by NWC.
- CIntel is a department under School of Computing.
- BTech AI is a course offered by CIntel.
- BTech CSE AIML is a course offered by CIntel.
- School of Computing is a part of SRMIST.
Well, this is a list of facts. Now, let's define some rules:
- A department is a part of a school.
- A course is offered by a department.
Now, let's ask some questions:
- What are the courses offered by CTech?
- What are the departments under School of Computing?
- What are the courses offered by NWC?
- BTech CSE AIML is offered by which department?
These are some of the queries that can be asked using Datalog.
The first statements (1-11) are facts. These define the data. The next two statements (12-13) are rules. These define the relationships between the data. The last four statements (14-17) are queries. These are used to retrieve information from the data.
Together, facts, rules, and queries make up the fundamental structure of a Datalog program. The facts provide the base data, while the rules allow logical deductions to be made about that data. The queries let users extract meaningful information based on the established relationships.
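To make the three parts concrete before bringing in any library, here is a minimal sketch in plain Python. It is not Datalog and not PyDatalog; it only mimics the idea using sets of tuples, with a trimmed-down copy of the facts above. The variable names are chosen for illustration.

```python
# Facts: each relation is a set of tuples (a toy stand-in for a Datalog fact base).
department = {
    ("CTech", "School of Computing"),
    ("NWC", "School of Computing"),
    ("CIntel", "School of Computing"),
}
course = {
    ("BTech CSE core", "CTech"),
    ("BTech CSE IT", "NWC"),
    ("BTech CSE AIML", "CIntel"),
}

# Rule: offered_by(X, Y) holds whenever course(X, Y) holds.
offered_by = {(x, y) for (x, y) in course}

# Query: which courses are offered by CTech?
courses_at_ctech = sorted(x for (x, y) in offered_by if y == "CTech")
print(courses_at_ctech)  # ['BTech CSE core']
```

A real Datalog engine does the same kind of work, but it lets you state the rule once and ask many different queries against it.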
Now that we have a basic understanding of Datalog, let's move on to PyDatalog.
PyDatalog
PyDatalog is a Python library that allows you to use Datalog in Python. It provides a way to write Datalog queries in Python code.
Installation
You can install PyDatalog using pip:
pip install pyDatalog
create_terms
create_terms is a function provided by PyDatalog that declares the terms used in Datalog statements. It is used to declare the variables and the predicate names that appear in the facts, rules, and queries.
In our example, for the four queries mentioned above, we can define the terms as follows:
pyDatalog.create_terms('X, Y, Z, department, course, part_of, offered_by')
Here, X, Y, and Z are the variables that will be used in the rules and queries. PyDatalog treats names that start with an uppercase letter as variables. You will understand this better when we write the queries. department, course, part_of, and offered_by are the predicates (relation names) that define the relationships between the data.
Defining the Facts
Now, let's define the facts using PyDatalog:
+ department('CTech', 'School of Computing')
+ course('BTech CSE core', 'CTech')
+ course('Minor Degree Program in Computer Science', 'CTech')
+ department('NWC', 'School of Computing')
+ course('BTech CSE IT', 'NWC')
+ course('BTech CSE IoT', 'NWC')
+ course('BTech CSE Cyber Security', 'NWC')
+ department('CIntel', 'School of Computing')
+ course('BTech AI', 'CIntel')
+ course('BTech CSE AIML', 'CIntel')
+ part_of('School of Computing', 'SRMIST')
Read the above code snippet in the following way:
- CTech is a department under School of Computing.
- BTech CSE core is a course offered by CTech.
- ...
- CIntel is a department under School of Computing.
- ...
- School of Computing is a part of SRMIST.
Here, we are defining the facts that we mentioned earlier. The + sign in front of each statement asserts it as a fact.
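If a line that starts with + looks odd, it helps to know that Python lets a class define what unary plus means through the __pos__ method. The toy classes below (Fact and Predicate are hypothetical names, and this is not PyDatalog's actual implementation) sketch how a statement like the ones above can be valid Python that records a fact as a side effect.

```python
class Fact:
    """A toy fact: a relation name plus its arguments."""

    store = set()  # shared fact base for this sketch

    def __init__(self, name, args):
        self.name = name
        self.args = args

    def __pos__(self):
        # Unary plus records the fact in the shared store.
        Fact.store.add((self.name, self.args))
        return self


class Predicate:
    """Calling a predicate with arguments builds a Fact (not yet stored)."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        return Fact(self.name, args)


department = Predicate("department")

+ department("CTech", "School of Computing")  # asserts the fact

print(Fact.store)
```

Without the leading +, the call would only build a Fact object and throw it away; the plus is what stores it.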
Defining the Rules
Now, let's define the rules using PyDatalog:
part_of(X, Y) <= department(X, Y)
offered_by(X, Y) <= course(X, Y)
Read <= as "if": the head on the left is true whenever the body on the right is true. Every variable in the head of a rule must also appear in its body, which is why both rules use X and Y on both sides.
First Rule
This rule states that if X is a department under Y, then we can conclude that X is part of Y.
Breaking it down
Variables:
- X: Represents a department. (e.g., CTech, NWC, CIntel)
- Y: Represents the school the department belongs to. (e.g., School of Computing)
How it works:
If we have the fact:
- department(CTech, School of Computing) (CTech is a department under School of Computing)
We can apply the rule:
- Substitute X with CTech and Y with School of Computing.
- Since the fact is true, we can conclude that part_of(CTech, School of Computing) is also true.
- Hence, CTech is part of School of Computing.
Note that this rule on its own does not derive part_of(CTech, SRMIST). Chaining through the fact part_of(School of Computing, SRMIST) would require an additional transitive rule.
Second Rule
This rule states that if X is a course under department Y, then we can conclude that course X is offered by department Y.
Breaking it down
Variables:
- X: Represents a course. (e.g., BTech CSE core, Minor Degree Program in Computer Science)
- Y: Represents the offering department. (e.g., CTech, NWC, CIntel)
How it works:
If we have the fact:
- course(BTech CSE core, CTech) (BTech CSE core is a course offered by CTech)
We can apply the rule:
- Substitute X with BTech CSE core and Y with CTech.
- Since the fact is true, we can conclude that offered_by(BTech CSE core, CTech) is also true.
- Hence, BTech CSE core is offered by CTech.
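Deriving a chained fact such as part_of(CTech, SRMIST) needs a rule that follows part_of links transitively, something like part_of(X, Y) <= part_of(X, Z) & part_of(Z, Y). The sketch below is plain Python rather than PyDatalog, and it assumes the department facts are treated as part_of links. It applies that rule repeatedly until no new facts appear, which is roughly how bottom-up Datalog evaluation is usually described. The function name transitive_closure is made up for this illustration.

```python
# Base facts; department facts are treated as part_of links in this sketch.
part_of = {
    ("CTech", "School of Computing"),
    ("NWC", "School of Computing"),
    ("CIntel", "School of Computing"),
    ("School of Computing", "SRMIST"),
}


def transitive_closure(pairs):
    """Apply part_of(X, Y) <= part_of(X, Z) & part_of(Z, Y) until nothing new appears."""
    closure = set(pairs)
    while True:
        # Join every (x, z) with every (z, y) that shares the middle value z.
        derived = {(x, y) for (x, z) in closure for (z2, y) in closure if z == z2}
        if derived <= closure:  # no new facts: a fixed point has been reached
            return closure
        closure |= derived


closure = transitive_closure(part_of)
print(("CTech", "SRMIST") in closure)  # True
```

The loop stops because the set of possible pairs is finite, so each pass either adds something new or ends the evaluation.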
Defining the Queries
Now, let's define the queries using PyDatalog:
print(offered_by(X, 'CTech')) # What are the courses offered by CTech?
print(part_of(X, 'School of Computing')) # What are the departments under School of Computing?
print(offered_by(X, 'NWC')) # What are the courses offered by NWC?
print(offered_by('BTech CSE AIML', Y)) # BTech CSE AIML is offered by which department?
The above code snippet will output the answers to the queries mentioned.
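In each query, a quoted string fixes one argument and a variable (X or Y) marks the position to be filled in. The following plain-Python sketch shows that matching idea on a small copy of the data. Var and query are hypothetical helpers written for this post, not part of PyDatalog, and the sketch ignores the case where the same variable appears twice in one pattern.

```python
class Var:
    """Marks a position in a query that should be filled in by matching facts."""

    def __init__(self, name):
        self.name = name


X, Y = Var("X"), Var("Y")

offered_by = {
    ("BTech CSE core", "CTech"),
    ("Minor Degree Program in Computer Science", "CTech"),
    ("BTech CSE AIML", "CIntel"),
}


def query(relation, *pattern):
    """Return one {variable name: value} binding per fact that matches the pattern."""
    results = []
    for fact in relation:
        bindings = {}
        for term, value in zip(pattern, fact):
            if isinstance(term, Var):
                bindings[term.name] = value  # free position: capture the value
            elif term != value:
                break  # fixed position does not match: skip this fact
        else:
            results.append(bindings)
    return results


# What are the courses offered by CTech?
print(sorted(b["X"] for b in query(offered_by, X, "CTech")))
# BTech CSE AIML is offered by which department?
print(query(offered_by, "BTech CSE AIML", Y))
```

The same relation answers both questions; only the position of the variable changes.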
I hope this part is self-explanatory. Else, feel free to ask questions in the comments.
Full Implementation
Here's the full implementation of the PyDatalog code:
from pyDatalog import pyDatalog
pyDatalog.create_terms('X, Y, Z, department, course, offered_by, part_of')
+ department('CTech', 'School of Computing')
+ course('BTech CSE core', 'CTech')
+ course('Minor Degree Program in Computer Science', 'CTech')
+ department('NWC', 'School of Computing')
+ course('BTech CSE IT', 'NWC')
+ course('BTech CSE IoT', 'NWC')
+ course('BTech CSE Cyber Security', 'NWC')
+ department('CIntel', 'School of Computing')
+ course('BTech AI', 'CIntel')
+ course('BTech CSE AIML', 'CIntel')
+ part_of('School of Computing', 'SRMIST')
offered_by(X, Y) <= course(X, Y)
part_of(X, Y) <= department(X, Y)
print(offered_by(X, 'CTech'))
print(part_of(X, 'School of Computing'))
print(offered_by(X, 'NWC'))
print(offered_by('BTech CSE AIML', Y))
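Output:
Each print call lists the values that satisfy its query (the order of the rows may vary):
- offered_by(X, 'CTech'): BTech CSE core, Minor Degree Program in Computer Science
- part_of(X, 'School of Computing'): CTech, NWC, CIntel
- offered_by(X, 'NWC'): BTech CSE IT, BTech CSE IoT, BTech CSE Cyber Security
- offered_by('BTech CSE AIML', Y): CIntel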
This is a simple example to demonstrate how PyDatalog can be used to query data using Datalog in Python.
I hope this post was helpful in understanding Datalog and how it can be used to query data.