Introduction to Neo4j. Examples of queries in graphs

We want to design a database in Neo4j with the data provided in the following tables, so that information can be extracted about the students' activity.

Enrolled students.
IoE  IoT  Title                         Credits  Professor         Student name
201  101  Romance philology             7        Sara Martinez     Maria Mestre
201  102  Modern Greek                  7        Daniel Perez      Maria Mestre
202  106  Literary theories             3        Juan Garcia       Rodrigo Calvo
203  103  Modern literature             10       Amalia Sierra     Oriol Menezes
203  105  Phonetics and morphology      5        Miguel Hernandez  Oriol Menezes
203  108  Modern Spanish                10       Isabel Sanz       Oriol Menezes
204  101  Romance philology             7        Sara Martinez     Carlo Berruzo
205  103  Hispanic-American literature  5        Paloma Sanchez    Sofia Canyadell
205  104  Hispanic-American literature  5        Paloma Sanchez    Sofia Canyadell
205  108  Modern Spanish                10       Isabel Sanz       Sofia Canyadell
206  106  Literary theories             3        Juan Garcia       Marina Perez
206  107  General linguistics           3        Samuel Lopez      Marina Perez
207  107  General linguistics           3        Samuel Lopez      Arianna Ruiz
208  104  Hispanic-American literature  5        Paloma Sanchez    Naiara Zapico

Grades of the examined students.
Credits  Exam grade  IoE  Title                         Professor
7        B           201  Romance philology             Sara Martinez
7        B           201  Modern Greek                  Daniel Perez
3        C           202  Literary theories             Juan Garcia
10       A           203  Modern literature             Amalia Sierra
5        B           203  Phonetics and morphology      Miguel Hernandez
10       A           203  Modern Spanish                Isabel Sanz
7        B           204  Romance philology             Sara Martinez
10       C           205  Modern literature             Paloma Sanchez
5        B           205  Hispanic-American literature  Paloma Sanchez
10       A           205  Modern Spanish                Isabel Sanz
3        B           206  Literary theories             Juan Garcia
3        B           206  General linguistics           Samuel Lopez
3        A           207  General linguistics           Samuel Lopez
5        C           208  Hispanic-American literature  Paloma Sanchez

1. Argue what is the best way to structure the information provided in the second table, that is, the grades of the examined students.

Under these conditions, we should implement a multilevel tree in which each level acts as a filter for the information of interest.

At the highest level of the hierarchy are the “Course” nodes, which represent the courses offered. At the next level are the “Student” nodes, which represent the students who have been evaluated in those courses and are linked to them through the “EXAMINED” relationship. Finally, the “Professor” nodes represent the professors who have taught the courses and are linked through the “IT WAS_GIVEN” relationship.

An example of this structure is the following image:

Figure 1. Proposed structure for the Neo4j exercise

2. Create the database using Cypher statements, following the structure proposed in section 1. The queries and the result obtained must be shown.

The result is:

Figure 2. Structure resulting from the query proposed in this section
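
As a rough illustration of how the structure from section 1 could be loaded, the following Cypher sketch creates a small subset of the data (students 201 and 208). It is not the original statement used for the exercise. The property names (ioe, name, title, credits, grade), the direction of the relationships, and the spelling IT_WAS_GIVEN (written with an underscore so that it is a valid unquoted identifier) are all assumptions made for this example. Student names are joined from the first table through the IoE column.

```cypher
// Hypothetical load of three rows; property names and relationship
// directions are assumptions, not taken from the original exercise.
UNWIND [
  {ioe: 201, student: 'Maria Mestre',  title: 'Romance philology',
   credits: 7, grade: 'B', professor: 'Sara Martinez'},
  {ioe: 201, student: 'Maria Mestre',  title: 'Modern Greek',
   credits: 7, grade: 'B', professor: 'Daniel Perez'},
  {ioe: 208, student: 'Naiara Zapico', title: 'Hispanic-American literature',
   credits: 5, grade: 'C', professor: 'Paloma Sanchez'}
] AS row
// MERGE avoids creating duplicate nodes when a course, student, or
// professor appears in more than one row.
MERGE (c:Course {title: row.title, credits: row.credits})
MERGE (s:Student {ioe: row.ioe, name: row.student})
MERGE (p:Professor {name: row.professor})
// The grade is stored on the relationship, since it belongs to the
// (course, student) pair rather than to either node.
MERGE (c)-[:EXAMINED {grade: row.grade}]->(s)
MERGE (c)-[:IT_WAS_GIVEN]->(p);
```

Storing the grade on the EXAMINED relationship lets later queries filter on it without an extra node per exam. The remaining rows of the tables would be added to the list in the same way.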

3. Resolve the following queries:

  • For all students who earned a 'C' grade in a 5-credit course, list the name of the student, the title of the course, and the name of the course instructor.

    In this case we have to look for all the students who have been examined in a 5-credit course and have obtained a 'C'. Once these students are found, we retrieve the professor who taught the course.

    Figure 3. Response to the first query
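
    The steps described for this query could be written roughly as follows. This is a sketch under assumptions: the grade is taken to be a grade property on the EXAMINED relationship, the credits a credits property on the Course node, and the second relationship is spelled IT_WAS_GIVEN. The patterns are left undirected because the direction chosen in the original database is not known.

    ```cypher
    // Students examined in a course, filtered by grade and credits,
    // then extended to the professor who taught that course.
    MATCH (s:Student)-[e:EXAMINED]-(c:Course)-[:IT_WAS_GIVEN]-(p:Professor)
    WHERE toUpper(e.grade) = 'C'   // tolerate 'c' or 'C' in the stored data
      AND c.credits = 5
    RETURN s.name AS student, c.title AS course, p.name AS professor;
    ```

    According to the tables, the only 'C' obtained in a 5-credit course belongs to student 208 (Naiara Zapico) in “Hispanic-American literature”, taught by Paloma Sanchez.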

  • Visualize the nodes and relationships of students who took “Modern Greek” or “Modern Spanish” in 2019

    In this case we have to look for all the students who have been examined in a course. On the relationship, we must keep only the students examined in 2019 in the “Modern Greek” or “Modern Spanish” courses.

    Figure 4. Graph structure of the response

    Figure 5. Response to the second query
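
    A possible form of this query is sketched below. The tables do not contain the exam year, so the year property on the EXAMINED relationship is an assumption, as is the title property on Course. Returning the nodes and the relationship (rather than individual properties) is what allows the result to be displayed as a graph.

    ```cypher
    // Filter on the relationship (year) and on the course title.
    MATCH (s:Student)-[e:EXAMINED]-(c:Course)
    WHERE e.year = 2019
      AND toLower(c.title) IN ['modern greek', 'modern spanish']
    // Return whole nodes and relationships so they can be visualized.
    RETURN s, e, c;
    ```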

4. Twitter

Consider the guide “Analyzing Twitter with Neo4j” which describes the implementation of a database in Neo4j.

  • Get the relevant user who wrote the geolocated tweet with the highest number of replies. List the userName and the number of replies.

    First I look for all the replies to the geolocated tweets. Then I keep only the tweets written by relevant users. Finally, I return the user name along with the number of replies. Since the results are sorted by that number in descending order, the first row is the one with the greatest number of replies and hence the value I need to return.

    The result obtained when executing the query is:

    Figure 6. Relevant user who wrote the geolocated tweet with the highest number of replies, with the userName and the number of replies
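
    The approach described above could look roughly like this. The schema of the guide is not reproduced in this document, so the labels GeolocatedTweet and RelevantUser and the relationship types REPLY_TO and POSTS are hypothetical stand-ins that would need to be replaced with the names used in the guide. Only the userName property comes from the question itself.

    ```cypher
    // Count the replies received by each geolocated tweet written by a
    // relevant user, then keep the tweet with the most replies.
    MATCH (reply)-[:REPLY_TO]->(t:GeolocatedTweet)<-[:POSTS]-(u:RelevantUser)
    WITH u, t, count(reply) AS replies
    ORDER BY replies DESC
    LIMIT 1
    RETURN u.userName AS userName, replies;
    ```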

  • Obtain the number of geolocated tweets written from Barcelona and with the word “Buenafuente” in the text.

    This query uses the geolocated tweets and their relationship with Location. We keep only the tweets located in Barcelona that contain 'Buenafuente' in the text variable.

    The result obtained when executing the query is:

    Figure 7. Number of geolocated tweets written from Barcelona and with the word “Buenafuente” in the text
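
    A minimal sketch of such a count follows. The label GeolocatedTweet and the name property on Location are assumptions. Location and text are the names mentioned in the explanation above. The relationship is left untyped and undirected because its name in the guide is not given here, and DISTINCT guards against counting a tweet twice if more than one relationship links it to the same location.

    ```cypher
    // Geolocated tweets linked to the Barcelona location whose text
    // contains the given word (CONTAINS is case-sensitive).
    MATCH (t:GeolocatedTweet)--(l:Location)
    WHERE l.name = 'Barcelona'
      AND t.text CONTAINS 'Buenafuente'
    RETURN count(DISTINCT t) AS numTweets;
    ```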

  • Obtain the list of relevant users from Barcelona with followers from Madrid. Sort the list by number of followers and list only the second.

    The key point of this query is to use SKIP and LIMIT in order to obtain the second highest result. Apart from that, I have only searched for relevant users from Barcelona who have a FOLLOWS relationship with a relevant user from Madrid.

    Figure 8. Relevant users from Barcelona with followers from Madrid
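
    The SKIP and LIMIT technique can be sketched as follows. The RelevantUser label, the location property, and the direction of FOLLOWS (from follower to followed user) are assumptions for this example. In the guide's schema the city may instead be reached through a Location node.

    ```cypher
    // Count the Madrid followers of each relevant user from Barcelona.
    MATCH (f:RelevantUser)-[:FOLLOWS]->(u:RelevantUser)
    WHERE u.location = 'Barcelona'
      AND f.location = 'Madrid'
    WITH u, count(f) AS followers
    ORDER BY followers DESC
    SKIP 1    // discard the user with the most followers
    LIMIT 1   // keep only the next one, i.e. the second in the ranking
    RETURN u.userName AS userName, followers;
    ```

    The ordering must come before SKIP and LIMIT, since otherwise the row that is skipped is arbitrary.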

  • Calculate the ratio between the number of TwitterUsers in Spanish and English.

    To achieve this I have counted the number of users whose profile is in English (numin) and in Spanish (numbers). I then obtained the ratio, which is 2 to 1 in favor of English.

    Figure 9. Ratio between the number of TwitterUsers in Spanish and English
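
    One way to compute both counts and their ratio in a single pass is sketched below. The lang property and its values 'es' and 'en' are assumptions about how the profile language is stored, and the variable names numEs and numEn are chosen for this example only.

    ```cypher
    // Conditional sums give both counts from one scan of TwitterUser.
    MATCH (u:TwitterUser)
    WITH sum(CASE WHEN u.lang = 'es' THEN 1 ELSE 0 END) AS numEs,
         sum(CASE WHEN u.lang = 'en' THEN 1 ELSE 0 END) AS numEn
    // toFloat avoids integer division, which would truncate the ratio.
    RETURN numEs, numEn, toFloat(numEs) / numEn AS ratioEsToEn;
    ```

    With twice as many English profiles as Spanish ones, as reported above, ratioEsToEn would be 0.5.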