NoSQL · Cassandra · CQL

Apache Cassandra and CQL: data modeling and query examples

Cassandra is designed for distributed writes, high availability and queries known in advance. The most important lesson is that the table is modeled from the access pattern, not from a normalized entity diagram.

Partition key

Defines where data lives in the cluster and which queries are efficient.

Clustering columns

Define order inside a partition and support range-like reads.

Query-first design

Duplicate data when needed so each important query has a table that serves it.

Careful filtering

ALLOW FILTERING is useful for learning, but often a warning sign in production.

2026 reading note

Keep this exercise as a compact CQL practice, not as a normalized relational design translated table by table. Current Cassandra documentation still emphasizes query-driven data modeling: start from the queries the application must answer, then design partition keys and clustering columns for those access patterns.

For production systems, review the maintained Cassandra branch, replication strategy, compaction, consistency levels and operational monitoring before copying these examples. If a query needs ALLOW FILTERING, treat it as a sign that the table may need to be remodeled for that query.

In the Cassandra 5.x context, Storage-Attached Indexing, vector search and Unified Compaction Strategy are worth knowing about. They expand what Cassandra can do, especially for search and AI-oriented workloads, but they do not remove the central modeling rule: design tables from access patterns first and only then decide whether indexes or vectors belong in the solution.

Introduction to Cassandra. CQL query examples

The aim of this article is to manage information about courses and students of the master's degree in Hispanic philology in a Cassandra database:

Courses

IdC	Title	Credits	Professor
101	Romance philology	7	Sara Martinez
102	Modern Greek	7	Daniel Perez
103	Modern literature	10	Amalia Sierra
104	Hispanic-American literature	5	Paloma Sanchez
105	Phonetics and morphology	5	Miguel Hernandez
106	Literary theories	3	Juan Garcia
107	General linguistics	3	Samuel Lopez
108	Modern Spanish	10	Isabel Sanz

Student

IdE	Student name
201	Maria Mestre
202	Rodrigo Calvo
203	Oriol Menezes
204	Carlo Berruzo
205	Sofia Canyadell
206	Marina Perez
207	Arianna Ruiz
208	Naiara Zapico

Grades

IdE	IdC	Exam grade	Exam date
201	101	B	2019-02-11
201	102	B	2018-06-30
202	106	C	2018-06-30
203	103	A	2019-02-11
203	108	A	2018-06-30
203	105	B	2020-01-04
204	101	B	2020-01-04
205	103	C	2020-01-04

1. Retrieve the information of the courses, that is, course title, credits, professor name and student name, attended by a specific student and ordered by course title.

IdE	IdC	Title	Credits	Professor	Student name
201	101	Romance philology	7	Sara Martinez	Maria Mestre
201	102	Modern Greek	7	Daniel Perez	Maria Mestre
202	106	Literary theories	3	Juan Garcia	Rodrigo Calvo
203	103	Modern literature	10	Amalia Sierra	Oriol Menezes
203	105	Phonetics and morphology	5	Miguel Hernandez	Oriol Menezes
203	108	Modern Spanish	10	Isabel Sanz	Oriol Menezes
204	101	Romance philology	7	Sara Martinez	Carlo Berruzo
205	103	Hispanic-American literature	5	Paloma Sanchez	Sofia Canyadell
205	104	Hispanic-American literature	5	Paloma Sanchez	Sofia Canyadell
205	108	Modern Spanish	10	Isabel Sanz	Sofia Canyadell
206	106	Literary theories	3	Juan Garcia	Marina Perez
206	107	General linguistics	3	Samuel Lopez	Marina Perez
207	107	General linguistics	3	Samuel Lopez	Arianna Ruiz
208	104	Hispanic-American literature	5	Paloma Sanchez	Naiara Zapico

As can be seen, to avoid problems with duplicated names I added the fields IdE and IdC to the query, even though they were not requested in the statement. With those fields I can correctly distinguish two students with the same name or two courses with the same title.

Due to the requirements of the following sections, I decided to use the primary key `idE_estudiante`, `titulo_curso` and `idC_curso`. The main key, `idE_estudiante`, allows me to group the data by that value and therefore determine the course with the maximum number of credits in which each student enrolled.

The next key used was `titulo_curso`, and in this way I also satisfy the ordering requirement stated in the exercise.

2. Retrieve the information of the courses in which there were students who obtained a specific grade. The statement asks for course title, credits, professor name, student identifier and grade, ordered by the number of course credits.

Credits	Exam grade	IdE	Title	Professor
7	B	201	Romance philology	Sara Martinez
7	B	201	Modern Greek	Daniel Perez
3	C	202	Literary theories	Juan Garcia
10	A	203	Modern literature	Amalia Sierra
5	B	203	Phonetics and morphology	Miguel Hernandez
10	A	203	Modern Spanish	Isabel Sanz
7	B	204	Romance philology	Sara Martinez
10	C	205	Modern literature	Paloma Sanchez
5	B	205	Hispanic-American literature	Paloma Sanchez
10	A	205	Modern Spanish	Isabel Sanz
3	B	206	Literary theories	Juan Garcia
3	B	206	General linguistics	Samuel Lopez
3	A	207	General linguistics	Samuel Lopez
5	C	208	Hispanic-American literature	Paloma Sanchez

In this point we repeat the previous process. To avoid data duplication issues and still include all the information, it is necessary to use `idE_estudiante` and `titulo_curso` as the primary key of the collection. In this case, I do not consider it important to distinguish courses by identifier because what really matters is the course title and its credits.

The order of the keys will be: `nota_exam`, `creditos_curso`, `idE_estudiante` and `titulo_curso`. The main key, `nota_exam`, allows me to solve the first requirement of the statement appropriately.

In that sense, the next key used was `creditos_curso`, which solves the ordering requirement indicated in the statement.

The result can be evaluated in the following image:

3. Starting from the two column families created in sections A) and B), generate CQL queries and show the results obtained in each one:

Retrieve all courses with students who received a 'C', ordered descending by number of credits.
The query is very simple for the `ejercicio_b` collection. We use its primary key to obtain the desired information and then sort the data as required by the statement.
3 Cassandra result for section C, part 1
Obtain the maximum number of credits obtained by each student from a subject.
At this point we use the `ejercicio_a` collection. Since it is ordered by the primary key `idE_estudiante`, it is possible to perform a `group by` on that value and obtain the desired query.
4 Cassandra answer for section C, part 2
I add the student's name, thereby satisfying the requirements of the statement.
5 Cassandra answer for section C, part 2 with the student's name added

Retrieve the information for the course "Modern Literature" without using the ALLOW FILTERING clause.
Knowing that `titulo_curso` is not part of the family keys of this collection and that it is part of a `WHERE` condition, it is necessary to define an index on that field. In that way we can avoid using the `ALLOW FILTERING` command mentioned in the statement.
6 Cassandra answer for section C, part 3
Change the professor name Daniel Perez to Xesus Cortinhas.
At this point we need to make modifications in both collections. To do so, we perform the following procedure in table `ejercicio_a`:
- Search for the affected rows.
- Obtain the primary keys that identify the displayed records: `idc_curso in (102) and ide_estudiante in (201) and titulo_curso='Griego moderno'`.
- Verify that the modification was successful.
7 Cassandra answer for exercise 4a. Table ejercicio_a
To modify collection `ejercicio_b`, we follow this procedure:
- Search for the affected rows.
- Obtain the primary keys that identify the displayed records: `titulo_curso='Griego moderno' and creditos_curso=7 and ide_estudiante=201 and nota_exam='B'`.
- Verify that the modification was successful.
8 Cassandra answer for exercise 4b. Table ejercicio_b

Delete the information related to the student "Rodrigo Calvo".
As in the previous point, we have to make changes in both collections. For that reason, we perform the following procedure in collection `ejercicio_a`:
- Search for the affected rows.
- Obtain the primary keys that identify the displayed records: `idc_curso=106 and ide_estudiante=202 and titulo_curso='Teorias literarias'`, and execute the delete.
- Verify that the deletion was successful.
9 Cassandra answer for exercise 5a. Table ejercicio_a
We now carry out the following procedure in collection `ejercicio_b`:
- Search for the affected rows.
- Obtain the primary keys that identify the displayed records: `nota_exam='C' and creditos_curso=3 and ide_estudiante=202 and titulo_curso='Teorias literarias'`, and execute the delete.
- Verify that the deletion was successful.
10 Cassandra answer for exercise 5b. Table ejercicio_b