Estudiantes y sus exámenes

Escribe una solución para encontrar el número de veces que cada estudiante presentó cada examen. Regresa el resultado ordenado por student_id y subject_name.

#pandas

Tabla: Students

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| student_name  | varchar |
+---------------+---------+
- `student_id` es la llave primaria (columna con valores únicos) para esta tabla.
- Cada fila de esta tabla contine el ID y el nombre de cada estudiante en la escuela.

Tabla: Subjects

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+
- `subject_name` es la llave primaria (columna con valores únicos) para esta tabla.
- Cada fila de esta tabla contine el nombre de cada materia en la escuela.

Tabla: Examinations

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| student_id   | int     |
| subject_name | varchar |
+--------------+---------+
- No hay llave primaria (columna con valores únicos) para esta tabla. Puede contener duplicados.
- Cada estudiante de la tabla Students toma cada curso de la tabla Subjects.
- Cada fila de esta tabla indica que un estudiante con ID `student_id` hizo el examen de la `subject_name`.

El formato del resultado se muestra en el siguiente ejemplo.

Ejemplo 1:

Entrada:

Tabla Students:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1          | Alice        |
| 2          | Bob          |
| 13         | John         |
| 6          | Alex         |
+------------+--------------+

Tabla Subjects:
+--------------+
| subject_name |
+--------------+
| Math         |
| Physics      |
| Programming  |
+--------------+

Tabla Examinations:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1          | Math         |
| 1          | Physics      |
| 1          | Programming  |
| 2          | Programming  |
| 1          | Physics      |
| 1          | Math         |
| 13         | Math         |
| 13         | Programming  |
| 13         | Physics      |
| 2          | Math         |
| 1          | Math         |
+------------+--------------+

Salida:

+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1          | Alice        | Math         | 3              |
| 1          | Alice        | Physics      | 2              |
| 1          | Alice        | Programming  | 1              |
| 2          | Bob          | Math         | 1              |
| 2          | Bob          | Physics      | 0              |
| 2          | Bob          | Programming  | 1              |
| 6          | Alex         | Math         | 0              |
| 6          | Alex         | Physics      | 0              |
| 6          | Alex         | Programming  | 0              |
| 13         | John         | Math         | 1              |
| 13         | John         | Physics      | 1              |
| 13         | John         | Programming  | 1              |
+------------+--------------+--------------+----------------+

Explicación:
- El resultado debe contener todos los estudiantes y todas las materias.
- Alice hizo el examen de Math 3 veces, el examen de Physics 2 veces, y el de Programming 1 vez.
- Bob hizo el examen de Math 1 vez, el de Programming 1 vez, y no ha hecho el examen de Physics aún.
- Alex no ha hecho ningún examen.
- John hizo el examen de Math 1 vez, el de Physics 1 vez, y el de Programming 1 vez.

Solución

import pandas as pd

def students_and_examinations(students: pd.DataFrame, subjects: pd.DataFrame, examinations: pd.DataFrame) -> pd.DataFrame:
    students_exams = students.merge(examinations, on='student_id')

    unique_cols = ['student_id', 'subject_name']
    subject_attempts = students_exams.
        groupby(unique_cols)['subject_name'].count().reset_index(name='attended_exams')
    subject_attempts.insert(1, 'student_name', '')

    students_info = pd.merge(subjects, students, how='cross')
    _type = {
        'student_id': 'Int64',
        'student_name': 'object',
        'subject_name': 'object',
        'attended_exams': 'Int64',
    }
    missing_rows = []

    for _, row in students_info.iterrows():
        subject, student_id, name = row['subject_name'], row['student_id'], row['student_name']

        item = subject_attempts.loc[subject_attempts['subject_name'].eq(subject) &
                                    subject_attempts['student_id'].eq(student_id)]
        if item.empty:
            new_item = pd.DataFrame([[student_id, name, subject, 0]],
                                columns=['student_id', 'student_name', 'subject_name', 'attended_exams']
                                ).astype(_type)
            missing_rows.append(new_item)
        else:
            subject_attempts.loc[item.index, 'student_name'] = name

    result = pd.concat([subject_attempts, *missing_rows])

    return result.sort_values(by=['student_id', 'subject_name'])

slackmart blog © 2024