{
"cells": [
{
"cell_type": "markdown",
"id": "7c2d295d-86f0-4e47-8f29-6c8d2c2a2bcd",
"metadata": {},
"source": [
"We run the script `sinac-download.py`, which gives us the ZIP files to work with; after unzipping them we get the CSV files we need."
]
},
{
"cell_type": "markdown",
"id": "ebcb3503-cbae-400d-ab1e-5736692d0004",
"metadata": {},
"source": [
"After unzipping the ZIP files, the following files have to be renamed:\n",
"\n",
" - datasets/Nacimientos_2021.csv\n",
" - datasets/Nacimientos_2022.csv\n",
" - datasets/sinac_2020.csv\n",
"\n",
"so that they follow the convention used by the other years, e.g. datasets/sinac2008DatosAbiertos.csv"
]
},
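{
"cell_type": "markdown",
"id": "sketch-rename-note",
"metadata": {},
"source": [
"The renaming described above could be scripted. The next cell is a minimal sketch, not part of the original workflow: `renames` and `rename_to_convention` are illustrative names, and the target names assume the `sinacYYYYDatosAbiertos.csv` pattern used by the other files. Missing sources and already existing targets are skipped, so the cell can be re-run safely."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-rename-code",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Assumed mapping from the extracted file names to the naming convention\n",
"renames = {\n",
"    \"Nacimientos_2021.csv\": \"sinac2021DatosAbiertos.csv\",\n",
"    \"Nacimientos_2022.csv\": \"sinac2022DatosAbiertos.csv\",\n",
"    \"sinac_2020.csv\": \"sinac2020DatosAbiertos.csv\",\n",
"}\n",
"\n",
"def rename_to_convention(folder, renames):\n",
"    \"\"\"Rename files inside folder; return the list of new names actually applied.\"\"\"\n",
"    renamed = []\n",
"    for old_name, new_name in renames.items():\n",
"        source = Path(folder) / old_name\n",
"        target = Path(folder) / new_name\n",
"        # Skip files that are missing or whose target already exists\n",
"        if source.exists() and not target.exists():\n",
"            source.rename(target)\n",
"            renamed.append(new_name)\n",
"    return renamed\n",
"\n",
"rename_to_convention(\"datasets\", renames)"
]
},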
{
"cell_type": "code",
"execution_count": 7,
"id": "260278d1-79fd-414f-a576-8d88e6965b5a",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import json\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fee3e9d2-29dd-475a-9cf8-2d9021bf6783",
"metadata": {},
"outputs": [],
"source": [
"csvs = sorted(p for p in Path(\"datasets\").iterdir() if p.suffix == \".csv\")\n",
"samples = {}"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "caac2ebb-23eb-4436-be62-203b8f80fd95",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[PosixPath('datasets/sinac2008DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2009DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2010DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2011DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2012DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2013DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2014DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2015DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2016DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2017DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2018DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2019DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2020DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2021DatosAbiertos.csv'),\n",
" PosixPath('datasets/sinac2022DatosAbiertos.csv')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csvs"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c2fd2a75-3129-4a34-987c-22f944df45b3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'2008'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csvs[0].name[5:9]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9936ea77-24d7-4d03-9429-e98675a7ebbb",
"metadata": {},
"outputs": [],
"source": [
"columnas_totales = []\n",
"for csv in csvs:\n",
"    df = pd.read_csv(csv, low_memory=False)\n",
"    columnas = set(df.columns.to_list())\n",
"    columnas_totales.append(columnas)\n",
"    # print(csv.name, len(df), len(columnas), columnas)\n",
"    # Key each sample by the year in this file's name, not always by csvs[0]\n",
"    samples[csv.name[5:9]] = df.sample(25)\n",
"    del df"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "ee74afa2-1ed1-40b2-8d90-c83d860d8971",
"metadata": {},
"outputs": [],
"source": [
"columns = {}\n",
"for csv, columnas in zip(csvs, columnas_totales):\n",
"    columns[csv.name[5:9]] = sorted(columnas)\n",
"\n",
"# json.dumps(columns)"
]
},
{
"cell_type": "markdown",
"id": "694dddec-1458-4220-ad11-9a534e6cad7b",
"metadata": {},
"source": [
"The following script writes a CSV file, `columns.csv`, that can be opened in Excel: it puts each year's columns in one row, and we will use it to determine the equivalences."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "7c90b0cb-7cf2-471b-a331-44b7697e038e",
"metadata": {},
"outputs": [],
"source": [
"file = Path(\"columns.csv\")\n",
"with file.open(\"w\") as f:\n",
"    # One row per year: the year followed by that year's sorted column names\n",
"    for year, column in columns.items():\n",
"        print(year, *column, sep=\",\", file=f)"
]
},
{
"cell_type": "markdown",
"id": "9fc8f124-5bc9-46e6-9f97-b73b3409e36f",
"metadata": {},
"source": [
"Taking the columns from the earliest years (before 2017) as the baseline, we define equivalences for the years 2017 to 2019.\n",
"\n",
"For this we used ChatGPT with the following prompt:\n",
"\n",
"```\n",
"I will give you two lists of column names.\n",
"\n",
"First list:\n",
"{FIRST LIST}\n",
"\n",
"Second list:\n",
"{SECOND LIST}\n",
"\n",
"Determine which columns are equivalent, and provide a JSON that expresses the equivalences. Not every column exists in both lists, so determine the largest set of columns that are compatible by meaning.\n",
"\n",
"For context, note that these columns correspond to fields of information used in birth reports.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "4e9d50a0-ab17-475d-bf18-3149d22d1008",
"metadata": {},
"source": [
"After manually reviewing the columns for 2017, 2018, and 2019, we found the following candidate map of column equivalences."
]
},
{
"cell_type": "raw",
"id": "92dffbb6-27bf-4509-8b1b-c0d1de1bf105",
"metadata": {},
"source": [
"{\n",
" \"afiliacion_serv_salud\": [\"DERHAB\", \"DERHAB2\"],\n",
" \"anomalia_congenita_nac_vivo\": [\"CVE_CIE\", \"CVE_CIE2\"],\n",
" \"certificado_por\": \"CERT_POR\",\n",
" \"clues\": \"CLUES\",\n",
" \"codigo_anomalia\": [\"CVE_CIE\", \"CVE_CIE2\"],\n",
" \"edad_madre\": \"EDADM\",\n",
" \"fecha_nacimiento_nac_vivo\": \"FECH_NACH\",\n",
" \"peso_nac_vivo\": \"PESOH\",\n",
" \"semanas_gestacion_nac_vivo\": \"GESTACH\",\n",
" \"sexo_nac_vivo\": \"SEXOH\",\n",
" \"talla_nac_vivo\": \"TALLAH\",\n",
" \"tipo_formato\": \"TIPO_FORMATO\",\n",
" \"total_consultas_recibidas\": \"TOT_CONS\",\n",
" \"trabaja_actualmente\": \"TRAB_ACT\",\n",
" \"trimestre_recibio_primera_consulta\": \"TRIM_CONS\",\n",
" \"unidad_medica_certifico\": [\"UNIMED\", \"UNIMED_33_1\"],\n",
" \"valoracion_apgar_nac_vivo\": \"APGARH\",\n",
" \"quien_atendio_parto\": [\"ATENDIO\", \"ATEN_OTRO\", \"OTROMEDICO\"],\n",
" \"recibio_atencion_prenatal\": \"ATEN_PREN\",\n",
" \"recibio_vacuna_bcg\": \"BCG\",\n",
" \"recibio_vacuna_hep_b\": \"HEP_B\",\n",
" \"recibio_vit_a\": \"VIT_A\",\n",
" \"recibio_vit_k\": \"VIT_K\",\n",
" \"madre_habla_lengua_indigena\": \"HABLA_INDM\",\n",
" \"madre_se_considera_indigena\": \"CON_INDM\",\n",
" \"entidad_certifico\": \"ENT_CERT\",\n",
" \"entidad_nacimiento\": \"ENT_NAC\",\n",
" \"entidad_residencia_madre\": \"ENT_RES\",\n",
" \"estado_conyugal\": \"EDOCIVIL\",\n",
" \"escolaridad_madre\": \"NIV_ESCOL\",\n",
" \"lugar_de_nacimiento\": [\"INST_NAC\", \"PROCNAC\"],\n",
" \"madre_sobrevivio_al_parto\": \"SOB_PARTO\",\n",
" \"vive_aun_hijo_anterior\": \"VIVE_AUN\",\n",
" \"orden_nacimiento\": \"ORDEN_NAC\",\n",
" \"hijos_sobrevivientes\": \"HIJO_SOBV\",\n",
" \"el_hijo_anterior_nacio\": \"HIJO_ANTE\",\n",
" \"numero_embarazos\": \"NUM_EMB\",\n",
" \"hijos_nacidos_vivos\": \"NUM_NACVIVO\",\n",
" \"hijos_nacidos_muertos\": \"NUM_NACMTO\",\n",
" \"ocupacion_habitual_madre\": \"OCUPHAB\",\n",
" \"se_utilizaron_forceps\": \"FORCEPS\",\n",
" \"se_realizo_tamiz_auditivo\": \"TAM_AUD\",\n",
" \"valoracion_silverman_nac_vivo\": \"SILVERMAN\"\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "810c8093-1bba-4ad0-a2c3-38433e6b162e",
"metadata": {},
"source": [
"For the years 2020, 2021, and 2022, we have the following equivalences."
]
},
{
"cell_type": "raw",
"id": "2d716936-b373-4100-875d-5aab6442a4e7",
"metadata": {},
"source": [
" {\n",
" \"afiliacion_serv_salud\": \"AFILIACION\",\n",
" \"anomalia_congenita_nac_vivo\": [\"CODIGOCIEANOMALIA1\", \"CODIGOCIEANOMALIA2\"],\n",
" \"certificado_por\": \"CERTIFICADOPOR\",\n",
" \"clues\": \"CLUES\",\n",
" \"clues_certifico\": \"CLUESCERTIFICA\",\n",
" \"codigo_anomalia\": [\"CODIGOCIEANOMALIA1\", \"CODIGOCIEANOMALIA2\"],\n",
" \"edad_madre\": \"EDAD\",\n",
" \"entidad_certifico\": \"ENTIDADFEDERATIVACERTIFICA\",\n",
" \"entidad_nacimiento\": \"ENTIDADNACIMIENTO\",\n",
" \"entidad_residencia_madre\": \"ENTIDADRESIDENCIA\",\n",
" \"escolaridad_madre\": \"ESCOLARIDAD\",\n",
" \"estado_conyugal\": \"ESTADOCONYUGAL\",\n",
" \"fecha_certificacion\": \"FECHACERTIFICADO\",\n",
" \"fecha_nacimiento_nac_vivo\": \"FECHANACIMIENTO\",\n",
" \"fecha_nac_madre\": \"FECHANACIMIENTOMADRE\",\n",
" \"hijos_nacidos_muertos\": \"HIJOSNACIDOSMUERTOS\",\n",
" \"hijos_nacidos_vivos\": \"HIJOSNACIDOSVIVOS\",\n",
" \"hijos_sobrevivientes\": \"HIJOSSOBREVIVIENTES\",\n",
" \"hora_nacimiento_nac_vivo\": \"HORANACIMIENTO\",\n",
" \"madre_habla_lengua_indigena\": \"HABLALENGUAINDIGENA\",\n",
" \"madre_se_considera_indigena\": \"SECONSIDERAINDIGENA\",\n",
" \"madre_sobrevivio_al_parto\": \"SOBREVIVIOPARTO\",\n",
" \"numero_embarazos\": \"NUMEROEMBARAZOS\",\n",
" \"orden_nacimiento\": \"ORDENNACIMIENTO\",\n",
" \"peso_nac_vivo\": \"PESO\",\n",
" \"quien_atendio_parto\": \"PERSONALATENDIO\",\n",
" \"recibio_vacuna_bcg\": \"VACUNA_BCG\",\n",
" \"recibio_vacuna_hep_b\": \"VACUNAHEPATITIS_B\",\n",
" \"recibio_vit_a\": \"VITAMINA_A\",\n",
" \"recibio_vit_k\": \"VITAMINA_K\",\n",
" \"se_realizo_tamiz_auditivo\": \"TAMIZAUDITIVO\",\n",
" \"semanas_gestacion_nac_vivo\": \"EDADGESTACIONAL\",\n",
" \"sexo_nac_vivo\": \"SEXO\",\n",
" \"talla_nac_vivo\": \"TALLA\",\n",
" \"trabaja_actualmente\": \"TRABAJAACTUALMENTE\",\n",
" \"trimestre_recibio_primera_consulta\": \"TRIMESTREPRIMERCONSULTA\",\n",
" \"vive_aun_hijo_anterior\": \"VIVEHIJOANTERIOR\"\n",
"}\n"
]
},
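{
"cell_type": "markdown",
"id": "sketch-harmonize-note",
"metadata": {},
"source": [
"One possible way to apply an equivalence map like the ones above is to invert it (source column to common name) and pass the result to `DataFrame.rename`. The next cell is a sketch under assumptions, not the notebook's established method: `invert_map` and `harmonize` are hypothetical helpers, `equivalences` is a short excerpt of the 2020-2022 map, the `_1`, `_2` suffixes for list-valued entries are an arbitrary choice, and the sample row is made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-harmonize-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Excerpt of the 2020-2022 map: common name -> source column (or list of columns)\n",
"equivalences = {\n",
"    \"edad_madre\": \"EDAD\",\n",
"    \"peso_nac_vivo\": \"PESO\",\n",
"    \"anomalia_congenita_nac_vivo\": [\"CODIGOCIEANOMALIA1\", \"CODIGOCIEANOMALIA2\"],\n",
"}\n",
"\n",
"def invert_map(equivalences):\n",
"    \"\"\"Return {source column: common name}; list entries get suffixes _1, _2, ...\"\"\"\n",
"    inverted = {}\n",
"    for common, source in equivalences.items():\n",
"        if isinstance(source, str):\n",
"            inverted[source] = common\n",
"        else:\n",
"            for i, column in enumerate(source, start=1):\n",
"                inverted[column] = f\"{common}_{i}\"\n",
"    return inverted\n",
"\n",
"def harmonize(df, equivalences):\n",
"    \"\"\"Keep only the mapped columns that exist in df and rename them.\"\"\"\n",
"    inverted = invert_map(equivalences)\n",
"    present = [c for c in df.columns if c in inverted]\n",
"    return df[present].rename(columns=inverted)\n",
"\n",
"# Made-up row, for illustration only\n",
"df = pd.DataFrame({\"EDAD\": [25], \"PESO\": [3200], \"SEXO\": [1], \"CODIGOCIEANOMALIA1\": [\"X\"]})\n",
"harmonize(df, equivalences).columns.to_list()"
]
},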
{
"cell_type": "code",
"execution_count": null,
"id": "62e36cac-2d1a-42c2-a68e-e0888d7afa0d",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "8929221a-af25-4673-8952-0a55f4e59e80",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}