674 lines
24 KiB
Plaintext
674 lines
24 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<img src=\"./images/DLI_Header.png\" width=400/>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Fundamentals of Accelerated Data Science # "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## 08 - Multi-GPU K-Means with Dask ##\n",
|
|
"\n",
|
|
"**Table of Contents**\n",
|
|
"<br>\n",
|
|
"This notebook uses GPU-accelerated K-means to identify population clusters in a multi-node, multi-GPU scalable way with Dask. This notebook covers the below sections: \n",
|
|
"1. [Environment](#Environment)\n",
|
|
"2. [Load and Persist Data](#Load-and-Persist-Data)\n",
|
|
"3. [Training the Model](#Training-the-Model)\n",
|
|
" * [Exercise #1 - Count Members of the Southernmost Cluster](#Exercise-#1---Count-Members-of-the-Southernmost-Cluster)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Environment ##\n",
|
|
"First we import the needed modules to create a Dask cuDF cluster. As we did before, we need to import CUDA context creators after setting up the cluster so they don't lock to a single device. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import subprocess\n",
|
|
"import logging\n",
|
|
"\n",
|
|
"from dask.distributed import Client, wait, progress\n",
|
|
"from dask_cuda import LocalCUDACluster"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import cudf\n",
|
|
"import dask_cudf\n",
|
|
"\n",
|
|
"import cuml\n",
|
|
"from cuml.dask.cluster import KMeans"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# create cluster\n",
|
|
"cmd = \"hostname --all-ip-addresses\"\n",
|
|
"process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)\n",
|
|
"output, error = process.communicate()\n",
|
|
"IPADDR = str(output.decode()).split()[0]\n",
|
|
"\n",
|
|
"cluster = LocalCUDACluster(ip=IPADDR, silence_logs=logging.ERROR)\n",
|
|
"client = Client(cluster)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Load and Persist Data ##\n",
|
|
"We will begin by loading the data, The data set has the two grid coordinate columns, `easting` and `northing`, derived from the main population data set we have prepared."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ddf = dask_cudf.read_csv('./data/uk_pop5x_coords.csv', dtype=['float32', 'float32'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Training the Model ##\n",
|
|
"Training the K-means model is very similar to both the scikit-learn version and the cuML single-GPU version--by setting up the client and importing from the `cuml.dask.cluster` module, the algorithm will automatically use the local Dask cluster we have set up.\n",
|
|
"\n",
|
|
"Note that calling `.fit` triggers Dask computation.\n",
|
|
"\n",
|
|
"Once we have the fit model, we extract the cluster centers and rename the columns from their generic `0` and `1` to reflect the data on which they were trained."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"CPU times: user 5.24 s, sys: 2.48 s, total: 7.72 s\n",
|
|
"Wall time: 1min 54s\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<style>#sk-container-id-1 {\n",
|
|
" /* Definition of color scheme common for light and dark mode */\n",
|
|
" --sklearn-color-text: black;\n",
|
|
" --sklearn-color-line: gray;\n",
|
|
" /* Definition of color scheme for unfitted estimators */\n",
|
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
|
" /* Definition of color scheme for fitted estimators */\n",
|
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
|
"\n",
|
|
" /* Specific color for light theme */\n",
|
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|
" --sklearn-color-icon: #696969;\n",
|
|
"\n",
|
|
" @media (prefers-color-scheme: dark) {\n",
|
|
" /* Redefinition of color scheme for dark theme */\n",
|
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|
" --sklearn-color-icon: #878787;\n",
|
|
" }\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 pre {\n",
|
|
" padding: 0;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 input.sk-hidden--visually {\n",
|
|
" border: 0;\n",
|
|
" clip: rect(1px 1px 1px 1px);\n",
|
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
|
" height: 1px;\n",
|
|
" margin: -1px;\n",
|
|
" overflow: hidden;\n",
|
|
" padding: 0;\n",
|
|
" position: absolute;\n",
|
|
" width: 1px;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-dashed-wrapped {\n",
|
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
|
" box-sizing: border-box;\n",
|
|
" padding-bottom: 0.4em;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-container {\n",
|
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
|
" so we also need the `!important` here to be able to override the\n",
|
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
|
" display: inline-block !important;\n",
|
|
" position: relative;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-text-repr-fallback {\n",
|
|
" display: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"div.sk-parallel-item,\n",
|
|
"div.sk-serial,\n",
|
|
"div.sk-item {\n",
|
|
" /* draw centered vertical line to link estimators */\n",
|
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
|
" background-size: 2px 100%;\n",
|
|
" background-repeat: no-repeat;\n",
|
|
" background-position: center center;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Parallel-specific style estimator block */\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-parallel-item::after {\n",
|
|
" content: \"\";\n",
|
|
" width: 100%;\n",
|
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
|
" flex-grow: 1;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-parallel {\n",
|
|
" display: flex;\n",
|
|
" align-items: stretch;\n",
|
|
" justify-content: center;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" position: relative;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-parallel-item {\n",
|
|
" display: flex;\n",
|
|
" flex-direction: column;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-parallel-item:first-child::after {\n",
|
|
" align-self: flex-end;\n",
|
|
" width: 50%;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-parallel-item:last-child::after {\n",
|
|
" align-self: flex-start;\n",
|
|
" width: 50%;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-parallel-item:only-child::after {\n",
|
|
" width: 0;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Serial-specific style estimator block */\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-serial {\n",
|
|
" display: flex;\n",
|
|
" flex-direction: column;\n",
|
|
" align-items: center;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" padding-right: 1em;\n",
|
|
" padding-left: 1em;\n",
|
|
"}\n",
|
|
"\n",
|
|
"\n",
|
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
|
"clickable and can be expanded/collapsed.\n",
|
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
|
"*/\n",
|
|
"\n",
|
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-toggleable {\n",
|
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Toggleable label */\n",
|
|
"#sk-container-id-1 label.sk-toggleable__label {\n",
|
|
" cursor: pointer;\n",
|
|
" display: block;\n",
|
|
" width: 100%;\n",
|
|
" margin-bottom: 0;\n",
|
|
" padding: 0.5em;\n",
|
|
" box-sizing: border-box;\n",
|
|
" text-align: center;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n",
|
|
" /* Arrow on the left of the label */\n",
|
|
" content: \"▸\";\n",
|
|
" float: left;\n",
|
|
" margin-right: 0.25em;\n",
|
|
" color: var(--sklearn-color-icon);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Toggleable content - dropdown */\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-toggleable__content {\n",
|
|
" max-height: 0;\n",
|
|
" max-width: 0;\n",
|
|
" overflow: hidden;\n",
|
|
" text-align: left;\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-toggleable__content.fitted {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-toggleable__content pre {\n",
|
|
" margin: 0.2em;\n",
|
|
" border-radius: 0.25em;\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
|
" /* Expand drop-down */\n",
|
|
" max-height: 200px;\n",
|
|
" max-width: 100%;\n",
|
|
" overflow: auto;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
|
" content: \"▾\";\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Estimator-specific style */\n",
|
|
"\n",
|
|
"/* Colorize estimator box */\n",
|
|
"#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n",
|
|
"#sk-container-id-1 div.sk-label label {\n",
|
|
" /* The background is the default theme color */\n",
|
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* On hover, darken the color of the background */\n",
|
|
"#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Label box, darken color on hover, fitted */\n",
|
|
"#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Estimator label */\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-label label {\n",
|
|
" font-family: monospace;\n",
|
|
" font-weight: bold;\n",
|
|
" display: inline-block;\n",
|
|
" line-height: 1.2em;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-label-container {\n",
|
|
" text-align: center;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Estimator-specific */\n",
|
|
"#sk-container-id-1 div.sk-estimator {\n",
|
|
" font-family: monospace;\n",
|
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
|
" border-radius: 0.25em;\n",
|
|
" box-sizing: border-box;\n",
|
|
" margin-bottom: 0.5em;\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-estimator.fitted {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* on hover */\n",
|
|
"#sk-container-id-1 div.sk-estimator:hover {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 div.sk-estimator.fitted:hover {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
|
"\n",
|
|
"/* Common style for \"i\" and \"?\" */\n",
|
|
"\n",
|
|
".sk-estimator-doc-link,\n",
|
|
"a:link.sk-estimator-doc-link,\n",
|
|
"a:visited.sk-estimator-doc-link {\n",
|
|
" float: right;\n",
|
|
" font-size: smaller;\n",
|
|
" line-height: 1em;\n",
|
|
" font-family: monospace;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" border-radius: 1em;\n",
|
|
" height: 1em;\n",
|
|
" width: 1em;\n",
|
|
" text-decoration: none !important;\n",
|
|
" margin-left: 1ex;\n",
|
|
" /* unfitted */\n",
|
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|
"}\n",
|
|
"\n",
|
|
".sk-estimator-doc-link.fitted,\n",
|
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
|
" /* fitted */\n",
|
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* On hover */\n",
|
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
|
".sk-estimator-doc-link:hover,\n",
|
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
|
".sk-estimator-doc-link:hover {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|
" color: var(--sklearn-color-background);\n",
|
|
" text-decoration: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|
".sk-estimator-doc-link.fitted:hover,\n",
|
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|
".sk-estimator-doc-link.fitted:hover {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|
" color: var(--sklearn-color-background);\n",
|
|
" text-decoration: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
|
".sk-estimator-doc-link span {\n",
|
|
" display: none;\n",
|
|
" z-index: 9999;\n",
|
|
" position: relative;\n",
|
|
" font-weight: normal;\n",
|
|
" right: .2ex;\n",
|
|
" padding: .5ex;\n",
|
|
" margin: .5ex;\n",
|
|
" width: min-content;\n",
|
|
" min-width: 20ex;\n",
|
|
" max-width: 50ex;\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
|
" /* unfitted */\n",
|
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
|
"}\n",
|
|
"\n",
|
|
".sk-estimator-doc-link.fitted span {\n",
|
|
" /* fitted */\n",
|
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
|
"}\n",
|
|
"\n",
|
|
".sk-estimator-doc-link:hover span {\n",
|
|
" display: block;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
|
"\n",
|
|
"#sk-container-id-1 a.estimator_doc_link {\n",
|
|
" float: right;\n",
|
|
" font-size: 1rem;\n",
|
|
" line-height: 1em;\n",
|
|
" font-family: monospace;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" border-radius: 1rem;\n",
|
|
" height: 1rem;\n",
|
|
" width: 1rem;\n",
|
|
" text-decoration: none;\n",
|
|
" /* unfitted */\n",
|
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 a.estimator_doc_link.fitted {\n",
|
|
" /* fitted */\n",
|
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* On hover */\n",
|
|
"#sk-container-id-1 a.estimator_doc_link:hover {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|
" color: var(--sklearn-color-background);\n",
|
|
" text-decoration: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|
"}\n",
|
|
"</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KMeansMG()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow fitted\"> KMeansMG<span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></label><div class=\"sk-toggleable__content fitted\"><pre>KMeansMG()</pre></div> </div></div></div></div>"
|
|
],
|
|
"text/plain": [
|
|
"KMeansMG()"
|
|
]
|
|
},
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"%%time\n",
|
|
"dkm = KMeans(n_clusters=20)\n",
|
|
"dkm.fit(ddf)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"northing float32\n",
|
|
"easting float32\n",
|
|
"dtype: object"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"cluster_centers = dkm.cluster_centers_\n",
|
|
"cluster_centers.columns = ddf.columns\n",
|
|
"cluster_centers.dtypes"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Exercise #1 - Count Members of the Southernmost Cluster ###\n",
|
|
"Using the `cluster_centers`, identify which cluster is the southernmost (has the lowest `northing` value) with the `nsmallest` method, then use `dkm.predict` to get labels for the data, and finally filter the labels to determine how many individuals the model estimated were in that cluster. \n",
|
|
"\n",
|
|
"**Instructions**: <br>\n",
|
|
"* Modify the `<FIXME>` only and execute the below cell to estimate the number of individuals in the southernmost cluster. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"31435157"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"south_idx = cluster_centers.nsmallest(1, 'northing').index[0]\n",
|
|
"labels_predicted = dkm.predict(ddf)\n",
|
|
"labels_predicted[labels_predicted==south_idx].compute().shape[0]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'status': 'ok', 'restart': True}"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"154087362\n",
|
|
"144014032\n",
|
|
"131789736\n",
|
|
"154907810\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import IPython\n",
|
|
"app = IPython.Application.instance()\n",
|
|
"app.kernel.do_shutdown(True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Well Done!**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<img src=\"./images/DLI_Header.png\" width=400/>"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.15"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|