<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Diego Ripley – Blog</title><link>https://www.diegoripley.ca/blog/</link><description>Recent content in Blog on Diego Ripley</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="https://www.diegoripley.ca/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>What I Learned From Processing All of Statistics Canada's Tables</title><link>https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/</link><pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/</guid><description>
&lt;p>Over the past few weeks I have processed all of Statistics Canada&amp;rsquo;s data tables (also known as cubes, and referred to as &lt;code>product_id&lt;/code> in the tables embedded in this post) that are available through Statistics Canada&amp;rsquo;s &lt;a href="https://www.statcan.gc.ca/en/developers/wds"target="_blank" rel="noopener">Web Data Service&lt;/a> (WDS). I have always been interested in making statistical data products easily accessible to users, and after analyzing the current way of disseminating data tables, I was able to make several improvements. In this blog post I will talk about (1) the problem, (2) what I was able to achieve, (3) issues encountered while processing the data, and (4) next steps.&lt;/p>
&lt;h2>1. Problem&lt;span class="hx:absolute hx:-mt-20" id="1-problem">&lt;/span>
&lt;a href="#1-problem" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h2>&lt;p>As of July 6, 2025, there are 7918 data tables. There are two formats that can be downloaded, CSV, and XML, which are both disseminated as ZIP files. I chose to download the &lt;strong>English&lt;/strong> CSV files. I downloaded 7918 ZIP files that amounted to 178.33 GB compressed, and 3314.57 GB uncompressed.&lt;/p>
&lt;p>After working with the data for a bit I noticed the following problems:&lt;/p>
&lt;ul>
&lt;li>You first need to download a ZIP file, extract it, then process the dataset to your needs. That&amp;rsquo;s a lot of unnecessary steps. What if the data were simply in a file format optimized for efficient storage and retrieval? My goal is to allow users to easily link the Dissemination Geography Unique Identifier (DGUID) code to their geographic boundaries, so users can visualize all data tables in software such as QGIS and ArcGIS Pro.&lt;/li>
&lt;li>There is no site that keeps track of all changes to Statistics Canada&amp;rsquo;s data tables. That means that data can just disappear without any accountability.&lt;/li>
&lt;/ul>
&lt;h2>2. Result&lt;span class="hx:absolute hx:-mt-20" id="2-result">&lt;/span>
&lt;a href="#2-result" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h2>&lt;p>I was able to process 7911/7918 data tables (99.91%) and created Parquet files that amounted to 25.73 GB (14.43% of the ZIP file size). Here is an interactive table with each data table, and the various file statistics:&lt;/p>
&lt;p>
&lt;div id="grid-container-02" class="grid-container">&lt;/div>
&lt;script type="module" crossorigin src="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/index-CM8rRtFp.js">&lt;/script>
&lt;link rel="modulepreload" crossorigin href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/hyparquet-DUoUNJtp.js">
&lt;link rel="modulepreload" crossorigin href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/ag-grid-C8nY5wNI.js">
&lt;link rel="stylesheet" crossorigin href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/index-Bh7G-G2M.css">
&lt;div class="hx:overflow-x-auto hx:mt-6 hx:flex hx:rounded-lg hx:border hx:py-2 hx:ltr:pr-4 hx:rtl:pl-4 hx:contrast-more:border-current hx:contrast-more:dark:border-current hx:border-blue-200 hx:bg-blue-100 hx:text-blue-900 hx:dark:border-blue-200/30 hx:dark:bg-blue-900/30 hx:dark:text-blue-200">
&lt;div class="hx:ltr:pl-3 hx:ltr:pr-2 hx:rtl:pr-3 hx:rtl:pl-2">&lt;svg height=1.2em class="hx:inline-block hx:align-middle" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="2" stroke="currentColor" aria-hidden="true">&lt;path stroke-linecap="round" stroke-linejoin="round" d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"/>&lt;/svg>&lt;/div>
&lt;div class="hx:w-full hx:min-w-0 hx:leading-7">
&lt;div class="hx:mt-6 hx:leading-7 hx:first:mt-0">&lt;a href="product_stats_july_05_2025.parquet">Click here&lt;/a> to download this table as Parquet.&lt;/div>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h3>2.1 Notable Changes Made to the Data Tables&lt;span class="hx:absolute hx:-mt-20" id="21-notable-changes-made-to-the-data-tables">&lt;/span>
&lt;a href="#21-notable-changes-made-to-the-data-tables" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>Here are some notable changes made to the data tables:&lt;/p>
&lt;ul>
&lt;li>The Parquet files have optimized data types. For example, if the &lt;code>VALUE&lt;/code> column contains only integers and has a maximum value of 2147483646, then the column is defined as a 32-bit integer. This is important because optimized data types mean less memory usage when processing the file.&lt;/li>
&lt;li>Two new columns were added to each data table: &lt;code>REF_START_DATE&lt;/code> and &lt;code>REF_END_DATE&lt;/code>, both derived from the &lt;code>REF_DATE&lt;/code> column. These were added to enable date range queries via software such as DuckDB. The logic for the &lt;code>REF_START_DATE&lt;/code> and &lt;code>REF_END_DATE&lt;/code> columns is as follows:
&lt;ul>
&lt;li>When the &lt;code>REF_DATE&lt;/code> column contained just the year (ex. &lt;code>2024&lt;/code>), the &lt;code>REF_START_DATE&lt;/code> was set to &lt;code>2024-01-01&lt;/code> and the &lt;code>REF_END_DATE&lt;/code> was set to &lt;code>2024-12-31&lt;/code>.&lt;/li>
&lt;li>When the &lt;code>REF_DATE&lt;/code> column contained the year and month (ex. &lt;code>2024-01&lt;/code>), the &lt;code>REF_START_DATE&lt;/code> was set to &lt;code>2024-01-01&lt;/code> and the &lt;code>REF_END_DATE&lt;/code> was set to &lt;code>2024-01-31&lt;/code>.&lt;/li>
&lt;li>When the &lt;code>REF_DATE&lt;/code> column contained the year, month, and day (ex. &lt;code>2024-01-01&lt;/code>), the &lt;code>REF_START_DATE&lt;/code> was set to &lt;code>2024-01-01&lt;/code> and the &lt;code>REF_END_DATE&lt;/code> was set to &lt;code>2024-01-01&lt;/code>.&lt;/li>
&lt;li>There were cases that I was unable to parse, such as a &lt;code>REF_DATE&lt;/code> set to &lt;code>2023/2024&lt;/code> in table &lt;code>17100022&lt;/code>. According to the metadata, the period is from July 1 to June 30, so I cannot just set January 1, 2023 as the &lt;code>REF_START_DATE&lt;/code> and December 31, 2024 as the &lt;code>REF_END_DATE&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>I had to rename columns whose names differ only by case to avoid conflicts in DuckDB. For example, table &lt;code>10100164&lt;/code> has two such columns: &lt;code>Value&lt;/code> and &lt;code>VALUE&lt;/code>. DuckDB treats column names in a case-insensitive manner, so in these cases &lt;code>Value&lt;/code> was renamed to &lt;code>Value.1&lt;/code>.&lt;/li>
&lt;/ul>
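&lt;p>The &lt;code>REF_DATE&lt;/code> rules above can be sketched in a few lines of Python. This is only a minimal illustration and not the code used to build the Parquet files; the function name &lt;code>ref_date_bounds&lt;/code> is hypothetical, and anything that is not a plain year, year-month, or year-month-day value is returned as &lt;code>None&lt;/code> rather than guessed.&lt;/p>

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

# Hypothetical helper: accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD" only.
_REF_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def ref_date_bounds(ref_date: str) -> Optional[Tuple[date, date]]:
    """Return (REF_START_DATE, REF_END_DATE), or None when REF_DATE cannot be parsed."""
    match = _REF_DATE.fullmatch(ref_date.strip())
    if match is None:
        return None  # e.g. "2023/2024", which needs the table metadata to interpret
    year = int(match.group(1))
    month = match.group(2)
    day = match.group(3)
    try:
        if month is None:
            # Year only: the whole calendar year.
            return date(year, 1, 1), date(year, 12, 31)
        if day is None:
            # Year and month: first to last day of that month (leap years included).
            last_day = calendar.monthrange(year, int(month))[1]
            return date(year, int(month), 1), date(year, int(month), last_day)
        # Full date: start and end are the same day.
        exact = date(year, int(month), int(day))
        return exact, exact
    except ValueError:
        return None  # out-of-range year, month or day
```

&lt;p>Returning &lt;code>None&lt;/code> for values such as &lt;code>2023/2024&lt;/code> keeps the two derived columns empty instead of storing a range that the table metadata might contradict.&lt;/p>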
&lt;h2>3. Issues Encountered&lt;span class="hx:absolute hx:-mt-20" id="3-issues-encountered">&lt;/span>
&lt;a href="#3-issues-encountered" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h2>&lt;p>These are the issues I encountered when using Statistics Canada&amp;rsquo;s WDS.&lt;/p>
&lt;h3>3.1 Inconsistent Timezone Used for &lt;code>releaseTime&lt;/code>&lt;span class="hx:absolute hx:-mt-20" id="31-inconsistent-timezone-used-for-releasetime">&lt;/span>
&lt;a href="#31-inconsistent-timezone-used-for-releasetime" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>When using &lt;a href="https://www.statcan.gc.ca/en/developers/wds/user-guide#a11-5"target="_blank" rel="noopener">getAllCubesListLite&lt;/a>, the &lt;code>releaseTime&lt;/code> is in Coordinated Universal Time (UTC). However when you get the table metadata via &lt;a href="https://www.statcan.gc.ca/en/developers/wds/user-guide#a11-1"target="_blank" rel="noopener">getCubeMetadata&lt;/a>, the &lt;code>releaseTime&lt;/code> is in Eastern Standard Time (EST).&lt;/p>
&lt;div class="hextra-cards hx:mt-4 hx:gap-4 hx:grid not-prose" style="--hextra-cards-grid-cols: 3;">
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getAllCubesListLite_example.webp"
>&lt;img
alt="UTC"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getAllCubesListLite_example.webp"
/>&lt;div class="hx:mt-auto">
&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50">UTC&lt;/span>&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2">releaseTime for getAllCubesListLite in UTC&lt;/div>&lt;/div>&lt;/a>
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getCubeMetadata_example.webp"
>&lt;img
alt="EST"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getCubeMetadata_example.webp"
/>&lt;div class="hx:mt-auto">
&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50">EST&lt;/span>&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2">releaseTime for getCubeMetadata in EST&lt;/div>&lt;/div>&lt;/a>
&lt;/div>
&lt;p>You can replicate this issue by running the following two commands. The first command gets the &lt;code>releaseTime&lt;/code> for productId (table) 10100139 through getAllCubesListLite and the second command gets the &lt;code>releaseTime&lt;/code> through getCubeMetadata.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Get releaseTime for productId 10100139 via getAllCubesListLite&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;This is the releaseTime for productId 10100139 retrieved through /getAllCubesListLite&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl https://www150.statcan.gc.ca/t1/wds/rest/getAllCubesListLite &lt;span class="p">|&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> jq -r &lt;span class="s1">&amp;#39;.[] | select(.productId==10100139) | .releaseTime&amp;#39;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/div>&lt;div class="hextra-code-copy-btn-container hx:opacity-0 hx:transition hx:group-hover/code:opacity-100 hx:flex hx:gap-1 hx:absolute hx:m-[11px] hx:right-0 hx:top-0">
&lt;/div>
&lt;/div>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Get releaseTime for productId 10100139 via getCubeMetadata&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;This is the releaseTime for productId 10100139 retrieved through /getCubeMetadata&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl https://www150.statcan.gc.ca/t1/wds/rest/getCubeMetadata &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --header &lt;span class="s1">&amp;#39;Content-Type: application/json&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --data &lt;span class="s1">&amp;#39;[{&amp;#34;productId&amp;#34;:10100139}]&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> jq &lt;span class="s1">&amp;#39;.[0].object.releaseTime&amp;#39;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/div>&lt;div class="hextra-code-copy-btn-container hx:opacity-0 hx:transition hx:group-hover/code:opacity-100 hx:flex hx:gap-1 hx:absolute hx:m-[11px] hx:right-0 hx:top-0">
&lt;/div>
&lt;/div>
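&lt;p>To compare the two endpoints, both values need to be put on the same clock first. The sketch below is one way to do that in Python. It assumes, as described above, that a trailing &lt;code>Z&lt;/code> marks a UTC value and that a value without an offset is Eastern Standard Time, modelled as a fixed UTC-5 offset; daylight saving time is not handled, and the function name &lt;code>release_time_to_utc&lt;/code> is hypothetical.&lt;/p>

```python
from datetime import datetime, timedelta, timezone

# Assumption taken from the post: getCubeMetadata reports Eastern Standard Time,
# modelled here as a fixed UTC-5 offset (daylight saving time is not handled).
EST = timezone(timedelta(hours=-5), "EST")


def release_time_to_utc(value: str) -> datetime:
    """Parse a releaseTime string and return a timezone-aware UTC datetime."""
    if value.endswith("Z"):
        # getAllCubesListLite style, e.g. "2020-11-02T13:30:00Z"
        return datetime.fromisoformat(value[:-1]).replace(tzinfo=timezone.utc)
    # getCubeMetadata style, e.g. "2023-11-02T14:15" (no offset in the string)
    return datetime.fromisoformat(value).replace(tzinfo=EST).astimezone(timezone.utc)
```

&lt;p>Once both values are timezone-aware, they can be subtracted or compared directly.&lt;/p>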
&lt;h3>3.2 Different &lt;code>releaseTime&lt;/code> Values&lt;span class="hx:absolute hx:-mt-20" id="32-different-releasetime-values">&lt;/span>
&lt;a href="#32-different-releasetime-values" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>There is a difference in &lt;strong>some&lt;/strong> of the &lt;code>releaseTime&lt;/code> values that are returned when using &lt;a href="https://www.statcan.gc.ca/en/developers/wds/user-guide#a11-5"target="_blank" rel="noopener">getAllCubesListLite&lt;/a> or &lt;a href="https://www.statcan.gc.ca/en/developers/wds/user-guide#a11-1"target="_blank" rel="noopener">getCubeMetadata&lt;/a>
For the example below there is a 3 year difference in the &lt;code>releaseTime&lt;/code>.&lt;/p>
&lt;div class="hextra-cards hx:mt-4 hx:gap-4 hx:grid not-prose" style="--hextra-cards-grid-cols: 3;">
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getAllCubesListLite_date_discrepancy.webp"
>&lt;img
alt="3 Year Difference"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getAllCubesListLite_date_discrepancy.webp"
/>&lt;div class="hx:mt-auto">
&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50">3 Year Difference&lt;/span>&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2">releaseTime of 2020-11-02T13:30:00Z&lt;/div>&lt;/div>&lt;/a>
&lt;a
class="hextra-card hx:group hx:flex hx:flex-col hx:justify-start hx:overflow-hidden hx:rounded-lg hx:border hx:border-gray-200 hx:text-current hx:no-underline hx:dark:shadow-none hx:hover:shadow-gray-100 hx:dark:hover:shadow-none hx:shadow-gray-100 hx:active:shadow-sm hx:active:shadow-gray-200 hx:transition-all hx:duration-200 hx:hover:border-gray-300 hx:bg-gray-100 hx:shadow-sm hx:dark:border-neutral-700 hx:dark:bg-neutral-800 hx:dark:text-gray-50 hx:hover:shadow-lg hx:dark:hover:border-neutral-500 hx:dark:hover:bg-neutral-700"href="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getCubeMetadata_date_discrepancy.webp"
>&lt;img
alt="3 Year Difference"
class="hextra-card-image"
loading="lazy"
decoding="async"
src="https://www.diegoripley.ca/blog/2025/what-i-learned-from-processing-all-statcan-tables/getCubeMetadata_date_discrepancy.webp"
/>&lt;div class="hx:mt-auto">
&lt;span class="hextra-card-icon hx:flex hx:font-semibold hx:items-start hx:gap-2 hx:pt-4 hx:px-4 hx:text-gray-700 hx:hover:text-gray-900 hx:dark:text-neutral-200 hx:dark:hover:text-neutral-50">3 Year Difference&lt;/span>&lt;div class="hextra-card-subtitle hx:line-clamp-3 hx:text-sm hx:font-normal hx:text-gray-500 hx:dark:text-gray-400 hx:px-4 hx:mb-4 hx:mt-2">releaseTime of 2023-11-02T14:15&lt;/div>&lt;/div>&lt;/a>
&lt;/div>
&lt;p>You can replicate this issue by running the following two commands. The first command gets the &lt;code>releaseTime&lt;/code> for productId (table) 10100007 through getAllCubesListLite and the second command gets the &lt;code>releaseTime&lt;/code> through getCubeMetadata.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Get releaseTime for productId 10100007 via getAllCubesListLite&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;This is the releaseTime for productId 10100007 retrieved through /getAllCubesListLite&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl https://www150.statcan.gc.ca/t1/wds/rest/getAllCubesListLite &lt;span class="p">|&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> jq -r &lt;span class="s1">&amp;#39;.[] | select(.productId==10100007) | .releaseTime&amp;#39;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/div>&lt;div class="hextra-code-copy-btn-container hx:opacity-0 hx:transition hx:group-hover/code:opacity-100 hx:flex hx:gap-1 hx:absolute hx:m-[11px] hx:right-0 hx:top-0">
&lt;/div>
&lt;/div>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Get releaseTime for productId 10100007 via getCubeMetadata&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;This is the releaseTime for productId 10100007 retrieved through /getCubeMetadata&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl https://www150.statcan.gc.ca/t1/wds/rest/getCubeMetadata &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --header &lt;span class="s1">&amp;#39;Content-Type: application/json&amp;#39;&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> --data &lt;span class="s1">&amp;#39;[{&amp;#34;productId&amp;#34;:10100007}]&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> jq &lt;span class="s1">&amp;#39;.[0].object.releaseTime&amp;#39;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/div>&lt;div class="hextra-code-copy-btn-container hx:opacity-0 hx:transition hx:group-hover/code:opacity-100 hx:flex hx:gap-1 hx:absolute hx:m-[11px] hx:right-0 hx:top-0">
&lt;/div>
&lt;/div>
&lt;h3>3.3 Different Data Types for The &lt;code>productId&lt;/code>&lt;span class="hx:absolute hx:-mt-20" id="33-different-data-types-for-the-productid">&lt;/span>
&lt;a href="#33-different-data-types-for-the-productid" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>See the previous two examples. When using &lt;a href="https://www.statcan.gc.ca/en/developers/wds/user-guide#a11-5"target="_blank" rel="noopener">getAllCubesListLite&lt;/a>, the &lt;code>productId&lt;/code> is an integer. However when you get the table metadata via &lt;a href="https://www.statcan.gc.ca/en/developers/wds/user-guide#a11-1"target="_blank" rel="noopener">getCubeMetadata&lt;/a>, the &lt;code>productId&lt;/code> is a string. This is a minor issue.&lt;/p>
&lt;h3>3.4 Invalid DGUIDs&lt;span class="hx:absolute hx:-mt-20" id="34-invalid-dguids">&lt;/span>
&lt;a href="#34-invalid-dguids" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>There are 6037 distinct invalid DGUIDs (see interactive list below). These records were found by finding any records that did not match the regular expression listed below. The regular expression was built from the definitions outlined in &lt;a href="https://www150.statcan.gc.ca/n1/pub/92f0138m/92f0138m2019001-eng.htm"target="_blank" rel="noopener">1&lt;/a> and &lt;a href="https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/definition-eng.cfm?ID=geo055"target="_blank" rel="noopener">2&lt;/a>.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;pre>&lt;code># Regular expression made from:
# https://www150.statcan.gc.ca/n1/pub/92f0138m/92f0138m2019001-eng.htm
# https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/definition-eng.cfm?ID=geo055
pattern = r&amp;#39;^(?P&amp;lt;vintage&amp;gt;\d{4})(?P&amp;lt;type&amp;gt;[ASCBZ])(?P&amp;lt;schema&amp;gt;\d{4})(?P&amp;lt;guid&amp;gt;[A-Za-z0-9.]{1,11})$&amp;#39;&lt;/code>&lt;/pre>&lt;/div>&lt;div class="hextra-code-copy-btn-container hx:opacity-0 hx:transition hx:group-hover/code:opacity-100 hx:flex hx:gap-1 hx:absolute hx:m-[11px] hx:right-0 hx:top-0">
&lt;/div>
&lt;/div>
&lt;p>I have noticed a few patterns:&lt;/p>
&lt;ul>
&lt;li>Some don&amp;rsquo;t have a &lt;code>DGUID&lt;/code>, but have a &lt;code>GEO&lt;/code> value.
&lt;ul>
&lt;li>This makes sense in cases where the table covers geographic areas outside of Canada. For example, there are some &lt;code>Mexico&lt;/code> values.&lt;/li>
&lt;li>There are multiple cases where there should be a &lt;code>DGUID&lt;/code>. Table &lt;code>43100008&lt;/code> has &lt;code>Canada&lt;/code> as the &lt;code>GEO&lt;/code> value but has no &lt;code>DGUID&lt;/code>.&lt;/li>
&lt;li>Some cases would require a reworking of the &lt;code>DGUID&lt;/code>, but it is doable. For example, table &lt;code>11100025&lt;/code> has a &lt;code>GEO&lt;/code> value of &lt;code>All census metropolitan areas&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Some are just regular geographic unique identifiers without the &lt;code>Vintage&lt;/code>, &lt;code>Type&lt;/code>, and &lt;code>Schema&lt;/code>. For example &lt;code>product_id&lt;/code> &lt;code>13100409&lt;/code> has a &lt;code>DGUID&lt;/code> of &lt;code>10&lt;/code>, which is the Province and Territory code.&lt;/li>
&lt;li>Some are completely wrong, such as table &lt;code>38100162&lt;/code>, which has a &lt;code>DGUID&lt;/code> of &lt;code>2016E200213.1.195&lt;/code>.&lt;/li>
&lt;/ul>
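&lt;p>As a rough illustration, the check and the patterns above can be expressed as a small Python classifier. It uses the same regular expression as above, written with positional groups, and the category labels (&lt;code>missing&lt;/code>, &lt;code>bare_code&lt;/code>, &lt;code>malformed&lt;/code>) are names made up for this sketch rather than anything defined by Statistics Canada.&lt;/p>

```python
import re
from typing import Optional

# Same pattern as above, written with positional groups:
# vintage (4 digits), type, schema (4 digits), geographic identifier.
DGUID_PATTERN = re.compile(r"(\d{4})([ASCBZ])(\d{4})([A-Za-z0-9.]{1,11})")


def classify_dguid(dguid: Optional[str]) -> str:
    """Label a DGUID as 'valid', 'missing', 'bare_code' or 'malformed'."""
    if dguid is None or dguid.strip() == "":
        return "missing"  # a GEO value is present but the DGUID is empty
    value = dguid.strip()
    if DGUID_PATTERN.fullmatch(value):
        return "valid"
    if value.isdigit():
        return "bare_code"  # identifier without vintage, type and schema
    return "malformed"


# The invalid examples are the ones mentioned in the list above.
for example in ("2016A000011124", "", "10", "2016E200213.1.195"):
    print(repr(example), classify_dguid(example))
```

&lt;p>Counting the labels per table gives a quick view of which tables only lack a &lt;code>DGUID&lt;/code> and which ones carry values that need manual review.&lt;/p>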
&lt;p>
&lt;div id="grid-container" class="grid-container">&lt;/div>
&lt;div class="hx:overflow-x-auto hx:mt-6 hx:flex hx:rounded-lg hx:border hx:py-2 hx:ltr:pr-4 hx:rtl:pl-4 hx:contrast-more:border-current hx:contrast-more:dark:border-current hx:border-blue-200 hx:bg-blue-100 hx:text-blue-900 hx:dark:border-blue-200/30 hx:dark:bg-blue-900/30 hx:dark:text-blue-200">
&lt;div class="hx:ltr:pl-3 hx:ltr:pr-2 hx:rtl:pr-3 hx:rtl:pl-2">&lt;svg height=1.2em class="hx:inline-block hx:align-middle" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="2" stroke="currentColor" aria-hidden="true">&lt;path stroke-linecap="round" stroke-linejoin="round" d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"/>&lt;/svg>&lt;/div>
&lt;div class="hx:w-full hx:min-w-0 hx:leading-7">
&lt;div class="hx:mt-6 hx:leading-7 hx:first:mt-0">&lt;a href="invalid_dguids_tables_july_05_2025.parquet">Click here&lt;/a> to download this table as Parquet.&lt;/div>
&lt;/div>
&lt;/div>
&lt;/p>
&lt;h3>3.5 Empty XML Data for Certain Tables&lt;span class="hx:absolute hx:-mt-20" id="35-empty-xml-data-for-certain-tables">&lt;/span>
&lt;a href="#35-empty-xml-data-for-certain-tables" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>I processed all English CSV data, but I was curious how large of an XML we would get for the large CSV tables. I checked out table &lt;code>98100404&lt;/code>, which has a CSV file size of 37.67 GB, and when I tried to download it, it returned a 66.37 KB ZIP file, which is far too small. When I unzipped the file, it just returned the &lt;code>98100404_Structure.xml&lt;/code>, and it is missing the expected &lt;code>98100404_1.xml&lt;/code> file.&lt;/p>
&lt;p>You can replicate the issue by running the following.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Downloads the zipped up XML for productId 98100404&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">curl https://www150.statcan.gc.ca/t1/wds/rest/getFullTableDownloadSDMX/98100404 &lt;span class="p">|&lt;/span> &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="se">&lt;/span> jq -r &lt;span class="s1">&amp;#39;.object&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> xargs curl -O&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/div>&lt;div class="hextra-code-copy-btn-container hx:opacity-0 hx:transition hx:group-hover/code:opacity-100 hx:flex hx:gap-1 hx:absolute hx:m-[11px] hx:right-0 hx:top-0">
&lt;/div>
&lt;/div>
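&lt;p>Once the ZIP file is downloaded, the missing data file can be detected without extracting anything. The following Python sketch assumes that every XML download should contain the same two members seen for table &lt;code>98100404&lt;/code>, a structure file and a &lt;code>_1.xml&lt;/code> data file; that naming rule is inferred from this one table, and the function name is hypothetical.&lt;/p>

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def missing_xml_members(archive: Union[str, BinaryIO], product_id: int) -> List[str]:
    """Return the expected XML member names that are absent from the ZIP."""
    # Assumed naming convention, based on table 98100404.
    expected = [f"{product_id}_Structure.xml", f"{product_id}_1.xml"]
    with zipfile.ZipFile(archive) as zf:
        present = set(zf.namelist())
    return [name for name in expected if name not in present]


# Demonstration with an in-memory archive that only contains the structure file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("98100404_Structure.xml", "")
buffer.seek(0)
print(missing_xml_members(buffer, 98100404))
```

&lt;p>The same function accepts a path to a downloaded file, so it can be run over every XML archive to list the tables affected.&lt;/p>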
&lt;h3>3.6 Not Enough RAM!&lt;span class="hx:absolute hx:-mt-20" id="36-not-enough-ram">&lt;/span>
&lt;a href="#36-not-enough-ram" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h3>&lt;p>I only have 32 GB of RAM on my PC, and as you can see on the table listed in &lt;code>2. Result&lt;/code>, the largest table is 120.09 GB. I had to get creative when processing it. I added a 400 GB swapfile and changed a couple of kernel parameters (see below) in &lt;code>/etc/sysctl.d&lt;/code>.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;pre>&lt;code># Accepts values from 0 to 200. A higher value means the system will swap more aggressively. The default value is 60.
vm.swappiness = 200
# Controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects. The default value is 100; a value of 0 means the kernel never reclaims these caches.
vm.vfs_cache_pressure = 0&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
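&lt;p>A minimal sketch, assuming a Linux system, of how one might confirm that values like these were applied after a reboot or &lt;code>sysctl --system&lt;/code>. It relies on the fact that each sysctl key is exposed as a file under &lt;code>/proc/sys&lt;/code>, with the dots in the key replaced by slashes. The helper names &lt;code>sysctl_path&lt;/code> and &lt;code>check_sysctls&lt;/code> are illustrative and not part of the original setup.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;pre>&lt;code class="language-python" data-lang="python">from pathlib import Path

# Values taken from the sysctl fragment above.
EXPECTED = {"vm.swappiness": "200", "vm.vfs_cache_pressure": "0"}


def sysctl_path(key, root="/proc/sys"):
    """Map a sysctl key such as 'vm.swappiness' to its file under /proc/sys."""
    return Path(root).joinpath(*key.split("."))


def check_sysctls(expected, root="/proc/sys"):
    """Return {key: (expected, actual)} for every value that does not match."""
    mismatches = {}
    for key, want in expected.items():
        path = sysctl_path(key, root)
        # A missing file means the kernel does not expose this parameter.
        actual = path.read_text().strip() if path.exists() else None
        if actual != want:
            mismatches[key] = (want, actual)
    return mismatches


if __name__ == "__main__":
    print(check_sysctls(EXPECTED) or "all values applied")&lt;/code>&lt;/pre>&lt;/div>
&lt;/div>
&lt;p>The &lt;code>root&lt;/code> argument exists only so the check can be pointed at a test directory; on a real system the default is what you want.&lt;/p>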
&lt;h2>4. Next Steps&lt;span class="hx:absolute hx:-mt-20" id="4-next-steps">&lt;/span>
&lt;a href="#4-next-steps" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h2>&lt;ul>
&lt;li>Create a Dagster pipeline that automatically keeps the data up-to-date.&lt;/li>
&lt;li>Make the data accessible long-term by storing it on &lt;a href="https://zenodo.org/communities/dataforcanada/records"target="_blank" rel="noopener">Zenodo&lt;/a> (operated by CERN). Zenodo supports dataset versioning, so we can keep track of the changes to each table.&lt;/li>
&lt;li>Create Python and R API bindings that use DuckDB. Users will be able to filter the data and also link the geographic boundaries if they wish. I am currently working on this &lt;a href="https://github.com/dataforcanada/d4c-api"target="_blank" rel="noopener">here&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2>5. Other&lt;span class="hx:absolute hx:-mt-20" id="5-other">&lt;/span>
&lt;a href="#5-other" class="subheading-anchor" aria-label="Permalink for this section">&lt;/a>&lt;/h2>&lt;p>I made a brief 5-minute presentation on modernizing Statistics Canada data. You can view it &lt;a href="https://data-01.diegoripley.ca/modernizing_access_to_statistics_canada_data_july_11_2025/#/"target="_blank" rel="noopener">here&lt;/a>; it is best viewed in full screen.&lt;/p></description></item><item><title>Calculating Polygon Centroid with OGR</title><link>https://www.diegoripley.ca/blog/2014/calculating-centroid-with-ogr/</link><pubDate>Sun, 30 Nov 2014 00:00:00 +0000</pubDate><guid>https://www.diegoripley.ca/blog/2014/calculating-centroid-with-ogr/</guid><description>
&lt;p>A simple recipe for calculating the centroids of polygon features.&lt;/p>
&lt;div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">
&lt;div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">collections&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">OrderedDict&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">osgeo&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">ogr&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">calculate_centroid&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">spatial_file&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">unique_field&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="s2">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">    :param spatial_file: The path of the spatial file.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">    :param unique_field: The unique id to use as a key for
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">                         our dictionary.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">    :return: Ordered dictionary of the centroid of features
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">             (longitude, latitude).
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s2">    &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">data_source&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ogr&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">spatial_file&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">layer&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">data_source&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetLayerByIndex&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="n">feature_centroids&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">OrderedDict&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">for&lt;/span> &lt;span class="n">feature&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">layer&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">geom&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">feature&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetGeometryRef&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">unique_identifier&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">feature&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetField&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">unique_field&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="c1"># If multipart, get centroid from the part with largest area.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="k">if&lt;/span> &lt;span class="n">geom&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetGeometryName&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s1">&amp;#39;MULTIPOLYGON&amp;#39;&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">parts&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="k">for&lt;/span> &lt;span class="n">part&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">geom&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">centroid&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">part&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Centroid&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                &lt;span class="n">parts&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">append&lt;/span>&lt;span class="p">(((&lt;/span>&lt;span class="n">centroid&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetX&lt;/span>&lt;span class="p">(),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                                &lt;span class="n">centroid&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetY&lt;/span>&lt;span class="p">()),&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">                              &lt;span class="n">part&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Area&lt;/span>&lt;span class="p">()))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="c1"># Choose the centroid of the part with the largest area.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">largest&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="nb">max&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">parts&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">key&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">lambda&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="n">x&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">centroid&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">largest&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="k">else&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">centroid&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">geom&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Centroid&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">            &lt;span class="n">centroid&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">centroid&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetX&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">centroid&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">GetY&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">        &lt;span class="n">feature_centroids&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">unique_identifier&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">centroid&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">    &lt;span class="k">return&lt;/span> &lt;span class="n">feature_centroids&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">calculate_centroid&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;filename.geojson&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;field&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/div>
&lt;/div></description></item></channel></rss>