Recently in the Mala GIS group, someone raised an interesting question: can GIS be used to analyze which province in China has produced the most leaders? In the past, you would have had to look this up through a search engine, where the data might be out of date and any quantitative analysis would require tedious manual data cleaning. In the AI era, the task becomes much easier with AI + GIS.
P.S.: This article is purely technical, and the data may not be entirely accurate. Corrections are welcome if any issues are found.
Approach
To create this map, two key points need to be defined:
- Definition of "Leaders": Here, we define them as Standing Committee members (abbreviated, as everyone understands).
- Province Attribution: We use native place (籍贯), a commonly searchable field. Note: Native place may differ from birthplace.
With these definitions, the basic workflow is as follows:
- Obtain the list of Standing Committee members from the 1st to the 20th National Congress.
- Search for their native places based on the list.
- Analyze the tabular data to count the number of members per province.
- Generate the map.
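The counting step in this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows below are hypothetical placeholders, not real data, and the choice to count a person once even if they served several terms is my assumption, since the workflow above does not fix that rule.

```python
from collections import Counter

# Hypothetical placeholder rows: (member, native-place province).
# These are NOT real data; substitute the list you collected.
records = [
    ("Member A", "湖南"),
    ("Member B", "山东"),
    ("Member C", "湖南"),
    ("Member A", "湖南"),  # same person appearing again in a later term
]

def count_by_province(rows):
    """Count distinct members per province, ignoring repeat appearances."""
    unique_members = set(rows)  # collapse repeated (member, province) pairs
    return Counter(province for _, province in unique_members)

counts = count_by_province(records)
for province, n in counts.most_common():
    print(province, n)
```

If you would rather count one entry per term, drop the `set(...)` step and count the rows directly.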
Using AI to Obtain Statistical Data
When writing this article, I used Gemini's 2.5 Pro model. You can try other models too. I won't share the specific prompts here, as it's inconvenient. I recommend using multiple AIs for this task, as the results might differ and require cross-verification.
After obtaining the data, export it to an Excel file. This step often requires significant data cleaning. For instance, Gemini might include the province's Pinyin alongside its name, which makes cleaning tedious and demands some skill and patience. One advantage of Gemini is its ability to generate Excel files directly, meaning you can instruct it to clean the data before outputting to Excel.
However, use this feature cautiously. In my tests, AI performance in this area was inconsistent. For example, Hunan (湖南) appeared 12 times, but both Gemini and Yuanbao counted it as 13... The final cleaned data looked like this:
Finally, export the data as a CSV file for later use.
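Because the AI tools miscounted in my tests, it may be safer to do the cleaning and counting yourself. The following is a minimal sketch, assuming the raw province cells mix Chinese names with Pinyin or brackets; the sample cells and the column names `province` and `count` are hypothetical.

```python
import csv
import io
import re
from collections import Counter

def clean_province(raw):
    """Keep only the CJK characters, dropping Pinyin, brackets, and spaces."""
    return "".join(re.findall(r"[\u4e00-\u9fff]+", raw))

def to_csv(rows):
    """Serialize (province, count) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["province", "count"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical raw cells, as an AI tool might emit them.
raw_cells = ["湖南 (Hunan)", "Shandong 山东", " 湖南"]
cleaned = [clean_province(cell) for cell in raw_cells]

# Recount from the cleaned cells instead of trusting the AI's own tally.
tally = Counter(cleaned)
print(to_csv(sorted(tally.items())))
```

When writing the result to disk, saving it as UTF-8 should help the Chinese province names survive the import into GIS software.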
Generating the Map with GIS Software
Now it's time for GIS software. Any common desktop GIS, such as ArcGIS or QGIS, will suffice. I'll use QGIS as an example to outline the basic steps.
Vector Map: We'll use the standard national vector map data shared in a previous article "「GIS Data」2024 National Standard Vector Map (Precise to County Level) Review Map No.: GS(2024) 0650". The provincial-level data is sufficient here.
Import the CSV file via Layer -> Add Layer -> Add Delimited Text Layer.
Since our CSV doesn't contain coordinates, leave the geometry settings as "No geometry".
Joining Data: Here's an issue: the name field in the Tianditu map uses full province names (e.g., "山东省", Shandong Province), while our collected data uses abbreviations (e.g., "山东", Shandong).
To match "山东省" with "山东", I used a simple method: open the Tianditu layer's Attribute Table, then use the Field Calculator to create a new field with the expression:
left(name,2)
This generates a new field holding the abbreviation (e.g., "山东"). Note: taking only the first two characters truncates three-character names, so "黑龙江" (Heilongjiang) becomes "黑龙" (Heilong) and will not match. This is acceptable here because Heilongjiang has no data points in this analysis. After processing the data, perform a table join.
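If you want to avoid the truncation problem, one option is to special-case the three-character names. The sketch below is not what I did in QGIS; it is an alternative written in plain Python, assuming the only provincial-level names with a three-character short form are 黑龙江 and 内蒙古. The function names `short_name` and `join_counts` are hypothetical, and the join simply reports any province in the CSV that found no match on the map.

```python
# Provincial-level names whose short form has three characters.
THREE_CHAR_PREFIXES = ("黑龙江", "内蒙古")

def short_name(full_name):
    """Reduce a full name such as "山东省" to its short form "山东"."""
    if full_name.startswith(THREE_CHAR_PREFIXES):
        return full_name[:3]
    return full_name[:2]

def join_counts(full_names, counts):
    """Attach a count to each full name; also list CSV keys left unmatched."""
    joined = {name: counts.get(short_name(name), 0) for name in full_names}
    matched = {short_name(name) for name in full_names}
    unmatched = sorted(set(counts) - matched)
    return joined, unmatched

# Hypothetical inputs: map-layer names and a made-up count table.
layer_names = ["山东省", "黑龙江省", "内蒙古自治区"]
csv_counts = {"山东": 3, "湖南": 2}
joined, unmatched = join_counts(layer_names, csv_counts)
print(joined)
print(unmatched)
```

A non-empty `unmatched` list is a useful warning sign that a name in the CSV is misspelled or that the map layer is missing a feature. Inside QGIS, a similar effect could probably be achieved with a conditional (CASE) expression in the Field Calculator instead of a bare `left(name,2)`.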
Finally, apply graduated color symbology to the map layer, so that each province is shaded according to its count:
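For readers unfamiliar with graduated symbology, it sorts each feature's value into a small number of classes and assigns one color per class. The sketch below illustrates one common classification, equal interval, in plain Python. It is an assumption for illustration only: QGIS offers several classification modes, and I am not claiming this is the one used for the map here. The sample values are hypothetical.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each of n_classes equally wide classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes + 1)]

def classify(value, breaks):
    """Return the index of the first class whose upper bound covers value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # guard against floating-point overshoot

# Hypothetical per-province counts.
sample_counts = [0, 2, 5, 10]
breaks = equal_interval_breaks(sample_counts, 5)
classes = [classify(v, breaks) for v in sample_counts]
print(breaks)
print(classes)
```

With skewed data, where one or two provinces dominate, equal intervals can leave most provinces in the lowest class, so it is worth trying other modes in the symbology panel.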
The final result is shown below:
Final Thoughts
This article focuses solely on discussing methods for data collection and visualization. The data accuracy is not guaranteed, and corrections are welcome if any errors are found. Also, due to various reasons, the specific prompts used and the final CSV/Excel data will not be shared. Please collect and process the data yourself following the approach outlined here.
Feel free to suggest other interesting map topics you'd like to see!