Effective Methods for Validating GeoJSON Data Integrity

A few days ago, while developing a GIS dashboard, I ran into a peculiar issue: the map worked perfectly in the development environment but failed to display in production. After a thorough investigation, I found that the GeoJSON data in production was incomplete. The data team had probably edited the file by hand during the GeoJSON export and accidentally deleted some content. Because the data was never validated beforehand, the map failed to load.

After identifying the problem, I decided to create a validation tool to prevent similar issues in the future. I tested several methods available online and have summarized a few reliable ones to share with you. The core idea behind the first two methods is to attempt to convert each GeoJSON file into an ArcGIS format (such as a feature class). If the conversion succeeds, the file is considered valid; if it fails, capture the error and log it. The third method skips the conversion and checks only the JSON syntax.

Method 1: Using Geoprocessing Tools for Manual Batch Processing (Suitable for a Small Number of Files)

This method leverages the error handling mechanism of geoprocessing tools but requires manual setup.

First, place all the GeoJSON files you need to check in the same folder. Then, open the Geoprocessing pane (Analysis > Tools) and search for the JSON To Features tool. This official conversion tool is strict about format requirements, which makes it well suited to checking validity. The tool takes one input file per run, so to process several files at once, right-click the tool in the search results and choose Batch, with Input JSON as the batch parameter. In the batch tool interface, click the folder icon next to the Input JSON parameter, locate the GeoJSON folder, and select multiple files (hold Ctrl or Shift keys). Finally, set an output location and click Run. Any file that cannot be converted is reported as a failure in the tool messages.

The advantage of this method is that it requires no programming and utilizes existing tools. However, for a large number of files, the manual operation is cumbersome and may produce many temporary feature classes that are not needed.

Method 2: Using Python Scripts for Automated Batch Detection (Recommended)

This is the most powerful and automated method. You can create a Python script and run it in the ArcGIS Pro Python window or Notebook.

Script Approach:

Traverse all .geojson / .json files in a specified folder.
For each file, attempt conversion using arcpy.conversion.JSONToFeatures.
Use try-except statements to capture errors.
Record information about successful and failed files in a log file.

Example Script:

import arcpy
import os

# Set the workspace (folder containing GeoJSON files)
geo_json_folder = r"C:\Path\To\Your\GeoJSON\Folder"
# Set a temporary output geodatabase (for attempted conversions)
temp_gdb = r"C:\Path\To\Your\Temp\Validation.gdb"
# Set the log file path
log_file_path = os.path.join(geo_json_folder, "GeoJSON_Validation_Log.txt")

# Ensure the temporary GDB exists
if not arcpy.Exists(temp_gdb):
    arcpy.management.CreateFileGDB(os.path.dirname(temp_gdb), os.path.basename(temp_gdb))

# Get all GeoJSON files in the folder (case-insensitive, so "DATA.GEOJSON" is also picked up)
geo_json_files = [f for f in os.listdir(geo_json_folder) if f.lower().endswith(('.geojson', '.json'))]

# Open the log file for writing
with open(log_file_path, 'w') as log_file:
    log_file.write("GeoJSON File Validity Check Report\n")
    log_file.write("=" * 50 + "\n")

    for file_name in geo_json_files:
        file_path = os.path.join(geo_json_folder, file_name)

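        # Set the temporary output feature class name
        # ValidateTableName replaces characters a geodatabase does not allow (spaces, hyphens, ...)
        output_fc_name = arcpy.ValidateTableName(os.path.splitext(file_name)[0], temp_gdb)
        output_fc = os.path.join(temp_gdb, output_fc_name)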

        try:
            # Attempt to convert GeoJSON to a feature class
            # This is the critical step; failure will throw an exception
            arcpy.conversion.JSONToFeatures(file_path, output_fc)

            # If successful, write to log
            status_msg = f"SUCCESS: {file_name} is a valid GeoJSON file.\n"
            print(status_msg)
            log_file.write(status_msg)

            # Optional: Delete the successfully converted temporary feature class to save space
            arcpy.management.Delete(output_fc)

        except arcpy.ExecuteError as e:
            # If conversion fails, capture ArcGIS error
            error_msg = f"FAILED: {file_name} is not a valid GeoJSON file. Error: {str(e)}\n"
            print(error_msg)
            log_file.write(error_msg)
        except Exception as e:
            # Capture other possible exceptions (e.g., file unreadable)
            error_msg = f"ERROR: An unknown error occurred while processing {file_name}. Error: {str(e)}\n"
            print(error_msg)
            log_file.write(error_msg)

print(f"Validation complete! Detailed report saved to: {log_file_path}")

After the script runs, it will generate a text file named GeoJSON_Validation_Log.txt in your GeoJSON folder, clearly recording the detection results and specific error messages for each file.
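With many files, reading the whole report line by line gets tedious. The following is a minimal sketch of a helper that tallies the report, assuming the log keeps the "STATUS: message" line format written by the script above. The function name summarize_log is hypothetical and not part of arcpy or the original script.

```python
from collections import Counter

# Status keywords that the validation scripts write at the start of each result line
STATUSES = ("SUCCESS", "WARNING", "FAILED", "ERROR")


def summarize_log(log_path):
    """Count result lines per status and collect every non-success line.

    Hypothetical helper: it assumes each result line starts with one of the
    STATUSES keywords followed by a colon. Header lines and continuation
    lines of multi-line error messages are ignored.
    """
    counts = Counter()
    problem_lines = []
    # Default encoding on purpose, to match how the script writes the log
    with open(log_path, "r") as log_file:
        for line in log_file:
            status, separator, _ = line.partition(":")
            if separator and status in STATUSES:
                counts[status] += 1
                if status != "SUCCESS":
                    problem_lines.append(line.strip())
    return counts, problem_lines
```

Calling counts, problem_lines = summarize_log(log_file_path) after the validation run gives a per-status tally plus the list of lines that need attention, which can then be printed or forwarded to the data team.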

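Method 3: Using Python's Built-In json Module for Quick Syntax Check (Lighter Weight)

If your focus is on "syntax validity" rather than "ArcGIS compatibility", you can parse the files with Python's built-in json module, which is faster and does not generate temporary data. Dedicated GeoJSON libraries can check more, but the script below needs nothing beyond the standard library.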

import os
import json

geo_json_folder = r"C:\Path\To\Your\GeoJSON\Folder"
log_file_path = os.path.join(geo_json_folder, "GeoJSON_Syntax_Only_Log.txt")

geo_json_files = [f for f in os.listdir(geo_json_folder) if f.endswith(('.geojson', '.json'))]

with open(log_file_path, 'w') as log_file:
    log_file.write("GeoJSON File Syntax Check Report (Basic JSON Syntax Only)\n")
    log_file.write("=" * 60 + "\n")

    for file_name in geo_json_files:
        file_path = os.path.join(geo_json_folder, file_name)

        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                # Attempt to load and parse JSON content
                data = json.load(f)

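            # Optional: Perform some basic GeoJSON structure checks (e.g., must have "type" and "features" properties)
            # The isinstance check keeps a top-level array or string from raising AttributeError on .get()
            if isinstance(data, dict) and data.get("type") == "FeatureCollection" and "features" in data: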
                status_msg = f"SUCCESS: {file_name} has valid syntax and basic structure.\n"
            else:
                status_msg = f"WARNING: {file_name} has valid syntax but may not be a standard FeatureCollection.\n"

            print(status_msg)
            log_file.write(status_msg)

        except json.JSONDecodeError as e:
            error_msg = f"FAILED: {file_name} has JSON syntax errors. Error: {e.msg} (at line {e.lineno}, column {e.colno})\n"
            print(error_msg)
            log_file.write(error_msg)
        except Exception as e:
            error_msg = f"ERROR: An unknown error occurred while reading {file_name}. Error: {str(e)}\n"
            print(error_msg)
            log_file.write(error_msg)

print(f"Syntax check complete! Detailed report saved to: {log_file_path}")
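The structure check in the script above only looks at the top-level "type" and "features" members, so a file whose individual features were damaged by hand editing would still pass. The following is a rough sketch of a slightly deeper check, still using only the standard library and based on the member names defined in RFC 7946. The function name find_structure_problems is hypothetical, and the check is deliberately shallow: it does not validate coordinate values or ring closure.

```python
# Geometry type names defined by RFC 7946
GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def find_structure_problems(data):
    """Return a list of problem descriptions; an empty list means none were found.

    Hypothetical helper: 'data' is the object returned by json.load().
    """
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    if data.get("type") != "FeatureCollection":
        return [f"top-level type is {data.get('type')!r}, not 'FeatureCollection'"]
    features = data.get("features")
    if not isinstance(features, list):
        return ["'features' is missing or not an array"]

    problems = []
    for index, feature in enumerate(features):
        label = f"features[{index}]"
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            problems.append(f"{label}: not an object with type 'Feature'")
            continue
        if "properties" not in feature:
            problems.append(f"{label}: missing 'properties' member")
        if "geometry" not in feature:
            problems.append(f"{label}: missing 'geometry' member")
            continue
        geometry = feature["geometry"]
        if geometry is None:
            continue  # RFC 7946 allows a null geometry (unlocated feature)
        if not isinstance(geometry, dict) or geometry.get("type") not in GEOMETRY_TYPES:
            problems.append(f"{label}: unknown or missing geometry type")
            continue
        # GeometryCollection carries 'geometries'; every other type carries 'coordinates'
        member = "geometries" if geometry["type"] == "GeometryCollection" else "coordinates"
        if not isinstance(geometry.get(member), list):
            problems.append(f"{label}: geometry has no '{member}' array")
    return problems
```

One way to use it is to call find_structure_problems(data) right after json.load(f) and write a WARNING line for each returned problem, so that the report names the exact feature that was damaged.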

Summary and Recommendations

Finally, here is a summary of the three methods:

If you want to avoid programming and only need to check a small number of files, choose Method 1 (running the JSON To Features tool manually).
If you prioritize reliable data quality and automation for formal editing and analysis, choose Method 2 (using Python script conversion detection).
If you prioritize pure efficiency and only need to quickly validate basic file syntax, choose Method 3 (parsing with Python's built-in json module).

MalaGIS
Sharing GIS Technologies, Resources and News.