In recent years, applications such as digital museums, digital cultural heritage preservation, and digital scenic spots have become commonplace online. The 3D reconstruction technology behind them impresses with its stunning visuals: through nothing more than a screen, it delivers an immersive experience. This kind of immersive virtual experience has now permeated many industries. For example, in the popular game Black Myth: Wukong, the "Little Western Paradise" scene was created by capturing and reconstructing the Thousand-Buddha Cliffside Temple (Qianfo'an) in Xi County, Linfen, Shanxi Province. Similarly, Assassin's Creed: Unity used 3D reconstruction to recreate the majestic Notre-Dame de Paris, achieving a remarkable synergy between the virtual and the real.

Above: The Thousand-Buddha Cliffside Temple in Xi County, Linfen, Shanxi Province.
Every time I see such applications, I wish I could create a 3D scene myself. However, traditional 3D reconstruction technologies are typically characterized by "high acquisition costs and complex reconstruction processes," requiring significant investment in both software and hardware, which remains prohibitive for average users.
Gaussian Splatting Technology
3D Gaussian Splatting (3DGS) is a 3D reconstruction technique that emerged in 2023 and represents a significant new direction in the field. It describes a scene as a large set of anisotropic 3D Gaussians rather than meshes or voxels, so the scenes it produces combine photorealistic quality with real-time rendering, making it suitable for high-precision applications such as cultural relic preservation and digital twins. Its main advantages are efficiency, excellent detail reproduction, and the fact that it requires no topological modeling. Its drawbacks are large data volumes, reliance on high-performance GPUs, weaker handling of dynamic scenes, and an ecosystem that is still maturing.
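To give a sense of what a "splat" is: a 3DGS scene is stored as many anisotropic Gaussians that are projected onto the image plane and alpha-blended at render time. The toy sketch below (my own illustration, not ml-sharp's code) evaluates one projected 2D Gaussian's opacity contribution at an image point:

```python
import numpy as np

def gaussian_alpha(point, mean, cov, opacity):
    """Opacity-scaled Gaussian falloff exp(-0.5 * d^T Sigma^-1 d) at `point`.

    Illustrative only: each splat carries a center (mean), a covariance
    describing its shape/orientation, and an opacity; color is omitted here.
    """
    d = np.asarray(point, dtype=float) - np.asarray(mean, dtype=float)
    return opacity * float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

# At the Gaussian's center the falloff is 1, so alpha equals the opacity.
print(gaussian_alpha([0.0, 0.0], [0.0, 0.0], np.eye(2), 0.8))  # 0.8
```

During rendering, these per-Gaussian alphas are composited front to back to produce the final pixel color, which is what makes real-time photorealistic output possible.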
Recently, Apple open-sourced a Gaussian Splatting model called ml-sharp, which can transform ordinary photos into 3D scenes in just a few seconds! Initially skeptical, I downloaded and tried it myself. Overall, the results are quite impressive. Interested readers are welcome to download and experience it.
Repository: https://github.com/apple/ml-sharp
PS: This is the easiest-to-install machine learning model I've encountered so far; the whole process took less than 10 minutes.
ml-sharp Installation
Installing ml-sharp is very straightforward and can be done in three simple steps.
Clone the ml-sharp Repository
Use git to clone the ml-sharp repository.
git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
Create a Conda Environment
Then, use conda to create an environment.
conda create -n sharp python=3.13
conda activate sharp
PS: For a workflow guide on using conda, you can refer to a previous article: "[GIS Tutorial] Conda Basic Usage Workflow" (https://malagis.com/gis-tutorial-conda-base-work-flow.html)
Install Dependencies
pip install -r requirements.txt
Finally, verify the installation with the following command.
sharp --help
If it executes successfully, you should see the following output, indicating the installation is complete.

Performance Test
After installation, use the following command to convert photos into a Gaussian Splatting format model.
sharp predict -i /path/to/input/images -o /path/to/output/
Perhaps due to my relatively modest computer configuration, the process wasn't as fast as officially claimed, but it still completed within 30 seconds. Below are a couple of my test examples. First, a scene from the Yellow Crane Tower in Wuhan.
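Since each run handles one input folder, converting several scenes means repeating the command. Below is a minimal Python sketch (my own hypothetical helper, not part of ml-sharp) that builds one `sharp predict` invocation per scene sub-folder:

```python
from pathlib import Path

def build_commands(input_root: str, output_root: str) -> list[list[str]]:
    """Build one `sharp predict` command per scene sub-folder.

    Hypothetical convenience wrapper: each sub-directory of `input_root`
    is treated as one scene, and its output goes to a matching
    sub-directory of `output_root`.
    """
    cmds = []
    for scene in sorted(Path(input_root).iterdir()):
        if scene.is_dir():
            out = Path(output_root) / scene.name
            cmds.append(["sharp", "predict", "-i", str(scene), "-o", str(out)])
    return cmds
```

Each returned list can be passed directly to `subprocess.run`, or joined with spaces and piped to a shell.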
Original Image:

After Model Conversion (Opened with Blender):

Another scene from Huangshi 1907:
Original Image:

After Model Conversion (Opened with Blender):

Summary
Overall, while the results are decent, they still fall short of professional approaches such as point-cloud reconstruction. At present it achieves only a basic sense of three-dimensionality from a single image, and some reconstructed details appear distorted. That said, the core advantage of ml-sharp is that it further lowers the barrier to applying 3D technology:
First, its high efficiency (generating a 3D scene from a single image in seconds) significantly shortens the production cycle for 3D content.
Second, its low barrier to entry (no professional 3D modeling skills required; simple command-line operation) allows practitioners without a technical background to get started quickly.
Third, its cross-platform compatibility (supporting CPU, CUDA, MPS, and other computing environments, with 3DGS files compatible with mainstream renderers) meets deployment needs across different scenarios.
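The usual fallback order across those compute backends can be sketched as follows (a hypothetical selection helper, not ml-sharp's actual code):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA (NVIDIA GPUs), then MPS (Apple Silicon), else CPU.

    Hypothetical helper illustrating the common fallback order; with
    PyTorch, the two flags would come from torch.cuda.is_available()
    and torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

print(pick_device(False, True))  # mps
```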
These characteristics are pushing 3D technology from "professional-grade applications" toward "universal adoption," shifting from the traditional "high-cost, heavy-process" production model to a "lightweight, high-efficiency, high-fidelity" reconstruction model. Going forward, it should find further applications in areas such as digital cultural heritage preservation (high-precision archiving of relics and ancient architecture), digital twins (city/campus modeling), gaming and film (high-fidelity scene-asset generation), cultural tourism (immersive digital scenic spots), VR/AR interaction, and industrial design (3D product presentation).