Garbled Chinese Text When Using Gdal.Core in .NET 6: Causes and Fixes

1. Introduction


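GDAL is a GIS library for processing vector and raster data. This article summarizes the encoding problems I ran into when using Gdal.Core in a .NET 6 project, and how I solved them.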

Keywords: .NET 6, Gdal.Core (version 2.3.0-beta-023), garbled Chinese text



2. Problems and Solutions

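First create a .NET 6 project and install the Gdal.Core 2.3.0-beta-023 package (it includes Gdal.Core and Gdal.Core.WindowsRuntime). After registering the GDAL and OGR drivers you can start working with shapefiles (SHP), and this is where garbled Chinese text may show up when reading or writing SHP files.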



2.1 Analysis

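Garbled Chinese text is ultimately an encoding problem: the encoding GDAL uses does not match the one used by the C# code or by the wrapper layer. From what I found online and from my own tests, GDAL sometimes expects UTF-8. For example, Ogr.Open(string utf8_path, int update) must be given a UTF-8 path, which the wrapper produces with StringToUtf8Bytes: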

        //Ogr.Open(string utf8_path, int update)
        public static DataSource Open(string utf8_path, int update)
        {
            IntPtr intPtr = OgrPINVOKE.Open(StringToUtf8Bytes(utf8_path), update);
            DataSource result = (intPtr == IntPtr.Zero) ? null : new DataSource(intPtr, cMemoryOwn: true, ThisOwn_true());
            if (OgrPINVOKE.SWIGPendingException.Pending)
            {
                throw OgrPINVOKE.SWIGPendingException.Retrieve();
            }

            return result;
        }
        
        //Ogr.StringToUtf8Bytes(string str)
        internal static byte[] StringToUtf8Bytes(string str)
        {
            if (str == null)
            {
                return null;
            }

            byte[] array = new byte[Encoding.UTF8.GetMaxByteCount(str.Length) + 1];
            Encoding.UTF8.GetBytes(str, 0, str.Length, array, 0);
            return array;
        }

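At other times GBK comes into play, for example with attribute field names and attribute values (the stock wrapper appears to decode the strings GDAL returns with the system code page, which is GBK on Chinese Windows, even though GDAL returns them as UTF-8; this is why the extension methods below read the raw bytes and decode them as UTF-8). Garbled Chinese attributes in files you create have a different cause: the default creation path does not write a .cpg file, which stores the encoding name and which ArcGIS uses to choose the encoding.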
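To make the mismatch concrete, here is a small standalone C# sketch. The class EncodingMismatchDemo and its methods are illustrative names, not part of Gdal.Core. It encodes a string as UTF-8 and decodes the bytes either as UTF-8 or as GBK; only the first round trip is lossless. It assumes .NET 6, where GBK only becomes available after registering CodePagesEncodingProvider.

```csharp
using System.Text;

// Illustration only: the same string becomes different bytes under UTF-8 and GBK,
// and decoding with the wrong encoding produces garbled text.
public static class EncodingMismatchDemo
{
    // GBK (code page 936) is not available by default on .NET Core / .NET 6,
    // so the code-page provider is registered before GBK is requested.
    public static Encoding GetGbk()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("GBK");
    }

    // Encodes `text` as UTF-8, then decodes those bytes with `decoder`.
    // decoder = UTF-8 gives the original text back; decoder = GBK gives garbled text.
    public static string DecodeUtf8BytesAs(string text, Encoding decoder)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return decoder.GetString(utf8Bytes);
    }
}
```

For "中文" the UTF-8 bytes are E4-B8-AD-E6-96-87 while the GBK bytes are D6-D0-CE-C4, so a reader that assumes the wrong encoding cannot recover the original text.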



2.2 Problem Types and Solutions


  1. Cannot open Chinese paths: opening a shapefile whose path contains Chinese characters throws an error.

    Most advice online for this problem is to set the configuration option "GDAL_FILENAME_IS_UTF8" to "YES" or "NO", depending on whether the current encoding is UTF-8. Set it after calling Ogr.RegisterAll();.
        /// <summary>
        /// Configure encodings
        /// </summary>
        private static void ConfigEncoding()
        {
            // To support Chinese paths, set GDAL_FILENAME_IS_UTF8=NO when the default encoding is not UTF-8.
            // Note: on .NET Core / .NET 5+ Encoding.Default always returns UTF-8, so this check
            // only has an effect on .NET Framework.
            if (Encoding.Default.CodePage != Encoding.UTF8.CodePage)
            {
                var filenameConfig = Gdal.GetConfigOption("GDAL_FILENAME_IS_UTF8", string.Empty);
                if (filenameConfig != "NO")
                {
                    Gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "NO");
                }
            }
            try
            {
                // Throws if the GBK code page is not available yet
                Encoding.GetEncoding(FeatureExtensions.GdalEncoding);
            }
            catch (Exception) // GBK is unavailable: register the code-page encoding provider
            {
                Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            }
        }
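On .NET 6 the Encoding.Default comparison in ConfigEncoding cannot tell a GBK system from a UTF-8 one, because Encoding.Default is always UTF-8 there. One possible replacement, sketched below, looks at the culture's ANSI code page instead. This is an approximation under an assumption: CultureInfo.TextInfo.ANSICodePage reflects the culture, not necessarily the system-wide Windows code page. EncodingSetupSketch and its methods are hypothetical names.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper (sketch): approximates "is the native code page UTF-8?"
// using the culture's ANSI code page instead of Encoding.Default.
public static class EncodingSetupSketch
{
    // True when the culture's ANSI code page (e.g. 936 for zh-CN, 1252 for the
    // invariant culture) is not UTF-8 (code page 65001).
    public static bool AnsiCodePageIsNotUtf8(CultureInfo culture)
    {
        return culture.TextInfo.ANSICodePage != Encoding.UTF8.CodePage;
    }

    // Registers the code-page provider so legacy code pages such as GBK (936)
    // can be obtained through Encoding.GetEncoding on .NET 6.
    public static Encoding EnsureGbk()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(936);
    }
}
```

In ConfigEncoding this would replace the Encoding.Default condition, e.g. if (EncodingSetupSketch.AnsiCodePageIsNotUtf8(CultureInfo.CurrentCulture)) { Gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "NO"); }.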

  2. Garbled Chinese names: the Chinese name of a DataSource or of a Layer comes back garbled.

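    Some people suggest setting the configuration option "SHAPE_ENCODING" to an empty string "", but the names were still garbled after many attempts on my side. Instead, you can import the GDAL C functions yourself and decode the returned bytes. The classes below are provided for reference.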


    LayerExtensions
using OSGeo.OGR;
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Layer extension methods
    /// </summary>
    public static class LayerExtensions
    {
        /// <summary>
        /// Get the layer name (native GDAL function)
        /// </summary>
        /// <param name="layer">Layer handle</param>
        /// <returns>Pointer to the name (owned by GDAL; do not free)</returns>
        [DllImport(FeatureExtensions.GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_L_GetName(HandleRef layer);

        /// <summary>
        /// Get the layer name, decoding the native bytes as UTF-8
        /// </summary>
        /// <param name="layer">Layer</param>
        /// <returns>Name</returns>
        public static string GetNameUTF8(this Layer layer)
        {
            var layerRef = Layer.getCPtr(layer);
            IntPtr strPtr = OGR_L_GetName(layerRef);
            string value = strPtr.IntPtrTostring(Encoding.UTF8);
            return value;
        }
    }
}


IntPtrExtensions

using System;
using System.Runtime.InteropServices;
using System.Text;

namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// IntPtr extensions
    /// </summary>
    public static class IntPtrExtensions
    {
        /// <summary>
        /// Count the bytes at the given address up to (not including) the terminating '\0'
        /// </summary>
        /// <param name="strPtr">Address</param>
        /// <returns>Length in bytes</returns>
        public static int GetIntPtrLength(this IntPtr strPtr)
        {
            int size = 0;
            while (Marshal.ReadByte(strPtr, size) != 0)
            {
                size++;
            }
            return size;
        }
        /// <summary>
        /// Read a string from the given address using the named encoding
        /// </summary>
        /// <param name="strPtr">Address</param>
        /// <param name="encodingName">Encoding name, e.g. "GBK"</param>
        /// <returns>The string</returns>
        public static string IntPtrTostring(this IntPtr strPtr, string encodingName)
        {
            var encoding = Encoding.GetEncoding(encodingName);
            return strPtr.IntPtrTostring(encoding);
        }
        /// <summary>
        /// Read a string from the given address using the given encoding
        /// </summary>
        /// <param name="strPtr">Address</param>
        /// <param name="encoding">Encoding</param>
        /// <returns>The string, or null if the address is null</returns>
        public static string IntPtrTostring(this IntPtr strPtr, Encoding encoding)
        {
            if (strPtr == IntPtr.Zero)
            {
                return null;
            }
            int size = GetIntPtrLength(strPtr);
            byte[] array = new byte[size];
            Marshal.Copy(strPtr, array, 0, size);
            return encoding.GetString(array);
        }
        /// <summary>
        /// Copy a string into unmanaged memory as a '\0'-terminated C string
        /// </summary>
        /// <param name="str">String</param>
        /// <param name="encoding">Encoding</param>
        /// <returns>IntPtr; the caller must release it with Marshal.FreeHGlobal</returns>
        public static IntPtr StringToIntPtr(this string str, Encoding encoding)
        {
            byte[] array = encoding.GetBytes(str);
            // Unmanaged memory stays valid after this method returns; a pinned managed
            // array would be unpinned (and movable by the GC) once its GCHandle is freed.
            IntPtr ptr = Marshal.AllocHGlobal(array.Length + 1);
            Marshal.Copy(array, 0, ptr, array.Length);
            Marshal.WriteByte(ptr, array.Length, 0); // C strings end with '\0'
            return ptr;
        }
    }
}
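The write direction needs more care than the read direction: a pointer handed to GDAL must stay valid for the whole native call and must end with a '\0'. The following standalone sketch (NativeStringSketch is an illustrative name) allocates unmanaged memory with Marshal.AllocHGlobal, writes the terminator explicitly, and frees the memory in a finally block. It assumes an encoding such as GBK or UTF-8, whose encoded text never contains a zero byte.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Sketch: string <-> NUL-terminated native buffer in an arbitrary encoding.
// Assumes the encoding never produces 0x00 bytes for real characters (true for GBK/UTF-8).
public static class NativeStringSketch
{
    // Caller owns the returned memory and must release it with Marshal.FreeHGlobal.
    public static IntPtr ToNativeString(string str, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(str);
        IntPtr ptr = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, ptr, bytes.Length);
        Marshal.WriteByte(ptr, bytes.Length, 0); // terminating '\0'
        return ptr;
    }

    // Reads bytes up to the first '\0' and decodes them with the given encoding.
    public static string FromNativeString(IntPtr ptr, Encoding encoding)
    {
        if (ptr == IntPtr.Zero)
        {
            return null;
        }
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0)
        {
            length++;
        }
        byte[] bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return encoding.GetString(bytes);
    }

    // Round trip through unmanaged memory; the buffer is always freed afterwards.
    public static string RoundTrip(string str, Encoding encoding)
    {
        IntPtr ptr = ToNativeString(str, encoding);
        try
        {
            return FromNativeString(ptr, encoding);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

After CodePagesEncodingProvider has been registered, NativeStringSketch.RoundTrip("中文图层", Encoding.GetEncoding("GBK")) gives back "中文图层". In real code the native GDAL call would sit between ToNativeString and Marshal.FreeHGlobal.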

  3. Garbled Chinese attributes: Chinese attribute field names or attribute values come back garbled.

    The fix is the same as for garbled names; the code is as follows:


    FeatureExtensions
using OSGeo.OGR;
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Feature extensions
    /// </summary>
    public static class FeatureExtensions
    {
        /// <summary>
        /// GDAL native library file name
        /// </summary>
        public const string GdalDllName = "gdal202.dll";
        /// <summary>
        /// GDAL encoding name
        /// </summary>
        public const string GdalEncoding = "GBK";

        /// <summary>
        /// Get the value of the given field of a feature as a string (native GDAL function)
        /// </summary>
        /// <param name="featureHandle">Feature handle</param>
        /// <param name="fieldIndex">Field index</param>
        /// <returns>Pointer to the string value (owned by GDAL; do not free)</returns>
        [DllImport(GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_F_GetFieldAsString(HandleRef featureHandle, int fieldIndex);

        /// <summary>
        /// Set the given field of a feature to a string value (native GDAL function)
        /// </summary>
        /// <param name="featureHandle">Feature handle</param>
        /// <param name="fieldIndex">Field index</param>
        /// <param name="value">Pointer to a '\0'-terminated string</param>
        [DllImport(GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static void OGR_F_SetFieldString(HandleRef featureHandle, int fieldIndex, IntPtr value);

        /// <summary>
        /// Get the value of the given field as a string, decoding the native bytes as UTF-8
        /// </summary>
        /// <param name="feature">Feature</param>
        /// <param name="fieldIndex">Field index</param>
        /// <returns>String value</returns>
        public static string GetFieldAsStringUTF8(this Feature feature, int fieldIndex)
        {
            HandleRef handle = Feature.getCPtr(feature);
            IntPtr intptr = OGR_F_GetFieldAsString(handle, fieldIndex);
            string value = intptr.IntPtrTostring(Encoding.UTF8);
            return value;
        }
    }
}
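FeatureExtensions declares OGR_F_SetFieldString, but the article does not show a wrapper that uses it. One way to avoid manual memory management on the write side is to declare the const char* parameter as byte[] and pass a NUL-terminated byte array; the marshaller pins a blittable array only for the duration of the call. This is a sketch: FeatureWriteSketch and ToNullTerminatedBytes are hypothetical names, and encoding the value as UTF-8 is an assumption that mirrors GetFieldAsStringUTF8.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical write-side counterpart (sketch, not part of Gdal.Core).
public static class FeatureWriteSketch
{
    // Same DLL name as FeatureExtensions.GdalDllName in the article.
    private const string GdalDllName = "gdal202.dll";

    // GDAL C API: void OGR_F_SetFieldString(OGRFeatureH, int, const char*).
    // A byte[] argument is pinned and passed as a pointer for the duration of the call.
    [DllImport(GdalDllName, CallingConvention = CallingConvention.Cdecl)]
    public static extern void OGR_F_SetFieldString(HandleRef featureHandle, int fieldIndex, byte[] value);

    // Encodes `value` and leaves one extra zero byte at the end as the C terminator.
    public static byte[] ToNullTerminatedBytes(string value, Encoding encoding)
    {
        byte[] bytes = new byte[encoding.GetByteCount(value) + 1]; // last byte stays 0
        encoding.GetBytes(value, 0, value.Length, bytes, 0);
        return bytes;
    }
}
```

In the project it would be called as FeatureWriteSketch.OGR_F_SetFieldString(Feature.getCPtr(feature), fieldIndex, FeatureWriteSketch.ToNullTerminatedBytes(value, Encoding.UTF8)). The DLL is not loaded until the extern method is actually called.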


FieldDefnExtensions

using OSGeo.OGR;
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Field definition extensions
    /// </summary>
    public static class FieldDefnExtensions
    {
        /// <summary>
        /// Get the field name (native GDAL function)
        /// </summary>
        /// <param name="fieldDefn">Field definition handle</param>
        /// <returns>Pointer to the field name (owned by GDAL; do not free)</returns>
        [DllImport(FeatureExtensions.GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_Fld_GetNameRef(IntPtr fieldDefn);
        /// <summary>
        /// Get the field name (native GDAL function)
        /// </summary>
        /// <param name="fieldDefn">Field definition handle</param>
        /// <returns>Pointer to the field name (owned by GDAL; do not free)</returns>
        [DllImport(FeatureExtensions.GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_Fld_GetNameRef(HandleRef fieldDefn);

        /// <summary>
        /// Get the field name, decoding the native bytes as UTF-8
        /// </summary>
        /// <param name="fieldDefn">Field definition</param>
        /// <returns>Field name</returns>
        public static string GetNameUTF8(this FieldDefn fieldDefn)
        {
            var fieldDefnRef = FieldDefn.getCPtr(fieldDefn);
            IntPtr strPtr = OGR_Fld_GetNameRef(fieldDefnRef);
            string value = strPtr.IntPtrTostring(Encoding.UTF8);
            return value;
        }
    }
}
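When the native string is UTF-8, as in the GetNameUTF8 methods here, .NET 6 already provides Marshal.PtrToStringUTF8, which stops at the first '\0' and decodes the bytes as UTF-8. The manual scan-and-copy helper is then only needed for other encodings such as GBK. A minimal sketch (Utf8MarshalSketch is an illustrative name); Marshal.StringToCoTaskMemUTF8 is used only to simulate a string returned by GDAL.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch: built-in UTF-8 marshaling helpers as an alternative to the manual reader.
public static class Utf8MarshalSketch
{
    // Equivalent of IntPtrTostring(ptr, Encoding.UTF8); returns null for IntPtr.Zero.
    public static string ReadUtf8(IntPtr ptr) => Marshal.PtrToStringUTF8(ptr);

    // Simulates a native UTF-8 C string (as GDAL would return it) and reads it back.
    public static string RoundTrip(string text)
    {
        IntPtr ptr = Marshal.StringToCoTaskMemUTF8(text); // allocates and appends '\0'
        try
        {
            return ReadUtf8(ptr);
        }
        finally
        {
            Marshal.FreeCoTaskMem(ptr); // only because this buffer was allocated here
        }
    }
}
```

Inside GetNameUTF8 this would reduce to Marshal.PtrToStringUTF8(OGR_Fld_GetNameRef(fieldDefnRef)). That pointer belongs to GDAL, so it must not be freed.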

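  4. Garbled Chinese attributes in created files: Chinese attribute field names or values written by your code display as garbled text in ArcGIS or QGIS.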

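    Add the option "ENCODING=UTF-8" at creation time, which makes GDAL write the .cpg file. The code is as follows (the GDAL Shapefile driver documents ENCODING as a layer creation option, so it may also need to be passed to CreateLayer):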


    DriverExtensions
using OSGeo.OGR;
using System;
using System.Linq;

namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Driver extensions
    /// </summary>
    public static class DriverExtensions
    {
        /// <summary>
        /// Copy a data source (fixes garbled Chinese on write)
        /// </summary>
        /// <param name="driver">Driver</param>
        /// <param name="srcDataSource">Source data source; if null, an empty data source is created</param>
        /// <param name="path">Output path</param>
        /// <param name="options">Options</param>
        /// <returns>The new data source</returns>
        public static DataSource CopyDataSourceUTF8(this Driver driver, DataSource srcDataSource, string path, string[] options)
        {
            DataSource destDataSource = null;
            // srcDataSource is deliberately not required here, so the null branch below is reachable
            if (driver != null && !string.IsNullOrEmpty(path))
            {
                string[] destOptions = GetOptionsWithUTF8(options);
                if (srcDataSource == null)
                {
                    destDataSource = driver.CreateDataSource(path, destOptions);
                }
                else
                {
                    destDataSource = driver.CopyDataSource(srcDataSource, path, destOptions);
                }
            }
            return destDataSource;
        }
        /// <summary>
        /// Get options that include the UTF-8 encoding
        /// </summary>
        /// <param name="options">Original options</param>
        /// <returns>New options</returns>
        public static string[] GetOptionsWithUTF8(this string[] options)
        {
            string encodingStr = "ENCODING=UTF-8"; // adding the encoding makes GDAL write a .cpg file, which fixes garbled Chinese on write
            string[] destOptions = options;
            if (destOptions == null)
            {
                destOptions = new string[] { encodingStr };
            }
            else if (!destOptions.Contains(encodingStr))
            {
                destOptions = new string[options.Length + 1];
                Array.Copy(options, destOptions, options.Length);
                destOptions[destOptions.Length - 1] = encodingStr;
            }
            return destOptions;
        }
        /// <summary>
        /// Create a data source (fixes garbled Chinese on write)
        /// </summary>
        /// <param name="driver">Driver</param>
        /// <param name="path">Output path</param>
        /// <param name="options">Options</param>
        /// <returns>The new data source</returns>
        public static DataSource CreateDataSourceUTF8(this Driver driver, string path, string[] options)
        {
            DataSource destDataSource = null;
            if (driver != null && !string.IsNullOrEmpty(path))
            {
                var destOptions = GetOptionsWithUTF8(options);
                destDataSource = driver.CreateDataSource(path, destOptions);
            }
            return destDataSource;
        }
    }
}
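GetOptionsWithUTF8 only looks for the exact string "ENCODING=UTF-8", so options that already contain, say, "ENCODING=GBK" end up with two ENCODING entries, and which one takes effect then depends on how GDAL looks up the list. A possible variant, sketched below with hypothetical names, treats the options as KEY=VALUE pairs and replaces any existing ENCODING entry (matched case-insensitively).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of GetOptionsWithUTF8 (sketch): at most one ENCODING entry.
public static class CreationOptionsSketch
{
    public static string[] WithEncoding(string[] options, string encoding = "UTF-8")
    {
        var result = new List<string>();
        if (options != null)
        {
            foreach (string option in options)
            {
                // Keep every non-null option except an existing ENCODING=... entry
                if (option != null && !option.StartsWith("ENCODING=", StringComparison.OrdinalIgnoreCase))
                {
                    result.Add(option);
                }
            }
        }
        result.Add("ENCODING=" + encoding);
        return result.ToArray();
    }
}
```

For example, WithEncoding(new[] { "SPATIAL_INDEX=YES", "encoding=GBK" }) returns { "SPATIAL_INDEX=YES", "ENCODING=UTF-8" }, and WithEncoding(null) returns { "ENCODING=UTF-8" }.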



3. Summary

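This article works around garbled Chinese text by importing the native GDAL functions and decoding their results ourselves. If you are comfortable with C++, you could instead modify the encoding-related parts of the GDAL source code and rebuild it. The extension classes in this article are also available in the EMap code repository.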



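Copyright notice: This is an original article by lc156845259, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original article and this notice.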