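Garbled Chinese Text with GDAL.Core on .NET 6: Problems and Solutions
1. Introduction
GDAL is a GIS library for processing vector and raster data. This article summarizes the encoding problems I ran into when using Gdal.Core in a .NET 6 project, along with their solutions.
Keywords: .NET 6, Gdal.Core (version 2.3.0-beta-023), garbled Chinese text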
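2. Problem Types and Solutions
First, create a .NET 6 project and install the Gdal.Core 2.3.0-beta-023 package (which includes Gdal.Core and Gdal.Core.WindowsRuntime). After registering the GDAL and OGR components you can start working with shapefiles (SHP). Reading or writing them may produce garbled Chinese text.
2.1 Analysis
Garbled Chinese text ultimately comes down to encoding: either GDAL and the C# code disagree on the encoding, or the wrapper layer is inconsistent about it. Based on material found online and my own tests, GDAL sometimes expects UTF-8. For example, Ogr.Open(string utf8_path, int update) requires a UTF-8 path: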
//Ogr.Open(string utf8_path, int update)
public static DataSource Open(string utf8_path, int update)
{
    IntPtr intPtr = OgrPINVOKE.Open(StringToUtf8Bytes(utf8_path), update);
    DataSource result = (intPtr == IntPtr.Zero) ? null : new DataSource(intPtr, cMemoryOwn: true, ThisOwn_true());
    if (OgrPINVOKE.SWIGPendingException.Pending)
    {
        throw OgrPINVOKE.SWIGPendingException.Retrieve();
    }
    return result;
}
//Ogr.StringToUtf8Bytes(string str)
internal static byte[] StringToUtf8Bytes(string str)
{
    if (str == null)
    {
        return null;
    }
    byte[] array = new byte[Encoding.UTF8.GetMaxByteCount(str.Length) + 1];
    Encoding.UTF8.GetBytes(str, 0, str.Length, array, 0);
    return array;
}
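At other times it uses GBK, for example for attribute field names and attribute values. Chinese attributes in newly created files come out garbled because the default creation methods do not write a .cpg file (it stores the encoding name, and ArcGIS reads it to decide which encoding to use).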
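The mismatch described above can be reproduced without GDAL at all. The following is a minimal sketch, not part of the original extension classes: MojibakeDemo and its members are hypothetical names, and it assumes the code page provider that ships with .NET 6 is available. It encodes a string with one encoding and decodes it with another, which is what happens when the native library and the managed wrapper disagree.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text with one encoding and decodes the bytes with another,
    // mimicking a native library and a managed wrapper that disagree.
    public static string Transcode(string text, Encoding writer, Encoding reader)
    {
        byte[] raw = writer.GetBytes(text);
        return reader.GetString(raw);
    }

    // GBK (code page 936) is not built in on .NET 6, so the code page
    // provider has to be registered before the encoding can be resolved.
    public static Encoding GetGbk()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("GBK");
    }
}
```

Calling MojibakeDemo.Transcode("中文", gbk, gbk) returns the original text, while Transcode("中文", gbk, Encoding.UTF8) does not, because the GBK bytes are not a valid UTF-8 sequence.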
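2.2 Problem Types and Solutions
- Cannot open Chinese paths: opening a shapefile whose path contains Chinese characters raises an error.
The usual advice online is to set the "GDAL_FILENAME_IS_UTF8" config option to "YES" or "NO". Which value to use can be decided by checking whether the current default encoding is UTF-8. Set the option after calling Ogr.RegisterAll();.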
/// <summary>
/// Configures encoding-related settings.
/// </summary>
private static void ConfigEncoding()
{
    // To support Chinese paths, set this option when the default encoding is not UTF-8.
    if (Encoding.Default.EncodingName != Encoding.UTF8.EncodingName || Encoding.Default.CodePage != Encoding.UTF8.CodePage)
    {
        var filenameConfig = Gdal.GetConfigOption("GDAL_FILENAME_IS_UTF8", string.Empty);
        if (filenameConfig != "NO")
        {
            Gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "NO");
        }
    }
    try
    {
        // Probe whether the GBK encoding can be resolved.
        _ = Encoding.GetEncoding(FeatureExtensions.GdalEncoding);
    }
    catch (Exception) // If GBK cannot be resolved, register the code page provider.
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }
}
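One point worth checking against ConfigEncoding: on .NET Core and later, including .NET 6, Encoding.Default reports UTF-8 regardless of the operating system code page, so a check based on it may never take the non-UTF-8 branch. The sketch below shows that fact and registers the code page provider up front instead of through a try/catch probe. EncodingSetup and its members are hypothetical names, not part of Gdal.Core.

```csharp
using System;
using System.Text;

public static class EncodingSetup
{
    // Registers the code page provider first, then resolves GBK.
    // Registering the same provider more than once is harmless.
    public static Encoding EnsureGbk()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("GBK");
    }

    // True when the runtime's default encoding is UTF-8,
    // which is the case on .NET Core and later.
    public static bool DefaultIsUtf8()
    {
        return Encoding.Default.CodePage == Encoding.UTF8.CodePage;
    }
}
```

If the path problem persists on .NET 6, it may be more reliable to decide the "GDAL_FILENAME_IS_UTF8" value explicitly than to derive it from Encoding.Default.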
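- Garbled Chinese names: the name of a DataSource or of a Layer comes back garbled.
Some users likewise suggest setting the "SHAPE_ENCODING" config option to "" (an empty string), but after many attempts I still got garbled text. Instead, you can import the GDAL C functions directly and do the transcoding yourself. The classes below are provided for reference.
LayerExtensions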
using OSGeo.OGR;
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Layer extension methods.
    /// </summary>
    public static class LayerExtensions
    {
        /// <summary>
        /// Gets the layer name (native OGR_L_GetName).
        /// </summary>
        /// <param name="layer">Layer handle</param>
        /// <returns>Pointer to the name string</returns>
        [DllImport(FeatureExtensions.GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_L_GetName(HandleRef layer);
        /// <summary>
        /// Gets the layer name, decoded as UTF-8.
        /// </summary>
        /// <param name="layer">Layer</param>
        /// <returns>Name</returns>
        public static string GetNameUTF8(this Layer layer)
        {
            var layerRef = Layer.getCPtr(layer);
            IntPtr strPtr = OGR_L_GetName(layerRef);
            string value = strPtr.IntPtrTostring(Encoding.UTF8);
            return value;
        }
    }
}
IntPtrExtensions
using System;
using System.Runtime.InteropServices;
using System.Text;
namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// IntPtr extension methods.
    /// </summary>
    public static class IntPtrExtensions
    {
        /// <summary>
        /// Counts the bytes of the null-terminated string at the given address.
        /// </summary>
        /// <param name="strPtr">Address</param>
        /// <returns>Length in bytes, excluding the terminator</returns>
        public static int GetIntPtrLength(this IntPtr strPtr)
        {
            int size;
            for (size = 0; Marshal.ReadByte(strPtr, size) > 0; size++) ;
            return size;
        }
        /// <summary>
        /// Reads a string from the given address using the named encoding.
        /// </summary>
        /// <param name="strPtr">Address</param>
        /// <param name="encodingName">Encoding name</param>
        /// <returns>String</returns>
        public static string IntPtrTostring(this IntPtr strPtr, string encodingName)
        {
            var encoding = Encoding.GetEncoding(encodingName);
            string value = strPtr.IntPtrTostring(encoding);
            return value;
        }
        /// <summary>
        /// Reads a string from the given address using the given encoding.
        /// </summary>
        /// <param name="strPtr">Address</param>
        /// <param name="encoding">Encoding</param>
        /// <returns>String</returns>
        public static string IntPtrTostring(this IntPtr strPtr, Encoding encoding)
        {
            int size = GetIntPtrLength(strPtr);
            byte[] array = new byte[size];
            Marshal.Copy(strPtr, array, 0, size);
            string value = encoding.GetString(array);
            return value;
        }
        /// <summary>
        /// Converts a string to an IntPtr.
        /// </summary>
        /// <param name="str">String</param>
        /// <param name="encoding">Encoding</param>
        /// <returns>IntPtr to unmanaged memory that the caller must free with Marshal.FreeHGlobal</returns>
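        public static IntPtr StringToIntPtr(this string str, Encoding encoding)
        {
            byte[] array = encoding.GetBytes(str);
            // Copy the bytes into unmanaged memory and append a terminating NUL.
            // A pointer taken from a pinned array is no longer valid once the
            // handle is freed, so the bytes are copied instead of pinned.
            IntPtr pObject = Marshal.AllocHGlobal(array.Length + 1);
            Marshal.Copy(array, 0, pObject, array.Length);
            Marshal.WriteByte(pObject, array.Length, 0);
            return pObject;
        }
    }
}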
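The pointer-to-string conversion above can be tried in isolation, without loading GDAL. The following is a minimal sketch under the assumption that the native side hands back a null-terminated byte string. NativeStringDemo and its members are hypothetical names, and the unmanaged buffer stands in for memory owned by the native library. It also compares the result with the built-in Marshal.PtrToStringUTF8, which covers the UTF-8 case.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class NativeStringDemo
{
    // Copies text into unmanaged memory as a null-terminated byte string.
    // The caller owns the buffer and must release it with Marshal.FreeHGlobal.
    public static IntPtr ToNative(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        IntPtr buffer = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, buffer, bytes.Length);
        Marshal.WriteByte(buffer, bytes.Length, 0); // terminating NUL
        return buffer;
    }

    // Reads a null-terminated byte string and decodes it with the given encoding.
    // Returns an empty string for a null pointer.
    public static string FromNative(IntPtr ptr, Encoding encoding)
    {
        if (ptr == IntPtr.Zero) return string.Empty;
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0) length++;
        byte[] bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return encoding.GetString(bytes);
    }

    // Writes the text as UTF-8, reads it back both ways, and frees the buffer.
    public static bool RoundTrips(string text)
    {
        IntPtr buffer = ToNative(text, Encoding.UTF8);
        try
        {
            return FromNative(buffer, Encoding.UTF8) == text
                && Marshal.PtrToStringUTF8(buffer) == text;
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

The manual loop is only needed when the bytes are in an encoding such as GBK. For UTF-8 the built-in helper is enough.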
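- Garbled Chinese attributes: Chinese attribute field names or attribute values come back garbled.
This is handled the same way as garbled names. The code is as follows:
FeatureExtensions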
using OSGeo.OGR;
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Feature extension methods.
    /// </summary>
    public static class FeatureExtensions
    {
        /// <summary>
        /// File name of the native GDAL library.
        /// </summary>
        public const string GdalDllName = "gdal202.dll";
        /// <summary>
        /// Name of the encoding used by GDAL.
        /// </summary>
        public const string GdalEncoding = "GBK";
        /// <summary>
        /// Gets the value of the given field as a string (native OGR_F_GetFieldAsString).
        /// </summary>
        /// <param name="featureHandle">Feature handle</param>
        /// <param name="fieldIndex">Field index</param>
        /// <returns>Pointer to the string value</returns>
        [DllImport(GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_F_GetFieldAsString(HandleRef featureHandle, int fieldIndex);
        /// <summary>
        /// Sets the given field to a string value (native OGR_F_SetFieldString).
        /// </summary>
        /// <param name="featureHandle">Feature handle</param>
        /// <param name="fieldIndex">Field index</param>
        /// <param name="value">Value</param>
        [DllImport(GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static void OGR_F_SetFieldString(HandleRef featureHandle, int fieldIndex, IntPtr value);
        /// <summary>
        /// Gets the value of the given field as a string, decoded as UTF-8.
        /// </summary>
        /// <param name="feature">Feature</param>
        /// <param name="fieldIndex">Field index</param>
        /// <returns>String value</returns>
        public static string GetFieldAsStringUTF8(this Feature feature, int fieldIndex)
        {
            HandleRef handle = Feature.getCPtr(feature);
            IntPtr intptr = OGR_F_GetFieldAsString(handle, fieldIndex);
            string value = intptr.IntPtrTostring(Encoding.UTF8);
            return value;
        }
    }
}
FieldDefnExtensions
using OSGeo.OGR;
using System;
using System.Runtime.InteropServices;
using System.Text;
namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Field definition extension methods.
    /// </summary>
    public static class FieldDefnExtensions
    {
        /// <summary>
        /// Gets the field name (native OGR_Fld_GetNameRef).
        /// </summary>
        /// <param name="fieldDefn">Field definition pointer</param>
        /// <returns>Pointer to the field name</returns>
        [DllImport(FeatureExtensions.GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_Fld_GetNameRef(IntPtr fieldDefn);
        /// <summary>
        /// Gets the field name (native OGR_Fld_GetNameRef).
        /// </summary>
        /// <param name="fieldDefn">Field definition handle</param>
        /// <returns>Pointer to the field name</returns>
        [DllImport(FeatureExtensions.GdalDllName, CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr OGR_Fld_GetNameRef(HandleRef fieldDefn);
        /// <summary>
        /// Gets the field name, decoded as UTF-8.
        /// </summary>
        /// <param name="fieldDefn">Field definition</param>
        /// <returns>Field name</returns>
        public static string GetNameUTF8(this FieldDefn fieldDefn)
        {
            var fieldDefnRef = FieldDefn.getCPtr(fieldDefn);
            IntPtr strPtr = OGR_Fld_GetNameRef(fieldDefnRef);
            string value = strPtr.IntPtrTostring(Encoding.UTF8);
            return value;
        }
    }
}
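Since the native strings may arrive as either UTF-8 or GBK, one possible variation on the extension methods above is to try a strict UTF-8 decode first and fall back to another encoding only when the bytes are not valid UTF-8. This is a heuristic sketch, not something the original classes do: FallbackDecoder is a hypothetical name, and some GBK byte sequences can also happen to be valid UTF-8, so the result is not guaranteed for every input.

```csharp
using System;
using System.Text;

public static class FallbackDecoder
{
    // Strict decoder: throws DecoderFallbackException on invalid bytes
    // instead of silently emitting replacement characters.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Decodes as UTF-8 when the bytes are valid UTF-8,
    // otherwise decodes with the supplied fallback encoding.
    public static string Decode(byte[] raw, Encoding fallback)
    {
        try
        {
            return StrictUtf8.GetString(raw);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(raw);
        }
    }
}
```

The raw bytes could come from the same Marshal.Copy step that IntPtrTostring uses, with Encoding.GetEncoding("GBK") passed as the fallback once the code page provider is registered.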
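- Garbled Chinese attributes in created files: when a file is created with Chinese attribute field names or values, they show up garbled in ArcGIS or QGIS.
Add the "ENCODING=UTF-8" option at creation time. The code is as follows:
DriverExtensions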
using OSGeo.OGR;
using System;
using System.Linq;
namespace EM.GIS.GdalExtensions
{
    /// <summary>
    /// Driver extension methods.
    /// </summary>
    public static class DriverExtensions
    {
        /// <summary>
        /// Copies a data source (avoids garbled Chinese text on write).
        /// </summary>
        /// <param name="driver">Driver</param>
        /// <param name="srcDataSource">Source data source</param>
        /// <param name="path">Path</param>
        /// <param name="options">Options</param>
        /// <returns>New data source, or null if any required argument is missing</returns>
        public static DataSource CopyDataSourceUTF8(this Driver driver, DataSource srcDataSource, string path, string[] options)
        {
            DataSource destDataSource = null;
            if (driver != null && srcDataSource != null && !string.IsNullOrEmpty(path))
            {
                string[] destOptions = GetOptionsWithUTF8(options);
                destDataSource = driver.CopyDataSource(srcDataSource, path, destOptions);
            }
            return destDataSource;
        }
        /// <summary>
        /// Returns the options with the UTF-8 encoding option included.
        /// </summary>
        /// <param name="options">Existing options</param>
        /// <returns>New options</returns>
        public static string[] GetOptionsWithUTF8(this string[] options)
        {
            // Adding the encoding option produces a .cpg file, which avoids garbled Chinese text on write.
            string encodingStr = "ENCODING=UTF-8";
            string[] destOptions = options;
            if (destOptions == null)
            {
                destOptions = new string[] { encodingStr };
            }
            else
            {
                if (!destOptions.Contains(encodingStr))
                {
                    destOptions = new string[options.Length + 1];
                    Array.Copy(options, destOptions, options.Length);
                    destOptions[destOptions.Length - 1] = encodingStr;
                }
            }
            return destOptions;
        }
        /// <summary>
        /// Creates a data source (avoids garbled Chinese text on write).
        /// </summary>
        /// <param name="driver">Driver</param>
        /// <param name="path">Path</param>
        /// <param name="options">Options</param>
        /// <returns>New data source, or null if any required argument is missing</returns>
        public static DataSource CreateDataSourceUTF8(this Driver driver, string path, string[] options)
        {
            DataSource destDataSource = null;
            if (driver != null && !string.IsNullOrEmpty(path))
            {
                var destOptions = GetOptionsWithUTF8(options);
                destDataSource = driver.CreateDataSource(path, destOptions);
            }
            return destDataSource;
        }
    }
}
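GetOptionsWithUTF8 only skips the append when the exact string "ENCODING=UTF-8" is already present, so an array that already carries a different ENCODING value would end up with two entries. The following is a sketch of one possible variant that checks for the key instead. CreationOptions and WithEncoding are hypothetical names, and keeping a caller-supplied ENCODING value untouched is a design assumption, not behavior taken from the original class.

```csharp
using System;
using System.Linq;

public static class CreationOptions
{
    // Returns the options with an ENCODING entry added,
    // unless the caller already supplied one (matched case-insensitively by key).
    public static string[] WithEncoding(string[] options, string encodingName = "UTF-8")
    {
        string[] source = options ?? Array.Empty<string>();
        bool alreadySet = source.Any(o =>
            o != null && o.StartsWith("ENCODING=", StringComparison.OrdinalIgnoreCase));
        if (alreadySet)
        {
            return source;
        }
        return source.Append("ENCODING=" + encodingName).ToArray();
    }
}
```

The returned array can be passed wherever the original code passes destOptions.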
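3. Summary
This article resolves garbled Chinese text by calling imported native functions and transcoding the results. If you are comfortable with C++, you could instead modify the encoding-related parts of the GDAL source code and rebuild it. The extension classes in this article are available in the EMap code repository.
Copyright notice: this is an original article by lc156845259, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.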