Free Pascal 2.6.2x usage tutorial

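  Free Pascal, or FPC for short (originally named FPK Pascal), is a 32- and 64-bit Pascal and Object Pascal compiler.
  Free Pascal offers several syntax modes that make it compatible with Pascal dialects such as Turbo Pascal, Delphi and Apple Pascal. It also supports multiple processors, including (but not limited to) Intel 80386 and Motorola 680x0, and multiple operating systems, including (but not limited to) Linux, FreeBSD, NetBSD, DOS, Win32, Win64, OS/2, BeOS, SunOS (Solaris), QNX and the former Amiga.
  Free Pascal's slogan is "Write Once, Compile Everywhere".
  Free Pascal is open-source software, written in Object Pascal and licensed under the GPL.
  The Free Pascal development team released stable version 2.2.4 on 12 April 2009, the latest stable release when this description was written; the development version at that time was 2.3.x.
  Built on top of Free Pascal is a project named Lazarus, a rapid application development (RAD) environment similar to Delphi. Lazarus is compiled with Free Pascal and also uses Free Pascal as its compiler. As a result, Lazarus runs on many operating systems, and users can create cross-platform graphical applications very conveniently. (Source: Wikipedia)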
From Free Pascal wiki
As of 0.9.25, Lazarus has full Unicode support in all platforms except Gtk 1. In this page one can find instructions for Lazarus users, roadmaps, descriptions of basic concepts and implementation details.
Even though Lazarus has Unicode widgetsets, it's important to note that not everything is Unicode. It's the responsibility of the developer to know what is the encoding of their strings and do the proper conversion between libraries which expect different encodings.
Usually the encoding is per library (e.g. a dynamic library DLL or a Lazarus package). Each library uniformly expects one encoding, which is usually either Unicode (UTF-8 for Lazarus) or ANSI (which actually means the system encoding, and may or may not be UTF-8). The RTL and the FCL of FPC <= 2.6 expect ANSI strings.
You can convert between Unicode and ANSI using
the UTF8ToAnsi and AnsiToUTF8 functions from the (FPC) System unit
or the UTF8ToSys and SysToUTF8 from the (Lazarus) FileUtil unit.
The latter two are smarter (faster) but pull more code into your program.
The Free Pascal Runtime Library (RTL) and the Free Pascal Free Component Library (FCL) in current FPC versions (<= 2.6.x) are ANSI, so you need to convert strings coming from or going to Unicode libraries (e.g. the LCL).
The FPC 2.7.1 development branch brings significant improvements to string handling.
See the RawByteString and UTF8String types.
Note: AnsiToUTF8 and UTF8ToAnsi require a widestring manager under Linux, BSD and Mac OS X. You can use the SysToUTF8 and UTF8ToSys functions (unit FileUtil) or add the widestring manager by adding cwstring as one of the first units to your program's uses section.
Say you get a string from a TEdit and you want to give it to some RTL file routine:
var
  MyString: string; // utf-8 encoded
begin
  MyString := MyTEdit.Text;
  SomeRTLRoutine(UTF8ToAnsi(MyString));
end;
And for the opposite direction:
var
  MyString: string; // ANSI encoded
begin
  MyString := SomeRTLRoutine;
  MyTEdit.Text := AnsiToUTF8(MyString);
end;
A widestring is a string type whose basic data elements are 2 bytes in size. Widestrings almost always hold data in the UTF-16 encoding.
Note that while each element of a widestring accessed as an array has 2 bytes, in UTF-16 a character may consist of 1 or 2 such elements (code units), which then occupy 2 or 4 bytes. Accessing a widestring as an array and expecting each element to be a complete UTF-16 character is therefore wrong and fails as soon as a 4-byte character is present in the string. Note also that UTF-16, like UTF-8, may contain decomposed characters. The character "Á", for example, may be encoded as a single character or as 2 characters: "A" plus a combining accent. Text containing accented letters can thus often be encoded in multiple ways in Unicode, and Lazarus and FPC do not handle this automatically.
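The difference between 16-bit elements and characters can be sketched with plain FPC code. This is only an illustration under stated assumptions: UTF16CodePointCount is a hypothetical helper, not an RTL or LCL routine, and it counts an unpaired surrogate as one character.

```pascal
program SurrogateCount;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points in a UTF-16 string by treating
// a high surrogate followed by a low surrogate as a single character.
function UTF16CodePointCount(const W: WideString): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(W) do
  begin
    if (i < Length(W)) and
       (Ord(W[i]) >= $D800) and (Ord(W[i]) <= $DBFF) and
       (Ord(W[i + 1]) >= $DC00) and (Ord(W[i + 1]) <= $DFFF) then
      Inc(i, 2)   // surrogate pair: one code point in two 16-bit elements
    else
      Inc(i);     // BMP character (or unpaired surrogate): one element
    Inc(Result);
  end;
end;

var
  W: WideString;
begin
  SetLength(W, 3);
  W[1] := WideChar($0041);  // 'A'
  W[2] := WideChar($D834);  // high surrogate of U+1D11E
  W[3] := WideChar($DD1E);  // low surrogate of U+1D11E
  WriteLn('16-bit elements: ', Length(W));                // 3
  WriteLn('code points:     ', UTF16CodePointCount(W));   // 2
end.
```

Here W[2] on its own is only half of a character, which is why indexing a widestring does not reliably yield characters.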
When passing Ansistrings to Widestrings you have to convert the encoding.
var
  w: widestring;
begin
  w:='Über'; // wrong, because FPC will convert system codepage to UTF16
  w:=UTF8ToUTF16('Über'); // correct
  Button1.Caption:=UTF16ToUTF8(w);
end;
Until Lazarus 0.9.30 the UTF-8 handling routines were in the LCL in the unit LCLProc. In Lazarus 0.9.31+ the routines in LCLProc are still available for backwards compatibility but the real code to deal with UTF-8 is located in the lazutils package in the unit lazutf8.
To operate on UTF-8 strings, use the routines from the unit lazutf8 instead of those from the Free Pascal SysUtils unit, because SysUtils is not yet prepared to deal with Unicode, while lazutf8 is. Simply substitute each SysUtils routine with its lazutf8 equivalent, which always has the same name plus a "UTF8" prefix.
Also note that simply iterating over chars as if the string were an array does not work in Unicode. This is not specific to UTF-8: one cannot assume that a character has a fixed size in Unicode. If you want to iterate over the characters of a UTF-8 string, there are basically two ways:
iterate over the bytes - useful for searching a substring or when looking only at the ASCII characters of the UTF8 string. For example when parsing XML files.
iterate over the characters - useful for graphical components like synedit. For example when you want to know the third printed character on the screen.
Due to the special nature of UTF8 you can simply use the normal string functions for searching a sub-string. Even though UTF-8 is a multi-byte encoding the first byte can not be confused with the second. So searching for a valid UTF-8 string with Pos will always return a valid UTF-8 position:
uses lazutf8; // LCLProc for Lazarus 0.9.30 or inferior
procedure Where(SearchFor, aText: string);
var
  BytePos: LongInt;
  CharacterPos: LongInt;
begin
  BytePos:=Pos(SearchFor,aText);
  if BytePos=0 then exit; // substring not found
  // characters before the match, plus 1 so the result is 1-based like BytePos
  CharacterPos:=UTF8Length(PChar(aText),BytePos-1)+1;
  writeln('The substring "',SearchFor,'" is in the text "',aText,'"',
          ' at byte position ',BytePos,' and at character position ',CharacterPos);
end;
Due to the ambiguity of Unicode, Pos() (just like any comparison) may behave unexpectedly when, for example, one of the strings contains decomposed characters while the other uses the precomposed code for the same letter. The RTL does not handle this automatically.
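A minimal sketch of this pitfall, using only the System unit. It assumes the source is compiled without a {$codepage} directive, so that the #$.. constants are copied into the strings byte for byte.

```pascal
program NormalizationPitfall;
{$mode objfpc}{$H+}

var
  Precomposed, Decomposed: AnsiString;
begin
  Precomposed := #$C3#$A4;     // U+00E4: a with diaeresis as one code point
  Decomposed  := 'a'#$CC#$88;  // 'a' followed by U+0308 (combining diaeresis)

  // Both strings are displayed as the same letter, but their bytes differ.
  WriteLn('Equal: ', Precomposed = Decomposed);                 // FALSE
  WriteLn('Pos:   ', Pos(Precomposed, 'x' + Decomposed + 'y')); // 0
end.
```

Both the comparison and the search work on bytes, so text from different sources has to be brought into the same normalization form before it is compared.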
Unicode characters can vary in length, so the best solution for accessing them is to use an iteration when one intends to access the characters in the sequence in which they are. For iterating through the characters use this code:
uses lazutf8; // LCLProc for Lazarus 0.9.30 or lower
procedure DoSomethingWithString(AnUTF8String: string);
var
  p: PChar;
  CharLen: integer;
  FirstByte, SecondByte, ThirdByte: Char;
begin
  p:=PChar(AnUTF8String);
  repeat
    CharLen := UTF8CharacterLength(p);
    // Here you have a pointer to the char and its length
    // You can access the bytes of the UTF-8 Char like this:
    if CharLen >= 1 then FirstByte := P[0];
    if CharLen >= 2 then SecondByte := P[1];
    if CharLen >= 3 then ThirdByte := P[2];
    inc(p,CharLen);
  until (CharLen=0) or (p^ = #0);
end;
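To show what such an iteration relies on, here is a sketch that needs only the System unit. SeqLenFromLeadByte and CountCodePoints are hypothetical helpers written for this illustration; the real UTF8CharacterLength in lazutf8 may differ, for example in how it validates input.

```pascal
program Utf8Iterate;
{$mode objfpc}{$H+}

// Hypothetical helper: length of a UTF-8 sequence, judged from its lead byte.
function SeqLenFromLeadByte(B: Byte): Integer;
begin
  if B < $80 then Result := 1                   // 0xxxxxxx: ASCII
  else if (B and $E0) = $C0 then Result := 2    // 110xxxxx
  else if (B and $F0) = $E0 then Result := 3    // 1110xxxx
  else if (B and $F8) = $F0 then Result := 4    // 11110xxx
  else Result := 1;  // stray continuation or invalid byte: step over it
end;

// Counts code points by jumping from lead byte to lead byte.
function CountCodePoints(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    Inc(i, SeqLenFromLeadByte(Ord(S[i])));
    Inc(Result);
  end;
end;

var
  S: AnsiString;
begin
  S := 'a'#$C3#$A4#$E2#$82#$AC;  // 'a', U+00E4 (2 bytes), U+20AC (3 bytes)
  WriteLn('bytes:       ', Length(S));           // 6
  WriteLn('code points: ', CountCodePoints(S));  // 3
end.
```

The sketch assumes well-formed input; a truncated final sequence simply ends the loop early.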
Besides iterating one might also want to have random access to UTF-8 Characters.
uses lazutf8; // LCLProc for Lazarus 0.9.30 or inferior
var
  AnUTF8String, NthChar: string;
begin
  // N is the 1-based character index (not the byte index)
  NthChar := UTF8Copy(AnUTF8String, N, 1);
end;
The following demonstrates how to show the 32bit code point value of each character in an UTF8 string:
uses lazutf8; // LCLProc for Lazarus 0.9.30 or inferior
procedure IterateUTF8Characters(const AnUTF8String: string);
var
  p: PChar;
  unicode: Cardinal;
  CharLen: integer;
begin
  p:=PChar(AnUTF8String);
  repeat
    unicode:=UTF8CharacterToUnicode(p,CharLen);
    writeln('Unicode=',unicode);
    inc(p,CharLen);
  until (CharLen=0) or (unicode=0);
end;
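The decoding step itself can be sketched without Lazarus. DecodeCodePoint below is a hypothetical helper, not the lazutf8 implementation; it assumes well-formed input and does not validate continuation bytes.

```pascal
program DecodeUtf8;
{$mode objfpc}{$H+}

// Hypothetical helper: decodes the UTF-8 sequence that starts at S[Index]
// and reports its length in bytes. Assumes well-formed input.
function DecodeCodePoint(const S: AnsiString; Index: Integer;
  out ByteLen: Integer): Cardinal;
var
  B: Byte;
  k: Integer;
begin
  B := Ord(S[Index]);
  if B < $80 then
  begin
    ByteLen := 1; Result := B;
  end
  else if (B and $E0) = $C0 then
  begin
    ByteLen := 2; Result := B and $1F;
  end
  else if (B and $F0) = $E0 then
  begin
    ByteLen := 3; Result := B and $0F;
  end
  else if (B and $F8) = $F0 then
  begin
    ByteLen := 4; Result := B and $07;
  end
  else
  begin
    ByteLen := 1; Result := B;  // invalid lead byte: returned unchanged
  end;
  // Each continuation byte contributes its low 6 bits.
  for k := 1 to ByteLen - 1 do
    Result := (Result shl 6) or (Ord(S[Index + k]) and $3F);
end;

var
  S: AnsiString;
  i, Len: Integer;
  CP: Cardinal;
begin
  S := 'a'#$C3#$A4#$E2#$82#$AC;
  i := 1;
  while i <= Length(S) do
  begin
    CP := DecodeCodePoint(S, i, Len);
    WriteLn('U+', HexStr(Int64(CP), 4));  // U+0061, U+00E4, U+20AC
    Inc(i, Len);
  end;
end.
```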
Nearly all operations which one might want to execute with UTF-8 strings are covered by the routines in the unit lazutf8 (unit LCLProc for Lazarus 0.9.30 or lower). See the following list of routines taken from lazutf8.pas:
function UTF8CharacterLength(p: PChar): integer;
function UTF8Length(const s: string): PtrInt;
function UTF8Length(p: PChar; ByteCount: PtrInt): PtrInt;
function UTF8CharacterToUnicode(p: PChar; out CharLen: integer): Cardinal;
function UnicodeToUTF8(u: cardinal; Buf: PChar): integer; inline;
function UnicodeToUTF8SkipErrors(u: cardinal; Buf: PChar): integer;
function UnicodeToUTF8(u: cardinal): shortstring; inline;
function UTF8ToDoubleByteString(const s: string): string;
function UTF8ToDoubleByte(UTF8Str: PChar; Len: PtrInt; DBStr: PByte): PtrInt;
function UTF8FindNearestCharStart(UTF8Str: PChar; Len: integer;
BytePos: integer): integer;
// find the n-th UTF8 character, ignoring BIDI
function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;
// find the byte index of the n-th UTF8 character, ignoring BIDI (byte len of substr)
function UTF8CharToByteIndex(UTF8Str: PChar; Len, CharIndex: PtrInt): PtrInt;
procedure UTF8FixBroken(P: PChar);
function UTF8CharacterStrictLength(P: PChar): integer;
function UTF8CStringToUTF8String(SourceStart: PChar; SourceLen: PtrInt) : string;
function UTF8Pos(const SearchForText, SearchInText: string): PtrInt;
function UTF8Copy(const s: string; StartCharIndex, CharCount: PtrInt): string;
procedure UTF8Delete(var s: String; StartCharIndex, CharCount: PtrInt);
procedure UTF8Insert(const source: String; var s: string; StartCharIndex: PtrInt);
function UTF8LowerCase(const AInStr: string; ALanguage: string=''): string;
function UTF8UpperCase(const AInStr: string; ALanguage: string=''): string;
function FindInvalidUTF8Character(p: PChar; Count: PtrInt;
StopOnNonASCII: Boolean = false): PtrInt;
function ValidUTF8String(const s: String): String;
procedure AssignUTF8ListToAnsi(UTF8List, AnsiList: TStrings);
//compare functions
function UTF8CompareStr(const S1, S2: string): Integer;
function UTF8CompareText(const S1, S2: string): Integer;
Lazarus controls and functions expect filenames and directory names in UTF-8 encoding, but the RTL uses ANSI strings for directories and filenames.
For example, consider a button which sets the Directory property of the TFileListBox to the current directory. The RTL function GetCurrentDir is ANSI, not Unicode, so conversion is needed:
procedure TForm1.Button1Click(Sender: TObject);
begin
  FileListBox1.Directory:=SysToUTF8(GetCurrentDir);
  // or use the functions from the FileUtil unit
  FileListBox1.Directory:=GetCurrentDirUTF8;
end;
The unit FileUtil defines common file functions with UTF-8 strings:
// basic functions similar to the RTL but working with UTF-8 instead of the
// system encoding
// AnsiToUTF8 and UTF8ToAnsi need a widestring manager under Linux, BSD, Mac OS X
// but normally these OS use UTF-8 as system encoding so the widestringmanager
// is not needed.
function NeedRTLAnsi: boolean;// true if system encoding is not UTF-8
procedure SetNeedRTLAnsi(NewValue: boolean);
function UTF8ToSys(const s: string): string;// as UTF8ToAnsi but more independent of widestringmanager
function SysToUTF8(const s: string): string;// as AnsiToUTF8 but more independent of widestringmanager
// file operations
function FileExistsUTF8(const Filename: string): boolean;
function FileAgeUTF8(const FileName: string): Longint;
function DirectoryExistsUTF8(const Directory: string): Boolean;
function ExpandFileNameUTF8(const FileName: string): string;
function ExpandUNCFileNameUTF8(const FileName: string): string;
{$IFNDEF VER2_2_0}
function ExtractShortPathNameUTF8(Const FileName : String) : String;
{$ENDIF}
function FindFirstUTF8(const Path: string; Attr: Longint; out Rslt: TSearchRec): Longint;
function FindNextUTF8(var Rslt: TSearchRec): Longint;
procedure FindCloseUTF8(var F: TSearchrec);
function FileSetDateUTF8(const FileName: String; Age: Longint): Longint;
function FileGetAttrUTF8(const FileName: String): Longint;
function FileSetAttrUTF8(const Filename: String; Attr: longint): Longint;
function DeleteFileUTF8(const FileName: String): Boolean;
function RenameFileUTF8(const OldName, NewName: String): Boolean;
function FileSearchUTF8(const Name, DirList : String): String;
function FileIsReadOnlyUTF8(const FileName: String): Boolean;
function GetCurrentDirUTF8: String;
function SetCurrentDirUTF8(const NewDir: String): Boolean;
function CreateDirUTF8(const NewDir: String): Boolean;
function RemoveDirUTF8(const Dir: String): Boolean;
function ForceDirectoriesUTF8(const Dir: string): Boolean;
// environment
function ParamStrUTF8(Param: Integer): string;
function GetEnvironmentStringUTF8(Index : Integer): String;
function GetEnvironmentVariableUTF8(const EnvVar: String): String;
function GetAppConfigDirUTF8(Global: Boolean): string;
The file functions of the FileUtil unit also take care of Mac OS X specific behaviour: OS X normalizes filenames. For example the filename 'ä.txt' can be encoded in Unicode with two different sequences (#$C3#$A4 and 'a'#$CC#$88). Under Linux and BSD you can create a filename with either encoding. OS X automatically converts the a umlaut to the three-byte sequence. This means:
if Filename1 = Filename2 then ... // is not sufficient under OS X
if AnsiCompareFileName(Filename1, Filename2) = 0 then ... // not sufficient under fpc 2.2.2, not even with cwstring
if CompareFilenames(Filename1, Filename2) = 0 then ... // this always works (unit FileUtil or FileProcs)
The default font (Tahoma) for user interface controls under Windows XP is capable of correctly displaying several scripts/alphabets/languages, including Arabic, Russian (Cyrillic alphabet) and Western languages (Latin/Greek alphabets), but not East Asian languages, like Chinese, Japanese and Korean.
After you go to the Control Panel, choose Regional Settings, click the Languages tab and install the East Asia Language Pack, the standard user interface font displays those languages correctly. Windows XP versions localized for those languages already have this language pack installed.
Later Windows versions presumably have support for these languages out of the box.
When you create source files with Lazarus and type some non-ASCII characters the file is saved in UTF8. It does not use a BOM (Byte Order Mark).
You can change the encoding via right click on source editor / File Settings / Encoding. Apart from the fact that UTF-8 files are not supposed to have BOMs, the reason for the lacking BOM is how FPC treats Ansistrings. For compatibility the LCL uses Ansistrings and for portability the LCL uses UTF8.
Note: Some MS Windows text editors might treat the files as encoded with the system codepage (OEM codepage) and show them as invalid characters. Do not add the BOM. If you add the BOM you have to change all string assignments.
For example:
Button1.Caption := 'Über';
When no BOM is given (and no codepage parameter was passed) the compiler treats the string as system encoding and copies each byte unconverted to the string. This is how the LCL expects strings.
// source file saved as UTF-8 without BOM
if FileExists('Über.txt') then ; // wrong, because FileExists expects system encoding
if FileExistsUTF8('Über.txt') then ; // correct
The Unicode standard maps integers from 0 to 10FFFF(h) to characters. Each such mapping is called a code point. In other words, Unicode characters are in principle defined for code points from U+000000 to U+10FFFF (0 to 1 114 111).
There are three major schemes for representing Unicode code points as unique byte sequences. These schemes are called Unicode transformation formats: UTF-8, UTF-16 and UTF-32. Conversions between all of them are possible. Here are their basic properties:
                             UTF-8    UTF-16   UTF-32
Smallest code point [hex]    000000   000000   000000
Largest code point [hex]     10FFFF   10FFFF   10FFFF
Code unit size [bits]        8        16       32
Minimal bytes/character      1        2        4
Maximal bytes/character      4        4        4
UTF-8 has several important and useful properties:
It is interpreted as a sequence of bytes, so the concept of low- and high-order byte does not exist. Unicode characters U+0000 to U+007F (ASCII) are encoded simply as bytes 00h to 7Fh (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8. All characters > U+007F are encoded as a sequence of several bytes, each of which has the most significant bit set.
No byte sequence of one character is contained within a longer byte sequence of another character. This allows easy searching for substrings. The first byte of a multibyte sequence (representing a non-ASCII character) is always in the range C2h to F4h in current UTF-8 (RFC 3629; the original definition allowed C0h to FDh), and it indicates how many bytes follow for this character. All further bytes in a multibyte sequence are in the range 80h to BFh. This allows easy resynchronization and robustness.
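The resynchronization property can be sketched as follows. IsContinuationByte and CharStartAtOrBefore are hypothetical helpers for illustration (the lazutf8 unit offers UTF8FindNearestCharStart for real use); the code assumes a source file without a {$codepage} directive.

```pascal
program Utf8Resync;
{$mode objfpc}{$H+}

// Continuation bytes have the bit pattern 10xxxxxx (80h to BFh).
function IsContinuationByte(B: Byte): Boolean;
begin
  Result := (B and $C0) = $80;
end;

// Hypothetical helper: moves a 1-based byte index back to the first byte
// of the character that contains it.
function CharStartAtOrBefore(const S: AnsiString; BytePos: Integer): Integer;
begin
  Result := BytePos;
  while (Result > 1) and IsContinuationByte(Ord(S[Result])) do
    Dec(Result);
end;

var
  S: AnsiString;
begin
  S := 'a'#$E2#$82#$AC'b';  // 'a', U+20AC (bytes 2..4), 'b'
  WriteLn(CharStartAtOrBefore(S, 4));  // 2: byte 4 is inside U+20AC
  WriteLn(CharStartAtOrBefore(S, 5));  // 5: 'b' starts at its own byte
end.
```

Because a continuation byte can never be mistaken for a lead byte, at most three steps backwards are needed in well-formed UTF-8.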
UTF-16 has the following most important properties:
It uses a single 16-bit word to encode characters from U+0000 to U+D7FF and from U+E000 to U+FFFF, and a pair of 16-bit words (a surrogate pair) to encode the remaining Unicode characters, U+10000 to U+10FFFF.
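The arithmetic behind the 16-bit pair can be sketched like this. EncodeSurrogatePair is a hypothetical helper, not an RTL routine, and it assumes the code point is in the range U+10000 to U+10FFFF.

```pascal
program SurrogateEncode;
{$mode objfpc}{$H+}

// Hypothetical helper: splits a code point above U+FFFF into the two
// 16-bit words that UTF-16 uses for it.
procedure EncodeSurrogatePair(CodePoint: Cardinal; out HighUnit, LowUnit: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;                 // 20 significant bits remain
  HighUnit := Word($D800 or (V shr 10));   // upper 10 bits
  LowUnit  := Word($DC00 or (V and $3FF)); // lower 10 bits
end;

var
  H, L: Word;
begin
  EncodeSurrogatePair($1D11E, H, L);
  WriteLn(HexStr(Int64(H), 4), ' ', HexStr(Int64(L), 4));  // D834 DD1E
end.
```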
Finally, any Unicode character can be represented as a single 4 byte/32-bit unit in UTF-32.
Since the GTK1 interface was declared obsolete in Lazarus 0.9.31, all LCL interfaces are Unicode capable and the LCL uses and accepts only UTF-8 encoded strings, unless in routines explicitly marked as accepting other encodings.
First, and most importantly, all Unicode patches for the Win32 interface must be enclosed by IFDEF WindowsUnicodeSupport, to avoid breaking the existing ANSI interface. After this stabilizes, all ifdefs will be removed and only the Unicode part will remain. At that point all existing programs that use ANSI characters will need migration to Unicode.
Windows platforms up to and including Win9x are based on ISO code page standards and only partially support Unicode. Windows platforms starting with Windows NT (e.g. Windows 2000, XP, Vista, 7, 8) and Windows CE fully support Unicode.
Win 9x and NT offer two parallel sets of API functions: the old ANSI enabled *A and the new, Unicode enabled *W. *W functions accept wide strings - UTF-16 encoded strings - as parameters.
Windows 9x has all *W functions, but most of them have empty implementations and do nothing. Only some *W functions are fully implemented in 9x; these are listed in the section "Wide functions present on Windows 9x". This property is relevant because it allows a single application to run on both Win9x and WinNT and to detect at runtime which set of APIs to use.
Windows CE only uses Wide API functions.
Some Wide API functions are present on Windows 9x.
Conversion example:
  GetTextExtentPoint32(hdcNewBitmap, LPSTR(ButtonCaption),
    Length(ButtonCaption), TextSize);
Becomes:
  {$ifdef WindowsUnicodeSupport}
    WideCaption := Utf8Decode(ButtonCaption); // WideCaption: WideString
    GetTextExtentPoint32W(hdcNewBitmap, PWideChar(WideCaption), Length(WideCaption), TextSize);
  {$else}
    GetTextExtentPoint32(hdcNewBitmap, LPSTR(ButtonCaption), Length(ButtonCaption), TextSize);
  {$endif}
Second conversion example:
function TGDIWindow.GetTitle: String;
var
  l: Integer;
begin
  l := Windows.GetWindowTextLength(Handle);
  SetLength(Result, l);
  // nMaxCount includes the terminating null, hence l + 1
  Windows.GetWindowText(Handle, @Result[1], l + 1);
end;
Becomes:
function TGDIWindow.GetTitle: String;
var
  l: Integer;
  AnsiBuffer: string;
  WideBuffer: WideString;
begin
{$ifdef WindowsUnicodeSupport}
  if UnicodeEnabledOS then
  begin
    l := Windows.GetWindowTextLengthW(Handle);
    SetLength(WideBuffer, l);
    l := Windows.GetWindowTextW(Handle, @WideBuffer[1], l + 1);
    SetLength(WideBuffer, l);
    Result := Utf8Encode(WideBuffer);
  end
  else
  begin
    l := Windows.GetWindowTextLength(Handle);
    SetLength(AnsiBuffer, l);
    l := Windows.GetWindowText(Handle, @AnsiBuffer[1], l + 1);
    SetLength(AnsiBuffer, l);
    Result := AnsiToUtf8(AnsiBuffer);
  end;
{$else}
  l := Windows.GetWindowTextLength(Handle);
  SetLength(Result, l);
  Windows.GetWindowText(Handle, @Result[1], l + 1);
{$endif}
end;
The compiler (FPC) supports specifying the code page in which the source code has been written via the command option -Fc (e.g. -Fcutf8) and the equivalent codepage directive (e.g. {$codepage utf8}). In this case, rather than literally copying the bytes that represent the string constants in your program, the compiler will interpret all character data according to that codepage. There are two things to watch out for though:
On Unix platforms, make sure you include a widestring manager by adding the cwstring unit to your uses clause. Without it, the program cannot convert all character data correctly at run time. It is not included by default because this unit makes your program dependent on libc, which makes cross-compilation harder.
The compiler converts all string constants that contain non-ASCII characters to widestring constants. These are automatically converted back to ansistring (either at compile time or at run time), but this can cause one caveat if you try to mix both characters and ordinal values in a single string constant:
For example:
program project1;
{$codepage utf8}
{$mode objfpc}{$H+}
{$ifdef unix}
uses cwstring;
{$endif}
var
  a,b,c: string;
begin
  a:='ä';
  b:='='#$C3#$A4; // #$C3#$A4 is UTF-8 for ä
  c:='ä='#$C3#$A4; // after the non-ASCII 'ä' the compiler treats #$C3 as a widechar
  writeln(a,b); // writes ä=ä
  writeln(c);   // writes ä=Ã¤
end.
When compiled and executed, this will write:
ä=ä
ä=Ã¤
The reason is that once the ä is encountered, as mentioned above, the rest of the constant string assigned to 'c' is parsed as a widestring. As a result the #$C3 and #$A4 are interpreted as widechar(#$C3) and widechar(#$A4), rather than as ansichars.
Experimental - needs Windows testers!
Usually the RTL uses the system codepage for strings (e.g. FileExists and TStringList.LoadFromFile). On Windows this is a non-Unicode encoding, so you can only use characters from your language group. The LCL works with UTF-8 encoding, which covers the full Unicode range. On Linux and Mac OS X UTF-8 is typically the system codepage, so there the RTL uses CP_UTF8 by default.
Since FPC 2.7.1 the default system codepage of the RTL can be changed to UTF-8 (CP_UTF8). So Windows users can now use UTF-8 strings in the RTL.
You can test it by adding -dEnableUTF8RTL to the Lazarus build options and recompiling your project.
For example, FileExists and aStringList.LoadFromFile(Filename) now support full Unicode.
AnsiToUTF8, UTF8ToAnsi, SysToUTF8 and UTF8ToSys have no effect. They were mainly used for the above RTL functions, which no longer need a conversion. For WinAPI functions see below.
Many UTF8Encode and UTF8Decode calls are no longer needed, because when assigning UnicodeString to String and vice versa the compiler converts automatically.
When accessing the WinAPI you must use the "W" functions or use the functions UTF8ToWinCP and WinCPToUTF8.
You can enable the new mode by compiling Lazarus clean with -dEnableUTF8RTL
If you use string literals with WideString, UnicodeString or UTF8String, your sources now must have the right encoding. For example you can use UTF-8 source files (Lazarus default) and pass -FcUTF8 to the compiler.
"String" and "UTF8String" are different types. If you assign a String to a UTF8String the compiler adds code to check whether the encoding is the same. This costs unnecessary time and increases code size.
More information about the new FPC Unicode Support:
- Description of UTF-8 strings
- Code snippet that shows how to traverse UTF8 strings
This page was last modified on 2 December 2014, at 12:46.