Choose encoding in R

6/3/2023

When working with flat files, encoding needs to be factored in right away to avoid issues down the line.

UTF-8 (or UTF-16) is the de facto encoding that you hope to get. If the encoding is different, pay attention to how you load the file into R. Let’s take the example of a file encoded as Windows-1252.

Its content can be viewed in Notepad++. The editor does a pretty good job of figuring out the encoding of the file: the encoding is displayed in the status bar, while the Encoding menu lets you change the selected character set.

I work on Windows, and the Windows-1252 encoding is native to the platform:

    Sys.info()
    #   sysname
    # "Windows"
    Sys.getlocale()
    # "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Loading the CSV file from Windows with the utils package appears to be a breeze:

    utils::read.csv2(file = file_path)
    # Date Heure Appelé Durée Coût

However, once I moved the code to a Linux environment, the same call raised an error. The locale on that machine is UTF-8 based, so the file's Windows-1252 bytes no longer match the platform default:

    # "LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=C"

The file encoding therefore needs to be explicit to ensure portability:

    utils::read.csv2(file = file_path, fileEncoding = 'WINDOWS-1252')
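To try the explicit-encoding approach without the original file, here is a minimal sketch that writes a small Windows-1252 CSV to a temporary location and reads it back with `fileEncoding`. The data row and the `tmp` path are made up for illustration; only the column names are taken from the header shown in this post. It assumes the R session runs in a locale that can represent accented characters (UTF-8 or Windows-1252).

```r
# Hypothetical sample: the header uses the column names from this post,
# the data row is invented for illustration.
header   <- "Date;Heure;Appel\u00e9;Dur\u00e9e;Co\u00fbt"
data_row <- "01/06/2023;10:15;0123456789;00:02:30;0,15"
csv_text <- paste0(header, "\n", data_row, "\n")

# Convert the UTF-8 text to raw Windows-1252 bytes and write them unchanged,
# so the file on disk is encoded the same way on every platform.
raw_1252 <- iconv(csv_text, from = "UTF-8", to = "WINDOWS-1252", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(raw_1252, tmp)

# Declare the file's encoding explicitly instead of relying on the locale.
# check.names = FALSE keeps the accented column names as they are in the file.
calls <- utils::read.csv2(
  file         = tmp,
  fileEncoding = "WINDOWS-1252",
  check.names  = FALSE
)

print(names(calls))
str(calls)
```

Because the encoding is declared in the call, the same script should behave the same way whether the session's locale is Windows-1252 or UTF-8.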
