Saturday, September 29, 2018

CSV vs XML vs JSON - Which is the Best Response Data Format?

Whether you are building a thin client (web application) or a thick client (client-server application), at some point you will probably make requests to a web server and need a good data format for the responses. Three major data formats are used today to transmit data from a web server to a client: CSV, XML, and JSON. To develop an application with a solid architecture, it helps to understand the differences between these formats and know when to use each. This post defines each data format, lays out its pros and cons, and identifies the situations each one suits best.

CSV

CSV stands for "comma-separated values". As the name implies, this data format is basically a list of elements separated by commas. Let's say that your response sends back a list of the people in a particular family. The format would look like this:

Eric,Andrea,Kusco

Pros - This format is the most compact of the three. Generally speaking, CSV data is about half the size of the equivalent XML or JSON. This is the major advantage of CSV because it can help reduce bandwidth usage.

Cons - This format is the least versatile of the three. A custom parser is required to convert the CSV data into a native data structure, so if the data structure changes, you have to change or even redesign your parser. Furthermore, since the program creating the CSV and the program parsing it reside on different machines (remember that we are passing data from one machine to another), both programs must be updated at the same time to prevent the receiving program from crashing. Otherwise, an outage is required so that the two programs can be updated individually without incompatibility issues.

Finally, CSV does not really support data hierarchies. What if you wanted to send back attributes for each person in the family? You would then have to design a more complex parser that knows which parts of the CSV refer to members of the family and which parts refer to attributes of each person. One way to solve this problem is to use another delimiter like ";" to separate each person's attributes:

Eric;male;26,Andrea;female;26,Kusco;male;8

The problem with creating customized formats, however, is that you incur an overhead of maintaining an even more complex parser.
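To make that maintenance cost concrete, here is a minimal Python sketch of a parser for the semicolon-delimited sample above. The `Person` class and the `parse_family` function are hypothetical names introduced for illustration, and the field order (name, gender, age) is an assumption read off the sample.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    gender: str
    age: int


def parse_family(payload: str) -> list[Person]:
    """Parse 'name;gender;age' records separated by commas."""
    people = []
    for record in payload.split(","):
        # The field order is hard-coded here. If the server adds, removes,
        # or reorders a field, this unpacking raises ValueError.
        name, gender, age = record.split(";")
        people.append(Person(name, gender, int(age)))
    return people


family = parse_family("Eric;male;26,Andrea;female;26,Kusco;male;8")
print(family[2])
```

Note that the parser also breaks if a value ever contains a comma or a semicolon, which is one more rule both sides would have to agree on.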

XML

XML stands for "Extensible Markup Language". XML was designed in 1996 and officially became a W3C standard in 1998. It was created to better represent data with a hierarchical structure. The format looks like this:

<person>
  <name>Eric</name>
  <age>26</age>
</person>
<person>
  <name>Andrea</name>
  <age>26</age>
</person>
<person>
  <name>Kusco</name>
  <age>8</age>
</person>

Pros - This data format fully supports hierarchical data structures and is very appropriate when receiving complex data as a response. It is also very human readable. Most browsers have built-in XML viewers that allow you to inspect XML files. Since XML was the first standard hierarchical data format, most APIs have built-in functionality to automatically convert XML data streams into native data structures like objects.

Cons - This data format is about three times as large as CSV. This is because each data element has an associated opening and closing tag.
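As one illustration of that built-in support, the sketch below uses Python's standard `xml.etree.ElementTree` module to turn the family sample into a list of dictionaries. The `<family>` wrapper element is an assumption added for this example, since a parser expects a single root element.

```python
import xml.etree.ElementTree as ET

# Assumption: the <person> elements are wrapped in one <family> root,
# because a well-formed XML document has exactly one root element.
xml_payload = """
<family>
  <person><name>Eric</name><age>26</age></person>
  <person><name>Andrea</name><age>26</age></person>
  <person><name>Kusco</name><age>8</age></person>
</family>
""".strip()

root = ET.fromstring(xml_payload)

# Element text is always a string, so the age is converted explicitly.
people = [
    {"name": p.findtext("name"), "age": int(p.findtext("age"))}
    for p in root.findall("person")
]
print(people)
```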

JSON

JSON stands for "JavaScript Object Notation". It was invented in 2001 and was popularized by Yahoo and Google in 2005 and 2006. It was created as an alternative to XML. Like XML, it represents hierarchical data, but it does so with commas, curly braces, and brackets instead of tags. An example of JSON looks like this:

[
{"name":"Eric","age":"26"},
{"name":"Andrea","age":"26"},
{"name":"Kusco","age":"8"}
]

Pros - This data format supports hierarchical data while being smaller in size than XML. As its name implies, it was also created to make it easier to parse data into native JavaScript objects, which makes it very useful for web applications. JSON is the best of both worlds with respect to CSV and XML. It's simple and compact like CSV, but supports hierarchical data like XML. Unlike XML, JSON data is only about twice as large as the equivalent CSV.
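The size ratios quoted in this post are rules of thumb, and the real numbers depend on the data. A quick way to check them for your own payloads is to encode the same records in all three formats and count bytes. The sketch below does that for the family example, with two assumptions: only name and age are included, and ages are written as numbers. On records this small, field names and tags tend to dominate, so the gaps may come out wider than the rough figures above.

```python
import json

# The same three people (name and age only) in each format.
people = [("Eric", 26), ("Andrea", 26), ("Kusco", 8)]

csv_text = ",".join(f"{name};{age}" for name, age in people)
json_text = json.dumps(
    [{"name": name, "age": age} for name, age in people],
    separators=(",", ":"),  # drop the optional whitespace
)
xml_text = "".join(
    f"<person><name>{name}</name><age>{age}</age></person>"
    for name, age in people
)

sizes = {
    "csv": len(csv_text.encode("utf-8")),
    "json": len(json_text.encode("utf-8")),
    "xml": len(xml_text.encode("utf-8")),
}
for fmt, size in sizes.items():
    print(f"{fmt}: {size} bytes ({size / sizes['csv']:.1f}x CSV)")
```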

Cons - This data format has slightly less support than XML. Since JSON is newer than XML, fewer APIs exist to automatically convert JSON to native data structures. However, this is rapidly changing because newer APIs and plugins support both XML and JSON.
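Where that support exists, the conversion is usually a single call. As a small sketch, Python's standard `json` module parses the family example directly into lists and dictionaries. The payload here assumes the three objects arrive wrapped in a JSON array.

```python
import json

# Assumption: the three objects arrive wrapped in a JSON array.
json_payload = (
    '[{"name":"Eric","age":"26"},'
    '{"name":"Andrea","age":"26"},'
    '{"name":"Kusco","age":"8"}]'
)

people = json.loads(json_payload)  # a list of dicts

# The ages are strings in this payload, so convert them explicitly.
ages = {person["name"]: int(person["age"]) for person in people}
print(ages)
```

Sending the ages as JSON numbers (for example `"age":26`) would remove the need for the explicit conversion.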

Conclusion

As a general rule of thumb, JSON is the best data exchange format to date. It's lightweight, compact, and versatile. CSV should only be used if you are sending huge amounts of data and bandwidth is an issue. Today, XML should not be used as a data exchange format because it's better suited to document markup.