Structured Data Formats Compared
JSON, XML, and YAML are the three most common text-based data serialisation formats in software development. Each has distinct strengths: JSON dominates web APIs, XML remains essential in enterprise systems and document markup, and YAML is the standard for configuration files in DevOps and cloud infrastructure.
JSON: JavaScript Object Notation
JSON was specified by Douglas Crockford in the early 2000s as a lightweight subset of JavaScript's object literal syntax, and was later standardised as ECMA-404 and RFC 8259. It has become the default data interchange format for web APIs.
Syntax
{
  "name": "Colin Mackay",
  "age": 42,
  "active": true,
  "languages": ["TypeScript", "Python", "C#"],
  "address": {
    "city": "Edinburgh",
    "country": "UK"
  }
}
Characteristics
- Six data types: string, number, boolean, null, object, array
- No comments allowed (by specification)
- Keys must be quoted strings
- Trailing commas are not permitted
- Native parsing in all browsers and most languages
- Compact — minimal syntax overhead
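As a minimal sketch of these characteristics, the snippet below parses the sample document with Python's standard-library json module and checks which inputs the parser accepts. The helper name is_valid_json is an assumption made for this illustration, not part of any library.

```python
import json

text = """
{
  "name": "Colin Mackay",
  "age": 42,
  "active": true,
  "languages": ["TypeScript", "Python", "C#"],
  "address": {"city": "Edinburgh", "country": "UK"}
}
"""

person = json.loads(text)

# Each JSON type maps onto a native Python type.
for key, value in person.items():
    print(f"{key}: {type(value).__name__}")


def is_valid_json(candidate: str) -> bool:
    """Return True if the standard-library parser accepts the text (hypothetical helper)."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json('{"a": 1}'))          # plain object
print(is_valid_json('{"a": 1,}'))         # trailing comma
print(is_valid_json('{a: 1}'))            # unquoted key
print(is_valid_json('{"a": 1} // note'))  # comment
```

Only the first candidate is accepted. The other three fail for the reasons listed above: a trailing comma, an unquoted key, and a comment.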
Common uses
- REST and GraphQL API responses
- Package manifests (package.json, composer.json)
- Configuration (tsconfig.json, .eslintrc.json)
- NoSQL databases (MongoDB, CouchDB)
XML: Extensible Markup Language
XML was standardised by the W3C in 1998 as a general-purpose markup language. It is self-describing, extensible, and supports schemas, namespaces, and transformations.
Syntax
<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>Colin Mackay</name>
  <age>42</age>
  <active>true</active>
  <languages>
    <language>TypeScript</language>
    <language>Python</language>
    <language>C#</language>
  </languages>
  <address>
    <city>Edinburgh</city>
    <country>UK</country>
  </address>
</person>
Characteristics
- Tag-based syntax with opening and closing tags
- Supports attributes and nested elements
- Comments allowed: <!-- comment -->
- Schema validation (XSD, DTD, RelaxNG)
- Namespaces for avoiding name conflicts
- XSLT for transformations, XPath for querying
- Verbose — significantly larger than equivalent JSON or YAML
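To make the attribute, nesting, and querying points concrete, here is a small sketch using Python's standard-library xml.etree.ElementTree, which supports a limited subset of XPath. The id attribute on the person element is an assumption added for illustration; it does not appear in the sample above.

```python
import xml.etree.ElementTree as ET

# Same data as the sample, plus a hypothetical id attribute for illustration.
document = """<person id="p1">
  <name>Colin Mackay</name>
  <age>42</age>
  <active>true</active>
  <languages>
    <language>TypeScript</language>
    <language>Python</language>
    <language>C#</language>
  </languages>
  <address>
    <city>Edinburgh</city>
    <country>UK</country>
  </address>
</person>"""

root = ET.fromstring(document)

person_id = root.get("id")            # attributes are read separately from child elements
name = root.findtext("name")
age = int(root.findtext("age"))       # element text is always a string; convert explicitly
active = root.findtext("active") == "true"

# Path expressions select nested elements (ElementTree implements an XPath subset).
languages = [el.text for el in root.findall("languages/language")]
city = root.findtext("address/city")

print(person_id, name, age, active, languages, city)
```

Without a schema, every value arrives as text, so the caller decides that age is an integer and active is a boolean.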
Common uses
- SOAP web services
- RSS and Atom feeds
- SVG images, XHTML documents
- Office documents (OOXML: .docx, .xlsx, .pptx are ZIP archives of XML)
- Android layouts, Maven build files (pom.xml)
- Sitemaps (sitemap.xml)
YAML: YAML Ain't Markup Language
YAML was first proposed in 2001 as a human-friendly data serialisation format. It uses indentation for structure (similar to Python) and has become the standard for configuration in DevOps tools.
Syntax
name: Colin Mackay
age: 42
active: true
languages:
  - TypeScript
  - Python
  - C#
address:
  city: Edinburgh
  country: UK
Characteristics
- Indentation-based (spaces only — tabs are not allowed)
- Comments allowed: # comment
- Supports anchors and aliases for reusable nodes
- Multi-line strings with | (literal block) or > (folded block)
- Type inference (bare true, 42, null are typed automatically)
- Superset of JSON as of YAML 1.2: valid JSON is, apart from rare edge cases, valid YAML
- Multiple documents in one file using the --- separator
Common uses
- CI/CD pipelines (GitHub Actions, GitLab CI, Azure Pipelines)
- Container orchestration (Docker Compose, Kubernetes manifests)
- Infrastructure as code (Ansible, CloudFormation)
- Static site generators (Hugo, Jekyll front matter)
Head-to-Head Comparison
| Feature | JSON | XML | YAML |
|---|---|---|---|
| Human readability | Good | Moderate | Excellent |
| Verbosity | Low | High | Very low |
| Comments | No | Yes | Yes |
| Schema validation | JSON Schema | XSD, DTD, RelaxNG | JSON Schema (via conversion) |
| Data types | 6 types | Text only (schema-defined) | Auto-typed |
| Parsing speed | Fast | Moderate | Slow |
| Whitespace sensitivity | No | No | Yes (indentation) |
| Browser native support | Yes (JSON.parse) | Yes (DOMParser) | No |
| Binary data | Base64 encoded | Base64 or hex encoded | Base64 encoded |
| File extensions | .json | .xml | .yaml, .yml |
File Size Comparison
Representing the same data structure, typical size ratios:
- JSON: Baseline (1×)
- YAML: ~0.8× (slightly smaller due to no braces or quotes on keys)
- XML: ~1.5–2× (opening and closing tags add overhead)
After gzip/Brotli compression, the differences shrink significantly because repetitive tag structures compress well.
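The effect of compression can be checked with a rough measurement like the one below. It is only a sketch: the records are synthetic and deterministic, the element names are assumptions, and the exact ratios will differ for real data.

```python
import gzip
import json
import xml.etree.ElementTree as ET

# Synthetic, deterministic records (hypothetical data for illustration only).
records = [
    {"name": f"user{i}", "age": 20 + i % 50, "active": i % 2 == 0}
    for i in range(500)
]


def to_json_bytes(rows):
    """Serialise to compact JSON with no optional whitespace."""
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


def to_xml_bytes(rows):
    """Serialise to XML with one element per field (assumed layout)."""
    root = ET.Element("people")
    for row in rows:
        person = ET.SubElement(root, "person")
        for key, value in row.items():
            # XML has no native types, so every value is written as text.
            if isinstance(value, bool):
                text = "true" if value else "false"
            else:
                text = str(value)
            ET.SubElement(person, key).text = text
    return ET.tostring(root, encoding="utf-8")


json_raw = to_json_bytes(records)
xml_raw = to_xml_bytes(records)
json_gz = gzip.compress(json_raw)
xml_gz = gzip.compress(xml_raw)

raw_ratio = len(xml_raw) / len(json_raw)
gz_ratio = len(xml_gz) / len(json_gz)

print(f"raw:  JSON {len(json_raw)} B, XML {len(xml_raw)} B, XML/JSON = {raw_ratio:.2f}")
print(f"gzip: JSON {len(json_gz)} B, XML {len(xml_gz)} B, XML/JSON = {gz_ratio:.2f}")
```

On repetitive data like this, the XML-to-JSON ratio after gzip should be noticeably closer to 1 than the raw ratio, because the repeated tag names compress into short back-references.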
YAML Gotchas
YAML's type inference can cause unexpected behaviour:
- no and off are parsed as boolean false (in YAML 1.1)
- 3.10 may be parsed as 3.1 (float, not string "3.10")
- Norway's country code NO becomes false
- Unquoted strings that look like numbers, booleans, or dates are auto-typed
YAML 1.2 (2009) reduced some of these issues, but not all parsers use 1.2 by default. When in doubt, quote your values.
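The mechanism behind these surprises can be approximated in a few lines. The function below is a hand-written, simplified subset of YAML 1.1 implicit typing, written purely for illustration. It is not a YAML parser, the name resolve_scalar is hypothetical, and it ignores many real rules (octal integers, dates, null, and others).

```python
import re

# Simplified subset of YAML 1.1 implicit typing, for illustration only.
# This is NOT a YAML parser; real loaders apply more rules than these.
YAML11_TRUE = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
INT_PATTERN = re.compile(r"0|[-+]?[1-9][0-9]*")
FLOAT_PATTERN = re.compile(r"[-+]?[0-9]+\.[0-9]+")


def resolve_scalar(token: str):
    """Guess the type of a scalar roughly the way a YAML 1.1 loader would."""
    # Quoted scalars are always strings: the quotes switch off type inference.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]
    if token in YAML11_TRUE:
        return True
    if token in YAML11_FALSE:
        return False
    if INT_PATTERN.fullmatch(token):
        return int(token)
    if FLOAT_PATTERN.fullmatch(token):
        return float(token)
    return token  # anything unrecognised stays a string


for token in ["NO", "off", "3.10", "42", '"NO"', '"3.10"', "Edinburgh"]:
    print(f"{token:>10} -> {resolve_scalar(token)!r}")
```

The quoted forms keep their original text, which is why quoting values such as country codes and version numbers is the safest habit.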
When to Use Each Format
Choose JSON when:
- Building web APIs or communicating between services
- Storing data in NoSQL databases
- Working in JavaScript/TypeScript ecosystems
- You need fast parsing and broad language support
Choose XML when:
- Working with SOAP services or legacy enterprise systems
- You need document-oriented markup (mixed content with text and elements)
- Schema validation is critical
- Using RSS/Atom feeds or sitemaps
Choose YAML when:
- Writing configuration files for DevOps tools
- Human readability and editability are the priority
- Working with CI/CD pipelines or Kubernetes
- You need comments in your configuration
Frequently Asked Questions
Is JSON5 a good alternative to JSON?
JSON5 adds comments, trailing commas, and unquoted keys to JSON. It is useful for configuration files but is not suitable for APIs — there is no browser-native parser. Consider using JSONC (JSON with Comments) in VS Code or YAML for configuration needs.
Is TOML better than YAML for configuration?
TOML (Tom's Obvious Minimal Language) avoids YAML's whitespace sensitivity and type inference gotchas. It is used by Rust (Cargo.toml) and Python (pyproject.toml). For simple configuration, TOML is often clearer. For deeply nested structures, YAML may be more readable.
Can I convert between formats?
Yes. Most data can be round-tripped between JSON, XML, and YAML, though XML attributes and mixed content do not map cleanly to JSON or YAML. On the command line, yq can convert between YAML, JSON, and XML, while jq and xmlstarlet are useful for querying and reshaping JSON and XML respectively.
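The attribute problem is easiest to see in code. The sketch below converts XML to JSON using only Python's standard library and an assumed convention: attributes become keys prefixed with @, and text that sits beside attributes or child elements goes under #text. The function name element_to_dict and the sample attributes are illustrative assumptions, and other tools may choose different conventions.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element to plain Python data (assumed convention, see above)."""
    children = list(element)
    text = (element.text or "").strip()
    if not children and not element.attrib:
        return text  # leaf element: just its text, still a string

    # Attributes have no JSON equivalent, so they are stored under "@name" keys.
    node = {f"@{key}": value for key, value in element.attrib.items()}
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags collapse into a list.
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value
    if text:
        node["#text"] = text
    return node


# Hypothetical sample with attributes, to show where the mapping gets awkward.
document = """
<person id="p1">
  <name>Colin Mackay</name>
  <age unit="years">42</age>
  <languages>
    <language>TypeScript</language>
    <language>Python</language>
  </languages>
</person>
"""

root = ET.fromstring(document)
data = {root.tag: element_to_dict(root)}
print(json.dumps(data, indent=2))
```

Two lossy spots are visible in the result: age turns from a simple value into an object because it carries an attribute, and a languages element with a single child would produce a string rather than a one-item list. Converting back to XML therefore needs the same convention applied in reverse.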