SPEC 008 - Country Dataset Container
Status
- Status: Working specification of the current result
- Scope: docs/ontology/social/ds/country
- Dataset version: 0.1.0
1. Purpose
This specification describes the current country dataset package stored under:
docs/ontology/social/ds/country
The package combines:
- a container metadata graph in index.rdf
- a tabular source in CSV
- an RDF graph in payload/triples/nid.rdf, populated for all countries currently present in the source CSV
- a Frictionless Data Package descriptor in linkset/resources/datapackage.json
- local copies of the Container and Linkset ontologies as normative references
The goal is to keep the package layered:
- index.rdf describes the package and its internal documents
- the CSV preserves the source table
- nid.rdf carries the semantic country node plus the link back to the CSV
- datapackage.json formalizes the CSV schema
Within this directory, ds means dataset.
The broader design intent is not limited to one country example. The idea is to build a dataset from living documents and easily manipulable formats that, together, describe an entity as a whole while preserving multiple complementary views of the same subject:
- a semantic view
- a tabular or raw operational view
- a package/container view
- a schema view
In this sense, the package is not just "using" ISO 21597. It is extending the ISO 21597 container and linkset ideas toward a more general dataset-oriented application profile, suitable for datasets composed of heterogeneous but coordinated documents.
2. Physical Structure
docs/ontology/social/ds/
  country/
    index.rdf
    ontology/
      resources/
        Container.rdf
        Linkset.rdf
    payload/
      documents/
        country-identifier-iso3166-1-alpha-2-en.csv
      triples/
        nid.rdf
    linkset/
      resources/
        datapackage.json
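The layout above can be checked mechanically. The following is a minimal sketch, assuming the relative paths shown in the tree; the helper name missing_files is hypothetical and not part of the package.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this section.
EXPECTED_FILES = [
    "index.rdf",
    "ontology/resources/Container.rdf",
    "ontology/resources/Linkset.rdf",
    "payload/documents/country-identifier-iso3166-1-alpha-2-en.csv",
    "payload/triples/nid.rdf",
    "linkset/resources/datapackage.json",
]


def missing_files(package_root):
    """Return the expected relative paths that are absent under package_root."""
    root = Path(package_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_files("docs/ontology/social/ds/country") from the repository root would return an empty list when the package is complete.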
3. Main Artifacts
- index.rdf
  - package-level metadata graph
  - declares the internal CSV document and the internal RDF document
- payload/documents/country-identifier-iso3166-1-alpha-2-en.csv
  - source table with columns Name and Code
- payload/triples/nid.rdf
  - current semantic graph
  - currently contains modeled individuals for all countries present in the source CSV
- linkset/resources/datapackage.json
  - Frictionless descriptor for the CSV resource and its schema
- ontology/resources/Container.rdf
  - local reference copy of the ISO Container ontology
- ontology/resources/Linkset.rdf
  - local reference copy of the ISO Linkset ontology
4. Current Logical Model
4.1 Container Layer
index.rdf plays the role of container description.
It declares:
- one publisher individual: #OntoBDC
- one CSV internal document: #country-identifier
- one RDF internal document: #country-list
- one ct:ContainerDescription
It also records:
- publication metadata
- version metadata
- provenance with prov:wasDerivedFrom
- source distribution with dcat:downloadURL
4.2 Tabular Layer
The CSV file is the operational source dataset:
payload/documents/country-identifier-iso3166-1-alpha-2-en.csv
Current fields:
- Name
- Code
The current semantic mapping deliberately uses the logical key Code, not the physical row number.
4.3 RDF Layer
nid.rdf currently contains one modeled individual for each country present in the CSV source.
Representative example:
#BR
This resource is intentionally modeled as both:
- schema:Country
- ls:Directed1toNLink
This means the same stable IRI acts as:
- the semantic country node
- the mini-container that carries the outbound mapping to other resources
The current nid.rdf also types this resource as:
prov:Entity
This is conceptually coherent with the idea that the country node is a managed data entity inside the dataset.
4.4 Mapping Layer
The mapping is encoded inside nid.rdf using the Linkset vocabulary.
For #BR, the graph contains:
- one anonymous ls:LinkElement on the RDF side
- one anonymous ls:LinkElement on the CSV side
- one anonymous ls:URIBasedIdentifier for the RDF-side reference
- one anonymous ls:StringBasedIdentifier for the CSV-side reference
The CSV-side identifier currently uses:
- ls:identifierField = "Code"
- ls:identifier = "BR"
This is the current core design decision: the graph points to the source record through the logical key field, without duplicating Name or Code inside the semantic node.
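How a consumer might follow the CSV-side identifier back to the source record can be sketched as below. This is an illustration only: the class and function names are hypothetical, the three-row sample is not the real table, and the actual nid.rdf would have to be parsed with an RDF library to obtain the field/value pair.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class StringBasedIdentifier:
    """Mirrors ls:StringBasedIdentifier: a field name plus the value to match."""
    identifier_field: str  # value of ls:identifierField
    identifier: str        # value of ls:identifier


def resolve(identifier, csv_text):
    """Return the CSV rows whose identifier_field equals identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row.get(identifier.identifier_field) == identifier.identifier
    ]


# Hypothetical excerpt; the real table lives in the payload CSV.
SAMPLE_CSV = "Name,Code\nBelgium,BE\nBrazil,BR\nCanada,CA\n"

br_link = StringBasedIdentifier(identifier_field="Code", identifier="BR")
matches = resolve(br_link, SAMPLE_CSV)
print(matches)
```

Because the lookup returns a list, a consumer can treat zero or multiple matches as a data-quality signal rather than silently picking a row.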
5. Why It Was Modeled This Way
5.1 Keep Package Metadata Separate From Domain Data
index.rdf exists to describe:
- what the dataset package is
- which documents belong to it
- who publishes it
- where it came from
It does not need to carry country-level semantic mapping.
5.2 Keep Source Data In The Source File
The CSV is the source of truth for tabular values.
That is why the current nid.rdf does not duplicate:
- schema:name
- schema:identifier
- CSV field/value pairs such as Code = BR
Instead, it links back to the CSV using a logical field/value identifier.
5.3 Use The Country Node As The Stable Anchor
The current design intentionally uses the country resource itself as the stable reference:
- #BR a schema:Country
- #BR a ls:Directed1toNLink
This avoids scattering the model across extra named helper nodes such as #BR-map, #BR-rdf, or #BR-csv.
5.4 Prefer Logical Matching Over Physical Row Addressing
The current model uses:
- identifierField = Code
- identifier = BR
instead of:
- row 32
This is preferable because row numbers are fragile under sorting, filtering, insertion, and regeneration. The Code column is the intended stable semantic key.
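The fragility of row addressing can be illustrated with a small sketch. The sample rows are hypothetical and deliberately unsorted; the helper names are not part of the package.

```python
import csv
import io

# Hypothetical excerpt, not the real source table.
ORIGINAL = "Name,Code\nBrazil,BR\nBelgium,BE\nCanada,CA\n"


def load(csv_text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def row_of(rows, code):
    """1-based data-row position of the record whose Code matches."""
    return next(i for i, row in enumerate(rows, start=1) if row["Code"] == code)


def by_code(rows, code):
    """Record whose Code matches, independent of its position."""
    return next(row for row in rows if row["Code"] == code)


rows = load(ORIGINAL)
resorted = sorted(rows, key=lambda row: row["Name"])

position_before = row_of(rows, "BR")     # Brazil is the first data row
position_after = row_of(resorted, "BR")  # after sorting by Name it follows Belgium
same_record = by_code(rows, "BR") == by_code(resorted, "BR")
```

The row position changes from 1 to 2 after sorting, while the lookup by Code returns the same record in both cases.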
6. Advantages
- Lower redundancy: semantic RDF does not repeat data that already exists in the CSV.
- Clear anchoring: one IRI per country can act as both domain node and mapping anchor.
- Better resilience: logical identification by Code survives tabular reordering.
- Cleaner layering: container metadata, source data, semantic graph, and schema remain distinct.
- Incremental growth: the same pattern can be repeated for every country added to the source CSV later.
7. Trade-Offs
- Heavier semantics per node: the same resource is both schema:Country and ls:Directed1toNLink, which is concise but conceptually denser.
- Linkset verbosity: even the minimal Linkset pattern still requires LinkElement and identifier structures.
- Identifier indirection: the CSV link still depends on a field/value indirection instead of carrying a direct row address.
- Mixed concerns in one RDF file: nid.rdf currently carries both the country node and the cross-document mapping.
- Application-profile choices: some modeling decisions intentionally generalize ISO 21597 beyond its narrower document-container usage into a broader dataset profile.
8. Current datapackage.json Role
The Frictionless descriptor currently declares:
- one table resource
- relative path to the CSV file
- text/csv media type
- utf-8 encoding
- field schema:
  - Code: string
  - Name: string
The current descriptor also adds semantic typing at field level:
- Code -> rdfType = http://schema.org/identifier
- Name -> rdfType = http://schema.org/name
Its role is currently schema-oriented, not link-oriented.
In the current state of the model, nid.rdf does not directly point to datapackage.json.
An additional improvement worth keeping in scope is explicit language metadata for textual columns, especially the Name field, since the current file is effectively an English country label table.
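As an illustration of the schema-oriented role, the sketch below compares the field names declared in a descriptor with the header of a CSV. The descriptor shape is an assumption built only from the properties listed in this section; the resource name and the path value are placeholders, since the actual relative path is not stated here.

```python
import csv
import io

# Assumed descriptor shape. "name" and "path" are placeholders, not values
# taken from the real datapackage.json.
DESCRIPTOR = {
    "resources": [{
        "name": "country-identifier",
        "path": "country-identifier-iso3166-1-alpha-2-en.csv",
        "mediatype": "text/csv",
        "encoding": "utf-8",
        "schema": {"fields": [
            {"name": "Code", "type": "string",
             "rdfType": "http://schema.org/identifier"},
            {"name": "Name", "type": "string",
             "rdfType": "http://schema.org/name"},
        ]},
    }]
}


def header_mismatch(descriptor, csv_text):
    """Return (declared but absent from CSV, present in CSV but undeclared)."""
    fields = descriptor["resources"][0]["schema"]["fields"]
    declared = {field["name"] for field in fields}
    header = set(next(csv.reader(io.StringIO(csv_text))))
    return sorted(declared - header), sorted(header - declared)
```

Two empty lists mean the CSV header and the declared schema agree on field names; the check ignores column order on purpose.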
9. Normative Alignment
9.1 ISO 21597 Container
The package is structurally aligned with the Container vocabulary because index.rdf uses:
- ct:ContainerDescription
- ct:InternalDocument
- ct:containsDocument
- container-oriented document metadata such as filename, type, format, and version
Compliance level:
- Partial but meaningful alignment
Reason:
- the vocabulary is used consistently for package description
- the result is a project-specific profile that expands the container idea toward a broader dataset packaging use
9.2 ISO 21597 Linkset
The current mapping in nid.rdf uses:
- ls:Directed1toNLink
- ls:LinkElement
- ls:hasFromLinkElement
- ls:hasToLinkElement
- ls:hasDocument
- ls:hasIdentifier
- ls:URIBasedIdentifier
- ls:StringBasedIdentifier
Compliance level:
- Partial and structurally aligned
Reason:
- the current graph respects the basic modeling intent of document-to-element linking
- the CSV side now uses the vocabulary in a simpler and more normative way through field/value identification
- the current profile also pushes the linkset idea toward a generalized dataset entity model, where a semantic resource can simultaneously act as the stable anchor of a mini-container
9.3 Frictionless Data Package
datapackage.json is practically aligned with Frictionless because it declares:
- resource path
- format
- media type
- encoding
- field schema
- field-level rdfType annotations
Compliance level:
- Good practical alignment
Reason:
- the descriptor is minimal but useful
- it currently serves as schema metadata for the CSV rather than a full publication workflow
- it is also being used as a bridge between tabular structure and semantic interpretation
9.4 Schema.org
The RDF graph currently aligns with Schema.org in a minimal way through:
schema:Country
Compliance level:
- Minimal but valid alignment
Reason:
- the class use is appropriate
- the current graph intentionally avoids duplicating descriptive properties that already live in the CSV
9.5 Provenance And Distribution
The package also uses:
- prov:wasDerivedFrom
- prov:generatedAtTime
- dcat:downloadURL
This improves traceability of the package as a whole.
10. Current Gaps
- nid.rdf is now populated for all countries in the current CSV source, but the pattern still depends on the stability of that source table.
- nid.rdf still contains OWL-generated declaration noise that could be reduced later.
- The mapping currently assumes that Code is the stable key of the CSV resource.
- datapackage.json is present as schema metadata, but is not yet explicitly integrated into the RDF mapping.
- datapackage.json semantically types the fields with rdfType, but explicit language metadata for textual columns is still missing.
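The assumption that Code is a usable key can at least be checked for blank and duplicated values. The following is a minimal sketch with a hypothetical helper name; it does not test stability across regenerations of the source table, only the state of one CSV snapshot.

```python
import csv
import io
from collections import Counter


def key_problems(csv_text, key_field="Code"):
    """Return (number of blank keys, sorted list of duplicated key values)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Missing cells come back as None from DictReader, so normalize to "".
    values = [(row.get(key_field) or "").strip() for row in rows]
    blanks = sum(1 for value in values if not value)
    counts = Counter(value for value in values if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return blanks, duplicates
```

A result of (0, []) indicates that every row has a non-empty, unique Code in that snapshot.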
11. Recommended Next Steps
- Decide whether the OWL-generated declarations in nid.rdf should be kept or simplified.
- Decide whether the long-term design keeps domain data and link mappings in the same RDF file.
- Add explicit language metadata for the relevant textual fields in datapackage.json, especially the Name column.
12. Summary
The current result under docs/ontology/social/ds/country is a layered dataset package composed of:
- a container metadata graph in index.rdf
- a source CSV document
- a Frictionless schema descriptor
- an RDF graph in nid.rdf, populated for all countries currently present in the source CSV
The key current modeling choice is this:
- the country node itself is the stable semantic anchor
- the CSV remains the source of truth for tabular values
- the semantic mapping points back to the CSV logically through identifierField = Code and identifier = BR
Holistically, the package is meant to work as a dataset made of living documents in easy-to-handle formats, each one contributing a different but coordinated perspective over the same entity:
- the container perspective
- the semantic perspective
- the raw/tabular perspective
- the schema perspective
This produces a result that is:
- compact
- traceable
- low in redundancy
- standards-aligned at the vocabulary level
but still intentionally lightweight and project-specific.
More specifically, it can be understood as a generalization of ISO 21597 from a document-container pattern to a broader dataset pattern, where multiple coordinated documents describe a full entity without forcing all meanings into a single serialization.
Reference
Linkset.rdf defines the ontology used to represent links between documents and between elements within those documents.
Classes
- ls:Link: base link class; groups two or more ls:LinkElement instances.
- ls:BinaryLink: specialization of ls:Link with exactly 2 linked elements.
- ls:DirectedLink: link with semantic direction, separating source and target.
- ls:DirectedBinaryLink: directed link with exactly 1 source and 1 target.
- ls:Directed1toNLink: directed link with 1 source and multiple targets.
- ls:LinkElement: represents the "point" that participates in the link, usually pointing to a document and, optionally, to an internal identifier.
- ls:Identifier: abstract class for the mechanism used to identify an element within a document.
- ls:StringBasedIdentifier: string-based identifier.
- ls:QueryBasedIdentifier: identifier obtained through a query expression.
- ls:URIBasedIdentifier: URI-based identifier.
Object Properties
- ls:hasLinkElement: links a Link to its LinkElement instances.
- ls:hasFromLinkElement: subproperty of hasLinkElement used to indicate source.
- ls:hasToLinkElement: subproperty of hasLinkElement used to indicate target.
- ls:hasDocument: links a LinkElement to a ct:Document from the Container ontology.
- ls:hasIdentifier: links a LinkElement to an Identifier.
Datatype Properties
- ls:identifier: textual value of the identifier in StringBasedIdentifier.
- ls:identifierField: name of the field where this identifier should be looked up.
- ls:queryLanguage: language used by QueryBasedIdentifier.
- ls:queryExpression: query expression.
- ls:uri: URI used in URIBasedIdentifier.