Tsv 0.1.0, an old new data format for serialization/deserialization


#1

The tsv project introduces a new format for data serialization/deserialization, which is text-based and deals with tabular data.

The problem

From serde’s point of view, classic tsv only fits a schema that is (a sequence of) a struct composed of primitives (integers, floats, strings, etc.). The specification has to be extended to allow arbitrary schemas, such as a struct containing a struct.
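For illustration, here is roughly what classic tsv output looks like for a flat record. The `Person` type and `to_classic_tsv` function are hypothetical and not part of the tsv crate. Every field is a primitive, so each value becomes one row under a single header row. A `Vec<String>` field or a nested struct has no single cell to live in, which is the limitation described above.

```rust
/// A hypothetical flat record: every field is a primitive,
/// so it fits the classic single-header-row tsv layout.
struct Person {
    name: String,
    age: u32,
    score: f64,
}

/// Writes one header row, then one tab-separated row per record.
/// Tabs or newlines inside `name` are not escaped in this sketch.
fn to_classic_tsv(people: &[Person]) -> String {
    let mut out = String::from("name\tage\tscore\n");
    for p in people {
        out.push_str(&format!("{}\t{}\t{}\n", p.name, p.age, p.score));
    }
    out
}
```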

The solution

This project extends the spec by placing sequences in columns. See this file for the specification.
It uses the serde crate for serialization/deserialization, and the reflection crate for generating column names and handling enums.
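Generating column names can be pictured as walking the schema tree and emitting one column per leaf. The sketch below is a hypothetical stand-in, not the reflection crate's API: `Schema`, `join` and `column_paths` are made up for illustration, and a map is modelled as a `name` key column plus its value columns, mirroring the example table. The resulting paths correspond to the bottom header row of that table, with the enclosing field names stacked above it.

```rust
/// A hypothetical schema tree (not the reflection crate's real types):
/// either a leaf column or a named group of nested schemas.
enum Schema {
    Leaf(&'static str),
    Group(&'static str, Vec<Schema>),
}

/// Joins a parent path and a child name with '/', skipping an empty parent.
fn join(prefix: &str, name: &str) -> String {
    if prefix.is_empty() {
        name.to_string()
    } else {
        format!("{}/{}", prefix, name)
    }
}

/// Appends the full path of every leaf column to `out`, depth first,
/// which is the left-to-right order of the columns in the table.
fn column_paths(schema: &Schema, prefix: &str, out: &mut Vec<String>) {
    match schema {
        Schema::Leaf(name) => out.push(join(prefix, name)),
        Schema::Group(name, children) => {
            let path = join(prefix, name);
            for child in children {
                column_paths(child, &path, out);
            }
        }
    }
}
```

Called on a root group with an empty name that models `CargoTsv`, this yields paths such as `package/keyword` and `deps/value/Path`.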

Notice

If you implement Serialize/Deserialize for your own types so that serde treats them as sequences/maps, make sure their schemata() is isomorphic to Vec::schemata()/HashMap::schemata().
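To make the notice concrete, here is a minimal sketch under heavy assumptions: `Schema`, the `Schemata` trait and the `Tags` newtype are hypothetical stand-ins for what the Reflection derive provides, not its actual types. If `Tags` is serialized by hand as a sequence of strings, its schema should be the same as that of `Vec<String>`, and the simplest way to keep the two isomorphic is to delegate.

```rust
/// A hypothetical, tiny schema description (not the reflection crate's types).
#[derive(Debug, PartialEq)]
enum Schema {
    Str,
    Seq(Box<Schema>),
}

/// Hypothetical trait standing in for the generated `schemata()`.
trait Schemata {
    fn schemata() -> Schema;
}

impl Schemata for String {
    fn schemata() -> Schema {
        Schema::Str
    }
}

impl<T: Schemata> Schemata for Vec<T> {
    fn schemata() -> Schema {
        Schema::Seq(Box::new(T::schemata()))
    }
}

/// A newtype assumed to have a hand-written Serialize impl (not shown)
/// that emits a sequence of strings. Its schema must describe a sequence
/// too, so it delegates to Vec<String> instead of claiming to be one string.
struct Tags(Vec<String>);

impl Schemata for Tags {
    fn schemata() -> Schema {
        Vec::<String>::schemata()
    }
}
```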

Pros

  1. Simple.
    All end users need in order to work with tsv files is to understand what a table is. That makes it dead simple as a configuration file format for non-technical users.

  2. Available.
    You can use Microsoft Excel, OpenOffice/LibreOffice Calc, or text editors that support elastic tabstops to view and edit tsv files.
    And it is easy to write tsv by hand once you have read the 63-line spec.

Cons

  1. Not efficiency-oriented.

  2. Not self-describing.

License

Released under the MIT license.

Example

A Cargo configuration file written in tsv format could look like the following table (with spaces replacing tabs):

                                        deps
package                         lib             value
name    version authors keyword macro   name    Version Path
tsv     0.1.0   oooutlk tsv     X       serde   1.0
                        tab             trees           ~/trees
                        table
                        serde
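The data rows above follow a simple pattern: each leaf field owns one column, a sequence such as `keyword` grows downwards within its column, and shorter columns are left as empty cells. Here is a minimal sketch of that layout step, assuming every column has already been rendered to a list of cell strings. `layout_rows` is a hypothetical helper, not the tsv crate's API.

```rust
/// Turns a list of columns (each a list of cells, top to bottom) into
/// tab-separated rows. Columns shorter than the longest one are padded
/// with empty cells, so every row has the same number of fields.
fn layout_rows(columns: &[Vec<&str>]) -> String {
    let height = columns.iter().map(|c| c.len()).max().unwrap_or(0);
    let mut out = String::new();
    for row in 0..height {
        let cells: Vec<&str> = columns
            .iter()
            .map(|col| col.get(row).copied().unwrap_or(""))
            .collect();
        out.push_str(&cells.join("\t"));
        out.push('\n');
    }
    out
}
```

Fed the eight columns of the table with cells copied as they appear (for example `keyword` as `["tsv", "tab", "table", "serde"]` and the deps `Path` column as `["", "~/trees"]`), it produces four rows, the second being `\t\t\ttab\t\ttrees\t\t~/trees`.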

Serialization

// Serialize and Reflection come from the serde and reflection crates' derives,
// to_string and Config from the tsv crate; read_file (not shown) is assumed
// to read a whole file into a String.
use std::collections::BTreeMap;

#[derive(Serialize,Reflection)]
struct Package {
    name    : String,
    version : String,
    authors : Vec<String>,
    keyword : Vec<String>,
}

#[derive(Serialize,Reflection)]
struct Lib {
    // `macro` is a reserved word in Rust, so the field is renamed for the tsv column.
    #[serde(rename="macro")]
    proc_macro : bool,
}

#[derive(Serialize,Reflection)]
enum PkgSpecifier {
    Version( String ),
    Path( String ),
}

#[derive(Serialize,Reflection)]
struct CargoTsv {
    package : Package,
    lib     : Lib,
    deps    : BTreeMap<String,PkgSpecifier>,
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert( "serde".to_string(),
        PkgSpecifier::Version( "1.0".to_string() ));
    deps.insert( "trees".to_string(),
        PkgSpecifier::Path( "~/trees".to_string() ));

    let cargo_tsv = CargoTsv {
        package : Package {
            name    : "tsv".to_string(),
            version : "0.1.0".to_string(),
            authors : vec![ "oooutlk".to_string() ],
            keyword : vec![
                            "tsv".to_string(),
                            "tab".to_string(),
                            "table".to_string(),
                            "serde".to_string() ],
        },
        lib: Lib{ proc_macro: false },
        deps,
    };
    // Serialize with the default config and compare against the expected table.
    let result = to_string( &cargo_tsv, Config::default() ).unwrap();
    let expected = read_file( "test/cargo.tsv" ).unwrap();
    assert_eq!( result, expected );
}
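The `deps` column group in the table shows how a map with enum values is laid out: one row per entry, a `name` column for the key, and one column per `PkgSpecifier` variant, where only the active variant's cell is filled. The sketch below reproduces that mapping by hand for a local copy of the enum. `variant_cells` and `deps_columns` are hypothetical helpers, whereas the tsv crate relies on Reflection for this.

```rust
use std::collections::BTreeMap;

// A local copy of the example's enum, without the serde/Reflection derives.
enum PkgSpecifier {
    Version(String),
    Path(String),
}

/// Renders one enum value as one cell per variant, in declaration order:
/// the active variant's cell holds the payload, the others stay empty.
fn variant_cells(spec: &PkgSpecifier) -> [&str; 2] {
    match spec {
        PkgSpecifier::Version(v) => [v.as_str(), ""],
        PkgSpecifier::Path(p) => ["", p.as_str()],
    }
}

/// Builds the `name`, `Version` and `Path` columns for a map of dependencies.
/// BTreeMap iterates in key order, so rows come out sorted by name.
fn deps_columns(deps: &BTreeMap<String, PkgSpecifier>) -> [Vec<&str>; 3] {
    let mut columns: [Vec<&str>; 3] = [Vec::new(), Vec::new(), Vec::new()];
    for (name, spec) in deps {
        let [version, path] = variant_cells(spec);
        columns[0].push(name.as_str());
        columns[1].push(version);
        columns[2].push(path);
    }
    columns
}
```

For the two dependencies in the example, the three columns come out as `["serde", "trees"]`, `["1.0", ""]` and `["", "~/trees"]`, matching the table.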

Deserialization

// Deserialize and Reflection come from the serde and reflection crates' derives,
// from_str and Env from the tsv crate; read_file (not shown) is assumed
// to read a whole file into a String.
use std::collections::BTreeMap;

#[derive(Deserialize,Reflection,Debug,PartialEq)]
struct Package {
    name    : String,
    version : String,
    authors : Vec<String>,
    keyword : Vec<String>,
}

#[derive(Deserialize,Reflection,Debug,PartialEq)]
struct Lib {
    // `macro` is a reserved word in Rust, so the field is renamed for the tsv column.
    #[serde(rename="macro")]
    proc_macro : bool,
}

#[derive(Deserialize,Reflection,Debug,PartialEq)]
enum PkgSpecifier {
    Version( String ),
    Path( String ),
}

#[derive(Deserialize,Reflection,Debug,PartialEq)]
struct CargoTsv {
    package : Package,
    lib     : Lib,
    deps    : BTreeMap<String,PkgSpecifier>,
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert( "serde".to_string(),
        PkgSpecifier::Version( "1.0".to_string() ));
    deps.insert( "trees".to_string(),
        PkgSpecifier::Path( "~/trees".to_string() ));

    let expected = CargoTsv {
        package: Package {
            name    : "tsv".to_string(),
            version : "0.1.0".to_string(),
            authors : vec![ "oooutlk".to_string() ],
            keyword : vec![
                            "tsv".to_string(),
                            "tab".to_string(),
                            "table".to_string(),
                            "serde".to_string() ],
        },
        lib: Lib{ proc_macro: false },
        deps,
    };
    let input = read_file( "test/cargo.tsv" ).unwrap();
    let mut env = Env::default();
    // Pass the schemata generated by Reflection along with the input.
    let result: CargoTsv = from_str( &input,
        CargoTsv::schemata(), &mut env ).unwrap();
    assert_eq!( result, expected );
}
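Deserialization has to undo the layout: split each data row on tabs, pick a column, and gather a sequence's elements from top to bottom. Here is a minimal sketch with the hypothetical helper `read_column`, assuming header rows have already been skipped and that a sequence ends at the first empty cell in its column. This is a simplification, not the spec's exact rule, whereas the real `from_str` is handed the full `CargoTsv::schemata()`.

```rust
/// Collects the cells of column `index` from top to bottom, stopping at the
/// first empty or missing cell. Assumes a sequence's elements are stored
/// contiguously downwards in one column (a simplification of the spec).
fn read_column(body: &str, index: usize) -> Vec<String> {
    body.lines()
        .map(|line| line.split('\t').nth(index).unwrap_or(""))
        .take_while(|cell| !cell.is_empty())
        .map(str::to_string)
        .collect()
}
```

On the first four columns of the example's data rows, column 3 reads back as `["tsv", "tab", "table", "serde"]` and column 2 as `["oooutlk"]`.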

#2

I’d like to share some interesting things from the tsv implementation.

  1. Data formats that live “in one-dimensional space”, such as JSON or the various *ML formats, fit well with serde’s framework. Unfortunately tsv lives “in two-dimensional space”, so a user-defined stack has to bridge the gap between the framework and the client code.

  2. One feature that serde lacks but that is sometimes useful is generating the schema of a (de)serialized type. The tsv crate uses an auxiliary derive, Reflection, to accomplish this. However, making the two work together feels like juggling.

  3. Supporting enums can take much more effort than it seems.
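Point 1 can be illustrated with a heavily simplified sketch. Serde walks a value depth first, so a serializer sees a one-dimensional stream of calls. To place values in a two-dimensional table, a stack of enclosing field names can serve as the current column path, and a per-column counter as the next free row. The `Event` enum stands in for serde's `Serializer` callbacks and, like `place`, is hypothetical; it is not taken from the tsv crate.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-ins for the serializer callbacks
/// serde makes while walking a value depth first.
enum Event<'a> {
    Enter(&'a str),          // start of a field holding a nested struct
    Leave,                   // end of that nested struct
    Value(&'a str, &'a str), // a leaf field: (name, rendered cell)
}

/// Places each leaf value at (column path, row). The stack holds the names
/// of the enclosing fields, so joined together it is the column path;
/// `next_row` records how far down each column has grown so far.
fn place<'a>(events: &[Event<'a>]) -> Vec<(String, usize, String)> {
    let mut stack: Vec<&'a str> = Vec::new();
    let mut next_row: HashMap<String, usize> = HashMap::new();
    let mut cells = Vec::new();
    for event in events {
        match event {
            Event::Enter(name) => stack.push(*name),
            Event::Leave => {
                stack.pop();
            }
            Event::Value(name, cell) => {
                let mut path = stack.join("/");
                if !path.is_empty() {
                    path.push('/');
                }
                path.push_str(name);
                let row = next_row.entry(path.clone()).or_insert(0);
                cells.push((path, *row, cell.to_string()));
                *row += 1;
            }
        }
    }
    cells
}
```

Feeding it `Enter("package")`, `Value("name", "tsv")`, two `Value("keyword", …)` events and `Leave` places the second keyword at row 1 of the `package/keyword` column, below the first.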