Comprehensive guide on how to better name and organize your data

Best practices in file naming

Image: RDM file naming (Laure Perrier, University of Toronto)
  • Avoid spaces, dots, and special characters (such as &, ?, or !)
  • Use hyphens (-), underscores (_), or capitalization (CamelCase) to separate elements in a file name
  • Include an abbreviation in the file name to identify:
    • the instrument used
    • the phase (if research constitutes multiple phases)
    • the transformation phase (e.g., original, raw, compressed, digitized, recoded, restructured, cleaned)
    • the source of third-party data (data provider or principal investigator) (e.g., YinGiordanoAntarcticImages, STCCensus2016)
    • the team (if working with multiple teams)
    • the language (if working with multiple languages)
  • Include versioning within file names as appropriate
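The naming rules above can be applied mechanically. The Python sketch below is one possible approach, not part of the original guidance: the function names `clean_element` and `build_filename` are hypothetical, elements are written in CamelCase and joined with underscores, and the version number keeps its dot, as in the example names in this guide.

```python
import re


def clean_element(text: str) -> str:
    """Turn free text into one CamelCase element with no spaces or special characters."""
    # Split on anything that is not a letter or digit (spaces, dots, &, ?, ! ...).
    words = re.split(r"[^A-Za-z0-9]+", text)
    # Capitalize only the first letter of each word, so acronyms like "STC" survive.
    return "".join(w[:1].upper() + w[1:] for w in words if w)


def build_filename(elements, version: str, extension: str) -> str:
    """Join cleaned elements with underscores, then append the version and extension."""
    # Hypothetical rule for this sketch: versions look like 0.9 or 01.02.
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unexpected version format: {version!r}")
    parts = [p for p in (clean_element(e) for e in elements) if p]
    return "_".join(parts + [version]) + "." + extension.lstrip(".")
```

For example, `build_filename(["MCIM", "proposal"], "0.9", "doc")` returns `MCIM_Proposal_0.9.doc`, and `clean_element("STC census 2016")` returns `STCCensus2016`.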
     

Examples of good file names

  • MCIM_Proposal_0.9.doc
  • PressHouseUserManual-01.02.doc
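Existing names can also be checked against these conventions. The sketch below is illustrative only; `check_filename` is a hypothetical helper, and its rule set is an assumption paraphrasing the guidance: only letters, digits, hyphens, underscores, and dots are allowed, and because both example names above keep a dot in their version numbers, a dot before the extension is tolerated only between two digits.

```python
import re

# Assumed rules: anything other than letters, digits, ".", "_" and "-" is flagged,
# and a dot before the extension is accepted only between digits (a version number).
_BAD_CHARS = re.compile(r"[^A-Za-z0-9._-]")
_STRAY_DOT = re.compile(r"(?<!\d)\.|\.(?!\d)")


def check_filename(name: str) -> list[str]:
    """Return the problems found in `name`; an empty list means it passes."""
    problems = []
    stem, sep, _ = name.rpartition(".")
    if not sep:  # no extension at all
        stem = name
        problems.append("missing file extension")
    if " " in name:
        problems.append("contains spaces")
    bad = sorted(set(_BAD_CHARS.findall(name)) - {" "})
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if _STRAY_DOT.search(stem):
        problems.append("dot outside a version number")
    return problems
```

Both example names pass (`check_filename("MCIM_Proposal_0.9.doc")` returns an empty list), while a name such as `final report!.doc` is reported for its space and its `!`.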
     

File versioning

Versioning helps ensure that you are not working on outdated versions of files, documents, datasets, or records, particularly in collaborative work.

  • Include version information both in the file name and in the document itself
  • Use sequential numbering (e.g., 0.1, 0.2, 0.3, …) for drafts until a final version is reached
  • Number the final version 1.0. If the final version is revised, number the revisions 1.1, 1.2, 1.3, and so on, until version 2.0 is completed. Continue in this fashion.
  • Use version control software such as Git, or follow a version control chart (PDF) and create a version control table that records the version number, the person responsible for the change, the purpose or nature of the change, and the date of the change:
Version number   Author               Purpose/Change            Date
0.1              S. Smith, Post-Doc   Initial draft             01/01/2016
0.2              F. Hill, Post-Doc    Changes to conclusion     01/07/2016
0.3              G. Joe, PI           Changes to introduction   01/12/2016
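The numbering scheme and version control table above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tool the guide prescribes: `next_version`, `parse_version`, and `VersionLog` are hypothetical names, a "minor" change moves to the next draft or revision (0.1 to 0.2, 1.0 to 1.1), and a "major" change moves to the next whole version (0.3 to 1.0, 1.3 to 2.0).

```python
from dataclasses import dataclass, field
from datetime import date


def parse_version(version: str) -> tuple[int, int]:
    """Split "major.minor" into integers, so that 1.10 sorts after 1.9."""
    major, minor = version.split(".")
    return int(major), int(minor)


def next_version(current: str, change: str = "minor") -> str:
    """Return the version number that follows `current`.

    change="minor": 0.1 -> 0.2 (next draft) or 1.0 -> 1.1 (revision)
    change="major": 0.3 -> 1.0 (first final) or 1.3 -> 2.0 (next completed version)
    """
    major, minor = parse_version(current)
    if change == "minor":
        return f"{major}.{minor + 1}"
    if change == "major":
        return f"{major + 1}.0"
    raise ValueError(f"unknown change type: {change!r}")


@dataclass
class VersionLog:
    """An in-memory version control table: version, author, purpose, date."""

    rows: list = field(default_factory=list)

    def record(self, author: str, purpose: str, when: date, change: str = "minor") -> str:
        """Add a row, numbering the first entry 0.1 and later ones via next_version."""
        version = next_version(self.rows[-1][0], change) if self.rows else "0.1"
        self.rows.append((version, author, purpose, when.isoformat()))
        return version

    def as_text(self) -> str:
        """Render the table as aligned plain-text columns."""
        table = [("Version number", "Author", "Purpose/Change", "Date"), *self.rows]
        widths = [max(len(row[i]) for row in table) for i in range(4)]
        return "\n".join(
            "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
            for row in table
        )
```

Comparing versions as integer tuples rather than strings matters once a minor number reaches 10: as plain text, "1.10" would sort before "1.9". Dates are stored in ISO 8601 form (YYYY-MM-DD) in this sketch, which avoids the day/month ambiguity of a date written as 01/07/2016; that is a choice made here, not a requirement of the guide.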

Best practices in folder structure

Image: Folder organization (Laure Perrier, University of Toronto)
  • Restrict folder nesting to three or four levels deep
  • Consider limiting the number of subfolders within each folder to ten
  • Include a “documentation” folder within the folder structure. This might include:
    • Project proposals/protocols
    • Consent and approval forms
    • Methodology documents
    • Data management plan
    • Code used for recodes, analysis, and outputs
    • Readme files with transformation information
    • Readme files with the full names or titles for any abbreviations used in file names
    • Codebooks or guides
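The folder guidelines above can be checked automatically on an existing project. The sketch below uses only Python's standard `pathlib`; the function name `check_folders`, the default limits (four levels, ten subfolders), the way depth is counted, and the assumption that the documentation folder sits directly under the project root are all choices made for illustration.

```python
from pathlib import Path


def check_folders(root, max_depth=4, max_subfolders=10):
    """List folders that break the depth, breadth, or documentation guidelines.

    Depth counts levels below `root`: root/a is depth 1, root/a/b is depth 2.
    """
    root = Path(root)
    problems = []
    folders = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    for folder in folders:
        rel = folder.relative_to(root).as_posix()
        depth = len(folder.relative_to(root).parts)
        if depth > max_depth:
            problems.append(f"too deep ({depth} levels): {rel}")
        subfolders = [p for p in folder.iterdir() if p.is_dir()]
        if len(subfolders) > max_subfolders:
            problems.append(f"{len(subfolders)} subfolders in: {rel}")
    # Assumption: the documentation folder sits directly under the project root.
    if not (root / "documentation").is_dir():
        problems.append("no top-level 'documentation' folder")
    return problems
```

Running `check_folders("my_project")` on a hypothetical project folder returns an empty list when every guideline is met, and otherwise one message per offending folder.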

Preferred long-term file format qualities

The choice of file format affects long-term access to, and shareability of, your data. Once data analysis is complete using the software and formats best suited to the planned analysis, consider converting files into stable, open file formats for long-term storage.

  • Non-proprietary
  • Lossless
  • Standard for the field or in common usage by the research community
  • Includes data labels (metadata)
  • Adheres to an open, documented standard
    • Interoperable among diverse platforms and applications
    • Fully published and available royalty-free
    • Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
    • Developed and maintained by an open standards organization with a well-defined, inclusive process for evolution of the standard
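As one illustration of converting analysis output into a non-proprietary format that keeps its data labels, the sketch below writes records to plain UTF-8 CSV and writes the variable labels to a separate CSV codebook. It uses Python's standard `csv` module; the function name `export_with_codebook` and the two-column codebook layout (`variable`, `label`) are assumptions for illustration, not a format the guide specifies.

```python
import csv


def export_with_codebook(rows, labels, data_path, codebook_path):
    """Save records as plain CSV plus a CSV codebook of variable labels.

    rows   -- list of dicts, one per record (for example, output of an analysis)
    labels -- dict mapping each variable (column) name to a human-readable label
    """
    columns = list(labels)
    # Refuse to export any variable that has no label in the codebook.
    missing = {key for row in rows for key in row} - set(columns)
    if missing:
        raise ValueError(f"unlabelled variables: {sorted(missing)}")
    with open(data_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    with open(codebook_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["variable", "label"])
        writer.writerows(labels.items())
```

Keeping the labels in a separate codebook file means the data file stays a simple, widely readable table, while the codebook can also be filed in the project's documentation folder.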

Examples of preferred file formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV, POR
  • Geospatial: SHP (with SHX and DBF), DBF, GeoTIFF, NetCDF, e00
  • Moving Images: MKV, MPEG, AVI
  • Sounds: FLAC, WAVE, AIFF, MP3
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still Images: TIFF, PDF/A, PNG, GIF
  • Text: PDF/A, ASCII, TEI XML
  • Web Archive: WARC
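Before archiving, a project folder can be scanned for files whose extensions do not match the formats listed above. The sketch below is a rough first pass, not a format validator: the mapping from format names to extensions is an assumption (for example, ASCII is taken as `.txt`, and SAS as either `.sas7bdat` or `.xpt`), and an extension cannot tell you whether a `.pdf` is really PDF/A or an `.xml` file is really TEI XML. It also checks the listed rule that a SHP file travels with its SHX and DBF files. `PREFERRED_EXTENSIONS` and `audit_formats` are hypothetical names.

```python
from pathlib import Path

# Assumed mapping from the format names listed above to common file extensions.
PREFERRED_EXTENSIONS = {
    ".tar", ".gz", ".zip",                                    # Containers
    ".xml", ".csv", ".por",                                   # Databases
    ".shp", ".shx", ".dbf", ".tif", ".tiff", ".nc", ".e00",   # Geospatial
    ".mkv", ".mpg", ".mpeg", ".avi",                          # Moving images
    ".flac", ".wav", ".aif", ".aiff", ".mp3",                 # Sounds
    ".txt", ".dta", ".sas7bdat", ".xpt", ".sav",              # Statistics
    ".pdf", ".png", ".gif",                                   # Still images, text
    ".warc",                                                  # Web archive
}


def audit_formats(root):
    """Flag files with non-preferred extensions and shapefiles missing SHX/DBF parts."""
    root = Path(root)
    report = {"not_preferred": [], "incomplete_shapefile": []}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        suffix = path.suffix.lower()
        if suffix not in PREFERRED_EXTENSIONS:
            report["not_preferred"].append(rel)
        if suffix == ".shp":
            # A shapefile needs sibling .shx and .dbf files with the same base name.
            missing = [
                ext for ext in (".shx", ".dbf")
                if not any(path.with_suffix(e).exists() for e in (ext, ext.upper()))
            ]
            if missing:
                report["incomplete_shapefile"].append(f"{rel} (missing {', '.join(missing)})")
    return report
```

Files flagged as not preferred are candidates for conversion rather than errors; a format outside this list may still be the accepted standard in a particular field.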

Further reading

UBC Research Data Management. (2019). Organize: Set up conventions for your project, document them for all team members, and be consistent.

Library and Archives Canada. (2014). Advice on the digital file formats to be used when transferring information.
