So it looks like TreeBASE is in trouble, its legacy Java code a victim of security issues. Perhaps this is a chance to rethink TreeBASE, assuming that a repository of published phylogenies is still considered a worthwhile thing to have (and I think that question is open).

The data (individual studies with trees and data) are packaged into whatever format is easiest (NEXUS, XML, JSON) and uploaded to a repository such as Zenodo for long term storage. This becomes the default storage for TreeBASE.

The data are transformed into JSON and indexed using Elasticsearch. A simple web interface is placed on top so that people can easily find trees (never a strong point of the original TreeBASE). Trees are displayed natively on the web using SVG. The number one goal is for people to be able to find trees, view them, and download them.

To add data to TreeBASE the easiest way would be for people to upload them directly to Zenodo and tag them "treebase". A bot then grabs a feed of these datasets and adds them to the search engine described above. As time allows, add an interface where people upload data directly, it gets curated, then deposited in Zenodo. This presupposes that there are people available to do curation. Maybe have "stars" for the level of curation so that users know whether anyone has checked the data.

There are lots of details to tweak, for example how many of the existing URLs for studies are preserved (some URL mapping), and what about the API? And I'm unclear about the relationship with Dryad.

My sense is that the TreeBASE code is very much of its time (10-15 years ago), a monolithic block of code with SQL, Java, etc. If one was starting from scratch today I don't think this would be the obvious solution. Things have trended towards being simpler, with lots of building blocks now available in the cloud. Need a search engine? Just spin up a container in the cloud and you have one. More and more functionality can be devolved elsewhere.

Another issue is how to support TreeBASE. It has essentially been a volunteer effort to date, with little or no funding.

I realise that this is all wild arm waving, but maybe now is the time to reinvent TreeBASE?

Updates

One reason I like having Zenodo as a storage engine is that it takes care of long term sustainability of the data.

It's been a while since I've paid a lot of attention to phylogenetic databases, and it shows. There is a file-based storage system for phylogenies, phylesystem (see "Phylesystem: a git-based data store for community-curated phylogenetic estimates"), that is sort of what I had in mind, although long term persistence is based on GitHub rather than a repository such as Zenodo. Phylesystem uses a truly horrible-looking JSON transformation of NeXML (NeXML itself is ugly), and TreeBASE also supports NeXML, so some form of NeXML or a JSON transformation seems the obvious storage format. It will probably need some cleaning and simplification if it is to be indexed easily.

Is Naturalis no longer hosting Treebase? - Hilmar Lapp, May 10, 2022

Note to self (basically rewriting last year's Finding citations of specimens).

Bibliographic data supports going from identifier to citation string and back again, so we can do a "round trip."

Given a DOI we can get structured data with a simple HTTP fetch, then use a tool such as citation.js to convert that data into a human-readable string in a variety of formats (see "Citation.js: a format-independent, modular bibliography tool for the browser and command line").

Going in the reverse direction (string to identifier) is a little more challenging. In the "old days" a typical strategy was to attempt to parse the citation string into structured data (see AnyStyle for a nice example of this), then we could extract a tuple of (journal, volume, starting page) and use that to query CrossRef to find if there was an article with that tuple, which gave us the DOI.

Another strategy is to take all the citation strings for each DOI, index those in a search engine, then just use a simple search to find the best match to your citation string, and hence the DOI. This is what does.

At the moment my work on material citations (i.e., lists of specimens in taxonomic papers) is focussing on 1 (generating citations from specimen data in GBIF) and 2 (parsing citations into structured data).
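To make the bibliographic "round trip" a bit more concrete, here is a minimal Python sketch of both directions. It assumes DOI content negotiation with the CSL-JSON media type for identifier to structured data, and the Crossref REST API's `query.bibliographic` search for string to identifier. The helper names (`csl_request`, `format_citation`, `crossref_query_url`, `best_doi`) are made up for illustration, and `format_citation` is a crude stand-in for what citation.js does properly, not a real citation style.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"
CSL_JSON = "application/vnd.citationstyles.csl+json"


def csl_request(doi):
    """Identifier -> structured data: ask the DOI resolver for CSL-JSON."""
    url = "https://doi.org/" + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_json(request_or_url):
    """Perform the HTTP fetch (needs network access; not called below)."""
    with urllib.request.urlopen(request_or_url) as response:
        return json.load(response)


def format_citation(csl):
    """Structured data -> string, in a simple ad hoc author-year layout."""
    authors = ", ".join(
        " ".join(p for p in (a.get("family"), a.get("given")) if p)
        for a in csl.get("author", [])
    )
    year = csl.get("issued", {}).get("date-parts", [[None]])[0][0]
    parts = [
        f"{authors} ({year})" if year else authors,
        csl.get("title", ""),
        csl.get("container-title", ""),
        ":".join(str(csl[k]) for k in ("volume", "page") if k in csl),
    ]
    return ". ".join(p for p in parts if p) + "."


def crossref_query_url(citation, rows=3):
    """String -> identifier: build a free-text bibliographic search URL."""
    query = urllib.parse.urlencode({"query.bibliographic": citation, "rows": rows})
    return f"{CROSSREF_WORKS}?{query}"


def best_doi(response):
    """Pick the DOI of the highest-scoring hit, or None if there are no hits."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    return max(items, key=lambda item: item.get("score", 0)).get("DOI")
```

The round trip would then be something like `best_doi(fetch_json(crossref_query_url(format_citation(fetch_json(csl_request(doi))))))`, checking that the DOI you get back is the one you started with. In practice you would also want a score threshold, because a search always returns its best match even when that match is wrong.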
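Going back to the TreeBASE idea, the "bot grabs a feed of tagged datasets and adds them to the search engine" step might look something like the sketch below. It is only a sketch under assumptions: the record shape (`id`, `doi`, `metadata.title`, `metadata.keywords`, `files[].key`) is my guess at what a Zenodo-style record looks like and should be checked against the actual API, and the index name `treebase` and the function names are hypothetical. The output is newline-delimited JSON in the action-line-then-document layout that the Elasticsearch bulk endpoint accepts.

```python
import json


def is_tagged(record, tag="treebase"):
    """True if the record carries the tag among its keywords (case-insensitive)."""
    keywords = record.get("metadata", {}).get("keywords", [])
    return tag in (k.lower() for k in keywords)


def to_index_doc(record):
    """Flatten a Zenodo-style record (assumed shape) into a small search document."""
    meta = record.get("metadata", {})
    return {
        "id": str(record["id"]),
        "doi": record.get("doi"),
        "title": meta.get("title", ""),
        "keywords": meta.get("keywords", []),
        "files": [f.get("key") for f in record.get("files", [])],
    }


def bulk_payload(records, index="treebase"):
    """Build an NDJSON body for a bulk index request from the tagged records."""
    lines = []
    for record in records:
        if not is_tagged(record):
            continue  # ignore datasets that were not tagged for TreeBASE
        doc = to_index_doc(record)
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The payload would be POSTed to the search engine's `_bulk` endpoint with the content type `application/x-ndjson`. Using the repository's record id as `_id` means re-running the bot overwrites existing entries rather than duplicating them, which suits a bot that polls the same feed repeatedly.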