The most prevalent source of institutional references is the author affiliations found in scientific articles. Typically, authors of original research articles and conference proceedings annotate their papers with their department, organisation and address. Not only does this allow readers to recognise the origin of the research, but it also serves as a mechanism to aggregate and assess scientific output in order to provide science metrics.
However, when acquiring and processing large quantities of author affiliations, it becomes apparent that significant variation in their format and structure prevents effective aggregation and reporting. This, coupled with changes in institutional names and structures over time, makes large-scale integration of such data prohibitively expensive, given the manual effort required to properly disambiguate each affiliation.