Freiburg Open Science Portal for Industrial Ecology and Socio-Metabolic Research

Large Language Model Based Label Matching

Matching the labels of new datasets to already defined labels greatly increases the consistency of the database and makes the data much easier to find. It is therefore a core requirement for incoming datasets. Label matching can be a lot of work, however, which is why the IEDC offers this probabilistic, semantics-based label matching tool. It is built on the nomic-embed-text:v1.5 embedding model, run through the Ollama large language model runner. The tool uses LLM-generated embeddings and cosine similarity to identify the top matching labels based on meaning, not just keywords. Simply choose one of the IEDC's main classifications and enter a label to discover the semantically most similar labels in the IEDC database. Under the "List of Labels Search" option, you can upload a batch of labels in an xlsx template for automatic matching. The model supports English labels only, and user input is converted to lower case to yield better matches.
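The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the portal's actual implementation: the function names `cosine_similarity` and `top_matches` are hypothetical, and the three-dimensional vectors and example labels are invented for readability. In the real tool, the vectors would be embeddings produced by nomic-embed-text:v1.5.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def top_matches(query_vec, label_vecs, k=3):
    """Return the k (label, score) pairs with the highest cosine similarity."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in label_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors invented for illustration; they are NOT real model embeddings.
toy_label_vecs = {
    "steel": [0.9, 0.1, 0.0],
    "aluminium": [0.7, 0.3, 0.1],
    "timber": [0.1, 0.9, 0.2],
}

# Input is lower-cased before matching, as the description above states.
query = "Stainless Steel".lower()
toy_query_vec = [0.85, 0.15, 0.05]  # stand-in for the embedding of `query`

best = top_matches(toy_query_vec, toy_label_vecs, k=2)
for label, score in best:
    print(f"{label}: {score:.3f}")
```

Because cosine similarity compares the direction of two vectors rather than their length, labels with similar meaning can score highly even when they share no keywords with the query.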

1. Choose the classification name in which you would like to search for the label.

Classification ID    Classification Name
2                    regions_iso_iedc
4                    materials
6                    industry
7                    products
13                   building types
90                   components of products

1. Choose the classification name in which you would like to search for the list of labels. See example:

Classification ID    Classification Name
2                    regions_iso_iedc
4                    materials
6                    industry
7                    products
13                   building types
90                   components of products