Engineering projects can be complex, involving an extensive collection of data from diverse sources. This data, ranging from internal standards, technical specifications and design documents to project reports and emails, is mainly unstructured.
About 80-90% of all data in the world is unstructured – and that won’t change anytime soon. Unstructured data is not the enemy – it’s essential to our communication. But our issues lie in the disconnect between unstructured data and structured data, the latter being more usable by machines. Integrating the two is essential to take advantage of new technologies to transform our businesses.
The universe of data is not so monochrome. Floating in the vast divide between unstructured and structured data is semi-structured data, including formats such as JavaScript Object Notation (JSON) and Extensible Markup Language (XML).

Advancements in machine learning (ML), natural language processing (NLP) and document understanding have produced new software solutions that employ sophisticated algorithms and artificial intelligence to extract, transform, analyse and classify unstructured data. These solutions can extract information from PDFs, audio and video, then convert it into semi-structured formats that can be integrated with and used by existing databases and applications such as product lifecycle management systems. Once converted to semi-structured data, the information becomes more accessible to enterprise systems, advanced analytics and digital threading. The format’s flexibility allows applications and systems to evolve over time without requiring a rewrite.
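To make the conversion step concrete, here is a minimal sketch in Python, not a description of any particular product. It assumes the text has already been extracted from a PDF, and it uses simple pattern matching where a real solution would use trained ML and NLP models. The sample text, the function name to_records and the field names (clause, mandatory, quantities) are all hypothetical, chosen only for illustration.

```python
import json
import re

# Hypothetical sample: text assumed to be already extracted from a PDF.
SAMPLE_TEXT = """\
4.2.1 The pressure vessel shall withstand 150 psi at 200 F.
4.2.2 Welds must be inspected per the project inspection plan.
General note: drawings are revised quarterly.
"""

# A numbered clause such as "4.2.1" followed by its text.
CLAUSE = re.compile(r"^(?P<clause>\d+(?:\.\d+)*)\s+(?P<text>.+)$")
# A number followed by one of two illustrative units (an assumption,
# not a complete unit list).
QUANTITY = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>psi|F)\b")


def to_records(text):
    """Turn numbered clauses in free text into semi-structured records."""
    records = []
    for line in text.splitlines():
        match = CLAUSE.match(line.strip())
        if match is None:
            continue  # Lines without a clause number are skipped.
        body = match.group("text")
        record = {
            "clause": match.group("clause"),
            "text": body,
            "mandatory": bool(re.search(r"\b(shall|must)\b", body)),
        }
        quantities = [
            {"value": float(q.group("value")), "unit": q.group("unit")}
            for q in QUANTITY.finditer(body)
        ]
        # Optional field: present only when the clause contains quantities.
        if quantities:
            record["quantities"] = quantities
        records.append(record)
    return records


if __name__ == "__main__":
    print(json.dumps(to_records(SAMPLE_TEXT), indent=2))
```

The quantities field appears only on records that have quantities. That optional field is the kind of flexibility described above: a consuming system that ignores fields it does not recognise can keep working when new fields are added later.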
There is no one answer to what data is best. Decision-making relies on quantitative (structured) and qualitative (unstructured) data, and the creation of semi-structured data – through ML and NLP – is key to understanding qualitative data better. Integrating and finding harmony among these three types has many benefits, including significantly reducing the time engineers spend searching for and processing information, allowing them to focus on higher-value, core engineering work.
This accelerates project timelines and enhances the quality of work by reducing the likelihood of oversights and errors. With access to a unified data repository, teams can improve collaboration and information sharing and govern change management more easily, all of which are crucial in complex projects with multiple stakeholders and engineering disciplines.
Next-Gen solutions
As younger workers enter industry, they bring new skillsets often tied to more advanced technology. Command of our data enables us to take advantage of these technologies, speed up the onboarding of new generations and make them effective sooner.
But how should these benefits be realised? It’s dangerous to go alone – so don’t. It is crucial for businesses to partner with solution providers that align with their specific needs, workflows and, in some cases, industry.
A solution provider’s deep understanding of a specific industry can significantly amplify its value. This specialised knowledge allows the provider to offer tailored solutions that address common operational needs and solve industry-specific problems. Integrating various data sources is only the tip of the iceberg. Understanding the syntax, context and intent behind the data is pivotal to extracting the insights engineers need to benefit from it. Using technology similar to that of general providers, niche industry specialists can provide superior insights and semi-structured capabilities thanks to models trained specifically for their industry.
Remember, data comes in all shapes and sizes, each with specific benefits. Integrating them successfully will help avoid “garbage-in/garbage-out” pitfalls, enabling solutions that fit your exact needs.
This is the viewpoint of Michael Arnold, Executive Director, Product and Technology, Accuris. It first appeared in Advanced Manufacturing.
