CRAFT 🛠: Cross-domain Abstractive Summarization through Incremental Fine-Tuning

During my time at Northwestern University as an undergraduate researcher in Machine Learning and NLP, I was part of a class, CS 397: Natural Language Processing Seminar, where we had the opportunity to study Natural Language Processing innovations, witness the evolution of these approaches over the years, and partake in some groundbreaking research! Our class project revolved around developing a more flexible model for summarizing textual data across different domains.

Let’s tackle the problem!

The world is dominated by data, especially textual data. Effectively summarizing this overwhelming amount of information quickly and accurately is a notable task.
A common challenge is that certain summarization models perform well within specific domains, but their effectiveness degrades when dealing with a mix of different domains.
As the volume and diversity of textual data increases constantly, the need for a versatile summarizing model is increasingly evident.

To tackle these challenges, our team built an innovative solution.

CRAFTing a Solution

Our contribution, CRAFT (Cross-domain Abstractive Summarization through Incremental Fine-Tuning), is a model built to overcome the barriers of domain dependency in summarization.

The idea behind CRAFT is optimizing a pre-existing language model to be more effective across domains. We hypothesized that models could be “trained” using a mixed collection of similar types of texts - effectively adapting its capabilities to better suit the styles and structures usual to these texts.

Using this approach, coupled with incremental fine-tuning processes, the CRAFT model enhanced its comprehension of different styles and authoring traits in the supplied data - leading to a better summarization irrespective of the domain.

Testing showed that CRAFT was capable of handling a wide range of textual content - from news articles to scientific papers, from conversational dialogues to legislative documents - maintaining its consistency in summarization quality. Our findings and developments were put together in a paper we called “CRAFT: CRoss-domain Abstractive Summarization through Incremental Fine-Tuning”. Upon drafting our initial version of the research, we were keen to receive feedback from the academic faculty, and look forward to refining it further.

Paper: