Introducing the BERT Keyword Extractor: a Streamlit interface for KeyBERT! 🎈

Updated: Oct 1, 2021


🎲 Want to jump right in? Try the Streamlit app here!


Automatic keyword generation methods have been around for a while (TF-IDF, RAKE and YAKE!, just to name a few). All are widely implemented in Python and widely used in fields such as Information Retrieval, Text Mining and, of course, SEO!


Although techniques vary, they usually extract keywords and keyphrases from a document and assign each one a weight that signifies its importance within the document and the wider corpus.


What's KeyBERT?


While these methods are all valuable, the KeyBERT library goes a step further than most in terms of accuracy by leveraging BERT embeddings!


What also makes KeyBERT stand out from the crowd is how lightweight, powerful and versatile it is.


Lightweight because, unlike many other libraries, KeyBERT works very well on CPU-only setups. As a result, it can be used in a wide range of applications.


Powerful, as KeyBERT supports embeddings from leading NLP libraries, such as:

  • Flair

  • spaCy

  • Gensim

You can even select any sentence-transformers model and pass it through KeyBERT!

KeyBERT is also versatile, with a bazillion parameters to choose from. Here's a non-exhaustive list:

[Image: non-exhaustive list of KeyBERT parameters]
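
To give a flavour of what these parameters look like in code, here's a minimal sketch (not the app's actual source). The argument names top_n, keyphrase_ngram_range, stop_words, use_mmr and diversity are KeyBERT's own extract_keywords parameters. The helper names build_settings and extract_keywords_sketch are made up for illustration, and the 'distilbert-base-nli-mean-tokens' checkpoint is an assumption, since any sentence-transformers model name should work there.

```python
def build_settings(top_n=10, ngram_range=(1, 1), remove_stop_words=True,
                   use_mmr=False, diversity=0.5):
    """Map UI-style settings onto KeyBERT's extract_keywords arguments.

    Hypothetical helper: the defaults here are illustrative.
    """
    return {
        "top_n": top_n,                        # number of keywords to return
        "keyphrase_ngram_range": ngram_range,  # (min, max) words per keyphrase
        "stop_words": "english" if remove_stop_words else None,
        "use_mmr": use_mmr,                    # diversify results with MMR
        "diversity": diversity,                # only has an effect when use_mmr=True
    }


def extract_keywords_sketch(doc, model_name="distilbert-base-nli-mean-tokens",
                            **settings):
    """Run KeyBERT on a single document with the settings above."""
    # Imported here so build_settings can be used without KeyBERT installed.
    from keybert import KeyBERT

    kw_model = KeyBERT(model=model_name)  # assumed checkpoint, see note above
    return kw_model.extract_keywords(doc, **build_settings(**settings))
```

Calling extract_keywords_sketch(text, top_n=5, ngram_range=(1, 2), use_mmr=True, diversity=0.7) should return a list of (keyphrase, score) pairs, provided KeyBERT and the chosen model are installed.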

That being said, as with any elaborate library, that versatility comes with a trade-off: it can be cumbersome to choose the right embedding model and set of parameters when you want to iterate quickly through your use cases.


This is where Streamlit comes in handy!


Introducing the BERT Keyword Extractor! 🎈


With the BERT Keyword Extractor (BERT KE), I wanted to create a simple interface that puts the relevant parameters at your fingertips, so you can iterate in seconds and export your results!


🎲 Try the app here!


The BERT Keyword Extractor is currently in early beta with the following limitations:

  • 2 embedding models (DistilBERT and Flair)

  • Only the first 500 words are currently reviewed
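
If you're wondering what that 500-word cap means for your own text, here's a rough sketch. The post doesn't say how the app implements the cut-off, so splitting on whitespace is an assumption, and first_n_words is a hypothetical helper name.

```python
def first_n_words(text, limit=500):
    """Keep only the first `limit` whitespace-separated words of `text`."""
    words = text.split()  # splits on any run of whitespace
    return " ".join(words[:limit])


long_text = "keyword " * 600
print(len(first_n_words(long_text).split()))  # 500
```

Note that this collapses line breaks and repeated spaces into single spaces, which is harmless for keyword extraction.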

Once the app is deemed stable, I will add more models, more parameters and a higher word limit, so keep your eyes peeled!


Let's see what settings are currently available:


Choosing your model

At present, you can choose between two embedding models: DistilBERT, which is the default engine, and Flair. More to come soon!


Top N results

You can choose the number of results to display, between 1 and 30. The default is 10.


Min/Max Ngrams

You can choose the minimum and maximum values for the ngram range.


This sets the length, in words, of the resulting keywords/keyphrases.


  • To extract a set of single keywords only, set the ngram range to (1, 1)

  • To extract keyphrases, set the minimum ngram value to 2. The maximum ngram value can be set to 2 or higher, depending on the number of words you would like to see in each keyphrase.
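
To make the ngram range more concrete, here's a small pure-Python sketch of how candidate phrases change with the range. It's a simplification: as far as I can tell, KeyBERT delegates candidate generation to scikit-learn's CountVectorizer, which also handles tokenisation and stop words. The function name word_ngrams is hypothetical.

```python
def word_ngrams(text, ngram_range=(1, 1)):
    """List every run of consecutive words whose length falls in ngram_range."""
    min_n, max_n = ngram_range
    words = text.lower().split()
    candidates = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            candidates.append(" ".join(words[i:i + n]))
    return candidates


print(word_ngrams("supervised learning is fun", (1, 1)))
# ['supervised', 'learning', 'is', 'fun']
print(word_ngrams("supervised learning is fun", (2, 2)))
# ['supervised learning', 'learning is', 'is fun']
```

With (2, 3), you'd get the two-word phrases above plus the three-word ones, such as 'supervised learning is'.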


Check Stop Words

Tick this box to remove stop words from the document (currently English only).


Use MMR

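You can use Maximal Marginal Relevance (MMR) to diversify the results. MMR picks keywords/keyphrases that are similar to the document, based on cosine similarity, while penalising those that are too similar to ones already selected.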


Try high/low 'Diversity' settings for interesting variations.


Diversity

The higher the setting, the more diverse the keywords. Note that the 'Keyword diversity' slider only works if the 'MMR' checkbox is ticked.
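
To see how MMR and the diversity setting interact, here's a toy sketch in plain Python. It's a simplified take on the MMR idea, not KeyBERT's actual code: the two-dimensional vectors are invented stand-ins for real embeddings, and the function names cosine and mmr are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(doc_vec, candidates, top_n=2, diversity=0.5):
    """Pick top_n phrases, trading relevance to the document against redundancy."""
    relevance = {p: cosine(doc_vec, v) for p, v in candidates.items()}
    selected = [max(relevance, key=relevance.get)]  # start with the most relevant
    remaining = [p for p in candidates if p != selected[0]]
    while remaining and len(selected) < top_n:
        def score(p):
            # Redundancy: similarity to the closest phrase already chosen.
            redundancy = max(cosine(candidates[p], candidates[s]) for s in selected)
            return (1 - diversity) * relevance[p] - diversity * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


doc = [1.0, 1.0]  # made-up document embedding
phrases = {       # made-up candidate embeddings
    "keyword extraction": [1.0, 0.9],
    "keyword extractor": [1.0, 0.8],
    "streamlit app": [0.2, 1.0],
}
print(mmr(doc, phrases, diversity=0.1))  # ['keyword extraction', 'keyword extractor']
print(mmr(doc, phrases, diversity=0.7))  # ['keyword extraction', 'streamlit app']
```

At low diversity, the second pick is a near-duplicate of the first. At high diversity, the near-duplicate is penalised and a less similar phrase takes its place.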


Credits


Credit where credit is due: KeyBERT was created by the amazing Maarten Grootendorst.


Maarten writes insightful Data Science articles on Medium, and is also the creator of 2 other awesome Python libraries: BERTopic and PolyFuzz!


BERTopic is a topic modelling library with support for semi-supervised modelling and a built-in visualiser. Check out Koray’s excellent article for some SEO use cases!

PolyFuzz is a mighty fuzzy string-matching/string-grouping library. It has been my go-to tool for fuzzy matching for over a year now, and it’s bang on for SEO tasks!


It can be used for mapping keywords to URLs, site migrations & redirect management.


It’s also got some good momentum in the SEO community. Check out what Greg Bernhardt, SearchSolved's Lee Foot, and yours truly have been doing with it!

 
 
 


© Charly Wargnier - 2025
