Search notes:

Python library: spaCy

Example

Download the spacy small english model (apparently referred to as pipeline by spacy itself). This model is for CPU, not for GPU.
$ python3 -m spacy download en_core_web_sm
Then run this simple script:
import spacy

text = "Advanced engineering mathematics in Google located in USA is version abriged edition by Omkar Bhalerao"

text = """
spaCy (/speɪˈsiː/ spay-SEE) is an open-source software library for advanced
natural language processing, written in the programming languages Python and
Cython. The library is published under the MIT license and its main
developers are Matthew Honnibal and Ines Montani, the founders of the
software company Explosion. 

Unlike NLTK, which is widely used for teaching and research, spaCy focuses
on providing software for production usage. spaCy also supports deep
learning workflows that allow connecting statistical models trained by
popular machine learning libraries like TensorFlow, PyTorch or MXNet through
its own machine learning library Thinc. Using Thinc as its backend,
spaCy features convolutional neural network models for part-of-speech
tagging, dependency parsing, text categorization and named entity
recognition (NER). Prebuilt statistical neural network models to perform
these tasks are available for 23 languages, including English, Portuguese,
Spanish, Russian and Chinese, and there is also a multi-language NER model.
Additional support for tokenization for more than 65 languages allows users
to train custom models on their own datasets as well. 

History
-------

    Version 1.0 was released on October 19, 2016, and included preliminary
    support for deep learning workflows by supporting custom processing
    pipelines. It further included a rule matcher that supported entity
    annotations, and an officially documented training API.

    Version 2.0 was released on November 7, 2017, and introduced convolutional
    neural network models for 7 different languages. It also supported
    custom processing pipeline components and extension attributes, and
    featured a built-in trainable text classification component.

    Version 3.0 was released on February 1, 2021, and introduced
    state-of-the-art transformer-based pipelines. It also introduced a new
    configuration system and training workflow, as well as type hints and
    project templates. This version dropped support for Python 2.
"""

nlp = spacy.load('en_core_web_sm')
# print(type(nlp)) # spacy.lang.en.English

doc = nlp(text)
# print(type(doc)) # spacy.tokens.doc.Doc

for ent in doc.ents:
#   print(type(ent)) # spacy.tokens.span.Span
    print(f'{ent.label_:<8}: {text[ent.start_char : ent.end_char]}')
The script prints:
WORK_OF_ART: Cython
ORG        : MIT
PERSON     : Matthew Honnibal
PERSON     : Ines Montani
PERSON     : NLTK
PRODUCT    : TensorFlow
PERSON     : PyTorch
ORG        : MXNet
GPE        : Thinc
ORG        : Thinc
ORG        : NER
CARDINAL   : 23
LANGUAGE   : English
NORP       : Portuguese
LANGUAGE   : Spanish
NORP       : Russian
NORP       : Chinese
ORG        : NER
CARDINAL   : more than 65
CARDINAL   : 1.0
DATE       : October 19, 2016
ORG        : API
CARDINAL   : 2.0
DATE       : November 7, 2017
CARDINAL   : 7
CARDINAL   : 3.0
DATE       : February 1, 2021
CARDINAL   : 2

Explain «token types»

import spacy

print(spacy.explain('CARDINAL'   ))  # Numerals that do not fall under another type
print(spacy.explain('DATE'       ))  # Absolute or relative dates or periods
print(spacy.explain('GPE'        ))  # Countries, cities, states
print(spacy.explain('LANGUAGE'   ))  # Any named language
print(spacy.explain('NORP'       ))  # Nationalities or religious or political groups
print(spacy.explain('ORG'        ))  # Companies, agencies, institutions, etc.
print(spacy.explain('PERSON'     ))  # People, including fictional
print(spacy.explain('PRODUCT'    ))  # Objects, vehicles, foods, etc. (not services)
print(spacy.explain('WORK_OF_ART'))  # Titles of books, songs, etc.

See also

The NLTK library.

Index