venture tools

Startups clustering

Discover investment theses, cluster startups by product similarity, and simplify complex groupings

1. Export a Crunchbase CSV file, such as a list of funding rounds or a list of companies.

2. Upload the CSV to initiate NLP analysis. (Example input: tiger-rounds-21h1.csv)
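
To make the input concrete, here is a minimal sketch of pulling a name and a text field out of an exported CSV. The column names "Organization Name" and "Description" are assumptions for illustration; check the header row of your own export. The helper load_descriptions is hypothetical and not part of the tool.

```python
import csv
import io

def load_descriptions(csv_file, name_col="Organization Name", text_col="Description"):
    """Return (names, texts) for rows that have a non-empty text field.

    The default column names are assumptions; adjust them to match the export.
    """
    names, texts = [], []
    for row in csv.DictReader(csv_file):
        text = (row.get(text_col) or "").strip()
        if text:  # skip rows with nothing to analyze
            names.append((row.get(name_col) or "").strip())
            texts.append(text)
    return names, texts

# Tiny in-memory stand-in for an exported file such as tiger-rounds-21h1.csv.
sample = io.StringIO(
    "Organization Name,Description\n"
    "Acme Pay,Payments API for online marketplaces\n"
    "Blank Co,\n"
    "Beta Health,Telehealth platform for rural clinics\n"
)
names, texts = load_descriptions(sample)
```

With a real file, pass open("tiger-rounds-21h1.csv", newline="", encoding="utf-8") instead of the in-memory sample.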

How does it work?

This computes the similarity of each company's text field (company description, company industry) to that of every other company.

For N startups:
 1. Process N descriptions with NLP to produce N * 512 numbers (aka embeddings).
Done! Now load the output into TensorFlow Projector. In the previous edition, however, we went on to do this:
 2. Take the inner product of the embeddings to get an N * N correlation matrix (i.e. the correlation of each startup to every other startup).
 3. Use the correlation matrix as the 'edge weights' of a graph with startups as nodes, then solve for positions.
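
Steps 2 and 3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the tool's actual code: the embeddings are toy 4-dimensional vectors rather than 512-dimensional ones, the vectors are normalized first so that the inner product is a cosine similarity, and "solve for positions" is approximated by a simple gradient descent that tries to make the 2-D distance between two startups equal to 1 minus their similarity. The function names are hypothetical.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(embeddings):
    """Step 2: inner product of every pair of unit-length embeddings (N x N)."""
    unit = [normalize(v) for v in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def layout(sim, steps=200, lr=0.05, seed=0):
    """Step 3 (assumed method): 2-D positions where similar startups sit closer.

    Each pair (i, j) has a target distance of 1 - sim[i][j]; node i is nudged
    along the line to j to shrink the gap between actual and target distance.
    """
    rng = random.Random(seed)
    n = len(sim)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                target = 1.0 - sim[i][j]
                step = lr * (dist - target) / dist
                pos[i][0] -= step * dx
                pos[i][1] -= step * dy
    return pos

# Toy embeddings: the first two startups are similar, the third is not.
emb = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
sim = similarity_matrix(emb)
pos = layout(sim)
```

The matrix costs N * N inner products and the layout loops over every pair on each pass, which is part of why skipping these steps is less work.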

In the latest release, we do less work and point the TensorFlow Projector web application at the 512-dimensional embeddings. The resulting layout does not fit as well as the manually generated charts we had before, but it is more configurable and a great tool for browsing and learning.
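
The Projector web application can load tab-separated files: one file with a vector per line and an optional metadata file with one label per line. The sketch below writes both from a list of names and embeddings. The helper write_projector_tsv and the sample values are hypothetical, and the toy vectors have 2 dimensions instead of 512.

```python
import io

def write_projector_tsv(names, embeddings, vectors_out, metadata_out):
    """Write one tab-separated vector per line and one label per line.

    The metadata here is a single column, so it is written without a header row.
    """
    for name, vec in zip(names, embeddings):
        vectors_out.write("\t".join(f"{x:.6f}" for x in vec) + "\n")
        # Tabs and newlines inside a label would break the row alignment.
        metadata_out.write(name.replace("\t", " ").replace("\n", " ") + "\n")

# In-memory buffers stand in for vectors.tsv and metadata.tsv.
vectors_buf, metadata_buf = io.StringIO(), io.StringIO()
write_projector_tsv(
    ["Acme Pay", "Beta Health"],
    [[0.1, 0.2], [0.3, 0.4]],
    vectors_buf,
    metadata_buf,
)
```

To produce real files, pass file objects opened for writing in place of the buffers, then use the Projector's load option to select them.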