venture tools
Startups clustering
Discover investment theses, cluster startups by product similarity, and simplify complex groupings
1. Export a Crunchbase CSV file, such as a list of funding rounds or a list of companies
2. Upload the CSV to initiate NLP analysis. (Example input: tiger-rounds-21h1.csv)
How does it work?
This computes how similar each startup's text field (such as the company description or company industry) is to that of every other startup.
For N startups:
1. Process the N descriptions with NLP to produce N * 512 numbers (the embeddings: one 512-dimensional vector per startup)
Done! Now load the output into TensorFlow Projector. The previous edition, however, went on to do the following:
2. Take the inner product of the embeddings to get an N * N correlation matrix (i.e. the correlation of each startup to every other startup)
3. Use the correlation matrix as the 'edge weights' of a graph with startups as nodes, then solve for the node positions
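Steps 2 and 3 can be sketched in plain Python. This is only an illustration, not the tool's actual code: the tiny 3-dimensional vectors stand in for the 512-dimensional embeddings, and the names `similarity_matrix`, `edge_list`, and the `threshold` parameter are hypothetical. The vectors are scaled to unit length first, an assumption that makes each inner product a cosine similarity (what the steps above call the correlation).

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def similarity_matrix(embeddings):
    """Return the N x N matrix of inner products between unit-length embeddings."""
    unit = [normalize(v) for v in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def edge_list(names, matrix, threshold=0.5):
    """Turn the matrix into weighted graph edges, keeping pairs above a threshold.

    The threshold is an illustrative assumption; the source does not say
    whether weak edges were dropped.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] > threshold:
                edges.append((names[i], names[j], matrix[i][j]))
    return edges


# Toy data: made-up startups with 3-dimensional stand-in embeddings.
names = ["fintech_a", "fintech_b", "biotech_c"]
embeddings = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]

matrix = similarity_matrix(embeddings)
edges = edge_list(names, matrix)
print(edges)
```

The resulting weighted edges could then be handed to a force-directed layout routine to solve for node positions, as step 3 describes.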
In the latest release, we do less work and instead explore the 512-dimensional embeddings in the TensorFlow Projector web application. The fit is not as precise as in the manually generated charts we had before, but the Projector is more configurable and a great tool for browsing and learning.
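Loading embeddings into the Projector's web interface generally means supplying two tab-separated files: one with a vector per row and one with a matching label per row. A minimal sketch of writing those files follows. The helper `write_projector_files`, the file names, and the toy data are assumptions for illustration, and the export this tool actually produces may differ.

```python
import csv
import os
import tempfile


def write_projector_files(names, embeddings, out_dir):
    """Write vectors.tsv and metadata.tsv (hypothetical file names) into out_dir."""
    if len(names) != len(embeddings):
        raise ValueError("need exactly one name per embedding")

    vectors_path = os.path.join(out_dir, "vectors.tsv")
    metadata_path = os.path.join(out_dir, "metadata.tsv")

    # One embedding per line, values separated by tabs, no header row.
    with open(vectors_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(embeddings)

    # One label per line, in the same order as the vectors. A single-column
    # metadata file is written without a header row (assumed convention).
    with open(metadata_path, "w", newline="") as f:
        for name in names:
            # Tabs and newlines inside a label would break the row alignment.
            f.write(name.replace("\t", " ").replace("\n", " ") + "\n")

    return vectors_path, metadata_path


# Toy data: made-up startups with 3-dimensional stand-in embeddings.
names = ["fintech_a", "fintech_b", "biotech_c"]
embeddings = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]

out_dir = tempfile.mkdtemp()
vectors_path, metadata_path = write_projector_files(names, embeddings, out_dir)

with open(vectors_path) as f:
    vector_lines = f.read().splitlines()
with open(metadata_path) as f:
    metadata_lines = f.read().splitlines()
```

Keeping the two files in the same row order is what lets each point in the Projector be labeled with the right startup name.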