Hyper Collocation

What is Hyper Collocation?

Hyper Collocation is a search engine for finding example sentences in 811,761 English-language papers from arXiv. To surface typical phrasing, search results are sorted by collocation frequency. The tool is mainly intended to help with academic writing and can serve as an alternative to the discontinued Springer Exemplar.
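To illustrate what sorting by collocation frequency can mean, here is a minimal Python sketch. It is not the actual backend of Hyper Collocation: the function name `rank_collocations`, the `width` parameter, and the sample sentences are all hypothetical, and the ranking rule (count the words that follow the query) is only one plausible reading of the idea.

```python
from collections import Counter


def rank_collocations(sentences, query, width=1):
    """Count the `width` words that follow each occurrence of `query`
    and return (continuation, count) pairs, most frequent first.

    Hypothetical illustration only; not the site's real implementation.
    """
    counts = Counter()
    for sentence in sentences:
        start = 0
        while True:
            pos = sentence.find(query, start)  # exact, case-sensitive match
            if pos == -1:
                break
            following = sentence[pos + len(query):].split()
            if following:
                counts[" ".join(following[:width])] += 1
            start = pos + 1
    return counts.most_common()


# Made-up sample sentences, not taken from the arXiv corpus.
sample = [
    "we take into account the noise",
    "taking into account the delay",
    "take into account all terms",
]
print(rank_collocations(sample, "into account "))
# [('the', 2), ('all', 1)]
```

With this ordering, the most common continuation ("into account the") appears first, which is the kind of hint about typical phrasing that a frequency-sorted result list provides.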

I originally developed this tool for my personal use, but decided to make it public because an r4.large instance of Amazon Web Services (AWS) EC2 is too large for one person. Enjoy! 🙂

News

Migrated the server from AWS.
The maintenance cost has been greatly reduced (0.152 → ~0.018 USD / hour), and the backend has been rebuilt on a sharded full-text index, so new papers can now be added by rebuilding only the latest shard. The search corpus (811,761 papers) is unchanged. (2026/6/18)
Migrated from an r4.large instance to an r5.large instance.
The maintenance cost went down (0.16 → 0.152 USD / hour), and processing became slightly faster. (2018/9/29)

Tips

Queries are case-sensitive.
Queries are whitespace-sensitive.
For example, “word” and “␣word␣” (with surrounding spaces) produce different results.
[Math] in a snippet indicates that the original text had an equation on a separate line.
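The effect of case and whitespace sensitivity can be sketched with plain substring matching in Python. This is only an illustration under the assumption that a query is matched as an exact character sequence; the helper `count_matches` and the sample sentences are hypothetical and do not describe the site's actual index.

```python
def count_matches(sentences, query):
    """Count exact, non-overlapping occurrences of `query`.

    Matching is case-sensitive and whitespace-sensitive, because the
    query is compared character by character (hypothetical helper).
    """
    return sum(sentence.count(query) for sentence in sentences)


# Made-up sample sentences for illustration.
sample = [
    "The word is a keyword.",
    "Word order matters.",
]

print(count_matches(sample, "word"))    # 2: "word" and the tail of "keyword"
print(count_matches(sample, "Word"))    # 1: only the capitalized form
print(count_matches(sample, " word "))  # 1: only the standalone word
```

Surrounding a query with spaces is therefore a simple way to restrict results to whole words, while omitting them also matches the query inside longer words.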

Privacy Policy

This site uses no cookies and performs no third-party tracking such as analytics or advertising.
For operation and security, the web server records standard access logs (timestamp, requested URL, referrer, user agent, etc.). IP addresses are anonymized (last octet masked) before storage and deleted after at most 30 days. This processing relies on legitimate interests (GDPR Art. 6(1)(f)) to operate and protect the service.
The server is located in Germany (Hetzner Online GmbH, within the EU). We do not sell or share the collected information with third parties.
If you reside in the EU/EEA, you have the right to access, rectify, erase, or object to the processing of your data, and to lodge a complaint with a supervisory authority. For inquiries, contact ichiro.maruta@gmail.com.

Acknowledgments

Hyper Collocation depends on many open projects. While it is difficult to list the enormous number of projects on which it depends, I would especially like to acknowledge the Succinct Data Structure Library (SDSL), Pandoc, Crow, Vue.js, and arXiv Bulk Full-Text Access.

Hyper Collocation is developed and maintained by Ichiro Maruta.