What is Hyper Collocation?

Hyper Collocation is a search engine for finding example sentences in 811,761 English-language papers from arXiv. To make typical phrasing clear, the search results are sorted by the frequency of the collocation. This tool is mainly intended to help with academic writing and can be used as an alternative to Springer Exemplar (discontinued).
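As a rough illustration of what "sorted by the frequency of the collocation" means, the sketch below counts the words that follow a query in a tiny hand-made corpus and ranks them. It is not the site's actual implementation; the function name `rank_collocations`, the `width` parameter, and the sample sentences are all made up for this example.

```python
from collections import Counter


def rank_collocations(sentences, query, width=1):
    """Rank the text following `query` by how often it occurs.

    Hypothetical helper for illustration only: it scans plain strings,
    whereas a real service would use a full-text index.
    """
    counts = Counter()
    for sentence in sentences:
        start = sentence.find(query)
        while start != -1:
            # Take up to `width` whitespace-separated words after the match.
            following = sentence[start + len(query):].split()
            if following:
                counts[" ".join(following[:width])] += 1
            start = sentence.find(query, start + 1)
    # most_common() returns (phrase, count) pairs, most frequent first.
    return counts.most_common()


# Made-up sample sentences (not taken from arXiv).
corpus = [
    "We take into account the noise.",
    "This model takes into account the delay.",
    "Taking into account all terms, we obtain the bound.",
]

ranking = rank_collocations(corpus, "into account ")
for phrase, count in ranking:
    print(f"into account {phrase}: {count}")
```

In this toy corpus, "into account the" is listed before "into account all" because it occurs more often.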

I originally developed this tool for my personal use, but decided to make it public because an r4.large instance of Amazon Web Services EC2 is too big for personal use. Enjoy! 🙂

Usage Examples

News

  • Migrated the server from AWS.
    The maintenance cost is greatly reduced (0.152 → ~0.018 USD / hour), and the backend was rebuilt on a sharded full-text index, so new papers can now be added by rebuilding only the latest shard. The search corpus (811,761 papers) is unchanged. (2026/6/18)
  • Migrated from an r4.large instance to an r5.large instance.
    The maintenance cost became slightly lower (0.16 → 0.152 USD / hour), and processing became slightly faster. (2018/9/29)
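To put the hourly rates quoted above into perspective, the following sketch converts them to approximate monthly costs and relative savings. The 30-day month is an assumption, the label "current server" is a placeholder, and the latest rate is itself only approximate (~0.018 USD / hour), so the results are rough estimates.

```python
# Hourly rates in USD as quoted in the news items above.
# "current server" is a placeholder label; its rate is approximate.
RATES_USD_PER_HOUR = {
    "r4.large": 0.16,
    "r5.large": 0.152,
    "current server": 0.018,
}

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(rate_per_hour):
    """Approximate monthly cost in USD for a server that runs all the time."""
    return rate_per_hour * HOURS_PER_MONTH


def reduction_percent(old_rate, new_rate):
    """Relative cost reduction, in percent, when moving from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate * 100


for name, rate in RATES_USD_PER_HOUR.items():
    print(f"{name}: about {monthly_cost(rate):.2f} USD / month")

first_saving = reduction_percent(0.16, 0.152)
second_saving = reduction_percent(0.152, 0.018)
print(f"r4.large -> r5.large: about {first_saving:.1f}% cheaper")
print(f"r5.large -> current server: about {second_saving:.1f}% cheaper")
```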

Tips

  • Queries are case-sensitive.
  • Queries are whitespace-sensitive.
    So, “word” and “␣word␣” produce different results.
  • [Math] in a snippet indicates that there was an equation on a separate line.
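The first two tips can be illustrated with plain substring matching. The sketch below is not the site's search code; `count_occurrences` and the sample text are made up to show why case and surrounding spaces change the result.

```python
def count_occurrences(text, query):
    """Count exact (case- and whitespace-sensitive) matches of query in text."""
    count = 0
    start = text.find(query)
    while start != -1:
        count += 1
        start = text.find(query, start + 1)
    return count


# Made-up sample text for illustration.
text = "Keyword search finds a word, the word itself, and Word forms."

bare = count_occurrences(text, "word")         # also matches inside "Keyword"
spaced = count_occurrences(text, " word ")     # needs a space on both sides
capitalized = count_occurrences(text, "Word")  # a different query from "word"

print(bare, spaced, capitalized)
```

Here the bare query also matches inside "Keyword", while the query with surrounding spaces matches only the standalone word that has a space on both sides, so adding spaces is a simple way to restrict a search to whole words.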

Privacy Policy

  • This site uses no cookies and performs no third-party tracking such as analytics or advertising.
  • For operation and security, the web server records standard access logs (timestamp, requested URL, referrer, user agent, etc.). IP addresses are anonymized (last octet masked) before storage and deleted after at most 30 days. This processing relies on legitimate interests (GDPR Art. 6(1)(f)) to operate and protect the service.
  • The server is located in Germany (Hetzner Online GmbH, within the EU). We do not sell or share the collected information with third parties.
  • If you reside in the EU/EEA, you have the right to access, rectify, erase, or object to the processing of your data, and to lodge a complaint with a supervisory authority. For inquiries, contact ichiro.maruta@gmail.com.
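The log handling described above (masking the last octet and deleting entries after at most 30 days) could look roughly like the sketch below. This is an assumption about one possible approach, not the server's actual configuration; `mask_last_octet` and `is_expired` are hypothetical names, and only IPv4 addresses are handled.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention period stated in the policy


def mask_last_octet(ip_text):
    """Return an IPv4 address with its last octet set to zero.

    Hypothetical helper: raises ValueError for anything that is not IPv4.
    """
    address = ipaddress.IPv4Address(ip_text)
    # A /24 network keeps the first three octets and zeroes the last one.
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


def is_expired(logged_at, now):
    """True if a log entry is older than the retention period."""
    return now - logged_at > RETENTION


masked = mask_last_octet("203.0.113.57")  # address from a documentation range
print(masked)

now = datetime(2026, 7, 31, tzinfo=timezone.utc)
old_entry = datetime(2026, 6, 18, tzinfo=timezone.utc)
print(is_expired(old_entry, now))
```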

Acknowledgments

Hyper Collocation depends on many open projects. While it is difficult to list the enormous number of projects on which Hyper Collocation depends, I would especially like to acknowledge the Succinct Data Structure Library (SDSL), Pandoc, Crow, Vue.js, and arXiv Bulk Full-Text Access.

Hyper Collocation is developed and maintained by Ichiro Maruta.
