Show HN: ArXiv-txt, LLM-friendly ArXiv papers

(arxiv-txt.org)

20 points | by jerpint 1 day ago

4 comments

lgas 1 day ago
It just extracts the abstracts?
- jerpint 1 day ago
  For now, yes: abstracts and other metadata.
  rrekaf 21 hours ago
  do you plan on adding descriptions of figures and tables?
  jerpint 16 hours ago
  I'll probably focus on getting the full text out of the papers first; figures might be a good next step after that.
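For readers wondering what "abstracts and other metadata" as plain text could look like, here is a minimal Python sketch. It is not arxiv-txt's actual code or output format; the function name `paper_to_text`, the dict keys, and the sample values are all hypothetical placeholders chosen for illustration.

```python
from textwrap import fill


def paper_to_text(meta: dict) -> str:
    """Render paper metadata as plain text that is easy to paste into an LLM prompt.

    The expected keys (title, authors, arxiv_id, abstract) are an assumption
    made for this sketch, not a documented schema.
    """
    # Collapse newlines and repeated spaces in the abstract, then re-wrap it.
    abstract = " ".join(meta["abstract"].split())
    lines = [
        f"Title: {meta['title']}",
        f"Authors: {', '.join(meta['authors'])}",
        f"arXiv ID: {meta['arxiv_id']}",
        "",
        "Abstract:",
        fill(abstract, width=80),
    ]
    return "\n".join(lines)


# Placeholder record: these values are made up and do not describe a real paper.
sample = {
    "title": "Placeholder Title",
    "authors": ["A. Author", "B. Author"],
    "arxiv_id": "0000.00000",
    "abstract": "This is a placeholder\n   abstract with   uneven whitespace.",
}

print(paper_to_text(sample))
```

Putting the authors on their own labeled line makes a missing field easy to spot when reading the raw text.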
sbpost 1 day ago
The example you give doesn't seem to work - the raw txt does not have authors.
- jerpint 16 hours ago
  You're right, I hadn't noticed! It's fixed now, thanks for pointing it out.
jmartin2683 1 day ago
This would be awesome wrapped in an MCP server/tool call :)
- jerpint 1 day ago
  Whoa, I haven't played with MCP yet. Might be a good first project!
westurner 1 day ago
If you train an LLM on only formally verified code, it should not be expected to generate formally verified code.
Similarly, if you train an LLM on only published ScholarlyArticles [or only their abstracts], it should not be expected to generate publishable or true text.
Traceability for Retraction would be necessary to prevent lossy feedback.