141 points | by marcon680 3 days ago
[0] https://www.ycombinator.com/launches/Lbx-simplex-on-demand-p...
RE: workflows w/ complex forms and tabs -- do you have some sites that are good examples of this? We'd love to see how Simplex does.
Side-note: The comment for the frequency graph is wrong, it mentions stars instead.
One way to hack the scrolling to an element is to first run extract_bbox on a natural language description (in your case for GitHub it might be "follow button") then take the Y coordinate of that element and scroll that number of pixels. I just wrote this bit of code that I tested and it brings the contribution graph into full view:
simplex.goto("github.com/spro")
coords = simplex.extract_bbox("follow button")
simplex.scroll(coords[1])
simplex.wait(2000)
image = simplex.extract_image("green tile graph")
But then it incorrectly picks the code review/submissions/etc. graph as the green tile graph -- we'll look into it!
re: frequency graph typo -- just pushed a fix, thanks!
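The scroll hack above can be wrapped in a small helper. This is only a minimal sketch: it assumes the session object exposes `extract_bbox`, `scroll`, and `wait` exactly as they are used in the snippet, and that the second item of the returned coordinates is the Y position in pixels. `scroll_to_element` and the `FakeSession` stand-in are hypothetical names for illustration, not part of Simplex.

```python
def scroll_to_element(session, description, settle_ms=2000):
    """Scroll so that the element matching `description` comes into view.

    Assumes `session` has extract_bbox(), scroll(), and wait() behaving as in
    the snippet above; this helper itself is hypothetical, not a Simplex API.
    """
    coords = session.extract_bbox(description)
    y = coords[1]            # assumption: index 1 is the Y coordinate in pixels
    session.scroll(y)
    session.wait(settle_ms)  # give the page time to settle after scrolling
    return y


class FakeSession:
    """Stand-in that records calls so the helper can be tried offline."""

    def __init__(self, bbox):
        self.bbox = bbox
        self.calls = []

    def extract_bbox(self, description):
        self.calls.append(("extract_bbox", description))
        return self.bbox

    def scroll(self, pixels):
        self.calls.append(("scroll", pixels))

    def wait(self, ms):
        self.calls.append(("wait", ms))
```

With a real session you would call it as `scroll_to_element(simplex, "follow button")` before the `extract_image` step.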
simplex.goto("www.amazon.com")
simplex.wait(2000)
simplex.click('the search bar')
simplex.type("bicycle")
simplex.press_enter()
simplex.wait(1000)
image = simplex.extract_image("first 3 rows of results")
get error: simplex error processing error
Also, did you evaluate https://github.com/browser-use/browser-use by any chance and have any comments about it? I'm assuming it was too AI-heavy based on what you said about claude/etc?
Browser Use is another YC company. Probably the biggest difference is that they're more agent focused while we're lower level -- in the Claude Computer Use camp like you mentioned.
I hadn't heard of UI Vision but just took a look at it -- it also looks like a no-code solution that's a Chrome extension, so I'd say the main differences are the same as the differences w/ Skyvern -- we're lower level and meant to be used by developers.
I'd add that we're also able to directly extract parts of websites that have no official API -- for example, an image of the GitHub contribution graph like I show in the video demo.
1. A framework to use/control mobile phones via any LLM - https://github.com/BandarLabs/clickclickclick
simplex.goto("github.com/mitchellh")
num_contribs = simplex.extract_text("blabla")
print(num_contribs)
It outputs all the text from the page. Is that expected? Maybe it should fail and indicate that the element could not be found?
We currently don't return failure cases (just the closest match) -- but good suggestion! We'll fine-tune on some negative cases and see if we can catch them.
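Until failure cases are reported by the service, a caller can guard against a closest-match fallback on the client side. The sketch below is one possible approach, not anything Simplex provides: `require_match`, `NoMatchError`, and the `max_chars` cutoff are hypothetical, and it assumes the extracted value is a plain string.

```python
import re


class NoMatchError(ValueError):
    """Raised when extracted text does not look like what was asked for."""


def require_match(text, pattern, max_chars=200):
    """Return the first regex match in `text`, or raise NoMatchError.

    `max_chars` is an arbitrary cutoff: a very long result suggests the
    extractor fell back to returning the whole page.
    """
    if len(text) > max_chars:
        raise NoMatchError(f"result too long ({len(text)} chars), likely a fallback")
    found = re.search(pattern, text)
    if found is None:
        raise NoMatchError(f"no match for {pattern!r}")
    return found.group(0)
```

For the contribution count, the string returned by `extract_text` could be passed through `require_match(num_contribs, r"[\d,]+ contributions")` so that a page dump raises instead of being printed.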
I think you are onto something here.
Does it work with sites that have cloudflare antibot (or similar) functions?
The syntax is also intuitive, although site loading seems pretty slow. Maybe that's just the playground, and paid access is much faster.
Site loading is pretty slow right now due to a combination of 1. the traffic we're currently getting, 2. running a remote session, 3. running a few large vision language models, and 4. adding waits to allow pages to load/allow you to view your search results. We're working on cutting our latency significantly since it leads to a better development experience.