Is there a simple way to severely impede webscraping and LLM data collection of my website?
  • If LLMs were accurate, I could support this. But at this point there’s too much overtly incorrect information coming from LLMs.

    “Letting AI scrape your website is the best way to amplify your personal brand, and you should avoid robots.txt or use agent filtering to effectively market yourself. -ExtremeDullard”

    isn’t what you said, but is what an LLM will say you said.

  • US support for nuclear energy at record high, poll shows
  • This is hopeful, and we need more nuclear, but I have very serious questions about the methodology of this survey.

    The prior marks on the line graph indicate not all categories of response are represented, as they don’t add up to 100%. Then there is a sudden change over the last 4 years where the % supporting jumps to the mid 70s and all four periods add up to exactly 100%.

    This, to me, feels like a question change on or around 2021, or a methodology change that’s not clearly labeled, and casts doubt on the integrity of the research, especially given the generally modest level of knowledge about nuclear, which, according to my read of the article and survey details, doesn’t appear to have changed at any point.

  • Looking for some HVAC suggestions to better cool my stifling second floor
  • Just piling on at this point, but we made 2 changes last spring that made summer so much more tolerable in our house.

    1. More insulation. I bought a cheap thermal camera on Amazon and found entire closets and a bathroom with no insulation. Those rooms are a solid 10+ degrees cooler now.
    2. More ventilation. Half my house didn’t have any soffit vents, but had attic vents. Adding soffit vents made that half the house 5 degrees cooler all on its own.

    And we haven’t found ourselves needing it, but a mini split has popped up a lot here already and is a great idea.

  • Cloudflare took down our website after trying to force us to pay $120000 within 24h
  • I used to be in credit risk for a very large stock market company.

    Calling the bottom of the market is the same as betting big and getting 21 in blackjack.

    Super cool when it happens, but not skill. The number of grown men I had to hear crying because they were dollar cost averaging down to the bottom until they went broke still disturbs me.

    I’m happy this worked for you, but it was not skill.

  • NSFW
    XXX
  • Just looking at employers in my professional career. Two. One for 15 years then the current for 3.

    Looking at my direct and diagonal leaders, they seem to average 3-5 years a role, and I consider staying with my prior employer for so long a mistake. I made career progression and promotions there, but it still slowed me down vs changing employers.

  • Github vs. Email Aliases
  • Sure, self-hosting is a great option for very large projects, but a random Python library to help with an analytics workflow isn’t going to self-host. Those projects, along with 27,999,990 others, have chosen GitHub, oftentimes explicitly to reduce the barrier to contribution.

    Also, all of those examples are built on thousands of other FOSS projects, 99% of which aren’t self-hosting. This is the same as arguing only Amazon is a bookseller and ignoring the thousands of independent book publishers creating the books Amazon is selling.

  • Github vs. Email Aliases
  • GitHub has 28 million public repos

    GitLab has more than an order of magnitude fewer: under a million in 2020, and nearly 80% of those without a FOSS license.

    Is it everyone’s favorite, or best, or most feature rich? Nah. Is it where the FOSS projects are? Yes.

  • Texas ban on university diversity efforts provides a glimpse of the future across GOP-led states
  • This is what republicans have always done well. They organize locally, take over school boards and city councils, drive the change they want to see in local communities, and build support locally to drive voter turnout nationally.

    We don’t see democrats crashing school board and city council meetings, participating in local politics en masse to drive local change with near the same effectiveness as republicans, and it leads to underwhelming participation in national elections, as the left sits around wondering “what has the DNC done for me”.

  • Are there any genuine benefits to AI?
  • Lots of boring applications that are beneficial in focused use cases.

    Computer vision is great for optical character recognition, think scanning documents to digitize them, depositing checks from your phone, etc. Also some good computer vision use cases for scanning plants to see what they are, facial recognition for labeling the photos in your phone etc…

    Also some decent opportunities in medical research with protein analysis for development of medicine, and (again) computer vision to detect cancerous cells, read X-rays and MRIs.

    Today all the hype is about generative AI with content creation which is enabled with Transformer technology, but it’s basically just version 2 (or maybe more) of Recurrent Neural Networks, or RNNs. Back in 2015 I remember this essay, The Unreasonable Effectiveness of RNNs being just as novel and exciting as ChatGPT.

    We’re still burdened with this comment from the first paragraph, though.

    Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense.

    This will likely be a very difficult chasm to cross, because there is a lot more to human knowledge than thinking of the next letter in a word or the next word in a sentence. We have knowledge domains where, as an individual we may be brilliant, and others where we may be ignorant. Generative AI is trying to become a genius in all areas at once, and finds itself borrowing “knowledge” from Shakespearean literature to answer questions about modern philosophy because the order of the words in the sentences is roughly similar given a noun it used 200 words ago.

    Enter Tiny Language Models. Using the technology from large language models, but hyper focused on writing children’s stories, they appear to be making progress through specialization, and could allow generative AI to stay focused and stop sounding incoherent when the details matter.

    This is relatively full circle in my opinion, RNNs were designed to solve one problem well, then they unexpectedly generalized well, and the hunt was on for the premier generalized model. That hunt advanced the technology by enormous amounts, and now that technology is being used in Tiny Models, which is again looking to solve specific use cases extraordinarily well.

    Still very TBD to see what use cases can be identified that add value, but recent advancements do seem ripe to transition gen AI from a novelty to something truly game changing.

  • Elmo wrote a simple tweet that revealed widespread existential dread. Now, the president has weighed in.
  • Been looking at therapists for my teenage daughter, she’s been debating therapy for a couple of years and has recently fully committed.

    We have good insurance and are financially secure, and holy shit it’s still going to cost an extraordinary amount. I don’t understand how anyone struggling with financial insecurity could even consider having access to therapy as an option.

    What a fundamentally broken system, there is not a single type of care that exists that is accessible to the people who need it.

  • Where to start? Text Extraction
  • Yeah, model training is hard. Like capital H HARD. You need a bunch of data and it needs to be high quality.

    New York is the financial center of USA, so separating finance jobs from job postings written by someone using New England vernacular is a step you need to go through to make sure your data is high enough quality.

    So if you are just starting, use the 20 newsgroups dataset in those links, it’s pretty good data with a ton of resources written about it. It’s not fun data, but it isn’t as likely to fall victim to biases in data you aren’t expecting.

  • Where to start? Text Extraction
  • Couple of options to start out with, Topic Labeling and Topic Extraction.

    • Topic Labeling is a classic example of supervised learning, or using ML with training data to classify new observations based on patterns found in training data.

    • Topic Extraction is a classic example of unsupervised learning, or attempting to identify patterns without training data.

    I’m going to start with labeling, or classification, here. There are plenty of tools to train a model to classify text into categories; I’d recommend starting with this scikit-learn tutorial to see what’s involved before you start.

    With any classification problem, you need good training data. You mentioned you’ve scraped 400 job postings, and I’m assuming you would want to use the job description to predict the job title. Some quick math: you’ll want to withhold 30% of your data to test your model, so that leaves 280 postings to train. I would recommend at least 100 descriptions per job title, so if you have 2-3 job titles, perfect, you’re ready to follow that tutorial with your own data!
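
    A minimal sketch of that setup, assuming scikit-learn is installed. The postings below are made-up stand-ins for the scraped data, `train_title_classifier` is a hypothetical helper name, and the TF-IDF plus logistic regression pairing is just one reasonable choice, not necessarily what the tutorial uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_title_classifier(descriptions, titles, test_size=0.3, seed=0):
    """Fit a TF-IDF + logistic regression model and score it on held-out data."""
    # Withhold 30% for testing; stratify so each title keeps its share in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        descriptions, titles, test_size=test_size, stratify=titles, random_state=seed
    )
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    return model, accuracy, len(X_train), len(X_test)


# Hypothetical toy postings; real data would be the scraped descriptions and titles.
analyst = [f"Build dashboards and SQL reports for team {i}" for i in range(10)]
nurse = [f"Provide patient care and chart vitals on ward {i}" for i in range(10)]
descriptions = analyst + nurse
titles = ["data analyst"] * 10 + ["nurse"] * 10

model, accuracy, n_train, n_test = train_title_classifier(descriptions, titles)
print(f"train={n_train} test={n_test} accuracy={accuracy:.2f}")
print(model.predict(["Write SQL queries and build dashboards"]))
```

    With real postings the held-out accuracy is the number to watch; a toy set this clean will say very little about how the model handles messy descriptions.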

    If you have more than that, you probably won’t be able to do labeling/classification here, and will instead want to do topic extraction, where you’ll throw your walls of text at the machine and let the machine tell you the patterns it finds.
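
    The quick math above can be turned into a rough check. This is only a sketch: `choose_approach` is a hypothetical helper, and it treats the 100-per-title figure as a hard cutoff, which is stricter than a rule of thumb really deserves.

```python
from collections import Counter


def choose_approach(titles, holdout_pct=30, min_per_title=100):
    """Suggest 'classification' only if every title keeps enough training rows."""
    counts = Counter(titles)
    for title, n in counts.items():
        # Integer math: rows left for training after withholding holdout_pct percent.
        n_train = n * (100 - holdout_pct) // 100
        if n_train < min_per_title:
            return "topic extraction"
    return "classification"


# 400 postings over 2 titles: 200 each, 140 left per title for training.
two_titles = ["analyst"] * 200 + ["nurse"] * 200
# 400 postings over 4 titles: 100 each, only 70 left per title for training.
four_titles = ["analyst"] * 100 + ["nurse"] * 100 + ["chef"] * 100 + ["driver"] * 100

print(choose_approach(two_titles))
print(choose_approach(four_titles))
```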

    Topic modeling with spaCy and sci-kit learn is a great overview of this process, and plugging your own data in is pretty straightforward.

    Both of these examples don’t even really scratch the surface of what’s possible with text based ML these days, but are perfectly viable tools to run quickly and on commodity hardware.

  • Owl of the Year Match 18 - Screech Owl vs Northern Pygmy Owl
  • Our friendly neighborhood screech owls used to meet up on our neighbor’s basketball goal the summer of 2020. Was a great thing to look forward to every night at dusk, so we built an owl box for them that squirrels have taken over.

  • Americans say money can buy happiness. Here's their price tag.
  • I thought the millennial aspirations were a bit extreme, but as a millennial I get it. We had the Great Recession, outrageous prices for college, home prices are out of control.

    And I say this as a millennial doing well. We don’t even think about money day to day or paycheck to paycheck, and are saving enough to largely minimize or potentially mitigate our kids needing student loans. But I am still strategically thinking about money and what will happen when the next recession or financial calamity hits, or hyper-inflation wiping us out.

    The cost to live has been trying to outrun our income our entire adult lives, so sure, fuck it, double our income then maybe we have a chance to sleep at night even when it’s going well.

  • Gen Z And The Great Office Debate Won’t End In 2023
  • The study Forbes referenced appears to be essentially “how to design offices for gen z”, presuming they really want to use an office.

    The tips to drive virtual engagement are pretty standard management material at this point.

    Would have liked to see some real evidence for “boomerang” being philosophical; that felt like a cheap misuse of the term to seem more relevant than “what kind of games should be in the break room”.

  • coolkicks @lemmy.world
    Posts 0
    Comments 27