How to parse a string in C++?

I have a string that has a ton of random characters (pulled from a webpage). I want to pull all of the numbers, commas, hyphens, periods, and percentages from the page and put them into a text file.

I'm new to C++ so I'm not sure how I'd go about doing this. I'm pretty sure I can write to a text file, but pulling the numbers and those symbols has been a bit of a struggle.

How should I do this? Please don't just link me to a help site, I've scoured the internet already. :)

Thank you in advance!

2 Answers

  • ?
    Lv 7
    10 years ago
    Favorite Answer

    Since all you need are individual characters (not context-dependent multicharacter tokens, for example), there is nothing special to do. Simply iterate through the string character by character and write any character that matches your requirements to the output file:

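    #include <iostream>
    #include <fstream>
    #include <string>
    #include <cctype>
    #include <boost/asio.hpp>

    int main()
    {
         // Open a TCP stream to the web server (requires Boost.Asio).
         boost::asio::ip::tcp::iostream s("www.google.com", "http");
         if(!s)
         {
             std::cout << "Could not connect to www.google.com\n";
             return 1; // nothing to read, so stop here
         }

         // Send a minimal HTTP request for the front page.
         s << "GET / HTTP/1.0\r\n"
           << "Host: www.google.com\r\n"
           << "Accept: */*\r\n"
           << "Connection: close\r\n\r\n";

         std::ofstream out("output.txt");

         // Copy only digits and the wanted punctuation to the output file.
         for(char c; s.get(c); )
         {
              // The cast avoids undefined behavior in std::isdigit for negative char values.
              if(std::isdigit(static_cast<unsigned char>(c)) || c == ',' || c == '-' || c == '%' || c == '.')
                   out.put(c);
         }
    }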

    test (without webpage access) https://ideone.com/0qgam

    If you need something more complicated, there are several options. Formatted input (stream >> variable and getline(stream, string, delimiter)) can help with breaking the text into individual multicharacter tokens. Regular expressions (#include <regex>) can deal with many context-dependent tasks. Full-fledged parser libraries such as Boost.Spirit can deal with arbitrarily complex grammars.

  • 10 years ago

    mmk...

    If you want to take things out of it (numbers, commas, hyphens, etc.), do this:

    I assume that you have all of the text in a single string (named source) for this.

    #include <string>
    #include <vector>
    using namespace std;

    // Put the statements below inside a function where source is in scope.
    string result;
    vector<char> result_pre_parse; // collects the characters to keep
    const char* source_array = source.c_str(); // pointer to the characters of source

    for(string::size_type i = 0; i < source.size(); i++)
    {
        switch(source_array[i])
        {
        case ',': // list all of your characters like so, without breaks in between
        case '.':
        case '1':
        case '2': // etc....
            result_pre_parse.push_back(source_array[i]);
            break;
        default: // skip everything else
            break;
        } // switch
    } // for

    // Copy the collected characters into the result string.
    result.assign(result_pre_parse.begin(), result_pre_parse.end());

    /* do something with the result string, like write to a text file */