Data Mining: An Overview

1. Introduction.

Every organization collects large amounts of data from a variety of sources on a daily basis. Data mining is an iterative process of creating predictive and descriptive models by uncovering previously unknown trends and patterns in vast amounts of data from across the enterprise, in order to support decision making. Text mining applies the same techniques to analyze text documents. The knowledge gained from data and text mining can be used to fuel strategic decision making.

In the last ten years a number of knowledge discovery systems have been created that detect hidden structure in data in the form of functional dependencies between attributes and formulate them as mathematical equations or other symbolic rules. The most developed of these systems address complex and diverse problems: they systematically search for equations that fit the data, use error analysis to evaluate the statistical significance of the results, and discover empirical laws in the form of functional programs built from standard and user-defined functional primitives. Although the systems that explore numerical dependencies in data use different knowledge representation formalisms and search techniques, they face the same set of difficulties in their approach.

Traditional document and text management tools are not adequate for these needs. Document management systems work well with homogeneous collections of documents, but not with the heterogeneous mixture that knowledge workers face every day. Even the best Internet search tools suffer from poor precision and recall.

2. An Architecture for Data Mining

To apply these advanced techniques well, they must be fully integrated with a data warehouse and with flexible, interactive business analysis tools. Many data mining tools currently operate outside the warehouse, which requires extra steps for extracting, importing, and analyzing the data. Moreover, when new findings require operational implementation, integration with the warehouse simplifies the application of data mining results. The resulting analytical data warehouse can be used to improve business processes across the organization in areas such as promotional campaign management, fraud detection, and new product introductions. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.

Figure 1 – Integrated Data Mining Architecture

The ideal starting point is a data warehouse that combines internal data tracking all customer contact with external market data on competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems (Sybase, Oracle, Red Brick, and so on) and should be optimized for flexible and fast data access.

An OLAP (On-Line Analytical Processing) server allows a more sophisticated end-user business model to be applied when navigating the data warehouse. Its multidimensional structures let users analyze the data the way they want to view their business: summarized by product line, region, and other key perspectives. The data mining server must be integrated with the data warehouse and the OLAP server so that ROI-focused business analysis is embedded directly in this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues such as campaign management, prospecting, and promotion optimization. Integration with the data warehouse allows operational decisions to be implemented and tracked directly. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.
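The summarization "by product line, region, and other perspectives" described above can be sketched in a few lines. This is only an illustration of the roll-up idea, not how any particular OLAP server works; the `SALES` rows and the `roll_up` helper are hypothetical names invented for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product_line, region, revenue). Illustrative data only.
SALES = [
    ("laptops", "north", 1200.0),
    ("laptops", "south", 800.0),
    ("phones", "north", 500.0),
    ("phones", "south", 700.0),
    ("phones", "north", 300.0),
]


def roll_up(rows, dimensions):
    """Sum revenue over the chosen dimensions (0 = product line, 1 = region)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[2]
    return dict(totals)


by_product = roll_up(SALES, dimensions=(0,))
by_product_and_region = roll_up(SALES, dimensions=(0, 1))
```

Passing a different tuple of dimensions gives a different view of the same facts, which is the essence of the multidimensional analysis the paragraph describes.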

2.1. The scope of data mining

Data mining derives its name from the similarity between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an enormous amount of material or intelligently probing it to find exactly where the value lies. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

2.2. Capabilities:

  • Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered quickly and directly from the data. A typical example of a predictive problem is targeted marketing: data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population that are likely to respond similarly to given events.

  • Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry errors.
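The retail example of products that are often purchased together can be illustrated with a simple pair count. This is a minimal sketch under stated assumptions: the `BASKETS` data and the `frequent_pairs` helper are hypothetical, and production tools use more scalable association-rule algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one checkout transaction.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "diapers"},
    {"beer", "diapers", "bread"},
    {"milk", "butter"},
]


def frequent_pairs(baskets, min_support):
    """Return product pairs bought together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("beer", "diapers")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(BASKETS, min_support=2)
```

With this sample data, the pairs that reach the support threshold include the seemingly unrelated ("beer", "diapers") alongside the expected ("bread", "butter").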

Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products are developed. When data mining tools run on high-performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions. Databases can be larger in both breadth and depth:

  • More columns. Analysts must often limit the number of variables they examine in hands-on analysis because of time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High-performance data mining allows users to explore the full depth of a database without preselecting a subset of variables.

  • More rows. Larger samples yield lower estimation errors and variance, and allow users to draw conclusions about small but important segments of a population.
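The claim that larger samples yield lower estimation error can be made concrete with the textbook formula for the standard error of a sample mean, s / sqrt(n). The sketch below only evaluates that formula; the function name and the numbers are illustrative assumptions, not figures from the sources cited here.

```python
import math


def standard_error(sample_std, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sample_std / math.sqrt(n)


# Same spread in the data, but 100 times as many rows.
small = standard_error(sample_std=10.0, n=100)
large = standard_error(sample_std=10.0, n=10_000)
```

Multiplying the number of rows by 100 cuts the standard error by a factor of 10, which is why a rare segment becomes measurable only once the database is large enough.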

A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next 3 to 5 years." Gartner also listed parallel architectures and data mining as two of the top 10 new technologies in which companies will invest during the next 5 years. According to a recent Gartner HPC Research Note: "With the rapid advance in data capture, transmission and storage, large-systems users will increasingly need to implement new and innovative ways to mine the after-market value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage (0.9 probability)."

3. The most commonly used techniques in data mining:

  • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

  • Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for classifying a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-squared Automatic Interaction Detection (CHAID).

  • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

  • Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
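Of the techniques listed above, the nearest neighbor method is the easiest to show in a few lines. The following is a minimal sketch assuming Euclidean distance and a simple majority vote; the `HISTORY` records, their labels, and the `knn_classify` helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical historical records: ((age, income in thousands), class label).
HISTORY = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((28, 40), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
    ((48, 85), "yes"),
]


def knn_classify(record, history, k=3):
    """Label a record by majority vote among its k nearest historical records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, `knn_classify((47, 82), HISTORY)` votes among the three closest historical records. In practice the attributes would usually be rescaled first so that no single column dominates the distance.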

4. Text mining (TM) techniques

The main techniques of text mining include:

1. Feature extraction

2. Thematic indexing

3. Clustering

4. Summarization

These four techniques are essential because they solve two major problems in applying text mining: they make textual information accessible, and they reduce the volume of text that the end user must read before the information can be found.

Feature extraction deals with finding particular pieces of information within a text. The target information can be of a general form, such as type descriptions, or business-driven, such as specific patterns. For example, applications that analyze merger and acquisition stories might extract the names of the companies involved, costs, financing, and whether government approval is required.

Thematic indexing uses knowledge about the meaning of words in a text to identify the broad topics covered in a document. For example, documents about aspirin could be classified under pain relievers or analgesics. Thematic indexing of this kind is often implemented through a multidimensional taxonomy. A taxonomy, in the text-mining sense, is a hierarchical knowledge representation scheme. This construct, sometimes called an ontology to distinguish it from navigational taxonomies such as Yahoo's, provides the means to search for documents about a topic instead of documents containing certain keywords. For example, an analyst researching mobile communications should be able to search for documents about wireless protocols without having to know phrases like "Wireless Application Protocol (WAP)".

Clustering is another text mining technique with applications in business intelligence. Clustering groups similar documents according to their dominant features. In text mining and information retrieval, a weighted feature vector is often used to describe a document. These feature vectors contain a list of main topics or keywords, along with a numerical weight indicating the relative importance of the theme or term to the document as a whole. Unlike data mining applications, which use a fixed set of features for all analyzed items (e.g. age, income, gender), documents are described with a small number of terms or themes chosen from potentially thousands of possible dimensions.
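A weighted feature vector of the kind described above can be sketched with a common weighting scheme, TF-IDF, and compared with cosine similarity. The source does not say which weighting its systems use, so TF-IDF is an assumption here, and the tiny `DOCS` corpus and helper names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, already tokenized.
DOCS = {
    "d1": "wireless protocol for mobile phones".split(),
    "d2": "mobile phones use a wireless protocol".split(),
    "d3": "aspirin is a pain reliever".split(),
}


def tfidf_vectors(docs):
    """Map each document to a sparse {term: weight} vector."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


VECTORS = tfidf_vectors(DOCS)
```

Each vector stores only the terms that occur in its document, which matches the observation that a document uses a small number of dimensions out of potentially thousands. Clustering methods can then group documents whose vectors have high cosine similarity.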
There is no single best way to cluster documents, but three approaches are commonly used: hierarchical clustering, binary clustering, and self-organizing maps. Hierarchical clustering [3] uses a set-based approach. The root of the hierarchy is the set of all documents in a collection, and the leaf nodes are sets containing individual documents. The layers in between hold progressively larger sets of documents, grouped by similarity. In binary clustering, each document belongs to one and only one cluster, and clusters are created to maximize the similarity measure between documents within a cluster and minimize the similarity measure between documents in different clusters. Self-organizing maps (SOMs) use neural networks to map documents from sparse high-dimensional spaces onto two-dimensional maps. Similar documents tend toward the same position in the two-dimensional grid.

The final text mining technique is summarization. The purpose of summarization is to describe the content of a document while reducing the amount of text a user must read. The main ideas of most documents can be described with as little as 20 percent of the original text, so little is lost with a summary. As with clustering, there is no single summarization algorithm. Most use morphological analysis of words to identify the most frequently used terms while eliminating words that carry little meaning, such as the articles "the", "an", and "a". Some algorithms weight terms used in opening or closing sentences more heavily than other terms, while other approaches search for key phrases that identify important sentences.
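The frequency-based approach to summarization described above can be sketched as follows. This is a deliberately simple illustration, assuming a tiny stop-word list and no morphological analysis; `STOP_WORDS`, `summarize`, and `TEXT` are hypothetical names, not part of any system cited here.

```python
import re
from collections import Counter

# A small stop-word list, assumed for illustration; real systems use larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "is", "to", "it", "on"}


def summarize(text, max_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def content_words(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return [w for w in words if w not in STOP_WORDS]

    # Term frequencies over the whole text, ignoring stop words.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        return sum(freq[w] for w in content_words(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


TEXT = (
    "Text mining finds patterns in text. "
    "The weather was pleasant. "
    "Text mining tools reduce the text a reader must scan."
)
```

Calling `summarize(TEXT, max_sentences=2)` drops the off-topic middle sentence. Because the score is a plain sum, it favors longer sentences; dividing by sentence length or adding extra weight for opening and closing sentences are the kinds of refinements the paragraph above mentions.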

5. Applications of TM

Legislative and executive branches of government, business organizations, and universities, as well as journalists, writers, and students, all create, store, retrieve, and analyze text. Numerous organizations are therefore familiar with various document management and text analysis tasks. Consider a few simple examples:

  • Internet search engines could achieve much better results by accepting and processing natural language queries. If the documents found in response to a query were analyzed for their semantic relevance in the context of the original request, the precision of the search could grow significantly: instead of an overwhelming crowd of more than 10,000 documents that might contain the answer to your question, the system would present a short list of key documents.

  • Call center specialists have to understand customer support questions quickly, select relevant documents from the available manuals, frequently asked questions lists, and engineering notes, and retrieve the pieces of knowledge that help answer the question. An automated system for categorizing the available materials and retrieving the most relevant fragments in response to natural language questions could save hundreds of thousands of man-hours and dramatically reduce response time. Using the best parts of thesauri and ontologies could significantly improve recall, that is, the thoroughness of the search.

  • Lawyers, insurance companies, and venture capitalists often need to grasp quickly the significance of cases, claims, and proposals, respectively. They need to improve the quality of their queries against the web and various databases and retrieve the relevant documents. Their practice could benefit greatly from automatic text clustering and feature extraction if the major points from the texts were organized in a database with meta-information to improve future access to the knowledge contained in the documents.

Mining medical journals for new hypotheses about the causes and effects of disease is an ideal case of what text mining should be able to do. Intelligent e-mail routing, automatic monitoring of chat areas, and monitoring of web sites are all important applications.

5.1. Big challenges for text mining.

Text mining is an exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, NLP, IR, and knowledge management. Text mining involves the preprocessing of document collections (text categorization, information extraction, term extraction), the storage of intermediate representations, the techniques used to analyze these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.), and the visualization of results. Here are some of the challenges faced by text mining research:

5.2. Challenge 1: Entity extraction.

Most text analytics systems rely on accurate extraction of entities and relationships from documents. However, the accuracy of entity extraction in some domains reaches only 70 to 80%, which creates a noise level that prevents the adoption of text-mining systems by a wider audience. We are looking for domain-independent and language-independent NER (named entity recognition) systems that can reach an accuracy of 99-100%. Building on such a system, we are looking for domain-independent and language-independent relation extraction systems that can reach a precision of 98-100% and a recall of 95-100%. Since the systems must work in every domain, they should be completely autonomous and require no human intervention.
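The precision and recall targets quoted above are measured by comparing what a system extracts against a hand-labeled reference. The sketch below shows the standard definitions; the entity sets and the `precision_recall` helper are hypothetical examples, not output from a real extractor.

```python
def precision_recall(extracted, gold):
    """Compare a set of extracted entities against a hand-labeled gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of correct items that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical system output and gold annotations for one document.
EXTRACTED = {"IBM", "Oracle", "Seattle", "Monday"}
GOLD = {"IBM", "Oracle", "Sybase", "Seattle", "Gartner"}

precision, recall = precision_recall(EXTRACTED, GOLD)
```

Here three of the four extracted entities are correct and three of the five gold entities are found, so the example system falls well short of the 98-100% precision and 95-100% recall this challenge asks for.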

5.3. Challenge 2: Autonomous Text Analysis.

Text analytics systems today are largely user-guided and let users look at various aspects of the corpus. We want a text analytics system that is completely autonomous, analyzes large corpora, and comes up with truly interesting findings that are not covered by any single document in the corpus and were not previously known. The system can use the Internet to filter out findings that are already known. The "interestingness" measure, which is inherently subjective, will be defined by a committee of experts in each domain. Such systems can then be used for alerting purposes in the financial sector, the anti-terrorism domain, the biomedical field, and many other commercial areas. The system will receive streams of documents from a variety of sources and send e-mails to the relevant people when an "interesting" finding is detected. Building on the systems developed for Challenges 1 and 2 is what we regard as our text-mining Grand Challenge.

6. Conclusion

Mining texts in different languages is a major problem, since text-mining tools should be able to deal with many languages and with multilingual documents. Integrating a domain knowledge base with a text-mining engine would increase its efficiency, especially in the information retrieval and information extraction phases. Acquiring such knowledge requires effective querying of documents and the linking of pieces of textual information from various sources (e.g. the World Wide Web). Discovering this tacit knowledge is an essential prerequisite for many companies, due to its wide range of applications.

7. References

1. Jochen Dörre, Peter Gerstl, Roland Seiffert (1999), Text Mining: Finding Nuggets in Mountains of Textual Data, ACM KDD 1999, San Diego, CA, USA.

2. Ah-Hwee Tan (1999), Text Mining: The State of the Art and the Challenges, in Proceedings of the PAKDD'99 Workshop on Knowledge Discovery from Advanced Databases (KDAD'99), Beijing, pp. 71-76, April 1999.

3. Daniel Tkach (1998), Text Mining Technology: Turning Information Into Knowledge, a white paper from IBM.

4. Helena Ahonen, Oskari Heinonen, Mika Klemettinen, A. Inkeri Verkamo (1997), Applying Data Mining Techniques in Text Analysis, Report C-1997-23, Department of Computer Science, University of Helsinki.

5. Mark Dixon (1997), An Overview of Document Mining Technology, http://www.geocities.com/ResearchTriangle/Thinktank/1997/mark/writings/dixm97_dm.ps

Arseniev, S.B. & Kiselev, M.V. (1991), The Object-Oriented Approach to the Medical Real-Time System Design, Proceedings of MIE-91, in: Lecture Notes in Medical Informatics, Springer-Verlag, Berlin, V.45, pp. 508-512.

Falkenhainer, B.C. & Michalski, R.S. (1990), Integrating Quantitative and Qualitative Discovery in the ABACUS System, in: Y. Kodratoff, R.S. Michalski (eds.), Machine Learning: An Artificial Intelligence Approach (Volume III), San Mateo, CA: Kaufmann, pp. 153-190.

Kiselev, M.V. (1994), PolyAnalyst - a machine discovery system inferring functional programs, Proceedings of AAAI Workshop on Knowledge Discovery in Databases'94, Seattle, pp. 237-249.

Kiselev, M.V., Arseniev, S.B. & Flerov, E.V. (1994), PolyAnalyst - a machine discovery system for intelligent analysis of clinical data, ESCTAIC-4 Abstracts (European Association for Computer Technology in Anesthesiology and Intensive Care), Halkidiki, Greece, p. H6.

Langley, P., Simon, H.A., Bradshaw, G.L. & Zytkow, J.M. (1987), Scientific Discovery: Computational Explorations of the Creative Processes, Cambridge, MA: MIT Press.

About the Author

Mr. Chandrakant R. Satpute is a librarian at Godavari College of Engineering, Jalgaon, Maharashtra. He holds a Master's degree in Library and Information Science and has 11 years of experience in teaching and librarianship. He is associated with the Khandesh Library Association (KLA) and has published six national and international papers. His areas of interest are library automation and digitization.

E-mail: chandanlib1@yahoo.co.in
